Here, I introduce an R function for calculating minor allele frequencies (MAF).
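calcmaf <- function(M, col1ID = TRUE) {
  # Halve the mean genotype code per SNP to get the counted-allele frequency
  if (col1ID) {
    maf <- colMeans(M[, -1]) / 2  # drop the ID column first
  } else {
    maf <- colMeans(M) / 2
  }
  # Fold frequencies above 0.5 so that the minor allele is reported
  maf[maf > 0.5] <- 1 - maf[maf > 0.5]
  return(unname(maf))
}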
The calcmaf function takes two arguments, M and col1ID. M is the genotype data frame, with genotypes coded as 0, 1, or 2. col1ID takes TRUE or FALSE. If TRUE (the default), the 1st column of M is treated as animal ID and excluded from the calculation.
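To make the arithmetic inside the function concrete, here is a minimal sketch for a single SNP. The genotype vector g is hypothetical (it is not taken from the example data), and the sketch assumes the codes 0, 1, and 2 count copies of one allele, which is what dividing the mean by 2 implies.

```r
# Hypothetical genotypes for one SNP in five animals
# (assumed coding: 0, 1, or 2 copies of the counted allele)
g <- c(2, 1, 2, 0, 1)

# Each animal carries two alleles, so the frequency of the counted allele
# is the allele count divided by twice the number of animals
p <- sum(g) / (2 * length(g))

# The minor allele is the rarer one, so frequencies above 0.5 are folded
maf <- min(p, 1 - p)

print(c(p = p, maf = maf))
```

Here p is 0.6, so the counted allele is actually the major one and the reported MAF is 1 - 0.6 = 0.4.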
Let’s create an example genotype data frame for 10 animals and 20 SNPs, where the first column is animal ID.
set.seed(995)
M <- matrix(c(sample(0:2, 200, replace = TRUE)), nrow = 10)
(M <- as.data.frame(cbind(101:110, M)))
    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
1  101  2  0  2  1  0  2  1  1   1   0   1   1   2   1   0   1   2   0   1   1
2  102  1  2  0  0  1  1  1  2   2   0   0   2   2   0   1   2   2   2   1   0
3  103  0  0  2  1  0  1  2  1   0   2   2   1   1   0   0   0   0   2   0   1
4  104  1  1  2  1  0  2  2  1   1   1   0   1   2   1   0   2   2   0   2   0
5  105  0  2  1  1  2  2  2  1   1   2   1   0   2   2   2   1   2   0   1   1
6  106  0  1  2  0  0  0  2  0   0   0   0   1   0   1   0   1   1   1   1   2
7  107  2  2  2  2  0  0  1  0   0   2   1   0   2   2   1   0   2   0   0   0
8  108  2  1  2  1  1  1  0  1   0   0   2   1   0   1   1   2   2   2   0   1
9  109  2  1  0  2  0  0  0  0   0   0   0   0   0   2   0   1   0   0   2   2
10 110  2  2  0  2  0  0  1  0   0   1   0   1   1   2   0   1   1   2   0   2
Now, calculate MAF:
calcmaf(M) # equivalent to calcmaf(M, col1ID = TRUE)
[1] 0.40 0.40 0.35 0.45 0.20 0.45 0.40 0.35 0.25 0.40 0.35 0.40 0.40 0.40 0.25
[16] 0.45 0.30 0.45 0.40 0.50
Suppose instead that we had a data frame of genotypes only, with no animal ID column (here, M[,-1]). Then:
calcmaf(M = M[,-1], col1ID = FALSE)
[1] 0.40 0.40 0.35 0.45 0.20 0.45 0.40 0.35 0.25 0.40 0.35 0.40 0.40 0.40 0.25
[16] 0.45 0.30 0.45 0.40 0.50
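The two calls above print the same values. Rather than comparing them by eye, the agreement can be checked programmatically. The sketch below repeats the function and the example data so that it runs on its own. The names maf_with_id, maf_no_id, and same are introduced here for illustration only. The function body is written slightly more compactly than in the post, but is intended to behave the same way.

```r
# Same calculation as calcmaf above, written compactly
calcmaf <- function(M, col1ID = TRUE) {
  if (col1ID) M <- M[, -1]               # drop the animal ID column
  maf <- colMeans(M) / 2                 # frequency of the counted allele
  maf[maf > 0.5] <- 1 - maf[maf > 0.5]   # fold to the minor allele
  unname(maf)
}

# Rebuild the example data: 10 animals, 20 SNPs, IDs in the first column
set.seed(995)
M <- matrix(sample(0:2, 200, replace = TRUE), nrow = 10)
M <- as.data.frame(cbind(101:110, M))

maf_with_id <- calcmaf(M)                      # ID column removed internally
maf_no_id   <- calcmaf(M[, -1], col1ID = FALSE) # ID column removed by the caller

# Both routes should give exactly the same vector
same <- identical(maf_with_id, maf_no_id)
print(same)
```

Note that the random draws from sample() can differ between R versions, so the genotypes generated here may not match the table printed earlier. The equality check holds regardless of which genotypes are drawn.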