In this short blog post, I’m gonna show you a super simple way to calculate genotype frequencies.
Let’s say we’re looking at a single biallelic locus (one SNP) with two possible alleles, “A” and “a”. That gives us three possible genotypes: “AA”, “Aa”, and “aa”.
What we want to do is count how many times each of these genotypes shows up at each SNP.
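Let’s create a random genotype matrix for 10 genotyped animals (rows) and 20 SNPs (columns). Genotypes are coded 0, 1, and 2 for “aa”, “Aa”, and “AA”, so each entry is the number of copies of “A”.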
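set.seed(995)
# sample() draws 200 genotype codes; matrix() arranges them as 10 animals (rows) x 20 SNPs (columns).
# The outer parentheses print M right after the assignment.
(M <- matrix(sample(0:2, 200, replace = TRUE), nrow = 10))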
Which gives us:
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
 [1,]    2    0    2    1    0    2    1    1    1     0     1     1     2     1     0     1     2     0     1     1
 [2,]    1    2    0    0    1    1    1    2    2     0     0     2     2     0     1     2     2     2     1     0
 [3,]    0    0    2    1    0    1    2    1    0     2     2     1     1     0     0     0     0     2     0     1
 [4,]    1    1    2    1    0    2    2    1    1     1     0     1     2     1     0     2     2     0     2     0
 [5,]    0    2    1    1    2    2    2    1    1     2     1     0     2     2     2     1     2     0     1     1
 [6,]    0    1    2    0    0    0    2    0    0     0     0     1     0     1     0     1     1     1     1     2
 [7,]    2    2    2    2    0    0    1    0    0     2     1     0     2     2     1     0     2     0     0     0
 [8,]    2    1    2    1    1    1    0    1    0     0     2     1     0     1     1     2     2     2     0     1
 [9,]    2    1    0    2    0    0    0    0    0     0     0     0     0     2     0     1     0     0     2     2
[10,]    2    2    0    2    0    0    1    0    0     1     0     1     1     2     0     1     1     2     0     2
M can be a regular matrix, a Matrix (from the Matrix package), a data.frame, or a data.table.
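As a quick check of the data.frame case, here’s a minimal base R sketch. It uses the same comparison-plus-colSums() idea the counting code relies on. The tiny 4-animal, 2-SNP matrix and the names M_mat, M_df, counts_mat, and counts_df are made up for illustration; the Matrix and data.table cases aren’t covered here because they need extra packages.

```r
# Hypothetical toy data: 4 animals (rows) x 2 SNPs (columns), coded 0/1/2
M_mat <- matrix(c(0, 1, 2, 2,
                  1, 1, 0, 2), nrow = 4)
M_df  <- as.data.frame(M_mat)

# Comparing a data.frame with a number returns a logical matrix,
# so colSums() can count the TRUE values per column in both cases.
counts_mat <- colSums(M_mat == 2)
counts_df  <- colSums(M_df == 2)

print(counts_mat)
print(counts_df)  # same counts, but named after the data.frame columns (V1, V2)
```

The only visible difference is that the data.frame version carries the column names along.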
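We can count the number of each genotype at each SNP in just three simple lines of R code. Each comparison (like M == 0) gives a TRUE/FALSE matrix, and colSums() counts the TRUEs in each column: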
(aa <- colSums(M == 0))  # homozygous "aa" (code 0)
(Aa <- colSums(M == 1))  # heterozygous "Aa" (code 1)
(AA <- colSums(M == 2))  # homozygous "AA" (code 2)
And here are the results (the “aa”, “Aa”, and “AA” counts, in that order, with one value per SNP):
[1] 3 2 3 2 7 4 2 4 6 5 5 3 3 2 6 2 2 5 4 3
[1] 2 4 1 5 2 3 4 5 3 2 3 6 2 4 3 5 2 1 4 4
[1] 5 4 6 3 1 3 4 1 1 3 2 1 5 4 1 3 6 4 2 3
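To go from counts to actual genotype frequencies, you just divide by the number of animals. Here’s a minimal sketch of that step, assuming the 0/1/2 coding used above and no missing genotypes. To keep it independent of the random number generator, it uses only the first three SNP columns of the matrix printed above, typed in by hand. The names M3, n, geno_freq, and p_A are mine, not part of the original code.

```r
# First three SNP columns of the matrix printed above, entered by hand
M3 <- cbind(c(2, 1, 0, 1, 0, 0, 2, 2, 2, 2),
            c(0, 2, 0, 1, 2, 1, 2, 1, 1, 2),
            c(2, 0, 2, 2, 1, 2, 2, 2, 0, 0))

n  <- nrow(M3)          # number of animals (assumes no missing genotypes)
aa <- colSums(M3 == 0)
Aa <- colSums(M3 == 1)
AA <- colSums(M3 == 2)

# Genotype frequencies: one row per SNP, one column per genotype
geno_freq <- cbind(aa = aa, Aa = Aa, AA = AA) / n
print(geno_freq)

# Frequency of allele "A": each AA carries two copies, each Aa carries one
p_A <- (2 * AA + Aa) / (2 * n)
print(p_A)
```

Each row of geno_freq sums to 1. If your data has missing genotypes (NA), colSums() would need na.rm = TRUE, and n would have to be counted per SNP instead of taken from nrow().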
There you have it! A super quick way to get those genotype counts. Hope that was helpful!