GitHub - lh3/naivepca: Naive PCA for genotype data

NaivePCA performs principal component analysis (PCA) on population genotype data. It implements the basic algorithm described by Patterson et al. (2006). More precisely, suppose we have m samples of ploidy h and n biallelic markers. Let G_ij be the number of non-reference alleles for sample i at marker j. NaivePCA computes:

\mu_j  = \sum_{i=1}^m G_{ij} / m
p_j    = \mu_j / h
M_{ij} = \frac{G_{ij}-\mu_j}{\sqrt{p_j(1-p_j)}}
X_{ij} = \sum_{k=1}^n M_{ik} M_{jk} / n

and finds the eigenvectors of the matrix (X_ij). Note that if G_ij is missing, M_ij is set to zero, and the computation of μ_j must be adjusted accordingly.
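The formulas above can be sketched in C as follows. This is a minimal illustration, not the code in `naivepca.c`: the function name `pca_cov_sketch`, the `GT_MISSING` code, and the row-major layout are all hypothetical. It also makes two assumptions the text does not spell out: μ_j is averaged over the non-missing samples only, and markers with p_j equal to 0 or 1 are skipped because the denominator would be zero. Since M_ik M_jk = (G_ik - μ_k)(G_jk - μ_k) / (p_k(1 - p_k)), the sketch divides the product directly and needs no square root.

```c
#include <assert.h>
#include <stdlib.h>

#define GT_MISSING (-1) /* hypothetical code for a missing genotype */

/*
 * Fill X (m x m, row-major) from genotypes G (m x n, row-major); h is the
 * ploidy. Returns the number of markers that contributed to X.
 */
int pca_cov_sketch(int m, int n, int h, const signed char *G, double *X)
{
    int i, j, k, used = 0;
    double *d;
    assert(m > 0 && n > 0 && h > 0 && G != NULL && X != NULL);
    d = (double*)malloc((size_t)m * sizeof(double));
    assert(d != NULL);
    for (i = 0; i < m * m; ++i) X[i] = 0.0;
    for (k = 0; k < n; ++k) {
        double sum = 0.0, mu, p, var;
        int cnt = 0;
        for (i = 0; i < m; ++i) {
            if (G[i * n + k] != GT_MISSING) {
                sum += G[i * n + k];
                ++cnt;
            }
        }
        if (cnt == 0) continue;      /* marker has no observed genotype */
        mu = sum / cnt;              /* assumption: mean over observed samples */
        p = mu / h;
        var = p * (1.0 - p);
        if (var <= 0.0) continue;    /* assumption: skip monomorphic markers */
        for (i = 0; i < m; ++i)      /* a missing genotype contributes zero */
            d[i] = G[i * n + k] == GT_MISSING ? 0.0 : G[i * n + k] - mu;
        for (i = 0; i < m; ++i)      /* accumulate the lower triangle */
            for (j = 0; j <= i; ++j)
                X[i * m + j] += d[i] * d[j] / var;
        ++used;
    }
    for (i = 0; i < m; ++i) {        /* divide by n and mirror to the upper triangle */
        for (j = 0; j <= i; ++j) {
            double x = X[i * m + j] / n;
            X[i * m + j] = x;
            X[j * m + i] = x;
        }
    }
    free(d);
    return used;
}
```

The eigenvectors of X would then come from a symmetric eigensolver; that step is omitted here.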

The input of NaivePCA looks like:

sample1  110022110100202021001122*201
sample2  2012201102*221020211*1222001

where each digit represents a genotype and any other character is treated as missing data. For now, NaivePCA does not support real-valued matrices. The output is TAB-delimited. The first column is the sample name. The i-th column gives the eigenvector corresponding to the (i-1)-th largest eigenvalue.
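As an illustration of the input format, the sketch below splits one line into a sample name and an array of genotype codes. It is not taken from `naivepca.c`: the function name `parse_genotype_line`, the `GT_MISSING` code, and the caller-supplied buffers are hypothetical, and it assumes the name contains no whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define GT_MISSING (-1) /* hypothetical code for a missing genotype */

/*
 * Split "name<whitespace>genotypes" into a NUL-terminated name and an array
 * of genotype codes. Returns the number of markers stored, or -1 if a field
 * is absent or a buffer is too small.
 */
int parse_genotype_line(const char *line, char *name, size_t name_cap,
                        signed char *gt, size_t gt_cap)
{
    size_t i = 0, n = 0;
    const unsigned char *s = (const unsigned char*)line;
    assert(line != NULL && name != NULL && gt != NULL && name_cap > 0);
    while (*s && !isspace(*s)) {           /* first field: sample name */
        if (i + 1 >= name_cap) return -1;
        name[i++] = (char)*s++;
    }
    name[i] = '\0';
    while (*s && isspace(*s)) ++s;         /* skip the separator */
    if (i == 0 || *s == '\0') return -1;
    for (; *s && !isspace(*s); ++s) {      /* second field: one character per marker */
        if (n >= gt_cap) return -1;
        /* a digit is a genotype; any other character counts as missing */
        gt[n++] = isdigit(*s) ? (signed char)(*s - '0') : GT_MISSING;
    }
    return (int)n;
}
```

For the `sample2` line shown above, this would store 28 markers, with the two `*` positions recorded as `GT_MISSING`.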

