The original version of popkin was an R package available on CRAN.
This binary suite focuses on optimizing the package's popkin function and related tasks for datasets with large numbers of loci and individuals.
This suite does not replace the rest of the R package's functionality, particularly its plotting functions.
This version of popkin is a very simple scanner of plink BED/BIM/FAM files that produces a GRM.BIN output.
This C++ project converts genotypes on the fly, using minimal memory even for very large files.
It also has optional optimizations for processors with the AVX2 and AVX-512 instruction sets.
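To illustrate what converting genotypes on the fly can look like, here is a minimal C++ sketch (not popkin's actual code) that decodes one locus of a SNP-major plink BED file and adds it to running pairwise sums. It assumes the standard BED 2-bit encoding, counts copies of the second .bim allele, and accumulates the A statistic of the popkin estimator as we understand it from the R package, A_jk = (1/M_jk) * sum_i (x_ij - 1)(x_ik - 1) - 1, where M_jk counts loci at which both individuals are non-missing. The names decode_bed_locus and PairAccumulator are hypothetical, and the final rescaling of A into kinship is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Decode one locus (one row of a SNP-major plink BED file, after the 3 magic bytes).
// Each byte packs 4 individuals, 2 bits each, starting from the lowest bits.
// BED codes: 00 = homozygous first allele, 01 = missing,
//            10 = heterozygous, 11 = homozygous second allele.
// This sketch counts copies of the second allele and uses -1 for missing.
std::vector<int8_t> decode_bed_locus(const std::vector<uint8_t>& bytes, std::size_t n_ind) {
    assert(bytes.size() == (n_ind + 3) / 4);  // rows are padded to whole bytes
    static const int8_t lookup[4] = {0, -1, 1, 2};
    std::vector<int8_t> x(n_ind);
    for (std::size_t j = 0; j < n_ind; ++j) {
        const uint8_t code = (bytes[j / 4] >> (2 * (j % 4))) & 0x3;
        x[j] = lookup[code];
    }
    return x;
}

// Hypothetical running sums for A_jk over the packed lower triangle (k <= j),
// so memory grows with the number of individuals, not the number of loci.
struct PairAccumulator {
    std::size_t n;
    std::vector<double> sum;      // sum over loci of (x_ij - 1)(x_ik - 1)
    std::vector<uint32_t> count;  // M_jk: loci where both genotypes are observed

    explicit PairAccumulator(std::size_t n_ind)
        : n(n_ind), sum(n_ind * (n_ind + 1) / 2, 0.0), count(n_ind * (n_ind + 1) / 2, 0) {}

    // Add one decoded locus; only this locus is held in memory.
    void add_locus(const std::vector<int8_t>& x) {
        assert(x.size() == n);
        std::size_t idx = 0;
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k <= j; ++k, ++idx) {
                if (x[j] < 0 || x[k] < 0) continue;  // skip missing genotypes
                sum[idx] += (x[j] - 1) * (x[k] - 1);
                ++count[idx];
            }
        }
    }

    // A_jk for any pair; returns NaN if no locus had both genotypes observed.
    double A(std::size_t j, std::size_t k) const {
        if (k > j) std::swap(j, k);
        const std::size_t idx = j * (j + 1) / 2 + k;
        return sum[idx] / count[idx] - 1.0;
    }
};
```

Reading one BED row at a time and calling add_locus keeps memory proportional to the n(n+1)/2 accumulators rather than to the full genotype matrix; a tuned implementation would presumably process blocks of loci and vectorize the inner loop.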
popkinpca is an efficient calculator of the eigenvalues and eigenvectors of the kinship or coancestry matrices otherwise calculated by popkin, designed for very large numbers of individuals.
This code employs the Mailman algorithm and the Spectra library for calculating the desired decompositions scalably, similar to predecessors such as SCOPE and ProPCA but adapted for popkin.
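The core Mailman trick can be sketched in C++ for a small block of k loci with genotypes in {0, 1, 2}: every column of the block is one of 3^k possible patterns, so the input vector is first summed into 3^k buckets (one pass over the n individuals), and the fixed pattern matrix is then applied recursively in O(3^k) time. With k around log_3(n), this replaces the naive O(kn) product with roughly O(n) work per block. This is a hypothetical illustration only: the function name mailman_multiply is ours, and it ignores missing genotypes and any centering or scaling popkinpca may apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compute X * v for a k x n block X with entries in {0, 1, 2} (row i = locus i).
// Hypothetical sketch of the Mailman idea; no missing-data handling.
std::vector<double> mailman_multiply(const std::vector<std::vector<uint8_t>>& X,
                                     const std::vector<double>& v) {
    const std::size_t k = X.size();
    const std::size_t n = v.size();
    for (const auto& row : X) assert(row.size() == n);
    std::size_t n_patterns = 1;  // 3^k possible columns
    for (std::size_t i = 0; i < k; ++i) n_patterns *= 3;

    // Step 1: bucket v by column pattern. Column j is read as a base-3 number
    // whose most significant digit is X[0][j].
    std::vector<double> y(n_patterns, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        std::size_t pattern = 0;
        for (std::size_t i = 0; i < k; ++i) {
            assert(X[i][j] <= 2);
            pattern = 3 * pattern + X[i][j];
        }
        y[pattern] += v[j];
    }

    // Step 2: multiply the k x 3^k pattern matrix by y, peeling off one digit per row.
    // At row i, y[0 .. len) is indexed by digits i..k-1 (digit i most significant).
    std::vector<double> result(k, 0.0);
    std::size_t len = n_patterns;
    for (std::size_t i = 0; i < k; ++i) {
        const std::size_t third = len / 3;
        double sum1 = 0.0, sum2 = 0.0;  // buckets whose digit i is 1 or 2
        for (std::size_t c = 0; c < third; ++c) {
            sum1 += y[third + c];
            sum2 += y[2 * third + c];
            y[c] += y[third + c] + y[2 * third + c];  // fold out digit i
        }
        result[i] = sum1 + 2.0 * sum2;
        len = third;
    }
    return result;
}
```

In a matrix-free eigensolver such as Spectra's, products like this would be called from the user-supplied operator (Spectra's custom operators expose rows(), cols(), and perform_op()) once per iteration, which is what avoids ever forming the full n x n matrix.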
popkinpca requires both the Eigen and Spectra libraries, which on Fedora Linux you can install with this command:
dnf install eigen3-devel spectra-devel
You may need to adjust the paths to their sources in the Makefile.
After that, this project can be compiled as usual:
make
If successful, you can check that the outputs are as expected by comparing them to an internal R version (requires the popkin, BEDMatrix, genio, and RSpectra CRAN packages), which is more straightforward code but consumes far more memory on larger files (it is tested on a toy dataset where this is not a concern). The test passed if there are no error messages:
make test
Under build/ there is a version already compiled on my own Linux computer, which may work on other Linux systems.
popkin requires these two arguments:
./popkin -i <input_base> -o <output_base>
<input_base> is the shared base name of the input plink .bed, .bim, and .fam files, all of which are required. <output_base> is the base name of the output files: the extensions .grm.bin, .grm.N.bin, and .grm.id are added automatically, matching the GRM binary outputs that GCTA normally produces (except our kinship is half the size of their GRM!).
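For readers who want to consume these files directly, the sketch below writes the three outputs in the GCTA binary layout as we understand it: .grm.bin holds the lower triangle of the matrix including the diagonal, row by row (row j, columns 0..j), as 4-byte native floats (little-endian on x86); .grm.N.bin holds the number of loci used for each entry in the same order and type; and .grm.id holds one tab-separated FID/IID pair per line. The function write_grm and struct SampleId are hypothetical, not popkin's own code. Values are kept on the kinship scale, so multiply them by 2 to compare with a GRM computed by GCTA.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct SampleId {
    std::string fid;  // family ID, column 1 of the .fam file
    std::string iid;  // individual ID, column 2 of the .fam file
};

// Write <out_base>.grm.bin, .grm.N.bin and .grm.id from packed lower-triangular
// arrays (entry (j, k) with k <= j stored at index j*(j+1)/2 + k).
void write_grm(const std::string& out_base,
               const std::vector<SampleId>& ids,
               const std::vector<double>& kinship_lower,
               const std::vector<float>& n_loci_lower) {
    const std::size_t n = ids.size();
    assert(kinship_lower.size() == n * (n + 1) / 2);
    assert(n_loci_lower.size() == kinship_lower.size());

    std::ofstream grm(out_base + ".grm.bin", std::ios::binary);
    std::ofstream grm_n(out_base + ".grm.N.bin", std::ios::binary);
    for (std::size_t idx = 0; idx < kinship_lower.size(); ++idx) {
        // Kinship scale (half of GCTA's GRM), stored as a 4-byte float.
        const float value = static_cast<float>(kinship_lower[idx]);
        grm.write(reinterpret_cast<const char*>(&value), sizeof value);
        grm_n.write(reinterpret_cast<const char*>(&n_loci_lower[idx]), sizeof(float));
    }

    // One "FID<TAB>IID" line per individual, in matrix order.
    std::ofstream id_file(out_base + ".grm.id");
    for (const SampleId& id : ids) {
        id_file << id.fid << '\t' << id.iid << '\n';
    }
}
```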
There are additional options; see ./popkin -h for more information.