IBDBillion

Introduction

C programs for efficiently processing billion-scale pairwise identity-by-descent (IBD) sharing, developed for the manuscript: Population-scale inheritance analysis of 858,635 individuals reveals North Sea migration from the Middle Ages to the Industrial Revolution (https://doi.org/10.1101/2025.03.19.643007).

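There are three programs: IBDkin_fastsmc, sumchr_IBDkin and ibdstat. Demo data for testing each program are provided in the test folders.

The programs are run in sequence: IBDkin_fastsmc calculates pairwise total IBD sharing (per chromosome for large cohorts), sumchr_IBDkin combines the per-chromosome results, and ibdstat summarises the combined sharing by geographic area.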

IBDkin_fastsmc – Calculate Pairwise Total IBD Sharing

We modified the published program IBDkin to accept IBD segment calls from FastSMC and added custom processing options.

Overlapping IBD segments, which may be artefacts, can be resolved with the option --remove_overlap 1 (described under Input Options).

Usage

To run our version of IBDkin, use the following command:

/pathto/IBDkin_fastsmc \
  --ibdfile ${ibdfile} \
  --map ${map} \
  --ind ${ind} \
  --range ${range} \
  --nthreads ${n_thread} \
  --out ${output_file} \
  --outmask \
  --outcoverage \
  --cutcm [min] [max] \
  --cutprob [prob] \
  --remove_overlap 1

Input Options

Option	Type	Description
`--ibdfile`	`string`	File containing a list of paths to FastSMC IBD output files (one per line)
`--map`	`string`	Genetic map file in PLINK format
`--ind`	`string`	File with sample IDs to include (one ID per line)
`--range`	`string`	File listing the genomic region to include on each chromosome: chromosome, start bp, end bp. We used the range covered by the genetic maps.
`--nthreads`	`int`	Number of threads to use
`--cutcm` [min] [max]	`float float`	Minimum and maximum length (cM) of IBD segments to include
`--cutprob` [prob]	`float`	Minimum predictive probability of IBD segments to include (as output by FastSMC)
`--remove_overlap`	`boolean`	`1` to retain only the longest of any overlapping IBD segments for each pair; `0` otherwise

Output Format

The output is a tab-delimited file with the following columns:

Column	Description
`ID1`	Individual ID for person 1
`ID2`	Individual ID for person 2
`segnum`	Total number of IBD segments shared
`IBD1`	Total IBD1 sharing (cM): one pair of haplotypes shares IBD
`IBD2`	Total IBD2 sharing (cM): both pairs of haplotypes share IBD
`totg`	Total IBD sharing (cM), calculated as `IBD1 + 2 × IBD2`
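
As a small worked check of the `totg` definition above, the following Python sketch recomputes `IBD1 + 2 × IBD2` for each pair and reports any row that disagrees. It assumes the output file has no header line and uses made-up example values; `read_pairs` and `check_totg` are hypothetical helper names, not part of IBDkin_fastsmc.

```python
import csv
import io

# Column order as documented in the output table.
COLUMNS = ["ID1", "ID2", "segnum", "IBD1", "IBD2", "totg"]

def read_pairs(handle):
    """Read tab-delimited pair rows (assumption: the file has no header line)."""
    return list(csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"))

def check_totg(rows, tol=1e-6):
    """Return the (ID1, ID2) pairs whose totg differs from IBD1 + 2 * IBD2."""
    mismatches = []
    for row in rows:
        expected = float(row["IBD1"]) + 2.0 * float(row["IBD2"])
        if abs(float(row["totg"]) - expected) > tol:
            mismatches.append((row["ID1"], row["ID2"]))
    return mismatches

# Made-up example rows: the first is consistent (10.0 + 2 * 2.5 = 15.0),
# the second is deliberately inconsistent (4.0 + 2 * 0.0 != 4.5).
example = "A\tB\t3\t10.0\t2.5\t15.0\nA\tC\t1\t4.0\t0.0\t4.5\n"
rows = read_pairs(io.StringIO(example))
mismatches = check_totg(rows)
```

For a real output file, pass an open file handle to `read_pairs` instead of the in-memory example.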

Parallelisation

For cohorts with biobank-scale sample sizes (e.g., N ~ 500,000), we recommend running IBDkin_fastsmc separately for each chromosome to speed up computation, then combining the results across chromosomes. For this purpose we also developed sumchr_IBDkin, which combines the per-chromosome results efficiently.
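
One way to organise the per-chromosome runs is to generate one command per chromosome and keep a list of the expected outputs for the combining step. The Python sketch below only builds the command strings; every file name in it (`ibdfiles.chr1.list`, `chr1.map`, `samples.txt`, `range.txt`, `chr1.ibdkin`) is a hypothetical placeholder, and the exact file name that IBDkin_fastsmc writes for a given `--out` value should be checked before reusing the output list.

```python
import shlex

def per_chromosome_commands(chromosomes, nthreads=8):
    """Return (commands, outputs): one IBDkin_fastsmc call per chromosome."""
    commands, outputs = [], []
    for chrom in chromosomes:
        out = f"chr{chrom}.ibdkin"  # hypothetical output name
        args = [
            "/pathto/IBDkin_fastsmc",
            "--ibdfile", f"ibdfiles.chr{chrom}.list",  # hypothetical list of FastSMC outputs
            "--map", f"chr{chrom}.map",                # hypothetical per-chromosome map
            "--ind", "samples.txt",                    # hypothetical sample ID file
            "--range", "range.txt",                    # hypothetical range file
            "--nthreads", str(nthreads),
            "--out", out,
        ]
        commands.append(shlex.join(args))
        outputs.append(out)
    return commands, outputs

commands, outputs = per_chromosome_commands(range(1, 23))

# One output path per line: the layout a list file of per-chromosome results needs.
ibdsum_list = "\n".join(outputs) + "\n"
```

The commands can then be submitted to a job scheduler or run in a loop, with filtering options such as `--cutcm` and `--cutprob` appended as required.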

sumchr_IBDkin - combine multiple outputs from IBDkin into one

sumchr_IBDkin takes multiple text files output by IBDkin_fastsmc and aggregates the IBD sharing information, including the total IBD sharing and the total number of IBD segments, for all pairs observed.

Usage

/pathto/sumchr_IBDkin \
  --ibdsum_file ${ibdsum} \
  --ind ${ind} \
  --nthreads ${n_thread} \
  --out ${output_file} \
  --self [Pop_name] \
  --across [Pop1_name] [Pop2_name]

Input Options

Option	Type	Description
`--ibdsum_file`	`string`	File containing a list of paths to IBDkin_fastsmc output files (one per line)
`--ind`	`string`	File with the sample IDs to include and each sample's population name. Format: ID\tPop, one sample per line, no header
`--nthreads`	`int`	Number of threads to use
`--self` [Pop_name]	`string`	Include the intra-population sharing for Population [Pop_name]
`--across` [Pop1_name] [Pop2_name]	`string` `string`	Include the inter-population sharing between Population [Pop1_name] and [Pop2_name]
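
To illustrate how the `--ind` population labels relate to `--self` and `--across`, here is a minimal Python sketch of the pair selection as we read it from the option descriptions above. It is not the program's implementation; `load_populations` and `pair_selected` are hypothetical helpers, and the population names `PopA` and `PopB` are made up.

```python
def load_populations(text):
    """Parse 'ID<TAB>Pop' lines (no header) into a dict of ID -> population."""
    pops = {}
    for line in text.splitlines():
        if line.strip():
            sample_id, pop = line.split("\t")
            pops[sample_id] = pop
    return pops

def pair_selected(pop_a, pop_b, self_pops=(), across_pairs=()):
    """Assumed reading: --self keeps same-population pairs for the named
    population; --across keeps pairs spanning the two named populations."""
    if pop_a == pop_b:
        return pop_a in self_pops
    return any({pop_a, pop_b} == set(pair) for pair in across_pairs)

# Made-up --ind content with two populations.
pops = load_populations("s1\tPopA\ns2\tPopA\ns3\tPopB\n")

within_a = pair_selected(pops["s1"], pops["s2"], self_pops=["PopA"])
between = pair_selected(pops["s3"], pops["s1"], across_pairs=[("PopA", "PopB")])
```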

Output Format

The output is a tab-delimited file with the following columns:

Column	Description
`ID1`	Individual ID for person 1
`ID2`	Individual ID for person 2
`segnum`	Total number of IBD segments shared
`IBD1`	Total IBD1 sharing (cM): one pair of haplotypes shares IBD
`IBD2`	Total IBD2 sharing (cM): both pairs of haplotypes share IBD
`totg`	Total IBD sharing (cM), calculated as `IBD1 + 2 × IBD2`
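
Conceptually, combining per-chromosome results amounts to summing each column for every pair. The Python sketch below shows that idea on made-up rows; it is an illustration under our own assumptions (in particular, that a pair should be matched regardless of ID order), not the multithreaded C implementation of sumchr_IBDkin.

```python
from collections import defaultdict

def sum_over_chromosomes(rows):
    """Sum segnum, IBD1, IBD2 and totg per pair across per-chromosome rows.

    Each row is (ID1, ID2, segnum, IBD1, IBD2, totg).
    """
    totals = defaultdict(lambda: [0, 0.0, 0.0, 0.0])
    for id1, id2, segnum, ibd1, ibd2, totg in rows:
        # Assumption: (A, B) and (B, A) refer to the same pair.
        key = (id1, id2) if id1 <= id2 else (id2, id1)
        entry = totals[key]
        entry[0] += int(segnum)
        entry[1] += float(ibd1)
        entry[2] += float(ibd2)
        entry[3] += float(totg)
    return {pair: tuple(values) for pair, values in totals.items()}

# Made-up rows for one pair seen on two chromosomes and another seen on one.
rows = [
    ("A", "B", 2, 8.0, 1.0, 10.0),   # chromosome 1
    ("B", "A", 1, 4.0, 0.0, 4.0),    # chromosome 2, IDs in the other order
    ("A", "C", 1, 3.0, 0.0, 3.0),    # chromosome 1
]
combined = sum_over_chromosomes(rows)
```

Because every column is additive, the relation `totg = IBD1 + 2 × IBD2` still holds for the combined values.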

ibdstat - summarise the total IBD sharing and total number of IBD segments among pairs from geographic areas

Usage

/pathto/ibdstat \
  --ibdsum ${ibdsum} \
  --ind ${ind} \
  --nthreads ${n_thread} \
  --out ${output_file} \
  --count_country \
  --count_region \
  --count_county \
  --count_council \
  --segnum

Input Options

Option	Type	Description
`--ibdsum`	`string`	Pairwise IBD sharing summary file (one pair per line), in the output format of sumchr_IBDkin
`--ind`	`string`	File with sample IDs and each sample's birthplace at four levels of geographic division. Format: ID\tCountry\tRegion\tCounty\tCouncil, one sample per line, with a header line
`--nthreads`	`int`	Number of threads to use
`--count_country`	NA	Output pairwise statistics at country level
`--count_region`	NA	Output pairwise statistics at region level
`--count_county`	NA	Output pairwise statistics at county level
`--count_council`	NA	Output pairwise statistics at council level
`--segnum`	NA	Output summary statistics of the total number of IBD segments. Without this option, the summary statistics are based on the total length of IBD sharing

Output Format

The output is a tab-delimited file with the following columns:

Column	Description
`area1`	Name of area 1
`area2`	Name of area 2
`totg`	Total sum of IBD lengths
`count`	Number of pairs observed with the corresponding total IBD sharing
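
The columns above can be read as a tally of pairs per area pair and sharing value. The following Python sketch shows one way such a tally could be computed; it is only an illustration under our own assumptions (areas are sorted within a pair, and values are counted exactly as given, whereas the real program may bin or round them). The helper name `tally_pairs` and the example areas are made up.

```python
from collections import Counter

def tally_pairs(pairs, area_of, use_segnum=False):
    """Count pairs per (area1, area2, value).

    pairs: iterable of (ID1, ID2, segnum, totg)
    area_of: dict mapping sample ID -> area name at the chosen level
    use_segnum: tally the number of segments instead of totg
    """
    counts = Counter()
    for id1, id2, segnum, totg in pairs:
        # Assumption: the two areas are unordered, so sort them.
        area1, area2 = sorted((area_of[id1], area_of[id2]))
        value = segnum if use_segnum else totg
        counts[(area1, area2, value)] += 1
    return counts

# Made-up birthplaces and pairwise summaries.
area_of = {"s1": "North", "s2": "North", "s3": "South"}
pairs = [
    ("s1", "s2", 2, 12.0),
    ("s1", "s3", 1, 5.0),
    ("s3", "s2", 1, 5.0),
]
by_length = tally_pairs(pairs, area_of)
by_segments = tally_pairs(pairs, area_of, use_segnum=True)
```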
