twobitreader

twobitreader is a small, fast Python package for reading UCSC .2bit genome files. It supports random access by sequence name and genomic interval, making it useful for pulling slices from large genome files without loading whole chromosomes into memory.

The package reads .2bit files only; it does not write them.

Performance in v4

Version 4 keeps decoding pure Python while reducing startup cost and speeding up common slice paths. The main changes are lazy construction of the large two-byte lookup table, faster N-block lookup with bisect, and decoded sequence buffers backed by plain Python character lists instead of deprecated array('u') buffers.
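
As an illustration of the bisect-based N-block lookup, here is a minimal sketch. It is not the package's actual code: the function name overlapping_n_blocks and the precomputed starts/ends lists are assumptions made for this example. It relies only on the N-blocks being sorted and non-overlapping, so that two binary searches replace a linear scan over every block.

```python
from bisect import bisect_left, bisect_right


def overlapping_n_blocks(starts, ends, lo, hi):
    """Return (start, end) pairs of N-blocks that overlap the half-open range [lo, hi).

    Hypothetical helper: `starts` and `ends` are parallel lists of block
    boundaries (0-based, half-open), sorted ascending and non-overlapping.
    """
    # A block overlaps [lo, hi) when its end is > lo and its start is < hi.
    first = bisect_right(ends, lo)   # index of the first block ending after lo
    last = bisect_left(starts, hi)   # index of the first block starting at or after hi
    return [(starts[i], ends[i]) for i in range(first, last)]


if __name__ == "__main__":
    starts = [10, 100, 1000]
    ends = [20, 150, 1005]
    print(overlapping_n_blocks(starts, ends, 15, 120))
```

With many N-blocks and a short slice, each lookup costs two O(log n) searches plus the handful of blocks that actually overlap, which is the kind of case the "10 bp slice with 50k N-blocks" benchmark exercises.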

Benchmarks below compare v4.0.0 with v3.1.8 on Python 3.14.5, using synthetic 5 Mb .2bit files. The v3.1.9 tag has the same reader implementation as v3.1.8, plus release/CI packaging changes.

Benchmark	v3.1.8	v4.0.0	Change
Cold import time	179.6 ms	35.6 ms	5.0x faster
Peak import memory	14.18 MB	2.22 MB	6.4x less
Plain 1 Mb slice	135.6 ms	17.3 ms	7.8x faster
10 bp slice with 50k N-blocks	0.749 ms	0.0026 ms	290x faster

Installation

Install the latest released package from PyPI:

pip install twobitreader

For local development, clone the repository and install it in editable mode:

git clone https://github.com/benjschiller/twobitreader.git
cd twobitreader
pip install -e ".[dev,docs]"
pre-commit install

Python Usage

Open a .2bit file with TwoBitFile. It behaves like a dictionary whose keys are sequence names and whose values are sliceable sequence objects.

from twobitreader import TwoBitFile

with TwoBitFile("hg19.2bit") as genome:
    print(genome.keys())
    print(genome.sequence_sizes()["chr1"])

    sequence = genome["chr1"][100_000:100_050]
    print(sequence)

Coordinates follow Python and UCSC BED conventions: they are 0-based and half-open, so the end coordinate is excluded. For example, genome["chr1"][10:20] returns the 10 bases at 0-based positions 10 through 19.
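
Genome browsers often display regions as 1-based, inclusive ranges such as chr1:11-20. The sketch below shows one way to convert such a string into the 0-based, half-open coordinates used for slicing. The helper region_to_slice is hypothetical and not part of twobitreader.

```python
def region_to_slice(region):
    """Convert 'chr1:11-20' (1-based, inclusive) to ('chr1', 10, 20) (0-based, half-open).

    Hypothetical helper for illustration; thousands separators are tolerated.
    """
    chrom, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if not chrom or start < 1 or end < start:
        raise ValueError(f"invalid region: {region!r}")
    # Only the start shifts by one; the inclusive end equals the exclusive end.
    return chrom, start - 1, end


if __name__ == "__main__":
    chrom, start, end = region_to_slice("chr1:11-20")
    print(chrom, start, end, end - start)
```

The returned values can be used directly as genome[chrom][start:end]; the slice length is always end - start.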

Converting an entire chromosome to a string works, but can use a lot of memory:

with TwoBitFile("hg19.2bit") as genome:
    chr_m = str(genome["chrM"])
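
When a whole chromosome is too large to hold as one string, it can be processed in fixed-size slices instead. The following is a minimal sketch under stated assumptions: iter_chunks and gc_count are hypothetical helpers, and they only require an object that supports slicing, so they can be tried on a plain string.

```python
def iter_chunks(sequence, length, chunk_size=1_000_000):
    """Yield consecutive slices of `sequence` as strings, each at most `chunk_size` long.

    `length` is passed explicitly so the helper works with any sliceable object.
    """
    for start in range(0, length, chunk_size):
        yield str(sequence[start:min(start + chunk_size, length)])


def gc_count(sequence, length, chunk_size=1_000_000):
    """Count G and C bases (case-insensitive) without building one large string."""
    total = 0
    for chunk in iter_chunks(sequence, length, chunk_size):
        upper = chunk.upper()
        total += upper.count("G") + upper.count("C")
    return total


if __name__ == "__main__":
    demo = "ACGTACGTAC"
    print(list(iter_chunks(demo, len(demo), chunk_size=4)))
    print(gc_count(demo, len(demo), chunk_size=4))
```

With a real file, the length could come from genome.sequence_sizes()["chr1"] and the sequence from genome["chr1"], so that only one chunk is decoded at a time.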

Command-Line Usage

twobitreader can also read BED-style intervals from standard input and write FASTA records to standard output:

python -m twobitreader genome.2bit < regions.bed > regions.fa

Input lines should have at least three whitespace-separated fields:

chrom    start    end
chr1     100000   100050
chr2     250      300

Invalid regions are skipped with warnings written to standard error. Intervals that extend past the end of a sequence are truncated.
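
The skip-and-truncate behavior described above can be sketched as follows. This is an illustration, not the package's own parser: parse_bed_line is a hypothetical helper, the exact validation rules and warning text are assumptions, and sizes stands in for a mapping of sequence names to lengths, such as the one returned by sequence_sizes().

```python
import sys


def parse_bed_line(line, sizes):
    """Return (chrom, start, end) clamped to the sequence length, or None if invalid.

    Hypothetical helper; warnings go to standard error, as in the CLI description.
    """
    fields = line.split()
    if len(fields) < 3:
        print(f"warning: too few fields: {line!r}", file=sys.stderr)
        return None
    chrom, start_text, end_text = fields[:3]
    if chrom not in sizes:
        print(f"warning: unknown sequence: {chrom!r}", file=sys.stderr)
        return None
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        print(f"warning: non-integer coordinates: {line!r}", file=sys.stderr)
        return None
    end = min(end, sizes[chrom])  # truncate intervals that run past the sequence end
    if start < 0 or start >= end:
        print(f"warning: empty or negative interval: {line!r}", file=sys.stderr)
        return None
    return chrom, start, end


if __name__ == "__main__":
    sizes = {"chr1": 1000}
    for line in ["chr1 100 150", "chr1 990 2000", "chrom start end"]:
        print(parse_bed_line(line, sizes))
```

Under these assumed rules, a column-header line such as "chrom start end" is treated like any other invalid region and skipped with a warning.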

Downloading Genomes

The twobitreader.download module can fetch .2bit genomes from UCSC:

python -m twobitreader.download hg19

Please follow UCSC's usage guidelines and avoid excessive automated downloads.

Development

Run the full test suite with:

python3 -m unittest discover -s tests

Run the lightweight package smoke test with:

python3 test_package.py

Build the package with:

python3 -m build

Build the Sphinx documentation with:

sphinx-build -W --keep-going -b html doc doc/_build/html

Run formatting and repository checks with:

pre-commit run --all-files

The Makefile uses python in a few targets. If your environment only provides python3, run the equivalent command directly with python3.

License

twobitreader is licensed under the Perl Artistic License 2.0. See LICENSE.txt and COPYRIGHT for details.

No warranty is provided, express or implied.
