FastaFS lets you mount compressed FASTA archives as a virtual filesystem — enabling instant random access without preprocessing, indexing, or duplication.
Working with large FASTA files is inefficient and error-prone:
- Requires auxiliary files (.fai, .dict)
- Random access needs preprocessing or indexing
- Tools expect flat files, not compressed archives
- Storage is duplicated across pipelines
- Data and metadata can get out of sync
FastaFS solves this by turning compressed FASTA archives into a mountable filesystem.
- ⚡ Near-native performance – optimized C++ backend with minimal overhead
- 🧠 No on-demand preprocessing required – skip indexing and loading into memory
- 📂 Works with existing tools – use grep, awk, samtools, etc.
- 💾 Efficient storage – no duplicate FASTA files or temporary extraction
- 🔌 Mount as a filesystem – interact like regular files
- 🔄 Preserves compatibility – fully compatible with existing FASTA-based workflows and tooling
- 🎯 Selective decompression – only access the regions you need
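From the consumer side, "only access the regions you need" amounts to reading a byte range from a flat FASTA file instead of scanning it. The sketch below illustrates that arithmetic on a tiny demo file that stands in for a mounted FASTA. The file name, the coordinates, and the three `.fai`-style values (offset of the first base, bases per line, bytes per line) are assumptions chosen for the example. It does not show how FastaFS itself decompresses data.

```shell
#!/bin/sh
# Illustration only: read bases 7-10 of chr1 by seeking, not scanning.
# demo.fa stands in for a FASTA file exposed by a mount (hypothetical path).
tmpdir=$(mktemp -d)
fasta="$tmpdir/demo.fa"
printf '>chr1\nACGTACGT\nACGT\n>chr2\nGGCC\n' > "$fasta"

# .fai-style values for chr1 in this demo file.
offset=6       # byte offset of the first base
linebases=8    # bases per line
linewidth=9    # bytes per line, including the newline

start=6        # 0-based, inclusive (base 7)
end=10         # 0-based, exclusive (through base 10)

# Map base coordinates to byte positions, accounting for newlines.
byte_start=$(( offset + (start / linebases) * linewidth + start % linebases ))
byte_end=$(( offset + (end / linebases) * linewidth + end % linebases ))

# dd seeks to byte_start and reads only the bytes covering the region;
# tr removes the newline that falls inside the range.
region=$(dd if="$fasta" bs=1 skip="$byte_start" count=$(( byte_end - byte_start )) 2>/dev/null | tr -d '\n')

printf '%s\n' "$region"
rm -rf "$tmpdir"
```

Run as written, this prints `GTAC`. Tools such as samtools do the same offset calculation internally, which is why they work on any path that behaves like a regular, seekable file.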
# Clone
git clone https://github.com/yhoogstrate/fastafs.git
cd fastafs
# Build
./build-release.sh
make check
# Cache + mount
./fastafs cache reference ./reference.fa
./fastafs mount reference /mnt/genome
# Use like normal files
ls /mnt/genome
head /mnt/genome/chr1.fa
FastaFS introduces a virtual filesystem layer using FUSE.
When mounted:
- FASTA and metadata files are generated on-the-fly
- Only requested regions are decompressed
- .fa, .fai, .dict, and .2bit stay perfectly in sync
➡️ No temporary files. No duplication. No indexing overhead.
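To make the generated metadata concrete: a `.fai` index in the samtools faidx layout holds one tab-separated row per sequence with the name, the length in bases, the byte offset of the first base, the bases per line, and the bytes per line. The sketch below derives those columns with awk for a small demo file. It assumes Unix line endings and a uniform line width within each sequence. It only illustrates what the index contains, not how FastaFS produces it.

```shell
#!/bin/sh
# Illustration only: build a tiny FASTA and derive .fai-style columns with awk.
# Assumes Unix line endings and uniform line width within each sequence.
tmpdir=$(mktemp -d)
fasta="$tmpdir/demo.fa"
printf '>chr1\nACGTACGT\nACGT\n>chr2\nGGCC\n' > "$fasta"

fai=$(awk '
  /^>/ {
    # A new header closes the previous record, if there was one.
    if (name != "") print name "\t" len "\t" start "\t" bases "\t" bytes
    name = substr($1, 2); len = 0; bases = 0
    pos += length($0) + 1      # header line plus its newline
    start = pos                # byte offset of the first base
    next
  }
  {
    # The first sequence line fixes the line width for this record.
    if (bases == 0) { bases = length($0); bytes = bases + 1 }
    len += length($0)
    pos += length($0) + 1
  }
  END { if (name != "") print name "\t" len "\t" start "\t" bases "\t" bytes }
' "$fasta")

printf '%s\n' "$fai"
rm -rf "$tmpdir"
```

For the demo file this yields the rows `chr1 12 6 8 9` and `chr2 4 26 4 5` (tab-separated). Because every column follows directly from the sequence data, an index generated at read time cannot drift from the FASTA it describes.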
- Large-scale genomics pipelines
- HPC environments with limited I/O bandwidth
- Streaming access to reference genomes
- Toolchains requiring standard FASTA input
- Reproducible workflows
https://github.com/yhoogstrate/fastafs/blob/master/doc/FASTAFS-FORMAT-SPECIFICATION.md
https://bio.tools/fastafs
https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md
- libboost (unit testing)
- libopenssl / libssl
- libfuse
- zlib / libzstd
- C++ compiler (C++14+)
- cmake or meson + ninja
sudo apt install git build-essential cmake libboost-dev libssl-dev \
libboost-test-dev libboost-system-dev libboost-filesystem-dev \
zlib1g-dev libzstd-dev libfuse-dev
git clone https://github.com/yhoogstrate/fastafs.git
cd fastafs
sudo yum install git cmake gcc-c++ boost-devel openssl-devel \
libzstd-devel zlib-devel fuse-devel
git clone https://github.com/yhoogstrate/fastafs.git
cd fastafs
cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/usr/local .
make -j $(nproc)
sudo make install
Without root:
cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=~/.local .
make -j $(nproc)
make install
Cache a FASTA file:
fastafs cache test ./test.fa
Or from 2bit:
fastafs cache test ./test.2bit
List cached archives:
fastafs list
Mount an archive:
fastafs mount hg19 /mnt/fastafs/hg19
ls /mnt/fastafs/hg19
Show archive information:
fastafs info
List active mounts:
fastafs ps
Example /etc/fstab entry:
mount.fastafs#/path/to/file.fastafs /mnt/fastafs fuse auto,allow_other 0 0
If you use FastaFS in your research, please cite:
Hoogstrate, Y., Jenster, G.W. & van de Werken, H.J.G.
FASTAFS: file system virtualisation of random access compressed FASTA files.
BMC Bioinformatics 22, 535 (2021).
https://doi.org/10.1186/s12859-021-04455-3
Contributions are welcome!
- Open an issue
- Submit a pull request
Format code with:
make tidy
FastaFS does not replace FASTA or TwoBit; it enhances them by making them easier to use, more efficient, and seamlessly integrated into existing workflows.