Description
One problem I have with the BAM/SAM format is that the reference is not kept with the read file. You allude to this in your paper. Keeping old references around, especially for poorly characterized organisms, is problematic. For example, stating 'HG18' as your reference is fairly safe because that human reference set is likely to be around for the next 10 years or so. However, stating 'genome.fa' (as Illumina does for its references) is too generic. Likewise, stating 'Lycopersicon esculentum v0.1' specifies a reference that is unlikely to be around for more than a year.
Since you are developing the quip format, an option (not a requirement) to embed the reference in the quip file would be useful to those of us working with little-known genomes. I suspect that you could compress the reference very nicely.
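
As a rough illustration of why a reference should compress well, here is a minimal sketch of plain 2-bit packing, which already stores four bases per byte. This is not how quip works; the function names `pack_2bit` and `unpack_2bit` are made up for this example, and it assumes the sequence contains only A, C, G and T (no N or other ambiguity codes).

```python
# Hypothetical sketch: 2-bit packing of an ACGT-only sequence.
# Not quip's actual method; names and layout are assumptions.

CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a sequence of A/C/G/T into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Raises KeyError on anything other than A, C, G, T.
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Recover the first `length` bases from data produced by pack_2bit."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


if __name__ == "__main__":
    seq = "ACGTTGCAAC"
    packed = pack_2bit(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack_2bit(packed, len(seq)) == seq)
```

The sequence length has to be stored alongside the packed bytes, since the last byte may be only partly used. A real format would also need to handle N runs and soft-masked (lowercase) regions.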