Description
muc_one_up/read_simulator/utils/samtools.py:downsample_bam() (lines 81-124) uses the deprecated INT.FRAC format for samtools view -s:
fraction_str = str(fraction).lstrip("0.")
downsample_param = f"{seed}.{fraction_str}"
# For fraction=0.75: produces "-s 42.75"
In samtools 1.22+, -s 42.75 is interpreted as probability = 42.75 (>1.0), which keeps all reads instead of downsampling. The old INT.FRAC format (seed.fraction) is no longer supported.
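To make the parameter construction concrete, here is a minimal sketch that reproduces the string built by the snippet above. The function name `legacy_downsample_param` is hypothetical; only the two lines in its body come from `downsample_bam()`.

```python
def legacy_downsample_param(seed: int, fraction: float) -> str:
    """Rebuild the SEED.FRAC string exactly as the quoted code does."""
    # str.lstrip("0.") strips every leading '0' and '.' character,
    # not the literal prefix "0.".
    fraction_str = str(fraction).lstrip("0.")
    return f"{seed}.{fraction_str}"


for fraction in (0.75, 0.5, 0.05):
    print(fraction, "->", legacy_downsample_param(42, fraction))
# 0.75 -> 42.75
# 0.5 -> 42.5
# 0.05 -> 42.5
```

Every value produced this way is greater than 1.0 whenever the seed is non-zero. The sketch also suggests that `lstrip("0.")` maps 0.5 and 0.05 to the same string, which would be a separate problem regardless of the samtools version.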
Impact
Every downsampling operation silently produces a copy of the input BAM that still contains all reads. This was discovered during the manuscript benchmark experiments (experiment 3: coverage titration), where all four coverage fractions produced identical VNtyper results.
Reproduction
samtools --version # 1.22.1
samtools view -c -s 42.7500 input.bam # Returns ALL reads (42.75 > 1.0)
samtools view -c -s 0.75 input.bam # Returns ~75% (correct)
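Because the failure is silent, a read-count comparison like the one in the reproduction could be automated. The following is a rough sketch, not part of MucOneUp: the helper names `count_reads` and `downsample_ratio_ok` and the 10% relative tolerance are assumptions chosen for illustration. The kept fraction is only approximate, so the tolerance may need tuning for small inputs.

```python
import subprocess


def count_reads(bam_path: str) -> int:
    """Count alignments with `samtools view -c` (samtools must be on PATH)."""
    result = subprocess.run(
        ["samtools", "view", "-c", bam_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def downsample_ratio_ok(
    n_input: int, n_output: int, fraction: float, rel_tol: float = 0.1
) -> bool:
    """Return True if the observed kept fraction is close to the requested one."""
    if n_input <= 0:
        raise ValueError("n_input must be positive")
    observed = n_output / n_input
    # rel_tol is an assumed tolerance, relative to the requested fraction.
    return abs(observed - fraction) <= rel_tol * fraction
```

For example, `downsample_ratio_ok(count_reads("input.bam"), count_reads("output.bam"), 0.75)` would return False for the behavior described here, since the output contains all input reads.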
Fix
Replace -s SEED.FRAC with plain -s FRAC:
cmd = ["samtools", "view", "-s", str(fraction), "-b", str(input_bam), "-o", str(output_bam)]
Or use --subsample / --subsample-seed flags (but note --subsample has a bug with -b -o in samtools 1.22).
Environment
- samtools 1.22.1
- Python 3.12
- MucOneUp 0.44.4