Bug: samtools downsample_bam uses deprecated INT.FRAC format (samtools 1.22+ incompatible) #97

@berntpopp

Description

muc_one_up/read_simulator/utils/samtools.py:downsample_bam() (lines 81-124) passes the subsampling fraction to samtools view -s in the deprecated INT.FRAC (seed.fraction) format:

fraction_str = str(fraction).lstrip("0.")
downsample_param = f"{seed}.{fraction_str}"
# For fraction=0.75: produces "-s 42.75"
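
To see exactly which -s values the current code sends to samtools, the argument construction can be reproduced on its own. The wrapper name legacy_downsample_param is hypothetical; its two body lines are copied from the snippet above. Separately from the samtools version problem, str.lstrip("0.") removes every leading "0" and "." character rather than the literal prefix "0.". A fraction of 0.05 therefore becomes "42.5", which even the old INT.FRAC reading would treat as 0.5.

```python
def legacy_downsample_param(seed: int, fraction: float) -> str:
    """Hypothetical wrapper around the two lines quoted from downsample_bam()."""
    # lstrip("0.") strips any run of leading '0' and '.' characters,
    # so "0.05" also loses the zero after the decimal point.
    fraction_str = str(fraction).lstrip("0.")
    return f"{seed}.{fraction_str}"


for frac in (0.75, 0.5, 0.05):
    print(f"fraction={frac} -> -s {legacy_downsample_param(42, frac)}")
# fraction=0.75 -> -s 42.75
# fraction=0.5 -> -s 42.5
# fraction=0.05 -> -s 42.5
```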

In samtools 1.22+, -s 42.75 is interpreted as a probability of 42.75 (> 1.0), so every read is kept and no downsampling occurs. The old INT.FRAC (seed.fraction) format is no longer supported.

Impact

All downsampling operations silently produce identical copies of the input BAM. This was discovered during the manuscript benchmark experiments (experiment 3: coverage titration), in which all four coverage fractions produced identical VNtyper results.

Reproduction

samtools --version  # 1.22.1
samtools view -c -s 42.7500 input.bam   # Returns ALL reads (42.75 > 1.0)
samtools view -c -s 0.75 input.bam      # Returns ~75% (correct)
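
The silent failure described under Impact could be caught by counting reads before and after subsampling and flagging any result that is not close to the requested fraction. This is a sketch: count_reads, looks_downsampled, the 10% tolerance and the input.bam path are illustrative assumptions. The comparison is only approximate, because -s samples templates/pairs while -c counts individual records.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def count_reads(bam: Path, subsample: Optional[str] = None) -> int:
    """Return the `samtools view -c` record count, optionally with `-s <subsample>`."""
    cmd = ["samtools", "view", "-c"]
    if subsample is not None:
        cmd += ["-s", subsample]
    cmd.append(str(bam))
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return int(result.stdout.strip())


def looks_downsampled(total: int, kept: int, fraction: float, rel_tol: float = 0.1) -> bool:
    """True if `kept` lies within `rel_tol` (relative) of fraction * total."""
    if total == 0:
        return kept == 0
    expected = fraction * total
    return abs(kept - expected) <= rel_tol * expected


if __name__ == "__main__":
    bam = Path("input.bam")  # placeholder path, as in the reproduction commands
    # Only run the comparison when samtools and the BAM are actually available.
    if shutil.which("samtools") and bam.exists():
        total = count_reads(bam)
        for arg in ("42.7500", "0.75"):
            kept = count_reads(bam, arg)
            ok = looks_downsampled(total, kept, 0.75)
            print(f"-s {arg}: kept {kept}/{total}, downsampled={ok}")
```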

Fix

Replace -s SEED.FRAC with plain -s FRAC:

cmd = ["samtools", "view", "-s", str(fraction), "-b", str(input_bam), "-o", str(output_bam)]
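
A slightly fuller version of this fix might also validate the fraction, since an out-of-range value would otherwise reach samtools unchecked. The sketch keeps the command shown above; build_downsample_cmd and the simplified downsample_bam signature are hypothetical and do not reproduce the real function in samtools.py. As in the one-line fix, no seed is passed, so a seed argument would no longer affect which reads are kept.

```python
import subprocess
from pathlib import Path


def build_downsample_cmd(input_bam: Path, output_bam: Path, fraction: float) -> list[str]:
    """Build `samtools view -s FRAC -b` with a plain probability (no SEED. prefix)."""
    if not 0.0 < fraction <= 1.0:
        # Per this issue, values > 1.0 make samtools keep every read, so fail loudly.
        raise ValueError(f"fraction must be in (0, 1], got {fraction!r}")
    return [
        "samtools", "view",
        "-s", str(fraction),  # e.g. "0.75", not "42.75"
        "-b",
        str(input_bam),
        "-o", str(output_bam),
    ]


def downsample_bam(input_bam: Path, output_bam: Path, fraction: float) -> Path:
    """Hypothetical simplified wrapper: run samtools and return the output path."""
    subprocess.run(build_downsample_cmd(input_bam, output_bam, fraction), check=True)
    return output_bam
```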

Alternatively, use the --subsample / --subsample-seed flags (but note that --subsample has a bug when combined with -b -o in samtools 1.22).

Environment

  • samtools 1.22.1
  • Python 3.12
  • MucOneUp 0.44.4
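
Because the reported behaviour depends on the installed samtools version, recording that version with each benchmark run could make similar regressions easier to trace. A minimal sketch, assuming the first line of samtools --version has the form "samtools 1.22.1"; the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_samtools_version(text: str) -> Tuple[int, ...]:
    """Parse a version tuple from `samtools --version` output, e.g. 'samtools 1.22.1'."""
    first_line = text.splitlines()[0] if text.strip() else ""
    match = re.match(r"samtools\s+(\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError(f"unrecognised samtools version line: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_samtools_version() -> Optional[Tuple[int, ...]]:
    """Return the installed samtools version, or None if samtools is not on PATH."""
    if shutil.which("samtools") is None:
        return None
    result = subprocess.run(["samtools", "--version"], check=True, capture_output=True, text=True)
    return parse_samtools_version(result.stdout)


if __name__ == "__main__":
    version = installed_samtools_version()
    # Tuple comparison: (1, 22, 1) >= (1, 22) is True.
    if version is not None and version >= (1, 22):
        print(f"samtools {'.'.join(map(str, version))}: INT.FRAC -s values are affected")
```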
