Hello,
I've noticed a change in the log output format of filter_remapped_reads.py between two WASP runs. The first run used a git clone from Aug 2019 and the second used a git clone from 2023, so both runs should have used the last WASP release, from April 2019. The old format was:
keep_reads: 53469484
bad_reads: 74236
discard_reads: 5649648 (of which 3980136 remapped with a different cigar)
The new format is:
[E::idx_find_and_load] Could not retrieve index file for '151_Hepatocytes_sorted.remap_sorted.bam'
keep reads: 10669652
bad reads: 88428
not present reads: 177570
discard reads: 39159
CIGAR mismatch 31472
CIGAR missing or multiple: 7687
mate pair missing: 0
Could you help me with three questions to make sure that WASP is running correctly:
- Should WASP still behave as expected even though it reported an error about not finding the BAM index in the second run? I did not index the BAM file in the first run either, and did not get this error.
- What caused the difference in the log file format? I noticed there were two commits to the filter_remapped_reads.py script in 2021. Was the format changed then?
- The percentage of reads discarded relative to the total remapped reads is much lower in the second run. Is this more likely due to different properties of the two datasets? The first dataset looked at haplotypes from over 100 individuals, and the second dataset only had a few dozen. Is WASP more likely to discard reads during remapping if there are more individuals in the dataset?
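To make the comparison in the last question concrete, here is a minimal sketch of the discard percentages for both runs. It assumes the denominator is the sum of every top-level category printed in each log (including "not present reads" in the new format), which may not be how WASP itself defines the total. It also assumes the three indented-looking counts in the new format (CIGAR mismatch, CIGAR missing or multiple, mate pair missing) are a breakdown of "discard reads".

```python
# Counts copied from the two log excerpts above.
old_run = {"keep": 53469484, "bad": 74236, "discard": 5649648}
new_run = {"keep": 10669652, "bad": 88428, "not_present": 177570, "discard": 39159}


def discard_fraction(counts):
    # Assumption: total remapped reads = sum of all top-level categories in the log.
    total = sum(counts.values())
    return counts["discard"] / total


old_frac = discard_fraction(old_run)
new_frac = discard_fraction(new_run)
print(f"old run: {old_frac:.2%} of reads discarded")
print(f"new run: {new_frac:.2%} of reads discarded")

# Assumption: these three lines in the new log are sub-counts of "discard reads".
new_breakdown = {"cigar_mismatch": 31472, "cigar_missing_or_multiple": 7687, "mate_pair_missing": 0}
breakdown_matches = sum(new_breakdown.values()) == new_run["discard"]
print("breakdown sums to discard reads:", breakdown_matches)
```

Under these assumptions the old run discards roughly 9.5% of reads and the new run roughly 0.36%, and the three sub-counts in the new log add up exactly to its "discard reads" total.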
Thanks!
Kevin