Bug
While testing edge cases for SAM file ingestion, I discovered that SamParser::ParseLine silently accepts non-numeric string values in fields that strictly require integers (FLAG, POS, MAPQ, PNEXT, and TLEN). Instead of throwing a validation error, the parser converts the string to 0 and writes the corrupted record to the ROOT file, causing silent downstream data corruption.
To Reproduce
- Create a corrupted SAM record with text in the FLAG and POS columns:
echo -e "read_name\tBROKEN_FLAG\tchr1\tBROKEN_POS\t60\t100M\t=\t1200\t200\tATGC\tIIII" > type_crash.sam
- Run the converter:
./tools/samtoramntuple type_crash.sam output.root
Current Behavior
The parser outputs Processed 1 SAM records and successfully creates output.root. The corrupted string values are silently cast to 0.
Expected behavior
The parser should throw a validation error and safely abort or reject the record, preventing the creation of a corrupted RNTuple.
Root Cause
In src/ramcore/SamParser.cxx, the parser uses the C-style atoi() function, which has no error reporting: it returns 0 when the string cannot be parsed, so a malformed field is indistinguishable from a genuine 0.
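To illustrate the failure mode in isolation, here is a minimal sketch. The helper name parse_field_with_atoi is hypothetical and only mimics the kind of conversion described above; it is not copied from SamParser.cxx.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the RAM code base): converts a SAM column
// the way a plain atoi() call would, with no way to report a failure.
int parse_field_with_atoi(const std::string &field)
{
   // atoi() returns 0 when no conversion can be performed, and it stops
   // quietly at the first non-digit character otherwise.
   return std::atoi(field.c_str());
}
```

With this helper, parse_field_with_atoi("BROKEN_FLAG") and parse_field_with_atoi("0") both return 0, and parse_field_with_atoi("12abc") returns 12, so the caller cannot tell a valid value from a malformed one.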
Proposed Solution
I have already tested a local fix that replaces atoi() with a strict C++ wrapper class around std::stoi. The wrapper catches std::invalid_argument and halts execution to preserve data integrity.
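As a rough sketch of what such a strict conversion could look like (this is not the actual patch; the function name parse_int_strict and its error messages are assumptions made for illustration), the code below wraps std::stoi and additionally rejects trailing characters and out-of-range values, which std::stoi alone would either accept or report through a different exception:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: the name, signature, and messages are illustrative,
// not taken from the RAM code base.
// Note: std::stoi skips leading whitespace, so " 42" is still accepted here.
int parse_int_strict(const std::string &field, const std::string &name)
{
   std::size_t consumed = 0;
   int value = 0;
   try {
      value = std::stoi(field, &consumed);
   } catch (const std::invalid_argument &) {
      // No digits at all, e.g. "BROKEN_FLAG" or an empty string.
      throw std::runtime_error("SAM field " + name + " is not an integer: '" + field + "'");
   } catch (const std::out_of_range &) {
      // Value does not fit in an int.
      throw std::runtime_error("SAM field " + name + " is out of range: '" + field + "'");
   }
   // std::stoi stops at the first non-digit, so "12abc" would otherwise pass as 12.
   if (consumed != field.size()) {
      throw std::runtime_error("SAM field " + name + " has trailing characters: '" + field + "'");
   }
   return value;
}
```

A caller could then decide whether to abort the whole conversion or skip the offending record when the std::runtime_error is thrown, for example by catching it around the per-line parsing step.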