Simplify reading and writing of HSPs in GECKO #4

@estebanpw

Description

GECKO uses two functions, readFragment and writeFragment, to write all HSPs to disk and to read them back, e.g. to convert them to csv. Both functions check whether the machine they run on is little or big endian so that the binary files stay compatible across machines. However, HSPs are usually post-processed on the same machine that produced them (or the csv is used instead of the binary frags file), so this cross-compatibility should be disabled by default, both to reduce computing time and to make the code easier to follow. A script/program that converts between little and big endian must still be added for users who generate the frags file on one machine and want to post-process it on another.

Background

The binary frags file contains (in order):

  1. An 8-byte section that encodes a uint64_t holding the length of the query.
  2. An 8-byte section that encodes another uint64_t holding the length of the reference.
  3. A sequence of struct FragFile records written one after another.
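
To make the layout concrete, here is a minimal sketch that walks a frags file in that order and counts its records. The names summarizeFrags and struct FragsSummary are hypothetical, the on-disk record size is passed in by the caller rather than taken from GECKO's headers, and the file is assumed to have been written with the same endianness as the reading machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper type (not part of GECKO): the two header lengths
   plus the number of records that follow them. */
struct FragsSummary {
    uint64_t query_length;
    uint64_t reference_length;
    uint64_t n_records;
};

/* Walks the file in the documented order: query length, reference length,
   then fixed-size records. record_size is the on-disk size of one
   struct FragFile record and is an assumption supplied by the caller.
   Returns 0 on success, -1 on a bad record size, a short header, an
   allocation failure, or a trailing partial record. */
int summarizeFrags(FILE *f, size_t record_size, struct FragsSummary *out)
{
    assert(f != NULL && out != NULL);
    if (record_size == 0) return -1;

    /* Items 1 and 2: the two 8-byte lengths. */
    if (fread(&out->query_length, sizeof(uint64_t), 1, f) != 1) return -1;
    if (fread(&out->reference_length, sizeof(uint64_t), 1, f) != 1) return -1;

    /* Item 3: records until end of file. */
    unsigned char *buf = malloc(record_size);
    if (buf == NULL) return -1;

    out->n_records = 0;
    size_t got;
    while ((got = fread(buf, 1, record_size, f)) == record_size)
        out->n_records++;

    free(buf);
    return got == 0 ? 0 : -1;  /* leftover bytes mean a truncated record */
}
```

A caller would open the frags file in binary mode ("rb") and pass the record size that matches how the fragments were written.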

TODO

Action 1: Add new functions

Add the following functions to comparisonFunctions.c and their prototypes to comparisonFunctions.h.

  1. Add a void writeFragmentRaw(struct FragFile *frag, FILE *f) function that writes the HSP contained in frag to the file f without little/big endian conversion.
  2. Add a void readFragmentRaw(struct FragFile *frag, FILE *f) function that reads an HSP from file f and stores it into frag without little/big endian conversion.
  3. Add a void readSequenceLengthRaw(uint64_t *length, FILE *f) function that reads 8 bytes from file f and stores them in the variable that length points to.
  4. Add a void writeSequenceLengthRaw(uint64_t *length, FILE *f) function that writes the 8 bytes that length points to into file f.
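
One possible shape for the four functions is sketched below. The signatures are the ones requested above; everything else is an assumption. In particular, the struct FragFile shown here is a placeholder with made-up fields (the real definition lives in GECKO's headers), and failIO is a hypothetical error helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder layout for illustration only: the real struct FragFile is
   defined in GECKO's headers and is not reproduced here. */
struct FragFile {
    uint64_t xStart, yStart, length;
    char strand;
};

/* Hypothetical helper: report an I/O failure and stop. */
static void failIO(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE);
}

/* Writes the HSP as it sits in memory, with no endianness conversion. */
void writeFragmentRaw(struct FragFile *frag, FILE *f)
{
    assert(frag != NULL && f != NULL);
    if (fwrite(frag, sizeof(struct FragFile), 1, f) != 1)
        failIO("Could not write fragment");
}

/* Reads one HSP with no endianness conversion. Hitting end of file is not
   treated as an error, so callers can loop and test feof(f). */
void readFragmentRaw(struct FragFile *frag, FILE *f)
{
    assert(frag != NULL && f != NULL);
    if (fread(frag, sizeof(struct FragFile), 1, f) != 1 && !feof(f))
        failIO("Could not read fragment");
}

/* Reads the 8-byte sequence length into *length. */
void readSequenceLengthRaw(uint64_t *length, FILE *f)
{
    assert(length != NULL && f != NULL);
    if (fread(length, sizeof(uint64_t), 1, f) != 1)
        failIO("Could not read sequence length");
}

/* Writes the 8 bytes that length points to. */
void writeSequenceLengthRaw(uint64_t *length, FILE *f)
{
    assert(length != NULL && f != NULL);
    if (fwrite(length, sizeof(uint64_t), 1, f) != 1)
        failIO("Could not write sequence length");
}
```

Note that writing the whole struct in one fwrite call also writes any compiler padding, so each record occupies sizeof(struct FragFile) bytes on disk. If the raw files need to keep the same record size as the current format, a field-by-field variant may be preferable.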

Action 2: Make use of the new functions throughout the code

Replace the old functions readFragment, writeFragment, readSequenceLength and writeSequenceLength with the new ones in the following files / functions:

  1. combineFrags.c
  2. filterFrags.c
  3. fragmentv2.c
  4. frags2text.c
  5. fragStat.c
  6. FragHits.c

Action 3: Add a new program that converts between little and big endian

Include a new program that converts from little to big endian and vice versa. The program should be called transformEndianness.c, receive one binary file as input, and output the same file converted to the opposite endianness, taking into account the size of each of the structures written.
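
The core of such a converter could look like the sketch below, which reverses the byte order field by field. The function names (reverseBytes, convertField, convertHeader) are hypothetical, and only the 16-byte header is converted here; the fields of struct FragFile are not listed in this issue, so handling them is left as the same call repeated once per field with that field's size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reverses the bytes of one field in place; works for any field size. */
static void reverseBytes(void *field, size_t size)
{
    assert(field != NULL);
    unsigned char *p = field;
    for (size_t i = 0; i < size / 2; i++) {
        unsigned char tmp = p[i];
        p[i] = p[size - 1 - i];
        p[size - 1 - i] = tmp;
    }
}

/* Copies one field of `size` bytes from in to out with its byte order
   reversed. Returns 1 on success, 0 on end of file, I/O error, or a field
   larger than the local buffer. */
static int convertField(FILE *in, FILE *out, size_t size)
{
    unsigned char buf[16];  /* assumed upper bound on a single field */
    if (size == 0 || size > sizeof buf) return 0;
    if (fread(buf, 1, size, in) != size) return 0;
    reverseBytes(buf, size);
    return fwrite(buf, 1, size, out) == size;
}

/* Converts the header: the query length followed by the reference length,
   both uint64_t. Returns 1 on success, 0 otherwise. */
int convertHeader(FILE *in, FILE *out)
{
    return convertField(in, out, sizeof(uint64_t))
        && convertField(in, out, sizeof(uint64_t));
}
```

Each multi-byte field has to be reversed on its own; reversing a whole record at once would also reorder the fields. Single-byte fields can be copied unchanged, and any padding written between fields would need to be copied through rather than swapped.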
