5 Step 4: Block filtering
Parameters: min_majorblock, min_majorblock_steps, min_similarity, save_allblock, consider_all, merge_closeblock, max_diff_i, max_diff_l, off_lines, weighting_length, weighting_size
The main filtering step counts, for each block, the number of positions in which it is the major block of the dataset, and compares this count against a minimum. This minimum is set via min_majorblock and should be chosen to balance the number of blocks against the coverage of the block library. To obtain a haplotype library with a specific coverage, we recommend using the parameter target_coverage, which starts an automatic fitting procedure to determine a good choice for min_majorblock. The fitting is controlled by max_iteration (the number of iterations), min_step_size (the minimal difference in min_majorblock per step), and target_stop (the maximum accepted difference from the target).
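The fitting idea can be illustrated with a minimal Python sketch. This is not HaploBlocker's implementation: the halving search, the function name fit_min_majorblock, the callable coverage_at, and all numeric values are assumptions made for illustration. It only assumes that coverage does not increase when min_majorblock grows.

```python
def fit_min_majorblock(coverage_at, target_coverage, start,
                       max_iteration, min_step_size, target_stop):
    """Search for a min_majorblock whose coverage is close to target_coverage.

    coverage_at is a hypothetical callable that maps a min_majorblock value
    to the coverage (between 0 and 1) of the resulting block library.
    """
    value = start
    step = start / 2
    for _ in range(max_iteration):
        coverage = coverage_at(value)
        # Stop once the coverage is within target_stop of the target.
        if abs(coverage - target_coverage) <= target_stop:
            break
        if coverage < target_coverage:
            # Too little coverage: lower the threshold to keep more blocks.
            value = max(1, value - step)
        else:
            # More coverage than needed: raise the threshold.
            value = value + step
        # Halve the step, but never go below min_step_size.
        step = max(min_step_size, step / 2)
    return value


# Toy coverage curve (made up): coverage falls linearly with the threshold.
def toy_coverage(min_majorblock):
    return max(0.0, 1.0 - min_majorblock / 10000)


fitted = fit_min_majorblock(toy_coverage, target_coverage=0.9, start=5000,
                            max_iteration=20, min_step_size=25,
                            target_stop=0.001)
print(fitted, toy_coverage(fitted))
```

If max_iteration is reached before the stopping criterion is met, the sketch simply returns the last candidate value.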
Which block is the major block in each position depends on a weighting between the length of a block and the number of haplotypes in it. This weighting is set via the parameters weighting_length and weighting_size.
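A minimal Python sketch of how such a weighting could decide the major block and feed the min_majorblock filter. Several things here are assumptions for illustration and not taken from the package: a "position" is treated as one (haplotype, window) cell, the rating is taken to be length^weighting_length * size^weighting_size, and blocks are plain dictionaries with hypothetical keys start, end and haplotypes.

```python
from collections import Counter


def rating(block, weighting_length=1.0, weighting_size=1.0):
    """Assumed rating: block length and haplotype count, each raised to its weight."""
    length = block["end"] - block["start"] + 1
    size = len(block["haplotypes"])
    return length ** weighting_length * size ** weighting_size


def major_block_counts(blocks, weighting_length=1.0, weighting_size=1.0):
    """Count, per block, the (haplotype, window) cells in which it is the major block."""
    best = {}  # cell -> (rating, index of the best block seen so far)
    for idx, block in enumerate(blocks):
        r = rating(block, weighting_length, weighting_size)
        for hap in block["haplotypes"]:
            for window in range(block["start"], block["end"] + 1):
                cell = (hap, window)
                # On ties the block listed first stays the major block.
                if cell not in best or r > best[cell][0]:
                    best[cell] = (r, idx)
    counts = Counter(idx for _, idx in best.values())
    return [counts.get(i, 0) for i in range(len(blocks))]


def filter_blocks(blocks, min_majorblock, **weights):
    """Keep only blocks that are the major block in at least min_majorblock cells."""
    counts = major_block_counts(blocks, **weights)
    return [b for b, c in zip(blocks, counts) if c >= min_majorblock]


# A long block with few haplotypes and a short block with many haplotypes.
long_block = {"start": 0, "end": 9, "haplotypes": {0, 1}}
short_block = {"start": 0, "end": 3, "haplotypes": {0, 1, 2, 3, 4, 5}}
blocks = [long_block, short_block]

print(major_block_counts(blocks))                        # [12, 24]
print(major_block_counts(blocks, weighting_length=2.0))  # [20, 16]
```

In this toy example, putting more weight on length makes the long block the major block in the cells where the two blocks overlap, which changes which block survives a given min_majorblock.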
To avoid excluding important blocks, the minimum number is increased slowly (in min_majorblock_steps linear steps). The minimum similarity a haplotype must have with a block to be included can be set by the parameter min_similarity. This also controls the minimum similarity between two haplotypes of the same block. Haplotypes that do not fulfil min_similarity but are in all nodes used to identify the block are not removed unless the parameter save_allblock is set to FALSE.
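The min_similarity check can be pictured with a short Python sketch. It assumes, purely for illustration, that similarity is the share of positions at which a haplotype matches the sequence of the block; the function names and the toy sequences are hypothetical, and the save_allblock exception is not modelled.

```python
def similarity(haplotype, block_sequence):
    """Share of positions at which the haplotype matches the block sequence."""
    matches = sum(a == b for a, b in zip(haplotype, block_sequence))
    return matches / len(block_sequence)


def haplotypes_in_block(haplotypes, block_sequence, min_similarity):
    """Names of the haplotypes that are similar enough to be part of the block."""
    return [name for name, sequence in haplotypes.items()
            if similarity(sequence, block_sequence) >= min_similarity]


block_sequence = "AACCGGTTAA"
haplotypes = {
    "h1": "AACCGGTTAA",  # identical to the block
    "h2": "AACCGGTTAC",  # one mismatch
    "h3": "TTCCGGTTAC",  # three mismatches
}

print(haplotypes_in_block(haplotypes, block_sequence, min_similarity=0.9))
# ['h1', 'h2']
```

Because every included haplotype differs from the block sequence in at most a share of 1 - min_similarity of the positions, two haplotypes of the same block can differ from each other in at most twice that share under this assumed measure.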
Additionally, there are some minor parameters in the filtering process. To ignore haplotypes that were not originally in the block, set consider_all to FALSE. To allow blocks with similar haplotypes and location to be merged, activate merge_closeblock and set the maximum differences between them via max_diff_i (different haplotypes) and max_diff_l (differences between both). The minimum number of additional haplotypes a block must have compared to another block with the same sequence of windows can be set via off_lines.