chrom1, chrom2 and pair_type fields are now required in pairs file header #264

@js2264

Description

  • Up to v1.0.3, pairtools sort accepted a header line listing the column names chr1 and chr2 (as indicated in the official 4DN specs).
  • Starting with v1.1.0, pairtools sort expects the #columns header line to list chrom1 and chrom2, and fails if the header line is #columns: readID chr1 pos1 chr2 pos2 strand1 strand2.
  • It also seems to require pair_type to be listed in the #columns header line, as well as to be present as a column.

I understand that the chr1/chr2 mismatch can be worked around by passing the --c1 and --c2 options on the command line, but if a pair_type column is not included, pairtools sort cannot work at all. Is this intended behavior? Sorry if I missed something or if this issue has already been raised.
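To make the failure mode concrete, here is a minimal Python sketch of the lookup that raises in the tracebacks quoted in the reproducible example. The one-line index expression is copied from the traceback (pairtools/cli/sort.py, line 199); everything else is hypothetical, including the helper names and the list of default column names, which is only inferred from the error messages and not taken from the pairtools source.

```python
# Assumption: these are the names `pairtools sort` looks up by default.
# The list is inferred from the errors in this issue, not from the source.
ASSUMED_DEFAULTS = ["chrom1", "chrom2", "pos1", "pos2", "pair_type"]


def resolve_column(col, column_names):
    """Return a 1-based column index, mirroring the line in the traceback."""
    return int(col) if col.isnumeric() else column_names.index(col) + 1


def missing_columns(columns_line, wanted=ASSUMED_DEFAULTS):
    """List the wanted names that a '#columns:' header line does not declare."""
    column_names = columns_line.split()[1:]  # drop the '#columns:' token
    missing = []
    for col in wanted:
        try:
            resolve_column(col, column_names)
        except ValueError:  # list.index() raises when the name is absent
            missing.append(col)
    return missing


header_4dn = "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2"
header_new = "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type"

print(missing_columns(header_4dn))  # ['chrom1', 'chrom2', 'pair_type']
print(missing_columns(header_new))  # []
```

Because a name lookup by `list.index()` has no fallback, any header that does not spell the expected names exactly ends in a ValueError rather than a readable error message.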

Reproducible example

  1. Here is an unsorted pairs file I created by hand, with chr1/chr2 in header:
echo -e "## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
NS500150:497:HWH2WBGXC:4:23605:21900:3336       NODE_1404       461     NODE_1404       246     --
NS500150:497:HWH2WBGXC:4:23606:10802:17906      NODE_1404       1441    NODE_1814       4433    --
NS500150:497:HWH2WBGXC:4:23603:4102:4882        NODE_522        6855    NODE_1404       1035    --

This fails:

pip install pairtools==1.1.1   ## pairtools 1.1.0 errors with `circular import` 
pairtools sort tmp.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'chrom1' is not in list
  2. Now, changing the chr1/chr2 to chrom1/chrom2 in the header:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp2.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp2.pairs 
# sorted pairs...

This fails:

pip install pairtools==1.1.1
pairtools sort tmp2.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'pair_type' is not in list
  3. Now, adding pair_type to the #columns line:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp3.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp3.pairs 
# sorted pairs...

This works:

pip install pairtools==1.1.1
pairtools sort tmp3.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
NS500150:497:HWH2WBGXC:4:23605:21900:3336       NODE_1404       461     NODE_1404       246      --
NS500150:497:HWH2WBGXC:4:23606:10802:17906      NODE_1404       1441    NODE_1814       4433    --
NS500150:497:HWH2WBGXC:4:23603:4102:4882        NODE_522        6855    NODE_1404       1035    --
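As a possible stopgap, the hand edits that turn tmp.pairs into tmp3.pairs can be scripted. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, it rewrites the #columns line and nothing else, and it appends pair_type to the header without adding a matching data column, exactly as tmp3.pairs does. Whether such a header-only change is valid for other pairtools commands is not something this issue establishes.

```python
def normalize_columns_line(line, add_pair_type=True):
    """Rename chr1/chr2 to chrom1/chrom2 in a '#columns:' line.

    Optionally append pair_type to the declared names (header only).
    Any other line is returned unchanged.
    """
    if not line.startswith("#columns:"):
        return line
    renames = {"chr1": "chrom1", "chr2": "chrom2"}
    names = [renames.get(name, name) for name in line.split()[1:]]
    if add_pair_type and "pair_type" not in names:
        names.append("pair_type")
    return "#columns: " + " ".join(names)


def normalize_pairs_text(text, add_pair_type=True):
    """Apply normalize_columns_line to every line of a pairs file's text."""
    return "\n".join(
        normalize_columns_line(line, add_pair_type) for line in text.split("\n")
    )


example = (
    "## pairs format v1.0\n"
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n"
    "#sorted: readID\n"
    "read_a\tNODE_1404\t461\tNODE_1404\t246\t--\n"  # made-up read ID
)
print(normalize_pairs_text(example), end="")
```

Data rows and the other header lines pass through untouched, so the result can be written back to a new file and handed to pairtools sort as in step 3.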
