Add genetic map auto detection and conversion #272
LouisLeNezet wants to merge 20 commits into nf-core:dev from
Conversation
| The following parameters are automatically detected, but can also be set (e.g. when you only provide `pos` and `cm` with no header):
|
| - `--map_sep`: Field separator used in the map file (e.g. "\t", " ", ",")
| - `--map_header`: Whether the file contains a header row (`true` or `false`)
| - `--map_colnames`: Ordered list of column names in the file
|
| For the example below, the map file uses tab separators, contains a header, and provides the columns in the following order: `chr`, `id`, `cm`, `pos`. Therefore, the appropriate parameters would be:
|
| `--map_sep "\t" --map_header true --map_colnames "chr,id,cm,pos"`
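As an illustration of how these three parameters could drive the parsing, here is a minimal Python sketch; the function name `read_genetic_map` and its defaults are hypothetical, not the pipeline's actual code:

```python
import csv
import io

def read_genetic_map(text, map_sep="\t", map_header=True,
                     map_colnames=("chr", "id", "cm", "pos")):
    """Parse a genetic map using user-supplied --map_sep, --map_header
    and --map_colnames values (illustrative sketch only)."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=map_sep)
    for i, fields in enumerate(reader):
        if map_header and i == 0:
            continue  # skip the header row when present
        rows.append(dict(zip(map_colnames, fields)))
    return rows

# Usage: a tab-separated file with a header row
example = "chr\tid\tcm\tpos\nchr1\tid1\t0.0\t55550\n"
print(read_genetic_map(example))
```

With `map_header=False` and a reordered `map_colnames`, the same helper covers headerless files, which is the scenario the quoted documentation describes.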
This is extremely complicated for a user, and adding more configuration parameters implies a higher risk of errors and the need for more tests and error messages.
I would suggest that we stick to one map format (e.g. the GLIMPSE format), require users to upload that format as the input file, and then, if they need to run any tool, autoconvert from that strict format to what the other tools need. The one map format to use could be the most "comprehensive" of them all (i.e. the one that includes all the necessary information). We could then point to external documentation on how to obtain it.
The autoconversion can be a single module that produces all the format types as different outputs, which are then used by the tools.
I think the format should be the Oxford 3-column format.
This could indeed be a simpler solution, where we would only need MAPCONVERT.
But as a consequence, we would push the burden of the conversion back onto users who don't have the right format.
As for which format to use as the default, it could be nice to ask the nf-core community. When I worked on dogs, the genetic map files I found were mostly in the 4-column PLINK format.
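To make the trade-off concrete, converting from one strict format to another is only a few lines. This sketch assumes a 4-column PLINK layout (`chr id cm pos`) as input and a 3-column `pos chr cM` layout (as used by GLIMPSE) as output; the function name is hypothetical:

```python
def plink_to_glimpse(lines):
    """Convert 4-column PLINK map lines (chr id cm pos) to a
    3-column 'pos chr cM' layout (assumed GLIMPSE-style target)."""
    out = ["pos chr cM"]  # assumed header of the target format
    for line in lines:
        chrom, _snp_id, cm, pos = line.split()
        out.append(f"{pos} {chrom} {cm}")
    return out

print(plink_to_glimpse(["chr1 id1 0.00000 55550",
                        "chr1 id4 0.41029 785910"]))
```

A single module exposing one such function per target format would match the "one strict input format, autoconvert to the rest" proposal above.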
Before implementing mapautodetect and mapconvert locally, check whether the nf-core GAWK module can be used. If that module does not fit your purpose, consider developing a general-purpose module under the nf-core CUSTOM/* modules, and use template scripts if it has more than 20 lines. You can then import the module into this pipeline via `nf-core modules install`, which avoids keeping a local copy.
Using the GAWK module would make maintenance complicated if we pass the program as a string.
But we could pass it as a file from the assets directory.
| ```csv title="chr21.map"
| chr1 id1 0.00000 55550
| chr1 id2 0.00000 632942
| chr1 id3 0.00000 633147
| chr1 id4 0.41029 785910
| chr1 id5 0.41742 788439
| chr1 id6 0.41764 788511
| chr1 id7 0.43061 792862
| chr1 id8 0.43586 794568
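For a file like the one above, autodetection of the separator and header can be sketched in a few lines of Python. This assumes candidate separators are tried in a fixed order and a header is recognized by the absence of numeric fields in the first row; function names are hypothetical, not pipeline code:

```python
def detect_sep(lines):
    """Pick the first candidate separator that splits every non-empty
    line into the same number (>1) of fields (illustrative sketch)."""
    for sep in ("\t", ",", ";", " "):
        counts = {len(line.split(sep)) for line in lines if line.strip()}
        if len(counts) == 1 and counts.pop() > 1:
            return sep
    raise ValueError("could not detect a separator")

def detect_header(lines, sep):
    """Assume a header when the first row contains no numeric field."""
    def numeric(field):
        try:
            float(field)
            return True
        except ValueError:
            return False
    return not any(numeric(f) for f in lines[0].split(sep))
```

This kind of heuristic is exactly where a strict, schema-validated single format would remove guesswork.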
You should validate the map structure with a schema: which columns are mandatory and which are optional? Appropriate errors should be raised.
Using a schema could indeed be a good solution.
I will try to implement it.
| "map_sep": {
|   "type": "string",
|   "description": "Separator used in the genetic map file.",
|   "default": null,
|   "fa_icon": "fas fa-arrows-alt-h",
|   "hidden": true
| },
| "map_header": {
|   "type": "boolean",
|   "description": "Does the genetic map file contain a header line?",
|   "fa_icon": "fas fa-heading",
|   "hidden": true
| },
| "map_colnames": {
|   "type": "string",
|   "description": "Column names for the genetic map file.",
|   "default": null,
|   "fa_icon": "fas fa-columns",
|   "pattern": "^(chr|id|cm|pos)(,(chr|id|cm|pos))*$",
|   "hidden": true
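The `pattern` above can also be enforced at runtime. A small Python sketch, reusing the same regular expression and adding a uniqueness check that the regex alone cannot express (the helper name is hypothetical):

```python
import re

# Same pattern as in the schema; note it still accepts duplicates
# such as "chr,chr", so an extra uniqueness check is useful.
COLNAMES_PATTERN = re.compile(r"^(chr|id|cm|pos)(,(chr|id|cm|pos))*$")

def validate_colnames(value):
    """Validate a --map_colnames string and return the column list."""
    if not COLNAMES_PATTERN.match(value):
        raise ValueError(f"invalid map_colnames: {value!r}")
    names = value.split(",")
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate column names in map_colnames: {value!r}")
    return names
```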
This can be detected with a schema.
Do not change tests. Add new tests if necessary, but do not change existing tests; otherwise, we cannot evaluate whether previous functionality remains stable.
I made sure that the md5sums only changed for the tests concerned by the addition of the map file.
The only snapshots that changed are for tests where we added a map.
This is also why I changed the test names to more descriptive ones, to better follow the changes.
If no nf-core module fulfills this, it could be replaced by a single custom module doing the detection + conversion in Python/R (and submitted to nf-core). The AWK code has no validation, no numeric type enforcement and no header awareness; it is harder to unit test and fragile with respect to whitespace.
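For illustration, a validating parser with numeric type enforcement is short in Python. This is a sketch under the assumption of a 4-column `chr id cm pos` layout, not an actual module:

```python
def parse_map_line(line, lineno):
    """Validate one 4-column map record (chr id cm pos), enforcing
    column count and numeric types, with line-numbered errors."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
    chrom, snp_id, cm, pos = fields
    try:
        cm_val = float(cm)
        pos_val = int(pos)
    except ValueError as err:
        raise ValueError(f"line {lineno}: non-numeric cm/pos field") from err
    if cm_val < 0 or pos_val < 0:
        raise ValueError(f"line {lineno}: negative coordinate")
    return {"chr": chrom, "id": snp_id, "cm": cm_val, "pos": pos_val}
```

Each failure mode raises a distinct, line-numbered error, which is the kind of message the plain AWK version cannot easily produce.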
Please do not change the original tests; this messes up the nf-tests later and makes traceability more difficult. We can manage test cases using the params in nf-test.
Here, I only reordered the input to match the other config files.
The aim of a test, from my point of view, is to be as close as possible to a real scenario.
That's why I added a map file in test_batch, test, test_quilt, test_stitch, test_panelprep, test_all and test_vcf, as they weren't using one yet.
I can remove it from test_batch and test_vcf, as they are not aimed at testing the tools.
But in my opinion, the tests aimed at tool testing should include the map to be more comprehensive.
PR checklist
Closes #216
Closes #150
Genetic map has been added to `test_batch`, `test`, `test_quilt`, `test_stitch`, `test_panelprep`, `test_all` and `test_vcf`. Therefore the md5sums for the tests with a map have been updated. The test names have been clarified.

- Make sure your code lints (`nf-core pipelines lint`).
- Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- Ensure the test suite passes in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).