Reference-guided genome assembly utilities for sorting contigs, conservatively cleaning mostly-correct assemblies, splitting reviewed chimeric contigs, cutting exact breakpoints, manual dot-plot review, plotting alignments, scaffolding final ordered contigs, and applying reviewed graph-supported gap fills.
ChromoSort provides one command, chromo, with nine subcommands:
| Command | Purpose |
|---|---|
| `chromo sort` | Assign contigs to the best-supported reference sequence from MUMmer coords or minimap2 PAF, merge alignment evidence, filter contained or low-value duplicate overlaps, protect likely split candidates, and write a reference-ordered FASTA with TSV decision reports (sort docs). |
| `chromo clean` | Apply sort-style filtering to raw contigs, conservatively fix retained contigs, orient/order the emitted records, and write a cleaned FASTA plus audit reports for mostly-correct assemblies (clean docs). |
| `chromo eval` | Prepare editable TSV review tables for algorithm-assisted, human-reviewed fix, scaffold, and gapfill decisions, with optional GFA, long-read PAF, and GAF evidence for the matching --reviewed-plan execution paths (eval docs). |
| `chromo fix` | Split chimeric or structurally inconsistent contigs into reference-labeled pieces by scanning query-ordered alignment blocks, smoothing ordinary gaps, selecting eligible reference/orientation transitions, and writing a fixed full-assembly FASTA plus an audit report (fix docs). |
| `chromo cut` | Apply exact reviewed breakpoint edits when you already know the cut positions, replacing each requested contig with numbered pieces while copying uncut contigs unchanged and recording every emitted slice (cut docs). |
| `chromo manual` | Build a self-contained browser dashboard for dot-plot curation, task-specific fix/scaffold/gapfill review-event queues, contig editing, optional GFA and long-read evidence panels, FASTA export, and reproducible recipe application (manual docs, dot-plot guide). |
| `chromo plot` | Draw whole-genome, per-reference, or selected-reference dot plots from existing MUMmer coords or minimap2 PAF alignments, optionally ordered by a chromo sort assignment report, without re-running an aligner (plot docs, dot-plot guide). |
| `chromo scaffold` | Join the final sorted contigs into one scaffold FASTA record per assigned reference, infer or fix N-gap lengths, report overlaps and gap decisions, and optionally add report-only GFA junction evidence (scaffold docs). |
| `chromo gapfill` | Plan graph-supported fills between adjacent sorted contigs using GFA paths plus optional GAF, Hi-C-like, or reference-placement PAF evidence, then apply only fillable and reviewed paths while unresolved junctions fall back to N gaps (gapfill docs). |
Full documentation is available at https://rotheconrad.github.io/chromosort/.
New users should start with Installation, then use Input Files to prepare MUMmer, minimap2, GFA, GAF, or Hi-C-like evidence. The Workflows page shows the recommended order for fixing, sorting, plotting, scaffolding, and graph-aware review. The Agent and Review Playbook gives reproducible patterns for choosing one primary coords or PAF alignment, same-reference inversion review, long-read/GFA/GAF evidence, and handoffs between datasets or assistant chats. The dot-plot guide is a mini tutorial for reading whole-genome and per-reference dot plots. The Architecture page maps algorithms and data models to the subcommands, modes, and parameters that activate them, while the Production Upgrade Roadmap tracks completed and follow-up review-layer work. Command-specific pages are linked in the table above.
For interpreting results, see Output Files and Troubleshooting.
MUMmer coords and minimap2 PAF files describe one exact reference FASTA and one exact assembly FASTA. If a ChromoSort step writes a changed FASTA by removing records, splitting contigs, cutting contigs, reverse-complementing records, renaming records, or scaffolding records, re-run MUMmer or minimap2 before using that changed FASTA as the assembly input to another alignment-dependent command.
You can reuse the original coords or PAF to make decisions about the original
assembly. For example, run chromo sort on raw.fa, inspect
split_candidate=yes rows, then run chromo fix on that same raw.fa with the
same raw alignment file. You should not run chromo fix on
sample.ordered.fa from chromo sort with coords that were generated from
raw.fa.
chromo plot --assignments is also an important special case: it plots the
original alignment rows while ordering the query axis by a chromo sort
assignment report. This is useful for reviewing sort decisions without
re-aligning, but it is not a new alignment of the edited FASTA. To validate
ordered.fa, fixed.fa, or a manual-export FASTA, generate fresh coords or PAF
for that exact FASTA. For help reading the resulting visual patterns, use the
dot-plot guide.
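Before reusing an alignment file, a quick consistency check can catch a PAF that no longer describes the FASTA you are about to pass in. The sketch below is a hypothetical helper, not a ChromoSort command. It assumes the assembly was the minimap2 query, so PAF column 1 holds the contig name and column 2 its length, and it reports contigs that are absent from the FASTA or whose lengths disagree. A clean result does not prove the sequences are identical (a reverse-complemented record keeps its name and length), so treat it as a cheap first filter only.

```bash
# Hypothetical helper (not part of ChromoSort): list PAF query records whose
# name or length disagrees with a FASTA. Assumes PAF column 1 is the contig
# name and column 2 is the contig length (assembly used as minimap2 query).
paf_matches_fasta() {
  awk -F'\t' '
    FNR == NR {
      # First file (FASTA): accumulate sequence length per record ID.
      if (substr($0, 1, 1) == ">") {
        split(substr($0, 2), parts, /[ \t]/)
        name = parts[1]
        len[name] = 0
      } else if (name != "") {
        len[name] += length($0)
      }
      next
    }
    # Second file (PAF): examine each query name only once.
    seen[$1]++ { next }
    !($1 in len) { print "missing\t" $1; bad = 1; next }
    len[$1] != $2 { print "length\t" $1 "\tpaf=" $2 "\tfasta=" len[$1]; bad = 1 }
    END { exit bad + 0 }
  ' "$1" "$2"
}
```

Call it as `paf_matches_fasta results/sample.fixed.fa minimap2/fixed.paf` (both paths are placeholders). Any output, or a non-zero exit status, suggests the PAF was generated from a different FASTA and should be regenerated.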
For most new ChromoSort runs, minimap2 PAF is the recommended primary alignment
input because it is fast and carries MAPQ. Use -c --secondary=no, then tune the
minimap2 preset and ChromoSort filters for the species and assembly quality.
MUMmer coords remains a good alternative and can provide a useful second
perspective when benchmarking, tuning a new crop group, or debugging a marginal
event.
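As a starting point, the recommended flags can be wrapped in a small shell function. This is a sketch, not a ChromoSort utility: the function name `align_to_paf` and the default `asm5` preset are assumptions chosen for illustration, and the preset should be tuned for your species and assembly divergence. The reference is passed as the minimap2 target and the assembly as the query.

```bash
# Hypothetical wrapper: align an assembly (query) to a reference (target) and
# write PAF with CIGAR strings (-c) and no secondary alignments.
# The default preset asm5 is only an illustrative assumption; tune it.
align_to_paf() {
  ref=$1
  asm=$2
  out=$3
  preset=${4:-asm5}
  if ! command -v minimap2 >/dev/null 2>&1; then
    echo "minimap2 not found on PATH" >&2
    return 127
  fi
  minimap2 -x "$preset" -c --secondary=no "$ref" "$asm" > "$out"
}
```

For example, `align_to_paf reference.fa assembly.fa minimap2/raw.paf` writes a PAF for the raw assembly, and the same call with `results/sample.fixed.fa` produces the fresh alignment needed after a FASTA-changing step.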
You usually do not need to run both. ChromoSort normalizes coords and PAF rows
into the same internal alignment model before sorting, plotting, and fixing; the
remaining differences usually come from minimap2-vs-MUMmer alignment algorithms,
row fragmentation, primary/secondary handling, MAPQ, and identity fields rather
than separate ChromoSort decision logic. In the soybean coords-vs-PAF fix
benchmark, split counts differed by about 5-10%, while marginal split-contig
sets differed by about 20-30%. Treat those as reasonable starting expectations,
then use chromo eval with long-read PAF, GFA, and GAF evidence for stronger
support on biological calls.
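If you do run both alignment formats, one simple way to see how far the calls diverge is to compare the contig names each run chose to split. The helper below is a hypothetical sketch: it takes two plain-text files with one contig name per line, which you would extract yourself from the two fix reports (the report column layout is not assumed here), and prints how many names are shared or unique to each list.

```bash
# Hypothetical helper: compare two lists of contig names (one per line),
# for example split contigs from a coords-based run and a PAF-based run.
compare_contig_sets() {
  a_sorted=$(mktemp) || return 1
  b_sorted=$(mktemp) || return 1
  # comm needs sorted, de-duplicated input in a consistent collation order.
  LC_ALL=C sort -u "$1" > "$a_sorted"
  LC_ALL=C sort -u "$2" > "$b_sorted"
  printf 'shared\t%s\n' "$(LC_ALL=C comm -12 "$a_sorted" "$b_sorted" | wc -l | tr -d ' ')"
  printf 'only_first\t%s\n' "$(LC_ALL=C comm -23 "$a_sorted" "$b_sorted" | wc -l | tr -d ' ')"
  printf 'only_second\t%s\n' "$(LC_ALL=C comm -13 "$a_sorted" "$b_sorted" | wc -l | tr -d ' ')"
  rm -f "$a_sorted" "$b_sorted"
}
```

Contigs that appear in only one list are the marginal events worth taking into chromo eval with long-read PAF, GFA, or GAF evidence.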
chromo fix has four planner modes:
| Mode | What it considers | Smoothing |
|---|---|---|
| chromosome | Reference/chromosome changes only. | Yes |
| conservative | Reference/chromosome changes, plus only complex same-reference orientation events. | Yes |
| comprehensive | All reference/chromosome changes and all same-reference orientation changes. | Yes |
| sensitive | Every passing reference/orientation transition after adjacent same-target collapse. | No |
comprehensive is not guaranteed to be conservative plus extra calls. Because
it treats orientation as part of the smoothed target signature, it can choose
different candidate pieces or reject a plan that conservative would split.
Use it for broader review, especially same-reference inversion candidates.
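Because the modes can disagree, it is often useful to run two of them on the same raw assembly and compare the reports. The sketch below is a hypothetical wrapper that only prints a chromo fix command line for a chosen mode so you can inspect it before running it. It assumes the four mode names above are the values accepted by `--mode`, and the per-mode output file names are illustrative.

```bash
# Hypothetical wrapper: print (do not run) a chromo fix command for one of the
# four documented planner modes. Output paths are illustrative placeholders.
fix_command_for_mode() {
  mode=$1
  case "$mode" in
    chromosome|conservative|comprehensive|sensitive) ;;
    *)
      echo "unknown chromo fix mode: $mode" >&2
      return 2
      ;;
  esac
  printf 'chromo fix --assembly-fasta assembly.fa --coords mummer/raw.coords --all --mode %s --output-fasta results/sample.%s.fixed.fa --report results/sample.%s.fixed_contigs.tsv\n' \
    "$mode" "$mode" "$mode"
}

# Print one command per mode for side-by-side review.
for mode in conservative comprehensive; do
  fix_command_for_mode "$mode"
done
```

Both commands read the same raw FASTA and the same raw alignment, which keeps the comparison valid under the re-alignment rule described earlier.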
```bash
git clone https://github.com/rotheconrad/chromosort.git
cd chromosort
mamba env create -f environment.yml
mamba activate chromosort
chromo --help
chromo sort --help
chromo clean --help
```

Typical conservative cleanup workflow for a mostly-correct assembly:

```bash
chromo clean \
  --ref-fasta reference.fa \
  --assembly-fasta assembly.fa \
  --coords mummer/raw.coords \
  --output-prefix results/sample \
  --orient-to-reference \
  --discarded-fasta results/sample.discarded.fa

# Re-align results/sample.clean.fa before final validation plots.
```

Typical reviewed workflow, with re-alignment after FASTA edits:

```bash
# 1. Fix reviewed/suspect raw contigs.
chromo fix \
  --assembly-fasta assembly.fa \
  --coords mummer/raw.coords \
  --contigs suspect_contig_1 suspect_contig_2 \
  --output-fasta results/sample.fixed.fa \
  --report results/sample.fixed_contigs.tsv

# 2. Re-align results/sample.fixed.fa with MUMmer or minimap2.

# 3. Sort the fixed FASTA with the fixed-FASTA alignment.
chromo sort \
  --ref-fasta reference.fa \
  --assembly-fasta results/sample.fixed.fa \
  --coords mummer/fixed.coords \
  --output-prefix results/sample.fixed \
  --orient-to-reference

# 4. Plot from the same fixed-FASTA alignment for visual review.
chromo plot \
  --ref-fasta reference.fa \
  --assembly-fasta results/sample.fixed.fa \
  --coords mummer/fixed.coords \
  --assignments results/sample.fixed.contig_assignments.tsv \
  --output-prefix plots/sample.fixed \
  --per-ref

# Add --sel-ref Gm6 Gm12 Gm15 to redraw only selected references.
```

```bash
git clone https://github.com/rotheconrad/chromosort.git
cd chromosort
pixi install
pixi run help
pixi run test
```

Current version: 0.2.27. Operational commands are sort, clean, eval, fix, cut, manual, plot, scaffold, and gapfill. See docs/status.md or CHANGELOG.md for version history. See docs/roadmap.md for the production review-upgrade roadmap.
If you use ChromoSort, cite this repository and cite MUMmer or minimap2 for the alignment files used by the workflow. See CITATION.cff.
Please use the GitHub issue tracker for bug reports, feature requests, and questions: https://github.com/rotheconrad/chromosort/issues.
ChromoSort is released under the MIT License. See LICENSE.
This project is supported by the U.S. Department of Agriculture - Agricultural Research Service (USDA-ARS) - Genomics and Bioinformatics Research Unit (GBRU) through CRIS Project No. 6066-21310-006-000-D.
ChromoSort can consume MUMmer and minimap2 whole-genome alignments. Thanks to the genome assembly and comparative genomics communities whose workflows motivated transparent reference-guided contig sorting, splitting, plotting, and scaffolding tools.
| Version | Notes |
|---|---|
| Unreleased | Added agent-ready review documentation and coords-vs-PAF guidance, including PAF-first input recommendations, expected alignment-format differences from soybean testing, and clearer chromo fix mode documentation for chromosome, conservative, comprehensive, and sensitive planners. |
| 0.2.27 | Refreshed publication-style architecture and user documentation: added algorithm/data-model activation maps, evidence authority mapping, updated eval/manual/GAF command guidance, synchronized input/output/workflow/status/troubleshooting docs, and verified docs/test consistency. |
| 0.2.26 | Completed the GAF evidence and modular manual-panel upgrade: shared GAF parsing/traversal summaries, --gaf evidence in chromo eval fix/scaffold/gapfill, GAF status and selected-read fields in gapfill plans, optional --read-paf/--gaf panels in task-specific manual dashboards, and mixed GFA/PAF/GAF review fixtures/docs. |
| 0.2.25 | Synchronized package, citation, Pixi, conda recipe, README, and docs version metadata; added the production-upgrade roadmap for paired eval table workflows and task-specific manual dashboards feeding reviewed fix, scaffold, and gapfill execution paths. |
| 0.2.24 | Added chromo clean, a conservative cleanup command for mostly-correct assemblies that combines sort-style filtering with fix-style conservative splitting on retained raw contigs, then writes `<prefix>.clean.fa` plus initial-sort, fix, clean, and run-summary reports. Clarified README, command docs, and workflows around when FASTA-changing steps require fresh MUMmer or minimap2 alignments before downstream steps or final plots. |
| 0.2.23 | Renamed the graph gap-filling command from chromo fill to chromo gapfill, moved the package entry point to chromosort.gapfill, replaced the package script with chromosort-gapfill, and updated gapfill output names to `<prefix>.gapfill_plan.tsv` and `<prefix>.gapfilled.fa`. |
| 0.2.22 | Added Pixi installation support with pixi.toml, plus README figure assets and captions for chromo manual graph review and chromo plot whole-genome/per-reference examples. |
| 0.2.21 | Added graph-aware safety policies. chromo sort and chromo fix now have warning-only --graph-guard checks, while `chromo scaffold --graph-overlap-policy report |
| 0.2.20 | Added an end-to-end synthetic graph workflow to the README and shipped focused gapfill walkthrough inputs. The tutorial runs sort/manual/scaffold/gapfill with the graph-gotcha GFA, PAF, GAF, Hi-C-like contacts, review HTML, reviewed-plan TSV, and reviewed gapfill application. |
| 0.2.19 | Improved chromo gapfill --review-html candidate comparison. Review dashboards now embed per-candidate path rows with path nodes, support scores, validation status, fill length, trim length, risk flags, and optional fill sequence so reviewers can compare ambiguous branches directly before exporting a reviewed plan. |
| 0.2.18 | Added richer path-risk annotations to chromo gapfill. Gapfill plans and review HTML now report risk flags, branch-complexity score, high-degree graph nodes, self-loop nodes, unsequenced nodes, and cycle-guard counts so ambiguous or risky candidate paths are easier to triage. |
| 0.2.17 | Added reference-placement PAF evidence to chromo gapfill. The new --ref-paf path scorer reports selected and best-alternate reference support, can conservatively resolve ambiguous branches when one candidate has unique expected-gap placement support, and conflicts with GAF or Hi-C support leave the gap unresolved. |
| 0.2.16 | Expanded chromo manual --gfa review. Manual dashboards now include graph-neighborhood filtering, a selected-contig upstream/downstream neighbor panel, overlap/orientation details, and same-reference neighbor flags so branching graph context is easier to compare during manual curation. |
| 0.2.15 | Added chromo manual --gfa graph context. Manual dashboards now embed per-contig GFA node evidence, graph complexity labels, degree/neighbor counts, coverage tags such as RC:i, and oriented neighbor summaries so manual breakpoint and ordering review can consider local assembly-graph structure. |
| 0.2.14 | Added chromo gapfill --review-html, a self-contained HTML review table for gapfill plans. It embeds the same TSV columns, supports filtering and accepted-fill toggles, and exports a reviewed-plan TSV for --reviewed-plan; the TSV and HTML writers now share one row-generation path. |
| 0.2.13 | Added reviewed gapfill-plan application for chromo gapfill. Planning output now includes an editable accept_fill column, and --reviewed-plan makes --apply fill only accepted rows after rechecking the current scaffold, contig pair, path nodes, and fillability; rejected or unaccepted rows fall back to N gaps. |
| 0.2.12 | Added optional Hi-C pair support to chromo gapfill. Gapfill plans now report Hi-C path support and best alternate support, and otherwise ambiguous graph branches can be resolved when one candidate has unique summed contact support at or above --min-hic-path-support; conflicting GAF and Hi-C support leaves the junction unresolved. |
| 0.2.11 | Expanded the input-file documentation with a dedicated graph-input section describing where to find matching GFA files, which reference-to-assembly PAF files to keep for raw and fixed FASTAs, and how optional GAF read-to-graph alignments are used by chromo gapfill. |
| 0.2.10 | Added optional GAF read-path evidence to chromo gapfill. Gapfill plans now report GAF support counts, and otherwise ambiguous graph branches can be resolved when one candidate path has unique support after --min-gaf-mapq filtering and meets --min-gaf-path-support; weak, tied, or missing support still leaves the junction unresolved. |
| 0.2.9 | Added chromo gapfill, a conservative graph-gap planning and optional application command. It writes `<prefix>.gapfill_plan.tsv`, refuses ambiguous or unverifiable GFA paths, applies sequence only with --apply, trims the right flank by the final graph overlap when filling, and falls back to inferred or fixed N gaps for unresolved junctions. |
| 0.2.8 | Added report-only --gfa graph context to chromo sort and chromo fix. Sorting now writes `<prefix>.graph_assignments.tsv` with resolved graph nodes, node degree/self-loop evidence, and direct links to overlap-best contigs; fixing now writes a graph context table beside the split report so reviewed contigs can be checked against the assembly graph before gapfill workflows. |
| 0.2.7 | Added chromo scaffold --gfa report-only graph evidence. When a GFA is provided, scaffolding now writes `<prefix>.graph_gaps.tsv` with resolved graph nodes, orientation-aware direct links, link overlap bp, short explicit GFA paths up to --graph-max-path-edges, intermediate candidate nodes, and missing/no-path statuses without changing FASTA output. |
| 0.2.6 | Added the first graph-evidence foundation: a tested GFA parser for segment/link records, orientation-aware edge lookup helpers, overlap-CIGAR handling that preserves complex overlaps as non-trim lengths, and synthetic graph-gotcha fixtures with GFA, PAF, GAF, Hi-C-like, and expected-path files for future roadmap development. |
| 0.2.5 | Added chromo manual, a self-contained HTML dashboard for manual dot-plot review, contig removal/restoration, order changes, breakpoints, inversions, scaffold labeling/export, FASTA downloads, recipe JSON export, and reproducible chromo manual apply recipe execution. |
| 0.2.4 | Added chromo cut for exact reviewed breakpoint cuts, with repeatable --cut CONTIG:POS[,POS...], single-contig --contig/--pos, batch --cuts-file, cut-piece FASTA output, and an audit TSV report. |
| 0.2.3 | Added explicit terminal-overlap classification/rescue in chromo sort, richer scaffold overlap reporting, and chromo scaffold --overlap-policy modes for warn-only, reference-coordinate trimming, and sequence-confirmed trimming. |
| 0.2.2 | Reworked chromo fix so --contigs/--contigs-file only select the inspection subset, --all scans every candidate contig, --mode controls planner behavior for both scopes, and breakpoint limits apply per contig. |
| 0.2.1 | Tightened chromo sort duplicate filtering for contaminated/alternate-fragment assemblies by using span-based overlap by default, requiring both novel coverage thresholds, rescuing very large near-threshold alignments, and letting split candidates protect their secondary reference spans. |
| 0.2.0 | Added minimap2 PAF input for chromo sort and chromo fix, plus chromo plot PDF/SVG/PNG dot plots for coords/PAF with optional assignment-report query ordering. |
| 0.1.2 | Raised the default auto-split query-span support threshold to 5% so small terminal off-target blocks are reported for review instead of being cut automatically. |
| 0.1.1 | Tightened chromo fix breakpoint placement by collapsing adjacent same-reference/orientation runs, added complex same-reference orientation detection, added a run-level auto breakpoint budget, protected strong multi-reference split candidates during chromo sort, and documented the fix-before-sort workflow for suspected misjoins. |
| 0.1.0 | Initial public package with chromo sort, chromo fix, chromo scaffold, duplicate-overlap filtering, user-nominated contig splitting, conservative auto smoothing, inferred/fixed-gap scaffolding, and synthetic tests. |