Skip to content

F10: CLI subcommand \pheno-dag run <yaml>\ for DAG execution#238

Open
KooshaPari wants to merge 2 commits into
mainfrom
epic/F10-dag-cli-run
Open

F10: CLI subcommand \pheno-dag run <yaml>\ for DAG execution#238
KooshaPari wants to merge 2 commits into
mainfrom
epic/F10-dag-cli-run

Conversation

@KooshaPari

@KooshaPari KooshaPari commented Jun 25, 2026

Copy link
Copy Markdown
Owner

User description

Summary


CodeAnt-AI Description

Add DAG file loading and parallel execution planning from the CLI

What Changed

  • The CLI can now read a DAG from a YAML file, turn it into an execution plan, and print the run order in parallel-ready buckets
  • Verbose mode now shows the parsed DAG details and can also print the exported YAML form of the plan
  • DAGs can be saved and restored as YAML or JSON, with support for round-trip conversion back into an executable graph
  • Cyclic or invalid DAG definitions now stop execution with a clear error message instead of producing a plan

Impact

✅ Run DAGs from YAML files
✅ See which steps can run in parallel
✅ Clearer errors for invalid or cyclic DAGs

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@KooshaPari KooshaPari added the area:compute-infra Phenotype compute/infra epic label Jun 25, 2026
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@codeant-ai

codeant-ai Bot commented Jun 25, 2026

Copy link
Copy Markdown

CodeAnt AI is reviewing your PR.

@codeant-ai

codeant-ai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@github-actions

Copy link
Copy Markdown

Legacy Tooling Scan Report

Severity Count
Critical 0
High 0
Medium 0
Low 0

No violations detected.

This is a WARN-mode scan. Fix before strict enforcement begins.

@socket-security

Copy link
Copy Markdown

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcargo/​serde_yaml@​0.9.34%2Bdeprecated10010093100100
Addedcargo/​log@​0.4.3310010093100100
Updatedcargo/​clap@​4.6.1 ⏵ 4.5.619910093100100
Updatedcargo/​http@​1.4.0 ⏵ 1.4.210010093100100
Updatedcargo/​uuid@​1.23.1 ⏵ 1.20.0100 +110093100100

View full report

@socket-security

Copy link
Copy Markdown

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action Severity Alert  (click "▶" to expand/collapse)
Warn High
Obfuscated code: cargo zerocopy is 90.0% likely obfuscated

Confidence: 0.90

Location: Package overview

From: ?cargo/tauri-plugin-log@2.8.0cargo/zerocopy@0.8.52

ℹ Read more on: This package | This alert | What is obfuscated code?

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Packages should not obfuscate their code. Consider not using packages with obfuscated code.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore cargo/zerocopy@0.8.52. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

@codeant-ai codeant-ai Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Jun 25, 2026
@sonarqubecloud

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 35da1ed19a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@@ -0,0 +1,14 @@
[package]
name = "byteport-cli"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expose the advertised pheno-dag binary

This package adds src/main.rs but no [[bin]] target, so Cargo will build/install the executable as byteport-cli rather than the advertised pheno-dag name (Cargo documents that src/main.rs defaults to the package name unless a [[bin]] table customizes it: https://doc.rust-lang.org/cargo/reference/cargo-targets.html#binaries). As a result, the documented pheno-dag run ... command will not exist after cargo build/cargo install; add a [[bin]] name = "pheno-dag" target or align the docs and Clap name with byteport-cli.

Useful? React with 👍 / 👎.

match &cli.command {
DagCommand::Run {
yaml,
name: _name,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wire the --name filter into execution

When callers pass pheno-dag run --name <pattern> ..., the parsed filter is discarded here and run_dag receives only the path and verbose flag, so the schedule still includes every node. Because the option is advertised as an execution filter, this silently ignores a user-supplied constraint; pass it into run_dag and filter/validate matching nodes, or remove the flag until it is implemented.

Useful? React with 👍 / 👎.

Comment on lines +35 to +39
let mut queue: VecDeque<&K> = in_degree
.iter()
.filter(|(_, deg)| **deg == 0)
.map(|(n, _)| *n)
.collect();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The zero-in-degree queue is seeded by iterating a HashMap, which has randomized iteration order, so the produced topological order is non-deterministic across runs whenever multiple nodes are ready at once. This can make CLI output and downstream execution order unstable/flaky; use a deterministic structure (for example, collect candidates and sort, or use BTreeMap/BTreeSet) before enqueueing. [logic error]

Severity Level: Major ⚠️
- ⚠️ CLI `pheno-dag run` prints schedule buckets in random order.
- ⚠️ Re-running same YAML DAG yields different textual plans.
- ⚠️ Harder to diff or debug schedules across executions.
Steps of Reproduction ✅
1. Run the CLI entrypoint `pheno-dag run <yaml>` which calls `run_dag()` in
`crates/byteport-cli/src/main.rs:75-133`. `run_dag()` deserializes the YAML into
`DagSchema` (`serialize.rs:93-153`), converts it into a `Dag<String>` via
`DagSchema::into_dag()` (`serialize.rs:140-153`), then calls `scheduler::schedule(&dag)`
at `main.rs:112-118`.

2. Inside `scheduler::schedule()` in `crates/byteport-dag/src/scheduler.rs:35-73`, the
first step is to compute a topological order via `let topo_order = topo::kahn_sort(dag)?;`
at `scheduler.rs:40`, invoking `kahn_sort()` in `crates/byteport-dag/src/topo.rs:19-65`.

3. `kahn_sort()` seeds its zero-in-degree queue using a `HashMap` iteration: `let mut
queue: VecDeque<&K> = in_degree.iter().filter(|(_, deg)| **deg == 0).map(|(n, _)|
*n).collect();` at `topo.rs:35-39`. Both the `in_degree` map (built from
`Dag::iter_nodes()` over a `HashSet` in `dag.rs:21-27,117-119`) and the `HashMap::iter()`
traversal have non-deterministic iteration order due to randomized hashing.

4. For any DAG with multiple zero-in-degree nodes (for example, the multi-source pattern
exercised in `scheduler` tests at `scheduler.rs:200-216`), successive runs of `pheno-dag
run` on the same YAML can yield different `topo_order` sequences and thus different bucket
contents/orderings when `schedule()` groups nodes by rank (`scheduler.rs:58-67`), leading
to non-deterministic schedule text printed by `format_schedule()` at `scheduler.rs:75-92`
and used in `run_dag()` at `main.rs:121-123`.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/byteport-dag/src/topo.rs
**Line:** 35:39
**Comment:**
	*Logic Error: The zero-in-degree queue is seeded by iterating a `HashMap`, which has randomized iteration order, so the produced topological order is non-deterministic across runs whenever multiple nodes are ready at once. This can make CLI output and downstream execution order unstable/flaky; use a deterministic structure (for example, collect candidates and sort, or use `BTreeMap`/`BTreeSet`) before enqueueing.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

format!("{:?}", child),
));
}
DfsColor::White => visit(dag, child, color, order)?,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The DFS implementation is recursive and can overflow the process stack on long dependency chains (large DAG depth), causing a runtime crash. Replace recursion with an explicit stack-based iterative traversal to keep memory bounded on heap and avoid stack overflow. [possible bug]

Severity Level: Major ⚠️
- ❌ Potential process crash for deep DAGs using dfs_sort.
- ⚠️ Library consumers cannot rely on DFS for large graphs.
Steps of Reproduction ✅
1. In any consumer (internal or external) of `byteport-dag`, construct a very deep linear
DAG using `Dag<String>` from `crates/byteport-dag/src/dag.rs:46-120`: repeatedly call
`Dag::add_node()` (`dag.rs:55-64`) and `Dag::add_edge()` (`dag.rs:70-79`) to form a chain
like `n1 -> n2 -> ... -> nN` where N is large (e.g., 20_000).

2. Call `dfs_sort(&dag)` from `crates/byteport-dag/src/topo.rs:81-97` on this deep DAG.
`dfs_sort()` initializes the color map (`topo.rs:85-86`) and then iterates all nodes
invoking the recursive helper `visit()` for white nodes at `topo.rs:88-91`.

3. Within `visit()` (`topo.rs:99-128`), each node's children are traversed, and for every
white child the function recurses via `DfsColor::White => visit(dag, child, color,
order)?` at `topo.rs:119`. On a path graph, this produces a recursion depth proportional
to the chain length (one stack frame per node).

4. With sufficiently large depth, the recursion exceeds the Rust thread's stack limit,
causing a runtime `stack overflow` panic rather than returning a `DagError`. Current unit
tests in `topo.rs:190-245` only exercise small graphs, so the overflow is not caught by
tests, but any production code that adopts `dfs_sort()` for large/deep DAGs would be
vulnerable to process crashes.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/byteport-dag/src/topo.rs
**Line:** 119:119
**Comment:**
	*Possible Bug: The DFS implementation is recursive and can overflow the process stack on long dependency chains (large DAG depth), causing a runtime crash. Replace recursion with an explicit stack-based iterative traversal to keep memory bounded on heap and avoid stack overflow.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +63 to +67
DagCommand::Run {
yaml,
name: _name,
verbose,
} => run_dag(yaml, *verbose),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The --name CLI option is parsed but never used, so pheno-dag run --name ... silently executes the full DAG instead of filtering nodes as documented. Thread this argument into run_dag and apply the filter before scheduling to match the command contract. [incomplete implementation]

Severity Level: Major ⚠️
- ❌ pheno-dag --name executes full DAG, ignoring filter.
- ⚠️ Users cannot limit runs to matching nodes.
- ⚠️ Automations relying on name filtering mis-execute extra steps.
Steps of Reproduction ✅
1. Build the CLI crate so that `main()` in `crates/byteport-cli/src/main.rs:59-69` is the
entrypoint for the `pheno-dag` binary.

2. Run `pheno-dag run examples/ci-pipeline.yaml --name build` so Clap parses into
`DagCommand::Run { yaml, name: Some("build".into()), verbose: false }` via the
`DagCommand::Run` variant definition at `crates/byteport-cli/src/main.rs:38-52`.

3. In `main`, the match arm at `crates/byteport-cli/src/main.rs:62-67` destructures as
`name: _name` and calls `run_dag(yaml, *verbose)` without passing the parsed `name`,
effectively discarding the filter.

4. Inside `run_dag` at `crates/byteport-cli/src/main.rs:75-133`, the YAML is deserialized
and the full `Dag<String>` is scheduled via `scheduler::schedule(&dag)` without any
filtering, so the printed schedule includes all nodes even though `--name build` was
supplied.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/byteport-cli/src/main.rs
**Line:** 63:67
**Comment:**
	*Incomplete Implementation: The `--name` CLI option is parsed but never used, so `pheno-dag run --name ...` silently executes the full DAG instead of filtering nodes as documented. Thread this argument into `run_dag` and apply the filter before scheduling to match the command contract.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +126 to +127
let export = DagSchema::from_dag(&dag, &schema.version)
.with_name(schema.name.clone().unwrap_or_default());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This always writes a name field in verbose export, even when the input DAG had no name, by converting None into an empty string. Preserve optionality instead of forcing Some(""), otherwise unnamed DAGs are serialized with a misleading blank name. [logic error]

Severity Level: Major ⚠️
- ⚠️ Verbose YAML export invents blank names for unnamed DAGs.
- ⚠️ Downstream tools misinterpret empty-name versus missing-name.
Steps of Reproduction ✅
1. Create a YAML DAG file without a `name` field, e.g. just `version`, `nodes`, and
`edges`, and run `pheno-dag run <file> --verbose` so `run_dag` in
`crates/byteport-cli/src/main.rs:75-133` is executed.

2. `DagSchema::from_yaml(&contents)` at `crates/byteport-cli/src/main.rs:87-93`
deserializes into a `DagSchema` whose `name` field is `None` because `DagSchema` defines
`pub name: Option<String>` with `#[serde(default, skip_serializing_if =
"Option::is_none")]` in `crates/byteport-dag/src/serialize.rs:79-85`.

3. After scheduling, the verbose branch at `crates/byteport-cli/src/main.rs:125-127`
builds `export = DagSchema::from_dag(&dag,
&schema.version).with_name(schema.name.clone().unwrap_or_default())`; with `schema.name ==
None`, `unwrap_or_default()` produces `""`, and `with_name` sets `self.name =
Some(name.into())` per `crates/byteport-dag/src/serialize.rs:135-137`.

4. When `export.to_yaml()` runs at `crates/byteport-cli/src/main.rs:128-130`, serde sees
`name: Some("")` and, because `skip_serializing_if` only skips `None`, emits a `name: ""`
field, so the CLI's verbose YAML output now claims the DAG has a blank name even though
the original input had no `name` field at all.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/byteport-cli/src/main.rs
**Line:** 126:127
**Comment:**
	*Logic Error: This always writes a `name` field in verbose export, even when the input DAG had no name, by converting `None` into an empty string. Preserve optionality instead of forcing `Some("")`, otherwise unnamed DAGs are serialized with a misleading blank name.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +77 to +78
self.children.entry(from.clone()).or_default().push(to.clone());
self.parents.entry(to).or_default().push(from);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: add_edge allows inserting the same (from, to) edge multiple times, which inflates edge_count, duplicates dependency entries, and then gets silently collapsed by DagSchema::from_dag deduplication, causing inconsistent graph behavior across modules. Reject duplicate edges before pushing. [logic error]

Severity Level: Major ⚠️
- ⚠️ Dag edge_count differs before and after schema round-trip.
- ⚠️ Duplicate edges skew parent/child iteration and analytics.
Steps of Reproduction ✅
1. In any crate using the DAG library, construct a graph with `let mut dag: Dag<String> =
Dag::new();` and add nodes `"a"` and `"b"` via `dag.add_node("a".into())` and
`dag.add_node("b".into())` as in the patterns tested at
`crates/byteport-dag/src/dag.rs:137-145`.

2. Call `dag.add_edge("a".into(), "b".into()).unwrap();` twice; the implementation at
`crates/byteport-dag/src/dag.rs:70-79` only checks that both endpoints exist, then
unconditionally executes
`self.children.entry(from.clone()).or_default().push(to.clone());` and
`self.parents.entry(to).or_default().push(from);`, so `children_of("a")` and
`parents_of("b")` now each contain two identical entries and `edge_count()` at lines 96-99
returns 2.

3. Serialize this DAG with `let schema = DagSchema::from_dag(&dag, "1.0.0");` using the
implementation at `crates/byteport-dag/src/serialize.rs:99-125`, which iterates
`dag.children_of(node)` and inserts `(node.clone(), child.clone())` pairs into a
`BTreeSet<(String, String)>` (lines 108-116), implicitly deduplicating the duplicate `(a,
b)` edges.

4. Reconstruct a DAG from the schema via `let dag2 = schema.into_dag().unwrap();` as in
`crates/byteport-dag/src/serialize.rs:140-152` and observe that `dag2.edge_count()` is 1
while the original `dag.edge_count()` was 2, showing that allowing duplicate edges in
`add_edge` leads to inflated adjacency/edge counts in direct `Dag` usage and non-lossless
behavior when round-tripping through `DagSchema`.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/byteport-dag/src/dag.rs
**Line:** 77:78
**Comment:**
	*Logic Error: `add_edge` allows inserting the same `(from, to)` edge multiple times, which inflates `edge_count`, duplicates dependency entries, and then gets silently collapsed by `DagSchema::from_dag` deduplication, causing inconsistent graph behavior across modules. Reject duplicate edges before pushing.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

@codeant-ai

codeant-ai Bot commented Jun 25, 2026

Copy link
Copy Markdown

CodeAnt AI finished reviewing your PR.

@kilo-code-bot

kilo-code-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown

Code Review Summary

Status: Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 1
WARNING 2
SUGGESTION 4
Issue Details (click to expand)

CRITICAL

File Line Issue
crates/byteport-cli/Cargo.toml 2 Binary name mismatch: package is byteport-cli but no [[bin]] target exists, so Cargo will build as byteport-cli instead of advertised pheno-dag

WARNING

File Line Issue
crates/byteport-dag/src/serialize.rs 39 DagSerError missing Clone — public error type should implement Clone where practical per AGENTS.md
crates/byteport-dag/src/scheduler.rs 25 Schedule<K> missing Clone on public type

SUGGESTION

File Line Issue
crates/byteport-cli/src/main.rs 65 --name filter parsed but discarded via underscore pattern _name — not threaded into run_dag
crates/byteport-dag/src/topo.rs 35-39 Non-deterministic order: HashMap iteration randomizes zero-in-degree node selection
crates/byteport-dag/src/topo.rs 119 Recursive DFS in visit() may stack overflow on deep DAGs
crates/byteport-cli/src/main.rs 127 unwrap_or_default() creates Some("") for missing name, causing empty name: field in YAML output
crates/byteport-dag/src/dag.rs 77-78 add_edge allows duplicate edges, causing inconsistent edge counts
Files Reviewed (6 files)
  • crates/byteport-cli/Cargo.toml
  • crates/byteport-cli/src/main.rs
  • crates/byteport-dag/src/serialize.rs
  • crates/byteport-dag/src/scheduler.rs
  • crates/byteport-dag/src/topo.rs
  • crates/byteport-dag/src/dag.rs

Fix these issues in Kilo Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:compute-infra Phenotype compute/infra epic size:XXL This PR changes 1000+ lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant