F10: CLI subcommand \pheno-dag run <yaml>\ for DAG execution#238
F10: CLI subcommand \pheno-dag run <yaml>\ for DAG execution#238KooshaPari wants to merge 2 commits into
pheno-dag run <yaml>\ for DAG execution#238Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
CodeAnt AI is reviewing your PR. |
Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
|
Legacy Tooling Scan Report
No violations detected. This is a WARN-mode scan. Fix before strict enforcement begins. |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 35da1ed19a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| @@ -0,0 +1,14 @@ | |||
| [package] | |||
| name = "byteport-cli" | |||
There was a problem hiding this comment.
Expose the advertised pheno-dag binary
This package adds src/main.rs but no [[bin]] target, so Cargo will build/install the executable as byteport-cli rather than the advertised pheno-dag name (Cargo documents that src/main.rs defaults to the package name unless a [[bin]] table customizes it: https://doc.rust-lang.org/cargo/reference/cargo-targets.html#binaries). As a result, the documented pheno-dag run ... command will not exist after cargo build/cargo install; add a [[bin]] name = "pheno-dag" target or align the docs and Clap name with byteport-cli.
Useful? React with 👍 / 👎.
| match &cli.command { | ||
| DagCommand::Run { | ||
| yaml, | ||
| name: _name, |
There was a problem hiding this comment.
Wire the --name filter into execution
When callers pass pheno-dag run --name <pattern> ..., the parsed filter is discarded here and run_dag receives only the path and verbose flag, so the schedule still includes every node. Because the option is advertised as an execution filter, this silently ignores a user-supplied constraint; pass it into run_dag and filter/validate matching nodes, or remove the flag until it is implemented.
Useful? React with 👍 / 👎.
| let mut queue: VecDeque<&K> = in_degree | ||
| .iter() | ||
| .filter(|(_, deg)| **deg == 0) | ||
| .map(|(n, _)| *n) | ||
| .collect(); |
There was a problem hiding this comment.
Suggestion: The zero-in-degree queue is seeded by iterating a HashMap, which has randomized iteration order, so the produced topological order is non-deterministic across runs whenever multiple nodes are ready at once. This can make CLI output and downstream execution order unstable/flaky; use a deterministic structure (for example, collect candidates and sort, or use BTreeMap/BTreeSet) before enqueueing. [logic error]
Severity Level: Major ⚠️
- ⚠️ CLI `pheno-dag run` prints schedule buckets in random order.
- ⚠️ Re-running same YAML DAG yields different textual plans.
- ⚠️ Harder to diff or debug schedules across executions.Steps of Reproduction ✅
1. Run the CLI entrypoint `pheno-dag run <yaml>` which calls `run_dag()` in
`crates/byteport-cli/src/main.rs:75-133`. `run_dag()` deserializes the YAML into
`DagSchema` (`serialize.rs:93-153`), converts it into a `Dag<String>` via
`DagSchema::into_dag()` (`serialize.rs:140-153`), then calls `scheduler::schedule(&dag)`
at `main.rs:112-118`.
2. Inside `scheduler::schedule()` in `crates/byteport-dag/src/scheduler.rs:35-73`, the
first step is to compute a topological order via `let topo_order = topo::kahn_sort(dag)?;`
at `scheduler.rs:40`, invoking `kahn_sort()` in `crates/byteport-dag/src/topo.rs:19-65`.
3. `kahn_sort()` seeds its zero-in-degree queue using a `HashMap` iteration: `let mut
queue: VecDeque<&K> = in_degree.iter().filter(|(_, deg)| **deg == 0).map(|(n, _)|
*n).collect();` at `topo.rs:35-39`. Both the `in_degree` map (built from
`Dag::iter_nodes()` over a `HashSet` in `dag.rs:21-27,117-119`) and the `HashMap::iter()`
traversal have non-deterministic iteration order due to randomized hashing.
4. For any DAG with multiple zero-in-degree nodes (for example, the multi-source pattern
exercised in `scheduler` tests at `scheduler.rs:200-216`), successive runs of `pheno-dag
run` on the same YAML can yield different `topo_order` sequences and thus different bucket
contents/orderings when `schedule()` groups nodes by rank (`scheduler.rs:58-67`), leading
to non-deterministic schedule text printed by `format_schedule()` at `scheduler.rs:75-92`
and used in `run_dag()` at `main.rs:121-123`.(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/byteport-dag/src/topo.rs
**Line:** 35:39
**Comment:**
*Logic Error: The zero-in-degree queue is seeded by iterating a `HashMap`, which has randomized iteration order, so the produced topological order is non-deterministic across runs whenever multiple nodes are ready at once. This can make CLI output and downstream execution order unstable/flaky; use a deterministic structure (for example, collect candidates and sort, or use `BTreeMap`/`BTreeSet`) before enqueueing.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| format!("{:?}", child), | ||
| )); | ||
| } | ||
| DfsColor::White => visit(dag, child, color, order)?, |
There was a problem hiding this comment.
Suggestion: The DFS implementation is recursive and can overflow the process stack on long dependency chains (large DAG depth), causing a runtime crash. Replace recursion with an explicit stack-based iterative traversal to keep memory bounded on heap and avoid stack overflow. [possible bug]
Severity Level: Major ⚠️
- ❌ Potential process crash for deep DAGs using dfs_sort.
- ⚠️ Library consumers cannot rely on DFS for large graphs.Steps of Reproduction ✅
1. In any consumer (internal or external) of `byteport-dag`, construct a very deep linear
DAG using `Dag<String>` from `crates/byteport-dag/src/dag.rs:46-120`: repeatedly call
`Dag::add_node()` (`dag.rs:55-64`) and `Dag::add_edge()` (`dag.rs:70-79`) to form a chain
like `n1 -> n2 -> ... -> nN` where N is large (e.g., 20_000).
2. Call `dfs_sort(&dag)` from `crates/byteport-dag/src/topo.rs:81-97` on this deep DAG.
`dfs_sort()` initializes the color map (`topo.rs:85-86`) and then iterates all nodes
invoking the recursive helper `visit()` for white nodes at `topo.rs:88-91`.
3. Within `visit()` (`topo.rs:99-128`), each node's children are traversed, and for every
white child the function recurses via `DfsColor::White => visit(dag, child, color,
order)?` at `topo.rs:119`. On a path graph, this produces a recursion depth proportional
to the chain length (one stack frame per node).
4. With sufficiently large depth, the recursion exceeds the Rust thread's stack limit,
causing a runtime `stack overflow` panic rather than returning a `DagError`. Current unit
tests in `topo.rs:190-245` only exercise small graphs, so the overflow is not caught by
tests, but any production code that adopts `dfs_sort()` for large/deep DAGs would be
vulnerable to process crashes.(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/byteport-dag/src/topo.rs
**Line:** 119:119
**Comment:**
*Possible Bug: The DFS implementation is recursive and can overflow the process stack on long dependency chains (large DAG depth), causing a runtime crash. Replace recursion with an explicit stack-based iterative traversal to keep memory bounded on heap and avoid stack overflow.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| DagCommand::Run { | ||
| yaml, | ||
| name: _name, | ||
| verbose, | ||
| } => run_dag(yaml, *verbose), |
There was a problem hiding this comment.
Suggestion: The --name CLI option is parsed but never used, so pheno-dag run --name ... silently executes the full DAG instead of filtering nodes as documented. Thread this argument into run_dag and apply the filter before scheduling to match the command contract. [incomplete implementation]
Severity Level: Major ⚠️
- ❌ pheno-dag --name executes full DAG, ignoring filter.
- ⚠️ Users cannot limit runs to matching nodes.
- ⚠️ Automations relying on name filtering mis-execute extra steps.Steps of Reproduction ✅
1. Build the CLI crate so that `main()` in `crates/byteport-cli/src/main.rs:59-69` is the
entrypoint for the `pheno-dag` binary.
2. Run `pheno-dag run examples/ci-pipeline.yaml --name build` so Clap parses into
`DagCommand::Run { yaml, name: Some("build".into()), verbose: false }` via the
`DagCommand::Run` variant definition at `crates/byteport-cli/src/main.rs:38-52`.
3. In `main`, the match arm at `crates/byteport-cli/src/main.rs:62-67` destructures as
`name: _name` and calls `run_dag(yaml, *verbose)` without passing the parsed `name`,
effectively discarding the filter.
4. Inside `run_dag` at `crates/byteport-cli/src/main.rs:75-133`, the YAML is deserialized
and the full `Dag<String>` is scheduled via `scheduler::schedule(&dag)` without any
filtering, so the printed schedule includes all nodes even though `--name build` was
supplied.(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/byteport-cli/src/main.rs
**Line:** 63:67
**Comment:**
*Incomplete Implementation: The `--name` CLI option is parsed but never used, so `pheno-dag run --name ...` silently executes the full DAG instead of filtering nodes as documented. Thread this argument into `run_dag` and apply the filter before scheduling to match the command contract.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| let export = DagSchema::from_dag(&dag, &schema.version) | ||
| .with_name(schema.name.clone().unwrap_or_default()); |
There was a problem hiding this comment.
Suggestion: This always writes a name field in verbose export, even when the input DAG had no name, by converting None into an empty string. Preserve optionality instead of forcing Some(""), otherwise unnamed DAGs are serialized with a misleading blank name. [logic error]
Severity Level: Major ⚠️
- ⚠️ Verbose YAML export invents blank names for unnamed DAGs.
- ⚠️ Downstream tools misinterpret empty-name versus missing-name.Steps of Reproduction ✅
1. Create a YAML DAG file without a `name` field, e.g. just `version`, `nodes`, and
`edges`, and run `pheno-dag run <file> --verbose` so `run_dag` in
`crates/byteport-cli/src/main.rs:75-133` is executed.
2. `DagSchema::from_yaml(&contents)` at `crates/byteport-cli/src/main.rs:87-93`
deserializes into a `DagSchema` whose `name` field is `None` because `DagSchema` defines
`pub name: Option<String>` with `#[serde(default, skip_serializing_if =
"Option::is_none")]` in `crates/byteport-dag/src/serialize.rs:79-85`.
3. After scheduling, the verbose branch at `crates/byteport-cli/src/main.rs:125-127`
builds `export = DagSchema::from_dag(&dag,
&schema.version).with_name(schema.name.clone().unwrap_or_default())`; with `schema.name ==
None`, `unwrap_or_default()` produces `""`, and `with_name` sets `self.name =
Some(name.into())` per `crates/byteport-dag/src/serialize.rs:135-137`.
4. When `export.to_yaml()` runs at `crates/byteport-cli/src/main.rs:128-130`, serde sees
`name: Some("")` and, because `skip_serializing_if` only skips `None`, emits a `name: ""`
field, so the CLI's verbose YAML output now claims the DAG has a blank name even though
the original input had no `name` field at all.(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/byteport-cli/src/main.rs
**Line:** 126:127
**Comment:**
*Logic Error: This always writes a `name` field in verbose export, even when the input DAG had no name, by converting `None` into an empty string. Preserve optionality instead of forcing `Some("")`, otherwise unnamed DAGs are serialized with a misleading blank name.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| self.children.entry(from.clone()).or_default().push(to.clone()); | ||
| self.parents.entry(to).or_default().push(from); |
There was a problem hiding this comment.
Suggestion: add_edge allows inserting the same (from, to) edge multiple times, which inflates edge_count, duplicates dependency entries, and then gets silently collapsed by DagSchema::from_dag deduplication, causing inconsistent graph behavior across modules. Reject duplicate edges before pushing. [logic error]
Severity Level: Major ⚠️
- ⚠️ Dag edge_count differs before and after schema round-trip.
- ⚠️ Duplicate edges skew parent/child iteration and analytics.Steps of Reproduction ✅
1. In any crate using the DAG library, construct a graph with `let mut dag: Dag<String> =
Dag::new();` and add nodes `"a"` and `"b"` via `dag.add_node("a".into())` and
`dag.add_node("b".into())` as in the patterns tested at
`crates/byteport-dag/src/dag.rs:137-145`.
2. Call `dag.add_edge("a".into(), "b".into()).unwrap();` twice; the implementation at
`crates/byteport-dag/src/dag.rs:70-79` only checks that both endpoints exist, then
unconditionally executes
`self.children.entry(from.clone()).or_default().push(to.clone());` and
`self.parents.entry(to).or_default().push(from);`, so `children_of("a")` and
`parents_of("b")` now each contain two identical entries and `edge_count()` at lines 96-99
returns 2.
3. Serialize this DAG with `let schema = DagSchema::from_dag(&dag, "1.0.0");` using the
implementation at `crates/byteport-dag/src/serialize.rs:99-125`, which iterates
`dag.children_of(node)` and inserts `(node.clone(), child.clone())` pairs into a
`BTreeSet<(String, String)>` (lines 108-116), implicitly deduplicating the duplicate `(a,
b)` edges.
4. Reconstruct a DAG from the schema via `let dag2 = schema.into_dag().unwrap();` as in
`crates/byteport-dag/src/serialize.rs:140-152` and observe that `dag2.edge_count()` is 1
while the original `dag.edge_count()` was 2, showing that allowing duplicate edges in
`add_edge` leads to inflated adjacency/edge counts in direct `Dag` usage and non-lossless
behavior when round-tripping through `DagSchema`.(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/byteport-dag/src/dag.rs
**Line:** 77:78
**Comment:**
*Logic Error: `add_edge` allows inserting the same `(from, to)` edge multiple times, which inflates `edge_count`, duplicates dependency entries, and then gets silently collapsed by `DagSchema::from_dag` deduplication, causing inconsistent graph behavior across modules. Reject duplicate edges before pushing.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
CodeAnt AI finished reviewing your PR. |
Code Review SummaryStatus: Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)CRITICAL
WARNING
SUGGESTION
Files Reviewed (6 files)
|



User description
Summary
CodeAnt-AI Description
Add DAG file loading and parallel execution planning from the CLI
What Changed
Impact
✅ Run DAGs from YAML files✅ See which steps can run in parallel✅ Clearer errors for invalid or cyclic DAGs💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.