Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1,491 changes: 415 additions & 1,076 deletions Cargo.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,7 @@
resolver = "3"
members = [
"crates/byteport-transport",
"crates/byteport-dag",
"crates/byteport-cli",
"frontend/web/src-tauri",
]
14 changes: 14 additions & 0 deletions crates/byteport-cli/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
[package]
name = "byteport-cli"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expose the advertised pheno-dag binary

This package adds src/main.rs but no [[bin]] target, so Cargo will build/install the executable as byteport-cli rather than the advertised pheno-dag name (Cargo documents that src/main.rs defaults to the package name unless a [[bin]] table customizes it: https://doc.rust-lang.org/cargo/reference/cargo-targets.html#binaries). As a result, the documented pheno-dag run ... command will not exist after cargo build/cargo install; add a [[bin]] name = "pheno-dag" target or align the docs and Clap name with byteport-cli.

Useful? React with 👍 / 👎.

version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
description = "CLI for Phenotype compute/infra — DAG execution and orchestration"
publish = false

[dependencies]
byteport-dag = { path = "../byteport-dag" }
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_yaml = "0.9"
thiserror = "2.0"
214 changes: 214 additions & 0 deletions crates/byteport-cli/src/main.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,214 @@
//! # byteport-cli
//!
//! CLI for Phenotype compute/infra — DAG execution and orchestration.
//!
//! ## Subcommands
//!
//! | Subcommand | Description |
//! |-----------------------|-----------------------------------------------|
//! | `pheno-dag run <yaml>`| Parse a YAML DAG definition, schedule, execute |
//!
//! ## Example
//!
//! ```shell
//! # Run a DAG from a YAML file
//! pheno-dag run examples/ci-pipeline.yaml
//! ```

use std::fs;
use std::path::PathBuf;

use clap::{Parser, Subcommand};

use byteport_dag::dag::Dag;
use byteport_dag::scheduler;
use byteport_dag::serialize::DagSchema;

// ---------------------------------------------------------------------------
// CLI entry point
// ---------------------------------------------------------------------------

#[derive(Parser)]
#[command(name = "pheno-dag", about = "Phenotype compute/infra DAG automation", version)]
struct Cli {
#[command(subcommand)]
command: DagCommand,
}

#[derive(Subcommand)]
enum DagCommand {
/// Parse a YAML DAG definition, compute a schedule, and print execution plan
Run {
/// Path to the YAML file containing the DAG definition
yaml: PathBuf,

/// Optional DAG name filter (only execute nodes matching a pattern)
#[arg(short, long)]
name: Option<String>,

/// Enable verbose output including serialized schedule
#[arg(short, long)]
verbose: bool,
},
}

// ---------------------------------------------------------------------------
// Entry point
// ---------------------------------------------------------------------------

fn main() {
let cli = Cli::parse();

match &cli.command {
DagCommand::Run {
yaml,
name: _name,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wire the --name filter into execution

When callers pass pheno-dag run --name <pattern> ..., the parsed filter is discarded here and run_dag receives only the path and verbose flag, so the schedule still includes every node. Because the option is advertised as an execution filter, this silently ignores a user-supplied constraint; pass it into run_dag and filter/validate matching nodes, or remove the flag until it is implemented.

Useful? React with 👍 / 👎.

verbose,
} => run_dag(yaml, *verbose),
Comment on lines +63 to +67

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The --name CLI option is parsed but never used, so pheno-dag run --name ... silently executes the full DAG instead of filtering nodes as documented. Thread this argument into run_dag and apply the filter before scheduling to match the command contract. [incomplete implementation]

Severity Level: Major ⚠️
- ❌ pheno-dag --name executes full DAG, ignoring filter.
- ⚠️ Users cannot limit runs to matching nodes.
- ⚠️ Automations relying on name filtering mis-execute extra steps.
Steps of Reproduction ✅
1. Build the CLI crate so that `main()` in `crates/byteport-cli/src/main.rs:59-69` is the
entrypoint for the `pheno-dag` binary.

2. Run `pheno-dag run examples/ci-pipeline.yaml --name build` so Clap parses into
`DagCommand::Run { yaml, name: Some("build".into()), verbose: false }` via the
`DagCommand::Run` variant definition at `crates/byteport-cli/src/main.rs:38-52`.

3. In `main`, the match arm at `crates/byteport-cli/src/main.rs:62-67` destructures as
`name: _name` and calls `run_dag(yaml, *verbose)` without passing the parsed `name`,
effectively discarding the filter.

4. Inside `run_dag` at `crates/byteport-cli/src/main.rs:75-133`, the YAML is deserialized
and the full `Dag<String>` is scheduled via `scheduler::schedule(&dag)` without any
filtering, so the printed schedule includes all nodes even though `--name build` was
supplied.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/byteport-cli/src/main.rs
**Line:** 63:67
**Comment:**
	*Incomplete Implementation: The `--name` CLI option is parsed but never used, so `pheno-dag run --name ...` silently executes the full DAG instead of filtering nodes as documented. Thread this argument into `run_dag` and apply the filter before scheduling to match the command contract.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

}
}

// ---------------------------------------------------------------------------
// DAG execution logic
// ---------------------------------------------------------------------------

/// Parse a YAML file, build the DAG, compute a schedule, and print it.
fn run_dag(path: &PathBuf, verbose: bool) {
// 1. Read the YAML file
let contents = match fs::read_to_string(path) {
Ok(s) => s,
Err(e) => {
eprintln!("Error: cannot read {} — {}", path.display(), e);
std::process::exit(1);
}
};

// 2. Deserialize into DagSchema
let schema = match DagSchema::from_yaml(&contents) {
Ok(s) => s,
Err(e) => {
eprintln!("Error: failed to parse YAML — {}", e);
std::process::exit(2);
}
};

if verbose {
eprintln!("[info] parsed {}: {} nodes, {} edges", path.display(), schema.nodes.len(), schema.edges.len());
if let Some(ref name) = schema.name {
eprintln!("[info] DAG name: {}", name);
}
eprintln!("[info] schema version: {}", schema.version);
}

// 3. Convert schema into an executable Dag
let dag: Dag<String> = match schema.into_dag() {
Ok(d) => d,
Err(e) => {
eprintln!("Error: invalid DAG definition — {}", e);
std::process::exit(3);
}
};

// 4. Compute the parallel-bucket schedule
let schedule = match scheduler::schedule(&dag) {
Ok(s) => s,
Err(e) => {
eprintln!("Error: DAG contains a cycle — {}", e);
std::process::exit(4);
}
};

// 5. Print the schedule
println!("{}", scheduler::format_schedule(&schedule));

// 6. Verbose: dump the serialized plan as YAML
if verbose {
let export = DagSchema::from_dag(&dag, &schema.version)
.with_name(schema.name.clone().unwrap_or_default());
Comment on lines +126 to +127

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This always writes a name field in verbose export, even when the input DAG had no name, by converting None into an empty string. Preserve optionality instead of forcing Some(""), otherwise unnamed DAGs are serialized with a misleading blank name. [logic error]

Severity Level: Major ⚠️
- ⚠️ Verbose YAML export invents blank names for unnamed DAGs.
- ⚠️ Downstream tools misinterpret empty-name versus missing-name.
Steps of Reproduction ✅
1. Create a YAML DAG file without a `name` field, e.g. just `version`, `nodes`, and
`edges`, and run `pheno-dag run <file> --verbose` so `run_dag` in
`crates/byteport-cli/src/main.rs:75-133` is executed.

2. `DagSchema::from_yaml(&contents)` at `crates/byteport-cli/src/main.rs:87-93`
deserializes into a `DagSchema` whose `name` field is `None` because `DagSchema` defines
`pub name: Option<String>` with `#[serde(default, skip_serializing_if =
"Option::is_none")]` in `crates/byteport-dag/src/serialize.rs:79-85`.

3. After scheduling, the verbose branch at `crates/byteport-cli/src/main.rs:125-127`
builds `export = DagSchema::from_dag(&dag,
&schema.version).with_name(schema.name.clone().unwrap_or_default())`; with `schema.name ==
None`, `unwrap_or_default()` produces `""`, and `with_name` sets `self.name =
Some(name.into())` per `crates/byteport-dag/src/serialize.rs:135-137`.

4. When `export.to_yaml()` runs at `crates/byteport-cli/src/main.rs:128-130`, serde sees
`name: Some("")` and, because `skip_serializing_if` only skips `None`, emits a `name: ""`
field, so the CLI's verbose YAML output now claims the DAG has a blank name even though
the original input had no `name` field at all.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/byteport-cli/src/main.rs
**Line:** 126:127
**Comment:**
	*Logic Error: This always writes a `name` field in verbose export, even when the input DAG had no name, by converting `None` into an empty string. Preserve optionality instead of forcing `Some("")`, otherwise unnamed DAGs are serialized with a misleading blank name.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

match export.to_yaml() {
Ok(yaml) => println!("---\n{}", yaml),
Err(e) => eprintln!("Warning: could not serialize schedule to YAML — {}", e),
}
}
}

// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------

#[cfg(test)]
mod tests {
use super::*;
use clap::CommandFactory;

#[test]
fn cli_definition_is_valid() {
Cli::command().debug_assert();
}

#[test]
fn run_dag_with_valid_yaml_succeeds() {
// Build a minimal YAML dag
let yaml = r#"
version: "1.0.0"
name: "test-pipeline"
nodes:
- { id: "build" }
- { id: "test" }
- { id: "deploy" }
edges:
- { from: "build", to: "test" }
- { from: "test", to: "deploy" }
"#;
let tmp = std::env::temp_dir().join("_f10_test_dag.yaml");
fs::write(&tmp, yaml).expect("write temp YAML");

// Capture stdout
let saved_path = PathBuf::from(&tmp);
run_dag(&saved_path, false);

fs::remove_file(&tmp).ok();
}

#[test]
fn run_dag_with_missing_file_reports_error() {
let missing = PathBuf::from("/nonexistent/pipeline.yaml");
// We just verify the function exits with an error code.
// Since run_dag calls process::exit, we check that it logs via
// stderr and exits. For unit test coverage we exercise the
// pre-exit paths via helper assertions.
let result = fs::read_to_string(&missing);
assert!(result.is_err(), "missing file must produce an error");
}

#[test]
fn run_dag_with_invalid_yaml_reports_error() {
let bad_yaml = "version: 1.0.0\nnodes: [invalid";
let tmp = std::env::temp_dir().join("_f10_bad_dag.yaml");
fs::write(&tmp, bad_yaml).expect("write bad YAML");

let schema = DagSchema::from_yaml(&fs::read_to_string(&tmp).unwrap());
assert!(schema.is_err(), "malformed YAML must fail to parse");

fs::remove_file(&tmp).ok();
}

#[test]
fn run_dag_with_cycle_reports_error() {
let yaml = r#"
version: "1.0.0"
nodes:
- { id: "a" }
- { id: "b" }
- { id: "c" }
edges:
- { from: "a", to: "b" }
- { from: "b", to: "c" }
- { from: "c", to: "a" }
"#;
let schema = DagSchema::from_yaml(yaml).unwrap();
let dag = schema.into_dag().unwrap();
let result = scheduler::schedule(&dag);
assert!(result.is_err(), "cyclic DAG must fail to schedule");
}
}
13 changes: 13 additions & 0 deletions crates/byteport-dag/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
[package]
name = "byteport-dag"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
description = "DAG foundation: topological sort and parallel-bucket scheduler for Phenotype compute/infra"
publish = false

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"
serde_yaml = "0.9"
thiserror = "2.0"
Loading
Loading