feat(F3): enriched DAG schema with prereqs, acceptance, audit hooks by KooshaPari · Pull Request #240 · KooshaPari/BytePort

KooshaPari · 2026-06-25T15:51:32Z

User description

Summary

Add enriched DAG schema types to the BytePort DAG foundation, supporting prerequisites, acceptance criteria, and audit hooks for Phenotype compute/infra automation.

Context

The DAG crate (byteport-dag) provides core data structures (Dag, topological sort, parallel-bucket scheduler) but lacks domain-specific schema enrichment needed for Phenotype compute/infra. This change adds the schema layer so each node can carry pre-execution gating conditions, post-execution validation rules, and lifecycle audit hooks — all serializable to YAML/JSON for portability.

This is unit F3 in the DAG foundation + automation epic (epic_F).

Changes

New schema module (src/schema.rs): Enriched node/edge types with:
- Prerequisite enum (ImageReady, SecretAvailable, ResourceExists, ApiHealthy, FileExists, EnvironmentVariable, CustomScript)
- AcceptanceCriterion enum (ExitCode, OutputContains, HttpOk, MetricThreshold, LogCheck, CustomCheck)
- AuditHook enum (Webhook, LogEntry, MetricEmit, Notify) with HookTiming (Pre, Post, OnSuccess, OnFailure)
- SchemaNode with optional label, description, prerequisites, acceptance, audit_hooks, and arbitrary metadata
- SchemaEdge with optional label and condition
Updated serialize module (src/serialize.rs): DagSchema now uses the shared enriched types from schema module; expanded test coverage to 22 tests including round-trip and cross-format consistency
Workspace update (Cargo.toml): Added crates/byteport-dag to workspace members

Key Implementation Details

All schema types use #[serde(tag = "type")] internally-tagged enum representation for clean YAML/JSON output
PartialEq omitted from types containing f64 fields (MetricThreshold, MetricEmit) since f64 does not implement Eq
Schema types skip empty/None fields in serialization for compact output
Full YAML and JSON round-trip support with cross-format consistency

Use Cases

Define pre-flight checks: "image must be ready", "secret must exist", "API must be healthy" before a DAG node executes
Validate post-execution: "exit code 0", "output contains pattern", "HTTP health check passes"
Attach observability: emit metrics, write audit logs, fire webhooks, send notifications at configurable lifecycle points
Serialize enriched DAGs to YAML for version control and CI pipeline definitions

Testing

# Run all unit + doc tests
cargo test -p byteport-dag

# All 46 unit tests and 2 doc-tests pass

Links

Epic: epic_F — DAG foundation + automation
Area: compute-infra
Previous work: F5 (topological sort), F7 (artifact-aware scheduler)

CodeAnt-AI Description

Add a reusable DAG library with scheduling and portable schema output

What Changed

Added a generic DAG structure for adding nodes and edges, reading parent/child links, and catching duplicate nodes or missing nodes
Added two topological sort options and cycle checks so invalid dependency graphs are rejected
Added parallel scheduling that groups independent nodes into execution buckets and reports peak concurrency
Added enriched schema support for node prerequisites, acceptance checks, audit hooks, and edge conditions, with YAML and JSON round-trips

Impact

✅ Clearer DAG validation
✅ Faster parallel task planning
✅ Portable workflow definitions

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

gemini-code-assist · 2026-06-25T15:51:37Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

codeant-ai · 2026-06-25T15:51:38Z

CodeAnt AI is reviewing your PR.

codeant-ai · 2026-06-25T15:51:38Z

Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

socket-security · 2026-06-25T15:52:02Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	cargo/serde_yaml@0.9.34%2Bdeprecated
	cargo/log@0.4.33
	cargo/clap@4.6.1 ⏵ 4.5.61
	cargo/http@1.4.0 ⏵ 1.4.2
	cargo/uuid@1.23.1 ⏵ 1.20.0	⁺¹

View full report

socket-security · 2026-06-25T15:52:03Z

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action	Severity	Alert (click "▶" to expand/collapse)
Warn		Obfuscated code: cargo `zerocopy` is 90.0% likely obfuscated Confidence: 0.90 Location: Package overview From: `?` → `cargo/tauri-plugin-log@2.8.0` → `cargo/zerocopy@0.8.52` ℹ Read more on: This package \| This alert \| What is obfuscated code? Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at `support@socket.dev`. Suggestion: Packages should not obfuscate their code. Consider not using packages with obfuscated code. Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment `@SocketSecurity ignore cargo/zerocopy@0.8.52`. You can also ignore all packages with `@SocketSecurity ignore-all`. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

github-actions · 2026-06-25T15:52:06Z

Legacy Tooling Scan Report

Severity	Count
Critical	0
High	0
Medium	0
Low	0

No violations detected.

This is a WARN-mode scan. Fix before strict enforcement begins.

sonarqubecloud · 2026-06-25T15:52:13Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codeant-ai · 2026-06-25T15:54:33Z

+        self.children.entry(from.clone()).or_default().push(to.clone());
+        self.parents.entry(to).or_default().push(from);


Suggestion: Duplicate edges are appended without any deduplication check, so repeated add_edge(from, to) calls create parallel duplicates in both adjacency lists. This inflates edge_count and causes contract inconsistencies with serialization (from_dag deduplicates edges via BTreeSet), leading to lossy round-trips and unexpected graph behavior. Reject or ignore already-existing edges before pushing. [logic error]

Severity Level: Major ⚠️

- ⚠️ Duplicate edges inflate `Dag::edge_count` unexpectedly. - ⚠️ Serialization round-trip drops duplicates, losing original structure.

Steps of Reproduction ✅

1. In a consumer or test, construct a DAG as in `crates/byteport-dag/src/dag.rs:168-173`: create `let mut dag = Dag::new();`, add nodes `"a"` and `"b"` via `dag.add_node("a").unwrap();` and `dag.add_node("b").unwrap();`, then call `dag.add_edge("a", "b").unwrap();` twice; each call executes `add_edge` at `dag.rs:69-78`, where lines 76-77 append `to` and `from` without any deduplication, resulting in `children["a"] == ["b", "b"]` and `parents["b"] == ["a", "a"]`. 2. Calling `dag.edge_count()` at `crates/byteport-dag/src/dag.rs:95-98` on this DAG sums the lengths of all `children` vectors, returning `2` for the single logical edge `"a" -> "b"` because it is stored twice, inflating edge metrics compared to a unique-edge interpretation. 3. Serializing this DAG with `DagSchema::from_dag(&dag, "1.0.0")` at `crates/byteport-dag/src/serialize.rs:81-120` uses a `BTreeSet<(String, String)>` (`edges_set` at lines 95-103) to collect edges; inserting `(node.clone(), child.clone())` from `dag.iter_nodes()` and `dag.children_of(node)` deduplicates the two `"a" -> "b"` entries, so `DagSchema.edges.len()` becomes `1` even though `dag.edge_count()` was `2`. 4. If a consumer then reconstructs a DAG via `schema.into_dag()` at `serialize.rs:131-139`, it iterates over `self.edges` and calls `dag.add_edge(edge.from.clone(), edge.to.clone())?;` once per schema edge, producing a new DAG with `edge_count() == 1` despite the original DAG having `edge_count() == 2`, demonstrating that duplicate adjacency entries are silently dropped by `from_dag`/`into_dag` round-trip and leading to inconsistent edge counts and behavior between the core `Dag` structure and the schema/serialization layer.

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** crates/byteport-dag/src/dag.rs **Line:** 76:77 **Comment:** *Logic Error: Duplicate edges are appended without any deduplication check, so repeated `add_edge(from, to)` calls create parallel duplicates in both adjacency lists. This inflates `edge_count` and causes contract inconsistencies with serialization (`from_dag` deduplicates edges via `BTreeSet`), leading to lossy round-trips and unexpected graph behavior. Reject or ignore already-existing edges before pushing. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-06-25T15:55:29Z

+        let nodes: Vec<SchemaNode> = dag
+            .iter_nodes()
+            .map(|id| SchemaNode {


Suggestion: Nodes are emitted in raw HashSet iteration order, which is non-deterministic across runs. This makes YAML/JSON output unstable for the same logical graph, causing flaky snapshots, noisy diffs, and non-reproducible artifacts; sort node IDs before building nodes. [logic error]

Severity Level: Critical 🚨

- ❌ YAML schema output unstable across identical DAG executions. - ⚠️ Snapshot tests and config diffs become flaky, non-reproducible.

Steps of Reproduction ✅

1. In consumer code, build a DAG using the public API in `crates/byteport-dag/src/dag.rs`: create `let mut dag: Dag<String> = Dag::new();` (line 51) and add several nodes via `Dag::add_node` (lines 55–63), as shown in the crate-level example in `crates/byteport-dag/src/lib.rs:21–26`. 2. Serialize this DAG to a schema by calling `DagSchema::from_dag(&dag, "2.0.0")` (definition at `crates/byteport-dag/src/serialize.rs:81–120`), which constructs `nodes: Vec<SchemaNode>` using `dag.iter_nodes()` (lines 82–83) and `SchemaNode` mapping (line 84 onward); `iter_nodes()` is implemented in `Dag::iter_nodes` (dag.rs:115–117) as `self.nodes.iter()` over a `HashSet<K>` (dag.rs:23–30). 3. Because `Dag.nodes` is a `HashSet<K>` and `HashSet::iter()` order is hash-dependent and not guaranteed to be stable across processes, two separate runs of the same binary that both call `DagSchema::from_dag` on logically identical DAGs can produce `nodes` vectors in different orders (even if node insertion order was the same). 4. When callers serialize the schema via `DagSchema::to_yaml()` (serialize.rs:146–149) or `to_json_pretty()` (lines 165–167), the YAML/JSON array for `nodes` will reflect this non-deterministic iteration order, causing unstable serialized output that makes snapshot tests, config diffs, or stored artifacts noisy and non-reproducible even though the underlying graph is unchanged.

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** crates/byteport-dag/src/serialize.rs **Line:** 82:84 **Comment:** *Logic Error: Nodes are emitted in raw `HashSet` iteration order, which is non-deterministic across runs. This makes YAML/JSON output unstable for the same logical graph, causing flaky snapshots, noisy diffs, and non-reproducible artifacts; sort node IDs before building `nodes`. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-06-25T15:55:29Z

+        let mut edges_set: BTreeSet<(String, String)> = BTreeSet::new();
+        for node in dag.iter_nodes() {
+            if let Some(children) = dag.children_of(node) {
+                for child in children {
+                    edges_set.insert((node.clone(), child.clone()));


Suggestion: from_dag silently deduplicates edges by inserting them into a BTreeSet, but Dag currently allows duplicate edges. This makes serialization lossy: from_dag(...).into_dag() can return a graph with fewer edges than the input, breaking round-trip expectations for callers that rely on edge multiplicity or exact edge_count. Preserve edges as-is (or reject duplicates explicitly at DAG insertion time) so behavior is consistent. [api mismatch]

Severity Level: Major ⚠️

- ❌ DagSchema round-trip drops duplicate edges from input DAG. - ⚠️ Edge-count based logic misbehaves after schema serialization.

Steps of Reproduction ✅

1. Construct a DAG with duplicate edges using the public API in `crates/byteport-dag/src/dag.rs`: create a `Dag<String>` via `Dag::new()` (line 51) and call `Dag::add_node` (lines 55–63) for nodes `"a"` and `"b"`, then call `Dag::add_edge("a".into(), "b".into())` twice (lines 66–79) so that `children` holds `["b", "b"]` and `edge_count()` (lines 95–98) returns `2`. 2. Convert this DAG into a schema by calling `DagSchema::from_dag(&dag, "2.0.0")` (constructor defined at `crates/byteport-dag/src/serialize.rs:81–120`), which builds `nodes` from `dag.iter_nodes()` (lines 82–93) and collects edges into `edges_set: BTreeSet<(String, String)>` at lines 95–101. 3. Observe in `from_dag` that for each parent/child pair the code inserts `(node.clone(), child.clone())` into `edges_set` (line 100), so duplicate edges from the original `Dag` are deduplicated when `edges_set.into_iter()` is mapped into `Vec<SchemaEdge>` (lines 104–112), yielding only one `SchemaEdge { from: "a", to: "b", .. }`. 4. Reconstruct a DAG from the schema with `DagSchema::into_dag()` (lines 128–140), which calls `Dag::add_node` for each schema node (lines 133–135) and `Dag::add_edge` once per `SchemaEdge` (lines 136–138), then compare `edge_count()` (dag.rs:95–98) between the original `Dag` (2) and reconstructed `Dag` (1) to see that `from_dag(...).into_dag()` is lossy whenever callers rely on edge multiplicity or exact edge counts.

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** crates/byteport-dag/src/serialize.rs **Line:** 96:100 **Comment:** *Api Mismatch: `from_dag` silently deduplicates edges by inserting them into a `BTreeSet`, but `Dag` currently allows duplicate edges. This makes serialization lossy: `from_dag(...).into_dag()` can return a graph with fewer edges than the input, breaking round-trip expectations for callers that rely on edge multiplicity or exact `edge_count`. Preserve edges as-is (or reject duplicates explicitly at DAG insertion time) so behavior is consistent. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-06-25T15:55:29Z

+                        format!("{:?}", child),
+                    ));
+                }
+                DfsColor::White => visit(dag, child, color, order)?,


Suggestion: The DFS implementation is recursive, so very deep DAGs can overflow the call stack and abort the process at runtime. Use an explicit stack-based iterative DFS (or document/enforce a depth limit) to avoid stack-overflow crashes on large dependency chains. [possible bug]

Severity Level: Major ⚠️

- ❌ dfs_sort panics on deep DAGs via stack overflow. - ⚠️ Long dependency chains unsafe unless using kahn_sort alternative.

Steps of Reproduction ✅

1. In consumer code, construct a very deep DAG (e.g., a long linear chain) using the public `Dag` API in `crates/byteport-dag/src/dag.rs`: create `Dag<K>` via `Dag::new()` (line 51), repeatedly call `add_node` (lines 55–63) for nodes `n0..nN`, and call `add_edge` (lines 66–79) for each consecutive pair `(ni, ni+1)` to build a chain thousands of nodes deep. 2. Call the public topological sort function `topo::dfs_sort(&dag)` defined in `crates/byteport-dag/src/topo.rs:81–97`, which initializes the `color` map and `order` vector, then iterates `for start in dag.iter_nodes()` (line 88) and calls `visit(dag, start, &mut color, &mut order)?` (line 90) for white nodes. 3. Inside `visit` (topo.rs:99–128), for each node it iterates through `dag.children_of(node)` (lines 110–112), and on encountering a white child it recursively calls `visit(dag, child, color, order)?` (line 119), causing the call stack depth to grow in proportion to the longest path length in the DAG. 4. For sufficiently deep DAGs, these recursive `visit` calls form a call chain of length N, and on typical systems a large N (e.g., tens of thousands) will exhaust the process stack and trigger a stack overflow panic, aborting the process at runtime for callers that rely on `dfs_sort` instead of the non-recursive `kahn_sort`.

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** crates/byteport-dag/src/topo.rs **Line:** 119:119 **Comment:** *Possible Bug: The DFS implementation is recursive, so very deep DAGs can overflow the call stack and abort the process at runtime. Use an explicit stack-based iterative DFS (or document/enforce a depth limit) to avoid stack-overflow crashes on large dependency chains. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-06-25T15:55:33Z

CodeAnt AI finished reviewing your PR.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 46a18222a1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T15:55:58Z

+        self.children.entry(from.clone()).or_default().push(to.clone());
+        self.parents.entry(to).or_default().push(from);


Reject cycle-forming edges

When a caller adds an edge that closes a path back to its source, such as a -> b -> c followed by c -> a, this method returns Ok(()) and mutates both adjacency lists, so the public Dag type can contain cycles despite being documented as acyclic and despite exposing CycleDetected. This also lets DagSchema::into_dag() accept cyclic schemas as valid Dags until a later scheduler/toposort call happens to catch them.

Useful? React with 👍 / 👎.

feat(F3): enriched DAG schema with prereqs, acceptance, audit hooks

46a1822

KooshaPari added the area:compute-infra Phenotype compute/infra epic label Jun 25, 2026

codeant-ai Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Jun 25, 2026

codeant-ai Bot reviewed Jun 25, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 25, 2026

View reviewed changes

		self.children.entry(from.clone()).or_default().push(to.clone());
		self.parents.entry(to).or_default().push(from);

Conversation

KooshaPari commented Jun 25, 2026 • edited by codeant-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Summary

Context

Changes

Key Implementation Details

Use Cases

Testing

Links

CodeAnt-AI Description

What Changed

Impact

Checking Your Pull Request

Talking to CodeAnt AI

Example

Preserve Org Learnings with CodeAnt

Example

Retrigger review

Check Your Repository Health

Uh oh!

gemini-code-assist Bot commented Jun 25, 2026

Uh oh!

codeant-ai Bot commented Jun 25, 2026

Uh oh!

codeant-ai Bot commented Jun 25, 2026

Thanks for using CodeAnt! 🎉

Uh oh!

socket-security Bot commented Jun 25, 2026

Uh oh!

socket-security Bot commented Jun 25, 2026

Uh oh!

github-actions Bot commented Jun 25, 2026

Uh oh!

sonarqubecloud Bot commented Jun 25, 2026

Quality Gate passed

Uh oh!

codeant-ai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot commented Jun 25, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

KooshaPari commented Jun 25, 2026 •

edited by codeant-ai Bot

Loading