feat(dag): YAML/JSON serde round-trip serialization (F4)#236
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
CodeAnt AI is reviewing your PR. |
Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
|
Legacy Tooling Scan Report
No violations detected. This is a WARN-mode scan. Fix before strict enforcement begins. |
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f33ebad724
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| /// Deserialize from a YAML string. | ||
| pub fn from_yaml(yaml: &str) -> Result<Self, DagSerError> { | ||
| Ok(serde_yaml::from_str(yaml)?) |
There was a problem hiding this comment.
Validate deserialized DAG schemas before returning
For YAML inputs that contain an invalid graph (for example a self-loop a -> a, a cycle, or duplicate edges), this path returns Ok after only serde parsing; into_dag() later also accepts the self-loop/duplicates because add_edge only checks endpoint existence. That means config-driven callers can load a DagSchema/Dag that is not actually a valid DAG, despite the new serialization layer being the validation boundary. The JSON loader below has the same issue and should share the same schema validation before returning.
Useful? React with 👍 / 👎.
| self.children.entry(from.clone()).or_default().push(to.clone()); | ||
| self.parents.entry(to).or_default().push(from); |
There was a problem hiding this comment.
Suggestion: Duplicate edges are inserted without checks, which duplicates entries in children/parents and inflates edge_count. This also breaks serialization round-trip fidelity because DagSchema::from_dag deduplicates edges via BTreeSet, so repeated edges are silently lost; reject duplicates on insert (or switch adjacency storage to sets). [logic error]
Severity Level: Major ⚠️
- ❌ Serialization drops duplicate edges, changing graph semantics.
- ⚠️ Edge counts inconsistent between DAG and serialized schema.Steps of Reproduction ✅
1. Construct a DAG similarly to `sample_dag()` in
`crates/byteport-dag/src/serialize.rs:198-206`, but call `dag.add_edge("build".into(),
"test".into())` twice using `Dag::add_edge()` implemented at
`crates/byteport-dag/src/dag.rs:70-79`.
2. Call `dag.edge_count()` (`crates/byteport-dag/src/dag.rs:96-99`) and observe that it
returns a higher count than the logical number of distinct edges because the duplicate
`"build" → "test"` edge has been appended twice to both `children` and `parents` via the
lines at `dag.rs:77-78`.
3. Serialize this DAG using `DagSchema::from_dag(&dag, "1.0.0")`
(`crates/byteport-dag/src/serialize.rs:93-132`); the edge collection uses a
`BTreeSet<(String, String)>` at `serialize.rs:108-117`, so only one `(\"build\",
\"test\")` pair is retained in `schema.edges`.
4. Round-trip via `let dag2 = schema.into_dag().unwrap()`
(`crates/byteport-dag/src/serialize.rs:144-152`) and compare `dag.edge_count()` vs
`dag2.edge_count()`: the reconstructed DAG has fewer edges, confirming that duplicate
edges are silently dropped across serialization and that the in-memory graph and its
serialized representation diverge.(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/byteport-dag/src/dag.rs
**Line:** 77:78
**Comment:**
*Logic Error: Duplicate edges are inserted without checks, which duplicates entries in `children`/`parents` and inflates `edge_count`. This also breaks serialization round-trip fidelity because `DagSchema::from_dag` deduplicates edges via `BTreeSet`, so repeated edges are silently lost; reject duplicates on insert (or switch adjacency storage to sets).
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| pub fn from_yaml(yaml: &str) -> Result<Self, DagSerError> { | ||
| Ok(serde_yaml::from_str(yaml)?) |
There was a problem hiding this comment.
Suggestion: Deserialization methods only parse syntax and return DagSchema without any structural validation, so malformed payloads (self-loops, duplicate edges, dangling references) are accepted at API boundary. Add invariant checks during from_yaml/from_json (or a required validation step) so invalid schemas are rejected immediately. [incomplete implementation]
Severity Level: Critical 🚨
- ❌ Invalid DAG schemas accepted by from_yaml/from_json.
- ⚠️ Cycles detected late, complicating error handling.
- ⚠️ Deserialization boundary lacks DAG integrity checks.Steps of Reproduction ✅
1. Create a YAML document that matches the `DagSchema` layout from
`crates/byteport-dag/src/serialize.rs:79-91` but includes a structurally invalid edge,
e.g.:
- `nodes: [{ id: "a" }]`
- `edges: [{ from: "a", to: "a" }]` (self-loop) or a duplicate edge listed twice.
2. Pass this YAML string into `DagSchema::from_yaml(yaml_str)` implemented at
`crates/byteport-dag/src/serialize.rs:165-167`; deserialization succeeds and returns
`Ok(DagSchema)` because `from_yaml()` only wraps `serde_yaml::from_str` with no structural
validation.
3. Convert the schema into an internal DAG via `DagSchema::into_dag()`
(`crates/byteport-dag/src/serialize.rs:144-152`), which in turn calls
`Dag::add_edge(edge.from.clone(), edge.to.clone())` at
`crates/byteport-dag/src/dag.rs:70-79`; this accepts self-loops and duplicate edges (it
only checks endpoint existence), producing a `Dag<String>` that violates acyclicity and
uniqueness invariants.
4. Later, when this DAG is used by `topo::kahn_sort(&dag)`
(`crates/byteport-dag/src/topo.rs:15-65`) or `scheduler::schedule(&dag)`
(`crates/byteport-dag/src/scheduler.rs:35-73`), those functions fail with
`DagError::CycleDetected(..)` for self-loops or behave inconsistently for duplicate edges,
demonstrating that malformed payloads are admitted at the deserialization boundary and
only rejected (or cause anomalies) deeper in the execution pipeline.(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/byteport-dag/src/serialize.rs
**Line:** 165:166
**Comment:**
*Incomplete Implementation: Deserialization methods only parse syntax and return `DagSchema` without any structural validation, so malformed payloads (self-loops, duplicate edges, dangling references) are accepted at API boundary. Add invariant checks during `from_yaml`/`from_json` (or a required validation step) so invalid schemas are rejected immediately.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
CodeAnt AI finished reviewing your PR. |



User description
Summary
Add YAML/JSON serde-based round-trip serialization for the DAG schema, enabling portable DAG interchange via YAML and JSON formats.
Context
This implements unit F4 of epic_F (DAG foundation + automation). The byteport-dag crate already provides core DAG types (
Dag<K>,Node<K>,Edge) with topological sort and validation. This PR adds serialization support so DAGs can be exported/imported using standard YAML and JSON formats — critical for config-driven pipeline definitions, persistence, and cross-service DAG handoff.Changes
crates/byteport-dag/Cargo.toml— addedserde,serde_yaml,serde_jsondependenciesCargo.toml— addedcrates/byteport-dagto workspace memberscrates/byteport-dag/src/serialize.rs— new serialization module with:#[derive(Serialize, Deserialize)]on all DAG types (Dag,Node,Edge)dag.to_yaml()/Dag::from_yaml()for YAML round-tripdag.to_json()/Dag::from_json()for JSON round-tripDagSchema<K>andDagDocument<K>types for clean serialization outputcrates/byteport-dag/src/lib.rs— exposedserializemodule and re-exported convenience methodsKey Implementation Details
DagSchema<K>as the serialization intermediary withnodes(map) andedges(array) at the top levelUse Cases
Testing
Links
CodeAnt-AI Description
Add DAG scheduling, sorting, and YAML/JSON export support
What Changed
Impact
✅ Clearer DAG validation errors✅ Faster parallel task planning✅ Easier DAG import and export💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.