Second of the two 0.9.6 follow-ups (see the zero-nodes-in-batch issue filed alongside — they may share a parallel-processing root cause).
Symptom
Two consecutive graphify update . runs on an unchanged corpus produce graph.json files that differ only in community assignments:
- node-id set: identical
- link multiset (all fields): identical
- community: differs on 196 of 11,819 nodes
So the semantic graph is deterministic, but the artifact is not byte-reproducible.
It appears intermittent: one other consecutive pair of runs produced byte-identical output. That pattern is consistent with an unseeded RNG or parallel-ordering dependence in the community-detection step (Louvain/Leiden-style methods are tie-break sensitive).
On 0.9.5 we never observed this — we actually used file-hash equality of double runs as an acceptance test for our own append pipeline, and it held.
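For reference, the double-run acceptance check mentioned above can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline code: the helper names `file_sha256` and `byte_identical` are hypothetical, and the two paths are whatever copies of graph.json were saved after each run.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Usage would be something like `byte_identical("run1/graph.json", "run2/graph.json")`, where both paths are placeholders for copies taken after two consecutive runs on the same corpus.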
Ask
A fixed seed and/or deterministic tie-breaking in community detection, so identical input produces byte-identical graph.json. Reproducible output is useful for caching, CI artifact diffing, and downstream tools that append to the graph (our case).
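To make the request concrete, here is a toy sketch of the two options. It is not graphify's code and assumes nothing about which community-detection library is used; `pick_community` and its `gains` argument are hypothetical names for the step where a node chooses among candidate communities with equal score.

```python
import random


def pick_community(gains, rng=None):
    """Pick the community with the highest gain.

    gains: dict mapping community id -> gain (hypothetical structure).
    rng:   optional random.Random used to break ties.
    """
    best = max(gains.values())
    # Sorting the tied candidates removes any dependence on dict/set
    # iteration order or on the order in which workers reported results.
    tied = sorted(c for c, g in gains.items() if g == best)
    if rng is None:
        return tied[0]  # deterministic tie-break: lowest id wins
    return rng.choice(tied)  # reproducible only if rng is seeded


# Option 1: deterministic tie-breaking, no RNG involved.
choice = pick_community({3: 0.5, 1: 0.5, 2: 0.1})  # community 1

# Option 2: a fixed seed, so repeated runs draw the same sequence.
seeded = pick_community({3: 0.5, 1: 0.5, 2: 0.1}, random.Random(42))
```

Either approach would be enough for our purposes, as long as the candidates are visited in a stable order before the tie is resolved.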
Environment
- graphifyy 0.9.6 (uv tool install), macOS (Darwin 25.5)
- Rails repo: ~11,800 nodes / ~19,800 links
- Diff method: parse both files, compare node-id sets, per-node field diff, link multiset comparison
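The diff method above can be sketched as follows. This is a simplified reconstruction rather than the exact script we ran, and it assumes graph.json has top-level "nodes" and "links" lists with an "id" field on each node; adjust the keys if the actual layout differs. The function names are hypothetical.

```python
import json
from collections import Counter

_MISSING = object()  # distinguishes "field absent" from "field is null"


def load_graph(path):
    """Parse a graph.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _freeze(obj):
    """Canonical string form so dict-shaped links can be counted."""
    return json.dumps(obj, sort_keys=True)


def diff_graphs(a, b):
    """Compare node-id sets, per-node fields, and link multisets."""
    nodes_a = {n["id"]: n for n in a["nodes"]}
    nodes_b = {n["id"]: n for n in b["nodes"]}

    # Count, per field name, how many shared nodes differ in that field.
    field_diffs = Counter()
    for node_id in nodes_a.keys() & nodes_b.keys():
        na, nb = nodes_a[node_id], nodes_b[node_id]
        for field in na.keys() | nb.keys():
            if na.get(field, _MISSING) != nb.get(field, _MISSING):
                field_diffs[field] += 1

    # Multiset comparison: duplicates count, order does not.
    links_a = Counter(_freeze(link) for link in a["links"])
    links_b = Counter(_freeze(link) for link in b["links"])

    return {
        "same_node_ids": nodes_a.keys() == nodes_b.keys(),
        "field_diffs": dict(field_diffs),
        "same_links": links_a == links_b,
    }
```

On the two differing runs, a comparison of this shape reports identical node ids, identical links, and community as the only differing field.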