
fix(#1657): resolve three independent Windows papercuts #1670

Open
raman118 wants to merge 1 commit into Graphify-Labs:v8 from raman118:fix/issue-1657-windows-papercuts


raman118 commented Jul 4, 2026


1. The Accented Path Mojibake Massacre

• The Annoyance: If you had files with beautiful accented names (like Confirmé.md or Expérience.py), Windows would choke on them. Their names were written into manifest.json and GRAPH_REPORT.md as total garbage, i.e. mojibake (é turning into Ã©, for example).
• The Root Cause: Python's Path.read_text() and .write_text() calls without an explicit encoding fall back to the locale's default encoding. On Windows that is typically cp1252 (Windows-1252), so the UTF-8 bytes of accented characters were decoded as cp1252, a classic encoding mix-up.
• The Fix: We went into load_manifest and save_manifest inside graphify/detect.py and wrapped the read/write boundaries in explicit open(..., encoding="utf-8") blocks. Crucially, we also added ensure_ascii=False to json.dump(), so the JSON writer saves the raw UTF-8 characters to the file instead of converting them into ASCII escape sequences such as \u00e9.
• The Safety Net: We added test_manifest_and_report_preserve_accented_paths to check that accented names round-trip perfectly byte-for-byte in both
the manifest and the generated markdown report.
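The failure and the fix can be reproduced in a few lines. This is a minimal sketch, assuming the manifest is a plain JSON-serializable dict; the save_manifest/load_manifest signatures below are hypothetical and are not the actual ones in graphify/detect.py. It first shows the cp1252 mojibake, then round-trips accented names through a file opened explicitly as UTF-8.

```python
import json
import tempfile
from pathlib import Path

# The bug in miniature: UTF-8 bytes decoded as cp1252 produce mojibake.
garbled = "Confirmé.md".encode("utf-8").decode("cp1252")  # "ConfirmÃ©.md"


def save_manifest(manifest: dict, path: Path) -> None:
    # Hypothetical signature. Pinning the encoding means the locale default
    # (cp1252 on many Windows setups) is never consulted.
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes "é" itself rather than an ASCII escape.
        json.dump(manifest, f, ensure_ascii=False, indent=2)


def load_manifest(path: Path) -> dict:
    # Hypothetical signature. Read back with the same explicit encoding.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    original = {"files": ["Confirmé.md", "Expérience.py"]}
    save_manifest(original, manifest_path)
    round_tripped = load_manifest(manifest_path)
    raw = manifest_path.read_bytes()  # what actually landed on disk
```

Checking the raw bytes, not just the decoded dict, is what catches the ensure_ascii half of the fix: with the default ensure_ascii=True the dict would still round-trip, but the file would contain escape sequences instead of readable names.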

2. The Windows Console Encoding Meltdown

• The Annoyance: Running graphify query on Windows without manually setting PYTHONIOENCODING=utf-8 would lead to ugly console crashes (UnicodeEncodeError / UnicodeDecodeError) whenever non-ASCII characters were returned in the output.
• The Root Cause: Python's standard streams on Windows are not guaranteed to use UTF-8; redirected or piped output in particular falls back to the locale code page (typically cp1252). When Python tried to push characters that code page cannot represent to standard output, it threw a fit.
• The Fix: At the absolute top of the CLI entry point (graphify/main.py), before any output gets printed, we added a reconfiguration guard:
    import sys
    for stream in (sys.stdout, sys.stderr):
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8", errors="replace")
This forces both stdout and stderr to speak UTF-8, with errors="replace" substituting a placeholder instead of raising if a character still cannot be encoded, and it does so without breaking piped/redirected output.
• The Safety Net: We added test_query_cli_non_ascii_no_encoding_error in test_query_cli.py. It spawns a subprocess running the query CLI without PYTHONIOENCODING to simulate the raw Windows environment, confirming that it no longer crashes.
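The crash and the guard can be reproduced without a Windows machine by wrapping an in-memory buffer in a TextIOWrapper pinned to cp1252. This is a sketch only: fake_console, force_utf8, and try_write are hypothetical helpers, and io.BytesIO merely stands in for a real console or pipe.

```python
import io


def fake_console(encoding: str) -> io.TextIOWrapper:
    # Hypothetical stand-in for sys.stdout when it follows a legacy code page.
    return io.TextIOWrapper(io.BytesIO(), encoding=encoding)


def force_utf8(stream) -> None:
    # Same shape as the CLI guard: only streams that support reconfigure() are touched.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8", errors="replace")


def try_write(stream, text: str) -> bool:
    # True if the text was written, False if the stream's encoder rejected it.
    try:
        stream.write(text)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


legacy = fake_console("cp1252")
legacy_ok = try_write(legacy, "東京")  # cp1252 has no kanji, so encoding fails

fixed = fake_console("cp1252")
force_utf8(fixed)
fixed_ok = try_write(fixed, "東京")
fixed_bytes = fixed.buffer.getvalue()  # bytes that reached the underlying buffer
```

The hasattr check matters because sys.stdout is not always a TextIOWrapper: test harnesses and embedding hosts can swap in objects such as io.StringIO, which has no reconfigure() method, and the guard simply leaves those alone.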

3. The Import Cycles Noise Pollution

• The Annoyance: When running graphify on a corpus made up of documents (like a folder full of markdown papers), the report would still print a ## Import Cycles section. Because import cycles are a code-only concept, this was pure noise, usually reading "None detected" or parsing markdown paths incorrectly.
• The Root Cause: The report generator was blindly appending the Import Cycles section without checking whether the corpus actually contained code.
• The Fix: In graphify/report.py, we added a quick scan that counts the graph nodes representing code and sets has_code_nodes accordingly. If has_code_nodes is false, we skip building the ## Import Cycles section entirely.
• The Safety Net: We wrote two new tests: test_report_omits_import_cycles_on_document_only_corpus and test_report_includes_import_cycles_on_code_heavy_corpus. Now we explicitly assert that the cycles section is omitted on markdown/document-only trees, while remaining perfectly intact on code-heavy repos.
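A minimal sketch of the has_code_nodes gate follows. The node shape (a dict with a source_file key), the CODE_SUFFIXES set, and build_report are all hypothetical; graphify's real graph model and code-detection rule may well differ.

```python
from pathlib import Path

# Hypothetical: which file extensions count as "code". graphify's real rule may differ.
CODE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp"}


def has_code_nodes(nodes: list[dict]) -> bool:
    # Hypothetical node shape: {"id": ..., "source_file": "relative/path.ext"}.
    code_count = sum(
        1 for n in nodes if Path(n.get("source_file", "")).suffix.lower() in CODE_SUFFIXES
    )
    return code_count > 0


def build_report(nodes: list[dict], cycles: list[list[str]]) -> str:
    # Hypothetical, heavily simplified stand-in for the report generator.
    sections = ["# Graph Report", f"Nodes: {len(nodes)}"]
    if has_code_nodes(nodes):
        sections.append("## Import Cycles")
        if cycles:
            sections.extend(" -> ".join(cycle) for cycle in cycles)
        else:
            sections.append("None detected")
    # Document-only corpora never get the Import Cycles heading at all.
    return "\n\n".join(sections)
```

In this sketch the gate fires on the presence of any code node rather than a majority, so a mostly-documentation repo with even one source file still gets the section; whether that matches the PR's threshold is an assumption.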
