feat: add CI sanity check for cross-reference validation #48

Merged
jinsonvarghese merged 4 commits into OWASP:main from rashim27us:feature/cross-validation-ci
May 4, 2026

@rashim27us
Contributor

Summary

This PR adds automated CI validation for > See also: cross-references across the APTS documentation.

Problem

The APTS documentation uses APTS-XX-NNN IDs inside > See also: blocks to link related requirements across domains.

However, there was no automated check to verify that these referenced IDs actually exist. That means broken, stale, or mistyped requirement references could silently remain in the documentation.

Solution

This PR introduces a standalone validation script and wires it into CI so invalid cross-references are caught automatically during pull requests and main branch pushes.

Changes

  1. Added scripts/validate_cross_references.py
  • Scans all Markdown files in the repository
  • Discovers valid defined requirement IDs
  • Parses > See also: blocks
  • Validates that each referenced APTS-XX-NNN ID exists
  • Exits with a non-zero status if invalid references are found
  • Uses only the Python standard library
  2. Updated .github/workflows/ci.yml
  • Added a Validate cross-references CI step
  • Runs the validation script as part of the existing CI workflow
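
The steps listed for the script can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names are hypothetical, the ID pattern is inferred from the ID shapes mentioned in this thread, and the idea that a requirement is "defined" by a Markdown heading containing its ID is an assumption. It works on an in-memory mapping of path to text so it stays self-contained.

```python
from __future__ import annotations

import re

# Assumed ID shapes: APTS-SE-001 (requirement) or APTS-SE-A01 (appendix style).
ID_PATTERN = re.compile(r"\bAPTS-[A-Z]{2}-(?:\d{3}|A\d{2})\b")
# Assumption: a requirement is defined by a Markdown heading that contains its ID.
HEADING = re.compile(r"^#{1,6}\s.*?\b(APTS-[A-Z]{2}-(?:\d{3}|A\d{2}))\b")
# Matches "> See also:" and "> **See also:**" blockquote lines.
SEE_ALSO = re.compile(r"^>\s*(?:\*\*)?See also:")


def collect_defined_ids(docs: dict[str, str]) -> set[str]:
    """Return every ID that appears in a heading (hypothetical definition rule)."""
    defined: set[str] = set()
    for text in docs.values():
        for line in text.splitlines():
            match = HEADING.match(line)
            if match:
                defined.add(match.group(1))
    return defined


def find_broken_see_also(docs: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, ID) for each See also reference with no definition."""
    defined = collect_defined_ids(docs)
    broken: list[tuple[str, int, str]] = []
    for path, text in docs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if not SEE_ALSO.match(line):
                continue
            for ref in ID_PATTERN.findall(line):
                if ref not in defined:
                    broken.append((path, lineno, ref))
    return broken
```

A CI wrapper would print each tuple and exit non-zero when the list is non-empty, which is what makes the workflow step fail.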

Impact

  • Prevents broken See also references from being merged
  • Catches typos in requirement IDs early
  • Reduces manual review effort
  • Improves documentation reliability and maintainability as APTS grows

@jinsonvarghese
Member

Hi @rashim27us, good idea; automated cross-reference validation would catch real bugs. A few things to address before this is ready.

Scope is too narrow
The script only validates references inside > **See also:** blocks, but cross-references also appear in "Related Requirements and Appendices" sections in appendix templates (Authority Delegation Matrix, Shift Handoff Template, Autonomy Downgrade Matrix, etc.) and in inline mentions throughout implementation guides. Expanding the scan to cover all APTS-XX-NNN references across all markdown files would make this much more useful.

Follow existing script conventions (not a blocker, but nice to have)
The other four scripts in scripts/ all follow the same pattern. To stay consistent:

  • Add from __future__ import annotations
  • Use _ci_utils helpers (git_ls_files, read_text, display_path) instead of raw rglob and open(). git_ls_files only scans tracked files, which avoids false positives from untracked content.
  • Use def main() -> int: with return 0 / return 1 and raise SystemExit(main()) instead of sys.exit()
  • Remove the unused import os (ruff will flag this as F401)

@rashim27us
Contributor Author

Thanks @jinsonvarghese for the review. I have implemented the changes you suggested. Kindly review them and let me know if anything else is needed before merging.

@jinsonvarghese
Member

Thank you @rashim27us. The updated script addresses the earlier feedback well; it now follows the existing code conventions, scans all lines instead of just "See also" blocks, and ruff passes clean.

One thing to address:

  • The PLACEHOLDER_REQUIREMENT_IDS in _ci_utils.py hardcodes four IDs (APTS-XX-NNN, APTS-XX-ANN, APTS-SE-027, APTS-SE-A01) to suppress false positives. These are all format examples in CONTRIBUTING.md, README.md, and index.md, not actual cross-references. The issue is that if someone later adds a broken reference to APTS-SE-027 in a standard document, the script would silently skip it.

A simpler fix:

  • The script already limits definition collection to standard/ files, but runs validation against all markdown files. If you limit the validation loop to standard/ files too, all six false positives go away and the placeholder list can be removed entirely. The files outside standard/ only contain format examples, not real cross-references.
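
As a rough illustration of that suggestion, the sketch below restricts the validation loop to paths under standard/. It is not the script from this PR: the function names, the ID pattern, and the assumption that paths are repo-relative POSIX paths are all hypothetical.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Assumed ID shapes: APTS-SE-001 (requirement) or APTS-SE-A01 (appendix style).
ID_PATTERN = re.compile(r"\bAPTS-[A-Z]{2}-(?:\d{3}|A\d{2})\b")


def in_standard(path: str) -> bool:
    """True when the repo-relative path sits under the top-level standard/ directory."""
    return PurePosixPath(path).parts[:1] == ("standard",)


def validate(docs: dict[str, str], defined: set[str]) -> list[tuple[str, str]]:
    """Return (path, ID) for undefined references, checking only standard/ files."""
    errors: list[tuple[str, str]] = []
    for path, text in sorted(docs.items()):
        if not in_standard(path):
            continue  # format examples in README.md etc. are never inspected
        for ref in ID_PATTERN.findall(text):
            if ref not in defined:
                errors.append((path, ref))
    return errors
```

With this scoping, an ID such as APTS-SE-027 used as a format example in README.md is ignored, while the same ID used as a broken reference inside standard/ is still reported, so no allowlist is needed.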

@rashim27us
Contributor Author

Thanks @jinsonvarghese for your suggestions. I have removed the hardcoded IDs in the latest commit. Kindly review the changes and let me know if anything else is needed.

@jinsonvarghese
Member

@rashim27us Thank you. The script addresses all the earlier feedback. Placeholder IDs are removed from _ci_utils.py, and validation scope is now limited to standard/ files, which naturally avoids the false positives in CONTRIBUTING.md, README.md, and index.md. Ran it locally against the current main and it passes clean.

One small note for a future pass: the script validates references inside fenced code blocks (YAML/JSON examples in the appendix templates). All 17 of those are valid IDs today so it causes no issues, but if someone adds a hypothetical example with a made-up ID in a code block, it would flag. _ci_utils.py already has strip_fenced_code() that could handle this, but that is a nice-to-have, not a blocker.
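
For reference, one way such a pass could look is sketched below. The signature and behavior of strip_fenced_code() in _ci_utils.py are not shown in this thread, so blank_fenced_code here is a hypothetical stand-in, and the ID pattern is likewise an assumption. It blanks fenced lines instead of deleting them so that reported line numbers stay aligned with the file.

```python
from __future__ import annotations

import re

# Assumed ID shapes: APTS-SE-001 (requirement) or APTS-SE-A01 (appendix style).
ID_PATTERN = re.compile(r"\bAPTS-[A-Z]{2}-(?:\d{3}|A\d{2})\b")
FENCE = "`" * 3  # Markdown code fence marker


def blank_fenced_code(text: str) -> str:
    """Replace fence lines and everything between them with empty lines."""
    out: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on opening and closing fences
            out.append("")
            continue
        out.append("" if in_fence else line)
    return "\n".join(out)


def references_outside_code(text: str) -> list[str]:
    """Return IDs mentioned in prose, skipping anything inside fenced blocks."""
    return ID_PATTERN.findall(blank_fenced_code(text))
```

This simple toggle does not handle tilde fences or nested fences of different lengths; it only shows the idea of scanning prose while leaving illustrative IDs in examples unchecked.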

Good to merge.

@jinsonvarghese jinsonvarghese merged commit 498b893 into OWASP:main May 4, 2026
1 check passed