Generated: June 12, 2026 — last updated June 18, 2026 (Round 29) — Claude reads this automatically when the folder is mounted in Cowork
A regulated-environment security assessment platform built for a senior information security professional with 20 years of GRC and security management experience across financial services, healthcare, and real estate. The platform exists in two versions that share the same controls libraries, compliance framework mappings, and report format:
Version 1 — Claude Code Skill (pen-tester/): AI-augmented assessment workflows running inside Claude Code. Install once, runs wherever Claude Code runs. Produces interactive HTML reports.
Version 2 — Standalone App (pen-tester/standalone/): Full PyQt6 desktop application. Runs independently with no Claude API dependency. Persistent SQLite scan history, in-app triage interface, prior report carryover. Designed for restricted environments. Note: the OS & Software scanner queries the NVD API for CVE lookups — full air-gap requires either skipping OS assessment or providing a local NVD mirror.
Both versions assess seven target types: Website, AI Agent, API, Source Code, STIG Compliance, OS & Software (Standalone only), and Connected Systems.
- All assessment workflows run end-to-end without errors
- Standalone app launches, scans a target, reaches review tier, generates report
- Reports correctly render framework dropdown, language-filtered review steps, FAIL/PASS highlighting, and control requirements
- Git repos are clean, correctly scoped, and pushable via manage.ps1
- Controls count is audited and accurate across all files
HTML Report Templates — all four templates updated:
- pen-tester/assets/report-template.html
- pen-tester/assets/api-report-template.html
- pen-tester/assets/code-review-report-template.html
- pen-tester/assets/interconnected-report-template.html
Changes applied to all four:
- FW_MAP JavaScript block (12 frameworks × most control families): maps security controls to framework-specific IDs and labels. CPX-* (complexity) and OSAUDIT families have NO entries in any framework table, so those controls always show "⊘ Not relevant" regardless of the selected framework. PATCH/EOL/SVCCONFIG/SVCEXPOSE are covered in NIST SP 800-53, ISO 27001, PCI-DSS, and CMMC but NOT in OWASP Top 10, SOC 2, HIPAA, SEC/FINRA.
- fmtEvidence(): bolds [FAIL] and [PASS] tokens in evidence text
- detectLanguage() + DETECTED_LANG + filterReviewSteps(): auto-detects assessment language (Python, JS, etc.) and shows only matching review steps
- getFw() + getFwInfo(): retrieves framework short ID and full label for a control
- Framework dropdown (fwDrop): renames control IDs and marks irrelevant controls "Not relevant to this framework"
- "What to confirm" section: shows only the selected framework ID + control label; blank when Default selected
- Expand/collapse buttons: stay highlighted (.btn-active CSS class) when clicked
- Notes button: moved to left side; textarea spans full card width
- Control requirements: collapsed <details> block at top right of each card
pen-tester/standalone/controls.py — Framework reference fields added to _known set so they no longer leak into review_procedure. Committed and pushed to Multi-Modal-Scanner_Standalone.
pen-tester/standalone/reporter.py — Formatting fixes. Committed and pushed.
manage.ps1 — New file at project root. Provides status and push -Repo scanner|standalone|both -m "msg" commands for managing dual-repo workflow. Committed to Multi-Modal-Scanner.
All four report templates — language detection regex fixed (Rounds 27–28). detectLanguages() / detectLanguage() had a stray '\\' prefix: '\\' + ext.replace('.','[.]') produced regex \[.]py (literal [, any char, ], py) — never matched .py in evidence text. Fix: removed the prefix so ext.replace('.','[.]') alone produces [.]py (character class matching only literal dot). Applied to all four templates: code-review-report-template.html (Round 27), report-template.html / api-report-template.html / interconnected-report-template.html (Round 28). Pushed scanner repo.
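The regex bug can be reproduced outside the browser. The sketch below is a Python port for illustration only (the templates themselves are JavaScript); buggy_pattern and fixed_pattern are hypothetical helper names, and the lookahead is the one quoted in the detectLanguages() implementation notes in this document.

```python
import re

LOOKAHEAD = r"(?=[:,(\s]|$)"  # lookahead applied after the extension


def buggy_pattern(ext: str) -> str:
    # Pre-fix behaviour: the stray backslash escapes the '[', so '.py' becomes
    # \[.]py = literal '[', any character, literal ']', then 'py'.
    return "\\" + ext.replace(".", "[.]") + LOOKAHEAD


def fixed_pattern(ext: str) -> str:
    # Post-fix behaviour: '[.]' is a character class matching only a literal dot.
    return ext.replace(".", "[.]") + LOOKAHEAD


evidence = "SQL injection in app.py:42 and UserService.java (line 10)"
print(bool(re.search(buggy_pattern(".py"), evidence)))  # False
print(bool(re.search(fixed_pattern(".py"), evidence)))  # True
```

The lookahead is what keeps .py from matching inside module.pyc.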
Code cleanup batch — DONE (Round 27). Removed ScanResult.elapsed_seconds from scanners.py (dataclass field + all 60 constructor kwargs); removed HAS_BS4/BeautifulSoup import block from scanners.py and beautifulsoup4 from requirements.txt; removed dead detect_languages() from detector.py; fixed os-software-controls.md header to "6 families"; added fix_text=fields.get('fix text', fields.get('fix', '')) to the non-STIG Control() constructor in controls.py.
DB decision/FP tracking removed — DONE (Round 28). Per user decision: decisions and false positives are tracked via HTML report carryforward only, not DB. Removed from engine.py: DecisionsDB/FalsePositivesDB imports, prior_decision/prior_evidence_changed/user_decision fields on AssessmentResult, use_prior parameter, self.false_positives, all DB FP lookup blocks in auto-tier and review-tier loops, apply_review_decision()/apply_manual_decision()/apply_all_prior_manual() methods, review_decided/manual_decided from get_summary(). Removed from db.py: decisions table, false_positives table, DecisionsDB class, FalsePositivesDB class. Report-based carryforward (prior_report_data, is_false_positive, fp_justification, user_notes) is fully intact. Pushed both repos.
STIG bugs fixed — DONE (Round 29). Three bugs resolved in reporter.py and main.py:
- CAT severity round-trip (reporter.py _sev_to_cat): was CRITICAL/HIGH → CAT I, causing all CAT II findings to display as CAT I. Fixed to CRITICAL → CAT I, HIGH → CAT II, MEDIUM/LOW → CAT III, correctly reversing stig_parser.py's SEVERITY_MAP.
- Prior carryforward (reporter.py): _parse_controls_from_html now matches var CONTROLS = (STIG template format) in addition to const CONTROLS =. extract_prior_data_from_report now reads both non-STIG field names (mitigation/mitigationDesc/note) and STIG field names (isFalsePositive/fpJustification/userNotes). Prior FPs and notes from STIG reports now carry forward correctly.
- Profile selection (main.py _import_stig): now reads dialog.profile_combo.currentIndex(). Index 0 = all rules; index 1+ filters parsed['rules'] to those whose vuln_id or rule_id is in the selected profile's selected_rules, and recomputes stats. Push: .\manage.ps1 push -Repo standalone -m "Round 29: fix STIG CAT severity round-trip, prior carryforward, profile selection".
- STIG data injection (reporter.py _generate_stig_html_report): switched from inline var CONTROLS = ... JS global to two-part injection: STIG_META still injected via </head> replacement; CONTROLS data now injected as <script type="application/json" id="sat-controls-data"> tag replacing <!-- SAT-CONTROLS-PLACEHOLDER -->. This makes STIG saved reports parseable by _parse_controls_from_html() using the same path as non-STIG reports.
- STIG stigStatus carryforward (reporter.py + engine.py): extract_prior_data_from_report() now reads stigStatus from saved STIG reports and stores it as stig_status in the result dict. engine.py applies it after the FP/notes carryforward block: for library == 'stig' controls still in NEEDS_REVIEW/NOT_STARTED, maps prior stig_status → internal status (Open → NON_COMPLIANT, Not a Finding → COMPLIANT, Not Applicable → NOT_APPLICABLE, Not Reviewed → NEEDS_REVIEW).
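The two lookup tables behind the severity and stigStatus fixes can be sketched as follows. This is a reconstruction from the descriptions above, not the code in reporter.py or engine.py: the function names, the CAT III default for unknown severities, and the argument shapes are assumptions.

```python
# Reverse of the parser's severity mapping, as described for _sev_to_cat.
SEV_TO_CAT = {
    "CRITICAL": "CAT I",
    "HIGH": "CAT II",
    "MEDIUM": "CAT III",
    "LOW": "CAT III",
}

# Saved-report stigStatus value -> internal status.
STIG_STATUS_TO_INTERNAL = {
    "Open": "NON_COMPLIANT",
    "Not a Finding": "COMPLIANT",
    "Not Applicable": "NOT_APPLICABLE",
    "Not Reviewed": "NEEDS_REVIEW",
}


def sev_to_cat(severity: str) -> str:
    # Defaulting unknown severities to CAT III is an assumption of this sketch.
    return SEV_TO_CAT.get(severity.upper(), "CAT III")


def apply_prior_stig_status(library, current_status, prior_stig_status):
    """Carry a prior stigStatus forward only for still-undecided STIG controls."""
    if library != "stig" or current_status not in ("NEEDS_REVIEW", "NOT_STARTED"):
        return current_status
    return STIG_STATUS_TO_INTERNAL.get(prior_stig_status, current_status)
```

A control that the current scan already resolved keeps its status; only NEEDS_REVIEW/NOT_STARTED controls inherit the prior decision.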
pen-tester/standalone/code_scanner.py — COMPLIANT result emission fixed (Round 25). The scanner previously only emitted NON_COMPLIANT results. When no vulnerability pattern matched, controls were silently absent, causing engine.py's fallback to produce NEEDS_REVIEW instead of COMPLIANT. Fix: added _CPX_UNIVERSAL (CPX-STRUCT-004, CPX-STRUCT-001, CPX-METRIC-001, CPX-STRUCT-003) and _CPX_BY_LANG constants to define which controls are checked per language, plus a _build_compliant_results(detected_langs, noncompliant_ids) helper that emits COMPLIANT for every checked control_id that had no NON_COMPLIANT finding. Called at the end of both scan_directory() (directory scan) and the single-file path in scan_target(). engine.py's result_by_ctrl dedup ensures NON_COMPLIANT always wins if any file had a violation. Pushed as commit 16e31f9.
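A minimal sketch of the COMPLIANT emission logic, assuming a simplified result type. The universal IDs are the ones named above; the per-language table holds placeholder IDs because the real _CPX_BY_LANG contents are not listed in this document, and Result stands in for the real ScanResult.

```python
from dataclasses import dataclass

CPX_UNIVERSAL = ["CPX-STRUCT-004", "CPX-STRUCT-001", "CPX-METRIC-001", "CPX-STRUCT-003"]
# Hypothetical placeholder IDs; the real per-language table differs.
CPX_BY_LANG = {"python": ["EXAMPLE-PY-001"], "go": ["EXAMPLE-GO-001"]}


@dataclass
class Result:  # stand-in for ScanResult
    control_id: str
    status: str


def build_compliant_results(detected_langs, noncompliant_ids):
    """Emit COMPLIANT for every checked control that had no NON_COMPLIANT finding."""
    checked = list(CPX_UNIVERSAL)
    for lang in detected_langs:
        checked.extend(CPX_BY_LANG.get(lang, []))
    seen = set()
    results = []
    for cid in checked:
        if cid in noncompliant_ids or cid in seen:
            continue  # a violation wins; duplicates are emitted once
        seen.add(cid)
        results.append(Result(cid, "COMPLIANT"))
    return results
```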
CLAUDE.md — Created at project root. Contains session-start reminder (run manage.ps1 status), repo structure, and pending commit reminder. Auto-loaded by Cowork at session start.
HOW_TO_START_NEW_SESSION.txt — Updated to remove outdated references (pen-test-triage-update, mandatory handoff file uploads, fixed bug count), added manage.ps1 step.
Bug fixes confirmed:
- Bug 4 (pyyaml missing from requirements.txt): FIXED, pyyaml>=6.0 is present
- Bug 6 (controls count 64 vs 67): FIXED, controls-library.md now says 67, SKILL.md now says 67
- Bug 1 (framework fields leaking into review_procedure): FIXED via controls.py _known set
Git / repo cleanup:
- pen-test-triage-update submodule remote URL updated from pen-test-triage.git → Multi-Modal-Scanner.git
- Merge conflict resolved (kept remote versions, which had all recent template changes)
- Multi-Modal-Scanner pushed at fbb4148
- Multi-Modal-Scanner_Standalone pushed at 16e31f9 (Round 25: code_scanner COMPLIANT fix); Round 26 pushes complete (vulnbank_backend test targets + engine.py no-inference fix + code-review template updates)
Redundancy audit completed. Files were moved to C:\to-delete-in-30\ in Round 25 and removed from git history in commit 6e5a2f0.
Bug 2 (Evidence fallback in engine.py): FULLY FIXED for review_required controls. When no scanner covers a control and no family relatives exist, the fallback now shows: (1) a target profile (scanners run, controls tested, pass/fail counts), (2) "No scanner covers {ctrl_id} — this control requires manual assessment", (3) a structured checklist from ctrl.test_procedure (split on ., up to 6 steps with [ ] checkboxes), falling back to ctrl.statement if no test_procedure exists, (4) "Assess manually and select: Accept finding / Compliant / N/A / False positive", (5) confidence set to 0.2. The automatic_confirmation tier fallback (line ~219) still shows a shorter message ("No scanner covers {ctrl_id} directly.\n\nControl: ...\nRequirement: ...\nThis control requires manual verification.") — but automatic_confirmation controls are expected to have scanner coverage by design, so this path is rarely hit.
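The checklist portion of the review_required fallback can be sketched as below. This assumes plain strings for test_procedure and statement; build_checklist and review_fallback are hypothetical names, and the target profile and "No scanner covers" lines are omitted for brevity.

```python
def build_checklist(test_procedure, statement, max_steps=6):
    # Prefer test_procedure; fall back to the control statement when it is empty.
    source = (test_procedure or "").strip() or (statement or "").strip()
    steps = [s.strip() for s in source.split(".") if s.strip()]
    return ["[ ] " + step for step in steps[:max_steps]]


def review_fallback(test_procedure, statement):
    checklist = build_checklist(test_procedure, statement)
    prompt = "Assess manually and select: Accept finding / Compliant / N/A / False positive"
    return {"evidence": "\n".join(checklist + [prompt]), "confidence": 0.2}
```

Splitting on "." is simple but also splits inside version strings such as "TLS 1.2", so a procedure mentioning versions can produce fragmentary steps.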
Folder rename / structure cleanup: The root folder is still named "Revised pen tester" but the repo is Multi-Modal-Scanner. Plan to rename has been discussed but not executed (renaming requires reselecting the folder in Cowork).
Other projects review: Identified pen-tester-standalone and Pen tester with advice folders under C:\Users\slagb\OneDrive\Documents\Claude\Projects\ as potentially redundant with the current project. Could not access them (mount limited to "Revised pen tester"). Needs review in a session with the parent Projects folder mounted.
Bug 3 (STIG parser path): Supplement described this as going up two dirs incorrectly. Verified in current code: os.path.dirname(os.path.dirname(os.path.abspath(__file__))) from pen-tester/standalone/main.py correctly resolves to pen-tester/ → pen-tester/tools/. The path appears correct. May have been fixed in a prior session. Verify by actually running the STIG import dialog.
Bug 5 (STIG report template): stig-report-template.html was fully rewritten (Round 29) with interactive triage controls. Has not been verified end-to-end with a real STIG XCCDF file — the rewrite may have introduced regressions. Needs a test run.
Two separate repos, shared filesystem. Multi-Modal-Scanner tracks assets/, references/, SKILL.md, manage.ps1, CLAUDE.md. Multi-Modal-Scanner_Standalone tracks all .py files at pen-tester/standalone/. The outer repo's .gitignore excludes pen-tester/standalone/. Use manage.ps1 to push either or both.
Standalone stays nested at pen-tester/standalone/. Moving it to a sibling folder would break controls.py, reporter.py, and main.py which all use os.path.dirname(standalone_dir) to locate sibling references/ and assets/ directories. A config.py with PENTESTER_ROOT env var override was discussed but not implemented.
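For reference, the discussed (not implemented) config.py could look roughly like this. Everything here is hypothetical except the PENTESTER_ROOT variable name and the "parent of standalone/" default taken from the paragraph above.

```python
import os
from pathlib import Path


def pentester_root(standalone_dir=None) -> Path:
    """Resolve pen-tester/, honouring a PENTESTER_ROOT override when set."""
    override = os.environ.get("PENTESTER_ROOT")
    if override:
        return Path(override)
    # Default mirrors today's behaviour: the parent of the standalone directory.
    base = Path(standalone_dir) if standalone_dir else Path(__file__).resolve().parent
    return base.parent


def references_dir(standalone_dir=None) -> Path:
    return pentester_root(standalone_dir) / "references"


def assets_dir(standalone_dir=None) -> Path:
    return pentester_root(standalone_dir) / "assets"
```

With this in place, controls.py, reporter.py, and main.py would call references_dir() / assets_dir() instead of computing os.path.dirname(standalone_dir) themselves.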
Tiered assessment model with three tier names — final, never change:
- automatic confirmation: scanner determines pass/fail definitively; no human review needed
- review required: scanner found something that needs human interpretation, OR no scanner covers this control at all (fallback: shows a structured checklist from test_procedure + confidence 0.2)
- manual confirmation: requires organizational knowledge that no scanner can provide (policy, configuration, access control decisions)
Control family is AGENT, not SKILL. Renamed. All references to SKILL as a control family are wrong; SKILL is the Claude Code artifact type.
Framework mapping is primarily JS, but adding a new framework requires changes in three places. The FW_MAP block in each template handles control renaming and "not relevant" marking client-side. But a complete new-framework addition also requires: adding mapping rows to each .md control library (Markdown), and adding the lowercase key to controls.py _known (Python). See Extensibility in Critical Nuance for the full checklist.
Evidence language detection is runtime, not pre-generated. detectLanguage() reads the evidence text to infer programming language, then filterReviewSteps() shows only the matching review procedure steps. No pre-filtering at scan time.
Single repo for both versions — rejected because the standalone .py files and the skill's assets/references/ files have different commit cadences and different audiences on GitHub.
Moving standalone to a top-level sibling folder — rejected because it would break three hardcoded relative path lookups without a config abstraction layer.
Separate "cross-system" as a distinct product — rejected; cross-system-report-template.html and cross-system-controls.md were just old names for the interconnected workflow. The files were confirmed redundant with their interconnected-* replacements.
Plugin architecture for scanners — discussed, not implemented. The current architecture uses direct imports in engine.py. Plugin discovery via a plugins/ directory was considered but deferred.
Bash sandbox for git operations. The sandbox mount creates .git/index.lock files that block git commands. Every git add/commit/push must be run in the user's PowerShell, not via the bash tool. Do not attempt git operations via mcp__workspace__bash.
manage.ps1 execution: FIXED (Round 25). PowerShell script execution was disabled (ExecutionPolicy restriction). Set-ExecutionPolicy -Scope CurrentUser RemoteSigned has been run; manage.ps1 now works normally. Raw git equivalents below are kept for reference only:
# manage.ps1 status
cd "C:\Users\slagb\OneDrive\Documents\Claude\Projects\Revised pen tester"
git status --short; git log --oneline -3
cd pen-tester\standalone
git status --short; git log --oneline -3
cd ..\..\..
# manage.ps1 push -Repo scanner -m "msg"
cd "C:\Users\slagb\OneDrive\Documents\Claude\Projects\Revised pen tester"
git add -A; git commit -m "msg"; git push
# manage.ps1 push -Repo standalone -m "msg"
cd "C:\Users\slagb\OneDrive\Documents\Claude\Projects\Revised pen tester\pen-tester\standalone"
git add -A; git commit -m "msg"; git push
cd ..\..\..
# manage.ps1 push -Repo both -m "msg"  (run scanner block, then standalone block)

git pull --allow-unrelated-histories on pen-test-triage-update. The remote had been force-pushed with Multi-Modal-Scanner's history, creating unrelated histories. The pull produced conflicts on every shared file. Resolution: git checkout --theirs for all conflicted files, then git push --force.
git checkout --theirs run from wrong directory. Was accidentally run from the parent repo root instead of inside the submodule. "Updated 0 paths" is the tell — always cd into the submodule first.
CLAUDE.md not appearing in git status after git add. The file existed on disk (Test-Path returned True) but git ls-files CLAUDE.md showed it was already tracked and committed — it had been committed in a prior session. Not a bug; no action needed.
- Python 3.14 (user's exact version: C:\Users\slagb\AppData\Local\Python\pythoncore-3.14-64\python.exe)
- PyQt6 6.11.0 / PyQt6-Qt6 6.11.1
- No Claude API dependency in standalone app, ever
- pen-tester/standalone/ must remain a sibling of pen-tester/references/ and pen-tester/assets/ for relative path lookups to work
- Windows paths (backslashes) in all user-facing commands
- node_modules/ is in .gitignore and should never be committed
- Accuracy over speed. Confirm everything is correct before stating it. If uncertain, say so.
- Terse communication. No preambles, no "let me..." or "I'll now...". Direct action or direct answer.
- No over-explaining security concepts. User has 20 years of GRC and security management experience.
- Don't ask clarifying questions when intent is clear. When they say "fix all bugs", fix them — don't ask which ones.
- Model guidance: Use Opus for bugs and architectural decisions. Use Sonnet for adding features, regenerating reports, updating docs.
- Tier names are immutable. Never shorten "automatic confirmation", "review required", "manual confirmation".
- Control family is AGENT. Never call it SKILL in any code or documentation.
- manage.ps1 status before any work session. Push via manage.ps1 push -Repo [scanner|standalone|both] -m "message".
Multi-Modal-Scanner (root):
pen-tester/assets/*.html — report templates
pen-tester/references/*.md — control libraries
pen-tester/SKILL.md — Claude Code skill definition
manage.ps1, CLAUDE.md, README.md
Multi-Modal-Scanner_Standalone (at pen-tester/standalone/):
*.py — all Python source files
requirements.txt, README.md
Shared concern (changes may need both repos):
standalone/controls.py — parser logic affects template output
references/*.md — read at runtime by Standalone; tracked in Scanner only
- pen-tester-standalone folder (C:\Users\slagb\OneDrive\Documents\Claude\Projects\pen-tester-standalone): Is this an old copy of the standalone app or something different? Needs review when parent Projects folder is mounted.
- Pen tester with advice folder: Same question. May be an earlier version of the project.
- Root folder rename: Should "Revised pen tester" be renamed to "Multi-Modal-Scanner" to match the GitHub repo? Doing so requires reselecting the folder in Cowork. Low priority but creates naming inconsistency.
- Bug 3 verification: STIG parser path appears correct in code but hasn't been tested end-to-end with a real STIG XML file. Confirm by running the STIG import dialog.
- Bug 5 verification: stig-report-template.html was fully rewritten (Round 29). Has not been verified end-to-end with a real STIG XCCDF file post-rewrite. Confirm full triage flow: import STIG → run assessment → open HTML → mark rules → save → reload as prior report → verify carryforward.
- Controls count audit: RESOLVED. ### header audit confirmed: 67 controls in controls-library.md (73 total headers minus 6 appendices), 53 in api-controls-library.md, 51 in code-review-controls.md, 27 in interconnected-controls.md, 12 in os-software-controls.md. All match CONTROL_LIBRARIES hardcoded counts.
- STIG profile selection is not wired up: FIXED (Round 29). _import_stig() now reads dialog.profile_combo.currentIndex(). Index 0 = "All rules" (no filter). Index 1+ maps to parsed['profiles'][idx - 1]; rules are filtered to those whose vuln_id or rule_id appears in selected_rules, and parsed['stats'] is recomputed. The filtered parsed dict is then passed to to_markdown() and StigsDB.save().
- apply_review_decision(), apply_manual_decision(), and apply_all_prior_manual() are all never called from the GUI: REMOVED (Round 28). All three methods and the underlying DB classes (DecisionsDB, FalsePositivesDB) have been deleted. See Open Question 11.
- prior_evidence_changed is set but never consumed: REMOVED (Round 28). prior_evidence_changed, use_prior, and prior_decision fields on AssessmentResult/AssessmentEngine were all removed along with the DB tracking layer.
- DB-based decision/FP tracking was built but non-functional: REMOVED (Round 28). DecisionsDB, FalsePositivesDB, apply_review_decision(), apply_manual_decision(), and apply_all_prior_manual() have all been removed from engine.py and db.py. Report-based carryforward (prior_report_data) is the sole carryforward mechanism. Triage decisions and false positive notes are recorded in the HTML report and reloaded on the next scan via the "Load Prior Report" workflow.
- Evidence fallback (Bug 2): RESOLVED. The review_required tier fallback now shows a target profile + structured checklist from test_procedure + confidence 0.2. See Current State → In Progress for the remaining auto tier minor case.
- AGENT-007 and AGENT-010 scan results are silently discarded: agent_scanner.py generates findings for both controls (dangerous instructions, no-confirmation patterns) but both are in MANUAL_IDS, so the manual-tier loop ignores result_by_ctrl entirely. The scanner effort is wasted. Resolution options: (a) remove AGENT-007 and AGENT-010 from MANUAL_IDS and move them to review_required so scanner evidence surfaces in the report, or (b) remove their scan logic from agent_scanner.py. Option (a) is preferable since the scanner detects genuinely dangerous configurations. Unresolved; needs a decision before fixing.
- Run the delete commands to clean up redundant files: DONE (Round 25). Files moved to C:\to-delete-in-30\ via Move-Item; redundant files also removed from git history in commit 6e5a2f0.
- Enable PowerShell script execution: DONE. Set-ExecutionPolicy -Scope CurrentUser RemoteSigned confirmed executed. manage.ps1 runs without restriction.
- Test STIG import end-to-end (verifies Bug 3 and Bug 5). Expected: dialog opens, parse preview shows title/rules/CAT distribution, import writes .md to pen-tester/references/.
- Test the standalone app end-to-end against each target type. Expected result counts (from HANDOFF_SUPPLEMENT section 13): code scanner ~44 findings, API scanner ~12 findings, agent scanner ~9 findings. Test inputs are in pen-tester/standalone/test_targets/ (local only, gitignored): code_sample/app.py, api_sample/openapi.yaml, agent_sample/SKILL.md. Also verify code scanner now shows Compliant results (Round 25 fix).
  Multi-language directory test (Round 26/27): enter test_targets\code_sample\vulnbank_backend, which scans Python, Java, Go, and PHP together. Expect 30+ NON_COMPLIANT findings plus COMPLIANT results for every clean control. Files: app.py, UserService.java, transfer.go, payment.php. Language autodetection regex was fixed (Round 27); verify all four languages now appear in "What to confirm".
- Review pen-tester-standalone and Pen tester with advice folders (mount parent Projects folder in Cowork first).
- Bug 2 review_required tier is fixed. Minor gap: automatic_confirmation fallback (line ~219) shows a shorter message; low priority since auto-tier controls are expected to have scanner coverage.
- PyInstaller packaging (Phase 10 of original 10-phase plan; only phase not yet complete). Confirm exact rebuild process for pen-tester.skill zip artifact (likely: zip the pen-tester/ directory and rename to .skill) and document it here before packaging.
- Code cleanup batch (standalone repo): DONE (Round 27). Removed ScanResult.elapsed_seconds from scanners.py (dataclass field + all 60 constructor kwargs); removed HAS_BS4/BeautifulSoup import block from scanners.py and beautifulsoup4 from requirements.txt; removed dead detect_languages() from detector.py; fixed os-software-controls.md header to "6 families"; added fix_text=fields.get('fix text', fields.get('fix', '')) to the non-STIG Control() constructor in controls.py. Pushed.
- Remove DB decision/FP tracking: DONE (Round 28). DecisionsDB, FalsePositivesDB, apply_review_decision(), apply_manual_decision(), apply_all_prior_manual(), prior_evidence_changed, use_prior, prior_decision all removed from engine.py and db.py. Report-based carryforward (prior_report_data) is the sole mechanism. Pushed.
- Patch the same language detection regex bug (remove stray '\\' prefix) in detectLanguage() in the other three templates (report-template.html, api-report-template.html, interconnected-report-template.html): DONE (Round 28). All four templates now have the correct [.]ext regex.
- Resolve AGENT-007/010 inconsistency (Open Question 12): either move both to review_required tier or remove their scan logic from agent_scanner.py.
- Add Windows 11 to _OS_EOL dict: DONE (Round 25). All six Win11 versions added using Enterprise/Education dates (same convention as Win10 entries). Win11 version detection mirrors Win10's display_version pattern. win11-22h2 and win11-23h2 are already past EOL as of June 2026 and will produce NON_COMPLIANT.
- Capture exact pyyaml version: PyYAML 6.0.3 confirmed installed. requirements.txt entry pyyaml>=6.0 covers it; no change needed.
- Consider externalizing VULN_PATTERNS in code_scanner.py to a JSON/YAML file for runtime updates without rebuilding
- Consider config.py with PENTESTER_ROOT env var to allow moving standalone to a sibling directory cleanly
- Rename root folder "Revised pen tester" → "Multi-Modal-Scanner" if desired
The sandbox can't run git. Every git command in this project must be run in the user's PowerShell. The bash sandbox creates .git/index.lock files that block git operations. This is not intermittent — it is consistent. Do not try to work around it.
pen-test-triage-update was a submodule pointing at the wrong repo. It was a git submodule inside Multi-Modal-Scanner that pointed to pen-test-triage.git, which was force-pushed at some point with Multi-Modal-Scanner's history. The result was two git repos (root and submodule) pointing at the same remote. The submodule's remote had been updated to Multi-Modal-Scanner.git and a merge conflict resolved. The directory was deleted in Round 25 (moved to C:\to-delete-in-30\ and removed from git history in commit 6e5a2f0). It no longer exists on disk.
All four report templates are now in sync. report-template.html, api-report-template.html, code-review-report-template.html, and interconnected-report-template.html all have the same JS utility functions, framework dropdown, language filter, expand/collapse highlight, and notes layout. If a change is made to one template's JS or CSS, it must be applied to all four.
The "cross-system" naming is retired. cross-system-report-template.html and cross-system-controls.md were old names for the interconnected workflow. They existed only in the pen-test-triage-update/ submodule, which was deleted in Round 25. The current names are interconnected-report-template.html and interconnected-controls.md. Do not create new files with "cross-system" in the name.
controls.py _known set controls what appears in review_procedure. Any field key in a control library .md file that is NOT in the _known set gets appended to review_procedure with .title() formatting. This was the root cause of framework abbreviations (owasp, nist-800, etc.) appearing as procedure steps. The fix is committed. If new fields are added to the .md control libraries, they must also be added to _known in controls.py.
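The leak mechanism can be illustrated with a small sketch. KNOWN here is a deliberately abbreviated stand-in for the real _known set, split_fields is a hypothetical name, and the "Key: value" line format is an assumption; only the lowercase lookup and the .title() formatting come from the description above.

```python
# Abbreviated stand-in; the real _known set in controls.py is much larger.
KNOWN = {"name", "statement", "severity", "family", "owasp", "nist-800"}


def split_fields(fields: dict):
    """Separate recognised fields from ones that spill into review_procedure."""
    known = {}
    review_lines = []
    for key, value in fields.items():
        k = key.strip().lower()
        if k in KNOWN:
            known[k] = value
        else:
            # Unrecognised keys are treated as procedure steps.
            review_lines.append(f"{k.title()}: {value}")
    return known, "\n".join(review_lines)


known, procedure = split_fields(
    {"Name": "Input validation", "owasp": "A03", "Python": "check eval() usage"}
)
print(procedure)  # Python: check eval() usage
```

Removing "owasp" from KNOWN in this sketch would make "Owasp: A03" show up as a review step, which is the original Bug 1 symptom.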
detectLanguage() reads the evidence text, not the target. Language detection for "What to confirm" filtering happens client-side in the report template by scanning the evidence string. It is not set at scan time. This means if the evidence text doesn't contain clear language identifiers, all review steps will show.
detectLanguages() / detectLanguage() implementation details (verified from source): Scans (c.evidence || '') + ' ' + (c.finding || '') for ALL controls — not just the current card. Counts file extension matches using [.]ext(?=[:,(\s]|$) lookahead to avoid mid-word false positives. The character class [.] matches only a literal dot (NOT \. — no backslash). Fixed in Rounds 27–28: the original had a stray '\\' prefix producing \[.]ext which never matched .ext in paths. All four templates now use the correct [.]ext regex (code-review fixed Round 27; the other three fixed Round 28). .kt maps to 'Java'. detectLanguages() (code-review template) returns a sorted array of all detected languages; detectLanguage() (other three templates) returns only the top language or null. The _LANG_KEYS list that filterReviewSteps() uses as section headers is ['Python','Js/Ts','Rust','Java','C/C++','C#','Go','Php'] — these exact strings must appear as line prefixes in the review procedure text. Lines before the first language-keyed section are shown for all languages ("general steps"). If filtering produces an empty string, falls back to the full text.
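The section-filtering rules read more easily as code. Below is a Python port written for illustration (the real filterReviewSteps() is JavaScript in the templates); the function signature and the plain startswith() header test are assumptions based on the description above.

```python
LANG_KEYS = ["Python", "Js/Ts", "Rust", "Java", "C/C++", "C#", "Go", "Php"]


def filter_review_steps(text: str, lang):
    """Keep general steps plus the section whose header matches lang."""
    if not lang:
        return text
    kept = []
    current = None  # None means we are still in the general steps
    for line in text.splitlines():
        header = next((k for k in LANG_KEYS if line.strip().startswith(k)), None)
        if header:
            current = header
        if current is None or current == lang:
            kept.append(line)
    result = "\n".join(kept).strip()
    return result or text  # empty result falls back to the full text


steps = (
    "Confirm the finding is reachable.\n"
    "Python: check for eval().\n"
    "Java: check for Runtime.exec().\n"
    "Go: check for os/exec."
)
print(filter_review_steps(steps, "Java"))
```

Because headers are matched by prefix, a general step that happens to begin with a key such as "Go" would be treated as the start of that language's section in this sketch.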
getFwInfo() has a prefix-match fallback. After trying an exact c.family match in FW_MAP, it falls back to checking whether c.family.startsWith(k) or c.id.startsWith(k + '-') for any key k in the framework table. This means CPX-STRUCT would match a 'CPX' key if one were added — but no current framework table has 'CPX', so CPX controls are always "not relevant" in all frameworks.
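A Python port of the lookup order, for illustration only. The table contents are placeholders (no real FW_MAP entries are reproduced here) and get_fw_info is a hypothetical name for the ported function.

```python
def get_fw_info(fw_table: dict, family: str, control_id: str):
    """Exact family match first, then prefix match on family or control ID."""
    if family in fw_table:
        return fw_table[family]
    for key, info in fw_table.items():
        if family.startswith(key) or control_id.startswith(key + "-"):
            return info
    return None  # rendered as "not relevant" for the selected framework


# Placeholder table: one family mapped to a made-up framework ID and label.
table = {"AUTH": ("EX-1", "Example label")}
print(get_fw_info(table, "AUTH", "AUTH-001"))            # ('EX-1', 'Example label')
print(get_fw_info(table, "CPX-STRUCT", "CPX-STRUCT-001"))  # None
```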
Report filter bar: status "active" matches three statuses, not two. The dropdown option is labeled "Non-compliant + Needs review" but the JS filter fStatus === 'active' includes ['NON_COMPLIANT', 'NEEDS_REVIEW', 'NOT_TESTED']. The NEEDS_REVIEW stat pill likewise expands to ['NEEDS_REVIEW', 'NOT_TESTED'] when matched. Manual-tier controls always arrive as NOT_TESTED, so the pill and the "active" filter ensure they're visible under the default "needs attention" view.
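The status matching can be summarised in a few lines. This is a Python port for illustration; matches_status is a hypothetical name, and treating an empty filter value as "show everything" is an assumption.

```python
ACTIVE = {"NON_COMPLIANT", "NEEDS_REVIEW", "NOT_TESTED"}
NEEDS_REVIEW_PILL = {"NEEDS_REVIEW", "NOT_TESTED"}


def matches_status(status: str, f_status: str) -> bool:
    if not f_status:
        return True  # assumption: empty filter means no status filtering
    if f_status == "active":
        return status in ACTIVE
    if f_status == "NEEDS_REVIEW":
        return status in NEEDS_REVIEW_PILL
    return status == f_status
```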
Cards auto-expand on render. render() auto-opens .ctrl.nc, .ctrl.nr, and .ctrl.manual cards after building the list. COMPLIANT and NOT_APPLICABLE cards start collapsed. Text search in the filter bar searches c.id + c.name + c.evidence + c.family (case-insensitive substring).
The user is the product owner and architect. They make design decisions. Do not present options when they have already decided something. Do not override tier names, control family names, or GUI specs. Implement exactly what is specified.
stig-report-template.html was fully rewritten in Round 29 (Task #11). The four non-STIG templates (report-template.html, api-report-template.html, code-review-report-template.html, interconnected-report-template.html) received FW_MAP, fmtEvidence, detectLanguage, framework dropdown, etc. The STIG template (pen-tester/assets/stig-report-template.html) has a different structure (CAT I/II/III format) and did NOT receive those features. Its Round 29 rewrite added interactive triage: status buttons (Open / Not a Finding / Not Applicable / Not Reviewed / False Positive), FP modal, notes textarea, Save button, and dynamic CAT summary that recomputes as the user marks rules. Data injection uses <!-- SAT-CONTROLS-PLACEHOLDER --> → sat-controls-data JSON tag (same format as non-STIG reports). The old STIG template (294 lines, inline var CONTROLS) no longer exists.
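The placeholder-to-JSON-tag injection and the matching read path can be sketched as a round trip. This is not the code in reporter.py: inject_controls and parse_controls are hypothetical names, and escaping "</" inside the payload is an extra safeguard assumed here, not something the source confirms.

```python
import json
import re

PLACEHOLDER = "<!-- SAT-CONTROLS-PLACEHOLDER -->"
TAG_RE = re.compile(
    r'<script[^>]*id="sat-controls-data"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def inject_controls(template: str, controls: list) -> str:
    # "<\/" is a valid JSON escape, so the payload cannot close the tag early.
    payload = json.dumps(controls).replace("</", "<\\/")
    tag = f'<script type="application/json" id="sat-controls-data">{payload}</script>'
    return template.replace(PLACEHOLDER, tag)


def parse_controls(html: str) -> list:
    match = TAG_RE.search(html)
    return json.loads(match.group(1)) if match else []


controls = [{"id": "V-1", "stigStatus": "Open", "userNotes": "see </script> note"}]
html = inject_controls(f"<html><body>{PLACEHOLDER}</body></html>", controls)
print(parse_controls(html) == controls)  # True
```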
FW_MAP lives in all four templates independently. There is no shared JS file. If a new compliance framework needs to be added, or an existing mapping corrected, the change must be manually applied to all four templates. This is the most likely source of drift.
_fw/_fwInfo variable ordering in templates is critical. These must be computed before reviewProcHtml is built, otherwise the framework note can't appear in "What to confirm". In prior code, this was wrong — they were computed after. If editing any template's JS block, preserve the order: compute _fw and _fwInfo → build _fwNote → build reviewProcHtml using ${_fwNote}.
_known set in controls.py — what it contains and what matters. Any field key in a .md control library that is NOT in _known gets appended to review_procedure with .title() formatting. The full _known set (around line 219 of controls.py) contains two categories:
Standard structural fields (always present, pre-session): name, control name, languages, cia, sources, cwe, statement, control statement, severity, severity if non-compliant, mapped severity, family, test, test approach, check, tier, source, reachability, framework, fix, fix text, check content, description, rationale, references, rule id, group id, version, weight, legacy ids, discussion, vul discuss, ia controls, responsibility, priority, security override guidance, potential impact, third party tools, mitigation control, severity override guidance, title, id, mitigations, applicable platforms, notes, common consequences, observed examples
Framework reference fields added (Round 26 or earlier):
'owasp', 'owasp-api', 'nist-800', 'iso-27001', 'cmmc', 'dod-srg', 'fedramp',
'hipaa', 'pci-dss', 'soc2', 'sec-finra', 'eu-dora', 'eu-ai',
'owasp-llm', 'nist-ai', 'iso-42001', 'saif', 'csa-ai',
'secondary', 'secondary cia', 'secondary cia (if applicable)',
When new fields are added to control library .md files, add the lowercase key to _known in controls.py. If they already appear in the standard structural fields list above, no change is needed.
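A minimal sketch of the leakage behavior described above, to show why a missing _known entry surfaces in the report. KNOWN_FIELDS is a trimmed stand-in for the real set, and build_review_procedure() and its "Label: value" join format are hypothetical; only the rule itself (unknown keys are appended to review_procedure with .title() formatting) comes from these notes.

```python
# Hypothetical, trimmed-down stand-in for the real _known set in controls.py.
KNOWN_FIELDS = {"name", "statement", "severity", "test", "owasp", "nist-800"}


def build_review_procedure(fields: dict[str, str]) -> str:
    """Concatenate the Test text with every field whose key is not known."""
    parts = [fields.get("test", "")]
    for key, value in fields.items():
        if key not in KNOWN_FIELDS:
            # Unknown keys leak into the review text with Title Case labels.
            parts.append(f"{key.title()}: {value}")
    return "\n".join(p for p in parts if p)


fields = {
    "name": "Session timeout",
    "test": "Check idle timeout.",
    "owasp": "A07",                                  # known: stays out
    "python checks": "Inspect SESSION_COOKIE_AGE.",  # unknown: leaks in
}
procedure = build_review_procedure(fields)
```

Here the owasp value never reaches the review text, while the unregistered "python checks" key does. That is the intended path for language-specific review steps and the unintended path for a framework field someone forgot to register.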
Tier assignment is computed by classify_control() in controls.py, not set in the .md library. When a control is parsed, its tier is determined by three hardcoded sets near the top of controls.py: AUTO_IDS (specific control IDs → automatic_confirmation), AUTO_FAMILIES (family names → automatic_confirmation), and MANUAL_IDS (specific IDs → manual_confirmation). Everything else defaults to review_required. Priority: AUTO_IDS / AUTO_FAMILIES are checked first; MANUAL_IDS is only reached if neither auto condition matches. DATA-001 and DATA-003 appear in both AUTO_IDS and MANUAL_IDS — they resolve to automatic_confirmation because AUTO wins. To change a control's tier, add/remove its ID from one of these sets in controls.py — a Tier: field in the .md file is recognized (it's in _known) but is NOT what drives the actual tier assignment.
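The priority order can be sketched as follows. The set contents are placeholders (only DATA-001 and DATA-003 are taken from these notes; HDR-001, TLS, and GOV-002 are made up), and the two-argument signature is an assumption about classify_control(), not a copy of it.

```python
# Placeholder contents: the real sets live near the top of controls.py.
AUTO_IDS = {"DATA-001", "DATA-003", "HDR-001"}
AUTO_FAMILIES = {"TLS"}
MANUAL_IDS = {"DATA-001", "DATA-003", "GOV-002"}


def classify_control(control_id: str, family: str) -> str:
    """Return the tier string; auto checks run before the manual check."""
    if control_id in AUTO_IDS or family in AUTO_FAMILIES:
        return "automatic_confirmation"
    if control_id in MANUAL_IDS:
        return "manual_confirmation"
    return "review_required"
```

With this ordering, an ID listed in both AUTO_IDS and MANUAL_IDS can never resolve to manual_confirmation, which matches the DATA-001 / DATA-003 behavior noted above.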
Tier names in code use underscores; display uses spaces. In controls.py and engine.py, tiers are the strings "automatic_confirmation", "review_required", "manual_confirmation". In the UI and reports, they display as "automatic confirmation", "review required", "manual confirmation" (spaces). The "tier names are immutable" rule applies to the display form. Both forms must stay in sync.
CONTROL_LIBRARIES dict in controls.py hardcodes the control count for every library. Each entry includes a "count" value (e.g. "count": 67 for website_agent). If controls are added or removed from any .md library, the corresponding count in this dict must also be updated in controls.py. Not updating it won't crash the app but will produce incorrect counts in any UI that displays library statistics. This applies to all five library entries in the dict (website_agent, api, code_review, interconnected, os_software), not just website_agent. STIG controls are not in CONTROL_LIBRARIES — they use a separate parse_stig_controls(md_path) function.
requirements.txt has three dependencies: PyQt6>=6.6.0, requests>=2.31.0, pyyaml>=6.0. (beautifulsoup4 was removed in Round 27.) Relevant for Phase 10 (PyInstaller packaging) — the dependency footprint is small.
Report-based carryforward is the sole carryforward mechanism (DB-based carryforward was removed in Round 28). The user explicitly loads a previous HTML report via the "Load previous report" button. _load_previous_report() calls extract_prior_data_from_report(), validates the HTML contains sat-controls-data (rejects non-tool reports), and stores the result as self.prior_report_data = {control_id: {'is_fp': bool, 'justification': str, 'note': str, 'stig_status': str}}. This dict is passed to AssessmentEngine(prior_report_data=...).
apply_review_decision(), apply_manual_decision(), and apply_all_prior_manual() were removed in Round 28, along with DecisionsDB, FalsePositivesDB, prior_decision, prior_evidence_changed, user_decision, and use_prior. The only effective carryforward is report-based: FP status, notes, and STIG triage decisions from a user-loaded prior HTML report are applied by run_automatic_tier() via prior_report_data.
FalsePositivesDB was removed in Round 28. The false_positives table and all FalsePositivesDB methods were deleted from db.py. FP tracking is now exclusively report-based.
Several DB read methods exist but are never called from main.py: ScansDB.get_history(system_id, limit=20), SystemsDB.get_all(), SystemsDB.find_by_target(target). These are designed for a history/audit view that doesn't exist in the current GUI. CONTROL_LIBRARIES (from controls.py) is imported in main.py line 34 but never accessed — dead import, likely a leftover from an earlier GUI iteration.
FindingsDB is partially written during scan but never read back in the GUI. _run_next_target() in main.py calls engine.start_scan() at line 905 before starting ScanWorker, so scan_id is always non-None during a scan and FindingsDB.save() is never blocked by a null scan_id guard. FindingsDB.save() was originally called in three places in engine.py: (1) at line ~226 in run_automatic_tier() for auto-tier controls, (2) inside apply_review_decision(), (3) inside apply_manual_decision(). Call sites (2) and (3) were removed in Round 28 along with those methods, so only auto-tier control results are actually written to the findings table. Important: NOT all auto-tier controls are saved. When an auto-tier control is promoted to review_required (scanner returns NEEDS_REVIEW), continue at engine.py line ~177 skips the rest of the loop body including FindingsDB.save(), so promoted controls are NOT saved. Auto-tier controls with no scanner match DO get saved (status=NEEDS_REVIEW, short fallback message). Review-tier and manual-tier results are never persisted. FindingsDB.get_for_scan() exists but is never called from main.py or reporter.py; the findings table is an incomplete audit log not surfaced in the UI. Reports are generated from engine.all_results (in-memory), not from the DB. findings has no UNIQUE constraint on (scan_id, control_id): if an in-app triage screen is ever built that calls FindingsDB.save() for review/manual controls, duplicate rows could be produced for any control already saved in the auto loop. get_for_scan() would then return two rows for such controls.
Prior report import uses extract_prior_data_from_report() in reporter.py, not extract_fps_from_report(). extract_fps_from_report() exists but is deprecated — its docstring says "Prefer extract_prior_data_from_report for new callers." extract_prior_data_from_report(html_path) -> dict reads the sat-controls-data JSON tag and returns broader prior state (decisions, notes, FPs, STIG status). Returns {} on any error (file not found, malformed JSON, non-tool report). extract_fps_from_report(html_path) -> set is a thin wrapper that calls extract_prior_data_from_report and returns only the FP set. Always use extract_prior_data_from_report in new code.
extract_prior_data_from_report() returns a sparse dict — only controls where mitigation == 'YES' (FP), a non-empty note, OR a non-empty stig_status are included. Controls that meet none of those conditions are absent from the returned dict. A control absent from the dict means "no prior data" — it does NOT mean "was compliant" or "was non-compliant". The returned dict values have keys: is_fp (bool), justification (str), note (str), stig_status (str — empty for non-STIG controls).
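A minimal sketch of the sparse-dict rule for non-STIG controls, assuming the control list has already been parsed out of the report. The field names mitigation, mitigationDesc, and note come from these notes; the "id" key, the "NO" sample value, and the helper name sparse_prior_data() are assumptions for illustration.

```python
def sparse_prior_data(controls: list[dict]) -> dict[str, dict]:
    """Keep only controls that carry an FP flag or a non-empty note."""
    prior = {}
    for c in controls:
        is_fp = c.get("mitigation") == "YES"
        note = (c.get("note") or "").strip()
        if not (is_fp or note):
            continue  # absent means "no prior data", not "compliant"
        prior[c["id"]] = {
            "is_fp": is_fp,
            "justification": c.get("mitigationDesc", ""),
            "note": note,
            "stig_status": "",  # empty for non-STIG controls
        }
    return prior


# Hypothetical sample data in the non-STIG naming convention.
controls = [
    {"id": "AUTH-001", "mitigation": "YES", "mitigationDesc": "WAF rule", "note": ""},
    {"id": "AUTH-002", "mitigation": "NO", "note": "Retest after patch"},
    {"id": "AUTH-003", "mitigation": "NO", "note": ""},
]
prior = sparse_prior_data(controls)
```

AUTH-003 is missing from the result, so callers should use prior.get(control_id) rather than indexing directly.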
_parse_controls_from_html() supports three formats (backward compatibility):
- New format: <script type="application/json" id="sat-controls-data"> tag. Current format used by all generated reports (both STIG and non-STIG, as of Round 29).
- Legacy format: const CONTROLS = inline JS variable. Older non-STIG report format, still readable.
- Legacy STIG format: var CONTROLS = inline JS variable. Older STIG report format, still readable (added Round 29).
This means users with reports generated by an earlier version of the tool can still load them via "Load previous report". If none of the three formats is found, returns None and extract_prior_data_from_report() returns {}.
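The three-format fallback could look roughly like this. It is a sketch, not the reporter.py implementation: the regex, the function name, and the assumption that the legacy inline variable holds valid JSON are all illustrative.

```python
import json
import re

# Assumed shape of the current data tag; attribute order may differ in practice.
SCRIPT_TAG = re.compile(
    r'<script[^>]*id="sat-controls-data"[^>]*>(.*?)</script>', re.DOTALL
)
INLINE_PREFIXES = ("const CONTROLS = ", "var CONTROLS = ")


def parse_controls_from_html(html: str):
    """Return the controls list, or None if no supported format is found."""
    match = SCRIPT_TAG.search(html)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            return None
    for prefix in INLINE_PREFIXES:
        start = html.find(prefix)
        if start == -1:
            continue
        try:
            # raw_decode stops at the end of the first JSON value, so the
            # trailing ";" and the rest of the script are ignored.
            value, _ = json.JSONDecoder().raw_decode(html, start + len(prefix))
            return value
        except json.JSONDecodeError:
            return None
    return None
```

Checking the JSON tag first keeps current reports on the fast path; the inline prefixes are only tried for reports produced by older versions.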
STIG reports were NOT parseable by _parse_controls_from_html(): FIXED (Round 29). _parse_controls_from_html() now iterates over both 'const CONTROLS = ' and 'var CONTROLS = ' prefixes, so var CONTROLS in the STIG template is matched. Additionally, extract_prior_data_from_report() now checks both field naming conventions: non-STIG (mitigation == 'YES', mitigationDesc, note) and STIG (isFalsePositive, fpJustification, userNotes). Prior FPs and notes from STIG reports now carry forward correctly.
reporter.py generates four output formats. generate_html_report(), generate_markdown_report(), generate_csv_report(), generate_json_report() — all take (engine, output_path). HTML is the primary format. All formats are exposed in the GUI via the report format dropdown. STIG HTML goes through a separate _generate_stig_html_report() (called internally by generate_html_report() when any control in engine.all_results has library == 'stig' — NOT checked against engine.target_type). STIG uses a different template (CAT I/II/III format) and maps internal statuses to STIG standard terms: COMPLIANT → "Not a Finding", NON_COMPLIANT → "Open", NOT_APPLICABLE → "Not Applicable", FALSE_POSITIVE → "Not a Finding", NOT_TESTED/NEEDS_REVIEW → "Not Reviewed".
reporter.py template routing: get_template_path(target_type, selected_sets) selects the HTML template based on selected_sets:
- "interconnected" in selected_sets → interconnected-report-template.html
- "code_review" in selected_sets → code-review-report-template.html
- "api" in selected_sets → api-report-template.html
- anything else (website, agent, os) → report-template.html
STIG bypasses get_template_path() entirely and uses get_stig_template_path() → stig-report-template.html.
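The routing order matters for multi-target scans, where several sets are selected at once. A sketch under the assumption that the checks run in the order listed; get_template_name() is a hypothetical helper that returns only the file name, whereas the real get_template_path() also takes target_type and returns a path.

```python
def get_template_name(selected_sets: list[str]) -> str:
    """Pick the non-STIG report template; the first matching set wins."""
    if "interconnected" in selected_sets:
        return "interconnected-report-template.html"
    if "code_review" in selected_sets:
        return "code-review-report-template.html"
    if "api" in selected_sets:
        return "api-report-template.html"
    return "report-template.html"  # website, agent, os
```

For example, an API scan that picked up the interconnected set would render with the interconnected template, not the API one.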
stig_parser.py exports two functions. Located at pen-tester/tools/stig_parser.py, dynamically imported in main.py via sys.path.insert(0, tools_dir):
- parse_stig(xml_path, profile_id=None): parses XCCDF 1.1 XML; returns {'benchmark': {...}, 'profiles': [...], 'rules': [...], 'stats': {...}}
- to_markdown(parsed, include_profiles=False): converts parsed data to .md format
parse_stig() return structure (verified from source):
- benchmark: dict with keys id, title, description, version, release_info, date, publisher, source
- profiles: list of {'id': str, 'title': str, 'selected_rules': [rule_id_refs]}; each STIG profile with its selected rule idref list
- rules: list of rule dicts (one per XCCDF Group); each rule has vuln_id, rule_id, version, title, statement, stig_severity ('high'/'medium'/'low'), mapped_severity ('CRITICAL'/'HIGH'/'MEDIUM'), cat ('CAT I'/'II'/'III'), cia, srg_refs, satisfies, ccis, fixtext, check_content, dpms_target, dpms_id, srg_title
- stats: {'total_rules': N, 'cat_i': N, 'cat_ii': N, 'cat_iii': N}
STIG severity round-trip bug — FIXED (Round 29). _sev_to_cat() in reporter.py previously mapped CRITICAL/HIGH → CAT I, causing CAT II (stored as HIGH) to display as CAT I. Fixed to correctly reverse stig_parser.py's SEVERITY_MAP: CRITICAL → CAT I, HIGH → CAT II, MEDIUM/LOW → CAT III. Round-trip is now correct for all three CAT levels.
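The corrected round trip can be checked with a small sketch. SEVERITY_MAP here is reconstructed from the stig_severity / mapped_severity pairs listed for parse_stig(); treating any unrecognized severity as CAT III is an assumption.

```python
# Forward mapping as described for stig_parser.py (XCCDF severity -> internal).
SEVERITY_MAP = {"high": "CRITICAL", "medium": "HIGH", "low": "MEDIUM"}


def sev_to_cat(severity: str) -> str:
    """Reverse the forward mapping: internal severity -> STIG category."""
    severity = severity.upper()
    if severity == "CRITICAL":
        return "CAT I"
    if severity == "HIGH":
        return "CAT II"
    return "CAT III"  # MEDIUM / LOW (and, by assumption, anything else)


round_trip = [sev_to_cat(SEVERITY_MAP[s]) for s in ("high", "medium", "low")]
```

The pre-fix behavior corresponds to returning CAT I for both CRITICAL and HIGH, which is what collapsed CAT II rules into CAT I.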
For STIG controls, control.control_id = the STIG version string (e.g., CYLN-OP-000010), NOT the Vuln ID. to_markdown() uses ### {vuln_id} (e.g., ### V-267789) as the Markdown section header — this passes the ^V-\d{5,6} regex in _parse_control_section(). The version string (e.g., CYLN-OP-000010) is written as - **Control ID**: {version} inside the section body. parse_stig_controls() reads fields.get('control id', fields.get('version', '')) to set ctrl_id — so the version string becomes control_id in the Control object, while the Vuln ID (V-267789) only appears in the header. The Vuln ID is stored separately as control.vuln_id. When looking up a STIG control in any DB or report, use the version string, not the Vuln ID.
STIG CIA classification is keyword-inferred, not authoritative. _infer_cia() does keyword matching on the concatenated rule title + VulnDiscussion text (lowercased). Keywords: C — encrypt, tls, ssl, certificate, credential, password, authentication, confidential, pii, sensitive data, disclosure, privacy, banner, identity provider, siem, audit, log; I — integrity, tamper, modify, certificate, tls, digital signature, hash, checksum, update, patch, version, configuration; A — availability, timeout, session, denial, database, port, protocol, service, disable, function. Multiple CIA letters can trigger. Default when no keywords match: 'C, I'. CIA is approximate for STIG controls — many keywords overlap (e.g., certificate and tls both trigger C and I simultaneously).
to_markdown() CCI list is capped at 5 per rule. rule['ccis'][:5] — if a rule has more than 5 CCI identifiers, the excess is shown as (+N more) in the generated .md file (e.g., CCI-001234, CCI-001235 (+3 more)). When parse_stig_controls() reads **CCIs**: back, it splits on ', ' → the last element of control.ccis will contain the (+N more) suffix (e.g., ['CCI-001234', 'CCI-001235 (+3 more)']). This is a cosmetic artifact — the CCI data is displayable but not cleanly list-parseable when the rule has more than 5 CCIs.
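To make the artifact concrete, here is a sketch of the write and read sides. format_ccis() and read_ccis() are hypothetical stand-ins for the relevant lines in to_markdown() and parse_stig_controls(); the separator and suffix format follow the example above.

```python
def format_ccis(ccis: list[str], cap: int = 5) -> str:
    """Write side: show at most `cap` CCIs, then a '(+N more)' suffix."""
    shown = ", ".join(ccis[:cap])
    extra = len(ccis) - cap
    return f"{shown} (+{extra} more)" if extra > 0 else shown


def read_ccis(text: str) -> list[str]:
    """Read side: a plain split, so the suffix sticks to the last item."""
    return text.split(", ")


ccis = [f"CCI-00000{i}" for i in range(1, 9)]  # eight made-up identifiers
parsed = read_ccis(format_ccis(ccis))
```

Three identifiers are lost on the round trip and the fifth is polluted, so anything that needs the full CCI list should go back to the parse_stig() output rather than the .md file.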
stig_parser.py handles XCCDF 1.1 only (namespace http://checklists.nist.gov/xccdf/1.1). STIGs using XCCDF 1.2 or a different namespace will parse silently with empty results or raise XML errors.
_infer_cia() keyword sets (verified from source): Called for each rule with (statement, title) — NOT check_content or fixtext. Scans combined lowercase string for membership:
- Confidentiality (C): encrypt, tls, ssl, certificate, credential, password, authentication, confidential, pii, sensitive data, disclosure, privacy, banner, identity provider, siem, audit, log
- Integrity (I): integrity, tamper, modify, certificate, tls, digital signature, hash, checksum, update, patch, version, configuration
- Availability (A): availability, timeout, session, denial, database, port, protocol, service, disable, function
- Default if no keywords match: 'C, I' (not empty).
Note: certificate and tls appear in BOTH C and I lists, so TLS-related rules always get both. configuration in the I list means many rules pick up Integrity even if not specifically about data integrity.
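A sketch of the keyword inference using the lists above. It assumes plain substring matching on the lowercased text and a ', '-joined result (consistent with the 'C, I' default); the function body is illustrative, not copied from stig_parser.py.

```python
CIA_KEYWORDS = {
    "C": ["encrypt", "tls", "ssl", "certificate", "credential", "password",
          "authentication", "confidential", "pii", "sensitive data",
          "disclosure", "privacy", "banner", "identity provider", "siem",
          "audit", "log"],
    "I": ["integrity", "tamper", "modify", "certificate", "tls",
          "digital signature", "hash", "checksum", "update", "patch",
          "version", "configuration"],
    "A": ["availability", "timeout", "session", "denial", "database", "port",
          "protocol", "service", "disable", "function"],
}


def infer_cia(statement: str, title: str) -> str:
    """Return e.g. 'C, I' based on substring hits; default is 'C, I'."""
    text = f"{statement} {title}".lower()
    hits = [letter for letter, words in CIA_KEYWORDS.items()
            if any(word in text for word in words)]
    return ", ".join(hits) if hits else "C, I"
```

If the real matcher is also substring-based, short keywords such as log and port will match inside longer words (login, support), which is one more reason to treat the result as approximate.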
stig_parser.py has a CLI — usable as python stig_parser.py <stig.xml> [--output out.md] [--profile <profile_id>] [--format md|json]. The --profile flag applies profile filtering at parse time (passes profile_id to parse_stig()). The GUI import now also filters rules by profile — see Open Question 8 (fixed).
StigImportDialog profile selection is cosmetic: FIXED (Round 29). See Open Question 8.
stig_paths passed to AssessmentEngine contains paths to generated .md files, not XCCDF paths. Full STIG import flow:
- User clicks "Import STIG" button → _import_stig() opens StigImportDialog
- User selects XCCDF XML file → StigImportDialog._browse() calls parse_stig(xml_path) immediately (parses while dialog is open)
- User clicks "Import STIG" button in dialog → dialog.exec() returns; dialog.stig_data['parsed'] holds the already-parsed result
- _import_stig() calls to_markdown(parsed, include_profiles=True) and writes all rules to pen-tester/references/stig-{safe_id}-controls.md, where safe_id = benchmark_id.replace(' ', '_').lower()
- StigsDB.save() records the import in the DB
- Target list item is added with data = {'target': xml_path, 'type': 'stig', 'stig_md_path': md_path}. Note that target is the XCCDF XML path, NOT the MD path; stig_md_path is the generated MD path
- At assessment time: target = d['target'] (XCCDF path, stored in DB as system target), stig_paths = [d.get('stig_md_path', '')] (the .md path), passed to load_all_controls() → parse_stig_controls(md_path)
The tools_dir used by _import_stig() and StigImportDialog._browse(): os.path.dirname(os.path.dirname(os.path.abspath(__file__))) + "/tools" = pen-tester/tools/. The refs_dir: same two-levels-up + "/references" = pen-tester/references/. The existing stig-cylanceon-prem-controls.md in references/ was generated this way from a prior import. Note: imported STIG .md files land in pen-tester/references/, which is tracked by Multi-Modal-Scanner — they will be staged on the next git add -A from root.
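The path derivation can be sketched as below. stig_md_path() is a hypothetical helper, and the benchmark ID is made up; the two-levels-up rule and the safe_id expression are taken from the notes above.

```python
import os


def stig_md_path(app_file: str, benchmark_id: str) -> str:
    """Resolve the generated controls file next to the standalone folder."""
    # Two levels up from standalone/main.py lands in pen-tester/.
    root = os.path.dirname(os.path.dirname(os.path.abspath(app_file)))
    safe_id = benchmark_id.replace(" ", "_").lower()
    return os.path.join(root, "references", f"stig-{safe_id}-controls.md")


example = stig_md_path(
    os.path.join("repo", "pen-tester", "standalone", "main.py"),
    "Example STIG V1",  # made-up benchmark ID
)
```

Only spaces are replaced, so other characters in a benchmark ID (slashes, colons) would pass straight through into the file name.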
AssessmentEngine.__init__ full signature (engine.py lines 64–93):
AssessmentEngine(
target: str,
target_type: str,
system_id: int,
selected_sets: list,
stig_paths: list = None,
framework_filter: str = None,
prior_fp_ids: set = None, # fallback if no prior_report_data
prior_report_data: dict = None, # from extract_prior_data_from_report()
)
After construction, call engine.load_controls() to load and tier all controls into three lists: engine.auto_results (automatic_confirmation), engine.review_items (review_required), engine.manual_items (manual_confirmation). engine.all_results contains all controls combined; this is what reporter.py reads. load_controls() returns get_tier_counts(self.controls) (a dict of tier → count). engine.scan_id is None until engine.start_scan() is called, which creates the scans DB row.
engine.complete(report_path) — called after _write_reports() in main.py. Counts findings (status == NON_COMPLIANT only) and compliant (status == COMPLIANT only), then calls ScansDB.complete(scan_id, controls_tested, findings_count, compliant_count, report_path) and SystemsDB.update_last_scanned(system_id). Only runs if self.scan_id is set. Important: findings_count and compliant_count do NOT account for all controls — FALSE_POSITIVE + NOT_APPLICABLE + NOT_TESTED controls fall into neither bucket. The two counts will not sum to controls_tested.
engine.get_summary() and engine.get_findings() — two additional methods on AssessmentEngine not part of the scan lifecycle but called by reporter.py and _show_results() in main.py:
- get_summary() -> dict: returns counts for total, compliant, non_compliant, not_applicable, false_positive, not_tested, critical/high/medium/low, auto_total, review_total, manual_total, auto_findings. Used for report header stats and GUI results screen. Important nuances: critical/high/medium/low count only NON_COMPLIANT results by severity; INFORMATIONAL findings are NOT counted in any severity key (no informational key in the dict); not_tested counts only status == NOT_TESTED, not NEEDS_REVIEW; false_positive counts r.is_false_positive regardless of status (FPs can be COMPLIANT or NON_COMPLIANT in the raw status field).
- get_findings() -> list: returns NON_COMPLIANT results (excluding FPs) sorted by severity: CRITICAL(0), HIGH(1), MEDIUM(2), LOW(3), INFORMATIONAL(4). Used internally; unknown severities sort last.
auto_findings in the summary dict counts NON_COMPLIANT in auto_results. NEEDS_REVIEW-promoted auto controls remain in auto_results with status=NEEDS_REVIEW, so they are NOT counted in auto_findings.
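The get_findings() filter and ordering can be sketched as follows. The Result dataclass is a simplified stand-in for the engine's result objects, and the "UNRATED" severity is a made-up value used to show the sort-last behavior.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3, "INFORMATIONAL": 4}


@dataclass
class Result:
    control_id: str
    status: str
    severity: str
    is_false_positive: bool = False


def get_findings(results: list[Result]) -> list[Result]:
    """NON_COMPLIANT, non-FP results, most severe first; unknown last."""
    findings = [r for r in results
                if r.status == "NON_COMPLIANT" and not r.is_false_positive]
    return sorted(findings,
                  key=lambda r: SEVERITY_ORDER.get(r.severity, len(SEVERITY_ORDER)))


results = [
    Result("A-1", "NON_COMPLIANT", "LOW"),
    Result("A-2", "NON_COMPLIANT", "CRITICAL"),
    Result("A-3", "COMPLIANT", "HIGH"),
    Result("A-4", "NON_COMPLIANT", "HIGH", is_false_positive=True),
    Result("A-5", "NON_COMPLIANT", "UNRATED"),
    Result("A-6", "NON_COMPLIANT", "INFORMATIONAL"),
]
ordered_ids = [r.control_id for r in get_findings(results)]
```

INFORMATIONAL results do appear in this list even though get_summary() has no severity key for them, so the findings list can be longer than critical + high + medium + low.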
auto_total + review_total + manual_total can exceed total. Promoted controls (auto-tier scanner returned NEEDS_REVIEW) are appended to self.review_items during the auto-tier loop (line 176) but are NEVER removed from self.auto_results. So they are counted in both auto_total (len(self.auto_results)) AND review_total (len(self.review_items)). total (from all_results) does NOT double-count them. In a scan with N promotions, auto_total + review_total + manual_total == total + N.
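A worked example of the double count, using plain lists of made-up control IDs in place of the engine's result lists. How all_results avoids the duplicate is not shown in these notes; the set union below is only a stand-in for that behavior.

```python
auto_results = ["A-1", "A-2", "A-3"]
review_items = ["R-1"]
manual_items = ["M-1"]

# One auto-tier control comes back NEEDS_REVIEW: it is appended to
# review_items but never removed from auto_results.
promoted = ["A-2"]
review_items.extend(promoted)

total = len(set(auto_results) | set(review_items) | set(manual_items))
tier_sum = len(auto_results) + len(review_items) + len(manual_items)
```

Here total is 5 while the three tier totals add up to 6, so any percentage computed as a tier total over total should account for the promoted controls.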
framework_filter is a string like "pci-dss" or "hipaa" that records which compliance framework was active during the scan. None means no filter (all frameworks). main.py never passes framework_filter to AssessmentEngine — it is always None in the current GUI. engine.framework_filter is always None; {{FRAMEWORK}} in the report is always "All frameworks"; scans.framework_filter is always None. Framework selection in reports is handled entirely client-side via the fwDrop dropdown in the browser. The framework_filter parameter is future infrastructure for a server-side filter.
target_type == 'unknown' falls back to 'website' in _start_assessment() (main.py line 845): target_type = d['type'] if d['type'] != 'unknown' else 'website'. If detect_target() returns 'unknown', the GUI treats the target as a website — runs website HTTP scanners against controls-library.md, uses report-template.html. The scan does not fail or error. If target_type is somehow not in _TYPE_SETS (shouldn't happen after the unknown→website conversion), _TYPE_SETS.get(target_type, ['website_agent']) defaults to ['website_agent'] controls.
_TYPE_SETS in main.py is the canonical target_type → selected_sets mapping:
_TYPE_SETS = {
'website': ['website_agent'],
'agent': ['website_agent'],
'api': ['api'],
'code': ['code_review'],
'stig': [], # uses stig_paths instead
'os': ['os_software'],
'interconnected': ['interconnected'],
}
selected_sets drives both load_all_controls() in controls.py and get_template_path() in reporter.py. STIG is the only type with an empty set; its controls come from stig_paths (paths to imported STIG .md files).
STIG controls hardcode tier='review_required' — they never go through classify_control(). parse_stig_controls() (controls.py line 309) sets tier='review_required' unconditionally for every STIG control. classify_control() is never called for STIG controls. Consequence: for a STIG scan, engine.auto_results = [] and engine.manual_items = [] — all STIG controls land in engine.review_items. auto_total = 0, review_total = N (all controls), manual_total = 0. Even if a STIG control's version string happened to match an ID in AUTO_IDS or MANUAL_IDS, it would still be review_required.
STIG scans unintentionally run WEBSITE_SCANNERS. target_type='stig' is passed to AssessmentEngine. In run_automatic_tier(), the dispatch chain (if target_type == 'code' ... elif target_type == 'api' ... elif target_type == 'os' ... elif target_type == 'agent' ... else ...) has no 'stig' branch, so it falls to the else clause and calls run_all_scanners(self.target, 'stig', ...). get_scanners_for_type() in scanners.py returns WEBSITE_SCANNERS for any type not explicitly handled — including 'stig'. These website scanners attempt to scan the XCCDF XML file path as a URL, fail with connection errors, and produce ERROR or empty results. Since auto_results = [] for a STIG scan, the auto-tier loop processes nothing. The family_evidence dict is built from the failed scan results (empty or ERROR), so all STIG review-tier controls get the no-match no-family checklist path: confidence=0.2, structured checklist, status=NOT_TESTED. The scanner errors are harmless in practice — STIG controls are review_required regardless and would all end up NOT_TESTED with checklists even if no scanners ran at all.
Multi-target scans auto-include interconnected controls. When the user enters more than one target in the GUI (main.py line ~856), 'interconnected' is automatically appended to selected_sets (unless it's STIG or already present). This means a multi-target scan runs both the primary library AND the 27 interconnected controls — Connected Systems assessment is triggered automatically, not as a separate explicit choice. A single-target scan never includes interconnected controls.
ScanWorker is a QThread. Scans run in a background thread (ScanWorker(QThread) in main.py) and emit pyqtSignal progress updates. The GUI stays responsive during scanning. If debugging a hang or crash during scanning, it's in the thread, not the main GUI thread. _abort_scan() does NOT terminate the thread — it only disconnects the progress and finished signals, sets scan_worker = None, and calls _reset_to_home(). The underlying QThread may continue running in the background. The thread cannot be safely force-terminated in PyQt6 without risking memory corruption. If a scan is aborted and a new scan started, both threads may run concurrently (the old one completes silently since signals are disconnected).
ScanWorker.progress signal is pyqtSignal(str, str, str, list) — 4 arguments. Connected at scan_worker.progress.connect(self._on_progress). Handler signature: _on_progress(self, name, desc, status, results). status is either 'running' (scanner started — appends to scanner feed, updates status label) or 'done' (scanner finished — increments progress bar and, for each NON_COMPLIANT result in results, appends [SEV] control_id — evidence[:80] to the findings feed). The desc arg appears in the 'running' line as "name: desc…". The results arg is always a list of ScanResult objects; when status == 'running', results is an empty list (signal is emitted before results are known).
prior_report_data vs prior_fp_ids in engine init. If prior_report_data (a dict from extract_prior_data_from_report()) is provided, the engine derives FP IDs as {cid for cid, v in prior_report_data.items() if v.get('is_fp')}. If only prior_fp_ids (a plain set of control IDs) is passed, that's used directly. The two-parameter design lets callers pass either a rich prior-report dict or a simpler set, depending on what's available. Important: self.prior_report_data is always a dict after __init__ — a None argument becomes {}. It is never None on the engine object, even if None was passed to the constructor (lines 77–82 of engine.py). user_notes from a prior report (the note key in prior_report_data) is also carried forward: ar.user_notes = prior['note'] (line 381) — this means triage notes from a previous report are pre-populated in the new scan's result objects.
control.frameworks field is always an empty list. controls.py never populates Control.frameworks — the parser puts all framework field values into _known (to prevent leakage into review_procedure) but does not assign them to ctrl.frameworks. reporter.py writes r.control.frameworks to the report JSON but it is always []. Framework display in reports is handled entirely by FW_MAP in the templates (client-side). Do not try to read control framework data from ctrl.frameworks — read it from the FW_MAP lookup or from the raw .md library fields.
STIG XCCDF import feeds into the same controls.py parser. stig_parser.py converts XCCDF rule records into the same field format that controls.py expects. The STIG-specific field names in _known (rule id, group id, check content, fix text, vul discuss, ia controls, weight, legacy ids, etc.) exist specifically to accommodate STIG data without leaking into review_procedure. This means STIG controls share the section and field parsing used by the other assessment types: parse_stig_controls() reuses _parse_control_section() rather than implementing its own field parser. The workflow diverges only where noted elsewhere in this document: tier assignment (always review_required) and HTML output (_generate_stig_html_report()).
pen-tester/SKILL.md vs the SKILL.md that agent_scanner.py reads — these are different things. pen-tester/SKILL.md (tracked in Multi-Modal-Scanner) is the Claude Code skill definition file for this scanner project. When a user runs an AI Agent assessment and provides a SKILL.md as the target, agent_scanner.py reads THAT file as the agent config being assessed for security vulnerabilities. agent_scanner.py also accepts GPT configs, LangChain defs, and MCP manifests as agent targets. There is no circular dependency — the scanner's own SKILL.md is never read by the scanner itself during an assessment.
Standalone app reads assets/ and references/ at runtime from the local filesystem. reporter.py finds templates by resolving os.path.join(standalone_dir, '..', 'assets'). controls.py finds libraries via os.path.join(standalone_dir, '..', 'references'). This means: (1) the standalone must stay nested inside pen-tester/, (2) pushing template changes to GitHub does NOT automatically update the standalone's output — the local files are what matter at runtime.
assessments.db is gitignored and local only. The SQLite database at pen-tester/standalone/assessments.db stores scan history (systems, scans, findings, imported STIGs). The decisions and false_positives tables were removed in Round 28. Never commit it. pen-tester/standalone/reports/ (generated reports) is also gitignored.
Python Scripts directory is not on PATH. C:\Users\slagb\AppData\Local\Python\pythoncore-3.14-64\Scripts is not in the user's PATH. Running pip from a standard PowerShell prompt may fail. Use the full path or cd to the standalone directory and use python -m pip instead.
_parse_control_section() silently skips non-control sections — two conditions cause a ### section to be silently dropped: (1) the header doesn't match the control ID regex (^[A-Z]{2,10}(?:-[A-Z]{2,10})?-\d{3,4} or ^V-\d{5,6}), OR (2) the header starts with one of the skip_prefixes: ('NOTE', 'TODO', 'LEGEND', 'TOTAL', 'TABLE'). A section is also dropped if it has no name, control name, statement, or control statement field after parsing. When debugging a missing control in the assessment (present in .md but not appearing in results), check all three conditions.
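The first two conditions can be tested in isolation with a sketch like the one below. The regexes and prefixes are the ones quoted above; is_control_header() is a hypothetical helper, and stripping the leading # characters and comparing prefixes in upper case are assumptions.

```python
import re

CONTROL_ID = re.compile(r"^[A-Z]{2,10}(?:-[A-Z]{2,10})?-\d{3,4}|^V-\d{5,6}")
SKIP_PREFIXES = ("NOTE", "TODO", "LEGEND", "TOTAL", "TABLE")


def is_control_header(header: str) -> bool:
    """True if a '### ...' header would be treated as a control section."""
    title = header.lstrip("#").strip()
    if title.upper().startswith(SKIP_PREFIXES):
        return False
    return CONTROL_ID.match(title) is not None
```

Running a suspect header through a check like this is a quick way to rule out the ID format before looking at the third condition (missing name or statement fields).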
Control object fields — what controls.py builds and scanners/engine.py consume via ar.control.X:
| Field | Type | Default | Notes |
|---|---|---|---|
| control_id | str | - | required; FAMILY-NNN format |
| name | str | - | required; human-readable control name |
| family | str | - | required; e.g. AUTH, CRYPTO, AGENT |
| library | str | - | required; website_agent / api / code_review / interconnected / stig |
| cia | str | "" | primary CIA triad impact |
| severity | str | "MEDIUM" | CRITICAL / HIGH / MEDIUM / LOW / INFORMATIONAL |
| statement | str | "" | control statement |
| test_procedure | str | "" | what to test (from Test: field in .md) |
| review_procedure | str | "" | test + all non-_known sub-fields concatenated |
| fix_text | str | "" | remediation text |
| tier | str | "review_required" | computed by classify_control(); not read from .md |
| frameworks | list | [] | list of framework reference strings |
| cwe | str | "" | CWE reference |
| languages | str | "ALL" | code-review only; drives filterReviewSteps() |
| sources | str | "" | source references |
| vuln_id | str | "" | STIG-specific |
| rule_id | str | "" | STIG-specific |
| srg_ref | str | "" | STIG-specific |
| ccis | list | [] | STIG-specific CCIs |
| check_content | str | "" | STIG-specific check content |
detect_target() in detector.py returns a dict, not a string. The function signature is detect_target(user_input: str) -> dict. The returned dict has keys: type, label, icon, control_sets (list), description. Valid type values: website, api, code, agent, stig, os, unknown. The control_sets value maps to the library key in CONTROL_LIBRARIES — both "website" and "agent" target types return control_sets: ["website_agent"] and share the same control library. This is why both assessment types use the same 67-control set.
Detection priority (checked in order): (1) OS scan keywords (localhost, 127.0.0.1, ::1, this machine, local machine, this host) → os; (2) URL matching ^https?:// → website; (3) STIG text pattern (xccdf, stig*.xml regex) → stig; (4) Agent text pattern (SKILL.md, .gpt, copilot, langchain, crewai, autogen, mcp, bedrock, vertex) → agent; (5) existing file path by extension (code extensions → code, .yaml/.yml/.json → api, .xml with xccdf/Benchmark content → stig, .md with SKILL in name → agent); (6) extension alone on non-existent path; (7) text contains swagger/openapi/api-spec/postman → api; (8) fallback → unknown. detect_languages(paths) was removed from detector.py in Round 27 (it was dead code, never called anywhere in the codebase).
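The ordering can be exercised with a simplified sketch. It skips the existing-file branch (step 5), returns only the type string rather than the full dict, and every pattern below is reconstructed from the description, so treat the regexes, flags, and extension lists as assumptions rather than the contents of detector.py.

```python
import re

OS_SCAN_KEYWORDS = {"localhost", "127.0.0.1", "::1", "this machine",
                    "local machine", "this host"}
WEBSITE_PATTERN = re.compile(r"^https?://", re.IGNORECASE)
STIG_PATTERNS = re.compile(r"xccdf|stig.*\.xml", re.IGNORECASE)
AGENT_PATTERNS = re.compile(
    r"SKILL\.md|\.gpt|copilot|langchain|crewai|autogen|mcp|bedrock|vertex",
    re.IGNORECASE,
)
CODE_EXTENSIONS = (".py", ".js")  # illustrative subset
API_EXTENSIONS = (".yaml", ".yml", ".json")
API_KEYWORDS = ("swagger", "openapi", "api-spec", "postman")


def detect_type(text: str) -> str:
    """Return the target type using the documented check order."""
    lowered = text.lower()
    if lowered in OS_SCAN_KEYWORDS:      # exact match only
        return "os"
    if WEBSITE_PATTERN.match(text):
        return "website"
    if STIG_PATTERNS.search(text):       # runs before the agent check
        return "stig"
    if AGENT_PATTERNS.search(text):      # matches anywhere in the string
        return "agent"
    if lowered.endswith(CODE_EXTENSIONS):
        return "code"
    if lowered.endswith(API_EXTENSIONS):
        return "api"
    if any(k in lowered for k in API_KEYWORDS):
        return "api"
    return "unknown"
```

Two consequences of the ordering show up immediately: localhost:8080 is not an OS target because the membership test is exact, and a .json file under an mcp-server directory is classified as agent before its extension is ever considered.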
detect_target() detection priority (verified against detector.py):
- text.lower() in OS_SCAN_KEYWORDS → "os". Exact membership test against {'localhost', '127.0.0.1', '::1', 'this machine', 'local machine', 'this host'}. No prefix/suffix: the input must be exactly one of these strings.
- WEBSITE_PATTERN.match(text) → "website". Matches http:// or https:// prefix (case-insensitive).
- STIG_PATTERNS.search(text) → "stig". Matches "xccdf" or "stig*.xml" ANYWHERE in string. Runs BEFORE agent check.
- AGENT_PATTERNS.search(text) → "agent". Matches SKILL.md, .gpt, copilot, langchain, crewai, autogen, mcp, bedrock, vertex ANYWHERE in string. Warning: "mcp" in any path component triggers agent detection (e.g., a file at C:/projects/mcp-server/config.json would be wrongly detected as agent).
- os.path.exists(text) → file/directory exists: check extension → code (if code ext), api (if .yaml/.yml/.json), stig (if .xml AND file reads as XCCDF), agent (if .md AND "SKILL" in basename). .json files matching API extension spec are detected as API even if not an OpenAPI spec.
- Extension-only fallback (file not required to exist): .py/.js/etc. → "code", .yaml/.yml/.json → "api".
- Keyword fallback: 'swagger', 'openapi', 'api-spec', 'postman' in text → "api".
- Default: "unknown" → main.py converts to "website" in _start_assessment().
extract_hostname(target) uses re.match(r'https?://([^/:]+)', target) — works for URLs only; returns the full target string unchanged for non-URL targets (file paths, local machine, etc.).
SystemsDB.get_or_create() keying — the full target string (e.g. https://example.com/api/v1) is the UNIQUE key in systems.target. extract_hostname() extracts just the hostname (e.g. example.com) and uses it as display_name. Two different URLs on the same host are two separate system records. For non-URL targets (code paths, STIG files, local machine), the full target string serves as both key and display name. The system_id returned is what ties all scans and decisions for that target together.
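The key versus display-name split is easy to see with the regex quoted above. The function body is a sketch built around that one re.match() call; the example URLs are illustrative.

```python
import re


def extract_hostname(target: str) -> str:
    """Return the host for URL targets, or the target unchanged otherwise."""
    match = re.match(r"https?://([^/:]+)", target)
    return match.group(1) if match else target


targets = ["https://example.com/api/v1", "https://example.com:8443/admin"]
display_names = [extract_hostname(t) for t in targets]
```

Both targets share the display name example.com but remain two distinct systems rows, because the UNIQUE key is the full target string.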
db.py connection setup (get_connection(), lines 21–26): sqlite3.connect() → conn.row_factory = sqlite3.Row (columns accessible by name, dict-style) → PRAGMA journal_mode=WAL → PRAGMA foreign_keys=ON. All DB calls go through this function, so every connection gets WAL mode and FK enforcement. WAL mode allows concurrent reads while a write is in progress, which is relevant when debugging "database locked" errors.
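The setup sequence can be sketched as follows. The `db_path` parameter is an assumption made so the example is self-contained; the real get_connection() may resolve the path to assessments.db differently.

```python
import sqlite3


def get_connection(db_path: str) -> sqlite3.Connection:
    """Open a connection with the same three settings listed above."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row           # row["column_name"] access
    conn.execute("PRAGMA journal_mode=WAL")  # readers are not blocked by a writer
    conn.execute("PRAGMA foreign_keys=ON")   # off by default in SQLite, per connection
    return conn
```

Foreign-key enforcement is a per-connection setting in SQLite, which is why routing every call through one function matters: a connection opened elsewhere would silently skip FK checks.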
assessments.db has four active tables (schema in db.py; decisions and false_positives were removed in Round 28 — their schema entries below are retained for historical reference):
- systems: one row per unique target string; UNIQUE(target); fields: id, target, target_type, display_name, first_scanned, last_scanned, scan_count
- scans: one row per scan run; FK to systems.id; fields: id, system_id, started_at, completed_at, controls_tested, findings_count, compliant_count, control_sets (JSON), framework_filter, report_path
- decisions: per-control triage decisions; UNIQUE(system_id, control_id); fields: id, system_id, control_id, scan_id, tier, decision, evidence_hash, notes, decided_at, decided_by
- false_positives: FP suppression records; UNIQUE(system_id, control_id); fields: id, system_id, control_id, justification, evidence_hash, created_at, last_validated, is_active
- imported_stigs: STIG import records; UNIQUE(stig_id); fields: id, stig_id, title, version, release_info, rule_count, file_path, imported_at, controls_md_path
- findings: per-control scan results; FK to scans.id; mirrors AssessmentResult fields: scan_id, control_id, tier, status, severity, evidence, confidence, cvss_score, cvss_vector, reachability, remediation, is_false_positive
The decisions and false_positives tables were removed in Round 28, along with DecisionsDB, FalsePositivesDB, and the apply_review_decision()/apply_manual_decision() methods. The schema entries above are retained for historical reference (they document the original design intent). The only carryforward mechanism is report-based (prior_report_data loaded from a prior HTML report file).
evidence_hash(evidence_text) in db.py computes sha256(text).hexdigest()[:16], a 16-character hex prefix. Before Round 28 this value was stored in decisions.evidence_hash and false_positives.evidence_hash and compared on reassessment to detect evidence changes (the FP re-evaluation trigger). Both tables and the comparison were removed in Round 28.
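A minimal sketch of the hash described above. The UTF-8 encoding step is an assumption: hashlib.sha256 requires bytes, and the text does not say which encoding db.py uses.

```python
import hashlib


def evidence_hash(evidence_text: str) -> str:
    """Return the first 16 hex characters of the SHA-256 digest."""
    # Assumption: the text is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(evidence_text.encode("utf-8")).hexdigest()
    return digest[:16]
```

Sixteen hex characters keep 64 bits of the digest, which is enough for change detection on a single control's evidence but is not a collision-resistant identifier.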
pen-tester/references/os-software-controls.md exists — this is the OS & Software assessment control library, used only by the Standalone app. It was not mentioned in the four control libraries listed in README.md's structure section but it is present on disk.
pen-tester/tools/stig_parser.py is the STIG XML parser utility. It is tracked in Multi-Modal-Scanner under pen-tester/tools/ and dynamically imported at runtime by the Standalone via sys.path.insert in main.py. It is not in the standalone's own directory.
Development phases: 9 of 10 complete. The original 10-phase plan: (1) project setup, (2) controls engine, (3) scanner framework, (4) website scanners, (5) code scanners, (6) API scanners, (7) agent scanners, (8) SQLite persistence, (9) PyQt6 GUI, (10) reporting & packaging. Phases 1–9 are complete. Phase 10's reporting component (HTML report generation, browser-based triage inside the report, save/load) is done. Only the packaging component (PyInstaller bundling into a distributable .exe) remains.
Evidence confidence scoring thresholds (supplement section 2):
- >70% confidence → "Likely non-compliant" (red label)
- 50–70% → "Uncertain — review evidence" (amber label)
- <50% → "Insufficient data" (gray label)
package.json, package-lock.json — no longer present at root. They were created when the docx npm package was used to generate Project_Handoff_Document.docx and have since been deleted.
The app actually runs — confirmed by user. The user successfully launched python main.py on their Windows machine, navigated through the scan to the review screen, and observed the bugs live (phantom OWASP control, empty evidence). The app is not theoretical. Basic launch → scan → review flow is working. The "phantom OWASP control" was Bug 1 — framework field keys like owasp and nist-800 were not in _known, so they appeared as spurious review procedure steps. Bug 1 is now fixed.
manual_confirmation evidence is engine-generated, not scanner-generated. No scanner is ever invoked for manual_confirmation controls. run_automatic_tier() builds structured evidence for each manual item directly: control name + requirement + numbered test procedure steps (from ctrl.test_procedure, split on .) + "No automated scanner covers this control. Complete the steps above then record your determination." This mirrors the review_required fallback but is always applied (there's no "did a scanner cover this?" check for manual controls). The resulting evidence text is what the user sees in the triage interface as the checklist to work through.
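The evidence assembly described above might look roughly like this. The function name, the "Requirement:" and "Test procedure:" labels, and the line layout are hypothetical; only the ingredients (name, requirement, steps split on ".", closing sentence) come from the description.

```python
def build_manual_evidence(name: str, requirement: str, test_procedure: str) -> str:
    """Assemble checklist-style evidence text for a manual_confirmation control."""
    # Split the procedure on "." and drop empty fragments.
    steps = [s.strip() for s in test_procedure.split(".") if s.strip()]
    lines = [name, f"Requirement: {requirement}", "Test procedure:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append(
        "No automated scanner covers this control. "
        "Complete the steps above then record your determination."
    )
    return "\n".join(lines)
```

One consequence of splitting on "." is that a procedure containing "e.g." or a version number such as "TLS 1.2" is broken into extra steps, so test_procedure text in the controls libraries reads best when each sentence is one step.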
Triage happens in the browser, not in the PyQt6 GUI. After a scan, the HTML report is opened in the browser. The user triages controls (marking FP, accepting findings, adding notes) directly in the browser UI and saves the result. In-app triage methods (apply_review_decision, apply_manual_decision) were removed in Round 28; in-app triage screens have not been built.
Complete triage save/load flow:
- In-browser state: Decisions are stored in a JS decisions{} dict in memory, and notes in a notes{} dict. The ls wrapper (a safe localStorage wrapper that silently no-ops if blocked by browser security/private mode) also persists decisions and notes to localStorage, namespaced by REPORT_ID (lsKey = 'sat_' + REPORT_ID + '_' + controlId).
- State seeding on load: On report open, decisions{} is seeded from: (a) CONTROLS JSON, where controls with mitigation === 'YES' are pre-marked as FALSE_POSITIVE, and (b) localStorage overrides for this REPORT_ID. localStorage wins over CONTROLS JSON, so live edits from a prior browser session are preserved even when loading the same HTML file again.
- "Save updated report" button (id=saveReport): Bakes current decisions and notes back into a copy of CONTROLS JSON, temporarily replaces sat-controls-data textContent, serializes document.documentElement.outerHTML, then triggers a browser download. Triage is NOT auto-saved: the user must click this button for decisions to persist in a downloadable file.
- Downloaded file naming: {original_report_name}_{YYYY-MM-DD_HH-MM}_saved.html. The downloaded file is a complete self-contained HTML with all decisions baked into sat-controls-data.
- Loading the saved report: User loads the downloaded file via "Load previous report" in the PyQt6 GUI. extract_prior_data_from_report() reads sat-controls-data and returns FP/note state as prior_report_data, which is passed to AssessmentEngine on the next scan.
REPORT_ID is str(uuid.uuid4())[:8] generated fresh in reporter.py on each call to generate_html_report() (line 226). Each scan produces a new REPORT_ID — this namespaces localStorage so different reports don't share triage state in the same browser.
NEEDS_REVIEW controls don't appear in any summary subcount. After run_automatic_tier() completes but before triage, the status distribution is:
- Auto-tier controls: COMPLIANT, NON_COMPLIANT, NEEDS_REVIEW, or FALSE_POSITIVE (if from prior report). Auto-tier controls with no scanner match always get NEEDS_REVIEW (family-based evidence if available, otherwise generic fallback message).
- Review-tier controls: NOT_TESTED is the status for all original review_required controls, both with and without scanner coverage. The review loop (run_automatic_tier() lines 244–327) NEVER sets ar.status in any branch: direct match, family-based evidence, or no-evidence checklist. ar.status stays at the dataclass default (NOT_TESTED). The only non-NOT_TESTED exceptions are: NEEDS_REVIEW for controls promoted from auto-tier (status was set at engine.py line 174 before the continue that moved them into review_items, so they arrive pre-set); FALSE_POSITIVE (from prior report carryforward). There is no review-tier path that produces COMPLIANT or NON_COMPLIANT before human triage.
- Manual-tier controls: NOT_TESTED (dataclass default; no in-app triage path exists to change it)
Key clarification on review-tier scanner evidence: A review_required control with a direct scanner match (e.g., INPUT-005 matched by InputValidationScanner returning NON_COMPLIANT) appears in the report as NOT_TESTED with scanner evidence text populated. The engine correctly copies the evidence but intentionally withholds the status judgment — human triage is required before status resolves. "Scanner directly matched" does NOT produce COMPLIANT or NON_COMPLIANT for review-tier controls.
get_summary() counts: compliant (status==COMPLIANT), non_compliant (NON_COMPLIANT), not_tested (status==NOT_TESTED), not_applicable (NOT_APPLICABLE), false_positive (is_false_positive). NEEDS_REVIEW and ERROR statuses are NOT counted in any subcount, whereas NOT_TESTED is counted in not_tested. The sum of subcounts therefore falls short of total only by the number of NEEDS_REVIEW (and ERROR) controls. The report header {{TOTAL_CONTROLS}} and {{NON_COMPLIANT_COUNT}} reflect this: NEEDS_REVIEW controls have no header stat of their own, while NOT_TESTED controls are accounted for by implication (total minus compliant, non_compliant, and the other subcounts).
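The gap between total and the subcounts can be reproduced with a small stand-in. `Result` is a hypothetical substitute for AssessmentResult, and the "total" key in the returned dict is an assumption about the real summary's shape.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:  # hypothetical stand-in for AssessmentResult
    status: str = "NOT_TESTED"
    is_false_positive: bool = False


def get_summary(results):
    """Count the five subcounts named above; NEEDS_REVIEW and ERROR get none."""
    by_status = Counter(r.status for r in results)
    return {
        "total": len(results),
        "compliant": by_status["COMPLIANT"],
        "non_compliant": by_status["NON_COMPLIANT"],
        "not_tested": by_status["NOT_TESTED"],
        "not_applicable": by_status["NOT_APPLICABLE"],
        "false_positive": sum(r.is_false_positive for r in results),
    }


results = [
    Result("COMPLIANT"), Result("NON_COMPLIANT"), Result(),
    Result("NEEDS_REVIEW"), Result("NEEDS_REVIEW"), Result("ERROR"),
    Result("FALSE_POSITIVE", is_false_positive=True),
]
summary = get_summary(results)
# Controls that appear in the total but in no subcount.
gap = summary["total"] - sum(v for k, v in summary.items() if k != "total")
```

With this sample data the gap is the two NEEDS_REVIEW results plus the one ERROR result.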
_generate_standalone_html(engine, findings_data, summary) is a fallback for non-STIG reports when the template file is missing from pen-tester/assets/. Different from _generate_stig_fallback_html(). Both fallbacks produce minimal HTML tables with no JavaScript interactivity.
User triage decision strings — if an in-app triage screen is ever built, these are the decision strings it should use. The underlying apply_review_decision() / apply_manual_decision() methods were removed in Round 28 and will need to be re-implemented. The decision string conventions should be preserved for consistency:
| Tier | Decision string | Resulting status |
|---|---|---|
| review_required | 'false_positive' | FALSE_POSITIVE + is_false_positive = True (FalsePositivesDB removed; re-implement storage) |
| review_required | 'accept' | NON_COMPLIANT (user confirms finding is real) |
| review_required | 'compliant' | COMPLIANT |
| review_required | 'na' | NOT_APPLICABLE |
| manual_confirmation | 'fail' | NON_COMPLIANT |
| manual_confirmation | 'pass' | COMPLIANT |
| manual_confirmation | 'na' | NOT_APPLICABLE |
automatic_confirmation controls have no triage decision — their status is set by the scanner result and is final. All three apply_* methods (apply_review_decision, apply_manual_decision, apply_all_prior_manual) were removed in Round 28 (see OQ 9, 11).
Reports save to pen-tester/standalone/reports/ (created automatically; gitignored). Filename format: {type_prefix}_{safe_target}_{timestamp}.{ext} — e.g. website_example.com_2026-06-12 02.30pm.html. The type_prefix comes from a dict in _write_reports(): website, api, code-review, agent, stig, assessment (fallback). CSV is always generated regardless of the user's format selection — generate_csv_report() runs unconditionally after the primary format. Reports open in the user's default browser via webbrowser.open('file:///' + path).
Multi-target combined report mode — when the user selects "Combined report" and there are multiple targets, main.py merges all four result lists from each subsequent engine into the primary engine: primary.all_results.extend(eng.all_results), primary.auto_results.extend(eng.auto_results), primary.review_items.extend(eng.review_items), primary.manual_items.extend(eng.manual_items). primary.target is set to all targets joined with ", ". This is a flat merge with no deduplication — if two targets both have results for, say, AUTH-001, both appear as separate entries in the combined report. The combined report is one report for multiple targets, not a deduplicated union. In separate-report mode, each target produces its own report file.
Combined mode stat card double-counting in _show_results(). After the combined merge, _show_results() still iterates ALL self.completed_engines and sums counts across each engine's .all_results. But the primary engine's all_results was already extended to include every secondary engine's items — so secondary engines' results are counted twice (once from primary's extended list, once from their own list). This means all stat card numbers (total, critical, high, compliant, suppressed) are inflated in combined mode: specifically, for N engines total, items from engines 2..N are double-counted. The report content itself is correct (produced from the merged primary engine); only the Results screen stat cards are affected by this double-count. In separate-report mode, each engine is counted once so no double-counting occurs. Acceptable limitation; low priority since combined mode is rarely used. The fix is straightforward: in _show_results(), use self.completed_engines[0].get_summary() (the merged primary) instead of summing across all completed_engines.
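The double count is easy to see with two toy engines. `Engine` is a hypothetical stand-in holding only all_results; the control IDs are sample data.

```python
class Engine:  # hypothetical stand-in for AssessmentEngine
    def __init__(self, results):
        self.all_results = list(results)


primary = Engine(["AUTH-001", "AUTH-002", "TLS-001"])
secondary = Engine(["AUTH-001", "API-004"])
completed_engines = [primary, secondary]

# Combined-mode flat merge: no deduplication.
primary.all_results.extend(secondary.all_results)

# Summing across every engine counts the secondary's results twice.
summed_total = sum(len(e.all_results) for e in completed_engines)

# Reading only the merged primary counts each result once.
merged_total = len(completed_engines[0].all_results)
```

Here the summed figure is 7 while the merged primary holds 5 results, so the stat cards overstate the total by exactly the size of the secondary engine's list.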
Multi-target scans are sequential, not parallel. _start_assessment() builds a pending_configs queue; _run_next_target() pops and scans one target at a time. Each target's ScanWorker must complete before the next starts (_on_target_done() → _run_next_target()). All targets report to completed_engines before _all_done() writes reports.
prior_report_data is applied to ALL targets in a multi-target scan. The same prior report dict (loaded once via "Load previous report") is passed to every AssessmentEngine instance (line ~902: prior_report_data=self.prior_report_data). If the prior report was for Target A and the scan includes Target A + Target B, Target B also gets the FP carryforward from Target A's prior report — even if the FPs don't apply. This is a known behavior, not a bug.
What the user will likely do in the next session: (1) run python main.py and verify Code Review scan shows Compliant results (Round 25 fix), (2) test multi-language directory scan with test_targets\code_sample\vulnbank_backend (Round 26 addition — Python/Java/Go/PHP), (3) test STIG import end-to-end, (4) test other scan types (API, Agent, OS), (5) then move to code cleanup batch or PyInstaller packaging.
The Claude Code skill supports 6 assessment types; the Standalone supports 7. pen-tester/SKILL.md defines: Website, AI Agent, Source Code, API, STIG, and Connected Systems (Interconnected). OS & Software assessment is Standalone-only — it requires local machine access that Claude Code cannot provide. Do not try to add OS assessment to the skill.
pen-tester.skill (21 KB, zip archive at project root) — the packaged Claude Code skill. Installable in Claude Code. May be behind current pen-tester/SKILL.md since it was last packaged in April 2026. Not a source file — it's a build artifact. If SKILL.md has changed since April, this needs to be rebuilt.
pen-tester-controls-catalogue.xlsx (40 KB, April 2026) — a spreadsheet catalogue of controls. Likely a reference/planning artifact from early development. Not used by the scanner at runtime. Status unknown — may be outdated relative to current .md libraries.
interactive-remediation-report.html — deleted. Was a demo/reference artifact for "VulnBank + Data Researcher Skill".
pen-tester-self-assessment.html/.md — deleted. Were historical artifacts from April 2026 (pre-AGENT rename, 60-control count).
pen-tester-vs-vanilla-comparison.html — deleted. Was a historical comparison artifact from April 2026.
security-assessment-report.html/.md — deleted. Were gitignored generated outputs from an April 2026 scan.
.~lock.pen-tester-controls-catalogue.xlsx# — deleted. Was a stale LibreOffice lock file.
test-targets/ at project root contains sample-skill/SKILL.md and sample-website/index.html — test inputs for the Claude Code skill. Tracked in Multi-Modal-Scanner. Separate from pen-tester/standalone/test_targets/ which is gitignored and local only.
pen-tester/standalone/gui/ contains only __init__.py. The entire GUI is implemented in main.py directly (PyQt6 classes defined inline). There is no separate GUI module structure despite the directory existing.
The planned 6-screen GUI has only 3 screens implemented. HANDOFF_SUPPLEMENT.md section 1 documents a 6-screen GUI design (Main window, STIG import, Progress, Review required, Manual confirmation, Results). Only 3 of these 6 screens are implemented in the current main.py:
- Screen 0: Home (target input, prior report load, STIG import, format options)
- Screen 1: Progress (scanner feed; live findings feed showing [SEV] control_id — evidence[:80] for each NON_COMPLIANT result; progress bar: ≤90% during scanning via the min(int(count / max(10, count+2) * 90), 90) formula, jumps to 95% when each target completes (_on_target_done()), 100% when all targets are done and reports are generated (_all_done()))
- Screen 2: Results (banner with targets/controls/findings counts; 5 stat cards: Controls tested, Critical, High, Compliant, Suppressed; report file links with Open buttons). Note: no stat card for Medium, Low, NEEDS_REVIEW, or NOT_APPLICABLE.
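The progress formula quoted for Screen 1 behaves as sketched here. The function name is hypothetical, and what `count` measures (for example, results received so far) is not stated in the text.

```python
def scan_progress(count: int) -> int:
    """Progress-bar percentage during scanning, using the quoted formula."""
    return min(int(count / max(10, count + 2) * 90), 90)
```

For small counts the denominator is fixed at 10, so the bar moves 9 points per item (5 items gives 45). From 8 items onward the denominator becomes count + 2, so the value climbs ever more slowly and stays below 90 for any realistic count, leaving the jumps to 95 and 100 for the completion callbacks.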
Screens 4 (Review required — split panel with action buttons: Accept finding/Compliant/N/A/False positive) and 5 (Manual confirmation — per-item radio button checklist) are the in-app triage screens that were never built. The apply_review_decision() and apply_manual_decision() methods that those screens would have called were removed in Round 28 and will need to be re-implemented if in-app triage is ever added. The supplement's Screen 1 mockup also included toolbar buttons (History, Systems, Settings, Help), a control-sets multi-select QListWidget, a framework dropdown in the GUI, and a prior scan banner — none of these appear in the current implementation. Control sets are auto-determined from target type; framework selection is client-side in reports; history/systems/settings are dead DB read methods.
Tier distribution across all ~195 controls (from design-time analysis in HANDOFF_SUPPLEMENT.md section 2, approximate):
- Automatic confirmation: ~125 controls (entire families + specific IDs with reliable scanner coverage)
- Review required: ~80 controls (scanner produces evidence, but human interpretation needed)
- Manual confirmation: ~22 controls (organizational knowledge required — policy, architecture, vendor docs)
pen-tester/standalone/make_test_reports.py — developer utility to generate 6 static test HTML reports using hardcoded sample data. Does NOT use the engine/scanner stack. Run from pen-tester/standalone/. Outputs to pen-tester/test-reports/: test-report-website.html, test-report-api.html, test-report-code-review.html, test-report-interconnected.html, test-report-os.html, test-report-fps.html (website with AUTH-001 pre-marked as FALSE_POSITIVE). OS type uses report-template.html (same as website — no separate OS HTML template). STIG is NOT covered by this utility. The JSON control objects in the hardcoded data match the same schema as generate_html_report() produces, so test reports behave identically to real scan outputs in the browser.
pen-tester/test-reports/ contains pre-generated test report HTML files for each assessment type. Tracked in Multi-Modal-Scanner. These are reference outputs, not generated by running the standalone app.
HANDOFF_SUPPLEMENT.md section 7 is stale and will mislead. It still says to upload handoff files, references pen-test-triage-update/ (deleted Round 25), and says "6 bugs" to fix. Do not follow section 7. Use HOW_TO_START_NEW_SESSION.txt and CLAUDE.md instead — both were updated in Round 25.
HANDOFF_SUPPLEMENT.md section 9 checksums are stale. They reference cross-system-controls.md (renamed to interconnected-controls.md) and reflect June 2, 2026 line counts — before Round 26+ changes to the report templates and controls.py. Do not use them for integrity verification without re-running the checksums.
Control library counts — one source of truth:
| Library file | Assessment types | Controls | Families |
|---|---|---|---|
| controls-library.md | Website, AI Agent | 67 | 13 |
| api-controls-library.md | API | 53 | 17 |
| code-review-controls.md | Code Review | 51 | 12 |
| interconnected-controls.md | Connected Systems | 27 | 9 (CHAIN, TRUST, RESCORE, DATAFLOW, SESSION, CRYPTO, CONFIG, INCIDENT, SUPPLY) |
| os-software-controls.md | OS & Software | 12 | 6 (PATCH, EOL, SOFTINV, SVCCONFIG, SVCEXPOSE, OSAUDIT) |
| STIG | STIG Compliance | dynamic (imported from XCCDF) | — |
Control counts are confirmed by ### header audit (all counts match CONTROL_LIBRARIES): controls-library.md has 73 ### headers total — 6 are non-control reference appendices (OWASP LLM 2025, NIST AI RMF 1.0, Google SAIF, ISO/IEC 42001, CSA AI, Platform Reference) = 67 actual controls. api-controls-library.md has 54 headers — 1 appendix (OWASP API Top 10 Quick Reference) = 53 controls. code-review-controls.md has 54 headers — 3 section headers (Control Domains, Framework References, Language Applicability) = 51 controls. interconnected-controls.md and os-software-controls.md header counts match directly (27 and 12). Open Question 6 is now resolved.
Connected Systems works differently in the Skill vs the Standalone — these are not the same workflow.
In the Claude Code skill version: requires two completed prior assessment HTML reports as inputs. AI analysis correlates findings across both reports to detect multi-step attack chains spanning connected systems, with CVSS re-scoring and reachability promotion. Does not scan a new target.
In the Standalone app: there is no separate "Connected Systems" target type and no requirement for prior assessments. Instead, when the user enters more than one target, main.py automatically appends 'interconnected' to selected_sets. This adds the 27 interconnected controls to the scan alongside the primary library. The interconnected controls run as additional review_required / manual_confirmation items — they don't correlate DB records or re-score CVSS. The reachability field on ScanResult (default "DIRECT") is stored but the standalone has no logic that re-scores CVSS based on it. Connected Systems in the standalone is effectively "extra controls assessed during a multi-target scan."
All 12 OS controls — IDs, families, tier assignments, and scanner coverage (verified against os-software-controls.md and controls.py):
| Control ID | Name | Family | Tier | Scanner covers? |
|---|---|---|---|---|
| PATCH-001 | OS security updates current | PATCH | auto | ✓ via os_scanner.py |
| PATCH-002 | Vulnerability patch SLA compliance | PATCH | manual | ✗ — always manual |
| PATCH-003 | Installed software CVE exposure | PATCH | auto | ✓ via NVD lookup |
| EOL-001 | OS end-of-life status | EOL | auto | ✓ via _OS_EOL dict |
| EOL-002 | Installed software end-of-life status | EOL | review | ✗ — no scanner |
| SOFTINV-001 | Software inventory documented | SOFTINV | manual | ✗ — always manual |
| SOFTINV-002 | Unauthorized software absent | SOFTINV | review | ✗ — no scanner |
| SVCCONFIG-001 | Services run as least-privilege | SVCCONFIG | review | ✗ — no scanner |
| SVCCONFIG-002 | Unnecessary/insecure services disabled | SVCCONFIG | auto | ✓ via ServicesScanner class |
| SVCEXPOSE-001 | Listening network services minimized | SVCEXPOSE | auto | ✓ via ServicesScanner class |
| SVCEXPOSE-002 | Remote management services secured | SVCEXPOSE | review | ✗ — no scanner |
| OSAUDIT-001 | OS audit logging enabled | OSAUDIT | review | ✗ — no scanner |
5 of 12 auto, 2 manual, 5 review_required (no scanner). os-software-controls.md header — FIXED (Round 27). File header now correctly says "12 controls across 6 families" (PATCH, EOL, SOFTINV, SVCCONFIG, SVCEXPOSE, OSAUDIT).
os_scanner.py internal structure — 3 scanner classes, not functions. scan_os_target() instantiates and runs exactly three classes in order: OSVersionScanner (produces PATCH-001 and EOL-001), SoftwareCVEScanner (produces PATCH-003), and ServicesScanner (produces SVCCONFIG-002 and SVCEXPOSE-001). There are no _check_services() or _check_ports() standalone functions — this logic lives inside ServicesScanner.scan(). Each scanner is wrapped in a try/except so one scanner crash doesn't abort the others. Scanner .name attributes (written to ScanResult.scanner and passed to progress_callback): OSVersionScanner.name = "os-version", SoftwareCVEScanner.name = "software-cve", ServicesScanner.name = "services". The target parameter to scan_os_target() is used as a label only — the scan always runs against the local machine regardless of what string is passed.
SoftwareCVEScanner only checks PRIORITY_KEYWORDS software against NVD — not all installed packages. The scanner enumerates all installed software (Windows: PowerShell registry query of HKLM Uninstall keys; Linux: dpkg-query first, rpm fallback), then filters to packages whose name matches any keyword from PRIORITY_KEYWORDS (38 keywords covering browsers, runtimes, crypto/SSH, web servers, databases, productivity, comms, devops tools, security tools, and common utilities). Only matching packages are queried against NVD. Non-priority software (e.g., a custom in-house application) is never queried. The NVD keyword is name[:50] — the software name truncated to 50 chars, with version omitted from the query string. CVSS threshold for a finding: ≥ 7.0; the finding is tagged CRITICAL if max score ≥ 9.0. Up to 5 CVEs are shown in evidence (_cve_summary() max_items=5), descriptions truncated to 120 chars.
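The filter and the scoring thresholds above can be sketched as follows. The three keywords are an illustrative subset, not the real 38-entry list, and case-insensitive substring matching is an assumption about how names are compared. The COMPLIANT severity of HIGH follows the PATCH-003 outcomes documented for this scanner.

```python
# Illustrative subset only; the real PRIORITY_KEYWORDS has 38 entries.
PRIORITY_KEYWORDS = ("chrome", "openssl", "python")


def is_priority(name: str) -> bool:
    """Assumed matching rule: case-insensitive substring match."""
    return any(keyword in name.lower() for keyword in PRIORITY_KEYWORDS)


def nvd_keyword(name: str) -> str:
    """Software name truncated to 50 characters; version is not included."""
    return name[:50]


def classify_cves(scores):
    """Map CVSS scores to (status, severity) using the 7.0 and 9.0 thresholds."""
    findings = [s for s in scores if s >= 7.0]
    if not findings:
        return ("COMPLIANT", "HIGH")
    severity = "CRITICAL" if max(findings) >= 9.0 else "HIGH"
    return ("NON_COMPLIANT", severity)
```

Because the query omits the version, a CVE that affects only an older release of a product can still be returned for a patched install, which is one reason the PATCH-003 confidence values are lower than those of the other OS controls.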
EOL-001 has four outcome paths across three statuses, not just pass/fail. When the OS is found in _OS_EOL and past its date → NON_COMPLIANT. When found and within support but < 6 months remaining → NEEDS_REVIEW (triggers auto→review promotion). When found and ≥ 6 months remaining → COMPLIANT. When NOT found in _OS_EOL → NEEDS_REVIEW. Windows 10 and Windows 11 version-specific checks use display_version from the Windows registry (e.g., win10-22h2 has EOL date 2025-10-14 and win11-22h2 has EOL date 2025-10-14; both are past EOL as of June 2026 → NON_COMPLIANT). Windows 11 entries were added in Round 25; see the _OS_EOL dict coverage summary below.
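A sketch of the EOL-001 decision, using two dates quoted in this document. The 183-day approximation of "6 months", the handling of the exact EOL day, and the function name are assumptions; os_scanner.py may compute the window differently.

```python
from datetime import date, timedelta

# Two entries taken from this document; the real dict is much larger.
_OS_EOL = {
    "win10-22h2": date(2025, 10, 14),
    "win11-23h2": date(2026, 11, 10),
}
SIX_MONTHS = timedelta(days=183)  # assumption: approximate 6-month window


def eol_status(os_key: str, today: date) -> str:
    """Return the EOL-001 status for an OS key on a given day."""
    eol = _OS_EOL.get(os_key)
    if eol is None:
        return "NEEDS_REVIEW"      # OS not in the dict
    if today > eol:
        return "NON_COMPLIANT"     # past end of life
    if eol - today < SIX_MONTHS:
        return "NEEDS_REVIEW"      # supported, but expiring soon
    return "COMPLIANT"
```

Passing `today` as a parameter instead of calling date.today() inside the function keeps the check testable against fixed dates.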
_EXPECTED_PORTS (ports that do NOT trigger an SVCEXPOSE-001 finding): {22, 80, 443, 3389, 8080, 8443, 8888, 53, 135, 139, 445, 5985, 5986}. Any listening port outside this set triggers SVCEXPOSE-001 NEEDS_REVIEW (not outright NON_COMPLIANT, because the business justification must be confirmed manually). The scanner uses netstat -ano on Windows and ss -tlnp (falling back to netstat -tlnp) on Linux.
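The port check reduces to a set difference, sketched below. The function name and return shape are hypothetical, and parsing of the netstat or ss output is omitted.

```python
_EXPECTED_PORTS = {22, 80, 443, 3389, 8080, 8443, 8888,
                   53, 135, 139, 445, 5985, 5986}


def check_listening_ports(listening):
    """Return (status, unexpected_ports) for a collection of listening ports."""
    unexpected = sorted(set(listening) - _EXPECTED_PORTS)
    status = "NEEDS_REVIEW" if unexpected else "COMPLIANT"
    return status, unexpected
```

For example, a host listening on 22, 3306, and 6379 yields `("NEEDS_REVIEW", [3306, 6379])`, while a host listening only on 443 and 5985 yields `("COMPLIANT", [])`.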
_RISKY_SERVICES_WIN includes WinRM, but its ports are in _EXPECTED_PORTS. WinRM (WinRM service name) is in the risky services list for SVCCONFIG-002 — if the WinRM service is running, SVCCONFIG-002 fires NON_COMPLIANT. However, ports 5985 and 5986 (WinRM's ports) are in _EXPECTED_PORTS, so SVCEXPOSE-001 will NOT flag them as unexpected. Both controls can apply to the same WinRM configuration, but via different mechanisms.
Full _RISKY_SERVICES_WIN service list (verified from source — these are the 8 Windows service names that trigger SVCCONFIG-002 NON_COMPLIANT):
TlntSvr (Telnet server), FTPSVC (IIS FTP), tftpd32 (TFTP), SNMP (v1/v2 community strings), RemoteRegistry (allows remote registry edits), Spooler (Print Spooler — CVE-2021-34527 PrintNightmare), RasMan (Remote Access Service), WinRM (Windows Remote Management).
Full _RISKY_SERVICES_LINUX service list (verified from source — these are the Linux daemon names checked via systemctl or service enumeration):
telnetd (Telnet daemon), vsftpd (FTP), proftpd (ProFTPD), tftpd (TFTP), rshd (RSH — cleartext), rexecd (Rexec — cleartext), snmpd (SNMP — check v1/v2), xinetd (inetd super-server). Service detection on Linux uses systemctl list-units --type=service --state=running --plain --no-legend. If systemctl fails (non-systemd systems, or any exception), returns an empty list — no fallback. This means risky service detection is silently skipped on non-systemd Linux.
_OS_EOL dict OS coverage summary (verified from source): Windows 7/8/8.1; Windows 10 (version-dependent via display_version: 1909, 20H2, 21H1, 21H2, 22H2); Windows 11 (version-dependent via display_version: 21H2, 22H2, 23H2, 24H2, 25H2, 26H1 — added Round 25, Enterprise/Education dates); Windows Server 2008/2012/2016/2019/2022; Ubuntu 16.04–24.04; Debian 9–12; CentOS 6/7/8; RHEL 7/8; macOS 10.15/11/12. Not in dict: Windows Server 2025, macOS 13 (Ventura)/14 (Sonoma)/15 (Sequoia) — produce NEEDS_REVIEW. As of June 2026, win11-22h2 (EOL 2025-10-14), win11-23h2 (EOL 2026-11-10 — within 6 months, so NEEDS_REVIEW), win10-22h2 (EOL 2025-10-14), and ubuntu 20.04 (EOL 2025-04-30) will trigger findings on hosts running those versions.
PATCH-001 Windows update check has a 45-second timeout. _get_pending_windows_updates() runs a PowerShell COM object query (New-Object -ComObject Microsoft.Update.Session) with timeout=45. If the query fails or returns 'ERROR', PATCH-001 returns NEEDS_REVIEW (not NON_COMPLIANT). This is the longest-running step in an OS scan and may require elevation to get accurate results.
OS scanner ScanResult confidence and reachability values (verified against source):
PATCH-001 exact outcomes:
- NEEDS_REVIEW (query fails/error): severity='HIGH', confidence=0.3, no reachability
- COMPLIANT (pending=0): severity='CRITICAL', confidence=0.9, no reachability — severity='CRITICAL' on COMPLIANT is counterintuitive but reflects the severity of the control, not the finding
- NON_COMPLIANT (pending>0): severity='CRITICAL', confidence=0.95, cvss_score=7.8, cvss_vector='CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H', reachability='INDIRECT'
EOL-001 exact outcomes:
- NON_COMPLIANT (past EOL date): severity='CRITICAL', confidence=0.95, cvss_score=9.8, cvss_vector='CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H', reachability='DIRECT'
- NEEDS_REVIEW or COMPLIANT (within support): confidence=0.9, no reachability; severity='HIGH' if <12 months remaining, 'INFORMATIONAL' otherwise; status='NEEDS_REVIEW' if <6 months remaining, 'COMPLIANT' otherwise
- NEEDS_REVIEW (OS not in _OS_EOL dict): severity='MEDIUM', confidence=0.3
PATCH-003 exact outcomes:
- NON_COMPLIANT (CVEs found): severity='CRITICAL' if max_score >= 9.0, else 'HIGH'; confidence=0.75; reachability='INDIRECT'
- COMPLIANT (no CVEs ≥ 7.0, or no priority packages identified): severity='HIGH', confidence=0.6, no reachability
- No NEEDS_REVIEW path exists for PATCH-003 — unlike PATCH-001, NVD query failures silently produce empty CVE results, so a failed NVD query produces COMPLIANT with lower confidence rather than NEEDS_REVIEW
SVCCONFIG-002 exact outcomes:
- NON_COMPLIANT (risky services found): severity='HIGH', confidence=0.9, reachability='DIRECT'
- COMPLIANT (running services present, none risky): severity='HIGH', confidence=0.85, no reachability
- NEEDS_REVIEW (can't enumerate services): severity='HIGH', confidence=0.2
SVCEXPOSE-001 exact outcomes:
- NEEDS_REVIEW (unexpected ports present, i.e., ports outside _EXPECTED_PORTS): severity='MEDIUM', confidence=0.8, no reachability. Unexpected ports produce NEEDS_REVIEW, not NON_COMPLIANT; business justification is required before flagging as a violation
- COMPLIANT (all listening ports within _EXPECTED_PORTS): severity='MEDIUM', confidence=0.8, no reachability
- NEEDS_REVIEW (can't enumerate ports): severity='MEDIUM', confidence=0.1
OS & Software scanner makes live HTTP requests to the NVD API. os_scanner.py enumerates the local machine (OS version via platform module + winreg on Windows for detailed build info, installed software, running services, listening ports), then queries https://services.nvd.nist.gov/rest/json/cves/2.0 to check installed software against the National Vulnerability Database for CVE exposure. Uses urllib.request (Python stdlib, NOT the requests library), so there is no extra dependency. NVD queries: resultsPerPage=10 (max 10 CVEs per keyword); _highest_cvss() finds the max score across all returned CVEs, extracting scores in priority order CVSSv3.1 > CVSSv3.0 > CVSSv2. No API key is passed (always unauthenticated; the api_key parameter in _nvd_query() exists but is never supplied by callers). Rate limit: _NVD_RATE_DELAY = 0.7 seconds between requests (~1.4 req/sec). NVD's documented public limit without an API key is 5 requests per rolling 30-second window (50 with a key), so a 0.7-second delay can exceed it once a scan issues more than a handful of uncached queries. NVD query errors are silently caught and return [], so NVD unavailability (and, presumably, throttling) produces no user-visible error, just empty CVE results. NVD queries use in-memory caching per scan run (_NVD_CACHE dict), so the same keyword won't hit the API twice in one scan. An internet connection is required for CVE lookups; local enumeration (OS version, software list, services, ports) runs offline. EOL date checking uses a hardcoded _OS_EOL dict in os_scanner.py and makes no network call. This is different from the website scanner (scanners.py), which hits the target over the network: the OS scanner hits NVD, not the target. Code review, STIG, agent, and API (api_scanner.py, static spec analysis) scanners are fully offline.
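The score-priority rule can be sketched against the NVD 2.0 response layout, where each item in "vulnerabilities" holds a "cve" object whose "metrics" may contain cvssMetricV31, cvssMetricV30, and cvssMetricV2 lists. Reading "priority order" as "use the newest CVSS version present for each CVE, then take the maximum across CVEs" is an assumption, as are the function names; no network call is made.

```python
_METRIC_PRIORITY = ("cvssMetricV31", "cvssMetricV30", "cvssMetricV2")


def cve_score(cve: dict):
    """Base score from the highest-priority CVSS version present, else None."""
    metrics = cve.get("metrics", {})
    for key in _METRIC_PRIORITY:
        entries = metrics.get(key)
        if entries:
            return max(e["cvssData"]["baseScore"] for e in entries)
    return None


def highest_cvss(vulnerabilities: list) -> float:
    """Maximum per-CVE score across an NVD 'vulnerabilities' list."""
    scores = [cve_score(v.get("cve", {})) for v in vulnerabilities]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)


sample = [
    {"cve": {"metrics": {
        "cvssMetricV31": [{"cvssData": {"baseScore": 7.5}}],
        "cvssMetricV2": [{"cvssData": {"baseScore": 10.0}}],
    }}},
    {"cve": {"metrics": {"cvssMetricV2": [{"cvssData": {"baseScore": 6.8}}]}}},
    {"cve": {"metrics": {}}},
]
top = highest_cvss(sample)
```

In the sample, the first CVE contributes its v3.1 score and its v2 score is ignored, so the result reflects the newer scoring system whenever one is available.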
DB-based false positive carryforward was removed in Round 28. FalsePositivesDB, DecisionsDB, and evidence hash comparison were all deleted. The only carryforward is report-based: loading a prior HTML report via "Load previous report" applies FP marks, notes, and STIG triage decisions via prior_report_data in run_automatic_tier().
Project_Handoff_Document.docx — deleted. PROJECT_HANDOFF.md is the authoritative handoff document.
HANDOFF_SUPPLEMENT.md — keep for now. Sections 8 and 16 are inlined into this document; sections 7 and 9 are stale. Sections 1–6, 10–15 contain GUI specs, confidence thresholds, scanner architecture detail, and verified test procedures that may still be useful. Do not delete until confirmed redundant.
Python module map — what each .py file does:
| File | Entry point(s) | Purpose |
|---|---|---|
| main.py | run directly | PyQt6 GUI: all UI classes defined inline here; gui/ dir contains only __init__.py |
| engine.py | AssessmentEngine class | Orchestration: loads controls library, calls scanners, applies tier logic, builds AssessmentResult list |
| detector.py | detect_target(user_input) | Target type detection: reads input string and returns dict with type/label/icon/control_sets/description. Priority order: os → website → stig → agent → file-path → extension-fallback → keyword → unknown |
| scanners.py | run_all_scanners(target, target_type) | Website scanner: despite the generic name, handles website HTTP scanning only |
| agent_scanner.py | scan_agent_config(filepath) → imported as scan_agent | AI Agent scanner: static config analysis (parses SKILL.md, GPT configs, LangChain defs, MCP manifests); tests AGENT-001 through AGENT-011 controls; no live HTTP requests |
| api_scanner.py | scan_spec(filepath) → imported as scan_api | API scanner: static analysis of OpenAPI/Swagger spec files (YAML or JSON); does NOT make live HTTP requests; target must be a file path to the spec |
| code_scanner.py | scan_target(target) → imported as scan_code; routes to scan_file() or scan_directory() | Source code scanner: VULN_PATTERNS dict + LANG_EXTENSIONS map for language detection. Emits both NON_COMPLIANT (on pattern match) and COMPLIANT (for every checked control with no violation) via _build_compliant_results(). |
| os_scanner.py | scan_os_target(target) → imported as scan_os | OS & Software scanner: enumerates local machine OS version, installed software, running services, listening ports; queries NVD API for CVEs |
| controls.py | load_all_controls(selected_sets, stig_paths) | Parses .md control libraries into Python objects; _known set gates what becomes review_procedure |
| reporter.py | generate_html_report(engine, path) and variants | Generates HTML reports from AssessmentResult list; finds templates via relative path to ../assets/ |
| db.py | DB class methods (static) | SQLite persistence: stores target systems, scan history, findings, and imported STIG records. The per-control decision, false-positive, and evidence-hash stores (DecisionsDB, FalsePositivesDB) were removed in Round 28. |
| make_test_reports.py | run directly | Dev utility to generate test HTML reports for every assessment type; not part of the app |
Report output goes to pen-tester/standalone/reports/ — NOT pen-tester/test-reports/. _all_done() in main.py writes reports to os.path.join(os.path.dirname(os.path.abspath(__file__)), "reports") = pen-tester/standalone/reports/. This directory is gitignored (local only). pen-tester/test-reports/ is a separate directory with pre-generated static test reports tracked in Multi-Modal-Scanner.
_abort_scan() disconnects signals but does NOT terminate the scan thread. If the user clicks "Cancel" during a scan, _abort_scan() disconnects scan_worker.progress and scan_worker.finished signals and navigates back to the home screen. The ScanWorker QThread continues running in the background — it just can't update the UI or trigger report generation anymore. This is safe for short-running scans. For OS scans (which hit NVD API and can take longer), the thread may run for a significant time after abort. Accepted behavior for now. The proper fix is a cooperative stop flag: add self._stop_requested = False to ScanWorker.__init__(), set it to True in _abort_scan() before disconnecting signals, and check it at appropriate points inside ScanWorker.run() (e.g., between scanner calls) to exit early. Not currently queued.
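The cooperative stop flag proposed above could look roughly like this. It is a sketch only: `threading.Thread` stands in for the QThread-based ScanWorker so the example runs without PyQt6, and the `scanners` list, `request_stop()`, and `results` are hypothetical names, not the app's actual interface.

```python
import threading


class ScanWorker(threading.Thread):
    """Stand-in for the QThread-based worker; runs scanner callables in order."""

    def __init__(self, scanners):
        super().__init__()
        self._scanners = scanners
        self._stop_requested = False   # set to True by the abort handler
        self.results = []

    def request_stop(self):
        self._stop_requested = True

    def run(self):
        for scanner in self._scanners:
            # Check between scanner calls so an abort takes effect promptly.
            if self._stop_requested:
                return
            self.results.append(scanner())


started = threading.Event()
release = threading.Event()


def slow_scanner():
    started.set()        # signal that the first scan is in progress
    release.wait(5)      # simulate a long-running NVD lookup
    return "first"


worker = ScanWorker([slow_scanner, lambda: "second"])
worker.start()
started.wait(5)
worker.request_stop()    # what _abort_scan() would do before disconnecting signals
release.set()
worker.join(5)
print(worker.results)    # ['first']: the second scanner never ran
```

The scanner that is already running still completes; the flag only prevents the next one from starting. Interrupting a scanner mid-call would need the same check inside the scanner's own loops.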
prior_report_data persists across scan sessions within the same app launch. _reset_to_home() clears completed_engines, pending_configs, and _scanner_count, then navigates to screen 0 — but it does NOT clear self.prior_report_data, and it does NOT clear self.target_list (the list of configured scan targets). If a user loads a prior report, scans, resets, and scans again without loading a new prior report, the old prior_report_data is still applied to the new scan. The same targets already added to target_list remain after reset — the user can start a new scan of the same targets immediately without re-entering them. To clear prior_report_data, the user must restart the app or load a new prior report (which replaces the dict).
App initialization (main() function, lines 1132–1138): app.setStyle("Fusion") (flat cross-platform look), app.setFont(QFont("Segoe UI", 10)). MainWindow.setMinimumSize(860, 600). StigImportDialog.setMinimumWidth(520). The Fusion style avoids platform-specific widget rendering differences between Windows 10/11.
Two batch launchers in pen-tester/standalone/:
- `launch.bat`: runs `python main.py` (uses whatever `python` is in PATH)
- `run_app.bat`: runs `"C:\Users\slagb\AppData\Local\Python\pythoncore-3.14-64\python.exe" main.py` (hardcoded to the user's Python 3.14 install; use this if `python` is not on PATH)
scanners.py (run_all_scanners, ScanResult) is a required import in engine.py — if it fails, engine.py fails to load entirely. The four optional scanners (code_scanner, api_scanner, agent_scanner, os_scanner) are wrapped in try/except: if any fails to import, it is set to None and skipped in run_automatic_tier() (guarded by if scan_code:, if scan_api:, etc.). Missing optional dependency → scanner skipped → controls fall through to no-scanner fallback (review_required checklist + confidence 0.2).
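The optional-scanner guard described above follows a common pattern, sketched here under stated assumptions. The helper `optional_import` is hypothetical (the text says engine.py uses plain try/except around each import); the module and function names are the ones listed in the module map.

```python
import importlib


def optional_import(module_name: str, attr: str):
    """Return module.attr, or None when the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


# Outside the project directory these modules are not importable,
# so each name resolves to None and the matching scanner is skipped.
scan_code = optional_import("code_scanner", "scan_target")
scan_api = optional_import("api_scanner", "scan_spec")

scan_results = []
if scan_code:                       # mirrors the `if scan_code:` guard
    scan_results.extend(scan_code("some/target"))
if scan_api:
    scan_results.extend(scan_api("spec.yaml"))
print(len(scan_results))
```

Catching only ImportError keeps genuine bugs inside a scanner module (for example a SyntaxError) visible instead of silently disabling the scanner.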
API assessment differs between skill and standalone. In the Claude Code skill (pen-tester/SKILL.md), API assessment accepts OpenAPI/Swagger specs, live API endpoint URLs, and Postman collections — Claude reads and interprets them with AI. In the Standalone, api_scanner.py only accepts a file path to an OpenAPI/Swagger YAML/JSON spec. Live API URLs and Postman collections are not supported in the standalone. detect_target() routes .yaml/.yml/.json files with API-like content to target_type = "api".
api_scanner.py requires a file path, not a URL. The API assessment target must be a path to an OpenAPI/Swagger spec file (.yaml, .yml, or .json). scan_spec() parses the spec statically — it detects BOLA patterns, missing auth requirements, mass assignment risks, debug endpoints, non-HTTPS server URLs, rate limiting gaps, and GraphQL risks by reading the spec structure. It makes no live HTTP requests. The detector.py routes .yaml/.json file inputs with API-like content to target_type = "api". If a user enters a live API URL instead of a spec file, scan_spec() will fail to parse it.
api_scanner.py control coverage — exact output from scan_spec() (verified against source):
| Control ID | Status produced | Trigger condition |
|---|---|---|
| BOLA-001 | NON_COMPLIANT or COMPLIANT | Always — NON_COMPLIANT if integer path params found, else COMPLIANT |
| AUTH-001 | NON_COMPLIANT or COMPLIANT | Always — NON_COMPLIANT if any endpoint lacks security, else COMPLIANT |
| BOPLA-001 | NON_COMPLIANT only | Only if schema has writable sensitive fields (role, isadmin, balance, etc.) |
| RATE-001 | NEEDS_REVIEW | Always — rate limiting can't be verified from spec alone |
| FUNC-001 | NON_COMPLIANT only | Only if /admin endpoints found with no security |
| SSRF-001 | NON_COMPLIANT only | Only if URL/URI/callback/redirect params found |
| CONFIG-001 | NON_COMPLIANT only | Only if /debug, /internal, /trace, /metrics, /actuator paths found |
| CONFIG-002 | NEEDS_REVIEW | Always — API surface summary (endpoint count, methods, schemas, etc.) |
| CONFIG-003 | NON_COMPLIANT only | Only if http:// server URLs found (non-HTTPS) |
| DATA-001 | NON_COMPLIANT only | Only if sensitive field names (password, ssn, credit_card, api_key, etc.) in schemas |
| WEBHOOK-001 | NEEDS_REVIEW only | Only if webhook/hook paths found |
| GRAPHQL-001 | NEEDS_REVIEW only | Only if graphql/graphiql paths found |
| INVENTORY-001 | NEEDS_REVIEW | Always — "are there undocumented endpoints?" is unanswerable from spec alone |
Important: AUTH-002 and AUTH-003 appear in AUTO_IDS for the API library but api_scanner.py produces no result for them. They get family-based evidence from AUTH-001 (confidence 0.5) — the report shows "No scanner maps directly to AUTH-002, but related AUTH controls were tested: [AUTH-001 result]" with status COMPLIANT (if AUTH-001 was COMPLIANT) or NEEDS_REVIEW. They stay in auto_results (not promoted to review_required). INVENTORY-001 is NOT in AUTO_IDS or MANUAL_IDS — it defaults to review_required tier; since its status is always NEEDS_REVIEW, it stays in review_items. The scanner produces 13 control results at most — the other ~40 API controls get no scanner result.
_scan_raw_spec() — YAML-less fallback. When HAS_YAML = False (pyyaml not installed) and the spec is YAML (not JSON), parse_spec() falls back to basic line-key extraction and sets '_raw': content in the spec dict. scan_spec() detects '_raw' in spec and not spec.get('paths') and calls _scan_raw_spec(). The fallback does keyword matching on lowercased raw text and produces results for only 6 control IDs: BOLA-001, AUTH-001, CONFIG-003, SSRF-001, CONFIG-001, DATA-001 — all at lower confidence (~0.6–0.7). No NEEDS_REVIEW results in fallback mode. The other ~47 API controls get no result when YAML parsing fails.
api_scanner.py only fully supports OpenAPI 3.x, not Swagger 2.x. extract_schemas() looks for spec.get('components', {}).get('schemas', {}) — the OpenAPI 3.x location. Swagger 2.x specs use spec.get('definitions', {}) instead. A Swagger 2.x .yaml file will parse successfully (if pyyaml is installed) but produce schemas = {} — meaning BOPLA-001 (mass assignment) and DATA-001 (sensitive fields) checks will find nothing and produce no results. BOLA-001 and AUTH-001 checks still work because they read spec['paths'], which exists in both versions. If a user provides a Swagger 2.x spec and gets no BOPLA-001 or DATA-001 findings, this is the reason.
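One possible way to close this gap, not currently implemented, is to fall back to the Swagger 2.x location when the OpenAPI 3.x one is empty. The function name `extract_schemas_any_version` is hypothetical, and the two spec dicts are minimal made-up inputs.

```python
def extract_schemas_any_version(spec: dict) -> dict:
    """Return schema definitions from an OpenAPI 3.x or Swagger 2.x spec dict."""
    # OpenAPI 3.x keeps schemas under components.schemas.
    schemas = (spec.get("components") or {}).get("schemas") or {}
    if schemas:
        return schemas
    # Swagger 2.x keeps them under the top-level "definitions" key.
    return spec.get("definitions") or {}


openapi3 = {"openapi": "3.0.3",
            "components": {"schemas": {"User": {"type": "object"}}}}
swagger2 = {"swagger": "2.0",
            "definitions": {"User": {"type": "object"}}}

print(list(extract_schemas_any_version(openapi3)))  # ['User']
print(list(extract_schemas_any_version(swagger2)))  # ['User']
```

With a fallback like this, BOPLA-001 and DATA-001 would see the same schema dict for both spec versions.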
api_scanner.py has a PyYAML guard. The module opens with try: import yaml; HAS_YAML = True except ImportError: HAS_YAML = False. If PyYAML is not installed and the spec file is YAML (not JSON), parse_spec() falls back to a very basic line-by-line key extraction that produces an incomplete spec dict. All subsequent scanner checks that look for structured spec data (endpoints, paths, security schemes) will return empty lists and produce minimal/inaccurate results. pyyaml>=6.0 is in requirements.txt (Bug 4 was fixed), so this shouldn't be an issue in the standard setup — but is the root cause if API scanning returns empty results.
agent_scanner.py is entirely keyword-based: no LLM calls, no semantic analysis. It looks for words from the HIGH_RISK_TOOLS, MEDIUM_RISK_TOOLS, DEFENSIVE_KEYWORDS, DANGEROUS_INSTRUCTIONS, and CONFIRMATION_KEYWORDS sets in the lowercased file text. Some AGENT controls are conditional: a ScanResult is produced only when the trigger condition fires (e.g., AGENT-003 only if database/file tools are present; AGENT-004 only if high_risk AND declared_purpose). AGENT-001, AGENT-002, and AGENT-005 always produce a ScanResult. All AGENT controls except AGENT-007/010 are review_required tier, because the AGENT family is not in AUTO_FAMILIES and individual AGENT IDs are not in AUTO_IDS. This means: (1) even always-producing controls like AGENT-001/002/005 go through the review-tier loop (not auto-tier), which sets evidence/confidence but NEVER sets ar.status, so all AGENT controls show NOT_TESTED regardless of scanner coverage; (2) conditional AGENT controls that don't fire (no trigger condition met) go through the review-tier no-match path (family evidence or structured checklist), also staying NOT_TESTED.
StigsDB.get_all() is never called from main.py. It joins ScansDB.get_history(), SystemsDB.get_all(), and SystemsDB.find_by_target() as dead read methods left over from an unimplemented history/audit UI. StigsDB.save() IS called in _import_stig() and uses ON CONFLICT(stig_id) DO UPDATE, so re-importing the same STIG (same benchmark ID) overwrites the existing DB record and regenerates the .md file.
Report format options (GUI dropdown, default "HTML dashboard + Markdown"):
- "HTML dashboard + Markdown" → HTML + Markdown + CSV (CSV always generated)
- "HTML dashboard only" → HTML + CSV
- "Markdown only" → Markdown + CSV
- "JSON" → JSON + CSV
CSV is generated unconditionally for every scan regardless of selection. Multi-target mode options: "Separate reports" (default) or "Combined report".
Report filename pattern (_write_reports() in main.py): {type_prefix}_{safe_target}_{timestamp}.{ext}. The type_prefix maps: website → website, api → api, code → code-review, agent → agent, stig → stig, all others (including os and interconnected) → assessment. safe_target is the hostname (for URLs) or basename (for file/path inputs), stripped and truncated to 40 chars. Timestamp format is %Y-%m-%d %I.%M%p lowercased (e.g. 2026-06-13 02.30pm). Example: website_example.com_2026-06-13 02.30pm.html.
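The naming rule above can be sketched as a small function. This is an illustration rather than the code in `_write_reports()`: `build_report_filename` is a hypothetical name, and because the text does not spell out how safe_target is sanitized, the sketch only strips surrounding whitespace and truncates to 40 characters.

```python
import os
from datetime import datetime
from urllib.parse import urlparse

# Prefixes listed in the text; every other target type maps to "assessment".
_TYPE_PREFIX = {"website": "website", "api": "api", "code": "code-review",
                "agent": "agent", "stig": "stig"}


def build_report_filename(target: str, target_type: str, ext: str,
                          now: datetime) -> str:
    prefix = _TYPE_PREFIX.get(target_type, "assessment")
    if target.startswith(("http://", "https://")):
        safe_target = urlparse(target).hostname or target   # hostname for URLs
    else:
        safe_target = os.path.basename(target.rstrip("/\\")) or target
    safe_target = safe_target.strip()[:40]
    timestamp = now.strftime("%Y-%m-%d %I.%M%p").lower()
    return f"{prefix}_{safe_target}_{timestamp}.{ext}"


when = datetime(2026, 6, 13, 14, 30)
print(build_report_filename("https://example.com/login", "website", "html", when))
# website_example.com_2026-06-13 02.30pm.html
```

Note that the resulting name contains a space, and that `%p` is locale-dependent, so the am/pm suffix can differ on systems with a non-default locale.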
Markdown report omits INFORMATIONAL from severity breakdown. generate_markdown_report() in reporter.py only lists CRITICAL/HIGH/MEDIUM/LOW in its severity summary table (lines 350–354). INFORMATIONAL findings will appear in the Findings section if they are NON_COMPLIANT, but are not counted in the severity table header. HTML and JSON reports include INFORMATIONAL in their data.
JSON report silently omits NOT_TESTED, NEEDS_REVIEW, and NOT_APPLICABLE controls. generate_json_report() (lines 444–471) routes each AssessmentResult to at most one of three arrays: "findings" (NON_COMPLIANT), "compliant" (COMPLIANT), or "false_positives" (is_false_positive). Controls with status NOT_TESTED, NEEDS_REVIEW, or NOT_APPLICABLE appear in none of the three and are absent from the JSON output entirely. For any incomplete assessment (which includes all STIG assessments, since manual items remain NOT_TESTED), the JSON export produces a partial view. The HTML report does include all controls regardless of status (via engine.all_results loop at line 168).
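The routing behavior can be illustrated with a short sketch. `Result` is a hypothetical stand-in for AssessmentResult, `route_for_json` is not the real function, and checking the false-positive flag first is an assumption about branch order.

```python
from dataclasses import dataclass


@dataclass
class Result:                      # hypothetical stand-in for AssessmentResult
    control_id: str
    status: str
    is_false_positive: bool = False


def route_for_json(results):
    """Split results into the three exported arrays plus the omitted remainder."""
    out = {"findings": [], "compliant": [], "false_positives": []}
    omitted = []
    for r in results:
        if r.is_false_positive:
            out["false_positives"].append(r.control_id)
        elif r.status == "NON_COMPLIANT":
            out["findings"].append(r.control_id)
        elif r.status == "COMPLIANT":
            out["compliant"].append(r.control_id)
        else:
            # NOT_TESTED, NEEDS_REVIEW, NOT_APPLICABLE: no array receives these.
            omitted.append(r.control_id)
    return out, omitted


exported, omitted = route_for_json([
    Result("AUTH-001", "NON_COMPLIANT"),
    Result("CRYPTO-001", "COMPLIANT"),
    Result("HEADERS-001", "FALSE_POSITIVE", is_false_positive=True),
    Result("SESSION-002", "NOT_TESTED"),
    Result("RATE-001", "NEEDS_REVIEW"),
])
print(omitted)  # ['SESSION-002', 'RATE-001']
```

Comparing the exported count with the total control count is a quick way to detect a partial JSON export.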
Template selection by target type (get_template_path() in reporter.py): interconnected → interconnected-report-template.html; code/code_review → code-review-report-template.html; api → api-report-template.html; os/os_software → report-template.html; agent → report-template.html (falls through to default). The agent target type does NOT have its own template — it shares report-template.html with website and OS scans. However, _report_title() does return a distinct string for agents: target_type == 'agent' → "Agent Security Assessment" (while OS → "OS & Software Security Assessment", API → "API Security Assessment", code → "Code Review Security Assessment"). Template selection uses selected_sets if available, falling back to target_type.
STIG data is injected in two separate passes (updated Round 29). _generate_stig_html_report() still does NOT use {{REPORT_TITLE}}-style substitution — the STIG template reads JS/JSON vars directly. The two injections are:
- `STIG_META`: injected as `<script>var STIG_META = {...};</script>` via `template.replace('</head>', meta_script + '\n</head>')` (same mechanism as before)
- CONTROLS array: injected as `<script type="application/json" id="sat-controls-data">{json}</script>` by replacing `<!-- SAT-CONTROLS-PLACEHOLDER -->` in the template. This makes STIG reports parseable by `_parse_controls_from_html()` on load-prior-report, and the template reads controls via `JSON.parse(document.getElementById('sat-controls-data').textContent)`. This is the same format used by non-STIG reports, so STIG carryforward now works identically.
STIG CAT level mapping (_sev_to_cat() in reporter.py): CRITICAL → CAT I, HIGH → CAT II, MEDIUM/LOW → CAT III, unknown → CAT II (default fallback). This correctly reverses stig_parser.py's SEVERITY_MAP (high→CRITICAL, medium→HIGH, low→MEDIUM). Fixed Round 29 — prior mapping was CRITICAL/HIGH → CAT I which elevated all CAT II findings to CAT I. Used only in _generate_stig_html_report() to compute catLevel for each finding entry.
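The Round 29 mapping and its relationship to the parser's SEVERITY_MAP can be checked with a few lines. This is a sketch: `sev_to_cat` mirrors the behavior described for `_sev_to_cat()`, but the dict-based form and the upper-casing of the input are assumptions.

```python
# stig_parser.py direction, as quoted in the text: STIG severity -> internal.
SEVERITY_MAP = {"high": "CRITICAL", "medium": "HIGH", "low": "MEDIUM"}

# reporter.py direction: internal severity -> CAT level.
_CAT_BY_SEVERITY = {
    "CRITICAL": "CAT I",
    "HIGH": "CAT II",
    "MEDIUM": "CAT III",
    "LOW": "CAT III",
}


def sev_to_cat(severity: str) -> str:
    """Map an internal severity to a CAT level; unknown values fall back to CAT II."""
    return _CAT_BY_SEVERITY.get(str(severity).upper(), "CAT II")


# Round trip: STIG severity -> internal severity -> CAT level.
for stig_sev, internal in SEVERITY_MAP.items():
    print(stig_sev, "->", internal, "->", sev_to_cat(internal))
# high -> CRITICAL -> CAT I
# medium -> HIGH -> CAT II
# low -> MEDIUM -> CAT III
```

The round trip shows why the earlier CRITICAL/HIGH → CAT I mapping was wrong: a STIG "medium" rule becomes HIGH internally and must come back out as CAT II.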
STIG triage status carryforward (added Round 29, engine.py + reporter.py): When a saved STIG HTML report is loaded via "Load previous report", extract_prior_data_from_report() reads the stigStatus field from each control in sat-controls-data and stores it as stig_status in prior_report_data. During the next scan, engine.py applies these saved decisions after the FP/notes carryforward block. For every control where ar.control.library == 'stig' and ar.status is still NEEDS_REVIEW or NOT_TESTED, the prior stig_status is mapped to internal status: 'Open' → 'NON_COMPLIANT', 'Not a Finding' → 'COMPLIANT', 'Not Applicable' → 'NOT_APPLICABLE', 'Not Reviewed' → 'NEEDS_REVIEW' (no-op). Only controls still awaiting review have their status overridden; controls already set by scanner logic are left alone. Combined with the FP/notes carryforward that runs before this block, a loaded STIG prior report now restores all three categories of saved state: FP marks, user notes, and manual triage decisions.
STIG internal → display status mapping (verified from _generate_stig_html_report() reporter.py lines 250–258):
- `COMPLIANT` → "Not a Finding"
- `NON_COMPLIANT` → "Open"
- `NOT_APPLICABLE` → "Not Applicable"
- `FALSE_POSITIVE` → "Not a Finding" (same as COMPLIANT; FPs are NOT separately identified in STIG summary counts)
- `NOT_TESTED` → "Not Reviewed"
- `NEEDS_REVIEW` → "Not Reviewed"
STIG_META consequence: nafCount includes false positives. Since FALSE_POSITIVE maps to "Not a Finding", nafCount in STIG_META = COMPLIANT + FALSE_POSITIVE controls combined. Individual CONTROLS entries do have isFalsePositive: true for FPs, so they're distinguishable at the card level but not in the summary header.
STIG_META full structure (injected as var STIG_META = {...} before </head>): target (engine.target string), date (formatted %Y-%m-%d %H:%M), tester (hardcoded "Security Assessment Tool v1.0"), totalRules (total control count), openCat1/2/3 (NON_COMPLIANT counts per CAT), nafCount (COMPLIANT+FP), naCount (NOT_APPLICABLE), nrCount (NOT_TESTED + NEEDS_REVIEW).
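The counting fields of STIG_META follow directly from the display-status mapping. The sketch below shows how nafCount ends up including false positives; `summarize` is a hypothetical helper that omits the target, date, and tester fields, and the input is simplified to (status, CAT level) pairs.

```python
from collections import Counter

# Internal status -> label shown in the STIG report.
DISPLAY_STATUS = {
    "COMPLIANT": "Not a Finding",
    "NON_COMPLIANT": "Open",
    "NOT_APPLICABLE": "Not Applicable",
    "FALSE_POSITIVE": "Not a Finding",
    "NOT_TESTED": "Not Reviewed",
    "NEEDS_REVIEW": "Not Reviewed",
}


def summarize(results):
    """results: list of (internal_status, cat_level) pairs."""
    shown = Counter(DISPLAY_STATUS[status] for status, _cat in results)
    open_by_cat = Counter(cat for status, cat in results
                          if status == "NON_COMPLIANT")
    return {
        "totalRules": len(results),
        "openCat1": open_by_cat["CAT I"],
        "openCat2": open_by_cat["CAT II"],
        "openCat3": open_by_cat["CAT III"],
        "nafCount": shown["Not a Finding"],    # COMPLIANT + FALSE_POSITIVE
        "naCount": shown["Not Applicable"],
        "nrCount": shown["Not Reviewed"],      # NOT_TESTED + NEEDS_REVIEW
    }


results = [
    ("NON_COMPLIANT", "CAT I"),
    ("COMPLIANT", "CAT II"),
    ("FALSE_POSITIVE", "CAT II"),
    ("NOT_TESTED", "CAT III"),
    ("NEEDS_REVIEW", "CAT III"),
    ("NOT_APPLICABLE", "CAT II"),
]
meta = summarize(results)
print(meta["nafCount"], meta["nrCount"])  # 2 2
```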
STIG report fallback (_generate_stig_fallback_html()): If stig-report-template.html is missing from pen-tester/assets/, the STIG reporter falls back to a minimal static HTML table (Vuln ID, CAT level, Title, Status, Evidence truncated to 120 chars). No JavaScript interactivity — no filtering, no expand/collapse. This is a safety net only; the real template is required for full functionality.
scanners.py name is misleading. It is the website scanner specifically, not a generic scanner registry. The name predates the multi-scanner architecture.
Control IDs are a proprietary FAMILY-NNN taxonomy. The family abbreviations (AUTH, CRYPTO, HEADERS, AGENT, etc.) and three-digit control numbers are the tool's own naming scheme — not borrowed from OWASP, CWE, NIST, or any external standard. There is no external mapping to these IDs. A new session should never try to "align" control IDs to an external taxonomy or renumber them to match one.
Reports are self-contained HTML files. All scan data is embedded in the HTML at generation time as a JSON blob inside <script type="application/json" id="sat-controls-data">. The report template reads this tag: JSON.parse(document.getElementById('sat-controls-data').textContent). When the user saves the report (after triage), the updated state is re-serialized back to this tag. Reports can be emailed, archived, and reopened offline — no server required.
sat-controls-data JSON schema — each entry in the array (built by generate_html_report() in reporter.py):
id - control ID (e.g. "AUTH-001")
name - control name string
family - control family abbreviation
status - COMPLIANT / NON_COMPLIANT / NOT_APPLICABLE / NEEDS_REVIEW / etc.
severity - CRITICAL / HIGH / MEDIUM / LOW / INFORMATIONAL
cia - e.g. "C, I"
evidence - human-readable scanner output
finding - same value as evidence (duplicate field; kept for template compatibility)
remediation - remediation text
mitigation - "YES" if false positive, "NO" otherwise
mitigationDesc - FP justification text (empty string if not FP)
note - user's triage note (populated post-scan via GUI)
tier - "automatic_confirmation" / "review_required" / "manual_confirmation"
statement - control statement text from .md library
review_steps - review_procedure or test_procedure text from .md library
reachability - DIRECT / ONE_HOP / INTERNAL / MULTI_STEP / NONE
cvss - {"score": float, "vector": str} or null if no CVSS data
frameworks - [] (always empty; framework display is handled by FW_MAP in templates)
source - scanner_name string or "manual" for manual_confirmation controls
{{CONTROLS_JSON}} is inserted with json.dumps(findings_data).replace('</', '<\\/') to prevent XSS via early </script> tags.
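The effect of the `</` escape, and the matching read-back used when a prior report is loaded, can be demonstrated end to end. This is a sketch with a made-up control entry; the regex-based extraction is an assumption and not necessarily how `_parse_controls_from_html()` works.

```python
import json
import re

controls = [{"id": "AUTH-001",
             "evidence": "payload </script><script>alert(1)</script>"}]

# Escape "</" so the JSON cannot close the surrounding <script> element early.
embedded = json.dumps(controls).replace("</", "<\\/")
html = ('<script type="application/json" id="sat-controls-data">'
        + embedded + "</script>")

# Reading it back: the only literal "</script>" left is the real closing tag.
match = re.search(
    r'<script type="application/json" id="sat-controls-data">(.*?)</script>',
    html, re.DOTALL)
restored = json.loads(match.group(1))
print(restored == controls)  # True: "\/" is a valid JSON escape for "/"
```

Because `\/` is a legal JSON escape, both `json.loads` in Python and `JSON.parse` in the template decode the escaped text back to the original string without any extra unescaping step.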
Template placeholders substituted by generate_html_report() (simple str.replace, no Jinja2): {{REPORT_TITLE}}, {{TARGET_NAME}} (hostname only for URLs, basename for paths), {{TOTAL_CONTROLS}}, {{FRAMEWORK}} (framework_filter or "All frameworks"), {{NON_COMPLIANT_COUNT}}, {{CRIT_COUNT}}, {{HIGH_COUNT}}, {{MED_COUNT}}, {{LOW_COUNT}}, {{INFO_COUNT}}, {{CONTROLS_JSON}}, {{REPORT_DATE}} (%Y-%m-%d %H:%M), {{REPORT_ID}} (random 8-char UUID prefix).
CVSS scores come from the individual scanners, not from engine.py. Each scanner result (ScanResult) carries its own cvss_score and cvss_vector. engine.py copies these onto AssessmentResult (ar.cvss_score = sr.cvss_score). For controls where no scanner produces a result — review_required with no evidence hit, all manual_confirmation controls — cvss_score stays at the dataclass default 0.0 and cvss_vector stays "". The statement "every finding includes a CVSS score" applies to scanner-generated findings only. Do not remove or stub out CVSS fields from ScanResult or AssessmentResult — this is a key requirement for regulated-environment use.
run_automatic_tier() has a dead code line (engine.py line ~168). ar.status = sr.status.replace('_', '-') if sr.status == 'NON_COMPLIANT' else sr.status is immediately overwritten by the if/elif block on lines 169–179. It has no effect. This is a remnant from an earlier version where statuses used hyphens (e.g. NON-COMPLIANT). Ignore it when reading the code.
Auto-tier status assignment (lines 167–188) — exact logic when scanner match exists:
- `sr.status == 'NON_COMPLIANT'` → `ar.status = 'NON_COMPLIANT'`, copy severity/evidence/cvss/remediation
- `sr.status == 'COMPLIANT'` → `ar.status = 'COMPLIANT'`, copy fields
- `sr.status == 'NEEDS_REVIEW'` → `ar.tier = 'review_required'`, append to `review_items`, `continue` (skips field assignment and FindingsDB.save for this control)
- anything else → `ar.status = sr.status`, copy fields

Note: the `continue` for NEEDS_REVIEW means promoted controls have `ar.severity = ""`, `ar.evidence = ""`, etc. at the end of the auto-tier loop. The review-tier processing loop (which runs after) then fills in evidence, severity, and all other fields for the promoted `ar`: because it was appended to `review_items`, the review loop processes it normally.
Auto-tier family evidence status (engine.py lines 210–217) — when an auto-tier control has no direct scanner match but related family controls were tested, ar.status = 'COMPLIANT' if AND ONLY IF all family results are COMPLIANT; otherwise ar.status = 'NEEDS_REVIEW' (never NON_COMPLIANT). Confidence = 0.5. This status-setting behavior is ONLY in the auto-tier loop. The review-tier no-match branch (lines 257–317) builds evidence and sets confidence (0.5 with family, 0.2 without) but never sets ar.status — review-tier controls stay NOT_TESTED regardless of what family siblings produced.
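The auto-tier family rule reduces to an all-or-nothing check, sketched below. `family_evidence_status` is a hypothetical name, and returning None for an empty family is an assumption standing in for the separate no-family path.

```python
def family_evidence_status(family_statuses):
    """Status for an auto-tier control with no direct match but tested siblings."""
    if not family_statuses:
        return None                      # no siblings: handled by a different path
    if all(s == "COMPLIANT" for s in family_statuses):
        return "COMPLIANT"
    return "NEEDS_REVIEW"                # never NON_COMPLIANT from family evidence


print(family_evidence_status(["COMPLIANT", "COMPLIANT"]))      # COMPLIANT
print(family_evidence_status(["COMPLIANT", "NON_COMPLIANT"]))  # NEEDS_REVIEW
```

The explicit empty check matters because `all()` over an empty sequence is True, which would otherwise mark an untested family as COMPLIANT.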
ar.remediation fallback populated by engine for no-scanner controls. For review-tier with no scanner match: if not ar.remediation: ar.remediation = ar.control.fix_text or ar.control.statement (engine.py line ~316). Same logic for manual_confirmation controls (engine.py line ~366). So by the time reporter.py serializes the result, ar.remediation is already set. reporter.py's r.remediation or r.control.fix_text or "" is a second fallback guard for the edge case where the engine didn't set it — in practice the engine already set it.
apply_review_decision() and apply_manual_decision() were removed in Round 28. For reference when re-implementing in-app triage: review tier accepted four decisions ('false_positive', 'accept', 'compliant', 'na'); manual tier accepted three ('fail', 'pass', 'na'). The 'false_positive' option was not available for manual tier. When rebuilding, is_false_positive=True + status='FALSE_POSITIVE' should be set for the false_positive case; storage mechanism (formerly FalsePositivesDB) will need to be re-implemented.
Prior false positive and note carryover from report applies to ALL tiers. At the end of run_automatic_tier() (engine.py lines 370–386), after all scanner processing, the code iterates self.all_results and applies prior_report_data. If a control ID was marked as FP in the prior report (mitigation == 'YES'), ar.is_false_positive = True, ar.fp_justification = prior justification or "Carried forward from previous assessment report", and ar.status = 'FALSE_POSITIVE'. If a note exists in the prior report, ar.user_notes is set. This runs AFTER scanner results are set, so a prior FP carryforward will override a scanner's NON_COMPLIANT result for that control.
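A minimal sketch of this carryover step follows. `Result` is a hypothetical stand-in for AssessmentResult, and the shape of `prior_report_data` (a dict keyed by control ID that reuses the report's `mitigation`, `mitigationDesc`, and `note` field names) is an assumption.

```python
from dataclasses import dataclass

DEFAULT_JUSTIFICATION = "Carried forward from previous assessment report"


@dataclass
class Result:                      # hypothetical stand-in for AssessmentResult
    control_id: str
    status: str = "NOT_TESTED"
    is_false_positive: bool = False
    fp_justification: str = ""
    user_notes: str = ""


def apply_prior(results, prior_report_data):
    """Apply FP marks and notes from a prior report to every result, any tier."""
    for ar in results:
        prior = prior_report_data.get(ar.control_id)
        if not prior:
            continue
        if prior.get("mitigation") == "YES":
            ar.is_false_positive = True
            ar.fp_justification = prior.get("mitigationDesc") or DEFAULT_JUSTIFICATION
            ar.status = "FALSE_POSITIVE"   # overrides the scanner's status
        if prior.get("note"):
            ar.user_notes = prior["note"]


results = [Result("AUTH-001", "NON_COMPLIANT"), Result("CRYPTO-001", "COMPLIANT")]
prior = {
    "AUTH-001": {"mitigation": "YES", "mitigationDesc": "", "note": "WAF in front"},
    "CRYPTO-001": {"mitigation": "NO", "note": ""},
}
apply_prior(results, prior)
print(results[0].status, "|", results[0].fp_justification)
# FALSE_POSITIVE | Carried forward from previous assessment report
```

Because the step runs last, a fresh NON_COMPLIANT scanner result is masked by an old FP mark, so a stale prior report can hide a regression.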
Manual_confirmation evidence generation differs from review_required no-scanner in two ways. (1) Step count: manual shows ALL test procedure steps (no cap); review_required no-scanner caps at 6 steps (steps[:6]). (2) Statement handling: for manual controls, Requirement: {control.statement} is ALWAYS emitted first (if statement is non-empty), then test procedure steps are added — the statement is not a fallback; it's always present. For review_required no-scanner, the statement is a fallback shown ONLY IF test_procedure is empty (elif ar.control.statement: branch). Both split on . and format as [ ] N. step. (review) or N. step. (manual). Manual-tier ar.confidence is never set by the engine → stays at the dataclass default 1.0. This is the same as auto-tier no-scanner no-family, but for a different reason (no scanner coverage for manual controls by design).
When multiple scanners return results for the same control, NON_COMPLIANT wins. run_automatic_tier() builds result_by_ctrl by iterating all scan_results: if sr.control_id not in result_by_ctrl or sr.status == 'NON_COMPLIANT': result_by_ctrl[sr.control_id] = sr. If two scanners both hit AUTH-001 — one returning COMPLIANT, one returning NON_COMPLIANT — the NON_COMPLIANT result is stored and used for the assessment. This is "worst-case wins" logic. A COMPLIANT result never overwrites a NON_COMPLIANT result for the same control ID.
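The merge condition quoted above can be run in isolation to confirm the behavior. The namedtuple is a reduced stand-in for ScanResult, and the scanner/control pairings are made up for illustration.

```python
from collections import namedtuple

# Reduced stand-in for the ScanResult dataclass (only the fields used here).
ScanResult = namedtuple("ScanResult", "scanner control_id status")

scan_results = [
    ScanResult("scanner-a", "AUTH-001", "NON_COMPLIANT"),
    ScanResult("scanner-b", "AUTH-001", "COMPLIANT"),
    ScanResult("scanner-c", "CRYPTO-001", "COMPLIANT"),
]

result_by_ctrl = {}
for sr in scan_results:
    # Same condition as in run_automatic_tier(): first result is stored,
    # and only a NON_COMPLIANT result may replace an existing entry.
    if sr.control_id not in result_by_ctrl or sr.status == "NON_COMPLIANT":
        result_by_ctrl[sr.control_id] = sr

print(result_by_ctrl["AUTH-001"].scanner, result_by_ctrl["AUTH-001"].status)
# scanner-a NON_COMPLIANT
```

Two side effects follow from the condition as written: among several NON_COMPLIANT results the last one wins, and among results that are not NON_COMPLIANT (for example COMPLIANT versus NEEDS_REVIEW) the first one wins.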
automatic_confirmation controls can be promoted to review_required at runtime. If a scanner returns NEEDS_REVIEW status for a control that classify_control() placed in the auto tier, run_automatic_tier() sets ar.tier = 'review_required' and ar.status = 'NEEDS_REVIEW', then adds the AssessmentResult to self.review_items (engine.py lines 173–177). The object is NOT removed from self.auto_results — it remains in both lists. This means get_summary()['auto_total'] (=len(self.auto_results)) and get_summary()['review_total'] (=len(self.review_items)) both count a promoted control — the tier totals overcount. all_results has it once (added during load_controls()). The tier in the final report for that control is review_required (because ar.tier was mutated). Scanner-reported uncertainty overrides the static tier assignment.
Agent target type runs different scanners depending on input type. In run_automatic_tier(), when target_type == 'agent', the code uses two independent if checks (not if/elif):

    if os.path.isfile(self.target):
        scan_results = scan_agent(self.target, ...)  # file → static analysis
    if self.target.startswith('http'):
        scan_results.extend(run_all_scanners(self.target, ...))  # URL → HTTP scanners

The design is additive (the comment says "# Also run website scanners"): if both conditions were true, both would run. In practice they are mutually exclusive because a URL is not a real file path (isfile() returns False for URLs). A URL agent target gets only the website HTTP scanner suite; a file-based agent config gets only static scan_agent() analysis.
VULN_PATTERNS tuple format in code_scanner.py: each entry is (control_id, severity, regex_pattern, description, remediation). TypeScript reuses JavaScript patterns (VULN_PATTERNS['typescript'] = VULN_PATTERNS['javascript']). C reuses C++ patterns (VULN_PATTERNS.setdefault('c', VULN_PATTERNS['cpp'])). The language name in VULN_PATTERNS must match what LANG_EXTENSIONS returns for a file extension. Current LANG_EXTENSIONS mapping:
.py → python .js → javascript .ts → typescript .jsx → javascript
.tsx → typescript .java → java .go → go .php → php
.cs → csharp .cpp → cpp .c → c .h → c
.hpp → cpp .rs → rust
To add a new language: add entries to LANG_EXTENSIONS (extension → lang name), add a VULN_PATTERNS[lang_name] list of tuples, and add the lang name to code-review-controls.md's Languages: field for relevant controls.
scan_file() stops at the first match per pattern per file (line 176: break). If a file has 50 SQL injection instances all matching the same regex, only the first is reported. A control with multiple patterns (e.g., SEC-INJ-001 Python has two entries) can produce multiple results — one per distinct pattern that matches — but each pattern stops at its own first match.
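The first-match-per-pattern behavior is easy to reproduce. In this sketch the two regexes, descriptions, and remediation strings are invented for illustration (only the five-element tuple layout and the SEC-INJ-001 ID come from the text), and looping over patterns on the outside is an assumption about how scan_file() is structured.

```python
import re

# (control_id, severity, regex_pattern, description, remediation)
PATTERNS = [
    ("SEC-INJ-001", "HIGH", r"execute\(.*%",
     "SQL built with % formatting", "Use parameterized queries"),
    ("SEC-INJ-001", "HIGH", r"execute\(.*\+",
     "SQL built with concatenation", "Use parameterized queries"),
]


def scan_lines(lines):
    hits = []
    for control_id, _severity, pattern, description, _fix in PATTERNS:
        for lineno, line in enumerate(lines, start=1):
            if re.search(pattern, line, re.IGNORECASE):
                hits.append((control_id, lineno, description))
                break          # first match only for this pattern
    return hits


source = [
    'cur.execute("SELECT * FROM t WHERE id=%s" % uid)',
    'cur.execute("SELECT * FROM t WHERE id=" + uid)',
    'cur.execute("DELETE FROM t WHERE id=%s" % uid)',
]
for hit in scan_lines(source):
    print(hit)
# ('SEC-INJ-001', 1, 'SQL built with % formatting')
# ('SEC-INJ-001', 2, 'SQL built with concatenation')
```

Line 3 repeats the first pattern and is never reported, so a clean rescan after fixing line 1 would surface it as a "new" finding.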
Code scanner produces ONLY NON_COMPLIANT results — scan_file() never emits COMPLIANT. If no pattern matches a file, no ScanResult is produced for that control. Auto-tier code controls with no match fall through to the no-scanner NEEDS_REVIEW path in run_automatic_tier().
reachability is hard-coded to 'INTERNAL' for all code scanner ScanResults (line 174). Website scanners do not set it explicitly; they get "DIRECT" via the ScanResult dataclass default (scanners.py line 45: reachability: str = "DIRECT"). The API scanner does not set it either, so it too gets the dataclass default "DIRECT". INTERNAL is correct for source code vulnerabilities (exploitation requires access to the codebase); DIRECT is the sensible default for HTTP-based and API findings.
_analyze_complexity() caps function body scanning at 200 lines (line 254: range(i + 1, min(len(lines), i + 200))). Functions longer than 200 lines from their opening line have truncated complexity and nesting analysis. CPX-STRUCT-001 (function length) still fires based on func_lines > 50, but CPX-METRIC-001 and CPX-STRUCT-003 may be undercounted for very long functions.
SEC-AUTH-001 Python pattern is effectively abandoned. The regex r'@app\.route.*\ndef\s+\w+.*\n(?:(?!login_required|...).)*$' contains \n newline sequences and expects multi-line content, but scan_file() applies it to one line at a time via re.search(pattern, line, re.IGNORECASE). The regex will never match a single line and never fires. More importantly, SEC-AUTH-001 is in MANUAL_IDS — even if the regex were rewritten to scan file content as a block, the scanner result would be silently dropped by the engine (manual-tier controls never read result_by_ctrl). The control always shows NOT_TESTED with a test procedure checklist. No fix is planned.
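Why a regex containing `\n` can never fire under line-by-line scanning is shown below. The pattern is a simplified stand-in for the SEC-AUTH-001 regex (the original's negative lookahead is elided in the text and is not reconstructed here), and the Flask-style source is a made-up sample.

```python
import re

# Simplified stand-in: a route decorator, a newline, then a function definition.
pattern = r"@app\.route.*\ndef\s+\w+"

source = "@app.route('/admin')\ndef admin():\n    return 'ok'\n"

# Line-by-line, as scan_file() applies patterns: no single line contains "\n".
per_line = [bool(re.search(pattern, line, re.IGNORECASE))
            for line in source.splitlines()]

# Against the whole file content the same pattern does match.
whole_file = bool(re.search(pattern, source, re.IGNORECASE))

print(per_line)    # [False, False, False]
print(whole_file)  # True
```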
DEV-BUILD-001 has no VULN_PATTERNS entry and is never produced. It's in AUTO_IDS, so it's in auto_results. With no direct scanner match, the auto-tier else branch runs: family relatives = scan_results starting with "DEV-BUILD" — DEV-BUILD-002 IS produced for Python/PHP, so DEV-BUILD-001 gets family-based evidence (confidence 0.5) and status COMPLIANT if DEV-BUILD-002 is COMPLIANT, else NEEDS_REVIEW. It stays in auto_results (not promoted). DEV-BUILD-002 (debug mode enabled) IS covered for Python and PHP only.
ScanResult dataclass — the data contract every scanner must return (defined in scanners.py lines 33–46). engine.py imports ScanResult from scanners and copies fields onto AssessmentResult. Any new scanner must return a list of ScanResult objects:
| Field | Type | Default | Notes |
|---|---|---|---|
| `scanner` | str | (none) | required; scanner class name string (e.g. "tls-scanner") → copied to AssessmentResult.scanner_name |
| `control_id` | str | (none) | required; FAMILY-NNN control this result maps to |
| `status` | str | (none) | required; COMPLIANT / NON_COMPLIANT / ERROR / NEEDS_REVIEW |
| `severity` | str | "MEDIUM" | copied to AssessmentResult.severity |
| `evidence` | str | "" | human-readable scanner output |
| `confidence` | float | 1.0 | 0.0–1.0 |
| `remediation` | str | "" | copied to AssessmentResult.remediation |
| `cvss_score` | float | 0.0 | copied directly; engine does not recompute |
| `cvss_vector` | str | "" | full CVSS v3.1 vector string |
| `reachability` | str | "DIRECT" | valid values: DIRECT, ONE_HOP, INTERNAL, MULTI_STEP; stored in DB; no automatic re-scoring in standalone |
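As an illustration, the contract in the table can be written as a dataclass. This is a reconstruction from the table, not a copy of scanners.py; the field order, the `scan_example` function, and the `"example-scanner"` name are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanResult:
    # Required fields (no default in the table)
    scanner: str
    control_id: str
    status: str  # COMPLIANT / NON_COMPLIANT / ERROR / NEEDS_REVIEW
    # Optional fields with the defaults listed in the table
    severity: str = "MEDIUM"
    evidence: str = ""
    confidence: float = 1.0
    remediation: str = ""
    cvss_score: float = 0.0
    cvss_vector: str = ""
    reachability: str = "DIRECT"


def scan_example(url: str) -> List[ScanResult]:
    """Hypothetical scanner body: every scanner returns a list of ScanResult."""
    return [
        ScanResult(
            scanner="example-scanner",
            control_id="HEADERS-001",
            status="COMPLIANT",
            evidence=f"[PASS] checked {url}",
        )
    ]


result = scan_example("https://example.test")[0]
print(result.severity, result.reachability)
```

A scanner that sets only the three required fields inherits "MEDIUM" severity and "DIRECT" reachability, which is why website and API findings end up as DIRECT without any explicit assignment.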
scanners.py contains 10 website HTTP scanner classes (the WEBSITE_SCANNERS list). All make live HTTP requests. Exact control IDs actually produced per class (verified against source — .controls list and actual scan() output can differ):
| Scanner class | `.name` | Controls actually produced | Controls declared but never produced |
|---|---|---|---|
| `TLSScanner` | `tls-scanner` | CRYPTO-001, CRYPTO-002, CRYPTO-005 | CRYPTO-003, CRYPTO-004, CRYPTO-006 |
| `HeaderScanner` | `header-check` | HEADERS-001–006, HEADERS-007 (CORS) | (none) |
| `CookieScanner` | `cookie-audit` | SESSION-001 (no-cookie case only), SESSION-003, SESSION-004, SESSION-005 | SESSION-002 |
| `AuthScanner` | `auth-probe` | AUTH-001, AUTH-004 | AUTH-005 |
| `AuthzScanner` | `authz-probe` | AUTHZ-001, AUTHZ-002, AUTHZ-003, AUTHZ-004 | AUTHZ-005 |
| `InputValidationScanner` | `input-fuzzer` | INPUT-001, INPUT-003, INPUT-004, INPUT-005 | INPUT-002, INPUT-006, INPUT-007 |
| `EndpointDiscoveryScanner` | `endpoint-discovery` | COMP-001, COMP-003, INFRA-001, INFRA-002, INFRA-004, DATA-002, DATA-004, AUDIT-001 | (none) |
| `SessionScanner` | `session-analyzer` | SESSION-001 (if session cookie found), SESSION-002, AUTH-003, AUTH-006 | (none) |
| `SecretScanner` | `secret-scan` | SECRETS-001, SECRETS-002 | SECRETS-003 |
| `ErrorHandlingScanner` | `error-check` | ERROR-001 | ERROR-002, ERROR-003 |
The .controls list on each scanner is documentation, not contract — scan() may not produce all declared IDs. CRYPTO-003/004/006, AUTH-005, AUTHZ-005, SECRETS-003, ERROR-002/003, INPUT-002/006/007 are declared in their respective .controls lists but have no producing code path. What happens to each when not produced:
- CRYPTO-003/004/006 (CRYPTO is in `AUTO_FAMILIES` → automatic_confirmation tier): no scanner match → auto-tier else branch → related = scan_results starting with "CRYPTO" (e.g. CRYPTO-001, CRYPTO-002, CRYPTO-005 produced by TLSScanner) → family-based evidence, confidence=0.5, status=COMPLIANT if all siblings COMPLIANT else NEEDS_REVIEW. These controls stay in `auto_results`; they are NOT promoted to `review_required`. Promotion only happens when the scanner explicitly returns NEEDS_REVIEW; a no-match fallback always stays in auto_results.
- INPUT-002 (in `AUTO_IDS` → automatic_confirmation): family relatives INPUT-001/003/004/005 are produced by InputValidationScanner → family-based evidence, confidence=0.5.
- SECRETS-003 (in `AUTO_IDS` → automatic_confirmation): family relatives SECRETS-001/002 produced by SecretScanner → family-based evidence, confidence=0.5.
- ERROR-002/003 (in `AUTO_IDS` → automatic_confirmation): family relative ERROR-001 produced by ErrorHandlingScanner → family-based evidence, confidence=0.5.
- AUTH-005 (in `AUTO_IDS` → automatic_confirmation): family relatives AUTH-001/003/004/006 produced by AuthScanner/SessionScanner → family-based evidence.
- AUTHZ-005 (in `MANUAL_IDS` → manual_confirmation): manual tier, receives only checklist evidence from test_procedure; a scanner result (if one existed) would be silently dropped anyway.
- INPUT-006/007 (not in AUTO_IDS/MANUAL_IDS → review_required): review loop; family relatives INPUT-001/003/004/005 produced → family-based evidence, confidence=0.5, status stays NOT_TESTED (review loop never sets ar.status).
- INFRA-003 (in `AUTO_IDS` → automatic_confirmation): not in any scanner's `.controls` list and no code produces it (grep confirms zero matches in `scanners.py`). Gets family-based evidence from INFRA-001/002/004 results produced by `EndpointDiscoveryScanner`, confidence=0.5. Stays in `auto_results`.
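The auto-tier family fallback described in the list above can be sketched as follows. This is a simplified illustration, not the engine's code: `family_fallback` is a hypothetical name, scan results are reduced to `(control_id, status)` tuples, and matching on the family plus a trailing hyphen is an assumption made here so that AUTH does not pick up AUTHZ results.

```python
def family_fallback(control_id, scan_results):
    """Return family-based status for an auto-tier control with no direct match.

    scan_results is a list of (control_id, status) tuples (simplified).
    Returns None when no sibling in the same family produced a result.
    """
    family = control_id.rsplit("-", 1)[0]
    prefix = family + "-"  # assumption: avoid AUTH matching AUTHZ-*
    related = [(cid, st) for cid, st in scan_results if cid.startswith(prefix)]
    if not related:
        return None
    all_compliant = all(st == "COMPLIANT" for _, st in related)
    return {
        "status": "COMPLIANT" if all_compliant else "NEEDS_REVIEW",
        "confidence": 0.5,  # family-based evidence is always 0.5
        "evidence": "Family-based: " + ", ".join(cid for cid, _ in related),
    }


results = [
    ("CRYPTO-001", "COMPLIANT"),
    ("CRYPTO-002", "NON_COMPLIANT"),
    ("AUTHZ-001", "COMPLIANT"),
]
crypto_003 = family_fallback("CRYPTO-003", results)
auth_002 = family_fallback("AUTH-002", results)
print(crypto_003, auth_002)
```

One failing sibling is enough to turn the undeclared control into NEEDS_REVIEW, while a family with no results at all yields no family evidence.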
AUDIT-001 is review_required but EndpointDiscoveryScanner always produces a NEEDS_REVIEW result for it (scanners.py lines 1002–1022). Since AUDIT-001 is not in AUTO_IDS or MANUAL_IDS, load_controls() puts it in review_items. The scanner result enters result_by_ctrl normally, and the review-tier loop finds the match. However — following the Round 19 pattern — the review loop sets ar.evidence, ar.confidence, ar.severity from the scanner result but never sets ar.status. AUDIT-001 therefore shows as NOT_TESTED with scanner evidence populated, NOT as NEEDS_REVIEW. The NEEDS_REVIEW status from the scanner result is read and discarded.
CookieScanner only produces SESSION-001 in the no-cookies case (COMPLIANT). When the server sets cookies, CookieScanner produces SESSION-003/004/005 from cookie flags — but SESSION-001 and SESSION-002 are NOT produced. When the server sets no cookies at all, CookieScanner returns SESSION-001 COMPLIANT and exits early. SessionScanner is the only scanner that produces SESSION-002 (always → NEEDS_REVIEW) and the full SESSION-001 analysis (length-based, only when a named session cookie exists). For result_by_ctrl dedup: if both scanners produce SESSION-001, SessionScanner runs last and its NON_COMPLIANT wins.
AUTH-002 is not in any scanner's control list. It is in the automatic_confirmation tier (in AUTO_IDS) with no direct scanner result, so it gets family-based evidence from AUTH-001/003/004/006 results (all from the AUTH family), confidence=0.5, status=COMPLIANT if all COMPLIANT else NEEDS_REVIEW. It stays in auto_results (not promoted).
AuthScanner makes 6 active POST requests to the discovered login endpoint with test credentials (username: test@test.com, password: wrongpassword). This is active login probing — it can trigger account lockout, IDS alerts, or be logged in security systems. The scanner probes up to 9 common login paths (/login, /signin, etc.) to find the endpoint first, then sends 6 rapid POSTs to test rate limiting.
AuthzScanner actively probes 18 admin paths and 9 sensitive paths. The 18 ADMIN_PATHS include /admin, /admin/dashboard, /administrator, /manage, /panel, /console, /api/admin, /wp-admin, /phpmyadmin, /cpanel, /dashboard, /settings, /config, and variants. The 9 SENSITIVE_PATHS include /api/users, /api/user/1, /api/user/2, /api/accounts, /api/orders, /api/payments, and variants. All probed with GET requests, allow_redirects=False, 5-second timeout per path.
Scanner crash behavior: only controls[0] gets an ERROR result. If a scanner's scan() method raises an unhandled exception, run_all_scanners() catches it and appends a single ERROR result using scanner.controls[0] as the control_id. The other controls in scanner.controls receive no result and fall through to the engine's no-scanner fallback. A scanner crash does NOT produce ERROR for all its declared controls.
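A minimal sketch of the crash handling described above. `run_all_scanners_sketch`, `Result`, and `CrashingScanner` are hypothetical stand-ins written for this example; only the "one ERROR keyed to controls[0]" behavior comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Result:
    scanner: str
    control_id: str
    status: str
    evidence: str = ""


class CrashingScanner:
    """Hypothetical scanner that always raises inside scan()."""
    name = "crashing-scanner"
    controls = ["CRYPTO-001", "CRYPTO-002", "CRYPTO-005"]

    def scan(self, target):
        raise RuntimeError("connection reset")


def run_all_scanners_sketch(scanners, target):
    results = []
    for scanner in scanners:
        try:
            results.extend(scanner.scan(target))
        except Exception as exc:
            # Only the first declared control receives an ERROR result.
            results.append(
                Result(scanner.name, scanner.controls[0], "ERROR",
                       f"Scanner crashed: {exc}")
            )
    return results


crash_results = run_all_scanners_sketch([CrashingScanner()], "https://example.test")
print(crash_results)
```

CRYPTO-002 and CRYPTO-005 get nothing from the crashed scanner here, so in the engine they would fall through to the no-scanner fallback.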
If HAS_REQUESTS = False (requests not installed), some scanners return early: HeaderScanner and CookieScanner return a single ERROR ScanResult. AuthScanner, AuthzScanner, and SessionScanner return an empty list [] — silent fail, no error result.
ScanResult.elapsed_seconds was removed in Round 27. The field and all scanner assignments to it were deleted from scanners.py.
HAS_BS4 (BeautifulSoup) was removed in Round 27 — the try/except import block was deleted from scanners.py and beautifulsoup4 was removed from requirements.txt.
get_scanners_for_type() in scanners.py always returns WEBSITE_SCANNERS for every target type (including code, API, OS). This function is NOT what drives scanner dispatch — engine.py has its own explicit dispatch at lines 134–155 that overrides it:
if target_type == 'code' and scan_code: → scan_code() only
elif target_type == 'api' and scan_api: → scan_api() only
elif target_type == 'os' and scan_os: → scan_os() only
elif target_type == 'agent' and scan_agent:
if os.path.isfile(target): scan_agent() → scan_agent() for file targets
if target.startswith('http'): run_all_scanners() → ALSO website scanners if URL-based
else: → run_all_scanners() (website/fallback)
STIG assessments produce no useful scanner results. target_type='stig' falls to the else branch and calls run_all_scanners(), which runs WEBSITE_SCANNERS against the XML file path. Each website scanner fails (the XML path is not a valid URL/hostname) and the exception handler produces ERROR ScanResult objects keyed to each scanner's controls[0]. These ERROR results are in scan_results and family_evidence, but no STIG control family (e.g. CYLN-OP, APSC-DV) matches the website scanner control families (CRYPTO, HEADERS, etc.), so all STIG controls fall to the review no-match no-family checklist path. STIG assessments have only review-tier controls (auto_results=[], manual_items=[]): STIG controls are always review_required, never manual_confirmation.
result_by_ctrl deduplication in run_automatic_tier() (engine.py line 160): multiple scanner results for the same control_id are reduced to one, keeping NON_COMPLIANT over any other status. If both are NON_COMPLIANT, later-processed scanner wins. If a control has no scanner result, engine.py checks for family-based evidence from related controls (confidence 0.5).
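The deduplication rule can be sketched as below. This is an illustration, not the code at engine.py line 160: results are plain dicts, `dedup_by_control` is a hypothetical name, and "later wins" among results that are not NON_COMPLIANT is an assumption, since the description only specifies the NON_COMPLIANT cases.

```python
def dedup_by_control(scan_results):
    """Reduce scanner results to one per control_id, preferring NON_COMPLIANT."""
    result_by_ctrl = {}
    for sr in scan_results:
        existing = result_by_ctrl.get(sr["control_id"])
        if (
            existing is not None
            and existing["status"] == "NON_COMPLIANT"
            and sr["status"] != "NON_COMPLIANT"
        ):
            continue  # keep the existing NON_COMPLIANT result
        # Otherwise the later-processed result replaces the earlier one.
        result_by_ctrl[sr["control_id"]] = sr
    return result_by_ctrl


deduped = dedup_by_control([
    {"control_id": "SESSION-001", "status": "NON_COMPLIANT", "scanner": "a"},
    {"control_id": "SESSION-001", "status": "COMPLIANT", "scanner": "b"},
    {"control_id": "SEC-INJ-001", "status": "NON_COMPLIANT", "scanner": "file1"},
    {"control_id": "SEC-INJ-001", "status": "NON_COMPLIANT", "scanner": "file2"},
])
print(deduped)
```

The second case shows why multi-file code scans surface a single finding per control: every earlier NON_COMPLIANT result is overwritten by the last one.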
NEEDS_REVIEW in the auto tier arises on three distinct paths with two different DB behaviors:
1. Scanner returns NEEDS_REVIEW (engine.py line 173): `ar.tier='review_required'`, appended to `review_items`, then `continue`. This skips field assignment (severity/evidence/cvss/etc. are NOT set from the scanner yet) AND skips FindingsDB.save. Fields get set later in the review loop.
2. No scanner result, family evidence shows partial failure (line 216): `ar.status='NEEDS_REVIEW'`, stays in `auto_results` (NOT added to `review_items`), evidence and confidence ARE set, FindingsDB.save IS called.
3. No scanner result, no family evidence either (line 219): same as path 2. Stays in `auto_results`, FindingsDB.save IS called, but evidence is the generic "No scanner covers X" message.

Paths 2 and 3 produce NEEDS_REVIEW rows in FindingsDB; path 1 does not.
agent_scanner.py does static analysis, not live HTTP. It parses configuration files to assess AI agent security posture. This is distinct from scanners.py (which makes live HTTP requests to a website). Both map to the same controls library (controls-library.md), but the evidence collection method is entirely different.
agent_scanner.py produces results for AGENT-001 through AGENT-011 — but not all 11 always appear. The exact output depends on what keywords are found in the config file:
- Always produces a result: AGENT-001 (NON_COMPLIANT if high-risk tools found, COMPLIANT otherwise), AGENT-002 (NON_COMPLIANT if no validation keywords, NEEDS_REVIEW if found), AGENT-005 (NON_COMPLIANT if injection surface tools found, NEEDS_REVIEW otherwise)
- Conditional (absent from results if not triggered):
  - AGENT-003: only if database/file tools are present (data exposure risk list is non-empty)
  - AGENT-004: only if both `high_risk` tools AND `declared_purpose` are non-empty. `declared_purpose` is extracted via `re.search(r'description[:\s]*>?\s*\n?\s*(.+?)(?:\n---|\n#|\Z)', content, re.IGNORECASE | re.DOTALL)`, capped at 200 chars; it is an empty string if there is no match, in which case AGENT-004 is not triggered
  - AGENT-006: only if none of `['error', 'exception', 'fail', 'graceful', 'fallback']` is found in content (all 5 keywords checked, not just 3)
  - AGENT-007: only if `DANGEROUS_INSTRUCTIONS` phrases are found
  - AGENT-008: only if delegation keywords like "delegate", "crew", "chain" are found
  - AGENT-009: only if no prompt-protection phrases are found
  - AGENT-010: NON_COMPLIANT only if explicit "without confirmation" or high-risk with no confirmation keywords
  - AGENT-011: only if plugin/mcp/extension keywords are found
Full keyword sets (all verified against source):
HIGH_RISK_TOOLS (triggers AGENT-001, AGENT-003, AGENT-004, AGENT-005, AGENT-010): bash, shell, exec, execute, system, command, terminal, write, writefile, delete, remove, rm, unlink, sendemail, send_email, email, smtp, mail, database, databasequery, sql, query, db, webrequest, http, fetch, curl, webhook, deploy, publish, push, upload, payment, transfer, transaction.
MEDIUM_RISK_TOOLS (triggers AGENT-001 evidence only, not risk flags): read, readfile, file, filesystem, webfetch, browse, search, websearch, mcp, plugin, extension, tool, api, rest, graphql.
DEFENSIVE_KEYWORDS (for AGENT-002): validate, sanitize, check, verify, filter, whitelist, allowlist, restrict, limit, bound, escape, encode, reject, deny, refuse.
DANGEROUS_INSTRUCTIONS (for AGENT-007): don't ask, do not ask, without confirmation, without asking, just do it, no restrictions, unrestricted, any command, any file, any database, any query, all allowed, no limits, no limitation, unlimited.
CONFIRMATION_KEYWORDS (for AGENT-010): confirm, confirmation, approve, approval, ask, permission, consent, verify, human-in-the-loop, before proceeding, user must, requires approval.
agent_scanner.py produces results for AGENT-007 and AGENT-010, but both are manual_confirmation tier — the scanner evidence is silently discarded. AGENT-007 and AGENT-010 are in MANUAL_IDS. The engine's run_automatic_tier() builds result_by_ctrl from scanner output, but the manual-tier loop (lines 343–367) never reads result_by_ctrl — it only builds a checklist from test_procedure. The agent scanner's AGENT-007/010 findings (danger instructions, no-confirmation patterns) are generated, enter result_by_ctrl, and are then silently dropped. These controls show NOT_TESTED with test procedure checklist evidence, regardless of what the scanner found. This is a known inconsistency worth fixing: either move AGENT-007 and AGENT-010 to review_required tier (remove from MANUAL_IDS) so scanner evidence surfaces in reports, or remove their scan logic from agent_scanner.py entirely since the results are wasted. Currently unresolved — see Open Question 12.
Tool detection uses word-boundary regex (re.findall(r'\b\w+\b', content_lower)), which finds whole tokens only. fetch matches, but fetchUrl and web_request do NOT: \w includes the underscore, so web_request stays a single token that is not in the keyword set, and lowercasing turns fetchUrl into the single token fetchurl. Config files that use camelCase or snake_case tool names not listed verbatim in the keyword sets (common in LangChain and MCP manifests) may therefore undercount tool risk.
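The tokenization can be checked directly. In Python regexes `\w` includes the underscore, so a snake_case name is one token and only matches if that exact spelling is in the keyword set. The sketch below assumes set-membership matching on tokens and uses a three-word excerpt of HIGH_RISK_TOOLS; `detect_tools` is a hypothetical helper.

```python
import re

# Excerpt of HIGH_RISK_TOOLS, for illustration only.
HIGH_RISK_EXCERPT = {"fetch", "send_email", "webrequest"}


def detect_tools(content):
    """Return the keywords that appear as whole tokens in the content."""
    tokens = set(re.findall(r'\b\w+\b', content.lower()))
    return tokens & HIGH_RISK_EXCERPT


plain = detect_tools("tools: fetch")
camel_and_snake = detect_tools("tools: fetchUrl, web_request")
listed_snake = detect_tools("tools: send_email")
print(plain, camel_and_snake, listed_snake)
```

`send_email` is detected only because that exact token is in the keyword set; `web_request` and `fetchUrl` are single tokens with no matching entry.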
AGENT-008 delegation check includes 'agent' in its keyword list. Full list: ['delegate', 'agent', 'crew', 'chain', 'graph', 'multi-agent', 'sub-agent', 'handoff']. The word 'agent' appears in nearly every agent configuration file, meaning AGENT-008 fires for virtually every agent assessment — the evidence text saying "Multi-agent delegation indicators: agent" is expected and normal.
AGENT-011 plugin check includes 'tool', 'action', 'function' in its keyword list. Full list: ['plugin', 'extension', 'mcp', 'tool', 'action', 'function']. 'tool' and 'function' are extremely common words in agent config files, so AGENT-011 fires for nearly every assessment.
AGENT-009 prompt protection keywords (full list, substring-matched): 'do not reveal', 'never share', 'keep confidential', 'do not repeat', 'instructions are private', 'system prompt is'. If none of these exact phrases appear → NON_COMPLIANT. If any appear → no AGENT-009 result (implicitly compliant).
AGENT-010 has two separate NON_COMPLIANT paths:
- Explicit no_confirmation phrases found (`'don't ask'`, `'without confirmation'`, `'send immediately'`, `'execute without'`, `'just do it'`): NON_COMPLIANT, confidence 0.9
- High-risk tools present AND no confirmation keywords found: NON_COMPLIANT, confidence 0.7

If confirmation keywords ARE found (even without explicit no-confirmation language): no AGENT-010 result produced at all.
code_scanner.py covers 9 languages (python, javascript, typescript, java, go, php, csharp, cpp/c, rust) via VULN_PATTERNS dict + LANG_EXTENSIONS map. Each entry is (control_id, severity, regex_pattern, description, remediation). Control families covered: SEC-INJ-001–006 (injection/deserialization), SEC-CRYPTO-001–004 (secrets/weak crypto/insecure random/password hashing), SEC-AUTH-001 (missing auth decorator), SEC-DATA-001/002/004 (sensitive data in logs, SELECT *, error exposure), SEC-MEM-001–002/004 (buffer overflow, use-after-free, unsafe blocks — Rust/C++ only), SEC-MEM-005–006 (resource management — higher-level langs), DEV-BUILD-002 (debug mode), DEV-TEST-003 (test-mode auth bypass). Detection is purely line-by-line regex — no AST, no semantic analysis, no cross-file dataflow. scan_target() routes to scan_file() (single file) or scan_directory() (recursive walk), both return ScanResult lists.
code_scanner.py also produces CPX-* and CPX-MAINTAIN-* control results, not just vulnerability patterns. _analyze_complexity() produces: CPX-STRUCT-004 (file > 500 lines, severity INFORMATIONAL), CPX-STRUCT-001 (function > 50 lines, severity LOW), CPX-METRIC-001 (cyclomatic complexity > 10, severity MEDIUM if ≤15 else HIGH), CPX-STRUCT-003 (nesting depth > 4, severity MEDIUM). _check_practices() produces: CPX-MAINTAIN-002 (unused Python imports, severity LOW), CPX-MAINTAIN-003 (empty/bare exception handlers, severity MEDIUM; Python/JS/Java/Go/PHP only), CPX-MAINTAIN-001 (< 50% of Python functions have type annotations, severity LOW). Complexity analysis only examines the first 200 lines of each function body. CPX-STRUCT, CPX-METRIC, CPX-MAINTAIN are in AUTO_FAMILIES so these controls are auto-tier. All vulnerability pattern results use confidence=0.85 and reachability='INTERNAL'; complexity results use default confidence (1.0) and no reachability.
Each vulnerability pattern fires at most once per file — scan_file() calls break after the first matching line. A file with 10 SQL injection vulnerabilities produces one ScanResult for that pattern. If a file has two different SEC-INJ-001 patterns both matching, two ScanResults with control_id='SEC-INJ-001' are produced. For directory scans, result_by_ctrl dedup in run_automatic_tier() keeps only the last NON_COMPLIANT per control ID — so if 5 files each trigger SEC-INJ-001, only one finding surfaces in the final assessment. This is the biggest limitation of the code scanner for large codebases.
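The once-per-file behavior follows from the `break` after the first matching line. A minimal sketch, assuming a hypothetical single-entry pattern list and dict results rather than the real VULN_PATTERNS tuples and ScanResult objects:

```python
import re

# Hypothetical pattern entry: (control_id, severity, regex, description)
PATTERNS = [
    ("SEC-INJ-001", "HIGH", r'execute\(.*%s', "SQL built with string formatting"),
]


def scan_lines(lines):
    results = []
    for control_id, severity, pattern, description in PATTERNS:
        for lineno, line in enumerate(lines, start=1):
            if re.search(pattern, line, re.IGNORECASE):
                results.append({
                    "control_id": control_id,
                    "severity": severity,
                    "line": lineno,
                    "evidence": description,
                })
                break  # stop at the first matching line for this pattern
    return results


sample = [
    'cur.execute("SELECT * FROM t WHERE id=%s" % uid)',
    'x = 1',
    'cur.execute("DELETE FROM t WHERE id=%s" % uid)',
]
once = scan_lines(sample)
print(once)
```

Both line 1 and line 3 match the pattern, but only line 1 is reported.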
Code scanner results for MANUAL-tier controls are silently dropped. run_automatic_tier() only maps scanner results to controls in self.auto_results and self.review_items. Manual-tier controls (self.manual_items) receive only the checklist-from-test_procedure evidence, never scanner evidence. SEC-AUTH-001 and SEC-AUTH-002 are in MANUAL_IDS — even if the pattern fires, the result is ignored. DEV-TEST-003 (test-mode auth bypass, flagged by the JS pattern) is also manual tier — same outcome. Additionally, the SEC-AUTH-001 Python pattern (@app.route.*\ndef\s+\w+.*\n...) contains \n literals but is applied line-by-line, so it never matches — it's a dead pattern even before reaching the manual-tier drop.
scan_directory() skips common non-source directories: {'node_modules', '.git', '__pycache__', 'venv', '.venv', 'target', 'build', 'dist', 'vendor', '.idea', '.vscode'}. Files with extensions not in LANG_EXTENSIONS are silently ignored. Files that raise IOError/OSError (permission errors, binary files) return an empty results list. File content is read as encoding='utf-8', errors='ignore' — non-UTF-8 bytes are silently dropped rather than crashing.
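A sketch of a directory walk with the same skip set and read settings. `collect_source_files`, `read_text`, and the two-entry `LANG_EXTENSIONS_EXCERPT` are hypothetical; the real scan_directory() may be structured differently, and returning None on a read error stands in for the empty results list.

```python
import os
import tempfile

SKIP_DIRS = {'node_modules', '.git', '__pycache__', 'venv', '.venv',
             'target', 'build', 'dist', 'vendor', '.idea', '.vscode'}
LANG_EXTENSIONS_EXCERPT = {'.py': 'python', '.js': 'javascript'}  # excerpt only


def collect_source_files(root):
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in LANG_EXTENSIONS_EXCERPT:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def read_text(path):
    try:
        with open(path, encoding='utf-8', errors='ignore') as fh:
            return fh.read()  # undecodable bytes are dropped, not raised
    except OSError:
        return None


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "node_modules"))
    files = [("app.py", b"print('ok')\xff\n"),
             (os.path.join("node_modules", "lib.js"), b"x"),
             ("notes.txt", b"n")]
    for rel, data in files:
        with open(os.path.join(root, rel), "wb") as fh:
            fh.write(data)
    demo_files = [os.path.basename(p) for p in collect_source_files(root)]
    demo_text = read_text(os.path.join(root, "app.py"))

print(demo_files, repr(demo_text))
```

The file under node_modules and the .txt file are both skipped without any message, and the stray `\xff` byte disappears from the decoded text.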
No .gitmodules file exists at root. pen-test-triage-update/ was deleted in Round 25 with no .gitmodules cleanup required.
git add -f is required to stage standalone files from the root repo. pen-tester/standalone/ is in the root .gitignore. Running git add -A from root silently skips all standalone files — no error, no warning. This is intentional. Always commit standalone files from within pen-tester/standalone/. If you ever need to force-add from root: git add -f pen-tester/standalone/<file> — but this is an anti-pattern; use manage.ps1 or commit from the standalone dir.
PROJECT_HANDOFF.md is the primary handoff document from June 12, 2026 onward. HANDOFF_SUPPLEMENT.md and Project_Handoff_Document.docx are older reference artifacts. Sections 7 and 9 of the supplement are stale (see above). Section 8 (how to work with the user) and section 16 (error history) are inlined here. HOW_TO_START_NEW_SESSION.txt has been updated to reference this document, not the supplement.
AssessmentResult dataclass fields — exact definition at engine.py lines 40–58:
| Field | Type | Default | Notes |
|---|---|---|---|
| `control` | Control | (none) | the Control object from controls.py |
| `status` | str | "NOT_TESTED" | COMPLIANT / NON_COMPLIANT / NOT_APPLICABLE / NEEDS_REVIEW / ERROR / NOT_TESTED / FALSE_POSITIVE; always uppercase |
| `tier` | str | "" | automatic_confirmation / review_required / manual_confirmation |
| `severity` | str | "" | CRITICAL / HIGH / MEDIUM / LOW / INFORMATIONAL; from the control library; used for summary counts |
| `evidence` | str | "" | scanner output or fallback message |
| `confidence` | float | 1.0 | 0.0–1.0; scanner-assigned, or fallback: 0.5 (family-based evidence, both tiers), 0.2 (review-tier no-match no-family → checklist), 1.0 default (auto-tier no-match no-family: engine does NOT set confidence, stays at dataclass default) |
| `cvss_score` | float | 0.0 | CVSS v3.1 base score |
| `cvss_vector` | str | "" | CVSS v3.1 vector string |
| `reachability` | str | "" | Auto-tier only: copied from ScanResult.reachability via sr.reachability or 'DIRECT' (engine.py line 186). Review-tier: never set by the review loop, so it stays "" even when a scanner produced a direct match with explicit reachability. The continue in the auto-tier NEEDS_REVIEW promotion (line 177) also skips line 186, so promoted controls also have ar.reachability = "". Reporter fallback (r.reachability or "DIRECT", line ~1130) means all review-tier controls show "DIRECT" in JSON output. Valid values: DIRECT, ONE_HOP, INTERNAL, MULTI_STEP; stored in DB; not auto-re-scored in standalone |
| `remediation` | str | "" | fix/remediation text from the control's fix_text field |
| `is_false_positive` | bool | False | set True when FP is applied; status is also set to FALSE_POSITIVE |
| `fp_justification` | str | "" | the user-entered FP justification text |
| `user_notes` | str | "" | user-entered notes; pre-populated from prior report carryforward |
| `scanner_name` | str | "" | name of the scanner that produced this result |
FALSE_POSITIVE is a valid status value (set alongside is_false_positive = True). NOT_TESTED is the dataclass default and persists for controls that no scanner attempted. Never use lowercase status strings.
Control library .md format — how a control entry looks:
### FAMILY-NNN
- **Name**: Human-readable control name
- **CIA**: A | C | I | AC | CI | ACI (primary CIA triad impact)
- **Secondary**: C | I | A (secondary CIA impact, optional)
- **OWASP**: AXX:202X (OWASP Top 10 mapping)
- **NIST-800**: XX-X (NIST SP 800-53 control ID)
- **ISO-27001**: A.X.X.X
- **CMMC**: XX.LX-X.X.X
- **DoD-SRG**: SRG-APP-XXXXXX
- **FedRAMP**: XX-X (Low|Moderate)
- **HIPAA**: §164.XXX — description
- **PCI-DSS**: Req X.X — description
- **SOC2**: CCX.X — description
- **SEC-FINRA**: citation
- **EU-DORA**: Art. X — description
- **EU-AI**: Art. X — description
- **Statement**: What the control requires (one or two sentences).
- **Severity if Non-Compliant**: CRITICAL | HIGH | MEDIUM | LOW | INFORMATIONAL
- **Test**: What to test and what outcome indicates non-compliance.
Tier is auto-assigned by classify_control() in controls.py, so no Tier: field is needed in the .md, and any Tier: line present is ignored. Fields with keys not in _known will leak into review_procedure, so use only the documented field names above.
classify_control(control_id, family) logic (controls.py lines 98–103):
if control_id in AUTO_IDS or family in AUTO_FAMILIES:
return "automatic_confirmation"
if control_id in MANUAL_IDS:
return "manual_confirmation"
return "review_required" # default
Priority: AUTO check (ID or family) → MANUAL check → default review_required.
AUTO_FAMILIES (line 43) — entire family is automatic_confirmation regardless of individual ID:
CRYPTO, HEADERS, SESSION (website/agent); CONFIG, RATE (API); CPX-STRUCT, CPX-METRIC, CPX-MAINTAIN, DEV-DEP, DEV-BUILD, DEV-QUAL (code review)
AUTO_IDS (line 53) — explicit IDs always automatic_confirmation (regardless of family):
- Website/agent: AUTH-001/002/005/006, INPUT-001–004, SECRETS-001–003, ERROR-001–003, DATA-004, COMP-001/003, INFRA-001–004
- API: BOLA-001, AUTH-001/002/003, BOPLA-001, FUNC-001, SSRF-001, CONFIG-001–004, INPUT-001–003, DATA-001/002/003, SECRETS-001/002, GRAPHQL-001–003, WEBHOOK-001/002
- OS/Software: PATCH-001/003, EOL-001, SVCCONFIG-002, SVCEXPOSE-001
- Code: SEC-INJ-001–006, SEC-MEM-001–006, SEC-CRYPTO-001–004, SEC-DATA-001/003/004, DEV-BUILD-001/002
MANUAL_IDS (line 82) — explicit IDs always manual_confirmation:
- Website/agent: AUTHZ-005, DATA-003, DATA-001, AUDIT-002/003, COMP-002, AGENT-007/010
- Cross-system: TRUST-001/002/003, INCIDENT-001/002, SUPPLY-001/002
- Code: DEV-QUAL-003, DEV-TEST-002/003, SEC-AUTH-001/002
- OS/Software: PATCH-002, SOFTINV-001
Critical overlap — DATA-001 and DATA-003 appear in BOTH AUTO_IDS and MANUAL_IDS. AUTO wins because AUTO_IDS is checked first in classify_control(). These IDs are automatic_confirmation despite also being listed in MANUAL_IDS.
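The precedence, including the DATA-001/DATA-003 overlap, can be reproduced with the classify_control() logic quoted above. The three sets below are small excerpts chosen for the example, not the full definitions in controls.py.

```python
# Excerpts only; the real sets in controls.py are much larger.
AUTO_FAMILIES = {"CRYPTO", "HEADERS", "SESSION"}
AUTO_IDS = {"AUTH-001", "DATA-001", "DATA-003"}
MANUAL_IDS = {"AUTHZ-005", "DATA-001", "DATA-003"}


def classify_control(control_id, family):
    if control_id in AUTO_IDS or family in AUTO_FAMILIES:
        return "automatic_confirmation"
    if control_id in MANUAL_IDS:
        return "manual_confirmation"
    return "review_required"  # default


tiers = {
    cid: classify_control(cid, cid.rsplit("-", 1)[0])
    for cid in ("DATA-001", "AUTHZ-005", "AUDIT-001", "CRYPTO-003")
}
print(tiers)
```

DATA-001 is listed in both sets but never reaches the MANUAL check, and CRYPTO-003 is automatic purely through its family.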
_infer_family(control_id) (controls.py lines 263–265): control_id.rsplit('-', 1)[0] splits on the LAST hyphen, so compound families are preserved:
- AUTH-001 → AUTH
- SEC-INJ-003 → SEC-INJ
- CPX-METRIC-001 → CPX-METRIC
- HEADERS-007 → HEADERS

This is called when a control's `.md` entry has no `Family:` field. The inferred family then drives `classify_control()`: if the inferred family is in `AUTO_FAMILIES`, the control becomes `automatic_confirmation`.
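A quick check of the last-hyphen split, with a first-hyphen split shown for contrast. `infer_family` mirrors the quoted expression; `first_hyphen_family` is a hypothetical counter-example, not something in controls.py.

```python
def infer_family(control_id):
    # Split once from the right: everything before the last hyphen is the family.
    return control_id.rsplit('-', 1)[0]


def first_hyphen_family(control_id):
    # Counter-example: splitting from the left truncates compound families.
    return control_id.split('-', 1)[0]


ids = ["AUTH-001", "SEC-INJ-003", "CPX-METRIC-001", "HEADERS-007"]
inferred = [infer_family(i) for i in ids]
truncated = [first_hyphen_family(i) for i in ids]
print(inferred)
print(truncated)
```

With a left split, CPX-METRIC-001 would be filed under "CPX", which is not in AUTO_FAMILIES, so the control would be classified differently.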
STIG controls are never classified by classify_control() — STIG controls get tier = "review_required" as a hardcoded default (set during parsing), not via this function.
The template above applies to controls-library.md (website/agent). code-review-controls.md has an additional field not present in other libraries: - **Languages**: ALL | Python | JavaScript | Rust | Java | C/C++ | C# | Go | PHP — this controls which language-specific review steps are shown in reports via filterReviewSteps(). When adding a code review control, include this field and ensure the value is in the supported set.
Multi-line field values in .md control libraries require 2-space indented continuation lines. _parse_control_section() (controls.py lines 193–199) extends a field across multiple lines only if continuation lines start with exactly two spaces (elif current_key and line.startswith(' ')). A blank line, a line starting with - **, or a line with different indentation ends the field. If a Test: or Statement: value needs to span multiple paragraphs, each continuation line must be indented by 2 spaces. Failure to do so silently truncates the field at the first unindented line.
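The continuation rule can be illustrated with a small parser sketch. This is not `_parse_control_section()`: `parse_fields`, the field regex, lowercased keys, and joining continuation lines with a single space are all assumptions made for the example. Only the two-space prefix test reflects the behavior described above.

```python
import re

# Hypothetical field-line matcher for "- **Key**: value" entries.
FIELD_RE = re.compile(r'^- \*\*(.+?)\*\*:\s*(.*)$')


def parse_fields(section):
    fields = {}
    current_key = None
    for line in section.splitlines():
        match = FIELD_RE.match(line)
        if match:
            current_key = match.group(1).strip().lower()
            fields[current_key] = match.group(2).strip()
        elif current_key and line.startswith('  '):
            # Two-space indented line: continuation of the current field.
            fields[current_key] += ' ' + line.strip()
        else:
            # Blank or unindented line ends the field; its text is dropped.
            current_key = None
    return fields


SECTION = (
    "- **Statement**: First line\n"
    "  second line, indented\n"
    "- **Test**: Check the header\n"
    "unindented continuation is lost\n"
)
parsed = parse_fields(SECTION)
print(parsed)
```

The unindented line after the Test field is dropped with no warning, which is the truncation failure mode described above.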
parse_stig_controls() has NO multi-line continuation support. Unlike _parse_control_section(), the STIG parser (controls.py lines 284–289) only reads single-line - **Field**: Value entries. It has no elif line.startswith(' ') continuation check. STIG fields that span multiple lines — long VulnDiscussion/Statement text or lengthy Check/Fix blocks that stig_parser.py writes with embedded newlines — are silently truncated to their first line when parsed into Control objects. The control's .statement and .test_procedure may be incomplete for verbose STIG checks.
STIG controls: review_procedure is always "" — parse_stig_controls() never sets review_procedure; it's left at the dataclass default (empty string). In the engine's no-scanner fallback path, the review checklist reads ctrl.test_procedure or ctrl.statement for STIG controls. In reporter.py, the reviewer sees review_steps = r.control.review_procedure or r.control.test_procedure — since review_procedure is "", it falls back to test_procedure (the STIG check field).
STIG controls: test_procedure and check_content are identical — both are populated from fields.get('check', '') (lines 307 and 314). This is intentional redundancy: test_procedure feeds the reviewer checklist; check_content is the STIG-specific field for structured XCCDF compliance workflows.
fix_text is now populated for both STIG and non-STIG controls — FIXED (Round 27). _parse_control_section() now includes fix_text=fields.get('fix text', fields.get('fix', '')) in the Control() constructor. The engine's remediation fallback (if not ar.remediation: ar.remediation = ar.control.fix_text or ar.control.statement) now has a real value to fall back to for non-STIG controls.
Severity field name precedence in .md parser: _parse_control_section() tries these keys in order: "severity if non-compliant" → "severity" → "mapped severity" → default "MEDIUM". Use **Severity if Non-Compliant**: in regular control libraries. STIG .md files (generated by stig_parser.py) use **Mapped Severity**:. Do not use plain **Severity**: in new controls; it is a fallback lookup (second in the order above, not the preferred key) and exists only as a legacy alias.
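The lookup order can be sketched as a loop over candidate keys. `resolve_severity` is a hypothetical helper; the parser's actual expression (for example nested `fields.get()` calls) may differ, and only the key order and the "MEDIUM" default are taken from the description.

```python
SEVERITY_KEYS = ("severity if non-compliant", "severity", "mapped severity")


def resolve_severity(fields):
    """Return the first non-empty severity value in precedence order."""
    for key in SEVERITY_KEYS:
        value = fields.get(key, "").strip()
        if value:
            return value
    return "MEDIUM"  # default when no severity key is present


regular = resolve_severity({"severity if non-compliant": "HIGH", "severity": "LOW"})
stig = resolve_severity({"mapped severity": "CRITICAL"})
missing = resolve_severity({"name": "No severity here"})
print(regular, stig, missing)
```

When both the preferred key and the plain alias are present, the preferred key wins; a STIG entry with only the mapped key still resolves.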
sat-controls-data field sources — five non-obvious fallbacks in reporter.py:
- `severity` → `r.severity or r.control.severity`: if the scanner returns no severity (empty string, the `AssessmentResult` dataclass default), the control library's severity is used. This applies to HTML, CSV, and JSON outputs.
- `remediation` → `r.remediation or r.control.fix_text or ""`: scanner remediation wins; falls back to the control library's `Fix:` field (`fix_text`). If neither exists, empty string. Historical caveat: before Round 27, `control.fix_text` was always `""` for regular (non-STIG) controls because `_parse_control_section()` omitted `fix_text` from the `Control(...)` constructor call (lines 246–260). The `Fix:` / `Fix Text:` keys were in `_known` (so they did not leak into `review_procedure`) but the value was never stored, which made the chain effectively `r.remediation or ""`. The Round 27 fix described above stores the value, so the fallback now works for regular controls as well. For STIG controls, `fix_text` is populated from the `fix` field (line 308).
- `reachability` → `r.reachability or "DIRECT"`: `AssessmentResult.reachability` has a dataclass default of `""` (empty string), but the JSON entry is seeded as `"DIRECT"` when the field is empty. The template always sees a non-empty reachability.
- `review_steps` → `r.control.review_procedure or r.control.test_procedure or ""`: `review_procedure` (test_procedure + all non-`_known` sub-fields concatenated) is preferred; `test_procedure` is the fallback if `review_procedure` is somehow empty.
- `source` → `r.scanner_name or "manual"`: controls with no scanner (review_required fallback, all manual_confirmation controls) show `source: "manual"`.
How to work with this user (inlined from supplement section 8):
- Terse, direct commands. Rarely asks questions. Expects immediate action, not options or clarification.
- ALL CAPS = do it now. "ADD SUPPORT FOR X" means start building immediately.
- When asked "why is X broken" — diagnose AND fix in one response.
- Don’t ask clarifying questions when intent is clear.
- When given options, they pick the most comprehensive one every time. Default to the most complete option.
- Completeness and accuracy matter more than speed — verify before claiming done.
- They are the product owner AND architect. Don’t override their decisions; implement them.
- "How do you load the gui" means give the exact command (`python main.py`), not a tutorial.
Multi-Modal-Scanner https://github.com/CavenderProjects/Multi-Modal-Scanner
last pushed: Round 29 (June 18, 2026)
branch: main, up to date with origin
UNCOMMITTED: PROJECT_HANDOFF.md (this file — update and push after each session)
Multi-Modal-Scanner_Standalone https://github.com/CavenderProjects/Multi-Modal-Scanner_Standalone
last pushed: Round 29 (June 18, 2026)
branch: main, up to date with origin
To push all Round 29 changes:
cd "C:\Users\slagb\OneDrive\Documents\Claude\Projects\Revised pen tester"
.\manage.ps1 push -Repo both -m "Round 29: STIG interactive triage, CAT fix, carryforward fix, profile selection, KeyError fix"
Run .\manage.ps1 status to verify current state before starting new work.