extra tools by SimonGurney · Pull Request #72 · punk-security/SAIST

SimonGurney · 2026-05-08T20:48:50Z

No description provided.

github-actions

CRITICAL SEVERITY - INFORMATION DISCLOSURE AND PATH TRAVERSAL

The most critical vulnerability involves multiple path traversal and information disclosure risks. The user-supplied args.path is expanded and resolved but not validated against any sandbox or expected directory, allowing a malicious user to specify arbitrary paths like /etc/passwd, enabling the tool to read and potentially exfiltrate sensitive system files. Additionally, the _rank_candidate_files function uses os.walk without protection against symbolic links. If a repository contains a symlink pointing to external directories such as /etc or /home, it will follow the link and include those files in the repository profile, leading to further exposure of sensitive files outside project boundaries. The final output path for skill files is constructed by concatenating skills_dir with a filename that originates from an LLM response, and the sanitization in _safe_skill_filename could be bypassed, allowing path traversal that results in writing files outside the intended skills_dir.

HIGH SEVERITY - SENSITIVE DATA EXPOSURE AND PROMPT INJECTION

The inclusion of .env files in the TEXT_EXTENSIONS set means that the _rank_candidate_files function will score and include .env files from the target repository in the profile sent to the LLM for skill generation. Since .env files commonly contain secrets like API keys, database passwords, and tokens, this results in credential exposure to the LLM provider. Furthermore, the repository profile is directly interpolated into the user_prompt sent to the LLM without sanitization or separation. A malicious repository can contain crafted content in its source files, such as "ignore previous instructions" style attacks, allowing an attacker to manipulate the LLM's skill generation behavior, effectively performing a prompt injection attack that could override system-level guardrails.

HIGH SEVERITY - UNVALIDATED SKILL FILE CONTENT AND INJECTION RISK

The skill file content is received from the LLM response and written to disk after only whitespace stripping, with no validation, sanitization, or security review. If the LLM is manipulated through prompt injection, it could write malicious or sensitive content into skill files that could later influence future security reviews in arbitrary ways. Similarly, the load_analysis_skills function reads .md files from the skills directory without verifying file integrity or origin. An attacker who can write files to this directory, for example through a compromised repository, can inject malicious skill content that will directly influence the LLM's security analysis behavior.

MEDIUM SEVERITY - DATA CORRUPTION AND CHARACTER HANDLING ISSUES

The truncation of content at an arbitrary byte boundary can split multi-byte UTF-8 characters, producing garbled text. The use of errors='ignore' silently drops incomplete characters, which could corrupt security-relevant context or instructions within skill files and cause the LLM to receive malformed instructions. Additionally, when _read_sample decodes binary file content, it uses errors='replace', which silently replaces invalid UTF-8 bytes with the replacement character U+FFFD. This could inadvertently expose sensitive binary data, misrepresent file contents to the LLM, and more critically, mask encoding-based attacks or cause security-relevant data to be misinterpreted.

github-actions · 2026-05-08T20:50:56Z

+TEXT_EXTENSIONS = {
+    ".cs",
+    ".css",
+    ".env",


Security Issue: The TEXT_EXTENSIONS set includes '.env' as a text extension. The _rank_candidate_files function will score and include .env files in the repository profile sent to the LLM for skill generation. Since .env files commonly contain secrets (API keys, database passwords, tokens), this could result in credential exposure to the LLM provider. The IMPORTANT_FILENAMES set includes '.env.example' but actual .env files are also included via TEXT_EXTENSIONS.

Priority: HIGH

CWE: CWE-200

Recommendation: Remove '.env' from TEXT_EXTENSIONS or add explicit filtering to exclude .env files from file sampling. Consider adding '.env' (without .example) to an exclusion list.

Snippet: ".env",

github-actions · 2026-05-08T20:50:56Z

+{repository_profile}
+"""
+
+    generated = await llm.prompt_structured(system_prompt, user_prompt, GeneratedSkillFiles)


Security Issue: The repository profile (repository_profile) is constructed from user-controlled files in the target repository being analyzed and is directly interpolated into the user_prompt sent to the LLM. If a malicious repository contains specially crafted content (e.g., 'ignore previous instructions' style attacks) in its source files, this content gets included in the LLM prompt, potentially allowing an attacker to manipulate the LLM's skill generation behavior.

Priority: HIGH

CWE: CWE-94

Recommendation: Sanitize or escape the repository_profile content before including it in the LLM prompt. Consider wrapping it in an indented block or delimiters that clearly separate it from instructions, and add system-level guardrails.

Snippet: generated = await llm.prompt_structured(system_prompt, user_prompt, GeneratedSkillFiles)

github-actions · 2026-05-08T20:50:56Z

+            skipped.append(output_path)
+            continue
+
+        output_path.write_text(content + "\n", encoding="utf-8")


Security Issue: The skill filename comes from an LLM-generated response (skill.filename). While _safe_skill_filename sanitizes the filename, the LLM could theoretically generate a filename resulting in path traversal (e.g., if the sanitization is bypassed or a subdirectory path is embedded). The output_path = skills_dir / filename concatenation could write outside the intended skills_dir if filename contains path separators that survive sanitization.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Validate that the final resolved path is still within the intended skills_dir before writing. Add a check like 'if not output_path.resolve().relative_to(skills_dir.resolve())' after constructing the output path.

Snippet: output_path.write_text(content + "\n", encoding="utf-8")

github-actions · 2026-05-08T20:50:56Z

+    skills: list[GeneratedSkillFile]
+
+
+def project_root_from_args(args) -> Path:


Security Issue: The function uses args.path from user-supplied arguments to construct file system paths. It calls expanduser() (allowing ~ expansion) and resolve() but does not validate that the resulting path is within an expected sandbox or directory. A malicious user could specify an arbitrary path (e.g., /etc/passwd or /proc/...) causing the tool to traverse, read, and exfiltrate content from arbitrary locations on the filesystem.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Validate that the resolved path is within an expected project directory or set of allowed directories. Add checks to prevent path traversal beyond the intended scope.

Snippet: def project_root_from_args(args) -> Path:

github-actions · 2026-05-08T20:50:56Z

+            skipped.append(output_path)
+            continue
+
+        output_path.write_text(content + "\n", encoding="utf-8")


Security Issue: The skill file content comes from an LLM response via generate_skill_files. The content is only stripped of whitespace before being written to disk. There is no validation, sanitization, or security review of the content before writing. An LLM could be manipulated (via prompt injection) to write malicious or sensitive content into the skill files that could influence future security reviews.

Priority: MEDIUM

CWE: CWE-913

Recommendation: Add content validation before writing. At minimum, scan for attempts to override system instructions, inject prompt manipulation content, or include sensitive data.

Snippet: output_path.write_text(content + "\n", encoding="utf-8")

github-actions · 2026-05-08T20:50:57Z

+def _rank_candidate_files(project_root: Path, skills_dir: Path) -> list[tuple[int, str, Path]]:
+    candidates: list[tuple[int, str, Path]] = []
+
+    for current_root, dirnames, filenames in os.walk(project_root):


Security Issue: The _rank_candidate_files function uses os.walk on user-controlled project_root without protection against symbolic links. If a repository contains a symlink pointing to an external directory (e.g., /etc, /home), the walker will follow it and include those files in the repository profile. This could lead to reading sensitive files outside the intended project boundaries.

Priority: MEDIUM

CWE: CWE-41

Recommendation: Use followlinks=False (default) with os.walk but add additional checks to ensure resolved paths are within the project_root boundary using path.resolve() comparison. Alternatively, use pathlib.Path.rglob() which has safer symlink handling.

Snippet: for current_root, dirnames, filenames in os.walk(project_root):

github-actions · 2026-05-08T20:50:57Z

+            continue
+
+        try:
+            content = path.read_text(encoding="utf-8")


Security Issue: The load_analysis_skills function reads .md files from the skills_dir path which is derived from user-controlled input (args.path / configured_path). There is no verification of file integrity or origin. If an attacker can write files to this directory (e.g., via a compromised repository), they can inject malicious skill content that influences the LLM's security analysis behavior in arbitrary ways.

Priority: MEDIUM

CWE: CWE-345

Recommendation: Consider signing or checksum-verifying skill files, or restricting write access to the skills directory. At minimum, add a warning when skill content has been modified since generation.

Snippet: content = path.read_text(encoding="utf-8")

github-actions · 2026-05-08T20:50:57Z

+        truncated = False
+        encoded_length = len(content.encode("utf-8"))
+        if encoded_length > remaining:
+            content = content.encode("utf-8")[:remaining].decode("utf-8", errors="ignore").strip()


Security Issue: When truncating content at an arbitrary byte boundary, a multi-byte UTF-8 character can be split, producing garbled text. The use of errors='ignore' silently drops incomplete characters at the split point, which could corrupt security-relevant context or instructions within skill files. This could cause the LLM to receive malformed instructions.

Priority: LOW

CWE: CWE-172

Recommendation: Use proper character-aware truncation, e.g., truncate by character count or use a method that ensures no multi-byte characters are split. Consider using string slicing on the decoded string instead of encoding and truncating at byte boundaries.

Snippet: content = content.encode("utf-8")[:remaining].decode("utf-8", errors="ignore").strip()

github-actions · 2026-05-08T20:50:57Z

+        return ""
+
+    truncated = len(data) > max_bytes
+    text = data[:max_bytes].decode("utf-8", errors="replace").strip()


Security Issue: When _read_sample decodes binary file content, it uses errors='replace' which silently replaces invalid UTF-8 bytes with the replacement character (U+FFFD). This could inadvertently expose sensitive binary data or misrepresent file contents to the LLM. More critically, it could mask encoding-based attacks or cause security-relevant data to be misinterpreted.

Priority: LOW

CWE: CWE-172

Recommendation: Consider what behavior is appropriate for non-UTF-8 files. Either skip files that are not valid UTF-8 (using errors='strict') or log warnings when replacement occurs. For security-sensitive contexts, transparent data corruption should be avoided.

Snippet: text = data[:max_bytes].decode("utf-8", errors="replace").strip()

SimonGurney · 2026-05-09T10:15:44Z

2026-05-09 10:13:49,958 - saist.reportlab_pdf - ERROR - Unable to write ReportLab PDF file to 'reporting/report.pdf': Flowable <Table@0x781A8C69CE10 1 rows x 1 cols(tallest row 1378)> with cell(0,0) containing
'<Paragraph at 0x781a8c29c390>Executive Summary The assessment identified several high-imp'(482.40000000000003 x 1378), tallest cell 1378.0 points, too large on page 4 in frame 'normal'(489.6755905511812 x 736.2897637795278*) of template 'Later'

extra tools

6897cb9

github-actions Bot requested changes May 8, 2026

View reviewed changes

remove latex

bee555c

imnotbrandon previously approved these changes May 8, 2026

View reviewed changes

size fixes

8c848ad

SimonGurney dismissed imnotbrandon’s stale review via 8c848ad May 9, 2026 10:28

QA Team added 12 commits May 9, 2026 11:00

size fixes

e51ea80

tweaks

fc9fb13

report fixes

965374e

fixes

d327df9

add dedupe

d3bd8b9

fix temporario bug

45e300f

fix temporario bug

b007cd9

add azure

e162423

add validation steps to report

d526d0a

awesome webs

95f28ae

fixes

58c2761

openai fixes

164a31c

SimonGurney merged commit 789116e into main May 10, 2026
3 checks passed

SimonGurney deleted the improments branch May 10, 2026 22:54

SimonGurney mentioned this pull request May 10, 2026

Added Azure Open AI #12

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extra tools#72

extra tools#72
SimonGurney merged 15 commits into
mainfrom
improments

SimonGurney commented May 8, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

github-actions Bot May 8, 2026

Uh oh!

SimonGurney commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		skills: list[GeneratedSkillFile]


		def project_root_from_args(args) -> Path:

Conversation

SimonGurney commented May 8, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

SimonGurney commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants