- Business Background & Current Pain Points
Currently, as we advance our LLM-based next-generation white-box automated audit system (especially for complex business logic vulnerabilities like IDOR/Horizontal Privilege Escalation in microservices), we heavily rely on YASA as our foundational code parsing infrastructure.
While YASA's compilation-free parsing is powerful, and its Universal Abstract Syntax Tree (UAST) and Call Graph outputs excellently resolve macro-level inter-service topologies, the lack of micro-level intra-procedural logic analysis has become a severe bottleneck for our SAST pipeline:
Lack of DFG (Data Flow Graph): We cannot determine if a tainted parameter received in Method A is genuinely passed down to the internally called Method B without modification. This leads to exceptionally high base false-positive and false-negative rates.
Lack of CFG (Control Flow Graph): We cannot pinpoint guard conditions (e.g., if (uid != db.uid) or exception throws) along the data flow, losing the crucial "god's-eye view" needed for vulnerability triage.
- Current Workarounds & Disastrous Consequences
Because YASA cannot provide precise CFG/DFG slices, we are forced into a highly inefficient workaround when integrating with our LLM agents:
Extracting the full source code (Method Bodies) of all relevant methods along the Call Graph and feeding them directly to the LLM for reasoning.
This leads to unacceptable engineering issues:
Runaway Token Costs & Latency: When encountering "God Methods" (single methods spanning hundreds of lines), the LLM's context window is easily maxed out. API costs and inference latency grow with every additional line of context, blocking high-concurrency CI/CD pipeline integration.
LLM Hallucinations & Attention Dilution: Forcing the LLM to manually trace data flows across lengthy source code frequently triggers the "Lost in the Middle" phenomenon, drastically degrading the accuracy of security verdicts.
Alternatives are Too Heavy: We considered external tools like Joern to generate Code Property Graphs (CPG) to bridge this gap, but its extreme memory hunger (tens of GBs) and Scala query barrier contradict our architectural philosophy of being lightweight and "shifting left" rapidly.
- Core Feature Requests
We kindly request the YASA team to drill down one layer beneath the UAST and support intra-procedural flow analysis capabilities:
[Req-1] Support Intra-procedural CFG Generation: Structurally output the basic blocks and branch jump relationships within a function.
[Req-2] Support Intra-procedural DFG Generation: Track variable assignments, propagation, and state mutations within a function.
[Req-3] Node Mapping Support: The generated CFG/DFG nodes MUST map precisely to existing UAST nodes or source code line numbers, enabling upper-layer tools to extract code slices accurately.
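To make the three requests concrete, here is a minimal sketch of what a per-function output could look like. Everything in it is an assumption for illustration only: the class names (`BasicBlock`, `CfgEdge`, `DfgEdge`, `FunctionFlow`), the field names, the `uast_node_ids` values, and the synthetic `EXIT` block are hypothetical and do not describe any existing YASA format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class BasicBlock:
    id: str
    start_line: int
    end_line: int
    # Req-3: hypothetical ids pointing back at existing UAST nodes.
    uast_node_ids: List[str] = field(default_factory=list)


@dataclass
class CfgEdge:
    src: str
    dst: str
    kind: str  # e.g. "true", "false", "throw", "return"


@dataclass
class DfgEdge:
    variable: str
    def_line: int  # line where the value is defined or assigned
    use_line: int  # line where that value is read


@dataclass
class FunctionFlow:
    function: str
    blocks: List[BasicBlock]
    cfg_edges: List[CfgEdge]
    dfg_edges: List[DfgEdge]

    def successors(self, block_id: str) -> List[Tuple[str, str]]:
        """Return (target block id, edge kind) pairs leaving block_id."""
        return [(e.dst, e.kind) for e in self.cfg_edges if e.src == block_id]


# Flow data for a five-line handler:
#   1  def get_order(uid, order_id):
#   2      order = db.find(order_id)
#   3      if uid != order.uid:
#   4          raise Forbidden()
#   5      return order
flow = FunctionFlow(
    function="get_order",
    blocks=[
        BasicBlock("B0", 1, 3, ["uast:12", "uast:13"]),  # made-up UAST ids
        BasicBlock("B1", 4, 4, ["uast:14"]),
        BasicBlock("B2", 5, 5, ["uast:15"]),
    ],
    cfg_edges=[
        CfgEdge("B0", "B1", "true"),   # Req-1: branch taken when the guard fires
        CfgEdge("B0", "B2", "false"),
        CfgEdge("B1", "EXIT", "throw"),
        CfgEdge("B2", "EXIT", "return"),
    ],
    dfg_edges=[
        DfgEdge("order_id", 1, 2),     # Req-2: parameter flows into db.find
        DfgEdge("uid", 1, 3),
        DfgEdge("order", 2, 3),
        DfgEdge("order", 2, 5),
    ],
)

print(json.dumps(asdict(flow), indent=2))
```

A JSON shape along these lines would let upper-layer tools consume the flow data without linking against the analyzer itself; the exact schema is of course up to the YASA team.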
- Typical Use Case
LLM-Driven Precision Slicing:
When YASA outputs a complete Call Graph, the upper-layer extraction tool can utilize YASA's CFG/DFG data to extract only the code lines that touch the tainted variable and the if/else branch lines controlling its flow.
For example, this could trim roughly 1,000 lines of source code previously fed to the LLM down to a highly relevant 30-line logic slice.
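The slicing step described above can be sketched as a forward traversal over def-use edges, followed by pulling in the guard lines that control the selected statements. This is a simplified illustration, not a proposed implementation: `forward_slice`, the `(def_line, use_line, variable)` edge tuples, and the `control_deps` map are hypothetical stand-ins for whatever CFG/DFG data YASA might expose, and the traversal conservatively treats anything defined on a reached line as tainted.

```python
from collections import deque
from typing import Dict, List, Tuple

DfgEdge = Tuple[int, int, str]  # (def_line, use_line, variable)

SOURCE = [
    "def get_order(uid, order_id):",        # line 1
    "    log.info('lookup')",               # line 2
    "    order = db.find(order_id)",        # line 3
    "    if not acl.allows(uid):",          # line 4
    "        raise Forbidden()",            # line 5
    "    metrics.incr('orders')",           # line 6
    "    return order",                     # line 7
]

# Hypothetical analyzer output for the function above.
DFG_EDGES: List[DfgEdge] = [
    (1, 3, "order_id"),
    (1, 4, "uid"),
    (3, 7, "order"),
]
# line -> guard line it is control-dependent on (lines 5-7 depend on the check at 4)
CONTROL_DEPS: Dict[int, int] = {5: 4, 6: 4, 7: 4}


def forward_slice(source_line: int, source_var: str,
                  dfg_edges: List[DfgEdge],
                  control_deps: Dict[int, int]) -> List[int]:
    """Return sorted line numbers touching the tainted value plus their guards."""
    by_def: Dict[int, List[DfgEdge]] = {}
    for edge in dfg_edges:
        by_def.setdefault(edge[0], []).append(edge)

    seen = {source_line}
    # Start only from uses of the tainted variable itself.
    queue = deque(use for _, use, var in by_def.get(source_line, [])
                  if var == source_var)
    while queue:
        line = queue.popleft()
        if line in seen:
            continue
        seen.add(line)
        # Conservative: every value defined on a tainted line is considered tainted.
        for _, use, _ in by_def.get(line, []):
            queue.append(use)

    # Add guard lines controlling any selected line, transitively.
    pending = list(seen)
    while pending:
        guard = control_deps.get(pending.pop())
        if guard is not None and guard not in seen:
            seen.add(guard)
            pending.append(guard)
    return sorted(seen)


slice_lines = forward_slice(1, "order_id", DFG_EDGES, CONTROL_DEPS)
for n in slice_lines:
    print(f"{n}: {SOURCE[n - 1]}")
# Selected lines: [1, 3, 4, 7]
```

In this toy case the logging and metrics lines are dropped while the authorization check at line 4 is kept, because the `return order` statement is control-dependent on it. That guard line is exactly what the LLM needs to see to judge whether the access is protected.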
- Expected Value
Bridging the Last Mile of LLM + SAST: Provides truly industrial-grade code slicing capabilities, which we expect to reduce LLM token consumption by over 80%.
Massive Boost to Detection Accuracy: LLMs can focus purely on semantic reasoning (e.g., verifying if an extracted guard condition is valid) rather than manual data flow tracking, substantially reducing the false-positive rate.
Strengthening YASA's Product Moat: Fills the gap in flow analysis, bringing YASA on par with the core capabilities of modern, advanced static analysis engines (like CodeQL).