## Optional progress bar and deep scanning #27

farnaboldi wants to merge 4 commits into psyray:release/v0.5.0 from
Conversation
### Reviewer's Guide

This PR introduces an optional progress bar disable flag and a deep-only scanning mode.

Sequence diagram for the deep-only scanning workflow:

```mermaid
sequenceDiagram
    actor User
    participant OasisCLI
    participant Analyzer
    participant OllamaManager
    participant LLM
    participant Report
    User->>OasisCLI: Run with -at deep
    OasisCLI->>Analyzer: Start deep-only analysis
    Analyzer->>OllamaManager: Ensure deep model available
    OllamaManager->>LLM: Load model (if needed)
    Analyzer->>LLM: Analyze all code chunks for vulnerabilities
    LLM-->>Analyzer: Return analysis results
    Analyzer->>Report: Generate vulnerability report
    Report-->>User: Output deep analysis results
```
Class diagram for progress bar disabling and deep scan integration:

```mermaid
classDiagram
    class OasisCLI {
        +args
        +run_analysis_mode()
        +_should_disable_progress(args)
    }
    class Analyzer {
        +args
        +process_analysis_with_model()
        +_perform_deep_only_analysis()
        +_should_disable_progress(args)
    }
    class OllamaManager {
        +ensure_model_available(model, disable_progress)
        +get_available_models(disable_progress)
        +select_models(disable_progress)
        +format_model_display_batch(disable_progress)
        +_preload_model_info(disable_progress)
        +_filter_lightweight_models(disable_progress)
    }
    class EmbeddingManager {
        +args
        +index_code_files(no_progress)
    }
    class Tools {
        +_should_disable_progress(args, silent)
    }
    OasisCLI --> Analyzer
    OasisCLI --> OllamaManager
    Analyzer --> OllamaManager
    Analyzer --> Tools
    EmbeddingManager --> OllamaManager
    EmbeddingManager --> Tools
    OllamaManager --> Tools
```
Hey there - I've reviewed your changes - here's some feedback:

- The new deep-only analysis method largely duplicates file and chunk iteration logic from the standard/deep pipelines; consider extracting common loops/prompts into shared helper functions to reduce code duplication.
- You reference `args.analyze_type` for selecting deep mode but haven't added a corresponding CLI flag (e.g. `-at`/`--analyze-type`) in the argument parser, so the deep scan option won't actually be settable.
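The missing flag could be wired up roughly as follows. This is a sketch only: the real OASIS argument parser, the set of valid choices, and the defaults shown here are assumptions, not the project's actual code.

```python
import argparse

# Sketch of adding the missing -at/--analyze-type flag; choice names and
# defaults are assumptions, not OASIS's actual parser configuration.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="oasis")
    parser.add_argument(
        "-at", "--analyze-type",
        dest="analyze_type",
        choices=["standard", "adaptive", "deep"],
        default="standard",
        help="Analysis pipeline to run; 'deep' skips the initial scan",
    )
    parser.add_argument(
        "--no-progress",
        action="store_true",
        help="Disable all progress bars",
    )
    return parser

args = build_parser().parse_args(["-at", "deep", "--no-progress"])
print(args.analyze_type, args.no_progress)  # deep True
```

With `dest="analyze_type"`, the existing `args.analyze_type` references in the PR would then resolve without further changes.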
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new deep-only analysis method largely duplicates file and chunk iteration logic from the standard/deep pipelines—consider extracting common loops/prompts into shared helper functions to reduce code duplication.
- You reference `args.analyze_type` for selecting deep mode but haven’t added a corresponding CLI flag (e.g. `-at/--analyze-type`) in the argument parser, so the deep scan option won’t actually be settable.
## Individual Comments
### Comment 1
<location> `oasis/ollama_manager.py:210` </location>
<code_context>
logger.debug(f"Parameters: {params}")
- if 'num_ctx' in params:
+ if params and 'num_ctx' in params:
context_length = int(params.split()[1])
chunk_size = int(context_length * 0.9)
</code_context>
<issue_to_address>
Parsing 'num_ctx' from parameters may be fragile if format changes.
Consider validating the type of 'params' and using a more reliable parsing method to prevent errors if the format changes.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
if params and 'num_ctx' in params:
context_length = int(params.split()[1])
chunk_size = int(context_length * 0.9)
logger.info(f"Model {model} context length: {context_length}")
logger.warning(f"Could not detect context length for {model}, using default size: {MAX_CHUNK_SIZE}")
return MAX_CHUNK_SIZE
=======
import re
if isinstance(params, str):
match = re.search(r'num_ctx\s*[:=]?\s*(\d+)', params)
if match:
context_length = int(match.group(1))
chunk_size = int(context_length * 0.9)
logger.info(f"Model {model} context length: {context_length}")
return chunk_size
else:
logger.warning(f"Could not parse context length from parameters for {model}, using default size: {MAX_CHUNK_SIZE}")
else:
logger.warning(f"Parameters for {model} are not a string, using default size: {MAX_CHUNK_SIZE}")
return MAX_CHUNK_SIZE
>>>>>>> REPLACE
</suggested_fix>
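The suggested regex can be exercised in isolation. The parameter strings below are hypothetical examples; the exact format Ollama returns can vary, which is the motivation for tolerating `:`, `=`, or bare whitespace as separators.

```python
import re

# Standalone version of the suggested num_ctx parsing; the sample inputs
# are hypothetical, and 2048 stands in for MAX_CHUNK_SIZE.
def parse_num_ctx(params, default=2048):
    if not isinstance(params, str):
        return default
    match = re.search(r'num_ctx\s*[:=]?\s*(\d+)', params)
    return int(match.group(1)) if match else default

print(parse_num_ctx("num_ctx 8192"))   # 8192
print(parse_num_ctx("num_ctx: 4096"))  # 4096
print(parse_num_ctx(None))             # 2048 (non-string falls back to the default)
```

Unlike the original `params.split()[1]`, this does not assume `num_ctx` is the first token, and it never raises on `None` or oddly formatted input.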
### Comment 2
<location> `oasis/embedding.py:131` </location>
<code_context>
analyze_by_function=self.analyze_by_function,
- api_url=self.ollama_manager.api_url
+ api_url=self.ollama_manager.api_url,
+ no_progress=getattr(self.args, 'no_progress', False)
)
for file_path in files
</code_context>
<issue_to_address>
Passing 'no_progress' directly may miss other progress-related flags.
Other flags like 'silent' can also impact progress bar visibility. Use '_should_disable_progress' to handle all relevant cases consistently.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
api_url=self.ollama_manager.api_url,
no_progress=getattr(self.args, 'no_progress', False)
=======
api_url=self.ollama_manager.api_url,
no_progress=_should_disable_progress(self.args)
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `oasis/analyze.py:344` </location>
<code_context>
return all_results
+ def _perform_deep_only_analysis(self, vulnerabilities, args, report):
+ """
+ Perform deep analysis on all files without initial scanning phase.
</code_context>
<issue_to_address>
Consider refactoring the deep analysis logic into a shared helper to eliminate duplicated code and centralize LLM calls.
### Refactor duplicated deep‐analysis loops into a shared helper
You can collapse both the existing `_perform_deep_analysis` and the new `_perform_deep_only_analysis` into a single driver that takes a map of files→chunks, then invoke that from each workflow. For example:
```python
class SecurityAnalyzer:
# … existing methods …
def _run_deep_analysis(
self,
vulnerabilities,
code_items: Dict[str, Dict],
args,
report
):
all_results = {}
logger.info("\n===== DEEP ANALYSIS MODE =====")
with tqdm(
total=len(vulnerabilities),
desc="Overall vulnerability progress",
position=0,
leave=True,
disable=_should_disable_progress(args)
) as vuln_pbar:
for vuln in vulnerabilities:
name, desc, patterns, impact, mitigation = self._get_vulnerability_details(vuln)
detailed_results = []
for file_path, data in code_items.items():
chunks = data.get("chunks", [data["content"]])
file_results = self._analyze_chunks(
file_path, chunks,
name, desc, patterns, impact, mitigation
)
if file_results:
detailed_results.append({
"file_path": file_path,
"similarity_score": 1.0,
"analysis": "\n\n---\n\n".join(
r["analysis"] for r in file_results
)
})
all_results[name] = detailed_results
if detailed_results:
report.generate_vulnerability_report(vuln, detailed_results, self.llm_model)
else:
logger.info(f"No files analyzed for {name}")
vuln_pbar.update(1)
return all_results
def _analyze_chunks(
self,
file_path: str,
chunks: List[str],
name: str,
desc: str,
patterns: List[str],
impact: str,
mitigation: str
) -> List[Dict]:
results = []
for idx, chunk in enumerate(chunks):
if not chunk.strip():
continue
prompt = self._build_analysis_prompt(
name, desc, patterns, impact, mitigation,
chunk, idx + 1, len(chunks)
)
try:
resp = self.client.generate(model=self.llm_model, prompt=prompt)
analysis = getattr(resp, "response", "")
results.append({"analysis": analysis, "chunk_index": idx})
except Exception as e:
logger.error(f"Error on {file_path}[{idx}]: {e}")
results.append({"analysis": f"ERROR: {e}", "chunk_index": idx})
return results
def _perform_deep_only_analysis(self, vulnerabilities, args, report):
# skip initial scan, analyze all code
return self._run_deep_analysis(vulnerabilities, self.code_base, args, report)
def _perform_deep_analysis(self, suspicious_data, args, report, main_pbar):
# pass only suspicious files
code_items = {
fpath: self.code_base[fpath]
for fpath in suspicious_data["files_by_vuln"].keys()
}
return self._run_deep_analysis(
vulnerabilities=suspicious_data["vulnerabilities"],
code_items=code_items,
args=args,
report=report
)
```
**Benefits:**
1. Eliminates the nearly‐identical nested loops in both flows.
2. Centralizes prompt‐building + LLM calls in `_analyze_chunks`.
3. Keeps progress bars and reports unified.
</issue_to_address>
> `def _perform_deep_only_analysis(self, vulnerabilities, args, report):`
issue (code-quality): Low code quality found in `SecurityAnalyzer._perform_deep_only_analysis` - 19% (low-code-quality)

Explanation: The quality score for this function is below the quality threshold of 25%. This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this? It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into their own functions. This is the most important thing you can do - ideally a function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts sits together within the function rather than being scattered.
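The guard-clause advice can be illustrated with a generic before/after pair. This is not OASIS code, just a shape comparison between nested conditionals and early returns.

```python
# Generic illustration of the guard-clause refactor; not OASIS code.
def chunk_label_nested(chunk):
    # Nested style: every condition adds one level of indentation.
    if chunk is not None:
        if chunk.strip():
            if len(chunk) > 10:
                return "large"
            else:
                return "small"
        else:
            return "blank"
    return "missing"

def chunk_label_guarded(chunk):
    # Guard-clause style: dispose of edge cases early, keep the main path flat.
    if chunk is None:
        return "missing"
    if not chunk.strip():
        return "blank"
    return "large" if len(chunk) > 10 else "small"

print(chunk_label_guarded("x" * 20))  # large
```

Both functions behave identically; the guarded version simply keeps the happy path at the top indentation level, which is what drives the cognitive-complexity score down.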
> `return [self.format_model_display(model) for model in model_names]`
> `- def _filter_lightweight_models(self, models: List[str]) -> List[str]:`
> `+ def _filter_lightweight_models(self, models: List[str], disable_progress: bool = False) -> List[str]:`
issue (code-quality): Low code quality found in `OllamaManager._filter_lightweight_models` - 13% (low-code-quality)

Explanation: The quality score for this function is below the quality threshold of 25%. This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this? It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into their own functions. This is the most important thing you can do - ideally a function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts sits together within the function rather than being scattered.
Hi, thanks for your PR 👍

Thank you! I

Could you come to the Discord server? We will talk about your modification 😉
Major architectural enhancement that makes vulnerability definitions modular and extensible:

BREAKING CHANGES:
- Vulnerability definitions moved from hardcoded config to JSON files
- New vulnerability/ directory structure with 25 vulnerability types
- Dynamic loading system with backward compatibility fallback

NEW FEATURES:
- 9 additional vulnerability types: lfi, rfi, redirect, cmdi, debug, deser, upload, cors, jwt
- Extensible system allowing custom vulnerability definitions
- JSON-based vulnerability schema for easy maintenance
- Automatic detection and loading of custom vulnerability files

ENHANCEMENTS:
- Users can now add custom vulnerabilities by creating JSON files in vulnerability/
- Vulnerability definitions separated from application logic for better maintainability
- Support for organization-specific and domain-specific vulnerability patterns
- Enhanced help system displaying all available vulnerability types

FILES ADDED:
- vulnerability/: New directory with 25 vulnerability definition files
- vulnerability/*.json: Individual JSON files for each vulnerability type

FILES MODIFIED:
- oasis/config.py: Replaced hardcoded VULNERABILITY_MAPPING with dynamic loading system

USAGE:
- Existing: oasis -i ./code -v sqli,xss,rce (unchanged)
- New types: oasis -i ./app -v jwt,cors,upload,debug
- Custom: oasis -i ./code -v customvuln (if vulnerability/customvuln.json exists)
- Mixed: oasis -i ./api -v sqli,jwt,upload,cors

This change enables OASIS to be customized for specific security requirements while maintaining full backward compatibility.
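The dynamic loading this commit describes could be sketched as follows. The JSON schema, fallback shape, and function name are assumptions for illustration, not the actual `oasis/config.py` implementation.

```python
import json
from pathlib import Path

# Sketch of loading vulnerability definitions from a vulnerability/ directory;
# the schema and the FALLBACK table are illustrative assumptions.
FALLBACK = {"sqli": {"name": "SQL Injection", "patterns": []}}

def load_vulnerabilities(directory: str = "vulnerability") -> dict:
    vuln_dir = Path(directory)
    if not vuln_dir.is_dir():
        # Backward-compatible fallback to built-in definitions.
        return dict(FALLBACK)
    mapping = {}
    for path in sorted(vuln_dir.glob("*.json")):
        # Key each definition by its file stem, e.g. jwt.json -> "jwt".
        with path.open() as f:
            mapping[path.stem] = json.load(f)
    return mapping or dict(FALLBACK)
```

Under this layout, `oasis -i ./code -v customvuln` would pick up a definition as soon as `vulnerability/customvuln.json` exists, while a missing or empty directory falls back to the built-in mapping.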
…AdaptiveAnalysis classes

Resolves AttributeError where BatchAdaptiveAnalysis object was missing 'args' attribute needed for _should_disable_progress() function calls.

CHANGES:
- AdaptiveAnalysisPipeline.__init__: Added self.args = analyzer.args
- BatchAdaptiveAnalysis.__init__: Added self.args = pipeline.analyzer.args

This ensures proper args propagation through the class hierarchy: SecurityAnalyzer.args -> AdaptiveAnalysisPipeline.args -> BatchAdaptiveAnalysis.args

Fixes error: "'BatchAdaptiveAnalysis' object has no attribute 'args'"
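The propagation fix in this commit can be sketched with class bodies trimmed to the relevant assignments. These are illustrative stand-ins, not the full OASIS classes.

```python
# Reduced sketch of the args propagation fix; only the attributes relevant
# to _should_disable_progress are shown.
class SecurityAnalyzer:
    def __init__(self, args):
        self.args = args

class AdaptiveAnalysisPipeline:
    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.args = analyzer.args  # fix: propagate args down one level

class BatchAdaptiveAnalysis:
    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.args = pipeline.analyzer.args  # fix: propagate args to the leaf class

class Args:
    no_progress = True

batch = BatchAdaptiveAnalysis(AdaptiveAnalysisPipeline(SecurityAnalyzer(Args())))
print(batch.args.no_progress)  # True
```

Before the fix, `batch.args` raised `AttributeError` because the middle and leaf classes never copied the attribute; the fix keeps every level pointing at the same `args` object.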
Run:

```shell
mkdir tmp; cd tmp; echo 'import os,sys;os.system("ls "+sys.argv[1])'>test.py; ls -la; oasis -i ./ -m gpt-oss:20b -sm gpt-oss:20b -v rce -t 0.0 -at deep --no-progress
```

Output:

(screenshot of the deep scan output)

With `--no-progress` I can get cleaner screenshots and with `-at deep` I can perform deeper scans.

Summary by Sourcery
Add optional disabling of progress bars via `--no-progress` and implement deep-only analysis mode to bypass initial lightweight scanning and perform direct deep analysis of all code.

New Features:
- `--no-progress` CLI flag to disable all progress bars
- Deep-only analysis mode (`--analyze_type deep`) that skips initial scanning and analyzes all code chunks directly

Enhancements:
- Centralize progress bar control in `_should_disable_progress` and propagate to all tqdm usages and OllamaManager methods