
Optional progress bar and deep scanning#27

Open
farnaboldi wants to merge 4 commits into psyray:release/v0.5.0 from farnaboldi:master

Conversation

@farnaboldi farnaboldi commented Aug 19, 2025

Run:
mkdir tmp; cd tmp; echo 'import os,sys;os.system("ls "+sys.argv[1])'>test.py; ls -la; oasis -i ./ -m gpt-oss:20b -sm gpt-oss:20b -v rce -t 0.0 -at deep --no-progress

Output:
(Screenshot: 2025-08-19 at 8:17:46 PM)

With the --no-progress flag I can get cleaner screenshots, and with -at deep I can perform deeper scans.

Summary by Sourcery

Add optional disabling of progress bars via --no-progress and implement deep-only analysis mode to bypass initial lightweight scanning and perform direct deep analysis of all code.

New Features:

  • Introduce --no-progress CLI flag to disable all progress bars
  • Add a deep analysis mode (--analyze_type deep) that skips initial scanning and analyzes all code chunks directly

Enhancements:

  • Centralize progress disabling with _should_disable_progress and propagate to all tqdm usages and OllamaManager methods
  • Refine vulnerability analysis prompts with updated ASCII flowchart instructions and HTTP method descriptions


sourcery-ai bot commented Aug 19, 2025

Reviewer's Guide

This PR introduces an optional --no-progress flag to disable all progress bars via a new _should_disable_progress helper and implements a deep-only scanning mode (-at deep) that skips the lightweight initial scan and directly performs detailed LLM analysis on all code chunks, with seamless integration into the CLI and model management pipeline.

Sequence diagram for deep-only scanning workflow

sequenceDiagram
    actor User
    participant OasisCLI
    participant Analyzer
    participant OllamaManager
    participant LLM
    participant Report
    User->>OasisCLI: Run with -at deep
    OasisCLI->>Analyzer: Start deep-only analysis
    Analyzer->>OllamaManager: Ensure deep model available
    OllamaManager->>LLM: Load model (if needed)
    Analyzer->>LLM: Analyze all code chunks for vulnerabilities
    LLM-->>Analyzer: Return analysis results
    Analyzer->>Report: Generate vulnerability report
    Report-->>User: Output deep analysis results

Class diagram for progress bar disabling and deep scan integration

classDiagram
    class OasisCLI {
        +args
        +run_analysis_mode()
        +_should_disable_progress(args)
    }
    class Analyzer {
        +args
        +process_analysis_with_model()
        +_perform_deep_only_analysis()
        +_should_disable_progress(args)
    }
    class OllamaManager {
        +ensure_model_available(model, disable_progress)
        +get_available_models(disable_progress)
        +select_models(disable_progress)
        +format_model_display_batch(disable_progress)
        +_preload_model_info(disable_progress)
        +_filter_lightweight_models(disable_progress)
    }
    class EmbeddingManager {
        +args
        +index_code_files(no_progress)
    }
    class Tools {
        +_should_disable_progress(args, silent)
    }
    OasisCLI --> Analyzer
    OasisCLI --> OllamaManager
    Analyzer --> OllamaManager
    Analyzer --> Tools
    EmbeddingManager --> OllamaManager
    EmbeddingManager --> Tools
    OllamaManager --> Tools

File-Level Changes

Add global progress-disabling utility and CLI flag
  • Introduce _should_disable_progress function in tools.py
  • Add --no-progress argument in CLI (oasis.py) to control bar visibility
oasis/tools.py
oasis/oasis.py
Refactor progress bar usage across analysis module
  • Replace disable=args.silent with disable=_should_disable_progress(args) in tqdm calls
  • Store args on analyzer instances for helper access
oasis/analyze.py
Implement deep-only analysis mode
  • Add analyze_type arg check and branch in process_analysis_with_model
  • Create _perform_deep_only_analysis to skip initial scan and run direct LLM analysis
  • Update CLI logging to reflect deep mode
oasis/analyze.py
oasis/oasis.py
Propagate disable_progress into OllamaManager
  • Extend get_available_models, select_models, ensure_model_available, format_model_display_batch, _preload_model_info, _filter_lightweight_models to accept disable_progress
  • Thread args.no_progress through model listing, downloading, and info loading
oasis/ollama_manager.py
Respect no-progress flag in embedding workflows
  • Pass no_progress into embedding and function extraction calls
  • Apply _should_disable_progress to tqdm in index_code_files and parallel tasks
oasis/embedding.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The new deep-only analysis method largely duplicates file and chunk iteration logic from the standard/deep pipelines—consider extracting common loops/prompts into shared helper functions to reduce code duplication.
  • You reference args.analyze_type for selecting deep mode but haven’t added a corresponding CLI flag (e.g. -at/--analyze-type) in the argument parser, so the deep scan option won’t actually be settable.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new deep-only analysis method largely duplicates file and chunk iteration logic from the standard/deep pipelines—consider extracting common loops/prompts into shared helper functions to reduce code duplication.
- You reference `args.analyze_type` for selecting deep mode but haven’t added a corresponding CLI flag (e.g. `-at/--analyze-type`) in the argument parser, so the deep scan option won’t actually be settable.
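If the `-at/--analyze-type` flag is indeed missing, wiring it up would be a small argparse change. A hedged sketch — the real parser in `oasis/oasis.py` has many more options and may use different defaults:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Minimal stand-in for the OASIS argument parser; only the
    # options relevant to this PR are shown.
    parser = argparse.ArgumentParser(prog="oasis")
    parser.add_argument("-i", "--input", dest="input_path")
    parser.add_argument("--no-progress", action="store_true",
                        help="Disable all progress bars")
    # The flag the review says is missing: -at/--analyze-type
    parser.add_argument("-at", "--analyze-type", dest="analyze_type",
                        choices=["standard", "deep"], default="standard",
                        help="Skip the lightweight scan and analyze "
                             "all code chunks when set to 'deep'")
    return parser
```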

## Individual Comments

### Comment 1
<location> `oasis/ollama_manager.py:210` </location>
<code_context>
             logger.debug(f"Parameters: {params}")

-        if 'num_ctx' in params:
+        if params and 'num_ctx' in params:
             context_length = int(params.split()[1])
             chunk_size = int(context_length * 0.9)
</code_context>

<issue_to_address>
Parsing 'num_ctx' from parameters may be fragile if format changes.

Consider validating the type of 'params' and using a more reliable parsing method to prevent errors if the format changes.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        if params and 'num_ctx' in params:
            context_length = int(params.split()[1])
            chunk_size = int(context_length * 0.9)
            logger.info(f"Model {model} context length: {context_length}")
        logger.warning(f"Could not detect context length for {model}, using default size: {MAX_CHUNK_SIZE}")
        return MAX_CHUNK_SIZE
=======
        import re

        if isinstance(params, str):
            match = re.search(r'num_ctx\s*[:=]?\s*(\d+)', params)
            if match:
                context_length = int(match.group(1))
                chunk_size = int(context_length * 0.9)
                logger.info(f"Model {model} context length: {context_length}")
                return chunk_size
            else:
                logger.warning(f"Could not parse context length from parameters for {model}, using default size: {MAX_CHUNK_SIZE}")
        else:
            logger.warning(f"Parameters for {model} are not a string, using default size: {MAX_CHUNK_SIZE}")
        return MAX_CHUNK_SIZE
>>>>>>> REPLACE

</suggested_fix>
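The regex in the suggested fix can be exercised in isolation; the sample parameter strings below are illustrative, not actual Ollama output:

```python
import re

def parse_num_ctx(params):
    """Extract the num_ctx value from a model-parameter string.

    Returns the integer context length, or None if params is not a
    string or contains no recognizable num_ctx entry. This tolerates
    'num_ctx 8192', 'num_ctx: 8192', and 'num_ctx=8192' formats.
    """
    if not isinstance(params, str):
        return None
    match = re.search(r'num_ctx\s*[:=]?\s*(\d+)', params)
    return int(match.group(1)) if match else None
```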

### Comment 2
<location> `oasis/embedding.py:131` </location>
<code_context>
                     analyze_by_function=self.analyze_by_function,
-                    api_url=self.ollama_manager.api_url
+                    api_url=self.ollama_manager.api_url,
+                    no_progress=getattr(self.args, 'no_progress', False)
                 )
                 for file_path in files 
</code_context>

<issue_to_address>
Passing 'no_progress' directly may miss other progress-related flags.

Other flags like 'silent' can also impact progress bar visibility. Use '_should_disable_progress' to handle all relevant cases consistently.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
                    api_url=self.ollama_manager.api_url,
                    no_progress=getattr(self.args, 'no_progress', False)
=======
                    api_url=self.ollama_manager.api_url,
                    no_progress=_should_disable_progress(self.args)
>>>>>>> REPLACE

</suggested_fix>

### Comment 3
<location> `oasis/analyze.py:344` </location>
<code_context>

         return all_results

+    def _perform_deep_only_analysis(self, vulnerabilities, args, report):
+        """
+        Perform deep analysis on all files without initial scanning phase.
</code_context>

<issue_to_address>
Consider refactoring the deep analysis logic into a shared helper to eliminate duplicated code and centralize LLM calls.

### Refactor duplicated deep‐analysis loops into a shared helper

You can collapse both the existing `_perform_deep_analysis` and the new `_perform_deep_only_analysis` into a single driver that takes a map of files→chunks, then invoke that from each workflow.  For example:

```python
class SecurityAnalyzer:
    # … existing methods …

    def _run_deep_analysis(
        self,
        vulnerabilities,
        code_items: Dict[str, Dict],
        args,
        report
    ):
        all_results = {}
        logger.info("\n===== DEEP ANALYSIS MODE =====")
        with tqdm(
            total=len(vulnerabilities),
            desc="Overall vulnerability progress",
            position=0,
            leave=True,
            disable=_should_disable_progress(args)
        ) as vuln_pbar:
            for vuln in vulnerabilities:
                name, desc, patterns, impact, mitigation = self._get_vulnerability_details(vuln)
                detailed_results = []

                for file_path, data in code_items.items():
                    chunks = data.get("chunks", [data["content"]])
                    file_results = self._analyze_chunks(
                        file_path, chunks,
                        name, desc, patterns, impact, mitigation
                    )
                    if file_results:
                        detailed_results.append({
                            "file_path": file_path,
                            "similarity_score": 1.0,
                            "analysis": "\n\n---\n\n".join(
                              r["analysis"] for r in file_results
                            )
                        })

                all_results[name] = detailed_results
                if detailed_results:
                    report.generate_vulnerability_report(vuln, detailed_results, self.llm_model)
                else:
                    logger.info(f"No files analyzed for {name}")

                vuln_pbar.update(1)

        return all_results

    def _analyze_chunks(
        self,
        file_path: str,
        chunks: List[str],
        name: str,
        desc: str,
        patterns: List[str],
        impact: str,
        mitigation: str
    ) -> List[Dict]:
        results = []
        for idx, chunk in enumerate(chunks):
            if not chunk.strip():
                continue
            prompt = self._build_analysis_prompt(
                name, desc, patterns, impact, mitigation,
                chunk, idx + 1, len(chunks)
            )
            try:
                resp = self.client.generate(model=self.llm_model, prompt=prompt)
                analysis = getattr(resp, "response", "")
                results.append({"analysis": analysis, "chunk_index": idx})
            except Exception as e:
                logger.error(f"Error on {file_path}[{idx}]: {e}")
                results.append({"analysis": f"ERROR: {e}", "chunk_index": idx})
        return results

    def _perform_deep_only_analysis(self, vulnerabilities, args, report):
        # skip initial scan, analyze all code
        return self._run_deep_analysis(vulnerabilities, self.code_base, args, report)

    def _perform_deep_analysis(self, suspicious_data, args, report, main_pbar):
        # pass only suspicious files
        code_items = {
            fpath: self.code_base[fpath]
            for fpath in suspicious_data["files_by_vuln"].keys()
        }
        return self._run_deep_analysis(
            vulnerabilities=suspicious_data["vulnerabilities"],
            code_items=code_items,
            args=args,
            report=report
        )
```

**Benefits:**
1. Eliminates the nearly‐identical nested loops in both flows.
2. Centralizes prompt‐building + LLM calls in `_analyze_chunks`.
3. Keeps progress bars and reports unified.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.





return all_results

def _perform_deep_only_analysis(self, vulnerabilities, args, report):

issue (code-quality): Low code quality found in SecurityAnalyzer._perform_deep_only_analysis - 19% (low-code-quality)


Explanation: The quality score for this function is below the quality threshold of 25%. This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.

return [self.format_model_display(model) for model in model_names]

def _filter_lightweight_models(self, models: List[str]) -> List[str]:
def _filter_lightweight_models(self, models: List[str], disable_progress: bool = False) -> List[str]:

issue (code-quality): Low code quality found in OllamaManager._filter_lightweight_models - 13% (low-code-quality)


Explanation: The quality score for this function is below the quality threshold of 25%. This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.


psyray commented Aug 20, 2025

Hi, thanks for your PR 👍
I will test and review it today

@farnaboldi
Author

Thank you! I

@psyray psyray changed the base branch from master to release/v0.5.0 August 20, 2025 12:34
@psyray psyray changed the base branch from release/v0.5.0 to master August 20, 2025 12:36
@psyray psyray changed the base branch from master to release/v0.5.0 August 20, 2025 13:00
@psyray
Copy link
Owner

psyray commented Aug 20, 2025

Could you come to the Discord server? We'll discuss your modifications there 😉
https://discord.gg/dW3sFwTtN3

Major architectural enhancement that makes vulnerability definitions modular and extensible:

BREAKING CHANGES:
- Vulnerability definitions moved from hardcoded config to JSON files
- New vulnerability/ directory structure with 25 vulnerability types
- Dynamic loading system with backward compatibility fallback

NEW FEATURES:
- 9 additional vulnerability types: lfi, rfi, redirect, cmdi, debug, deser, upload, cors, jwt
- Extensible system allowing custom vulnerability definitions
- JSON-based vulnerability schema for easy maintenance
- Automatic detection and loading of custom vulnerability files

ENHANCEMENTS:
- Users can now add custom vulnerabilities by creating JSON files in vulnerability/
- Vulnerability definitions separated from application logic for better maintainability
- Support for organization-specific and domain-specific vulnerability patterns
- Enhanced help system displaying all available vulnerability types

FILES ADDED:
- vulnerability/: New directory with 25 vulnerability definition files
- vulnerability/*.json: Individual JSON files for each vulnerability type

FILES MODIFIED:
- oasis/config.py: Replaced hardcoded VULNERABILITY_MAPPING with dynamic loading system

USAGE:
- Existing: oasis -i ./code -v sqli,xss,rce (unchanged)
- New types: oasis -i ./app -v jwt,cors,upload,debug
- Custom: oasis -i ./code -v customvuln (if vulnerability/customvuln.json exists)
- Mixed: oasis -i ./api -v sqli,jwt,upload,cors

This change enables OASIS to be customized for specific security requirements while maintaining full backward compatibility.
…AdaptiveAnalysis classes

Resolves AttributeError where BatchAdaptiveAnalysis object was missing 'args' attribute
needed for _should_disable_progress() function calls.

CHANGES:
- AdaptiveAnalysisPipeline.__init__: Added self.args = analyzer.args
- BatchAdaptiveAnalysis.__init__: Added self.args = pipeline.analyzer.args

This ensures proper args propagation through the class hierarchy:
SecurityAnalyzer.args -> AdaptiveAnalysisPipeline.args -> BatchAdaptiveAnalysis.args

Fixes error: "'BatchAdaptiveAnalysis' object has no attribute 'args'"
