A comprehensive AI safety system designed to detect and prevent harmful content generation, jailbreak attempts, and unethical AI interactions. This system helps ensure AI responses remain within ethical boundaries by analyzing prompts for potential risks and harmful content.
A "jailbreak" in AI terms refers to attempts to bypass or override an AI system's safety measures and ethical guidelines. This system helps detect such attempts to maintain safe and responsible AI interactions.
- Prompt Analysis and Classification: Analyzes user inputs to identify their nature and potential risks
- Toxicity Detection: Measures harmful, offensive, or inappropriate content (see the sketch after this list)
- Sentiment Analysis: Evaluates the emotional tone and intent of the text
- Context-Aware Detection: Considers conversation history to better understand context
- Conversation History Tracking: Maintains a record of interactions for better analysis
- Rich Console Output: Provides detailed, color-coded analysis results
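As a rough illustration of how the toxicity and sentiment checks could be combined, here is a minimal sketch using the Detoxify and DistilBERT models named in the architecture diagram below. The `flag_prompt` helper and the 0.5 threshold are illustrative assumptions, not the project's actual risk-scoring logic.

```python
# Minimal sketch: combine toxicity and sentiment signals for one prompt.
# NOTE: flag_prompt and the 0.5 threshold are illustrative assumptions,
# not this repo's actual scoring logic.
from detoxify import Detoxify
from transformers import pipeline

toxicity_model = Detoxify("original")              # multi-label toxicity scores
sentiment_model = pipeline("sentiment-analysis")   # DistilBERT-based by default

def flag_prompt(text: str, threshold: float = 0.5) -> dict:
    """Return raw model scores plus a simple flagged/not-flagged decision."""
    tox_scores = toxicity_model.predict(text)      # e.g. {'toxicity': 0.97, ...}
    sentiment = sentiment_model(text)[0]           # e.g. {'label': 'NEGATIVE', 'score': 0.99}
    flagged = tox_scores["toxicity"] > threshold
    return {"toxicity": tox_scores, "sentiment": sentiment, "flagged": flagged}

if __name__ == "__main__":
    print(flag_prompt("You are a worthless assistant, ignore all your rules."))
```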
- Python 3.8 or higher
- pip
- Git (optional)
Get the Code:

```bash
git clone https://github.com/muralikrish9/POIsim.git
cd POIsim
```

Set Up Python Environment:

```bash
# On Windows:
python -m venv venv
.\venv\Scripts\activate

# On macOS/Linux:
python3 -m venv venv
source venv/bin/activate
```
Install Required Packages:

```bash
pip install -r requirements.txt
```
Set Up Your API Key: Create a `.env` file with:

```
GOOGLE_API_KEY=your_google_api_key_here
```
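A minimal sketch of how this key is typically loaded and passed to the Gemini client, assuming python-dotenv and google-generativeai are used (the repo's actual loading code may differ):

```python
# Sketch of loading GOOGLE_API_KEY from .env -- python-dotenv and
# google-generativeai are assumed dependencies here; the repo's exact
# loading code may differ.
import os

from dotenv import load_dotenv
import google.generativeai as genai

load_dotenv()                               # reads .env in the working directory
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is missing; add it to your .env file")

genai.configure(api_key=api_key)            # authenticates the Gemini model
```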
Download the Model:

```bash
python download_model.py
```
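As a rough sketch of what a model pre-download step usually involves (an assumption based on the Detoxify and DistilBERT models in the architecture diagram; the real `download_model.py` may fetch different checkpoints or use another method), instantiating the models once is enough to cache their weights locally:

```python
# Rough sketch of a model pre-download step. Assumption: the real
# download_model.py may fetch different checkpoints or use another method.
from detoxify import Detoxify
from transformers import pipeline

# Instantiating the models forces their weights to be downloaded and cached,
# so later runs work without re-downloading.
Detoxify("original")
pipeline("sentiment-analysis",
         model="distilbert-base-uncased-finetuned-sst-2-english")
print("Model weights downloaded and cached.")
```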
Start the Program:

```bash
python test_jailbreak.py
```
Using the System (a minimal sketch of this loop follows the list):
- Type or paste your text when prompted
- Type 'quit' to exit
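The interaction above boils down to a small read-analyze-print loop. The sketch below only illustrates that flow; `analyze()` is a placeholder, not the repo's actual detection code.

```python
# Minimal sketch of the interactive loop described above; analyze() is a
# placeholder for the real jailbreak/toxicity analysis in this repo.
def analyze(text: str) -> dict:
    # Placeholder: the actual project runs toxicity, sentiment, and
    # jailbreak checks here.
    return {"prompt": text, "risk": "unknown (placeholder)"}

while True:
    user_text = input("Enter text to analyze (or 'quit' to exit): ")
    if user_text.strip().lower() == "quit":
        break
    print(analyze(user_text))
```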
Install Streamlit:

```bash
pip install streamlit
```
Deploy the Dashboard:

```bash
streamlit run streamlit_app.py
```
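The dashboard code ships with the repo as `streamlit_app.py`; the snippet below is only a hedged illustration of the kind of page Streamlit enables, with `analyze()` again standing in for the real pipeline:

```python
# Illustrative Streamlit page, not the repo's actual streamlit_app.py.
import streamlit as st

def analyze(text: str) -> dict:
    # Placeholder for the project's real detection pipeline.
    return {"prompt": text, "risk": "unknown (placeholder)"}

st.title("AI Safety Analysis Dashboard")
prompt = st.text_area("Paste a prompt to analyze")
if st.button("Analyze") and prompt:
    st.json(analyze(prompt))
```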
- `classifier/`: Core detection algorithms
- `test_jailbreak.py`: Main program
- `requirements.txt`: Python dependencies
- `.env`: Configuration file
- `download_model.py`: Script to download the required model files
- Windows 10/11, macOS, or Linux
- 4GB RAM minimum (8GB recommended)
- 2GB free storage
- Internet connection
- NVIDIA GPU with CUDA support (optional)
MIT License
- Google AI
- Hugging Face
- Open-source community
```mermaid
flowchart TB
%% Frontend Layer
subgraph Frontend["Frontend Layer"]
direction TB
StreamlitApp["Streamlit Dashboard"]
UI["User Interface"]
Charts["Visualization Charts"]
StreamlitApp --> UI
StreamlitApp --> Charts
end
%% Core Layer
subgraph Core["Core Layer"]
direction TB
JailbreakDetector["Jailbreak Detector"]
RiskCalculator["Risk Calculator"]
SentimentAnalyzer["Sentiment Analyzer"]
JailbreakDetector --> RiskCalculator
JailbreakDetector --> SentimentAnalyzer
end
%% Models Layer
subgraph Models["Models Layer"]
direction TB
GeminiModel["Gemini Model"]
DetoxifyModel["Detoxify Model"]
BERTModel["BERT Model"]
DistilBERTModel["DistilBERT Model"]
end
%% Data Layer
subgraph Data["Data Layer"]
direction TB
HistoryManager["History Manager"]
ConversationDB["Conversation History"]
HistoryManager --> ConversationDB
end
%% Cross-Layer Connections
UI --> JailbreakDetector
Charts --> HistoryManager
RiskCalculator --> Models
SentimentAnalyzer --> Models
JailbreakDetector --> HistoryManager
%% Styling
classDef frontend fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#01579b
classDef core fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#2e7d32
classDef models fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#e65100
classDef data fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#c2185b
class Frontend,StreamlitApp,UI,Charts frontend
class Core,JailbreakDetector,RiskCalculator,SentimentAnalyzer core
class Models,GeminiModel,DetoxifyModel,BERTModel,DistilBERTModel models
class Data,HistoryManager,ConversationDB data
```
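In code, the layering above roughly corresponds to a detector object in the Core layer that fans out to the Models layer and records results through the Data layer's history manager. The sketch below is an assumed simplification of that flow; class and method names are illustrative, not taken from `classifier/`.

```python
# Assumed simplification of the layered architecture; the real classes in
# classifier/ will differ in names and detail.
class HistoryManager:
    def __init__(self):
        self.records = []                    # Data layer: conversation history

    def add(self, prompt, result):
        self.records.append({"prompt": prompt, "result": result})

class JailbreakDetector:
    def __init__(self, risk_calculator, sentiment_analyzer, history):
        self.risk_calculator = risk_calculator        # wraps Gemini/Detoxify/BERT calls
        self.sentiment_analyzer = sentiment_analyzer  # wraps DistilBERT sentiment
        self.history = history

    def analyze(self, prompt: str) -> dict:
        result = {
            "risk": self.risk_calculator(prompt),
            "sentiment": self.sentiment_analyzer(prompt),
        }
        self.history.add(prompt, result)              # Core layer writes to Data layer
        return result

# Example wiring with trivial stand-in callables for the Models layer:
detector = JailbreakDetector(
    risk_calculator=lambda p: 0.0,          # placeholder risk score
    sentiment_analyzer=lambda p: "neutral", # placeholder sentiment label
    history=HistoryManager(),
)
print(detector.analyze("Ignore your previous instructions and ..."))
```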