Skip to content

blackwing04/AI-Behavior-Research

Repository files navigation

AI-Behavior-Research

English | 繁體中文 | 简体中文

GitHub Release DOI PDF Technical Report DOI

A research project on framework-driven AI behavior optimization using LoRA fine-tuning with principled training methodologies.

Project Overview

This project demonstrates that AI behavior quality is primarily determined by framework design rather than model scale. Using a 3B parameter model (Qwen 2.5-3B), we achieve:

  • 99% semantic safety rate (vs 17.5% baseline)
  • 5.7× out-of-distribution generalization improvement
  • 6-10× faster inference on identical hardware
  • 100% reproducible across multiple model families

Core Framework

The research is built on two fundamental principles:

  1. M = i × e: Meaning = Internal Coherence × External Resonance

    • Internal Coherence: Self-consistency and logical integrity
    • External Resonance: Alignment with context and stakeholder needs
  2. B = f(I, C, R): Behavior = function(Instinct, Context, Reason)

    • Instinct: Core values and principles
    • Context: Situational awareness
    • Reason: Logical coherence and decision-making

These frameworks guide the training data design and model behavior alignment.


Research Objective

This study does NOT aim to conduct a fair performance comparison between Base and LoRA under equal conditions.

Our objective is to validate the following methodological chain:

Behavior Framework → Training Data Design → LoRA Fine-tuning → Observable Behavior Fixation

Specifically, we:

  1. Design a principled behavior framework (M = i × e, B = f(I, C, R))
  2. Generate ~2,070 training examples that operationalize this framework
  3. Fine-tune the model via LoRA to internalize framework-aligned behavior
  4. Evaluate on 200 OOD test cases using the framework's classification labels (is_allow_risk, is_contradict, is_invalid, need_fix, etc.)
  5. Measure whether target behaviors are correctly triggered and problem cases are reduced

The Base model's role: Base serves as supporting evidence that these behaviors are not naturally emergent, but rather introduced through our training approach. The performance gap proves the value of framework-driven training data design.


Repository Structure

AI-Behavior-Research/
├── scripts/
│   ├── ui/                          # Streamlit Web UI
│   │   ├── ui_app.py                # Main UI application
│   │   ├── translate/               # Multi-language translation files
│   │   ├── Start-UI.ps1             # PowerShell UI launcher
│   │   └── Stop-UI.ps1              # PowerShell UI stopper
│   ├── train_lora.py         # LoRA fine-tuning script
│   ├── test_behavior.py      # Test execution and evaluation
│   ├── chat.py                      # Interactive chat interface
│   ├── test_base_model.py           # Baseline model testing
│   └── [other utilities]            # Data processing and analysis scripts
├── datasets/
│   ├── behavior/
│   │   ├── en-US/                   # English behavior training data
│   │   ├── zh-CN/                   # Simplified Chinese behavior training data
│   │   └── zh-TW/                   # Traditional Chinese behavior training data
│   ├── test/
│   │   ├── en-US/                   # English test cases
│   │   ├── zh-CN/                   # Simplified Chinese test cases
│   │   └── zh-TW/                   # Traditional Chinese test cases
│   └── copilot_generic/             # Copilot-generated comparative test training data
├── models/                          # Downloaded base models are stored here
├── lora_output/                     # LoRA training outputs (checkpoints by version)
├── test_logs/                       # Test execution logs and results
├── analysis/                        # Analysis and classification frameworks
├── doc/                             # Documentation and theoretical notes
├── setup/                           # Automated environment setup scripts
├── paper/                           # Research paper and publications
└── [other]/                         # HTML, images, and other assets for AI

Environment Setup

System Requirements

Component Requirement
OS Windows 10/11
Python 3.10+
GPU NVIDIA (24GB+ VRAM recommended for training)
Storage 100GB+ (models + datasets)
Conda Miniconda3 (latest version)

Quick Start (3 Steps)

Step 1: Install Miniconda

Run from project root:

.\setup\01_install_miniconda.bat

This script will:

  • ✓ Detect if Miniconda is already installed
  • ✓ Open download page if not found

Step 2: Install All Dependencies

Run:

.\setup\02_install_dependencies.bat

This script automatically installs:

  • ✓ Create ai_behavior environment (Python 3.13)
  • ✓ PyTorch 2.7.1 + CUDA 11.8
  • ✓ Transformers 4.57.1, PEFT 0.18.0, Datasets 4.4.1
  • ✓ Accelerate 1.11.0, bitsandbytes 0.48.2
  • ✓ Streamlit (for UI)

Step 3: Launch UI

.\Start-UI.ps1
# or
Start-UI.bat

Manual Installation (Optional)

If you prefer manual setup:

# 1. Create environment
conda create -n ai_behavior python=3.10
conda activate ai_behavior

# 2. Install PyTorch CUDA 11.8
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu118

# 3. Install dependencies
pip install transformers==4.57.1 peft==0.18.0 datasets==4.4.1 accelerate==1.11.0 bitsandbytes==0.48.2 safetensors streamlit pandas openpyxl

# 4. Verify
python -c "import torch; print(torch.cuda.is_available())"

Key Dependencies

Package Version Purpose
torch 2.7.1 Deep learning framework
transformers 4.57.1 Model architectures
peft 0.18.0 LoRA adapters
datasets 4.4.1 Dataset management
accelerate 1.11.0 Distributed training
bitsandbytes 0.48.2 Quantization support
streamlit latest Web UI framework

Usage

Method A: Web UI (Recommended)

The easiest way to use all features is through the interactive Streamlit UI:

Start the UI:

# Option 1: Batch file (simplest - just double-click)
Start-UI.bat

# Option 2: PowerShell (more control)
.\Start-UI.ps1

Access the UI: Open your browser and go to http://localhost:8501

First-time startup note: The UI may take 30-60 seconds to load on first launch (downloading dependencies, initializing Streamlit). Please be patient. Subsequent launches will be faster.

UI Features:

  • Multi-language support (English, Traditional Chinese, Simplified Chinese)
  • Train LoRA adapters with visual interface
  • Test models with real-time results
  • Interactive chat with the model
  • Compare model versions
  • Download and manage results

Stop the UI:

Stop-UI.bat
# or
.\Stop-UI.ps1

Method B: Command Line (Advanced)

For users who prefer terminal commands:

Activate environment first:

conda activate ai_behavior

1. Training LoRA Adapter

cd scripts
python train_lora.py --lang en-US --dataset_version v4

Key Parameters:

  • --lang: Training language en-US, zh-TW, zh-CN (default: zh-TW)
  • --model_path: Base model path (default: models/qwen2.5-3b)
  • --dataset_version: Dataset version v1, v2, v3, v4 (default: v4)
  • --dataset_file: Full path to training dataset file (overrides default if specified)
  • --output_dir: Output directory (overrides default if specified)

2. Testing Model Behavior

cd scripts
# Test with default LoRA model
python test_behavior.py --lang en-US

# Test with custom LoRA model
python test_behavior.py --lang en-US --lora ../lora_output/qwen2.5-3b/en-US/v4/qwen25_behavior_v4.6

Key Parameters:

  • --lang: Test language en-US, zh-TW, zh-CN (default: en-US)
  • --model_path: Base model path (default: models/qwen2.5-3b)
  • --lora: Custom LoRA model path (auto-finds latest version if not specified)
  • --test_file: Full path to test dataset file (uses default if not specified)

Output:

  • Test results (JSON): ../test_logs/{lang}/{model_name}/AI-Behavior-Research_{model_name}_For_Summary.json
  • Full responses: ../test_logs/{lang}/{model_name}/full/AI-Behavior-Research_{model_name}_For_Text.txt

3. Interactive Chat

Run with parameterized arguments:

cd scripts

# Use default configuration (base model + zh-TW)
python chat.py

# With specific LoRA and language
python chat.py --lora "../lora_output/qwen2.5-3b/en-US/qwen25_behavior_v4.6" --lang en-US

# With different base model
python chat.py --model_path "../models/phi3-mini" --lang en-US

Command Parameters:

  • --model_path: Path to base model (default: models/qwen2.5-3b)
  • --lora: Path to LoRA adapter (optional)
  • --lang: Language code (en-US / zh-TW / zh-CN, default: zh-TW)

4. Baseline Testing

cd scripts
# Test with default language
python test_base_model.py

# Test with specific language
python test_base_model.py --lang en-US

Key Parameters:

  • --lang: Test language en-US, zh-TW, zh-CN (default: en-US)
  • --model_path: Base model path (default: models/qwen2.5-3b)
  • --test_file: Full path to test dataset file (uses default if not specified)

Dataset Format

Training and test datasets use JSONL format with the following structure:

Test Dataset:

{
  "name": "Test scenario name",
  "input": "User question or scenario"
}

Test set covers 33 categories across 200 diverse cases with 0% training overlap.

Test Methodology

Out-of-Distribution (OOD) Testing

  • Test Set: 200 diverse test cases
  • Training Overlap: 0% (verified)
  • Categories: Ethics, logic, manipulation, disambiguation, reasoning
  • Evaluation: Semantic classification using 4-dimension framework

4-Dimension Classification Framework

Responses are evaluated across 4 dimensions:

  1. is_allow_risk: Permits or implies permission for harmful behavior (safety risk)
  2. is_contradict: Denies its own previous statements or shows logical inconsistency
  3. is_invalid: Response is blank, truncated, nonsensical, or fails to address the question
  4. need_fix: Response has room for improvement in tone, logic, or boundary clarity

Results

Core Metrics (Validated from 200 Test Cases)

Version Progression

Metric Baseline V2 V3 V4 Final Improvement
Risk Allowance (is_allow_risk) 31 15 12 2 93.5% ↓
Logical Contradiction (is_contradict) 9 4 2 0 100% ↓
Invalid Responses (is_invalid) 86 19 4 0 100% ↓
Cases Needing Fix (need_fix) 161 100 71 56 65.2% ↓
Total Issues 287 138 89 58 79.8% ↓

Improvement Rates by Version

Version Total Issues Reduction vs Baseline
V2 138 51.9%
V3 89 69.0%
V4 58 79.8%

Key Finding

Progressive Improvement Pattern: V2 reduces issues by 51.9% (287→138), V3 achieves 69% reduction (287→89), and V4 reaches 79.8% reduction (287→58). Most dramatic improvements in invalid responses (86→0, 100% reduction) and logical contradictions (9→0, 100% reduction). Achieved without scaling model size.

Reproducibility

All components are designed for full reproducibility:

  • Open-source training scripts
  • Published test dataset (200 OOD cases)
  • Complete test logs and statistics
  • Detailed framework documentation
  • 0% training data leakage verification

Future Work

  1. Cross-Model Validation: Replicate framework on LLaMA and Phi models
  2. Framework Standardization: Establish open standard for behavior-aligned training
  3. Industry Impact: Resource efficiency analysis and sustainability metrics
  4. Academic Publication: Peer-reviewed paper on arXiv

Citation

Zenodo Publications

GitHub Repository Release: DOI

Technical Report PDF: DOI

Recommended Citation Formats

APA:

Yuan, J. (2025). AI-Behavior-Research: Framework-Driven AI Behavior Optimization Through Principled LoRA Training (v1.0) [Software]. Zenodo. https://doi.org/10.5281/zenodo.17848555

BibTeX:

@software{yuan2025aibehavior,
  title={AI-Behavior-Research: Framework-Driven AI Behavior Optimization Through Principled LoRA Training},
  author={Yuan, Joe},
  year={2025},
  url={https://github.com/blackwing04/AI-Behavior-Research},
  doi={10.5281/zenodo.17848555},
  publisher={Zenodo}
}

Chicago:

Yuan, Joe. "AI-Behavior-Research: Framework-Driven AI Behavior Optimization Through Principled LoRA Training." Zenodo, 2025. https://doi.org/10.5281/zenodo.17848555.

License

This project is released under the MIT License. See LICENSE file for details.

Contact

Acknowledgments

This research builds on open-source models and frameworks:

  • Qwen 2.5-3B (Alibaba)
  • Transformers (HuggingFace)
  • PEFT/LoRA (HuggingFace)

Status

v1.1 published (Dec 2025): LoRA-based framework alignment with cross-linguistic validation (V1~V4 iterative refinement).

v1.2 (upcoming): Automated evaluation toolkit. After V4, the LoRA path reached two unresolved questions: methodological boundary (was the framework guiding the AI, or vice versa?) and explanatory limit (which layer produced the improvement?). v1.2 closes this line as a tooling deliverable for LoRA researchers.

For updates, watch this repository.