A sophisticated multi-agent system for detecting faults and defects in components using computer vision, robotic manipulation, and LangGraph orchestration.
- LangGraph Multi-Agent Architecture: Coordinated system with main coordinator, computer vision, and robotic arm control nodes
- Vision Language Model Integration: Uses OpenRouter API with Gemma 3 VLM for fault detection
- Configuration System: Centralized JSON config for cameras, robot, API keys, and all settings
- Real-time Web Interface: Live camera feeds, communication transcript, and digital twin visualization
- Voice & Text Communication: Support for both text and voice input (STT integration ready)
- Digital Twin Visualization: URDF-based robotic arm state visualization
- WebSocket Real-time Updates: Bidirectional communication for instant updates
```
Junaid Agent (LangGraph)
├── Main Coordinator Node
│   ├── Handles user communication
│   ├── Routes to appropriate sub-nodes
│   └── Manages conversation flow
├── Computer Vision Node
│   ├── Receives camera feed images
│   ├── Calls VLM API for analysis
│   └── Returns fault detection results
└── Robotic Arm Node
    ├── Executes manipulation commands
    ├── Controls gripper and joints
    └── Reports arm status
```
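The coordinator's routing decision can be sketched without the framework. This is a minimal, framework-free illustration — in the real system the decision is a LangGraph conditional edge, and the keyword lists below are assumptions, not the actual implementation:

```python
# Framework-free sketch of the coordinator's routing logic.
# The keyword lists are illustrative assumptions; the real agent
# uses LangGraph conditional edges in agent/junaid_agent.py.
def route(message: str) -> str:
    """Decide which sub-node should handle a user message."""
    text = message.lower()
    if any(k in text for k in ("inspect", "defect", "check")):
        return "computer_vision"
    if any(k in text for k in ("pick", "place", "rotate", "gripper")):
        return "robotic_arm"
    return "coordinator"  # handle conversation directly

print(route("Please inspect this component"))  # computer_vision
print(route("Pick up the component"))          # robotic_arm
```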
Backend (FastAPI + WebSocket)
- RESTful API for message processing
- WebSocket for real-time bidirectional communication
- URDF file upload and management
- Arm state tracking and broadcasting
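Arm-state tracking amounts to keeping the latest joint and gripper state and serializing it for broadcast to connected WebSocket clients. A minimal sketch (the class is illustrative; field names follow the `/api/arm-state` examples in this README):

```python
import json
from dataclasses import dataclass, field

# Illustrative sketch of server-side arm-state tracking; the real
# server keeps this inside the FastAPI app and pushes each update
# to every connected WebSocket client.
@dataclass
class ArmState:
    joint_positions: dict = field(default_factory=dict)
    gripper_state: str = "open"

    def update(self, joint_positions=None, gripper_state=None):
        if joint_positions:
            self.joint_positions.update(joint_positions)
        if gripper_state:
            self.gripper_state = gripper_state

    def to_broadcast(self) -> str:
        """JSON payload sent to every connected WebSocket client."""
        return json.dumps({"type": "arm_state", "data": {
            "joint_positions": self.joint_positions,
            "gripper_state": self.gripper_state,
        }})

state = ArmState()
state.update({"joint_1": 0.5}, gripper_state="closed")
```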
Frontend (HTML/CSS/JavaScript)
- 3 camera feed displays (gripper, front, side)
- Real-time communication transcript
- Text and voice input interface
- Digital twin canvas with arm visualization
- URDF upload capability
Agent (LangGraph)
- State management across nodes
- Conditional routing logic
- OpenRouter VLM integration
- Extensible robotic arm control interface
- Python 3.12+
- uv (for virtual environment management)
- Modern web browser with WebSocket support
- Create and activate the virtual environment:

```shell
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

- Install dependencies:

```shell
uv pip install -r requirements.txt
```

- Configure the system:

Edit config.json to set your camera devices, robot serial port, and other settings:

```shell
# Validate your configuration
python validate_config.py

# Or edit the config file directly
nano config.json
```

See CONFIG.md for detailed configuration options.
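As an orientation only, a trimmed `config.json` might look like the fragment below. The field names and values here are illustrative assumptions — consult CONFIG.md for the authoritative schema:

```json
{
  "cameras": {
    "gripper": "/dev/video0",
    "front": "/dev/video1",
    "side": "/dev/video2"
  },
  "robot": {
    "serial_port": "/dev/ttyUSB0",
    "baud_rate": 115200
  },
  "openrouter": {
    "api_key": "YOUR_OPENROUTER_API_KEY",
    "model": "google/gemma-3-27b-it"
  }
}
```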
```shell
# Start everything with one command
./start.sh
```

This will start both the backend and frontend servers automatically.
```shell
source .venv/bin/activate
cd backend
python server.py
```

The server will start on http://localhost:8000.
Option 1 - Using Python's built-in HTTP server:

```shell
cd frontend
python -m http.server 3000
```

Option 2 - Using Node.js http-server:

```shell
cd frontend
npx http-server -p 3000
```

Then open http://localhost:3000 in your browser.
- Text Communication: Type messages in the text box and press Send or Enter
- Voice Communication: Click the Voice button to start/stop recording (STT integration required)
- Vision Analysis: Use commands like "inspect this component" or "check for defects"
- Arm Control: Use commands like "pick up the component" or "rotate 90 degrees"
- URDF Upload: Click "Choose File" under Digital Twin to upload your robot's URDF file
- "Hello Junaid, please inspect this component"
- "Check the front camera for any defects"
- "Pick up the component"
- "Rotate the part 45 degrees"
- "Place the component down"
- GET / - API information
- GET /api/conversation - Get conversation history
- GET /api/arm-state - Get current arm state
- POST /api/arm-state - Update arm state with servo positions
- POST /api/upload-urdf - Upload a URDF file
- POST /api/message - Send a message to Junaid

ws://localhost:8000/ws - WebSocket endpoint for real-time communication
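For scripted access, commands can be sent to the POST /api/message endpoint with only the standard library. A sketch — the body fields mirror the WebSocket message schema, but confirm them against backend/server.py before relying on this:

```python
import json
import urllib.request

# Illustrative: build a request for the POST /api/message endpoint.
# Field names are taken from the WebSocket client schema; verify them
# against backend/server.py.
payload = json.dumps({"text": "inspect this component", "mode": "text"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/api/message",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Only attempt the call when the backend is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```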
From Client:

```json
{
  "type": "message",
  "data": {
    "text": "inspect component",
    "mode": "text",
    "camera_feed": "front",
    "image": "base64_encoded_image"
  }
}
```

From Server:

```json
{
  "type": "message",
  "data": {
    "role": "assistant",
    "content": "Analysis complete...",
    "timestamp": "2026-01-16T...",
    "vision_analysis": "No defects found",
    "arm_status": {...}
  }
}
```

Replace the simulated camera feeds in frontend/app.js with actual camera streams:
```javascript
// In initializeCameras():
// Use WebRTC (getUserMedia), an MJPEG stream, or image polling.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const video = document.createElement('video');
video.srcObject = stream;  // your camera stream
video.play();
```

Implement the execute_arm_action() method in agent/junaid_agent.py:
```python
async def execute_arm_action(self, action: str) -> dict:
    # Replace with your robot SDK.
    # Example for ROS:
    #     await self.ros_client.send_goal(action)
    # Example for serial control:
    #     await self.serial_port.write(command)
    return {"status": "success", "action": action}
```

From your robot control system, send POST requests to update the arm state:
```python
import requests

requests.post('http://localhost:8000/api/arm-state', json={
    'joint_positions': {
        'joint_1': 0.5,
        'joint_2': 1.2,
        'joint_3': -0.3,
        # ... more joints
    },
    'gripper_state': 'open'
})
```

Add STT in frontend/app.js:
```javascript
async processVoiceInput(audioBlob) {
    // Example with OpenAI Whisper
    const formData = new FormData();
    formData.append('file', audioBlob);     // the API expects the field name "file"
    formData.append('model', 'whisper-1');  // required model parameter
    const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer YOUR_API_KEY'
        },
        body: formData
    });
    const { text } = await response.json();
    // Send transcribed text to Junaid
}
```

```
junaid/
├── agent/
│   └── junaid_agent.py      # LangGraph agent implementation
├── backend/
│   └── server.py            # FastAPI server with WebSocket
├── frontend/
│   ├── index.html           # Web interface
│   └── app.js               # Frontend application logic
├── data/
│   └── urdf/                # Uploaded URDF files
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
- LangGraph: Multi-agent orchestration framework
- FastAPI: Modern Python web framework
- WebSocket: Real-time bidirectional communication
- OpenRouter: VLM API gateway (Gemma 3)
- HTML5 Canvas: Camera feeds and digital twin rendering
- Full 3D URDF rendering with Three.js + urdf-loader
- Speech-to-Text integration (Whisper, Google STT, Azure Speech)
- Text-to-Speech for voice responses
- Multiple VLM model support
- Recording and replay of inspection sessions
- Defect database and tracking
- Multi-robot coordination
- Advanced path planning for component inspection
- Integration with quality control systems
WebSocket connection fails:
- Ensure backend server is running on port 8000
- Check firewall settings
- Verify the WebSocket URL in app.js
Camera feeds not showing:
- Camera feeds are simulated by default
- Implement actual camera streaming for production use
Vision analysis not working:
- Verify OpenRouter API key is set
- Check network connectivity
- Ensure images are being captured correctly
Import errors:
- Make sure virtual environment is activated
- Reinstall dependencies: uv pip install -r requirements.txt
MIT License - feel free to use and modify for your projects.
Contributions are welcome! Please feel free to submit issues or pull requests.
For questions or support, please open an issue in the repository.