Junaid - Fault Detection Agent

A sophisticated multi-agent system for detecting faults and defects in components using computer vision, robotic manipulation, and LangGraph orchestration.

Features

  • LangGraph Multi-Agent Architecture: Coordinated system with main coordinator, computer vision, and robotic arm control nodes
  • Vision Language Model Integration: Uses OpenRouter API with Gemma 3 VLM for fault detection
  • Configuration System: Centralized JSON config for cameras, robot, API keys, and all settings
  • Real-time Web Interface: Live camera feeds, communication transcript, and digital twin visualization
  • Voice & Text Communication: Support for both text and voice input (STT integration ready)
  • Digital Twin Visualization: URDF-based robotic arm state visualization
  • WebSocket Real-time Updates: Bidirectional communication for instant updates

Architecture

Agent Structure

Junaid Agent (LangGraph)
├── Main Coordinator Node
│   ├── Handles user communication
│   ├── Routes to appropriate sub-nodes
│   └── Manages conversation flow
├── Computer Vision Node
│   ├── Receives camera feed images
│   ├── Calls VLM API for analysis
│   └── Returns fault detection results
└── Robotic Arm Node
    ├── Executes manipulation commands
    ├── Controls gripper and joints
    └── Reports arm status
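The coordinator's conditional routing can be sketched in plain Python. The keyword rules below are illustrative assumptions, not the actual routing logic in agent/junaid_agent.py:

```python
# Illustrative sketch of the coordinator's routing decision.
# The keyword lists are assumptions; the real logic lives in
# agent/junaid_agent.py and may differ.

VISION_KEYWORDS = ("inspect", "check", "defect", "fault")
ARM_KEYWORDS = ("pick", "place", "rotate", "grip", "move")

def route_message(text: str) -> str:
    """Decide which sub-node should handle a user message."""
    lowered = text.lower()
    if any(word in lowered for word in VISION_KEYWORDS):
        return "computer_vision"
    if any(word in lowered for word in ARM_KEYWORDS):
        return "robotic_arm"
    return "coordinator"  # plain conversation stays with the coordinator

print(route_message("Please inspect this component"))  # computer_vision
print(route_message("Pick up the component"))          # robotic_arm
```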

System Components

  1. Backend (FastAPI + WebSocket)

    • RESTful API for message processing
    • WebSocket for real-time bidirectional communication
    • URDF file upload and management
    • Arm state tracking and broadcasting
  2. Frontend (HTML/CSS/JavaScript)

    • 3 camera feed displays (gripper, front, side)
    • Real-time communication transcript
    • Text and voice input interface
    • Digital twin canvas with arm visualization
    • URDF upload capability
  3. Agent (LangGraph)

    • State management across nodes
    • Conditional routing logic
    • OpenRouter VLM integration
    • Extensible robotic arm control interface
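The state shared across nodes might be shaped like this. The field names are assumptions inferred from the features above, not the actual schema in agent/junaid_agent.py:

```python
from typing import Optional, TypedDict

# Hypothetical shape of the state passed between LangGraph nodes.
# Field names are assumptions based on this README.
class JunaidState(TypedDict):
    user_message: str               # latest user input
    camera_feed: Optional[str]      # "gripper", "front", or "side"
    image: Optional[str]            # base64-encoded camera frame
    vision_analysis: Optional[str]  # VLM fault-detection result
    arm_status: Optional[dict]      # last reported arm state
    response: str                   # text returned to the user

state: JunaidState = {
    "user_message": "Check the front camera for any defects",
    "camera_feed": "front",
    "image": None,
    "vision_analysis": None,
    "arm_status": None,
    "response": "",
}
```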

Installation

Prerequisites

  • Python 3.12+
  • uv (for virtual environment management)
  • Modern web browser with WebSocket support

Setup

  1. Create and activate virtual environment:
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  2. Install dependencies:
uv pip install -r requirements.txt
  3. Configure the system:

Edit config.json to set your camera devices, robot serial port, and other settings:

# Validate your configuration
python validate_config.py

# Or edit the config file directly
nano config.json

See CONFIG.md for detailed configuration options.
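A minimal config.json might look like the following. The exact keys are defined in CONFIG.md; the values below are placeholders and the structure is an assumption:

```json
{
  "cameras": {
    "gripper": 0,
    "front": 1,
    "side": 2
  },
  "robot": {
    "serial_port": "/dev/ttyUSB0",
    "baud_rate": 115200
  },
  "api": {
    "openrouter_api_key": "YOUR_API_KEY"
  }
}
```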

Usage

Quick Start

# Start everything with one command
./start.sh

This starts both the backend and frontend servers automatically.

Starting the Backend Server (Manual)

source .venv/bin/activate
cd backend
python server.py

The server will start on http://localhost:8000

Starting the Frontend

Option 1 - Using Python HTTP server:

cd frontend
python -m http.server 3000

Option 2 - Using Node.js http-server:

cd frontend
npx http-server -p 3000

Then open http://localhost:3000 in your browser.

Interacting with Junaid

  1. Text Communication: Type messages in the text box and press Send or Enter
  2. Voice Communication: Click the Voice button to start/stop recording (STT integration required)
  3. Vision Analysis: Use commands like "inspect this component" or "check for defects"
  4. Arm Control: Use commands like "pick up the component" or "rotate 90 degrees"
  5. URDF Upload: Click "Choose File" under Digital Twin to upload your robot's URDF file

Example Commands

  • "Hello Junaid, please inspect this component"
  • "Check the front camera for any defects"
  • "Pick up the component"
  • "Rotate the part 45 degrees"
  • "Place the component down"

API Endpoints

REST API

  • GET / - API information
  • GET /api/conversation - Get conversation history
  • GET /api/arm-state - Get current arm state
  • POST /api/arm-state - Update arm state with servo positions
  • POST /api/upload-urdf - Upload URDF file
  • POST /api/message - Send a message to Junaid

WebSocket

  • ws://localhost:8000/ws - WebSocket endpoint for real-time communication

WebSocket Message Types

From Client:

{
  "type": "message",
  "data": {
    "text": "inspect component",
    "mode": "text",
    "camera_feed": "front",
    "image": "base64_encoded_image"
  }
}

From Server:

{
  "type": "message",
  "data": {
    "role": "assistant",
    "content": "Analysis complete...",
    "timestamp": "2026-01-16T...",
    "vision_analysis": "No defects found",
    "arm_status": {...}
  }
}

Integrating with Your Robot

1. Camera Feeds

Replace the simulated camera feeds in frontend/app.js with actual camera streams:

// In initializeCameras():
// Use WebRTC, MJPEG stream, or image polling
const video = document.createElement('video');
video.srcObject = stream; // Your camera stream

2. Robotic Arm Control

Implement the execute_arm_action() method in agent/junaid_agent.py:

async def execute_arm_action(self, action: str) -> dict:
    # Replace with your robot SDK
    # Example for ROS:
    # await self.ros_client.send_goal(action)
    
    # Example for serial control:
    # await self.serial_port.write(command)
    
    return {"status": "success", "action": action}

3. Sending Arm State Updates

From your robot control system, send POST requests to update arm state:

import requests

requests.post('http://localhost:8000/api/arm-state', json={
    'joint_positions': {
        'joint_1': 0.5,
        'joint_2': 1.2,
        'joint_3': -0.3,
        # ... more joints
    },
    'gripper_state': 'open'
})
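Since a malformed update would otherwise propagate straight to the digital twin, it can help to sanity-check the payload before POSTing. This validator is an illustrative assumption, not part of backend/server.py:

```python
def validate_arm_state(payload: dict) -> list[str]:
    """Return a list of problems with an arm-state payload (empty if OK).

    Field names follow the example above; the checks themselves are
    an assumption, not part of backend/server.py.
    """
    errors = []
    joints = payload.get("joint_positions")
    if not isinstance(joints, dict) or not joints:
        errors.append("joint_positions must be a non-empty dict")
    else:
        for name, value in joints.items():
            if not isinstance(value, (int, float)):
                errors.append(f"{name} must be a number")
    if payload.get("gripper_state") not in ("open", "closed"):
        errors.append("gripper_state must be 'open' or 'closed'")
    return errors

assert validate_arm_state({
    "joint_positions": {"joint_1": 0.5, "joint_2": 1.2},
    "gripper_state": "open",
}) == []
```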

4. Speech-to-Text Integration

Add STT in frontend/app.js:

async processVoiceInput(audioBlob) {
    // Example with OpenAI Whisper
    const formData = new FormData();
    formData.append('audio', audioBlob);
    
    const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer YOUR_API_KEY'
        },
        body: formData
    });
    
    const { text } = await response.json();
    // Send transcribed text to Junaid
}

Project Structure

junaid/
├── agent/
│   └── junaid_agent.py          # LangGraph agent implementation
├── backend/
│   └── server.py                # FastAPI server with WebSocket
├── frontend/
│   ├── index.html               # Web interface
│   └── app.js                   # Frontend application logic
├── data/
│   └── urdf/                    # Uploaded URDF files
├── requirements.txt             # Python dependencies
└── README.md                    # This file

Technologies Used

  • LangGraph: Multi-agent orchestration framework
  • FastAPI: Modern Python web framework
  • WebSocket: Real-time bidirectional communication
  • OpenRouter: VLM API gateway (Gemma 3)
  • HTML5 Canvas: Camera feeds and digital twin rendering

Future Enhancements

  • Full 3D URDF rendering with Three.js + urdf-loader
  • Speech-to-Text integration (Whisper, Google STT, Azure Speech)
  • Text-to-Speech for voice responses
  • Multiple VLM model support
  • Recording and replay of inspection sessions
  • Defect database and tracking
  • Multi-robot coordination
  • Advanced path planning for component inspection
  • Integration with quality control systems

Troubleshooting

WebSocket connection fails:

  • Ensure backend server is running on port 8000
  • Check firewall settings
  • Verify the WebSocket URL in app.js

Camera feeds not showing:

  • Camera feeds are simulated by default
  • Implement actual camera streaming for production use

Vision analysis not working:

  • Verify OpenRouter API key is set
  • Check network connectivity
  • Ensure images are being captured correctly

Import errors:

  • Make sure virtual environment is activated
  • Reinstall dependencies: uv pip install -r requirements.txt

License

MIT License - feel free to use and modify for your projects.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Contact

For questions or support, please open an issue in the repository.
