A sophisticated multi-agent system for detecting faults and defects in components using computer vision, robotic manipulation, and LangGraph orchestration.
- LangGraph Multi-Agent Architecture: Coordinated system with main coordinator, computer vision, and robotic arm control nodes
- Vision Language Model Integration: Uses OpenRouter API with Gemma 3 VLM for fault detection
- Configuration System: Centralized JSON config for cameras, robot, API keys, and all settings
- Real-time Web Interface: Live camera feeds, communication transcript, and digital twin visualization
- Voice & Text Communication: Support for both text and voice input (STT integration ready)
- Digital Twin Visualization: URDF-based robotic arm state visualization
- WebSocket Real-time Updates: Bidirectional communication for instant updates
```
Junaid Agent (LangGraph)
├── Main Coordinator Node
│   ├── Handles user communication
│   ├── Routes to appropriate sub-nodes
│   └── Manages conversation flow
├── Computer Vision Node
│   ├── Receives camera feed images
│   ├── Calls VLM API for analysis
│   └── Returns fault detection results
└── Robotic Arm Node
    ├── Executes manipulation commands
    ├── Controls gripper and joints
    └── Reports arm status
```
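The coordinator's routing decision can be sketched without the framework. This is a minimal, framework-free illustration — in the real system the decision is a LangGraph conditional edge, and the keyword lists below are assumptions, not the actual implementation:

```python
# Framework-free sketch of the coordinator's routing logic.
# The keyword lists are illustrative assumptions; the real agent
# uses LangGraph conditional edges in agent/junaid_agent.py.
def route(message: str) -> str:
    """Decide which sub-node should handle a user message."""
    text = message.lower()
    if any(k in text for k in ("inspect", "defect", "check")):
        return "computer_vision"
    if any(k in text for k in ("pick", "place", "rotate", "gripper")):
        return "robotic_arm"
    return "coordinator"  # handle conversation directly

print(route("Please inspect this component"))  # computer_vision
print(route("Pick up the component"))          # robotic_arm
```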
Backend (FastAPI + WebSocket)
- RESTful API for message processing
- WebSocket for real-time bidirectional communication
- URDF file upload and management
- Arm state tracking and broadcasting
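Arm-state tracking amounts to keeping the latest joint and gripper state and serializing it for broadcast to connected WebSocket clients. A minimal sketch (the class is illustrative; field names follow the `/api/arm-state` examples in this README):

```python
import json
from dataclasses import dataclass, field

# Illustrative sketch of server-side arm-state tracking; the real
# server keeps this inside the FastAPI app and pushes each update
# to every connected WebSocket client.
@dataclass
class ArmState:
    joint_positions: dict = field(default_factory=dict)
    gripper_state: str = "open"

    def update(self, joint_positions=None, gripper_state=None):
        if joint_positions:
            self.joint_positions.update(joint_positions)
        if gripper_state:
            self.gripper_state = gripper_state

    def to_broadcast(self) -> str:
        """JSON payload sent to every connected WebSocket client."""
        return json.dumps({"type": "arm_state", "data": {
            "joint_positions": self.joint_positions,
            "gripper_state": self.gripper_state,
        }})

state = ArmState()
state.update({"joint_1": 0.5}, gripper_state="closed")
```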
Frontend (HTML/CSS/JavaScript)
- 3 camera feed displays (gripper, front, side)
- Real-time communication transcript
- Text and voice input interface
- Digital twin canvas with arm visualization
- URDF upload capability
Agent (LangGraph)
- State management across nodes
- Conditional routing logic
- OpenRouter VLM integration
- Extensible robotic arm control interface
- Python 3.12+
- uv (for virtual environment management)
- Modern web browser with WebSocket support
- Create and activate the virtual environment:

```shell
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

- Install dependencies:

```shell
uv pip install -r requirements.txt
```

- Configure the system:

Edit config.json to set your camera devices, robot serial port, and other settings:

```shell
# Validate your configuration
python validate_config.py

# Or edit the config file directly
nano config.json
```

See CONFIG.md for detailed configuration options.
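As an orientation only, a trimmed `config.json` might look like the fragment below. The field names and values here are illustrative assumptions — consult CONFIG.md for the authoritative schema:

```json
{
  "cameras": {
    "gripper": "/dev/video0",
    "front": "/dev/video1",
    "side": "/dev/video2"
  },
  "robot": {
    "serial_port": "/dev/ttyUSB0",
    "baud_rate": 115200
  },
  "openrouter": {
    "api_key": "YOUR_OPENROUTER_API_KEY",
    "model": "google/gemma-3-27b-it"
  }
}
```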
```shell
# Start everything with one command
./start.sh
```

This will start both the backend and frontend servers automatically.
```shell
source .venv/bin/activate
cd backend
python server.py
```

The server will start on http://localhost:8000.
Option 1 - Using Python's built-in HTTP server:

```shell
cd frontend
python -m http.server 3000
```

Option 2 - Using Node.js http-server:

```shell
cd frontend
npx http-server -p 3000
```

Then open http://localhost:3000 in your browser.
- Text Communication: Type messages in the text box and press Send or Enter
- Voice Communication: Click the Voice button to start/stop recording (STT integration required)
- Vision Analysis: Use commands like "inspect this component" or "check for defects"
- Arm Control: Use commands like "pick up the component" or "rotate 90 degrees"
- URDF Upload: Click "Choose File" under Digital Twin to upload your robot's URDF file
- "Hello Junaid, please inspect this component"
- "Check the front camera for any defects"
- "Pick up the component"
- "Rotate the part 45 degrees"
- "Place the component down"
- GET / - API information
- GET /api/conversation - Get conversation history
- GET /api/arm-state - Get current arm state
- POST /api/arm-state - Update arm state with servo positions
- POST /api/upload-urdf - Upload a URDF file
- POST /api/message - Send a message to Junaid

ws://localhost:8000/ws - WebSocket endpoint for real-time communication
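For scripted access, commands can be sent to the POST /api/message endpoint with only the standard library. A sketch — the body fields mirror the WebSocket message schema, but confirm them against backend/server.py before relying on this:

```python
import json
import urllib.request

# Illustrative: build a request for the POST /api/message endpoint.
# Field names are taken from the WebSocket client schema; verify them
# against backend/server.py.
payload = json.dumps({"text": "inspect this component", "mode": "text"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/api/message",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Only attempt the call when the backend is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```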
From Client:

```json
{
  "type": "message",
  "data": {
    "text": "inspect component",
    "mode": "text",
    "camera_feed": "front",
    "image": "base64_encoded_image"
  }
}
```

From Server:

```json
{
  "type": "message",
  "data": {
    "role": "assistant",
    "content": "Analysis complete...",
    "timestamp": "2026-01-16T...",
    "vision_analysis": "No defects found",
    "arm_status": {...}
  }
}
```

Replace the simulated camera feeds in frontend/app.js with actual camera streams:
```javascript
// In initializeCameras():
// Use WebRTC (getUserMedia), an MJPEG stream, or image polling.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const video = document.createElement('video');
video.srcObject = stream;  // your camera stream
video.play();
```

Implement the execute_arm_action() method in agent/junaid_agent.py:
```python
async def execute_arm_action(self, action: str) -> dict:
    # Replace with your robot SDK.
    # Example for ROS:
    #     await self.ros_client.send_goal(action)
    # Example for serial control:
    #     await self.serial_port.write(command)
    return {"status": "success", "action": action}
```

From your robot control system, send POST requests to update the arm state:
```python
import requests

requests.post('http://localhost:8000/api/arm-state', json={
    'joint_positions': {
        'joint_1': 0.5,
        'joint_2': 1.2,
        'joint_3': -0.3,
        # ... more joints
    },
    'gripper_state': 'open'
})
```

Add STT in frontend/app.js:
```javascript
async processVoiceInput(audioBlob) {
    // Example with OpenAI Whisper
    const formData = new FormData();
    formData.append('file', audioBlob);     // the API expects the field name "file"
    formData.append('model', 'whisper-1');  // required model parameter
    const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer YOUR_API_KEY'
        },
        body: formData
    });
    const { text } = await response.json();
    // Send transcribed text to Junaid
}
```

```
junaid/
├── agent/
│   └── junaid_agent.py      # LangGraph agent implementation
├── backend/
│   └── server.py            # FastAPI server with WebSocket
├── frontend/
│   ├── index.html           # Web interface
│   └── app.js               # Frontend application logic
├── data/
│   └── urdf/                # Uploaded URDF files
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
- LangGraph: Multi-agent orchestration framework
- FastAPI: Modern Python web framework
- WebSocket: Real-time bidirectional communication
- OpenRouter: VLM API gateway (Gemma 3)
- HTML5 Canvas: Camera feeds and digital twin rendering
- Full 3D URDF rendering with Three.js + urdf-loader
- Speech-to-Text integration (Whisper, Google STT, Azure Speech)
- Text-to-Speech for voice responses
- Multiple VLM model support
- Recording and replay of inspection sessions
- Defect database and tracking
- Multi-robot coordination
- Advanced path planning for component inspection
- Integration with quality control systems
WebSocket connection fails:
- Ensure backend server is running on port 8000
- Check firewall settings
- Verify the WebSocket URL in app.js
Camera feeds not showing:
- Camera feeds are simulated by default
- Implement actual camera streaming for production use
Vision analysis not working:
- Verify OpenRouter API key is set
- Check network connectivity
- Ensure images are being captured correctly
Import errors:
- Make sure virtual environment is activated
- Reinstall dependencies: uv pip install -r requirements.txt
MIT License - feel free to use and modify for your projects.
Contributions are welcome! Please feel free to submit issues or pull requests.
For questions or support, please open an issue in the repository.