B.E. Robotics and Artificial Intelligence - Thapar Institute of Engineering and Technology
B.S. Data Science and Applications - Indian Institute of Technology Madras
Pre-final year undergrad building at the intersection of robotics, computer vision, and autonomous systems. My work runs on real hardware and real infrastructure: multimodal ML pipelines, edge AI systems, agentic backends, and production-grade open-source tooling.
Open Source: Active contributor to JdeRobot/RoboticsAcademy with 16 merged PRs. Resolved an FP16 precision crash in the Object Detection pipeline, fixed deployment script bugs across run_academy.sh and develop_academy.sh, refactored the Hardware Abstraction Layer, and shipped 52 unit tests across 5 test classes.
Research at Thapar ELC (Summer 2025): Multimodal early detection of Parkinson's disease. A CNN on voice recordings and a DNN on MPU9250 tremor signals are combined via late fusion, raising accuracy from 88% (voice-only CNN) to 91% (fused).
Computer Vision: Toll fraud detection system built on HOG features and LinearSVC. Trained on 24K+ images; 97% accuracy on multi-axle vehicle classification.
Robotics Research (ongoing): Audio-Visual-Thermal fusion architecture for autonomous search-and-rescue (SAR) navigation in visually degraded environments, under Dr. Ankit Soni at Thapar, using Isaac Sim and ROS 2.
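
The late fusion used in the Parkinson's work can be sketched roughly as below. This is a minimal illustration, not the project's code: the function names, the equal weighting, and the example probabilities are all assumptions made for the example. The idea is that each modality's model is trained separately and only their output probabilities are combined.

```python
from typing import List, Sequence


def late_fusion(
    voice_probs: Sequence[float],
    tremor_probs: Sequence[float],
    voice_weight: float = 0.5,  # hypothetical weight, not taken from the project
) -> List[float]:
    """Weighted average of per-class probabilities from two independent models."""
    if len(voice_probs) != len(tremor_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must lie in [0, 1]")
    tremor_weight = 1.0 - voice_weight
    return [
        voice_weight * v + tremor_weight * t
        for v, t in zip(voice_probs, tremor_probs)
    ]


def predict(fused_probs: Sequence[float]) -> int:
    """Index of the class with the highest fused probability."""
    return max(range(len(fused_probs)), key=lambda i: fused_probs[i])


# Made-up probabilities over two classes for a single sample.
fused = late_fusion([0.40, 0.60], [0.70, 0.30])
label = predict(fused)
```

Here the two models disagree, and the more confident tremor model decides the fused label. In practice the weight would be tuned on a validation set.
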
| Project | Description | Stack |
|---|---|---|
| Archon | Production-deployable instruction-to-deployment backend. Hybrid RAG (Cohere dense + BM25 sparse, RRF fusion) retrieves context, Anthropic Tool Use generates schema-validated code, and a GitHub App deployer pushes live sites to GitHub Pages. FastAPI and Celery handle async execution; Redis Pub/Sub streams logs over WebSocket to a React and TypeScript dashboard; full observability via Prometheus, Grafana, and OpenTelemetry. | FastAPI - Celery - Redis - Cohere - Anthropic API - React - TypeScript - Vite - PostgreSQL - SQLAlchemy - Alembic - Prometheus - Grafana - OpenTelemetry - Docker |
| Parkinson's Early Detection | Multimodal early detection fusing MPU9250 IMU tremor signals with voice recordings. CNN on voice features (88% accuracy) and DNN on tremor data combined via late fusion to reach 91%. Custom ESP32 hardware pipeline from sensor to model inference. | TensorFlow - Keras - Librosa - Parselmouth - scikit-learn - SoundDevice |
| Axon Core | Production-deployable fully local tri-modal AI assistant. A BART-MNLI zero-shot router dispatches across three paths: knowledge retrieval via Qdrant and local Gemma, OS-level tool execution with user confirmation, and general conversation. Hybrid RAG with MiniLM and BM25, reranked by a cross-encoder; GBNF-constrained sampling for tool calling. | FastAPI - LangChain - Qdrant - Ollama - Next.js - Docker - SQLAlchemy |
| Helix | Production-deployable recursive autonomous web agent on the OODA loop. Playwright handles JS-heavy DOMs, Claude Tool Use synthesizes Python solutions just-in-time, RestrictedPython and SIGALRM sandbox execution, and HTTP submission loops until a terminal state. Durable jobs via ARQ on Redis; Prometheus, Loki, and Grafana cover observability. | FastAPI - Playwright - Claude API - ARQ - Redis - Prometheus - Loki - Grafana - Docker |
| TruthTag: Toll-Audit | Classical CV pipeline cross-verifying RFID FASTag claims against physical vehicle geometry at toll plazas. 3780-dimensional HOG vectors, LinearSVC trained on 24K+ images, 97% accuracy on multi-axle classification. Cross-modal centroid tracker, MOG2 virtual tripwire, and a Streamlit audit dashboard. | OpenCV - scikit-learn - HOG - LinearSVC - NumPy - Streamlit - Seaborn - Matplotlib |
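
The 3780-dimensional HOG vector in TruthTag is the length produced by the common HOG layout: a 64x128 detection window, 8x8-pixel cells, 2x2-cell blocks sliding one cell at a time, and 9 orientation bins. Those parameters are an assumption here, since the table only states the final dimension. A short sketch of the arithmetic (`hog_feature_length` is a helper written for this example):

```python
def hog_feature_length(
    window=(64, 128),   # (width, height) in pixels; assumed, not stated above
    cell=8,             # cell side in pixels
    block_cells=2,      # block side in cells
    stride_cells=1,     # block stride in cells
    bins=9,             # orientation bins per cell histogram
) -> int:
    """Length of a HOG descriptor for one detection window."""
    win_w, win_h = window
    cells_x = win_w // cell
    cells_y = win_h // cell
    # Number of block positions along each axis.
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    # Each block contributes one histogram per cell it covers.
    return blocks_x * blocks_y * block_cells * block_cells * bins


length = hog_feature_length()
```

With these defaults there are 7 x 15 block positions, each holding 4 cells of 9 bins, which gives 3780 features per window.
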

| Project | Description | Status |
|---|---|---|
| Canary Rover | Autonomous mine inspection rover. PPO locomotion trained in PyBullet (200K timesteps), real-time ROS 2 sensor stack for IMU, LiDAR, and BLDC encoders, SLAM via slam_toolbox and RTAB-Map, and full 3D simulation in NVIDIA Isaac Sim 5.1. Capstone project, team of 5. | Ongoing -- Capstone |
| MRI Reconstruction | Dual-branch physics-guided framework for accelerated MRI reconstruction. Learned gating network routes k-space data adaptively without anatomy labels at inference. Achieves +1.78% SSIM over single-branch baselines at 112 ms and 302 GFLOPs on an RTX 4060. | Paper authored |
| Audio-Visual-Thermal SAR | Multimodal fusion architecture for autonomous SAR in visually degraded environments. Thermal, acoustic, and visual modalities fused for robust SLAM and detection, under Dr. Ankit Soni at Thapar. | Review paper authored |
Currently working on: Physics-guided deep learning for medical imaging -- RL-based autonomous navigation -- Multimodal sensor fusion for SAR -- Beyond-transformer sequence architectures

