Turning hard AI problems into reliable products, from green-field architecture to teams shipping at scale across PK · USA · UK · UAE.
I'm an engineering leader with 10+ years of shipping full-stack systems, the last several spent leading AI transformation for ambitious products. I move comfortably from a whiteboard system diagram to a Rust async worker to a multi-tenant Kubernetes deploy, and I most enjoy the messy middle where research-grade models have to survive production traffic.
Today I lead engineering at Clerint Media, building a real-time broadcast intelligence platform that analyzes dozens of live TV streams concurrently, turning face recognition, OCR, and speech-to-text output into a searchable, alertable evidence layer.
| AI products from zero | Take a model that works in a notebook and turn it into a pipeline that survives 30 concurrent live streams, restarts, GPU loss, and 3 AM pages. |
| Solution architecture audits | Walk into an existing system and find the 20% of the design driving 80% of incidents and cloud bill. Output: a phased plan, not a 60-slide deck. |
| Fractional CTO for early-stage | Pick the stack, hire the first engineers, ship the first version, and stay long enough to make sure it doesn't collapse under its own weight. |
| AI transformation for established orgs | Wire LLMs, RAG, vector search, and computer vision into workflows that move real metrics, not demo metrics. |
A multi-tenant SaaS analyzing 30+ live HLS/RTSP TV channels in parallel, on a single-node Kubernetes cluster.
- Rust ML worker (`tonic` gRPC + `tokio`) supervises an FFmpeg frame + audio pipeline per channel, fanning frames out via `broadcast` channels to OCR / face / speech workers.
- Face recognition with SCRFD + ArcFace ONNX models; embeddings stored in pgvector for sub-second identity search across hours of footage.
- OCR via PaddleOCR HTTP service; speech-to-text via Deepgram WebSocket streams.
- NestJS orchestrator drains gRPC events → Prisma writes + Socket.io fan-out + BullMQ stories/alerts.
- React 19 + Vite + Tailwind 4 SPA with a live DVR timeline and custom clip range slider.
- Plain-YAML Kubernetes: two parallel deployments (main + MOIB) on bare metal.
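To make the per-channel fan-out concrete, here is a minimal sketch of one frame source feeding several workers. It is not the production worker: it swaps tokio's `broadcast` channel for one `std::sync::mpsc` channel per worker so it runs with no dependencies, and the `Frame` type, the `fan_out` function, and the worker names are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// A decoded frame, reduced to the fields this sketch needs (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
pub struct Frame {
    pub channel_id: u32,
    pub seq: u64,
}

/// Sends every frame to each named worker and returns, per worker,
/// the sequence numbers that worker received.
pub fn fan_out(frames: Vec<Frame>, workers: &[&str]) -> Vec<(String, Vec<u64>)> {
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for name in workers {
        // One channel per worker stands in for one broadcast subscription.
        let (tx, rx) = mpsc::channel::<Frame>();
        senders.push(tx);
        let name = name.to_string();
        handles.push(thread::spawn(move || {
            // A real worker would run OCR / face / speech inference here;
            // this one only records what it saw.
            let seen: Vec<u64> = rx.iter().map(|frame| frame.seq).collect();
            (name, seen)
        }));
    }

    for frame in frames {
        for tx in &senders {
            // Each worker gets its own clone. A send error means that worker
            // has exited, which this sketch simply ignores.
            let _ = tx.send(frame.clone());
        }
    }
    // Dropping the senders closes the channels and ends each worker loop.
    drop(senders);

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Calling `fan_out(frames, &["ocr", "face", "speech"])` hands every worker the full frame sequence in order. Unlike this sketch, a bounded broadcast channel lets slow receivers fall behind and miss messages, so a real pipeline also has to decide how a lagging worker catches up.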
github.com/aqibbangash/urdu-stt-bench
CPU-only benchmark harness for offline Urdu speech-to-text. faster-whisper / CTranslate2 + Streamlit UI + Docker. A decision-support tool for picking the right STT model for low-resource languages without burning a GPU budget.
- Boring tech for the load-bearing parts. Plain YAML over Helm, Postgres over five exotic stores, monolith-until-it-hurts.
- Pipelines, not point solutions. A model that works in isolation is a science project; a supervised, restartable, observable pipeline is a product.
- Architecture follows team shape. I pick stacks for the people who'll maintain them on Wednesday at 4 PM, not for a conference talk.
- AI is plumbing, not magic. The interesting work is in latency budgets, fallbacks, eval harnesses, and what happens when the model is wrong.
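As a small illustration of the "latency budgets and fallbacks" point, here is a sketch of a helper that gives a model call a fixed time budget and returns a fallback value if the call is late or fails. The `with_budget` name and signature are assumptions made for this example, not code from any of the projects above, and it uses plain threads rather than an async runtime to stay dependency-free.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `primary` on a separate thread and waits at most `budget` for its
/// result. Returns `fallback` if the budget expires or `primary` panics.
pub fn with_budget<T, F>(budget: Duration, primary: F, fallback: T) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller has already given up, the receiver is gone,
        // the send fails, and the late result is dropped.
        let _ = tx.send(primary());
    });

    match rx.recv_timeout(budget) {
        Ok(value) => value,
        // Timeout, or the worker thread died before sending anything.
        Err(_) => fallback,
    }
}
```

For example, `with_budget(Duration::from_millis(200), || run_model(), cached_answer)` (with `run_model` and `cached_answer` as placeholders) returns the cached answer whenever inference overruns. The late thread still runs to completion in this sketch, so a real system would also want cancellation.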
If you're shipping something at the intersection of real-time systems, computer vision, NLP, or AI-into-existing-workflows, and you want a partner who'll architect it, code the hard parts, and stay until it ships, I'd like to hear about it.



