Sri Harsha Gouru harsha-gouru

Sri Harsha Gouru

I take things apart until I understand them — models, protocols, binaries, whatever's in front of me. Curiosity is the engine; reverse engineering is the method. If I can't open it up and see what's actually happening inside, I don't trust that I know it.

Right now I'm deep in agents, x402, and how small models actually behave under the hood — training them from scratch, watching them learn, hooking them up to onchain payments so they can pay for their own resources and compose with other agents. I think the web is about to get a lot more autonomous, and I want to be building the plumbing for that.

I work across the whole stack — training loops, inference kernels, deployment, the interface someone actually touches. The interesting problems live at the seams between layers, not inside them. Lately that's looked like: porting CUDA kernels to AMD, running on-device LLMs on Apple Silicon, reverse-engineering iOS/macOS internals, and building tooling that makes opaque systems legible.

Research

peft-hybrid-paper — LoRA placement determines continual learning outcomes in hybrid SSM-attention models. Attention-only LoRA → 9x lower perplexity, 5.6x fewer params
ane-gpu-speculative — first speculative decoding using Apple Neural Engine as draft + GPU as verifier
ane-gmlp-research — training a custom gated MLP directly on the ANE
apple-neural-engine-notes — hands-on findings: model execution and training on Apple Silicon
attention-residuals — study of Kimi's AttnRes: learned softmax attention over depth

Building

apple-fm-server — Apple's on-device Foundation Model as an OpenAI-compatible local API
localdictate — Right-Option-to-dictate menu bar app, 100% on-device
llm-inspector — see what an LLM is actually doing, token by token
ai-traffic-audit — what browser-based AI products send over the network
ai-privacy-monitor — track what AI platforms track about you, locally
x402-pay-per-joke — micropayment-powered API on Base, an agent paying per request
gpu-inference-playground — benchmark LLM inference on H100 / MI300X

Looking to collaborate with domain experts who need AI tooling for their work. If you're great at what you do — research, medicine, finance, hardware, anything — and the tool you need doesn't exist yet, reach out. You bring the problem and the taste, I'll bring the AI and systems side.
