AI-Augmented Engineer · LLMOps · Data Engineering · Cloud
Brasília, BR — open to remote
I build systems that put LLMs into production — from streaming data pipelines to eval infrastructure that catches prompt regressions before deploy.
I use Claude Code as an architecture amplifier: I define the system boundaries, own the decisions, and drive the execution.
PromptBench Studio — LLM Eval Infrastructure
LLM apps regress when prompts change without control. PromptBench treats prompts as versioned software assets.
- `PromptVersion` is immutable: no UPDATE, every edit creates an auditable snapshot
- Measures quality, latency, variance, and real cost (tokens from the provider's `usage` response, not estimates)
- Regression verdict: dimension ∧ slice ∧ cost. A better global average doesn't earn promotion if any slice regresses
- 87% test coverage on the evaluation module; deterministic checks are pure functions
Stack: Next.js · FastAPI · PostgreSQL 16 · Redis · arq · Anthropic Claude
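The "dimension ∧ slice ∧ cost" verdict can be illustrated with a minimal sketch. This is not PromptBench's actual code: `SliceScore`, `regression_verdict`, `tolerance`, and `cost_budget_ratio` are hypothetical names chosen for illustration, and it assumes higher scores are better on every dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a score record cannot be mutated after creation
class SliceScore:
    slice_name: str   # e.g. a language or customer segment (hypothetical)
    dimension: str    # e.g. "accuracy" (hypothetical)
    baseline: float   # score of the currently promoted prompt version
    candidate: float  # score of the prompt version under evaluation


def regression_verdict(scores, baseline_cost, candidate_cost,
                       tolerance=0.0, cost_budget_ratio=1.0):
    """Promote only if no (slice, dimension) pair regresses AND cost stays in budget."""
    regressions = [
        (s.slice_name, s.dimension)
        for s in scores
        if s.candidate < s.baseline - tolerance
    ]
    cost_ok = candidate_cost <= baseline_cost * cost_budget_ratio
    return {
        "promote": not regressions and cost_ok,
        "regressions": regressions,
        "cost_ok": cost_ok,
    }


# The candidate's global average is higher, but the "pt" slice regresses.
scores = [
    SliceScore("en", "accuracy", baseline=0.80, candidate=0.95),
    SliceScore("pt", "accuracy", baseline=0.80, candidate=0.78),
]
verdict = regression_verdict(scores, baseline_cost=1.0, candidate_cost=1.0)
```

Here the candidate averages 0.865 against a baseline of 0.80, yet `verdict["promote"]` is `False` because one slice got worse. Keeping the check a pure function of its inputs is what makes it deterministic and easy to unit test.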
Mapear-RN — Political Intelligence Platform
Political opinion monitoring across 167 municipalities in Rio Grande do Norte, Brazil — at < R$5/month.
- RSS + social media ingestion → NLP sentiment pipeline → LLM-generated strategic recommendations
- Real-time dashboard with RAG over regional political knowledge base
- Cloud-native on GCP with full data governance
Stack: Python · GCP · BigQuery · dbt · Cloud Run
Pulso — Real-time Crypto Intelligence
Exactly-once streaming pipeline with Claude as the anomaly explainer — ~US$40/month on GCP.
- Redpanda → ksqlDB → Apache Iceberg with exactly-once semantics
- Claude explains detected anomalies in natural language rather than just flagging them
- Cost-controlled LLM integration with token tracking
Stack: Redpanda · ksqlDB · Apache Iceberg · Python · Anthropic Claude
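One way to keep LLM spend bounded is to record token counts per call and check a monthly budget before making the next one. The sketch below is an assumption about how such tracking could look, not Pulso's implementation: `TokenBudget` and its fields are hypothetical, and the per-token prices are placeholders rather than real rates.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    monthly_limit_usd: float
    usd_per_input_token: float   # placeholder price, not a real rate
    usd_per_output_token: float  # placeholder price, not a real rate
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add the cost of one completed call, using token counts reported by the provider."""
        cost = (input_tokens * self.usd_per_input_token
                + output_tokens * self.usd_per_output_token)
        self.spent_usd += cost
        return cost

    def can_spend(self, estimated_usd: float) -> bool:
        """Return True if a call with this estimated cost still fits the monthly limit."""
        return self.spent_usd + estimated_usd <= self.monthly_limit_usd


budget = TokenBudget(
    monthly_limit_usd=5.0,
    usd_per_input_token=3e-6,
    usd_per_output_token=15e-6,
)
call_cost = budget.record(input_tokens=1000, output_tokens=200)
allowed = budget.can_spend(estimated_usd=1.0)
```

The anomaly explainer would call `can_spend` before each request and fall back to a plain flag when the budget is exhausted, so a burst of anomalies cannot push the monthly bill past the limit.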
Every project here has:
- A problem statement — not just a stack demo
- ADRs (Architecture Decision Records) explaining why each key decision was made
- A build audit trail: `docs/progress/blockN.md`, logging what was done, what ran, what was decided
- Real numbers (cost, latency, test coverage), not vague claims
AWS Certified Cloud Practitioner · Microsoft Azure AI Essentials · Anthropic Claude 101 · Anthropic Claude Code in Action
- Expanding PromptBench with multi-model judge (majority vote), continuous eval on real traffic, and GitHub Actions PR gate
- Growing English fluency for international collaboration
"A better global average doesn't earn promotion when it breaks the cost budget."
— from PromptBench Studio's regression verdict
