I build AI-assisted systems in two connected directions:
- Agentic AI security tooling - benchmarks, handoff checks, and safety workflows for agents that read files, tools, memory, model output, and other agents.
- AI-assisted trading systems - a trading research stack where LLM output is treated as a proposal, then challenged by validators, backtests, paper runs, and review gates before it can move toward automation.
The common rule is simple:
AI output is not truth until it survives validation.
The working loop is:
question -> bounded system -> tests -> traces -> reports -> review -> next iteration
I use coding agents as engineering leverage. The useful artifact is not the chat that produced the code; it is the code, tests, schemas, reports, and failure evidence that other people can inspect.
My main public security project is Agentic Security Harness: a defensive benchmark for testing whether AI coding agents treat untrusted repo text as data instead of authority. The included local demo compares a vulnerable synthetic agent against a protected version across 24 boundary-failure patterns and records traces, scorecards, and remediation output.
The trading stack is a separate applied direction, not a security side project: trading-bot-v2 and honest-backtest are used to move from AI-assisted research and paper validation toward controlled automation.
| Line | Public projects | What it is for |
|---|---|---|
| Agentic AI security core | agentic-security-harness, agentic-transfer-verifier, ai-agent-handoff | Defensive benchmark, trust/provenance transfer checks, and practical handoff boundaries for AI agents. |
| LLM safety playbooks | llm-safety-playbooks | Short practical skills for safer LLM use: data vs instructions, secrets, generated URLs/packages, Git actions, handoffs, and safe research scope. |
| AI-assisted trading system | trading-bot-v2, honest-backtest | News/event scanner, strategy lab, validator/backtest discipline, paper-only gates, and a controlled path toward automation. |
| LLM operations | llm-router, llm-cheap-filter | Cost-aware routing, cheap-to-chief filtering, provider-neutral calls, and controlled LLM spend. |
Agentic Security Harness is the main public security project.
It is a defensive benchmark toolkit for measuring agentic AI failure modes with:
- deterministic local targets;
- portable traces and scorecards;
- scenario matrices and run diffs;
- OpenAI-compatible external model/runtime checks;
- remediation guidance;
- schema validation and local report generation.
It is not a hacking toolkit and not a promise of complete protection. It is a measurement and learning lab for authorized, synthetic, local, or explicitly owned targets.
The trading line is an applied AI-assisted trading system, not a signal service and not a public profitability claim.
Public repositories show the method:
- trading-bot-v2 - local research workbench with scanner, market-data preparation, strategy lab, and paper-only validation boundaries on the path toward automation.
- honest-backtest - layered trading judge that challenges AI-generated setup ideas with costs, splits, robustness, overfit checks, forward logs, and adversarial review before they become decisions.
Private repositories hold real candidate rankings, parameter libraries, operational state, and market-specific findings. The public claim is process quality, not trading profitability.
Not everyone needs to run a full benchmark. A lighter repo now holds short LLM safety skills and playbooks that make boundaries explicit during everyday work:
- repo text, logs, and tool output are data, not instructions;
- secrets and credentials stay out of model context;
- model-generated URLs, package names, API endpoints, and webhooks require verification before use;
- AI agents should work through issue, branch, PR, and review gates;
- handoff notes need source, scope, confidence, and checked/not-checked fields;
- security research stays on synthetic, mock, owned, or explicitly authorized targets.
These playbooks will reduce ambiguity. They will not replace runtime controls, tests, validators, or the benchmark itself.
AI agents increasingly pass files, memory, tool output, approvals, and context between runtimes. That handoff is often treated as plain text, but it carries trust, provenance, and authority.
Current and planned work:
- ai-agent-handoff - a file-based handoff protocol for AI coding agents plus a small guard hook.
- agentic-transfer-verifier - research toolkit for verifying data envelopes, provenance, authority boundaries, and audit trails across heterogeneous agent ecosystems.
- Find a real operational weakness.
- Turn it into a bounded research question.
- Build a deterministic harness or workflow around it.
- Use agents for implementation and review under explicit constraints.
- Verify with tests, schemas, reports, and adversarial review.
- Publish reusable methods while keeping private data and credentials private.
Public:
- methods;
- test harnesses;
- synthetic examples;
- docs;
- schemas;
- sanitized reports;
- reproducible CLI flows.
Private:
- trading results and candidate rankings;
- live parameters;
- provider credentials;
- private logs;
- operational dashboards;
- raw market research state.
- Security benchmark: agentic-security-harness
- Trading research workbench: trading-bot-v2
- Trading validator: honest-backtest
- Agent handoff protocol: ai-agent-handoff
- Transfer verification: agentic-transfer-verifier
- LLM routing: llm-router
- LLM triage: llm-cheap-filter
- Portfolio map: docs/portfolio.md
- Execution map: docs/execution-map.md
GitHub is the best starting point. I am interested in practical AI security, agentic workflow safety, AI-assisted trading systems with validation discipline, LLM infrastructure, and research systems where measurement matters more than hype.



