LesterALeong/README.md

Lester Leong

AI engineer building reliable, evaluated agentic systems.

I bring quant-grade statistical rigor to LLM engineering: evals, error bars, and not shipping on noise. The discipline comes from running multi-agent systems where a wrong answer does not fail a unit test; it costs real money. PhD in Finance (ML time series).

I work at the intersection of AI engineering and quantitative finance: multi-agent orchestration, LLM evaluation and reliability, and production systems that have to be right, not just plausible.


Featured

llm-evalgate

Eval gates and reliability primitives for LLM pipelines, with a confidence interval on every metric and a statistically honest regression gate. The full eval surface (deterministic gates, calibrated LLM-as-judge, agentic-trace evals, multiple-comparisons-corrected CI gating), in CI, with the stats done right. Pure standard library at the core, fully offline-testable, deterministic under a seed.

pip install llm-evalgate · GitHub · PyPI
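
To make the "confidence interval on every metric" idea concrete, here is a minimal standard-library sketch of a seeded percentile bootstrap and a paired regression gate. It is an illustration only, not llm-evalgate's actual API: `bootstrap_ci`, `regression_gate`, and their parameters are hypothetical names, and the sketch leaves out multiple-comparisons correction.

```python
import random


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)  # fixed seed, so the interval is reproducible
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def regression_gate(baseline, candidate, tolerance=0.0, **ci_kwargs):
    """Paired gate over per-example scores on the same eval set.

    Returns False (block the change) only when the whole confidence
    interval of the difference (candidate - baseline) lies below
    -tolerance; a drop that is indistinguishable from noise passes.
    """
    diffs = [c - b for b, c in zip(baseline, candidate)]
    _, hi = bootstrap_ci(diffs, **ci_kwargs)
    return hi >= -tolerance
```

Gating on the interval of the paired difference, rather than on two point estimates, is what keeps a small eval set from failing a build on noise alone.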

anachron

A look-ahead leakage detector for LLM agents: does an agent use information it could not have had at the time? It scores the leakage rate of an agent's tool calls under an as-of-date constraint, graded with the point-in-time rigor of quantitative backtesting (survivorship, restatements, transaction costs). The discipline that keeps a trading backtest honest, generalized to agents.

GitHub
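
As a rough sketch of what a leakage rate under an as-of-date constraint could look like: the fraction of tool calls whose returned data only became available after the as-of date. This is not anachron's actual implementation or data model; the `ToolCall` record, its `data_available_at` field, and `leakage_rate` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of one tool call made by an agent."""
    tool: str
    data_available_at: date  # when the returned information first existed


def leakage_rate(calls, as_of):
    """Fraction of calls that returned data dated after `as_of`.

    A call counts as leaking when its data became available strictly
    after the as-of date. An empty trace has a rate of 0.0.
    """
    if not calls:
        return 0.0
    leaked = sum(1 for call in calls if call.data_available_at > as_of)
    return leaked / len(calls)


# Example trace: the agent is supposed to act as of 2020-06-30.
trace = [
    ToolCall("price_history", date(2020, 6, 29)),
    ToolCall("earnings_report", date(2020, 8, 4)),   # published later
    ToolCall("news_search", date(2020, 6, 30)),
    ToolCall("restated_filing", date(2021, 2, 15)),  # restatement
]
rate = leakage_rate(trace, as_of=date(2020, 6, 30))  # 2 of 4 calls leak
```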

Thousand Token Wood

Five small-model (Qwen2.5-3B) creature-agents in an emergent economy: an experiment in multi-agent dynamics under real economic pressure. Served on vLLM / Modal with a Gradio interface. Built for the Hugging Face build-small hackathon.

Live demo

Also building, in the same spirit: finance-paper-replication — replicating quant research papers and publishing the honest results — and finance-research-factory, a multi-agent q-fin research pipeline with alpha-protection built in.


Writing

I write about AI engineering, evals, quantitative finance, and building systems you can actually trust.


Stack

Python · LLMs · multi-agent systems · LLM-as-judge and evals · RAG · agent reliability · vLLM · Modal · Gradio · Hugging Face · bootstrap, hypothesis testing, and experimental design · pandas / NumPy


Open to conversations about hard problems in agent reliability and AI for finance.

Pinned

  1. anachron (Public, Python)

    Point-in-time tool-call leakage scoring for LLM agents, as an Inspect extension. Measures whether an agent uses information it could not have had at the time.

  2. llm-evalgate (Public, Python)

    Eval gates with error bars: confidence intervals, calibrated LLM judges, and a statistically honest regression gate for LLM pipelines.