A replication of TradeFM (arXiv 2602.23784) — a 524M-parameter decoder-only Transformer that generates realistic order flow by learning from raw Databento MBO (Level 3) event streams.
Kawawa-Beaudan, Sood, Papasotiriou, Borrajo, Veloso — JPMorgan AI Research, Feb 2026
TradeFM applies the foundation model paradigm to market microstructure. A single model learns unified trade-flow dynamics from billions of transactions across thousands of US equities, without asset-specific calibration. In closed-loop evaluation, generated order flow reproduces canonical stylized facts: heavy-tailed returns, volatility clustering, and lack of return autocorrelation.
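
The stylized facts named above can be checked with a few summary statistics. The following is a minimal sketch using only the standard library, not the code in `evaluation/stylized_facts.py`; the function names and the toy return series are illustrative assumptions. Heavy tails show up as positive excess kurtosis, volatility clustering as positive autocorrelation of absolute returns, and the lack of return autocorrelation as a near-zero autocorrelation of the raw returns.

```python
import random


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given positive lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return cov / var


def stylized_facts(returns, lag=1):
    """Summarize the three stylized facts for one return series."""
    return {
        "excess_kurtosis": excess_kurtosis(returns),
        "acf_returns": autocorrelation(returns, lag),
        "acf_abs_returns": autocorrelation([abs(r) for r in returns], lag),
    }


# Toy series (an assumption for illustration): Gaussian returns whose
# volatility alternates between calm and turbulent blocks of 200 steps.
random.seed(0)
toy_returns = [
    random.gauss(0.0, 0.01 if (t // 200) % 2 == 0 else 0.05)
    for t in range(4000)
]
facts = stylized_facts(toy_returns)
```

On a series like this, `facts["excess_kurtosis"]` and `facts["acf_abs_returns"]` should both be clearly positive while `facts["acf_returns"]` stays close to zero, which is the same qualitative pattern the closed-loop evaluation looks for.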
Key properties:
- Partial observability — learns from the Level 3 event stream (what any market participant sees), not full limit order book snapshots
- Scale-invariant features — normalizes price, volume, and time features so one model works across assets with vastly different prices and liquidity profiles
- Zero-shot geographic generalization — trained on US equities, transfers to APAC markets with moderate perplexity degradation
- Closed-loop simulation — integrates with a deterministic LOB simulator for realistic rollouts
    Databento MBO .csv.zst
      └─ databento_loader.py            load + decode events
           ├─ compute_adv()             rolling ADV → liquidity tier
           └─ preprocess_mbo_for_tradefm()
                ├─ ew_vwap.py           EW-VWAP mid-price from Trade/Fill events
                └─ scale-invariant features per (instrument, date) session
                     δp = (p_order − p_mid) / p_mid
                     v  = log(1 + size)
                     Δt = ts_recv diff (seconds)
                     Δp = (p_mid − p_open) / p_open
    tokenizer.py                        calibrate() on first 30 days → encode() → composite token
      └─ mixed-base vocab: 2×2×16×16×16 = 16,384 tokens
         contextual (not predicted): liquidity bin, market/participant flag, Δp bin
    dataset.py                          sliding-window sequences per (instrument, date) → PyTorch Dataset
    architecture.py                     TradeFM
      ├─ TabularEmbedding               4 embedding tables → concat → Linear projection
      └─ N × DecoderLayer               (RMSNorm + GQA + SwiGLU MLP)
           └─ lm_head                   → next token (cross-entropy loss)
    trainer.py                          AdamW (β=0.9, 0.95), linear warmup+decay, fp16, grad accumulation
    market_simulator.py                 deterministic LOB for closed-loop rollouts
    evaluation/stylized_facts.py        ACF, kurtosis, K-S, Wasserstein-1
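
The four scale-invariant features in the pipeline can be written out directly from their formulas. This is a minimal sketch for a single event, not the body of `preprocess_mbo_for_tradefm()`; the function name, argument names, and the choice to pass timestamps as integer nanoseconds are assumptions made for illustration.

```python
import math


def scale_invariant_features(p_order, p_mid, p_open, size,
                             ts_recv_ns, prev_ts_recv_ns):
    """Return (δp, v, Δt, Δp) for one event.

    Timestamps are assumed to be integer nanoseconds; the source files
    store ISO-8601 strings, so a parsing step would come first.
    """
    rel_price = (p_order - p_mid) / p_mid                # δp: offset from mid
    log_volume = math.log1p(size)                        # v = log(1 + size)
    dt_seconds = (ts_recv_ns - prev_ts_recv_ns) / 1e9    # Δt in seconds
    mid_drift = (p_mid - p_open) / p_open                # Δp: mid vs. session open
    return rel_price, log_volume, dt_seconds, mid_drift


# Two orders placed 1% above the mid on very differently priced stocks.
cheap = scale_invariant_features(10.10, 10.00, 10.00, 100, 2_000_000_000, 1_000_000_000)
pricey = scale_invariant_features(1010.0, 1000.0, 1000.0, 100, 2_000_000_000, 1_000_000_000)
```

Both calls yield a relative price offset of about 0.01, which is the point of the normalization: the model sees the same feature value regardless of the absolute price level.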
| Preset | Layers | Hidden | Heads (GQA) | Params |
|---|---|---|---|---|
| 125M | 12 | 768 | 12 / 4 | ~125M |
| 250M | 24 | 1024 | 16 / 4 | ~250M |
| 500M | 32 | 1024 | 32 / 8 | ~524M |
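
To make the "Heads (GQA)" column concrete, the sketch below derives the per-head dimension and the query-to-KV head mapping for each preset. It assumes the column reads as "query heads / key-value heads", which the table does not state explicitly; the helper name `gqa_layout` and the `PRESETS` dictionary are hypothetical and not part of `architecture.py`.

```python
# (hidden size, query heads, KV heads) per preset, copied from the table.
# Reading "12 / 4" as query heads / KV heads is an assumption.
PRESETS = {
    "125M": (768, 12, 4),
    "250M": (1024, 16, 4),
    "500M": (1024, 32, 8),
}


def gqa_layout(hidden, n_q_heads, n_kv_heads):
    """Return (head_dim, group_size, kv_index) for grouped-query attention.

    kv_index[q] is the KV head that query head q attends with.
    """
    if hidden % n_q_heads or n_q_heads % n_kv_heads:
        raise ValueError("hidden must divide by query heads, query heads by KV heads")
    head_dim = hidden // n_q_heads
    group_size = n_q_heads // n_kv_heads
    kv_index = [q // group_size for q in range(n_q_heads)]
    return head_dim, group_size, kv_index


layouts = {name: gqa_layout(*cfg) for name, cfg in PRESETS.items()}
```

Under that reading, the 125M preset has 64-dimensional heads with three query heads sharing each KV head, and the 500M preset has 32-dimensional heads with four query heads per KV head.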
pip install -r requirements.txt

Requirements: torch>=2.0, numpy, pandas, scipy, zstandard
python train.py \
--data "data/**/*.mbo.csv.zst" \
--model-size 500M \
--output-dir checkpoints/500M

Key arguments:

| Argument | Default | Description |
|---|---|---|
| `--data` | required | Path, glob, or list of Databento MBO `.csv.zst` files |
| `--model-size` | `500M` | Size preset: `125M`, `250M`, `500M` |
| `--calib-days` | `30` | Trading days used to calibrate the tokenizer |
| `--val-days` | `30` | Trailing days held out for validation |
| `--context-length` | `1024` | Sequence length in tokens |
| `--epochs` | `4` | Training epochs |
| `--batch-size` | `24` | Per-device batch size |
| `--accum-steps` | `56` | Gradient accumulation (effective batch ≈ 4032) |
| `--lr` | `5e-5` | Peak learning rate |
| `--output-dir` | `checkpoints` | Checkpoint output directory |
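
The batch-size arithmetic and the linear warmup+decay schedule can be sketched as follows. This is an illustration, not the code in `trainer.py`: the function names and the `warmup_steps` parameter are assumptions, since the README gives only the peak learning rate. Note that 24 × 56 = 1,344 sequences per device per optimizer step, so the stated effective batch of ≈ 4032 matches three devices; that device count is inferred from the arithmetic and is not stated in the README.

```python
def effective_batch(per_device_batch, accum_steps, n_devices):
    """Sequences contributing to one optimizer step."""
    return per_device_batch * accum_steps * n_devices


def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Learning rate at a given optimizer step.

    Rises linearly to peak_lr over warmup_steps, then falls linearly
    to zero at total_steps. warmup_steps is a hypothetical parameter.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))


per_device = effective_batch(24, 56, 1)      # 1344
three_devices = effective_batch(24, 56, 3)   # 4032
schedule = [linear_warmup_decay(s, total_steps=1000, warmup_steps=100)
            for s in range(1001)]
```

The schedule peaks at `5e-5` once warmup ends and reaches zero on the final step.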
python -c "
import ast, pathlib
for f in pathlib.Path('tradefm').rglob('*.py'):
    ast.parse(f.read_text())
    print('OK', f)
"

Databento MBO .csv.zst files (zstandard-compressed CSV) with columns:

| Column | Example | Notes |
|---|---|---|
| `ts_event` | `2026-03-30T08:00:00.016Z` | Exchange timestamp (nanosecond UTC) |
| `ts_recv` | `2026-03-30T08:00:00.015Z` | Feed receive timestamp |
| `action` | `A` / `C` / `T` / `F` | Add, Cancel, Trade, Fill |
| `side` | `A` / `B` | Ask (sell) / Bid (buy) |
| `price` | `183.460000000` | Decimal dollars |
| `size` | `1635` | Shares |
| `order_id` | `435123903` | Links Add → Cancel/Fill events |
| `symbol` | `NVDA` | Ticker |

Only A (Add) and C (Cancel) events are model targets. T/F events update the EW-VWAP estimator only.
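
The routing described above can be illustrated with a small sketch: Trade and Fill events update a running EW-VWAP, while Add and Cancel events become model targets whose price is expressed relative to the current estimate. This is not the code in `ew_vwap.py`; the class name, the `decay` parameter, the exact update rule, and the choice to skip targets seen before the first trade are all assumptions.

```python
import math


class EWVWAP:
    """Exponentially weighted VWAP; `decay` is a hypothetical smoothing parameter."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.weighted_price = 0.0   # decayed sum of price * size
        self.weighted_size = 0.0    # decayed sum of size

    def update(self, price, size):
        self.weighted_price = self.decay * self.weighted_price + price * size
        self.weighted_size = self.decay * self.weighted_size + size

    @property
    def value(self):
        # None until the first trade or fill has been seen.
        if self.weighted_size == 0.0:
            return None
        return self.weighted_price / self.weighted_size


def route_events(events, decay=0.99):
    """Split an event stream into estimator updates (T/F) and targets (A/C)."""
    estimator = EWVWAP(decay)
    targets = []
    for ev in events:
        if ev["action"] in ("T", "F"):
            estimator.update(ev["price"], ev["size"])
        elif ev["action"] in ("A", "C") and estimator.value is not None:
            p_mid = estimator.value
            targets.append({
                "action": ev["action"],
                "p_mid": p_mid,
                "rel_price": (ev["price"] - p_mid) / p_mid,   # δp
                "log_size": math.log1p(ev["size"]),           # v
            })
    return targets


events = [
    {"action": "T", "price": 100.0, "size": 10},
    {"action": "A", "price": 101.0, "size": 5},
    {"action": "F", "price": 102.0, "size": 10},
    {"action": "C", "price": 101.0, "size": 5},
]
targets = route_events(events, decay=0.5)
```

Only the Add and the Cancel appear in `targets`; the Fill between them moves the estimate upward, so the same order price of 101.0 maps to a positive offset for the Add and a negative one for the Cancel.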
- Tokenizer frozen after calibration — calibrated once on the first 30 trading days, then fixed for all training and inference
- Equal-frequency bins for price — quantile binning gives high resolution near the mid-price where most orders cluster
- Equal-width bins for volume/time — applied to log-transformed values, effectively logarithmic in the original space
- Session boundaries respected — no sliding window crosses `(instrument_id, date)` boundaries
- Closed-loop generation — `TradeFM.generate()` feeds each predicted token through the LOB simulator to get the updated price level, which is appended to the context before the next step
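
The two binning schemes and the mixed-base composite token can be sketched in a few lines. This is an illustration under assumptions, not the code in `tokenizer.py`: the helper names are hypothetical, and the field order (action, side, price bin, volume bin, time bin) is a guess consistent with the 2×2×16×16×16 vocabulary but not confirmed by the README.

```python
import bisect
import math

# Radix of each token field. The order is assumed, not taken from tokenizer.py.
BASES = (2, 2, 16, 16, 16)


def equal_frequency_edges(values, n_bins):
    """Interior edges at empirical quantiles, so each bin holds about equal counts."""
    s = sorted(values)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def equal_width_edges(values, n_bins):
    """Interior edges evenly spaced between min and max (apply to log-values)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * k for k in range(1, n_bins)]


def to_bin(x, edges):
    """Bin index in [0, len(edges)]; out-of-range values fall in the end bins."""
    return bisect.bisect_right(edges, x)


def encode(fields):
    """Pack per-field bin indices into one integer token (mixed-radix)."""
    token = 0
    for value, base in zip(fields, BASES):
        if not 0 <= value < base:
            raise ValueError(f"field value {value} out of range for base {base}")
        token = token * base + value
    return token


def decode(token):
    """Inverse of encode: recover the per-field indices."""
    fields = []
    for base in reversed(BASES):
        token, value = divmod(token, base)
        fields.append(value)
    return tuple(reversed(fields))


vocab_size = math.prod(BASES)  # 16384

# Calibrate volume bins on log-transformed sizes, then encode one event.
log_sizes = [math.log1p(s) for s in (1, 10, 100, 1000, 10000)]
volume_edges = equal_width_edges(log_sizes, 16)
token = encode((0, 1, 8, to_bin(math.log1p(1635), volume_edges), 3))
```

Freezing the tokenizer after calibration amounts to computing the edge lists once and reusing them unchanged, so a given token ID always refers to the same feature ranges during training and inference.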
This work builds on:
- Kawawa-Beaudan et al. 2024 (arXiv 2409.07619) — "Ensemble Methods for Sequence Classification with Hidden Markov Models": established order flow as a sequence modeling problem using HMM ensembles for anomaly classification. TradeFM replaces HMMs with a large generative Transformer and flips the task from classification to generation.
- Sirignano & Cont 2021 — showed that a single deep learning model trained on pooled multi-stock data outperforms asset-specific models, motivating cross-asset generalization.