TradeFM

A replication of TradeFM (arXiv 2602.23784) — a 524M-parameter decoder-only Transformer that generates realistic order flow by learning from raw Databento MBO (Level 3) event streams.

Kawawa-Beaudan, Sood, Papasotiriou, Borrajo, Veloso — JPMorgan AI Research, Feb 2026

Overview

TradeFM applies the foundation model paradigm to market microstructure. A single model learns unified trade-flow dynamics from billions of transactions across thousands of US equities, without asset-specific calibration. In closed-loop evaluation, the generated order flow reproduces canonical stylized facts: heavy-tailed returns, volatility clustering, and the absence of linear autocorrelation in returns.
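The three stylized facts named above can each be checked with a simple statistic. The following is a minimal sketch, not the contents of `evaluation/stylized_facts.py`: the helper names and the toy GARCH(1,1) generator (used here only as a stand-in for simulator output) are assumptions made for illustration.

```python
import random


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a Gaussian, > 0 for heavy tails)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


def autocorr(xs, lag):
    """Sample autocorrelation at a positive lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return cov / var


def simulate_garch(n, seed=0, omega=1e-6, alpha=0.1, beta=0.85):
    """Toy GARCH(1,1) returns: hypothetical stand-in data, not TradeFM output."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    out = []
    for _ in range(n):
        r = (var ** 0.5) * rng.gauss(0.0, 1.0)
        out.append(r)
        var = omega + alpha * r * r + beta * var
    return out


returns = simulate_garch(20_000)
print("excess kurtosis      :", excess_kurtosis(returns))               # heavy tails
print("ACF(returns, lag 1)  :", autocorr(returns, 1))                   # near zero
print("ACF(returns^2, lag 1):", autocorr([r * r for r in returns], 1))  # clustering
```

Volatility clustering shows up as positive autocorrelation in squared (or absolute) returns even when the returns themselves are close to uncorrelated.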

Key properties:

Partial observability — learns from the Level 3 event stream (what any market participant sees), not full limit order book snapshots
Scale-invariant features — normalizes price, volume, and time features so one model works across assets with vastly different prices and liquidity profiles
Zero-shot geographic generalization — trained on US equities, transfers to APAC markets with moderate perplexity degradation
Closed-loop simulation — integrates with a deterministic LOB simulator for realistic rollouts
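As a rough illustration of the scale-invariant features, the sketch below applies the four definitions listed under Architecture (δp, v, Δt, Δp) to one session. The function name, the input layout, and the choice of Δt = 0 for the first event of a session are assumptions, not the behavior of `preprocess_mbo_for_tradefm()`.

```python
import math


def session_features(events, mid_prices, open_price):
    """Compute per-event features for one (instrument, date) session.

    events     : list of (ts_recv_seconds, order_price, size), in time order
    mid_prices : mid-price estimate at each event (same length as events)
    open_price : mid-price at the session open
    """
    feats = []
    prev_ts = None
    for (ts, price, size), mid in zip(events, mid_prices):
        feats.append({
            "delta_p": (price - mid) / mid,                  # order price relative to mid
            "v": math.log1p(size),                           # log(1 + size)
            "dt": 0.0 if prev_ts is None else ts - prev_ts,  # assumed 0 for first event
            "open_drift": (mid - open_price) / open_price,   # mid relative to open
        })
        prev_ts = ts
    return feats


# The same relative order placement on a $10 and a $1000 stock yields the same δp.
cheap = session_features([(0.0, 10.01, 100), (0.5, 9.99, 200)], [10.0, 10.0], 10.0)
rich = session_features([(0.0, 1001.0, 100), (0.5, 999.0, 200)], [1000.0, 1000.0], 1000.0)
print(cheap[0]["delta_p"], rich[0]["delta_p"])
```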

Architecture

Databento MBO .csv.zst
  └─ databento_loader.py     load + decode events
       ├─ compute_adv()       rolling ADV → liquidity tier
       └─ preprocess_mbo_for_tradefm()
            ├─ ew_vwap.py     EW-VWAP mid-price from Trade/Fill events
            └─ scale-invariant features per (instrument, date) session
                  δp = (p_order − p_mid) / p_mid
                  v  = log(1 + size)
                  Δt = ts_recv diff (seconds)
                  Δp = (p_mid − p_open) / p_open

tokenizer.py   calibrate() on first 30 days → encode() → composite token
  └─ mixed-base vocab: 2×2×16×16×16 = 16,384 tokens
     contextual (not predicted): liquidity bin, market/participant flag, Δp bin

dataset.py     sliding-window sequences per (instrument, date) → PyTorch Dataset

architecture.py  TradeFM
  ├─ TabularEmbedding  4 embedding tables → concat → Linear projection
  ├─ N × DecoderLayer  (RMSNorm + GQA + SwiGLU MLP)
  └─ lm_head → next token (cross-entropy loss)

trainer.py     AdamW (β₁=0.9, β₂=0.95), linear warmup+decay, fp16, grad accumulation

market_simulator.py   deterministic LOB for closed-loop rollouts

evaluation/stylized_facts.py   ACF, kurtosis, K-S, Wasserstein-1
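The mixed-base vocabulary in `tokenizer.py` (2×2×16×16×16 = 16,384) can be read as packing five small integers into one token id. The sketch below shows one way to do that. The field order (action, side, δp bin, v bin, Δt bin) and the function names are assumptions for illustration; the repository's actual layout may differ.

```python
import math

# Assumed field order: action (Add/Cancel), side, δp bin, v bin, Δt bin.
BASES = (2, 2, 16, 16, 16)
VOCAB_SIZE = math.prod(BASES)


def encode(digits):
    """Pack one digit per field into a single composite token id."""
    token = 0
    for d, base in zip(digits, BASES):
        if not 0 <= d < base:
            raise ValueError(f"digit {d} out of range for base {base}")
        token = token * base + d
    return token


def decode(token):
    """Invert encode(): recover the per-field digits from a token id."""
    digits = []
    for base in reversed(BASES):
        token, d = divmod(token, base)
        digits.append(d)
    return tuple(reversed(digits))


print(VOCAB_SIZE)
print(decode(encode((1, 0, 3, 7, 12))))
```

Because every field has a fixed base, the mapping is a bijection between field tuples and the range 0 to 16,383, so the language-model head can predict all five fields with a single softmax.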

Model sizes

Preset	Layers	Hidden	Heads (GQA: query / KV)	Params
125M	12	768	12 / 4	~125M
250M	24	1024	16 / 4	~250M
500M	32	1024	32 / 8	~524M
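Assuming the heads column gives query heads / key-value heads, grouped-query attention lets several query heads share one KV head. The sketch below shows the usual contiguous grouping and the resulting per-head dimension for each preset; the helper name and the grouping convention are assumptions, not taken from `architecture.py`.

```python
def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Index of the KV head that a given query head attends with."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


# (hidden size, query heads, KV heads) as listed in the table above.
PRESETS = {"125M": (768, 12, 4), "250M": (1024, 16, 4), "500M": (1024, 32, 8)}

for name, (hidden, n_q, n_kv) in PRESETS.items():
    head_dim = hidden // n_q
    mapping = [kv_head_for(q, n_q, n_kv) for q in range(n_q)]
    print(name, "head_dim =", head_dim, "query->KV =", mapping)
```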

Installation

pip install -r requirements.txt

Requirements: torch>=2.0, numpy, pandas, scipy, zstandard

Usage

Training

python train.py \
    --data "data/**/*.mbo.csv.zst" \
    --model-size 500M \
    --output-dir checkpoints/500M

Key arguments:

Argument	Default	Description
`--data`	required	Path, glob, or list of Databento MBO `.csv.zst` files
`--model-size`	`500M`	Size preset: `125M`, `250M`, `500M`
`--calib-days`	`30`	Trading days used to calibrate the tokenizer
`--val-days`	`30`	Trailing days held out for validation
`--context-length`	`1024`	Sequence length in tokens
`--epochs`	`4`	Training epochs
`--batch-size`	`24`	Per-device batch size
`--accum-steps`	`56`	Gradient accumulation steps (24 × 56 = 1,344 sequences per device per update; the stated effective batch ≈ 4032 corresponds to 3 devices)
`--lr`	`5e-5`	Peak learning rate
`--output-dir`	`checkpoints`	Checkpoint output directory
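To make the interaction between `--batch-size`, `--accum-steps`, and `--lr` concrete, here is a minimal sketch of a linear warmup and decay schedule together with the effective-batch arithmetic. The `warmup_steps` and `total_steps` values and both function names are hypothetical; `trainer.py` may shape the schedule differently.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Learning rate at an optimizer step: linear ramp up, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


def sequences_per_update(batch_size=24, accum_steps=56, n_devices=1):
    """Number of sequences contributing to one optimizer step."""
    return batch_size * accum_steps * n_devices


# Hypothetical schedule length, chosen only for illustration.
for step in (0, 99, 100, 550, 1000):
    print(step, linear_warmup_decay(step, total_steps=1000, warmup_steps=100))

print(sequences_per_update())             # one device
print(sequences_per_update(n_devices=3))  # three devices
```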

Syntax check

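python -c "
import ast, pathlib
# Parse every module under tradefm/; a SyntaxError aborts and names the offending file.
for f in sorted(pathlib.Path('tradefm').rglob('*.py')):
    ast.parse(f.read_text(encoding='utf-8'), filename=str(f))
    print('OK', f)
"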

Input Data Format

Databento MBO .csv.zst files (zstandard-compressed CSV) with columns:

Column	Example	Notes
`ts_event`	`2026-03-30T08:00:00.016Z`	Exchange timestamp (nanosecond UTC)
`ts_recv`	`2026-03-30T08:00:00.015Z`	Feed receive timestamp
`action`	`A` / `C` / `T` / `F`	Add, Cancel, Trade, Fill
`side`	`A` / `B`	Ask (sell) / Bid (buy)
`price`	`183.460000000`	Decimal dollars
`size`	`1635`	Shares
`order_id`	`435123903`	Links Add → Cancel/Fill events
`symbol`	`NVDA`	Ticker

Only A (Add) and C (Cancel) events are model targets. T (Trade) and F (Fill) events are never predicted; they only update the EW-VWAP mid-price estimator.
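The split between target events and estimator-only events can be sketched as follows. The sample rows are made up, the `EwVwap` class and its `decay` parameter are hypothetical, and the exact weighting used in `ew_vwap.py` may differ; the CSV is read from an in-memory string so the example does not depend on `zstandard`.

```python
import csv
import io

# Made-up rows in the column layout described above.
SAMPLE = """\
ts_event,ts_recv,action,side,price,size,order_id,symbol
2026-03-30T08:00:00.016Z,2026-03-30T08:00:00.017Z,A,B,100.00,10,1,XYZ
2026-03-30T08:00:00.020Z,2026-03-30T08:00:00.021Z,T,A,100.00,100,2,XYZ
2026-03-30T08:00:00.030Z,2026-03-30T08:00:00.031Z,T,B,101.00,100,3,XYZ
2026-03-30T08:00:00.040Z,2026-03-30T08:00:00.041Z,A,A,101.50,5,4,XYZ
2026-03-30T08:00:00.050Z,2026-03-30T08:00:00.051Z,C,B,100.00,10,1,XYZ
"""


class EwVwap:
    """Exponentially weighted VWAP: older trades are discounted by `decay` per trade."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.notional = 0.0
        self.volume = 0.0

    def update(self, price, size):
        self.notional = self.decay * self.notional + price * size
        self.volume = self.decay * self.volume + size

    @property
    def value(self):
        return self.notional / self.volume if self.volume > 0 else None


def split_events(rows, estimator):
    """Feed T/F events to the estimator; collect A/C events with the current mid."""
    targets = []
    for row in rows:
        if row["action"] in ("T", "F"):
            estimator.update(float(row["price"]), int(row["size"]))
        elif row["action"] in ("A", "C"):
            targets.append((row["action"], row["side"], float(row["price"]), estimator.value))
    return targets


targets = split_events(csv.DictReader(io.StringIO(SAMPLE)), EwVwap(decay=0.9))
for t in targets:
    print(t)
```

In this sketch an Add that arrives before any trade has no mid-price yet (`None`); how the real pipeline handles that warm-up period is not specified here.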

Design Choices

Tokenizer frozen after calibration — calibrated once on the first 30 trading days, then fixed for all training and inference
Equal-frequency bins for price — quantile binning gives high resolution near the mid-price where most orders cluster
Equal-width bins for volume/time — applied to log-transformed values, effectively logarithmic in the original space
Session boundaries respected — no sliding window crosses (instrument_id, date) boundaries
Closed-loop generation — TradeFM.generate() feeds each predicted token through the LOB simulator to get the updated price level, which is appended to context before the next step
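The two binning schemes above, and the idea of freezing bin edges after calibration, can be illustrated with a short sketch. The function names are hypothetical and the quantile rule is a simple order-statistic choice, not necessarily what `calibrate()` does.

```python
import bisect


def equal_frequency_edges(values, n_bins=16):
    """Interior edges at empirical quantiles, so each bin holds roughly equal counts."""
    s = sorted(values)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def equal_width_edges(values, n_bins=16):
    """Interior edges evenly spaced between the calibration min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]


def to_bin(x, edges):
    """Bin index in [0, len(edges)]; out-of-range values fall into the end bins."""
    return bisect.bisect_right(edges, x)


# Calibrate once on (made-up) calibration data, then reuse the frozen edges.
price_edges = equal_frequency_edges(list(range(160)))  # e.g. δp values
log_size_edges = equal_width_edges([0.0, 16.0])        # e.g. log(1 + size) values

print(price_edges)
print(to_bin(9, price_edges), to_bin(10, price_edges), to_bin(10_000, price_edges))
print(to_bin(0.5, log_size_edges), to_bin(100.0, log_size_edges))
```

Because `to_bin` clamps unseen extremes into the first or last bin, edges computed during calibration remain usable on later data without recalibration.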

Research Lineage

This work builds on:

Kawawa-Beaudan et al. 2024 (arXiv 2409.07619), "Ensemble Methods for Sequence Classification with Hidden Markov Models": established order flow as a sequence modeling problem, using HMM ensembles for anomaly classification. TradeFM replaces the HMMs with a large generative Transformer and flips the task from classification to generation.
Sirignano & Cont 2021: showed that a single deep learning model trained on pooled multi-stock data outperforms asset-specific models, motivating cross-asset generalization.
