NEXUS reads SEC filings from 140 technology companies, extracts supply chain and ownership relationships into a live knowledge graph, and routes every material corporate event through a 5-agent adversarial AI debate before submitting a trade.
Knowledge graph explorer -- NVDA selected, showing supply chain and institutional ownership edges
---
Ingestion. NEXUS polls EDGAR continuously for new 10-K, 13F, DEF14A, 8-K, SC TO-T, and S-4 filings across 140 tickers. Each filing type feeds a different extraction pass: 10-Ks yield supply chain relationships (NLP), 13Fs yield institutional ownership positions (iXBRL parser), DEF14As yield board interlocks (NLP), and 8-Ks feed the event classifier.
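The routing described above can be pictured as a lookup from form type to extraction pass. The sketch below is illustrative only: the pass names and the `route_filing` helper are hypothetical, not identifiers from the NEXUS codebase. The text does not say which pass handles SC TO-T and S-4 filings, so those return `None` here rather than a guess.

```python
from typing import Optional

# Form type -> extraction pass, following the paragraph above.
# The pass names are hypothetical labels chosen for this sketch.
EXTRACTION_PASSES = {
    "10-K": "supply_chain_nlp",
    "13F": "ownership_ixbrl",
    "DEF14A": "board_interlock_nlp",
    "8-K": "event_classifier",
}

# Polled form types whose downstream pass is not specified in this README.
UNSPECIFIED_FORMS = {"SC TO-T", "S-4"}


def route_filing(form_type: str) -> Optional[str]:
    """Return the extraction pass for a form type, or None if unspecified."""
    form = form_type.strip().upper()
    if form in UNSPECIFIED_FORMS:
        return None
    if form not in EXTRACTION_PASSES:
        raise ValueError(f"form type is not polled: {form_type!r}")
    return EXTRACTION_PASSES[form]
```

Raising on an unknown form type, instead of silently dropping it, keeps an unexpected filing from disappearing without a trace.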
Knowledge graph. Extracted relationships are stored in a typed edge schema: supply edges (1,659), ownership edges (34,843), board edges (326), and macro edges (8 rate-setting links to sovereign nodes). A Heterogeneous Graph Transformer trained on this graph produces node embeddings used as input features for the signal layer.
Signal layer. Eleven factors were tested under HLZ multiple-testing correction (M=400, Bonferroni). One passes all six red-team tests: fundamental_margin_compression at t=+4.834 on the 126-day horizon. A Funding Stress Index (spread-only PCA(1), 64% variance explained) overlays macro regime context on every rebalance.
Adversarial debate. Every 8-K filing classified as material (supply chain, acquisition, or guidance, with materiality score >= 0.60) triggers a Prefect 3 flow that runs five AI agents in three stages: NetworkAnalyst, EventMechanic, and ContagionMonitor run in parallel; RedTeam then challenges all three; and Synthesis produces a final action and conviction score. A stalemate rule forces a pass when red team confidence exceeds 0.70 and synthesis conviction is below 0.50.
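The materiality gate and the stalemate rule are both simple threshold checks. A minimal sketch follows, using the thresholds stated above; the function and class names are hypothetical and the real flow runs inside Prefect, which is not modelled here.

```python
from dataclasses import dataclass

MATERIAL_EVENT_TYPES = {"supply_chain", "acquisition", "guidance"}
MATERIALITY_THRESHOLD = 0.60
STALEMATE_RED_TEAM_MIN = 0.70    # red team confidence must exceed this
STALEMATE_CONVICTION_MAX = 0.50  # synthesis conviction must be below this


def triggers_debate(event_type: str, materiality: float) -> bool:
    """True when an 8-K classification should start the debate flow."""
    return (
        event_type in MATERIAL_EVENT_TYPES
        and materiality >= MATERIALITY_THRESHOLD
    )


@dataclass(frozen=True)
class DebateOutcome:
    action: str
    conviction: float
    stalemate: bool


def apply_stalemate_rule(
    action: str, conviction: float, red_team_confidence: float
) -> DebateOutcome:
    """Override the synthesis action with "pass" when the debate stalemates."""
    stalemate = (
        red_team_confidence > STALEMATE_RED_TEAM_MIN
        and conviction < STALEMATE_CONVICTION_MAX
    )
    return DebateOutcome("pass" if stalemate else action, conviction, stalemate)
```

Both stalemate conditions must hold together: a confident red team alone does not block a high-conviction synthesis, and a low-conviction synthesis is left to the downstream conviction gate.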
Execution. If conviction >= 0.60 and all kill switches pass, event_trader sizes a BUY position at 5% of NAV (capped at 10%), fetches the latest close price from price_history, and submits a market order to Alpaca paper. Positions are tracked via a thesis_id FK in portfolio_positions and closed after 21 trading days. The monthly factor rebalance runs as a separate Alpaca order batch.
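The sizing step reduces to a few lines of arithmetic. The sketch below assumes the 10% cap is an upper bound on the sizing fraction, which is one reading of "capped at 10%"; `size_event_buy` and its parameters are hypothetical names, and price lookup and order submission are left out.

```python
from typing import Optional

CONVICTION_GATE = 0.60
TARGET_FRACTION = 0.05  # 5% of NAV per event trade
MAX_FRACTION = 0.10     # assumed hard cap on the sizing fraction


def size_event_buy(
    nav: float,
    last_close: float,
    conviction: float,
    kill_switches_ok: bool,
    target_fraction: float = TARGET_FRACTION,
) -> Optional[dict]:
    """Return notional and share count for a BUY, or None if gated out."""
    if conviction < CONVICTION_GATE or not kill_switches_ok:
        return None
    if nav <= 0 or last_close <= 0:
        raise ValueError("nav and last_close must be positive")
    fraction = min(target_fraction, MAX_FRACTION)
    notional = nav * fraction
    # Fractional shares are allowed, matching the fractional quantity
    # shown in the README's example table.
    return {"notional": notional, "shares": notional / last_close}
```

With a NAV of $1,073,633 and conviction 0.62, the notional comes out at about $53,682, matching the example in this README; the share count depends on the exact close price, which the example gives only approximately.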
Live BUY example. On 2026-05-24, a synthetic supply-chain filing (NVDA securing a $48B TSMC wafer agreement) produced:
| Stage | Output |
|---|---|
| Haiku classification | event_type=supply_chain, materiality=0.95, affected=[NVDA, TSM] |
| NetworkAnalyst | SUPPORT, confidence=0.82 |
| ContagionMonitor | FSI=-0.21 (benign macro) |
| RedTeam | Challenged execution risk; did not stalemate |
| Synthesis | action=buy, conviction=0.62 |
| Position sizing | NAV $1,073,633 x 5% = $53,682 notional, 249.45 shares at ~$215 |
| Alpaca order | order_id 5a3fdb2b, status accepted |
The filing was synthetic (injected for pipeline validation). The Alpaca order submission, kill switch checks, and position sizing are real production behavior.
The same day, a real NVDA earnings guidance 8-K produced conviction=0.38 with a stalemate override (red team confidence=0.74) and correctly submitted no order.
| Metric | Value |
|---|---|
| Universe | 140 US-listed tickers, technology supply chain |
| Supply chain edges | 1,659 NLP-extracted from 667 10-K filings |
| Institutional ownership edges | 34,843 across 22 funds x 140 companies x 20 quarters |
| Price history | 283,150 rows, 2018-01-01 to present |
| Insider transactions | 190,020 rows, 2018-2026 |
| EDGAR XBRL fundamentals | 74,662 rows, 137/140 tickers |
| HGT link prediction | val AUC 0.9807 (retrained on 140-node graph, epoch 280) |
| Factors tested | 11 total; 1 passes HLZ M=400 Bonferroni (fundamental_margin_compression t=+4.834 at 126d) |
| Walk-forward backtest | CAGR +8.72%, Sharpe 0.488, Max DD -32.68% |
| Tests | 193 passing |
| Alembic migrations | 13 revisions |
| Tier | Monthly cost | What you get |
|---|---|---|
| Free | $0 | SEC EDGAR, FRED, yfinance (dev), Alpaca paper trading, Anthropic API (~$5-10/month at production volume) |
| Real-time prices | ~$30 | Polygon.io starter (replace yfinance in production) |
| Point-in-time fundamentals | ~$50 | Sharadar (currently gated behind USE_SHARADAR=true) |
Eleven factors were tested using HLZ multiple-testing correction at M=400 (Bonferroni threshold |t| >= 3.78). Every null result is documented in docs/progress/ with full IC tables, regime splits, and bootstrap confidence intervals. An earlier Phase 6 backtest produced CAGR +14.3% and Sharpe 0.554; it was explicitly rejected once it was found that full-sample factor weights had been applied at every rebalance, which amounts to approximately five years of look-ahead at the start of the window. The corrected result is CAGR +8.72%, Sharpe 0.488. The Phase 6 numbers do not appear anywhere in the current codebase as valid results. Three factors (momentum_12_1, size_proxy, gnn_drift) were flagged as declining by the trailing IC decay detector and removed from the aggregator before the corrected backtest was run.
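The look-ahead problem described above is easiest to see side by side. The sketch below is a toy illustration, not the NEXUS aggregator: it assumes a factor weight is the mean of its past information coefficients (IC), and the IC series is made up for the example.

```python
from datetime import date
from statistics import fmean


def full_sample_weight(ic_history):
    """Biased: averages every observation, including future ones."""
    return fmean(ic for _, ic in ic_history)


def walk_forward_weight(ic_history, rebalance_date):
    """Average only observations dated strictly before the rebalance."""
    usable = [ic for obs_date, ic in ic_history if obs_date < rebalance_date]
    # With no history yet, give the factor zero weight.
    return fmean(usable) if usable else 0.0


# Hypothetical IC series for one factor: weak early, strong late.
history = [
    (date(2019, 1, 31), 0.00),
    (date(2020, 1, 31), 0.02),
    (date(2021, 1, 31), 0.06),
    (date(2022, 1, 31), 0.08),
]

rebalance = date(2020, 6, 30)
biased = full_sample_weight(history)              # uses 2021 and 2022 data
honest = walk_forward_weight(history, rebalance)  # uses 2019 and 2020 only
```

At the mid-2020 rebalance the full-sample weight already reflects the strong 2021 and 2022 observations, which is the kind of leak that inflates an early-window backtest.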
flowchart TB
subgraph sources[External sources]
EDGAR[SEC EDGAR\n10-K / 13F / DEF14A\n8-K / SC TO-T / S-4]
FRED[FRED\nCP/EFFR, BAA10Y]
YF[yfinance\nOHLCV - dev only]
end
subgraph storage[Storage]
TS[(TimescaleDB\nprice_history hypertable)]
RD[(Redis\nedgar:filings queue\nedgar:seen_accessions)]
end
subgraph graph[Knowledge graph]
SE[supply_edges\n1,659]
OE[ownership_edges\n34,843]
BE[board_edges\n326]
ME[macro_edges]
CE[centrality_history]
end
subgraph signals[Signal layer]
FSI[Funding Stress Index\nspread-only PCA]
HGT[HGT embeddings\nval AUC 0.9807]
FAC[Factor library\n1/11 pass HLZ]
end
subgraph event[Event pipeline - Phase 11]
CLF[8-K Classifier\nHaiku + materiality gate]
DEB[5-agent debate\nNetworkAnalyst + EventMechanic\nContagionMonitor + RedTeam + Synthesis]
ET[event_trader\nconviction gate + kill switches]
end
subgraph exec[Monthly factor execution]
AGG[Composite alpha aggregator]
PORT[Portfolio constructor\nK=10, inv-vol, crowding cap]
REB[Monthly rebalance\ncheck_open_positions first]
end
ALP[(Alpaca paper)]
EDGAR --> TS & RD
FRED --> TS
YF --> TS
TS --> SE & OE & BE & ME
SE & OE & BE & ME --> CE
CE --> HGT
TS --> FSI & FAC
HGT --> FAC
RD --> CLF --> DEB --> ET --> ALP
FAC --> AGG --> PORT --> REB --> ALP
Two database session patterns coexist deliberately: async (AsyncSessionLocal over asyncpg) for API endpoints and live consumers, sync (psycopg2) for CLI scripts and signal computation. Do not mix them in the same module.
- Python 3.11
- `uv` for dependency management
- Docker Desktop (TimescaleDB + Redis)
- SEC EDGAR user agent string (name + email, required by SEC fair-access policy)
- Anthropic API key (required to activate the debate pipeline -- Claude Haiku for classification, Claude Sonnet for RedTeam and Synthesis)
- Optional: FRED API key for the Funding Stress Index; Alpaca paper trading keys for live order submission
# Clone and enter the repo
git clone https://github.com/shiviancodes/nexus.git
cd nexus
# Configure secrets
cp .env.example .env
# Edit .env -- set POSTGRES_PASSWORD, EDGAR_USER_AGENT, ANTHROPIC_API_KEY
# Optional: FRED_API_KEY, ALPACA_API_KEY, ALPACA_SECRET_KEY, ALPACA_PAPER=true
# Start infrastructure
docker compose up -d
# nexus_timescaledb on 127.0.0.1:5432
# nexus_redis on 127.0.0.1:6379
# Install Python dependencies
uv sync --group dev
# Apply all 13 migrations
uv run alembic upgrade head
# Seed 140 companies and 8 years of OHLCV (one-off, ~15 min)
uv run python scripts/seed_companies.py
uv run python scripts/load_price_history.py
# Build the knowledge graph
uv run python scripts/run_supply_extraction.py --yes
uv run python scripts/run_ownership_extraction.py --yes
uv run python scripts/run_board_extraction.py --yes
uv run python scripts/run_centrality.py --yes
# Funding Stress Index (needs FRED_API_KEY)
uv run python scripts/run_fsi.py --yes
# HGT training (CPU is fine; install pytorch-cpu separately)
uv run python scripts/run_gnn_training.py
# Factor backtest
uv run python -m nexus.signals.backtest
# Monthly factor rebalance -- dry run (no Alpaca keys required)
uv run python scripts/run_alpaca_rebalance.py
# Monthly factor rebalance -- live Alpaca paper orders
uv run python scripts/run_alpaca_rebalance.py --commit
# Start the EDGAR monitor (polls all 140 CIKs every 5 min, pushes 8-K payloads to Redis)
uv run python -m nexus.events.edgar_monitor
# Start the classifier (blocking consumer; requires ANTHROPIC_API_KEY)
# Material filings automatically trigger debate + Alpaca order
uv run python -m nexus.events.classifier
# Process one queued item and exit (useful for smoke tests)
uv run python -m nexus.events.classifier --once
uv run pytest # 193 tests, no live infra required
uv run pytest tests/test_event_trader.py # Phase 11 execution wiring (11 tests)
`tests/_test_connections.py` is underscore-prefixed and excluded from the default run -- it requires live infrastructure.
PyTorch is not in the main dependency set; install separately via the pytorch-cpu index when working on GNN components.
nexus/
├── api/ FastAPI app -- /health, /status, /graph/* endpoints
├── config/ pydantic-settings, 140-ticker universe, CIK map
├── data/ EDGAR client, 10-K/13F/DEF14A/XBRL parsers, FRED client, yfinance wrapper
├── db/ SQLAlchemy models, async + sync session factories
├── events/ EDGAR real-time monitor, 8-K classifier, M&A tracker
├── graph/ Centrality, sovereign/macro nodes, HGT dataset + model
├── signals/ FSI, 11 factors, HLZ correction, decay, crowding, causal validation
├── agents/ 5-agent debate flow (network/event/contagion -> red team -> synthesis)
├── execution/ Aggregator, portfolio constructor, event_trader, AlpacaClient, compliance
├── backtesting/ Point-in-time integrity helpers (assert_point_in_time_safe)
├── monitoring/ Run-time observability hooks
└── nlp/ Shared NLP utilities for the EDGAR text pipeline
migrations/ 13 Alembic revisions
Includes: price_history, supply/ownership/board/macro edges,
paper_trades, portfolio_nav, portfolio_positions (thesis_id FK),
signal_registry, thesis_log, fund_strategy, insider_transactions
scripts/ Operational runners (seed, extract, train, backtest, rebalance)
tests/ 193 unit tests; no live infrastructure required
docs/ Per-phase build logs, decisions, null results (gitignored -- local only)
models/ GNN checkpoints (hgt_link_pred_v2.pt -- val AUC 0.9807)
frontend/ React + Sigma.js knowledge graph explorer
The following rules are non-negotiable and enforced in code and review:
- Point-in-time integrity is absolute. `assert_point_in_time_safe()` guards every backtest query. No row with `available_as_of > decision_date` is permitted. The Phase 6 look-ahead bug was caught by this invariant and forced a rerun.
- Tier B factors are gated. Until `USE_SHARADAR=true`, only free Tier A data is used. Never approximate Tier B factors with non-point-in-time inputs.
- Kill switches are mandatory. Every Alpaca order submission -- event-driven or rebalance -- checks `check_kill_switches()` before any order reaches the broker. Drawdown, position loss, FSI breach, and EDGAR rate limit are all hard stops.
- External text is untrusted. EDGAR filing text, news, social data -- treated as adversarial input. Filing content passed to Claude is capped at 30,000 characters and never executed.
- yfinance is development only. All production code paths read from `price_history`.
- No look-ahead bias, ever. Flag immediately if discovered. Do not work around silently.
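The point-in-time rule in the list above amounts to a single comparison per row. The sketch below is a stand-in for the real `assert_point_in_time_safe()`, whose signature is not shown in this README; the `check_point_in_time` name, the row shape, and the `LookAheadError` type are all assumptions made for illustration.

```python
from datetime import date
from typing import Iterable, Mapping


class LookAheadError(Exception):
    """Raised when a row was not yet available on the decision date."""


def check_point_in_time(
    rows: Iterable[Mapping[str, date]], decision_date: date
) -> int:
    """Verify every row satisfies available_as_of <= decision_date.

    Returns the number of rows checked so callers can log coverage.
    """
    checked = 0
    for index, row in enumerate(rows):
        available = row["available_as_of"]
        if available > decision_date:
            raise LookAheadError(
                f"row {index} available {available} is after "
                f"decision date {decision_date}"
            )
        checked += 1
    return checked
```

Failing on the first offending row, instead of filtering it out, matches the rule that look-ahead must be flagged and never silently worked around.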
MIT -- see LICENSE.