NEXUS

NEXUS reads SEC filings from 140 technology companies, extracts supply chain and ownership relationships into a live knowledge graph, and routes every material corporate event through a 5-agent adversarial AI debate before submitting a trade.

[Figure: Knowledge graph explorer, NVDA selected, showing supply chain and institutional ownership edges]

What it does

Ingestion. NEXUS polls EDGAR continuously for new 10-K, 13F, DEF14A, 8-K, SC TO-T, and S-4 filings across 140 tickers. Each filing type feeds a different extraction pass: 10-Ks yield supply chain relationships (NLP), 13Fs yield institutional ownership positions (iXBRL parser), DEF14As yield board interlocks (NLP), 8-Ks feed the event classifier.

Knowledge graph. Extracted relationships are stored in a typed edge schema: supply edges (1,659), ownership edges (34,843), board edges (326), and macro edges (8 rate-setting links to sovereign nodes). A Heterogeneous Graph Transformer trained on this graph produces node embeddings used as input features for the signal layer.

Signal layer. Eleven factors were tested under HLZ (Harvey, Liu, and Zhu) multiple-testing correction (M=400, Bonferroni). One passes all six red-team tests: fundamental_margin_compression at t=+4.834 on the 126-day horizon. A Funding Stress Index (spread-only PCA(1), 64% variance explained) overlays macro regime context on every rebalance.
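Below is a minimal sketch of a spread-only PCA(1) index like the FSI described above. Several details are assumptions: the inputs (the architecture diagram lists CP/EFFR and BAA10Y from FRED), the z-scoring, and the sign convention. The sketch uses pure-Python power iteration rather than a library PCA so that it runs without dependencies.

```python
import math


def _standardize(series: list[float]) -> list[float]:
    """z-score one spread series (sample standard deviation)."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / (n - 1))
    return [(x - mean) / sd for x in series]


def first_principal_component(
    spreads: list[list[float]], iters: int = 500
) -> tuple[list[float], float, list[float]]:
    """PC1 of standardized spread series via power iteration.

    spreads: one list per spread (e.g. CP minus EFFR, BAA10Y), aligned by date.
    Returns (loadings, explained_variance_ratio, index_values).
    """
    z = [_standardize(s) for s in spreads]
    k, n = len(z), len(z[0])
    # Correlation matrix of the spreads (standardized data, so covariance == correlation).
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1) for j in range(k)] for i in range(k)]

    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the leading eigenvalue; the trace of a correlation matrix is k.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    explained = eigenvalue / k

    # Fix the sign so that wider spreads push the index up (higher = more funding stress).
    if sum(v) < 0:
        v = [-x for x in v]
    index = [sum(v[i] * z[i][t] for i in range(k)) for t in range(n)]
    return v, explained, index
```

In a backtest, the loadings and standardization parameters should be re-estimated using only spreads that were available at each rebalance date. Fitting them on the full sample would reintroduce look-ahead.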

Adversarial debate. Every 8-K filing classified as material (event type supply chain, acquisition, or guidance, with a materiality score >= 0.60) triggers a Prefect 3 flow that runs five AI agents in three stages: NetworkAnalyst, EventMechanic, and ContagionMonitor run in parallel; RedTeam then challenges all three; and Synthesis produces a final action and conviction score. A stalemate rule forces a pass (no trade) when red team confidence exceeds 0.70 and synthesis conviction is below 0.50.
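The materiality gate and the stalemate override reduce to two small predicates. The sketch below takes its thresholds from the text above. The type and function names are hypothetical.

```python
from dataclasses import dataclass

MATERIAL_EVENT_TYPES = frozenset({"supply_chain", "acquisition", "guidance"})
MATERIALITY_THRESHOLD = 0.60     # inclusive: materiality >= 0.60
STALEMATE_RED_TEAM_CONF = 0.70   # strict: red team confidence > 0.70
STALEMATE_MAX_CONVICTION = 0.50  # strict: synthesis conviction < 0.50


@dataclass(frozen=True)
class Classification:
    event_type: str
    materiality: float


@dataclass(frozen=True)
class DebateResult:
    action: str                # Synthesis output, e.g. "buy" or "pass"
    conviction: float
    red_team_confidence: float


def is_material(c: Classification) -> bool:
    """True if the 8-K should trigger the debate flow."""
    return c.event_type in MATERIAL_EVENT_TYPES and c.materiality >= MATERIALITY_THRESHOLD


def final_action(r: DebateResult) -> str:
    """Apply the stalemate override: a confident red team plus weak synthesis forces a pass."""
    if r.red_team_confidence > STALEMATE_RED_TEAM_CONF and r.conviction < STALEMATE_MAX_CONVICTION:
        return "pass"
    return r.action
```

Consider the real NVDA guidance 8-K described below (conviction 0.38, red team confidence 0.74). For that filing, `final_action` returns "pass" regardless of what Synthesis proposed.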

Execution. If conviction >= 0.60 and all kill switches pass, event_trader sizes a BUY position at 5% of NAV (capped at 10%), fetches the latest close price from price_history, and submits a market order to Alpaca paper. Positions are tracked via a thesis_id FK in portfolio_positions and closed after 21 trading days. The monthly factor rebalance runs as a separate Alpaca order batch.

Live BUY example. On 2026-05-24, a synthetic supply-chain filing (NVDA securing a $48B TSMC wafer agreement) produced:

Stage	Output
Haiku classification	event_type=supply_chain, materiality=0.95, affected=[NVDA, TSM]
NetworkAnalyst	SUPPORT, confidence=0.82
ContagionMonitor	FSI=-0.21 (benign macro)
RedTeam	Challenged execution risk; did not stalemate
Synthesis	action=buy, conviction=0.62
Position sizing	NAV $1,073,633 x 5% = $53,682 notional, 249.45 shares at ~$215
Alpaca order	order_id 5a3fdb2b, status accepted

The filing was synthetic (injected for pipeline validation). The Alpaca order submission, kill switch checks, and position sizing are real production behavior.

The same day, a real NVDA earnings guidance 8-K produced conviction=0.38 with a stalemate override (red team confidence=0.74) and correctly submitted no order.

Key numbers

Metric	Value
Universe	140 US-listed tickers, technology supply chain
Supply chain edges	1,659 NLP-extracted from 667 10-K filings
Institutional ownership edges	34,843 across 22 funds x 140 companies x 20 quarters
Price history	283,150 rows, 2018-01-01 to present
Insider transactions	190,020 rows, 2018-2026
EDGAR XBRL fundamentals	74,662 rows, 137/140 tickers
HGT link prediction	val AUC 0.9807 (retrained on 140-node graph, epoch 280)
Factors tested	11 total; 1 passes HLZ M=400 Bonferroni (fundamental_margin_compression t=+4.834 at 126d)
Walk-forward backtest	CAGR +8.72%, Sharpe 0.488, Max DD -32.68%
Tests	193 passing
Alembic migrations	13 revisions

Data costs

Tier	Monthly cost	What you get
Free	$0 + API usage	SEC EDGAR, FRED, yfinance (dev), Alpaca paper trading; Anthropic API billed by usage (~$5-10/month at production volume)
Real-time prices	~$30	Polygon.io starter (replace yfinance in production)
Point-in-time fundamentals	~$50	Sharadar (currently gated behind USE_SHARADAR=true)

Research methodology

Eleven factors were tested using HLZ multiple-testing correction at M=400 (Bonferroni threshold |t| >= 3.78). Every null result is documented in docs/progress/ with full IC tables, regime splits, and bootstrap confidence intervals.

An earlier backtest (Phase 6) produced CAGR +14.3% and Sharpe 0.554. It was explicitly rejected after it was discovered that full-sample factor weights had been applied at every rebalance, which amounts to approximately five years of look-ahead at the start of the window. The corrected result is CAGR +8.72%, Sharpe 0.488, and the Phase 6 numbers do not appear anywhere in the current codebase as valid results. Before the corrected backtest was run, three factors (momentum_12_1, size_proxy, gnn_drift) were flagged as declining by the trailing IC decay detector and removed from the aggregator.
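The Bonferroni cutoff can be computed directly from M. This sketch assumes the textbook convention: two-sided, family-wise alpha = 0.05, and a normal approximation to the t distribution. The function names are illustrative.

```python
from statistics import NormalDist


def bonferroni_t_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Two-sided Bonferroni cutoff for |t|, using the normal approximation to t.

    Each of the M tests is run at alpha / M, split across both tails.
    """
    return NormalDist().inv_cdf(1.0 - alpha / (2 * num_tests))


def passes_bonferroni(t_stat: float, num_tests: int, alpha: float = 0.05) -> bool:
    """True if |t| clears the Bonferroni cutoff for num_tests tests."""
    return abs(t_stat) >= bonferroni_t_threshold(num_tests, alpha)
```

Under this convention, M=400 gives a cutoff of about 3.84, slightly above the 3.78 quoted above. The exact figure therefore depends on the alpha and tail convention in use. fundamental_margin_compression's t=+4.834 clears either threshold.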

Architecture

flowchart TB
    subgraph sources[External sources]
        EDGAR[SEC EDGAR\n10-K / 13F / DEF14A\n8-K / SC TO-T / S-4]
        FRED[FRED\nCP/EFFR, BAA10Y]
        YF[yfinance\nOHLCV - dev only]
    end

    subgraph storage[Storage]
        TS[(TimescaleDB\nprice_history hypertable)]
        RD[(Redis\nedgar:filings queue\nedgar:seen_accessions)]
    end

    subgraph graph[Knowledge graph]
        SE[supply_edges\n1,659]
        OE[ownership_edges\n34,843]
        BE[board_edges\n326]
        ME[macro_edges]
        CE[centrality_history]
    end

    subgraph signals[Signal layer]
        FSI[Funding Stress Index\nspread-only PCA]
        HGT[HGT embeddings\nval AUC 0.9807]
        FAC[Factor library\n1/11 pass HLZ]
    end

    subgraph event[Event pipeline - Phase 11]
        CLF[8-K Classifier\nHaiku + materiality gate]
        DEB[5-agent debate\nNetworkAnalyst + EventMechanic\nContagionMonitor + RedTeam + Synthesis]
        ET[event_trader\nconviction gate + kill switches]
    end

    subgraph exec[Monthly factor execution]
        AGG[Composite alpha aggregator]
        PORT[Portfolio constructor\nK=10, inv-vol, crowding cap]
        REB[Monthly rebalance\ncheck_open_positions first]
    end

    ALP[(Alpaca paper)]

    EDGAR --> TS & RD
    FRED --> TS
    YF --> TS
    TS --> SE & OE & BE & ME
    SE & OE & BE & ME --> CE
    CE --> HGT
    TS --> FSI & FAC
    HGT --> FAC
    RD --> CLF --> DEB --> ET --> ALP
    FAC --> AGG --> PORT --> REB --> ALP

Two database session patterns coexist deliberately: async (AsyncSessionLocal over asyncpg) for API endpoints and live consumers, sync (psycopg2) for CLI scripts and signal computation. Do not mix them in the same module.

Getting started

Prerequisites

Python 3.11
uv for dependency management
Docker Desktop (TimescaleDB + Redis)
SEC EDGAR user agent string (name + email, required by SEC fair-access policy)
Anthropic API key (required to activate the debate pipeline -- Claude Haiku for classification, Claude Sonnet for RedTeam and Synthesis)
Optional: FRED API key for the Funding Stress Index; Alpaca paper trading keys for live order submission

Setup

# Clone and enter the repo
git clone https://github.com/shiviancodes/nexus.git
cd nexus

# Configure secrets
cp .env.example .env
# Edit .env -- set POSTGRES_PASSWORD, EDGAR_USER_AGENT, ANTHROPIC_API_KEY
# Optional: FRED_API_KEY, ALPACA_API_KEY, ALPACA_SECRET_KEY, ALPACA_PAPER=true

# Start infrastructure
docker compose up -d
# nexus_timescaledb on 127.0.0.1:5432
# nexus_redis on 127.0.0.1:6379

# Install Python dependencies
uv sync --group dev

# Apply all 13 migrations
uv run alembic upgrade head

Load data and run a backtest

# Seed 140 companies and 8 years of OHLCV (one-off, ~15 min)
uv run python scripts/seed_companies.py
uv run python scripts/load_price_history.py

# Build the knowledge graph
uv run python scripts/run_supply_extraction.py --yes
uv run python scripts/run_ownership_extraction.py --yes
uv run python scripts/run_board_extraction.py --yes
uv run python scripts/run_centrality.py --yes

# Funding Stress Index (needs FRED_API_KEY)
uv run python scripts/run_fsi.py --yes

# HGT training (CPU is fine; install pytorch-cpu separately)
uv run python scripts/run_gnn_training.py

# Factor backtest
uv run python -m nexus.signals.backtest

# Monthly factor rebalance -- dry run (no Alpaca keys required)
uv run python scripts/run_alpaca_rebalance.py

# Monthly factor rebalance -- live Alpaca paper orders
uv run python scripts/run_alpaca_rebalance.py --commit

Event-driven pipeline

# Start the EDGAR monitor (polls all 140 CIKs every 5 min, pushes 8-K payloads to Redis)
uv run python -m nexus.events.edgar_monitor

# Start the classifier (blocking consumer; requires ANTHROPIC_API_KEY)
# Material filings automatically trigger debate + Alpaca order
uv run python -m nexus.events.classifier

# Process one queued item and exit (useful for smoke tests)
uv run python -m nexus.events.classifier --once

Tests

uv run pytest                              # 193 tests, no live infra required
uv run pytest tests/test_event_trader.py   # Phase 11 execution wiring (11 tests)

tests/_test_connections.py is underscore-prefixed and excluded from the default run -- it requires live infrastructure.

PyTorch is not in the main dependency set; install separately via the pytorch-cpu index when working on GNN components.

Project structure

nexus/
├── api/         FastAPI app -- /health, /status, /graph/* endpoints
├── config/      pydantic-settings, 140-ticker universe, CIK map
├── data/        EDGAR client, 10-K/13F/DEF14A/XBRL parsers, FRED client, yfinance wrapper
├── db/          SQLAlchemy models, async + sync session factories
├── events/      EDGAR real-time monitor, 8-K classifier, M&A tracker
├── graph/       Centrality, sovereign/macro nodes, HGT dataset + model
├── signals/     FSI, 11 factors, HLZ correction, decay, crowding, causal validation
├── agents/      5-agent debate flow (network/event/contagion -> red team -> synthesis)
├── execution/   Aggregator, portfolio constructor, event_trader, AlpacaClient, compliance
├── backtesting/ Point-in-time integrity helpers (assert_point_in_time_safe)
├── monitoring/  Run-time observability hooks
└── nlp/         Shared NLP utilities for the EDGAR text pipeline

migrations/      13 Alembic revisions
                 Includes: price_history, supply/ownership/board/macro edges,
                 paper_trades, portfolio_nav, portfolio_positions (thesis_id FK),
                 signal_registry, thesis_log, fund_strategy, insider_transactions

scripts/         Operational runners (seed, extract, train, backtest, rebalance)
tests/           193 unit tests; no live infrastructure required
docs/            Per-phase build logs, decisions, null results (gitignored -- local only)
models/          GNN checkpoints (hgt_link_pred_v2.pt -- val AUC 0.9807)
frontend/        React + Sigma.js knowledge graph explorer

Hard constraints

These are non-negotiable and enforced in code and review:

Point-in-time integrity is absolute. assert_point_in_time_safe() guards every backtest query. No row with available_as_of > decision_date is permitted. The Phase 6 look-ahead bug was caught by this invariant and forced a rerun.
Tier B factors are gated. Until USE_SHARADAR=true, only free Tier A data is used. Never approximate Tier B factors with non-point-in-time inputs.
Kill switches are mandatory. Every Alpaca order submission -- event-driven or rebalance -- checks check_kill_switches() before any order reaches the broker. Drawdown, position loss, FSI breach, and EDGAR rate limit are all hard stops.
External text is untrusted. EDGAR filing text, news, and social data are treated as adversarial input. Filing content passed to Claude is capped at 30,000 characters and is never executed.
yfinance is development only. All production code paths read from price_history.
No look-ahead bias, ever. Flag immediately if discovered. Do not work around silently.
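The following is a sketch of the point-in-time invariant. check_point_in_time and visible_as_of are hypothetical names, not the project's assert_point_in_time_safe(), whose signature is not documented here. Rows are assumed to carry an available_as_of date.

```python
from dataclasses import dataclass
from datetime import date


class LookAheadError(AssertionError):
    """Raised when a query result contains data that was not public at decision time."""


@dataclass(frozen=True)
class Row:
    ticker: str
    value: float
    available_as_of: date  # when the data point became publicly available


def check_point_in_time(rows: list[Row], decision_date: date) -> None:
    """Raise if any row became available after decision_date."""
    leaked = [r for r in rows if r.available_as_of > decision_date]
    if leaked:
        first = leaked[0]
        raise LookAheadError(
            f"{len(leaked)} row(s) postdate {decision_date}; "
            f"first: {first.ticker} available {first.available_as_of}"
        )


def visible_as_of(rows: list[Row], decision_date: date) -> list[Row]:
    """Keep only rows that were public on or before decision_date."""
    return [r for r in rows if r.available_as_of <= decision_date]
```

Estimate factor weights at each rebalance only from rows that pass this filter. That is the property the rejected Phase 6 backtest lacked when it applied full-sample weights.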

License

MIT -- see LICENSE.
