Dissertation: Dynamic Evidence-Grounded Financial Knowledge Graph for Multi-Agent Simulated Trading
- Unified KG schema for `Entity`, `Evidence`, `Relation`, `FinancialSignal`, `TradingDecision`
- Neo4j-backed graph store and query client
- Data collection pipeline for OHLCV + news
- Evidence-grounded graph updates for technical signals and news mentions
- Experiment pipeline for graph-based decisions vs baselines
python3 -m venv .venv-linux
source .venv-linux/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Create `.env` in the project root:
NEO4J_URI=bolt://127.0.0.1:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
NEO4J_DATABASE=neo4j
TAVILY_API_KEY=your_key
LLM_API_BASE=your_llm_api_base
LLM_API_KEY=your_key
LLM_MODEL=your_model_name
LLM_TIMEOUT_SECONDS=60
LLM_TEMPERATURE=0.2

Run collection:
python -m src.main_collect

Run experiment:
python -m src.main_experiment

On Windows:
python -m venv .venv
.venv\Scripts\activate
python -m pip install --upgrade pip
pip install -r requirements.txt
python -m src.main_collect
python -m src.main_experiment

Use the same `.env` values as above.
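
The `.env` values listed above could be read into a typed settings object before collection or an experiment starts. The following is a minimal sketch only: the parser is hand-rolled, the `Settings` dataclass and `load_settings` function are hypothetical names, and the defaults simply mirror the sample values shown in this README. The project itself may load its configuration differently.

```python
from dataclasses import dataclass


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; not the project's actual settings class."""
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    neo4j_database: str
    llm_timeout_seconds: int
    llm_temperature: float


def load_settings(env: dict[str, str]) -> Settings:
    # Defaults are assumptions copied from the sample .env in this README.
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://127.0.0.1:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],  # required: no safe default
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        llm_timeout_seconds=int(env.get("LLM_TIMEOUT_SECONDS", "60")),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.2")),
    )
```

Converting the numeric values up front means a malformed `LLM_TIMEOUT_SECONDS` or `LLM_TEMPERATURE` fails at startup rather than in the middle of a run.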
- `data/raw/market_news/<ticker>/ohlcv_2021_now.parquet`
- `data/raw/market_news/<ticker>/news_latest.parquet`
- `data/experiments/trades.parquet`
- For local single-instance Neo4j, prefer `bolt://127.0.0.1:7687`.
- If Neo4j is unavailable, collection can still save raw files, but graph writes are skipped.
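
The "save raw files, skip graph writes" behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `bolt_endpoint`, `neo4j_reachable`, and `run_collection` are hypothetical names, and the check only confirms that the Bolt TCP port accepts connections. It does not verify credentials or that the server is actually Neo4j; a real client would more likely use the driver's own connectivity check.

```python
import socket
from urllib.parse import urlparse


def bolt_endpoint(uri: str) -> tuple[str, int]:
    """Extract host and port from a bolt:// URI (defaults are assumptions)."""
    parsed = urlparse(uri)
    return parsed.hostname or "127.0.0.1", parsed.port or 7687


def neo4j_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Bolt port can be opened."""
    host, port = bolt_endpoint(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_collection(uri, save_raw, write_graph, reachable=neo4j_reachable) -> bool:
    """Always save raw files; write to the graph only if Neo4j is reachable.

    Returns True when graph writes were performed, False when skipped.
    """
    save_raw()
    if not reachable(uri):
        return False
    write_graph()
    return True
```

Passing `reachable` as a parameter keeps the skip logic testable without a running database.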
- raw collected data under data/raw/...
- Neo4j graph with queryable nodes and relationships
- experiment output file data/experiments/trades.parquet
- runnable collection and experiment logs
The system supports:
- 3 Hong Kong stock tickers
- OHLCV market data ingestion
- Tavily news collection
- technical signal generation
- basic graph updates
- basic graph querying
- initial multi-agent decision generation
- initial backtest execution
Prove that the same ticker has different graph states at different as_of_date values.
# Snapshot at date A
python -m src.scripts.export_graph_snapshot \
--ticker 0700.HK --as-of 2025-01-01 \
--output data/experiments/snapshot_0700_2025-01-01.json
# Snapshot at date B
python -m src.scripts.export_graph_snapshot \
--ticker 0700.HK --as-of 2025-03-01 \
--output data/experiments/snapshot_0700_2025-03-01.json

python -m src.scripts.diff_snapshots \
--a data/experiments/snapshot_0700_2025-01-01.json \
--b data/experiments/snapshot_0700_2025-03-01.json

Output shows:
- Node count changes (signals, news, fundamentals, risks, evidences, claims)
- Added/removed signals, news, evidence, and claims between the two dates
- Final verdict: IDENTICAL or DIFFERENT
If DIFFERENT → temporal divergence is proven: the KG evolves over time for the same ticker.
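
The comparison behind the IDENTICAL/DIFFERENT verdict can be sketched as a set difference over node IDs. The snapshot layout used here is an assumption, since this README does not specify it: a top-level `nodes` list whose items carry `id` and `label`. The function names are likewise hypothetical and this is not the code of `src.scripts.diff_snapshots`.

```python
import json
from collections import Counter


def load_snapshot(path: str) -> dict:
    """Read a snapshot JSON file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def diff_snapshots(a: dict, b: dict) -> dict:
    """Compare two snapshots by node ID and report per-label count changes."""
    nodes_a = {n["id"]: n["label"] for n in a["nodes"]}
    nodes_b = {n["id"]: n["label"] for n in b["nodes"]}
    added = sorted(set(nodes_b) - set(nodes_a))
    removed = sorted(set(nodes_a) - set(nodes_b))
    counts_a = Counter(nodes_a.values())
    counts_b = Counter(nodes_b.values())
    # Keep only labels whose node count actually changed.
    count_delta = {
        label: counts_b[label] - counts_a[label]
        for label in sorted(set(counts_a) | set(counts_b))
        if counts_b[label] != counts_a[label]
    }
    verdict = "DIFFERENT" if added or removed else "IDENTICAL"
    return {"added": added, "removed": removed,
            "count_delta": count_delta, "verdict": verdict}
```

This sketch compares node identity only. Two snapshots with the same IDs but changed properties would be reported as IDENTICAL, so a stricter diff would also compare node contents.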
Full experiment protocol: EXPERIMENT_PROTOCOL.md
python -m src.main_collect

Collects OHLCV market data, news (Tavily), global news, and fundamentals for all three tickers. Writes raw parquet files to `data/raw/market_news/<ticker>/`. Also ingests the data into the Neo4j knowledge graph if Neo4j is configured.
python -m src.main_experiment

Runs four systems (`no_kg_no_evidence`, `evidence_no_kg`, `static_kg`, `kg_dynamic`) across all tickers and trade dates. Outputs:
- `data/experiments/trades_latest.csv`
- `data/experiments/trades_latest.parquet`
- `data/experiments/trades_<timestamp>.csv` (archive)
- `data/experiments/trades_<timestamp>.parquet` (archive)
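
The "latest plus timestamped archive" output pattern can be sketched for the CSV case using only the standard library. Several details are assumptions: the `write_trades_csv` helper is hypothetical, the timestamp format is invented for illustration (this README only shows `<timestamp>`), and the column names come from whatever dictionaries the caller passes in.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def write_trades_csv(rows, out_dir, now=None):
    """Write identical CSV content to trades_latest.csv and a dated archive."""
    if not rows:
        raise ValueError("no trade rows to write")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp format is an assumption, not the project's convention.
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    latest = out_dir / "trades_latest.csv"
    archive = out_dir / f"trades_{stamp}.csv"
    fieldnames = list(rows[0].keys())
    for path in (latest, archive):
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    return latest, archive
```

Keeping a stable `trades_latest` name lets downstream analysis always read the newest run, while the archive copies preserve earlier runs for comparison.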
python -m src.scripts.export_graph_snapshot \
--ticker 0700.HK --as-of 2025-03-01 \
--output data/experiments/snapshot_0700_2025-03-01.json

Exports a point-in-time KG subgraph as JSON. Diff two snapshots to prove temporal divergence:
python -m src.scripts.diff_snapshots \
--a data/experiments/snapshot_0700_2025-01-01.json \
--b data/experiments/snapshot_0700_2025-03-01.json

# JSON + Markdown case study
python -m src.scripts.export_decision_trace \
--decision-id decision:0700.HK:2025-03-01 \
--output data/experiments/trace_0700_2025-03-01.json \
--markdown-output data/experiments/trace_0700_2025-03-01.md

Exports the full provenance chain: SourceDocument → Evidence → Claim → AgentAssessment → DecisionTrace → BacktestOutcome.
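
One way to check whether an exported trace covers the whole provenance chain is to walk it link by link. The sketch below assumes a simplified in-memory form (a mapping of node ID to label plus a list of directed edges); the actual trace JSON layout is not described in this README, and `walk_chain` and `is_fully_grounded` are hypothetical helpers.

```python
CHAIN = [
    "SourceDocument", "Evidence", "Claim",
    "AgentAssessment", "DecisionTrace", "BacktestOutcome",
]


def walk_chain(nodes: dict[str, str], edges: list[tuple[str, str]], start: str) -> list[str]:
    """Follow edges from `start`, collecting node IDs while labels match CHAIN in order."""
    successors: dict[str, list[str]] = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    path: list[str] = []
    current = start
    for i, expected in enumerate(CHAIN):
        if current is None or nodes.get(current) != expected:
            break  # chain is broken at this stage
        path.append(current)
        if i + 1 == len(CHAIN):
            break  # reached the final stage
        next_label = CHAIN[i + 1]
        # Pick the first successor carrying the next expected label, if any.
        current = next(
            (d for d in successors.get(current, []) if nodes.get(d) == next_label),
            None,
        )
    return path


def is_fully_grounded(path: list[str]) -> bool:
    """True when the walk reached every stage of the chain."""
    return len(path) == len(CHAIN)
```

A walk that stops early shows exactly which stage is missing, which is useful for spotting traces that are only partially grounded.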
| Path | Format | Description |
|---|---|---|
| `data/experiments/trades_latest.csv` | CSV | Full experiment results |
| `data/experiments/trades_latest.parquet` | Parquet | Same, columnar format |
| `data/experiments/snapshot_<ticker>_<date>.json` | JSON | KG subgraph snapshot |
| `data/experiments/trace_<ticker>_<date>.json` | JSON | Decision trace |
| `data/experiments/trace_<ticker>_<date>.md` | Markdown | Human-readable case study |
- KG density: Currently dominated by technical signal nodes; news/fundamental nodes are sparser.
- Evidence grounding: SourceDocument → Evidence → Claim chain is partially implemented; some traces fall back to signal IDs.
- Decision logic: Simple weighted-average scoring with conflict detection; no sophisticated agent negotiation.
- LLM dependency: The `kg_dynamic` system requires a working LLM API endpoint.
- Neo4j dependency: Experiments require a running Neo4j instance.
- Data coverage: Yahoo Finance may have gaps around HK holidays; missing dates are handled gracefully.
- Backtest realism: No slippage, volume constraints, or market impact modeling. Transaction cost is a flat 5 bp per side.
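
Two of the points above, weighted-average scoring with conflict detection and the flat 5 bp per-side cost, can be illustrated with a short sketch. Everything beyond those two facts is an assumption: scores are taken to lie in [-1, 1], the conflict threshold of 0.3 is invented for illustration, the cost is applied as a first-order deduction of two sides per round trip, and the function names are hypothetical. The project's own scoring and backtest code may differ.

```python
COST_PER_SIDE = 0.0005  # 5 basis points, as stated in the limitations list


def aggregate(assessments, conflict_threshold=0.3):
    """Combine (score, weight) pairs into a weighted average and a conflict flag.

    A conflict is flagged when at least one assessment is clearly bullish
    and at least one is clearly bearish (threshold is an assumed value).
    """
    total_weight = sum(weight for _, weight in assessments)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    score = sum(s * w for s, w in assessments) / total_weight
    bullish = any(s >= conflict_threshold for s, _ in assessments)
    bearish = any(s <= -conflict_threshold for s, _ in assessments)
    return score, bullish and bearish


def net_return(gross_return: float, cost_per_side: float = COST_PER_SIDE) -> float:
    """Round-trip net return: entry and exit each pay the flat per-side cost.

    First-order approximation; compounding of the cost is ignored.
    """
    return gross_return - 2 * cost_per_side
```

Under these assumptions, a round trip with a 1% gross return nets about 0.9% after the 10 bp total cost, and two equally weighted agents scoring 0.8 and -0.4 yield a mildly positive average while still being flagged as conflicting.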