A comprehensive research platform investigating the relationship between R&D investment intensity and stock returns.
This project documents an economically meaningful annual R&D-intensity premium (HML_RD, Q5-Q1) and statistically significant monthly factor evidence (FF5 spanning), and provides an implementable strategy backtest (RD20 vs SPY) with explicit transaction costs.
| Metric | Value |
|---|---|
| Annual HML_RD premium (Q5-Q1) | +3.73%/yr (Newey-West p = 0.2793; 30 annual observations) |
| Monthly factor spanning (FF5 alpha) | +4.37%/yr (p = 0.0028; statistically significant) |
| Monthly Fama-MacBeth (RD coefficient) | +0.019 (p = 0.0737; marginal at 10%) |
| RD20 net premium vs SPY (after costs) | +7.52%/yr (Jul 2001 to Jun 2025; 24 July-June periods) |
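
To make the Fama-MacBeth row concrete, here is a minimal, standard-library sketch of the two-pass procedure: a cross-sectional regression of returns on R&D intensity each month, then a t-test on the time series of slopes. It is an illustration only, not the project's estimator: it is univariate (no size or other controls), uses a plain rather than HAC standard error, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def cross_sectional_slope(x, y):
    """OLS slope of y on x (with intercept) for one month's cross-section."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def fama_macbeth(months):
    """months: list of (rd_intensity_list, return_list), one pair per month.

    Returns the average monthly slope and its t-statistic, computed from
    the time-series standard deviation of the monthly slopes.
    """
    slopes = [cross_sectional_slope(x, y) for x, y in months]
    avg = mean(slopes)
    se = stdev(slopes) / math.sqrt(len(slopes))
    return avg, avg / se
```

With real data, each month would contribute one cross-section of firms; the average slope plays the role of the "RD coefficient" reported above.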
- Return Convention: July-June (Fama-French) to avoid look-ahead bias
- Universe: Point-in-time S&P 500 constituents
- Annual HML sample: Jul 1995 to Jun 2025 (30 annual observations; economic context)
- Primary inference: Monthly factor spanning tests + monthly Fama-MacBeth regressions (higher power)
- Exits / delistings (Tier-1): Cash-after-exit return construction + explicit delisting sensitivity (not CRSP dlret)
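
The cash-after-exit construction and the delisting sensitivity can be sketched as follows. This is a hedged illustration, not the project's code: the function name, the assumption that cash earns 0% after exit, and the `delisting_return` parameter are all hypothetical choices made for the example.

```python
from typing import Optional, Sequence


def holding_period_return(
    monthly_returns: Sequence[float],
    exit_month: Optional[int] = None,
    delisting_return: float = 0.0,
) -> float:
    """Compound one July-June holding period with a cash-after-exit rule.

    monthly_returns: simple monthly returns, in order.
    exit_month: 0-based index of the first month without a price, or None
        if the stock traded for the whole period.
    delisting_return: assumed one-off return applied at exit (sensitivity
        knob; 0.0 means the position is closed at the last observed price).
    """
    traded = monthly_returns if exit_month is None else monthly_returns[:exit_month]
    wealth = 1.0
    for r in traded:
        wealth *= 1.0 + r
    if exit_month is not None:
        # Apply the assumed delisting return once, then hold cash at 0%.
        wealth *= 1.0 + delisting_return
    return wealth - 1.0
```

Re-running the same period with, say, `delisting_return=-0.30` instead of `0.0` shows how much a result depends on the exit assumption.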
Features:
- Interactive R&D premium analysis
- Rolling window visualizations
- Factor spanning tests
- Implementable R&D ETF simulator
- Publication-ready research paper
Current runtime paths

- backend/ contains the active FastAPI application used by production containers.
- frontend/ contains the active React + Vite SPA served by nginx.
- deploy/ contains the production Docker Compose stack, nginx config, and SSL/certbot wiring.
- Root src/ contains older research modules retained in the repo, but it is not part of the current backend/ or frontend/ Docker build contexts.

fse-rnd-alpha/
├── backend/  # FastAPI Python backend
│   ├── app/
│   │   ├── api/routes/  # REST API endpoints
│   │   │   ├── research.py  # Core research endpoints
│   │   │   ├── portfolio.py  # ETF/portfolio endpoints
│   │   │   ├── companies.py  # Company data
│   │   │   ├── factors.py  # Factor analysis
│   │   │   ├── backtests.py  # Backtesting
│   │   │   ├── fmp.py  # FMP data proxy
│   │   │   ├── ai_analysis.py  # AI-powered analysis
│   │   │   ├── papers.py  # Paper content
│   │   │   ├── admin.py  # Admin dashboard
│   │   │   └── analytics.py  # Page tracking
│   │   ├── services/  # Business logic
│   │   │   ├── publication_snapshot.py  # Frozen research data
│   │   │   ├── rolling_window.py  # Time-series analysis
│   │   │   ├── etf_backtester.py  # ETF simulation
│   │   │   ├── factor_tests.py  # Statistical tests
│   │   │   ├── fmp_client.py  # FMP API client
│   │   │   └── rd_alpha_scorer.py  # R&D scoring
│   │   ├── core/  # Configuration and security
│   │   ├── db/  # Database session management
│   │   └── main.py  # FastAPI entry point
│   ├── requirements.txt
│   └── Dockerfile
│
├── frontend/  # React + Vite + TypeScript
│   ├── src/
│   │   ├── pages/  # Main application pages
│   │   │   ├── papers/
│   │   │   │   └── MainPaper.tsx  # Academic paper (~4000 lines)
│   │   │   ├── Whitepaper.tsx  # 11-slide investor deck
│   │   │   ├── Portfolio.tsx  # R&D ETF simulator
│   │   │   ├── Research.tsx  # Research overview
│   │   │   ├── Companies.tsx  # Company explorer
│   │   │   ├── Statistics.tsx  # Statistical analysis
│   │   │   └── Methodology.tsx  # Methodology details
│   │   ├── components/
│   │   │   ├── layout/  # Sidebar, Navbar, Footer
│   │   │   ├── ui/  # shadcn/ui components
│   │   │   ├── SafeChart.tsx  # Recharts wrapper
│   │   │   ├── InfoTooltip.tsx  # Metric explanations
│   │   │   └── TableOfContents.tsx  # Paper navigation
│   │   ├── lib/
│   │   │   ├── api.ts  # API client
│   │   │   ├── analytics.ts  # Page view tracking
│   │   │   └── utils.ts  # Utility functions
│   │   └── hooks/  # React Query hooks
│   ├── package.json
│   └── vite.config.ts
│
├── src/  # Older research toolkit retained in repo
│   ├── ai/  # AI agents for R&D extraction
│   │   ├── agents/
│   │   │   ├── rd_factor_agent.py  # R&D signal extraction
│   │   │   └── rd_factor_agent_v2.py
│   │   ├── orchestrator/  # Multi-agent coordination
│   │   ├── prompts/  # GPT prompts
│   │   ├── schemas/  # Pydantic schemas
│   │   └── utils/  # Caching, cost tracking
│   │
│   ├── backtesting/  # Backtesting engine
│   │   ├── engine.py  # Main backtest runner
│   │   ├── enhanced_engine.py  # Advanced backtesting
│   │   ├── portfolio_construction.py  # Quintile sorting
│   │   ├── statistics.py  # Statistical calculations
│   │   ├── returns_calculator.py  # Return computation
│   │   ├── regression_analysis.py  # Factor regressions
│   │   └── publication_grade/  # Academic-quality analysis
│   │       ├── factor_returns.py  # HML-RD factor
│   │       ├── inference.py  # Newey-West t-stats
│   │       ├── portfolio_engine.py  # Portfolio construction
│   │       └── universe.py  # S&P 500 management
│   │
│   ├── services/  # Business logic layer
│   │   ├── company_service.py  # Company data retrieval
│   │   ├── backtest_service.py  # Backtest execution
│   │   ├── portfolio_service.py  # ETF management
│   │   ├── price_service.py  # Price data
│   │   ├── rd_service.py  # R&D calculations
│   │   └── audit_service.py  # Audit trail
│   │
│   ├── ingestion/  # Data ingestion
│   │   ├── sec_crawler.py  # SEC EDGAR crawler
│   │   ├── xbrl_ingestor.py  # XBRL parsing
│   │   ├── xbrl_tag_mapping.py  # Tag standardization
│   │   ├── universe_builder.py  # S&P 500 constituents
│   │   └── annual_report_text_extractor.py
│   │
│   ├── factors/  # Factor definitions
│   │   └── rd/
│   │       ├── rd_numeric_engine.py  # Quantitative R&D factor
│   │       ├── rd_text_engine.py  # Text-based R&D factor
│   │       └── rd_text_engine_v2.py
│   │
│   ├── financials/  # Financial data processing
│   │   ├── canonical_schema.py  # Standardized schema
│   │   ├── normaliser.py  # Data normalization
│   │   ├── ratios.py  # Financial ratios
│   │   ├── validation.py  # Data validation
│   │   └── data_quality_scoring.py
│   │
│   ├── models/  # Data models
│   │   ├── orm/  # SQLAlchemy ORM models
│   │   │   ├── company.py  # Company metadata
│   │   │   ├── financials_core.py  # Core financial data
│   │   │   ├── financials_ratios.py  # Computed ratios
│   │   │   ├── price.py  # Stock prices
│   │   │   ├── backtest_run.py  # Backtest metadata
│   │   │   ├── text_factor_rd.py  # Text R&D signals
│   │   │   └── virtual_etf_*.py  # ETF models
│   │   └── dto/  # Data transfer objects
│   │
│   ├── api/  # Flask API (admin dash)
│   │   ├── app_factory.py  # Flask app creation
│   │   ├── blueprints/  # API blueprints
│   │   └── middleware/  # Error handling, metrics
│   │
│   ├── admin_dash/  # Plotly Dash admin dashboard
│   ├── user_dash/  # Plotly Dash user dashboard
│   │
│   ├── db/  # Database utilities
│   │   ├── connection.py  # Connection management
│   │   ├── health.py  # Health checks
│   │   └── transaction_safety.py
│   │
│   ├── logging/  # Structured logging
│   ├── monitoring/  # Metrics and Sentry
│   ├── utils/  # Utility functions
│   └── tests/  # Test suite
│
├── scripts/  # Data pipeline scripts
│   ├── ingest_fmp_ultimate.py  # FMP data ingestion
│   ├── ingest_ff_factors.py  # Fama-French factors
│   ├── ingest_sp500_historical.py  # S&P 500 history
│   ├── ingest_wrds_tier2.py  # WRDS/CRSP data
│   ├── compute_july_june_returns.py  # Return calculation
│   ├── compute_rd_factors.py  # R&D factor computation
│   ├── crawl_sec_filings.py  # SEC crawler
│   ├── reproduce_publication.sh  # Full reproduction
│   └── init_db.py  # Database setup
│
├── deploy/  # Production deployment
│   ├── docker-compose.yml  # Service orchestration
│   ├── nginx.conf  # Reverse proxy
│   ├── deploy.sh  # Deployment script
│   └── frontend/dist/  # Mounted to nginx
│
├── papers/  # Research paper drafts
│   ├── METHODOLOGY.md
│   ├── paper_1_rd_returns.md
│   ├── paper_2_industry_analysis.md
│   ├── paper_3_multifactor.md
│   └── paper_4_fundamental.md
│
├── docs/  # Additional documentation
│   ├── api.md
│   ├── database.md
│   └── DATA_ACQUISITION.md
│
├── config/  # Configuration files
│   ├── settings.py  # App settings
│   ├── logging.yml  # Logging config
│   └── universe.yml  # Universe definitions
│
├── data/  # Data files
│   ├── exports/  # Exported datasets
│   └── reference/  # Reference data
│
├── migrations/  # Alembic migrations
│
├── DATA_AVAILABILITY.md  # Data sources & replication
├── DATA_PROVENANCE.md  # Data collection methods
├── DEPLOYMENT_GUIDE.md  # Deployment instructions
├── FSE_RND_ALPHA_HANDOFF.md  # Complete handoff docs
├── requirements.txt  # Python dependencies
├── docker-compose.yml  # Local Docker setup
├── alembic.ini  # Migration config
└── pytest.ini  # Test config

- Python 3.11+
- Node.js 18+
- PostgreSQL 15+
- Redis (optional, for caching)
# Clone repository
git clone https://github.com/vastdreams/fse-rnd-alpha.git
cd fse-rnd-alpha
# Backend setup
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
# Frontend setup (new terminal)
cd frontend
npm install
npm run dev

cd deploy
cp .env.example .env
# Edit .env with your database credentials
docker compose up -d

| Source | Description | Tier |
|---|---|---|
| Financial Modeling Prep | Fundamentals, prices | Tier 1 |
| Ken French Data Library | Factor returns | Tier 1 |
| SEC EDGAR | 10-K filings | Tier 1 |
| CRSP/Compustat | Premium data (optional) | Tier 2 |
- Quintile Sorting: Firms ranked by R&D/Revenue annually (June)
- HML-RD Factor: Q5 (High R&D) minus Q1 (Low R&D) returns
- Inference: Newey-West HAC standard errors (lag=1)
- Robustness: Factor spanning, size controls, delisting sensitivity
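
The sorting and inference steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code in src/backtesting/: buckets are equal-weighted, any remainder firms fall into the middle buckets, and the function names are hypothetical. The Newey-West statistic uses Bartlett weights with lag 1, matching the lag stated above.

```python
import math
from statistics import mean


def quintile_spread(rd_intensity, returns, n_buckets=5):
    """Top-bucket minus bottom-bucket equal-weighted return for one year.

    rd_intensity, returns: dicts keyed by ticker (R&D/Revenue at formation,
    and the subsequent July-June return).
    """
    ranked = sorted((t for t in rd_intensity if t in returns),
                    key=lambda t: rd_intensity[t])
    size = len(ranked) // n_buckets
    if size == 0:
        raise ValueError("need at least n_buckets firms")
    low, high = ranked[:size], ranked[-size:]
    return mean(returns[t] for t in high) - mean(returns[t] for t in low)


def newey_west_tstat(series, lags=1):
    """t-statistic of the mean using a Bartlett-kernel HAC variance."""
    n = len(series)
    mu = mean(series)
    e = [v - mu for v in series]
    lrv = sum(v * v for v in e) / n  # lag-0 autocovariance
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)
        gamma_k = sum(e[t] * e[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma_k
    return mu / math.sqrt(lrv / n)
```

Calling `quintile_spread` once per formation year yields the annual HML-RD series, which can then be passed to `newey_west_tstat`.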
Fiscal Year End: Dec 31, 2023
10-K Filed By: Mar 31, 2024
Portfolio Formation: June 30, 2024
Holding Period: July 1, 2024 to June 30, 2025
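
The timeline above can be expressed as a small date helper. This sketch assumes the usual Fama-French rule (accounting data for a fiscal year ending in calendar year t is used at the end of June of year t+1); the function name and the returned keys are hypothetical.

```python
from datetime import date


def formation_schedule(fiscal_year_end: date) -> dict:
    """Map a fiscal year end to formation and holding-period dates.

    Assumes data for a fiscal year ending in calendar year t becomes
    usable at the June 30 formation date of year t + 1.
    """
    t = fiscal_year_end.year
    return {
        "formation": date(t + 1, 6, 30),
        "holding_start": date(t + 1, 7, 1),
        "holding_end": date(t + 2, 6, 30),
    }
```

For the example above, `formation_schedule(date(2023, 12, 31))` gives a June 30, 2024 formation date and a July 1, 2024 to June 30, 2025 holding period.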
- Chan, Lakonishok & Sougiannis (2001) - R&D and stock returns
- Fama & French (1993, 2015) - Factor models
- Shumway (1997) - Delisting bias correction
| Endpoint | Description |
|---|---|
| GET /api/research/publication-snapshot | Frozen research results |
| GET /api/research/quintile-performance/{window} | Returns by quintile |
| GET /api/research/rolling-windows/{window} | Time-varying premium |
| GET /api/research/aggregate-anova | Statistical tests |
| GET /api/research/fama-macbeth/{window} | Fama-MacBeth regression |

| Endpoint | Description |
|---|---|
| GET /api/portfolio/etf-holdings | Current R&D ETF holdings |
| GET /api/portfolio/sector-weights | Sector allocation |
| GET /api/portfolio/all-candidates | All candidate stocks |
| GET /api/portfolio/forecast-vs-actual | Forecast performance |

| Endpoint | Description |
|---|---|
| GET /api/research/export/cohort-data.csv | Full research cohort |
| GET /api/research/export/quintile-performance.csv | Quintile returns |
| GET /api/research/export/rolling-windows.csv | Rolling window data |
| GET /api/research/export/methodology-parameters.json | Methodology params |

Full API documentation: /docs (Swagger UI)
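
A minimal client sketch for the research endpoints, using only the standard library. The base URL assumes the local uvicorn server from the setup steps (port 8000); the helper names are hypothetical, and the accepted values for `{window}` are not documented here, so check /docs before relying on any particular value.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def research_url(endpoint: str, window: Optional[str] = None,
                 base_url: str = BASE_URL) -> str:
    """Build a URL such as /api/research/fama-macbeth/{window}."""
    path = "/api/research/" + endpoint
    if window is not None:
        path += "/" + quote(str(window), safe="")
    return base_url.rstrip("/") + path


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (raises on HTTP errors)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the backend running, `fetch_json(research_url("publication-snapshot"))` would return the frozen research results as parsed JSON.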
The AI layer uses GPT-4 for extracting R&D signals from unstructured text:
# R&D factor extraction from 10-K chunks
from src.ai.agents.rd_factor_agent import extract_rd_from_chunk
signals = extract_rd_from_chunk(
    chunk_text="...",
    chunk_id="chunk_001",
    page=42,
    section="Business"
)
# Returns: RDChunkSignals(rd_mentions=5, topics=["AI", "Cloud"], tone_score=0.7)

Publication-grade backtesting with Fama-French methodology:
from src.backtesting.engine import run_backtest
from src.backtesting.specs import BacktestSpec
spec = BacktestSpec(
    factor_id="RND_v1_numeric",
    universe=["pilot_top10"],
    start_year=1995,
    end_year=2024,
    num_buckets=5,
    holding_period_years=1
)
results = run_backtest(spec)

Business logic abstraction over data access:
from src.services.company_service import get_company_details
from src.services.rd_service import calculate_rd_intensity
company = get_company_details("AAPL")
rd_intensity = calculate_rd_intensity("AAPL", 2023)

| Table | Description |
|---|---|
| companies | Company metadata (ticker, name, sector, CIK) |
| financials_core | Annual fundamentals (R&D, revenue, assets) |
| financials_ratios | Computed ratios (R&D intensity, ROE) |
| fmp_daily_prices | Tier-1 daily prices (split-adjusted close) |
| fmp_dividends | Tier-1 dividend events (ex-div dates; used with fmp_daily_prices to construct total-return proxy) |
| company_year_core | Annual company snapshots |
| text_factor_rd | Text-derived R&D signals |
| backtest_run | Backtest execution metadata |
| backtest_result | Backtest results by year/bucket |
| publication_snapshots | Frozen research results |

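
As an illustration of how split-adjusted closes and ex-dividend events can be combined into a total-return proxy, here is a minimal sketch. It assumes dividends are reinvested at the ex-date close and that dividend amounts are on the same split-adjusted basis as the prices; the function name and input shapes are hypothetical, not the schema of fmp_daily_prices or fmp_dividends.

```python
def total_return(closes, dividends=None):
    """Total-return proxy over a window of daily closes.

    closes: list of (date_str, split_adjusted_close), sorted by date.
    dividends: dict mapping ex-dividend date_str to cash per share.
    """
    dividends = dividends or {}
    wealth = 1.0
    for (_, prev_close), (day, close) in zip(closes, closes[1:]):
        # On an ex-date the holder receives the dividend in addition to
        # the (lower) ex-dividend close.
        wealth *= (close + dividends.get(day, 0.0)) / prev_close
    return wealth - 1.0
```

Without the dividend adjustment, the price-only return would understate performance for dividend-paying firms.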
# Database
POSTGRES_PASSWORD=your_secure_password
DATABASE_URL=postgresql+asyncpg://postgres:password@postgres:5432/rd_alpha
# Redis
REDIS_URL=redis://redis:6379/0
# API Keys
FMP_API_KEY=your_fmp_api_key
OPENAI_API_KEY=your_openai_key # For AI agents
# AWS (optional)
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
S3_BUCKET=fse-rnd-alpha-data
# Security
SECRET_KEY=your_secret_key
DEBUG=false

# Full FMP ingestion
python scripts/ingest_fmp_ultimate.py
# Fama-French factors
python scripts/ingest_ff_factors.py
# S&P 500 constituents
python scripts/ingest_sp500_historical.py
# SEC filings
python scripts/crawl_sec_filings.py

# Compute returns (July-June)
python scripts/compute_july_june_returns.py --data-tier tier1
# Compute R&D factors
python scripts/compute_rd_factors.py
# Generate research metrics
python scripts/compute_research_metrics.py
# Full reproduction
./scripts/reproduce_publication.sh

# Initialize
python scripts/init_db.py
# Migrations
alembic upgrade head

# Run all tests
pytest
# With coverage
pytest --cov=src --cov=backend
# Specific test file
pytest tests/unit/test_backtesting.py

- Data Availability - Data sources and access
- Data Provenance - Collection methodology
- Deployment Guide - Production setup
- Handoff Documentation - Complete project handoff
If you use this research, please cite the working paper (and optionally the open-source platform code):
@techreport{sehgal_rnd_alpha_2025,
  author = {Sehgal, Abhishek},
  title = {R\&D Alpha: Investment Intensity and Long-Term Stock Returns},
  institution = {FSE Research \& Investments Pty Ltd},
  year = {2025},
  month = {12},
  url = {https://research.finsoeasy.com/rnd-alpha-paper.pdf},
  note = {Working paper; results are pinned to a frozen publication snapshot (see PDF for snapshot ID).}
}
@software{sehgal_fse_rnd_alpha_2026,
  author = {Sehgal, Abhishek},
  title = {FSE R\&D Alpha Research Platform},
  year = {2026},
  version = {2.1.0},
  url = {https://github.com/vastdreams/fse-rnd-alpha}
}

This research is provided for educational and informational purposes only. It does not constitute investment advice. Past performance does not guarantee future results. The authors are not responsible for any investment decisions made based on this research.
MIT License - see LICENSE for details.
Contributions welcome! Please read our contributing guidelines and submit pull requests.
Built with ❤️ by Finsoeasy