vastdreams/fse-rnd-alpha


R&D Alpha: Innovation-Driven Investment Research

License: MIT | Live Demo: research.finsoeasy.com

A comprehensive research platform investigating the relationship between R&D investment intensity and stock returns.


📊 Research Summary

This project documents an economically meaningful annual R&D-intensity premium (HML_RD, Q5–Q1), which is not statistically significant over 30 annual observations, together with statistically significant monthly factor evidence (FF5 spanning). It also provides an implementable strategy backtest (RD20 vs SPY) with explicit transaction costs.

Key Findings

| Metric | Value |
|---|---|
| Annual HML_RD premium (Q5–Q1) | +3.73%/yr (Newey–West p = 0.2793; 30 annual observations) |
| Monthly factor spanning (FF5 alpha) | +4.37%/yr (p = 0.0028; statistically significant) |
| Monthly Fama–MacBeth (RD coefficient) | +0.019 (p = 0.0737; marginal at 10%) |
| RD20 net premium vs SPY (after costs) | +7.52%/yr (Jul 2001–Jun 2025; 24 July–June periods) |
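The FF5 spanning alpha in the table comes from a time-series regression of the monthly R&D factor on the five Fama–French factors; the intercept is the return the factor earns beyond its FF5 exposures. A minimal NumPy sketch of that point estimate, assuming decimal monthly returns and simple ×12 annualization (the project may annualize differently). It omits the standard error behind the reported p-value, and spanning_alpha is a hypothetical name, not a function from this repository.

```python
import numpy as np


def spanning_alpha(factor: np.ndarray, ff5: np.ndarray) -> tuple[float, float]:
    """Time-series spanning regression: factor_t = a + b'FF5_t + e_t.

    factor: (T,) monthly long-short returns (e.g. HML_RD) in decimal form.
    ff5:    (T, 5) monthly Fama-French factors (Mkt-RF, SMB, HML, RMW, CMA).
    Returns (monthly alpha, alpha annualized by simple x12 scaling).
    """
    factor = np.asarray(factor, dtype=float)
    ff5 = np.asarray(ff5, dtype=float)
    if ff5.ndim != 2 or ff5.shape[0] != factor.shape[0]:
        raise ValueError("ff5 must be a (T, k) array aligned with factor")
    # Design matrix: a column of ones (the intercept) followed by the factors.
    X = np.column_stack([np.ones(factor.shape[0]), ff5])
    coef, *_ = np.linalg.lstsq(X, factor, rcond=None)
    alpha = float(coef[0])
    return alpha, 12.0 * alpha
```

A nonzero alpha means the FF5 factors do not span the R&D factor; whether it is significant depends on the (possibly HAC) standard error of coef[0].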

Methodology Highlights

  • Return Convention: July-June (Fama-French) to avoid look-ahead bias
  • Universe: Point-in-time S&P 500 constituents
  • Annual HML sample: Jul 1995–Jun 2025 (30 annual observations; economic context)
  • Primary inference: Monthly factor spanning tests + monthly Fama–MacBeth regressions (higher power)
  • Exits / delistings (Tier-1): Cash-after-exit return construction + explicit delisting sensitivity (not CRSP dlret)
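The README does not define the cash-after-exit rule in detail. One plausible reading, sketched here as an assumption: a stock's 12 July–June monthly returns are compounded until it exits, after which the proceeds earn a constant cash rate (zero by default) for the rest of the holding year. The delist_haircut parameter is a hypothetical hook for a delisting sensitivity (for example, the −30% that Shumway (1997) suggests for performance-related delistings); the value is illustrative, not the project's setting.

```python
from typing import Optional, Sequence


def july_june_return(
    monthly_returns: Sequence[Optional[float]],
    cash_rate: float = 0.0,
    delist_haircut: float = 0.0,
) -> float:
    """Compound one July-June holding period for a single stock.

    monthly_returns holds 12 entries (July..June); None marks months after the
    stock exited. Assumption: from the first None onward the position sits in
    cash earning cash_rate per month, and delist_haircut (e.g. -0.30) is
    applied once in the exit month as a sensitivity check.
    """
    if len(monthly_returns) != 12:
        raise ValueError("expected 12 monthly returns (July through June)")
    growth = 1.0
    exited = False
    for r in monthly_returns:
        if r is None and not exited:
            exited = True
            growth *= 1.0 + delist_haircut  # one-off exit shock (0.0 = none)
        growth *= 1.0 + (cash_rate if exited else r)
    return growth - 1.0
```

Running the same portfolio with delist_haircut=0.0 and a negative haircut gives a simple bracket for how much delisting treatment matters.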

🚀 Live Demo

research.finsoeasy.com

Features:

  • Interactive R&D premium analysis
  • Rolling window visualizations
  • Factor spanning tests
  • Implementable R&D ETF simulator
  • Publication-ready research paper

📁 Repository Structure

Current runtime paths

  • backend/ contains the active FastAPI application used by production containers.
  • frontend/ contains the active React + Vite SPA served by nginx.
  • deploy/ contains the production Docker Compose stack, nginx config, and SSL/certbot wiring.
  • Root src/ contains older research modules retained in the repo, but it is not part of the current backend/ or frontend/ Docker build contexts.
fse-rnd-alpha/
├── backend/                    # FastAPI Python backend
│   ├── app/
│   │   ├── api/routes/         # REST API endpoints
│   │   │   ├── research.py     # Core research endpoints
│   │   │   ├── portfolio.py    # ETF/portfolio endpoints
│   │   │   ├── companies.py    # Company data
│   │   │   ├── factors.py      # Factor analysis
│   │   │   ├── backtests.py    # Backtesting
│   │   │   ├── fmp.py          # FMP data proxy
│   │   │   ├── ai_analysis.py  # AI-powered analysis
│   │   │   ├── papers.py       # Paper content
│   │   │   ├── admin.py        # Admin dashboard
│   │   │   └── analytics.py    # Page tracking
│   │   ├── services/           # Business logic
│   │   │   ├── publication_snapshot.py  # Frozen research data
│   │   │   ├── rolling_window.py        # Time-series analysis
│   │   │   ├── etf_backtester.py        # ETF simulation
│   │   │   ├── factor_tests.py          # Statistical tests
│   │   │   ├── fmp_client.py            # FMP API client
│   │   │   └── rd_alpha_scorer.py       # R&D scoring
│   │   ├── core/               # Configuration and security
│   │   ├── db/                 # Database session management
│   │   └── main.py             # FastAPI entry point
│   ├── requirements.txt
│   └── Dockerfile
│
├── frontend/                   # React + Vite + TypeScript
│   ├── src/
│   │   ├── pages/              # Main application pages
│   │   │   ├── papers/
│   │   │   │   └── MainPaper.tsx    # Academic paper (~4000 lines)
│   │   │   ├── Whitepaper.tsx       # 11-slide investor deck
│   │   │   ├── Portfolio.tsx        # R&D ETF simulator
│   │   │   ├── Research.tsx         # Research overview
│   │   │   ├── Companies.tsx        # Company explorer
│   │   │   ├── Statistics.tsx       # Statistical analysis
│   │   │   └── Methodology.tsx      # Methodology details
│   │   ├── components/
│   │   │   ├── layout/              # Sidebar, Navbar, Footer
│   │   │   ├── ui/                  # shadcn/ui components
│   │   │   ├── SafeChart.tsx        # Recharts wrapper
│   │   │   ├── InfoTooltip.tsx      # Metric explanations
│   │   │   └── TableOfContents.tsx  # Paper navigation
│   │   ├── lib/
│   │   │   ├── api.ts               # API client
│   │   │   ├── analytics.ts         # Page view tracking
│   │   │   └── utils.ts             # Utility functions
│   │   └── hooks/                   # React Query hooks
│   ├── package.json
│   └── vite.config.ts
│
├── src/                        # Older research toolkit retained in repo
│   ├── ai/                     # AI agents for R&D extraction
│   │   ├── agents/
│   │   │   ├── rd_factor_agent.py   # R&D signal extraction
│   │   │   └── rd_factor_agent_v2.py
│   │   ├── orchestrator/            # Multi-agent coordination
│   │   ├── prompts/                 # GPT prompts
│   │   ├── schemas/                 # Pydantic schemas
│   │   └── utils/                   # Caching, cost tracking
│   │
│   ├── backtesting/            # Backtesting engine
│   │   ├── engine.py                # Main backtest runner
│   │   ├── enhanced_engine.py       # Advanced backtesting
│   │   ├── portfolio_construction.py # Quintile sorting
│   │   ├── statistics.py            # Statistical calculations
│   │   ├── returns_calculator.py    # Return computation
│   │   ├── regression_analysis.py   # Factor regressions
│   │   └── publication_grade/       # Academic-quality analysis
│   │       ├── factor_returns.py    # HML-RD factor
│   │       ├── inference.py         # Newey-West t-stats
│   │       ├── portfolio_engine.py  # Portfolio construction
│   │       └── universe.py          # S&P 500 management
│   │
│   ├── services/               # Business logic layer
│   │   ├── company_service.py       # Company data retrieval
│   │   ├── backtest_service.py      # Backtest execution
│   │   ├── portfolio_service.py     # ETF management
│   │   ├── price_service.py         # Price data
│   │   ├── rd_service.py            # R&D calculations
│   │   └── audit_service.py         # Audit trail
│   │
│   ├── ingestion/              # Data ingestion
│   │   ├── sec_crawler.py           # SEC EDGAR crawler
│   │   ├── xbrl_ingestor.py         # XBRL parsing
│   │   ├── xbrl_tag_mapping.py      # Tag standardization
│   │   ├── universe_builder.py      # S&P 500 constituents
│   │   └── annual_report_text_extractor.py
│   │
│   ├── factors/                # Factor definitions
│   │   └── rd/
│   │       ├── rd_numeric_engine.py # Quantitative R&D factor
│   │       ├── rd_text_engine.py    # Text-based R&D factor
│   │       └── rd_text_engine_v2.py
│   │
│   ├── financials/             # Financial data processing
│   │   ├── canonical_schema.py      # Standardized schema
│   │   ├── normaliser.py            # Data normalization
│   │   ├── ratios.py                # Financial ratios
│   │   ├── validation.py            # Data validation
│   │   └── data_quality_scoring.py
│   │
│   ├── models/                 # Data models
│   │   ├── orm/                     # SQLAlchemy ORM models
│   │   │   ├── company.py           # Company metadata
│   │   │   ├── financials_core.py   # Core financial data
│   │   │   ├── financials_ratios.py # Computed ratios
│   │   │   ├── price.py             # Stock prices
│   │   │   ├── backtest_run.py      # Backtest metadata
│   │   │   ├── text_factor_rd.py    # Text R&D signals
│   │   │   └── virtual_etf_*.py     # ETF models
│   │   └── dto/                     # Data transfer objects
│   │
│   ├── api/                    # Flask API (admin dash)
│   │   ├── app_factory.py           # Flask app creation
│   │   ├── blueprints/              # API blueprints
│   │   └── middleware/              # Error handling, metrics
│   │
│   ├── admin_dash/             # Plotly Dash admin dashboard
│   ├── user_dash/              # Plotly Dash user dashboard
│   │
│   ├── db/                     # Database utilities
│   │   ├── connection.py            # Connection management
│   │   ├── health.py                # Health checks
│   │   └── transaction_safety.py
│   │
│   ├── logging/                # Structured logging
│   ├── monitoring/             # Metrics and Sentry
│   ├── utils/                  # Utility functions
│   └── tests/                  # Test suite
│
├── scripts/                    # Data pipeline scripts
│   ├── ingest_fmp_ultimate.py       # FMP data ingestion
│   ├── ingest_ff_factors.py         # Fama-French factors
│   ├── ingest_sp500_historical.py   # S&P 500 history
│   ├── ingest_wrds_tier2.py         # WRDS/CRSP data
│   ├── compute_july_june_returns.py # Return calculation
│   ├── compute_rd_factors.py        # R&D factor computation
│   ├── crawl_sec_filings.py         # SEC crawler
│   ├── reproduce_publication.sh     # Full reproduction
│   └── init_db.py                   # Database setup
│
├── deploy/                     # Production deployment
│   ├── docker-compose.yml           # Service orchestration
│   ├── nginx.conf                   # Reverse proxy
│   ├── deploy.sh                    # Deployment script
│   └── frontend/dist/               # Mounted to nginx
│
├── papers/                     # Research paper drafts
│   ├── METHODOLOGY.md
│   ├── paper_1_rd_returns.md
│   ├── paper_2_industry_analysis.md
│   ├── paper_3_multifactor.md
│   └── paper_4_fundamental.md
│
├── docs/                       # Additional documentation
│   ├── api.md
│   ├── database.md
│   └── DATA_ACQUISITION.md
│
├── config/                     # Configuration files
│   ├── settings.py                  # App settings
│   ├── logging.yml                  # Logging config
│   └── universe.yml                 # Universe definitions
│
├── data/                       # Data files
│   ├── exports/                     # Exported datasets
│   └── reference/                   # Reference data
│
├── migrations/                 # Alembic migrations
│
├── DATA_AVAILABILITY.md        # Data sources & replication
├── DATA_PROVENANCE.md          # Data collection methods
├── DEPLOYMENT_GUIDE.md         # Deployment instructions
├── FSE_RND_ALPHA_HANDOFF.md    # Complete handoff docs
├── requirements.txt            # Python dependencies
├── docker-compose.yml          # Local Docker setup
├── alembic.ini                 # Migration config
└── pytest.ini                  # Test config

🛠️ Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • PostgreSQL 15+
  • Redis (optional, for caching)

Local Development

# Clone repository
git clone https://github.com/vastdreams/fse-rnd-alpha.git
cd fse-rnd-alpha

# Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

# Frontend setup (new terminal, starting from the repository root)
cd frontend
npm install
npm run dev

Docker Deployment

cd deploy
cp .env.example .env
# Edit .env with your database credentials and API keys (see Environment Variables below)
docker compose up -d

📈 Research Methodology

Data Sources

| Source | Description | Tier |
|---|---|---|
| Financial Modeling Prep | Fundamentals, prices | Tier 1 |
| Ken French Data Library | Factor returns | Tier 1 |
| SEC EDGAR | 10-K filings | Tier 1 |
| CRSP/Compustat | Premium data (optional) | Tier 2 |

Statistical Framework

  1. Quintile Sorting: Firms ranked by R&D/Revenue annually (June)
  2. HML-RD Factor: Q5 (High R&D) minus Q1 (Low R&D) returns
  3. Inference: Newey-West HAC standard errors (lag=1)
  4. Robustness: Factor spanning, size controls, delisting sensitivity
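The sorting, spread, and inference steps above can be sketched in plain Python. Assumptions: rank ties are broken by sort order, portfolios are equal-weighted (the README does not say), and the p-value uses a normal approximation, so it will not necessarily reproduce the reported Newey–West p-values. All function names are illustrative, not taken from src/backtesting/.

```python
import math
from statistics import mean


def quintile_buckets(intensity: dict[str, float], n_buckets: int = 5) -> dict[str, int]:
    """Rank firms by R&D intensity and assign buckets 1..n_buckets (1 = lowest)."""
    ranked = sorted(intensity, key=intensity.__getitem__)
    n = len(ranked)
    return {ticker: 1 + (i * n_buckets) // n for i, ticker in enumerate(ranked)}


def hml_rd(returns: dict[str, float], buckets: dict[str, int], n_buckets: int = 5) -> float:
    """Equal-weighted return of the top bucket minus the bottom bucket (Q5 - Q1)."""
    high = [returns[t] for t, b in buckets.items() if b == n_buckets]
    low = [returns[t] for t, b in buckets.items() if b == 1]
    return mean(high) - mean(low)


def newey_west_tstat(series: list[float], lags: int = 1) -> tuple[float, float]:
    """t-statistic and two-sided normal p-value for the mean of a series,
    using a Newey-West HAC long-run variance with Bartlett weights."""
    n = len(series)
    mu = mean(series)
    e = [x - mu for x in series]
    long_run_var = sum(v * v for v in e) / n  # gamma_0
    for k in range(1, lags + 1):
        gamma_k = sum(e[t] * e[t - k] for t in range(k, n)) / n
        long_run_var += 2.0 * (1.0 - k / (lags + 1)) * gamma_k  # Bartlett weight
    t_stat = mu / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # normal approximation
    return t_stat, p_value
```

In use, one HML_RD value per July–June year would be collected into a list and passed to newey_west_tstat with lags=1, matching the lag stated above.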

Return Convention

Fiscal Year End: Dec 31, 2023
10-K Filed By: Mar 31, 2024
Portfolio Formation: June 30, 2024
Holding Period: July 1, 2024 → June 30, 2025
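The timeline above matches the standard Fama–French alignment: fundamentals for a fiscal year ending in calendar year t−1 are used at the end of June of year t and held from July t to June t+1. A small sketch under that assumption, with an optional check that the 10-K was public before formation; july_june_window is a hypothetical helper, not part of scripts/compute_july_june_returns.py.

```python
from datetime import date
from typing import Optional


def july_june_window(
    fiscal_year_end: date, filing_date: Optional[date] = None
) -> tuple[date, date, date]:
    """Return (formation date, holding start, holding end) for a fiscal year end.

    Assumption: standard Fama-French timing, where fundamentals for a fiscal
    year ending in calendar year t-1 are used at the end of June of year t.
    """
    formation = date(fiscal_year_end.year + 1, 6, 30)
    if filing_date is not None and filing_date > formation:
        # The 10-K was not yet public at formation, so using it would be look-ahead.
        raise ValueError(f"filing {filing_date} is after formation {formation}")
    holding_start = date(formation.year, 7, 1)
    holding_end = date(formation.year + 1, 6, 30)
    return formation, holding_start, holding_end
```

For the example above, july_june_window(date(2023, 12, 31), date(2024, 3, 31)) yields June 30, 2024, July 1, 2024, and June 30, 2025.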

Key References

  • Chan, Lakonishok & Sougiannis (2001) - R&D and stock returns
  • Fama & French (1993, 2015) - Factor models
  • Shumway (1997) - Delisting bias correction

🔌 API Endpoints

Research Analysis

| Endpoint | Description |
|---|---|
| GET /api/research/publication-snapshot | Frozen research results |
| GET /api/research/quintile-performance/{window} | Returns by quintile |
| GET /api/research/rolling-windows/{window} | Time-varying premium |
| GET /api/research/aggregate-anova | Statistical tests |
| GET /api/research/fama-macbeth/{window} | Fama-MacBeth regression |
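Fama–MacBeth estimation, as served by the fama-macbeth endpoint and reported in Key Findings, runs a cross-sectional regression of returns on firm characteristics each month, then averages the monthly slopes and tests that average against the time-series variability of the slopes. A NumPy sketch assuming ordinary (non-HAC) standard errors; the project's controls, weighting, and error choice are not documented here, and the units of the RD coefficient depend on how R&D intensity is scaled.

```python
import numpy as np


def fama_macbeth(panels: list[tuple[np.ndarray, np.ndarray]]) -> tuple[np.ndarray, np.ndarray]:
    """Two-pass Fama-MacBeth estimator.

    panels: one (returns, characteristics) pair per month, with returns of
    shape (n_t,) and characteristics of shape (n_t, k) or (n_t,), e.g. a
    single R&D-intensity column. Returns (mean slopes, t-statistics), each of
    length k + 1 with the intercept first.
    """
    slopes = []
    for y, X in panels:
        y = np.asarray(y, dtype=float)
        X = np.asarray(X, dtype=float).reshape(len(y), -1)
        design = np.column_stack([np.ones(len(y)), X])
        # Pass 1: one cross-sectional OLS per month.
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        slopes.append(coef)
    slopes = np.vstack(slopes)  # shape (T, k + 1)
    # Pass 2: average the monthly slopes; SE from their time-series dispersion.
    mean_slopes = slopes.mean(axis=0)
    se = slopes.std(axis=0, ddof=1) / np.sqrt(slopes.shape[0])
    return mean_slopes, mean_slopes / se
```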

Portfolio & ETF

| Endpoint | Description |
|---|---|
| GET /api/portfolio/etf-holdings | Current R&D ETF holdings |
| GET /api/portfolio/sector-weights | Sector allocation |
| GET /api/portfolio/all-candidates | All candidate stocks |
| GET /api/portfolio/forecast-vs-actual | Forecast performance |

Data Export

| Endpoint | Description |
|---|---|
| GET /api/research/export/cohort-data.csv | Full research cohort |
| GET /api/research/export/quintile-performance.csv | Quintile returns |
| GET /api/research/export/rolling-windows.csv | Rolling window data |
| GET /api/research/export/methodology-parameters.json | Methodology params |

Full API documentation: /docs (Swagger UI)
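A minimal standard-library client for the endpoints listed above. It assumes the backend started in Local Development (uvicorn on http://localhost:8000); whether the live demo host exposes the same /api paths is not stated here. Valid values for {window} are not documented in this README, so the "5y" used in the check below is purely hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the local backend from the Quick Start (uvicorn on port 8000).
DEFAULT_BASE_URL = "http://localhost:8000"


def endpoint_url(path: str, base_url: str = DEFAULT_BASE_URL, **path_params: str) -> str:
    """Fill {placeholders} in a documented endpoint path and prefix the base URL."""
    for name, value in path_params.items():
        path = path.replace("{" + name + "}", quote(str(value), safe=""))
    if "{" in path:
        raise ValueError(f"unfilled path parameter in {path!r}")
    return base_url.rstrip("/") + path


def get_json(path: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 30.0, **path_params: str):
    """GET a JSON endpoint such as /api/research/publication-snapshot."""
    with urlopen(endpoint_url(path, base_url, **path_params), timeout=timeout) as resp:
        return json.load(resp)
```

With the backend running, get_json("/api/research/publication-snapshot") returns the parsed JSON body; the CSV exports would instead be read with resp.read() and the csv module.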


🧠 Core Modules

AI Agents (src/ai/)

The AI layer uses GPT-4 for extracting R&D signals from unstructured text:

# R&D factor extraction from 10-K chunks
from src.ai.agents.rd_factor_agent import extract_rd_from_chunk

signals = extract_rd_from_chunk(
    chunk_text="...",
    chunk_id="chunk_001",
    page=42,
    section="Business"
)
# Example return value (illustrative): RDChunkSignals(rd_mentions=5, topics=["AI", "Cloud"], tone_score=0.7)
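extract_rd_from_chunk takes pre-split text with identifiers such as chunk_001. The README does not say how 10-K text is chunked; a hypothetical paragraph-based splitter producing (chunk_id, chunk_text) pairs might look like this. The size limit is an assumption, and a single paragraph longer than the limit becomes its own oversized chunk.

```python
def split_into_chunks(text: str, max_chars: int = 4000) -> list[tuple[str, str]]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    Returns (chunk_id, chunk_text) pairs with ids chunk_001, chunk_002, ...
    A paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return [(f"chunk_{i:03d}", chunk) for i, chunk in enumerate(chunks, start=1)]
```

Each pair would map onto the chunk_id and chunk_text arguments; page and section would still have to come from the filing parser.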

Backtesting Engine (src/backtesting/)

Publication-grade backtesting with Fama-French methodology:

from src.backtesting.engine import run_backtest
from src.backtesting.specs import BacktestSpec

spec = BacktestSpec(
    factor_id="RND_v1_numeric",
    universe=["pilot_top10"],
    start_year=1995,
    end_year=2024,
    num_buckets=5,
    holding_period_years=1
)
results = run_backtest(spec)
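Net-of-cost figures such as the RD20 premium depend on how trading costs are charged at each rebalance. A minimal sketch, assuming a flat proportional cost in basis points applied to every unit of portfolio weight traded (the sum of absolute weight changes) and deducted once per holding period; the project's actual cost model and cost level are not stated in this README, and net_period_return is a hypothetical helper.

```python
def net_period_return(
    gross_return: float,
    old_weights: dict[str, float],
    new_weights: dict[str, float],
    cost_bps: float,
) -> float:
    """Deduct proportional trading costs for one rebalance from a period's gross return.

    Traded weight is the sum of |new - old| across all names, so building a
    portfolio from cash ({} -> weights summing to 1) trades a weight of 1.0.
    """
    names = set(old_weights) | set(new_weights)
    traded = sum(abs(new_weights.get(n, 0.0) - old_weights.get(n, 0.0)) for n in names)
    return gross_return - traded * cost_bps / 10_000.0
```

The premium versus SPY for a period would then be this net return minus SPY's return over the same July–June window.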

Services Layer (src/services/)

Business logic abstraction over data access:

from src.services.company_service import get_company_details
from src.services.rd_service import calculate_rd_intensity

company = get_company_details("AAPL")
rd_intensity = calculate_rd_intensity("AAPL", 2023)
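calculate_rd_intensity is not shown here; presumably it returns the R&D/Revenue ratio used for the June quintile sort. A self-contained sketch of that ratio, assuming missing R&D stays missing rather than being set to zero and that non-positive revenue yields no score (both choices the README does not document).

```python
from typing import Optional


def rd_intensity(rd_expense: Optional[float], revenue: Optional[float]) -> Optional[float]:
    """R&D intensity = R&D expense / revenue for one fiscal year.

    Assumptions: a missing R&D figure stays missing (it is not set to zero),
    and firms with missing or non-positive revenue get no score.
    """
    if rd_expense is None or revenue is None or revenue <= 0:
        return None
    return rd_expense / revenue
```

Treating missing R&D as zero instead would move non-reporting firms into the low-intensity quintile, which changes the Q1 leg of HML_RD.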

🗄️ Database Schema

Key Tables

| Table | Description |
|---|---|
| companies | Company metadata (ticker, name, sector, CIK) |
| financials_core | Annual fundamentals (R&D, revenue, assets) |
| financials_ratios | Computed ratios (R&D intensity, ROE) |
| fmp_daily_prices | Tier-1 daily prices (split-adjusted close) |
| fmp_dividends | Tier-1 dividend events (ex-dividend dates; combined with fmp_daily_prices to construct a total-return proxy) |
| company_year_core | Annual company snapshots |
| text_factor_rd | Text-derived R&D signals |
| backtest_run | Backtest execution metadata |
| backtest_result | Backtest results by year/bucket |
| publication_snapshots | Frozen research results |
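fmp_dividends is combined with fmp_daily_prices to build a total-return proxy. A sketch of the usual construction, compounding (P_t + D_t) / P_{t−1} over consecutive trading days, assuming dividends are on the same split-adjusted basis as the closes and are keyed by ex-dividend date; the project's exact join and adjustments may differ, and total_return_proxy is a hypothetical helper.

```python
from datetime import date


def total_return_proxy(closes: list[tuple[date, float]], dividends: dict[date, float]) -> float:
    """Compound daily total returns over date-sorted (date, close) pairs.

    Each step uses (P_t + D_t) / P_{t-1}, where D_t is the dividend whose
    ex-dividend date is t (0.0 when there is none).
    """
    growth = 1.0
    for (_, prev_close), (day, close) in zip(closes, closes[1:]):
        growth *= (close + dividends.get(day, 0.0)) / prev_close
    return growth - 1.0
```

A dividend whose ex-date is not a trading day in the price table would be silently dropped here, so a real join should check for unmatched ex-dates.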

🔧 Environment Variables

# Database
POSTGRES_PASSWORD=your_secure_password
DATABASE_URL=postgresql+asyncpg://postgres:your_secure_password@postgres:5432/rd_alpha  # password must match POSTGRES_PASSWORD

# Redis
REDIS_URL=redis://redis:6379/0

# API Keys
FMP_API_KEY=your_fmp_api_key
OPENAI_API_KEY=your_openai_key  # For AI agents

# AWS (optional)
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
S3_BUCKET=fse-rnd-alpha-data

# Security
SECRET_KEY=your_secret_key
DEBUG=false
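These variables can be loaded into a typed settings object at startup. A hypothetical sketch using only os.environ; which variables are required is an assumption for illustration, and the backend's own configuration code (backend/app/core/) may handle this differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which variables are mandatory is a guess for illustration.
REQUIRED = ("DATABASE_URL", "FMP_API_KEY", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    fmp_api_key: str
    secret_key: str
    redis_url: Optional[str] = None       # Redis is optional (caching only)
    openai_api_key: Optional[str] = None  # only needed for the AI agents
    debug: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        fmp_api_key=env["FMP_API_KEY"],
        secret_key=env["SECRET_KEY"],
        redis_url=env.get("REDIS_URL") or None,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.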

📋 Scripts Reference

Data Ingestion

# Full FMP ingestion
python scripts/ingest_fmp_ultimate.py

# Fama-French factors
python scripts/ingest_ff_factors.py

# S&P 500 constituents
python scripts/ingest_sp500_historical.py

# SEC filings
python scripts/crawl_sec_filings.py

Research Computation

# Compute returns (July-June)
python scripts/compute_july_june_returns.py --data-tier tier1

# Compute R&D factors
python scripts/compute_rd_factors.py

# Generate research metrics
python scripts/compute_research_metrics.py

# Full reproduction
./scripts/reproduce_publication.sh

Database

# Initialize
python scripts/init_db.py

# Migrations
alembic upgrade head

🧪 Testing

# Run all tests
pytest

# With coverage
pytest --cov=src --cov=backend

# Specific test file
pytest tests/unit/test_backtesting.py

📄 Documentation


📜 Citation

If you use this research, please cite the working paper (and optionally the open-source platform code):

@techreport{sehgal_rnd_alpha_2025,
  author      = {Sehgal, Abhishek},
  title       = {R\&D Alpha: Investment Intensity and Long-Term Stock Returns},
  institution = {FSE Research \& Investments Pty Ltd},
  year        = {2025},
  month       = {12},
  url         = {https://research.finsoeasy.com/rnd-alpha-paper.pdf},
  note        = {Working paper; results are pinned to a frozen publication snapshot (see PDF for snapshot ID).}
}

@software{sehgal_fse_rnd_alpha_2026,
  author  = {Sehgal, Abhishek},
  title   = {FSE R\&D Alpha Research Platform},
  year    = {2026},
  version = {2.1.0},
  url     = {https://github.com/vastdreams/fse-rnd-alpha}
}

⚠️ Disclaimer

This research is provided for educational and informational purposes only. It does not constitute investment advice. Past performance does not guarantee future results. The authors are not responsible for any investment decisions made based on this research.


📝 License

MIT License - see LICENSE for details.


🤝 Contributing

Contributions welcome! Please read our contributing guidelines and submit pull requests.


Built with ❤️ by Finsoeasy

About

R&D Alpha: Empirical evidence on the relation between R&D investment intensity and long-term stock returns
