Firm-year generative AI exposure measures from Chinese listed-firm recruitment data, used in asset-pricing tests.
The DeepSeek API is labeling ~3.65M unique job titles (2014–2026) into the PKU 100-class occupation scheme. The job is running on AutoDL, with ~3M titles remaining.
| Metric | Value |
|---|---|
| Human-AI agreement (1,000 titles) | 94.7% |
| DeepSeek E0 vs old E0 (53 occupations) | r = 0.9186 |
| DeepSeek Ef vs canonical Ef (13,599 firm-years) | r = 0.8560 |
| Fama-MacBeth t (DeepSeek) | 1.995 |
| Canonical t | 2.035 |
| R1 [0,10] event t (DeepSeek) | 3.93 |
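
For readers unfamiliar with the Fama-MacBeth t-statistics in the table, the sketch below shows the textbook two-step procedure: a cross-sectional regression of returns on exposure in each period, then a t-test on the time series of slopes. It is only an illustration under simplifying assumptions (one regressor, no controls, plain standard errors with no Newey-West adjustment); the function names and the `panel` layout are hypothetical and are not taken from src/canonical/.

```python
import math
import statistics


def cross_sectional_slope(x, y):
    """OLS slope of y on x (with intercept) for one period's cross-section."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def fama_macbeth_t(panel):
    """Return (mean slope, t-statistic).

    panel: dict mapping period -> list of (exposure, return) pairs.
    This layout is an assumption made for the example.
    """
    slopes = []
    for period in sorted(panel):
        x, y = zip(*panel[period])
        slopes.append(cross_sectional_slope(x, y))
    mean_slope = statistics.fmean(slopes)
    # Standard error of the mean slope across periods (no autocorrelation fix).
    std_err = statistics.stdev(slopes) / math.sqrt(len(slopes))
    return mean_slope, mean_slope / std_err


# Toy data: the per-period slopes are exactly 1, 2 and 3.
toy_panel = {
    2021: [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    2022: [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)],
    2023: [(0.0, 0.0), (1.0, 3.0), (2.0, 6.0)],
}
mean_slope, t_stat = fama_macbeth_t(toy_panel)
```

With real data each period would hold one observation per firm, and the reported t-values may rest on additional controls or adjusted standard errors that this sketch omits.
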
BERT classifiers (title-only, title+category, chinese-roberta-wwm-ext) peaked at 77.4% accuracy, which is too far below DeepSeek's 94.7% human-AI agreement to be a viable alternative. Full experiment details are in E0/scripts/train_v2.py, with logs in E0/logs/.
上市公司招聘大数据2014-2026.3_cleaned.csv.gz → unique titles → DeepSeek API → merge → update jd_class2 in task candidates → rebuild E0 (occupation exposure) → rebuild Ef (firm-year exposure) → asset-pricing tests.
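
The middle of this pipeline (deduplicate titles, label each unique title once, merge labels back, aggregate to firm-year) can be sketched as follows. This is a minimal illustration, not the project's code: `label_title` is a hypothetical stand-in for the DeepSeek API call, the occupation codes are made up, and the equal-weighted mean is an assumed aggregation rule that may differ from how Ef is actually built.

```python
from collections import defaultdict


def label_title(title):
    """Hypothetical stand-in for the DeepSeek API call; returns an occupation code."""
    lookup = {"algorithm engineer": "OCC_A", "accountant": "OCC_B"}
    return lookup.get(title.strip().lower(), "OCC_UNKNOWN")


def firm_year_exposure(postings, e0):
    """Aggregate occupation exposure (E0) to firm-year exposure (Ef).

    postings: list of (firm, year, title) tuples.
    e0: dict mapping occupation code -> exposure score.
    Postings whose occupation has no E0 score are skipped.
    """
    # Step 1: label each unique title once, so repeated titles cost one call.
    unique_titles = {title for _, _, title in postings}
    title_to_occ = {title: label_title(title) for title in unique_titles}

    # Step 2: merge labels back onto postings and average E0 within firm-year.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for firm, year, title in postings:
        occ = title_to_occ[title]
        if occ in e0:
            sums[(firm, year)] += e0[occ]
            counts[(firm, year)] += 1
    return {key: sums[key] / counts[key] for key in sums}


postings = [
    ("A", 2020, "algorithm engineer"),
    ("A", 2020, "accountant"),
    ("B", 2021, "accountant"),
    ("B", 2021, "barista"),  # unlabeled occupation, skipped
]
e0 = {"OCC_A": 0.8, "OCC_B": 0.2}
ef = firm_year_exposure(postings, e0)
```

Labeling unique titles rather than raw postings is what keeps the API workload at the ~3.65M-title scale instead of the full posting count.
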
| Area | Purpose |
|---|---|
| src/canonical/ | Formal E0/Ef pipeline, asset pricing, event studies |
| jdclass_mapper/ | Title → occupation mapping (original 6-layer) |
| E0/scripts/ | DeepSeek labeling, BERT training, E0 rebuild |
| Ef_factor_asset_pricing/ | Asset pricing diagnostics |
| scripts/ | Publishing, comparison utilities |

Processed exposure files are in data/processed/exposure/: occupation_E0.csv (E0) and firm_year_Ef.csv (Ef).
See GOAL.md for full project tracking.