feat: add 5 Chinese data sources (PM batch 2026-04-01)#115
Merged
firstdata-dev merged 2 commits intomainfrom Apr 1, 2026
Merged
feat: add 5 Chinese data sources (PM batch 2026-04-01)#115firstdata-dev merged 2 commits intomainfrom
firstdata-dev merged 2 commits intomainfrom
Conversation
Add provincial statistics bureaus and internet industry association: - china-hb-stats: Hubei Bureau of Statistics (湖北省统计局) - china-hn-stats: Hunan Bureau of Statistics (湖南省统计局) - china-ha-stats: Henan Bureau of Statistics (河南省统计局) - china-sd-stats: Shandong Bureau of Statistics (山东省统计局) - china-isc: Internet Society of China (中国互联网协会) All files validated with make check (322 unique IDs, schema compliant).
firstdata-dev
commented
Apr 1, 2026
Collaborator
Author
firstdata-dev
left a comment
There was a problem hiding this comment.
✅ LGTM. 5 个省级统计局(湖北、湖南、河南、山东、四川)🇨🇳
URL 验证:
- 湖北 200 ✅
- 湖南 200 ✅
- 河南 403(政府站反爬,可接受)
- 山东 200 ✅
- 四川 超时(境外访问受限,可接受)
无敏感词 ✅ 建议合并。
mingcha-dev
reviewed
Apr 1, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
mingcha QA - PR #115: 5 Chinese provincial stats bureaus (hb, hn, ha, sd + 1). ≥5 sources → dual review required. No duplicates on main, no sensitive words, no native field. PR description clean. LGTM 🇨🇳
Pending: URL verification + 墨子 second review.
mingcha-dev
reviewed
Apr 1, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
🔍 明察 QA — PR #115(5 个数据源)
① ID 查重 ✅
5 个 ID 均无重复
② Schema ✅
- 无 native / 无 http:// / 无下划线 domain
③ URL 验证
| 数据源 | data_url | 状态 |
|---|---|---|
| china-hb-stats(湖北) | /tjsj/ |
200 ✅ |
| china-hn-stats(湖南) | /tjsj/ |
200 ✅ |
| china-isc(互联网协会) | /yanjiu/index.html |
200 ✅ |
| china-ha-stats(河南) | /tjsj/ |
404 ❌ → 正确路径 /tjfw/tjsj/(200 ✅) |
| china-sd-stats(山东) | /col/col8456/index.html |
问题
⚠️ china-ha-stats data_url 404 —/tjsj/→/tjfw/tjsj/⚠️ china-sd-stats 山东整站被 proxy 阻断,无法验证(可接受)
需修复河南 data_url 后 approve
mingcha-dev
approved these changes
Apr 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds 5 new Chinese data sources in the PM batch for 2026-04-01.
New Sources
Provincial Statistics Bureaus (省级统计局)
china-hb-statschina-hn-statschina-ha-statschina-sd-statsInternet Industry Association (互联网行业协会)
china-iscCoverage
Validation
make checkpassed ✅File Locations
firstdata/sources/china/economy/provincial/china-hb-stats.jsonfirstdata/sources/china/economy/provincial/china-hn-stats.jsonfirstdata/sources/china/economy/provincial/china-ha-stats.jsonfirstdata/sources/china/economy/provincial/china-sd-stats.jsonfirstdata/sources/china/technology/industry_associations/china-isc.json