feat: text2sql query fixes + automatic wide-table build #10
Merged
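## text2sql query fixes

- QueryExecutor: automatically loads the latest day's data from Parquet into SQLite temp tables and supports column aliases (the vol/factor_vol pair is available under both names)
- NLP Processor: adds recognition of money_flow_fields and the ROE field, fixes sort extraction ("most" / "top N"), fixes % preprocessing, and extends golden-cross pattern matching
- SQL Generator: routes technical-indicator and money-flow queries to the multi-table JOIN builder, adds golden_cross handling, and fixes sorting and the stock_name JOIN

## Automatic wide-table build

- New wide_table_builder.py: merges the latest trading day's data from daily_basic/stk_factor/moneyflow/stock_basic (317 MB → 1.4 MB)
- New wide_table_status.py: status check plus the 18:00 validation logic
- Registered as a derived job; the data center page gains a status card and a build button
- Prints the wide-table status at startup and clears the cache automatically after a successful build

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>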
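To illustrate the QueryExecutor change described in this commit, here is a minimal sketch of loading rows into a SQLite temp table while exposing one column under two names. It is not the PR's code: the function name, the `COLUMN_ALIASES` map, the `ts_code` column, and the sample rows are assumptions for illustration, and a plain list of dicts stands in for whatever the Parquet reader returns.

```python
import sqlite3

# Hypothetical alias map: source column -> extra name exposed to generated SQL.
COLUMN_ALIASES = {"vol": "factor_vol"}


def load_rows_into_temp_table(conn, table_name, rows):
    """Create a temp table from dict rows, duplicating aliased columns."""
    if not rows:
        raise ValueError("no rows to load")
    source_cols = list(rows[0].keys())
    # Each output column is (name in the temp table, name in the source row).
    out_cols = [(c, c) for c in source_cols]
    for src, alias in COLUMN_ALIASES.items():
        if src in source_cols and alias not in source_cols:
            out_cols.append((alias, src))
    col_defs = ", ".join(f'"{name}"' for name, _ in out_cols)
    placeholders = ", ".join("?" for _ in out_cols)
    conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
    conn.execute(f'CREATE TEMP TABLE "{table_name}" ({col_defs})')
    conn.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})',
        [tuple(row[src] for _, src in out_cols) for row in rows],
    )


conn = sqlite3.connect(":memory:")
rows = [  # made-up sample data
    {"ts_code": "000001.SZ", "vol": 1200.0},
    {"ts_code": "600000.SH", "vol": 800.0},
]
load_rows_into_temp_table(conn, "stock_factor", rows)
# A query may now use either column name.
result = conn.execute(
    "SELECT ts_code, factor_vol FROM stock_factor ORDER BY vol DESC"
).fetchall()
```

Duplicating the column keeps generated SQL working whichever of the two names it uses, at the cost of a little extra storage in the temp table.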
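1. Text2SQL cache invalidation: QueryExecutor.invalidate_cache() drops the SQLite temp tables that have already been loaded; it is called from both data_jobs_tasks and the API build endpoint
2. Backend 18:00 check: both the /wide-table/build and /submit paths check past_cutoff, so a build cannot be triggered early by bypassing the UI
3. net_mf_vol column: added to both the wide-table builder and the QueryExecutor column mapping

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>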
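The backend 18:00 check in item 2 could look roughly like the sketch below. Only the name `past_cutoff` and the 18:00 time come from the commit message. The `guard_build` helper, the `>=` boundary, and the use of naive local time are assumptions.

```python
from datetime import datetime, time

CUTOFF = time(18, 0)  # 18:00, the cutoff named in the commit message


def past_cutoff(now: datetime) -> bool:
    """True once the clock has reached the cutoff (boundary inclusive by assumption)."""
    return now.time() >= CUTOFF


def guard_build(now: datetime) -> None:
    """Hypothetical shared guard that both endpoints would call before building."""
    if not past_cutoff(now):
        raise PermissionError("wide-table build is only allowed after 18:00")


early = past_cutoff(datetime(2024, 1, 2, 17, 59))
on_time = past_cutoff(datetime(2024, 1, 2, 18, 0))
```

Putting the check in one function and calling it from every entry point is what stops a direct API call from getting around a check that exists only in the UI.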
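QueryExecutor checks the Parquet file's mtime before every query and reloads automatically once the file has been rebuilt, so no cross-process signal is needed. This fixes the problem that clearing the cache in the Celery worker had no effect on the web process.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>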
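A minimal sketch of the mtime check described in this commit, assuming a small helper class (`ParquetMtimeWatcher` and its method names are hypothetical, not taken from the PR). The demo writes a placeholder file rather than a real Parquet file and simulates a rebuild by bumping the mtime.

```python
import os
import tempfile


class ParquetMtimeWatcher:
    """Remembers each file's mtime at load time and reports when it changes."""

    def __init__(self):
        self._loaded = {}  # path -> st_mtime_ns recorded at load time

    def needs_reload(self, path):
        # Never loaded, or the file changed on disk since the last load.
        return self._loaded.get(path) != os.stat(path).st_mtime_ns

    def mark_loaded(self, path):
        self._loaded[path] = os.stat(path).st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stock_business.parquet")
    with open(path, "wb") as f:
        f.write(b"placeholder")  # not real Parquet data

    watcher = ParquetMtimeWatcher()
    before_load = watcher.needs_reload(path)
    watcher.mark_loaded(path)
    after_load = watcher.needs_reload(path)

    # Simulate a rebuild by another process: move the mtime forward one second.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    after_rebuild = watcher.needs_reload(path)
```

Because every process checks the file system for itself, a rebuild done by a worker process is picked up by the web process on its next query without any messaging between them.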
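The three virtual tables stock_business/stock_factor/stock_moneyflow share stock_business.parquet. mtime is now tracked independently per (table_name, parquet_path); when any virtual table is found to be stale, every loaded virtual table that shares the same source file is reset, so that JOIN queries do not mix old and new data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>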
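The group reset described here can be sketched as follows. The class and method names are hypothetical, the `index_daily` table and its file are invented for contrast, and mtimes are passed in as plain numbers so the example does not touch the file system.

```python
class SharedSourceCache:
    """Tracks load-time mtime per (table_name, parquet_path) key."""

    def __init__(self, table_sources):
        self._sources = dict(table_sources)  # virtual table -> parquet path
        self._loaded_mtime = {}  # (table_name, parquet_path) -> mtime at load

    def mark_loaded(self, table, mtime):
        self._loaded_mtime[(table, self._sources[table])] = mtime

    def loaded_tables(self):
        return sorted(table for table, _ in self._loaded_mtime)

    def refresh(self, table, current_mtime):
        """Return the tables reset because `table` was found stale."""
        path = self._sources[table]
        loaded = self._loaded_mtime.get((table, path))
        if loaded is None or loaded == current_mtime:
            return []  # not loaded yet, or still fresh
        # Reset every loaded table backed by the same file, not just this one.
        stale = [key for key in self._loaded_mtime if key[1] == path]
        for key in stale:
            del self._loaded_mtime[key]
        return sorted(t for t, _ in stale)


cache = SharedSourceCache({
    "stock_business": "stock_business.parquet",
    "stock_factor": "stock_business.parquet",
    "stock_moneyflow": "stock_business.parquet",
    "index_daily": "index_daily.parquet",  # invented unrelated table
})
cache.mark_loaded("stock_business", 100)
cache.mark_loaded("stock_factor", 100)
cache.mark_loaded("index_daily", 100)

dropped = cache.refresh("stock_factor", 200)  # source file was rebuilt
remaining = cache.loaded_tables()
```

Resetting the whole group means the next JOIN reloads both sides from the same version of the file, while tables backed by other files are left alone.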
Overview
Fixes query failures on the text2sql page and adds automatic building of the wide table (stock_business.parquet).
text2sql query fixes
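Root cause:
QueryExecutor executed SQL directly in SQLite, but the stock_business/stock_factor/stock_moneyflow tables did not exist there (the data lives in Parquet files). On top of that, the NLP processor and the SQL generator had bugs at several layers.
Changes
- app/services/text2sql_engine.py
- app/services/nlp_processor.py
- app/services/sql_generator.py
Before/after comparison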
Automatic wide-table build
Changes stock_business.parquet from 317 MB of full history to 1.4 MB covering only the latest day (about 226× smaller).
New files
- app/utils/wide_table_builder.py
- app/services/wide_table_status.py
Modified files
- registry.py: registers wide_table_builder as a derived job
- data_jobs_api.py: /wide-table/status + /wide-table/build endpoints
- data_reader.py: invalidate_stock_business_cache()
- data_jobs_tasks.py
- index.html
- data_jobs.js
- startup_runtime.py
- run.py
Features
Testing
🤖 Generated with Claude Code