fix(sync): repair leaderboard parser and cap community list to top N #29
Code Review
This pull request updates the skills catalog in the README and refactors the sync-community.sh script to rebuild the community list as a snapshot of the leaderboard (handling both additions and removals) rather than an ever-growing log. It also updates the HTML parsing logic in Node.js to handle multiple chunks safely. The review feedback highlights two important issues in scripts/sync-community.sh: a correctness bug where bracket-matching logic can fail if skill descriptions contain brackets, and a robustness issue where an empty community.json file causes jq to crash when iterating over a null value.
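The multi-chunk handling mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the regex, the `findSkillsChunk` helper, and the sample HTML are assumptions, and the only thing taken from the PR is the idea of matching each `self.__next_f.push` string on its own and picking the one that contains `initialSkills`.

```javascript
// Sketch: pick the self.__next_f.push chunk that carries initialSkills.
// Assumption: each chunk looks like self.__next_f.push([1,"<escaped string>"]).
function findSkillsChunk(html) {
  // Match one escaped string per push call. (?:[^"\\]|\\.)* stops at the
  // first unescaped quote, so a match can never run into the next chunk.
  const re = /self\.__next_f\.push\(\[1,"((?:[^"\\]|\\.)*)"\]\)/g;
  for (const m of html.matchAll(re)) {
    if (m[1].includes('initialSkills')) return m[1];
  }
  return null;
}

// Hypothetical page with two chunks; only the second holds the data.
const sampleHtml =
  '<script>self.__next_f.push([1,"0:\\"shell\\""])</script>' +
  '<script>self.__next_f.push([1,"1:{\\"initialSkills\\":[{\\"name\\":\\"a\\"}]}"])</script>';

const payload = findSkillsChunk(sampleHtml);
// The payload is still escaped; wrapping it in quotes lets JSON.parse unescape it.
const inner = JSON.parse('"' + payload + '"');
console.log(inner);
```

Matching per chunk keeps a lazy pattern from splicing two pushes together, which is the failure mode the PR describes.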
    if (m[1].includes('initialSkills')) { payload = m[1]; break; }
  }
  if (!payload) { process.exit(1); }
  const inner = JSON.parse('\"' + payload + '\"');
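Correctness issue (High): The current bracket-matching logic (lines 63-66) breaks when a string field in the initialSkills array (for example description or name) contains a square bracket [ or ]. For example, if a skill's description is "Support for [Markdown] formatting", the inner ] makes depth reach zero early, which truncates the JSON string and causes the subsequent JSON.parse to fail.

Consider changing the bracket matching to a state-aware matcher that ignores brackets inside string literals:

```javascript
let depth = 0, i = arrStart;
let inString = false, escape = false;
for (; i < inner.length; i++) {
  const char = inner[i];
  if (escape) {
    // The character after a backslash is taken literally.
    escape = false;
    continue;
  }
  if (char === '\\') {
    escape = true;
    continue;
  }
  if (char === '"') {
    inString = !inString;
    continue;
  }
  if (!inString) {
    // Only brackets outside string literals affect nesting depth.
    if (char === '[') depth++;
    else if (char === ']') { depth--; if (depth === 0) break; }
  }
}
```

(Note: because this script sits inside Bash double quotes, backslashes must be double-escaped, for example `\\`.)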
Verified — an unbalanced [ or ] inside a string field does truncate the extracted JSON (a balanced [Markdown] happens to survive, but a lone bracket breaks it). Fixed in 975aa5a with a state-aware scanner that skips brackets inside string literals (tracking inStr/esc). Regression: still parses all 600 leaderboard entries, plus synthetic cases with unbalanced ], unbalanced [, and escaped quotes all pass.
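The difference between the two matchers can be reproduced with a small standalone check. This is a sketch under stated assumptions: `sliceNaive`, `sliceStateAware`, and the synthetic inputs are hypothetical and are not the regression cases actually run against 975aa5a. The `inStr`/`esc` names follow the reply above.

```javascript
// Sketch: naive vs. state-aware bracket matching on synthetic inputs.
// Both return the substring from arrStart through the matching ']',
// or null if no matching bracket is found.
function sliceNaive(text, arrStart) {
  let depth = 0;
  for (let i = arrStart; i < text.length; i++) {
    if (text[i] === '[') depth++;
    else if (text[i] === ']' && --depth === 0) return text.slice(arrStart, i + 1);
  }
  return null;
}

function sliceStateAware(text, arrStart) {
  let depth = 0, inStr = false, esc = false;
  for (let i = arrStart; i < text.length; i++) {
    const ch = text[i];
    if (inStr) {
      if (esc) esc = false;              // character after a backslash is literal
      else if (ch === '\\') esc = true;
      else if (ch === '"') inStr = false;
      continue;                          // brackets inside strings are ignored
    }
    if (ch === '"') inStr = true;
    else if (ch === '[') depth++;
    else if (ch === ']' && --depth === 0) return text.slice(arrStart, i + 1);
  }
  return null;
}

// Hypothetical inputs: one balanced pair, two lone brackets, one escaped quote.
const cases = {
  balanced: '[{"description":"Support for [Markdown] formatting"}]',
  loneClose: '[{"description":"see item 3] for details"}]',
  loneOpen: '[{"description":"range [0, 1)"}]',
  escapedQuote: '[{"description":"a \\"quoted ]\\" word"}]',
};

const results = {};
for (const [name, json] of Object.entries(cases)) {
  const text = '{"initialSkills":' + json + ',"total":1}';
  const start = text.indexOf('[');
  results[name] = {
    naive: sliceNaive(text, start) === json,
    stateAware: sliceStateAware(text, start) === json,
  };
}
console.log(results);
```

On these inputs the naive matcher recovers only the balanced case, while the state-aware one recovers all four.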
official_names=$(jq -n \
  --slurpfile hub "$HUB_FILE" \
  --slurpfile comm "$community_file" \
  '([$hub[0].skills[].name] - [$comm[0][].name]) | unique')
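Robustness improvement (Medium): If community.json is empty (0 bytes), jq's --slurpfile comm reads it as an empty array, so $comm[0] is null. Iterating over null directly ($comm[0][].name) makes jq throw a "Cannot iterate over null" error and abort. Consider using the // [] operator to provide a safe fallback value.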
- '([$hub[0].skills[].name] - [$comm[0][].name]) | unique')
+ '([$hub[0].skills[].name] - [($comm[0] // [])[].name]) | unique')
Confirmed — a 0-byte community.json makes $comm[0] null and jq errors with "Cannot iterate over null". Fixed in 975aa5a using ($comm[0] // []).
The skills.sh RSC payload moved initialSkills into a later self.__next_f.push chunk. The old lazy regex started at the first push and spliced across chunks, pulling in unescaped quotes that broke JSON.parse, so the sync job failed with 'Page format may have changed'. Now match each push's escaped string individually and pick the chunk containing initialSkills.

Also stop the community list from growing without bound: rebuild community.json as a snapshot of the current top N instead of appending forever. Newcomers are added, skills that fell out of the top N are removed, and reviewed descriptions/tags on still-ranked entries are preserved. Reapplying the rule shrinks the list 226 -> 113.
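The snapshot rule described in this commit message can be sketched as below. The `rebuildSnapshot` helper, the entry shape, and the sample data are assumptions for illustration; the real script works on community.json with jq, and its field names may differ.

```javascript
// Sketch: rebuild the community list as a snapshot of the current top N.
// Assumptions: entries are objects keyed by `name`, and `description` and
// `tags` are the reviewed fields worth preserving.
function rebuildSnapshot(existing, leaderboard, topN) {
  const known = new Map(existing.map((entry) => [entry.name, entry]));
  return leaderboard.slice(0, topN).map((skill) => {
    const prev = known.get(skill.name);
    // Still ranked: keep the reviewed description and tags.
    // Newcomer: take the leaderboard data unchanged.
    return prev ? { ...skill, description: prev.description, tags: prev.tags } : skill;
  });
}

// Hypothetical data: 'stale' is no longer ranked and 'gamma' is below the cut.
const existing = [
  { name: 'alpha', description: 'Reviewed text', tags: ['docs'] },
  { name: 'stale', description: 'Dropped out of the ranking', tags: [] },
];
const leaderboard = [
  { name: 'alpha', description: 'raw', tags: [] },
  { name: 'beta', description: 'new entry', tags: [] },
  { name: 'gamma', description: 'below the cut', tags: [] },
];

const snapshot = rebuildSnapshot(existing, leaderboard, 2);
console.log(snapshot.map((s) => s.name));
```

Because the output is derived from the leaderboard slice rather than appended to the old list, rerunning it with the same inputs yields the same list, and entries that drop out of the top N disappear without extra bookkeeping.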
c63675e to 612c29f
Address code review on PR #29:

- State-aware bracket matcher in the RSC parser now ignores '[' / ']' inside string literals, so a skill description with an unbalanced bracket no longer truncates the extracted JSON.
- Guard official_names against a 0-byte community.json: $comm[0] is null for an empty file and iterating null crashed jq; fall back to [].
The PR audit job fails the build when a newly synced skill scores HIGH/CRITICAL in skillshare audit. Add an audit gate to the sync validation step: each newcomer's cloned directory is audited and any HIGH/CRITICAL result is skipped, so the community list never collects skills that would fail the audit job. Falls back to no filtering (with a warning) when the skillshare CLI is unavailable.

Also drop the 10 existing runcomfy AI-media skills flagged HIGH/CRITICAL (face-swap CRITICAL, the rest HIGH). They stay on the leaderboard, so a future sync re-evaluates them through the gate and keeps skipping them, which makes the loop self-healing.

Verified: local audit.sh against origin/main passes (exit 0), all newly added sources score MEDIUM or lower.
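The gate's decision logic might look roughly like this. Everything here is a hypothetical sketch: `applyAuditGate`, the `auditFn` callback, and the severity table stand in for invoking the skillshare CLI on each cloned directory, whose actual command line and output format are not shown in this PR.

```javascript
// Sketch: skip newcomers whose audit severity is HIGH or CRITICAL.
// `auditFn` is a hypothetical stand-in for auditing a cloned directory;
// it returns a severity string, and is null when the CLI is missing.
const BLOCKED = new Set(['HIGH', 'CRITICAL']);

function applyAuditGate(newcomers, auditFn) {
  if (!auditFn) {
    // Fallback from the commit message: warn and do not filter.
    console.warn('audit CLI unavailable; skipping audit gate');
    return { kept: newcomers, skipped: [] };
  }
  const kept = [], skipped = [];
  for (const skill of newcomers) {
    (BLOCKED.has(auditFn(skill.dir)) ? skipped : kept).push(skill);
  }
  return { kept, skipped };
}

// Hypothetical severities keyed by cloned directory.
const fakeSeverity = { '/tmp/a': 'LOW', '/tmp/b': 'CRITICAL', '/tmp/c': 'MEDIUM' };
const newcomers = Object.keys(fakeSeverity).map((dir) => ({ name: dir.slice(5), dir }));

const gated = applyAuditGate(newcomers, (dir) => fakeSeverity[dir]);
console.log(gated.kept.map((s) => s.name), gated.skipped.map((s) => s.name));
```

Since skipped skills are never written to the list, a later sync sees them as newcomers again and re-audits them, which is what keeps the loop self-healing.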
Skill Audit Results
Audited 6 sources across 3 repos.