Fix/typo corrections #256

Closed
eugman wants to merge 3 commits into TabularEditor:main from eugman:fix/typo-corrections

Conversation

@eugman
Collaborator

@eugman eugman commented Feb 1, 2026

I automatically scanned for typos, trained a local LLM to look for contextual errors, then had Opus apply the fixes, and manually reviewed each one.

Also added a blocking GitHub Action for common typos. We can add an inline fix button, but it would require write permissions to the PR.

eugman and others added 2 commits February 1, 2026 14:32
Spelling corrections identified and validated through a multi-stage process:
1. Automated detection using the pyspellchecker library (a minimal sketch follows this list)
2. False positive filtering via fine-tuned LLM classifier (Gemma3:4b via Ollama, GEPA-optimized)
3. Automated fixes applied by Claude Opus 4.5
4. Final human review and approval
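
Stage 1 boils down to something like the minimal sketch below (not the exact script). It assumes pyspellchecker's stock SpellChecker API; the docs root and the *.md glob are illustrative, and stages 2-4 (LLM filtering, fixes, review) are out of scope:

```python
import re
from pathlib import Path

from spellchecker import SpellChecker  # pip install pyspellchecker

spell = SpellChecker()
WORD_RE = re.compile(r"[A-Za-z']+")

def candidate_typos(md_file: Path) -> set[str]:
    """Words in one markdown file that the dictionary does not recognize."""
    words = [w.lower() for w in WORD_RE.findall(md_file.read_text(encoding="utf-8"))]
    return spell.unknown(words)

for path in Path("docs").rglob("*.md"):  # illustrative docs root
    for word in sorted(candidate_typos(path)):
        print(f"{path}: {word} -> {spell.correction(word)}")
```
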
Adds a Python-based spellcheck CI check that blocks PRs containing typos that can be flagged with 100% reliability.

Features (a condensed sketch follows this commit message):
- Precompiled regex patterns for performance
- Skips code blocks, inline code, and YAML frontmatter
- Directory pruning (os.walk) for efficiency
- Excludes localizedContent (English-only check)
- GitHub Actions annotations for inline PR feedback
- Symlink escape protection
- JSON schema validation

Files:
- scripts/ci_spellcheck.py: Main detection script
- data/common_typos.json: 32 typo patterns
- .github/workflows/spellcheck.yml: CI workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
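
A condensed, hypothetical sketch of how those features can fit together is below. It assumes data/common_typos.json holds entries shaped like {"wrong": ..., "right": ...} under a "typos" key; the real scripts/ci_spellcheck.py may be structured differently, and frontmatter skipping, symlink protection, and schema validation are omitted for brevity.

```python
import json
import os
import re

# Precompile every pattern once up front for performance.
# The file layout ({"typos": [{"wrong": ..., "right": ...}]}) is an assumption.
with open("data/common_typos.json", encoding="utf-8") as f:
    PATTERNS = [
        (re.compile(rf"\b{re.escape(entry['wrong'])}\b", re.IGNORECASE), entry["right"])
        for entry in json.load(f)["typos"]
    ]

FENCE = re.compile(r"^(`{3}|~{3})")
INLINE_CODE = re.compile(r"`[^`]*`")

def check_file(path: str) -> int:
    errors = 0
    in_fence = False
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if FENCE.match(line):
                in_fence = not in_fence  # toggle on opening/closing fences
                continue
            if in_fence:
                continue  # never spellcheck fenced code blocks
            line = INLINE_CODE.sub("", line)  # nor inline code spans
            for pattern, right in PATTERNS:
                for match in pattern.finditer(line):
                    # GitHub Actions workflow command -> inline PR annotation
                    print(f"::error file={path},line={lineno}::"
                          f"'{match.group(0)}' should be '{right}'")
                    errors += 1
    return errors

total = 0
for root, dirs, files in os.walk("."):
    # Prune directories in place; localizedContent is skipped because
    # the check is English-only.
    dirs[:] = [d for d in dirs if d not in {".git", "localizedContent"}]
    total += sum(check_file(os.path.join(root, name))
                 for name in files if name.endswith(".md"))

raise SystemExit(1 if total else 0)
```
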
@eugman
Collaborator Author

eugman commented Feb 2, 2026

Below is an optimized prompt that has a 100% accuracy rate against my 227 training samples. I used Gemma3:4b locally with this prompt to review all docs.

Contextual Spellcheck Prompt (GEPA Optimized - 100% F1)

System Instructions

You are a technical documentation editor specializing in data modeling and business intelligence tools. Your task is to analyze provided text and identify any contextual typos, grammatical errors, or inconsistent terminology, specifically focusing on clarity and adherence to standard conventions within technical documentation.

Important Considerations & Specifics:

  • File Extensions: Always capitalize file extensions (e.g., "pbix" → "PBIX", "pbit" → "PBIT").
  • Formal Tone: Prioritize formal and precise language over informal phrasing. For example, replace "lead to" with "resulted in" or "caused." Strive for direct and unambiguous phrasing.
  • Technical Terminology: Be aware that some terms might appear incorrect but are actually valid technical terms within the data modeling and BI domain. Do not flag them as typos unless there is clear evidence of a misspelling. Examples of valid technical terms not to flag include "averagex" (a DAX function). A term's validity should be confirmed by domain expertise; do not assume a term is incorrect simply because it is unfamiliar.
  • Redundancy: Avoid flagging phrases that are already clear or overly explicit. Phrases like "No data" are considered clear and do not need correction.
  • Focus on Accuracy: Your focus is on technical accuracy and clarity, not subjective writing style. Concise and unambiguous phrasing is key.
  • Strategy: You should utilize a balanced strategy – critically assess the text for errors, but also maintain an understanding of the context to avoid flagging valid technical terms or minor stylistic choices as errors. Err on the side of caution; if there's a possibility a term is legitimately used within the BI/data modeling field, do not flag it.
  • "Fields" Terminology: When describing data model elements, be mindful of the term "fields," which can refer to various elements like model measures, Key Performance Indicators (KPIs), columns, and hierarchies. Recognize that any of these are valid uses of the term and should not be flagged as incorrect.

Output Format

Return your findings as a JSON array. Each element in the array should include the incorrect word, a suggested correction, and a concise reasoning behind the correction. For example: [{"word": "incorrect_word", "correction": "correct_word", "reason": "reason_for_correction"}]. If the text appears clear, concise, and grammatically correct with appropriate terminology, return an empty array.

Few-Shot Examples

Example 1

Text: Use XMLA, where as REST is slower.
Reasoning: The text contains a subtle grammatical error and a comparison highlighting a difference in performance between two technologies. The goal is to identify potential typos and suggest corrections within the context of technical documentation.
Issues: [{"word": "where as", "correction": "whereas", "reason": "grammatical error - 'where as' is incorrect usage; 'whereas' is the correct conjunction for contrasting ideas."}, {"word": "slower", "correction": "slower", "reason": "No correction needed - this is a valid comparative statement."}]

Example 2

Text: The feature is suported in version 3.
Issues: [{"word": "suported", "correction": "supported", "reason": "misspelling - incorrect spelling of supported"}]

Example 3

Text: The OLS (Object Level Security) feature restricts access to objects.
Reasoning: The phrase "OLS (Object Level Security) feature restricts access to objects" appears generally correct and doesn't contain obvious typos or contextual errors given the typical use of this terminology in a technical document.
Issues: []

Example 4

Text: Use params to filter the results.
Issues: []
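
For completeness, the prompt can be driven against a local model through Ollama's /api/chat HTTP endpoint along these lines. The payload shape follows Ollama's documented chat API; SYSTEM_PROMPT stands in for the full prompt above, and how the docs were chunked into requests is not shown:

```python
import json

import requests

SYSTEM_PROMPT = "..."  # the full GEPA-optimized prompt shown above

def review(text: str) -> list[dict]:
    """Send one chunk of documentation text to the local model and
    return the parsed JSON array of issues (empty if none)."""
    resp = requests.post(
        "http://localhost:11434/api/chat",  # Ollama's default local port
        json={
            "model": "gemma3:4b",
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": text},
            ],
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    content = resp.json()["message"]["content"]
    try:
        issues = json.loads(content)
    except json.JSONDecodeError:
        return []  # model drifted from the JSON-array contract; skip
    return issues if isinstance(issues, list) else []

print(review("The feature is suported in version 3."))
```
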

@mlonsk
Collaborator

mlonsk commented Feb 2, 2026

Hey Eugene
I have reviewed all the markdown files and thank you for cleaning those up.
However, the workflow you built fails, and I cannot approve the PR before that is fixed :)

@eugman
Collaborator Author

eugman commented Feb 2, 2026

Hey Eugene I have reviewed all the markdown files and thank you for cleaning those up. However, the workflow you built fails, and I cannot approve the PR before that is fixed :)

Dangit! I tested it on a repo but made changes afterward. I will investigate.

@eugman
Collaborator Author

eugman commented Feb 3, 2026

@mlonsk I've addressed the issue in my code (and found a few more typos), but it looks like the build job is not designed to handle pull requests from forks. The analysis below is what Claude Code said; I have not yet had a chance to validate it manually, so it may be incorrect.

Root Cause

The AZURE_STATIC_WEB_APPS_API_TOKEN secret is not accessible. This typically happens when:

  1. PR from a fork - GitHub doesn't expose repository secrets to PRs from forks for security reasons
  2. Secret expired/deleted - The Azure SWA API token may have been rotated or deleted
  3. Secret name mismatch - The workflow uses AZURE_STATIC_WEB_APPS_API_TOKEN_DELIGHTFUL_MUD_081AFFE03 but it may not exist

Solution Options

If this is your PR (not from a fork):

  • Ask a repository maintainer to check if the secret AZURE_STATIC_WEB_APPS_API_TOKEN_DELIGHTFUL_MUD_081AFFE03 exists in the repo settings
  • The token may need to be regenerated from the Azure Portal → Static Web Apps → Manage deployment token

If this is a fork PR:

  • This is expected behavior - fork PRs can't access secrets
  • A maintainer needs to merge or run the workflow from the main repo
  • Alternatively, the workflow could be updated to use skip_deploy_on_missing_secrets: true for PR previews

Quick fix for the workflow:

```yaml
- name: Build And Deploy
  uses: Azure/static-web-apps-deploy@v1
  with:
    azure_static_web_apps_api_token: ${{ secrets.AZURE_STATIC_WEB_APPS_API_TOKEN_DELIGHTFUL_MUD_081AFFE03 }}
    skip_deploy_on_missing_secrets: true # Add this line
    ...
```

This would allow the build to pass even when secrets aren't available (useful for fork PRs).

Collaborator

I'd recommend that we look for existing solutions in the space rather than maintaining this. If there's nothing holistic, then we could still reduce the maintenance burden with more complete building blocks. We could use something like mq to extract prose and an existing CLI spell checker (e.g., hunspell).
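
As a rough illustration of that building-blocks idea (a sketch, not a definitive implementation): strip code spans with a few lines of Python, then let hunspell do the dictionary work via its -l flag, which lists misspelled words read from stdin. The mq extraction step is skipped here, and the en_US dictionary name is an assumption about the local install.

```python
import re
import subprocess
import sys

markdown = open(sys.argv[1], encoding="utf-8").read()
prose = re.sub(r"`{3}.*?`{3}", "", markdown, flags=re.DOTALL)  # drop fenced blocks
prose = re.sub(r"`[^`]*`", "", prose)                          # drop inline code

# hunspell -l prints one misspelled word per line; -d picks the dictionary
result = subprocess.run(
    ["hunspell", "-l", "-d", "en_US"],
    input=prose, capture_output=True, text=True,
)
print("\n".join(sorted(set(result.stdout.split()))))
```
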

Collaborator

If we do roll our own spellchecker, then this file format is way over-engineered. We just need a 2-field or 3-field structured format of wrong, right, category, where category is optional. This is easy enough as a delimited file (see the sketch below). The false positives can simply be another file. Then we can skip JSON deserialization. Version and updated don't make sense for a local format that is consumed by a single script. This will simplify the ingestion and validation in the script as well. Again, this feedback only applies if we roll our own spellchecking script.
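
Hypothetically, the whole format could be as small as this; the sample rows and the field order (wrong, right, optional category) are made up for illustration:

```python
import csv
import io

# Tab-delimited: wrong<TAB>right<TAB>category (category may be empty)
SAMPLE = "teh\tthe\tcommon\nrecieve\treceive\tcommon\nseperate\tseparate\t\n"

typos = {
    row[0]: row[1]
    for row in csv.reader(io.StringIO(SAMPLE), delimiter="\t")
    if row
}
print(typos)  # {'teh': 'the', 'recieve': 'receive', 'seperate': 'separate'}
```
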

@greggyb
Collaborator

greggyb commented Feb 3, 2026

I'd recommend breaking out the corrections from the script and new build step. The typos are valuable as-is, but I don't know that maintaining our own hand-rolled spellchecking is the right choice.

@eugman
Collaborator Author

eugman commented Feb 3, 2026

I'd recommend breaking out the corrections from the script and new build step. The typos are valuable as-is, but I don't know that maintaining our own hand-rolled spellchecking is the right choice.

Makes sense, I'll figure out how to do that.

I used pyspellchecker locally to catch spelling errors, and then a local LLM to catch contextual errors that aren't simple misspellings.

My main thought with the GitHub Action was to catch verified typos that have occurred in the past, treating them like regressions, so to speak. But I'm still very new to all this DevOps stuff.

@eugman eugman closed this Feb 3, 2026
@eugman eugman mentioned this pull request Feb 3, 2026