
feat: extend scanner language support for mobile dev (stacked on #10) #17

Open
smk508 wants to merge 2 commits into cytostack:main from smk508:extend-scanner-lang-support


smk508 commented on Apr 11, 2026

Hey, this is a stacked follow-up to #10. @levnikmyskin's patch adds .dart to CODE_EXTENSIONS in src/scanner/anatomy-scanner.ts:44. I wanted to extend that to cover the rest of the Flutter toolchain (Kotlin, Swift, Objective-C) plus a few other common gaps, and to fix a small drift issue I spotted in the same area.

This PR is stacked on #10, so the commit history here is @levnikmyskin's unchanged .dart commit followed by mine. If #10 merges first, I'll rebase and this diff will shrink to only the net-new additions. If this one lands first, #10 becomes a no-op.

The context

CODE_EXTENSIONS isn't a file-inclusion gate; it only controls which chars-per-token ratio estimateTokens uses (3.5 for code, 3.75 as the fallback, 4.0 for prose). A .dart file was already being scanned and written to anatomy.md before #10; its token count was just ~7% low because it fell through to the 3.75 fallback ratio. The same is true today for .kt, .swift, .m, .mm, and plenty of others. Worth flagging in case framing the change as "add dart support" made it sound bigger than it is.
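For anyone who hasn't looked at the estimator, here is a minimal sketch of what the ratio switch amounts to. It is not the code in src/tracker/token-estimator.ts: the `ContentType` union, the `CHARS_PER_TOKEN` table, and the use of `Math.ceil` are assumptions (rounding up happens to reproduce every number in the Testing table).

```typescript
// Sketch only: names and rounding are assumed, ratios are the ones quoted in this PR.
type ContentType = "code" | "mixed" | "prose";

// Characters per token for each content type.
const CHARS_PER_TOKEN: Record<ContentType, number> = {
  code: 3.5,
  mixed: 3.75, // fallback for extensions not in the code set
  prose: 4.0,
};

// Rounds up, so any non-empty text costs at least one token (assumed rounding rule).
function estimateTokens(text: string, type: ContentType): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[type]);
}

const sample = "x".repeat(660); // same length as the Dart sample in row 7
console.log(estimateTokens(sample, "mixed")); // 176
console.log(estimateTokens(sample, "code")); // 189
```

Moving a file from "mixed" to "code" scales its estimate by 3.75 / 3.5 ≈ 1.071, which is where the ~7% figure comes from; rounding pushes individual files a little to either side of that.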

The drift

There are actually two near-identical extension sets in the repo:

  1. CODE_EXTENSIONS in src/scanner/anatomy-scanner.ts:41 — used by the anatomy scanner's internal estimateTokens.
  2. CODE_EXTS in src/tracker/token-estimator.ts:3 — used by the exported detectContentType() helper that other parts of the codebase consume.

They drift apart as soon as #10 adds .dart to the first set only. This PR brings them back in sync (.dart is added to CODE_EXTS here) and adds a one-line // Keep in sync with ... comment above each set so the coupling is visible to the next person who touches either file.
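If the two sets ever need to be checked mechanically rather than by convention, a set difference is enough. This is a hypothetical sketch, not something this PR adds: `diffExtensionSets` is an invented helper, and the two sets are trimmed stand-ins for CODE_EXTENSIONS and CODE_EXTS in the state where only #10 is applied (every entry other than .dart is illustrative).

```typescript
// Stand-in for CODE_EXTENSIONS (src/scanner/anatomy-scanner.ts) with #10 applied.
// Entries other than .dart are illustrative, not the real contents.
const CODE_EXTENSIONS: ReadonlySet<string> = new Set([".ts", ".py", ".dart"]);

// Stand-in for CODE_EXTS (src/tracker/token-estimator.ts) before this PR.
const CODE_EXTS: ReadonlySet<string> = new Set([".ts", ".py"]);

// Hypothetical helper: extensions present in one set but missing from the other.
function diffExtensionSets(
  a: ReadonlySet<string>,
  b: ReadonlySet<string>,
): { onlyInA: string[]; onlyInB: string[] } {
  const onlyInA = Array.from(a).filter((ext) => !b.has(ext)).sort();
  const onlyInB = Array.from(b).filter((ext) => !a.has(ext)).sort();
  return { onlyInA, onlyInB };
}

const drift = diffExtensionSets(CODE_EXTENSIONS, CODE_EXTS);
// .dart shows up in drift.onlyInA; drift.onlyInB is empty.
console.log(drift);
```

In a test suite, asserting that both arrays are empty would turn the "keep in sync" comment into a failing check.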

The additions

Added to both sets (.dart was already in CODE_EXTENSIONS from #10 and is newly added to CODE_EXTS here):

  • Flutter toolchain: .kt, .kts, .swift, .m, .mm
  • C++ variants: .hpp, .hh, .cc, .cxx
  • Other common languages: .cs, .rb, .php, .lua
  • Web frontend: .vue, .svelte, .html, .htm
  • Schema / infra: .proto, .graphql, .gql, .tf
  • Shell variants: .bash, .zsh, .fish
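To make the "ratio swap only" point concrete, here is a sketch of an extension-based classifier built from just the extensions listed above. It assumes detectContentType keys off the file extension and returns "code" or "mixed" as in rows 3–4 (the real helper's prose path and pre-existing entries are omitted), and it uses a small local `extname` stand-in for node:path so the snippet has no dependencies.

```typescript
// Sketch only: the real CODE_EXTS also contains pre-existing entries not shown here.
const ADDED_CODE_EXTS: ReadonlySet<string> = new Set([
  ".dart", // from #10; newly added to CODE_EXTS in this PR
  ".kt", ".kts", ".swift", ".m", ".mm", // Flutter toolchain
  ".hpp", ".hh", ".cc", ".cxx", // C++ variants
  ".cs", ".rb", ".php", ".lua", // other common languages
  ".vue", ".svelte", ".html", ".htm", // web frontend
  ".proto", ".graphql", ".gql", ".tf", // schema / infra
  ".bash", ".zsh", ".fish", // shell variants
]);

// Stand-in for path.extname(): "" for names with no dot and for dotfiles like ".bashrc".
function extname(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot) : "";
}

// Assumed shape: "code" selects the 3.5 ratio, anything else falls back to "mixed" (3.75).
function detectContentType(filePath: string): "code" | "mixed" {
  return ADDED_CODE_EXTS.has(extname(filePath)) ? "code" : "mixed";
}

console.log(detectContentType("android/app/src/main/kotlin/MainActivity.kt")); // code
console.log(detectContentType("notes.txt")); // mixed
```

Because the set only feeds the ratio lookup, adding an entry never changes which files get scanned.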

No behavioral change beyond the ratio swap — description-extractor.ts already handles per-language description extraction for Kotlin, Swift, Dart, Ruby, C#, PHP, etc. via its own path.extname() routing, so nothing else needs touching.

Testing

Environment: macOS Darwin 25.3.0, Node 22, pnpm build pipeline unchanged.

Scratch Flutter layout used for rows 5–6:

<tmp>/lib/main.dart                                 (285 bytes, stateless MyApp widget)
<tmp>/android/app/src/main/kotlin/MainActivity.kt   (171 bytes, FlutterActivity subclass)
<tmp>/ios/Runner/AppDelegate.swift                  (227 bytes, FlutterAppDelegate subclass)
| # | Scenario | Result |
|---|----------|--------|
| 1 | pnpm build | Clean: tsc + hooks + vite dashboard build all green |
| 2 | node dist/bin/openwolf.js --help | Prints usage, no errors |
| 3 | detectContentType("foo.kt" / ".swift" / ".m" / ".mm" / ".dart") on upstream/main | All return "mixed" (falls through to the 3.75 ratio) |
| 4 | Same calls on this branch | All return "code" (3.5 ratio) |
| 5 | buildAnatomy() on scratch Flutter layout, upstream/main | MainActivity.kt ~46 tok, AppDelegate.swift ~61 tok, main.dart ~76 tok |
| 6 | Same on this branch | MainActivity.kt ~49 tok, AppDelegate.swift ~65 tok, main.dart ~82 tok (+6.5%, +6.6%, +7.9% respectively) |
| 7 | 660-char Dart sample through estimateTokens(text, "mixed") vs estimateTokens(text, "code") | 176 → 189 tokens (+7.4%) |
| 8 | anatomy.md section/description output on scratch project | Kotlin/Swift/Dart descriptions extracted correctly by the existing description-extractor.ts (no change needed there) |

Rows 3 and 5 were captured by temporarily reverting both files to upstream/main on the same branch, running the same node one-liners, then restoring.
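The rows 5–6 numbers can be reproduced from the scratch file sizes alone, which is a quick sanity check on the ratios. This sketch assumes the estimate is `Math.ceil(bytes / charsPerToken)` and that the files are ASCII, so bytes equal characters; `recomputeRow` is a hypothetical helper, not part of the codebase.

```typescript
interface Row {
  name: string;
  before: number; // upstream/main: 3.75 fallback ratio
  after: number; // this branch: 3.5 code ratio
  pct: string; // relative change, one decimal place
}

// Hypothetical helper; the rounding rule is inferred, not taken from the source.
function recomputeRow(name: string, bytes: number): Row {
  const before = Math.ceil(bytes / 3.75);
  const after = Math.ceil(bytes / 3.5);
  const pct = ((after / before - 1) * 100).toFixed(1);
  return { name, before, after, pct };
}

// File sizes from the scratch Flutter layout above.
const rows: Row[] = [
  recomputeRow("MainActivity.kt", 171),
  recomputeRow("AppDelegate.swift", 227),
  recomputeRow("main.dart", 285),
];

for (const r of rows) {
  console.log(`${r.name}: ${r.before} -> ${r.after} tok (+${r.pct}%)`);
}
// MainActivity.kt: 46 -> 49 tok (+6.5%)
// AppDelegate.swift: 61 -> 65 tok (+6.6%)
// main.dart: 76 -> 82 tok (+7.9%)
```

All three match the table, which suggests the estimator rounds up rather than to nearest (rounding 285 / 3.5 ≈ 81.4 to nearest would give 81, not 82).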

levnikmyskin and others added 2 commits April 6, 2026 20:34
Adds Flutter-critical languages (Kotlin, Swift, Objective-C) plus
common gaps (C++ variants, C#, Ruby, PHP, Lua, Vue, Svelte, HTML,
Protobuf, GraphQL, Terraform, shell variants) to both extension sets.

Also brings src/tracker/token-estimator.ts CODE_EXTS back in sync with
src/scanner/anatomy-scanner.ts CODE_EXTENSIONS; the two sets had
drifted apart because only CODE_EXTENSIONS got the .dart addition from
cytostack#10. Adds a one-line "Keep in sync with ..." comment above each so
future additions hit both places.

These sets control the chars-per-token ratio (3.5 for code vs 3.75
fallback) used by estimateTokens; the net effect is ~7% more accurate
token accounting in anatomy.md and detectContentType() consumers for
projects written in these languages.
smk508 changed the title from "feat: extend scanner language support (stacked on #10)" to "feat: extend scanner language support for mobile dev (stacked on #10)" on Apr 11, 2026