Commit 24fdad8
committed
Two parser improvements that expand code-review-graph's file coverage
to extension-less Unix scripts and Korn shell files.
Feature 1: .ksh extension → bash parser (#235)
-----------------------------------------------
Register .ksh (Korn shell) with tree-sitter-bash alongside the existing
.sh / .bash / .zsh entries shipped in v2.3.0. Korn shell is close enough
to bash syntactically that tree-sitter-bash handles the structural
features the graph captures correctly.
Context: in the close comment on PR #230, @tirth8205 explicitly flagged
this as worth adding: "The .ksh extension in particular looks worth
adding — I didn't include it in #227."
Tests: test_detects_language extended with .ksh assertion;
test_ksh_extension_parses_as_bash — end-to-end regression test that
copies sample.sh to a temp .ksh file, parses it, and asserts identical
function set and edge counts.
Feature 2: shebang-based language detection (#237)
--------------------------------------------------
detect_language() was extension-only — any file with no extension returned
None and was silently skipped. This misses a huge category of production
files: git hooks, CI scripts, bin/ entry points, installers.
New SHEBANG_INTERPRETER_TO_LANGUAGE table maps common interpreter
basenames to languages already registered:
bash/sh/zsh/ksh/dash/ash -> bash
python/python2/python3/pypy/pypy3 -> python
node/nodejs -> javascript
ruby, perl, lua, Rscript, php
New _detect_language_from_shebang(path) static method reads the first
256 bytes, handles direct form (#!/bin/bash), env indirection
(#!/usr/bin/env bash), env -S flags, trailing flags (#!/bin/bash -e),
CRLF, binary content, and strict UTF-8 decoding.
detect_language() now falls back to the shebang probe for files with
no extension (suffix == ""). Files with a known extension are never
re-read — extension-based detection stays authoritative.
Tests (16 new in test_parser.py): every interpreter mapping, env -S flag,
trailing flags, missing shebang, empty file, binary content, unknown
interpreter, extension-does-not-get-overridden, and end-to-end
parse_file producing function nodes from an extension-less bash script.
Files changed
-------------
- code_review_graph/parser.py — .ksh mapping + SHEBANG_INTERPRETER_TO_LANGUAGE
table + _detect_language_from_shebang() + detect_language() fallback
- tests/test_multilang.py — .ksh detection + end-to-end ksh parsing test
- tests/test_parser.py — 16 shebang detection tests
1 parent 80d22bf commit 24fdad8
5 files changed
Lines changed: 310 additions & 20 deletions
File tree
- code_review_graph
- tools
- tests
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
| 8 | + | |
8 | 9 | | |
9 | 10 | | |
10 | 11 | | |
11 | 12 | | |
12 | 13 | | |
13 | 14 | | |
14 | | - | |
| 15 | + | |
15 | 16 | | |
16 | 17 | | |
17 | 18 | | |
| |||
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
32 | | - | |
| 33 | + | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| |||
54 | 55 | | |
55 | 56 | | |
56 | 57 | | |
57 | | - | |
| 58 | + | |
58 | 59 | | |
59 | 60 | | |
60 | 61 | | |
| |||
142 | 143 | | |
143 | 144 | | |
144 | 145 | | |
145 | | - | |
| 146 | + | |
146 | 147 | | |
147 | 148 | | |
148 | 149 | | |
| |||
153 | 154 | | |
154 | 155 | | |
155 | 156 | | |
156 | | - | |
| 157 | + | |
157 | 158 | | |
158 | 159 | | |
159 | 160 | | |
| |||
199 | 200 | | |
200 | 201 | | |
201 | 202 | | |
202 | | - | |
| 203 | + | |
203 | 204 | | |
204 | 205 | | |
205 | 206 | | |
| |||
228 | 229 | | |
229 | 230 | | |
230 | 231 | | |
231 | | - | |
| 232 | + | |
232 | 233 | | |
233 | 234 | | |
234 | 235 | | |
| |||
302 | 303 | | |
303 | 304 | | |
304 | 305 | | |
305 | | - | |
| 306 | + | |
306 | 307 | | |
307 | 308 | | |
308 | 309 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
108 | 108 | | |
109 | 109 | | |
110 | 110 | | |
| 111 | + | |
111 | 112 | | |
112 | 113 | | |
113 | 114 | | |
| |||
119 | 120 | | |
120 | 121 | | |
121 | 122 | | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
122 | 158 | | |
123 | 159 | | |
124 | 160 | | |
| |||
383 | 419 | | |
384 | 420 | | |
385 | 421 | | |
386 | | - | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
| 435 | + | |
| 436 | + | |
| 437 | + | |
| 438 | + | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
| 442 | + | |
| 443 | + | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
387 | 504 | | |
388 | 505 | | |
389 | 506 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
| 14 | + | |
| 15 | + | |
14 | 16 | | |
15 | 17 | | |
16 | 18 | | |
| |||
28 | 30 | | |
29 | 31 | | |
30 | 32 | | |
31 | | - | |
32 | | - | |
| 33 | + | |
| 34 | + | |
33 | 35 | | |
34 | 36 | | |
35 | 37 | | |
| |||
56 | 58 | | |
57 | 59 | | |
58 | 60 | | |
59 | | - | |
60 | | - | |
| 61 | + | |
| 62 | + | |
61 | 63 | | |
62 | 64 | | |
63 | 65 | | |
| |||
82 | 84 | | |
83 | 85 | | |
84 | 86 | | |
85 | | - | |
86 | | - | |
| 87 | + | |
| 88 | + | |
87 | 89 | | |
88 | 90 | | |
89 | 91 | | |
| |||
118 | 120 | | |
119 | 121 | | |
120 | 122 | | |
121 | | - | |
122 | | - | |
| 123 | + | |
| 124 | + | |
123 | 125 | | |
124 | 126 | | |
125 | 127 | | |
| |||
144 | 146 | | |
145 | 147 | | |
146 | 148 | | |
147 | | - | |
148 | | - | |
| 149 | + | |
| 150 | + | |
149 | 151 | | |
150 | | - | |
| 152 | + | |
151 | 153 | | |
152 | 154 | | |
153 | 155 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1087 | 1087 | | |
1088 | 1088 | | |
1089 | 1089 | | |
| 1090 | + | |
| 1091 | + | |
| 1092 | + | |
| 1093 | + | |
| 1094 | + | |
| 1095 | + | |
| 1096 | + | |
| 1097 | + | |
| 1098 | + | |
| 1099 | + | |
| 1100 | + | |
| 1101 | + | |
| 1102 | + | |
| 1103 | + | |
| 1104 | + | |
| 1105 | + | |
| 1106 | + | |
| 1107 | + | |
| 1108 | + | |
| 1109 | + | |
| 1110 | + | |
| 1111 | + | |
| 1112 | + | |
| 1113 | + | |
| 1114 | + | |
| 1115 | + | |
| 1116 | + | |
| 1117 | + | |
| 1118 | + | |
| 1119 | + | |
| 1120 | + | |
| 1121 | + | |
| 1122 | + | |
1090 | 1123 | | |
1091 | 1124 | | |
1092 | 1125 | | |
| |||
0 commit comments