tree-sitter API for Java using WebAssembly.
This library brings tree-sitter parsing capabilities to Java by compiling tree-sitter and its language grammars to WebAssembly and running them through Chicory, a pure-Java WASM runtime. No JNI or native binaries required.
Tree-sitter4j is a Maven multi-module project:
The Java module that provides the public API. It contains:
TreeSitter-- factory for creating parser instances (manages the WASM module lifecycle via Chicory)TreeSitterParser-- wraps a tree-sitter parser; set a language, parse a string, get a treeTreeSitterTree/TreeSitterNode-- navigate the resulting AST (node types, children, S-expressions, byte ranges, ...)Language-- enum of available grammars (see below)
The Rust/WebAssembly module. It compiles tree-sitter core and the grammar crates listed in Cargo.toml into a single tree-sitter.wasm binary. The Makefile handles downloading WASI SDK and Binaryen, building, and optimizing the WASM output.
The following tree-sitter grammars are compiled into the WASM module and exposed through the Language enum:
| Language | Crate | Version | Repository |
|---|---|---|---|
| core | tree-sitter |
0.26.9 | [tree-sitter](https://docs.rs/tree-sitter/0.26.9/tree_sitter/ |
| JSON | tree-sitter-json |
0.24.8 | tree-sitter-json |
| Java | tree-sitter-java |
0.23.5 | tree-sitter-java |
| Properties | tree-sitter-properties |
0.3.0 | tree-sitter-properties |
| HTML | tree-sitter-html |
0.23.2 | tree-sitter-html |
| XML | tree-sitter-xml |
0.7.0 | tree-sitter-xml |
| Markdown | tree-sitter-md |
0.5.3 | tree-sitter-md |
| YAML | tree-sitter-yaml |
0.7.2 | tree-sitter-yaml |
To add a new language, add the grammar crate to wasm-build/Cargo.toml, register it in wasm-build/src/lib.rs with a new ID, and add the corresponding entry to the Language enum in core.
try (TreeSitter ts = TreeSitter.create();
TreeSitterParser parser = ts.newParser()) {
parser.setLanguage(Language.JAVA);
try (TreeSitterTree tree = parser.parseString("class Foo {}")) {
TreeSitterNode root = tree.rootNode();
System.out.println(root.toSexp());
}
}- Java 17+
- Maven 3.9+
- Rust toolchain with the
wasm32-wasip1target (only needed to rebuild the WASM binary)
If wasm-build/wasm/tree-sitter.wasm is already present (checked into the repo), you only need to run this command:
mvn clean installIf it is needed to rebuild the wasm binary file for whatever reason like to add a new grammar/language, then you will have to perform the following steps:
- Add the new grammar to the Cargo.toml file under the section
[dependencies], - Include the new language id and method to call under the rust file
wasm-build/src/lib.rs. see section// --- Language helpers ---, - Update the README.md file to add the new language
## Supported languages
Then, execute these commands
cd wasm-build
make allAdditionally, include to this new language part of the Enum io.roastedroot.treesitter.Language and rebuild the java core module
public enum Language {
JSON(0),
JAVA(1),
PROPERTIES(2),
HTML(3),
XML(4),
MARKDOWN(5),
YAML(6);
// NEWLANGUAGE(7);NOTE: The Makefile includes the instructions needed to install localy: wasi-sdk and Binaryen !