unified: use a vendored-in copy of tree-sitter-swift #21819
Pull request overview
This PR vendors the tree-sitter-swift grammar into unified/extractor to make iterating on the unified extractor’s Swift parsing prototype easier, and updates Rust/Bazel wiring to use the in-repo copy instead of the crates.io package.
Changes:
- Add a vendored `unified/extractor/tree-sitter-swift` crate including generated parser sources, scanner, queries, and build scripts.
- Switch `unified/extractor` from the registry `tree-sitter-swift` dependency to a local path dependency and add the new crate to the workspace.
- Update Bazel third-party wiring to remove the crates.io `tree-sitter-swift` archive and add required deps (`cc`, `tree-sitter-language`) for the new local crate.
Show a summary per file
| File | Description |
|---|---|
| unified/extractor/tree-sitter-swift/tree-sitter.json | Adds tree-sitter grammar metadata/config for Swift. |
| unified/extractor/tree-sitter-swift/src/tree_sitter/parser.h | Vendors tree-sitter parser API header used by generated sources/scanner. |
| unified/extractor/tree-sitter-swift/src/tree_sitter/array.h | Vendors tree-sitter internal array utilities used by generated sources. |
| unified/extractor/tree-sitter-swift/src/tree_sitter/alloc.h | Vendors tree-sitter allocator abstraction header. |
| unified/extractor/tree-sitter-swift/src/scanner.c | Adds Swift external scanner implementation (comments/raw strings/semi handling/etc.). |
| unified/extractor/tree-sitter-swift/README.md | Vendors upstream README for the grammar. |
| unified/extractor/tree-sitter-swift/queries/textobjects.scm | Adds textobject queries for Swift. |
| unified/extractor/tree-sitter-swift/queries/tags.scm | Adds tags queries for symbol definitions. |
| unified/extractor/tree-sitter-swift/queries/outline.scm | Adds outline queries for structure extraction. |
| unified/extractor/tree-sitter-swift/queries/locals.scm | Adds locals queries (definitions/scopes). |
| unified/extractor/tree-sitter-swift/queries/injections.scm | Adds injection queries (regex/comment injections). |
| unified/extractor/tree-sitter-swift/queries/indents.scm | Adds indentation queries. |
| unified/extractor/tree-sitter-swift/queries/highlights.scm | Adds syntax highlighting queries. |
| unified/extractor/tree-sitter-swift/queries/folds.scm | Adds folding queries. |
| unified/extractor/tree-sitter-swift/package.json | Vendors upstream Node package metadata for the grammar. |
| unified/extractor/tree-sitter-swift/LICENSE | Adds upstream MIT license for the vendored grammar. |
| unified/extractor/tree-sitter-swift/grammar.js | Vendors the Swift grammar definition. |
| unified/extractor/tree-sitter-swift/Cargo.toml | Adds a local Rust crate wrapper for the vendored Swift grammar. |
| unified/extractor/tree-sitter-swift/BUILD.bazel | Adds Bazel rules to build the vendored grammar as a Rust library. |
| unified/extractor/tree-sitter-swift/bindings/rust/lib.rs | Provides LanguageFn and embeds node-types/queries; includes basic tests. |
| unified/extractor/tree-sitter-swift/bindings/rust/build.rs | Builds parser.c + scanner.c via cc during Rust builds. |
| unified/extractor/tree-sitter-swift/bindings/node/index.js | Vendors Node binding loader. |
| unified/extractor/tree-sitter-swift/bindings/node/binding.cc | Vendors Node binding implementation exporting the language. |
| unified/extractor/tree-sitter-swift/binding.gyp | Vendors Node-gyp build configuration for the Node binding. |
| unified/extractor/Cargo.toml | Switches tree-sitter-swift dependency to local path. |
| unified/extractor/BUILD.bazel | Adds the new local tree-sitter-swift Bazel target as a dependency. |
| MODULE.bazel | Adds Bazel module repos for cc and tree-sitter-language; removes crates.io tree-sitter-swift repo. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/defs.bzl | Removes vendored crates.io tree-sitter-swift archive; adds mappings for the new local crate + its deps. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-swift-0.7.2.bazel | Deletes the autogenerated BUILD file for the removed crates.io tree-sitter-swift dependency. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.bazel | Adds aliases for cc and tree-sitter-language. |
| Cargo.toml | Adds the vendored tree-sitter-swift crate as a workspace member. |
| Cargo.lock | Converts tree-sitter-swift from registry source to a workspace package entry (removes source/checksum). |
Copilot's findings
Comments suppressed due to low confidence (1)
unified/extractor/tree-sitter-swift/Cargo.toml:22
- This crate’s Rust tests and doctest example reference the `tree_sitter` crate (`tree_sitter::Parser`), but `Cargo.toml` doesn’t declare a `tree-sitter` dependency (and it can’t be used transitively). Add an explicit `tree-sitter` dependency (or at least a dev-dependency) so `cargo test`/doctests compile.
# When updating these dependencies, run `misc/bazel/3rdparty/update_cargo_deps.sh`
[dependencies]
tree-sitter-language = "0.1"
[build-dependencies]
cc = "1.2"
- Files reviewed: 31/35 changed files
- Comments generated: 1
#define DIRECTIVE_COUNT 4
const char* DIRECTIVES[OPERATOR_COUNT] = {
    "if",
    "elseif",
    "else",
    "endif"
};
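The excerpt above defines `DIRECTIVE_COUNT` but sizes `DIRECTIVES` with `OPERATOR_COUNT`; whether that is intentional is not stated here. As a minimal sketch of one way to keep a count and its array from drifting apart, the count can be derived from the array itself. This is an illustration only, not the vendored scanner's code: `directive_index` is a hypothetical helper, and the redefinition of `DIRECTIVE_COUNT` in terms of `sizeof` is an assumption about what a fix could look like.

```c
#include <stddef.h>
#include <string.h>

/* Same four directive names as in the excerpt above. */
static const char *const DIRECTIVES[] = { "if", "elseif", "else", "endif" };

/* Derive the count from the array so the two cannot get out of sync. */
#define DIRECTIVE_COUNT (sizeof(DIRECTIVES) / sizeof(DIRECTIVES[0]))

/* Fail the build if an entry is added or removed without review (C11). */
_Static_assert(DIRECTIVE_COUNT == 4, "expected exactly four directives");

/* Hypothetical helper (not part of scanner.c):
 * returns the index of `word` in DIRECTIVES, or -1 if it is not a directive. */
static int directive_index(const char *word) {
    for (size_t i = 0; i < DIRECTIVE_COUNT; i++) {
        if (strcmp(DIRECTIVES[i], word) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With an explicit size larger than the initializer list, C fills the remaining slots with null pointers, so any loop bounded by the larger constant would need to guard against `NULL` entries; deriving the bound from the array avoids that case.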
This adds 600k lines of code, of which 550k come from auto-generated files. I think we should go against tree-sitter conventions and avoid checking in these generated artifacts, and instead rely on Bazel rules to rebuild them when needed. WDYT?
Good idea. I'll try to set it up.
Uses the `tree-sitter-generate` crate to generate these files on the fly.
For ease of iteration on the prototype.