Improvement/Modern C++ Wrapper #7

Open
Muppetsg2 wants to merge 46 commits into nsumner:main from Muppetsg2:improvement/modern-cpp-wrapper

Muppetsg2 commented Apr 14, 2026

Improvement/Modern C++ Wrapper

Motivation / Context

I've spent some time improving the wrapper and adding missing features that were already available in the underlying Tree-sitter C API. This PR aims to make the C++ interface much more comprehensive, safer, and modern, while completely overhauling the CMake build system for better usability.

📌 Important Notes

  • Versioning: The version number of this library (currently 0.26.8) now strictly tracks the base version of the underlying Tree-sitter C library. This ensures complete clarity for end-users regarding compatibility.
  • Automated Testing: Added support for unit tests using the Catch2 framework to verify the wrapper's behavior. This can be toggled via the CPP_TS_BUILD_TESTS CMake option.
  • PR Incorporation: This pull request includes and consolidates all changes previously proposed in PR #6 (Add getFieldNameForNamedChild), with credit to @ashermancinelli.

Key Changes

1. Architecture & Compatibility (C++ & OS)

  • Multi-Standard Support: Introduced macros and logic to detect the C++ standard in use (TS_HAS_CXX17, TS_HAS_CXX20).
  • Backward Compatibility: Implemented a custom ts::StringView fallback to replace std::string_view for pre-C++17 builds.
  • Modern C++ (C++20): Utilized concepts for type verification in the new visit() function (with an SFINAE / std::enable_if fallback for older standards).
  • Cross-Platform Support: Added OS-specific macros and headers to handle file descriptors correctly across POSIX and Windows (fileno/dup vs _fileno/_dup).
  • File Extensions: Renamed the main header from .h to .hpp to better reflect C++ code, along with updated include guards (#pragma once and #ifndef CPP_TREE_SITTER_HPP).
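
To illustrate how the standard detection and the concept/SFINAE split described above can fit together, here is a minimal sketch. Only the macro names TS_HAS_CXX17 and TS_HAS_CXX20 come from this PR; their definitions here, along with SketchNode, NodeVisitor, IsNodeVisitor, and callVisitor, are hypothetical stand-ins and not the wrapper's actual code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Assumption: MSVC only reports the real standard in __cplusplus when
// /Zc:__cplusplus is passed, so _MSVC_LANG is consulted as well.
#if defined(_MSVC_LANG) && _MSVC_LANG > __cplusplus
#  define SKETCH_CXX_VERSION _MSVC_LANG
#else
#  define SKETCH_CXX_VERSION __cplusplus
#endif

#if SKETCH_CXX_VERSION >= 201703L
#  define TS_HAS_CXX17 1
#else
#  define TS_HAS_CXX17 0
#endif

#if SKETCH_CXX_VERSION >= 202002L
#  define TS_HAS_CXX20 1
#else
#  define TS_HAS_CXX20 0
#endif

// Hypothetical stand-in for the wrapper's Node class.
struct SketchNode {
  int id;
};

#if TS_HAS_CXX20
#include <concepts>

// C++20 path: a concept states the requirement directly.
template <typename F>
concept NodeVisitor = std::invocable<F&, SketchNode>;

template <NodeVisitor F>
void callVisitor(F&& f, SketchNode n) {
  f(n);
}
#else
// Pre-C++20 path: the same requirement expressed with SFINAE.
template <typename F, typename = void>
struct IsNodeVisitor : std::false_type {};

template <typename F>
struct IsNodeVisitor<
    F, decltype(void(std::declval<F&>()(std::declval<SketchNode>())))>
    : std::true_type {};

template <typename F,
          typename std::enable_if<IsNodeVisitor<F>::value, int>::type = 0>
void callVisitor(F&& f, SketchNode n) {
  f(n);
}
#endif
```

In both branches, passing something that cannot be called with a SketchNode removes callVisitor from overload resolution, so the error surfaces at the call site rather than deep inside the template body.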

2. Tree-sitter API Expansion

  • Query Mechanism: Introduced Tree-sitter's powerful query system with the addition of Query and QueryCursor classes, alongside helper structures like QueryCapture, QueryMatch, QueryCursorState, and QueryPredicateStep.
  • Code Editing & Ranges: Added structures to represent source code locations and edits: Point, InputEdit, and Range.
  • Input Management: Added an Input struct that enables asynchronous or chunked reading of source text along with encoding support.
  • Iterators & Navigation: Introduced the LookaheadIterator class for inspecting possible symbols in a given parser state, and expanded navigation methods within the Node class.
  • Visitor Pattern: Implemented a high-level visit() function that allows for easy, depth-first traversal (DFS) of the syntax tree (implemented iteratively for performance).
  • Metadata & Scoped Enums: Introduced type-safe scoped enums for constants (InputEncoding, SymbolType, LogType, Quantifier, QueryPredicateStepType) and support for LanguageMetadata.
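
For readers unfamiliar with iterative traversal, the idea behind visit() can be sketched as follows. This is an illustration only: SketchNode, visitSketch, and collectTypes are hypothetical names, and the tree is a plain in-memory structure rather than the wrapper's Node, which the real implementation would walk instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a syntax tree node.
struct SketchNode {
  std::string type;
  std::vector<SketchNode> children;
};

// Pre-order depth-first traversal using an explicit stack instead of
// recursion, so deeply nested trees do not grow the call stack.
template <typename Visitor>
void visitSketch(const SketchNode& root, Visitor&& visitor) {
  std::vector<const SketchNode*> stack;
  stack.push_back(&root);
  while (!stack.empty()) {
    const SketchNode* node = stack.back();
    stack.pop_back();
    visitor(*node);
    // Push children in reverse so the leftmost child is visited first.
    for (auto it = node->children.rbegin(); it != node->children.rend(); ++it) {
      stack.push_back(&*it);
    }
  }
}

// Demonstration helper: records node types in the order they are visited.
inline std::vector<std::string> collectTypes(const SketchNode& root) {
  std::vector<std::string> out;
  visitSketch(root, [&out](const SketchNode& n) { out.push_back(n.type); });
  return out;
}
```

The explicit stack holds at most the pending siblings along the current path, which is the usual trade-off of this approach compared with a recursive visitor.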

3. Safety & Memory Management

  • Exceptions: Transitioned from silent C-style failures to standard C++ exceptions (std::runtime_error, std::length_error, std::invalid_argument, std::logic_error) for issues like query errors, missing languages, or exceeding 4GB buffer limits.
  • Smart Pointers: Expanded the use of std::unique_ptr coupled with custom deleters (details::FreeHelper), ensuring safe memory management for things like query cursors and iterators without leaks.
  • Validation: Added a validate() method to InputEdit to ensure edit ranges are mathematically sound before applying them.
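
As a rough illustration of the validation and length checks mentioned above, the sketch below shows one way such checks could look. The type and member names (SketchPoint, SketchInputEdit, checkedLength) and the choice of exception for each case are assumptions made for the example; the wrapper's actual InputEdit::validate() may check more or behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical mirror of a row/column source position.
struct SketchPoint {
  std::uint32_t row;
  std::uint32_t column;
};

// True when a is at or before b in document order.
inline bool pointLessEq(SketchPoint a, SketchPoint b) {
  return a.row < b.row || (a.row == b.row && a.column <= b.column);
}

// Hypothetical mirror of an edit: the range [start, oldEnd) is replaced by
// text that ends at newEnd.
struct SketchInputEdit {
  std::uint32_t startByte;
  std::uint32_t oldEndByte;
  std::uint32_t newEndByte;
  SketchPoint startPoint;
  SketchPoint oldEndPoint;
  SketchPoint newEndPoint;

  // Rejects edits whose end positions precede their start position.
  void validate() const {
    if (startByte > oldEndByte || startByte > newEndByte) {
      throw std::invalid_argument("edit: end byte precedes start byte");
    }
    if (!pointLessEq(startPoint, oldEndPoint) ||
        !pointLessEq(startPoint, newEndPoint)) {
      throw std::invalid_argument("edit: end point precedes start point");
    }
  }
};

// Tree-sitter's C API uses 32-bit lengths, so larger buffers are rejected
// up front instead of being silently truncated.
inline std::uint32_t checkedLength(std::size_t size) {
  if (size > std::numeric_limits<std::uint32_t>::max()) {
    throw std::length_error("source exceeds the 32-bit length limit");
  }
  return static_cast<std::uint32_t>(size);
}
```

Calling validate() before handing an edit to the tree turns a malformed range into an exception at the point where the edit was built.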

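The smart-pointer approach can be sketched with a stateless deleter that forwards to a C-style free function. Everything below is a hypothetical stand-in: CHandle, c_handle_new, and c_handle_delete play the role of a C API pair such as ts_query_cursor_new / ts_query_cursor_delete, and FreeWith is an illustrative name, not the PR's details::FreeHelper.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style resource with a matching create/destroy pair.
struct CHandle {
  int value;
};

// Counts live handles so the example can show that nothing leaks.
inline int& liveHandles() {
  static int count = 0;
  return count;
}

inline CHandle* c_handle_new(int value) {
  ++liveHandles();
  return new CHandle{value};
}

inline void c_handle_delete(CHandle* handle) {
  --liveHandles();
  delete handle;
}

// Stateless deleter: the free function is a template argument, so the
// deleter itself carries no data.
template <typename T, void (*Free)(T*)>
struct FreeWith {
  void operator()(T* ptr) const noexcept {
    if (ptr != nullptr) {
      Free(ptr);
    }
  }
};

using HandlePtr = std::unique_ptr<CHandle, FreeWith<CHandle, &c_handle_delete>>;

// Wraps the raw handle immediately so every exit path releases it.
inline HandlePtr makeHandle(int value) {
  return HandlePtr(c_handle_new(value));
}
```

Because ownership is tied to scope, an exception thrown between creation and use still releases the handle, which matters once the wrapper reports errors by throwing.
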
4. WebAssembly (WASM) Support

  • Wasmtime Integration: Added full support for loading WebAssembly-compiled parsers via the Wasmtime library.
  • Environment Management: Introduced the WasmStore class to initialize and manage the lifecycle of the WASM environment and loaded grammars.
  • Loading Capabilities: Implemented logic to load WASM grammars directly from memory buffers or .wasm files.

5. CMake Build System Overhaul

  • MSVC Support: Added proper compiler flags for Microsoft Visual C++ (/W4, /Zc:__cplusplus) and a CPP_TS_MSVC_STATIC_RUNTIME option to toggle between static (/MT) and dynamic (/MD) MSVC runtime linking.
  • Modularization: Split the CMake script into manageable modules: grammar fetching logic is now in grammar.cmake, with helpers in utils.cmake and get_cpm.cmake.
  • Enhanced CPPTSAddGrammar:
    • Now searches for grammars in local cache paths (CPP_TS_GRAMMAR_PATH), resolving issue #1 (Pointing out existing parser's installation path).
    • Can download pre-compiled .wasm grammars on the fly directly from GitHub Releases.
    • Added support for parsers that use C++ scanners (scanner.cc, scanner.cpp), not just C.
  • Wasmtime Package Management: Added automatic fetching and linking of the compiled wasmtime library for Windows, macOS, and Linux when the CPP_TS_FEATURE_WASM flag is enabled.
  • Amalgamated Builds: Introduced the CPP_TS_AMALGAMATED option to compile Tree-sitter from a single unified file (lib.c), significantly speeding up build times.
