Roadmap: reduce default embedded binary footprint #347

@KKould

Feature Request

Is your feature request related to a problem? Please describe:

KiteSQL is intended to be a lightweight embedded SQL database for Rust. Recent local size checks show that KiteSQL is already smaller than the closest pure-Rust embedded SQL-engine baselines we tested, but it is still much larger than SQLite/libSQL-family code.

The current gap matters because downstream users may evaluate KiteSQL as an embedded dependency, where binary size, compile time, and dependency surface are part of the adoption cost.

Approximate local release measurements from minimal SELECT 1-style applications:

| Project | Configuration | Release binary | Stripped binary | .text |
| --- | --- | --- | --- | --- |
| libSQL | default-features = false, core | 2.9 MiB | 2.5 MiB | 2.0 MiB |
| KiteSQL | in-memory ORM | 10.4 MiB | 7.4 MiB | 5.7 MiB |
| KiteSQL | in-memory SQL-only | 11.0 MiB | 8.0 MiB | 6.3 MiB |
| GlueSQL | default-features = false, memory storage | 13 MiB | 10 MiB | 7.0 MiB |
| Turso | default-features = false | 19 MiB | 15 MiB | 10.2 MiB |

Notes:

  • libSQL is much smaller because it remains in the SQLite/libSQL C implementation family.
  • Pure Rust SQL engines pay different costs from parser generality, typed execution layers, Rust std, and monomorphized implementation paths.
  • KiteSQL is already below GlueSQL and Turso in these local measurements, but the difference from GlueSQL is not yet dramatic.
  • A useful first milestone would be a default embedded configuration with less than roughly 4 MiB of .text, which would make KiteSQL clearly smaller than GlueSQL instead of only moderately smaller.
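As a rough sanity check on that milestone, the per-item estimates listed under the proposed work items later in this issue can be summed against the measured 5.7 MiB .text of the in-memory ORM shape. The sketch below assumes the four estimates are independent and simply additive, which may not hold in practice (for example, parser and binder savings could overlap). The `Saving` type and helper names are made up for illustration.

```rust
/// An estimated saving in KiB of .text, as a low/high range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Saving {
    low_kib: u32,
    high_kib: u32,
}

/// Sum the ranges, assuming the savings are independent and additive.
fn total(savings: &[Saving]) -> Saving {
    savings.iter().fold(
        Saving { low_kib: 0, high_kib: 0 },
        |acc, s| Saving {
            low_kib: acc.low_kib + s.low_kib,
            high_kib: acc.high_kib + s.high_kib,
        },
    )
}

/// .text left after a saving, never going below zero.
fn remaining_kib(current_kib: u32, saving_kib: u32) -> u32 {
    current_kib.saturating_sub(saving_kib)
}

/// The four ranges quoted in this issue, converted to KiB.
fn estimates() -> [Saving; 4] {
    [
        Saving { low_kib: 1024, high_kib: 1536 }, // own parser/AST: 1.0-1.5 MiB
        Saving { low_kib: 200, high_kib: 500 },   // expression consolidation
        Saving { low_kib: 100, high_kib: 400 },   // binder/planner duplication
        Saving { low_kib: 100, high_kib: 300 },   // feature-gated types and casts
    ]
}

fn main() {
    let current_kib = 5837; // 5.7 MiB, in-memory ORM shape
    let target_kib = 4096; // the 4 MiB milestone
    let t = total(&estimates());
    let best = remaining_kib(current_kib, t.high_kib);
    let worst = remaining_kib(current_kib, t.low_kib);
    println!(
        "estimated .text: {:.1} to {:.1} MiB",
        best as f64 / 1024.0,
        worst as f64 / 1024.0
    );
    println!("target met at pessimistic end: {}", worst < target_kib);
}
```

Under these assumptions the result lands between about 3.0 and 4.3 MiB, so the milestone looks reachable but not guaranteed if every item only delivers the low end of its range.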

Current rough internal .text contribution observed in the in-memory ORM shape:

| Area | Approximate .text |
| --- | --- |
| expression | 443 KiB |
| binder | 428 KiB |
| execution | 300 KiB |
| types | 297 KiB |
| planner | 176 KiB |
| optimizer | 117 KiB |
| storage | 92 KiB |

Major external contributions in the same shape:

| Dependency | Approximate .text |
| --- | --- |
| Rust std | 2.1 MiB |
| sqlparser | 1.4 MiB |
| chrono | 57 KiB |
| stacker | 56 KiB |

Describe the feature you'd like:

Create a size-reduction roadmap for KiteSQL's default embedded configuration. The goal is not only to remove individual dependencies, but also to make the implementation shape denser and easier to link in small applications.

Proposed work items:

  • Add a reproducible binary-size benchmark workflow or script.

    • Build minimal apps for in-memory ORM, in-memory SQL-only, LMDB, and RocksDB.
    • Record release binary size, stripped binary size, and .text size.
    • Prefer stable commands such as cargo bloat --crates, du, and strip.
    • Keep heavy comparisons such as Turso/DataFusion/DuckDB out of normal CI unless explicitly requested.
  • Replace the default sqlparser path with a KiteSQL-specific parser and AST.

    • This is the clearest expected reduction: roughly 1.0-1.5 MiB of .text.
    • Keep sqlparser compatibility behind an optional feature for users that need the current AST integration.
    • Suggested structure:
      • kite_sql_core: compact AST, binder, planner, executor, types.
      • kite_sql_parser: KiteSQL SQL subset parser.
      • kite_sql_sqlparser_compat: sqlparser::ast to KiteSQL AST conversion.
    • Start with the SQL subset KiteSQL actually supports instead of trying to clone all sqlparser dialect coverage.
  • Consolidate expression evaluation.

    • Current expression code is one of the largest internal areas.
    • Reduce many evaluator structs and trait/generic specializations into shared dispatch where practical.
    • Consider a compact ScalarOp/UnaryOp/BinaryOp representation with centralized evaluation.
    • Expected reduction: roughly 200-500 KiB, depending on how much monomorphization disappears.
  • Reduce binder and planner duplication.

    • Inspect repeated bind_* and visitor-generated symbols with bloat/symbol tools.
    • Prefer compact intermediate structures and shared routines where behavior is equivalent.
    • Replace recursion helper paths with explicit stacks if that removes the recursive/stacker dependencies cleanly.
    • Expected reduction: roughly 100-400 KiB.
  • Feature-gate heavier type and cast support.

    • Consider a lite default type set:
      • Null
      • Boolean
      • integer
      • float
      • UTF-8 string
    • Move heavier or less common paths behind features:
      • decimal
      • date/time/timestamp
      • blob
      • interval
      • rich cast/formatting behavior
    • This may also reduce chrono and formatting-related code paths.
    • Expected reduction: roughly 100-300 KiB, depending on compatibility constraints.
  • Keep storage backends strictly feature-gated.

    • The smallest embedded shape should be memory-only.
    • LMDB and RocksDB should remain optional and should not accidentally pull unrelated runtime code.
    • RocksDB is especially important because its native dependency surface dominates many binary-size discussions.
  • Gate developer-facing and presentation-heavy paths.

    • Shell, table formatting, CSV, rich EXPLAIN, statistics/analyze, and pretty error paths should not be part of the smallest embedding profile unless required.
    • Avoid pulling CLI/debug dependencies through the library default feature set.
  • Review error and formatting paths.

    • Display, Debug, format!, rich error messages, and pretty plan printing can add code across many modules.
    • Consider a compact error representation for the smallest profile, with richer diagnostics behind a feature.
  • Add a size-oriented release profile recommendation.

    • Document an embedded-size profile such as:

      [profile.release]
      lto = "fat"
      codegen-units = 1
      panic = "abort"
      strip = "symbols"
      opt-level = "z"
    • Measure opt-level = "z" and "s" against performance-sensitive examples before recommending one as the default.

  • Treat no_std + alloc as a long-term core boundary, not the first size win.

    • no_std may be useful as an architectural constraint for a future core crate.
    • For ordinary native applications, the final host binary usually links Rust std anyway, so no_std alone is unlikely to remove the largest cost.
    • The bigger structural wins are parser/AST ownership, reduced monomorphization, and a more compact execution model.
  • Investigate a bytecode-like execution model as a long-term option.

    • SQLite's compactness comes partly from compiling many SQL features into a small VDBE opcode interpreter.
    • KiteSQL could eventually compile expressions/plans into compact opcodes to improve code reuse.
    • This is a larger architectural change and should be considered after parser/AST and evaluator consolidation.
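To make the expression-consolidation and bytecode items above more concrete, here is a minimal sketch of a compact opcode representation with one centralized evaluation routine. All type and function names (`Value`, `BinaryOp`, `Op`, `eval_binary`, `run`) are hypothetical and do not reflect KiteSQL's current code. NULL handling is deliberately simplified and is not full SQL three-valued logic.

```rust
/// A deliberately tiny value set, loosely mirroring the "lite" type list.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Boolean(bool),
    Int(i64),
}

#[derive(Debug, Clone, Copy)]
enum BinaryOp {
    Add,
    Mul,
    Eq,
    And,
}

/// Stack-machine instructions for one expression.
#[derive(Debug, Clone)]
enum Op {
    Push(Value),      // push a literal
    Column(usize),    // push the value of a column in the current row
    Binary(BinaryOp), // pop two operands, push the result
}

/// One shared routine for every binary operator, instead of one
/// evaluator struct or generic instantiation per operator and type.
fn eval_binary(op: BinaryOp, l: Value, r: Value) -> Value {
    use Value::*;
    match (op, l, r) {
        // Simplification: any NULL operand yields NULL.
        (_, Null, _) | (_, _, Null) => Null,
        (BinaryOp::Add, Int(a), Int(b)) => Int(a.wrapping_add(b)),
        (BinaryOp::Mul, Int(a), Int(b)) => Int(a.wrapping_mul(b)),
        (BinaryOp::Eq, a, b) => Boolean(a == b),
        (BinaryOp::And, Boolean(a), Boolean(b)) => Boolean(a && b),
        // Type mismatch; a real engine would return an error here.
        _ => Null,
    }
}

/// Run a program against one row. Returns None on a malformed program
/// (stack underflow or out-of-range column index).
fn run(program: &[Op], row: &[Value]) -> Option<Value> {
    let mut stack: Vec<Value> = Vec::new();
    for op in program {
        match op {
            Op::Push(v) => stack.push(v.clone()),
            Op::Column(i) => stack.push(row.get(*i)?.clone()),
            Op::Binary(b) => {
                let r = stack.pop()?;
                let l = stack.pop()?;
                stack.push(eval_binary(*b, l, r));
            }
        }
    }
    stack.pop()
}

fn main() {
    // (col0 + 2) * 3
    let arithmetic = [
        Op::Column(0),
        Op::Push(Value::Int(2)),
        Op::Binary(BinaryOp::Add),
        Op::Push(Value::Int(3)),
        Op::Binary(BinaryOp::Mul),
    ];
    // (col0 = 3) AND true
    let predicate = [
        Op::Column(0),
        Op::Push(Value::Int(3)),
        Op::Binary(BinaryOp::Eq),
        Op::Push(Value::Boolean(true)),
        Op::Binary(BinaryOp::And),
    ];
    let row = [Value::Int(3)];
    println!("{:?}", run(&arithmetic, &row));
    println!("{:?}", run(&predicate, &row));
}
```

The point of this shape is that adding an operator adds one enum variant and one match arm rather than a new set of monomorphized code paths. Whether that trade is acceptable for performance would need to be measured.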

Describe alternatives you've considered:

  • Only using no_std: useful as a boundary, but unlikely to reduce native binary size enough by itself.
  • Only removing small dependencies such as regex, recursive, or stacker: worthwhile, but these are tens to hundreds of KiB, not the multi-MiB gap to SQLite/libSQL-family code.
  • Comparing KiteSQL only with ORM/client libraries: misleading, because ORM/client crates usually do not include a SQL parser, binder, planner, executor, and embedded storage abstraction.
  • Comparing KiteSQL only with SQLite: also misleading, because SQLite is a decades-old C implementation with very high code density and a bytecode VM architecture.

Teachability, Documentation, Adoption, Migration Strategy:

Document the supported embedded-size profiles:

  • memory-lite: smallest local embedded profile.
  • lmdb: small persistent profile.
  • rocksdb: persistent write-heavy profile with larger native dependency surface.
  • compat-sqlparser: existing compatibility path for users that want or need sqlparser::ast.
  • full: richer type/cast/diagnostic/optimizer functionality.
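One way these profiles could map onto code is through Cargo features that compile heavier variants in or out. The sketch below is illustrative only: the feature names `decimal` and `datetime`, the `DataType` enum, and `parse_type` are assumptions, and a real crate would also have to declare the features under `[features]` in its Cargo.toml.

```rust
/// Hypothetical type enum: the lite set is always present, heavier
/// variants exist only when the matching (assumed) feature is enabled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Null,
    Boolean,
    Integer,
    Float,
    Utf8,
    #[cfg(feature = "decimal")]
    Decimal,
    #[cfg(feature = "datetime")]
    Timestamp,
}

/// Resolve a type name. Names of disabled types fall through to None,
/// so their parsing, casting, and formatting code is never linked.
fn parse_type(name: &str) -> Option<DataType> {
    match name {
        "null" => Some(DataType::Null),
        "boolean" => Some(DataType::Boolean),
        "integer" => Some(DataType::Integer),
        "float" => Some(DataType::Float),
        "utf8" => Some(DataType::Utf8),
        #[cfg(feature = "decimal")]
        "decimal" => Some(DataType::Decimal),
        #[cfg(feature = "datetime")]
        "timestamp" => Some(DataType::Timestamp),
        _ => None,
    }
}

fn main() {
    for name in ["integer", "utf8", "decimal", "timestamp"] {
        match parse_type(name) {
            Some(t) => println!("{name}: supported as {t:?}"),
            None => println!("{name}: not compiled into this profile"),
        }
    }
}
```

A memory-lite build would enable neither feature, while a full profile would enable both. A friendlier design would probably return an error that names the missing feature instead of a bare None.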

Migration strategy:

  1. Add measurement scripts first so every size-related change has a before/after number.
  2. Keep current behavior as the compatibility baseline.
  3. Introduce new lighter features as opt-in or default-off until stable.
  4. Flip the default only after SQL compatibility and ORM workflows are covered by tests.
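For step 1, a before/after number can be as simple as comparing two build artifacts. The helper below is a minimal sketch using only the standard library. It compares total file size rather than .text, so it complements the other commands mentioned above instead of replacing them. The function names and the command-line shape are made up for illustration.

```rust
use std::{env, fs, io};

/// Size of a file on disk, in bytes.
fn file_size(path: &str) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

/// Human-readable change between two sizes given in bytes.
fn describe_delta(before: u64, after: u64) -> String {
    let diff = after as i64 - before as i64;
    let pct = if before == 0 {
        0.0
    } else {
        diff as f64 * 100.0 / before as f64
    };
    format!("{:+.1} KiB ({:+.1}%)", diff as f64 / 1024.0, pct)
}

fn main() -> io::Result<()> {
    // Expected arguments: <binary-before> <binary-after>
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: size-delta <binary-before> <binary-after>");
        return Ok(());
    }
    let before = file_size(&args[1])?;
    let after = file_size(&args[2])?;
    println!("before: {before} bytes");
    println!("after:  {after} bytes");
    println!("change: {}", describe_delta(before, after));
    Ok(())
}
```

Running this on the stripped binaries built before and after a change gives a single line that can be pasted into the pull request description.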
