Roadmap: reduce default embedded binary footprint

## Feature Request

**Is your feature request related to a problem? Please describe:**

KiteSQL is intended to be a lightweight embedded SQL database for Rust. Recent local size checks show that KiteSQL is already smaller than the closest pure-Rust embedded SQL-engine baselines we tested, but it is still much larger than SQLite/libSQL-family code.

The current gap matters because downstream users may evaluate KiteSQL as an embedded dependency, where binary size, compile time, and dependency surface are part of the adoption cost.

Approximate local release measurements from minimal `SELECT 1`-style applications:

| Project | Configuration | Release binary | Stripped binary | `.text` |
| --- | --- | ---: | ---: | ---: |
| libSQL | `default-features = false`, `core` | 2.9 MiB | 2.5 MiB | 2.0 MiB |
| KiteSQL | in-memory ORM | 10.4 MiB | 7.4 MiB | 5.7 MiB |
| KiteSQL | in-memory SQL-only | 11.0 MiB | 8.0 MiB | 6.3 MiB |
| GlueSQL | `default-features = false`, memory storage | 13 MiB | 10 MiB | 7.0 MiB |
| Turso | `default-features = false` | 19 MiB | 15 MiB | 10.2 MiB |

Notes:

- libSQL is much smaller because it remains in the SQLite/libSQL C implementation family.
- Pure-Rust SQL engines pay extra size costs for parser generality, typed execution layers, Rust `std`, and monomorphized implementation paths.
- KiteSQL is already below GlueSQL and Turso in these local measurements, but the difference from GlueSQL is not yet dramatic.
- A useful first milestone would be a default embedded configuration with less than 4 MiB of `.text`, which would make KiteSQL clearly smaller than GlueSQL instead of only moderately smaller.
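
As a rough sanity check on the 4 MiB `.text` milestone, the sketch below adds up the expected reductions quoted in the proposed work items of this issue and subtracts them from the 5.7 MiB in-memory ORM measurement. The function and constant names are illustrative only, and the inputs are the same rough estimates, so the output is a range, not a prediction.

```rust
/// Convert a size given in tenths of a MiB (e.g. 57 for 5.7 MiB) to whole KiB.
fn tenths_mib_to_kib(tenths: u32) -> u32 {
    tenths * 1024 / 10
}

/// Rough (low, high) `.text` savings in KiB, copied from the work items in this issue.
const EXPECTED_SAVINGS_KIB: [(&str, u32, u32); 4] = [
    ("custom parser and AST", 1024, 1536), // "roughly 1.0-1.5 MiB"
    ("expression evaluation", 200, 500),
    ("binder and planner", 100, 400),
    ("type and cast gating", 100, 300),
];

/// Sum the low ends and the high ends of every estimate.
fn total_savings_kib() -> (u32, u32) {
    EXPECTED_SAVINGS_KIB
        .iter()
        .fold((0, 0), |(lo, hi), &(_, l, h)| (lo + l, hi + h))
}

/// Remaining `.text` in KiB as (best case, worst case) after applying the savings.
fn projected_text_kib(current_kib: u32) -> (u32, u32) {
    let (lo, hi) = total_savings_kib();
    (current_kib - hi, current_kib - lo)
}
```

With these inputs, 5.7 MiB is about 5836 KiB and the target is 4096 KiB, so roughly 1740 KiB has to go. The quoted estimates sum to between 1424 and 2736 KiB, leaving about 3100 to 4412 KiB. The milestone is therefore reached only if the reductions land in the upper part of their ranges.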

Rough internal `.text` contributions currently observed in the in-memory ORM shape:

| Area | Approximate `.text` |
| --- | ---: |
| `expression` | 443 KiB |
| `binder` | 428 KiB |
| `execution` | 300 KiB |
| `types` | 297 KiB |
| `planner` | 176 KiB |
| `optimizer` | 117 KiB |
| `storage` | 92 KiB |

Major external contributions in the same shape:

| Dependency | Approximate `.text` |
| --- | ---: |
| Rust `std` | 2.1 MiB |
| `sqlparser` | 1.4 MiB |
| `chrono` | 57 KiB |
| `stacker` | 56 KiB |

**Describe the feature you'd like:**

Create a size-reduction roadmap for KiteSQL's default embedded configuration. The goal is not only to remove individual dependencies, but also to make the implementation shape denser and easier to link in small applications.

Proposed work items:

- [ ] Add a reproducible binary-size benchmark workflow or script.
  - Build minimal apps for in-memory ORM, in-memory SQL-only, LMDB, and RocksDB.
  - Record release binary size, stripped binary size, and `.text` size.
  - Prefer stable commands such as `cargo bloat --crates`, `du`, and `strip`.
  - Keep heavy comparisons such as Turso/DataFusion/DuckDB out of normal CI unless explicitly requested.
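
  A minimal sketch of the recording half of such a script, using only `std`. It assumes the release binary and a stripped copy already exist on disk. The helper names are hypothetical, and `.text` size would still come from a tool such as `cargo bloat`.

  ```rust
  use std::fs;
  use std::io;
  use std::path::Path;

  /// Size of a file in bytes, e.g. a release binary or its stripped copy.
  fn file_size_bytes(path: &Path) -> io::Result<u64> {
      Ok(fs::metadata(path)?.len())
  }

  /// Format a byte count as MiB with one decimal place, like the tables in this issue.
  fn format_mib(bytes: u64) -> String {
      format!("{:.1} MiB", bytes as f64 / (1024.0 * 1024.0))
  }

  /// One Markdown table row: label, unstripped size, stripped size.
  fn report_row(label: &str, release: &Path, stripped: &Path) -> io::Result<String> {
      Ok(format!(
          "| {} | {} | {} |",
          label,
          format_mib(file_size_bytes(release)?),
          format_mib(file_size_bytes(stripped)?)
      ))
  }
  ```

  Calling `report_row` once per configuration (in-memory ORM, in-memory SQL-only, LMDB, RocksDB) would produce rows that can be pasted into a before/after table.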

- [ ] Replace the default `sqlparser` path with a KiteSQL-specific parser and AST.
  - This is the clearest expected reduction: roughly 1.0-1.5 MiB of `.text`.
  - Keep `sqlparser` compatibility behind an optional feature for users that need the current AST integration.
  - Suggested structure:
    - `kite_sql_core`: compact AST, binder, planner, executor, types.
    - `kite_sql_parser`: KiteSQL SQL subset parser.
    - `kite_sql_sqlparser_compat`: `sqlparser::ast` to KiteSQL AST conversion.
  - Start with the SQL subset KiteSQL actually supports instead of trying to clone all `sqlparser` dialect coverage.
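
  To illustrate the "own the AST, parse only the supported subset" idea, here is a deliberately tiny sketch. The `Expr` and `Statement` types and the `parse` function are hypothetical and cover only `SELECT` over integer literals joined by `+`. A real KiteSQL parser would need a proper tokenizer and far more grammar.

  ```rust
  /// Hypothetical compact AST; a real one would cover far more of SQL.
  #[derive(Debug, PartialEq)]
  enum Expr {
      Int(i64),
      Add(Box<Expr>, Box<Expr>),
  }

  #[derive(Debug, PartialEq)]
  enum Statement {
      Select(Vec<Expr>),
  }

  /// Parse `SELECT <expr> [, <expr>]*` with an optional trailing semicolon.
  fn parse(sql: &str) -> Result<Statement, String> {
      let trimmed = sql.trim().trim_end_matches(';').trim();
      let (keyword, rest) = trimmed
          .split_once(char::is_whitespace)
          .ok_or_else(|| "expected SELECT <expr>".to_string())?;
      if !keyword.eq_ignore_ascii_case("select") {
          return Err(format!("unsupported statement: {keyword}"));
      }
      rest.split(',')
          .map(parse_expr)
          .collect::<Result<Vec<_>, _>>()
          .map(Statement::Select)
  }

  /// Parse a left-associative chain of integer literals joined by `+`.
  fn parse_expr(text: &str) -> Result<Expr, String> {
      let mut expr: Option<Expr> = None;
      for term in text.split('+') {
          let term = term.trim();
          let literal = term
              .parse::<i64>()
              .map(Expr::Int)
              .map_err(|_| format!("bad integer literal: {term:?}"))?;
          expr = Some(match expr {
              None => literal,
              Some(left) => Expr::Add(Box::new(left), Box::new(literal)),
          });
      }
      // `split` always yields at least one item, so this is `Some` in practice.
      expr.ok_or_else(|| "empty expression".to_string())
  }
  ```

  The point of the sketch is the shape: the AST contains only what the binder consumes, so an optional compatibility crate can convert `sqlparser::ast` into these types without the core depending on `sqlparser`.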

- [ ] Consolidate expression evaluation.
  - Current expression code is one of the largest internal areas.
  - Reduce many evaluator structs and trait/generic specializations into shared dispatch where practical.
  - Consider a compact `ScalarOp`/`UnaryOp`/`BinaryOp` representation with centralized evaluation.
  - Expected reduction: roughly 200-500 KiB, depending on how much monomorphization disappears.
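
  One possible shape for centralized evaluation is sketched below, assuming a small dynamically typed `Value`. All names here are illustrative and are not KiteSQL's current types. Every operator goes through one `eval_binary` entry point instead of one evaluator struct per operator and type pair.

  ```rust
  #[derive(Debug, Clone, PartialEq)]
  enum Value {
      Null,
      Boolean(bool),
      Int(i64),
      Float(f64),
  }

  #[derive(Debug, Clone, Copy, PartialEq)]
  enum BinaryOp {
      Add,
      Sub,
      Mul,
      Eq,
      Lt,
  }

  #[derive(Debug, PartialEq)]
  enum EvalError {
      TypeMismatch,
      Overflow,
  }

  /// Single dispatch point: NULL propagates, mixed int/float promotes to float.
  fn eval_binary(op: BinaryOp, left: &Value, right: &Value) -> Result<Value, EvalError> {
      use Value::*;
      match (left, right) {
          (Null, _) | (_, Null) => Ok(Null),
          (Int(a), Int(b)) => eval_int(op, *a, *b),
          (Float(a), Float(b)) => Ok(eval_float(op, *a, *b)),
          (Int(a), Float(b)) => Ok(eval_float(op, *a as f64, *b)),
          (Float(a), Int(b)) => Ok(eval_float(op, *a, *b as f64)),
          // Simplification: booleans are not comparable in this sketch.
          _ => Err(EvalError::TypeMismatch),
      }
  }

  /// Integer arithmetic with overflow reported as an error.
  fn eval_int(op: BinaryOp, a: i64, b: i64) -> Result<Value, EvalError> {
      let checked = match op {
          BinaryOp::Add => a.checked_add(b),
          BinaryOp::Sub => a.checked_sub(b),
          BinaryOp::Mul => a.checked_mul(b),
          BinaryOp::Eq => return Ok(Value::Boolean(a == b)),
          BinaryOp::Lt => return Ok(Value::Boolean(a < b)),
      };
      checked.map(Value::Int).ok_or(EvalError::Overflow)
  }

  fn eval_float(op: BinaryOp, a: f64, b: f64) -> Value {
      match op {
          BinaryOp::Add => Value::Float(a + b),
          BinaryOp::Sub => Value::Float(a - b),
          BinaryOp::Mul => Value::Float(a * b),
          BinaryOp::Eq => Value::Boolean(a == b),
          BinaryOp::Lt => Value::Boolean(a < b),
      }
  }
  ```

  The trade-off is a runtime `match` per operation in exchange for far fewer monomorphized copies. Whether that is acceptable would need the same before/after measurement as any other change.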

- [ ] Reduce binder and planner duplication.
  - Inspect repeated `bind_*` and visitor-generated symbols with bloat/symbol tools.
  - Prefer compact intermediate structures and shared routines where behavior is equivalent.
  - Replace recursion helper paths with explicit stacks if doing so removes the `recursive`/`stacker` dependencies cleanly.
  - Expected reduction: roughly 100-400 KiB.
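
  The explicit-stack idea can be sketched on a hypothetical three-node `Plan` tree, which is not KiteSQL's real plan type. The traversal keeps pending nodes in a heap-allocated `Vec`, so its depth is not limited by the native call stack and needs no stack-growing helper.

  ```rust
  /// Hypothetical plan tree used only for this illustration.
  enum Plan {
      Scan,
      Filter(Box<Plan>),
      Join(Box<Plan>, Box<Plan>),
  }

  /// Count plan nodes iteratively; pending nodes live in a `Vec`, not on the call stack.
  fn count_nodes(root: &Plan) -> usize {
      let mut stack: Vec<&Plan> = vec![root];
      let mut count = 0;
      while let Some(node) = stack.pop() {
          count += 1;
          match node {
              Plan::Scan => {}
              Plan::Filter(input) => stack.push(input),
              Plan::Join(left, right) => {
                  stack.push(left);
                  stack.push(right);
              }
          }
      }
      count
  }
  ```

  Passes that rebuild the tree (binding, rewriting) are harder to convert than read-only walks like this one, so the conversion is probably best evaluated pass by pass.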

- [ ] Feature-gate heavier type and cast support.
  - Consider a `lite` default type set:
    - `Null`
    - `Boolean`
    - integer
    - float
    - UTF-8 string
  - Move heavier or less common paths behind features:
    - decimal
    - date/time/timestamp
    - blob
    - interval
    - rich cast/formatting behavior
  - This may also reduce `chrono` and formatting-related code paths.
  - Expected reduction: roughly 100-300 KiB, depending on compatibility constraints.
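
  A sketch of how a `lite` type set could be expressed with Cargo features. The `DataValue` enum, the SQL type names, and the `blob` feature name are assumptions for illustration. The heavier variant and its match arm compile only when the feature is enabled.

  ```rust
  /// Hypothetical value type: the lite set is always present, heavier variants are opt-in.
  #[derive(Debug, Clone, PartialEq)]
  enum DataValue {
      Null,
      Boolean(bool),
      Int64(i64),
      Float64(f64),
      Utf8(String),
      /// Compiled only with the assumed `blob` feature.
      #[cfg(feature = "blob")]
      Blob(Vec<u8>),
  }

  /// SQL-facing type name; the gated arm disappears together with its variant.
  fn type_name(value: &DataValue) -> &'static str {
      match value {
          DataValue::Null => "NULL",
          DataValue::Boolean(_) => "BOOLEAN",
          DataValue::Int64(_) => "BIGINT",
          DataValue::Float64(_) => "DOUBLE",
          DataValue::Utf8(_) => "VARCHAR",
          #[cfg(feature = "blob")]
          DataValue::Blob(_) => "BLOB",
      }
  }
  ```

  Every `match` over the enum needs the same `#[cfg]` on the gated arm, which is the main maintenance cost of this approach.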

- [ ] Keep storage backends strictly feature-gated.
  - The smallest embedded shape should be memory-only.
  - LMDB and RocksDB should remain optional and should not accidentally pull unrelated runtime code.
  - RocksDB is especially important because its native dependency surface dominates many binary-size discussions.

- [ ] Gate developer-facing and presentation-heavy paths.
  - Shell, table formatting, CSV, rich `EXPLAIN`, statistics/analyze, and pretty error paths should not be part of the smallest embedding profile unless required.
  - Avoid pulling CLI/debug dependencies through the library default feature set.

- [ ] Review error and formatting paths.
  - `Display`, `Debug`, `format!`, rich error messages, and pretty plan printing can add code across many modules.
  - Consider a compact error representation for the smallest profile, with richer diagnostics behind a feature.
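
  One way a compact error representation might look, as a sketch: a one-byte code with static descriptions, plus a richer wrapper behind an assumed `rich-errors` feature. The variant set and all names are hypothetical.

  ```rust
  use std::fmt;

  /// Hypothetical compact error: a single byte, no owned strings.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  #[repr(u8)]
  enum ErrorCode {
      Syntax = 1,
      UnknownTable = 2,
      TypeMismatch = 3,
  }

  impl ErrorCode {
      /// Static description; building it needs no formatting arguments.
      fn as_str(self) -> &'static str {
          match self {
              ErrorCode::Syntax => "syntax error",
              ErrorCode::UnknownTable => "unknown table",
              ErrorCode::TypeMismatch => "type mismatch",
          }
      }
  }

  impl fmt::Display for ErrorCode {
      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
          f.write_str(self.as_str())
      }
  }

  /// Richer diagnostics, compiled only with the assumed `rich-errors` feature.
  #[cfg(feature = "rich-errors")]
  #[derive(Debug)]
  struct RichError {
      code: ErrorCode,
      detail: String,
  }
  ```

  How much this saves depends on how many call sites currently build messages with `format!`, so it is worth measuring before committing to the split.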

- [ ] Add a size-oriented release profile recommendation.
  - Document an embedded-size profile such as:

    ```toml
    [profile.release]
    lto = "fat"
    codegen-units = 1
    panic = "abort"
    strip = "symbols"
    opt-level = "z"
    ```

  - Measure `opt-level = "z"` and `"s"` against performance-sensitive examples before recommending one as the default.

- [ ] Treat `no_std + alloc` as a long-term core boundary, not the first size win.
  - `no_std` may be useful as an architectural constraint for a future core crate.
  - For ordinary native applications, the final host binary usually links Rust `std` anyway, so `no_std` alone is unlikely to remove the largest cost.
  - The bigger structural wins are parser/AST ownership, reduced monomorphization, and a more compact execution model.

- [ ] Investigate a bytecode-like execution model as a long-term option.
  - SQLite's compactness comes partly from compiling SQL statements into bytecode that runs on a small VDBE opcode interpreter.
  - KiteSQL could eventually compile expressions/plans into compact opcodes to improve code reuse.
  - This is a larger architectural change and should be considered after parser/AST and evaluator consolidation.
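
  For a sense of what "compact opcodes" could mean, here is a toy stack machine over integers. It is not a proposal for KiteSQL's actual instruction set. `Op`, `BinOp`, and `run` are hypothetical names, and a real design would need registers or cursors, typed values, and control flow.

  ```rust
  #[derive(Debug, Clone, Copy, PartialEq)]
  enum BinOp {
      Add,
      Mul,
      Lt,
  }

  #[derive(Debug, Clone, Copy, PartialEq)]
  enum Op {
      Push(i64),
      Binary(BinOp),
  }

  #[derive(Debug, PartialEq)]
  enum VmError {
      StackUnderflow,
      Overflow,
  }

  /// Run a program on an operand stack and return the value left on top.
  fn run(program: &[Op]) -> Result<i64, VmError> {
      let mut stack: Vec<i64> = Vec::new();
      for &op in program {
          match op {
              Op::Push(value) => stack.push(value),
              Op::Binary(bin) => {
                  // Operands were pushed left first, so the right one pops first.
                  let right = stack.pop().ok_or(VmError::StackUnderflow)?;
                  let left = stack.pop().ok_or(VmError::StackUnderflow)?;
                  let result = match bin {
                      BinOp::Add => left.checked_add(right),
                      BinOp::Mul => left.checked_mul(right),
                      BinOp::Lt => Some((left < right) as i64),
                  };
                  stack.push(result.ok_or(VmError::Overflow)?);
              }
          }
      }
      stack.pop().ok_or(VmError::StackUnderflow)
  }
  ```

  For example, `(1 + 2) * 3` would compile to `Push(1), Push(2), Binary(Add), Push(3), Binary(Mul)`. The size argument is that one interpreter loop is shared by every expression, instead of each expression shape getting its own generated code.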

**Describe alternatives you've considered:**

- Only using `no_std`: useful as a boundary, but unlikely to reduce native binary size enough by itself.
- Only removing small dependencies such as `regex`, `recursive`, or `stacker`: worthwhile, but these are tens to hundreds of KiB, not the multi-MiB gap to SQLite/libSQL-family code.
- Comparing KiteSQL only with ORM/client libraries: misleading, because ORM/client crates usually do not include a SQL parser, binder, planner, executor, and embedded storage abstraction.
- Comparing KiteSQL only with SQLite: also misleading, because SQLite is a decades-old C implementation with very high code density and a bytecode VM architecture.

**Teachability, Documentation, Adoption, Migration Strategy:**

Document the supported embedded-size profiles:

- `memory-lite`: smallest local embedded profile.
- `lmdb`: small persistent profile.
- `rocksdb`: persistent write-heavy profile with larger native dependency surface.
- `compat-sqlparser`: existing compatibility path for users that want or need `sqlparser::ast`.
- `full`: richer type/cast/diagnostic/optimizer functionality.

Migration strategy:

1. Add measurement scripts first so every size-related change has a before/after number.
2. Keep current behavior as the compatibility baseline.
3. Introduce new lighter features as opt-in (default-off) until stable.
4. Flip the default only after SQL compatibility and ORM workflows are covered by tests.

