Merged
170 changes: 170 additions & 0 deletions .agents/docs/2026-06-30-l3-build-mcpp-implementation-design.md
@@ -0,0 +1,170 @@
# L3 `build.mcpp` — native imperative build program (implementation design)

Companion to `2026-06-29-manifest-environment-and-platform-design.md` (§L3). This
doc nails down the concrete MVP shipped in mcpp 0.0.78.

## What it is

`build.mcpp` is a project-local C++ source file. It follows the model of Zig's
`build.zig` and Cargo's `build.rs`, but it is written in the project's own language,
so it adds no second language and it dogfoods mcpp. mcpp compiles it with the
**host** toolchain and runs it **before** the main build; the program prints
directives to stdout that augment the main build.

```cpp
// build.mcpp
#include <cstdio>
int main() {
    std::puts("mcpp:cxxflag=-DHAVE_FEATURE=1");
    std::puts("mcpp:link-lib=m");
    std::puts("mcpp:rerun-if-env-changed=USE_FAST");
}
```

## Directive protocol (Discipline 1 — structured output, not global mutation)

The program communicates with mcpp **only** through stdout lines that start with
`mcpp:`; every other line is ignored, so the program may freely log to stdout or
stderr. Recognized directives:

| Directive | Effect |
|---|---|
| `mcpp:cxxflag=<flag>` | append `<flag>` to `buildConfig.cxxflags` |
| `mcpp:cflag=<flag>` | append `<flag>` to `buildConfig.cflags` |
| `mcpp:link-lib=<name>` | append `-l<name>` to `buildConfig.ldflags` |
| `mcpp:link-search=<dir>` | append `-L<abs dir>` to `buildConfig.ldflags` (dir resolved against the project root) |
| `mcpp:cfg=<name>` | append `-D<name>` to **both** cflags and cxxflags |
| `mcpp:generated=<path>` | add `<path>` (relative to project root) to `buildConfig.sources` so the modgraph scanner picks it up |
| `mcpp:rerun-if-changed=<path>` | declare a file input (re-run gate, see Discipline 2) |
| `mcpp:rerun-if-env-changed=<VAR>` | declare an env input (re-run gate) |

The program *requests* graph edges (flags/libs/sources); it never silently mutates
build state. Unknown `mcpp:` directives are ignored with a one-line warning
(forward-compat).
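
To make the line format concrete, here is a minimal sketch of how one stdout line could be split into a directive. The `Directive` struct and `parse_directive` function are hypothetical names for illustration, not mcpp's actual types, and the handling of malformed `mcpp:` lines (treated as "not a directive") is an assumption.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical record for one parsed directive; not mcpp's actual type.
struct Directive {
    std::string key;    // e.g. "cxxflag"
    std::string value;  // e.g. "-DHAVE_FEATURE=1"
};

// Returns a directive for lines shaped like "mcpp:<key>=<value>" and
// std::nullopt for anything else (ordinary log output).
std::optional<Directive> parse_directive(std::string_view line) {
    constexpr std::string_view prefix = "mcpp:";
    // Tolerate a trailing '\r' left over from CRLF output.
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    if (line.substr(0, prefix.size()) != prefix) return std::nullopt;
    line.remove_prefix(prefix.size());
    // Split at the FIRST '=' so a value such as "-DX=1" keeps its own '='.
    const auto eq = line.find('=');
    if (eq == std::string_view::npos || eq == 0) return std::nullopt;
    return Directive{std::string(line.substr(0, eq)),
                     std::string(line.substr(eq + 1))};
}
```

Splitting at the first `=` matters because flag values routinely contain `=` themselves, as in `mcpp:cxxflag=-DHAVE_FEATURE=1`.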

## Declared-I/O re-run contract (Discipline 2 — fixes the `.mcpp_ok` blind spot)

The program is **not** re-run every build. Its parsed directives + declared inputs
are cached at `target/.build-mcpp/build.mcpp.cache` (see "Artifacts under
`target/`" below). On each build we re-run iff:

- the cache is missing, **or**
- the `build.mcpp` source content hash changed, **or**
- the host compiler identity changed, **or**
- any declared `rerun-if-changed` file's content hash changed (or the file vanished), **or**
- any declared `rerun-if-env-changed` variable's current value changed, **or**
- any `generated=` output path no longer exists.

Otherwise the cached directives are reused without recompiling/running. This is the
documented replacement for the bare `.mcpp_ok` success marker ("process exited 0 ≠
outputs correct"): a **declared-input / declared-output contract**. Hashing reuses
the existing FNV-1a helpers (`mcpp::toolchain::hash_file` / `hash_string`).

Because the applied directives land in `buildConfig.{cflags,cxxflags,ldflags}` —
which already feed `canonical_compile_flags` → the fingerprint — and generated
sources feed the modgraph, the **main** build is automatically sensitive to a
changed `build.mcpp` output. The cache only avoids needless re-execution / file
regeneration (which would otherwise bump mtimes and force spurious rebuilds).
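
As a rough illustration of how directives land in the build configuration, the sketch below maps each directive key onto the flag lists named in the directive table. `BuildConfig` here is a hypothetical stand-in with only the four lists this document mentions, and `apply_directive` is an assumed name, not the actual function in `build_program.cppm`.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical stand-in for the manifest's build configuration.
struct BuildConfig {
    std::vector<std::string> cflags, cxxflags, ldflags, sources;
};

// Applies one directive. Returns false for keys this sketch does not know,
// so the caller can print a one-line warning and continue.
bool apply_directive(BuildConfig& cfg, const std::string& key,
                     const std::string& value,
                     const std::filesystem::path& root) {
    if (key == "cxxflag") { cfg.cxxflags.push_back(value); return true; }
    if (key == "cflag")   { cfg.cflags.push_back(value);   return true; }
    if (key == "link-lib") { cfg.ldflags.push_back("-l" + value); return true; }
    if (key == "link-search") {
        // operator/ keeps `value` unchanged when it is already absolute.
        const auto dir = (root / value).lexically_normal();
        cfg.ldflags.push_back("-L" + dir.string());
        return true;
    }
    if (key == "cfg") {
        cfg.cflags.push_back("-D" + value);
        cfg.cxxflags.push_back("-D" + value);
        return true;
    }
    if (key == "generated") { cfg.sources.push_back(value); return true; }
    // Re-run declarations belong to the cache layer; nothing to apply here.
    if (key == "rerun-if-changed" || key == "rerun-if-env-changed") return true;
    return false;
}
```

Because the function only appends to lists that already feed the fingerprint, no extra invalidation logic is needed for the main build in this model.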

## Constraints (à la carte + supply-chain)

- **Leaf only.** `build.mcpp` chooses flags/sources/codegen and emits link
requirements; it must **not** gate the top-level dependency graph (that stays in
the applicative L1 `[target.'cfg(...)']` tables). The directive set deliberately
excludes "add a registry dependency".
- **Host build, target cfg.** It compiles+runs on the **host**. The MVP therefore
runs it only for **native** builds; under an explicit cross `--target` it is
**skipped with a warning** (compiling it with the cross frontend would yield a
binary that can't run on the host). Host-toolchain-for-cross is a follow-up.
- **Isolation.** Executed as a build action: child-only env (no calling-process
mutation, via `capture_exec`), declared inputs/outputs. Extending the same
declared-I/O contract to recipe `install()` is future work.

## Integration (src/build/prepare.cppm)

New module `src/build/build_program.cppm` exports
`run_build_program(Manifest&, root, hostCompiler, cppStandard)`. Called from
`prepare.cppm` right after toolchain detection (`tc`), i.e. **after** target
resolution + the L1 cfg-flag merge (buildConfig flags final) and **before** the
modgraph scanner (so `generated=` sources are scanned). Compile line:

```
<hostCompiler> -std=<cppStandard> -O0 <host_base_flags> -x c++ <proj>/build.mcpp -o target/.build-mcpp/build.mcpp.bin
```

Compile/run failures are hard errors surfaced with captured output.

**Host toolchain flags (sysroot).** A bare `g++ build.mcpp -o bin` works on a warm
dev box but fails on a fresh sandbox: the sandbox compiler can't find crt/libc
without the sysroot wiring the main build adds. So the compile reuses the host
subset of that wiring from the resolved `Toolchain` (`host_base_flags`): GCC gets
`--sysroot=<tc.sysroot>` (or, with no sysroot, the glibc-payload `-idirafter` /
`-B` / `-L`) plus binutils `-B` and the link-runtime `-L`/`-rpath` dirs; Clang
trusts its sibling `.cfg`. This mirrors `flags.cppm`'s GCC branch (kept a small
parallel copy rather than refactoring the platform-sensitive `compute_flags`
pre-release — a future unification should share one helper).
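
A sketch of how the compile invocation could be assembled as an argument vector, assuming the host flags have already been computed. The function name and parameters are hypothetical; the `-x c++` pair reflects this document's note that the compiler driver does not recognise the `.mcpp` extension.

```cpp
#include <string>
#include <vector>

// Builds the argv for compiling build.mcpp with the host compiler.
// `host_base_flags` stands for the sysroot / -B / -L / -rpath wiring
// described above; its contents are supplied by the caller.
std::vector<std::string> build_program_compile_argv(
    const std::string& host_compiler,
    const std::string& cpp_standard,
    const std::vector<std::string>& host_base_flags,
    const std::string& source,
    const std::string& output) {
    std::vector<std::string> argv{host_compiler, "-std=" + cpp_standard, "-O0"};
    for (const auto& flag : host_base_flags) argv.push_back(flag);
    // -x c++ must come BEFORE the input file it applies to.
    argv.push_back("-x");
    argv.push_back("c++");
    argv.push_back(source);
    argv.push_back("-o");
    argv.push_back(output);
    return argv;
}
```

Keeping the command as a vector rather than a single string avoids shell quoting problems when a sysroot path contains spaces.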

**Artifacts under `target/`.** The compiled program + the declared-input cache live
at `target/.build-mcpp/{build.mcpp.bin, build.mcpp.cache}` (a stable,
non-fingerprint-keyed subdir, since build.mcpp runs before the fingerprint exists),
so they persist across builds and aren't rebuilt needlessly.

## Tests

- `tests/e2e/89_build_mcpp.sh` — a `build.mcpp` emitting a `cxxflag` define + a
`generated` source; assert the define reaches the TU (a `#ifdef` gate) and the
generated source links. Second build asserts the cache short-circuits re-run;
touching a declared `rerun-if-changed` input forces re-run.

## Forward note — `.mcpp` as a first-class C++ extension

The compiler doesn't know the `.mcpp` extension, so we compile build.mcpp with an
explicit `-x c++` (otherwise the driver hands it to the linker as a "linker
script"). This is a special case of a broader convention worth adopting: **inside
an mcpp project, `.mcpp` is just C++.** A natural next step is to add `.mcpp` to the
main build's source glob (`src/**/*.{cppm,cpp,cc,c}` → `+ .mcpp`) with the same
`-x c++` treatment, so a project may use `.mcpp` for ordinary sources/modules — the
extension becomes a marker of "an mcpp-native C++ file" rather than a separate
language. `build.mcpp` is the first instance; the `-x c++` handling here is the
seed. Deferred (out of MVP scope) but the direction is intentional.

## Forward note — typed `import mcpp;` library (Zig-style code API over the wire protocol)

The stdout `mcpp:` text protocol is the **substrate**: it decouples `build.mcpp`
from mcpp's ABI/version, is language-agnostic, and ignores unknown directives
(forward-compatible). This is the Cargo `build.rs` model. Zig sits at the other
end — `build.zig` constructs the graph through a typed `std.Build` **library**.

The chosen direction is the hybrid both ecosystems converge on (cf. Rust's
`build-rs` crate): **keep the text protocol as the wire format, and ship a thin
typed `import mcpp;` module on top** that just emits those strings. So instead of

```cpp
import std;
int main() { std::puts("mcpp:link-lib=m"); }
```

a user writes the modules-first, no-headers form:

```cpp
import mcpp; // bundled in the mcpp binary
int main() { mcpp::link_lib("m"); mcpp::cxxflag("-DX"); }
```

Design constraints for that iteration (per project direction):
- **Bundled in the mcpp binary.** mcpp embeds the `mcpp` module source, writes +
compiles it (cached BMI + object under `target/`, not rebuilt unless the
toolchain changes), and makes it importable when compiling `build.mcpp`.
- **No `import std;` requirement.** The `mcpp` module implements its I/O with
minimal C-level primitives (no `import std;` in its interface), so neither it nor
`build.mcpp` forces the std-module staging cost on a tiny build script.
(Empirically, a standalone `import std;` needs `gcm.cache/std.gcm` staged at the
compile CWD + `std.o` linked — GCC ignores `-fmodule-file=std=` for C++ — so the
module is found via the same `gcm.cache/` staging the ninja backend uses.)
- **Typed API mirrors the directive set** 1:1 (`cxxflag`/`cflag`/`link_lib`/
`link_search`/`cfg`/`generated`/`rerun_if_changed`/`rerun_if_env_changed`).
- The string protocol stays as the documented low-level escape hatch.

This is the next iteration (post-0.0.78); the 0.0.78 core ships the wire-protocol
substrate so everything above layers on a stable foundation.
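
To illustrate how thin such a typed layer can be, here is a header-style sketch in which every function just formats and prints one wire-protocol line. The `mcpp_sketch` namespace is hypothetical and deliberately not named `mcpp`. It uses `std::string` for brevity, whereas the planned module is described above as using minimal C-level primitives, so treat this as a simplification.

```cpp
#include <cstdio>
#include <string>

// Hypothetical wrapper; NOT the planned `mcpp` module.
namespace mcpp_sketch {

// Formats one wire-protocol line, without the trailing newline.
inline std::string format_directive(const std::string& key,
                                    const std::string& value) {
    return "mcpp:" + key + "=" + value;
}

// Prints one directive line to stdout (std::puts appends the newline).
inline void emit(const std::string& key, const std::string& value) {
    std::puts(format_directive(key, value).c_str());
}

// Typed entry points mirroring the directive set 1:1.
inline void cxxflag(const std::string& f)     { emit("cxxflag", f); }
inline void cflag(const std::string& f)       { emit("cflag", f); }
inline void link_lib(const std::string& n)    { emit("link-lib", n); }
inline void link_search(const std::string& d) { emit("link-search", d); }
inline void cfg(const std::string& n)         { emit("cfg", n); }
inline void generated(const std::string& p)   { emit("generated", p); }
inline void rerun_if_changed(const std::string& p)     { emit("rerun-if-changed", p); }
inline void rerun_if_env_changed(const std::string& v) { emit("rerun-if-env-changed", v); }

}  // namespace mcpp_sketch
```

Since the wrapper only produces strings, a `build.mcpp` written against it and one that prints the raw lines are indistinguishable to mcpp.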

## mcpp-index dual perspective

A new workspace member, `tests/examples/build-mcpp`, has a `build.mcpp` that emits a
define consumed by `main.cpp`, exercising the feature through the real pipeline.
88 changes: 88 additions & 0 deletions docs/07-build-mcpp.md
@@ -0,0 +1,88 @@
# `build.mcpp` — a native build program

**English** | [简体中文](zh/07-build-mcpp.md)

Most projects need nothing more than `mcpp.toml`. When you need build-time logic —
probe the host, generate a source, decide a flag from the environment — put a
`build.mcpp` in your project root. It is the mcpp analog of Zig's `build.zig` and
Cargo's `build.rs`, but written in **C++**: no second language, and it dogfoods
mcpp itself.

mcpp compiles `build.mcpp` with your toolchain and runs it **before** the main
build. The program talks to mcpp by printing `mcpp:` directives to stdout; those
directives augment the build.

## Quick example

```cpp
// build.mcpp
#include <cstdio>
#include <cstdlib>  // std::getenv
#include <fstream>

int main() {
    // Generate a source the main build will compile + link.
    std::ofstream("src/generated.cpp") << "const char* banner() { return \"hi\"; }\n";

    std::puts("mcpp:generated=src/generated.cpp");  // add it to the build
    std::puts("mcpp:cxxflag=-DHAVE_BANNER=1");      // define a macro for all C++ TUs

    if (std::getenv("USE_FAST")) std::puts("mcpp:cxxflag=-DFAST_PATH=1");
    std::puts("mcpp:rerun-if-env-changed=USE_FAST");  // re-run me when USE_FAST changes
    return 0;
}
```

```bash
mcpp build # compiles + runs build.mcpp, then builds the project
```

## Directives

Print these to stdout (one per line). Any line that does not start with `mcpp:`
is ignored, so you can freely log diagnostics.

| Directive | Effect |
|---|---|
| `mcpp:cxxflag=<flag>` | add `<flag>` to the C++ compile flags |
| `mcpp:cflag=<flag>` | add `<flag>` to the C compile flags |
| `mcpp:link-lib=<name>` | link `-l<name>` |
| `mcpp:link-search=<dir>` | add a library search dir (`-L`; relative dirs resolve against the project root) |
| `mcpp:cfg=<name>` | define `-D<name>` for both C and C++ |
| `mcpp:generated=<path>` | add a generated source (relative to the project root) to the build |
| `mcpp:rerun-if-changed=<path>` | re-run `build.mcpp` when this file changes |
| `mcpp:rerun-if-env-changed=<VAR>` | re-run `build.mcpp` when this env var changes |

The program **requests** build edges (flags, libraries, sources). It cannot add a
registry dependency — keep your dependency graph declarative in `mcpp.toml`
(including platform-conditional `[target.'cfg(...)'.dependencies]`). `build.mcpp`
is for *leaf* decisions: flags, codegen, link requirements.

## Incremental: declared inputs (no needless re-runs)

mcpp does **not** re-run `build.mcpp` on every build. It caches the program's
directives and re-runs only when something it depends on changed:

- the `build.mcpp` source itself,
- the toolchain,
- any file you declared with `rerun-if-changed`,
- any env var you declared with `rerun-if-env-changed`,
- or a `generated` output that has gone missing.

So **declare your inputs**: if your program reads `config.h` or the `USE_FAST`
variable, emit `mcpp:rerun-if-changed=config.h` / `mcpp:rerun-if-env-changed=USE_FAST`.
This replaces the old "process exited 0, so assume it's fine" guesswork with an
explicit input/output contract — incremental builds stay correct.

When nothing changed you'll see `build.mcpp up to date (cached)`; otherwise
`build.mcpp compiling` / `running`.

## Notes & limits

- **Runs on the host.** `build.mcpp` compiles and runs with the host toolchain.
Under a cross build (`mcpp build --target <triple>`) it is **skipped with a
warning** for now (host-toolchain-for-cross is a planned follow-up). Gate
*dependencies* on the target with `[target.'cfg(...)']` tables instead — those
evaluate on the resolved target. See [05 - mcpp.toml Manifest Guide](05-mcpp-toml.md).
- **CWD is the project root**, so relative paths (`src/generated.cpp`) land where
you expect.
- A non-zero exit from `build.mcpp` aborts the build and prints its output.
1 change: 1 addition & 0 deletions docs/README.md
@@ -9,3 +9,4 @@
- [04 - Building from Source & Contributing](04-build-from-source.md)
- [05 - mcpp.toml Manifest Guide](05-mcpp-toml.md)
- [06 - Workspaces](06-workspace.md)
- [07 - build.mcpp Build Program](07-build-mcpp.md)
81 changes: 81 additions & 0 deletions docs/zh/07-build-mcpp.md
@@ -0,0 +1,81 @@
# `build.mcpp`: a native build program

[English](../07-build-mcpp.md) | **简体中文**

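The vast majority of projects need only `mcpp.toml`. When you need build-time logic
(probing the host, generating source code, choosing a compile flag based on the
environment), put a `build.mcpp` in the project root. It is mcpp's counterpart to
Zig's `build.zig` and Cargo's `build.rs`, but written in **C++**: it introduces no
second language, and mcpp eats its own dog food.

mcpp compiles `build.mcpp` with your toolchain and runs it **before** the main
build. The program communicates with mcpp by printing `mcpp:` directives to stdout,
and those directives augment the current build.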

## Quick example

```cpp
// build.mcpp
#include <cstdio>
#include <cstdlib>  // std::getenv
#include <fstream>

int main() {
    // Generate a source file; the main build will compile + link it.
    std::ofstream("src/generated.cpp") << "const char* banner() { return \"hi\"; }\n";

    std::puts("mcpp:generated=src/generated.cpp");  // add it to the build
    std::puts("mcpp:cxxflag=-DHAVE_BANNER=1");      // define a macro for all C++ TUs

    if (std::getenv("USE_FAST")) std::puts("mcpp:cxxflag=-DFAST_PATH=1");
    std::puts("mcpp:rerun-if-env-changed=USE_FAST");  // re-run me when USE_FAST changes
    return 0;
}
```

```bash
mcpp build # compiles + runs build.mcpp, then builds the project
```

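## Directives

Print these to stdout (one per line). Any line that does not start with `mcpp:` is
ignored, so you can freely print diagnostic logs.

| Directive | Effect |
|---|---|
| `mcpp:cxxflag=<flag>` | append `<flag>` to the C++ compile flags |
| `mcpp:cflag=<flag>` | append `<flag>` to the C compile flags |
| `mcpp:link-lib=<name>` | link `-l<name>` |
| `mcpp:link-search=<dir>` | add a library search directory (`-L`; relative paths resolve against the project root) |
| `mcpp:cfg=<name>` | define `-D<name>` for both C and C++ |
| `mcpp:generated=<path>` | add a generated source (relative to the project root) to the build |
| `mcpp:rerun-if-changed=<path>` | re-run `build.mcpp` when this file changes |
| `mcpp:rerun-if-env-changed=<VAR>` | re-run `build.mcpp` when this environment variable changes |

The program **requests** build edges (flags, libraries, sources). It **cannot** add a
registry dependency; keep your dependency graph declarative in `mcpp.toml`
(including platform-conditional dependencies in `[target.'cfg(...)'.dependencies]`).
`build.mcpp` is for *leaf* decisions: flags, code generation, link requirements.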

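## Incremental: declared inputs (no needless re-runs)

mcpp does **not** re-run `build.mcpp` on every build. It caches the directives the
program produced and re-runs only when something it depends on has changed:

- the `build.mcpp` source itself,
- the toolchain,
- any file declared with `rerun-if-changed`,
- any environment variable declared with `rerun-if-env-changed`,
- or a `generated` output that has gone missing.

So **declare your inputs**: if the program reads `config.h` or the `USE_FAST`
variable, emit `mcpp:rerun-if-changed=config.h` and
`mcpp:rerun-if-env-changed=USE_FAST` respectively. This replaces the old "exit code
0, so treat it as success" guesswork with an explicit input/output contract, which
keeps incremental builds correct.

When nothing has changed you will see `build.mcpp up to date (cached)`; otherwise
`build.mcpp compiling` / `running`.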

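## Notes & limits

- **Runs on the host.** `build.mcpp` is compiled and run with the host toolchain.
  Under a cross build (`mcpp build --target <triple>`) it is currently **skipped
  with a warning** (host-toolchain-for-cross is a planned follow-up). To gate
  *dependencies* on the target platform, use `[target.'cfg(...)']` tables instead;
  they are evaluated against the resolved target. See
  [05 - mcpp.toml Manifest Guide](05-mcpp-toml.md).
- **The current working directory is the project root**, so relative paths
  (`src/generated.cpp`) land where you expect.
- A non-zero exit from `build.mcpp` aborts the build and prints its output.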
1 change: 1 addition & 0 deletions docs/zh/README.md
@@ -9,3 +9,4 @@
- [04 - Building from Source & Contributing](04-build-from-source.md)
- [05 - mcpp.toml Manifest Guide](05-mcpp-toml.md)
- [06 - Workspaces](06-workspace.md)
- [07 - build.mcpp Build Program](07-build-mcpp.md)
2 changes: 1 addition & 1 deletion mcpp.toml
@@ -1,6 +1,6 @@
[package]
name = "mcpp"
-version = "0.0.77"
+version = "0.0.78"
description = "Modern C++ build & package management tool"
license = "Apache-2.0"
authors = ["mcpp-community"]