Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
125 changes: 125 additions & 0 deletions .agents/docs/2026-06-30-build-mcpp-module-library-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
# The `mcpp` build-module library for `build.mcpp` (Architecture & Design)

How mcpp provides a **typed module API** to `build.mcpp` so it can be written
modules-first (`import mcpp;`, no `#include`, no `import std;`) instead of printing
raw `mcpp:` protocol strings. Evaluated on five axes: **简洁 (simplicity) / 覆盖
(coverage) / 优化 (optimization) / 稳定 (stability) / 适配 (adaptability)**.

## The constraint that drives the whole design

The helper's job is to emit the *exact* `mcpp:` wire protocol **this** mcpp parses.
So it is **not a third-party library — it is part of the engine's ABI.** Any design
that lets the helper drift from the engine's protocol version (a separately
released package, a pinned dependency) introduces skew. This single fact rules out
most of the "obvious" options and points straight at "ship it with the engine."

## Options considered

| Option | What | Verdict |
|---|---|---|
| **A. Ship a prebuilt BMI** (`mcpp.gcm`/`.pcm` in the release) | precompiled module interface | ✗ BMIs are **not portable** across compiler vendor/version/flags (GCC gcm is locked to the exact GCC build). Would need a combinatorial matrix of BMIs. Fragile. |
| **B. Header-only** (`#include <mcpp_build.h>`) | a shipped header | ✗ contradicts modules-first ("no headers"). (Most portable, but off-brand; kept as a mental fallback only.) |
| **C. Cargo model** — helper is a normal `[build-dependencies]` package in the index | `build.mcpp` depends on a published `mcpp` package, resolved + compiled like any dep | △ composable, but adds a **resolution step** for a leaf script and reintroduces **version skew** (the package version vs the engine's protocol). Cargo's `build-rs` crate works this way — but Cargo's protocol is far more stable than a young tool's. |
| **D. Zig model** — helper is part of the tool, always present, version-matched | embed the module **source** in the binary; compile on demand against the host toolchain | ✓ **chosen.** Zig's `std.Build` ships with the compiler; the build API and the engine are one artifact, so they can never disagree. |

### Why "embed the **source**, not the BMI"

Source is the only **toolchain-portable** form. A BMI is compiler-version-locked;
source compiles against *whatever* host toolchain resolved for this build (gcc on
Linux, clang on macOS/Windows), at whatever version, with the same sysroot flags
the build already computes. So one embedded `constexpr std::string_view` adapts to
every toolchain — no matrix, no skew. This is the crux of **适配 + 稳定**.

## Chosen design

```
mcpp binary
└── constexpr std::string_view kMcppModuleSource // the `mcpp` module, embedded
│ (module; #include <cstdio> export module mcpp; … inline emitters)
▼ only when build.mcpp contains `import mcpp`
<proj>/target/.build-mcpp/
├── mcpp.cppm written from the embedded source
├── mcpp.gcm / .pcm compiled BMI (GCC gcm.cache/ | Clang pcm)
├── mcpp.o module object (linked into build.mcpp.bin)
└── build.mcpp.bin
```

1. **Embedded, version-matched** (`build_program.cppm` `kMcppModuleSource`). The
functions mirror the directive set 1:1 and `std::printf` the `mcpp:` lines. I/O
is C-level (global module fragment `#include <cstdio>`), so **the module needs
no `import std;`** — neither does a `build.mcpp` that only `import mcpp;`.
2. **Compiled on demand, into `target/`** — not in the project tree. GCC:
`-fmodules` → `gcm.cache/mcpp.gcm` + `mcpp.o`; Clang: `--precompile` → `mcpp.pcm`
then `-c` → `mcpp.o`. Reuses the build's own `host_base_flags` (sysroot etc.).
3. **Gated on actual use** — mcpp scans `build.mcpp` for `import mcpp`; only then
is the module built + linked and the compile run from `target/.build-mcpp/`
(so GCC finds `gcm.cache/` relative to cwd, via the 0.0.79 `capture_exec` cwd).
A `#include`-based `build.mcpp` compiles **byte-identically to before** — zero
blast radius.

## Five-axis evaluation

- **简洁** — one embedded string + one compile helper; no packaging, no install, no
registry entry, no version field. The user writes `import mcpp;` and it's there.
- **覆盖** — GCC (gcm) on Linux + Clang (pcm) on macOS/Windows = mcpp's whole
toolchain matrix (mcpp uses clang, not MSVC, on Windows). The directive API
covers every wire directive 1:1.
- **优化** — built only when `build.mcpp` *uses* it AND is being (re)compiled
(already gated by the declared-input cache), so a stable build.mcpp pays nothing.
Cost when it does run: one ~0.3 s module compile. *Future*: a **global
per-toolchain BMI cache** (`~/.mcpp/bmi/build-module/<toolchain-hash>/`,
symlinked into each project's `gcm.cache/`) would compile once per machine
instead of once per project — deferred; the per-project compile is cheap and
keeps the code simple.
- **稳定** — embedded source ⇒ **no version skew** (the headline win); use-gating
⇒ existing `#include` programs are untouched; failures surface as a clear "mcpp
module compile failed" with the compiler output.
- **适配** — source-on-demand adapts to any host toolchain/version automatically;
adding a directive = adding one `inline` function to the embedded string;
per-compiler module ABI handled by the GCC/Clang branch.

## Naming

`import mcpp;` (top-level) for brevity — `build.mcpp` context makes the scope
unambiguous. Future non-build helpers can live under `mcpp.<sub>` modules without
colliding. (`import mcpp.build;` was considered for namespace precision; rejected
for the common case's verbosity — revisit only if a second `mcpp` module appears.)

## API (mirrors the wire protocol 1:1)

```cpp
import mcpp;
int main() {
mcpp::cxxflag("-DHAVE_X=1");
mcpp::cflag("-DFOR_C");
mcpp::link_lib("m"); // -lm
mcpp::link_search("vendor/lib"); // -L…
mcpp::define("HAVE_FEATURE"); // cfg= → -DHAVE_FEATURE
mcpp::generated("src/gen.cpp");
mcpp::rerun_if_changed("config.h");
mcpp::rerun_if_env_changed("USE_FAST");
}
```

The raw stdout protocol stays the documented low-level substrate; `import mcpp;` is
the typed layer over it (the Cargo `build-rs`-over-`cargo::` shape, but
engine-bundled à la Zig).

## Implementation gotcha (recorded)

The embedded source contains the line `export module mcpp;`. mcpp's **default
line-based regex module scanner** (used on the Windows self-host build; the P1689
compiler-driven scanner ignores string literals) read that line *inside the raw
string literal* as `build_program.cppm` declaring a second module → "file already
exports module … cannot export 'mcpp'". Fix: write the declaration with a
`@MODULE@` placeholder substituted to `export module` at file-write time, so no
literal `export module <name>` text appears in mcpp's own source. (A broader fix
would be to teach the regex scanner to skip string/raw-string literals.)

## Coverage / stability boundaries (recorded)

- **Windows/macOS Clang path** is exercised by the mcpp-index `build-mcpp`
workspace member (its `mcpp test --workspace` runs on macOS/Windows with clang);
the e2e `92_build_mcpp_import.sh` covers the GCC path (it `requires: gcc`).
- Cross `--target` builds still skip `build.mcpp` entirely (host-only), so the
module is host-only too.
37 changes: 36 additions & 1 deletion docs/07-build-mcpp.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,9 +54,44 @@ is ignored, so you can freely log diagnostics.

The program **requests** build edges (flags, libraries, sources). It cannot add a
registry dependency — keep your dependency graph declarative in `mcpp.toml`
(including platform-conditional `[target.'cfg(...)'.dependencies]`). `build.mcpp`
(including platform-conditional `[target.windows.dependencies]`). `build.mcpp`
is for *leaf* decisions: flags, codegen, link requirements.

## Typed API: `import mcpp;` (recommended)

Instead of printing raw strings you can write `build.mcpp` **modules-first** —
`import mcpp;`, no `#include`, no `import std;`. The `mcpp` module is bundled in the
mcpp binary (so it always matches your mcpp's protocol) and is compiled on demand;
its functions just emit the directives above:

```cpp
// build.mcpp
import mcpp;

int main() {
mcpp::cxxflag("-DHAVE_BANNER=1");
mcpp::link_lib("m"); // -lm
mcpp::link_search("vendor/lib"); // -L…
mcpp::define("HAVE_FEATURE"); // == mcpp:cfg= → -DHAVE_FEATURE
mcpp::generated("src/gen.cpp");
mcpp::rerun_if_changed("config.h");
mcpp::rerun_if_env_changed("USE_FAST");
}
```

| Function | Emits |
|---|---|
| `mcpp::cxxflag(s)` / `mcpp::cflag(s)` | `mcpp:cxxflag=` / `mcpp:cflag=` |
| `mcpp::link_lib(s)` / `mcpp::link_search(s)` | `mcpp:link-lib=` / `mcpp:link-search=` |
| `mcpp::define(s)` | `mcpp:cfg=` (i.e. `-D<s>`) |
| `mcpp::generated(p)` | `mcpp:generated=` |
| `mcpp::rerun_if_changed(p)` / `mcpp::rerun_if_env_changed(v)` | the matching `rerun-*` directives |

If your `build.mcpp` also needs to *write* a generated file, mix in a textual
`#include <fstream>` — that's fine; only `import std;` is unnecessary. The raw
stdout protocol above remains the low-level substrate; `import mcpp;` is the typed
layer over it.

## Incremental: declared inputs (no needless re-runs)

mcpp does **not** re-run `build.mcpp` on every build. It caches the program's
Expand Down
35 changes: 34 additions & 1 deletion docs/zh/07-build-mcpp.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,9 +50,42 @@ mcpp build # 编译 + 运行 build.mcpp,然后构建工程
| `mcpp:rerun-if-env-changed=<VAR>` | 该环境变量变化时重跑 `build.mcpp` |

程序**请求**构建边(开关、库、源码),它**不能**新增注册表依赖——请把依赖图保持在
`mcpp.toml` 里声明式管理(包括平台条件依赖 `[target.'cfg(...)'.dependencies]`)。
`mcpp.toml` 里声明式管理(包括平台条件依赖 `[target.windows.dependencies]`)。
`build.mcpp` 用于*叶子*决策:开关、代码生成、链接需求。

## 类型化 API:`import mcpp;`(推荐)

除了打印裸字符串,你还可以把 `build.mcpp` 写成**模块优先**——`import mcpp;`,无
`#include`、无 `import std;`。`mcpp` 模块**内置在 mcpp 二进制里**(因此永远和你这版 mcpp
的协议匹配),按需编译;它的函数只是 emit 上面那些指令:

```cpp
// build.mcpp
import mcpp;

int main() {
mcpp::cxxflag("-DHAVE_BANNER=1");
mcpp::link_lib("m"); // -lm
mcpp::link_search("vendor/lib"); // -L…
mcpp::define("HAVE_FEATURE"); // == mcpp:cfg= → -DHAVE_FEATURE
mcpp::generated("src/gen.cpp");
mcpp::rerun_if_changed("config.h");
mcpp::rerun_if_env_changed("USE_FAST");
}
```

| 函数 | emit |
|---|---|
| `mcpp::cxxflag(s)` / `mcpp::cflag(s)` | `mcpp:cxxflag=` / `mcpp:cflag=` |
| `mcpp::link_lib(s)` / `mcpp::link_search(s)` | `mcpp:link-lib=` / `mcpp:link-search=` |
| `mcpp::define(s)` | `mcpp:cfg=`(即 `-D<s>`) |
| `mcpp::generated(p)` | `mcpp:generated=` |
| `mcpp::rerun_if_changed(p)` / `mcpp::rerun_if_env_changed(v)` | 对应的 `rerun-*` 指令 |

如果 `build.mcpp` 还需要*写*生成文件,混入一个文本 `#include <fstream>` 即可——这没问题,
只有 `import std;` 是不必要的。上面的裸 stdout 协议仍是底层基底;`import mcpp;` 是其上的
类型化层。

## 增量:声明输入(避免无谓重跑)

mcpp **不会**每次构建都重跑 `build.mcpp`。它会缓存程序产出的指令,只有当它依赖的东西
Expand Down
2 changes: 1 addition & 1 deletion mcpp.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "mcpp"
version = "0.0.80"
version = "0.0.81"
description = "Modern C++ build & package management tool"
license = "Apache-2.0"
authors = ["mcpp-community"]
Expand Down
111 changes: 107 additions & 4 deletions src/build/build_program.cppm
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,79 @@ std::vector<std::string> host_base_flags(const mcpp::toolchain::Toolchain& tc) {
return f;
}

// The bundled `mcpp` build module — a typed API over the stdout wire protocol so
// build.mcpp can `import mcpp;` (no `#include`, no `import std;`). I/O uses
// C-level primitives in the global module fragment, so the module needs no std
// module BMI. The functions mirror the directive set 1:1; they just print the
// `mcpp:` lines the engine already parses. Embedded in the binary (not shipped as
// a file) so it always matches this mcpp's protocol.
// NOTE: the module declaration line uses a `@MODULE@` placeholder (substituted
// with `export module` when written) so mcpp's own line-based module scanner does
// not mistake this embedded string for build_program.cppm exporting a 2nd module.
constexpr std::string_view kMcppModuleSource = R"CPP(module;
#include <cstdio>
@MODULE@ mcpp;
export namespace mcpp {
inline void cxxflag(const char* flag) { std::printf("mcpp:cxxflag=%s\n", flag); }
inline void cflag(const char* flag) { std::printf("mcpp:cflag=%s\n", flag); }
inline void link_lib(const char* name) { std::printf("mcpp:link-lib=%s\n", name); }
inline void link_search(const char* dir) { std::printf("mcpp:link-search=%s\n", dir); }
inline void define(const char* name) { std::printf("mcpp:cfg=%s\n", name); }
inline void generated(const char* path) { std::printf("mcpp:generated=%s\n", path); }
inline void rerun_if_changed(const char* path) { std::printf("mcpp:rerun-if-changed=%s\n", path); }
inline void rerun_if_env_changed(const char* var) { std::printf("mcpp:rerun-if-env-changed=%s\n", var); }
}
)CPP";

// Compile the bundled `mcpp` module into `bdir` and return the extra flags the
// build.mcpp compile needs to import it (the object `mcpp.o` is linked alongside).
// GCC : -fmodules → gcm.cache/mcpp.gcm + mcpp.o; build.mcpp compiles from
// `bdir` (cwd) so GCC finds gcm.cache/mcpp.gcm.
// Clang : --precompile → mcpp.pcm, then -c → mcpp.o; pass -fmodule-file=mcpp=<pcm>.
std::expected<std::vector<std::string>, std::string>
build_mcpp_module(const fs::path& bdir, const fs::path& compiler,
const std::vector<std::string>& base, const std::string& stdFlag,
bool isClang) {
std::error_code ec;
fs::path cppm = bdir / "mcpp.cppm";
std::string moduleSrc(kMcppModuleSource);
if (auto p = moduleSrc.find("@MODULE@"); p != std::string::npos)
moduleSrc.replace(p, std::string_view("@MODULE@").size(), "export module");
{ std::ofstream os(cppm, std::ios::trunc);
os << moduleSrc;
if (!os) return std::unexpected(std::string("could not write mcpp module source")); }

auto run = [&](std::vector<std::string> argv, const char* what)
-> std::expected<void, std::string> {
auto r = mcpp::platform::process::capture_exec(argv, {}, bdir.string());
if (r.exit_code != 0)
return std::unexpected(std::format("mcpp module {} failed (exit {}):\n{}",
what, r.exit_code, r.output));
return {};
};
auto with_base = [&](std::vector<std::string> head) {
for (auto& b : base) head.push_back(b);
return head;
};

std::vector<std::string> extra;
if (isClang) {
if (auto r = run(with_base({compiler.string(), stdFlag, "--precompile",
"mcpp.cppm", "-o", "mcpp.pcm"}), "precompile"); !r)
return std::unexpected(r.error());
if (auto r = run(with_base({compiler.string(), stdFlag, "-c",
"mcpp.pcm", "-o", "mcpp.o"}), "object"); !r)
return std::unexpected(r.error());
extra.push_back("-fmodule-file=mcpp=" + (bdir / "mcpp.pcm").string());
} else {
if (auto r = run(with_base({compiler.string(), stdFlag, "-fmodules", "-c",
"mcpp.cppm", "-o", "mcpp.o"}), "compile"); !r)
return std::unexpected(r.error());
extra.push_back("-fmodules");
}
return extra;
}

// ── Cache (line-based; one record per line, internal format) ───────────────
// program <hash>
// compiler <hash>
Expand Down Expand Up @@ -286,20 +359,50 @@ std::expected<void, std::string> run_build_program(
return {};
}

fs::create_directories(build_dir(root), ec);
fs::path bin = build_dir(root) / "build.mcpp.bin";
fs::path bdir = build_dir(root);
fs::create_directories(bdir, ec);
fs::path bin = bdir / "build.mcpp.bin";

// ── Compile build.mcpp with the host toolchain ──────────────────────────
std::string std_flag = "-std=" + std::string(cppStandard.empty() ? "c++23" : cppStandard);
auto base = host_base_flags(tc);

// Only wire the bundled `mcpp` module when build.mcpp actually imports it —
// so the common `#include`-based program compiles exactly as before (no
// -fmodules, cwd = project root). When it does `import mcpp;`, compile the
// module, link its object, and run the build.mcpp compile from `bdir` so GCC
// finds gcm.cache/mcpp.gcm.
std::string srcText;
{ std::ifstream is(src); std::ostringstream ss; ss << is.rdbuf(); srcText = ss.str(); }
bool usesModule = srcText.find("import mcpp") != std::string::npos;

std::vector<std::string> moduleFlags;
if (usesModule) {
auto mf = build_mcpp_module(bdir, hostCompiler, base, std_flag,
mcpp::toolchain::is_clang(tc));
if (!mf) return std::unexpected(mf.error());
moduleFlags = std::move(*mf);
}

// `-x c++` is required: the `.mcpp` extension is unknown to the compiler, so
// without it the driver hands build.mcpp to the linker as a linker script.
std::vector<std::string> compileArgv = { hostCompiler.string(), std_flag, "-O0" };
for (auto& bf : host_base_flags(tc)) compileArgv.push_back(bf);
for (auto& bf : base) compileArgv.push_back(bf);
for (auto& mf : moduleFlags) compileArgv.push_back(mf);
compileArgv.push_back("-x"); compileArgv.push_back("c++");
compileArgv.push_back(src.string());
if (usesModule) {
// Link the module object (reset the input language first so the .o isn't
// treated as C++ source).
compileArgv.push_back("-x"); compileArgv.push_back("none");
compileArgv.push_back((bdir / "mcpp.o").string());
}
compileArgv.push_back("-o"); compileArgv.push_back(bin.string());
mcpp::ui::info("build.mcpp", "compiling");
auto cres = mcpp::platform::process::capture_exec(compileArgv, {}, root.string());
// GCC resolves `import mcpp;` via gcm.cache/ relative to the compile cwd, so
// run the module-using compile from bdir; otherwise the project root is fine.
std::string compileCwd = usesModule ? bdir.string() : root.string();
auto cres = mcpp::platform::process::capture_exec(compileArgv, {}, compileCwd);
if (cres.exit_code != 0) {
return std::unexpected(std::format(
"build.mcpp failed to compile (exit {}):\n{}", cres.exit_code, cres.output));
Expand Down
2 changes: 1 addition & 1 deletion src/toolchain/fingerprint.cppm
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ import mcpp.toolchain.detect;

export namespace mcpp::toolchain {

inline constexpr std::string_view MCPP_VERSION = "0.0.80";
inline constexpr std::string_view MCPP_VERSION = "0.0.81";

struct FingerprintInputs {
Toolchain toolchain;
Expand Down
Loading
Loading