
Commit c2fb00d

feat(build.mcpp): typed import mcpp; build module bundled in the binary (v0.0.81) (#193)
* feat(build.mcpp): typed import mcpp; build module bundled in the binary (v0.0.81)

build.mcpp can be written modules-first (`import mcpp;`, no `#include`, no `import std;`), calling a typed API (`mcpp::cxxflag/define/link_lib/generated/…`) that emits the same `mcpp:` wire protocol the engine already parses.

Architecture (see .agents/docs/2026-06-30-build-mcpp-module-library-design.md): the helper IS part of the engine's ABI (it speaks this mcpp's protocol), so it ships WITH the engine, not as a versioned package. This is the Zig std.Build model, not Cargo's build-dep model. Embed the module SOURCE (a constexpr string), not a BMI (BMIs are compiler-version-locked), and compile it on demand against the resolved host toolchain into target/.build-mcpp/ (GCC: -fmodules, gcm.cache; Clang: --precompile, .pcm). I/O is C-level, so the module needs no `import std`.

Gated on actual use: mcpp only builds and links the module when build.mcpp contains 'import mcpp', so a #include-based program compiles byte-identically to before (zero blast radius). Uses the capture_exec cwd support from 0.0.79 so GCC can find gcm.cache/.

- src/build/build_program.cppm: kMcppModuleSource + build_mcpp_module + use-gating
- tests/e2e/92_build_mcpp_import.sh (GCC path); docs/07-build-mcpp.md (+zh)
- design doc; version -> 0.0.81.

The Clang path is covered by the mcpp-index build-mcpp member's workspace job on macOS/Windows.

* fix(build.mcpp): placeholder the embedded module decl so the regex scanner doesn't misread it

The Windows self-host build uses the default regex module scanner, which read the 'export module mcpp;' line inside kMcppModuleSource (a raw string literal) as build_program.cppm exporting a second module -> 'file already exports module ... cannot export mcpp'. Use a @MODULE@ placeholder in the embedded source, substituted with 'export module' when written. No behavior change; the generated mcpp.cppm is identical.

* docs: record the regex-scanner gotcha in the build-module design
1 parent 57a4485 commit c2fb00d

7 files changed

Lines changed: 359 additions & 8 deletions


Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
# The `mcpp` build-module library for `build.mcpp` (Architecture & Design)

How mcpp provides a **typed module API** to `build.mcpp` so it can be written
modules-first (`import mcpp;`, no `#include`, no `import std;`) instead of printing
raw `mcpp:` protocol strings. The design is evaluated on five axes: **simplicity,
coverage, optimization, stability, and adaptability**.

## The constraint that drives the whole design

The helper's job is to emit the *exact* `mcpp:` wire protocol that **this** mcpp parses.
So it is **not a third-party library: it is part of the engine's ABI.** Any design
that lets the helper drift from the engine's protocol version (a separately
released package, a pinned dependency) introduces skew. This single fact rules out
most of the "obvious" options and points straight at "ship it with the engine."

## Options considered

| Option | What | Verdict |
|---|---|---|
| **A. Ship a prebuilt BMI** (`mcpp.gcm`/`.pcm` in the release) | Precompiled module interface | ✗ BMIs are **not portable** across compiler vendor, version, or flags (a GCC `.gcm` is locked to the exact GCC build), so this would need a combinatorial matrix of BMIs. Fragile. |
| **B. Header-only** (`#include <mcpp_build.h>`) | A shipped header | ✗ Contradicts modules-first ("no headers"). Most portable, but off-brand; kept only as a mental fallback. |
| **C. Cargo model**: the helper is a normal `[build-dependencies]` package in the index | `build.mcpp` depends on a published `mcpp` package, resolved and compiled like any dependency | △ Composable, but adds a **resolution step** for a leaf script and reintroduces **version skew** (the package version vs. the engine's protocol). Cargo's `build-rs` crate works this way, but Cargo's protocol is far more stable than a young tool's. |
| **D. Zig model**: the helper is part of the tool, always present, version-matched | Embed the module **source** in the binary; compile it on demand against the host toolchain | ✓ **Chosen.** Zig's `std.Build` ships with the compiler; the build API and the engine are one artifact, so they can never disagree. |

### Why "embed the **source**, not the BMI"

Source is the only **toolchain-portable** form. A BMI is locked to one compiler version;
source compiles against *whatever* host toolchain was resolved for this build (GCC on
Linux, Clang on macOS/Windows), at whatever version, with the same sysroot flags
the build already computes. So one embedded `constexpr std::string_view` adapts to
every toolchain: no matrix, no skew. This is the crux of **adaptability + stability**.

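To make "one source, any toolchain" concrete, here is a minimal sketch of how a single module source could map to per-compiler command lines. It assumes only two compiler families (GCC and Clang) and mirrors the flag sequences that `build_mcpp_module` in this commit passes; the `CompilerKind`, `Command`, and `module_commands` names are hypothetical and not part of mcpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of mcpp.
enum class CompilerKind { Gcc, Clang };
using Command = std::vector<std::string>;

// Build the command lines that turn the embedded source (already written to
// mcpp.cppm in the working directory) into a BMI plus an object file.
// `base` stands in for the host flags the build already computes (sysroot etc.).
std::vector<Command> module_commands(CompilerKind kind, const std::string& cxx,
                                     const std::string& stdFlag,
                                     const std::vector<std::string>& base) {
    auto with_base = [&](Command head) {
        head.insert(head.end(), base.begin(), base.end());
        return head;
    };
    if (kind == CompilerKind::Clang) {
        // Clang: precompile the interface to a .pcm, then compile the .pcm to an object.
        return {
            with_base({cxx, stdFlag, "--precompile", "mcpp.cppm", "-o", "mcpp.pcm"}),
            with_base({cxx, stdFlag, "-c", "mcpp.pcm", "-o", "mcpp.o"}),
        };
    }
    // GCC: a single -fmodules compile writes gcm.cache/mcpp.gcm and mcpp.o.
    return {with_base({cxx, stdFlag, "-fmodules", "-c", "mcpp.cppm", "-o", "mcpp.o"})};
}
```

The only per-toolchain knowledge is the branch; the source text and the base flags are shared, which is why no BMI matrix is needed.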
## Chosen design

```
mcpp binary
└── constexpr std::string_view kMcppModuleSource   // the `mcpp` module, embedded
        │  (module; #include <cstdio> export module mcpp; … inline emitters)
        ▼  only when build.mcpp contains `import mcpp`
<proj>/target/.build-mcpp/
├── mcpp.cppm        written from the embedded source
├── mcpp.gcm / .pcm  compiled BMI (GCC: gcm.cache/ | Clang: .pcm)
├── mcpp.o           module object (linked into build.mcpp.bin)
└── build.mcpp.bin
```

1. **Embedded, version-matched** (`kMcppModuleSource` in `build_program.cppm`). The
   functions mirror the directive set 1:1 and `std::printf` the `mcpp:` lines. I/O
   is C-level (`#include <cstdio>` in the global module fragment), so **the module needs
   no `import std;`**, and neither does a `build.mcpp` that only does `import mcpp;`.
2. **Compiled on demand, into `target/`**, not into the project tree. GCC:
   `-fmodules` → `gcm.cache/mcpp.gcm` + `mcpp.o`; Clang: `--precompile` → `mcpp.pcm`,
   then `-c` → `mcpp.o`. Both reuse the build's own `host_base_flags` (sysroot etc.).
3. **Gated on actual use**: mcpp scans `build.mcpp` for `import mcpp`; only then
   is the module built and linked, and the `build.mcpp` compile run from `target/.build-mcpp/`
   (so GCC finds `gcm.cache/` relative to the cwd, using the `capture_exec` cwd support
   from 0.0.79). A `#include`-based `build.mcpp` compiles **byte-identically to before**:
   zero blast radius.

## Five-axis evaluation

- **Simplicity**: one embedded string plus one compile helper; no packaging, no install, no
  registry entry, no version field. The user writes `import mcpp;` and it's there.
- **Coverage**: GCC (`.gcm`) on Linux plus Clang (`.pcm`) on macOS/Windows covers mcpp's whole
  toolchain matrix (mcpp uses Clang, not MSVC, on Windows). The directive API
  covers every wire directive 1:1.
- **Optimization**: the module is built only when `build.mcpp` *uses* it AND is being (re)compiled
  (already gated by the declared-input cache), so a stable `build.mcpp` pays nothing.
  When it does run, the cost is one module compile of roughly 0.3 s. *Future*: a **global
  per-toolchain BMI cache** (`~/.mcpp/bmi/build-module/<toolchain-hash>/`,
  symlinked into each project's `gcm.cache/`) would compile once per machine
  instead of once per project. This is deferred: the per-project compile is cheap and
  keeps the code simple.
- **Stability**: embedded source ⇒ **no version skew** (the headline win); use-gating
  ⇒ existing `#include` programs are untouched; failures surface as a clear "mcpp
  module compile failed" error that includes the compiler output.
- **Adaptability**: source-on-demand adapts to any host toolchain and version automatically;
  adding a directive means adding one `inline` function to the embedded string;
  the per-compiler module ABI is handled by the GCC/Clang branch.

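The deferred global cache needs a stable key per toolchain. A minimal sketch of deriving the `<toolchain-hash>` directory, assuming the key covers the compiler path, its version string, and the flags that affect the BMI; the FNV-1a hash, the `toolchain_cache_dir` name, and the exact inputs are illustrative choices, not mcpp's. A hand-written hash is used because `std::hash` is only required to be consistent within a single program run, which is not enough for an on-disk cache.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>
#include <string_view>
#include <vector>

// 64-bit FNV-1a: simple, and stable across runs and platforms.
std::uint64_t fnv1a64(std::string_view data, std::uint64_t h = 14695981039346656037ull) {
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Render a 64-bit value as 16 lowercase hex digits.
std::string to_hex16(std::uint64_t v) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string s(16, '0');
    for (int i = 15; i >= 0; --i) {
        s[i] = digits[v & 0xF];
        v >>= 4;
    }
    return s;
}

// Hypothetical: <home>/.mcpp/bmi/build-module/<toolchain-hash>/ for one toolchain identity.
std::filesystem::path toolchain_cache_dir(const std::filesystem::path& home,
                                          std::string_view compiler,
                                          std::string_view version,
                                          const std::vector<std::string>& flags) {
    const std::string_view sep("\0", 1);  // NUL separator so ("ab","c") != ("a","bc")
    std::uint64_t h = fnv1a64(compiler);
    h = fnv1a64(version, fnv1a64(sep, h));
    for (const auto& f : flags) h = fnv1a64(f, fnv1a64(sep, h));
    return home / ".mcpp" / "bmi" / "build-module" / to_hex16(h);
}
```

Any change to the compiler, its version, or a BMI-relevant flag yields a different directory, so a stale BMI is never reused across toolchains.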
## Naming

`import mcpp;` (top-level) is used for brevity: the `build.mcpp` context makes the scope
unambiguous. Future non-build helpers can live under `mcpp.<sub>` modules without
colliding. (`import mcpp.build;` was considered for namespace precision but rejected
as too verbose for the common case; revisit only if a second `mcpp` module appears.)

## API (mirrors the wire protocol 1:1)

```cpp
import mcpp;

int main() {
    mcpp::cxxflag("-DHAVE_X=1");
    mcpp::cflag("-DFOR_C");
    mcpp::link_lib("m");                  // -lm
    mcpp::link_search("vendor/lib");      // -L…
    mcpp::define("HAVE_FEATURE");         // cfg= → -DHAVE_FEATURE
    mcpp::generated("src/gen.cpp");
    mcpp::rerun_if_changed("config.h");
    mcpp::rerun_if_env_changed("USE_FAST");
}
```

The raw stdout protocol stays the documented low-level substrate; `import mcpp;` is
the typed layer over it (the shape of Cargo's `build-rs` over `cargo::` directives, but
bundled with the engine à la Zig).

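Because each function is a one-line `printf`, the 1:1 mapping is easy to check outside the module. The sketch below restates the emitters against a `std::ostream&` instead of stdout so the exact wire lines can be inspected; the `sketch` namespace and the stream parameter are hypothetical additions for illustration, while the directive keys are the ones the embedded module prints.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {
// Every emitter writes exactly one `mcpp:<key>=<value>` line.
inline void emit(std::ostream& out, const char* key, const char* value) {
    out << "mcpp:" << key << '=' << value << '\n';
}
inline void cxxflag(std::ostream& o, const char* f) { emit(o, "cxxflag", f); }
inline void cflag(std::ostream& o, const char* f) { emit(o, "cflag", f); }
inline void link_lib(std::ostream& o, const char* n) { emit(o, "link-lib", n); }
inline void link_search(std::ostream& o, const char* d) { emit(o, "link-search", d); }
// `define` maps to the `cfg` directive, which the engine turns into -D<name>.
inline void define(std::ostream& o, const char* n) { emit(o, "cfg", n); }
inline void generated(std::ostream& o, const char* p) { emit(o, "generated", p); }
inline void rerun_if_changed(std::ostream& o, const char* p) { emit(o, "rerun-if-changed", p); }
inline void rerun_if_env_changed(std::ostream& o, const char* v) { emit(o, "rerun-if-env-changed", v); }
}  // namespace sketch
```

For example, `sketch::link_lib(std::cout, "m")` prints `mcpp:link-lib=m`, the same line `mcpp::link_lib("m")` writes to stdout.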
## Implementation gotcha (recorded)

The embedded source contains the line `export module mcpp;`. mcpp's **default
line-based regex module scanner** (used on the Windows self-host build; the P1689
compiler-driven scanner ignores string literals) read that line *inside the raw
string literal* as `build_program.cppm` declaring a second module, failing with "file already
exports module … cannot export 'mcpp'". Fix: write the declaration with an
`@MODULE@` placeholder that is replaced with `export module` at file-write time, so no
literal `export module <name>` text appears in mcpp's own source. (A broader fix
would be to teach the regex scanner to skip string and raw-string literals.)

## Coverage / stability boundaries (recorded)

- The **Windows/macOS Clang path** is exercised by the mcpp-index `build-mcpp`
  workspace member (its `mcpp test --workspace` job runs on macOS/Windows with Clang);
  the e2e test `92_build_mcpp_import.sh` covers the GCC path (it declares `requires: gcc`).
- Cross `--target` builds still skip `build.mcpp` entirely (it is host-only), so the
  module is host-only too.

docs/07-build-mcpp.md

Lines changed: 36 additions & 1 deletion
@@ -54,9 +54,44 @@ is ignored, so you can freely log diagnostics.
 
 The program **requests** build edges (flags, libraries, sources). It cannot add a
 registry dependency — keep your dependency graph declarative in `mcpp.toml`
-(including platform-conditional `[target.'cfg(...)'.dependencies]`). `build.mcpp`
+(including platform-conditional `[target.windows.dependencies]`). `build.mcpp`
 is for *leaf* decisions: flags, codegen, link requirements.
 
+## Typed API: `import mcpp;` (recommended)
+
+Instead of printing raw strings you can write `build.mcpp` **modules-first**:
+`import mcpp;`, no `#include`, no `import std;`. The `mcpp` module is bundled in the
+mcpp binary (so it always matches your mcpp's protocol) and is compiled on demand;
+its functions just emit the directives above:
+
+```cpp
+// build.mcpp
+import mcpp;
+
+int main() {
+    mcpp::cxxflag("-DHAVE_BANNER=1");
+    mcpp::link_lib("m");                  // -lm
+    mcpp::link_search("vendor/lib");      // -L…
+    mcpp::define("HAVE_FEATURE");         // == mcpp:cfg= → -DHAVE_FEATURE
+    mcpp::generated("src/gen.cpp");
+    mcpp::rerun_if_changed("config.h");
+    mcpp::rerun_if_env_changed("USE_FAST");
+}
+```
+
+| Function | Emits |
+|---|---|
+| `mcpp::cxxflag(s)` / `mcpp::cflag(s)` | `mcpp:cxxflag=` / `mcpp:cflag=` |
+| `mcpp::link_lib(s)` / `mcpp::link_search(s)` | `mcpp:link-lib=` / `mcpp:link-search=` |
+| `mcpp::define(s)` | `mcpp:cfg=` (i.e. `-D<s>`) |
+| `mcpp::generated(p)` | `mcpp:generated=` |
+| `mcpp::rerun_if_changed(p)` / `mcpp::rerun_if_env_changed(v)` | the matching `rerun-*` directives |
+
+If your `build.mcpp` also needs to *write* a generated file, mix in a textual
+`#include <fstream>` — that's fine; only `import std;` is unnecessary. The raw
+stdout protocol above remains the low-level substrate; `import mcpp;` is the typed
+layer over it.
+
 ## Incremental: declared inputs (no needless re-runs)
 
 mcpp does **not** re-run `build.mcpp` on every build. It caches the program's

docs/zh/07-build-mcpp.md

Lines changed: 34 additions & 1 deletion
@@ -50,9 +50,42 @@ mcpp build # compile + run build.mcpp, then build the project
 | `mcpp:rerun-if-env-changed=<VAR>` | Re-run `build.mcpp` when this environment variable changes |
 
 The program **requests** build edges (flags, libraries, sources); it **cannot** add registry dependencies. Keep the dependency graph
-declared in `mcpp.toml` (including platform-conditional dependencies `[target.'cfg(...)'.dependencies]`).
+declared in `mcpp.toml` (including platform-conditional dependencies `[target.windows.dependencies]`).
 `build.mcpp` is for *leaf* decisions: flags, code generation, link requirements.
 
+## Typed API: `import mcpp;` (recommended)
+
+Besides printing raw strings, you can also write `build.mcpp` **modules-first**: `import mcpp;`, with no
+`#include` and no `import std;`. The `mcpp` module is **built into the mcpp binary** (so it always matches the protocol of
+your mcpp version) and is compiled on demand; its functions simply emit the directives above:
+
+```cpp
+// build.mcpp
+import mcpp;
+
+int main() {
+    mcpp::cxxflag("-DHAVE_BANNER=1");
+    mcpp::link_lib("m");                  // -lm
+    mcpp::link_search("vendor/lib");      // -L…
+    mcpp::define("HAVE_FEATURE");         // == mcpp:cfg= → -DHAVE_FEATURE
+    mcpp::generated("src/gen.cpp");
+    mcpp::rerun_if_changed("config.h");
+    mcpp::rerun_if_env_changed("USE_FAST");
+}
+```
+
+| Function | Emits |
+|---|---|
+| `mcpp::cxxflag(s)` / `mcpp::cflag(s)` | `mcpp:cxxflag=` / `mcpp:cflag=` |
+| `mcpp::link_lib(s)` / `mcpp::link_search(s)` | `mcpp:link-lib=` / `mcpp:link-search=` |
+| `mcpp::define(s)` | `mcpp:cfg=` (i.e. `-D<s>`) |
+| `mcpp::generated(p)` | `mcpp:generated=` |
+| `mcpp::rerun_if_changed(p)` / `mcpp::rerun_if_env_changed(v)` | the corresponding `rerun-*` directives |
+
+If `build.mcpp` also needs to **generate a file**, just mix in a textual `#include <fstream>`; that is fine,
+only `import std;` is unnecessary. The raw stdout protocol above is still the low-level substrate; `import mcpp;` is the
+typed layer on top of it.
+
 ## Incremental: declared inputs (avoiding needless re-runs)
 
 mcpp does **not** re-run `build.mcpp` on every build. It caches the directives the program produces, and only when what it depends on

mcpp.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "mcpp"
-version = "0.0.80"
+version = "0.0.81"
 description = "Modern C++ build & package management tool"
 license = "Apache-2.0"
 authors = ["mcpp-community"]

src/build/build_program.cppm

Lines changed: 107 additions & 4 deletions
@@ -145,6 +145,79 @@ std::vector<std::string> host_base_flags(const mcpp::toolchain::Toolchain& tc) {
     return f;
 }
 
+// The bundled `mcpp` build module — a typed API over the stdout wire protocol so
+// build.mcpp can `import mcpp;` (no `#include`, no `import std;`). I/O uses
+// C-level primitives in the global module fragment, so the module needs no std
+// module BMI. The functions mirror the directive set 1:1; they just print the
+// `mcpp:` lines the engine already parses. Embedded in the binary (not shipped as
+// a file) so it always matches this mcpp's protocol.
+// NOTE: the module declaration line uses a `@MODULE@` placeholder (substituted
+// with `export module` when written) so mcpp's own line-based module scanner does
+// not mistake this embedded string for build_program.cppm exporting a 2nd module.
+constexpr std::string_view kMcppModuleSource = R"CPP(module;
+#include <cstdio>
+@MODULE@ mcpp;
+export namespace mcpp {
+inline void cxxflag(const char* flag) { std::printf("mcpp:cxxflag=%s\n", flag); }
+inline void cflag(const char* flag) { std::printf("mcpp:cflag=%s\n", flag); }
+inline void link_lib(const char* name) { std::printf("mcpp:link-lib=%s\n", name); }
+inline void link_search(const char* dir) { std::printf("mcpp:link-search=%s\n", dir); }
+inline void define(const char* name) { std::printf("mcpp:cfg=%s\n", name); }
+inline void generated(const char* path) { std::printf("mcpp:generated=%s\n", path); }
+inline void rerun_if_changed(const char* path) { std::printf("mcpp:rerun-if-changed=%s\n", path); }
+inline void rerun_if_env_changed(const char* var) { std::printf("mcpp:rerun-if-env-changed=%s\n", var); }
+}
+)CPP";
+
+// Compile the bundled `mcpp` module into `bdir` and return the extra flags the
+// build.mcpp compile needs to import it (the object `mcpp.o` is linked alongside).
+//   GCC   : -fmodules → gcm.cache/mcpp.gcm + mcpp.o; build.mcpp compiles from
+//           `bdir` (cwd) so GCC finds gcm.cache/mcpp.gcm.
+//   Clang : --precompile → mcpp.pcm, then -c → mcpp.o; pass -fmodule-file=mcpp=<pcm>.
+std::expected<std::vector<std::string>, std::string>
+build_mcpp_module(const fs::path& bdir, const fs::path& compiler,
+                  const std::vector<std::string>& base, const std::string& stdFlag,
+                  bool isClang) {
+    std::error_code ec;
+    fs::path cppm = bdir / "mcpp.cppm";
+    std::string moduleSrc(kMcppModuleSource);
+    if (auto p = moduleSrc.find("@MODULE@"); p != std::string::npos)
+        moduleSrc.replace(p, std::string_view("@MODULE@").size(), "export module");
+    { std::ofstream os(cppm, std::ios::trunc);
+      os << moduleSrc;
+      if (!os) return std::unexpected(std::string("could not write mcpp module source")); }
+
+    auto run = [&](std::vector<std::string> argv, const char* what)
+        -> std::expected<void, std::string> {
+        auto r = mcpp::platform::process::capture_exec(argv, {}, bdir.string());
+        if (r.exit_code != 0)
+            return std::unexpected(std::format("mcpp module {} failed (exit {}):\n{}",
+                                               what, r.exit_code, r.output));
+        return {};
+    };
+    auto with_base = [&](std::vector<std::string> head) {
+        for (auto& b : base) head.push_back(b);
+        return head;
+    };
+
+    std::vector<std::string> extra;
+    if (isClang) {
+        if (auto r = run(with_base({compiler.string(), stdFlag, "--precompile",
+                                    "mcpp.cppm", "-o", "mcpp.pcm"}), "precompile"); !r)
+            return std::unexpected(r.error());
+        if (auto r = run(with_base({compiler.string(), stdFlag, "-c",
+                                    "mcpp.pcm", "-o", "mcpp.o"}), "object"); !r)
+            return std::unexpected(r.error());
+        extra.push_back("-fmodule-file=mcpp=" + (bdir / "mcpp.pcm").string());
+    } else {
+        if (auto r = run(with_base({compiler.string(), stdFlag, "-fmodules", "-c",
+                                    "mcpp.cppm", "-o", "mcpp.o"}), "compile"); !r)
+            return std::unexpected(r.error());
+        extra.push_back("-fmodules");
+    }
+    return extra;
+}
+
 // ── Cache (line-based; one record per line, internal format) ───────────────
 // program <hash>
 // compiler <hash>
@@ -286,20 +359,50 @@ std::expected<void, std::string> run_build_program(
         return {};
     }
 
-    fs::create_directories(build_dir(root), ec);
-    fs::path bin = build_dir(root) / "build.mcpp.bin";
+    fs::path bdir = build_dir(root);
+    fs::create_directories(bdir, ec);
+    fs::path bin = bdir / "build.mcpp.bin";
 
     // ── Compile build.mcpp with the host toolchain ──────────────────────────
     std::string std_flag = "-std=" + std::string(cppStandard.empty() ? "c++23" : cppStandard);
+    auto base = host_base_flags(tc);
+
+    // Only wire the bundled `mcpp` module when build.mcpp actually imports it —
+    // so the common `#include`-based program compiles exactly as before (no
+    // -fmodules, cwd = project root). When it does `import mcpp;`, compile the
+    // module, link its object, and run the build.mcpp compile from `bdir` so GCC
+    // finds gcm.cache/mcpp.gcm.
+    std::string srcText;
+    { std::ifstream is(src); std::ostringstream ss; ss << is.rdbuf(); srcText = ss.str(); }
+    bool usesModule = srcText.find("import mcpp") != std::string::npos;
+
+    std::vector<std::string> moduleFlags;
+    if (usesModule) {
+        auto mf = build_mcpp_module(bdir, hostCompiler, base, std_flag,
+                                    mcpp::toolchain::is_clang(tc));
+        if (!mf) return std::unexpected(mf.error());
+        moduleFlags = std::move(*mf);
+    }
+
     // `-x c++` is required: the `.mcpp` extension is unknown to the compiler, so
     // without it the driver hands build.mcpp to the linker as a linker script.
     std::vector<std::string> compileArgv = { hostCompiler.string(), std_flag, "-O0" };
-    for (auto& bf : host_base_flags(tc)) compileArgv.push_back(bf);
+    for (auto& bf : base) compileArgv.push_back(bf);
+    for (auto& mf : moduleFlags) compileArgv.push_back(mf);
     compileArgv.push_back("-x"); compileArgv.push_back("c++");
     compileArgv.push_back(src.string());
+    if (usesModule) {
+        // Link the module object (reset the input language first so the .o isn't
+        // treated as C++ source).
+        compileArgv.push_back("-x"); compileArgv.push_back("none");
+        compileArgv.push_back((bdir / "mcpp.o").string());
+    }
     compileArgv.push_back("-o"); compileArgv.push_back(bin.string());
     mcpp::ui::info("build.mcpp", "compiling");
-    auto cres = mcpp::platform::process::capture_exec(compileArgv, {}, root.string());
+    // GCC resolves `import mcpp;` via gcm.cache/ relative to the compile cwd, so
+    // run the module-using compile from bdir; otherwise the project root is fine.
+    std::string compileCwd = usesModule ? bdir.string() : root.string();
+    auto cres = mcpp::platform::process::capture_exec(compileArgv, {}, compileCwd);
     if (cres.exit_code != 0) {
         return std::unexpected(std::format(
             "build.mcpp failed to compile (exit {}):\n{}", cres.exit_code, cres.output));

src/toolchain/fingerprint.cppm

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ import mcpp.toolchain.detect;
 
 export namespace mcpp::toolchain {
 
-inline constexpr std::string_view MCPP_VERSION = "0.0.80";
+inline constexpr std::string_view MCPP_VERSION = "0.0.81";
 
 struct FingerprintInputs {
     Toolchain toolchain;
