diff --git a/.agents/docs/2026-06-30-l3-build-mcpp-implementation-design.md b/.agents/docs/2026-06-30-l3-build-mcpp-implementation-design.md
new file mode 100644
index 0000000..a682411
--- /dev/null
+++ b/.agents/docs/2026-06-30-l3-build-mcpp-implementation-design.md
@@ -0,0 +1,170 @@
+# L3 `build.mcpp` — native imperative build program (implementation design)
+
+Companion to `2026-06-29-manifest-environment-and-platform-design.md` (§L3). This
+doc nails down the concrete MVP shipped in mcpp 0.0.78.
+
+## What it is
+
+A project-local `build.mcpp` (a C++ source file, Zig's `build.zig` / Cargo's
+`build.rs` model — but in the project's own language, so no second language and it
+dogfoods mcpp). mcpp compiles it with the **host** toolchain and runs it **before**
+the main build; the program emits stdout directives that augment the main build.
+
+```cpp
+// build.mcpp
+#include <cstdio>
+int main() {
+    std::puts("mcpp:cxxflag=-DHAVE_FEATURE=1");
+    std::puts("mcpp:link-lib=m");
+    std::puts("mcpp:rerun-if-env-changed=USE_FAST");
+}
+```
+
+## Directive protocol (Discipline 1 — structured output, not global mutation)
+
+The program communicates **only** via `mcpp:`-prefixed stdout lines; everything
+else is ignored (so the program may freely log to stderr/stdout). Recognized
+directives:
+
+| Directive | Effect |
+|---|---|
+| `mcpp:cxxflag=<flag>` | append `<flag>` to `buildConfig.cxxflags` |
+| `mcpp:cflag=<flag>` | append `<flag>` to `buildConfig.cflags` |
+| `mcpp:link-lib=<name>` | append `-l<name>` to `buildConfig.ldflags` |
+| `mcpp:link-search=<dir>` | append `-L<dir>` to `buildConfig.ldflags` (dir resolved against the project root) |
+| `mcpp:cfg=<name>` | append `-D<name>` to **both** cflags and cxxflags |
+| `mcpp:generated=<path>` | add `<path>` (relative to project root) to `buildConfig.sources` so the modgraph scanner picks it up |
+| `mcpp:rerun-if-changed=<path>` | declare a file input (re-run gate, see Discipline 2) |
+| `mcpp:rerun-if-env-changed=<VAR>` | declare an env input (re-run gate) |
+
+It *requests* graph edges (flags/libs/sources); it never silently mutates build state.
+Unknown `mcpp:` directives are ignored with a one-line warning (forward-compat).
+
+## Declared-I/O re-run contract (Discipline 2 — fixes the `.mcpp_ok` blind spot)
+
+The program is **not** re-run every build. Its parsed directives + declared inputs
+are cached at `<root>/target/.build-mcpp/build.mcpp.cache`. On each build we re-run iff:
+
+- the cache is missing, **or**
+- the `build.mcpp` source content hash changed, **or**
+- the host compiler identity changed, **or**
+- any declared `rerun-if-changed` file's content hash changed (or the file vanished), **or**
+- any declared `rerun-if-env-changed` variable's current value changed, **or**
+- any `generated=` output path no longer exists.
+
+Otherwise the cached directives are reused without recompiling/running. This is the
+documented replacement for the bare `.mcpp_ok` success marker ("process exited 0 ≠
+outputs correct"): a **declared-input / declared-output contract**. Hashing reuses
+the existing FNV-1a helpers (`mcpp::toolchain::hash_file` / `hash_string`).
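The env-variable gate in the list above can be sketched as follows. This is a minimal illustration, assuming a standard 64-bit FNV-1a rendered as 16 hex digits; `fnv1a_hex` and `env_input_changed` are hypothetical names used only here, and the real `mcpp::toolchain::hash_string` may differ in detail.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical stand-in for mcpp::toolchain::hash_string (an assumption):
// standard 64-bit FNV-1a, rendered as 16 lowercase hex digits.
std::string fnv1a_hex(std::string_view data) {
    std::uint64_t h = 0xcbf29ce484222325ULL;   // FNV-1a 64-bit offset basis
    for (unsigned char c : data) {
        h ^= c;                                // xor the byte in first...
        h *= 0x100000001b3ULL;                 // ...then multiply by the FNV prime
    }
    static constexpr char digits[] = "0123456789abcdef";
    std::string out(16, '0');
    for (int i = 15; i >= 0; --i) {            // fill from the least significant nibble
        out[i] = digits[h & 0xF];
        h >>= 4;
    }
    return out;
}

// True when the variable's current value no longer matches the hash recorded
// at the previous run. An unset variable is treated like an empty one.
bool env_input_changed(const std::string& name, const std::string& cachedHash) {
    const char* v = std::getenv(name.c_str());
    return fnv1a_hex(v ? v : "") != cachedHash;
}
```

Because an unset variable hashes like an empty string, this sketch cannot tell "unset" from "set to empty"; the `env_value` helper in this patch makes the same simplification.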
+
+Because the applied directives land in `buildConfig.{cflags,cxxflags,ldflags}` —
+which already feed `canonical_compile_flags` → the fingerprint — and generated
+sources feed the modgraph, the **main** build is automatically sensitive to a
+changed `build.mcpp` output. The cache only avoids needless re-execution / file
+regeneration (which would otherwise bump mtimes and force spurious rebuilds).
+
+## Constraints (à la carte + supply-chain)
+
+- **Leaf only.** `build.mcpp` chooses flags/sources/codegen and emits link
+  requirements; it must **not** gate the top-level dependency graph (that stays in
+  the applicative L1 `[target.'cfg(...)']` tables). The directive set deliberately
+  excludes "add a registry dependency".
+- **Host build, target cfg.** It compiles+runs on the **host**. The MVP therefore
+  runs it only for **native** builds; under an explicit cross `--target` it is
+  **skipped with a warning** (compiling it with the cross frontend would yield a
+  binary that can't run on the host). Host-toolchain-for-cross is a follow-up.
+- **Isolation.** Executed as a build action: child-only env (no calling-process
+  mutation, via `capture_exec`), declared inputs/outputs. Extending the same
+  declared-I/O contract to recipe `install()` is future work.
+
+## Integration (src/build/prepare.cppm)
+
+New module `src/build/build_program.cppm` exports
+`run_build_program(Manifest&, root, hostCompiler, tc, cppStandard, isCross)`. Called
+from `prepare.cppm` right after toolchain detection (`tc`), i.e. **after** target
+resolution + the L1 cfg-flag merge (buildConfig flags final) and **before** the
+modgraph scanner (so `generated=` sources are scanned). Compile line:
+
+```
+<hostCompiler> -std=<cppStandard> -O0 <host base flags> -x c++ <root>/build.mcpp -o <root>/target/.build-mcpp/build.mcpp.bin
+```
+
+Compile/run failures are hard errors surfaced with captured output.
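A minimal sketch of how such a compile line could be assembled as separate argv tokens. The token order follows `run_build_program` in this patch; the helper name `build_program_argv` and its parameters are hypothetical and exist only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of mcpp): assemble the build.mcpp compile
// command as separate argv tokens, so no shell quoting is involved.
std::vector<std::string> build_program_argv(const std::string& compiler,
                                            const std::string& cppStandard,
                                            const std::vector<std::string>& hostFlags,
                                            const std::string& source,
                                            const std::string& output) {
    std::vector<std::string> argv = {compiler, "-std=" + cppStandard, "-O0"};
    // Host sysroot / runtime wiring goes before the source file.
    argv.insert(argv.end(), hostFlags.begin(), hostFlags.end());
    // `-x c++` must precede the source: the driver does not know `.mcpp`.
    argv.push_back("-x");
    argv.push_back("c++");
    argv.push_back(source);
    argv.push_back("-o");
    argv.push_back(output);
    return argv;
}
```

Keeping the flags as individual tokens means paths containing spaces need no escaping, which is presumably why the patch passes a vector to `capture_exec` rather than a single command string.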
+
+**Host toolchain flags (sysroot).** A bare `g++ build.mcpp -o bin` works on a warm
+dev box but fails on a fresh sandbox: the sandbox compiler can't find crt/libc
+without the sysroot wiring the main build adds. So the compile reuses the host
+subset of that wiring from the resolved `Toolchain` (`host_base_flags`): GCC gets
+`--sysroot=<path>` (or, with no sysroot, the glibc-payload `-idirafter` /
+`-B` / `-L`) plus binutils `-B` and the link-runtime `-L`/`-rpath` dirs; Clang
+trusts its sibling `.cfg`. This mirrors `flags.cppm`'s GCC branch (kept a small
+parallel copy rather than refactoring the platform-sensitive `compute_flags`
+pre-release — a future unification should share one helper).
+
+**Artifacts under `target/`.** The compiled program + the declared-input cache live
+at `target/.build-mcpp/{build.mcpp.bin, build.mcpp.cache}` (a stable,
+non-fingerprint-keyed subdir, since build.mcpp runs before the fingerprint exists), so
+they persist across builds and aren't rebuilt needlessly.
+
+## Tests
+
+- `tests/e2e/89_build_mcpp.sh` — a `build.mcpp` emitting a `cxxflag` define + a
+  `generated` source; assert the define reaches the TU (an `#ifndef` / `#error` gate)
+  and the generated source links. Second build asserts the cache short-circuits
+  re-run; a third build changes a declared `rerun-if-env-changed` variable, which
+  forces a re-run.
+
+## Forward note — `.mcpp` as a first-class C++ extension
+
+The compiler doesn't know the `.mcpp` extension, so we compile build.mcpp with an
+explicit `-x c++` (otherwise the driver hands it to the linker as a "linker
+script"). This is a special case of a broader convention worth adopting: **inside
+an mcpp project, `.mcpp` is just C++.** A natural next step is to add `.mcpp` to the
+main build's source glob (`src/**/*.{cppm,cpp,cc,c}` → `+ .mcpp`) with the same
+`-x c++` treatment, so a project may use `.mcpp` for ordinary sources/modules — the
+extension becomes a marker of "an mcpp-native C++ file" rather than a separate
+language. `build.mcpp` is the first instance; the `-x c++` handling here is the
+seed. Deferred (out of MVP scope) but the direction is intentional.
+
+## Forward note — typed `import mcpp;` library (Zig-style code API over the wire protocol)
+
+The stdout `mcpp:` text protocol is the **substrate**: it decouples `build.mcpp`
+from mcpp's ABI/version, is language-agnostic, and ignores unknown directives
+(forward-compatible). This is the Cargo `build.rs` model. Zig sits at the other
+end — `build.zig` constructs the graph through a typed `std.Build` **library**.
+
+The chosen direction is the hybrid both ecosystems converge on (cf. Rust's
+`build-rs` crate): **keep the text protocol as the wire format, and ship a thin
+typed `import mcpp;` module on top** that just emits those strings. So instead of
+
+```cpp
+import std;
+int main() { std::puts("mcpp:link-lib=m"); }
+```
+
+a user writes the modules-first, no-headers form:
+
+```cpp
+import mcpp; // bundled in the mcpp binary
+int main() { mcpp::link_lib("m"); mcpp::cxxflag("-DX"); }
+```
+
+Design constraints for that iteration (per project direction):
+- **Bundled in the mcpp binary.** mcpp embeds the `mcpp` module source, writes +
+  compiles it (cached BMI + object under `target/`, not rebuilt unless the
+  toolchain changes), and makes it importable when compiling `build.mcpp`.
+- **No `import std;` requirement.** The `mcpp` module implements its I/O with
+  minimal C-level primitives (no `import std;` in its interface), so neither it nor
+  `build.mcpp` forces the std-module staging cost on a tiny build script.
+  (Empirically, a standalone `import std;` needs `gcm.cache/std.gcm` staged at the
+  compile CWD + `std.o` linked — GCC ignores `-fmodule-file=std=<path>` for C++ — so the
+  module is found via the same `gcm.cache/` staging the ninja backend uses.)
+- **Typed API mirrors the directive set** 1:1 (`cxxflag`/`cflag`/`link_lib`/
+  `link_search`/`cfg`/`generated`/`rerun_if_changed`/`rerun_if_env_changed`).
+- The string protocol stays as the documented low-level escape hatch.
+
+This is the next iteration (post-0.0.78); the 0.0.78 core ships the wire-protocol
+substrate so everything above layers on a stable foundation.
+
+## mcpp-index dual perspective
+
+A new workspace member `tests/examples/build-mcpp` whose `build.mcpp` emits a
+define consumed by `main.cpp`, exercising the feature through the real pipeline.
diff --git a/docs/07-build-mcpp.md b/docs/07-build-mcpp.md
new file mode 100644
index 0000000..9f26522
--- /dev/null
+++ b/docs/07-build-mcpp.md
@@ -0,0 +1,88 @@
+# `build.mcpp` — a native build program
+
+**English** | [简体中文](zh/07-build-mcpp.md)
+
+Most projects need nothing more than `mcpp.toml`. When you need build-time logic —
+probe the host, generate a source, decide a flag from the environment — put a
+`build.mcpp` in your project root. It is the mcpp analog of Zig's `build.zig` and
+Cargo's `build.rs`, but written in **C++**: no second language, and it dogfoods
+mcpp itself.
+
+mcpp compiles `build.mcpp` with your toolchain and runs it **before** the main
+build. The program talks to mcpp by printing `mcpp:` directives to stdout; those
+directives augment the build.
+
+## Quick example
+
+```cpp
+// build.mcpp
+#include <cstdio>
+#include <cstdlib>
+#include <fstream>
+
+int main() {
+    // Generate a source the main build will compile + link.
+    std::ofstream("src/generated.cpp") << "const char* banner() { return \"hi\"; }\n";
+
+    std::puts("mcpp:generated=src/generated.cpp");   // add it to the build
+    std::puts("mcpp:cxxflag=-DHAVE_BANNER=1");       // define a macro for all C++ TUs
+
+    if (std::getenv("USE_FAST")) std::puts("mcpp:cxxflag=-DFAST_PATH=1");
+    std::puts("mcpp:rerun-if-env-changed=USE_FAST"); // re-run me when USE_FAST changes
+    return 0;
+}
+```
+
+```bash
+mcpp build # compiles + runs build.mcpp, then builds the project
+```
+
+## Directives
+
+Print these to stdout (one per line). Any line that does not start with `mcpp:`
+is ignored, so you can freely log diagnostics.
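The line-filtering rule just described can be sketched like this. It is an illustration only: `split_directive` is a hypothetical helper, not mcpp's parser, and it assumes the value is everything after the first `=`.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (an assumption, not mcpp's API): split one stdout line
// into (key, value) if it is an `mcpp:` directive; return nullopt for
// ordinary log output.
std::optional<std::pair<std::string, std::string>>
split_directive(std::string_view line) {
    constexpr std::string_view prefix = "mcpp:";
    if (line.substr(0, prefix.size()) != prefix) return std::nullopt;
    std::string_view body = line.substr(prefix.size());
    auto eq = body.find('=');   // split at the FIRST '=' so values may contain '='
    if (eq == std::string_view::npos)
        return std::pair{std::string(body), std::string()};
    return std::pair{std::string(body.substr(0, eq)),
                     std::string(body.substr(eq + 1))};
}
```

Splitting at the first `=` is what lets a value such as `-DHAVE_BANNER=1` pass through intact. The parser in this patch additionally trims surrounding whitespace before checking the prefix.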
+
+| Directive | Effect |
+|---|---|
+| `mcpp:cxxflag=<flag>` | add `<flag>` to the C++ compile flags |
+| `mcpp:cflag=<flag>` | add `<flag>` to the C compile flags |
+| `mcpp:link-lib=<name>` | link `-l<name>` |
+| `mcpp:link-search=<dir>` | add a library search dir (`-L<dir>`; relative dirs resolve against the project root) |
+| `mcpp:cfg=<name>` | define `-D<name>` for both C and C++ |
+| `mcpp:generated=<path>` | add a generated source (relative to the project root) to the build |
+| `mcpp:rerun-if-changed=<path>` | re-run `build.mcpp` when this file changes |
+| `mcpp:rerun-if-env-changed=<VAR>` | re-run `build.mcpp` when this env var changes |
+
+The program **requests** build edges (flags, libraries, sources). It cannot add a
+registry dependency — keep your dependency graph declarative in `mcpp.toml`
+(including platform-conditional `[target.'cfg(...)'.dependencies]`). `build.mcpp`
+is for *leaf* decisions: flags, codegen, link requirements.
+
+## Incremental: declared inputs (no needless re-runs)
+
+mcpp does **not** re-run `build.mcpp` on every build. It caches the program's
+directives and re-runs only when something it depends on changed:
+
+- the `build.mcpp` source itself,
+- the toolchain,
+- any file you declared with `rerun-if-changed`,
+- any env var you declared with `rerun-if-env-changed`,
+- (or a `generated` output went missing).
+
+So **declare your inputs**: if your program reads `config.h` or the `USE_FAST`
+variable, emit `mcpp:rerun-if-changed=config.h` / `mcpp:rerun-if-env-changed=USE_FAST`.
+This replaces the old "process exited 0, so assume it's fine" guesswork with an
+explicit input/output contract — incremental builds stay correct.
+
+When nothing changed you'll see `build.mcpp up to date (cached)`; otherwise
+`build.mcpp compiling` / `running`.
+
+## Notes & limits
+
+- **Runs on the host.** `build.mcpp` compiles and runs with the host toolchain.
+  Under a cross build (`mcpp build --target <triple>`) it is **skipped with a
+  warning** for now (host-toolchain-for-cross is a planned follow-up). Gate
+  *dependencies* on the target with `[target.'cfg(...)']` tables instead — those
+  evaluate on the resolved target. See [05 - mcpp.toml Manifest Guide](05-mcpp-toml.md).
+- **CWD is the project root**, so relative paths (`src/generated.cpp`) land where
+  you expect.
+- A non-zero exit from `build.mcpp` aborts the build and prints its output.
diff --git a/docs/README.md b/docs/README.md
index eda9cc9..dc26b10 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -9,3 +9,4 @@
 - [04 - Building from Source & Contributing](04-build-from-source.md)
 - [05 - mcpp.toml Manifest Guide](05-mcpp-toml.md)
 - [06 - Workspaces](06-workspace.md)
+- [07 - build.mcpp Build Program](07-build-mcpp.md)
diff --git a/docs/zh/07-build-mcpp.md b/docs/zh/07-build-mcpp.md
new file mode 100644
index 0000000..6e9fbfa
--- /dev/null
+++ b/docs/zh/07-build-mcpp.md
@@ -0,0 +1,81 @@
+# `build.mcpp` —— 原生构建程序
+
+[English](../07-build-mcpp.md) | **简体中文**
+
+绝大多数工程只需要 `mcpp.toml`。当你需要构建期逻辑——探测主机、生成源码、依据环境
+决定某个编译开关——就在工程根目录放一个 `build.mcpp`。它是 mcpp 版的 Zig `build.zig`
+/ Cargo `build.rs`,但用 **C++** 编写:不引入第二种语言,而且 mcpp 自己吃自己的狗粮。
+
+mcpp 用你的工具链编译 `build.mcpp`,并在主构建**之前**运行它。程序通过向 stdout 打印
+`mcpp:` 指令与 mcpp 通信,这些指令会增补本次构建。
+
+## 快速示例
+
+```cpp
+// build.mcpp
+#include <cstdio>
+#include <cstdlib>
+#include <fstream>
+
+int main() {
+    // 生成一份源码,主构建会编译 + 链接它。
+    std::ofstream("src/generated.cpp") << "const char* banner() { return \"hi\"; }\n";
+
+    std::puts("mcpp:generated=src/generated.cpp");   // 加入构建
+    std::puts("mcpp:cxxflag=-DHAVE_BANNER=1");       // 为所有 C++ TU 定义宏
+
+    if (std::getenv("USE_FAST")) std::puts("mcpp:cxxflag=-DFAST_PATH=1");
+    std::puts("mcpp:rerun-if-env-changed=USE_FAST"); // USE_FAST 变化时重跑我
+    return 0;
+}
+```
+
+```bash
+mcpp build # 编译 + 运行 build.mcpp,然后构建工程
+```
+
+## 指令
+
+把这些打印到 stdout(每行一条)。任何不以 `mcpp:` 开头的行都会被忽略,因此你可以
+自由打印诊断日志。
+
+| 指令 | 作用 |
+|---|---|
+| `mcpp:cxxflag=<flag>` | 给 C++ 编译追加 `<flag>` |
+| `mcpp:cflag=<flag>` | 给 C 编译追加 `<flag>` |
+| `mcpp:link-lib=<name>` | 链接 `-l<name>` |
+| `mcpp:link-search=<dir>` | 增加库搜索目录(`-L<dir>`;相对路径按工程根目录解析) |
+| `mcpp:cfg=<name>` | 为 C 与 C++ 同时定义 `-D<name>` |
+| `mcpp:generated=<path>` | 把生成的源码(相对工程根目录)加入构建 |
+| `mcpp:rerun-if-changed=<path>` | 该文件变化时重跑 `build.mcpp` |
+| `mcpp:rerun-if-env-changed=<VAR>` | 该环境变量变化时重跑 `build.mcpp` |
+
+程序**请求**构建边(开关、库、源码),它**不能**新增注册表依赖——请把依赖图保持在
+`mcpp.toml` 里声明式管理(包括平台条件依赖 `[target.'cfg(...)'.dependencies]`)。
+`build.mcpp` 用于*叶子*决策:开关、代码生成、链接需求。
+
+## 增量:声明输入(避免无谓重跑)
+
+mcpp **不会**每次构建都重跑 `build.mcpp`。它会缓存程序产出的指令,只有当它依赖的东西
+变化时才重跑:
+
+- `build.mcpp` 源码本身,
+- 工具链,
+- 任何用 `rerun-if-changed` 声明的文件,
+- 任何用 `rerun-if-env-changed` 声明的环境变量,
+- (或某个 `generated` 产物丢失了)。
+
+所以请**声明你的输入**:如果程序读了 `config.h` 或 `USE_FAST` 变量,就分别 emit
+`mcpp:rerun-if-changed=config.h` / `mcpp:rerun-if-env-changed=USE_FAST`。这用一份明确的
+输入/输出契约取代了过去「进程退出码为 0 就当成功」的猜测——让增量构建保持正确。
+
+无变化时你会看到 `build.mcpp up to date (cached)`;否则是 `build.mcpp compiling` /
+`running`。
+
+## 说明与限制
+
+- **在主机上运行。** `build.mcpp` 用主机工具链编译并运行。在交叉构建
+  (`mcpp build --target <triple>`)下目前会**跳过并给出警告**(主机工具链交叉是计划中的
+  后续项)。要按目标平台门控*依赖*,请改用 `[target.'cfg(...)']` 表——它们按解析后的目标
+  求值。参见 [05 - mcpp.toml 工程文件指南](05-mcpp-toml.md)。
+- **当前工作目录是工程根目录**,因此相对路径(`src/generated.cpp`)会落在你预期的位置。
+- `build.mcpp` 非零退出会中止构建并打印其输出。
diff --git a/docs/zh/README.md b/docs/zh/README.md
index b8d2e2d..cd3cf4a 100644
--- a/docs/zh/README.md
+++ b/docs/zh/README.md
@@ -9,3 +9,4 @@
 - [04 - 从源码构建 & 参与贡献](04-build-from-source.md)
 - [05 - mcpp.toml 工程文件指南](05-mcpp-toml.md)
 - [06 - 工作空间](06-workspace.md)
+- [07 - build.mcpp 构建程序](07-build-mcpp.md)
diff --git a/mcpp.toml b/mcpp.toml
index 4c97b4f..bfc40cd 100644
--- a/mcpp.toml
+++ b/mcpp.toml
@@ -1,6 +1,6 @@
 [package]
 name = "mcpp"
-version = "0.0.77"
+version = "0.0.78"
 description = "Modern C++ build & package management tool"
 license = "Apache-2.0"
 authors = ["mcpp-community"]
diff --git a/src/build/build_program.cppm b/src/build/build_program.cppm
new file mode 100644
index 0000000..e36824e
--- /dev/null
+++ b/src/build/build_program.cppm
@@ -0,0 +1,332 @@
+// mcpp.build.build_program — L3 `build.mcpp`: a project-local native imperative
+// build program (Zig's build.zig / Cargo's build.rs model, but in C++ so it
+// dogfoods mcpp). Compiled with the HOST toolchain and run BEFORE the main build;
+// it emits stdout `mcpp:` directives that augment the main build (extra flags,
+// link libraries/search dirs, defines, generated sources). A declared-input cache
+// (Discipline 2) re-runs it only when its source, a declared input, or a declared
+// env var changes — the documented replacement for the bare `.mcpp_ok` marker.
+//
+// See .agents/docs/2026-06-30-l3-build-mcpp-implementation-design.md.
+
+module;
+
+export module mcpp.build.build_program;
+
+import std;
+import mcpp.manifest;
+import mcpp.platform.process;
+import mcpp.toolchain.fingerprint; // hash_file / hash_string (FNV-1a, 16 hex)
+import mcpp.toolchain.model;       // Toolchain, PayloadPaths, is_clang/is_musl_target
+import mcpp.toolchain.registry;    // archive_tool
+import mcpp.ui;
+
+export namespace mcpp::build {
+
+// Compile + run `<root>/build.mcpp` (if present) with `hostCompiler` (the resolved
+// host frontend) and apply its directives to `m.buildConfig`. `tc` supplies the
+// sysroot / runtime flags a fresh sandbox needs to compile + link a freestanding
+// host program. No-op when the file is absent. `isCross` skips execution (a host
+// build program can't run when compiled for another target).
+std::expected<void, std::string> run_build_program(
+    mcpp::manifest::Manifest& m,
+    const std::filesystem::path& root,
+    const std::filesystem::path& hostCompiler,
+    const mcpp::toolchain::Toolchain& tc,
+    std::string_view cppStandard,
+    bool isCross);
+
+} // namespace mcpp::build
+
+namespace mcpp::build {
+
+namespace {
+
+namespace fs = std::filesystem;
+
+// Parsed directives in apply order. Stored verbatim in the cache so a cache hit
+// reapplies the exact same edits without re-running the program.
+struct Directives {
+    std::vector<std::string> cxxflags;    // -> buildConfig.cxxflags
+    std::vector<std::string> cflags;      // -> buildConfig.cflags
+    std::vector<std::string> ldflags;     // -> buildConfig.ldflags (already -l/-L)
+    std::vector<std::string> defines;     // cfg=<name> -> -D<name>, into BOTH c/cxx flags
+    std::vector<std::string> generated;   // relative source paths
+    std::vector<std::string> rerunFiles;  // declared file inputs
+    std::vector<std::string> rerunEnv;    // declared env-var inputs
+};
+
+std::string trim(std::string_view s) {
+    std::size_t b = 0, e = s.size();
+    while (b < e && (s[b] == ' ' || s[b] == '\t' || s[b] == '\r')) ++b;
+    while (e > b && (s[e - 1] == ' ' || s[e - 1] == '\t' || s[e - 1] == '\r')) --e;
+    return std::string(s.substr(b, e - b));
+}
+
+// Resolve a possibly-relative path against the project root, returning an
+// absolute lexically-normal path (no filesystem touch, so it works for dirs that
+// the program is about to create as well as existing ones).
+std::string abs_against_root(const fs::path& root, std::string_view p) {
+    fs::path pp(p);
+    if (pp.is_relative()) pp = root / pp;
+    return pp.lexically_normal().string();
+}
+
+// Parse one stdout line. Returns true if it was a recognized (or unknown-but-
+// `mcpp:`) directive; false for ordinary program chatter.
+bool parse_line(const fs::path& root, std::string_view raw, Directives& d) {
+    std::string line = trim(raw);
+    constexpr std::string_view kPfx = "mcpp:";
+    if (!line.starts_with(kPfx)) return false;
+    std::string_view body = std::string_view(line).substr(kPfx.size());
+    auto eq = body.find('=');
+    std::string key = std::string(body.substr(0, eq));
+    std::string val = eq == std::string_view::npos ? std::string() : std::string(body.substr(eq + 1));
+
+    if (key == "cxxflag") d.cxxflags.push_back(val);
+    else if (key == "cflag") d.cflags.push_back(val);
+    else if (key == "link-lib") d.ldflags.push_back("-l" + val);
+    else if (key == "link-search") d.ldflags.push_back("-L" + abs_against_root(root, val));
+    else if (key == "cfg") d.defines.push_back("-D" + val);
+    else if (key == "generated") d.generated.push_back(val);
+    else if (key == "rerun-if-changed") d.rerunFiles.push_back(val);
+    else if (key == "rerun-if-env-changed") d.rerunEnv.push_back(val);
+    else mcpp::ui::warning(std::format("build.mcpp: ignoring unknown directive 'mcpp:{}'", key));
+    return true;
+}
+
+void parse_output(const fs::path& root, std::string_view out, Directives& d) {
+    std::size_t pos = 0;
+    while (pos <= out.size()) {
+        std::size_t nl = out.find('\n', pos);
+        std::string_view ln = out.substr(pos, nl == std::string_view::npos ? std::string_view::npos : nl - pos);
+        parse_line(root, ln, d);
+        if (nl == std::string_view::npos) break;
+        pos = nl + 1;
+    }
+}
+
+std::string env_value(const std::string& name) {
+    const char* v = std::getenv(name.c_str());
+    return v ? std::string(v) : std::string();
+}
+
+// The host subset of flags.cppm's sysroot/runtime handling — enough to compile +
+// link a freestanding host program on a fresh sandbox (where bare `g++ file -o x`
+// can't find crt/libc). build.mcpp is host-only (skipped under cross), so we need
+// only the native cases; these are passed as separate argv tokens (no shell).
+std::vector<std::string> host_base_flags(const mcpp::toolchain::Toolchain& tc) {
+    std::vector<std::string> f;
+    // Clang reads its sibling `.cfg` by default, which wires libc++ + the
+    // sysroot. A simple host compile trusts it (the main build bypasses the cfg
+    // for reproducibility; here correctness on a fresh box is all we need).
+    if (mcpp::toolchain::is_clang(tc)) return f;
+
+    // GCC: a fresh sandbox g++ needs --sysroot to find the C library + the
+    // include-fixed headers; without a sysroot, wire the glibc payload directly.
+    if (!tc.sysroot.empty()) {
+        f.push_back("--sysroot=" + tc.sysroot.string());
+    } else if (tc.payloadPaths) {
+        auto& pp = *tc.payloadPaths;
+        f.push_back("-idirafter"); f.push_back(pp.glibcInclude.string());
+        if (!pp.linuxInclude.empty()) { f.push_back("-idirafter"); f.push_back(pp.linuxInclude.string()); }
+        f.push_back("-B" + pp.glibcLib.string()); // crt1.o/crti.o discovery
+        f.push_back("-L" + pp.glibcLib.string()); // -lc/-lm resolution
+    }
+    // binutils -B so the driver finds ld/as (GCC, non-musl; musl ships its own).
+    if (!mcpp::toolchain::is_musl_target(tc)) {
+        auto ar = mcpp::toolchain::archive_tool(tc);
+        if (!ar.empty()) f.push_back("-B" + ar.parent_path().string());
+    }
+    // Runtime lib dirs so the produced program can load private libs in-tree.
+    for (auto& d : tc.linkRuntimeDirs) {
+        f.push_back("-L" + d.string());
+        f.push_back("-Wl,-rpath," + d.string());
+    }
+    return f;
+}
+
+// ── Cache (line-based; one record per line, internal format) ───────────────
+//   program <hash>
+//   compiler <hash>
+//   in <hash> <path>
+//   env <hash> <name>
+//   d cxxflag|cflag|ldflag|define|generated <value>
+// The leading program/compiler/in/env lines are the re-run key; the `d` lines
+// are the directives to reapply on a hit.
+
+// build.mcpp artifacts live under target/ (the build output tree), not in the
+// project: target/.build-mcpp/{build.mcpp.bin, build.mcpp.cache}. A stable subdir
+// (not the fingerprint-keyed one — build.mcpp runs before the fingerprint exists)
+// so the binary + cache survive across builds and aren't rebuilt needlessly.
+fs::path build_dir(const fs::path& root) { return root / "target" / ".build-mcpp"; }
+
+std::string cache_path(const fs::path& root) {
+    return (build_dir(root) / "build.mcpp.cache").string();
+}
+
+void write_cache(const fs::path& root, const std::string& programHash,
+                 const std::string& compilerHash, const Directives& d) {
+    std::ofstream os(cache_path(root), std::ios::trunc);
+    if (!os) return; // best-effort: a failed cache write only loses the optimization
+    os << "program " << programHash << '\n';
+    os << "compiler " << compilerHash << '\n';
+    for (auto const& f : d.rerunFiles)
+        os << "in " << mcpp::toolchain::hash_file(abs_against_root(root, f)) << ' ' << f << '\n';
+    for (auto const& e : d.rerunEnv)
+        os << "env " << mcpp::toolchain::hash_string(env_value(e)) << ' ' << e << '\n';
+    auto emit = [&](std::string_view kind, const std::vector<std::string>& v) {
+        for (auto const& x : v) os << "d " << kind << ' ' << x << '\n';
+    };
+    emit("cxxflag", d.cxxflags);
+    emit("cflag", d.cflags);
+    emit("ldflag", d.ldflags);
+    emit("define", d.defines);
+    emit("generated", d.generated);
+}
+
+struct CacheRecord {
+    std::string programHash;
+    std::string compilerHash;
+    std::vector<std::pair<std::string, std::string>> inputs; // (hash, path)
+    std::vector<std::pair<std::string, std::string>> envs;   // (hash, name)
+    Directives directives;
+    bool loaded = false;
+};
+
+CacheRecord read_cache(const fs::path& root) {
+    CacheRecord r;
+    std::ifstream is(cache_path(root));
+    if (!is) return r;
+    std::string line;
+    while (std::getline(is, line)) {
+        if (line.empty()) continue;
+        auto sp = line.find(' ');
+        if (sp == std::string::npos) continue;
+        std::string tag = line.substr(0, sp);
+        std::string rest = line.substr(sp + 1);
+        if (tag == "program") r.programHash = rest;
+        else if (tag == "compiler") r.compilerHash = rest;
+        else if (tag == "in" || tag == "env") {
+            auto sp2 = rest.find(' ');
+            if (sp2 == std::string::npos) continue;
+            std::string h = rest.substr(0, sp2), name = rest.substr(sp2 + 1);
+            (tag == "in" ? r.inputs : r.envs).emplace_back(h, name);
+        } else if (tag == "d") {
+            auto sp2 = rest.find(' ');
+            if (sp2 == std::string::npos) continue;
+            std::string kind = rest.substr(0, sp2), val = rest.substr(sp2 + 1);
+            if (kind == "cxxflag") r.directives.cxxflags.push_back(val);
+            else if (kind == "cflag") r.directives.cflags.push_back(val);
+            else if (kind == "ldflag") r.directives.ldflags.push_back(val);
+            else if (kind == "define") r.directives.defines.push_back(val);
+            else if (kind == "generated") r.directives.generated.push_back(val);
+        }
+    }
+    r.loaded = true;
+    return r;
+}
+
+// Decide whether the cached run is still valid (so we can skip recompiling/running).
+bool cache_fresh(const fs::path& root, const CacheRecord& c,
+                 const std::string& programHash, const std::string& compilerHash) {
+    if (!c.loaded) return false;
+    if (c.programHash != programHash) return false;
+    if (c.compilerHash != compilerHash) return false;
+    for (auto const& [h, path] : c.inputs)
+        if (mcpp::toolchain::hash_file(abs_against_root(root, path)) != h) return false;
+    for (auto const& [h, name] : c.envs)
+        if (mcpp::toolchain::hash_string(env_value(name)) != h) return false;
+    // A declared generated output that vanished invalidates the cache.
+    for (auto const& g : c.directives.generated)
+        if (!fs::exists(abs_against_root(root, g))) return false;
+    return true;
+}
+
+void apply(mcpp::manifest::Manifest& m, const Directives& d) {
+    auto& bc = m.buildConfig;
+    bc.cxxflags.insert(bc.cxxflags.end(), d.cxxflags.begin(), d.cxxflags.end());
+    bc.cflags.insert(bc.cflags.end(), d.cflags.begin(), d.cflags.end());
+    bc.ldflags.insert(bc.ldflags.end(), d.ldflags.begin(), d.ldflags.end());
+    // cfg defines apply to both C and C++ translation units.
+    bc.cflags.insert(bc.cflags.end(), d.defines.begin(), d.defines.end());
+    bc.cxxflags.insert(bc.cxxflags.end(), d.defines.begin(), d.defines.end());
+    // Generated sources join the source glob set so the modgraph scanner finds them.
+    for (auto const& g : d.generated) bc.sources.push_back(g);
+}
+
+} // namespace
+
+std::expected<void, std::string> run_build_program(
+    mcpp::manifest::Manifest& m,
+    const fs::path& root,
+    const fs::path& hostCompiler,
+    const mcpp::toolchain::Toolchain& tc,
+    std::string_view cppStandard,
+    bool isCross) {
+
+    fs::path src = root / "build.mcpp";
+    std::error_code ec;
+    if (!fs::exists(src, ec)) return {}; // no build program — nothing to do
+
+    if (isCross) {
+        mcpp::ui::warning(
+            "build.mcpp present but skipped under a cross --target build "
+            "(it compiles and runs on the host; host-toolchain-for-cross is a follow-up)");
+        return {};
+    }
+
+    std::string programHash = mcpp::toolchain::hash_file(src);
+    std::string compilerHash = mcpp::toolchain::hash_string(hostCompiler.string());
+
+    // Fast path: declared inputs unchanged → reapply cached directives, no run.
+    CacheRecord cache = read_cache(root);
+    if (cache_fresh(root, cache, programHash, compilerHash)) {
+        apply(m, cache.directives);
+        mcpp::ui::info("build.mcpp", "up to date (cached)");
+        return {};
+    }
+
+    fs::create_directories(build_dir(root), ec);
+    fs::path bin = build_dir(root) / "build.mcpp.bin";
+
+    // ── Compile build.mcpp with the host toolchain ──────────────────────────
+    std::string std_flag = "-std=" + std::string(cppStandard.empty() ? "c++23" : cppStandard);
+    // `-x c++` is required: the `.mcpp` extension is unknown to the compiler, so
+    // without it the driver hands build.mcpp to the linker as a linker script.
+    std::vector<std::string> compileArgv = { hostCompiler.string(), std_flag, "-O0" };
+    for (auto& bf : host_base_flags(tc)) compileArgv.push_back(bf);
+    compileArgv.push_back("-x"); compileArgv.push_back("c++");
+    compileArgv.push_back(src.string());
+    compileArgv.push_back("-o"); compileArgv.push_back(bin.string());
+    mcpp::ui::info("build.mcpp", "compiling");
+    auto cres = mcpp::platform::process::capture_exec(compileArgv);
+    if (cres.exit_code != 0) {
+        return std::unexpected(std::format(
+            "build.mcpp failed to compile (exit {}):\n{}", cres.exit_code, cres.output));
+    }
+
+    // ── Run it; capture stdout(+stderr) and parse directives ────────────────
+    mcpp::ui::info("build.mcpp", "running");
+    auto rres = mcpp::platform::process::capture_exec({bin.string()});
+    if (rres.exit_code != 0) {
+        return std::unexpected(std::format(
+            "build.mcpp exited with {} (build aborted):\n{}", rres.exit_code, rres.output));
+    }
+
+    Directives d;
+    parse_output(root, rres.output, d);
+
+    // Missing declared generated outputs are a hard error (declared-output contract).
+    for (auto const& g : d.generated) {
+        if (!fs::exists(abs_against_root(root, g))) {
+            return std::unexpected(std::format(
+                "build.mcpp declared generated source '{}' but it does not exist after the run", g));
+        }
+    }
+
+    apply(m, d);
+    write_cache(root, programHash, compilerHash, d);
+    return {};
+}
+
+} // namespace mcpp::build
diff --git a/src/build/prepare.cppm b/src/build/prepare.cppm
index f07eea8..aea57d5 100644
--- a/src/build/prepare.cppm
+++ b/src/build/prepare.cppm
@@ -23,6 +23,7 @@ import mcpp.toolchain.stdmod;
 import mcpp.toolchain.post_install;
 import mcpp.toolchain.abi;
 import mcpp.build.plan;
+import mcpp.build.build_program;
 import mcpp.lockfile;
 import mcpp.config;
 import mcpp.xlings;
@@ -847,6 +848,21 @@ prepare_build(bool print_fingerprint,
     // Clang clang++.cfg). mcpp does not override it — the payload is
     // self-describing. See docs: 2026-05-21-linux-sysroot-missing-kernel-headers.md
 
+    // ── L3: project-local `build.mcpp` imperative build program ─────────────
+    // Compiled with the (host) toolchain and run now — after target resolution
+    // + the L1 cfg-flag merge (buildConfig flags are final) and BEFORE the
+    // modgraph scan (so its `generated=` sources are picked up). Its stdout
+    // directives augment buildConfig; a declared-input cache re-runs it only
+    // when its source/inputs/env change. Leaf-only: it cannot gate the top-level
+    // dependency graph. Skipped under a cross --target (host program, host run).
+    // See .agents/docs/2026-06-30-l3-build-mcpp-implementation-design.md.
+    if (auto bp = mcpp::build::run_build_program(
+            *m, *root, explicit_compiler, *tc, m->cppStandard.canonical,
+            /*isCross=*/!overrides.target_triple.empty());
+        !bp) {
+        return std::unexpected(bp.error());
+    }
+
     // Resolve dependencies: walk the **transitive** graph from the main
     // manifest, BFS-style. Each unique `(namespace, shortName)` is fetched
     // once, its `[build].include_dirs` are propagated to the main
diff --git a/src/toolchain/fingerprint.cppm b/src/toolchain/fingerprint.cppm
index 075bf12..50847db 100644
--- a/src/toolchain/fingerprint.cppm
+++ b/src/toolchain/fingerprint.cppm
@@ -18,7 +18,7 @@ import mcpp.toolchain.detect;
 
 export namespace mcpp::toolchain {
 
-inline constexpr std::string_view MCPP_VERSION = "0.0.77";
+inline constexpr std::string_view MCPP_VERSION = "0.0.78";
 
 struct FingerprintInputs {
     Toolchain toolchain;
diff --git a/tests/e2e/89_build_mcpp.sh b/tests/e2e/89_build_mcpp.sh
new file mode 100755
index 0000000..50736ee
--- /dev/null
+++ b/tests/e2e/89_build_mcpp.sh
@@ -0,0 +1,70 @@
+#!/usr/bin/env bash
+# 89_build_mcpp.sh — L3 `build.mcpp`: a project-local native imperative build
+# program (Zig build.zig / Cargo build.rs model, in C++). mcpp compiles it with
+# the host toolchain and runs it before the main build; its stdout `mcpp:`
+# directives augment the build (here: a cxxflag define + a generated source). A
+# declared-input cache re-runs it only when its source / inputs / env change.
+# See .agents/docs/2026-06-30-l3-build-mcpp-implementation-design.md.
+#
+# requires: gcc
+set -e
+
+TMP=$(mktemp -d)
+trap "rm -rf $TMP" EXIT
+cd "$TMP"
+
+mkdir -p app/src
+cd app
+
+cat > mcpp.toml <<'EOF'
+[package]
+name = "app"
+version = "0.1.0"
+EOF
+
+# main.cpp hard-asserts the cxxflag define reached this TU, and links a function
+# that only exists in the generated source the build program writes.
+cat > src/main.cpp <<'EOF'
+#ifndef FROM_BUILD_MCPP
+#error "build.mcpp cxxflag did not reach the translation unit"
+#endif
+int generated_value();
+int main() { return generated_value() == 42 ? 0 : 1; }
+EOF
+
+# The build program: writes a generated source, emits a define, declares an env input.
+cat > build.mcpp <<'EOF'
+#include <cstdio>
+#include <fstream>
+int main() {
+    std::ofstream("src/generated.cpp") << "int generated_value() { return 42; }\n";
+    std::puts("mcpp:cxxflag=-DFROM_BUILD_MCPP=1");
+    std::puts("mcpp:generated=src/generated.cpp");
+    std::puts("mcpp:rerun-if-env-changed=MCPP_TEST_TOGGLE");
+    return 0;
+}
+EOF
+
+# ── Build 1: first run — compiles+runs build.mcpp, applies directives ──────
+"$MCPP" build > b1.log 2>&1 || { cat b1.log; echo "FAIL: build 1 errored"; exit 1; }
+grep -q "build.mcpp" b1.log || { cat b1.log; echo "FAIL: build.mcpp not invoked"; exit 1; }
+[ -f src/generated.cpp ] || { echo "FAIL: generated source not written"; exit 1; }
+[ -f target/.build-mcpp/build.mcpp.cache ] || { echo "FAIL: cache not written under target/"; exit 1; }
+
+# The binary returns 0 only if both the define AND the generated source took effect.
+"$MCPP" run > r1.log 2>&1 || { cat r1.log; echo "FAIL: run returned non-zero (define/generated source missing)"; exit 1; }
+
+# ── Build 2: touch a source so prepare runs again, but build.mcpp inputs are
+# unchanged — the declared-input cache short-circuits the re-run. (A no-change
+# rebuild is skipped wholesale by the top-level up-to-date check, so we touch a
+# source to actually exercise the prepare path.)
+touch src/main.cpp
+"$MCPP" build > b2.log 2>&1 || { cat b2.log; echo "FAIL: build 2 errored"; exit 1; }
+grep -qi "build.mcpp.*cached" b2.log || { cat b2.log; echo "FAIL: build.mcpp did not short-circuit (expected cached)"; exit 1; }
+
+# ── Build 3: a declared env input changed — forces a re-run ────────────────
+touch src/main.cpp
+MCPP_TEST_TOGGLE=1 "$MCPP" build > b3.log 2>&1 || { cat b3.log; echo "FAIL: build 3 errored"; exit 1; }
+grep -qi "build.mcpp.*\(running\|compiling\)" b3.log || { cat b3.log; echo "FAIL: changed env did not force build.mcpp re-run"; exit 1; }
+
+echo "OK"