Skip to content

Commit cc1f12d

Browse files
committed
ci/test: un-escape ninja sequences in CDB args; harden e2e for Windows
First CI round revealed two layered issues exposed by the new e2e: 1. **The shipped CDB carries ninja-escape artefacts.** `flags.cppm`'s `escape_path` ninja-escapes paths (`$ ` for space, `$:` for colon, `$$` for literal `$`) so they survive embedding in ninja rule strings. But the same escaped strings flow verbatim through `f.cxx` into `compile_commands.json`, whose `arguments` array is exec'd by clangd without any ninja involved. On Windows that makes `C:` appear as `C$:` in CDB → clangd can't resolve the path. POSIX usually escaped nothing in practice, hiding the bug; Windows drive letters always trigger `$:`. Fix in `compile_commands.cppm::split_flags`: same single pass now splits tokens *and* unescapes the three ninja sequences. Splitting was also broken for paths with a literal space (would've broken `-isystem "/path with space/include"` mid-arg) — that's fixed simultaneously since `$ ` is no longer a token separator. 2. **e2e test 47 needed Windows-friendly checks.** The original `[^"]+` grep returned the raw JSON-escaped form (`\\Users\\...`) and the `^[A-Za-z]:` regex didn't match `C$:`. Rewrote using `jq -r` (which does JSON un-escape natively, preinstalled on all GHA runners) and: - explicit ninja-escape sniff (`$:`, `$ `, `$$`) - both POSIX and Windows-drive absolute-path checks - basename check normalised against `\` / `/` Net: CDB now contains plain paths in both runtimes, and the test fails loudly if either layer regresses.
1 parent e6de2b6 commit cc1f12d

2 files changed

Lines changed: 86 additions & 45 deletions

File tree

src/build/compile_commands.cppm

Lines changed: 38 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -37,20 +37,47 @@ bool is_c_source(const std::filesystem::path& src) {
3737
return src.extension() == ".c";
3838
}
3939

40-
// Split a flag string into individual tokens.
40+
// Split a flag string into individual tokens AND un-escape ninja-style
41+
// path escapes (`$ ` → space, `$:` → `:`, `$$` → `$`).
42+
//
43+
// `flags.cppm::escape_path` ninja-escapes path arguments so they survive
44+
// embedding in ninja rule strings. Those escaped strings are then captured
45+
// into `f.cxx` / `f.cc` which is what we receive here. CDB consumers like
46+
// clangd exec the `arguments` array literally — no ninja involved — so
47+
// escaped chars must be undone or paths like `C:\Users\...` come through
48+
// as `C$:\Users\...` and break clangd's path resolution on Windows. (The
49+
// same issue would silently affect any POSIX path containing a space or
50+
// `$` — those just happen to be rare.)
51+
//
52+
// Splitting and un-escaping in one pass is correct: a literal space inside
53+
// a path appears as `$ ` in the input, which we must NOT treat as a token
54+
// separator.
4155
std::vector<std::string> split_flags(std::string_view s) {
4256
std::vector<std::string> out;
43-
std::size_t i = 0;
44-
while (i < s.size()) {
45-
while (i < s.size() && s[i] == ' ')
46-
++i;
47-
if (i >= s.size())
48-
break;
49-
std::size_t start = i;
50-
while (i < s.size() && s[i] != ' ')
51-
++i;
52-
out.emplace_back(s.substr(start, i - start));
57+
std::string token;
58+
auto flush = [&] {
59+
if (!token.empty()) {
60+
out.push_back(std::move(token));
61+
token.clear();
62+
}
63+
};
64+
for (std::size_t i = 0; i < s.size(); ++i) {
65+
char c = s[i];
66+
if (c == '$' && i + 1 < s.size()) {
67+
char nc = s[i + 1];
68+
if (nc == ' ' || nc == ':' || nc == '$') {
69+
token.push_back(nc);
70+
++i;
71+
continue;
72+
}
73+
}
74+
if (c == ' ') {
75+
flush();
76+
} else {
77+
token.push_back(c);
78+
}
5379
}
80+
flush();
5481
return out;
5582
}
5683

Lines changed: 48 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,14 @@
11
#!/usr/bin/env bash
22
# requires:
33
# 47_cdb_prebuilt_module_path_abs.sh — `-fprebuilt-module-path` in
4-
# compile_commands.json must be an ABSOLUTE path, not a bare `pcm.cache` /
5-
# `gcm.cache`. Reason: CDB `directory` is the project root, but ninja runs
6-
# from the outputDir; a bare relative path works at build time only and
7-
# silently breaks clangd's module resolution ("module 'X' not found").
4+
# compile_commands.json must be an ABSOLUTE path, NOT a bare `pcm.cache`,
5+
# AND must not carry ninja-escape artefacts like `C$:` on Windows.
6+
# Reason: CDB `directory` is the project root and clangd does `cd
7+
# directory` before running the args, so a bare relative path points at
8+
# `<projectRoot>/pcm.cache` (missing) and a `C$:` prefix is treated as a
9+
# literal string, not a Windows drive letter. Both modes silently break
10+
# clangd's module resolution while `mcpp build` itself keeps working
11+
# (ninja runs from outputDir AND unescapes its own escape sequences).
812
set -e
913

1014
TMP=$(mktemp -d)
@@ -18,51 +22,61 @@ cd app
1822
cdb=compile_commands.json
1923
[[ -f "$cdb" ]] || { echo "FAIL: no $cdb generated"; exit 1; }
2024

21-
# Extract every -fprebuilt-module-path=<value> token.
22-
# (No jq dependency — grep is enough and portable on macOS / git-bash.)
23-
vals=$(grep -oE '\-fprebuilt-module-path=[^"]+' "$cdb" | sed 's/^-fprebuilt-module-path=//')
24-
if [[ -z "$vals" ]]; then
25-
# No -fprebuilt-module-path emitted = GCC toolchain that uses -fmodules
26-
# only (gcm.cache). bmi_traits sets needsPrebuiltModulePath=false for
27-
# the GCC path. Nothing to assert — pass.
25+
command -v jq >/dev/null 2>&1 || {
26+
echo "SKIP: jq not on PATH (preinstalled on GitHub-hosted runners)"
27+
exit 0
28+
}
29+
30+
# jq returns each value JSON-unescaped (\\ → \, etc.).
31+
mapfile -t vals < <(
32+
jq -r '
33+
.[] | .arguments[]?
34+
| select(type == "string" and startswith("-fprebuilt-module-path="))
35+
| sub("^-fprebuilt-module-path="; "")
36+
' "$cdb"
37+
)
38+
39+
if [[ ${#vals[@]} -eq 0 ]]; then
40+
# GCC's libstdc++ flow uses -fmodules / gcm.cache without the explicit
41+
# -fprebuilt-module-path flag (see bmi_traits.needsPrebuiltModulePath).
42+
# Nothing to assert in that mode.
2843
echo "OK (no prebuilt-module-path flag — GCC toolchain)"
2944
exit 0
3045
fi
3146

32-
# Every value must be absolute AND point at the actual build cache dir.
3347
fail=0
34-
while read -r v; do
35-
[[ -z "$v" ]] && continue
48+
for v in "${vals[@]}"; do
3649
echo " checking: $v"
3750

38-
# Absolute path test: leading '/' on POSIX or '<letter>:' on Windows.
51+
# Must NOT carry ninja-escape artefacts. The key signal is `$:` (drive
52+
# letter) or `$ ` / `$$` (path with space / dollar). If any of these
53+
# survives into CDB the JSON-args runtime treats them as literal text
54+
# → clangd fails to find the BMI.
55+
if [[ "$v" == *'$:'* || "$v" == *'$ '* || "$v" == *'$$'* ]]; then
56+
echo "FAIL: value retains ninja escape sequence ('\$:' / '\$ ' / '\$\$') — must be plain path in CDB"
57+
fail=1
58+
fi
59+
60+
# Absolute: POSIX (starts with '/') or Windows drive (e.g. 'C:').
3961
if [[ "$v" =~ ^/ || "$v" =~ ^[A-Za-z]: ]]; then
4062
:
4163
else
42-
echo "FAIL: -fprebuilt-module-path value is relative: '$v'"
43-
echo " this breaks clangd because CDB 'directory' = project root,"
44-
echo " but the BMI cache lives under target/<triple>/<fp>/."
64+
echo "FAIL: value is relative: '$v'"
65+
echo " CDB 'directory' is the project root, but the BMI cache"
66+
echo " lives under target/<triple>/<fp>/ — clangd resolves to"
67+
echo " the wrong location and module imports fail."
4568
fail=1
4669
fi
4770

48-
# Must end with pcm.cache or gcm.cache (sanity).
49-
case "$v" in
50-
*/pcm.cache|*/gcm.cache) ;;
51-
*) echo "FAIL: -fprebuilt-module-path value does not end in pcm.cache/gcm.cache: '$v'"
71+
# Basename must be pcm.cache or gcm.cache (cross-platform: normalise
72+
# backslashes first so Windows paths like 'C:\foo\pcm.cache' work).
73+
normalised="${v//\\//}"
74+
case "${normalised##*/}" in
75+
pcm.cache|gcm.cache) ;;
76+
*) echo "FAIL: basename is not pcm.cache/gcm.cache: '${normalised##*/}'"
5277
fail=1 ;;
5378
esac
54-
done <<< "$vals"
79+
done
5580

5681
[[ $fail -eq 0 ]] || exit 1
57-
58-
# Stronger: clangd would `cd` into CDB's directory then resolve. Verify the
59-
# value, taken at face value (already absolute), points at a real dir.
60-
first=$(echo "$vals" | head -1)
61-
if [[ ! -d "$first" ]]; then
62-
echo "FAIL: -fprebuilt-module-path resolves to a non-existent dir: $first"
63-
echo " (build succeeded, so the BMI dir must exist somewhere — this"
64-
echo " means the flag points to the wrong place even with abs path.)"
65-
exit 1
66-
fi
67-
6882
echo "OK"

0 commit comments

Comments
 (0)