Commit 795045f

Deepen compare output and benchmark metadata
1 parent 2cd3189 commit 795045f

19 files changed

Lines changed: 218 additions & 15 deletions

README.md

Lines changed: 3 additions & 1 deletion
@@ -144,6 +144,7 @@ npm run dev -- pack --config ./starter/taskbundle.config.json
 `pack` also supports:
 - automatic git metadata detection
 - artifact hashes and sizes in `bundle.json`
+- benchmark-style outcome fields such as `status`, `score`, and `judgeNotes`
 - optional `.tar.gz` archive creation with `--archive`

 ### `taskbundle inspect`
@@ -220,6 +221,7 @@ They represent the same task captured from different tool/model combinations so
 - `workspace/files/`: captured task-related files
 - `git`: optional git root / branch / remote / commit metadata
 - `runner`: optional pack-time runtime metadata
+- `outcome`: optional benchmark or judge result fields

 ## Local Development

@@ -280,7 +282,7 @@ docs/
 ## Known Limitations

 - Archives currently use `.tar.gz`, not `.zip`.
-- The project compares bundle-level metadata and counts, not semantic code quality.
+- The project can compare metadata, scores, and artifact hashes, but it still does not judge semantic code quality on its own.
 - Workspace capture still uses explicit copied file sets instead of repository-wide snapshot strategies.
 - There is no viewer UI yet.

README.zh-CN.md

Lines changed: 3 additions & 1 deletion
@@ -144,6 +144,7 @@ npm run dev -- pack --config ./starter/taskbundle.config.json
 `pack` now also supports:
 - automatic collection of git metadata
 - recording artifact hashes and sizes in `bundle.json`
+- writing benchmark / judge result fields such as `status`, `score`, and `judgeNotes`
 - generating a `.tar.gz` directly with `--archive`

 ### `taskbundle inspect`
@@ -220,6 +221,7 @@ npm run dev -- scan ./examples
 - `workspace/files/`: captured task-related files
 - `git`: optional git root / branch / remote / commit information
 - `runner`: optional pack-time runtime information
+- `outcome`: optional benchmark / judge result fields

 ## Local Development

@@ -280,7 +282,7 @@ docs/
 ## Current Limitations

 - The archive format is currently `.tar.gz`, not `.zip`
-- compare currently mainly compares bundle metadata and statistics; it does not judge code at the semantic level
+- compare can now compare metadata, score, and artifact hash, but it still does not automatically judge code at the semantic level
 - Workspace capture is still based on explicit file sets rather than a whole-repository mirroring strategy
 - There is no viewer UI yet

ROADMAP.md

Lines changed: 2 additions & 1 deletion
@@ -9,14 +9,15 @@ Task Bundle started as a small CLI MVP. This roadmap turns it into a practical f
 - Done: schema validation for bundle metadata, workspace manifests, and event logs
 - Done: automatic git metadata detection during packing
 - Done: `compare` command
+- Done: richer `compare` output with artifact hash differences and score deltas
 - Done: `archive` and `extract` commands for `.tar.gz` bundles
 - Done: `validate` and `scan` commands for replay checks and bundle collections
 - Done: artifact hashes and sizes in `bundle.json`
+- Done: benchmark-style outcome fields in bundle metadata
 - Done: CLI smoke tests and GitHub Actions CI
 - Done: Chinese and English documentation

 ### v0.3
-- Planned: richer `compare` output with file-level change summaries
 - Planned: machine-readable benchmark result fields and scoring conventions
 - Planned: bundle collections and directory scans for multi-run comparisons
 - Planned: more curated example bundles for benchmark-style demos

ROADMAP.zh-CN.md

Lines changed: 2 additions & 1 deletion
@@ -9,14 +9,15 @@ Task Bundle has now grown from a small CLI MVP into a "practically usable
 - Done: schema validation for bundle metadata, workspace manifest, and events
 - Done: automatic git metadata collection during packing
 - Done: `compare` command
+- Done: richer `compare` output, including artifact hash differences and score delta
 - Done: `.tar.gz` archive and extract commands
 - Done: `validate` and `scan` commands for replay validation and bundle collection scanning
 - Done: artifact hashes and sizes written to `bundle.json`
+- Done: benchmark / judge result fields in bundle metadata
 - Done: CLI smoke tests and GitHub Actions CI
 - Done: Chinese and English documentation

 ### v0.3
-- Planned: richer `compare` output, including file-level difference summaries
 - Planned: benchmark-oriented result fields and scoring conventions
 - Planned: multi-bundle directory scanning and batch comparison
 - Planned: more standard example bundles for demos and benchmarks

examples/hello-world-bundle-claude/bundle.json

Lines changed: 38 additions & 0 deletions
@@ -17,5 +17,43 @@
     "events": "events.jsonl",
     "workspaceManifest": "workspace/manifest.json",
     "workspaceFilesDir": "workspace/files"
+  },
+  "artifactInfo": {
+    "task": {
+      "path": "task.md",
+      "sha256": "18f5af61553b0b4e4de1a631c8984b9c260b2d8f04e709919d3e1da361c3ccf8",
+      "size": 327
+    },
+    "summary": {
+      "path": "summary.md",
+      "sha256": "9197bf9acdc1ae2c0b816c768ff8d965beaa0dbbc5c0a912af1d566dac77c53a",
+      "size": 243
+    },
+    "diff": {
+      "path": "result.diff",
+      "sha256": "019b7bc586b6b47ea312f3a2477c6a921ef26863b2d794686735efaa6f9d4f03",
+      "size": 216
+    },
+    "events": {
+      "path": "events.jsonl",
+      "sha256": "99552c51ac717180ccf82cba204b145feac016aedd34357ece403e29862ff91c",
+      "size": 414
+    },
+    "workspaceManifest": {
+      "path": "workspace/manifest.json",
+      "sha256": "597e67b477306fa3cb74838e3273c2aba220940b5ad81530e6ea820ea5174d1a",
+      "size": 255
+    }
+  },
+  "runner": {
+    "os": "darwin",
+    "nodeVersion": "v25.8.0",
+    "cliVersion": "0.3.0",
+    "promptSource": "cli"
+  },
+  "outcome": {
+    "status": "success",
+    "score": 0.89,
+    "judgeNotes": "Correct fix, slightly more event noise, still replay-ready."
   }
 }
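
The `outcome` block added above is optional and untyped once it has been read back from JSON. As a minimal sketch, a reader of `bundle.json` might narrow it as follows. Only the field names `status`, `score`, and `judgeNotes` come from the diff; the `Outcome` interface, the `parseOutcome` helper, and the validation rules are hypothetical and are not taken from the project's schema.

```typescript
// Hypothetical shape of the optional `outcome` block. Only the three field
// names come from the diff; the types and rules below are assumptions.
interface Outcome {
  status?: string;
  score?: number;
  judgeNotes?: string;
}

// Narrow an untyped JSON value into an Outcome, rejecting wrong field types.
function parseOutcome(raw: unknown): Outcome | undefined {
  if (raw === undefined || raw === null) {
    return undefined; // the block is optional, so absence is not an error
  }
  if (typeof raw !== "object" || Array.isArray(raw)) {
    throw new Error("outcome must be an object");
  }

  const record = raw as Record<string, unknown>;
  const status = record["status"];
  const score = record["score"];
  const judgeNotes = record["judgeNotes"];
  const outcome: Outcome = {};

  if (status !== undefined) {
    if (typeof status !== "string") {
      throw new Error("outcome.status must be a string");
    }
    outcome.status = status;
  }
  if (score !== undefined) {
    if (typeof score !== "number" || !Number.isFinite(score)) {
      throw new Error("outcome.score must be a finite number");
    }
    outcome.score = score;
  }
  if (judgeNotes !== undefined) {
    if (typeof judgeNotes !== "string") {
      throw new Error("outcome.judgeNotes must be a string");
    }
    outcome.judgeNotes = judgeNotes;
  }
  return outcome;
}

// Usage with the values from the example bundle above.
const exampleBundle: { outcome?: unknown } = JSON.parse(
  '{"outcome":{"status":"success","score":0.89}}'
);
console.log(parseOutcome(exampleBundle.outcome));
```

Treating every field as optional mirrors how the README describes the block; a stricter reader could require `status` whenever `outcome` is present.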

examples/hello-world-bundle/bundle.json

Lines changed: 11 additions & 0 deletions
@@ -44,5 +44,16 @@
       "sha256": "597e67b477306fa3cb74838e3273c2aba220940b5ad81530e6ea820ea5174d1a",
       "size": 255
     }
+  },
+  "runner": {
+    "os": "darwin",
+    "nodeVersion": "v25.8.0",
+    "cliVersion": "0.3.0",
+    "promptSource": "cli"
+  },
+  "outcome": {
+    "status": "success",
+    "score": 0.93,
+    "judgeNotes": "Small patch, correct output, complete workspace capture."
   }
 }
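
Each `artifactInfo` entry in these example bundles pairs a relative path with a SHA-256 hex digest and a byte size. The sketch below shows one way such an entry could be produced with Node's built-in `crypto` module. The `ArtifactInfoEntry` interface and the `describeArtifact` helper are hypothetical names, and taking the content as an argument instead of reading it from disk is an assumption made to keep the example self-contained.

```typescript
import { createHash } from "node:crypto";

// Mirrors the three fields visible in the example bundle.json files.
interface ArtifactInfoEntry {
  path: string;
  sha256: string;
  size: number;
}

// Hypothetical helper: hash the artifact bytes and record their length.
function describeArtifact(path: string, content: string | Uint8Array): ArtifactInfoEntry {
  // Strings are encoded as UTF-8 so that `size` counts bytes, not characters.
  const bytes = typeof content === "string" ? new TextEncoder().encode(content) : content;
  return {
    path,
    sha256: createHash("sha256").update(bytes).digest("hex"),
    size: bytes.byteLength,
  };
}

console.log(describeArtifact("task.md", "abc"));
```

Hashing bytes instead of text means two bundles with the same content produce the same digest regardless of how the file was read.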

package-lock.json

Lines changed: 2 additions & 2 deletions
Generated file; the diff is not rendered by default.

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "task-bundle",
-  "version": "0.2.0",
+  "version": "0.3.0",
   "description": "Portable task bundles for AI coding work.",
   "license": "MIT",
   "type": "commonjs",

src/cli/commands/compare.ts

Lines changed: 23 additions & 0 deletions
@@ -28,11 +28,34 @@ export function registerCompareCommand(program: Command): void {
       printKeyValue("Right model", comparison.modelChange.right);
       printKeyValue("Left tool", comparison.toolChange.left);
       printKeyValue("Right tool", comparison.toolChange.right);
+      printKeyValue("Left status", comparison.outcomeChange.leftStatus);
+      printKeyValue("Right status", comparison.outcomeChange.rightStatus);
+      printKeyValue("Left score", comparison.outcomeChange.leftScore?.toString());
+      printKeyValue("Right score", comparison.outcomeChange.rightScore?.toString());
+      printKeyValue("Score delta", formatNumber(comparison.outcomeChange.scoreDelta));
       console.log("");
       printList("Only in left", comparison.artifactDelta.onlyInLeft.length > 0 ? comparison.artifactDelta.onlyInLeft : ["None"]);
       printList("Only in right", comparison.artifactDelta.onlyInRight.length > 0 ? comparison.artifactDelta.onlyInRight : ["None"]);
       console.log("");
+      printList(
+        "Artifact hash changes",
+        comparison.artifactChanges
+          .filter((artifact) => !artifact.sameHash)
+          .map((artifact) => `${artifact.artifact}: ${artifact.left?.sha256 ?? "missing"} -> ${artifact.right?.sha256 ?? "missing"}`)
+          .concat(
+            comparison.artifactChanges.filter((artifact) => !artifact.sameHash).length === 0 ? ["None"] : []
+          )
+      );
+      console.log("");
       printKeyValue("Workspace file delta", String(comparison.counts.workspaceFilesDelta));
       printKeyValue("Event count delta", String(comparison.counts.eventCountDelta));
     });
 }
+
+function formatNumber(value: number | undefined): string | undefined {
+  if (value === undefined) {
+    return undefined;
+  }
+
+  return Number(value.toFixed(4)).toString();
+}
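
The command above only prints fields of a `comparison` object; the code that builds `artifactChanges` and `scoreDelta` is not part of this hunk. The sketch below shows one way those values could be derived and why the delta is rounded before printing. `formatNumber` is copied from the diff; `diffArtifacts`, `scoreDelta`, `hashChangeLines`, and the right-minus-left direction of the delta are assumptions, not the project's implementation.

```typescript
interface ArtifactInfoEntry {
  path: string;
  sha256: string;
  size: number;
}

// Field names follow what compare.ts reads; how the project fills them is assumed.
interface ArtifactChange {
  artifact: string;
  left?: ArtifactInfoEntry;
  right?: ArtifactInfoEntry;
  sameHash: boolean;
}

// Pair up artifacts by name; an artifact missing on one side never counts as sameHash.
function diffArtifacts(
  left: Record<string, ArtifactInfoEntry>,
  right: Record<string, ArtifactInfoEntry>
): ArtifactChange[] {
  const names = Array.from(new Set(Object.keys(left).concat(Object.keys(right)))).sort();
  return names.map((artifact) => {
    const l = left[artifact];
    const r = right[artifact];
    const sameHash = l !== undefined && r !== undefined && l.sha256 === r.sha256;
    return { artifact, left: l, right: r, sameHash };
  });
}

// Assumed direction: right minus left. Undefined when either score is absent.
function scoreDelta(leftScore?: number, rightScore?: number): number | undefined {
  if (leftScore === undefined || rightScore === undefined) {
    return undefined;
  }
  return rightScore - leftScore;
}

// Copied from the diff: rounds to four decimals and drops trailing zeros.
function formatNumber(value: number | undefined): string | undefined {
  if (value === undefined) {
    return undefined;
  }
  return Number(value.toFixed(4)).toString();
}

// Same output shape as the printList call in the diff, with the "None" fallback.
function hashChangeLines(changes: ArtifactChange[]): string[] {
  const lines = changes
    .filter((change) => !change.sameHash)
    .map((change) => `${change.artifact}: ${change.left?.sha256 ?? "missing"} -> ${change.right?.sha256 ?? "missing"}`);
  return lines.length > 0 ? lines : ["None"];
}

// Scores taken from the two example bundles; which side is "left" is arbitrary here.
console.log(formatNumber(scoreDelta(0.93, 0.89))); // prints "-0.04"
```

Subtracting binary floating-point scores rarely yields a short decimal, so rounding through `toFixed(4)` keeps the printed delta readable without changing the stored values.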

src/cli/commands/inspect.ts

Lines changed: 3 additions & 0 deletions
@@ -29,6 +29,9 @@ export function registerInspectCommand(program: Command): void {
       printKeyValue("Repo", inspection.repo);
       printKeyValue("Commit", inspection.commit);
       printKeyValue("Branch", inspection.branch);
+      printKeyValue("Status", inspection.outcome?.status);
+      printKeyValue("Score", inspection.outcome?.score?.toString());
+      printKeyValue("Prompt source", inspection.promptSource);
       printKeyValue("Tags", inspection.tags.join(", "));
       console.log("");
       printList("Artifacts", inspection.artifacts);
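
The `inspection.outcome?.score?.toString()` expression above relies on optional chaining, which short-circuits only on `null` or `undefined`. The sketch below illustrates why that matters for a score of `0`. The `Inspection` interface is a hypothetical, trimmed-down stand-in for the project's type, and `scoreLabelTruthy` exists only to show the contrasting behaviour.

```typescript
// Hypothetical, trimmed-down stand-in for the inspection object.
interface Inspection {
  outcome?: { status?: string; score?: number };
}

// Same expression as in the diff: undefined only when outcome or score is absent.
function scoreLabel(inspection: Inspection): string | undefined {
  return inspection.outcome?.score?.toString();
}

// Contrast: a truthiness check would silently hide a legitimate score of 0.
function scoreLabelTruthy(inspection: Inspection): string | undefined {
  const score = inspection.outcome?.score;
  return score ? score.toString() : undefined;
}

console.log(scoreLabel({ outcome: { score: 0 } })); // prints "0"
console.log(scoreLabelTruthy({ outcome: { score: 0 } })); // prints undefined
```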
