feat(core): surface validationOutcome on TASK_COMPLETED event (#429)#489
feat(core): surface validationOutcome on TASK_COMPLETED event (#429)#489lmorchard wants to merge 1 commit into
Conversation
PR2 of the 429 sequence (PR1 = #463, merged). Threads validationOutcome from TaskExecutionResult onto the TASK_COMPLETED event payload so both ChalkConsoleLogger (human-readable warning on force-accept) and JSONConsoleLogger (raw event stream) surface the signal. Eval-judge and telemetry consumers running the CLI with --logger json now see the outcome on task:completed events without needing the function return. chalkConsole prints a yellow warning line below the final answer only when validationOutcome === "force-accepted"; silent on "accepted" and undefined. JSON output is unchanged in structure — the new optional field appears automatically inside the existing task:completed event's data object. Also folds pnpm --filter pilo-core run check:schemas into the root check script so schema drift fails locally instead of only in CI. CLAUDE.md's two descriptions of the check chain are updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Surfaces the task validator’s validationOutcome on the task:completed (TASK_COMPLETED) event so event-driven consumers (chalk/json loggers, telemetry/eval tooling) can read it without relying on the execute() return value.
Changes:
- Adds optional
validationOutcome?: "accepted" | "force-accepted"toTaskCompleteEventDataand threads it into the TASK_COMPLETED emit path. - Updates
ChalkConsoleLoggerto print a warning only whenvalidationOutcome === "force-accepted", and adds unit tests for the new output behavior. - Regenerates the webagent event JSON schema and wires
check:schemasinto the rootpnpm run checkscript (and updates docs).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/events.ts | Extends TASK_COMPLETED event payload type with optional validationOutcome + documentation. |
| packages/core/src/webAgent.ts | Emits validationOutcome on TASK_COMPLETED event payload. |
| packages/core/src/loggers/chalkConsole.ts | Prints a warning footer when validation was force-accepted. |
| packages/core/test/loggers.test.ts | Adds ChalkConsoleLogger tests covering warning/no-warning cases. |
| packages/core/test/webAgent.test.ts | Asserts TASK_COMPLETED includes validationOutcome in validation-related scenarios. |
| packages/core/schemas/webagent-event.json | Regenerated schema to include validationOutcome on TaskCompleteEventData. |
| package.json | Adds core check:schemas step into root check script. |
| CLAUDE.md | Updates documentation to reflect the new check script chain. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| this.emit(WebAgentEventType.TASK_COMPLETED, { | ||
| success: executionOutcome.success, | ||
| finalAnswer: executionOutcome.finalAnswer, | ||
| ...(executionOutcome.validationOutcome && { | ||
| validationOutcome: executionOutcome.validationOutcome, | ||
| }), | ||
| }); |
There was a problem hiding this comment.
Good catch, but this is pre-existing — git blame shows the emit site at webAgent.ts:1901-1907 has been missing both fields since 2301191 (Aug 2025), well before #463 or this PR touched the surrounding code. Other emit sites in the same file have the same gap (some supply iterationId, none I sampled supply timestamp). The wrapper at webAgent.ts:1993-1995 uses data: any which is why TypeScript doesn't flag it.
Filed as #490 for a proper audit/fix — out of scope for this PR (which only adds the optional validationOutcome field) but worth fixing properly so the schema and runtime payload agree.
Summary
PR2 of the 429 sequence — companion to #463 (PR1, merged).
validationOutcome(added toTaskExecutionResultin PR1) onto theTASK_COMPLETEDevent payload, so both human and JSON consumers see the signal. Eval-judge / telemetry callers runningpilo run --logger jsonnow get the outcome ontask:completedevents without needing the function return value.ChalkConsoleLoggerprints a yellow warning line below the final answer only whenvalidationOutcome === "force-accepted"; silent on"accepted"andundefined. The user already sees per-attemptTASK_VALIDATEDoutput during the run, so the success case isn't restated.pnpm --filter pilo-core run check:schemasinto the rootcheckscript. Localpnpm run checknow catches the same schema drift CI gates on. (Flagged in PR1's session notes; small enough to bundle here.)Extension UI surfacing of
validationOutcome(badge / color in ChatView) is still deferred and not done here — out of scope for this PR.Design Decisions
Event payload, not CLI return-value capture. Both
ChalkConsoleLoggerandJSONConsoleLoggerconsume events; arun.ts-only print would leave JSON consumers (eval-judge, telemetry) blind. Routing throughTaskCompleteEventDatalights up both paths from a single source. Wire-API surface grows by one optional field ontask:completedevents (in addition to the stream-endcompleteevent where the same value already lives viaStreamCompleteEventDatafrom PR1); the duplicate-on-wire was an explicit trade-off.Warn only on
"force-accepted". Silent on"accepted"andundefined. The validator's per-attempt verdict already shows in chalkConsole (✅ Task completed/⚠️ Partial completion/❌ Task failed); a final "validator accepted" footer would just restate that. Force-accept is the only state that isn't already in the chalk stream and is the only one with operational meaning.Two values only.
"accepted" | "force-accepted", same as PR1. No new enum values; both force-accept sub-cases (validator-disagreed and validator-errored) still lump together.check:schemasaftertypecheck, beforetest. Schema drift fails fast — before the multi-minute test suite has a chance to run.Changes
packages/core/src/:events.ts—TaskCompleteEventDatagains optionalvalidationOutcome?: "accepted" | "force-accepted"with JSDoc.webAgent.ts:1901-1904— emit site adds conditional spread ofexecutionOutcome.validationOutcome, mirroring the existing pattern atwebAgent.ts:1909-1911for theTaskExecutionResultreturn.loggers/chalkConsole.ts:182-186—handleTaskCompleteprints a yellow warning when the field is"force-accepted"; existing final-answer line unchanged.packages/core/test/:loggers.test.ts— newdescribe("ChalkConsoleLogger")block with 3 tests coveringforce-accepted(warning + final answer),accepted(final answer only),undefined(final answer only). Extends the existing test file rather than standing up a new one.webAgent.test.ts— 4 existing PR1validationOutcometests extended with amockLogger.events.find(TASK_COMPLETED)assertion on the new event payload, in addition to the existingresult.validationOutcomeassertions.packages/core/schemas/webagent-event.json— regenerated bypnpm --filter pilo-core run check:schemas. Single property addition underTaskCompleteEventData; nothing else changed.Root + project docs:
package.json—checkscript chainspnpm --filter pilo-core run check:schemasbetweentypecheckandtest.CLAUDE.md— two lines describing thecheckscript's contents updated to match the new chain.Test Plan
pnpm run checkpasses (core 843, cli 228, server 96, extension 272)pnpm run typecheckpassespnpm run format:checkpassespnpm --filter pilo-core run check:schemasclean on a clean treeReferences
Companion to feat(core): wire validator history and surface validationOutcome (#429) #463 (PR1; merged)
Issue Wire validator conversation history and surface validationOutcome in TaskExecutionResult #429 (already closed by PR1; this PR finishes the deferred consumer-plumbing piece called out in PR1's spec)
Follow-up: Several webAgent event emits miss the WebAgentEventData base fields (timestamp, iterationId) #490 —
webAgent.tsemit sites miss base WebAgentEventData fields (timestamp,iterationId); pre-existing inconsistency surfaced by Copilot on this PR. Deferred per spec scope.