Skip to content

feat(core): surface validationOutcome on TASK_COMPLETED event (#429)#489

Open
lmorchard wants to merge 1 commit into
mainfrom
feat/429-validator-outcome-cli
Open

feat(core): surface validationOutcome on TASK_COMPLETED event (#429)#489
lmorchard wants to merge 1 commit into
mainfrom
feat/429-validator-outcome-cli

Conversation

@lmorchard
Copy link
Copy Markdown
Collaborator

@lmorchard lmorchard commented Jun 1, 2026

Summary

PR2 of the 429 sequence — companion to #463 (PR1, merged).

  • Threads validationOutcome (added to TaskExecutionResult in PR1) onto the TASK_COMPLETED event payload, so both human and JSON consumers see the signal. Eval-judge / telemetry callers running pilo run --logger json now get the outcome on task:completed events without needing the function return value.
  • ChalkConsoleLogger prints a yellow warning line below the final answer only when validationOutcome === "force-accepted"; silent on "accepted" and undefined. The user already sees per-attempt TASK_VALIDATED output during the run, so the success case isn't restated.
  • Folds pnpm --filter pilo-core run check:schemas into the root check script. Local pnpm run check now catches the same schema drift CI gates on. (Flagged in PR1's session notes; small enough to bundle here.)

Extension UI surfacing of validationOutcome (badge / color in ChatView) is still deferred and not done here — out of scope for this PR.

Design Decisions

  • Event payload, not CLI return-value capture. Both ChalkConsoleLogger and JSONConsoleLogger consume events; a run.ts-only print would leave JSON consumers (eval-judge, telemetry) blind. Routing through TaskCompleteEventData lights up both paths from a single source. Wire-API surface grows by one optional field on task:completed events (in addition to the stream-end complete event where the same value already lives via StreamCompleteEventData from PR1); the duplicate-on-wire was an explicit trade-off.

  • Warn only on "force-accepted". Silent on "accepted" and undefined. The validator's per-attempt verdict already shows in chalkConsole (✅ Task completed / ⚠️ Partial completion / ❌ Task failed); a final "validator accepted" footer would just restate that. Force-accept is the only state that isn't already in the chalk stream and is the only one with operational meaning.

  • Two values only. "accepted" | "force-accepted", same as PR1. No new enum values; both force-accept sub-cases (validator-disagreed and validator-errored) still lump together.

  • check:schemas after typecheck, before test. Schema drift fails fast — before the multi-minute test suite has a chance to run.

Changes

packages/core/src/:

  • events.tsTaskCompleteEventData gains optional validationOutcome?: "accepted" | "force-accepted" with JSDoc.
  • webAgent.ts:1901-1904 — emit site adds conditional spread of executionOutcome.validationOutcome, mirroring the existing pattern at webAgent.ts:1909-1911 for the TaskExecutionResult return.
  • loggers/chalkConsole.ts:182-186handleTaskComplete prints a yellow warning when the field is "force-accepted"; existing final-answer line unchanged.

packages/core/test/:

  • loggers.test.ts — new describe("ChalkConsoleLogger") block with 3 tests covering force-accepted (warning + final answer), accepted (final answer only), undefined (final answer only). Extends the existing test file rather than standing up a new one.
  • webAgent.test.ts — 4 existing PR1 validationOutcome tests extended with a mockLogger.events.find(TASK_COMPLETED) assertion on the new event payload, in addition to the existing result.validationOutcome assertions.

packages/core/schemas/webagent-event.json — regenerated by pnpm --filter pilo-core run check:schemas. Single property addition under TaskCompleteEventData; nothing else changed.

Root + project docs:

  • package.jsoncheck script chains pnpm --filter pilo-core run check:schemas between typecheck and test.
  • CLAUDE.md — two lines describing the check script's contents updated to match the new chain.

Test Plan

  • pnpm run check passes (core 843, cli 228, server 96, extension 272)
  • pnpm run typecheck passes
  • pnpm run format:check passes
  • pnpm --filter pilo-core run check:schemas clean on a clean tree
  • Manual CLI eyeball skipped this session — force-accept is hard to trigger reliably without an impossible-criterion fixture. Unit tests cover all four states (accepted, two force-accept sub-cases, undefined). Reviewer: optional confirmation with a real run if convenient.

References

PR2 of the 429 sequence (PR1 = #463, merged). Threads validationOutcome
from TaskExecutionResult onto the TASK_COMPLETED event payload so both
ChalkConsoleLogger (human-readable warning on force-accept) and
JSONConsoleLogger (raw event stream) surface the signal. Eval-judge and
telemetry consumers running the CLI with --logger json now see the
outcome on task:completed events without needing the function return.

chalkConsole prints a yellow warning line below the final answer only
when validationOutcome === "force-accepted"; silent on "accepted" and
undefined. JSON output is unchanged in structure — the new optional
field appears automatically inside the existing task:completed event's
data object.

Also folds pnpm --filter pilo-core run check:schemas into the root
check script so schema drift fails locally instead of only in CI.
CLAUDE.md's two descriptions of the check chain are updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Surfaces the task validator’s validationOutcome on the task:completed (TASK_COMPLETED) event so event-driven consumers (chalk/json loggers, telemetry/eval tooling) can read it without relying on the execute() return value.

Changes:

  • Adds optional validationOutcome?: "accepted" | "force-accepted" to TaskCompleteEventData and threads it into the TASK_COMPLETED emit path.
  • Updates ChalkConsoleLogger to print a warning only when validationOutcome === "force-accepted", and adds unit tests for the new output behavior.
  • Regenerates the webagent event JSON schema and wires check:schemas into the root pnpm run check script (and updates docs).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
packages/core/src/events.ts Extends TASK_COMPLETED event payload type with optional validationOutcome + documentation.
packages/core/src/webAgent.ts Emits validationOutcome on TASK_COMPLETED event payload.
packages/core/src/loggers/chalkConsole.ts Prints a warning footer when validation was force-accepted.
packages/core/test/loggers.test.ts Adds ChalkConsoleLogger tests covering warning/no-warning cases.
packages/core/test/webAgent.test.ts Asserts TASK_COMPLETED includes validationOutcome in validation-related scenarios.
packages/core/schemas/webagent-event.json Regenerated schema to include validationOutcome on TaskCompleteEventData.
package.json Adds core check:schemas step into root check script.
CLAUDE.md Updates documentation to reflect the new check script chain.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1901 to 1907
this.emit(WebAgentEventType.TASK_COMPLETED, {
success: executionOutcome.success,
finalAnswer: executionOutcome.finalAnswer,
...(executionOutcome.validationOutcome && {
validationOutcome: executionOutcome.validationOutcome,
}),
});
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, but this is pre-existing — git blame shows the emit site at webAgent.ts:1901-1907 has been missing both fields since 2301191 (Aug 2025), well before #463 or this PR touched the surrounding code. Other emit sites in the same file have the same gap (some supply iterationId, none I sampled supply timestamp). The wrapper at webAgent.ts:1993-1995 uses data: any which is why TypeScript doesn't flag it.

Filed as #490 for a proper audit/fix — out of scope for this PR (which only adds the optional validationOutcome field) but worth fixing properly so the schema and runtime payload agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants