
Add token usage reporter and optimization skill #44

Open
V3RON wants to merge 2 commits into main from feat/token-usage-reporter
V3RON (Contributor) commented May 19, 2026

Why this change is needed

Skillgym needs a built-in way for agents to compare billable token usage without parsing the full suite result or relying on ad-hoc artifacts. This adds a strict token-usage reporter for optimization loops and a bundled token-optimization skill that tells agents how to reduce token cost safely while keeping benchmark behavior intact.

For Skillgym users, this means token-cost tuning can stay inside the normal skillgym run workflow: run a baseline, make one small change, compare compact JSON output, and fall back to the existing artifacts when a row fails.
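The before/after comparison described above could look roughly like the sketch below. It assumes only that each run prints one JSON object with top-level passed and billable fields, as this PR describes; the compareRuns helper, the Verdict labels, and the sample numbers are hypothetical and not part of Skillgym.

```typescript
// Minimal sketch: compare the compact JSON from a baseline run and a candidate run.
// Only the `passed` and `billable` field names come from this PR; the rest is assumed.
interface CompactReport {
  passed: boolean;
  billable: number | null;
}

type Verdict = "improved" | "regressed" | "unchanged" | "not-comparable";

function compareRuns(
  baselineJson: string,
  candidateJson: string,
): { verdict: Verdict; delta: number | null } {
  const baseline = JSON.parse(baselineJson) as CompactReport;
  const candidate = JSON.parse(candidateJson) as CompactReport;

  // A failing run or a missing total cannot be compared; inspect the run artifacts instead.
  if (
    !baseline.passed ||
    !candidate.passed ||
    baseline.billable === null ||
    candidate.billable === null
  ) {
    return { verdict: "not-comparable", delta: null };
  }

  const delta = candidate.billable - baseline.billable;
  const verdict: Verdict =
    delta < 0 ? "improved" : delta > 0 ? "regressed" : "unchanged";
  return { verdict, delta };
}

// Hypothetical reporter outputs captured before and after one small edit.
const result = compareRuns(
  '{"passed":true,"billable":1200}',
  '{"passed":true,"billable":1100}',
);
console.log(result.verdict, result.delta);
```

Treating any failed run as not comparable keeps a cheaper but broken run from being mistaken for an improvement.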

Closes #43.

Architecture and implementation details

  • add a built-in token-usage reporter and wire it through reporter loading, CLI help text, and the public reporter exports
  • keep reporter stdout to one compact JSON object with top-level passed, billable, artifacts, and rows fields so agents can parse it reliably
  • compute per-row billable only for passed rows with provider-backed normalized totals; keep failed, derived, and unavailable rows in the output with billable: null
  • aggregate top-level billable from comparable passed rows only, including repeated executions via successfulRepetitions
  • avoid a second token artifact format and reuse the existing suite-run artifact directory for debugging when rows fail
  • add the bundled token-optimization skill and document the required workflow: explicit target, minimal protecting suite, passing baseline first, one small edit at a time, before/after comparison, and snapshots only as optional follow-up protection
  • update the bundled skill docs, reporter docs, CLI skill coverage, and DICTIONARY.md with the approved token-usage and token-optimization terms
  • include the small fixture repairs needed to keep the requested local CI gates green while landing the reporter slice
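
The billable rules in the list above can be sketched as follows. This is an illustration under stated assumptions, not the reporter's actual code: only the passed, billable, artifacts, rows, and successfulRepetitions names come from this PR. The RowInput shape, the totalsSource and tokensPerRun fields, the buildReport helper, and the artifact path are hypothetical, and multiplying by successfulRepetitions is just one plausible reading of how repeated executions are counted.

```typescript
// Hypothetical input shape; the real Skillgym suite result may differ.
interface RowInput {
  name: string;
  status: "passed" | "failed";
  totalsSource: "provider" | "derived" | "unavailable";
  tokensPerRun: number | null; // assumed: normalized billable tokens for one execution
  successfulRepetitions: number;
}

interface ReportRow {
  name: string;
  passed: boolean;
  billable: number | null;
}

interface TokenUsageReport {
  passed: boolean;
  billable: number | null;
  artifacts: string;
  rows: ReportRow[];
}

// Per-row billable exists only for passed rows with provider-backed totals.
function rowBillable(row: RowInput): number | null {
  if (row.status !== "passed") return null;
  if (row.totalsSource !== "provider") return null;
  return row.tokensPerRun;
}

function buildReport(rows: RowInput[], artifacts: string): TokenUsageReport {
  let total = 0;
  let comparableRows = 0;
  const outRows: ReportRow[] = rows.map((row) => {
    const billable = rowBillable(row);
    if (billable !== null) {
      // Assumption: each successful repetition costs the same per-run total.
      total += billable * row.successfulRepetitions;
      comparableRows += 1;
    }
    // Non-comparable rows stay in the output with billable: null.
    return { name: row.name, passed: row.status === "passed", billable };
  });
  return {
    passed: rows.every((row) => row.status === "passed"),
    // Assumption: with no comparable rows, the top-level total is null.
    billable: comparableRows > 0 ? total : null,
    artifacts,
    rows: outRows,
  };
}

const report = buildReport(
  [
    { name: "a", status: "passed", totalsSource: "provider", tokensPerRun: 100, successfulRepetitions: 2 },
    { name: "b", status: "failed", totalsSource: "provider", tokensPerRun: 50, successfulRepetitions: 0 },
    { name: "c", status: "passed", totalsSource: "derived", tokensPerRun: 70, successfulRepetitions: 1 },
  ],
  "path/to/suite-run", // hypothetical artifact directory
);
// One compact JSON object on stdout; here report.billable is 200 (row "a" only).
console.log(JSON.stringify(report));
```

Keeping failed and derived rows in rows with a null billable lets a consumer see why the top-level total excludes them without opening the artifacts.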

Validation

  • pnpm run fmt:check
  • pnpm run test
  • pnpm run typecheck
  • pnpm run build
  • ran the full check set before each commit

