Add token usage reporter and optimization skill #44
Open
V3RON wants to merge 2 commits into
Why this change is needed
Skillgym needs a built-in way for agents to compare billable token usage without parsing the full suite result or relying on ad-hoc artifacts. This adds a strict `token-usage` reporter for optimization loops and a bundled `token-optimization` skill that tells agents how to reduce token cost safely while keeping benchmark behavior intact.

For Skillgym users, this means token-cost tuning can stay inside the normal `skillgym run` workflow: run a baseline, make one small change, compare compact JSON output, and fall back to the existing artifacts when a row fails.

Closes #43.
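As a rough illustration of the before/after comparison step, here is a minimal TypeScript sketch. Only the `passed` and `billable` field names come from this description; their types, the `Comparison` shape, and the `compareReports` helper are assumptions made for the example, not the reporter's actual contract.

```typescript
// Hypothetical subset of the token-usage JSON; field types are assumed.
interface TokenUsageReport {
  passed: boolean;
  billable: number | null;
}

// Hypothetical result shape for this sketch.
interface Comparison {
  comparable: boolean;
  delta: number | null; // candidate minus baseline; negative means cheaper
}

// Compare two runs only when both passed and both carry a billable total.
function compareReports(
  baseline: TokenUsageReport,
  candidate: TokenUsageReport,
): Comparison {
  const before = baseline.billable;
  const after = candidate.billable;
  if (!baseline.passed || !candidate.passed || before === null || after === null) {
    return { comparable: false, delta: null };
  }
  return { comparable: true, delta: after - before };
}

// Made-up numbers standing in for two saved reporter outputs.
const baselineReport: TokenUsageReport = JSON.parse('{"passed":true,"billable":1200}');
const candidateReport: TokenUsageReport = JSON.parse('{"passed":true,"billable":950}');
const failedReport: TokenUsageReport = JSON.parse('{"passed":false,"billable":null}');

const improved = compareReports(baselineReport, candidateReport);
const notComparable = compareReports(baselineReport, failedReport);
console.log(improved, notComparable);
```

When the comparison is not possible (a failed run or a missing total), the sketch returns `comparable: false` rather than a number, which matches the guidance to fall back to the existing artifacts in that case.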
Architecture and implementation details
- Add the `token-usage` reporter and wire it through reporter loading, CLI help text, and the public reporter exports
- Emit compact JSON with `passed`, `billable`, `artifacts`, and `rows` fields so agents can parse it reliably
- Report `billable` only for passed rows with provider-backed normalized totals; keep failed, derived, and unavailable rows in the output with `billable: null`
- Aggregate `billable` from comparable passed rows only, including repeated executions via `successfulRepetitions`
- Add the bundled `token-optimization` skill and document the required workflow: explicit target, minimal protecting suite, passing baseline first, one small edit at a time, before/after comparison, and snapshots only as optional follow-up protection
- Update `DICTIONARY.md` with the approved `token-usage` and `token-optimization` terms

Validation
- `pnpm run fmt:check`
- `pnpm run test`
- `pnpm run typecheck`
- `pnpm run build`
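The row rules listed under "Architecture and implementation details" can be sketched roughly as follows. This is not the code from this PR: the `RawRow` input type, the `source` and `totalTokens` names, and the choice to multiply a per-execution total by `successfulRepetitions` are all assumptions for illustration. Only `passed`, `billable`, `rows`, and `successfulRepetitions` are named in the description.

```typescript
// Hypothetical input row; "source" and "totalTokens" are invented names.
type UsageSource = "provider" | "derived" | "unavailable";

interface RawRow {
  name: string;
  passed: boolean;
  source: UsageSource;
  totalTokens: number | null; // assumed per-execution normalized total
  successfulRepetitions: number;
}

interface ReportRow {
  name: string;
  passed: boolean;
  billable: number | null;
}

// A row is comparable only if it passed and has a provider-backed total.
function toReportRow(row: RawRow): ReportRow {
  let billable: number | null = null;
  if (row.passed && row.source === "provider" && row.totalTokens !== null) {
    // Assumption: repeated executions scale the per-execution total.
    billable = row.totalTokens * row.successfulRepetitions;
  }
  return { name: row.name, passed: row.passed, billable };
}

// Keep every row in the output, but sum only the comparable ones.
function summarize(rawRows: RawRow[]): { passed: boolean; billable: number | null; rows: ReportRow[] } {
  const rows = rawRows.map(toReportRow);
  const comparable = rows
    .map((r) => r.billable)
    .filter((b): b is number => b !== null);
  const billable = comparable.length > 0 ? comparable.reduce((sum, b) => sum + b, 0) : null;
  return { passed: rows.every((r) => r.passed), billable, rows };
}

const summary = summarize([
  { name: "a", passed: true, source: "provider", totalTokens: 100, successfulRepetitions: 2 },
  { name: "b", passed: false, source: "provider", totalTokens: 50, successfulRepetitions: 0 },
  { name: "c", passed: true, source: "derived", totalTokens: 70, successfulRepetitions: 1 },
]);
console.log(JSON.stringify(summary));
```

Failed and derived rows stay in `rows` with `billable: null`, so a consumer can still see which rows were excluded from the total.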