Releases: acailic/agent_debugger
v0.1.18
✨ Features
L3 Replay
- Implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions
General
- Close gaps on 4 partially complete features
- Add per-feature README GIFs, harness improvements, and CI gates
🐛 Bug Fixes
Frontend
- Revert eslint to v9 — react-hooks plugin incompatible with v10
CI
- Add continue-on-error for codecov and deprecated dependency-review action
- Resolve CI lint errors in harness scripts
Tests
- Use UTC dates in analytics DB tests to match repository implementation
- Update test expectation for checkpoint_timestamp in restored session config
Review
- Address all review comments on PR #140
- Address remaining P1/P2 review feedback on restore and event filtering
- Resolve Pyright type warnings in new feature code
Demo
- Re-record demo GIFs with correct feature navigation and slower pacing
📦 Other
Dependencies
- Bump vite from 8.0.3 to 8.0.5
- Bump eslint from 9.39.4 to 10.2.0
- Bump globals from 16.5.0 to 17.4.0
- Bump codecov/codecov-action from 4 to 6
v0.1.17
✨ Features
- Alert lifecycle management, configurable policies, and query performance indexes (#135)
- Add harness improvements — pre-commit hooks, pyright, coverage gate, CI gates, and Claude Code hooks
- Add cost dashboard + trace search demo GIF
🐛 Bug Fixes
- Resolve lint errors and clean up coverage config
- Eliminate N+1 queries in `_load_candidate_sessions` event_type filter (#132)
- Drop `.distinct()` from event_type subquery, add tests
- Remove unused `sqlalchemy.exists` import in storage/search.py
- Address code review findings: critical, major, and minor issues
📦 Other
Documentation
- Split architecture into abstract overview + detailed layer diagram
- Upgrade README architecture diagram with professional styling
- Update README to reference regenerated demo GIFs and screenshots
- Regenerate demo GIFs and screenshots with polished UI
v0.1.16
Features
CLI
- Add `peaky-peek demo` command for zero-friction onboarding (seed + launch server + frontend)
- Add `peaky-peek seed` and `peaky-peek serve` subcommands
Frontend
- Update WhyButton to use diagnostic API
- Refactor DecisionTree recommendation to useMemo
- Refine replay UX and similar failure search
- Add failure analysis, entity insights, and Hindsight export
SDK
- Elevate to top-tier quality: thread safety, security, API robustness
- Harden SDK with typed schemas and frontend resilience
- Fix tenant isolation and test harness leaks
- Prevent HttpTransport creation without API key
Benchmarks
- Add benchmark seed data enrichment
- Stabilize benchmark test timeouts in CI
Testing
- Restore and fix failure_memory tests
- Stabilize LangChain test suite
- Improve restore flow and seed data validation
Bug Fixes
- Fix session update and collector regressions
- Fix export, drift contract, and request deduping
- Fix contract test import order for CI
- Multiple CI stability fixes (lint, test isolation, timeouts)
v0.1.15
🐛 Bug Fixes
- Fix HTTP transport not being wired up in local-first mode without API key — SDK silently dropped all events instead of sending them to the server
- Disable SDK in benchmark tests to prevent network retries
Full Changelog: v0.1.14...v0.1.15
v0.1.14
What's Changed
Stability & Correctness (Phases 1-2)
- Fix SDK connection leak in shared async client auto-cleanup
- Fix unhandled checkpoint restore errors leaking raw HTTP errors
- Fix SSE disconnect memory leak with CancelledError handling
- Fix duplicate SessionStore interface in frontend
- Add silent failure detection in `instrument()` auto-patch
- Replace `print()` with logger in auto-patch transport
- Add proper return types for `get_current_context()`
- Fix N+1 search pattern loading 500 sessions with all events
- Add rate limiting on API key creation
- Fix buffer silent data drop without caller notification
- Fix fragile replay `collapse_threshold` default unwrapping
- Fix `dangerouslySetInnerHTML` XSS risk in ToolInspector
- Remove unused `useReplayBreakpoint.ts` hook
Quality & Maintainability (Phases 3-4)
- Encapsulate global mutable state in auto-patch transport
- Expose retry config via Config instead of hardcoded values
- Use `@abstractmethod` in RecordingMixin
- Add DB pool_timeout/pool_recycle for non-SQLite engines
- Add transaction boundaries in trace_routes
- Fix fragile querySelector with stable `getElementById`
- Improve accessibility (role="meter" replacement, validation cleanup)
- Fix auto-patch side effects at import time
- Improve Config validation flow
- Add CONTRIBUTING.md and update ARCHITECTURE.md
Tests
- Add 32 tests for storage/embedding (tokenize, vectors, similarity)
- Add 25 tests for storage/search (sessions, events, tenant isolation)
- Add tests for trace_routes rollback/commit behavior
- Add tests for engine factory pool settings
- Add tests for comparison_routes and trace_routes
- Add dedicated SessionManager tests
Refactoring
- Remove dead ranking module and unused feature tests
- Deep quality refinement across SDK, backend, and frontend
- Round 2 refinement — dead code removal, component improvements
- Replace falsy-or patterns with explicit None checks
- Improve DX — tooltips, validation, prescriptive hints, keyboard shortcuts
Full Changelog: v0.1.12...v0.1.14
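The "explicit None checks" refactor listed above targets a common Python pitfall. The snippet below is only an illustrative sketch: the function names and `DEFAULT_SAMPLE_RATE` are hypothetical and are not taken from the project's code.

```python
from typing import Optional

# Hypothetical default, used only for this illustration.
DEFAULT_SAMPLE_RATE = 1.0


def sample_rate_falsy_or(value: Optional[float]) -> float:
    # Falsy-or pattern: 0.0 is falsy, so an explicit 0.0 is silently
    # replaced by the default.
    return value or DEFAULT_SAMPLE_RATE


def sample_rate_explicit(value: Optional[float]) -> float:
    # Explicit None check: only a missing value falls back to the default.
    return DEFAULT_SAMPLE_RATE if value is None else value
```

With the falsy-or version, a caller who passes `0.0` to disable sampling gets the default instead. The explicit check preserves the caller's value.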
v0.1.13
✨ Features
Developer Experience
- Add tooltips explaining technical terms (replay_value, drift, policy_shift, ranking factors)
- Add keyboard shortcuts: Ctrl+1/2/3 for tabs, Ctrl+K or / for search focus
- Add prescriptive action hints in FailureClusterPanel mapped to error types
- Add drift direction indicator and remediation guidance in DriftAlertsPanel
- Add cost efficiency hint when per-token cost is notably high
- Add ranking factor breakdown labels in EventDetail
- Add skeleton loading animation to CostPanel
- Add config validation (endpoint URL, max_payload_kb, sample_rate)
- Expand boolean parsing to handle 1/0, yes/no, on/off
- Add actionable HTTP error messages (401/403/404/429)
- Add auto-patch console feedback for CLI visibility
- Enhance error messages with actionable suggestions
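The expanded boolean parsing mentioned above (1/0, yes/no, on/off) might be implemented along these lines. This is a minimal sketch under assumptions: `parse_bool` is a hypothetical name, and accepting "true"/"false" and raising on unknown input are guesses, not the SDK's documented behavior.

```python
# Hypothetical helper; the name and error behavior are assumptions,
# not the SDK's actual API.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Parse a config string into a bool, rejecting unknown spellings."""
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean")
```

Raising on unrecognized input, rather than defaulting to `False`, surfaces typos in environment variables early.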
Frontend
- Convert LLMViewer and MultiAgentCoordinationPanel empty states to EmptyState component
📦 Other
Refactoring
- Remove dead code and simplify across backend, SDK, and frontend
- Remove dead ranking module and unused feature tests
- Extract severity weights to module-level constant
v0.1.12
✨ Features
- Add Failure Clustering, Multi-Agent Coordination panels and fix Inspect tab layout
- Improve UI presentation — collapsible sections, clickable checkpoints, tree polish
- Professional UI polish for Inspect tab and DecisionTree
- UX polish for Trace tab, Inspect tab, and cross-tab design system
🐛 Bug Fixes
- Make Inspect tab DecisionTree full-width and auto-height
- Resolve CI failures (imports) — ensure collector submodules accessible, avoid circular imports on Python 3.10
📦 Other
Documentation
- Replace feature screenshots with per-feature GIFs in README
- Re-record feature GIFs with updated UI
Refactoring
- Deslop codebase — remove dead code, consolidate duplicates, fix DDD boundaries, polish UI layout
- Deduplicate _event_value across collector modules
Performance
- Optimize performance across backend, frontend, and SDK
Testing
- Add 94 tests for untested high-risk modules
v0.1.11
✨ Features
- Implement medium-term features — adaptive replay, exploration, conversation, live dashboard
- Implement 5 research-backed quick wins
🐛 Bug Fixes
- Auto-increment session.errors when error events are added
- Use atomic SQL increment for session.errors in add_event
- Polish 6 issues for HN launch — drift threshold, seed data, benchmark calls
- Resolve CI failure (version) — sync expected version to 0.1.10
- Resolve CI failure (version): sync hardcoded version in SDK `__init__.py`
- Resolve CI failure (lint) — break long line in seed_data.py
- Resolve CI failure (lint)
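The atomic increment fix for `session.errors` listed above can be illustrated with the standard-library `sqlite3` module. This is a sketch only: the table layout and the `record_error` helper are hypothetical, and the project's own storage layer is not shown in these notes.

```python
import sqlite3

# Hypothetical schema for illustration; the real sessions table is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, errors INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO sessions (id) VALUES (?)", ("s1",))


def record_error(conn: sqlite3.Connection, session_id: str) -> None:
    """Increment the error counter inside the database in one statement.

    A read-modify-write in application code (SELECT, add 1, UPDATE) can lose
    updates when two writers interleave; `errors = errors + 1` avoids that.
    """
    conn.execute(
        "UPDATE sessions SET errors = errors + 1 WHERE id = ?",
        (session_id,),
    )
    conn.commit()


record_error(conn, "s1")
record_error(conn, "s1")
count = conn.execute(
    "SELECT errors FROM sessions WHERE id = ?", ("s1",)
).fetchone()[0]
```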
📦 Other
Dependencies
- Bump typescript from 5.9.3 to 6.0.2 in /frontend
- Bump vitest from 4.1.1 to 4.1.2 in /frontend
Documentation
- Add polished UI screenshots for README
- Add updated interactive course for Peaky Peek
- Add GitHub icon/link to course header
- Update course description with key features
- Update README screenshots and title
- Add scientific papers section and fix safety screenshot
- Remove Safety Audit Trail section
- Add research-backed action plan and paper analysis
Refactoring
- Deslop — trim verbose docstrings and remove duplicated formatTimestamp
- Derive version from installed package metadata
- Add 174 collector tests, extract UI components, fix test regressions
Testing
- Enhance auto-patch adapter tests — verify event data fields and missing paths
Style
- Polish UI for more professional appearance
v0.1.10
✨ Features
UI
- Distill UI — restructure tabs, unify color tokens, add panel hierarchy
🐛 Bug Fixes
Seed data
- Enrich seed data — add tokens, costs, retention, alerts, fix error semantics
- Update dev proxy port and add missing db commits in seed script
CI
- Update benchmark tests for explicit error events
- Auto-fix ruff import ordering and unused imports
Config
- Make dev proxy port configurable via API_PORT env var
📦 Other
Tests
- Add regression tests for 6 discovered issues
Chores
- Bump version to 0.1.9
- Bump version to 0.1.10
v0.1.9
✨ Features
- Pre-launch polish — simplified API, Docker, README, adapter improvements
🐛 Bug Fixes
- Resolve open issues #107-#113 — simplify complexity, add logging, extract imports
- Resolve drift alert bugs and add comprehensive test coverage
- Use parameterized query in _repair_legacy_sqlite_schema
- Suppress bandit B608 false positive in analytics_db
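For context on the parameterized-query fix above, the sketch below shows the general technique with the standard-library `sqlite3` module. The `table_exists` helper and the `events` table are hypothetical; the actual query inside `_repair_legacy_sqlite_schema` is not shown in these notes.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, table_name: str) -> bool:
    """Check for a table using a bound parameter instead of string formatting."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


# Hypothetical table, created only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
```

Because the value is bound rather than interpolated, an input such as `events' OR '1'='1` is compared as a literal name and matches nothing.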
📦 Other
Refactoring
- Deslop SDK, server, and test suite — remove ~450 lines of duplication and dead code
- Decompose intelligence module and expand unit test coverage
- Complete test split — remove monolith, fix lint, update doc refs
- Deslop frontend types, emitter, and config — deduplicate shared utilities
Chores
- Bump SDK version and test to 0.1.8
- Untrack .omc/state from version control