Releases: acailic/agent_debugger

v0.1.18

07 Apr 10:39

✨ Features

L3 Replay

  • Implement L3 replay primitives — RestoreHook, DriftDetector, schema extensions

General

  • Close gaps on 4 partially complete features
  • Add per-feature README GIFs, harness improvements, and CI gates

🐛 Bug Fixes

Frontend

  • Revert eslint to v9 — react-hooks plugin incompatible with v10

CI

  • Add continue-on-error for codecov and deprecated dependency-review action
  • Resolve CI lint errors in harness scripts

Tests

  • Use UTC dates in analytics DB tests to match repository implementation
  • Update test expectation for checkpoint_timestamp in restored session config

Review

  • Address all review comments on PR #140
  • Address remaining P1/P2 review feedback on restore and event filtering
  • Resolve Pyright type warnings in new feature code

Demo

  • Re-record demo GIFs with correct feature navigation and slower pacing

📦 Other

Dependencies

  • Bump vite from 8.0.3 to 8.0.5
  • Bump eslint from 9.39.4 to 10.2.0
  • Bump globals from 16.5.0 to 17.4.0
  • Bump codecov/codecov-action from 4 to 6

v0.1.17

04 Apr 01:49

✨ Features

  • Alert lifecycle management, configurable policies, and query performance indexes (#135)
  • Add harness improvements — pre-commit hooks, pyright, coverage gate, CI gates, and Claude Code hooks
  • Add cost dashboard + trace search demo GIF

🐛 Bug Fixes

  • Resolve lint errors and clean up coverage config
  • Eliminate N+1 queries in _load_candidate_sessions event_type filter (#132)
  • Drop .distinct() from event_type subquery, add tests
  • Remove unused sqlalchemy.exists import in storage/search.py
  • Address code review findings — critical, major, and minor issues
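
The N+1 fix (#132) and the dropped .distinct() can be illustrated with a small sketch. It is not the project's code: the table layout and function names are hypothetical, and it uses the standard-library sqlite3 module instead of the project's SQLAlchemy models. The first function issues one extra query per session; the second filters with a single IN subquery. IN only tests membership, so DISTINCT in the subquery would not change the result.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sessions (id TEXT PRIMARY KEY);
        CREATE TABLE events (session_id TEXT, event_type TEXT);
        INSERT INTO sessions VALUES ('s1'), ('s2'), ('s3');
        INSERT INTO events VALUES
            ('s1', 'error'), ('s1', 'error'),
            ('s2', 'tool_call'), ('s3', 'error');
        """
    )
    return conn


def sessions_n_plus_one(conn: sqlite3.Connection, event_type: str) -> list:
    # Anti-pattern: one query for the sessions, then one more per session.
    session_ids = [row[0] for row in conn.execute("SELECT id FROM sessions ORDER BY id")]
    matches = []
    for session_id in session_ids:
        row = conn.execute(
            "SELECT 1 FROM events WHERE session_id = ? AND event_type = ? LIMIT 1",
            (session_id, event_type),
        ).fetchone()
        if row is not None:
            matches.append(session_id)
    return matches


def sessions_single_query(conn: sqlite3.Connection, event_type: str) -> list:
    # One round trip: the subquery needs no DISTINCT because IN ignores duplicates.
    rows = conn.execute(
        """
        SELECT id FROM sessions
        WHERE id IN (SELECT session_id FROM events WHERE event_type = ?)
        ORDER BY id
        """,
        (event_type,),
    )
    return [row[0] for row in rows]
```

Both functions return the same session ids; only the number of queries differs.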

📦 Other

Documentation

  • Split architecture into abstract overview + detailed layer diagram
  • Upgrade README architecture diagram with professional styling
  • Update README to reference regenerated demo GIFs and screenshots
  • Regenerate demo GIFs and screenshots with polished UI

v0.1.16

03 Apr 20:59

Features

CLI

  • Add peaky-peek demo command for zero-friction onboarding (seed + launch server + frontend)
  • Add peaky-peek seed and peaky-peek serve subcommands

Frontend

  • Update WhyButton to use diagnostic API
  • Refactor DecisionTree recommendation to useMemo
  • Refine replay UX and similar failure search
  • Add failure analysis, entity insights, and Hindsight export

SDK

  • Improve thread safety, security, and API robustness
  • Harden SDK with typed schemas and frontend resilience
  • Fix tenant isolation and test harness leaks
  • Prevent HttpTransport creation without API key

Benchmarks

  • Add benchmark seed data enrichment
  • Stabilize benchmark test timeouts in CI

Testing

  • Restore and fix failure_memory tests
  • Stabilize LangChain test suite
  • Improve restore flow and seed data validation

Bug Fixes

  • Fix session update and collector regressions
  • Fix export, drift contract, and request deduping
  • Fix contract test import order for CI
  • Multiple CI stability fixes (lint, test isolation, timeouts)

v0.1.15

02 Apr 17:51

🐛 Bug Fixes

  • Fix HTTP transport not being wired up in local-first mode without API key — SDK silently dropped all events instead of sending them to the server
  • Disable SDK in benchmark tests to prevent network retries

Full Changelog: v0.1.14...v0.1.15

v0.1.14

02 Apr 15:06

What's Changed

Stability & Correctness (Phases 1-2)

  • Fix SDK connection leak in shared async client auto-cleanup
  • Fix unhandled checkpoint restore errors leaking raw HTTP errors
  • Fix SSE disconnect memory leak with CancelledError handling
  • Fix duplicate SessionStore interface in frontend
  • Add silent failure detection in instrument() auto-patch
  • Replace print() with logger in auto-patch transport
  • Add proper return types for get_current_context()
  • Fix N+1 search pattern loading 500 sessions with all events
  • Add rate limiting on API key creation
  • Fix buffer silently dropping data without notifying the caller
  • Fix fragile replay collapse_threshold default unwrapping
  • Fix dangerouslySetInnerHTML XSS risk in ToolInspector
  • Remove unused useReplayBreakpoint.ts hook
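
The SSE disconnect item above names CancelledError handling; the actual fix is not shown in these notes. A minimal sketch of the general technique, assuming a hypothetical per-client queue registry (`subscribers`) and handler (`stream_events`), could look like this: the handler deregisters its queue in a `finally` block, so a cancelled stream does not leave a queue behind.

```python
import asyncio

# Hypothetical registry of per-client queues; not the project's data structure.
subscribers: set = set()


async def stream_events(out: list) -> None:
    queue: asyncio.Queue = asyncio.Queue()
    subscribers.add(queue)
    try:
        while True:
            out.append(await queue.get())
    except asyncio.CancelledError:
        # The client went away. Re-raise so cancellation propagates normally.
        raise
    finally:
        # Runs on cancellation and on any other exit, so the queue is never leaked.
        subscribers.discard(queue)


async def demo() -> tuple:
    received: list = []
    task = asyncio.create_task(stream_events(received))
    await asyncio.sleep(0.01)   # let the handler register its queue
    for queue in subscribers:
        queue.put_nowait("event-1")
    await asyncio.sleep(0.01)   # let the handler consume the event
    task.cancel()               # simulate the client disconnecting
    try:
        await task
    except asyncio.CancelledError:
        pass
    return received, len(subscribers)
```

Running `asyncio.run(demo())` delivers one event and leaves the registry empty after the simulated disconnect.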

Quality & Maintainability (Phases 3-4)

  • Encapsulate global mutable state in auto-patch transport
  • Expose retry config via Config instead of hardcoded values
  • Use @abstractmethod in RecordingMixin
  • Add DB pool_timeout/pool_recycle for non-SQLite engines
  • Add transaction boundaries in trace_routes
  • Replace fragile querySelector with stable getElementById
  • Improve accessibility (role="meter" replacement, validation cleanup)
  • Fix auto-patch side effects at import time
  • Improve Config validation flow
  • Add CONTRIBUTING.md and update ARCHITECTURE.md

Tests

  • Add 32 tests for storage/embedding (tokenize, vectors, similarity)
  • Add 25 tests for storage/search (sessions, events, tenant isolation)
  • Add tests for trace_routes rollback/commit behavior
  • Add tests for engine factory pool settings
  • Add tests for comparison_routes and trace_routes
  • Add dedicated SessionManager tests

Refactoring

  • Remove dead ranking module and unused feature tests
  • Deep quality refinement across SDK, backend, and frontend
  • Round 2 refinement — dead code removal, component improvements
  • Replace falsy-or patterns with explicit None checks
  • Improve DX — tooltips, validation, prescriptive hints, keyboard shortcuts
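
The falsy-or item above refers to a common Python pitfall. A minimal sketch, with a hypothetical setting name and default value: `value or default` discards legitimate falsy values such as 0, while an explicit `is not None` check falls back only when the value is missing.

```python
from typing import Optional

DEFAULT_THRESHOLD = 5  # hypothetical default, for illustration only


def threshold_falsy_or(value: Optional[int]) -> int:
    # Bug-prone: 0 is falsy, so an explicit 0 is replaced by the default.
    return value or DEFAULT_THRESHOLD


def threshold_none_check(value: Optional[int]) -> int:
    # Only a missing value (None) falls back to the default.
    return value if value is not None else DEFAULT_THRESHOLD
```

With these definitions, `threshold_falsy_or(0)` returns 5 while `threshold_none_check(0)` returns 0; both return 5 for `None`.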

Full Changelog: v0.1.12...v0.1.14

v0.1.13

31 Mar 23:12

✨ Features

Developer Experience

  • Add tooltips explaining technical terms (replay_value, drift, policy_shift, ranking factors)
  • Add keyboard shortcuts: Ctrl+1/2/3 for tabs, Ctrl+K or / for search focus
  • Add prescriptive action hints in FailureClusterPanel mapped to error types
  • Add drift direction indicator and remediation guidance in DriftAlertsPanel
  • Add cost efficiency hint when per-token cost is notably high
  • Add ranking factor breakdown labels in EventDetail
  • Add skeleton loading animation to CostPanel
  • Add config validation (endpoint URL, max_payload_kb, sample_rate)
  • Expand boolean parsing to handle 1/0, yes/no, on/off
  • Add actionable HTTP error messages (401/403/404/429)
  • Add auto-patch console feedback for CLI visibility
  • Enhance error messages with actionable suggestions
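
The expanded boolean parsing listed above could be sketched as follows. The function name, the case-insensitive matching, the whitespace stripping, and the error on unknown input are assumptions made for illustration; the release notes only state that 1/0, yes/no, and on/off are now accepted.

```python
# Accepted spellings; "true"/"false" are assumed to have been supported already.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Parse a boolean-like string such as an environment variable value."""
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    # Failing loudly avoids silently treating a typo as False.
    raise ValueError(f"not a boolean value: {raw!r}")
```

Raising on unrecognized input is one possible design choice; a real implementation might instead fall back to a default.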

Frontend

  • Convert LLMViewer and MultiAgentCoordinationPanel empty states to EmptyState component

📦 Other

Refactoring

  • Remove dead code and simplify across backend, SDK, and frontend
  • Remove dead ranking module and unused feature tests
  • Extract severity weights to module-level constant

v0.1.12

31 Mar 22:14

✨ Features

  • Add Failure Clustering, Multi-Agent Coordination panels and fix Inspect tab layout
  • Improve UI presentation — collapsible sections, clickable checkpoints, tree polish
  • Professional UI polish for Inspect tab and DecisionTree
  • UX polish for Trace tab, Inspect tab, and cross-tab design system

🐛 Bug Fixes

  • Make Inspect tab DecisionTree full-width and auto-height
  • Resolve CI failures (imports) — ensure collector submodules accessible, avoid circular imports on Python 3.10

📦 Other

Documentation

  • Replace feature screenshots with per-feature GIFs in README
  • Re-record feature GIFs with updated UI

Refactoring

  • Deslop codebase — remove dead code, consolidate duplicates, fix DDD boundaries, polish UI layout
  • Deduplicate _event_value across collector modules

Performance

  • Optimize performance across backend, frontend, and SDK

Testing

  • Add 94 tests for untested high-risk modules

v0.1.11

31 Mar 15:01

✨ Features

  • Implement medium-term features — adaptive replay, exploration, conversation, live dashboard
  • Implement 5 research-backed quick wins

🐛 Bug Fixes

  • Auto-increment session.errors when error events are added
  • Use atomic SQL increment for session.errors in add_event
  • Polish 6 issues for HN launch — drift threshold, seed data, benchmark calls
  • Resolve CI failure (version) — sync expected version to 0.1.10
  • Resolve CI failure (version) — sync hardcoded version in SDK init.py
  • Resolve CI failure (lint) — break long line in seed_data.py
  • Resolve CI failure (lint)
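
The atomic increment for session.errors mentioned above can be illustrated with a short sketch. The schema and helper names are hypothetical, and the example uses the standard-library sqlite3 module rather than the project's storage layer. The point is that the database computes `errors + 1` inside a single UPDATE, instead of the application reading the count, adding one, and writing it back.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (id TEXT PRIMARY KEY, errors INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO sessions (id) VALUES ('s1')")
    conn.commit()
    return conn


def record_error(conn: sqlite3.Connection, session_id: str) -> None:
    # Single statement: no read-modify-write window for concurrent writers.
    conn.execute(
        "UPDATE sessions SET errors = errors + 1 WHERE id = ?",
        (session_id,),
    )
    conn.commit()


def error_count(conn: sqlite3.Connection, session_id: str) -> int:
    row = conn.execute(
        "SELECT errors FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return row[0]
```

Calling `record_error` three times for the same session leaves its count at 3, regardless of what other writers read in between.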

📦 Other

Dependencies

  • Bump typescript from 5.9.3 to 6.0.2 in /frontend
  • Bump vitest from 4.1.1 to 4.1.2 in /frontend

Documentation

  • Add polished UI screenshots for README
  • Add updated interactive course for Peaky Peek
  • Add GitHub icon/link to course header
  • Update course description with key features
  • Update README screenshots and title
  • Add scientific papers section and fix safety screenshot
  • Remove Safety Audit Trail section
  • Add research-backed action plan and paper analysis

Refactoring

  • Deslop — trim verbose docstrings and remove duplicated formatTimestamp
  • Derive version from installed package metadata
  • Add 174 collector tests, extract UI components, fix test regressions

Testing

  • Enhance auto-patch adapter tests — verify event data fields and missing paths

Style

  • Polish UI for more professional appearance

v0.1.10

30 Mar 14:29

✨ Features

UI

  • Distill UI — restructure tabs, unify color tokens, add panel hierarchy

🐛 Bug Fixes

Seed data

  • Enrich seed data — add tokens, costs, retention, alerts, fix error semantics
  • Update dev proxy port and add missing db commits in seed script

CI

  • Update benchmark tests for explicit error events
  • Auto-fix ruff import ordering and unused imports

Config

  • Make dev proxy port configurable via API_PORT env var

📦 Other

Tests

  • Add regression tests for 6 discovered issues

Chores

  • Bump version to 0.1.9
  • Bump version to 0.1.10

v0.1.9

30 Mar 11:50

✨ Features

  • Pre-launch polish — simplified API, Docker, README, adapter improvements

🐛 Bug Fixes

  • Resolve open issues #107-#113 — simplify complexity, add logging, extract imports
  • Resolve drift alert bugs and add comprehensive test coverage
  • Use parameterized query in _repair_legacy_sqlite_schema
  • Suppress bandit B608 false positive in analytics_db

📦 Other

Refactoring

  • Deslop SDK, server, and test suite — remove ~450 lines of duplication and dead code
  • Decompose intelligence module and expand unit test coverage
  • Complete test split — remove monolith, fix lint, update doc refs
  • Deslop frontend types, emitter, and config — deduplicate shared utilities

Chores

  • Bump SDK version and test to 0.1.8
  • Untrack .omc/state from version control