feat: GitHub issue intelligence pipeline and dashboard #11

Draft
sophiecarreras wants to merge 2 commits into main from feat/176-issue-intelligence-pipeline

@sophiecarreras
Contributor

Implements backblaze-labs/demand-side-ai#176

What

Transforms the starter kit into a complete B2 Issue Intelligence sample app: a pipeline ingests GitHub issues, embeds and clusters them, classifies each with an LLM, and surfaces a dashboard showing backlog themes, category distribution, activity over time, and spec quality — all backed by Backblaze B2.

Why

Demonstrates B2 as a multi-role data store in a real AI/data pipeline: raw data lake (issue snapshots), derived-artifact store (embeddings, classifications, clusters), historical archive (append-only runs), and dashboard backend (report payloads). This is more compelling than a file-upload demo because artifacts accumulate naturally over time and the resulting intelligence has real operational value.

Changes

Backend — intelligence pipeline

  • services/api/app/types/ — Issue, ClassificationResult, Cluster, SnapshotReport Pydantic models
  • services/api/app/config/intelligence.py — pipeline settings (repo, model, cost rates)
  • services/api/app/repo/github_issues.py — GitHub REST adapter with pagination and rate-limit backoff
  • services/api/app/repo/embedding_client.py — local embeddings via sentence-transformers/all-MiniLM-L6-v2 (no API key needed)
  • services/api/app/repo/llm_client.py — generic call_llm(system, user) wrapping Anthropic with retry
  • services/api/app/repo/intelligence_storage.py — all B2 read/write for raw + derived snapshot artifacts
  • services/api/app/service/ — pipeline stages: ingestion, embeddings, classification, clustering, analysis, snapshots
  • services/api/app/service/prompts/ — versioned LLM prompt templates for classification and cluster labeling
  • services/api/app/runtime/routes_intelligence.py — REST API (POST /snapshots, GET /snapshots, reports, issues, clusters, activity)
  • services/api/app/runtime/cli_intelligence.py — CLI entry points for ingest, reprocess, list, show
  • services/api/main.py — registers intelligence router
  • services/api/requirements.txt — adds anthropic, sentence-transformers, scikit-learn, numpy, pyarrow
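
The GitHub adapter and the LLM client above both mention retry or backoff behaviour. The following is a minimal sketch of exponential backoff with jitter, not the code in this PR: `RetryableError`, `with_backoff`, and all default values are hypothetical names and numbers chosen for illustration, and the real adapters may decide what is retryable differently (for example by inspecting HTTP status codes or rate-limit headers).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 429 or 5xx)."""


def with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RetryableError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter scales the delay to 50-100% so parallel workers do not
            # all retry at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would wrap a single request, for example `with_backoff(lambda: fetch_page(url))`, where `fetch_page` is likewise a placeholder.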

Frontend — intelligence dashboard

  • apps/web/src/components/intelligence/ — 10 components: ClusterGrid, ClusterCard, CategoryBreakdown, ActivityTimeline, B2RoleDistribution, SpecDepthHistogram, IssueRow, IssueDetailPanel, SnapshotPicker, RunSnapshotButton
  • apps/web/src/app/intelligence/ — routes: overview, snapshot list, snapshot detail, issue list, cluster drill-down
  • apps/web/src/app/page.tsx — root redirects to intelligence overview (per AGENTS.md §2)
  • apps/web/src/lib/api-client.ts — intelligence API functions
  • apps/web/src/lib/queries.ts — TanStack Query hooks with 3s polling for running snapshots
  • packages/shared/src/types.ts — shared TypeScript types mirroring Pydantic models
  • apps/web/src/components/layout/app-sidebar.tsx — Intelligence and Snapshots nav entries

Docs

  • docs/features/intelligence.md — pipeline stages, cost model with worked example (~$0.19 for 169 issues), source adapter interface
  • docs/features/intelligence-dashboard.md — component catalog, data hooks, UX states, privacy rules
  • docs/features/storage-layout.md — B2 append-only snapshot layout with design rationale
  • docs/features/dashboard.md — updated to note root dashboard replaced by intelligence overview
  • docs/RUNBOOK.md — operational guide: rate limits, LLM parse failures, snapshot cleanup, cluster re-labeling
  • docs/exec-plans/intelligence-v0.md — execution plan and acceptance criteria
  • AGENTS.md — §2a documents app-specific rules for agents
  • ARCHITECTURE.md, README.md — updated for this app
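
To make the append-only snapshot layout mentioned above concrete, here is one way such object keys could be built. This is a sketch under assumptions: the `snapshots/` prefix, the timestamp format, and the `raw` / `derived` folder names are hypothetical and are not taken from docs/features/storage-layout.md, which remains the authoritative description.

```python
from datetime import datetime, timezone


def snapshot_prefix(repo: str, started_at: datetime) -> str:
    """Build a per-run prefix so each pipeline run writes to a fresh location."""
    if started_at.tzinfo is None:
        raise ValueError("started_at must be timezone-aware")
    # A UTC timestamp in the key keeps runs sorted lexicographically by time.
    stamp = started_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"snapshots/{repo}/{stamp}/"


def artifact_key(prefix: str, kind: str, name: str) -> str:
    """Return the key for one artifact; kind is 'raw' or 'derived' (hypothetical)."""
    if kind not in ("raw", "derived"):
        raise ValueError(f"unknown artifact kind: {kind}")
    return f"{prefix}{kind}/{name}"


run = snapshot_prefix(
    "backblaze-labs/demand-side-ai",
    datetime(2026, 5, 28, 15, 37, tzinfo=timezone.utc),
)
issues_key = artifact_key(run, "raw", "issues.json")
clusters_key = artifact_key(run, "derived", "clusters.json")
```

Because every run gets its own timestamped prefix, a later run never overwrites an earlier one, which is the property that lets the bucket double as a historical archive.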

Verification

# All checks pass
pnpm lint && pnpm lint:api && pnpm test:api && pnpm check:structure

# Run pipeline end-to-end (requires ANTHROPIC_API_KEY + GITHUB_TOKEN in .env)
pnpm intel:ingest
# → 169 issues fetched, 11 clusters formed, ~$0.19 total cost

# List snapshots
pnpm intel:list

# Start the app and view dashboard
pnpm dev  # → http://localhost:3000

Stack: Next.js 16 · FastAPI · sentence-transformers (local embeddings) · Anthropic claude-haiku-4-5 · HDBSCAN · pyarrow · Backblaze B2

Sophie Carreras and others added 2 commits May 28, 2026 15:37
Adds the demand-side-ai issue intelligence app on top of the starter kit:
- Backend pipeline: GitHub fetch -> OpenAI embeddings -> Anthropic classification -> HDBSCAN clustering -> SnapshotReport in B2
- Frontend: intelligence dashboard, snapshot list, issue list, cluster drill-down
- CLI: intel:ingest, intel:reprocess, intel:list, intel:show
- Docs: intelligence.md, intelligence-dashboard.md, storage-layout.md, RUNBOOK.md, exec plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…M JSON parsing

- Replace OpenAI embeddings with sentence-transformers (all-MiniLM-L6-v2)
  — eliminates OpenAI dependency; embeddings run locally at zero API cost
- Fix config env_file path: use Path(__file__).parents[4] so .env is
  found correctly regardless of working directory when CLI runs
- Add extra="ignore" to Settings to tolerate intelligence-only keys
- Remove model_dump_json(default=str) — Pydantic v2 handles datetime natively
- Update default LLM model to claude-haiku-4-5-20251001
- Extract JSON from LLM responses before parsing to handle markdown
  code fences that the new model wraps around structured output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
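
The last item in the commit above describes extracting JSON from LLM responses that arrive wrapped in markdown code fences. A minimal sketch of that idea follows; `extract_json` is a hypothetical helper written for illustration, and the actual implementation in this PR may differ in both regex and fallback behaviour.

```python
import json
import re
from typing import Any

# Matches a fenced block, with or without a "json" language tag, and
# captures the text between the fences.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def extract_json(text: str) -> Any:
    """Parse JSON from an LLM reply that may be fenced or surrounded by prose."""
    match = _FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate.strip())
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, which covers replies that
        # put a sentence before or after the object.
        start, end = candidate.find("{"), candidate.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(candidate[start : end + 1])
```

A reply with no parseable JSON still raises `json.JSONDecodeError`, so the caller can count it as an LLM parse failure instead of silently storing a bad classification.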