feat: GitHub issue intelligence pipeline and dashboard by sophiecarreras · Pull Request #11 · backblaze-b2-samples/vibe-coding-starter-kit

sophiecarreras · 2026-05-29T17:14:11Z

Implements backblaze-labs/demand-side-ai#176

What

Transforms the starter kit into a complete B2 Issue Intelligence sample app: a pipeline ingests GitHub issues, embeds and clusters them, classifies each with an LLM, and surfaces a dashboard showing backlog themes, category distribution, activity over time, and spec quality — all backed by Backblaze B2.

Why

Demonstrates B2 as a multi-role data store in a real AI/data pipeline: raw data lake (issue snapshots), derived-artifact store (embeddings, classifications, clusters), historical archive (append-only runs), and dashboard backend (report payloads). More compelling than a file-upload demo because artifacts accumulate naturally over time and the intelligence has real operational value.
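To make the multi-role idea concrete, here is a minimal sketch of how one append-only run could map each role to its own object key. The prefix, file names, and extensions are hypothetical placeholders, not the layout documented in docs/features/storage-layout.md; the only point illustrated is that every run writes under a fresh snapshot prefix and never overwrites an earlier one.

```python
from datetime import datetime, timezone
from typing import Optional


def new_snapshot_id(now: Optional[datetime] = None) -> str:
    """Build a sortable, timestamp-based snapshot id (hypothetical format)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%dT%H%M%SZ")


def snapshot_keys(snapshot_id: str, prefix: str = "snapshots") -> dict:
    """Map each storage role to an object key under one snapshot prefix.

    All key names below are illustrative assumptions.
    """
    base = f"{prefix}/{snapshot_id}"
    return {
        "raw_issues": f"{base}/raw/issues.json",                   # raw data lake
        "embeddings": f"{base}/derived/embeddings.parquet",        # derived artifact
        "classifications": f"{base}/derived/classifications.json", # derived artifact
        "clusters": f"{base}/derived/clusters.json",               # derived artifact
        "report": f"{base}/report.json",                           # dashboard payload
    }
```

Because the snapshot id is part of every key, the historical archive falls out of the naming scheme: listing the prefix returns one entry per past run.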

Changes

Backend — intelligence pipeline

services/api/app/types/ — Issue, ClassificationResult, Cluster, SnapshotReport Pydantic models
services/api/app/config/intelligence.py — pipeline settings (repo, model, cost rates)
services/api/app/repo/github_issues.py — GitHub REST adapter with pagination and rate-limit backoff
services/api/app/repo/embedding_client.py — local embeddings via sentence-transformers/all-MiniLM-L6-v2 (no API key needed)
services/api/app/repo/llm_client.py — generic call_llm(system, user) wrapping Anthropic with retry
services/api/app/repo/intelligence_storage.py — all B2 read/write for raw + derived snapshot artifacts
services/api/app/service/ — pipeline stages: ingestion, embeddings, classification, clustering, analysis, snapshots
services/api/app/service/prompts/ — versioned LLM prompt templates for classification and cluster labeling
services/api/app/runtime/routes_intelligence.py — REST API (POST /snapshots, GET /snapshots, reports, issues, clusters, activity)
services/api/app/runtime/cli_intelligence.py — CLI entry points for ingest, reprocess, list, show
services/api/main.py — registers intelligence router
services/api/requirements.txt — adds anthropic, sentence-transformers, scikit-learn, numpy, pyarrow
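The pagination and rate-limit backoff in the GitHub adapter could look roughly like the sketch below. It is not the code in github_issues.py: `RateLimited`, `fetch_all_pages`, and the injected `fetch_page` callable are hypothetical names, and the HTTP call itself is left to the caller so the retry logic can be exercised without network access.

```python
import time
from typing import Callable, Iterator, List, Optional


class RateLimited(Exception):
    """Raised by the (hypothetical) page fetcher when the API asks us to slow down."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def fetch_all_pages(
    fetch_page: Callable[[int], List],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator:
    """Yield items from consecutive pages until an empty page is returned.

    On RateLimited, wait for the server-provided delay if there is one,
    otherwise back off exponentially (base_delay * 2**attempt), then retry
    the same page. Gives up after max_retries retries on a single page.
    """
    page = 1
    while True:
        attempt = 0
        while True:
            try:
                items = fetch_page(page)
                break
            except RateLimited as exc:
                if attempt >= max_retries:
                    raise
                if exc.retry_after is not None:
                    delay = exc.retry_after
                else:
                    delay = base_delay * 2 ** attempt
                sleep(delay)
                attempt += 1
        if not items:
            return
        yield from items
        page += 1
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the requested delays instead of actually waiting.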

Frontend — intelligence dashboard

apps/web/src/components/intelligence/ — 10 components: ClusterGrid, ClusterCard, CategoryBreakdown, ActivityTimeline, B2RoleDistribution, SpecDepthHistogram, IssueRow, IssueDetailPanel, SnapshotPicker, RunSnapshotButton
apps/web/src/app/intelligence/ — routes: overview, snapshot list, snapshot detail, issue list, cluster drill-down
apps/web/src/app/page.tsx — root redirects to intelligence overview (per AGENTS.md §2)
apps/web/src/lib/api-client.ts — intelligence API functions
apps/web/src/lib/queries.ts — TanStack Query hooks with 3s polling for running snapshots
packages/shared/src/types.ts — shared TypeScript types mirroring Pydantic models
apps/web/src/components/layout/app-sidebar.tsx — Intelligence and Snapshots nav entries

Docs

docs/features/intelligence.md — pipeline stages, cost model with worked example (~$0.19 for 169 issues), source adapter interface
docs/features/intelligence-dashboard.md — component catalog, data hooks, UX states, privacy rules
docs/features/storage-layout.md — B2 append-only snapshot layout with design rationale
docs/features/dashboard.md — updated to note root dashboard replaced by intelligence overview
docs/RUNBOOK.md — operational guide: rate limits, LLM parse failures, snapshot cleanup, cluster re-labeling
docs/exec-plans/intelligence-v0.md — execution plan and acceptance criteria
AGENTS.md — §2a documents app-specific rules for agents
ARCHITECTURE.md, README.md — updated for this app
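As a rough illustration of the kind of arithmetic a cost model like the one in docs/features/intelligence.md involves, the sketch below prices one run from token counts. The function name and its rate parameters are assumptions, and no real token counts or model prices are implied; only the 169-issue, ~$0.19 figures come from this PR. Embeddings run locally, so they add no API cost.

```python
def estimate_llm_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Return the LLM cost in USD for the given token counts.

    Rates are expressed per million tokens and must be supplied by the
    caller (for example from pipeline settings); none are hard-coded here.
    """
    return (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000


# Average cost per issue implied by the run reported in this PR.
reported_total_usd = 0.19
reported_issue_count = 169
per_issue_usd = reported_total_usd / reported_issue_count  # roughly $0.0011
```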

Verification

# All checks pass
pnpm lint && pnpm lint:api && pnpm test:api && pnpm check:structure

# Run pipeline end-to-end (requires ANTHROPIC_API_KEY + GITHUB_TOKEN in .env)
pnpm intel:ingest
# → 169 issues fetched, 11 clusters formed, ~$0.19 total cost

# List snapshots
pnpm intel:list

# Start the app and view dashboard
pnpm dev  # → http://localhost:3000

Stack: Next.js 16 · FastAPI · sentence-transformers (local embeddings) · Anthropic claude-haiku-4-5 · HDBSCAN · pyarrow · Backblaze B2

Adds the demand-side-ai issue intelligence app on top of the starter kit:
- Backend pipeline: GitHub fetch -> OpenAI embeddings -> Anthropic classification -> HDBSCAN clustering -> SnapshotReport in B2
- Frontend: intelligence dashboard, snapshot list, issue list, cluster drill-down
- CLI: intel:ingest, intel:reprocess, intel:list, intel:show
- Docs: intelligence.md, intelligence-dashboard.md, storage-layout.md, RUNBOOK.md, exec plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…M JSON parsing
- Replace OpenAI embeddings with sentence-transformers (all-MiniLM-L6-v2): eliminates the OpenAI dependency; embeddings run locally at zero API cost
- Fix config env_file path: use Path(__file__).parents[4] so .env is found correctly regardless of working directory when the CLI runs
- Add extra="ignore" to Settings to tolerate intelligence-only keys
- Remove model_dump_json(default=str); Pydantic v2 handles datetime natively
- Update default LLM model to claude-haiku-4-5-20251001
- Extract JSON from LLM responses before parsing to handle markdown code fences that the new model wraps around structured output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
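The fence-stripping step described in the second commit can be sketched as follows. This is one plausible approach, not the code in this PR: `extract_json` is a hypothetical helper name, and the fallback to the outermost braces is an added assumption for responses that mix prose with JSON.

```python
import json
import re

# Build the three-backtick fence marker programmatically so this example
# can itself sit inside a fenced block.
FENCE = "`" * 3
_FENCED = re.compile(
    FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_json(text: str):
    """Parse JSON from an LLM response that may wrap it in a code fence.

    Order of attempts:
      1. the contents of the first fenced block, if any, else the raw text;
      2. the substring between the first "{" and the last "}".
    Raises ValueError if nothing parses.
    """
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass

    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM response")
    # json.JSONDecodeError is a ValueError subclass, so callers can catch
    # a single exception type for every failure mode.
    return json.loads(candidate[start : end + 1])
```

A caller would typically pass the raw model text through this helper before validating the result against the relevant Pydantic model, treating ValueError as a parse failure to log or retry.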

Sophie Carreras and others added 2 commits May 28, 2026 15:37

sophiecarreras wants to merge 2 commits into main from feat/176-issue-intelligence-pipeline
