Jefreesujit · Apr 14, 2026 · Apr 10, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,36 @@
+name: CI
+
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Use Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version-file: "package.json"
+          cache: "npm"
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Lint
+        run: npm run lint
+
+      - name: Typecheck
+        run: npm run typecheck
+
+      - name: Test
+        run: npm run test
+
+      - name: Build
+        run: npm run build
diff --git a/AGENTS.md b/AGENTS.md
@@ -3,34 +3,40 @@
 This file provides comprehensive context for AI coding assistants (like Antigravity) to understand and work with the **Browser LLM Chat** codebase.
 
 ## 🚀 Project Overview
+
 - **Name**: Browser LLM Chat
 - **Primary Goal**: Fully local, privacy-first AI chat application running models via WebGPU.
 - **Tech Stack**: React 19, Vite, TypeScript, Vanilla CSS.
 - **Core Library**: `@huggingface/transformers` (v4.0.0-next.x).
 
 ## 🏗️ Architecture Summary
+
 - **No Backend**: Zero server-side inference. All models run in the browser's shared GPU memory.
 - **Offloading**: Heavy computation is strictly handled in `src/model.worker.ts` to prevent UI thread blocking.
 - **Data Flow**:
   1. `App.tsx` sends messages/images as a `WorkerRequest`.
   2. `model.worker.ts` processes inference using Transformers.js.
   3. `model.worker.ts` sends tokens back as `WorkerResponse`.
-  4. `App.tsx` updates React state for real-time streaming.
+  4. `App.tsx` writes the streamed tokens to the Zustand app store so the UI updates in real time.
 
 ## 📁 Critical Files
+
 - `src/App.tsx`: Main UI logic, message orchestration, and worker management.
+- `src/store/app-store.ts`: Shared app state and state transitions.
 - `src/model.worker.ts`: Worker entry point for Transformers.js inference.
 - `src/models.ts`: Configuration for all supported models and quantization settings.
 - `src/styles.css`: Custom "glassmorphic" theme.
 - `src/types.ts`: Common TypeScript interfaces and enums.
 
 ## ⚠️ Architectural Constraints
+
 - **Local-First**: Do NOT attempt to add backend API calls for inference.
 - **Web Workers**: Expensive logic (image processing, token generation) MUST stay in the worker.
 - **WebGPU Only**: The app target is WebGPU-enabled browsers. Fallback logic is minimal.
 - **VRAM Sensitivity**: Be cautious with large models. Use `q4f16` quantization by default.
 
 ## 🤝 Contribution Workflow
+
 - Ensure all new logic is fully typed.
 - Follow the existing aesthetic: Glassmorphism, CSS variables for colors, and responsive layouts.
 - Maintain the single-page, local-only architecture.
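Note on the AGENTS.md data flow above: the four steps (request out, inference in the worker, tokens back, store update) can be sketched as a pair of message types plus a pure state transition. This is a minimal sketch under stated assumptions. The names `WorkerRequest` and `WorkerResponse` come from the document, but their fields, `StreamState`, `beginRequest`, and `applyWorkerResponse` are hypothetical and are not taken from `src/types.ts` or `src/store/app-store.ts`.

```typescript
// Hypothetical message shapes; the real definitions in src/types.ts may differ.
type WorkerRequest = { type: "generate"; requestId: string; prompt: string };

type WorkerResponse =
  | { type: "token"; requestId: string; token: string }
  | { type: "done"; requestId: string }
  | { type: "error"; requestId: string; message: string };

// Hypothetical slice of shared state that tracks one streaming reply.
interface StreamState {
  activeRequestId: string | null;
  text: string;
  status: "idle" | "streaming" | "error";
  error: string | null;
}

// Step 1: the UI records which request it is waiting on before posting it.
function beginRequest(request: WorkerRequest): StreamState {
  return {
    activeRequestId: request.requestId,
    text: "",
    status: "streaming",
    error: null,
  };
}

// Steps 3 and 4: each worker response is folded into the state.
// Responses for a request that is no longer active are ignored.
function applyWorkerResponse(
  state: StreamState,
  response: WorkerResponse,
): StreamState {
  if (response.requestId !== state.activeRequestId) {
    return state;
  }
  switch (response.type) {
    case "token":
      return { ...state, text: state.text + response.token };
    case "done":
      return { ...state, status: "idle", activeRequestId: null };
    case "error":
      return {
        ...state,
        status: "error",
        activeRequestId: null,
        error: response.message,
      };
  }
}
```

Keeping the transition pure means it can be unit-tested without a real worker, and a store action could call it from the worker's `message` handler.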
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,34 +1,61 @@
-# 🤝 Contributing to Browser LLM Chat
+# Contributing to Browser LLM Chat
 
-We're delighted you're interested in contributing to **Browser LLM Chat**! This project aims to make high-quality AI accessible to everyone via the browser.
+## Setup
 
-## How to Help
+Use Node `22.x`, then install dependencies:
 
-### 🐛 Bug Reports
-If you find a bug, please open an issue with:
-- A clear description of the problem.
-- Your browser and OS version.
-- Steps to reproduce.
+```bash
+npm install
+```
 
-### ✨ Feature Requests
-We'd love to hear your ideas! If you have a feature in mind, feel free to open a discussion or a pull request.
+Run the app locally with:
 
-### 💻 Code Contributions
-1. **Fork** the repository and create your branch from `main`.
-2. **Install** dependencies: `npm install`.
-3. **Run** the development server: `npm run dev`.
-4. **Build** to check for errors: `npm run build`.
-5. **Commit** your changes with clear, descriptive messages.
-6. **Submit** a pull request!
+```bash
+npm run dev
+```
 
----
+## Required Checks
 
-## Technical Standards
-- **Local-First**: We never send user data to a backend.
-- **Modern UI**: Keep changes consistent with our **glassmorphic** theme.
-- **Web Workers**: Always perform expensive tasks in a worker context.
-- **TypeScript**: New code must be fully typed.
+Before opening a pull request, run:
 
----
+```bash
+npm run check
+```
 
-Happy contributing! ❤️
+That command runs linting, type checking, the test suite, and the production build.
+
+## Working Rules
+
+- Keep the app local-first. Do not add backend inference calls.
+- Keep expensive inference, summarization, and model work inside the worker layer.
+- Preserve the current SPA structure. Do not add router-driven page navigation unless explicitly scoped.
+- Keep new code fully typed.
+- Prefer small focused modules and hooks over growing monolith files.
+- Reuse shared helpers for storage, dialog behavior, and model logic instead of duplicating patterns.
+
+## Coding Standards
+
+- ESLint is the source of truth for lint rules.
+- Prettier is the source of truth for formatting.
+- Prefer type-only imports where possible.
+- When refactoring behavior-heavy code, add tests for the extracted pure logic and for any regressions you fix.
+
+## Suggested Workflow
+
+1. Create a branch from `main`.
+2. Make focused changes with clear commit scope.
+3. Run `npm run check`.
+4. Update docs when contributor behavior, scripts, or architecture expectations change.
+5. Open a pull request with a concise summary and validation notes.
+
+## Architecture Boundaries
+
+- `src/App.tsx`: app composition and screen orchestration
+- `src/store/`: shared app state
+- `src/components/`: presentational UI
+- `src/hooks/`: reusable React behavior
+- `src/worker/`: model runtime domains
+- `src/chat-store.ts` and `src/storage.ts`: persistence concerns
+- `src/test/`: shared test coverage for app and worker logic
+
+Keep responsibilities aligned with those boundaries when adding new code.
diff --git a/README.md b/README.md
@@ -1,95 +1,87 @@
-# 🌐 Browser LLM Chat
+# Browser LLM Chat
 
-A high-performance, **100% local-first** React application designed to run Large Language Models directly in your browser. Leveraging **WebGPU** via `@huggingface/transformers`, this experiment brings powerful inference to the client-side with no backend required.
+Browser LLM Chat is a fully local-first React application that runs language and vision models directly in the browser with WebGPU. There is no backend inference layer, no API key requirement, and no server-side chat state.
 
-![Browser LLM Chat Interface](https://img.shields.io/badge/Status-Experimental-orange)
-![WebGPU-Powered](https://img.shields.io/badge/Powered%20By-WebGPU-blue)
-![Local-First](https://img.shields.io/badge/Privacy-Local--First-green)
+## Highlights
 
----
+- Local-first inference with `@huggingface/transformers`
+- Web Worker-based model loading and token streaming
+- Shared app state managed with Zustand
+- Chat history persisted locally in IndexedDB, with a localStorage fallback
+- Curated browser-ready models plus searchable Hugging Face discovery
+- Built-in settings for generation controls and downloaded-model cleanup
 
-## ✨ Key Features
+## Stack
 
-- **🚀 100% Local Inference**: Your prompts, images, and model outputs never leave your browser. Privacy is built-in by design.
-- **⚡ WebGPU Accelerated**: Utilizes your GPU's power for near-native performance on compatible browsers (Chrome/Edge Desktop).
-- **🧠 Advanced Model Support**: Access to specialized browser-friendly models, including:
-  - **Balanced (Gemma 3 1B)**: A balanced desktop default for everyday browser chat.
-  - **Reasoning (DeepSeek R1 1.5B)**: Built-in reasoning capabilities for complex logic via distillation.
-  - **Coding (Qwen 2.5 Coder)**: Compact coding helpers for quick edits and code explanations.
-  - **Vision (Qwen 3.5 Vision)**: Fully multimodal support for image-to-text tasks.
-  - **Fast / Mobile-Safe**: SmolLM2 (360M) and Qwen 2.5 (0.5B) for ultra-quick response times.
-- **⚙️ Generation Parameter Controls**: Fine-grained control over model temperature, top-p, and token limits via an intuitive settings dialog.
-- **💾 Storage & Chat Management**: Automatic Hugging Face caching, clear chat history, and robust data management to delete offline model files directly from the UI.
-- **🧵 Worker-Based Architecture**: Heavy computation happens in a dedicated Web Worker to keep the UI smooth and responsive.
+- React 19
+- Vite
+- TypeScript
+- Vanilla CSS
+- `@huggingface/transformers` `4.0.0-next.x`
 
----
+## Prerequisites
 
-## 🛠 Tech Stack
+- Node.js `22.x`
+- A WebGPU-capable browser
+  (recommended: a recent desktop build of Chrome or Edge)
 
-- **Core**: [React 19](https://react.dev/), Vite, TypeScript
-- **Inference**: [@huggingface/transformers (v4.0.0-next)](https://github.com/huggingface/transformers.js)
-- **Styling**: Vanilla CSS (Custom UI with glassmorphism and modern aesthetics)
-- **Formatting**: `react-markdown` with `remark-gfm` for rich text and reasoning blocks.
+## Getting Started
 
----
+```bash
+npm install
+npm run dev
+```
 
-## 🚀 Getting Started
+Open [http://localhost:5173](http://localhost:5173).
 
-### Prerequisites
+## Scripts
 
-- A browser with **WebGPU support** (Recommended: Chrome 113+ or Edge 113+ on Desktop).
-- [Node.js](https://nodejs.org/) installed.
+- `npm run dev`: start the Vite dev server
+- `npm run build`: typecheck and build the production bundle
+- `npm run preview`: preview the production build locally
+- `npm run lint`: run ESLint with zero warnings allowed
+- `npm run lint:fix`: apply safe ESLint autofixes
+- `npm run format`: format the repo with Prettier
+- `npm run format:check`: verify formatting without rewriting files
+- `npm run typecheck`: run TypeScript without emitting
+- `npm run test`: run the Vitest suite
+- `npm run test:watch`: run Vitest in watch mode
+- `npm run check`: run lint, typecheck, tests, and build
 
-### Installation
+## Architecture Notes
 
-1. **Clone the repository**:
-   ```bash
-   git clone <repository-url>
-   cd web-llm
-   ```
+- `src/App.tsx` is the top-level composition layer for the SPA shell.
+- Shared cross-screen UI, chat, and model state lives in `src/store/app-store.ts`.
+- Heavy inference work stays in `src/model.worker.ts` plus the focused worker helpers in `src/worker/`.
+- Chat persistence is handled locally through `src/chat-store.ts`.
+- Lightweight preferences, storage helpers, and storage feedback live in `src/storage.ts`.
+- Tests live in `src/test/` so contributors have one place to look for coverage.
+- There is intentionally no router or backend inference path.
 
-2. **Install dependencies**:
-   ```bash
-   npm install
-   ```
+## Quality Gates
 
-3. **Run the development server**:
-   ```bash
-   npm run dev
-   ```
+Pull requests are expected to pass:
 
-4. **Open in Browser**: Navigate to `http://localhost:5173`.
+- `npm run lint`
+- `npm run typecheck`
+- `npm run test`
+- `npm run build`
 
----
+GitHub Actions runs the same checks automatically.
 
-## 📝 Notes & Limitations
+## Documentation
 
-- **First Load**: The initial model download (200MB - 900MB depending on the model) may take some time depending on your connection.
-- **VRAM**: Older GPUs with limited VRAM may struggle with the Vision/Thinking models.
-- **Environment**: This is an experimental proof-of-concept.
+- [Architecture overview](docs/architecture.md)
+- [Model details](docs/models.md)
+- [Contributor guide](CONTRIBUTING.md)
+- [Agent context](AGENTS.md)
 
----
+## Limitations
 
-## 🤝 Credits
+- First-time model downloads can be large and slow on constrained networks.
+- Larger models remain sensitive to browser, VRAM, and device class.
+- WebGPU support is required for the supported experience.
 
-Special thanks to the **Hugging Face** team for the amazing [transformers.js](https://huggingface.co/docs/transformers.js/index) library and the open-source community for the quantized ONNX models.
+## License
 
----
-
-Built with ❤️ for the future of free, private AI.
-
----
-
-## 📚 Documentation
-
-- [**🏗️ Architecture & Privacy**](docs/architecture.md): Deep dive into our local-first, WebGPU-powered engine.
-- [**🧠 Model Details**](docs/models.md): Understanding the SmolLM and Qwen configurations.
-- [**🤖 AI Agent Context**](AGENTS.md): Contextual information for AI coding assistants.
-
----
-
-## 🤝 Community & Support
-
-- [**Contributing Guidelines**](CONTRIBUTING.md): How to help improve Browser LLM Chat.
-- [**Code of Conduct**](CODE_OF_CONDUCT.md): Our commitment to a welcoming environment.
-- [**License**](LICENSE): This project is released under the MIT License.
+MIT
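Note on the README highlight about chat history persistence (IndexedDB with a localStorage fallback): one way such a fallback could be chosen is by feature detection at startup. The sketch below is an assumption for illustration only. `StorageBackend`, `StorageEnvironment`, and `pickStorageBackend` are hypothetical names, the in-memory last resort is not mentioned in the source, and the actual logic in `src/chat-store.ts` may work differently.

```typescript
// Hypothetical backend labels; "memory" is an assumed last resort.
type StorageBackend = "indexeddb" | "localstorage" | "memory";

// Minimal view of the global scope: only the presence of each API is checked.
interface StorageEnvironment {
  indexedDB?: unknown;
  localStorage?: unknown;
}

// Prefer IndexedDB, fall back to localStorage, and finally to memory.
function pickStorageBackend(env: StorageEnvironment): StorageBackend {
  if (env.indexedDB != null) {
    return "indexeddb";
  }
  if (env.localStorage != null) {
    return "localstorage";
  }
  return "memory";
}
```

Passing the environment in as a parameter, rather than reading globals directly, keeps the decision testable outside a browser.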
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -5,16 +5,21 @@
 ## Core Components
 
 ### 1. **React UI (Vite + TypeScript)**
-The user interface is built with **React 19** and **Vite**, focusing on a clean, responsive, and "glassmorphic" aesthetic. It manages state, chat history, and model selection.
+
+The user interface is built with **React 19** and **Vite**, focusing on a clean, responsive, and "glassmorphic" aesthetic. Shared app state is centralized in a lightweight **Zustand** store, while component-only draft state stays in the components that own it.
 
 ### 2. **WebGPU Acceleration**
+
 The application uses the **WebGPU API** to leverage the user's graphics hardware for model inference. This provides near-native performance for transformer-based models by utilizing the parallel processing power of modern GPUs.
 
 ### 3. **Web Worker Threading**
-To ensure a smooth UI experience, all heavy lifting (model loading, processing, and inference) is offloaded to a **dedicated Web Worker** (`model.worker.ts`). Communication between the UI and the worker happens asynchronously via the `postMessage` API.
+
+To ensure a smooth UI experience, all heavy lifting (model loading, processing, summarization, and inference) is offloaded to a **dedicated Web Worker** (`model.worker.ts`). Communication between the UI and the worker happens asynchronously via the `postMessage` API. The worker is kept intentionally coarse-grained: the entry file handles message routing, while a small set of modules in `src/worker/` owns model session state, conversation budgeting, and generation logic.
 
 ### 4. **Transformers.js (v4.0.0-next)**
+
 We use the `@huggingface/transformers` library to handle:
+
 - **ONNX Model Loading**: Loading quantized model weights.
 - **Tokenization**: Converting text to numerical input.
 - **Inference**: Running the model and streaming output tokens.
@@ -24,18 +29,35 @@ We use the `@huggingface/transformers` library to handle:
 ## 🔒 Security & Privacy
 
 ### **100% Local-First**
+
 - **No Data Leakage**: Your prompts, images, and model outputs never leave your machine. There is no backend telemetry or logging of your conversations.
 - **Offline Capable**: Once the model weights are downloaded into the browser cache, the application can run fully offline.
 
 ### **Model Provenance**
+
 - Models are fetched directly from the [Hugging Face Hub](https://huggingface.co/models). We use official and community-quantized versions of reputable models (SmolLM, Qwen).
 
 ### **Safe Model Execution**
+
 - The models run within the browser's sandboxed environment. They cannot access your local file system (except through explicit user-provided file uploads) or other browser data.
 
 ---
 
 ## 💡 Local Inference Benefits
+
 - **Zero Latency**: No network round-trips for inference.
 - **Privacy By Design**: Ideal for sensitive or personal queries.
 - **Cost Effective**: No expensive GPU server hosting required.
+
+---
+
+## 📁 Project Layout
+
+- `src/App.tsx`: SPA shell composition and screen orchestration
+- `src/store/app-store.ts`: shared app state and actions
+- `src/components/`: visible UI sections and dialogs
+- `src/hooks/`: reusable React behaviors that are shared across screens
+- `src/chat-store.ts`: durable chat persistence and legacy thread migration
+- `src/storage.ts`: lightweight browser state and storage feedback helpers
+- `src/model.worker.ts` + `src/worker/`: inference runtime
+- `src/test/`: centralized app and worker tests
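Note on "conversation budgeting" in docs/architecture.md: the diff names this responsibility but does not describe the algorithm. As a rough illustration only, a budget could be enforced by keeping system messages and then the most recent turns that fit. Everything here is hypothetical: `ChatMessage`, `estimateTokens` (a crude four-characters-per-token heuristic rather than a real tokenizer), and `fitToBudget` are not taken from `src/worker/`, and the project may summarize older turns instead of dropping them.

```typescript
// Hypothetical message shape for illustration.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude estimate (about 4 characters per token); a real tokenizer would differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep all system messages, then the newest turns that still fit the budget.
function fitToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const turns = messages.filter((m) => m.role !== "system");

  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: ChatMessage[] = [];

  // Walk from newest to oldest and stop at the first turn that does not fit.
  for (const message of [...turns].reverse()) {
    const cost = estimateTokens(message.content);
    if (used + cost > maxTokens) {
      break;
    }
    used += cost;
    kept.unshift(message);
  }

  return [...system, ...kept];
}
```

Stopping at the first turn that does not fit, instead of skipping it and continuing, keeps the retained history contiguous.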