Merged
36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: CI

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  validate:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Use Node.js
        uses: actions/setup-node@v4
        with:
          node-version-file: "package.json"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Typecheck
        run: npm run typecheck

      - name: Test
        run: npm run test

      - name: Build
        run: npm run build
8 changes: 7 additions & 1 deletion AGENTS.md
@@ -3,34 +3,40 @@
This file provides comprehensive context for AI coding assistants (like Antigravity) to understand and work with the **Browser LLM Chat** codebase.

## 🚀 Project Overview

- **Name**: Browser LLM Chat
- **Primary Goal**: Fully local, privacy-first AI chat application running models via WebGPU.
- **Tech Stack**: React 19, Vite, TypeScript, Vanilla CSS.
- **Core Library**: `@huggingface/transformers` (v4.0.0-next.x).

## 🏗️ Architecture Summary

- **No Backend**: Zero server-side inference. All models run in the browser, on the local GPU via WebGPU.
- **Offloading**: Heavy computation is strictly handled in `src/model.worker.ts` to prevent UI thread blocking.
- **Data Flow**:
1. `App.tsx` sends messages/images as a `WorkerRequest`.
2. `model.worker.ts` processes inference using Transformers.js.
3. `model.worker.ts` sends tokens back as `WorkerResponse`.
4. `App.tsx` updates React state for real-time streaming.
4. `App.tsx` updates the Zustand app store for real-time streaming.
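The data flow above can be sketched as a pair of discriminated unions plus a pure reducer on the UI side. This is a minimal, hypothetical sketch: the real `WorkerRequest`/`WorkerResponse` definitions live in `src/types.ts` and may use different fields, and `requestId`, `StreamState`, `createGenerateRequest`, and `applyWorkerResponse` are illustrative names.

```typescript
// Hypothetical message shapes; the real ones in src/types.ts may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

type WorkerRequest =
  | { type: "generate"; requestId: string; messages: ChatMessage[] }
  | { type: "abort"; requestId: string };

type WorkerResponse =
  | { type: "token"; requestId: string; token: string }
  | { type: "done"; requestId: string }
  | { type: "error"; requestId: string; message: string };

// UI-side view of the reply currently being streamed.
interface StreamState {
  requestId: string | null;
  text: string;
  status: "idle" | "streaming" | "done" | "error";
  error?: string;
}

// Build the request the UI would post to the worker.
function createGenerateRequest(requestId: string, messages: ChatMessage[]): WorkerRequest {
  return { type: "generate", requestId, messages };
}

// Fold one worker message into state; messages for a stale request are ignored.
function applyWorkerResponse(state: StreamState, res: WorkerResponse): StreamState {
  if (res.requestId !== state.requestId) return state;
  switch (res.type) {
    case "token":
      return { ...state, text: state.text + res.token, status: "streaming" };
    case "done":
      return { ...state, status: "done" };
    case "error":
      return { ...state, status: "error", error: res.message };
  }
}
```

In this sketch, the worker's `onmessage` handler in the UI would pass `event.data` through `applyWorkerResponse` and write the result into the store, so streaming logic stays testable without a real worker.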

## 📁 Critical Files

- `src/App.tsx`: Main UI logic, message orchestration, and worker management.
- `src/store/app-store.ts`: Shared app state and state transitions.
- `src/model.worker.ts`: Worker entry point for Transformers.js inference.
- `src/models.ts`: Configuration for all supported models and quantization settings.
- `src/styles.css`: Custom "glassmorphic" theme.
- `src/types.ts`: Common TypeScript interfaces and enums.

## ⚠️ Architectural Constraints

- **Local-First**: Do NOT attempt to add backend API calls for inference.
- **Web Workers**: Expensive logic (image processing, token generation) MUST stay in the worker.
- **WebGPU Only**: The app target is WebGPU-enabled browsers. Fallback logic is minimal.
- **VRAM Sensitivity**: Be cautious with large models. Use `q4f16` quantization by default.
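One way to keep `q4f16` as the default is to resolve it in a single place from the model config. This is a hypothetical sketch: the `ModelConfig` shape and the placeholder model IDs are assumptions, not the actual contents of `src/models.ts`. The returned object mirrors the `device` and `dtype` options that Transformers.js accepts when loading a model (for example in `pipeline()`).

```typescript
// Subset of the dtype strings Transformers.js accepts.
type Dtype = "fp32" | "fp16" | "q8" | "q4" | "q4f16";

// Hypothetical model entry; the real fields in src/models.ts may differ.
interface ModelConfig {
  id: string;
  label: string;
  dtype?: Dtype; // per-model override, e.g. for models that need fp16
}

const DEFAULT_DTYPE: Dtype = "q4f16";

// Load options in the shape Transformers.js expects; falls back to q4f16.
function loadOptions(model: ModelConfig): { device: "webgpu"; dtype: Dtype } {
  return { device: "webgpu", dtype: model.dtype ?? DEFAULT_DTYPE };
}
```

Centralizing the fallback means a new model entry gets the VRAM-friendly default unless it opts out explicitly.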

## 🤝 Contribution Workflow

- Ensure all new logic is fully typed.
- Follow the existing aesthetic: Glassmorphism, CSS variables for colors, and responsive layouts.
- Maintain the single-page, local-only architecture.
77 changes: 52 additions & 25 deletions CONTRIBUTING.md
@@ -1,34 +1,61 @@
# 🤝 Contributing to Browser LLM Chat
# Contributing to Browser LLM Chat

We're delighted you're interested in contributing to **Browser LLM Chat**! This project aims to make high-quality AI accessible to everyone via the browser.
## Setup

## How to Help
Use Node `22.x`, then install dependencies:

### 🐛 Bug Reports
If you find a bug, please open an issue with:
- A clear description of the problem.
- Your browser and OS version.
- Steps to reproduce.
```bash
npm install
```

### ✨ Feature Requests
We'd love to hear your ideas! If you have a feature in mind, feel free to open a discussion or a pull request.
Run the app locally with:

### 💻 Code Contributions
1. **Fork** the repository and create your branch from `main`.
2. **Install** dependencies: `npm install`.
3. **Run** the development server: `npm run dev`.
4. **Build** to check for errors: `npm run build`.
5. **Commit** your changes with clear, descriptive messages.
6. **Submit** a pull request!
```bash
npm run dev
```

---
## Required Checks

## Technical Standards
- **Local-First**: We never send user data to a backend.
- **Modern UI**: Keep changes consistent with our **glassmorphic** theme.
- **Web Workers**: Always perform expensive tasks in a worker context.
- **TypeScript**: New code must be fully typed.
Before opening a pull request, run:

---
```bash
npm run check
```

Happy contributing! ❤️
That command validates linting, type safety, tests, and the production build.

## Working Rules

- Keep the app local-first. Do not add backend inference calls.
- Keep expensive inference, summarization, and model work inside the worker layer.
- Preserve the current SPA structure. Do not add router-driven page navigation unless explicitly scoped.
- Keep new code fully typed.
- Prefer small focused modules and hooks over growing monolith files.
- Reuse shared helpers for storage, dialog behavior, and model logic instead of duplicating patterns.

## Coding Standards

- ESLint is the source of truth for lint rules.
- Prettier is the source of truth for formatting.
- Prefer type-only imports where possible.
- Add tests for extracted pure logic and regressions when refactoring behavior-heavy code.

## Suggested Workflow

1. Create a branch from `main`.
2. Make focused changes with clear commit scope.
3. Run `npm run check`.
4. Update docs when contributor behavior, scripts, or architecture expectations change.
5. Open a pull request with a concise summary and validation notes.

## Architecture Boundaries

- `src/App.tsx`: app composition and screen orchestration
- `src/store/`: shared app state
- `src/components/`: presentational UI
- `src/hooks/`: reusable React behavior
- `src/worker/`: model runtime domains
- `src/chat-store.ts` and `src/storage.ts`: persistence concerns
- `src/test/`: shared test coverage for app and worker logic

Keep responsibilities aligned with those boundaries when adding new code.
134 changes: 63 additions & 71 deletions README.md
@@ -1,95 +1,87 @@
# 🌐 Browser LLM Chat
# Browser LLM Chat

A high-performance, **100% local-first** React application designed to run Large Language Models directly in your browser. Leveraging **WebGPU** via `@huggingface/transformers`, this experiment brings powerful inference to the client-side with no backend required.
Browser LLM Chat is a fully local-first React application that runs language and vision models directly in the browser with WebGPU. There is no backend inference layer, no API key requirement, and no server-side chat state.

![Browser LLM Chat Interface](https://img.shields.io/badge/Status-Experimental-orange)
![WebGPU-Powered](https://img.shields.io/badge/Powered%20By-WebGPU-blue)
![Local-First](https://img.shields.io/badge/Privacy-Local--First-green)
## Highlights

---
- Local-first inference with `@huggingface/transformers`
- Web Worker based model loading and token streaming
- Shared app state managed with Zustand
- Chat history persisted locally with IndexedDB and localStorage fallback
- Curated browser-ready models plus searchable Hugging Face discovery
- Built-in settings for generation controls and downloaded-model cleanup
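The IndexedDB-with-localStorage-fallback idea can be sketched as a small adapter that tries a primary store and falls back when it fails. This is a hypothetical sketch: `ChatThread`, `ThreadStore`, `withFallback`, `createKeyValueThreadStore`, and the `"chat-threads"` key are illustrative names, not the API of `src/chat-store.ts`. The key-value adapter accepts any object with `getItem`/`setItem`, such as `window.localStorage`.

```typescript
interface ChatThread {
  id: string;
  title: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

interface ThreadStore {
  load(): Promise<ChatThread[]>;
  save(threads: ChatThread[]): Promise<void>;
}

// Minimal subset of the Web Storage API (localStorage satisfies this).
interface KeyValue {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Serialize all threads as JSON under a single key.
function createKeyValueThreadStore(kv: KeyValue, key = "chat-threads"): ThreadStore {
  return {
    async load() {
      const raw = kv.getItem(key);
      return raw ? (JSON.parse(raw) as ChatThread[]) : [];
    },
    async save(threads) {
      kv.setItem(key, JSON.stringify(threads));
    },
  };
}

// Use the primary store (e.g. IndexedDB) and fall back if it throws or rejects.
function withFallback(primary: ThreadStore, fallback: ThreadStore): ThreadStore {
  return {
    async load() {
      try {
        return await primary.load();
      } catch {
        return fallback.load();
      }
    },
    async save(threads) {
      try {
        await primary.save(threads);
      } catch {
        await fallback.save(threads);
      }
    },
  };
}
```

A fuller version would likely remember that the primary store failed instead of retrying it on every call.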

## ✨ Key Features
## Stack

- **🚀 100% Local Inference**: Your prompts, images, and model outputs never leave your browser. Privacy is built-in by design.
- **⚡ WebGPU Accelerated**: Utilizes your GPU's power for near-native performance on compatible browsers (Chrome/Edge Desktop).
- **🧠 Advanced Model Support**: Access to specialized browser-friendly models, including:
- **Balanced (Gemma 3 1B)**: A balanced desktop default for everyday browser chat.
- **Reasoning (DeepSeek R1 1.5B)**: Built-in reasoning capabilities for complex logic via distillation.
- **Coding (Qwen 2.5 Coder)**: Compact coding helpers for quick edits and code explanations.
- **Vision (Qwen 3.5 Vision)**: Fully multimodal support for image-to-text tasks.
- **Fast / Mobile-Safe**: SmolLM2 (360M) and Qwen 2.5 (0.5B) for ultra-quick response times.
- **⚙️ Generation Parameter Controls**: Fine-grained control over model temperature, top-p, and token limits via an intuitive settings dialog.
- **💾 Storage & Chat Management**: Automatic Hugging Face caching, clear chat history, and robust data management to delete offline model files directly from the UI.
- **🧵 Worker-Based Architecture**: Heavy computation happens in a dedicated Web Worker to keep the UI smooth and responsive.
- React 19
- Vite
- TypeScript
- Vanilla CSS
- `@huggingface/transformers` `4.0.0-next.x`

---
## Prerequisites

## 🛠 Tech Stack
- Node.js `22.x`
- A WebGPU-capable browser
  - Recommended: recent Chrome or Edge desktop builds
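Because WebGPU is a hard prerequisite, the app can check for it before trying to load a model. A minimal sketch, assuming detection through `navigator.gpu` and `requestAdapter()` from the WebGPU API; `detectWebGPU` and `EnvLike` are hypothetical names, and the structural types avoid DOM typings so the sketch also runs outside a browser.

```typescript
type WebGPUStatus = "supported" | "no-api" | "no-adapter";

// Structural type for the single WebGPU call used here.
interface GPULike {
  requestAdapter(): Promise<unknown>;
}

interface EnvLike {
  navigator?: { gpu?: GPULike };
}

async function detectWebGPU(env: EnvLike = globalThis as unknown as EnvLike): Promise<WebGPUStatus> {
  const gpu = env.navigator?.gpu;
  if (!gpu) return "no-api"; // browser does not expose WebGPU at all
  const adapter = await gpu.requestAdapter().catch(() => null);
  return adapter ? "supported" : "no-adapter"; // API present, but no usable GPU adapter
}
```

The UI could show an "unsupported browser" notice for `no-api` and a hardware or driver hint for `no-adapter`.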

- **Core**: [React 19](https://react.dev/), Vite, TypeScript
- **Inference**: [@huggingface/transformers (v4.0.0-next)](https://github.com/huggingface/transformers.js)
- **Styling**: Vanilla CSS (Custom UI with glassmorphism and modern aesthetics)
- **Formatting**: `react-markdown` with `remark-gfm` for rich text and reasoning blocks.
## Getting Started

---
```bash
npm install
npm run dev
```

## 🚀 Getting Started
Open [http://localhost:5173](http://localhost:5173).

### Prerequisites
## Scripts

- A browser with **WebGPU support** (Recommended: Chrome 113+ or Edge 113+ on Desktop).
- [Node.js](https://nodejs.org/) installed.
- `npm run dev`: start the Vite dev server
- `npm run build`: typecheck and build the production bundle
- `npm run preview`: preview the production build locally
- `npm run lint`: run ESLint with zero warnings allowed
- `npm run lint:fix`: apply safe ESLint autofixes
- `npm run format`: format the repo with Prettier
- `npm run format:check`: verify formatting without rewriting files
- `npm run typecheck`: run TypeScript without emitting
- `npm run test`: run the Vitest suite
- `npm run test:watch`: run Vitest in watch mode
- `npm run check`: run lint, typecheck, tests, and build

### Installation
## Architecture Notes

1. **Clone the repository**:
```bash
git clone <repository-url>
cd web-llm
```
- `src/App.tsx` is the top-level composition layer for the SPA shell.
- Shared cross-screen UI, chat, and model state lives in `src/store/app-store.ts`.
- Heavy inference work stays in `src/model.worker.ts` plus the focused worker helpers in `src/worker/`.
- Chat persistence is handled locally through `src/chat-store.ts`.
- Lightweight preferences, storage helpers, and storage feedback live in `src/storage.ts`.
- Tests live in `src/test/` so contributors have one place to look for coverage.
- There is intentionally no router or backend inference path.
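Storage feedback like the kind attributed to `src/storage.ts` could be derived from `navigator.storage.estimate()`, which resolves to an object with `usage` and `quota` in bytes. This is a hypothetical sketch: `describeStorage`, `formatBytes`, and the output format are illustrative, not the app's actual helpers.

```typescript
interface StorageEstimateLike {
  usage?: number; // bytes used by this origin
  quota?: number; // bytes this origin may use
}

// Format bytes with binary units (1 KB = 1024 bytes).
function formatBytes(bytes: number): string {
  const units = ["B", "KB", "MB", "GB", "TB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  return `${value.toFixed(unit === 0 ? 0 : 1)} ${units[unit]}`;
}

// Human-readable summary for a settings or storage panel.
function describeStorage({ usage, quota }: StorageEstimateLike): string {
  if (usage === undefined || quota === undefined || quota === 0) {
    return "Storage usage unavailable";
  }
  const percent = Math.round((usage / quota) * 100);
  return `${formatBytes(usage)} of ${formatBytes(quota)} used (${percent}%)`;
}
```

In the browser this would be called as `navigator.storage.estimate().then(describeStorage)`, for example after deleting downloaded model files.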

2. **Install dependencies**:
```bash
npm install
```
## Quality Gates

3. **Run the development server**:
```bash
npm run dev
```
Pull requests are expected to pass:

4. **Open in Browser**: Navigate to `http://localhost:5173`.
- `npm run lint`
- `npm run typecheck`
- `npm run test`
- `npm run build`

---
GitHub Actions runs the same checks automatically on every pull request and on pushes to `main`.

## 📝 Notes & Limitations
## Documentation

- **First Load**: The initial model download (200MB - 900MB depending on the model) may take some time depending on your connection.
- **VRAM**: Older GPUs with limited VRAM may struggle with the Vision/Thinking models.
- **Environment**: This is an experimental proof-of-concept.
- [Architecture overview](docs/architecture.md)
- [Model details](docs/models.md)
- [Contributor guide](CONTRIBUTING.md)
- [Agent context](AGENTS.md)

---
## Limitations

## 🤝 Credits
- First-time model downloads can be large and slow on constrained networks.
- Larger models remain sensitive to browser, VRAM, and device class.
- WebGPU support is required for the supported experience.

Special thanks to the **Hugging Face** team for the amazing [transformers.js](https://huggingface.co/docs/transformers.js/index) library and the open-source community for the quantized ONNX models.
## License

---

Built with ❤️ for the future of free, private AI.

---

## 📚 Documentation

- [**🏗️ Architecture & Privacy**](docs/architecture.md): Deep dive into our local-first, WebGPU-powered engine.
- [**🧠 Model Details**](docs/models.md): Understanding the SmolLM and Qwen configurations.
- [**🤖 AI Agent Context**](AGENTS.md): Contextual information for AI coding assistants.

---

## 🤝 Community & Support

- [**Contributing Guidelines**](CONTRIBUTING.md): How to help improve Browser LLM Chat.
- [**Code of Conduct**](CODE_OF_CONDUCT.md): Our commitment to a welcoming environment.
- [**License**](LICENSE): This project is released under the MIT License.
MIT
26 changes: 24 additions & 2 deletions docs/architecture.md
@@ -5,16 +5,21 @@
## Core Components

### 1. **React UI (Vite + TypeScript)**
The user interface is built with **React 19** and **Vite**, focusing on a clean, responsive, and "glassmorphic" aesthetic. It manages state, chat history, and model selection.

The user interface is built with **React 19** and **Vite**, focusing on a clean, responsive, and "glassmorphic" aesthetic. Shared app state is centralized in a lightweight **Zustand** store, while local component-only draft state remains local to the components that own it.

### 2. **WebGPU Acceleration**

The application uses the **WebGPU API** to leverage the user's graphics hardware for model inference. This provides near-native performance for transformer-based models by utilizing the parallel processing power of modern GPUs.

### 3. **Web Worker Threading**
To ensure a smooth UI experience, all heavy lifting (model loading, processing, and inference) is offloaded to a **dedicated Web Worker** (`model.worker.ts`). Communication between the UI and the worker happens asynchronously via the `postMessage` API.

To ensure a smooth UI experience, all heavy lifting (model loading, processing, summarization, and inference) is offloaded to a **dedicated Web Worker** (`model.worker.ts`). Communication between the UI and the worker happens asynchronously via the `postMessage` API. The worker is intentionally coarse-grained: the entry file handles message routing, while a small set of modules in `src/worker/` owns model session state, conversation budgeting, and generation logic.
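Conversation budgeting can be sketched as dropping the oldest turns until the prompt fits a token budget while always keeping the system prompt. This is a hypothetical sketch: `estimateTokens` (roughly four characters per token) stands in for the model's real tokenizer, and `fitToBudget` is an illustrative name, not a function in `src/worker/`.

```typescript
interface Turn {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic; a real worker would count with the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep system turns, then add the newest other turns while they fit the budget.
function fitToBudget(turns: Turn[], budget: number, count = estimateTokens): Turn[] {
  const system = turns.filter((t) => t.role === "system");
  const rest = turns.filter((t) => t.role !== "system");
  let used = system.reduce((sum, t) => sum + count(t.content), 0);
  const kept: Turn[] = [];
  for (const turn of [...rest].reverse()) {
    const cost = count(turn.content);
    if (used + cost > budget) break; // this turn and everything older is dropped
    kept.unshift(turn);
    used += cost;
  }
  return [...system, ...kept];
}
```

A production version would also need a policy for a single message that exceeds the budget on its own, which this sketch simply drops.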

### 4. **Transformers.js (v4.0.0-next)**

We use the `@huggingface/transformers` library to handle:

- **ONNX Model Loading**: Loading quantized model weights.
- **Tokenization**: Converting text to numerical input.
- **Inference**: Running the model and streaming output tokens.
@@ -24,18 +29,35 @@ We use the `@huggingface/transformers` library to handle:
## 🔒 Security & Privacy

### **100% Local-First**

- **No Data Leakage**: Your prompts, images, and model outputs never leave your machine. There is no backend telemetry or logging of your conversations.
- **Offline Capable**: Once the model weights are downloaded into the browser cache, the application can run fully offline.

### **Model Provenance**

- Models are fetched directly from the [Hugging Face Hub](https://huggingface.co/models). We use official and community-quantized versions of reputable models (SmolLM, Qwen).

### **Safe Model Execution**

- The models run within the browser's sandboxed environment. They cannot access your local file system (except through explicit user-provided file uploads) or other browser data.

---

## 💡 Local Inference Benefits

- **No Network Latency**: Inference needs no network round-trips.
- **Privacy By Design**: Ideal for sensitive or personal queries.
- **Cost Effective**: No expensive GPU server hosting required.

---

## 📁 Project Layout

- `src/App.tsx`: SPA shell composition and screen orchestration
- `src/store/app-store.ts`: shared app state and actions
- `src/components/`: visible UI sections and dialogs
- `src/hooks/`: reusable React behaviors that are shared across screens
- `src/chat-store.ts`: durable chat persistence and legacy thread migration
- `src/storage.ts`: lightweight browser state and storage feedback helpers
- `src/model.worker.ts` + `src/worker/`: inference runtime
- `src/test/`: centralized app and worker tests
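Legacy thread migration could follow a versioned-record pattern: read whatever shape was stored, upgrade it, and write back the current version. This is a hypothetical sketch: the `LegacyThreadV1`/`ThreadV2` shapes, the `version` field, and `migrateThread` are invented for illustration, since the persisted format used by `src/chat-store.ts` is not described here.

```typescript
// Invented legacy shape: plain strings alternating user/assistant.
interface LegacyThreadV1 {
  id: string;
  messages: string[];
}

interface ThreadV2 {
  version: 2;
  id: string;
  title: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

function isV2(value: unknown): value is ThreadV2 {
  return typeof value === "object" && value !== null && (value as { version?: unknown }).version === 2;
}

function migrateThread(stored: LegacyThreadV1 | ThreadV2): ThreadV2 {
  if (isV2(stored)) return stored; // already in the current format
  const messages = stored.messages.map((content, i) => ({
    role: i % 2 === 0 ? ("user" as const) : ("assistant" as const),
    content,
  }));
  // Derive a title from the first message, truncated to 40 characters.
  const title = (stored.messages[0] ?? "Untitled chat").slice(0, 40);
  return { version: 2, id: stored.id, title, messages };
}
```

Because already-migrated records pass through unchanged, the migration can run on every load without a separate "has migrated" flag.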