Drop in a video. Ask anything about it.
VideoAnalyzer runs a full multi-modal analysis pipeline on any video file — object detection, transcription, scene segmentation, audio classification, OCR — then spins up an AI assistant that knows exactly what happened, when, and why.
Upload a video → everything below runs automatically in the background:
| Step | What happens |
|---|---|
| Metadata probe | Duration, resolution, FPS via ffprobe |
| Whisper transcription | Full VTT transcript with timestamps (faster-whisper) |
| YOLO object detection | Frame-by-frame detection at 1 FPS (YOLOv8) |
| Scene segmentation | Cut detection + per-scene brightness, motion, color palette |
| Audio classification | Speech / silence / music+noise segmentation |
| OCR | On-screen text extracted from scene keyframes (EasyOCR) |
| Context assembly | Everything merged into a structured document for the AI |
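The final "context assembly" step can be pictured as a plain merge of each analyzer's output into one document. The sketch below is illustrative only — the field names and shapes are assumptions, not the project's actual schema:

```python
# Hypothetical sketch of the "context assembly" step: merge each analyzer's
# output into one structured document the assistant can ground answers in.
# All field names here are illustrative, not the project's real schema.

def assemble_context(metadata, transcript, detections, scenes, audio_segments, ocr):
    """Merge per-modality results into a single analysis document."""
    return {
        "metadata": metadata,          # duration, resolution, fps (ffprobe)
        "transcript": transcript,      # VTT cues with timestamps (Whisper)
        "objects": detections,         # per-second YOLO detections
        "scenes": [
            {**scene, "ocr_text": ocr.get(scene["id"], [])}
            for scene in scenes        # attach keyframe OCR to each scene
        ],
        "audio": audio_segments,       # speech / silence / music spans
    }

doc = assemble_context(
    metadata={"duration": 120.0, "resolution": "1920x1080", "fps": 30},
    transcript=[{"start": 0.0, "end": 2.5, "text": "Welcome back."}],
    detections={0: ["person"], 1: ["person", "car"]},
    scenes=[{"id": "s1", "start": 0.0, "end": 12.4, "brightness": 0.6}],
    audio_segments=[{"start": 0.0, "end": 12.4, "label": "speech"}],
    ocr={"s1": ["PRODUCT LAUNCH"]},
)
```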
Then you chat with a Backboard AI assistant that can answer questions like:
- "What objects appear between 1:30 and 2:00?"
- "Find every moment someone says 'product launch'"
- "Describe what's happening at 0:45 — visually, audibly, and any text on screen"
- "When does the car first appear and when does it leave?"
- Python 3.10+ · Flask · uv
- YOLOv8 (Ultralytics) — object detection
- faster-whisper — speech transcription
- EasyOCR — on-screen text recognition
- ffmpeg — frame extraction + audio processing
- Backboard — AI assistant with tool-call loop, thread memory, document storage
```shell
brew install ffmpeg          # macOS
# or: sudo apt install ffmpeg
```

You'll also need a Backboard API key.
```shell
git clone https://github.com/your-username/video-analyzer
cd video-analyzer
cp .env.example .env
# → add your BACKBOARD_API_KEY to .env
./start.sh
```

Open http://localhost:5050 and drop in a video.

`start.sh` syncs dependencies via `uv`, clears temp files, and starts the server. `Ctrl+C` to stop.
```shell
# .env
BACKBOARD_API_KEY=your_api_key_here
WHISPER_MODEL=base   # tiny | base | small | medium | large
FLASK_PORT=5050
```

Whisper model size trades speed for accuracy: `base` is a good starting point; use `small` or `medium` for better results on noisy audio.
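A minimal sketch of how the app might read this configuration at startup — the variable names match the `.env` above, but the defaults shown are illustrative:

```python
# Sketch of config loading; variable names match .env, defaults are assumed.
import os

BACKBOARD_API_KEY = os.getenv("BACKBOARD_API_KEY", "")
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")   # tiny|base|small|medium|large
FLASK_PORT = int(os.getenv("FLASK_PORT", "5050"))
```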
All logic lives in the API — the UI is thin.
| Endpoint | Description |
|---|---|
| `POST /api/videos` | Upload a video (multipart/form-data, field: `file`) |
| `GET /api/videos` | List all videos + status |
| `GET /api/videos/{id}` | Full analysis JSON |
| `GET /api/videos/{id}/video` | Stream source file |
| `GET /api/videos/{id}/transcript.vtt` | VTT transcript |
Processing is async. Poll `GET /api/videos/{id}` and watch `status`:

`uploading` → `processing` → `ready` (or `error`)
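The polling loop can be written with just the standard library. This is a sketch under assumptions: the endpoint shape follows the API table above, while the base URL and poll cadence are placeholders:

```python
# Client-side status polling sketch (stdlib only). Endpoint paths follow the
# API table; BASE URL, interval, and timeout are illustrative values.
import json
import time
import urllib.request

BASE = "http://localhost:5050"
TERMINAL = {"ready", "error"}

def is_terminal(status: str) -> bool:
    """A video stops changing once it reaches ready or error."""
    return status in TERMINAL

def wait_until_ready(video_id: str, interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll GET /api/videos/{id} until processing finishes or fails."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{BASE}/api/videos/{video_id}") as resp:
            video = json.load(resp)
        if is_terminal(video["status"]):
            return video
        time.sleep(interval)
    raise TimeoutError(f"video {video_id} still processing after {timeout}s")
```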
| Endpoint | Description |
|---|---|
| `POST /api/chat` | Send a message (returns `task_id`) |
| `GET /api/chat/task/{task_id}` | Poll for response |
Chat uses a task-polling pattern — post a message, get a `task_id`, poll until `status: done`.
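The same pattern from the client side, sketched with the standard library. The URL paths and the `status: done` field come from this README; the rest of the response shape is an assumption:

```python
# Chat task-polling sketch (stdlib only). Paths and the status/done field
# follow the README; the exact response shape is assumed.
import json
import time
import urllib.request

BASE = "http://localhost:5050"

def chat_payload(thread_id: str, video_id: str, content: str) -> dict:
    """Build the chat request body described in the README."""
    return {"thread_id": thread_id, "content": content, "video_id": video_id}

def post_json(url: str, payload: dict) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def ask(thread_id: str, video_id: str, content: str, interval: float = 1.0) -> dict:
    """POST /api/chat, then poll /api/chat/task/{task_id} until done."""
    task = post_json(f"{BASE}/api/chat", chat_payload(thread_id, video_id, content))
    while True:
        with urllib.request.urlopen(f"{BASE}/api/chat/task/{task['task_id']}") as resp:
            result = json.load(resp)
        if result["status"] == "done":
            return result
        time.sleep(interval)
```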
Chat request body:
```json
{
  "thread_id": "...",
  "content": "What objects appear in the first minute?",
  "video_id": "..."
}
```

The AI has six tools it can call mid-conversation:
| Tool | What it returns |
|---|---|
| `get_transcript` | Full or time-filtered VTT transcript |
| `search_transcript` | Timestamps matching a word or phrase |
| `get_objects_at_time` | Objects detected at a specific timestamp |
| `get_object_timeline` | Full appearance timeline for a named object |
| `get_scene_info` | Scene detail: colors, motion, audio, OCR text |
| `get_audio_segments` | Speech / silence / music timeline |
```
src/
├── app.py                  Flask app factory
├── models.py               Pydantic models (Video, Scene, ObjectSpan, ...)
├── backboard_client.py     Backboard SDK client
├── api/
│   ├── videos.py           Upload, list, serve endpoints
│   └── chat.py             Chat + task-polling + tool-call loop
├── assistant/
│   ├── setup.py            Assistant + system prompt
│   └── tools.py            Tool definitions (JSON schema)
└── services/
    ├── pipeline.py         Orchestrates all analysis steps
    ├── detector.py         YOLO frame detection
    ├── transcriber.py      Whisper transcription
    ├── audio.py            Audio segmentation
    ├── visual.py           Scene analysis + color palette
    ├── ocr.py              EasyOCR on keyframes
    ├── video_service.py    Backboard storage + local cache
    └── tool_handler.py     Dispatches assistant tool calls
templates/
├── index.html              Upload page
└── workspace.html          Video + chat workspace
models/
└── yolo26n.pt              YOLOv8 weights
```
`.mp4` · `.mov` · `.webm` · `.avi` · `.mkv` — up to 500 MB
MIT