The open-source alternative to Opus Clip, Vidyo.ai, Klap, SubMagic, 2short.ai, and other AI clipping tools. Drop in any long-form YouTube video and get back ranked, viral-ready 9:16 shorts — for free, with no per-clip credits, no watermarks, and full control over the highlight algorithm.
Built for creators, agencies, and developers who don't want to pay $20–$300/month or be capped on minutes processed. Uses GPT-class LLM highlight detection and Whisper transcription to extract the most viral-worthy moments and auto-crop them vertically for TikTok, Reels, and Shorts.
Building your own Opus Clip–style SaaS? Skip the infra and ship on the same APIs that power this repo:
- AI Clipping API — end-to-end clip selection + render
- Auto-Crop API — vertical reframing only
| | This repo | Opus Clip / Vidyo.ai / Klap / SubMagic |
|---|---|---|
| Price | Free + open source (pay only for API usage) | $20–$300/month subscriptions |
| Per-clip credits | None — process unlimited videos | Monthly minute caps, overage fees |
| Watermarks | Never | On free tiers |
| Highlight algorithm | Fully editable virality framework | Black box |
| Output format | Any aspect ratio, any resolution | Locked presets |
| Batch processing | `xargs` an entire URL list | Manual upload one-by-one |
| JSON / API output | Built-in (`--output-json`) | Limited or paid tier only |
| Self-hostable | Yes — runs on your machine or server | SaaS only, your videos sit on their servers |
| White-label / embeddable | Yes — MIT licensed, import as Python lib | No |
- 🎬 YouTube In, Vertical Out: Hand it any YouTube URL — get back N viral-ready 9:16 mp4s
- 🔀 Two Modes — API (fast) or Local (offline): Default `--mode api` uses MuAPI for download/transcription/cropping; `--mode local` runs entirely on your machine with `yt-dlp`, `faster-whisper`, OpenAI, and `ffmpeg`/`opencv` — pick what fits
- 🤖 Virality-Aware Highlight Selection: Clips ranked on hooks, emotional peaks, opinion bombs, revelation moments, conflict, quotable lines, story peaks, and practical value — not just generic "interesting"
- 📈 Score + Hook + Reason for Every Clip: Each highlight comes with a viral score, an opening hook line, and a one-sentence explanation of why it works
- 🎤 Whisper Transcription, Your Choice: Cloud (`/openai-whisper` via MuAPI) or local (`faster-whisper`, CPU or CUDA) — same downstream output shape
- 🧩 Long-Video Aware: Videos over 30 minutes are auto-chunked with overlap so nothing gets missed
- ♻️ Smart Dedupe: Overlapping highlights are collapsed by score so you never get two near-duplicate clips
- 🎯 Smart Vertical Crop: API mode uses MuAPI's auto-crop; local mode runs OpenCV face tracking with motion smoothing
- 📱 Any Aspect Ratio: 9:16 for TikTok/Reels/Shorts, 1:1 for square, anything else by flag
- 🧰 CLI + Python Library: Use it from the shell or import `generate_shorts(...)` into your own pipeline
- 📦 JSON Output: `--output-json` dumps the full result (transcript + every candidate highlight + final clip URLs/paths) for downstream automation
Don't want to self-host? The AI Clipping API gives you the same Opus Clip–style pipeline as a single HTTP call — no Python, no dependencies, pay-per-clip instead of monthly subscriptions.
- Python 3.10+
- For API mode (default): a MuAPI key — powers download, transcription, highlight ranking, and clipping in a single dependency
- For Local mode (`--mode local`): `ffmpeg` on your PATH and an `OPENAI_API_KEY` (only the LLM step is remote; everything else runs offline)
- Clone the repository:

  ```bash
  git clone https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator.git
  cd AI-Youtube-Shorts-Generator
  ```

- Create and activate a virtual environment:

  ```bash
  python3.10 -m venv venv
  source venv/bin/activate
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  # Only if you plan to use --mode local:
  pip install -r requirements-local.txt
  ```

- Set up environment variables:

  Create a `.env` file in the project root:

  ```
  # API mode (default)
  MUAPI_API_KEY=your_muapi_key_here

  # Local mode (--mode local) — only the OPENAI key is required
  OPENAI_API_KEY=your_openai_key_here
  OPENAI_MODEL=gpt-4o-mini       # optional, default gpt-4o-mini
  LOCAL_WHISPER_MODEL=base       # tiny / base / small / medium / large-v3
  LOCAL_WHISPER_DEVICE=auto      # auto / cpu / cuda
  LOCAL_OUTPUT_DIR=output        # where local mp4s land
  ```
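For orientation, `shorts_generator/config.py` presumably turns these variables into plain Python settings. A minimal sketch of that pattern, assuming `python-dotenv` (the real module may read them differently):

```python
# sketch of a .env-backed settings module (assumes python-dotenv is installed)
import os
from dotenv import load_dotenv

load_dotenv()  # pull variables from .env into the process environment

MUAPI_API_KEY = os.getenv("MUAPI_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
LOCAL_WHISPER_MODEL = os.getenv("LOCAL_WHISPER_MODEL", "base")
LOCAL_WHISPER_DEVICE = os.getenv("LOCAL_WHISPER_DEVICE", "auto")
LOCAL_OUTPUT_DIR = os.getenv("LOCAL_OUTPUT_DIR", "output")
```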
python main.py "https://www.youtube.com/watch?v=VIDEO_ID"python main.py "https://www.youtube.com/watch?v=VIDEO_ID" --mode localLocal mode writes the rendered shorts to ./output/short_01.mp4, short_02.mp4, … (override with LOCAL_OUTPUT_DIR).
python main.py "https://www.youtube.com/watch?v=VIDEO_ID" \
--mode api \
--num-clips 5 \
--aspect-ratio 9:16 \
--output-json result.jsonDrop in a hosted mp4 URL directly via the Python API (the CLI is YouTube-first):
from shorts_generator import generate_shorts
result = generate_shorts(
"https://www.youtube.com/watch?v=...",
num_clips=5,
aspect_ratio="9:16",
)
for short in result["shorts"]:
print(short["score"], short["title"], short["clip_url"])Create a urls.txt file with one URL per line, then:
```bash
xargs -a urls.txt -I{} python main.py "{}"
```

| Flag | Default | Notes |
|---|---|---|
| `--mode` | `api` | `api` (MuAPI, fast, no setup) or `local` (yt-dlp + faster-whisper + OpenAI + ffmpeg) |
| `--num-clips` | `3` | How many shorts to render |
| `--aspect-ratio` | `9:16` | Any ratio; 9:16 for TikTok/Reels, 1:1 for square |
| `--format` | `720` | Source download resolution: 360 / 480 / 720 / 1080 |
| `--language` | auto | Force Whisper language code (e.g. `en`) |
| `--output-json` | — | Dump the full result (transcript + all candidates) to a file |
| Step | API mode (`--mode api`) | Local mode (`--mode local`) |
|---|---|---|
| Download | MuAPI `/youtube-download` | `yt-dlp` |
| Transcription | MuAPI `/openai-whisper` | `faster-whisper` (CPU or CUDA) |
| Highlight LLM | MuAPI `gpt-5-mini` | OpenAI (`gpt-4o-mini` by default) |
| Vertical crop | MuAPI `/autocrop` | ffmpeg + OpenCV face tracking |
| Output | hosted URLs | local mp4 paths |
| Required keys | `MUAPI_API_KEY` | `OPENAI_API_KEY` (+ `ffmpeg` on PATH) |
- Download: Fetches the source video from YouTube
- Transcribe: MuAPI `/openai-whisper` produces a timestamped transcript (verbose_json segments)
- Detect content type: An LLM classifies the video (podcast, interview, tutorial, vlog, etc.) and its density, so the prompt can be tuned per content style
- Long-video chunking: Videos > 30 min are split into 20-min overlapping chunks
- Highlight ranking: An LLM scans the transcript through a virality framework — hook moments, emotional peaks, opinion bombs, revelations, conflict, quotables, story peaks, practical value — and emits ranked candidates with scores 0–100
- Dedupe: Overlapping candidates are collapsed by score (>50% overlap → keep the higher score; see the sketch after this list)
- Top-N selection: The top `--num-clips` candidates are selected
- Auto-crop: Each highlight is rendered as a vertical short at the requested aspect ratio
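For illustration, the dedupe step's >50% overlap rule could be implemented roughly like this. This is a minimal sketch with assumed field names; the actual logic lives in `shorts_generator/highlights.py`:

```python
def dedupe_highlights(candidates):
    """Collapse overlapping highlight candidates, keeping the higher score.

    Sketch only: assumes each candidate is a dict with start_time,
    end_time, and score, and treats >50% overlap (relative to the
    shorter clip) as "same moment".
    """
    kept = []
    # walk candidates best-first so any kept clip always outscores its duplicates
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        is_duplicate = False
        for existing in kept:
            overlap = min(cand["end_time"], existing["end_time"]) - max(
                cand["start_time"], existing["start_time"]
            )
            shorter = min(
                cand["end_time"] - cand["start_time"],
                existing["end_time"] - existing["start_time"],
            )
            if shorter > 0 and overlap / shorter > 0.5:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(cand)
    return kept
```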
Output: a list of mp4 URLs plus, for each clip, its title, viral score, hook sentence, and a one-line reason explaining why it should perform.
Console output looks like:
```
========================================================================
Highlights: 7 candidates → kept top 3
========================================================================
#1  score=92  124.3s → 187.6s
    title: The one mistake that cost me $50K
    hook:  "Nobody talks about this, but it killed my first startup..."
    clip:  https://.../short_1.mp4
#2  score=88  ...
```
`--output-json result.json` produces:

```json
{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ {...}, {...}, ... ],
  "shorts": [
    {
      "title": "...",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "...",
      "virality_reason": "...",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
```
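Because the layout above is plain JSON, downstream automation can consume it directly. A minimal sketch (the download step and output filenames here are illustrative, not part of the tool):

```python
import json
import urllib.request

with open("result.json") as f:
    result = json.load(f)

for i, short in enumerate(result["shorts"], start=1):
    print(f'#{i} score={short["score"]} {short["title"]}')
    # API mode returns hosted clip URLs; fetch them locally if you need files
    urllib.request.urlretrieve(short["clip_url"], f"short_{i:02d}.mp4")
```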
Edit `shorts_generator/highlights.py`:

- Virality framework: `VIRALITY_CRITERIA` — the ranked list of signals the LLM optimizes for
- System prompt: `HIGHLIGHT_SYSTEM_PROMPT` — duration sweet spot, hook rules, JSON schema
- Chunk size: `CHUNK_SIZE_SECONDS` (default 1200) — chunk length for long videos
- Long-video threshold: `LONG_VIDEO_THRESHOLD` (default 1800) — videos longer than this are chunked
- Chunk overlap: `CHUNK_OVERLAP_SECONDS` (default 60) — overlap between chunks so cross-boundary clips aren't missed (see the chunking sketch after this list)
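As a rough illustration of how those three constants interact (a sketch only, not the code in `highlights.py`):

```python
CHUNK_SIZE_SECONDS = 1200     # 20-minute chunks
LONG_VIDEO_THRESHOLD = 1800   # only chunk videos longer than 30 minutes
CHUNK_OVERLAP_SECONDS = 60    # adjacent chunks share one minute of transcript

def chunk_windows(duration_seconds):
    """Yield (start, end) transcript windows to rank independently."""
    if duration_seconds <= LONG_VIDEO_THRESHOLD:
        yield (0.0, float(duration_seconds))
        return
    start = 0.0
    while start < duration_seconds:
        end = min(start + CHUNK_SIZE_SECONDS, duration_seconds)
        yield (start, end)
        if end >= duration_seconds:
            break
        start = end - CHUNK_OVERLAP_SECONDS  # back up so boundary clips aren't split
```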
Edit `shorts_generator/config.py` (or set env vars):

- `MUAPI_POLL_INTERVAL` (default 5s) — seconds between job-status polls
- `MUAPI_POLL_TIMEOUT` (default 1800s) — give up after this long (see the polling sketch below)
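The generic submit-and-poll wrapper in `shorts_generator/muapi.py` presumably follows the usual async-job pattern. A minimal sketch (the status URL, auth header, and response fields here are placeholders, not MuAPI's documented schema):

```python
import time
import requests

MUAPI_POLL_INTERVAL = 5      # seconds between polls
MUAPI_POLL_TIMEOUT = 1800    # give up after this many seconds

def wait_for_job(status_url, api_key):
    """Poll a job-status URL until the job finishes or the timeout is hit."""
    deadline = time.time() + MUAPI_POLL_TIMEOUT
    while time.time() < deadline:
        resp = requests.get(status_url, headers={"Authorization": f"Bearer {api_key}"})
        resp.raise_for_status()
        job = resp.json()
        if job.get("status") == "completed":   # placeholder field names
            return job.get("result")
        if job.get("status") == "failed":
            raise RuntimeError(f"MuAPI job failed: {job}")
        time.sleep(MUAPI_POLL_INTERVAL)
    raise TimeoutError(f"Job did not finish within {MUAPI_POLL_TIMEOUT}s")
```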
Audio is transcribed by MuAPI's `/openai-whisper` endpoint (server-side `whisper-1`). Pass `--language <code>` to lock recognition to a specific language; otherwise it auto-detects.
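For reference, each entry in the transcript's segments list carries start/end timestamps plus text, roughly like this (trimmed illustration with made-up values; verbose_json includes additional fields):

```python
# one trimmed entry from transcript["segments"] (illustrative values)
segment = {
    "start": 124.3,
    "end": 131.1,
    "text": "Nobody talks about this, but it killed my first startup...",
}
```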
```
AI-Youtube-Shorts-Generator/
├── main.py                   CLI entry point
├── requirements.txt          core deps (api mode)
├── requirements-local.txt    optional deps for --mode local
├── .env.example
└── shorts_generator/
    ├── config.py             env / settings (MuAPI + OpenAI + Whisper)
    ├── muapi.py              generic submit + poll wrapper
    ├── downloader.py         API mode: YouTube download via MuAPI
    ├── transcriber.py        API mode: MuAPI /openai-whisper client
    ├── highlights.py         shared LLM virality ranking (pluggable backend)
    ├── clipper.py            API mode: MuAPI /autocrop
    ├── pipeline.py           mode dispatcher (api ↔ local)
    └── local/                --mode local backends (offline)
        ├── downloader.py     yt-dlp download
        ├── transcriber.py    faster-whisper transcription
        ├── llm.py            OpenAI chat-completions client
        └── clipper.py        ffmpeg cut + OpenCV vertical crop
```
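To make the local clipper concrete, here is a minimal sketch in the same spirit as `shorts_generator/local/clipper.py`, under the assumption that it uses OpenCV's Haar face detector to find an average face center and then crops around it with ffmpeg; the real module's tracking and motion smoothing may differ:

```python
import subprocess
import cv2

def face_center_x(video_path, sample_every=15):
    """Return an averaged horizontal face center in pixels, or None."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(video_path)
    centers, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % sample_every == 0:  # sample frames to keep it cheap
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, 1.1, 5)
            if len(faces) > 0:
                x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
                centers.append(x + w / 2)
        frame_idx += 1
    cap.release()
    return sum(centers) / len(centers) if centers else None

def crop_vertical(src, dst, start, end, center_x=None):
    """Cut [start, end] seconds and crop to 9:16 around center_x via ffmpeg."""
    # crop width = height * 9/16; clamp the window so it stays inside the frame
    x_expr = "(iw-ow)/2" if center_x is None else f"min(max({center_x}-ow/2,0),iw-ow)"
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ss", str(start), "-to", str(end),
         "-vf", f"crop=ih*9/16:ih:{x_expr}:0", "-c:a", "aac", dst],
        check=True,
    )
```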
The video may have no detectable speech, or it may be in a language Whisper struggles with. Try passing `--language en` (or the correct ISO-639-1 code) to skip auto-detection.
The AI Clipping API uses an improved algorithm that produces higher-quality clips with better highlight detection.
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.
