
AI YouTube Shorts Generator

The open-source alternative to Opus Clip, Vidyo.ai, Klap, SubMagic, 2short.ai, and other AI clipping tools. Drop in any long-form YouTube video and get back ranked, viral-ready 9:16 shorts — for free, with no per-clip credits, no watermarks, and full control over the highlight algorithm.

Built for creators, agencies, and developers who don't want to pay $20–$300/month or be capped on minutes processed. Uses GPT-class LLM highlight detection and Whisper transcription to extract the most viral-worthy moments and auto-crop them vertically for TikTok, Reels, and Shorts.

Building your own Opus Clip–style SaaS? Skip the infra and ship on the same APIs that power this repo: longshorts

Why Use This Instead of Opus Clip / Vidyo.ai / Klap?

|                          | This repo                                   | Opus Clip / Vidyo.ai / Klap / SubMagic      |
|--------------------------|---------------------------------------------|---------------------------------------------|
| Price                    | Free + open source (pay only for API usage) | $20–$300/month subscriptions                |
| Per-clip credits         | None — process unlimited videos             | Monthly minute caps, overage fees           |
| Watermarks               | Never                                       | On free tiers                               |
| Highlight algorithm      | Fully editable virality framework           | Black box                                   |
| Output format            | Any aspect ratio, any resolution            | Locked presets                              |
| Batch processing         | xargs an entire URL list                    | Manual upload one-by-one                    |
| JSON / API output        | Built-in (--output-json)                    | Limited or paid tier only                   |
| Self-hostable            | Yes — runs on your machine or server        | SaaS only, your videos sit on their servers |
| White-label / embeddable | Yes — MIT licensed, import as Python lib    | No                                          |

Features

  • 🎬 YouTube In, Vertical Out: Hand it any YouTube URL — get back N viral-ready 9:16 mp4s
  • 🔀 Two Modes — API (fast) or Local (offline): Default --mode api uses MuAPI for download/transcription/cropping; --mode local runs on your machine with yt-dlp, faster-whisper, and ffmpeg/opencv, with only the OpenAI LLM call going remote — pick what fits
  • 🤖 Virality-Aware Highlight Selection: Clips ranked on hooks, emotional peaks, opinion bombs, revelation moments, conflict, quotable lines, story peaks, and practical value — not just generic "interesting"
  • 📈 Score + Hook + Reason for Every Clip: Each highlight comes with a viral score, an opening hook line, and a one-sentence explanation of why it works
  • 🎤 Whisper Transcription, Your Choice: Cloud (/openai-whisper via MuAPI) or local (faster-whisper, CPU or CUDA) — same downstream output shape
  • 🧩 Long-Video Aware: Videos over 30 minutes are auto-chunked with overlap so nothing gets missed
  • ♻️ Smart Dedupe: Overlapping highlights are collapsed by score so you never get two near-duplicate clips
  • 🎯 Smart Vertical Crop: API mode uses MuAPI's auto-crop; local mode runs OpenCV face tracking with motion smoothing
  • 📱 Any Aspect Ratio: 9:16 for TikTok/Reels/Shorts, 1:1 for square, anything else by flag
  • 🧰 CLI + Python Library: Use it from the shell or import generate_shorts(...) into your own pipeline
  • 📦 JSON Output: --output-json dumps the full result (transcript + every candidate highlight + final clip URLs/paths) for downstream automation

Quick Start (No Setup)

Don't want to self-host? The AI Clipping API gives you the same Opus Clip–style pipeline as a single HTTP call — no Python, no dependencies, pay-per-clip instead of monthly subscriptions.


Installation (Self-Hosted)

Prerequisites

  • Python 3.10+
  • For API mode (default): a MuAPI key — powers download, transcription, highlight ranking, and clipping in a single dependency
  • For Local mode (--mode local): ffmpeg on your PATH and an OPENAI_API_KEY (only the LLM step is remote; everything else runs offline)

Steps

  1. Clone the repository:

    git clone https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator.git
    cd AI-Youtube-Shorts-Generator
  2. Create and activate a virtual environment:

    python3.10 -m venv venv
    source venv/bin/activate
  3. Install Python dependencies:

    pip install -r requirements.txt
    # Only if you plan to use --mode local:
    pip install -r requirements-local.txt
  4. Set up environment variables:

    Create a .env file in the project root (a sketch of how these values are loaded follows these steps):

    # API mode (default)
    MUAPI_API_KEY=your_muapi_key_here
    
    # Local mode (--mode local) — only the OPENAI key is required
    OPENAI_API_KEY=your_openai_key_here
    OPENAI_MODEL=gpt-4o-mini          # optional, default gpt-4o-mini
    LOCAL_WHISPER_MODEL=base          # tiny / base / small / medium / large-v3
    LOCAL_WHISPER_DEVICE=auto         # auto / cpu / cuda
    LOCAL_OUTPUT_DIR=output           # where local mp4s land
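
For reference, a minimal sketch of how shorts_generator/config.py might load these variables (assuming python-dotenv; the repo's actual loader may differ):

# sketch: read .env with python-dotenv (illustrative, not the repo's exact code)
import os
from dotenv import load_dotenv

load_dotenv()  # picks up .env from the project root

MUAPI_API_KEY = os.getenv("MUAPI_API_KEY")                        # API mode
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")                      # local mode LLM step
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
LOCAL_WHISPER_MODEL = os.getenv("LOCAL_WHISPER_MODEL", "base")
LOCAL_WHISPER_DEVICE = os.getenv("LOCAL_WHISPER_DEVICE", "auto")
LOCAL_OUTPUT_DIR = os.getenv("LOCAL_OUTPUT_DIR", "output")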

Usage

Single video (API mode — default)

python main.py "https://www.youtube.com/watch?v=VIDEO_ID"

Single video (Local mode — runs offline except for the LLM call)

python main.py "https://www.youtube.com/watch?v=VIDEO_ID" --mode local

Local mode writes the rendered shorts to ./output/short_01.mp4, short_02.mp4, … (override with LOCAL_OUTPUT_DIR).

With options

python main.py "https://www.youtube.com/watch?v=VIDEO_ID" \
    --mode api \
    --num-clips 5 \
    --aspect-ratio 9:16 \
    --output-json result.json

Python API

You can also pass a hosted mp4 URL directly via the Python API (the CLI is YouTube-first):

from shorts_generator import generate_shorts

result = generate_shorts(
    "https://www.youtube.com/watch?v=...",
    num_clips=5,
    aspect_ratio="9:16",
)
for short in result["shorts"]:
    print(short["score"], short["title"], short["clip_url"])

Batch processing

Create a urls.txt file with one URL per line, then:

xargs -a urls.txt -I{} python main.py "{}"
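
If you prefer to stay in Python (for example, to collect the JSON results), a minimal batch sketch using the library entry point shown above:

# sketch: batch-process urls.txt through the Python API, one JSON result per video
import json
from pathlib import Path

from shorts_generator import generate_shorts

urls = [u.strip() for u in Path("urls.txt").read_text().splitlines() if u.strip()]
for i, url in enumerate(urls, start=1):
    result = generate_shorts(url, num_clips=3, aspect_ratio="9:16")
    Path(f"result_{i:03d}.json").write_text(json.dumps(result, indent=2))
    for short in result["shorts"]:
        print(i, short["score"], short["clip_url"])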

CLI flags

| Flag           | Default | Notes                                                                          |
|----------------|---------|--------------------------------------------------------------------------------|
| --mode         | api     | api (MuAPI, fast, no setup) or local (yt-dlp + faster-whisper + OpenAI + ffmpeg) |
| --num-clips    | 3       | How many shorts to render                                                      |
| --aspect-ratio | 9:16    | Any ratio; 9:16 for TikTok/Reels, 1:1 for square                               |
| --format       | 720     | Source download resolution: 360 / 480 / 720 / 1080                             |
| --language     | auto    | Force a Whisper language code (e.g. en)                                        |
| --output-json  | (unset) | Dump the full result (transcript + all candidates) to a file                   |

API mode vs Local mode

| Step          | API mode (--mode api)   | Local mode (--mode local)         |
|---------------|-------------------------|-----------------------------------|
| Download      | MuAPI /youtube-download | yt-dlp                            |
| Transcription | MuAPI /openai-whisper   | faster-whisper (CPU or CUDA)      |
| Highlight LLM | MuAPI gpt-5-mini        | OpenAI (gpt-4o-mini by default)   |
| Vertical crop | MuAPI /autocrop         | ffmpeg + OpenCV face tracking     |
| Output        | hosted URLs             | local mp4 paths                   |
| Required keys | MUAPI_API_KEY           | OPENAI_API_KEY (+ ffmpeg on PATH) |
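
To make the transcription row concrete, a minimal faster-whisper sketch roughly equivalent to what --mode local does (file name and settings are placeholders; shorts_generator/local/transcriber.py is the source of truth):

# sketch: timestamped transcription with faster-whisper, as in --mode local
from faster_whisper import WhisperModel

model = WhisperModel("base", device="auto")     # LOCAL_WHISPER_MODEL / LOCAL_WHISPER_DEVICE
segments, info = model.transcribe("audio.mp4")  # language auto-detected unless forced
for seg in segments:
    print(f"{seg.start:.1f}s -> {seg.end:.1f}s {seg.text}")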

How It Works

  1. Download: Fetches the source video from YouTube
  2. Transcribe: Whisper produces a timestamped transcript with verbose_json-style segments (MuAPI /openai-whisper in API mode, faster-whisper in local mode)
  3. Detect content type: An LLM classifies the video (podcast, interview, tutorial, vlog, etc.) and density, so the prompt can be tuned per content style
  4. Long-video chunking: Videos > 30 min are split into 20-min overlapping chunks
  5. Highlight ranking: An LLM scans the transcript through a virality framework — hook moments, emotional peaks, opinion bombs, revelations, conflict, quotables, story peaks, practical value — and emits ranked candidates with scores 0–100
  6. Dedupe: Overlapping candidates are collapsed by score (>50% overlap → keep the higher score; see the sketch below)
  7. Top-N selection: The top --num-clips candidates are selected
  8. Auto-crop: Each highlight is rendered as a vertical short at the requested aspect ratio

Output: a list of mp4 URLs plus, for each clip, its title, viral score, hook sentence, and a one-line reason explaining why it should perform.
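
A minimal sketch of the step-6 dedupe rule (hypothetical helper; the real logic lives in shorts_generator/highlights.py):

# sketch: drop any candidate that overlaps a higher-scoring one by more than 50%
def dedupe(candidates):
    kept = []
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        clashes = False
        for k in kept:
            inter = min(c["end_time"], k["end_time"]) - max(c["start_time"], k["start_time"])
            shorter = min(c["end_time"] - c["start_time"], k["end_time"] - k["start_time"])
            if inter > 0 and inter / shorter > 0.5:
                clashes = True  # >50% overlap with a kept (higher-scoring) clip
                break
        if not clashes:
            kept.append(c)
    return kept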

Output

Console output looks like:

========================================================================
Highlights:    7 candidates → kept top 3
========================================================================

#1  score=92  124.3s → 187.6s
     title:  The one mistake that cost me $50K
     hook:   "Nobody talks about this, but it killed my first startup..."
     clip:   https://.../short_1.mp4

#2  score=88  ...

--output-json result.json produces:

{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ {...}, {...}, ... ],
  "shorts": [
    {
      "title": "...",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "...",
      "virality_reason": "...",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
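
Downstream automation can consume that file directly; for example, a minimal sketch that prints the top-scoring clip:

# sketch: pick the highest-scoring short out of result.json
import json

with open("result.json") as f:
    result = json.load(f)

best = max(result["shorts"], key=lambda s: s["score"])
print(best["score"], best["title"], best["clip_url"])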

Configuration

Highlight selection criteria

Edit shorts_generator/highlights.py:

  • Virality framework: VIRALITY_CRITERIA — the ranked list of signals the LLM optimizes for
  • System prompt: HIGHLIGHT_SYSTEM_PROMPT — duration sweet spot, hook rules, JSON schema
  • Chunk size: CHUNK_SIZE_SECONDS (default 1200) — chunk length for long videos
  • Long-video threshold: LONG_VIDEO_THRESHOLD (default 1800) — videos longer than this are chunked
  • Chunk overlap: CHUNK_OVERLAP_SECONDS (default 60) — overlap between chunks so cross-boundary clips aren't missed (see the worked example below)
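
A worked example of how those three constants interact (illustrative; the actual chunker lives in shorts_generator/highlights.py). With the defaults, a 45-minute (2700 s) video yields windows 0–1200, 1140–2340, and 2280–2700:

# sketch: overlapping chunk windows for long videos (defaults shown)
LONG_VIDEO_THRESHOLD = 1800
CHUNK_SIZE_SECONDS = 1200
CHUNK_OVERLAP_SECONDS = 60

def chunk_windows(duration):
    if duration <= LONG_VIDEO_THRESHOLD:
        return [(0, duration)]
    windows, start = [], 0
    while start < duration:
        windows.append((start, min(start + CHUNK_SIZE_SECONDS, duration)))
        start += CHUNK_SIZE_SECONDS - CHUNK_OVERLAP_SECONDS
    return windows

print(chunk_windows(2700))  # [(0, 1200), (1140, 2340), (2280, 2700)]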

Polling / timeout

Edit shorts_generator/config.py (or set env vars):

  • MUAPI_POLL_INTERVAL (default 5s) — seconds between job-status polls
  • MUAPI_POLL_TIMEOUT (default 1800s) — give up after this long (see the sketch below)
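
For reference, a minimal sketch of the submit-and-poll pattern those two settings govern (get_status is a hypothetical stand-in; the real wrapper is shorts_generator/muapi.py):

# sketch: poll a MuAPI job until it finishes or the timeout elapses
import time

POLL_INTERVAL = 5    # MUAPI_POLL_INTERVAL
POLL_TIMEOUT = 1800  # MUAPI_POLL_TIMEOUT

def wait_for_job(get_status, job_id):
    deadline = time.monotonic() + POLL_TIMEOUT
    while time.monotonic() < deadline:
        status = get_status(job_id)  # hypothetical: returns e.g. "pending" / "done"
        if status == "done":
            return status
        time.sleep(POLL_INTERVAL)
    raise TimeoutError(f"job {job_id} did not finish within {POLL_TIMEOUT}s")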

Whisper transcription

In API mode, audio is transcribed by MuAPI's /openai-whisper endpoint (server-side whisper-1); in local mode, faster-whisper runs on your machine. In either mode, pass --language <code> to lock recognition to a specific language; otherwise it auto-detects.

Project Structure

AI-Youtube-Shorts-Generator/
├── main.py                       CLI entry point
├── requirements.txt              core deps (api mode)
├── requirements-local.txt        optional deps for --mode local
├── .env.example
└── shorts_generator/
    ├── config.py                 env / settings (MuAPI + OpenAI + Whisper)
    ├── muapi.py                  generic submit + poll wrapper
    ├── downloader.py             API mode: YouTube download via MuAPI
    ├── transcriber.py            API mode: MuAPI /openai-whisper client
    ├── highlights.py             shared LLM virality ranking (pluggable backend)
    ├── clipper.py                API mode: MuAPI /autocrop
    ├── pipeline.py               mode dispatcher (api ↔ local)
    └── local/                    --mode local backends (offline)
        ├── downloader.py         yt-dlp download
        ├── transcriber.py        faster-whisper transcription
        ├── llm.py                OpenAI chat-completions client
        └── clipper.py            ffmpeg cut + OpenCV vertical crop

Troubleshooting

Whisper produced no segments

The video may have no detectable speech, or it may be in a language Whisper struggles with. Try passing --language en (or the correct ISO-639-1 code) to skip auto-detection.

Looking for better results?

The AI Clipping API uses an improved algorithm that produces higher-quality clips with better highlight detection.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request.

License

This project is licensed under the MIT License.
