# Multi-Provider TTS AI Blog Reader - Python, React, Tailwind CSS, FastAPI, SSE Streaming, Multi-Agent Pipeline, Text Chunking, Conversion History FullStack Project
A full-stack learning and portfolio project that turns long-form text and blog URLs into downloadable MP3 audio. You can run everything on your machine: a FastAPI backend coordinates multiple text-to-speech (TTS) engines, optional SSE (Server-Sent Events) streaming for a visible “multi-agent” pipeline, and a React + TypeScript + Vite frontend with Tailwind CSS for layout and theming. Beginners can start with zero API keys using Edge TTS or gTTS; advanced users can plug in OpenAI, ElevenLabs, or Replicate and compare quality, latency, and cost. The codebase is structured so you can read main.py for HTTP contracts and frontend/src for UI composition patterns you can reuse elsewhere.
- Live Demo: https://blog-reader-tts.vercel.app/
- Backend Live Demo: https://blog-audio-backend.arnobmahmud.com
## Table of Contents
- Keywords at a glance
- Provider comparison
- Architecture
- Features
- Technology stack and dependencies
- Project structure
- Frontend routes, pages, and components
- Backend overview
- API reference
- Environment variables (.env)
- Getting started
- How to run (development)
- Build, preview, and lint
- Learner walkthrough
- Reusing pieces in other projects
- Code snippets
- Multi-agent pipeline (SSE)
- Further documentation
- Conclusion
## Keywords at a glance
| Keyword | Short meaning in this project |
|---|---|
| TTS | Text-to-speech: synthesizing spoken audio from text. |
| SSE | Server-Sent Events: one-way stream from server to browser (used for pipeline progress). |
| FastAPI | Modern Python web framework; auto OpenAPI docs at /docs. |
| Vite | Fast dev server and bundler for the React frontend. |
| Chunking | Splitting long text into sentence-safe segments, synthesizing each, then merging audio. |
| Provider | A concrete TTS engine (e.g. Edge TTS, OpenAI) selected by the user. |
| Pipeline mode | Optional multi-step flow with streamed status updates per “agent” stage. |
| Simple mode | Single POST /api/convert request returning an MP3 file. |
| CORS | Cross-Origin Resource Sharing; required when the SPA and API are on different origins. |
| Sentry | Optional error monitoring for the browser (see frontend/src/sentry.ts and .env.example). |
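As a mental model for the chunking row above, a sentence-safe splitter can be sketched in a few lines. This is an illustrative sketch, not the backend's actual implementation; `max_chars` and the function name are assumptions:

```python
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text at sentence boundaries so no chunk exceeds max_chars
    (a single sentence longer than max_chars is kept whole)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # start a new chunk when appending would overflow the budget
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized separately and the resulting audio segments are concatenated, which is why splits must land on sentence boundaries rather than mid-word.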
## Provider comparison

| Provider | Status | Free Tier | API Key | Quality | Speed Control |
|---|---|---|---|---|---|
| Edge TTS | Working | Unlimited | No | Neural (very good) | Yes |
| gTTS | Working | Unlimited | No | Basic | No |
| ElevenLabs | Partial | 10k credits/mo | Yes | Premium | No |
| Hugging Face | Unavailable | N/A | Optional | N/A | No |
| Replicate | Paid | Limited free runs | Yes | High | No |
| OpenAI | Paid | $5 new account | Yes | Premium | Yes |
See docs/API_KEY_LIMITATION_AND_SITUATION.md for detailed provider status, billing notes, and troubleshooting.
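For illustration, the comparison above can be mirrored as a small registry dict, similar in spirit to what `/api/providers` returns. The field names here are assumptions, not the real response schema (Hugging Face is omitted because it is currently unavailable):

```python
# Hypothetical provider registry mirroring the comparison table.
PROVIDERS = {
    "edge-tts":   {"needs_key": False, "speed_control": True},
    "gtts":       {"needs_key": False, "speed_control": False},
    "elevenlabs": {"needs_key": True,  "speed_control": False},
    "replicate":  {"needs_key": True,  "speed_control": False},
    "openai":     {"needs_key": True,  "speed_control": True},
}

def available_without_key() -> list[str]:
    """Providers a first-time user can try with zero configuration."""
    return [name for name, cfg in PROVIDERS.items() if not cfg["needs_key"]]
```

This is why the README can promise a zero-key quick start: two of the six providers need no credentials at all.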
## Architecture

┌─────────────────────────────────────────────────┐
│ Frontend (React) │
│ ReaderPage.tsx: tabs, provider select, pipeline │
│ SSE stepper, history, health indicators │
└──────────────────────┬──────────────────────────┘
│ /api/*
┌──────────────────────▼──────────────────────────┐
│ Backend (FastAPI) │
│ │
│ ┌── Simple Mode ──────────────────────────┐ │
│ │ POST /api/convert → TTS → FileResponse │ │
│ └─────────────────────────────────────────┘ │
│ │
│ ┌── Pipeline Mode (SSE) ──────────────────┐ │
│ │ Extractor → Analyzer → Preprocessor │ │
│ │ → Optimizer → Synthesizer → Validator │ │
│ │ → Assembler → Audio │ │
│ └─────────────────────────────────────────┘ │
│ │
│ TTS Providers: │
│ edge-tts │ gTTS │ ElevenLabs │ HuggingFace │
│ Replicate │ OpenAI │
└──────────────────────────────────────────────────┘

Data flow (mental model): the browser loads the SPA. For API calls it uses either a relative `/api/...` path (local dev: Vite proxies to http://localhost:8000) or an absolute origin from `VITE_API_BASE_URL` (typical production: static site on Vercel + API on a VPS). The backend writes temporary files under `audio_files/` and exposes them via `/api/audio/{file_id}`.
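A minimal sketch of how generated files could be keyed by id under `audio_files/`. The helper name and the `.mp3` suffix convention are assumptions; see `main.py` for the real logic:

```python
import uuid
from pathlib import Path

AUDIO_DIR = Path("audio_files")

def new_audio_path() -> tuple[str, Path]:
    """Mint an opaque file id and the on-disk path it maps to,
    so /api/audio/{file_id} can resolve a download later."""
    file_id = uuid.uuid4().hex
    return file_id, AUDIO_DIR / f"{file_id}.mp3"
```

Serving files by opaque id (rather than by user-supplied filename) also avoids path-traversal issues in the download endpoint.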
## Features

| Feature | Description |
|---|---|
| 6 TTS Providers | Edge TTS, gTTS (both free), ElevenLabs, Hugging Face, Replicate, OpenAI |
| Multi-Agent Pipeline | 7-stage pipeline with SSE real-time progress (Extractor → Assembler) |
| Text Chunking | Auto-splits long text at sentence boundaries, concatenates audio chunks |
| Provider Health | Green/yellow/red status dots with per-provider notes |
| Dynamic Voices | /api/voices/{provider} fetches available voices (Edge: 400+, ElevenLabs: user voices) |
| Conversion History | localStorage-based history with playback and re-download |
| Pipeline Mode Toggle | Switch between simple (fast) and pipeline (multi-agent) modes |
| OpenAI Model Selection | Choose tts-1, tts-1-hd, or gpt-4o-mini-tts |
| Speed Control | 0.5x-2.0x for Edge TTS and OpenAI |
| URL Extraction | Scrape blog articles and convert to audio |
| Sample Texts | Quick-test presets (news, poetry, technology, story) |
| Error Handling | Provider-specific error messages with actionable suggestions |
## Technology stack and dependencies

### Backend (Python)
| Package / area | Role in this project |
|---|---|
| fastapi, uvicorn | HTTP API, async-capable server. |
| python-dotenv | Loads .env next to main.py for API keys and CORS. |
| python-multipart, aiofiles | Form uploads and file streaming for audio. |
| requests, beautifulsoup4 | URL fetch and HTML parsing for /api/extract-text. |
| edge-tts, gTTS | Free TTS providers (no keys). |
| openai, elevenlabs, replicate, huggingface_hub | Paid or optional providers. |
| pydub (+ audioop-lts on Python 3.13+) | Audio concatenation and format handling. |
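To see what the `requests` + `beautifulsoup4` pair does conceptually for `/api/extract-text`, here is a dependency-free sketch using only the stdlib `html.parser`. The real backend uses BeautifulSoup; this class is purely illustrative:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text while skipping <script>/<style> contents --
    a stdlib-only stand-in for what beautifulsoup4 does here."""
    def __init__(self) -> None:
        super().__init__()
        self._skip = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

parser = TextExtractor()
parser.feed("<html><body><script>track()</script><p>Hello blog</p></body></html>")
text = " ".join(parser.parts)
```

The key idea carries over: extraction is about keeping readable text and dropping script, style, and navigation noise before synthesis.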
### Frontend

| Package | Role in this project |
|---|---|
| react, react-dom | UI library (React 19). |
| react-router-dom | Client routes: /, /app, /health. |
| typescript | Static typing for safer refactors. |
| vite, @vitejs/plugin-react | Dev server, HMR, production build. |
| tailwindcss, @tailwindcss/vite | Utility-first styling. |
| @radix-ui/react-* | Accessible primitives (tabs, select, dialog, etc.). |
| class-variance-authority, clsx, tailwind-merge | Variant styling and conditional className merging (cn() helper). |
| framer-motion | Motion on landing and UI polish. |
| lucide-react | Icon set. |
| sonner | Toast notifications. |
| @sentry/react | Optional client error reporting. |
## Project structure
blog-to-audio/
├── main.py # FastAPI: routes, TTS, pipeline, health, monitoring tunnel
├── requirements.txt # Python dependencies
├── .env.example # Backend env template (copy → `.env`)
├── Dockerfile # Backend container image
├── package.json # Root: `npm run lint` → frontend ESLint
├── README.md # This file
├── docs/
│ ├── API_KEY_LIMITATION_AND_SITUATION.md # Provider matrix & troubleshooting
│ ├── COOLIFY_PUBLIC_BACKEND_GUIDE.md
│ ├── DOCKER_VPS_BACKEND_PLAYBOOK.md
│ ├── Redis_Sentry_PostHog_INTEGRATION_GUIDE.md
│ ├── UI_STYLING_GUIDE.md
│ ├── VERCEL_PRODUCTION_GUARDRAILS.md
│ ├── SAFE_IMAGE_REUSABLE_COMPONENT.md
│ └── RIPPLE_BUTTON_EFFECT.md
└── frontend/
├── index.html # SPA shell + SEO meta
├── vite.config.ts # React plugin, Tailwind, `/api` proxy → :8000
├── package.json
├── public/ # favicon, fonts, images, robots.txt
└── src/
├── main.tsx # React root + Sentry init
├── App.tsx # Router + ErrorBoundary
├── sentry.ts # Sentry browser SDK wiring
├── index.css # Tailwind entry + global utilities
├── pages/
│ ├── IntroPage.tsx # Landing / portfolio intro
│ ├── ReaderPage.tsx # Main TTS tool (tabs, SSE, history)
│ └── HealthPage.tsx # Provider health view
├── components/
│ ├── layout/ # RootLayout, Footer, PageBackground, BackendDocLinks
│ ├── ui/ # Button, Card, Tabs, Select, … (Radix + CVA)
│ └── audio/ # AudioPlayerWithVisualizer
├── hooks/ # usePrefersReducedMotion
└── lib/ # utils.ts (`cn`), api-base.ts (`apiUrl`, `getApiBaseUrl`)

## Frontend routes, pages, and components

| Route | Page | Purpose |
|---|---|---|
| `/` | `IntroPage.tsx` | Portfolio-style landing; links into the app. |
| `/app` | `ReaderPage.tsx` | Full TTS experience: URL/text input, provider selection, simple vs pipeline, history. |
| `/health` | `HealthPage.tsx` | Reads backend health / provider status for debugging or demos. |
| `*` | redirect → `/` | Unknown paths fall back to home. |
**Important files for learners**

- `frontend/src/lib/api-base.ts` — Central place for API URLs. In dev, leave `VITE_API_BASE_URL` unset so `fetch('/api/...')` hits the Vite proxy.
- `frontend/src/pages/ReaderPage.tsx` — Largest UI module: forms, SSE client, state for provider and audio.
- `frontend/src/components/ui/*` — Small, composable pieces (e.g. `Button`, `Tabs`) you can copy into another design system with minimal changes if you keep the same Radix + `cn()` pattern.
## Backend overview

- The FastAPI app is created in `main.py` with the title "Blog to Audio API".
- CORS reads `CORS_ORIGINS` (comma-separated). If unset, it allows `*` (fine for local experiments; tighten for production).
- Static audio is stored under `audio_files/` and served through `/api/audio/{file_id}`.
- Heavy or blocking TTS work may use `run_in_threadpool` so the event loop stays responsive while still exposing async routes.
- Interactive docs: run the backend and open `/docs` (Swagger UI) to try every endpoint with forms.
## API reference

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/providers` | Provider configs with status, badges, and notes. |
| GET | `/api/provider-health` | Per-provider health (green / yellow / red). |
| GET | `/api/voices/{provider}` | Voice list for a provider (optional `api_key` query for key-in-browser flows). |
| GET | `/api/sample-texts` | Built-in sample paragraphs for quick tests. |
| GET | `/api/health` | Liveness JSON (useful for uptime checks). |
| GET | `/api/audio/{file_id}` | Download or stream a generated file by id. |
| POST | `/api/extract-text` | Extract readable text from a blog URL. |
| POST | `/api/estimate` | Estimate duration / cost hints before conversion. |
| POST | `/api/convert` | Simple mode: multipart form → MP3 `FileResponse`. |
| POST | `/api/convert-pipeline` | Pipeline mode: `StreamingResponse` (SSE) with staged progress. |
| POST | `/api/monitoring` | Sentry tunnel (optional): browser envelopes forwarded to Sentry when configured (see `.env.example`). |
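For intuition about `/api/estimate`, a rough words-per-minute heuristic might look like this. The 150 wpm default is an assumed average speaking rate, not the backend's actual formula:

```python
def estimate_duration_seconds(text: str, words_per_minute: int = 150) -> float:
    """Guess spoken duration from word count; a pre-flight hint only,
    real duration varies by provider, voice, and speed setting."""
    words = len(text.split())
    return round(words / words_per_minute * 60, 1)
```

An estimate like this lets the UI warn users before they submit a very long article for conversion.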
Simple convert (curl):

```bash
curl -X POST http://localhost:8000/api/convert \
  -F "text=Hello world" \
  -F "provider=edge-tts" \
  -F "voice=en-US-AriaNeural" \
  --output audio.mp3
```

Pipeline convert (SSE stream):

```bash
curl -N -X POST http://localhost:8000/api/convert-pipeline \
  -F "text=Hello world" \
  -F "provider=edge-tts"
```

You do not need any `.env` file to try the project with Edge TTS or gTTS; those providers use no API keys.
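The pipeline endpoint streams `data: {...}` lines. A minimal parser for a captured SSE body could look like this; the event payload shape in the demo string is an assumption, so inspect the real stream in your browser's DevTools:

```python
import json

def parse_sse_events(raw: str) -> list[dict]:
    """Extract the JSON payloads from 'data: ...' lines of an SSE body."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

# Demo body with hypothetical field names:
demo = ('data: {"stage": "extractor", "status": "done"}\n\n'
        'data: {"stage": "assembler", "status": "done"}\n')
events = parse_sse_events(demo)
```

The frontend does the equivalent incrementally with `fetch` and `response.body.getReader()` rather than buffering the whole response.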
## Environment variables (.env)

### Backend (`.env` next to `main.py`)

| Variable | Required? | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | Only for OpenAI TTS | From OpenAI API keys. |
| `ELEVENLABS_API_KEY` | Only for ElevenLabs | From ElevenLabs settings. |
| `REPLICATE_API_TOKEN` | Only for Replicate | From Replicate account tokens. |
| `HF_API_KEY` | Optional | Hugging Face token; provider currently unavailable for TTS here, so it is safe to skip. |
| `SENTRY_TUNNEL_PROJECT_IDS` | Optional | Comma-separated numeric project ids allowed to post to `/api/monitoring`. |
| `CORS_ORIGINS` | Recommended in production | e.g. `https://blog-reader-tts.vercel.app`; required when the SPA POSTs to your API from another origin (tunnel, fetches). |
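The "Only for X" rows above amount to a simple gating rule: a keyed provider is usable only when its key is present. A sketch, with env variable names taken from the table and a hypothetical helper:

```python
# Map keyed providers to the env var each one requires.
KEY_FOR_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
}

def provider_enabled(env: dict, provider: str) -> bool:
    """Keyless providers are always enabled; keyed ones need their var set."""
    required = KEY_FOR_PROVIDER.get(provider)
    return required is None or bool(env.get(required))
```

This is the "progressive enhancement" pattern the conclusion mentions: free providers work out of the box, paid ones light up when keys appear.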
### Frontend

| Variable | Required? | Purpose |
|---|---|---|
| `VITE_API_BASE_URL` | Optional locally | Full API origin without trailing slash. Leave unset in dev so Vite proxies `/api` to port 8000; set on Vercel to your public FastAPI URL. |
| `VITE_SENTRY_DSN` | Optional | Browser DSN from Sentry. |
| `VITE_SENTRY_RELEASE` / `VERCEL_GIT_COMMIT_SHA` | Optional | Release name for Sentry (Vercel often provides the SHA). |
| `VITE_SENTRY_ENVIRONMENT`, `VITE_SENTRY_TRACES_SAMPLE_RATE` | Optional | Fine-tuning Sentry behavior. |
How to obtain keys (quick path): create an account on each provider site above, generate a secret key, paste it into `.env`, and restart uvicorn. Never commit `.env` or `.env.local`; both are gitignored in typical setups.
## Getting started

Prerequisites: Python 3.12+, Node.js 18+, pip, and npm.
```bash
git clone https://github.com/arnobt78/blog-to-audio.git
cd blog-to-audio

# Backend dependencies
pip install -r requirements.txt

# Frontend dependencies
cd frontend && npm install && cd ..
```

## How to run (development)

Terminal 1 — backend:

```bash
cd blog-to-audio
uvicorn main:app --reload --port 8000
```

Terminal 2 — frontend:

```bash
cd blog-to-audio/frontend
npm run dev
```

Open http://localhost:5173 and navigate to `/app` for the reader. API calls use `/api/...`, which Vite forwards to http://localhost:8000 (see `frontend/vite.config.ts`).
Optional: visit http://localhost:8000/docs for interactive API documentation.
## Build, preview, and lint

```bash
# Production build (frontend)
cd frontend
npm run build
npm run preview   # local preview of dist/

# ESLint
npm run lint      # in frontend/
# or from the repo root:
cd ..
npm run lint
```

## Learner walkthrough

- Start both servers (backend first is a good habit so `/api` never 502s during page load).
- Open `/app`, choose Edge TTS, pick a voice, paste short text, and click convert — no `.env` required.
- Open `/docs` on the backend and execute the same `POST /api/convert` to see the raw HTTP contract.
- Toggle pipeline mode in the UI and watch SSE chunks arrive (the app uses `fetch` + `response.body.getReader()`, not `EventSource`; in DevTools → Network, inspect the `convert-pipeline` response stream).
- Try URL extraction: paste an article URL and confirm cleaned text appears before synthesis.
- Add one provider key to `.env`, restart uvicorn, and compare OpenAI vs Edge quality on the same paragraph.
- Read `ReaderPage.tsx` in small chunks: find the `fetch` for `/api/convert-pipeline`, follow the `ReadableStream` reader loop, and map each block to a row in the API reference table above.
## Reusing pieces in other projects

| Piece | How to reuse |
|---|---|
| `apiUrl()` / `getApiBaseUrl()` | Copy `frontend/src/lib/api-base.ts` into any Vite + React app that talks to a separate API; set `VITE_API_BASE_URL` in production. |
| `cn()` helper | Copy the `frontend/src/lib/utils.ts` pattern (clsx + tailwind-merge) for conflict-free Tailwind classes. |
| UI primitives | `components/ui/button.tsx`, `card.tsx`, etc. follow shadcn-style APIs; port them with Radix installed and your theme tokens. |
| FastAPI patterns | Provider abstraction, `StreamingResponse` for SSE, and `FileResponse` for downloads are self-contained in `main.py`; split them into routers/modules if you fork for a larger service. |
| Dockerfile | Use as-is or extend for Coolify / VPS deploys (see `docs/DOCKER_VPS_BACKEND_PLAYBOOK.md`). |
## Code snippets

Call the API from React (uses the proxy in dev):

```ts
import { apiUrl } from "@/lib/api-base";

const res = await fetch(apiUrl("/api/providers"));
const providers = await res.json();
```

Minimal Python client (simple convert):

```python
import requests

r = requests.post(
    "http://localhost:8000/api/convert",
    data={"text": "Hello from Python", "provider": "edge-tts", "voice": "en-US-AriaNeural"},
)
r.raise_for_status()  # fail loudly instead of saving an error page as MP3
with open("out.mp3", "wb") as f:
    f.write(r.content)
```

## Multi-agent pipeline (SSE)

The pipeline mode processes text through seven stages (each can emit SSE events for the UI):
- Extractor — Resolves URL or validates pasted text.
- Analyzer — Language / length / rough duration estimates.
- Preprocessor — Cleans noise, prepares chunks.
- Optimizer — Picks provider parameters and chunk strategy.
- Synthesizer — Calls TTS per chunk with retries where applicable.
- Validator — Sanity-checks generated audio.
- Assembler — Concatenates chunks (with small gaps if configured) into one MP3.
The frontend shows a stepper driven by these events so learners can see how async workflows are modeled as a state machine.
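The seven stages above can be modeled as a generator that yields one progress event per agent, which is essentially what the SSE stepper consumes. The stage names come from this README; the event fields are assumptions about what each SSE message might carry:

```python
STAGES = ["extractor", "analyzer", "preprocessor", "optimizer",
          "synthesizer", "validator", "assembler"]

def run_pipeline(text: str):
    """Yield one progress event per agent stage, mimicking the SSE feed
    that drives the frontend stepper (real agent logic lives in main.py)."""
    for step, stage in enumerate(STAGES, start=1):
        yield {"stage": stage, "step": step, "total": len(STAGES)}

events = list(run_pipeline("Hello world"))
```

Modeling the flow as a generator is what makes streaming natural: each `yield` maps to one `data:` line written into the `StreamingResponse`.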
## Further documentation

| Document | Topic |
|---|---|
| docs/API_KEY_LIMITATION_AND_SITUATION.md | Providers, billing, errors |
| docs/DOCKER_VPS_BACKEND_PLAYBOOK.md | Docker / VPS deployment |
| docs/COOLIFY_PUBLIC_BACKEND_GUIDE.md | Coolify + public API |
| docs/VERCEL_PRODUCTION_GUARDRAILS.md | Frontend production notes |
| docs/Redis_Sentry_PostHog_INTEGRATION_GUIDE.md | Observability integrations |
| docs/UI_STYLING_GUIDE.md | Styling conventions |
## Conclusion
Blog-to-audio is a practical bridge between web scraping, REST + streaming APIs, and audio ML services. Working through it teaches how to combine React state management with long-running server tasks, how to design multipart and SSE endpoints in FastAPI, and how to offer progressive enhancement: free providers by default, paid providers when keys exist. Use the live demos as reference behavior, then fork and simplify (e.g. strip pipeline mode) if you want a smaller codebase for teaching.
This project is licensed under the MIT License. Feel free to use, modify, and distribute the code as per the terms of the license.
This is an open-source project - feel free to use, enhance, and extend this project further!
If you have any questions or want to share your work, reach out via GitHub or my portfolio at https://www.arnobmahmud.com.
Enjoy building and learning! 🚀
Thank you! 😊






