The video processor supports harvesting videos from the Virginia Senate's YouTube channel (@SenateofVirginia) in addition to the Granicus platform. This provides redundancy and captures videos that may only be published on YouTube.
Channel: @SenateofVirginia
Channel ID: UC9r1OpPhTY1VmL05bemQD0w
Source Identifier: senate-youtube (vs senate for Granicus)
The system uses the YouTube Data API v3 to fetch video metadata and yt-dlp to download videos.
YouTubeApiClient (src/Scraper/YouTube/YouTubeApiClient.php)
- Interfaces with YouTube Data API v3
- Fetches video listings and detailed metadata
- Parses ISO 8601 durations (PT1H23M45S → seconds)
- Tracks API quota usage
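The duration conversion listed above (PT1H23M45S → seconds) can be sketched in shell as follows. This is an illustration only, not the PHP code in YouTubeApiClient: the function name `iso8601_to_seconds` is hypothetical, and the sketch assumes durations made of hours, minutes, and seconds only. Anything else, such as a day component, is rejected rather than guessed at.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): convert an ISO 8601
# duration of the form PT#H#M#S into seconds.
# Prints the total, or returns non-zero for unsupported input.
iso8601_to_seconds() {
  printf '%s\n' "$1" | awk '
    {
      # Accept only hours, minutes, and seconds, in that order.
      if ($0 !~ /^PT([0-9]+H)?([0-9]+M)?([0-9]+S)?$/) exit 1
      rest = substr($0, 3)          # drop the leading "PT"
      total = 0
      # Consume one "<number><unit>" component per iteration.
      while (match(rest, /^[0-9]+[HMS]/)) {
        n = substr(rest, 1, RLENGTH - 1) + 0
        unit = substr(rest, RLENGTH, 1)
        total += n * (unit == "H" ? 3600 : (unit == "M" ? 60 : 1))
        rest = substr(rest, RLENGTH + 1)
      }
      print total
    }'
}

iso8601_to_seconds "PT1H23M45S"   # prints 5025
```

Rejecting unknown shapes keeps a malformed duration from silently becoming 0 seconds in a video record.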
SenateYouTubeScraper (src/Scraper/Senate/SenateYouTubeScraper.php)
- Implements VideoSourceScraperInterface
- Extracts committee names from titles
- Parses dates from video metadata
- Detects event types (committee/subcommittee/floor)
- Returns standardized video records
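Event type detection from a title might look roughly like the sketch below. The keywords, the `floor` fallback, and the function name `detect_event_type` are all assumptions made for illustration; the actual rules live in SenateYouTubeScraper and are not documented here. The sample titles in the comments are invented.

```shell
#!/bin/sh
# Hypothetical classifier: keyword rules and the "floor" fallback are
# assumptions, not the scraper's real logic.
detect_event_type() (
  # Lowercase the title so matching is case-insensitive.
  title=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$title" in
    # Test "subcommittee" first: the word also contains "committee".
    *subcommittee*) echo "subcommittee" ;;
    *committee*)    echo "committee" ;;
    *)              echo "floor" ;;
  esac
)

# Invented example titles:
detect_event_type "Education Subcommittee on Health"   # prints subcommittee
detect_event_type "Finance Committee"                  # prints committee
detect_event_type "Regular Session"                    # prints floor
```

The ordering of the patterns is the one point that matters regardless of the real keyword list: the more specific term has to be checked before the term it contains.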
VideoDownloadProcessor (src/Fetcher/VideoDownloadProcessor.php)
- Enhanced with yt-dlp support
- Detects YouTube URLs by domain
- Downloads best quality MP4 with audio
- Automatically downloads English captions
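The download behaviour described above can be approximated from the command line. The sketch below is not the code in VideoDownloadProcessor: the function names, the accepted host list, and the exact format selector are assumptions. The yt-dlp options themselves (`--cookies`, `-f`, `--merge-output-format`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, `--sub-format`, `-o`) are standard ones.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a URL points at YouTube by looking
# only at its host. The accepted host list is an assumption.
is_youtube_url() (
  host=${1#*://}           # drop the scheme
  host=${host%%[/?#]*}     # drop path, query, and fragment
  host=${host##*@}         # drop any userinfo
  host=${host%%:*}         # drop the port
  host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
  case "$host" in
    youtube.com|*.youtube.com|youtu.be) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Hypothetical wrapper: $1 = video URL, $2 = cookies file, $3 = output dir.
# Prefers MP4 video plus M4A audio, merges to MP4, and writes English
# captions (manual or automatic) as WebVTT.
download_youtube_video() {
  yt-dlp \
    --cookies "$2" \
    -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]' \
    --merge-output-format mp4 \
    --write-subs --write-auto-subs --sub-langs en --sub-format vtt \
    -o "$3/%(id)s.%(ext)s" \
    "$1"
}
```

Matching on the parsed host rather than on a substring of the whole URL avoids treating a URL such as `https://example.com/youtube.com` as a YouTube link.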
YouTube API → SenateYouTubeScraper → VideoScraper → JSON snapshots → Pipeline → Database
YouTube video downloads require yt-dlp:
# macOS (Homebrew)
brew install yt-dlp
# Ubuntu/Debian
pip install yt-dlp
# Or download binary from:
# https://github.com/yt-dlp/yt-dlp/releases
Verify installation:
which yt-dlp
yt-dlp --version
YouTube requires authentication cookies to bypass bot detection. You must export cookies from your local browser and upload them to the server.
- Install the "Get cookies.txt LOCALLY" extension
- Export cookies:
  - Visit https://youtube.com and ensure you're logged in
  - Click the extension icon
  - Click "Export" to download cookies.txt
- Upload to server:
  scp cookies.txt ubuntu@your-server:/home/ubuntu/youtube-cookies.txt
- Verify the file exists:
  ssh ubuntu@your-server ls -lh /home/ubuntu/youtube-cookies.txt
Important: Cookies typically last for several months before expiring. The system will automatically detect when cookies expire and log a critical error.
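A quick sanity check of the uploaded cookie file can catch problems before a download fails. The sketch below is a suggestion rather than part of the system: the function name `check_cookie_file` and the 60-day default are assumptions, and the header test relies on the convention that Netscape-format cookie files begin with a line containing "HTTP Cookie File".

```shell
#!/bin/sh
# Hypothetical pre-flight check for the cookie file.
# $1 = path to cookies file, $2 = maximum age in days (default 60, assumed).
check_cookie_file() (
  file=$1
  max_age_days=${2:-60}
  # Fail if the file is missing or empty.
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    exit 1
  fi
  # Warn (but do not fail) if the first line lacks the usual header.
  if ! head -n 1 "$file" | grep -q 'HTTP Cookie File'; then
    echo "warning: $file does not look like a Netscape cookies.txt" >&2
  fi
  # find prints the path only when the file is older than the threshold.
  if [ -n "$(find "$file" -mtime +"$max_age_days")" ]; then
    echo "warning: $file is older than $max_age_days days" >&2
  fi
  exit 0
)

# Example (run on the server):
#   check_cookie_file /home/ubuntu/youtube-cookies.txt
```

A file can pass this check and still hold expired cookies, so it complements the automatic detection rather than replacing it.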
Create Google Cloud Project:
- Go to https://console.cloud.google.com/
- Click "Select a project" → "New Project"
- Name: "Virginia Legislature Video Scraper"
- Click "Create"
Enable YouTube Data API v3:
- In the project, go to "APIs & Services" → "Library"
- Search for "YouTube Data API v3"
- Click on it and click "Enable"
Create API Key:
- Go to "APIs & Services" → "Credentials"
- Click "Create Credentials" → "API Key"
- Copy the generated key
- Click "Restrict Key" (recommended)
- Under "API restrictions", select "Restrict key"
- Choose "YouTube Data API v3"
- Save
Configure in Application:
- Open includes/settings.inc.php
- Add: define('YOUTUBE_API_KEY', 'YOUR_KEY_HERE');
- Never commit the key to version control
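To confirm that a new key works before running the scraper, one option is a single request to the `channels` endpoint of the YouTube Data API v3 for the Senate channel ID. The helper name `youtube_channel_check_url` and the `YOUTUBE_API_KEY` shell variable are assumptions for this sketch. The request is left as a comment so that nothing is sent until you run it yourself.

```shell
#!/bin/sh
# Hypothetical helper: build a channels.list request URL.
# $1 = channel ID, $2 = API key.
youtube_channel_check_url() {
  printf 'https://www.googleapis.com/youtube/v3/channels?part=snippet&id=%s&key=%s\n' "$1" "$2"
}

# Example, assuming the key is exported as YOUTUBE_API_KEY in your shell:
#   curl -fsS "$(youtube_channel_check_url UC9r1OpPhTY1VmL05bemQD0w "$YOUTUBE_API_KEY")"
# A working key should return JSON describing the channel; with -f, curl
# exits non-zero when the API answers with an HTTP error.
```

Keep the key out of shell history and scripts committed to the repository, for the same reason it stays out of version control.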
YouTube cookies typically last several months before expiring. When they expire:
- Automatic Detection:
  - The system detects "Sign in to confirm you're not a bot" errors
  - Logs a CRITICAL error (severity 7) with clear instructions
  - Throws YouTubeCookiesExpiredException
  - Stops all YouTube download attempts
- You'll See in Logs:
  CRITICAL: YouTube cookies have expired or are invalid. Export fresh cookies from your browser using "Get cookies.txt LOCALLY" extension and upload to /home/ubuntu/youtube-cookies.txt
- Impact:
  - YouTube videos won't download until cookies are refreshed
  - House and Senate Granicus videos continue processing normally
  - No data loss or corruption
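The detection step can be pictured as a pattern match on yt-dlp's error output. The sketch below is a shell illustration with hypothetical function names and a hypothetical return code; the real system does this in PHP and throws YouTubeCookiesExpiredException instead.

```shell
#!/bin/sh
# Hypothetical check: $1 = captured stderr text from a yt-dlp run.
# The apostrophe in "you're" may be straight or typographic, so the
# pattern allows one to three bytes in that position.
is_cookie_expiry_error() {
  printf '%s\n' "$1" | grep -Eq 'Sign in to confirm you.{1,3}re not a bot'
}

# Hypothetical handler: return 2 to signal "stop YouTube downloads",
# 1 for any other failure. These codes are assumptions for the sketch.
handle_download_error() {
  if is_cookie_expiry_error "$1"; then
    echo "CRITICAL: YouTube cookies have expired or are invalid." >&2
    return 2
  fi
  return 1
}
```

Separating the cookie case from other failures is what lets YouTube downloads stop while Granicus processing carries on.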
Quick Process:
# 1. On your local machine: Export cookies using browser extension
# 2. Upload to server
scp cookies.txt ubuntu@your-server:/home/ubuntu/youtube-cookies.txt
# 3. Optional: Restart video processor if currently running
ssh ubuntu@your-server
sudo systemctl restart video-pipeline.service
Verification:
# Check file exists and is recent
ssh ubuntu@your-server
ls -lh /home/ubuntu/youtube-cookies.txt
# Should show file size around 10-50 KB
# Date should be recent (today)
Check logs for cookie expiration:
# View recent critical errors
grep "CRITICAL.*YouTube cookies" /var/log/video-processor.log
# Monitor for YouTubeCookiesExpiredException
grep "YouTubeCookiesExpiredException" /var/log/video-processor.log
Set up alerts (optional):
- Monitor logs for severity 7 errors
- Alert when YouTubeCookiesExpiredException appears
- Reminder to refresh cookies every 2-3 months
The YouTube scraper is automatically included when running the pipeline:
# Scrape all sources (including YouTube)
php bin/scrape.php
# Run full pipeline
php bin/pipeline.php
Output will include videos from three sources:
- House (Granicus)
- Senate (Granicus)
- Senate YouTube
YouTube API has daily quota limits:
- Default quota: 10,000 units/day
- Typical usage: ~150 units per scrape
- Daily capacity: ~66 scrapes
Cost breakdown per scrape:
- Search: 100 units
- Video details: ~50 units (for 50 videos @ 1 unit each)
Monitor quota usage at: https://console.cloud.google.com/
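The daily capacity figure follows directly from the numbers above: 100 units for the search plus 1 unit for each of 50 videos gives 150 units per scrape, and 10,000 / 150 rounds down to 66. A small helper (hypothetical name `scrapes_per_day`) makes it easy to redo the arithmetic if the quota or batch size changes.

```shell
#!/bin/sh
# Hypothetical calculator for daily scrape capacity.
# $1 = daily quota, $2 = search cost in units, $3 = videos per scrape.
scrapes_per_day() (
  daily_quota=$1
  search_cost=$2
  video_count=$3
  # One unit per video detail lookup, as in the cost breakdown above.
  per_scrape=$(( search_cost + video_count ))
  # Integer division rounds down: a partial scrape does not count.
  echo $(( daily_quota / per_scrape ))
)

scrapes_per_day 10000 100 50   # prints 66
```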
Run unit tests:
includes/vendor/bin/phpunit tests/Scraper/YouTubeApiClientTest.php
includes/vendor/bin/phpunit tests/Scraper/SenateYouTubeScraperTest.php
Test scraping:
php bin/scrape.php
Verify output:
# Check for YouTube videos
cat storage/scraper/videos-*.json | jq '.records[] | select(.source=="senate-youtube") | {title, video_url, duration_seconds}'
Test video download:
php bin/fetch_videos.php --limit=1
- src/Scraper/YouTube/YouTubeApiClient.php - YouTube Data API v3 client
- src/Scraper/Senate/SenateYouTubeScraper.php - YouTube scraper
- src/Fetcher/VideoDownloadProcessor.php - Enhanced with yt-dlp support
- tests/Scraper/YouTubeApiClientTest.php - API client tests
- tests/Scraper/SenateYouTubeScraperTest.php - Scraper tests
- tests/fixtures/youtube-live-videos.json - API response fixture
- tests/fixtures/youtube-video-details.json - Video details fixture
- includes/settings-default.inc.php - Added YOUTUBE_API_KEY constant
- bin/scrape.php - Registered SenateYouTubeScraper
- bin/pipeline.php - Added YouTube scraper to pipeline
- ✅ YouTube Data API v3 integration
- ✅ Video fetching from channel
- ✅ Video details retrieval (title, description, duration, thumbnails)
- ✅ ISO 8601 duration parsing
- ✅ Committee name extraction from titles
- ✅ Date extraction from video metadata
- ✅ Event type detection (committee/subcommittee/floor)
- ✅ yt-dlp video download with MP4 format selection
- ✅ Automatic caption download (WebVTT format)
- ✅ API quota tracking and logging
- ✅ Error handling for quota limits and network failures
- ✅ Dual-source operation (Granicus + YouTube)
- ✅ Comprehensive test coverage (10 tests, 57 assertions)
Symptom:
ERROR: Sign in to confirm you're not a bot. Use --cookies-from-browser or --cookies
Cause: YouTube cookies have expired or are missing.
What Happens:
- The system detects this error automatically
- Logs a CRITICAL error at severity level 7
- Throws YouTubeCookiesExpiredException
- Halts further YouTube download attempts
- Granicus videos continue processing normally
Solution - Refresh Cookies:
- Export fresh cookies from your local browser:
  - Install "Get cookies.txt LOCALLY" extension (see Setup section above)
  - Visit https://youtube.com while logged in
  - Click extension icon → Export
  - Save as cookies.txt
- Upload to server:
  scp cookies.txt ubuntu@your-server:/home/ubuntu/youtube-cookies.txt
- Restart the video processor:
  ssh ubuntu@your-server sudo systemctl restart video-pipeline.service
Prevention:
- Cookies typically last several months
- Check logs regularly for severity 7 errors
- Consider setting up monitoring alerts for YouTubeCookiesExpiredException
Check API key configuration:
php -r "require 'includes/settings.inc.php'; echo defined('YOUTUBE_API_KEY') ? YOUTUBE_API_KEY : 'NOT SET';"
Verify yt-dlp is installed:
which yt-dlp
If not installed, follow installation steps above.
Monitor quota at: https://console.cloud.google.com/
If consistently hitting limits, consider:
- Reducing scrape frequency
- Requesting quota increase from Google
Symptom:
YouTube cookies file not found at: /home/ubuntu/youtube-cookies.txt
Solution: Export and upload cookies file (see Setup section above)
If issues arise, temporarily disable the YouTube scraper:
In bin/scrape.php:
// Comment out YouTube scraper
// $senateYouTube = new SenateYouTubeScraper($http, YOUTUBE_API_KEY ?? '');
// Update VideoScraper to exclude YouTube
$scraper = new VideoScraper([$house, $senateGranicus], $writer, $logger);
In bin/pipeline.php:
// Comment out YouTube scraper
// $senateYouTubeScraper = new SenateYouTubeScraper(...);
// Remove from array_merge
$newRecords = array_merge(
$houseScraper->scrape(),
$senateScraper->scrape()
// $senateYouTubeScraper->scrape()
);
The Granicus scraper continues working normally with no data loss.
Completed: January 16, 2026
Tests: 10 tests, 57 assertions - all passing
Status: Production ready