This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is the elizaOS Knowledge Aggregation System - a central hub for aggregating, processing, and synthesizing knowledge for the elizaOS project using automated workflows and scripts. The system transforms raw data from various sources into actionable intelligence through structured pipelines.
The system follows a daily automated pipeline (scheduled via GitHub Actions):
- External Data Ingestion (01:00 UTC) - Syncs from external repositories
- Context Aggregation (01:30 UTC) - Consolidates all data sources
- Daily Fact Extraction (01:35 UTC) - Extracts key insights using LLM
- Council Briefing Generation (01:50 UTC) - Creates strategic summaries
- HackMD Note Updates (02:30 UTC) - Updates documentation platform
- Enhanced Poster Generation (04:00 UTC) - Creates visual content with ElizaOS branding
- Discord Briefing (04:30 UTC) - Sends daily briefings with current posters
- `scripts/` - All automation scripts organized by function:
  - `scripts/etl/` - Data processing pipeline (aggregate, extract, generate)
  - `scripts/integrations/` - External service integrations (Discord, HackMD)
  - `scripts/posters/` - Visual content generation
  - `scripts/prompts/` - LLM prompt templates
  - `scripts/archive/` - Deprecated scripts (kept for reference)
- `the-council/` - Processed daily data (`aggregated/`, `council_briefing/`, `facts/`, `highlights/`, `retros/`, `summaries/`)
- `hackmd/` - Local backups of generated content
- `ai-news/`, `daily-silk/`, `github/`, `docs/` - Raw data sources
- `.github/workflows/` - Automation workflows
- `rss/` - RSS feeds for facts and council briefings
Run from repository root:
# Aggregate daily data sources
python scripts/etl/aggregate-sources.py [YYYY-MM-DD]
# Extract facts and insights from aggregated data
python scripts/etl/extract-facts.py -i the-council/aggregated/YYYY-MM-DD.json -o the-council/facts/YYYY-MM-DD.json -md hackmd/facts/YYYY-MM-DD.md
# Generate strategic council briefing
python scripts/etl/generate-council-context.py <input_file> <output_file>
# Generate monthly retrospective
python scripts/etl/generate-monthly-retro.py -y 2025 -m 11
# Generate quarterly/annual summary
python scripts/etl/generate-quarterly-summary.py -y 2025 -q 4
# Generate RSS feeds
python scripts/etl/generate-rss.py
# Generate daily editorial highlights from ai-news
python scripts/etl/generate-daily-highlights.py [--date YYYY-MM-DD] [--dry-run]
# Help-reports ETL pipeline (unified script with subcommands)
python scripts/etl/helpers.py extract -y 2025 -m 12 # Extract interactions
python scripts/etl/helpers.py analyze -y 2025 -m 12 # Generate reports
python scripts/etl/helpers.py backfill # Backfill historical
# Create new HackMD notes for prompts
python scripts/integrations/hackmd/create.py [-b BOOK_PERMALINK] [-i LOCAL_DIR_PATH]
# Update HackMD notes with daily content
python scripts/integrations/hackmd/update.py [-d YYYY-MM-DD] [-j] [-v]
# Send Discord briefing
python scripts/integrations/discord/webhook.py path/to/facts.json -d -c CHANNEL_ID -s

# Generate poster from markdown with ElizaOS branding
./scripts/posters/posters-enhanced.sh input.md output.png
# Batch generate all posters
./scripts/posters/run-poster-generation.sh

Limited to Discord bot functionality:
cd scripts/
npm install  # Only needed for Discord.js dependency

Environment variables:
- `OPENROUTER_API_KEY` - For LLM API calls
- `HMD_API_ACCESS_TOKEN` or `HACKMD_API_TOKEN` - For HackMD API
- `DISCORD_BOT_TOKEN` - For Discord posting
- `SELFHST_ICONS_PATH` - (Optional) Path to selfhst/icons repo for reference images
- `GILBARBARA_LOGOS_PATH` - (Optional) Path to gilbarbara/logos repo for SVG logos
- ElizaOS Docs: Technical documentation from `elizaOS/eliza` → `docs/`
- Daily Silk: AI news from Discord → `daily-silk/YYYY-MM-DD.md`
- GitHub Activity: Repository activity logs → `github/stats/` and `github/summaries/`
- AI News: Curated AI news → `ai-news/elizaos/`
- Raw data aggregated to `the-council/aggregated/YYYY-MM-DD.json`
- Facts extracted to `the-council/facts/YYYY-MM-DD.json`
- Strategic briefing generated to `the-council/council_briefing/YYYY-MM-DD.json`
- Editorial highlights curated to `the-council/highlights/YYYY-MM-DD.json` (2-3 top stories)
- Content published to HackMD and Discord
- RSS feeds generated to `rss/feed.xml` and `rss/council.xml`
- JSON: Structured data for API consumption
- Markdown: Human-readable documentation
- Discord Embeds: Rich formatted briefings
- HackMD Notes: Collaborative documentation
- RSS Feeds: Syndication feeds for facts and briefings
The system surfaces contributor intelligence naturally in council briefings without brittle preprocessing:
Data Location:
- Contributor lifetime statistics: `github/api/summaries/contributors/{username}/lifetime.json`
- Each file contains ownership percentages, contribution domains, review networks, and activity patterns
- Synced daily from external repository (1,517 contributor profiles as of Jan 2026)
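A minimal sketch of reading one of these per-contributor files (only the path layout comes from this document; the parsed structure beyond that should be treated as opaque):

```python
import json
from pathlib import Path

CONTRIB_ROOT = Path("github/api/summaries/contributors")

def load_profile(username: str):
    """Return the parsed lifetime.json for a contributor, or None if absent."""
    path = CONTRIB_ROOT / username / "lifetime.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

Returning `None` for missing profiles keeps callers tolerant of contributors who have no synced file yet.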
How It Works:
- 7-Day Activity Summaries: `aggregate-sources.py` includes recent contributor activity (1,500+ user summaries)
- Natural Discovery: Contributors mentioned in Discord discussions or GitHub activity are automatically contextualized
- LLM-Guided Analysis: Prompts in `extract-facts.py` and `generate-council-context.py` direct the LLM to:
  - Identify key contributors in operational discussions
  - Surface ownership concentration risks (bus factor analysis)
  - Note collaboration patterns and review dependencies
  - Flag emerging contributors or activity anomalies
Design Philosophy:
- Progressive Disclosure: Contributor data exists in structured files; LLMs with tool-calling can access on-demand
- Avoid Preprocessing: No regex parsing of AI-generated markdown; rely on natural mentions in operational logs
- Robust & Scalable: Works with any number of contributors without brittle text parsing
Example Output: Council briefings naturally include context like:
- "lalalune: Author of v2.0.0 branch with 900k+ additions (PR #6351)"
- "Review dependency: 78% reviewed by odilitime"
- "Bus factor: 2 contributors handle 75% of runtime work"
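A bus-factor figure like the one above can be derived from ownership shares with a small helper. This is an illustrative sketch, not the pipeline's actual implementation; it assumes shares are fractions that sum to 1:

```python
def bus_factor(ownership: dict, threshold: float = 0.75) -> int:
    """Smallest number of contributors whose combined ownership share
    reaches the threshold -- a rough concentration-risk signal."""
    covered, count = 0.0, 0
    for share in sorted(ownership.values(), reverse=True):
        covered += share
        count += 1
        if covered >= threshold:
            break
    return count

# Two contributors cover 75% of the (hypothetical) runtime work:
print(bus_factor({"alice": 0.45, "bob": 0.30, "carol": 0.15, "dan": 0.10}))  # → 2
```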
- `aggregate-sources.py`: Reads diverse data sources, creates daily JSON files
- `extract-facts.py`: LLM-powered fact extraction with categorization
- `generate-council-context.py`: Strategic analysis using North Star alignment
- `generate-daily-highlights.py`: Editorial curation of 2-3 highlights from ai-news Full Stories with character voices (see `scripts/etl/README-highlights.md`)
- `generate-monthly-retro.py`: Monthly "State of ElizaOS" council episodes
- `generate-quarterly-summary.py`: Quarterly/annual pattern analysis
- `generate-rss.py`: RSS feed generation
- `discord/webhook.py`: Facts to Discord with smart summarization
- `discord/bot.py`: Council briefing Discord bot
- `hackmd/create.py`: HackMD note creation and management
- `hackmd/update.py`: Daily HackMD content updates
- `extract-entities.py`: Extract entities (tokens, projects, users) from facts with LLM classification
- `generate-icons.py`: Generate icons for entities using Nano Banana Pro (Gemini 3 Image)
- `validate-icons.py`: Validate icons and sync icon_paths to manifest
See scripts/posters/README_ICON_GENERATION.md for detailed icon generation documentation.
# Extract entities from facts (with deduplication)
python scripts/etl/extract-entities.py --dedupe
# Interactive icon generation
python scripts/posters/generate-icons.py -i -t project
# Batch icon generation
python scripts/posters/generate-icons.py --batch project --limit 4
# Validate and sync icons
python scripts/posters/validate-icons.py --sync-only

The scripts/prompts/ directory contains LLM interaction templates:
- `config/north-star.txt` - Mission, core principles, strategic context
- `extraction/facts.txt` - Fact extraction prompt
- `hackmd/comms/` - Communication prompts (Discord, tweets, newsletters)
- `hackmd/dev/` - Developer update prompts
- `hackmd/strategy/` - Strategic analysis prompts
- Use date format YYYY-MM-DD consistently
- Check `the-council/aggregated/daily.json` for latest data
- Verify file existence before processing
- Test with sample data first
- Check error handling for API calls
- Maintain backward compatibility with existing JSON structures
- Follow the established naming conventions (kebab-case for files)
- Script Independence Pattern: Scripts intentionally duplicate some utility code (date handling, LLM calls) to maintain operational isolation. This is a feature, not a bug - it prevents coupling and simplifies debugging in production. Changes to one script cannot break others. When you see duplicated functions across scripts, this is by design for the daily batch pipeline architecture.
- Update sync workflows in `.github/workflows/sync.yml`
- Modify `scripts/etl/aggregate-sources.py` to include the new source
- Update README.md with source documentation
- Test full pipeline with new data
When upstream data needs to be regenerated or pipeline gaps are discovered, follow this backfill process:
# 1. Resync upstream data from M3-org/ai-news (if needed)
for date in 2026-01-{01..13}; do
curl -L -o "ai-news/elizaos/json/$date.json" \
"https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/elizaos/json/$date.json"
curl -L -o "ai-news/elizaos/md/$date.md" \
"https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/elizaos/md/$date.md"
done
# 2. Reaggregate all affected dates
for date in 2026-01-{01..13}; do
python scripts/etl/aggregate-sources.py $date
done
# 3. Re-extract facts (requires OPENROUTER_API_KEY)
for date in 2026-01-{01..13}; do
python scripts/etl/extract-facts.py \
-i the-council/aggregated/$date.json \
-o the-council/facts/$date.json \
-md hackmd/facts/$date.md
sleep 5 # Rate limiting
done
# 4. Regenerate council briefings
for date in 2026-01-{01..13}; do
python scripts/etl/generate-council-context.py \
the-council/aggregated/$date.json \
the-council/council_briefing/$date.json
sleep 5 # Rate limiting
done
# 5. Update RSS feeds
python scripts/etl/generate-rss.py
# 6. (Optional) Regenerate posters - expensive and time-consuming
for date in 2026-01-{01..13}; do
python scripts/posters/illustrate.py -f the-council/facts/$date.json --batch
sleep 10
done

# Check aggregation success rates
for date in 2026-01-{01..13}; do
echo -n "$date: "
jq -r '._metadata | "\(.sources_successful) successful, \(.sources_failed) failed"' \
the-council/aggregated/$date.json
done
# Check facts extraction
for date in 2026-01-{01..13}; do
echo -n "$date: "
jq -r '._metadata | "\(.status) - \(.total_facts // 0) facts"' \
the-council/facts/$date.json
done
# Check poster generation
for date in 2026-01-{01..13}; do
echo -n "$date: "
if [ -f "media/daily/$date/manifest.json" ]; then
jq -r '.stats | "\(.successful)/\(.total_generations) successful"' \
media/daily/$date/manifest.json
else
echo "NO MANIFEST"
fi
done

Issue: extract-facts.py may infer the wrong briefing_date when the LLM sees T-1 (previous day) AI news data.
Example: Facts file 2026-01-13.json contained "briefing_date": "2026-01-12"
Root Cause: The aggregator pulls target_date - timedelta(days=1) for AI news sources, but the LLM infers the date from the data content rather than the target filename.
Workaround: Manually verify briefing_date field in extracted facts matches expected date.
Fix Planned: Add explicit date validation to LLM prompt + post-process validation (see GitHub issue for details).
Issue: Batch poster generation can hit API rate limits (429 errors) when processing multiple dates.
Example: During 13-day backfill, poster generation hit "Resource exhausted" error on 2026-01-09.
Impact: Partial poster sets (4/5 instead of 5/5).
Workaround: Add sleep 10 between dates, or regenerate failed posters individually.
Fix Planned: Implement exponential backoff retry logic with rate limit tracking (see GitHub issue for details).
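The planned backoff logic could look roughly like this sketch, where `RateLimitError` is a stand-in for whatever exception the real API client raises on a 429:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the API client's 429 / resource-exhausted exception."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 2.0):
    """Retry `call` on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # 2s, 4s, 8s, ... plus random jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A caller would wrap each generation, e.g. `with_backoff(lambda: generate_poster(date))` (where `generate_poster` is a hypothetical per-date entry point).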
Issue: Upstream data structure can change without notice (e.g., images: [] → images: null or scalar strings).
Impact: Silent data quality degradation if not detected.
Fix Planned: Schema validation with baseline snapshots to detect breaking changes (see GitHub issue for details).
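A baseline-snapshot comparison can be as simple as recording type signatures and diffing them. This is a sketch of the idea, not the planned implementation:

```python
def type_signature(value):
    """Summarize a JSON value's shape (type names only) for drift comparison."""
    if isinstance(value, dict):
        return {k: type_signature(v) for k, v in value.items()}
    if isinstance(value, list):
        return [type_signature(value[0])] if value else []
    return type(value).__name__

# Baseline snapshot vs. today's data: a changed signature flags drift,
# catching exactly the images-list-to-null change described above.
baseline = type_signature({"images": []})
today = type_signature({"images": None})
print(baseline != today)  # → True
```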
- API Rate Limits: Scripts include retry logic for OpenRouter and HackMD APIs. For poster generation, add delays between batch operations.
- Missing Data: Check if external sync workflows completed successfully. Verify upstream sources are available.
- JSON Format Errors: Validate input JSON structure before processing. Check for schema changes in upstream sources.
- Discord Posting Failures: Verify bot permissions and channel access.
- Wrong Date Fields: LLM may infer dates from content rather than the target date. Verify `briefing_date` matches the expected date.
- Partial Poster Sets: Check `media/daily/YYYY-MM-DD/manifest.json` for generation stats. API rate limits may cause partial generations.
# Verbose output for script debugging
python scripts/integrations/hackmd/update.py -v
python scripts/integrations/hackmd/create.py -v
# Test webhook without posting to Discord
python scripts/integrations/discord/webhook.py facts.json -o output.json
# Check aggregated data quality
jq '._metadata' the-council/aggregated/2026-01-13.json
# Verify facts extraction metadata
jq '._metadata' the-council/facts/2026-01-13.json
# Check poster generation stats
jq '.stats' media/daily/2026-01-13/manifest.json

A comprehensive analysis of the pipeline identified improvements in:
- Data quality validation (schema validation, date field extraction)
- Error handling (retry logic, rate limiting, circuit breakers)
- Observability (metrics collection, health dashboard, alerting)
- Developer experience (setup automation, backfill tooling)
See GitHub issues for implementation roadmap and tracking.
The system is designed to be resilient and self-documenting through its comprehensive logging and structured data approach.