BlackPythonDevs/job-crawler

Job Crawler

A Discord bot that fetches job postings from Greenhouse and Rippling boards, generates AI-powered summaries via Ollama (using Pydantic AI), and posts new listings to a Discord channel. Valkey (Redis-compatible) tracks already-posted jobs to prevent duplicates.
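The duplicate check can be sketched in a few lines. This is a minimal illustration, not the project's state.py: the PostedJobs class, the job-crawler:posted: key prefix, and FakeStore are hypothetical names. FakeStore only imitates the set(name, value, nx=True, ex=seconds) call that redis-py-style clients expose, so the example runs without a server. It also collapses "filter new" and "mark posted" into one atomic call for brevity, which may differ from how the bot actually orders those steps.

```python
import time


class FakeStore:
    """In-memory stand-in for a Valkey/Redis client (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, absolute expiry time)

    def set(self, name, value, nx=False, ex=None):
        now = self._clock()
        entry = self._data.get(name)
        live = entry is not None and entry[1] > now
        if nx and live:
            return None  # key already present: mimic SET ... NX failing
        expires_at = now + ex if ex is not None else float("inf")
        self._data[name] = (value, expires_at)
        return True


class PostedJobs:
    """Remembers job IDs for a fixed TTL (hypothetical helper)."""

    def __init__(self, store, ttl_seconds=7_776_000, prefix="job-crawler:posted:"):
        self._store = store
        self._ttl = ttl_seconds
        self._prefix = prefix

    def claim(self, job_id):
        """Return True the first time a job ID is seen within the TTL."""
        key = f"{self._prefix}{job_id}"
        return bool(self._store.set(key, "1", nx=True, ex=self._ttl))
```

With a real server, the same PostedJobs class should work against any client object that offers a compatible set method; that compatibility is an assumption to verify against the client library in use.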

Architecture

flowchart LR
    subgraph External Services
        GH[Greenhouse API]
        RP[Rippling API]
        DC[Discord Channel]
    end

    subgraph Docker Compose
        subgraph Bot Container
            BOT[Bot / Polling Loop]
            PF[Preflight Checks]
            CFG[Config]
            EMB[Embed Builder]
            SUM[Summarizer — Pydantic AI]
            CTL[Controller]
            GHC[Greenhouse Client]
            RPC[Rippling Client]
        end
        VK[(Valkey)]
        OL[Ollama LLM]
    end

    BOT --> PF
    PF -->|health check| VK
    PF -->|health check| OL
    BOT --> CFG
    BOT --> CTL
    CTL --> GHC
    CTL --> RPC
    GHC -->|fetch jobs| GH
    RPC -->|fetch jobs| RP
    BOT -->|filter new| VK
    BOT --> SUM
    SUM -->|summarize| OL
    BOT --> EMB
    EMB -->|post embed| DC
    BOT -->|mark posted| VK
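
Read as code, the arrows in the diagram amount to one cycle: fetch, filter new, summarize, post, mark posted. The sketch below is a hypothetical illustration of that order; run_cycle, the Job fields, and the injected callables are assumptions and not the project's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Job:
    # Hypothetical fields; the real Job dataclass lives in greenhouse.py.
    id: str
    title: str
    url: str


def run_cycle(
    fetch: Callable[[], Iterable[Job]],
    is_new: Callable[[Job], bool],
    summarize: Callable[[Job], str],
    post: Callable[[Job, str], None],
    mark_posted: Callable[[Job], None],
    limit: Optional[int] = None,
) -> int:
    """Run one cycle and return the number of jobs posted."""
    posted = 0
    for job in fetch():
        if limit is not None and posted >= limit:
            break  # cap reached for this cycle
        if not is_new(job):
            continue  # already posted earlier
        summary = summarize(job)
        post(job, summary)
        mark_posted(job)  # recorded only after the post call returns
        posted += 1
    return posted
```

Passing the steps in as callables keeps the loop testable without Discord, Valkey, or Ollama; whether the bot is structured this way is not stated in the README.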

Prerequisites

  • Python 3.14+
  • A Discord bot token and target channel ID
  • Valkey (or Redis) instance
  • Ollama instance with your preferred model

Quickstart

With Docker Compose

cp .env.example .env
# Edit .env with your values
docker compose up --build

This starts three services: the bot, Valkey, and Ollama.

Local Development

uv sync
cp .env.example .env
# Edit .env with your values
job-crawler

CLI Options

job-crawler              # Run the bot (polls once, posts to Discord, then exits)
job-crawler --dry-run    # Preview jobs locally without Discord
job-crawler --limit 5    # Cap the number of jobs posted per cycle
job-preflight            # Check that Valkey and Ollama are reachable
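
For readers who want to see how flags like these are typically parsed, here is a minimal sketch using the standard library's argparse. It is an assumption that the project parses its flags this way; build_parser and the help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the two flags listed above."""
    parser = argparse.ArgumentParser(prog="job-crawler")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="preview jobs locally without posting to Discord",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        metavar="N",
        help="cap the number of jobs posted per cycle",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"dry_run={args.dry_run} limit={args.limit}")
```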

Configuration

All configuration is read from environment variables. See .env.example for a template.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DISCORD_TOKEN | Yes | | Discord bot token (not required for --dry-run) |
| DISCORD_CHANNEL_ID | Yes | | Channel to post job listings (not required for --dry-run) |
| VALKEY_URL | No | valkey://localhost:6379/0 | Valkey/Redis connection URL |
| JOB_TTL_SECONDS | No | 7776000 (90 days) | How long to remember posted jobs |
| BOARD_URLS | No | Value of GREENHOUSE_BOARD_URL | Comma-separated list of board URLs (Greenhouse and/or Rippling) |
| GREENHOUSE_BOARD_URL | No | Temporal Technologies board | Greenhouse board API endpoint (used as fallback when BOARD_URLS is not set) |
| OLLAMA_BASE_URL | No | http://localhost:11434/v1 | Ollama API URL (read by Pydantic AI) |
| OLLAMA_MODEL | No | ministral-3 | LLM model for summarization |
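
The defaults and the BOARD_URLS fallback can be expressed as a small loader. This is a sketch under stated assumptions, not the contents of config.py: Settings, load_settings, and the dry_run parameter are hypothetical, and PLACEHOLDER_GREENHOUSE_URL stands in for the Temporal Technologies board URL, which this README does not spell out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Placeholder only: the real default is the Temporal Technologies board,
# whose URL is not given in this README.
PLACEHOLDER_GREENHOUSE_URL = "https://example.invalid/greenhouse-board"


@dataclass(frozen=True)
class Settings:
    discord_token: Optional[str]
    discord_channel_id: Optional[str]
    valkey_url: str
    job_ttl_seconds: int
    board_urls: tuple
    ollama_base_url: str
    ollama_model: str


def load_settings(env: Mapping[str, str] = os.environ, dry_run: bool = False) -> Settings:
    """Read settings from env, applying the documented defaults."""
    token = env.get("DISCORD_TOKEN")
    channel = env.get("DISCORD_CHANNEL_ID")
    if not dry_run and not (token and channel):
        raise ValueError(
            "DISCORD_TOKEN and DISCORD_CHANNEL_ID are required unless --dry-run is used"
        )
    # BOARD_URLS falls back to GREENHOUSE_BOARD_URL when unset or empty.
    fallback = env.get("GREENHOUSE_BOARD_URL", PLACEHOLDER_GREENHOUSE_URL)
    raw = env.get("BOARD_URLS") or fallback
    boards = tuple(url.strip() for url in raw.split(",") if url.strip())
    return Settings(
        discord_token=token,
        discord_channel_id=channel,
        valkey_url=env.get("VALKEY_URL", "valkey://localhost:6379/0"),
        job_ttl_seconds=int(env.get("JOB_TTL_SECONDS", "7776000")),  # 90 days
        board_urls=boards,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        ollama_model=env.get("OLLAMA_MODEL", "ministral-3"),
    )
```

Taking the environment as a Mapping argument makes the loader easy to exercise with a plain dict, for example load_settings({"BOARD_URLS": "https://a.example/jobs"}, dry_run=True).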

Testing

uv run pytest

Project Structure

job_crawler/
├── bot.py          # Discord bot, polling loop, and CLI entrypoint
├── config.py       # Environment variable configuration
├── greenhouse.py   # Greenhouse API client and Job dataclass
├── ripling.py      # Rippling API client
├── controller.py   # Multi-source job fetcher router
├── state.py        # Valkey-backed job deduplication
├── summarize.py    # LLM summarization via Pydantic AI + Ollama
├── embeds.py       # Discord embed builder
└── preflight.py    # Service health checks (Valkey, Ollama)
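
To make the "router" role of controller.py concrete, here is one way a board URL could be mapped to a client. The rule used (matching on the URL's hostname) is an assumption, as are the function names and the example hostnames; the project may well detect the source differently.

```python
from urllib.parse import urlparse


def detect_source(board_url: str) -> str:
    """Guess which client should handle a board URL (assumed hostname rule)."""
    host = (urlparse(board_url).hostname or "").lower()
    if host == "greenhouse.io" or host.endswith(".greenhouse.io"):
        return "greenhouse"
    if host == "rippling.com" or host.endswith(".rippling.com"):
        return "rippling"
    raise ValueError(f"unsupported board URL: {board_url!r}")


def group_by_source(board_urls):
    """Bucket URLs by source so each client is called with its own boards."""
    grouped = {}
    for url in board_urls:
        grouped.setdefault(detect_source(url), []).append(url)
    return grouped
```

Raising on an unknown host, rather than silently skipping it, surfaces a mistyped BOARD_URLS entry at startup; that choice is a design suggestion, not documented behavior.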
