
iconifyit/job-search-bot

Job Search Bot

An automated job search aggregator that scans multiple remote job APIs, scores matches against your resume using AI, and delivers daily reports via email.

⚠️ Disclaimer: This software is provided "as-is" without warranty of any kind. Use at your own risk. The authors are not responsible for any job opportunities missed, API rate limits exceeded, or AWS charges incurred. Always review the generated reports and verify job listings independently.

Features

  • AI-powered matching - Uses OpenAI or Anthropic Claude to evaluate job relevance against your resume
  • Resume-driven search - Parse your resume (PDF, DOCX, or Markdown) to extract skills and configure search criteria
  • Multi-source aggregation - Fetches from 15+ job sources: APIs, RSS feeds, email alerts, and web scrapers
  • ATS integrations - Built-in adapters for Greenhouse, Lever, and Ashby company job boards
  • Email source (BYOE) - Process job alert emails from LinkedIn, Indeed, ZipRecruiter via AWS SES
  • Flexible pipelines - Run email-only, API-only, or combined searches via CLI flags or config
  • Smart filtering - Hybrid local + AI filters for salary, location, job type, and work arrangement
  • Single source mode - Test individual job sources with --source=<url>
  • Source performance tracking - Monitor which sources deliver results with --stats
  • HTML reports - Beautiful reports hosted on S3 with direct links in email notifications
  • Extensible architecture - Add new job sources with YAML config or custom handlers

Architecture

job-search-bot/
├── src/
│   ├── cli/                 # CLI entry point and utilities
│   ├── api/                 # Optional REST API server
│   ├── core/
│   │   ├── pipeline/        # Pipeline stages (EmailSource, HardFilter, AIEvaluator, etc.)
│   │   ├── adapters/        # ATS adapters (Greenhouse, Lever, Ashby)
│   │   ├── services/        # NotificationService, StorageService, TemplateService
│   │   └── sources/         # Source utilities
│   ├── configs/             # Configuration management
│   └── utils/               # Shared utilities (browser, retry, concurrency)
├── user/                    # User-specific config (gitignored)
│   ├── settings/            # Your search settings (*.yaml)
│   ├── resumes/             # Your resume files
│   ├── handlers/            # Custom source handlers
│   ├── filters/             # Custom JavaScript filters
│   └── sources.yaml         # Job source definitions
├── templates/
│   ├── prompts/             # AI prompt templates
│   └── reports/             # HTML report templates
├── infra/                   # AWS CDK infrastructure
├── docs/                    # Architecture docs and ADRs
├── test/                    # Comprehensive test suite
└── logs/                    # Daily log files

Pipeline Flow

%%{init: {'theme': 'redux', 'layout': 'dagre'}}%%
flowchart LR
    subgraph Sources["📥 Job Sources"]
        direction TB
        EMAIL["📧 Email Alerts<br/><small>LinkedIn, Indeed, ZipRecruiter</small>"]
        API["🌐 Job APIs<br/><small>RemoteOK, Remotive, Jobicy</small>"]
        RSS["📡 RSS Feeds<br/><small>WeWorkRemotely, HN</small>"]
        ATS["🏢 ATS Boards<br/><small>Greenhouse, Lever, Ashby</small>"]
        SCRAPE["🔍 Scrapers<br/><small>FlexJobs</small>"]
    end

    subgraph Fetch["1️⃣ Fetch"]
        FETCHER["SourceFetcher<br/>━━━━━━━━━━<br/>• Parallel fetching<br/>• Auto field mapping<br/>• Deduplication"]
    end

    subgraph Filter["2️⃣ Filter"]
        HARD["HardFilter<br/>━━━━━━━━━━<br/>• Date cutoff<br/>• Salary threshold<br/>• Excluded companies<br/>• Job type filters"]
    end

    subgraph AI["3️⃣ AI Match"]
        EVAL["AIEvaluator<br/>━━━━━━━━━━<br/>• Resume context<br/>• Ideal role profile<br/>• Batch processing<br/>• Tier classification"]
    end

    subgraph Report["4️⃣ Report"]
        GEN["ReportGenerator<br/>━━━━━━━━━━<br/>• HTML generation<br/>• Job grouping<br/>• Match explanations"]
    end

    subgraph Deliver["📤 Deliver"]
        S3[("S3<br/>Report")]
        SNS["SNS<br/>Email"]
    end

    subgraph Config["⚙️ Config"]
        direction TB
        RESUME["📄 Resume<br/><small>PDF/DOCX/MD</small>"]
        SETTINGS["📋 Settings<br/><small>YAML config</small>"]
        IDEAL["🎯 Ideal Role<br/><small>Preferences</small>"]
    end

    %% Source connections
    EMAIL --> FETCHER
    API --> FETCHER
    RSS --> FETCHER
    ATS --> FETCHER
    SCRAPE --> FETCHER

    %% Pipeline flow
    FETCHER -->|"~500 jobs"| HARD
    HARD -->|"~150 jobs"| EVAL
    EVAL -->|"~30 matches"| GEN
    GEN --> S3
    S3 --> SNS

    %% Config connections
    SETTINGS -.->|"filters"| HARD
    RESUME -.->|"context"| EVAL
    IDEAL -.->|"preferences"| EVAL

Quick Start

1. Install

git clone https://github.com/yourusername/job-search-bot.git
cd job-search-bot
npm install

2. Configure Settings

Copy the example settings file and customize:

mkdir -p user/settings user/resumes
cp user/settings.example.yaml user/settings/settings.yaml
cp user/settings/ideal-role.example.yaml user/settings/ideal-role.yaml

Edit user/settings/settings.yaml to configure:

  • Pipeline mode: full, api, email, or custom
  • Job matching: Target titles, required/preferred keywords
  • Filters: Salary minimums, excluded companies, work arrangement preferences
  • AI settings: Provider selection, model, concurrency

3. Add Your Resume

Place your resume in user/resumes/ (PDF, DOCX, or Markdown supported):

cp ~/path/to/your-resume.pdf user/resumes/
# or use Markdown for best results:
cp ~/path/to/your-resume.md user/resumes/resume.md

The bot automatically finds and parses your resume at runtime.
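
As a rough sketch of what "automatically finds" could mean, a discovery routine might rank candidate files by extension, preferring Markdown since the docs recommend it. This is a hypothetical illustration of the idea, not the bot's actual selection logic.

```javascript
// Hypothetical resume auto-discovery: prefer Markdown, then PDF, then DOCX.
// The real bot's selection order is an assumption here.
function pickResume(files) {
    const priority = ['.md', '.pdf', '.docx'];
    for (const ext of priority) {
        const match = files.find(f => f.toLowerCase().endsWith(ext));
        if (match) return match;
    }
    return null;
}
```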

4. Set Environment Variables

Copy .env.example to .env and configure:

cp .env.example .env

Key variables:

# AI Provider configuration (required for AI matching)
# Example: OpenAI
AI_PROVIDER=openai
AI_MODEL=gpt-5-mini
OPENAI_API_KEY=sk-...
# Or use Anthropic Claude instead:
# AI_PROVIDER=claude
# AI_MODEL=claude-3-haiku
# ANTHROPIC_API_KEY=sk-ant-...

# AWS Configuration (for reports and notifications)
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key

# SNS Topic for email notifications
SNS_TOPIC_ARN=arn:aws:sns:us-east-1:123456789:job-alerts

# S3 bucket for email processing (if using email sources)
EMAIL_S3_BUCKET=your-email-bucket

# Third-party API keys (optional, for premium sources)
RAPIDAPI_KEY=your-key
THEIRSTACK_API_KEY=your-key
JOBVEN_API_KEY=your-key

5. Run

# Preview results without sending notifications (dry run)
npm run dry

# Debug mode - verbose output
node src/cli/index.js --debug --dry

# Live run - search and send notification
npm start

# Run only API/RSS sources (default, skip email)
node src/cli/index.js --mode=api --dry

# Run only email sources (skip API sources)
node src/cli/index.js --mode=email --dry

# Run all sources (email + API)
node src/cli/index.js --mode=full --dry

# Test a single source URL
node src/cli/index.js --source="https://remoteok.com/api" --debug

CLI Reference

Execution Modes

| Flag | Description |
|------|-------------|
| (none) | Live run - sends real notifications |
| --dry | Dry run - prints results without sending |
| --test | Test mode - uses console notification service |
| --debug | Verbose logging for troubleshooting |

Pipeline Modes

| Flag | Description |
|------|-------------|
| --mode=api | API/RSS sources with filters & AI (default) |
| --mode=email | Email sources with filters & AI |
| --mode=full | All sources (email + API) with filters & AI |
| --mode=email --no-filters | Email aggregation only (no AI) |

Additional Options

| Flag | Description |
|------|-------------|
| --source=&lt;url&gt; | Run against a single source URL |
| --save-pre-ai | Save job list before AI filtering to reports/ |
| --no-archive | Don't archive processed emails (allows re-runs) |
| --resume /path/to/file | Use custom resume file |
| --settings /path/to/file | Use custom settings file |
| --user &lt;name&gt; | Name for report identification |

Source Management

| Flag | Description |
|------|-------------|
| --stats | Show source performance statistics |
| --dismiss-alert "Source Name" | Suppress alerts for a source |
| --undismiss-alert "Source Name" | Re-enable alerts for a source |
| --undismiss-alert all | Re-enable all dismissed alerts |

Pipeline Modes

The bot supports different pipeline modes for flexibility:

| Mode | Sources | AI Filtering | Use Case |
|------|---------|--------------|----------|
| api | APIs, RSS | Yes | Daily search (default) |
| email | Email alerts | Yes | Process job alert emails |
| full | All sources | Yes | Comprehensive search |
| custom | Configurable | Configurable | Advanced users |

Settings Override

In user/settings/settings.yaml:

pipeline:
  mode: api  # api | email | full | custom

  # For custom mode, specify stages:
  stages:
    - email
    - sources
    - hardFilter
    - ai
    - report

Job Sources

Supported Source Types

  1. Free Public APIs - RemoteOK, Remotive, Himalayas, Jobicy, WorkingNomads
  2. RSS Feeds - WeWorkRemotely, HN Who's Hiring
  3. Authenticated APIs - TheirStack, Jobven, RapidAPI
  4. ATS Company Boards - Greenhouse, Lever, Ashby (300+ companies)
  5. Web Scrapers - FlexJobs (requires Chrome)
  6. Email Alerts - LinkedIn, Indeed, ZipRecruiter, Glassdoor

Adding Job Sources

Sources are configured in user/sources.yaml:

Simple API (auto-mapping)

- name: MyJobSite
  url: https://api.example.com/jobs

RSS Feed

- name: MyRSSFeed
  url: https://example.com/jobs.rss
  type: rss

ATS Company Board

- name: Stripe
  platform: greenhouse
  id: stripe

- name: Vercel
  platform: lever
  id: vercel

- name: Linear
  platform: ashby
  id: linear
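
For context, these ATS platforms publish public board APIs keyed by the id above. A hypothetical helper for the Greenhouse case might build its fetch URL like this; the exact endpoint and query parameters the bot's adapter uses are assumptions here.

```javascript
// Hypothetical: derive a Greenhouse public board API URL from a
// sources.yaml id. The bot's adapter may construct this differently.
function greenhouseBoardUrl(boardId) {
    return `https://boards-api.greenhouse.io/v1/boards/${boardId}/jobs?content=true`;
}
```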

Custom Handler

For APIs requiring authentication, pagination, or special logic, create a handler in user/handlers/:

// user/handlers/myjobsite.js
export default {
    getMapping: () => ({
        dataPath: 'results',
        mappings: {
            id          : 'job_id',
            title       : 'position',
            company     : 'employer.name',
            url         : 'application_url',
            location    : 'job_location',
            description : 'details',
            salary      : 'compensation',
            postedAt    : 'published_date'
        }
    }),

    // Optional: custom fetch logic for auth, pagination, etc.
    async fetch({ source, searchConfig, config }) {
        const response = await fetch('https://api.example.com/jobs', {
            headers: { 'Authorization': `Bearer ${process.env.MY_API_KEY}` }
        });
        return response.json();
    }
};

Then reference it in user/sources.yaml:

- name: MyJobSite
  handler: myjobsite
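
The dot paths in getMapping (e.g. employer.name) suggest nested field lookup during auto field mapping. Below is a minimal sketch of how such a mapping could be applied to a raw record, assuming that behavior; the bot's real implementation may differ.

```javascript
// Resolve a dot path like 'employer.name' against a raw API record.
function resolvePath(obj, path) {
    return path.split('.').reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// Apply a { normalizedField: sourcePath } mapping to one record.
function applyMapping(record, mappings) {
    const job = {};
    for (const [field, path] of Object.entries(mappings)) {
        job[field] = resolvePath(record, path);
    }
    return job;
}
```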

Configuration

User Configuration Files

| Path | Required | Description |
|------|----------|-------------|
| user/settings/settings.yaml | Yes | Search settings, filters, AI config |
| user/settings/ideal-role.yaml | No | Detailed preferences for AI matching |
| user/resumes/ | Yes | Your resume file(s) - PDF, DOCX, or MD |
| user/sources.yaml | No | Custom job sources (uses defaults if missing) |
| user/handlers/ | No | Custom handler implementations |
| user/filters/ | No | Custom JavaScript filter functions |
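
A custom filter in user/filters/ might look like the sketch below. The exported-function contract (an array of jobs in, a filtered array out) is an assumption for illustration; check the project's filter documentation for the real signature.

```javascript
// user/filters/min-salary.js — hypothetical example filter.
// Keeps jobs at or above a salary floor; jobs with no parsed salary pass through.
const MIN_SALARY = 120000;

export default function minSalaryFilter(jobs) {
    return jobs.filter(job =>
        typeof job.salary === 'number' ? job.salary >= MIN_SALARY : true
    );
}
```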

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| OPENAI_API_KEY | Yes* | OpenAI API key |
| ANTHROPIC_API_KEY | Yes* | Anthropic API key (alternative to OpenAI) |
| AWS_REGION | Yes | AWS region (default: us-east-1) |
| AWS_ACCESS_KEY_ID | Yes | AWS credentials |
| AWS_SECRET_ACCESS_KEY | Yes | AWS credentials |
| SNS_TOPIC_ARN | No | SNS topic for email notifications |
| EMAIL_S3_BUCKET | No** | S3 bucket for email processing |
| RAPIDAPI_KEY | No | RapidAPI key for premium sources |
| THEIRSTACK_API_KEY | No | TheirStack API key |
| JOBVEN_API_KEY | No | Jobven API key |

*One AI provider key required if ai_matching: true

**Required if using email pipeline mode

Scheduling

Linux/macOS (cron)

crontab -e
# Run daily at 10 AM
0 10 * * * cd /path/to/job-search-bot && node src/cli/index.js >> logs/cron.log 2>&1

AWS Lambda

The infra/ directory contains an AWS CDK stack for deploying as a scheduled Lambda function with email processing via SES.

cd infra
npm install
cdk deploy

Testing

# Run all tests
npm test

# Run specific test file
npm test -- --testPathPatterns="ai-matcher"

# Run with coverage
npm test -- --coverage

License

MIT License - see LICENSE for details.
