
iconifyit/job-search-bot

Job Search Bot

An automated job search aggregator that scans multiple remote job APIs, scores matches against your resume using AI, and delivers daily reports via email.

⚠️ Disclaimer: This software is provided "as-is" without warranty of any kind. Use at your own risk. The authors are not responsible for any job opportunities missed, API rate limits exceeded, or AWS charges incurred. Always review the generated reports and verify job listings independently.

Features

  • AI-powered matching - Uses OpenAI or Anthropic Claude to evaluate job relevance against your resume
  • Resume-driven search - Parse your resume (PDF, DOCX, or Markdown) to extract skills and configure search criteria
  • Multi-source aggregation - Fetches from 15+ job sources: APIs, RSS feeds, email alerts, and web scrapers
  • ATS integrations - Built-in adapters for Greenhouse, Lever, and Ashby company job boards
  • Email source (BYOE) - Process job alert emails from LinkedIn, Indeed, ZipRecruiter via AWS SES
  • Flexible pipelines - Run email-only, API-only, or combined searches via CLI flags or config
  • Smart filtering - Hybrid local + AI filters for salary, location, job type, and work arrangement
  • Single source mode - Test individual job sources with --source=<url>
  • Source performance tracking - Monitor which sources deliver results with --stats
  • HTML reports - Beautiful reports hosted on S3 with direct links in email notifications
  • Extensible architecture - Add new job sources with YAML config or custom handlers

Architecture

job-search-bot/
├── src/
│   ├── cli/                 # CLI entry point and utilities
│   ├── api/                 # Optional REST API server
│   ├── core/
│   │   ├── pipeline/        # Pipeline stages (EmailSource, HardFilter, AIEvaluator, etc.)
│   │   ├── adapters/        # ATS adapters (Greenhouse, Lever, Ashby)
│   │   ├── services/        # NotificationService, StorageService, TemplateService
│   │   └── sources/         # Source utilities
│   ├── configs/             # Configuration management
│   └── utils/               # Shared utilities (browser, retry, concurrency)
├── user/                    # User-specific config (gitignored)
│   ├── settings/            # Your search settings (*.yaml)
│   ├── resumes/             # Your resume files
│   ├── handlers/            # Custom source handlers
│   ├── filters/             # Custom JavaScript filters
│   └── sources.yaml         # Job source definitions
├── templates/
│   ├── prompts/             # AI prompt templates
│   └── reports/             # HTML report templates
├── infra/                   # AWS CDK infrastructure
├── docs/                    # Architecture docs and ADRs
├── test/                    # Comprehensive test suite
└── logs/                    # Daily log files

Pipeline Flow

%%{init: {'theme': 'redux', 'layout': 'dagre'}}%%
flowchart LR
    subgraph Sources["📥 Job Sources"]
        direction TB
        EMAIL["📧 Email Alerts<br/><small>LinkedIn, Indeed, ZipRecruiter</small>"]
        API["🌐 Job APIs<br/><small>RemoteOK, Remotive, Jobicy</small>"]
        RSS["📡 RSS Feeds<br/><small>WeWorkRemotely, HN</small>"]
        ATS["🏢 ATS Boards<br/><small>Greenhouse, Lever, Ashby</small>"]
        SCRAPE["🔍 Scrapers<br/><small>FlexJobs</small>"]
    end

    subgraph Fetch["1️⃣ Fetch"]
        FETCHER["SourceFetcher<br/>━━━━━━━━━━<br/>• Parallel fetching<br/>• Auto field mapping<br/>• Deduplication"]
    end

    subgraph Filter["2️⃣ Filter"]
        HARD["HardFilter<br/>━━━━━━━━━━<br/>• Date cutoff<br/>• Salary threshold<br/>• Excluded companies<br/>• Job type filters"]
    end

    subgraph AI["3️⃣ AI Match"]
        EVAL["AIEvaluator<br/>━━━━━━━━━━<br/>• Resume context<br/>• Ideal role profile<br/>• Batch processing<br/>• Tier classification"]
    end

    subgraph Report["4️⃣ Report"]
        GEN["ReportGenerator<br/>━━━━━━━━━━<br/>• HTML generation<br/>• Job grouping<br/>• Match explanations"]
    end

    subgraph Deliver["📤 Deliver"]
        S3[("S3<br/>Report")]
        SNS["SNS<br/>Email"]
    end

    subgraph Config["⚙️ Config"]
        direction TB
        RESUME["📄 Resume<br/><small>PDF/DOCX/MD</small>"]
        SETTINGS["📋 Settings<br/><small>YAML config</small>"]
        IDEAL["🎯 Ideal Role<br/><small>Preferences</small>"]
    end

    %% Source connections
    EMAIL --> FETCHER
    API --> FETCHER
    RSS --> FETCHER
    ATS --> FETCHER
    SCRAPE --> FETCHER

    %% Pipeline flow
    FETCHER -->|"~500 jobs"| HARD
    HARD -->|"~150 jobs"| EVAL
    EVAL -->|"~30 matches"| GEN
    GEN --> S3
    S3 --> SNS

    %% Config connections
    SETTINGS -.->|"filters"| HARD
    RESUME -.->|"context"| EVAL
    IDEAL -.->|"preferences"| EVAL

Quick Start

1. Install

git clone https://github.com/yourusername/job-search-bot.git
cd job-search-bot
npm install

2. Configure Settings

Copy the example settings file and customize:

mkdir -p user/settings user/resumes
cp user/settings.example.yaml user/settings/settings.yaml
cp user/settings/ideal-role.example.yaml user/settings/ideal-role.yaml

Edit user/settings/settings.yaml to configure:

  • Pipeline mode: full, api, email, or custom
  • Job matching: Target titles, required/preferred keywords
  • Filters: Salary minimums, excluded companies, work arrangement preferences
  • AI settings: Provider selection, model, concurrency

3. Add Your Resume

Place your resume in user/resumes/ (PDF, DOCX, or Markdown supported):

cp ~/path/to/your-resume.pdf user/resumes/
# or use Markdown for best results:
cp ~/path/to/your-resume.md user/resumes/resume.md

The bot automatically finds and parses your resume at runtime.
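
As a rough sketch of what "automatically finds" could mean, a discovery routine might rank candidate files by extension, preferring Markdown since the docs recommend it. This is a hypothetical illustration of the idea, not the bot's actual selection logic.

```javascript
// Hypothetical resume auto-discovery: prefer Markdown, then PDF, then DOCX.
// The real bot's selection order is an assumption here.
function pickResume(files) {
    const priority = ['.md', '.pdf', '.docx'];
    for (const ext of priority) {
        const match = files.find(f => f.toLowerCase().endsWith(ext));
        if (match) return match;
    }
    return null;
}
```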

4. Set Environment Variables

Copy .env.example to .env and configure:

cp .env.example .env

Key variables:

# AI Provider configuration (required for AI matching)
# Example: OpenAI
AI_PROVIDER=openai
AI_MODEL=gpt-5-mini
OPENAI_API_KEY=sk-...
# Or use Anthropic Claude instead:
# AI_PROVIDER=claude
# AI_MODEL=claude-3-haiku
# ANTHROPIC_API_KEY=sk-ant-...

# AWS Configuration (for reports and notifications)
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key

# SNS Topic for email notifications
SNS_TOPIC_ARN=arn:aws:sns:us-east-1:123456789:job-alerts

# S3 bucket for email processing (if using email sources)
EMAIL_S3_BUCKET=your-email-bucket

# Third-party API keys (optional, for premium sources)
RAPIDAPI_KEY=your-key
THEIRSTACK_API_KEY=your-key
JOBVEN_API_KEY=your-key

5. Run

# Preview results without sending notifications (dry run)
npm run dry

# Debug mode - verbose output
node src/cli/index.js --debug --dry

# Live run - search and send notification
npm start

# Run only API/RSS sources (default, skip email)
node src/cli/index.js --mode=api --dry

# Run only email sources (skip API sources)
node src/cli/index.js --mode=email --dry

# Run all sources (email + API)
node src/cli/index.js --mode=full --dry

# Test a single source URL
node src/cli/index.js --source="https://remoteok.com/api" --debug

CLI Reference

Execution Modes

| Flag | Description |
|------|-------------|
| (none) | Live run - sends real notifications |
| --dry | Dry run - prints results without sending |
| --test | Test mode - uses console notification service |
| --debug | Verbose logging for troubleshooting |

Pipeline Modes

| Flag | Description |
|------|-------------|
| --mode=api | API/RSS sources with filters & AI (default) |
| --mode=email | Email sources with filters & AI |
| --mode=full | All sources (email + API) with filters & AI |
| --mode=email --no-filters | Email aggregation only (no AI) |

Additional Options

| Flag | Description |
|------|-------------|
| --source=&lt;url&gt; | Run against a single source URL |
| --save-pre-ai | Save job list before AI filtering to reports/ |
| --no-archive | Don't archive processed emails (allows re-runs) |
| --resume /path/to/file | Use custom resume file |
| --settings /path/to/file | Use custom settings file |
| --user &lt;name&gt; | Name for report identification |

Source Management

| Flag | Description |
|------|-------------|
| --stats | Show source performance statistics |
| --dismiss-alert "Source Name" | Suppress alerts for a source |
| --undismiss-alert "Source Name" | Re-enable alerts for a source |
| --undismiss-alert all | Re-enable all dismissed alerts |

Pipeline Modes

The bot supports different pipeline modes for flexibility:

| Mode | Sources | AI Filtering | Use Case |
|------|---------|--------------|----------|
| api | APIs, RSS | Yes | Daily search (default) |
| email | Email alerts | Yes | Process job alert emails |
| full | All sources | Yes | Comprehensive search |
| custom | Configurable | Configurable | Advanced users |

Settings Override

In user/settings/settings.yaml:

pipeline:
  mode: api  # api | email | full | custom

  # For custom mode, specify stages:
  stages:
    - email
    - sources
    - hardFilter
    - ai
    - report

Job Sources

Supported Source Types

  1. Free Public APIs - RemoteOK, Remotive, Himalayas, Jobicy, WorkingNomads
  2. RSS Feeds - WeWorkRemotely, HN Who's Hiring
  3. Authenticated APIs - TheirStack, Jobven, RapidAPI
  4. ATS Company Boards - Greenhouse, Lever, Ashby (300+ companies)
  5. Web Scrapers - FlexJobs (requires Chrome)
  6. Email Alerts - LinkedIn, Indeed, ZipRecruiter, Glassdoor

Adding Job Sources

Sources are configured in user/sources.yaml:

Simple API (auto-mapping)

- name: MyJobSite
  url: https://api.example.com/jobs

RSS Feed

- name: MyRSSFeed
  url: https://example.com/jobs.rss
  type: rss

ATS Company Board

- name: Stripe
  platform: greenhouse
  id: stripe

- name: Vercel
  platform: lever
  id: vercel

- name: Linear
  platform: ashby
  id: linear
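
For context, these ATS platforms publish public board APIs keyed by the id above. A hypothetical helper for the Greenhouse case might build its fetch URL like this; the exact endpoint and query parameters the bot's adapter uses are assumptions here.

```javascript
// Hypothetical: derive a Greenhouse public board API URL from a
// sources.yaml id. The bot's adapter may construct this differently.
function greenhouseBoardUrl(boardId) {
    return `https://boards-api.greenhouse.io/v1/boards/${boardId}/jobs?content=true`;
}
```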

Custom Handler

For APIs requiring authentication, pagination, or special logic, create a handler in user/handlers/:

// user/handlers/myjobsite.js
export default {
    getMapping: () => ({
        dataPath: 'results',
        mappings: {
            id          : 'job_id',
            title       : 'position',
            company     : 'employer.name',
            url         : 'application_url',
            location    : 'job_location',
            description : 'details',
            salary      : 'compensation',
            postedAt    : 'published_date'
        }
    }),

    // Optional: custom fetch logic for auth, pagination, etc.
    async fetch({ source, searchConfig, config }) {
        const response = await fetch('https://api.example.com/jobs', {
            headers: { 'Authorization': `Bearer ${process.env.MY_API_KEY}` }
        });
        return response.json();
    }
};

Then reference it in user/sources.yaml:

- name: MyJobSite
  handler: myjobsite
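
The dot paths in getMapping (e.g. employer.name) suggest nested field lookup during auto field mapping. Below is a minimal sketch of how such a mapping could be applied to a raw record, assuming that behavior; the bot's real implementation may differ.

```javascript
// Resolve a dot path like 'employer.name' against a raw API record.
function resolvePath(obj, path) {
    return path.split('.').reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// Apply a { normalizedField: sourcePath } mapping to one record.
function applyMapping(record, mappings) {
    const job = {};
    for (const [field, path] of Object.entries(mappings)) {
        job[field] = resolvePath(record, path);
    }
    return job;
}
```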

Configuration

User Configuration Files

| Path | Required | Description |
|------|----------|-------------|
| user/settings/settings.yaml | Yes | Search settings, filters, AI config |
| user/settings/ideal-role.yaml | No | Detailed preferences for AI matching |
| user/resumes/ | Yes | Your resume file(s) - PDF, DOCX, or MD |
| user/sources.yaml | No | Custom job sources (uses defaults if missing) |
| user/handlers/ | No | Custom handler implementations |
| user/filters/ | No | Custom JavaScript filter functions |
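
A custom filter in user/filters/ might look like the sketch below. The exported-function contract (an array of jobs in, a filtered array out) is an assumption for illustration; check the project's filter documentation for the real signature.

```javascript
// user/filters/min-salary.js — hypothetical example filter.
// Keeps jobs at or above a salary floor; jobs with no parsed salary pass through.
const MIN_SALARY = 120000;

export default function minSalaryFilter(jobs) {
    return jobs.filter(job =>
        typeof job.salary === 'number' ? job.salary >= MIN_SALARY : true
    );
}
```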

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| OPENAI_API_KEY | Yes* | OpenAI API key |
| ANTHROPIC_API_KEY | Yes* | Anthropic API key (alternative to OpenAI) |
| AWS_REGION | Yes | AWS region (default: us-east-1) |
| AWS_ACCESS_KEY_ID | Yes | AWS credentials |
| AWS_SECRET_ACCESS_KEY | Yes | AWS credentials |
| SNS_TOPIC_ARN | No | SNS topic for email notifications |
| EMAIL_S3_BUCKET | No** | S3 bucket for email processing |
| RAPIDAPI_KEY | No | RapidAPI key for premium sources |
| THEIRSTACK_API_KEY | No | TheirStack API key |
| JOBVEN_API_KEY | No | Jobven API key |

*One AI provider key required if ai_matching: true

**Required if using email pipeline mode

Scheduling

Linux/macOS (cron)

crontab -e
# Run daily at 10 AM
0 10 * * * cd /path/to/job-search-bot && node src/cli/index.js >> logs/cron.log 2>&1

AWS Lambda

The infra/ directory contains an AWS CDK stack for deploying as a scheduled Lambda function with email processing via SES.

cd infra
npm install
cdk deploy

Testing

# Run all tests
npm test

# Run specific test file
npm test -- --testPathPatterns="ai-matcher"

# Run with coverage
npm test -- --coverage

License

MIT License - see LICENSE for details.
