An automated job search aggregator that scans multiple remote job APIs, scores matches against your resume using AI, and delivers daily reports via email.
⚠️ Disclaimer: This software is provided "as-is" without warranty of any kind. Use at your own risk. The authors are not responsible for any job opportunities missed, API rate limits exceeded, or AWS charges incurred. Always review the generated reports and verify job listings independently.
- AI-powered matching - Uses OpenAI or Anthropic Claude to evaluate job relevance against your resume
- Resume-driven search - Parse your resume (PDF, DOCX, or Markdown) to extract skills and configure search criteria
- Multi-source aggregation - Fetches from 15+ job sources: APIs, RSS feeds, email alerts, and web scrapers
- ATS integrations - Built-in adapters for Greenhouse, Lever, and Ashby company job boards
- Email source (BYOE) - Process job alert emails from LinkedIn, Indeed, ZipRecruiter via AWS SES
- Flexible pipelines - Run email-only, API-only, or combined searches via CLI flags or config
- Smart filtering - Hybrid local + AI filters for salary, location, job type, and work arrangement
- Single source mode - Test individual job sources with `--source=<url>`
- Source performance tracking - Monitor which sources deliver results with `--stats`
- HTML reports - Beautiful reports hosted on S3 with direct links in email notifications
- Extensible architecture - Add new job sources with YAML config or custom handlers
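As a concrete illustration of the ATS integrations above: Greenhouse and Lever expose public job-board endpoints, so an adapter largely boils down to building the board URL and fetching JSON. The helper below is a hypothetical sketch (the project's real adapters live in `src/core/adapters/`; `ATS_ENDPOINTS` and `fetchAtsJobs` are illustrative names, while the endpoint URLs are the platforms' public job-board APIs):

```javascript
// Hypothetical minimal ATS client. Each endpoint builder returns the
// public job-board API URL for a company's board token.
const ATS_ENDPOINTS = {
  greenhouse: (id) => `https://boards-api.greenhouse.io/v1/boards/${id}/jobs`,
  lever: (id) => `https://api.lever.co/v0/postings/${id}?mode=json`,
};

// Fetch raw job listings for one company board (uses Node 18+ global fetch).
async function fetchAtsJobs(platform, id) {
  const buildUrl = ATS_ENDPOINTS[platform];
  if (!buildUrl) throw new Error(`Unsupported platform: ${platform}`);
  const res = await fetch(buildUrl(id));
  if (!res.ok) throw new Error(`${platform}/${id} responded with HTTP ${res.status}`);
  return res.json(); // Greenhouse wraps listings in { jobs: [...] }; Lever returns an array
}
```

Usage would look like `fetchAtsJobs('greenhouse', 'stripe')`; Ashby follows the same pattern via its posting API.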
```
job-search-bot/
├── src/
│   ├── cli/              # CLI entry point and utilities
│   ├── api/              # Optional REST API server
│   ├── core/
│   │   ├── pipeline/     # Pipeline stages (EmailSource, HardFilter, AIEvaluator, etc.)
│   │   ├── adapters/     # ATS adapters (Greenhouse, Lever, Ashby)
│   │   ├── services/     # NotificationService, StorageService, TemplateService
│   │   └── sources/      # Source utilities
│   ├── configs/          # Configuration management
│   └── utils/            # Shared utilities (browser, retry, concurrency)
├── user/                 # User-specific config (gitignored)
│   ├── settings/         # Your search settings (*.yaml)
│   ├── resumes/          # Your resume files
│   ├── handlers/         # Custom source handlers
│   ├── filters/          # Custom JavaScript filters
│   └── sources.yaml      # Job source definitions
├── templates/
│   ├── prompts/          # AI prompt templates
│   └── reports/          # HTML report templates
├── infra/                # AWS CDK infrastructure
├── docs/                 # Architecture docs and ADRs
├── test/                 # Comprehensive test suite
└── logs/                 # Daily log files
```
```mermaid
%%{init: {'theme': 'redux', 'layout': 'dagre'}}%%
flowchart LR
    subgraph Sources["📥 Job Sources"]
        direction TB
        EMAIL["📧 Email Alerts<br/><small>LinkedIn, Indeed, ZipRecruiter</small>"]
        API["🌐 Job APIs<br/><small>RemoteOK, Remotive, Jobicy</small>"]
        RSS["📡 RSS Feeds<br/><small>WeWorkRemotely, HN</small>"]
        ATS["🏢 ATS Boards<br/><small>Greenhouse, Lever, Ashby</small>"]
        SCRAPE["🔍 Scrapers<br/><small>FlexJobs</small>"]
    end

    subgraph Fetch["1️⃣ Fetch"]
        FETCHER["SourceFetcher<br/>━━━━━━━━━━<br/>• Parallel fetching<br/>• Auto field mapping<br/>• Deduplication"]
    end

    subgraph Filter["2️⃣ Filter"]
        HARD["HardFilter<br/>━━━━━━━━━━<br/>• Date cutoff<br/>• Salary threshold<br/>• Excluded companies<br/>• Job type filters"]
    end

    subgraph AI["3️⃣ AI Match"]
        EVAL["AIEvaluator<br/>━━━━━━━━━━<br/>• Resume context<br/>• Ideal role profile<br/>• Batch processing<br/>• Tier classification"]
    end

    subgraph Report["4️⃣ Report"]
        GEN["ReportGenerator<br/>━━━━━━━━━━<br/>• HTML generation<br/>• Job grouping<br/>• Match explanations"]
    end

    subgraph Deliver["📤 Deliver"]
        S3[("S3<br/>Report")]
        SNS["SNS<br/>Email"]
    end

    subgraph Config["⚙️ Config"]
        direction TB
        RESUME["📄 Resume<br/><small>PDF/DOCX/MD</small>"]
        SETTINGS["📋 Settings<br/><small>YAML config</small>"]
        IDEAL["🎯 Ideal Role<br/><small>Preferences</small>"]
    end

    %% Source connections
    EMAIL --> FETCHER
    API --> FETCHER
    RSS --> FETCHER
    ATS --> FETCHER
    SCRAPE --> FETCHER

    %% Pipeline flow
    FETCHER -->|"~500 jobs"| HARD
    HARD -->|"~150 jobs"| EVAL
    EVAL -->|"~30 matches"| GEN
    GEN --> S3
    S3 --> SNS

    %% Config connections
    SETTINGS -.->|"filters"| HARD
    RESUME -.->|"context"| EVAL
    IDEAL -.->|"preferences"| EVAL
```
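The staged flow in the diagram can be read as a simple composition of async transforms: each stage takes the current job list and returns a (usually smaller) one. The sketch below is illustrative only; `dedupe`, `hardFilter`, and `runPipeline` are not the project's actual exports (the real stages live in `src/core/pipeline/`):

```javascript
// Stand-in for SourceFetcher's deduplication step.
const dedupe = async (jobs) => {
  const seen = new Set();
  return jobs.filter((job) => {
    const key = `${job.company}:${job.title}`.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
};

// Stand-in for HardFilter's salary threshold.
const hardFilter = (minSalary) => async (jobs) =>
  jobs.filter((job) => (job.salary ?? 0) >= minSalary);

// Run stages left to right, mirroring the Fetch → Filter → AI → Report flow.
const runPipeline = async (jobs, stages) => {
  let current = jobs;
  for (const stage of stages) current = await stage(current);
  return current;
};
```

Each real stage (SourceFetcher, HardFilter, AIEvaluator, ReportGenerator) fits this same shape, which is what makes the `custom` pipeline mode possible.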
```bash
git clone https://github.com/yourusername/job-search-bot.git
cd job-search-bot
npm install
```

Copy the example settings files and customize:
```bash
mkdir -p user/settings user/resumes
cp user/settings.example.yaml user/settings/settings.yaml
cp user/settings/ideal-role.example.yaml user/settings/ideal-role.yaml
```

Edit `user/settings/settings.yaml` to configure:
- Pipeline mode: `full`, `api`, `email`, or `custom`
- Job matching: Target titles, required/preferred keywords
- Filters: Salary minimums, excluded companies, work arrangement preferences
- AI settings: Provider selection, model, concurrency
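For illustration, a minimal settings file covering these options might look like the sketch below. The field names here are assumptions; treat `user/settings.example.yaml` as the authoritative schema:

```yaml
pipeline:
  mode: api                 # full | api | email | custom

matching:
  titles:
    - Senior Backend Engineer
    - Staff Software Engineer
  required_keywords: [node, aws]
  preferred_keywords: [typescript, serverless]

filters:
  min_salary: 120000
  excluded_companies: [Acme Corp]
  work_arrangement: remote

ai:
  provider: openai
  model: gpt-5-mini
  concurrency: 5
```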
Place your resume in `user/resumes/` (PDF, DOCX, or Markdown supported):

```bash
cp ~/path/to/your-resume.pdf user/resumes/
# or use Markdown for best results:
cp ~/path/to/your-resume.md user/resumes/resume.md
```

The bot automatically finds and parses your resume at runtime.
Copy `.env.example` to `.env` and configure:

```bash
cp .env.example .env
```

Key variables:
```bash
# AI Provider configuration (required for AI matching)
# Example: OpenAI
AI_PROVIDER=openai
AI_MODEL=gpt-5-mini
OPENAI_API_KEY=sk-...

# Or use Anthropic Claude instead:
# AI_PROVIDER=claude
# AI_MODEL=claude-3-haiku
# ANTHROPIC_API_KEY=sk-ant-...

# AWS Configuration (for reports and notifications)
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key

# SNS Topic for email notifications
SNS_TOPIC_ARN=arn:aws:sns:us-east-1:123456789:job-alerts

# S3 bucket for email processing (if using email sources)
EMAIL_S3_BUCKET=your-email-bucket

# Third-party API keys (optional, for premium sources)
RAPIDAPI_KEY=your-key
THEIRSTACK_API_KEY=your-key
JOBVEN_API_KEY=your-key
```

```bash
# Preview results without sending notifications (dry run)
npm run dry

# Debug mode - verbose output
node src/cli/index.js --debug --dry

# Live run - search and send notification
npm start

# Run only API/RSS sources (default, skip email)
node src/cli/index.js --mode=api --dry

# Run only email sources (skip API sources)
node src/cli/index.js --mode=email --dry

# Run all sources (email + API)
node src/cli/index.js --mode=full --dry

# Test a single source URL
node src/cli/index.js --source="https://remoteok.com/api" --debug
```

| Flag | Description |
|---|---|
| (none) | Live run - sends real notifications |
| `--dry` | Dry run - prints results without sending |
| `--test` | Test mode - uses console notification service |
| `--debug` | Verbose logging for troubleshooting |
| Flag | Description |
|---|---|
| `--mode=api` | API/RSS sources with filters & AI (default) |
| `--mode=email` | Email sources with filters & AI |
| `--mode=full` | All sources (email + API) with filters & AI |
| `--mode=email --no-filters` | Email aggregation only (no AI) |
| Flag | Description |
|---|---|
| `--source=<url>` | Run against a single source URL |
| `--save-pre-ai` | Save the job list before AI filtering to `reports/` |
| `--no-archive` | Don't archive processed emails (allows re-runs) |
| `--resume /path/to/file` | Use a custom resume file |
| `--settings /path/to/file` | Use a custom settings file |
| `--user <name>` | Name for report identification |
| Flag | Description |
|---|---|
| `--stats` | Show source performance statistics |
| `--dismiss-alert "Source Name"` | Suppress alerts for a source |
| `--undismiss-alert "Source Name"` | Re-enable alerts for a source |
| `--undismiss-alert all` | Re-enable all dismissed alerts |
The bot supports different pipeline modes for flexibility:
| Mode | Sources | AI Filtering | Use Case |
|---|---|---|---|
| `api` | APIs, RSS | Yes | Daily search (default) |
| `email` | Email alerts | Yes | Process job alert emails |
| `full` | All sources | Yes | Comprehensive search |
| `custom` | Configurable | Configurable | Advanced users |
In `user/settings/settings.yaml`:

```yaml
pipeline:
  mode: api  # api | email | full | custom
  # For custom mode, specify stages:
  stages:
    - email
    - sources
    - hardFilter
    - ai
    - report
```

- Free Public APIs - RemoteOK, Remotive, Himalayas, Jobicy, WorkingNomads
- RSS Feeds - WeWorkRemotely, HN Who's Hiring
- Authenticated APIs - TheirStack, Jobven, RapidAPI
- ATS Company Boards - Greenhouse, Lever, Ashby (300+ companies)
- Web Scrapers - FlexJobs (requires Chrome)
- Email Alerts - LinkedIn, Indeed, ZipRecruiter, Glassdoor
Sources are configured in `user/sources.yaml`:

```yaml
- name: MyJobSite
  url: https://api.example.com/jobs

- name: MyRSSFeed
  url: https://example.com/jobs.rss
  type: rss

- name: Stripe
  platform: greenhouse
  id: stripe

- name: Vercel
  platform: lever
  id: vercel

- name: Linear
  platform: ashby
  id: linear
```

For APIs requiring authentication, pagination, or special logic, create a handler in `user/handlers/`:
```javascript
// user/handlers/myjobsite.js
export default {
  getMapping: () => ({
    dataPath: 'results',
    mappings: {
      id: 'job_id',
      title: 'position',
      company: 'employer.name',
      url: 'application_url',
      location: 'job_location',
      description: 'details',
      salary: 'compensation',
      postedAt: 'published_date'
    }
  }),

  // Optional: custom fetch logic for auth, pagination, etc.
  async fetch({ source, searchConfig, config }) {
    const response = await fetch('https://api.example.com/jobs', {
      headers: { 'Authorization': `Bearer ${process.env.MY_API_KEY}` }
    });
    return response.json();
  }
};
```

Then reference it in `user/sources.yaml`:

```yaml
- name: MyJobSite
  handler: myjobsite
```

| Path | Required | Description |
|---|---|---|
| `user/settings/settings.yaml` | Yes | Search settings, filters, AI config |
| `user/settings/ideal-role.yaml` | No | Detailed preferences for AI matching |
| `user/resumes/` | Yes | Your resume file(s) - PDF, DOCX, or MD |
| `user/sources.yaml` | No | Custom job sources (uses defaults if missing) |
| `user/handlers/` | No | Custom handler implementations |
| `user/filters/` | No | Custom JavaScript filter functions |
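Handler mappings like `company: 'employer.name'` are dot-paths resolved against the raw API response. A minimal sketch of how that resolution could work (hypothetical helpers, not the project's actual implementation):

```javascript
// Resolve a dot-separated path like 'employer.name' against a raw API object.
const getPath = (obj, path) =>
  path.split('.').reduce((value, key) => (value == null ? undefined : value[key]), obj);

// Apply a handler-style mapping table to one raw job record,
// producing the normalized job shape used by the pipeline.
const applyMapping = (raw, mappings) =>
  Object.fromEntries(
    Object.entries(mappings).map(([field, path]) => [field, getPath(raw, path)])
  );
```

For example, `applyMapping({ employer: { name: 'Acme' } }, { company: 'employer.name' })` yields `{ company: 'Acme' }`; missing paths map to `undefined` rather than throwing.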
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes* | OpenAI API key |
| `ANTHROPIC_API_KEY` | Yes* | Anthropic API key (alternative to OpenAI) |
| `AWS_REGION` | Yes | AWS region (default: `us-east-1`) |
| `AWS_ACCESS_KEY_ID` | Yes | AWS credentials |
| `AWS_SECRET_ACCESS_KEY` | Yes | AWS credentials |
| `SNS_TOPIC_ARN` | No | SNS topic for email notifications |
| `EMAIL_S3_BUCKET` | No** | S3 bucket for email processing |
| `RAPIDAPI_KEY` | No | RapidAPI key for premium sources |
| `THEIRSTACK_API_KEY` | No | TheirStack API key |
| `JOBVEN_API_KEY` | No | Jobven API key |

\* One AI provider key is required if `ai_matching: true`
\*\* Required when using the email pipeline mode
```bash
crontab -e
```

```bash
# Run daily at 10 AM
0 10 * * * cd /path/to/job-search-bot && node src/cli/index.js >> logs/cron.log 2>&1
```

The `infra/` directory contains an AWS CDK stack for deploying the bot as a scheduled Lambda function with email processing via SES.

```bash
cd infra
npm install
cdk deploy
```

```bash
# Run all tests
npm test

# Run specific test file
npm test -- --testPathPatterns="ai-matcher"

# Run with coverage
npm test -- --coverage
```

MIT License - see LICENSE for details.