A Claude Code skill that turns "compare these two systems" into an editorial-quality PDF with forced verdicts, a ranked steal list, and a fitness scorecard.
When you evaluate an external tool ecosystem, you get one of two outputs:
- A vague "both have strengths" document that doesn't tell you what to do
- A biased teardown that misses what the other system actually does well
Neither is useful. The real question is: what do we steal, and what do we protect?
Spawns 4 parallel research agents. Each reads actual source files, not directory names. Each is forced to commit to a WHO WINS verdict per dimension — no weasel words.
3 domain agents cover capability tracks (infrastructure, copy/QA, measurement/automation — or whatever 3 tracks fit your domain).
1 fitness agent scores both systems on 10 structural vectors:
| Vector | What It Measures |
|---|---|
| Prerequisites | Dependencies, setup complexity, what must exist before it runs |
| Cost Impact | Token burn, API calls, agent spawns per typical run |
| Quality Control | Does it validate output, or does quality depend on the model? |
| User Experience | Friction level, interaction model, output format fit |
| Skill Standard Compliance | Hits the canonical good-skill bar (scoped, testable, clear triggers) |
| Security | API key handling, prompt injection surface, data leaving workspace |
| System Architecture Fit | Skill vs pipeline vs full system — is the scope right? |
| Workspace Compatibility | Plugin namespace fit, routing conventions |
| Trigger Precision | Unambiguous triggers, no overlap with adjacent skills |
| Maintenance Decay Rate | How fast does this rot? (scraped data vs stable internal conventions) |
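
One way to picture the fitness agent's output is a list of per-vector verdicts in which a tie cannot be expressed. The sketch below is illustrative only: the type names, the `rationale` field, and the sample verdicts are assumptions, since the skill does not publish a schema.

```typescript
// Hypothetical schema: the vector names come from the table above,
// everything else (types, fields, sample data) is assumed for illustration.
const VECTORS = [
  "Prerequisites",
  "Cost Impact",
  "Quality Control",
  "User Experience",
  "Skill Standard Compliance",
  "Security",
  "System Architecture Fit",
  "Workspace Compatibility",
  "Trigger Precision",
  "Maintenance Decay Rate",
] as const;

type Vector = (typeof VECTORS)[number];
type System = "A" | "B"; // no "tie" member, so every verdict must name a winner

interface VectorVerdict {
  vector: Vector;
  winner: System;
  rationale: string;
}

// Count how many vectors each system wins.
function tally(verdicts: VectorVerdict[]): Record<System, number> {
  const counts: Record<System, number> = { A: 0, B: 0 };
  verdicts.forEach((v) => {
    counts[v.winner] += 1;
  });
  return counts;
}

// List the vectors that have no verdict yet.
function missingVectors(verdicts: VectorVerdict[]): Vector[] {
  return VECTORS.filter((name) => !verdicts.some((v) => v.vector === name));
}

// Made-up sample: A wins 2 vectors, B wins 1, and 7 vectors are still unscored.
const sample: VectorVerdict[] = [
  { vector: "Prerequisites", winner: "A", rationale: "fewer setup steps" },
  { vector: "Security", winner: "B", rationale: "no keys leave the workspace" },
  { vector: "Trigger Precision", winner: "A", rationale: "no trigger overlap" },
];

console.log(tally(sample), missingVectors(sample).length);
```

Leaving "tie" out of the `System` type mirrors the forced-verdict rule at the type level: an agent report that hedges simply does not type-check.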
Phase 1 is a grill-me style intake — one question at a time, recommended answer provided, reads workspace before asking. Phase 2 spawns all 4 agents simultaneously. Phase 3 synthesizes domain matrix + fitness scorecard + ranked steal list. Phase 4 generates the PDF.
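
The Phase 2 fan-out can be sketched as four promises started together and awaited as a group. This is a minimal sketch under stated assumptions: `runAgent` is a hypothetical stand-in for however Claude Code actually spawns an agent, and the report shape is invented for illustration.

```typescript
type AgentKind = "domain" | "fitness";

interface AgentReport {
  name: string;
  kind: AgentKind;
  findings: string[];
}

// Hypothetical placeholder: a real agent would read source files and return verdicts.
async function runAgent(name: string, kind: AgentKind): Promise<AgentReport> {
  return { name, kind, findings: [`${name}: verdict pending`] };
}

// Start all four agents before awaiting any of them, so they run concurrently.
async function runPhase2(tracks: [string, string, string]): Promise<AgentReport[]> {
  const pending = [
    ...tracks.map((track) => runAgent(track, "domain")),
    runAgent("fitness", "fitness"),
  ];
  // Promise.all resolves in input order: three domain reports, then the fitness report.
  return Promise.all(pending);
}

runPhase2(["infrastructure", "copy/QA", "measurement/automation"]).then((reports) => {
  console.log(reports.map((r) => `${r.kind}:${r.name}`));
});
```

Phase 3 would then consume the resolved array as a whole, which is why the sketch waits for every report instead of processing them as they arrive.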
The deliverable is a PDF. Magazine layout. Cover page. SVG strength diagram. Per-track comparison blocks (orange winner, gray loser). Fitness scorecard. Full matrix. Numbered steal list ranked by operational impact. Moats callout.
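
The numbered steal list amounts to a sort by impact followed by numbering. The sketch below assumes a numeric `impact` field where higher means more operational impact; the skill does not say how impact is quantified, and the item titles are made up.

```typescript
interface StealItem {
  title: string;
  impact: number; // assumed scale: higher means more operational impact
}

// Return numbered lines, highest impact first, without mutating the input.
function rankStealList(items: StealItem[]): string[] {
  return [...items]
    .sort((x, y) => y.impact - x.impact)
    .map((item, index) => `${index + 1}. ${item.title}`);
}

// Hypothetical items for illustration only.
const stealItems: StealItem[] = [
  { title: "Deliverability preflight", impact: 2 },
  { title: "List-quality gate", impact: 5 },
  { title: "Copy QA rubric", impact: 3 },
];

const ranked = rankStealList(stealItems);
console.log(ranked.join("\n"));
```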
- external-vs-internal — external repo or competitor vs your internal stack. Gap-finding bias.
- skill-vs-skill — two skills in your workspace. Decide keep/merge/deprecate.
- skill-vs-system — is this skill actually a system? Produces an architecture verdict.
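
The three modes listed above map naturally onto a closed set of string values. The following is a sketch, not the skill's actual configuration: the `Mode` type, the `MODE_OUTPUT` table, and the `isMode` guard are hypothetical names, and the descriptions paraphrase the list.

```typescript
const MODES = ["external-vs-internal", "skill-vs-skill", "skill-vs-system"] as const;

type Mode = (typeof MODES)[number];

// What each mode is expected to produce, paraphrased from the mode list.
const MODE_OUTPUT: Record<Mode, string> = {
  "external-vs-internal": "gap-finding comparison of an external repo or competitor against the internal stack",
  "skill-vs-skill": "keep, merge, or deprecate decision for two workspace skills",
  "skill-vs-system": "architecture verdict on whether the skill is really a system",
};

// Type guard so free-form input can be narrowed to a known mode.
function isMode(value: string): value is Mode {
  return MODES.some((mode) => mode === value);
}

const requested: string = "skill-vs-skill";
if (isMode(requested)) {
  console.log(MODE_OUTPUT[requested]);
}
```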
Copy SKILL.md into your Claude Code workspace:

    mkdir -p .claude/skills/system-comparison
    cp SKILL.md .claude/skills/system-comparison/SKILL.md

Requires puppeteer for PDF generation:

    bun add puppeteer
    # or: npm install puppeteer

Say any of these to invoke the skill:
- "compare our skills to [X]"
- "gap assessment between [A] and [B]"
- "what should we steal from [X]"
- "evaluate this repo"
- "is this a skill or a system"
- "benchmark against [X]"
The first validated run compared LeadGrow's internal outbound stack (GTM orchestrator + 20 lg-outbound skills) against coldoutboundskills (28 open-source skills, 1,000+ campaigns).
Finding: LeadGrow was back-loaded (strong post-launch, weak pre-launch). coldoutboundskills was front-loaded (strong pre-launch, weak post-launch). Symmetrical gap. 12 specific steal items ranked by operational impact.
Every run produces output of this kind: a concrete gap finding plus a steal list ranked by operational impact.
- Claude Code with agent spawning support
- `bun` or `node` + `puppeteer` for PDF generation
- Google Fonts access for PDF rendering (Poppins + Lora via CDN)
PDF uses cream (#faf9f5) / black (#141413) / orange (#d97757). Poppins for headings and labels, Lora for body copy. Letter size. The HTML component library is documented in SKILL.md under Phase 4.
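
The palette and font choices could be applied through a small HTML template like the one below. Only the three hex colors, the two font names, and the Letter page size come from this README; the function names, CSS class names, and page structure are assumptions and are not the component library documented in SKILL.md.

```typescript
// Colors and fonts from the README; all other names here are illustrative.
const THEME = {
  cream: "#faf9f5",
  black: "#141413",
  orange: "#d97757",
  headingFont: "Poppins",
  bodyFont: "Lora",
} as const;

// Minimal escaping so a title cannot break the markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// bodyHtml is inserted as-is and is assumed to be trusted markup.
function buildReportHtml(title: string, bodyHtml: string): string {
  return `<!doctype html>
<html>
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Lora&family=Poppins:wght@600&display=swap">
<style>
  @page { size: Letter; }
  body { background: ${THEME.cream}; color: ${THEME.black}; font-family: "${THEME.bodyFont}", serif; }
  h1, h2, .label { font-family: "${THEME.headingFont}", sans-serif; }
  .winner { border-left: 4px solid ${THEME.orange}; }
  .loser { border-left: 4px solid gray; }
</style>
</head>
<body>
<h1>${escapeHtml(title)}</h1>
${bodyHtml}
</body>
</html>`;
}

const reportHtml = buildReportHtml("A <vs> B", '<div class="winner">Track 1</div>');
console.log(reportHtml.length);
```

A string like this could then be handed to puppeteer, for example via `page.setContent` followed by `page.pdf` with `format: "Letter"` and `printBackground: true` so the cream background is kept in the output.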
Built by LeadGrow — GTM agency running Claude Code as operating infrastructure.