41 changes: 41 additions & 0 deletions .wip/tasks/draft-protect-quote-wire-crawlers.md
@@ -0,0 +1,41 @@
---
status: draft
---

# Protect the quote wire from crawler loops

## Problem

The Elsewhere quote wire creates procedural journey URLs with query parameters. After
release, ClaudeBot and OpenAI crawlers followed the quote wire deeply enough to burn
through the Cloudflare Workers allotment.

The site should still expose the quote index and clean quote pages for normal discovery,
but crawlers should not be invited into the infinite-looking journey space.

## Proposed solution

Keep the public quote index crawlable while keeping crawlers out of the procedural
journey URLs, so that crawling cannot exhaust the Workers budget. Clean archive and quote
URLs should remain available, but query-state journey URLs should not be indexed or followed.

Start with a repo-level fix before adding Cloudflare custom rules. Robots directives
should explicitly keep crawlers out of `/elsewhere/quotes/*?*`, with specific handling for
ClaudeBot and OpenAI crawlers where useful. Journey links should signal `nofollow`, and
journey pages should keep canonical URLs pointed at their clean quote path.
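
As a rough illustration of the canonical rule above, the clean quote path can be derived by dropping the query state from a journey URL. This is a minimal sketch, not code from this change: `cleanQuoteUrl` is a hypothetical helper name, and the `step` and `from` query parameters in the usage note are made up for the example.

```typescript
// Hypothetical helper (not part of this PR): derive the clean quote URL
// that a journey page's canonical link should point at.
function cleanQuoteUrl(journeyUrl: string): string {
  const url = new URL(journeyUrl);
  url.search = ""; // drop the procedural journey state
  url.hash = ""; // fragments never belong in a canonical URL
  return url.toString();
}
```

For example, `cleanQuoteUrl("https://johnhooks.io/elsewhere/quotes/some-slug?step=3&from=a")` returns `https://johnhooks.io/elsewhere/quotes/some-slug`, which is the value the page's canonical link would carry.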

Cloudflare-side protection should remain a fallback if robots directives and link hints do
not reduce abusive crawler traffic enough.
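
If escalation becomes necessary, one option short of Cloudflare custom rules is a check in the Worker itself. The sketch below is an assumption about what that could look like, not something this change implements: `BLOCKED_AGENTS`, `isJourneyRequest`, and `shouldBlock` are hypothetical names, and matching on a User-Agent substring only stops crawlers that identify themselves honestly.

```typescript
// Hypothetical escalation sketch; nothing here is part of this PR.
// Crawler tokens mirror the user agents named in robots.txt.
const BLOCKED_AGENTS = ["ClaudeBot", "GPTBot", "ChatGPT-User", "OAI-SearchBot"];

// A journey request is a quote page URL that carries query state.
function isJourneyRequest(url: URL): boolean {
  return url.pathname.startsWith("/elsewhere/quotes/") && url.search !== "";
}

// True when a self-identified crawler asks for a query-state journey URL.
function shouldBlock(requestUrl: string, userAgent: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  const isCrawler = BLOCKED_AGENTS.some((agent) => ua.includes(agent.toLowerCase()));
  return isCrawler && isJourneyRequest(new URL(requestUrl));
}
```

A request handler could call `shouldBlock(request.url, request.headers.get("user-agent"))` before rendering and return an early error response when it is true, so blocked requests avoid the page render. Each blocked request would still count as a Worker invocation.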

## Requirements

- `/elsewhere/quotes`, `/elsewhere/quotes/archive`, and clean `/elsewhere/quotes/{slug}`
pages should remain accessible.
- Procedural journey URLs with query parameters should be disallowed for crawlers.
- Journey continuation links should use `rel="nofollow"`.
- Journey pages should remain `noindex, nofollow` and canonicalize to the clean quote URL.
- ClaudeBot and the OpenAI crawlers (GPTBot, ChatGPT-User, OAI-SearchBot) should each get an explicit `User-agent` group in robots directives.
- Do not add Cloudflare custom rules unless the repo-level crawl controls prove
insufficient.
- The first fix may rely on crawler politeness, but it should leave a clear escalation
path for Worker budget protection.
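
The path requirements above can be sanity-checked with a small matcher for robots.txt patterns. This is a minimal sketch assuming the RFC 9309 wildcard rules (`*` matches any run of characters, a trailing `$` anchors the end, everything else is literal). `robotsPatternMatches` is a hypothetical name, the `step` parameter is made up, and real crawlers may differ in details.

```typescript
// Hypothetical checker (not part of this PR) for robots.txt path patterns.
// `*` matches any run of characters; a trailing `$` anchors the end of the URL.
function robotsPatternMatches(pattern: string, pathAndQuery: string): boolean {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    // Escape regex metacharacters so that `?` and `.` are treated literally.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : "")).test(pathAndQuery);
}

const journeyPattern = "/elsewhere/quotes/*?*";

// Journey URL with query state: disallowed.
console.log(robotsPatternMatches(journeyPattern, "/elsewhere/quotes/some-slug?step=2")); // true
// Clean quote page, archive, and index: not matched, so `Allow: /` applies.
console.log(robotsPatternMatches(journeyPattern, "/elsewhere/quotes/some-slug")); // false
console.log(robotsPatternMatches(journeyPattern, "/elsewhere/quotes/archive")); // false
console.log(robotsPatternMatches(journeyPattern, "/elsewhere/quotes")); // false
```

Because the pattern requires a slash after `quotes`, a query string on the index itself (for example `/elsewhere/quotes?step=2`) would not match. That only matters if journey state can ever appear on the index URL.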
1 change: 1 addition & 0 deletions src/elsewhere/quotes/QuotePage.astro
@@ -70,6 +70,7 @@ const { Content } = quote;
 <a
 class={`quote-path quote-path--${index}`}
 href={door.href}
+rel="nofollow"
 data-journey-focus={index === 0 ? true : undefined}
 >
 {doorWords[index]}
21 changes: 20 additions & 1 deletion src/pages/robots.txt.ts
@@ -1,7 +1,26 @@
 import { env } from "cloudflare:workers";
 
-const productionRobots = `User-agent: *
+const quoteJourneyDisallow = "Disallow: /elsewhere/quotes/*?*";
+
+const productionRobots = `User-agent: ClaudeBot
+Allow: /
+${quoteJourneyDisallow}
+
+User-agent: GPTBot
+Allow: /
+${quoteJourneyDisallow}
+
+User-agent: ChatGPT-User
+Allow: /
+${quoteJourneyDisallow}
+
+User-agent: OAI-SearchBot
+Allow: /
+${quoteJourneyDisallow}
+
+User-agent: *
 Allow: /
+${quoteJourneyDisallow}
 
 Sitemap: https://johnhooks.io/sitemap.xml
 `;