Use robots.txt and mitigate AI bots #460
Open
cycomachead wants to merge 6 commits into
- Update robots.txt to disallow /project/ and /api/ for compliant crawlers.
- Add a user-agent blocklist to return 403 for aggressive SEO bots.
- Implement rate limiting (60r/s) specifically for the /project/:id XML endpoint to reduce egress costs from automated scraping.
- Serve robots.txt as a static file directly from the repository.

Co-authored-by: Claude Code <noreply@anthropic.com>
…ad/ai/26/1
* 'main' of github.com:snap-cloud/snapCloud:
  docs: move installation and deployment guides to docs directory
  rerun migrations
  feat: enhance compression logging and prevent stale CSS assets
  feat: implement pre-compression and global gzip configuration
Update internal UI and model call sites to use the explicit /api/v1/project/:id path instead of the bare /project/:id route. This ensures legitimate application traffic bypasses bot-mitigation rate limits targeting legacy crawler entry points.

Co-authored-by: Claude Code <noreply@anthropic.com>
cycomachead commented May 23, 2026
    # UA blocklist for SEO crawlers that ignore robots.txt. \b word-boundaries
    # guard against substring false positives (e.g. "yeti" inside a longer UA).
    map $http_user_agent $snap_block_ua {
These bots were pulled from the user agents in actual request logs over a few days in May.
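For readers who cannot see the full diff, the blocklist described in the comment above might look roughly like the sketch below. Only the `map` line, the variable name `$snap_block_ua`, and the bot names come from the PR. The `default 0` entry, the case-insensitive `~*` prefix, and the exact pattern spelling are assumptions, and only two of the listed bots are shown.

```nginx
# http context. Sketch only: the PR's actual patterns are not shown in this excerpt.
map $http_user_agent $snap_block_ua {
    # Assumption: unmatched user agents map to 0, which `if` treats as false.
    default            0;

    # "~*" makes the regex case-insensitive (assumed). \b anchors on word
    # boundaries, so "yeti" does not match inside a longer token.
    "~*\bahrefsbot\b"  1;
    "~*\byeti\b"       1;
}
```

With a map like this, `$snap_block_ua` evaluates to `1` only for requests whose User-Agent contains one of the listed names as a whole word.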
Update nginx config to reduce outbound traffic
Changes
- `html/robots.txt`: Replaces the old per-bot rules with a policy that blocks `/project/` and `/api/` for all crawlers while explicitly allowing the upcoming `/project/*/users/*` viewer route; adds `Crawl-delay: 10` and a `Sitemap:` line (confirm with owner whether a real sitemap exists before merging)
- `nginx.conf.d/snap-bot-mitigation.conf` (new): http-context `map` that flags 8 SEO bots known to ignore robots.txt (`ahrefsbot`, `semrushbot`, `dotbot`, `mj12bot`, `sleepbot`, `yeti`, `blexbot`, `petalbot`); `limit_req_zone` for `/project/<id>` at 60 r/s; `limit_req_status 429`
- `nginx.conf.d/snap-bot-mitigation-server.conf` (new): server-context directives: `location = /robots.txt` serving the static file from the repo (no app dependency), and `if ($snap_block_ua) { return 403; }`
- `nginx.conf.d/locations.conf`: Includes the server-context snippet (applies to both prod hosts, both staging hosts, and dev via the shared `locations.conf`); adds a `location ~ ^/project/[0-9]+/?$` block with `limit_req zone=snap_project burst=120 nodelay` targeting the raw XML endpoint, the single largest egress source
- `nginx.conf`: Includes `snap-bot-mitigation.conf` in http context

Reviewer notes
- Check `error.log` for `limiting requests` lines after deploy and tighten only with evidence.
- The `/project/<name>/users/<username>` HTML route does not match the numeric-ID regex and is unaffected by rate limiting.
- Deploy: `nginx -t && systemctl reload nginx` (graceful reload, not restart).
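As a rough illustration of how the rate-limiting and blocklist pieces described above fit together, here is a minimal sketch. The directive names, the zone name `snap_project`, the rate, the burst, and the location regex come from the PR description. The per-IP key `$binary_remote_addr`, the `10m` zone size, and the `alias` path are assumptions for illustration, not values taken from the diff.

```nginx
# --- http context (snap-bot-mitigation.conf) ---
# Assumption: keyed per client IP with a 10 MB shared zone.
limit_req_zone $binary_remote_addr zone=snap_project:10m rate=60r/s;
limit_req_status 429;

# --- server context (snap-bot-mitigation-server.conf, locations.conf) ---
# Serve robots.txt straight from disk, with no app dependency.
location = /robots.txt {
    # Hypothetical path; substitute the real repository checkout.
    alias /path/to/snapCloud/html/robots.txt;
}

# Reject user agents flagged by the http-context map.
if ($snap_block_ua) { return 403; }

# Raw XML endpoint: numeric project IDs only, optional trailing slash.
location ~ ^/project/[0-9]+/?$ {
    limit_req zone=snap_project burst=120 nodelay;
    # The existing handler for this route goes here (not shown in the PR summary).
}
```

With `burst=120 nodelay`, up to 120 requests above the 60 r/s rate are served immediately rather than queued, and anything beyond that is rejected with the configured 429 status. A path such as `/project/<name>/users/<username>` does not match the numeric-ID regex, so it never enters this location.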