Draft PR to store changes to my blog post#17877

Draft
robbie-c wants to merge 2 commits into master from sql-parser-robbie-edits

Conversation

@robbie-c

Member

Changes

See the original https://docs.google.com/document/d/1d0J9OUwxN7uCD9q7TMU8P5S91IOg3Aoe9TTP_OKVCFI/edit?tab=t.0
See #17681

Checklist

  • I've read the docs and/or content style guides.
  • Words are spelled using American English
  • Use relative URLs for internal links
  • I've checked the pages added or changed in the Vercel preview build
  • If I moved a page, I added a redirect in vercel.json

ivanagas and others added 2 commits June 16, 2026 16:26
My version on top of #17681: keeps the original intro and voice, adopts the
section headings, links, and restructured lists, adds new content
(logical-vs-physical-layout rationale, ShrinkRay, coverage-guided generation),
and a few copy-edit passes.
@github-actions

github-actions Bot commented Jun 24, 2026

Contributor

Deploy preview

| Status | Details | Updated (UTC) |
| --- | --- | --- |
| 🟢 Ready | View preview | Jun 24, 2026 12:37AM |

@github-actions

Contributor

Vale prose linter → found 0 errors, 19 warnings, 0 suggestions in your markdown

Full report → Copy the linter results into an LLM to batch-fix issues.

Linter being weird? Update the rules!

contents/blog/sql-parser.md — 0 errors, 19 warnings, 0 suggestions
| Line | Severity | Message | Rule |
| --- | --- | --- | --- |
| 16:51 | warning | 'clacky' is a possible misspelling. | PostHogBase.Spelling |
| 22:73 | warning | 'autoresearch' is a possible misspelling. | PostHogBase.Spelling |
| 28:70 | warning | 'transpile' is a possible misspelling. | PostHogBase.Spelling |
| 33:37 | warning | Capitalize 'Product Analytics' for PostHog's product. Use 'product analytics' for the general industry concept. | PostHogBase.ProductNames |
| 33:56 | warning | Capitalize 'Session Replay' for PostHog's product. Use 'session replay' for the general industry concept. | PostHogBase.ProductNames |
| 33:72 | warning | Capitalize 'Error Tracking' for PostHog's product. Use 'error tracking' for the general industry concept. | PostHogBase.ProductNames |
| 33:151 | warning | 'transpilation' is a possible misspelling. | PostHogBase.Spelling |
| 33:285 | warning | 'untrusted' is a possible misspelling. | PostHogBase.Spelling |
| 35:27 | warning | 'transpilation' is a possible misspelling. | PostHogBase.Spelling |
| 35:132 | warning | 'transpiled' is a possible misspelling. | PostHogBase.Spelling |
| 37:4 | warning | 'Generating our parser with ANTLR' heading should be in sentence case, and product names should be capitalized. | PostHogBase.SentenceCase |
| 41:134 | warning | 'declaratively' is a possible misspelling. | PostHogBase.Spelling |
| 45:48 | warning | 'lookahead' is a possible misspelling. | PostHogBase.Spelling |
| 53:162 | warning | 'lookahead' is a possible misspelling. | PostHogBase.Spelling |
| 67:40 | warning | 'transpiler' is a possible misspelling. | PostHogBase.Spelling |
| 71:93 | warning | 'codegen' is a possible misspelling. | PostHogBase.Spelling |
| 75:182 | warning | 'lookahead' is a possible misspelling. | PostHogBase.Spelling |
| 75:237 | warning | 'lookahead' is a possible misspelling. | PostHogBase.Spelling |
| 87:13 | warning | 'anonymized' is a possible misspelling. | PostHogBase.Spelling |


After the success of using agents to [improve query performance through autoresearch](/blog/karpathy-autoresearch-query-engine-bug), I wanted to try something more ambitious.

I ran multiple long-running Claude Code sessions in parallel, and the result was 16K lines of "hand"-rolled parser code, 5K lines of tooling, and a few more K of tests.

Member Author

I kept this instead of the stat about the number of queries. My rationale is that this isn't really a story about doing something at scale; it's about using AI coding to do something hard that involves some interesting computer science.


Hypothesis will "reduce" test cases for you, turning them into a minimal reproduction, but I couldn’t use that with SQL from other sources. For those I used [ShrinkRay](https://github.com/DRMacIver/shrinkray) instead.

Later on, I added code-coverage-guided test case generation, which gives a better distribution of generated SQL. With coverage feedback, the generator can tell which constructs it hasn't exercised yet and bias toward those. This wasn't necessary to hit 100% accuracy on a production corpus, but it did help me find some very subtle test cases.
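
To make the coverage-feedback idea concrete, here is a minimal sketch, not the generator used for the post. It assumes a made-up list of construct names (`CONSTRUCTS`) and stands in for real code coverage with a simple hit counter: each construct is weighted by how rarely it has been exercised so far, so unexercised constructs are the most likely to be picked next.

```python
import random
from collections import Counter

# Hypothetical construct names; a real generator would derive these from the grammar.
CONSTRUCTS = ["select", "where", "join", "group_by", "window"]


def pick_construct(hits: Counter, rng: random.Random) -> str:
    """Pick a construct, weighting each by 1 / (1 + times already exercised)."""
    weights = [1.0 / (1 + hits[name]) for name in CONSTRUCTS]
    return rng.choices(CONSTRUCTS, weights=weights, k=1)[0]


def run(rounds: int, seed: int = 0) -> Counter:
    """Simulate `rounds` generation steps and return how often each construct ran."""
    rng = random.Random(seed)
    hits: Counter = Counter()
    for _ in range(rounds):
        construct = pick_construct(hits, rng)
        # Stand-in for: generate SQL using this construct, parse it,
        # and read back which parser code paths the run covered.
        hits[construct] += 1
    return hits
```

In a real setup the counter would be fed by a coverage tool measuring the parser itself rather than by the generator's own choices; the weighting step is the part this sketch is meant to illustrate.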

Member Author

This is new; I did this after writing the original.


The two parallel parser approaches shared their regression suites, so any failing test case found in one session was shared with the other.

Hypothesis will "reduce" test cases for you, turning them into a minimal reproduction, but I couldn’t use that with SQL from other sources. For those I used [ShrinkRay](https://github.com/DRMacIver/shrinkray) instead.
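
For readers unfamiliar with test-case reduction, the sketch below shows the general idea with a greedy chunk-removal loop over tokens. It is a simplified illustration, not the algorithm Hypothesis or ShrinkRay actually uses, and `still_fails` is a hypothetical callback that would, in practice, rerun the parser and report whether the bug still reproduces.

```python
from typing import Callable, List


def reduce_tokens(
    tokens: List[str], still_fails: Callable[[List[str]], bool]
) -> List[str]:
    """Greedily drop chunks of tokens while the failure still reproduces."""
    assert still_fails(tokens), "the starting input must reproduce the failure"
    chunk = max(len(tokens) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(tokens):
            # Try the input with tokens[i : i + chunk] removed.
            candidate = tokens[:i] + tokens[i + chunk:]
            if candidate and still_fails(candidate):
                tokens = candidate  # keep the smaller input, retry at the same index
            else:
                i += chunk  # this chunk is needed, move on
        chunk //= 2  # try finer-grained removals next
    return tokens
```

For example, with a toy failure condition that only needs `IN` and `(` to be present, `reduce_tokens("SELECT a , b FROM t WHERE x IN ( 1 , 2 )".split(), lambda ts: "IN" in ts and "(" in ts)` shrinks the 14-token query down to `["IN", "("]`.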

Member Author

This is new; I did this after writing the original.

@github-actions

Contributor

Bundle report

Total JS (gzip)

6.21 MiB (no change)

Eager graph (static-import closure per entrypoint)

| Entrypoint | Eager size | Budget | Modules |
| --- | --- | --- | --- |
| app | 24.13 MiB (no change) | report-only | 5505 |

Largest modules in the app closure

| Module | Size |
| --- | --- |
| css ./node_modules/.pnpm/css-loader@5.2.7_webpack@5.101.3/node_modules/css-loader/dist/cjs.js??ruleSet[1].rules[8].oneOf[1].use[1]!./node_modules/.pnpm/postcss-loader@4.3.0_postcss@8.5.6_webpack@5.101.3/node_modules/postcss-loader/dist/cjs.js??ruleSet[1].rules[8].oneOf[1].use[2]!./src/styles/global.css | 710.3 KiB |
| ./src/components/Stickers/Stickers.tsx | 696.4 KiB |
| ./.cache/caches/gatsby-plugin-mdx/mdx-scopes-dir/31a094f140f119e73085d847ae81b99b.js + 2 modules | 531.3 KiB |
| ./node_modules/.pnpm/@radix-ui+react-icons@1.3.2_react@18.3.1/node_modules/@radix-ui/react-icons/dist/react-icons.esm.js | 481.4 KiB |
| ./node_modules/.pnpm/@codemirror+view@6.38.2/node_modules/@codemirror/view/dist/index.js | 458.1 KiB |
| ./node_modules/.pnpm/rehype-raw@7.0.0/node_modules/rehype-raw/lib/index.js + 29 modules | 395.1 KiB |
| ./node_modules/.pnpm/@posthog+icons@0.36.6_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.cjs.js | 364.8 KiB |
| ./node_modules/.pnpm/@posthog+icons@0.36.6_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.es.js | 354.8 KiB |
| ./src/hooks/useCustomers.tsx + 54 modules | 353.9 KiB |
| ./node_modules/.pnpm/react-markdown@8.0.7_@types+react@16.14.66_react@18.3.1/node_modules/react-markdown/lib/react-markdown.js + 88 modules | 351.4 KiB |
| ./node_modules/.pnpm/cloudinary-core@2.14.0_lodash@4.17.21/node_modules/cloudinary-core/cloudinary-core.js | 281.9 KiB |
| ./node_modules/.pnpm/@codesandbox+sandpack-react@2.20.0_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@codesandbox/sandpack-react/dist/index.mjs | 266.6 KiB |
| ./src/components/ProductComparisonTable/index.tsx + 114 modules | 264.0 KiB |
| ./node_modules/.pnpm/d3@7.9.0/node_modules/d3/src/index.js + 208 modules | 247.4 KiB |
| ./src/components/Pricing/PricingSlider/Slider.tsx + 87 modules | 239.9 KiB |

Eager-graph budgets are report-only until a baseline is established. Sizes are gzip of public/**/*.js; eager size is webpack module source bytes.

2 participants