Skip to content

ValkDB/postgresparser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

postgresparser

CI Go Reference License

A pure-Go PostgreSQL parser. No cgo, no C toolchain — just go build.

Why postgresparser?

Need to parse PostgreSQL SQL in Go but can't use cgo? Deploying to Alpine containers, Lambda, ARM, scratch images, or anywhere that requires CGO_ENABLED=0?

postgresparser works everywhere go build works. It parses SQL into a structured intermediate representation (IR) that gives you tables, columns, joins, filters, CTEs, subqueries, and more — without executing anything.

result, err := postgresparser.ParseSQL(`
    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.active = true
    GROUP BY u.name
    ORDER BY order_count DESC
`)

fmt.Println(result.Command)       // "SELECT"
fmt.Println(result.Tables)        // users, orders with aliases
fmt.Println(result.Columns)       // u.name, COUNT(o.id) AS order_count
fmt.Println(result.Where)         // ["u.active=true"]
fmt.Println(result.JoinConditions) // ["o.user_id=u.id"]
fmt.Println(result.GroupBy)       // ["u.name"]
fmt.Println(result.ColumnUsage)   // each column with its role: filter, join, projection, group, order

For multi-statement behavior and strict mode, see Statement Count Handling.

batch, err := postgresparser.ParseSQLAll(`
CREATE TABLE public.api_key (
    id integer NOT NULL
);
CREATE TABLE public.sometable (
    id integer NOT NULL
);`)
if err != nil {
    log.Fatal(err)
}

fmt.Println(len(batch.Statements))                              // 2
fmt.Println(batch.Statements[0].Query.DDLActions[0].ObjectName) // "api_key"
fmt.Println(batch.Statements[1].Query.DDLActions[0].ObjectName) // "sometable"

Performance: With SLL prediction mode, most queries parse in 70–350 µs.

Installation

go get github.com/valkdb/postgresparser

What you can build with it

  • Query linting — detect missing WHERE on DELETEs, flag SELECT *, enforce naming conventions
  • Dependency extraction — map which tables and columns a query touches, build lineage graphs
  • Migration tooling — parse DDL to understand schema changes, diff CREATE statements
  • Audit logging — tag log entries with structured metadata (tables, operation type, filtered columns)
  • Query rewriting — inject tenant filters, add audit columns, transform SQL before execution
  • Index advisors — analyze column usage patterns to suggest optimal indexes

Parsing

Handles the SQL you actually write in production:

  • DML: SELECT, INSERT, UPDATE, DELETE, MERGE
  • DDL: CREATE TABLE (columns/type/nullability/default + PK/FK/UNIQUE/CHECK constraints), CREATE INDEX, DROP TABLE/INDEX, ALTER TABLE, TRUNCATE, COMMENT ON
  • CTEs: WITH ... AS including RECURSIVE, materialization hints
  • JOINs: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL, LATERAL
  • Subqueries: in SELECT, FROM, WHERE, and HAVING
  • Set operations: UNION, INTERSECT, EXCEPT (ALL/DISTINCT)
  • Upsert: INSERT ... ON CONFLICT DO UPDATE/DO NOTHING
  • JSONB: ->, ->>, @>, ?, ?|, ?&
  • Window functions: OVER, PARTITION BY
  • Type casts: ::type
  • Parameters: $1, $2, ...

IR field reference: ParsedQuery IR Reference Comment extraction guide: Comment Extraction Guide

Statement Count Handling

Use the API variant that matches your input contract:

  • ParseSQL(sql) parses the first statement only (backward-compatible behavior).
  • ParseSQLAll(sql) parses all statements and returns ParseBatchResult with one Statements[i] result per input statement (Index, RawSQL, Query, Warnings).
    • A statement failed conversion when Statements[i].Query == nil.
    • Correlation is deterministic: Statements[i].Index maps to source statement order.
    • HasFailures is true when any statement has a nil Query or any Warnings.
  • ParseSQLStrict(sql) requires exactly one statement and returns ErrMultipleStatements when input contains more than one.
  • ParseSQLWithOptions(sql, opts), ParseSQLAllWithOptions(sql, opts), and ParseSQLStrictWithOptions(sql, opts) expose optional extraction flags.
    • IncludeCreateTableFieldComments enables inline -- field-comment extraction in CREATE TABLE.
    • COMMENT ON extraction is always enabled.

Supported SQL Statements

See docs/supported-statements.md for full details on parsed commands, graceful handling (e.g. SET/SHOW/RESET), and what's currently UNKNOWN or unsupported.

Category Statements Status
DML SELECT, INSERT, UPDATE, DELETE, MERGE Full IR extraction
DDL CREATE TABLE, ALTER TABLE, DROP TABLE/INDEX, CREATE INDEX, TRUNCATE, COMMENT ON Full IR extraction
Utility SET, SHOW, RESET Graceful — returns UNKNOWN, no error
Other GRANT, REVOKE, CREATE VIEW/FUNCTION/TRIGGER, COPY, EXPLAIN, VACUUM, BEGIN/COMMIT/ROLLBACK, etc. Not yet supported — may error or return UNKNOWN

Analysis

The analysis subpackage provides higher-level intelligence on top of the IR:

  • analysis.AnalyzeSQL(sql) analyzes the first statement only (matches ParseSQL).
  • analysis.AnalyzeSQLAll(sql) analyzes all statements and returns SQLAnalysisBatchResult.
  • analysis.AnalyzeSQLStrict(sql) requires exactly one statement (matches ParseSQLStrict).

Column usage analysis

Know exactly how every column is used — filtering, joining, projection, grouping, ordering:

result, err := analysis.AnalyzeSQL("SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.status = 'active'")

for _, cu := range result.ColumnUsage {
    fmt.Printf("%s.%s → %s\n", cu.TableAlias, cu.Column, cu.UsageType)
}
// o.id → projection
// c.name → projection
// o.customer_id → join
// c.id → join
// o.status → filter

WHERE condition extraction

Pull structured conditions with operators and values:

conditions, _ := analysis.ExtractWhereConditions("SELECT * FROM orders WHERE status = 'active' AND total > 100")

for _, c := range conditions {
    fmt.Printf("%s %s %v\n", c.Column, c.Operator, c.Value)
}
// status = active
// total > 100

Placeholder roles

AnalyzeSQL exposes the syntactic role of each ? or $N placeholder without re-parsing or scanning SQL text:

result, _ := analysis.AnalyzeSQL("SELECT * FROM t WHERE id = ? LIMIT ?")
for _, p := range result.Placeholders {
    fmt.Printf("placeholder %d: role=%s\n", p.Index, p.Role)
}
// Output:
//   placeholder 1: role=where_value
//   placeholder 2: role=limit

Roles cover common placeholder positions such as predicate values, function arguments, GROUP BY and ORDER BY ordinals, LIMIT, OFFSET, INTERVAL, array members, INSERT values, and UPDATE SET values. This supports generic use cases such as ORM type-checking, query rewriting, SQL linting, and normalized-SQL inspection.

Schema-aware JOIN relationship detection

Pass in your schema metadata and get back foreign key relationships — no heuristic guessing:

schema := map[string][]analysis.ColumnSchema{
    "customers": {
        {Name: "id", PGType: "bigint", IsPrimaryKey: true},
        {Name: "name", PGType: "text"},
    },
    "orders": {
        {Name: "id", PGType: "bigint", IsPrimaryKey: true},
        {Name: "customer_id", PGType: "bigint"},
    },
}

joins, _ := analysis.ExtractJoinRelationshipsWithSchema(
    "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id",
    schema,
)
// orders.customer_id → customers.id

DDL extraction

For CREATE TABLE parsing, see examples/ddl/.

Performance

With SLL prediction mode, postgresparser parses most queries in 70–350 µs with minimal allocations. The IR extraction layer accounts for only ~3% of CPU — the rest is ANTLR's grammar engine, which SLL mode keeps fast.

See the Performance Guide for benchmarks, profiling results, and optimization details.

Examples

See the examples/ directory:

  • basic/ — Parse SQL and inspect the IR
  • analysis/ — Column usage, WHERE conditions, JOIN relationships
  • ddl/ — Parse CREATE TABLE / ALTER TABLE plus DELETE command metadata
  • multi_statement/ — Correlate ParseSQLAll output back to each input statement and detect failures (Query == nil)
  • sll_mode/ — SLL prediction mode for maximum throughput

Grammar

Built on ANTLR4 grammar files in grammar/. To regenerate after modifying:

antlr4 -Dlanguage=Go -visitor -listener -package gen -o gen grammar/PostgreSQLLexer.g4 grammar/PostgreSQLParser.g4

Compatibility

This is an ANTLR4-based grammar, not PostgreSQL's internal server parser. Some edge-case syntax may differ across PostgreSQL versions. If you find a query that parses in PostgreSQL but fails here, please open an issue with a minimal repro.

ParseSQL processes the first SQL statement for backward compatibility. For multi-statement input, use ParseSQLAll; to enforce exactly one statement, use ParseSQLStrict.

License

Apache License 2.0 — see LICENSE for details.

About

ANTLR-based PostgreSQL query parser for Go. Extracts tables, columns, joins, CTEs, parameters, DDL actions, and full column-usage metadata from SQL into a structured IR.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors