
Chore!!: Migrate sqlglot to v30 #5736

Open
VaggelisD wants to merge 2 commits into main from vaggelisd/migrate_sqlglot


@VaggelisD
Collaborator

Summary of Changes

| File | Change | Why |
| --- | --- | --- |
| `pyproject.toml` | `sqlglot[rs]~=28.10.1` → `sqlglot~=30.0.1` | Version bump; `[rs]` extra removed (replaced by `sqlglotc`) |
| 60+ files across `sqlmesh/`, `tests/`, `web/` | `exp.Expression` → `exp.Expr` in all type hints, `isinstance` checks, and type casts. Class definitions remain as `exp.Expression` subclasses | sqlglot v30 introduced `exp.Expr` as the new base class; `Expression` is now a subclass. Functions like `exp.alias_()`, `exp.func()`, `parse_one()` return `Expr`, causing mypy errors when assigned to `Expression`-typed variables. Custom expression classes must still inherit `Expression` since `Expr` doesn't support constructor arguments |
| `sqlmesh/core/schema_diff.py`, `tests/core/integration/utils.py` | `exp.DataType.Type` → `exp.DType` in type annotations | `DataType.Type` enum is no longer recognized as a valid type by mypy; `exp.DType` is the v30 alias |
| `sqlmesh/core/dialect.py` | New import path for `AthenaTrinoParser`, updated `self.expression()` calls | sqlglot API: Athena parser renamed, `expression()` now takes pre-constructed objects |
| `sqlmesh/utils/jinja.py` | Import `Expression` from `sqlglot.expressions` instead of `sqlglot` | `sqlglot.expressions` is now a package; top-level re-export changed |
| `sqlmesh/core/model/kind.py` | Clear `meta["sql"]` in `_time_data_type_validator` | `DataType.build` now preserves the parsed expression directly. Our `_parse_types` extension sets `meta["sql"]`, which the pydantic encoder prioritizes over dialect-aware rendering |
| `sqlmesh/core/model/meta.py` | Normalize column types through dialect roundtrip during deserialization | BigQuery `INT` no longer auto-maps to `BIGINT` during `DataType.build`; roundtripping through dialect SQL restores canonical form for stable data hashes |
| `sqlmesh/utils/metaprogramming.py` | Added `_resolve_import_module()` to walk module hierarchy for re-exports | `to_table.__module__` is now `sqlglot.expressions.builders` but it's re-exported from `sqlglot.expressions`; generated import statements must use the public module |
| `sqlmesh/core/context_diff.py` | Added `sqlglotc` to `IGNORED_PACKAGES` | New sqlglot conditionally imports `sqlglotc` (C extension); dependency detection was picking it up as a user requirement |
| `tests/core/test_macros.py` | Simplified expected Snowflake `ARRAY_GENERATE_RANGE` arithmetic | sqlglot now simplifies `(DATEDIFF(...) + 1 - 1) + 1` to `DATEDIFF(...) + 1` |
| `tests/core/test_config.py` | Relaxed regex for error message module path | `Column` class now reports `__module__` as `sqlglot.expressions.core` instead of `sqlglot.expressions` |
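To illustrate why the `meta.py` row matters, here is a minimal sketch of how an un-normalized type alias changes a data hash, and how canonicalizing before hashing keeps it stable. Everything in it is an assumption for illustration: `TOY_ALIASES`, `canonical_type`, and `data_hash` are hypothetical names, and the lookup table stands in for the real dialect roundtrip, which goes through sqlglot rather than a dictionary.

```python
import hashlib
import json

# Hypothetical alias table standing in for a dialect roundtrip.
TOY_ALIASES = {"INT": "BIGINT", "INTEGER": "BIGINT"}


def canonical_type(type_sql: str) -> str:
    """Map a type string to its canonical spelling (toy version)."""
    upper = type_sql.strip().upper()
    return TOY_ALIASES.get(upper, upper)


def data_hash(columns: dict, normalize: bool) -> str:
    """Hash a column-name -> type mapping, optionally canonicalizing the types first."""
    items = {
        name: canonical_type(type_sql) if normalize else type_sql
        for name, type_sql in columns.items()
    }
    payload = json.dumps(items, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same column as it might be stored before and after the upgrade.
stored_before = {"id": "BIGINT"}
stored_after = {"id": "INT"}

raw_hashes_match = data_hash(stored_before, False) == data_hash(stored_after, False)
normalized_hashes_match = data_hash(stored_before, True) == data_hash(stored_after, True)
```

Without normalization the two spellings hash differently, which would look like a model change; with it, both states produce the same hash.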

VaggelisD force-pushed the vaggelisd/migrate_sqlglot branch from 3fd7c5d to a90a4bd on March 17, 2026 11:52
Signed-off-by: vaggelisd <daniasevangelos@gmail.com>
VaggelisD force-pushed the vaggelisd/migrate_sqlglot branch from a90a4bd to 8df9c29 on March 17, 2026 11:53
@VaggelisD
Collaborator Author

The DuckDB test failure is preexisting.

Comment on lines +764 to +768
# Clear meta["sql"] (set by our parser extension) so the pydantic encoder
# uses dialect-aware rendering: e.sql(dialect=meta["dialect"]). Without this,
# the raw SQL text takes priority, which can be wrong for dialect-normalized
# types (e.g., default "TIMESTAMP" should render as "DATETIME" in BigQuery).
data_type.meta.pop("sql", None)
Contributor

Is there a test for this? Why did this work before the bump? Do we risk messing up the serialized version of this field by popping sql?

Collaborator Author

I think it was because of this commit tobymao/sqlglot#7092.

On v28, we'd:

  1. Parse the string text into exp.DataType with parse_one (setting meta["sql"] in the process)
  2. Reconstruct a fresh object again before returning, effectively discarding meta

On v30 we only do (1) now after the optimization PR.

We should have the same serialization and data hashes; otherwise, e.g., we'd notice BigQuery parsing "TIMESTAMP" into exp.DType.DATETIME but roundtripping it to "TIMESTAMP" again, because meta["sql"] takes priority and is emitted verbatim as raw text.
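The priority issue described above can be sketched with stand-in classes. This is not sqlglot or sqlmesh code: `ToyDataType`, `toy_parse`, and `toy_encode` are hypothetical, and the single BigQuery rule is hard-coded from the example in this comment. The sketch only shows how an encoder that prefers `meta["sql"]` masks dialect-aware rendering until the key is popped.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataType:
    """Stand-in for a parsed data type node carrying a meta dict."""

    kind: str
    meta: dict = field(default_factory=dict)

    def sql(self, dialect=None):
        # A real generator would render per dialect; the toy returns the kind.
        return self.kind


def toy_parse(text, dialect):
    # Toy rule taken from the thread: BigQuery reads "TIMESTAMP" as DATETIME.
    kind = "DATETIME" if (dialect == "bigquery" and text == "TIMESTAMP") else text
    # The parser extension records the raw text and dialect on the node.
    return ToyDataType(kind, meta={"sql": text, "dialect": dialect})


def toy_encode(node):
    # Raw text wins when present; otherwise fall back to dialect-aware rendering.
    if "sql" in node.meta:
        return node.meta["sql"]
    return node.sql(dialect=node.meta.get("dialect"))


node = toy_parse("TIMESTAMP", "bigquery")
encoded_with_raw_text = toy_encode(node)

node.meta.pop("sql", None)  # same call as in the diff above
encoded_after_pop = toy_encode(node)
```

With the raw text still attached, the node serializes as the original string; once the key is popped, the dialect-normalized type is what gets rendered.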

Contributor

Ok, so the current behavior is coincidentally consistent with what we used to do before the migration, because of the buggy AST node reconstruction in v28?

  source=source,
  target=target,
- on=exp.condition(on) if on else None,
+ on=t.cast(exp.Condition, exp.condition(on)) if on else None,
Contributor

Why do we need a cast here?

Collaborator Author

Ah, on v30 the return type of that call was widened to exp.Expr. I'll widen the type hints along the whole call chain instead of casting.
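The two options in this exchange, casting at the call site versus widening the annotations, can be sketched with stand-in classes. `Expr`, `Expression`, `Condition`, and `condition` below are local stand-ins defined only for this example, not the sqlglot classes, and both helper functions are hypothetical.

```python
import typing as t


class Expr:
    """Stand-in for the wider base type."""


class Expression(Expr):
    """Stand-in for the narrower subclass."""


class Condition(Expression):
    """Stand-in for a condition node."""


def condition(sql: str) -> Expr:
    # The builder is annotated with the wide type, even though it
    # happens to return a Condition at runtime.
    return Condition()


def on_with_cast(on: t.Optional[str]) -> t.Optional[Condition]:
    # Option 1: keep the narrow annotation and cast for the type checker.
    # t.cast performs no runtime check; it only affects static analysis.
    return t.cast(Condition, condition(on)) if on else None


def on_widened(on: t.Optional[str]) -> t.Optional[Expr]:
    # Option 2: widen the annotation so no cast is needed.
    return condition(on) if on else None
```

Both functions return the same object at runtime; the difference is only in what the type checker is told, which is why widening has to be applied through every caller that stores the result.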

Contributor

Got it, thanks.

try:
    parent_module = sys.modules.get(parent) or importlib.import_module(parent)
    if getattr(parent_module, name, None) is obj:
        return parent
Contributor

Doesn't this only work for 1 parent right now? For example, if the new module became sqlglot.expressions.foo.bar, would you stop at foo?

Collaborator Author

This is inside a for loop that walks up to the root module, so I believe it should work.

Contributor

I meant that if foo re-exports bar in the above example, don't we stop at foo? Which means that we'll still see a diff in the serialized import due to a different path.
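The difference between the two stopping policies raised in this thread can be sketched as follows. This is not the PR's `_resolve_import_module()`: `resolve_import_module`, its `outermost` flag, and the `demo_pkg` modules are hypothetical, built in memory only so the example runs without sqlglot installed.

```python
import importlib
import sys
import types


def resolve_import_module(obj, outermost=False):
    """Return an ancestor of obj.__module__ that re-exports obj (hypothetical helper)."""
    module_name = obj.__module__
    name = obj.__name__
    parts = module_name.split(".")
    found = module_name  # fall back to the defining module
    # Walk from the immediate parent up to the root package.
    for i in range(len(parts) - 1, 0, -1):
        parent = ".".join(parts[:i])
        try:
            parent_module = sys.modules.get(parent) or importlib.import_module(parent)
        except ImportError:
            continue
        if getattr(parent_module, name, None) is obj:
            found = parent
            if not outermost:
                break  # stop at the nearest re-exporting ancestor
    return found


def to_table():
    """Toy function pretending to live in demo_pkg.foo.bar."""


to_table.__module__ = "demo_pkg.foo.bar"

# Register three in-memory modules that all expose the same object.
for mod_name in ("demo_pkg", "demo_pkg.foo", "demo_pkg.foo.bar"):
    module = types.ModuleType(mod_name)
    module.to_table = to_table
    sys.modules[mod_name] = module

nearest = resolve_import_module(to_table)
highest = resolve_import_module(to_table, outermost=True)
```

Here `nearest` resolves to `demo_pkg.foo` while `highest` resolves to `demo_pkg`, so the two policies serialize different import paths whenever an intermediate package also re-exports the name.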

Signed-off-by: vaggelisd <daniasevangelos@gmail.com>
VaggelisD force-pushed the vaggelisd/migrate_sqlglot branch from 45dbbbc to 264240e on March 17, 2026 13:45