feat!: preserve line breaks between consecutive comments in pretty mode #7770
tobymao wants to merge 1 commit into
When multiple comments were attached to the same expression (e.g., from consecutive -- comment lines before a SELECT), they were joined with spaces even in pretty mode. Now they are joined with the generator's separator (newline in pretty mode, space otherwise). Additionally, _replace_line_breaks is applied per-comment content rather than post-join, so newlines between separate comments get proper SQL indentation while newlines within a single multi-line comment remain as sentinels to prevent extra indentation. Closes #7764
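To illustrate why the order of the two steps matters, here is a minimal standalone sketch. It does not use sqlglot itself: `SENTINEL`, `protect`, `join_post`, and `join_per_comment` are hypothetical names standing in for the generator's sentinel and `_replace_line_breaks`, and the separator is hard-coded to a newline as it would be in pretty mode.

```python
SENTINEL = "<LB>"  # hypothetical placeholder, not sqlglot's actual sentinel value


def protect(text: str) -> str:
    """Hide line breaks so a later indentation pass leaves them alone."""
    return text.replace("\n", SENTINEL)


def join_post(comments: list[str]) -> str:
    """Join first, then protect: the separators between comments get hidden too."""
    return protect("\n".join(f"/*{c}*/" for c in comments if c))


def join_per_comment(comments: list[str]) -> str:
    """Protect each comment body, then join: only in-comment breaks are hidden."""
    return "\n".join(f"/*{protect(c)}*/" for c in comments if c)


comments = [" one ", " two\nlines "]
print(repr(join_post(comments)))
print(repr(join_per_comment(comments)))
```

In the per-comment variant the newline between the two comments stays a real newline, so an indentation pass can treat it like any other line break, while the break inside the second comment remains hidden.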
SQLGlot Integration Test Results: ✅ All tests passed
Comparing:
Overall
main: 192428 total, 153523 passed (pass rate: 79.8%)
sqlglot:fix/comments-pretty-mode-newlines: 190119 total, 151503 passed (pass rate: 79.7%)
Transitions:
Dialect pair changes: 0 previous results not found, 1 current results not found
| a /* sqlglot.meta case_sensitive */ /* noqa */ | ||
| a /* sqlglot.meta case_sensitive */ | ||
| /* noqa */ |
This looks unsafe, given our heuristic for comment attaching. Observe the difference:
(sqlglot) ➜ sqlglot git:(main) sqlglot --parse "SELECT
dquote> a /* foo */ /* bar */
dquote> FROM tbl"
Select(
expressions=[
Column(
this=Identifier(this=a, quoted=False),
_comments=[
foo ,
bar ])],
from_=From(
this=Table(
this=Identifier(this=tbl, quoted=False))))
(sqlglot) ➜ sqlglot git:(main) sqlglot --parse "SELECT
a /* foo */
/* bar */
FROM tbl"
Select(
expressions=[
Column(
this=Identifier(this=a, quoted=False),
_comments=[
foo ])],
from_=From(
this=Table(
this=Identifier(this=tbl, quoted=False)),
_comments=[
bar ]))
The first input attaches both comments to the column. The generated output then round-trips differently:
- Before: both comments stayed on the same line as the column, so parsing the output again produced the same AST.
- After: the second comment is moved toward tbl, so parsing the output again attaches it to a different node (the FROM clause in the second parse above).
So metadata-related comments can end up in the wrong places.
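The attachment behaviour visible in the two parses above can be mimicked with a toy model. This is only a sketch under simplifying assumptions, not sqlglot's tokenizer: `attach_comments` is a hypothetical helper that splits on whitespace, treats a comment following a token on the same line as trailing (attached to that token), and treats a comment that starts a line as leading (attached to the next token).

```python
import re

# Either a /* ... */ comment (group 1) or a whitespace-delimited word (group 2).
TOKEN_RE = re.compile(r"/\*(.*?)\*/|(\S+)")


def attach_comments(sql: str) -> list[tuple[str, list[str]]]:
    """Return (token, comments) pairs using a same-line attachment heuristic."""
    tokens: list[list] = []
    pending: list[str] = []  # leading comments waiting for the next token
    for line in sql.splitlines():
        token_on_line = False
        for match in TOKEN_RE.finditer(line):
            comment, word = match.group(1), match.group(2)
            if word is not None:
                tokens.append([word, pending])
                pending = []
                token_on_line = True
            elif token_on_line:
                # Trailing comment: belongs to the token before it on this line.
                tokens[-1][1].append(comment.strip())
            else:
                # Leading comment: belongs to whatever token comes next.
                pending.append(comment.strip())
    return [(text, comments) for text, comments in tokens]


same_line = "SELECT\n  a /* foo */ /* bar */\nFROM tbl"
own_line = "SELECT\n  a /* foo */\n  /* bar */\nFROM tbl"
print(attach_comments(same_line))
print(attach_comments(own_line))
```

Under this model, moving `/* bar */` onto its own line is enough to shift it from `a` to `FROM`, which is the round-trip hazard described above.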
This generalizes: the change is unsafe for any node carrying 2+ trailing comments. The 2nd+ comments get pushed onto their own line and re-attach to whatever comes next.
Perhaps a safer approach would be:
- comments_sql = self.sep().join(
+ comments_list = [
      f"/*{self._replace_line_breaks(self.sanitize_comment(comment))}*/"
      for comment in comments
      if comment
- )
+ ]
- if not comments_sql:
+ if not comments_list:
      return sql
  if separated or isinstance(expression, self.WITH_SEPARATED_COMMENTS):
+     # Leading comments precede the same token, so joining them with the separator
+     # (a newline in pretty mode) is round-trip safe.
+     comments_sql = self.sep().join(comments_list)
      return (
          f"{self.sep()}{comments_sql}{sql}"
          if not sql or sql[0].isspace()
          else f"{comments_sql}{self.sep()}{sql}"
      )
- return f"{sql} {comments_sql}"
+ # Trailing comments are joined with spaces: emitting them on separate lines would
+ # re-attach the later ones to the following token when the SQL is parsed again.
+ return f"{sql} {' '.join(comments_list)}"
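For readers who want to try the suggested branching outside the generator, the following is a rough standalone approximation. `with_comments` and its `pretty` and `separated` parameters are hypothetical; sanitizing and line-break protection are left out, and `sep` stands in for `self.sep()` under the assumption that it is a newline in pretty mode and a space otherwise.

```python
def with_comments(sql: str, comments: list[str], *, pretty: bool, separated: bool = False) -> str:
    """Attach comments to a SQL fragment, mirroring the suggested leading/trailing split."""
    sep = "\n" if pretty else " "  # assumed stand-in for the generator's separator
    comments_list = [f"/*{comment}*/" for comment in comments if comment]

    if not comments_list:
        return sql

    if separated:
        # Leading comments all precede the same token, so separating them
        # with newlines does not change where they attach on re-parse.
        comments_sql = sep.join(comments_list)
        if not sql or sql[0].isspace():
            return f"{sep}{comments_sql}{sql}"
        return f"{comments_sql}{sep}{sql}"

    # Trailing comments stay on the same line as the fragment they follow.
    return f"{sql} {' '.join(comments_list)}"


print(with_comments("a", [" foo ", " bar "], pretty=True))
print(with_comments("SELECT 1", [" x ", " y "], pretty=True, separated=True))
```

The trailing case ignores `pretty` entirely, which is the point of the suggestion: only leading comments pick up the newline separator.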