Skip to content

feat!: preserve line breaks between consecutive comments in pretty mode#7770

Open
tobymao wants to merge 1 commit into
mainfrom
fix/comments-pretty-mode-newlines
Open

feat!: preserve line breaks between consecutive comments in pretty mode#7770
tobymao wants to merge 1 commit into
mainfrom
fix/comments-pretty-mode-newlines

Conversation

@tobymao

@tobymao tobymao commented Jun 20, 2026

Copy link
Copy Markdown
Owner

When multiple comments are attached to the same expression (e.g., from consecutive -- comment lines before a SELECT), they were joined with spaces even in pretty mode. Now they are joined with the generator's separator (newline in pretty mode, space otherwise).

Additionally, _replace_line_breaks is applied per-comment content rather than post-join, so newlines between separate comments get proper SQL indentation while newlines within a single multi-line comment remain as sentinels to prevent extra indentation.

Closes #7764

When multiple comments were attached to the same expression (e.g., from
consecutive -- comment lines before a SELECT), they were joined with
spaces even in pretty mode. Now they are joined with the generator's
separator (newline in pretty mode, space otherwise).

Additionally, _replace_line_breaks is applied per-comment content rather
than post-join, so newlines between separate comments get proper SQL
indentation while newlines within a single multi-line comment remain as
sentinels to prevent extra indentation.

Closes #7764
@github-actions

Copy link
Copy Markdown
Contributor

SQLGlot Integration Test Results

✅ All tests passed

Comparing:

  • this branch (sqlglot:fix/comments-pretty-mode-newlines @ sqlglot 4cf2710)
  • baseline (main @ sqlglot 12eba76)

Overall

main: 192428 total, 153523 passed (pass rate: 79.8%)

sqlglot:fix/comments-pretty-mode-newlines: 190119 total, 151503 passed (pass rate: 79.7%)

Transitions:
No change

Dialect pair changes: 0 previous results not found, 1 current results not found

✅ All tests passed

Comment thread tests/test_transpile.py
Comment on lines -438 to +447
a /* sqlglot.meta case_sensitive */ /* noqa */
a /* sqlglot.meta case_sensitive */
/* noqa */

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unsafe, given our heuristic for comment attaching. Observe the difference:

(sqlglot) ➜  sqlglot git:(main) sqlglot --parse "SELECT
dquote> a /* foo */ /* bar */
dquote> FROM tbl"
Select(
  expressions=[
    Column(
      this=Identifier(this=a, quoted=False),
      _comments=[
        foo ,
        bar ])],
  from_=From(
    this=Table(
      this=Identifier(this=tbl, quoted=False))))
(sqlglot) ➜  sqlglot git:(main) sqlglot --parse "SELECT
a /* foo */
/* bar */
FROM tbl"
Select(
  expressions=[
    Column(
      this=Identifier(this=a, quoted=False),
      _comments=[
        foo ])],
  from_=From(
    this=Table(
      this=Identifier(this=tbl, quoted=False)),
    _comments=[
      bar ]))

The input results in getting two comments attached to the column. The output:

  • Before: preserved the comment next to the column, so parsing it again will produce the same AST
  • After: comment is moved to tbl, so parsing it again will attach the comment in a different node

So metadata-related comments can end up in wrong places.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generalizes: the change is unsafe for any node carrying 2+ trailing comments. The 2nd+ comments get pushed onto their own line and re-attach to whatever comes next.

Perhaps a safer approach would be:

-        comments_sql = self.sep().join(
+        comments_list = [
             f"/*{self._replace_line_breaks(self.sanitize_comment(comment))}*/"
             for comment in comments
             if comment
-        )
+        ]

-        if not comments_sql:
+        if not comments_list:
             return sql

         if separated or isinstance(expression, self.WITH_SEPARATED_COMMENTS):
+            # Leading comments precede the same token, so joining them with the separator
+            # (a newline in pretty mode) is round-trip safe.
+            comments_sql = self.sep().join(comments_list)
             return (
                 f"{self.sep()}{comments_sql}{sql}"
                 if not sql or sql[0].isspace()
                 else f"{comments_sql}{self.sep()}{sql}"
             )

-        return f"{sql} {comments_sql}"
+        # Trailing comments are joined with spaces: emitting them on separate lines would
+        # re-attach the later ones to the following token when the SQL is parsed again.
+        return f"{sql} {' '.join(comments_list)}"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comments in different lines combined into a single line

2 participants