Make --progress and --checkpoint strictly by-statement#1753
Open
rolandwalker wants to merge 1 commit intomainfrom
Open
Make --progress and --checkpoint strictly by-statement#1753rolandwalker wants to merge 1 commit intomainfrom
--progress and --checkpoint strictly by-statement#1753rolandwalker wants to merge 1 commit intomainfrom
Conversation
|
No correctness or security issues stood out in the PR-scoped changes. Residual risk / test gap:
|
Previously --progress and --checkpoint were influenced by linebreaks to some extent: multiline queries were correctly joined and counted/ dispatched/checkpointed as one query, but multiple queries on a single line were dispatched together. That means that the progress estimation could be thrown off somewhat, depending on the file contents, and more importantly means that a statement which was part of line with more than one statement might fail to be written to the line-influenced checkpoint file if that particular query succeeded, but a subsequent query on the same line failed. This subtlety is important if we are to use the checkpoint file to resume scripts, though in general it would be best when running scripts to avoid all of these corner cases by having one statement per line. We pull in sqlparse in addition to sqlglot, because sqlparse has the feature of preserving the input literally when splitting multi-statement lines. This also fixes a bug: the generator here named batch_gen was recreated in the --progress loop, which didn't matter before this change since iterating over a filehandle covered up the issue. Tests are added for statements_from_filehandle(), which had no coverage before. Incidentally * fix missing changelog entry * fix whitespace in a comment * remove a backslash by double-quoting a string which contains a single quote
0ef70d1 to
a0c50b8
Compare
scottnemes
approved these changes
Mar 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Previously
--progressand--checkpointwere influenced by linebreaks to some extent: multiline queries were correctly joined and counted/dispatched/checkpointed as one query, but multiple queries on a single line were dispatched together.That means that the progress estimation could be thrown off somewhat, depending on the file contents, and more importantly means that a statement which was part of line with more than one statement might fail to be written to the line-influenced checkpoint file if that particular query succeeded, but a subsequent query on the same line failed.
This subtlety is important if we are to use the checkpoint file to resume scripts, though in general it would be best when running scripts to avoid all of these corner cases by having one statement per line.
We pull in
sqlparsein addition tosqlglot, becausesqlparsehas the feature of preserving the input literally when splitting multi-statement lines.This also fixes a bug: the generator here named
batch_genwas recreated in the--progress loop, which didn't matter before this change since iterating over a filehandle covered up the issue.Tests are added for
statements_from_filehandle(), which had no coverage before.Incidentally
Checklist
changelog.mdfile.AUTHORSfile (or it's already there).