Stop using Vec::remove(0)#5
Merged
Merged
Conversation
When using `pg_query::parse` on a very large query (for example `SELECT
COUNT(*) FROM users WHERE users.id IN ({1.3M entries})`), pg_query would
effectively hang indefinitely (5-6 minutes in production, never waited
long enough on my local machine to get a measurement).
Profiling the code showed 99.99% of the time was spent in
`__memmove_avx_unaligned_erms`, called from `flatten` within
`ParseResult::new`. Digging into `nodes`, it was clear why. These
functions are operating by maintaining two lists, the remaining elements
to iterate over, and the results to return. When an element is removed
from the list to iterate over, it uses `Vec::remove(0)`, which must copy
the entire remaining vector over each time. This results in a
computational complexity of somewhere between `O(N^2)` and `O(N!)`.
This commit changes the code to use `VecDeque` instead, which is built
for exactly this purpose, and can pop from the front in constant time.
Ideally these functions wouldn't allocate at all. Rather than returning
`Vec`, they'd be much better off returning `impl Iterator` and yielding
without allocation (especially since every caller of these internally
just immediately iterates over the result). But that's a bigger rewrite,
and would be a backwards incompatible change to the API. I am willing to
implement that if such a change would be accepted.
In the short term though, using `VecDeque` removes the degenerate
performance case, and changes the performance from so slow I don't have
a measurement, to 1.2s on my local machine which is somewhat in line
with the Ruby implementation 975ms.
Author
|
CI failure is due to stable changing the error message of a panic, unrelated to this change. Gonna merge and let upstream handle it |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When using
pg_query::parseon a very large query (for exampleSELECT COUNT(*) FROM users WHERE users.id IN ({1.3M entries})), pg_query would effectively hang indefinitely (5-6 minutes in production, never waited long enough on my local machine to get a measurement).Profiling the code showed 99.99% of the time was spent in
__memmove_avx_unaligned_erms, called fromflattenwithinParseResult::new. Digging intonodes, it was clear why. These functions are operating by maintaining two lists, the remaining elements to iterate over, and the results to return. When an element is removed from the list to iterate over, it usesVec::remove(0), which must copy the entire remaining vector over each time. This results in a computational complexity of somewhere betweenO(N^2)andO(N!).This commit changes the code to use
VecDequeinstead, which is built for exactly this purpose, and can pop from the front in constant time. Ideally these functions wouldn't allocate at all. Rather than returningVec, they'd be much better off returningimpl Iteratorand yielding without allocation (especially since every caller of these internally just immediately iterates over the result). But that's a bigger rewrite, and would be a backwards incompatible change to the API. I am willing to implement that if such a change would be accepted.In the short term though, using
VecDequeremoves the degenerate performance case, and changes the performance from so slow I don't have a measurement, to 1.2s on my local machine which is somewhat in line with the Ruby implementation 975ms.