Per-line lazy VLQ decode for /symbolicate cold path #1760
Open
robhogan wants to merge 1 commit into
Summary:

## Background

`/symbolicate` resolves each stack frame by looking up its generated (line, column) in the frame module's source map. With `transformer.unstable_compactSourceMaps` (#1743), modules keep their maps as VLQ `mappings` strings rather than decoded tuples, which means we must decode at `/symbolicate` time. That regressed `/symbolicate` performance by about 45ms mean (300ms P99), and this stack aims to recover that.

The current decode runs the whole module through `toBabelSegments(map).map(toSegmentTuple)`: a full `SourceMapConsumer` pass that allocates a Babel segment object and a tuple array per segment.

## Change

The insight here is that for symbolication we typically only need to decode a small number of lines per module. We can't randomly access lines in a VLQ string, but we can traverse it cheaply, indexing lines so that we decode only what we need.

This adds a per-line, allocation-light, package-private `LineIndexedMappings` to `metro-source-map`. A single pass builds a per-line index: for each generated line, its byte offset into `mappings` and the running source line/column delta accumulators as they stand entering that line. A lookup jumps straight to the target line and decodes only that line's segments.

`originalPositionFor` is byte-identical to a `greatestLowerBound` over `toBabelSegments(map).map(toSegmentTuple)` (1-based lines, 0-based columns, and generated-only segments resolving to null).

`Server/symbolicate.js` caches the `LineIndexedMappings` per module for the duration of a request (I'm going to look into whether it's worth expanding this, given that `LineIndexedMappings` has a small footprint).

Tuple-backed modules are searched directly, as before, though I'm intending to delete that path.
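To make the indexing idea concrete, here is a minimal, self-contained sketch of a per-line index over a VLQ `mappings` string. It is not the `LineIndexedMappings` code from this PR: `scanLine`, `buildLineIndex`, the standalone `originalPositionFor` and the entry shape are hypothetical names chosen for illustration. The sketch also assumes that the source index is carried alongside the line/column accumulators, that generated lines are passed 1-based, and that a lookup never falls back to a segment on an earlier line.

```javascript
// Illustrative sketch only; names and shapes are assumptions, not Metro's API.
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
const CHAR_TO_INT = new Map([...B64].map((c, i) => [c, i]));
const scratch = new Int32Array(5); // reused per segment to avoid allocation

// Decodes every segment of the line starting at `pos`, updating the running
// accumulators in `state` in place. Returns the index of the terminating ';'
// (or mappings.length for the last line).
function scanLine(mappings, pos, state, onSegment) {
  let genColumn = 0; // the generated column delta resets on every line
  while (pos < mappings.length && mappings[pos] !== ';') {
    if (mappings[pos] === ',') {
      pos++;
      continue;
    }
    let fieldCount = 0;
    while (pos < mappings.length && mappings[pos] !== ',' && mappings[pos] !== ';') {
      let value = 0;
      let shift = 0;
      let digit;
      do {
        digit = CHAR_TO_INT.get(mappings[pos++]);
        value |= (digit & 31) << shift; // low 5 bits carry data
        shift += 5;
      } while (digit & 32); // bit 5 is the continuation flag
      // The lowest bit of the decoded value is the sign.
      scratch[fieldCount++] = value & 1 ? -(value >>> 1) : value >>> 1;
    }
    genColumn += scratch[0];
    const hasSource = fieldCount >= 4; // 1-field segments are generated-only
    if (hasSource) {
      state.source += scratch[1];
      state.line += scratch[2];
      state.column += scratch[3];
    }
    if (onSegment) onSegment(genColumn, hasSource);
  }
  return pos;
}

// Single pass: record, for each generated line, where it starts in `mappings`
// and the accumulator values as they stand entering that line.
function buildLineIndex(mappings) {
  const index = [];
  const state = {source: 0, line: 0, column: 0};
  let pos = 0;
  while (pos <= mappings.length) {
    index.push({offset: pos, source: state.source, line: state.line, column: state.column});
    pos = scanLine(mappings, pos, state, null) + 1; // step over ';'
  }
  return index;
}

// Jump to the target line and decode only that line's segments, keeping the
// last segment whose generated column is <= the requested column.
function originalPositionFor(mappings, index, generatedLine, generatedColumn) {
  const entry = index[generatedLine - 1]; // assumed 1-based generated line
  if (!entry) return null;
  const state = {source: entry.source, line: entry.line, column: entry.column};
  let best = null;
  scanLine(mappings, entry.offset, state, (column, hasSource) => {
    if (column <= generatedColumn) {
      best = hasSource
        ? {source: state.source, line: state.line + 1, column: state.column}
        : null;
    }
  });
  return best;
}

const mappings = 'AAAA,IAAI;AACA';
const lineIndex = buildLineIndex(mappings);
console.log(originalPositionFor(mappings, lineIndex, 2, 0));
// → { source: 0, line: 2, column: 4 }
```

Building the index still decodes every VLQ digit once, because the deltas are cumulative, but it creates nothing per segment. Only the lookup builds a result object, and only for the line it visits. A production version would likely store the index in typed arrays rather than one object per line.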
## Performance

With some representative modules of various sizes, picked from a real dev bundle:

| module (lines / segments / mappings) | BASE | NEW | speedup |
| --- | --- | --- | --- |
| 127 / 539 / 3.0K | 318 µs | 20 µs | ~16x |
| 297 / 1180 / 6.7K | 755 µs | 42 µs | ~18x |
| 8041 / 68259 / 400K | 62.4 ms | 2.7 ms | ~23x |
| 11294 / 95814 / 565K (`ReactFabric-dev`) | 102.5 ms | 3.8 ms | ~27x |

Gains scale with module size, so they concentrate on the largest modules, some of which appear frequently in traces. The React renderer, for instance, is one of the biggest modules in a typical bundle (~11k generated lines); its cold decode drops from ~102ms to ~4ms (~27x), and that difference alone likely recovers most of the regression.

Because large modules dominate total symbolication time, time *decoding* is reduced ~25x for a typical request (note: there's an unchanged per-request overhead from `_getExplodedSourceMapsForBundleOptions` not included in these microbenchmarks, which becomes the next target for optimisation).

Differential Revision: D110603508
@robhogan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D110603508.