RUST-2401 Increase performance with facet#674
Open
abr-egn wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
RUST-2401
This increases the performance of both the facet parsing and serialization paths via distinct sets of optimizations. There's still a significant penalty vs serde but this brings the worst case down from 5x to 3x.
For parsing, profiling showed
Parser::peekto be the hot path; the main cost there was allocating a newVecfor the stack of the returned next parse state. This was an unconditional alloc that was frequently wasted, either if the stack was identical (it would becloned and then the old value dropped) or if the parser was doing lookahead without advancing the state. To fix that,peeknow returns a cheap alloc-lessDeltastruct that describes the change to be made to advance the state; that's only applied when necessary and even then if the stack is unchanged (i.e. iterating over fields rather than moving up/down nesting) it saves the clone+drop.For serialization,
Serializer::serialize_opaque_scalarwas the hot path. That doesn't have any allocation, but it did have a linear chain of indirect function calls (value.get::<Type>()). This meant that every field walked incurred that chain, increasing by how far down that type happened to be. To avoid that, the type id is fetched once and directly compared toTypeId::of::<Type>(); because that's aconst fn, those turn into compile-time constants, and then the whole set of comparisons can become an O(1) operation via a jump table or similar.Side note: the jump table optimization was a much bigger win than the alloc removal, which surprised me quite a bit.
As far as I can tell, this was the low-hanging fruit for optimization. Getting substantially better performance would probably involve going down the same road some of the other facet format implementations do, e.g. using cranelift to generate machine code for the specific struct shapes. That would be basically a total rewrite and much more complicated at that, so I'd say we should wait and see what kind of adoption happens before investing that much time.
Benchmarks (lower is better):
Before
Delta vs serde:
After
Delta vs serde:
Delta vs old: