THRIFT-6069: python: use a flat fastbinary encode buffer #3596

Open
markjm wants to merge 2 commits into apache:master from markjm:apache-port-fastbinary-flat-encode-buffer

@markjm markjm commented Jun 13, 2026

Contributor

Hi, I figured I'd share a few perf optimizations we use internally. We are still on an older Thrift version (😢), so I used some agent magic to port these to the head of this repo. My testing was primarily on my branch, so buyer beware on that front!

This one is 1 of 3 PRs

⚠️ I did use AI tools to investigate and address feedback, but I am a real human ready to collaborate 😄


Replace std::vector back_inserter writes in the Python 3 fastbinary encoder with a malloc/realloc buffer so encoded bytes can be copied in bulk while preserving the existing growth behavior.
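
To make the change described above concrete, here is a minimal sketch, not the PR's actual code. `FlatBuffer`, `write_with_back_inserter`, and `write_with_memcpy` are hypothetical names chosen for illustration. The first function appends through `std::back_inserter`, which goes through `push_back` one element at a time; the second grows a `malloc`/`realloc` block by doubling and copies the whole span with one `memcpy`. Overflow checks are left out here for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <iterator>
#include <vector>

// Hypothetical illustration only; these names are not taken from the Thrift sources.

// Vector-based write: std::copy through a back_inserter appends
// one element at a time via push_back.
inline void write_with_back_inserter(std::vector<char>& out, const char* src, size_t len) {
  std::copy(src, src + len, std::back_inserter(out));
}

// Flat heap buffer owned by the struct and released in the destructor.
struct FlatBuffer {
  char* data = nullptr;
  size_t size = 0;
  size_t capacity = 0;

  FlatBuffer() = default;
  FlatBuffer(const FlatBuffer&) = delete;             // copying would double-free data
  FlatBuffer& operator=(const FlatBuffer&) = delete;
  ~FlatBuffer() { std::free(data); }                  // free(nullptr) is a no-op
};

// Flat-buffer write: grow by doubling when needed, then copy in bulk.
inline bool write_with_memcpy(FlatBuffer& out, const char* src, size_t len) {
  if (len == 0) {
    return true;
  }
  size_t needed = out.size + len;  // assumption: no size_t overflow (unchecked in this sketch)
  if (needed > out.capacity) {
    size_t new_capacity = out.capacity == 0 ? needed : out.capacity;
    while (new_capacity < needed) {
      new_capacity *= 2;
    }
    char* grown = static_cast<char*>(std::realloc(out.data, new_capacity));
    if (!grown) {
      return false;  // the old block is still valid and still owned by out
    }
    out.data = grown;
    out.capacity = new_capacity;
  }
  std::memcpy(out.data + out.size, src, len);
  out.size += len;
  return true;
}
```

Both paths are expected to produce the same bytes; the difference is only in how many individual append operations the encoder performs per write.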

Performance (50k iterations, warmed)

| Workload | Baseline | This commit | Speedup |
|----------|----------|-------------|---------|
| encode simple (30B) | 0.60 us | 0.53 us | 1.13x |
| encode 10-string (182B) | 1.44 us | 1.25 us | 1.15x |
| encode complex (395B) | 3.02 us | 2.63 us | 1.15x |

Larger object encode throughput

| Message size | apache/master | branch | speedup |
|--------------|---------------|--------|---------|
| 8 KiB | 12.19 us | 1.11 us | 11.0x |
| 64 KiB | 101.58 us | 5.48 us | 18.5x |
| 256 KiB | 564.34 us | 25.22 us | 22.4x |
| 1 MiB | 2188.97 us | 136.22 us | 16.1x |

Decode performance is unchanged because this only affects the encode path.
Copilot AI review requested due to automatic review settings June 13, 2026 07:28
@markjm markjm requested a review from mhlakhani as a code owner June 13, 2026 07:28
@mergeable mergeable Bot added the python label Jun 13, 2026
@markjm markjm changed the title python: use a flat fastbinary encode buffer [THRIFT-6069] python: use a flat fastbinary encode buffer Jun 13, 2026

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds coverage for Thrift’s accelerated binary protocol round-tripping a large TApplicationException, and refactors the Python C-extension encode buffer implementation away from std::vector to a manual malloc/realloc-backed buffer.

Changes:

  • Add a unit test that exercises _fast_encode/_fast_decode via TBinaryProtocolAcceleratedFactory for a large exception payload.
  • Replace EncodeBuffer’s std::vector<char> storage with a manual heap buffer and update protocol write logic to use memcpy.
  • Add required C/C++ headers for the new buffer implementation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| lib/py/test/thrift_TBinaryProtocol.py | Adds accelerated large-message roundtrip test coverage for the fastbinary path. |
| lib/py/src/ext/types.h | Refactors EncodeBuffer to raw malloc/realloc storage with destructor-based cleanup. |
| lib/py/src/ext/protocol.tcc | Updates encode buffer allocation and write path to use the new EncodeBuffer API. |

Comment thread lib/py/src/ext/types.h
Comment on lines 134 to +145
struct EncodeBuffer {
  std::vector<char> buf;
  size_t pos;
  char* data;
  size_t size;
  size_t capacity;

  EncodeBuffer() : data(nullptr), size(0), capacity(0) {}

  ~EncodeBuffer() {
    if (data) {
      free(data);
    }
  }

Contributor Author

thanks bot

Comment thread lib/py/src/ext/types.h
Comment on lines +157 to +176
  bool ensure(size_t additional) {
    size_t needed = size + additional;
    if (needed <= capacity) {
      return true;
    }

    size_t new_capacity = capacity == 0 ? needed : capacity;
    while (new_capacity < needed) {
      new_capacity *= 2;
    }

    char* new_data = static_cast<char*>(realloc(data, new_capacity));
    if (!new_data) {
      return false;
    }

    data = new_data;
    capacity = new_capacity;
    return true;
  }

Contributor Author

thanks bot

Comment thread lib/py/src/ext/types.h
Comment on lines +147 to +155
  bool init(size_t initial_capacity) {
    data = static_cast<char*>(malloc(initial_capacity));
    if (!data) {
      return false;
    }
    size = 0;
    capacity = initial_capacity;
    return true;
  }

Contributor Author

thanks bot

Comment thread lib/py/test/thrift_TBinaryProtocol.py Outdated
Comment on lines +172 to +179
    APPLICATION_EXCEPTION_TYPEARGS = [
        TApplicationException,
        (
            None,
            (1, 11, "message", "UTF8", None),
            (2, 8, "type", None, None),
        ),
    ]

Make EncodeBuffer explicitly non-copyable, handle zero-capacity initialization, guard capacity growth against size_t overflow, and tighten the large-message fastbinary test to use immutable Thrift spec metadata.
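
The three C++ hardening points named in that commit message (non-copyable, zero-capacity initialization, overflow-guarded growth) could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of the follow-up commit: the field and method names `data`, `size`, `capacity`, `init`, and `ensure` mirror the snippets quoted in the review threads, the bodies are one plausible way to address each point, and `write` is a hypothetical helper added only to exercise the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Sketch only; not the actual follow-up commit.
struct EncodeBuffer {
  char* data;
  size_t size;
  size_t capacity;

  EncodeBuffer() : data(nullptr), size(0), capacity(0) {}
  ~EncodeBuffer() { std::free(data); }  // free(nullptr) is a no-op

  // Non-copyable: a copy would alias data and free it twice.
  EncodeBuffer(const EncodeBuffer&) = delete;
  EncodeBuffer& operator=(const EncodeBuffer&) = delete;

  bool init(size_t initial_capacity) {
    // malloc(0) may return nullptr, which would look like an allocation
    // failure, so never request zero bytes.
    if (initial_capacity == 0) {
      initial_capacity = 1;
    }
    char* fresh = static_cast<char*>(std::malloc(initial_capacity));
    if (!fresh) {
      return false;
    }
    std::free(data);  // tolerate a repeated init()
    data = fresh;
    size = 0;
    capacity = initial_capacity;
    return true;
  }

  bool ensure(size_t additional) {
    if (additional > SIZE_MAX - size) {
      return false;  // size + additional would wrap around
    }
    size_t needed = size + additional;
    if (needed <= capacity) {
      return true;
    }
    size_t new_capacity = capacity == 0 ? needed : capacity;
    while (new_capacity < needed) {
      if (new_capacity > SIZE_MAX / 2) {
        new_capacity = needed;  // doubling would overflow; fall back to the exact size
        break;
      }
      new_capacity *= 2;
    }
    char* new_data = static_cast<char*>(std::realloc(data, new_capacity));
    if (!new_data) {
      return false;  // the old block is untouched and still owned by this object
    }
    data = new_data;
    capacity = new_capacity;
    return true;
  }

  // Hypothetical helper: append len bytes in one bulk copy.
  bool write(const char* src, size_t len) {
    if (len == 0) {
      return true;  // avoid calling memcpy with a possibly null destination
    }
    if (!ensure(len)) {
      return false;
    }
    std::memcpy(data + size, src, len);
    size += len;
    return true;
  }
};
```

With these guards, `init(0)` succeeds with a one-byte allocation, and a request such as `ensure(SIZE_MAX)` on a non-empty buffer fails cleanly before any allocation is attempted.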

Jens-G commented Jun 13, 2026

Member

> I did use AI tools to investigate and address feedback, but I am a real human ready to collaborate

As long as they adhere to AGENTS.md all is fine.

@Jens-G Jens-G self-requested a review June 13, 2026 09:03
@Jens-G Jens-G changed the title [THRIFT-6069] python: use a flat fastbinary encode buffer THRIFT-6069: python: use a flat fastbinary encode buffer Jun 13, 2026

Jens-G commented Jun 13, 2026

Member

Code review

Found 1 issue:

  1. AI tool use is acknowledged in the PR body but neither commit contains the required Co-Authored-By: or Generated-by: label (AGENTS.md §4 says "Always label AI-assisted commits and PRs … Apply this label even when AI only generated a portion of the change").

thrift/AGENTS.md

Lines 57 to 71 in 35c1a53

## 4. AI-Generated Contributions
Per [`CONTRIBUTING.md § AI generated content`](CONTRIBUTING.md#ai-generated-content) and the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html):
- **Always** label AI-assisted commits and PRs. Use one or both of:
```
Co-Authored-By: <AI tool name and version>
Generated-by: <AI tool name and version>
```
Example:
```
THRIFT-9999: Fix connection timeout handling in Go client
Client: go
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.
