Skip to content

asio_rpc_client overhaul#103

Open
al13n321 wants to merge 4 commits intomasterfrom
cli
Open

asio_rpc_client overhaul#103
al13n321 wants to merge 4 commits intomasterfrom
cli

Conversation

@al13n321
Copy link
Copy Markdown
Member

Running #101 with tsan revealed that enabling proper streaming on the server side made the client hit data races. Looking closer at the client code, it seems sloppy and incorrect in a few ways. Streaming mode was retrofitted to it without enough thought. This PR half rewrites it.

  • It was unclear what was going on with thread safety generally. I'm not sure what exactly the intended mental model was. This PR switches to this mental model:
    // asio_rpc_client consists of two "threads" of work: send side and receive side.
    // Each "thread" is a chain of boost asio calls and their callbacks,
    // executed sequentially.
    // In streaming mode, the two chains can run in parallel, and we should
    // be careful to avoid data races.
    // Send side covers connection, SSL handshake, and sending requests.
    // Receive side covers receiving responses.
  • There was a whole retry timer and a bunch of code for it... that is all fully unreachable, if I'm reading it correctly. Deleted.
  • There was a simple data race on a timer.
  • There were two different race conditions that could cause when_done callback can be called twice.

But this PR also removes some features:

  • Instead of one timeout for send+receive, there are now separate timers for sending request and receiving response. (And the timer for connecting was already separate. And ssl handshake wasn't covered by any timeout, maybe a bug; it's included in the connect timeout now.)
  • The code was kinda ambiguous on whether asio_rpc_client is meant to remain usable if some request failed in noncritical way (crc mismatch or read_resp_meta_ callback returned false). In that case it called a function called close_socket, but that function doesn't actually close socket, so the asio_rpc_client would probably still work; except in streaming mode, where it would probably get stuck (because we wouldn't send next requests from the queue). This PR makes asio_rpc_client consistently enter "abandoned" state on any error.

@al13n321
Copy link
Copy Markdown
Member Author

There was a whole retry timer and a bunch of code for it... that is all fully unreachable, if I'm reading it correctly.

Claude claims to have figured out what the deal is: "This code WAS reachable before commit 661ec77 ("Fix race on ASIO client during prevote and vote"), which fixed a bug where vote/prevote requests didn't honor busy_flag_, allowing concurrent sends on the same client. After that fix, all paths properly serialize, making send_retry unreachable."

@al13n321 al13n321 marked this pull request as ready for review April 17, 2026 00:09
@al13n321 al13n321 requested a review from antonio2368 April 17, 2026 00:09
@filimonov
Copy link
Copy Markdown

@al13n321
Copy link
Copy Markdown
Member Author

Streaming callback order is violated when a failure races with a normal completion

I thought about it too, but

  • it was already broken,
  • the existing callers don't care,
  • the fix would be mildly complex,
  • all proposed fixes seem incorrect (scenario: we're calling callback X with successful response; in parallel there's also a close_socket call in that sees that it needs to call 42 pending callbacks Y with errors, right after X returns; the callback X itself calls send() to send another message; this send() has to report error through its callback Z, while (1) not deadlocking, (2) somehow arranging for Z to be called after after Y).

I'll just add a comment explaining that error callbacks may happen out of order or in parallel, if that's ok.

Other comments seem wrong.

Comment thread src/asio_service.cxx Outdated
Comment thread src/asio_service.cxx Outdated
Comment thread src/asio_service.cxx Outdated
Comment thread src/asio_service.cxx Outdated
Comment thread src/asio_service.cxx
ptr<buffer> resp_buf(buffer::alloc(RPC_RESP_HEADER_SIZE));
aa::read( ssl_enabled_, ssl_socket_, socket_,
asio::buffer(resp_buf->data(), resp_buf->size()),
std::bind( &asio_rpc_client::response_read,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imagine a world where lambdas exist

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, seems good that the functions have names and are listed sequentially rather than nested inside each other.

Comment thread src/asio_service.cxx Outdated
Comment thread src/asio_service.cxx
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants