Batch multiple transactions in a single commit in the commitlog #5018
joshua-spacetime wants to merge 1 commit into
Also pass a range to msync when writing entries to the segment offset index file, to be explicit and avoid flushing/examining unnecessary pages.
let _ = self.append_internal().map_err(|e| {
    warn!("failed to append to offset index: {e:?}");
});
let _ = self
    .head
    .async_flush()
    .map_err(|e| warn!("failed to flush offset index: {e:?}"));
This flush isn't needed since append_internal already flushes.
for tx in tx_buf.drain(..) {
    clog.commit([tx.into_transaction()])?;
}
clog.commit(tx_buf.drain(..).map(|tx| tx.into_transaction()))?;
This could include quite a lot of transactions, up to the size of the durability queue. Not sure if this should be bounded further or if it matters at all.
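If a bound turns out to be desirable, one option is to drain the buffer in fixed-size chunks rather than all at once. The following is only a sketch under assumptions: `MockLog`, `commit_bounded`, and `max_per_commit` are hypothetical names introduced for illustration, and the mock merely records how many transactions each commit received. It is not the commitlog API.

```rust
/// Hypothetical stand-in for the commitlog handle.
/// It only records how many transactions each commit contained.
struct MockLog {
    commits: Vec<usize>,
}

impl MockLog {
    /// Accept any iterator of transactions (plain `u64`s here) as one commit.
    fn commit<I: IntoIterator<Item = u64>>(&mut self, txs: I) -> Result<(), String> {
        let n = txs.into_iter().count();
        if n > 0 {
            self.commits.push(n);
        }
        Ok(())
    }
}

/// Drain `tx_buf` into commits of at most `max_per_commit` transactions each.
/// `max_per_commit` is an assumed tuning knob, not an existing option.
fn commit_bounded(
    log: &mut MockLog,
    tx_buf: &mut Vec<u64>,
    max_per_commit: usize,
) -> Result<(), String> {
    assert!(max_per_commit > 0, "batch bound must be positive");
    while !tx_buf.is_empty() {
        // Take the next chunk from the front, preserving transaction order.
        let take = max_per_commit.min(tx_buf.len());
        log.commit(tx_buf.drain(..take))?;
    }
    Ok(())
}
```

With a bound of 1 this degenerates to one commit per transaction, and with a bound at least as large as the queue it matches the unbounded batch.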
kim left a comment
I've thought about it many times, but at this point I'm not on board with moving away from the 1-tx-per-commit restriction.
A torn write in the middle of a commit is guaranteed to destroy commit.n transactions. A smaller number of transactions per commit results in a higher number of commits per write, and increases the chance that at least some transactions are recoverable.
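The argument can be made concrete with a toy model. The sketch below assumes commits are laid out back to back, that a commit is recoverable only if every one of its bytes reached disk, and that commit sizes are given directly (per-commit header and checksum overhead is ignored). The function name and the `(n, len)` representation are hypothetical.

```rust
/// Count the transactions that survive a write torn after `torn_at` bytes.
///
/// Each element of `commits` is `(n, len)`: a commit holding `n` transactions
/// and occupying `len` bytes. A commit counts as recoverable only if it was
/// written in full, which is a simplifying assumption of this model.
fn recoverable_txs(commits: &[(u64, u64)], torn_at: u64) -> u64 {
    let mut end = 0u64; // byte offset at which the current commit ends
    let mut recovered = 0u64;
    for &(n, len) in commits {
        end += len;
        if end > torn_at {
            // This commit is incomplete, so it and everything after it is lost.
            break;
        }
        recovered += n;
    }
    recovered
}
```

For example, with eight transactions of 100 bytes each and a write torn at byte 750, a single commit `[(8, 800)]` recovers nothing, whereas eight commits `[(1, 100); 8]` recover the first seven transactions.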
The trouble is that we are rather prone to torn writes, not least because they (the writes) are unaligned. Just advising to use confirmed reads is not enough, I'd argue, because users have no way of even knowing how many transactions could potentially be lost -- outside of benchmarking scenarios, I at least would want to design my application such that it doesn't write too much ahead of the "uncertainty window" of the durability layer.
We can certainly do better by improving our I/O model and recovery mechanisms, but at this point I think we'd basically weaken durability guarantees, and I don't think this is a good idea.
The offset index changes look fine to me. I would suggest increasing Options::offset_index_interval_bytes if there is data that suggests that we're updating the index too often.
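To illustrate why a larger interval reduces index updates, here is a rough model. It assumes the option means "add an index entry once at least this many bytes have been written since the previous entry". That reading, along with the `OffsetIndexSketch` type and its methods, is an assumption made for illustration and may not match the actual implementation.

```rust
/// Hypothetical model of an interval-driven offset index.
struct OffsetIndexSketch {
    interval_bytes: u64,
    bytes_since_last_entry: u64,
    /// Recorded `(tx_offset, byte_offset)` pairs.
    entries: Vec<(u64, u64)>,
}

impl OffsetIndexSketch {
    fn new(interval_bytes: u64) -> Self {
        Self { interval_bytes, bytes_since_last_entry: 0, entries: Vec::new() }
    }

    /// Called for each commit written to the segment. An entry pointing at
    /// this commit is added only if enough bytes accumulated since the last one.
    fn on_commit(&mut self, tx_offset: u64, byte_offset: u64, commit_len: u64) {
        if self.bytes_since_last_entry >= self.interval_bytes {
            self.entries.push((tx_offset, byte_offset));
            self.bytes_since_last_entry = 0;
        }
        self.bytes_since_last_entry += commit_len;
    }
}

/// Number of index entries produced by `n_commits` equally sized commits.
fn index_entries_for(interval_bytes: u64, commit_len: u64, n_commits: u64) -> usize {
    let mut index = OffsetIndexSketch::new(interval_bytes);
    for i in 0..n_commits {
        index.on_commit(i, i * commit_len, commit_len);
    }
    index.entries.len()
}
```

Under this model, 1000 commits of 128 bytes produce 31 entries at a 4096-byte interval and 7 entries at a 16384-byte interval, without changing how many transactions each commit holds.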
Description of Changes
Batch multiple transactions in a single commit instead of one commit per transaction.
Reduces per commit overhead such as updating the segment offset index file.
API and ABI breaking changes
None
Expected complexity level and risk
1
Testing
Refactor