Unify mailbox TTL to a single value#1457
Conversation
|
i'm still not convinced this change is desired, per discussion in #1421. if a single TTL was used but it was set to 1 week (or some other value, with justification) instead of 24hrs, i would be concept ACK but still not convinced. if the argument in favor of this change is that the tradeoff of reduced code complexity over poorer quality of service is desirable is worth it, then sure... not sure i agree but if others (@spacebear21?) disagree i'm happy to be overruled. having just one TTL value can simplify pruning as that can just be implemented as a fixed size circular buffer of IDs. BIP 77, as currently written, does not specify a default expiry time at all, and this change effectively enforces a 24 hour limit instead of a soft 1 week limit, even though the resource requirements by a directory operator aren't that different and that doesn't seem very justified it follows that if this PR was only modified so that the hard limit was set to say 1 week in order to preserve the current best effort quality of service (and upgrade it to a guarantee in a sense), and this change is instead framed as removing the complexity of early deletion of read messages, we want the complexity of early deletion of read messages? arguably deleting those messages sooner rather than later is the more privacy preserving of the two, and comes with a 7* increase in storage capacity required, which also seems negligible (since it's only around 2GB now with the 24hr limit, and storage is cheap). On the other hand, if the argument is that this is a reduction in trust in the directory operator, i categorically disagree and believe that that actually undermines trust as i've argued in the ticket, so concept NACK if that's the justification |
|
Thanks for the detailed feedback @nothingmuch I’m still getting familiar with BIP 77 and the intended expectations around TTL, so this is helpful context. From your explanation, I understand that enforcing a strict 24-hour TTL may reduce quality of service compared to a 1 week expectation. I also see your point that simplification could still be achieved with a longer duration like 1 week, without negatively impacting service guarantees. Based on this, would you prefer that I update the PR to use a longer TTL (e.g., ~1 week) or should I hold off for now until there’s broader agreement on the desired behavior? |
|
In my opinion the simplification is significant enough that I'm leaning towards concept ACK, but I agree that 1 week sounds more appropriate for giving more choice to implementations re: session expirations. Obviously this doesn't reduce trust in the operator - in fact I wonder if DEFAULT_TTL should be straight-up configurable? |
| unread_ttl_below_capacity: DEFAULT_UNREAD_TTL_BELOW_CAPACITY, | ||
| unread_ttl_at_capacity: DEFAULT_UNREAD_TTL_AT_CAPACITY, | ||
| read_ttl: DEFAULT_READ_TTL, | ||
| ttl: DEFAULT_TTL, |
There was a problem hiding this comment.
i agree with spacebear, iirc the reason i didn't make this configurable was that at the time we already expected to rewrite the config handling, but this should be at the operator's discretion, since it's now only one param i think adding it to init as a standalone argument seems reasonable
There was a problem hiding this comment.
also, when added this should cross reference the rate limiting stuff, because these two parameters together enforce the limit on storage... maybe it's better to set the rate limiting by defining another config parameter for max storage, and then ensure that rate limit is (max_storage/storage_per_mailbox))/TTL writes per unit time
| unread_ttl_at_capacity: DEFAULT_UNREAD_TTL_AT_CAPACITY, | ||
| read_ttl: DEFAULT_READ_TTL, | ||
| ttl: DEFAULT_TTL, | ||
| early_removal_count: 0, |
There was a problem hiding this comment.
| early_removal_count: 0, |
| while let Some((created, id)) = self.insert_order.front().cloned() { | ||
| if created + self.unread_ttl_below_capacity < now { | ||
| if created + self.ttl < now { | ||
| debug_assert!(self.insert_order.len() >= self.early_removal_count); |
There was a problem hiding this comment.
| debug_assert!(self.insert_order.len() >= self.early_removal_count); |
| unread_ttl_at_capacity: Duration, | ||
| read_ttl: Duration, | ||
| ttl: Duration, | ||
| early_removal_count: usize, |
There was a problem hiding this comment.
| early_removal_count: usize, |
There was a problem hiding this comment.
| self.insert_order.len() |
There was a problem hiding this comment.
| self.remove(&id).await?; // FIXME what to do on is_none()? |
github is being glitchy, this suggestion needs to start from the if self.remove(&id) line but for some reason i can't make a comment on those lines
There was a problem hiding this comment.
i guess is_none would now only imply that the operator deleted mailboxes while the server was running, or there is some bug or disk failure.
therefore IMO None should not be treated this as an error (it may be intentional), but a log a warning is appropriate so if it's unintentional the operator would be informed of the data on disk disappearing. in case the operator suspects it's a bug, it's probably best not to log the shortid since the time at which this error would be discovered implies the time at which the shortid was inserted, and that information might be inadvertently posted in an issue, making this temporal information public.
| if self.len() == self.capacity { | ||
| if let Some((created, id)) = self.insert_order.front().cloned() { | ||
| if created + self.unread_ttl_at_capacity < now { | ||
| if created + self.ttl < now { |
There was a problem hiding this comment.
this is redundant with the code above, i believe this is dead code because self.len() == self.capacity will only be true if created + self.ttl < was already false before
There was a problem hiding this comment.
okay . thanks for the review
i would make the neccessary changes
16828a8 to
8988cc0
Compare
nothingmuch
left a comment
There was a problem hiding this comment.
utACK, code looks good but two minor corrections to (doc) comments
| self.pending_v2.retain(|_, v| v.receiver.strong_count().unwrap_or(0) > 1); | ||
|
|
||
| // Prune any fully expired mailboxes, whether read or unread | ||
| // Prune any fully expired mailboxes |
There was a problem hiding this comment.
nit:
| // Prune any fully expired mailboxes | |
| // Prune any expired mailboxes |
| #[serde(deserialize_with = "deserialize_duration_secs")] | ||
| pub timeout: Duration, | ||
| /// Mailbox lifetime in seconds. Together with `DEFAULT_CAPACITY` in | ||
| /// `db::files`, this bounds both disk usage and write throughput. |
There was a problem hiding this comment.
this could bound write throughput but doesn't do so currently except in the trivial sense that once storage is full no new writes are accepted, see tracking tasks in #941, this could be done with tower middleware
There was a problem hiding this comment.
Noted
i would remove the comment
caarloshenriq
left a comment
There was a problem hiding this comment.
tACK 8988cc0
Tested on linux mint with nix. Server starts cleanly, /ohttp-keys returns 200 with
the expected binary payload. Unit tests pass (cargo test -p payjoin-mailroom).
mailbox_ttl is configurable via config.toml and defaults to 1 week as
discussed
Coverage Report for CI Build 24402852101Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+0.04%) to 84.38%Details
Uncovered Changes
Coverage Regressions1 previously-covered line in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
8988cc0 to
abebc6d
Compare
Replace three separate TTL constants (read: 10 min, unread at capacity: 1 day, unread below capacity: 7 days) with a single DEFAULT_TTL of 24 hours based on creation time only. Removes read_order, read_mailbox_ids, mark_read(), and the read-TTL pruning pass. Capacity eviction remains as a storage pressure mechanism.
abebc6d to
d3d4c8a
Compare
spacebear21
left a comment
There was a problem hiding this comment.
utACK d3d4c8a, thanks @caarloshenriq for testing!
This Pr addresses the issue in #1421 which replace three separate TTL constants (read: 10 min, unread at capacity: 1 day, unread below capacity: 7 days) with a single DEFAULT_TTL of 24 hours based on creation time only. Removes read_order, read_mailbox_ids, mark_read(), and the read-TTL pruning pass. Capacity eviction remains as a storage pressure mechanism.
I brainstormed the implementation approach with Claude while working through this fix.
Please confirm the following before requesting review:
AI
in the body of this PR.