fetch: Close data race between BOC state and objcore flags#4402
Conversation
This will allow safe checks related to BOC states that are not guarded by the BOC lock.
After returning from ObjWaitState(), replace any access of oc->boc->state with the one that was captured by the wait loop.
It should be safe as it was because there shouldn't be a transition from BOS_FINISHED to BOS_FAILED, but now that it can be atomic, it should not only be safe but also future-proof. It originally matched ignore_state from the coccinelle patch because of the separate boc variable, and had to be manually edited.
1a0f32e to
ad2a668
Compare
|
I understand that we should read the boc state consistently, but don't we have an outdated read left on |
|
I'm not sure what "here" is referring to, the link you shared is highlighting Around this line there are two accesses to state = ObjWaitState(oc, BOS_STREAM);
AZ(oc->flags & OC_F_BUSY);
if (state == BOS_FAILED)
AN(oc->flags & OC_F_FAILED);The first one is always true so it's not a problem, and the second is safe too as long as the observed state is BOS_FAILED. We first fail the oc before failing the boc, so capturing Whereas before, the sequence could have been:
|
|
@dridi I clarified my earlier comment. |
|
But anyway, irrespective of a possible outdated read, this patch is an improvement, so we should merge it. |
|
Oh, we may still observe an outdated |
I guess the alternative is to either drop the check altogether or check while holding the objhead lock. I would expect that the |
That's what I guessed with similarly structured code and had to learn that the answer is no. But we can still see if it happens here |
|
Bugwash: ok |
The changes from #4073 increases the likelihood of a data race between a BOC state check followed immediately by an objcore flag check:
Once we reach
BOS_STREAM, we touchoc->flagsto checkOC_F_BUSY.If the backend fetch failed immediately after reaching
BOS_STREAM, we may then observeBOS_FAILEDwhen we touchoc->boc->statebut still seeoc->flagsfrom before the failure in this thread.Because
oc->flagsandoc->boc->stateare guarded by different locks, we can't safely rely on this construct. One option could be to move theOC_F_BUSYcheck down, and another, could be to capture the BOC state:This pull request implements the state capture.