Skip to content
This repository was archived by the owner on Feb 16, 2026. It is now read-only.

fetch: Close data race between BOC state and objcore flags#4402

Merged
dridi merged 4 commits into
varnishcache:masterfrom
dridi:atomic_boc_state
Oct 27, 2025
Merged

fetch: Close data race between BOC state and objcore flags#4402
dridi merged 4 commits into
varnishcache:masterfrom
dridi:atomic_boc_state

Conversation

@dridi
Copy link
Copy Markdown
Member

@dridi dridi commented Oct 3, 2025

The changes from #4073 increases the likelihood of a data race between a BOC state check followed immediately by an objcore flag check:

ObjWaitState(oc, BOS_STREAM);
AZ(oc->flags & OC_F_BUSY);
if (oc->boc->state == BOS_FAILED)
        AN(oc->flags & OC_F_FAILED);

Once we reach BOS_STREAM, we touch oc->flags to check OC_F_BUSY.

If the backend fetch failed immediately after reaching BOS_STREAM, we may then observe BOS_FAILED when we touch oc->boc->state but still see oc->flags from before the failure in this thread.

Because oc->flags and oc->boc->state are guarded by different locks, we can't safely rely on this construct. One option could be to move the OC_F_BUSY check down, and another, could be to capture the BOC state:

state = ObjWaitState(oc, BOS_STREAM);
AZ(oc->flags & OC_F_BUSY);
if (state == BOS_FAILED)
        AN(oc->flags & OC_F_FAILED);

This pull request implements the state capture.

dridi added 2 commits October 3, 2025 10:18
This will allow safe checks related to BOC states that are not guarded
by the BOC lock.
After returning from ObjWaitState(), replace any access of
oc->boc->state with the one that was captured by the wait
loop.
dridi added 2 commits October 3, 2025 12:58
It should be safe as it was because there shouldn't be a transition from
BOS_FINISHED to BOS_FAILED, but now that it can be atomic, it should not
only be safe but also future-proof. It originally matched ignore_state
from the coccinelle patch because of the separate boc variable, and
had to be manually edited.
@dridi dridi force-pushed the atomic_boc_state branch from 1a0f32e to ad2a668 Compare October 3, 2025 10:59
@nigoroll
Copy link
Copy Markdown
Member

nigoroll commented Oct 8, 2025

I understand that we should read the boc state consistently, but don't we have an outdated read left on oc->flags here?

@dridi
Copy link
Copy Markdown
Member Author

dridi commented Oct 8, 2025

I'm not sure what "here" is referring to, the link you shared is highlighting if (state == BOS_FAILED).

Around this line there are two accesses to oc->flags:

state = ObjWaitState(oc, BOS_STREAM);
AZ(oc->flags & OC_F_BUSY);
if (state == BOS_FAILED)
        AN(oc->flags & OC_F_FAILED);

The first one is always true so it's not a problem, and the second is safe too as long as the observed state is BOS_FAILED. We first fail the oc before failing the boc, so capturing BOS_FAILED is our warranty.

Whereas before, the sequence could have been:

  • client resumes upon BOS_STREAM
  • client observes no OC_F_BUSY
  • backend fetch fails
  • client observes BOS_FAILED
  • client cannot observe OC_F_FAILED

@nigoroll
Copy link
Copy Markdown
Member

@dridi I clarified my earlier comment.
My question was regarding outdated reads of OC_F_*. We set the oc flags before taking the boc lock so, while common architectures are likely to have a read of the oc flags updated, there is no guarantee, IIUC.

@nigoroll
Copy link
Copy Markdown
Member

But anyway, irrespective of a possible outdated read, this patch is an improvement, so we should merge it.

@dridi
Copy link
Copy Markdown
Member Author

dridi commented Oct 13, 2025

Oh, we may still observe an outdated oc->flags, but because we observed BOS_FAILED at resumption time it must contain the OC_F_FAILED bit. That bit is raised prior to reaching BOS_FAILED.

@dridi
Copy link
Copy Markdown
Member Author

dridi commented Oct 13, 2025

while common architectures are likely to have a read of the oc flags updated, there is no guarantee, IIUC.

I guess the alternative is to either drop the check altogether or check while holding the objhead lock. I would expect that the boc critical section on the client side is enough to invalidate the oc pointer and warrant a refresh.

@nigoroll
Copy link
Copy Markdown
Member

I would expect that the boc critical section on the client side is enough to invalidate the oc pointer and warrant a refresh.

That's what I guessed with similarly structured code and had to learn that the answer is no. But we can still see if it happens here

@dridi
Copy link
Copy Markdown
Member Author

dridi commented Oct 27, 2025

Bugwash: ok

@dridi dridi merged commit f039193 into varnishcache:master Oct 27, 2025
9 of 10 checks passed
@dridi dridi deleted the atomic_boc_state branch October 27, 2025 14:08
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants