Commit 5262235

add design
1 parent 4382cb3 commit 5262235

1 file changed

Lines changed: 153 additions & 0 deletions

@@ -0,0 +1,153 @@
---
title: "S3 Conditional Requests"
weight: 1
menu:
  main:
    parent: Design
summary: Design document for S3 conditional request support in Ozone
date: 2025-11-20
jira: HDDS-4440
status: accepted
author: Márton Elek
---

# S3 Conditional Requests Design

## Background

AWS S3 supports conditional requests through HTTP conditional headers, which enable atomic operations and cache optimization and help prevent race conditions. This includes:

- **Conditional Writes** (PutObject): `If-Match` and `If-None-Match` headers for atomic operations
- **Conditional Reads** (GetObject, HeadObject): `If-Match`, `If-None-Match`, `If-Modified-Since`, `If-Unmodified-Since` for cache validation
- **Conditional Copy** (CopyObject): Conditions on both source and destination objects

### Current State

- HDDS-10656 implemented atomic rewrite using `expectedDataGeneration`
- OM HA uses a single Raft group with a single applier thread (Ratis StateMachineUpdater)
- The S3 gateway does not expose conditional headers to the OM layer

## Use Cases

### Conditional Writes
- **Atomic key rewrites**: Prevent race conditions when updating existing objects
- **Create-only semantics**: Prevent accidental overwrites (`If-None-Match: *`)
- **Optimistic locking**: Enable concurrent access with conflict detection
- **Leader election**: Implement distributed coordination using S3 as backing store

### Conditional Reads
- **Bandwidth optimization**: Avoid downloading unchanged objects (304 Not Modified)
- **HTTP caching**: Support standard browser/CDN caching semantics
- **Conditional processing**: Only process objects that meet specific criteria
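
The bandwidth-optimization case can be illustrated with a minimal sketch of how a server might evaluate `If-None-Match` and `If-Modified-Since` on a read. The function name `evaluate_read` and its parameters are hypothetical and are not part of Ozone or the S3 API. The sketch follows RFC 7232 in ignoring `If-Modified-Since` when `If-None-Match` is present, and it does not model weak ETags.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

OK = 200
NOT_MODIFIED = 304


def evaluate_read(
    etag: str,
    last_modified: datetime,
    if_none_match: Optional[str] = None,
    if_modified_since: Optional[str] = None,
) -> int:
    """Return the status a conditional GET/HEAD would produce (hypothetical helper).

    etag is the stored ETag without quotes; last_modified must be timezone-aware.
    """
    if if_none_match is not None:
        # The header may carry a comma-separated list of quoted ETags, or "*".
        candidates = [token.strip() for token in if_none_match.split(",")]
        if "*" in candidates or f'"{etag}"' in candidates:
            return NOT_MODIFIED
        # If-None-Match was present, so If-Modified-Since is ignored.
        return OK
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if last_modified <= since:
            return NOT_MODIFIED
    return OK
```

A client that cached an object with ETag `abc` would send `If-None-Match: "abc"` and receive 304 with no body for as long as the object is unchanged.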

### Conditional Copy
- **Atomic copy operations**: Copy only if source/destination meets specific conditions
- **Prevent overwrite**: Copy only if destination doesn't exist

## AWS S3 Conditional Write

### Specification

#### If-None-Match Header

```
If-None-Match: *
```

- Succeeds only if the object does NOT exist
- Returns `412 Precondition Failed` if the object exists
- Primary use case: Create-only semantics

#### If-Match Header

```
If-Match: "<etag>"
```

- Succeeds only if the object EXISTS and its ETag matches
- Returns `412 Precondition Failed` if the object doesn't exist or the ETag mismatches
- Primary use case: Atomic updates (compare-and-swap)

#### Restrictions

- The two headers cannot be used together in the same request
- No additional charges for failed conditional requests
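
The header rules above can be summarized in a short sketch. The function `evaluate_write_precondition` and its parameters are hypothetical names chosen for illustration. How a request carrying both headers is rejected is not stated here, so raising `ValueError` is an assumption.

```python
from typing import Optional

OK = 200
PRECONDITION_FAILED = 412


def evaluate_write_precondition(
    if_match: Optional[str],
    if_none_match: Optional[str],
    current_etag: Optional[str],
) -> int:
    """Return the status a conditional PutObject would produce (hypothetical helper).

    current_etag is the unquoted ETag of the existing object, or None if the
    object does not exist.
    """
    if if_match is not None and if_none_match is not None:
        # Assumption: the exact rejection for combined headers is not specified.
        raise ValueError("If-Match and If-None-Match cannot be combined")
    if if_none_match is not None:
        if if_none_match.strip() != "*":
            raise ValueError("this sketch only models If-None-Match: *")
        # Create-only: succeed only when nothing is stored under the key.
        return OK if current_etag is None else PRECONDITION_FAILED
    if if_match is not None:
        wanted = if_match.strip().strip('"')
        # Compare-and-swap: the object must exist and carry the expected ETag.
        if current_etag is None or current_etag != wanted:
            return PRECONDITION_FAILED
    return OK
```

An unconditional request (neither header set) always passes this check, which matches the behavior of a plain PutObject.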

### Implementation

#### Architecture Overview

#### If-None-Match Implementation

##### S3 Gateway Layer

1. Parse `If-None-Match: *`.
2. Set `existingKeyGeneration = -1`.
3. Call `RpcClient.rewriteKey()`.

##### OM Create Phase

1. Validate `expectedDataGeneration == -1`.
2. If key exists → throw `KEY_ALREADY_EXISTS`.
3. Store `-1` in open key metadata.

##### OM Commit Phase

1. Check `expectedDataGeneration == -1` from open key.
2. If key now exists (race condition) → throw `KEY_ALREADY_EXISTS`.
3. Commit key.

##### Race Condition Handling

Storing `-1` in the open key and re-checking it at commit time ensures atomicity. If a concurrent write (Client B) commits between Client A's Create and Commit, Client A's commit fails the `-1` validation check (the key now exists), preserving strict create-if-not-exists semantics.
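
The interleaving described above can be replayed with a toy in-memory model. `ToyOM`, `KeyAlreadyExists`, and `run_race` are hypothetical names for illustration only. The model ignores Ratis, block allocation, and everything else the real OM does, and it keeps only the two `-1` checks from the Create and Commit phases.

```python
CREATE_ONLY = -1  # sentinel generation meaning "the key must not exist"


class KeyAlreadyExists(Exception):
    """Stands in for the OM KEY_ALREADY_EXISTS error."""


class ToyOM:
    """Toy model of the two-phase write; not the real OM."""

    def __init__(self):
        self.committed = {}  # key name -> data
        self.open_keys = {}  # client id -> (key name, expected generation)

    def create(self, client, key, expected_generation):
        # Create phase: reject early if the key is already committed.
        if expected_generation == CREATE_ONLY and key in self.committed:
            raise KeyAlreadyExists(key)
        self.open_keys[client] = (key, expected_generation)

    def commit(self, client, data):
        # Commit phase: repeat the check, since another client may have
        # committed the same key after this client's create.
        key, expected_generation = self.open_keys.pop(client)
        if expected_generation == CREATE_ONLY and key in self.committed:
            raise KeyAlreadyExists(key)
        self.committed[key] = data


def run_race():
    """Client B commits between Client A's create and commit."""
    om = ToyOM()
    om.create("A", "lock", CREATE_ONLY)
    om.create("B", "lock", CREATE_ONLY)
    om.commit("B", b"from B")
    try:
        om.commit("A", b"from A")
    except KeyAlreadyExists:
        return om.committed["lock"], "A rejected"
    return om.committed["lock"], "A committed"
```

In this model both creates succeed because nothing is committed yet; only the commit-time check separates the winner from the loser.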

#### If-Match Implementation

Leverages existing `expectedDataGeneration` from HDDS-10656:

##### S3 Gateway Layer

1. Parse `If-Match: "<etag>"` header
2. Look up existing key via `getS3KeyDetails()`
3. Validate ETag matches, else throw `PRECOND_FAILED` (412)
4. Extract `expectedGeneration` from existing key
5. Pass `expectedGeneration` to RpcClient

##### OM Create Phase

1. Receive `expectedDataGeneration` parameter
2. Look up current key and validate exists
3. Extract current key's `updateID` value
4. Create open key with `expectedDataGeneration = updateID`
5. Return stream to S3 gateway

##### OM Commit Phase

1. Read open key (contains `expectedDataGeneration`)
2. Read current committed key
3. Validate `current.updateID == openKey.expectedDataGeneration`
4. Commit if match, reject if mismatch (existing HDDS-10656 logic)
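
The three layers above amount to a compare-and-swap on `updateID`. The following toy model sketches that flow. All class and function names (`ToyOM`, `KeyInfo`, `begin_put_if_match`, and the exception types) are hypothetical. As a simplification, the create phase records the generation handed over by the gateway rather than re-reading it, and data streaming is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class PreconditionFailed(Exception):
    """Gateway-side 412 (PRECOND_FAILED)."""


class KeyNotFound(Exception):
    """Stands in for the OM KEY_NOT_FOUND error."""


class GenerationMismatch(Exception):
    """Stands in for the OM GENERATION_MISMATCH error."""


@dataclass
class KeyInfo:
    etag: str
    update_id: int


class ToyOM:
    """Toy model of the OM key table; not the real OM."""

    def __init__(self) -> None:
        self.keys: Dict[str, KeyInfo] = {}
        self.open_keys: Dict[str, Tuple[str, int]] = {}
        self._next_update_id = 1

    def put(self, name: str, etag: str) -> None:
        # Every committed write receives a fresh update_id.
        self.keys[name] = KeyInfo(etag, self._next_update_id)
        self._next_update_id += 1

    def create(self, client: str, name: str, expected_generation: int) -> None:
        if name not in self.keys:
            raise KeyNotFound(name)
        self.open_keys[client] = (name, expected_generation)

    def commit(self, client: str, new_etag: str) -> None:
        name, expected = self.open_keys.pop(client)
        current = self.keys.get(name)
        if current is None:
            raise KeyNotFound(name)
        if current.update_id != expected:
            # Another writer committed since this client's create.
            raise GenerationMismatch(name)
        self.put(name, new_etag)


def begin_put_if_match(om: ToyOM, client: str, name: str, if_match: str) -> None:
    """Gateway steps: look up the key, compare ETags, pass the generation on."""
    current = om.keys.get(name)
    if current is None or current.etag != if_match.strip('"'):
        raise PreconditionFailed(name)
    om.create(client, name, current.update_id)
```

If two clients both begin a write against ETag `v1`, the first commit bumps `update_id`, so the second commit fails with `GenerationMismatch` even though its ETag check passed at the gateway.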

#### Error Mapping

| OM Error | S3 Status | S3 Error Code | Scenario |
|----------|-----------|---------------|----------|
| `KEY_ALREADY_EXISTS` | 412 | PreconditionFailed | If-None-Match failed |
| `KEY_NOT_FOUND` | 412 | PreconditionFailed | If-Match failed (key missing) |
| `ETAG_MISMATCH` | 412 | PreconditionFailed | If-Match failed (ETag mismatch) |
| `GENERATION_MISMATCH` | 412 | PreconditionFailed | If-Match failed (concurrent modification) |
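
The table can be expressed as a lookup, sketched below. The names `CONDITIONAL_WRITE_ERRORS` and `to_s3_error` are hypothetical. The sketch assumes the mapping applies only to requests that carried a conditional header, since errors such as `KEY_NOT_FOUND` would otherwise be handled differently.

```python
from typing import Optional, Tuple

PRECONDITION_FAILED: Tuple[int, str] = (412, "PreconditionFailed")

# OM error name -> (HTTP status, S3 error code), per the table above.
CONDITIONAL_WRITE_ERRORS = {
    "KEY_ALREADY_EXISTS": PRECONDITION_FAILED,   # If-None-Match failed
    "KEY_NOT_FOUND": PRECONDITION_FAILED,        # If-Match failed (key missing)
    "ETAG_MISMATCH": PRECONDITION_FAILED,        # If-Match failed (ETag mismatch)
    "GENERATION_MISMATCH": PRECONDITION_FAILED,  # If-Match failed (concurrent modification)
}


def to_s3_error(om_error: str, conditional: bool) -> Optional[Tuple[int, str]]:
    """Translate an OM error for a conditional write (hypothetical helper).

    Returns None when the error is not covered by the table, so the caller
    can fall back to its regular error handling.
    """
    if not conditional:
        return None
    return CONDITIONAL_WRITE_ERRORS.get(om_error)
```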

## AWS S3 Conditional Read

TODO

## AWS S3 Conditional Copy

TODO

## References

- [AWS S3 Conditional Requests](https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-requests.html)
- [RFC 7232 - HTTP Conditional Requests](https://tools.ietf.org/html/rfc7232)
- [HDDS-10656 - Atomic Rewrite Key](https://issues.apache.org/jira/browse/HDDS-10656)
- [Leader Election with S3 Conditional Writes](https://www.morling.dev/blog/leader-election-with-s3-conditional-writes/)