| 1 | +--- |
| 2 | +title: "S3 Conditional Requests" |
| 3 | +weight: 1 |
| 4 | +menu: |
| 5 | + main: |
| 6 | + parent: Design |
| 7 | +summary: Design document for S3 conditional request support in Ozone |
| 8 | +date: 2025-11-20 |
| 9 | +jira: HDDS-4440 |
| 10 | +status: accepted |
| 11 | +author: Márton Elek |
| 12 | +--- |
| 13 | + |
| 14 | +# S3 Conditional Requests Design |
| 15 | + |
| 16 | +## Background |
| 17 | + |
| 18 | +AWS S3 supports conditional requests through HTTP conditional headers, which enable atomic operations and cache validation and help prevent race conditions. The supported operations are: |
| 19 | + |
| 20 | +- **Conditional Writes** (PutObject): `If-Match` and `If-None-Match` headers for atomic operations |
| 21 | +- **Conditional Reads** (GetObject, HeadObject): `If-Match`, `If-None-Match`, `If-Modified-Since`, `If-Unmodified-Since` for cache validation |
| 22 | +- **Conditional Copy** (CopyObject): Conditions on both source and destination objects |
| 23 | + |
| 24 | +### Current State |
| 25 | + |
| 26 | +- HDDS-10656 implemented atomic rewrite using `expectedDataGeneration` |
| 27 | +- OM HA uses a single Raft group with a single applier thread (Ratis StateMachineUpdater) |
| 28 | +- The S3 gateway does not pass conditional headers to the OM layer |
| 29 | + |
| 30 | +## Use Cases |
| 31 | + |
| 32 | +### Conditional Writes |
| 33 | +- **Atomic key rewrites**: Prevent race conditions when updating existing objects |
| 34 | +- **Create-only semantics**: Prevent accidental overwrites (`If-None-Match: *`) |
| 35 | +- **Optimistic locking**: Enable concurrent access with conflict detection |
| 36 | +- **Leader election**: Implement distributed coordination using S3 as backing store |
| 37 | + |
| 38 | +### Conditional Reads |
| 39 | +- **Bandwidth optimization**: Avoid downloading unchanged objects (304 Not Modified) |
| 40 | +- **HTTP caching**: Support standard browser/CDN caching semantics |
| 41 | +- **Conditional processing**: Only process objects that meet specific criteria |
| 42 | + |
| 43 | +### Conditional Copy |
| 44 | +- **Atomic copy operations**: Copy only if source/destination meets specific conditions |
| 45 | +- **Prevent overwrite**: Copy only if destination doesn't exist |
| 46 | + |
| 47 | +## AWS S3 Conditional Write |
| 48 | + |
| 49 | +### Specification |
| 50 | + |
| 51 | +#### If-None-Match Header |
| 52 | + |
| 53 | +``` |
| 54 | +If-None-Match: * |
| 55 | +``` |
| 56 | + |
| 57 | +- Succeeds only if object does NOT exist |
| 58 | +- Returns `412 Precondition Failed` if object exists |
| 59 | +- Primary use case: Create-only semantics |
| 60 | + |
| 61 | +#### If-Match Header |
| 62 | + |
| 63 | +``` |
| 64 | +If-Match: "<etag>" |
| 65 | +``` |
| 66 | + |
| 67 | +- Succeeds only if object EXISTS and ETag matches |
| 68 | +- Returns `412 Precondition Failed` if object doesn't exist or ETag mismatches |
| 69 | +- Primary use case: Atomic updates (compare-and-swap) |
| 70 | + |
| 71 | +#### Restrictions |
| 72 | + |
| 73 | +- `If-Match` and `If-None-Match` cannot be used together in the same request |
| 74 | +- No additional charges for failed conditional requests |
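
The header semantics above can be summarized as a small decision function. This is an illustrative sketch only, not gateway code: `evaluate_put_precondition` and its parameters are hypothetical names, and raising `ValueError` for the combined-header case is an assumption (the specification above only states that the two headers cannot be combined, not how the request is rejected).

```python
from typing import Optional


def evaluate_put_precondition(if_match: Optional[str],
                              if_none_match: Optional[str],
                              current_etag: Optional[str]) -> int:
    """Return the HTTP status a conditional PutObject would produce.

    current_etag is None when the object does not exist.
    """
    if if_match is not None and if_none_match is not None:
        # The two headers are mutually exclusive on a single request.
        raise ValueError("If-Match and If-None-Match cannot be combined")

    if if_none_match is not None:
        if if_none_match.strip() != "*":
            # Only the create-only form is modelled in this sketch.
            raise ValueError("only 'If-None-Match: *' is modelled here")
        # Create-only: succeed only if the object does NOT exist.
        return 412 if current_etag is not None else 200

    if if_match is not None:
        # Strip the surrounding quotes of the entity tag before comparing.
        wanted = if_match.strip().strip('"')
        # Compare-and-swap: the object must exist and the ETag must match.
        if current_etag is None or current_etag != wanted:
            return 412
        return 200

    # No conditional header: unconditional write.
    return 200
```

For example, `evaluate_put_precondition(None, "*", None)` models a create-only write to a missing key, while `evaluate_put_precondition('"abc"', None, "def")` models a compare-and-swap against a stale ETag.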
| 75 | + |
| 76 | +### Implementation |
| 77 | + |
| 78 | +#### Architecture Overview |
| 79 | + |
| 80 | +#### If-None-Match Implementation |
| 81 | + |
| 82 | +##### S3 Gateway Layer |
| 83 | + |
| 84 | +1. Parse `If-None-Match: *`. |
| 85 | +2. Set `existingKeyGeneration = -1` (the value the OM checks as `expectedDataGeneration`). |
| 86 | +3. Call `RpcClient.rewriteKey()`. |
| 87 | + |
| 88 | +##### OM Create Phase |
| 89 | + |
| 90 | +1. Validate `expectedDataGeneration == -1`. |
| 91 | +2. If key exists → throw `KEY_ALREADY_EXISTS`. |
| 92 | +3. Store `-1` in open key metadata. |
| 93 | + |
| 94 | +##### OM Commit Phase |
| 95 | + |
| 96 | +1. Check `expectedDataGeneration == -1` from open key. |
| 97 | +2. If key now exists (race condition) → throw `KEY_ALREADY_EXISTS`. |
| 98 | +3. Commit key. |
| 99 | + |
| 100 | +##### Race Condition Handling |
| 101 | + |
| 102 | +Storing `-1` in the open key lets the commit phase repeat the existence check, which makes the operation atomic. If a concurrent writer (Client B) commits between Client A's create and commit, Client A's commit fails the `-1` validation because the key now exists, preserving strict create-if-not-exists semantics. |
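
The race can be replayed with a toy in-memory model of the two phases. This is a sketch under stated assumptions, not OM code: `CreateOnlyOm`, `KeyAlreadyExists`, and the client identifiers are hypothetical stand-ins, and the model ignores Raft, block allocation, and everything else the real OM does.

```python
class KeyAlreadyExists(Exception):
    """Stand-in for the OM KEY_ALREADY_EXISTS error."""


NO_KEY = -1  # sentinel generation meaning "the key must not exist"


class CreateOnlyOm:
    """Toy model of the create and commit phases for If-None-Match: *."""

    def __init__(self):
        self.keys = {}       # committed keys: name -> data
        self.open_keys = {}  # open keys: client id -> (name, expected generation)

    def create(self, client_id, name):
        # Create phase: reject immediately if the key already exists.
        if name in self.keys:
            raise KeyAlreadyExists(name)
        # Remember the sentinel so the commit phase can re-check.
        self.open_keys[client_id] = (name, NO_KEY)

    def commit(self, client_id, data):
        # Commit phase: repeat the existence check to catch concurrent writers.
        name, expected = self.open_keys.pop(client_id)
        if expected == NO_KEY and name in self.keys:
            raise KeyAlreadyExists(name)
        self.keys[name] = data


om = CreateOnlyOm()
om.create("A", "lock")       # Client A opens the key
om.create("B", "lock")       # Client B opens the same key concurrently
om.commit("B", b"from B")    # Client B commits first
try:
    om.commit("A", b"from A")  # Client A's commit must now be rejected
    outcome = "committed"
except KeyAlreadyExists:
    outcome = "rejected"
```

In this replay both clients pass the create-phase check, and only the commit-phase re-check stops Client A from overwriting Client B's data.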
| 103 | + |
| 104 | +#### If-Match Implementation |
| 105 | + |
| 106 | +Leverages existing `expectedDataGeneration` from HDDS-10656: |
| 107 | + |
| 108 | +##### S3 Gateway Layer |
| 109 | + |
| 110 | +1. Parse `If-Match: "<etag>"` header |
| 111 | +2. Look up existing key via `getS3KeyDetails()` |
| 112 | +3. Validate ETag matches, else throw `PRECOND_FAILED` (412) |
| 113 | +4. Extract `expectedGeneration` from existing key |
| 114 | +5. Pass `expectedGeneration` to RpcClient |
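
The gateway steps above could look roughly like the following. This is a minimal sketch, not the S3 gateway implementation: `KeyDetails`, `PreconditionFailed`, `resolve_expected_generation`, and the `lookup` callback (standing in for `getS3KeyDetails()`) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KeyDetails:
    etag: str
    generation: int


class PreconditionFailed(Exception):
    """Stand-in for the gateway's PRECOND_FAILED (412) error."""


def resolve_expected_generation(if_match: str,
                                lookup: Callable[[str], Optional[KeyDetails]],
                                key: str) -> int:
    """Turn an If-Match header into the generation passed on to the client."""
    # Step 1: parse the header, dropping the quotes around the entity tag.
    wanted = if_match.strip().strip('"')
    # Step 2: look up the existing key (None models a missing key).
    details = lookup(key)
    # Step 3: a missing key or a different ETag fails the precondition.
    if details is None or details.etag != wanted:
        raise PreconditionFailed(key)
    # Steps 4-5: hand the key's generation to the caller.
    return details.generation


store = {"report.csv": KeyDetails(etag="abc", generation=7)}
generation = resolve_expected_generation('"abc"', store.get, "report.csv")
```

The ETag check here is only an early rejection; the generation returned is what allows the OM to detect a modification that happens after this lookup.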
| 115 | + |
| 116 | +##### OM Create Phase |
| 117 | + |
| 118 | +1. Receive `expectedDataGeneration` parameter |
| 119 | +2. Look up the current key and validate that it exists |
| 120 | +3. Extract current key's `updateID` value |
| 121 | +4. Create open key with `expectedDataGeneration = updateID` |
| 122 | +5. Return stream to S3 gateway |
| 123 | + |
| 124 | +##### OM Commit Phase |
| 125 | + |
| 126 | +1. Read open key (contains `expectedDataGeneration`) |
| 127 | +2. Read current committed key |
| 128 | +3. Validate `current.updateID == openKey.expectedDataGeneration` |
| 129 | +4. Commit if match, reject if mismatch (existing HDDS-10656 logic) |
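
The create and commit phases amount to a compare-and-swap on the key's `updateID`. The toy model below illustrates that idea; it is a sketch, not the HDDS-10656 logic. `CasOm`, `KeyNotFound`, and `GenerationMismatch` are hypothetical names, and the model assumes the caller-supplied generation equals the key's `updateID` at lookup time.

```python
class KeyNotFound(Exception):
    """Stand-in for the OM KEY_NOT_FOUND error."""


class GenerationMismatch(Exception):
    """Stand-in for a rejected commit due to concurrent modification."""


class CasOm:
    """Toy model of If-Match as compare-and-swap on updateID."""

    def __init__(self):
        self.keys = {}        # committed keys: name -> (update_id, data)
        self.open_keys = {}   # open keys: client id -> (name, expected generation)
        self.next_update_id = 1

    def put(self, name, data):
        # Unconditional write: every commit gets a fresh update ID.
        self.keys[name] = (self.next_update_id, data)
        self.next_update_id += 1

    def create(self, client_id, name, expected_generation):
        # Create phase: the key must exist; record the expected generation.
        if name not in self.keys:
            raise KeyNotFound(name)
        self.open_keys[client_id] = (name, expected_generation)

    def commit(self, client_id, data):
        # Commit phase: compare the committed key's update ID with the
        # generation stored in the open key, and swap only on a match.
        name, expected = self.open_keys.pop(client_id)
        current = self.keys.get(name)
        if current is None or current[0] != expected:
            raise GenerationMismatch(name)
        self.put(name, data)


om = CasOm()
om.put("config", b"v1")               # update ID 1
om.create("A", "config", 1)           # Client A read generation 1
om.put("config", b"v2")               # concurrent writer, update ID 2
try:
    om.commit("A", b"v1-edited")
    result_a = "committed"
except GenerationMismatch:
    result_a = "rejected"

om.create("C", "config", 2)           # Client C read the latest generation
om.commit("C", b"v3")
```

Client A loses because the key changed after it read generation 1, while Client C, holding the current generation, commits successfully.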
| 130 | + |
| 131 | +#### Error Mapping |
| 132 | + |
| 133 | +| OM Error | S3 Status | S3 Error Code | Scenario | |
| 134 | +|----------|-----------|---------------|----------| |
| 135 | +| `KEY_ALREADY_EXISTS` | 412 | PreconditionFailed | If-None-Match failed | |
| 136 | +| `KEY_NOT_FOUND` | 412 | PreconditionFailed | If-Match failed (key missing) | |
| 137 | +| `ETAG_MISMATCH` | 412 | PreconditionFailed | If-Match failed (ETag mismatch) | |
| 138 | +| `GENERATION_MISMATCH` | 412 | PreconditionFailed | If-Match failed (concurrent modification) | |
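
The table translates directly into a lookup. The sketch below is illustrative only: `to_s3_error` is a hypothetical helper, and falling back to `500 InternalError` for unlisted OM errors is an assumption that the table does not cover.

```python
# OM error name -> (HTTP status, S3 error code), as listed in the table above.
S3_ERROR_BY_OM_ERROR = {
    "KEY_ALREADY_EXISTS": (412, "PreconditionFailed"),
    "KEY_NOT_FOUND": (412, "PreconditionFailed"),
    "ETAG_MISMATCH": (412, "PreconditionFailed"),
    "GENERATION_MISMATCH": (412, "PreconditionFailed"),
}


def to_s3_error(om_error: str) -> tuple:
    """Map an OM error raised during a conditional write to an S3 response."""
    # Assumption: anything not in the table is reported as an internal error.
    return S3_ERROR_BY_OM_ERROR.get(om_error, (500, "InternalError"))
```

Note that `KEY_NOT_FOUND` maps to 412 only in the conditional-write path described here; this sketch does not model how the same OM error is reported elsewhere.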
| 139 | + |
| 140 | +## AWS S3 Conditional Read |
| 141 | + |
| 142 | +TODO |
| 143 | + |
| 144 | +## AWS S3 Conditional Copy |
| 145 | + |
| 146 | +TODO |
| 147 | + |
| 148 | +## References |
| 149 | + |
| 150 | +- [AWS S3 Conditional Requests](https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-requests.html) |
| 151 | +- [RFC 7232 - HTTP Conditional Requests](https://tools.ietf.org/html/rfc7232) |
| 152 | +- [HDDS-10656 - Atomic Rewrite Key](https://issues.apache.org/jira/browse/HDDS-10656) |
| 153 | +- [Leader Election with S3 Conditional Writes](https://www.morling.dev/blog/leader-election-with-s3-conditional-writes/) |