CLDSRV-908: CopyObject handle checksums#6176
Conversation
Hello leif-scality,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Incorrect fix versionThe
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the |
|
LGTM — clean implementation of checksum propagation and recompute on CopyObject. Stream handling (jsutil.once guards, Azure per-part passthrough, error propagation) and the _shouldRecomputeChecksum decision logic are solid. One minor test style issue flagged inline. |
❌ 2 Tests Failed:
View the full list of 2 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
5ca489a to
57d9894
Compare
|
LGTM |
57d9894 to
5bd2262
Compare
|
5bd2262 to
8b2b196
Compare
|
LGTM |
8b2b196 to
720f686
Compare
|
LGTM |
720f686 to
6385c3e
Compare
|
Well-structured PR with thorough test coverage across all checksum scenarios (propagation, recompute, 0-byte, COMPOSITE, multi-part, Azure, copy-to-self, external backends). |
6385c3e to
8ede498
Compare
|
|
LGTM — well-structured implementation with thorough test coverage across all the checksum scenarios (propagation, recompute, 0-byte, multi-part, Azure, copy-to-self, external backends, orphan cleanup, legacy string locations). |
b866c91 to
ae43793
Compare
|
LGTM |
ae43793 to
154e192
Compare
|
LGTM |
The recompute path writes the destination through plain data.put, which (unlike data.copyObject, used by the propagate path) does not return a dataStoreETag, so the destination's location entry was stored without one. Nothing breaks visibly — partNumber readers fall back to treating the copy as a single part — but it left the recompute path as the only data-writing path producing legacy ETag-less locations and dropped the written-bytes MD5 that integrity/audit tooling relies on. Read the MD5 from data.put's hashedStream callback argument (completedHash, which the arsenal wrapper guarantees is set before the callback fires) and stamp the location as 1:<md5>, matching createAndStoreObject. Falls back to dataRetrievalInfo.dataStoreETag for external backends. Adds a unit test asserting the recomputed destination location carries the 1:-prefixed MD5.
154e192 to
e8311fb
Compare
|
LGTM |
| if (err) { | ||
| return done(err); | ||
| } | ||
| partStream.once('error', done); |
There was a problem hiding this comment.
the error is not wrapped here
| // into the master passthrough and use its 'end' as the completion | ||
| // signal — same pattern arsenal's data.copyObject uses. | ||
| const perPart = new PassThrough(); | ||
| const wrapErr = err => Object.assign(err, { |
There was a problem hiding this comment.
You could pass the part as param to that function and define it once outside the async.eachSeries
| }); | ||
| sourceStream.once('error', finish); | ||
| checksumStream.once('error', finish); | ||
| checksumStream.on('data', () => {}); |
There was a problem hiding this comment.
It's not optimal to use a Transform only for the side effect of calculating checksum while discarding the output.
The Transform still works on pushing bytes into it's buffer.
Maybe we could rework the Checksum by implementing a new ChecksumWritable stream that does the same checksum computation but discards immediately the bytes without pushing.
There was a problem hiding this comment.
I don't think it is worth it to create a ChecksumWritable just to avoid a NOP. The checksum transform already handles everything and it is already tested
| sourceStream.once('error', finish); | ||
| checksumStream.once('error', finish); |
There was a problem hiding this comment.
Maybe this error should be wrapped as well to easily troubleshoot if an error is coming from the source or checksum stream
| }); | ||
| objectCopy(authInfo, req, sourceBucketName, objectKey, undefined, log, err => { | ||
| assert(err); | ||
| assert.strictEqual(err.is.InvalidRequest, true); |
There was a problem hiding this comment.
not ideal to use boolean comparison with .is
| assert.strictEqual(err.is.InvalidRequest, true); | |
| assert.strictEqual(err.message, 'InvalidRequest'); |
| objectCopy(authInfo, req, sourceBucketName, objectKey, undefined, log, | ||
| assertRecomputed('sha256', 'ChecksumSHA256', expectedDigest, err => { | ||
| cipherBundleSpy.restore(); | ||
| dataPutSpy.restore(); |
There was a problem hiding this comment.
Ideally those restore should be in a hook beforeEach or afterEach to cover the case where an error is thrown before without executing this callback, that would leave the spy in place
| dataGetStub.restore(); | ||
| assertRecomputed('sha256', 'ChecksumSHA256', expectedDigest, done)(err, xml); | ||
| }); | ||
| }, err => { | ||
| dataGetStub.restore(); |
There was a problem hiding this comment.
Same here, the stub should be tracked outside this test to be restored in a hook to avoid missing it if any code path throws without reaching this callbacks
| dataPutSpy.restore(); | ||
| batchDeleteSpy.restore(); |
There was a problem hiding this comment.
same here, the first assert.ifError from metadata.getObjectMD would not restore the spys
| assert(xml.includes('<ChecksumCRC32>AAAAAA==</ChecksumCRC32>')); | ||
| assert(xml.includes('<ChecksumType>FULL_OBJECT</ChecksumType>')); |
There was a problem hiding this comment.
Prefer assert.match with a regex to see the full detail of the xml + expected pattern in error message, or include a message to the assert to provide more details about actual vs expected value
| socket: {}, | ||
| }); | ||
| objectCopy(authInfo, req, sourceBucketName, objectKey, undefined, log, err => { | ||
| batchDeleteSpy.restore(); |
There was a problem hiding this comment.
Overall many sinon spy/stub in many tests seems they could skip restore if some error branches is reached (here the ifError of getObjectMD and putObjectMD).
Maybe we should include in the claude review skill that in test files any mock/spy/stub should be tracked and cleaned in an afterEach hook to make sure every branch is covered. Otherwise one test failure might impact another test.
GET+PUT+DELETEGET+PUT+DELETE(same)CopyObject/beginCopyFromURL/copyObject) — no GETGET+PUT+DELETE— streamed through CloudServerGET+PUT+DELETEGET+PUT+DELETE(same)GET+PUT+DELETEGET+PUT+DELETE(same)GETonly — reuse location (no PUT, no DELETE)PUT(data.put(null))PUT(data.put(null)) (same)