Skip to content

feat(storage): add progress stall timeout for S3 uploads#3306

Open
burak-initialcode wants to merge 4 commits into
aws-amplify:mainfrom
burak-initialcode:upload-timeouts
Open

feat(storage): add progress stall timeout for S3 uploads#3306
burak-initialcode wants to merge 4 commits into
aws-amplify:mainfrom
burak-initialcode:upload-timeouts

Conversation

@burak-initialcode
Copy link
Copy Markdown

@burak-initialcode burak-initialcode commented Apr 27, 2026

Issue

#3302

Description

Adds a configurable progress stall timeout to S3 storage uploads. When upload progress does not advance for the configured interval (e.g. due to network issues), the upload is automatically cancelled and the operation's onError consumer receives an error instead of hanging indefinitely.

Problem: On poor or unstable networks, S3 uploads can stall indefinitely without the client receiving any progress updates.

Solution:

  • New public ProgressStallTimeout sealed class in :core (Disabled / Interval(seconds)), mirroring the iOS API.
  • New public ProgressStallTimeoutException (AmplifyException subclass) in :core so consumers can pattern-match on the cause of the surfaced StorageException.
  • The plugin-wide default is set on AWSS3StoragePluginConfiguration.progressStallTimeout and can be overridden per upload via AWSS3StorageUploadFileOptions / AWSS3StorageUploadInputStreamOptions (null defers to the plugin default; Disabled explicitly opts out for that operation).
  • The resolved interval is threaded through the upload pipeline into WorkData. SinglePartUploadWorker and PartUploadTransferWorker decorate their ProgressListeners with a coroutine-based StallDetectingProgressListener. On stall, the worker cancels the upload job, marks the transfer FAILED (workers explicitly skip retry for ProgressStallTimeoutException), and surfaces the typed cause via Throwable.toStorageUploadException(...).
  • Default: Disabled (opt-in). Existing behavior is preserved.

Configuration:

// Plugin default: 30 seconds
Amplify.addPlugin(
    AWSS3StoragePlugin(
        AWSS3StoragePluginConfiguration {
            progressStallTimeout = ProgressStallTimeout.Interval(30)
        }
    )
)

// Override for a large upload: 120 seconds
val options = AWSS3StorageUploadFileOptions.builder()
    .progressStallTimeout(ProgressStallTimeout.Interval(120))
    .build()

Amplify.Storage.uploadFile(StoragePath.fromString("public/big.zip"), file, options)

How did you test these changes?

  • New unit tests:
    • StallDetectingProgressListenerTest — coroutine-based timer logic (no progress, progress resets timer, close cancels, disabled threshold no-op, etc.) using kotlinx-coroutines-test (StandardTestDispatcher + advanceTimeBy).
    • StorageExceptionExtensionsTest — verifies toStorageUploadException preserves the typed ProgressStallTimeoutException cause (direct and nested) and falls back to the default message for unrelated throwables.
    • AWSS3StoragePluginConfigurationTest — defaults, builder, and override semantics for progressStallTimeout.
    • AWSS3StorageUploadFileOptionsTest / AWSS3StorageUploadInputStreamOptionsTest — option propagation, null-defers-to-plugin semantics, builder + Java interop.
  • Updated existing operation/component unit tests for the new 6-arg StorageService.uploadFile / uploadInputStream overloads.
  • New instrumentation tests in AWSS3StoragePathUploadTest (mirrors the iOS PR coverage retained after reviewer feedback):
    • testUploadSmallFileWithProgressStallTimeoutOptionCompletesSuccessfully — single-part happy path with the stall timer set.
    • testUploadLargeFileWithProgressStallTimeoutOptionCompletesSuccessfully — multipart happy path with the stall timer set.
  • ./gradlew :core:test :aws-storage-s3:test — all passing locally.
  • ./gradlew ktlintCheck checkstyle apiDump — all passing; API dump shows additive-only changes (minor bump).

Documentation update required?

  • No
  • Yes (Please include a PR link for the documentation update)

General Checklist

  • Added Unit Tests
  • Added Integration Tests
  • Security oriented best practices and standards are followed (e.g. using input sanitization, principle of least privilege, etc)
  • Ensure commit message has the appropriate scope (e.g fix(storage): message, feat(auth): message, chore(all): message)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@burak-initialcode burak-initialcode requested review from a team as code owners April 27, 2026 10:21
Add a configurable progress stall timeout to S3 storage uploads. When
upload progress does not advance for the specified interval (e.g. due to
network issues), the upload is automatically cancelled and the operation's
onError consumer receives a typed StorageException whose cause is
ProgressStallTimeoutException, instead of hanging indefinitely.

ProgressStallTimeout (sealed: Disabled / Interval(seconds)) is exposed on
AWSS3StoragePluginConfiguration as the plugin-wide default and can be
overridden per upload via AWSS3StorageUploadFileOptions and
AWSS3StorageUploadInputStreamOptions. The resolved interval is threaded
through the upload pipeline into WorkData so SinglePartUploadWorker and
PartUploadTransferWorker can decorate their ProgressListeners with a
StallDetectingProgressListener; on stall the worker cancels the upload
job, marks the transfer FAILED (no retry), and surfaces the typed cause.

The default remains Disabled (opt-in), preserving existing behavior.

fixes aws-amplify#3302
Cover the happy path for both single-part and multipart S3 uploads when a
ProgressStallTimeout is set on the upload options. The stall timer must
not break a successful upload; these tests assert that the option flows
end-to-end through the worker pipeline without affecting normal completion.

Mirrors the unit-tested integration coverage retained in the iOS PR.
@burak-initialcode
Copy link
Copy Markdown
Author

@harsh62 any progress on this one?

@jvh-aws
Copy link
Copy Markdown
Contributor

jvh-aws commented Apr 29, 2026

@burak-initialcode I am currently looking into the PR. Will let you know once I have an update.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 41.26394% with 158 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.03%. Comparing base (d6d9f52) to head (c2ae0e3).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3306      +/-   ##
==========================================
+ Coverage   55.91%   56.03%   +0.12%     
==========================================
  Files        1083     1087       +4     
  Lines       31932    32126     +194     
  Branches     4760     4803      +43     
==========================================
+ Hits        17854    18003     +149     
- Misses      12222    12241      +19     
- Partials     1856     1882      +26     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add ProgressStallTimeoutTest covering Disabled/Interval semantics, factory
  methods, equality, and the negative/zero seconds normalization branches.
- Add ProgressStallTimeoutExceptionTest covering message and recovery
  suggestion round-trip and AmplifyException inheritance.
- Add AWSS3StoragePluginProgressStallTimeoutTest covering the plugin's
  resolveProgressStallTimeoutSeconds branches: plugin-level Disabled and
  Interval defaults, per-upload Interval override, per-upload Disabled
  override, and the non-AWS-options instanceof fallback path for both
  uploadFile (key + StoragePath) and uploadInputStream (key + StoragePath).
- Extend AWSS3StorageUploadFileOperationTest and
  AWSS3StorageUploadInputStreamOperationTest to exercise the new
  Java-friendly long-arg constructor and verify that the resolved
  progressStallTimeoutSeconds is forwarded to StorageService.
- Extend AWSS3StoragePathUploadFileOperationTest and
  AWSS3StoragePathUploadInputStreamOperationTest to verify that the
  progressStallTimeoutSeconds carried on the upload request is forwarded
  to StorageService.

Co-authored-by: Cursor <cursoragent@cursor.com>
@burak-pensa
Copy link
Copy Markdown

@jvh-aws, any progress?

Copy link
Copy Markdown
Contributor

@jvh-aws jvh-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I am still testing this on my end. Otherwise, there are two small comments I had

Single-part uploads and multi-part chunks restarted via TransferOperations.resume()
previously dropped the configured progress-stall timeout, forcing the worker to
treat the resumed transfer as if stall detection was disabled. This matched neither
the original upload's behavior nor the iOS Amplify Storage parity expectations.

Thread the plugin-level default progressStallTimeoutSeconds resolved from
AWSS3StoragePluginConfiguration through the storage service, the per-bucket
service container, TransferManager, and TransferOperations.resume() so the worker
re-arms stall detection with the plugin default whenever a transfer is resumed.

Per-upload overrides remain unchanged: the initial start() path still honors
the per-upload progressStallTimeoutSeconds threaded from the upload operation
options. Only resume - which has no surviving per-upload context - falls back
to the plugin default rather than 0.

Migrate AWSS3StoragePluginProgressStallTimeoutTest to MockK per the project
convention for new Kotlin tests, and add TransferOperationsResumeTest covering
the resume work-data propagation in both the configured-default and legacy
zero-fallback cases.

Co-authored-by: Cursor <cursoragent@cursor.com>
@burak-initialcode burak-initialcode requested a review from jvh-aws May 19, 2026 11:04
@burak-initialcode
Copy link
Copy Markdown
Author

@jvh-aws Thanks for the review — both comments are addressed in 98a75d59.

1. Resume path + plugin default (Swift #4162 parity)

TransferOperations.resume() was calling start(..., progressStallTimeoutSeconds = 0L), so resumed uploads always lost stall detection even when the plugin had a default configured. That didn’t match the Swift behavior in aws-amplify/amplify-swift#4162, where resolvedProgressStallTimeoutSeconds falls back to the plugin configuration when there’s no per-upload override.

The plugin-level default from AWSS3StoragePluginConfiguration.progressStallTimeout is now threaded through AWSS3StoragePluginAWSS3StorageService.Factory / AWSS3StorageServiceTransferManagerTransferOperations.resume(), so resumed work is enqueued with that default in WorkData (PROGRESS_STALL_TIMEOUT_SECONDS). Per-upload overrides on the initial start() path are unchanged; only resume uses the plugin default, since the per-upload value isn’t persisted across pause/resume.

Added TransferOperationsResumeTest to assert the stall timeout is propagated into the resumed worker’s work data (configured default vs. legacy 0 fallback).

2. MockK for AWSS3StoragePluginProgressStallTimeoutTest

Migrated AWSS3StoragePluginProgressStallTimeoutTest from Mockito to MockK (mockk, every, verify(timeout = …), slot) to match the project convention for new Kotlin tests.

Verification

Locally: :aws-storage-s3:testDebugUnitTest, targeted progress-stall tests, :core:testDebugUnitTest (ProgressStallTimeout*), and :aws-storage-s3:ktlintCheck — all green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants