
docs(studios): add S3 versioning guidance for checkpoint storage costs #1447

Open
ejseqera wants to merge 3 commits into master from docs/studios-checkpoint-s3-versioning-guidance

ejseqera (Member) commented May 19, 2026

Summary

  • Studios writes a checkpoint to the same S3 key every five minutes. With S3 versioning enabled on the work bucket, each write creates a new object version instead of overwriting the previous one. A continuously active session therefore produces up to 288 non-current versions per day (12 per hour), which can cause significant unexpected storage costs.
  • Adds a new `### S3 versioning and checkpoint storage costs` subsection under the existing checkpoint section that explains the interaction and provides actionable remediation steps.
  • Applied to both platform-cloud and platform-enterprise docs.
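
The per-day version count follows from the write interval alone. As a rough sketch of that arithmetic, assuming exactly one checkpoint write per interval and no skipped writes (the function name is illustrative and not part of Studios):

```python
def checkpoint_versions(active_hours: float, interval_minutes: float = 5) -> int:
    """Estimate how many object versions one session writes to a single key.

    Assumes one checkpoint write per interval, as described in the summary.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return int(active_hours * 60 // interval_minutes)


if __name__ == "__main__":
    for hours in (1, 8, 24):
        print(f"{hours:>2} h active -> {checkpoint_versions(hours)} versions")
```

At a five-minute interval this gives 12 versions per hour, so eight active hours yield 96 versions and a session left running for 24 hours yields 288.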

Test plan

  • Preview renders correctly on both enterprise and cloud docs
  • JSON and shell code blocks render without errors
  • Links and cross-references in surrounding sections are unaffected
  • Pre-commit passes (verified locally)

🤖 Generated with Claude Code

ejseqera added 2 commits May 19, 2026 17:28
Studios writes a checkpoint every five minutes to the same S3 key. When
S3 versioning is enabled on the work bucket, each write creates a new
object version rather than an overwrite, producing up to 288 non-current
versions per day per continuously active session (12 per hour).

Add a new subsection under "Studio session checkpoints" that:
- Explains the versioning interaction and its cost implications
- Recommends an S3 Lifecycle rule (NoncurrentVersionExpiration: 1 day)
  with a ready-to-use JSON policy block and aws s3api CLI command
- Provides a bulk-delete shell command for clearing existing accumulated
  non-current versions
- Clarifies that non-current versions are safe to delete, while the
  current version and checkpoint directories must not be removed

Changes applied to both platform-cloud and platform-enterprise docs.
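
For illustration, a lifecycle configuration of the kind this commit describes might be generated as follows. This is a sketch, not the policy block from the PR: the rule ID and function name are made up for the example, and the `.studios/checkpoints/` prefix is taken from the PR text and may need the work directory path prepended if the work directory is not at the bucket root.

```python
import json


def noncurrent_expiry_rule(prefix: str = ".studios/checkpoints/", days: int = 1) -> dict:
    """Build an S3 lifecycle configuration that expires non-current versions.

    The rule ID is a hypothetical name chosen for this example.
    """
    if days < 1:
        raise ValueError("NoncurrentDays must be a positive integer")
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-studio-checkpoints",
                "Status": "Enabled",
                # Only keys that start with this prefix are affected.
                "Filter": {"Prefix": prefix},
                # Current versions are untouched; only older versions expire.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(noncurrent_expiry_rule(), indent=2))
```

Saved to a file, the output could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`. That call replaces any lifecycle configuration already set on the bucket, so existing rules would need to be merged into the same file first.
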
@ejseqera ejseqera requested a review from gwright99 May 19, 2026 21:31

netlify Bot commented May 20, 2026

Deploy Preview for seqera-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 2ef8d31 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/seqera-docs/deploys/6a0df35a1785ee00086aa26e |
| 😎 Deploy Preview | https://deploy-preview-1447--seqera-docs.netlify.app |

@justinegeffen justinegeffen requested a review from t0randr May 20, 2026 19:01
@justinegeffen justinegeffen added the 1. Dev/PM/SME Needs a review by a Dev/PM/SME label May 20, 2026
@justinegeffen justinegeffen added the 2. Edu reviews complete Reviews complete. Remove label when confirmed in prod. label May 21, 2026
@justinegeffen justinegeffen requested a review from robnewman May 22, 2026 14:25

robnewman (Member) commented May 26, 2026

@ejseqera Isn't this a generic issue for any cloud provider that offers object storage versioning (e.g. Azure, GCP, OCI)?

I don't think we should limit this to just S3.

robnewman (Member) left a comment

This should not be limited to S3.

Checkpoints vary in size depending on libraries installed in your session environment. This can potentially result in many large files stored in the compute environment's pipeline work directory and saved to cloud storage. This storage will incur costs based on the cloud provider. Due to the architecture of Studios, you cannot delete any checkpoint files to save on storage costs. Deleting a Studio session's checkpoints will result in a corrupted Studio session that cannot be started nor recovered.
:::

### S3 versioning and checkpoint storage costs

robnewman commented:

This is a generic problem across any cloud provider that supports object versioning, which is all of them. Don't limit to just S3.

If your compute environment work directory uses an S3 bucket with **versioning enabled**, checkpoint writes create a new S3 object version every five minutes rather than overwriting the previous one. For a continuously active Studio session, this produces up to 288 new object versions per day per session (12 per hour). Over time, these non-current versions accumulate and can significantly increase storage costs.

Suggested change (robnewman):
If your compute environment work directory uses an object storage bucket with **versioning enabled**, checkpoint writes create a new object version rather than overwriting the previous one. For an active Studio session, this produces many object versions per session. Over time, these non-current versions accumulate and can significantly increase storage costs.

:::warning
Only the latest version of each checkpoint file is read by Platform. However, non-current S3 object versions are not automatically removed and will continue to accrue storage costs until explicitly deleted or expired.

Suggested change (robnewman):
Only the latest version of each checkpoint file is read by Platform. However, non-current object versions are not automatically removed and will continue to accrue storage costs until explicitly deleted or expired.
:::

**Recommended mitigation:** Apply an S3 Lifecycle rule to expire non-current object versions on the `.studios/checkpoints/` prefix. A one-day expiry retains the current version while removing intermediate five-minute writes. You can also delete existing accumulated non-current versions manually using your cloud provider's console or CLI.

Suggested change (robnewman):
**Recommended mitigation:** Apply lifecycle rules to expire non-current object versions on the `.studios/checkpoints/` prefix. A one-day expiry retains the current version while removing intermediate five-minute writes. You can also delete existing accumulated non-current versions manually using your cloud provider's console or CLI.
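
As a sketch of the manual cleanup, the snippet below picks out only the non-current versions from a version listing and builds a delete request for them. It assumes the listing has the shape returned by `aws s3api list-object-versions` (a `Versions` array whose entries carry `Key`, `VersionId`, and `IsLatest`); the function name and the sample keys are hypothetical.

```python
def noncurrent_delete_request(listing: dict, prefix: str = ".studios/checkpoints/") -> dict:
    """Return a delete payload covering only non-current versions under prefix.

    Entries with IsLatest set to True are skipped, so the current version of
    every checkpoint file is left untouched.
    """
    objects = [
        {"Key": v["Key"], "VersionId": v["VersionId"]}
        for v in listing.get("Versions", [])
        if v["Key"].startswith(prefix) and not v["IsLatest"]
    ]
    return {"Objects": objects, "Quiet": True}


if __name__ == "__main__":
    # Hypothetical listing: one current and two non-current versions of one key.
    sample = {
        "Versions": [
            {"Key": ".studios/checkpoints/a/ckpt", "VersionId": "v3", "IsLatest": True},
            {"Key": ".studios/checkpoints/a/ckpt", "VersionId": "v2", "IsLatest": False},
            {"Key": ".studios/checkpoints/a/ckpt", "VersionId": "v1", "IsLatest": False},
        ]
    }
    print(noncurrent_delete_request(sample))
```

The resulting payload matches the `--delete` argument of `aws s3api delete-objects`, which accepts at most 1,000 objects per call, so a large backlog would need to be sent in batches.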

:::note
Non-current object versions (intermediate checkpoint writes) are safe to delete. Do **not** delete the current (latest) version of any checkpoint file or the checkpoint directory itself — doing so will corrupt the Studio session and it cannot be recovered.

robnewman commented:

Won't this impact the "Start as new" functionality?

Checkpoints vary in size depending on libraries installed in your session environment. This can potentially result in many large files stored in the compute environment's pipeline work directory and saved to cloud storage. This storage will incur costs based on the cloud provider. Due to the architecture of Studios, you cannot delete any checkpoint files to save on storage costs. Deleting a Studio session's checkpoints will result in a corrupted Studio session that cannot be started nor recovered.
:::

### S3 versioning and checkpoint storage costs

robnewman commented:

Same problems as the above Cloud docs.

Labels

  • 1. Dev/PM/SME: Needs a review by a Dev/PM/SME
  • 2. Edu reviews complete: Reviews complete. Remove label when confirmed in prod.

3 participants