feat(dstack-util): mix gcp vTPM AK cert into instance_id #726
Merged
Pull request overview
This PR updates how dstack-util derives instance_id on GCP to avoid identity collisions when VMs are cloned from disk images/snapshots (which duplicate the persisted instance_id_seed). It does so by mixing a per-instance value read from the GCP vTPM (the AK certificate) into the instance_id derivation, while leaving other platforms unchanged.
Changes:
- Add `platform_instance_binding()`, which on GCP reads the vTPM AK certificate from NV (ECC first, then RSA) and contributes `sha256(cert)` as a per-instance binding value.
- Extend `instance_id` derivation to include the platform binding when available (GCP), preserving the previous seed-only derivation on other platforms.
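A minimal Rust sketch of the derivation change, assuming the binding is appended to a length-prefixed seed and hashed. `derive_instance_id`, the length prefix, and the injected `hash` parameter are hypothetical, not dstack-util's code; the hash is passed in only so the sketch compiles without crates, and in practice it would be SHA-256 (for example via the `sha2` crate). The seed-only branch assumes the pre-change formula was simply `hash(seed)`.

```rust
/// Hypothetical sketch: derive a 32-byte instance id from the on-disk seed,
/// optionally mixed with a per-instance platform binding (e.g. a hash of the
/// GCP vTPM AK material). The hash is injected so this compiles without
/// external crates; a real implementation would use SHA-256.
pub fn derive_instance_id<H>(seed: &[u8], binding: Option<&[u8]>, hash: H) -> [u8; 32]
where
    H: Fn(&[u8]) -> [u8; 32],
{
    match binding {
        // No binding (non-GCP): keep the seed-only formula unchanged.
        // Assumption: the pre-change derivation was hash(seed).
        None => hash(seed),
        // Binding present (GCP): length-prefix the seed so bytes cannot
        // shift between the seed and binding fields, then append the binding.
        Some(b) => {
            let mut buf = Vec::with_capacity(8 + seed.len() + b.len());
            buf.extend_from_slice(&(seed.len() as u64).to_be_bytes());
            buf.extend_from_slice(seed);
            buf.extend_from_slice(b);
            hash(&buf)
        }
    }
}
```

Keeping the `None` branch identical to the old formula is what leaves non-GCP `instance_id` values unchanged, while two disk clones that share a seed but carry different AK bindings end up with different ids.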
Comment on lines +717 to +721
/// On GCP we use the pre-provisioned vTPM Attestation Key certificate: it lives in
/// the vTPM NV store (not on the data disk), so a VM cloned from a disk image gets
/// a fresh vTPM with a different AK cert, while a reboot of the same VM keeps it
/// stable — exactly the property we need. The cert is also signed by Google, so the
/// host cannot trivially forge a duplicate.
instance_id is derived from instance_id_seed, which is persisted on the data disk. On GCP a VM can be cloned from a disk image / snapshot, so every clone inherits the same seed and thus the same instance_id, letting multiple running VMs share one identity.

On GCP, mix the public key of the pre-provisioned vTPM Attestation Key into the instance_id. The AK is derived deterministically from the per-instance Endorsement seed held in the vTPM (not on the data disk), so it is stable across reboot/stop-start but fresh on a disk clone.

We hash the AK public area rather than its certificate so the binding is immune to certificate re-issuance: a re-signed cert carries new serial/validity/signature bytes for the same key, which would otherwise change instance_id without a clone. (Observed cert validity is ~30 years from instance creation, so re-issuance is unlikely, but the pubkey removes the dependency entirely.)

tpm-attest: expose the AK public area on LoadedAk (previously discarded).

Verified on real c3-standard-4 TDX confidential VMs:
- reboot: AK unchanged
- stop/start: AK unchanged
- clone from disk image: AK differs

Fails closed: if GCP is detected but the AK cannot be loaded, error instead of silently falling back to the seed-only id. Other platforms are unaffected.
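The rules described above (ECC AK first, then RSA; hash the public area, not the certificate; fail closed on GCP) can be sketched together. `Platform`, `AkMaterial`, `BindingError`, and `binding_preimage` are hypothetical names, not tpm-attest's `LoadedAk` API or the real `platform_instance_binding()` signature, and the two loaders are closures standing in for the vTPM NV reads so the sketch runs without a TPM.

```rust
use std::fmt;

/// Hypothetical platform classification; dstack-util's real GCP detection is not shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    Gcp,
    Other,
}

/// Hypothetical, simplified AK material (not tpm-attest's real `LoadedAk`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AkMaterial {
    /// Certificate bytes: serial, validity and signature may change on re-issuance.
    pub cert_der: Vec<u8>,
    /// TPM public area: depends only on the key itself.
    pub public_area: Vec<u8>,
}

/// Returned when GCP is detected but neither AK can be loaded.
#[derive(Debug, PartialEq, Eq)]
pub struct BindingError(pub String);

impl fmt::Display for BindingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "platform binding unavailable: {}", self.0)
    }
}

impl std::error::Error for BindingError {}

/// Returns the bytes to hash into `instance_id`, or `None` off GCP.
/// `load_ecc` / `load_rsa` are hypothetical loaders standing in for vTPM NV reads.
pub fn binding_preimage<E, R>(
    platform: Platform,
    load_ecc: E,
    load_rsa: R,
) -> Result<Option<Vec<u8>>, BindingError>
where
    E: FnOnce() -> Option<AkMaterial>,
    R: FnOnce() -> Option<AkMaterial>,
{
    // Other platforms: no binding, so the seed-only derivation is kept.
    if platform != Platform::Gcp {
        return Ok(None);
    }
    // Prefer the ECC AK; only try RSA if ECC is unavailable.
    let ak = load_ecc().or_else(load_rsa).ok_or_else(|| {
        // Fail closed: falling back to the seed-only id here would let
        // disk clones share an identity again.
        BindingError("GCP detected but no vTPM AK could be loaded".to_string())
    })?;
    // Hash input is the public area, never the certificate, so a re-issued
    // certificate for the same key leaves instance_id unchanged.
    Ok(Some(ak.public_area))
}
```

Because the returned preimage never includes `cert_der`, a re-issued certificate for the same key (new serial, validity, or signature) yields the same bytes, and therefore the same hash and `instance_id`; the caller would hash the returned public area before mixing it into the derivation.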
Force-pushed from ff28c37 to f1ba0a2.
Problem
`instance_id` is derived from `instance_id_seed`, which is persisted on the data disk. On GCP a VM can be cloned from a disk image / snapshot. Every clone inherits the same `instance_id_seed` and therefore computes the same `instance_id`, letting multiple running VMs share one identity (managed instance groups, image-based cloning, etc.).

Fix
On GCP, mix the public key of the pre-provisioned vTPM Attestation Key into the `instance_id`. The AK is derived deterministically from the per-instance Endorsement seed held in the vTPM (not on the data disk), so it is stable across reboot/stop-start but fresh on a disk clone, which is exactly the property needed to keep `instance_id` unique per running VM. This reuses the existing `tpm-attest` GCP AK load path (prefers ECC, falls back to RSA).

Why hash the AK public area, not the AK certificate: a certificate carries serial / validity / signature bytes that can change on re-issuance for the same key, which would shift `instance_id` without a clone. The public area depends only on the key. (Observed AK cert validity is ~30 years from instance creation, so re-issuance is unlikely in practice; hashing the pubkey removes the dependency entirely.)

- Other platforms are unaffected (`platform_instance_binding()` returns `None`).
- `tpm-attest`: exposes the AK public area on `LoadedAk` (previously discarded as `_public`).

Validation
Tested on real `c3-standard-4` `--confidential-compute-type=TDX` VMs (confirmed `/dev/tdx_guest`):
- reboot: AK unchanged
- stop/start: AK unchanged
- clone from disk image: AK differs

Notes / follow-ups
- On GCP, `instance_id` (and the RTMR3 measurement) change once after upgrade.