VM Scale set is in a Failed ProvisioningState, because of an extension #82

@Petkoii0

We've discovered a VM scale set (VMSS) whose ProvisioningState is "Failed". The nodes are running fine, but one extension on the VMSS is reported as failed. Because this is a managed service, we have little access to troubleshoot further. We did, however, check the state of the VMSS via the API with the command below:

az vmss get-instance-view --resource-group groupName --name vmssName

      "name": "AKSLinuxExtension",
      "statusesSummary": [
        {
          "code": "ProvisioningState/succeeded",
          "count": 1
        },
        {
          "code": "ProvisioningState/failed",
          "count": 1
        }
      ]
    }
  ],
  "orchestrationServices": null,
  "statuses": [
    {
      "code": "ProvisioningState/failed/VMExtensionHandlerNonTransientError",
      "displayStatus": "Provisioning failed",
      "level": "Error",
      "message": "The handler for VM extension type 'Microsoft.AKS.Compute.AKS.Linux.AKSNode' has reported terminal failure for VM extension 'AKSLinuxExtension' with error message: 'Failed to download artifacts: [ExtensionDownloadError] Timeout downloading extension package. Elapsed: 0:08:04.405546 URIs tried: 2/3. Last error: [HttpError] Download failed both on the primary and fallback channels. IOError [Errno -3] Temporary failure in name resolution -- 6 attempts made]'.\r\n    \r\n'Manifest download failed for the extension. 

When we ran the same command against each of the VMSS nodes, the extension was reported as healthy on every one of them:


     "name": "AKSLinuxExtension",
     "statuses": [
       {
         "code": "ProvisioningState/succeeded",
         "displayStatus": "Provisioning succeeded",
         "level": "Info",
         "message": "Successfully enabled Compute.AKS.Linux.AKSNode extension",
         "time": null
       }
     ],
     "substatuses": null,
     "type": "Microsoft.AKS.Compute.AKS.Linux.AKSNode",
     "typeHandlerVersion": "1.56"
   }
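
For anyone comparing the two views, here is a minimal Python sketch that extracts the failed extension entries from both outputs. It assumes the JSON has been saved or captured from `az vmss get-instance-view`, and that both views hold their entries under a top-level `extensions` key (that key is not visible in the trimmed snippets above, so treat it as an assumption). The function names are hypothetical helpers, not part of any Azure SDK.

```python
import json

FAILED_PREFIX = "provisioningstate/failed"


def failed_in_scale_set_view(view):
    """Map extension name -> number of instances reporting a failed state.

    Assumes the scale-set-level view has an "extensions" list whose items
    carry "name" and "statusesSummary", as in the first output above.
    """
    failed = {}
    for ext in view.get("extensions") or []:
        for summary in ext.get("statusesSummary") or []:
            if str(summary.get("code", "")).lower().startswith(FAILED_PREFIX):
                failed[ext.get("name")] = summary.get("count", 0)
    return failed


def failed_in_instance_view(view):
    """List extension names with a failed status in one instance's view.

    Assumes the per-instance view has an "extensions" list whose items
    carry "name" and "statuses", as in the second output above.
    """
    names = []
    for ext in view.get("extensions") or []:
        codes = [str(s.get("code", "")).lower() for s in ext.get("statuses") or []]
        if any(code.startswith(FAILED_PREFIX) for code in codes):
            names.append(ext.get("name"))
    return names


# Sample data trimmed from the outputs quoted in this issue.
scale_set_view = json.loads("""
{
  "extensions": [
    {
      "name": "AKSLinuxExtension",
      "statusesSummary": [
        {"code": "ProvisioningState/succeeded", "count": 1},
        {"code": "ProvisioningState/failed", "count": 1}
      ]
    }
  ]
}
""")

instance_view = json.loads("""
{
  "extensions": [
    {
      "name": "AKSLinuxExtension",
      "statuses": [
        {"code": "ProvisioningState/succeeded", "level": "Info"}
      ],
      "type": "Microsoft.AKS.Compute.AKS.Linux.AKSNode",
      "typeHandlerVersion": "1.56"
    }
  ]
}
""")

print(failed_in_scale_set_view(scale_set_view))
print(failed_in_instance_view(instance_view))
```

With the sample data, the scale-set view reports one failed `AKSLinuxExtension` entry while the per-instance list comes back empty, which is the mismatch described in this issue.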

The question is: how do we clear the error on the VMSS? Would restarting it fix the issue without downtime, or is there a more pragmatic way to clear this error?
