Fix cross-tenant registry endpoint resolution in DeploymentTemplateOperations#46719

Open
PratibhaShrivastav18 wants to merge 1 commit into Azure:main from PratibhaShrivastav18:shrivastavp/dt-discovery-api-fix


PratibhaShrivastav18 commented May 5, 2026

Bug

When running az ml deployment-template list --registry-name <registry> against a registry in a different Azure AD tenant, the CLI resolves the wrong API endpoint (the INT/test environment) and the request fails with 403.

Root Cause

_get_registry_endpoint() in DeploymentTemplateOperations made an ARM call (GET /subscriptions/.../registries/<name>) to determine the registry's region and construct the dataplane endpoint. That ARM call requires the caller's token to match the registry's tenant, so it fails in cross-tenant scenarios.

Flow before fix:

  1. ARM registry GET fails with 401 (cross-tenant token mismatch)
  2. Code falls back to hardcoded https://int.experiments.azureml-test.net
  3. INT endpoint returns 403 (user has no access to test environment)
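
The failure path above can be reproduced in isolation. The following is a minimal sketch, not the SDK's code: ArmUnauthorizedError, arm_get_registry_region, and the tenant parameters are hypothetical stand-ins, and catching an exception is only an assumption about how the fallback was triggered. The two endpoint strings are the ones quoted in this description.

```python
INT_FALLBACK_ENDPOINT = "https://int.experiments.azureml-test.net"


class ArmUnauthorizedError(Exception):
    """Hypothetical stand-in for the 401 returned by the ARM registry GET."""


def arm_get_registry_region(registry_name: str, caller_tenant: str, registry_tenant: str) -> str:
    """Hypothetical stub for the ARM lookup; it fails when the tenants differ."""
    if caller_tenant != registry_tenant:
        raise ArmUnauthorizedError(f"401: cannot read registry {registry_name!r} across tenants")
    return "eastus"  # illustrative region only


def resolve_endpoint_before_fix(registry_name: str, caller_tenant: str, registry_tenant: str) -> str:
    """Mimic the pre-fix flow: ARM lookup first, hardcoded INT endpoint on failure."""
    try:
        region = arm_get_registry_region(registry_name, caller_tenant, registry_tenant)
        return f"https://{region}.api.azureml.ms"
    except ArmUnauthorizedError:
        # Step 2 of the flow: the error is swallowed and the INT endpoint is returned.
        return INT_FALLBACK_ENDPOINT
```

In this sketch a same-tenant caller gets the regional endpoint, while a cross-tenant caller silently receives the INT endpoint, which is what later surfaces as the 403.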

Fix

Replace the ARM-based region lookup with the registry discovery API (/registrymanagement/v1.0/registries/{name}/discovery).

The discovery API:

  • Works cross-tenant (does not require ARM access to the registry's subscription)
  • Already succeeds in the reported scenario (confirmed in bug report logs)
  • Returns primaryRegion directly, which is used to construct https://{primaryRegion}.api.azureml.ms

Flow after fix:

  1. Registry discovery API called -> returns primaryRegion (e.g. eastus)
  2. Endpoint constructed: https://eastus.api.azureml.ms
  3. DT operations use the correct production endpoint
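
As a rough illustration of the flow after the fix, the sketch below builds the discovery path and derives the dataplane endpoint from a discovery payload. It assumes the response is JSON with a primaryRegion field, as described above; the helper names are hypothetical, and returning None when the field is missing is an assumption rather than the PR's behavior.

```python
import json
from typing import Optional

DISCOVERY_PATH_TEMPLATE = "/registrymanagement/v1.0/registries/{name}/discovery"


def discovery_path(registry_name: str) -> str:
    """Build the discovery API path for a registry (relative to the discovery base URL)."""
    return DISCOVERY_PATH_TEMPLATE.format(name=registry_name)


def endpoint_from_discovery(payload: str) -> Optional[str]:
    """Derive the regional dataplane endpoint from a discovery response body."""
    data = json.loads(payload)
    region = data.get("primaryRegion")
    if not region:
        # Assumption: the sketch reports "unknown" instead of guessing an endpoint.
        return None
    return f"https://{region}.api.azureml.ms"
```

For example, endpoint_from_discovery('{"primaryRegion": "eastus"}') yields https://eastus.api.azureml.ms, matching step 2 of the flow.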

Testing

Verified with --debug output showing:

  • Discovery API: GET /registrymanagement/v1.0/registries/shrivastavp-reg/discovery -> 200
  • DT API call uses: https://eastus.api.azureml.ms/genericasset/v2.0/... -> 200
  • No ARM registry GET is made at all
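
The manual verification could be complemented by a unit-level check. This is a sketch under assumptions: resolve_registry_endpoint is a hypothetical free function that mirrors the new logic in the diff, and the discovery client is a unittest.mock.MagicMock rather than the real ServiceClientRegistryDiscovery. Only the registry_management_non_workspace.get_registry_management_non_workspace call and the primary_region attribute are taken from the change itself.

```python
from unittest.mock import MagicMock


def resolve_registry_endpoint(discovery_client, registry_name):
    """Hypothetical mirror of the new lookup: discovery call, then endpoint formatting."""
    response = discovery_client.registry_management_non_workspace.get_registry_management_non_workspace(
        registry_name
    )
    if response.primary_region:
        return f"https://{response.primary_region}.api.azureml.ms"
    return None  # assumption: the real fallback behavior is not shown in the diff


def run_check():
    """Exercise the resolver against a mocked discovery client."""
    discovery_client = MagicMock()
    get_discovery = discovery_client.registry_management_non_workspace.get_registry_management_non_workspace
    get_discovery.return_value = MagicMock(primary_region="eastus")

    endpoint = resolve_registry_endpoint(discovery_client, "shrivastavp-reg")

    # The discovery API should be called exactly once, with the registry name.
    get_discovery.assert_called_once_with("shrivastavp-reg")
    return endpoint
```

Asserting that no ARM registry GET happens would require patching RegistryOperations inside the real module, which this sketch does not attempt.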

Copilot AI left a comment

Pull request overview

This PR fixes cross-tenant registry endpoint resolution for DeploymentTemplateOperations by switching from an ARM-based registry lookup (which can fail with cross-tenant tokens) to the registry discovery API, and then deriving the deployment template dataplane endpoint from the discovery response.

Changes:

  • Replaced ARM registry GET–based region lookup with a call to the registry discovery API.
  • Constructed the deployment template dataplane endpoint based on the discovery response’s resolved primary region.

Comment on lines 55 to +84
@@ -63,31 +69,19 @@ def _get_registry_endpoint(self) -> str:
         credential = self._operation_config.credential
 
         if credential and self._operation_scope.registry_name:
-            # Get registry information to determine the region
-            registry_operations = RegistryOperations(
-                operation_scope=self._operation_scope,
-                service_client=ServiceClient102022(
-                    credential=credential,
-                    subscription_id=self._operation_scope.subscription_id,
-                    resource_group_name=self._operation_scope.resource_group_name,
-                ),
-                all_operations=None,  # type: ignore[arg-type]
-                credentials=credential,
+            # Use registry discovery API to get the primary region
+            discovery_base_url = _get_registry_discovery_endpoint_from_metadata(_get_default_cloud_name())
+            discovery_client = ServiceClientRegistryDiscovery(
+                credential=credential, base_url=discovery_base_url
             )
+            response = (
+                discovery_client.registry_management_non_workspace.get_registry_management_non_workspace(
+                    self._operation_scope.registry_name
+                )
+            )
 
-            registry = registry_operations.get(self._operation_scope.registry_name)
-
-            # Extract region from registry location or replication locations
-            region = None
-            if registry.location:
-                region = registry.location
-            elif registry.replication_locations and len(registry.replication_locations) > 0:
-                region = registry.replication_locations[0].location
-
-            if region:
-                # Format the endpoint using the detected region
-                # return f"https://int.experiments.azureml-test.net"
-                return f"https://{region}.api.azureml.ms"
+            if response.primary_region:
+                return f"https://{response.primary_region}.api.azureml.ms"