Problem
azd provision, azd deploy, and azd up do not validate auth state before starting work. When auth is expired or missing, users discover the failure after the command has already begun execution — sometimes minutes into a deployment. This is especially painful for azd up (31% success rate) where a compound provision+deploy operation fails late.
Telemetry Evidence (rolling 28 days)
The most common failure path is: user runs azd up → internally calls azd auth token → token fails → operation aborts.
| Result Code |
Failures/28d |
Affected Users |
auth.login_required |
543,972 |
1,353 |
auth.not_logged_in |
65,643 |
1,233 |
azidentity_AuthenticationFailedError |
7,258 |
65 |
610K+ auth failures per 28 days that could be caught before any work begins.
AI agent environments are hit hardest — GitHub Copilot CLI sees a 61% auth token failure rate, and Claude Code hits 41%. These agents call azd auth token speculatively before establishing sessions.
Proposal: Auth Pre-Flight Check
Add an auth validation step at the beginning of commands that require Azure credentials (provision, deploy, up, down, monitor). The check would:
- Verify a current user exists — catch
auth.ErrNoCurrentUser early
- Attempt a token acquisition — validate the credential is still valid (not just cached)
- Check token expiry — if the token expires within a configurable window (e.g., 5 minutes), warn the user
- Provide actionable guidance — instead of a generic failure, tell the user exactly what to do
Where to implement
The existing infrastructure supports this well:
ErrorHandlerPipeline (cli/azd/pkg/errorhandler/pipeline.go) already matches errors by type and pattern, and wraps them with ErrorWithSuggestion. Auth errors could be added to error_suggestions.yaml with clear fix instructions.
ErrorMiddleware (cli/azd/cmd/middleware/error.go) already classifies auth errors as UserContextError in classifyError(). A pre-flight middleware could run before the action, not just after.
- Middleware pattern — azd uses a middleware chain. A new
AuthPreFlightMiddleware could slot in before the action runs:
// Conceptual — new middleware that runs before the action
type AuthPreFlightMiddleware struct {
authManager auth.Manager
console input.Console
}
func (m *AuthPreFlightMiddleware) Run(ctx context.Context, next NextFn) (*actions.ActionResult, error) {
// Quick check: is anyone logged in?
currentUser, err := m.authManager.GetLoggedInUser(ctx)
if errors.Is(err, auth.ErrNoCurrentUser) {
return nil, &internal.ErrorWithSuggestion{
Err: err,
Suggestion: "Run 'azd auth login' to sign in before running this command.",
}
}
// Quick check: can we acquire a token?
_, err = m.authManager.GetToken(ctx)
if err != nil {
if _, ok := errors.AsType[*auth.ReLoginRequiredError](err); ok {
return nil, &internal.ErrorWithSuggestion{
Err: err,
Suggestion: "Your session has expired. Run 'azd auth login' to re-authenticate.",
}
}
}
// Auth is good — proceed with the real command
return next(ctx)
}
For AI agents specifically
AI agents (Claude Code, GitHub Copilot CLI, OpenCode) would benefit from:
- Non-interactive token check —
azd auth token --check (exit 0 if valid, exit 1 if not, no output). Agents can call this before running expensive commands.
- Machine-readable auth status —
azd auth status --output json returning {"logged_in": true, "expires_at": "2026-03-21T22:00:00Z", "tenant": "...", "user": "..."}. Agents can parse this to decide whether to prompt for login.
- Structured error codes in stderr — when auth fails, emit a JSON line to stderr with the error code so agents can programmatically handle it without parsing error messages.
Expected Impact
- Pre-flight catch: Prevents ~610K failed
auth token calls per 28 days from cascading into provision/deploy failures
- Better UX: Users see "Please run
azd auth login" immediately, not after waiting for Bicep compilation or Docker builds
- Agent-friendly:
--check flag lets agents validate auth state cheaply before running expensive operations
- Telemetry clarity: Pre-flight failures would get their own result code (e.g.,
preflight.auth.not_logged_in), separating preventable errors from genuine runtime auth failures
Related
Problem
azd provision,azd deploy, andazd updo not validate auth state before starting work. When auth is expired or missing, users discover the failure after the command has already begun execution — sometimes minutes into a deployment. This is especially painful forazd up(31% success rate) where a compound provision+deploy operation fails late.Telemetry Evidence (rolling 28 days)
The most common failure path is: user runs
azd up→ internally callsazd auth token→ token fails → operation aborts.auth.login_requiredauth.not_logged_inazidentity_AuthenticationFailedError610K+ auth failures per 28 days that could be caught before any work begins.
AI agent environments are hit hardest — GitHub Copilot CLI sees a 61%
auth tokenfailure rate, and Claude Code hits 41%. These agents callazd auth tokenspeculatively before establishing sessions.Proposal: Auth Pre-Flight Check
Add an auth validation step at the beginning of commands that require Azure credentials (
provision,deploy,up,down,monitor). The check would:auth.ErrNoCurrentUserearlyWhere to implement
The existing infrastructure supports this well:
ErrorHandlerPipeline(cli/azd/pkg/errorhandler/pipeline.go) already matches errors by type and pattern, and wraps them withErrorWithSuggestion. Auth errors could be added toerror_suggestions.yamlwith clear fix instructions.ErrorMiddleware(cli/azd/cmd/middleware/error.go) already classifies auth errors asUserContextErrorinclassifyError(). A pre-flight middleware could run before the action, not just after.AuthPreFlightMiddlewarecould slot in before the action runs:For AI agents specifically
AI agents (Claude Code, GitHub Copilot CLI, OpenCode) would benefit from:
azd auth token --check(exit 0 if valid, exit 1 if not, no output). Agents can call this before running expensive commands.azd auth status --output jsonreturning{"logged_in": true, "expires_at": "2026-03-21T22:00:00Z", "tenant": "...", "user": "..."}. Agents can parse this to decide whether to prompt for login.Expected Impact
auth tokencalls per 28 days from cascading into provision/deploy failuresazd auth login" immediately, not after waiting for Bicep compilation or Docker builds--checkflag lets agents validate auth state cheaply before running expensive operationspreflight.auth.not_logged_in), separating preventable errors from genuine runtime auth failuresRelated