Reset Copilot utility model cache and force one retry when CAPI omits chat fallback #316720
Draft
benvillalobos wants to merge 4 commits into
Conversation
Contributor
Pull request overview
This PR fixes a caching edge case in the Copilot extension’s ModelMetadataFetcher where the “Copilot utility” chat model can get stuck unresolved for up to 10 minutes if a /models response omits is_chat_fallback: true.
Changes:
- Clears `_copilotUtilityModel` when refreshing the model family map so stale/invalid state can't persist across fetches.
- Adds a bounded one-time forced refresh in `getCopilotUtilityModel()` when models were returned but no chat fallback was flagged, to avoid being trapped by the 10-minute cache window.
| File | Description |
|---|---|
| extensions/copilot/src/platform/endpoint/node/modelMetadataFetcher.ts | Resets cached utility model on refresh and adds a forced single retry to recover from transient missing is_chat_fallback responses. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 2
Comment on lines 279 to 283

```typescript
this._familyMap.clear();
this._copilotUtilityModel = undefined;

const data: IModelAPIResponse[] = (await response.json()).data;
this._requestLogger.logModelListCall(requestId, requestMetadata, data);
```
Comment on lines 174 to +179

```typescript
await this._taskSingler.getOrCreate(ModelMetadataFetcher.ALL_MODEL_KEY, this._fetchModels.bind(this));
if (!this._copilotUtilityModel && this._familyMap.size > 0) {
	// Server returned models but did not flag a chat fallback; force one refresh
	// before throwing so we are not stuck on a stale 10-minute cache window.
	await this._taskSingler.getOrCreate(ModelMetadataFetcher.ALL_MODEL_KEY, () => this._fetchModels(true));
}
```
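For context, `TaskSingler.getOrCreate` deduplicates concurrent requests for the same key into one in-flight promise. A minimal sketch of that pattern (illustrative only, not the extension's actual implementation):

```typescript
// Minimal sketch of a task singler: concurrent getOrCreate calls for the
// same key share one in-flight promise instead of each starting a new task.
class TaskSinglerSketch<T> {
  private inflight = new Map<string, Promise<T>>();

  getOrCreate(key: string, factory: () => Promise<T>): Promise<T> {
    let task = this.inflight.get(key);
    if (!task) {
      // Drop the entry once the task settles so a later call starts fresh.
      task = factory().finally(() => this.inflight.delete(key));
      this.inflight.set(key, task);
    }
    return task;
  }
}
```

This is why both the initial fetch and the forced retry can safely flow through the same key: overlapping callers coalesce onto one request.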
Comment on lines +178 to 185

```typescript
if (!this._copilotUtilityModel && this._familyMap.size > 0 && !this._hasForcedUtilityModelRetry) {
	// One-shot retry per auth epoch: avoids storming CAPI on persistent server misconfig.
	this._hasForcedUtilityModelRetry = true;
	this._logService.warn('Utility model unset after initial fetch; forcing one refresh');
	await this._taskSingler.getOrCreate(ModelMetadataFetcher.ALL_MODEL_KEY, () => this._fetchModels(true));
}
const resolvedModel = this._copilotUtilityModel;
if (!resolvedModel || !isChatModelInformation(resolvedModel)) {
```
Contributor
Copilot's findings
Comments suppressed due to low confidence (4)
extensions/copilot/src/platform/endpoint/node/modelMetadataFetcher.ts:183
`_hasForcedUtilityModelRetry` is set to `true` before the forced refresh is awaited. If multiple callers reach this block concurrently, only the first will trigger and await the refresh; the others will skip it and can throw even though the refresh is already in-flight (whereas `TaskSingler` would otherwise coalesce them). Consider setting the flag only after the forced refresh completes, or tracking/awaiting a shared in-flight retry promise so all callers wait for the same refresh.
```typescript
if (!this._copilotUtilityModel && this._familyMap.size > 0 && !this._hasForcedUtilityModelRetry) {
	// One-shot retry per auth epoch: avoids storming CAPI on persistent server misconfig.
	this._hasForcedUtilityModelRetry = true;
	this._logService.warn('Utility model unset after initial fetch; forcing one refresh');
	await this._taskSingler.getOrCreate(ModelMetadataFetcher.ALL_MODEL_KEY, () => this._fetchModels(true));
}
```
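The "shared in-flight retry promise" suggestion can be sketched as follows (a sketch under assumed names, not the extension's code): mark the one-shot as consumed only after the refresh settles, and let concurrent callers await the same promise.

```typescript
// Sketch: coalesce a one-shot forced retry across concurrent callers.
// All names here are illustrative, not taken from the extension source.
class RetryCoordinator {
  private inflight: Promise<void> | undefined;
  private done = false;

  // Runs `refresh` at most once; concurrent callers await the same promise.
  async runOnce(refresh: () => Promise<void>): Promise<void> {
    if (this.done) { return; }
    if (!this.inflight) {
      this.inflight = refresh().finally(() => {
        // Mark the one-shot as consumed only after the refresh settles,
        // so callers arriving mid-flight still awaited the same refresh.
        this.done = true;
        this.inflight = undefined;
      });
    }
    return this.inflight;
  }
}
```

With this shape, no caller can observe "flag set but refresh not yet complete" and throw prematurely.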
extensions/copilot/src/platform/endpoint/node/modelMetadataFetcher.ts:116
- The retry gating is described as "one-shot per auth epoch", but `_hasForcedUtilityModelRetry` is only reset on logout (`copilotToken === undefined`) and on fetch errors. If the token changes or refreshes (an auth change event) while remaining defined, or after a later successful refresh cycle, the flag may stay `true` indefinitely and prevent future one-shot retries, potentially reintroducing the original failure mode. Consider resetting the flag on any auth change (or when a fetch succeeds) to match the intended "epoch" semantics.
This issue also appears in the following locations of the same file:
- line 176
- line 178
- line 304
```typescript
this._register(this._authService.onDidAuthenticationChange(() => {
	// Auth changed so next fetch should be forced to get a new list.
	// Only clear the family map if the copilot token is undefined, as this means the user
	// has logged out and we should clear the models; otherwise we want to keep the old
	// models around until we get a new list.
	if (this._authService.copilotToken === undefined) {
		this._familyMap.clear();
		this._copilotUtilityModel = undefined;
		this._hasForcedUtilityModelRetry = false;
	}
	this._completionsFamilyMap.clear();
	this._lastFetchTime = 0;
}));
```
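The suggested "epoch" semantics can be sketched with a counter that advances on every auth change, rather than a boolean that only resets on logout (names are illustrative, not from the extension):

```typescript
// Sketch of true per-epoch gating: bump an epoch counter on every auth
// change and record which epoch the one-shot retry ran in.
class RetryEpochGate {
  private epoch = 0;
  private retriedInEpoch = -1;

  // Call from the onDidAuthenticationChange handler, logged in or not.
  onAuthChange(): void {
    this.epoch++;
  }

  // Returns true exactly once per epoch; false until the next auth change.
  tryConsumeRetry(): boolean {
    if (this.retriedInEpoch === this.epoch) {
      return false;
    }
    this.retriedInEpoch = this.epoch;
    return true;
  }
}
```

Any token refresh then re-arms the one-shot retry, instead of it staying consumed for the lifetime of the instance.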
extensions/copilot/src/platform/endpoint/node/modelMetadataFetcher.ts:182
- PR description says the forced refresh is “bounded to one call per
getCopilotUtilityModel()invocation”, but the added_hasForcedUtilityModelRetrymakes it at most once for the lifetime of the instance (until logout/error), changing the documented behavior. Either adjust the implementation to match the description (one retry per invocation) or update the PR description/comments to reflect the actual global gating.
```typescript
public async getCopilotUtilityModel(): Promise<IChatModelInformation> {
	await this._taskSingler.getOrCreate(ModelMetadataFetcher.ALL_MODEL_KEY, this._fetchModels.bind(this));
	if (!this._copilotUtilityModel && this._familyMap.size > 0 && !this._hasForcedUtilityModelRetry) {
		// One-shot retry per auth epoch: avoids storming CAPI on persistent server misconfig.
		this._hasForcedUtilityModelRetry = true;
		this._logService.warn('Utility model unset after initial fetch; forcing one refresh');
		await this._taskSingler.getOrCreate(ModelMetadataFetcher.ALL_MODEL_KEY, () => this._fetchModels(true));
```
extensions/copilot/src/platform/endpoint/node/modelMetadataFetcher.ts:309
`_hasForcedUtilityModelRetry` is reset to `false` in `_fetchModels`. If the server is persistently misconfigured (no fallback model) or the refresh is failing, this can cause repeated forced refresh attempts on subsequent `getCopilotUtilityModel()` calls (potentially doubling CAPI traffic). Consider leaving the flag set after a failed forced retry (or adding a backoff/timestamp) so the "one-shot" protection still holds during outages.
```typescript
		this._hasForcedUtilityModelRetry = false;
	}
	this._onDidModelRefresh.fire();
} catch (e) {
	this._logService.error(e, `Failed to fetch models (${requestId})`);
	this._lastFetchError = e;
```
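The suggested "backoff/timestamp" alternative could look like this sketch (the class, names, and the reuse of the 10-minute window are illustrative assumptions, not the PR's code):

```typescript
// Sketch: replace the boolean flag with a timestamp-based backoff so a
// failing forced retry cannot repeat more than once per window.
const FORCED_RETRY_BACKOFF_MS = 10 * 60 * 1000; // assumed window, matches cache TTL

class ForcedRetryBackoff {
  // Negative infinity so the very first attempt is always allowed.
  private lastAttempt = Number.NEGATIVE_INFINITY;

  // `now` is injectable for testing; defaults to wall-clock time.
  shouldAttempt(now: number = Date.now()): boolean {
    if (now - this.lastAttempt < FORCED_RETRY_BACKOFF_MS) {
      return false;
    }
    this.lastAttempt = now;
    return true;
  }
}
```

Unlike a boolean that error paths reset, the timestamp keeps the rate limit intact through failures and outages.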
- Files reviewed: 1/1 changed files
- Comments generated: 1
```diff
@@ -272,6 +281,7 @@ export class ModelMetadataFetcher extends Disposable implements IModelMetadataFe
 	}

 	this._familyMap.clear();
```
Fix a caching bug in `ModelMetadataFetcher` that can lock the Copilot utility chat model into a permanently-unresolvable state for up to 10 minutes after a single CAPI `/models` response that doesn't flag any model with `is_chat_fallback: true`.

Problem

`_copilotUtilityModel` is only ever assigned inside `_fetchModels` when a model has `is_chat_fallback === true`; it is never cleared. Combined with `_shouldRefreshModels()`, which short-circuits while `_familyMap.size > 0` and `_lastFetchTime` is within the 10-minute window, a single bad response (or a 429 short-circuit while the cache is fresh) leaves `getCopilotUtilityModel()` throwing "Unable to resolve Copilot utility chat model (server did not mark a chat fallback model)" until the cache expires. This has been a contributing source of sanity-test flakes in CI.
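The stuck-cache behavior described above can be reduced to a minimal sketch (names mirror the description, but the class and model strings are illustrative, not the real fetcher):

```typescript
// Minimal sketch of the described cache gate, to show how one bad response
// pins the utility model as undefined for the full TTL window.
const CACHE_TTL_MS = 10 * 60 * 1000;

class ModelCacheSketch {
  familyMap = new Map<string, string>();
  copilotUtilityModel: string | undefined;
  lastFetchTime = 0;

  private shouldRefreshModels(now: number): boolean {
    // Short-circuits while models are cached and the TTL has not elapsed.
    return this.familyMap.size === 0 || now - this.lastFetchTime > CACHE_TTL_MS;
  }

  // Simulates _fetchModels: a response with no chat-fallback flag populates
  // the family map (so the gate closes) but never assigns the utility model.
  fetch(now: number, hasChatFallback: boolean): void {
    if (!this.shouldRefreshModels(now)) { return; }
    this.familyMap.set("gpt-4o", "gpt-4o");
    this.lastFetchTime = now;
    if (hasChatFallback) { this.copilotUtilityModel = "fallback-model"; }
  }
}
```

After one bad response, even a subsequent good response is ignored until the TTL expires, which is exactly the 10-minute lockout the PR targets.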
Fix
Two complementary changes:
1. Reset `_copilotUtilityModel = undefined` adjacent to `_familyMap.clear()` in `_fetchModels`.

This makes `_copilotUtilityModel` a derived value of the current response rather than a sticky field that accumulates across fetches. The pattern now matches what `_fetchModels` already does for `_familyMap`: clear, then repopulate from the current response.

Without the reset: if an earlier response set `_copilotUtilityModel = ModelA` and the current response no longer flags any model as the fallback, we'd keep serving the stale `ModelA` for the utility endpoint indefinitely.

The reset on its own does not fix the original flake (a first-ever bad response still leaves `_copilotUtilityModel` undefined), but it is correctness hygiene that the retry below relies on.
_copilotUtilityModelundefined), but it is correctness hygiene that the retry below relies on.2. Force one refresh in
getCopilotUtilityModel()when models are cached but the utility model is missing.The
trueargument is theforceparameter on_fetchModels, which bypasses the_shouldRefreshModels()cache check. Withoutforce, the retry would no-op immediately because the bad response just populated_familyMapand set_lastFetchTime = Date.now(). Withforce, we guarantee a fresh CAPI request, rebuild_familyMap, and (thanks to change #1) repopulate_copilotUtilityModelif the server flags one this time.The forced refresh is bounded to one call per
getCopilotUtilityModelinvocation and still flows throughTaskSingler, so coalescing semantics are preserved.
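Both changes together can be sketched end-to-end (all names, the model strings, and the simplified gate are illustrative, not the extension's actual code):

```typescript
// Sketch of the two-part fix: `force` skips the TTL gate so the one-shot
// retry issues a real request even though the bad response just refreshed
// `lastFetchTime`, and the clear-then-repopulate reset keeps the utility
// model derived from the current response.
const TTL_MS = 10 * 60 * 1000;

class FetcherSketch {
  familyMap = new Map<string, string>();
  utilityModel: string | undefined;
  lastFetchTime = 0;
  requestCount = 0;

  fetchModels(now: number, serverFlagsFallback: boolean, force = false): void {
    const fresh = this.familyMap.size > 0 && now - this.lastFetchTime <= TTL_MS;
    if (fresh && !force) { return; }        // cache gate (bypassed by force)
    this.requestCount++;
    this.familyMap.clear();
    this.utilityModel = undefined;          // change 1: derived, not sticky
    this.familyMap.set("gpt-4o", "gpt-4o");
    this.lastFetchTime = now;
    if (serverFlagsFallback) { this.utilityModel = "fallback-model"; }
  }

  getUtilityModel(now: number, serverFlagsFallback: boolean): string {
    this.fetchModels(now, serverFlagsFallback);
    if (!this.utilityModel && this.familyMap.size > 0) {
      // change 2: one bounded forced refresh before giving up
      this.fetchModels(now, serverFlagsFallback, true);
    }
    if (!this.utilityModel) { throw new Error("no chat fallback flagged"); }
    return this.utilityModel;
  }
}
```

In this sketch the forced second request recovers within a single `getUtilityModel` call instead of waiting out the TTL window.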