[Bugfix][Router] service_discovery: add has_ever_seen_model to K8sServiceNameServiceDiscovery by saikpr · Pull Request #945 · vllm-project/production-stack

saikpr · 2026-05-08T02:14:54Z

Summary

Adds has_ever_seen_model hook to K8sServiceNameServiceDiscovery, tracked from each Service's spec.selector["model"] label at watch time — independent of pod readiness.
Lets request.py distinguish a configured-but-scaled-to-zero model (503, retriable) from a bogus model name (404, terminal).
Mirrors the hook already implemented on StaticServiceDiscovery (fix(service_discovery): correctly return 503 on missing endpoints #889) and K8sPodIPServiceDiscovery.

Motivation

When KEDA scales a Deployment to zero on idle, the Service remains but no pod ever answers a /v1/models probe, so the router never registers the model. Any client request for that model hits request.py's 404 branch, and the OpenAI SDK treats 404 as terminal (no retry) — masking KEDA's scale-from-zero. The has_ever_seen_model hook (already consumed by request.py) converts the response to 503, and clients retry normally.

Test plan

Unit test: service ADDED with selector.model=X → has_ever_seen_model("X") == True even with no Ready pod.
Unit test: service DELETED → has_ever_seen_model returns False only after the last Service referencing that label is removed.
E2E: scale backing Deployment to zero with KEDA; request to /v1/chat/completions for that model returns 503 (not 404).

Related: #889

…viceNameServiceDiscovery Populate a "known models" set from each K8s Service's spec.selector["model"] label at watch time, regardless of whether a backing pod is Ready. This lets the request path distinguish a configured-but-scaled-to-zero model (-> 503) from a bogus model name (-> 404), matching the behavior already implemented for StaticServiceDiscovery and K8sPodIPServiceDiscovery. Rationale: when KEDA scales a Deployment to zero on idle, the Service remains but no pod ever answers a /v1/models probe, so the router never registers the model. Any client request for that model hits request.py's 404 branch, and the OpenAI SDK treats 404 as terminal (no retry), masking KEDA's scale-from-zero. The has_ever_seen_model hook, already consumed by request.py, converts the response to 503 and clients retry normally. Related: vllm-project#889 (added the same hook to StaticServiceDiscovery). Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>

gemini-code-assist

Code Review

This pull request introduces a mechanism to track models even when their backing deployments are scaled to zero by monitoring Kubernetes Service selectors. It adds state management for known models and a helper method to check if a model has been previously observed. Feedback focuses on performance and correctness: a synchronous Kubernetes API call in the watcher path should be removed to avoid blocking the thread, and the tracking logic needs to be updated to correctly handle the removal of stale model entries when a service is updated to point to a different model.

gemini-code-assist · 2026-05-08T02:16:41Z

+        label = model_label
+        if label is None:
+            try:
+                svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace)


This synchronous K8s API call is performed on every MODIFIED event when the service is not ready (since model_label is passed as None from _watch_engines). This can block the watcher thread and significantly degrade performance. It is recommended to extract the model label from the service object directly in _watch_engines and pass it to _on_engine_update regardless of readiness, which would eliminate the need for this fallback API call.

gemini-code-assist · 2026-05-08T02:16:41Z

+    def _track_known_model(
+        self,
+        engine_name: str,
+        event: str,
+        model_label: Optional[str],
+    ) -> None:
+        """Record every model observed via Service spec.selector["model"].
+
+        Ensures has_ever_seen_model() returns True even when the backing
+        Deployment is scaled to zero (no Ready pod -> no /v1/models probe
+        -> never registered by _on_engine_update without this hook).
+        """
+        if event == "DELETED":
+            with self.known_models_lock:
+                label = self._service_to_model.pop(engine_name, None)
+                if label is None:
+                    return
+                if not any(v == label for v in self._service_to_model.values()):
+                    self.known_models.discard(label)
+            return
+        label = model_label
+        if label is None:
+            try:
+                svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace)
+                label = svc.spec.selector.get("model") if svc.spec.selector else None
+            except Exception:
+                label = None
+        if not label:
+            return
+        with self.known_models_lock:
+            self._service_to_model[engine_name] = label
+            self.known_models.add(label)


The current implementation of _track_known_model does not clean up old model labels when a service is updated to point to a new model. This leads to stale entries remaining in known_models even if no other service references them. The suggested change handles the update case by checking if the old label is still in use before discarding it.

def _track_known_model( self, engine_name: str, event: str, model_label: Optional[str], ) -> None: """Record every model observed via Service spec.selector["model"]. Ensures has_ever_seen_model() returns True even when the backing Deployment is scaled to zero (no Ready pod -> no /v1/models probe -> never registered by _on_engine_update without this hook). """ if event == "DELETED": with self.known_models_lock: label = self._service_to_model.pop(engine_name, None) if label is not None and not any( v == label for v in self._service_to_model.values() ): self.known_models.discard(label) return label = model_label if label is None: try: svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace) label = svc.spec.selector.get("model") if svc.spec.selector else None except Exception: label = None if not label: return with self.known_models_lock: old_label = self._service_to_model.get(engine_name) if old_label == label: return self._service_to_model[engine_name] = label self.known_models.add(label) # Clean up old label if it's no longer used by any service if old_label is not None and not any( v == old_label for v in self._service_to_model.values() ): self.known_models.discard(old_label)

…s_ever_seen_model Cover the _track_known_model refcounting logic: - ADDED event records the model_label. - ADDED with label=None falls back to reading spec.selector["model"] via k8s_api. - Services without a "model" selector are ignored. - k8s_api read failures are swallowed (no crash). - DELETED drops the label only after the last service referencing it is gone. - MODIFIED updates the service -> label mapping. Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>

…overy.has_ever_seen_model - Hoist _get_model_label(service) in _watch_engines so it reads the in-hand Service object regardless of pod readiness; drop the fallback read_namespaced_service call that blocked the watcher thread on every not-ready event (per gemini HIGH). - Wire DELETED events into _track_known_model so known_models / refcount map are reclaimed (previously DELETED early-returned in _watch_engines, leaving labels to accumulate for process lifetime). - Drop stale label on relabel: when a Service's spec.selector["model"] changes, remove the old label from known_models if no other Service still references it (per gemini MEDIUM). - Add get_known_models() for parity with K8sPodIPServiceDiscovery. - Expand tests: relabel-drops-stale, relabel-keeps-old-when-other-refs-it, deleted-unknown-svc, modified-same-label-noop, get_known_models snapshot. Consolidate setup into a pytest fixture. Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>

saikpr · 2026-05-08T12:39:30Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a mechanism to track models discovered via Kubernetes Service labels, ensuring they remain 'known' even when deployments are scaled to zero. It adds state management and helper methods to the K8sServiceNameServiceDiscovery class, along with corresponding unit tests. Feedback identifies a logic error in the _track_known_model method where removing a model label during a MODIFIED event fails to clean up the old label, potentially leading to stale data. A refactoring is suggested to handle event types more robustly and improve efficiency.

gemini-code-assist · 2026-05-08T12:41:38Z

+            old_label = self._service_to_model.get(engine_name)
+            if event == "DELETED":
+                if old_label is None:
+                    return
+                del self._service_to_model[engine_name]
+                if old_label not in self._service_to_model.values():
+                    self.known_models.discard(old_label)
+                return
+            if not model_label or old_label == model_label:
+                return
+            self._service_to_model[engine_name] = model_label
+            self.known_models.add(model_label)
+            if (
+                old_label is not None
+                and old_label not in self._service_to_model.values()
+            ):
+                self.known_models.discard(old_label)


The current implementation of _track_known_model contains a logic error when a service's model label is removed (i.e., the label is deleted from the Service spec) during a MODIFIED event. The early return at line 1304 prevents the cleanup of the stale old_label from self._service_to_model and self.known_models, leading to incorrect tracking and a potential memory leak of stale mappings.

Additionally, the logic can be simplified and unified for both DELETED and MODIFIED events. Note that the check old_label not in self._service_to_model.values() is an $O(N)$ operation performed while holding a lock. While $N$ is typically small for Kubernetes services in a single namespace, using a reference counting dictionary would be more efficient for larger deployments.

old_label = self._service_to_model.get(engine_name) new_label = model_label if event != "DELETED" else None if old_label == new_label: return if new_label: self._service_to_model[engine_name] = new_label self.known_models.add(new_label) else: self._service_to_model.pop(engine_name, None) if old_label is not None and old_label not in self._service_to_model.values(): self.known_models.discard(old_label)

Fixes a stale-label leak when a Service's MODIFIED event removes the model selector (spec.selector["model"] dropped). Previously the `not model_label` early-return left old_label in _service_to_model and known_models, so has_ever_seen_model continued to report True for a label that the cluster no longer references. Unify the DELETED and MODIFIED-with-no-label paths: compute new_label (None for DELETED or missing selector), short-circuit on unchanged label, then perform the ref-count update in a single code path. Add tests for both "MODIFIED removes selector" branches. Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>

saikpr · 2026-05-08T13:58:34Z

/gemini review

gemini-code-assist

Code Review

This pull request implements model tracking in K8sServiceNameServiceDiscovery to support scale-from-zero by monitoring Kubernetes service labels. It includes new unit tests for event handling and reference counting. Feedback highlights the need to track actual model names from engines for consistency and suggests optimizing the reference counting logic to O(1) to avoid performance issues during service updates.

gemini-code-assist · 2026-05-08T14:00:59Z

        model_names: List[str],
        model_label: Optional[str],
    ) -> None:
+        self._track_known_model(engine_name, event, model_label)


This implementation tracks the model_label in known_models to support scale-from-zero, but it misses the actual model_names returned by the engine (such as adapters or models with IDs different from the label). To maintain consistency with K8sPodIPServiceDiscovery and ensure that all discovered models are recognized by has_ever_seen_model, you should also update known_models with the actual model_names when the service is ready.

Suggested change

self._track_known_model(engine_name, event, model_label)

self._track_known_model(engine_name, event, model_label)

if event in ("ADDED", "MODIFIED") and is_service_ready and model_names:

# Track all models we've ever seen

with self.known_models_lock:

self.known_models.update(model_names)

gemini-code-assist · 2026-05-08T14:01:00Z

+                self._service_to_model.pop(engine_name, None)
+            if (
+                old_label is not None
+                and old_label not in self._service_to_model.values()


The check old_label not in self._service_to_model.values() has $O(N)$ time complexity, where $N$ is the number of services. While $N$ is typically small for Kubernetes services in this context, this check is performed inside a lock on every service update or deletion event. Consider maintaining a reference count for model labels (e.g., using a dictionary to track how many services reference each label) to make this check $O(1)$.

gemini-code-assist Bot reviewed May 8, 2026

View reviewed changes

Sainyam Kapoor added 2 commits May 7, 2026 19:17

gemini-code-assist Bot reviewed May 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bugfix][Router] service_discovery: add has_ever_seen_model to K8sServiceNameServiceDiscovery#945

[Bugfix][Router] service_discovery: add has_ever_seen_model to K8sServiceNameServiceDiscovery#945
saikpr wants to merge 4 commits into
vllm-project:mainfrom
saikpr:feat/k8s-service-name-has-ever-seen-model

saikpr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Uh oh!

gemini-code-assist Bot May 8, 2026

Uh oh!

saikpr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Uh oh!

saikpr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Uh oh!

gemini-code-assist Bot May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

saikpr commented May 8, 2026

Summary

Motivation

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

saikpr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

saikpr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant