[Bugfix][Router] service_discovery: add has_ever_seen_model to K8sServiceNameServiceDiscovery#945

Open
saikpr wants to merge 4 commits into vllm-project:main from saikpr:feat/k8s-service-name-has-ever-seen-model

@saikpr saikpr commented May 8, 2026

Summary

  • Adds has_ever_seen_model hook to K8sServiceNameServiceDiscovery, tracked from each Service's spec.selector["model"] label at watch time — independent of pod readiness.
  • Lets request.py distinguish a configured-but-scaled-to-zero model (503, retriable) from a bogus model name (404, terminal).
  • Mirrors the hook already implemented on StaticServiceDiscovery (#889, "fix(service_discovery): correctly return 503 on missing endpoints") and K8sPodIPServiceDiscovery.
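
The 503-versus-404 decision in the Summary can be sketched as below. This is a minimal illustration under stated assumptions, not the router's actual request.py code: `FakeDiscovery` and `status_for_unrouted_model` are hypothetical names, and only the `has_ever_seen_model` method name comes from this PR.

```python
from http import HTTPStatus


class FakeDiscovery:
    """Hypothetical stand-in for a service discovery backend."""

    def __init__(self, known_models):
        self._known_models = set(known_models)

    def has_ever_seen_model(self, model: str) -> bool:
        return model in self._known_models


def status_for_unrouted_model(discovery, model: str) -> HTTPStatus:
    """Pick a status when no ready endpoint currently serves `model`."""
    if discovery.has_ever_seen_model(model):
        # Configured but currently scaled to zero: ask the client to retry.
        return HTTPStatus.SERVICE_UNAVAILABLE
    # Never configured: retrying cannot help.
    return HTTPStatus.NOT_FOUND
```

A model that discovery has seen maps to 503, while an unknown name maps to 404.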

Motivation

When KEDA scales a Deployment to zero on idle, the Service remains but no pod ever answers a /v1/models probe, so the router never registers the model. Any client request for that model hits request.py's 404 branch, and the OpenAI SDK treats 404 as terminal (no retry) — masking KEDA's scale-from-zero. The has_ever_seen_model hook (already consumed by request.py) converts the response to 503, and clients retry normally.
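
Why the status code matters to a client can be illustrated with a toy retry loop. The sketch assumes a client that retries only on the statuses listed in `RETRIABLE`. That set, the `send` callable, and `call_with_retry` are hypothetical and do not reproduce the OpenAI SDK's exact retry policy.

```python
import time
from typing import Callable

# Assumption for illustration: only 503 is treated as transient here.
RETRIABLE = {503}


def call_with_retry(
    send: Callable[[], int], max_attempts: int = 3, base_delay: float = 0.0
) -> int:
    """Call `send` up to `max_attempts` times, retrying only transient statuses."""
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRIABLE:
            break  # terminal status (for example 404) or success
        # Exponential backoff gives the scaled-up pod time to become Ready.
        time.sleep(base_delay * (2 ** (attempt - 1)))
        status = send()
    return status
```

With this policy a 404 is returned after a single attempt, whereas a 503 leaves room for the backing Deployment to come up between attempts.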

Test plan

  • Unit test: service ADDED with selector.model=X → has_ever_seen_model("X") == True even with no Ready pod.
  • Unit test: service DELETED → has_ever_seen_model returns False only after the last Service referencing that label is removed.
  • E2E: scale backing Deployment to zero with KEDA; request to /v1/chat/completions for that model returns 503 (not 404).

Related: #889

…viceNameServiceDiscovery

Populate a "known models" set from each K8s Service's
spec.selector["model"] label at watch time, regardless of whether a
backing pod is Ready. This lets the request path distinguish a
configured-but-scaled-to-zero model (-> 503) from a bogus model name
(-> 404), matching the behavior already implemented for
StaticServiceDiscovery and K8sPodIPServiceDiscovery.

Rationale: when KEDA scales a Deployment to zero on idle, the Service
remains but no pod ever answers a /v1/models probe, so the router
never registers the model. Any client request for that model hits
request.py's 404 branch, and the OpenAI SDK treats 404 as terminal
(no retry), masking KEDA's scale-from-zero. The has_ever_seen_model
hook, already consumed by request.py, converts the response to 503 and
clients retry normally.

Related: vllm-project#889 (added the same hook to StaticServiceDiscovery).
Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to track models even when their backing deployments are scaled to zero by monitoring Kubernetes Service selectors. It adds state management for known models and a helper method to check if a model has been previously observed. Feedback focuses on performance and correctness: a synchronous Kubernetes API call in the watcher path should be removed to avoid blocking the thread, and the tracking logic needs to be updated to correctly handle the removal of stale model entries when a service is updated to point to a different model.

Comment thread src/vllm_router/service_discovery.py Outdated
        label = model_label
        if label is None:
            try:
                svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace)

high

This synchronous K8s API call is performed on every MODIFIED event when the service is not ready (since model_label is passed as None from _watch_engines). This can block the watcher thread and significantly degrade performance. It is recommended to extract the model label from the service object directly in _watch_engines and pass it to _on_engine_update regardless of readiness, which would eliminate the need for this fallback API call.
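
Reading the label from the Service object already delivered by the watch event, instead of issuing a second API call, might look like the sketch below. The helper name `get_model_label` is hypothetical, and `SimpleNamespace` objects stand in for the Kubernetes client's Service type, whose `spec.selector` is assumed to be a plain dict or None.

```python
from types import SimpleNamespace
from typing import Optional


def get_model_label(service) -> Optional[str]:
    """Return spec.selector["model"] from a Service object, or None."""
    spec = getattr(service, "spec", None)
    selector = getattr(spec, "selector", None)
    if not selector:
        return None
    return selector.get("model")


# Stand-ins for the Service object carried by a watch event.
svc = SimpleNamespace(spec=SimpleNamespace(selector={"model": "llama"}))
no_selector = SimpleNamespace(spec=SimpleNamespace(selector=None))
```

Because the helper touches only the in-memory object, it does no network I/O and so adds no API round trip to the watcher thread.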

Comment thread src/vllm_router/service_discovery.py Outdated
Comment on lines +1280 to +1311
    def _track_known_model(
        self,
        engine_name: str,
        event: str,
        model_label: Optional[str],
    ) -> None:
        """Record every model observed via Service spec.selector["model"].

        Ensures has_ever_seen_model() returns True even when the backing
        Deployment is scaled to zero (no Ready pod -> no /v1/models probe
        -> never registered by _on_engine_update without this hook).
        """
        if event == "DELETED":
            with self.known_models_lock:
                label = self._service_to_model.pop(engine_name, None)
                if label is None:
                    return
                if not any(v == label for v in self._service_to_model.values()):
                    self.known_models.discard(label)
            return
        label = model_label
        if label is None:
            try:
                svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace)
                label = svc.spec.selector.get("model") if svc.spec.selector else None
            except Exception:
                label = None
        if not label:
            return
        with self.known_models_lock:
            self._service_to_model[engine_name] = label
            self.known_models.add(label)

medium

The current implementation of _track_known_model does not clean up old model labels when a service is updated to point to a new model. This leads to stale entries remaining in known_models even if no other service references them. The suggested change handles the update case by checking if the old label is still in use before discarding it.

    def _track_known_model(
        self,
        engine_name: str,
        event: str,
        model_label: Optional[str],
    ) -> None:
        """Record every model observed via Service spec.selector["model"].

        Ensures has_ever_seen_model() returns True even when the backing
        Deployment is scaled to zero (no Ready pod -> no /v1/models probe
        -> never registered by _on_engine_update without this hook).
        """
        if event == "DELETED":
            with self.known_models_lock:
                label = self._service_to_model.pop(engine_name, None)
                if label is not None and not any(
                    v == label for v in self._service_to_model.values()
                ):
                    self.known_models.discard(label)
            return

        label = model_label
        if label is None:
            try:
                svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace)
                label = svc.spec.selector.get("model") if svc.spec.selector else None
            except Exception:
                label = None

        if not label:
            return

        with self.known_models_lock:
            old_label = self._service_to_model.get(engine_name)
            if old_label == label:
                return

            self._service_to_model[engine_name] = label
            self.known_models.add(label)

            # Clean up old label if it's no longer used by any service
            if old_label is not None and not any(
                v == old_label for v in self._service_to_model.values()
            ):
                self.known_models.discard(old_label)

Sainyam Kapoor added 2 commits May 7, 2026 19:17
…s_ever_seen_model

Cover the _track_known_model refcounting logic:
- ADDED event records the model_label.
- ADDED with label=None falls back to reading spec.selector["model"] via k8s_api.
- Services without a "model" selector are ignored.
- k8s_api read failures are swallowed (no crash).
- DELETED drops the label only after the last service referencing it is gone.
- MODIFIED updates the service -> label mapping.

Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>
…overy.has_ever_seen_model

- Hoist _get_model_label(service) in _watch_engines so it reads the in-hand
  Service object regardless of pod readiness; drop the fallback
  read_namespaced_service call that blocked the watcher thread on every
  not-ready event (per gemini HIGH).
- Wire DELETED events into _track_known_model so known_models / refcount
  map are reclaimed (previously DELETED early-returned in _watch_engines,
  leaving labels to accumulate for process lifetime).
- Drop stale label on relabel: when a Service's spec.selector["model"]
  changes, remove the old label from known_models if no other Service
  still references it (per gemini MEDIUM).
- Add get_known_models() for parity with K8sPodIPServiceDiscovery.
- Expand tests: relabel-drops-stale, relabel-keeps-old-when-other-refs-it,
  deleted-unknown-svc, modified-same-label-noop, get_known_models snapshot.
  Consolidate setup into a pytest fixture.

Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>

saikpr commented May 8, 2026

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to track models discovered via Kubernetes Service labels, ensuring they remain 'known' even when deployments are scaled to zero. It adds state management and helper methods to the K8sServiceNameServiceDiscovery class, along with corresponding unit tests. Feedback identifies a logic error in the _track_known_model method where removing a model label during a MODIFIED event fails to clean up the old label, potentially leading to stale data. A refactoring is suggested to handle event types more robustly and improve efficiency.

Comment on lines +1296 to +1312
            old_label = self._service_to_model.get(engine_name)
            if event == "DELETED":
                if old_label is None:
                    return
                del self._service_to_model[engine_name]
                if old_label not in self._service_to_model.values():
                    self.known_models.discard(old_label)
                return
            if not model_label or old_label == model_label:
                return
            self._service_to_model[engine_name] = model_label
            self.known_models.add(model_label)
            if (
                old_label is not None
                and old_label not in self._service_to_model.values()
            ):
                self.known_models.discard(old_label)

high

The current implementation of _track_known_model contains a logic error when a service's model label is removed (i.e., the label is deleted from the Service spec) during a MODIFIED event. The early return at line 1304 prevents the cleanup of the stale old_label from self._service_to_model and self.known_models, leading to incorrect tracking and a potential memory leak of stale mappings.

Additionally, the logic can be simplified and unified for both DELETED and MODIFIED events. Note that the check old_label not in self._service_to_model.values() is an $O(N)$ operation performed while holding a lock. While $N$ is typically small for Kubernetes services in a single namespace, using a reference counting dictionary would be more efficient for larger deployments.

            old_label = self._service_to_model.get(engine_name)
            new_label = model_label if event != "DELETED" else None

            if old_label == new_label:
                return

            if new_label:
                self._service_to_model[engine_name] = new_label
                self.known_models.add(new_label)
            else:
                self._service_to_model.pop(engine_name, None)

            if old_label is not None and old_label not in self._service_to_model.values():
                self.known_models.discard(old_label)

Fixes a stale-label leak when a Service's MODIFIED event removes the
model selector (spec.selector["model"] dropped). Previously the
`not model_label` early-return left old_label in _service_to_model and
known_models, so has_ever_seen_model continued to report True for a
label that the cluster no longer references.

Unify the DELETED and MODIFIED-with-no-label paths: compute new_label
(None for DELETED or missing selector), short-circuit on unchanged
label, then perform the ref-count update in a single code path.

Add tests for both "MODIFIED removes selector" branches.

Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>
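
The unified path this commit describes can be exercised in isolation with a toy class. The sketch reuses the attribute names from the review thread (`known_models`, `known_models_lock`, `_service_to_model`), but `KnownModelTracker` and its `track` method are hypothetical and are not the PR's actual class.

```python
import threading
from typing import Dict, Optional, Set


class KnownModelTracker:
    """Toy tracker; mirrors the attribute names used in the review thread."""

    def __init__(self) -> None:
        self.known_models: Set[str] = set()
        self.known_models_lock = threading.Lock()
        self._service_to_model: Dict[str, str] = {}

    def track(self, engine_name: str, event: str, model_label: Optional[str]) -> None:
        with self.known_models_lock:
            old_label = self._service_to_model.get(engine_name)
            # DELETED and an empty or missing selector both mean "no label now".
            new_label = (model_label or None) if event != "DELETED" else None
            if old_label == new_label:
                return
            if new_label:
                self._service_to_model[engine_name] = new_label
                self.known_models.add(new_label)
            else:
                self._service_to_model.pop(engine_name, None)
            # Drop the old label once no remaining Service references it.
            if (
                old_label is not None
                and old_label not in self._service_to_model.values()
            ):
                self.known_models.discard(old_label)

    def has_ever_seen_model(self, model: str) -> bool:
        with self.known_models_lock:
            return model in self.known_models
```

A MODIFIED event with no label on one of two Services sharing a label leaves the label known. The label is dropped only when the last referencing Service loses it.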

saikpr commented May 8, 2026

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements model tracking in K8sServiceNameServiceDiscovery to support scale-from-zero by monitoring Kubernetes service labels. It includes new unit tests for event handling and reference counting. Feedback highlights the need to track actual model names from engines for consistency and suggests optimizing the reference counting logic to O(1) to avoid performance issues during service updates.

        model_names: List[str],
        model_label: Optional[str],
    ) -> None:
        self._track_known_model(engine_name, event, model_label)

high

This implementation tracks the model_label in known_models to support scale-from-zero, but it misses the actual model_names returned by the engine (such as adapters or models with IDs different from the label). To maintain consistency with K8sPodIPServiceDiscovery and ensure that all discovered models are recognized by has_ever_seen_model, you should also update known_models with the actual model_names when the service is ready.

Suggested change
-        self._track_known_model(engine_name, event, model_label)
+        self._track_known_model(engine_name, event, model_label)
+        if event in ("ADDED", "MODIFIED") and is_service_ready and model_names:
+            # Track all models we've ever seen
+            with self.known_models_lock:
+                self.known_models.update(model_names)

                self._service_to_model.pop(engine_name, None)
            if (
                old_label is not None
                and old_label not in self._service_to_model.values()

medium

The check old_label not in self._service_to_model.values() has $O(N)$ time complexity, where $N$ is the number of services. While $N$ is typically small for Kubernetes services in this context, this check is performed inside a lock on every service update or deletion event. Consider maintaining a reference count for model labels (e.g., using a dictionary to track how many services reference each label) to make this check $O(1)$.
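
One way to make that check constant-time is to keep a per-label reference count next to the service-to-label map. The sketch below uses `collections.Counter`. `RefCountedModels`, `_label_refs`, and `update` are hypothetical names, and locking is left out for brevity, so a real implementation would wrap `update` in the existing lock.

```python
from collections import Counter
from typing import Dict, Optional


class RefCountedModels:
    """Tracks how many Services reference each model label."""

    def __init__(self) -> None:
        self._service_to_model: Dict[str, str] = {}
        self._label_refs: Counter = Counter()

    def update(self, engine_name: str, new_label: Optional[str]) -> None:
        """Set or clear (new_label=None) the label for one Service."""
        old_label = self._service_to_model.get(engine_name)
        if old_label == new_label:
            return
        if new_label:
            self._service_to_model[engine_name] = new_label
            self._label_refs[new_label] += 1
        else:
            self._service_to_model.pop(engine_name, None)
        if old_label is not None:
            # O(1) decrement replaces the scan over all dict values.
            self._label_refs[old_label] -= 1
            if self._label_refs[old_label] <= 0:
                del self._label_refs[old_label]

    def has_ever_seen_model(self, model: str) -> bool:
        return model in self._label_refs
```

The counter's key set plays the role of `known_models`, so a label disappears exactly when its count reaches zero.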
