[Bugfix][Router] service_discovery: add has_ever_seen_model to K8sServiceNameServiceDiscovery#945
Conversation
…viceNameServiceDiscovery Populate a "known models" set from each K8s Service's spec.selector["model"] label at watch time, regardless of whether a backing pod is Ready. This lets the request path distinguish a configured-but-scaled-to-zero model (-> 503) from a bogus model name (-> 404), matching the behavior already implemented for StaticServiceDiscovery and K8sPodIPServiceDiscovery. Rationale: when KEDA scales a Deployment to zero on idle, the Service remains but no pod ever answers a /v1/models probe, so the router never registers the model. Any client request for that model hits request.py's 404 branch, and the OpenAI SDK treats 404 as terminal (no retry), masking KEDA's scale-from-zero. The has_ever_seen_model hook, already consumed by request.py, converts the response to 503 and clients retry normally. Related: vllm-project#889 (added the same hook to StaticServiceDiscovery). Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to track models even when their backing deployments are scaled to zero by monitoring Kubernetes Service selectors. It adds state management for known models and a helper method to check if a model has been previously observed. Feedback focuses on performance and correctness: a synchronous Kubernetes API call in the watcher path should be removed to avoid blocking the thread, and the tracking logic needs to be updated to correctly handle the removal of stale model entries when a service is updated to point to a different model.
| label = model_label | ||
| if label is None: | ||
| try: | ||
| svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace) |
There was a problem hiding this comment.
This synchronous K8s API call is performed on every MODIFIED event when the service is not ready (since model_label is passed as None from _watch_engines). This can block the watcher thread and significantly degrade performance. It is recommended to extract the model label from the service object directly in _watch_engines and pass it to _on_engine_update regardless of readiness, which would eliminate the need for this fallback API call.
| def _track_known_model( | ||
| self, | ||
| engine_name: str, | ||
| event: str, | ||
| model_label: Optional[str], | ||
| ) -> None: | ||
| """Record every model observed via Service spec.selector["model"]. | ||
|
|
||
| Ensures has_ever_seen_model() returns True even when the backing | ||
| Deployment is scaled to zero (no Ready pod -> no /v1/models probe | ||
| -> never registered by _on_engine_update without this hook). | ||
| """ | ||
| if event == "DELETED": | ||
| with self.known_models_lock: | ||
| label = self._service_to_model.pop(engine_name, None) | ||
| if label is None: | ||
| return | ||
| if not any(v == label for v in self._service_to_model.values()): | ||
| self.known_models.discard(label) | ||
| return | ||
| label = model_label | ||
| if label is None: | ||
| try: | ||
| svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace) | ||
| label = svc.spec.selector.get("model") if svc.spec.selector else None | ||
| except Exception: | ||
| label = None | ||
| if not label: | ||
| return | ||
| with self.known_models_lock: | ||
| self._service_to_model[engine_name] = label | ||
| self.known_models.add(label) |
There was a problem hiding this comment.
The current implementation of _track_known_model does not clean up old model labels when a service is updated to point to a new model. This leads to stale entries remaining in known_models even if no other service references them. The suggested change handles the update case by checking if the old label is still in use before discarding it.
def _track_known_model(
self,
engine_name: str,
event: str,
model_label: Optional[str],
) -> None:
"""Record every model observed via Service spec.selector["model"].
Ensures has_ever_seen_model() returns True even when the backing
Deployment is scaled to zero (no Ready pod -> no /v1/models probe
-> never registered by _on_engine_update without this hook).
"""
if event == "DELETED":
with self.known_models_lock:
label = self._service_to_model.pop(engine_name, None)
if label is not None and not any(
v == label for v in self._service_to_model.values()
):
self.known_models.discard(label)
return
label = model_label
if label is None:
try:
svc = self.k8s_api.read_namespaced_service(engine_name, self.namespace)
label = svc.spec.selector.get("model") if svc.spec.selector else None
except Exception:
label = None
if not label:
return
with self.known_models_lock:
old_label = self._service_to_model.get(engine_name)
if old_label == label:
return
self._service_to_model[engine_name] = label
self.known_models.add(label)
# Clean up old label if it's no longer used by any service
if old_label is not None and not any(
v == old_label for v in self._service_to_model.values()
):
self.known_models.discard(old_label)…s_ever_seen_model Cover the _track_known_model refcounting logic: - ADDED event records the model_label. - ADDED with label=None falls back to reading spec.selector["model"] via k8s_api. - Services without a "model" selector are ignored. - k8s_api read failures are swallowed (no crash). - DELETED drops the label only after the last service referencing it is gone. - MODIFIED updates the service -> label mapping. Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>
…overy.has_ever_seen_model - Hoist _get_model_label(service) in _watch_engines so it reads the in-hand Service object regardless of pod readiness; drop the fallback read_namespaced_service call that blocked the watcher thread on every not-ready event (per gemini HIGH). - Wire DELETED events into _track_known_model so known_models / refcount map are reclaimed (previously DELETED early-returned in _watch_engines, leaving labels to accumulate for process lifetime). - Drop stale label on relabel: when a Service's spec.selector["model"] changes, remove the old label from known_models if no other Service still references it (per gemini MEDIUM). - Add get_known_models() for parity with K8sPodIPServiceDiscovery. - Expand tests: relabel-drops-stale, relabel-keeps-old-when-other-refs-it, deleted-unknown-svc, modified-same-label-noop, get_known_models snapshot. Consolidate setup into a pytest fixture. Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to track models discovered via Kubernetes Service labels, ensuring they remain 'known' even when deployments are scaled to zero. It adds state management and helper methods to the K8sServiceNameServiceDiscovery class, along with corresponding unit tests. Feedback identifies a logic error in the _track_known_model method where removing a model label during a MODIFIED event fails to clean up the old label, potentially leading to stale data. A refactoring is suggested to handle event types more robustly and improve efficiency.
| old_label = self._service_to_model.get(engine_name) | ||
| if event == "DELETED": | ||
| if old_label is None: | ||
| return | ||
| del self._service_to_model[engine_name] | ||
| if old_label not in self._service_to_model.values(): | ||
| self.known_models.discard(old_label) | ||
| return | ||
| if not model_label or old_label == model_label: | ||
| return | ||
| self._service_to_model[engine_name] = model_label | ||
| self.known_models.add(model_label) | ||
| if ( | ||
| old_label is not None | ||
| and old_label not in self._service_to_model.values() | ||
| ): | ||
| self.known_models.discard(old_label) |
There was a problem hiding this comment.
The current implementation of _track_known_model contains a logic error when a service's model label is removed (i.e., the label is deleted from the Service spec) during a MODIFIED event. The early return at line 1304 prevents the cleanup of the stale old_label from self._service_to_model and self.known_models, leading to incorrect tracking and a potential memory leak of stale mappings.
Additionally, the logic can be simplified and unified for both DELETED and MODIFIED events. Note that the check old_label not in self._service_to_model.values() is an
old_label = self._service_to_model.get(engine_name)
new_label = model_label if event != "DELETED" else None
if old_label == new_label:
return
if new_label:
self._service_to_model[engine_name] = new_label
self.known_models.add(new_label)
else:
self._service_to_model.pop(engine_name, None)
if old_label is not None and old_label not in self._service_to_model.values():
self.known_models.discard(old_label)Fixes a stale-label leak when a Service's MODIFIED event removes the model selector (spec.selector["model"] dropped). Previously the `not model_label` early-return left old_label in _service_to_model and known_models, so has_ever_seen_model continued to report True for a label that the cluster no longer references. Unify the DELETED and MODIFIED-with-no-label paths: compute new_label (None for DELETED or missing selector), short-circuit on unchanged label, then perform the ref-count update in a single code path. Add tests for both "MODIFIED removes selector" branches. Signed-off-by: Sainyam Kapoor <sainyam@amazon.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements model tracking in K8sServiceNameServiceDiscovery to support scale-from-zero by monitoring Kubernetes service labels. It includes new unit tests for event handling and reference counting. Feedback highlights the need to track actual model names from engines for consistency and suggests optimizing the reference counting logic to O(1) to avoid performance issues during service updates.
| model_names: List[str], | ||
| model_label: Optional[str], | ||
| ) -> None: | ||
| self._track_known_model(engine_name, event, model_label) |
There was a problem hiding this comment.
This implementation tracks the model_label in known_models to support scale-from-zero, but it misses the actual model_names returned by the engine (such as adapters or models with IDs different from the label). To maintain consistency with K8sPodIPServiceDiscovery and ensure that all discovered models are recognized by has_ever_seen_model, you should also update known_models with the actual model_names when the service is ready.
| self._track_known_model(engine_name, event, model_label) | |
| self._track_known_model(engine_name, event, model_label) | |
| if event in ("ADDED", "MODIFIED") and is_service_ready and model_names: | |
| # Track all models we've ever seen | |
| with self.known_models_lock: | |
| self.known_models.update(model_names) |
| self._service_to_model.pop(engine_name, None) | ||
| if ( | ||
| old_label is not None | ||
| and old_label not in self._service_to_model.values() |
There was a problem hiding this comment.
The check old_label not in self._service_to_model.values() has
Summary
has_ever_seen_modelhook toK8sServiceNameServiceDiscovery, tracked from each Service'sspec.selector["model"]label at watch time — independent of pod readiness.request.pydistinguish a configured-but-scaled-to-zero model (503, retriable) from a bogus model name (404, terminal).StaticServiceDiscovery(fix(service_discovery): correctly return 503 on missing endpoints #889) andK8sPodIPServiceDiscovery.Motivation
When KEDA scales a Deployment to zero on idle, the Service remains but no pod ever answers a
/v1/modelsprobe, so the router never registers the model. Any client request for that model hitsrequest.py's 404 branch, and the OpenAI SDK treats 404 as terminal (no retry) — masking KEDA's scale-from-zero. Thehas_ever_seen_modelhook (already consumed byrequest.py) converts the response to 503, and clients retry normally.Test plan
selector.model=X→has_ever_seen_model("X") == Trueeven with no Ready pod.has_ever_seen_modelreturns False only after the last Service referencing that label is removed./v1/chat/completionsfor that model returns 503 (not 404).Related: #889