Wire _resolve_embeddings into _run_hybrid_search (sync + async), plumb generator through dispatcher, and add diagnostics span
Parent: 46729
Depends on: 46732
Goal
Make the resolver actually run on real queries, fail-fast when embeddings are needed but no generator is configured, and surface embedding-generation latency as a first-class OpenTelemetry span. (Diagnostics scope merged from #46734 .)
Scope
Pipeline plumbing
�zure/cosmos/_execution_context/execution_dispatcher.py :: _ProxyQueryExecutionContext:
Accept �mbedding_generator in its options dict.
Pass it through to the hybrid-search aggregator.
hybrid_search_aggregator.py (sync) and �io/hybrid_search_aggregator.py (async):
In _run_hybrid_search / async sibling, after the plan is obtained and before per-partition fan-out, call _resolve_embeddings (from [Cosmos] [Embedding V0] _resolve_embeddings helper on hybrid-search aggregator (sync + async) #46732 ) and replace the working SqlQuerySpec with the augmented one.
Fast-fail : if the plan's �mbeddingParameterMap is non-empty but �mbedding_generator is None, raise a ValueError (or CosmosHttpResponseError 400) with:
"Query requires embedding generation but no embedding_generator was provided to query_items."
Thread the generator from container.py / �io/_container.py → �xecution_dispatcher → aggregator (already started in [Cosmos] [Embedding V0] Public surface: EmbeddingGenerator protocols + embedding_generator on query_items #46730 ; confirm the option key name is �mbedding_generator).
The resolver is called exactly once per top-level query attempt , not on every continuation page. Assert this with a counting mock in unit tests.
Diagnostics (merged from #46734 )
Inside _resolve_embeddings (and async sibling), wrap the generator call in an OpenTelemetry span:
`python
from azure.core.settings import settings
import time, logging
_logger = logging.getLogger("azure.cosmos")
tracer = settings.tracing_implementation()
if tracer is not None:
with tracer.span(name="cosmos.embedding_generation") as span:
span.add_attribute("cosmos.embedding.count", len(texts))
span.add_attribute("cosmos.embedding.generator_type", type(generator).name )
start = time.perf_counter()
vectors = generator.generate_embeddings(list(texts))
span.add_attribute("cosmos.embedding.latency_ms",
int((time.perf_counter() - start) * 1000))
else:
start = time.perf_counter()
vectors = generator.generate_embeddings(list(texts))
_logger.info(
"embedding_generation count=%d generator_type=%s latency_ms=%d",
len(texts), type(generator).name ,
int((time.perf_counter() - start) * 1000),
)
`
Mirror in async aggregator (await the call inside the span).
Privacy:
Never log raw input strings.
Never log returned vectors.
Parameter key names (e.g. @documentdb-hybridsearchquery-embedding-0) may be logged — they are not customer data.
If a tracer is configured, record a span event when the generator raises (so failures appear in traces, not just logs).
Acceptance criteria
End-to-end happy path works with a mocked plan + mocked generator (sync + async).
Fast-fail exception raised with the right message when generator is missing.
Generator called exactly once per top-level query attempt (assert with a counting mock).
With an OTel tracer, a cosmos.embedding_generation span is produced with count, latency_ms, and generator_type attributes.
Without a tracer, an INFO-level log line is produced.
Raw text / vectors do not appear in span attributes or log output (assert in tests).
Files likely touched
sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/execution_dispatcher.py
sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/execution_dispatcher.py
sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/hybrid_search_aggregator.py
sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/hybrid_search_aggregator.py
sdk/cosmos/azure-cosmos/azure/cosmos/container.py (option threading)
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py (option threading)
Dependencies
Wire
_resolve_embeddingsinto_run_hybrid_search(sync + async), plumb generator through dispatcher, and add diagnostics spanParent: 46729
Depends on: 46732
Goal
Make the resolver actually run on real queries, fail-fast when embeddings are needed but no generator is configured, and surface embedding-generation latency as a first-class OpenTelemetry span. (Diagnostics scope merged from #46734.)
Scope
Pipeline plumbing
�zure/cosmos/_execution_context/execution_dispatcher.py :: _ProxyQueryExecutionContext:
hybrid_search_aggregator.py (sync) and �io/hybrid_search_aggregator.py (async):
Thread the generator from container.py / �io/_container.py → �xecution_dispatcher → aggregator (already started in [Cosmos] [Embedding V0] Public surface: EmbeddingGenerator protocols + embedding_generator on query_items #46730; confirm the option key name is �mbedding_generator).
The resolver is called exactly once per top-level query attempt, not on every continuation page. Assert this with a counting mock in unit tests.
Diagnostics (merged from #46734)
Inside _resolve_embeddings (and async sibling), wrap the generator call in an OpenTelemetry span:
`python
from azure.core.settings import settings
import time, logging
_logger = logging.getLogger("azure.cosmos")
tracer = settings.tracing_implementation()
if tracer is not None:
with tracer.span(name="cosmos.embedding_generation") as span:
span.add_attribute("cosmos.embedding.count", len(texts))
span.add_attribute("cosmos.embedding.generator_type", type(generator).name)
start = time.perf_counter()
vectors = generator.generate_embeddings(list(texts))
span.add_attribute("cosmos.embedding.latency_ms",
int((time.perf_counter() - start) * 1000))
else:
start = time.perf_counter()
vectors = generator.generate_embeddings(list(texts))
_logger.info(
"embedding_generation count=%d generator_type=%s latency_ms=%d",
len(texts), type(generator).name,
int((time.perf_counter() - start) * 1000),
)
`
Mirror in async aggregator (await the call inside the span).
Privacy:
If a tracer is configured, record a span event when the generator raises (so failures appear in traces, not just logs).
Acceptance criteria
Files likely touched
Dependencies
_resolve_embeddingshelper ([Cosmos] [Embedding V0] _resolve_embeddings helper on hybrid-search aggregator (sync + async) #46732).