test(integration): refresh stale notebook + workbench tests against current layout by fede-kamel · Pull Request #264 · oracle-samples/locus

fede-kamel · 2026-05-23T17:38:54Z

Eight pre-existing integration failures across three areas, all caused by tests that drifted out of sync with code that had moved. None are LLM-behavioral; this is purely staleness.

(A) `test_notebooks_subset.py` — 5 fails

`TestNotebookExecution` had 5 methods pointing at notebook filenames that no longer exist (the catalogue was renumbered to a contiguous 1-70 sequence). Updated each subprocess call to target the current file and renamed the test methods so the suite reads consistently.

| Was | Is |
|---|---|
| `test_notebook_14_runs` → `notebook_19_sse_streaming.py` | `test_notebook_13_runs` → `notebook_13_sse_streaming.py` |
| `test_notebook_36_runs` → `notebook_41_structured_output.py` | `test_notebook_35_runs` → `notebook_35_structured_output.py` |
| `test_notebook_37_runs` → `notebook_42_reasoning_patterns.py` | `test_notebook_36_runs` → `notebook_36_reasoning_patterns.py` |
| `test_notebook_43_runs` → `notebook_48_playbooks.py` | `test_notebook_46_runs` → `notebook_46_playbooks.py` |
| `test_notebook_49_runs` → `notebook_54_checkpoint_backends.py` | `test_notebook_52_runs` → `notebook_52_checkpoint_backends.py` |
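Because each test launches a notebook as a subprocess, a renamed file only shows up as a non-zero exit from `python`. A minimal sketch of a helper that resolves the path first, so catalogue drift fails with an explicit error instead; `run_notebook` and the `examples/` location are assumptions for illustration, not the suite's actual code.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location; the real suite may resolve examples/ differently.
EXAMPLES_DIR = Path("examples")


def run_notebook(
    filename: str,
    examples_dir: Path = EXAMPLES_DIR,
    timeout: float = 600.0,
) -> subprocess.CompletedProcess:
    """Run one example notebook script in a fresh interpreter.

    Checking the path first turns a renumbered or deleted notebook into an
    explicit FileNotFoundError instead of an opaque non-zero exit code.
    """
    script = examples_dir / filename
    if not script.is_file():
        raise FileNotFoundError(f"notebook moved or renamed: {script}")
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # let the caller assert on returncode and inspect stderr
    )
```

A test method would then reduce to `assert run_notebook("notebook_13_sse_streaming.py").returncode == 0`, and a stale filename fails before any interpreter is spawned.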

(B) `test_workbench_categories.py` — 2 fails

`workbench/backend/runner.py::NOTEBOOK_CATEGORIES` combined router and observability into a single `router-observability` track, but the tests still asserted the old taxonomy.

- `test_endpoint_returns_curated_categories`: replaced `"router"` and `"observability"` in the required-id list with `"router-observability"`.
- Renamed `test_observability_category_contains_new_sse_notebooks` → `test_router_observability_groups_router_plus_eventbus` and pointed it at notebooks 58-61 (the current router + EventBus + observability notebooks) instead of 52-55 (which are now production / checkpointer tests).
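The two updated assertions can be sketched against a hypothetical payload shape: a list of `{"id": ..., "notebooks": [...]}` dicts. The real `NOTEBOOK_CATEGORIES` structure and the endpoint response may look different, and only `router-observability` is shown from the required-id list.

```python
# Hypothetical payload shape: [{"id": str, "notebooks": list[int]}, ...].
# The real NOTEBOOK_CATEGORIES in workbench/backend/runner.py may differ.
REQUIRED_IDS = {"router-observability"}  # plus the other curated ids


def check_curated_categories(categories: list[dict]) -> None:
    """Every required category id must be present at the top level."""
    ids = {c["id"] for c in categories}
    missing = REQUIRED_IDS - ids
    assert not missing, f"missing categories: {sorted(missing)}"


def check_router_observability(categories: list[dict]) -> None:
    """The merged track must contain notebooks 58-61 (inclusive)."""
    track = next(c for c in categories if c["id"] == "router-observability")
    assert set(range(58, 62)) <= set(track["notebooks"])
```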

(E) `examples/notebook_70_oci_tools.py` — 1 fail

The `_env` helper hard-required `OCI_USE_PROFILE` / `OCI_USE_REGION` / `OCI_USE_TENANCY` and `OCI_GENAI_PROFILE`, exiting with status 2 if any was missing. That is hostile to users who have already exported the standard OCI envelope (`OCI_PROFILE`, `OCI_REGION`, `OCI_COMPARTMENT`), and it is exactly why `test_notebooks_all_live.py[notebook_70_oci_tools]` failed in the live suite. Added a `fallbacks=` parameter to `_env` and wired every `OCI_USE_*` / `OCI_GENAI_*` read to fall back through the standard names.

Test plan

- `test_notebooks_subset.py::TestNotebookExecution`: 5/5 pass (33s, real `python examples/notebook_NN_*.py` subprocesses)
- `test_workbench_categories.py::TestNotebookCategories`: 4/4 pass against the live runner `TestClient`
- `test_notebooks_all_live.py[notebook_70_oci_tools]`: passes with `OCI_PROFILE` / `OCI_REGION` / `OCI_COMPARTMENT` set
- `pre-commit run --files <staged>`: pass
- CI green

Out of scope

The other 6 failures from the full integration run were OCI read timeouts under `-n 4` parallel load (orchestrator and swarm tests hitting `read timeout=60.0`) plus one xAI rate-limit hit. These are environmental, not code bugs, and need a separate look at OCI throttling and pytest concurrency rather than a test rewrite.

…urrent layout

Three groups of pre-existing integration failures, none related to live LLM behavior — all are tests that drifted past code that moved.

(A) ``tests/integration/test_notebooks_subset.py`` — ``TestNotebookExecution`` had 5 methods pointing at notebook filenames that no longer exist on disk (the notebook catalogue was renumbered to a contiguous 1-70 sequence). Updated each subprocess invocation to target the current file. Renamed the test methods to match the new numbers (``test_notebook_36_runs`` → ``test_notebook_35_runs`` etc.) so the suite reads consistently end-to-end. Added a header comment noting that these must stay in sync with ``examples/``.

(B) ``tests/integration/test_workbench_categories.py`` — ``test_endpoint_returns_curated_categories`` asserted ``"router"`` was a top-level category; the workbench combined router + observability into a single ``"router-observability"`` track in ``workbench/backend/runner.py::NOTEBOOK_CATEGORIES``. Updated the required-id list. Renamed the SSE-suite assertion to ``test_router_observability_groups_router_plus_eventbus`` and pointed it at notebooks 58-61 (the actual router + EventBus + observability notebooks today) instead of 52-55 (which are now production / checkpointer tests).

(E) ``examples/notebook_70_oci_tools.py`` — the ``_env`` helper hard-required ``OCI_USE_PROFILE`` / ``OCI_USE_REGION`` / ``OCI_USE_TENANCY`` and ``OCI_GENAI_PROFILE``, exiting 2 if any was missing. That's hostile to users who already exported the standard OCI envelope (``OCI_PROFILE`` / ``OCI_REGION`` / ``OCI_COMPARTMENT``), and it's exactly what tripped ``test_notebooks_all_live.py`` for this notebook in CI. Added a ``fallbacks=`` parameter to ``_env`` and wired every ``OCI_USE_*`` / ``OCI_GENAI_*`` read to fall back through the standard names. Documented in the helper's docstring.
Local re-runs:

- 5 ``test_notebooks_subset.py::TestNotebookExecution`` tests pass (33s, all run real ``python examples/notebook_NN_*.py`` subprocesses).
- 4 ``test_workbench_categories.py::TestNotebookCategories`` tests pass against the live runner ``TestClient``.
- ``test_notebooks_all_live.py[notebook_70_oci_tools]`` passes with ``OCI_PROFILE`` / ``OCI_REGION`` / ``OCI_COMPARTMENT`` set (no ``OCI_USE_*`` overrides required).

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

…x_tokens for reasoning models

Three more pre-existing integration failures that surfaced once the stale-test cleanup landed. All were caused by the test envelope being too tight for reasoning-model traffic (gpt-5.5, o-series), not by real bugs.

(1) ``src/locus/models/providers/oci/client.py`` — the OCI Python SDK default read timeout is 60s, which isn't enough for reasoning-model summarization calls in the orchestrator + swarm flows (the first response token can take 90-180s to arrive, since it only comes after the model finishes its hidden chain-of-thought). Added ``connect_timeout`` (default 10s) and ``read_timeout`` (default 300s) to ``OCIClientConfig`` and wired both through to ``GenerativeAiInferenceClient`` for all four auth modes (api_key, security_token, instance_principal, resource_principal). This surfaces in failures as ``urllib3.ReadTimeoutError: ... read timeout=60.0``.

(2) ``tests/integration/test_notebooks_all_live.py`` — the ``_NOTEBOOK_TIMEOUT_OVERRIDES`` map was keyed on ``notebook_40_emergent_routing.py``; the notebook had been renumbered to ``notebook_34_emergent_routing.py``, so the override no longer matched, leaving the test on the default 360s budget while the underlying notebook actually needs ~7-9 min. Renamed the key.

(3) ``tests/integration/conftest.py`` — the OCI / OpenAI test fixtures built models with ``max_tokens=512``. Reasoning models burn 200-2000+ output tokens on hidden chain-of-thought before producing any visible text; at 512 they return empty content with ``finish_reason='length'``, which surfaces in orchestrator + swarm tests as ``summary=''`` and ``findings={}`` even though ``success=True``. Bumped to 8192 with a comment explaining the ceiling-vs-target tradeoff (short-answer tests still finish fast because the model stops naturally when done).
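The failure mode in (3) can be illustrated with a toy model in which hidden reasoning and visible text draw from the same `max_tokens` budget. This is a simplification for intuition, not any provider's exact token accounting, and the 600 / 150 token figures below are made up.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    text_tokens: int    # visible tokens returned to the caller
    finish_reason: str  # "stop" if the answer completed, "length" if the budget ran out


def simulate_completion(max_tokens: int, reasoning_tokens: int, answer_tokens: int) -> Completion:
    """Toy model: hidden reasoning is spent first, visible text gets the remainder.

    The model stops as soon as the answer is complete, which is why raising the
    ceiling does not slow down short-answer tests.
    """
    remaining = max(max_tokens - reasoning_tokens, 0)
    if remaining >= answer_tokens:
        return Completion(text_tokens=answer_tokens, finish_reason="stop")
    return Completion(text_tokens=remaining, finish_reason="length")


# 600 hidden reasoning tokens exhaust a 512 ceiling before any visible text.
print(simulate_completion(512, 600, 150))   # Completion(text_tokens=0, finish_reason='length')
print(simulate_completion(8192, 600, 150))  # Completion(text_tokens=150, finish_reason='stop')
```

In this model the empty `summary=''` at 512 and the normal result at 8192 come from the same prompt; only the shared budget changed.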
Local re-runs (BOAT-OC1, ``openai.gpt-5.5``, us-chicago-1):

- ``test_summary_instead_of_bare_stop`` — passes (was OCI timeout)
- ``test_notebook_runs_clean[notebook_34_emergent_routing]`` — passes (was 360s subprocess timeout)
- ``test_swarm_executes_tasks`` — passes (was empty findings)
- ``test_orchestrator_single_specialist`` — passes (was empty summary)
- ``test_orchestrator_multiple_specialists`` — passes (was empty summary)

5/5 of the previously-environmental failures now pass deterministically.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

``test_instance_principal_client_creation`` was pinned to the exact keyword args passed to ``GenerativeAiInferenceClient``. The previous commit added ``timeout=(connect, read)`` to all four client-creation paths, so the strict ``assert_called_once_with(...)`` started missing the new kwarg. Updated the assertion to include the default tuple ``(10.0, 300.0)`` from ``OCIClientConfig``.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
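A sketch of the pattern these two commits describe: a config object carrying both timeouts, passed to the client constructor as a `(connect, read)` tuple, plus a mock that pins the exact kwargs. `ClientConfig` and `build_client` are hypothetical stand-ins for `OCIClientConfig` and the four client-creation paths, and the `signer=` call shape is assumed. The OCI Python SDK clients do accept a `timeout` float or `(connect, read)` tuple, defaulting to 10s connect / 60s read when it is omitted.

```python
from dataclasses import dataclass
from unittest import mock


@dataclass
class ClientConfig:
    """Hypothetical stand-in for OCIClientConfig; only the timeout fields are modelled."""
    connect_timeout: float = 10.0
    read_timeout: float = 300.0


def build_client(client_cls, sdk_config: dict, signer, cfg: ClientConfig):
    # Pass timeouts explicitly; without this kwarg the SDK falls back to 10s / 60s.
    return client_cls(
        sdk_config,
        signer=signer,
        timeout=(cfg.connect_timeout, cfg.read_timeout),
    )


# Test pattern: substitute a mock for the client class and pin every argument.
fake_client_cls = mock.MagicMock(name="GenerativeAiInferenceClient")
signer = object()
build_client(fake_client_cls, {}, signer, ClientConfig())
fake_client_cls.assert_called_once_with({}, signer=signer, timeout=(10.0, 300.0))
```

Because `assert_called_once_with` compares every argument, any kwarg added to the constructor call has to be added to the expected call as well, which is exactly the breakage this commit fixed.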

…gonomics + httpx 1.0 cap + trademark naming (#265)

Four PRs of fixes since b20. No new public APIs; tightens the SDK on durability (StateGraph interrupt resume), ergonomics (OCIModel aliases, AgentConfig.name, Tool.func), deps (httpx<1.0 cap), and brings the docs site in line with the approved product name.

- #261 — StateGraph.interrupt_before now writes through the checkpointer at the pause boundary; resume advances past the gate instead of re-pausing. Inline interrupt() save crash with state=None fixed in the same pass. OCIModel gains region= and profile= ergonomic aliases. AgentConfig.name + Tool.func surface the names users naturally reach for.
- #262 — Capped httpx<1.0; pre-release 1.0.dev3 drops the top-level Auth re-export and broke OCIRequestSigner + BearerAuth at import.
- #257 — Applied the Oracle Trademark Legal-approved full name (wordmark above hero H1, persistent header) and short name (body prose / OG meta / tab title) across docs, README, and contributor markdown.
- #264 — OCI client read timeout default 60s→300s for reasoning models; integration fixture max_tokens 512→8192 so reasoning models have budget for both hidden chain-of-thought and visible output; eight stale integration tests refreshed against current catalogue / workbench layout.

See CHANGELOG.md for the full breakdown.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

oracle-contributor-agreement bot added the OCA Verified label ("All contributors have signed the Oracle Contributor Agreement.") May 23, 2026

fede-kamel added 2 commits May 23, 2026 15:49

fede-kamel merged commit dc90b33 into main May 23, 2026
10 checks passed

fede-kamel deleted the fix/stale-integration-tests branch May 23, 2026 20:10

fede-kamel mentioned this pull request May 23, 2026

chore(release): v0.2.0b21 — interrupt resume durability + OCIModel ergonomics + httpx 1.0 cap + trademark naming #265 (merged)

fede-kamel merged 3 commits into main from fix/stale-integration-tests
