test(integration): refresh stale notebook + workbench tests against current layout#264

Merged: fede-kamel merged 3 commits into main from fix/stale-integration-tests on May 23, 2026

@fede-kamel
Contributor

Eight pre-existing integration failures across three areas, all caused by tests drifting past code that moved. Not LLM-behavioral — purely staleness.

(A) test_notebooks_subset.py — 5 fails

TestNotebookExecution had 5 methods pointing at notebook filenames that no longer exist (the catalogue was renumbered to a contiguous 1-70 sequence). Updated each subprocess call to target the current file and renamed the test methods so the suite reads consistently.

| Was: test method | Was: notebook file | Is: test method | Is: notebook file |
|---|---|---|---|
| test_notebook_14_runs | notebook_19_sse_streaming.py | test_notebook_13_runs | notebook_13_sse_streaming.py |
| test_notebook_36_runs | notebook_41_structured_output.py | test_notebook_35_runs | notebook_35_structured_output.py |
| test_notebook_37_runs | notebook_42_reasoning_patterns.py | test_notebook_36_runs | notebook_36_reasoning_patterns.py |
| test_notebook_43_runs | notebook_48_playbooks.py | test_notebook_46_runs | notebook_46_playbooks.py |
| test_notebook_49_runs | notebook_54_checkpoint_backends.py | test_notebook_52_runs | notebook_52_checkpoint_backends.py |
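
For readers unfamiliar with how these tests exercise a notebook, here is a minimal sketch of a subprocess runner that fails loudly when the target file has been renumbered away. The helper name `run_notebook` and its default timeout are assumptions for illustration, not the code in `TestNotebookExecution`.

```python
import subprocess
import sys
from pathlib import Path


def run_notebook(path: Path, timeout: float = 360.0) -> subprocess.CompletedProcess:
    """Run one notebook script in a child interpreter and capture its output.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    if not path.is_file():
        # A renumbered or deleted notebook is reported here by name,
        # instead of surfacing as an opaque non-zero exit from the child.
        raise FileNotFoundError(f"notebook not found: {path}")
    return subprocess.run(
        [sys.executable, str(path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A test method would then assert `run_notebook(Path("examples") / "notebook_13_sse_streaming.py").returncode == 0`, so a future renumbering fails with the missing filename in the message.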

(B) test_workbench_categories.py — 2 fails

workbench/backend/runner.py::NOTEBOOK_CATEGORIES combined router + observability into a single router-observability track. The tests still asserted the old taxonomy.

  • test_endpoint_returns_curated_categories — replaced "router" and "observability" in the required-id list with "router-observability".
  • Renamed test_observability_category_contains_new_sse_notebooks → test_router_observability_groups_router_plus_eventbus and pointed it at notebooks 58-61 (the current router + EventBus + observability notebooks) instead of 52-55 (which are now production / checkpointer tests).

(E) examples/notebook_70_oci_tools.py — 1 fail

The _env helper hard-required OCI_USE_PROFILE / OCI_USE_REGION / OCI_USE_TENANCY and OCI_GENAI_PROFILE, exiting 2 if any was missing. Hostile to users who already exported the standard OCI envelope (OCI_PROFILE, OCI_REGION, OCI_COMPARTMENT) — and the exact reason test_notebooks_all_live.py[notebook_70_oci_tools] failed in the live suite. Added a fallbacks= parameter to _env and wired every OCI_USE_* / OCI_GENAI_* read to fall back through the standard names.

Test plan

  • test_notebooks_subset.py::TestNotebookExecution — 5/5 pass (33s, real python examples/notebook_NN_*.py subprocesses)
  • test_workbench_categories.py::TestNotebookCategories — 4/4 pass against the live runner TestClient
  • test_notebooks_all_live.py[notebook_70_oci_tools] — passes with OCI_PROFILE / OCI_REGION / OCI_COMPARTMENT set
  • pre-commit run --files <staged> — pass
  • CI green

Out of scope

The other 6 failures from the full integration run were OCI read-timeouts under -n 4 parallel load (orchestrator + swarm tests hitting read timeout=60.0) and an xAI rate-limit hit — environmental, not code bugs. Those need a separate look at OCI throttling / pytest concurrency, not a test rewrite.

…urrent layout

Three groups of pre-existing integration failures, none related to live
LLM behavior — all are tests that drifted past code that moved.

(A) ``tests/integration/test_notebooks_subset.py`` —
``TestNotebookExecution`` had 5 methods pointing at notebook filenames
that no longer exist on disk (the notebook catalogue was renumbered to
a contiguous 1-70 sequence). Updated each subprocess invocation to
target the current file. Renamed the test methods to match the new
numbers (``test_notebook_36_runs`` → ``test_notebook_35_runs`` etc.) so
the suite reads consistently end-to-end. Added a header comment noting
that these must stay in sync with ``examples/``.

(B) ``tests/integration/test_workbench_categories.py`` —
``test_endpoint_returns_curated_categories`` asserted ``"router"`` was
a top-level category; the workbench combined router + observability
into a single ``"router-observability"`` track in
``workbench/backend/runner.py::NOTEBOOK_CATEGORIES``. Updated the
required-id list. Renamed the SSE-suite assertion to
``test_router_observability_groups_router_plus_eventbus`` and pointed
it at notebooks 58-61 (the actual router + EventBus + observability
notebooks today) instead of 52-55 (which are now production /
checkpointer tests).

(E) ``examples/notebook_70_oci_tools.py`` — the ``_env`` helper hard-
required ``OCI_USE_PROFILE`` / ``OCI_USE_REGION`` / ``OCI_USE_TENANCY``
and ``OCI_GENAI_PROFILE``, exiting 2 if any was missing. That's
hostile to users who already exported the standard OCI envelope
(``OCI_PROFILE`` / ``OCI_REGION`` / ``OCI_COMPARTMENT``), and it's
exactly what tripped ``test_notebooks_all_live.py`` for this notebook
in CI. Added a ``fallbacks=`` parameter to ``_env`` and wired every
``OCI_USE_*`` / ``OCI_GENAI_*`` read to fall back through the standard
names. Documented in the helper's docstring.

Local re-runs:
- 5 ``test_notebooks_subset.py::TestNotebookExecution`` tests pass
  (33s, all run real ``python examples/notebook_NN_*.py`` subprocesses).
- 4 ``test_workbench_categories.py::TestNotebookCategories`` tests
  pass against the live runner ``TestClient``.
- ``test_notebooks_all_live.py[notebook_70_oci_tools]`` passes with
  ``OCI_PROFILE`` / ``OCI_REGION`` / ``OCI_COMPARTMENT`` set (no
  ``OCI_USE_*`` overrides required).

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
oracle-contributor-agreement (bot) added the "OCA Verified" label (all contributors have signed the Oracle Contributor Agreement) on May 23, 2026
…x_tokens for reasoning models

Three more pre-existing integration failures that surfaced once the
stale-test cleanup landed. All caused by the test envelope being too
tight for reasoning-model traffic (gpt-5.5, o-series), not by real bugs.

(1) ``src/locus/models/providers/oci/client.py`` — the OCI Python SDK
default read timeout is 60s, which isn't enough for reasoning-model
summarization calls in the orchestrator + swarm flows (first response
token can take 90-180s to arrive after the model finishes hidden
chain-of-thought). Added ``connect_timeout`` (default 10s) and
``read_timeout`` (default 300s) to ``OCIClientConfig`` and wired both
through to ``GenerativeAiInferenceClient`` for all four auth modes
(api_key, security_token, instance_principal, resource_principal).
The old 60s default surfaces in failures as
``urllib3.ReadTimeoutError: ... read timeout=60.0``.
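
A minimal sketch of how the two timeout fields could be carried on a config object and collapsed into one ``(connect, read)`` tuple. The class name ``OCIClientConfigSketch`` and the helper ``client_timeout_kwargs`` are stand-ins invented for illustration; only the field names and defaults come from this commit message, and the real ``OCIClientConfig`` has more fields.

```python
from dataclasses import dataclass


@dataclass
class OCIClientConfigSketch:
    """Stand-in for the real config; only the two timeout fields are shown."""

    connect_timeout: float = 10.0   # seconds to establish the connection
    read_timeout: float = 300.0     # seconds to wait for response bytes


def client_timeout_kwargs(cfg: OCIClientConfigSketch) -> dict:
    """Build the keyword argument forwarded to the inference client.

    Hypothetical helper: returns the (connect, read) tuple under a
    ``timeout`` key so every auth path can splat the same dict.
    """
    return {"timeout": (cfg.connect_timeout, cfg.read_timeout)}
```

Keeping the tuple construction in one place is one way to make sure all four auth modes pick up the same defaults.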

(2) ``tests/integration/test_notebooks_all_live.py`` — the
``_NOTEBOOK_TIMEOUT_OVERRIDES`` map keyed off
``notebook_40_emergent_routing.py``; the notebook had been renumbered
to ``notebook_34_emergent_routing.py`` so the override no longer
matched, leaving the test on the default 360s budget while the
underlying notebook actually needs ~7-9 min. Renamed the key.
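
To illustrate why a stale key fails silently, here is a sketch of a filename-keyed override lookup plus a guard that flags keys with no matching notebook. The function names and the 900s override value are hypothetical; only the 360s default and the two filenames come from this commit message.

```python
DEFAULT_TIMEOUT_S = 360  # default per-notebook budget named in this commit


def timeout_for(notebook: str, overrides: dict[str, int]) -> int:
    """Return the override for `notebook`, or the default when no key matches."""
    return overrides.get(notebook, DEFAULT_TIMEOUT_S)


def stale_override_keys(overrides: dict[str, int], existing: list[str]) -> list[str]:
    """List override keys that do not correspond to any existing notebook file."""
    return sorted(set(overrides) - set(existing))


# Hypothetical override value; the old key no longer matches the renumbered file.
overrides = {"notebook_40_emergent_routing.py": 900}
budget = timeout_for("notebook_34_emergent_routing.py", overrides)  # falls back to 360
stale = stale_override_keys(overrides, ["notebook_34_emergent_routing.py"])
```

A check like ``assert not stale`` at collection time would turn the next renumbering into an immediate, named failure rather than a timeout several minutes into the run.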

(3) ``tests/integration/conftest.py`` — the OCI / OpenAI test
fixtures built models with ``max_tokens=512``. Reasoning models burn
200-2000+ output tokens on hidden chain-of-thought before producing
any visible text; at 512 they return empty content with
``finish_reason='length'``, which surfaces in orchestrator + swarm
tests as ``summary=''`` and ``findings={}`` even though
``success=True``. Bumped to 8192 with a comment explaining the
ceiling-vs-target tradeoff (short-answer tests still finish fast
because the model stops naturally when done).
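
The budget arithmetic can be sketched as below. This is a simplified model that assumes hidden reasoning tokens and visible tokens draw from the same ``max_tokens`` cap, as the paragraph above describes; both function names are hypothetical.

```python
def visible_budget(max_tokens: int, reasoning_tokens: int) -> int:
    """Tokens left for visible text after hidden chain-of-thought is spent."""
    return max(0, max_tokens - reasoning_tokens)


def looks_truncated(content: str, finish_reason: str) -> bool:
    """True when a reply hit the token cap before producing any visible text."""
    return finish_reason == "length" and not content.strip()


# With 2000 reasoning tokens, a 512 cap leaves nothing for the answer,
# while an 8192 cap leaves 6192 tokens of headroom.
tight = visible_budget(512, 2000)
roomy = visible_budget(8192, 2000)
```

A fixture-level assertion on ``looks_truncated`` is one way a test could report the real cause instead of a downstream ``summary=''``.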

Local re-runs (BOAT-OC1, ``openai.gpt-5.5``, us-chicago-1):

- ``test_summary_instead_of_bare_stop`` — passes (was OCI timeout)
- ``test_notebook_runs_clean[notebook_34_emergent_routing]`` — passes
  (was 360s subprocess timeout)
- ``test_swarm_executes_tasks`` — passes (was empty findings)
- ``test_orchestrator_single_specialist`` — passes (was empty summary)
- ``test_orchestrator_multiple_specialists`` — passes (was empty summary)

5/5 of the previously-environmental failures now pass deterministically.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
``test_instance_principal_client_creation`` was pinned to the exact
keyword args passed to ``GenerativeAiInferenceClient``. The previous
commit added ``timeout=(connect, read)`` to all four client-creation
paths, so the strict ``assert_called_once_with(...)`` started missing
the new kwarg. Updated the assertion to include the default tuple
``(10.0, 300.0)`` from ``OCIClientConfig``.
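
The failure mode can be reproduced with ``unittest.mock`` alone. In this sketch the ``config`` and ``signer`` arguments are illustrative placeholders, not the exact arguments used by the real test.

```python
from unittest.mock import MagicMock

# Stand-in for the patched client class.
client_cls = MagicMock(name="GenerativeAiInferenceClient")

# A client-creation path after the timeout kwarg was added.
client_cls(config={}, signer="signer", timeout=(10.0, 300.0))

# The pre-change strict assertion no longer matches the recorded call.
try:
    client_cls.assert_called_once_with(config={}, signer="signer")
    stale_assertion_passed = True
except AssertionError:
    stale_assertion_passed = False

# Including the default tuple makes the strict check pass again.
client_cls.assert_called_once_with(
    config={}, signer="signer", timeout=(10.0, 300.0)
)
```

``assert_called_once_with`` compares the full argument list, so any new keyword argument in the call requires a matching update in the test.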

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@fede-kamel fede-kamel merged commit dc90b33 into main May 23, 2026
10 checks passed
@fede-kamel fede-kamel deleted the fix/stale-integration-tests branch May 23, 2026 20:10
fede-kamel added a commit that referenced this pull request May 23, 2026
…gonomics + httpx 1.0 cap + trademark naming (#265)

Four PRs of fixes since b20. No new public APIs; tightens the SDK
on durability (StateGraph interrupt resume), ergonomics (OCIModel
aliases, AgentConfig.name, Tool.func), deps (httpx<1.0 cap), and
brings the docs site in line with the approved product name.

- #261 — StateGraph.interrupt_before now writes through the
  checkpointer at the pause boundary; resume advances past the gate
  instead of re-pausing. Inline interrupt() save crash with
  state=None fixed in the same pass. OCIModel gains region= and
  profile= ergonomic aliases. AgentConfig.name + Tool.func surface
  the names users naturally reach for.
- #262 — Capped httpx<1.0; pre-release 1.0.dev3 drops the top-level
  Auth re-export and broke OCIRequestSigner + BearerAuth at import.
- #257 — Applied the Oracle Trademark Legal-approved full name
  (wordmark above hero H1, persistent header) and short name (body
  prose / OG meta / tab title) across docs, README, and contributor
  markdown.
- #264 — OCI client read timeout default 60s→300s for reasoning
  models; integration fixture max_tokens 512→8192 so reasoning
  models have budget for both hidden chain-of-thought and visible
  output; eight stale integration tests refreshed against current
  catalogue / workbench layout.

See CHANGELOG.md for the full breakdown.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>