feat(acp): warn when ddtrace double-instruments the route handler #409

Closed
max-parke-scale wants to merge 1 commit into next from mparke/issue3-ddtrace-double-instrument-guard

@max-parke-scale

Contributor

Issue #3 (0.13.0 incident): ddtrace doubled the ACP route template in Datadog APM

On centipede-student-assistant-agentex-agent, ddtrace's resource_name/http.route doubled the router prefix (POST /api/api, GET /healthz/healthz) alongside the correct single-prefix names. The served URL was unaffected (http.url, http.path_group, http.inferred_route stayed single), so traffic worked, but monitors and dashboards filtered on resource_name:post_/api silently missed the fraction of requests reported under the doubled name.

Root cause

The ACP server registers single routes correctly — the doubling is ddtrace's. ddtrace's traced_handler wraps the inherited APIRoute.handle; when two independent ddtrace auto-instrumentation passes run in one process, their de-dupe guards (_datadog_patch flag, is_wrapted) don't cross-recognize each other and the handler is wrapped twice — so each request accumulates the route path twice. A single ddtrace never doubles (verified across 3.13.0→4.10.1 and the real ddtrace-run/ddtrace.auto launch paths); two entry points do.

In the incident the two passes were an in-agent ddtrace.patch_all() and the platform's Datadog single-step instrumentation (admission.datadoghq.com/enabled: "true"). The agent-side and monitor-side remediations are tracked separately; this PR is the SDK-side guard + guidance.
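The mechanism can be illustrated with a toy model. This is a minimal sketch, not ddtrace's code: `make_patcher`, the guard attribute names, and the `scope` dict are hypothetical stand-ins, and `functools.wraps` stands in for wrapt. It only shows how two patchers whose guards don't recognize each other wrap the handler twice, so the route path is accumulated twice.

```python
import functools


def make_patcher(guard_attr):
    """Build a patch function that de-dupes only on its own marker (hypothetical)."""

    def patch(handler):
        # Each patcher only checks its own marker, so a handler already
        # wrapped by a different patcher looks unpatched here.
        if getattr(handler, guard_attr, False):
            return handler

        @functools.wraps(handler)
        def traced_handler(scope):
            # Stand-in for a tracer accumulating the route path per wrap.
            scope["resource"] = scope.get("resource", "") + scope["route_path"]
            return handler(scope)

        setattr(traced_handler, guard_attr, True)
        return traced_handler

    return patch


def handle(scope):
    """Toy route handler: report what the 'tracer' accumulated."""
    return scope.get("resource", "")


def resource_for(handler, route_path="/api"):
    return handler({"route_path": route_path})


patch_a = make_patcher("_patched_by_copy_a")
patch_b = make_patcher("_patched_by_copy_b")

single = patch_a(patch_a(handle))  # same patcher twice: second call is a no-op
double = patch_b(patch_a(handle))  # two patchers: handler is wrapped twice

print(resource_for(single))
print(resource_for(double))
```

In this model the same patcher applied twice still yields a single wrap, which matches the observation that a single ddtrace never doubles; only the second, independent patcher produces the doubled path.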

What this changes

  • base_acp_server.py: _warn_if_double_instrumented() runs at lifespan startup, counts ddtrace traced_handler wraps on the route handler, and logs an actionable WARNING if >1. Best-effort — reads ddtrace/wrapt internals inside try/except, returns 0 (silent) if ddtrace is inactive or internals shift. No behavior change beyond the log line.
  • docs/observability.md: the single-entry-point rule (don't combine patch_all()/ddtrace-run with platform SSI), the symptom, the local-dev tradeoff, the floor-only-pin fragility, and the @http.path_group:/api monitor filter that stays correct even when the resource name doubles.

Verification

  • Reproduced the doubling and confirmed the guard fires only when double-wrapped: count 0 (no ddtrace) → 1 (clean single instrument, no warning) → 2 (double-wrap, warning logged).
  • ruff check passes; the diff is additive (+51/−0 in the server file), no reformatting churn.
  • The full mock-server test suite was not run locally (the change is additive and log-only); CI will exercise it.

Files: src/agentex/lib/sdk/fastacp/base/base_acp_server.py (+51), docs/observability.md (new, +51).

🧑‍💻🤖 — posted via Claude Code

A process with more than one ddtrace auto-instrumentation entry point (e.g. an
in-agent ddtrace.patch_all()/ddtrace-run alongside Datadog single-step
instrumentation) wraps the FastAPI/Starlette route handler twice. Each request
then accumulates the route path twice, so ddtrace emits a doubled
resource_name/http.route (POST /api/api, GET /healthz/healthz) while the served
path is unchanged — silently blinding monitors filtered on the single-prefix
resource_name.

Add a best-effort startup check that counts ddtrace traced_handler wraps on the
route handler and logs an actionable warning when it exceeds one, plus
docs/observability.md documenting the single-entry-point rule and the
path_group-based monitor filter that stays correct even when the resource name
doubles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@max-parke-scale max-parke-scale force-pushed the mparke/issue3-ddtrace-double-instrument-guard branch from 8328a52 to cb71af4 Compare June 17, 2026 18:34
@max-parke-scale

Contributor Author

Closing — this doesn't belong in the general agentex SDK.

The incident (issue #3: ddtrace doubling the ACP route template → POST /api/api, blinded monitors) is already fixed by scaleapi/agentex-agents#1702. Root cause: two ddtrace copies at mismatched versions in one pod — in-image 4.8.6 (via ddtrace-run + in-app patch_all()) vs the SSI-injected 4.10.1 — defeated ddtrace's de-dupe guards and double-wrapped the inherited Route.handle.
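One way to look for mismatched copies from inside a pod is to list every installed distribution of the package visible on `sys.path`. This is a rough sketch using only the standard library. `installed_copies` and `has_version_mismatch` are hypothetical helpers, and whether an injected copy is visible this way depends on how the injector alters the import path, which is an assumption here.

```python
from importlib import metadata


def installed_copies(package="ddtrace"):
    """Return sorted, de-duplicated (version, location) pairs found on sys.path."""
    copies = set()
    for dist in metadata.distributions(name=package):
        # locate_file("") resolves to the directory the distribution lives in.
        copies.add((str(dist.version), str(dist.locate_file(""))))
    return sorted(copies)


def has_version_mismatch(copies):
    """True when the discovered copies report more than one distinct version."""
    return len({version for version, _ in copies}) > 1


if __name__ == "__main__":
    found = installed_copies()
    for version, location in found:
        print(f"ddtrace {version} at {location}")
    if has_version_mismatch(found):
        print("more than one ddtrace version is importable in this process")
```

An empty result only means no copy was found through package metadata on the current path; it does not rule out a copy loaded by other means.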

Why not land the guard + docs here:

  • The guidance is SGP-platform specific (Datadog single-step instrumentation is enabled by the SGP helm charts via admission.datadoghq.com/enabled, so agents must not also self-instrument). That's a platform property and belongs in scaleapi/sgp / a deployment runbook, not a general PyPI-published SDK that non-SGP consumers also use.
  • The startup guard read ddtrace/wrapt internals (fragile across ddtrace upgrades) and only logged a warning; its marginal, generic value didn't justify carrying it in the SDK.

No code change is warranted in this repo. Knowledge capture for the SGP side can ride with #1702 / the platform owners.

🧑‍💻🤖 — posted via Claude Code

@max-parke-scale max-parke-scale deleted the mparke/issue3-ddtrace-double-instrument-guard branch June 17, 2026 19:05