
Add reusable Prometheus telemetry setup #93

Merged
leandropineda merged 17 commits into main from feat/metrics-prom-otel on Apr 30, 2026


@leandropineda (Member) commented Apr 29, 2026

Summary

Adds an opt-in [telemetry] extra to the SDK and a small helper layer so connectors can install a Prometheus-exporting MeterProvider in one call. Keeps the package import-clean when telemetry is not installed (no hard dep on opentelemetry-*). Existing publish counters now carry a robot_id attribute so FleetConnector deployments get per-robot breakdowns for free.

This PR makes OTEL optional and lifts the Prometheus MeterProvider setup recipe into a function so it can be reused (e.g. inorbit-connector will call it from a single place).

Changes

  • Make OpenTelemetry optional via a new [telemetry] extra. Base install no longer pulls opentelemetry-*.
  • Add get_meter() and setup_prometheus_meter_provider() helpers so connectors install a Prometheus-exporting MeterProvider in one call instead of copy-pasting the recipe.
  • Unify with_counter_metric (sync + async, optional per-call attributes); deprecate with_counter_metric_async.
  • RobotSession.publish_* counters now carry robot_id for per-robot breakdowns in fleet deployments. Stop double-counting on publish_laser.
  • Add a demo Dockerfile and an INORBIT_METRICS_PORT env var so the demo's /metrics endpoint can be scraped.
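A decorator that handles both plain and async functions can branch on inspect.iscoroutinefunction when it is applied. In this sketch the per-call attributes come from a hypothetical attributes_fn hook that receives the call's arguments; the SDK's actual parameter name and shape may differ. RecordingCounter and Session are illustrative stand-ins for an OTEL Counter and RobotSession.

```python
import asyncio
import functools
import inspect


def with_counter_metric(counter, attributes_fn=None):
    """Increment counter once per call; works for sync and async functions.

    attributes_fn (hypothetical) receives the call's arguments and returns
    the attributes to attach to this increment, e.g. {"robot_id": ...}.
    """

    def decorator(func):
        def record(args, kwargs):
            attrs = attributes_fn(*args, **kwargs) if attributes_fn else None
            counter.add(1, attributes=attrs)

        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                record(args, kwargs)  # count, then await the coroutine
                return await func(*args, **kwargs)

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            record(args, kwargs)  # count, then call through
            return func(*args, **kwargs)

        return sync_wrapper

    return decorator


class RecordingCounter:
    """Illustrative stand-in for an OTEL Counter: remembers every add()."""

    def __init__(self):
        self.calls = []

    def add(self, amount, attributes=None):
        self.calls.append((amount, attributes))


counter = RecordingCounter()


def robot_attrs(self, *args, **kwargs):
    # Derive the per-call attribute from the bound instance.
    return {"robot_id": self.robot_id}


class Session:
    """Illustrative stand-in for RobotSession."""

    def __init__(self, robot_id):
        self.robot_id = robot_id

    @with_counter_metric(counter, attributes_fn=robot_attrs)
    def publish_pose(self, x, y):
        return (x, y)

    @with_counter_metric(counter, attributes_fn=robot_attrs)
    async def publish_pose_async(self, x, y):
        return (x, y)


session = Session("jarvis_edgesdk_demo_0")
session.publish_pose(1.0, 2.0)
asyncio.run(session.publish_pose_async(3.0, 4.0))
```

This sketch increments before invoking the wrapped function, so calls that raise are still counted; counting only successful calls would mean moving the add() after the call.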

Backwards compatibility

  • Base install (pip install inorbit-edge) no longer pulls OTEL packages. Anyone relying on transitive availability must add inorbit-edge[telemetry] (or pin OTEL themselves).
  • with_counter_metric(counter) no-attribute form behaves identically to the previous version.
  • with_counter_metric_async keeps working but warns.
  • calls_publish_*_total series gain a new robot_id label. Aggregated PromQL such as sum(rate(calls_publish_pose_total[5m])) still returns the same value, because sum() without a by clause aggregates across all label values, including the new robot_id.
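Keeping with_counter_metric_async working while warning can be done with a thin deprecated alias. This is a sketch assuming the warning is a DeprecationWarning emitted when the decorator is applied; the minimal with_counter_metric below is a stand-in, not the SDK's implementation.

```python
import functools
import inspect
import warnings


def with_counter_metric(counter):
    """Minimal stand-in for the unified decorator (sync and async)."""

    def decorator(func):
        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                counter.add(1)
                return await func(*args, **kwargs)

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            counter.add(1)
            return func(*args, **kwargs)

        return sync_wrapper

    return decorator


def with_counter_metric_async(counter):
    """Deprecated alias: warns, then delegates to with_counter_metric."""
    warnings.warn(
        "with_counter_metric_async is deprecated; use with_counter_metric, "
        "which now handles async functions too",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return with_counter_metric(counter)
```

Python's default warning filters only show a DeprecationWarning attributed to __main__; test runners such as pytest, or running with -W default::DeprecationWarning, surface it elsewhere.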

Demo

$ docker run --rm -p 9464:9464 \
  -v "$PWD/inorbit_edge/tests/demo:/demo:ro" \
  -e INORBIT_URL="https://space.inorbit.ai/cloud_sdk_robot_config" \
  -e INORBIT_API_URL="https://api.inorbit.ai" \
  -e INORBIT_API_KEY="XXXXXXXXX" \
  -e INORBIT_ACCOUNT_ID="XXXXXXXXX" \
  -e INORBIT_ROBOT_ID_PREFIX=$(hostname) \
  inorbit-edge-sdk-demo
2026-04-29 13:00:22,660 [INFO] OpenTelemetry metrics (Prometheus) on http://0.0.0.0:9464/metrics
2026-04-29 13:00:22,660 [INFO] Robot id prefix: 'jarvis'
2026-04-29 13:00:22,660 [INFO] Registering callback 'command_callback' for robot 'jarvis_edgesdk_demo_0'
2026-04-29 13:00:22,660 [INFO] Registering callback 'c' for robot 'jarvis_edgesdk_demo_0'
2026-04-29 13:00:22,660 [INFO] Registering callback 'c' for robot 'jarvis_edgesdk_demo_0'
2026-04-29 13:00:22,660 [INFO] Registering callback 'handler' for robot 'jarvis_edgesdk_demo_0'
2026-04-29 13:00:22,660 [INFO] Fetching config for robot jarvis_edgesdk_demo_0
2026-04-29 13:00:24,152 [INFO] Waiting for MQTT connection state '_is_connected' ...
2026-04-29 13:00:24,785 [INFO] Connected to MQTT
2026-04-29 13:00:25,152 [INFO] MQTT connection initiated. cowardly-dog.brokers.inorbit.ai:8883 (MQTT)
2026-04-29 13:00:26,017 [INFO] jarvis_edgesdk_demo_0: Robot footprint set: {'operationStatus': 'SUCCESS'}
2026-04-29 13:00:26,018 [INFO] Registering callback 'command_callback' for robot 'jarvis_edgesdk_demo_1'
2026-04-29 13:00:26,018 [INFO] Registering callback 'c' for robot 'jarvis_edgesdk_demo_1'
2026-04-29 13:00:26,018 [INFO] Registering callback 'c' for robot 'jarvis_edgesdk_demo_1'
2026-04-29 13:00:26,018 [INFO] Registering callback 'handler' for robot 'jarvis_edgesdk_demo_1'
2026-04-29 13:00:26,018 [INFO] Fetching config for robot jarvis_edgesdk_demo_1
2026-04-29 13:00:27,298 [INFO] Waiting for MQTT connection state '_is_connected' ...
2026-04-29 13:00:27,925 [INFO] Connected to MQTT
2026-04-29 13:00:28,298 [INFO] MQTT connection initiated. cowardly-dog.brokers.inorbit.ai:8883 (MQTT)
2026-04-29 13:00:28,940 [INFO] jarvis_edgesdk_demo_1: Robot footprint set: {'operationStatus': 'SUCCESS'}
$ curl http://0.0.0.0:9464/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 319.0
python_gc_objects_collected_total{generation="1"} 51.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 113.0
python_gc_collections_total{generation="1"} 10.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="12",patchlevel="13",version="3.12.13"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 4.621438976e+09
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 7.5976704e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.77746762148e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 4.43
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024.0
# HELP calls_publish_map_total number of calls to publish maps
# TYPE calls_publish_map_total counter
calls_publish_map_total{robot_id="jarvis_edgesdk_demo_0"} 1.0
calls_publish_map_total{robot_id="jarvis_edgesdk_demo_1"} 1.0
# HELP calls_publish_pose_total number of calls to publish poses
# TYPE calls_publish_pose_total counter
calls_publish_pose_total{robot_id="jarvis_edgesdk_demo_0"} 80.0
calls_publish_pose_total{robot_id="jarvis_edgesdk_demo_1"} 80.0
# HELP calls_publish_system_stats_total number of calls to publish system stats
# TYPE calls_publish_system_stats_total counter
calls_publish_system_stats_total{robot_id="jarvis_edgesdk_demo_0"} 80.0
calls_publish_system_stats_total{robot_id="jarvis_edgesdk_demo_1"} 80.0
# HELP calls_publish_key_values_total number of calls to publish key-values
# TYPE calls_publish_key_values_total counter
calls_publish_key_values_total{robot_id="jarvis_edgesdk_demo_0"} 160.0
calls_publish_key_values_total{robot_id="jarvis_edgesdk_demo_1"} 160.0
# HELP calls_publish_odometry_total number of calls to publish odometry
# TYPE calls_publish_odometry_total counter
calls_publish_odometry_total{robot_id="jarvis_edgesdk_demo_0"} 80.0
calls_publish_odometry_total{robot_id="jarvis_edgesdk_demo_1"} 80.0
# HELP calls_publish_path_total number of calls to publish paths
# TYPE calls_publish_path_total counter
calls_publish_path_total{robot_id="jarvis_edgesdk_demo_0"} 80.0
calls_publish_path_total{robot_id="jarvis_edgesdk_demo_1"} 80.0
# HELP calls_publish_lasers_total number of calls to publish laser(s)
# TYPE calls_publish_lasers_total counter
calls_publish_lasers_total{robot_id="jarvis_edgesdk_demo_0"} 80.0
calls_publish_lasers_total{robot_id="jarvis_edgesdk_demo_1"} 80.0
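To read the per-robot breakdown and the fleet-wide total out of a scrape like the one above without a Prometheus server, a small stdlib-only parser is enough. This is an illustrative sketch that only understands single-label calls_publish_*_total{robot_id="..."} lines; the prometheus_client package ships a full text-format parser for real use. The fleet total is the same aggregation sum(calls_publish_pose_total) performs in PromQL.

```python
import re
from collections import defaultdict

# Matches samples such as: calls_publish_pose_total{robot_id="r0"} 80.0
SAMPLE_RE = re.compile(
    r'^(?P<name>calls_publish_\w+_total)\{robot_id="(?P<robot>[^"]*)"\}\s+(?P<value>\S+)$'
)


def publish_totals(exposition_text):
    """Return ({metric: {robot_id: value}}, {metric: fleet_total})."""
    per_robot = defaultdict(dict)
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:  # skips # HELP / # TYPE lines and unrelated metrics
            per_robot[match["name"]][match["robot"]] = float(match["value"])
    fleet = {name: sum(by_robot.values()) for name, by_robot in per_robot.items()}
    return dict(per_robot), fleet


sample = """\
# HELP calls_publish_pose_total number of calls to publish poses
# TYPE calls_publish_pose_total counter
calls_publish_pose_total{robot_id="jarvis_edgesdk_demo_0"} 80.0
calls_publish_pose_total{robot_id="jarvis_edgesdk_demo_1"} 80.0
"""

per_robot, fleet = publish_totals(sample)
print(fleet)  # {'calls_publish_pose_total': 160.0}
```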

@leandropineda marked this pull request as ready for review April 29, 2026 21:15
@leandropineda requested a review from b-Tomas April 29, 2026 21:16
@leandropineda changed the title from "Feat/metrics prom otel" to "Add reusable Prometheus telemetry setup" Apr 29, 2026
@b-Tomas requested a review from Copilot April 30, 2026 00:41

Copilot AI left a comment


Pull request overview

This PR makes OpenTelemetry metrics support opt-in via a new telemetry extra and adds reusable helpers for configuring a Prometheus-exporting MeterProvider, while also enriching existing publish counters with a robot_id attribute for fleet breakdowns.

Changes:

  • Add requirements-telemetry.txt and wire it as an install extra (inorbit-edge[telemetry]); remove OTEL from base requirements.
  • Introduce get_meter() / setup_prometheus_meter_provider() helpers and unify sync/async with_counter_metric (deprecating with_counter_metric_async).
  • Add robot_id metric attributes to RobotSession.publish_* counters and update the demo (README, Dockerfile, env vars) to expose /metrics.
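The demo's optional /metrics wiring can be sketched with real OpenTelemetry and prometheus_client calls (PrometheusMetricReader, MeterProvider(metric_readers=...), metrics.set_meter_provider, start_http_server). The function names, the default bind address, and the exact handling of INORBIT_METRICS_PORT are assumptions for illustration; the signature of the SDK's setup_prometheus_meter_provider() is not shown in this PR description.

```python
import logging
import os

logger = logging.getLogger(__name__)


def metrics_port_from_env(environ=None):
    """Return the port from INORBIT_METRICS_PORT, or None if unset or empty."""
    environ = os.environ if environ is None else environ
    raw = environ.get("INORBIT_METRICS_PORT", "").strip()
    return int(raw) if raw else None


def setup_prometheus_meter_provider_sketch(port, addr="0.0.0.0"):
    """Hypothetical stand-in: serve OTEL metrics on http://addr:port/metrics."""
    # Imported lazily so this module stays importable without the [telemetry] extra.
    from opentelemetry import metrics
    from opentelemetry.exporter.prometheus import PrometheusMetricReader
    from opentelemetry.sdk.metrics import MeterProvider
    from prometheus_client import start_http_server

    start_http_server(port=port, addr=addr)
    reader = PrometheusMetricReader()
    metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
    logger.info("OpenTelemetry metrics (Prometheus) on http://%s:%d/metrics", addr, port)


def maybe_enable_metrics(environ=None):
    """Enable /metrics only when INORBIT_METRICS_PORT is set; return the port."""
    port = metrics_port_from_env(environ)
    if port is not None:
        setup_prometheus_meter_provider_sketch(port)
    return port
```

Because the OTEL imports live inside the setup function, a base install only raises ImportError if metrics are actually requested via the env var.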

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.

Summary per file:

  • tox.ini: Installs telemetry deps in tox so tests can run against OTEL.
  • setup.py: Adds telemetry extras and bumps version to 2.1.0.
  • requirements.txt: Removes OTEL deps from base install.
  • requirements-telemetry.txt: Introduces pinned OTEL + Prometheus exporter dependencies.
  • inorbit_edge/metrics.py: Adds optional-OTEL no-op behavior plus Prometheus provider setup + decorator updates.
  • inorbit_edge/robot.py: Adds robot_id attributes to publish counters; fixes double-counting for publish_laser.
  • inorbit_edge/tests/test_metrics.py: Adds tests for decorator behavior, robot_id attributes, and Prometheus setup helper.
  • inorbit_edge/tests/demo/example.py: Adds optional /metrics server wiring via env vars.
  • inorbit_edge/tests/demo/README.md: Documents demo usage, telemetry extra, and Docker run instructions.
  • inorbit_edge/tests/demo/Dockerfile: Adds demo image that installs [video,telemetry] and defaults metrics env vars.
  • README.md: Documents the new telemetry extra and shows usage for Prometheus + custom metrics.
  • .dockerignore: Reduces Docker build context for demo image builds.
  • inorbit_edge/__init__.py / .bumpversion.cfg: Version bump to 2.1.0.
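Wiring requirements-telemetry.txt up as the telemetry extra could look like the following sketch. read_requirements and build_extras are hypothetical helper names, and only the telemetry extra is shown; the video extra the demo also installs is not described here.

```python
from pathlib import Path


def read_requirements(path):
    """Return requirement specifiers from a requirements file, skipping blanks and comments."""
    lines = Path(path).read_text().splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    ]


def build_extras(root="."):
    """Map extra names to requirement lists, e.g. {"telemetry": [...]}."""
    return {"telemetry": read_requirements(Path(root) / "requirements-telemetry.txt")}


# In setup.py the result would be passed as setuptools.setup(..., extras_require=build_extras())
```

Users then opt in with pip install "inorbit-edge[telemetry]", while the base install keeps reading only requirements.txt.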


Review comment threads:
  • inorbit_edge/metrics.py (outdated)
  • inorbit_edge/metrics.py (outdated)
  • inorbit_edge/metrics.py (outdated)
  • inorbit_edge/tests/demo/README.md (outdated)
  • inorbit_edge/tests/test_metrics.py
  • README.md

@b-Tomas (Member) left a comment


Looks great @leandropineda

Just a couple of small comments

Review comment threads:
  • setup.py (outdated)
  • inorbit_edge/metrics.py
  • inorbit_edge/tests/demo/robots_config_example.yaml (outdated)
  • inorbit_edge/metrics.py (outdated)
@leandropineda merged commit b940488 into main Apr 30, 2026
24 checks passed
@leandropineda deleted the feat/metrics-prom-otel branch April 30, 2026 16:53

Labels

None yet

Projects

None yet


3 participants