@@ -116,7 +116,7 @@ Here’s what the resulting log would look like when a pipeline is run:

## Tracing

To get a bigger picture of the pipeline’s performance, try tracing it with [Langfuse](../../development/tracing.mdx#langfuse).
To get a bigger picture of the pipeline’s performance, try tracing it with [Langfuse](../../development/tracing/langfuse.mdx).

Our [Tracing](../../development/tracing.mdx) page has more about other tracing solutions for Haystack.

247 changes: 15 additions & 232 deletions docs-website/docs/development/tracing.mdx
@@ -2,150 +2,33 @@
title: "Tracing"
id: tracing
slug: "/tracing"
description: "This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it."
description: "This page explains how to use tracing in Haystack. It lists the tracing backends Haystack supports out of the box and explains how to enable, configure, and disable tracing."
---

import ClickableImage from "@site/src/components/ClickableImage";

# Tracing

This page explains how to use tracing in Haystack. It describes how to set up a tracing backend with OpenTelemetry, Datadog, or your own solution. This can help you monitor your app's performance and optimize it.

Traces document the flow of requests through your application and are vital for monitoring applications in production. This helps to understand the execution order of your pipeline components and analyze where your pipeline spends the most time.

## Configuring a Tracing Backend

Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for [OpenTelemetry](https://opentelemetry.io/) and [Datadog](https://app.datadoghq.eu/dashboard/lists). You can also quickly implement support for additional providers of your choosing.

### OpenTelemetry

The `OpenTelemetryConnector` component lets you trace your Haystack pipelines with [OpenTelemetry](https://opentelemetry.io/).

Simply install the integration with `pip install opentelemetry-haystack`, then add the connector to your pipeline.

:::info
Check out the [integration page](https://haystack.deepset.ai/integrations/opentelemetry) for more details and example usage.
:::

### Datadog

The `DatadogConnector` component lets you trace your Haystack pipelines with [Datadog](https://www.datadoghq.com/).

Simply install the integration with `pip install datadog-haystack`, then add the connector to your pipeline.

:::info
Check out the [integration page](https://haystack.deepset.ai/integrations/datadog) for more details and example usage.
:::

### Langfuse

`LangfuseConnector` component allows you to easily trace your Haystack pipelines with the Langfuse UI.

Simply install the component with `pip install langfuse-haystack`, then add it to your pipeline.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/langfuseconnector.mdx) for more details and example usage, or our [blog post](https://haystack.deepset.ai/blog/langfuse-integration) for the complete walkthrough.
:::
<ClickableImage src="/img/11cec4f-langfuse-generation-span.png" alt="Langfuse trace detail view showing generation span with input prompt, output, metadata, latency, and cost information for a language model call" />

### MLflow

[MLflow](https://mlflow.org/) is an open-source platform for managing the end-to-end machine learning and AI lifecycle. MLflow provides native tracing support for Haystack. Simply install MLflow and enable automatic tracing with a single line of code.

```shell
pip install mlflow
```

```python
import mlflow

mlflow.haystack.autolog()
# Optionally set an experiment name
mlflow.set_experiment("Haystack")
```

This automatically captures traces from all Haystack pipelines and components, including latencies, token usage, cost, and any exceptions.

:::info
Check out the [MLflow Haystack integration guide](https://haystack.deepset.ai/integrations/mlflow) for a full walkthrough with examples.
:::

### Weights & Biases Weave

The `WeaveConnector` component allows you to trace and visualize your pipeline execution in [Weights & Biases](https://wandb.ai/site/) framework.

You will first need to create a free account on Weights & Biases website and get your API key, as well as install the integration with `pip install weights_biases-haystack`.

:::info
Check out the component's [documentation page](../pipeline-components/connectors/weaveconnector.mdx) for more details and example usage.
:::

### Custom Tracing Backend

To use your custom tracing backend with Haystack, follow these steps:

1. Implement the `Tracer` interface. The following code snippet provides an example using the OpenTelemetry package:
Traces document the flow of requests through your application and are vital for monitoring applications in production. This helps you understand the execution order of your pipeline components and analyze where your pipeline spends the most time.

```python
import contextlib
from typing import Optional, Dict, Any, Iterator
Instrumented applications typically send traces to a trace collector or a tracing backend. Haystack provides out-of-the-box support for several backends, and you can also quickly implement support for additional providers of your choosing.

from opentelemetry import trace
from opentelemetry.trace import NonRecordingSpan
## Supported Tracers

from haystack.tracing import Tracer, Span
from haystack.tracing import utils as tracing_utils
import opentelemetry.trace

class OpenTelemetrySpan(Span):
    def __init__(self, span: opentelemetry.trace.Span) -> None:
        self._span = span

    def set_tag(self, key: str, value: Any) -> None:
        # Tracing backends usually don't support any tag value
        # `coerce_tag_value` forces the value to either be a Python
        # primitive (int, float, boolean, str) or tries to dump it as string.
        coerced_value = tracing_utils.coerce_tag_value(value)
        self._span.set_attribute(key, coerced_value)

class OpenTelemetryTracer(Tracer):
    def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
        self._tracer = tracer

    @contextlib.contextmanager
    def trace(
        self, operation_name: str, tags: Optional[Dict[str, Any]] = None, parent_span: Optional[Span] = None
    ) -> Iterator[Span]:
        with self._tracer.start_as_current_span(operation_name) as span:
            span = OpenTelemetrySpan(span)
            if tags:
                span.set_tags(tags)

            yield span

    def current_span(self) -> Optional[Span]:
        current_span = trace.get_current_span()
        if isinstance(current_span, NonRecordingSpan):
            return None

        return OpenTelemetrySpan(current_span)
```

2. Tell Haystack to use your custom tracer:

```python
from haystack import tracing

haystack_tracer = OpenTelemetryTracer(tracer)
tracing.enable_tracing(haystack_tracer)
```
| Tracer | Description |
| --- | --- |
| [OpenTelemetry](tracing/opentelemetry.mdx) | Send traces to any [OpenTelemetry](https://opentelemetry.io/)-compatible backend using the `OpenTelemetryTracer` or the `OpenTelemetryConnector` component. Includes a Jaeger setup for local development. |
| [MLflow](tracing/mlflow.mdx) | Capture traces with [MLflow](https://mlflow.org/)'s native Haystack tracing support. |
| [Datadog](tracing/datadog.mdx) | Trace your pipelines with [Datadog](https://www.datadoghq.com/) using the `DatadogTracer` or the `DatadogConnector` component. |
| [Langfuse](tracing/langfuse.mdx) | Trace your pipelines with the [Langfuse](https://langfuse.com/) UI using the `LangfuseTracer` or the `LangfuseConnector` component. |
| [Weights & Biases Weave](tracing/weave.mdx) | Trace and visualize pipeline execution in [Weights & Biases](https://wandb.ai/site/) using the `WeaveTracer` or the `WeaveConnector` component. |
| [LoggingTracer](tracing/logging-tracer.mdx) | Inspect the data flowing through your pipeline in real time through logs, with no backend setup. |
| [Custom Tracer](tracing/custom-tracer.mdx) | Connect any tracing backend by implementing the `Tracer` interface. |

## Disabling Auto Tracing

Haystack automatically detects and enables tracing under the following circumstances:

- If `opentelemetry-sdk` is installed and configured for OpenTelemetry.
- If `ddtrace` is installed for Datadog.
- If `opentelemetry-sdk` is installed and configured for OpenTelemetry. Note that this auto-enabling is deprecated and will be removed in Haystack 3.0 – use the [`OpenTelemetryConnector`](tracing/opentelemetry.mdx) to enable OpenTelemetry tracing instead.
- If `ddtrace` is installed for Datadog. Note that this auto-enabling is deprecated and will be removed in Haystack 3.0 – use the [`DatadogConnector`](tracing/datadog.mdx) to enable Datadog tracing instead.
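To see which of these conditions could apply in a given environment, a minimal sketch along these lines checks whether the two packages can be found. The helper name `is_installed` is hypothetical and not part of Haystack, and the check only tells you that a package is present, not that OpenTelemetry has been configured.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when the parent package is missing
        return False


# `opentelemetry.sdk` is the import name of the `opentelemetry-sdk` package
auto_trace_candidates = {
    "OpenTelemetry": is_installed("opentelemetry.sdk"),
    "Datadog": is_installed("ddtrace"),
}
print(auto_trace_candidates)
```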

To disable this behavior, there are two options:

@@ -180,103 +63,3 @@ To enable content tracing, there are two options:

tracing.tracer.is_content_tracing_enabled = True
```
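
The `HAYSTACK_CONTENT_TRACING_ENABLED` environment variable can also be set from Python rather than from the shell. The sketch below assumes the variable is read when Haystack's tracing module is first imported, so it sets the value before any Haystack import; verify this against your Haystack version.

```python
import os

# Assumption: this must run before `haystack` is imported for the first time
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

# Import Haystack only after this point, for example:
# from haystack import tracing
```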

## Visualizing Traces During Development

Use [Jaeger](https://www.jaegertracing.io/docs/1.6/getting-started/) as a lightweight tracing backend for local pipeline development. This allows you to experiment with tracing without the need for a complex tracing backend.
<ClickableImage src="/img/dd906d7-Screenshot_2024-02-22_at_16.51.01.png" alt="Jaeger UI trace timeline displaying haystack pipeline execution with component spans showing duration and nesting of operations" />

1. Run the Jaeger container. This creates a tracing backend as well as a UI to visualize the traces:

```shell
docker run --rm -d --name jaeger \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-p 6831:6831/udp \
-p 6832:6832/udp \
-p 5778:5778 \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
-p 14250:14250 \
-p 14268:14268 \
-p 14269:14269 \
-p 9411:9411 \
jaegertracing/all-in-one:latest
```
2. Install the OpenTelemetry SDK:

```shell
pip install opentelemetry-sdk
pip install opentelemetry-exporter-otlp
```
3. Configure `OpenTelemetry` to use the Jaeger backend:

```python
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Service name is required for most backends
resource = Resource(attributes={
ResourceAttributes.SERVICE_NAME: "haystack"
})

tracer_provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
tracer_provider.add_span_processor(processor)
trace.set_tracer_provider(tracer_provider)
```
4. Tell Haystack to use OpenTelemetry for tracing:

```python
import haystack.tracing

haystack.tracing.auto_enable_tracing()
```
5. Run your pipeline:

```python
...
pipeline.run(...)
...
```
6. Inspect the traces in the UI provided by Jaeger at [http://localhost:16686](http://localhost:16686/search).

## Real-Time Pipeline Logging

Use Haystack's [`LoggingTracer`](https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/logging_tracer.py) logs to inspect the data that's flowing through your pipeline in real-time.

This feature is particularly helpful during experimentation and prototyping, as you don’t need to set up any tracing backend beforehand.

Here’s how you can enable this tracer. In this example, we are adding color tags (this is optional) to highlight the components' names and inputs:

```python
import logging
from haystack import tracing
from haystack.tracing.logging_tracer import LoggingTracer

logging.basicConfig(
format="%(levelname)s - %(name)s - %(message)s",
level=logging.WARNING,
)
logging.getLogger("haystack").setLevel(logging.DEBUG)

tracing.tracer.is_content_tracing_enabled = (
True # to enable tracing/logging content (inputs/outputs)
)
tracing.enable_tracing(
LoggingTracer(
tags_color_strings={
"haystack.component.input": "\x1b[1;31m",
"haystack.component.name": "\x1b[1;34m",
},
),
)
```

Here’s what the resulting log would look like when a pipeline is run:
<ClickableImage src="/img/55c3d5c84282d726c95fb3350ec36be49a354edca8a6164f5dffdab7121cec58-image_2.png" alt="Console output showing Haystack pipeline execution with DEBUG level tracing logs including component names, types, and input/output specifications" />
82 changes: 82 additions & 0 deletions docs-website/docs/development/tracing/custom-tracer.mdx
@@ -0,0 +1,82 @@
---
title: "Custom Tracer"
id: custom-tracer
slug: "/tracing-custom-tracer"
description: "Learn how to connect Haystack to a custom tracing backend by implementing the Tracer interface."
---

# Custom Tracer

Learn how to connect Haystack to a custom tracing backend by implementing the `Tracer` interface.

<div className="key-value-table">

| | |
| --- | --- |
| **Base classes** | `Tracer` and `Span` |
| **How to enable** | Implement the `Tracer` interface, then `tracing.enable_tracing(your_tracer)` |
| **Content tracing** | Optional. Set `HAYSTACK_CONTENT_TRACING_ENABLED` to `true` to trace component inputs and outputs |
| **Package** | Built into Haystack |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/tracing/tracer.py |

</div>

## Overview

If your tracing backend isn't supported out of the box, you can connect it to Haystack by implementing the `Tracer` interface. This gives you full control over how spans are created and how tags are recorded.

## Usage

1. Implement the `Tracer` interface. The following code snippet provides an example using the OpenTelemetry package:

```python
import contextlib
from typing import Optional, Dict, Any, Iterator

from opentelemetry import trace
from opentelemetry.trace import NonRecordingSpan

from haystack.tracing import Tracer, Span
from haystack.tracing import utils as tracing_utils
import opentelemetry.trace

class OpenTelemetrySpan(Span):
    def __init__(self, span: opentelemetry.trace.Span) -> None:
        self._span = span

    def set_tag(self, key: str, value: Any) -> None:
        # Tracing backends usually don't support any tag value
        # `coerce_tag_value` forces the value to either be a Python
        # primitive (int, float, boolean, str) or tries to dump it as string.
        coerced_value = tracing_utils.coerce_tag_value(value)
        self._span.set_attribute(key, coerced_value)

class OpenTelemetryTracer(Tracer):
    def __init__(self, tracer: opentelemetry.trace.Tracer) -> None:
        self._tracer = tracer

    # `parent_span` is part of the `Tracer.trace` interface, even though this example does not use it
    @contextlib.contextmanager
    def trace(
        self, operation_name: str, tags: Optional[Dict[str, Any]] = None, parent_span: Optional[Span] = None
    ) -> Iterator[Span]:
        with self._tracer.start_as_current_span(operation_name) as span:
            span = OpenTelemetrySpan(span)
            if tags:
                span.set_tags(tags)

            yield span

    def current_span(self) -> Optional[Span]:
        current_span = trace.get_current_span()
        if isinstance(current_span, NonRecordingSpan):
            return None

        return OpenTelemetrySpan(current_span)
```

2. Tell Haystack to use your custom tracer:

```python
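from opentelemetry import trace
from haystack import tracing

# Obtain an OpenTelemetry tracer and wrap it in the class defined in step 1
tracer = trace.get_tracer("haystack")
haystack_tracer = OpenTelemetryTracer(tracer)
tracing.enable_tracing(haystack_tracer)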
```
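
The same pattern can be tried without any tracing backend. The sketch below is a hypothetical in-memory tracer that mirrors the `trace` and `current_span` methods shown above. `InMemorySpan` and `InMemoryTracer` are illustrative names and deliberately do not import Haystack so the snippet runs on its own; in a real integration they would subclass `Span` and `Tracer` from `haystack.tracing`.

```python
import contextlib
from typing import Any, Dict, Iterator, List, Optional


class InMemorySpan:
    """Stand-in for a `haystack.tracing.Span` subclass (hypothetical)."""

    def __init__(self, operation_name: str) -> None:
        self.operation_name = operation_name
        self.tags: Dict[str, Any] = {}

    def set_tag(self, key: str, value: Any) -> None:
        self.tags[key] = value

    def set_tags(self, tags: Dict[str, Any]) -> None:
        for key, value in tags.items():
            self.set_tag(key, value)


class InMemoryTracer:
    """Stand-in for a `haystack.tracing.Tracer` subclass (hypothetical)."""

    def __init__(self) -> None:
        self.finished: List[InMemorySpan] = []  # spans in the order they ended
        self._stack: List[InMemorySpan] = []  # currently open spans

    @contextlib.contextmanager
    def trace(
        self,
        operation_name: str,
        tags: Optional[Dict[str, Any]] = None,
        parent_span: Optional[InMemorySpan] = None,
    ) -> Iterator[InMemorySpan]:
        span = InMemorySpan(operation_name)
        if tags:
            span.set_tags(tags)
        self._stack.append(span)
        try:
            yield span
        finally:
            # Close the span even if the traced code raises
            self._stack.pop()
            self.finished.append(span)

    def current_span(self) -> Optional[InMemorySpan]:
        return self._stack[-1] if self._stack else None


tracer = InMemoryTracer()
with tracer.trace("pipeline.run", tags={"pipeline": "demo"}) as outer:
    with tracer.trace("component.run", parent_span=outer) as inner:
        inner.set_tag("component", "retriever")

print([span.operation_name for span in tracer.finished])
```

Inner spans finish before the spans that enclose them, so the recorded order is the reverse of the nesting order.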