42 changes: 22 additions & 20 deletions src/pages/docs/observe/features/alerts.mdx
---
title: "Alerts and monitors"
description: "Define monitors on Observe project metrics (system or evaluation) and get notified by email or Slack when values cross a threshold."
---

## About

**Alerts and monitors** notify you when a metric goes above or below a value you set. Pick a metric (error rate, latency, cost, or an eval score), define a threshold, and choose where to get notified: email, Slack, or both. Monitors check the metric on a schedule. If the threshold is breached, you get an alert. You can review past alerts, mark them resolved, or mute a monitor without deleting it.

---

## When to use

- **Catch errors early**: Get notified when error rate or API failure rate spikes after a deployment.
- **Stay within latency limits**: Alert when response time goes above your target.
- **Control costs**: Track token usage and get a warning before you hit your budget.
- **Monitor eval quality**: Know when a pass/fail eval like toxicity starts failing more often.
- **Stay informed without watching dashboards**: Send alerts to email, Slack, or both.

---

<Step title="Choose the metric">
Create a monitor for an Observe project and select the **metric type**:
![Choose the metric](/screenshot/product/observe/1.png)

- **System metrics**: count of errors, error-free session rates, LLM API failure rates, span response time, LLM response time, token usage, daily/monthly tokens spent.
- **Evaluation metrics**: attach an eval config for that project. For pass/fail or choice evals you can set **threshold_metric_value** to the specific value to monitor (e.g. fail rate or a choice label).

The monitor is scoped to one project (Observe projects only).
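If you define monitors programmatically, the choice above boils down to a small config object. A minimal sketch using the field names from this page (`make_monitor` and the exact config shape are hypothetical, not part of the Observe API):

```python
def make_monitor(metric_type, metric, threshold_metric_value=None):
    """Build a minimal monitor config scoped to one Observe project.

    Illustrative only: the real API shape may differ.
    """
    if metric_type not in ("system", "evaluation"):
        raise ValueError("metric_type must be 'system' or 'evaluation'")
    monitor = {"metric_type": metric_type, "metric": metric}
    if threshold_metric_value is not None:
        # For pass/fail or choice evals: the specific value to monitor.
        monitor["threshold_metric_value"] = threshold_metric_value
    return monitor

# Watch the fail rate of a pass/fail toxicity eval.
toxicity_monitor = make_monitor(
    "evaluation", "toxicity", threshold_metric_value="fail"
)
```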
</Step>
<Step title="Define the threshold">
Set how the alert is triggered:
![Define the threshold](/screenshot/product/observe/2.png)

- **threshold_operator**: **Greater than** or **Less than** (the current metric value is compared to the threshold).
- **threshold_type**: how the threshold is determined:
  - **Static**: you set a fixed **critical_threshold_value** and optionally a **warning_threshold_value**. An alert fires when the metric is greater than (or less than) these values.
  - **Percentage change**: the threshold is a percentage change from a baseline (e.g. the historical mean over a time window). You set **critical_threshold_value** and optionally **warning_threshold_value** as percentage values. **auto_threshold_time_window** (default one week, in minutes) defines the window used to compute the baseline.

When the condition is met, the system creates an alert log (critical or warning) and triggers notifications.
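Both threshold types reduce to the same comparison once the percentage change is computed. A sketch of the evaluation logic, assuming `"greater_than"`/`"less_than"` operator values (the real computation is internal to Observe and may differ):

```python
def percent_change(current, baseline):
    """Percentage change of the current metric value vs. the baseline mean."""
    if baseline == 0:
        return float("inf") if current else 0.0
    return (current - baseline) / baseline * 100.0

def evaluate(current, baseline, operator, critical, warning=None):
    """Return "critical", "warning", or None for a percentage-change monitor."""
    change = percent_change(current, baseline)
    breached = (change > critical) if operator == "greater_than" else (change < critical)
    if breached:
        return "critical"
    if warning is not None:
        warn = (change > warning) if operator == "greater_than" else (change < warning)
        if warn:
            return "warning"
    return None

# Error rate rose from a weekly baseline of 2.0% to 2.6%: a 30% increase,
# above the 25% warning threshold but below the 50% critical one.
evaluate(2.6, 2.0, "greater_than", critical=50, warning=25)  # → "warning"
```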
</Step>

<Step title="Set alert frequency">
**alert_frequency** is how often the monitor is evaluated, in minutes (minimum 5, default 60). The monitor runs on this schedule and checks the metric over the relevant time window. If the threshold is breached, an alert is created and notifications are sent.
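**alert_frequency** behaves like a polling interval. A sketch of how a scheduler could normalize it, assuming only the minimum of 5 and default of 60 stated above (the helper itself is hypothetical):

```python
MIN_FREQUENCY_MINUTES = 5
DEFAULT_FREQUENCY_MINUTES = 60

def normalize_frequency(minutes=None):
    """Apply the default, then enforce the minimum evaluation interval."""
    if minutes is None:
        return DEFAULT_FREQUENCY_MINUTES
    return max(minutes, MIN_FREQUENCY_MINUTES)
```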
</Step>

<Step title="Configure notifications">
- **Email**: add up to five addresses in **notification_emails**. They receive an email when an alert is triggered (subject and body include alert name, message, and type).
- **Slack**: set **slack_webhook_url** to your Slack incoming webhook. Optional **slack_notes** are included in the message.
![Configure notifications](/screenshot/product/observe/3.png)
You can use email only, Slack only, or both. Mute a monitor with **is_mute** to stop notifications without deleting it.
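Slack incoming webhooks accept a JSON body with a `text` field. A hypothetical sketch of the message a monitor might post, with **slack_notes** appended (the actual payload Observe sends is not documented here):

```python
import json
from urllib import request

def build_slack_payload(alert_name, alert_type, message, slack_notes=None):
    """Compose a Slack incoming-webhook payload for a triggered alert."""
    text = f"[{alert_type.upper()}] {alert_name}: {message}"
    if slack_notes:
        text += f"\n{slack_notes}"
    return {"text": text}

def send_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.status

payload = build_slack_payload(
    "High error rate", "critical", "error_rate is 7.2% (threshold 5%)",
    slack_notes="Page the on-call engineer.",
)
# send_to_slack("https://hooks.slack.com/services/...", payload)
```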
</Step>

---

## Next steps

<CardGroup cols={2}>
<Card title="Set Up Observability" icon="play" href="/docs/observe/features/quickstart">
Connect the SDK and start capturing traces.
</Card>
<Card title="Run Evals on Traces" icon="chart-line" href="/docs/observe/features/evals">
Run evaluations on your traced spans to score quality.
</Card>
<Card title="Group Traces by Session" icon="table-rows" href="/docs/observe/features/session">
Group traces into sessions for multi-turn analysis.
</Card>
<Card title="Users" icon="tags" href="/docs/observe/features/users">