diff --git a/docs.json b/docs.json index daf48489..b0cf6592 100644 --- a/docs.json +++ b/docs.json @@ -152,6 +152,7 @@ "guides/ai-agents/agent-memory", "guides/ai-agents/verified-answers", "guides/ai-agents/evaluations", + "guides/ai-agents/reviews", "guides/ai-agents/data-access", "guides/ai-agents/best-practices", "guides/ai-agents/ai-writeback", diff --git a/guides/ai-agents/reviews.mdx b/guides/ai-agents/reviews.mdx new file mode 100644 index 00000000..b60d4a4b --- /dev/null +++ b/guides/ai-agents/reviews.mdx @@ -0,0 +1,60 @@ +--- +title: "AI agent reviews" +sidebarTitle: "Reviews" +description: "Surface answers your agents probably got wrong, grouped by root cause, with one-click fixes for semantic layer and project context gaps." +--- + + + **Availability:** Reviews is a [Beta](/references/workspace/feature-maturity-levels) feature available on **Enterprise** and **Cloud Pro** plans. It is **off by default** and must be enabled by an organization admin. + + +Reviews scans every AI agent turn for signs that the answer was probably wrong — a missing metric, an ambiguous field, a gap in your project context — and groups the findings by root cause so you can fix them in one place. + +For each finding, Reviews proposes the smallest change that would have prevented it: + +- **Semantic layer fixes** open a pull request against your dbt project (rename a field, add a description, add a metric). +- **Project context fixes** add a short note that your agents read before answering future questions. + +## When to use it + +Turn Reviews on once you have agents in regular use and want a feedback loop that improves them without manually reading thread transcripts. It's most useful when: + +- Users ask the same kind of question repeatedly and the agent struggles +- You want to keep dbt metadata and agent instructions tight as your data model evolves +- You're rolling agents out to a wider audience and want admin-level visibility into quality + +If you're still setting your first agent up, start with [agent setup](/guides/ai-agents/getting-started) and [evaluations](/guides/ai-agents/evaluations) first — Reviews works best once there's real usage to learn from. + +## Enabling Reviews + +Reviews is opt-in. Until an admin enables it, no agent turns are processed and no findings are collected. + +1. Go to **Settings → Ask AI → General**. +2. Toggle **Review AI agent turns** on. + +Once enabled, future agent turns are reviewed in the background. Existing threads are not back-filled. + +To stop collecting findings, toggle the setting off. Previously collected findings remain visible until you act on or dismiss them. + +## Reviewing findings + +Open **Settings → Ask AI → Reviews** to see what Reviews has surfaced. Findings are grouped by root cause so you can fix the underlying issue once instead of replying to threads one by one. + +| Finding type | What it means | How to fix | +| --- | --- | --- | +| **Semantic layer** | A field, metric, or description is missing, ambiguous, or wrong in your dbt project. | Apply the suggested fix to open a pull request against your dbt repository. | +| **Project context** | Your agents are missing background knowledge to answer reliably (e.g. which table to use for "active users"). | Apply the suggested fix to add a note your agents read before answering. | + +Each finding links back to the original agent turn so you can see the question, the answer, and the evidence Reviews used to flag it. + +## Privacy and data handling + +- Reviews runs against agent turns inside your organization only. +- Findings are stored in Lightdash alongside your other agent data and respect your existing project and admin permissions. +- Disabling the toggle stops new collection immediately; it does not delete previously collected findings. + +## Related + +- [Evaluations](/guides/ai-agents/evaluations) — run a fixed set of prompts against your agent and grade the answers. +- [AI writeback](/guides/ai-agents/ai-writeback) — let an agent open a dbt pull request from chat. +- [Autopilot](/guides/ai-agents/autopilot) — scheduled agent that cleans up content and flags issues for review.