16 changes: 9 additions & 7 deletions api/content-markdown.ts
Original file line number Diff line number Diff line change
@@ -10,9 +10,10 @@ import {
hasContentSlug,
hasSolutionSlug,
readContentSections,
readCookbookGoal,
readCookbookIntro,
} from "../src/lib/content-markdown";
import { joinContentSections } from "../src/lib/content-sections";
import { goalOnly } from "../src/lib/content-sections";
import { buildCookbookMarkdownDocument } from "../src/lib/cookbook-composition";
import { expandMdxImports } from "../src/lib/expand-mdx";
import {
@@ -168,17 +169,15 @@ function readRecipeMarkdown(rootDir: string, slug: string): string {
if (!hasContentSlug(rootDir, "recipes", slug)) {
throw new Error(`Recipe page not found: "${slug}"`);
}
return joinContentSections(readContentSections(rootDir, "recipes", slug));
return goalOnly(readContentSections(rootDir, "recipes", slug));
}

function readExampleMarkdown(rootDir: string, slug: string): string {
if (!hasContentSlug(rootDir, "examples", slug)) {
throw new Error(`Example page not found: "${slug}"`);
}

const content = joinContentSections(
readContentSections(rootDir, "examples", slug),
);
const content = goalOnly(readContentSections(rootDir, "examples", slug));

const example = examples.find((e) => e.id === slug);
if (!example) {
@@ -230,11 +229,14 @@ function readCookbookMarkdown(rootDir: string, slug: string): string {
};
});

const goal = readCookbookGoal(rootDir, slug);
return buildCookbookMarkdownDocument({
cookbookName: cookbook.name,
cookbookDescription: cookbook.description,
intro: readCookbookIntro(rootDir, slug),
goal,
intro: goal ? undefined : readCookbookIntro(rootDir, slug),
recipes: recipeInputs,
mode: "agent",
});
}

@@ -345,7 +347,7 @@ export function loadAgentPromptParts(
intentRecipe: readContent("intent-recipe"),
intentCookbook: readContent("intent-cookbook"),
intentExample: readContent("intent-example"),
localBootstrap: joinContentSections(
localBootstrap: goalOnly(
readContentSections(rootDir, "recipes", LOCAL_BOOTSTRAP_SLUG),
),
};
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
## What you are building

A streaming AI chat app on Databricks: a user sends a message, the server authenticates with the Databricks CLI profile (or a service-principal token in production), calls an AI Gateway chat endpoint via the OpenAI-compatible provider, and streams the answer back token-by-token. Chat sessions and messages are persisted in Lakebase Postgres so conversations survive page refreshes and redeploys.

### How the steps fit together
6 changes: 6 additions & 0 deletions content/cookbooks/app-with-lakebase/goal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
A Databricks App with Lakebase Postgres for persistent data storage. The app has schema setup, full CRUD API routes, and deploys to the Databricks Apps platform.

### Components

1. **Create a Lakebase Instance** — provision a managed Postgres project with an endpoint and database, and collect the connection values.
2. **Lakebase Data Persistence** — add the Lakebase plugin to your app with schema initialization, CRUD routes, and data access patterns.
5 changes: 5 additions & 0 deletions content/cookbooks/genie-analytics-app/goal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
A minimal Databricks App with AI/BI Genie conversational analytics. Users ask natural-language questions about their data and get SQL-powered answers through an embedded Genie chat interface.

### Components

1. **Genie Conversational Analytics** — configure a Genie space, wire up the server and client plugins, declare app resources, and deploy.
7 changes: 7 additions & 0 deletions content/cookbooks/lakebase-off-platform/goal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
A connection from an app hosted outside the Databricks Apps platform (for example on AWS, Vercel, or Netlify) to Lakebase Postgres. The app uses portable environment configuration, token management with automatic credential refresh, and Drizzle ORM for type-safe database access.

### Components

1. **Lakebase Environment Management** — set up a Zod-validated environment configuration for secure Lakebase connection values.
2. **Lakebase Token Management** — implement token fetch, cache, and automatic refresh for Lakebase Postgres credentials.
3. **Drizzle ORM with Lakebase** — configure a Drizzle ORM pool with auto-refreshing credentials and migration support.
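
The token fetch, cache, and refresh pattern named in component 2 can be sketched roughly as follows. This is a minimal illustration, not the recipe's actual code: the `Token` shape, `createTokenCache`, and the injected `fetchToken` and `now` functions are all hypothetical names, and the real Lakebase credential response may look different.

```typescript
// Hypothetical credential shape; the real Lakebase response may differ.
interface Token {
  value: string;
  expiresAt: number; // epoch milliseconds
}

type FetchToken = () => Promise<Token>;

// Returns a getter that reuses the cached token and refetches it
// once it is within `refreshMarginMs` of expiring.
function createTokenCache(
  fetchToken: FetchToken,
  refreshMarginMs: number = 60000,
  now: () => number = Date.now,
): () => Promise<string> {
  let cached: Token | undefined;
  let inFlight: Promise<Token> | undefined;

  return async () => {
    if (cached && cached.expiresAt - now() > refreshMarginMs) {
      return cached.value;
    }
    // Share one in-flight request so concurrent callers trigger a single fetch.
    const pending = inFlight ?? (inFlight = fetchToken());
    try {
      const token = await pending;
      cached = token;
      return token.value;
    } finally {
      inFlight = undefined;
    }
  };
}
```

A connection pool could call the returned getter whenever it opens a new connection, so that each connection picks up a fresh credential without a separate refresh timer. That wiring is an assumption about usage, not something the cookbook states.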
8 changes: 8 additions & 0 deletions content/cookbooks/operational-data-analytics/goal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
An end-to-end operational data analytics pipeline: data flows from an OLTP database (Lakebase Postgres) through CDC replication into Unity Catalog, gets transformed through a medallion architecture (bronze/silver/gold layers), and is ready for dashboards and downstream consumers.

### Components

1. **Unity Catalog Setup** — configure Unity Catalog with external S3 storage for your destination catalog and schema.
2. **Create a Lakebase Instance** — provision a managed Postgres project as the OLTP source.
3. **Lakehouse Sync CDC** — enable change data capture replication from Lakebase tables to Unity Catalog Delta history tables.
4. **Medallion Architecture from CDC** — build silver (current-state) and gold (analytical) layers from the CDC history tables using Lakeflow Declarative Pipelines.
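
To make the "current-state" idea in component 4 concrete, here is a small in-memory sketch of collapsing a CDC history into the latest row per key. The cookbook does this with Lakeflow Declarative Pipelines, not TypeScript; the `ChangeEvent` shape, its field names, and `currentState` are hypothetical and only illustrate the logic.

```typescript
// Hypothetical row shape for a CDC history table; real column names will differ.
interface ChangeEvent {
  id: string;
  op: "insert" | "update" | "delete";
  sequence: number; // increases with each change
  data: Record<string, unknown>;
}

// Keep the newest event per id, then drop ids whose newest event is a delete.
function currentState(
  history: ChangeEvent[],
): Map<string, Record<string, unknown>> {
  const latest = new Map<string, ChangeEvent>();
  for (const event of history) {
    const seen = latest.get(event.id);
    if (!seen || event.sequence > seen.sequence) {
      latest.set(event.id, event);
    }
  }

  const state = new Map<string, Record<string, unknown>>();
  latest.forEach((event, id) => {
    if (event.op !== "delete") {
      state.set(id, event.data);
    }
  });
  return state;
}
```

Gold-layer tables would then aggregate over this current-state view rather than over the raw history.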
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
## Agentic Support Console

This template brings together the full Databricks developer stack into a single operational data application: an AI-powered support console where every customer message is automatically triaged by an LLM, and support agents review, approve, or override the suggestion from a purpose-built internal tool.

### Data Flow
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
## Content Moderator

This template demonstrates an internal content moderation tool built on Databricks: authors submit content for different channels (company blog, LinkedIn, Twitter, newsletter, press releases), moderators maintain per-channel guidelines, and an LLM scores each submission against those guidelines before a human reviewer makes the final call.

### Data Flow
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
## Inventory Intelligence

This template builds a full retail inventory management system on the Databricks stack: a React app where store managers monitor stock health, review AI-generated replenishment recommendations, and approve purchase orders, all powered by a live medallion pipeline and a pluggable demand-forecast job.

### Data Flow
43 changes: 14 additions & 29 deletions content/examples/rag-chat/content.md
Original file line number Diff line number Diff line change
@@ -1,36 +1,21 @@
## RAG Chat App
### 2. Create the Lakebase Postgres prerequisites

This template demonstrates a Retrieval-Augmented Generation chat app built on Databricks: a user question is embedded, similar documents are retrieved from a pgvector store in Lakebase Postgres, and the retrieved context is injected into a Model Serving call that streams the answer back. Conversations and sources are persisted per chat in Lakebase.
The template's AppKit Lakebase plugin requires an existing Postgres **branch** and **database**. `databricks postgres create-project` automatically provisions a default branch named `production` and a default database on it, so one command is all you need. Pick a short lowercase project id and export the resolved resource names — the next step's `databricks apps init` command reads them as shell variables.

### Data Flow
```bash
PROJECT_ID=rag-chat

All retrieval and chat state live in Lakebase Postgres; generation uses AI Gateway:
databricks postgres create-project "$PROJECT_ID"

1. **Seeding** pulls a handful of Wikipedia articles on startup, chunks them by paragraph, embeds each chunk through the AI Gateway embeddings endpoint (`databricks-gte-large-en` by default), and writes rows into `rag.documents` with a `vector(1024)` column.
2. **User turns** are embedded with the same endpoint. The server runs a pgvector cosine-similarity search to retrieve the top-k matching chunks.
3. **Context injection**: the retrieved chunks are prepended as a system message before the user's conversation history is sent to the chat completion endpoint (`databricks-gpt-5-4-mini` by default) via AI Gateway.
4. **Streaming**: `streamText` streams tokens back to the client while an `onFinish` callback appends the assistant turn to Lakebase.
5. **Chat history**: every user and assistant turn is persisted in `chat.messages`, keyed by `chat_id`, so conversations can be resumed.
export BRANCH_NAME="projects/$PROJECT_ID/branches/production"
export DATABASE_NAME=$(databricks api get "/api/2.0/postgres/$BRANCH_NAME/databases" -o json | \
python3 -c "import json,sys; print(json.load(sys.stdin)['databases'][0]['name'])")

### Template Approach
echo "Branch: $BRANCH_NAME"
echo "Database: $DATABASE_NAME"
```

Unlike the other templates, **this template is designed to be consumed via `databricks apps init`**, not `git clone`. The init flow:
`create-project` is long-running; the CLI waits for it to finish by default. **If it reports `already exists`:**

- Prompts for the Lakebase Postgres branch and database resource names.
- Auto-resolves `PGHOST`, `PGDATABASE`, and `LAKEBASE_ENDPOINT` into your local `.env` by calling the Lakebase APIs.
- Writes `DATABRICKS_CONFIG_PROFILE` or `DATABRICKS_HOST` based on your Databricks CLI configuration.
- Drops you into a ready-to-run project directory named by `--name`.

This validates the [AppKit templates system](/docs/appkit/v0/development/templates) as a way to ship DevHub templates — see `appkit.plugins.json` and `.env.tmpl` in the template for how it works.

### What to Adapt

Setup and provisioning are documented in the repository's **`template/README.md`**.

To make this template your own:

- **Lakebase**: Point the bundle at your own Lakebase project, branch, and database (prompted at init time).
- **Model Serving endpoint**: Override `DATABRICKS_ENDPOINT` for a different chat model (e.g. `databricks-claude-sonnet-4`).
- **Embeddings endpoint**: Override `DATABRICKS_EMBEDDING_ENDPOINT` if you want a different embedding model. Make sure the `vector(N)` dimension in `server/lib/rag-store.ts` matches.
- **Seed data**: Replace the Wikipedia article list in `server/lib/seed-data.ts` with your own corpus. The chunking function splits on paragraph boundaries — adapt if your source has different structure.
- **Retrieval**: The default top-k is 5 and the similarity metric is cosine. Tune in `retrieveSimilar()`.
- **Prefer picking a different `PROJECT_ID`** (e.g. append a short suffix) and re-export `BRANCH_NAME` / `DATABASE_NAME` from the new id. Lakebase projects can hold data that other apps and pipelines depend on, so do **not** run `databricks postgres delete-project` on an existing project without explicit confirmation from the user that nothing else uses it.
- **Eventual-consistency exception:** if you just deleted a project with this id in the same session and `databricks postgres list-projects` no longer shows it, wait 30–60s and retry `create-project` — the control plane is briefly inconsistent after deletion.
25 changes: 0 additions & 25 deletions content/examples/rag-chat/deployment.md

This file was deleted.

34 changes: 34 additions & 0 deletions content/examples/rag-chat/goal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
This template demonstrates a Retrieval-Augmented Generation chat app built on Databricks: a user question is embedded, similar documents are retrieved from a pgvector store in Lakebase Postgres, and the retrieved context is injected into a Model Serving call that streams the answer back. Conversations and sources are persisted per chat in Lakebase.

### Data Flow

All retrieval and chat state live in Lakebase Postgres; generation uses AI Gateway:

1. **Seeding** pulls a handful of Wikipedia articles on startup, chunks them by paragraph, embeds each chunk through the AI Gateway embeddings endpoint (`databricks-gte-large-en` by default), and writes rows into `rag.documents` with a `vector(1024)` column.
2. **User turns** are embedded with the same endpoint. The server runs a pgvector cosine-similarity search to retrieve the top-k matching chunks.
3. **Context injection**: the retrieved chunks are prepended as a system message before the user's conversation history is sent to the chat completion endpoint (`databricks-gpt-5-4-mini` by default) via AI Gateway.
4. **Streaming**: `streamText` streams tokens back to the client while an `onFinish` callback appends the assistant turn to Lakebase.
5. **Chat history**: every user and assistant turn is persisted in `chat.messages`, keyed by `chat_id`, so conversations can be resumed.

### Template Approach

Unlike the other templates, **this template is designed to be consumed via `databricks apps init`**, not `git clone`. The init flow:

- Prompts for the Lakebase Postgres branch and database resource names.
- Auto-resolves `PGHOST`, `PGDATABASE`, and `LAKEBASE_ENDPOINT` into your local `.env` by calling the Lakebase APIs.
- Writes `DATABRICKS_CONFIG_PROFILE` or `DATABRICKS_HOST` based on your Databricks CLI configuration.
- Drops you into a ready-to-run project directory named by `--name`.

This validates the [AppKit templates system](/docs/appkit/v0/development/templates) as a way to ship DevHub templates — see `appkit.plugins.json` and `.env.tmpl` in the template for how it works.

### What to Adapt

Setup and provisioning are documented in the repository's **`template/README.md`**.

To make this template your own:

- **Lakebase**: Point the bundle at your own Lakebase project, branch, and database (prompted at init time).
- **Model Serving endpoint**: Override `DATABRICKS_ENDPOINT` for a different chat model (e.g. `databricks-claude-sonnet-4`).
- **Embeddings endpoint**: Override `DATABRICKS_EMBEDDING_ENDPOINT` if you want a different embedding model. Make sure the `vector(N)` dimension in `server/lib/rag-store.ts` matches.
- **Seed data**: Replace the Wikipedia article list in `server/lib/seed-data.ts` with your own corpus. The chunking function splits on paragraph boundaries — adapt if your source has different structure.
- **Retrieval**: The default top-k is 5 and the similarity metric is cosine. Tune in `retrieveSimilar()`.
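
The paragraph chunking and cosine top-k retrieval described above can be sketched in memory as follows. In the template the similarity search runs inside Postgres via pgvector, so this is only an illustration of the logic; `chunkByParagraph`, `cosineSimilarity`, `topK`, and the `Chunk` shape are hypothetical names, not the template's `retrieveSimilar()` implementation.

```typescript
// Split text on blank lines and drop empty paragraphs.
function chunkByParagraph(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
}

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory stand-in for a row of the documents table.
interface Chunk {
  content: string;
  embedding: number[];
}

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k: number = 5): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The dimension check mirrors the note above about keeping `vector(N)` in sync with the embedding model: vectors of different lengths cannot be compared.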
21 changes: 0 additions & 21 deletions content/examples/rag-chat/prerequisites.md

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
## SaaS Subscription Tracker

This template demonstrates a straightforward internal CRUD tool built on Databricks: a SaaS subscription tracker where teams log the tools they use, who owns each subscription, what it costs, and when it renews. A Genie space provides self-serve analytics over the subscription data.

### Data Flow
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
## Vacation Rentals Operations Console

This template demonstrates an internal operations console for a vacation rentals platform ("Wanderbricks"). Operators see revenue performance by destination, work through a booking queue with per-booking flags and agent notes, and ask natural-language questions about the business through an embedded Genie chat panel.

### Data Flow