pipecat-ai · markbackman · May 29, 2026 · May 28, 2026
diff --git a/api-reference/server/services/stt/cartesia.mdx b/api-reference/server/services/stt/cartesia.mdx
@@ -1,11 +1,14 @@
 ---
 title: "Cartesia"
-description: "Speech-to-text service implementation using Cartesia's real-time transcription API"
+description: "Speech-to-text service implementations using Cartesia's real-time transcription APIs"
 ---
 
 ## Overview
 
-`CartesiaSTTService` provides real-time speech recognition using Cartesia's WebSocket API with the `ink-whisper` model, supporting streaming transcription with both interim and final results for low-latency applications.
+Cartesia provides two STT service implementations:
+
+- `CartesiaSTTService` for real-time speech recognition using Cartesia's WebSocket API with the `ink-whisper` model, supporting streaming transcription with both interim and final results for low-latency applications
+- `CartesiaTurnsSTTService` for turn-based speech recognition using Cartesia's v2 WebSocket API with the `ink-2` model, where the server drives turn boundaries and pushes structured events for turn lifecycle management including start, updates, eager end predictions, resume, and final turn completion
 
 <CardGroup cols={2}>
   <Card
@@ -16,12 +19,26 @@ description: "Speech-to-text service implementation using Cartesia's real-time t
     Pipecat's API methods for Cartesia STT integration
   </Card>
   <Card
-    title="Example Implementation"
+    title="Cartesia Turns STT API Reference"
+    icon="code"
+    href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.cartesia.turns.stt.html"
+  >
+    Pipecat's API methods for Cartesia Turns STT integration
+  </Card>
+  <Card
+    title="Standard STT Example"
     icon="play"
     href="https://github.com/pipecat-ai/pipecat/blob/main/examples/transcription/transcription-cartesia.py"
   >
     Complete example with transcription logging
   </Card>
+  <Card
+    title="Turns STT Example"
+    icon="play"
+    href="https://github.com/pipecat-ai/pipecat/blob/main/examples/transcription/transcription-cartesia-turns.py"
+  >
+    Complete example with turn-based transcription
+  </Card>
   <Card
     title="Cartesia Documentation"
     icon="book"
@@ -50,15 +67,13 @@ Before using Cartesia STT services, you need:
 
 1. **Cartesia Account**: Sign up at [Cartesia](https://cartesia.ai/)
 2. **API Key**: Generate an API key from your account dashboard
-3. **Model Access**: Ensure access to the ink-whisper transcription model
+3. **Model Access**: Ensure access to the transcription model you plan to use (`ink-whisper` for `CartesiaSTTService`, `ink-2` for `CartesiaTurnsSTTService`)
 
 ### Required Environment Variables
 
 - `CARTESIA_API_KEY`: Your Cartesia API key for authentication
 
-## Configuration
-
-### CartesiaSTTService
+## CartesiaSTTService
 
 <ParamField path="api_key" type="str" required>
   Cartesia API key for authentication.
@@ -107,9 +122,9 @@ Runtime-configurable settings passed via the `settings` constructor argument usi
 | `model`    | `str`             | `"ink-whisper"` | The transcription model to use. _(Inherited from base STT settings.)_    |
 | `language` | `Language \| str` | `"en"`          | Target language for transcription. _(Inherited from base STT settings.)_ |
 
-## Usage
+### Usage
 
-### Basic Setup
+#### Basic Setup
 
 ```python
 from pipecat.services.cartesia.stt import CartesiaSTTService
@@ -119,7 +134,7 @@ stt = CartesiaSTTService(
 )
 ```
 
-### With Custom Options
+#### With Custom Options
 
 ```python
 from pipecat.services.cartesia.stt import CartesiaSTTService
@@ -134,7 +149,7 @@ stt = CartesiaSTTService(
 )
 ```
 
-## Notes
+### Notes
 
 - **Inactivity timeout**: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
 - **Auto-reconnect on send**: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
@@ -147,7 +162,7 @@ stt = CartesiaSTTService(
   guide](/pipecat/fundamentals/service-settings) for migration details.
 </Tip>
 
-## Event Handlers
+### Event Handlers
 
 Cartesia STT supports the standard [service connection events](/api-reference/server/events/service-events):
 
@@ -161,3 +176,139 @@ Cartesia STT supports the standard [service connection events](/api-reference/se
 async def on_connected(service):
     print("Connected to Cartesia STT")
 ```
+
+## CartesiaTurnsSTTService
+
+`CartesiaTurnsSTTService` provides turn-based speech recognition over Cartesia's v2 WebSocket API with the `ink-2` model. The server drives turn boundaries and pushes structured events covering the turn lifecycle: start, updates, eager end predictions, resume, and final turn completion.
+
+<ParamField path="api_key" type="str" required>
+  Cartesia API key for authentication.
+</ParamField>
+
+<ParamField path="url" type="str" default="wss://api.cartesia.ai/stt/turns/websocket">
+  WebSocket URL for the Cartesia Streaming ASR v2 endpoint.
+</ParamField>
+
+<ParamField path="sample_rate" type="int | None" default="None">
+  Audio sample rate in Hz. If `None`, uses the pipeline sample rate.
+</ParamField>
+
+<ParamField path="should_interrupt" type="bool" default="True">
+  Whether to broadcast an interruption when the server signals the start of a new turn.
+</ParamField>
+
+<ParamField path="watchdog_min_timeout" type="float" default="0.5">
+  Minimum idle timeout (in seconds) before sending silence to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`.
+</ParamField>
+
+<ParamField path="extra_headers" type="dict[str, str] | None" default="None">
+  Optional additional HTTP headers to send with the WebSocket handshake.
+</ParamField>
+
+<ParamField path="settings" type="CartesiaTurnsSTTService.Settings" default="None">
+  Runtime-updatable settings. See [Settings](#settings-2) below.
+</ParamField>
+
+### Settings
+
+Runtime-configurable settings passed via the `settings` constructor argument using `CartesiaTurnsSTTService.Settings(...)`. The ink-2 model family is English-only and does not support runtime model or language switching. Attempts to update these fields will be reported as unhandled.
+
+| Parameter  | Type              | Default   | Description                                                               |
+| ---------- | ----------------- | --------- | ------------------------------------------------------------------------- |
+| `model`    | `str`             | `"ink-2"` | The transcription model to use. _(Inherited from base STT settings.)_     |
+| `language` | `Language \| str` | `None`    | Target language (fixed to English). _(Inherited from base STT settings.)_ |
+
+### Usage
+
+#### Basic Setup
+
+```python
+import os
+
+from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService
+
+stt = CartesiaTurnsSTTService(
+    api_key=os.getenv("CARTESIA_API_KEY"),
+)
+```
+
+#### With Custom Configuration
+
+```python
+import os
+
+from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService
+
+stt = CartesiaTurnsSTTService(
+    api_key=os.getenv("CARTESIA_API_KEY"),
+    sample_rate=16000,
+    should_interrupt=True,
+    watchdog_min_timeout=1.0,
+)
+```
+
+#### With Event Handlers
+
+```python
+import os
+
+from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService
+
+stt = CartesiaTurnsSTTService(
+    api_key=os.getenv("CARTESIA_API_KEY"),
+)
+
+@stt.event_handler("on_turn_start")
+async def on_turn_start(service, transcript):
+    print(f"User started speaking: {transcript}")
+
+@stt.event_handler("on_turn_end")
+async def on_turn_end(service, transcript):
+    print(f"Final transcript: {transcript}")
+```
+
+### Turn-Based Protocol
+
+The service speaks the v2 turn-based wire protocol:
+
+```
+connected → turn.start → turn.update* → (turn.eager_end → turn.resume?)* → turn.end → ...
+```
+
+- **`turn.start`**: Server detected the start of a turn. Pushes `UserStartedSpeakingFrame` and optionally broadcasts an interruption.
+- **`turn.update`**: Incremental transcript update. Pushes `InterimTranscriptionFrame`.
+- **`turn.eager_end`**: Server eagerly predicted the end of turn. Available via event handler for speculative downstream processing.
+- **`turn.resume`**: User resumed speaking after an eager end. Available via event handler.
+- **`turn.end`**: Final transcript for the completed turn. Pushes `TranscriptionFrame` and `UserStoppedSpeakingFrame`.
+
+Transcripts are cumulative per turn. There is no `is_final` flag and no `finalize` command — closing the socket ends the session.
+
+### Notes
+
+- **English-only**: The ink-2 model family supports English transcription only at launch.
+- **No runtime model switching**: Unlike the v1 API, `CartesiaTurnsSTTService` does not support runtime model or language switching. Updates to `model` or `language` are reported as unhandled.
+- **Watchdog for dangling turns**: If audio stops flowing after a `turn.start`, the service sends silence to prevent the turn from hanging indefinitely. Configure the threshold with `watchdog_min_timeout`.
+- **Server-driven turns**: The server controls turn boundaries. There is no client-side `finalize` command.
+- **Interruption support**: `should_interrupt` defaults to `True`, so the service broadcasts an interruption when the server signals the start of a new turn, enabling natural turn-taking. Set it to `False` to disable this behavior.
+
+### Event Handlers
+
+Cartesia Turns STT supports the following event handlers:
+
+| Event                 | Handler Signature                         | Description                                                |
+| --------------------- | ----------------------------------------- | ---------------------------------------------------------- |
+| `on_connected`        | `async def(service)`                      | Connected to Cartesia WebSocket                            |
+| `on_disconnected`     | `async def(service)`                      | Disconnected from Cartesia WebSocket                       |
+| `on_connection_error` | `async def(service, error_msg)`           | Connection error occurred                                  |
+| `on_turn_start`       | `async def(service, transcript: str)`     | Server detected start of a turn                            |
+| `on_turn_update`      | `async def(service, transcript: str)`     | Incremental transcript update                              |
+| `on_turn_eager_end`   | `async def(service, transcript: str)`     | Server eagerly predicted end of turn                       |
+| `on_turn_resume`      | `async def(service)`                      | User resumed speaking after an eager end                   |
+| `on_turn_end`         | `async def(service, transcript: str)`     | Final transcript for the completed turn                    |
+
+Example:
+
+```python
+@stt.event_handler("on_turn_eager_end")
+async def on_turn_eager_end(service, transcript):
+    print(f"Eager end prediction: {transcript}")
+    # Optionally start processing speculatively
+
+@stt.event_handler("on_turn_resume")
+async def on_turn_resume(service):
+    print("User resumed speaking, discard speculative processing")
+```
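
The `on_turn_eager_end` and `on_turn_resume` handlers above only print messages. One possible way to make the "start speculatively, discard on resume" idea concrete is sketched below. It is a minimal sketch, not part of Pipecat: the `SpeculativeTurnProcessor` class and the `process` coroutine are hypothetical names introduced for illustration, and the method signatures simply mirror the handler signatures in the table above. The sketch runs on its own with `asyncio` and does not connect to Cartesia.

```python
import asyncio


class SpeculativeTurnProcessor:
    """Hypothetical helper: start work on an eager end, cancel it on resume."""

    def __init__(self, process):
        self._process = process  # async callable that takes a transcript
        self._task = None
        self._task_transcript = None

    async def on_turn_eager_end(self, service, transcript):
        # Replace any earlier speculation with one for the newest transcript.
        await self._cancel()
        self._task_transcript = transcript
        self._task = asyncio.create_task(self._process(transcript))

    async def on_turn_resume(self, service):
        # The user kept talking, so the speculative result is stale.
        await self._cancel()

    async def on_turn_end(self, service, transcript):
        # Reuse the speculative result only if it was built from the same text.
        if self._task is not None and self._task_transcript == transcript:
            task, self._task = self._task, None
            return await task
        await self._cancel()
        return await self._process(transcript)

    async def _cancel(self):
        task, self._task = self._task, None
        if task is not None:
            task.cancel()
            # Wait for the task to finish unwinding; ignore its outcome.
            await asyncio.gather(task, return_exceptions=True)


async def demo():
    calls = []

    async def process(transcript):
        # Stand-in for real downstream work (for example, an LLM request).
        calls.append(transcript)
        await asyncio.sleep(0.01)
        return transcript.upper()

    handler = SpeculativeTurnProcessor(process)
    await handler.on_turn_eager_end(None, "book a table")
    await handler.on_turn_resume(None)  # speculation discarded
    await handler.on_turn_eager_end(None, "book a table for two")
    result = await handler.on_turn_end(None, "book a table for two")
    return result, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a real service, the three methods could presumably be registered through the same decorator shown earlier, for example `stt.event_handler("on_turn_eager_end")(handler.on_turn_eager_end)`. Comparing the transcript at `turn.end` against the one used for speculation guards against reusing a result computed from text that later changed.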