feat(agents): add modality-aware Instructions with audio/text variants by toubatbrian · Pull Request #1484 · livekit/agents-js

toubatbrian · 2026-05-13T01:56:59Z

Description

Introduces a new Instructions class for system prompts that adapt to the user's input modality (audio vs. text). This enables agents to provide different guidance to the LLM depending on whether the user is speaking or typing—for example, instructing the LLM to normalize spoken expressions like "next Tuesday" when processing voice input, while treating text input literally.

The pipeline now applies the matching variant before each LLM turn based on SpeechHandle.inputDetails.modality, and AgentSession.generateReply() exposes an inputModality option to control which variant is used.
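As a rough illustration of the idea, a modality-aware instructions holder could look like the sketch below. This is not the code from chat_context.ts: the constructor shape and the private `active` field are assumptions made for the example, and only `value`, `asModality()`, and `toJSON()` are names taken from this description.

```typescript
// Illustrative sketch only; the real class lives in chat_context.ts and may differ.
type Modality = 'audio' | 'text';

class Instructions {
  readonly audio: string;
  readonly text: string;
  private readonly active: Modality;

  // Constructor shape is an assumption made for this example.
  constructor(audio: string, text: string, active: Modality = 'audio') {
    this.audio = audio;
    this.text = text;
    this.active = active; // audio is the default variant
  }

  // Renders the currently active variant.
  get value(): string {
    return this.active === 'audio' ? this.audio : this.text;
  }

  // Returns a copy with `modality` active; both variants are kept for later switches.
  asModality(modality: Modality): Instructions {
    return new Instructions(this.audio, this.text, modality);
  }

  // Serializes both variants, not just the active one.
  toJSON(): { audio: string; text: string } {
    return { audio: this.audio, text: this.text };
  }
}

const scheduling = new Instructions(
  'Normalize spoken dates such as "next Tuesday" before booking.',
  'Treat typed dates literally.',
);

console.log(scheduling.value); // audio variant (the default)
console.log(scheduling.asModality('text').value); // text variant
```

Returning a copy from `asModality()` rather than mutating in place means a later turn can switch back without losing either variant.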

Changes Made

New Instructions class (chat_context.ts):
- Holds both audio and text variants of system instructions
- value property renders the currently active variant (defaults to audio)
- asModality(modality) returns a copy with the specified variant active, preserving both variants for future switches
- concat() method propagates both variants when combining instructions
- toJSON() serializes both variants for persistence
New concatInstructions() helper (chat_context.ts):
- Concatenates any mix of strings and Instructions objects
- Propagates both audio/text variants from all operands
- Returns a plain string if no Instructions are involved, otherwise returns Instructions
New applyInstructionsModality() function (generation.ts):
- Locates the instructions message in the chat context
- Applies the correct variant based on input modality before LLM inference
- No-op when no modality-aware instructions are present
Updated SpeechHandle (speech_handle.ts):
- Added InputDetails interface with modality: 'audio' | 'text'
- SpeechHandle.create() now accepts optional inputDetails parameter
- Getter inputDetails exposes the modality for the current turn
Updated AgentSession.generateReply() (agent_session.ts):
- New optional inputModality parameter (defaults to 'text')
- Passed through to AgentActivity for modality-aware instruction selection
Updated Agent and AgentOptions (agent.ts):
- instructions field now accepts string | Instructions
Updated AgentConfigUpdate (chat_context.ts):
- instructions field now accepts string | Instructions
- Serializes Instructions via toJSON() when present
Provider format adapters (openai, google, mistralai):
- Updated to handle Instructions content by extracting the value property
Utility updates (utils.ts):
- validateChatContextStructure() recognizes instructions type
- formatMessageContentPart() extracts value from Instructions
Example agent (instructions_per_modality.ts):
- Demonstrates a scheduling assistant with different instructions for voice vs. text users
- Voice users get guidance on parsing spoken expressions and self-corrections
- Text users get guidance on accepting literal input and skipping unnecessary confirmations
Comprehensive test suite (chat_context.test.ts):
- Tests serialization, concatenation, modality switching, and round-tripping
- Tests interaction with applyInstructionsModality() and ChatContext.copy()
- Verifies both variants are preserved across turns
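
The concatenation rule described for concatInstructions() above can be sketched as follows. The `Variants` type and the `concatVariants` name are hypothetical stand-ins, and joining without a separator is an assumption; the actual helper in chat_context.ts may behave differently.

```typescript
// Hypothetical stand-in for an Instructions object holding both variants.
interface Variants {
  audio: string;
  text: string;
}

// Plain strings contribute to both variants; variant objects contribute each side.
// Returns a plain string when no variant object is involved.
function concatVariants(...parts: Array<string | Variants>): string | Variants {
  let audio = '';
  let text = '';
  let sawVariants = false;

  for (const part of parts) {
    if (typeof part === 'string') {
      audio += part;
      text += part;
    } else {
      audio += part.audio;
      text += part.text;
      sawVariants = true;
    }
  }

  return sawVariants ? { audio, text } : audio;
}

const combined = concatVariants('You are a scheduling assistant. ', {
  audio: 'Confirm dates you heard.',
  text: 'Accept dates as typed.',
});
console.log(combined);
```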

Pre-Review Checklist

Build passes: All builds (lint, typecheck, tests) pass locally
AI-generated code reviewed: Code is hand-written and follows project conventions
Changes explained: All changes are documented above and in code comments
Scope appropriate: All changes relate to modality-aware instructions
Video demo: Not applicable (framework feature, not user-facing UI)

Testing

Added comprehensive unit tests covering:
- Serialization and JSON round-tripping
- Concatenation

Port livekit/agents#4987 to JS. Adds an Instructions class that holds separate audio/text system prompt variants and is resolved per-turn based on the input modality. The voice pipeline now calls applyInstructionsModality() before each LLM turn using the modality from SpeechHandle.inputDetails, and AgentSession.generateReply() takes a new inputModality option (defaults to 'text', matching Python). Provider format adapters (openai, google, mistralai) and remote_session render Instructions as their resolved string value.
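
The per-turn resolution step can be pictured with the minimal sketch below. The `Message` and `Variants` shapes and the `applyModality` and `render` names are hypothetical, and returning a new array instead of mutating the chat context is an assumption; the real applyInstructionsModality() in generation.ts operates on the library's own ChatContext type.

```typescript
type Modality = 'audio' | 'text';

// Hypothetical shapes standing in for Instructions and chat messages.
interface Variants {
  audio: string;
  text: string;
  active: Modality;
}

interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string | Variants;
}

// Activates the variant matching the turn's modality on system messages that
// carry variants. Messages with plain string content pass through untouched,
// so the call is a no-op when nothing is modality-aware.
function applyModality(messages: Message[], modality: Modality): Message[] {
  return messages.map((m) =>
    m.role === 'system' && typeof m.content !== 'string'
      ? { ...m, content: { ...m.content, active: modality } }
      : m,
  );
}

// Resolves content to the string a provider adapter would send.
function render(content: string | Variants): string {
  if (typeof content === 'string') return content;
  return content.active === 'audio' ? content.audio : content.text;
}

const ctx: Message[] = [
  { role: 'system', content: { audio: 'Parse spoken dates.', text: 'Take dates literally.', active: 'audio' } },
  { role: 'user', content: 'Book me for 2026-05-19.' },
];

// A typed turn selects the text variant before the LLM call.
const forTextTurn = applyModality(ctx, 'text');
console.log(forTextTurn.map((m) => render(m.content)));
```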

changeset-bot · 2026-05-13T01:57:02Z

🦋 Changeset detected

Latest commit: 4f635e7

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 31 packages

Name	Type
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-fishaudio	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-hume	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-liveavatar	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-minimax	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-mistralai	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-runway	Patch
@livekit/agents-plugin-sarvam	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugins-test	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugin-xai	Patch

CLAassistant · 2026-05-13T01:57:07Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

chatItemToProto was passing the raw Instructions object through to the proto's text field, producing corrupt telemetry content. Extract .value when the content item is an Instructions instance.

The ProtoMessage.text field is typed as ChatContent, so the original code intentionally passed objects through. Only Instructions need unwrapping to their rendered value; image/audio content should pass through unchanged so the OTLP exporter can serialize the full structure.
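
A minimal sketch of that unwrapping rule, under stated assumptions: the `Instructions` stand-in, the `ImageContent` shape, and the `toProtoText` name are all hypothetical and only illustrate "unwrap Instructions, pass everything else through".

```typescript
// Minimal stand-in for the Instructions class; only `value` matters here.
class Instructions {
  readonly audio: string;
  readonly text: string;

  constructor(audio: string, text: string) {
    this.audio = audio;
    this.text = text;
  }

  get value(): string {
    return this.audio; // audio is the default variant
  }
}

// Hypothetical structured content part.
interface ImageContent {
  type: 'image_content';
  image: string;
}

type ChatContent = string | Instructions | ImageContent;

// Only Instructions are reduced to their rendered string; strings and
// structured content are returned unchanged for the exporter to serialize.
function toProtoText(content: ChatContent): string | ImageContent {
  return content instanceof Instructions ? content.value : content;
}

const image: ImageContent = { type: 'image_content', image: 'https://example.com/a.png' };
console.log(toProtoText(new Instructions('spoken prompt', 'typed prompt')));
console.log(toProtoText(image) === image); // same object, not a copy
```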

`===` on `Instructions` objects compares by reference, so two distinct instances with identical audio/text would falsely mark the realtime session as non-reusable on handoff. Add an `instructionsEqual` helper that compares strings by value and Instructions by audio + text, and use it in `_detachReusableResources`.
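
The comparison described above might look roughly like this. The signature, the `undefined` handling, and treating a string as unequal to an Instructions object are assumptions; only the helper name and the "audio + text" rule come from the commit message.

```typescript
// Minimal stand-in for the Instructions class.
class Instructions {
  readonly audio: string;
  readonly text: string;

  constructor(audio: string, text: string) {
    this.audio = audio;
    this.text = text;
  }
}

type MaybeInstructions = string | Instructions | undefined;

// Strings compare by value; Instructions compare by their audio and text
// variants rather than by object identity.
function instructionsEqual(a: MaybeInstructions, b: MaybeInstructions): boolean {
  if (a instanceof Instructions && b instanceof Instructions) {
    return a.audio === b.audio && a.text === b.text;
  }
  // Covers string vs. string, undefined vs. undefined, and mixed types.
  return a === b;
}

const before = new Instructions('Speak briefly.', 'Write briefly.');
const after = new Instructions('Speak briefly.', 'Write briefly.');

console.log(before === after); // false: distinct instances
console.log(instructionsEqual(before, after)); // true: same audio and text
```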

theomonnom · 2026-05-14T23:56:11Z

    params: {
      id?: string;
-      instructions?: string;
+      instructions?: string | Instructions;


I think we made a mistake in Python to use | Instructions. I'll remove that on Python.

Let's only use string for everything inside the ChatContext.

SG, I'll make a follow-up PR once the Python side has made the change.

theomonnom

lgtm

chore(changeset): downgrade to patch bump

5c06ab3

claude and others added 4 commits May 13, 2026 02:03

fix(telemetry): stringify Instructions in chatItemToProto

0892490

fix(telemetry): import Instructions from chat_context to avoid circul…

2aa45ac

…ar import

fix(telemetry): sort imports per prettier

ed28c1e

reduce parity gaps

9c68660

toubatbrian changed the title ~~feat(agents): add modality-aware Instructions with audio/text variants~~ brianyin/agt-2873-new-instructions-api May 13, 2026

toubatbrian changed the title ~~brianyin/agt-2873-new-instructions-api~~ feat(agents): add modality-aware Instructions with audio/text variants May 13, 2026

toubatbrian added the verified-port label May 13, 2026

toubatbrian added 3 commits May 13, 2026 15:43

simplify instruction type guard & align

933e44f

Merge branch 'main' into claude/port-instruction-api-js-pNpnn

e26a332

Merge branch 'main' into claude/port-instruction-api-js-pNpnn

80ad765

theomonnom reviewed May 14, 2026

theomonnom approved these changes May 15, 2026

toubatbrian merged commit b7fdbe1 into main May 15, 2026
8 of 9 checks passed

toubatbrian deleted the claude/port-instruction-api-js-pNpnn branch May 15, 2026 00:20

This was referenced May 14, 2026

Version Packages #1499

Open

Version Packages KrishnaShuk/agents-js#1

Open

toubatbrian merged 11 commits into main from claude/port-instruction-api-js-pNpnn

4 participants
