Intermittent truncation of initial audio on PCM path (AudioByteStream -> AudioSource.capture_frame())

### Bug Description

When using a custom TTS plugin that outputs raw PCM/WAV audio through the PCM path (`AudioByteStream -> AudioSource.capture_frame()`), the first few tens to hundreds of milliseconds of audio are intermittently truncated on the receiving end.

In our tests, the truncation occurred only on the PCM path. We did not observe it when using the encoded audio path (for example, MP3 decoded via `AudioStreamDecoder`).
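For context, the PCM path works like this: the TTS plugin produces raw PCM bytes in chunks of arbitrary size, `AudioByteStream` groups them into audio frames, and each frame is passed to `AudioSource.capture_frame()`. The stdlib-only sketch below models just the grouping step, so the byte accounting can be checked without a room connection. `PcmChunker` and `PcmFrame` are hypothetical stand-ins, not the real `AudioByteStream` or `rtc.AudioFrame`; the sketch assumes 16-bit signed PCM and fixed-size frames.

```python
from dataclasses import dataclass

BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM


@dataclass
class PcmFrame:
    """Hypothetical stand-in for rtc.AudioFrame: raw bytes plus format info."""
    data: bytes
    sample_rate: int
    num_channels: int

    @property
    def samples_per_channel(self) -> int:
        return len(self.data) // (BYTES_PER_SAMPLE * self.num_channels)


class PcmChunker:
    """Hypothetical model of a byte-stream-to-frame chunker: buffers
    arbitrary-sized byte chunks and emits fixed-size frames."""

    def __init__(self, sample_rate: int, num_channels: int, samples_per_channel: int) -> None:
        self.sample_rate = sample_rate
        self.num_channels = num_channels
        self._frame_bytes = samples_per_channel * num_channels * BYTES_PER_SAMPLE
        self._buf = bytearray()

    def write(self, data: bytes) -> list[PcmFrame]:
        # Append the new bytes, then cut off as many whole frames as possible.
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= self._frame_bytes:
            chunk = bytes(self._buf[: self._frame_bytes])
            del self._buf[: self._frame_bytes]
            frames.append(PcmFrame(chunk, self.sample_rate, self.num_channels))
        return frames

    def flush(self) -> list[PcmFrame]:
        # Emit the remaining whole samples as a final, shorter frame.
        sample_bytes = self.num_channels * BYTES_PER_SAMPLE
        usable = len(self._buf) - len(self._buf) % sample_bytes
        frames = []
        if usable:
            frames.append(PcmFrame(bytes(self._buf[:usable]), self.sample_rate, self.num_channels))
        self._buf.clear()
        return frames
```

In the model, concatenating the emitted frames reproduces the input bytes exactly. The same invariant should hold for the real frames handed to `capture_frame()`.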

The issue is intermittent and occurs in roughly 1 out of every 3 to 5 attempts under the same conditions.

What we verified:
- The raw audio bytes from the TTS provider are intact when dumped and played locally.
- Waveform analysis shows the initial samples are valid and do not contain abnormal spikes.
- We added debug logging in the Python SDK path and verified that every frame reaches `AudioSource.capture_frame()` successfully.
- We were not able to find any frame loss before `capture_frame()`.

Based on these tests, the loss appears to happen somewhere below the `capture_frame()` handoff point, i.e. after the frames have been handed to `AudioSource`.
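The "no frame loss before `capture_frame()`" check can be made mechanical. At the handoff, record a running CRC32 and byte count of every frame, then compare them with the raw TTS output. `FrameTap` below is a hypothetical debug helper, not part of the LiveKit SDK.

```python
import zlib


class FrameTap:
    """Hypothetical debug helper: records what is handed to capture_frame()
    so it can be compared against the raw TTS output."""

    def __init__(self) -> None:
        self.frames = 0
        self.total_bytes = 0
        self.crc = 0  # running CRC32 over every byte recorded so far

    def record(self, frame_bytes: bytes) -> None:
        self.frames += 1
        self.total_bytes += len(frame_bytes)
        self.crc = zlib.crc32(frame_bytes, self.crc)

    def matches(self, source_pcm: bytes) -> bool:
        # Same length and same CRC32 as the source means no bytes were dropped,
        # altered, or reordered before the handoff (barring a CRC collision).
        return self.total_bytes == len(source_pcm) and self.crc == zlib.crc32(source_pcm)
```

In the agent, `tap.record(bytes(frame.data))` would be called immediately before `await source.capture_frame(frame)`, assuming `frame.data` exposes the frame's samples as a buffer. After the utterance finishes, check `tap.matches(raw_tts_bytes)`. If the final frame is padded, the lengths will differ by the padding.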

### Expected Behavior

All audio frames passed to `AudioSource.capture_frame()` should be delivered to the remote participant without truncation, including the beginning of speech.

### Reproduction Steps

1. Generate short utterances using a custom TTS plugin that returns WAV/PCM audio at 24 kHz mono.
2. Feed the PCM bytes into the `livekit-agents` PCM path (`AudioByteStream`) and forward every emitted frame to `AudioSource.capture_frame()`.
3. Publish the audio track to a LiveKit room and receive it in a browser client.
4. Play the same utterance repeatedly under the same conditions.
5. Intermittently, the beginning of speech is truncated on the receiving side; roughly the first 50-300 ms is lost.

We can provide a minimal reproducer or a captured PCM sample if needed.
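One way to put a number on the 50-300 ms loss across repeated attempts is to record both the PCM sent into `capture_frame()` and the audio captured in the browser, then compare how long the voiced region lasts in each. The sketch below is a crude energy-threshold comparison, not a LiveKit tool. It assumes both recordings are 16-bit mono PCM at the same sample rate, so the browser capture would need resampling to 24 kHz first. The `threshold` and `window_ms` defaults are arbitrary.

```python
from array import array
from typing import Optional


def voiced_span_ms(
    pcm: bytes, sample_rate: int, threshold: int = 1000, window_ms: int = 10
) -> Optional[float]:
    """Milliseconds from the start of the first "voiced" window to the end of the last.

    A window counts as voiced if its peak |sample| reaches `threshold`.
    Assumes 16-bit signed mono PCM in native byte order (little-endian on x86/ARM).
    """
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a dangling odd byte
    win = max(1, sample_rate * window_ms // 1000)
    voiced = [
        start
        for start in range(0, len(samples), win)
        if max(abs(s) for s in samples[start : start + win]) >= threshold
    ]
    if not voiced:
        return None
    end = min(voiced[-1] + win, len(samples))
    return (end - voiced[0]) * 1000 / sample_rate


def estimate_truncation_ms(sent: bytes, received: bytes, sample_rate: int) -> Optional[float]:
    """Sent voiced span minus received voiced span; positive means audio went missing."""
    sent_ms = voiced_span_ms(sent, sample_rate)
    recv_ms = voiced_span_ms(received, sample_rate)
    if sent_ms is None or recv_ms is None:
        return None
    return sent_ms - recv_ms
```

The function measures the span between the first and last voiced windows, so leading silence in the browser recording does not skew the result. Resolution is limited to `window_ms`.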

### Operating System

Linux (production), macOS (development)

### Models Used

_No response_

### Package Versions

```bash
livekit-agents==1.4.2
livekit==1.1.2
Python 3.13
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

No concrete fix proposed yet. We are mainly reporting the issue and sharing the investigation results. If there is a recommended configuration or known workaround for the PCM path, we would like to try it.

### Additional Context

Symptoms:
- A brief "pop" sound at the start of speech
- The first syllable or word is partially or fully cut off
- The same audio may play correctly on one attempt and be truncated on another

Workarounds attempted:
- Insert a delay (30/50/100 ms) before the first frame: no improvement
- Change the `AudioByteStream` chunk size: no improvement
- Switch the Fish Audio TTS output to raw PCM (skipping WAV decoding): slight improvement, but not resolved
- Insert 200 ms of silence padding before the first real audio frame: no improvement
- Change `rtc.AudioSource` `queue_size_ms` to 50 / 100 / 500: no improvement

This is particularly impactful for voice agent applications because users perceive it as the agent cutting off the beginning of sentences.
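For reference, the silence-padding and chunk-size workarounds come down to simple arithmetic for 24 kHz mono 16-bit PCM. 200 ms of padding is 4,800 samples (9,600 bytes), and the frame size chosen for the byte stream determines how much audio each `capture_frame()` call carries. The helpers below are hypothetical; they are a sketch of how such padding and frame durations can be computed, assuming 16-bit PCM.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM


def silence_pcm(duration_ms: int, sample_rate: int, num_channels: int = 1) -> bytes:
    """Zero-valued 16-bit PCM of the given duration (e.g. a lead-in pad)."""
    samples_per_channel = sample_rate * duration_ms // 1000
    return bytes(samples_per_channel * num_channels * BYTES_PER_SAMPLE)


def frame_duration_ms(samples_per_channel: int, sample_rate: int) -> float:
    """Audio duration covered by one frame of the given size."""
    return samples_per_channel * 1000 / sample_rate


# 200 ms of padding at 24 kHz mono: 4,800 samples -> 9,600 bytes.
pad = silence_pcm(200, 24000)
# 240 samples per channel at 24 kHz is a 10 ms frame.
ten_ms = frame_duration_ms(240, 24000)
```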

### Screenshots and Recordings

_No response_

Intermittent truncation of initial audio on PCM path (AudioByteStream -> AudioSource.capture_frame()) #5158

Bug Description

Expected Behavior

Reproduction Steps

Operating System

Models Used

Package Versions

Session/Room/Call IDs

Proposed Solution

Additional Context

Screenshots and Recordings

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Intermittent truncation of initial audio on PCM path (AudioByteStream -> AudioSource.capture_frame()) #5158

Description

Bug Description

Expected Behavior

Reproduction Steps

Operating System

Models Used

Package Versions

Session/Room/Call IDs

Proposed Solution

Additional Context

Screenshots and Recordings

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions