| 1 | +# Audio Converter |
| 2 | + |
| 3 | +The `AudioConverter` remixes and resamples PCM audio frames to a target sample rate and channel count. It operates on 10 ms frames of 16‑bit PCM data and returns the number of output samples produced for each 10 ms frame. |
| 4 | + |
| 5 | +API: `dev.onvoid.webrtc.media.audio.AudioConverter` |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +- Input format: 16‑bit little‑endian PCM (`byte[]`) |
| 10 | +- Frame duration: exactly 10 ms per call |
| 11 | +- Channel remixing: up/down‑mix between mono/stereo (and other counts if supported by the native backend) |
| 12 | +- Resampling: arbitrary input/output sample rates (e.g., 48 kHz → 16 kHz) |
| 13 | +- Memory ownership: you provide both input and output buffers |
| 14 | +- Native resources: must be released with `dispose()` when done |
| 15 | + |
| 16 | +Key methods: |
| 17 | +- `AudioConverter(int srcSampleRate, int srcChannels, int dstSampleRate, int dstChannels)` – configure the converter |
| 18 | +- `int getTargetBufferSize()` – bytes required for the destination buffer for one 10 ms frame |
| 19 | +- `int convert(byte[] src, byte[] dst)` – convert one 10 ms input frame into the destination buffer; returns the number of samples written for that frame (total across all channels) |
| 20 | +- `void dispose()` – free native resources |
| 21 | + |
| 22 | +## Frame sizing |
| 23 | + |
| 24 | +The converter operates on 10 ms frames. For a given sample rate and channel count, the size of one 10 ms frame is derived as follows: |
| 25 | + |
| 26 | +- Samples per channel = sampleRate / 100 |
| 27 | +- Total samples (all channels) = samples per channel × channels |
| 28 | +- Bytes required = total samples × 2 (because 16‑bit PCM) |
| 29 | + |
| 30 | +Examples: |
| 31 | +- 48 kHz stereo input: samples = (48000 / 100) × 2 = 480 × 2 = 960 samples → 1920 bytes |
| 32 | +- 16 kHz mono output: samples = (16000 / 100) × 1 = 160 samples → 320 bytes |
| 33 | + |
| 34 | +The method `getTargetBufferSize()` returns the exact number of bytes you need for the destination buffer for one 10 ms frame of the configured output format. |
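
The formulas above can be sketched as a small helper for sizing the source buffer. `FrameSizing` and its method names are hypothetical and not part of the library; the sketch only assumes 16‑bit PCM and 10 ms frames as described in this guide.

```java
// Hypothetical helper, not part of webrtc-java: applies the frame sizing formulas.
public final class FrameSizing {

    private static final int BYTES_PER_SAMPLE = 2; // 16-bit PCM

    private FrameSizing() {
    }

    /** Samples per channel in one 10 ms frame. */
    public static int samplesPerChannel(int sampleRate) {
        return sampleRate / 100;
    }

    /** Total samples across all channels in one 10 ms frame. */
    public static int totalSamples(int sampleRate, int channels) {
        return samplesPerChannel(sampleRate) * channels;
    }

    /** Bytes needed to hold one 10 ms frame of 16-bit PCM. */
    public static int frameBytes(int sampleRate, int channels) {
        return totalSamples(sampleRate, channels) * BYTES_PER_SAMPLE;
    }

    public static void main(String[] args) {
        System.out.println(frameBytes(48000, 2)); // 48 kHz stereo
        System.out.println(frameBytes(16000, 1)); // 16 kHz mono
    }
}
```

For the destination side, `getTargetBufferSize()` remains the authoritative value; a helper like this is mainly useful for allocating the source frame.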
| 35 | + |
| 36 | +## Basic usage |
| 37 | + |
| 38 | +```java |
| 39 | +import dev.onvoid.webrtc.media.audio.AudioConverter; |
| 40 | + |
| 41 | +// Convert 48 kHz stereo to 16 kHz mono |
| 42 | +int srcSampleRate = 48000; |
| 43 | +int srcChannels = 2; |
| 44 | +int dstSampleRate = 16000; |
| 45 | +int dstChannels = 1; |
| 46 | + |
| 47 | +AudioConverter converter = new AudioConverter(srcSampleRate, srcChannels, dstSampleRate, dstChannels); |
| 48 | + |
| 49 | +try { |
| 50 | + // Compute 10 ms frame sizes |
| 51 | + int srcSamplesPer10ms = (srcSampleRate / 100) * srcChannels; // 480 * 2 = 960 samples |
| 52 | + int srcBytesPer10ms = srcSamplesPer10ms * 2; // 1920 bytes |
| 53 | + |
| 54 | + byte[] srcFrame = new byte[srcBytesPer10ms]; |
| 55 | + |
| 56 | + // Destination buffer for one 10 ms frame of output |
| 57 | + byte[] dstFrame = new byte[converter.getTargetBufferSize()]; // e.g., 320 bytes for 16 kHz mono |
| 58 | + |
| 59 | + // Fill srcFrame from your capture or pipeline (exactly 10 ms of PCM 16‑bit data) |
| 60 | + // ... |
| 61 | + |
| 62 | + int outSamples = converter.convert(srcFrame, dstFrame); |
| 63 | + // outSamples equals (dstSampleRate / 100) * dstChannels, e.g., 160 for 16 kHz mono |
| 64 | + |
| 65 | + // Process/use dstFrame (contains 10 ms of resampled/remixed PCM 16‑bit data) |
| 66 | +} |
| 67 | +finally { |
| 68 | + converter.dispose(); |
| 69 | +} |
| 70 | +``` |
| 71 | + |
| 72 | +## Continuous conversion loop |
| 73 | + |
| 74 | +```java |
| 75 | +AudioConverter converter = new AudioConverter(48000, 2, 48000, 1); // stereo to mono, same rate |
| 76 | + |
| 77 | +try { |
| 78 | + int srcBytesPer10ms = (48000 / 100) * 2 /*channels*/ * 2 /*bytes*/; // 960 samples * 2 bytes = 1920 |
| 79 | + byte[] srcFrame = new byte[srcBytesPer10ms]; |
| 80 | + byte[] dstFrame = new byte[converter.getTargetBufferSize()]; |
| 81 | + |
| 82 | + while (running) { |
| 83 | + // Read exactly 10 ms of input into srcFrame |
| 84 | + // ... |
| 85 | + |
| 86 | + converter.convert(srcFrame, dstFrame); |
| 87 | + |
| 88 | + // Write/queue dstFrame to the next stage (encoder, file, etc.) |
| 89 | + } |
| 90 | +} |
| 91 | +finally { |
| 92 | + converter.dispose(); |
| 93 | +} |
| 94 | +``` |
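
Capture sources often deliver chunks that are not exactly 10 ms long. One way to satisfy the "exactly 10 ms" requirement of the loop above is to buffer incoming bytes and emit only complete frames. The following is a minimal sketch; `FrameChunker` is a hypothetical class, not part of the library, and it assumes interleaved 16‑bit PCM input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of webrtc-java: regroups arbitrary-size
// PCM chunks into exact 10 ms frames.
public final class FrameChunker {

    private final byte[] pending; // holds a partial frame between calls
    private int pendingLength;

    public FrameChunker(int sampleRate, int channels) {
        // (samples per channel) * channels * 2 bytes per 16-bit sample
        this.pending = new byte[(sampleRate / 100) * channels * 2];
    }

    /** Size in bytes of one 10 ms frame. */
    public int frameBytes() {
        return pending.length;
    }

    /** Bytes buffered so far that do not yet form a complete frame. */
    public int pendingBytes() {
        return pendingLength;
    }

    /** Appends data and returns every complete 10 ms frame now available. */
    public List<byte[]> push(byte[] data, int offset, int length) {
        List<byte[]> frames = new ArrayList<>();
        while (length > 0) {
            int n = Math.min(length, pending.length - pendingLength);
            System.arraycopy(data, offset, pending, pendingLength, n);
            pendingLength += n;
            offset += n;
            length -= n;
            if (pendingLength == pending.length) {
                // Copy so the caller owns the frame; the internal buffer is reused.
                frames.add(Arrays.copyOf(pending, pending.length));
                pendingLength = 0;
            }
        }
        return frames;
    }
}
```

Each returned array can then be passed as `src` to `convert`. Copying every frame keeps the sketch simple; a latency-sensitive pipeline may prefer to reuse a single frame buffer instead.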
| 95 | + |
| 96 | +## Error handling and caveats |
| 97 | + |
| 98 | +- Frame length must be exactly 10 ms. If `src` has fewer samples than required, `convert` throws `IllegalArgumentException`. |
| 99 | +- Ensure `dst` is at least `getTargetBufferSize()` bytes. Otherwise, `IllegalArgumentException` is thrown. |
| 100 | +- Audio is assumed to be 16‑bit PCM. Do not pass float or 24‑bit samples. |
| 101 | +- Always call `dispose()` to free native resources when the converter is no longer needed. |
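
If the upstream stage produces float samples, they need to be converted to 16‑bit little‑endian PCM before being handed to the converter. A minimal sketch of that conversion, assuming interleaved floats normalized to the range [-1.0, 1.0]; `Pcm16` is a hypothetical helper and not part of the library:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of webrtc-java: float samples to 16-bit LE PCM.
public final class Pcm16 {

    private Pcm16() {
    }

    /** Converts normalized float samples to 16-bit little-endian PCM bytes. */
    public static byte[] fromFloat(float[] samples) {
        ByteBuffer out = ByteBuffer.allocate(samples.length * 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float sample : samples) {
            // Clamp out-of-range values instead of letting them wrap around.
            float clamped = Math.max(-1.0f, Math.min(1.0f, sample));
            out.putShort((short) Math.round(clamped * 32767.0f));
        }
        return out.array();
    }
}
```

The sketch applies no dithering and scales symmetrically by 32767, which is a common simplification and not necessarily what the native backend does internally.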
| 102 | + |
| 103 | +## Related guides |
| 104 | + |
| 105 | +- [Audio Processing](guide/audio_processing.md) |
| 106 | +- [Headless Audio](guide/headless_audio_device_module.md) |
| 107 | +- [Voice Activity Detector](guide/voice_activity_detector.md) |