Merged
Contributor Author

Hi @rohitjoins,

I don't understand why this is needed. According to my reading of Tables 7-5, 7-6 and 7-7, you don't need to check this and can just assume the embedded stereo and 6ch flags are false if the static fields aren't present.

Contributor

Thank you for looking into these commits and providing feedback.

I took a closer look at the ETSI TS 102 114 V1.6.1 specification to verify this, and unfortunately, assuming those flags are false when bStaticFieldsPresent == 0 violates the spec and will cause a parser desync.

According to Section 7.5.2 (Page 98), the definition of bStaticFieldsPresent states: "If the bStaticFieldsPresent is false, the metadata fields that are static over the duration of an encoded stream are omitted from the extension substream header."

Because they are static over the duration of an encoded stream, they are simply omitted to save bandwidth; they do not become false. They retain whatever value was established in the initial keyframe.

Since DtsUtil.parseDtsHdHeader is a stateless utility, it has no memory of the stream's true bEmbeddedStereoFlag. If we blindly assume the flag is false on an intermediate frame, but the stream actually has embedded stereo, our parser will fail to skip those 8 bits and instantly desync. By the time it tries to read the coding mode for the MIME type, it will be reading misaligned garbage data.
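
To make the desync concrete, here is a minimal sketch, not the Media3 code. It assumes a simplified layout that follows the field order used in this PR for Table 7-6 (bDRCCoefPresent, nuDRCCode, bDialNormPresent, nuDialNormCode, nuDRC2ChDmixCode), followed directly by the 2-bit nuCodingMode; the mixing metadata branch is left out. `DesyncDemo`, its nested `BitReader` and the three-byte `FRAME` are hypothetical names and data invented for illustration.

```java
final class DesyncDemo {
  /** Minimal MSB-first bit reader; a hypothetical stand-in for ParsableBitArray. */
  static final class BitReader {
    private final byte[] data;
    private int bitPosition;

    BitReader(byte[] data) {
      this.data = data;
    }

    boolean readBit() {
      return readBits(1) == 1;
    }

    int readBits(int count) {
      int value = 0;
      for (int i = 0; i < count; i++) {
        int currentByte = data[bitPosition / 8] & 0xFF;
        int bit = (currentByte >> (7 - (bitPosition % 8))) & 1;
        value = (value << 1) | bit;
        bitPosition++;
      }
      return value;
    }

    void skipBits(int count) {
      bitPosition += count;
    }
  }

  // Made-up frame: bDRCCoefPresent=1, nuDRCCode=0x00, bDialNormPresent=0,
  // nuDRC2ChDmixCode=0xFF, nuCodingMode=2, then four padding bits.
  static final byte[] FRAME = {(byte) 0x80, (byte) 0x3F, (byte) 0xE0};

  /** Skips the simplified dynamic metadata and returns the 2-bit coding mode that follows. */
  static int readCodingMode(byte[] frame, boolean embeddedStereo) {
    BitReader bits = new BitReader(frame);
    boolean hasDrcCoef = bits.readBit(); // bDRCCoefPresent
    if (hasDrcCoef) {
      bits.skipBits(8); // nuDRCCode
    }
    if (bits.readBit()) { // bDialNormPresent
      bits.skipBits(5); // nuDialNormCode
    }
    if (hasDrcCoef && embeddedStereo) {
      bits.skipBits(8); // nuDRC2ChDmixCode, only present for embedded stereo
    }
    return bits.readBits(2); // nuCodingMode
  }
}
```

With the true flag, `readCodingMode(FRAME, true)` lands on the coding mode written into the frame (2). With the flag wrongly assumed false, `readCodingMode(FRAME, false)` reads two bits of the downmix code instead and returns 3, the reserved mode for which parseDecoderNavigationData in this PR throws.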

Therefore, because we cannot reliably parse the Dynamic Metadata statelessly, the parsing has to stay confined to if (staticFieldsPresent), with the sampleMimeType fallback in DtsReader covering intermediate frames, to prevent crashes.

Please let me know if my interpretation of the spec is correct, and if I should send this PR in its current state for internal review. Thanks!
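
The sampleMimeType fallback mentioned above can be sketched in isolation as follows. This is a simplified illustration under the assumption that a null MIME type means "not parsed on this frame"; `MimeTypeFallback`, `resolveSampleMimeType` and `formatChanged` are hypothetical names, not Media3 APIs.

```java
import java.util.Objects;

final class MimeTypeFallback {
  /** Returns the freshly parsed MIME type, or the previously known one if none was parsed. */
  static String resolveSampleMimeType(String parsedMimeType, String previousMimeType) {
    return parsedMimeType != null ? parsedMimeType : previousMimeType;
  }

  /** Returns whether the resolved MIME type differs from the one already in use. */
  static boolean formatChanged(String parsedMimeType, String previousMimeType) {
    String resolved = resolveSampleMimeType(parsedMimeType, previousMimeType);
    return !Objects.equals(resolved, previousMimeType);
  }
}
```

An intermediate frame without static fields yields a null parsed type, so the previous type is kept and no format change is signalled; a keyframe that reports a different type still triggers an update.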

Contributor Author

Thanks for explaining, you're right :)


@nift4 Is your next PR going to modify the DTS format detection of MKV files?

Contributor Author

Yes

@@ -15,6 +15,7 @@
  */
 package androidx.media3.extractor;

+import static com.google.common.base.Preconditions.checkNotNull;
 import static java.lang.annotation.ElementType.TYPE_USE;
 import static java.lang.annotation.RetentionPolicy.SOURCE;

@@ -45,7 +46,7 @@ public final class DtsUtil {
   /** Information parsed from a DTS frame header. */
   public static final class DtsHeader {
     /** The mime type of the DTS bitstream. */
-    public final @DtsAudioMimeType String mimeType;
+    @Nullable public final @DtsAudioMimeType String mimeType;

     /** The audio sampling rate in Hertz, or {@link C#RATE_UNSET_INT} if unknown. */
     public final int sampleRate;
@@ -63,7 +64,7 @@ public static final class DtsHeader {
     public final int bitrate;

     private DtsHeader(
-        String mimeType,
+        @Nullable String mimeType,
         int channelCount,
         int sampleRate,
         int frameSize,
@@ -429,6 +430,8 @@ public static DtsHeader parseDtsHdHeader(byte[] header) throws ParserException {
     int assetsCount; // nuNumAssets
     int referenceClockCode; // nuRefClockCode
     int extensionSubstreamFrameDurationCode; // nuExSSFrameDurationCode
+    boolean enableMixMetadata = false; // bMixMetadataEnbl
+    int[] mixerOutChannels = null;

     boolean staticFieldsPresent = headerBits.readBit(); // bStaticFieldsPresent
     if (staticFieldsPresent) {
@@ -456,13 +459,16 @@ public static DtsHeader parseDtsHdHeader(byte[] header) throws ParserException {
         }
       }

-      if (headerBits.readBit()) { // bMixMetadataEnbl
+      enableMixMetadata = headerBits.readBit();
+      if (enableMixMetadata) { // bMixMetadataEnbl
         headerBits.skipBits(2); // nuMixMetadataAdjLevel
         int mixerOutputMaskBits = (headerBits.readBits(2) + 1) << 2; // nuBits4MixOutMask
         int mixerOutputConfigurationCount = headerBits.readBits(2) + 1; // nuNumMixOutConfigs
+        mixerOutChannels = new int[mixerOutputConfigurationCount];
         // Output Mixing Configuration Loop
         for (int i = 0; i < mixerOutputConfigurationCount; i++) {
-          headerBits.skipBits(mixerOutputMaskBits); // nuMixOutChMask
+          int mask = headerBits.readBits(mixerOutputMaskBits); // nuMixOutChMask
+          mixerOutChannels[i] = getRemapChannelCount(mask);
         }
       }
     } else {
@@ -476,9 +482,13 @@ public static DtsHeader parseDtsHdHeader(byte[] header) throws ParserException {
     headerBits.skipBits(extensionSubstreamFrameSizeBits); // nuAssetFsize
     int sampleRate = C.RATE_UNSET_INT;
     int channelCount = C.LENGTH_UNSET; // nuTotalNumChs
+    boolean embeddedStereo = false; // bEmbeddedStereoFlag
+    boolean embedded6ch = false; // bEmbeddedSixChFlag

-    // Asset descriptor, see ETSI TS 102 114 V1.6.1 (2019-08) Table 7-5.
+    // Asset descriptor: Size, Index and Per Stream Static Metadata, see ETSI TS 102 114 V1.6.1
+    // (2019-08) Table 7-5.
     headerBits.skipBits(9 + 3); // nuAssetDescriptFsize, nuAssetIndex
+    String mimeType = null;
     if (staticFieldsPresent) {
       if (headerBits.readBit()) { // bAssetTypeDescrPresent
         headerBits.skipBits(4); // nuAssetTypeDescriptor
@@ -493,8 +503,47 @@ public static DtsHeader parseDtsHdHeader(byte[] header) throws ParserException {
       headerBits.skipBits(5); // nuBitResolution
       sampleRate = SAMPLE_RATE_BY_INDEX[headerBits.readBits(4)]; // nuMaxSampleRate
       channelCount = headerBits.readBits(8) + 1;
-      // Done reading necessary bits, ignoring the rest.
+      if (headerBits.readBit()) { // bOne2OneMapChannels2Speakers
+        if (channelCount > 2) {
+          embeddedStereo = headerBits.readBit(); // bEmbeddedStereoFlag
+        }
+        if (channelCount > 6) {
+          embedded6ch = headerBits.readBit(); // bEmbeddedSixChFlag
+        }
+        int speakerMaskLength = 0;
+        if (headerBits.readBit()) { // bSpkrMaskEnabled
+          speakerMaskLength = (headerBits.readBits(2) + 1) << 2; // nuNumBits4SAMask
+          headerBits.skipBits(speakerMaskLength); // nuSpkrActivityMask
+        }
+        int speakerRemapSetsCount = headerBits.readBits(3); // nuNumSpkrRemapSets
+        int[] speakerRemapSets = new int[speakerRemapSetsCount];
+        for (int i = 0; i < speakerRemapSetsCount; i++) {
+          speakerRemapSets[i] = headerBits.readBits(speakerMaskLength); // nuStndrSpkrLayoutMask[ns]
+        }
+        for (int i = 0; i < speakerRemapSetsCount; i++) {
+          int remapChannelCount = getRemapChannelCount(speakerRemapSets[i]);
+          int remapMaskLength = headerBits.readBits(5) + 1; // nuNumDecCh4Remap[ns]
+          for (int j = 0; j < remapChannelCount; j++) {
+            int remapMask = headerBits.readBits(remapMaskLength); // nuRemapDecChMask[ns][nCh]
+            int coef = Integer.bitCount(remapMask); // nCoef
+            headerBits.skipBits(coef * 5); // nuSpkrRemapCodes[ns][nCh][nc]
+          }
+        }
+      } else {
+        headerBits.skipBits(3); // nuRepresentationType
+      }
+
+      parseAssetDescriptorDynamicData(
+          headerBits,
+          embeddedStereo,
+          embedded6ch,
+          channelCount,
+          enableMixMetadata,
+          mixerOutChannels);
+
+      mimeType = parseDecoderNavigationData(headerBits);
     }
+    // Done reading necessary bits, ignoring the rest.

     long frameDurationUs = C.TIME_UNSET;
     if (staticFieldsPresent) {
@@ -521,14 +570,151 @@ public static DtsHeader parseDtsHdHeader(byte[] header) throws ParserException {
               extensionSubstreamFrameDurationCode, C.MICROS_PER_SECOND, referenceClockFrequency);
     }
     return new DtsHeader(
-        MimeTypes.AUDIO_DTS_EXPRESS,
+        mimeType,
         channelCount,
         sampleRate,
         extensionSubstreamFrameSize,
         frameDurationUs,
         /* bitrate= */ 0);
   }

+  // Asset descriptor: Dynamic Metadata - DRC, DNC and Mixing Metadata, see ETSI TS 102 114
+  // V1.6.1 (2019-08) Table 7-6.
+  private static void parseAssetDescriptorDynamicData(
+      ParsableBitArray headerBits,
+      boolean embeddedStereo,
+      boolean embedded6ch,
+      int channelCount,
+      boolean enableMixMetadata,
+      @Nullable int[] mixerOutChannels) {
+    boolean hasDrcCoef = headerBits.readBit();
+    if (hasDrcCoef) { // bDRCCoefPresent
+      headerBits.skipBits(8); // nuDRCCode
+    }
+    if (headerBits.readBit()) { // bDialNormPresent
+      headerBits.skipBits(5); // nuDialNormCode
+    }
+    if (hasDrcCoef && embeddedStereo) {
+      headerBits.skipBits(8); // nuDRC2ChDmixCode
+    }
+    if (enableMixMetadata && headerBits.readBit()) { // bMixMetadataPresent
+      checkNotNull(mixerOutChannels);
+      headerBits.skipBits(1 + 6); // bExternalMixFlag, nuPostMixGainAdjCode
+      if (headerBits.readBits(2) < 3) { // nuControlMixerDRC
+        headerBits.skipBits(3); // nuLimit4EmbeddedDRC
+      } else {
+        headerBits.skipBits(8); // nuCustomDRCCode
+      }
+      boolean audioScalePerChannel = headerBits.readBit(); // bEnblPerChMainAudioScale
+      for (int mixerOutChannel : mixerOutChannels) {
+        if (audioScalePerChannel) {
+          headerBits.skipBits(6 * mixerOutChannel); // nuMainAudioScaleCode[ns][nCh]
+        } else {
+          headerBits.skipBits(6); // nuMainAudioScaleCode[ns][0]
+        }
+      }
+      int mixesCount = 1; // nEmDM
+      int[] channelCountsForDownmixes = new int[3];
+      channelCountsForDownmixes[0] = channelCount; // nDecCh[0]
+      if (embedded6ch) {
+        channelCountsForDownmixes[mixesCount] = 6; // nDecCh[nEmDM]
+        mixesCount++; // nEmDM
+      }
+      if (embeddedStereo) {
+        channelCountsForDownmixes[mixesCount] = 2; // nDecCh[nEmDM]
+        mixesCount++; // nEmDM
+      }
+      for (int mixerOutChannel : mixerOutChannels) {
+        for (int downmix = 0; downmix < mixesCount; downmix++) {
+          int channelCountForDownmix = channelCountsForDownmixes[downmix];
+          for (int downmixChannel = 0; downmixChannel < channelCountForDownmix; downmixChannel++) {
+            int mask = headerBits.readBits(mixerOutChannel); // nuMixMapMask[ns][nE][nCh]
+            int coefficients = Integer.bitCount(mask); // nuNumMixCoefs[ns][nE][nCh]
+            headerBits.skipBits(coefficients * 6); // nuMixCoeffs[ns][nE][nCh][nC]
+          }
+        }
+      }
+    }
+  }
+
+  // Asset descriptor: Decoder Navigation Data, see ETSI TS 102 114 V1.6.1 (2019-08) Table 7-7.
+  private static String parseDecoderNavigationData(ParsableBitArray headerBits)
+      throws ParserException {
+    int codingMode = headerBits.readBits(2); // nuCodingMode
+    switch (codingMode) {
+      case 0: // DTS-HD Coding Mode that may contain multiple coding components
+        int extensionMask = headerBits.readBits(12);
+        if ((extensionMask & 0x100) != 0) { // Low bit rate component
+          return MimeTypes.AUDIO_DTS_EXPRESS;
+        } else {
+          return MimeTypes.AUDIO_DTS_HD;
+        }
+      case 1: // DTS-HD Loss-less coding mode without CBR component
+        return MimeTypes.AUDIO_DTS_HD;
+      case 2: // DTS-HD Low bit-rate mode
+        return MimeTypes.AUDIO_DTS_EXPRESS;
+      case 3: // The auxiliary coding mode is reserved for future applications.
+      default:
+        throw ParserException.createForMalformedContainer(
+            /* message= */ "Unsupported coding mode in DTS HD header: " + codingMode,
+            /* cause= */ null);
+    }
+  }
+
+  // See Table 7-10 in ETSI TS 102 114 V1.6.1
+  private static int getRemapChannelCount(int mask) {
+    int remapChannelCount = 0;
+    if ((mask & 0x0001) != 0) { // Centre in front of listener
+      remapChannelCount += 1;
+    }
+    if ((mask & 0x0002) != 0) { // Left/Right in front
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x0004) != 0) { // Left/Right surround on side in rear
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x0008) != 0) { // Low frequency effects subwoofer
+      remapChannelCount += 1;
+    }
+    if ((mask & 0x0010) != 0) { // Centre surround in rear
+      remapChannelCount += 1;
+    }
+    if ((mask & 0x0020) != 0) { // Left/Right height in front
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x0040) != 0) { // Left/Right surround in rear
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x0080) != 0) { // Centre Height in front
+      remapChannelCount += 1;
+    }
+    if ((mask & 0x0100) != 0) { // Over the listener's head
+      remapChannelCount += 1;
+    }
+    if ((mask & 0x0200) != 0) { // Between left/right and centre in front
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x0400) != 0) { // Left/Right on side in front
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x0800) != 0) { // Left/Right surround on side
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x1000) != 0) { // Second low frequency effects subwoofer
+      remapChannelCount += 1;
+    }
+    if ((mask & 0x2000) != 0) { // Left/Right height on side
+      remapChannelCount += 2;
+    }
+    if ((mask & 0x4000) != 0) { // Centre height in rear
+      remapChannelCount += 1;
+    }
+    if ((mask & 0x8000) != 0) { // Left/Right height in rear
+      remapChannelCount += 2;
+    }
+    return remapChannelCount;
+  }
+
   /**
    * Returns the size of the extension substream header in a DTS-HD frame according to ETSI TS 102
    * 114 V1.6.1 (2019-08), Section 7.5.2.
@@ -430,17 +430,21 @@ private void updateFormatWithDtsHeaderInfo(DtsUtil.DtsHeader dtsHeader) {
     if (dtsHeader.sampleRate == C.RATE_UNSET_INT || dtsHeader.channelCount == C.LENGTH_UNSET) {
       return;
     }
+    String sampleMimeType =
+        dtsHeader.mimeType != null
+            ? dtsHeader.mimeType
+            : format != null ? format.sampleMimeType : null;
     if (format == null
         || coreFormatPendingEmit
         || dtsHeader.channelCount != format.channelCount
         || dtsHeader.sampleRate != format.sampleRate
-        || !Objects.equals(dtsHeader.mimeType, format.sampleMimeType)) {
+        || !Objects.equals(sampleMimeType, format.sampleMimeType)) {
       Format.Builder formatBuilder = format == null ? new Format.Builder() : format.buildUpon();
       format =
           formatBuilder
               .setId(formatId)
               .setContainerMimeType(containerMimeType)
-              .setSampleMimeType(dtsHeader.mimeType)
+              .setSampleMimeType(sampleMimeType)
               .setChannelCount(dtsHeader.channelCount)
               .setSampleRate(dtsHeader.sampleRate)
               .setLanguage(language)
@@ -10,7 +10,7 @@ track 256:
     averageBitrate = 1536000
     id = 1/256
     containerMimeType = video/mp2t
-    sampleMimeType = audio/vnd.dts.hd;profile=lbr
+    sampleMimeType = audio/vnd.dts.hd
     channelCount = 8
     sampleRate = 48000
     language = en
@@ -10,7 +10,7 @@ track 256:
     averageBitrate = 1536000
     id = 1/256
     containerMimeType = video/mp2t
-    sampleMimeType = audio/vnd.dts.hd;profile=lbr
+    sampleMimeType = audio/vnd.dts.hd
     channelCount = 8
     sampleRate = 48000
     language = en
@@ -13,7 +13,7 @@ track 256:
     averageBitrate = 1536000
     id = 1/256
     containerMimeType = video/mp2t
-    sampleMimeType = audio/vnd.dts.hd;profile=lbr
+    sampleMimeType = audio/vnd.dts.hd
     channelCount = 8
     sampleRate = 48000
     language = en
@@ -13,7 +13,7 @@ track 256:
     averageBitrate = 1536000
     id = 1/256
     containerMimeType = video/mp2t
-    sampleMimeType = audio/vnd.dts.hd;profile=lbr
+    sampleMimeType = audio/vnd.dts.hd
     channelCount = 8
     sampleRate = 48000
     language = en
@@ -13,7 +13,7 @@ track 256:
     averageBitrate = 1536000
     id = 1/256
     containerMimeType = video/mp2t
-    sampleMimeType = audio/vnd.dts.hd;profile=lbr
+    sampleMimeType = audio/vnd.dts.hd
     channelCount = 8
     sampleRate = 48000
     language = en
@@ -13,7 +13,7 @@ track 256:
     averageBitrate = 1536000
     id = 1/256
     containerMimeType = video/mp2t
-    sampleMimeType = audio/vnd.dts.hd;profile=lbr
+    sampleMimeType = audio/vnd.dts.hd
     channelCount = 8
     sampleRate = 48000
     language = en
@@ -10,7 +10,7 @@ track 256:
     averageBitrate = 1536000
     id = 1/256
     containerMimeType = video/mp2t
-    sampleMimeType = audio/vnd.dts.hd;profile=lbr
+    sampleMimeType = audio/vnd.dts.hd
     channelCount = 8
     sampleRate = 48000
     language = en