2020-07-22, 02:00 AM
UPDATE: After investigating this some more, my current impression is that the most consistent trigger for the issue with trimmed DTS-HD MA streams is silence at the start of the audio stream that contains at least some low-level noise, such as dither noise.
Tracks with pure zero-byte silence at the start can be adjusted with -21ms without issue, in all cases I've tested to date.
Tracks with "silence" at the start that's got a tiny amount of noise in it, like dither, exhibit varying degrees of inaccuracy when decoded back to PCM after the -21ms adjustment has been made.
In some cases, this only affects the first few hundred samples or so, staying confined to the first two DTS frames that remain after the trim. It's therefore probably quite harmless for the most part. However, in one case it caused a noticeable blip in the waveform that could actually be heard as an audible pop. In other cases, the issue extends much further into the track. I think the worst-affected track was hit hardest because it contained an LFE channel (mostly silent for a lot of the track) that had been dithered, so it had a lot of dithered "silence" throughout.
It would appear that there is something about the DTS format that means that DTS frames do not exist in a vacuum, and instead have some influence on frames that fall later in the bitstream. When those first two frames are removed, therefore, it sometimes affects how accurately the following frames decode.
My testing is ongoing, so how concerned we should actually be about this remains to be seen. For the time being, I would suggest that for archival purposes, FLAC is the best bet unless you very rigorously test your DTS-HD MA streams by the following method:
1. Encode to .dtshd
2. Apply -21ms trim
3. Decode both .dtshd files (trimmed and untrimmed) to raw PCM (.raw, .pcm, or just .wav and then delete the header, doesn't matter as long as you don't mess up your hex editing)
4. Open all three raw PCM files in a hex editor: source, encoded, and trimmed. Compare them to ensure that they are identical, apart from any trailing 00 bytes added by the DTS encoder (to pad out the last DTS frame if the end of the file did not already contain the requisite number of samples) and the first 1024 samples in the encoded version (which are 00 bytes)
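If you'd rather not eyeball step 4 in a hex editor, the same comparison can be roughly sketched in Python. This is only an illustration of the check described above, not a tested tool: the function name first_mismatch and the lead_pad parameter are made up for this example, and it assumes you already have headerless raw PCM with the same bit depth, channel count and byte order in every file.

```python
def first_mismatch(source: bytes, decoded: bytes, lead_pad: int = 0):
    """Return the byte offset (relative to source) of the first difference,
    or None if decoded matches source apart from the allowed 00 padding.

    lead_pad is the number of leading bytes to skip in decoded
    (the 1024 zero samples for the untrimmed encode, 0 for the trimmed one).
    """
    body = decoded[lead_pad:]
    n = min(len(source), len(body))

    # Fast path: compare the overlapping region in one go.
    if source[:n] != body[:n]:
        return next(i for i in range(n) if source[i] != body[i])

    # Decoded output is shorter than the source: audio is missing.
    if len(body) < len(source):
        return len(body)

    # Anything after the end of the source must be trailing 00 padding.
    tail = body[len(source):]
    zero_run = len(tail) - len(tail.lstrip(b"\x00"))
    if zero_run != len(tail):
        return len(source) + zero_run
    return None


# Tiny in-memory demo standing in for real files.
source = bytes([1, 2, 3, 4, 5, 6])
encoded = bytes(4) + source + bytes(2)        # leading pad + trailing pad
trimmed = source + bytes(2)                   # trailing pad only
damaged = bytes([1, 2, 9, 4, 5, 6, 0, 0])     # one corrupted byte

print(first_mismatch(source, encoded, lead_pad=4))  # None
print(first_mismatch(source, trimmed))              # None
print(first_mismatch(source, damaged))              # 2
```

For real files you would read each one with something like pathlib.Path("source.raw").read_bytes() (file names are up to you) and pass the leading 00 byte count for your bit depth and channel layout as lead_pad when checking the untrimmed decode. A result of None for both the encoded and the trimmed decode means the trim did no damage.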
If you want to know how many 00 bytes should be at the start of the encoded file (which you can delete in the hex editor to make the comparison part easier), here's how that works:
The bit depth (a.k.a. bit width) of an audio file tells you how many bits (or bytes, if converted) one PCM sample occupies, but a hex editor displays binary data grouped into bytes, not bits, so 16-bit and 24-bit are not directly applicable as-is.
1 byte contains 8 bits, so to get bit depth into terms of bytes, do
bit depth (in bits) / 8
e.g.
16/8 = 2
24/8 = 3
You also need to take into account how many channels there are, because obviously every channel has to be rendered into the data so more channels = more data.
The calculation is as follows:
number of samples x bit depth in bytes x number of channels
e.g.
1024 x (16/8) x 1 = 2048 bytes
1024 x (24/8) x 6 = 18432 bytes
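The same arithmetic as a small Python sketch, in case you want to work it out for other layouts. The function name leading_zero_bytes is just something I made up for the example, and it assumes the bit depth is a whole number of bytes (16, 24, 32).

```python
def leading_zero_bytes(samples: int, bit_depth: int, channels: int) -> int:
    """Bytes occupied by `samples` PCM samples per channel, across all channels."""
    if bit_depth % 8 != 0:
        raise ValueError("bit depth must be a multiple of 8")
    bytes_per_sample = bit_depth // 8
    return samples * bytes_per_sample * channels


print(leading_zero_bytes(1024, 16, 1))  # 2048  (16-bit mono)
print(leading_zero_bytes(1024, 24, 6))  # 18432 (24-bit 5.1)
```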
And no, simply deleting the first two DTS frames manually in a hex editor does not fix the issue, since that's all that eac3to does with the -21ms delay anyway (well, that and discarding the DTS global header and everything from the NAVI-TBL onwards at the end).