2020-05-29, 04:58 PM
(This post was last modified: 2020-05-30, 05:16 PM by pipefan413.)
OK so this is bloody complicated and my brain is soup but I'll try my best to lay this out as clearly as I can. I know this is an extremely long-winded post, so feel free to completely ignore me or maybe just skim it and then read the bits in more detail that you think you might be able to clarify. It might be that this is entirely the wrong forum to ask this kind of specific question, but I figure it's worth a bash before I go poking about Doom9 or wherever else. Anyway, here goes...
1: What the hell is bitrate smoothing?
I was in the process of redoing my Snowpiercer resync to make it more accurate than it already was, which involved me decoding the .dtshd 7.1 audio track off a Blu-ray Disc to PCM, inserting some samples of silence for precise sync, then re-encoding it back to .dtshd again. The thing is, when I encoded back to DTS-HD MA, something weird happened: the new file was significantly smaller, despite being encoded losslessly in the same format as source. After testing my whole workflow for any losses and verifying that there has been no audio data lost at any point in the chain, I think I've possibly worked out why, though...
The DTS-HD Master Audio Suite encoder produces 2 files:
- a .dtshd audio bitstream file, which has variable bitrate
- a .dtspbr file, which contains an analysis of the changes in bitrate throughout the .dtshd audio bitstream
Quote:The Peak Bit Rate Analysis Tool analyses variable bit rate encodings (DTS-HD Master Audio encoded streams) graphically plotting the selected encoding’s bit rate over time, as if the encoding had been “smoothed” for authoring using a Peak Bit Rate scheduling utility. The smoothing process redistributes data throughout the encoded stream for a more constant flow of data during disc played back. Smoothing is performed during the authoring process of a disc.
I think the point of this is to work out where the bitrate suddenly jumps from low to high or high to low, and smooth that transition (i.e. make it more gradual) so it's less abrupt. I'm not sure why this is necessary, but I assume it's to stop some part of the hardware/decoding chain freaking out because of sudden bitrate spikes (I'd guess that decoders might deal with this badly and fail to ramp up fast enough to output losslessly, which could potentially result in audible degradation of the output). Now, if the PBR smoothing is doing that, you'll presumably end up with more areas of higher bitrate, since it's gradually increasing the bitrate over a longer period rather than letting it stay low and then suddenly spike where needed; it follows that these areas of padded-up bitrate will make the file bigger. This might explain why the .dtshd file demuxed off a retail Blu-ray Disc was roughly 10% larger than my .dtshd encode fresh out of the DTS Suite: mine had not yet gone through the bitrate-smoothing process, which is only done when you actually author it to disc.
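To make the idea a bit more concrete, here's a toy sketch in Python of what "redistributing data for a more constant flow" could look like. This is purely my own illustration and not the actual DTS or Scenarist scheduler: the function name `smooth_peaks`, the frame sizes and the cap are all made up. It walks backwards through the frames and pushes anything over the cap into the preceding frame, so the data is delivered early instead of in one burst.

```python
def smooth_peaks(frame_sizes, cap):
    """Toy peak limiter: move any excess over `cap` into the previous frame.

    Hypothetical illustration only; NOT the real PBR smoothing algorithm.
    """
    smoothed = list(frame_sizes)
    # Walk from the last frame back to the second, pushing overflow earlier.
    for i in range(len(smoothed) - 1, 0, -1):
        excess = smoothed[i] - cap
        if excess > 0:
            smoothed[i] = cap
            smoothed[i - 1] += excess
    # Nothing comes before the first frame, so overflow there cannot be moved.
    if smoothed and smoothed[0] > cap:
        raise ValueError("cap is too low to fit the stream under it")
    return smoothed


sizes = [2, 2, 10, 2]          # made-up frame sizes with one spike
print(smooth_peaks(sizes, 5))  # [4, 5, 5, 2]
```

Note that this toy version only moves data around and never pads, so the total size stays the same. On its own it wouldn't account for a ~10% size difference; that would need padding of some kind, which is exactly the part I'm guessing at.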
Now, to actually author a .dtshd stream to disc in a way that puts it through this bitrate-smoothing process, you need to take something else into account...
2: The DTS Suite header
If you author a .dtshd stream to disc using something like tsMuxeR, it'll just ignore the whole bitrate-smoothing thing altogether as far as I can tell, since it doesn't prompt for a .dtspbr input or give any indication that the output file is not going to be disc compliant. However, if you try to do it in pro authoring software like Scenarist, it will prompt for the PBR analysis to be included so that it can smooth the bitrate out in the audio stream it puts on the disc. In order for this process to work, you need to have two things:
- a .dtshd audio stream with a DTS-HD Master Audio Suite header attached (which goes before the actual DTS header and the audio bitstream itself)
- a .dtspbr file containing the PBR analysis
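If you want to check which kind of .dtshd file you're holding, a rough sketch like the following might help. It rests on two assumptions of mine that aren't confirmed anywhere in this post: that a Suite-wrapped file begins with the ASCII chunk ID `DTSHDHDR` (which is what open-source demuxers such as FFmpeg's appear to probe for), and that a bare stream begins with the DTS core sync word `7F FE 80 01`. The function names are hypothetical.

```python
SUITE_MAGIC = b"DTSHDHDR"              # assumed first chunk ID of a Suite-wrapped file
CORE_SYNC = bytes.fromhex("7ffe8001")  # assumed DTS core sync word (big-endian form)


def classify_dts_start(first_bytes):
    """Guess the file type from its first few bytes."""
    if first_bytes.startswith(SUITE_MAGIC):
        return "suite-header"
    if first_bytes.startswith(CORE_SYNC):
        return "raw-core"
    return "unknown"


def classify_dts_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return classify_dts_start(f.read(len(SUITE_MAGIC)))


print(classify_dts_start(b"DTSHDHDR" + bytes(8)))  # suite-header
```

A stream demuxed from a disc would be expected to come back as "raw-core", and one fresh out of the Suite as "suite-header", if those assumptions hold.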
So, assuming that you wanted to author a compliant Blu-ray Disc using both a .dtshd stream and a .dtspbr file, you could do what I've done thus far: get the .dtshd stream off the disc, decode it to PCM, then feed it back through the official encoder in order to generate a new (non-smoothed, smaller) .dtshd audio stream and an accompanying .dtspbr bitrate analysis file to later feed to your authoring software. Right?
Well.. no, not necessarily, because there's something else that the encoder does to its output.
3: The sound of silence
Turns out that when you encode to .dtshd in the Suite, it adds something else to the start of the file in addition to the aforementioned extra header: 1024 audio samples' worth of zero-byte silence. In terms of duration, this is equivalent to a little over 21 milliseconds (1024 samples / 48000 Hz sample rate = 0.021333... seconds). So you can see what I'm talking about visually, here's a mono audio track that's been encoded to .dtshd then decoded back to PCM, both before and after I removed the added 1024 samples of silence at the start. I've highlighted the 2048 zero bytes (silence). The bit depth for this audio file is 16-bit, meaning that one audio sample consists of 16 bits, and there are 8 bits in a byte, so 16 bits / 8 = 2 bytes per audio sample. 2048 bytes / 2 = 1024 samples.
[Two screenshots: the decoded PCM before and after removing the 1024 samples of leading silence]
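The same check can be done without eyeballing a hex editor. Here's a minimal Python sketch, assuming headerless 16-bit mono PCM (a WAV file would need its header skipped first); the function name `leading_silent_samples` and the test data are made up for illustration.

```python
def leading_silent_samples(pcm, bytes_per_sample=2):
    """Count whole all-zero samples at the start of raw mono PCM data."""
    count = 0
    for offset in range(0, len(pcm) - bytes_per_sample + 1, bytes_per_sample):
        # Stop at the first sample that has any non-zero byte in it.
        if any(pcm[offset:offset + bytes_per_sample]):
            break
        count += 1
    return count


SAMPLE_RATE = 48000
# Synthetic data: 2048 zero bytes followed by two non-zero 16-bit samples.
pcm = bytes(2048) + b"\x01\x00\x02\x00"

n = leading_silent_samples(pcm)
print(n)                       # 1024
print(n / SAMPLE_RATE * 1000)  # duration in ms, about 21.33
```

For multichannel audio the byte count per sample frame goes up with the channel count, so `bytes_per_sample` would need to be multiplied accordingly.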
Now, this is an easy thing to fix if you're muxing to MKV, because you can simply apply a negative delay in eac3to like so:
Code:
eac3to inputfile.dtshd outputfile.dtshd -21ms
Note that this might seem at first glance like it isn't completely accurate, because 1024 samples is not exactly 21 ms (it's 21.333... ms). In practice, though, eac3to rounds the 21 ms up so that it cuts off precisely the 1024 samples we want to remove, presumably because it can only cut whole DTS frames of 512 samples each.
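Here's the arithmetic behind that as a small Python sketch. It assumes that the delay gets snapped to the nearest whole frame and that a frame is 512 samples at 48 kHz; I haven't verified either against eac3to's actual behaviour beyond the result I observed, and `frames_for_delay` is just a name I've picked.

```python
def frames_for_delay(delay_ms, sample_rate=48000, frame_samples=512):
    """Nearest whole number of audio frames to a delay given in milliseconds.

    Assumption: the tool can only add or cut whole frames, so it snaps the
    requested delay to the closest frame boundary.
    """
    samples = abs(delay_ms) * sample_rate / 1000
    return round(samples / frame_samples)


frames = frames_for_delay(-21)
print(frames)        # 2
print(frames * 512)  # 1024 samples actually removed
```

21 ms works out to 1008 samples, or 1.97 frames, which snaps to 2 frames and therefore to the full 1024 samples.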
Anyway, if you're looking to actually author it back to a disc (and potentially put it through the bitrate smoothing process discussed above), well...
- It may not necessarily be as simple as deleting the 1024 samples and either reusing the existing .dtspbr file or generating another, since I'm not certain whether the header will still be correct for the contents of the file after removing 1024 samples from the start of the bitstream.
- It might not even be correct to remove the 1024 samples of silence if authoring a disc anyway, which brings me to my final point...
4: I sync my mind is melting
The thing is, there surely has to be some reason that the DTS Suite encoder is inserting 1024 samples of silence in the start of the bitstream in the first place. Until recently, I had absolutely no idea why that might be, and I might still be miles off on this but I may have stumbled upon a very fuzzy semblance of understanding while working on something unrelated.
I was working with a graphical (as in, images, no text) PGS subtitle stream for another film and during testing, I muxed one of my PGS streams into the appropriate AVC video file to see how it looked. The result wasn't quite right: MPC-HC played it back with every PGS image shown 2 frames later than they should be. In trying to figure out why that might be, I went back to tsMuxeR to see what the log said. I found this:
Quote:B-pyramid level 1 detected. Shift DTS to 2 frames
Firstly, DTS here doesn't mean DTS audio, it stands for Decode Time Stamps. Secondly, as far as I know, B-pyramid is a concept that relates purely to video streams, not things like PGS or audio. What I'm wondering here is whether tsMuxeR may have done something to the h.264 video stream during muxing to have it pull video frames out of the buffer 2 frames later than it otherwise would, and, to compensate for this, also adjusted the PGS subtitle track (and possibly also the audio track, but I'll get to that in a moment) so that it displays its images 2 frames later too. However, when I played the video back in MPC-HC, I'm guessing (and it's only a semi-educated guess) that it ignores whatever tsMuxeR did to the h.264 stream and just displays the frames as it ordinarily would, whereas the PGS track *is* delayed by the 2 frames, because tsMuxeR has to apply that shift in a different way for the PGS format.
Now... how does that relate to DTS-HD and the 1024 samples of silence? Mostly, probably not much at all. However, it's made me wonder again why that 1024 samples' worth of silence (amounting to roughly half a video frame in duration) is even inserted into the bitstream in the first place. Is it stripped back out during the authoring process, possibly as part of the bitrate smoothing / PBR processing? Or might it be there because of some quirk of hardware decoding that means it actually results in more correct sync compared with the video stream?
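For scale, here's how the two offsets compare, as a quick Python calculation. The frame rate is my assumption (24000/1001, i.e. 23.976 fps, the usual rate for film on Blu-ray); it isn't stated in the tsMuxeR log.

```python
from fractions import Fraction

silence = Fraction(1024, 48000)      # 1024 samples at 48 kHz, in seconds
video_frame = Fraction(1001, 24000)  # one video frame, ASSUMING 24000/1001 fps

ratio = silence / video_frame
print(ratio)         # 512/1001
print(float(ratio))  # roughly 0.51 of a video frame

two_frames_ms = float(2 * video_frame * 1000)
print(two_frames_ms)  # roughly 83.4 ms for the 2-frame PGS shift
```

So the audio silence is about 21.3 ms while the 2-frame shift is about 83.4 ms, which is one more reason to think the two are probably unrelated.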
5: TL;DR
All of this digging has brought up a lot of questions that I've tried to find answers to with... not a lot of success. I think the main ones are as follows:
- Is the PBR smoothing the reason that a .dtshd stream taken directly from a disc is significantly (~10%) larger than the exact same audio fresh out of the DTS-HD Master Audio Suite encoder? I'm guessing that's exactly it, but it's a guess.
- Although tsMuxeR will quite happily mux .dtshd to a BD folder structure, ISO or .m2ts without a .dtspbr file, which presumably means it isn't doing anything to regulate bitrate spikes in the stream, I expect burning that to disc and playing it back on hardware will result in issues e.g. the audio decoder not reacting quickly enough to the bitrate suddenly increasing and causing artefacts / rendering the output at a lower bitrate than it should. At least, I expect that would be the case for .dtshd from the encoder (e.g. that I've then stripped the header from but not PBR-smoothed). Although...
- Since tsMuxeR doesn't require a .dtspbr file, but a .dtshd stream taken directly from a retail disc will presumably have been already PBR-smoothed during authoring of the original disc, would simply muxing that .dtshd stream (maybe with a delay having been applied) back to a disc result in an already smoothed out audio stream that complies with the Blu-ray Disc standard? Or is it more complex than that? I'm guessing it probably is, otherwise I don't know why you'd need to do the bitrate smoothing process during authoring rather than just doing it beforehand and authoring the result to the disc instead. EDIT: Actually, yeah, I'm forgetting that the total bitrate of video and audio can't exceed the max allowed for a Blu-ray Disc, so presumably that's the reason that the smoothing is not done until the authoring stage. So I guess doing this in tsMuxeR *might* work but it might also mean that the total bitrate is exceeded if your video bitrate and audio bitrate are both very high. I wonder what, if anything, tsMuxeR does to deal with this problem? (For what it's worth, MediaInfo does not report any bitrate info about a solo .dtshd file but if it's muxed into an MKV or something it'll tell you a figure which I assume is just the average bitrate. In the case of Snowpiercer, I found this was about 5.8 Mbps, which is presumably a non-issue seeing as the video bitrate is only about 20-odd Mbps. I believe the maximum transfer rate on a Blu-ray is 54 Mbps, of which 48 Mbps is available to the combined audio/video stream (40 Mbps for video), so as long as video + audio are below that, I'm guessing it's fine.)
- Does the 1024 samples' worth (1024 / 512 = 2 DTS frames) of silence added to the start of the .dtshd bitstream by the encoder get removed again during disc authoring, or is it actually supposed to be kept in there to correct for some idiosyncrasy or another? (In other words: should I be manually removing those 1024 samples before authoring or not?) Someone elsewhere has suggested that the DTS Suite header basically tells Scenarist to skip those 2 audio frames, so I think it is effectively removed. CONFIRMED: Yes, it gets removed because the DTS Suite header contains a "codec delay" of 1024 samples.
- If it actually is more accurate to remove the 1024 samples of silence from the start of the bitstream before muxing, will doing so break the DTS Suite header and/or PBR analysis, thus causing issues with the bitrate smoothing process that's supposed to happen during disc authoring? (This may not be relevant if the right thing to do is leave it in, if you plan to author with something like Scenarist.) CONFIRMED: Yes, the header contains an instruction to skip the first 1024 samples as mentioned above ("codec delay"), so presumably just deleting them and not editing the header would screw this up. But... I wonder if I could remove the silence and also edit the header to remove the codec delay, so that the same files could be used with the PBR to author a disc in pro software (i.e. not tsMuxeR) *and* to mux into MKV with correct sync (since you can't just mux the file with the 1024 samples of silence into an MKV without it putting sync slightly off).
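On the bitrate budget question, the sanity check I have in mind is nothing more than the sum below, sketched in Python. The caps are assumptions on my part (48 Mbps is the figure I've seen quoted for the combined audio/video stream and 40 Mbps for video alone), the 25 Mbps video figure is a stand-in for "20-odd", and `fits_budget` is a made-up name. It also uses average bitrates, whereas the real limit surely applies to peaks, so treat a pass here as necessary but not sufficient.

```python
def fits_budget(video_mbps, audio_mbps, total_cap_mbps, video_cap_mbps=40.0):
    """True if video alone and video + audio both stay within their caps."""
    return (video_mbps <= video_cap_mbps
            and video_mbps + audio_mbps <= total_cap_mbps)


TOTAL_CAP_MBPS = 48.0  # ASSUMED cap for the combined audio/video stream
VIDEO_MBPS = 25.0      # stand-in for "20-odd Mbps"
AUDIO_MBPS = 5.8       # average reported by MediaInfo for the muxed track

print(fits_budget(VIDEO_MBPS, AUDIO_MBPS, TOTAL_CAP_MBPS))  # True
```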