Bob Stuart Interview in Full
In the current issue of STEREO we report on the pros and cons of the MQA audio format, including an interview with MQA mastermind Bob Stuart. We had to shorten Stuart’s answers considerably for print, so here are his statements in full length and in the original English:
Stephan Hotto of Xivero has published his ‘Hypothesis Paper’, in which he presumes a) that MQA uses apodizing filters and states that those filters b) ‘distort the phase of the audio signal’ and c) ‘lose time domain resolution’ or even d) ‘introduce aliasing’. What would you answer?
It will be clear to informed readers that the author of the ‘Hypothesis Paper’ has either not read our various papers and published materials on MQA or has failed to understand them at quite a profound level. The most accessible overview of our approach to improving high-quality audio coding for the human listener, including measures of temporal blur, coding and losslessness, is given in the 2-page paper [1]. For more depth we refer the interested reader to [2][3][4][5][6]. The ‘Hypothesis Paper’ is grounded in an outdated conceptual framework established more than 20 years ago: time and frequency are not mathematically equivalent for the human listener, and Fourier-based analysis and coding have led to technologies that are less appropriate for our high-level cognitive processes. [7][8][9][10][11][12][14]
The MQA project is a sincere endeavour to raise quality and move away from the incorrect assumptions about digital coding, processing and precision of delivery. Coming, as it does, from the same team that first built and demonstrated lossless compression to the industry in the early 90s, MQA is going beyond lossless concepts to bring the very highest quality, by addressing critical assumptions around sampling and conversion.
To paraphrase Einstein: ‘We can’t necessarily fix a problem using the same thinking that created it.’ MQA does represent a true paradigm shift, in the original sense. [15]
Turning now to the specifics of this question:
a) Although apodizing was first introduced to the audio community by our team 13 years ago [16], MQA does not use apodizing; we have moved forward. This question has already been answered in [4].
b) Since MQA is an end-to-end process and is previewable in the studio by mastering and recording engineers, the processes we use to remove temporal blur from the signal or from playback converters are neither a distortion nor necessarily an alteration of intent; in any case, the ‘phase’ comment is misframed. Even if applied at a later stage, it is now widely accepted by the mastering community that the process of reducing temporal smear is equivalent to removing an error, not an additive distortion or ‘effect’. The sound is simply clearer through the removal of pollution.
c) This is incorrect. MQA retains the same timing precision as conventional, properly dithered PCM while providing substantially lower temporal blur (see our tutorials on temporal precision and blur in [6]).
d) All A/D and D/A conversion systems have aliasing as a side-effect. The MQA encoder pedantically avoids aliasing in the audio range and ensures that any aliasing which may have arisen above 20 kHz is below the noise floor of the recording; in this way it turns out to be superior to popular or common converters.
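As a generic illustration of this folding effect – not MQA code, just standard Python/NumPy with arbitrary example values (48 kHz sampling, a 30 kHz tone) – a minimal sketch shows how a tone above the Nyquist frequency aliases back into the audio band when it is sampled without an anti-alias filter:

import numpy as np

fs = 48_000            # sampling rate in Hz; the Nyquist frequency is fs / 2 = 24 kHz
f_tone = 30_000        # example tone above Nyquist
n = np.arange(8192)
x = np.sin(2 * np.pi * f_tone * n / fs)   # "sampled" with no anti-alias filter in front

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak = freqs[np.argmax(spectrum)]

print(f"input tone: {f_tone} Hz, apparent tone after sampling: {peak:.0f} Hz")
# prints about 18000 Hz (fs - f_tone): the tone has folded back into the audio band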
a) How can download stores like Highresaudio make sure that they are offering real highres content and b) not upsampled material? c) Stephan Hotto and Lothar Kerestedjian from Highresaudio.de complain about not being able to analyse the spectral content of the decoded MQA signal as there is no way to access the signal in the digital domain after MQA decoding.
a) The question is imprecise: what do we mean by ‘highres’? Some define it as using 20 bits or more (DEG), others require 24 bits (JAS). Some feel that highres implies a sampling rate of 48 kHz (DEG) or 96 kHz (JAS) or higher. A few seem to believe that what is necessary is 24-bit audio in a lossless container. Each definition is arbitrary, technical, rooted in the digital domain and disconnected from sound quality. For example, there is a disconnect in that DSD (1 bit) is also considered to be highres, as is studio analogue tape.
Our viewpoint is that ‘resolution’ is a concept of perception and is best interpreted in the analogue domain, where it takes its behavioural sense – ‘the separation of events’; see [1]. This pioneering insight is better aligned with the listening experience than digital-domain definitions of quality are. As a result, we don’t include or exclude recorded material on the basis of digital file format or parameters such as sample rate or bit depth. Instead, we focus on temporal resolution, noise stability and analogue blur.
But, even without a definition, the implication behind the question is that music is occasionally upsampled to ‘fake’ that it was recorded at a higher sample rate. In the past this has sometimes been a tactic used in distribution, undertaken by retailers or labels to facilitate higher prices. Before moving on to the next part of this question, it is worth reflecting that, up to now, some forms of up- and down-sampling in the music distribution chain (i.e. the post-release-master phase) have been accepted as normal and even desirable, or at least tolerated as a convenience for the buyer of downloads.
One example of pervasive upsampling is offering DSD versions of recordings originally made in PCM (e.g. between 44.1 and 352 kHz); many more DSD versions are on sale than were recorded natively in that format. Another example of upsampling can be found where recordings at 88.2 kHz are offered at 96 kHz – maybe they were historically remastered to conform with physical formats or to comply with the ‘Mastered for iTunes’ (MFIT) program (this latter motivates a lot of 96/24 content in the market). As regards cross-sampling, some find it convenient that albums can be offered for download in multiple formats, e.g. the same album in DSD, 192/24, 96/24 even though the ‘signed-off master’ may have been in 88.2, 176.4 or 352.8 kHz. MQA has a very specific perspective on this, which is that with very few exceptions, derivatives are by definition not the master and are at least separated by one generation. And, as we discover, sometimes the highest rate version is not the ‘true master’.
b) We will expand on this topic in the answer to Q3, but it is not technically possible to use Fourier-based analysis to determine definitively whether a recording was upscaled or upsampled in distribution (i.e. post release or approval). It can only be used to flag a question. In fact, in some parts of the world, upsampling systems are designed specifically to confound inspection by FFT-based tools.

An important complication, which has led to many misunderstandings, is that a simplistic FFT analysis of a recording will not always tell the whole story. For example, in modern multitrack mixes electronic instruments often give the impression of a 24 kHz ‘brick wall’ (48 kHz sampling) because the electronic instrument is built that way, but mixing that with a voice recorded at 96 kHz means that the project master may be 96 kHz. In another example, some important early digital recordings used 44.1, 48, 50 or 50.4 kHz tracks that were then mixed in analogue (sometimes with additional analogue tracks and effects) and mastered to analogue tape. When these recordings are digitised, complaints of ‘upsampling’ are sometimes made, whereas the truth is quite different. A further example – in our view quite different from expedient upsampling to raise the price – is where an artist returns to an earlier recording specifically to take advantage of improved technology, to use tools that were not available when the recording was made. Is that a fake? Each case has to be judged on its merits, but such artist remasters, if genuine, represent in our perspective a new release for which there can be no simplistic technical criticism based on the sampling rates of the original tracks or partial mixes. Who knows, it may be lucky enough to suffer less compression the second time around!

Although it may have been important in the past, these days it isn’t necessarily appropriate for a retailer to get involved in the supply chain – especially if that involves modifying files, changing metadata, re-sizing cover art or attempting judgemental audio QA without the benefit of direct access to the original assets. All that can really be determined by technical analysis is spectrum shape and active-bit count. There is no doubt that basic checking can find errors introduced in the supply chain, and that’s a good thing, but in our experience these are much more likely to be human, documentation or machine errors than misrepresentation.
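The kind of basic spectrum-shape checking referred to above can be sketched generically. The following minimal Python/NumPy example is purely illustrative, not any retailer’s or MQA’s actual tooling; the function name, the 90 dB threshold and the synthetic test signal are assumptions. It estimates the bandwidth actually occupied by content, and a result far below the file’s Nyquist limit can only raise a question about possible upsampling, never settle it:

import numpy as np

def rolloff_frequency(samples, fs, drop_db=90.0):
    """Highest frequency whose level lies within drop_db of the spectral peak –
    a crude estimate of the bandwidth actually occupied by programme content."""
    window = np.hanning(len(samples))
    mag = np.abs(np.fft.rfft(samples * window))
    mag_db = 20 * np.log10(mag / (np.max(mag) + 1e-20) + 1e-20)
    freqs = np.fft.rfftfreq(len(samples), d=1 / fs)
    active = freqs[mag_db > -drop_db]
    return float(active[-1]) if active.size else 0.0

# Hypothetical case: a file delivered at 96 kHz whose content stops far below the
# 48 kHz Nyquist limit would merit a closer (human) look – nothing more than that.
fs = 96_000
t = np.arange(1 << 16) / fs
test = np.sin(2 * np.pi * 15_000 * t) + 1e-4 * np.random.randn(t.size)
print(f"estimated content bandwidth: {rolloff_frequency(test, fs) / 1000:.1f} kHz")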
This process of altering metadata or checking download files is at variance with other music distribution methods such as vinyl, CD or Blu-ray. In these cases the consumer trusts the label 100% and the product cannot be changed on the way to the customer. MQA is setting out to provide that same confidence for streams and downloads. Part of what we are trying to do with Authentication in MQA is to actively indicate provenance, to break down distrust, by putting considerable effort into finding the true master, setting up acceptance criteria and workflows that seek to maximise the likelihood of the right outcome. Then, the MQA file guarantees that the deliverable is not changed in any way between the mastering engineer, issuing label and the end customer. We think this is a much better way. See our blog on provenance and comments from mastering engineers [4][25][24][23].
What better assurance of quality can there be than a release signed off by the artist, engineer or their representatives, who can embed extended provenance data in the audio itself? Certainly nothing that can be done in the distribution chain, very simply because nothing can be changed without extinguishing the MQA confirmation light. Problem solved!
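The general principle can be illustrated with an ordinary digital signature. The following minimal Python sketch uses the third-party ‘cryptography’ package and is emphatically not MQA’s mechanism (MQA carries its authentication data within the audio stream itself); it merely demonstrates why nothing can be changed on the way to the customer without the confirmation failing:

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

studio_key = Ed25519PrivateKey.generate()       # held only by the signing party
release_master = b"...audio payload exactly as approved by the artist..."
signature = studio_key.sign(release_master)     # created once, at sign-off

public_key = studio_key.public_key()            # available to every player

def confirmation_light(delivered: bytes) -> bool:
    """True only if the delivered payload is bit-identical to what was signed."""
    try:
        public_key.verify(signature, delivered)
        return True
    except InvalidSignature:
        return False

print(confirmation_light(release_master))             # True: unchanged end to end
print(confirmation_light(release_master + b"\x00"))   # False: any alteration breaks it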
c) Note that it is already possible to analyse the digital output from Bluesound, Audirvana and other products which give access to the MQA ‘Core’ (up to 96 kHz sampling rate), or the analogue outputs from decoders. In fact this is enough to determine whether the baseband audio might have been upsampled.
How can you at MQA make sure that files forwarded to you by a label for MQA encoding are a) original recordings with b) their native sample rate and c) not upsampled? This is what you guarantee with your ‘MQA Studio’ label, isn’t it? d) Can you give us a short explanation of the differences between ‘MQA Studio’ and merely ‘MQA’-labelled files?
For every recording there is fundamentally only one ‘release master’. This is the version/mix/mastering result which was approved by the artist/producer or their representatives, either at the time of release or specific remastering, or later from the archive. For almost 20 years, it has been the nature of digital music distribution that each true master will be used to provide many distribution derivatives. At MQA we deal with this according to several principles and methods. First, we tend to operate on a basis of trust and shared good intentions with the labels. People rarely want to do a bad job or to make mistakes, so setting the scene is the start.
Simply stated, we aim to release one and only one version of an album in MQA, and for that to be, at the time of encoding, the closest or the actual master asset that is approved for release by the artist. (This qualification is important because there are many examples of popular albums where the true high-resolution master is not approved for release – that calls for investment, education or advocacy, and is already resulting in MQA encoding masters which have not been released.) This is achieved through significant cooperation and dialogue with the labels and through evolving criteria and workflow documents. A second workstream in MQA uses our so-called ‘white-glove’ technology portfolio, where – sometimes in a straightforward way, other times with considerable effort – we can drill back into archive assets to extract a ‘more technically correct’ version of the recording; not changing intent, but removing artefacts of the equipment. These techniques include correcting for specific converters, workstations, tape machines and so on. They have already been used extensively to improve mixes on new recordings or to optimise re-releases.
b) The concept of native sample rate is imprecise. It doesn’t exist for an analogue recording, and for a digital production the closest we have is the sample rate of the mix and mastering stages. But that is an inadequate criterion. An album may be finished and signed off, e.g. at 96 kHz, as a ‘root master’, but the next stage is (not infrequently) to export a number of same-rate variants which are individually compressed, equalised or levelled for different distribution targets such as vinyl, download stores, MFIT or Blu-ray (as well as versions for CD, AAC, MP3 etc). While not lossy (in the sense of a perceptual codec that throws away baseband information), these high-rate exports are neither lossless nor original. At MQA we aim to encode the unmodified real master.
c) The MQA encoder has considerable intelligence when it comes to establishing information about and the temporal features of a song or work; it also has features that aim to trap a number of cases of technical or human error. Without being too specific, the encoder is always on the lookout for malformed files, mal-formatted samples, inconsistent bit depth, overflow, accidental or deliberate aliasing, indications of upsampling, the presence of audible watermarks, as well as the probability that the file may have been reissued for optical disc or MFIT. We also fingerprint and cross-check whether the item has been encoded before. If our encoder is used correctly, these types of issues will have been checked and the content either accepted or rejected. Uncertainty can result in an automatic downgrade from ‘MQA Studio’ to ‘MQA’.
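One of the simpler checks mentioned here, the active-bit count, can be sketched generically. The following Python/NumPy function is an illustrative stand-in, not MQA encoder code; it assumes integer samples delivered in a 24-bit container and reports how many bits are actually exercised, so that 16-bit material padded up to 24 bits would be flagged as an inconsistent bit depth:

import numpy as np

def active_bits(samples, word_length=24):
    """Bits in use, counted from the top of the word down to the lowest bit
    that ever toggles across the whole set of integer samples."""
    if not samples.any():
        return 0
    combined = int(np.bitwise_or.reduce(np.abs(samples).astype(np.int64)))
    trailing_zeros = 0
    while combined % 2 == 0:
        combined //= 2
        trailing_zeros += 1
    return word_length - trailing_zeros

# Hypothetical example: 16-bit material dropped into a 24-bit container by
# multiplying by 256 (an 8-bit shift), so the low 8 bits never change.
sixteen_bit = np.random.randint(-(1 << 15), 1 << 15, size=4096)
padded_to_24 = sixteen_bit * 256
print(active_bits(padded_to_24))   # reports 16, flagging the inconsistent bit depth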
d) Music which has been through the process described in c) is generally marked ‘MQA Studio’. Content which is still uncertain but where there is a need to provide the music (pending provenance checks) will be marked ‘MQA’.
Another assumption is that the catalogue of a major label like Warner is simply “batch converted” to MQA, i.e. converted with standard parameters without analysing the content or knowing what kind of A/D converter has been used in the recording studio. Does Warner give you any information on the “history” of a recording?
This assumption is incorrect. While we are not prepared to divulge details about our label partners, anyone who is lucky enough to visit the WMG technical group and see the archive records would be incredibly impressed. This is a sincere group who have all the information at their fingertips to deliver the best releasable original to the encoding process. Just doing that requires significant de-duplication because of the many variants stored. For every master we can know the complete equipment chain, including ADC, mastering equipment, serial numbers, workflow. In the case of recent work we also have fingerprints of workstations, of tape recorders and calibration fingerprints with every analogue transfer. This information, combined with that gained from our 5-year development of this process, makes our encoder very good at recognising converter characteristics and, of course, it can take in more precise information for white-glove re-issues. Warner uses their deep data to select the assets, we then take that subset. We have similar good experiences to report from our dealings with other major label groups and independent labels such as 2L. No-one is trying to do a bad job. Occasionally information is unavailable and we have to track down who tuned a piano 30 years ago, but that adds to the fun.
Can you give us an update on when Universal Music will come up with MQA titles? And what about Sony Music?
There is a lot of work going on behind the scenes. However we can’t make pre-announcements on behalf of Universal, Sony, Merlin or other label groups. Watch this space ..!
REFERENCES
[1] Stuart, J.R., ‘Soundboard: High-Resolution Audio’, JAES Vol. 63 No. 10, pp. 831–832 (Oct 2015). Open Access: http://www.aes.org/e-lib/browse.cfm?elib=18046
[2] Stuart, J. R. and Craven, P.G., ‘A Hierarchical Approach to Archiving and Distribution’, 9178, 137th AES Convention, (2014). Open Access: http://www.aes.org/e-lib/browse.cfm?elib=17501
[3] Stuart, J.R., Howard, K., ‘New digital coding scheme – MQA’, (Japanese translation by Hiroaki Suzuki), J. Japan Audio Society, Vol. 55 #6, pp. 45–57 (Nov. 2015).
[4] Stuart, J.R., ‘Provenance and Quality’, http://bobtalks.co.uk/blog/mqa-philosophy/mqa-authentication-and-quality/.
[5] Stuart, J.R., ‘A Comprehensive Q&A with MQA’s Bob Stuart’, http://www.computeraudiophile.com/content/694-comprehensive-q-mqa-s-bob-stuart/ (April 2016)
[6] Stuart, J.R., ‘MQA: Questions and Answers’, http://www.stereophile.com/content/mqa-questions-and-answers (Aug. 2016)
[7] Unser, M. ‘Sampling – 50 Years after Shannon’, Proc. IEEE vol. 88 No. 4, pp. 569–587 (Apr. 2000)
[8] Woszcyk, W., ‘Physical and perceptual considerations for high-resolution audio’, AES 115th Convention, New York, preprint 5931 (Oct 2003)
[9] Lewicki, M.S. ‘Efficient Coding of natural sounds’, Nature Neurosci. 5, 356–363 (2002). http://dx.doi.org/10.1038/nn831
[10] Gabor, D., ‘Theory of Communication’, Journal of the Institution of Electrical Engineers, 93, III, p. 429, (November 1946). http://dx.doi.org/10.1049/ji-1.1947.0015
[11] Gabor, D., ‘Acoustical Quanta and the Theory of Hearing’, Nature, 159, pp. 591–594, (1947). http://dx.doi.org/10.1038/159591a0
[12] Oppenheim, J. M. and Magnasco, M. O., ‘Human Time-Frequency Acuity Beats the Fourier Uncertainty Principle’, Phys. Rev. Lett., 110, 044301, (2013). http://dx.doi.org/10.1103/PhysRevLett.110.044301
[13] Maka, M., Sobieszczyk, P., et al., ‘Hearing overcomes uncertainty relation’, Europhysics News, Vol. 46 #1, pp. 27–31.
[14] Oppenheim, J. M., et al., ‘Minimal Bounds on Nonlinearity in Auditory Processing’ (Jan 2013). arXiv:1301.0513 q-bio.NC.
[15] Kuhn, T., ‘The Structure of Scientific Revolutions’, University of Chicago Press, (1962)
[16] Craven, P.G., ‘Antialias Filters and System Transient Response at High Sample Rates’, J. Audio Eng. Soc., Vol. 52, No. 3, pp. 216–242, (March 2004)
[17] Jackson, H. M., Capp, M. D. and Stuart, J. R., ‘The audibility of typical digital audio filters in a high-fidelity playback system’, 9174, 137th AES Convention, (2014).
[18] Reiss, J. D., ‘A meta-analysis of high resolution audio perceptual evaluation’, JAES Vol. 64 No. 6, pp. 364–379 (June 2016).
[19] Oohashi, T., et al., ‘Inaudible High-Frequency Sounds Affect Brain Activity: Hypersonic Effect’, J. Neurophysiol., Vol. 83, pp. 3548–3558 (2000).
[20] Dragotti, P.L., Vetterli, M., Blu, T., ‘Sampling Signals With Finite Rate of Innovation’, IEEE Trans. Sig. Proc., Vol. 50, No. 6, pp. 1417–1428 (May 2007).
[21] Moore, B.J.C., ‘The role of temporal fine-structure processing in pitch perception, masking, and speech perception for normal-hearing and hearing-impaired people’, Journal of the Association for Research in Otolaryngology, Vol. 9, pp. 399–406 (2008).
[22] Oppenheim, J. M., et al, ‘Degraded Time-Frequency Acuity to Time-Reversed Notes’, PLOS ONE, 8, pp. 1–6 (June 2013). http://dx.doi.org/10.1371/journal.pone.0065386
[23] Massenberg, G., ‘MQA interview’, https://www.youtube.com/watch?v=n5EEjyl7tzA
[24] Ludwig, B., ‘Bob Ludwig talks about MQA’, https://www.youtube.com/watch?v=1iF9_3DLEbk
[25] Various, ‘MQA by Mastering Engineers’, https://www.youtube.com/watch?v=5U-D_4DK6to&t=19s