Friday, November 15, 2013

MP3 and other HiRes formats



1. Introduction

Hey babes... today I'll be talking about high resolution formats / codecs. I will show that even lossy codecs like MP3, AAC or WMA are perfectly capable of encoding music in high resolution, with spectacular results. I will show you listening tests, graphs and measurement results. I'll also talk about what these codecs need in order to perform at their very best. I'll tell you how these codecs really sound (minus the usual bullshit). And finally, I'll give advice on how to get the best possible quality out of lossy codecs. I won't talk about FLAC, APE, TAK, ALAC or (lossless) WavPack... those are all lossless, a.k.a. they encode music without taking anything away. This article is divided into separate sections, listed below, so that you can jump straight to the one that interests you most:

1. Introduction
2. A bit of lossy history
3. How lossy codecs work
4. Transparency
5. Fairytales - or why MP3 & Co. are HiRes-capable
6. Seeing is believing
7. Listening to differences
8. And now... the problem
9. The solution
10. How to make the best MP3s (& Co.) ever!
11. This really is high resolution? Some disadvantages
12. REAL "lossyless" high resolution
13. The sound of noise
14. Transcoding horror
15. Conclusion


2. A bit of lossy history

Fig. I: Responsible for all the 'mess': Karlheinz Brandenburg (copyright: Wikipedia)
The guy above did it all: Karlheinz Brandenburg. He was part of a group of scientists at the University of Erlangen-Nuremberg, and in his 1989 dissertation he described several techniques that lossy codecs rely on. The principles he wrote about are the foundation of every lossy codec and have been in use ever since. Many people call Herr Brandenburg the 'father' of MP3; in fact, after his dissertation he developed this codec further together with other scientists at the Fraunhofer Society. Since then, MP3 has become the dominant format for storing music. Around the same time (1992), Sony unleashed ATRAC for its MiniDisc, thereby catapulting lossy coding into mainstream consciousness. In 1999, Microsoft released WMA in order to have its own MP3 alternative, which it could then license to partners for loads of money (or so it hoped). In the same year AAC, MP3's designated successor, developed by Fraunhofer together with other companies, became part of the MPEG-4 standard (it had first appeared in MPEG-2 in 1997, and it's often loosely called MP4); just like WMA and MP3, using this codec costs license fees. The Xiph.Org Foundation followed suit around 2000 (version 1.0 arrived in 2002) with a completely open and free alternative called Vorbis, commonly referred to as OGG after its container. But as I said above, MP3 is still the most widely used lossy codec, even though it's old and technically inferior to the others (only ATRAC is worse).

3. How lossy codecs work

To make it short: they make music smaller while trying to keep sound quality on par with the original they were encoded from. They remove what our ears cannot hear... well, that's not very precise. They remove parts of the music our ears AND our brain are unable to perceive. 'Perceive' is the key word, as lossy codecs remove information our brain would ignore anyway. Because of this ear/brain combination these codecs are called 'psychoacoustic'; we humans never listen with our ears alone, our brain is indeed the biggest part of our hearing. I won't go into detail about how lossy codecs shrink file sizes... but have you ever thought about the phrase 'they remove parts of the music'? In fact, they don't literally remove those parts; what they do is dynamically decrease the bit depth of the parts, frequencies or information they deem inaudible. For example: if a louder part masks a softer part, a lossy codec decreases the bit depth of the soft part to, say, 1 bit (see Fig. II). Decreasing bit depth leaves a noisy residue called quantization noise. How well a codec hides this noise partly determines how transparent it sounds to us.

Fig. II: audio masking and subsequent bit depth decreasing (copyright: Wikipedia)
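To put a number on that noisy residue, here is a minimal plain-Python sketch (not the quantizer of any real codec) that rounds a sine to a given bit depth and measures the resulting quantization noise. The helper names `quantize` and `snr_db` are made up for this illustration; the textbook expectation is roughly 6 dB more noise for every bit taken away.

```python
import math

def quantize(x, bits):
    """Round a sample (full scale = 1.0) to the nearest step of a signed `bits`-bit integer grid."""
    step = 1.0 / 2 ** (bits - 1)   # one LSB relative to full scale
    return round(x / step) * step

def snr_db(bits, freq=997.0, rate=48000, n=48000, amp=0.999):
    """Quantize one second of a near full-scale sine; return signal-to-quantization-noise ratio in dB."""
    signal_power = noise_power = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq * i / rate)
        error = quantize(x, bits) - x      # this difference is the quantization noise
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

for bits in (16, 12, 8):
    print(bits, "bit:", round(snr_db(bits), 1), "dB")
```

The printed values should land near the textbook figure of 6.02 × bits + 1.76 dB, so every bit a codec takes away from a masked part raises that part's noise floor by about 6 dB.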
4. Transparency

If MP3 & Co. only remove what we cannot perceive anyway, why do we sometimes hear more or less horrible compression artifacts? Well, the sound of a lossy codec basically depends on how efficiently it encodes, on the available bitrate (measured in kBit/s), on the encoding speed (fast isn't always best) and on whether the codec is well maintained (a.k.a. continuously developed). MP3, for example, has been in constant development for 20 years, first and foremost in its LAME incarnation. It is now so good that it reaches transparency for many people at a bitrate as low as 128 kBit/s. 'Transparency' means that the lossy encoding cannot be distinguished from the lossless original it was derived from. In other words, for many people a 128 kBit/s MP3 sounds the same as the original lossless source, which at 1,411 kBit/s (the bitrate of CD) is 11 times bigger. In my opinion, that fact not only proves how untrained the ears (and brains) of casual listeners are, it also shows how good lossy codecs have become. If the encoding weren't transparent, you'd hear artifacts like flanging, pre-echo, smeared transients, problems with the virtual stage, distortion (quantization noise), etc. These artifacts are a problem... for two decades now MP3 has "enjoyed" a reputation as bad as the always-evil stepmother in fairytales. Audiophiles and casual listeners alike constantly claim that MP3 (or any other lossy codec of their choice) sounds cold, lifeless, digital and... yes, compressed; lossy codecs are too often accused of "dumbing down" the sound.

5. Fairytales - or why MP3 & Co. are HiRes-capable

You see, all of this was true. 15-20 years ago. In 2013 it's indeed like the bad stepmother: a fairytale. Most lossy codecs are far, far better than their reputation. This may surprise you, dear constant reader, but they are indeed able to encode high resolution material like 24/44.1, 24/48 or even 24/96.
...
...
...
...
(Pause for dramatic effect)
...
...
...
...
Don't believe it? Here's why... all lossy codecs share a common trait: internally they process audio not with fixed integer values but with floating point values. You may remember that I talked for hours and hours about the MiniDisc's ability to encode audio with a quality surpassing that of CD. It's exactly the same with MP3 & Co.: since they all work with floating point precision, the bit depth fed to the encoder's input is irrelevant; they will encode anything, whether it's 16 bit, 24 bit or 32 bit.
What's the difference between integer and floating point? Integer handles whole numbers like 23, 45, 156, 1, etc. Floating point handles numbers like 7.89654367, 674.342167, 55.236548955214587, etc.... you get the idea. A floating point number stores a mantissa plus an exponent, so it can scale its precision to the size of the value: a very soft signal keeps roughly the same relative precision as a loud one, whereas integer (or fixed point) always uses the same absolute step size. The results are a gigantic dynamic range and a spectacular signal-to-noise ratio. For 24 bit integer the (quantization) noise floor lies at about -144 dB; for 32 bit floating point it's at a stunning -202 dB.
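A small sketch of that difference, using Python's `struct` module to round a number through IEEE 754 single precision (32 bit float). The helper names are made up for this example, and it only looks at storage precision, not at what any codec does internally.

```python
import math
import struct

def to_float32(x):
    """Round-trip a value through IEEE 754 single precision (32 bit floating point)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_int_grid(x, bits):
    """Round a value (full scale = 1.0) to the nearest step of a signed `bits`-bit integer grid."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)

def noise_floor_db(bits):
    """Size of one integer step relative to full scale, in dB (about -6.02 dB per bit)."""
    return 20 * math.log10(2 ** -bits)

soft = 10 ** (-90 / 20)    # a sample at -90 dBFS, roughly 3.16e-5
tiny = 10 ** (-140 / 20)   # a sample at -140 dBFS, below one 16 bit step

for name, value in (("-90 dBFS", soft), ("-140 dBFS", tiny)):
    print(name,
          "float32:", relative_error(to_float32(value), value),
          "int24:", relative_error(to_int_grid(value, 24), value),
          "int16:", relative_error(to_int_grid(value, 16), value))
```

The 16 bit grid is off by a few percent at -90 dBFS and loses the -140 dBFS value entirely (relative error 1.0), while the float32 error stays below one part in ten million at both levels, because the exponent simply shifts the precision down to where the signal is.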

6. Seeing is believing

The first graph below shows a tiny 1,000 Hz sine at a level of -90 dB, taken from an original 24/48 wave file (-> lossless, see Fig. III). This is so low-level that you won't ever hear it. But you can see that the sine still looks like a perfect sine. 24 bit obviously offers enough possible values for a signal at -90 dB to be represented properly, which is no wonder considering its noise floor at -144 dB. The next graph (Fig. IV) shows the same signal, this time stored with 16 bit. Now the situation changes: this low-level signal comes very close to the -96 dB noise floor of a common 16 bit system. In place of a sine you now see the famous 'digital staircase' so often used to (mis)represent supposed flaws of any digital system.

Fig. III: 1,000 Hz sine, -90 dB, 24/48 WAVE
Fig. IV: 1,000 Hz sine, -90 dB, 16/48 WAVE
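The staircase in Fig. IV can be reproduced numerically. This sketch (plain Python; `distinct_codes` is a made-up helper) rounds one period of a -90 dBFS, 1,000 Hz sine at 48 kHz to 16 and 24 bit integer codes and counts how many different codes the waveform actually uses.

```python
import math

def distinct_codes(level_db, bits, freq=1000.0, rate=48000):
    """Integer codes used by one period of a sine at `level_db` dBFS, stored with `bits` bit depth."""
    amp = 10 ** (level_db / 20)
    scale = 2 ** (bits - 1)             # codes per unit of full scale
    period = round(rate / freq)         # 48 samples for 1 kHz at 48 kHz
    return sorted({round(amp * scale * math.sin(2 * math.pi * freq * i / rate))
                   for i in range(period)})

print(distinct_codes(-90, 16))        # [-1, 0, 1]: the 'staircase'
print(len(distinct_codes(-90, 24)))   # many more steps: still looks like a sine
```

At 16 bit the whole sine is squeezed into three codes, since its peak is barely more than one step; at 24 bit the peak spans about 265 steps, which is why Fig. III still looks smooth.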
Now let's have a look at several codecs. Are they able to encode with high resolution, so that the sine looks like Fig. III? Please note that I used only the latest codec versions, each at its highest possible bitrate: for MP3 (LAME) that's 320 kBit/s, for AAC (Apple's implementation) it's 320 kBit/s as well, and for WMA Professional it's 440 kBit/s.

Fig. V: MP3 320 kBit/s, 1,000 Hz sine, -90 dB, 32/48
Fig. VI: AAC 320 kBit/s, 1,000 Hz sine, -90 dB, 32/48
Fig. VII: WMA Professional 440 kBit/s, 1,000 Hz sine, -90 dB, 24/48
Duh! The three graphs (Fig. V to VII) prove that we have true 24 bit resolution with every lossy codec represented here. Not one of the sines looks like Fig. IV. As I said: high resolution - despite lossy compression by a factor of 7. Wanna hear a word from the inventor of MP3, Herr Brandenburg (the guy up above), regarding this?
"There are some kinds of deficiencies of standard audio equipment which cannot be found in properly designed Layer-3 and AAC codecs. They are listed here to mention the fact that it does not make sense to test for them. Most noticable are:
Dynamic range: MP3 and AAC both contain a global gain adjustment parameter for every block of music data. According to the word length and resolution of this parameter, the dynamic range of both MP3 and AAC is well beyond the equivalent of a 24 bit D/A resolution. In short, MP3 and AAC represent the music in a way that the dynamic range of every known audio source is perfectly retained." (Source)
High resolution again; he's well aware of it. Of course he is, he developed it. But how will all those lossy codecs react to the plain old measurement signals he considers pointless to test with? I mean, lossy codecs are designed to work very well with music. Measurement signals are entirely different... they don't play along with the psychoacoustic principles every lossy codec relies on, so in effect the codecs should measure horribly. To find out how they react to these difficult conditions, I used RMAA and compared all encodings to the original .wav file they were derived from (a 32/48 file). Accordingly, the decoded test files were at 32/48 too (WMA Professional: 24/48). Look at Fig. VIII below and see for yourself how they dealt with this situation.

I have to repeat that lossy codecs behave worse with measurement signals; they simply aren't designed for this kind of signal.

Fig. VIII
Duh... again. Think about it: roughly 90% of the test signal has been removed... gone forever... Poof! (WMA Professional: 80%) - yet they still perform so fuckin' well... and with signals they aren't even designed to encode! But you've probably noticed that there are quality differences between the codecs. OGG, for example, ends up badly because it isn't revised often enough. The same goes for WMA Professional; while it clearly is one of the winners in this contest, it has an advantage because of its comparatively high bitrate. Ironically, the oldest codec, MP3, fares best in RMAA's quality assessments. The picture changes somewhat when one looks at the graphs... to make a long story short, the best codec is AAC as used and continuously developed by Apple for iTunes. The AAC version from Nero can't hold up as well, again because of long pauses in development.
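The 90% and 80% figures follow from simple bitrate arithmetic. A quick sketch, assuming stereo PCM and 1 kBit = 1,000 bits; the WMA figure assumes its 24/48 output as the reference, as in the RMAA run above. Function names are illustrative only.

```python
def pcm_kbps(sample_rate, bits, channels=2):
    """Bitrate of uncompressed PCM in kBit/s (1 kBit = 1,000 bits)."""
    return sample_rate * bits * channels / 1000

def share_removed(source_kbps, coded_kbps):
    """Fraction of the source data rate that a codec at `coded_kbps` does not keep."""
    return 1 - coded_kbps / source_kbps

mp3_aac = share_removed(pcm_kbps(48000, 32), 320)   # 32/48 source: 3,072 kBit/s
wma_pro = share_removed(pcm_kbps(48000, 24), 440)   # 24/48: 2,304 kBit/s
cd_vs_128 = pcm_kbps(44100, 16) / 128               # CD (1,411.2 kBit/s) vs. a 128 kBit/s MP3

print(f"MP3/AAC at 320 kBit/s: {mp3_aac:.1%} removed")
print(f"WMA Pro at 440 kBit/s: {wma_pro:.1%} removed")
print(f"CD is {cd_vs_128:.1f} times bigger than a 128 kBit/s MP3")
```

The same helper also confirms the "11 times bigger" comparison between CD and a 128 kBit/s MP3 from section 4.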

7. Listening to differences

Measurements are one thing, listening to results is another... so I'll simply let you hear the residue that is produced when MP3 encodes music. I chose something you might already know, a song you'd hear on the radio: Woman's World (-> video) by Cher, from her first album in 12 years, Closer to the Truth. The residue is - see above - quantization noise, in this case mixed with imprecisions produced by the filterbanks MP3 needs to find out what it might erase. These quantization artifacts are usually hidden by the music itself (that's the way it's supposed to be). To reveal how MP3 works, I simply inverted the phase (polarity) of the original 32/48 .wav source file and mixed it with the decoded MP3 file. Et voilà, every little speck of dirt MP3 produced for this file is revealed. I didn't alter the gain of this file; the level you're hearing is the actual level of the errors within the MP3 file. Now, to those who'll say "Very loud... and that's exactly why MP3 sucks big time", I can only say that you're stupid. Just consider for a moment how our ears & brain perceive sound, and have a look at this: the RMS level of the original file is -15 dB, while the residue has an RMS level of only -43 dB, which is 28 dB lower. A reduction of about 10 dB already sounds roughly half as loud to our ears / brain (6 dB merely halves the sound pressure). I think you can do the math yourself.

Again: you cannot use this noise to show how badly a lossy codec performs - it's how it's supposed to work, nothing else. As I said: (with a high enough bitrate) these artifacts are usually hidden and inaudible.
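The null test itself is just a subtraction. Below is a sketch with synthetic stand-ins, since the real files aren't available: a sine at about -15 dBFS RMS as the 'original', plus random noise at -43 dBFS RMS as the 'codec error'. The helper names are hypothetical, and the loudness helper uses the common psychoacoustic rule of thumb that about 10 dB less is heard as roughly half as loud.

```python
import math
import random

def rms_db(samples):
    """RMS level in dBFS (full scale = 1.0)."""
    return 10 * math.log10(sum(s * s for s in samples) / len(samples))

def null_residue(original, decoded):
    """Mixing the polarity-inverted original with the decoded file = subtracting it; only the codec's error remains."""
    return [d - o for o, d in zip(original, decoded)]

def perceived_loudness_ratio(db_down):
    """Rule of thumb: each 10 dB of level reduction sounds roughly half as loud."""
    return 2 ** (-db_down / 10)

rng = random.Random(0)
rate = 48000
original = [0.25 * math.sin(2 * math.pi * 1000 * i / rate) for i in range(rate)]   # RMS about -15 dBFS
noise_rms = 10 ** (-43 / 20)
decoded = [o + rng.gauss(0.0, noise_rms) for o in original]   # stand-in for a decoded MP3

residue = null_residue(original, decoded)
diff_db = rms_db(original) - rms_db(residue)
print(f"original {rms_db(original):.1f} dB, residue {rms_db(residue):.1f} dB")
print(f"residue sounds about {perceived_loudness_ratio(diff_db):.2f} times as loud as the music")
```

With a 28 dB gap the rule of thumb gives a residue that sounds roughly one seventh as loud as the music, before masking by the music itself is even taken into account.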



Sounds funny, doesn't it? You can hear that the residue closely mirrors the original file, except that it's stripped of bass and mids. Our ears & brain aren't very good with high frequencies, so any lossy codec prefers to concentrate its savings (and thus its errors) in the treble. You can also hear that the artifacts get louder when the music gets louder, and quieter when the music gets softer or less complex. This is why lossy codecs are able to encode high resolution music: the additional resolution is kept. After all, when it comes to bit depth, high resolution is nothing more than a lower static quantization noise floor. To be fair, the artifacts left by MP3 are anything but static; they are chaotic and appear wherever MP3 removed information. BTW, the MP3 used for this example was encoded at 320 kBit/s. When the bitrate is reduced to, say, 128 kBit/s, the artifacts get considerably louder.

Update 30.03.15: Soundcloud used to host the audio file containing the compression artifacts. But on this very day, Soundcloud decided to delete everything I ever uploaded because their automated content protection system detected several copyright breaches.
Well, of course it did! For my reviews I need to listen to music, and in order to make sound differences available to you, dear Reader, I uploaded several samples, each at most 30 seconds long. Naturally, this isn't a breach of copyright, because a) this blog has no commercial agenda or background and b) I don't advertise filesharers, nor do I encourage anyone to download things illegally. Not to mention that 30 seconds (!) of a song or piece are far too short to be enjoyed properly by anyone who intends to be an illegal asshole. Yet Soundcloud fears the labels and their paranoia about copyright breaches, which in turn makes them paranoid and incompetent ninnies themselves.
I hate paranoia, I don't want anything to do with stupid people / companies, and everything had been deleted anyway... so I decided to delete my Soundcloud account. Sorry about that, dear Reader.

8. And now... the problem

All of this would be marvellous... if all those nifty lossy codecs were decoded properly. Have you ever heard of a digital audio player or a smartphone that decodes lossy codecs with 32 bit floating point precision? See, neither have I. And that's where the beast rears its ugly head: if they aren't decoded at full precision, they produce additional quantization distortion that is NOT contained in the encoded signal itself. Software players on the PC usually don't suffer from this malady; foobar2000, Winamp and JRiver decode MP3 & Co. with full floating point precision for simple playback. This makes perfect sense, because these programs work internally with 32 or 64 bit floating point precision anyway (for DSPs, volume control, etc.). JRiver considered this issue so important that they opened a thread about it in their forum. But should you attempt a conversion from MP3 to WAVE, the basic problem is resurrected, meandering around again like a zombified corpse. foobar2000, for example, assumes that every lossy file was derived from CD: with the 'Auto' bit depth setting in its converter dialogue, it converts everything lossy to 16 bit, whether it's MP3, OGG, AAC or WMA (see Fig. IX). High resolution? Forget it. The same goes for the usually wonderful dBpoweramp: floating point decoding has to be activated under advanced options (Fig. X).

Fig. IX: foobar2000 converter dialogue
Fig. X: dBpoweramp configuration, advanced dialogue
Either way, the quasi-standard 16 bit decoding of MP3 & Co. isn't a good thing. Imagine a CD you ripped to MP3 yourself; these files were derived from a normal 16 bit source. 16 bit decoding should be enough then, right? Should, but isn't. While the decoded data regains the source's original bit depth, something new and eerie has been added... and I don't mean the compression errors produced by the encoder. No, this thing from the crypt is additional quantization noise produced by the decoder. Cause: truncating floating point values to integer values. These additional artifacts exist only because the decoder is run at half power and with half of its options. Look at the graphs below:

Fig. XI: original, lossless 16 bit wavefile for comparison
Fig. XII: MP3, decoded with 32 bit floating point
Fig. XIII: MP3, decoded with 16 bit integer
Doesn't look so bad, you say? Well, then look again at the signal causing the noise: it's just a simple 1,000 Hz sine. Fig. XI shows an original, lossless 16 bit wave file; its quantization noise is evenly distributed across the spectrum. Fig. XII shows an MP3 file decoded with floating point precision, and while there are distortions, they are well below audible levels; most of them sit at frequencies we cannot hear well. Fig. XIII shows the same file, this time decoded and truncated to 16 bit integer. The artifacts have doubled, and only because the file was decoded to 16 bit integer instead of 32 bit floating point. They might still be inaudible... but I'm not sure, because now we have additional aliases at frequencies where we humans hear extremely well. These distortions are added by the stupid integer decoding, and it doesn't matter whether you created those lossy files yourself or bought them at some online store. You've probably been listening to quantization artifacts all your life, errors produced by the dumb decoding of portable players and stupid software. Even files encoded with WMA Professional, which are clearly marked as 24 bit in their data stream, suffer from faulty decoding, as most software decodes them to 16 bit only. I'm looking at you, foobar2000. Just because you've been programmed by people who don't give a shit about proprietary, 'bad' software coming from Microsoft doesn't mean you have to behave like a silly goose.
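Why truncation hurts can be shown with a few lines of Python. This sketch compares two ways of turning a decoder's float output into 16 bit integers: dropping the fractional part (truncation toward zero) and rounding to the nearest step. Which of the two a given player actually does isn't something I can verify; the sine below is only a stand-in for decoder output, and the helper names are hypothetical.

```python
import math

SCALE16 = 32768   # 16 bit codes per unit of full scale

def truncate16(x):
    """Float -> 16 bit by dropping the fractional part (toward zero), like a naive cast."""
    return math.trunc(x * SCALE16) / SCALE16

def round16(x):
    """Float -> 16 bit by rounding to the nearest step."""
    return round(x * SCALE16) / SCALE16

def error_db(convert, samples):
    """RMS level of the conversion error in dBFS."""
    errors = [convert(s) - s for s in samples]
    return 10 * math.log10(sum(e * e for e in errors) / len(errors))

# Stand-in for float decoder output: one second of a 997 Hz sine at 48 kHz.
decoded = [0.3 * math.sin(2 * math.pi * 997 * i / 48000) for i in range(48000)]

trunc_db = error_db(truncate16, decoded)
round_db = error_db(round16, decoded)
print(f"truncation: {trunc_db:.1f} dB, rounding: {round_db:.1f} dB, difference: {trunc_db - round_db:.1f} dB")
```

Truncation comes out about 6 dB worse than rounding, i.e. twice the error amplitude. And because it always errs toward zero, its error follows the waveform, so it shows up as distortion related to the signal rather than as neutral noise.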

9. The solution

I'm afraid that, for portable players at least, there is no solution for the time being. Manufacturers of digital audio players don't seem to be aware of the problem, or they just assume that your average Joe won't notice. Well, at least you can do something about it when decoding files yourself with software. Have a look at dBpoweramp again (Fig. X above) and configure it to decode MP3, AAC or OGG to 32 bit floating point or 24 bit. Do the same in foobar2000's converter dialogue (see Fig. IX) by setting the output bit depth to '32' or '24'; it'll then decode lossy formats to their full potential when converting files to disk. But there's hope that this problem will gain attention... people like mastering engineer Bob Katz, who is important enough to maybe exert some influence, mentioned this problem in a recent thread in the JRiver forum. Yes, he was talking about dithering, but at the same time he was fully aware that MP3 & Co. must be decoded properly, and that decoding these lossy codecs to anything other than floating point means truncating values, which in turn produces the aforementioned artifacts. He wrote:
"Did you know that all current Lame and Fraunhofer and Apple AAC and MP3 decoders run internally at 32-bit floating point? In fact, if you take a "16-bit" source AAC file and reproduce it through the AAC decoder, it produces a 32-bit float output word! If it was a very good encoding, you will lose audible depth if you reproduce it at 16-bit because more than 16-bits come out of the decoder. The output of an AAC decoder should therefore be dithered down from 32-bit float to 24-bits for best reproduction. Almost NO ONE does that, but they should, and I've heard the audible difference when I play AAC in an engine that permits that."
Thank you, Bob, that's exactly what I've been saying! At least someone acknowledges it. I won't dither to 24 bit, but each to his own (I'm not too fond of dithering to 24 bit, since a bit depth like this renders quantization-related problems moot anyway). From now on, I'll mention whether digital players (portable CD players, for example) are able to decode MP3 properly.
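Katz's recommendation, dithering the decoder's 32 bit float output down to 24 bit, can be sketched with TPDF (triangular) dither, one common textbook choice. Whether this is what he or any particular player uses is an assumption, and `dither_to_int` is a hypothetical helper. The demo feeds in a constant signal of 0.3 LSB, which plain rounding wipes out completely.

```python
import random

def dither_to_int(x, bits, rng):
    """Requantize a float sample (full scale = 1.0) to a signed `bits`-bit code using TPDF dither."""
    scale = 2 ** (bits - 1)
    tpdf = rng.random() - rng.random()        # difference of two uniforms: triangular PDF over (-1, 1) LSB
    code = round(x * scale + tpdf)
    return max(-scale, min(scale - 1, code))  # clip to the legal integer range

rng = random.Random(42)
lsb24 = 1 / 2 ** 23
signal = [0.3 * lsb24] * 100_000              # a level of 0.3 LSB at 24 bit

plain = [round(s * 2 ** 23) for s in signal]              # undithered rounding
dithered = [dither_to_int(s, 24, rng) for s in signal]

print("undithered average code:", sum(plain) / len(plain))
print("dithered average code:  ", sum(dithered) / len(dithered))
```

Undithered, the sub-LSB detail simply disappears; with TPDF dither it survives on average as a little added noise instead of being truncated away.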

10. How to make the best MP3s (& Co.) ever!

In the meantime, here's some advice for everyone who encodes her/himself. I'll also recommend the best sounding codec, based on my own personal experience. Please be aware that my suggestions will make encoding take longer; if you don't have the time to wait for the encoder, then don't even bother. But then you're not interested in the best sound anyway, are you?
First of all, all those nifty codecs are command line based; on a Windows PC they look like plain old DOS. Usually you can't see this, as the command line window is hidden by the software that uses those encoders/decoders. But it does let you do everything yourself with your keyboard... though I admit that it isn't very convenient. So we'll use nice, flashy software instead. The following encoding tips and setups depend on the codec of your choice; in some cases the setup might be complicated for noobs, but once you've done it correctly you won't need to bother with it again.

MP3 (LAME)

One of the best and most versatile programs around is the aforementioned dBpoweramp. It costs money, but I can recommend it without reservation. It's a powerful encoder/decoder for any format you can think of, it contains one of the best available CD rippers, and its powerful talents are hidden inside an easy-looking package. Even better, the CEO of Illustrate (the company maintaining dBpoweramp) is a nice guy who discusses things and gives advice in his own forum and on Hydrogenaudio. A free alternative would be xrecode II (shareware with a nag screen), but it's buggy and inconvenient - use at your own risk. So, if you want to use MP3 and create the best sounding MP3s ever, I recommend these encoding settings (Fig. XIV):

Fig. XIV: best encoding settings for MP3 (LAME)
If you'd rather use a different frontend for the command line encoder/decoder (xrecode II, for example), here are the switches so that you can copy and paste them:

-b 320 -q 0 --noreplaygain

The most important part is the '-q' switch; it sets the 'quality' option of the LAME MP3 encoder (0 is the slowest and best setting). The standard setting advertised by Hydrogenaudio is '-q 2', but we want a choice and the best quality, so we set it ourselves. Why a constant bitrate (CBR) of 320 kBit/s when Hydrogenaudio recommends variable bitrate (VBR) to save storage space? Think about it: 60 minutes of music occupy about 137 megabytes when encoded at 320 kBit/s CBR. With VBR at an average of 240 kBit/s, those 60 minutes take roughly 103 megabytes. A difference of about 34 megabytes. In 2013, a few photos on a smartphone alone consume this much. We have to be realistic here: it might have been an issue 10 or even 5 years ago, but nowadays, with an abundance of storage space everywhere, surely we can afford bigger files. Furthermore, MP3 profits from more bitrate, no matter what the skeptics (hello, my dear Hydrogenaudio-ists) are saying. More on sound issues later.
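The storage figures are easy to check. A small sketch, assuming 1 kBit/s = 1,000 bit/s and 'megabytes' counted as MiB (2^20 bytes), which is the convention under which 320 kBit/s for 60 minutes comes to about 137; container overhead and tags are ignored, and `size_mib` is an illustrative name.

```python
def size_mib(kbps, minutes=60):
    """Approximate size of an audio stream at `kbps` (kBit/s) in MiB, ignoring container overhead."""
    total_bytes = kbps * 1000 * minutes * 60 / 8
    return total_bytes / 2 ** 20

for label, kbps in (("MP3 320 CBR", 320), ("MP3 ~240 VBR", 240), ("WMA Pro 440", 440)):
    print(f"{label}: {size_mib(kbps):.1f} MiB per hour")
```

The same helper gives roughly 189 MiB for WMA Professional at 440 kBit/s, the figure quoted for it further down.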


WMA Professional

Fig. XIV: best encoding settings for WMA Professional
I wouldn't recommend WMA Standard, as WMA Professional is superior in every way; it's also the only codec that officially supports 24 bit output. That's a bit of a sham, of course: Microsoft merely added an integer output option, while internally it works with floating point just like the other codecs. As a quality option, you should always use 2-pass encoding; it yields audibly superior results. Compared to MP3 above, 60 minutes of music occupy more space at the highest 440 kBit/s setting: 189 megabytes. Should you really be concerned about storage, 384 kBit/s would work too; it'll still sound good. None of this, however, hides the fact that WMA Professional enjoys close to zero hardware support. Most portable devices can play only WMA Standard - and I can't recommend that one.

AAC (Apple)

Instead of an old or proprietary codec, I'll recommend one of the most recent ones, which also enjoys ample hardware support from almost every manufacturer: AAC. Be advised that it now gets inconvenient. To make it easy, you could of course use Nero's AAC codec (with dBpoweramp, for example), but I'd advise against it; the one from Apple included with iTunes is much better (and it shames me to write this, as I don't like Apple). There's only one way to unleash the iTunes encoder (or, more precisely, the QuickTime encoder) with the best quality options: qtaacenc (get it here). You'd have to set up foobar2000 like this (Fig. XV):

Fig. XV: foobar2000 setup for qtaacenc
Just like with MP3, there's a quality setting allowing for better-than-standard results (the standard settings are what Apple uses for the music it sells through the iTunes Store): it's simply called '--highest' (another source says '--high'). Don't forget to tell foobar2000 to pass 32 bit to the encoder; you now know well that AAC can handle it. Anyway, here are the switches in case you want to use something other than foobar:

--cbr 320 --highest

11. This really is high resolution? Some disadvantages

MP3 and AAC (Apple) have one big disadvantage: they can't handle sample rates beyond 48 kHz. For those, only one codec is left: WMA Professional. AAC (Nero) can handle 96 kHz too... but as I said above, it's not very good. Furthermore, devices that normally handle AAC will react in strange ways (or not at all) when trying to play those 96 kHz AAC files. So you might want to use WMA Professional. But its main problem hasn't changed: hardware support is extremely limited, and not even software recognizes it properly.
Which lossy codec to use depends on what you consider to be high resolution. To me, HiRes starts at 24/48; for others it starts at 24/44.1. Strictly speaking, everything beyond CD resolution is high resolution. Take HDTracks: they even sell 24/44.1 as high resolution. More than one third of the music they offer is at 24/44.1 or 24/48. For those releases lossy codecs would be the perfect choice, if you could ensure proper decoding on playback. And if you want to save storage space, you might consider resampling your 96 kHz albums to 48 kHz and encoding the result with AAC or MP3.
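To give a rough idea of what resampling 96 kHz to 48 kHz involves: low-pass filter below the new Nyquist frequency (24 kHz), then keep every second sample. This is a generic windowed-sinc sketch with a Blackman window, chosen here purely for illustration; it is not the resampler used by dBpoweramp, foobar2000 or any other program, real resamplers use far longer and better-optimized filters, and dither on the way back to integer is left out.

```python
import math

def lowpass_taps(cutoff, num_taps=101):
    """Windowed-sinc low-pass FIR with a Blackman window; `cutoff` in cycles per input sample (< 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]   # normalize to unity gain at 0 Hz

def decimate_by_2(samples, taps):
    """Filter, then keep every second sample (e.g. 96 kHz -> 48 kHz)."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):      # only compute the samples we keep
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k                 # centered (zero-delay) convolution
            if 0 <= j < len(samples):
                acc += h * samples[j]
        out.append(acc)
    return out

def sine(freq, rate=96000, n=4800):
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

taps = lowpass_taps(21000 / 96000)           # pass up to ~21 kHz, stop before 24 kHz
kept = decimate_by_2(sine(1000), taps)       # audible tone: should survive
gone = decimate_by_2(sine(30000), taps)      # ultrasonic tone: would alias to 18 kHz if not filtered
print(max(abs(s) for s in kept[50:-50]), max(abs(s) for s in gone[50:-50]))
```

The filter step is the important part: without it, anything between 24 and 48 kHz in the 96 kHz file would fold back into the audible range after decimation.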

12. REAL "lossyless" high resolution

But the best combination to save space and keep every file at its original resolution is... not FLAC. Have you ever heard of WavPack? Thought so. WavPack normally is, just like FLAC or APE, a completely lossless encoder/decoder that never alters the material it encodes. But it has a second, less well known mode: WavPack lossy ('hybrid' is the correct name). This 'lossy' mode is unlike MP3, AAC or WMA; it doesn't remove anything within the music. As I said, the other lossy codecs work psychoacoustically and 'hack' into the frequency band at countless places, removing what cannot be perceived. WavPack lossy ignores psychoacoustics and does only this: it reduces the bit depth of the file according to its level and frequency distribution.
Let's assume a 24 bit file with lots of loud and soft parts. With WavPack lossy the soft parts will retain close to 24 bit resolution, while loud parts will be reduced to 16-20 bit resolution. The resulting quantization noise is then pushed by a very gentle noise shaper towards high frequencies, where it can no longer be perceived. Very much like SACD. Unlike SACD, though, the bit depth isn't static. WavPack lossy can be described as a 'dynamic bit depth reducer'. In that respect it also differs from the other lossy codecs: with them, bit depth varies a lot across frequencies; with WavPack lossy it only changes from one level change to the next. It truly preserves the full bandwidth and dynamic range of high resolution material, and in my experience the added quantization noise remains completely inaudible. Remember the MP3 noise sample up above? Had I extracted the artifacts of a WavPack lossy-encoded file, I'd have had to raise the level by +50 dB just to make them audible!
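The idea of a noise shaper can be illustrated with the simplest possible one: first-order error feedback, where each sample's rounding error is subtracted from the next sample before it is rounded. This is a generic textbook sketch, not WavPack's actual algorithm (which is adaptive and considerably more sophisticated); the signal, step size and helper names are arbitrary choices for the demo. It shows the trade-off: total noise goes up, but the noise left at low (more audible) frequencies goes down.

```python
import math
import random

def requantize(samples, step, shaped):
    """Round samples to multiples of `step`; if `shaped`, apply first-order error-feedback noise shaping."""
    out, prev_err = [], 0.0
    for x in samples:
        target = x - prev_err if shaped else x
        y = round(target / step) * step
        prev_err = y - target if shaped else 0.0   # error fed back into the next sample
        out.append(y)
    return out

def error_power(original, quantized):
    """Total power of the requantization error."""
    return sum((q - x) ** 2 for x, q in zip(original, quantized)) / len(original)

def low_band_error_power(original, quantized, width=8):
    """Error power after a crude low-pass (moving average over `width` samples)."""
    e = [q - x for x, q in zip(original, quantized)]
    means = [sum(e[i:i + width]) / width for i in range(len(e) - width)]
    return sum(m * m for m in means) / len(means)

rng = random.Random(7)
signal = [0.5 * math.sin(2 * math.pi * 0.0123 * i) + 0.01 * rng.uniform(-1, 1) for i in range(20000)]
step = 1 / 256    # a deliberately coarse grid so the effect is easy to measure

plain_q = requantize(signal, step, shaped=False)
shaped_q = requantize(signal, step, shaped=True)
print("total error    plain / shaped:", error_power(signal, plain_q), error_power(signal, shaped_q))
print("low-band error plain / shaped:", low_band_error_power(signal, plain_q), low_band_error_power(signal, shaped_q))
```

With this feedback the total error becomes the difference of successive rounding errors, a high-pass shape: roughly double the total noise power, but only about a quarter of the plain version's noise survives the crude low-pass.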

Fig. XVI: RMAA chart comparing several lossy codecs against their lossless source
Please refer to Fig. XVI above, where three lossy codecs compete against their own source. While WMA Professional beats AAC (Nero), it's WavPack lossy that wins the contest. It measures almost exactly like the original, yet it's about 5 times smaller than the WAVE file it was produced from (1,200 kBit/s against 6,144 kBit/s). See Fig. XVII & Fig. XVIII for details.

Fig. XVII: total harmonic distortion - WAVE and WavPack lossy are the clear winners
Fig. XVIII: intermodulation distortions - WAVE & WavPack lossy measure the same
Because of its noise shaping, WavPack lossy is ideally suited for encoding anything beyond 48 kHz. With the '-x6' switch you can exploit this further. Using this switch, I found out during many hours of testing that WavPack lossy is fully transparent with 96 kHz material as soon as the bitrate exceeds 1,000 kBit/s. That's why I always use it at 1,200 kBit/s; as additional insurance I also use the '-h' switch (high quality). However, encoding takes forever. To me, this doesn't matter; I encode these files only once and never touch them again. But for you it might be different, so you decide.

Fig. XIX: WavPack frontend with my recommended settings for best lossy quality
Fig. XIX presents my recommended settings for 96 kHz material using the WavPack frontend. (For convenience I would've loved to recommend foobar2000 or dBpoweramp, but neither allowed extra command line switches; as it turns out, foobar2000 now does, which is very convenient because it'll use all your CPU's cores for additional encoding speed.) The 'Extra Option' -x6 is the most important one for achieving the best possible quality... but it prolongs encoding time (no kidding; while you wait, you could write a novel). The same goes for the '-h' switch, and because it also prolongs decoding time, it's optional and won't be mentioned again. Anyway, for 44.1 / 48 kHz those 1,200 kBit/s are overkill, and for 176.4 / 192 kHz they aren't enough. That's why the bitrate needs to be tailored to the sample rate:

44.1 / 48 kHz:
bitrate: 500-600 kBit/s, switches: -x6 (optional)

88.2 / 96 kHz:
bitrate: 1,000-1,200 kBit/s, switches: -x6

176.4 / 192 kHz:
bitrate: 2,000-2,400 kBit/s, switches: -x6
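For scripting, the table can be turned into a command line. This is a hedged sketch: the `-b` (hybrid bitrate in kbps), `-x6` and `-h` switches are the ones named above, but the exact syntax and argument order should be checked against `wavpack --help` for your version, and the helper names are hypothetical. It also computes the reduction factor compared to 32 bit stereo PCM, the comparison behind the 6,144 kBit/s figure earlier.

```python
# Upper ends of the bitrate ranges from the table above, in kBit/s.
RECOMMENDED_KBPS = {
    44100: 600, 48000: 600,
    88200: 1200, 96000: 1200,
    176400: 2400, 192000: 2400,
}

def wavpack_lossy_args(infile, sample_rate, extra=True, high=False):
    """Build an argument list for a WavPack hybrid (lossy) encode following the table above."""
    args = ["wavpack", f"-b{RECOMMENDED_KBPS[sample_rate]}"]
    if extra:
        args.append("-x6")   # highest 'extra' processing level, slowest to encode
    if high:
        args.append("-h")    # high quality mode, slower to encode and decode
    args.append(infile)
    return args

def reduction_factor(sample_rate, bits, kbps, channels=2):
    """How many times smaller the lossy stream is than uncompressed PCM."""
    return sample_rate * bits * channels / 1000 / kbps

print(wavpack_lossy_args("album_24_96.wav", 96000))
print(round(reduction_factor(96000, 32, 1200), 2))
```

To actually run the encode, the list could be handed to `subprocess.run(...)`; against a 24 bit source instead of 32 bit, the same 1,200 kBit/s corresponds to a reduction factor of 3.84.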


13. The sound of noise

One of the most important questions of this article is how lossy codecs sound. And I don´t mean what the mainstream public thinks they are sounding which can be answered easily: cold, digital and lifeless (yet all use them - a mystery?). No, I mean the actual sound quality. I´ve said above that lossy codecs are far better than their reputation. BUT: you have to make sure to extract the best possible quality when using them! Please consider my encoding suggestions above again and remember that they take time but are worth every second spent on them. The best encoding isn´t the one that is the fastest (C'mon... a whole album encoded in 40 seconds... really?). To make it simple: the higher the quality, the longer it´ll take to encode. Just deal with it. Back to topic... if you used the best encoding options along with the highest bitrates and if you decoded all these files correctly (-> floating point) the sound of several lossy codecs is like this:
WavPack lossy: perfect. Using bitrates of roughly 1,200 kBit/s with 24/96 material in combination with the -x6 switch, it sounds exactly like the original. Always. Very limited hardware support... software support is good though. I have been archiving every bit of music with WavPack lossy since 2008 and I´ve never looked back. Not suited for portable use (for lack of hardware support), but perfectly suited for archiving and for transcoding to other codecs like the ones below.
AAC (Apple): might produce unstable staging; instruments occasionally seem to change size and position. Yet this happens so rarely that I could have been imagining it. Otherwise it´s completely devoid of artifacts. Sounds OK even at lower bitrates. Hardware support is phenomenal. Simply the best mixture of convenience and good sound, therefore highly recommended for portable use.
MP3: slightly 'dark', 'warm'. Sometimes sounds too dry, as if reverb has been reduced (especially audible with VBR). Smeared transients are another problem. No matter the bitrate, MP3 will always have difficulties encoding really short, tiny transients (even its short blocks aren´t short enough). However, in 85-90% of all cases none of this is audible at all. The danger of typical problems like metallic sizzling, flanging, etc. completely disappears if you use the highest bitrate of 320 kBit/s. Of course, using LAME avoids most problems anyway compared to other MP3 encoders. Hardware support? 100%.
WMA Professional: occasionally creates unstable staging, worse than AAC (Apple). Instruments might change size and place; dimensions sometimes shrink or expand. This depends very much on the material; in many cases it´s completely inaudible. When used with 96 kHz, the sound is too mellow. Otherwise it´s one of the most neutral and artifact-free codecs available. Hardware support: laughable.
AAC (Nero): like WMA Professional, only (much) worse. Instruments always move around slightly, sizes vary as well, dimensions shrink and expand constantly. Furthermore, the sound feels 'blown up' in the lower mids. Despite being much more recent and advanced than MP3 (LAME), it doesn´t sound remotely as good. Hardware support is - naturally - the same as for AAC (Apple).
WMA Standard: sizzles. Even at higher bitrates. On the other hand, it enjoys almost the same hardware compatibility as MP3. I still can´t recommend it; it just isn´t good enough.
OGG: was my standard choice more than 8 years ago. Shouldn´t be used nowadays. Obscures much of the virtual stage, sounds harsh (this is the only codec sounding literally 'digital'). Sounded different back in 2005: very beautiful and pleasant. Ignore it.
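The requirement in this section that files be decoded to floating point can be illustrated with a toy example. One well-known reason for it (assumed here, not spelled out in this section) is that lossy decoders can output samples slightly beyond digital full scale, especially with loud masters. A float output keeps those overs intact, so the level can still be lowered safely; a 16-bit integer output clips them for good. The sample values below are made up for illustration.

```python
# Hypothetical decoder output: lossy-decoded peaks of a loud master overshooting full scale.
decoded = [0.50, 0.98, 1.04, 1.12, 0.97, -1.03]


def to_int16(samples: list[float]) -> list[int]:
    """Convert float samples (nominal range -1.0..1.0) to 16-bit integers, clipping overs."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


def from_int16(values: list[int]) -> list[float]:
    """Scale 16-bit integers back to floats."""
    return [v / 32767 for v in values]


def max_error(a: list[float], b: list[float]) -> float:
    """Largest absolute difference between two equally long sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    print("samples beyond full scale:", sum(1 for s in decoded if abs(s) > 1.0))
    # Integer output: the overs are clipped and the error is permanent.
    print("max error, 16-bit output:", round(max_error(decoded, from_int16(to_int16(decoded))), 4))
    # Float output keeps the overs, so the level can be lowered before conversion.
    attenuated = [s * 0.85 for s in decoded]
    print("max error, attenuated first:", max_error(attenuated, from_int16(to_int16(attenuated))))
```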

14. Transcoding Horror

If there´s one thing I hate, hate, hate people doing, it´s transcoding from one lossy codec to another... or from 128 kBit/s to 320 kBit/s. You cannot imagine how many people think that it actually improves quality - see here & here (I´ve found countless other examples, but they´re in German, and my native language is a bitch for most people). All it does is hurt quality immensely. To the encoder, the previously compressed file is just a new file, and it´s treated as such. Example: the really loud encoding artifacts within a 128 kBit/s MP3 are simply taken at face value; to the encoder they are just new musical information, a seemingly natural part of the music. You see, not one encoder in the world can distinguish between noise, artifacts and music... which means that the artifacts are now encoded as music. Back in 1999, when Microsoft advertised its WMA Standard codec as superior to MP3 out of licensing greed, they tried to fool everyone into re-encoding their MP3s to WMA. For this stupidity they should have been shot... so please, don´t you ever transcode from MP3 to MP3, AAC, WMA or OGG. The only acceptable transcoding is from WavPack lossy to another lossy codec (preferably not WavPack lossy again).
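The point about artifacts being treated as music can be shown with a deliberately crude toy model. It is NOT how MP3 or AAC work internally: here each "lossy encode" is plain uniform quantization, and a smaller step size stands in for a higher bitrate. The step sizes and the test tone are arbitrary assumptions.

```python
import math


def encode(signal: list[float], step: float) -> list[float]:
    """Toy 'lossy encoder': round each sample to the nearest multiple of step."""
    return [round(x / step) * step for x in signal]


def rms_error(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two equally long signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# A 440 Hz test tone standing in for the original master.
original = [0.8 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(2000)]

COARSE = 1 / 64    # stands in for a low bitrate, e.g. 128 kBit/s
FINE = 1 / 1024    # stands in for a high bitrate, e.g. 320 kBit/s

low = encode(original, COARSE)          # the "128 kBit/s" file
direct_high = encode(original, FINE)    # high bitrate, encoded from the original
transcoded = encode(low, FINE)          # the "128 -> 320" transcode

if __name__ == "__main__":
    print(f"low bitrate error: {rms_error(original, low):.6f}")
    print(f"direct high error: {rms_error(original, direct_high):.6f}")
    print(f"transcoded error:  {rms_error(original, transcoded):.6f}")
```

In this model the transcoded file comes out identical to the low-bitrate one: the finer second pass faithfully preserves every error of the first, because to it those errors are simply part of the signal. With real codecs the second pass adds new errors on top, so the result gets worse, never better.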

15. Conclusion

My dear constant reader, I don´t know where you´re coming from, whether you´re an audiophile, a skeptic or just feeling lucky to be here. But I know one thing: if MP3 & Co. sounded horrible, no one would use lossy coding in the first place. If you´re an audiophile, do you really think that half of the earth's population fell for a ruse invented by some German scientist in order to 'dumb down' the sound of music? Then let me tell you that these 3 billion people still have all their marbles. This is no conspiracy; MP3 & Co. sound good enough that most people won´t even think about other formats. And as I´ve proven in this article, they are capable of surpassing mass-market sound with ease. Only if my suggestions are heeded, that is. Should you decide to follow them, you´ll be rewarded with true high resolution sound... or high end quality, if you will. All coming from some 'dumbing down' lossy codec. I can´t stress it enough: take care to use the best possible quality when encoding with lossy codecs yourself. Make sure to employ the best decoding as well. Only then will you enjoy a sound quality you wouldn´t have expected from MP3 & Co. Happy encoding... and perfect decoding!


Last update: 29.12.2013

Thursday, October 03, 2013

Review: Sony D-E441 (1999)

Sony D-E441, view from above
This review is part of a larger comparison containing eight vintage portable CD players and is the penultimate of these reviews. You can find the final conclusion here.

Overview

I bought the Sony D-E441 one and a half years ago and initially never intended to review it here. I purchased it only to have a spare drive for my Sony D-465. They both share a similar drive mechanism (including the DAX-11 laser), and since the D-465 is fairly unreliable I thought it wise to have a back-up of sorts. But surprise: the D-465 still runs decently (with few exceptions), so I´ve started to use the D-E441 for listening. It sounds quite pleasant, so it might appeal to some people.

Sony D-E441: side view
In the photograph above you can see that it´s quite clunky; it doesn´t have the compact elegance of the D-E705 and the D-E555. Manufacturing quality isn´t on par either. Take the lid: it is thin, flabby and seems quite fragile. The volume control is unmarked; there are no numbers indicating how far you have turned up the gain. Not convenient. An optical output is missing as well, though a line-out is present. Inside, it shares many parts with the aforementioned models. The D/A-converter is still the Toshiba TC9438FNL, and the headphone amp is the TA2120FN (also used in the D-E555). The ESP2 feature is again controlled by the NPC SM5902, which also employs ADPCM compression. But one thing about the ESP feature is very different: the D-E441 always stores roughly two seconds of uncompressed audio in its RAM. When you start playback, the disc winds up to double speed and pre-reads audio material into the RAM buffer. Once this buffer is filled, the disc winds down to normal speed. So even when ESP isn´t activated, you always have a working, non-compressing buffer at hand - and that IS convenient. There is a disadvantage to this method: depending on the drive speed, it creates some small sine tones that wander erratically through the frequency band (see the measurements below).
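The always-on buffer described above can be sized with a bit of arithmetic. The sketch assumes standard CD audio (44.1 kHz, 16 bit, stereo) and a drive that reads at exactly twice real-time speed while playback continues to drain the buffer. The D-E441's actual RAM size is not given here, and the function names are illustrative only.

```python
# Red Book CD audio: 44.1 kHz, 16-bit (2 bytes), stereo.
SAMPLE_RATE = 44100
CHANNELS = 2
BYTES_PER_SAMPLE = 2
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # 176,400 bytes/s


def buffer_bytes(seconds: float) -> int:
    """RAM needed to hold `seconds` of uncompressed CD audio."""
    return round(seconds * BYTES_PER_SECOND)


def fill_time(seconds_buffered: float, read_speed: float = 2.0) -> float:
    """Seconds until the buffer is full while playback keeps draining it at 1x."""
    if read_speed <= 1.0:
        raise ValueError("the drive must read faster than real time to fill the buffer")
    return seconds_buffered / (read_speed - 1.0)


if __name__ == "__main__":
    size = buffer_bytes(2)
    print(f"2 s buffer: {size} bytes ({size / 1024:.1f} KiB)")
    print(f"fill time at 2x: {fill_time(2):.1f} s")
```

Under these assumptions, two seconds of uncompressed audio take roughly 345 KiB of RAM and fill in about two seconds at double speed, after which the drive can drop back to normal speed.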

Sony D-E441: detail
Listening test

Headphone out

I feel like a broken record... but don´t use the headphone output, please don´t. You won´t have the bass drop-off so prevalent on the D-E705/D-E555 - it seems to have a low output impedance. Yet it distorts so much that at first I thought I had done something wrong during measurement. But no, the distortions produced by the headphone out are clearly audible... and naturally, I repeated the measurements to confirm my findings.

Line out

The line out sounds calm. Very calm. On the brink of being boring. The D-E441 is a warm-sounding player, reducing the highest treble and the deepest bass. Timing and snap, however, are very, very good; it is neither too fast nor too slow. Still, the perceived attenuation at both ends of the frequency response reduces punch considerably. When it comes to imaging, articulation is very good and superior to the reference. On the other hand, instruments that are supposed to be placed beside the artist singing/playing in the center move quite a bit behind her/him, hiding there. A stage as compact as this one usually helps depth and holographic impression. But not here: it´s not only compact, it´s also flat. BUT: it has immaculate stability; instruments never change their size or place, they are riveted to their spot. Reverb feels like an integral part of the virtual stage (unlike the D-E705/D-E555). The 'warm' impression the D-E441 makes is augmented by its dropping of musical details audible on the reference files. Differentiation suffers as well; it isn´t able to resolve parts of the music that need to be resolved. All of this creates a warm, friendly, tasteful character with immaculate imaging stability. Critics could argue that it doesn´t sound warm but boring, and they´d be right too. With some material this warmth and friendliness is indeed unbearable and sleep-inducing. So if you want to own it, consider carefully what you´re in for and whether you like its characteristic sound signature.

Sonic Balance:
Dynamics:
Resolution:
Stage / Ambiance:
Character:


Listen for yourself!

In this section you can compare my reference files to the recorded output of the Sony D-E441. I´ve uploaded several 30-second excerpts (fully legal) to Soundcloud for you to compare. This is an example of transparency you won´t find anywhere else; what magazine offers audio examples of the device it reviews? These examples also show how close to the source any device sounds once level differences are eliminated. I fully expect that some of my assessments might sound arbitrary to you; that is because differences from the source are tiny in reality. But please remember that EVERY other reviewer in the world faces the same problem. When you hit 'play', the files will be streamed to you as badly encoded 128 kBit/s MP3. Therefore, I strongly advise you to download the files; they´re in 24/96 FLAC. With these files you not only get the highest quality possible, you´re also able to have a look at the aliasing performance of the Sony D-E441 if you want.
I wanted to upload two files to Soundcloud so that you, dear reader, would be able to compare them to their respective reference files. But as it turned out, I wasn´t allowed to. Soundcloud accused me of breaking the copyright of the respective owners of the material I have been using so far. It´s funny, I didn´t even receive any warning beforehand... something like "Excuse me, Label XYZ has alerted us that you use our platform to illegally distribute material... blablabla." I don´t distribute illegal material, of course, but to labels/streaming providers just about anyone is obviously a potential thief, diverting the precious loot that in their mind is allotted to them by divine right. I´ve written Soundcloud an email asking for more information. Let´s see what they say. Depending on their answer, I will decide whether to keep my 'Listen for yourself' feature. I´m under the impression that I´ve done nothing wrong. I followed the 30-second rule (it isn´t really a rule) and my blog is clearly NOT commercial (and never will be), so I don´t know what their problem is.

Sony D-E441
Fancy graphs (measurements)

Line-out

Sony D-E441, RMAA's own quality assessments
Sony D-E441, frequency response
Sony D-E441, noise floor
Sony D-E441, total harmonic distortions + noise
Sony D-E441, intermodulation distortions
The line out of the D-E441 distorts less than the line outs of the D-E705/D-E555. But there are some odd peaks visible just above the noise floor; these are caused by the CD drive winding up and down. I don´t know exactly how the workings of the drive mechanism manage to sneak into the audio signal, and I also don´t know if they have an effect on the sound. They shouldn´t, since they are so small... but I can´t be sure. On the whole, the measurements are not bad, yet they aren´t stellar either.

Sony D-E441, CCIF intermodulation distortions
Sony D-E441, jitter
Sony D-E441, impulse response
Ultrasonic aliasing (imaging) distortions are strong; the two models I´ve reviewed before use the same D/A-converter, so it isn´t surprising that the D-E441 behaves similarly. The jitter measurement is a bit misleading: the sines to the right of the 11,025 Hz test tone were moving over the duration of the jitter test file. At the beginning of the 60-second file they started below the 11,025 Hz sine at roughly 8,000 Hz (CD drive winds up); at the end of the test file they were just above it, stabilizing there (CD drive now at normal speed). An example of the CD drive leaking into the audio signal. Finally, the impulse response of the D-E441 gives the impression that Sony was indeed experimenting with non-standard anti-aliasing filtering around that time; the D-E705/D-E555 exhibit the same response. Perhaps this is a reason for the slightly warm signature all three players share.

Headphone out

Sony D-E441, frequency response with headphones, several impedances
Sony D-E441, total harmonic distortions + noise with headphones, several impedances
Sony D-E441, intermodulation distortions with headphones, several impedances
I don´t know how often I´ve repeated these measurements... but I always came up with the same result. In short: the headphone output of the D-E441 sucks big time. While the frequency response indicates a low enough output impedance, distortions are so severe that a balanced sound can never be achieved through the headphone output. I have to tell you, though, that distortions can be lowered considerably when the volume is reduced to half of the maximum (which isn´t easy given the missing markings). But then you lose a lot of gain, which defeats the purpose of using the Sony in a loud environment. Just use a portable headphone amplifier and connect it to the much better line out.

Last update: 03.10.2013

Sunday, September 29, 2013

Review: Sony D-E555 (1999)

Sony D-E555
This review is part of a larger comparison containing eight vintage portable CD players and is the penultimate of these reviews. You can find the final conclusion here.

Overview

The Sony D-E555 was released one year (or a few months; I wasn´t able to find out the exact date) after the D-E705, and it happens to be very similar in every way. The only major, immediately visible difference is the design. IMO the design of the follow-up model isn´t exactly an improvement... though I like the blue colour (it also came in silver). Since it´s so similar to the D-E705, I won´t need to write too much about it. On the other hand, that leaves plenty of space for photographs instead. Yay!

Sony D-E555: rear view
You can see in the picture above that manufacturing quality isn´t the best. It´s OK though and - as I´ve mentioned - does not differ from the D-E705. It even came with the same accessories (cabled remote control with integrated display, battery pack, headphones)... blegh, it´s no use, this Discman is so similar to the D-E705 that I´ll concentrate on the (botched) design with the help of the aforementioned photos.

Sony D-E555: side view
Regarding the innards of the D-E555, please have a look at the D-E705; they are the same... well, almost. Instead of the ROHM BA3574BFS that serves as the headphone amp in the older model, the D-E555 uses the TA2120FN from Toshiba. So all important parts are now Toshiba. Funny for a Sony, isn´t it? Another difference is the placement of the ICs responsible for ESP. On the D-E705 they are soldered onto a separate mini-PCB, hovering above the main PCB. On the D-E555 these ICs have been incorporated into the main PCB instead. But that´s all, at least to my layman´s eyes. So... time for a photograph again!

Sony D-E555: detail
Listening test

Headphone out

I hate to repeat myself... but don´t use the headphone output. Look further below at the measurements: the new headphone amp isn´t able to save the day. It´s still missing bass, this player will sound as thin as its predecessor. But we still have the...

Line out

... which won´t sound any different from the D-E705. Believe me, I´ve tried. I compared the sound produced by the D-E555 to my references and, because I was so desperate to hear differences, to the files carrying the sound signature of the D-E705. But to no avail; the D-E705 and the D-E555 sound alike. Sometimes I was under the impression that the D-E555 sounds even warmer than the former model, but that impression vanished the next second with different material. Just like the D-E705, the D-E555 sounds slightly muffled, yet it keeps differentiation. The holographic impression of the stage is still impaired by reverb that seems detached from the rest, and singers still appear to sing closer to you. Shall I go on? Wait, I have a revolutionary suggestion: why don´t you read the review of the D-E705 again? Oh, you did? Crap, I´m starting to repeat myself... not surprising when reviewing this Discman.

Sonic Balance:
Dynamics:
Resolution:
Stage / Ambiance:
Character:

Sony D-E555: from above
Fancy graphs (measurements)

Line-out

Sony D-E555, RMAA's own quality assessments
Shall I go on and bore you with the same measurement graphs you´ve already seen on the Sony D-E705? I think not. Ah, wait... maybe jitter, impulse responses?

Sony D-E555, jitter
Sony D-E555, impulse response
Hm, fascinating: jitter is indeed different, just a bit. The thin spread I talked about in the review of the D-E705 has disappeared, instead the spread at the base of the sine got louder. Is this audible? I have no idea... but I don´t think so. The impulse response is the same and still atypical for Sony devices.

Headphone out

Sony D-E555, frequency response with headphones, several impedances
Sony D-E555, total harmonic distortions + noise with headphones, several impedances
Sony D-E555, intermodulation distortions with headphones, several impedances
As you can see, distortions have been reduced compared to the former model. On this model, it´s also valid to say that headphones with higher impedances will perform better. Deep bass is still missing, however, so prepare yourself for a tinny sound signature through the headphone output. Not as bad as the D-E705, but still miles away from usable. Since there´s nothing else to report... you know what this means, yes? Correct, another photo!!

Sony D-E555: detail
Last update: 29.09.2013