Let’s get down and dirty, dig into the nitty-gritty, and break down the difference between 16 bit and 24 bit sound quality.
Ah, the fond memories! 12 bit sampling on the Akai MPC60. It was once everyone’s go-to production trick - the magic sauce that made your drums sound more ‘gritty’.
Every hip hop producer literally loved it to bits! But gone are the days when samplers were actual hardware units sitting in your studio rack or on your desk. The rise of software samplers and digital audio workstations has blurred the basic concepts of digital audio for many.
We evolved to 16 bit CD-quality, enjoyed 18/20 bits in some digital effects units, and eventually reached 24 bits in professional recording gear. Today we have 24 bits available even in streaming audio. So what exactly is bit depth and why do you need to care about it?
Bit depth refers to the number of bits available to capture the audio content at a single moment in time. You should think of it as a series of available levels that define the dynamic range of the sound.
Analog source signal (straight curve) is converted to digital representation that can then be processed in the digital domain.
Your digital audio interface (be it a portable recorder, a sound card or, for example, a hardware sampler) uses an analog-to-digital converter (ADC) to convert an analog electrical signal, such as a voltage or current, into a digital bitstream that can be processed and stored as binary numbers. When we wish to audition the captured audio data, we send it to a digital-to-analog converter (DAC), which converts the bitstream back into a voltage or current. It’s important to understand at this point that sampling (or digital capturing) is essentially a process that creates a digital ‘snapshot’ of a constantly varying analog signal based on two main settings: the speed of sampling (sample rate, expressed as the number of samples taken per second) and the range of sampling (bit depth, which I will explain shortly).
What this actually means is that, in practice, a digital recording can never be an exact representation of the original signal but rather a “sample” of it (hence the name “sampling”). Let’s talk nerdy for a bit, shall we?
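To make the two settings a little more tangible, here is a minimal Python sketch of what a converter conceptually does: it measures a sine wave at regular intervals (sample rate) and rounds each measurement to the nearest available integer level (bit depth). The function name `sample_sine` and the chosen test tone are illustrative assumptions, not how any particular ADC is implemented.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, bit_depth, num_samples):
    """Take 'snapshots' of a full-scale sine wave and round them to integer levels.

    Hypothetical helper for illustration only.
    """
    max_code = 2 ** (bit_depth - 1) - 1  # largest positive value of a signed word
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz                      # moment of this snapshot, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # analog value between -1 and +1
        samples.append(round(amplitude * max_code))      # nearest available level
    return samples

# A 1 kHz tone captured at 8 kHz: eight snapshots cover exactly one cycle.
print(sample_sine(1000, 8000, 16, 8))
```

Raising the sample rate gives you more snapshots per second, while raising the bit depth gives each snapshot more levels to land on.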
In the computing world, strings of binary digits, or bits, are used to describe anything a computer does, and computers are able to manage entire strings of these at a time. 16 bits means there are 16 binary digits in a word, where each digit represents a value of either 0 or 1. 24 bits means there are 24 digits in a word, and so on. A sample recorded at 16 bits can therefore take one of over 65 000 (65 536) levels, whereas a 24 bit sample can take one of over 16 million (16 777 216) unique levels. The difference between 16 bit and 24 bit sounds huge, at least when you look at the number of available levels. But look at this difference in terms of dynamics:
Imagine the quietest whisper and the loudest bang in a concert: that’s essentially your dynamic range. But although bigger bit depth is technically better (24 bit adding more 'resolution' compared to 16 bit), this added resolution doesn't necessarily mean higher quality, it just means we can encode a larger dynamic range. The term resolution might be a little misleading here, since many think of it as being similar to adjusting the screen resolution of your computer monitor. But when you turn down the bit depth of a file, you’ll actually get an increasing amount of low-level noise, kind of like tape hiss. This is why dynamic range is sometimes referred to as signal to noise ratio (SNR or S/N), although there are some differences. Here is what Head-Fi said about this common misunderstanding regarding bit depth:
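The level counts are simply powers of two, which is easy to verify with a couple of lines of Python. The helper name `quantization_levels` is made up for this sketch.

```python
def quantization_levels(bit_depth):
    """Number of distinct values a binary word of the given width can hold."""
    return 2 ** bit_depth

for bits in (12, 16, 24):
    print(f"{bits} bit: {quantization_levels(bits):,} levels")
```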
The only difference between 16 bit and 24 bit is 48 dB of dynamic range (8 bits x 6 dB = 48 dB) and nothing else. This is not a question for interpretation or opinion, it is the provable, undisputed logical mathematics which underpins the very existence of digital audio.
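The "6 dB per bit" rule of thumb in that quote can be checked with a short Python sketch. It assumes the common simplification that dynamic range is the ratio between the largest value and one quantization step, expressed in decibels; the function name `dynamic_range_db` is hypothetical.

```python
import math

def dynamic_range_db(bit_depth):
    """Approximate dynamic range of fixed-point audio: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(f"{bits} bit: {dynamic_range_db(bits):.1f} dB")

difference = dynamic_range_db(24) - dynamic_range_db(16)
print(f"difference: {difference:.1f} dB")
```

Each extra bit doubles the number of levels, and a doubling is worth roughly 6.02 dB, so eight extra bits land at about 48 dB.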
The important question here is obviously: could you actually hear that 48 dB difference in music? Archimago conducted a small study which revealed that even the most experienced subgroups (musicians, sound engineers and hardware reviewers) couldn’t really tell the difference in A-B listening tests between original 24 bit music and the same files dithered down to 16 bits (and then fed into the DAC in a 24 bit container), even if they had access to equipment costing more than $6000. This reveals some interesting points to consider: even symphony orchestra recordings (which can have a dynamic range greater than 60 dB) don’t really benefit from this technical advantage. That additional 48 dB starts to sound almost laughable when we realize that some types of music today can have a dynamic range of just 12 dB. Maybe the younger generation of music producers shared the meme ‘do you even compress bro?’ too much so it became the rule of the day?
Joking aside, that’s not the whole story about bit depth. So far we have only talked about simple data, such as single mono or stereo files. But working with sound in the digital domain means we’re usually mixing multiple digital audio bitstreams to generate a single stereo or multichannel output. And that’s where this additional resolution becomes handy. Files that were captured with bigger dynamic range will have better signal to noise ratio and higher level of detail (smaller steps between points of amplitude). Additionally, there is this funky thing called floating point. This term can be even more confusing, so take a deep breath…ready?
As mentioned, the converters inside modern digital audio interfaces can theoretically handle 24 bit ADC-DAC resolution (theoretically, because technically speaking there is probably no audio system in the world giving more than 20 clear bits of signal due to resistance and semiconductor noise characteristics) but this can be further enhanced through arithmetic calculation: using floating point binary arithmetic instead of fixed-point allows a far greater range of numbers to be represented using the same number of bits. At a glance, the achieved theoretical dynamic range of 1680 dB (24 bit, 32 bit floating) does not make a lot of sense as human hearing ranges from 0 dB (the threshold of silence) to about 150 dB (the threshold of pain). Instead, you should think about it as added resolution - that is the accuracy with which analog amplitude can be represented in the digital domain.
Now focus! Using floating point binary arithmetic will not protect you from clipping during recording, since the converter still only handles 24 bit fixed bit depth. But 32 bit floating point calculation is significantly more accurate than 24 bit, and data loss (which affects audio quality) through rounding and so on as signals pass through plug-ins is much less of an issue if the audio files themselves are encoded as 32 bit float.
This advantage is not limited to just encoding audio files during recording though! The mixing engine bit depth inside a DAW software can also be set to use floating point calculation and that makes a big difference - by using floating point binary arithmetic you are able to avoid such problems as clipping during rendering using plugins, unnecessary noise introduced by plugin dithering and rounding errors during signal processing. Want fool-proof exports of your mix? Always export your files using floating point calculation since these files can recover from clipping disasters (assuming you used plugins that make use of floating point calculation)!
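Here is a rough Python sketch of why floating point can recover from an overshoot that fixed point cannot. It pushes a few samples 12 dB too hot (gain of 4) and then pulls them back down again. Note the assumptions: the fixed-point path is modelled as 16 bit integers that clamp at full scale, and Python's own floats are 64 bit rather than 32 bit, so this only illustrates the principle, not any specific DAW engine. The function names are hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768

def apply_gain_fixed(samples, gain):
    """Gain on 16 bit integer samples: results are rounded and clamped to full scale."""
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]

def apply_gain_float(samples, gain):
    """Gain on floating point samples: values above full scale are simply kept."""
    return [s * gain for s in samples]

original = [0, 10000, 20000, 30000, -30000]

# Boost by a factor of 4, then attenuate by the same factor.
fixed = apply_gain_fixed(apply_gain_fixed(original, 4.0), 0.25)
floating = apply_gain_float(apply_gain_float(original, 4.0), 0.25)

print("fixed:   ", fixed)      # the peaks have been flattened for good
print("floating:", floating)   # identical to the original values
```

In the fixed-point path every peak that hit the ceiling comes back at the same flattened value, while the floating point path returns the signal untouched.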
Still with me? Now that bit depth is demystified, we really should have a look at sample rate, or speed of capturing the audio signal.
Sample rate is given in hertz (Hz), the derived unit of frequency. One hertz means one cycle per second, where a single cycle of a wave swings from 0 up to its positive peak, down to its negative peak and back to 0. We humans perceive the frequency of sound waves as pitch, but in sampling it refers to the speed of capture. A reel-to-reel tape recorder is a good analogy to understand this: the faster the recording speed, the better the reproduction quality.
Technically speaking, a machine could of course take samples from the source signal at any speed that the ADC design allows. But there is a certain minimum requirement that is directly connected to the frequency range that we humans are able to hear, which spans roughly 20 hertz to 20 kilohertz and of course varies from person to person due to individual physical properties. This effectively means that by design our ears can only perceive frequencies that fall within this range. No upgrades available, folks.
But to capture this convincingly in the digital domain we also need to follow a certain law called the Nyquist-Shannon sampling theorem, which says the sampling frequency must be greater than twice the maximum frequency one wishes to reproduce. Otherwise our ears will tell the difference between the analog source signal and its digital representation. Twice 20 kilohertz is 40 kilohertz, and once some room is left for filtering, this sets the minimum useful sample rate at roughly 44000 hertz. Additionally, to avoid a sampling problem called aliasing (more about this later), these sampled signals must also be low-pass filtered.
Mathematically speaking, 44.1 kilohertz (44100 Hz) is ideal, as it is the product of the squares of the first four prime numbers. But the real reason why it was adopted is actually historical: back in the days of VHS, the limitations of the circuitry used in now redundant VCR recording technology capped the rate at 44.1 kHz, and it was widely adopted by major players in the industry. In modern technology the sampling speed can be significantly higher than this, and sample rates such as 48, 88.2, 96 or even 192 kilohertz are now commonly available.
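If you don't want to take the prime number claim on faith, a few lines of Python confirm it. This is just a quick arithmetic check with illustrative variable names.

```python
first_four_primes = [2, 3, 5, 7]

product = 1
for prime in first_four_primes:
    product *= prime ** 2  # 4 * 9 * 25 * 49

print(product)  # 44100
```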
Now did you also have that DJ friend who always claims LPs sound superior to digital recordings? Although not obvious, there is one theory out there claiming he might be onto something: analog recording and playback equipment can indeed reach up to 50 kilohertz, which is 6 kHz more than our minimum requirement and incidentally superior to 44.1 kHz, which is known as the standard CD quality. This theory comes from an idea that, although we cannot hear it, the audio energy still exists on lower frequencies and will therefore affect the listening experience positively in an all-analog playback system. Whoa, hold your horses! But can vinyl performance shine in other areas? Here we need to understand the technical limitation of a direct-cut vinyl record, which may only reach a dynamic range of roughly 70 dB, whereas a compact disc can reach 98 dB.
Digital sampling is not completely problem-free, though. I mentioned earlier about the noise produced by the semiconductor components that will be always present. But sometimes noise is good: almost any kind of signal processing causes a reduction of bits, and prompts the need to use dithering, which essentially adds noise to the signal. This has the effect of spreading the many short-term errors across the audio spectrum as broadband noise. A typical usage example would be reducing bit depth of an audio file from 24 bits to 16 bits. Dithering in this case would be done by adding noise of a level less than the least-significant bit before rounding to 16 bits.
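The 24 bit to 16 bit example can be sketched in Python as follows. One assumption to flag loudly: the noise used here is triangular (TPDF) dither built from two random numbers, which is a common choice but only one of several, and the function name `reduce_24_to_16` is hypothetical. The test signal is a constant value so small that it sits below half of one 16 bit step.

```python
import random

STEP = 2 ** 8  # one 16 bit step spans 256 steps of a 24 bit sample

def reduce_24_to_16(sample_24, rng, dither=True):
    """Requantize one signed 24 bit sample to a signed 16 bit sample."""
    noise = 0.0
    if dither:
        # Assumed TPDF dither: difference of two uniform random values,
        # scaled so its peak stays within one 16 bit step.
        noise = (rng.random() - rng.random()) * STEP
    value = round((sample_24 + noise) / STEP)
    return max(-32768, min(32767, value))  # keep the result inside 16 bit range

rng = random.Random(0)  # fixed seed so the run is repeatable
quiet = 100             # a very quiet 24 bit value, about 0.39 of a 16 bit step

plain = [reduce_24_to_16(quiet, rng, dither=False) for _ in range(20_000)]
dithered = [reduce_24_to_16(quiet, rng, dither=True) for _ in range(20_000)]

print("without dither, average:", sum(plain) / len(plain))
print("with dither, average:   ", sum(dithered) / len(dithered))
```

Without dither the quiet signal is rounded away to pure zeros. With dither the individual samples are noisy, but their average stays close to the original value, which is exactly the trade: errors turn into broadband noise instead of lost information.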
There is also aliasing, which occurs when the audio signal contains frequencies higher than half the sample rate (the Nyquist frequency). For example, with a 32000 Hz sample rate, any frequency components above 16000 Hz (the Nyquist frequency for this sampling rate) cannot be represented correctly and will show up as false lower frequencies when the music is reproduced by a digital-to-analog converter (DAC). To prevent this, an anti-aliasing filter is used to remove these components prior to sampling.
Last but not least, digital capturing is a process where timing accuracy is extremely important. This timing is handled by a word clock signal, which acts like a conductor, providing a periodic timing signal to all the parts of a digital audio system in order to have each process triggered at a precise moment. When a sample is taken 44100 times within one second, no matter how high-end your gear is, there will be errors or small deviations from the ideal timing. This error is called jitter and it creates minute but audible deficiencies in sound quality.
How does this digital puzzle work together, then? Have a look at how these settings affect the bitstream that we create by selecting bit depth and sample rate:
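To see where an unfiltered frequency would end up, here is a small Python sketch that folds a tone around the Nyquist frequency. It assumes an idealized sampler with no anti-aliasing filter at all, and the function name `alias_frequency` is made up for the example.

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling without filtering."""
    nyquist = sample_rate_hz / 2
    folded = signal_hz % sample_rate_hz   # the spectrum repeats every sample rate
    if folded > nyquist:
        folded = sample_rate_hz - folded  # mirror back below the Nyquist frequency
    return folded

# Tones below 16 kHz survive a 32 kHz sample rate, tones above it fold down.
for tone in (10_000, 20_000, 30_000):
    print(f"{tone} Hz sampled at 32000 Hz shows up at {alias_frequency(tone, 32_000):g} Hz")
```

A 20 kHz component sampled at 32 kHz would come back as a 12 kHz tone that was never in the original material, which is why it has to be filtered out before the converter sees it.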
Bit depth and sample rate determine how much bandwidth (or data per second) is required to transmit the file and also how much storage space it will reserve when stored in digital form.
By looking at the figures above we can now easily adapt this information to real-life situations, such as streaming audio content from internet or exporting a mix to a storage device. Bear in mind, that these numbers represent a single stereo file.
Remember that 3G/4G internet connection you have on your mobile? While it can be relatively fast, it will most likely still transfer data at under 100 Mbit/s, or even more likely somewhere between 10 and 40 Mbit/s. This means that transferring a common CD-quality (44.1 kHz, 16 bit) uncompressed audio stream over a mobile internet connection would already take a big share of the available bandwidth. And then there is the necessity of buffering. Compressing audio content for streaming suddenly makes a lot of sense, doesn’t it? But the file generated during 1 minute of recording with these settings is still small by today’s standards.
Using a high quality recording device and 96 kilohertz 24 bit quality would result in a file size that is over three times bigger. This happens because the speed of capture is faster and also because the converter is capturing a larger dynamic range. A 192 kilohertz setting would of course grow the file size even further.
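The arithmetic behind these comparisons is straightforward: sample rate times bit depth times number of channels gives bits per second. Here is a minimal Python sketch, assuming uncompressed PCM and ignoring file headers; the function names are hypothetical.

```python
def bitrate_bps(sample_rate_hz, bit_depth, channels=2):
    """Bits per second of an uncompressed PCM stream."""
    return sample_rate_hz * bit_depth * channels

def file_size_bytes(sample_rate_hz, bit_depth, seconds, channels=2):
    """Raw audio payload in bytes (file headers not included)."""
    return bitrate_bps(sample_rate_hz, bit_depth, channels) * seconds // 8

cd = file_size_bytes(44_100, 16, 60)      # one minute of stereo at CD quality
hires = file_size_bytes(96_000, 24, 60)   # one minute of stereo at 24 bit, 96 kHz

print(f"CD quality: {bitrate_bps(44_100, 16) / 1e6:.3f} Mbit/s, {cd / 1e6:.1f} MB per minute")
print(f"24/96:      {bitrate_bps(96_000, 24) / 1e6:.3f} Mbit/s, {hires / 1e6:.1f} MB per minute")
print(f"ratio:      {hires / cd:.2f}x")
```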
[Table: session size per minute and per hour, by number of mono tracks, at 16 bit and 24 bit, each at 44.1 kHz, 48 kHz and 96 kHz]
A typical live tracking session, where you would have 24 mono tracks armed and recording simultaneously.
When we look at recording a live band or a similar tracking situation, we can clearly see how the session size (and with it the space you need to allocate) grows. Remembering that a typical recording session includes multiple takes and possible further processing (which will consume even more space on your storage device), it’s not difficult to predict at least a minimum session size. The tracks in this example are mono, but the given data translates directly into other channel configurations (2 mono channels = 1 stereo, etc.).
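To estimate a minimum session size for a tracking situation like this one yourself, a rough Python sketch could look like the following. It assumes 24 mono tracks of uncompressed PCM, ignores file headers and any extra takes, and reports sizes in decimal megabytes and gigabytes; the function name `session_bytes` is hypothetical.

```python
def session_bytes(tracks, sample_rate_hz, bit_depth, seconds):
    """Raw PCM payload for a number of mono tracks (file headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return tracks * sample_rate_hz * bytes_per_sample * seconds

TRACKS = 24  # mono tracks armed and recording simultaneously

for bit_depth in (16, 24):
    for rate in (44_100, 48_000, 96_000):
        per_minute = session_bytes(TRACKS, rate, bit_depth, 60)
        per_hour = session_bytes(TRACKS, rate, bit_depth, 3600)
        print(f"{bit_depth} bit, {rate / 1000:g} kHz: "
              f"{per_minute / 1e6:.1f} MB per minute, {per_hour / 1e9:.2f} GB per hour")
```

Multiply the hourly figure by the number of takes you expect to keep and you have a reasonable lower bound for the drive space to set aside.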
Need to know the exact file size for your selected settings? Use this audio file size calculator: https://toolstud.io/video/audiosize.php
Let’s put this all together, have a look at what we’ve learned about the technical possibilities, and discuss a bit about the practical limitations and usage situations. In today’s world of fast computers and seemingly unlimited disc space it is perhaps wise to ask what would be the ideal quality you should be using for your recordings and in your mixing process? This question doesn’t have a single right or wrong answer, though. Let me explain.
Firstly, I should probably point out the biggest limitation that might still come as a surprise for some: your computer or recording device needs a fast storage device in order to manage the high bandwidth requirement when recording or playing back a multitrack project using the 24 bit 96 kHz high quality setting. Just think about 20 stereo tracks resulting in a bandwidth requirement of roughly 88 Mbit/s, sustained without interruption, and compare it with a typical 7200 RPM computer hard drive: its sequential read/write speeds usually fall somewhere between 80 and 160 MB/s, but real-world throughput drops sharply when the drive has to jump between many files at once. Not a big surprise, then, that writing 20 stereo channels at this quality can prove troublesome on a busy drive. The interface (SATA, USB2/3, MicroSD, etc.) also makes a big difference. Remember: what the audio converter chip essentially creates is a stream of data that must be written to the storage medium in almost real time!
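The bandwidth figure quoted for 20 stereo tracks can be reproduced with a short Python sketch. It assumes uncompressed PCM and shows the result both in decimal megabits and in binary megabits (2 to the power of 20 bits), since the two conventions give slightly different numbers; the function name is hypothetical.

```python
def multitrack_bitrate_bps(stereo_tracks, sample_rate_hz, bit_depth):
    """Bits per second needed to stream a number of stereo tracks."""
    return stereo_tracks * 2 * sample_rate_hz * bit_depth

bps = multitrack_bitrate_bps(20, 96_000, 24)

print(f"{bps / 1e6:.2f} Mbit/s (decimal megabits)")
print(f"{bps / 2**20:.2f} Mbit/s (binary megabits)")
print(f"{bps / 8 / 1e6:.2f} MB/s")
```

Keep an eye on bits versus bytes when comparing against drive specifications, as storage speeds are usually quoted in megabytes per second.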
The ‘trick’ that professionals use is to split the load onto separate physical drives. This makes a lot of sense as computers typically need to handle reading and writing operations of the operating system too. Portable recording devices are not usually prone to this since they typically handle much smaller number of audio tracks compared to DAW computers. Also, their operating system is usually quite different by design.
Secondly, audio files can hog a lot of storage space. Recording various takes, bringing in multichannel data and exporting mixes will always consume space. Sample libraries and sample-based virtual instruments usually require a lot of space, and investing in a separate high-speed drive for the sound library makes a lot of sense. Also, while the file size grows in direct proportion to the sample rate and bit depth when using higher quality settings, the perceived quality might not necessarily feel worth the extra bandwidth/space requirement. Always remember to look at these settings in the context of what you are trying to accomplish and what is an acceptable quality to work with.
A simple ‘failsafe’ approach here would be to always record at higher quality, mix using high quality setting and then convert the final mixdown to whatever quality necessary but even that is not always practical. Every time you are using a high quality setting in your DAW you will also generate a higher CPU load to your computer. Mixing a simple radio jingle using 32 bit float 192 kHz engine would seem quite overkill for obvious reasons.
Floating point recording and mixing is a great feature to have as it potentially gives you more freedom through that added resolution. But this feature is not available everywhere: I have seen it in many digital audio workstations but it rarely exists in any portable recorders — unless you’re willing to part with a serious lump of cash.
Nowadays it seems many hobbyist and semi-professional producers out there are quite happy going with so-called ‘budget’ solutions. While that is perfectly fine (especially if you are not doing any recording from external sources), there are still obvious reasons why anyone even remotely serious about sound should consider investing in good quality hardware. Still, even the simplest DAC chip integrated into your modern computer’s motherboard would prove sufficient for many. When you export a mix (create a file) from a modern DAW, this file will be written directly to the storage medium and doesn’t need to travel through the DAC.
Why did the Akai MPC60’s 12 bit sound turn out so popular? Well, Akai used 16 bit ADC/DAC converter chips on board, but the software was capable of writing the sampled material in a special non-linear 12 bit format. Maximum sample rate was 40 kHz. The Burr Brown PCM54HP (DAC) and PCM77P (ADC) converter chips used also provided a certain character (together with other less significant design features) that many later MPC revisions or other hardware samplers couldn’t quite match. You could, quite reasonably, claim that the sound quality or character it was able to produce matched quite perfectly for the style of music that really benefited from that extra ‘grit’.