In this article Studio Slave will lay down the psychoacoustic laws and show you how we can use them to improve our productions. So what is psychoacoustics, we hear you asking? Psychoacoustics can be thought of as the study of how we hear and perceive sound within the brain. This is a very in-depth subject; however, in this article we have collated the main points and expanded on them with the electronic music producer in mind. We will cover these main points:
- PERCEPTION IS NOT LINEAR
- THE HAAS EFFECT (PRECEDENCE EFFECT)
- CRITICAL BANDS
- NATURAL RESPONSE TO LOUD NOISES
- EQUAL LOUDNESS CURVES
- FLETCHER-MUNSON TO ALTER DEPTH
- HEARING IN RMS
- SOUND LAYERING
- DISTANCE PERCEPTION
PERCEPTION IS NOT LINEAR
Our first point is that perception is logarithmic, as was discovered and researched by Gustav Fechner, who introduced the concept of psychophysics, the field that psychoacoustics belongs to. What this means is that the way we perceive weight, light and, most importantly in our case, sound is non-linear. This can make it tricky to measure, which will be explained shortly.
Firstly we need to understand frequency ranges. The human ear can detect frequencies in the range of 20 Hz to 20 kHz, due to its inner and outer structure as well as how our brain processes the information it receives.
Once again, the perceived loudness of sound at different frequencies is also non-linear. A low-frequency sine wave at, say, 100 Hz needs to be much louder than a sine wave at 1 kHz for us to perceive the two as the same volume.
Dynamic range: the dynamic range of human hearing is defined by the threshold of hearing (the lower limit) and the threshold of pain (the upper limit).
The threshold of pain is relatively independent of frequency and sits at around 120 dB SPL, which is roughly equivalent to standing next to a jet as it takes off. The threshold of hearing, however, is very much frequency dependent, as shown by the equal loudness (Fletcher-Munson) curves explained later in this article. It is defined using a 1 kHz sine wave at a pressure of 20 micropascals, which is the equivalent of 0 dB SPL.
Just to put this into perspective, the blood rushing through your veins is louder than this and can be heard if you are in an anechoic chamber (a chamber that has almost no reverberation), also known as a silent room.
Armed with this basic knowledge we can now tackle some of the phenomena of the psychoacoustic domain.
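To make the 0 dB SPL and 120 dB SPL figures concrete, here is a minimal sketch of the standard conversion between sound pressure and dB SPL. The function names are our own, chosen for illustration.

```python
import math

# Reference pressure for dB SPL: 20 micropascals, the threshold of hearing at 1 kHz
P_REF = 20e-6

def pascals_to_db_spl(pressure_pa):
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF)

def db_spl_to_pascals(level_db):
    """Convert a level in dB SPL back to an RMS pressure in pascals."""
    return P_REF * 10.0 ** (level_db / 20.0)

if __name__ == "__main__":
    print(pascals_to_db_spl(20e-6))   # the threshold of hearing
    print(db_spl_to_pascals(120.0))   # the threshold of pain, in pascals
```

Note how logarithmic the scale is: the threshold of pain is a million times the pressure of the threshold of hearing, yet it is only 120 dB higher.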
THE HAAS EFFECT (THE PRECEDENCE EFFECT)
This is where two near-identical sounds (or a sound and its early reflection) that are no more than 35 ms apart and within 10 dB of each other will be interpreted as one sound.
A good way to test this theory is to stand 10 m from a concrete wall and clap. You should be able to hear the reflected signal separately as it bounces off the concrete wall and back to your ears. This is because the reflection has travelled 20 metres (10 m there and 10 m back), which puts it roughly 60 ms behind the original sound. If you keep clapping and move closer to the wall, try to stop when you can no longer tell the difference between the two sounds and they just become one sound. You will be roughly 5 m from the wall. At half the distance, the clap returns to your ears in half the time, which is about 30 ms. This is the time window in which humans can no longer distinguish between the two separate sounds.
With this in mind, we can use this knowledge to enhance the stereo image of a sound. For example, we can take a clap sound and duplicate it, then pan one clap hard left and the other hard right. At the moment all this will do is increase the loudness of the sound. However, if we now delay one of these signals by between 5 and 35 ms, the two channels are no longer identical and our sound becomes very wide. As long as we keep the delay below 35 ms, our ears will still think that this is one sound. A caveat is that going above 35 ms we will start to hear the sounds separate, and going below 5 ms will result in metallic and phasey sounds. The Haas window can change slightly depending on the type of sound, such as whether it is a transient or a sustained note, but generally it will be anywhere between 0 and 40 ms.
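The wall experiment and the widening trick can both be sketched in a few lines. This is a minimal illustration, assuming a speed of sound of 343 m/s and a mono signal held as a plain list of samples; `echo_delay_ms` and `haas_widen` are hypothetical helper names, not functions from any DAW or library.

```python
SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees C (assumed)

def echo_delay_ms(distance_to_wall_m):
    """Round-trip delay of a reflection off a wall, in milliseconds."""
    return 2.0 * distance_to_wall_m / SPEED_OF_SOUND * 1000.0

def haas_widen(mono, sample_rate, delay_ms):
    """Return (left, right) where the right channel is a delayed copy of the left.

    Both channels are padded to the same length so they can be played together.
    """
    delay_samples = round(sample_rate * delay_ms / 1000.0)
    left = list(mono) + [0.0] * delay_samples
    right = [0.0] * delay_samples + list(mono)
    return left, right

if __name__ == "__main__":
    for distance in (10.0, 5.0):
        print(distance, "m from the wall:", round(echo_delay_ms(distance), 1), "ms")
    left, right = haas_widen([1.0, 0.5, 0.25], sample_rate=48000, delay_ms=20.0)
    print("delay in samples:", len(left) - 3)
```

In practice you would try delay values across the 5 to 35 ms range by ear, and check the result in mono, since a delayed copy summed with the original can cause comb filtering.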
MASKING
Masking occurs when two sounds with very similar frequency content play at the same time and in the same position in the stereo field of a mix. The louder sound will drown out, or ‘mask’, the quieter sound. For example, if two people are having a conversation in a quiet place, we can hear them perfectly. However, if they are on a busy street and a bus drives past, the chances are we would struggle to make out their conversation, because their voices are masked by similar frequencies from the bus and the surroundings. So we can simply look at masking as sounds being drowned out by louder sounds (and their harmonics) in the same frequency ranges.
CRITICAL BANDS
This just states that masking only occurs over a certain bandwidth. For example, if a narrow band of white noise centred on 2 kHz is played over a 2 kHz sine wave, we won’t hear the sine wave. As we alter the bandwidth of the noise, or the frequency or amplitude of the sine wave, the masking effect will change. A narrow band of white noise at, say, 8 kHz will have little to no masking effect on a 2 kHz sine wave regardless of amplitude, because these frequency ranges are too far apart. However, if we pushed the bandwidth of the white noise to maximum, meaning that it has the same amplitude throughout the entire frequency spectrum, then of course we are not going to hear the sine wave over the white noise.
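As a rough illustration of the 2 kHz and 8 kHz examples, the sketch below estimates the width of the ear's filter at a given frequency using the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore. This formula is our own choice for illustration and is not taken from the article; ERBs are related to, but not identical to, critical bands, and real masking also depends on level and spreads more towards higher frequencies. The `likely_to_mask` check is a deliberately crude, hypothetical helper.

```python
def erb_hz(centre_hz):
    """Approximate auditory filter bandwidth (ERB) in Hz at a centre frequency."""
    return 24.7 * (4.37 * centre_hz / 1000.0 + 1.0)

def likely_to_mask(masker_hz, target_hz):
    """Crude check: does the target fall inside one ERB centred on the masker?"""
    return abs(masker_hz - target_hz) <= erb_hz(masker_hz) / 2.0

if __name__ == "__main__":
    print("ERB at 2 kHz:", round(erb_hz(2000.0)), "Hz")
    print("2 kHz noise over 2 kHz sine:", likely_to_mask(2000.0, 2000.0))
    print("8 kHz noise over 2 kHz sine:", likely_to_mask(8000.0, 2000.0))
```

The useful takeaway for mixing is that the bands get wider as frequency rises, so two sounds need to be further apart in Hz at the top end than in the low mids to stay out of each other's way.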
NATURAL EAR PROTECTION TO LOUD NOISES
The human ear naturally shuts down to protect the cochlea when it is subjected to loud noises. We can think of this as a self-defence mechanism to protect our hearing. We can imitate its effect by playing a loud sound, quickly reducing the level, then bringing it back up again. This is done throughout music production to make sounds seem louder than they are, from big EDM drops right down to the transient level. Some producers zoom in close in waveform view and create small fades, or a dip in volume, between the transient and the tone of a sound to further accentuate its impact. This is also done very well in film sound design for things such as explosions and gunshots.
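The small dip between transient and tone can be sketched as a gain envelope applied to a short range of samples. This is only an illustration of the idea, assuming audio held as a list of floats; `apply_dip` and its parameters are hypothetical names, and the dip position, length and depth would be set by ear.

```python
def apply_dip(samples, start, length, depth_db):
    """Attenuate samples[start:start+length] with a triangular gain dip.

    The gain is close to 1.0 at the edges of the dip and falls towards
    the chosen depth (in dB) at its centre, so there are no hard steps.
    """
    floor = 10.0 ** (-abs(depth_db) / 20.0)  # linear gain at the deepest point
    out = list(samples)
    for i in range(length):
        index = start + i
        if index >= len(out):
            break
        position = (i + 0.5) / length          # 0..1 across the dip
        edge = abs(2.0 * position - 1.0)       # 0 at the centre, near 1 at the edges
        out[index] *= floor + (1.0 - floor) * edge
    return out

if __name__ == "__main__":
    dipped = apply_dip([1.0] * 10, start=2, length=6, depth_db=6.0)
    print([round(s, 2) for s in dipped])
```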
EQUAL LOUDNESS CURVES
As we mentioned earlier in this article, the ear perceives the loudness of a sound differently depending on its frequency. This was discovered by Fletcher and Munson, who did a lot of research and came up with the Fletcher-Munson curves (also commonly known as equal loudness curves). Their findings were that at high volumes a sound will appear to have more high and low end. Monitoring at low volumes is therefore better for balancing and gives a truer representation of the sound (as well as being less damaging to the ears).
Drawing on this knowledge that at loud levels the low and high frequency content sounds louder, we can also figure out that slightly scooping out the mid frequency content and boosting the highs and lows when listening at low volumes can make a mix be perceived as louder and closer. In reality, all we have done is trick the brain into thinking it is listening at a louder level by reducing the mid frequencies slightly.
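To get a feel for how uneven our hearing is at quiet levels, the sketch below uses the standard A-weighting curve. This is an assumption on our part: A-weighting is a simplified approximation loosely based on a low-level equal loudness contour, not the Fletcher-Munson data itself, but it shows the same trend of the ear being far less sensitive to lows than to mids.

```python
import math

def a_weighting_db(freq_hz):
    """Relative sensitivity of the A-weighting curve in dB (about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.0

if __name__ == "__main__":
    for freq in (50.0, 100.0, 1000.0, 4000.0, 10000.0):
        print(int(freq), "Hz:", round(a_weighting_db(freq), 1), "dB")
```

By this approximation a 100 Hz sine wave sits roughly 19 dB down relative to 1 kHz, which is why low-end balances change so much between quiet and loud monitoring.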
FLETCHER-MUNSON TO ALTER DEPTH
To make sounds appear further away, roll off the high frequencies. This simulates sound dissipation through air. Our ears judge depth partly by the ratio of mid frequency content to high and low frequency content. As sound travels through air, the high frequencies are dampened (attenuated) faster than the mid and low frequencies because of their shorter wavelength (wavelength is inversely proportional to frequency).
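Rolling off the highs can be sketched with the simplest possible low-pass filter. This is a minimal one-pole filter written for illustration, not a model of real air absorption; the cutoff frequency and the `sine` test helper are our own assumptions.

```python
import math

def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """Gently roll off content above cutoff_hz (about 6 dB per octave)."""
    coeff = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    state = 0.0
    for x in samples:
        state += coeff * (x - state)  # move a fraction of the way towards the input
        out.append(state)
    return out

def sine(freq_hz, sample_rate, count):
    """Generate a test sine wave with a peak level of 1.0."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

if __name__ == "__main__":
    rate = 48000
    for freq in (200.0, 10000.0):
        filtered = one_pole_lowpass(sine(freq, rate, rate // 10), rate, cutoff_hz=2000.0)
        peak = max(abs(s) for s in filtered[len(filtered) // 2:])
        print(int(freq), "Hz peak after filtering:", round(peak, 2))
```

Lowering the cutoff pushes the sound further back; a high shelf or a steeper filter in your DAW does the same job with more control.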
EARS HEAR SOUND IN AN AVERAGE WAY (RMS)
A sustained sound will be perceived as louder than a transient at the same peak level, because our ears perceive loudness by RMS (average) values rather than peaks. This is why, when we use compression to raise the body of a sound, we think it is louder or punchier than before even though the peak level is the same: we have raised its RMS value. Fundamentally, this rule governs the whole concept of why we use compression to reduce dynamic range and use as much headroom within a mix as possible to achieve the loudest perceived signal possible.
Combining the precedence effect with the way our ears detect loudness in an RMS-type manner, we can beef up sounds using a reverb’s early reflections. By adding these early reflections we boost the RMS value of the body of the sound. As we know, a higher RMS value equates to a louder (and fuller) perceived sound, because the waveform becomes more similar to a sustained note than to a transient. Provided our early reflections arrive less than 35 ms after the original signal, we will hear this as one unified sound rather than an initial sound and its reflection. Making sure we don’t include the late reflections is important, as this prevents us from altering the depth perception of the sound.
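The difference between peak and RMS is easy to see with two test signals that share the same peak level. This is a minimal sketch; the 440 Hz tone, the 5 ms decay and the helper names are arbitrary choices for illustration.

```python
import math

def peak(samples):
    """Highest absolute sample value."""
    return max(abs(s) for s in samples)

def rms(samples):
    """Root mean square level: the square root of the average squared sample."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalise(samples):
    """Scale a signal so that its peak is exactly 1.0."""
    top = peak(samples)
    return [s / top for s in samples]

SAMPLE_RATE = 48000
COUNT = SAMPLE_RATE // 10  # 100 ms of audio

# A sustained 440 Hz tone
sustained = normalise(
    [math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE) for n in range(COUNT)]
)
# The same tone with a fast exponential decay (about 5 ms), acting as a transient
transient = normalise(
    [s * math.exp(-n / (0.005 * SAMPLE_RATE)) for n, s in enumerate(sustained)]
)

if __name__ == "__main__":
    print("sustained  peak:", round(peak(sustained), 2), " rms:", round(rms(sustained), 2))
    print("transient  peak:", round(peak(transient), 2), " rms:", round(rms(transient), 2))
```

Both signals hit the same peak, but the sustained tone carries several times the RMS level, which is the gap compression and early reflections help to close.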
LOCALISATION
Localisation is how the brain figures out the directional positioning of a given sound. This positioning of sounds can be tricky to get right within the confines of a DAW. Check out this YouTube video which harnesses the power of binaural audio technology.
We can split the way our brain locates sounds into a few headings, which have been put into priority order.
Inter-aural time difference (ITD): this is the difference in arrival time between the left and right ear. If I clap in front of you, the arrival times will be fairly equal. However, if I clap to your left, you know instantly that the sound is coming from the left because it reaches the right ear a fraction of a second later than the left ear.
Inter-aural amplitude difference (IAD): this is similar to ITD, but for amplitude instead of time. It is most prevalent in the higher frequencies. It’s worth noting that IAD will be overridden by ITD if the two cues contradict each other.
For example, if I dropped a coin on the floor by your right ear, the sound would reach the right ear first, shortly followed by the left. If for some reason the coin was louder in the left ear, our brain would still interpret the sound as coming from the right, as ITD takes precedence over IAD.
Next we have the ear’s construction. The outer ear (the pinna) is shaped to channel sounds into the ear canal, but that shape muffles sounds approaching from the rear. Because of this anatomy, our brains perceive sounds that are lacking high frequencies as coming from the rear. Micro-movements of the head and differences in high frequency content between the left and right ears also help us to home in on the sound source.
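The size of the inter-aural time difference can be estimated with a simple path-length model. This is a rough sketch that ignores the way sound bends around the head; the ear spacing of 0.18 m and the speed of sound of 343 m/s are assumed values, and `itd_ms` is a hypothetical helper name.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second (assumed)
EAR_SPACING_M = 0.18    # approximate distance between the ears (assumed)

def itd_ms(azimuth_deg):
    """Estimated arrival-time difference between the ears, in milliseconds.

    0 degrees is straight ahead and 90 degrees is directly to one side.
    """
    extra_path = EAR_SPACING_M * math.sin(math.radians(azimuth_deg))
    return extra_path / SPEED_OF_SOUND * 1000.0

if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(angle, "degrees:", round(itd_ms(angle), 2), "ms")
```

Even for a sound directly to one side the difference is only around half a millisecond, well below the 5 ms lower end of the Haas range, which is why such small delays shift position rather than add width.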
SOUND LAYERING
With masking in mind, our ears struggle to differentiate between two sounds of similar frequency content when they are played at the same time. This means that when layering sounds, our ears don’t hear two separate sounds but simply hear one new blended sound instead. We can layer and blend different tones and textures together to create a new, more complex sound.
DISTANCE PERCEPTION
Depth perception relies on reverberant patterns: the timing of a reverberant signal in relation to the surfaces around you. Think of this as how bats use echolocation to see, by listening to how their loud squeaks bounce back off their surroundings.
This can be proven easily by trying to judge how far away a noise is when you’re not near any hard surfaces. It will be much harder to make accurate guesses in environments such as a football field or an anechoic chamber.
In a DAW this is the equivalent of the reverb size and shape. For example, choosing a small drum room preset will emulate a smaller space, whereas if we were using a convolution reverb we could load up an impulse response of a cavern or cathedral for a massive, cavernous reverberated sound.
Depth perception also relies heavily on the dry-to-wet reverb ratio. This is fairly easy to understand: the further away a sound is, the less detail and clarity can be heard, and the more of the room or reverberant signal we will hear in relation to the original.
Our distance perception also relies on being able to detect the dampening of high frequencies as a sound travels through air. As mentioned earlier, air absorbs high frequencies much more than mid and low frequencies over large distances due to their shorter wavelength. For us to know whether a sound’s high frequencies have been rolled off, the sound needs to be familiar to us in the first place, such as the human voice or a certain instrument. Our brains can then use the memory of this sound as a reference point.
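The dry-to-wet point can be sketched with the inverse-square law: the direct sound drops by about 6 dB for every doubling of distance. The sketch below additionally assumes that the reverberant level in a room stays roughly constant with distance, which is a common simplification rather than something stated in the article; the `reverb_level_db` default and the function names are illustrative only.

```python
import math

def direct_level_db(distance_m, reference_m=1.0):
    """Level of the direct sound relative to its level at reference_m."""
    return -20.0 * math.log10(distance_m / reference_m)

def wet_fraction(distance_m, reverb_level_db=-12.0):
    """Share of the total signal (0..1) that is reverb at a given distance.

    Assumes the reverberant level stays fixed while the direct sound falls away.
    """
    dry = 10.0 ** (direct_level_db(distance_m) / 20.0)
    wet = 10.0 ** (reverb_level_db / 20.0)
    return wet / (dry + wet)

if __name__ == "__main__":
    for distance in (1.0, 2.0, 4.0, 8.0):
        print(
            distance, "m: direct", round(direct_level_db(distance), 1), "dB,",
            "wet share", round(wet_fraction(distance), 2),
        )
```

Translated to a mix, pushing a sound back means lowering the dry level while leaving the reverb send where it is, rather than simply turning the whole channel down.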
This article is an excerpt from book 1 of our ‘Zero To Hero Guide To Mixing In Ableton Live’ Ebook Series.