Recording vocals: spoken word and singing
Recording Setup:
Equipment includes: a microphone, an audio interface, a DAW and active monitors.
Audio interfaces, mixers and some microphones have gain control dials. These dials allow you to boost or attenuate the signal feeding the device. There may be multiple gain controls across a single system and it is paramount that each one is set correctly. This process is called gain staging. A signal that exceeds a device’s optimal input level is considered too hot and will cause the device to distort or clip as it tries to process the signal. As a general rule, don’t clip the meters at any point between devices.
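To make gain staging concrete, here is a minimal Python sketch that adds up the gain of each device in a chain and reports how much headroom is left before that device clips. The function name stage_levels, the stage names and every number below are made-up values for illustration, not measurements from real equipment, and all levels are assumed to share one decibel reference.

```python
def stage_levels(input_db, stages):
    """Return (name, output level, headroom) for each stage in a chain.

    stages is a list of (name, gain_db, max_level_db) tuples, where
    max_level_db is the level at which that stage starts to clip.
    """
    level = input_db
    report = []
    for name, gain_db, max_level_db in stages:
        level += gain_db                 # gain in dB simply adds up
        headroom = max_level_db - level  # negative means the stage clips
        report.append((name, level, headroom))
    return report


# Hypothetical chain: a quiet microphone signal into a preamp, then an interface.
chain = [
    ("mic preamp", 40.0, 20.0),
    ("interface input", 0.0, 18.0),
]

for name, level, headroom in stage_levels(-45.0, chain):
    status = "OK" if headroom > 0 else "CLIPPING"
    print(f"{name}: {level:+.1f} dB, headroom {headroom:+.1f} dB ({status})")
```

If any stage reports negative headroom, turn down the gain on that stage or on an earlier one until every stage is back in the clear.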
As air molecules vibrate, they move the diaphragm of the microphone and an electrical signal is generated. That signal reaches the audio interface and is converted to a digital signal that is received by the recording software. The software processes the digital signal and sends it back to the audio interface, where it is converted back to an electrical signal and sent to the active monitors. If you are using passive monitors, the audio interface sends that electrical signal to an amplifier first, and the amplifier then drives the monitors.
Recording levels:
Since we are recording analog sources into a digital program, there are a few things to be aware of. Analog devices use a different decibel scale from digital meters: analog meters are typically referenced in VU or dBu, while digital meters read in dBFS, where 0dBFS is the highest level the converter can represent. Knowing how the two scales relate is useful when aiming for an optimal recording.
Analog devices such as amplifiers, microphone pre-amplifiers and busses on consoles start to add harmonic distortion and colour to a signal when pushed to their upper limits. This is a big reason why analog recordings are often considered warm and pleasant, as opposed to the brightness often attributed to digital conversion. Engineers and producers will frequently overdrive analog circuits on purpose, turning up the gain dials to push the signal above 0dB in the analog realm and create harmonic distortion. 0dB on an analog meter is commonly aligned to around -18dB in the digital world. Recording around -18dB digitally is considered good practice, as this keeps the analog devices in the chain below the level where they produce artefacts such as harmonic distortion. This will render a clean recording, which is always the best starting point when processing not only vocals, but any audio for that matter.
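The relationship between the two meters can be sketched in a few lines of Python. This assumes the common alignment where 0dB on an analog VU meter sits at -18dB on the digital (dBFS) scale; some studios align to a different value, so treat REFERENCE_DBFS and the function names as assumptions for illustration.

```python
import math

REFERENCE_DBFS = -18.0  # assumed alignment: 0 VU on the analog meter == -18 dBFS


def vu_to_dbfs(vu_db, reference_dbfs=REFERENCE_DBFS):
    """Translate an analog VU reading into the matching digital meter reading."""
    return vu_db + reference_dbfs


def dbfs_to_amplitude(dbfs):
    """Convert dBFS to linear sample amplitude, where 1.0 is digital full scale."""
    return 10.0 ** (dbfs / 20.0)


def amplitude_to_dbfs(amplitude):
    """Convert linear sample amplitude to dBFS; silence maps to -infinity."""
    if amplitude == 0:
        return float("-inf")
    return 20.0 * math.log10(abs(amplitude))


for vu in (-6.0, 0.0, 3.0):
    dbfs = vu_to_dbfs(vu)
    print(f"{vu:+.1f} VU -> {dbfs:+.1f} dBFS, amplitude {dbfs_to_amplitude(dbfs):.3f}")
```

Under this assumption, a signal sitting at 0 VU still has 18dB of headroom before it reaches digital full scale, which is why peaks rarely clip when you record around that level.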
Recording spoken word for podcasts, dialogue, ADR and tutorials.
Chain:
HPF: roll off any rumble or low frequency hum with a 2 or 4 pole filter at 60Hz
EQ: listen for boomy, boxy, honky and harsh bands of frequencies and balance them
Noise gate: for low amplitude signals, set a noise gate with a threshold around -30dB. Anything below this threshold will not be loud enough to open the gate and will be attenuated.
Compressor: 6:1 ratio with fast attack and medium release to catch any sharp transients
Compressor: 3:1 ratio with medium attack and slow release to dynamically control the voice
De-esser: use the band pass input EQ setting and sweep around the 5kHz to 8kHz band. Vocal sibilance fatigues the ear and becomes harsh if not corrected.
EQ: after compression, some frequencies may build up a little too much. Use this EQ as a safety feature should the voice become slightly unbalanced.
Limiter: finally, cap off the device chain with a limiter to catch any stray peaks that could potentially squeeze by. This ensures clipping the meter is not possible, preventing digital distortion in the recording.
There are many ways to chain devices together for optimal processing and they are all dependent on microphone design, room acoustics, proximity and even the voice itself. This is one way to record a voice to ensure you have as clean a recording as possible. Tweak these settings to fit each type of voice.
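The dynamics part of this chain (the gate and the two compressors) can be sketched as simple level arithmetic in Python. This is only a static sketch: it ignores attack and release times, the EQs, the de-esser and the limiter. The -30dB gate threshold and the 6:1 and 3:1 ratios come from the chain above, while both compressor thresholds are hypothetical values picked for the example.

```python
def gate_is_open(level_db, threshold_db=-30.0):
    """A gate only opens for signals at or above its threshold."""
    return level_db >= threshold_db


def compress_db(level_db, threshold_db, ratio):
    """Static compressor curve: overshoot above the threshold is divided by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


# Hypothetical thresholds; the chain above only specifies the ratios.
PEAK_THRESHOLD_DB = -12.0  # 6:1 compressor that catches sharp transients
BODY_THRESHOLD_DB = -20.0  # 3:1 compressor that controls the voice overall

for level in (-40.0, -24.0, -6.0):
    if not gate_is_open(level):
        print(f"{level:+.1f} dB in: gate stays closed")
        continue
    after_peak = compress_db(level, PEAK_THRESHOLD_DB, 6.0)
    after_body = compress_db(after_peak, BODY_THRESHOLD_DB, 3.0)
    print(f"{level:+.1f} dB in, {after_body:+.1f} dB out")
```

Changing the two threshold constants is a quick way to see how much each compressor contributes before you reach for the real plugins.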
As spoken word is rarely layered with music or background sounds, it is important to have a very clean, ambience-free recording that is dynamically controlled and balanced on its own. Singing is slightly different. Typically, you will need an HPF, some EQ and compression to get an optimal recording, but the key is to capture as much of the voice as possible, because the clean-up and correction should be done within the context of the song. The song may need more low end in the vocals for them to be heard in the mix, and if we remove the low end completely with a filter during recording, it cannot be brought back later. Similarly, if we compress the singer’s voice too much, it will take a lot of work to regain the dynamic range if the song calls for it.
Chain:
HPF: roll off any rumble or low frequency hum with a 2 or 4 pole high pass filter at 60Hz. Some microphones offer a filter directly on the microphone.
EQ: listen for boomy, boxy, honky and harsh bands of frequencies and balance them
Compressor: light compression will keep the vocal in a controlled range while maintaining some of the dynamics
De-esser: use the band pass input EQ setting and sweep around the 5kHz to 8kHz band. Vocal sibilance fatigues the ear and becomes harsh if not corrected.
Limiter: finally, cap off the device chain with a limiter to catch any stray peaks that could potentially squeeze by. This ensures clipping the meter is not possible, preventing digital distortion in the recording.
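The "2 or 4 pole" choice for the high pass filter sets how steeply it rolls off: each pole adds roughly 6dB per octave of attenuation below the cutoff, so 2 poles give about 12dB per octave and 4 poles about 24dB per octave. The Python sketch below compares the two at a 60Hz cutoff. It assumes an ideal Butterworth response, which is only one possible filter design; the filter in your DAW or on your microphone may behave differently.

```python
import math


def butterworth_hpf_gain_db(freq_hz, cutoff_hz=60.0, poles=2):
    """Gain in dB of an ideal Butterworth high pass filter at freq_hz."""
    ratio = cutoff_hz / freq_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * poles))


# Compare both slopes one octave below, at, and one octave above the cutoff.
for freq in (30.0, 60.0, 120.0):
    two_pole = butterworth_hpf_gain_db(freq, poles=2)
    four_pole = butterworth_hpf_gain_db(freq, poles=4)
    print(f"{freq:.0f} Hz: 2 pole {two_pole:.1f} dB, 4 pole {four_pole:.1f} dB")
```

The steeper 4 pole filter removes more rumble, but it also takes away low end that cannot be restored later, so the gentler 2 pole setting is the safer choice when recording a singer.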
After the recording, more EQ, compression, saturation and even multiband compression can be applied to sit the vocals in the song. Time-based and modulation effects such as reverb, delay, chorus, phaser and flanger should be used at the end of the effect chain, as these are used only to enhance a recording, not to correct it. Use these effects to create depth and width.
There are lots of ways to clean up a voice recording, whether it’s spoken word or singing, but capturing the cleanest recording possible will make for optimal processing.