Training at Disney Animation II: Size Matters

In the world of animation the world can be as big a basketball and a basketball can be as big as the world. In the real world, size matters. Not in the animation world. In the world of animation audio, size does not seem to exist either. Low frequencies don’t have a wavelength in ProTools. Sure they look all stretched out compared to the HF samples but there are no worries that they are too big to fit in the hard drive. This is not just an issue for this particular studio, but rather studio world in general. Such spaces are controlled environments. Acoustically sterile. Tracks can be synthesized, spliced, re-sampled to another frequency range, mixed with another track from across the world with little regard for the acoustics of the space. The physical nature of acoustics is far removed from the thoughts of people in these quiet spaces. Those of us in live sound never have the luxury of NOT thinking about the affects of the local acoustics, and the loudspeakers that fill them. Live sound people live in an acoustic combat zone – battling the interaction of multiple speakers, open microphones, stage noise, reflections and more.  Studio sound folks can isolate and control sounds to create a clean sound or, if desired, purposefully create an acoustic combat zone for the listeners.

Things we need to know to do our job, and things we accept and move on

Digital audio is pretty much magic to me. Sure it is 1’s and 0’s, a clock frequency and sample rates and all but I don’t visualize shift registers and memory locations as a filter. I turn a knob on my digital parametric and I look at the result on the analyzer. Its amplitude and phase responses are indistinguishable from the analog filters that I understand down to the resistors, capacitors and op-amp feedback loops. The end result is that I accept the digital filter as functionally equivalent and move on to do my job – set the filters to the best shape for the application. Each of us has unique areas that we accept and move on from in order to specialize in our area of interest. Don’t ask me to operate your mix console artistically, but I will be happy to show you lots of interesting things, scientifically,  about what happens to the signal in its custody.

In the world of studio mixing, the focus is on acquisition and manipulation of the “media”:  a recorded bit stream of audio. Once captured the media is free of the constraints of physical size until its final delivery to the audience. The only place where size matters is in the differences between the mix room and the showroom. Here is where an interesting set of proportions play out: a tiny minority of listeners (the mixers) will do the vast majority of the listening  in the small space of the studio, The vast majority of the listeners (the audience) will do a tiny minority of the listening (perhaps even just once – rather than 100’s of times like the mixer)  in the showroom. So the link between the rooms is critical. We have a very short window for people to experience the details the mixers worked so hard to create in their controlled studio environment.

My job here was to train the engineers in how to operate the analyzer. The analyzer measures differences – this is pretty much its entire capability: differences in level, time, distortion, phase, added noise. This is exactly the tool needed to link the studio and the showroom. But the tool is useless without the knowledge of how the differences are shown, and the vital link as to how those differences are perceived. Night and day differences can be shown on the screen – that have no perceptible sonic difference. Conversely we can enact audible differences that will be invisible to our analyzer. It is going to be important to know the difference.

It was not surprising to me that media/engineers there have spent little time considering the acoustical physics at play. It is not their job. The acoustics of the rooms and the speakers are provided by others. Unless there are gross deficiencies in the mixing room setup they can move ahead with their work. Each individual knows which rooms they like – but the central criterion for these engineers is how well the mixing rooms predict the listening experience in the showroom. It is possible with extensive ear training to be extremely competent at hearing the translation, and memory mapping the difference in order to anticipate the effects. It is highly improbable, in my opinion, that one can figure out how to affect these two different spaces in order to make them the most similar, without a thorough understanding of the physical nature: the size, shape, speed of sound, and the mechanisms in our human anatomy related to our sonic perception. A mix engineer predicts the translation, a system and/or an acoustical engineer affects the translation. The system engineer’s role is to help the mix engineer’s predictions come true.

The relative analyzer: Human and machine

Disney animation’s purchase of the SIM dual-channel FFT analyzer creates an opportunity to open a window into the physical nature of sound. The analyzer’s renderings are purely objective, and only displays what physically exists, i.e. – no prediction, no simulation. This does not mean it displays things exactly as we hear them. It measures relationships – some of which we experience directly, some indirectly. For example, let’s listen to a recorded violin track. ProTools can show us the sampled waveform over time of the track – amplitude vs. time. The analyzer can show us the spectral content over frequency, the relative levels over frequency over a given period of time. ProTools can (or at least could someday) show you this as well. That is still the easy part because we still have a one-dimensional rendering of the signal – level/freq. This can also be done with a Real-Time-Analyzer – the king of one-dimensional audio analysis.

Where the analyzer breaks out on its own is in the relative response: the response of the violin at my ear – compared to its own response in its recorded electronic form.  The analyzer can see peaks and dips, how much time it took to travel, how different the arrival time is over frequency (phase over frequency), how much HF was lost in the air, how much energy the room reflections added to the low end, how much noise was added to the signal and more. These modifications to the waveform all come from physical realities, and therefore, the best solutions come with an understanding of the physical nature.

The analyzer sees the difference between an electrical signal an acoustical. Humans can’t do that  unless they have an XLR input in addition to their ears. We hear only the end product, complete with its acoustical effects. We are, however, doing our own version of relative analysis. When we hear the violin track we are comparing it in our brain to our memory file of violin sounds. This history file gives us a variety of reference points to consider: how close we are to the violin, how big the room is, how live the room is, whether the violin being mic’d or a pickup, the presence of nearby surfaces that create a strong reflection to mar the tone, and much more. If the violin sounds close, we have the perspective of being on stage. If it has a long reverberation tail we are cued to believe that we are in a large reflective room. If the picture in front of us is a distant violin playing outdoors our brains will know we will have an implausible scene.

Two other relative analyzers for humans (and other animals) are used for localization.  Binaural localization compares the signals at the left and right ears for arrival time and level. Sound from the left side arrives earlier and louder at that ear and we localize the sound there. For vertical localization we use the comb filter reflection signature of our outer ear to determine up and down. The outer ear is asymmetric in shape and therefore a different set of reflections guides the sound into our ear from above than below. We compare the reflection structure riding on the incoming signal to the memory mapped signature set in our heads and match it up to find the vertical location.

The FFT analyzer operates differently: give it any two signals and they can be compared. Differences in time and level over frequency are directly shown. The analyzer does not need to know what a violin sounds like to know if it is being accurately reproduced. Whatever sound is generated can be the reference for comparison.

The next level is the relative/relative. We can compare the response at one location to another – or at a given time to another – or both. We can look at the studio response in multiple locations, look at the studio compared to the theater etc.  Our human analyzer has this capability as well but this is a considerable challenge to get specific data. One can walk around a room and observe the differences, or you can play your reference track at the studio and then take it to the theater. While it is not so difficult to walk around and hear differences in overall level,  gross frequency response characteristics and spot a strong peak here and there, it is very difficult to pinpoint a cause such as 2ms difference in arrival between two speakers or a 3ms reflection off the console.  It is possible that a person walking around the room can find these things and more by ear alone and propose the best solutions. I could win the lottery too. The probability of success go up greatly if we add the analyzer results to the data set ( we don’t have to stop listening and walking) and supplement our ears with information that we cannot directly experience. Our ears never directly say: The time gap between the direct and reflected sound is 8 ms.  They hear it. The data is in there – but you can’t access it. With our analyzer this number pops right out and the resulting peaks at 125 Hz, 250 Hz , 375 Hz and up will be as clear as the Himalayas.  But in order to get these answers, we will need to know enough about acoustics behavior to have this info at our fingertips.

To be continued


  1. nice info,thaks.

Speak Your Mind