Phase Alignment of Subs – Why I don’t use the impulse response

OK. The saga continues down below……………………….

What is the best way to phase align our subwoofers to the mains? There is a hint in the way the question was phrased. I didn’t say time align (and it is not because I am afraid of copyright police). I say phase align because that is precisely what we will do. Simply put, you can’t time align a subwoofer to the mains. Why? Because your subwoofers are stretched over time – the highest frequencies in your subwoofer can easily be 10-20 ms ahead of the lowest frequencies. Whatever delay time you choose leaves you with a pair of unsettling realities: (a) you are only aligning the timing for a limited (I repeat LIMITED) frequency range, and (b) you are only aligning the timing for a limited (I repeat LIMITED) geographical range of the room. So the first thing we need to come to grips with is the fact that our solution is by no means a global one. There are two decisions to make: what frequency range do we want to optimize for this limited partnership, and at what location.

Let’s begin with the frequency range. What makes the most sense to you? 30 Hz (where the subs are soloists), 100 Hz (where the mains and subs share the load) or 300 Hz (where the mains are soloists)?  This should be obvious.  It should be just as obvious that since we have a moving target in time, there is no one timing that can fit for all.

Analogy: a 100-car freight train crosses the road in front of you. What time did the train cross the road? The answer spans 5 minutes, depending on whether you count the engine, the middle of the train, or the end. So it is with the question: when does the subwoofer arrive? (The same is true for: when does the main arrive?) How do we couple two time-stretched systems together? In this case it is pretty simple. We will couple the subwoofer train right behind the mains. The rear of the mains is 100 Hz and the front of the subs is the same. We will run the systems in series. The critical element is to link them at 100 Hz. (I am using 100 Hz as an example – this can, and will, vary depending upon your particular system.)

The procedure is simple. Measure them both individually, view the phase and adjust the delay until they match. You have to figure out who is first and then delay the leader to meet the late speaker. This will depend upon your speaker and mic placement. I say this is simple – but in reality, it is quite difficult to see the phase response down here. Reflections corrupt the data – it is a real challenge.  Nonetheless, it can be done. It’s just a pain.
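To put a number on "adjust the delay until they match": a phase difference read at one frequency corresponds to a time offset of phase/360 of one period at that frequency. The short Python sketch below shows that arithmetic only. The function names are hypothetical and do not come from any analyzer, and in practice the match is judged by watching the traces, not by a formula.

```python
def phase_to_delay_ms(phase_diff_deg, freq_hz):
    """Time offset (ms) equivalent to a phase difference at one frequency."""
    period_ms = 1000.0 / freq_hz
    return phase_diff_deg / 360.0 * period_ms


def wrapped_candidates_ms(phase_diff_deg, freq_hz, cycles=(-1, 0, 1)):
    """A phase reading repeats every 360 degrees, so offsets that differ by
    a whole period look identical at that single frequency."""
    period_ms = 1000.0 / freq_hz
    base = phase_to_delay_ms(phase_diff_deg, freq_hz)
    return [base + n * period_ms for n in cycles]


# Example: traces are 90 degrees apart at a 100 Hz crossover (assumed values).
print(phase_to_delay_ms(90, 100))
print(wrapped_candidates_ms(90, 100))
```

The second function is a reminder that a single-frequency reading cannot distinguish these candidates; the phase traces have to match over the whole shared range, not just cross at one point.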

When I get a moment I will post up some pics to show a sub phase-align in the field. 

Wouldn’t it be nice if there was a simpler method? Like using the impulse response to get a nice simple answer directly in milliseconds, instead of having to watch the fuzzy phase trace.  It is absolutely true that the impulse response method is easier.  In my next post I will explain why the easy way lacks sufficient accuracy for me to ever use with a client.

******************  Part II *****************************************

FFT measurement questions and answers

The first thing to understand about an impulse response is that it is a hypothetical construct. This could, to some extent, also be said about our phase and amplitude measurements, but it is much more apparent – and relevant with an impulse response.

The response on our analyzer is always an answer to a question. The amplitude response answers the question: What would be the level over frequency if we put in a signal that was flat over frequency? This is not hard to get our heads around. If we actually put in a flat signal (pink noise) we would see the response directly in a single channel. If not, we can use two channels and see the same thing as a transfer function. This makes it a hypothetical question – what would the response be with a flat signal – even if we use something like music.

Same story with phase but this gets more complex. Seen any excitation signals with a flat amplitude AND phase response?  You won’t find that in your pink noise. Pink noise achieves its flat amplitude response only by averaging over time. Random works for amplitude – but random phase – yikes – this will not get us any firm answers. In the case of phase we need to go to the transfer function hypothetical to get an answer – the phase response AS IF we sent a signal with flat phase in it. Still the answer is clear: this is what the system under test will do to the phase response over frequency.

Impulse response

The impulse response display on our FFT analyzer answers this question: what would be the amplitude vs. time response of the system under test IF the input signal was a “perfect” impulse.  Ok……….. so what is a perfect impulse?  A waveform with flat amplitude AND phase. That can’t be the pink noise described earlier, because pink noise has random phase. So what is it?  A single cycle of every frequency, all beginning at the same time. Ready, set, GO, and all frequencies make a single round trip and stop. They all start together, the highest freq finishes first, and the lowest finishes last. If you looked at this on an oscilloscope (amp vs time) you would see the waveform rise vertically from a flat horizontal line, go to its peak and then return back to where it started.

IF the “perfect” impulse is perfectly reproduced it will rise and fall as a single straight line. The width of the line (in time) will relate to the HF limits of the system. The greater the HF extension, the thinner the impulse. As the HF range diminishes, the shortest round trip takes more time, and as a result the width of the impulse response thickens as the rise and fall reflect the longer timing. A system with a flat phase response has a single perfect rise and fall in its impulse response and a VERY important thing can be said about it: a single value of time can be attributed to it. The train arrives at 12:00 pm. All of it.

The impulse response on the FFT analyzer is not an oscilloscope. We do not have to put in a perfect impulse. We will use a second generation transfer function, the inverse Fourier transform (IFT), which is derived from the xfr frequency and phase responses. This is the answer to the hypothetical question: what would the amplitude vs time response be IF the system were excited by a perfect impulse.

If the system under test does not reproduce the signal in time at all frequencies, then the impulse response shape will be modified. Any system that does NOT have a flat amplitude and phase response will see its impulse response begin to be misshapen. Stretching and ringing, undershoot and overshoot will appear around the vertical peak. Once we are resigned to a non-flat phase response we must come to grips with the fact that a single time value can NOT describe the system. The system is stretched. The time is stretched. The impulse is stretched.
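A small numerical sketch can make the stretching visible. The Python below builds two made-up systems with flat amplitude: one where every frequency shares the same delay, and one where the lows arrive progressively later. It then computes each impulse response with a plain inverse DFT. The FFT size, the delay values and the function names are all assumptions chosen for illustration; this is not how any particular analyzer computes its IR.

```python
import cmath
import math

N = 256          # FFT size in samples (illustrative)
BASE_DELAY = 32  # delay in samples shared by every frequency


def impulse_response(extra_lf_delay):
    """Inverse DFT of a flat-amplitude system whose lows arrive late.

    extra_lf_delay is the additional delay (in samples) at the lowest
    frequency; it tapers linearly to zero at the top of the band.
    """
    half = N // 2
    phase = [0.0] * (half + 1)
    for k in range(1, half + 1):
        delay = BASE_DELAY + extra_lf_delay * (1.0 - k / half)
        # Phase is the running sum of delay over frequency.
        phase[k] = phase[k - 1] + 2.0 * math.pi * delay / N
    spectrum = [cmath.exp(-1j * p) for p in phase]
    # Mirror the lower half so the time response comes out real.
    spectrum += [spectrum[N - k].conjugate() for k in range(half + 1, N)]
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / N)
            for k in range(N)).real / N
        for n in range(N)
    ]


def describe(ir):
    """Return (peak sample, peak height, samples within 20 dB of the peak)."""
    magnitudes = [abs(v) for v in ir]
    peak = max(magnitudes)
    spread = sum(1 for m in magnitudes if m > 0.1 * peak)
    return magnitudes.index(peak), peak, spread


flat = describe(impulse_response(0))
stretched = describe(impulse_response(40))
print("flat phase:", flat)
print("stretched :", stretched)
```

With zero extra delay the response collapses to a single sample, so one time value describes it. With the lows lagging, the same energy is smeared over many samples and the peak is only one point on a stretched shape.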

This is where the FFT impulse response can be misleading. We can easily see a high-point on the impulse response, even one that is highly stretched. Our eyes are naturally drawn to the peak – and most FFT analyzers automatically will have their cursors track the peak – and lead us to a simple answer like 22.4 ms, for something that is stretched 10ms either side of that. And here is where we can really get into trouble: we can nudge the analyzer around to get a variety of answers to the same question (e.g. the same speaker) by deciding how we want to filter time and frequency: ALL OF WHICH ARE POTENTIALLY MISLEADING BECAUSE NO SINGLE TIME VALUE CAN DESCRIBE A STRETCHED FUNCTION.

Did I mention that all speakers (as currently known to me) are time stretched?  So this means something pretty important. The simplistic single number derived from an impulse response can not be used to describe ANY speaker known (to me) especially a subwoofer.

Does a stretched impulse response tell you what frequencies are leading, and by how much? Good luck.  You would have a better chance decoding a German Enigma machine than divining the timing response over frequency out of the impulse. This brings us back to the heart of the problem with our original mission: we are trying to link the low frequencies of the main speaker (100 Hz) to the high frequencies of the subwoofer (100 Hz). The peaks of these two respective impulse responses are in totally different worlds. They are both strongly prejudiced toward the HF ranges of their particular devices which means the readings are likely to be the timings of 10 kHz and 100 Hz respectively. 

Simple answers for complex functions. Not so good.  That’s it for the moment. Next I will describe some of the different ways that impulse responses can be manipulated to give different answers and when and where the impulse response can provide an accurate means of setting delays.

********************** Part III ***********************************************

The linear basis  of the impulse response

 Those of us using the modern FFT analyzers that are purpose-built for pro (and amateur) audio have been spoiled. We have grown so accustomed to looking at a 24 or 48 point/octave frequency response display that we forget that this is NOT derived from logarithmic math. The FFT analyzer can only compute the frequency response in linear form. The quasi-log display we see is a patchwork of 8 or so linear computations put together into one (almost) seamless picture.  Underlying this is the fact that the composite picture is made up of a sequence of DIFFERENT time record lengths. Bear in mind that the editing room floor of our FFT analyzer is littered with unused portions of frequency data. We have clipped and saved only about half the freq response data from any of the individual time records.

How does this apply to the impulse response? In a very big way. The impulse response is derived from the transfer function frequency response (amp and phase). It is a 2nd generation product of the linear math. The IR is computed from a single frequency response – from a single time record – which means it comes from LINEAR frequency response data.  The inverse Fourier transform (IFT) cannot be derived from the dissected and combined slices we use for the freq response. The IR cannot contain equal amounts of data taken from a 640 ms, 320 ms, 160 ms…. and so on down to 5 ms to derive its response. Think it through……… there is a time axis on the graph. It has to come from a single time event.
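For a sense of scale, here is the halving sequence of time records mentioned above, with the linear frequency spacing each one yields (spacing = 1 / record length). This is a Python sketch of the arithmetic only; the assumption that there are exactly eight records, each half the previous one, is mine, chosen to run from 640 ms down to 5 ms.

```python
def time_records_ms(longest_ms=640.0, count=8):
    """Record lengths that halve each step: 640, 320, ... (assumed sequence)."""
    return [longest_ms / 2 ** k for k in range(count)]


def bin_spacing_hz(record_ms):
    """Linear frequency spacing of an FFT taken over one time record."""
    return 1000.0 / record_ms


records = time_records_ms()
for length in records:
    print(f"{length:6.1f} ms record -> {bin_spacing_hz(length):8.4f} Hz spacing")
```

Each record gives a different resolution, which is why the stitched-together frequency display cannot be turned back into one impulse response: the IR needs one record and therefore one spacing.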

The IR we see comes from a single LINEAR transform. The importance is this: linear data favors the HF response. If you have 1000 data points, 500 of them are the top octave, 250 the next one down and so on. This means that our IR peak – where the “official” time will be found – is weighted in favor of the highest octave. If you have a leading tweeter, the IR will find it ahead of the pack (in time and level). The mids and lows will appear as lumpy foothills behind (to the right of) the Matterhorn peak. If you have a lagging tweeter, the IR will show the lumpy foothills ahead of the peak (to the left), but the peak will still be the highest point.  Our peak-finding function will still be drawn to the same point – the peak.
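The 500 / 250 / ... split is easy to check by counting. The Python sketch below counts how many of 1000 evenly spaced bins fall in each octave, working down from the top. The function name is hypothetical and the bins are simply numbered 1 to 1000; no particular analyzer is implied.

```python
def bins_per_octave(n_bins, n_octaves):
    """Count linearly spaced bins (numbered 1..n_bins) in each octave,
    starting with the top octave and working downward."""
    counts = []
    upper = n_bins
    for _ in range(n_octaves):
        lower = upper / 2
        counts.append(sum(1 for k in range(1, n_bins + 1) if lower < k <= upper))
        upper = lower
    return counts


print(bins_per_octave(1000, 5))  # [500, 250, 125, 63, 31]
```

Five octaves down, an octave is represented by a few dozen points while the top octave has five hundred, which is the weighting that pulls the IR peak toward the HF.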

Now consider a comparison of arrival between two speakers – if they both extend out to 16 kHz (mains and delays) then the prejudice of the IR in favor of the HF response evens out. If we find the arrival time for both we can lock them together. Their response will be in phase at 16 kHz and remain in phase as we go down – (TO THE EXTENT THAT THE TWO SPEAKER MODELS ARE PHASE MATCHED).  This is a PARALLEL operation. 10 kHz is linked to 10 kHz and 1k to 1k and 100 to 100 for as long as they share their range. If the speakers are compatible, one size fits all and the limitations of the IR are even on both sides of the equation. If they are not compatible over frequency, we will need to see the PHASE response to see where they diverge, and enact solutions from that viewpoint – more on that later.

Now back to the subs…………

It should be clear now that the linear favoritism over frequency will NOT play out evenly in joining a sub to a main speaker. This is also true of aligning a woofer and tweeter in a two-way box. This problem holds for ANY spectral crossover tuning. Linear frequency math does not have a fair and balanced perspective over frequency. If you are looking at devices with different ranges they are subject to this distortion.  The location of the peak found in our IR is subject to the linear focus. If the main speaker is flat the peak will be found where there are more data points: the top end – 4 to 16 kHz. All other freq ranges will appear RELATIVE (leading or lagging) to this range. If you have a speaker that is similar to 100% of the speakers I have measured in the last 26 years, then one thing is certain: the response at 100 Hz is SUBSTANTIALLY behind the response we just found at 8 kHz.

The sub is NOT flat (duh!!) so there is a tradeoff game that goes on in the analyzer. As we lose energy (frequency rising) we gain data points (linear acquisition) so the most likely place the peak will be found is in the upper areas of the subwoofer range and/or somewhat beyond, before it has been too steeply attenuated.  If you have a subwoofer that is similar to 100% of the speakers I have measured in the last 26 years, then one thing is certain: the response at 30 Hz is SUBSTANTIALLY behind the response we just found at its upper region.

One of the reasons I have heard given for using the IR values alone to tune spectral crossovers (subs+mains, or woofer+tweeter) is that the IR gives us “the bulk of the energy” for each driver and aligning “bulk of the energy1+bulk of energy2 = maximum bulk of energy.”  Sounds good in text. But it does NOT work that way. You are making a series connection at a specific freq range, not a parallel connection (where bulk might apply). Furthermore, the bulk formula is flawed anyway – because the linear freq nature of the IR means that the two “bulks” are weighted differently.

********************** Part  IV ******************

There are a variety of ways to compute an impulse response on an FFT analyzer. All of them have an effect on the shape of the response, how high the peak goes, and where (in time) the peak is found. Without going hard into the math we can look at the most decisive parameters.

VERY SIMPLIFIED IR Computation Features

1) The length of time included after time zero (the direct sound), in seconds, milliseconds etc.:  This differs from the actual time record captured, since there is positive and negative time around time zero – but the math there is not important. In the end we have a span of time included in the computation.  This puts an end-stop on our display – we can’t see a 200 ms reflection if we have only 100 ms of data after the direct sound. We could, however, choose to display less than the full amount of data we have. The visual may be a cropped version of the computation, or it could be the full length.  The capture time also limits how low we can go in frequency. We can’t see 30 Hz if we only have 10 ms of data. Most IR responses have the option of large amounts of time, so getting low frequencies included will not be a big issue. The fact that the frequency response is LINEAR means that frequency weighting favors the HF – no matter how long – or short – our capture is.

2) Time increments/FFT resolution/sample freq:  How finely do we slice the response in time? The finer the slices, the more detail we will see. More slices = higher frequencies. If we slice it into .02 ms increments (50 kHz sample rate) we can see up to 25 kHz. If we slice at lower sample rates, the frequency range goes down. The same speaker, measured over the same amount of time, with different sample rates/time increments will include different frequency ranges – and therefore MOST LIKELY will have its impulse peak centered at a different time. This is important. The speaker did not change, but our conclusions about it did. This is a non-issue if we are comparing two speakers that each cover the same range – they would both have the same shift applied to them. But if we have one speaker with a full HF range and one without, the playing field just got tilted. If one speaker really has no HF, and the other one does – but it is filtered by the analyzer – can we then assume that synchronizing the two peaks will put the speakers in phase?

3) Vertical scale: Linear/Log:  The uncultured version of the IR is linear in time, freq and level. This means that things that go negative will peak downward while positive movement goes upward. Polarity (and its inversion) can be seen. The downside of this is that the linear vertical scaling translates very poorly visually when it comes to seeing the details of the IR such as late arrivals, reflections, etc. Worse yet is trying to discern level differences in linear. The Y axis does not read in dB. It reads in a ratio and this has to be converted. Upward peaks have a positive value and downward have a negative value. The strength of an echo can be computed by the ratio of the direct to the echo levels – and converted by the 20 log formula into dB. Where it gets strange is when you try to compare a positive direct sound to a negative-going reflection.

The log version is obtained by the Hilbert transform and shows the vertical scale in dB. But the downside is that there isn’t a downside. Pun intended. What I mean is that the negative side of the impulse is folded over with the positive and these are combined into a single log value. This can now be displayed in dB since everything is going one way. This has various names: Energy-time-curve (ETC) among others. The visual display is blind to the polarity but I am told by Sam Berkow that the cursor in SMAART shows whether or not the energy is positive or negative – even though it all displays positive.
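The numbers quoted in the three items above follow from simple arithmetic, sketched here in Python. The function names are hypothetical. Two simplifications are mine and should be read as assumptions: the lowest visible frequency is taken as the one whose full cycle just fits in the capture, and the echo level uses the absolute value of the ratio, which throws away polarity in the same way the log display does.

```python
import math


def lowest_frequency_hz(capture_ms):
    """Lowest frequency whose full cycle fits in the captured time span."""
    return 1000.0 / capture_ms


def highest_frequency_hz(time_increment_ms):
    """Upper frequency limit: half the sample rate implied by the time slice."""
    sample_rate_hz = 1000.0 / time_increment_ms
    return sample_rate_hz / 2.0


def echo_level_db(direct, echo):
    """Echo level relative to the direct sound, from linear IR readings.
    Polarity is discarded by taking absolute values (an assumption)."""
    return 20.0 * math.log10(abs(echo) / abs(direct))


print(lowest_frequency_hz(10))     # 10 ms of data reaches down to 100 Hz, not 30 Hz
print(highest_frequency_hz(0.02))  # 0.02 ms slices reach up to about 25 kHz
print(echo_level_db(1.0, -0.5))    # negative-going echo at half the direct level
```

For 30 Hz to fit, the capture would need to be at least about 33 ms by the same rule.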


So once again we are back to the same place. If you are going to use the impulse response alone (I say you because it will not be me) to align speakers in different freq ranges you are prone to computational items that will affect the HF and LF sides of the equation differently. One technique I have seen advocated is to push down the sample freq so low that the upper regions of the HF speaker are filtered out. The idea is this: if the Xover is 100 Hz, then drop the resolution of the analyzer down to filter out the region above 100 in the HF speaker. Then we will see the impulse response at 100 Hz of BOTH speakers – and VOILA we have our simple answer.  BUT – one impulse response (the HF) has filtered the device by computation – the other (the LF) is filtered by a filter. We have a merger of the VIRTUAL – a computationally created phase shift and freq response filtering (which we don’t hear) – with an ACTUAL – the filter response of the Xover.  It is possible that the value for the impulse will give the correct reading so that the Xover is actually in phase – possible – not probable – but we won’t know until we measure the phase – which is the whole point of this exercise.

Simply put: why bother with a step-saving solution (Xover alignment by IR) if it is so prone to error that you have to do the second step (Xover alignment by phase) anyway? If a step is to be skipped it is the IR – not the phase.



  1. I have a question about this. How important is it to set the delay for your reference signal within your FFT when you’re doing this? Can you just leave it alone so that it’s a common reference against measuring the tops and subs independently, or should you constantly adjust the delay on your reference signal as you adjust the delay on your tops or subs to phase align the two sources? OR do you figure out which sound source is arriving late and set your FFT delay based on that and add delay to the other source until the phase is aligned?

    • Thanks for writing Dave,
      Not very important. The time records are so long down in the subwoofer range that it takes a LOT of error in the propagation time before the data is degraded significantly. I will usually rough it in – but the reason is to simply ease the complexity of the picture – fewer wraparounds, easier to see.
      In any case there is ONE very critical aspect: whatever internal delay you use, it must remain unchanged through the process. The sub and main need to be viewed relative to the same time base in order for us to verify that we locked them together in phase.


  2. hi Bob, may i add in short words, that an ir, that is displayed by time vs. amplitude, can never give you information on the frequency domain or phase domain. totally agree with you. But why not have a short look at the time domain first, to verify that the speakers to be aligned arrive around the same time at a real world place (mic @audience)? Next step would be to align the phase at the acoustical crossover point …
    am i missing your point ?
    thank you

  3. Thanks Sven,

    What you are saying makes total sense and I have no disagreement with that. 1) Get a starting point with the impulse response and 2) finish with the phase.

    The entire reason for me posting (and being my usual long-winded self) on this subject is the number of people I have encountered who want to stop at step 1 and pronounce the job done. If we are linking lows to highs this is simply not accurate enough – and can be conclusively proven by observing the phase response.

    If we are linking two systems that cover the same frequency range (such as a main and delay, main+downfill, or even subwoofer to distributed sub) then the impulse alone – the one step process – is fair game. The blurry vision the IR gives of frequency is fine AS LONG AS IT BLURS BOTH SPEAKERS THE SAME WAY. The problem with using the IR for Xover tuning is that the IR sees the Xover range clearly for one speaker and misses the other like a staggering drunk.
    I am sorry if I got too much into details to make that basic point unclear. Hopefully it is clearer now.

    Thanks again Sven for your input

    I have more to say on this. Off to a benefit concert for Haiti tonight. A million local artists. Sure to be chaos on stage. Sound engineer’s nightmare. And me? I will just be watching and enjoying. :-)


  4. great, thanks for your reply Bob.
    don´t worry so much about being so copious; this is why i enjoy to read your blog and your book !

  5. Hi Bob, thanks for this interesting thread.

    Now, for those who really want to use an IR to get a rough idea of the arrival time of the subs, before looking at the phase trace, try the following to get the best measurement results:

    To get a better frequency resolution at lower frequencies, we can either increase the FFT time constant, or lower the sample frequency of our measurement system.

    This article triggered my curiosity to experiment, and lowering the sample rate gives spectacular results if the arrival time for low freqs is to be found. My sound card goes as low as Fs = 5512 Hz. Using Fs = 5512 Hz and an FFT size of 2K gives a frequency resolution of 2.7 Hz. To achieve the same frequency resolution at an Fs = 48K, we’d need an FFT size of 16K.

    Using the lower sample rate and FFT size, gives a stable IR or transfer function measurement at low frequencies without having to rely on a high number of averages.

    I’ll upload some screenshots tomorrow to illustrate the advantages of the lower Fs to measure lower frequencies.

    Best Regards,
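The figures in the comment above can be checked with the usual relation: frequency resolution = sample rate / FFT size, and the time record length is the inverse of that. A short Python sketch follows; the function names are hypothetical, and "2K" and "16K" are taken to mean 2048 and 16384 points, which is an assumption.

```python
def frequency_resolution_hz(sample_rate_hz, fft_size):
    """Spacing between FFT bins."""
    return sample_rate_hz / fft_size


def time_record_ms(sample_rate_hz, fft_size):
    """Length of audio needed to fill one FFT frame."""
    return 1000.0 * fft_size / sample_rate_hz


for fs, size in [(5512, 2048), (48000, 16384)]:
    print(f"Fs={fs} Hz, FFT={size}: "
          f"{frequency_resolution_hz(fs, size):.2f} Hz bins, "
          f"{time_record_ms(fs, size):.0f} ms record")
```

The two settings come out close but not identical (about 2.7 Hz against about 2.9 Hz). The time record is a similar length in both cases; what changes is that the low sample rate gets there with one eighth of the points and discards everything above roughly 2.7 kHz.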


  6. Thanks Igor,
    I look forward to seeing the screenshots. It is indeed much easier to find subwoofers with lower sample rate.


  7. Christoffer Brenna says:

    Thanks a lot for writing about this.

    My MOTU ultralite MKIII interface won’t go below 44.1 kHz, but it can do 36 dB LPF at the inputs, will that work, as long as I use the same settings for subs and tops?


    • Christoffer,
      Thanks for writing. Will it work? Yes and no. Yes it will show you an impulse response for the subs and tops, but no it will not (unless by extremely lucky chance) show you the CORRECT amount of delay required to accurately phase-align the connection between the subs and tops. The MOST IMPORTANT thing to remember here is that you are joining the TOP of your BOTTOMS to the BOTTOM of your TOPS. The impulse response peak can’t be told to represent a specific frequency – and that is what we need – the phase values (time) at the CROSSOVER frequency.

      Your MOTU can band limit the signal – and do so evenly to both sides of the equation. Fine. If you are timing one sub to another sub, that will work great because they cover the same range – both subs have the same bottom, middle and top – so a bulk time value will line up all of those areas. 30 Hz might be 15 ms behind 100 Hz in the sub but it will be the same for both subs – so a single bulk delay value is good to align two matched boxes (of any range). But as soon as we are joining unmatched boxes this goes out the window – and it’s hard to get more unmatched than a sub and top. If the XOVR filters are steep they may only share 0.5 octave or less in common.

      It is a difficult concept but we need to face the fact that our speakers are time-stretched – there is NO SINGLE propagation time for ANY speaker on the market – the lowest freqs are behind the highest. Well engineered Main speakers can run for multiple octaves with very little stretching, but as we go to the big wavelengths, you are going to see lag. I have NEVER seen a subwoofer that could tell time. They all fall further and further back as freq falls. Therefore, I have concluded that it is absurd to chase the windmill of assigning a single number to subwoofer propagation – via the impulse response, which makes it a bad choice for aligning to the also time-stretched (but differently so) HF.

      How can this problem be solved? As said before. Repeatable, reliable alignment at XOVER where the elements constructively ADD together, comes from the phase response, period. If you like to use the impulse to ballpark it in so that it is EASIER to see the phase response then go for it.

      Again Christoffer, thanks for your post and I hope this helps


  8. Christoffer Brenna says:

    thanks answering. What I wanted to know was if lowpassing my inputs would result in the same as a lower samplerate, and if I understood you correctly, it will.

    I’m a rookie at audio measuring, so I can’t contribute to the discussion here, but if you know of some online article/tutorial of how to use the phase response method you’re suggesting, it would help me a lot. As of now, my knowledge about phase vs time is limited, and I can’t figure out how to do the alignment from the phase-readouts.

    Sorry if I bother you with amateur-questions, but I’m really stuck in a venue where I need to align three sub positions with stereo mains + delays.


    • Low pass filter will have a similar enough effect to lower sample rate.

      The beauty of using phase for the xover is you don’t even have to know how to tell time. You can simply match the look of the phase traces. When they match, they are in phase – no matter what the actual time is – their relative time is sync’d.

      Speaking of rookies – I can’t figure out how to post a pic into the comment so I will post my pics of phase-aligned crossovers separately.

  9. Christoffer Brenna says:

    So the way to go at it:

    -Get the approximate delay with laser etc
    -Monitor phase in realtime while adjusting delay time until coherence?

    In other words I need something like SIA smaart, fuzzmeasure will not do the trick because it lacks realtime monitoring?

    Thanks again

    • —–So the way to go at it:

      -Get the approximate delay with laser etc #### or by eye, or tape measure ######
      -Monitor phase in realtime while adjusting delay time until coherence? #### until phase responses MATCH ######

      In other words I need something like SIA smaart, fuzzmeasure will not do the trick because it lacks realtime monitoring? #### yes but fuzzmeasure??? Huh???? As bad a product name as Spectrafoo – is this for real? ######

      Thanks again

  10. Christoffer Brenna says:

    Hehe, I don’t care for the name either.
    I tested it (fuzzmeasure pro 3.2) against Smaart and for impulses it works as good, while its both cheaper and easier to use. It also has a waterfall module which easily spotted roommodes when i tested it.

    Not everyone can afford SIM 😉

    Sorry if my norwegian “english” is confusing by the way.

    Thanks a lot for your help, I will follow your blog for sure…!

  11. Hi,

    thanks for your post. Another approach is to optimize for the highest energy in a given frequency area (around the acoustical x-over frequency) in the result of the virtual addition.
    This is currently realized in SATlive and in the current Beta of systunes.


  12. Hi This all sounds very technical and I’m looking for an easy real world fast solution to aligning Ground subs ( directly below cluster ) to mid highs cluster that is hung.
    Is there a problem with this simple approach
    Based on a 100 Hz crossover in the drive rack.
    1. Send a 100 Hz sine tone to 2 channels
    2. Flip the phase on one channel and delay the subs until substantial cancellation occurs.
    3. Flip back the phase and that’s it

    Thanks for any comments

    • Thanks for the comment Pete. I would not recommend this approach because you can just as easily be off by 10 ms (a complete cycle at 100 Hz). The addition at neighboring frequencies will be poor if you are off by a cycle at 100 – even if 100 itself adds ok.
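To see why a one-cycle error hurts away from 100 Hz, consider two sources of equal level summed with a fixed time offset between them. The Python sketch below is an idealized illustration only: equal level at every frequency is an assumption that a real crossover, with its sloping filters, will not meet, and the function name is hypothetical.

```python
import cmath
import math


def summed_level_db(freq_hz, offset_ms):
    """Level of two equal-level sources summed with a time offset,
    relative to one source alone (0 dB = no gain from the second source)."""
    phase = 2.0 * math.pi * freq_hz * offset_ms / 1000.0
    magnitude = abs(1.0 + cmath.exp(-1j * phase))
    # Floor the magnitude so a perfect null does not produce log(0).
    return 20.0 * math.log10(max(magnitude, 1e-12))


for freq in (80, 100, 125, 150):
    aligned = summed_level_db(freq, 0.0)
    one_cycle_off = summed_level_db(freq, 10.0)
    print(f"{freq:4d} Hz: aligned {aligned:+.1f} dB, 10 ms off {one_cycle_off:+.1f} dB")
```

With no offset the pair gains about 6 dB everywhere. With a 10 ms offset, 100 Hz still gains about 6 dB, so the sine-tone test passes, but 125 Hz gains only about 3 dB and 150 Hz falls into a deep null.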


      • Hi,

        and another thing to keep in mind is that the frequency on your x-over in most cases differs from the acoustic x-over frequency. The acoustic x-over frequency is the point where both signals arrive with the same level, i.e. the frequency where ‘perfect’ comb-filtering might occur. So you should optimize around the acoustic x-over frequency, not at the frequency set on your controller. The ‘offset’ between both frequencies depends on a lot of parameters and might be quite large.

