Interpreting Coherence on the FFT Analyzer

When we make the transition from the simplistic renderings of one-dimensional acoustical analysis, i.e., those that present amplitude only, we have several new traces to contend with. The most notable are phase and coherence. Phase is often minimally grasped and coherence even less so. Both are difficult because we don’t have simple mechanisms in our ear/brain system to detect them, especially when either of these functions varies with frequency (which is almost always the case).

Last week I received an interesting set of questions about coherence from Geoff Maurice of Brockville, Ontario, and I thought it best to answer them here since others may share the interest and contribute to the discussion. What Geoff is looking for is coherence explained without going hard into the math, so here we go.

1) Coherence is a statistical metric: It monitors the extent of the variation in the data sampled. Therefore, first and foremost, we must have multiple samples. In FFT terms this means we are “averaging” the data. If we have at least two samples, we can statistically evaluate the average amplitude and phase values and the deviation between the individual samples. An average amplitude value of +6 dB could be the product of two nearly identical – or vastly different – samples. The coherence value indicates (inversely) the extent of the deviation between the average and the individual samples. Low deviation = high coherence, and vice versa.
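
Point 1 can be put into a few lines of Python. This is a minimal sketch of the textbook magnitude-squared coherence estimate at a single frequency bin, not a description of how any particular analyzer computes it. The function names and the sample values are made up for illustration; each list holds one complex FFT value per averaged sample, for the reference and the measurement channel.

```python
import math


def coherence(ref_samples, meas_samples):
    """Magnitude-squared coherence at one frequency bin.

    ref_samples / meas_samples: one complex FFT value per averaged sample
    (reference channel and measurement channel, same length).
    """
    # Averaged cross-spectrum and auto-spectra (plain sums work: the 1/N cancels).
    gxy = sum(x.conjugate() * y for x, y in zip(ref_samples, meas_samples))
    gxx = sum(abs(x) ** 2 for x in ref_samples)
    gyy = sum(abs(y) ** 2 for y in meas_samples)
    return abs(gxy) ** 2 / (gxx * gyy)


def average_gain_db(ref_samples, meas_samples):
    """Level of the averaged transfer function (cross-spectrum / reference auto-spectrum)."""
    gxy = sum(x.conjugate() * y for x, y in zip(ref_samples, meas_samples))
    gxx = sum(abs(x) ** 2 for x in ref_samples)
    return 20 * math.log10(abs(gxy / gxx))


# Two samples that agree: the mic shows a gain of 2 (+6 dB) both times.
steady_ref, steady_mic = [1 + 0j, 0 + 1j], [2 + 0j, 0 + 2j]
# Two samples that disagree: a gain of 4 once, nothing the next time.
erratic_ref, erratic_mic = [1 + 0j, 1 + 0j], [4 + 0j, 0 + 0j]

for name, ref, mic in [("steady", steady_ref, steady_mic),
                       ("erratic", erratic_ref, erratic_mic)]:
    print(name, round(average_gain_db(ref, mic), 2), "dB, coherence",
          round(coherence(ref, mic), 2))
```

Both cases average to the same +6 dB, but only the first earns a coherence of 1.0; the second gets 0.5. With a single sample this estimate always returns 1, which is why averaging is a prerequisite.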

2) The deviations between the samples that degrade coherence can be EITHER amplitude or phase or BOTH. Most factors affect both. Examples are wind, ambient noise, reverberation, and a change in a filter setting.

3) There are some factors that degrade coherence continually and some that degrade it only for a limited time. Continual degradation is caused by the summation of the original signal with a relatively short delayed copy (the most obvious example is an echo). The comb filtering results in a series of peaks and dips in both amplitude AND coherence. Variable degradation comes from non-correlated sources such as ambient noise.

4) Geoff asks: what mechanism causes a complete cancellation to reduce the coherence? This is an interesting one. A complete cancellation at a given frequency results from the summation of equal level signals that are 180 degrees apart. At the place where this occurs (our measurement mic) the original signal (the first arrival) is not audible since it has been neutralized by the reflection. Since our transfer function analyzer is looking to compare the original electronic signal to the acoustic arrival, it finds the signal missing at the mic. This does not mean there is silence at the mic. Far from it. Instead what we have is any and all other signals in that frequency range: reflections from long ago, signals from other sources (stage, sound fx) and ambient noise. All of these signals will have very poor correlation to the original electronic reference signal. By contrast, the same early reflection will make for excellent coherence at other frequencies where the late arrival falls 360 degrees out of phase – which means it is “in phase” and will add to the original signal. Strong early reflections make for stable coherence – high AND low.
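
To put rough numbers on point 4, here is a sketch of a deliberately simplified model: direct sound plus one reflection, plus a fixed amount of uncorrelated noise at the mic. It relies on the standard result that, when noise is added only at the output, coherence equals signal power divided by signal-plus-noise power. The 2 ms delay, the equal-level reflection and the noise sitting 20 dB below the direct sound are assumptions chosen for illustration, and the function names are hypothetical.

```python
import cmath
import math


def comb_response(freq_hz, delay_s, reflection_gain=1.0):
    """Complex sum of the direct sound (gain 1) and one delayed copy."""
    return 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)


def comb_coherence(freq_hz, delay_s, reflection_gain=1.0, noise_power=0.01):
    """Expected coherence when uncorrelated noise is added at the mic.

    noise_power is relative to the direct sound alone (0.01 = 20 dB down).
    """
    signal_power = abs(comb_response(freq_hz, delay_s, reflection_gain)) ** 2
    return signal_power / (signal_power + noise_power)


DELAY_S = 0.002  # assumed 2 ms reflection: first null at 250 Hz, first peak at 500 Hz

for freq in (125, 250, 375, 500):
    level = abs(comb_response(freq, DELAY_S))
    level_db = 20 * math.log10(max(level, 1e-12))  # floor avoids log10(0) at the null
    print(f"{freq:4d} Hz  level {level_db:7.1f} dB  coherence {comb_coherence(freq, DELAY_S):.3f}")
```

In this model the coherence collapses at the 250 Hz null, where only the noise is left, and is at its highest at the 500 Hz peak, where the reflection adds to the direct sound.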

5) Geoff asked about the relationship of the number of averages to the coherence value. The coh value is calculated from whatever quantity you choose. If the deviation percentage is the same over two samples as it is over 16, the coh value will be matched. In the case of SIM we use a coherence blanking function that screens data below a given threshold. THAT threshold varies with the number of averages. Why? If you have just two samples you have “he said/she said” – who is right? (We know this but I am not saying.) So these two had better match or we cannot resolve it. With 16 samples we have lots of data to work with. One or two can be far off and not pull the average down too far. So we use a high blanking threshold (90%) for 2 averages and a low one (20%) for 16 averages.
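
The blanking idea in point 5 can be sketched as a simple lookup. Only the two thresholds quoted above (90% for 2 averages, 20% for 16) go into the table; thresholds for other averaging counts are not given here, so the sketch refuses to guess them. `BLANKING_THRESHOLD` and `blank_low_coherence` are hypothetical names, not part of SIM, and the trace values are made up.

```python
# Coherence blanking thresholds quoted in the text, keyed by number of averages.
# Other averaging counts are intentionally absent: their thresholds are not stated.
BLANKING_THRESHOLD = {2: 0.90, 16: 0.20}


def blank_low_coherence(points, n_averages):
    """Replace the level of points whose coherence is below the threshold with None.

    points: list of (frequency_hz, level_db, coherence) tuples, coherence in 0..1.
    """
    if n_averages not in BLANKING_THRESHOLD:
        raise ValueError(f"no threshold known for {n_averages} averages")
    threshold = BLANKING_THRESHOLD[n_averages]
    return [
        (freq, level if coh >= threshold else None, coh)
        for freq, level, coh in points
    ]


# Made-up trace: the same three points screened at two averaging settings.
trace = [(500, -12.0, 0.15), (1000, 1.5, 0.60), (2000, 0.5, 0.95)]
print(blank_low_coherence(trace, 2))   # only the 2 kHz point survives
print(blank_low_coherence(trace, 16))  # only the 500 Hz point is blanked
```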

6) The next question was: would you call a coherence value of 50% accurate? Yes… and no. A coherence of any value can be accurate. 0% coherence is an accurate representation of all noise and no signal. But I think where the question goes is: how much coherence is good enough to act on? The answer here is the ultimate in sliding scale. Grading a curve on a curve. If I am measuring a frontfill speaker in the second row, 50% is a very poor value. If measuring an array in the back of an arena, 50% is a great number. For me, looking at coherence trends is more helpful than considering a particular number. A drop to 50% in an area where others are much higher gets my attention, as would a drop to 20% where the median is 50%. A rise to 50% where most are very low would also get my attention. Why is this range getting through, while others aren’t?
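
One way to picture “grading a curve on a curve” is to compare each reading to the median of its own trace rather than to a fixed number. The sketch below does only that; the 0.25 margin, the function name and both traces are invented for illustration and are not recommended settings.

```python
from statistics import median


def flag_outliers(coherence_by_freq, margin=0.25):
    """Return frequencies whose coherence differs from the trace median by more than margin.

    margin is an arbitrary illustrative value, not a recommended setting.
    """
    typical = median(coherence_by_freq.values())
    return {
        freq: round(coh - typical, 2)
        for freq, coh in coherence_by_freq.items()
        if abs(coh - typical) > margin
    }


# Made-up traces: the same 50% reading in very different company.
frontfill = {250: 0.95, 500: 0.92, 1000: 0.50, 2000: 0.94, 4000: 0.90}
arena_rear = {250: 0.15, 500: 0.20, 1000: 0.50, 2000: 0.18, 4000: 0.10}
print(flag_outliers(frontfill))   # 1 kHz flagged as a drop
print(flag_outliers(arena_rear))  # 1 kHz flagged as a rise
```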

That is a start on this subject. Anyone with thoughts is invited to comment.



  1. Onward. These questions came back from Geoff Maurice so we will go a bit further:

    Geoff’s Question:
    1. How is it that a cancellation causes a drop in coherence over a narrow band, whereas a simple EQ filter does not? In both cases we are providing the same input signal, getting similar acoustical output signals, but we are getting differing coherence values.
    Could you explain?

    6o6’s Answer: It’s all about noise. Let’s take a line level EQ. How far down is the noise? 80, 90, maybe even 100 dB? Even a 30 dB cut filter leaves you high and dry above the noise floor.
    Now look at a speaker in a room. Noisy? Yes sir. There are all of the usual suspects like HVAC noise and such, but most important is the presence of the self-induced noise of reflections. The combined noise floor is both random (HVAC etc.) and correlated (reflections). But the limited length of the quasi-log FFT time records introduces a new type of noise – semi-correlated. What is this? A reflection – which is inherently correlated – but with a time gap long enough to exceed the FFT time record. The FFT suffers amnesia and does not recognize the reflection as its own – instead it is seen as noise (e.g. a 20 ms reflection is correlated in the LF range and semi-correlated in the HF, where the time record is about 5 ms). As the time gap of a reflection increases, the upper limit of the frequency range where the signal remains correlated falls.
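
    The amnesia effect comes down to one comparison: does the reflection arrive inside the time record used at that frequency? Here is a rough sketch that treats it as a hard yes/no, which is a simplification. Only the 5 ms HF record and the 20 ms reflection come from the example above; the LF and MF record lengths are placeholder assumptions, since actual values depend on the analyzer.

```python
# Time record length per band, in milliseconds. Only the HF value (about 5 ms)
# is taken from the text; the LF and MF values are placeholder assumptions.
TIME_RECORD_MS = {"LF": 640.0, "MF": 40.0, "HF": 5.0}


def classify_reflection(delay_ms, record_ms):
    """Label a reflection by whether it arrives inside the FFT time record."""
    return "correlated" if delay_ms < record_ms else "semi-correlated (seen as noise)"


reflection_delay_ms = 20.0
for band, record_ms in TIME_RECORD_MS.items():
    print(band, classify_reflection(reflection_delay_ms, record_ms))
```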

    Now think of a strong, early 1 ms reflection. This is a single wavelength at 1 kHz and 0.5 wavelength at 500 Hz, which is where you will see the greatest cancellation – and the lowest coherence. The COH is low and the data is accurate, reliable and repeatable. This is different from a momentary coherence loss, such as when a shout happens. By contrast, momentary loss means inaccurate, unreliable, unrepeatable.
    So why is there this bad coherence at 500 Hz? Because the direct sound has been cancelled – it cannot be heard. But the 500 Hz off the floor, the walls, the ceiling and multiple round trips still gets to the mic. These all arrive late and at different levels and placements on the phase cycle. The result is that the energy remaining at 500 Hz is uncorrelated data.
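
    The arithmetic behind the 1 ms example, as a short sketch: an equal-level reflection cancels wherever the delay equals an odd number of half wavelengths, and adds wherever it equals a whole number of wavelengths. The function names are hypothetical.

```python
def cancellation_frequencies(delay_ms, count=3):
    """Frequencies (Hz) where the delay is 0.5, 1.5, 2.5 ... wavelengths."""
    return [(2 * n + 1) * 1000.0 / (2 * delay_ms) for n in range(count)]


def addition_frequencies(delay_ms, count=3):
    """Frequencies (Hz) where the delay is 1, 2, 3 ... whole wavelengths."""
    return [n * 1000.0 / delay_ms for n in range(1, count + 1)]


print(cancellation_frequencies(1.0))  # [500.0, 1500.0, 2500.0]
print(addition_frequencies(1.0))      # [1000.0, 2000.0, 3000.0]
```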
    Football analogy: You are the direct sound trying to run. A player of equal size, pushing in the opposite direction, is holding you in a perpetual stalemate. Meanwhile the other players are still bouncing all around the field. Direct sound, strong early reflection, late reflections.


    Geoff’s question:

    2. Something I am not totally clear on…
    When we look at coherence between samples, are we looking at the relationship between samples taken from just the microphone (over different time records)? Or a correlation between the input (electrical) and samples from the Mic?
    My understanding was that we are looking at the correlation between multiple microphone samples (over time, but with no reference to the electrical signal). If we receive samples at the microphone that don’t match in time or amplitude, this reduces the coherence (very plainly stated).
    I realize that there must be some kind of correlation between the input/output but do not understand how that fits into the equation.

    6o6 Answer:
    The coherence is not based on the microphone or any other SINGLE channel. Coherence works on the correlation of multiple samples of transfer function data. If it just looked at a single channel, something changing, like music, would reduce coherence. This is why you see coherence ONLY on dual-channel devices – or quasi-dual-channel ones (those with a known source that internally transfer against it).

    Thanks for the Q’s, Geoff. I hope this helps.

  2. Great explanation.

    Around 2 months ago I made some screen captures to show how coherence gets worse with distance.

    Best regards.

    • Nico,
      I hope you are well. I followed your link and I don’t see the graphics – there are graphics in some of the other posts but not your coherence ones – is it just me? I just see text that says “1 metro, 2 metros, etc.”
      Check it and let me know.


    • Nico,
      The pics do a good job of showing a typical decline in coherence over distance. It appears to me that the data was taken indoors and that the speaker has a small format, low directionality HF driver. The HF coherence drops off very quickly – which would NOT happen if the HF driver was a directional horn – the bigger and narrower the horn, the farther you will go before the coherence gets trashed.
      The Midrange coherence drops off as usual – less directional control, stronger reflections.
      The LF coherence stays strong despite the lack of directional control. You can see the signature of the early reflections in the phase response. The reason the coherence stays up is because the time records in the analyzer are very long down here. Even though the reflections are strong, they are early enough to be recognizable to the analyzer as correlated data. Whereas the mids and highs show large areas of Coh loss, the LF range drops out only where the strong cancellations are present.

      Good stuff, thanks Nico

  3. Here is an additional item from Geoff Maurice leftover from his previous coherence-related queries:

    From Geoff:
    Early tests made us aware that when (3) boxes were arrayed horizontally, and you measured ONAX with the middle box (Box 2 of 3), there is an interesting interaction over the range of 1.5-2.5kHz.
    We found that the signal arriving from the outer boxes (1 and 3) sum together 180 degrees out of phase and equal in amplitude to the center box (#2) to cause a complete cancellation at ~2.1kHz and a deep cavern 10-15dB down from 1.5 – 2.5 kHz.
    There was some talk (by others) that because there was a cancellation taking place (with Low Coherence) that the data shown wasn’t completely accurate.
    Some stated that the “Cavern” wasn’t really as deep as the transfer function showed because of the coherence.
    The line spectrum view showed a similar conclusion but it didn’t look as bad.
    I never really believed this “answer”, but couldn’t justify it with any hard evidence (Thus all the Questions).
    Hope that helps to put things into perspective…

    6o6 answer: It looks to me that you were correct to be skeptical. There are two types of bad coherence – steady state and variable. Steady state low coherence (i.e. it stays put and does not improve with age, an increased number of averages, etc.) indicates a very real problem – one that is as bad as it looks (or at least as bad as it looks in that exact location). This indicates that the signal strength at this location is very poor compared to the noise level. The most likely cause of this is a collision of equal level signals that are 1/2 wavelength apart. (In your example a 0 dB, 0 deg signal is meeting 2x -6 dB, 180 deg signals: a perfect storm.) The coherence will only improve if those conditions change (a delay is introduced, a level taper, a polarity change, moving the mic or speakers).
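
    The “perfect storm” can be checked with a quick phasor sum. This sketch treats each box as a single complex number at the cancellation frequency, using the levels and phases from the example; `phasor` is a hypothetical helper.

```python
import cmath
import math


def phasor(level_db, phase_deg):
    """Complex amplitude for a signal given its level in dB and phase in degrees."""
    return 10 ** (level_db / 20) * cmath.exp(1j * math.radians(phase_deg))


center = phasor(0.0, 0.0)          # box 2, on axis
outers = 2 * phasor(-6.0, 180.0)   # boxes 1 and 3, each 6 dB down and out of phase
total = center + outers

print(f"sum = {20 * math.log10(abs(total)):.1f} dB relative to the center box alone")
```

    Two signals at exactly -6.02 dB would cancel the center box completely; the nominal -6 dB used here still leaves the sum more than 50 dB down, far enough that in a real room whatever noise and late reflections are present will dominate at that frequency.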
    The fact that the line spectrum (which freely adds the noise and signal together indiscriminately) shows a lesser cancellation does not call into question the accuracy of the XFR data – quite the opposite. It supports the coherence trace’s indication that the direct sound has been cancelled, leaving only distant reflections arriving at that location (at that frequency) to be the loudest sources. These are rejected by the coherence algorithm, since it can tell time, and would also be obvious to our ears (highly reverberant, indistinct or inappropriate location, etc.).
    In short – steady low coherence is indicative of cancellation – either by a second speaker, or reflection.
    Variable low coherence – good, then bad, then worse, then good, etc. – comes from intermittent adds and cancels. Wind, a forklift, people talking and stereo info from the other channel are a few examples. These indicate a challenge NOT solvable with your signal processing.

    Steady good coherence = isolation or addition
    Steady bad coherence = extreme noise or cancellation or total lack of sync between channels

    Variable coherence = semi-isolated, somewhat noisy, windy etc.

    Hope this helps

