Subject: Re: Definition and Measurement of Harmonicity
From: Harvey Holmes <H.Holmes(at)UNSW.EDU.AU>
Date: Fri, 21 Jan 2005 13:17:39 +1100

Chris and Others,

I assume here that this topic refers to the second of Jim Beauchamp's definitions: "the degree to which single sounds contain only harmonics" of a common fundamental frequency. I also assume that this is to be determined by examining the signal itself, which in practice will be a sampled signal x(n), n = 0, 1, ..., N-1. On this basis, I think we need more general and applicable measures than the ones mentioned so far in this thread, for the following reasons.

Reinhart Frosch's formula applies only to a particular model of thick plucked or struck strings. It also doesn't give an overall measure of inharmonicity, since it applies only to the individual partials. In addition, it doesn't take into account the relative strengths of the various partials (e.g. a signal would still be almost perfectly harmonic even if some of the partial frequencies were grossly in error, provided that the corresponding partial amplitudes were very small). Finally, it takes no account of noise of any sort in the signal, which is often the reason that a signal is less than perfectly harmonic. Similar comments can be made about Jim Beauchamp's formula. Incidentally, I couldn't find any measures of pitch salience on a quick browse through Slaney's Auditory Toolbox, as suggested by Brian Gygi.

Possibly the biggest problem in practice with both of the above formulas is that they assume that the partial frequencies are known. Deriving these from the observed signal is a very difficult problem: papers are still being published even for the simplest case of a single sine wave in noise. Methods based on the DFT (or FFT) are often used to derive the partial frequencies in more complex cases, but simple-minded approaches using such methods are not very accurate. Subspace methods can also be used, but are computationally expensive. I think it is fair to say that finding good estimates of the partial frequencies is still very much an open problem. (And I suspect that the estimates would have to be very good indeed to use the proposed formulas.)

**********************

My conclusion is that, instead of the above, we need something similar to the concept of the degree of voicing, which is commonly used in speech coding work. The degree of voicing measures the extent to which a signal is periodic - i.e. the extent to which it is harmonic in the above sense. There is a good survey of voicing determination in "Pitch and Voicing Determination" by W.J. Hess, Chapter 1 of "Advances in Speech Signal Processing", S. Furui and M.M. Sondhi (Eds), Marcel Dekker, 1992. As pointed out in that article, many pitch estimators produce estimates of the degree of voicing as a by-product of their pitch estimates, so there is a large choice of harmonicity measures available.

**********************

Brief descriptions of two common measures of harmonicity that are easily appreciated in their own right follow (in their original contexts they were associated with further algorithms for pitch estimation or coding, or both).

Firstly, consider the autocorrelation function (ACF) of x(n), denoted by R(k). If a signal x(n) is purely harmonic, it will also be perfectly periodic with some period K, so that x(n+K) = x(n) for all n. In this case it is easy to show that R(k) is also periodic with the same period. Also, R(0) is the global maximum value of R(k), but this same maximum value is also achieved by R(K), R(2*K), etc. If x(n) is not periodic, however, R(0) will be larger than R(K), R(2*K), ...

In the case of a signal observed only on the interval [0, N-1], an appropriate (re-)definition of the ACF is R(k) = MEAN (x(n) * x(n+k)), where the mean is taken over all terms for which both n and n+k are in the range [0, N-1]. For a harmonic signal the ACF defined like this will still be approximately periodic and have almost equal major peaks at k = 0, K, 2*K, ..., provided that N is large enough to cover several periods (N > 2.5*K is often considered adequate).

These considerations lead to a frequently used measure of harmonicity, defined as H1 = MAX (R(k)) / R(0), where the maximum is taken over k in the range [1, N-1]. H1 is always less than or equal to 1; in the purely harmonic case it will be near 1, whereas for a noise-like signal it will be near zero. It may be possible for H1 to be large for some non-harmonic signals, but in practice this measure has been found to be a reasonably good indicator of voicing (or harmonicity) for speech signals, and it is widely used. It has the great advantage of being very simple to compute. A variation is to apply it to the residual signal following linear prediction, instead of to the signal itself.
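As a concrete (if simplistic) illustration - a sketch of my own, not taken from any of the references above - H1 can be computed in a few lines of Python/numpy. The min_lag argument is an addition of mine, since in practice the lag search is usually restricted to start at the shortest expected pitch period rather than at k = 1:

import numpy as np

def harmonicity_h1(x, min_lag=1, max_lag=None):
    """H1 = max over k in [min_lag, max_lag] of R(k) / R(0),
    where R(k) = MEAN(x(n) * x(n+k)), the mean taken over all n
    for which both n and n+k lie in [0, N-1], as defined above."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if max_lag is None:
        max_lag = N - 1
    # ACF, each lag normalised by the number of terms in its mean
    R = np.array([np.dot(x[:N - k], x[k:]) / (N - k)
                  for k in range(max_lag + 1)])
    if R[0] <= 0.0:
        return 0.0          # all-zero frame: treat as not harmonic
    return float(np.max(R[min_lag:max_lag + 1]) / R[0])

With min_lag = 1, a heavily oversampled or strongly low-pass signal can score high simply because R(1) is close to R(0), which is one reason a minimum lag (or the linear-prediction residual variant just mentioned) is commonly used in practice.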
A second measure of harmonicity can be obtained by fitting a harmonic signal to the observed signal, e.g. using least squares. That is, we write x(n) = p(n) + e(n), where p(n) is a purely harmonic (periodic) signal and e(n) is an error (or residual) term. We can write p(n) in the form p(n) = SUM (A(k)*cos(k*w0*n + ph(k))), where the sum is taken over all harmonics k*w0 up to the Nyquist frequency (i.e. k ranges over [1, 2, ..., floor(pi / w0)]). To perform the fit we then find the amplitudes A(k), the fundamental frequency (or pitch) w0 and the phases ph(k) that minimize the energy of the error sequence e(n). This is a highly nonlinear problem in general, but it becomes linear and simple to compute if the fundamental frequency is known. Hence in practice this method usually begins by finding a best pitch estimate w0, using any good method (see the Hess article), and then solves for the amplitudes and phases. The details can be found in a number of articles by R.J. McAulay and T.F. Quatieri on sinusoidal coding (e.g. Chapter 4 of "Speech Coding and Synthesis", W.B. Kleijn and K.K. Paliwal (Eds), Elsevier Science, 1995).

The harmonicity measure that results from this analysis is H2 = SUM(p(n)^2) / SUM(e(n)^2), where n ranges over [0, N-1]; i.e. the harmonic-to-residual energy ratio (analogous to a signal-to-noise ratio). H2 is large in the purely harmonic case and small in the noise-only case. Instead of H2 we could also consider H3 = SUM(p(n)^2) / [SUM(e(n)^2) + SUM(p(n)^2)] = H2 / (H2 + 1), which is near 1 in the purely harmonic case and near zero in the noise-only case, just like H1. (McAulay and Quatieri also give other related measures in their papers.) Unlike H1, these measures are clearly tied to the degree to which a signal is harmonic, but they require considerably more computation (though still not an enormous amount).
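Again as a rough sketch of my own (an illustration of the idea only, not McAulay and Quatieri's actual algorithm): once a pitch estimate w0 (in radians per sample) is available, each harmonic can be parameterised by a cosine and a sine term - equivalent to the amplitude-and-phase form A(k), ph(k) above, but linear in the unknowns - and H2 and H3 follow directly from a standard least-squares solve:

import numpy as np

def harmonicity_h2_h3(x, w0):
    """Least-squares harmonic fit at fundamental w0 (radians/sample).
    Returns (H2, H3), where H2 = SUM(p^2) / SUM(e^2) and
    H3 = H2 / (H2 + 1), with p the fitted harmonic part and e = x - p."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    K = int(np.floor(np.pi / w0))      # harmonics up to the Nyquist frequency
    # One cosine and one sine column per harmonic k*w0 (linear in the
    # coefficients; equivalent to fitting A(k) and ph(k))
    A = np.column_stack([f(k * w0 * n) for k in range(1, K + 1)
                         for f in (np.cos, np.sin)])
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    p = A @ coef                       # harmonic (periodic) part
    e = x - p                          # residual
    h2 = np.sum(p**2) / max(np.sum(e**2), np.finfo(float).eps)
    return h2, h2 / (h2 + 1.0)

In practice w0 would come from a separate pitch estimator (see the Hess chapter), and the frame should be long compared with the number of fitted harmonic terms, otherwise even a noise-only frame will be fitted fairly well and H2 will be inflated.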
**********************

All the above assumes implicitly that the signal is stationary, which is at best an approximation in real cases. The case in which the fundamental varies slowly can be handled in sinusoidal coding, but sudden changes are more problematic. It is also implicit that the residual e(n) is a noise-like signal, not a further harmonic complex (as in musical chords).

Harvey Holmes


At 05:23 15/01/2005, Reinhart Frosch wrote:

>The inharmonicity of piano strings is treated in
>section 12.3 of the book "The Physics of Musical
>Instruments", by Fletcher and Rossing (Springer,
>2nd ed. 1998).
>
>The basic equation for the frequency of the k-th
>partial tone is:
>
>f[k] = f[1i] * k * (1 + k^2 * B)^0.5 ;
>
>here, f[1i] is the fundamental frequency of an
>idealized string that has the same length, mass
>and tension as the real string but is infinitely
>flexible (i.e., has no stiffness).
>
>B = 0 corresponds to a string without stiffness
>and thus to a harmonic complex tone;
>B is an "inharmonicity coefficient".
>
>Reinhart Frosch,
>(r. Physics Dept., ETH Zurich.)
>CH-5200 Brugg.
>reinifrosch(at)bluewin.ch
>
>
>-- Original Message --
>
> >Date: Thu, 13 Jan 2005 14:50:20 +0000
> >Reply-To: Chris Share <cshare01(at)QUB.AC.UK>
> >From: Chris Share <cshare01(at)QUB.AC.UK>
> >Subject: Definition and Measurement of Harmonicity
> >To: AUDITORY(at)LISTS.MCGILL.CA
> >
> >Hi,
> >
> >I'm interested in analysing musical signals in terms of their
> >harmonicity.
> >
> >There are numerous references to harmonicity in the literature
> >however I can't find a precise definition of it. Is there an
> >agreed definition for this term?
> >
> >If someone could point me to some relevant literature it would
> >be very much appreciated.
> >
> >Cheers,
> >
> >Chris Share