Abstract:
This study proposes a new model for dynamically tracking changes in spectral shape. In previous computational implementations of auditory scene analysis, sequential grouping has not been fully realized. Because the new model tracks spectral shape, it makes sequential grouping based on timbre feasible. First, a spectral envelope is converted into a series of frequencies by using the IFIS (inverse function of integrated spectrum) [Ohmuro et al., Tech. Rep. Speech Acoust. Soc. Jpn. SP89-72 (1992) (in Japanese)]. IFIS has good interpolation and extrapolation characteristics. The spectral shape is thus represented as a set of frequencies on the IFIS axis. Next, these frequencies are tracked with the FM-tracking model [Aikawa et al., J. Acoust. Soc. Am. 98, 2926(A) (1995)]. Aikawa et al.'s model represents the perceptual characteristics of sweep tones and is described by a second-order AR model. Finally, the tracked frequencies are converted back into a spectrum. The new model also has an additional function: spectral shape prediction. This is possible because the second-order AR model includes a prediction function, which can, for example, represent the recovered portion of a speech sound in which noise has been placed. The advantages of this model will be discussed in relation to speech signal separation techniques.
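The three steps above (spectrum to IFIS frequencies, second-order AR prediction, IFIS frequencies back to a spectrum) can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: IFIS is read here as the set of frequencies at which the normalized integrated spectrum reaches equally spaced levels, the reconstruction is piecewise constant, and the AR coefficients `a1 = 2`, `a2 = -1` (plain linear extrapolation) are hypothetical placeholders, not the values of Aikawa et al.'s FM-tracking model. The function names are illustrative only.

```python
from bisect import bisect_left


def ifis_frequencies(freqs, power, n_points):
    """Frequencies at which the normalized integrated spectrum reaches
    equally spaced levels (an assumed reading of IFIS).

    freqs must be increasing; power must be non-negative and not all zero.
    """
    # Integrate the spectrum with the trapezoidal rule.
    cum = [0.0]
    for k in range(1, len(freqs)):
        area = 0.5 * (power[k] + power[k - 1]) * (freqs[k] - freqs[k - 1])
        cum.append(cum[-1] + area)
    total = cum[-1]
    cum = [c / total for c in cum]  # now runs from 0.0 to 1.0

    out = []
    for i in range(1, n_points + 1):
        level = i / (n_points + 1)
        k = bisect_left(cum, level)  # first index with cum[k] >= level
        # Linear interpolation inside the bin [k-1, k].
        t = (level - cum[k - 1]) / (cum[k] - cum[k - 1])
        out.append(freqs[k - 1] + t * (freqs[k] - freqs[k - 1]))
    return out


def ar2_predict(prev2, prev1, a1=2.0, a2=-1.0):
    """One-step second-order AR prediction for each IFIS frequency:
    x[n] = a1 * x[n-1] + a2 * x[n-2]. Coefficients are hypothetical."""
    return [a1 * x1 + a2 * x2 for x1, x2 in zip(prev1, prev2)]


def spectrum_from_ifis(ifis, f_lo, f_hi):
    """Piecewise-constant spectral density implied by IFIS frequencies.

    Returns (start, end, density) triples; each interval between adjacent
    IFIS frequencies carries the same share of the total spectral mass.
    """
    edges = [f_lo] + list(ifis) + [f_hi]
    mass = 1.0 / (len(ifis) + 1)
    return [
        (edges[i], edges[i + 1], mass / (edges[i + 1] - edges[i]))
        for i in range(len(edges) - 1)
    ]
```

In this reading, closely spaced IFIS frequencies mark spectral peaks and widely spaced ones mark valleys, so tracking the frequencies over time amounts to tracking the spectral shape. During a frame where the spectrum is unavailable, `ar2_predict` could supply the IFIS frequencies from the two preceding frames, and `spectrum_from_ifis` would turn them back into an envelope.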