3pEA3. Sound source segregation using the inter-channel differences in both intensity and phase.

Session: Wednesday Afternoon, December 3


Author: Mariko Aoki
Location: NTT Human Interface Labs., 1-1 Hikarinooka Yokosuka, Kanagawa, 239, Japan, mariko@nttspch.hil.ntt.co.jp
Author: Shigeaki Aoki
Location: NTT Human Interface Labs., 1-1 Hikarinooka Yokosuka, Kanagawa, 239, Japan, mariko@nttspch.hil.ntt.co.jp

Abstract:

A method is proposed for segregating sounds from multiple sources recorded on a two-channel system. It uses the inter-channel differences in both intensity and phase. In addition, the relation between the frequency resolution of the proposed method and the resulting sound quality is examined. Three pairs of mixed sounds were used in the tests: male speech with female speech, female speech with female speech (the same speaker uttering different phrases), and a cock's call with female speech. Frequency resolution was varied between 5 and 80 Hz. In the subjective test, subjects listened to the original sound and to sounds segregated at five different resolutions (5, 10, 20, 40, and 80 Hz), and then rated the quality of these sounds on a five-point scale. The correlation coefficient between the original sound and each segregated sound is also discussed. For the human voice, the optimal frequency resolution was found to be 10 or 20 Hz. Interestingly, the optimal frequency resolution was not the highest one tested. Compared with the human voice, a lower frequency resolution may be sufficient to segregate a cock's call.
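The abstract does not give the segregation rule itself, so the following is only a minimal sketch of one way inter-channel intensity and phase differences could be used, not the authors' algorithm. It assumes that "frequency resolution" means the FFT bin spacing fs / N, that each time-frequency bin is assigned to a single source (a binary mask), and that the decision uses the sign of the phase difference below a hypothetical crossover_hz and the sign of the level difference above it. All function and parameter names are illustrative.

```python
import numpy as np

def frame_length(fs, resolution_hz):
    """Even frame length N whose FFT bin spacing fs / N is about resolution_hz."""
    return 2 * int(round(fs / (2.0 * resolution_hz)))

def _stft(x, n, hop, window):
    # Windowed FFT of each frame; frames overlap by n - hop samples.
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.rfft(window * x[s:s + n]) for s in starts])

def _istft(spec, n, hop, length):
    # Plain overlap-add; a periodic Hann window at 50% overlap sums to one.
    out = np.zeros(length)
    for k, frame in enumerate(spec):
        out[k * hop:k * hop + n] += np.fft.irfft(frame, n)
    return out

def segregate(left, right, fs, resolution_hz, crossover_hz=1000.0):
    """Return (est_a, est_b): the source nearer the left / right channel.

    Assumed rule (not from the abstract): below crossover_hz a bin goes to
    source A when the left channel leads in phase; above it, when the left
    channel is louder. Every other bin goes to source B.
    """
    n = frame_length(fs, resolution_hz)
    hop = n // 2
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann
    # Zero-pad so that every input sample is covered by two frames.
    n_frames = int(np.ceil(len(left) / hop)) + 1
    padded = np.zeros((2, (n_frames + 1) * hop))
    padded[0, hop:hop + len(left)] = left
    padded[1, hop:hop + len(right)] = right
    L = _stft(padded[0], n, hop, window)
    R = _stft(padded[1], n, hop, window)

    eps = 1e-12
    level_diff_db = 20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    phase_diff = np.angle(L * np.conj(R))  # positive when left leads
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    near_left = np.where(freqs < crossover_hz, phase_diff > 0, level_diff_db > 0)

    est_a = _istft(np.where(near_left, L, 0), n, hop, padded.shape[1])
    est_b = _istft(np.where(near_left, 0, R), n, hop, padded.shape[1])
    return est_a[hop:hop + len(left)], est_b[hop:hop + len(left)]

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    return float(np.corrcoef(x, y)[0, 1])
```

Under these assumptions, calling segregate at 5, 10, 20, 40, and 80 Hz and comparing each estimate with its original source through correlation would mirror the objective comparison the abstract mentions. A smaller resolution_hz means a longer frame (for example, 400 samples at 20 Hz for an 8 kHz sampling rate), so finer frequency resolution is paid for with coarser time resolution.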


ASA 134th Meeting - San Diego CA, December 1997