Abstract:
This paper proposes a method for automatically classifying news speech articles using word spotting. In word spotting, the keyword probability Ps(w) of a keyword w ending at time t is computed by subtracting the accumulated probability of the best syllable sequence from the accumulated probability of the best syllable sequence followed by the keyword. Keywords are extracted by finding the local peaks of these keyword probabilities. For article classification, the topic contribution of word (TCW) P(n|w) is computed in advance from ASAHI newspaper classification indices. A topic probability $P(n) = \sum_{w} P(w)\,P(n|w)$ is then computed by substituting the keyword probability Ps(w) for P(w). This topic probability thus integrates the acoustic word probability Ps(w) with the a priori knowledge given by the TCW. A news article is classified by finding the topic n with the highest topic probability P(n). Classification of news speech articles was carried out using 5-min NHK news programs over 30 days. The phoneme HMMs were trained on one day of news (5 min). In the experiment, 77 articles were classified using 183 keywords, with 40 phoneme HMMs and 108 syllables. The classification rate is at present 49.4% (38/77). The correct rate of word spotting was 54.9% (567/1033), and the false alarm rate was 1.2 fa/kw/h.
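
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the accumulated scores are log-domain values, so that the subtraction is well defined, and that a peak must reach a fixed threshold; the abstract confirms neither. All function names, the threshold, the keywords, the topic labels, and the numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def keyword_scores(with_keyword: List[float], baseline: List[float]) -> List[float]:
    """Per-frame keyword score Ps(w): the accumulated score of
    (best syllable sequence + keyword) minus that of the best syllable
    sequence alone. Assumption: both inputs are log-domain scores."""
    return [a - b for a, b in zip(with_keyword, baseline)]


def local_peaks(scores: List[float], threshold: float) -> List[int]:
    """Frame indices of interior local maxima that reach the threshold.
    The threshold is an assumption, not a value from the paper."""
    return [
        t
        for t in range(1, len(scores) - 1)
        if scores[t] >= threshold
        and scores[t] > scores[t - 1]
        and scores[t] >= scores[t + 1]
    ]


def topic_probabilities(
    keyword_prob: Dict[str, float],
    tcw: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """P(n) = sum over w of P(w) * P(n|w), with Ps(w) substituted for P(w)."""
    topics: Dict[str, float] = {}
    for word, p_w in keyword_prob.items():
        for topic, p_n_given_w in tcw.get(word, {}).items():
            topics[topic] = topics.get(topic, 0.0) + p_w * p_n_given_w
    return topics


def classify(keyword_prob: Dict[str, float], tcw: Dict[str, Dict[str, float]]) -> str:
    """Return the topic n with the highest topic probability P(n)."""
    scores = topic_probabilities(keyword_prob, tcw)
    return max(scores, key=lambda n: scores[n])


# Toy data (hypothetical): accumulated log scores over five frames.
frame_scores = keyword_scores(
    with_keyword=[-9.0, -7.5, -6.0, -7.0, -8.0],
    baseline=[-5.0, -5.0, -5.0, -5.0, -5.0],
)
print(local_peaks(frame_scores, threshold=-2.0))

# Toy data (hypothetical): spotted keyword probabilities and a TCW table.
spotted = {"election": 0.8, "yen": 0.3}
tcw_table = {
    "election": {"politics": 0.9, "economy": 0.1},
    "yen": {"politics": 0.2, "economy": 0.8},
}
print(classify(spotted, tcw_table))
```

In this sketch a keyword absent from the TCW table simply contributes nothing to any topic, which is one possible design choice among several.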