Abstract:
A framework for ``phonological concept formation'' has been proposed with the aim of generating robust speech recognition models [Kojima et al., Proc. ICSLP 92, Vol. 1, pp. 269--272 (1992)]. For this purpose, a ``piecewise linear segment lattice'' model is proposed. The model is structured as a lattice of segments, each of which is represented by the regression coefficients of the feature vectors within the segment. Compared with typical stochastic models such as hidden Markov models (HMMs), the model has three advantages: (1) it needs fewer samples to learn; (2) it can represent objects with arbitrary precision; and (3) its structure can be changed dynamically with less computation. The generation algorithm is outlined as follows: (1) each sample is divided into segments using dynamic programming (DP), where the number of segments is decided by an MDL-like (minimum description length) criterion; (2) the segment sequences of samples of the same word are matched by DP; (3) the division is modified according to the matching scores; and (4) similar (i.e., near) subsequences are picked out and gathered into phonelike clusters. Speaker-independent isolated word recognition is carried out using the proposed models generated under several conditions. The results show that forming phonelike clusters improves the recognition rate.
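Step (1) of the outline can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the form of the MDL-like criterion, so the penalty used here (a log-likelihood term plus half the parameter count times log n) is an assumption, as are the scalar feature track (the model uses feature vectors), the minimum segment length, and the hypothetical names `line_fit_error` and `segment`.

```python
import math


def line_fit_error(y, i, j):
    """Sum of squared residuals of a least-squares line fitted to y[i:j]."""
    n = j - i
    if n <= 2:
        return 0.0  # one or two points are always fitted exactly
    mean_x = (n - 1) / 2.0
    mean_y = sum(y[i:j]) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y[i + x] - mean_y) for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y[i + x] - (intercept + slope * x)) ** 2 for x in range(n))


def segment(y, max_segments, min_len=2, eps=1e-12):
    """Divide y into linear segments by DP; pick the segment count with an
    assumed MDL-like score. Returns (boundaries, number_of_segments)."""
    n = len(y)
    inf = math.inf
    # best[k][j]: minimal total squared error for dividing y[:j] into k segments
    best = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k * min_len, n + 1):
            for i in range((k - 1) * min_len, j - min_len + 1):
                if best[k - 1][i] == inf:
                    continue
                cost = best[k - 1][i] + line_fit_error(y, i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i

    # Assumed MDL-like score: data cost plus model cost.
    # Each segment has 2 coefficients; k segments have k - 1 inner boundaries.
    best_k, best_score = None, inf
    for k in range(1, max_segments + 1):
        if best[k][n] == inf:
            continue  # this segment count is infeasible for the given length
        n_params = 3 * k - 1
        score = (0.5 * n * math.log(best[k][n] / n + eps)
                 + 0.5 * n_params * math.log(n))
        if score < best_score:
            best_k, best_score = k, score
    if best_k is None:
        raise ValueError("sequence too short for the given min_len")

    # Trace the stored split points back from the end of the sequence.
    bounds, j = [n], n
    for k in range(best_k, 0, -1):
        j = back[k][j]
        bounds.append(j)
    bounds.reverse()
    return bounds, best_k
```

For the track `[0, 1, 2, 3, 4, 5, 10, 9, 8, 7, 6, 5]`, calling `segment(y, max_segments=4)` selects two segments with boundaries `[0, 6, 12]`: a third segment does not reduce the fitting error, so the penalty term rejects it.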