Abstract:
Speech recognition systems do not usually utilize prosodic information, i.e., information signaled by segmental duration and by the fundamental frequency contour of the speech signal. In current statistical approaches to speech recognition, the acoustic manifestation of prosody is, more often than not, regarded as a disturbance. In spoken language translation systems, for example, detecting and transforming sentence accent will enable stress on a particular word in one language to be rendered as a suitable representation of the corresponding constituents in the other language, serving the same semantic goal. In this study, a system for the automatic detection of sentence accents, intended for use in speech recognition systems, is presented. The fundamental frequency is extracted from the speech signal, and an estimated declination of the fundamental frequency is subtracted from the actual contour to give a normalized representation of its variations. These variations are expressed as musical intervals. Sentence accents are then identified from this normalized representation of the fundamental frequency. Both the system architecture and some preliminary results are described.
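The abstract does not state how the declination is estimated or how accents are decided from the normalized contour, so the following Python sketch only illustrates the general idea under explicit assumptions. It assumes an F0 contour has already been extracted (one value in Hz per frame, 0 for unvoiced frames), models the declination as an ordinary least-squares line fitted in the semitone domain, expresses each voiced frame as a musical interval (in semitones) above or below that line, and flags runs of frames that exceed a threshold as accent candidates. The names `REF_HZ`, `threshold_st`, `normalize_f0` and `accent_candidates`, and the 2-semitone default, are hypothetical choices for illustration, not the system's actual parameters.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical reference frequency for the semitone scale; any fixed value works,
# because the declination fit absorbs the constant offset.
REF_HZ = 100.0


def hz_to_semitones(f0_hz: float, ref_hz: float = REF_HZ) -> float:
    """Musical interval, in semitones, from ref_hz up to f0_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def fit_declination(times: Sequence[float], st: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line st ~ intercept + slope * t (assumed declination model)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(st) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, st))
    slope = cov / var_t if var_t > 0 else 0.0
    return mean_s - slope * mean_t, slope


def normalize_f0(times: Sequence[float], f0_hz: Sequence[float]) -> List[Optional[float]]:
    """Semitone deviation of each voiced frame from the fitted declination line.

    Unvoiced frames (f0 <= 0) are returned as None.
    """
    voiced = [(t, hz_to_semitones(f)) for t, f in zip(times, f0_hz) if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames to estimate declination")
    intercept, slope = fit_declination([t for t, _ in voiced], [s for _, s in voiced])
    # Subtracting in the semitone (log) domain is equivalent to dividing frequencies.
    return [
        hz_to_semitones(f) - (intercept + slope * t) if f > 0 else None
        for t, f in zip(times, f0_hz)
    ]


def accent_candidates(
    times: Sequence[float],
    residual: Sequence[Optional[float]],
    threshold_st: float = 2.0,  # hypothetical threshold, in semitones
) -> List[Tuple[float, float]]:
    """(start, end) times of runs of frames at least threshold_st above the declination."""
    regions: List[Tuple[float, float]] = []
    start: Optional[float] = None
    prev_t: Optional[float] = None
    for t, r in zip(times, residual):
        above = r is not None and r >= threshold_st
        if above and start is None:
            start = t  # a run of raised frames begins
        elif not above and start is not None:
            regions.append((start, prev_t))  # run ended on the previous frame
            start = None
        prev_t = t
    if start is not None:
        regions.append((start, prev_t))
    return regions
```

For instance, a synthetic contour sampled every 0.1 s that falls by 2 semitones per second, with a 4-semitone rise on the frames at 0.4 s and 0.5 s, yields a single candidate region spanning those two frames. Working in semitones makes a given deviation correspond to the same frequency ratio regardless of the speaker's pitch range. Note that a plain least-squares fit is pulled upward by the accents themselves (the 4-semitone rise above appears as roughly 3.1 semitones), so a baseline fitted to F0 minima or a robust regression may be a better declination estimate in practice.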