Abstract:
The relation between sentence-level prominence and voice-source characteristics is investigated using shifting word focus in prepared dialogues in English and Japanese, from a female and a male speaker, respectively. The voice source was analyzed using a Rosenberg--Klatt model whose parameters include fundamental frequency (f0), amplitude of voicing (AV), open quotient (OQ), and spectral tilt (TL). The parameters were estimated simultaneously and pitch-synchronously from the speech signal by an ARX method. As a test of the estimation, electroglottograph (EGG) signals were also used to measure OQ for each pitch period. The estimated parameters, including the ratio of glottal open phase to closed phase, were then compared with labeled prominence in the speech, and a clear correlation was found. The ARX analysis and the EGG-based measurements yield almost the same OQ estimates, but the former is automatic and can be computed from the raw speech waveform alone. These experiments confirm a strong relationship between prominence and both OQ and TL, and extend previous work on the estimation of prominence from acoustic measures first reported in [W. N. Campbell, Proc. ICSLP 92, 663--666 (1992)].
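
As an illustration of how the source parameters named above relate to one another, the following is a minimal sketch of one period of a Rosenberg--Klatt style glottal flow pulse, assuming the polynomial form commonly associated with the KLGLOTT88 model (flow = a t^2 - b t^3 during the open phase, zero during the closed phase). It is not the ARX estimation procedure used in the study. The function names `rk_pulse` and `open_to_closed_ratio`, the sampling-rate argument `fs`, and the omission of the spectral tilt (TL) filter are assumptions made only for this example.

```python
def rk_pulse(f0, av, oq, fs):
    """Return one period of a Rosenberg-Klatt style glottal flow pulse.

    f0: fundamental frequency in Hz
    av: amplitude of voicing (arbitrary units)
    oq: open quotient, open phase duration / period, with 0 < oq < 1
    fs: sampling rate in Hz (hypothetical parameter for this sketch)

    Spectral tilt (TL) is deliberately left out of this sketch.
    """
    t0 = 1.0 / f0                      # pitch period in seconds
    n = int(round(fs * t0))            # samples per period
    open_len = oq * t0                 # duration of the open phase
    # Polynomial coefficients chosen so the flow returns to zero at t = oq * t0
    a = 27.0 * av / (4.0 * oq ** 2 * t0)
    b = 27.0 * av / (4.0 * oq ** 3 * t0 ** 2)
    pulse = []
    for i in range(n):
        t = i / fs
        if t < open_len:
            pulse.append(a * t * t - b * t ** 3)   # open phase
        else:
            pulse.append(0.0)                      # closed phase
    return pulse


def open_to_closed_ratio(oq):
    """Ratio of glottal open phase to closed phase implied by OQ."""
    return oq / (1.0 - oq)


if __name__ == "__main__":
    pulse = rk_pulse(f0=100.0, av=1.0, oq=0.6, fs=16000.0)
    print(len(pulse), open_to_closed_ratio(0.6))
```

Under this definition, a larger OQ lengthens the open phase relative to the closed phase, which is why the open-to-closed ratio follows directly from OQ once the pitch period is known.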