Abstract:
State-of-the-art speech recognition and speech translation systems do not currently make use of prosodic information. Utterances often have one or more constituents that are semantically focused by prosodic means, and detecting the focus or foci of an utterance is crucial for a correct interpretation of the speech signal. A semantic model of focus should therefore be linked to a model describing the acoustic-phonetic correlates of focus in speech. However, variability exists at both the semantic and the prosodic ends. Semantically different kinds of foci might be associated with specific prosodic gestures. Conversely, a semantically specific type of focus might be realized in different ways in different varieties of a given language, since general intonational patterns vary between dialects. In this paper, focus realization in three dialects of Swedish is investigated. Subjects from Stockholm, Goteborg, and Malmo recorded three sets of four sentences in which focus was systematically placed on four different constituents by having the subjects answer wh-questions. Since Swedish is a language with two tonal accents, words with each of these accents were included both in and out of focus. Dialectal as well as individual variation in focus realization is described, with emphasis on invariant and optional phenomena.