Dear All,

My comment is not about HOW to get SINFA working, but WHY you would want to get it working. Since 1973 we have learned a great deal about phone identification by normal-hearing and hearing-impaired listeners. Bob Bilger was a good friend, and his work was an important stepping stone along the path toward a realistic and correct understanding of human speech processing. But today, in my view, SINFA is not a viable way to analyze human speech errors.

One of the problems with the 1973 analysis stems from the limitations of computers in 1973: all the responses were averaged over the two main effects, tokens and SNR. This renders the results uninterpretable.

Please share with us your thoughts on what the best methods are today, given what we now know, and I would be happy to do the same.

My view: I would suggest you look at the alternatives, such as confusion patterns (a row of a confusion matrix) as a function of SNR, and, most importantly, go down to the token level.

It is time to give up on distinctive features. They are a production concept, great at classifying different types of speech productions, but they do not properly get at what human listeners do, especially those with hearing loss, when reporting individual consonants. Bilger and Wang make these points in their JSHR article: they emphasize the individual differences of HI listeners (p. 737), and the secondary role of distinctive features (p. 724) and of hearing level (p. 737).

I do not think that multidimensional scaling can give the answers to these questions, as it only works for a limited number of dimensions (2 or 3). Actual confusion data, as a function of SNR, are too complex for a 2-3 dimension analysis.

Here are some pointers I suggest you consider, which describe how humans decode CV sounds as a function of SNR. The Singh analysis explains why and how the articulation index (AI) works. The Trevino article shows the very large differences in consonant perception in impaired ears: hearing loss leads to large individual differences that are uncorrelated with hearing thresholds. The Toscano article is a good place to start. These two publications describe the speech cues normal-hearing listeners use when decoding CV sounds.

Each token has a threshold we call SNR_90, defined as the SNR where the errors go from zero to 10%. Most speech sounds lie below the Shannon channel-capacity limit, so there are zero errors until the SNR drops to the token's error threshold. Distinctive features are not a good description of phone perception; the real speech cues are revealed in these papers, and each token has an SNR_90. Bilger and Wang discuss this problem on page 724 of their 1973 JSHR article. (A small numerical sketch of the token-level confusion pattern and SNR_90 idea is appended below my signature.)

If you want to see another view, other than mine, read this for starters:

Zaar and Dau (2015), JASA, vol. 138, pp. 1253-1267
http://scitation.aip.org/content/asa/journal/jasa/138/3/10.1121/1.4928142

Jont Allen
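P.S. To make the token-level idea concrete, here is a minimal sketch in Python (NumPy only). The confusion counts are invented for illustration, not data from any of the papers above; the point is only the shape of the analysis: normalize one token's confusion row at each SNR to get its confusion pattern, then interpolate the SNR where the error reaches 10% to get that token's SNR_90.

    # Sketch: token-level confusion patterns vs SNR, and a per-token SNR_90 estimate.
    # The counts below are made up for illustration; real work would use per-token
    # confusion counts at each SNR from a listening experiment.
    import numpy as np

    snrs = np.array([-12, -6, 0, 6, 12])          # SNR conditions (dB)
    responses = ["pa", "ta", "ka", "fa"]          # response alternatives

    # Hypothetical confusion counts for ONE /pa/ token (rows: SNR, cols: response).
    counts = np.array([
        [10, 14,  8, 18],   # -12 dB: mostly confused
        [22, 12,  6, 10],   #  -6 dB
        [38,  6,  2,  4],   #   0 dB
        [48,  1,  0,  1],   #   6 dB
        [50,  0,  0,  0],   #  12 dB: error-free
    ])

    # Confusion pattern: each row of the count matrix, normalized, as a function of SNR.
    pattern = counts / counts.sum(axis=1, keepdims=True)
    error = 1.0 - pattern[:, responses.index("pa")]   # P(error) vs SNR for this token

    # SNR_90: the SNR where the score reaches 90% correct (10% error),
    # estimated here by linear interpolation on the error-vs-SNR curve.
    snr_90 = np.interp(0.10, error[::-1], snrs[::-1])

    for s, row in zip(snrs, pattern):
        print(f"{s:+3d} dB  " + "  ".join(f"{r}:{p:.2f}" for r, p in zip(responses, row)))
    print(f"estimated SNR_90 for this /pa/ token: {snr_90:.1f} dB")

With real data the same computation would be run for every token of every consonant, at every SNR. That per-token detail is exactly what disappears when responses are averaged over tokens and SNR before a SINFA-style feature analysis.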
On 03/26/2016 10:44 AM, gvoysey wrote: