
Re: sinfa using matlab

Dear Jont and List,

I couldn't resist the kind invitation to discuss the analysis of consonant confusions...

I agree with Jont that SINFA may not be the best way to analyse consonant confusions. I also agree that confusion data is quite complex. My interpretation of this is that using only one analysis method will limit our understanding rather than extend it.

I disagree that it is time to give up on distinctive features as they *do* provide insights into certain aspects of consonant confusions (e.g. spectral integration as described in Christiansen and Greenberg, 2012).

The fact that distinctive features are defined by production characteristics does not, in my view, preclude them from playing a role in perception. This is indeed what the data of Christiansen and Greenberg (2012) show. Moreover, these data cannot be explained by AI, which is why I argue that we need to be open to different ways of analysis and interpretation (and even open to degrading the speech signal by means other than noise, e.g. band-pass filtering).

Now, I do recognize the vast amount of valuable AI work and the attention it has received. All I am saying is that perhaps it is time to *also* pay attention to alternatives.

This is my poppyseed-free two cents...

Christiansen, T.U. and Greenberg, S. (2012) Perceptual confusions among consonants, Revisited – Cross-spectral Integration of phonetic-feature information and consonant recognition. IEEE Trans. Audio, Speech and Lang. Proc. 20: 147-161

Best regards,
Thomas Ulrich Christiansen, PhD
Senior Research and Development Engineer
Audiological Requirements
Audiology and Embedded Solutions

Oticon A/S
Kongebakken 9
DK - 2765 Smørum

Direct:  +45 3913 7675
Main:    +45 3917 7100
Mail:    thch@xxxxxxxxxx
Web:     www.oticon.com

On 27-03-2016 14:46, Jont Allen wrote:
Dear All,

My comment is not about HOW to get SINFA working, but WHY you would
want to get it working.

Since 1973 we have learned a great deal about phone identification by
normal-hearing and hearing-impaired listeners. Bob Bilger was a good
friend, and his work represented an important stepping stone along the
path toward building a realistic and correct understanding of human
speech processing. But today, in my view, SINFA is not a viable way to
analyze human speech errors. One of the problems with the 1973
analysis was due to the limitations of computers in 1973. All the
responses were averaged over the two main effects, tokens and SNR.
This renders the results

Please share with us your thoughts on what the best methods are today,
given what we now know. And I would be happy to do the same.

My view:

I would suggest you look at the alternatives, such as confusion
patterns (a row of a confusion matrix as a function of SNR), and, most
importantly, go down to the token level. It is time to give up on
distinctive features. They are a production concept, great at
classifying different types of speech productions, but they do not
properly get at what human listeners do, especially those with hearing
loss, when reporting individual consonants. Bilger and Wang make these
points in their JSHR article. They emphasize the individual
differences of HI listeners (p. 737) and the secondary role of
distinctive features (p. 724) and of hearing level (p. 737). I do not
think that multidimensional scaling can answer these questions, as it
only works for a limited number of dimensions (2 or 3). Actual
confusion data, as a function of SNR, are too complex for a 2-3
dimension analysis.
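A confusion pattern, as described above, is one row of a confusion matrix (the response probabilities for one spoken consonant) tracked across SNR. The Python sketch below computes it from raw counts. The data layout (a dict mapping SNR in dB to a square count matrix with spoken consonants as rows and reported consonants as columns) and the function name `confusion_pattern` are hypothetical choices for illustration, not taken from the papers cited here. To stay at the token level, one would call it once per token rather than on counts pooled over tokens.

```python
def confusion_pattern(counts_by_snr, consonants, spoken):
    """Return {snr: {heard: P(heard | spoken)}} for one spoken consonant.

    counts_by_snr maps an SNR (dB) to a square count matrix; rows are spoken
    consonants and columns are reported consonants, both ordered as in
    `consonants` (a hypothetical layout assumed for this sketch).
    """
    i = consonants.index(spoken)
    pattern = {}
    for snr in sorted(counts_by_snr):
        row = counts_by_snr[snr][i]
        total = sum(row)
        # Normalize the row so each entry is P(heard | spoken) at this SNR.
        pattern[snr] = {
            heard: (n / total if total else 0.0)
            for heard, n in zip(consonants, row)
        }
    return pattern


# Invented counts for a single token, used only to exercise the function.
consonants = ["p", "t", "k"]
counts_by_snr = {
    0: [[10, 0, 0], [0, 10, 0], [0, 1, 9]],
    -12: [[5, 3, 2], [2, 6, 2], [3, 3, 4]],
}
for snr, probs in confusion_pattern(counts_by_snr, consonants, "p").items():
    print(snr, probs)
```

Plotting each entry of the returned dict against SNR gives one curve per reported consonant, which keeps the SNR dimension that a single pooled matrix throws away.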

Here are some pointers I suggest you consider that describe how
humans decode CV sounds as a function of SNR.

The Singh analysis explains why and how the articulation index (AI)
The Trevino article shows the very large differences in consonant
perception in impaired ears. Hearing loss leads to large individual
differences that are uncorrelated with hearing thresholds.
The Toscano article is a good place to start.

	* Toscano, Joseph and Allen, Jont B (2014). "Across and within
consonant errors for isolated syllables in noise," Journal of Speech,
Language, and Hearing Research, Vol 57, pp 2293-2307;
doi:10.1044/2014_JSLHR-H-13-0244 (JSLHR [6], pdf [7], AuthorCopy [8])

	* Trevino, Andrea C and Allen, Jont B (2013). "Within-Consonant
Perceptual Differences in the Hearing Impaired Ear," J. Acoust. Soc.
Am., v134(1), Jul, pp 607-617 (pdf [9])

	* Singh, Riya and Allen, Jont B (2012). "The influence of stop
consonants’ perceptual features on the Articulation Index model," J.
Acoust. Soc. Am., v131, Apr, pp 3051-3068 (pdf [10])

These two publications describe the speech cues normal-hearing
listeners use when decoding CV sounds. Each token has a threshold we
call SNR_90, defined as the SNR at which the errors go from zero to
10%. Most speech sounds are below the Shannon channel capacity limit,
below which there are zero errors, until the SNR is at the token error

Distinctive features are not a good description of phone perception.
The real speech cues are revealed in these papers, and each token has
an SNR_90. Bilger and Wang discuss this problem on page 724 of their
1973 JSHR article.
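Since each token is characterized by its SNR_90, the SNR at which its error rate rises from zero to 10%, a per-token estimate from measured error rates may help make this concrete. The Python sketch below scans from the highest tested SNR downward and linearly interpolates between the two tested SNRs that bracket the 10% criterion. The interpolation rule, the function name `snr90`, and the input format are assumptions for illustration; the papers cited above may estimate SNR_90 differently.

```python
def snr90(snrs, error_rates, criterion=0.10):
    """Estimate the SNR (dB) at which a token's error rate reaches `criterion`.

    Assumes the error rate falls as SNR rises. Returns None if the token is
    never at or below the criterion, or never rises above it, in the tested
    range.
    """
    points = sorted(zip(snrs, error_rates), reverse=True)  # highest SNR first
    if points[0][1] > criterion:
        return None  # even the highest tested SNR gives more than 10% errors
    for (hi_snr, hi_err), (lo_snr, lo_err) in zip(points, points[1:]):
        if lo_err > criterion:
            # Linear interpolation between the two SNRs bracketing the criterion.
            frac = (criterion - hi_err) / (lo_err - hi_err)
            return hi_snr + frac * (lo_snr - hi_snr)
    return None  # errors stay at or below the criterion down to the lowest SNR


# Invented error rates for one token at five SNRs (dB).
print(snr90([-18, -12, -6, 0, 6], [0.8, 0.5, 0.2, 0.0, 0.0]))
```

Running it separately on each token, rather than on error rates averaged across tokens, keeps the token-to-token spread in SNR_90 visible.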

	* Li, F., Trevino, A., Menon, A. and Allen, Jont B (2012). "A
psychoacoustic method for studying the necessary and sufficient
perceptual cues of American English fricative consonants in noise," J.
Acoust. Soc. Am., v132(4), Oct, pp 2663-2675 (pdf [11])

	* Li, F., Menon, A. and Allen, Jont B (2010). "A psychoacoustic method
to find the perceptual cues of stop consonants in natural speech," J.
Acoust. Soc. Am., Apr, pp 2599-2610 (pdf [12])

If you want to see another view, other than mine, read this, for

Zaar, J. and Dau, T. (2015). J. Acoust. Soc. Am., v138(3), pp 1253-1267


Jont Allen

On 03/26/2016 10:44 AM, gvoysey wrote:

I have not tried this, but I am willing to bet you can get FIX
running on a modern PC with DOSBox [4], which is a cross-platform
MS-DOS emulator. It’s most famous for letting you play very old
video games in your web browser (http://playdosgamesonline.com/
[5]), but there’s no reason it shouldn’t work just as well for
Real Work.


On Sat, Mar 26, 2016 at 5:06 AM, David Jackson Morris
<dmorris@xxxxxxxxx> wrote:

Dear Skyler,

I have been on a similar search and found an R package by David
van Leeuwen that is available on GitHub. Please let me know if
you find any other alternatives.

FIX is really awesome, but every time I want to use it I have to
go over to Granny's and boot the Win 95 machine, and she makes me
eat poppyseed cake, which makes my tummy sore...




INSS/Audiologopædi/Speech Pathology & Audiology
Building 22, 5th floor
Njalsgade 120
2300 København S
Office 22.5.14
Tel: 35328660
University website [1]


FROM: AUDITORY - Research in Auditory Perception
[AUDITORY@xxxxxxxxxxxxxxx] on behalf of Skyler Jennings
SENT: Friday, March 25, 2016 9:15 PM
TO: AUDITORY@xxxxxxxxxxxxxxx
SUBJECT: sinfa using matlab

Dear list,

I am writing in search of MATLAB-based software that performs
sequential information transfer analysis (SINFA; Wang and Bilger, 1973). I
am impressed with the quality of the DOS-based software maintained
by UCL called “FIX;” however, it would be more convenient to
do the analysis in MATLAB if possible.
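SINFA builds on the information-transmission measure of Miller and Nicely (1955): the confusion matrix is collapsed onto the values of a feature (for example, voicing), and the transmitted information T(x;y) is divided by the input entropy H(x). SINFA then iterates, selecting the feature with the highest transmission and partialling it out before re-evaluating the remaining features; that sequential step is not shown here. The Python sketch below covers only the per-feature measure, without any small-sample bias correction. The function names and the feature coding are hypothetical, and it is not a port of FIX.

```python
import math


def transmitted_information(counts):
    """Return (T(x;y), H(x)) in bits for a stimulus-by-response count matrix."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    t = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c:
                # p(i,j) * log2(p(i,j) / (p(i) * p(j))), written with raw counts
                t += (c / n) * math.log2(c * n / (row_tot[i] * col_tot[j]))
    h_x = -sum((r / n) * math.log2(r / n) for r in row_tot if r)
    return t, h_x


def feature_transmission(counts, consonants, feature):
    """Relative transmitted information T/H(x) for one feature.

    `feature` maps each consonant to a feature value (e.g. 0/1 for voicing);
    the consonant-level counts are pooled into feature-level cells first.
    """
    values = sorted(set(feature[c] for c in consonants))
    index = {v: k for k, v in enumerate(values)}
    collapsed = [[0] * len(values) for _ in values]
    for i, spoken in enumerate(consonants):
        for j, heard in enumerate(consonants):
            collapsed[index[feature[spoken]]][index[feature[heard]]] += counts[i][j]
    t, h_x = transmitted_information(collapsed)
    return t / h_x if h_x else 0.0


# Invented counts: voicing is always perceived correctly, place is at chance.
consonants = ["p", "t", "b", "d"]
counts = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
voicing = {"p": 0, "t": 0, "b": 1, "d": 1}
place = {"p": 0, "b": 0, "t": 1, "d": 1}
print(feature_transmission(counts, consonants, voicing))
print(feature_transmission(counts, consonants, place))
```

With these toy counts, voicing is fully transmitted and place not at all; the same two functions translate directly into MATLAB array code if a MATLAB version is needed.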

I appreciate any help you can offer, whether it be guiding me to
publicly available software, or sharing software that you’ve




Skyler G. Jennings, Ph.D., Au.D., CCC-A
Assistant Professor
Department of Communication Sciences and Disorders
College of Health, University of Utah
390 South 1530 East
Suite 1201 BEHS
Salt Lake City, UT 84112
801-581-6877 [2] (phone)
801-581-7955 [3] (fax)



Graham Voysey
Boston University College of Engineering
HRC Research Engineer
Auditory Biophysics and Simulation Laboratory
ERB 413

[2] tel:801-581-6877
[3] tel:801-581-7955
[6] http://jslhr.pubs.asha.org/Article.aspx?articleid=1894924
[13] http://scitation.aip.org/content/asa/journal/jasa/138/3/10.1121/1.4928142