noise-masking experiments and labiovelars (Kyle Gorman)


Subject: noise-masking experiments and labiovelars
From:    Kyle Gorman  <kgorman@xxxxxxxx>
Date:    Fri, 28 Nov 2008 18:47:00 -0500
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hi listers,

I have two related questions for you. I'm interested in the effects of following vowels on the perception of labiovelars. I was wondering 1) whether anybody has done the experiment I'm proposing and 2) whether anybody has tried using stimuli generated in the manner I'm proposing.

1) There's a well-known diachronic pattern whereby labiovelars like [k^w] (English <qu>) followed by rounded vowels like [u] dissimilate, with outcomes like [ku], [pu], [u], etc. I want to investigate whether all these patterns could be motivated strictly by misperception. Subjects will be presented with stimuli consisting of one of the stops [p t k k^w] followed immediately by one of [i e a o u] (as pronounced by a human speaker), masked in various degrees of noise, and will then be given a forced choice (between [p t k k^w]) immediately afterwards.

Surprisingly, I haven't been able to find any prior work doing this. The closest I've gotten is this, which crucially lacks labiovelar stimuli:

H. Winitz, M. Scheib, and J. Reeds. Identification of Stops and Vowels for the Burst Portion of /p, t, k/ Isolated from Conversational Speech. Journal of the Acoustical Society of America, 51:1309–1317, 1972.

Does anyone know of any work with the relevant stimuli?

2) The masking studies I'm aware of seem essentially to add random samples to the original stimuli to generate their noise conditions. I was wondering if anyone had tried to make this more natural by matching the intensity contour of the noise to the intensity contour of the original stimuli. The procedure is as follows. I calculate the RMS amplitude of the original signal by convolving the squared signal with a Kaiser window (beta = 20; number of points given by 3.2 x the number of frames in a single period of the lowest pitch, which I set at 100 Hz for a young female speaker), then taking the square root of the result. I then crop the ends to get an intensity signal of the same length as the original signal.
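
For readers who want to try this, here is a minimal sketch of the envelope step above, together with the noise-mixing step described in the next paragraph. It is an illustration under stated assumptions, not the exact code used for the stimuli: the function names, the `noise_gain` parameter, normalising the window to sum to 1, and peak renormalisation at the end are all assumptions. It uses only the Python standard library so it runs anywhere; an array library would be much faster on real recordings.

```python
import math
import random


def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_window(n_points, beta=20.0):
    """Kaiser window, normalised to sum to 1 (normalisation is an assumption)."""
    if n_points < 2:
        return [1.0]
    half = (n_points - 1) / 2.0
    w = []
    for i in range(n_points):
        r = (i - half) / half  # runs from -1 to 1 across the window
        w.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))))
    total = sum(w)
    return [v / total for v in w]


def rms_envelope(signal, sample_rate, min_pitch=100.0, beta=20.0):
    """Smooth the squared signal with a Kaiser window, then take the square root.

    The window spans 3.2 periods of min_pitch. The output is cropped to the
    length of the input (samples outside the signal are treated as zero).
    """
    n_points = int(round(3.2 * sample_rate / min_pitch))
    window = kaiser_window(n_points, beta)
    squared = [s * s for s in signal]
    offset = (n_points - 1) // 2
    envelope = []
    for n in range(len(squared)):
        acc = 0.0
        for k, w in enumerate(window):
            j = n + k - offset
            if 0 <= j < len(squared):
                acc += w * squared[j]
        envelope.append(math.sqrt(acc))
    return envelope


def add_envelope_noise(signal, envelope, noise_gain=0.5, seed=0):
    """Add uniform [-1, 1] noise scaled by the envelope, then peak-normalise."""
    rng = random.Random(seed)
    mixed = [s + noise_gain * e * rng.uniform(-1.0, 1.0)
             for s, e in zip(signal, envelope)]
    peak = max(abs(v) for v in mixed) or 1.0  # avoid dividing by zero on silence
    return [v / peak for v in mixed]


# Toy input: 100 ms of silence followed by 100 ms of a constant level, at 1 kHz.
toy = [0.0] * 100 + [0.5] * 100
toy_env = rms_envelope(toy, sample_rate=1000)
toy_mixed = add_envelope_noise(toy, toy_env, noise_gain=0.5)
```

Because the noise is scaled sample by sample by the envelope, stretches where the envelope is zero stay silent, while louder stretches get proportionally more noise. The width of the smoothing window determines how far the noise spreads before and after each sound.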
I believe this is the procedure described in the Praat manual. I then multiply a scaled copy of this intensity signal by values sampled from the uniform distribution on [-1, 1], add the original stimulus to the resulting noise, and renormalize.

This sounds, to my ear, more natural than the stimuli simply placed in a bed of noise, though because of the roughly Gaussian shape of the window I perceive a tiny bit of "anticipation" of the noise, and an equal amount of lag as well, though it is very subtle.

Has anyone tried this? Is this a natural thing to compare with the "bed of noise" used in the past? The "bed of noise" condition seems to be more like "being in the same room as a constant loud noise", whereas the "envelope of noise" I describe seems more likely to turn up errors caused by human production/perception systems.

Yours,
Kyle Gorman

--
Kyle Gorman ~ kgorman@xxxxxxxx ~ 513 405 2543


This message came from the mail archive
http://www.auditory.org/postings/2008/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University