Owen’s point reminds me of a discussion on this list, maybe 20 years or so ago. Someone remarked that when on a plane, listening to the engines warm up, they could sometimes hear tunes in that sound, and could to some extent choose which tunes to hear. It’s another example of the general phenomenon that one can get percepts that plausibly map onto more than one stimulus, and that the brain uses top-down information to carve out one route in that mapping.

I also like Julie’s point: the demographic overlap between readers of early text-to-speech articles and users of TikTok may not be as great as many of us would wish :)

Bob

PS Remember planes?

From: Owen Brimijoin <owen.brimijoin@xxxxxx>
It definitely works with non-words. If you play continuous noise with randomized spectra to people and ask them to press a button whenever they hear a particular vowel sound in there, they quite happily do so. Even though you never intentionally embed any vowel sound in the noise, when you average the spectrograms of the signal just before each of the hundreds of button presses, hey presto: you see the spectrum of the vowel they were listening for.
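In code, the analysis amounts to averaging the spectral frames that immediately precede the presses. A minimal Python sketch, assuming you have kept the frame-by-frame noise spectra and the press times (the function name and the 200-ms window are illustrative assumptions, not taken from any particular study):

    import numpy as np

    def pre_press_spectrum(frame_spectra, frame_times, press_times, window=0.2):
        # frame_spectra: (n_frames, n_bins) spectra of the noise, frame by frame
        # frame_times:   (n_frames,) time stamp of each frame, in seconds
        # press_times:   times of the listener's button presses, in seconds
        chunks = []
        for t in press_times:
            mask = (frame_times >= t - window) & (frame_times < t)
            if mask.any():
                chunks.append(frame_spectra[mask].mean(axis=0))
        # Averaged over hundreds of presses, press-triggering spectral structure
        # survives while the rest of the random noise averages out.
        return np.mean(chunks, axis=0)

A common refinement is to subtract the grand mean spectrum of all frames, so that only the press-related structure remains.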
It’s a little spooky. And very cool.

Link to shameless self-promotion:

- Owen

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
on behalf of Bob Carlyon <Bob.Carlyon@xxxxxxxxxxxxxxxxx>

Hi Malcolm,

Nice video. Kind of you to shave your legs for our benefit. I think this is an example of the general finding that prior information affects the perception of degraded speech, which has been extensively investigated with vocoded speech. When vocoded with only a few channels, speech can sound unintelligible, but it sounds clear and obvious when preceded by either a written or a spoken (clear speech) version of the original. It has been found that this kind of clear-then-distorted exposure speeds up learning, so that it becomes easier to recognise new sentences vocoded in the same way:

Davis MH, Johnsrude IS, Hervais-Adelman A, Taylor K, McGettigan C (2005) Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology-General 134:222-241.
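In case it is useful, noise vocoding itself is easy to sketch. A minimal Python illustration follows; the function name, channel count, band edges, and filter order are my own illustrative choices, not the parameters of the study above:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, fs, n_channels=4, f_lo=100.0, f_hi=8000.0):
        # Split the signal into log-spaced bands, take each band's amplitude
        # envelope, and use it to modulate noise filtered into the same band.
        edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
        noise = np.random.randn(len(x))
        out = np.zeros(len(x))
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
            band = sosfiltfilt(sos, x)
            env = np.abs(hilbert(band))          # amplitude envelope of the band
            out += env * sosfiltfilt(sos, noise)  # envelope-modulated band noise
        return out / (np.abs(out).max() + 1e-12)  # normalise to avoid clipping

With only a few channels, the output of something like this is typically hard to understand on first hearing, but pops out once you have read or heard the clear version.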
The difference between the video and most published studies is that here the distorted speech could plausibly ‘map onto’ one of two sentences. My colleague Matt Davis, in a public science lecture, had the audience play “vocoder bingo”: Matt played several vocoded words, the audience were all given cards with some words written on them, and they had to tick off each word when they heard it. Much excitement ensued (I’m not sure what the prize was), but the catch was that none of the words were actually presented; they were just sufficiently similar/ambiguous to be convincing when paired with the written text. Matt tells me that they have published an imaging study looking at brain responses to ambiguous vocoded words when cued to hear them one way or another (e.g. ‘pit’ or ‘kitsch’). This may be the closest published work to what you ask for:

Blank H, Spangenberg M, Davis MH (2018) Neural prediction errors distinguish perception and misperception of speech. Journal of Neuroscience 38(27):6076-6089. https://www.jneurosci.org/content/38/27/6076

Finally, I suspect that this is not a semantic effect, as I expect it would work with non-words.

All the best,
Bob

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
On Behalf Of Malcolm Slaney

Has there been anything formal published on this effect? It sounds to me like a semantic version of the McGurk effect. Nice demo.

- Malcolm