I agree with the points made so far. I'd already drafted this, as
'some general points I think are compatible', before I read
Mattson's message. I'm sending it anyway, because one or two points have
not yet been made, and while others are obvious or have now been
said, it may be helpful to put them in one place. I can supply
references for most of my points if you would like them, but much of
this is easily found in literature that may be of more relevance to
you.

- Listening strategies are unavoidable, so even if you try to produce an unbiased initial situation, participants are likely to develop a strategy during the experiment that is tuned to the particular stimuli (including their range of variation) and task. The strategy may or may not vary significantly between individuals, depending on stimulus construction and presentation.
- What do you want to generalise your results to? Responses to short sounds heard out of context may not generalise to responses to longer sounds, and the same sound can be interpreted very differently in different contexts. Ideally, your presentation context as well as your sound stimuli themselves will reflect the situations you want your experiment to be relevant to.
- Another consideration might be the definition of your categories. Is it the domains (e.g. speech, music, natural environment) you are interested in, or detection of different timbres? If it's the domains, then it would seem reasonable to let awareness of the domain be part of the experiment, since expectations tend to drive perceptions of ambiguous stimuli. But it sounds as though timbre rather than domain may be the point. If timbre, then it can change within a 250-ms excerpt in all 3 domains mentioned. So considering whether you want natural dynamical variation or not could be important. (It may also be worth using stimuli long enough for those functional categories to be meaningful.)
- Do you care about thresholds, or what people normally do above threshold? Either way, exactly where and how in a sound chunk a particular change occurs is sometimes critical, and sometimes of no apparent importance at all. This is perhaps particularly true for speech, for example for f0 contours vis-à-vis the syllable structures that carry them, and for what the perceived function of the utterance is (which would normally require it to be heard in context).
  And though it sounds as though you are probably planning a psychoacoustic experiment, the fact you've asked the question suggests that you might in time want to find a functionally meaningful task, perhaps in addition to a 4IAX task. If you do, this could affect stimulus choice now.
- Stimulus variation can strongly affect responses, presumably by helping or hindering the focusing of attention on particular acoustic properties. You can assess this by blocking stimulus presentations, so that listeners hear only one type of stimulus in a block, or by presenting the full range in a block. With subtle differences, you'll likely get different results.
- While stimuli of constant duration look nicely controlled, and can be the best for some experiments (probably including discrimination and threshold tasks), it could be worth considering the amount of information conveyed within a stimulus. Generalising across genres, musical notes are typically rather slower than spontaneous conversational speech. In normal-rate speech, 250 ms can (but does not always) involve more than one syllable and often more than one word. Fast music can involve several notes within 250 ms, but in much music, single notes are typically longer than 250 ms. (There is no 1:1 relation between phonemes, syllables, words, and notes and phrases.)
- In longer stretches of sound, temporal properties (e.g. amplitude envelope, factors that affect rhythm and metre) strongly affect perceptual responses, and how listeners hear them is culturally sensitive. (Relevance to generalisation to real life, again.)
- Relatedly, and following on from Bob's email, speech, music and environmental sounds can and typically do include harmonic, inharmonic and aperiodic sounds, and of course silence too (albeit often in different proportions). For speech, it is easy to stick to stimuli that have an f0, but generalisation to normal conditions may be somewhat limited. I'd predict, but don't know, the same for environmental sounds.
- Finally, where do singing and rap fit in?

Many of these issues cannot be resolved to produce perfectly controlled stimuli - you have to make (sometimes very tough) decisions about your focus and what's practical, after which other decisions are likely to be influenced by your earlier ones. Being aware that you are making the early ones before the design is finalised is useful though!

I hope this helps, and good luck!

Sarah

On 09/05/2021 16:14, Mattson ogg wrote: