Moving listening tests from the lab to the micro-task labor market of Amazon Mechanical Turk speeds data collection and reduces investigator effort. However, it also reduces the amount of control investigators have over the testing environment, adding new
variability and potential biases to the data. In this work, we compare multiple stimulus listening tests performed in a lab environment to multiple stimulus listening tests performed in a web environment on a population drawn from Mechanical Turk. If you want to read more about this work, here is our publication on that topic:
M. Cartwright, B. Pardo, G. Mysore, and M. Hoffman, "Fast and Easy Crowdsourced Perceptual Audio Evaluation," Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, March 20-25, 2016.
http://music.cs.northwestern.edu/publications/cartwright_etal_icassp2016.pdf
Best wishes,
Bryan Pardo
From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx> on behalf of Samuel Mehr <sam@xxxxxxxxxxxxxxx>
Sent: Monday, October 2, 2017 11:59 PM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Software for internet-based auditory testing

Dear Dick,
Lots of folks do successful audio-based experiments on Turk and I generally find it to be a good platform for the sort of work you're describing (which is not really what I do, but experimentally is similar enough for the purposes of your question). I've
done a few simple listening experiments of the form "listen to this thing, answer some questions about it", and the results directly replicate parallel in-person experiments in my lab, even when Turkers geolocate to lots of far-flung countries. I require subjects
to wear headphones and validate that requirement with this great task from Josh McDermott's lab:
Woods, K. J. P., Siegel, M. H., Traer, J., & McDermott, J. H. (2017). Headphone screening to facilitate web-based auditory experiments.
Attention, Perception, & Psychophysics, 1–9.
https://doi.org/10.3758/s13414
In a bunch of piloting, passing the headphone screener correlates positively with a bunch of other checks on Turker compliance, things like "What color is the sky? Please answer incorrectly, on purpose" and "Tell us honestly how carefully you completed this HIT". Basically, if you have a few metrics in an experiment that capture variance on some dimension related to participant quality, you should be able to tell pretty easily which Turkers are actually doing good work and which aren't. Depending on how your ethics approval is set up, you can either pay everyone and filter out bad subjects, or require them to pass some level of quality control to receive payment.
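The filtering step above can be sketched in code. This is a minimal, hypothetical example: the field names, the 5-of-6 screener threshold, and the 1-5 care scale are all assumptions for illustration, not details from the Woods et al. screener or any particular HIT design.

```python
# Hypothetical post-hoc quality filtering for crowdsourced listening-test data.
# Each worker record holds a few compliance metrics collected during the HIT;
# all thresholds and field names below are illustrative assumptions.

def passes_quality_checks(worker):
    """Return True if a worker's responses pass all compliance checks."""
    checks = [
        # Assumed threshold: most headphone-screener trials answered correctly.
        worker.get("headphone_screen_correct", 0) >= 5,
        # Catch trial: instructed to answer the sky-color question incorrectly,
        # so a literal "blue" signals inattention.
        worker["sky_color_answer"].strip().lower() != "blue",
        # Self-reported care on an assumed 1-5 scale.
        worker["self_reported_care"] >= 3,
    ]
    return all(checks)

workers = [
    {"headphone_screen_correct": 6, "sky_color_answer": "green", "self_reported_care": 5},
    {"headphone_screen_correct": 2, "sky_color_answer": "blue", "self_reported_care": 4},
]
kept = [w for w in workers if passes_quality_checks(w)]
# Only the first worker survives all three checks.
```

Whether you then exclude failed workers from analysis only, or from payment as well, is the ethics-approval question mentioned above.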
best
Sam
On Tue, Oct 3, 2017 at 8:57 AM, Richard F. Lyon
<dicklyon@xxxxxxx> wrote: