
Re: [AUDITORY] Software for internet-based auditory testing



To add to the list of tools, here are some relevant audio crowdsourcing tools that I've worked on:

1) CAQE - a tool for running crowdsourced audio quality evaluation (both pairwise and multi-stimulus tests), which is set up for easy use with MTurk workers and Heroku hosting.
https://github.com/interactiveaudiolab/CAQE/issues

2) Audio Annotator - a browser-based sound-event annotation tool that we've used in crowdsourced experiments
https://github.com/CrowdCurio/audio-annotator

3) HearingScreening.js - a browser-based implementation of a simple tone-counting hearing screening that verifies a wide-bandwidth listening environment (i.e. not laptop speakers) for crowdsourced participants (see the rough sketch below)
https://github.com/mcartwright/hearing-screening.js
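
For illustration only, here is a rough Python sketch of the general tone-counting idea; the actual procedure, frequencies and levels live in the hearing-screening.js repository, and everything below (frequencies, levels, durations) is a made-up assumption.

# Hypothetical sketch of a tone-counting bandwidth check, in the spirit of
# hearing-screening.js. All parameters here are illustrative assumptions.
import numpy as np

FS = 44100  # sample rate (Hz)

def tone(freq_hz, dur_s=0.5, level_db=-20.0):
    """Return a raised-cosine-gated sinusoid."""
    t = np.arange(int(dur_s * FS)) / FS
    x = 10 ** (level_db / 20.0) * np.sin(2 * np.pi * freq_hz * t)
    ramp = int(0.02 * FS)                      # 20 ms on/off ramps
    env = np.ones_like(x)
    env[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    env[-ramp:] = env[:ramp][::-1]
    return x * env

def counting_trial(freqs, gap_s=0.3):
    """Concatenate tones separated by silence; the listener reports how many
    tones they heard. Including a very low-frequency tone (e.g. 60 Hz) that
    small laptop speakers reproduce poorly means an under-count can flag a
    narrow-bandwidth playback chain."""
    gap = np.zeros(int(gap_s * FS))
    parts = []
    for f in freqs:
        parts += [tone(f), gap]
    return np.concatenate(parts)

# Example trial: four tones, one of them at 60 Hz.
stimulus = counting_trial([1000, 60, 500, 2000])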

Mark

On Wed, Oct 4, 2017 at 3:49 AM Brecht De Man <b.deman@xxxxxxxxxx> wrote:
Our ‘Web Audio Evaluation Tool’ aims to address several of the points raised here; e.g.
- “inexpensive and simple to program”: free, open source, and with an optional GUI test creator
- "ideally with response times”: all timing information (clicks, plays, …) is logged, and can for instance be visualised as a timeline (https://github.com/BrechtDeMan/WebAudioEvaluationTool/wiki/Features#metrics)
- “good functionality for auditory playback”: based on the Web Audio API (HTML5), so no Flash, Java or other third-party software is needed; very fast response and seamless switching, and very wide compatibility including mobile devices
- “can be used for all kinds of experiments”: implements a wide variety of standards as presets, based on a few elementary interfaces: vertical and horizontal sliders, Likert, AB(CD…), AB(CD…)X, ranking, and waveform annotation (https://github.com/BrechtDeMan/WebAudioEvaluationTool/wiki/Interfaces). Not so much ‘method of adjustment’ at this time.

We welcome any contributions and feature requests, as we aim to make a maximally comprehensive yet elegant and easy-to-use listening test tool through community effort. 

I am not aware of any published use of it on Mechanical Turk - though it’s something I want to try myself soon - but others have integrated it in systems which track progress of several experiments, for instance. We’ve included some functionality to facilitate this, like the ‘returnURL’ attribute which specifies the page to direct to upon test completion. 

All info is in the project repository linked above and in
Nicholas Jillings, Brecht De Man, David Moffat and Joshua D. Reiss, "Web Audio Evaluation Tool: A Browser-Based Listening Test Environment," 12th Sound and Music Computing Conference, July 2015. (http://smcnetwork.org/system/files/SMC2015_submission_88.pdf)

Please send any questions, suggestions or comments you may have to b.deman@xxxxxxxxxx.

Best wishes,

Brecht

________________________________________________

Brecht De Man
Postdoctoral researcher
Centre for Digital Music
Queen Mary University of London

School of Electronic Engineering and Computer Science
Mile End Road
London E1 4NS
United Kingdom

b.deman@xxxxxxxxxx 




On 4 Oct 2017, at 06:38, Richard F. Lyon <dicklyon@xxxxxxx> wrote:

Many thanks, Sam and Bryan and Kevin and all those who replied privately.

I can see many possible ways forward; just need to get pecking at some...

Dick

On Tue, Oct 3, 2017 at 7:06 PM, kevin woods <kevinwoods@xxxxxxxxxxxxxxx> wrote:
Further to Sam's email, here is a link to a code package we put together to implement our headphone screening task (intended to improve the quality of crowdsourced data): http://mcdermottlab.mit.edu/downloads.html

We have generally found that the quality of data obtained online with our screening procedure is comparable to that of data obtained in the lab on the same experiments. For obvious reasons we have only run experiments where precise stimulus control seems unlikely to be critical. 

Please feel free to contact us at kwoods@xxxxxxx with questions.

Sincerely,

Kevin Woods (on behalf of the McDermott Lab, Department of Brain and Cognitive Sciences, MIT)


On Tue, Oct 3, 2017 at 12:59 AM, Samuel Mehr <sam@xxxxxxxxxxxxxxx> wrote:
Dear Dick,

Lots of folks do successful audio-based experiments on Turk, and I generally find it to be a good platform for the sort of work you're describing (which is not really what I do, but is experimentally similar enough for the purposes of your question). I've done a few simple listening experiments of the form "listen to this thing, answer some questions about it", and the results directly replicate parallel in-person experiments in my lab, even when Turkers geolocate to lots of far-flung countries. I require subjects to wear headphones and validate that requirement with this great task from Josh McDermott's lab:

Woods, K. J. P., Siegel, M. H., Traer, J., & McDermott, J. H. (2017). Headphone screening to facilitate web-based auditory experiments. Attention, Perception, & Psychophysics, 1–9. https://doi.org/10.3758/s13414-017-1361-2
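
For illustration only (the real task and its parameters are in the paper and in the code package linked above), the basic antiphase trick behind the screener can be sketched roughly as follows; the frequency, durations and levels below are made-up assumptions, not the published values.

# Illustrative sketch of the antiphase-tone idea behind the Woods et al. (2017)
# headphone screener. All parameter values here are assumptions; use the
# published code package for real experiments.
import random
import numpy as np

FS = 44100

def stereo_tone(freq_hz, dur_s, level_db=-20.0, antiphase=False):
    """Return an (N, 2) stereo tone; if antiphase, invert the right channel.
    Over headphones the two ears never mix, so the antiphase tone sounds like
    the others; over loudspeakers the two channels cancel acoustically and the
    tone sounds quieter than it should."""
    t = np.arange(int(dur_s * FS)) / FS
    x = 10 ** (level_db / 20.0) * np.sin(2 * np.pi * freq_hz * t)
    right = -x if antiphase else x
    return np.column_stack([x, right])

def screening_trial():
    """Three tones in random order: one genuinely attenuated target, one
    antiphase foil, one standard. The listener picks the quietest; listeners
    on loudspeakers tend to pick the foil instead of the target."""
    tones = [
        ("target",   stereo_tone(200.0, 1.0, level_db=-26.0)),
        ("foil",     stereo_tone(200.0, 1.0, antiphase=True)),
        ("standard", stereo_tone(200.0, 1.0)),
    ]
    random.shuffle(tones)
    return tones

trial = screening_trial()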

In a fair amount of piloting, passing the headphone screener correlates positively with a number of other checks on Turker compliance, such as "What color is the sky? Please answer incorrectly, on purpose" and "Tell us honestly how carefully you completed this HIT". Basically, if you have a few metrics in an experiment that capture variance on some dimension related to participant quality, you should be able to tell easily which Turkers are actually doing good work and which aren't. Depending on how your ethics approval is set up, you can either pay everyone and filter out bad subjects, or require them to pass some level of quality control to receive payment.
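
As a minimal sketch of that kind of filtering (the field names, thresholds and example records below are made up; they would come from your own experiment logs):

# Hypothetical compliance filter combining a screener score with catch questions.
def passes_quality_control(subject):
    """Keep a participant only if they pass the headphone screener and the
    explicit catch questions."""
    return (
        subject["screener_correct"] >= 5          # e.g. at least 5/6 screener trials
        and subject["sky_color"] != "blue"        # "answer incorrectly, on purpose"
        and subject["self_reported_care"] >= 3    # e.g. a 1-5 self-report scale
    )

subjects = [
    {"id": "A1", "screener_correct": 6, "sky_color": "green", "self_reported_care": 5},
    {"id": "A2", "screener_correct": 3, "sky_color": "blue",  "self_reported_care": 2},
]
kept = [s for s in subjects if passes_quality_control(s)]   # keeps only "A1"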

best
Sam


-- 
Samuel Mehr
Department of Psychology
Harvard University



On Tue, Oct 3, 2017 at 8:57 AM, Richard F. Lyon <dicklyon@xxxxxxx> wrote:
Five years on, are there any updates on experience using Mechanical Turk and such for sound perception experiments?

I've never conducted psychoacoustic experiments myself (other than informal ones on myself), but now I think I have some modeling ideas that need to be tuned and tested with corresponding experimental data.  Is MTurk the way to go?  If it is, are IRB approvals still needed? I don't even know if that applies to me; probably my company has corresponding approval requirements.

I'm interested in things like SNR thresholds for binaural detection and localization of different types of signals and noises -- 2AFC tests whose relative results across conditions would hopefully not be strongly dependent on level or headphone quality.  Are there good MTurk task structures that motivate people to do a good job on these, e.g. by making their space quieter, paying attention, getting more pay as the task gets harder, or just getting to do more similar tasks, etc.?  Can the pay depend on performance?  Or just cut them off when the SNR has been lowered to threshold, so that people with lower thresholds stay on and get paid longer?
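
One common way to get the "lower the SNR to threshold, then stop" behaviour is an adaptive staircase. A minimal sketch, assuming a 2-down / 1-up rule on SNR (which converges near the 70.7%-correct point); the step size, start level and stopping rule are illustrative choices, and present_trial() stands in for whatever code presents the binaural 2AFC stimulus at the requested SNR and returns whether the response was correct:

# Hypothetical 2-down / 1-up staircase for a 2AFC SNR threshold.
import random

def run_staircase(present_trial, start_snr_db=10.0, step_db=2.0, max_reversals=12):
    snr = start_snr_db
    streak = 0            # consecutive correct responses at the current SNR
    last_move = None      # +1 = last change made the task easier, -1 = harder
    reversals = []
    while len(reversals) < max_reversals:
        if present_trial(snr):
            streak += 1
            if streak == 2:               # two correct in a row -> make it harder
                streak = 0
                if last_move == +1:
                    reversals.append(snr)
                last_move = -1
                snr -= step_db
        else:                             # any error -> make it easier
            streak = 0
            if last_move == -1:
                reversals.append(snr)
            last_move = +1
            snr += step_db
    tail = reversals[-8:]                 # average the last few reversal points
    return sum(tail) / len(tail)

# Quick self-check with a simulated listener whose performance improves with SNR.
simulated = lambda snr_db: random.random() < 0.5 + 0.5 / (1.0 + 10 ** (-snr_db / 4.0))
print(run_staircase(simulated))

Pay-for-performance or pay-per-trial schemes can then be layered on top: participants who track the staircase to lower SNRs simply complete more paid trials before the stopping rule fires.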

If anyone in academia has a good setup for human experiments and an interest in collaborating on binaural model improvements, I'd love to discuss that, too, either privately or on the list.

Dick