
Re: [AUDITORY] High fidelity cocktail party recordings



Hi Brian,

You may be interested in the noise we used to build WHAM!, a noisy speech separation dataset: http://wham.whisper.ai/
From the website: "The noise audio was collected at various urban locations throughout the San Francisco Bay Area in late 2018. The environments primarily consist of restaurants, cafes, bars, and parks. Audio was recorded using an Apogee Sennheiser binaural microphone on a tripod between 1.0 and 1.5 meters off the ground."
The data that is currently publicly available is only at 16 kHz: "The clips are in 32-bit floating point WAV format with 2 channels and a sampling rate of 16 kHz." (The actual precision is 24 bits, even though the samples are encoded as 32-bit floats.)
But your email, along with a couple of other requests we recently received for higher sampling rates, convinced Whisper.ai and us to make a separate release of 48 kHz noise data. This won't simply be a 48 kHz version of the short files that we released for WHAM!. Instead, it will be a collection of longer recordings of various lengths, obtained by cutting the original data wherever we determine that it contains somewhat intelligible speech. This should provide the maximum amount of data to researchers interested in cocktail party noise.
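For readers who want a picture of the cutting step, here is a minimal sketch, not the actual release pipeline. It assumes the intelligible-speech passages are already available as (start, end) times in seconds and that those passages are dropped rather than kept; the function name, the interval format, and the `min_len_s` parameter are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s), in seconds


def speech_free_segments(
    duration_s: float,
    speech: List[Interval],
    min_len_s: float = 0.0,
) -> List[Interval]:
    """Return the parts of [0, duration_s] not covered by any speech interval.

    `speech` holds hypothetical (start, end) annotations of intelligible
    speech; how they are obtained is outside the scope of this sketch.
    """
    segments: List[Interval] = []
    cursor = 0.0  # end of the region processed so far
    for start, end in sorted(speech):
        # Clip each annotation to the recording boundaries.
        start = max(start, 0.0)
        end = min(end, duration_s)
        if end <= start:
            continue  # empty or fully outside the recording
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)  # also merges overlapping annotations
    if cursor < duration_s:
        segments.append((cursor, duration_s))
    # Optionally discard segments that are too short to be useful.
    return [(a, b) for a, b in segments if b - a >= min_len_s]


if __name__ == "__main__":
    marks = [(10.0, 20.0), (15.0, 25.0), (50.0, 70.0)]
    print(speech_free_segments(60.0, marks))
```

Each returned pair could then be converted to sample indices at 48 kHz and written out as its own file, which would naturally yield recordings of varying lengths.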
We are currently preparing the release, and will announce it on this list when it is ready, hopefully within a few weeks.

Thanks,
Jonathan


Jonathan Le Roux <Jonathan.Le-Roux@xxxxxxxxxxxxxx>
Senior Principal Research Scientist, Speech & Audio Senior Team Leader
MERL - Mitsubishi Electric Research Laboratories
201 Broadway, 8th Floor, Cambridge, MA 02139
Tel.: +1-617-621-7547  Fax: +1-617-621-7550



On Thu, Jul 30, 2020 at 12:16 AM Monson, Brian <monson@xxxxxxxxxxxx> wrote:
Dear Colleagues,

I am looking for high-fidelity recordings of natural cocktail party or other complex acoustic background scenes.

By “natural” I mean recorded in actual settings (cocktail parties, restaurants, hospitals, subways/trains/buses, etc.), preferably with a microphone location that could represent where a human might actually be listening to the scene (rather than, say, a mic suspended from the ceiling or something similar).

By “background” I mean true background scenes with no near-field talkers speaking directly into the microphone.

By “high fidelity” I mean:
Original recording sampling rate at least 44.1 kHz
Flat microphone response to 20 kHz
At least 16-bit precision preferred
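
The sampling-rate and bit-depth criteria can be screened automatically from a file's header; the microphone response cannot. The following is a minimal sketch that assumes a plain RIFF/WAVE file and reads only the basic fields of its `fmt ` chunk. The function names and default thresholds are illustrative, and the header reports only the stored rate and container bit depth, not the original recording rate or the effective precision.

```python
import struct


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the raw bytes of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_tag": tag,  # 1 = integer PCM, 3 = IEEE float
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        pos += 8 + size + (size & 1)  # chunks are padded to even sizes
    raise ValueError("no fmt chunk found")


def meets_criteria(fmt: dict, min_rate: int = 44100, min_bits: int = 16) -> bool:
    """Check the header-visible criteria: sampling rate and bit depth."""
    return fmt["sample_rate"] >= min_rate and fmt["bits_per_sample"] >= min_bits


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        info = read_wav_format(f.read(4096))  # the fmt chunk is normally near the start
    print(info, meets_criteria(info))
```

A file that has been upsampled from a lower rate would still pass this check, so it is only a first filter.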

COVID is preventing me from making my own recordings, so I’d greatly appreciate it if anyone has any you’d be willing to share (or know of any publicly available) that meet, or nearly meet, these criteria.

Many thanks,

Brian


Brian B. Monson, PhD

Assistant Professor
Department of Speech and Hearing Science
Neuroscience Program
University of Illinois at Urbana-Champaign
901 S Sixth St, Rm 223
Champaign, IL 61820
217-300-6212 | monson@xxxxxxxxxxxx
anexlab.shs.illinois.edu




Under the Illinois Freedom of Information Act any written communication to or from university employees regarding university business is a public record and may be subject to public disclosure.