Hi all,
I'm a PhD candidate in ethnomusicology at UC Berkeley, and I have a research question with which I'm hoping you can help. My research focuses on sound communication between Taiwan and China, and especially how sound gets around censorship mechanisms. I'm trying to understand whether there are any technological reasons why audio communication might be more difficult to censor than visual communication. Since AI is central to many censorship tools, I am especially interested in the unique challenges of using sound data with AI.
My understanding thus far is that certain developments around 2010 prompted the use of GPUs for AI and led to huge breakthroughs in AI applications in industry. My question is whether the switch to GPUs also led to a greater focus on visual data because the physical architecture of GPUs lends itself better to visual than to audio data. Thoughts on this topic? Is there a visual bias in AI research? If so, is this bias technological or cultural? What are some of the unique challenges of using AI technologies with sound data?
Looking forward to hearing your candid feedback, and thanks to Justin Salamon for pointing me toward this listserv.
Best,
Sarah