I wouldn't call this a semantic McGurk effect, given that it doesn't have to be driven by simultaneous bottom-up input from two modalities. That is, even if nothing is written on the screen and you're just thinking "green needle" to yourself, that's what you're likely to hear (whereas thinking "ga" while hearing "ba" won't get you to "da"; you need the simultaneous input from face and voice). So I'd agree with Roger that it's more akin to the phoneme restoration effect, or to work like Cynthia Connine's "she ran hot water for the p/bath," showing how expectations influence the interpretation of bottom-up input.
I think most of US wouldn't be surprised that the same stimulus can be perceived in different ways, but my impression is that the general public tends to believe "what you see is what you get" and underestimates the power of top-down influences. It's the same reason #TheDress was such a hit.
When I include this in my class on speech perception, I also include this video, which shows Grover from Sesame Street saying EITHER "Yes, yes, that sounds like an excellent idea!" OR "Yes, yes, that's a f*%#g excellent idea!"
Like I'm always telling my students: Speech is hard! Context helps!
Best,
Julia