When doing localization tests, best practice is to use visually opaque, acoustically transparent curtains. However, it's also best practice to provide respondents with visual references which they can use to respond.
Depending on the perceptual task, providing a reference stimulus with known location (visual & acoustic) can be extremely useful.
In audio engineering, things get more interesting when visual and auditory cues are in different spatial locations.
For example, in film sound mixing, dialogue is almost always mixed to the centre channel only, even when actors are visible at the left and right of the projected image. There are technical limitations that prevent using phantom sources to match the sound to the viewed location of the actors.
The visual-auditory mismatch is generally not annoying or troublesome, and we perceive the dialogue as emanating from the visual location on the screen, not the physical location of the loudspeaker.
In large theatres the physical mismatch between the stimuli can be quite large, routinely 30 feet.
We hear the dialogue at the image rather than at the loudspeaker because, in multimodal perception, vision generally dominates (think of the McGurk effect).
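To put the 30-foot figure in perceptual terms, it can help to convert it to an angle at the listener's seat. The sketch below is only an illustration: it assumes the listener sits on the centre line of the screen, directly in front of the centre loudspeaker, and the function name and the seat distances are hypothetical, not taken from any standard.

```python
import math


def angular_mismatch_deg(lateral_offset: float, listener_distance: float) -> float:
    """Angle, in degrees, between the centre loudspeaker and an on-screen
    actor displaced sideways by lateral_offset.

    Assumes (hypothetically) that the listener is on the centre axis,
    listener_distance away from the screen. Both arguments must use the
    same unit (e.g. feet).
    """
    if listener_distance <= 0:
        raise ValueError("listener_distance must be positive")
    return math.degrees(math.atan2(lateral_offset, listener_distance))


if __name__ == "__main__":
    # 30 ft visual-auditory offset seen from a few illustrative seat distances.
    for distance_ft in (30.0, 60.0, 90.0):
        angle = angular_mismatch_deg(30.0, distance_ft)
        print(f"{distance_ft:.0f} ft from screen: {angle:.1f} degrees")
```

Under these assumptions, a seat 30 feet from the screen sees the full 30-foot offset as a 45-degree separation, and the angle shrinks as the listener moves farther back.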