Naturally, music is made for humans, but does this mean that we need
physiological/perceptual/cognitive models to analyze the content of
music?
An excellent question. In my opinion, yes. Since the phenomena of
greatest interest are inherently perceptual, and since human hearing
usually works much better than any set of algorithms that we've been
able to come up with so far, it seems likely that it will pay to
continue to develop approaches that attempt to mimic human
perception, partly by modeling human physiology and cognition.
If you have a task that is not primarily perceptual, say, recovering
the notes played on a set of instruments, then it might work better
to use physical models of the instruments and mathematical
optimization techniques. It would work even better to "instrument"
the instruments with sensors that would give you more information
than you could get from a sound waveform.
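To give a feel for the optimization route, here is a minimal sketch of one common formulation: fit an observed magnitude spectrum as a non-negative mix of per-note template spectra. The sample rate, note range, and idealized harmonic templates below are arbitrary stand-ins for a real instrument model, purely for illustration.

```python
import numpy as np
from scipy.optimize import nnls

SR = 16000        # sample rate (Hz), chosen arbitrarily for the example
N_FFT = 4096      # analysis FFT length

def harmonic_template(f0, n_harmonics=8):
    """Idealized magnitude spectrum of a note with fundamental f0."""
    spec = np.zeros(N_FFT // 2 + 1)
    for h in range(1, n_harmonics + 1):
        k = int(round(h * f0 * N_FFT / SR))
        if k < len(spec):
            spec[k] = 1.0 / h          # crude 1/h amplitude rolloff
    return spec

# Dictionary: one column per candidate note (A3 up to A4, semitone steps)
note_f0s = 220.0 * 2.0 ** (np.arange(13) / 12.0)
D = np.stack([harmonic_template(f) for f in note_f0s], axis=1)

def note_activations(frame):
    """Fit one windowed frame's spectrum as a non-negative mix of note templates."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), N_FFT))
    activations, _ = nnls(D, spectrum)
    return activations

# Toy input: A3 and E4 sounding together
t = np.arange(N_FFT) / SR
frame = np.sin(2 * np.pi * note_f0s[0] * t) + np.sin(2 * np.pi * note_f0s[7] * t)
act = note_activations(frame)
print(np.round(act / act.max(), 2))    # strongest activations at indices 0 and 7
```

With richer instrument models (and more frames), the same fitting idea scales up; but notice that nothing in it refers to how the result would sound to a listener.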
But if the task is to follow and understand a melody or a rhythm or
other aspects of musical/perceptual language, then relying too much
on things like "physical" notions of pitch and tempo can do more harm
than good.
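A classic case where the physical view goes astray is the missing fundamental: a tone built only from upper harmonics is heard at the absent fundamental, even though the largest spectral peak sits much higher. The little sketch below makes the contrast concrete; the 200 Hz figure and the simple autocorrelation trick are just illustrative choices.

```python
import numpy as np

SR = 16000
DUR = 0.2                            # seconds of signal
t = np.arange(int(SR * DUR)) / SR
F0 = 200.0                           # the pitch listeners would report (Hz)

# Harmonics 3, 4 and 5 only -- no energy at 200 Hz itself
x = sum(np.sin(2 * np.pi * h * F0 * t) for h in (3, 4, 5))

# "Physical" estimate: frequency of the largest spectral peak
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1.0 / SR)
print("spectral-peak estimate:", freqs[np.argmax(spectrum)], "Hz")   # 600 Hz

# Crude periodicity estimate (autocorrelation), much closer to what we hear
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
min_lag = SR // 1000                 # ignore lags shorter than 1 ms
lag = np.argmax(ac[min_lag:]) + min_lag
print("periodicity estimate:", SR / lag, "Hz")                        # 200 Hz
```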
In summary, I think that "hearing" problems should be addressed with
auditory techniques, and other problems in sound and music should be
addressed with whatever techniques suit them best.