Voice Recognition

August 7th, 2010

No, this is not about computer technology, so if your search engine got you here, you might as well go away. This is about a peculiarity of mine. I am really terrible about remembering people: I can never remember their names, and I’m not so good at remembering faces, either. But once I learn a voice, I never forget it. This commonly shows up when my wife and I are watching some television show or movie, particularly a science fiction show. We’ll see this character all covered up in latex and suddenly I’ll blurt out, “That’s the guy who played X in movie Y!” (I certainly don’t remember actors’ names.) My wife will stare and shake her head; she can’t see the resemblance. I listen some more and confirm my impression. I’m quite good at this: I have even identified young actors in bit parts in old movies who later became stars. Their appearances may have changed, but their voices can’t get past my voice recognition system.

I find this quite remarkable; after all, to pull this off requires the memorization of the voices of hundreds of people. How and why does that happen? Why does my mind zero in on voices when it ignores names and faces? I have only one guess: I’m hard of hearing.

“Huh?” you say. “How could being hard of hearing make you better at voice recognition?” Here’s the background: I have a terrible time understanding conversation in noisy environments. I hate parties and such gatherings because all the background noise makes it impossible for me to understand anything anybody says. I have developed a large collection of appropriate nothingburger statements to offer when somebody mumbles at me in such circumstances. “Indeed” works well in almost every situation. “I don’t know” is also pretty good. I have tried mumbling back sometimes, but that just yields a demand that I repeat myself, which only makes matters worse. Usually I can make out one or two words, and so I can often improvise something around that:

“Shar, maror oh raow fuffer wuth dinner shum marfer?” gets “I am certainly hungry.”
“fraffun varbul mummucks Theresa shinabar aken” gets “I haven’t had much chance to talk to Theresa.”

I don’t think that the problem arises from weak reception of sound. I never attended rock concerts when I was young, and on those rare occasions when I am subjected to live music, I always find it too loud for my taste. I think it’s just a neural deficiency: I can’t distinguish words. I greatly enjoy music, and I always have music playing on my computer while working, so the defect does not seem to apply to music comprehension. I think that the defect is specific to speech recognition.

This leads me to suspect that I have attempted to compensate by enhancing my sensitivity to other components of human speech. I have always had a good ear for language and can detect a slight accent quickly (although my vocal control is insufficient to permit precise reproduction of the sounds peculiar to a foreign language). And of course, I am very good at recognizing intrinsic vocal components – that’s what makes it so easy for me to identify people from their voices. However, all that compensatory effort still can’t correct the problem: I simply can’t figure out what you’re saying in a noisy environment. Perhaps I should make a little button to wear on my shirt when in such environments: “Hearing defect: speak up!”