In contrast to human face recognition, human voice recognition appears to be a difficult task. Whilst we can perform at above-chance levels in simple matching tasks, performance is easily disrupted by distractor voices presented between study and test, or by the introduction of noise or disguise. That said, one reliable factor that assists performance is distinctiveness: the more distinctive a voice (or a face), the better the performance, and the more confident the listener feels.
Existing theoretical accounts of voice processing suggest two reasons for our relatively poor performance with voices. First, our perceptual ability to tell voices apart may not be as sophisticated as our ability to tell faces apart. Second, our priority when listening to voices may lie not in identifying the speaker but in identifying the speech. As such, the processing of speech and of vocal affect may take precedence over the processing of identity.
These factors may mean that the study of the human listener can reveal only subtle ways in which to optimise speaker recognition algorithms. Moreover, the combination of face and voice as multiple biometrics may place additional, competing processing demands on the human perceiver that are absent from machine processing scenarios. The Hummingbird Project will examine the extent to which speaker recognition can be optimised, and will explore the benefits and the costs associated with information fusion across multiple modalities.