As with our work on facial systems, the starting point for our automated voice recognition work will be a set of widely used biometric verification algorithms. Our methods will focus on i) frequency/power methods (such as Mel-frequency cepstral coefficients), which are difficult to define in humanly understandable terms, and ii) ‘prosodic’ features, which include pitch, duration of components, loudness and timbre, and which have a more physically based interpretation. One interesting line of exploration will be how many physically based characteristics are captured within the first set of frequency/power methods, since this will help to make those methods more explainable in terms of both processing and outcome.
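To illustrate what a physically based prosodic feature looks like in practice, the following minimal sketch estimates two of them, loudness (as root-mean-square amplitude) and pitch (by a simple autocorrelation search), from a single frame of audio samples. It is an illustration only and not the project's method: the function names, the 75 to 400 Hz search range and the synthetic 200 Hz test tone are all assumptions made for this example.

```python
import math

def rms_loudness(frame):
    """Root-mean-square amplitude of a frame, a crude proxy for loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate pitch (Hz) as the lag with the largest autocorrelation.

    fmin and fmax are assumed bounds for a speaking voice; they are
    illustrative values, not parameters taken from the project.
    Returns None if no lag gives a positive correlation (e.g. silence).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Synthetic 50 ms frame: a pure 200 Hz tone sampled at 8 kHz (hypothetical data).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]

pitch_hz = autocorr_pitch(tone, SAMPLE_RATE)
loudness = rms_loudness(tone)
```

Both quantities have a direct physical reading (cycles per second, signal amplitude), which is the contrast with cepstral coefficients drawn above. Real speech would need windowing, voicing detection and a more robust pitch tracker than this sketch provides.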
Within the Hummingbird Project, we will analyse the performance of systems under ‘ideal’ conditions and also under a series of degraded conditions, including white noise, background chatter and disguise. Our work will assess the degree to which automated voice recognition can be improved by emphasising the specific components identified as important to human listeners.
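As one illustration of how a degraded condition could be produced in a controlled way, the sketch below mixes Gaussian white noise into a clean signal at a chosen signal-to-noise ratio (SNR). It is a hedged example only: the function names, the fixed random seed and the 10 dB setting are assumptions for this illustration, and the project's actual degradation protocol is not specified here.

```python
import math
import random

def mean_power(samples):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def add_white_noise(signal, snr_db, seed=0):
    """Return the signal with Gaussian white noise added at the given SNR (dB).

    The seed is fixed so that the same degraded stimulus can be regenerated;
    this is an assumption made for the example.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scale the noise so that signal_power / noise_power = 10 ** (snr_db / 10).
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [x + scale * n for x, n in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB, recovered from the clean signal and its noisy version."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))

# Hypothetical clean signal: a 200 Hz tone sampled at 8 kHz.
clean = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
noisy = add_white_noise(clean, snr_db=10.0)
achieved = measured_snr_db(clean, noisy)
```

Sweeping `snr_db` over a range of values would give a graded series of conditions against which both automated systems and human listeners could be compared. Background chatter and disguise would need recorded material rather than synthetic noise.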