Audio Machine Learning for Logitech Video Conferencing Seminar

Time: 16:00 - 17:00
Date: 26 May 2023
Venue: Building 13, room 3019

For more information regarding this seminar, please email Vanui Mardanyan at isvr@southampton.ac.uk.

Event details

ISVR Research seminar by Dr Andy Harper, Logitech

Recording Engineers will select the most appropriate microphone and mic placement in the ideal room, minimising the processing required by the Mix Engineer. In many video conferencing applications, quite the opposite is true: MEMS mics are placed in the far field, next to a loudspeaker, in a reverberant glass room with loud air conditioning and keyboards tapping. Processing must be carried out autonomously by intelligent algorithms, in real time with minimal latency. These algorithms act to cancel the echo, suppress any artefacts from nonlinear distortions, beam to the active participant(s), and enhance the speech, removing dynamic noises and reverberation. In this talk, an overview of the challenges for video conferencing is given, from acoustics, psychoacoustics, and machine learning perspectives. The path from R&D through to production is also discussed, along with a look at upcoming technology requirements as we leverage multi-device solutions and devise multi-modal solutions.

Speaker information

I have been leading the Audio Machine Learning team at Logitech for over 3 years, and have overseen the development and edge deployment of real-time audio ML algorithms into Logitech’s video conferencing devices. This began with our flagship and biggest-selling product, Rally Bar, and continued through devices like Micpod and Rally Bar mini to upcoming products like Logitech Sight, our AI-powered tabletop camera with intelligent multi-participant framing. After graduating from ISVR in 2011 with an MSc in Sound and Vibration Studies, I went on to work for Celestion Loudspeakers for 6 years. As a Project Engineer and then Research Engineer, I optimised arrays, horns, and motor designs, and oversaw the measurement facilities in the UK and China. I then led the development and introduction to market of Celestion Impulse Responses (their first digital product, and now the industry standard in guitar cabinet emulation), as well as their first application of machine learning - for quality control of loudspeakers. I later moved to Midas to work as part of the Research Team, using AI for instrument classification and autonomous mixing. From there I joined Logitech, and discovered the many technically fascinating challenges of picking up and enhancing far-field speech for video collaboration, as well as transitioning a DSP audio pipeline to state-of-the-art machine learning algorithms.