The University of Southampton
Virtual Acoustics and Audio Engineering

Research project: Use of machine learning for audio source separation


Developing machine learning algorithms that separate the audio sources in stereo mixtures so they can be upmixed to newer listening systems such as 3D audio, Ambisonics or binaural.

Audio Source Separation

Music and speech are two of the most widely used forms of human communication. Both can influence people's reactions and moods, and music can be especially powerful. Moreover, the technology available today in the music industry has made it very easy for people to access a large catalogue of songs. They can listen to recorded music on CD and vinyl, on the radio and on newer online streaming services. These technologies usually deliver stereo-encoded recordings, since stereo is the most widely used listening arrangement.

The human listener is familiar with a mixture of sounds arriving at the ears, because in everyday life sounds are seldom heard in isolation. In a typical listening situation several people are talking at the same time, music is a mixture of different instruments and effects, and listeners constantly need to separate the wanted information from the unwanted. The human brain is trained to isolate a single auditory event from a mixture. People can usually concentrate on one person in a group of friends (a process known in the literature as the cocktail-party effect) or listen to one instrument in a song, treating the rest as background noise. A significant amount of research has been undertaken to reproduce this separation process in an automatic system and to separate sources as effectively as possible.

Audio source separation is still a relatively new research area, and its results are being used in several domains, from speech recognition and audio production to spatial audio upmixing and karaoke. Existing methods can be divided into two main groups: supervised and unsupervised separation. Supervised techniques require prior information about the sources present in the mixture, and the algorithm is trained to recognise them.

This project concentrates on supervised source separation. Perhaps the most powerful supervised techniques now available are those based on neural networks. The goal of the PhD project is to use machine learning to separate the sources in stereo-encoded recordings and then upmix the separated sources to more complex listening systems such as 3D audio or binaural systems. A framework that can do this with high quality is essential for several applications, especially in the music and film industries.
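
As a rough illustration of what "supervised" means here, the Python sketch below builds a toy stereo mixture from two known sources and computes an oracle ratio mask from them. Masks of this kind are a common training target: a network is trained to predict the mask from the mixture alone. This is an assumption-laden sketch and not the project's method. The function names, the panning gains, the test tones and the use of a single FFT over the whole signal (instead of a short-time transform) are all illustrative choices.

```python
import numpy as np


def oracle_ratio_mask(source_spec, other_spec, eps=1e-12):
    """Ratio mask computed from the known sources (hypothetical helper).

    In a supervised system a network would be trained to predict this
    mask from the mixture, because the true sources are not available
    at test time.
    """
    mag_s = np.abs(source_spec)
    mag_o = np.abs(other_spec)
    return mag_s / (mag_s + mag_o + eps)


def separate_stereo(mix, mask):
    """Apply one frequency-domain mask to both channels of a stereo mix."""
    mix_spec = np.fft.rfft(mix, axis=-1)           # shape (2, n // 2 + 1)
    return np.fft.irfft(mask * mix_spec, n=mix.shape[-1], axis=-1)


def toy_example(n=256):
    """Two sine tones panned differently into a stereo mixture."""
    t = np.arange(n) / n
    s1 = np.sin(2 * np.pi * 5 * t)                 # source 1: 5 cycles
    s2 = 0.5 * np.sin(2 * np.pi * 12 * t)          # source 2: 12 cycles
    gains1 = np.array([0.9, 0.3])                  # assumed left/right gains
    gains2 = np.array([0.2, 0.8])
    mix = gains1[:, None] * s1 + gains2[:, None] * s2

    mask = oracle_ratio_mask(np.fft.rfft(s1), np.fft.rfft(s2))
    est1 = separate_stereo(mix, mask)              # estimated stereo image
    target1 = gains1[:, None] * s1                 # true stereo image
    return est1, target1


est1, target1 = toy_example()
print("max abs error:", np.max(np.abs(est1 - target1)))
```

Because the two toy tones occupy different frequency bins, the oracle mask recovers the first source's stereo image almost exactly. Real music overlaps heavily in time and frequency, which is why a learned model is needed to estimate the mask. The separated stereo image could then be passed to an upmixing stage.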

Related research groups

Signal Processing, Audio and Hearing Group