Exploring optimal ways to represent complex sound fields (e.g. audience noise, rain) for future spatial audio formats for broadcast.
Current audio formats (such as stereo or 5.1) mix the audio for a fixed number of loudspeakers in known positions. If the loudspeaker layout of the reproduction system is correct, the loudspeakers produce signals at the listener's ears that correspond to the desired spatial impression (e.g. the positions of the sources). However, if the layout is not ideal, the spatial impression is distorted.
There is now a move towards object-based audio formats that do not transmit mixed signals. Instead, the audio “objects” are sent individually with metadata that describes their position and other attributes such as their size. It is then the job of the renderer to distribute the audio objects as well as possible for the given reproduction system. This has many advantages. The audio format is now independent of the replay system: consumers using a mobile phone, a high-quality hi-fi system, a binaural system with head tracking, a wave field synthesis system, or a huge home theatre system can all receive the same audio data, and the renderer will optimise it for each device. The audio is also more interactive. For example, the listener can easily choose a different audio track or change the levels of different sources, either for personal preference or to aid hearing-impaired listeners.
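The object-based idea can be illustrated with a minimal sketch. Everything named here (the `AudioObject` class, the `render_stereo` and `render_mono` functions, the azimuth convention and the constant-power pan law) is an assumption chosen for illustration and is not taken from any particular format or renderer. The point is only that the same objects and metadata are handed to different renderers, each of which decides how to distribute them.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioObject:
    """Hypothetical audio object: mono samples plus positional metadata."""
    samples: List[float]
    azimuth_deg: float  # assumed convention: 0 = front, positive = left
    gain: float = 1.0


def stereo_gains(azimuth_deg: float, half_width_deg: float = 30.0) -> Tuple[float, float]:
    """Constant-power pan between loudspeakers at +/- half_width_deg."""
    # Clamp the object position to the span covered by the two loudspeakers.
    az = max(-half_width_deg, min(half_width_deg, azimuth_deg))
    # Map azimuth to [0, pi/2]: 0 is fully right, pi/2 is fully left.
    theta = (az + half_width_deg) / (2.0 * half_width_deg) * (math.pi / 2.0)
    return math.sin(theta), math.cos(theta)  # (left gain, right gain)


def render_stereo(objects: List[AudioObject], n: int) -> Tuple[List[float], List[float]]:
    """Renderer for a two-loudspeaker layout: uses the position metadata."""
    left, right = [0.0] * n, [0.0] * n
    for obj in objects:
        g_left, g_right = stereo_gains(obj.azimuth_deg)
        for i, s in enumerate(obj.samples[:n]):
            left[i] += obj.gain * g_left * s
            right[i] += obj.gain * g_right * s
    return left, right


def render_mono(objects: List[AudioObject], n: int) -> List[float]:
    """Renderer for a single loudspeaker: position metadata is ignored."""
    out = [0.0] * n
    for obj in objects:
        for i, s in enumerate(obj.samples[:n]):
            out[i] += obj.gain * s
    return out
```

In this sketch the transmitted data (the list of `AudioObject` instances) never changes; only the renderer differs between playback devices.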
Difficulties arise when the source is not well represented by a single point source. Sounds that come from everywhere simultaneously (e.g. reverb, rain, audience noise) cannot be described adequately by one position or direction. These “diffuse” sound fields are difficult to encode and are the focus of this project.
The goal of the project is to develop innovative ways to represent these diffuse sound fields so that listeners can be truly surrounded by the sound in the same way we are in the real world. The results of listening tests can be used to exploit the redundancy in the human perception of these sound fields. These techniques can be used to improve the feeling of being surrounded whilst reducing the amount of data required. The aim is maximum effect for minimum data.
This work will hopefully lead to perceptually motivated metrics for assessing how well different loudspeaker layouts and processing algorithms reproduce diffuse sound fields, without the need for time-consuming subjective testing. These metrics can be used to design decorrelation algorithms and loudspeaker layouts that sound good while remaining efficient to implement.
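As a rough illustration of what an objective measure might look like, the sketch below computes the zero-lag correlation coefficient between two loudspeaker signals. This is a generic signal measure, not the perceptually motivated metric the project aims to develop, and the function name `correlation_coefficient` is hypothetical. Values near 1 indicate nearly identical signals, while values near 0 indicate the kind of decorrelated signals that a decorrelation algorithm tries to produce.

```python
import math
import random


def correlation_coefficient(x, y):
    """Zero-lag normalised correlation between two equal-rate signals."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance term and the two energy terms, all with means removed.
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    energy_x = sum((a - mean_x) ** 2 for a in x)
    energy_y = sum((b - mean_y) ** 2 for b in y)
    den = math.sqrt(energy_x * energy_y)
    # A silent or constant signal has no defined correlation; return 0.
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    noise_a = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    noise_b = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    print("same signal:       ", correlation_coefficient(noise_a, noise_a))
    print("independent noises:", correlation_coefficient(noise_a, noise_b))
```

A single number like this ignores frequency dependence and the listener's head, so it should be read only as a starting point for the kind of metric described above.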
These algorithms can be used in spatial audio formats that need high quality at a small bandwidth. Broadcast and streaming are the obvious applications.
Michael Cousin, Filippo Fazi