We are using advanced video tracking technologies to provide a personalised spatial audio experience that is independent of the listener's position.
Panning-based spatial audio reproduction systems, such as the new Dolby Atmos found in cinemas, the home-friendly 5.1 or, in the simplest case, a stereo system, generate robust virtual audio images when the listener is placed at the centre, or on the line of symmetry, of the loudspeaker setup, a position known as the sweet spot. When the listener is outside the sweet spot, the previously perceived virtual audio images collapse onto one of the nearest loudspeakers. One way of tackling this phenomenon is to track the position of the listener with respect to the loudspeakers. This information can then be used to adjust the loudspeaker input signals so that the listener always perceives a stable audio scene.
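As an illustration of how a tracked position could be turned into a loudspeaker adjustment, the sketch below applies simple delay and gain compensation: each loudspeaker is delayed and attenuated so that all signals reach the listener at the same time and level. This is only one common approach and is not necessarily the method used in this project. The function name `compensation`, the free-field point-source model (level falling as 1/distance) and the speed of sound value are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def compensation(speakers, listener):
    """Return one (delay_in_seconds, gain) pair per loudspeaker.

    Hypothetical helper: assumes free-field point sources, so the level
    at the listener falls as 1/distance. Nearer loudspeakers are delayed
    and attenuated to match the farthest one.
    """
    distances = [math.dist(s, listener) for s in speakers]
    farthest = max(distances)
    # Delay nearer loudspeakers so every wavefront arrives together.
    delays = [(farthest - d) / SPEED_OF_SOUND for d in distances]
    # Attenuate nearer loudspeakers so every signal arrives equally loud.
    gains = [d / farthest for d in distances]
    return list(zip(delays, gains))


# Stereo pair 2 m wide, listener 2 m away and 0.5 m right of centre.
stereo = [(-1.0, 0.0), (1.0, 0.0)]
for delay, gain in compensation(stereo, (0.5, -2.0)):
    print(f"delay = {delay * 1000:.2f} ms, gain = {gain:.2f}")
```

With the listener on the line of symmetry, every delay is zero and every gain is one. In the off-centre case above, the nearer (right) loudspeaker is delayed by roughly 1.3 ms and attenuated to about 0.82, which counteracts the collapse of the image towards that loudspeaker.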
Other spatial reproduction systems can also be improved by the use of listener tracking. Examples of this are binaural reproduction and Transaural reproduction. The first ideally allows for almost perfect 3D reproduction if head tracking is used in combination with personalised head-related impulse responses. Transaural reproduction allows for binaural reproduction without headphones. Transaural systems generally have a fixed sweet spot and depend heavily on the listener being placed there to perform well. Tracking techniques can be applied to Transaural reproduction devices, adjusting the loudspeaker signals so that the sweet spot moves with the listener.
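To make the idea of a moving sweet spot concrete, the following sketch computes crosstalk cancellation filters for two loudspeakers at a single frequency and recomputes them when the tracked ear positions change. It is a simplified illustration under stated assumptions, not the filter design used in this project: the acoustic paths are modelled as free-field point sources with no head, the inverse is unregularised, and the names `plant_matrix`, `crosstalk_canceller` and `matmul2` are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def plant_matrix(speakers, ears, frequency):
    """2x2 transfer matrix H[ear][speaker] at one frequency.

    Assumes free-field point sources: a delay and a 1/distance decay.
    """
    k = 2 * math.pi * frequency / SPEED_OF_SOUND  # wavenumber
    return [[cmath.exp(-1j * k * math.dist(e, s)) / math.dist(e, s)
             for s in speakers] for e in ears]


def crosstalk_canceller(h):
    """Exact inverse of the 2x2 plant matrix (no regularisation)."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 matrices, used to check that H @ C = I."""
    return [[sum(x[i][n] * y[n][j] for n in range(2)) for j in range(2)]
            for i in range(2)]


speakers = [(-0.2, 1.0), (0.2, 1.0)]
# Tracked ear positions: first centred, then 0.3 m to the right.
for ears in ([(-0.08, 0.0), (0.08, 0.0)], [(0.22, 0.0), (0.38, 0.0)]):
    h = plant_matrix(speakers, ears, 1000.0)
    filters = crosstalk_canceller(h)  # recomputed for each new position
    print(matmul2(h, filters))        # close to the identity matrix
```

Filters computed for the centred position no longer invert the acoustic paths once the listener moves, which is why they are recomputed from the tracked ear positions. A practical system would do this per frequency bin, include the effect of the head, and regularise the inversion.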
Another aspect of the research tackled in this project is achieving almost 360° reproduction using a small number of distributed sources. This relates to consumer applications, in which spatial audio systems such as those used for gaming or HDTV are based on soundbars or on a small combination of sources (5.1 or 7.1). The systems currently available on the market are far from providing a full 360° image, and even with new surround audio formats such as Dolby Atmos, which account for the elevation of sources, image presentation in some directions is faulty. Again, this can be solved with the aid of listener tracking devices, which adapt each loudspeaker input signal to the listener position.
Future research will work towards providing full 360° images for a single listener. Once this is fully working, it will be adapted for multiple listeners inside a single room.
Marcos Simon, Filippo Fazi