Nils Poschadel, Stephan Preihs, Jürgen Peissig (2025): DOAVINCI: Direction of Arrival based Videoconferencing Incorporating Neural Networks for Increased Conversational Intelligibility, submitted to Fortschritte der Akustik - DAGA 2025, 51. Jahrestagung für Akustik, Kopenhagen.
This page provides additional material on DOAVINCI, a direction-of-arrival-based videoconferencing tool that incorporates neural networks to enhance conversational intelligibility. It uses a spherical microphone array and a 360° camera to improve both audio and visual focus on active speakers. DOAVINCI employs deep-learning-based direction of arrival (DOA) estimation in the spherical harmonics domain, complemented by voice activity detection. The estimated DOA steers a beamforming algorithm toward the active speaker, aiming to improve speech intelligibility by attenuating background noise. The same DOA information drives a zoomed, perspective-corrected view of the active speaker within the 360° video stream, aligning visual attention with auditory focus. The tool’s effectiveness in enhancing speech intelligibility is evaluated using the Short-Time Objective Intelligibility (STOI) metric across several realistic scenarios, including varying SNR conditions.
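To make the step from an estimated DOA to a steered beam more concrete, the following is a minimal sketch and not the beamformer used in DOAVINCI. It assumes a first-order spherical harmonics signal with channels ordered [W, X, Y, Z] and SN3D-like gains, and it substitutes a simple virtual cardioid for whatever beamforming algorithm the authors use. The function names `unit_vector`, `encode_plane_wave`, and `steer_cardioid` are hypothetical and chosen for illustration only.

```python
import numpy as np


def unit_vector(azimuth, elevation):
    """Cartesian unit vector for a direction given in radians."""
    return np.array([
        np.cos(azimuth) * np.cos(elevation),
        np.sin(azimuth) * np.cos(elevation),
        np.sin(elevation),
    ])


def encode_plane_wave(signal, azimuth, elevation):
    """Encode a mono signal as first-order channels [W, X, Y, Z].

    Assumption: W has unit gain and X, Y, Z carry the direction cosines
    (an SN3D-like convention picked for this sketch).
    """
    gains = np.concatenate(([1.0], unit_vector(azimuth, elevation)))
    return gains[:, None] * np.asarray(signal, dtype=float)[None, :]


def steer_cardioid(foa, azimuth, elevation):
    """Virtual cardioid pointing at (azimuth, elevation).

    Gain is 1 toward the steering direction and 0 directly opposite.
    """
    w, xyz = foa[0], foa[1:4]
    return 0.5 * (w + unit_vector(azimuth, elevation) @ xyz)


# Toy scene: one talker at the estimated DOA, one interferer directly opposite.
fs = 48000
speech = np.sin(2 * np.pi * 440 * np.arange(480) / fs)
doa_az, doa_el = np.deg2rad(60.0), np.deg2rad(10.0)

talker = encode_plane_wave(speech, doa_az, doa_el)
interferer = encode_plane_wave(speech, doa_az + np.pi, -doa_el)

front = steer_cardioid(talker, doa_az, doa_el)      # talker passes unchanged
rear = steer_cardioid(interferer, doa_az, doa_el)   # interferer is cancelled
```

A first-order cardioid has a very broad main lobe, so it only illustrates the principle. A higher-order spherical array would allow a narrower beam and therefore stronger attenuation of noise arriving from directions closer to the speaker.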