Module overview
Multimodal AI is the study and design of artificial intelligence systems that integrate and learn jointly from multiple heterogeneous data sources, ranging from text, images, and audio to biomedical signals and sensor streams, among others. This module introduces the foundational principles of multimodal representation, alignment, and data fusion, and then examines how these techniques are implemented across application domains such as healthcare and audiovisual media. Students will learn how heterogeneous modalities are combined within a single system and how such systems are evaluated for robustness, bias, and reliability. The module also considers the social and regulatory implications of deploying multimodal AI technologies, including the principles of responsible and trustworthy AI.