Research

Four directions in audio, AI, and perception.

My work spans machine listening, trustworthy AI, multimodal systems, and interdisciplinary applications. Below is an overview of each research direction.

Where papers are listed in the diagrams below, the publication index can be found in publications.

Audio and Music Intelligence research map
01

Audio and Music Intelligence

This field is the main foundation of my research. The work develops computational methods for understanding, modelling, generating, and analysing sound and music. It covers acoustic event detection, audio tagging, acoustic scene classification, audio captioning, audio-language understanding, symbolic music analysis, music generation, expressive timing, healthcare audio, heart sound analysis, speech enhancement, and singing or opera voice modelling.

The motivation is to enable machines to perceive and reason about sound in ways that are useful for real-world environments, creative music systems, healthcare applications, and human-centred intelligent systems. The result is a broad and coherent research programme that runs from modelling musical expression and symbolic music structure, through building machine listening systems and connecting audio with language and multimodal reasoning, to applying audio intelligence in healthcare and speech-related tasks.

  • Acoustic scene analysis
  • Music information retrieval
  • Bioacoustics
  • Speech enhancement
Robust, Trustworthy and Efficient AI research map
02

Robust, Trustworthy and Efficient AI

This field focuses on making audio and music AI systems more reliable, transferable, efficient, secure, and trustworthy. The work addresses four main challenges: limited data, domain shift, computational efficiency, and adversarial or privacy risks. The motivation is that real-world audio systems rarely operate under clean laboratory conditions; they must adapt across devices, cities, environments, species, speakers, and data distributions, while remaining efficient and secure.

The achievements include compact neural architectures for audio classification, pruning and low-complexity model design, domain adaptation methods for acoustic scene classification and bioacoustic event detection, curriculum learning for data-efficient learning under domain shift, synthetic speech spoofing detection, source speaker tracing, adversarial attack analysis for music models, and membership inference attacks against symbolic and generative music systems. Taken together, this work marks a shift from performance-driven audio AI to robust, secure, efficient, and accountable audio AI.

  • Domain adaptation
  • Few-shot learning
  • Model efficiency
  • Audio security
Multimodal and Cross-Domain Perception research map
03

Multimodal and Cross-Domain Perception

This field extends audio intelligence beyond single-modality sound analysis. It investigates how audio can interact with language, vision, wireless signals, radar signals, skeleton data, and other sensing modalities. The motivation is that real-world perception is inherently multimodal: machines often need to combine sound with visual context, textual descriptions, spatial signals, physiological signals, or body movement patterns.

The work has produced a broader perception framework that connects audio-visual scene understanding, audio captioning, language-based audio retrieval, audio question answering, indoor positioning, wireless radar sensing, and sign-language recognition. It shows that modelling ideas can be transferred across domains and used to reason about heterogeneous data sources. This field therefore expands machine listening into a more general cross-domain perception agenda.

  • Audio-visual learning
  • Multimodal classification
  • Sign language recognition
  • Radar sensing
Peripheral Review and Interdisciplinary Works research map
04

Peripheral Review and Interdisciplinary Works

This field contains work that broadens the research profile beyond the central audio and music AI programme. It includes review and editorial work, education research, optical communication, hardware-oriented engineering, software/business analysis, and other interdisciplinary applications. The motivation is to connect technical expertise in modelling, intelligent systems, and signal analysis with wider academic and practical contexts.

The achievements are diverse. The field contributes to human-centred perspectives on computer audition, reflects on the use of generative AI in education and assessment, applies modelling ideas to optical communication and engineering systems, and includes early work on digital audio-book business models. Although these works are less central to the main audio-AI trajectory, they show methodological flexibility and the ability to transfer analytical and computational thinking across domains.

  • Computer audition review
  • AI education
  • Signal processing
  • FPGA systems