Towards Robust Unsupervised Disentanglement of Sequential Data – A Case Study Using Music Audio

Abstract

Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models that describes an observed sequence with dynamic latent variables and a static latent variable. The former encode information at a frame rate identical to the observation, while the latter globally governs the entire sequence. This introduces an inductive bias and facilitates unsupervised disentanglement of the underlying local and global factors. In this paper, we show that the vanilla DSAE suffers from being sensitive to the choice of model architecture and capacity of the dynamic latent variables, and is prone to collapse the static latent variable. As a countermeasure, we propose TS-DSAE, a two-stage training framework that frst learns sequence-level prior distributions, which are subsequently employed to regularise the model and facilitate auxiliary objectives to promote disentanglement. The proposed framework is fully unsupervised and robust against the global factor collapse problem across a wide range of model confgurations. It also avoids typical solutions such as adversarial training which usually involves laborious parameter tuning, and domainspecifc data augmentation. We conduct quantitative and qualitative evaluations to demonstrate its robustness in terms of disentanglement on both artifcial and real-world music audio datasets.

Related

November 2023 | ACM TORS

Unbiased Identification of Broadly Appealing Content Using a Pure Exploration Infinitely-Armed Bandit Strategy

Maryam Aziz, Jesse Anderton, Kevin Jamieson, Alice Wang, Hugues Bouchard, Javed Aslam

October 2023 | CIKM

Graph Learning for Exploratory Query Suggestions in an Instant Search System

Enrico Palumbo, Andreas Damianou, Alice Wang, Alva Liu, Ghazal Fazelnia, Francesco Fabbri, Rui Ferreira, Fabrizio Silvestri, Hugues Bouchard, Claudia Hauff, Mounia Lalmas, Ben Carterette, Praveen Chandar, David Nyhan

September 2023 | CLEF

Cem Mil Podcasts: A Spoken Portuguese Document Corpus For Multi-modal, Multi-lingual and Multi-Dialect Information Access Research

Ekaterina Garmash, Edgar Tanaka, Ann Clifton, Joana Correia, Sharmistha Jat, Winstead Zhu, Rosie Jones, Jussi Karlgren