Classification Of Spontaneous And Scripted Speech For Multilingual Audio
Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz, Emmanouil Benetos
The decomposition of a music audio signal into its vocal and backing track components is analogous to image-toimage translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the U-Net architecture — initially developed for medical imaging — for the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction. Through both quantitative evaluation and subjective assessment, experiments demonstrate that the proposed algorithm achieves state-of-the-art performance.
Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz, Emmanouil Benetos
Wojciech Reise, Ximena Fernández, Maria Dominguez, Heather A. Harrington, Mariano Beguerisse-Díaz
A. Ghazimatin, E. Garmash, G. Penha, K. Sheets, M. Achenbach, O. Semerci, R. Galvez, M. Tannenberg, S. Mantravadi, D. Narayanan, O. Kalaydzhyan, D. Cole, B. Carterette, A. Clifton, P. N. Bennett, C. Hauff, M. Lalmas-Roelleke