Generalized Metrics for Single-F0 Estimation Evaluation

Abstract

Single-f0 estimation methods, including pitch trackers and melody estimators, have historically been evaluated using a set of common metrics which score estimates frame-wise in terms of pitch and voicing accuracy.“Voicing” refers to whether or not a pitch is active, and has historically been regarded as a binary value. However, this has limitations because it is often ambiguous whether a pitch is present or absent, making a binary choice difficult for humans and algorithms alike. For example, when a source fades out or reverberates, the exact point where the pitch is no longer present is unclear. Many single-f0 estimation algorithms select a threshold for when a pitch is active or not, and different choices of threshold drastically affect the results of standard metrics. In this paper, we present a refinement on the existing single-f0 metrics, by allowing the estimated voicing to be represented as a continuous likelihood, and introducing a weighting on frame level pitch accuracy, which considers the energy of the source producing the f0 relative to the energy of the rest of the signal. We compare these metrics experimentally with the previous metrics using a number of algorithms and datasets and discuss the fundamental differences. We show that, compared to the previous metrics, our proposed metrics allow threshold-independent algorithm comparisons.

Related

September 2022 | Interspeech

Unsupervised Speaker Diarization that is Agnostic to Language Overlap Aware and Free of Tuning

M Iftekhar Tanveer, Diego Casabuena, Jussi Karlgren, Rosie Jones

September 2022 | Interspeech

Exploring audio-based stylistic variation in podcasts

Katariina Martikainen, Jussi Karlgren, Khiet Truong

July 2022 | SIGIR

What Makes a Good Podcast Summary?

Rezvaneh Rezapour, Sravana Reddy, Ann Clifton, Rosie Jones