Generalized Metrics for Single-F0 Estimation Evaluation

Abstract

Single-f0 estimation methods, including pitch trackers and melody estimators, have historically been evaluated using a set of common metrics which score estimates frame-wise in terms of pitch and voicing accuracy.“Voicing” refers to whether or not a pitch is active, and has historically been regarded as a binary value. However, this has limitations because it is often ambiguous whether a pitch is present or absent, making a binary choice difficult for humans and algorithms alike. For example, when a source fades out or reverberates, the exact point where the pitch is no longer present is unclear. Many single-f0 estimation algorithms select a threshold for when a pitch is active or not, and different choices of threshold drastically affect the results of standard metrics. In this paper, we present a refinement on the existing single-f0 metrics, by allowing the estimated voicing to be represented as a continuous likelihood, and introducing a weighting on frame level pitch accuracy, which considers the energy of the source producing the f0 relative to the energy of the rest of the signal. We compare these metrics experimentally with the previous metrics using a number of algorithms and datasets and discuss the fundamental differences. We show that, compared to the previous metrics, our proposed metrics allow threshold-independent algorithm comparisons.

View Publication

April 2024 | The Web Conference

Generalized Metrics for Single-F0 Estimation Evaluation

Abstract

Related

Long-term off-policy evaluation and learning

Lightweight and Efficient Spoken Language Identification of Long-form Audio

Contrastive Learning-based Audio to Lyrics Alignment for Multiple Languages