Generalized Metrics for Single-F0 Estimation Evaluation

Abstract

Single-f0 estimation methods, including pitch trackers and melody estimators, have historically been evaluated using a set of common metrics that score estimates frame-wise in terms of pitch and voicing accuracy. “Voicing” refers to whether or not a pitch is active, and has historically been treated as a binary value. This is limiting because it is often ambiguous whether a pitch is present or absent, which makes a binary choice difficult for humans and algorithms alike. For example, when a source fades out or reverberates, the exact point at which the pitch is no longer present is unclear. Many single-f0 estimation algorithms select a threshold to decide whether a pitch is active, and different choices of threshold drastically affect the results of standard metrics. In this paper, we present a refinement of the existing single-f0 metrics that allows the estimated voicing to be represented as a continuous likelihood and introduces a weighting on frame-level pitch accuracy that considers the energy of the source producing the f0 relative to the energy of the rest of the signal. We compare these metrics experimentally with the previous metrics using a number of algorithms and datasets, and discuss the fundamental differences. We show that, compared to the previous metrics, our proposed metrics allow threshold-independent algorithm comparisons.
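To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's actual metric definitions. It assumes that a "soft" voicing recall can be computed as the mean estimated voicing likelihood over reference-voiced frames, and that the pitch weighting is a per-frame weight in [0, 1] standing in for relative source energy. The function names, the 50-cent tolerance, and the toy data are all illustrative assumptions.

```python
import math


def binary_voicing_recall(ref_voiced, est_confidence, threshold):
    """Share of reference-voiced frames whose confidence reaches the threshold."""
    voiced = [c for r, c in zip(ref_voiced, est_confidence) if r]
    if not voiced:
        return 0.0
    return sum(c >= threshold for c in voiced) / len(voiced)


def soft_voicing_recall(ref_voiced, est_confidence):
    """Assumed continuous variant: mean confidence over reference-voiced frames."""
    voiced = [c for r, c in zip(ref_voiced, est_confidence) if r]
    if not voiced:
        return 0.0
    return sum(voiced) / len(voiced)


def weighted_pitch_accuracy(ref_hz, est_hz, weights, tolerance_cents=50.0):
    """Pitch accuracy where each frame counts in proportion to its weight.

    The weights are a hypothetical stand-in for the energy of the source
    relative to the rest of the signal.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    score = 0.0
    for r, e, w in zip(ref_hz, est_hz, weights):
        # A frame is correct if the estimate lies within the cent tolerance.
        if r > 0 and e > 0 and abs(1200.0 * math.log2(e / r)) <= tolerance_cents:
            score += w
    return score / total


# Toy data: a note that fades out, so confidence and energy both decay.
ref_voiced = [True, True, True, True, False]
est_confidence = [0.9, 0.6, 0.4, 0.1, 0.2]

ref_hz = [220.0, 220.0, 220.0, 220.0]
est_hz = [220.0, 221.0, 233.0, 440.0]
energy_weights = [1.0, 0.8, 0.3, 0.1]

recall_at_05 = binary_voicing_recall(ref_voiced, est_confidence, 0.5)
recall_at_03 = binary_voicing_recall(ref_voiced, est_confidence, 0.3)
soft_recall = soft_voicing_recall(ref_voiced, est_confidence)
uniform_accuracy = weighted_pitch_accuracy(ref_hz, est_hz, [1.0] * 4)
weighted_accuracy = weighted_pitch_accuracy(ref_hz, est_hz, energy_weights)
```

In this toy case the binary recall changes when the threshold moves from 0.5 to 0.3, while the soft recall needs no threshold at all. Likewise, the weighted accuracy penalizes errors in the quiet tail of the note less than the uniform version does.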
