Generalized Metrics for Single-F0 Estimation Evaluation

Abstract

Single-f0 estimation methods, including pitch trackers and melody estimators, have historically been evaluated using a set of common metrics which score estimates frame-wise in terms of pitch and voicing accuracy.“Voicing” refers to whether or not a pitch is active, and has historically been regarded as a binary value. However, this has limitations because it is often ambiguous whether a pitch is present or absent, making a binary choice difficult for humans and algorithms alike. For example, when a source fades out or reverberates, the exact point where the pitch is no longer present is unclear. Many single-f0 estimation algorithms select a threshold for when a pitch is active or not, and different choices of threshold drastically affect the results of standard metrics. In this paper, we present a refinement on the existing single-f0 metrics, by allowing the estimated voicing to be represented as a continuous likelihood, and introducing a weighting on frame level pitch accuracy, which considers the energy of the source producing the f0 relative to the energy of the rest of the signal. We compare these metrics experimentally with the previous metrics using a number of algorithms and datasets and discuss the fundamental differences. We show that, compared to the previous metrics, our proposed metrics allow threshold-independent algorithm comparisons.

Related

August 2023 | Interspeech

Lightweight and Efficient Spoken Language Identification of Long-form Audio

Winstead Zhu, Md Iftekhar Tanveer, Yang Janet Liu, Seye Ojumu, Rosie Jones

June 2023 | ICASSP

Contrastive Learning-based Audio to Lyrics Alignment for Multiple Languages

Simon Durand, Daniel Stoller, Sebastian Ewert

September 2022 | Interspeech

Unsupervised Speaker Diarization that is Agnostic to Language Overlap Aware and Free of Tuning

M Iftekhar Tanveer, Diego Casabuena, Jussi Karlgren, Rosie Jones