Computationally efficient methods for polyphonic music transcription
PhD thesis, University of Alicante.
Automatic music transcription is a music information retrieval (MIR) task that draws on many disciplines, such as audio signal processing, machine learning, computer science, psychoacoustics and music perception, music theory, and music cognition. Its goal is to extract a human-readable and interpretable representation, like a musical score, from an audio signal. To achieve this goal, it is necessary to estimate the pitches, onset times and durations of the notes, as well as the tempo, the meter and the tonality of a musical piece.

The most obvious application of automatic music transcription is to help a musician write down the music notation of a performance from an audio recording, which is a time-consuming task when done by hand. Besides this application, automatic music transcription can also be useful for other MIR tasks, like plagiarism detection, artist identification, genre classification, and composition assistance by changing the instrumentation, the arrangement or the loudness before resynthesizing new pieces. In general, music transcription methods can also provide information about the notes to symbolic music algorithms.

This work addresses the automatic music transcription problem using different strategies. Novel efficient methods are proposed for onset detection (detection of the beginnings of musical events) and multiple fundamental frequency estimation (estimation of the pitches in a polyphonic mixture), using supervised learning and signal processing techniques. The main contributions of this work can be summarized in the following points:

- An analytical and extensive review of the state-of-the-art methods for onset detection and multiple fundamental frequency estimation.
- The development of an efficient approach for onset detection and the construction of a public ground-truth data set for this task.
- Two novel approaches for multiple pitch estimation of a priori known sounds using supervised learning methods. These algorithms were among the first machine learning methods proposed for this task.
- A simple iterative cancellation approach, mainly intended to transcribe piano music at a low computational cost.
- Heuristic multiple fundamental frequency estimation algorithms based on signal processing to analyze real music without any a priori knowledge. These methods, which are probably the main contribution of this work, experimentally reached the state of the art for this task with a very low computational burden.