A Subsequence Matching with Gaps-Range-Tolerances Framework: A Query-By-Humming Application
We propose a novel subsequence matching framework that allows for gaps in both the query and target sequences, supports variable matching tolerance levels efficiently tuned for each query and target sequence, and constrains the maximum match length. Using this framework, we develop a space- and time-efficient dynamic programming method: given a short query sequence and a large database, our method identifies the subsequence of the database that best matches the query, while bounding the number of consecutive gaps in both sequences. In addition, it allows the user to constrain the minimum number of matching elements between a query and a database sequence. We show that the proposed method is highly applicable to music retrieval. Music pieces are represented as 2-dimensional time series, where the two dimensions hold the pitch and the duration of each note, respectively. At runtime, the query song is transformed to the same 2-dimensional representation. We present an extensive experimental evaluation using synthetic and hummed queries on a large music database. In terms of accuracy, our method outperforms several DP-based subsequence matching methods with the same time complexity, as well as a probabilistic model-based method.
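To make the ingredients named above concrete, the following is a minimal Python sketch of tolerance-based matching with bounded consecutive gaps over (pitch, duration) pairs. It is an illustration under stated assumptions, not the method proposed in this paper: the function names, the parameter names (`pitch_tol`, `dur_tol`, `max_query_gap`, `max_target_gap`), and the use of absolute differences as the matching test are all hypothetical choices. The sketch omits the maximum match length and minimum match count constraints, and its naive inner loops do not reflect the efficiency of the proposed method.

```python
from typing import List, Tuple

Note = Tuple[float, float]  # (pitch, duration); the units are an assumption


def elements_match(a: Note, b: Note, pitch_tol: float, dur_tol: float) -> bool:
    """Two notes match if both dimensions agree within the given tolerances."""
    return abs(a[0] - b[0]) <= pitch_tol and abs(a[1] - b[1]) <= dur_tol


def best_gapped_match(
    query: List[Note],
    target: List[Note],
    pitch_tol: float = 0.0,
    dur_tol: float = 0.0,
    max_query_gap: int = 1,
    max_target_gap: int = 1,
) -> Tuple[int, int, int]:
    """Return (matched_count, target_start, target_end) of the best chain.

    A chain is an order-preserving list of matching (query, target) index
    pairs in which at most max_query_gap query elements and at most
    max_target_gap target elements are skipped between consecutive pairs.
    Returns (0, -1, -1) when no element matches.
    """
    m, n = len(query), len(target)
    # count[i][j]: length of the best chain ending with pair (i, j)
    count = [[0] * n for _ in range(m)]
    # start[i][j]: target index at which that chain begins
    start = [[-1] * n for _ in range(m)]
    best = (0, -1, -1)
    for i in range(m):
        for j in range(n):
            if not elements_match(query[i], target[j], pitch_tol, dur_tol):
                continue
            best_prev, best_start = 0, j  # default: start a new chain here
            # Look back only as far as the gap bounds allow.
            for pi in range(max(0, i - max_query_gap - 1), i):
                for pj in range(max(0, j - max_target_gap - 1), j):
                    if count[pi][pj] > best_prev:
                        best_prev, best_start = count[pi][pj], start[pi][pj]
            count[i][j] = best_prev + 1
            start[i][j] = best_start
            if count[i][j] > best[0]:
                best = (count[i][j], best_start, j)
    return best


# Hypothetical data: pitches written as MIDI-style note numbers.
query = [(60, 1), (62, 1), (64, 2)]
target = [(55, 1), (60, 1), (61, 0.5), (62, 1), (64, 2), (70, 1)]
result = best_gapped_match(query, target)
```

In this toy example, the target contains one extra note between the first and second query notes. With one skipped target element allowed, all three query notes are chained together; with `max_target_gap=0`, the chain breaks at the extra note and only the last two notes are matched.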