PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Multiple choice tries
Luc Devroye, Gábor Lugosi, Gahyun Park and Wojciech Szpankowski
Random Structures and Algorithms, 2007.

Abstract

In this paper we consider tries built from n strings, where each string can be chosen from a pool of k strings, each generated by a discrete i.i.d. source. Three cases are considered: k = 2, k large but fixed, and k ∼ c log n. The goal in each case is to obtain tries that are as balanced as possible. Various parameters, such as the height and the fill-up level, are analyzed. It is shown that for two-choice tries a 50% reduction in height is achieved compared with ordinary tries. In a greedy on-line construction, in which the string of each pair that minimizes the insertion depth is inserted, the height is reduced by only 25%. To reduce the height by a further 25%, we design a more refined on-line algorithm whose total computation time is O(n log n). Furthermore, when we choose the best among k ≥ 2 strings, then for large but fixed k the height is asymptotically equal to the typical depth in a trie. Finally, we show that further improvement can be achieved if the number of choices for each string is proportional to log n. In this case highly balanced trees can be constructed by a simple greedy algorithm for which, with high probability, the difference between the height and the fill-up level is bounded by a constant. This, in turn, has implications for distributed hash tables, leading to a randomized ID management algorithm for peer-to-peer networks such that, with high probability, the ratio between the maximum and the minimum load of a processor is O(1).
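The greedy k-choice rule described in the abstract can be illustrated with a small simulation. This is a minimal Python sketch, not the authors' code and not their refined two-choice algorithm. It assumes binary strings from a symmetric Bernoulli(1/2) source truncated to 64 bits, an uncompressed trie stored implicitly as prefix counts, and it takes the fill-up level to be the largest level ℓ whose 2^ℓ nodes are all internal (shared by at least two strings). The names BitTrie, random_bits and build_greedy are hypothetical. For each new string it draws k candidates and inserts the one whose leaf would be shallowest; k = 1 gives an ordinary trie.

```python
import random
from collections import Counter


class BitTrie:
    """Uncompressed binary trie stored implicitly as prefix counts (illustrative).

    prefix_count[p] = number of stored strings starting with p. A node p is
    internal when two or more stored strings share it; each string's leaf sits
    one level below its longest prefix shared with another string.
    """

    def __init__(self):
        self.prefix_count = Counter()
        self.size = 0

    def insertion_depth(self, s):
        """Depth of the leaf s would occupy if it were inserted now."""
        if self.size == 0:
            return 0  # a lone string sits at the root
        d = 0  # longest prefix of s shared with some stored string
        while d < len(s) and self.prefix_count[s[: d + 1]] > 0:
            d += 1
        return d + 1

    def insert(self, s):
        for i in range(1, len(s) + 1):
            self.prefix_count[s[:i]] += 1
        self.size += 1

    def height(self):
        """1 + length of the longest prefix shared by two or more strings."""
        if self.size < 2:
            return 0
        shared = [len(p) for p, c in self.prefix_count.items() if c >= 2]
        return 1 + max(shared, default=0)

    def fill_up_level(self):
        """Largest level l whose 2**l nodes are all internal (assumed definition)."""
        if self.size < 2:
            return 0  # degenerate case
        internal = Counter(len(p) for p, c in self.prefix_count.items() if c >= 2)
        level = 0  # the root is internal once two strings are stored
        while internal[level + 1] == 2 ** (level + 1):
            level += 1
        return level


def random_bits(rng, length):
    """One string from an assumed symmetric binary i.i.d. source."""
    return format(rng.getrandbits(length), f"0{length}b")


def build_greedy(n, k, length=64, seed=0):
    """Insert n strings; for each, draw k candidates and keep the one whose
    leaf would be shallowest (greedy rule; ties go to the first candidate)."""
    rng = random.Random(seed)
    trie = BitTrie()
    for _ in range(n):
        candidates = [random_bits(rng, length) for _ in range(k)]
        trie.insert(min(candidates, key=trie.insertion_depth))
    return trie


if __name__ == "__main__":
    for k in (1, 2, 8):
        t = build_greedy(2000, k)
        print(f"k={k}: height={t.height()}, fill-up level={t.fill_up_level()}")
```

Comparing the printed height and fill-up level for k = 1, 2 and 8 gives an empirical feel for how extra choices balance the trie. The prefix-count representation is chosen for clarity rather than speed, so it should not be read as the O(n log n) construction mentioned above.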

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code: 4624
Deposited By: Gábor Lugosi
Deposited On: 13 March 2009