Traveling among Clusters: a Way to Reconsider the Benefits of the Cluster Hypothesis
Relying on the Cluster Hypothesis which states that relevant documents tend to be more similar one to each other than to non-relevant documents, most of information retrieval systems organizing search results as a set of clusters seek to gather all relevant documents in the same cluster. We propose here to reconsider the benefits of the entailed concentration of the relevant information. Contrary to what is commonly admitted, we believe that systems which aim to distribute the relevant documents in different clusters, since being more likely to highlight different aspects of the subject, may be at least as useful for the user as systems gathering all relevant documents in a single group. Since existing evaluation measures tend to greatly favor the latter systems, we first investigate ways to more fairly assess the ability to reach the relevant information from the list of cluster descriptions. At last, we show that systems distributing the relevant information in different clusters may actually provide a better information access than classical systems.