Using the structure of documents to improve the discovery of unexpected information
François Jacquenet and Christine Largeron
In: 21st ACM Symposium on Applied Computing (SAC 2006), 23-27 April 2006, Dijon, France.
This paper investigates how the structure of documents can be taken into account when discovering unexpected information in textual databases.
Our earlier work designed measures for evaluating the unexpectedness of documents and integrated them into the UnexpectedMiner system. Here we improve the system by taking into account the structure of the documents it processes. Each part of a document is weighted by a coefficient whose value is determined by optimization techniques. These coefficients are then used in UnexpectedMiner's unexpectedness measures to determine whether a document contains unexpected information.
The efficiency of the new system is then evaluated, and the experiments show the improvements brought by using the structure of the documents.
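
The idea of weighting document parts can be illustrated with a minimal Python sketch. It is not the UnexpectedMiner implementation: the measure used here (the share of weighted term mass absent from a set of already known terms), the part names `title` and `body`, the grid search standing in for the optimization step, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def weighted_term_counts(doc, weights):
    """Sum term occurrences over all parts, scaling each part by its coefficient."""
    combined = Counter()
    for part, text in doc.items():
        w = weights.get(part, 0.0)  # parts without a coefficient are ignored
        for term in text.lower().split():
            combined[term] += w
    return combined


def unexpectedness(doc, weights, known_terms):
    """Hypothetical measure: fraction of weighted term mass not in known_terms."""
    counts = weighted_term_counts(doc, weights)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    novel = sum(c for term, c in counts.items() if term not in known_terms)
    return novel / total


def fit_weights(labeled_docs, known_terms, parts, grid=(0.0, 0.5, 1.0), threshold=0.5):
    """Exhaustive grid search (an assumed stand-in for the optimization step).

    Returns the first weight assignment with the highest accuracy on
    labeled_docs, where each label says whether the document is unexpected.
    """
    best_weights, best_acc = None, -1.0
    for combo in product(grid, repeat=len(parts)):
        weights = dict(zip(parts, combo))
        correct = sum(
            (unexpectedness(doc, weights, known_terms) >= threshold) == label
            for doc, label in labeled_docs
        )
        acc = correct / len(labeled_docs)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy data: in these examples only the title signals unexpected content.
known = {"data", "mining", "text"}
docs = [
    ({"title": "quantum widgets", "body": "data mining text"}, True),
    ({"title": "data mining", "body": "quantum quantum quantum"}, False),
]
weights, accuracy = fit_weights(docs, known, parts=["title", "body"])
print(weights, accuracy)
```

On this toy data the search favours the title over the body, which mirrors the intuition that some parts of a document are more informative than others when judging unexpectedness.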