PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Extraction de Connaissances dans des Données Numériques et Textuelles (Knowledge Extraction from Numerical and Textual Data)
Jérome Azé
(2003) PhD thesis, LRI, University of Paris-Sud.


The work carried out in this thesis concerns knowledge discovery in transactional data. The analysis of such data usually relies on a minimal support threshold, which is used to filter out uninteresting knowledge. Data experts often find it difficult to choose this threshold. We propose a method that requires no minimal support and relies instead on quality measures. We focus on extracting knowledge in the form of association rules. To be considered interesting and presented to the expert, these rules must satisfy one or more quality criteria. We define two quality measures that combine different criteria and make it possible to extract interesting rules from the data. Building on them, we propose an algorithm that extracts these rules without the minimal-support constraint. We studied the behavior of this algorithm on noisy data and highlighted how difficult it is to automatically extract reliable knowledge from such data. One proposed solution is to evaluate the noise resistance of each rule and to report it to the expert during the analysis and validation of the extracted knowledge. Finally, we carried out a study on real data as part of a text-mining process. The knowledge sought in these texts consists of association rules between concepts that are defined by the expert and specific to the domain. We propose a tool that extracts this knowledge and assists the expert in validating it. The results show that interesting knowledge can be extracted from textual data while minimizing the expert's involvement in the association-rule extraction step.
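The abstract does not spell out the two quality measures defined in the thesis, so the following is only a hypothetical sketch of the general idea, not the author's algorithm. It enumerates rules X -> {y} exhaustively and filters them with standard confidence and lift thresholds instead of a minimal support, then estimates a rule's noise resistance by re-evaluating it on randomly perturbed copies of the data. The function names, the thresholds, and the bit-flip noise model are all assumptions made for illustration. For the text-mining setting, each transaction would be the set of expert-defined concepts found in one document.

```python
import random
from itertools import combinations

# Hypothetical sketch: a transaction is a frozenset of items (or, for text
# mining, of the expert-defined concepts found in one document).


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def measures(antecedent, consequent, transactions):
    """Return (confidence, lift) of the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(frozenset(antecedent) | frozenset(consequent), transactions)
    confidence = s_ac / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return confidence, lift


def is_interesting(rule, transactions, min_conf, min_lift):
    """Quality-measure filter used in place of a minimal support (assumed
    stand-in measures: confidence and lift)."""
    confidence, lift = measures(rule[0], rule[1], transactions)
    return confidence >= min_conf and lift > min_lift


def extract_rules(transactions, min_conf=0.8, min_lift=1.0, max_antecedent=2):
    """Exhaustively enumerate rules X -> {y}. No minimal support prunes the
    search, so this only scales to small item vocabularies."""
    items = sorted(set().union(*transactions))
    rules = []
    for y in items:
        others = [i for i in items if i != y]
        for k in range(1, max_antecedent + 1):
            for x in combinations(others, k):
                rule = (frozenset(x), frozenset([y]))
                if is_interesting(rule, transactions, min_conf, min_lift):
                    rules.append(rule)
    return rules


def noise_resistance(rule, transactions, flip_prob, min_conf=0.8,
                     min_lift=1.0, trials=100, seed=0):
    """Share of perturbed datasets in which `rule` still passes the filter.
    Hypothetical noise model: each item's presence in each transaction is
    flipped independently with probability `flip_prob`."""
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    kept = 0
    for _ in range(trials):
        noisy = [
            frozenset(i for i in items if (i in t) != (rng.random() < flip_prob))
            for t in transactions
        ]
        kept += is_interesting(rule, noisy, min_conf, min_lift)
    return kept / trials


# Toy data, invented for illustration only.
data = [frozenset(t) for t in (
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "jam"},
    {"milk"},
)]

if __name__ == "__main__":
    for x, y in extract_rules(data, max_antecedent=1):
        print(sorted(x), "->", sorted(y))
    rule = (frozenset({"butter"}), frozenset({"bread"}))
    print("noise resistance:", noise_resistance(rule, data, flip_prob=0.1))
```

With `max_antecedent=1` the toy data yields only `butter -> bread` and `bread -> butter`. Raising it to 2 also admits rules such as `{butter, jam} -> bread` that rest on a single transaction, which illustrates how weak a lone confidence threshold is once the minimal support is removed, and why combining several criteria in a quality measure is attractive.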

EPrint Type: Thesis (PhD)
Additional Information: Association Rules, Measures of Quality, Noisy Data, Text Mining
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Theory & Algorithms
ID Code: 666
Deposited By: Jérome Azé
Deposited On: 29 December 2004