Semi-Supervised Feature Learning from Clinical Text
This paper addresses the automated identification of clinical free-text records that contain useful information about a given disease (e.g. symptoms, modifiers, and diagnoses). We introduce a novel semi-supervised machine learning algorithm for this problem, which trains the set covering machine in a bootstrapping procedure. The proposed technique has two advantages: it finds the documents of interest more accurately than a search based on diagnostic codes, and the features it learns can be used directly as a knowledge representation of the given topic, to assist either further machine learning algorithms or manual post-processing and analysis.
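To make the idea of training a set covering machine inside a bootstrapping loop concrete, the following is a minimal sketch under stated assumptions, not the algorithm as specified in this paper. It assumes that each record is reduced to a set of terms, that the learner is a greedy disjunction-style set covering machine over term-presence features, and that bootstrapping is plain self-training, in which unlabeled records matched by the current features are added as positives before retraining. The names `train_scm_disjunction`, `predict`, `bootstrap`, the `penalty` and `max_terms` parameters, and the toy records are all hypothetical.

```python
from typing import List, Set

Doc = Set[str]  # assumption: a record is represented by its set of terms


def train_scm_disjunction(pos: List[Doc], neg: List[Doc],
                          penalty: float = 1.0, max_terms: int = 5) -> List[str]:
    """Greedily pick terms that cover many uncovered positives and few negatives."""
    remaining = list(pos)
    chosen: List[str] = []
    vocab = sorted(set().union(*pos)) if pos else []
    while remaining and len(chosen) < max_terms:
        best_term, best_score = None, 0.0
        for term in vocab:
            if term in chosen:
                continue
            covered = sum(1 for d in remaining if term in d)
            errors = sum(1 for d in neg if term in d)
            score = covered - penalty * errors  # SCM-style usefulness
            if score > best_score:
                best_term, best_score = term, score
        if best_term is None:  # no term has positive usefulness
            break
        chosen.append(best_term)
        remaining = [d for d in remaining if best_term not in d]
    return chosen


def predict(terms: List[str], doc: Doc) -> bool:
    """A disjunction fires if the record contains any learned term."""
    return any(t in doc for t in terms)


def bootstrap(seed_pos: List[Doc], seed_neg: List[Doc], unlabeled: List[Doc],
              rounds: int = 3, penalty: float = 1.0) -> List[str]:
    """Self-training: move matched unlabeled records to the positives, retrain."""
    pos, neg, pool = list(seed_pos), list(seed_neg), list(unlabeled)
    terms = train_scm_disjunction(pos, neg, penalty)
    for _ in range(rounds):
        newly = [d for d in pool if predict(terms, d)]
        if not newly:
            break
        pos.extend(newly)
        pool = [d for d in pool if not predict(terms, d)]
        terms = train_scm_disjunction(pos, neg, penalty)
    return terms


# Toy, made-up records for illustration only.
seed_pos = [{"wheeze", "cough"}, {"inhaler", "prescribed"}]
seed_neg = [{"cough", "fever"}, {"fracture", "cast", "prescribed"}]
unlabeled = [{"wheeze", "salbutamol"}, {"inhaler", "salbutamol"},
             {"wheeze", "salbutamol", "night"}, {"inhaler", "salbutamol", "review"},
             {"salbutamol", "dyspnea"}, {"fracture", "cast"}]

print(train_scm_disjunction(seed_pos, seed_neg))  # features from seeds only
print(bootstrap(seed_pos, seed_neg, unlabeled))   # features after bootstrapping
```

In this toy setting the seed-only model selects `inhaler` and `wheeze`, while the bootstrapped model additionally picks up `salbutamol`, a term that never appears in the seed records, and so retrieves a record the seed-only model misses. The returned term list is the kind of directly readable feature set that the abstract describes as a knowledge representation; a real system would also need safeguards against label noise accumulating across rounds.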