Linear Data Fusion with drCCA
Abhishek Tripathi, Arto Klami and Samuel Kaski
In: International Workshop on Machine Learning in Systems Biology, 24-25 September 2007, Evry (near Paris), France.
We consider the data fusion problem of combining two or more data sources, where each source consists of vector-valued measurements of the same objects or entities but on different variables. The task is to retain only those aspects of the sources that are mutually informative. Retaining only the shared aspects is motivated by two interrelated lines of thought. The first is noise reduction: if the data sources are measurements of the same entity corrupted by independent noise, discarding source-specific aspects discards the noise and leaves the shared properties that describe the entity. The second is to analyze what is interesting in the data. One example is the study of activation profiles of yeast genes under several stressful treatments, with the aim of defining the yeast stress response. In this example, what the sources have in common is precisely what we are interested in. The “noise” may be highly structured; it is defined simply as whatever is source-specific.
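The noise-reduction argument can be illustrated with a small numerical sketch. This is not the authors' drCCA implementation, which the abstract does not detail. It is a minimal example of classical two-view canonical correlation analysis on synthetic data, and it covers only two sources. The helper `cca`, the mixing vectors `a` and `b`, the noise level, and the choice to fuse by summing the first pair of canonical variates are all assumptions made for illustration.

```python
import numpy as np


def cca(X, Y, reg=1e-8):
    """Classical two-view CCA via whitening and SVD (illustrative helper).

    Returns the canonical correlations and the projection matrices Wx, Wy.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; a tiny ridge term keeps the inverses stable.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return s, Kx @ U, Ky @ Vt.T


# Synthetic data (assumed setup): one shared latent signal z, observed through
# two sources with different variables and independent, source-specific noise.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
a = np.array([1.0, -0.5, 0.8])        # hypothetical loadings for source X
b = np.array([0.7, 1.2, -0.4, 0.3])   # hypothetical loadings for source Y
X = np.outer(z, a) + rng.standard_normal((n, 3))
Y = np.outer(z, b) + rng.standard_normal((n, 4))

corrs, Wx, Wy = cca(X, Y)

# One possible fused representation: the sum of the first canonical variates.
fused = (X - X.mean(axis=0)) @ Wx[:, 0] + (Y - Y.mean(axis=0)) @ Wy[:, 0]
fused_corr = abs(np.corrcoef(fused, z)[0, 1])

# Baseline: the single raw variable that tracks z best.
raw = np.hstack([X, Y])
best_single = max(abs(np.corrcoef(raw[:, j], z)[0, 1]) for j in range(raw.shape[1]))

print("canonical correlations:", np.round(corrs, 3))
print("fused vs. shared signal:", round(fused_corr, 3))
print("best single variable vs. shared signal:", round(best_single, 3))
```

In this setup only the first canonical correlation should be clearly above zero, since the sources share a single latent signal, and the fused component should track that signal more closely than any individual measured variable does.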
|EPrint Type:||Conference or Workshop Item (Poster)|
|Subjects:||Theory & Algorithms|
|Deposited By:||Abhishek Tripathi|
|Deposited On:||10 February 2008|