Justifying additive-noise-model based causal discovery via algorithmic information theory
A recent method for causal discovery is in many cases able to infer whether X causes Y or Y causes X for just two observed variables X and Y. It is based on the observation that there exist (non-Gaussian) joint distributions P(X,Y) for which Y may be written as a function of X up to an additive noise term that is independent of X and no such model exists from Y to X. Whenever this is the case, one prefers the causal model X--> Y. Here we justify this method by showing that the causal hypothesis Y--> X is unlikely because it requires a specific tuning between P(Y) and P(X|Y) to generate a distribution that admits an additive noise model from X to Y. To quantify the amount of tuning required we derive lower bounds on the algorithmic information shared by P(Y) and P(X|Y). This way, our justification is consistent with recent approaches for using algorithmic information theory for causal reasoning. We extend this principle to the case where P(X,Y) almost admits an additive noise model. Our results suggest that the above conclusion is more reliable if the complexity of P(Y) is high.