Results of the Cause-Effect Pair Challenge

2019 
We organized a challenge in causal discovery from observational data with the aim of devising a “causation coefficient” to score pairs of variables. The participants were provided with a large database of thousands of pairs of variables {X, Y} (80% semi-artificial data and 20% real data) from which samples were drawn independently (i.e. ignoring possible time dependencies). The goal was to discover whether the data support the hypothesis that Y = f(X, noise), which for the purpose of this challenge was our definition of causality (X causes Y). The participants adopted a machine learning approach, which contrasts with previously published model-based methods. They extracted numerous features of the joint empirical distribution of X and Y and built a classifier to separate pairs belonging to the class “X causes Y” from other cases: “Y causes X”; “X and Y are related” but not in a causal way, since a third variable may be causing both X and Y; and “X and Y are independent”. The classifier was trained on examples provided by the organizers and tested on independent test data for which the truth values of the causal relationships were known only to the organizers. The participants achieved an Area under the ROC Curve (AUC) over 0.8 in the first phase, deployed on the Kaggle platform, which ran from March through September 2013 (round 1). The participants were then invited to improve the efficiency of their code by submitting fast causation coefficients on the Codalab platform (round 2). The causation coefficients developed by the winners have been made available under open source licenses. We have made all data and code publicly available at http://www.causality.inf.ethz.ch/CEdata/.
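As a rough illustration of the pipeline described above, the following Python sketch computes a few summary features for one pair of variables and evaluates a set of causation scores with the AUC. It is not the participants' code: the function names (`pair_features`, `auc`), the three features chosen, and the toy data are all assumptions made for illustration. Real entries used many more features and a trained classifier to produce the scores.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def skewness(v):
    """Sample skewness (third standardized moment) of a non-constant sample."""
    n = len(v)
    m = sum(v) / n
    s2 = sum((a - m) ** 2 for a in v) / n
    s3 = sum((a - m) ** 3 for a in v) / n
    return s3 / s2 ** 1.5


def pair_features(x, y):
    """Hypothetical feature vector for one (X, Y) pair.

    These three features are illustrative only; they are not the
    feature set used in the challenge.
    """
    return [pearson(x, y), skewness(x), skewness(y)]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for "X causes Y", anything else for the other cases.
    Returns the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy pair generated as Y = f(X, noise), with f and the noise level assumed.
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    y = [a ** 3 + 0.1 * rng.gauss(0.0, 1.0) for a in x]
    print(pair_features(x, y))

    # Made-up causation scores for four pairs, with their true labels.
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

The pairwise form of the AUC is quadratic in the number of pairs, which is fine for a sketch; a rank-based computation would be preferable for a test set containing thousands of pairs.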