Causal Discovery with Attention-Based Convolutional Neural Networks

Meike Nauta; Doina Bucur; Christin Seifert

doi:10.3390/make1010019

journal article Open Access Jan 07, 2019

Machine Learning and Knowledge Extraction Vol. 1 No. 1 pp. 312-340 · MDPI AG

Abstract

Having insight into the causal associations in a complex system facilitates decision making, e.g., for medical treatments, urban infrastructure improvements or financial investments. The amount of observational data grows, which enables the discovery of causal relationships between variables from observation of their behaviour in time. Existing methods for causal discovery from time series data do not yet exploit the representational power of deep learning. We therefore present the Temporal Causal Discovery Framework (TCDF), a deep learning framework that learns a causal graph structure by discovering causal relationships in observational time series data. TCDF uses attention-based convolutional neural networks combined with a causal validation step. By interpreting the internal parameters of the convolutional networks, TCDF can also discover the time delay between a cause and the occurrence of its effect. Our framework learns temporal causal graphs, which can include confounders and instantaneous effects. Experiments on financial and neuroscientific benchmarks show state-of-the-art performance of TCDF on discovering causal relationships in continuous time series data. Furthermore, we show that TCDF can circumstantially discover the presence of hidden confounders. Our broadly applicable framework can be used to gain novel insights into the causal dependencies in a complex system, which is important for reliable predictions, knowledge discovery and data-driven decision making.
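The abstract states that the time delay between cause and effect can be read from the internal parameters of the convolutional networks, but it does not spell out the mechanism. The following is a minimal sketch of one plausible reading, not the authors' implementation: a causal (left-padded) 1D convolution, with the delay taken as the lag of the largest-magnitude kernel weight. The function names `causal_conv1d` and `delay_from_kernel` and the example kernel are hypothetical, introduced here only for illustration.

```python
# Illustrative sketch only; names and the delay heuristic are assumptions,
# not taken from the TCDF paper or its code.

def causal_conv1d(x, kernel):
    """Causal 1D convolution.

    y[t] = sum_k kernel[k] * x[t - (K - 1 - k)], with zeros before t = 0,
    so y[t] depends only on x[0..t] and never on future values.
    """
    K = len(kernel)
    padded = [0.0] * (K - 1) + list(x)  # left zero-padding keeps the output causal
    return [
        sum(kernel[k] * padded[t + k] for k in range(K))
        for t in range(len(x))
    ]


def delay_from_kernel(kernel):
    """Return the lag (in time steps) of the largest-magnitude kernel weight.

    The last kernel position looks at the current time step (lag 0);
    the first position looks K - 1 steps into the past.
    """
    K = len(kernel)
    k_max = max(range(K), key=lambda k: abs(kernel[k]))
    return K - 1 - k_max


if __name__ == "__main__":
    # Hypothetical kernel whose dominant weight sits two steps in the past.
    kernel = [0.0, 0.9, 0.1, 0.0]
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(causal_conv1d(impulse, kernel))  # impulse response peaks at t = 2
    print(delay_from_kernel(kernel))       # lag of the dominant weight
```

With an impulse as input, the output peaks at the same lag that `delay_from_kernel` reports, which is the intuition behind reading a delay off a kernel. A real network would stack several such layers, possibly with dilation, so the lags would have to be combined across layers.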

Details

Published: Jan 07, 2019
Vol/Issue: 1(1)
Pages: 312-340

Authors

Meike Nauta

Faculty of EEMCS, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands

Doina Bucur

Faculty of EEMCS, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands

Christin Seifert

Faculty of EEMCS, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands

Cite This Article

Meike Nauta, Doina Bucur, Christin Seifert (2019). Causal Discovery with Attention-Based Convolutional Neural Networks. Machine Learning and Knowledge Extraction, 1(1), 312-340. https://doi.org/10.3390/make1010019
