Deep Learning for Time Series Forecasting: Tutorial and Literature Survey

Deep learning based forecasting methods have become the methods of choice in many applications of time series prediction or forecasting, often outperforming other approaches. Consequently, in recent years these methods have become ubiquitous in large-scale industrial forecasting applications and have consistently ranked among the best entries in forecasting competitions (e.g., M4 and M5). This practical success has further increased academic interest in understanding and improving deep forecasting methods. In this article, we provide an introduction and overview of the field: we present important building blocks for deep forecasting in some depth; using these building blocks, we then survey the breadth of the recent deep forecasting literature.

Details

Published: Dec 07, 2022
Vol/Issue: 55(6)
Pages: 1-36

Authors

Konstantinos Benidis

Amazon Research, Charlottenstrasse, Berlin, Germany

Syama Sundar Rangapuram

Amazon Research, Charlottenstrasse, Berlin, Germany

Valentin Flunkert

Amazon Research, Charlottenstrasse, Berlin, Germany

Yuyang Wang

Amazon Research, East Palo Alto, CA, USA

Danielle Maddix

Amazon Research, East Palo Alto, CA, USA

Caner Turkmen

Amazon Research, Charlottenstrasse, Berlin, Germany

Jan Gasthaus

Amazon Research, Charlottenstrasse, Berlin, Germany

Michael Bohlke-Schneider

Amazon Research, Charlottenstrasse, Berlin, Germany

David Salinas

Amazon Research, Charlottenstrasse, Berlin, Germany

Lorenzo Stella

Amazon Research, Charlottenstrasse, Berlin, Germany

François-Xavier Aubet

Amazon Research, Charlottenstrasse, Berlin, Germany

Laurent Callot

Amazon Research, Charlottenstrasse, Berlin, Germany

Tim Januschowski

Zalando SE, Berlin, Germany

Cite This Article

Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, et al. (2022). Deep Learning for Time Series Forecasting: Tutorial and Literature Survey. ACM Computing Surveys, 55(6), 1-36. https://doi.org/10.1145/3533382
