journal article Open Access Feb 28, 2025

Transformer-Based Models for Probabilistic Time Series Forecasting with Explanatory Variables

Mathematics Vol. 13 No. 5 pp. 814 · MDPI AG
DOI: 10.3390/math13050814
Abstract
Accurate demand forecasting is essential for retail operations because it directly affects supply chain efficiency, inventory management, and financial performance. However, retail time series are difficult to forecast: they show irregular patterns and hierarchical structures, and they depend strongly on external factors such as promotions, pricing strategies, and socio-economic conditions. This study evaluates the effectiveness of Transformer-based architectures, specifically Vanilla Transformer, Informer, Autoformer, ETSformer, NSTransformer, and Reformer, for probabilistic time series forecasting in retail. A key focus is the integration of explanatory variables, such as calendar-related indicators, selling prices, and socio-economic factors, which play a crucial role in capturing demand fluctuations. The study assesses how much these variables improve forecast accuracy, addressing a gap in the literature: explanatory variables have not been evaluated comprehensively across multiple Transformer-based models. Empirical results on the M5 dataset show that incorporating explanatory variables generally improves forecasting performance. Models that use these variables achieve up to a 12.4% reduction in Normalized Root Mean Squared Error (NRMSE) and a 2.9% improvement in Mean Absolute Scaled Error (MASE) compared to models that rely solely on past sales. Furthermore, probabilistic forecasting supports decision-making by quantifying uncertainty, providing more reliable demand predictions for risk management. These findings underscore the effectiveness of Transformer-based models in retail forecasting and the importance of integrating domain-specific explanatory variables to achieve more accurate, context-aware predictions in dynamic retail environments.
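To make the two accuracy measures named in the abstract concrete, the following is a minimal Python sketch of how MASE and NRMSE are commonly computed, and how a relative reduction such as "12.4%" can be derived from two scores. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`mase`, `nrmse`, `relative_reduction`) and the toy numbers are hypothetical, the MASE scale uses a naive forecast with lag `season`, and NRMSE is assumed to be normalized by the mean absolute value of the actuals (other normalizations, such as the range, are also in use).

```python
import numpy as np


def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error.

    Forecast MAE divided by the in-sample MAE of a naive forecast
    with lag `season` on the training series (assumed scaling).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / scale


def nrmse(y_true, y_pred):
    """RMSE normalized by the mean absolute actual value (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(np.abs(y_true))


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` lowers an error score relative to `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Toy data (hypothetical): one short sales history and two competing forecasts.
y_train = [10, 12, 11, 13, 12]
y_true = [12, 14]
pred_sales_only = [10, 11]       # model using past sales only
pred_with_covariates = [11, 13]  # model that also uses explanatory variables

for name, pred in [("sales only", pred_sales_only),
                   ("with covariates", pred_with_covariates)]:
    print(f"{name}: MASE={mase(y_true, pred, y_train):.3f}, "
          f"NRMSE={nrmse(y_true, pred):.3f}")

gain = relative_reduction(nrmse(y_true, pred_sales_only),
                          nrmse(y_true, pred_with_covariates))
print(f"NRMSE reduction from covariates: {gain:.1f}%")
```

Both measures are scale-free, which is what allows scores to be averaged across retail series with very different sales volumes. For both, lower values indicate better forecasts.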
