journal article Open Access Jul 30, 2022

Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate Clinical Time-Series

Abstract
Multivariate time-series data are frequently observed in critical care settings and are typically characterized by sparsity (missing information) and irregular time intervals. Existing approaches for learning representations in this domain handle these challenges by either aggregating or imputing values, which in turn suppresses fine-grained information and adds undesirable noise and overhead to the machine learning model. To tackle this problem, we propose a Self-supervised Transformer for Time-Series (STraTS) model, which overcomes these pitfalls by treating time-series as a set of observation triplets instead of using the standard dense matrix representation. It employs a novel Continuous Value Embedding technique to encode continuous time and variable values without the need for discretization. It is composed of a Transformer component with multi-head attention layers, which enable it to learn contextual triplet embeddings while avoiding the problems of recurrence and vanishing gradients that occur in recurrent architectures. In addition, to tackle the problem of limited availability of labeled data (which is typically observed in many healthcare applications), STraTS utilizes self-supervision, leveraging unlabeled data to learn better representations by using time-series forecasting as an auxiliary proxy task. Experiments on real-world multivariate clinical time-series benchmark datasets demonstrate that STraTS has better prediction performance than state-of-the-art methods for mortality prediction, especially when labeled data is limited. Finally, we also present an interpretable version of STraTS, which can identify important measurements in the time-series data. Our data preprocessing and model implementation code is available at https://github.com/sindhura97/STraTS.
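To make the triplet representation mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a triplet has the form (time, variable, value); the names `Observation` and `dense_to_triplets`, the variable names, and the sample values are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, NamedTuple, Sequence


class Observation(NamedTuple):
    """One observed measurement: (time, variable name, value)."""
    time: float
    variable: str
    value: float


def dense_to_triplets(
    times: Sequence[float],
    variables: Sequence[str],
    matrix: Sequence[Sequence[float]],
) -> List[Observation]:
    """Flatten a dense (time x variable) grid into observation triplets.

    Missing cells are marked with NaN and are skipped, so nothing is
    imputed or aggregated.
    """
    triplets = []
    for t, row in zip(times, matrix):
        for name, value in zip(variables, row):
            if not math.isnan(value):
                triplets.append(Observation(t, name, value))
    return triplets


# Hypothetical data: three irregular time stamps (in hours), three variables.
nan = math.nan
times = [0.0, 0.5, 3.2]
variables = ["heart_rate", "lactate", "temperature"]
matrix = [
    [82.0, nan, 36.9],
    [nan, 2.1, nan],
    [90.0, nan, nan],
]

triplets = dense_to_triplets(times, variables, matrix)
for obs in triplets:
    print(obs)
```

In this toy grid only four of the nine cells are observed, so the set of triplets holds four entries, whereas a dense matrix would need the five missing cells filled in or masked.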

Details
Published: Jul 30, 2022
Vol/Issue: 16(6)
Pages: 1-17
Funding: US National Science Foundation, Award IIS-1838730
Cite This Article
Sindhu Tipirneni, Chandan K. Reddy (2022). Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate Clinical Time-Series. ACM Transactions on Knowledge Discovery from Data, 16(6), 1-17. https://doi.org/10.1145/3516367