Abstract
For timing-sensitive edge applications, the demand for efficient lightweight machine learning solutions has increased recently. Tree ensembles are among the state-of-the-art in many machine learning applications. While single decision trees are comparably small, an ensemble of trees can have a significant memory footprint leading to cache locality issues, which are crucial to performance in terms of execution time. In this work, we analyze memory-locality issues of the two most common realizations of decision trees, i.e., native and if-else trees. We highlight that both realizations demand a more careful memory layout to improve caching behavior and maximize performance. We adopt a probabilistic model of decision tree inference to find the best memory layout for each tree at the application layer. Further, we present an efficient heuristic to take architecture-dependent information into account thereby optimizing the given ensemble for a target computer architecture. Our code-generation framework, which is freely available on an open-source repository, produces optimized code sessions while preserving the structure and accuracy of the trees. With several real-world data sets, we evaluate the elapsed time of various tree realizations on server hardware as well as embedded systems for Intel and ARM processors. Our optimized memory layout achieves a reduction in execution time up to 75 % execution for server-class systems, and up to 70 % for embedded systems, respectively.
Topics

No keywords indexed for this article. Browse by subject →

References
41
[1]
Runtime Optimizations for Tree-Based Machine Learning Models

Nima Asadi, Jimmy Lin, Arjen P. de Vries

IEEE Transactions on Knowledge and Data Engineerin... 10.1109/tkde.2013.73
[2]
G. Biau. 2012. Analysis of a random forests model. Journal of Machine Learning Research 13, Apr (2012), 1063–1095.
[3]
A random forest guided tour

Gérard Biau, Erwan Scornet

TEST 10.1007/s11749-016-0481-7
[5]
Bagging predictors

Leo Breiman

Machine Learning 10.1007/bf00058655
[6]
L. Breiman. 1996. Bias variance and arcing classifiers.
[7]
L. Breiman. 2000. Some Infinity Theory for Predictor Ensembles. Technical Report. Technical Report 579, Statistics Dept. UCB.
[8]
Random Forests

Leo Breiman

Machine Learning 10.1023/a:1010933404324
[11]
Jens Buss, Christian Bockermann, Katharina Morik, Wolfgang Rhode, and Tim Ruhe. 2016. FACT-Tools – Processing high-volume telescope data. In 26th Astronomical Data Analysis Software and Systems Conference (ADASS). ADASS.
[14]
M. Denil, D. Matheson, and N. De Freitas. 2014. Narrowing the gap: Random forests in theory and in practice. In International Conference on Machine Learning (ICML).
[15]
Ulrich Drepper. 2007. What Every Programmer Should Know About Memory.
[17]
Random Forests for Real Time 3D Face Analysis

Gabriele Fanelli, Matthias Dantone, Juergen Gall et al.

International Journal of Computer Vision 10.1007/s11263-012-0549-0
[18]
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

Yoav Freund, Robert E Schapire

Journal of Computer and System Sciences 10.1006/jcss.1997.1504
[19]
Extremely randomized trees

Pierre Geurts, Damien Ernst, Louis Wehenkel

Machine Learning 10.1007/s10994-006-6226-1
[20]
T. Gleixner and I. Molnar. 2012. Perf: Linux profiling with performance counters. https://perf.wiki.kernel.org/index.php/Main_Page. Accessed: 2021-05-31.
[23]
John L. Hennessy and David A. Patterson. 2011. Computer Architecture, Fifth Edition: A Quantitative Approach (5th ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
[26]
Pascal Libuschewski. 2017. Exploration of Cyber-physical Systems for GPGPU Computer Vision-based Detection of Biological Viruses. Ph.D. Dissertation. Technical University of Dortmund, Germany. http://hdl.handle.net/2003/35929.
[27]
M. Lichman. 2013. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml.
[28]
G. Louppe. 2014. Understanding random forests: From theory to practice. arXiv preprint arXiv:1407.7502 (2014).
[30]
Claudio Lucchese, Franco Maria Nradini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. 2017. QuickScorer: Efficient traversal of large ensembles of decision trees. In Procs. ECML PKDD 2017. Springer, 383–387.
[33]
Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, and Matteo Interlandi. 2020. A tensor compiler for unified machine learning prediction serving. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). 899–917.
[34]
Siegfried Nijssen and Joost N. Kok. 2006. Frequent subgraph mines: Runtimes don’t say everything. In Procs. 4th Workshop on Mining and Learning with Graphs (MLG). 173–180.
[39]
B. Van Essen, C. Macaraeg, M. Gokhale, and R. Prenger. 2012. Accelerating a random forest classifier: Multi-core, GP-GPU, or FPGA?. In Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium. IEEE, 232–239. 10.1109/fccm.2012.47
Metrics
18
Citations
41
References
Details
Published
Oct 18, 2022
Vol/Issue
21(6)
Pages
1-26
License
View
Funding
Deutsche Forschungsgemeinschaft Award: 405422836 and 124020371
Federal Ministry of Education and Research of Germany as part of the competence center for machine learning ML2R Award: 01IS18038A
Deutscher Akademischer Austauschdienst (DAAD) within the Programme for Project-Related Personal Exchange Award: 57559723
Cite This Article
Kuan-Hsun Chen, Chiahui Su, Christian Hakert, et al. (2022). Efficient Realization of Decision Trees for Real-Time Inference. ACM Transactions on Embedded Computing Systems, 21(6), 1-26. https://doi.org/10.1145/3508019
Related

You May Also Like

HiCH

Iman Azimi, Arman Anzanpour · 2017

138 citations

Building timing predictable embedded systems

Philip Axer, Rolf Ernst · 2014

79 citations

A Survey of Blockchain Data Management Systems

Qian Wei, Bingzhe Li · 2022

68 citations

LiBrA-CAN

Bogdan Groza, Stefan Murvay · 2017

43 citations