Journal article, Jul 01, 2026

SpaceFusion++: An operator fusion scheduler for neural language model inference

DOI: 10.1016/j.sysarc.2026.103708

References (81)
[1] Kaplan (2020)
[2] Hestness (2017)
[3] Thompson (2020)
[4] Hu "Model complexity of deep learning: A survey" Knowl. Inf. Syst. (2021) 10.1007/s10115-021-01605-0
[5] Thompson (2020)
[6] Paleyes "Challenges in deploying machine learning: a survey of case studies" ACM Comput. Surv. (2022) 10.1145/3533378
[7] W. Niu, J. Guan, Y. Wang, G. Agrawal, B. Ren, DNNFusion: accelerating deep neural networks execution with advanced operator fusion, in: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021, pp. 883–898. 10.1145/3453483.3454083
[8] Zheng "Ansor: Generating High-Performance tensor programs for deep learning" (2020)
[9] Chen "Learning to optimize tensor programs" Adv. Neural Inf. Process. Syst. (2018)
[10] Zheng (2021)
[11] Zheng "AStitch: Enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures" (2022)
[12] Zheng "Chimera: An analytical optimizing framework for effective compute-intensive operators fusion" (2023)
[13] Shi "Welder: Scheduling deep learning memory access via tile-graph" (2023)
[14] Leary "XLA: TensorFlow, compiled" TensorFlow Dev Summit (2017)
[15] Ansel "PyTorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation" (2024)
[16] Vaswani "Attention is all you need" Adv. Neural Inf. Process. Syst. (2017)
[17] Devlin (2018)
[18] Raffel "Exploring the limits of transfer learning with a unified text-to-text transformer" J. Mach. Learn. Res. (2020)
[19] Touvron (2023)
[20] Dubey (2024)
[21] Zhang "Root mean square layer normalization" (2019)
[22] Ba (2016)
[23] Bridle "Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition" (1990)
[24] Gu (2024)
[25] Dao (2024)
[26] Sun (2023)
[27] Yang (2024)
[28] Sun (2024)
[29] Katharopoulos "Transformers are RNNs: Fast autoregressive transformers with linear attention" (2020)
[30] G.E. Blelloch, Prefix Sums and Their Applications.
[31] Zhu "SpaceFusion: Advanced deep learning operator fusion via space-mapping graph" (2025)
[32] Zhang "MCFuser: High-performance and rapid fusion of memory-bound compute-intensive operators" (2024)
[33] Boehm "On optimizing operator fusion plans for large-scale machine learning in systemML" Proc. VLDB Endow. (2018) 10.14778/3229863.3229865
[34] Paszke "PyTorch: An imperative style, high-performance deep learning library" (2019)
[35] Dao (2023)
[36] Yang (2024)
[37] P. Tillet, H.-T. Kung, D. Cox, Triton: an intermediate language and compiler for tiled neural network computations, in: Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2019, pp. 10–19. 10.1145/3315508.3329973
[38] NVIDIA Corporation (2023)
[39] NVIDIA Corporation (2023)
[40] Kwon "Efficient memory management for large language model serving with PagedAttention" (2023)
[41] Dong (2024)
[42] NVIDIA Corporation (2023)
[43] Ainslie (2023)
[44] Liu (2024)
[45] Tolstikhin "MLP-Mixer: An all-MLP architecture for vision" Adv. Neural Inf. Process. Syst. (2021)
[46] Team (2024)
[47] Dao (2023)
[48] Chen "TVM: An automated End-to-End optimizing compiler for deep learning" (2018)
[49] NVIDIA Corporation (2023)
[50] Adams "Learning to optimize halide with tree search and random programs" ACM Trans. Graph. (2019) 10.1145/3306346.3322967

Showing 50 of 81 references

Metrics
Citations: 0
References: 81
Details
Published: Jul 01, 2026
Vol/Issue: 176
Pages: 103708
Cite This Article
Liang Zhu, Jianguo Yao, Haibing Guan (2026). SpaceFusion++: An operator fusion scheduler for neural language model inference. Journal of Systems Architecture, 176, 103708. https://doi.org/10.1016/j.sysarc.2026.103708