Abstract
Non-volatile Memory (NVM) offers the opportunity to build large, durable B+ trees with markedly higher performance and faster post-crash recovery than is possible with traditional disk- or flash-based persistence. Unfortunately, cache flush and fence instructions, required for crash consistency and failure atomicity on many machines, introduce substantial overhead not present in non-persistent trees, and force additional NVM reads and writes. The overhead is particularly pronounced in workloads that benefit from cache reuse due to good temporal locality or small working sets---traits commonly observed in real-world applications.

In this paper, we propose a buffered durable B+ tree (BD+Tree) that improves performance and reduces NVM traffic via
relaxed
persistence. Execution of a BD+Tree is divided into
epochs
of a few milliseconds each; if a crash occurs in epoch
e,
the tree recovers to its state as of the end of epoch
e
-2. (The persistence boundary can always be made current with an explicit sync operation, which quickly advances the epoch by 2.) NVM writes within an epoch are aggregated for delayed persistence, thereby increasing cache reuse and reducing traffic to NVM.

In comparison to state-of-the-art persistent B+ trees, our micro-benchmark experiments show that BD+Tree can improve throughput by up to 2.4x and reduce NVM writes by up to 90% when working sets are small or workloads exhibit strong temporal locality. On real-world workloads that benefit from cache reuse, BD+Tree realizes throughput improvements of 1.1--2.4x and up to a 99% decrease in NVM writes. Even on uniform workloads, with working sets that significantly exceed cache capacity, BD+Tree still improves throughput by 1--1.3x. The performance advantage of BD+Tree increases with larger caches, suggesting ongoing benefits as CPUs evolve toward gigabyte cache capacities.
Topics

No keywords indexed for this article. Browse by subject →

References
43
[3]
Wentao Cai, Haosen Wen, H Alan Beadle, Chris Kjellqvist, Mohammad Hedayati, and Michael L Scott. 2020. Understanding and Optimizing Persistent Memory Allocation. In 19th ACM SIGPLAN Intl. Symp. on Memory Management. 60--73.
[4]
Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. 2020. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook. In 18th USENIX Conf. on File and Storage Technologies. 209--223.
[7]
Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In 1st ACM Symp. on Cloud Computing. 143--154.
[8]
Björn Daase, Lars Jonas Bollmeier, Lawrence Benson, and Tilmann Rabl. 2021. Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads. In 2021 Intl. Conf. on Management of Data. 339--351.
[11]
Michal Friedman, Erez Petrank, and Pedro Ramalhete. 2021. Mirror: Making Lock-free Data Structures Persistent. In 42nd ACM Conf. on Programming Language Design and Implementation. 1218--1232.
[13]
Swapnil Haria, Mark D. Hill, and Michael M. Swift. 2020. MOD: Minimally Ordered Durable Datastructures for Persistent Memory. In 25th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems. 775--788.
[15]
Deukyeon Hwang, Wook-Hee Kim, Youjip Won, and Beomseok Nam. 2018. Endurable Transient Inconsistency in Byte-Addressable Persistent B-Tree. In 16th USENIX Conf. on File and Storage Technologies. 187--200.
[16]
Intel Corporation. [n. d.]. Intel Performance Counter Monitor. https://github.com/intel/pcm.
[17]
Joseph Izraelevitz, Hammurabi Mendes, and Michael L. Scott. 2016. Linearizability of Persistent Memory Objects Under A Full-system-crash Failure Model. In 30th Intl. Symp. on Distributed Computing. 313--327.
[18]
Theodore Johnson and Dennis Shasha. 1989. Utilization of B-trees with Inserts, Deletes and Modifies. In 8th ACM Symp. on Principles of Database Systems. 235--246.
[21]
R Madhava Krishnan, Wook-Hee Kim, Xinwei Fu, Sumit Kumar Monga, Hee Won Lee, Minsung Jang, Ajit Mathew, and Changwoo Min. 2021. TIPS: Making Volatile Index Structures Persistent with DRAM-NVMM Tiering. In 2021 USENIX Annual Technical Conf. 773--787.
[22]
Christoph Lameter. 2005. Effective Synchronization on Linux/NUMA Systems. In Gelato Conference, Vol. 2005.
[23]
Se Kwon Lee, K Hyun Lim, Hyunsub Song, Beomseok Nam, and Sam H Noh. 2017. WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems. In 15th USENIX Conf. on File and Storage Technologies. 257--270.
[29]
Shaonan Ma, Kang Chen, Shimin Chen, Mengxing Liu, Jianglang Zhu, Hongbo Kang, and Yongwei Wu. 2021. ROART: Range-Query Optimized Persistent ART. In 19th USENIX Conf. on File and Storage Technologies. 1--16.
[31]
Faisal Nawab, Joseph Izraelevitz, Terence Kelly, Charles B. Morrey III, Dhruva R. Chakrabarti, and Michael L. Scott. 2017. Dalí: A Periodically Persistent Hash Map. In 31st Intl. Symp. on Distributed Computing. 37:1--37:16.
[32]
Storage Networking Industry Association. [n. d.]. Key-Value Traces. http://iotta.snia.org/traces/key-value.
[33]
Chundong Wang and Sudipta Chattopadhyay. 2020. Isle-tree: A B-tree with Intra-cache Line Sorted Leaves for Non-volatile Memory. In 38th IEEE Intl. Conf. on Computer Design. 573--580.
[37]
Haosen Wen, Wentao Cai, Mingzhe Du, Louis Jenkins, Benjamin Valpey, and Michael L. Scott. 2021. A fast, General System for Buffered Persistent Data Structures. In 50th Intl. Conf. on Parallel Processing. 1--11.
[38]
Jun Yang, Qingsong Wei, Cheng Chen, Chundong Wang, Khai Leong Yong, and Bingsheng He. 2015. NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems. In 13th USENIX Conf. on File and Storage Technologies. 167--181.
[41]
Jifei Yi, Benchao Dong, Mingkai Dong, Ruizhe Tong, and Haibo Chen. 2022. MT^2: Memory Bandwidth Regulation on Hybrid NVM/DRAM Platforms. In 20th USENIX Conf. on File and Storage Technologies. 199--216.
[43]
Xiaomin Zou, Fang Wang, Dan Feng, Tianjin Guan, and Nan Su. 2022. ROWE-tree: A Read-Optimized and Write-Efficient B-tree for Persistent Memory. In 51st Intl. Conf. on Parallel Processing. 1--11.
Metrics
3
Citations
43
References
Details
Published
Dec 18, 2024
Vol/Issue
2(6)
Pages
1-24
License
View
Funding
U.S. National Science Foundation Award: CNS-1955498, CNS-190080
Cite This Article
Mingzhe Du, Michael L. Scott (2024). Buffered Persistence in B+ Trees. Proceedings of the ACM on Management of Data, 2(6), 1-24. https://doi.org/10.1145/3698801