CAMAL: Optimizing LSM-trees via Active Learning

Weiping Yu; Siqiang Luo; Zihao Yu; Gao Cong

doi:10.1145/3677138

journal article Open Access Sep 30, 2024

CAMAL: Optimizing LSM-trees via Active Learning

Weiping Yu

Siqiang Luo

Zihao Yu

Gao Cong

Proceedings of the ACM on Management of Data Vol. 2 No. 4 pp. 1-26 · Association for Computing Machinery (ACM)

View at Publisher Save 10.1145/3677138

Abstract

We use machine learning to optimize LSM-tree structure, aiming to reduce the cost of processing various read/write operations. We introduce a new approach CAMAL, which boasts the following features: (1)
ML-Aided
: CAMAL is the first attempt to apply active learning to tune LSM-tree based key-value stores. The learning process is coupled with traditional cost models to improve the training process; (2)
Decoupled Active Learning
: backed by rigorous analysis, CAMAL adopts active learning paradigm based on a decoupled tuning of each parameter, which further accelerates the learning process; (3)
Easy Extrapolation
: CAMAL adopts an effective mechanism to incrementally update the model with the growth of the data size; (4)
Dynamic Mode
: CAMAL is able to tune LSM-tree online under dynamically changing workloads; (5)
Significant System Improvement
: By integrating CAMAL into a full system RocksDB, the system performance improves by 28% on average and up to 8x compared to a state-of-the-art RocksDB design.

Topics

No keywords indexed for this article. Browse by subject →

References

91

[1]

2016. HBase. http://hbase.apache.org/. [Online; accessed 12-January-2022].

[2]

2021. WiredTiger. https://github.com/wiredtiger/wiredtiger.

[3]

10.1145/3183713.3183761

[4]

Hussam Abu-Libdeh, Deniz Altinbüken, Alex Beutel, Ed H Chi, Lyric Doshi, Tim Kraska, Andy Ly, Christopher Olston, et al. 2020. Learned indexes for a google-scale disk-based database. arXiv preprint arXiv:2012.12501 (2020).

[5]

10.14778/2757807.2757810

[6]

Wail Y Alkowaileet, Sattam Alsubaiee, and Michael J Carey. 2019. An LSM-based Tuple Compaction Framework for Apache AsterixDB (Extended Version). arXiv preprint arXiv:1910.08185 (2019).

[7]

Apache. 2016. Cassandra. http://cassandra.apache.org.

[8]

10.1145/2463676.2465296

[9]

Oana Balmau Diego Didona Rachid Guerraoui Willy Zwaenepoel Huapeng Yuan Aashray Arora Karan Gupta and Pavan Konka. 2017. {TRIAD}: Creating Synergies Between Memory Disk and Log in Log Structured Key-Value Stores. In 2017 {USENIX} Annual Technical Conference ({USENIX} {ATC} 17). 363--375.

[10]

Oana Balmau, Florin Dinu, Willy Zwaenepoel, Karan Gupta, Ravishankar Chandhiramoorthi, and Diego Didona. 2019. SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores.. In USENIX Annual Technical Conference. 753--766.

[11]

Christian Berthet. 2017. Approximation of LRU caches miss rate: Application to power-law popularities. arXiv preprint arXiv:1705.10738 (2017).

[12]

10.14778/3229863.3229873

[13]

10.1145/3514221.3517845

[14]

Zhao Cao, Shimin Chen, Feifei Li, Min Wang, and X Sean Wang. 2013. LogKV: Exploiting key-value stores for event log processing. In Proc. Conf. Innovative Data Syst. Res.

[15]

Helen HW Chan, Yongkun Li, Patrick PC Lee, and Yinlong Xu. 2018. Hashkv: Enabling efficient updates in {KV} storage via hashing. In 2018 {USENIX} Annual Technical Conference ({USENIX} {ATC} 18). 1007--1019.

[16]

10.1145/1365815.1365816

[17]

10.14778/3485450.3485461

[18]

10.1145/2882903.2904441

[19]

Tianqi Chen Tong He Michael Benesty Vadim Khotilovich Yuan Tang Hyunsu Cho Kailong Chen Rory Mitchell Ignacio Cano Tianyi Zhou et al. 2015. Xgboost: extreme gradient boosting. R package version 0.4--2 1 4 (2015) 1--4.

[20]

10.1145/1807128.1807152

[21]

Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea C Arpaci-Dusseau, and Remzi H Arpaci-Dusseau. 2020. From wisckey to bourbon: A learned index for log-structured merge trees. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation. 155--171.

[22]

10.1145/3035918.3064054

[23]

10.1145/3183713.3196927

[24]

10.1145/3448016.3457273

[25]

Spooky

Niv Dayan, Tamar Weiss, Shmuel Dashevsky et al.

Proceedings of the VLDB Endowment 10.14778/3551793.3551853

[26]

Dayan, Niv and Idreos, Stratos. 2019. The log-structured merge-bush & the wacky continuum. In Proceedings of the 2019 International Conference on Management of Data. 449--466.

[27]

10.1145/1294261.1294281

[28]

10.14778/1687627.1687767

[29]

10.14778/3329772.3329780

[30]

Facebook. 2016. RocksDB. https://github.com/facebook/rocksdb.

[31]

10.1145/2741948.2741973

[32]

Google. 2016. LevelDB. https://github.com/google/leveldb/.

[33]

10.1145/3588917

[34]

Charles R Harris, K Jarrod Millman, Stéfan J Van Der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J Smith, et al. 2020. Array programming with NumPy. Nature 585, 7825 (2020), 357--362.

[35]

Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. 2009. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. Springer.

[36]

Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2019. DeepDB: Learn from Data, not from Queries! Proceedings of the VLDB Endowment 13, 7 (2019).

[37]

10.1007/s10115-021-01605-0

[38]

Enhui Huang, Liping Peng, Luciano Di Palma, Ahmed Abdelkafi, Anna Liu, and Yanlei Diao. 2018. Optimization for active learning-based interactive database exploration. Ph.D. Dissertation. Ecole Polytechnique; University of Massachusetts Amherst.

[39]

10.1145/3299869.3314041

[40]

10.14778/3529337.3529345

[41]

Andy Huynh Harshal A. Chaudhari Evimaria Terzi and Manos Athanassoulis. 2023. Towards Flexibility and Robustness of LSM Trees. arXiv:2311.10005 [cs.DB] 10.1007/s00778-023-00826-9

[42]

Stratos Idreos Niv Dayan Wilson Qin Mali Akmanalp Sophie Hilgard Andrew Ross James Lennon Varun Jain Harshita Gupta David Li et al. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn.. In CIDR.

[43]

William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, et al. 2015. BetrFS: A Right-Optimized Write-Optimized File System.. In FAST, Vol. 15. 301--315.

[44]

10.1609/aaai.v33i01.33014039

[45]

10.1002/spe.2799

[46]

10.1145/3514221.3526167

[47]

10.1145/3183713.3196909

[48]

10.14778/3352063.3352129

[49]

Honghao Lin, Tian Luo, and DavidWoodruff. 2022. Learning augmented binary search trees. In International Conference on Machine Learning. PMLR, 13431--13440.

[50]

10.1145/3654978

Showing 50 of 91 references

Metrics

5

Citations

91

References

Details

Published: Sep 30, 2024
Vol/Issue: 2(4)
Pages: 1-26
License: View

Authors

W

Weiping Yu

Nanyang Technological University, Singapore, SG

S

Siqiang Luo

Nanyang Technological University, Singapore, SG

Z

Zihao Yu

Nanyang Technological University, Singapore, SG

G

Gao Cong

Nanyang Technological Univesity, Singapore, SG

Funding

Nanyang Technological University Award: 022029-00001

Cite This Article

Weiping Yu, Siqiang Luo, Zihao Yu, et al. (2024). CAMAL: Optimizing LSM-trees via Active Learning. Proceedings of the ACM on Management of Data, 2(4), 1-26. https://doi.org/10.1145/3677138

CAMAL: Optimizing LSM-trees via Active Learning

You May Also Like