Abstract
HTTP adaptive streaming (HAS) has emerged as a prevalent approach for over-the-top (OTT) video streaming services due to its ability to deliver a seamless user experience. A fundamental component of HAS is the bitrate ladder, which comprises a set of encoding parameters (e.g., bitrate-resolution pairs) used to encode the source video into multiple representations. The bitrate ladder enables the client's video player to adjust the quality of the video stream in real time as network conditions fluctuate, ensuring uninterrupted playback by selecting the most suitable representation for the available bandwidth. The most straightforward approach uses a fixed, one-size-fits-all bitrate ladder for all videos, consisting of pre-determined bitrate-resolution pairs. Conversely, the most reliable technique exhaustively encodes all resolutions over a wide range of bitrates to build the convex hull, then optimizes the bitrate ladder by selecting representations from the convex hull for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly, exhaustive search encoding. This article provides a comprehensive review of convex hull prediction methods, including both conventional and learning-based approaches. Furthermore, we conduct a benchmark study of several handcrafted- and deep learning (DL)-based approaches for predicting content-optimized convex hulls across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded with software and hardware encoders using three state-of-the-art video standards (AVC/H.264, HEVC/H.265, and VVC/H.266) at various bitrate points. Our analysis provides valuable insights and establishes baseline performance for future research in this field (Dataset URL: https://nasext-vaader.insa-rennes.fr/ietr-vaader/datasets/br_ladder).
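The exhaustive-search baseline described in the abstract can be illustrated with a minimal sketch. It assumes each trial encode is summarized as a (bitrate, quality, resolution) triple, where quality is any higher-is-better score, and it uses a standard monotone-chain upper hull. The function name, the data layout, and the sample numbers are hypothetical and chosen for illustration only; the article's own procedure may differ in its details.

```python
from typing import List, Tuple

# (bitrate_kbps, quality_score, resolution_label); layout is an assumption.
Point = Tuple[float, float, str]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o) in the bitrate-quality plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points: List[Point]) -> List[Point]:
    """Return the rate-quality upper convex hull across all resolutions."""
    # At each bitrate keep only the best-quality encode (first one wins ties).
    best_at = {}
    for p in points:
        if p[0] not in best_at or p[1] > best_at[p[0]][1]:
            best_at[p[0]] = p
    pts = sorted(best_at.values(), key=lambda p: p[0])

    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the chord
        # from the point before it to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    if not hull:
        return []

    # Spending more bits should never lower quality: cut after the peak.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]


# Made-up trial encodes for one video (illustrative values, not real data).
points = [
    (500, 60, "540p"), (1000, 64, "540p"), (2000, 70, "540p"),
    (1000, 62, "1080p"), (2000, 85, "1080p"), (4000, 92, "1080p"),
    (2000, 78, "2160p"), (4000, 90, "2160p"), (8000, 96, "2160p"),
]
hull = upper_convex_hull(points)
for bitrate, quality, resolution in hull:
    print(f"{bitrate:>5} kbps  {resolution:>6}  quality={quality}")
```

In this toy input the 1000 kbps encodes fall below the chord between the 500 kbps and 2000 kbps points, so they are excluded from the hull. A content-specific ladder would then be chosen from the remaining hull points, whereas a prediction method would aim to estimate those points without running every trial encode.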
Details
Published: Jul 18, 2025
Vol/Issue: 21(7)
Pages: 1-23
Funding: Région Bretagne under the DEEPTEC project
Cite This Article
Ahmed Telili, Wassim Hamidouche, Hadi Amirpour, et al. (2025). Convex Hull Prediction Methods for Bitrate Ladder Construction: Design, Evaluation, and Comparison. ACM Transactions on Multimedia Computing, Communications, and Applications, 21(7), 1-23. https://doi.org/10.1145/3723006