journal article Open Access Jun 07, 2024

An Efficient Transformer–CNN Network for Document Image Binarization

Electronics Vol. 13 No. 12 pp. 2243 · MDPI AG
DOI: 10.3390/electronics13122243
Abstract
Color image binarization plays a pivotal role in image preprocessing and significantly affects subsequent tasks, particularly text recognition. This paper concentrates on document image binarization (DIB), which aims to separate an image into foreground (text) and background (non-text content). We analyze conventional and deep-learning-based approaches and conclude that prevailing DIB methods rely on deep learning. Furthermore, we examine the network's receptive fields before and after training to underscore the advantages of the Transformer model. We then introduce a lightweight model based on the U-Net structure and enhanced with the MobileViT module to better capture global features in document images. Because it learns both local and global features, the proposed model demonstrates competitive performance on two standard datasets (DIBCO2012 and DIBCO2017) and good robustness on the DIBCO2019 dataset. Notably, the proposed method is a straightforward end-to-end model that requires no additional image preprocessing or post-processing and does not use ensemble models. Moreover, its parameter count is less than one-eighth that of the model that achieves the best results on most DIBCO datasets. Finally, two sets of ablation experiments are conducted to verify the effectiveness of the proposed binarization model.
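To make the task concrete, the following is a minimal sketch of a conventional global-threshold baseline (Otsu's method) written with NumPy. It is not the model proposed in the paper. It only illustrates what separating foreground from background means at the pixel level. The function names `otsu_threshold` and `binarize`, and the assumption that the input is an 8-bit grayscale array with dark text on a lighter background, are illustrative choices and do not come from the article.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level t that maximizes the between-class variance.

    Assumes `gray` is a uint8 array. Pixels with value <= t form one class,
    pixels with value > t form the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()

    omega = np.cumsum(prob)                   # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]                         # mean of the whole image

    numerator = (mu_total * omega - mu) ** 2
    denominator = omega * (1.0 - omega)

    # Thresholds that leave one class (almost) empty get zero variance.
    sigma_b = np.zeros(256, dtype=np.float64)
    valid = denominator > 1e-12
    sigma_b[valid] = numerator[valid] / denominator[valid]

    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Return a boolean mask where True marks foreground (assumed dark text)."""
    return gray <= otsu_threshold(gray)
```

A single global threshold like this tends to struggle with stains, bleed-through, and uneven illumination, which are the kinds of degradation that motivate learned models able to combine local and global context.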
Cite This Article
Lina Zhang, Kaiyuan Wang, Yi Wan (2024). An Efficient Transformer–CNN Network for Document Image Binarization. Electronics, 13(12), 2243. https://doi.org/10.3390/electronics13122243