journal article Open Access Apr 03, 2026

Hierarchical Deep Learning for File Fragment Classification

Electronics Vol. 15 No. 7 pp. 1507 · MDPI AG
DOI: 10.3390/electronics15071507
Abstract
File fragment classification is crucial in digital forensics because it aids the recovery and reconstruction of fragmented files that serve as key evidence. Deep learning techniques have advanced the field, but challenges remain, particularly in accounting for relationships between file types and in the granularity of classification. To address these challenges, we introduce a hierarchical classification approach that combines an agglomerative hierarchical clustering algorithm with a dynamic adjustment mechanism to optimize the distribution of categories among leaf nodes. A dedicated classifier is then developed for each leaf node, tailored to that node's characteristics. Experimental results on the FFT-75 dataset show that our method achieves 76.3% accuracy in the 75-class scenario with 512-byte blocks, surpassing existing approaches. The method thereby reduces the misclassification caused by an excessive number of classes.
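The grouping-then-routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the linkage, the dynamic adjustment mechanism, or the leaf classifiers. The sketch assumes byte-frequency histograms as features, average linkage, a hypothetical `max_leaf_size` cap standing in for the adjustment mechanism, and nearest-centroid rules standing in for the deep classifiers. The four "file types" are synthetic stand-ins, not real formats.

```python
import math
import random
from itertools import combinations


def byte_histogram(block):
    """Normalized 256-bin byte-frequency histogram of one fragment."""
    counts = [0] * 256
    for b in block:
        counts[b] += 1
    return [c / max(len(block), 1) for c in counts]


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_leaves(centroids, n_leaves, max_leaf_size):
    """Average-linkage agglomerative clustering over per-type centroids.

    max_leaf_size is a hypothetical cap (an assumption of this sketch)
    that keeps any single leaf from absorbing too many file types.
    """
    clusters = [[name] for name in sorted(centroids)]
    while len(clusters) > n_leaves:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if len(clusters[i]) + len(clusters[j]) > max_leaf_size:
                continue  # merging would overfill a leaf
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(distance(centroids[a], centroids[b]) for a, b in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        if best is None:
            break  # no remaining merge satisfies the cap
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


def classify(block, leaves, centroids):
    """Route a fragment to the nearest leaf, then to the nearest type inside it."""
    h = byte_histogram(block)
    leaf = min(leaves, key=lambda ms: distance(h, mean_vector([centroids[m] for m in ms])))
    return min(leaf, key=lambda m: distance(h, centroids[m]))


# Synthetic stand-ins for file types (not real formats).
ALPHABETS = {
    "text_a": b"abcdefghijklmnopqrstuvwxyz ",
    "text_b": b"0123456789,",
    "bin_a": bytes(range(256)),
    "bin_b": bytes(range(256)) + b"\xff" * 8,
}
rng = random.Random(0)


def make_block(name, size=512):
    """Draw one toy fragment of `size` bytes from the type's alphabet."""
    return bytes(rng.choice(ALPHABETS[name]) for _ in range(size))


centroids = {
    name: mean_vector([byte_histogram(make_block(name)) for _ in range(20)])
    for name in ALPHABETS
}
leaves = build_leaves(centroids, n_leaves=2, max_leaf_size=2)
print(leaves)
print(classify(make_block("text_b"), leaves, centroids))
```

On this toy data the two near-uniform types share one leaf and the two text-like types share the other, so the second-stage decision only has to separate types that resemble each other. In a real system each stage would be a trained model rather than a nearest-centroid rule.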
Details
Published: Apr 03, 2026
Vol/Issue: 15(7)
Pages: 1507
Funding: Cyberspace Security Discipline Award: Cyberspace Security Discipline
Cite This Article
Bailin Zou, Huiyi Liu (2026). Hierarchical Deep Learning for File Fragment Classification. Electronics, 15(7), 1507. https://doi.org/10.3390/electronics15071507