journal article Open Access Dec 01, 2025

Training‐free few‐shot construction tool and material detection using pre‐trained vision‐language model

DOI: 10.1111/mice.70129
Metrics
Citations: 1
References: 67
Details
Published: Dec 01, 2025
Vol/Issue: 40(30)
Pages: 6004-6023
Funding: National Natural Science Foundation of China, Award 72201226
Cite This Article
Zhaoxin Zhang, Yantao Yu, Zaolin Pan, et al. (2025). Training‐free few‐shot construction tool and material detection using pre‐trained vision‐language model. Computer-Aided Civil and Infrastructure Engineering, 40(30), 6004-6023. https://doi.org/10.1111/mice.70129