References (84)
[1] Baoguang Shi, Xiang Bai, Cong Yao. "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition." IEEE Transactions on Pattern Analysis and Machine... DOI: 10.1109/tpami.2016.2646371
[6] Yongkun Du, Zhineng Chen, Caiyan Jia et al. "Context Perception Parallel Decoder for Scene Text Recognition." IEEE Transactions on Pattern Analysis and Machine... DOI: 10.1109/tpami.2025.3545453
[16] Jonathan Long, Evan Shelhamer, Trevor Darrell. "Fully convolutional networks for semantic segmentation." 2015 IEEE Conference on Computer Vision and Patter... DOI: 10.1109/cvpr.2015.7298965
[17] Ross Girshick, Jeff Donahue, Trevor Darrell et al. "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation." 2014 IEEE Conference on Computer Vision and Patter... DOI: 10.1109/cvpr.2014.81
[18] Tsung-Yi Lin, Michael Maire, Serge Belongie et al. "Microsoft COCO: Common Objects in Context." Lecture Notes in Computer Science. DOI: 10.1007/978-3-319-10602-1_48
[19] Olga Russakovsky, Jia Deng, Hao Su et al. "ImageNet Large Scale Visual Recognition Challenge." International Journal of Computer Vision. DOI: 10.1007/s11263-015-0816-y
[20] Radford. "Learning transferable visual models from natural language supervision."
[21] Liu. "Visual instruction tuning."
[22] Liu. "TextMonkey: An OCR-free large multimodal model for understanding document." 2024.
[23] Li. "PP-OCRv3: More attempts for the improvement of ultra lightweight OCR system." 2022.
[27] Alex Graves, Santiago Fernández, Faustino Gomez et al. "Connectionist temporal classification." Proceedings of the 23rd international conference o... DOI: 10.1145/1143844.1143891
[33] Tianlun Zheng, Zhineng Chen, Shancheng Fang et al. "CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition." International Journal of Computer Vision. DOI: 10.1007/s11263-023-01880-0
[34] Bleeker. "Bidirectional scene text recognition with a single decoder."
[36] Jia. "Scaling up visual and vision-language representation learning with noisy text supervision."
[38] Lai. "Instruction-Following speech recognition." 2023.
[39] Wang. "OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework."
[42] Deshmukh. "Pengi: An audio language model for audio tasks."
[43] Zhu. "MiniGPT-4: Enhancing vision-language understanding with advanced large language models." 2023.
[44] Gu. "A systematic survey of prompt engineering on vision-language foundation models." 2023.
[45] Alayrac. "Flamingo: A visual language model for few-shot learning."
[49] Jaderberg. "Synthetic data and artificial neural networks for natural scene text recognition." 2014.
[50] Max Jaderberg, Karen Simonyan, Andrea Vedaldi et al. "Reading Text in the Wild with Convolutional Neural Networks." International Journal of Computer Vision. DOI: 10.1007/s11263-015-0823-z

Metrics
Citations: 17
References: 84
Details
Published: Apr 01, 2025
Vol/Issue: 47(4)
Pages: 2723-2738
Funding
National Natural Science Foundation of China Award: 32341012
National Key R&D Program of China Award: 2022YFB3104703
Cite This Article
Yongkun Du, Zhineng Chen, Yuchen Su, et al. (2025). Instruction-Guided Scene Text Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(4), 2723-2738. https://doi.org/10.1109/tpami.2025.3525526