journal article Open Access Jan 23, 2025

Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models Through Question Answering from Text to Video

Electronics Vol. 14 No. 3 pp. 461 · MDPI AG
DOI: 10.3390/electronics14030461
Abstract
Understanding sports presents a fascinating challenge for Natural Language Processing (NLP) due to its intricate and ever-changing nature. Current NLP technologies struggle with the advanced cognitive demands of reasoning over complex sports scenarios. To explore the current boundaries of this field, we extensively evaluated mainstream and emerging large models on a variety of sports tasks and addressed the limitations of previous benchmarks. Our study ranges from answering simple queries about basic rules and historical facts to complex, context-specific reasoning that uses strategies such as few-shot learning and chain-of-thought techniques. Beyond text-based analysis, we also explored the sports reasoning capabilities of mainstream video language models to bridge the gap in benchmarking multimodal sports understanding. Based on a comprehensive overview of mainstream large models across diverse sports understanding tasks, we presented a new benchmark that highlights the critical challenges sports understanding poses for NLP and the varying capabilities of state-of-the-art large models. We also provided an extensive set of error analyses that point to detailed defects in large-model reasoning which model-based error analysis failed to reveal. We hope the benchmark and the error analysis set will help identify future research priorities in this field.
Cite This Article
Zhengbang Yang, Haotian Xia, Jingxi Li, et al. (2025). Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models Through Question Answering from Text to Video. Electronics, 14(3), 461. https://doi.org/10.3390/electronics14030461