journal article Jan 09, 2026

DiffGaze: A Diffusion Model for Modelling Fine-grained Human Gaze Behaviour on 360° Images

Abstract
Modelling human gaze behaviour on 360° images is important for various human–computer interaction applications. However, existing methods are limited to predicting discrete fixation sequences or aggregated saliency maps, thereby neglecting fine-grained gaze behaviour such as saccadic eye movements that can be captured by commercial eye-trackers. We introduce a more challenging task: fine-grained gaze sequence generation. This task aims to generate eye-tracker-like gaze data for given stimuli. We propose DiffGaze, a diffusion-based method for generating realistic and diverse fine-grained human gaze sequences conditioned on 360° images. We evaluate DiffGaze on two 360° image benchmarks for fine-grained gaze sequence generation as well as two downstream tasks, scanpath prediction and saliency prediction. Our evaluations show that DiffGaze outperforms the fine-grained gaze generation baselines in all tasks on both benchmarks. We also report a 21-participant survey study showing that our method generates gaze sequences that are indistinguishable from real human sequences. Taken together, our evaluations not only demonstrate the effectiveness of DiffGaze but also point towards a new generation of methods that faithfully model the rich spatial and temporal nature of natural human gaze behaviour.


Metrics
Citations: 3
References: 69
Details
Published: Jan 09, 2026
Vol/Issue: 16(1)
Pages: 1-23
Funding
Deutsche Forschungsgemeinschaft Award: 251654672—TRR 161
Swiss National Science Foundation Award: 214434
European Union’s Horizon Europe research and innovation funding programme Award: 101072410
Cite This Article
Chuhan Jiao, Yao Wang, Guanhua Zhang, et al. (2026). DiffGaze: A Diffusion Model for Modelling Fine-grained Human Gaze Behaviour on 360° Images. ACM Transactions on Interactive Intelligent Systems, 16(1), 1-23. https://doi.org/10.1145/3772075