Journal article, Open Access, published Jan 01, 2025

Swin2‐MoSE: A new single image supersolution model for remote sensing

DOI: 10.1049/ipr2.13303
Abstract

Due to the limitations of current optical and sensor technologies and the high cost of updating them, the spectral and spatial resolution of satellites may not always meet the desired requirements. For these reasons, Remote‐Sensing Single‐Image Super‐Resolution (RS‐SISR) techniques have gained significant interest. In this paper, the Swin2‐MoSE model, an enhanced version of Swin2SR, is proposed. The model introduces MoE‐SM, an enhanced Mixture‐of‐Experts (MoE) layer that replaces the feed‐forward layer inside every Transformer block. MoE‐SM is designed with Smart‐Merger, a new layer for merging the outputs of individual experts, and with a new way of splitting the work between experts: a per‐example strategy instead of the commonly used per‐token one. Furthermore, the interaction between positional encodings is analyzed, demonstrating that per‐channel bias and per‐head bias can cooperate positively. The authors also propose a combination of Normalized‐Cross‐Correlation (NCC) and Structural Similarity Index Measure (SSIM) losses to avoid the typical limitations of the MSE loss. Experimental results demonstrate that Swin2‐MoSE outperforms all Swin‐derived models by up to 0.377–0.958 dB (PSNR) on the task of …, … and … resolution upscaling (… and OLI2MSI datasets). It also outperforms state‐of‐the‐art (SOTA) models by a good margin, proving competitive and showing strong potential, especially for complex tasks. An analysis of computational costs is also performed. Finally, the efficacy of Swin2‐MoSE is shown by applying it to a semantic segmentation task (SeasoNet dataset). Code and pretrained models are available at:
https://github.com/IMPLabUniPr/swin2-mose/tree/official_code
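To make the per‐example versus per‐token distinction in the abstract concrete, here is a minimal NumPy sketch of a sparse MoE layer used in place of a Transformer feed‐forward block. It is an illustration under stated assumptions, not the authors' MoE‐SM: the class name MoEFeedForward, the parameters top_k and per_example, the choice to route each image by its mean token, and the plain gate‐weighted sum used to merge expert outputs are all hypothetical stand‐ins (Smart‐Merger is not reproduced). See the linked repository for the actual implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf get probability 0.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MoEFeedForward:
    """Hypothetical sparse MoE used in place of a Transformer feed-forward layer."""

    def __init__(self, dim, hidden, num_experts, top_k, per_example, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        self.per_example = per_example
        # Router weights: one logit per expert.
        self.w_gate = rng.standard_normal((dim, num_experts)) * 0.02
        # Each expert is a two-layer MLP (dim -> hidden -> dim) with ReLU.
        self.w1 = rng.standard_normal((num_experts, dim, hidden)) * 0.02
        self.w2 = rng.standard_normal((num_experts, hidden, dim)) * 0.02

    def route(self, x):
        """x: (batch, tokens, dim) -> gate weights of shape (batch, tokens, experts)."""
        if self.per_example:
            # Per-example (assumed form): one routing decision per image, computed
            # from its mean token and shared by every token of that image.
            logits = x.mean(axis=1, keepdims=True) @ self.w_gate  # (B, 1, E)
            logits = np.broadcast_to(logits, x.shape[:2] + logits.shape[-1:])
        else:
            # Per-token: every token picks its own experts.
            logits = x @ self.w_gate  # (B, T, E)
        # Keep the top-k logits, mask the rest with -inf, renormalise with softmax.
        kth = np.sort(logits, axis=-1)[..., -self.top_k][..., None]
        masked = np.where(logits >= kth, logits, -np.inf)
        return softmax(masked)

    def __call__(self, x):
        gates = self.route(x)  # (B, T, E)
        # For clarity every expert runs on every token; an efficient MoE would
        # dispatch only the tokens (or images) routed to each expert.
        h = np.maximum(np.einsum("btd,edh->beth", x, self.w1), 0.0)
        y = np.einsum("beth,ehd->betd", h, self.w2)  # (B, E, T, D)
        # Merge: gate-weighted sum of expert outputs (a stand-in, not Smart-Merger).
        return np.einsum("bte,betd->btd", gates, y)


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((2, 16, 8))  # 2 images, 16 tokens, 8 channels
    moe = MoEFeedForward(dim=8, hidden=32, num_experts=4, top_k=2, per_example=True)
    print(moe(x).shape)  # (2, 16, 8)
```

With per_example=True, all tokens of an image share one gate vector, so the whole image is handled by the same subset of experts; with per_example=False, each token is routed independently, which is the more common per‐token scheme the abstract contrasts against.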
Metrics
Citations: 7
References: 52

Details
Published: Jan 01, 2025
Vol/Issue: 19(1)

Cite This Article
Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini, et al. (2025). Swin2‐MoSE: A new single image supersolution model for remote sensing. IET Image Processing, 19(1). https://doi.org/10.1049/ipr2.13303