Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Shuai Wang; Zhengyang Chen; Kong Aik Lee; Yanmin Qian; Haizhou Li

doi:10.1109/taslp.2024.3492793

journal article Jan 01, 2024

IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 32, pp. 4971-4998. Institute of Electrical and Electronics Engineers (IEEE)

Cited By (19)

An End-to-End Overview of Clinical Speech AI

Si-Ioi Ng, Lingfeng Xu · 2026

IEEE Transactions on Audio, Speech...

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-Spoofing

Tianchi Liu, Duc-Tuan Truong · 2025

IEEE Transactions on Information Fo...

Metrics

Citations: 19
References: 404

Details

Published: Jan 01, 2024
Volume: 32
Pages: 4971-4998

Authors

Shuai Wang

Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong, Shenzhen, China

Zhengyang Chen

Auditory Cognition and Computational Acoustics Lab, Department of Computer Science and Engineering and MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Kong Aik Lee

Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong

Yanmin Qian

Auditory Cognition and Computational Acoustics Lab, Department of Computer Science and Engineering and MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Haizhou Li

Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong, Shenzhen, China

Funding

Shenzhen Science and Technology Program Award: ZDSYS20230626091302006

Shanghai Municipal Science and Technology Commission Project Award: 2021SHZDZX0102

Shenzhen Science and Technology Research Fund Award: JCYJ20220818103001002

China NSFC projects Award: 62401377

Internal Project of Shenzhen Research Institute of Big Data Award: T00120220002

Cite This Article

Shuai Wang, Zhengyang Chen, Kong Aik Lee, et al. (2024). Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, 4971-4998. https://doi.org/10.1109/taslp.2024.3492793
