Biography

I received B.Eng. and M.Sc. degrees from Harbin Engineering University in 2012 and 2015, respectively, and a Ph.D. degree in Electronic and Information Engineering from The Hong Kong Polytechnic University in 2022. I am now a postdoctoral fellow at The Hong Kong Polytechnic University. My research interests include speaker recognition and speech deepfake detection.

Education

Doctor of Philosophy in Electronic and Information Engineering (Speaker Recognition)
The Hong Kong Polytechnic University, Hong Kong SAR, Sep. 2018–Apr. 2022
Thesis title: Deep Speaker Embedding for Robust Speaker Verification
Master of Engineering in Underwater Acoustic Engineering (Acoustic Signal Processing)
Harbin Engineering University, China, Aug. 2012–Mar. 2015
Bachelor of Engineering in Electronic and Information Engineering
Harbin Engineering University, China, Aug. 2008–Jun. 2012

Work Experience

Postdoctoral Fellow, The Hong Kong Polytechnic University, Dec. 2022–Present
Research Associate, The Hong Kong Polytechnic University, Nov. 2021–Feb. 2022
Research Assistant, The Hong Kong Polytechnic University, Oct. 2017–Aug. 2018

Publications

Journal

Youzhi Tu, Man-Wai Mak, Kong-Aik Lee, and Weiwei Lin, “ConFusionformer: Locality-enhanced Conformer Through Multi-resolution Attention Fusion for Speaker Verification,” Neurocomputing, vol. 644, 2025.
Zezhong Jin, Youzhi Tu, ChongXin Gan, Man-Wai Mak, and Kong-Aik Lee, “Adversarially Adaptive Temperatures for Decoupled Knowledge Distillation With Applications to Speaker Verification,” Neurocomputing, vol. 624, 2025.
Youzhi Tu, Man-Wai Mak, and Jen-Tzung Chien, “Contrastive Self-Supervised Speaker Embedding With Sequential Disentanglement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2704–2715, 2024.
Youzhi Tu, Weiwei Lin, and Man-Wai Mak, “A Survey on Text-Dependent and Text-Independent Speaker Verification,” IEEE Access, vol. 10, pp. 99038–99049, 2022.
Youzhi Tu and Man-Wai Mak, “Aggregating Frame-Level Information in the Spectral Domain With Self-Attention for Speaker Embedding,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 944–957, 2022.
Youzhi Tu, Man-Wai Mak, and Jen-Tzung Chien, “Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2013–2024, 2020.

Conference

ChongXin Gan, Youzhi Tu, Zezhong Jin, Man-Wai Mak, and Kong-Aik Lee, “Grouped Knowledge Distillation with Adaptive Logit Softening for Speaker Recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025.
Zezhong Jin, Youzhi Tu, Zhe Li, Zilong Huang, ChongXin Gan, and Man-Wai Mak, “Denoising Student Features with Diffusion Models for Knowledge Distillation in Speaker Verification,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025.
Zezhong Jin, Youzhi Tu, and Man-Wai Mak, “W-GVKT: Within-Global-View Knowledge Transfer for Speaker Verification,” in Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024, pp. 3779–3783.
Zezhong Jin, Youzhi Tu, and Man-Wai Mak, “Self-Supervised Learning with Multi-Head Multi-Mode Knowledge Distillation for Speaker Verification,” in Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024, pp. 4723–4727.
Youzhi Tu, Man-Wai Mak, and Jen-Tzung Chien, “Contrastive Speaker Embedding with Sequential Disentanglement,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, pp. 10891–10895.
Lishi Zuo, Man-Wai Mak, and Youzhi Tu, “Promoting Independence of Depression and Speaker Features for Speaker Disentanglement in Speech-Based Depression Detection,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, pp. 10191–10195.
Weiwei Lin, ChenHang He, Man-Wai Mak, and Youzhi Tu, “Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations,” in Proc. International Conference on Machine Learning (ICML), 2023, pp. 21065–21077.
Youzhi Tu and Man-Wai Mak, “Mutual Information Enhanced Training for Speaker Embedding,” in Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), 2021, pp. 91–95.
Youzhi Tu and Man-Wai Mak, “Short-time Spectral Aggregation for Speaker Embedding,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021, pp. 6708–6712.
Youzhi Tu, Man-Wai Mak, and Jen-Tzung Chien, “Information Maximized Variational Domain Adversarial Learning for Speaker Verification,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020, pp. 6449–6453.
Youzhi Tu, Man-Wai Mak, and Jen-Tzung Chien, “Variational Domain Adversarial Learning for Speaker Verification,” in Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), 2019, pp. 4315–4319.