| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Disentangled speaker and language representations using mutual information minimization and domain adaptation for cross-lingual TTS | D Xin, T Komatsu, S Takamichi, H Saruwatari | ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and … | 15 | 2021 |
| Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space | D Xin, Y Saito, S Takamichi, T Koriyama, H Saruwatari | Interspeech, 2947-2951 | 14 | 2020 |
| UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022 | T Saeki, D Xin, W Nakata, T Koriyama, S Takamichi, H Saruwatari | arXiv preprint arXiv:2204.02152 | 9 | 2022 |
| Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis | D Xin, Y Saito, S Takamichi, T Koriyama, H Saruwatari | Interspeech, 1614-1618 | 7 | 2021 |
| Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations | D Xin, S Takamichi, H Saruwatari | arXiv preprint arXiv:2206.10695 | 5 | 2022 |
| Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech | D Yang, T Koriyama, Y Saito, T Saeki, D Xin, H Saruwatari | arXiv preprint arXiv:2302.13652 | | 2023 |
| Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts | D Xin, S Adavanne, F Ang, A Kulkarni, S Takamichi, H Saruwatari | arXiv preprint arXiv:2211.02336 | | 2022 |
| Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models | A Watanabe, S Takamichi, Y Saito, D Xin, H Saruwatari | arXiv preprint arXiv:2210.09916 | | 2022 |
| Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation | D Xin, S Takamichi, T Okamoto, H Kawai, H Saruwatari | arXiv preprint arXiv:2204.10561 | | 2022 |
| Emotional Speech with Nonverbal Vocalizations: Corpus Design, Synthesis, and Detection | D Xin | | | |