| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Neural Speech Synthesis with Transformer Network | N Li, S Liu, Y Liu, S Zhao, M Liu | Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 6706-6713, 2019 | 511 | 2019 |
| Close to Human Quality TTS with Transformer | N Li, S Liu, Y Liu, S Zhao, M Liu, M Zhou | arXiv preprint arXiv:1809.08895, 2018 | 100 | 2018 |
| AdaSpeech: Adaptive Text to Speech for Custom Voice | M Chen, X Tan, B Li, Y Liu, T Qin, S Zhao, TY Liu | arXiv preprint arXiv:2103.00993, 2021 | 71 | 2021 |
| Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability | J Li, R Zhao, Z Meng, Y Liu, W Wei, S Parthasarathy, V Mazalov, Z Wang, ... | arXiv preprint arXiv:2007.15188, 2020 | 66 | 2020 |
| NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | X Tan, J Chen, H Liu, J Cong, C Zhang, Y Liu, X Wang, Y Leng, Y Yi, L He, ... | arXiv preprint arXiv:2205.04421, 2022 | 19 | 2022 |
| DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 | Y Liu, Z Xu, G Wang, K Chen, B Li, X Tan, J Li, L He, S Zhao | arXiv preprint arXiv:2110.12612, 2021 | 17 | 2021 |
| RobuTrans: A Robust Transformer-Based Text-to-Speech Model | N Li, Y Liu, Y Wu, S Liu, S Zhao, M Liu | Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 8228-8235, 2020 | 17 | 2020 |
| MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search | N Li, S Liu, Y Liu, S Zhao, M Liu, M Zhou | arXiv preprint arXiv:2005.08528, 2020 | 8 | 2020 |
| A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems | X Wang, Y Liu, S Zhao, J Li | arXiv preprint arXiv:2108.07493, 2021 | 5 | 2021 |
| Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech | G Zhang, K Song, X Tan, D Tan, Y Yan, Y Liu, G Wang, W Zhou, T Qin, ... | arXiv preprint arXiv:2203.17190, 2022 | 4 | 2022 |
| DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders | Y Liu, R Xue, L He, X Tan, S Zhao | arXiv preprint arXiv:2207.04646, 2022 | 3 | 2022 |
| RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion | D Yin, C Tang, Y Liu, X Wang, Z Zhao, Y Zhao, Z Xiong, S Zhao, C Luo | arXiv preprint arXiv:2206.13865, 2022 | 1 | 2022 |
| Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu, Z Chen, Y Liu, H Wang, ... | arXiv preprint arXiv:2301.02111, 2023 | | 2023 |
| Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems | X Wang, Y Liu, J Li, V Miljanic, S Zhao, H Khalil | IEEE/ACM Transactions on Audio, Speech, and Language Processing 30, 3089-3097, 2022 | | 2022 |
| Enhancing Monotonicity for Robust Autoregressive Transformer TTS | X Liang, Z Wu, R Li, Y Liu, S Zhao, H Meng | INTERSPEECH, 3181-3185, 2020 | | 2020 |