SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities D Zhang, S Li, X Zhang, J Zhan, P Wang, Y Zhou, X Qiu arXiv preprint arXiv:2305.11000, 2023 | 78 | 2023 |
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models X Zhang, D Zhang, S Li, Y Zhou, X Qiu arXiv preprint arXiv:2308.16692, 2023 | 13 | 2023 |
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling J Zhan, J Dai, J Ye, Y Zhou, D Zhang, Z Liu, X Zhang, R Yuan, G Zhang, ... arXiv preprint arXiv:2402.12226, 2024 | 6 | 2024 |
SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models X Zhang, D Zhang, S Li, Y Zhou, X Qiu | 1 | |
SpeechAlign: Aligning Speech Generation to Human Preferences D Zhang, Z Li, S Li, X Zhang, P Wang, Y Zhou, X Qiu arXiv preprint arXiv:2404.05600, 2024 | | 2024 |
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation D Zhang, X Zhang, J Zhan, S Li, Y Zhou, X Qiu arXiv preprint arXiv:2401.13527, 2024 | | 2024 |
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems D Zhang, Z Li, P Wang, X Zhang, Y Zhou, X Qiu arXiv preprint arXiv:2401.03945, 2024 | | 2024 |