Pan Zexu
Title
Cited by
Year
Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection
R Tao, Z Pan, RK Das, X Qian, MZ Shou, H Li
Proceedings of the 29th ACM international conference on multimedia, 3927-3935, 2021
Cited by: 135; Year: 2021
Multi-modal Attention for Speech Emotion Recognition
Z Pan, Z Luo, J Yang, H Li
Proc. Interspeech 2020, 364-368, 2020
Cited by: 71; Year: 2020
Selective listening by synchronizing speech with lips
Z Pan, R Tao, C Xu, H Li
IEEE/ACM Transactions on Audio, Speech and Language Processing 30, 1650-1664, 2022
Cited by: 34; Year: 2022
Muse: Multi-modal target speaker extraction with visual cues
Z Pan, R Tao, C Xu, H Li
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 33; Year: 2021
Multi-target DoA estimation with an audio-visual fusion mechanism
X Qian, M Madhavi, Z Pan, J Wang, H Li
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 32; Year: 2021
USEV: Universal speaker extraction with visual cue
Z Pan, M Ge, H Li
IEEE/ACM Transactions on Audio, Speech and Language Processing 30, 3032-3045, 2022
Cited by: 27; Year: 2022
Speaker Extraction with Co-Speech Gestures Cue
Z Pan, X Qian, H Li
IEEE Signal Processing Letters 29, 1467-1471, 2022
Cited by: 17; Year: 2022
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
J Li, M Ge, Z Pan, L Wang, J Dang
Proc. Interspeech 2022, 906-910, 2022
Cited by: 10; Year: 2022
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction
Z Pan, M Ge, H Li
Proc. Interspeech 2022, 2022
Cited by: 9; Year: 2022
Target active speaker detection with audio-visual cues
Y Jiang, R Tao, Z Pan, H Li
arXiv preprint arXiv:2305.12831, 2023
Cited by: 6; Year: 2023
Time-domain speech separation networks with graph encoding auxiliary
T Wang, Z Pan, M Ge, Z Yang, H Li
IEEE Signal Processing Letters 30, 110-114, 2023
Cited by: 6; Year: 2023
Is someone speaking
R Tao, Z Pan, RK Das, X Qian, MZ Shou, H Li
Proceedings of the 29th ACM International Conference on Multimedia, Oct, 2021
Cited by: 6; Year: 2021
Towards end-to-end speaker diarization in the wild
Z Pan, G Wichern, FG Germain, A Subramanian, J Le Roux
arXiv preprint arXiv:2211.01299, 2022
Cited by: 4; Year: 2022
NeuroHeed: Neuro-steered speaker extraction using EEG signals
Z Pan, M Borsdorf, S Cai, T Schultz, H Li
arXiv preprint arXiv:2307.14303, 2023
Cited by: 3; Year: 2023
Rethinking the visual cues in audio-visual speaker extraction
J Li, M Ge, Z Pan, R Cao, L Wang, J Dang, S Zhang
arXiv preprint arXiv:2306.02625, 2023
Cited by: 3; Year: 2023
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting
Z Pan, W Wang, M Borsdorf, H Li
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by: 3; Year: 2022
NeuroHeed+: Improving neuro-steered speaker extraction with joint auditory attention detection
Z Pan, G Wichern, FG Germain, S Khurana, J Le Roux
arXiv preprint arXiv:2312.07513, 2023
Cited by: 1; Year: 2023
Generation or Replication: Auscultating Audio Latent Diffusion Models
D Bralios, G Wichern, FG Germain, Z Pan, S Khurana, C Hori, J Le Roux
arXiv preprint arXiv:2310.10604, 2023
Cited by: 1; Year: 2023
LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism
Y Chen, X Qian, Z Pan, K Chen, H Li
arXiv preprint arXiv:2310.10497, 2023
Cited by: 1; Year: 2023
Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition
J Wang, Z Pan, M Zhang, RT Tan, H Li
Proceedings of the AAAI Conference on Artificial Intelligence 38 (17), 19144 …, 2024
Year: 2024