Look, listen, and act: Towards audio-visual embodied navigation. C Gan, Y Zhang, J Wu, B Gong, JB Tenenbaum. 2020 IEEE International Conference on Robotics and Automation (ICRA), 9701-9707, 2020 | 165 | 2020 |
Factorized Multimodal Transformer for Multimodal Sequential Learning. A Zadeh, C Mao, K Shi, Y Zhang, PP Liang, S Poria, LP Morency. arXiv preprint arXiv:1911.09826, 2019 | 60 | 2019 |
Watch, Reason and Code: Learning to Represent Videos Using Program. X Duan, Q Wu, C Gan, Y Zhang, W Huang, A van den Hengel, W Zhu. Proceedings of the 27th ACM International Conference on Multimedia, 1543-1551, 2019 | 8 | 2019 |
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors. J Sun, C Wang, J Wang, Y Zhang, C Xiao. arXiv preprint arXiv:2405.10529, 2024 | 3 | 2024 |