CSWin transformer: A general vision transformer backbone with cross-shaped windows X Dong, J Bao, D Chen, W Zhang, N Yu, L Yuan, D Chen, B Guo IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2021 | 1258 | 2021 |
Mobile-Former: Bridging MobileNet and transformer Y Chen, X Dai, D Chen, M Liu, X Dong, L Yuan, Z Liu IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2021 | 632 | 2021 |
ShareGPT4V: Improving large multi-modal models with better captions L Chen, J Li, X Dong, P Zhang, C He, J Wang, F Zhao, D Lin ECCV 2024, 2023 | 454 | 2023 |
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ... Science China Information Sciences 67 (12), 220101, 2024 | 352 | 2024 |
PeCo: Perceptual codebook for BERT pre-training of vision transformers X Dong, J Bao, T Zhang, D Chen, W Zhang, L Yuan, D Chen, F Wen, N Yu Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2021 | 264 | 2021 |
VLMEvalKit: An open-source toolkit for evaluating large multi-modality models H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu, X Dong, Y Zang, P Zhang, ... Proceedings of the 32nd ACM International Conference on Multimedia, 11198-11201, 2024 | 242* | 2024 |
InternLM2 technical report Z Cai, M Cao, H Chen, K Chen, K Chen, X Chen, X Chen, Z Chen, Z Chen, ... arXiv preprint arXiv:2403.17297, 2024 | 219 | 2024 |
InternLM-XComposer2: Mastering free-form text-image composition and comprehension in vision-language large model X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, X Wei, S Zhang, ... arXiv preprint arXiv:2401.16420, 2024 | 217 | 2024 |
InternLM: A multilingual language model with progressively enhanced capabilities ILM Team (2023-01-06)[2023-09-27]. https://github.com/InternLM/InternLM, 2023 | 201 | 2023 |
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition P Zhang, X Dong, B Wang, Y Cao, C Xu, L Ouyang, Z Zhao, S Ding, ... arXiv preprint arXiv:2309.15112, 2023 | 190 | 2023 |
Protecting Celebrities from DeepFake with Identity Consistency Transformer X Dong, J Bao, D Chen, T Zhang, W Zhang, N Yu, D Chen, F Wen, B Guo IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022 | 160 | 2022 |
Are We on the Right Way for Evaluating Large Vision-Language Models? L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen, H Duan, J Wang, Y Qiao, ... arXiv preprint arXiv:2403.20330, 2024 | 147 | 2024 |
MaskCLIP: Masked self-distillation advances contrastive language-image pretraining X Dong, J Bao, Y Zheng, T Zhang, D Chen, H Yang, M Zeng, W Zhang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 143 | 2023 |
OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation Q Huang, X Dong, P Zhang, B Wang, C He, J Wang, D Lin, W Zhang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 140 | 2024 |
LG-GAN: Label guided adversarial network for flexible targeted attack of point cloud based deep networks H Zhou, D Chen, J Liao, K Chen, X Dong, K Liu, W Zhang, G Hua, N Yu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 120 | 2020 |
InternLM-XComposer2-4KHD: A pioneering large vision-language model handling resolutions from 336 pixels to 4K HD X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, S Zhang, H Duan, ... NeurIPS 2024, 2024 | 112 | 2024 |
ShareGPT4Video: Improving video understanding and generation with better captions L Chen, X Wei, J Li, X Dong, P Zhang, Y Zang, Z Chen, H Duan, B Lin, ... arXiv preprint arXiv:2406.04325, 2024 | 90 | 2024 |
Beyond hallucinations: Enhancing LVLMs through hallucination-aware direct preference optimization Z Zhao, B Wang, L Ouyang, X Dong, J Wang, C He arXiv preprint arXiv:2311.16839, 2023 | 83 | 2023 |
Long-CLIP: Unlocking the long-text capability of CLIP B Zhang, P Zhang, X Dong, Y Zang, J Wang European Conference on Computer Vision, 310-325, 2024 | 81 | 2024 |
Shape-invariant 3D adversarial point clouds Q Huang, X Dong, D Chen, H Zhou, W Zhang, N Yu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 80 | 2022 |