Pre-trained models: Past, present and future X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu, Y Yao, A Zhang, ... AI Open 2, 225-250, 2021 | 937 | 2021 |
Learning depth-guided convolutions for monocular 3D object detection M Ding, Y Huo, H Yi, Z Wang, J Shi, Z Lu, P Luo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 364 | 2020 |
Towards artificial general intelligence via a multimodal foundation model N Fei, Z Lu, Y Gao, G Yang, Y Huo, J Wen, H Lu, R Song, X Gao, T Xiang, ... Nature Communications 13 (1), 3094, 2022 | 238 | 2022 |
WenLan: Bridging vision and language by large-scale multi-modal pre-training Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ... arXiv preprint arXiv:2103.06561, 2021 | 145 | 2021 |
COTS: Collaborative two-stream vision-language pre-training model for cross-modal retrieval H Lu, N Fei, Y Huo, Y Gao, Z Lu, JR Wen Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 78 | 2022 |
VDT: General-purpose video diffusion transformers via mask modeling H Lu, G Yang, N Fei, Y Huo, Z Lu, P Luo, M Ding arXiv preprint arXiv:2305.13311, 2023 | 59* | 2023 |
Lightweight action recognition in compressed videos Y Huo, X Xu, Y Lu, Y Niu, M Ding, Z Lu, T Xiang, J Wen Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020 …, 2020 | 42* | 2020 |
UniAdapter: Unified parameter-efficient transfer learning for cross-modal modeling H Lu, Y Huo, G Yang, Z Lu, W Zhan, M Tomizuka, M Ding arXiv preprint arXiv:2302.06605, 2023 | 30 | 2023 |
Self-supervised video representation learning with constrained spatiotemporal jigsaw Y Huo, M Ding, H Lu, Z Lu, T Xiang, JR Wen, Z Huang, J Jiang, S Zhang, ... | 26* | 2021 |
Coarse-to-fine grained classification Y Huo, Y Lu, Y Niu, Z Lu, JR Wen Proceedings of the 42nd International ACM SIGIR Conference on Research and …, 2019 | 22 | 2019 |
Pre-trained models: Past, present and future X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu, A Zhang, L Zhang, ... AI Open 2, 225-250, 2021 | 22 | 2021 |
WenLan 2.0: Make AI imagine via a multimodal foundation model N Fei, Z Lu, Y Gao, G Yang, Y Huo, J Wen, H Lu, R Song, X Gao, T Xiang, ... arXiv, 2021 | 21 | 2021 |
Learning versatile neural architectures by propagating network codes M Ding, Y Huo, H Lu, L Yang, Z Wang, Z Lu, J Wang, P Luo arXiv preprint arXiv:2103.13253, 2021 | 16 | 2021 |
Cross-modal contrastive learning for generalizable and efficient image-text retrieval H Lu, Y Huo, M Ding, N Fei, Z Lu Machine Intelligence Research 20 (4), 569-582, 2023 | 12 | 2023 |
LGDN: Language-guided denoising network for video-language modeling H Lu, M Ding, N Fei, Y Huo, Z Lu Advances in Neural Information Processing Systems 35, 25198-25211, 2022 | 12 | 2022 |
Not all bug reopens are negative: A case study on Eclipse bug reports Q Mi, J Keung, Y Huo, S Mensah Information and Software Technology 99, 93-97, 2018 | 11 | 2018 |
Compressed video contrastive learning Y Huo, M Ding, H Lu, N Fei, Z Lu, JR Wen, P Luo Advances in Neural Information Processing Systems 34, 14176-14187, 2021 | 10 | 2021 |
Baichuan-Omni technical report Y Li, H Sun, M Lin, T Li, G Dong, T Zhang, B Ding, W Song, Z Cheng, ... arXiv preprint arXiv:2410.08565 3 (7), 2024 | 9 | 2024 |
Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs Z Zhao, H Lu, Y Huo, Y Du, T Yue, L Guo, B Wang, W Chen, J Liu arXiv preprint arXiv:2406.09367, 2024 | 8 | 2024 |
Towards Event-oriented Long Video Understanding Y Du, K Zhou, Y Huo, Y Li, WX Zhao, H Lu, Z Zhao, B Wang, W Chen, ... arXiv preprint arXiv:2406.14129, 2024 | 6 | 2024 |