Hao Tan

Cited by

	All	Since 2019
Citations	4502	4441
h-index	19	19
i10-index	26	25

Citations per year

Year	2018	2019	2020	2021	2022	2023	2024
Citations	45	81	329	658	1201	1514	651

Public access

6 articles available
0 articles not available

Based on funding mandates

Co-authors

Mohit Bansal - Parker Distinguished Professor, Computer Science, UNC Chapel Hill - Verified email at cs.unc.edu
Trung H. Bui - Senior Research Scientist & Research Manager, Adobe Research - Verified email at adobe.com
Sheng Shen - UC Berkeley - Verified email at berkeley.edu
Licheng Yu 虞立成 - Research Scientist and Manager, Facebook AI - Verified email at fb.com
Jie Lei 雷杰 - Research Scientist, Meta AI - Verified email at fb.com
Jaemin Cho - PhD Student at UNC Chapel Hill - Verified email at cs.unc.edu
Zhewei Yao - Snowflake - Verified email at snowflake.com
Yicong Hong - Adobe Research - Verified email at anu.edu.au
Jialu Li - UNC Chapel Hill - Verified email at cs.unc.edu
Liunian Harold Li - University of California, Los Angeles - Verified email at cs.ucla.edu
Franck Dernoncourt - NLP/ML Researcher. MIT PhD. - Verified email at adobe.com
Hyounghun Kim - Ulsan National Institute of Science and Technology (UNIST) - Verified email at unist.ac.kr
Zhe L. Lin - Senior Principal Scientist, Adobe Research - Verified email at adobe.com
Yixin Nie - Meta, UNC Chapel Hill - Verified email at meta.com

Hao Tan

Adobe Research

Verified email at adobe.com

Vision and Language, 3D, Multimodal


Title	Cited by	Year
Lxmert: Learning cross-modality encoder representations from transformers - H Tan, M Bansal - Proceedings of the 2019 Conference on Empirical Methods in Natural Language …, 2019	2348	2019
Unifying vision-and-language tasks via text generation - J Cho, J Lei, H Tan, M Bansal - International Conference on Machine Learning, 1931-1942, 2021	441	2021
How much can clip benefit vision-and-language tasks? - S Shen, LH Li, H Tan, M Bansal, A Rohrbach, KW Chang, Z Yao, ... - arXiv preprint arXiv:2107.06383, 2021	355	2021
Learning to navigate unseen environments: Back translation with environmental dropout - H Tan, L Yu, M Bansal - arXiv preprint arXiv:1904.04195, 2019	295	2019
A joint speaker-listener-reinforcer model for referring expressions - L Yu, H Tan, M Bansal, TL Berg - Proceedings of the IEEE conference on computer vision and pattern …, 2017	287	2017
Vokenization: Improving language understanding with contextualized, visual-grounded supervision - H Tan, M Bansal - arXiv preprint arXiv:2010.06775, 2020	119	2020
Vimpac: Video pre-training via masked token prediction and contrastive learning - H Tan, J Lei, T Wolf, M Bansal - arXiv preprint arXiv:2106.11250, 2021	60	2021
Envedit: Environment editing for vision-and-language navigation - J Li, H Tan, M Bansal - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	57	2022
Lrm: Large reconstruction model for single image to 3d - Y Hong, K Zhang, J Gu, S Bi, Y Zhou, D Liu, F Liu, K Sunkavalli, T Bui, ... - arXiv preprint arXiv:2311.04400, 2023	55	2023
Enabling robots to understand incomplete natural language instructions using commonsense reasoning - H Chen, H Tan, A Kuntz, M Bansal, R Alterovitz - 2020 IEEE International Conference on Robotics and Automation (ICRA), 1963-1969, 2020	51	2020
Diagnosing the environment bias in vision-and-language navigation - Y Zhang, H Tan, M Bansal - arXiv preprint arXiv:2005.03086, 2020	50	2020
Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model - J Li, H Tan, K Zhang, Z Xu, F Luan, Y Xu, Y Hong, K Sunkavalli, ... - arXiv preprint arXiv:2311.06214, 2023	44	2023
Expressing visual relationships via language - H Tan, F Dernoncourt, Z Lin, T Bui, M Bansal - arXiv preprint arXiv:1906.07689, 2019	39	2019
The curse of performance instability in analysis datasets: Consequences, source, and suggestions - X Zhou, Y Nie, H Tan, M Bansal - arXiv preprint arXiv:2004.13606, 2020	38	2020
An Effective Framework for Weakly-Supervised Phrase Grounding - Q Wang, H Tan, S Shen, M Mahoney, Z Yao - Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020	34*	2020
Improving cross-modal alignment in vision language navigation via syntactic information - J Li, H Tan, M Bansal - arXiv preprint arXiv:2104.09580, 2021	32	2021
Dmv3d: Denoising multi-view diffusion using 3d large reconstruction model - Y Xu, H Tan, F Luan, S Bi, P Wang, J Li, Z Shi, K Sunkavalli, G Wetzstein, ... - arXiv preprint arXiv:2311.09217, 2023	28	2023
Vidlankd: Improving language understanding via video-distilled knowledge transfer - Z Tang, J Cho, H Tan, M Bansal - Advances in Neural Information Processing Systems 34, 24468-24481, 2021	26	2021
Modality-balanced models for visual dialogue - H Kim, H Tan, M Bansal - Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 8091-8098, 2020	22	2020
Documentclip: Linking figures and main body text in reflowed documents - F Liu, H Tan, C Tensmeyer - arXiv preprint arXiv:2306.06306, 2023	19	2023
