Ammar Ahmad Awan
Microsoft
Verified email at osu.edu - Homepage
Title | Cited by | Year
S-Caffe: Co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters
AA Awan, K Hamidouche, JM Hashmi, DK Panda
ACM PPoPP '17 52 (8), 193-205, 2017
Cited by 170 | 2017
An in-depth performance characterization of CPU- and GPU-based DNN training on modern architectures
AA Awan, H Subramoni, DK Panda
Proceedings of the Machine Learning on HPC Environments, 1-8, 2017
Cited by 70 | 2017
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale
S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ...
International Conference on Machine Learning, 18332-18346, 2022
Cited by 58 | 2022
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning
AA Awan, K Hamidouche, A Venkatesh, DK Panda
Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016
Cited by 52 | 2016
Scalable distributed DNN training using TensorFlow and CUDA-aware MPI: Characterization, designs, and performance evaluation
AA Awan, J Bédorf, CH Chu, H Subramoni, DK Panda
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2019
Cited by 48 | 2019
Privacy-aware searching with oblivious term matching for cloud storage
Z Pervez, AA Awan, AM Khattak, S Lee, EN Huh
The Journal of Supercomputing 63, 538-560, 2013
Cited by 45 | 2013
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018
Cited by 44 | 2018
NV-Group: Link-efficient reduction for distributed deep learning on modern dense GPU systems
CH Chu, P Kousha, AA Awan, KS Khorassani, H Subramoni, DK Panda
Proceedings of the 34th ACM International Conference on Supercomputing, 1-12, 2020
Cited by 35 | 2020
Scalable and efficient MoE training for multitask multilingual models
YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ...
arXiv preprint arXiv:2109.10465, 2021
Cited by 34 | 2021
OC-DNN: Exploiting advanced unified memory capabilities in CUDA 9 and Volta GPUs for out-of-core DNN training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
Cited by 33 | 2018
1-bit Adam: Communication efficient large-scale training with Adam's convergence speed
H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ...
International Conference on Machine Learning, 10118-10129, 2021
Cited by 32 | 2021
Performance characterization of DNN training using TensorFlow and PyTorch on modern clusters
A Jain, AA Awan, Q Anthony, H Subramoni, DK Panda
2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-11, 2019
Cited by 28 | 2019
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training
A Jain, AA Awan, AM Aljuhani, JM Hashmi, QG Anthony, H Subramoni, ...
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by 26 | 2020
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by 24 | 2016
Efficient and scalable multi-source streaming broadcast on GPU clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
Cited by 23 | 2017
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by 22 | 2015
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for high-performance deep learning on Frontera
A Jain, AA Awan, H Subramoni, DK Panda
2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS), 76-83, 2019
Cited by 19 | 2019
Communication profiling and characterization of deep-learning workloads on clusters with high-performance interconnects
AA Awan, A Jain, CH Chu, H Subramoni, DK Panda
IEEE Micro 40 (1), 35-43, 2019
Cited by 19 | 2019
Designing non-blocking personalized collectives with near perfect overlap for RDMA-enabled clusters
H Subramoni, AA Awan, K Hamidouche, D Pekurovsky, A Venkatesh, ...
High Performance Computing: 30th International Conference, ISC High …, 2015
Cited by 18 | 2015
DeepSpeed Inference: Enabling efficient inference of transformer models at unprecedented scale
RY Aminabadi, S Rajbhandari, M Zhang, AA Awan, C Li, D Li, E Zheng, ...
arXiv preprint arXiv:2207.00032, 2022
Cited by 15 | 2022