Ching-Hsiang Chu
Research Scientist, Meta/Facebook
Verified email at meta.com
Title
Cited by
Year
Software-hardware co-design for fast and scalable training of deep learning recommendation models
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ...
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Cited by: 69, Year: 2022
The MVAPICH project: Transforming research into high-performance MPI library for HPC community
DK Panda, H Subramoni, CH Chu, M Bayatpour
Journal of Computational Science 52, 101208, 2021
Cited by: 60, Year: 2021
Scalable distributed DNN training using TensorFlow and CUDA-aware MPI: Characterization, designs, and performance evaluation
AA Awan, J Bédorf, CH Chu, H Subramoni, DK Panda
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2019
Cited by: 55, Year: 2019
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018
Cited by: 51, Year: 2018
NV-Group: Link-efficient reduction for distributed deep learning on modern dense GPU systems
CH Chu, P Kousha, AA Awan, KS Khorassani, H Subramoni, DK Panda
Proceedings of the 34th ACM International Conference on Supercomputing, 1-12, 2020
Cited by: 38, Year: 2020
M. khorashadi, P
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ...
Bhattacharya, P. Lapukhov, M. Naumov, L. Qiao, M. Smelyanskiy, B. Jia, and V …, 2021
Cited by: 36, Year: 2021
OC-DNN: Exploiting advanced unified memory capabilities in CUDA 9 and Volta GPUs for out-of-core DNN training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
Cited by: 36, Year: 2018
High-performance, distributed training of large-scale deep learning recommendation models
D Mudigere, Y Hao, J Huang, A Tulloch, S Sridharan, X Liu, M Ozdal, ...
arXiv preprint arXiv:2104.05158, 2021
Cited by: 30, Year: 2021
Improving SCTP performance by jitter-based congestion control over wired-wireless networks
JM Chen, CH Chu, EHK Wu, MF Tsai, JR Wang
EURASIP Journal on Wireless Communications and Networking 2011, 1-13, 2011
Cited by: 27, Year: 2011
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 26, Year: 2016
Performance evaluation of MPI libraries on GPU-enabled OpenPOWER architectures: Early experiences
KS Khorassani, CH Chu, H Subramoni, DK Panda
High Performance Computing: ISC High Performance 2019 International …, 2019
Cited by: 25, Year: 2019
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by: 24, Year: 2015
Designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters
Q Zhou, C Chu, NS Kumar, P Kousha, SM Ghazimirsaeed, H Subramoni, ...
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021
Cited by: 23, Year: 2021
Efficient and scalable multi-source streaming broadcast on GPU clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
Cited by: 23, Year: 2017
Communication profiling and characterization of deep-learning workloads on clusters with high-performance interconnects
AA Awan, A Jain, CH Chu, H Subramoni, DK Panda
IEEE Micro 40 (1), 35-43, 2019
Cited by: 21, Year: 2019
Characterizing CUDA unified memory (UM)-aware MPI designs on modern GPU architectures
KV Manian, AA Ammar, A Ruhela, CH Chu, H Subramoni, DK Panda
Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 43-52, 2019
Cited by: 20, Year: 2019
Designing a profiling and visualization tool for scalable and in-depth analysis of high-performance GPU clusters
P Kousha, B Ramesh, KK Suresh, CH Chu, A Jain, N Sarkauskas, ...
2019 IEEE 26th International Conference on High Performance Computing, Data …, 2019
Cited by: 19, Year: 2019
IVC: Imperceptible video communication
R Carvalho, CH Chu, LJ Chen
Proc. of HotMobile (poster), 2014
Cited by: 18, Year: 2014
Distributed topology control for energy-efficient and reliable wireless communications
MT Sun, CH Chu, EHK Wu, CS Hsiao, AAK Jeng
IEEE Systems Journal 12 (3), 2152-2161, 2017
Cited by: 17, Year: 2017
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?
AA Awan, KV Manian, CH Chu, H Subramoni, DK Panda
Parallel Computing 85, 141-152, 2019
Cited by: 16, Year: 2019