| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Exploring the limits of transfer learning with a unified text-to-text transformer | C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... | The Journal of Machine Learning Research 21 (1), 5485-5551, 2020 | 6860 | 2020 |
| Deep Speech 2: End-to-end speech recognition in English and Mandarin | D Amodei, S Ananthanarayanan, R Anubhai, J Bai, E Battenberg, C Case, ... | International Conference on Machine Learning, 173-182, 2016 | 3118 | 2016 |
| Mixed precision training | P Micikevicius, S Narang, J Alben, G Diamos, E Elsen, D Garcia, ... | arXiv preprint arXiv:1710.03740, 2017 | 1175 | 2017 |
| Deep Voice 3: Scaling text-to-speech with convolutional sequence learning | W Ping, K Peng, A Gibiansky, SO Arik, A Kannan, S Narang, J Raiman, ... | arXiv preprint arXiv:1710.07654, 2017 | 690* | 2017 |
| PaLM: Scaling language modeling with pathways | A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... | arXiv preprint arXiv:2204.02311, 2022 | 537 | 2022 |
| Deep learning scaling is predictable, empirically | J Hestness, S Narang, N Ardalani, G Diamos, H Jun, H Kianinejad, ... | arXiv preprint arXiv:1712.00409, 2017 | 399 | 2017 |
| Exploring sparsity in recurrent neural networks | S Narang, E Elsen, G Diamos, S Sengupta | arXiv preprint arXiv:1704.05119, 2017 | 288 | 2017 |
| DSD: Regularizing deep neural networks with dense-sparse-dense training flow | S Han, J Pool, S Narang, H Mao, S Tang, E Elsen, B Catanzaro, J Tran, ... | | 277* | 2016 |
| ByT5: Towards a token-free future with pre-trained byte-to-byte models | L Xue, A Barua, N Constant, R Al-Rfou, S Narang, M Kale, A Roberts, ... | Transactions of the Association for Computational Linguistics 10, 291-306, 2022 | 130 | 2022 |
| Block-sparse recurrent neural networks | S Narang, E Undersander, G Diamos | arXiv preprint arXiv:1711.02782, 2017 | 115 | 2017 |
| WT5?! Training text-to-text models to explain their predictions | S Narang, C Raffel, K Lee, A Roberts, N Fiedel, K Malkan | arXiv preprint arXiv:2004.14546, 2020 | 83 | 2020 |
| Scaling instruction-finetuned language models | HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, E Li, X Wang, ... | arXiv preprint arXiv:2210.11416, 2022 | 60 | 2022 |
| Do transformer modifications transfer across implementations and applications? | S Narang, HW Chung, Y Tay, W Fedus, T Fevry, M Matena, K Malkan, ... | arXiv preprint arXiv:2102.11972, 2021 | 47 | 2021 |
| Scaling up models and data with t5x and seqio | A Roberts, HW Chung, A Levskaya, G Mishra, J Bradbury, D Andor, ... | arXiv preprint arXiv:2203.17189, 2022 | 40 | 2022 |
| Scale efficiently: Insights from pre-training and fine-tuning transformers | Y Tay, M Dehghani, J Rao, W Fedus, S Abnar, HW Chung, S Narang, ... | arXiv preprint arXiv:2109.10686, 2021 | 33 | 2021 |
| Neural assistant: Joint action prediction, response generation, and latent knowledge reasoning | A Neelakantan, S Yavuz, S Narang, V Prasad, B Goodrich, D Duckworth, ... | arXiv preprint arXiv:1910.14613, 2019 | 15 | 2019 |
| Scaling laws vs model architectures: How does inductive bias influence scaling? | Y Tay, M Dehghani, S Abnar, HW Chung, W Fedus, J Rao, S Narang, ... | arXiv preprint arXiv:2207.10551, 2022 | 14 | 2022 |
| Exploring the limits of transfer learning with a unified text-to-text transformer | A Roberts, C Raffel, K Lee, M Matena, N Shazeer, PJ Liu, S Narang, W Li, ... | | 6 | 2019 |
| Predicting deep learning scaling | J Hestness, G Diamos, HW Jun, S Narang, N Ardalani, MMA Patwary, ... | US Patent App. 16/206,910, 2020 | 5 | 2020 |
| Systems and methods for neural text-to-speech using convolutional sequence learning | W Ping, K Peng, S Narang, A Kannan, A Gibiansky, J Raiman, ... | US Patent 10,796,686, 2020 | 4 | 2020 |