Łukasz Kaiser
OpenAI & CNRS
Verified email at openai.com
Title
Cited by
Year
Attention is all you need
A Vaswani
Advances in Neural Information Processing Systems, 2017
Cited by 136863, 2017
TensorFlow: Large-scale machine learning on heterogeneous systems
M Abadi, A Agarwal, P Barham, E Brevdo, Z Chen, C Citro, GS Corrado, ...
Cited by 31656*, 2015
Google’s neural machine translation system: Bridging the gap between human and machine translation
Y Wu
arXiv preprint arXiv:1609.08144, 2016
Cited by 8973, 2016
Gpt-4 technical report
J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ...
arXiv preprint arXiv:2303.08774, 2023
Cited by 4261, 2023
Evaluating large language models trained on code
M Chen, J Tworek, H Jun, Q Yuan, HPDO Pinto, J Kaplan, H Edwards, ...
arXiv preprint arXiv:2107.03374, 2021
Cited by 2952, 2021
Reformer: The efficient transformer
N Kitaev, Ł Kaiser, A Levskaya
arXiv preprint arXiv:2001.04451, 2020
Cited by 2704, 2020
Image transformer
N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran
International conference on machine learning, 4055-4064, 2018
Cited by 2006, 2018
Attention is all you need
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
Advances in neural information processing systems, 2017
Cited by 1913, 2017
Training verifiers to solve math word problems
K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ...
arXiv preprint arXiv:2110.14168, 2021
Cited by 1837, 2021
Attention is all you need
V Ashish
Advances in neural information processing systems 30, I, 2017
Cited by 1799, 2017
Rethinking attention with performers
K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ...
arXiv preprint arXiv:2009.14794, 2020
Cited by 1582, 2020
Attention Is All You Need.(Nips), 2017
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
arXiv preprint arXiv:1706.03762 10, S0140525X16001837, 2017
Cited by 1450, 2017
Regularizing neural networks by penalizing confident output distributions
G Pereyra, G Tucker, J Chorowski, Ł Kaiser, G Hinton
arXiv preprint arXiv:1701.06548, 2017
Cited by 1269, 2017
Grammar as a Foreign Language
O Vinyals
arXiv preprint arXiv:1412.7449, 2015
Cited by 1145, 2015
Model-based reinforcement learning for atari
L Kaiser, M Babaeizadeh, P Milos, B Osinski, RH Campbell, ...
arXiv preprint arXiv:1903.00374, 2019
Cited by 983, 2019
Generating wikipedia by summarizing long sequences
PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer
arXiv preprint arXiv:1801.10198, 2018
Cited by 983, 2018
Universal transformers
M Dehghani, S Gouws, O Vinyals, J Uszkoreit, Ł Kaiser
arXiv preprint arXiv:1807.03819, 2018
Cited by 968, 2018
Multi-task sequence to sequence learning
MT Luong, QV Le, I Sutskever, O Vinyals, L Kaiser
arXiv preprint arXiv:1511.06114, 2015
Cited by 964, 2015
Tensor2tensor for neural machine translation
A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ...
arXiv preprint arXiv:1803.07416, 2018
Cited by 637, 2018
Adding gradient noise improves learning for very deep networks
A Neelakantan, L Vilnis, QV Le, I Sutskever, L Kaiser, K Kurach, J Martens
arXiv preprint arXiv:1511.06807, 2015
Cited by 607, 2015