Tomasz Korbak
Anthropic
Verified email at anthropic.com
Title · Cited by · Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by 115 · 2023
Pretraining language models with human preferences
T Korbak, K Shi, A Chen, RV Bhalerao, C Buckley, J Phang, SR Bowman, ...
International Conference on Machine Learning, 17506-17533, 2023
Cited by 69 · 2023
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ...
arXiv preprint arXiv:2309.12288, 2023
Cited by 55* · 2023
Inverse Scaling: When Bigger Isn't Better
IR McKenzie, A Lyzhov, M Pieler, A Parrish, A Mueller, A Prabhu, ...
arXiv preprint arXiv:2306.09479, 2023
Cited by 51* · 2023
Training language models with language feedback at scale
J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez
arXiv preprint arXiv:2303.16755, 2023
Cited by 51 · 2023
Computational enactivism under the free energy principle
T Korbak
Synthese 198 (3), 2743-2763, 2021
Cited by 31 · 2021
Improving code generation by training with natural language feedback
A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, ...
arXiv preprint arXiv:2303.16749, 2023
Cited by 27 · 2023
Aligning language models with preferences through f-divergence minimization
D Go, T Korbak, G Kruszewski, J Rozen, N Ryu, M Dymetman
arXiv preprint arXiv:2302.08215, 2023
Cited by 25 · 2023
Towards understanding sycophancy in language models
M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...
arXiv preprint arXiv:2310.13548, 2023
Cited by 23 · 2023
RL with KL penalties is better viewed as Bayesian inference
T Korbak, E Perez, CL Buckley
arXiv preprint arXiv:2205.11275, 2022
Cited by 23 · 2022
On reinforcement learning and distribution matching for fine-tuning language models with no catastrophic forgetting
T Korbak, H Elsahar, G Kruszewski, M Dymetman
Advances in Neural Information Processing Systems 35, 16203-16220, 2022
Cited by 20 · 2022
Controlling conditional language models without catastrophic forgetting
T Korbak, H Elsahar, G Kruszewski, M Dymetman
International Conference on Machine Learning, 11499-11528, 2022
Cited by 17 · 2022
Interaction history as a source of compositionality in emergent communication
T Korbak, J Zubek, Ł Kuciński, P Miłoś, J Rączaszek-Leonardi
Interaction Studies 22 (2), 212-243, 2021
Cited by 16* · 2021
Taken out of context: On measuring situational awareness in LLMs
L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ...
arXiv preprint arXiv:2309.00667, 2023
Cited by 15* · 2023
Catalytic role of noise and necessity of inductive biases in the emergence of compositional communication
Ł Kuciński, T Korbak, P Kołodziej, P Miłoś
Advances in Neural Information Processing Systems 34, 23075-23088, 2021
Cited by 13 · 2021
Scaffolded minds and the evolution of content in signaling pathways
T Korbak
Studies in Logic, Grammar and Rhetoric 41 (1), 89-103, 2015
Cited by 11 · 2015
Measuring non-trivial compositionality in emergent communication
T Korbak, J Zubek, J Rączaszek-Leonardi
arXiv preprint arXiv:2010.15058, 2020
Cited by 8 · 2020
Exploiting unsupervised pre-training and automated feature engineering for low-resource hate speech detection in Polish
R Korzeniowski, R Rolczyński, P Sadownik, T Korbak, M Możejko
arXiv preprint arXiv:1906.09325, 2019
Cited by 8 · 2019
Energy-based models for code generation under compilability constraints
T Korbak, H Elsahar, M Dymetman, G Kruszewski
arXiv preprint arXiv:2106.04985, 2021
Cited by 7 · 2021
Fine-tuning Tree-LSTM for phrase-level sentiment classification on a Polish dependency treebank
T Korbak, P Żak
Language and Technology Conference, 31-42, 2017
Cited by 4 · 2017
Articles 1–20