Fazl Barez

Cited by

	All	Since 2019
Citations	122	122
h-index	5	5
i10-index	4	4

Year	2022	2023	2024
Citations per year	5	69	47

Co-authors

Shay Cohen	University of Edinburgh	Verified email at inf.ed.ac.uk
Philip Torr	Professor, University of Oxford	Verified email at eng.ox.ac.uk
David Duvenaud	Associate Professor, University of Toronto	Verified email at cs.toronto.edu
Ethan Perez	Anthropic; New York University	Verified email at anthropic.com
Roger Grosse	Associate Professor, University of Toronto	Verified email at cs.toronto.edu
Mrinank Sharma	Anthropic	Verified email at anthropic.com
Sören Mindermann	University of Oxford, OATML	Verified email at cs.ox.ac.uk
Jan Brauner	University of Oxford	Verified email at cs.ox.ac.uk
Jesse Mu	Anthropic	Verified email at anthropic.com
Paul Christiano	National Institute of Standards and Technology	Verified email at nist.gov
Samuel R. Bowman	NYU and Anthropic	Verified email at nyu.edu

Fazl Barez

University of Oxford

Verified email at robots.ox.ac.uk

Machine Learning Safety Interpretability Alignment


Title	Authors	Venue	Cited by	Year
The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python	AVM Barone, F Barez, I Konstas, SB Cohen	The 61st Annual Meeting Of The Association For Computational Linguistics, 2023	23*	2023
PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration	P Li, H Tang, T Yang, X Hao, T Sang, Y Zheng, J Hao, ME Taylor, Z Wang, ...	arXiv preprint arXiv:2203.08553, 2022	23	2022
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark	J Hoelscher-Obermaier, J Persson, E Kran, I Konstas, F Barez*	Findings of the Association for Computational Linguistics 2023, 11548–11559, 2023	22	2023
Neuron to Graph: Interpreting Language Model Neurons at Scale	A Foote, N Nanda, E Kran, I Konstas, S Cohen, F Barez	arXiv preprint arXiv:2305.19911, 2023	10	2023
Sleeper agents: Training deceptive llms that persist through safety training	E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...	arXiv preprint arXiv:2401.05566, 2024	9	2024
Understanding Addition in Transformers	P Quirke, F Barez	International Conference on Learning Representations (ICLR), 2023	5	2023
System III: Learning with Domain Knowledge for Safety Constraints	F Barez, H Hasanbieg, A Abbate	NeurIPS ML Safety Workshop, 2022	5	2022
Benchmarking specialized databases for high-frequency data	F Barez, P Bilokon, R Xiong	arXiv preprint arXiv:2301.12561, 2023	4	2023
Discovering topics and trends in the UK Government web archive	D Beavan, F Barez, M Bel, J Fitzgerald, E Goudarouli, K Kollnig, ...	Data Study Group Final Report. Alan Turing Institute, London, 2021	4*	2021
Large language models relearn removed concepts	M Lo, SB Cohen, F Barez	arXiv preprint arXiv:2401.01814, 2024	3	2024
Exploring the advantages of transformers for high-frequency trading	F Barez, P Bilokon, A Gervais, N Lisitsyn	arXiv preprint arXiv:2302.13850, 2023	3	2023
Identifying a preliminary circuit for predicting gendered pronouns in gpt-2 small	C Mathwin, G Corlouer, E Kran, F Barez, N Nanda	URL: https://itch.io/jam/mechint/rate/1889871, 2023	3	2023
Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models	M Luke, A Amir, N Clement, A Rauno, T Philip, B Fazl	https://arxiv.org/abs/2310.08164, 2024	2*	2024
Interpreting Shared Circuits for Ordered Sequence Prediction in a Large Language Model	M Lan, F Barez	https://arxiv.org/abs/2311.04131, 2023	2*	2023
Increasing Trust in Language Models through the Reuse of Verified Circuits	P Quirke, C Neo, F Barez	arXiv preprint arXiv:2402.02619, 2024	1	2024
Measuring Value Alignment	F Barez, P Torr	arXiv preprint arXiv:2312.15241, 2023	1	2023
AI Systems of Concern	K Matteucci, S Avin, F Barez, SÓ hÉigeartaigh	arXiv preprint arXiv:2310.05876, 2023	1	2023
ED2: an environment dynamics decomposition framework for world model construction	C Wang, T Yang, J Hao, Y Zheng, H Tang, F Barez, J Liu, J Peng, H Piao, ...	arXiv preprint arXiv:2112.02817, 2021	1	2021
Near to Mid-term Risks and Opportunities of Open Source Generative AI	F Eiras, A Petrov, B Vidgen, CS de Witt, F Pizzati, K Elkins, ...	arXiv preprint arXiv:2404.17047, 2024		2024
The Scaling Behavior of Large Language Models	AV Miceli-Barone, F Barez, SB Cohen, E Voita, U Germann, M Lukasik	Proceedings of the First edition of the Workshop on the Scaling Behavior of …, 2024		2024
