Tangled up in BLEU: Reevaluating the evaluation of automatic machine translation evaluation metrics N Mathur, T Baldwin, T Cohn arXiv preprint arXiv:2006.06264, 2020 | 262 | 2020 |
Results of WMT22 metrics shared task: Stop using BLEU – neural metrics are better and more robust M Freitag, R Rei, N Mathur, C Lo, C Stewart, E Avramidis, T Kocmi, ... Proceedings of the Seventh Conference on Machine Translation (WMT), 46-68, 2022 | 199 | 2022 |
Results of the WMT21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain M Freitag, R Rei, N Mathur, C Lo, C Stewart, G Foster, A Lavie, O Bojar Proceedings of the Sixth Conference on Machine Translation, 733-774, 2021 | 171 | 2021 |
Results of the WMT20 metrics shared task N Mathur, J Wei, M Freitag, Q Ma, O Bojar Proceedings of the Fifth Conference on Machine Translation, 688-725, 2020 | 152 | 2020 |
Accurate evaluation of segment-level machine translation metrics Y Graham, T Baldwin, N Mathur Proceedings of the 2015 Conference of the North American Chapter of the …, 2015 | 120 | 2015 |
Putting evaluation in context: Contextual embeddings improve machine translation evaluation N Mathur, T Baldwin, T Cohn Proceedings of the 57th Annual Meeting of the Association for Computational …, 2019 | 83 | 2019 |
Results of WMT23 metrics shared task: Metrics might be guilty but references are not innocent M Freitag, N Mathur, C Lo, E Avramidis, R Rei, B Thompson, T Kocmi, ... Proceedings of the Eighth Conference on Machine Translation, 578-628, 2023 | 53 | 2023 |
Randomized significance tests in machine translation Y Graham, N Mathur, T Baldwin Proceedings of the Ninth Workshop on Statistical Machine Translation, 266-274, 2014 | 51 | 2014 |
The impact of multiword expression compositionality on machine translation evaluation B Salehi, N Mathur, P Cook, T Baldwin Proceedings of the 11th Workshop on Multiword Expressions, 54-59, 2015 | 18 | 2015 |
Sequence effects in crowdsourced annotations N Mathur, T Baldwin, T Cohn Proceedings of the 2017 Conference on Empirical Methods in Natural Language …, 2017 | 12 | 2017 |
Towards efficient machine translation evaluation by modelling annotators N Mathur, T Baldwin, T Cohn Proceedings of the Australasian Language Technology Association Workshop …, 2018 | 4 | 2018 |
Are LLMs breaking MT metrics? Results of the WMT24 metrics shared task M Freitag, N Mathur, D Deutsch, CK Lo, E Avramidis, R Rei, B Thompson, ... Proceedings of the Ninth Conference on Machine Translation, 47-81, 2024 | 3 | 2024 |
Robustness in Machine Translation Evaluation. N Mathur University of Melbourne, Parkville, Victoria, Australia, 2021 | 2 | 2021 |
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy B Thompson, N Mathur, D Deutsch, H Khayrallah arXiv preprint arXiv:2409.09598, 2024 | 1 | 2024 |