Don't stop pretraining: Adapt language models to domains and tasks S Gururangan, A Marasović, S Swayamdipta, K Lo, I Beltagy, D Downey, ... arXiv preprint arXiv:2004.10964, 2020 | 2264 | 2020 |
Annotation artifacts in natural language inference data S Gururangan, S Swayamdipta, O Levy, R Schwartz, SR Bowman, ... arXiv preprint arXiv:1803.02324, 2018 | 1230 | 2018 |
RealToxicityPrompts: Evaluating neural toxic degeneration in language models S Gehman, S Gururangan, M Sap, Y Choi, NA Smith arXiv preprint arXiv:2009.11462, 2020 | 952 | 2020 |
All that's 'human' is not gold: Evaluating human evaluation of generated text E Clark, T August, S Serrano, N Haduong, S Gururangan, NA Smith arXiv preprint arXiv:2107.00061, 2021 | 365 | 2021 |
Show your work: Improved reporting of experimental results J Dodge, S Gururangan, D Card, R Schwartz, NA Smith arXiv preprint arXiv:1909.03004, 2019 | 272 | 2019 |
Editing models with task arithmetic G Ilharco, MT Ribeiro, M Wortsman, S Gururangan, L Schmidt, ... arXiv preprint arXiv:2212.04089, 2022 | 269 | 2022 |
The Llama 3 herd of models A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ... arXiv preprint arXiv:2407.21783, 2024 | 221 | 2024 |
Variational pretraining for semi-supervised text classification S Gururangan, T Dang, D Card, NA Smith arXiv preprint arXiv:1906.02242, 2019 | 138 | 2019 |
Detoxifying language models risks marginalizing minority voices A Xu, E Pathak, E Wallace, S Gururangan, M Sap, D Klein arXiv preprint arXiv:2104.06390, 2021 | 111 | 2021 |
Branch-Train-Merge: Embarrassingly parallel training of expert language models M Li, S Gururangan, T Dettmers, M Lewis, T Althoff, NA Smith, ... arXiv preprint arXiv:2208.03306, 2022 | 110 | 2022 |
DEMix layers: Disentangling domains for modular language modeling S Gururangan, M Lewis, A Holtzman, NA Smith, L Zettlemoyer arXiv preprint arXiv:2108.05036, 2021 | 101 | 2021 |
Time waits for no one! Analysis and challenges of temporal misalignment K Luu, D Khashabi, S Gururangan, K Mandyam, NA Smith arXiv preprint arXiv:2111.07408, 2021 | 68 | 2021 |
LESS: Selecting influential data for targeted instruction tuning M Xia, S Malladi, S Gururangan, S Arora, D Chen arXiv preprint arXiv:2402.04333, 2024 | 51 | 2024 |
kNN-Prompt: Nearest neighbor zero-shot inference W Shi, J Michael, S Gururangan, L Zettlemoyer arXiv preprint arXiv:2205.13792, 2022 | 44 | 2022 |
SILO language models: Isolating legal risk in a nonparametric datastore S Min, S Gururangan, E Wallace, W Shi, H Hajishirzi, NA Smith, ... arXiv preprint arXiv:2308.04430, 2023 | 43 | 2023 |
Scaling expert language models with unsupervised domain discovery S Gururangan, M Li, M Lewis, W Shi, T Althoff, NA Smith, L Zettlemoyer arXiv preprint arXiv:2303.14177, 2023 | 30 | 2023 |
OSWorld: Benchmarking multimodal agents for open-ended tasks in real computer environments T Xie, D Zhang, J Chen, X Li, S Zhao, R Cao, TJ Hua, Z Cheng, D Shin, ... arXiv preprint arXiv:2404.07972, 2024 | 19 | 2024 |
Whose language counts as high quality? measuring language ideologies in text data selection S Gururangan, D Card, SK Dreier, EK Gade, LZ Wang, Z Wang, ... arXiv preprint arXiv:2201.10474, 2022 | 19 | 2022 |
Analysis of graph invariants in functional neocortical circuitry reveals generalized features common to three areas of sensory cortex SS Gururangan, AJ Sadovsky, JN MacLean PLoS computational biology 10 (7), e1003710, 2014 | 18 | 2014 |
M2D2: A massively multi-domain language modeling dataset M Reid, V Zhong, S Gururangan, L Zettlemoyer arXiv preprint arXiv:2210.07370, 2022 | 15 | 2022 |