Analyzing Statistics Students’ Writing Before and After the Emergence of Large Language Models

Author

Sara Colando, Erin Franke

Published

July 1, 2025

Overview: The use of Large Language Models (LLMs) has become ubiquitous in academic settings, particularly in written assignments (Baek et al., 2024). Reinhart et al. (2025) identified systematic differences between human and LLM writing by leveraging Biber feature and lemma usage rates.1 In this project, we investigate whether (and if so, how) students’ statistics writing has systematically shifted toward LLM academic writing since LLMs became widely accessible in 2022.

We compare student writing to LLM academic writing through two corpora. The HAP-E Corpus contains 1,227 documents for which ChatGPT-4o (August 2024) was prompted with a piece of academic writing and asked to generate the next 500 words in the same tone and style (Brown, 2024). Meanwhile, the Student Corpus contains 2,353 student reports from three undergraduate statistics courses at Carnegie Mellon University: 36-202, 36-401, and 36-402. 36-202: Methods for Statistics & Data Science is a lower-division course, typically the second statistics course students take. 36-401: Modern Regression and 36-402: Advanced Methods for Data Analysis are upper-division courses that statistics majors take in their junior or senior year. For the reports, students are given a dataset and asked to answer a domain question with a report in the IMRaD format.2

Ultimately, we find a systematic shift in both the style and vocabulary of students’ statistics reports toward ChatGPT’s academic writing style since 2022, in both lower- and upper-division courses at Carnegie Mellon. In particular, the writing style of students’ introductions and conclusions has, on average, become more similar to ChatGPT’s. We are currently in the process of determining next steps for this project. For more details on our current results and potential next steps, see Erin’s write-up.
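As a simplified illustration of one of the measures above, the sketch below computes lemma usage rates (occurrences per 1,000 tokens) for a document. It uses a tiny hand-written lemma map for self-containment; the function name and map are illustrative assumptions, not the project’s actual code, and a real pipeline would use a proper lemmatizer (e.g., spaCy).

```python
from collections import Counter

# Toy lemma map for illustration only; a real pipeline would use a
# lemmatizer such as spaCy rather than a hand-written dictionary.
LEMMA_MAP = {
    "ensure": "ensure", "ensured": "ensure", "ensures": "ensure",
    "model": "model", "models": "model",
}

def lemma_rates(tokens, per=1000):
    """Return each lemma's usage rate per `per` tokens."""
    lemmas = [LEMMA_MAP.get(t.lower(), t.lower()) for t in tokens]
    counts = Counter(lemmas)
    return {lemma: per * n / len(lemmas) for lemma, n in counts.items()}

tokens = "The model ensures that models ensured accuracy".split()
rates = lemma_rates(tokens)
# "model" and "ensure" each occur twice among 7 tokens,
# so both get the same per-1,000-token rate.
```

Rates like these, computed for each document in both corpora, can then be compared to quantify how close student vocabulary is to the LLM corpus.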

References

Baek, C., Tate, T., & Warschauer, M. (2024). “ChatGPT seems too good to be true”: College students’ use and perceptions of generative AI. Computers and Education: Artificial Intelligence, 7, 100294. https://doi.org/10.1016/j.caeai.2024.100294
Biber, D. (1988). Variation across speech and writing. Cambridge University Press. https://doi.org/10.1017/cbo9780511621024
Brown, D. (2024). Human-AI parallel corpus [Data set]. Hugging Face. https://huggingface.co/datasets/browndw/human-ai-parallel-corpus
Reinhart, A., Markey, B., Laudenbach, M., Pantusen, K., Yurko, R., Weinberg, G., & Brown, D. W. (2025). Do LLMs write like humans? Variation in grammatical and rhetorical styles. Proceedings of the National Academy of Sciences, 122(8). https://doi.org/10.1073/pnas.2422455122

Footnotes

  1. Biber features are a set of 67 rhetorical features (e.g., frequency of past tense, participial clauses, mean word length) used to characterize texts (Biber, 1988). Lemmas are the “root” form of a word (e.g., ensure, ensured, and ensures all share the lemma ensure).↩︎

  2. IMRaD stands for Introduction, Methods, Results, and Discussion. Generally, students are also asked to write a one-page executive summary which is analogous to an abstract in academic writing.↩︎