Data Analysis Exam Thoughts and Materials
This Spring, at the end of my first year, I took the Data Analysis (DA) exam, which is the “qualifying” exam for CMU’s Statistics PhD program. The majority of the materials in this notebook are based on the Fall 2024 Regression Analysis (36-707) lecture notes, which can be found here.1
A little background on CMU’s DA Exam from the Statistics Graduate Student Handbook: “At the conclusion of each Spring Semester the Department administers the ‘Data Analysis Exam,’ which is designed to test students’ ability to apply statistical methods to address a substantive, real problem. Students are given eight hours to complete the exam, during which time they analyze the data and write a [ten page] report to present their analysis and conclusions. The faculty are realistic as to what can be accomplished during the eight-hour period. In grading the exam, the faculty are looking for clear presentation of an appropriate analysis of the data. Emphasis is not placed on technical or mathematical sophistication” (p. 13).
How I Prepared:
In full transparency, beyond taking 36-707 last Fall, I did not do very much to prepare for the DA exam until the week before, when I spent roughly 4-6 hours each day preparing. That said, I found that a week was long enough to review important concepts and make templates and other notes to use during the exam, but not so long that I began to over-complicate concepts and my analysis strategies.
In the week leading up to the exam, I did the following to prepare:
Read through the 36-707 notes (primarily chapters 4-15) and wrote up notes with full code that could be easily copied and pasted into my own DA exam (my completed notes are available here)
Read through completed data analysis exams from some students in upper years of the PhD program
Read through my three revised data analysis reports from 36-707, paying special attention to things I thought were effective and any comments I received on revisions
Made a “recipes” sheet with the technical conditions for different models as well as the steps I would take for each model type (the recipes document I made with one of my cohort-mates is available here)
Made a DA report template that I could fill in during the exam, which included rubric items from 36-707 (see Table 1), copied into the relevant report sections.2
Report Section | Rubric Items |
---|---|
Executive Summary | |
Introduction | |
The Data (Data Summary and Exploratory Data Analysis) | |
Methods | |
Results | |
Discussion | |
Other | |
Especially Helpful Preparations:
Making the recipes document!!!! On the DA exam, I used a Poisson model with an offset, which is a model I had only implemented and interpreted a few times and might not have remembered if I had not made the recipes document ahead of time.
Reviewing APA formatting and practicing interpretations for more complicated scenarios (e.g., when there is an interaction or a spline term).
Putting all my code into one document that could be easily copied into my DA report during the eight-hour exam. This ended up saving me a ton of time, especially when I was running diagnostics and reporting results. By scaffolding my code ahead of time, I could focus on more substantive parts of my analysis and spend more time actually writing the report, which was much needed since I am a relatively slow writer.
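As an illustration of the Poisson model with an offset mentioned above, here is a minimal sketch in plain Python. It is not the code from my exam or from the 36-707 notes. The function name `fit_poisson_offset`, the toy data, and the restriction to one covariate are all assumptions made for the example. The offset enters the linear predictor as `log(exposure)` with its coefficient fixed at 1, so the remaining coefficients describe rates rather than raw counts.

```python
import math

def fit_poisson_offset(y, x, exposure, n_iter=50, tol=1e-10):
    """Fit log(E[y]) = log(exposure) + b0 + b1 * x by Newton-Raphson.

    Hypothetical helper for illustration: one covariate plus an intercept.
    """
    offset = [math.log(e) for e in exposure]
    # Start from the intercept-only model: overall events per unit exposure.
    b0 = math.log(sum(y) / sum(exposure))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(o + b0 + b1 * xi) for o, xi in zip(offset, x)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (made up): counts observed over different amounts of exposure.
x = [0, 0, 1, 1]
exposure = [100, 200, 100, 200]
y = [10, 20, 20, 40]

b0, b1 = fit_poisson_offset(y, x, exposure)
rate_ratio = math.exp(b1)  # multiplicative change in the rate when x goes from 0 to 1
print(f"baseline rate: {math.exp(b0):.3f}, rate ratio: {rate_ratio:.3f}")
```

In this toy data the group with `x = 1` has twice the event rate of the group with `x = 0`, so the fitted rate ratio should be close to 2. The interpretation is about rates per unit of exposure, which is the part I most needed the recipe for.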
What Could’ve Gone Better:
Better flexibility in how to approach model diagnostics. The partial residuals function I planned to use did not work for Poisson models with offsets (the function has since been fixed). So, I had to pivot to using randomized quantile residuals, which I had not prepared to use and was less familiar with. Fortunately for me, none of the covariates that I included in my model ended up having a non-linear relationship with my outcome variable, but if they had, I think it would have taken me more time to think through transformations, etc.
Spending less time on EDA plots. I spent a while trying to make the axis and strip names (for faceted plots) clean and also tweaked the width and height of my figures a lot in an attempt to make them nicely spaced in my DA report. While this probably helped me pass a bit (since it did make my report look nicer), I wish I had used that time to better narrow down which limitations or caveats of my model and the data I talked about in my report. Some sections of my report read like a laundry list of issues with the data and my analysis rather than a focused discussion of which limitations would impact my conclusions and their generalizability.
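For reference, the randomized quantile residuals mentioned above can be sketched as follows for a Poisson model. This is a minimal plain-Python illustration, not the function I used during the exam. The names `poisson_cdf` and `randomized_quantile_residuals`, the fixed seed, and the toy fitted means are assumptions for the example. For a count `y` with fitted mean `mu`, the idea is to draw a uniform value between `P(Y <= y - 1)` and `P(Y <= y)` and map it through the standard normal quantile function. If the model is correct, the resulting residuals should look approximately standard normal.

```python
import math
import random
from statistics import NormalDist

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu), by direct summation of the pmf."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j    # P(Y = j) from P(Y = j - 1)
        total += term
    return min(total, 1.0)

def randomized_quantile_residuals(y, mu, seed=0):
    """Randomized quantile residuals for Poisson counts y with fitted means mu."""
    rng = random.Random(seed)  # seeded so the residuals are reproducible
    std_normal = NormalDist()
    residuals = []
    for yi, mi in zip(y, mu):
        lower = poisson_cdf(yi - 1, mi)
        upper = poisson_cdf(yi, mi)
        u = rng.uniform(lower, upper)
        # Keep u strictly inside (0, 1) so the normal quantile is finite.
        u = min(max(u, 1e-12), 1 - 1e-12)
        residuals.append(std_normal.inv_cdf(u))
    return residuals

# Toy example (made up): observed counts and fitted means from some Poisson model.
# With an offset, the fitted mean would be exposure * exp(linear predictor).
y_obs = [0, 3, 7]
mu_hat = [1.5, 3.0, 4.0]
resid = randomized_quantile_residuals(y_obs, mu_hat)
print(resid)
```

Plotting these residuals against each covariate, and against the fitted values, plays the same role as the usual residual plots: a visible trend suggests a missing non-linear term. Because of the random draw, the residuals change from seed to seed, so it is worth fixing the seed in a report.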