Coaching Students in Data Analysis
Since 2012, I have taught introductory probability and statistics, Econ 103, to just under 1,000 undergraduate economics majors. Econ 103 had a bad reputation when I began teaching it: relatively few undergraduates are excited about a 9 a.m. mandatory statistics course. Full of energy as a new assistant professor, I redesigned Econ 103 from the ground up. Course evaluations improved substantially, and I happily declared victory. Student evaluations are one thing, but practical data analysis skills, I have discovered, are another. The experience of working with two undergraduate research assistants in 2015—both of whom were extremely bright and had taken my Econ 103 course—revealed just how many of the data analysis skills I take for granted were never taught in a course but instead acquired through years of practice and trial and error. Ever since, I have wanted to design a course that would help students come to grips with substantive applied data analysis.
This past fall I was given carte blanche by my undergraduate chair to develop a new undergraduate elective course in econometrics. The result was Econ 224: Statistical Learning and Causal Inference for Economics, a course covering the basics of statistical learning, a.k.a. “machine learning,” along with modern tools for untangling causal relationships from non-experimental data. My main goal, however, was not so much to impart a body of theoretical knowledge as to coach students through the process of applied data analysis: acquiring, loading, cleaning, and analyzing data, and writing up the results using modern tools for reproducible research. For this reason, and with the help of a course development grant from the Center for Teaching and Learning, I chose to forgo the familiar lecture format and organize Econ 224 around structured, active, in-class learning (SAIL).
Before each class meeting, students completed a short reading assignment and a list of associated reading questions covering the topic for the day. Each week alternated between a statistical learning topic and a causal inference topic, drawn from the books An Introduction to Statistical Learning by James et al. and Mastering Metrics by Angrist and Pischke. To ensure that everyone came prepared, I began each class with a five-minute quiz drawn directly from the reading questions. On the whole this system worked well. Students completed the readings as assigned, and the quizzes did not appear to cause undue stress; because the quiz questions were known in advance, there was no uncertainty about what would be asked. After the quiz, I typically spent between 10 and 15 minutes answering student questions about the reading assignment. To help get things started, I usually began with a question of my own, often one that implied a criticism of something from the reading. A number of students told me that they particularly enjoyed this approach because it prompted them to treat the readings not as an unquestionable repository of truth, but as an argument to be engaged with and evaluated.
I allocated the bulk of each class period, between 50 and 60 minutes, to “labs,” structured assignments that students completed in groups of three to four while the TAs and I walked around the room to facilitate. Labs fell into two categories: tools and applications. The former focused on the nuts and bolts of implementing and interpreting methods from the assigned readings using the R statistical programming language, while the latter turned students loose on substantive data analysis problems, many of them drawn from influential recent papers in economics. All of the semester’s labs are available from the course website: www.ditraglia.com/econ224
The labs were undoubtedly the most challenging part of Econ 224, both to prepare and to facilitate. A problem I encountered early on was the difficulty of gauging how much time in-class exercises would require: my earliest labs, for example, were more than twice the appropriate length. This turned out to be something of a blessing in disguise, however, as I simultaneously discovered how time-consuming it was to prepare effective active learning exercises; the surplus material gave me a head start on later labs. A related challenge was the wide variation in students’ background knowledge. While some were double majors in computer science and economics, for example, others had never encountered any form of computer programming before. Differences in background are challenging in any class, of course, but active learning makes them much more apparent. After only one semester of SAIL teaching, I cannot claim to have hit upon the ideal solution, but two things strike me as crucial. The first is forming effective groups. There seems to be some debate among SAIL instructors as to whether students should be grouped at random, grouped by background knowledge, or left free to choose their own groups; in the end, I tried a little of each. What is clear, however, is that the same students were much more productive in the right group. The second is adequate TA support. A typical lab in Econ 224 had roughly one facilitator—i.e., TA or instructor—for every three groups of students, a ratio low enough that we could devote extra attention to groups that encountered difficulty, particularly early in the semester. This “triage” approach worked extremely well, dramatically narrowing the initial gap in computing background.
Grades in Econ 224 were based on quizzes, problem sets, class participation, and a final project. My intention was for the projects to encompass all aspects of a real-world data analysis: posing a question, finding and cleaning data, choosing appropriate statistical methods, implementing them in R, and writing up the results in a professional-quality, reproducible report. I initially planned for students to complete their projects in groups of three to four. This idea proved so unpopular, however, that I allowed the students to vote; in the end, all of them chose to work independently. Because it was impractical for one person to give meaningful advice on 41 projects, I assigned each student a “project supervisor,” either the instructor or a TA, and used private threads on the discussion board Piazza to keep track of each student’s project feedback. This allowed me to check on any student, should the need arise, while ensuring that all of them received meaningful feedback both in and out of class.
After reading 300 pages of student projects, and considerably fewer pages of student evaluations, I am prepared to declare Econ 224 a success. Students enjoyed the SAIL format. Attendance was high, and I had no difficulty keeping them focused during labs. More importantly, even those who initially struggled with R had no trouble carrying out basic programming, data manipulation and analysis tasks by the end of the semester. The code students submitted with their final projects was generally clean and well-documented, and the projects themselves were on the whole competently done. In-class labs gave students considerable practice working with data in R, and this practice clearly paid off. This is the most important lesson that I have drawn from teaching a SAIL course: students improve if you give them constant feedback and opportunities to practice. Creating such opportunities takes work, and giving effective feedback takes practice, but my experience with Econ 224 was an overwhelmingly positive one.
Francis J. DiTraglia is an assistant professor of economics in SAS, a member of the Warren Center for Network and Data Sciences at Penn Engineering, and a visiting researcher at the Philadelphia Federal Reserve Bank.
This essay continues the series that began in the fall of 1994 as the joint creation of the College of Arts and Sciences, the Center for Teaching and Learning and the Lindback Society for Distinguished Teaching.
See https://almanac.upenn.edu/talk-about-teaching-and-learning-archive for previous essays.