The Ontario Confederation of University Faculty Associations (OCUFA) put out an interesting little piece the week before last summarizing the problems with student evaluations of teaching. It contains a reasonable summary of the literature and I thought some of it would be worth looking at here.
We’ve known for a while now that the results of student evaluations are statistically biased in various ways. Perhaps the most important way they are biased is that professors who mark more leniently get higher ratings from their students. There is also the issue of what appears to be discrimination: female professors and visible minority professors tend to get lower ratings than white men. And then there’s the point that OCUFA makes with respect to the comments section of these evaluations being a hotbed of statements which amount to harassment. These points are all well worth making.
One might well ask: given that we all know about the problems with teaching evaluations, why in God’s name do institutions still use them? Fair question. Three hypotheses:
- Despite flaws in the statistical measurement of teaching, the comments actually do provide helpful feedback, which professors use to improve their teaching.
- When it comes to pay and promotion, research is weighted far more highly than teaching, so unless someone completely tanks their teaching evals – and by tanking I mean doing so much below par that it can’t reasonably be attributed to one of the biases listed above – they don’t really matter all that much (note: while this probably holds for tenured and tenure-track profs, I suspect the stakes are higher for sessionals).
- No matter how bad a measurement instrument they are, the idea that one wouldn’t treat student opinions seriously is totally untenable, politically.
In other words, there are benefits despite the flaws, the consequences of flaws might not be as great as you think, and to put it bluntly, it’s not clear what the alternative is. At least with student evaluations you can maintain the pretense that teaching matters to pay and promotion. Kill those, and what have you got? People already think professors don’t care enough about teaching. Removing the one piece of measurement and accountability for teaching that exists in the system – no matter how flawed – is simply not on.
That’s not to say there aren’t alternatives to measuring teaching. One could imagine a system of peer evaluation, where professors rate one another. Or one could imagine a system where the act of teaching and the act of marking are separated – and teachers are rated on how well their students perform. It’s not obvious to me that professors would prefer such a system.
Besides, it’s not as though the current system can’t be redeemed. Solutions exist. If we know that easy markers get systematically better ratings, then normalize ratings based on the class average mark. Same thing for gender and race: if you know what the systematic bias looks like, you can correct for it. And as for ugly stuff in the comments section, it’s hardly rocket science to have someone edit the material for demeaning comments prior to handing it to the prof in question.
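To make the "normalize ratings based on the class average mark" idea concrete, here is a minimal sketch of one way it could be done. It assumes the relationship between a section's average mark and its average rating is roughly linear, and it treats all of that relationship as leniency bias, which is a strong assumption. The function name `adjust_for_marks` and the section data are hypothetical and for illustration only.

```python
from statistics import mean


def adjust_for_marks(sections):
    """Remove the linear effect of class average mark from mean ratings.

    sections: list of (label, class_average_mark, mean_rating) tuples.
    Returns a dict mapping label -> adjusted rating.
    """
    marks = [m for _, m, _ in sections]
    ratings = [r for _, _, r in sections]
    mark_mean = mean(marks)
    rating_mean = mean(ratings)

    # Ordinary least-squares slope of rating on mark, computed by hand.
    variance = sum((m - mark_mean) ** 2 for m in marks)
    if variance == 0:
        slope = 0.0  # every section has the same average mark: nothing to adjust
    else:
        covariance = sum(
            (m - mark_mean) * (r - rating_mean) for m, r in zip(marks, ratings)
        )
        slope = covariance / variance

    # Shift each rating to what it would be at the overall average mark.
    return {label: r - slope * (m - mark_mean) for label, m, r in sections}


if __name__ == "__main__":
    # Hypothetical sections: (label, class average mark in %, mean rating out of 5)
    example = [
        ("Section A", 62, 3.4),
        ("Section B", 71, 3.9),
        ("Section C", 80, 4.1),
        ("Section D", 74, 4.4),
    ]
    for label, adjusted in adjust_for_marks(example).items():
        print(f"{label}: {adjusted:.2f}")
```

After adjustment, a section's rating reflects how far it sits above or below what its average mark alone would predict. The same residual approach could in principle be extended to other known systematic biases, provided their size has actually been estimated.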
There’s one area where the OCUFA commentary goes beyond the evidence, however, and that’s in trying to translate the findings about student teaching evaluations (i.e., how did Professor X do in Class Y) to surveys of institutional satisfaction. The argument they make here is that because the one is known to have certain biases, the other should never be used to make funding decisions. Now, without necessarily endorsing the idea of using student satisfaction as a funding metric, this is terrible logic. The two types of questionnaires are entirely different, ask different questions, and simply are not subject to the same kinds of biases. It is deeply misleading to imply otherwise.
Still, all that said, it’s good that this topic is being brought into the spotlight. Teaching is the most important thing universities do. We should have better ways of measuring its impact. If OCUFA can get us moving along that path, more power to them.
This is a complex issue, and I appreciate that it is being addressed here only briefly, but I’m wondering why higher ratings for “lenient” professors are considered “the most important” form of bias, while the well-documented gender, race (and personal appearance) issues are treated as less significant. Adjusting for grades should be much easier, after all, than adjusting for instructors’ characteristics.
Student feedback can be solicited for formative purposes (to improve teaching) without relying on teaching evaluations as they are currently used in hiring/promotion/tenure decisions. But this is the crucial point: a student “opinion” about an instructor is not a proxy for an accurate assessment of student learning.
And there is no need to “imagine a system of peer evaluation”–this already takes place, but it may not be as robust as it could be, in part because some of the same biases regarding personal characteristics tend to creep into classroom observations.
Teaching portfolios that include a range of materials–course syllabi, sample lesson plans, self-reflections, data on student retention or other quantitative factors, and student and peer evaluations–provide far more effective measures of teaching effectiveness, but we also need better ways to evaluate student learning.
As a sessional I am very partial to peer evaluation, preferably from more than one colleague (one tenured/tenure-track and one sessional as well). Regarding promotion, the collective agreement between CUPE3902 and UofT states the following:
“Where they are available, student evaluations, whether conducted by the Department or by a student organization or by any other means, shall not be admissible as the sole determining factor to demonstrate unsatisfactory performance in either the discipline procedure or in arbitration. Departments may make use of student evaluations as an element in the Department’s method for assessing work performance.”
Sorry to pick on one point: marks in a particular course or section could be higher for many reasons beyond the marking being easy, including students actually learning more from Prof X because s/he is a great teacher, so normalizing to correct for easy marking is too simplistic.
I’ll add that given all of the options for assessing teaching quality (and the need to do so since parents and government expect it) — student evaluations, grade outcomes, performance in next course, peer review, the dreadful “level of innovativeness,” and so on — I would always trust students above all else with my fate.