Something very important happened over the summer: the Ryerson Faculty Association won its case against the university in Ontario Superior Court over the use of student teaching evaluations in tenure and promotion decisions (the decision was silent on merit pay, but I'm fairly sure that's because Ryerson academics don't have it; as a legal precedent, I'm certain merit pay is affected too). This means literally every university in the country is going to have to re-think the evaluation of teaching – which is a fantastic opportunity to have some genuinely interesting, important national conversations on the subject.
Let's talk about the decision itself. Technically, it did not tell Ryerson to stop using teaching evaluations in tenure/promotion decisions. What it said was that the university could not use averages from teaching evaluations in tenure/promotion decisions, because the averages are meaningless. It left the door open to using distributions of scores, and I think it also left the door open to adjusting the averages for various factors. The experts brought in by the Ryerson Faculty Association showed convincingly (see here and here) that student evaluations carry biases concerning (among other things) race and gender. I think it is within the spirit of the decision to at least allow the university to use adjusted scores from the student evaluations. For instance, if women are systematically ranked lower by (say) 0.5 on a 5-point scale (or by a third of a standard deviation, if you prefer to calculate it that way), just tack that amount onto the individual's score. Really not that difficult.
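To make the arithmetic concrete, here is a minimal sketch in Python of what such an adjustment could look like. The offset and group label are purely illustrative (taken from the hypothetical above, not from any study); a real implementation would estimate the offsets from the institution's own evaluation data, ideally by field and by question.

```python
# Illustrative sketch of the additive adjustment described above.
# The offsets are hypothetical, not estimates from real data; in
# practice they would be derived from the institution's own records.

GROUP_OFFSETS = {
    "woman": 0.5,  # hypothetical: women rated ~0.5 lower on a 5-point scale
}

def adjusted_score(raw_average: float, groups: list[str]) -> float:
    """Add back the estimated bias for each applicable group,
    capping the result at the top of the 5-point scale."""
    adjustment = sum(GROUP_OFFSETS.get(g, 0.0) for g in groups)
    return min(raw_average + adjustment, 5.0)

print(adjusted_score(3.8, ["woman"]))  # -> 4.3
```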
The problem, I think, is that there are a lot of voices out there that actually want to do away with student input on teaching altogether. To them, the fact that bias can be corrected is irrelevant. The fact that you can ask way better questions about teaching than are currently being asked, or that you can use questionnaires to focus more on the learning experience than on the instructor, is irrelevant. To them, any student evaluation is just a “satisfaction survey” and what do students know anyway?
Now, there are good ways to evaluate teaching without surveys. Nobel Prize-winning physicist Carl Wieman (formerly – if briefly – of UBC) has suggested evaluating professors based on a self-compiled inventory of their (hopefully evidence-informed) teaching practices. Institutions can make greater use of peer assessment of teaching, either formative or summative, although this requires a fair bit of work to train and standardize assessors (people sometimes forget that one argument in favour of student evaluations is that they place almost no work burden on professors, whereas the alternatives all do – be careful what you wish for).
But I personally think it is untenable that student voices be ignored completely. Students spend half their lives in classrooms. They know good teaching when they see it. They may have a whole bunch of implicit biases which they transfer to their assessments, but the idea that their input is worthless, that they are simply too ignorant to give valuable feedback – which is what a lot of the dismissals of their value amount to – is, frankly, arrogant snobbery. Anyone pushing that line is probably less against the concept of student teaching evaluation than against the concept of evaluation tout court.
(Don’t dismiss this point. The Ontario Confederation of University Faculty Associations has been very vocal in its campaign against teaching evaluations in the last few years, yet not once to my knowledge has it suggested an alternative. I get the problems with the current system, but if you’re not putting forward alternatives, you’re not arguing for better accountability, you’re arguing for less accountability).
There are alternatives, however. One that universities could consider is the system in use at the University of California, Merced, which Beckie Supiano profiled in a great little piece in the Chronicle of Higher Education last year. The Merced program, known as SATAL (Students Assessing Teaching and Learning), trains students in classroom observation, interviewing, and reporting techniques. Small teams of students then assess individual classes – some focussing on instructor behaviour, others on gathering and synthesizing student feedback. In other words, it professionalizes student feedback.
The real answer here, of course, is that multiple perspectives on teaching are required, for both formative and summative purposes. The Ryerson Faculty Association was right to push back on the use of averages. The trick now is to use the opportunity this ruling provides to put the assessment of teaching on a more solid footing right across the country. It is a particular opportunity for student unions: a once-in-a-generation chance to really define what is meant by good teaching and to put it at the heart of the tenure and promotion process. Any student union thinking about focussing on any other issue for the next 24 months is wasting a golden opportunity.
I agree with your general thrust, in particular the importance of seeking and understanding the student's experience and feedback. This sits within the perceived question of "student" versus "consumer" – which I see as simply a reflection of the extent to which institutions truly prioritise teaching effectiveness and hold faculty accountable for it. I look to Harvard Business School as an example of a school dedicated to teaching effectiveness (and to holding individual faculty rigorously accountable for it). When last I saw it, every course had student feedback surveys, the school strongly encouraged full participation in them, and the results were transparently published and available to everyone (administration, faculty and students alike) for review – and those results influenced student choices of electives and sections. I think the system worked very well, in part because it sat within a larger framework of faculty management. Such a system would, of course, be completely unacceptable to most faculty members (and their union locals) within our universities and colleges – and the institutions themselves are ill-equipped to manage well using such feedback.
I am given pause by the fact that the discriminatory effects of student evaluations very likely compound across multiple marginalized identities. If a professor is assessed lower for being a woman (say, 0.5 points) and another for being Black (say, 0.75 points), that does not mean a Black woman will be assessed lower by 1.25 points. The effect is a compounding, intersectional result of her race and gender, and it may differ across fields.
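A small numerical sketch (in Python, with entirely invented coefficients) of why a purely additive correction falls short when the bias has an intersectional component:

```python
# Purely hypothetical illustration: if bias contains an interaction
# term, a simple additive correction (adding back each group's main
# effect) leaves residual bias. W and B are 0/1 indicators; all
# coefficients are invented for illustration.

b_woman, b_black = -0.5, -0.75  # hypothetical main effects
b_interaction = -0.4            # hypothetical extra penalty for the combination

def true_bias(W: int, B: int) -> float:
    return b_woman * W + b_black * B + b_interaction * (W * B)

def additive_correction(W: int, B: int) -> float:
    # What the simple per-group fix adds back: main effects only.
    return -(b_woman * W + b_black * B)

for W, B in [(1, 0), (0, 1), (1, 1)]:
    residual = true_bias(W, B) + additive_correction(W, B)
    print(f"W={W} B={B}: residual bias after additive fix = {residual:+.2f}")
# Prints +0.00 for either identity alone, but -0.40 for the combination:
# the additive correction misses the intersectional term entirely.
```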
I don't know that the current literature accounts for such complications in a quantitative way. If it does, then yes, by all means, correct for it. But we have to map the complexities of these discriminatory effects first, and having minoritized faculty serve as the guinea pigs is untenable (not that they aren't already, in many current systems – but at least now there is literature showing they aren't being evaluated fairly, and no one claims their scores have been adjusted appropriately to meet fairness standards when they go before the committee). Throwing the baby out with the bathwater and having no student evaluations at all is hardly the answer either; perhaps there needs to be an interim stage.
Just a few notes: Firstly, I could certainly see discrimination working in reverse, with a middle-aged male professor teaching military history being slammed by a class full of women's studies majors. (No, I don't know why they'd be in his class – it's just an example exaggerated to make the point.) Certainly I can imagine a prejudice against certain teaching methods taking root, and a peer-based system devolving into a test of faith in the latest doctrines.
My second point arises from the fact that every system of evaluation I’ve seen has claimed rigour and improvement over what preceded it.
The current student-evaluation system in my institution was initially devised by a former psych prof and temporary administrator, convinced that he'd found a rigorous index which alone would reveal the best teachers on campus. On the other hand, you want evaluation to be multi-faceted, which implies a certain humility regarding any one means of assessment. I don't think we can have both: any measure which claims to be rigorous will inspire hubris. It would be better to have overtly less-rigorous assessments that we are invited to view with ironic detachment. There was a movement on social media of instructors reading aloud the meanest evaluation comments they had received. That's the spirit.
Thirdly, a multi-faceted system isn't necessarily any better. If anything, the biases of different assessors and methods might compound one another. I should think we see something similar in university rankings, which you have shown you regard with a proper irony.