Ok guys, I’m going to take the rest of the week to nerd out about performance-based funding (PBF) indicators, since clearly this is all anyone here in Ontario is going to be talking about for the next few months. I’m going to start with the issue of what indicators are going to be used—and fair warning: this is going to be long.
(Reminder to readers: I actually do this stuff for a living. If you think your institution needs help understanding the implications of the new PBF, please do get in touch).
Last week, the provincial government briefed institutions on the indicators it would like to use in its new performance-based funding formula, which in theory is meant to eventually govern 60% of all provincial grants to institutions. It won't actually be that much, for a variety of reasons, but let's stick to the government script for the moment. Currently, the government still has literally no idea how it is going to turn data on these ten indicators into actual dollar amounts for institutions; I'll talk more about how that might work later this week. But we can still make some useful observations about the indicators on their own in three respects: first, the conceptual validity of each proposed measure; second, any potential measurement problems; and third, any perverse incentives the indicator may create.
Are you sitting comfortably? Then I’ll begin.
According to some briefing papers handed out by the Ontario Ministry of Training, Colleges and Universities last week (which someone very kindly provided me, you know who you are, many thanks) the new PBF is supposed to be based on ten indicators: six related to "skills and job outlooks" and four related to "economic and community impact". One of the latter set of indicators is meant to be designed and measured individually by each university and college (in consultation with the ministry), which is a continuation of practices adopted in previous strategic mandate agreements. One of the "economic" indicators is a dual indicator: a research metric for universities and an "apprenticeship-related" metric for colleges which is "under development" (i.e. the government has no clue what to do about this), so I won't touch the college one for now. And finally, one of the skills indicators is some kind of direct measurement of skills, presumably using something like the tests HEQCO performed as part of its Postsecondary and Workplace Skills project (which I wrote about back here); I will deal with that one separately tomorrow because it's a huge topic all on its own.
So that leaves us with eight indicators, which are:
Graduation Rate. Assuming they stick with current practice, this will be defined as the "percentage of first-time, full-time undergraduate university students who commenced their study in a given Fall term and graduated from the same institution within 6 years." Obvious problems include what to do about transfer students (aren't we supposed to be promoting pathways?).
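For concreteness, here is a minimal sketch of how that cohort calculation typically works, using an entirely made-up record layout (the field names are mine, not the ministry's). It also shows where transfer students disappear from the picture: they never enter the cohort, and anyone who transfers out and finishes elsewhere counts as a non-completer.

```python
# Toy sketch of a 6-year cohort graduation rate; the record layout is hypothetical.
def six_year_grad_rate(students, cohort_year):
    """students: list of dicts with keys 'entry_year', 'first_time',
    'full_time', and 'grad_year' (None if no degree from this institution)."""
    # Only first-time, full-time entrants in the given Fall term form the cohort;
    # students who transfer in are excluded from the start.
    cohort = [s for s in students
              if s["entry_year"] == cohort_year and s["first_time"] and s["full_time"]]
    if not cohort:
        return None
    # Anyone who leaves and graduates elsewhere still counts as a non-completer here.
    completers = [s for s in cohort
                  if s["grad_year"] is not None and s["grad_year"] - cohort_year <= 6]
    return len(completers) / len(cohort)
```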
Graduate Employment. The government is suggesting measuring not employment rates but the "proportion of graduates employed full-time in fields closely or partly related to one's field of study". This question is currently asked on both the university and college versions of the Ontario Graduate Surveys but is not currently published at an institutional level. It is a self-report question, which means the government does not have to define what "related" or "partly related" actually means.
Graduate Earnings. This is currently tracked by the ministry through the Ontario Graduate Surveys, but the government appears to want to switch to Statistics Canada's new Education and Labour Market Longitudinal Platform (ELMLP), through which graduate incomes can be tracked through the tax system. This is mostly to the good, in the sense that coverage will be far more complete than any survey's and graduates can be tracked for longer (though it is not clear what the preferred time frame is here), but the ministry will lose the ability to exclude graduates who are enrolled in school for a second degree.
Experiential Learning. The Government’s briefing document indicates it wants to use the “number and proportion” of graduates in programs with experiential learning (this is confusing because it is actually two indicators) for colleges, but substitutes the word “courses” for “programs” when it comes to universities. I have no idea what this means and suspect they may not either. Possibly, this is a complicated way of saying they want to know what proportion of graduates have had a work-integrated learning experience.
Institutional Strength/Focus. This is a weird one. The government says it wants to measure “the proportion of students in an area of strength” at each institution. I can’t see how any institution looking at this metric is going to name anything other than their largest faculty (usually Arts) as their area of strength. Or how OCAD isn’t just going to say “art/design” and get a 100% rating. Maybe there’s some subtlety here that I’m missing but this just seems pointless to me.
Research Funding and Capacity (universities only): Straight up, this is just how much tri-council research funding each institution receives, meaning it could be seen as a form of indirect provincial support to cover overhead on federal research. This seems clear enough, but presumably there will be quite some jostling about definitions, in particular: how is everyone supposed to count money for projects that have investigators at multiple institutions? Should it use the same method as the federal indirect research support program, or some other method? Over how many years will the calculation be made? A multiple-year rolling average seems best, since in any given year the number can be quite volatile at smaller institutions.
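For what it's worth, here is a toy illustration (figures entirely invented) of how much difference the smoothing makes: a single large grant can inflate one year's number dramatically, while a three-year average barely moves.

```python
# Toy illustration with invented figures ($ millions of tri-council income).
tri_council_income = {2015: 1.1, 2016: 1.2, 2017: 4.8, 2018: 1.5, 2019: 1.9}

def rolling_average(series, end_year, window=3):
    """Average of the `window` years ending in `end_year`."""
    return sum(series[end_year - i] for i in range(window)) / window

print(tri_council_income[2017])                              # 4.8 -- single-year figure, swollen by one big grant
print(round(rolling_average(tri_council_income, 2017), 2))   # 2.37 -- three-year average
```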
“Innovation”. Simply, they mean funding from industry sources (for universities, this is specified as “research income”). The government claims it can get this data from the Statscan/CAUBO Financial Information of Universities and Colleges Survey, although I’m 99% sure that’s not something that gets tracked specifically. Also, important question: do non-profits count as “industry”? Because particularly in the medical field, that’s a heck of a big chunk of the research pie.
“Community/Local Impact”. OK, hold on to your hats. Someone clearly told the government they should have a community impact indicator to make this look like “not just a business thing”, but of course community impact is complex, diffuse, and difficult to measure consistently. So, in their desperation to find a “community” metric which was easy to measure, they settled on…are you ready?…institution size…divided by….community size. No, you’re not misreading that and yes, it’s asinine. I mean, first of all it’s not performance. Secondly, it’s not clear how you measure community; for instance, Confederation College has campuses in five communities, Waterloo has three, etc., so what do you use as a denominator? Third: What? WHAT? Are you KIDDING ME? Set up a battle of wits between this idea and a bag of hammers and the blunt instruments win every time. This idea needs to die in a fire before this process goes any further because it completely undermines the idea of performance indicators. If the province needs a way to quietly funnel money to small town schools (helloooo, Nipissing!) then do it through the rest of the grant, not the performance envelope.
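Just to underline the denominator problem with some invented numbers: the same institution's "impact" score drops by about 20 per cent depending on whether you divide by its home city alone or by every community where it has a campus.

```python
# Invented figures: one college, five campus communities.
enrolment = 7_500
home_city_population = 110_000
all_campus_communities = 110_000 + 12_000 + 8_000 + 5_000 + 3_000  # = 138,000

print(round(enrolment / home_city_population, 4))    # 0.0682
print(round(enrolment / all_campus_communities, 4))  # 0.0543 -- same institution, ~20% lower
```

And of course neither number says anything at all about performance.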
OK, so that is eight indicators. Two of these (community impact, institutional strength) are irretrievably stupid and should be jettisoned at the first opportunity. The “research” and “innovation” measures are reasonable provided sensible definitions are used (multi-year averages, the indirect funding method of counting tri-council income, inclusion of non-profits) and would be non-controversial in most European countries. The experiential learning one is probably OK, but again much depends on the actual definitions chosen.
That leaves the three graduation/employment metrics. There are some technical issues with all of them. The graduation rate definition is one-dimensional, and in most US states a simple grad rate is now usually accompanied by other metrics of progress besides completion (e.g. indicators for successfully bringing transfer students to degree, or indicators for getting students to complete 30/60/90 credits). The graduate employment "in a related field" measure is going to make people scream (it might be a useful metric for professional programs, but in most cases degrees aren't designed to link to occupations, and even where they are, people shift occupations after a few years anyway), and in any case it is to be measured through a survey with notoriously low response rates, which will matter especially at small institutions. The graduate income measure is technically OK but doesn't work well as an indicator in some types of PBF systems because it does not scale with institution size (I'll deal more with this in Thursday's blog).
But the bigger issue with all three of these is that they conceivably set up some very bad incentives for institutions. In all three of them, institutions could juice their scores by dumping humanities or fine arts programs and admitting only white dudes, because that's who does best in the labour market. I'm not saying they would do this – institutions do have ethical compasses – but it is quite clearly a dynamic that could be in play at the margin. As it stands, there is a strong argument here that these measures have the potential to be anti-diversity and anti-access.
There is, I think, a way to counter this argument. Let's say the folks in TCU do the right thing and consign those two ridiculous indicators to the dustbin: why not replace them with indicators which encourage broadening participation? For instance, awarding points to institutions which are particularly good at enrolling students with disabilities, Indigenous students, low-income students, etc. The first two are measured already through the current SMA process; the third could be measured through student aid files, if necessary. That way, any institution which tries to win points by being more restrictive in its intake would lose points on another (hopefully equally weighted) indicator, and the institutions which do best would be those that are both open access and have great graduation/employment outcomes. Which, frankly, is as it should be.
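To show what I mean by offsetting, here is a toy scoring sketch with invented weights and scores (not a proposal for the actual formula): if the access indicators carry weight comparable to the outcome indicators, an institution that lifts its employment numbers by narrowing its intake simply hands the gain back on the access side.

```python
# Invented weights and scores; purely illustrative of how offsetting could work.
WEIGHTS = {"grad_rate": 0.25, "employment": 0.25,
           "low_income_share": 0.25, "indigenous_share": 0.25}

def composite(scores):
    """Weighted sum of indicator scores (all on a 0-1 scale)."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

open_access = {"grad_rate": 0.70, "employment": 0.80,
               "low_income_share": 0.30, "indigenous_share": 0.10}
restrictive = {"grad_rate": 0.80, "employment": 0.90,
               "low_income_share": 0.10, "indigenous_share": 0.02}

print(round(composite(open_access), 3))  # 0.475 -- open-access institution comes out ahead
print(round(composite(restrictive), 3))  # 0.455 -- better outcomes, but the access penalty bites
```

Tomorrow: Measuring skills.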
You're right that the easy way to satisfy a job-success measurement is the cynical one, but your partial response to the problem of measuring job success, balancing it with diversity measurements, could just produce an equally perverse system, in which tokenism would balance elitism without either cancelling the other out.
More generally, I’d have to add that something similar is true for the other measurements and indeed, for measurement in general. If community impact is measured by the division you mention above, relocate to a smaller centre. In any case, all such measurements run up against Goodhart’s law: any metric becomes useless the moment that it becomes a goal. If the goal is more grants, then the solution is not better research, but more expensive research. If the goal is diversity, then the solution is tokenism. If the goal is job-market success for graduates, then the solution is cancelling all programs that don’t precisely lead into jobs. And so forth.
Fortunately, as you point out, universities aren’t entirely cynical, so these perverse effects will (hopefully) be limited. PBF, however, assumes that institutions won’t do the right thing on their own. There’d be no reason for PBF if the government thought institutions didn’t have to be “incentivized.” And it’s incentives which, I’ve argued above, inspire cynical responses.
Thanks for the succinct overview and critique of the proposed changes. I did not know about the existence of the ELMLP. This sort of population data seems superior to the Ontario Graduate Survey, at least for measuring reported income. Thanks for bringing that to my attention.
I always thought that if we wanted to know what skills and competences students gained while in university, we would assess students when they enter university and then when they graduate. Now what would that instrument be for the pre- and post-test? That's the $100,000 question. While this wouldn't necessarily control for student learning outside of university while attending university (for the sake of argument, assume students learn in two places: [1] in activities directly related to universities such as in lab, seminar, lecture, group work, reading, giving presentations, and so on; and [2] elsewhere), it would be superior to anything else I've seen for measuring learning. And yes, it would be even better than grades, which are our current means of indicating that students have learned something. With that said, it probably wouldn't be better than professional accreditation. Still, it would be an improvement for most university programs, since most are not professional programs.
Alex covers this in his next post. He doesn’t mention it, but there was a rather famous study, entitled Academically Adrift, using the Collegiate Learning Assessment as an instrument to measure improvements in critical thinking and reading skills. Normally, it’s only given to graduates, but in this case it was also given to continuing students after their second years, and to graduating high school students. The headline figure was that nearly 40% of students made no improvement over their undergraduate education, but this missed the stronger point, that those who saw the greatest improvements were in the humanities and natural sciences, while those who saw the smallest increase (on this particular metric) were in the social sciences and professional programs. The most successful disciplines, moreover, turned out to be those with the highest reading and writing loads.
That said, nobody during the experiment was going to be judged on the results, so there wouldn’t have been a tendency to teach to the test. If this practice were generalized, there probably would be. What this does show, nevertheless, is that traditional teaching methods in traditional disciplines work, and probably shouldn’t be monkeyed with by administrators armed with standardized tests.
Of course the one thing you can't rule out with respect to the Arum & Roksa study, precisely because "critical thinking" is not a fully nailed-down concept, is that maybe the CLA's construct happened to coincide with the way it's conceptualized in the social sciences and humanities, thus giving those fields an advantage.
Good point, though social sciences did much worse than humanities, and so-called “hard sciences” did better than either. So critical thinking would seem to enjoy a conceptualization in keeping with older studies, in general.
This only seems fair, however. An index of numeracy or job skills would likely conceptualize those things in manners strikingly aligned with the fields being measured, as well.