Performance-Based Funding 101: Measuring Skills

Yesterday, I critiqued most of the indicators being suggested for the new Ontario PBF system.  But I left one out because I thought it was worth a blog all on its own, and that is the indicator related to “skills and competencies”.  It’s the indicator that is likely to draw the most heat from the higher education traditionalists, and so it is worth drilling into.

In principle, measuring the ability of institutions to provide students more of the skills that allow them to succeed in the work force is perfectly sound.  Getting a better job is not the only reason students attend university or college, but repeated surveys over many decades show it is the most important one.  Similarly, while labour force improvement is not the only reason governments fund universities, it is the most important one (it pretty much is the only reason they fund colleges).  So, it is in no way wrong to try to find out whether institutions are doing a good job at this.  And if you don’t like simple employment or salary measures (both of which are fairly crude), then it makes sense to look at something like skills, and the contribution each institution makes towards improving them.

Problem is, there isn’t exactly a consensus on how to do that. 

In some countries – Brazil and Jordan, for instance – students take national subject-matter examinations.  Nobody seems to think this is a good idea in OECD countries.  Partly, this is because there tends to be a lot of curriculum diversity, which makes the setting of such national tests hilariously fraught.  But mostly, it’s because in many fields there is no particularly straight line between disciplinary knowledge and the kinds of skills and competencies required in the labour market.  One is not necessarily superior to the other; they are just different things.  And so was born the idea of trying to measure these skills directly and, more importantly, of measuring their change over time.

The dominant international approach is a variation of the Collegiate Learning Assessment (CLA), pioneered about 20 years ago by the Council for Aid to Education (a division of RAND Corporation).  The approach is, essentially, to give the same exam to (hopefully randomly selected) groups of first-year and final-year students at an institution.  Assuming there has been no major change in the quality of the entering student body (which can be checked in the US by looking at average SAT scores, but is harder to do in Canada), this creates a “synthetic cohort” and obviates having to test incoming students and then wait three years to test them again, which would be tedious.  You score both sets of tests and look at the change in scores over time.  Not all of the growth in skills/competencies can be attributed to the institution, of course (22-year-olds, on the whole, are smarter than 18-year-olds, even if they don’t go to school), so the key is to compare the change in results at one institution against the changes at the others.  Institutions where the measured “gain” is very high are deemed “good” at imparting skills, and those where the gain is small are deemed “poor”.  By measuring learning gain rather than just exit scores, institutions can’t win simply by skimming off the best students; they actually have to do some imparting of skills.
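
To make the mechanics concrete, here is a minimal sketch of that gain calculation, assuming the only inputs are first-year and final-year scores on the same instrument.  The institution names and numbers are invented; this illustrates the arithmetic, not CAE’s actual scoring model.

```python
# Illustration only: synthetic-cohort "learning gain" per institution,
# with invented scores on the same test for first-year and final-year samples.
from statistics import mean

scores = {
    "Institution A": {"first_year": [48, 52, 55, 61], "final_year": [60, 66, 70, 72]},
    "Institution B": {"first_year": [62, 65, 68, 71], "final_year": [66, 69, 70, 74]},
}

def mean_gain(groups: dict) -> float:
    """Average final-year score minus average first-year score."""
    return mean(groups["final_year"]) - mean(groups["first_year"])

# What matters for funding purposes is the comparison across institutions,
# since some growth happens everywhere (22-year-olds vs. 18-year-olds).
for name, groups in sorted(scores.items(), key=lambda kv: mean_gain(kv[1]), reverse=True):
    print(f"{name}: mean gain = {mean_gain(groups):+.1f}")
```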

This is all reasonable, methodologically speaking.  You can quibble about whether synthetic cohorts are good substitutes for real ones, but most experts think they are.  And many years of testing suggest the results are reliable.  The question is, are they valid?  And this is where things get tricky.

The CLA is designed to test students’ ability to think critically, reason analytically, solve problems, and communicate clearly and cogently.  It does this by asking a set of open-ended questions about a document (for some sample questions, see here) and then scoring the answers against a rubric (see here).  This isn’t an outlandish approach, but there are questions to be asked about how well the concepts of “thinking critically” and “reasoning analytically” are actually measured.  As this quite excellent review of CLA methodology suggests, the test seems to have straddled three separate and not-entirely-compatible definitions of “critical thinking”; and while it seems to capture the nature of what is taught in the humanities and social sciences, it is not clear that this is the same thing as “critical thinking skills” in other fields.

(A similar problem exists with tests that claim to measure “problem-solving”.  It turns out that results on problem-solving exams are highly correlated with measures of IQ.  This is not the end of the world if you use the CLA synthetic cohort approach, because you would then be measuring an increase in something like IQ rather than IQ itself, which might not be such a terrible thing to the extent one could prove that the competencies IQ measures are valued in the workplace.  But it’s not quite as advertised.)
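
A toy simulation makes the point about why “gain” helps (this is my own illustration, simplified to the longitudinal case rather than a synthetic cohort): if each score is a fixed, IQ-like component plus whatever the institution adds plus noise, the exit score tracks the fixed component almost perfectly, while the gain barely tracks it at all.

```python
# Toy simulation: exit scores track fixed "ability"; gains largely do not.
import random
import statistics

random.seed(1)

def corr(xs, ys):
    """Pearson correlation, computed from sample covariance and stdevs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))

ability = [random.gauss(100, 15) for _ in range(2000)]        # fixed, IQ-like component
entry = [a + random.gauss(0, 5) for a in ability]             # first-year score
final = [a + 8 + random.gauss(0, 5) for a in ability]         # institution adds ~8 points
gain = [f - e for e, f in zip(entry, final)]

print("corr(ability, final score):", round(corr(ability, final), 2))  # high, ~0.95
print("corr(ability, gain):       ", round(corr(ability, gain), 2))   # near zero
```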

Now, we don’t know exactly how Ontario plans to measure skills. The Higher Education Quality Council of Ontario (HEQCO) – which is presumably the agency that will be put in charge of these initiatives – has been experimenting with a couple of different skills tests: the HEIghten Critical Thinking assessment developed by the American company ETS (some sample questions here), and the Education and Skills Online (ESO) test, which is a variant of the test the OECD uses in its Programme for the International Assessment of Adult Competencies (methodology guide here).  The former is a one-off test of graduating students and (though I am not a psychometrician) just seems like a really superficial version of the CLA or the OECD’s AHELO; the ESO follows the synthetic cohort strategy but relies on a test that is much heavier on simple literacy and numeracy skills and lighter on critical thinking (technically, it tries to measure “problem-solving in a digital environment”, which may come close to IQ testing, but again that is not necessarily a problem if you are looking at “gain”).  That probably means the ESO has higher levels of validity (because literacy and numeracy are pretty well-studied and firmly conceptualized), though again there has to be some doubt about the degree to which these specific skills represent what is valued/rewarded in the workplace.

The smart money says that Ontario is planning to use some variant of EASI (final report here, my take thereon here) to measure institutional outcomes.  Which means the focus will be on literacy and numeracy – not ridiculous things to want post-secondary graduates to have, and hence not ridiculous outcomes for which to hold institutions accountable.  The problem is really twofold.

First, HEQCO (or whoever is eventually put in charge of this) is going to have to pay a lot of attention to sampling.  In the initial test done in 2017, they basically got whoever they could to take the test.  Now that actual money is on the table, the randomization of the sample becomes incredibly important.  And it’s going to be hard to enforce genuine randomization unless the students selected for the sample actually agree to take the test, which, you know, might be tricky since the government can’t actually force anyone to do so.  Second, it’s not clear how well the synthetic cohort strategy will work without a standardized measure like SAT scores to back it up: currently, there’s no good way to tell how alike the entering and exiting cohorts actually are.  I don’t think it’s beyond HEQCO’s wit to devise something (obvious possibility: use individual students’ entering grades, normalized by whatever means Ontario institutions use to equalize scores from high schools with different marking schemes), but they need something better than what they have now.
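
For what that grade normalization might look like, here is a sketch with invented data: convert each student’s entering average into a z-score within their own high school, so that schools with different marking cultures sit on a common scale, then compare the entering and exiting samples on that scale.  To be clear, this is one possible operationalization, not an existing HEQCO procedure.

```python
# Sketch: within-high-school z-scores as a rough check on how alike the
# entering and exiting samples are. All records below are invented.
from collections import defaultdict
from statistics import mean, stdev

# (student_id, high_school, entering_average, which sample they were tested in)
students = [
    ("s1", "HS-North", 88, "entering"), ("s2", "HS-North", 79, "entering"),
    ("s3", "HS-North", 92, "exiting"),  ("s4", "HS-South", 71, "entering"),
    ("s5", "HS-South", 83, "exiting"),  ("s6", "HS-South", 77, "exiting"),
]

# 1) Estimate each high school's marking distribution. In practice you would
#    use province-wide grade data, not just the tested students, to avoid circularity.
grades_by_school = defaultdict(list)
for _, school, grade, _ in students:
    grades_by_school[school].append(grade)
school_stats = {s: (mean(g), stdev(g)) for s, g in grades_by_school.items()}

# 2) Express each entering average as a z-score within its school,
#    then compare the two samples on that common scale.
z_by_sample = defaultdict(list)
for _, school, grade, sample in students:
    m, sd = school_stats[school]
    z_by_sample[sample].append((grade - m) / sd)

for sample, zs in z_by_sample.items():
    print(f"{sample} sample: mean within-school z = {mean(zs):+.2f}")
```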

There will be some who say it is a grave perversion of the purpose of university to measure graduates on anything other than their knowledge of subject matter.  Hornswoggle.  If universities and colleges aren’t focussing on enhancing literacy and numeracy skills as part of their development of subject expertise, something is desperately wrong.  Some will claim it will force institutions to change their curriculum.  No, it won’t: but it will probably get them to change their methods of assessment to focus more on these transversal skills than on pure subject matter.  And that’s a good thing.

What it boils down to is the following: students and governments both pay universities and colleges on the assumption that an education there improves employability.  Employability is at least plausibly related to advanced literacy and numeracy skills.  We can measure literacy and numeracy, and subject to some technical improvements noted above, we can probably measure the extent to which institutions play a role in enhancing those skills.  This is – subject to caveats about not getting too excited about measures of “critical thinking” and paying attention to some important methodological issues – generally a good idea.  In fact, compared to some of the other performance measures the government of Ontario is considering, I would say it is among the better ones.

But do pay attention to those caveats.  They matter.


8 responses to “Performance-Based Funding 101: Measuring Skills”

  1. Ah, so someone has built a better mouse-trap!
    Two questions:
    1. Is the ESO the same thing as the ESAI? Feel free to delete this message if they are and you fix the text.
    2. Couldn’t the government require students to take the test: [1] when they graduate high school; and [2] when they graduate university? Am I missing something? My kid already takes the EQAO tests; I don’t see how this would be much different (though in an HE setting this would require that the universities plan for these tests). This would essentially be a census, and eliminate issues of non-response bias. If the responses were then coupled with tax-filer data, one could then examine the association between ESAI test scores and income.

    For what it is worth, some IQ specialists contend IQ is largely fixed. A variation of that position, so I understand, is that there are two kinds of intelligence: fluid and crystallized. My understanding — and by no means am I an expert on this — is that fluid intelligence corresponds roughly to more traditional understandings of fixed IQ, whereas crystallized intelligence corresponds to the things one learns over time. As an aside, the greater one’s fluid intelligence, the easier it is to acquire crystallized intelligence (though this would also be moderated by diligence, etc). As one ages, fluid intelligence declines but is often compensated for by increases in crystallized intelligence.

    What does this have to do with education? If fluid intelligence is largely fixed, then the outcome of education is largely to increase crystallized intelligence. This also suggests that diligence (now being taught in Ontario public schools under the guise of a growth mindset), in many cases, is superior to fluid intelligence.

    1. 1) EASI was the name of the project. ESO was the name of the test itself. We’re talking about the same thing.
      2) You could do a census, I suppose, but the cost would be phenomenal.

  2. I would want the differences between institutions in ‘improving’ skills corrected for field of study and to be substantially bigger than the margin of error. Which would require calculation and reporting of error margins, which all too frequently is not done.
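
    To make that concrete, a reported margin of error could be as simple as a standard error (and confidence interval) on the difference in mean gains between two institutions.  The numbers below are invented, and a real analysis would also want to adjust for field-of-study mix:

    ```python
    # Sketch: standard error and 95% CI on the difference in mean gains
    # between two institutions (invented numbers, normal approximation).
    from math import sqrt
    from statistics import mean, stdev

    gains_u1 = [9, 12, 7, 14, 10, 8, 11, 13]
    gains_u2 = [5, 9, 6, 4, 8, 7, 10, 6]

    diff = mean(gains_u1) - mean(gains_u2)
    se = sqrt(stdev(gains_u1) ** 2 / len(gains_u1) + stdev(gains_u2) ** 2 / len(gains_u2))
    low, high = diff - 1.96 * se, diff + 1.96 * se

    print(f"difference in mean gain: {diff:.2f} (95% CI {low:.2f} to {high:.2f})")
    # A claim that institution 1 'improves' skills more than institution 2 should
    # want this interval to exclude zero, ideally after adjusting for field of study.
    ```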

  3. I should think that we’re in no danger of forgetting that governments pay universities and students attend universities in order to improve wages. What we are in danger of forgetting is that universities are anything else.

    And that leads to a paradox: university education generally leads to better incomes, but aiming for better incomes undermines university education.

    Using the ESO or CLA or something would allow us to measure how much student literacy skills advance while (say) completing an Art History degree, and I’m quite confident that it’s a lot. Knowing that their students would be tested on literacy improvement, however, might incentivize universities to bully their art historians into teaching more remedial grammar and less art. This would frustrate the goal of actually inculcating literacy, but also deny students the benefits of a liberal education.


  4. I’m a big believer in accountability and an even bigger believer in developing transferable skills.
    But there’s a lot to say about all of this. First, I wouldn’t ever want to compare schools to each other, but would only want to measure the value add from first semester to last semester. Second, any standardized test is likely to be very stable once a solid methodology has been developed, but it should be noted that uneven student participation and effort can skew results considerably (a toy illustration of this follows at the end of this comment). Getting valid and reliable data is not easy. Third, a dollar spent on measurement whose data are never used for improvement is a dollar wasted; let’s just call that an expenditure. If the data can be used to influence faculty attitudes and behaviours to improve the development of higher-order cognitive skills, that is an investment. The cost of implementing standardized testing is significant; direct costs are high, but indirect costs can be even higher. Finally, does anyone know of a system where this kind of PBF has actually led to tangible improvements in student outcomes? I’d appreciate a HESA blog on this.
    Anyone interested in learning more should look at the final report from Queen’s HEQCO-funded 4-year longitudinal study using both standardized measures and validated rubrics: http://www.queensu.ca/qloa/longitudinal-study/project-report or the more abbreviated results http://www.queensu.ca/qloa/longitudinal-study/quantitative-results
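
    On the participation point, a quick invented simulation shows the mechanism: if stronger students are even modestly more likely to sit the exit test, the measured mean gain drifts above the true institutional effect even though nothing about teaching has changed.

    ```python
    # Toy simulation: non-random participation in the exit test inflates measured gain.
    import random
    from statistics import mean

    random.seed(7)
    TRUE_GAIN = 8  # what the institution actually adds, in test-score points

    entering_skill = [random.gauss(100, 15) for _ in range(5000)]
    exit_scores = [s + TRUE_GAIN + random.gauss(0, 5) for s in entering_skill]

    # Everyone is tested at entry, but the chance of sitting the exit test rises with skill.
    exit_takers = [x for s, x in zip(entering_skill, exit_scores)
                   if random.random() < min(1.0, s / 130)]

    measured_gain = mean(exit_takers) - mean(entering_skill)
    print(f"true gain: {TRUE_GAIN}, measured gain: {measured_gain:.1f}")  # measured > true
    ```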

  5. In addition to the sampling, implementation and interpretation issues (i.e. comparison between universities, between student groups, between disciplines and fields of study) that you and others may be concerned about with ESO, the construct or model of literacy (really, just reading) is a problem.
    While the construct works for what it was designed to do—compare test results with results from questionnaires to explore socioeconomic relationships—once the test is carried into educational environments, it introduces a series of distortions and pedagogical perversions that are counter-productive and systemically unfair.
    Validity, really only internal validity here, in one context does not in any way imply validity in another context, contrary to what you stated: “That probably means EASI has higher levels of validity (because literacy and numeracy are pretty well-studied and firmly conceptualized).” Compounding the fault in this thinking is a gross misunderstanding of the transferability of literacy, which doesn’t automatically transfer from one context to the next.
    Although the ESO is referred to as a test of literacy, numeracy and problem-solving, these comprehensive terms completely obscure the fact that it is a READING test. The other part of literacy, producing text, arguably the more important part in today’s workplaces, is NOT tested. In the so-called numeracy portion, some basic calculations must be performed to complete the reading items. In the problem-solving portion (already admittedly outdated), test-takers are presented with constructed on-line environments and must use some basic screen navigation techniques to respond to reading-type questions.
    The type of reading tested is simply a sophisticated scanning technique, where test-takers are prompted to find one or two bits of information, avoid distractions (the variable that predicts difficulty the most) and supply a one-word or one-phrase response. You can think of it as an eye-hand coordination test for the information age. Rather than manipulate pegs in a peg board, test-takers manipulate bits of information. More than anything, ESO tests short-term memory in textual environments (texts with an average Grade 8 readability) and test-taking skills (remember those nasty distractors). Perhaps this approach is useful in very specific instances when hiring, but it’s counter-productive in education in general. The related problem with the construct is that it DOES NOT draw on language development indicators, such as vocabulary, sentence construction or the reading comprehension skills taught in K-12, or the reading-for-academic-purposes strategies taught in PSE. Perversely, its theoretical basis is an error analysis of test-taking skills and not reading development indicators!
    So what happens then, when this construct is introduced and valued or even used in high-stakes decision-making in PSE? A whole slew of paradoxical and perverse responses will follow:
    ESO results won’t align neatly with the results from other tests or even grades, since the underlying construct and the overall pedagogical approach are fundamentally different.
    ESO skills (quick scanning to find bits of information to complete the question) are counter-productive to the development of skills valued and needed in PSE and afterwards—that is, deep, careful reading, connecting text to experience and other texts, and the application of that text-informed knowledge. But superficial scanning skills will become more valued, depending on the stakes, and more time will be devoted to their development, pushing aside what really matters.
    If stakes are low for students, they will blow off the test. If stakes are high for institutions and low for students, they will be pressured and the professional judgement of educators will be continuously compromised. Forget about building trusting learning relationships.
    Those who will be subject to a perverse pedagogy are often non-traditional students, out of school for some time with rusty test-taking skills, and multi-lingual students studying in English or French for the first time. When they need more PSE supports, they will actually receive fewer.
    While multilingual students and others may be performing well in courses, the test construct, devoid of traditional language development indicators, will unfairly introduce new challenges. Again, they may be unfairly targeted for ESO remediation, a barrier to the types of skills they actually need.
    Like a faulty or biased algorithm, the ESO reading model, lifted out of its original context and into a learning context, will unleash a series of systemic inequities and counter-productive actions.
