September 08

Data on Sexual Harassment & Sexual Assault in Higher Ed: an Australian Experiment

Earlier this year, I raged a bit at a project that the Ontario government had launched: namely, an attempt to survey every single student in Ontario about sexual assault in a way that – it seemed to me – was likely to be (mis)used for constructing a league table of which institutions had the highest rates of sexual assault. While getting more information about sexual assault seemed like a good idea, the possibility of a league table – based as it would be on a voluntary survey with pretty tiny likely response rates – was a terrible idea which I suggested needed to be re-thought.

Well, surprise!  Turns out Australian universities actually did this on their own initiative last year.  They asked the Australian Human Rights Commission (AHRC) to conduct a survey almost exactly along the lines I said was a terrible idea. And the results are…interesting.

To be precise: the AHRC took a fairly large sample (a shade over 300,000) of university students – not a complete census the way Ontario is considering – and sent them a well-thought-out survey (the report is here). The response rate was 9.7%, and the report authors quite diligently and prominently noted the issues with data of this kind, which are the same ones that bedevil nearly all student survey research, including things like the National Survey of Student Engagement, the annual Canadian University Survey Consortium studies, etc.

The report went on to outline a large number of extremely interesting and valuable findings. Even if you take the view that these kinds of surveys are likely to overstate the prevalence of sexual assault and harassment because of response bias, the data about things like the perpetrators of assault/harassment, the settings in which it occurs, the reporting of such events and the support sought afterwards are still likely to be accurate, and the report makes an incredible contribution by reporting these in detail (see synopses of the report from CNN and Nature). And, correctly, the report does not reveal data by institution.

So everything’s good?  Well, not quite.  Though the AHRC did not publish the data, the fact that it possessed data which could be analysed by institution set up a dynamic where if the data wasn’t released, there would be accusations of cover-up, suppression, etc.  So, the universities themselves – separate from the AHRC report – decided to voluntarily release their own data on sexual assaults.

Now I don’t think I’ve ever heard of institutions voluntarily releasing data on themselves which a) allowed direct comparisons between institutions b) on such a sensitive subject and c) where the data quality was so suspect.  But they did it.  And sure enough, news agencies such as ABC (the Australian one) and News Corp immediately turned this crap data into a ranking, which means that for years to come, the University of New England (it’s in small-town New South Wales) will be known as the sexual assault capital of Australian higher education.  Is that label justified?  Who knows?  The data quality makes it impossible to tell.   But UNE will have to live with it until the next time universities do a survey.

To be fair, on the whole the media reaction to the survey was not overly sensationalist. For the most part, it focussed on the major cross-campus findings and not on institutional comparisons. Which is good, and suggests that some of my earlier concerns may have been overblown (though I’m not entirely convinced our media will be as responsible as Australia’s). That said, for data accuracy, use of a much smaller sample with incentives to produce a much higher response rate would still produce a result with much better data quality than what the AHRC did, let alone the nonsensical census idea Ontario is considering. The subject is too important to let bad data quality cloud the issue.


February 23

Garbage Data on Sexual Assaults

I am going to do something today which I expect will not stand me in good stead with one of my biggest clients. But the Government of Ontario is considering something unwise and I feel it best to speak up.

As many of you know, the current Liberal government is very concerned about sexual harassment and sexual assault on campus, and has devoted no small amount of time and political capital to getting institutions to adopt new rules and regulations around said issues.  One can doubt the likely effectiveness of such policies, but not the sincerity of the motive behind them.

One of the tools the Government of Ontario wishes to use in this fight is more public disclosure about sexual assault. I imagine they have been influenced by how the US federal government collects and publishes statistics on campus crime, including statistics on sexual assaults. If you want to hold institutions accountable for making campuses safer, you want to be able to measure incidents and show change over time, right?

Well, sort of.  This is tricky stuff.

Let’s assume you had perfect data on sexual assaults by campus. What would that show? It would depend in part on the definitions used. Are we counting sexual assaults/harassment which occur on campus? Or are we talking about sexual assaults/harassment experienced by students? Those are two completely different figures. If the purpose of these figures is accountability and giving prospective students the “right to know” (personal safety is after all a significant concern for prospective students), how useful is that first number? To what extent does it make sense for institutions to be held accountable for things which do not occur on their property?

And that’s assuming perfect data, which really doesn’t exist. The problems multiply exponentially when you decide to rely on sub-standard data. And according to a recent Request for Proposals placed on the government tenders website MERX, the Government of Ontario is planning to rely on some truly awful data for its future work on this file.

Here’s the scoop: the Ministry of Advanced Education and Skills Development is planning to do two surveys: one in 2018 and one in 2024. They plan on getting contact lists of emails of every single student in the system – at all 20 public universities, 24 colleges and 417 private institutions – and handing them over to a contractor so they can do a survey. (This is insane from a privacy perspective – the much safer way to do this is to get institutions to send out an email to students with a link to a survey, so the contractor never sees the names without students’ consent.) Then they are going to send out an email to all those students – close to 700,000 in total – offering $5 per head to answer a survey.

It’s not clear what Ontario plans to do with this data. But the fact that they are insistent that *every* student at *every* institution be sent the survey suggests to me that they want the option to be able to analyze and perhaps publish the data from this anonymous voluntary survey on a campus-by-campus basis.

Yes, really.

Now, one might argue: so what? Pretty much every student survey works this way. You send out a message to as many students as you can, offer an inducement and hope for the best in terms of response rate. Absent institutional follow-up emails, this approach probably gets you a response rate between 10 and 15% (a $5 incentive won’t move that many students). Serious methodologists grind their teeth over those kinds of low numbers, but increasingly this is the way of the world. Phone polls don’t get much better than this. The surveys we used to do for the Globe and Mail’s Canadian University Report were in that range. The Canadian University Survey Consortium does a bit better than that because of multiple follow-ups and strong institutional engagement. But hell, even StatsCan is down to a 50% response rate on the National Graduates Survey.

Is there non-response bias? Sure. And we have no idea what it is. No one’s ever checked. But these surveys are super-reliable even if they’re not completely valid. Year after year we see stable patterns of responses, and there’s no reason to suspect that the non-response bias is different across institutions. So if we see differences in satisfaction of ten or fifteen percent from one institution to another, most of us in the field are content to accept that finding.

So why is the Ministry’s approach so crazy when it’s just using the same one as everyone else? First of all, the stakes are completely different. It’s one thing to be named an institution with low levels of student satisfaction. It’s something completely different to be called the sexual assault capital of Ontario. So accuracy matters a lot more.

Second, the differences between institutions are likely to be tiny. We have no reason to believe a priori that rates differ much by institution. Therefore small biases in response patterns might alter the league table (and let’s be honest, even if Ontario doesn’t publish this as a league table, it will take the Star and the Globe about 30 seconds to turn it into one). But we have no idea what the response biases might be and the government’s methodology makes no attempt to work that out.
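To get a feel for how fragile such a league table could be, here is a rough Python sketch. It assumes every institution has exactly the same true rate and lets nothing but chance decide who answers what. All of the numbers (20 institutions, 1,500 respondents each, a 5% true rate) and the function name `simulate_league_table` are illustrative assumptions of mine, not figures from the RFP or from any real survey.

```python
import random


def simulate_league_table(n_institutions=20, respondents=1500,
                          true_rate=0.05, seed=0):
    """Rank institutions by observed rate when the true rate is identical everywhere.

    All defaults are hypothetical values chosen for illustration only.
    """
    rng = random.Random(seed)
    table = []
    for i in range(n_institutions):
        # Each respondent independently reports an incident with probability true_rate.
        positives = sum(rng.random() < true_rate for _ in range(respondents))
        table.append((f"Institution {i + 1}", positives / respondents))
    # Sort from highest observed rate to lowest, the way a newspaper ranking would.
    table.sort(key=lambda row: row[1], reverse=True)
    return table


if __name__ == "__main__":
    for name, rate in simulate_league_table():
        print(f"{name:15s} {rate:.1%}")
```

Even though no institution is actually any different from the others in this setup, the sorted output still has a top and a bottom, and changing `seed` reshuffles who sits where. That is sampling noise alone, before any response bias is added.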

Might people who have been assaulted be more likely to answer than those who have not? If so, you’re going to get inflated numbers. Might people have reasons to distort the results? Might a Men’s Rights group encourage all its members to indicate they’d been assaulted to show that assault isn’t really a women’s issue? With low response rates, it wouldn’t take many respondents to get that tactic to work.
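The arithmetic behind both worries is simple enough to sketch in a few lines of Python. The two helper functions below are hypothetical names of my own, and every input (a 5% true rate, 15% versus 10% response rates, a campus of 5,000, 25 coordinated answers) is an assumed figure for illustration, not an estimate from any actual survey.

```python
def observed_prevalence(true_rate, response_if_assaulted, response_if_not):
    """Share of respondents reporting an assault when the two groups
    answer the survey at different rates."""
    responders_assaulted = true_rate * response_if_assaulted
    responders_not = (1 - true_rate) * response_if_not
    return responders_assaulted / (responders_assaulted + responders_not)


def with_coordinated_responses(population, response_rate, true_rate,
                               fake_positives):
    """Observed rate when an organised group adds false positive answers
    on top of an otherwise unbiased set of respondents."""
    respondents = population * response_rate
    genuine_positives = respondents * true_rate
    return (genuine_positives + fake_positives) / (respondents + fake_positives)


if __name__ == "__main__":
    # Assumed inputs: 5% true rate; 15% of those assaulted respond vs 10% of others.
    print(f"{observed_prevalence(0.05, 0.15, 0.10):.1%}")
    # Assumed inputs: campus of 5,000, 10% response rate, 25 coordinated answers.
    print(f"{with_coordinated_responses(5000, 0.10, 0.05, 25):.1%}")
```

Under those assumptions, a modest gap in willingness to respond turns a true 5% into an observed figure of roughly 7.3%, and 25 coordinated respondents on a campus of 5,000 push it to roughly 9.5%. Neither distortion would be visible in the published number.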

The Government is never going to get accurate overall rates of sexual assault from this approach. They might, after repeated tries, start to see patterns in the data: sexual assault is more prevalent in institutions in large communities than in small ones, maybe; or it might happen more often to students in certain fields of study than others. That might be valuable. But if the first time the data is published all that makes the papers is a rank order of places where students are assaulted, we will have absolutely no way to contextualize the data, no way to assess its reliability or validity.

At best, if the data is reported system-wide, the data will be weak. A better alternative would be to go with a smaller random sample and better incentives so as to obtain higher response rates. But if it remains a voluntary survey *and* there is some intention to publish on a campus-by-campus basis, then it will be garbage. And garbage data is a terrible way to support good policy objectives.

Someone – preferably with a better understanding of survey methodology – needs to put a stop to this idea.  Now.