Bibliometrics, Part Deux: Database Politics

Back in the 1950s, Eugene Garfield came up with the idea of creating indexes of citations in scientific journals. The Science Citation Index then appeared in 1964, and the Social Sciences Citation Index followed in 1973.

Ever since then, it has been possible to answer questions like “Who publishes more articles?” and “Whose articles are cited more often?” With little effort, you can aggregate up from individual academics to departments or institutions, and this data feeds many international ranking systems. But how reliable are the counts that come from these databases?
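To make that aggregation step concrete, here is a minimal sketch in Python, using invented author and department records, of how per-author publication and citation counts roll up to the department level:

```python
from collections import defaultdict

# Hypothetical per-author records: (author, department, papers, citations).
# Names and numbers are invented for illustration only.
records = [
    ("Author A", "Biology", 12, 340),
    ("Author B", "Biology", 7, 95),
    ("Author C", "History", 4, 22),
    ("Author D", "History", 6, 48),
]

# Roll individual counts up to the department level.
dept_totals = defaultdict(lambda: {"papers": 0, "citations": 0})
for _author, dept, papers, citations in records:
    dept_totals[dept]["papers"] += papers
    dept_totals[dept]["citations"] += citations

for dept, totals in sorted(dept_totals.items()):
    print(f"{dept}: {totals['papers']} papers, {totals['citations']} citations")
```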

One issue is how journals make it into the various databases: which journals a database chooses to include can change the outcome of any comparison built on it. Thomson Reuters’s Web of Science database is more selective than Elsevier’s Scopus, meaning the latter includes more journals, but the difference lies mostly in lower-impact (i.e., less frequently cited) publications. Google Scholar – which indexes individual publications rather than journals – is more inclusive still, but as a result it also sweeps in even more marginal material.

Publication databases tend to be biased towards English-language publications (Google Scholar much less so than the others), though Thomson Reuters has recently been active in adding Chinese- and Portuguese-language journals to go after the Chinese and Brazilian markets. And of course there are biases arising from differences in publication cultures across disciplines: the fragmented nature of the humanities means they don’t have “standard” journals the way some disciplines (e.g., biology) do, which leads to fewer humanities journals being included.

No one database is prima facie “better” than the others – each is just a different lens on the universe of scholarly publication. If you enter your name in Google Scholar, Scopus and WoS, there’s a strong likelihood that each will bring back a slightly different selection of your scholarly papers. But the results remain highly correlated; it’s nearly impossible to look great in one database and bad in another.
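As a rough illustration of what “highly correlated” means here, consider a sketch with invented citation counts for the same ten academics as reported by two hypothetical databases: a Spearman rank correlation stays close to 1 even though the raw counts disagree.

```python
# Hypothetical citation counts for the same ten academics in two databases.
db_a = [120, 85, 60, 200, 15, 40, 300, 10, 95, 55]
db_b = [140, 70, 75, 180, 20, 35, 260, 5, 110, 50]

def ranks(values):
    # Rank from smallest to largest (no ties in this toy data).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman rho: the Pearson correlation of the two rank vectors.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

print(f"Spearman rho: {spearman(db_a, db_b):.3f}")  # close to 1.0
```

Rank correlation is the natural measure here, because bibliometric comparisons care about relative position, not raw counts.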

Yet there are stories of VPs and deans doing exactly this kind of check and concluding that bibliometrics are bogus because one of their favourite articles is missing. This reasoning is weak: the point of bibliometrics is not to “capture” reality perfectly, but to compare academics (or groups of academics) to one another. Differences or omissions in coverage are fundamentally irrelevant unless the choice of database systematically favours one group of academics over another. Which it almost never does.

So, are there problems with all the major databases? Sure. Do these problems fundamentally affect their ability to reflect relative positions within the scholarly literature? Not one bit. And that’s precisely why bibliometrics work.
