The Leiden Manifesto

I previously wrote about my disdain for publication metrics, both here and with DC. Last week, yet another comment piece broadly propounding metrics was published in Nature. The journal declared its conflict of interest – its publisher supports financially – clearly this time. The piece promotes a “manifesto” for bibliometrics, bolstered by the unlikely premise that, correctly deployed in a secondary role behind expert assessment, publication metrics are indispensable. However, some of the suggestions are a little odd, and the general damning of metrics with faint praise renders this “manifesto” at best an addition to the historical record of the current sorry conception of scientific assessment. Let’s dive in.

Metrics have proliferated: usually well intentioned, not always well informed, often ill applied.

Be fair, tell me about their weaknesses as well!

Research metrics can provide crucial information that would be difficult to gather or understand by means of individual expertise. But this quantitative information must not be allowed to morph from an instrument into the goal.

An assertion with nothing to back it up, followed by pure naïveté. What crucial information? It’s well-documented that performance metrics shape the behaviour of those who are assessed. “Tell me how you measure me, and I’ll tell you how I behave” – from Goldratt, and discussed succinctly here and elsewhere. Whether you want to allow it or not: this transformation has already happened. Graduate students learn impact factors by heart before they master complicated research techniques – and act accordingly.

Once-useful metrics become inadequate; new ones emerge

Which metric was once useful? Only a ridiculous idea that caused uproar in Australia is mentioned at this point. Cynics might say that this call for regular updating seems like an appeal for regular conferences about bibliometric meta-research. The fact that the old metrics are bad doesn’t make the new ones any good.

Evaluators must strive for balance — simple indicators true to the complexity of the research process.

Grammatically correct, but the concept of any simple true indicator has several obvious flaws. Good research (in its published form) is innovative, yet careful, internally consistent and thorough. How can these disparate facets be captured in a simple indicator? How to deal with retractions, corrections, author contributions and derivative work? None of these important confounders can be adequately included in current schemes, and it’s against the interests of most involved parties to be too critical in any case. But we need criticism to get good science.

I was prepared to give this piece the benefit of the doubt, but I cracked when I read this part:

it makes no sense to distinguish between journals on the basis of very small impact factor differences. Avoid false precision: only one decimal is warranted.

A great deal is revealed here. First, the idea that anyone should distinguish between journals on the basis of impact factor is crude. But to distinguish on decimal points? Now we know – Nature Neuroscience (impact factor 14.2, rounded to one decimal place) is a better journal than Neuron (14.0). This advice is far from sensible.

(As a side note, it’s useful for my point here that the table of IFs for neuroscience journals was put online, but it must have taken a while to collate, and what is really the point? Full disclosure: I have had 4 papers in Neuron and none in Nature Neuroscience.)

More importantly, the impact factor of the journal tells you nothing about the value of any individual article – whether it is good science or not. For that you have to READ IT! Your own subjective ideas about journal prestige might tell you a little about the quality of the article, but nothing (and certainly no metric) compares to reading the paper. There is no correlation between a journal’s impact factor and the citation counts of its individual articles, because the distribution of citations is so skewed (Seglen 1992). There is a correlation between impact factor and retraction rate.
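To make the skew argument concrete, here is a minimal sketch with entirely made-up citation counts for an imaginary journal (the numbers are invented for illustration and are not real data). It only shows the arithmetic: when a couple of articles collect most of the citations, the journal-level average says little about a typical paper.

```python
from statistics import mean, median

# Hypothetical citation counts for 16 articles in one imaginary journal.
# These numbers are invented for illustration; they are not real data.
citations = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 7, 10, 25, 96]

# The journal-level average, analogous in spirit to an impact factor.
journal_mean = mean(citations)

# The median is what a "middle" article in this journal actually receives.
typical_article = median(citations)

# How many articles fall below the journal-level average?
below_mean = sum(1 for c in citations if c < journal_mean)

# Share of all citations collected by the two most-cited articles.
top_two_share = sum(sorted(citations)[-2:]) / sum(citations)

print(f"mean citations per article:   {journal_mean:.1f}")
print(f"median citations per article: {typical_article:.1f}")
print(f"articles below the mean:      {below_mean} of {len(citations)}")
print(f"share held by top two papers: {top_two_share:.0%}")
```

In this toy example the average is four times the median, and 13 of the 16 articles sit below it, so knowing the journal-level number would not help you guess how any one paper fared.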

Reading and judging a researcher’s work is much more appropriate than relying on one number.

Agreed, so why go to all the effort of getting any of the numbers? Perhaps because you are not qualified to read the work? Then don’t pretend to judge the researcher. The alternative of judging without any metrics is barely considered – but why should we imagine that numbers can save us from a lack of judgement? We will see below that judging without metrics is actually fine. 

We offer this distillation of best practice in metrics-based research assessment so that researchers can hold evaluators to account, and evaluators can hold their indicators to account.

This comment piece is not a distillation of best practice. It’s promotion of enterprises that waste time and, in some cases, exist to make money, but that contribute next to nothing to science. Yet another article intended to promote the use of metrics raises several peerless arguments against them. There is no reason to believe the premise that employing metrics properly (according to Leiden rules) will make them useful. No studies have been done (to my knowledge) as to what metrics contribute.

Ironically, also last week, a thorough analysis of peer review at NIH study sections showed that these expert panels (composed of working scientists) could provide useful judgement on the quality of grant applications. Thus the study section score could tell you how productive a particular research programme was expected to be. There is perhaps more irony about how this study measured research output (number of papers, number of citations and number of patents) – but we’ll have to let that one go. Now, I have never been on such a study section, but I know well some who have. It’s exhausting work – because it doesn’t employ metrics at all. The reviewers have a stack of grants to read and must understand them. Nobody thinks that the NIH system is perfect. But the study from Li and Agha shows that this approach, draining though it is, has value. They conclude:

Our findings demonstrate that peer review generates information about the quality of applications that may not be available otherwise.

Compare with the assertion above. We now have some evidence that expert assessment works. Until anyone can produce a publication metric with demonstrated value to science, metrics will continue to be a race to the bottom, a fetish. Just because some metrics were invented by researchers doesn’t mean that they won’t damage science. Unfortunately, people occasionally invent things and regret their subsequent use (Eugene Garfield, anyone?).

Possibly the Leiden manifesto will be assessed favourably, tweeted about and widely cited – it’s in Nature after all and Nature has a super impact factor. But if one takes the time to read the piece, there are striking similarities to metrics: it’s well-intentioned, but sadly of little use, and quite possibly dangerous to science.

