How well should scientific code be scrutinised?

A friend, Dr. Kieran Alden, forwarded me a Nature article today describing how Mozilla (who make the Firefox browser, amongst other things) are looking through 200-line snippets of code from 9 articles published in the journal PLoS Computational Biology. They are investigating whether it is worth having competent professional coders, people trained primarily in best coding practice, check the code of scientists (who are often not formally trained at all) when it is published.

This is a debate that I gather has been gaining momentum; I keep stumbling onto it. Should code be published, and how well should it be reviewed? I understand researchers' desire to keep their code to themselves. From a research perspective there's a lot of up-front cost to developing this sort of code, and once someone has it they can build on it and run experiments with almost no buy-in at all. Research code is very cheap to execute, and very difficult to get right in the first place. Certainly, there are a lot of labs who won't share code until the first round of results emanating from it has already been published – it's too easy for competing labs to get ahead based on your effort.

But not sharing code can arguably hold back science, with labs investigating similar phenomena having to develop their own code from scratch. This is not very efficient, and it seems that the incentives are wrong if this is the behaviour the powers-that-be have produced. But that's another story. The article briefly touches on two facets of the debate: bad code can be misleading, and harm a discipline rather than inform it; and researchers who are already unsure about publishing code may be put off completely if it gets reviewed as well as published.

For me this isn't even a debate; it's a no-brainer. As a computer scientist by training who has developed several scientific simulations, I am very familiar with the difficulty of simulating complex domains. I smiled at the article's implicit suggestion that commercial software is the way forward – I've used Linux, Windows, Mac, Firefox… so much of this stuff is full of bugs. But that's not the point. In research we're simulating things we often don't understand very well, which makes bugs tricky to check for: is a weird behaviour an interesting and unexpected result representative of the science, or a coding error? And that's if you spot it at all; research labs certainly don't have the budgets to spend on testing that commercial enterprises do.

For me it boils down to this: what do you think would happen if I, a computer scientist, tried to run experiments in a sterile biological lab with no formal training? Would you trust the results? You might find me drinking coffee out of a Petri dish. And then, suppose that when I came to publish I was vague about my methods – "Just trust me, I did everything right… doesn't matter how." Science demands stringent adherence to, and reporting of, experimental protocol in most disciplines. I don't see how scientific results coming from a computer are any different.

I think code should be published, and reviewed. The journal(s) that make this their policy (particularly the reviewing part) will earn a reputation for publishing robust, high-quality work. Researchers have to publish in good journals to progress, so they will have to adhere to the policy – that or accept publication in a "lesser" journal. With spurious results from bad code holding such potential to mislead and set back a scientific field, I think it's highly worthwhile for journals to employ experienced coding professionals to check code (as suggested in the article). But really, this is too late in the process. What we really need is for these stringent standards to be applied during the research and development stage, so that errors are caught and fixed as they occur, rather than killing research tracks years after they start because errors only surface at publication.
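To make concrete what I mean by standards during development, here is a minimal sketch of the kind of check a lab could run routinely: a unit test that pins a simulation step against cases with known answers, so a coding error can't masquerade as an "interesting" result. The logistic_step function and its parameters are entirely hypothetical, for illustration only – this example is mine, not the article's.

```python
# A minimal sketch: testing one step of a hypothetical logistic-growth model
# against cases where the correct answer is known in advance.
import unittest


def logistic_step(population, growth_rate, carrying_capacity, dt):
    """Advance a logistic-growth model by one explicit Euler step."""
    return population + dt * growth_rate * population * (1.0 - population / carrying_capacity)


class TestLogisticStep(unittest.TestCase):
    def test_zero_population_stays_zero(self):
        # An empty population cannot grow: any non-zero result is a bug, not biology.
        self.assertEqual(logistic_step(0.0, 0.5, 100.0, 0.1), 0.0)

    def test_no_growth_at_carrying_capacity(self):
        # At carrying capacity the growth term should vanish.
        self.assertAlmostEqual(logistic_step(100.0, 0.5, 100.0, 0.1), 100.0)

    def test_small_population_grows(self):
        # Well below capacity, one step should increase the population.
        self.assertGreater(logistic_step(1.0, 0.5, 100.0, 0.1), 1.0)


if __name__ == "__main__":
    unittest.main()
```

Nothing fancy – but written alongside the model, checks like these catch the obvious errors while they are still cheap to fix, rather than after the results are in print.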

(The original article: EC Hayden. Mozilla plan seeks to debug scientific code. Nature 501: 472, 2013.)