How much “cutting edge” is enough for a high-impact publication? And a detour into statistical significance and computers.

I was curious to find a commentary paper titled “So you want to be a computational biologist” in Nature Biotechnology today. Nature Biotechnology is, as the “Nature” label might suggest, a cutting-edge, hugely cited journal dedicated to disseminating the most relevant advances in biotechnology techniques and methodologies. The commentary seems to be targeted at biologists, but whether they are established researchers or recent graduates is unclear. It is the most basic of basic introductions to the world of coding and working with computers. It advocates using the Unix command line, since that is the best interface from which to control bioinformatics software, and creating “traps” for your code, which are better known in computer science as tests. Testing your computer code is a fundamental practice introduced in first-year undergraduate degrees, if not in high school. The caution against over-interpreting statistically significant results (there’s a lot to be said about the problems of using only statistical significance tests, in the absence of effect magnitude tests, in computer-based scientific disciplines where extra samples are easy to come by – see below) should be familiar to all scientific PhD candidates. The same applies to the use of ‘controls’ in designing experiments. I am surprised at how much of this article constitutes basic computer science practice, statistics and experimental design, given the journal. The “have an Obama frame of mind: Yes, you can!” tone of the article is also curious.

All is not lost, however: there are at least two excellent and highly relevant messages hidden in the article. 1) Knowledge of the biology is required to correctly and appropriately interpret the results. This is not the whole story; I feel a good knowledge of the abstractions and assumptions inherent in computational techniques is also required, but it is an encouraging message. 2) The article finishes by advising the would-be, freshly inspired computational biologist to find help from people in the know. That chimes with me: trying to conduct meaningful computational biology research without a firm grounding in the biology is crazy; it’s difficult to ensure that the research is relevant. To me it seems just as crazy for biologists with no training to start tinkering with computational techniques and hope to conduct sound research. Which leads me to wonder what the point of this article actually is. It seems to me the sensible message is, “Want to start doing computational biology research? Get some training and find people who know what they’re doing”, which requires a lot less space than three pages in a cutting-edge journal.

Oh, and the statistics. When you’re conducting computer-based scientific research you have to be very careful about using only statistical significance tests (the ones that give you p-values). You’re best off using both statistical significance and effect magnitude tests (the latter are also called scientific significance tests). The reason has to do with the availability of samples. Statistical significance came about in a period of science when samples weren’t so easy to come by: experiments on people (psychology) or on animals were expensive and carried ethical concerns. The experimenter wants to determine whether the experiment has any effect. This is normally done by rejecting the null hypothesis – a hypothesis along the lines of “what I’m doing/testing has absolutely no effect”. Statistical significance tests determine the probability of getting data at least as extreme as the observed data if the null hypothesis is true. What the experimenter is looking for is observed data that is so hugely unlikely (though not impossible) under the null hypothesis that it’s reasonable to conclude the null hypothesis is wrong. A less than 5% chance is a common threshold, expressed as “p < 0.05”.
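To make that recipe concrete, here is a minimal sketch in Python (my own illustration, not from the commentary). It uses a permutation test, one concrete instance of the general recipe, on made-up numbers: assume the null hypothesis, then ask how often chance alone would produce data at least as extreme as what you observed.

```python
import random

# Made-up measurements (purely illustrative): a "control" group and a
# "treatment" group.
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]
treatment = [4.6, 4.9, 4.3, 4.8, 4.5, 4.7]

observed_diff = sum(treatment) / len(treatment) - sum(control) / len(control)

# Null hypothesis: the group labels make no difference. If that's true,
# shuffling the labels should produce differences as large as the observed
# one reasonably often.
pooled = control + treatment
n_control = len(control)
n_shuffles = 10_000
more_extreme = 0
random.seed(0)
for _ in range(n_shuffles):
    random.shuffle(pooled)
    diff = (sum(pooled[n_control:]) / (len(pooled) - n_control)
            - sum(pooled[:n_control]) / n_control)
    if abs(diff) >= abs(observed_diff):
        more_extreme += 1

p_value = more_extreme / n_shuffles
print(f"observed difference: {observed_diff:.2f}, p-value: {p_value:.4f}")
# A small p-value (say < 0.05) means data this extreme would be rare if the
# null hypothesis were true, so we reject it.
```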

Let’s take an example: you want to test whether a coin is fair. The null hypothesis here is “the coin is fair”. You flip it 10 times, giving 6 heads and 4 tails. Is it a fair coin? The p-value will come back as something > 0.05, meaning you cannot reject the null hypothesis. What this means is, “there’s a good chance you could have gotten 6 heads out of 10 even if the coin was fair – it’s just luck”. Getting 9 heads would be a different story… Anyway, so this time you take more samples. You flip the coin 1,000,000 times (you have a lot of free time, and rock-solid commitment to the cause) and get 550,000 heads. It’s nearly 50:50, but that’s 50,000 more heads than the 500,000 you’d expect from a fair coin. That’s a lot of heads. It’s pretty unlikely that a fair coin would have given you this many extra heads. I reckon you’d get a small p-value here, < 0.05 (or “*”). If you had even more time, and did 100,000,000 flips, getting 55,000,000 heads, you’d get an even smaller p-value, probably < 0.001. You can conclude the coin is not fair.

What’s missing here is the “who cares” test, and that’s what effect magnitude gives you. Data showing an effect – i.e., data you’d be pretty unlikely to get by chance if the null hypothesis were true – is not the same as data showing an important effect. And here’s the issue: you can always achieve statistical significance by taking more samples. There’s only one exception, and that’s if there genuinely was no effect in the first place – like if the coin really was 100% fair. But that’s almost never true, certainly not for any common coin in my possession, and almost certainly not for most null hypotheses. We may use controls in most experiments, but there’s nearly always some effect, no matter how good the control is. Add to that, it takes a special breed of researcher to be studying things that could genuinely have no effect (I wouldn’t rule it out, but I can’t think of any right now). Almost anything you study, any experiment, will have some effect. What the world really cares about is: how much effect? Is it an important effect? Should we stop paying money for this thing? Stop doing this thing? Here’s a quick sketch of that behaviour in code.
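This is my own illustration, not from the commentary; it uses the normal approximation to the binomial rather than an exact test, so the ten-flip number is only rough, but it shows how the p-value collapses as the number of flips grows while the observed effect stays small.

```python
from math import erfc, sqrt

def two_sided_p(heads, flips):
    """Rough two-sided p-value for 'the coin is fair', using the normal
    approximation to the binomial. Crude for tiny samples, but fine for
    illustrating how p-values behave as the sample grows."""
    observed = heads / flips
    z = (observed - 0.5) / sqrt(0.25 / flips)
    return erfc(abs(z) / sqrt(2))

for heads, flips in [(6, 10), (550_000, 1_000_000), (55_000_000, 100_000_000)]:
    p = two_sided_p(heads, flips)
    effect = heads / flips - 0.5
    print(f"{flips:>11,} flips, {heads:>10,} heads: "
          f"effect = {effect:+.2f}, p = {p:.3g}")
# Ten flips show a bigger apparent deviation but no significance; the huge
# samples show a small, stable deviation with a vanishingly small p-value.
```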

I love crisps. I’m fairly sure that eating one crisp a day will reduce a person’s life span – I might live one day less. You’d never see this effect by looking at me alone; you’d need to measure the lifespans of millions of people. But it can be done, because the fat in crisps (and the carcinogens derived from frying them) will have some effect. So if you had the money, time, and will, you could show this to be the case. You could get that p-value, and put “***” on that graph. The problem is, nearly no one will care. I’m not going to stop eating crisps that I love for a one-day difference in life span. And that’s the difference: you can always show statistical significance with enough samples; the important question is whether the effect you have seen is a big or important effect. Compared to running experiments on people or animals, which are expensive, time consuming and can have ethical considerations, computers are very cheap and you can run millions of experiments. In light of that, statistical significance tests on their own aren’t so useful anymore. You should still do them, but you should also perform effect magnitude tests.
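For completeness, here is one way to put numbers on “how much effect” for the million-flip coin – again just a sketch of my own, using two conventional choices (a raw difference plus Cohen’s h) and a rough 95% confidence interval for the true proportion of heads.

```python
from math import asin, sqrt

heads, flips = 550_000, 1_000_000
p_hat = heads / flips

# Effect magnitude, reported alongside the p-value rather than instead of it.
raw_difference = p_hat - 0.5                              # how far from fair
cohens_h = 2 * asin(sqrt(p_hat)) - 2 * asin(sqrt(0.5))    # standardized effect size
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / flips)         # ~95% Wald interval

print(f"difference from fair:  {raw_difference:+.3f}")
print(f"Cohen's h:             {cohens_h:+.3f}")
print(f"95% CI for proportion: [{p_hat - margin:.4f}, {p_hat + margin:.4f}]")
# However many flips you do, these numbers answer "how biased is the coin?",
# which is the question the p-value alone never addresses.
```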

Wow, that took a lot longer than I thought it would. Congratulations if you got this far. Have a crisp (and a beer).