Dr. Mark Read

Research Fellow, Charles Perkins Centre, The University of Sydney
ResearchGate

Free statistics code

I'm no world-class statistician, but I am capable of learning what I need to get by. Recently I have been looking into running particular statistical tests and have been dismayed either by the general absence of implementations available online, or by the crazy hoops one needs to jump through to access them. The latter usually involves installing additional modules or entire software suites, which do not always play nicely together. So I implemented a few tests myself, and will place them here for all to use.

First, the Vargha-Delaney A-test [1]. It's not widely used, so I've provided a reference at the end. This is a really useful non-parametric effect magnitude test that can differentiate between two samples of observations. One sample might be a control experiment, the other may be a different algorithm, a new drug… the effects of wearing a fetching tin foil hat – it matters not. The test returns a value between 0 and 1, representing the probability that a randomly selected observation from sample A is bigger than a randomly selected observation from sample B. Hence, it tells you how much the two samples overlap. A value of 0.5 indicates that the medians are the same (though the spreads need not be), while values of 1 and 0 mean that there is no overlap at all. Vargha & Delaney [1] provide suggested thresholds for interpreting the effect size: 0.5 means no difference at all; up to 0.56 indicates a small difference; up to 0.64 indicates a medium one; anything over 0.71 is large. The same intervals apply below 0.5. I was once given an implementation of this test by Dr. Simon Poulding (I think it may have been originally written by the revered Prof Susan Stepney, but I'm not sure), but it came with a strict warning that it was only suitable when both samples have the same number of observations, which is a serious drawback. I wrote another implementation that lifts this restriction; the key is that the A-test is a linear transformation of Cliff's Delta statistic. I wrote a third implementation in Python; a sketch of the core calculation appears below.
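For the curious, here is a minimal sketch of the calculation in plain Python. It is illustrative only, and may differ in detail from the downloadable implementations; it handles unequal sample sizes by simply comparing every pair of observations, counting ties as half.

```python
# A minimal sketch of the Vargha-Delaney A statistic (illustrative only).
def a12(sample_a, sample_b):
    """Probability that a random draw from sample_a exceeds one from sample_b.

    Handles samples of different sizes; ties count as half. Returns a value
    in [0, 1], where 0.5 indicates complete overlap. Cliff's Delta is the
    linear transformation delta = 2*A - 1.
    """
    greater = sum(1 for x in sample_a for y in sample_b if x > y)
    ties = sum(1 for x in sample_a for y in sample_b if x == y)
    return (greater + 0.5 * ties) / (len(sample_a) * len(sample_b))

# Example with unequal sample sizes: a control group and a treated group.
control = [5.1, 4.8, 5.3, 5.0, 4.9]
treated = [5.6, 5.9, 5.4, 6.1]
print(a12(treated, control))  # 1.0 here: every treated value exceeds every control value
```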

Second, I've been doing some Python programming recently. I wanted to draw a cumulative distribution function based on empirical observations. I could not believe how many different methods of generating such a thing existed; several didn't work, and some gave different answers. It's not that difficult a statistic to calculate, so I wrote my own. Enjoy.
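The calculation really is simple: sort the observations and step from 0 to 1 in increments of 1/n. A rough sketch, assuming numpy and matplotlib are to hand (the downloadable version may differ in detail):

```python
# Empirical CDF sketch: each sorted observation accounts for 1/n of the
# cumulative probability.
import numpy as np
import matplotlib.pyplot as plt

def ecdf(observations):
    """Return (x, y) arrays tracing the empirical CDF of the observations."""
    x = np.sort(observations)
    y = np.arange(1, len(x) + 1) / len(x)   # proportion of data <= each x
    return x, y

data = np.random.normal(loc=0.0, scale=1.0, size=200)
x, y = ecdf(data)
plt.step(x, y, where='post')
plt.xlabel('observation value')
plt.ylabel('cumulative proportion')
plt.show()
```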

I didn't have to create my own implementation of the Kolmogorov-Smirnov two-sample test (KS2), but it's fast becoming a favourite. The A-test (and Cliff's Delta, the Mann-Whitney U, and probably a whole load more) only really compares medians, sometimes in the context of the spread. KS2 operates on the two samples' cumulative distribution functions, and as such it is sensitive not only to shifts in median value, but also to differences in shape and spread. Most implementations return both a p-value (which is not the same as an effect magnitude) and the 'D' value. D is awesome, and seems to be under-appreciated: it represents the maximum difference between the two cumulative distributions at any value. See the bottom figure for a graphical interpretation. I plan to use this in automated calibration, where I like its sensitivity to so many aspects of a sample's underlying distribution.
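There is no need to roll your own here; scipy ships an implementation. A quick illustration (to the best of my knowledge, ks_2samp returns the D statistic and the p-value, in that order):

```python
# Two samples with identical medians but different spreads: the KS2 test
# still detects the difference, because D measures the largest vertical gap
# between the two empirical CDFs.
import numpy as np
from scipy.stats import ks_2samp

a = np.random.normal(loc=0.0, scale=1.0, size=300)
b = np.random.normal(loc=0.0, scale=2.0, size=300)

d, p = ks_2samp(a, b)
print(f"D = {d:.3f}, p = {p:.3g}")
```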

[1] Vargha A, Delaney HD. A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat. 2000;25(2):101-32.


Randomly generated science!

Details further highlighting the perils of academia's "publish or perish" philosophy taken to the extreme (which it is) have emerged in Nature, also reported on in the Guardian. When I was an undergraduate at the University of York I remember hearing of a "random paper generator": SCIgen. I seem to recall even having tried it, supplying some keywords and witnessing with amazement an entire manuscript of impressive-sounding nonsense fly out the other end. The generator was created by three MIT graduate students to demonstrate that conferences would accept anything, even if it made no sense. They succeeded! What's more, others have been using the program ever since. French computer scientist Cyril Labbé has catalogued over 100 computer-written papers published by IEEE, a further 16 by Springer, and more. Labbé took the concept to the extreme when he created a fictional author, Ike Antkare, and had 102 fake papers lodged with Google Scholar. Antkare's h-index quickly rose to 94, making him the 21st most cited author on the planet. (The h-index is a measure of an author's contribution to academia: it is the largest number n for which an author has n papers each cited at least n times. An index of 5 means an author has 5 papers that have each been cited 5 or more times.)

The whole thing resonates with last year's sting operation, published in Science, wherein over 150 severely flawed papers were accepted for publication by a wide variety of open access journals (I wrote about that too). Researchers are under incredible pressure to publish: promotions and positions are tremendously competitive, and your publication record largely dictates what will happen to you. Funding bodies (and governments) wanting to measure research performance to the nth degree place an unhealthy emphasis on publications. The problem is that metrics often perturb the system they are trying to measure, as it is natural for people to start gaming the system. To get ahead, researchers need to publish very frequently, but the simple truth is that it's impossible to consistently pump out more than 3 or 4 (at most!) really groundbreaking journal papers a year. As a researcher you choose how to invest your time. Do you work on something potentially groundbreaking, but perhaps high risk, as it may not pay off, or may not give you that critical publication by the time your current funding runs out? Or do you concentrate on publishing more incremental research more frequently, and have a longer-looking publication record when it comes to accessing more funds? What ought to happen is for the incremental stuff to be turned away from the higher quality journals; however, the aforementioned sting operation demonstrates that this isn't necessarily the case (there are exceptions, but my personal opinion, based on my experience, is that some very dubious work can get into very high places purely on the strength of whose name is on the author list). The gatekeepers of the journals are the same overloaded, stressed-out academics panicking about where their next funding is coming from, and often they don't do a great job (again, I've seen some pretty poor reviews). Academics are not paid for reviewing, and short of a benevolent and altruistic attitude towards the institution of science (which does not pay the bills), it's not clear to me what motivates a particularly thorough job, especially with ever-growing numbers of manuscripts to review coming through the door. "Reputation", you might think… but then reviewing is largely anonymous.

I've painted a bit of a caricature here, but it's to highlight the point, and there is truth in it. It's hard to measure scientific progress with fine instruments. Blue-sky research may have no obvious use right now, but at some point in the future another researcher you have never heard of will connect the dots and do something amazing. Or not. But you can't really tell in the here and now what will be relevant in 10 or 50 years' time. Trying too hard will only push people towards publishing incremental, rather than transformational, research. I'm occasionally regaled with stories of how some of the greatest scientists of yesteryear had periods of a few years when they didn't publish anything – they were thinking, getting it right, and published only when ready. I wonder sometimes who is actually reading the thousands of incremental papers that researchers are pressured into publishing at an alarming rate, if the reviewers don't even do a particularly good job.

I'm afraid I have to leave this article without much of a solution. A bad ending, I know. To me it is clear that there's a problem with the current scientific model, but I can't offer any earth-shattering alternatives (yet). It does strike me, though, that what we have here looks a little like the perceived problems with short-term outlooks in the financial markets (and in politics, though the latter does not seem to be as well appreciated). Short-selling has periodically been banned in some markets, and high-flying people in finance are being paid their bonuses in longer-term company shares they can only access years later, rather than in cash. I wonder if it's possible to engineer a longer-term outlook in science too?

Dr. Pom Lands in Oz

Actually I landed here 5 months ago, in September 2013; I've been sitting on this blog post for long enough. Almost exactly a year ago I landed a job at the University of Sydney, which was advertising computational biology fellowships as part of a new interdisciplinary centre being established to tackle the rise of obesity, diabetes and cardiovascular disease. I got married to my long-time partner a week before we flew out here, and we went about starting a new life in an exciting foreign land.

Arriving here, my position is not what I expected, but in a good way. The interdisciplinary centre (the Charles Perkins Centre) is a tremendously forward-thinking and ambitious project to bring together disparate departments and skillsets from across the university. It has been promoted under a banner of addressing the rise of the aforementioned diseases, but really the centre is aiming far wider than that. How wide this scope is changes by the week, even now; truly this is a university-wide endeavour. Understandably there's a lot of excitement about what could be accomplished on the wings of fast-evolving interactions across faculties and a $250 million investment. My job? Well, a job spec was not really available upon my arrival, which is unusual – in my experience the money for a research position arrives only after someone has clearly laid out what the problem is and how it will be addressed. In fact, my job is to build collaborations: to meet people from anywhere in the university (and outside) and assess whether providing a computational angle on analysing systems can be of benefit. In the context of a university rediscovering itself (I have heard numerous stories of researchers who launch collaborations with one another at conferences in far-off exotic places, when their offices are in adjacent university buildings!) this makes complete sense: the other computational modellers and I are to provide the "glue" between disciplines that can launch novel research. In truth, no one knows exactly how far the new centre can go in advancing research, but there's no doubt that what is being done here is of world-class novelty.

I am largely free to write my own job spec, which is a unique and rewarding opportunity. It doesn't take long, however, before your available time (and then some) is completely occupied. I could do with staff! So the search for students begins. In the meantime, there are three big projects I'm working on. I'm continuing the modelling of immunology, building a hybrid spatially-explicit simulation to elucidate the signalling mechanisms responsible for the swarming behaviour observed in neutrophils responding to inflammation. I've also branched into modelling the gut microbial community. There is growing and compelling evidence that the bacteria occupying one's bowels have a huge influence on health. The interactions between these bacteria and the immune system are largely uncharacterised at present, but we know there is a lot of interaction going on. And I'm continuing my research on automated calibration of simulations. I still see this as an essential technology: there are so many biological parameters that are unknown, and an efficient technique to identify them in simulation can provide worlds of useful information. It can do more: phrasing a calibration question differently – "find me parameters that will make this simulation of disease healthy instead" – could point to interesting new intervention strategies.

Posts following this one will elaborate on each of these projects as they mature. For the meantime I'm "stoked" to have landed such a fantastic and varied job in one of the world's most impressive cities. I'm writing this on an airplane returning from my first Australian conference. It's been everything I hoped for: I've met lots of really friendly people, have got some promising leads, and am enthused to have discovered a vibrant immunology community here in Australia.

How “cutting edge” is enough for a high impact publication? And a detour into statistical significance and computers.

I was curious to find a commentary paper titled "So you want to be a computational biologist" in Nature Biotechnology today. Nature Biotechnology is, as the "Nature" label might suggest, a cutting-edge, hugely cited journal dedicated to disseminating the most relevant advances in biotechnology techniques and methodologies. The commentary seems to be targeted at biologists, but whether they are established researchers or recent graduates is unclear. The commentary is the most basic of basic introductions to the world of coding and working with computers. It advocates using the Unix command line, since that's the best interface from which to control bioinformatics software, and creating "traps" for your code, which are better known in computer science as tests. Testing your computer code is a fundamental practice introduced in first-year undergraduate degrees, if not in high school. The caution advised against over-interpreting statistically significant results (there's a lot to be said about the problems of using only statistical significance tests, in the absence of effect magnitude tests, in computer-based scientific disciplines where extra samples are easy to come by – see the final paragraphs of this post) should be familiar to all scientific PhD candidates. The same applies to the use of 'controls' in designing experiments. I am surprised how much of this article constitutes basic computer science practice, statistics and experimental design, given the journal. The "have an Obama frame of mind: yes, you can!" tone of the article is also curious.
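To make the "traps" idea concrete, here is a toy illustration of my own (not taken from the commentary): a sanity check embedded in an analysis step, and a small unit test of the A-test function sketched further up this page.

```python
# A "trap" inside an analysis step: fail loudly if the input looks wrong,
# rather than silently producing a plausible-looking number.
def mean_expression(values):
    assert len(values) > 0, "no measurements supplied"
    assert all(v >= 0 for v in values), "negative expression value: upstream bug?"
    return sum(values) / len(values)

# A unit test (runnable with pytest) of the a12 sketch from above:
# swapping the samples should mirror the statistic around 0.5.
def test_a12_symmetry():
    a, b = [1, 2, 3], [2, 3, 4]
    assert abs(a12(a, b) + a12(b, a) - 1.0) < 1e-12
```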

All is not lost, however: there are at least two excellent and highly relevant messages hidden in the article. 1) Knowledge of the biology is required to correctly and appropriately interpret the results. This is not the whole story – I feel a good knowledge of the abstractions and assumptions inherent in computational techniques is also required – but it is an encouraging message. 2) The article finishes by advising the would-be freshly inspired computational biologists to find help from people in the know. That chimes a lot with me: trying to conduct meaningful computational biology research without a firm grounding in the biology is crazy, as it's difficult to ensure that the research is relevant. To me it seems just as crazy for biologists with no training to start tinkering around with computational techniques, hoping to conduct sound research. Which leads me to wonder what the point of this article actually is. It seems to me the sensible message is, "Want to start doing computational biology research? Get some training and find people who know what they're doing", which requires a lot less space than 3 pages in a cutting-edge journal.

Oh, and the statistics. When you're conducting computer-based scientific research you have to be very careful about using only statistical significance tests (the ones that give you p-values). You're best off using both statistical significance and effect magnitude tests (the latter are also called scientific significance tests). The reason has to do with the availability of samples. Statistical significance came about in a period of science when samples weren't so easy to come by – experiments on people (psychology) or on animals, which are expensive and carry ethical concerns. The experimenter wants to determine whether the experiment has any effect. This is normally done by rejecting the null hypothesis – a hypothesis along the lines of "what I'm doing/testing has absolutely no effect". Statistical significance tests determine the probability of getting the observed data if the null hypothesis is true. What the experimenter is looking for is observed data that is so hugely unlikely (though not impossible) under the null hypothesis that it's reasonable to conclude the null hypothesis is wrong. Less than a 5% chance is common, expressed as "p < 0.05".

Let's take an example: you want to test whether a coin is fair. The null hypothesis here is "the coin is fair". You flip it 10 times, giving 6 heads and 4 tails. Is it a fair coin? The p-value will come back with something > 0.05, meaning you cannot reject the null hypothesis. What this means is, "there's a good chance you could have gotten 6 heads out of 10 even if the coin was fair – it's just luck". Getting 9 heads would be a different story… Anyway, this time you take more samples. You flip the coin 1,000,000 times (you have a lot of free time, and rock-solid commitment to the cause) and get 550,000 heads. It's nearly 50:50, but that's 50,000 more heads than you'd expect. That's a lot of heads. It's pretty unlikely that a fair coin would have given you this many more heads; I reckon you'd get a small p-value here, < 0.05 (or "*"). If you had even more time, and did 100,000,000 flips, getting 55,000,000 heads, you'd get an even smaller p-value, probably < 0.001. You can conclude the coin is not fair. What's missing here is the "who cares" test, and that's what effect magnitude gives you. Data showing an effect – i.e., data you'd be pretty unlikely to get by chance if the null hypothesis were true – is not the same as data showing an important effect. And here's the issue: you can always achieve statistical significance by taking more samples. There's only one exception, and that's if there genuinely was no effect in the first place – like if the coin really was 100% fair. But that's almost never true, certainly not for any common coin in my possession, and almost certainly not for most null hypotheses. We may use controls in most experiments, but there's nearly always some effect, no matter how good the control is. Add to that, it takes a special breed of researcher to be studying things that could genuinely have no effect (I wouldn't rule it out, but I can't think of any right now). Almost anything you study, any experiment, will have some effect. What the world really cares about is: how much effect? Is it an important effect? Should we stop paying money for this thing? Stop doing this thing? I love crisps. I'm fairly sure that eating 1 crisp a day will reduce a person's lifespan. I might live 1 day less.
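For the computationally inclined, here's a quick sketch of the coin example in Python. It is my own illustration, and assumes a recent scipy where the exact binomial test is exposed as binomtest (older versions call it binom_test). The precise p-values aren't the point; the trend is.

```python
# Coin-flip example at two sample sizes. With 10 flips, 60% heads is easily
# explained by luck; with 1,000,000 flips, 55% heads is overwhelming evidence
# of bias, even though the proportion of heads has barely moved.
from scipy.stats import binomtest

for heads, flips in [(6, 10), (550_000, 1_000_000)]:
    result = binomtest(heads, flips, p=0.5, alternative='two-sided')
    print(f"{flips:>9} flips, {heads / flips:.2f} heads: p = {result.pvalue:.3g}")
```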

You'd never see this effect by looking at me alone; you'd need to measure the lifespans of millions of people. But it can be done, because the fat in crisps (and the carcinogens derived from frying them) will have some effect. So if you had the money, time and will, you could show this to be the case. You could get that p-value, and put "***" on that graph. The problem is, nearly no one will care. I'm not going to stop eating crisps that I love for a 1-day difference in lifespan. And that's the difference: you can always show statistical significance with enough samples; the important question is whether the effect you have seen is a big or important one. Compared to running experiments on people or animals, which are expensive, time consuming and carry ethical considerations, computers are very cheap and you can run millions of experiments. In light of that, statistical significance tests aren't so useful anymore. You should still do them, but you should also perform effect magnitude tests.
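To see statistical and scientific significance disagree, here is another hedged sketch of mine: two large samples whose underlying distributions differ by a trivially small shift. It assumes a reasonably recent scipy, in which mannwhitneyu returns the U statistic for its first argument; the A statistic then follows directly as A = U / (n × m).

```python
# Plenty of samples, tiny effect: the p-value screams "significant" while
# the A statistic sits close to 0.5, i.e. no meaningful difference.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
control = rng.normal(loc=0.00, scale=1.0, size=50_000)
treated = rng.normal(loc=0.05, scale=1.0, size=50_000)   # a very small shift

u, p = mannwhitneyu(treated, control, alternative='two-sided')
a = u / (len(treated) * len(control))
print(f"p = {p:.3g}, A = {a:.3f}")   # expect p well below 0.05, but A around 0.51
```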

Wow, that took a lot longer than I thought it would. Congratulations if you got this far. Have a crisp (and a beer).

Doctoral work accepted by PLOS ONE

It's been a good week: I have finally gotten the bulk of my PhD work published! It is to appear in PLOS ONE, and is currently in the production process. This paper has been a long time coming; we first started talking about a paper like this in 2010. Since then, it's been through 14 complete redrafts, not to mention being written up as a PhD thesis.

This is a real milestone for me; I've been working on this material for so long, conscious that a large chunk of my PhD hasn't been published, and pondering the implications for my CV. To read the paper you'd think the focus is on ARTIMMUS, the EAE simulator that I created. Indeed, the bulk of the results are focussed on EAE: investigating recovery-mediating regulation, the role of the spleen in recovery, and how we can simulate intervention strategies. However, for me the more powerful result here is the approach through which we constructed the simulation. It's about engineering high-quality simulations, as indeed is my thesis. The close alignment that the paper demonstrates between wet-lab data and that of ARTIMMUS serves as a validation of our approach. It's a hard message to balance: selling the process requires the validation, but the validation shouldn't dwarf the process.

It's not easy to write a paper like this after having written an entire thesis on the subject. As plenty in academia know, thesis writing can be a harrowing experience; to then have to re-address this material and go through draft after draft of a journal manuscript requires quite some motivation. It's worth it though, and in a world where academic progression seems increasingly contingent on publishing astounding quantities of papers, you can't afford not to. By the way, the implications for my CV can't have been too severe; I recently landed a job as a Research Fellow at the University of Sydney (before this paper was accepted). I shall write about what I'm doing here next week – two blog posts in one day is plenty.

Published in Nature

OK, it's not a peer-reviewed research manuscript (that really would be a reason to go home early and celebrate); it's a short correspondence reviewed by the editor. However, both Kieran and I are pretty early-career researchers, and this represents the most influential journal our names have appeared in to date. So I'm excited.

My recent blog post on better engineering of scientific software was inspired by a Nature news story that Kieran forwarded me, on how Mozilla were planning to have their software engineers scrutinise the scientific software accompanying some PLOS Computational Biology papers. I won't recite it here; you can read about it below. Kieran and I got talking about the issue, and he suggested replying to the Nature news story. Some further emails and drafts, and some very heavy editorial editing later, and we have ourselves a short correspondence piece explaining our position. So, a good day for Kieran and me.

Who should publish scientific papers?

A week ago the journal Science published the results of a year-long sting operation on the world of open-access publishing. Open access is (or perhaps was) touted as the gold standard of scientific publishing: the authors pay the fees for publication, after which the manuscript is freely available to anyone. There is appeal to this model: a paper's potential readership is not restricted to those with a subscription to a journal, or those willing to pay to see the article (sums of around $30 per article are common).

The alternative model of academic publishing is the pay-for-access model, employed by the majority of journals. It is neatly summed up by Curt Rice in a Guardian article: the journals pay the author nothing, they pay the editor nothing, and they pay the three or more reviewers who scrutinise the work nothing. These journals do incur some typesetting and infrastructure costs. They then charge the academic authors, editors and reviewers – people who provided so much for free, and without whom the journal would have nothing – through the nose for access. The general public, who pay for both the work and its quality control, have to pay again if they wish to see it. Some perceive ethical concerns with this model, and some years ago there was a movement amongst academics to boycott journals published by Elsevier, one of the biggest publishing houses, which was consistently reporting profits of hundreds of millions of dollars: about 30% of revenue. The sentiment was that Elsevier was leeching huge sums of public money for comparatively little work. The boycott movement demanded that pay-for-access journals make publications freely available after 6 months. They appear to have been successful; having recently reviewed Nature's licence to publish, it seems authors retain copyright and can post their as-published paper versions online after 6 months.

Perhaps in recognition of the ethical dilemmas above, there are now several funding bodies mandating that publications arising from research they fund be published in open access journals (e.g., the NIH and the Wellcome Trust). Science's article has unearthed an exponential growth in the number of "predatory" journals taking authors' publication fees whilst having in place none of the quality controls essential for a healthy scientific discipline. Many of these journals go to great lengths to appear to have roots in western scientific communities, presumably to engender trust. Science's sting is cunning, and the results are shocking. Variations on a bogus paper, intentionally littered with scientific and ethical errors that any half-competent peer-reviewer should identify, were submitted to over 300 open-access journals. Of 304 submissions, 157 were accepted and 98 rejected, with the remaining 49 having not yet reached a decision. Of the 255 papers that underwent the entire editing process through to a decision, 60% showed no sign of peer review.


There are other stunning results in Science's paper, but you get the gist: these are scam journals. Their motivation is either to make a profit, or to earn ill-deserved academic merit for authors or editors. The interesting question for me is how academic work should be published. Open access is a great ideology, but the incentives don't stack up – journals have too great an incentive to accept the work submitted to them, because that's how they make money. The pay-for-access system has issues too, explored above. This is an economics problem: how to organise academic publishing such that the incentives line up with the best interests of science.

It's worth pondering what the concept of a journal contributes to science in the first place. First, it is supposed to ensure that the work it publishes is of high quality; it is a gatekeeper, affording confidence that results constitute a worthwhile contribution. Second, it is a collection: an issue of a journal brings the science to you, rather than you having to go and dig it out. Everything in the journal will relate to some theme or field, and with a subscription to a particular journal scientists can stay up to date with research relevant to them. Third, it constitutes a community platform and provides a forum for discussion and debate, facilitating the emergence of themes and perspectives.

It strikes me that most of these things can be achieved using services now available on the internet: databases, indexes and social media. PubMed sends me weekly emails containing articles selected on the basis of search terms I have entered. Using a service like this, one can scope out one's own specific 'journal', which in fact pulls contributions from thousands of journals. Social networking services such as ResearchGate allow one to follow researchers of interest, or particular labs, and keep up to date with their publications. Further, they facilitate discussion through message boards specific to particular fields. It is not hard to imagine readers rating particular papers, much like Amazon reviews, with popular or interesting manuscripts finding their way to the top of the pile in this manner, rather than by appearing in particular journals. Internet-based services could facilitate discussions and questions relating to a particular publication, as you see on many modern-day blogs and news websites. One would only hope for less vulgar and more intelligible content than what I have seen on YouTube comment feeds.

If the online facilities described above can fulfil the 'community' and 'collection' aspects of traditional journals, that leaves only the issue of peer review. It strikes me that the best people to manage the peer review process, and accredit those papers that pass, might be the funding bodies themselves. They are taxpayer-owned and not-for-profit. They are not subject to as much inter-body competition; in the UK, government-run funding bodies do not tend to overlap a great deal in scope (the case with cross-disciplinary research is more complex…), instead focussing on a specific discipline, e.g. the physical or social sciences. Hence, they are not under pressure to accept a particular proportion of submissions for fear of losing public money to another agency; I cannot imagine the government giving the social sciences research council a chunk of the physical sciences budget simply because it accepts a greater proportion of submissions and appears to be funding better science. Funding bodies would not be unduly benefiting from the efforts of publicly-funded researchers; they are the ones typically funding them. There are other non-government funding bodies, some of which are relatively small charities, and I do not propose that all of these necessarily manage their own peer review process; if the government-run funding bodies required some cost contribution for the reviewing process, I imagine this could be accommodated in a grant award. Funding bodies now typically permit applicants to request money for publication costs, and this money could just as easily be used to pay for a publicly-run peer review process.

This vision is far from complete, and more comprehensive thought is needed than will go into one blog post. However, I believe that a future scientific system devoid of journals in the traditional sense we have now is completely plausible, and may be much healthier for science.

(The article that spawned all this: J Bohannon. Who's Afraid of Peer Review? Science, 342(6154), 60-65; 2013.)

 

How well should scientific code be scrutinised?

A friend, Dr. Kieran Alden, forwarded me a Nature article today describing how Mozilla (who make the Firefox browser, amongst other things) are looking through 200-line snippets of code from 9 articles published in the journal PLoS Computational Biology. They are investigating the worth of having competent professional coders, people primarily trained in best coding practice, checking the code of scientists (who are often not formally trained at all) when it is published.

This is a debate that I gather has been gaining some momentum; I keep stumbling onto it. Should code be published, and how well should it be reviewed? I understand the desire of researchers to keep their code to themselves. From a research perspective there's a lot of up-front cost to developing this sort of code, and once someone has it they can build on it and run experiments with almost no buy-in at all. Research code is very cheap to execute, and very difficult to get right in the first place. Certainly, there are a lot of labs who won't share code until their first round of results emanating from it has already been published – it's too easy for competing labs to get ahead based on your effort.

But not sharing code can arguably hold back science, with labs investigating similar phenomena having to develop their own code from scratch. This is not very efficient, and it seems that the incentives are wrong if this is the behaviour that the powers-that-be have produced. But that’s another story. The article briefly skirts around two facets of the debate: bad code can be misleading, and harm a discipline rather than informing it; and researchers who are already unsure about publishing code may be put off completely if it gets reviewed as well as published.

For me this isn't even a debate, it's a no-brainer. As a computer scientist by training who has developed several scientific simulations, I am very familiar with the complexities of simulating complex domains. I smiled at the paper's implicit suggestion that commercial software is the way forward – I've used Linux, Windows, Mac, Firefox… so much of this stuff is full of bugs. But that's not the point. In research we're simulating things we often don't understand very well, which can make bugs tricky to check for: are the weird behaviours an interesting and unexpected result representative of the science, or coding errors? And that's if you spot them; research labs certainly don't have the budgets to spend on testing that commercial enterprises do.

For me it boils down to this: what do you think would happen if I, a computer scientist, tried to run experiments in a sterile biological lab with no formal training? Would you trust the results? You might find me drinking coffee out of a petri dish. And then, when I came to publish, I'd be vague about my methods – "Just trust me, I did everything right… it doesn't matter how." Science demands stringent adherence to, and reporting of, experimental protocol in most disciplines. I don't see how scientific results coming from a computer are any different. I think code should be published, and reviewed. The journal(s) that make this their policy (particularly the reviewing part) will earn a reputation for publishing robust, high-quality work. Researchers have to publish in good journals to progress, so they will have to adhere to the policy – that, or accept publication in a "lesser" journal. With spurious results coming from bad code holding such potential to mislead and set back a scientific field, I think it highly worthwhile that journals employ experienced coding professionals to check code (as suggested in the article). But really, this is too late in the process; what we really need is these stringent standards employed during the research and development stage, catching and fixing errors as they occur, rather than killing research tracks years after they start because the errors are only caught at publication.

(The original article: EC Hayden. Mozilla plan seeks to debug scientific code. Nature 501: 472, 2013.)

Confidence in Simulations for Science

Last week I attended the excellent SummerSim conference in Toronto, which covers all domains of simulation application. An item high on my research agenda in recent years, and that of other researchers at York, has been establishing confidence that the results of complex-system simulations are adequate representations of those systems. Simulation is used to investigate biological, sociological, political and financial systems, and more. We typically simulate these systems to aid in understanding them; however, if they are not well understood to begin with, how do you create representative simulations? It is this bootstrapping problem that techniques such as the CoSMoS process and argumentation structures seek to address. I have looked at this problem from domain modelling, statistical and calibration perspectives.

Myself and others at York were curious as to whether similar issues were being identified by simulation practitioners in other disciplines. As such, we organised a workshop on "confidence in simulations for science" at SummerSim, hoping to draw participants from the eclectic mix of disciplines represented there. Our format was perhaps unusual: we did not wish to run a tutorial on techniques being developed at York, nor did we want full paper presentations requiring peer review. This was a very preliminary examination of how far the issues we have been facing permeate simulation endeavours in general. We put out a call for abstracts, hoping mostly to catch people who would already be attending the conference; I did not believe that many would find the finances to visit a conference when there was no publication to be gained (austere times that we live in). We received a number of very encouraging emails from people in the artificial life and synthetic biology communities who sadly could not attend, as our workshop clashed with other conferences they were already committed to. We received one abstract submission, which was later retracted as the author could no longer attend the conference. We needed a plan B. We had prepared introductory material on the problem, and slides posing questions to the audience that we hoped would fuel a discussion. Fearing that we might be faced with a wall of blank stares, we also prepared a lot of slides on activities at York addressing these questions, and hoped that we would not need to use them.

We needn't have worried. The workshop was run in two 90-minute sessions. In the first session we had the three organisers – Paul Andrews, Kieran Alden and myself (Jon Timmis was sadly unable to attend the conference) – and three participants. The size worked well; it was a very comfortable group for discussions. In the second half we were joined by others and had 7 participants. The feedback was very positive: we did generate a lot of discussion; the problems we have identified are pertinent to other simulation domains, and have not been solved elsewhere; the articulation of the problems was praised; the format was also well received, and we were given useful suggestions for how to improve it in the future; and we were told that a dedicated outlet for these issues (rather than addressing them separately within each discipline) would be useful. We shall have to think carefully about what the outputs of the workshop will be; there seems to be scope for a continuation. This shall be decided in the coming week, when Kieran (who is currently on holiday) returns with the minutes from the workshop. At the very least we have a mailing list of interested parties who can stay in contact and work together on this.

This was my first experience of chairing a workshop, and there are some lessons to take away. First, you will rarely struggle to generate discussion at a conference, though it helps to have a plan for how to lead in (I believe Paul knew this already, having run numerous CoSMoS workshops in the past, but it's one thing to be told and another to witness it firsthand). Second, try to ensure that the workshop's details appear next to the main conference timetable in the programme. All conference attendees were given a staple-bound programme that fell open in the centre at some coloured pages detailing where and when sessions were taking place; alas, the workshops were listed on page 3 or so, and I suspect many never saw them. Third, our initial suspicion that short, un-refereed abstracts would not attract attendees to a conference was largely correct (though we did generate interest in a few parties who could not attend). As such, a conference like SummerSim is appropriate, as it captures such a wide range of simulation practitioners. I thoroughly enjoyed running the workshop, and will seek to do it again in the future. I encourage all aspiring academics to give it a shot!

… and Toronto is well worth visiting …

Awareness Summer School 2013

This week I have been in Lucca, Italy, for the Awareness Summer School 2013. I had a dual role at this meeting: I was both invited speaker and mentor. The former entailed delivering a lecture on using immune inspiration to build novel algorithms and robotic systems; the latter, guiding two teams of summer school students in solving an underwater swarm robotics problem. Having found it thoroughly challenging myself, I was intrigued as to what 10 other brains would come up with in creating an algorithm for relay chain formation: a swarm of underwater robots has to configure itself to form and maintain a communication chain between the water's surface and an 'exploratory' shoal that searches the sea for some target.

The teams were very creative, and both managed to solve the bulk of the problem in just the week available (which included breaks for keynotes, coffees and lunches). The week culminated in each team giving a sales-pitch presentation to industry. One of my teams got very imaginative, and pitched their algorithm as a way of locating lost submarines beached on the seabed. The comical edge this gave their presentation was very well received.

Other keynote talks were given by Alan Winfield, Peter Lewis, Rene Doursat and Martin Wirsing, and all offered interesting perspectives on self-awareness in autonomous systems. Alan delivered a very convincing argument that robots operating in a noisy and unpredictable environment cannot be safe (for humans) unless they are self-aware. It was great to see them all again; all had attended the Awareness meeting in Barcelona last year.

Lucca is a fantastic city. The walls are great: you can try to work off the pizza and fine wine by running around them. And there is, of course, delicious pizza and fine wine.