Dr. Mark Read

Research Fellow, Charles Perkins Centre, The University of Sydney

Paper on Calibrating Complex Biological Simulations Released!

It’s publishing season. Like buses, nothing comes along for a while, and then all the papers arrive at once. This week the Journal of the Royal Society Interface released our manuscript on automated calibration of complex biological simulations, another grand effort three years in the making (since I started at the University of Sydney). The original idea was formed near the end of my PhD, and is part of a larger theme on how to engineer accurate and representative simulations of biological systems that are incompletely understood: how do you simulate something if you don’t know how it works?

The answer rests in part on “calibration”, a process whereby you adjust the simulation such that its output matches that of known reality. Typically this involves finding values for parameters whose biological correlates are unknown. There are any number of approaches for doing this; the trick for complex system simulations is how you measure the difference between simulation and reality. Being complex, these systems cannot be well characterised by single observations or metrics alone, and that immediately blows standard techniques out of the water. For example, in this paper we employ the “ARTIMMUS” simulation as a test case; it simulates a mouse model of multiple sclerosis. There are four chief T cell populations involved in the disease and its subsequent recovery, each growing in population size, peaking, and falling again in a unique manner. They are all critical, and you can’t calibrate on the basis of one alone. Now, that’s quite low level; why not just characterise “disease severity” instead? Well, for starters, “disease” is a very emergent property that is hard to replicate in simulation, so we tend to deal with more concrete measures that can be tied to specific phenomena. Take multiple sclerosis: there are any number of ways that different areas of the nervous system can be impacted to deliver a given degree of debilitation.

Enter our approach. We use multi-objective optimisation to evaluate, simultaneously, several metrics of how well the simulation captures important biological features, and find appropriate parameter values accordingly. We name this technique Multi-Objective Calibration (MOC). MOC exposes where several parameters may trade off against one another to deliver a given simulation dynamic, or where certain aspects of the simulation dynamic are maximised at the expense of others. This raises another intriguing possibility: what does it mean if no parameter values can be found that simultaneously align all aspects of a simulation’s output with that of reality? We propose that this points to a simulation that fails to adequately capture the complexities of the biological components; the changes that need to be made are not to parameter values, but to what those parameters represent. Perhaps an important cell population or component in the biology is missing from the simulation, or has been incorrectly captured. In this manner, we propose that MOC can play a vital role in guiding simulation design and development, not only in parameter tuning at the end.
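As a minimal sketch of the idea behind MOC (not the paper's implementation; the candidate names and scores below are hypothetical), assume each candidate parameter set is scored on several objectives, each an error to be minimised. The parameter sets worth keeping are the non-dominated (Pareto-optimal) ones: no other set is at least as good on every objective and strictly better on at least one. When several sets survive, they are exactly where the trade-offs between objectives show up.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a Pareto-dominates b (every objective is minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: dict[str, Sequence[float]]) -> list[str]:
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        # A vector never dominates itself, so comparing against all values is safe.
        if not any(dominates(other, s) for other in scores.values())
    ]


# Hypothetical scores: one error per objective, e.g. the mismatch in the
# time course of each of three T cell populations.
candidates = {
    "set_a": (0.10, 0.40, 0.30),
    "set_b": (0.30, 0.10, 0.20),
    "set_c": (0.35, 0.45, 0.35),  # worse than set_a on every objective
    "set_d": (0.20, 0.20, 0.10),
}
print(pareto_front(candidates))  # ['set_a', 'set_b', 'set_d']
```

Real calibration would use an optimiser to search parameter space rather than a fixed list; the dominance test is the part that makes the search multi-objective.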

Paper on Modelling Leukocyte Motility Released!

Well, it’s been three years of work in the making, and earlier this week it was finally released. PLOS Computational Biology have published our work on modelling leukocyte motility!

There is a growing body of research interested in how things move. Many biological processes depend on things interacting with one another, and interactions happen on contact; hence the interest in motility, in anything from immune cells finding bacteria to lions finding zebras. The main contributions of this paper are to highlight that motility, particularly in 3D spaces, is quite a complex thing to characterise, and to show how multi-objective optimisation can be used to characterise this movement and to calibrate motility models to real data. Further, by looking at the quality of the models’ fit to experimental data, you can use this approach to select which model best describes the real-world system.
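Speed and turning angle are two of the simplest metrics for characterising a 3D track. The sketch below computes both from evenly sampled positions; the function name and the fixed sampling interval `dt` are assumptions, and these are illustrative metrics, not necessarily the exact set used in the paper. Distributions of such metrics, from simulated and imaged cells, are what a calibration would compare.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v: Vec) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def track_metrics(positions: list[Vec], dt: float) -> tuple[list[float], list[float]]:
    """Per-step speeds and turning angles (radians) for one 3D cell track.

    positions are sampled every dt time units. The turning angle is the angle
    between consecutive displacement vectors.
    """
    steps = [_sub(b, a) for a, b in zip(positions, positions[1:])]
    speeds = [_norm(s) / dt for s in steps]
    turns = []
    for u, v in zip(steps, steps[1:]):
        nu, nv = _norm(u), _norm(v)
        if nu == 0 or nv == 0:
            continue  # direction is undefined for a stationary step
        cos_theta = (u[0] * v[0] + u[1] * v[1] + u[2] * v[2]) / (nu * nv)
        turns.append(math.acos(max(-1.0, min(1.0, cos_theta))))  # clamp rounding error
    return speeds, turns


speeds, turns = track_metrics([(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 2)], dt=1.0)
print(speeds)  # [1.0, 1.0, 2.0]
```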

We found some other interesting things too. The leukocyte populations we were modelling, neutrophils and T cells, both show a negative correlation between speed and rate of change of direction. It’s naive and totally unbiological (I am a computer scientist), but I like to think of someone trying to do a U-turn at 100 km/h. Not really possible. We also found that cells are individuals, in the sense that their movements differ from one another: some are faster, some turn more. Perhaps most interesting, when you take the multi-objective approach, thereby assessing models against several criteria simultaneously, the much-hailed Lévy flight is not the best description of how these cells move.
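One way to quantify a speed versus turning relationship is a rank (Spearman) correlation, which only assumes the relationship is monotonic. This is an illustrative sketch with hypothetical data; the measure actually used in the paper may differ. A negative value means faster movement tends to go with smaller turns.

```python
import math


def ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


# Hypothetical per-step speeds (um/min) and turning angles (radians).
speeds = [1.0, 2.0, 3.0, 4.0, 5.0]
turn_angles = [2.5, 2.0, 1.2, 0.9, 0.3]
print(round(spearman(speeds, turn_angles), 3))  # -1.0
```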

It’s been a wonderful adventure in 3D simulation. I’ll be thankful never to have to look at quaternions again, and collision detection between cells in 3D space is also quite a difficult computation to perform. The plan from here is to use this technology to simulate (complete with accurate cell motility) the onset and resolution of diseases, or perhaps the processes involved therein; the whole hog is quite complex.

Thanks to Tatyana Chtanova, Jon Timmis: co-authors, friends and mentors who helped push this over the line. Oh, wondering what that is a picture of? Read the paper to find out…

Code to Generate a Lévy Distribution

Ah, the Lévy distribution. The foundation of the Lévy walk (or flight). The Lévy walk started gaining a lot of interest around 1999, when Viswanathan et al. published an article in Nature showing it to be an optimal search strategy for finding sparsely, randomly distributed targets. A stunning variety of organisms, including bacteria, flies, monkeys and sharks, have been described as performing Lévy walks.

To perform a Lévy walk, the walker selects a random direction in space, and moves in that direction in a straight line. Depending on how you formulate the walk, any of the duration, length or speed of the walker is drawn from a Lévy distribution. What is this Lévy distribution, I hear you ask? An excellent question, and not an easy one to answer by reading around the internet. The Wikipedia entry for generating a Lévy distribution using inverse transform sampling, for instance, is wrong; the method produces a distribution with a power-law decay of 1/x^1.5, and the scale factor, c, does not alter this. Behold the log-log probability density function:

[Figure: log-log probability density function (test_levy_power_law_decay)]

The gradient of the drop-off to the left should be variable, but clearly isn’t. So what should the Lévy distribution look like? It is centred on, and symmetrical around, zero. The long-tailed part means that Lévy walkers make lots of short movements, interspersed with very long jumps. The variable power-law decay adjusts the balance of short jumps to long jumps. The Gaussian distribution is a special case of the Lévy distribution; but whereas the Gaussian’s mean can be shifted, I have not seen this done in the Lévy walk literature (which doesn’t necessarily mean it hasn’t been, mind).
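To see the fixed exponent for yourself, here is a stdlib-only sketch (not the code from the ZIP) that samples the one-sided Lévy distribution by inverting its CDF, F(x) = erfc(sqrt(c/(2x))), and then fits a straight line to the log-log tail of the empirical survival function. The fitting window and the function names are illustrative choices.

```python
import math
import random
from statistics import NormalDist


def one_sided_levy(n: int, c: float, rng: random.Random) -> list[float]:
    """n inverse-transform samples from the one-sided Lévy distribution, scale c.

    Inverting F(x) = erfc(sqrt(c / (2x))) gives X = c / Phi^-1(1 - U/2)^2, which
    equals c / Phi^-1(U/2)^2 because the normal quantile is odd about 0.5.
    """
    inv_cdf = NormalDist().inv_cdf
    samples = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # inv_cdf requires 0 < p < 1
            u = rng.random()
        samples.append(c / inv_cdf(u / 2) ** 2)
    return samples


def tail_exponent(xs: list[float], s_lo: float = 1e-3, s_hi: float = 1e-1) -> float:
    """Negated least-squares slope of the log-log empirical survival function.

    If the density decays as x^-(1 + a), the survival function decays as x^-a,
    so this estimates a. Only points whose empirical survival lies in
    [s_lo, s_hi] (the tail) are used.
    """
    xs = sorted(xs)
    n = len(xs)
    pts = [
        (math.log(x), math.log((n - i) / n))
        for i, x in enumerate(xs)
        if s_lo <= (n - i) / n <= s_hi
    ]
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    slope = sum((p - mx) * (q - my) for p, q in pts) / sum((p - mx) ** 2 for p, _ in pts)
    return -slope


rng = random.Random(1)
for c in (0.5, 1.0, 10.0):
    a = tail_exponent(one_sided_levy(100_000, c, rng))
    print(f"c = {c}: density decays roughly as x^-{1 + a:.2f}")
```

Each line should report an exponent close to 1.5: changing c stretches the distribution along the x axis but leaves the log-log gradient where it is.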

I spent a long time digging and hunting around for how to produce a Lévy distribution, and found a lot of conflicting information. In the end, with the help of an anonymous reviewer, statistical wizard Dr. Gregory Rice, and the first two papers cited below, I cracked it. Below you’ll find a ZIP file containing Python and Java code that will generate a Lévy distribution for you; enjoy. A disclaimer: I am not a statistician, and take no responsibility for any of this being incorrect (but I think it’s ok). I think part of the challenge in gaining a clear picture of Lévy walks/distributions is that many things have been called a Lévy distribution, and the Lévy walk literature is littered with inconsistencies. Some Lévy walks employ constant speed, others not; some have “rests” between phases of mobility, others not. Most draw walk durations (in a given direction) from a Lévy distribution, but I’m willing to bet there are exceptions out there…
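As an independent sketch (not the code in the ZIP, and not necessarily the method in Jacobs' book), one standard way to generate a symmetric, zero-centred Lévy (alpha-stable) variate is the Chambers-Mallows-Stuck method. Here alpha is the parameter that sets the power-law tail, which is exactly the knob the one-sided formula lacks.

```python
import math
import random


def symmetric_stable(alpha: float, rng: random.Random) -> float:
    """One draw from a symmetric alpha-stable ("Lévy") distribution, centred on 0.

    Chambers-Mallows-Stuck method, symmetric (beta = 0) case, unit scale.
    For 0 < alpha < 2 the density's tails decay as |x|^-(1 + alpha); alpha = 2
    gives a Gaussian (variance 2) and alpha = 1 a Cauchy distribution.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform random angle
    w = rng.expovariate(1.0)                    # unit-mean exponential
    while w == 0.0:                             # avoid dividing by zero below
        w = rng.expovariate(1.0)
    return (
        math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
        * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha)
    )


rng = random.Random(0)
heavy_tailed = [symmetric_stable(0.5, rng) for _ in range(10)]  # occasional huge values
gaussian = [symmetric_stable(2.0, rng) for _ in range(10)]      # no long jumps
```

Smaller alpha gives heavier tails, and therefore relatively more very long jumps among the many short ones.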

Why all this Lévy stuff? I recently had a paper accepted at PLOS Computational Biology! It’s on reproducing leukocyte motility dynamics in simulation, and Lévy walks are one of several motility models the paper examines.

References:

  • Harris, T. H., Banigan, E. J., Christian, D. A., Konradt, C., Tait Wojno, E. D., Norose, K., … Hunter, C. A. (2012). Generalized Lévy walks and the role of chemokines in migration of effector CD8+ T cells. Nature, 486(7404), 545–548. http://doi.org/10.1038/nature11098. The supplementary materials of this paper contain the equation for generating a Lévy distribution. It’s in Nature, so I hope it’s correct.
  • Jacobs, K. (2010). Stochastic Processes for Physicists: Understanding Noisy Systems. Cambridge University Press, Cambridge. http://doi.org/10.1017/CBO9781107415324.004. This book contains the method that the Harris paper (above) employs. Section 9.2.2.
  • Plank, M. J., Auger-Méthé, M., & Codling, E. A. (2013). Lévy or Not? Analysing Positional Data from Animal Movement Paths. In M. Lewis, P. Maini, & S. Petrovskii (Eds.), Dispersal, Individual Movement and Spatial Ecology; A Mathematical Perspective (pp. 33–52). Springer. http://doi.org/10.1007/978-3-642-35497-7. This paper provides a very good explanation of the pitfalls in determining that some phenomenon is performing a Lévy walk. There are other papers too. Find the references in my recent PLOS Computational Biology paper!

Python and Java code are contained in this ZIP file. The Java implementation makes use of MASON’s excellent random number generator.
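For completeness, here is a sketch of the walk itself in its simplest form: constant speed, no rests, and a fresh uniformly random 3D direction for each leg. One loud assumption: leg durations come from a Pareto distribution (`random.paretovariate`), a simple power-law stand-in rather than the Lévy distribution discussed above, and the function names and parameters are hypothetical. Swap in whichever duration distribution your formulation calls for.

```python
import math
import random

Vec = tuple[float, float, float]


def random_direction(rng: random.Random) -> Vec:
    """Unit vector uniformly distributed on the sphere (normalised Gaussian vector)."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        n = math.sqrt(x * x + y * y + z * z)
        if n > 0:
            return (x / n, y / n, z / n)


def power_law_walk(n_legs: int, speed: float, alpha: float, t_min: float,
                   rng: random.Random) -> list[Vec]:
    """Positions at the end of each straight leg of a constant-speed 3D walk.

    Each leg lasts t_min * Pareto(alpha) time units, so durations have a
    power-law tail with density decaying as t^-(1 + alpha).
    """
    pos: Vec = (0.0, 0.0, 0.0)
    path = [pos]
    for _ in range(n_legs):
        d = random_direction(rng)
        length = speed * t_min * rng.paretovariate(alpha)
        pos = (pos[0] + length * d[0], pos[1] + length * d[1], pos[2] + length * d[2])
        path.append(pos)
    return path


path = power_law_walk(1000, speed=1.0, alpha=1.5, t_min=1.0, rng=random.Random(42))
```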

An Academic Future?

For all walks of academic life, bar perhaps those senior enough to have stability in their jobs, there seems to be no end of concerns and neuroses about career prospects. These mainly concern job security, uncompetitive earning potential, and workload. There are any number of news articles that cover this, apparently to little effect, as the system (in my 8 years of experience) hasn’t improved one iota. In my experience of interacting with these folks, two of these three concerns could be accommodated. The workload is OK if it’s a subject that one can feel impassioned about. There’s a real motivation and pride that comes from believing that your work can improve lives and help people (we are, after all, mostly taxpayer funded), and I believe many are willing to accept lower pay to do something they love.

The lack of job security, however, is another story. To an extent, under-30s will begrudgingly accommodate this too. But when significant others, families, mortgages and a feeling of falling completely behind your peers start to settle in, stress levels rise. These are some of the most highly educated, brightest folks we’re talking about. Sure, many seem socially restrained, but their drive and aptitude are world-class; they’d have to be to survive the PhD process, let alone the years of uncertainty that follow it. Now, you might happily dismiss these people; it was, after all, their own decision to go down this path. But the taxpayer has sunk a huge quantity of money into training them, on the assumption that society at large will benefit from their work down the line. To provide this support and then not provide the careers to realise the promise seems inefficient, short-sighted and wasteful.

Yet what strikes me about the articles I’ve read is a fundamental error in scope. Most decry the lack of an academic future. Some have focused on specific issues, including universities pushing more admin work onto academics to cut the costs of hiring support staff, overloading academics with teaching responsibilities to the point that research is compromised, or simply failing to provide any form of predictable and stable career path for research-only staff. Estimates suggest that, on average, about 7 times as many PhDs are trained as there are academic positions for them.

A PhD is an education in a whole lot more than the specific research topic you became an expert in. In this journey one learns how to deconstruct a seemingly insurmountable problem into manageable components. There’s a fundamental requirement for application of logic here. You are trained in resilience, project management, self-reliance, and increasingly in team-work across disciplines. You learn how to solve problems no one knows how to solve, and become a world-leader in doing so.

The problem isn’t so much that there aren’t sufficient academic jobs as that academics aren’t educated about the full range of jobs outside academia in which they can add tremendous value. Yes, there is incredible competition for academic positions, but I don’t believe that means we should train fewer PhDs; an educated populace can only be a good thing. It means their potential impact across society should be better realised. I know first-hand that the academic system does a completely deplorable job of engaging industry and connecting these bright, highly trained people with potential employers who could benefit hugely from their skill sets. It is exactly this gap that the NSW EMCR initiative tries to bridge, but we’re only unpaid volunteers.

From a policy perspective, if you want to realise the broader economic benefit of training so many PhDs, ease the competition in the academic system, prevent Australian PhDs from going abroad (having paid for their education yourself) and prevent the PhD program from being labelled a dead-end career, then I encourage you to make the relatively easy leap of facilitating industry engagement within universities. Currently, training for post-PhD employment prospects is something universities don’t really tackle at all. It would benefit the commercial sector to access these bright sparks, be good for those sparks to feel valued and make a contribution, and encourage more people to gain higher education. Everyone wins.

Customer Relations Policy at the Student Loans Company

An unfortunate corollary of pursuing a PhD is the tendency to accrue a sizeable student loan. My initial degree was a 5-year integrated Masters degree, and I took out a fair chunk of loan to help support myself. Unfortunately, the 4 years of PhD study that followed were poorly paid: one can get by, but you don’t have much by way of disposable income, and because you’re a student you don’t pay back any student loan.

The result is that I graduated from my PhD with close to £20,000 of debt. Whilst I was in the UK, repayments were taken off my postdoc pay at a fair rate. However, moving abroad to Australia has been a game changer. My pay here, in UK terms, is high. However, living costs over here are also very high. Sydney is the 5th most expensive city in the world, and rent in particular is astronomical. This isn’t well reflected in the pay threshold above which you pay a proportion of your income in student loan repayments. When I moved over here the threshold for Australia was about £22k/year, whilst the threshold for the UK was £18k. My feeling is that the UK threshold is reasonable; the Australian one is complete nonsense, at least by Sydney standards.

Going abroad, there’s little the Student Loans Company (SLC) can actually do to make me (or anyone else) repay their loans. There is no system through which they can access my income. I’m unsure of their capacity to set the debt-collectors in my direction. I did the responsible thing, and rather than disappear into the shadows, I informed them that I was moving, and told them of my new income. The return message was a blow to the head. I was asked to repay £260/month, about three times what I was paying in the UK. For a time I was to be the sole earner whilst my wife sought a job, and setting up a home in Sydney is… you guessed it, expensive. I asked the SLC if they could, for a time whilst we set ourselves up, reduce the expected repayments. The answer was a cold “No sir, sorry sir, nothing we can (or will) do. Best keep repaying (or else)”. I was unimpressed, and I managed to clear the arrears once my wife found work.

Last night I got a call from the SLC, and in keeping with expectations, they didn’t fail to ruin my day. Apparently, I hadn’t returned some forms that I never received, informing them of my current income. This was meant to happen 7 months ago. Rather than phone or email me to inquire what was up, the SLC decided that a more appropriate response was to bump up my expected payments to £350/month, and inform me only 7 months later that I was in arrears. To legitimately be asked to repay £350/month, I’d have to be in, I’d guess, the top 5% of wages in Australia. Sadly, I am not an investment banker. I expressed my distaste at this policy, and was informed that my opinion wouldn’t change anything; this was set by the good ol’ UK government (only I’m not dead sure it is, as the SLC was privatised some time ago).

I am actually paying back the loan faster than expected, because for now at least, I can afford to do so. However, some time soon I will likely wish to start a family. When I do, my family will probably drop back down to having a single income to support three for a while, and I doubt very much I can continue to make payments at the rate requested. I asked if the repayment schedule could be amended in light of my personal circumstances, should the time come. I pretty much got a “No sir, sorry sir, nothing we can (or will) do. Best keep repaying (or else)”. I consider this to be highly irresponsible of a lender, and this gets to the crux of my substantial dislike of the SLC.

In my opinion, a responsible lender tries to work with the debtor to ensure that the debt is cleared as fast as it can be, without crippling the debtor. I would prefer to see the SLC pursue cooperative and conciliatory relationships with its debtors. I had hoped it would be taken as a sign of good faith that I’m paying back my loan at a faster-than-requested rate, and that I was proactive in keeping them in the loop when I left the country. Not so. Even if an unwaveringly inflexible repayment policy, with communications that take an aggressive and obstinate tone, really is the only way to recover the debt from some people, I am still appalled that this policy is applied across the board. It does little for their reputation. Policy makers from a generation whose parents somehow found the means to give them a quality education at little cost (grants, not loans, used to be a ‘thing’) are instead turning to the next generation and saying “sorry, I know I got all this largely for free, but not you. No, we’re going to load you up with a sizeable house-deposit’s worth of debt. Good luck getting on the housing market. Oh, by the way, we’re all getting older, haven’t put enough money away into the pension schemes and have failed to responsibly manage the health care systems. So you’re probably going to have to pay for that too. Have a great life. And that impending climate change disaster thing, well… never mind, eh?! I’m sure you’ll find a good way of dealing with that.” As you can tell, this makes me angry. I feel the younger generation is getting a pretty rough deal in many regards. And the SLC? This scheme was set up by the government, which is supposed to represent the people. They work for us, not the other way round. I feel that the way this institution is operating is very, very wrong.

Sorry for the ranty post. On the other hand, I feel a lot better now :)

Back online

For a few weeks there my website was a vegetable. My previous website host went bust, almost taking my domain name with it. Thankfully I got a backup of everything just in time. That then meant stretching all my computer science muscles (and patience) as I tried to get the whole thing working again on a new host: a world of PHP, .htaccess files, WordPress config files, database management portals, and a heap of other things I’ll be happy never to see again. But we’re back.

And when I get some more time (it’s taken me about 3 weeks to find the time to get this website up again) I’ll write some more blog posts. There’s no shortage of things happening, just a shortage of time to write about them.

(And yes, that’s Futurama’s Bender conveying this important message)

Sydney Bioinformaticians Get Together

Towards the end of one excellent conference (ANZOS 2014) I saw the call for another workshop, and so threw in an abstract. The abstract was accepted as a post-grad talk, so I presented at the Sydney Bioinformatics Research Symposium 2014. This was a smaller, though still well-attended, affair hosted at the CPC.

There were some noteworthy highlights. Prof Andreas Zankl (a clinician) gave a stunning talk littered with computer science: he and his team (of 3!) are designing a new database and user interface to collate data and opinion on bone dysplasia. It was just so strange to have a GP tell me about the semantic web. As with last year, this year’s gathering had a “fast forward” hour wherein those with posters were given 2 minutes (exactly; you were cut short the second you overran) to draw attention to their work. It’s an adrenaline-fuelled experience for the presenters, well aided by David Lovell’s considerable talent as an MC.

My own talk was on neutrophil swarming, and specifically how automated calibration can be integrated into the simulation development cycle to help refine the model. It was very well received, more so, I think, than most other talks I’ve given thus far in my career. My approach of using simulation to test biological hypotheses differs from typical bioinformatics work, which is concerned with making sense of large quantities of high-throughput data. I think both communities enjoyed learning a bit more about how the other’s approach works.

This year has not lacked for interesting conferences and workshops, though I’ve barely left Sydney. It’s fantastic here, but I suppose the downside of living in an exotic place is that the conferences tend to come to you, rather than giving you the opportunity to head abroad.

Doctoral work published

I have been neglecting the blog. It’s not for want of material to talk about, but life seems to have gotten very busy all of a sudden. A few weeks ago I hit a major milestone: I published the last of the work from my thesis. The Journal of the Royal Society Interface is a top interdisciplinary journal, and I was thrilled that they accepted my manuscript on modelling biological systems.

The paper describes an approach to creating domain models: non-executable models that describe the components and interactions of a biological system. This is part of a wider strategy for creating high quality simulations that demonstrably and accurately capture their target biological behaviours: the CoSMoS process. The domain model provides a consistent and comprehensive perspective of the biology, records what is simulated and what is not, what is abstracted and how, and in doing so provides a coherent link of how simulation results relate to the real biology.

As a comical side note, I love being in Australia and working at the University of Sydney. But there’s one Australian citizen who is particularly irritating for me. As an academic it’s in my interest to raise my online profile. I want this website to be the top hit when someone googles “Mark Read”. Unfortunately I’m not the only Mark Read in Australia: Mark “Chopper” Read was a convicted criminal, author and celebrity, now deceased. Mr. Chopper started his illustrious criminal career by robbing drug dealers, and went on to achieve highs of assault, armed robbery and kidnapping. Here is our hero now:

What a nice chap. So it’s entirely possible that, short of winning a Nobel Prize, I will spend much of my own life overshadowed in online searches by this man. Isn’t that nice.

Invited Talk Season

Life has suddenly become hectic: I’m giving three talks in 4 weeks, and I’m discovering again how much preparing for talks occupies your mental space. The desire not to make a fool of oneself is a powerful motivator!

The first talk was to the Discipline of Physiology here at the University of Sydney. As a computer scientist, giving a talk concerning biology to a room of biologists is a little nerve-racking, but it went well. My aim was to demonstrate how mechanistic simulation (simulations that explain not only what the biology is doing, but how and why it is doing it) is a valuable complement to traditional techniques in trying to understand biological systems. I found the room hard to read, but I am told it was a successful talk. It did not inspire a shower of requests for collaboration, but I think it got the audience thinking. I tried to be honest about the limitations and pitfalls of simulation: having to program the environment and physics, in addition to the experiments themselves, leaves a lot of scope for error. I’m a big fan of the argumentation structures that Kieran Alden has been pioneering at the University of York, which I feel are a key component in demonstrating that simulation results are faithful representations of the biology. Incidentally, this work culminated in a tool written by Paul Andrews, also at York, to support the creation of such arguments: Artoo. Kieran and Paul are preparing a manuscript on these techniques, and I’ll create a post about it when it hits the press.

The second talk was at the Computer Graphics International (CGI) conference hosted here in Sydney. Jinman Kim organised a workshop at the conference, and asked me to present my work. I was honoured, and somewhat nervous again, as the work on neutrophil swarming is still very much under construction. Still, it’s a great case study, and very visually appealing. Essentially, I’m working with Tatyana Chtanova of the Garvan Institute, Jon Timmis and Peter Kim to try to explain what causes this swarming behaviour in neutrophils (immune system cells that respond very rapidly to injury) [1]:

That video shows how neutrophils in a mouse ear respond to a sterile injury caused by a very fine laser burn (causing a single cell to burst) over the course of 50 minutes. It’s striking how quickly cells respond, and how coordinated their movement is. It came from reference [1]. This talk too was very well received, possibly in part because I littered it with videos and images. I have one more talk to give in a week, but that’s to a lab that I work with here, so less pressure. I’m really starting to wish that there were about 48 hours in a day!

REFS:

[1] T. Lämmermann et al. (2013). Neutrophil swarms require LTB4 and integrins at sites of cell death in vivo. Nature 498(7454):371-5. doi: 10.1038/nature12175.

Free statistics code

I’m no world-class statistician, but I am capable of learning what I need to get by. Recently I have been looking into running particular statistical tests and have been dismayed either by the general absence of implementations available online, or by the crazy hoops one needs to jump through to access them. The latter usually involves installing additional modules or entire software suites, which do not always play nicely together. So I implemented a few tests myself, and will place them here for all to use.

First, the Vargha-Delaney A-test [1]. It’s not widely used, so I’ve provided a reference at the end. This is a really useful non-parametric effect magnitude test that can differentiate between two samples of observations. One sample might be a control experiment, the other may be a different algorithm, a new drug… the effects of wearing a fetching tin foil hat; it matters not. The test returns a value between 0 and 1, representing the probability that a randomly selected observation from sample A is bigger than a randomly selected observation from sample B (with ties counting half). Hence, it tells you how much the two samples overlap. A value of 0.5 indicates that neither sample tends to produce larger values than the other (though the spreads need not be the same). Values of 1 and 0 mean that there is no overlap at all. Vargha & Delaney [1] provide suggested thresholds for interpreting the effect size: 0.5 means no difference at all, and values of about 0.56, 0.64 and 0.71 mark the boundaries of small, medium and large differences respectively. The same intervals apply, mirrored, below 0.5. I was once given an implementation of this test by Dr. Simon Poulding (I think it may have been originally written by the revered Prof Susan Stepney, but I’m not sure), but the implementation came with a strict warning that it was only suitable when both samples have the same number of observations, which is a serious drawback. I wrote another implementation that lifts this restriction; the key is that the A-test is a linear transformation of Cliff’s Delta statistic. I wrote yet a third implementation in Python.
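For readers who want something to copy, here is a minimal sketch (not one of the implementations mentioned above; the function names are mine). It counts pairwise wins and ties directly, which handles unequal sample sizes at the cost of len(a) × len(b) comparisons, and includes Cliff's delta so the linear relationship A = (delta + 1) / 2 can be checked.

```python
from typing import Sequence


def a_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Vargha-Delaney A: P(X > Y) + 0.5 * P(X == Y), X drawn from a, Y from b.

    Sample sizes may differ; every pair of observations is compared once.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in a for y in b if x > y)
    ties = sum(1 for x in a for y in b if x == y)
    return (greater + 0.5 * ties) / (len(a) * len(b))


def cliffs_delta(a: Sequence[float], b: Sequence[float]) -> float:
    """Cliff's delta: P(X > Y) - P(X < Y). Relates to A via A = (delta + 1) / 2."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


# Hypothetical samples of unequal size.
print(round(a_test([0.9, 1.1, 1.3], [0.8, 1.0]), 3))  # 0.833
```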

Second, I’ve been doing some Python programming recently, and wanted to draw a cumulative distribution function based on empirical observations. I could not believe how many different methods of generating such a thing existed; several didn’t work, and some gave different answers. It’s not that difficult a statistic to calculate, so I wrote my own. Enjoy.
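As a small stdlib-only sketch of the calculation (function names are mine, and this is not necessarily how the posted code does it): the empirical CDF at x is simply the fraction of observations less than or equal to x, which a binary search over the sorted sample gives directly.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)

    def F(x: float) -> float:
        return bisect_right(xs, x) / n

    return F


def ecdf_steps(sample):
    """(x, F(x)) pairs at each distinct observed value, ready for a step plot."""
    xs = sorted(sample)
    n = len(xs)
    out = []
    for i, x in enumerate(xs):
        if i + 1 < n and xs[i + 1] == x:
            continue  # emit only the last of a run of tied values
        out.append((x, (i + 1) / n))
    return out


F = ecdf([3, 1, 2, 2])
print(F(2), ecdf_steps([3, 1, 2, 2]))  # 0.75 [(1, 0.25), (2, 0.75), (3, 1.0)]
```

Plotting the pairs as a staircase (for example with matplotlib's `plt.step(xs, ys, where="post")`) gives the familiar empirical CDF picture.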

I didn’t have to create my own implementation of the Kolmogorov-Smirnov two-sample test (KS2), but today it’s fast becoming a favourite. The A-test (and Cliff’s Delta, the Mann-Whitney U, and probably a whole load more) are mainly sensitive to shifts in location (loosely, the median), sometimes in the context of the spread. KS2 operates on the two samples’ cumulative distribution functions, so it is sensitive not only to shifts in median value, but also to differences in shape and spread. Most implementations return both a p-value (which is not the same as an effect magnitude) and the ‘D’ value. D is awesome, and seems to be under-appreciated. It is the maximum difference between the two empirical cumulative distributions at any value. See the bottom figure for a graphical interpretation. I plan to use this in automated calibration, where I like its sensitivity to so many aspects of a sample’s underlying distribution.
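As a stdlib-only sketch (names are mine), D can be computed directly from the two empirical CDFs. The gap between them only changes at observed values, so it is enough to evaluate both CDFs there. The function also returns where the maximum occurs, which matches the graphical interpretation. For the default two-sided test, the statistic should agree with the one reported by `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_d(a, b) -> tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov D and the x value where it occurs.

    D is the largest absolute gap between the empirical CDFs of a and b.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    na, nb = len(sa), len(sb)
    best_d, best_x = 0.0, sa[0]
    for x in sorted(set(sa) | set(sb)):  # the gap can only change at observed values
        d = abs(bisect_right(sa, x) / na - bisect_right(sb, x) / nb)
        if d > best_d:
            best_d, best_x = d, x
    return best_d, best_x


print(ks_d([1, 2, 3, 4], [3, 4, 5, 6]))  # (0.5, 2)
```

In a calibration setting, computing one D per output metric (simulated versus observed) yields a vector of errors to minimise.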

[1] Vargha A, Delaney HD. A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat. 2000;25(2):101-32.

Fig 1 taken from here. Fig 2 taken from here. Fig 3 taken from here.