I’m no world-class statistician, but I am capable of learning what I need to get by. Recently I have been looking into running particular statistical tests and have been dismayed either by the general absence of implementations available online, or by the crazy hoops one needs to jump through to access them. The latter usually involves installing additional modules or entire software suites, which do not always play nicely together. So I implemented a few tests myself, and will place them here for all to use.

First, the Vargha-Delaney A-test [1]. It’s not widely used, so I’ve provided a reference at the end. This is a really useful non-parametric effect magnitude test that can differentiate between two samples of observations. One sample might be a control experiment, the other a different algorithm, a new drug… the effects of wearing a fetching tin foil hat – it matters not. The test returns a value between 0 and 1, representing the probability that a randomly selected observation from sample A is bigger than a randomly selected observation from sample B (with ties counted as half). Hence, it tells you how much the two samples overlap. A value of 0.5 indicates complete overlap: an observation from either sample is equally likely to be the larger. Values of 1 and 0 mean that there is no overlap at all.

Vargha & Delaney [1] provide suggested thresholds for interpreting the effect size: values up to 0.56 indicate a negligible difference, up to 0.64 a small one, up to 0.71 a medium one, and anything above 0.71 a large one. The same intervals apply, mirrored, below 0.5.

I was once given an implementation of this test by Dr. Simon Poulding (I think it may have been originally written by the revered Prof Susan Stepney, but I’m not sure), but it came with a strict warning that it was only suitable when both samples have the same number of observations, which is a serious drawback. I wrote another implementation that lifts this restriction; the key is that the A-test is a linear transformation of Cliff’s Delta statistic. I wrote yet a third implementation in Python.

- A-test matlab code – only use for samples with equal numbers of observations.
- A-test matlab code, works for samples of unequal sizes. You will need the Cliff’s Delta code too.
- A-test python code.
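To make the unequal-sample-size route concrete: Cliff’s Delta counts, over all pairs of observations, how often a value from one sample beats a value from the other, and the A-test is then just (delta + 1) / 2. Here is a minimal NumPy sketch of that idea – the function names are mine, and this is an illustration rather than the linked code:

```python
import numpy as np

def cliffs_delta(a, b):
    """Cliff's Delta: P(a > b) - P(a < b), estimated over all pairs.

    Works for samples of different sizes; the pairwise matrix is
    fine for modest sample counts.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = a[:, None] - b[None, :]          # every pairwise difference
    wins = np.sum(diffs > 0)                 # pairs where a beats b
    losses = np.sum(diffs < 0)               # pairs where b beats a
    return (wins - losses) / (len(a) * len(b))

def a_test(a, b):
    """Vargha-Delaney A statistic, as a linear transformation of Cliff's Delta."""
    return (cliffs_delta(a, b) + 1) / 2.0
```

Identical samples come out at 0.5, and completely separated samples at 1 or 0, matching the interpretation above.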

Second, I’ve been doing some Python programming recently. I wanted to draw a cumulative distribution function based on empirical observations. I could not believe how many different methods of generating such a thing existed; several didn’t work, and some gave different answers. It’s not that difficult a statistic to calculate, so I wrote my own. Enjoy.
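It really is just a sort and a running proportion. A minimal NumPy sketch (an illustration of the idea, not necessarily the linked implementation):

```python
import numpy as np

def ecdf(sample):
    """Empirical CDF of a sample.

    Returns the sorted observations and, for each, the proportion of
    the sample less than or equal to it - ready to plot as a step curve.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y
```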

I didn’t have to create my own implementation of the Kolmogorov-Smirnov 2-sample test (KS2), but it’s fast becoming a favourite. The A-test (and Cliff’s Delta, Mann-Whitney’s U, and probably a whole load more) only really compares medians, sometimes in the context of the spread. KS2 operates on the two samples’ cumulative distribution functions, and as such it is sensitive not only to shifts in median value, but also to their shapes and spreads. Most implementations return both a p-value (which is not the same as effect magnitude) and the ‘D’ value. D is awesome, and seems to be under-appreciated: it represents the maximum difference between the two cumulative distributions at any value. See the bottom figure for a graphical interpretation. I plan to use this in automated calibration, where I like its sensitivity to so many aspects of a sample’s underlying distribution.
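Library routines such as scipy.stats.ks_2samp return D (and the p-value) directly, but D is simple enough to compute straight from the definition above: evaluate both empirical CDFs at every observed value and take the biggest gap. A NumPy sketch, with an illustrative function name:

```python
import numpy as np

def ks2_d(a, b):
    """Two-sample KS 'D' statistic: the maximum vertical distance
    between the two empirical cumulative distribution functions."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Evaluate both ECDFs at every observed value from either sample;
    # searchsorted with side="right" counts observations <= x.
    xs = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, xs, side="right") / len(a)
    cdf_b = np.searchsorted(b, xs, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))
```

Because both ECDFs are step functions that only change at observed values, checking the pooled observations is enough to find the true maximum.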

[1] Vargha A, Delaney HD. A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat. 2000;25(2):101-32.

Fig 1 taken from here. Fig 2 taken from here. Fig 3 taken from here.