Measuring the cleanliness of the CPAN River

CPAN River Mon 11 May 2015

When cleaning up a river, you need a measure for cleanliness, and then you start cleaning, and see whether your measure shows improvement. This post outlines two simple measures that may help us measure overall CPAN quality, and then start working to improve it. These are just early ideas — I'm sure we can come up with something better.

Water quality right now

The best measure I can think of, which is readily available, is CPAN Testers performance. To start off simple, let's say that a distribution is 'bad' if it has at least 50 CPAN Testers results for the latest non-developer release, and 2% or more of those are failures.

But if we're thinking about river quality, I think we need to consider position on the river. The further upriver you are, the more harm being bad might do. Here's a picture, which I previously talked about, but with some more annotations:

[Image: dists binned by river stage, annotated with "CT Failing" and "weighted failing" rows]

The "CT Failing" row gives the number of dists in each stage of the river that are failing the CPAN Testers metric defined above.

Aside: the CPAN Testers data is slightly out of date, as CPAN Testers is currently down. Also, in the previous article I listed 47 dists in the 10,000+ category, but I realised that one of those was Perl itself!

Interestingly, the percentage of failing dists decreases as we go upriver, until we reach the furthest upstream stage.

The impact of failing

The further upriver a dist is, the more downriver dists that might be impacted by its failing. So instead of counting the number of failing dists, I summed the counts of downriver distributions: that is the "weighted failing" row.

We could sum those figures along the whole river, and that could give us a measure of something-or-other for CPAN at a given time. We could record this over time, and see whether CPAN is gradually getting better, or not. If we were actively trying to improve it, the weighting would encourage us to start upriver, as that will have the biggest impact.
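The weighted measure described above can be sketched as follows. The dists and their downriver counts are made-up numbers for illustration; the point is that one failing dist far upriver dominates the sum.

```python
def weighted_failing(dists):
    """Sum the downriver dependent counts of all failing dists.

    dists: list of (is_failing, downriver_count) pairs.
    """
    return sum(count for failing, count in dists if failing)

# Hypothetical river: one failing dist far upriver, one near the mouth
river = [
    (True,  12000),  # failing, 12000 downriver dists
    (False,  8000),
    (True,     40),
    (False,     3),
]
print(weighted_failing(river))  # dominated by the upriver failure
```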

Obviously this is a naive measure, as one dist having some fails on CPAN Testers doesn't mean that it's breaking all of its downriver dependents. And there's some double-counting (and more!) going on: a failing dist in the middle of the river might be the cause of fails slightly downriver, which in turn might be the cause of more fails further downriver.

And it should probably be based on some filtering for mainstream operating systems.

Reliability and Stability

In addition to the current river quality, I think we should also consider the reliability of CPAN. If you've done enough releases to CPAN, at some point you'll have done a release which turned out to be broken in some way. At that point, attempts to install a downriver distribution may fail because of your broken release. Sure, when you realised that your release was broken, you uploaded a working one.

So you temporarily polluted the river, and once your working release got around all the CPAN mirrors, your pollution was washed out. And depending on how often it was measured, it might not show up on the "quality of CPAN over time" graph.

So in addition to the water quality over time, I think we should measure the stability and reliability of distributions over time. Some measure of this across all of CPAN would be an indicator of whether we're improving our development and release processes as a group.

Having this available on individual distributions might be useful too. Let's say you need some functionality and there are two distributions you could use. One of them has slightly better runtime performance, but the other one is much more stable and reliable. I know which one I'd choose.


I think we should monitor some measures of CPAN that show us: the current water quality of the river, and the stability and reliability of distributions over time.

I think we can probably come up with better measures than those presented here, but we probably don't have to worry too much about coming up with optimal measures. As long as they encourage us to focus in the right places, we should do ok.
