As a CPAN distribution moves up the river it needs to become more reliable, as by definition more distributions are relying on it. In this post I propose a simple metric for "suitability for depending on", which is essentially a water quality metric for the CPAN River.
When picking a module to use, there are a number of factors you should consider. The obvious ones are: does it provide the functionality you really need, does it behave as documented for all inputs, and is its performance acceptable to you. But you should also consider whether it's a good distribution to depend on: is it going to impact your module's reliability?
If you're going to rely on a module, I think the following should be true:

- Its CPAN Testers results are (almost) all passes.
- It has a META.json or META.yml file (and preferably both).
- Its metadata specifies the minimum version of perl required.

To keep it simple, each CPAN distribution is given a pass or fail for each metric.
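As a sketch of what the META check involves, here's a minimal Python helper (the function name is my own; the field names follow the CPAN::Meta::Spec layout, where the minimum perl lives under `prereqs.runtime.requires.perl`):

```python
import json

def meta_checks(meta_json_text):
    # Sketch: decide the "No META" and "No perl version" metrics for a
    # distribution, given the text of its META.json (or None if absent).
    try:
        meta = json.loads(meta_json_text)
    except (TypeError, ValueError):
        return {"has_meta": False, "declares_perl": False}
    requires = meta.get("prereqs", {}).get("runtime", {}).get("requires", {})
    return {"has_meta": True, "declares_perl": "perl" in requires}
```

A real implementation would also fall back to META.yml for older distributions, but the idea is the same.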
Having zero CPAN Testers fails isn't always achievable, due to issues with smoke testers, bad perl configurations, etc. So the measure I generally use to flag a distribution is:

It has more than 50 CPAN Testers reports, and 2% or more of them are fails.
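That measure is easy to express in code. A minimal Python sketch (the function name is hypothetical):

```python
def cpan_testers_fail(total_reports, fail_reports):
    # A distribution fails this metric when it has more than 50
    # CPAN Testers reports AND 2% or more of them are FAILs.
    if total_reports <= 50:
        return False
    return fail_reports / total_reports >= 0.02
```

The 50-report floor means a distribution with only a handful of reports (and perhaps one spurious smoker fail) isn't penalised.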
I used a fairly recent CPAN snapshot to calculate this for all distributions, and present it below for different stages of the river. I calculated the individual measures for each distribution, and then the overall water quality metric.
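The river stages used in the table can be sketched as a simple bucketing function (an assumed helper, matching the column headings):

```python
def river_bucket(dependents):
    # Map a distribution's number of downstream dependents
    # onto the river stages used in the results table.
    if dependents >= 10000:
        return "10k+"
    if dependents >= 1000:
        return "1k-9999"
    if dependents >= 100:
        return "100-999"
    if dependents >= 10:
        return "10-99"
    if dependents >= 1:
        return "1-9"
    return "0"
```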
**Number of downstream dependents**

| | 10k+ | 1k-9999 | 100-999 | 10-99 | 1-9 | 0 |
|---|---|---|---|---|---|---|
| # dists | 45 | 195 | 570 | 1589 | 8210 | 21250 |
| CPAN Testers Fails | 4 (8.9%) | 10 (5.1%) | 30 (5.3%) | 172 (10.8%) | 1473 (17.9%) | 4762 (22.4%) |
| No META | 1 (2.2%) | 2 (1.0%) | 10 (1.8%) | 51 (3.2%) | 483 (5.9%) | 4440 (20.9%) |
| No perl version | 28 (62.2%) | 68 (34.9%) | 251 (44.0%) | 779 (49.0%) | 4879 (59.4%) | 15401 (72.5%) |
| Any Fail | 28 (62.2%) | 75 (38.5%) | 262 (46.0%) | 849 (53.4%) | 5293 (64.5%) | 16369 (77.0%) |
I first thought about the "water quality" of the CPAN River back in May. The CPAN Testers figures have improved since then, which is good (though the CPAN Testers data at the time was slightly out of date, as there had been a CPAN Testers issue).
One thing that's interesting is that all of the metrics improve as you move up river, until you get to the head of the river (distributions with 10,000 or more dependents), where they all get a bit worse. I wonder if that's because a lot of those 45 distributions are dual-life ones that have been bundled with Perl 5 since the first release, and so perhaps haven't always been updated to follow new practices?
What other factors should be included in a CPAN water quality metric?
One of my main goals for 2016 is going to be improving the water quality of the CPAN River, i.e. of distributions with 1 or more downstream dependents.
I'm going to make this one of the focuses of the 2016 Pull Request Challenge, and also work on it myself. I'll generate these stats again on the 1st of January, and then track them through the year. If anyone wants to join me on this quest, I'll be happy to have the company.