Two weeks ago I published a list of CPAN modules that might be candidates for adoption, and described the metric used to score them. I had a lot of comments on that version, which has prompted version 3 of the metric. The key change is the use of gating criteria to decide whether a module should even be considered for the list. The new list contains dists that score at least 5 (out of 14), which is about 4% of the dists on CPAN.
There are two stages to the metric now. First we apply gating criteria, and if the dist gets past those, it's then scored for 'adoption potential'. I'll present the gating criteria and scoring rubric first, then explain various parts.
Another new addition: instead of looking at the number of immediate (or direct) dependencies, we now look at the total number of dependent distributions.
Exclusion criteria: the dist isn't included in the list if either:
Inclusion criteria: the dist is scored if either:
If a dist is marked for inclusion, it's then scored according to the following rubric. Unless stated, each rule adds +1 to the score.
If the dist's upstream is marked as 'cpan', it gets +2.
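The overall shape of the metric can be sketched as follows. The predicates and rubric rules here are hypothetical stand-ins of my own, just to show the gate-then-score structure; the real criteria are the ones listed above:

```python
# Hypothetical stand-ins for the real criteria, to illustrate the two stages.
def is_excluded(dist):
    return dist.get("recently_released", False)

def is_included(dist):
    return dist.get("has_open_bugs", False) or dist.get("adoptme", False)

rubric_rules = [
    lambda d: 2 if d.get("upstream") == "cpan" else 0,  # the one +2 rule
    lambda d: 1 if d.get("has_open_bugs") else 0,       # ordinary +1 rules
]

def adoption_score(dist):
    """Two-stage metric: gating criteria first, then the scoring rubric."""
    if is_excluded(dist) or not is_included(dist):
        return 0
    return sum(rule(dist) for rule in rubric_rules)

print(adoption_score({"has_open_bugs": True, "upstream": "cpan"}))  # → 3
```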
Here's a plot which shows the distribution of scores across all CPAN distributions. Note that the y axis is logarithmic.
This is quite different from the graph for v2 of the metric. For v3 you can see a lot of dists have a score of 0 (86% of CPAN), due to the gating criteria.
I first reported that 87% of dists had a score of 0. This was due to a bug, where I hadn't reinitialised some of my intermediate data after changing the rules for which bugs are counted. Thanks to David Golden for catching this. So ignoring wishlist and unimportant tickets meant that 6% of CPAN dropped out of consideration. That made 92% of dists excluded.
But then I found a bug in my SQL used to ignore wishlist and unimportant tickets:
severity NOT IN ('wishlist', 'unimportant')
There are a lot of tickets where the severity is NULL, and those weren't getting included.
That clause is now:
(severity ISNULL OR severity NOT IN ('wishlist', 'unimportant'))
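The underlying gotcha is SQL's three-valued logic: when severity is NULL, `severity NOT IN (...)` evaluates to NULL rather than true, so those rows are silently dropped. A quick demonstration using Python's sqlite3 (the table and values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, severity TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, "critical"), (2, "wishlist"), (3, None)],  # ticket 3 has no severity
)

# The buggy clause: the NULL-severity ticket disappears entirely
buggy = conn.execute(
    "SELECT id FROM tickets "
    "WHERE severity NOT IN ('wishlist', 'unimportant') ORDER BY id"
).fetchall()
print(buggy)  # → [(1,)] — ticket 3 is missing

# The fixed clause keeps NULL-severity tickets
fixed = conn.execute(
    "SELECT id FROM tickets "
    "WHERE severity IS NULL OR severity NOT IN ('wishlist', 'unimportant') "
    "ORDER BY id"
).fetchall()
print(fixed)  # → [(1,), (3,)]
```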
Furthermore, I was previously considering the date when tickets were created, but now I've switched to using the date when each ticket was last updated.
This is calculated with the following, and capped to a max value of 3:
                 # months since last release - 6
    bug_score = ------------------------------------
                # months since most recent open bug
This means that dists released in the last 6 months can't appear on the list (unless they have ADOPTME or HANDOFF). For example XML-Twig appeared on the previous list, but doesn't appear this time because it was released in May 2013.
This gives a higher score to modules that were last released a long time ago, but where bugs have been reported recently.
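A minimal sketch of this calculation (the function and the flooring at 0 are my own framing, but the numerator, denominator, and cap of 3 are as described above):

```python
def bug_score(months_since_release, months_since_newest_open_bug):
    """Score bug activity relative to release staleness, capped at 3.

    Dists released in the last 6 months get a non-positive numerator,
    so they score 0 here (assuming negative scores are floored at 0).
    """
    if months_since_newest_open_bug <= 0:
        return 0
    score = (months_since_release - 6) / months_since_newest_open_bug
    return max(0, min(3, score))

# An old dist (5 years since release) with a bug opened a month ago
# hits the cap; a dist released 4 months ago scores 0.
print(bug_score(60, 1))   # → 3
print(bug_score(4, 12))   # → 0
```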
The %upstream hash tells you whether the module is 'dual life' (in core and thus released with perl, but also released separately via CPAN). If a module has upstream marked as 'cpan', it means that the CPAN release is considered primary.
Overall I think this is a better measure: there will still be false positives in there, but it feels like there are a lot fewer of them.
Sources of data:
Let me know if you've got other ideas for extending or refining this.