A key part of the Pull Request challenge is deciding which CPAN distributions
to hand out each month. In this post I'll describe the way I rank distributions,
with the highest-ranking previously-unassigned dists handed out each month.
You can browse the
list of ranked distributions.
This is still very much a work-in-progress — I'm looking for input
on the criteria used to 'score' distributions.
Here's the general approach:
1. Build a list of CPAN distributions that have a github repo.
2. Calculate a score for each such dist.
3. Assign the highest scoring dists to the N people still in the challenge,
   with some constraints.
The rest of this post will go into each step in a bit more detail.
CPAN distributions that have a github repo
First I build a list of all CPAN distributions which have metadata giving
a github repo.
I wrote a perladvent article
on how to do that using the MetaCPAN API.
But it turns out that there are a good few (roughly 300) dists which list
a github repo that is no longer there.
So then I use URL::Exists (which I wrote because I couldn't find anything
like it on CPAN) to check whether each repo exists.
If it doesn't, then the dist is excluded from assignment, but I'll
publish a list of those dists, since they're possibly candidates for getting
back on github.
Finally, if the dist has the x_deprecated field in its metadata,
marking it as deprecated,
then it is also excluded from the list.
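The filtering described above can be sketched roughly as follows. This is only an illustration in Python, not the code I actually run: the dict keys (`name`, `repo_url`, `x_deprecated`) and the `repo_exists` callback are assumptions made for the example, standing in for the MetaCPAN metadata and the URL check.

```python
def eligible_dists(dists, repo_exists):
    """Split dists into (eligible, missing_repo).

    dists       -- iterable of dicts with hypothetical keys 'name',
                   'repo_url' and optionally 'x_deprecated'
    repo_exists -- callable taking a URL and returning True if the
                   repo is still there (stand-in for the URL check)
    """
    eligible = []
    missing_repo = []
    for dist in dists:
        if not repo_exists(dist["repo_url"]):
            # Excluded from assignment, but kept so the list can be published
            missing_repo.append(dist)
        elif dist.get("x_deprecated"):
            # Marked as deprecated in its metadata: excluded entirely
            continue
        else:
            eligible.append(dist)
    return eligible, missing_repo
```

Passing the existence check in as a function keeps the sketch free of network calls, so it can be tried with a fake checker.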
Calculating a score for each dist
There are two types of factor considered:
(1) is it a worthy dist to assign (e.g. used by other dists)?
(2) are there things that could be usefully done?
For now both types of factors are mashed together to produce a single score.
Here are the factors currently used to calculate the score:
Factor                                         Score
Used by other CPAN dists                       1
Has issues on RT                               1-3
Has CPAN Testers fails                         1
CPANTS warnings                                1-2
Has test.pl in topdir                          1
Author encourages PRs                          1
Module uses Any::Moose (which is deprecated)   1
Partially deprecated                           1
Notes on the above:
I'm thinking perhaps another +1 if used by a CPAN dist released by
a different author.
Currently counting RT issues classified as bugs, but need to consider
github issues as well.
If >= 50 issues, +3; if >= 20 issues, +2; if >= 1 issue, +1.
For CPAN Testers, I'm thinking the threshold might be more than 50 reports
in total and more than 2% fails.
For CPANTS, a red flag counts as 2 CPANTS points, and a yellow flag counts
as 1 CPANTS point (ignoring the is_prereq flag).
If a dist has 10 CPANTS points or more, it gets +2;
if it has CPANTS points in the range 1 to 9, it gets +1.
Any::Moose is deprecated, and it's recommended that you use Moo,
unless there's a good reason to use Moose or Mouse.
Partially deprecated means that the abstract appears to mark it as deprecated,
but the dist metadata doesn't contain the x_deprecated field.
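To make the arithmetic concrete, here is a rough Python sketch of the scoring as described above. The field names on the `dist` dict are hypothetical, chosen just for this example, and the CPANTS flag counts are assumed to already exclude the is_prereq flag.

```python
def rt_issue_points(bug_count):
    """+3 for >= 50 RT bugs, +2 for >= 20, +1 for >= 1, else 0."""
    if bug_count >= 50:
        return 3
    if bug_count >= 20:
        return 2
    if bug_count >= 1:
        return 1
    return 0


def cpants_points(red_flags, yellow_flags):
    """A red flag is 2 CPANTS points, a yellow flag is 1.

    10 or more CPANTS points scores +2; 1 to 9 scores +1.
    """
    points = 2 * red_flags + yellow_flags
    if points >= 10:
        return 2
    if points >= 1:
        return 1
    return 0


# Factors worth a flat +1 each (hypothetical key names)
FLAT_FACTORS = (
    "used_by_other_dists",
    "has_cpan_testers_fails",
    "has_test_pl_in_topdir",
    "author_encourages_prs",
    "uses_any_moose",
    "partially_deprecated",
)


def score_dist(dist):
    """Mash all the factors together into a single score."""
    score = sum(1 for key in FLAT_FACTORS if dist.get(key))
    score += rt_issue_points(dist.get("rt_bugs", 0))
    score += cpants_points(dist.get("cpants_red", 0),
                           dist.get("cpants_yellow", 0))
    return score
```

For example, a dist used by other dists, with 25 RT bugs, 4 red and 2 yellow CPANTS flags, and a dependency on Any::Moose would score 1 + 2 + 2 + 1 = 6.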
Thoughts on additional things to score a dist on. These aren't definite, just
thoughts, which I'm looking for feedback on:
Has CPANTS 'failures', perhaps +2 if any are red, and +1 if amber
indirect method notation seen in code / doc
dist doesn't have a main module
main module doesn't have an abstract
no SEE ALSO section in the doc. I think it's helpful to have a SEE ALSO,
even if it just says "I don't know of any other module that does this",
since if there is, someone will point it out to you :-)
Has CPAN Testers fails with current dev perl, but not previous perls
Dist has NEEDHELP co-maint on one or more modules
Build a list of other dists which have a clear replacement
(like Moo for Any::Moose).
Depends on a deprecated dist.
When sending people their assigned dists, the email will list known
specific issues, essentially giving them pointers on things to do
in a pull request.
Please: add comments with lots of ideas for additional ranking factors.
Assigning dists
When assigning dists to participants, the following rules apply:
By default only one dist from any given author will be assigned in
any given month. You can change this, bumping it up to a higher
number per month. A few people have already: thank you.
At the moment I'm planning on assigning each dist only once.
But clearly some dists have more than enough issues to justify multiple
sequential assignments: for such dists, as long as the first assignee has said
they're done, then it would be ok to assign again.
When someone drops out of the challenge, their current assignment goes
back into the pool.
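The rules above might be sketched like this in Python. It is a simplified illustration under assumptions of my own: participants are paired with dists in list order, each dist is a dict with hypothetical `name`, `author` and `score` keys, and `per_author_limit` holds the authors who have asked for more than one dist per month.

```python
def assign_dists(ranked, participants, already_assigned=frozenset(),
                 per_author_limit=None, default_limit=1):
    """Return a dict mapping participant -> dist name for one month."""
    per_author_limit = per_author_limit or {}
    assignments = {}
    author_count = {}
    waiting = list(participants)

    # Highest scoring dists first; ties keep their original order
    for dist in sorted(ranked, key=lambda d: d["score"], reverse=True):
        if not waiting:
            break
        if dist["name"] in already_assigned:
            # Each dist is only assigned once
            continue
        author = dist["author"]
        limit = per_author_limit.get(author, default_limit)
        if author_count.get(author, 0) >= limit:
            # This author already has their quota for the month
            continue
        author_count[author] = author_count.get(author, 0) + 1
        assignments[waiting.pop(0)] = dist["name"]
    return assignments
```

When someone drops out, removing their dist from `already_assigned` puts it back into the pool for the next run.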