In 2015 I ran the first CPAN Pull Request Challenge. Each month participants were assigned a randomly selected CPAN distribution, and had a month to submit a pull request. 496 people signed up, and 237 of them completed 768 assignments between them, submitting pull requests on 677 different CPAN distributions. This is a review of how things panned out: what worked well, what didn't, and how things have changed for 2016's challenge.
CPAN was started in 1995, and now has nearly 32000 distributions. Those distributions were released by thousands of different authors, with varying backgrounds and levels of experience, at different times over a 20-year period, using a range of tools, and following evolving best practices, or not.
As a result, CPAN needs quite a bit of gardening. I've tried a number of things to encourage people to join in the gardening effort. After trying 24 Pull Requests in 2014, I thought maybe the two ideas could be merged, and the CPAN Pull Request Challenge (hereafter PRC) was born.
At the start of 2015, about 9100 CPAN distributions (out of about 30k) had a GitHub repo; by January 2016 that had grown to 10600. To make my life easier I only considered distributions where the repo was listed in the dist's metadata.
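A metadata check of this kind could be sketched roughly as follows. This is an illustration only, not the script used for the PRC. It assumes the distribution ships a META.json in the CPAN::Meta::Spec version 2 layout, where the repository is described under `resources.repository` with `url` and `web` keys, and it also tolerates the older 1.4 layout where `repository` is a plain URL string. The function name `github_repo` and the sample distribution are made up for the example.

```python
import json
from typing import Optional


def github_repo(meta_json: str) -> Optional[str]:
    """Return the GitHub URL listed in a META.json document, or None."""
    meta = json.loads(meta_json)
    repo = meta.get("resources", {}).get("repository")

    # Older (1.4-style) metadata stores the repository as a bare URL string.
    if isinstance(repo, str):
        return repo if "github.com" in repo else None

    # Version 2 metadata stores a mapping with 'web' and/or 'url' keys.
    if isinstance(repo, dict):
        for key in ("web", "url"):
            value = repo.get(key, "")
            if "github.com" in value:
                return value
    return None


# Hypothetical distribution metadata, for illustration only.
sample = json.dumps({
    "name": "Example-Dist",
    "resources": {
        "repository": {
            "type": "git",
            "web": "https://github.com/example/example-dist",
        }
    },
})

print(github_repo(sample))
```

A distribution whose metadata lacks a repository entry simply yields `None` and would be left out of the candidate pool.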
On the 1st day of the month I assigned distributions to participants, and emailed information about each distribution to the selected participant. The email had links to information about the distribution and some tips on things that could be done in the PRC. The PRC web site also has ideas for PRs, and we set up a mailing list and IRC channel where people could discuss their assignments.
By default I only assigned one distribution from any CPAN author (to spread the load), and most distributions were only assigned once during the year.
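The two default rules above (one distribution per author in a given month, and no distribution assigned twice in the year) can be sketched as a small selection routine. This is a simplified illustration under stated assumptions, not the actual PRC code: it picks uniformly at random, whereas the real selection also used a scoring of distributions, and the names `assign`, `already_assigned`, and the sample data are hypothetical.

```python
import random


def assign(participants, dists, already_assigned, seed=None):
    """Assign one distribution per participant.

    dists is a list of (dist_name, author) pairs. Distributions in
    already_assigned are skipped, and no author is used more than once.
    Returns a dict mapping participant -> dist_name; participants are
    left out if the pool runs dry.
    """
    rng = random.Random(seed)
    candidates = [d for d in dists if d[0] not in already_assigned]
    rng.shuffle(candidates)

    used_authors = set()
    assignments = {}
    pool = iter(candidates)  # shared iterator: each candidate is seen once
    for person in participants:
        for name, author in pool:
            if author not in used_authors:
                used_authors.add(author)
                assignments[person] = name
                break
    return assignments


# Hypothetical data, for illustration only.
dists = [
    ("A-One", "ALICE"),
    ("A-Two", "ALICE"),
    ("B-One", "BOB"),
    ("C-One", "CAROL"),
]
print(assign(["p1", "p2", "p3"], dists, already_assigned={"C-One"}, seed=1))
```

With this data only two participants can receive an assignment, because just two authors remain once `C-One` is excluded.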
When they completed their PR, participants sent me an email. If they were too busy that month they could skip the assignment. People who had done at least one PR, or who had skipped, got an assignment the following month.
I announced the challenge on my personal blog in late November, resulting in 18 signups, about the number I expected. On Christmas Eve I posted to blogs.perl.org, which resulted in about 50 more signups by 30th December. On 31st December someone submitted my second blog post to Hacker News, where for a while it was in the top spot. This prompted another 300 or so signups!
People signed up through the year as well. By the end of the year, 496 people had signed up for the challenge.
Unsurprisingly a lot of the signups were current Perl programmers and CPAN authors. But we also got people who had used Perl in the past, and people who had never programmed in Perl.
I asked participants how much Perl experience they had before signing up:
I was surprised that the largest group was 0-4 years — I had expected a skew towards the other end of the spectrum. As it is, more than half of the participants had programmed in Perl for 10 years or longer.
68% of participants said they'd done at least one pull request prior to signing up.
I asked people why they signed up, and gave them a long list of options, which weren't mutually exclusive. The most common ones were:
| Reason for signing up | Responses | Share |
|---|---|---|
| To give something back to Perl | 163 | 76% |
| Sounded like a fun thing to try | 149 | 70% |
| To get more involved with the Perl / CPAN community | 146 | 68% |
| To help CPAN | 144 | 67% |
| To improve existing knowledge of Perl | 99 | 46% |
| To learn more about the CPAN toolchain | 77 | 36% |
| To get experience doing PRs | 66 | 31% |
| To find out more about what's on CPAN | 43 | 20% |
So there are people out there who want to give back / get involved, but don't know how. Providing some kind of framework for that can help draw people (back) into your community.
The following chart shows how many people got an assignment each month, how many of those resulted in at least one pull request (PR), and how many didn't.
768 assignments resulted in one or more pull requests. 677 different CPAN distributions had at least one PR as a result of the challenge. 237 people have so far submitted at least one PR for the challenge in 2015 (people are still completing their December assignments).
259 people (52% of the 496 who signed up) didn't submit a pull request. By the end of the year, 348 people (70% of the 496) had dropped out. In other words, 89 people dropped out after doing at least one PR.
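The participation figures fit together as simple arithmetic, which the following sketch reproduces from the four headline numbers. It assumes, as the text implies, that everyone who never submitted a PR is counted among those who dropped out. The function name `participation_summary` and the assignments-per-submitter ratio are additions for illustration.

```python
def participation_summary(signed_up, submitted_pr, dropped_out, assignments_done):
    """Derive the headline participation figures from the raw counts."""
    never_submitted = signed_up - submitted_pr
    return {
        "never_submitted": never_submitted,
        "never_submitted_pct": round(100 * never_submitted / signed_up),
        "dropped_out_pct": round(100 * dropped_out / signed_up),
        # Assumes all non-submitters are included in the drop-out count.
        "dropped_after_pr": dropped_out - never_submitted,
        "assignments_per_submitter": round(assignments_done / submitted_pr, 1),
    }


summary = participation_summary(
    signed_up=496, submitted_pr=237, dropped_out=348, assignments_done=768
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Run with the 2015 numbers, this gives 259 non-submitters (52%), a 70% drop-out rate, 89 people who dropped out after at least one PR, and an average of about 3.2 completed assignments per person who submitted.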
That surge in December: I emailed everyone who had done at least one PR before dropping out, and asked if they fancied rejoining for one last hurrah in December. Plenty did.
The following chart shows the monthly number of pull requests against CPAN distributions from September 2010 through December 2015.
The biggest spike is in January, and from there it staggers down to a low in September. The surge in October was undoubtedly due to Hacktoberfest.
The following table summarises the most common reasons given for dropping out:
| Reason for dropping out | Responses | Share |
|---|---|---|
| Not enough time | 128 | 74% |
| I wasn't interested in the module(s) I was assigned | 44 | 25% |
| I couldn't think of things to do with the assigned module(s) | 41 | 24% |
| I didn't get any response from emailing the author of assigned module(s) | 26 | 15% |
| I didn't get any response to pull request(s) I submitted | 22 | 13% |
| The things to be done on my assigned module(s) were too hard | 18 | 10% |
At the start of 2015 I was considering all CPAN distributions for assignment. After negative reactions from a handful of authors, I introduced the ability for authors to opt out. I emailed the 500 or so authors whose distributions were most likely to be assigned, explaining the PRC and giving them three options: opt out entirely, exclude certain distributions, or give a +1 to distributions they were keen to have assigned.
Following further feedback from authors, I've changed the 2016 challenge to be opt-in for authors: I only assign distributions from authors who've explicitly said they're happy for me to do that. I also encouraged them to add GitHub issues for things they'd like to see done.
There are a number of heavyweight distributions on CPAN, such as database drivers for DBI, OO frameworks like Moose and Moo, and web frameworks like Dancer, Mojolicious, and Catalyst. People who were assigned these were quite often daunted, and either did nothing, or asked me for a different distribution. A few people had no problem with such an assignment, and submitted a PR.
There were 3 hackathons during the year that had a link with the PRC. For these I generated a list of distributions, based on the scoring I used when selecting distributions to assign. Based on what I heard of those, I think heavyweight distributions are better suited to hackathons, particularly where a group of people take on a particular distribution.
The other idea which came up is tagging distributions with difficulty and getting participants to self-classify as beginner, intermediate, or expert.
The following are some of the comments made by people who took part:
"The prospect of pitching in doesn't seem as scary as it did before"
"I have actually submitted PRs to modules that were not assigned to me during the challenge because I was inspired."
"I learned a lot about the CPAN - toolchain, conventions, testing, community standards"
"I feel a bit closer to the community, and not quite as much just watching things from the outside"
"nice community, a 'heartbeat' against procrastination"
I'm running a PR Challenge in 2016, but with some changes:
As with last year, I'm sure the gameplan will evolve through the year.
The inspiration for the PRC was 24 Pull Requests. I'd like to thank all the authors who responded positively to PRs, particularly those from beginners. And I'd like to thank everyone who submitted pull requests: together we had an impact on CPAN, which was my goal all along.