A review of the 2015 CPAN Pull Request Challenge

CPAN-PRC, retrospective | Tue 5 January 2016

In 2015 I ran the first CPAN Pull Request Challenge. Each month, participants were assigned a randomly selected CPAN distribution and had a month to submit a pull request to it. 496 people signed up, and 237 of them completed 768 assignments between them, submitting pull requests on 677 different CPAN distributions. This is a review of how things panned out, what worked well and what didn't, and how things have changed for the 2016 challenge.

This post is based on a talk I gave at the London Perl Workshop in November.

Background

CPAN was started in 1995, and now has nearly 32000 distributions. They were released by thousands of different authors with varying backgrounds and levels of experience, at different points over a 20-year period, using a range of tools, and following evolving best practices (or not).

As a result, CPAN needs quite a bit of gardening, and I've tried a number of ways to encourage people to join in the gardening effort. After trying 24 Pull Requests in 2014, I wondered whether its approach could be applied to CPAN gardening, and the CPAN Pull Request Challenge (hereafter PRC) was born.

At the start of 2015, about 9100 CPAN distributions (out of about 30k) had a GitHub repo; as of January 2016, 10600 distributions do. To make my life easier, I only considered distributions whose repo was listed in the distribution's metadata.
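For illustration, here is a minimal Python sketch of that metadata check; it is not the code the PRC actually used. It assumes the distribution's metadata has already been parsed into a dict: in version 2 of CPAN::Meta::Spec (META.json) the repository is a hash under resources/repository with optional `url` and `web` keys, while version 1.x (META.yml) stores a bare URL string there. The function name `github_repo` is made up for this example.

```python
import json
from urllib.parse import urlparse


def github_repo(meta):
    """Return the GitHub repository URL from parsed CPAN metadata, or None."""
    repo = meta.get("resources", {}).get("repository")
    if isinstance(repo, str):        # version 1.x metadata: a bare URL
        candidates = [repo]
    elif isinstance(repo, dict):     # version 2 metadata: a hash
        candidates = [repo.get("web"), repo.get("url")]
    else:                            # no repository listed
        return None
    for url in candidates:
        if not url:
            continue
        # scp-style "git@github.com:..." addresses have no parseable host
        # here, so this sketch does not recognise them.
        host = urlparse(url).hostname or ""
        if host == "github.com" or host.endswith(".github.com"):
            return url
    return None


meta = json.loads("""
{"resources": {"repository": {
    "type": "git",
    "url": "git://github.com/example/Foo-Bar.git",
    "web": "https://github.com/example/Foo-Bar"}}}
""")
print(github_repo(meta))  # https://github.com/example/Foo-Bar
```

The `web` URL is preferred over `url` simply because it is the human-browsable page; either would identify the repo.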

How it worked in 2015

On the first day of each month I assigned distributions to participants, and emailed each participant information about their distribution. The email had links to information about the distribution and some tips on things that could be done for the PRC. The PRC web site also has ideas for PRs, and we set up a mailing list and an IRC channel where people could discuss their assignments.

To spread the load, by default I assigned only one distribution from any given CPAN author, and most distributions were assigned only once during the year.

When participants completed their PR, they sent me an email. If they were too busy in a given month, they could skip the assignment. People who had done at least one PR, or who had skipped, got an assignment the following month.
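The monthly rules above can be sketched in Python as follows. This is a simplification under stated assumptions, not the PRC's actual code: the real selection used a scoring scheme that isn't described here, so uniform random choice stands in for it, and the status values `"pr"`, `"skipped"` and `"none"`, along with the `Dist`, `eligible_participants` and `assign` names, are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Dist:
    name: str
    author: str


def eligible_participants(participants, last_month):
    """Return the participants who get an assignment this month.

    `last_month` maps a participant to "pr", "skipped" or "none"
    (hypothetical status values); anyone missing from it is treated
    as a new signup and is eligible.
    """
    return [p for p in participants
            if last_month.get(p, "new") in ("new", "pr", "skipped")]


def assign(participants, dists, already_assigned, rng=random):
    """Give each participant one randomly chosen distribution.

    Uniform random choice stands in for the real scoring. No author gets
    more than one distribution in a month, and distributions assigned
    earlier in the year (names in `already_assigned`) are skipped.
    `already_assigned` is updated in place.
    """
    pool = [d for d in dists if d.name not in already_assigned]
    rng.shuffle(pool)
    authors_this_month = set()
    assignments = {}
    for person in participants:
        for i, dist in enumerate(pool):
            if dist.author not in authors_this_month:
                assignments[person] = dist
                authors_this_month.add(dist.author)
                already_assigned.add(dist.name)
                del pool[i]   # safe: we stop iterating immediately
                break
    return assignments


dists = [Dist("Foo-Bar", "ALICE"), Dist("Foo-Baz", "ALICE"), Dist("Qux", "BOB")]
people = eligible_participants(["carol", "dave", "erin"],
                               {"carol": "pr", "dave": "none"})
assigned_this_year = set()
print(people)  # ['carol', 'erin']
print(assign(people, dists, assigned_this_year, random.Random(1)))
```

Here dave neither submitted a PR nor skipped, so he gets no new assignment, and carol and erin are guaranteed distributions from different authors.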

The Hacker News effect

I announced the challenge on my personal blog in late November, resulting in 18 signups, about the number I expected. On Christmas Eve I posted to blogs.perl.org, which resulted in about 50 more signups by 30th December. On 31st December someone submitted my second blog post to Hacker News, where for a while it was in the top spot. This prompted another 300 or so signups!

People signed up through the year as well. By the end of the year, 496 people had signed up for the challenge.

Who signed up?

Unsurprisingly, a lot of the signups were current Perl programmers and CPAN authors. But we also got people who had used Perl in the past, and people who had never programmed in Perl.

I asked participants how much Perl experience they had before signing up:

Perl experience of participants

I was surprised that the largest group was 0-4 years; I had expected a skew towards the other end of the spectrum. Even so, more than half of the participants had programmed in Perl for 10 years or longer.

68% of participants said they'd done at least one pull request prior to signing up.

Why did people sign up?

I asked people why they had signed up and gave them a long list of options; they could choose more than one. The most common answers were:

To give something back to Perl: 163 (76%)
Sounded like a fun thing to try: 149 (70%)
To get more involved with the Perl / CPAN community: 146 (68%)
To help CPAN: 144 (67%)
To improve existing knowledge of Perl: 99 (46%)
To learn more about the CPAN toolchain: 77 (36%)
To get experience doing PRs: 66 (31%)
To find out more about what's on CPAN: 43 (20%)

So there are people out there who want to give back or get involved, but don't know how. Providing some kind of framework for that can help draw people (back) into your community.

What got done?

The following chart shows how many people got an assignment each month, how many of those resulted in at least one pull request (PR), and how many didn't.

assignments and PRs

768 assignments resulted in one or more pull requests. 677 different CPAN distributions had at least one PR as a result of the challenge. 237 people have so far submitted at least one PR for the challenge in 2015 (people are still completing their December assignments).

259 people (52% of the 496 who signed up) didn't submit a pull request. By the end of the year, 348 people (70% of the 496) had dropped out, which means 89 people dropped out after doing at least one PR.

The surge in December came about because I emailed everyone who had done at least one PR before dropping out, and asked if they fancied rejoining for one last hurrah in December. Plenty did.

The following chart shows the monthly number of pull requests against CPAN distributions from September 2010 through December 2015.

monthly CPAN pull requests

The biggest spike is in January, and from there the numbers stagger down to a low in September. The surge in October was undoubtedly down to Hacktoberfest.

Why did people drop out?

The following table summarises the most common reasons given for dropping out:

Not enough time: 128 (74%)
I wasn't interested in the module(s) I was assigned: 44 (25%)
I couldn't think of things to do with the assigned module(s): 41 (24%)
I didn't get any response from emailing the author of assigned module(s): 26 (15%)
I didn't get any response to pull request(s) I submitted: 22 (13%)
The things to be done on my assigned module(s) were too hard: 18 (10%)

At the start of 2015 I was considering all CPAN distributions for assignment. After negative reactions from a handful of authors, I introduced the ability for authors to opt out. I emailed the 500 or so authors whose distributions were most likely to be assigned, explaining the PRC and letting them opt out entirely, exclude certain distributions, or give a +1 to distributions they were keen to have assigned.

Following further feedback from authors, I've made the 2016 challenge opt-in for authors: I only assign distributions from authors who've explicitly said they're happy for me to do so. I've also encouraged them to add GitHub issues for things they'd like to see done.
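One way those author preferences could be folded into selection is sketched below in Python. It is hypothetical throughout: `AuthorPrefs`, `candidate_weights` and the `boost` weight for +1'd distributions are assumptions, since this post doesn't say how a +1 affected the scoring. With `opt_in=False` it mimics the 2015 opt-out rules; with `opt_in=True`, the 2016 opt-in rules.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AuthorPrefs:
    """Hypothetical record of what one CPAN author told the PRC."""
    opted_in: bool = False                        # 2016: explicitly happy to take part
    opted_out: bool = False                       # 2015: asked to be left out entirely
    excluded: set = field(default_factory=set)    # distributions never to assign
    plus_one: set = field(default_factory=set)    # distributions the author is keen on


def candidate_weights(dists, prefs, opt_in=True, boost=2.0):
    """Map distribution name -> selection weight.

    `dists` is a list of (name, author) pairs. Ineligible distributions
    are left out; +1'd distributions get the (assumed) `boost` weight.
    """
    weights = {}
    for name, author in dists:
        p = prefs.get(author, AuthorPrefs())
        allowed = p.opted_in if opt_in else not p.opted_out
        if allowed and name not in p.excluded:
            weights[name] = boost if name in p.plus_one else 1.0
    return weights


prefs = {
    "ALICE": AuthorPrefs(opted_in=True, plus_one={"Foo-Bar"}),
    "BOB": AuthorPrefs(opted_out=True),
    "CAROL": AuthorPrefs(opted_in=True, excluded={"Big-Framework"}),
}
dists = [("Foo-Bar", "ALICE"), ("Foo-Baz", "ALICE"), ("Qux", "BOB"),
         ("Big-Framework", "CAROL"), ("Tiny", "DAVE")]

print(candidate_weights(dists, prefs, opt_in=False))
# {'Foo-Bar': 2.0, 'Foo-Baz': 1.0, 'Tiny': 1.0}
w = candidate_weights(dists, prefs, opt_in=True)
print(w)
# {'Foo-Bar': 2.0, 'Foo-Baz': 1.0}

# Weighted random pick of one distribution from the eligible ones.
print(random.choices(list(w), weights=list(w.values()), k=1)[0])
```

The difference between the two years shows up in DAVE's distribution: under opt-out an author who never replied is still eligible, while under opt-in silence means exclusion.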

Heavyweight distributions

There are a number of heavyweight distributions on CPAN, such as database drivers for DBI, OO frameworks like Moose and Moo, and web frameworks like Dancer, Mojolicious, and Catalyst. People assigned one of these were often daunted, and either did nothing or asked me for a different distribution. A few people had no problem with such an assignment and submitted a PR.

Three hackathons during the year had a link with the PRC. For these I generated a list of distributions, based on the scoring I used when selecting distributions to assign. From what I heard about those events, I think heavyweight distributions are better suited to hackathons, particularly where a group of people take on a single distribution together.

Another idea that came up was tagging distributions by difficulty and getting participants to self-classify as beginner, intermediate, or expert.

Participants said ...

The following are some of the comments made by people who took part:

"The prospect of pitching in doesn't seem as scary as it did before"

"I have actually submitted PRs to modules that were not assigned to me during the challenge because I was inspired."

"I learned a lot about the CPAN - toolchain, conventions, testing, community standards"

"I feel a bit closer to the community, and not quite as much just watching things from the outside"

"nice community, a 'heartbeat' against procrastination"

Changes for 2016

I'm running a PR Challenge in 2016, but with some changes:

As with last year, I'm sure the gameplan will evolve through the year.

Conclusion

Acknowledgements

The inspiration for the PRC was 24 Pull Requests. I'd like to thank all the authors who responded positively to PRs, particularly those from beginners. And I'd like to thank everyone who submitted pull requests: together we had an impact on CPAN, which was my goal all along.
