River of CPAN discussion at The QA Hackathon

CPAN River QAH Sun 1 May 2016

At the QAH this year we had another discussion about the River of CPAN: what's been done since last year, and what we should do to keep things moving forward. These are the notes from that discussion, and some of the things that happened after the discussion.

Last year at the QAH we:

The driver behind this was trying to improve the overall reliability of CPAN, and in particular the distributions on CPAN that lots of people rely on.

Since last year's QAH the river has been talked about in a number of blog posts, and people have started using the terminology. We've moved things forward a bit, but not as much as I had hoped.

What are the problems?

We talked about what things we should do first, to help improve the situation.

Have MetaCPAN display the river position of all dists

In the meeting we agreed that a great first step would be for MetaCPAN to display the river position of all dists. Rather than display the absolute position, I've been using "logarithmic buckets", which was initially suggested by David Golden last year. There are 6 buckets:
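As a rough illustration of the bucketing idea, here is a minimal Python sketch. It assumes the six buckets sit on power-of-ten boundaries (0, 1-9, 10-99, 100-999, 1000-9999, 10000+), which fits the "1st, 10th, 100th" notification points mentioned later in these notes. The function name `river_bucket` is made up for this example and is not taken from the actual scripts.

```python
def river_bucket(dependents: int) -> int:
    """Map a count of dependent distributions to a bucket from 0 to 5.

    Assumed boundaries (not confirmed by the notes):
    0 -> 0, 1-9 -> 1, 10-99 -> 2, 100-999 -> 3, 1000-9999 -> 4, 10000+ -> 5.
    """
    if dependents < 0:
        raise ValueError("dependents must be non-negative")
    if dependents == 0:
        return 0
    # The number of decimal digits gives the bucket, capped at the top bucket.
    return min(len(str(dependents)), 5)


if __name__ == "__main__":
    for count in (0, 1, 9, 10, 250, 12345):
        print(count, river_bucket(count))
```

Counting decimal digits avoids floating-point edge cases that `math.log10` can hit right at the bucket boundaries.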

I'm already generating this data on a weekly basis (I use it for the Pull Request Challenge, the adoption list, and other hacks). I said I'll make it available as JSON, and will work on pulling this code out of my other mess of scripts, so we can have a clean stand-alone service.

I had a chat with Olaf, the leader of the MetaCPAN project, and we agreed the structure of the JSON, which is described on this ticket. Joel Berger already worked on the changes to import the data!

We talked about how this might be shown in MetaCPAN, and a ticket was raised for that. Barbara Veloso was fortunately at the QAH (she's GARU's wife), and she came up with some better suggestions for how it could look.


The current river data just tells you when other CPAN distributions are using your distribution. But many CPAN modules are used "off CPAN", aka the DarkPAN. While serving a slightly different need, this sort of data would also be helpful. BOOK and DOLMEN decided to start looking into Linux distros, and which modules have packages for those.

Let people know when their dist moves upriver

Rather than let authors know every time they gain or lose a dependent, we agreed that we should tell them when their dist moves between buckets (as defined above). This means you'd be notified the first time a CPAN distribution starts using a dist, then on the 10th, the 100th, and so on. How should we notify authors? Suggestions included:

There was general agreement that we shouldn't open a ticket, as it would generate too much noise: lots of unclosed tickets, which don't really need action. Tickets are to prompt action, whereas we want to inform. Email can bounce, and end up in spam folders. The problem with RSS is that someone has to know to subscribe to it first.

We'll start off with an email, sent to the person who last released the distribution. We could email everyone who's ever done a release, or everyone who has perms, but both of those approaches would end up notifying a lot of people who probably don't care (any more). We assume that the person who most recently released it is the best bet.
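The notification rule could be sketched roughly as follows: compare two weekly snapshots, keep only the dists whose bucket changed, and address each message to the most recent releaser. This is only a sketch under assumptions. The power-of-ten bucket boundaries, the function names, the dist names and the PAUSE ids are all hypothetical, and actually sending the email is left out.

```python
def river_bucket(dependents: int) -> int:
    # Assumed power-of-ten buckets: 0, 1-9, 10-99, 100-999, 1000-9999, 10000+
    return 0 if dependents <= 0 else min(len(str(dependents)), 5)


def bucket_moves(previous: dict, current: dict) -> list:
    """Return (dist, old_bucket, new_bucket) for each dist that changed bucket."""
    moves = []
    for dist in sorted(current):
        old = river_bucket(previous.get(dist, 0))  # unseen dists count as 0
        new = river_bucket(current[dist])
        if old != new:
            moves.append((dist, old, new))
    return moves


def build_notifications(previous: dict, current: dict, last_releaser: dict) -> list:
    """Pair each bucket move with the PAUSE id of whoever released the dist last."""
    notes = []
    for dist, old, new in bucket_moves(previous, current):
        author = last_releaser.get(dist)
        if author is None:
            continue  # no known releaser, so nobody to notify
        direction = "up" if new > old else "down"
        notes.append((author, f"{dist} moved {direction}river: bucket {old} -> {new}"))
    return notes


# Hypothetical weekly snapshots: dist name -> number of dependent dists.
last_week = {"Foo-Bar": 9, "Baz-Qux": 42, "New-Dist": 0}
this_week = {"Foo-Bar": 10, "Baz-Qux": 57, "New-Dist": 1}
releasers = {"Foo-Bar": "ALICE", "Baz-Qux": "BOB", "New-Dist": "CAROL"}

for author, message in build_notifications(last_week, this_week, releasers):
    print(author, message)
```

Here Baz-Qux gains 15 dependents but stays in the same bucket, so its author hears nothing, which is the noise reduction the bucket rule is meant to give.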

I'll implement a first version of this as part of the service that generates the river data, as it's the obvious place to put it. I'll raise an issue on PAUSE to see whether ANDK & Co are open to the idea of having a flag on PAUSE users for "email me useful things about my dists".

This email should be brief and to the point, with pointers to documentation.

Tools to help authors

Last year we talked about the practices we'd like to encourage CPAN authors to adopt as their dists move up river, and that we should have tools to support them.

We all hoped that Tux and Chad would share experiences and code. Talk to Tux if you want to help with his tools.

Improving the water quality

One of the best ways to improve the average quality of CPAN is to target issues with distributions at the head of the river (i.e. those depended on by many other CPAN distributions). A CPAN Testers fail there can mean that many distributions might not install on certain operating systems or versions of Perl.

The trouble is that these can be gnarly issues, and scary dists to work on, precisely because lots of people rely on them. They might need a lot of time, and generally aren't "fun". So how can we motivate people to work on them?

Someone suggested TPF grants, but Rik pointed out that TPF have explicitly said they don't want to encourage bounties in this way. Maybe sponsors?

They could be subjects for mini hackathons: a number of hackathons in 2015 had groups focussing on specific modules.
