CPAN Spelunking

PAUSE CPAN Sat 8 August 2020

My current PAUSE tidy-up project is to resolve inconsistent first-come permissions on indexed distributions. In working on this I've created several scripts, and updated some modules. In this post I'll go through the most recent things I've done.

I've got a script that looks for distributions with multiple people holding first-come permissions. I'm not going to bother sharing this script, because once the project is done, it will be irrelevant. The output looks like this:

ppt
  PPT::Util     BDFOY  | CWEST
  SymbolicMode  CWEST  |
  Bundle::PPT   SDAGUE |
  ----
  S/SD/SDAGUE/ppt-0.12.tar.gz
  C/CW/CWEST/ppt-0.14.tar.gz

This tells me that two different releases are indexed, with three packages indexed against them, and that three different people hold the first-come permission on those packages. The column after the bar lists co-maints: only the first package has one, CWEST.
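The check itself is simple once the data is to hand. Here is a minimal sketch in Python, not the script I actually use: it assumes the permissions have already been reduced to (distribution, package, first-come holder) records, and the `RECORDS` list and function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical input: (distribution, package, first-come holder) records.
# The ppt rows use the values shown above; the Some-Dist rows are made up.
RECORDS = [
    ("ppt", "PPT::Util", "BDFOY"),
    ("ppt", "SymbolicMode", "CWEST"),
    ("ppt", "Bundle::PPT", "SDAGUE"),
    ("Some-Dist", "Some::Dist", "ALICE"),
    ("Some-Dist", "Some::Dist::Util", "ALICE"),
]


def inconsistent_first_come(records):
    """Map each distribution with more than one first-come holder to its holders."""
    owners = defaultdict(set)
    for dist, _package, owner in records:
        owners[dist].add(owner)
    return {dist: sorted(ids) for dist, ids in owners.items() if len(ids) > 1}


if __name__ == "__main__":
    for dist, ids in inconsistent_first_come(RECORDS).items():
        print(dist, " ".join(ids))
```

A distribution where every package has the same first-come holder drops out of the result, which is the state I'm trying to get every distribution into.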

Next I generally want to look at the release history of the distribution. For that I created a script cpan-release-history, which shows all releases for a distribution:

% cpan-release-history ppt
  0.14   2004-08-05   CWEST
  0.13   2004-07-23   CWEST
  0.12   2001-06-12   SDAGUE
  0.10   2001-06-08   SDAGUE

I've used a rough version of this script for years, but I've now tidied it up and included it with the CPAN::ReleaseHistory module, which lets you iterate over the history of all releases to CPAN.
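The logic behind that listing is just a filter and a sort. The following is a rough Python sketch of the idea rather than the code in cpan-release-history: it assumes the release history is already available as in-memory records, and the `RELEASES` list and `release_history` function are hypothetical names.

```python
from datetime import date

# Hypothetical records: (distribution, version, release date, PAUSE id).
# The ppt rows are copied from the output above; the Other-Dist row is made up.
RELEASES = [
    ("ppt", "0.10", date(2001, 6, 8), "SDAGUE"),
    ("ppt", "0.13", date(2004, 7, 23), "CWEST"),
    ("Other-Dist", "1.00", date(2010, 1, 1), "ALICE"),
    ("ppt", "0.12", date(2001, 6, 12), "SDAGUE"),
    ("ppt", "0.14", date(2004, 8, 5), "CWEST"),
]


def release_history(releases, dist):
    """Return one formatted line per release of dist, newest first."""
    rows = [r for r in releases if r[0] == dist]
    rows.sort(key=lambda r: r[2], reverse=True)  # sort by release date
    return [
        f"  {version}   {released.isoformat()}   {author}"
        for _dist, version, released, author in rows
    ]


if __name__ == "__main__":
    print("\n".join(release_history(RELEASES, "ppt")))
```

Sorting by date rather than by version string sidesteps the usual problems with comparing version numbers.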

In this case, I looked at ppt on MetaCPAN, and discovered it was the Perl Power Tools, which were adopted by brian d foy and are now released as the PerlPowerTools distribution.

Often just looking at the release history explains the pattern of first-come permissions: the original releaser has first-come on the lead module, and perhaps one or two others. Then someone else took over and, in refactoring the distribution, added some more packages and got first-come on those. Nowadays PAUSE would instead maintain the same pattern of permissions as on the lead module.

Other common cases are where two distributions have been merged, or where a module has been split out of a larger distribution into one of its own.

As an aside, when a distribution is renamed, there's no information held anywhere that lets spelunkers determine that ppt became PerlPowerTools (other than maybe in the Changes file).

It would be great to have additional tags in metadata, like:

x_rename_of => 'ppt'
x_split_off_from => 'Huge-Blob-Of-Stuff'
x_merged_in => 'Absorbed-Distribution'

But how useful would they be to anyone else?
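For spelunking, at least, a tag like x_rename_of would let a tool walk back through a distribution's earlier names. Here is a small Python sketch of that, on the assumption that the tag existed and that the metadata had been loaded into a dictionary keyed by distribution name; both `METADATA` and `rename_chain` are hypothetical.

```python
# Hypothetical metadata keyed by distribution name. x_rename_of is the tag
# proposed above; the PerlPowerTools entry is what it might look like.
METADATA = {
    "PerlPowerTools": {"x_rename_of": "ppt"},
    "ppt": {},
}


def rename_chain(metadata, dist):
    """Follow x_rename_of links back from dist; return names, newest first."""
    chain = [dist]
    seen = {dist}
    while True:
        previous = metadata.get(chain[-1], {}).get("x_rename_of")
        # Stop when there is no earlier name, or when a link would loop back.
        if previous is None or previous in seen:
            return chain
        chain.append(previous)
        seen.add(previous)


if __name__ == "__main__":
    print(" <- ".join(rename_chain(METADATA, "PerlPowerTools")))
```

The `seen` set is only there so that a mistaken circular rename can't send the walk into an endless loop.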

The next thing I generally do is email the involved authors, explain what I'm doing, and what I think has led us to this point with the specific distribution. At first I addressed them by their PAUSE ids ("Hi SDAGUE!"), but that felt a bit impersonal.

So now I run the cpan-whois script, which displays information for a PAUSE user. You can look this up via MetaCPAN, but I'm doing it a lot, so I wanted a quicker way:

% cpan-whois BDFOY CWEST SDAGUE
   CWEST:   Casey West    <casey@geeknest.com>
  SDAGUE:   Sean Dague    <sean@NdOaSgPuAeM.net>
   BDFOY:   brian d foy   <bdfoy@cpan.org>

This script is now included in the PAUSE-Users distribution; the PAUSE::Users module provides an interface to information about all PAUSE user accounts.

I've written quite a few tools for digging into PAUSE and CPAN data over the years, and often want tabular output. I've just released version 1.00 of Text::Table::Tiny, which adds more formatting options, including the norule style, which I used in these tools.

For the example here, I emailed Casey, Sean, and brian, explaining what I was doing, and suggesting that we delete old releases of ppt and transfer all first-come permissions. Everyone was happy with that, so it's no longer turning up in my hit list.
