Last year I created PAUSE::Packages, which lets you iterate over all dists that PAUSE believes are still on CPAN. For a number of projects, including the CPAN Report 2013, I need to iterate over all releases of all dists. Yesterday I made the first release of CPAN::ReleaseHistory, which makes it easy to do that, in a similar way to PAUSE::Packages.
Up to now I've been grabbing a dump of PAUSE's database, and parsing that to get the information I need. There were a number of problems:
Talking to ANDK and DAGOLDEN about this, David pointed out that there were BackPAN indexes available, which might serve my need. This led me to MSCHWERN's BackPAN::Index. That provides more features than I need, has a lot of dependencies, and doesn't provide exactly the interface I want for a number of my projects / ideas. But it did lead me to the BackPAN index file it uses.
My new module has an interface pretty much the same as PAUSE::Packages,
but now you're iterating over all releases ever.
Here's how to find all releases of dist enum:
use CPAN::ReleaseHistory;

my $iterator = CPAN::ReleaseHistory->new()->release_iterator();
while (my $release = $iterator->next) {
    # skip anything that doesn't parse as a dist release
    next unless defined $release->distinfo->dist;
    next unless $release->distinfo->dist eq 'enum';
    printf "%s time=%d size=%d\n",
        $release->path, $release->timestamp, $release->size;
}
The distinfo method returns an instance of CPAN::DistnameInfo, from which you can get the dist name, the PAUSE id of the uploader, and lots more.
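To give a feel for what a CPAN::DistnameInfo object offers, here is a minimal sketch that builds one directly from one of the enum paths, rather than going through the iterator. It assumes CPAN::DistnameInfo is installed, and uses only its documented dist, version, cpanid and maturity accessors.

```perl
use strict;
use warnings;
use CPAN::DistnameInfo;

# Build the object by hand from a release path; inside the iterator
# loop, $release->distinfo hands you an object of the same class.
my $info = CPAN::DistnameInfo->new('N/NE/NEILB/enum-1.016_01.tar.gz');

# dist     - the dist name, without version or file extension
# version  - the version string parsed from the filename
# cpanid   - the PAUSE id of the uploader
# maturity - 'released' or 'developer'
printf "dist=%s version=%s uploader=%s maturity=%s\n",
    $info->dist, $info->version, $info->cpanid, $info->maturity;
```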
The above code generates the following output:
Z/ZE/ZENIN/enum-1.008.tar.gz time=897606348 size=4232
Z/ZE/ZENIN/enum-1.009.tar.gz time=897610837 size=4524
Z/ZE/ZENIN/enum-1.010.tar.gz time=897682129 size=4509
Z/ZE/ZENIN/enum-1.011.tar.gz time=900784109 size=5906
N/NJ/NJLEON/enum-0.02.tar.gz time=901821239 size=3396
Z/ZE/ZENIN/enum-1.013.tar.gz time=926634892 size=5627
Z/ZE/ZENIN/enum-1.014.tar.gz time=926636344 size=5666
Z/ZE/ZENIN/enum-1.015.tar.gz time=927414594 size=5714
Z/ZE/ZENIN/enum-1.016.tar.gz time=927845988 size=5847
R/RO/ROODE/enum-0.01.tar.gz time=1205434783 size=9280
N/NE/NEILB/enum-1.016_01.tar.gz time=1377640563 size=6667
N/NE/NEILB/enum-1.02.tar.gz time=1378023284 size=6827
N/NE/NEILB/enum-1.03.tar.gz time=1378145819 size=6902
N/NE/NEILB/enum-1.04.tar.gz time=1378412340 size=7003
N/NE/NEILB/enum-1.05.tar.gz time=1378423112 size=7084
N/NE/NEILB/enum-1.06.tar.gz time=1390608724 size=7230
The iterator gives you releases sorted first by dist name, and then by release time.
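That ordering can be sketched as a two-key sort in plain Perl. This is an illustration of the rule, not the module's actual code: the hash layout is my own, the Foo-Bar entry is made up so there are two dist names to order, and comparing dist names case-insensitively is an assumption.

```perl
use strict;
use warnings;

# Three entries copied from the enum output, deliberately out of order,
# plus one hypothetical dist (Foo-Bar) that is not a real release.
my @releases = (
    { dist => 'enum',    path => 'N/NE/NEILB/enum-1.02.tar.gz',  time => 1378023284 },
    { dist => 'Foo-Bar', path => 'X/XX/XXX/Foo-Bar-0.01.tar.gz', time => 1000000000 },
    { dist => 'enum',    path => 'Z/ZE/ZENIN/enum-1.008.tar.gz', time => 897606348  },
    { dist => 'enum',    path => 'N/NJ/NJLEON/enum-0.02.tar.gz', time => 901821239  },
);

# First key: dist name (case-insensitive here, which is an assumption).
# Second key: release timestamp, oldest first.
my @sorted = sort {
    lc($a->{dist}) cmp lc($b->{dist})
        || $a->{time} <=> $b->{time}
} @releases;

print "$_->{path}\n" for @sorted;
```

Note that within a dist the version number plays no part: NJLEON's enum-0.02 lands after ZENIN's enum-1.008 because it was uploaded later.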
Not everything on CPAN is a tarball, particularly old things. That's why I included the line:
next unless defined $release->distinfo->dist;
Here are some examples of things released to CPAN that this guard line filters out:
A/AN/ANKITAS/AWS-SQS-Simple
S/SR/SREZIC/patches/Net-ZooKeeper-0.35-RT91216.patch
M/MA/MAHATMA/phttpd-0.01.45.pl
I should add an option to the iterator that controls whether you even get to see those things, since most of the time I skip them anyway.
The module is currently very simple in how it works: it grabs the index, loads all the relevant entries into memory, sorts them according to the above rules, then writes them to a local file. This obviously takes up quite a bit of memory, so don't use this module on your smartwatch.
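The load, sort and write steps can be sketched roughly as follows. This is not the module's real code. The "path timestamp size" line layout is an assumption about the index, the three sample lines stand in for the downloaded file, and the regex that guesses the dist name is a crude substitute for CPAN::DistnameInfo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the downloaded index (assumed layout: path, timestamp, size).
my $index = <<'END';
N/NE/NEILB/enum-1.06.tar.gz 1390608724 7230
Z/ZE/ZENIN/enum-1.008.tar.gz 897606348 4232
R/RO/ROODE/enum-0.01.tar.gz 1205434783 9280
END

# 1. Load every entry into memory.
my @entries;
open(my $in, '<', \$index) or die "can't read index: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($path, $time, $size) = split ' ', $line;

    # Crude dist-name guess: drop the directories, then the -VERSION.EXT tail.
    (my $dist = $path) =~ s{^.*/}{};
    $dist =~ s{-v?\d[^-]*$}{};

    push @entries, { dist => $dist, path => $path, time => $time, size => $size };
}
close $in;

# 2. Sort by dist name, then by release time.
my @sorted = sort {
    $a->{dist} cmp $b->{dist} || $a->{time} <=> $b->{time}
} @entries;

# 3. Write the sorted entries to a local file (a temp file in this sketch).
my ($out, $cache_file) = tempfile(UNLINK => 1);
print {$out} join(' ', @{$_}{qw(path time size)}), "\n" for @sorted;
close $out or die "can't write $cache_file: $!";
```

Holding every entry in @entries before sorting is what costs the memory; once the sorted file exists, it can be read back one line at a time.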