A spotter's guide to CPAN

CPAN, distributions, glossary | Sun 18 October 2015

This is the start of a catalogue of the different files and directories you might come across in CPAN distributions: what they're for and how they're used. During the CPAN Pull Request Challenge (PRC) I've had emails from a few people who didn't know what to do with the distribution they'd been assigned, which prompted this.

At the time I started writing this, there were 33195 distribution tarballs in author directories on CPAN, as determined with a minimal CPAN mirror (thanks to CPAN::Mini).

I've been working on this on and off for a couple of weeks, regularly going down rabbit holes learning a bit more about the toolchain. Rather than wait until it's complete, I figure I should post it now and get feedback on this much. What have I got wrong or missed out?

Makefile.PL or Build.PL?

Just about every CPAN release has one or both of Makefile.PL and Build.PL. To configure, build, test and install a distribution you use one of the following:

perl Makefile.PL    perl Build.PL
make                ./Build
make test           ./Build test
make install        ./Build install

The Makefile.PL will either be based on ExtUtils::MakeMaker or Module::Install, and Build.PL will usually be based on Module::Build, but may also use Module::Build::Tiny if it's available, falling back on Module::Build if it isn't.

In the early days of Perl 5, the only option was a Makefile.PL based on ExtUtils::MakeMaker. Then other builders came along, and for a while Build.PL with Module::Build was more common for new distributions.

There are 3979 distributions (12%) with both Makefile.PL and Build.PL, but having both can cause problems. 78% of dists have just a Makefile.PL, and 9% have just a Build.PL. Only 7% of 'dists' have neither: some of those will be old dists, but many will just be tarballs in author directories that aren't CPAN distributions.

MANIFEST

The MANIFEST is a list of all the files that should be released with the distribution. Each is given as the path relative to the top directory.

For example, the App-PAUSE-Comaint distribution contains the script comaint, which is in the script/ directory, so in the MANIFEST file you'll see the line:

script/comaint

A classic error that I've made myself: forgetting to update the MANIFEST after adding a module to an existing distribution. Then when you run make dist (if using Makefile.PL), the tarball doesn't contain the new module.

If your module has a Makefile.PL, you can (re)generate a MANIFEST file by running:

make manifest

This uses ExtUtils::Manifest to do the work.

For Build.PL you need to run:

./Build manifest
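
To catch the forgotten-module mistake before you release, you can compare the MANIFEST against what's actually on disk. Here's a minimal sketch using ExtUtils::Manifest directly. The throwaway distribution it builds in a temporary directory, and the Foo::Bar and Foo::Baz module names, are made up for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use ExtUtils::Manifest qw(manicheck filecheck);

# Build a throwaway distribution (hypothetical modules Foo::Bar and Foo::Baz).
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to $dir: $!";

make_path('lib/Foo');
write_file('lib/Foo/Bar.pm', "package Foo::Bar;\n1;\n");
write_file('lib/Foo/Baz.pm', "package Foo::Baz;\n1;\n");   # the newly added module
write_file('MANIFEST',       "MANIFEST\nlib/Foo/Bar.pm\n"); # Baz.pm was forgotten

$ExtUtils::Manifest::Quiet = 1;   # we print our own report below

my @missing  = manicheck();   # listed in MANIFEST, but not on disk
my @unlisted = filecheck();   # on disk, but not listed in MANIFEST

print "Listed but missing: @missing\n";
print "Not in MANIFEST:    @unlisted\n";

chdir $start or die "Can't chdir back to $start: $!";

# Small helper: create a file with the given content.
sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} $content;
    close $fh or die "Can't close $path: $!";
}
```

In a real distribution you'd skip the setup and just call manicheck() and filecheck() from the top directory.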

MANIFEST.SKIP

You can think of this as an 'anti MANIFEST' — it lists files that should not be included in releases. In addition to listing filenames explicitly, you can also use regular expressions to skip all files that match certain patterns:

\.swp$
^MYMETA\.yml$

You can add a special directive:

#!include_default

If this is seen (when you run make manifest), then you get a sensible set of default rules for files to skip.
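
To get a feel for how the patterns behave, here's a rough sketch that applies the two regular expressions above to a handful of file names. The file names are invented, and this only approximates what ExtUtils::Manifest does internally.

```perl
use strict;
use warnings;

# Patterns as they would appear in MANIFEST.SKIP (the two shown above).
my @skip_patterns = ('\.swp$', '^MYMETA\.yml$');

# Hypothetical file names, for illustration only.
my @files = ('lib/Foo/Bar.pm', 'lib/Foo/.Bar.pm.swp', 'MYMETA.yml', 'META.yml');

my @skipped = grep { is_skipped($_) } @files;
print "Skipped: @skipped\n";

# True if the path matches any of the skip patterns.
sub is_skipped {
    my ($path) = @_;
    return scalar grep { $path =~ /$_/ } @skip_patterns;
}
```

Note that META.yml is not skipped: the second pattern is anchored at both ends, so it only matches MYMETA.yml in the top directory.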

lib/

Any modules in the distribution should be within the lib/ directory. For example, in the Module-Path distribution, the Module::Path module itself lives in lib/Module/Path.pm.

In older distributions (14% of the total), you'll sometimes find the .pm file in the top directory of the distribution. For example, in the Stat-lsMode distribution (which contains Stat::lsMode), you'll find the file lsMode.pm in the top directory.
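
The mapping from module name to file path is mechanical. As a small sketch (module_to_path is a name I've made up for this example):

```perl
use strict;
use warnings;

# Convert a module name into its conventional path under lib/.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;   # Module::Path -> Module/Path
    return "lib/$path.pm";
}

print module_to_path('Module::Path'), "\n";
```

So under the modern convention Stat::lsMode would live in lib/Stat/lsMode.pm rather than in the top directory.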

META.json and META.yml

The distribution metadata provides information about the dist: name, author, version, dependencies, and a whole bunch of other things besides.

Exactly what information can be included is defined by the spec, CPAN::Meta::Spec. There are two main versions of the spec you need to be aware of: version 2 is the most recent and before that was version 1.4.

The first format used for shipping the metadata with the distribution was YAML, so it's not surprising that 84% of dists include a META.yml. The second supported format is JSON, which was introduced with the meta spec v2, in April 2010. Because it's more recent, only 36% of distributions include a META.json file in their most recent release.

The downside with META.yml is that it only supports the meta spec version 1.4. There are a number of issues with 1.4, but the key one is that you can't differentiate between the different types of dependencies. For example, you can't say that Test::SomeTestModule is only needed to run tests; you just have to list it like any other dependency. If someone is trying to install your release without running tests, they might run:

cpanm --notest

But if your release doesn't have a META.json file, then they'll still end up trying to install Test::SomeTestModule. Version 1.4 doesn't support suggested dependencies either.
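
To make the difference concrete, here's a sketch of how version 2 metadata groups prerequisites by phase. The META.json fragment is cut down and made up (a real one has more fields), and I'm reading it with JSON::PP rather than CPAN::Meta just to keep the example short.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# A cut-down, hypothetical META.json; real files contain more fields.
my $meta_json = <<'JSON';
{
   "name" : "Foo-Bar",
   "version" : "0.02",
   "meta-spec" : { "version" : 2 },
   "prereqs" : {
      "runtime" : { "requires" : { "perl" : "5.006" } },
      "test"    : { "requires" : { "Test::SomeTestModule" : "0" } }
   }
}
JSON

my $meta = decode_json($meta_json);

# List the required modules for each phase.
for my $phase (sort keys %{ $meta->{prereqs} }) {
    my $requires = $meta->{prereqs}{$phase}{requires} || {};
    print "$phase: ", join(', ', sort keys %$requires), "\n";
}
```

Because Test::SomeTestModule only appears under the test phase, a client that isn't running tests can tell it doesn't need it.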

Releases can include both a META.yml and a META.json, and that's what's recommended.

You can read more about the history of the meta spec in CPAN::Meta::History.

MYMETA.yml and MYMETA.json

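These files are similar to the META.* files described above, with a subtle difference: the META.* files are generated by the author when the release is built and are shipped in the tarball, whereas the MYMETA.* files are generated at configure time, when Makefile.PL or Build.PL is run on the machine where the distribution is being installed.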

The main scenario (are there any others?) where these are needed is a distribution that dynamically determines its testing and runtime prerequisites. Your Makefile.PL might check what operating system and version of Perl it is being installed on, and decide which modules are required, and which versions of them. It may be that your dist is fine with Foo::Bar 1.01+ on most versions of Perl, but then Perl 5.22 required a change, so if your distribution is being installed under 5.22, the minimum version of Foo::Bar is 1.37.

If you've got a lot of dependencies, and are in turn depended on by a lot of CPAN distributions, then this sort of management of your dependencies can make life easier for people downstream of you. If a distribution does dynamic configuration, it should set the dynamic_config field in the metadata to 1.
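
Here's a sketch of the kind of logic a Makefile.PL might contain for the Foo::Bar example. The function name is made up, and I'm assuming the higher minimum applies to 5.22 and anything later, which the example above doesn't actually say.

```perl
use strict;
use warnings;

# Minimum Foo::Bar version for a given perl, using the numbers from the
# example above. Assumption: perls newer than 5.22 also need 1.37.
sub foo_bar_minimum {
    my ($perl_version) = @_;   # numeric form, as found in $]
    return $perl_version >= 5.022 ? '1.37' : '1.01';
}

my %prereqs = ('Foo::Bar' => foo_bar_minimum($]));
print "Foo::Bar $prereqs{'Foo::Bar'}\n";

# In a real Makefile.PL, %prereqs would be passed to WriteMakefile()
# as PREREQ_PM, and the result would end up in MYMETA.json.
```

The point is that the answer depends on the machine doing the install, so it can't be baked into the META.json that ships with the release.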

The whole MYMETA idea was cooked up at the first QA Hackathon in Oslo, in 2008. It was implemented in Module::Build in 2009, and in ExtUtils::MakeMaker in 2010.

Because these are generated at configure time, you shouldn't ship them as part of a release, but it's easy to see how that might happen. There are 404 distributions (1.2%) with a MYMETA.json and 362 (1.1%) with a MYMETA.yml.

README

Traditionally the README file for any package would tell you what it was, who wrote it, and how to install it.

Many distributions have a README that is just an ASCII text rendering of the main module's documentation.

Personally I like a shorter README that is closer to the traditional contents. For a distribution that contains a single module:

You'll see some releases with a README.md file (roughly 6%); these are typically ones that have a github repo, as github will render the README.md as part of the project's home page. The .md extension signifies markdown, a simple text markup similar to pod but more widely used, particularly in blogging apps.

Changes

The Changes file lists the (main) changes made in each (recent) release of the distribution. It is useful for people who are deciding whether to upgrade. MetaCPAN displays the details of the most recent release as part of a distribution's page.

The file can have a header, typically one line identifying the module or distribution, and then a separate section for each release.

There's no official format for this, but the most widely used format is documented in CPAN::Changes::Spec. Here's an example:

Revision history for Perl module Foo::Bar

0.02 2015-08-09
   * Fixed bug where blah blah RT#12345
   * Added a SEE ALSO section to doc

0.01 2014-03-02
   * First release to CPAN

Each release section has a header line that starts with the version of the release, immediately followed by the release date in ISO 8601 format. I always put the date in UTC, since that's the timezone of PAUSE (ie the date and time of upload that's recorded against the distribution is in UTC). You can put whatever you like after that. I generally put the PAUSE id of the person who did the release.

The simplest content for each release section is a markdown-style bulleted list. Read the CPAN::Changes::Spec documentation for more details on formats.

By convention you should list releases from most recent to oldest, as the most recent one is what people probably want to look at, when deciding whether they should upgrade.
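
Sticking to this format means tools can read your Changes file. As a rough illustration, here's a sketch that picks out the release headers and bullets from the example above with a couple of regular expressions. It's nowhere near a full implementation of the spec.

```perl
use strict;
use warnings;

my $changes = <<'CHANGES';
Revision history for Perl module Foo::Bar

0.02 2015-08-09
   * Fixed bug where blah blah RT#12345
   * Added a SEE ALSO section to doc

0.01 2014-03-02
   * First release to CPAN
CHANGES

my @releases;
for my $line (split /\n/, $changes) {
    # A release header: version, whitespace, then a YYYY-MM-DD date.
    if ($line =~ /^(\S+)\s+(\d{4}-\d{2}-\d{2})/) {
        push @releases, { version => $1, date => $2, changes => [] };
    }
    # An indented bullet belongs to the most recent release header seen.
    elsif (@releases && $line =~ /^\s+\*\s+(.*)/) {
        push @{ $releases[-1]{changes} }, $1;
    }
}

printf "%s (%s): %d change(s)\n",
    $_->{version}, $_->{date}, scalar @{ $_->{changes} }
    for @releases;
```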

You'll also see this file called ChangeLog (3%), or CHANGES (4%), and similar. 82% of distributions have a Changes file, so I'd suggest you follow that convention.

t/

This directory contains the tests for the distribution. Each testsuite is a file with the extension ".t". All you need to do is create files with the right extension, and your installer (eg EUMM or MB) will find them and run them for you when someone runs make test or ./Build test.
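
For anyone who hasn't written one, here's a minimal sketch of what a file like t/basic.t might contain, using Test::More. The add() function is a stand-in I've made up; in a real test you'd load your module and exercise its functions instead.

```perl
use strict;
use warnings;
use Test::More;

# Stand-in for a function your module would provide; purely illustrative.
sub add {
    my ($x, $y) = @_;
    return $x + $y;
}

is(add(2, 3),  5, 'add() sums two positive numbers');
is(add(-1, 1), 0, 'add() handles a negative number');

done_testing();
```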

t/lib/

If you've written some modules for use in the distribution's testsuite, then put them in this directory: they won't be installed along with your modules in lib/, and PAUSE won't include them in the CPAN Index.

xt/

This directory contains extra (extended?) tests that should not be run as part of the regular tests when installing the distribution. These may be one of the following:

The Lancaster Consensus (notes from discussions held at the 2013 QA Hackathon, in Lancaster, UK) defined a number of environment variables, which tests can use to determine in what context they're being run: during release to CPAN vs being tested by a CPAN smoke tester, for example. Using those variables, you can just have all types of test in your t/ directory, but it's cleaner to put all non-standard tests in the xt/ directory.
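
As a sketch of how a test might consult those variables: the variable names below are the ones from the Lancaster Consensus, but active_test_contexts is just a helper name I've invented for this example.

```perl
use strict;
use warnings;

# Environment variables defined by the Lancaster Consensus.
my @CONTEXT_VARS = qw(
    AUTHOR_TESTING
    RELEASE_TESTING
    AUTOMATED_TESTING
    NONINTERACTIVE_TESTING
    EXTENDED_TESTING
);

# Return the names of the context variables set to a true value.
sub active_test_contexts {
    my ($env) = @_;
    return grep { $env->{$_} } @CONTEXT_VARS;
}

my @active = active_test_contexts(\%ENV);
print @active
    ? "Active contexts: @active\n"
    : "Plain install-time test run\n";
```

In practice a release-only test usually just starts with a Test::More skip_all plan unless $ENV{RELEASE_TESTING} is set, rather than using a helper like this.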

Depending on how they're organised / written, you may be able to run these tests with:

prove -lr xt

As with the t/ directory, any modules you put in your xt/ directory will be ignored by PAUSE.

bin/ or script/

If your distribution contains some command-line tools, which should be installed in 'the usual directory for binaries', then put them in a directory called bin or script.

The name 'bin' is used in Unix-like operating systems for executables, but some people like to distinguish between compiled executables and executable scripts; I'm guessing that's why we have two different names to choose from.

examples/ or eg/

If you want to provide examples that show how to use the module(s) in your distribution, then put the scripts or modules in a directory called examples.

Unlike scripts in a bin or script directory, things in the examples directory won't automatically be installed.

An alternate name is eg/. There are 3386 dists with examples/ (10%) and 1009 with eg/ (3%).

LICENSE

This contains the text of the license(s) under which you are making the distribution available.

If you're using something like Dist::Zilla, then this will be generated for you.

You can use the software-license script (which comes with the App::Software::License module, where you'll find the documentation) to generate a LICENSE file, if you're not using a builder that can do it for you.

share/

If your distribution has some data files that should be installed along with your module (eg templates), then the convention is that you put them in a share/ directory of your distribution.

The installer you're using will provide some mechanism for installing these files into the right place for the local operating system.

The module itself can then use File::ShareDir to find the data files wherever they were installed.

dist.ini

This is the configuration file used by Dist::Zilla, a build, configure and release tool. If you see this file, then you know the author of the module uses Dist::Zilla.

You don't have to have Dist::Zilla installed in order to install such a distribution: when the tarball is built, it will have a Makefile.PL or Build.PL added to it.

You could argue that dist.ini should not be included in the release, since it's for the author, and with most releases you couldn't run Dist::Zilla on them anyway: the files in the release are often generated from the source, with documentation, the $VERSION line and so on added along the way.

On balance though, it's a good idea to include the dist.ini in your releases: it lets people know you're using Dist::Zilla.

Because the released files are generated, most Dist::Zilla distributions link to a github repo: you need the repo version if you want to hack on the distribution beyond simple patching.

minil.toml

This is the configuration file used by TOKUHIROM's Minilla authoring tool, analogous to the dist.ini file used by Dist::Zilla.

I'm not at all familiar with Minilla yet, so can't say much more than this.

If you're wondering what a toml file is, it's the TOML format, yet another file format you can use for config files, created by Tom Preston-Werner, one of the co-founders of GitHub.

489 distributions (1.5%) have a minil.toml file. I think it's used by a lot of Japanese CPAN authors.

inc/

This directory appears in distributions that have a Makefile.PL which uses Module::Install. The inc/ directory contains modules that are used in the configure or build phase of your distribution, but that shouldn't be installed along with your modules. Typically this is a copy of Module::Install, and any plugins you're using.

For example Module::Setup's Makefile.PL has a first line:

use inc::Module::Install;

And if you look in its inc/ directory you'll see a number of modules.

This means that people can install your module without having to install Module::Install, its plugins, and other modules that aren't needed by your module(s) at run-time.

You shouldn't have the inc/ directory in source control (eg git), as it is generated for you. How does it know which modules to put in inc? Does it use metadata?

cpanfile

A cpanfile describes the dependencies of your distribution, similar to the way they're described in a Makefile.PL, but with richer ways to express the version dependencies.
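
For a flavour of the syntax, here's a small cpanfile sketch. The module names and version numbers are made up. A cpanfile is read by tools such as cpanm and carton rather than run directly, so treat this as a config fragment.

```perl
# cpanfile (module names and versions are hypothetical)
requires 'Foo::Bar', '>= 1.01, < 2.00';   # a version range
recommends 'Foo::Bar::XS', '0.10';        # nice to have, not essential

# Only needed when running the tests
on 'test' => sub {
    requires 'Test::More', '0.98';
};

# Only needed by people hacking on the code
on 'develop' => sub {
    requires 'Test::Pod', '1.41';
};
```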

If you have a large app (eg a web app), which is never going to be released to CPAN, so not bundled as a dist, then a cpanfile is a way to keep track of your dependencies, and install them on a new machine using carton or cpanm.

The cpanm CPAN client can install dists directly from github, but typically a repo won't have a META.yml or META.json file; if it sees a cpanfile, then cpanm will use that to identify the dependencies that might need installing first.

Read more in Miyagawa's blog post about cpanfile.

CONTRIBUTING

If a distribution has a CONTRIBUTING file, it will usually describe how to go about contributing to the distribution. Larger complex projects will often have one of these, as will authors who maintain a lot of distributions and have a well-oiled process to help them.

Well-formed release

A well-formed release contains the following:

and doesn't contain the following:

Acknowledgements

Thanks to everyone on IRC who answered my questions related to this, particularly ETHER, RJBS, HAARG, and KENTNL. The #toolchain channel on irc.perl.org is a good place to start if you have questions about the CPAN toolchain.
