This is the start of a catalogue of the different files and directories you might come across in CPAN distributions: what they're for and how they're used. During the PRC I've had emails from a few people who didn't know what to do with the distribution they'd been assigned, which prompted this.
At the time I started writing this, there were 33195 distribution tarballs in author directories on CPAN, as determined with a minimal CPAN mirror (thanks to CPAN::Mini).
I've been working on this on and off for a couple of weeks, regularly going down rabbit holes learning a bit more about the toolchain. Rather than wait until it's complete, I figure I should post it now and get feedback on this much. What have I got wrong or missed out?
Just about every CPAN release has one or both of Makefile.PL and Build.PL. To configure, build, test and install a distribution you use one of the following:
perl Makefile.PL    perl Build.PL
make                ./Build
make test           ./Build test
make install        ./Build install
The Makefile.PL will either be based on ExtUtils::MakeMaker or Module::Install, and Build.PL will usually be based on Module::Build, but may also use Module::Build::Tiny if it's available, falling back on Module::Build if it isn't.
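To make that concrete, here's a minimal sketch of a Makefile.PL based on ExtUtils::MakeMaker. The distribution, module name, abstract and author are all made up for illustration; real ones usually set a few more options.

```perl
# Makefile.PL: a minimal sketch for a hypothetical My-Module distribution
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'My::Module',           # hypothetical module name
    VERSION_FROM => 'lib/My/Module.pm',     # read $VERSION from the module
    ABSTRACT     => 'An example module',    # placeholder description
    AUTHOR       => 'A. N. Author <author@example.com>',    # placeholder
    PREREQ_PM    => {
        'File::Spec' => 0,                  # 0 means any version will do
    },
);
```

Running perl Makefile.PL on this generates the Makefile that the make, make test and make install steps then use.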
In the early days of Perl 5, the only option was a Makefile.PL based on ExtUtils::MakeMaker. Then other builders came along, and for a while Build.PL / Module::Install was more common for new distributions.
There are 3979 distributions (12%) with both Makefile.PL and Build.PL, but having both can cause problems. 78% of dists have just a Makefile.PL, and 9% have just a Build.PL.
Only 7% of 'dists' have neither: some of those will be old dists, but many will just be tarballs in author directories that aren't CPAN distributions.
The MANIFEST is a list of all the files that should be released with the distribution. Each is given as the path relative to the top directory. For example, the App-PAUSE-Comaint distribution contains the script comaint, which is in the script/ directory, so in the MANIFEST file you'll see the line:
script/comaint
A classic error that I've made myself: forgetting to update the MANIFEST after adding a module to an existing distribution. Then when you run make dist (if using Makefile.PL), the tarball doesn't contain the new module.
If your module has a Makefile.PL, you can (re)generate a MANIFEST file by running:
make manifest
This uses ExtUtils::Manifest to do the work.
For Build.PL you need to run:
./Build manifest
You can think of the MANIFEST.SKIP file as an 'anti MANIFEST': it lists files that should not be included in releases. In addition to listing filenames explicitly, you can also use regular expressions to skip all files that match certain patterns:
\.swp$
^MYMETA\.yml$
You can add a special directive:
#!include_default
If this is seen (when you run make manifest), then you get a sensible set of default rules for files to skip.
Any modules in the distribution should be within the lib/ directory. For example, in the Module-Path distribution, the Module::Path module itself lives in lib/Module/Path.pm.
In older distributions (14% of the total), you'll sometimes find the .pm file in the top directory of the distribution. For example, in the Stat-lsMode distribution (which contains Stat::lsMode), you'll find the file lsMode.pm in the top directory.
The distribution metadata provides information about the dist: name, author, version, dependencies, and a whole bunch of other things besides.
Exactly what information can be included is defined by the spec, CPAN::Meta::Spec. There are two main versions of the spec you need to be aware of: version 2 is the most recent and before that was version 1.4.
The first format used for shipping the metadata with the distribution was YAML, so it's not surprising that 84% of dists include a META.yml.
The second supported format is JSON, which was introduced with the meta spec v2, in April 2010. Because it's more recent, only 36% of distributions include a META.json file in their most recent release.
The downside with META.yml is that it only supports the meta spec version 1.4. There are a number of issues with 1.4, but the key thing is that you can't differentiate between the different types of dependencies. You can't say that Test::SomeTestModule is only needed to run tests, for example; you just have to list it like other dependencies. If someone is trying to install your release without running tests, they might run:
cpanm --notest
But if your release doesn't have a META.json file, then they'll still end up trying to install Test::SomeTestModule.
Version 1.4 also doesn't support suggested dependencies.
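As a rough illustration of what the v2 spec lets you express, here's a sketch of a Makefile.PL that declares a runtime dependency, a test-only dependency and a suggested dependency separately. It assumes an ExtUtils::MakeMaker recent enough to understand TEST_REQUIRES (6.64 or later) and meta spec v2 keys in META_MERGE. My::Module, Foo::Baz and Foo::Fast are made-up names.

```perl
# Makefile.PL: sketch of phase-specific prerequisites (hypothetical names)
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME          => 'My::Module',
    VERSION_FROM  => 'lib/My/Module.pm',

    # Needed whenever the module is used
    PREREQ_PM     => { 'Foo::Baz' => '1.01' },

    # Only needed to run the tests
    TEST_REQUIRES => { 'Test::SomeTestModule' => 0 },

    # Extra metadata merged into the generated META files
    META_MERGE    => {
        'meta-spec' => { version => 2 },
        prereqs     => {
            runtime => {
                # Optional extra: nice to have, not required
                suggests => { 'Foo::Fast' => 0 },
            },
        },
    },
);
```

With the test prerequisites recorded separately in META.json, a client that has been told to skip tests has enough information to skip Test::SomeTestModule as well.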
Releases can include both a META.yml and a META.json, and that's what's recommended.
You can read more about the history of the meta spec in CPAN::Meta::History.
The MYMETA.yml and MYMETA.json files are similar to the META.* files described above, with a subtle difference:
* META.yml and META.json files are generated at release time by the author and included in the tarball that's uploaded to CPAN.
* MYMETA.yml and MYMETA.json files are generated at configuration time, ie when the installer runs perl Makefile.PL or perl Build.PL, and should not be included in the tarball.
The main scenario (are there any others?) where these are needed is distributions that dynamically determine the testing and runtime prerequisites. Your Makefile.PL might check what operating system and version of Perl it is being installed on, and decide which modules are required, and which versions of them.
It may be that your dist is fine with Foo::Bar 1.01+ on most versions of Perl, but then Perl 5.22 required a change, so if your distribution is being installed under 5.22, then the minimum version of Foo::Bar is 1.37. If you've got a lot of dependencies, and are in turn depended on by a lot of CPAN distributions, then this sort of management of your dependencies can make life easier for people downstream of you.
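Here's a sketch of how the Foo::Bar scenario might look in a Makefile.PL. My::Module is a made-up name, and I'm assuming the higher minimum applies to Perl 5.22 and anything later, which the description above doesn't strictly say.

```perl
# Makefile.PL: sketch of dynamically determined prerequisites
use strict;
use warnings;
use ExtUtils::MakeMaker;

# Default minimum version of the dependency
my %prereqs = ( 'Foo::Bar' => '1.01' );

# $] holds the running perl's version as a number, eg 5.022000 for 5.22.0.
# Assumption: 5.22 and later all need the newer Foo::Bar.
if ( $] >= 5.022 ) {
    $prereqs{'Foo::Bar'} = '1.37';
}

WriteMakefile(
    NAME         => 'My::Module',           # hypothetical module name
    VERSION_FROM => 'lib/My/Module.pm',
    PREREQ_PM    => \%prereqs,
);
```

The MYMETA files written when this runs record whichever minimum version was chosen on the machine doing the install.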
If a distribution does dynamic configuration, it should set the dynamic_config field in the metadata to 1.
The whole MYMETA idea was cooked up at the first QA Hackathon in Oslo, in 2008. It was implemented in Module::Build in 2009, and in ExtUtils::MakeMaker in 2010.
Because these are generated at configure time, you shouldn't ship them as part of a release, but it's easy to see how that might happen. There are 404 distributions (1.2%) with a MYMETA.json and 362 (1.1%) with a MYMETA.yml.
Traditionally the README file for any package would tell you what it was, who wrote it, and how to install it.
Many distributions have a README that is just an ASCII text rendering of the main module's documentation.
Personally I like a shorter README that is closer to the traditional contents. For a distribution that contains a single module:
You'll see some releases with a README.md file (roughly 6%); these are typically ones that have a GitHub repo, as GitHub will render the README.md as part of the project's home page. The .md extension signifies Markdown, a simple text markup similar to pod but more widely used, particularly in blogging apps.
The Changes file lists the (main) changes made in each (recent) release of the distribution. It is useful for people who are deciding whether to upgrade. MetaCPAN displays the details of the most recent release as part of a distribution's page.
The file can have a header, typically one line identifying the module or distribution, and then a separate section for each release.
There's no official format for this, but the most widely used format is documented in CPAN::Changes::Spec. Here's an example:
Revision history for Perl module Foo::Bar
0.02 2015-08-09
* Fixed bug where blah blah RT#12345
* Added a SEE ALSO section to doc
0.01 2014-03-02
* First release to CPAN
Each release section has a header line that starts with the version of the release, immediately followed by the release date in ISO 8601 format. I always put the date in UTC, since that's the timezone of PAUSE (ie the date and time of upload that's recorded against the distribution is in UTC). You can put whatever you like after that. I generally put the PAUSE id of the person who did the release.
The simplest content for each release section is a markdown-style bulleted list. Read the doc for more details on formats.
By convention you should list releases from most recent to oldest, as the most recent one is what people probably want to look at, when deciding whether they should upgrade.
You'll also see this file called ChangeLog (3%), or CHANGES (4%), and similar. 82% of distributions have a Changes file, so I'd suggest you follow that convention.
The t/ directory contains the tests for the distribution. Each testsuite is a file with the extension ".t". All you need to do is create files with the right extension, and your installer (eg EUMM or MB) will find them and run them for you when someone runs make test or ./Build test.
If you've written some modules for use in the distribution's testsuite, then put them in this directory: they won't be installed along with your modules in lib/, and PAUSE won't include them in the CPAN Index.
The xt/ directory contains extra (extended?) tests that should not be run as part of the regular tests when installing the distribution. These may be one of the following:
t/ directory, but let the end-user decide whether to run them.
The Lancaster Consensus (notes from discussions held at the 2013 QA Hackathon, in Lancaster, UK) defined a number of environment variables, which tests can use to determine in what context they're being run: during release to CPAN versus being tested by a CPAN smoke tester, for example. Using those variables, you can just have all types of test in your t/ directory, but it's cleaner to put all non-standard tests in the xt/ directory.
Depending on how they're organised / written, you may be able to run these tests with:
prove -lr xt
As with the t/ directory, any modules you put in your xt/ directory will be ignored by PAUSE.
If your distribution contains some command-line tools, which should be installed in 'the usual directory for binaries', then put them in a directory called bin or script.
The name 'bin' is used in Unix-like operating systems for executables, but some people like to distinguish between compiled executables and executable scripts; I'm guessing that's why we have two different names to choose from.
If you want to provide examples that show how to use the module(s) in your distribution, then put the scripts or modules in a directory called examples.
Unlike scripts in a bin or script directory, things in the examples directory won't automatically be installed.
An alternate name is eg/. There are 3386 dists with examples/ (10%) and 1009 with eg/ (3%).
The LICENSE file contains the text of the license(s) under which you are making the distribution available. If you're using something like Dist::Zilla, then this will be generated for you. You can use the software-license script (which comes with the App::Software::License module, where you'll find the documentation) to generate a LICENSE file, if you're not using a builder that can do it for you.
If your distribution has some data files that should be installed along with your module (eg templates), then the convention is that you put them in a share/ directory of your distribution.
The installer you're using will provide some mechanism for installing these files into the right place for the local operating system.
The module itself can then use File::ShareDir to find the data files wherever they were installed.
dist.ini is the configuration file used by Dist::Zilla, a build, configure and release tool. If you see this file, then you know the author of the module uses Dist::Zilla. You don't have to have Dist::Zilla installed in order to install such a distribution: when the tarball is built, it will have a Makefile.PL or Build.PL added to it.
You could argue that dist.ini should not be included in the release of the distribution, since it's for the author, and with most releases you couldn't use Dist::Zilla on them, because the files in the release are often generated from the source, eg with documentation added, the $VERSION line, etc.
On balance though, it's a good idea to include the dist.ini in your releases: it lets people know you're using Dist::Zilla. That is why most Dist::Zilla distributions have a GitHub repo linked, since you need the repo version if you want to hack on it beyond any simple patching.
minil.toml is the configuration file used by TOKUHIROM's Minilla authoring tool, analogous to the dist.ini file used by Dist::Zilla. I'm not at all familiar with Minilla yet, so can't say much more than this.
If you're wondering what a toml file is, it's the TOML format, yet another file format you can use for config files, created by Tom Preston-Werner, one of the co-founders of GitHub.
489 distributions (1.5%) have a minil.toml file. I think it's used by a lot of Japanese CPAN authors.
The inc/ directory appears in distributions that have a Makefile.PL which uses Module::Install. It contains modules that are used in the configure or build phase of your distribution, but that shouldn't be installed along with your modules. Typically this is a copy of Module::Install, and any plugins you're using. For example Module::Setup's Makefile.PL has a first line:
use inc::Module::Install;
And if you look in its inc/ directory you'll see a number of modules. This means that people can install your module without having to install Module::Install, its plugins, and other modules that aren't needed by your module(s) at run-time.
You shouldn't have the inc/ directory in source control (eg git), as it is generated for you. How does it know which modules to put in inc? Does it use metadata?
A cpanfile describes the dependencies of your distribution, similar to the way they're described in a Makefile.PL, but with richer ways to express the version dependencies.
If you have a large app (eg a web app), which is never going to be released to CPAN, so not bundled as a dist, then a cpanfile is a way to keep track of your dependencies, and install them on a new machine using carton or cpanm.
The cpanm CPAN client can install dists directly from GitHub, but typically a repo won't have a META.yml or META.json file; if it sees a cpanfile, then cpanm will use that to identify the dependencies that might need installing first.
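For a flavour of what a cpanfile looks like, here's a small sketch for a made-up application. The modules and version numbers are picked purely for illustration.

```perl
# cpanfile: a sketch for a hypothetical application
requires 'perl', '5.010';

# Runtime dependencies
requires 'Plack', '1.0000';               # a minimum version
requires 'JSON::PP', '>= 2.0, < 5.0';     # a version range

# Only needed when running the tests
on 'test' => sub {
    requires 'Test::More', '0.96';
};

# Only of interest to people hacking on the code
on 'develop' => sub {
    recommends 'Devel::NYTProf';
};
```

With this file in the current directory, running cpanm --installdeps . should install whatever is missing.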
Read more in Miyagawa's blog post about cpanfile.
If a distribution has a CONTRIBUTING file, it will usually describe how to go about contributing to the distribution. Larger complex projects will often have one of these, as will authors who maintain a lot of distributions and have a well-oiled process to help them.
A well-formed release contains the following:
* Makefile.PL or a Build.PL
* META.json and META.yml
* MANIFEST file
* lib/ directory
* LICENSE file
* .t file, found in the t/ directory
and doesn't contain the following:
Thanks to everyone on IRC who answered my questions related to this, particularly ETHER, RJBS, HAARG, and KENTNL. The #toolchain channel on irc.perl.org is a good place to start if you have questions about the CPAN toolchain.