CPAN modules for getting a module's path

Neil Bowers

2012-09-21

This is a review of modules that can be used to find out where some other module is installed locally. Typically this consists of looking for the module in the directories listed in @INC.

The following is a list of the modules I'm aware of so far. Please let me know if I've missed any: neilb at cpan dot org.

Module             Doc  Version   Author            # bugs  # users  Last update
App::moduleswhere  pod  0.03      孙海军            0       1        2011-05-08
App::whichpm       pod  0.04      Jozef Kutej       0       0        2010-06-22
Class::Inspector   pod  1.27      Adam Kennedy      3       104      2012-01-25
Module::Data       pod  0.006     Kent Fredric      0       2        2012-04-13
Module::Filename   pod  0.01      Michael R. Davis  0       0        2009-01-19
Module::Finder     pod  v0.1.5    Eric Wilhelm      1       1        2007-07-17
Module::Info       pod  0.32      Mattia Barbon     5       13       2010-09-08
Module::Locate     pod  1.71      Neil Bowers       1       7        2012-09-17
Module::Mapper     pod  1.01      Dean Arnold       2       0        2007-08-18
Module::Metadata   pod  1.000011  apeiron           4       0        2012-08-16
Module::Path       pod  0.06      Neil Bowers       0       2        2012-09-18
Module::Util       pod  1.08      Matthew Lawrence  0       13       2012-05-28
Path::ScanINC      pod  0.002     Kent Fredric      0       1        2012-04-11
Pod::Perldoc       pod  3.17      Mark Allen        8       15       2012-03-18

As my standard test, I'll be using HTTP::Client, a module that I now maintain. Obviously, to get the path for a module you could just do something like:

eval "require $module";
($relpath = "$module.pm") =~ s!::!/!g;
print "$module path = ", $INC{$relpath}, "\n";

But there are various reasons why you might not want to load the module, so most of the modules described here get the path without having to load the module.

Before we dive in, it's worth recapping relevant parts of how modules are loaded. When you require HTTP::Client or use HTTP::Client, perl ends up running require. The module name is converted into a partial path, where HTTP::Client becomes HTTP/Client.pm. The directory path separator will always be / in this partial path, even if the correct directory path separator for your operating system is something else. require looks in %INC to see if the module has been loaded, and if not, it searches the list of directories in @INC, looking for the module. If found, the module is loaded, and %INC is updated: the key is the partial path, and the value is the full path. See perldoc -f require for more of the gory details.
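To make that lookup concrete, here is a minimal sketch of scanning @INC by hand, without loading the target module. The function name find_in_inc is my own invention for illustration, not something provided by any of the modules reviewed here, and the sketch simply skips any hooks (code refs and the like) found in @INC rather than emulating them.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of any reviewed module): find a module's
# file by scanning @INC, without loading the module.
sub find_in_inc {
    my ($module) = @_;

    # HTTP::Client becomes ('HTTP', 'Client.pm')
    my @parts = split /::/, "$module.pm";

    foreach my $dir (@INC) {
        next if ref $dir;    # skip hooks (code refs, objects) in @INC
        my $path = File::Spec->catfile($dir, @parts);
        return $path if -f $path;
    }
    return;                  # not found in any directory
}

my $module = 'File::Spec';
my $path   = find_in_inc($module);
print "$module path = ", (defined $path ? $path : '(not found)'), "\n";
```

Using File::Spec->catfile rather than joining with / means the returned path uses the operating system's own separator, which is not what perl stores in %INC.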

Each module is presented in turn, with a SYNOPSIS style code sample. Then all the modules are compared, and I end up with recommendations.

App::moduleswhere

App::moduleswhere is an empty module, the distribution for which contains the mwhere script. This is a command-line script for getting the path to a module:

% mwhere -n HTTP::Client
/usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm

If you use the -n (or -no-require) switch, then mwhere will scan @INC, and not load the module. Without that switch, the script requires the target module, then gets the path from %INC.

It seems a little odd to me that App::moduleswhere has the documentation for the script, and mwhere itself has no embedded documentation, so man mwhere or perldoc mwhere gets you nothing.

App::whichpm

App::whichpm provides a single function, which_pm(), which in a scalar context returns the path to the module:

use App::whichpm qw(which_pm);

print "$module path = ", which_pm($module), "\n";

which_pm() always runs:

eval "use $module_name;";

and first looks to see if the target module is in %INC, returning the path found there if it is. If it isn't found in %INC, then @INC is scanned.

In an array context, which_pm() returns ($fullpath, $version), where:

$version = $module_name->VERSION;

The distribution also includes the whichpm script, which is just a wrapper around the module. By default it reports both path and version; the -q switch results in the path only.

% whichpm -q HTTP::Client
/usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm

% whichpm HTTP::Client
/usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm 1.53

The problem with the approach used (loading the module then getting the path from %INC) is that the loading of the module might have side effects. For example, Devel::Modlist prints the list of modules used, so you'll get the following:

% whichpm Devel::Modlist
/usr/local/lib/perl5/site_perl/5.16.0/Devel/Modlist.pm 0.801
App::whichpm           0.04
Carp                   1.26
... 33 more modules ...

Class::Inspector

Class::Inspector provides a number of class methods for getting information about a class. The resolved_filename() method returns the full path to a module:

use Class::Inspector;

print "$module path = ", Class::Inspector->resolved_filename($module), "\n";

Class::Inspector provides the following class methods in addition to the one demonstrated above:

I'm not sure why everything is provided as a class method rather than functions, but maybe some of the many modules dependent on Class::Inspector are subclasses, which would explain it.

Module::Data

Module::Data bills itself as a module which will "introspect context information about modules in @INC". The path() method returns the path from %INC if the target module has already been loaded, otherwise it uses Path::ScanINC:

use Module::Data;

$md = Module::Data->new($module);
print "$module path = ", $md->path, "\n";

The module also provides some other methods:

Module::Filename

Module::Filename provides an OO interface for getting a module's filename:

use Module::Filename;

$mf = Module::Filename->new;
print "$module path = ", $mf->filename($module), "\n";

It uses Path::Class to construct the full path from the module name and the relevant directory in @INC.

I think the OO design is overkill, as it's really just providing one function. For example, the SYNOPSIS suggests the following use, which is basically saying to use it like a function call:

use Module::Filename;
my $filename=Module::Filename->new->filename("strict");

The use of Path::Class means that Module::Filename has 38 dependencies.

Module::Finder

Module::Finder is an interesting module, with a quirky interface. So much so that it took me a while to work out how to get it to work, and I still don't understand all of it. By default it will search through @INC looking for modules, and for your modules of interest you get a hashref containing information, including the path. Unless you constrain the search, it will recurse through all the directories in @INC, which takes quite a while.

Here's an example that shows getting the information for HTTP::Client:

use Module::Finder;

$finder = Module::Finder->new( dirs => \@INC,
                              paths => { 'HTTP' => '+' });
$info   = $finder->module_info('HTTP::Client');

print "name        = ", $info->{module_name}, "\n";
print "path        = ", $info->{filename}, "\n";
print "inc_path    = ", $info->{inc_path}, "\n";
print "module_path = ", $info->{module_path}, "\n";

This generates the following output:

name        = HTTP::Client
path        = /usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm
inc_path    = /usr/local/lib/perl5/site_perl/5.16.0
module_path = HTTP/Client.pm

The paths parameter is used to constrain the search. The + for HTTP says to only look for things in the HTTP directory, and no deeper. Look at the documentation to learn about the other options for constraining the search. The dirs parameter provides a list of directories to search; if not specified then Module::Finder will look through @INC.

There are a number of other methods; the following shows how you could get information about all locally installed modules:

use Module::Finder;

$finder = Module::Finder->new();
%all    = $finder->module_infos();

foreach my $info (values %all) {
    print $info->{module_name}, "\n";
    print "    path        = ", $info->{filename}, "\n";
    print "    inc_path    = ", $info->{inc_path}, "\n";
    print "    module_path = ", $info->{module_path}, "\n";
}

Module::Info

Module::Info can provide some information about a module without loading the module, and can provide more information after loading the module. Getting the path for a module doesn't load it:

use Module::Info;

$mi = Module::Info->new_from_module($module);
print "$module path = ", $mi->file, "\n";

The following shows the other methods that don't trigger loading of the module:

use Module::Info;

$mi = Module::Info->new_from_module($module);
print "  name    = ", $mi->name, "\n";
print "  version = ", $mi->version, "\n";
print "  inc_dir = ", $mi->inc_dir, "\n";
print "  is_core = ", $mi->is_core, "\n";

Which provides the following for HTTP::Client:

  name    = HTTP::Client
  version = 1.53
  inc_dir = /usr/local/lib/perl5/site_perl/5.16.0
  is_core = 0

The following methods do trigger the loading of the target module; read the documentation for more details. Note that the documentation also caveats "From here down reliability drops rapidly!".

There are a few more things the module can do — the interested reader is directed to the documentation.

Module::Locate

Module::Locate provides a number of functions related to finding a module. The main function is locate(), which takes a module name and returns the full path to the module:

use Module::Locate qw(locate);

print "$module path = ", locate($module), "\n";

It uses catfile from File::Spec::Functions to ensure paths are generated portably.

Module::Locate provides a number of other functions:

App::Module::Locate provides a command-line interface to Module::Locate — the distribution includes the mlocate script:

% mlocate HTTP::Client
/usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm

This module has a major flaw: when you use locate($module), if $module hasn't already been loaded, then it finds it in @INC, and then caches the path in %INC. So if you subsequently try to require or use the module, it won't actually be loaded, because its appearance in %INC tricks Perl into thinking it has already been loaded. The module's author, Dan, has just given me co-maint, so I can fix this bug.
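The effect of a stray %INC entry is easy to reproduce without Module::Locate. The following is a minimal sketch, using a made-up module name (My::Fake::Module) and a made-up path; it plants an entry in %INC by hand and then shows that require is satisfied even though nothing was ever loaded.

```perl
use strict;
use warnings;

# 'My::Fake::Module' is a hypothetical name; no such file needs to exist.
my $module = 'My::Fake::Module';
(my $relpath = "$module.pm") =~ s!::!/!g;

# Simulate a path being cached in %INC without the module being loaded.
$INC{$relpath} = "/some/where/$relpath";

# require sees the %INC entry, assumes the module is already loaded,
# and returns true without reading any file.
my $ok = eval "require $module; 1";
print "require ", ($ok ? "succeeded" : "failed"), "\n";

# Nothing was compiled, though, so the package has no methods of its own.
print "package is ", ($module->can('new') ? "populated" : "empty"), "\n";
```

Any lookup function that wants to cache results should therefore keep them in its own hash rather than in %INC.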

Module::Mapper

Module::Mapper provides one function, find_sources(), which searches @INC and/or a specified list of directories, looking for one or more module names. It returns a hashref, which is keyed off the module name, with the value being a list of paths. The first path in the list is the absolute path to the module.

use Module::Mapper;

$ref = find_sources(
                           All => 0,
                        UseINC => 1,
                    IncludePOD => 0,
                       Modules => [ $module ],
                   );
print "$module path = ", $ref->{$module}->[0], "\n";

The find_sources() function takes 9 different options:

The design of the interface seems a bit curious to me, but the SEE ALSO says that it was created to support Pod::Classdoc and PPI::HTML::CodeFolder, so perhaps the design reflects the design of those modules.

Module::Metadata

Module::Metadata can provide several pieces of metadata about a module without loading the module — it parses the module's source and uses regexes to pull out the metadata. The simplest way to get the path to a module is with the find_module_by_name() class method:

use Module::Metadata;

print "$module path = ", Module::Metadata->find_module_by_name($module), "\n";

You can also instantiate Module::Metadata on a module (or source file), and then get information using instance methods, including the path:

use Module::Metadata;

$meta = Module::Metadata->new_from_module($module);
print "  name         = ", $meta->name, "\n";
print "  version      = ", $meta->version, "\n";
print "  filename     = ", $meta->filename, "\n";
print "  packages     = ", join(',', $meta->packages_inside), "\n";
print "  contains-pod = ", $meta->contains_pod, "\n";
print "  pod          = ", join(',', $meta->pod_inside), "\n";

For HTTP::Client this results in the following:

  name         = HTTP::Client
  version      = 1.53
  filename     = /usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm
  packages     = HTTP::Client
  contains-pod = -1
  pod          = 

The current version (1.000011) of Module::Metadata has a bug which means it ignores any pod that comes after __END__. Here's the output for Net::HTTP::Tiny:

  name         = Net::HTTP::Tiny
  version      = 0.001
  filename     = /usr/local/lib/perl5/site_perl/5.16.0/Net/HTTP/Tiny.pm
  packages     = Net::HTTP::Tiny
  contains-pod = 8
  pod          = NAME,SYNOPSIS,DESCRIPTION,FUNCTIONS,BUGS,SEE ALSO,AUTHOR,COPYRIGHT,LICENSE

Module::Path

Module::Path is a module I wrote while working on my review of CPAN modules for getting module dependency information. A number of modules expect the path to perl source, but I wanted to provide a module name. On searching CPAN I only turned up a couple of modules, but they either seemed to have too many dependencies, or had potential issues. So I whipped up Module::Path. While continuing to work on the other review I subsequently found more modules, and that prompted this review.

Module::Path provides one function, module_path(), which you must import:

use Module::Path qw(module_path);

print "$module path = ", module_path($module), "\n";

The distribution also includes a script, mpath, for use from the command-line:

% mpath HTTP::Client
/usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm

Module::Path uses the right directory path separator for your operating system, and ignores any code references it finds in @INC. It has 3 runtime dependencies: Exporter, strict, and warnings.

Module::Util

Module::Util provides a selection of functions for getting information about a module. The find_installed() function returns the first path found for a module:

use Module::Util qw(find_installed);

print "$module path = ", find_installed($module), "\n";

Similarly, all_installed() will report all paths where the module was found in @INC.

A quick summary of the main other functions:

The distribution also includes a script pm_which which displays the path for one or more modules:

% pm_which HTTP::Client
/usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm

% pm_which Furl HTTP::Tiny
Furl       - /usr/local/lib/perl5/site_perl/5.16.0/Furl.pm
HTTP::Tiny - /usr/local/lib/perl5/site_perl/5.16.0/HTTP/Tiny.pm

Path::ScanINC

Path::ScanINC emulates the way perl searches for modules in @INC. This includes handling of coderefs in @INC — when you're expecting to be returned a scalar containing a path, you might get an arrayref:

use Path::ScanINC;

$inc   = Path::ScanINC->new;
@parts = split('::', "$module.pm");
$path = $inc->first_file(@parts);
if (ref($path)) {
    print "oops, a code ref must have handled this!\n";
} else {
    print "$module path = ", $path, "\n";
}

Notice that instead of calling $inc->first_file('HTTP::Client'), you have to pass the partial path (HTTP/Client.pm), split into its constituent parts: $inc->first_file('HTTP', 'Client.pm').

The all_files() method works like first_file(), but returns all instances found in @INC, not just the first one. The first_dir() and all_dirs() methods are analogous to the _file() methods, but find directories rather than files.

Pod::Perldoc

Pod::Perldoc is the module that provides the guts of the perldoc command.

The perldoc command has an -l option, which lists the path to the documentation for the item specified. For HTTP::Client, the pod is in the module itself, so this returns the path to the module:

% perldoc -l HTTP::Client
/usr/local/lib/perl5/site_perl/5.16.0/HTTP/Client.pm

But if the documentation for a module is in a separate file, then you'll get the path to that, rather than the path to the module:

% perldoc -l Locale::Country
/usr/local/lib/perl5/5.16.0/Locale/Country.pod

It turns out that there's an undocumented feature though: if you list both the -l and -m switches, it will always show the module path rather than the pod path:

% perldoc -l -m Locale::Country
/usr/local/lib/perl5/5.16.0/Locale/Country.pm

The Pod::Perldoc module doesn't provide any hook to this functionality though, so you can't call it from your code, unless you want to do something like:

chomp($path = `perldoc -lm $module`);
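If you do go down this route, a slightly more defensive variant is sketched below. It assumes perldoc is on your PATH and that your platform supports list-form pipe opens; the function name perldoc_module_path is hypothetical. Passing the arguments as a list means the module name never goes through a shell, and checking the exit status lets you tell "not found" apart from a real path.

```perl
use strict;
use warnings;

# Hypothetical wrapper: ask perldoc for a module's path, avoiding the shell.
sub perldoc_module_path {
    my ($module) = @_;

    # List-form pipe open: the module name is passed as a single argument
    # and is never interpreted by a shell.
    open(my $fh, '-|', 'perldoc', '-lm', $module) or return;
    my $path = <$fh>;
    close($fh);    # waits for perldoc and sets $? to its exit status

    # Treat a non-zero exit or empty output as "not found".
    return if $? != 0 || !defined $path;
    chomp $path;
    return $path;
}

my $module = 'File::Spec';
my $path   = perldoc_module_path($module);
print "$module path = ", (defined $path ? $path : '(not found)'), "\n";
```

This still starts a new process for every lookup, so it is far slower than any of the in-process approaches.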

Comparison

Performance

The following table shows the result of benchmarking all of the relevant modules. I looked up the path for HTTP::Client 100,000 times. The code used is basically what was presented in the examples above. For the OO-style modules whose constructor is passed the module name, the constructor call is necessarily part of each lookup. But for Module::Filename I called the constructor once, before running the benchmark.

Module             Time (s)
Module::Path           0.64
Class::Inspector       1.46
Module::Mapper         2.75
Path::ScanINC          4.79
App::whichpm           4.94
Module::Info           8.65
Module::Locate         9.31
Module::Data          12.36
Module::Util          14.26
Module::Filename      24.83
Module::Metadata      47.54
Module::Finder       176.58

That's a surprisingly wide spread. Some of the modules are doing more than simply looking up the path, and some are built on generic modules which do a lot more when constructing directory paths.

Dependencies

The following table shows the number of run-time dependencies for each module, when running the example code given for each module above.

Module             # dependencies
Path::ScanINC       2
Module::Path        3
Class::Inspector    7
App::whichpm        8
Module::Info       12
Module::Util       15
Module::Locate     19
Module::Finder     23
Module::Metadata   24
Module::Data       38
Module::Filename   40
Module::Mapper     42
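For anyone wanting to produce comparable numbers, one possible approach is sketched below. This is an assumption about method, not necessarily how the figures above were obtained: it compares the keys of %INC before and after loading a module. The helper name count_loaded_files is hypothetical. Note that anything already loaded by the script itself (strict and warnings here) is not counted, and that the target module's own file is.

```perl
use strict;
use warnings;

# Hypothetical helper: count the files a module pulls in when it is loaded,
# by comparing %INC before and after the require.
sub count_loaded_files {
    my ($module) = @_;
    (my $relpath = "$module.pm") =~ s!::!/!g;

    my %before = map { $_ => 1 } keys %INC;
    require $relpath;    # loads the module and whatever it depends on
    my @new = grep { !$before{$_} } keys %INC;

    return scalar @new;
}

my $module = 'File::Basename';    # any installed module will do
print "$module loaded ", count_loaded_files($module), " new file(s)\n";
```

Because the count depends on what is already in %INC, each module is best measured in a fresh perl process.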

Features

The following table summarises the various methods or functions provided by each module:

As you can see, there are a lot of different functions provided by the various modules, with a lot less overlap than I expected. The 's' for Module::Info reflects the fact that some of the methods can trigger the loading of the module.

Conclusion

If you just want to get the path to a module, Module::Path is your best bet: it's the fastest and has very few dependencies. This was by design, so that someone could use it in another module without fear of unexpected bloat. This might seem like I've cheated, but I've basically optimised for the same things I tend to evaluate when benchmarking modules in these reviews.

If you want additional information, such as version and whether the module is in the core, then Module::Info is not a bad choice. But be careful, as some methods can trigger loading of the module.

Depending on what other information you want, Module::Metadata and Class::Inspector are also worth a look.
