This article is a review of 20 CPAN modules that can be used to make HTTP requests. Many of the modules can handle FTP and other types of requests, but here I'm just focussing on HTTP, and in particular GET and POST requests.
If you're thinking "Sheesh, just tell me what module to use!", here's a concise summary of the recommendations:
The following table lists the modules reviewed, along with basic information. The # users column is the number of distributions that list the module as a pre-requisite.
I intentionally excluded modules like WWW::Mechanize, which provide higher-level functions. Web::Magic was close to failing that test!
I'll look at each module in turn, then present the results of comparing the modules, and finally suggest which module you should use when.
In comparing the modules, I'm looking at the following: the style of interface each presents, which HTTP features they support (POST, https, redirects, cookies), how they perform, and how many dependencies they pull in.
In each section I'll show SYNOPSIS style code examples to illustrate basic use of each module.
Before starting the reviews, I'll present a simple model for making HTTP requests, so I can describe how each module fits into it. I'll assume you're familiar with the basics of HTTP, and how the web works (there are plenty of HTTP tutorials online).
Making an HTTP request involves three entities: the user agent (the client making the request), the request itself, and the response.
With some of the modules here, each of those entities is represented with a separate class. Sometimes the request is implicit, constructed from arguments passed to a request method on the User Agent. And sometimes there isn't a separate response class either: everything is rolled into the User Agent.
Disclaimer: I'm not an HTTP expert. Please let me know if you spot any errors, inaccuracies or gaps in the material below.
Furl bills itself as a lightning-fast URL fetcher. Not merely fast you understand, but lightning fast. I'm expecting that most of the time in making a request is in network latency, so I'm having a hard time seeing what will make this so much faster than the competition, but we'll see.
Furl is the User-Agent class, and rather than a request class, relevant information is passed to a request method:
use Furl;

$furl = Furl->new(agent => 'MyModule/2.0', max_redirects => 0);
$response = $furl->get($url);

if ($response->is_success) {
    print "Status: ", $response->code, "\n";
    print "Content: ", $response->body, "\n";
} else {
    print "Status: ", $response->code, "\n";
    print "Message: ", $response->message, "\n";
}
Furl has a generic request method, but also provides get, post, head, put, and delete methods. Each of these returns an instance of Furl::Response, which is very similar to the HTTP::Response class from LWP (and provides a method as_http_response() which will return an instance of HTTP::Response).
The following shows simple POST usage:
use Furl;

$furl = Furl->new();
$response = $furl->post($url, [], [ x => 7, y => 13 ]);

if ($response->is_success) {
    print $response->body, "\n";
} else {
    print "Status: ", $response->code, "\n";
    print "Message: ", $response->message, "\n";
}
Furl will automatically follow redirects, up to a specified limit, which defaults to 7. You can override this with the max_redirects parameter passed to the constructor. If you don't want redirects followed, just set this to 0 (zero).
Furl doesn't natively support working with cookies but the documentation contains a section which shows how to use LWP classes with Furl to handle cookies. This would lose you some of the performance benefits of Furl over LWP.
Furl does support https requests.
HTTP::Client is a small class for making GET requests, built on top of HTTP::Lite. The documentation says the aim was speed, and highlights the fact that it doesn't require LWP. The following illustrates making a simple GET request:
use HTTP::Client;

$client = HTTP::Client->new();
$response = $client->get($url);

if ($client->status_message =~ /^200/) {
    print "Status: ", $client->status_message, "\n";
    print "Content: ", $response, "\n";
} else {
    print "Status: ", $client->status_message, "\n";
}
The get method is a bit strange: if the request results in an HTTP 200 status code, the content of the requested page is returned. Otherwise it returns a status string, of the form "404 Not found". This means that anything other than a status code of 200 is considered a failure, and that redirects are not handled (you'll just get "302 Found", for example).
This design means you can't tell the difference between a request for a non-existent file, and a text file which contains "404 Not found". Ok, not very likely, but this is exactly the sort of thing an HTTP library should get right. And it's why you should always check the status_message, rather than the return value of get.
The module supports a number of methods for getting at header information returned in the HTTP response, but it doesn't let you get the HTTP status code and message individually, only via the status_message method.
As noted above, HTTP::Client only supports making GET requests, doesn't support SSL, and doesn't handle cookies.
The version on CPAN when I started working on this review had a number of bugs. I ended up getting co-maint and fixing the bugs listed on RT. I may improve some of the areas mentioned above, but more likely, unless I hear that people are actively using it, I'll just maintain it into retirement.
HTTP::GHTTP is a Perl interface to Gnome's libghttp. It's a fairly low-level interface, which means you have to write more code than with some modules:
use HTTP::GHTTP ':methods';

$ghttp = HTTP::GHTTP->new();
$ghttp->set_uri($url);
$ghttp->set_type(METHOD_GET);
$ghttp->process_request;

print "STATUS: ", ($ghttp->get_status)[0], "\n";
print "CONTENT: ", $ghttp->get_body, "\n";
The get_status() method returns two values: the status code and reason phrase, as defined by the HTTP spec.
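For example, to capture both values:

my ($code, $reason) = $ghttp->get_status;
print "Status: $code $reason\n";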
On the other hand, because it's low-level, you have a lot of control over the outbound request. It supports all the regular HTTP request types, and some which are apparently for DAV (which I'm not familiar with).
You can make POST requests, but you have to specify the Content-Type explicitly, and encode the body by hand as well:
use HTTP::GHTTP ':methods';

$ghttp = HTTP::GHTTP->new();
$ghttp->set_uri($url);
$ghttp->set_type(METHOD_POST);

$body = 'x=7&y=13';
$ghttp->set_header('Content-Type', 'application/x-www-form-urlencoded');
$ghttp->set_body($body);
$ghttp->process_request;

print "STATUS: ", ($ghttp->get_status)[0], "\n";
print "CONTENT: ", $ghttp->get_body, "\n";
To encode the POST parameters properly, you need to do a lot more than I did, so if you need to POST, you should use a different module.
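If you did want to build a properly encoded body by hand, one option (a sketch, not something HTTP::GHTTP provides) is to lean on the URI module to do the escaping, reusing the $ghttp object and $url from the example above:

use URI;

# let URI do the application/x-www-form-urlencoded escaping for us
my $u = URI->new('http:');
$u->query_form(x => 7, y => 'a value with spaces & ampersands');
my $body = $u->query;    # keys and values are now properly escaped

$ghttp->set_header('Content-Type', 'application/x-www-form-urlencoded');
$ghttp->set_body($body);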
You can also make asynchronous requests, where results are pulled back in chunks of N bytes:
use HTTP::GHTTP;

$ghttp = HTTP::GHTTP->new();
$ghttp->set_uri($url);
$ghttp->set_async;
$ghttp->set_chunksize(32);
$ghttp->prepare;

$count = 0;
while ($status = $ghttp->process) {
    # do some other processing
    ++$count;
}
die "failed to complete request\n" unless defined($status);

print "STATUS : ", ($ghttp->get_status)[0], "\n";
print "COUNT  : ", $count, "\n";
The documentation is fairly light, for example in the section on async operation:
Doing timeouts is an exercise for the reader (hint: lookup select() in perlfunc).
HTTP::GHTTP doesn't handle https requests, as the underlying libghttp doesn't handle SSL. There is no support for working with cookies, and it doesn't handle redirects for you.
The underlying libghttp doesn't seem to be maintained: I couldn't find much about it online, and this module was last released in 2002. It has been superseded by libSoup for the GNOME project. So this is probably not one to use, though as you'll see in the performance comparison below, it's the fastest module, both for GET and POST requests.
HTTP::Lite is a "lightweight HTTP implementation", intended for use where you're only wanting to make simple HTTP requests.
use HTTP::Lite;

$http = HTTP::Lite->new();
$http->http11_mode(1);
$result = $http->request($url);

print "Status: ", $result, "\n";
if (defined($result) && ($result =~ /^2/ || $result == 302)) {
    print "CONTENT: ", $http->body, "\n";
} else {
    print "Failed: ", $http->status, "\n";
}
The following code shows how to make a POST request. You call the prepare_post() method before making the request. This builds the request body, and sets the method type to POST.
use HTTP::Lite;

$http = HTTP::Lite->new();
$http->http11_mode(1);
$http->prepare_post({ x => 7, y => 13 });
$result = $http->request($url);

print "Status: ", $result, "\n";
if (defined($result) && ($result =~ /^2/ || $result == 302)) {
    print "CONTENT: ", $http->body, "\n";
} else {
    print "Failed: ", $http->status_message, "\n";
}
You can define arbitrary HTTP headers to include in a request, which you do before making it. The following example overrides the default User-Agent header, and switches to HTTP/1.1 (HTTP/1.0 is the default):
use HTTP::Lite;

$http = HTTP::Lite->new();
$http->http11_mode(1);
$http->add_req_header('User-Agent', 'HTTP::Lite/2.3');

print "User-Agent:\n";
foreach my $header ($http->get_header('User-Agent')) {
    print " $header\n";
}

$result = $http->request($url);
print "status code = $result\n";
print "content = ", $http->body, "\n";
If you want to make multiple requests, you have to call the reset() method between each request:
use HTTP::Lite;

$http = HTTP::Lite->new();

$result = $http->request('http://perl.org');
print "response from perl.org = $result\n";

$http->reset;

$result = $http->request('http://perl.com');
print "response from perl.com = $result\n";
An instance of HTTP::Lite represents a user agent, an HTTP request, and the HTTP response, all at the same time. As a result the interface is potentially a little confusing, which isn't helped by the documentation contradicting itself in a number of places.
The documentation says that HTTP::Lite only supports GET and POST requests, but also describes a method() method, which suggests you can do PUT and HEAD requests as well. I've only tested GET and POST. I've just taken over maintenance of this module and have fixed some of the outstanding bugs. I'll resolve the documentation issues and decide what to do on remaining bugs.
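If you want to experiment, a HEAD request would presumably look something like this (untested; the method() call is the one the documentation describes):

use HTTP::Lite;

$http = HTTP::Lite->new();
$http->method('HEAD');              # set the request method before calling request()
$result = $http->request($url);
print "Status: ", $result, "\n";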
The module has some features I've not covered, such as the ability to provide callback functions which are invoked at specific points in the request cycle. It doesn't support https, handle redirects, or provide support for cookies. I wouldn't recommend using this module if you were starting something now.
HTTP::MHTTP provides a low-level library for making HTTP requests, based on a C library which is included in the distribution. Instead of an OO interface like most modules here, it exports 14 functions which act on global variables held in the C library. So not thread-friendly.
It took me a while to get it working reliably, until I realised that it defaults to HTTP/1.0 and doesn't add a Host header to requests for you.

As a result, if you want to use this module (and you almost certainly don't), you should call http_set_protocol() to switch to HTTP/1.1 (HTTP/1.0 is the default), and then manually add the Host header to your request:
use HTTP::MHTTP;

http_init();
http_set_protocol(1);
http_add_headers(
    'User-Agent' => 'HTTP::MHTTP/0.15',
    'Host'       => 'www.robotstxt.org',
);

$result = http_call('GET', $url);

if ($result > 0) {
    if (http_status() == 200) {
        print "CONTENT: ", http_response(), "\n";
    } else {
        print "Failed to GET file - ", http_status(), "\n";
    }
} else {
    print "failed to make request - error code: $result\n";
}
But when running the benchmarks, I discovered that if you switch to HTTP/1.1 then you can't make multiple requests: the first request will succeed, but subsequent requests fail without even talking to the HTTP server.
The following is the closest I could get to a working example of a POST request. The parameters are successfully posted, but the body returned has some extra characters.
use HTTP::MHTTP;

http_init();
http_set_protocol(1);

$body = 'x=7&y=13';
http_body($body);
http_add_headers(
    'User-Agent'   => 'HTTP::MHTTP/0.15',
    'Host'         => 'localhost',
    'Content-Type' => 'application/x-www-form-urlencoded',
);

$result = http_call('POST', $url);

if ($result > 0) {
    if (http_status() == 200) {
        print '"', http_response(), '"', "\n";
    } else {
        print "POST failed - ", http_status(), "\n";
    }
} else {
    print "failed to make request - error code: $result\n";
}
If you're going to make multiple HTTP requests, you have to call http_reset() between requests. This doesn't clear any headers you set though, so you might want to call http_init() between requests instead.
The documentation says that "rudimentary SSL support can be compiled in", but I didn't test that, as by this point it was clear that you shouldn't use this module. Given the "rudimentary", I counted https as not being supported (for the features comparison table, below). It also doesn't handle redirects transparently, and doesn't help with cookies.
This module marked a first for me: the first time I've given up on trying to get a module to work! When reviewing other modules I've fixed a fair few bugs, and worked hard to install underlying C libraries, but after spending a couple of hours on this, enough is enough, for the moment.
HTTP::Soup is a Perl interface to libsoup, which is an HTTP client/server library for GNOME. It looks like libsoup replaced libghttp as the HTTP library for GNOME.
I first tried installing libsoup by hand, but by the 9th dependent library, and struggling to get things building cleanly, I decided to give MacPorts a go. After nearly an hour I finally had libsoup installed: man, it installed a lot of packages!
Then I tried to install HTTP::Soup. There was an undeclared pre-requisite, but once I got past that, I couldn't get one of the dependencies to install. I may go back to this, but for the moment you should probably give this a miss, unless you've already got all the GNOME libraries installed, in which case you might be more successful.
HTTP::Tiny bills itself as "a small, simple, correct HTTP/1.1 client". The instance of HTTP::Tiny is the User-Agent; the request 'object' is just a hashref used internally, with the response returned as a hashref:
use HTTP::Tiny;

$tiny = HTTP::Tiny->new();
$response = $tiny->get($url);

if ($response->{success}) {
    print "Status: ", $response->{status}, "\n";
    print "Content: ", $response->{content}, "\n";
} else {
    print "Status: ", $response->{status}, "\n";
    print "Reason: ", $response->{reason}, "\n";
}
The following shows usage of the post_form method, which submits form data with a content type of application/x-www-form-urlencoded:
use HTTP::Tiny;

$tiny = HTTP::Tiny->new();
$response = $tiny->post_form($url, { x => 7, y => 13 });

if ($response->{success}) {
    print "Status: ", $response->{status}, "\n";
    print "Content: ", $response->{content}, "\n";
} else {
    print "Status: ", $response->{status}, "\n";
    print "Reason: ", $response->{reason}, "\n";
}
HTTP::Tiny supports all HTTP verbs (GET, HEAD, PUT, POST, DELETE), either with the generic request method, or with convenience functions, which are the lower-case name of the verb.
$response = $http->request($verb, $url, \%options);
$response = $http->post($url, \%options);
The options hashref can be used to provide callbacks (see the docs for details), the body of the request, or HTTP headers to include in the request. For example, to include the If-Modified-Since header (which asks the server to return the resource only if it has been modified after the date/time you give):
use HTTP::Tiny;

$tiny = HTTP::Tiny->new();
$response = $tiny->get($url, {
    headers => { 'If-Modified-Since' => 'Mon, 23 Aug 2010 19:18:05 GMT' },
});

if ($response->{success}) {
    print "Status: ", $response->{status}, "\n";
    print "Content: ", $response->{content}, "\n";
} else {
    print "Status: ", $response->{status}, "\n";
    print "Reason: ", $response->{reason}, "\n";
}
which results in the following:
Status: 304
Reason: Not Modified
Note that if you make a request with the If-Modified-Since header and get a 304 response (the remote file hasn't been modified), the success field in the response hashref will be false, so you'll need to check the status field.
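The callbacks mentioned above also go in the options hashref. For example, the following sketch streams a large response straight to a file with data_callback, rather than buffering the whole body in memory (the filename is just for illustration):

use HTTP::Tiny;

$tiny = HTTP::Tiny->new();
open(my $fh, '>', 'download.dat') or die "can't open download.dat: $!";

$response = $tiny->get($url, {
    data_callback => sub {
        my ($chunk, $partial) = @_;   # called for each chunk of the body as it arrives
        print {$fh} $chunk;
    },
});
close($fh);

print "Status: ", $response->{status}, "\n";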
You can pass a number of options to the constructor:
$http = HTTP::Tiny->new(
    agent           => 'MyAgent/1.0',
    default_headers => { },
    max_redirect    => 7,
    max_size        => 1_048_576,
    proxy           => $proxy_url,
    timeout         => 60,
);
The agent option provides the string passed in the User-Agent HTTP header. The default_headers option can be used to provide HTTP headers which you want included in all requests. If max_redirect is set to a number greater than 0, then HTTP::Tiny will transparently follow redirects, up to the specified number of hops (the default is 5). If you specify a max_size and the response body exceeds that number of bytes, you'll get a 599 status code, with the reason set to Internal Exception, and the content of the response set to:
Size of response body exceeds the maximum allowed of $self->{max_size}
Https requests are supported if you have IO::Socket::SSL installed.
This is a well thought-out module, and is now the default module I turn to for HTTP (replacing LWP). The main thing that bugs me slightly is that the response is returned as a hashref, rather than as an instance of a response class; I don't see that that would break the Tiny philosophy. When I asked David Golden why it returns a hashref and not an object, he replied:
Because it's *tiny* :-)
LWP is the great-grandaddy of libraries for doing all things HTTP. In the canonical usage, you construct a User Agent (LWP::UserAgent), then for each request you create an instance of HTTP::Request. You pass the request object to the request method of the UserAgent, and get back an instance of HTTP::Response:
use LWP::UserAgent;
use HTTP::Request;

$ua = LWP::UserAgent->new();
$request = HTTP::Request->new('GET' => $url);
$response = $ua->request($request);

if ($response->is_success) {
    print "Status: ", $response->code, "\n";
    print "Content: ", $response->content, "\n";
} else {
    print "Status: ", $response->code, "\n";
    print "Reason: ", $response->message, "\n";
}
When constructing the request, you can specify headers to include either by passing an instance of HTTP::Headers, or by passing an arrayref which contains key/value pairs. If you're doing anything other than the simplest requests, you might also want to look at HTTP::Request::Common, which provides convenience functions for constructing request objects.
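For example, passing the headers as an arrayref of key/value pairs:

use LWP::UserAgent;
use HTTP::Request;

$ua = LWP::UserAgent->new();
$request = HTTP::Request->new(
    GET => $url,
    [ 'Accept' => 'text/html', 'Accept-Language' => 'en' ],
);
$response = $ua->request($request);
print "Status: ", $response->code, "\n";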
The following shows how to make a POST request, using the POST function from HTTP::Request::Common:
use LWP::UserAgent;
use HTTP::Request::Common;

$ua = LWP::UserAgent->new();
$response = $ua->request(POST $url, [x => 7, y => 13]);

if ($response->is_success) {
    print "Status: ", $response->code, "\n";
    print "Content: ", $response->content, "\n";
} else {
    print "Status: ", $response->code, "\n";
    print "Reason: ", $response->message, "\n";
}
In addition to http requests, LWP supports ftp and file URLs, and can be used to POST to mailto URLs. If you want to make https requests, you 'just' need to install LWP::Protocol::https, which comes in a separate distribution.
More examples on using LWP can be found in lwpcook.
LWP::Curl tries to provide an LWP-like interface on top of libcurl. LWP::Curl is analogous to LWP::UserAgent, but it doesn't provide Request and Response classes: requests are constructed from arguments passed to the get and post methods, which return the body of the response:
use LWP::Curl;

$client = LWP::Curl->new();
$content = $client->get($url);

if (defined($content)) {
    print "Request successful\n";
    print "Content: ", $content, "\n";
} else {
    print "Request failed\n";
}
The constructor can take a number of arguments, the most interesting ones being:
$client = LWP::Curl->new(
    timeout        => 10,
    headers        => 0,
    user_agent     => 'MyAgent/1.03',
    followlocation => 1,
    maxredirs      => 3,
    auto_encode    => 1,
);
If headers is true, then the return value from get and post will include the HTTP headers from the response. If followlocation is true, then LWP::Curl will follow redirects, up to the number of hops specified in maxredirs. The followlocation option is arguably redundant, as you could just set maxredirs to 0 to disable redirects, which is what HTTP::Tiny and others do.
The following shows a basic POST request:
use LWP::Curl;

$client = LWP::Curl->new();
$content = $client->post($url, { x => 7, y => 13 });

if (defined($content)) {
    print "Request successful\n";
    print "Content: ", $content, "\n";
} else {
    print "Request failed\n";
}
The style of interface falls between LWP and Net::HTTP::Tiny. It does support https, but you can't provide HTTP headers to include in the request, and it doesn't support cookies.
I've submitted a number of fixes and changes to this module, which Lindolfo was very quick to act on, and he's given me co-maint. I may come back and do some more work on this, as I like the combination of a simple interface with Curl's performance.
LWP::Simple provides a very simple interface to LWP. It exports 5 functions, the most commonly used of which is probably the get function, which makes a GET request for the specified URL:
use LWP::Simple;

$content = get($url);

if (defined($content)) {
    print "Content: ", $content, "\n";
} else {
    print "Failed to get content, can't tell you why\n";
}
There is no way to get hold of the HTTP response code, but if you just want to get the contents, and just care about success or failure, then this might serve your needs.
The getstore function takes a URL and a filename; if a GET request to the URL is successful, the contents are written to the file. The function returns the HTTP status code from the response. The mirror function takes the same arguments, but adds an If-Modified-Since header to the GET request, taking the modification time from the file, if it exists.
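For example (LWP::Simple also re-exports is_success and the HTTP::Status constants, which are handy for checking the codes these functions return):

use LWP::Simple;

# fetch and store unconditionally
$code = getstore($url, '/tmp/page.html');
print "getstore failed with status $code\n" unless is_success($code);

# only transfer if the remote copy is newer than the local file
$code = mirror($url, '/tmp/page.html');
print "local copy is already up to date\n" if $code == RC_NOT_MODIFIED;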
There is no function for making a POST request.
Mojo::UserAgent is part of the Mojolicious web framework. When using Mojo::UserAgent, you're actually working with quite a large collection of classes — you have to get your head around quite a lot to use it, even at a basic level.
For example, when you make a GET request, by calling the get() method, you're returned an instance of Mojo::Transaction::HTTP. The documentation for Mojo::Transaction::HTTP is quite light, because it inherits a lot from Mojo::Transaction. It turns out that Mojo::Transaction has a res() method that returns a Mojo::Message::Response object. Aha, you might be thinking, that's the sort of thing you were after.
Here's a simple GET request:
use Mojo::UserAgent;

$ua = Mojo::UserAgent->new(max_redirects => 7);
$tx = $ua->get($url);
$response = $tx->res();

if ($response->code == 200) {
    print "Status: ", $response->code, "\n";
    print "Content: ", $response->body, "\n";
} else {
    my ($message, $code) = $tx->error();
    print "Status: $code\n";
    print "Reason: $message\n";
}
It may be that I'm not thinking "the mojo way" here, but I'm trying to map Mojo::UserAgent into my existing mental model.
It will handle redirects, but you have to specify a positive value for max_redirects, as it defaults to zero (i.e. don't follow redirects). I've submitted an issue suggesting that the default should be something like 5 (five). I got a quick response to this, saying they might make the change for the next major version (4), but don't want to break things with the current major version.
Here's how to make a simple POST request:
use Mojo::UserAgent;

$ua = Mojo::UserAgent->new();
$tx = $ua->post_form($url => { x => 7, y => 13 });
$response = $tx->res();

if ($response->code == 200) {
    print "Status: ", $response->code, "\n";
    print "Content: ", $response->body, "\n";
} else {
    my ($message, $code) = $tx->error();
    print "Status: $code\n";
    print "Reason: $message\n";
}
Mojo::UserAgent handles all HTTP methods, https, and supports cookies. It's part of a comprehensive collection of classes, very reminiscent of LWP. So much so I found myself wondering why they didn't just use LWP. There are a bunch of additional things in there I haven't looked at though, so I'm guessing there are reasons why.
If you're already using Mojolicious, then this is a good choice. If you're not using Mojolicious, then I can't think of a good reason why you'd use this over one of the other choices.
Net::Curl provides an interface to libcurl, the library which underlies the widely used curl utility. Even though the distribution is Net::Curl, the module you actually use is Net::Curl::Easy. It's a low-level interface — you write quite a lot of code just to make a simple request:
use Net::Curl::Easy qw(:constants);

$curl = Net::Curl::Easy->new();
$curl->setopt(CURLOPT_URL, $url);
$curl->setopt(CURLOPT_FOLLOWLOCATION, 1);
$curl->setopt(CURLOPT_MAXREDIRS, 5);
$curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);
$curl->setopt(CURLOPT_WRITEDATA, \$response_body);
$curl->setopt(CURLOPT_USERAGENT, "Net::Curl/$Net::Curl::VERSION");

eval { $curl->perform(); };

if ($@) {
    die "Request failed: $@\n";
} else {
    print "Status: ", $curl->getinfo(CURLINFO_HTTP_CODE), "\n";
    print "Content: ", $response_body, "\n";
}
The example above shows how you configure it to follow redirects for up to 5 hops.
On failure, most of the methods throw a Net::Curl::Easy::Code error object. This is a dual-var, having both an integer and a string value. There are CURLE_* symbols defined for all errors, for example:
if (ref($@) eq 'Net::Curl::Easy::Code') {
    if ($@ == CURLE_TOO_MANY_REDIRECTS) {
        die "too many redirect hops: I gave up!\n";
    } else {
        die "request failed: $@\n";
    }
}
This module clearly provides a lot of low-level features for controlling and getting feedback on the request cycle, but I didn't find the documentation very helpful. It relies on you understanding the underlying C library. For example, getinfo retrieves one of the many defined pieces of information, but the documentation doesn't list all of the supported values. The distribution includes a number of examples in Net::Curl::examples, but I didn't find them particularly helpful either.
The following shows a POST request. This took me a while to work out, though now when I look at it I wonder why.
use Net::Curl::Easy qw(:constants);

$body = 'x=7&y=13';

$curl = Net::Curl::Easy->new();
$curl->setopt(CURLOPT_URL, $url);
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_POSTFIELDS, $body);
$curl->setopt(CURLOPT_WRITEDATA, \$response_body);

eval { $curl->perform(); };

if ($@ || $curl->getinfo(CURLINFO_HTTP_CODE) != 200) {
    die "POST failed\n";
} else {
    print "Content: ", $response_body, "\n";
}
Net::Curl handles all HTTP methods and redirects. It can handle https requests, but if you use the GET example code above, you'll get an error message something like:
Request failed: Peer certificate cannot be authenticated with given CA certificates
The documentation for Net::Curl says nothing about this, but when I hit a similar issue with WWW::Curl I spent some time looking at libcurl to work out the issue. See the section on WWW::Curl below for the solution.
I initially failed to install Net::Curl. CPAN complained that Net::Curl depended on ExtUtils::PkgConfig, which it couldn't install because pkg-config wasn't available. I tried to install pkg-config, but configure complained that it needed glib, and when trying to install glib, configure bombed out complaining that pkg-config wasn't found. The README explained that you could configure for curl manually, so I tried that.
Running make test failed, with each test aborting with an error message:
Symbol not found: _ERR_remove_thread_state
After searching online, I discovered that this was down to curl being compiled against the wrong version of openssl. My Mac comes with curl, but I had compiled the most recent curl and installed it in /usr/local. I rebuilt curl, pulling in the right version of openssl, then manually configured Makefile.PL to use my version of curl rather than the default system one. Successful install at last!
I installed openssl in /usr/local, so the final configure for libcurl was:
% ./configure --prefix=/usr/local --with-ssl=/usr/local
Then the %curl hash in Makefile.PL for Net::Curl would look something like (I have just installed curl 7.26.0):
my %curl = (
    incdir  => '/usr/local/include',
    cflags  => '-I/usr/local/include',
    libs    => '-L/usr/local/lib -lcurl',
    version => '7.26.0',
);
Overall, this module appears comprehensive and stable. If you need fast requests, or if you're already familiar with libcurl, then this might be a good choice. For internal use I might consider this module, but if I were releasing a module to CPAN I wouldn't want to introduce a dependency on a C library.
Net::Curl::Simple is built on top of Net::Curl::Easy, and provides a simpler interface. By default it works asynchronously:
use Net::Curl::Simple;
use Net::Curl::Easy qw(:constants);

$curl = Net::Curl::Simple->new();
$curl->get($url, \&finished);
1 while Net::Curl::Simple->join;

sub finished {
    my $curl = shift;

    if ($curl->code == 0) {
        my $status = $curl->getinfo(CURLINFO_HTTP_CODE);
        print "HTTP status: $status\n";
        print "Content: ", $curl->content, "\n" if $status =~ /^2/;
    } else {
        print "request failed\n";
    }
}
When using the getinfo method, you can either pass string names, or you can import the constants from Net::Curl::Easy. I think Net::Curl::Simple should provide these constants for you.
The following shows making a simple POST request:
use Net::Curl::Simple;
use Net::Curl::Easy qw(:constants);

$curl = Net::Curl::Simple->new();
$curl->post($url, { x => 7, y => 13 }, \&finished);
1 while Net::Curl::Simple->join;

sub finished {
    my $curl = shift;

    if ($curl->code == 0) {
        my $status = $curl->getinfo(CURLINFO_HTTP_CODE);
        print "HTTP status: $status\n";
        print "Content: ", $curl->content, "\n" if $status =~ /^2/;
    } else {
        print "request failed\n";
    }
}
With version 0.13, running the above examples results in an error message:
Attempt to free unreferenced scalar: SV 0x7fc139909528 during global destruction.
You can use the module in a synchronous mode, by passing undef for the callback parameter:
$curl->get($url, undef);
if ($curl->code == 0) {
In addition to GET, the module also supports HEAD, POST and PUT. If your libcurl was compiled with the right options, then Net::Curl::Easy supports IPv6 and SSL. You can get at most (all?) of the full power of Curl, so this module does effectively support cookies, though the documentation doesn't mention how you do this.
When I first installed Net::Curl::Simple and tried to use it, I got a warning:
Please rebuild libcurl with AsynchDNS to avoid blocking DNS requests
To do async DNS lookups curl needs the c-ares library. I installed it with the following:
% ./configure --prefix=/usr/local
% make
% make install
And then rebuilt curl (again!) with the following:
% ./configure --prefix=/usr/local --with-ssl=/usr/local --enable-ares=/usr/local
% make
% make install
Which seemed to satisfy Net::Curl::Simple.
Net::HTTP is a low-level module which represents an HTTP connection — it's a subclass of IO::Socket::INET. On top of the IO::Socket methods, it provides a number of HTTP-specific methods. Given the low-level nature you can achieve most things you might want to, but you will write a lot more code than with other modules.
use Net::HTTP;
use URI;

$uri = URI->new($url);
%headers = ('User-Agent' => "Net::HTTP/$Net::HTTP::VERSION");

$client = Net::HTTP->new(Host => $uri->host) || die $@;
$client->write_request(GET => $uri->path, %headers);
($code, $mess, %headers) = $client->read_response_headers;

if ($code =~ /^2/) {
    print "Status: $code\n";
    while (1) {
        my $buf;
        my $nread = $client->read_entity_body($buf, 1024);
        die "read failed: $!" unless defined $nread;
        last unless $nread;
        print $buf;
    }
} else {
    print "Request failed\n";
    print "Status: $code\n";
    print "Response: $mess\n";
}
The following shows how to make a simple POST request:
use Net::HTTP;
use URI;

$uri = URI->new($url);
$body = 'x=7&y=13';
%headers = ('Content-Type' => 'application/x-www-form-urlencoded');

$client = Net::HTTP->new(Host => $uri->host) || die $@;
$client->write_request(POST => $uri->path, %headers, $body);
($code, $mess, %headers) = $client->read_response_headers;

if ($code =~ /^2/) {
    print "Status: $code\n";
    while (1) {
        my $buf;
        my $nread = $client->read_entity_body($buf, 1024);
        die "read failed: $!" unless defined $nread;
        last unless $nread;
        print $buf;
    }
} else {
    print "Request failed\n";
    print "Status: $code\n";
    print "Response: $mess\n";
}
Given the low-level design, Net::HTTP doesn't support cookies, doesn't transparently handle redirects, and doesn't handle https.
I'm not going to go into further detail. If you need complete control over your HTTP requests, this might be the module for you (LWP builds on it, amongst others), but for most people I suspect it's just too low-level.
Net::HTTP::Tiny is a new module that bills itself as a "minimal HTTP client". It provides a single function for making GET requests, which returns the body on success, and dies on failure.
use Net::HTTP::Tiny qw(http_get);

eval { $content = http_get($url); };

if (not $@) {
    print "Content: $content\n";
} elsif ($@ =~ m!^HTTP error: 4\d\d!) {
    print "let's retry!\n";
} else {
    print "failed to get $url - $@\n";
}
An HTTP status code of 200 is considered success; all other status codes are treated as failure. There are many other reasons why http_get might fail. If you want to check for certain 5xx codes to decide on a retry strategy, you can match against $@:
eval { $content = http_get($url); };

if (not $@) {
    # success
} elsif ($@ =~ m!^HTTP error: 5\d\d!) {
    # retry
} else {
    # failed
}
(yes, I know that a blanket retry on 5xx is a bad idea).
This really is a minimalist module. It will follow redirects up to a hard-coded limit of 5 hops, and will support IPv6 if IO::Socket::IP is installed. But that's about it. From email with Zefram, he's considering https support and a means of configuring the number of redirect hops, but is waiting to see what users want. I'd consider switching to this for some of my modules if it supported https.
I think the module is mis-named: Net::HTTP::Tiny suggests it is related to Net::HTTP, so for doing low-level HTTP requests. By analogy with HTTP::Tiny and other ::Tiny modules, I think HTTP::Minimal would be a better name.
One of the benefits of this module is the minimal number of dependencies. If you're writing a module which just needs to make a simple GET request, and want to minimise your dependencies (either for ease of distribution or to keep footprint and runtime down), then this might be a good choice.
URI::Fetch provides a single function, which is accessed as a class method. At its simplest, this will GET the specified URI for you, using LWP:
use URI::Fetch;

$response = URI::Fetch->fetch($url);

if (defined($response)) {
    print "Status: ", $response->http_status, "\n";
    print "Content: ", $response->content, "\n";
} else {
    print "Request failed\n";
}
By default, if successful it will return a URI::Fetch::Response, and undef if the underlying GET request returned anything other than a 200 status code. You can change this behaviour with the ForceResponse option, which says that a response object should always be returned.
The response object provides some methods in common with LWP's HTTP::Response, and holds an instance of the latter, which you can access with the http_response method:
use URI::Fetch;

$response = URI::Fetch->fetch($url, ForceResponse => 1);

if ($response->http_response->is_success) {
    print "Status: ", $response->http_response->status_line, "\n";
    print "Content: ", $response->http_response->content, "\n";
} else {
    print "Status: ", $response->http_response->status_line, "\n";
}
One of the things which sets this module apart from most of the others here is its support for caching. When fetching a URL, you can pass a cache object, which has to support the Cache interface. If the remote resource is already in the cache, then fetch will add an If-Modified-Since header to the request. If the remote resource has been modified since the cache was last updated, fetch will get a 200 response, store the contents in the cache, and return a 200 response to you, along with the contents. If the remote resource hasn't been modified, fetch will receive a 304 response; it will then get the contents from the cache, and return a 304 to you along with the contents (usually a 304 doesn't have a body).
use URI::Fetch;
use Cache::File;

$cache = Cache::File->new(cache_root => '/tmp/cache');
$response = URI::Fetch->fetch($url, ForceResponse => 1, Cache => $cache);

# if ($response->is_success || $response->http_status == 304) {
if ($response->is_success) {
    print "Status: ", $response->http_status, "\n";
    print "Content: ", $response->content, "\n";
} else {
    print "Request failed (", $response->http_status, ")\n";
}
Rather than use the 304 code explicitly, you could use HTTP::Status (part of the HTTP-Message dist), and refer to the status symbolically:
use HTTP::Status;

if ($response->is_success || $response->http_status == HTTP_NOT_MODIFIED) {
Given the nature of the module, I think that in this situation is_success should return true, and have submitted feedback to that effect.
You can add the NoNetwork option to further control whether the remote resource is even requested. If set to 0, then fetch() works as described above. If set to 1, then the remote server isn't contacted; if it's in the cache you'll get it. A value of N (greater than 1) tells fetch() not to make the HTTP request if the cache was updated within the last N seconds.
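For example, to avoid hitting the network at all if the cached copy is less than an hour old:

use URI::Fetch;
use Cache::File;

$cache = Cache::File->new(cache_root => '/tmp/cache');
$response = URI::Fetch->fetch($url,
    Cache         => $cache,
    ForceResponse => 1,
    NoNetwork     => 3600,   # don't contact the server if the cache was updated in the last hour
);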
This module handles https and redirects, but doesn't handle anything other than GET requests, and doesn't provide support for cookies.
There are a few more options that I haven't mentioned, see the documentation for more on those.
This module should really be called URL::Fetch rather than URI::Fetch (there are plenty of resources online about the difference between a URI and a URL).
As the name suggests, URL::Grab is meant for situations where you just want to grab the contents of a URL, and don't need much more than that. The simplest usage is the grab_single() method:
use URL::Grab;

$grabber = URL::Grab->new();
$result = $grabber->grab_single($url);

if (defined($result)) {
    print "Request successful - content:\n";
    print $result->{$url}, "\n";
} else {
    print "Request failed - can't tell you why\n";
}
Under the hood this is using LWP, so it will handle redirects and https, but you can't get at the HTTP status code, or the resulting URL if your original URL was redirected.
You can request multiple URLs in one go with grab():
$result = $grabber->grab($url1, $url2);

foreach my $url ($url1, $url2) {
    if (exists($result->{$url}->{$url})) {
        print "$url : successful\n";
    } else {
        print "$url : failed\n";
    }
}
There's no parallelism here: it just requests the URLs sequentially. Notice the curious return data structure: you're returned a hashref, and to get at the contents of the URL you use:
$retval->{ $url }->{ $url };
To check if the request for a particular URL failed:
if (defined($retval->{ $url })) {
The grab_failover() method takes a list of URLs, and calls grab_single() on them in turn. As soon as one is successful, it stops and returns the result from grab_single(). This would be useful if you were regularly grabbing a file which is mirrored on a number of sites (like CPAN), and wanted to cycle through a list of mirrors to try and ensure you get it every time:
$result = $grabber->grab_failover($url1, $url2);

if (defined($result)) {
    if (exists($result->{$url1})) {
        print "Got url1\n";
    } else {
        print "Got url2\n";
    }
} else {
    print "Request failed\n";
}
This illustrates the problem with the strange return value. You either have to try each key in turn, as above, or do something ugly like:
$content = (values %$result)[0];
The whole point of trying a number of alternatives is that you don't care which one succeeds, you just want the contents.
The grab_mirrorlist() method is a strange beast. If you pass a list of URLs, it will try each in turn, but the returned hashref will only include results for the last URL in the list. If you pass a reference to an array of URLs, it will pass the referenced array of URLs to grab_failover() (described above), so you'll get the result from the first successful URL. This means that if you wrote the following:
$result = $grabber->grab_mirrorlist($url1, [$url2, $url3], $url4);
This would request $url1, then $url2 (and, assuming that succeeded, skip $url3), and finally $url4, returning the result from grab_single() on $url4. I don't really see the scenario where you'd want grab_mirrorlist().
This feels like the rough-cut of a useful module that hasn't been finished yet. In the synopsis the author appears to apologise for the quirky interface, saying that it can't be changed now. I'd clean up the interface with new method names, and support the old interface (but deprecated) for backwards compatibility.
URL::Grab handles https and redirects, but doesn't provide support for cookies, or for any HTTP method other than GET. All of which is in line with the design of the module.
Web::Magic (WM hereafter) provides a slightly quirky interface. At its simplest you can use it to make HTTP requests:
use Web::Magic;

$magic = Web::Magic->new(GET => $url);
$magic->User_Agent("Web::Magic/$Web::Magic::VERSION");

if ($magic->response->is_success) {
    print "Status: ", $magic->response->code, "\n";
    print "Content: ", $magic->content, "\n";
} else {
    print "Status: ", $magic->response->code, "\n";
    print "Reason: ", $magic->response->message, "\n";
}
The following shows how to make a simple POST request:
use Web::Magic;

$magic = Web::Magic->new(POST => $url);
$magic->set_request_body({ x => 7, y => 13 });

if ($magic->response->is_success) {
    print "Status: ", $magic->response->code, "\n";
    print "Content: ", $magic->content, "\n";
} else {
    print "Status: ", $magic->response->code, "\n";
    print "Reason: ", $magic->response->message, "\n";
}
WM doesn't make the request from the constructor, but delays until it has to: in the above that's when you try and access the content. This means you can call a number of methods between the constructor and content(), to configure the request. To set headers on the HTTP request you can either call set_request_header(), or call a method named after the header, with underscores instead of dashes.
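For example, these two calls should be equivalent ways of adding an Accept header to a pending request:

$magic->set_request_header('Accept' => 'application/json');
$magic->Accept('application/json');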
In general, you don't need to worry about when the request is made, but you can call do_request() explicitly, should you want to.
Web::Magic handles redirects and https, and can handle cookies (it uses LWP, and you can pass your own instance of LWP::UserAgent with the user_agent method).
There are lots of post-request methods, and this is where Web::Magic stands apart from the other modules. The request itself is made with LWP, and you can use the response() method to get the HTTP::Response object returned by LWP.
The to_dom method parses the response body as XML or HTML, based on the Content-Type response header, and returns the result as an XML::LibXML::Document. You can also get the response body as JSON or YAML, or as an RDF::Trine::Model, whatever that is. It does plenty more besides.
Web::Magic depends on a lot of modules: installing it seemed to take a long time. Furthermore, it uses them all at compile time, where many of them could be loaded only if needed. It's a strange beast: personally I'm not a fan of the kitchen sink school of module design, but if you want to process a remote file as RDF, or one of the other high-level operations it supports, then this module might be just the thing for you.
WWW::Curl is another module on top of libcurl. But where LWP::Curl tries to give you a high-level API on top of libcurl, WWW::Curl just maps the C interface into Perl:
use WWW::Curl::Easy;

$curl = WWW::Curl::Easy->new();
$curl->setopt(CURLOPT_URL, $url);
$curl->setopt(CURLOPT_FOLLOWLOCATION, 1);
$curl->setopt(CURLOPT_MAXREDIRS, 7);
$curl->setopt(CURLOPT_WRITEDATA, \$content);

$ccode = $curl->perform();

if ($ccode == 0) {
    $status = $curl->getinfo(CURLINFO_HTTP_CODE);
    if ($status == 200) {
        print "Status: $status\n";
        print "Content: ", $content, "\n";
    } else {
        print "Status: $status\n";
    }
} else {
    print "Request failed\n";
    print "An error happened: $ccode ".$curl->strerror($ccode)." ".$curl->errbuf."\n";
}
You call methods to configure a request you want to make, then call perform() to actually make the request.
The following shows how to make a POST request:
use WWW::Curl::Easy;

$body = 'x=7&y=13';

$curl = WWW::Curl::Easy->new();
$curl->setopt(CURLOPT_URL, $url);
$curl->setopt(CURLOPT_POST, 1);
$curl->setopt(CURLOPT_POSTFIELDS, $body);
$curl->setopt(CURLOPT_WRITEDATA, \$content);

$ccode = $curl->perform();

if ($ccode == 0) {
    $status = $curl->getinfo(CURLINFO_HTTP_CODE);
    if ($status == 200) {
        print "Content: ", $content, "\n";
    } else {
        print "Status: $status\n";
    }
} else {
    print "Request failed\n";
    print "An error happened: $ccode ".$curl->strerror($ccode)." ".$curl->errbuf."\n";
}
The documentation for this module is fairly thin - it refers you to the curl documentation online. The documentation also suggests that you use LWP for most situations where you're working with HTTP, but that WWW::Curl may be a better choice where speed is important, or where you want to make multiple requests in parallel.
WWW::Curl handles redirects, and also supports cookies. If you request an https URL using the GET example above, you'll get an error message something like the following:
An error happened: 60 Peer certificate cannot be authenticated with given CA certificates
You can tell Curl where your CA certificates are (see the CURLOPT_CAINFO and CURLOPT_CAPATH options in the libcurl documentation), or you can tell it not to worry about verifying certificates:
$curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);
WWW::Curl also provides some other interfaces, such as WWW::Curl::Multi (used for making requests in parallel) and WWW::Curl::Form.
WWW::Curl::Simple is built on top of WWW::Curl (see previous section) and provides a simpler interface. There are two ways to use WWW::Curl::Simple; the first is to make a single GET or POST request, in a blocking mode. Here's the way to make a single GET request:
use WWW::Curl::Simple;

$curl = WWW::Curl::Simple->new(max_redirects => 5, check_ssl_certs => 0);
$response = $curl->get($url);

if ($response->is_success) {
    print "Status: ", $response->code, "\n";
    print "Content: ", $response->content, "\n";
} else {
    print "request failed\n";
    print "  code    = ", $response->code, "\n";
    print "  message = ", $response->message, "\n";
    print "Content: ", $response->content, "\n";
}
The get and post methods return an HTTP::Response object (which used to be part of the LWP distribution, but is now part of the HTTP-Message distribution).
The following shows how to make a simple POST request:
use WWW::Curl::Simple;

$curl = WWW::Curl::Simple->new();
$response = $curl->post($url, 'x=7&y=13');

if ($response->is_success) {
    print "Status: ", $response->code, "\n";
    print "Content: ", $response->content, "\n";
} else {
    print "request failed\n";
    print "  code    = ", $response->code, "\n";
    print "  message = ", $response->message, "\n";
}
WWW::Curl::Simple doesn't follow redirects and can't handle https requests. I've submitted a patch to address both points. It can't handle cookies, because you can't get at the underlying Curl handle in order to pass the appropriate options.
The other way you can use this module is to make multiple requests in parallel. This is built on top of WWW::Curl::Multi.
While this undoubtedly provides a simple API to Curl, it is much slower than the other Curl-based modules (see the performance comparison below).
The following table summarises the capabilities of each module.
Module | GET | POST | DELETE | PUT | HTTPS | HTTP11 | redirects | cookies |
---|---|---|---|---|---|---|---|---|
Furl | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
HTTP::Client | ✓ | ✓ | ||||||
HTTP::GHTTP | ✓ | ✓ | ✓ | ✓ | ✓ | |||
HTTP::Lite | ✓ | ✓ | ✓ | |||||
HTTP::MHTTP | ✓ | ✓ | ✓ | ✓ | ||||
HTTP::Tiny | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
LWP | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
LWP::Curl | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
LWP::Simple | ✓ | ✓ | ✓ | ✓ | ||||
Mojo::UserAgent | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Net::Curl | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Net::Curl::Simple | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Net::HTTP | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Net::HTTP::Tiny | ✓ | ✓ | ✓ | |||||
URI::Fetch | ✓ | ✓ | ✓ | ✓ | ||||
URL::Grab | ✓ | ✓ | ✓ | ✓ | ||||
Web::Magic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
WWW::Curl | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
WWW::Curl::Simple | ✓ | ✓ | ✓ | ✓ | ✓ |
The following shows the results of benchmarking the modules for making GET requests. I ran two separate benchmarks: one for a very small file (14 bytes), and another for a much larger file (100K). I made 10,000 requests for both benchmarks.
I was surprised to see WWW::Curl::Simple performing so badly, given it's based on libcurl. I'd put it on my list of things to look into, but having looked at dependencies (see the next section), I suspect that Moose is the culprit. At some point I'll see whether switching to Mouse would help things. I also need to look into why HTTP::Client degrades so badly with larger files, given most of the work is done by HTTP::Lite, which doesn't suffer in the same way.
The following shows the results of benchmarking the relevant modules for making simple POST requests. This was basically the example POST request shown for each module above; again, I made 10,000 requests with each module.
Simple POST requests:

Module | Time (s) |
---|---|
HTTP::GHTTP | 0.8 |
Net::Curl | 0.9 |
WWW::Curl::Easy | 0.9 |
LWP::Curl | 1.3 |
Furl | 2.2 |
Net::HTTP | 5.3 |
HTTP::Tiny | 7.4 |
Net::Curl::Simple | 10.0 |
Mojo::UserAgent | 13.3 |
LWP | 14.4 |
Web::Magic | 17.2 |
HTTP::Lite | 17.8 |
WWW::Curl::Simple | 56.8 |
A shame that the fastest module (HTTP::GHTTP) is no longer maintained.
If you're writing your own module which is making HTTP requests, then it may be important to you how many dependencies the HTTP request has. This may be for performance reasons, or to minimise the likelihood of your module not installing / working due to downstream dependencies.
The following table is an indication of the number of modules loaded by each of the modules under review:
Module | # dependencies |
---|---|
WWW::Curl | 6 |
HTTP::GHTTP | 8 |
Net::Curl | 9 |
HTTP::MHTTP | 10 |
HTTP::Lite | 11 |
HTTP::Client | 12 |
HTTP::Tiny | 20 |
LWP::Curl | 21 |
Net::HTTP::Tiny | 28 |
Furl | 29 |
Net::HTTP | 35 |
Net::Curl::Simple | 41 |
LWP | 69 |
URL::Grab | 71 |
LWP::Simple | 79 |
URI::Fetch | 84 |
Mojo::UserAgent | 109 |
WWW::Curl::Simple | 153 |
Web::Magic | 159 |
The problem with the dependency / pre-requisite information for modules is that it can include build or test dependencies, rather than runtime dependencies. The figures were generated by running the GET requests shown for each module, then running the following:
$ndeps = int(keys %INC) - 1;
The -1 is because the module itself appears in %INC as well.
I was expecting Web::Magic to come in last, but was surprised to see WWW::Curl::Simple close on its tail. WWW::Curl::Simple uses Moose, which adds 111 to the tally right off the bat.
The top four depend on external C libraries, as do all of the Curl-based modules.
While thinking about dependencies, I got distracted looking at modules for collecting dependency information. For the first update to this review I'll probably distinguish between core and non-core dependencies, and the number of dependent distributions as well.
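As a rough sketch of how the core/non-core split might be counted (using Module::CoreList's first_release to decide whether a module ships with perl):

use Module::CoreList;

my ($core, $noncore) = (0, 0);
foreach my $file (keys %INC) {
    (my $module = $file) =~ s{/}{::}g;   # e.g. HTTP/Tiny.pm -> HTTP::Tiny
    $module =~ s{\.pm\z}{};
    if (defined Module::CoreList->first_release($module)) {
        $core++;
    } else {
        $noncore++;
    }
}
print "core: $core  non-core: $noncore\n";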
There's no single module which will be the best to use in all situations. I started writing this section in the format "If you want X and don't care about Y, then use Foo", and quickly realised a graphic would be easier and clearer.
Here's one approach:
In this context, "basic requests" means GET or POST requests, https and transparent handling of redirects.
And a simpler one, which will give just as good answers most of the time:
A final twist: if you're repeatedly requesting the same file, for example in a regularly scheduled job, then you might want to consider URI::Fetch.
A number of modules can make POST requests, but require you to encode the body yourself. I couldn't find a utility module with a function for doing this:
$body = encode_form_data(username => $user, password => $pword);
Does anyone know of a lightweight module that provides such a function? If not I'll suggest to Gisle that LWP's code be pulled into a separate dist.
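In the meantime, here's roughly what such a function might look like, hand-rolled on top of URI::Escape (a sketch, not an existing module):

use URI::Escape qw(uri_escape_utf8);

# encode a list of key/value pairs as an application/x-www-form-urlencoded body
sub encode_form_data {
    my @pairs;
    while (@_) {
        my ($key, $value) = (shift, shift);
        push @pairs, uri_escape_utf8($key) . '=' . uri_escape_utf8($value);
    }
    return join('&', @pairs);
}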
WWW::Curl and Net::Curl are almost identical in basic usage, though WWW::Curl provides a lot of other features. I wonder if these could be merged into a single dist?
When I started work on this review, I found most of the Curl modules (apart from LWP::Curl) quite frustrating to work with, as they assume you're familiar with libcurl. I wasn't. Now I'm a lot more familiar with Curl, and am happy to work with any of the modules. Perhaps one of the benefits of merging WWW::Curl and Net::Curl would be to provide more comprehensive documentation.