CPAN modules for converting markdown to HTML

Neil Bowers

2013-10-30

This is a review of Perl CPAN modules that can be used to convert markdown to HTML. If you're looking for a quick answer, you could skip to the Comparison section, but you can't go too far wrong with Text::Markdown.

The following is a list of the modules I'm aware of so far. Please let me know if I've missed any: neilb at cpan dot org.

Module Doc Version Author # bugs # users Last update
DR::SunDown pod 0.02 Dmitry E. Oboukhov 1 0 2012-08-09
Markdent pod 0.22 Dave Rolsky 0 2 2012-07-23
Markup::Unified pod 0.0401 עידו פרלמוטר (Ido Perlmuter) 2 2 2012-11-22
Text::Markdown pod 1.000031 Tomas Doran 8 38 2010-03-20
Text::Markdown::Discount pod 0.10 Masayoshi Sekimura 1 1 2013-08-09
Text::Markdown::Hoedown pod 0.07 MATSUNO★…Tokuhiro 0 1 2013-10-02
Text::Markup pod 0.18 David E. Wheeler 0 1 2013-06-08
Text::Markup::Any pod 0.03 Masayuki Matsuki 0 1 2013-10-08
Text::MultiMarkdown pod 1.000034 Tomas Doran 0 18 2011-04-26

Each module is presented in turn, with a SYNOPSIS style code sample. Then all the converter modules are compared, and I end up with recommendations. The review ends with a See Also section that lists markdown-related modules that don't meet the criteria for this review.

With each module I convert the following small snippet of markdown:

# Sample markdown

This is a paragraph of text.

  * This is a bullet
  * Another bullet

And here's a code sample

    print "Hello, World!\n";

And inline formatting:
*italic*, **bold**, and **_bold italic_**.

The first thing to consider is whether you want a full HTML document to be generated, or just a fragment that could be embedded. Most of the modules listed here generate fragments rather than complete HTML documents.
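If a module only gives you a fragment, turning it into a complete page is mostly a matter of wrapping it. Here's a minimal sketch of that. wrap_fragment() is a hypothetical helper written for illustration, not part of any module reviewed here, and the HTML5 skeleton is just one reasonable choice:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap an HTML fragment (as returned by most of the
# modules reviewed here) in a minimal HTML5 document.
sub wrap_fragment {
    my ($title, $fragment) = @_;

    # Escape the characters that are special in HTML text.
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    (my $safe_title = $title) =~ s/([&<>"])/$entity{$1}/g;

    return <<"END_HTML";
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>$safe_title</title>
</head>
<body>
$fragment
</body>
</html>
END_HTML
}

my $fragment = "<h1>Sample markdown</h1>\n\n<p>This is a paragraph of text.</p>";
print wrap_fragment('Sample & test', $fragment);
```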

DR::SunDown

DR::SunDown is a wrapper around the sundown C library. Sundown is a fork of the soldout library.

The module provides a markdown2html function, which takes a markdown string and returns an HTML one:

use DR::SunDown;
my $html = markdown2html($markdown);
print $html, "\n";

Which produces the following output:

<h1>Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet</li>
<li>Another bullet</li>
</ul>

<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

This module doesn't offer anything beyond basic conversion; its main advantage is speed.

The underlying sundown library has been frozen since November 2012 and has a lot of outstanding bugs, so I wouldn't recommend using it. According to the freeze notice, GitHub and others are working on a formal definition of markdown and a new parser.

Markdent

Markdent is a toolkit for parsing markdown, which can also be used to convert a markdown document:

use Markdent::Simple::Document;
my $parser = Markdent::Simple::Document->new();
my $html   = $parser->markdown_to_html(
                title    => 'sample document',
                markdown => $markdown
             );
print $html, "\n";

Which produces the following output:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>sample document</title></head><body><h1>Sample markdown
</h1><p>This is a paragraph of text.
</p><ul><li>This is a bullet
</li><li>Another bullet
</li></ul><p>And here's a code sample
</p><pre><code>print "Hello, World!\n";
</code></pre><p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.
</p></body></html>

Note that this is for generating documents, rather than processing snippets of markdown. The documentation says to look at Markdent::Handler::HTMLStream::Fragment if you don't want to produce a complete document. I wrote a simple class using that, and here's the output that resulted:

<h1>Sample markdown
</h1><p>This is a paragraph of text.
</p><ul><li>This is a bullet
</li><li>Another bullet
</li></ul><p>And here's a code sample
</p><pre><code>print "Hello, World!\n";
</code></pre><p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.
</p>
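A minimal sketch of fragment conversion with that handler is below. It is not the class mentioned above, markdown_to_fragment() is a hypothetical name, and it assumes the handler's output parameter accepts an in-memory filehandle:

```perl
use strict;
use warnings;
use Markdent::Parser;
use Markdent::Handler::HTMLStream::Fragment;

# Convert a markdown string to an HTML fragment: no <html>, <head> or <body>.
sub markdown_to_fragment {
    my ($markdown) = @_;

    # The handler writes to a filehandle, so point it at an in-memory string.
    my $html = q{};
    open my $fh, '>', \$html or die "Can't open in-memory filehandle: $!";

    my $handler = Markdent::Handler::HTMLStream::Fragment->new(output => $fh);
    my $parser  = Markdent::Parser->new(handler => $handler);
    $parser->parse(markdown => $markdown);

    close $fh;
    return $html;
}

print markdown_to_fragment("# Sample markdown\n\nThis is a paragraph of text.\n"), "\n";
```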

The documentation says that if you just want to convert markdown to HTML, then look at Text::Markdown.

Markup::Unified

Markup::Unified provides a common interface for converting three lightweight markup languages: markdown, BBCode, and Textile. The conversion of each format is handled by another module; for markdown it is Text::Markdown.

Here's how you convert markdown:

use Markup::Unified;
my $u    = Markup::Unified->new();
my $html = $u->format($markdown, 'markdown');
print $html, "\n";

Which produces the following output:

<h1>Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet</li>
<li>Another bullet</li>
</ul>

<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

If you need to support multiple markup formats, for example in a blogging engine, then this kind of module might be useful.

Text::Markdown

Text::Markdown supports both a functional and an OO interface. For simple conversion of markdown, you can import the markdown() function:

use Text::Markdown qw(markdown);
my $html = markdown($markdown);
print $html, "\n";

Which produces the following output:

<h1>Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet</li>
<li>Another bullet</li>
</ul>

<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

So this can be used to process a snippet of markdown to embed into another document / page.

The OO interface gives you a little bit of control:

use Text::Markdown;
my $parser = Text::Markdown->new(
                  empty_element_suffix => '>',
                             tab_width => 4,
                trust_list_start_value => 1,
             );
my $html   = $parser->markdown($markdown);
print $html, "\n";
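To see what one of these options does, here's a small sketch comparing the default output with empty_element_suffix => '>' on a horizontal rule. It assumes, as in Gruber's original Markdown.pl, that the default suffix is ' />' (XHTML style):

```perl
use strict;
use warnings;
use Text::Markdown;

# "---" on a line of its own (with blank lines around it) is a horizontal rule.
my $markdown = "Above the rule\n\n---\n\nBelow the rule\n";

# Default settings: empty elements are closed XHTML-style, e.g. <hr />
my $xhtml = Text::Markdown->new->markdown($markdown);

# empty_element_suffix => '>' produces HTML-style empty elements, e.g. <hr>
my $html = Text::Markdown->new(empty_element_suffix => '>')->markdown($markdown);

print "Default:\n$xhtml\n";
print "With empty_element_suffix => '>':\n$html\n";
```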

This appears to be the most widely used module, and it's the one that I've been using to date. It does have some outstanding bugs on github, but for 'regular markdown usage', it's fine.

Text::Markdown::Discount

Text::Markdown::Discount is a Perl interface to Discount, a markdown parser written in C. It exports a single function, markdown():

use Text::Markdown::Discount qw(markdown);
my $html = markdown($markdown);
print $html, "\n";

Which produces the following output:

<h1>Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet</li>
<li>Another bullet</li>
</ul>


<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

The discount library also supports a number of extensions, which are described on the discount home page. Here are some of them:

Text::Markdown::Hoedown

Text::Markdown::Hoedown, from the prolific TOKUHIROM, is a wrapper around the hoedown C library. Hoedown is a fork of the sundown library (used in DR::SunDown, described above), which is itself a fork of the soldout library.

The simplest usage is similar to the other modules:

use Text::Markdown::Hoedown;
my $html = markdown($markdown);
print $html, "\n";

Which produces the following output:

<h1>Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet</li>
<li>Another bullet</li>
</ul>

<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

The markdown function can also take options, which are used to enable certain markdown extensions, and control the HTML that is generated.

use Text::Markdown::Hoedown;
my $html = markdown($markdown,
                    extensions   =>   HOEDOWN_EXT_SPACE_HEADERS
                                    | HOEDOWN_EXT_DISABLE_INDENTED_CODE,
                    html_options =>   HOEDOWN_HTML_TOC
                                    | HOEDOWN_HTML_USE_XHTML
                   );
print $html, "\n";

See the documentation for a list of all options. The documentation is a bit thin, though, so you'll have to guess and experiment to work out what each option does.

The module also provides a markdown_toc() function, which will generate an HTML table of contents for a markdown string:

use Text::Markdown::Hoedown;
my $html = markdown_toc($markdown);
print $html, "\n";

Note that if you want to generate a TOC, you'll also need to pass the HOEDOWN_HTML_TOC option to markdown(), as shown above. This adds id attributes to the headers, which the TOC links to.
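Putting the two together, a page with a table of contents followed by the body might be built as in this sketch. It only uses the functions and the HOEDOWN_HTML_TOC constant shown above; the <nav> wrapper and the sample headings are illustrative:

```perl
use strict;
use warnings;
use Text::Markdown::Hoedown;

my $markdown = "# Introduction\n\nSome text.\n\n# Usage\n\nMore text.\n";

# A list of links, one per header.
my $toc = markdown_toc($markdown);

# HOEDOWN_HTML_TOC adds id attributes to the headers, so the
# links in $toc have something to point at.
my $body = markdown($markdown, html_options => HOEDOWN_HTML_TOC);

# Illustrative page layout: TOC first, then the document itself.
print "<nav>\n$toc</nav>\n$body";
```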

This module is fast and appears quite flexible. Once I've worked out what all the options do, this might be a good choice, if you don't mind a module that requires a C compiler.

Text::Markup

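Text::Markup is a parser that can handle a number of formats, converting them to HTML. It works on files rather than strings: you pass parse() the path to a file, and unless you specify a format, it guesses one from the file's extension.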

use Text::Markup;
my $parser = Text::Markup->new();
my $html   = $parser->parse(file => $path);
print $html, "\n";

Which produces the following output:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<h1>Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet
</li>
<li>Another bullet
</li>
</ul>

<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

</body>
</html>

So this seems to be oriented towards generating documents rather than embeddable snippets.
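Because parse() takes a file, converting a markdown string means writing it to a file first. A sketch using the core File::Temp module; passing format => 'markdown' explicitly assumes that's the name Text::Markup uses for its markdown format:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::Markup;

my $markdown = "# Sample markdown\n\nThis is a paragraph of text.\n";

# Text::Markup parses files, so write the string to a temporary file
# (deleted automatically when the program exits).
my ($fh, $path) = tempfile(SUFFIX => '.md', UNLINK => 1);
print {$fh} $markdown;
close $fh or die "Can't close $path: $!";

# The .md suffix should let it guess the format; here it's given explicitly.
my $parser = Text::Markup->new;
my $html   = $parser->parse(file => $path, format => 'markdown');
print $html, "\n";
```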

Text::Markup::Any

Text::Markup::Any provides a single interface to a number of other markup conversion modules. The following shows conversion using Text::Markdown:

use Text::Markup::Any;
my $tma  = Text::Markup::Any->new('Text::Markdown');
my $html = $tma->markup($markdown);
print $html, "\n";

Which produces the following output:

<h1>Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet</li>
<li>Another bullet</li>
</ul>

<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

The markup modules supported are: Text::Markdown, Text::MultiMarkdown, Text::Markdown::Discount, Text::Xatena, and Text::Textile.

Compared with Markup::Unified, this module is a little strange: you have to specify the name of the conversion module rather than the format name (i.e. 'Text::Markdown' instead of 'markdown'), plus it supports two different markdown modules, with no discussion of why you might choose one over the other.

Text::MultiMarkdown

Text::MultiMarkdown converts MultiMarkdown to HTML, and is written by Tomas Doran, who's also the current maintainer of Text::Markdown. MultiMarkdown is a superset of Markdown defined by Fletcher Penney. Since it's a superset, you can use Text::MultiMarkdown as a converter for regular Markdown:

use Text::MultiMarkdown qw(markdown);
my $html = markdown($markdown);
print $html, "\n";

Which produces the following output:

<h1 id="samplemarkdown">Sample markdown</h1>

<p>This is a paragraph of text.</p>

<ul>
<li>This is a bullet</li>
<li>Another bullet</li>
</ul>

<p>And here's a code sample</p>

<pre><code>print "Hello, World!\n";
</code></pre>

<p>And inline formatting:
<em>italic</em>, <strong>bold</strong>, and <strong><em>bold italic</em></strong>.</p>

Unsurprisingly, the output is nearly identical to that produced by Text::Markdown; for this sample, the only difference is the id attribute on the h1 element.
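If you want Text::Markdown-style headers from Text::MultiMarkdown, its heading_ids option should turn the id attributes off. A sketch, assuming heading_ids behaves as described in the module's documentation (on by default):

```perl
use strict;
use warnings;
use Text::MultiMarkdown;

my $markdown = "# Sample markdown\n\nThis is a paragraph of text.\n";

# Default behaviour: headers get an id attribute derived from their text.
my $with_ids = Text::MultiMarkdown->new->markdown($markdown);

# heading_ids => 0 should give plain <h1> tags, as Text::Markdown does.
my $without_ids = Text::MultiMarkdown->new(heading_ids => 0)->markdown($markdown);

print $with_ids, "\n", $without_ids, "\n";
```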

Comparison

Performance

I benchmarked the modules using a longer markdown sample (about 2K) which contains most of the different notations, converting it 10,000 times with each module. I didn't include Text::Markup, as it just uses Text::Markdown under the hood.

Module                    Time (s)
DR::SunDown                   0.35
Text::Markdown::Hoedown       0.44
Text::Markdown::Discount      2.62
Text::Markdown               90.13
Markup::Unified              90.40
Text::Markup::Any            91.39
Text::MultiMarkdown         149.54
Markdent                    348.31

At the moment I'm using Benchmark. I've been meaning to try Dumbbench, and noticed the dist includes Benchmark::Dumb, which is billed as a "Benchmark.pm compatibility layer". But it's not a complete drop-in, so I'll come back to that.
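A sketch of this kind of comparison with the core Benchmark module is below. It's not the script used for the timings above: the sample is much shorter than the 2K file, only three of the converters are included, and any module that isn't installed is skipped:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Short sample document; the timings above used a ~2K file instead.
my $markdown = "# Heading\n\nSome *emphasis* and **strong** text.\n\n  * one\n  * two\n";

# Module name => code that converts $markdown with that module.
# Calls are fully qualified because the modules are loaded with require.
my %candidates = (
    'Text::Markdown'           => sub { Text::Markdown::markdown($markdown) },
    'Text::Markdown::Discount' => sub { Text::Markdown::Discount::markdown($markdown) },
    'Text::Markdown::Hoedown'  => sub { Text::Markdown::Hoedown::markdown($markdown) },
);

# Keep only the converters that are installed.
my %tests;
for my $module (sort keys %candidates) {
    (my $file = "$module.pm") =~ s{::}{/}g;
    if (eval { require $file; 1 }) {
        $tests{$module} = $candidates{$module};
    }
    else {
        warn "Skipping $module: not installed\n";
    }
}

# Run each converter 10,000 times and print a comparison chart.
cmpthese(10_000, \%tests) if %tests;
```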

Dependencies

The following table shows the number of run-time dependencies for each module, when running the example code given for each module above.

Module                    # dependencies
Text::Markdown::Discount               4
Text::Markdown::Hoedown                7
DR::SunDown                            8
Text::Markup                          24
Text::Markup::Any                     26
Text::Markdown                        26
Text::MultiMarkdown                   27
Markup::Unified                       43
Markdent                             299

Note that the first three modules are all based on C libraries, so while they report far fewer dependencies, they require the relevant C library and a C compiler.
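One rough way to approximate a run-time dependency count is to compare %INC before and after loading a module. In this sketch, count_loaded() is a hypothetical helper. It counts every file that gets loaded, core modules included, so it won't necessarily reproduce the numbers above. Run it once per module in a fresh process, since anything already loaded isn't counted again:

```perl
use strict;
use warnings;

# Hypothetical helper: how many extra files does loading $module pull in?
# Counts every new entry in %INC (core modules included), minus the module itself.
sub count_loaded {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;

    my %before = %INC;
    require $file;    # dies if the module isn't installed
    my @new = grep { !exists $before{$_} } keys %INC;

    return scalar grep { $_ ne $file } @new;
}

for my $module (@ARGV ? @ARGV : 'Text::Markdown') {
    my $count = eval { count_loaded($module) };
    if (defined $count) {
        printf "%-28s %d\n", $module, $count;
    }
    else {
        print "$module: not installed\n";
    }
}
```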

Correctness & Robustness

I've been building a corpus of markdown samples, which I used to compare the output generated by the different modules. This is similar to Gruber's test suite (zip file), but has smaller files, to make it easier to identify exactly what the differences are.
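A sketch of this kind of comparison: run each installed converter over every sample in a directory and report where the outputs differ. The corpus directory name, the whitespace normalisation, and the use of the alphabetically first installed module as the reference are all assumptions made for this example; normalising whitespace around tags will also hide some genuine differences:

```perl
use strict;
use warnings;

# Converters to compare (fully qualified, since they're loaded with require).
my %converter = (
    'DR::SunDown'              => sub { DR::SunDown::markdown2html($_[0]) },
    'Text::Markdown'           => sub { Text::Markdown::markdown($_[0]) },
    'Text::Markdown::Discount' => sub { Text::Markdown::Discount::markdown($_[0]) },
    'Text::Markdown::Hoedown'  => sub { Text::Markdown::Hoedown::markdown($_[0]) },
);

# Drop any converter whose module isn't installed.
for my $module (keys %converter) {
    (my $file = "$module.pm") =~ s{::}{/}g;
    delete $converter{$module} unless eval { require $file; 1 };
}

# Heuristic normalisation so that layout-only differences are ignored.
sub normalise {
    my ($html) = @_;
    $html =~ s/^\s+|\s+$//g;            # trim
    $html =~ s/\s*(<[^>]+>)\s*/$1/g;    # drop whitespace around tags
    $html =~ s/\s+/ /g;                 # collapse remaining whitespace
    return $html;
}

# Names of the converters whose output differs from the reference
# (the alphabetically first installed converter).
sub differing {
    my ($markdown) = @_;
    my @modules = sort keys %converter;
    return () if @modules < 2;
    my $reference = normalise($converter{$modules[0]}->($markdown));
    return grep { normalise($converter{$_}->($markdown)) ne $reference }
        @modules[1 .. $#modules];
}

my $dir = $ARGV[0] // 'corpus';    # hypothetical directory of *.md samples
for my $path (sort { $a cmp $b } glob "$dir/*.md") {
    open my $fh, '<:encoding(UTF-8)', $path or die "Can't read $path: $!";
    my $markdown = do { local $/; <$fh> };
    close $fh;

    my @diff = differing($markdown);
    print "$path: differs in ", join(', ', @diff), "\n" if @diff;
}
```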

I've mainly focussed on Text::Markdown, DR::SunDown, Text::Markdown::Discount, and Text::Markdown::Hoedown. The differences I've found so far:

Conclusion

Text::Markdown is pretty battle-hardened, but Discount, SunDown and Hoedown are a lot faster. SunDown is no longer maintained, so don't use that.

For basic usage, and if you want a pure-Perl solution, Text::Markdown is the one to go with.

If you want better performance, then look at Discount or Hoedown.
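If you'd like the speed where it's available but still want to run on systems without a C compiler, one option is to choose the converter at run time. A sketch that prefers Text::Markdown::Hoedown and falls back to Text::Markdown; bear in mind the two don't produce identical output for every input:

```perl
use strict;
use warnings;

# Choose a converter at run time: the C-based Hoedown if it's installed,
# otherwise the pure-Perl Text::Markdown.
my $to_html;
if (eval { require Text::Markdown::Hoedown; 1 }) {
    $to_html = sub { Text::Markdown::Hoedown::markdown($_[0]) };
}
elsif (eval { require Text::Markdown; 1 }) {
    $to_html = sub { Text::Markdown::markdown($_[0]) };
}
else {
    die "Neither Text::Markdown::Hoedown nor Text::Markdown is installed\n";
}

print $to_html->("# Sample markdown\n\nThis is a paragraph of text.\n");
```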

If you want to write your own markdown processor, then Markdent looks like your best option.

See Also

This is a list of modules that don't meet the criteria for this review, but which might be of interest, as they're markdown-related.

Catalyst::Plugin::Markdown
A persistent markdown processor for the Catalyst web framework, which uses Text::Markdown.
Dist::Zilla::Plugin::ReadmeAnyFromPod
A Dist::Zilla plugin which will convert POD to a README in one of a number of formats, including markdown.
DocLife
A plack-based document viewer for markdown documents.
Email::Simple::Markdown
Create an email message where the content is specified using markdown, which is then converted to both text and HTML versions (a markdown version of Email::Simple).
HTML-Format
A distribution that contains modules for converting HTML into other formats, including markdown.
HTML::WikiConverter
A framework for converting HTML to various wiki formats. It has a plugin capability, through which it supports a lot of formats. One of these is HTML::WikiConverter::Markdown, which converts HTML to markdown.
Markdown::Pod
Converts markdown into pod.
Pod::Markdown
A subclass of Pod::Parser which converts POD to markdown.
Template::Provider::Markdown
Use the template toolkit with markdown templates instead of HTML.
Mark Allen's wp2md script (on github)
A script that converts Wordpress XML export into a set of markdown files.