CPAN modules for spelling numbers in English

Neil Bowers

2012-08-05

This article is a comparison of CPAN modules for spelling out numbers in words, in English. Here's a summary of the modules covered:

Module                   Doc    Version   Author            # bugs   # users   Last update
Lingua::EN::Inflect      CPAN   1.894     Damian Conway     14       30        2012-06-14
Lingua::EN::Numbers      CPAN   1.04      Neil Bowers       1        9         2012-06-21
Lingua::EN::Nums2Words   CPAN   1.14      Lester Hightower  0        0         2011-10-02
Math::BigInt::Named      CPAN   0.03      Tels              2        0         2007-02-24
Number::Spell            CPAN   0.04      Les Howard        4        0         2000-03-11

I'll look at each module in turn, then present the results of comparing the modules, and finally recommend which module you should use when.

I haven't included modules for other languages in the evaluation, as I don't speak any of them well enough to check the generated word strings. Following the discussion on the modules above, I've included a list of related modules.

Before we start, there are two things you need to be aware of. The first is long and short scales. Read the Wikipedia page on long and short scales if you're really interested, but it's about whether you call 1,000,000,000 a billion or a thousand million. Until 1974 British English followed the latter usage, but now all English-speaking countries use the short scale. (Canada uses the short scale for Canadian English and the long scale for Canadian French. Sacré bleu!)
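To make the difference concrete, here is a minimal sketch in plain Perl of how the two scales name the same powers of ten. The function name scale_name and the small table of names are my own illustration and are not taken from any of the modules reviewed here.

```perl
use strict;
use warnings;

# Names used by both scales (illustrative subset only).
my @names = qw(million billion trillion quadrillion);

# Return the name of 10**$power in the given scale ('short' or 'long').
# Hypothetical helper: only handles multiples of 3 from 6 to 15.
sub scale_name {
    my ($power, $scale) = @_;
    die "unsupported power: $power"
        if $power % 3 || $power < 6 || $power > 15;

    if ($scale eq 'short') {
        # Short scale: a new name for every factor of 1,000.
        return $names[ $power / 3 - 2 ];
    }

    # Long scale: a new name for every factor of 1,000,000,
    # with "thousand" filling the steps in between.
    my $name = $names[ int($power / 6) - 1 ];
    return $power % 6 ? "thousand $name" : $name;
}

for my $power (6, 9, 12) {
    printf "10^%-2d  short: %-10s long: %s\n",
        $power, scale_name($power, 'short'), scale_name($power, 'long');
}
```

So 10^9 is "billion" in the short scale and "thousand million" in the long scale, while the long-scale "billion" is 10^12.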

The second is that there are differences in how numbers are spelled out. These are often referred to as American English versus British English styles, but it's not really so clear-cut. Of course there's a Wikipedia page on English numerals, but here are some key examples:

Number   British English              American English
0.1      nought point one             zero point one
-1       minus one                    negative one
117      one hundred and seventeen    one hundred seventeen
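The "and" difference in the last row is easy to show in code. The following is a rough sketch in plain Perl that spells integers from 0 to 999 with an optional "and"; spell_small and its and option are hypothetical names of mine, not the API of any module in this review.

```perl
use strict;
use warnings;

my @ones = qw(zero one two three four five six seven eight nine ten
              eleven twelve thirteen fourteen fifteen sixteen
              seventeen eighteen nineteen);
my @tens = ('', '', qw(twenty thirty forty fifty sixty seventy eighty ninety));

# Spell an integer from 0 to 999. Pass and => 1 for the British-style
# "and" after the hundreds; the default omits it (American style).
sub spell_small {
    my ($n, %opt) = @_;
    die "out of range: $n" if $n < 0 || $n > 999;
    my ($hundreds, $rest) = (int($n / 100), $n % 100);

    # Spell the last two digits, hyphenating compounds like forty-six.
    my $tail;
    if ($rest < 20) {
        $tail = $ones[$rest];
    }
    else {
        $tail = $tens[ int($rest / 10) ];
        $tail .= '-' . $ones[ $rest % 10 ] if $rest % 10;
    }

    return $tail unless $hundreds;
    my $head = "$ones[$hundreds] hundred";
    return $head unless $rest;
    return $head . ($opt{and} ? ' and ' : ' ') . $tail;
}

print spell_small(117, and => 1), "\n";   # one hundred and seventeen
print spell_small(117), "\n";             # one hundred seventeen
```

A real module also has to decide where "and" goes in larger numbers (for example "one hundred and one thousand and one"), which this sketch does not attempt.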

Lingua::EN::Inflect

Lingua::EN::Inflect provides a range of functions for converting singular forms of words into plural, and various other things as well. The two functions relevant for this review are NUMWORDS() and ORD(). The following shows basic usage:

use Lingua::EN::Inflect qw(NUMWORDS ORD);

$age = 46;
print "I am ", NUMWORDS($age), " years old.\n";
print "I've had my ", NUMWORDS(ORD($age)), " birthday.\n";

Which will display:

I am forty-six years old.
I've had my forty-sixth birthday.

The ORD() function takes a number and returns the ordinal form in digits, so ORD(13) returns "13th". If NUMWORDS() is passed an ordinal in that form, it spells out the ordinal in words ("thirteenth").

The following is a list of numbers I'll try with each module in turn:

-3.14         minus three point one four
0             zero
0.7           zero point seven
+4            plus four
67            sixty-seven
123           one hundred and twenty-three
101001        one hundred and one thousand and one
1234567       one million, two hundred and thirty-four thousand, five hundred and sixty-seven
10101010101   ten billion, one hundred and one million, ten thousand, one hundred and one
6.02e23       six point zero two two three

As you can see, it doesn't support exponential notation (as noted in the documentation), but does just fine on the others.

The NUMWORDS function also takes a number of options, as a hash following the number.

group:
says that the number should be split into groups of digits, with each group converted separately. So NUMWORDS(1997, group => 2) will return "nineteen, ninety-seven".
zero:
by default, NUMWORDS will convert 0 to 'zero', but if you want 'nought', you can specify this. So NUMWORDS(0.12, zero => 'nought') will return "nought point one two".
one:
can be used to specify an alternative to 'one' for 1, such as 'a single'.
and:
used to specify whether you want 'and' included in the generated words. NUMWORDS(123) will return 'one hundred and twenty-three'. NUMWORDS(123, and => '') will return 'one hundred twenty-three'.
dot:
used to override the word used for the decimal point, defaulting to 'point'.
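The idea behind the group option can be sketched without the module: split the digits into groups from the left and spell each group on its own. This is only an illustration of the concept, not how Lingua::EN::Inflect implements it; spell_grouped and spell_two_digits are hypothetical helpers, and leading zeros within a group are not handled.

```perl
use strict;
use warnings;

my @ones = qw(zero one two three four five six seven eight nine ten
              eleven twelve thirteen fourteen fifteen sixteen
              seventeen eighteen nineteen);
my @tens = ('', '', qw(twenty thirty forty fifty sixty seventy eighty ninety));

# Spell a value from 0 to 99.
sub spell_two_digits {
    my ($n) = @_;
    return $ones[$n] if $n < 20;
    my $word = $tens[ int($n / 10) ];
    $word .= '-' . $ones[ $n % 10 ] if $n % 10;
    return $word;
}

# Split the digits into groups of $size, starting from the left, and
# spell each group separately, joining the results with commas.
sub spell_grouped {
    my ($number, $size) = @_;
    die "only group sizes 1 and 2 are handled here"
        unless $size == 1 || $size == 2;
    my @groups = $number =~ /(\d{1,$size})/g;
    return join ', ', map { spell_two_digits($_) } @groups;
}

print spell_grouped(1997, 2), "\n";   # nineteen, ninety-seven
print spell_grouped(1997, 1), "\n";   # one, nine, nine, seven
```

Grouping like this is what you want for years and phone numbers, where "one thousand, nine hundred and ninety-seven" would read oddly.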

There is so much more to this module. If you're generating natural language beyond just numbers, this is probably the one for you.

Lingua::EN::Numbers

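This module provides two functions: num2en, which spells out a number in words, and num2en_ordinal, which spells out the ordinal form of a number.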

Both functions will return undef if you pass a scalar which isn't a number. The following illustrates their usage:

use Lingua::EN::Numbers qw(num2en num2en_ordinal);
    
$age = 45;
print "I am ", num2en($age), " years old.\n";
print "I've had my ", num2en_ordinal($age), " birthday.\n";

Which will display:

I am forty-five years old.
I've had my forty-fifth birthday.

num2en supports negative numbers, real numbers, and exponential notation. Here's the list of test numbers:

-3.14         negative three point one four
0             zero
0.7           zero point seven
+4            positive four
67            sixty-seven
123           one hundred and twenty-three
101001        one hundred and one thousand and one
1234567       one million, two hundred and thirty-four thousand, five hundred and sixty-seven
10101010101   ten billion, one hundred and one million, ten thousand, one hundred and one
6.02e23       six point zero two times ten to the twenty-third

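Interestingly, the module generates a mix of British and American styles (for example, the American "negative" alongside the British "and" in "one hundred and twenty-three"), and there's no way to specify which style you want.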

When I first wrote this review, I found a bug in the module, and there was another bug listed on RT, with a fix attached. Sean Burke, who wrote this module, kindly gave me co-maint, and I have released new versions which resolve all outstanding issues on RT.

Lingua::EN::Nums2Words

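This module provides four functions: num2word, which spells out a number in words; num2word_ordinal, which spells out the ordinal form; num2word_short_ordinal, which returns the ordinal in digits; and num2usdollars, which spells out an amount in US dollars and cents.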

The following illustrates their usage:

use Lingua::EN::Nums2Words;
Lingua::EN::Nums2Words::set_case('lower');

$age = 45;
print "I am ", num2word($age), " years old.\n";
print "I've had my ", num2word_ordinal($age), " birthday.\n";
print "I've had my ", num2word_short_ordinal($age), " birthday.\n";
print "Pay me ", num2usdollars($age), ".\n";

Which will display:

I am forty-five years old.
I've had my forty-fifth birthday.
I've had my 45th birthday.
Pay me forty-five dollars and zero cents.

By default the generated text is in upper case. The set_case() call in the above example requests lower case output.

num2word supports negative numbers and real numbers, but not exponential notation. Here's the output from the list of test numbers:

-3.14         negative three and fourteen hundredths
0             zero
0.7           zero and seven tenths
+4            four
67            sixty-seven
123           one hundred twenty-three
101001        one hundred one thousand, one
1234567       one million, two hundred thirty-four thousand, five hundred sixty-seven
10101010101   ten billion, one hundred one million, ten thousand, one hundred one
6.02e23       six and two thousand hundred-thousandths

Notes:

num2word generates American-style spellings (no "and" in longer numbers) and American-style fractions. There is no way to configure the module for British or US spellings (as noted in the documentation).

Math::BigInt::Named

This module is a subclass of Math::BigInt, adding a method for spelling out a BigInt. Numbers can be spelled out in English or German, but I've only tested the former.

use Math::BigInt::Named;
    
$age = 45;
$bigint = Math::BigInt::Named->new($age);
print "I am ", $bigint->name, " years old.\n";
# or
print "I am ", Math::BigInt::Named->name($age), " years old.\n";

Which will display:

I am fourtyfive years old.

Note: version 0.03 (at the time of writing, the latest version on CPAN) has a bug which makes the module unusable. I've submitted a bug report and patch.

Here's the list of test numbers passed through the above code:

-3.14         failed
0             failed
0.7           failed
+4            failed
67            failed
123           failed
101001        failed
1234567       failed
10101010101   failed
6.02e23       failed

BigInt truncates real numbers to integer values, which is why 0.7 becomes "zero". Negative fractional numbers result in NaN though. There are some missing spaces as well (bug reported for that too).

The module uses long scales (e.g. "milliard" instead of "billion"), which aren't used in American, British or any other English.

Given the various problems with this module, I don't think there are any occasions when you might use it for English. The German spelling may be better.

Number::Spell

This module provides one function, spell_number, which will spell out integers only.

use Number::Spell;

$age = 45;
print "I am ", spell_number($age), " years old.\n";

Which will display:

I am forty five years old.

spell_number produces "American formatting" by default, but the Format parameter can be passed to request "European formatting":

$n = 1017324034;
print "American: ", spell_number($n), "\n";
print "European: ", spell_number($n, Format => 'eu'), "\n";

Which results in:

American: one billion seventeen million three hundred twenty four thousand thirty four
European: one thousand seventeen million three hundred twenty four thousand thirty four

The European option actually just switches to long scales, and is thus misnamed. Many European languages do use long scales, but British English uses short scales. It does generate American style spelling ("and" isn't included when spelling out large numbers), and there's no way to request British spelling.

Here's the list of test numbers:

-3.14         three negative
0             zero
0.7           zero
+4            four
67            sixty seven
123           one hundred twenty three
101001        one hundred one thousand one
1234567       one million two hundred thirty four thousand five hundred sixty seven
10101010101   ten billion one hundred one million ten thousand one hundred one
6.02e23       six

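There are a number of problems, including some visible in the output above: negative numbers are mangled (-3.14 comes out as "three negative"), and exponential notation is ignored (6.02e23 becomes "six").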

There are more bugs in RT, one of which includes a patch.

Comparison

Most of the interesting points have been covered in the discussions on each module, but I'll show the results of a performance test, and then present a summary table of the different capabilities.

Performance

I Benchmark'd the modules, converting numbers from 0 to 100,000. The following table shows the time taken.

Module                   Time (s)
Number::Spell            1.56
Lingua::EN::Nums2Words   2.40
Lingua::EN::Numbers      3.44
Lingua::EN::Inflect      5.17
Math::BigInt::Named      88.31

Not that surprising that the BigInt module takes a lot longer.
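For anyone wanting to run a similar comparison, here is a rough sketch of how it could be set up with the core Benchmark module. It is not the script I used: the two converters are trivial stand-ins of my own (hypothetical spell_digits and spell_sprintf), so that the sketch runs without any CPAN modules installed, and the range and iteration count are much smaller than in the real test.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my @digit_names = qw(zero one two three four five six seven eight nine);

# Stand-in converters (assumptions for illustration): replace these with
# the real functions under test, such as num2en or spell_number.
sub spell_digits {
    my ($n) = @_;
    return join ' ', map { $digit_names[$_] } split //, $n;
}

sub spell_sprintf {
    my ($n) = @_;
    return sprintf 'number %d', $n;
}

# Run one converter over every number from 0 to $max.
sub run_over_range {
    my ($code, $max) = @_;
    $code->($_) for 0 .. $max;
    return;
}

my $max = 1000;    # the test above used 0 to 100,000

# timethese() runs each sub the given number of times and prints timings.
my $results = timethese(5, {
    digits  => sub { run_over_range(\&spell_digits,  $max) },
    sprintf => sub { run_over_range(\&spell_sprintf, $max) },
});
```

With so few iterations Benchmark may warn that the count is too low to be reliable; raise the count or the range for meaningful numbers.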

Capabilities

The following table summarises the capabilities described above. Let me know if there's any other aspect you'd like to see compared.

Lingua::EN::Inflect   Lingua::EN::Numbers   Lingua::EN::Nums2Words   Math::BigInt::Named   Number::Spell
Positive integers
Negative integers buggy
Real numbers
Scientific notation
Long scales
Short scales
American style partial partial
British style partial
Ordinal

I can't imagine anyone wants long scales, but please let me know if you do!

Conclusion

Overall the best module is Lingua::EN::Numbers. It covers the broadest range of input, and supports ordinals. The module could be improved with the addition of parameters for controlling the formatting:

Lingua::EN::Inflect is a close second: it supports a couple of the points above, but doesn't support exponential numbers.

Now that I'm the maintainer of Lingua::EN::Numbers, I'm going to think about how best to integrate those ideas.

If you want "US formatting" (no "and", for example), then Lingua::EN::Nums2Words is the one for you. It's not far behind when comparing features, and the main quirk (for me) is the way fractions are spelled out.
