This article is a comparison of CPAN modules for spelling out numbers in words, in English. Here's a summary of the modules covered:
Module | Doc | Version | Author | # bugs | # users | Last update |
---|---|---|---|---|---|---|
Lingua::EN::Inflect | CPAN | 1.894 | Damian Conway | 14 | 30 | 2012-06-14 |
Lingua::EN::Numbers | CPAN | 1.04 | Neil Bowers | 1 | 9 | 2012-06-21 |
Lingua::EN::Nums2Words | CPAN | 1.14 | Lester Hightower | 0 | 0 | 2011-10-02 |
Math::BigInt::Named | CPAN | 0.03 | Tels | 2 | 0 | 2007-02-24 |
Number::Spell | CPAN | 0.04 | Les Howard | 4 | 0 | 2000-03-11 |
I'll look at each module in turn, then present the results of comparing them, and finally suggest which module you should use when.
I haven't included modules for other languages in the evaluation, as I don't speak any of them well enough to check the generated word strings. Following the discussion on the modules above, I've included a list of related modules.
Before we start, there are two things you need to be aware of. The first is long and short scales: read the linked Wikipedia page if you're really interested, but in short it's about whether you call 1,000,000,000 a billion or a thousand million. Until 1974 British English followed the latter usage, but now all English-speaking countries use the short scale (Canada uses the short scale for Canadian English and the long scale for Canadian French. Sacré bleu!).
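To make the difference concrete, here is a small Perl sketch that names each power of a thousand under both scales. The `scale_name` helper is hypothetical (it isn't taken from any of the modules reviewed here), and it uses "thousand million" for the long-scale 10**9, as British English did before 1974. On the long scale, "billion" doesn't appear until 10**12.

```perl
use strict;
use warnings;

# Hypothetical helper: name the power of ten 10**$exp (a multiple of 3)
# under the short scale (modern English) or the traditional long scale.
sub scale_name {
    my ($exp, $scale) = @_;
    my %short = (3 => 'thousand', 6 => 'million', 9 => 'billion',
                 12 => 'trillion');
    my %long  = (3 => 'thousand', 6 => 'million', 9 => 'thousand million',
                 12 => 'billion');
    my $names = $scale eq 'long' ? \%long : \%short;
    return $names->{$exp};
}

for my $exp (3, 6, 9, 12) {
    printf "10**%-2d  short: %-10s  long: %s\n",
        $exp, scale_name($exp, 'short'), scale_name($exp, 'long');
}
```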
There are also differences in how numbers are spelled out. These are often referred to as American English versus British English styles, but it's not really so clear-cut. Of course there's a Wikipedia page on English numerals, but here are some key examples:
Number | British English | American English |
---|---|---|
0.1 | nought point one | zero point one |
-1 | minus one | negative one |
117 | one hundred and seventeen | one hundred seventeen |
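The "and" difference is easy to see in a minimal speller for 0 to 999. This is a hypothetical sketch, not code from any of the modules below: the `$british` flag simply controls whether "and" follows "hundred".

```perl
use strict;
use warnings;

my @ones = qw(zero one two three four five six seven eight nine ten
              eleven twelve thirteen fourteen fifteen sixteen seventeen
              eighteen nineteen);
my @tens = qw(_ _ twenty thirty forty fifty sixty seventy eighty ninety);

# Hypothetical helper: spell out an integer from 0 to 999. With $british
# true, "and" is inserted after "hundred"; otherwise it is omitted.
sub spell_under_1000 {
    my ($n, $british) = @_;
    die "out of range: $n\n" if $n < 0 || $n > 999 || $n != int($n);
    return $ones[$n] if $n < 20;

    my @words;
    my $hundreds = int($n / 100);
    my $rest     = $n % 100;
    push @words, "$ones[$hundreds] hundred" if $hundreds;
    if ($rest) {
        push @words, 'and' if $hundreds && $british;
        if ($rest < 20) {
            push @words, $ones[$rest];
        }
        else {
            my $t = $tens[int($rest / 10)];
            push @words, $rest % 10 ? "$t-$ones[$rest % 10]" : $t;
        }
    }
    return join ' ', @words;
}

print spell_under_1000(117, 1), "\n";   # one hundred and seventeen
print spell_under_1000(117, 0), "\n";   # one hundred seventeen
```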
Lingua::EN::Inflect provides a range of functions for converting singular forms of words into plural, and various other things as well. The two functions relevant to this review are NUMWORDS() and ORD(). The following shows basic usage:

    use strict;
    use warnings;
    use Lingua::EN::Inflect qw(NUMWORDS ORD);

    my $age = 46;
    print "I am ", NUMWORDS($age), " years old.\n";
    print "I've had my ", NUMWORDS(ORD($age)), " birthday.\n";

Which will display:

    I am forty-six years old.
    I've had my forty-sixth birthday.

The ORD() function takes a number and returns its ordinal form, so ORD(13) returns "13th". If NUMWORDS() is passed an ordinal, it spells out the ordinal form in words.
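ORD() does this for you, but the rule behind numeric ordinals is simple enough to sketch. The helper below is hypothetical (it is not Lingua::EN::Inflect's implementation) and only handles integers:

```perl
use strict;
use warnings;

# Hypothetical helper: append the English ordinal suffix to an integer.
# 11, 12 and 13 (and 111, 212, ...) always take "th"; otherwise the
# last digit decides.
sub numeric_ordinal {
    my ($n) = @_;
    my $last_two = abs($n) % 100;
    my $last     = abs($n) % 10;
    my $suffix = ($last_two >= 11 && $last_two <= 13) ? 'th'
               : $last == 1                           ? 'st'
               : $last == 2                           ? 'nd'
               : $last == 3                           ? 'rd'
               :                                        'th';
    return "$n$suffix";
}

print join(' ', map { numeric_ordinal($_) } 1, 2, 3, 4, 11, 13, 21, 22, 101, 111), "\n";
# prints: 1st 2nd 3rd 4th 11th 13th 21st 22nd 101st 111th
```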
The following table lists the numbers I'll try with each module in turn, along with the output from NUMWORDS():
Number | Lingua::EN::Inflect NUMWORDS() |
---|---|
-3.14 | minus three point one four |
0 | zero |
0.7 | zero point seven |
+4 | plus four |
67 | sixty-seven |
123 | one hundred and twenty-three |
101001 | one hundred and one thousand and one |
1234567 | one million, two hundred and thirty-four thousand, five hundred and sixty-seven |
10101010101 | ten billion, one hundred and one million, ten thousand, one hundred and one |
6.02e23 | six point zero two two three |
As you can see, it doesn't support exponential notation (as noted in the documentation), but does just fine on the others.
The NUMWORDS function also takes a number of options, as a hash following the number.
There is so much more to this module. If you're generating natural language beyond just numbers, this is probably the one for you.
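This module provides two functions: num2en(), which spells out a number in words, and num2en_ordinal(), which spells out its ordinal form.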
Both functions will return undef if you pass a scalar which isn't a number. The following illustrates their usage:

    use strict;
    use warnings;
    use Lingua::EN::Numbers qw(num2en num2en_ordinal);

    my $age = 45;
    print "I am ", num2en($age), " years old.\n";
    print "I've had my ", num2en_ordinal($age), " birthday.\n";

Which will display:

    I am forty-five years old.
    I've had my forty-fifth birthday.

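Because a non-number just comes back as undef, you may prefer to check input before converting it. One option is the core Scalar::Util::looks_like_number() function. It's an assumption that it accepts exactly the same strings as num2en(), so treat this sketch as a rough pre-filter rather than a guarantee:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Check each input before handing it to a converter such as num2en().
for my $input ('-3.14', '+4', '6.02e23', 'forty-five', 'abc') {
    my $verdict = looks_like_number($input) ? 'number' : 'not a number';
    printf "%-12s %s\n", $input, $verdict;
}
```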
num2en supports negative numbers, real numbers, and exponential notation. Here's its output for the list of test numbers:
Number | num2en() output |
---|---|
-3.14 | negative three point one four |
0 | zero |
0.7 | zero point seven |
+4 | positive four |
67 | sixty-seven |
123 | one hundred and twenty-three |
101001 | one hundred and one thousand and one |
1234567 | one million, two hundred and thirty-four thousand, five hundred and sixty-seven |
10101010101 | ten billion, one hundred and one million, ten thousand, one hundred and one |
6.02e23 | six point zero two times ten to the twenty-third |
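Interestingly, the module generates a mix of British and American styles (for example, "negative" as in American usage, but "one hundred and twenty-three" as in British usage), and there's no way to specify which style you want.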
When I first wrote this review, I found a bug in the module, and there was another bug listed on RT, with a fix attached. Sean Burke, who wrote this module, kindly gave me co-maint, and I have released new versions which resolve all outstanding issues on RT.
This module provides four functions: num2word(), num2word_ordinal(), num2word_short_ordinal() and num2usdollars().
The following illustrates their usage:

    use strict;
    use warnings;
    use Lingua::EN::Nums2Words;

    Lingua::EN::Nums2Words::set_case('lower');
    my $age = 45;
    print "I am ", num2word($age), " years old.\n";
    print "I've had my ", num2word_ordinal($age), " birthday.\n";
    print "I've had my ", num2word_short_ordinal($age), " birthday.\n";
    print "Pay me ", num2usdollars($age), ".\n";

Which will display:

    I am forty-five years old.
    I've had my forty-fifth birthday.
    I've had my 45th birthday.
    Pay me forty-five dollars and zero cents.

By default the generated text is in upper case. The set_case() call in the above example requests lower case output.
num2word supports negative numbers and real numbers, but not exponential notation. Here's the output from the list of test numbers:
Number | num2word() output |
---|---|
-3.14 | negative three and fourteen hundredths |
0 | zero |
0.7 | zero and seven tenths |
+4 | four |
67 | sixty-seven |
123 | one hundred twenty-three |
101001 | one hundred one thousand, one |
1234567 | one million, two hundred thirty-four thousand, five hundred sixty-seven |
10101010101 | ten billion, one hundred one million, ten thousand, one hundred one |
6.02e23 | six and two thousand hundred-thousandths |
Notes:
num2word generates American-style spellings (no "and" in longer numbers) and American-style fractions ("zero and seven tenths" rather than "zero point seven"). As noted in the documentation, there's no way to configure the module for British or US spellings.
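The fraction style comes from reading the digits after the decimal point as a numerator over a power of ten, so 3.14 becomes "three and fourteen hundredths". Here's a hypothetical sketch of just that split; it isn't Lingua::EN::Nums2Words' implementation, and it leaves the numbers as digits to keep it short:

```perl
use strict;
use warnings;

# Denominator names indexed by the number of decimal places.
my @denominators = qw(_ tenths hundredths thousandths ten-thousandths
                      hundred-thousandths millionths);

# Hypothetical helper: split a plain decimal string (no sign, no exponent)
# into its integer part, the digits after the point as a numerator, and a
# denominator word chosen by how many digits follow the point.
sub split_decimal {
    my ($str) = @_;
    my ($int, $frac) = $str =~ /^(\d+)\.(\d+)$/
        or die "not a plain decimal: $str\n";
    my $places = length $frac;
    die "too many decimal places: $str\n" if $places > $#denominators;
    my $numerator   = $frac + 0;
    my $denominator = $denominators[$places];
    $denominator =~ s/s$// if $numerator == 1;   # "one tenth", not "one tenths"
    return ($int + 0, $numerator, $denominator);
}

for my $s ('3.14', '0.7', '0.1') {
    my ($int, $num, $den) = split_decimal($s);
    print "$s => $int and $num $den\n";
}
```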
This module is a subclass of Math::BigInt, adding a method for spelling out a BigInt. Numbers can be spelled out in English or German, but I've only tested the former.

    use strict;
    use warnings;
    use Math::BigInt::Named;

    my $age = 45;
    my $bigint = Math::BigInt::Named->new($age);
    print "I am ", $bigint->name, " years old.\n";
    # or, calling name() as a class method:
    print "I am ", Math::BigInt::Named->name($age), " years old.\n";

Which will display:
I am fourtyfive years old.
Note: version 0.03 (the latest version on CPAN at the time of writing) has a bug which makes the module unusable. I've submitted a bug report and patch.
Here's the list of test numbers passed through the above code:
Number | name() output |
---|---|
-3.14 | failed |
0 | failed |
0.7 | failed |
+4 | failed |
67 | failed |
123 | failed |
101001 | failed |
1234567 | failed |
10101010101 | failed |
6.02e23 | failed |
BigInt truncates real numbers to integer values, which is why 0.7 becomes "zero". Negative fractional numbers result in NaN though. There are some missing spaces as well (bug reported for that too).
The module uses long scales (e.g. "milliard" instead of "billion"), which aren't used in American, British, or any other current variety of English.
Given the various problems with this module, I don't think there are any occasions when you might use it for English. The German spelling may be better.
This module provides one function, spell_number, which will spell out integers only.

    use strict;
    use warnings;
    use Number::Spell;

    my $age = 45;
    print "I am ", spell_number($age), " years old.\n";

Which will display:
I am forty five years old.
spell_number produces "American formatting" by default, but the Format parameter can be passed to request "European formatting":

    my $n = 1017324034;
    print "American: ", spell_number($n), "\n";
    print "European: ", spell_number($n, Format => 'eu'), "\n";

Which results in:

    American: one billion seventeen million three hundred twenty four thousand thirty four
    European: one thousand seventeen million three hundred twenty four thousand thirty four

The European option actually just switches to long scales, and is thus misnamed: many European languages do use long scales, but British English uses the short scale. It only generates American-style spelling ("and" isn't included when spelling out large numbers), and there's no way to request British spelling.
Here's the list of test numbers:
Number | spell_number() output |
---|---|
-3.14 | three negative |
0 | zero |
0.7 | zero |
+4 | four |
67 | sixty seven |
123 | one hundred twenty three |
101001 | one hundred one thousand one |
1234567 | one million two hundred thirty four thousand five hundred sixty seven |
10101010101 | ten billion one hundred one million ten thousand one hundred one |
6.02e23 | six |
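There are a number of problems, as the table shows: negative numbers come out backwards ("three negative"), fractional parts are silently dropped (0.7 becomes "zero"), exponential notation is ignored (6.02e23 becomes "six"), and compound numbers aren't hyphenated ("sixty seven").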
There are more bugs in RT, one of which includes a patch.
Most of the interesting points have been covered in the discussions on each module, but I'll show the results of a performance test, and then present a summary table of the different capabilities.
I benchmarked the modules, using each one to convert every number from 0 to 100,000. The following table shows the time taken.
Module | Time (s) |
---|---|
Number::Spell | 1.56 |
Lingua::EN::Nums2Words | 2.40 |
Lingua::EN::Numbers | 3.44 |
Lingua::EN::Inflect | 5.17 |
Math::BigInt::Named | 88.31 |
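It's not that surprising that Math::BigInt::Named takes a lot longer, since it's built on Math::BigInt's arbitrary-precision arithmetic rather than Perl's native numbers.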
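The benchmark script itself isn't shown here, so the following is only a rough sketch of how such a timing run might look. It uses Time::HiRes wall-clock timing rather than the Benchmark module, looks up each module's conversion function by name, and skips any module that isn't installed. Math::BigInt::Named is left out because it's called as a method, and your absolute times will differ from the table above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Wall-clock time for $convert (a code ref taking one number) to spell
# out every integer from 0 to $max.
sub time_converter {
    my ($convert, $max) = @_;
    my $start = time;
    $convert->($_) for 0 .. $max;
    return time - $start;
}

# Module => name of its function that spells out a single number.
my %converters = (
    'Lingua::EN::Inflect'    => 'NUMWORDS',
    'Lingua::EN::Numbers'    => 'num2en',
    'Lingua::EN::Nums2Words' => 'num2word',
    'Number::Spell'          => 'spell_number',
);

for my $module (sort keys %converters) {
    # Skip modules that aren't installed.
    next unless eval "require $module; 1";
    my $convert = $module->can($converters{$module}) or next;
    printf "%-24s %6.2f s\n", $module, time_converter($convert, 100_000);
}
```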
The following table summarises the capabilities described above. Let me know if there's any other aspect you'd like to see compared.
Feature | Lingua::EN::Inflect | Lingua::EN::Numbers | Lingua::EN::Nums2Words | Math::BigInt::Named | Number::Spell |
---|---|---|---|---|---|
Positive integers | ✓ | ✓ | ✓ | ✓ | ✓ |
Negative integers | ✓ | ✓ | ✓ | buggy | |
Real numbers | ✓ | ✓ | ✓ | | |
Scientific notation | | ✓ | | | |
Long scales | | | | ✓ | ✓ |
Short scales | ✓ | ✓ | ✓ | | ✓ |
American style | | partial | ✓ | partial | ✓ |
British style | ✓ | ✓ | | partial | |
Ordinal | ✓ | ✓ | ✓ | | |
I can't imagine anyone wants long scales, but please let me know if you do!
Overall the best module is Lingua::EN::Numbers. It covers the broadest range of input, and supports ordinals. The module could be improved with the addition of parameters for controlling the formatting:
Lingua::EN::Inflect is a close second: it supports a couple of the points above, but doesn't support exponential notation.
Now that I'm the maintainer of Lingua::EN::Numbers, I'm going to think about how best to integrate those ideas.
If you want "US formatting" (no "and", for example), then Lingua::EN::Nums2Words is the one for you. It's not far behind when comparing features, and the main quirk (for me) is the way fractions are spelled out.