Fighting CPAN entropy

Wed 23 April 2014 (tags: curation, CPAN, adoption)

I first started the adoption list because I thought that we (the Perl community) needed a way to identify CPAN distributions that were in need of some TLC. One of the key factors used to build the list is whether a dist is being used by other CPAN dists. Today I released a new version of Text::Levenshtein, which is used by 4 other dists. I initially imagined I might just fix a couple of the outstanding bugs, but ended up shaving quite a bit more of the yak.

Sometimes I adopt modules because I start using them. I think you need to look into, and after, the modules you depend on. MIYAGAWA's Time::Duration::Parse was a recent example.

Sometimes I adopt modules to fix problems with the distribution. I adopted Chatbot::Eliza so I could fix the abstract.

And sometimes I adopt modules just because.

In a number of previous jobs I've done a lot of natural language text processing, used Text::Levenshtein once or twice, and pointed others to it. And it's used by four other distributions, so I decided it was time to see if I could adopt it. Yes!

When I adopt a module I start off by 'modernising' the distribution. Getting it 'CPANTS clean' is a good start, but here are the things I generally do:

I've noticed that I'm gradually being pulled into the Dist::Zilla club. I previously tried switching to it a couple of times, but found it very frustrating (which I assume is more about me than DZ, given the serious authors who use it). Having adopted a number of distributions recently though, I'm finding that switching the dist to DZ makes my life a lot easier: a number of the points above are handled for you.

Back to Text::Levenshtein. Having done a lot of the basics, I started looking at the bugs, and then at the code. Generally I try to take it easy at first, minimising the scope of my refactorings. But after struggling with the code for a while, and then reading the Wikipedia page on Levenshtein distance, I rewrote the whole module, based on one of the algorithms presented there in pseudocode. That's a first for me, but it felt like the best way to deal with some of the bugs.
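For anyone who hasn't met the algorithm, here is a minimal sketch of one of the approaches on that Wikipedia page: the iterative dynamic-programming version that keeps only two rows of the distance matrix. It's written in Python rather than Perl, and it is only an illustration. It is not the code in Text::Levenshtein, the function name `levenshtein` is hypothetical, and I'm assuming (not claiming) that this is the variant the rewrite followed.

```python
def levenshtein(s: str, t: str) -> int:
    """Return the Levenshtein edit distance between s and t.

    Illustrative sketch of the two-row iterative algorithm; not the
    Text::Levenshtein implementation.
    """
    # prev[j] = distance between the first i-1 chars of s and the first j chars of t.
    # Initially (i = 0) that is just j insertions.
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        # curr[0] = i: turning the first i chars of s into "" takes i deletions
        curr = [i] + [0] * len(t)
        for j, tc in enumerate(t, start=1):
            cost = 0 if sc == tc else 1
            curr[j] = min(
                prev[j] + 1,         # delete sc
                curr[j - 1] + 1,     # insert tc
                prev[j - 1] + cost,  # substitute (or match if cost == 0)
            )
        prev = curr
    return prev[len(t)]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only the previous and current rows cuts memory from O(mn) to O(n) for strings of length m and n, while the running time stays O(mn).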

This might not be a very widely used module, but it feels good to fight against CPAN entropy. Plus I learned a bit about edit distance metrics.

I'd like to acknowledge DREE who first released Text::Levenshtein, and JGOLDBERG who maintained it from 2004 to 2008. I'm the third custodian, but probably won't be the last.

Are any of the modules you use candidates for adoption?
