BOOK and I each recently adopted a distribution from TOMI: URI-Title and URI-Find-Simple respectively. We got talking about adoptions, and BOOK wondered whether you could automatically identify distributions that had been adopted. I had briefly thought about that a while back too, so we chatted about an initial approach. Here I describe that approach, and the code written to try it. Just to be clear, the blame for what follows can be put on BOOK.
The basic idea is that the following release pattern probably identifies an adoption:
This should also identify multiple adoptions, for example where author C subsequently adopts the dist from author B.
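To make the idea concrete, here is a minimal sketch in Python of one plausible reading of that heuristic. It is not the script discussed in this post. It assumes that a release by a different author, following a sufficiently long gap since the previous release of the same dist, marks an adoption. The function name `find_adoptions`, the `(dist, author, date)` tuple layout, the 180-day default gap, and the sample dists and authors are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_adoptions(releases, min_gap=timedelta(days=180)):
    """Return {dist: [creator, adopter, ...]} for dists that look adopted.

    `releases` is an iterable of (dist, author, released_at) tuples.
    Assumption: an adoption is a release by a new author that follows
    a gap of at least `min_gap` since the dist's previous release.
    """
    # Group every release by distribution name.
    by_dist = defaultdict(list)
    for dist, author, released_at in releases:
        by_dist[dist].append((released_at, author))

    adoptions = {}
    for dist, history in by_dist.items():
        history.sort()  # oldest release first
        first_date, creator = history[0]
        chain = [creator]  # the first releaser is treated as the creator
        previous_date = first_date
        for released_at, author in history[1:]:
            gap = released_at - previous_date
            # A different author after a long silence counts as an adoption.
            if author != chain[-1] and gap >= min_gap:
                chain.append(author)
            previous_date = released_at
        if len(chain) > 1:
            adoptions[dist] = chain
    return adoptions


if __name__ == "__main__":
    # Made-up release history, not real CPAN data.
    sample = [
        ("Foo-Bar", "ALICE", datetime(2005, 1, 10)),
        ("Foo-Bar", "ALICE", datetime(2005, 3, 2)),
        ("Foo-Bar", "BOB", datetime(2009, 6, 1)),
        ("Foo-Bar", "CAROL", datetime(2013, 2, 14)),
        ("Baz", "ALICE", datetime(2010, 1, 1)),
        ("Baz", "DAVE", datetime(2010, 2, 1)),  # short gap: likely a co-maintainer
    ]
    for dist, chain in sorted(find_adoptions(sample).items()):
        print(f"{dist} created by {chain[0]}, adopted by {', '.join(chain[1:])}")
```

Because the chain keeps growing each time a new author appears after a long gap, the same loop covers the multiple-adoption case. A new author who releases soon after the previous release is not counted, which is a crude way of skipping co-maintainers.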
To do this we need to iterate over every release ever made to CPAN, not just the dists that are currently on CPAN. That's something I've done before, but always with a hack. This time I decided to do it properly, which led to CPAN::ReleaseHistory.
That done, I knocked up a quick script, which BOOK tweaked. It seems to do OK: with a minimum gap of 6 months, it finds 1608 potential adoptions. As noted in my previous post, CPAN's historical data isn't always 'clean', so the script can be confused:
enum created by ZENIN, adopted by ROODE, NEILB
We need to think about handling teams, and looking for other edge cases. Hopefully we might play with this at the QA Hackathon.