This blog post describes a model that we found useful for talking about CPAN dependencies and reverse dependencies at the QA Hackathon. At the head of the river is Perl itself with the core modules. The river flows into the sea, which contains all distributions that aren't used by any other distribution. Other distributions sit somewhere along the river, their position determined by their reverse dependencies. This post introduces the core concepts, but nothing more.

The following picture illustrates the zones in the river model:

The sea contains distributions that aren't used by any other distributions. Currently the best we can do is to say that they're not used by any other distributions on CPAN. When we have a service for tracking DarkPAN dependencies, some of the distributions in the sea will start moving leftwards in the diagram, that is, upriver.

Just between the river and the sea we have the estuary: this contains distributions that are depended on by one or more distributions by the same author. Many services in the CPAN ecosystem make this same distinction.

As soon as a distribution is depended on by someone else's distribution, it starts to move upriver. The very end of the river, just before the estuary, is where you find distributions that are used by exactly one other distribution, which is itself not relied on by any other distributions. Now if you release a broken version of your distribution to CPAN, you break not only your own distribution but the *downstream* distribution as well.
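
To make the three zones concrete, here is a minimal sketch in Python. The distribution names, author IDs, and the `classify_zones` helper are all made up for illustration, and the sketch assumes a simplification: it looks only at the direct users of each distribution when deciding between estuary and river.

```python
from collections import defaultdict

# Hypothetical data: distribution -> (author, distributions it uses).
DISTS = {
    "Sea-Only":   ("ALICE", set()),
    "Alice-Util": ("ALICE", set()),
    "Alice-App":  ("ALICE", {"Alice-Util"}),
    "Shared-Lib": ("BOB",   set()),
    "Carol-Tool": ("CAROL", {"Shared-Lib"}),
}

def classify_zones(dists):
    """Place each distribution in the sea, the estuary, or the river."""
    # Invert "uses" into "is used by" (the reverse dependencies).
    used_by = defaultdict(set)
    for name, (_author, uses) in dists.items():
        for dep in uses:
            used_by[dep].add(name)

    zones = {}
    for name, (author, _uses) in dists.items():
        users = used_by[name]
        if not users:
            zones[name] = "sea"        # nobody uses it
        elif all(dists[user][0] == author for user in users):
            zones[name] = "estuary"    # only used by the same author
        else:
            zones[name] = "river"      # used by someone else's distribution
    return zones

ZONES = classify_zones(DISTS)
for name, zone in sorted(ZONES.items()):
    print(f"{name}: {zone}")
```

A real classifier would need real dependency and author data (and, eventually, DarkPAN usage), which this sketch doesn't attempt to fetch.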

But if you're maintaining a distribution, you need to be aware not just of the distributions that are using your distribution directly, but of all distributions that are "downstream". Consider this situation:

The distributions A and P both have 2 distributions downstream, but if the author of A looks at the reverse dependencies on MetaCPAN, they might think A has only 1 distribution downstream.

Note also that, considering only this part of the picture, C is more fragile than either Q or R: the author of C might think she's only relying on one module, but either A or B breaking might break C.
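
The difference between direct and total downstream can be sketched in a few lines of Python. The edges below are an assumption reconstructed from the description above (C uses B, B uses A; Q and R both use P), and the helper names are invented for this example.

```python
# Assumed edges: distribution -> distributions it uses directly.
USES = {
    "A": set(),
    "B": {"A"},
    "C": {"B"},
    "P": set(),
    "Q": {"P"},
    "R": {"P"},
}

def invert(uses):
    """Turn 'X uses Y' into 'Y is used by X'."""
    used_by = {name: set() for name in uses}
    for name, deps in uses.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(name)
    return used_by

def reachable(graph, start):
    """Everything reachable from start by following edges, start excluded."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    seen.discard(start)
    return seen

USED_BY = invert(USES)

for dist in ("A", "P"):
    direct = USED_BY[dist]
    downstream = reachable(USED_BY, dist)
    print(f"{dist}: {len(direct)} direct, {len(downstream)} downstream")

# Following the edges the other way gives everything a distribution relies on.
for dist in ("C", "Q", "R"):
    print(f"{dist} relies on: {sorted(reachable(USES, dist))}")
```

Under these assumed edges, A has one direct reverse dependency but two downstream distributions, while P has two of each. Following the edges in the other direction shows C relying on both A and B, whereas Q and R each rely only on P.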

Now let's consider a distribution in the middle of the river:

Distribution A is used by a number of distributions, which have varying numbers of distributions downstream themselves.

All distributions used by A are upstream, by definition.

The immediately upstream and downstream distributions will be spread along the river, as in turn they'll have different numbers of downstream distributions.

A distribution's position on the river is determined by the number of downstream distributions. This is not just the number of distributions that are directly using your distribution. Instead it is the count of all distributions that are directly or indirectly reliant on the distribution.

At the head of the river is Perl itself and the core modules that are shipped with it. Again, by definition the core modules don't rely on any CPAN modules, but some of them are depended on by a lot of other distributions.

If you pollute a river you might cause problems for everyone downstream of you. And you're relying on the distributions upstream of you not polluting the river.

For CPAN, the pollution is bugs: if one of your upstream dists has a buggy version released to CPAN, it might break your distribution, but it might not.

The further upstream a distribution, the more distributions it can potentially break, should it pollute the river.

CPAN authors and maintainers should know where their distributions sit on the river. We should help with that, and with visualising the upstream and downstream distributions. We should let authors know when a distribution moves up or down the river, particularly when the move is sudden and large (if a distribution much further upstream starts using your distribution, you zoom to a position upstream of them).
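
As a rough illustration of spotting such a move, the Python sketch below compares downstream counts between two snapshots of the dependency graph. The distribution names, the `large_moves` helper, and the threshold of 2 are all hypothetical; this is not a description of any existing service.

```python
def downstream_counts(uses):
    """Count, for each distribution, everything directly or indirectly using it."""
    used_by = {name: set() for name in uses}
    for name, deps in uses.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(name)

    counts = {}
    for name in used_by:
        seen, stack = set(), list(used_by[name])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(used_by[node])
        seen.discard(name)  # ignore a distribution reaching itself via a cycle
        counts[name] = len(seen)
    return counts

def large_moves(before, after, threshold=2):
    """List (name, old_count, new_count) where the count changed by >= threshold."""
    old, new = downstream_counts(before), downstream_counts(after)
    return sorted(
        (name, old.get(name, 0), count)
        for name, count in new.items()
        if abs(count - old.get(name, 0)) >= threshold
    )

# Hypothetical snapshot: Big-Framework has three users, Tiny-Helper has none.
BEFORE = {
    "Tiny-Helper": set(),
    "Big-Framework": set(),
    "App-1": {"Big-Framework"},
    "App-2": {"Big-Framework"},
    "App-3": {"Big-Framework"},
}
# Later snapshot: Big-Framework starts using Tiny-Helper.
AFTER = dict(BEFORE)
AFTER["Big-Framework"] = {"Tiny-Helper"}

MOVES = large_moves(BEFORE, AFTER)
for name, old_count, new_count in MOVES:
    print(f"{name}: {old_count} -> {new_count} downstream")
```

In this made-up example Tiny-Helper goes from no downstream distributions to four, which puts it upstream of Big-Framework with its three.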

Those and many more related topics will be covered over the coming months.
