Some cool CPAN stats
Oct. 8th, 2008 08:09 pm
I'm interested to see that the number of new users started to drop off a few years ago, but the activity of existing users has been increasing so much that the archive's overall activity continues to trend upwards.
If I had to guess a single reason for this, it'd involve the community getting better over the last several years at corralling many hackers together into large, frequently updated projects, which then get stored in the CPAN under a single username (as that's a limitation of the system). I think, for example, of DBIx::Class's recent ascendency, and clear community dominance, over the thousand SQL-abstraction modules that came before it. So you have fewer instances of people creating new CPAN author accounts just to upload their own wheel-reinventions.
no subject
Date: 2008-10-09 02:41 am (UTC)
can you plot the average number of lines of code per author per month?
if it is sharply increasing in the recent past, then your guess is most likely correct.
no subject
Date: 2008-10-09 07:12 am (UTC)
I mean, I guess we could diff against the previous version of a distribution when we see a new one and count the new lines... but then, when a module changes names, every line in the project counts as a new line. And determining what constitutes code is a little tricky... do we include makefiles? READMEs? Changelogs? What about the SVN-generated changelogs (and how do we tell them apart from regular ones)? And so on...
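For what it's worth, here is a rough Python sketch of the diff-and-count idea, mostly to show where it goes wrong. Everything in it is an assumption made for illustration: `count_new_lines`, `is_code`, and the `CODE_SUFFIXES` filter are hypothetical names, releases are modelled as plain path-to-text dicts, and the suffix list is just one guess at what "code" means.

```python
import difflib
from pathlib import PurePosixPath

# Assumption: only these suffixes count as "code"; makefiles, READMEs and
# changelogs are left out. The cut-off is a judgment call, not a CPAN rule.
CODE_SUFFIXES = {".pm", ".pl", ".t", ".xs"}


def is_code(path):
    """Return True if the (hypothetical) suffix filter treats path as code."""
    return PurePosixPath(path).suffix in CODE_SUFFIXES


def count_new_lines(old_files, new_files):
    """Count lines added between two releases.

    Both arguments map a file path inside the distribution to its text.
    Files are matched by path only, so a renamed file looks entirely new.
    """
    added = 0
    for path, new_text in new_files.items():
        if not is_code(path):
            continue
        old_lines = old_files.get(path, "").splitlines()
        new_lines = new_text.splitlines()
        matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        # "insert" and "replace" opcodes both introduce lines on the new side.
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            if tag in ("insert", "replace"):
                added += j2 - j1
    return added
```

Because files are matched by path, moving `lib/Foo.pm` to `lib/Bar.pm` without touching a line still reports the whole file as new, which is exactly the renaming problem.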
no subject
Date: 2008-10-09 11:51 am (UTC)
My thought is that this is a rough measure of user activity, which would be reasonably accurate when averaged over the entire population of contributors.
Another alternative would be to measure the average number of lines of code associated with a user. Then you sidestep all of the issues with commits etc.
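A minimal sketch of that per-user average in Python, assuming you already have one `(author, line_count)` pair per distribution currently in the archive. The function name and the record shape are made up for illustration, and how `line_count` gets measured is left open.

```python
from collections import defaultdict


def average_lines_per_author(records):
    """Average the total line count held by each author.

    records: iterable of (author, line_count) pairs, one per distribution.
    Returns 0.0 when there are no records.
    """
    totals = defaultdict(int)
    for author, line_count in records:
        totals[author] += line_count
    if not totals:
        return 0.0
    return sum(totals.values()) / len(totals)
```

Taking that snapshot once per month and comparing the values would give the trend without diffing any releases.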
anyway, I'm just throwing out idle statistics that could support your supposition, because it's interesting and because my mind is in work-mode. :)