[personal profile] prog
[livejournal.com profile] daerr just posted some interesting graphs that measure changes in the CPAN's activity rates over the last decade or so. (CPAN being Perl's distributed, internet-based archive of code libraries and other stuff, and approximately 51 percent of what makes Perl my favorite programming language.)

I'm interested to see that the number of new users started to drop off a few years ago, but the activity of existing users has been increasing so much that the archive's overall activity continues to trend upwards.

If I had to guess a single reason for this, it'd involve the community getting better over the last several years at corralling many hackers together into large, frequently updated projects, which then get stored in the CPAN under a single username (as that's a limitation of the system). I think, for example, of DBIx::Class's recent ascendency, and clear community dominance, over the thousand SQL-abstraction modules that came before it. So you have fewer instances of people creating new CPAN author accounts just to upload their own wheel-reinventions.

Date: 2008-10-09 02:41 am (UTC)
From: [identity profile] mr-choronzon.livejournal.com

can you plot the average number of lines of code per author per month?

if it is sharply increasing in the recent past, then your guess is most likely correct.


Date: 2008-10-09 07:12 am (UTC)
From: [identity profile] daerr.livejournal.com
No, the stats are generated from a list of just date + filename. Even if I had a full copy of the backpan (which is certainly available; it can't be more than 10 or 20 GB now), determining the number of lines of code would be even more of a bs number than some of the ones already there (like distributions).

I mean, I guess we could diff against the previous version of a distribution when we see a new one and count the new lines... but then, when a module changes names, every line in the project counts as a new line. And determining what constitutes code is a little tricky... do we include makefiles? READMEs? Changelogs? What about the SVN-generated changelogs (and how do we tell them apart from regular ones)? And so on...
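For illustration only, here is a rough Python sketch of the diff-and-count idea described above. It is not how the graphs were generated. It assumes each release has already been unpacked into a mapping of file path to file text, and the function names (count_new_lines, count_new_lines_in_release) are made up for this example. It also does nothing to decide which files count as "code".

```python
import difflib


def count_new_lines(old_text, new_text):
    """Count lines in new_text that were inserted or replaced relative to old_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce lines on the new side.
        if tag in ("insert", "replace"):
            added += j2 - j1
    return added


def count_new_lines_in_release(old_files, new_files):
    """Sum new lines across a release, matching files by path only.

    old_files and new_files are hypothetical dicts of {path: file text}.
    A file whose path is absent from old_files is compared against an
    empty string, so every one of its lines counts as new.
    """
    total = 0
    for path, text in new_files.items():
        total += count_new_lines(old_files.get(path, ""), text)
    return total


previous = {"lib/Foo.pm": "a\nb\nc\n"}
edited = {"lib/Foo.pm": "a\nb\nc\nd\n"}
renamed = {"lib/Bar.pm": "a\nb\nc\n"}

print(count_new_lines_in_release(previous, edited))
print(count_new_lines_in_release(previous, renamed))
```

Because files are matched by path alone, the renamed file in the second call is treated as entirely new even though its contents are unchanged, which is the rename problem mentioned above.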

Date: 2008-10-09 11:51 am (UTC)
From: [identity profile] mr-choronzon.livejournal.com

My thought is that this is a rough measure of user activity, which would be reasonably accurate when averaged over the entire population of contributors.

Another alternative would be to measure the average number of lines of code associated with a user. Then you sidestep all of the issues with commits etc.

anyway, I'm just throwing out idle statistics that could support your supposition, because it's interesting and because my mind is in work-mode. :)
