Study: Reduced Open Source developer productivity linked to "restrictive" FLOSS licenses (where "restrictive"=GPL and "non-restrictive"=BSD)
A study by economists from Tel Aviv University and the Centre for Economic Policy Research (CEPR) entitled "Open source software: Motivation and restrictive licensing"[1] (pre-print) looks at the productivity of developers on Open Source projects and concludes:
"...that the output per contributor in open source projects is much higher when licenses are less restrictive and more commercially oriented."
and observe:
"Projects written for the Linux operating system have lower output per contributor than projects written for other operating systems..."
They also observed that the median number of contributors in "restrictive" projects (13) is much lower than in "non-restrictive" projects (35).
and:
"Output per contributor in projects oriented towards end users (DESKTOP) is significantly lower than that in projects for developers."
They chose the 71 most active projects on SourceForge in January 2000 and studied them over an 18-month period starting in January 2002, measuring each project every 2 months for a total of 9 samples. The metrics they used include: source lines of code (SLOC), number of contributors, the "restrictiveness" of the license (GPL = very restrictive; LGPL, Mozilla, NPL, MPL = moderately restrictive; BSD = non-restrictive), operating system, age of project, whether it is a desktop or system application, language (C++ or C = 1; all others = 0), and others. To account for differences in SLOC across languages, they also separately analyzed just the C and C++ projects.
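To make these variables concrete, here is a minimal Python sketch of how a single project observation might be encoded. The numeric codes (2/1/0), the `ProjectSample` class, and the assumption that "output" means SLOC added between two consecutive samples are all hypothetical illustrations, not the authors' actual data set or regression model.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical ordinal coding of license "restrictiveness", following the
# grouping described above: GPL = very, LGPL/Mozilla/NPL/MPL = moderate,
# BSD = non-restrictive. The numbers themselves are an assumption.
RESTRICTIVENESS = {
    "GPL": 2,
    "LGPL": 1,
    "Mozilla": 1,
    "NPL": 1,
    "MPL": 1,
    "BSD": 0,
}


@dataclass
class ProjectSample:
    """One observation of one project at one sampling date (hypothetical schema)."""
    name: str
    license: str
    language: str
    sloc_added: int    # assumed: SLOC added since the previous sample
    contributors: int

    @property
    def restrictiveness(self) -> int:
        return RESTRICTIVENESS[self.license]

    @property
    def c_family(self) -> int:
        # Language dummy as described: C or C++ = 1, all other languages = 0.
        return 1 if self.language in ("C", "C++") else 0

    @property
    def output_per_contributor(self) -> float:
        # Raises ZeroDivisionError for a project with no contributors.
        return self.sloc_added / self.contributors


def median_contributors(samples: list[ProjectSample], level: int) -> float:
    """Median contributor count among samples with the given restrictiveness level."""
    counts = [s.contributors for s in samples if s.restrictiveness == level]
    return median(counts)
```

For example, a made-up GPL project in C with 1,200 added lines and 12 contributors gets `restrictiveness == 2`, `c_family == 1` and an `output_per_contributor` of 100.0. The numbers are purely illustrative and are not taken from the study.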
I do not understand the lag between choosing the projects (January 2000) and the start of data sampling (January 2002). This alone could have skewed the results: the 71 most active projects in 2000 would almost certainly NOT be the 71 most active projects two years later. I think this may be a major flaw in this study.
I also don't think the sample size is large enough, and I think the sampling method should have been a random selection of projects that met some reasonable criteria, such as:
- had at least C contributors
- had at least L lines of code contributed over the last M months
- had at least D downloads over the last M months (would this penalize very new and very unpopular projects?)
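A minimal sketch of the random selection suggested above, in Python. The function and field names are hypothetical; C, L and D stay unspecified thresholds supplied by the caller, and the M-month window is assumed to be already applied when `recent_sloc` and `recent_downloads` are computed.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """A project as seen at selection time (hypothetical schema)."""
    name: str
    contributors: int
    recent_sloc: int       # lines of code contributed over the last M months
    recent_downloads: int  # downloads over the last M months


def is_eligible(p: Candidate, min_contributors: int, min_sloc: int, min_downloads: int) -> bool:
    # Thresholds correspond to C, L and D in the list above.
    return (
        p.contributors >= min_contributors
        and p.recent_sloc >= min_sloc
        and p.recent_downloads >= min_downloads
    )


def sample_projects(candidates, n, min_contributors, min_sloc, min_downloads, seed=None):
    """Draw n projects uniformly at random, without replacement, from the eligible pool."""
    pool = [p for p in candidates
            if is_eligible(p, min_contributors, min_sloc, min_downloads)]
    if n > len(pool):
        raise ValueError(f"only {len(pool)} eligible projects, cannot sample {n}")
    rng = random.Random(seed)  # seeded so the selection can be reproduced
    return rng.sample(pool, n)
```

The design choice is that the criteria only define the population, while chance decides which eligible projects are studied, rather than taking the top 71 by activity. Publishing the seed would let others re-draw exactly the same sample.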
I haven't spent much time going over their experimental design, model and stats...
This study builds on an earlier (2005) study, "The Scope of Open Source Licensing"[2] (pre-print), which is where the authors get their notion of license "restrictiveness". That study found:
"Projects geared toward end-users tend to have restrictive licenses, while those oriented toward developers are less likely to do so. Projects that are designed to run on commercial operating systems and whose primary language is English are less likely to have restrictive licenses. Projects that are likely to be attractive to consumers—such as games—and software developed in a corporate setting are more likely to have restrictive licenses. Projects with unrestricted licenses attract more contributors."
That study used all of the roughly 40k SourceForge projects available in 2002.
[1] Fershtman, C. & Gandal, N. (2007). Open source software: Motivation and restrictive licensing. International Economics and Economic Policy. http://dx.doi.org/10.1007/s10368-007-0086-4
[2] Lerner, J. & Tirole, J. (2005). The scope of open source licensing. Journal of Law, Economics and Organization 21:20–56.
Comments
But that's just me :)
SourceForge is undeniably one of the most important OSS repositories in the world. On the other hand, many of the biggest, most important OSS projects are hosted elsewhere.
For example, KDE has more than 13 developers and is licensed under the "restrictive" GPL. Firefox has more than 13 developers and is under the "moderately restrictive" MPL. This list could go on and on.
These guys made the classic research faux pas: they looked at a skewed, unrepresentative sample and then announced that their results are globally relevant. They would have been much better off restricting their commentary to what they can prove about the projects they looked at.
In engineering, as in life, there is always a trade-off. This means that marketing will always have an angle, no matter how messed up its intentions.
>> "...that the output per contributor in open source projects is much higher when licenses are less restrictive and more commercially oriented."
C'mon, guys. First to code up 1,000,000 lines gets $100.
Ready, set, go!
Ah, shucks. I'm being bad. The real reason for the higher output per BSD developer is that there is a lot more competition to come up with GPL code, so each person has a harder time getting quality code in ahead of someone else. This means that each person *officially contributes* less [there being so many more people contributing].
There could even be another reason. With so much more GPL code out there than BSD code, there is a lot more new material for BSD people to copy from than there is for GPL people [copy via studying, not copy/paste]. I'd say the GPL people are pulling everyone else along. The much smaller number of BSD people are (to a greater extent) like worker bees trying to keep the proprietary vendors from falling too far behind by bringing them the fruits and honey of GPL developers.
Hence the greater SLOC for BSD worker bees.
Can I slant or what?
In the FLOSS world, the software either works or it doesn't. It doesn't matter how many eyeballs it took, because many of the eyeballs are not being paid. The eyeballs can frequently afford to be extra careful and take their time because they have no externally imposed deadline. The people who own the eyeballs, who use the product, gain directly from the quality of the product, not from the lines of code it took to write it. A single fix might be all that a particular eyeball really cares about. In this world, SLOC doesn't even guide the decisions of a typical intelligent coder artificially. [Bad pun, I agree.]
As stated in my earlier posting, if less permissive licenses are more popular, the average SLOC will in all likelihood go down, with more people competing for the same set of ideas and features. Also, as stated earlier, with more code ideas being expressed first under less permissive licenses, everyone coming after can more easily pad their SLOC. And then there is the possibility that BSD people are simply more greedy and exclusive.
Did a proprietary monopolist -- or any company -- have anything to do with a SLOC report that appears to indicate that coders who use license terms that benefit the monopolist have a bigger ... um, produce more babies... I mean, are more fertile and prolific? I don't know. I doubt it. Could this monopolist -- or any company, even a company that produces horribly malfunctioning gaming systems that should be given away, not sold -- sponsor multiple experiments in parallel, allowing only the "good" ones -- or should I say the lucky ones, wink wink -- to be published? I don't know, and I doubt it.
Regardless of who dunnit or why -- and it may have been a monopolist or it may not have been a monopolist -- the bottom line is that there are more reports and statistics to "prove" I am Superman (as if we don't all already know that I am) than there is land in Florida.
And as for the value of SLOC... Surely 100 less tasty pies are better than 85 tastier pies. Surely. Surely, with (e.g.) Windows filled with so much more bloat than Linux, Windows has to be better. Surely.
Surely. Surely. Surely.
So all of you go out and code with less permissive licenses so you can all be real macho men (or lovely little fertile princesses).
OK, still don't believe me?
Can you write any more than me here now where I am writing a lt as clearlay I am a more powful wirter adn coder tan andyone else as I write a lot and you dont nannynanyboobobo.. Ha seell ##RWERWETDR i knalkdfn sf sdlf you lfklsadjf klasdlfk h e afsdlkfjaks df lkasthey are coming lkjfklsjfkldsj fdkswhoah .jskdfjslkdfj sdfj asdfk umm faskdfjaslkdfj asdfk sdf comem and get it. slkdjfalksjdflkasjdfk lasdjf sdf whay chu mean lkdjflskjfalksdjfklasdfj asdkfj hello lksjdflkasjdfkljsd testing testing kldsjfklasdjfklasdjf kj 1kj klfjaskldj fklj3kj kljdkfljds kjkt6jkjf kdjfkla sdklfj askdlfj sdkfjklsdfjklsjfa;lksjfl;kasjdfl;kasjdfasdjf
aklsdfj;lasjdf;laksdjf;lkasdf
asdkfjal;skdfjjjjjjjjjjjjfalksdjfasdjfalksdjflkasdjflkajsdflkjasdlkfjasldkfjsdlkfjalksjfsdaf
I hope I made my point because I am not getting paid for any of this.