Article: Open Source Econometric Software better performance, accuracy and bug fixing than commercial software

Yalta and Yalta 2010 ("Should Economists Use Open Source Software for Doing Research?") examine the reliability, accuracy and bug fixing time for an Open Source econometric software package and five commercial econometric software packages. They find that after 5 years many of the bugs in the commercial software have not been fixed, whereas similar bugs in the Open Source software are fixed and released within a week of the discovery of the bugs.

Building on McCullough's 2004 work, which applied a set of tests called Wilkinson's tests to the five commercial software packages, they re-apply these tests to the new versions of the commercial software and also apply them to the Open Source econometrics package Gretl.

The idea behind the Yalta and Yalta paper is to evaluate the bug fix time of the Open Source software and compare it with the fixes, if any, and their timing, that have been applied to the commercial software since the 2004 McCullough paper. More specifically, they examine the bugs found by McCullough to see whether they have been fixed.

The five commercial packages examined here by Yalta and Yalta, and earlier by McCullough, are E_Views, LIMDEP, RATS, SHAZAM and TSP.

Bugs and bug fixing times for Gretl and commercial software

Commercial Packages
Package	Bug	Time to fix bug and release new version
Gretl	Reading files	3 days
Gretl	Rounding error	4 days
Gretl	Standard deviation	1 day
Gretl	Spearman value	1 day
E_Views	Correlation coefficients unit bug	<5yrs
LIMDEP	All	DNA^a
RATS	ZERO Correlations	>5yrs (not fixed)
RATS	Singularity correlation estimates	>5yrs (not fixed)
SHAZAM	Missing values	<5yrs
SHAZAM	X,BIG,LITTLE,MISS	<5yrs
SHAZAM	Correlations, Spearman correlation	>5yrs (not fixed)
SHAZAM	MISS correlation	>5yrs (not fixed)
SHAZAM	ZERO correlation	>5yrs (not fixed)
TSP	ZERO correlation	<5yrs
TSP	Test IIB	Failed

^a We could not apply the tests on Package2 [LIMDEP] because, unlike the other packages, the demo version offered by the vendor only allows using several built-in data sets. As a result, it was not possible without payment to know whether or not they have fixed the flaws in their product. - Yalta and Yalta 2010
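
Several rows in the table concern basic statistics, such as the Gretl standard deviation bug, and test variables named X and BIG. As a rough illustration of the kind of accuracy check involved, and not a reconstruction of any bug reported in the paper, the Python sketch below compares a one-pass textbook variance formula with a two-pass formula. The values assigned to X and BIG are assumptions based on how Wilkinson's test data are usually described; they do not come from this article, and the function names are hypothetical.

```python
# Assumed test data (not given in the article): X = 1..9 and
# BIG = 99999991..99999999, so both have a sample variance of 7.5.
X = [float(i) for i in range(1, 10)]
BIG = [99999990.0 + i for i in range(1, 10)]


def one_pass_variance(values):
    """Textbook formula: (sum(x^2) - (sum x)^2 / n) / (n - 1).

    It subtracts two huge, nearly equal numbers, so it loses precision
    when the values are large relative to their spread.
    """
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for v in values:
        total += v
        total_sq += v * v
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(values):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


if __name__ == "__main__":
    # Print both estimates side by side for each assumed variable.
    for name, data in (("X", X), ("BIG", BIG)):
        print(name, one_pass_variance(data), two_pass_variance(data))
```

In double-precision arithmetic the two-pass version recovers 7.5 for both variables, while the one-pass version is exact for X but not for BIG. That is the sort of discrepancy an entry level accuracy test suite is designed to expose.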

Conclusions

The authors make a number of important points in their conclusions:

On the other hand, studies in the last 15 years show that commercial software vendors can also introduce various difficulties to the research process by not correcting the known errors, avoiding to give details on the algorithms, or providing false information regarding their programs. Closed source software can hurt the reliability of computational results by making it impossible to study and verify the programming code performing the myriad functions expected from today's typical econometric package. It also complicates the process of research replication, which is already an exception and not a rule in the field of economics.

The open source movement, which has started to pick momentum after 1998, is now resulting in scientific software reaching and in some cases surpassing in terms of features and usability some of the proprietary alternatives. This new paradigm also brings its own set of inefficiencies such as an over-supply or under-supply of certain types of software, a surplus of licenses as well as the potential for wasted effort due to 'hijacking,' 'forking,' and 'abandoning.' When it comes to reliability and accountability, however, FLOSS helps avoid some of the difficulties associated with proprietary programs. Open source development is a transparent and merit based process similar in some ways to academics. The availability of the source code enables its verification by a large number of people within the economics profession. Because it is free, everyone has access to it. It is flexible and future proof. These not only result in software of a high standard, but also facilitate peer review and help advance research replication.

In an attempt to assess reliability and accountability, we applied an entry level test suite of accuracy on the gretl econometric package and discovered a number of software defects. However, because gretl is open source, our experience was considerably different in comparison to earlier studies assessing various proprietary packages...unlike the other studies, all of the errors were corrected within a week of our reporting. Moreover, each time there was a revision to one of the source files, the updated version of the program was immediately available for download and inspection...When we applied the same tests on four widely-used proprietary econometric programs, we found that the various flaws uncovered and reported in an earlier study were not necessarily corrected. Despite the 5 years passing, only two of the software vendors have fixed all of the reported errors and still there were problems in all of the packages that we were able to test.

The authors also list what they consider significant Open Source software in the economics and econometrics space:

Project	Category	Year	Developers	SLOC	Effort
GNU Octave	Numerical analysis	1988	74	853,439	238
Gnumeric	Spreadsheet	2001	9	384,341	100
Gnuplot	Scientific plotting	1986	6	95,380	24
Gretl	Econometrics	2000^a	10	361,393	94
Maxima	Algebra	1998^a	17	616,576	167
PSPP	Statistics	1998	3	152,593	39
R	Statistics	1997	13	549,780^b	151
Sage	Mathematics	2005	142	195,602	51
Scilab	Numerical analysis	1994	35	1,234,895	341
SciPy	Mathematical library	2001	31	455,903	124

Source: Ohloh.net.
^a Shows the year the program became Open Source
^b Base system only. The more than 1700 contributed R extension packages are not included

New York Times article: Data Analysts Captivated by R’s Power

References and related references

Baiocchi, G. 2007. Reproducible research in computational economics: guidelines, integrated approaches, and open source software. Computational Economics 30:1:19-40.

Koenker, R., A. Zeileis. 2009. On reproducible econometric research. Journal of Applied Econometrics 24:5:833-847.

McCullough, B.D. 2004. Wilkinson's tests and econometric software. Journal of Economic and Social Measurement 29:261-270.

McCullough, B.D., D.A. Heiser. 2008. On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis 52:10:4570-4578.

McCullough, B.D., K.A. McGeary, T.D. Harrison. 2008. Do economics journal archives promote replicable research? Canadian Journal of Economics/Revue canadienne d'économique 41:4:1406-1420.

Smith, R.J., J. Wilson Mixon. 2006. Teaching undergraduate econometrics with GRETL. Journal of Applied Econometrics 21:7:1103-1107.

Wilkinson, L. 1985. Statistical Quiz. SYSTAT, Evanston, IL.

Yalta, A., R. Lucchetti. 2008. The GNU/Linux platform and freedom respecting software for economists. Journal of Applied Econometrics 23:2:279-286.

Yalta, A., A. Yalta. 2010. Should Economists Use Open Source Software for Doing Research? Computational Economics 35:4:371-394.

Yalta, A., A. Yalta. 2010. Wilkinson Tests and gretl. EHUCHAPS. Universidad del País Vasco - Facultad de Ciencias Económicas y Empresariales.
