Speed comparison of various data analysis software

Source: Baidu Wenku; Editor: Jiuxiang News Network; Date: 2024/04/29 21:24:49
Speed comparison of various number crunching packages (version 2)

Speed of execution is an important aspect in choosing data analysis software. Since it can vary by a factor of 10 or more on the same computer, it can make the difference between a quick-reacting package and one that seems to take hours to calculate!

This is the second version of our benchmark tests, derived from Stephan Steinhaus' benchmark v. 2. You can find a (quite outdated) test with our first version here. The tests in our first version were scaled in such a way that each of them ran in about 1 second on the test machine (a Celeron 500 MHz with 256 MB RAM under Windows 2000 Professional) with our reference software, Matlab 6.0 (R12). For this second version, we decided to change the reference to freely available software, so that everybody can download it and use it as a reference on their own computer. We chose R version 1.6.2 with a standard, non-processor-optimized ATLAS (Rblas.dll) library as our new reference. All tests are scaled to run in 1 +/- 0.1 sec on our new test computer: a Pentium IV 1.6 GHz with 1 GB RAM under Windows XP Professional.

Other changes from the original Steinhaus benchmark are the same as in version 1: (1) we kept only tests that run on all the software checked, (2) we grouped them into two categories ("matrix calculation" versus "matrix functions"), (3) we added a "programming" category to evaluate how fast the software executes scripts, (4) we adapted or optimized tests for recent versions of the software, and (5) we considered only trimmed geometric means (worst and best results eliminated) inside each category and for the overall index. Note that Stephan Steinhaus' report also evaluates the "richness" of the packages (which functions are present, and which ones are absent). Here, we only compare software for speed!
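As a rough illustration of point (5), the sketch below computes a trimmed geometric mean by dropping the single best and single worst timing before averaging. The function name is hypothetical, and the exact trimming rule used by the authors is an assumption; the sample timings are the R 1.9.0 values for tests I.A to I.E from the results table.

```python
import math

def trimmed_geometric_mean(times):
    """Geometric mean of the timings after dropping the single best and worst.

    Assumption: exactly one value is trimmed at each end, which is how
    "worst and best results eliminated" is read here.
    """
    if len(times) < 3:
        raise ValueError("need at least three timings to trim both extremes")
    kept = sorted(times)[1:-1]  # drop the fastest and the slowest result
    log_sum = sum(math.log(t) for t in kept)
    return math.exp(log_sum / len(kept))

# Timings in seconds for tests I.A to I.E of R 1.9.0 (from the results table).
r_matrix_calculation = [1.49, 0.43, 0.87, 0.26, 0.26]
score = trimmed_geometric_mean(r_matrix_calculation)
print(f"{score:.2f}")  # prints 0.46, matching the reported category score
```

A geometric mean suits ratios to a reference time, since a test that runs twice as fast and one that runs twice as slow cancel out; trimming keeps a single outlier from dominating the score.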

We have compared:

R 1.9.0, the latest version of our reference software, a rich and powerful free "S language dialect" (R benchmark 2.3 script; text file, 13 Kb). Here, we use the Pentium IV-optimized ATLAS library (provided on CRAN), which gives slightly better results in some tests. We have not tested other optimized libraries, such as Goto's.
S-PLUS 6.1, the commercial equivalent of R (S-PLUS benchmark 2 script; text file, 10 Kb).
Matlab 6.0 (R12), our previous reference (download Matlab benchmark 2 script and accompanying gcd2.m custom function; text file, 10 Kb). Warning! This is not the latest version. At the time I write this (21 April ...
O-Matrix 5.6, a cheap but very fast package that can run most Matlab scripts (O-Matrix native mode benchmark 2 & O-Matrix Matlab mode benchmark 2 scripts; text files, 10 Kb each).
Octave 2.1.42, a free "clone" of Matlab 4 (Octave benchmark 2 script; text file, 10 Kb). The version used was compiled with an optimized ATLAS library for the Pentium IV.
Scilab 2.7, a very complete free package, "not unlike" Matlab (Scilab benchmark 2 script; text file, 10 Kb).
Ox 3.30, a very efficient matrix package similar to Gauss and free for academic use (Ox benchmark 2 script; text file, 11 Kb).

Tests are:

I. Matrix calculation: evaluates the ability to perform some common matrix computations.

I.A: creation, transposition, deformation of a 1500x1500 matrix. This test evaluates the ability to create and manipulate matrices.
I.B: creation of an 800x800 normally distributed random matrix and taking the 1000th power of all its elements. Evaluates the speed at which a random matrix is processed element by element.
I.C: sorting of 2,000,000 random values. Tests the speed of a sorting operation.
I.D: 700x700 cross-product matrix (b = a' * a). Evaluates matrix operations.
I.E: linear regression over a 600x600 matrix (b = a \ b'). Tests the speed of execution for linear model evaluation.

II. Matrix functions: evaluates speed of some preprogrammed matrix functions.

II.A: fast Fourier transform over 800,000 values. The Fourier transform is a commonly used method in signal processing.
II.B: eigenvalues of a 320x320 random matrix. Eigenvalues are used in multivariate analyses (PCA, ...).
II.C: determinant of a 650x650 random matrix. Calculation of the determinant of a matrix is a common, but unequally optimized, function in matrix calculation packages.
II.D: Cholesky decomposition of a 900x900 matrix. Another commonly preprogrammed function.
II.E: inverse of a 400x400 random matrix. A computationally intensive function for which various algorithms exist (with very different performances).

III. Programming: evaluates efficiency in running scripts and custom functions.

III.A: calculation of 750,000 Fibonacci numbers. This evaluates the speed of vector calculation.
III.B: creation of a 2250x2250 Hilbert matrix. Evaluates performance of matrix calculation in scripts.
III.C: greatest common divisors of 70,000 pairs. Tests the potential of recursive functions.
III.D: creation of a 220x220 Toeplitz matrix. Checks the speed of execution for loops.
III.E: Escoufier's method on a 37x37 random matrix. Tests various aspects of programming combined in a single test.
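To give a feel for what tests III.B and III.C exercise, here is a minimal sketch in Python. It is not the benchmark code: the function names are hypothetical, the recursion is plain Euclid's algorithm standing in for the custom gcd function the scripts ship with, and the Hilbert matrix uses the standard definition H[i][j] = 1 / (i + j - 1) with 1-based indices.

```python
def gcd_recursive(a, b):
    """Greatest common divisor via Euclid's algorithm, written recursively.

    Assumption: the benchmark's custom gcd function follows the same idea.
    """
    return a if b == 0 else gcd_recursive(b, a % b)

def hilbert(n):
    """n x n Hilbert matrix as nested lists, H[i][j] = 1 / (i + j - 1), 1-based."""
    return [[1.0 / (i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

# Small inputs only; the benchmark uses 70,000 pairs and a 2250x2250 matrix.
pairs = [(48, 18), (17, 5), (100, 75)]
print([gcd_recursive(a, b) for a, b in pairs])  # prints [6, 1, 25]
print(hilbert(3)[0])
```

The recursive form matters here because the test is meant to measure function-call overhead, not the arithmetic itself.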

Note that tests III.A-E do not use the most optimized algorithms for each package, but they do test similar features in all of them. For instance, a matrix algorithm for test III.D is often much more efficient, as is a possibly preprogrammed toeplitz() function. Yet, we keep the loop algorithm in all cases... in order to test the speed of loop execution in scripts!
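The contrast between the loop algorithm and a more compact construction can be sketched as follows. This is an illustration under assumptions: the matrix is taken to be a symmetric Toeplitz matrix defined by its first column, both function names are hypothetical, and the actual benchmark scripts may fill the matrix differently.

```python
def toeplitz_loops(first_column):
    """Fill a symmetric Toeplitz matrix element by element with two nested loops."""
    n = len(first_column)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry depends only on the distance from the main diagonal.
            matrix[i][j] = first_column[abs(i - j)]
    return matrix

def toeplitz_comprehension(first_column):
    """Same matrix built with a nested list comprehension."""
    n = len(first_column)
    return [[first_column[abs(i - j)] for j in range(n)] for i in range(n)]

column = list(range(1, 221))  # 220 values, the size used in test III.D
t = toeplitz_loops(column)
assert t == toeplitz_comprehension(column)
print(t[0][:4], t[3][:4])  # prints [1, 2, 3, 4] [4, 3, 2, 1]
```

Both versions produce the same matrix; only the explicit double loop measures what test III.D is after, namely the cost of interpreted loop iterations.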

Results

The tests were run three times on a Pentium IV 1.6 GHz computer with 1 GB of memory under Windows XP Professional, and the mean value was recorded. The next table presents the results:
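The measurement procedure (three runs, mean recorded) can be sketched as a small timing harness. The helper name and the scaled-down workload are assumptions made for illustration; the workload loosely mirrors test I.C with 200,000 values instead of 2,000,000.

```python
import random
import time
from statistics import mean

def mean_runtime(task, repeats=3):
    """Call task() `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return mean(timings)

# Data is created once, outside the timed region, so only the sort is measured.
data = [random.random() for _ in range(200_000)]

# sorted() returns a new list, so every run sorts the same unsorted input.
elapsed = mean_runtime(lambda: sorted(data))
print(f"mean of 3 runs: {elapsed:.3f} s")
```

Keeping data creation outside the timed call is a design choice of this sketch; whether the original scripts time setup as well is not stated in the text.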

Test (sec)  R 1.9.0       S-PLUS 6.1    Matlab 6.0    O-Matrix 5.6  O-Matrix 5.6  Octave 2.1.42 Scilab 2.7    Ox 3.30
                                                      (Matlab mode) (native)

I. Matrix calculation
I.A         1.49          3.03          0.48          0.69          0.58          2.01          1.19          0.74
I.B         0.43          1.37          0.42          0.53          0.62          1.22          0.70          0.94
I.C         0.87          2.38          0.89          0.98          0.98          7.77          2.00          1.97
I.D         0.26          0.72          0.73          0.19          0.30          0.35          8.58          0.45
I.E         0.26          1.33          0.24          0.17          0.14          0.78          2.11          1.04
Score       0.46          1.63          0.53          0.41          0.48          1.24          1.71          0.90

II. Matrix functions
II.A        1.01          1.62          0.48          0.99          1.05          0.96          1.78          3.06
II.B        1.25          0.96          0.86          0.41          0.49          2.30          2.44          1.78
II.C        0.30          0.41          0.27          0.13          0.14          1.02          2.27          0.71
II.D        0.24          1.92          0.33          0.11          0.12          0.21          1.96          0.36
II.E        0.14          1.48          0.23          0.07          0.06          0.47          1.67          0.35
Score       0.42          1.35          0.35          0.18          0.20          0.77          2.00          0.77

III. Programming
III.A       0.83          1.68          2.11          0.31          1.84          2.06          0.72          0.69
III.B       1.33          1.14          0.84          0.51          0.64          0.73          0.91          0.79
III.C       0.56          0.71          0.91          0.14          0.17          0.42          1.52          0.72
III.D       0.67          6.62          0.38          0.10          0.10          4.39          1.45          0.05
III.E       0.89          15.10         1.92          0.60          0.56          3.08          3.97          0.31
Score       0.79          3.15          1.14          0.28          0.39          1.67          1.26          0.54

Total       10.52         40.47         11.12         5.93          7.83          27.76         33.27         13.97
Overall     0.53          1.71          0.60          0.27          0.34          1.17          1.63          0.72

Comments

The higher the result (in seconds), the slower the test executes. Lower values thus mean higher performance. Results lower than 0.50 (more than twice as fast as the reference) are in green; results larger than 2.00 (more than twice as slow as the reference) are in violet. We immediately see the progress made in R since version 1.6.2 (about 30% faster overall, and as much as four to seven times faster for some operations using the optimized libraries).
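Since every test takes about 1 second on the reference, each timing can be read directly as a ratio to the reference. The sketch below applies the two thresholds described above; the function name and the "neutral" label for in-between values are assumptions, and the sample row is Scilab 2.7 on tests I.A to I.E from the results table.

```python
def classify(seconds, fast=0.50, slow=2.00):
    """Label a timing relative to the ~1 s reference using the 0.50 / 2.00 thresholds."""
    if seconds < fast:
        return "green"    # more than twice as fast as the reference
    if seconds > slow:
        return "violet"   # more than twice as slow as the reference
    return "neutral"      # hypothetical label for everything in between

# Scilab 2.7 timings for tests I.A to I.E (from the results table).
scilab_matrix_calculation = [1.19, 0.70, 2.00, 8.58, 2.11]
print([classify(t) for t in scilab_matrix_calculation])
# prints ['neutral', 'neutral', 'neutral', 'violet', 'violet']
```

Note that the comparisons are strict, so a result of exactly 2.00 is not flagged, in line with "larger than 2.00" in the text.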

S-PLUS is a well-recognized standard in statistics, and it is the commercial counterpart of R. As we see here, it is much slower than R under Windows (it takes about four times longer to complete all tests)! S-PLUS is well known for its versatility and for the ease of exploring statistical models in its environment. It excels in almost all fields of statistics. However, its limits are reached when working with huge datasets. In this case, SAS (not evaluated here) is considered to be faster, and thus more efficient, especially in loop programming, where S-PLUS is desperately slow (test III.E)! However, S-PLUS offers alternatives: the For() function for optimized loops, and the apply() family of functions that "vectorize" loops. With medium-sized matrices, as in the current test, it is easily outperformed by almost all the other software evaluated here.

Since R offers features similar to those of S-PLUS, a larger number of additional libraries (more than 300!), and is totally free, it is clearly an excellent choice for statistical analyses. This benchmark also shows that it is quite good for "number crunching". Moreover, it runs on almost all platforms (Windows, Macintosh, Unix/Linux) and it does not have the "loop problem" of S-PLUS (yet it also provides apply() and the like to accelerate loops). However, it does not (yet) offer the same nice user interface with menus and dialog boxes (GUI) as S-PLUS 6.1 does (though many professionals do not care about that, because they prefer to use scripts and the command line for finer control over their calculations). R gets better and better with successive releases. It is maintained and enriched by a very active community of developers. These are the reasons why we decided to promote it as the reference in our benchmark tests.

Matlab 6 is a commercial standard in pure matrix calculation. It is significantly poorer in statistical models than S-PLUS or R, but it offers a wide range of high-quality toolboxes for specific applications (although they increase the cost of this already very expensive software!). Concerning speed, it is about as fast as R 1.9.0. However, we did not test the latest version, 6.5.1, which seems to provide a substantial increase in speed. Being one of the fastest, the richest and the most commonly used packages, and having one of the best user interfaces, Matlab 6 deserves its status as the leading product in matrix programming.

Matlab has several contenders that offer a similar matrix language for a lower price (O-Matrix, Octave, Scilab). Among them, only one also competes with Matlab 6.0 on performance: O-Matrix. Overall, O-Matrix is the fastest matrix computation package we have tested. It is much less expensive than Matlab, and it provides reasonable compatibility. However, O-Matrix does not offer the same range of specialized toolboxes and it runs only on Windows.

The two other "Matlab clones" (Octave & Scilab) are free open source software. Their performance is somewhat lower than that of Matlab 6.0 and compares better with Matlab 5.3 (see version 1 of the test). Octave aims to be fully compatible with the base version of Matlab 4.2. One should note that Octave runs under the cygwin emulation of Unix in Windows, and this probably has some negative impact on its raw performance. The native Unix/Linux version should run comparatively faster. Scilab offers many more functions than Octave, but it is not 100% compatible with the Matlab language, and it is the slowest package of this comparison if we set aside the "loop problem" of S-PLUS.

Ox stands a little apart. It is the only package that does not claim compatibility with one of the two standards previously cited, Matlab or S-PLUS. However, it is partly compatible with Gauss, another high-quality commercial matrix calculation package regarded as a standard in econometrics (not evaluated here, but you will find detailed tests in Stephan Steinhaus' report). It is one of the four packages (with R 1.9.0, Matlab and O-Matrix) that are faster than R 1.6.2, that is, our reference software and version for this benchmark. It is particularly good at executing scripts (tests III). As it is a lightweight console application that can easily run scripts in batch mode, Ox is an excellent choice for shelling out matrix calculation scripts from various kinds of applications. O-Matrix is even faster, but it is restricted to Windows systems.

Conclusions

Choosing data analysis software is a difficult task. "Matrix languages" (like all the software we evaluated here) are very flexible because they are programmable and they are able to work very efficiently with matrices (by definition!), which are widely used in data analysis. However, they differ from each other in terms of price, richness (the number of functions provided), usability (including the quality of their user interface, their status as an established standard or not, the quality of their support, their availability on different platforms like Windows, Macintosh, Unix or Linux), and finally, in terms of their raw performance. We evaluated the latter here by using a benchmark suite of 15 tests. Considering the results obtained with our benchmark (but beware of its limits: only a few features were tested, and solely on a Windows platform!), one can conclude:

R is one of the fastest open source data analysis packages. Since it is free and provides many additional packages for all kinds of statistics, we warmly recommend it.
S-PLUS is slower and much more expensive, but it still offers a better graphical user interface.
Matlab is equally fast, rich and offers a well-designed user interface, but it is equally expensive.
O-Matrix is the fastest matrix language we have tested on Windows.
Currently, no free "clone" of Matlab is as fast as Matlab 6.0 itself.
Octave is language-compatible with Matlab, but not a top performer on Windows.
Scilab is a free alternative to Matlab for "richness" more than for performance.
Ox is a very efficient matrix language, especially for batch processing of scripts.