Intel® Math Kernel Library 10.0 - ScaLAPACK

ScaLAPACK
Intel® Math Kernel Library (Intel® MKL) provides the underlying components of ScaLAPACK (Scalable Linear Algebra Package), including a distributed memory version of BLAS (PBLAS or Parallel BLAS) and a set of Basic Linear Algebra Communication Subprograms (BLACS) for inter-processor communication.

ScaLAPACK is a standard package of routines for solving linear algebra problems on distributed memory multiprocessor machines (clusters). The ScaLAPACK library provides a subset of Linear Algebra Package (LAPACK) functions, redesigned to work efficiently in a distributed memory multiprocessor environment.

Figure 1 illustrates the relationship between ScaLAPACK and its components:

Figure 1: Relationship of ScaLAPACK and Components
(Image courtesy of Innovative Computing Laboratory*)

Performance*

The Intel MKL implementation of the ScaLAPACK library is specially tuned for Itanium®, Intel® Xeon®, and Intel® Pentium® processor-based systems.

ScaLAPACK covers two areas of linear algebra: direct solvers and eigenvalue problems. We therefore look at both PDGETRF (the factorization routine used in the direct solution of linear systems of equations) and PDSYEV (used for solving eigenvalue problems). PDGETRF (Parallel, Double precision, GEneral, TRiangular matrix Factorization) is a key function in the linear equations solver area because it is a general factorization routine that applies to many classes of matrices, and because the LU (lower-upper) factorization that it computes is the performance-intensive portion of a linear equations solver.
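
To make the role of the LU factorization concrete, here is a minimal serial sketch in pure Python of LU factorization with partial pivoting, followed by the two triangular solves that use it. It illustrates the kind of computation PDGETRF performs on a distributed matrix; it is not the Intel MKL or NETLIB implementation, and the names lu_factor and lu_solve are hypothetical.

```python
def lu_factor(a):
    """Factor a square matrix as P*A = L*U using partial pivoting.

    Returns (lu, piv). lu stores U on and above the diagonal and the
    multipliers of the unit lower triangular L below it. piv[k] is the
    row that was swapped with row k at step k.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input intact
    piv = list(range(n))
    for k in range(n):
        # Pick the entry of largest magnitude in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        lu[k], lu[p] = lu[p], lu[k]
        # Eliminate column k below the diagonal. This update of the
        # trailing submatrix is where most of the arithmetic happens.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A*x = b given the output of lu_factor."""
    n = len(lu)
    x = list(b)
    for k in range(n):  # apply the recorded row swaps to b
        x[k], x[piv[k]] = x[piv[k]], x[k]
    for i in range(n):  # forward substitution with unit L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
lu, piv = lu_factor(a)
x = lu_solve(lu, piv, [7.0, 19.0, 49.0])
```

The factorization costs on the order of n^3 operations while each solve costs on the order of n^2, which is why the factorization step dominates the running time of a linear equations solver.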

In our tests, we compare the Intel MKL implementation of ScaLAPACK to the publicly available implementation from NETLIB. We show the performance of NETLIB ScaLAPACK using BLAS from Intel MKL as well as from ATLAS*. More information on the ScaLAPACK library is available at http://www.netlib.org/scalapack/.

Raw Performance
Figure 2 shows performance on a 32-node cluster with 64 Intel Xeon processors for various problem and memory sizes. Figure 2 illustrates that:

    1. Intel MKL ScaLAPACK significantly outperforms NETLIB ScaLAPACK.
    2. The advantage is larger still when Intel MKL is compared to NETLIB ScaLAPACK using ATLAS* BLAS.


Figure 2: PDGETRF Performance Comparison Varying Problem Size


Because NETLIB ScaLAPACK requires users to link to a separate implementation of BLAS, the gains from Intel's ScaLAPACK optimizations can be isolated from the gains from its BLAS optimizations. A comparison of Intel MKL with NETLIB ScaLAPACK, where both use Intel MKL BLAS, shows that the optimizations Intel has made specifically for ScaLAPACK give a 15 percent performance advantage over NETLIB ScaLAPACK. The combined optimizations in Intel MKL ScaLAPACK and BLAS can deliver an approximately 50 percent performance improvement overall when compared to NETLIB ScaLAPACK using ATLAS* BLAS.
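
The two figures quoted above can be combined to estimate how much of the overall gain comes from the BLAS layer alone. The sketch below assumes that the two gains compose multiplicatively, which the text does not state, and the function name remaining_gain_pct is hypothetical.

```python
def remaining_gain_pct(total_gain_pct, known_gain_pct):
    """Gain left over once a known gain is factored out of a total gain.

    Assumes the gains multiply: (1 + total) = (1 + known) * (1 + remaining).
    """
    total = 1.0 + total_gain_pct / 100.0
    known = 1.0 + known_gain_pct / 100.0
    return (total / known - 1.0) * 100.0


# 50 percent combined gain, 15 percent from ScaLAPACK-specific tuning.
blas_gain = remaining_gain_pct(50.0, 15.0)
print(f"Implied gain from the BLAS layer: {blas_gain:.1f} percent")
```

Under that assumption, roughly 30 percent of the improvement over NETLIB ScaLAPACK with ATLAS* BLAS would be attributable to Intel MKL BLAS.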

Figure 3 below looks at PDSYEV, which computes the eigenvalues and eigenvectors of a real symmetric matrix. On the same 32-node (64-core) cluster of Intel® Xeon® processors, Intel MKL can deliver double the performance of NETLIB ScaLAPACK.


Figure 3: PDSYEV Performance Comparison Varying Problem Size


A major benefit of distributed memory parallel computing (clusters) is the ability to scale to very large systems. Users of clusters are therefore often particularly interested in how well software performance scales with system size. The classic test is to increase the problem size in proportion to the number of nodes and observe how close to linearly the performance grows. Figure 4 below shows this test, in which Intel MKL provides large gains over NETLIB ScaLAPACK using ATLAS* BLAS on large systems.
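
The scaling test described above can be sketched as follows. This example assumes that "problem size" means the memory taken by a dense N x N matrix, so keeping memory per node constant makes N grow with the square root of the node count. The base order of 10000 and the function names are illustrative assumptions, not values from the tests reported here.

```python
import math


def scaled_order(base_order, nodes):
    """Matrix order that keeps per-node memory constant.

    A dense N x N matrix needs N*N elements, so spreading it over
    `nodes` nodes with a fixed share per node gives N = base * sqrt(nodes).
    """
    return int(base_order * math.sqrt(nodes))


def scaled_efficiency(rate_one_node, rate_many_nodes, nodes):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return rate_many_nodes / (nodes * rate_one_node)


# Hypothetical base case: order 10000 on a single node.
for nodes in (1, 4, 16, 32):
    print(nodes, scaled_order(10000, nodes))
```

A scaled_efficiency value close to 1.0 at every cluster size corresponds to the near-linear growth this test looks for.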


Figure 4: Performance Comparison Varying Cluster Size


Block Size Robustness
When running ScaLAPACK, you must decide how to “block” your data. Distributing your data among nodes involves choosing an appropriate block size, which determines the size of the pieces in which the data is dealt out to each node. Choosing a good value takes effort, and the wrong block size can have a significant adverse effect on performance.
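
To show how the block size shapes the distribution, the sketch below works through the one-dimensional block-cyclic layout that ScaLAPACK applies to the rows and to the columns of a matrix. The numroc function is a Python transcription of the logic of ScaLAPACK's NUMROC tool routine; the owner helper and the example sizes are illustrative assumptions.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a distributed matrix held by one process.

    n        : global number of rows (or columns)
    nb       : block size
    iproc    : coordinate of the process in its grid dimension
    isrcproc : coordinate of the process that holds the first block
    nprocs   : number of processes in that grid dimension
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                   # number of whole blocks
    count = (nblocks // nprocs) * nb    # every process gets at least this
    extrablks = nblocks % nprocs        # leftover whole blocks
    if mydist < extrablks:
        count += nb                     # one extra whole block
    elif mydist == extrablks:
        count += n % nb                 # the final partial block, if any
    return count


def owner(global_index, nb, nprocs, isrcproc=0):
    """Grid coordinate of the process owning a 0-based global index."""
    return (global_index // nb + isrcproc) % nprocs


# Hypothetical example: 10 rows, block size 2, 3 process rows.
shares = [numroc(10, 2, p, 0, 3) for p in range(3)]
print(shares)
```

Small blocks spread the rows and columns more evenly across processes, while large blocks give each local BLAS call more work to do, which is the trade-off behind the choice of block size.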

The Intel MKL implementation of ScaLAPACK is tolerant of block size differences. Figure 5 below shows how Intel MKL 9.0 provides approximately the same high performance regardless of block size. The same cannot be said for NETLIB ScaLAPACK.


Figure 5: Performance Comparison Varying Block Size

Summary
The Intel MKL implementation of ScaLAPACK is highly optimized for Intel® processors and can significantly increase the performance of your application compared to other implementations of ScaLAPACK.

Reference
§ Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, go to http://www.intel.com/software/products/.


