Volume 09, Issue 02
Compute-Intensive, Highly Parallel Applications and Uses
Intel Technology Journal - Featuring Intel's Recent Research and Development
Volume 09, Issue 02. Published May 19, 2005.
ISSN 1535-864X. DOI: 10.1535/itj.0902.04
Performance Scalability of Data-Mining Workloads in Bioinformatics
Yurong Chen, Corporate Technology Group, Intel Corporation
Qian Diao, Corporate Technology Group, Intel Corporation
Carole Dulong, Corporate Technology Group, Intel Corporation
Chunrong Lai, Corporate Technology Group, Intel Corporation
Wei Hu, Corporate Technology Group, Intel Corporation
Eric Li, Corporate Technology Group, Intel Corporation
Wenlong Li, Corporate Technology Group, Intel Corporation
Tao Wang, Corporate Technology Group, Intel Corporation
Yimin Zhang, Corporate Technology Group, Intel Corporation

Index words: data mining, bioinformatics, performance scalability analysis

Citation for this paper: Chen, Y.; Diao, Q.; Dulong, C.; Lai, C.; Hu, W.; Li, E.; Li, W.; Wang, T.; Zhang, Y. "Performance Scalability of Data-Mining Workloads in Bioinformatics." Intel Technology Journal. http://developer.intel.com/technology/itj/2005/volume09issue02/art04_data_workloads/p01_abstract.htm (May 2005).
ABSTRACT

Data mining is the extraction of hidden predictive information from large databases. Emerging data-mining applications are important drivers of future microprocessor architecture. This paper analyzes the performance scalability of such applications on parallel architectures in order to understand how best to architect the next generation of microprocessors, which will have many CPU cores on chip.

Bioinformatics is one of the most active research areas in computer science, and it relies heavily on many types of data-mining techniques. In this paper, we report on the performance scalability analysis of six bioinformatics applications on a 16-way SMP system based on Intel® Xeon™ microprocessors. These applications are very compute intensive and manipulate very large data sets; many of them are freely available. Bioinformatics is therefore a good proxy for workload analysis of general data-mining applications. Our experiments show that these applications exhibit good parallel behavior after some algorithm-level reformulation or a careful choice of which parallelism to exploit. Most of them scale well as the number of processors increases, with speed-ups of up to 14.4X on 16 processors.
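As a worked example (our own arithmetic, not a figure reported in the paper), the 14.4X result can be read through two standard metrics. Parallel efficiency E is the speed-up S divided by the processor count p. The Karp-Flatt metric inverts Amdahl's law to estimate the serial fraction f implied by a measured speed-up:

```latex
S(p) = \frac{1}{f + (1 - f)/p}
\quad\Longrightarrow\quad
f = \frac{1/S - 1/p}{1 - 1/p}

E = \frac{S}{p} = \frac{14.4}{16} = 0.90,
\qquad
f = \frac{1/14.4 - 1/16}{1 - 1/16} = \frac{1/144}{15/16} = \frac{1}{135} \approx 0.0074
```

Under Amdahl's fixed-problem-size assumption, a 14.4X speed-up on 16 processors is consistent with roughly 0.74% of the single-processor run time behaving as serial work. That estimate lumps together truly serial code, synchronization, and memory or cache contention, so it summarizes these overheads rather than separating them.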

We start with an introduction to data mining, briefly describe the data-mining techniques studied, and list the selected workloads that use these techniques. We then briefly describe the methodology used for the studies. Next, we present the scalability analysis of three workloads related to Bayesian Network (BN) structure, two workloads related to recognition, and one workload related to optimization. We conclude with the key lessons of the study. These workloads are compute intensive and data parallel, and they manipulate large amounts of data that stress the cache hierarchy. Techniques that optimize cache use are therefore key to ensuring the performance scalability of these workloads on parallel architectures.
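To make "techniques optimizing the use of caches" concrete, here is a generic sketch of loop tiling (cache blocking) applied to an all-pairs squared-distance computation, a kernel common in data mining. It is not code from the paper, and the names `pairwise_naive`, `pairwise_tiled`, and the tile size `TILE` are hypothetical. Python is used only to show the loop structure; the cache benefit itself would appear in a compiled implementation.

```python
from typing import List, Sequence

# Hypothetical tile size (points per tile). In a compiled implementation it
# would be tuned so that two tiles of points fit in the target cache level.
TILE = 64

Point = Sequence[float]


def sqdist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((u - v) * (u - v) for u, v in zip(a, b))


def pairwise_naive(x: Sequence[Point]) -> List[List[float]]:
    """Row by row: every row i sweeps over all n points j, so for large n
    the point data is streamed through the cache once per row."""
    n = len(x)
    return [[sqdist(x[i], x[j]) for j in range(n)] for i in range(n)]


def pairwise_tiled(x: Sequence[Point], tile: int = TILE) -> List[List[float]]:
    """Fill the n x n output in tile x tile blocks.

    Within one block only 2 * tile points are touched, so they can stay
    cache-resident while each is reused up to `tile` times. Every block
    writes a disjoint part of the output, so blocks could be assigned to
    different cores without write conflicts (a data-parallel split).
    """
    n = len(x)
    out = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                row = out[i]
                for j in range(jj, min(jj + tile, n)):
                    row[j] = sqdist(x[i], x[j])
    return out
```

Both functions compute each entry with the same `sqdist` call, so their outputs are identical; only the order in which entries are visited changes, which is exactly what tiling manipulates to improve data reuse.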
