A mapreduce-based parallel clustering algorithm for large protein-protein interaction networks

Li LIU, Dangping FAN, Ming LIU, Guandong XU, Shiping CHEN, Yuan ZHOU, Xiwei CHEN, Qianru WANG, Yufeng WEI

Research output: Chapter in Book/Report/Conference proceeding (Chapters)

1 Citation (Scopus)

Abstract

Clustering proteins or identifying functionally related proteins in Protein-Protein Interaction (PPI) networks is one of the most computation-intensive problems in the proteomic community. Most research has focused on improving the accuracy of clustering algorithms. However, the high computational cost of these algorithms, such as Girvan and Newman's clustering algorithm, has been an obstacle to their use on large-scale PPI networks. In this paper, we propose an algorithm, called Clustering-MR, to address this problem. Our solution effectively parallelizes Girvan and Newman's clustering algorithm, which is based on edge-betweenness, using MapReduce. We evaluated the performance of our Clustering-MR algorithm in a cloud environment with testing datasets of different sizes and different numbers of worker nodes. The experimental results show that our Clustering-MR algorithm can achieve high performance for large-scale PPI networks with more than 1000 proteins or 5000 interactions. Copyright © 2012 Springer-Verlag Berlin Heidelberg.
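
The general idea named in the abstract, computing edge-betweenness in a map/reduce style and removing the highest-scoring edge as in Girvan and Newman's algorithm, can be illustrated with a minimal sketch. This is not the paper's Clustering-MR implementation: it assumes plain in-process Python instead of a MapReduce framework, an undirected graph without self-loops stored as a dict of neighbor sets, and hypothetical function names (`map_source`, `reduce_scores`, `edge_betweenness`, `girvan_newman_split`) chosen only for illustration.

```python
from collections import defaultdict, deque
from functools import reduce


def map_source(graph, s):
    """Map step (assumed decomposition): one BFS from source s, Brandes-style,
    returning that source's partial contribution to each edge's betweenness."""
    dist = {s: 0}
    sigma = defaultdict(float)  # number of shortest paths from s to each node
    sigma[s] = 1.0
    preds = defaultdict(list)   # shortest-path predecessors
    order = []                  # nodes in non-decreasing distance from s
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = defaultdict(float)    # dependency accumulated per node
    partial = defaultdict(float)  # partial score per undirected edge
    for w in reversed(order):
        for v in preds[w]:
            c = sigma[v] / sigma[w] * (1.0 + delta[w])
            partial[frozenset((v, w))] += c
            delta[v] += c
    return partial


def reduce_scores(a, b):
    """Reduce step: sum partial edge scores that share the same edge key."""
    out = defaultdict(float, a)
    for edge, c in b.items():
        out[edge] += c
    return out


def edge_betweenness(graph):
    partials = map(lambda s: map_source(graph, s), graph)
    total = reduce(reduce_scores, partials, defaultdict(float))
    # Undirected graph: every node pair is counted once from each endpoint.
    return {edge: c / 2.0 for edge, c in total.items()}


def components(graph):
    """Connected components via depth-first search."""
    seen, comps = set(), []
    for s in graph:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(graph[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(graph):
    """Remove highest-betweenness edges until the component count increases."""
    g = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    start = len(components(g))
    while len(components(g)) == start:
        scores = edge_betweenness(g)
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        g[u].discard(v)
        g[v].discard(u)
    return components(g)
```

In this sketch each per-source BFS is independent of the others, which is what makes the map step a natural candidate for distribution across worker nodes; the reduce step only needs to sum values that share an edge key.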

Original language: English
Title of host publication: Advanced data mining and applications: 8th International Conference, ADMA 2012, Proceedings
Editors: Shuigeng ZHOU, Songmao ZHANG, George KARYPIS
Publisher: Springer
Pages: 138-148
ISBN (Print): 9783642355264
DOI: https://doi.org/10.1007/978-3-642-35527-1_12
Publication status: Published - 2012

Citation

Liu, L., Fan, D., Liu, M., Xu, G., Chen, S., Zhou, Y., Chen, X., Wang, Q., & Wei, Y. (2012). A mapreduce-based parallel clustering algorithm for large protein-protein interaction networks. In S. Zhou, S. Zhang, & G. Karypis (Eds.), Advanced data mining and applications: 8th International Conference, ADMA 2012, Proceedings (pp. 138-148). Springer. https://doi.org/10.1007/978-3-642-35527-1_12

Keywords

  • PPI
  • Clustering
  • MapReduce
  • Edge-betweenness
