Parallel Computing for Clustering of Large Datasets

Matthias K. Gobbert, Department of Mathematics and Statistics, UMBC
Robin Blasberg, Naval Research Laboratory, Washington, D.C.

 

Affinity propagation is a recently introduced clustering algorithm that recognizes patterns in data sets by iteratively updating several matrices. The method has great potential for large data sets, in particular if the number of clusters in the data set is also large and not known in advance. The method's data structures require large amounts of memory, which a parallel computer can provide, and the formulation of the algorithm in terms of row- and column-oriented operations also holds great potential for efficient parallelization. Early work on this problem demonstrates the excellent scalability of our implementation of the method.
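To make the row- and column-oriented structure concrete, the following is a minimal serial sketch of affinity propagation in its standard published form (responsibility and availability updates with damping). It is not the parallel implementation studied in the reports below; the function name, the damping factor, the fixed iteration count, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, iterations=200):
    """Serial sketch of affinity propagation (illustrative, not the authors' code).

    S is an (n, n) float similarity matrix; S[k, k] holds the "preference"
    of point k to become an exemplar. Returns, for each point, the index
    of the exemplar it is assigned to.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(iterations):
        # Row-oriented step:
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        M = A + S
        idx = np.argmax(M, axis=1)
        first = M[rows, idx]          # largest value in each row
        M[rows, idx] = -np.inf
        second = M.max(axis=1)        # second-largest value in each row
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1.0 - damping) * R_new

        # Column-oriented step:
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()   # a(k,k) is not clipped at zero
        A_new = np.minimum(A_new, 0.0)
        A_new[rows, rows] = diag
        A = damping * A + (1.0 - damping) * A_new

    # Each point chooses the exemplar k that maximizes a(i,k) + r(i,k).
    return np.argmax(A + R, axis=1)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of points on a line.
    x = np.array([0.0, 0.1, 0.25, 10.0, 10.15, 10.2])
    S = -(x[:, None] - x[None, :]) ** 2   # negative squared distance
    np.fill_diagonal(S, -1.0)             # assumed common preference
    print(affinity_propagation(S))
```

In this sketch the responsibility update needs only one row of the matrices at a time and the availability update only one column, which is the property that makes distributing the matrices across processes attractive. Under the assumption of dense double-precision storage, the three n x n matrices S, R, and A together occupy 24 n^2 bytes, so memory use grows quadratically with the number of data points.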

 

Publications:

  1. Robin Blasberg and Matthias K. Gobbert, Parallel Performance Studies for a Clustering Algorithm, Technical Report number HPCF-2008-5, UMBC High Performance Computing Facility, University of Maryland, Baltimore County, 2008.
    (HPCF machines used: hpc and kali.)
    PDF
  2. Robin Blasberg and Matthias K. Gobbert, MVAPICH2 vs. OpenMPI for a Clustering Algorithm, Technical Report number HPCF-2008-7, UMBC High Performance Computing Facility, University of Maryland, Baltimore County, 2008.
    (HPCF machines used: hpc.)
    PDF
  3. Robin Blasberg and Matthias K. Gobbert, Clustering Large Data Sets with Parallel Affinity Propagation, Technical Report number HPCF-2008-8, UMBC High Performance Computing Facility, University of Maryland, Baltimore County, 2008.
    (HPCF machines used: hpc.)