Distributed Principal Direction Divisive Partitioning

Jacob Kogan, Department of Mathematics and Statistics

The term clustering is used in a number of traditionally distant fields to describe methods for grouping unlabeled data. Clustering very large datasets is a contemporary data mining challenge. This project concerns the application of the Principal Direction Divisive Partitioning (PDDP) clustering algorithm, introduced by D. Boley, to a dataset distributed across a number of networked computers. The performance of PDDP and Distributed PDDP on datasets of moderate size will be compared.
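
To make the idea concrete, the following is a minimal sketch of a single PDDP split, together with one possible way the same split could be computed when the data are partitioned across machines. It is an illustration under stated assumptions, not the project's actual method: the function names (`pddp_split`, `local_stats`, `distributed_pddp_split`) are hypothetical, each "machine" is simulated by an in-memory NumPy array, and the distributed variant assumes the dimension is small enough that every machine can send its row count, column sums, and Gram matrix to a coordinator.

```python
import numpy as np

def pddp_split(X):
    """One PDDP step: split the rows of X by the sign of their
    projection onto the principal direction of the centered data."""
    centroid = X.mean(axis=0)
    centered = X - centroid
    # The leading right singular vector of the centered data
    # is the principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    return centered @ direction > 0

def local_stats(X_part):
    """Summary computed on one machine (hypothetical helper):
    row count, column sums, and the Gram matrix X_part^T X_part."""
    return X_part.shape[0], X_part.sum(axis=0), X_part.T @ X_part

def distributed_pddp_split(parts):
    """Assumed distributed variant: combine the local summaries,
    recover the principal direction from the global scatter matrix,
    then let each machine label its own rows."""
    stats = [local_stats(p) for p in parts]
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    gram = sum(s[2] for s in stats)
    centroid = total / n
    # Scatter matrix of the centered data: X^T X - n * c c^T.
    scatter = gram - n * np.outer(centroid, centroid)
    # eigh returns eigenvalues in ascending order, so the last
    # eigenvector corresponds to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(scatter)
    direction = eigvecs[:, -1]
    return [(p - centroid) @ direction > 0 for p in parts]
```

The principal direction is determined only up to sign, so the two functions may assign opposite True/False labels while still describing the same partition. The full algorithm would apply such a split repeatedly, each time choosing a cluster to divide further; that selection step is omitted here.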