GDCluster: A General Decentralized Clustering Algorithm

A general distributed clustering algorithm (GDCluster) is proposed and instantiated with two popular clustering methods, one partition-based and one density-based. Using gossip-based communication, the nodes gradually build a summarized view of the data set by continuously exchanging information on data items and data representatives. GDCluster can cluster a data set that is dispersed among a large number of nodes in a distributed environment, and it achieves a high-quality global clustering solution that approximates centralized clustering.
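The gossip-based exchange can be illustrated with a minimal sketch. It assumes a push-pull step in which two nodes simply merge their views. The `Node` class, the `exchange` method, and the `gossip_round` helper are hypothetical names introduced for illustration. The sketch also exchanges raw items only, whereas the text describes exchanging both data items and data representatives.

```python
import random


class Node:
    """A node with local data items and a view of the data set (hypothetical structure)."""

    def __init__(self, node_id, items):
        self.node_id = node_id
        # The view maps a globally unique key (owner id, local index) to a point.
        self.view = {(node_id, i): p for i, p in enumerate(items)}

    def exchange(self, peer):
        """Push-pull gossip step: both nodes end up with the union of their views."""
        merged = {**self.view, **peer.view}
        self.view = dict(merged)
        peer.view = dict(merged)


def gossip_round(nodes, rng):
    """Each node contacts one other node chosen uniformly at random."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        node.exchange(peer)


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = [Node(i, [(float(i), 0.0)]) for i in range(8)]
    rounds = 0
    # Repeat until every node has seen every item (cap the loop for safety).
    while any(len(n.view) < 8 for n in nodes) and rounds < 100:
        gossip_round(nodes, rng)
        rounds += 1
    print("rounds until all views are complete:", rounds)
```

Views only grow under this merge rule, so repeated rounds move every node toward the full data set. A practical system would bound the view size by summarizing items into representatives instead of storing every item.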
The existing system clusters large data sets distributed over a network of computational units using a decentralized K-means algorithm. To obtain the same codebook at each node of the network, the system uses a gossip aggregation protocol in which only small messages are exchanged. The algorithm is equivalent to a centralized K-means, provided that a bound on the number of messages each node has to send is met.
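As a rough illustration of how gossip aggregation can give every node the same codebook, the sketch below uses pairwise averaging of per-cluster statistics. This is an assumption: the text does not say which aggregation protocol the existing system uses. The function names `local_stats`, `gossip_average`, and `centroids_from` are hypothetical. Each message carries only one sum and one count per cluster, which keeps it small.

```python
def local_stats(points, centroids):
    """Per-cluster [sum_x, sum_y, count] over the local points nearest to each centroid."""
    stats = [[0.0, 0.0, 0.0] for _ in centroids]
    for x, y in points:
        k = min(
            range(len(centroids)),
            key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
        )
        stats[k][0] += x
        stats[k][1] += y
        stats[k][2] += 1.0
    return stats


def gossip_average(stats_a, stats_b):
    """Pairwise averaging: both nodes replace their statistics by the element-wise mean."""
    avg = [[(u + v) / 2.0 for u, v in zip(ra, rb)] for ra, rb in zip(stats_a, stats_b)]
    return avg, [row[:] for row in avg]


def centroids_from(stats, old_centroids):
    """Centroid = sum / count per cluster; an empty cluster keeps its old centroid."""
    return [
        (s[0] / s[2], s[1] / s[2]) if s[2] > 0 else c
        for s, c in zip(stats, old_centroids)
    ]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (10.0, 10.0)]
    node_a = local_stats([(0.0, 0.0), (2.0, 0.0)], codebook)
    node_b = local_stats([(10.0, 10.0), (12.0, 10.0), (1.0, 0.0)], codebook)
    node_a, node_b = gossip_average(node_a, node_b)
    print(centroids_from(node_a, codebook))
    print(centroids_from(node_b, codebook))
```

Averaging scales the sums and the counts by the same factor, so the ratio sum / count is unchanged. With two nodes, a single exchange therefore reproduces the centralized K-means update exactly. With more nodes, repeated exchanges are needed before the local ratios agree.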
We propose GDCluster, which can cluster a data set that is dispersed among a large number of nodes in a distributed environment. It handles two classes of clustering, namely partition-based and density-based, while being fully decentralized and asynchronous. The system deals with dynamic data by evolving the clustering model, and it empowers each node to construct a summarized view of the data so that the node can execute a customized clustering algorithm independently. Nodes execute weighted clustering algorithms on this view to build the clustering models. In a distributed K-means clustering algorithm for P2P networks, nodes communicate with their immediate neighbors, and each node is required to store a history of the cluster centroids for each K-means iteration.
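The idea of running a weighted clustering algorithm on a summarized view can be sketched as a weighted K-means over representatives. This is a minimal illustration and not the proposed system's implementation. It assumes that each representative is a 2-D point with a weight equal to the number of data items it summarizes. The function name `weighted_kmeans` and its parameters are hypothetical.

```python
def weighted_kmeans(reps, weights, centroids, iterations=10):
    """Lloyd-style K-means in which each representative counts `weight` times."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Accumulate weighted sums and total weight per cluster.
        acc = [[0.0, 0.0, 0.0] for _ in centroids]
        for (x, y), w in zip(reps, weights):
            k = min(
                range(len(centroids)),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            acc[k][0] += w * x
            acc[k][1] += w * y
            acc[k][2] += w
        # Weighted mean per cluster; an empty cluster keeps its previous centroid.
        centroids = [
            (a[0] / a[2], a[1] / a[2]) if a[2] > 0 else c
            for a, c in zip(acc, centroids)
        ]
    return centroids


if __name__ == "__main__":
    representatives = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    item_counts = [3.0, 1.0, 1.0]  # items summarized by each representative
    print(weighted_kmeans(representatives, item_counts, [(0.0, 0.0), (10.0, 0.0)]))
```

Weighting lets a node work with a few representatives instead of all raw items. A representative that stands for many items pulls its centroid proportionally harder, which is how the summarized view can approximate clustering over the full data.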