GDCluster: A General Decentralized Clustering Algorithm

Abstract

A General Decentralized Clustering algorithm (GDCluster) is proposed and instantiated with two popular clustering methods, one partition-based and one density-based. Using gossip-based communication, the nodes gradually build a summarized view of the data set by continuously exchanging information on data items and data representatives. GDCluster can cluster a data set that is dispersed among a large number of nodes in a distributed environment, and it handles two classes of clustering, namely partition-based and density-based. GDCluster is able to achieve a high-quality global clustering solution that approximates centralized clustering.

HARDWARE REQUIREMENTS:
  • Speed       –    1 GHz
  • Processor    –    Pentium IV
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive       –    1.44 MB
  • Keyboard      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor      –    SVGA
SOFTWARE REQUIREMENTS:
  • Operating System        :           Windows XP or Win7
  • Front End       :           Microsoft Visual Studio 2008
  • Back End :           MSSQL Server
  • Server :           ASP Server Pages
  • Script :           C# Script
  • Document :           MS-Office 2007
EXISTING SYSTEM:

The existing system considers the clustering of large data sets distributed over a network of computational units using a decentralized K-means algorithm. To obtain the same codebook at each node of the network, the system uses a gossip aggregation protocol in which only small messages are exchanged. The algorithm is equivalent to a centralized K-means, provided that a bound on the number of small messages each node has to send is met.
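The gossip aggregation idea can be illustrated with a minimal sketch. It is not the existing system's actual protocol: it assumes a generic pairwise-averaging gossip on one-dimensional data, a single cluster, and a fully connected network, and the names `gossip_centroid`, `local_sums`, and `local_counts` are hypothetical. Each node starts with the sum and count of its own points, and after enough pairwise exchanges every node holds roughly the same global centroid.

```python
import random

def gossip_centroid(local_sums, local_counts, rounds=200, seed=0):
    """Estimate one global centroid at every node by pairwise gossip averaging.

    Assumption: any two nodes may exchange messages (complete graph) and
    every node starts with at least one point (count > 0).
    """
    rng = random.Random(seed)
    sums = [float(s) for s in local_sums]
    counts = [float(c) for c in local_counts]
    n = len(sums)
    for _ in range(rounds):
        # Pick two distinct nodes; each message carries only a sum and a count.
        i, j = rng.sample(range(n), 2)
        sums[i] = sums[j] = (sums[i] + sums[j]) / 2.0
        counts[i] = counts[j] = (counts[i] + counts[j]) / 2.0
    # The ratio of averaged sum to averaged count approximates the global mean.
    return [s / c for s, c in zip(sums, counts)]

# Four nodes; the true global centroid is (3 + 10 + 0 + 27) / (2 + 3 + 1 + 4) = 4.0
node_sums = [3.0, 10.0, 0.0, 27.0]
node_counts = [2, 3, 1, 4]
estimates = gossip_centroid(node_sums, node_counts)
```

Because each exchange preserves the network-wide totals, the per-node estimates drift toward the value a centralized computation would give, which is the sense in which a decentralized K-means can match a centralized one once enough messages have been sent.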

PROPOSED SYSTEM:

We propose GDCluster, which can cluster a data set that is dispersed among a large number of nodes in a distributed environment. It can handle two classes of clustering, namely partition-based and density-based, while being fully decentralized and asynchronous. Our system deals with dynamic data and evolves the clustering model accordingly. It empowers nodes to construct a summarized view of the data so that each node can execute a customized clustering algorithm independently, and it executes weighted clustering algorithms to build the clustering models. This includes a distributed K-means clustering algorithm for P2P networks in which nodes communicate with their immediate neighbors; each node is required to store the history of cluster centroids for each K-means iteration.
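As a rough illustration of running a weighted clustering algorithm on a summarized view, the sketch below applies a Lloyd-style weighted K-means to a handful of representatives. It is an assumption-laden example, not GDCluster's implementation: the function name `weighted_kmeans`, the representation of a summary as (point, weight) pairs, and the fixed initial centroids are all hypothetical.

```python
def weighted_kmeans(points, weights, centroids, iterations=10):
    """Lloyd-style K-means in which each representative counts `weight` times."""
    centroids = [tuple(c) for c in centroids]
    dim = len(centroids[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centroids]
        totals = [0.0 for _ in centroids]
        for p, w in zip(points, weights):
            # Assign the representative to its nearest centroid (squared Euclidean).
            k = min(range(len(centroids)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            totals[k] += w
            for d in range(dim):
                sums[k][d] += w * p[d]
        # Move each centroid to the weighted mean of its members;
        # a centroid with no members stays where it is.
        centroids = [
            tuple(s / totals[k] for s in sums[k]) if totals[k] > 0 else centroids[k]
            for k in range(len(centroids))
        ]
    return centroids

# A node's summarized view: three representatives and the number of
# original data items each one stands for (hypothetical values).
representatives = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
rep_weights = [3.0, 1.0, 5.0]
model = weighted_kmeans(representatives, rep_weights, [(0.0, 0.0), (10.0, 10.0)])
```

Weighting lets a node cluster a few representatives instead of every original item while still pulling each centroid toward the regions where most of the data lies.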
