GDCluster: A General Decentralized Clustering Algorithm

Abstract

A general decentralized clustering algorithm (GDCluster) is proposed and instantiated with two popular clustering methods, one partition-based and one density-based. Using gossip-based communication, the nodes gradually build a summarized view of the data set by continuously exchanging information on data items and data representatives. GDCluster can cluster a data set that is dispersed among a large number of nodes in a distributed environment, and it handles both classes of clustering. It achieves a high-quality global clustering solution that approximates centralized clustering.

HARDWARE REQUIREMENTS:

Speed – 1 GHz
Processor – Pentium IV
RAM – 256 MB (min)
Hard Disk – 20 GB
Floppy Drive – 1.44 MB
Keyboard – Standard Windows Keyboard
Mouse – Two or Three Button Mouse
Monitor – SVGA

SOFTWARE REQUIREMENTS:

Operating System : Windows XP or Win7
Front End : Microsoft Visual Studio 2008
Back End : MSSQL Server
Server : ASP Server Page
Script : C# Script
Document : MS-Office 2007

EXISTING SYSTEM:

The existing system clusters large data sets distributed over a network of computational units using a decentralized K-means algorithm. To obtain the same codebook at each node of the network, the system uses a gossip aggregation protocol in which only small messages are exchanged. The algorithm is equivalent to a centralized K-means, provided that a bound on the number of small messages each node has to send is met.
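
The idea of reaching a common codebook through gossip aggregation can be illustrated with a minimal sketch. It assumes pairwise averaging as the gossip scheme, one-dimensional data, and hypothetical function names (`gossip_average`, `local_stats`, `decentralized_kmeans_step`); none of these come from the existing system itself. Each node computes per-cluster sums and counts on its own data, the nodes average these small vectors by gossip, and every node then derives the centroids locally.

```python
import random


def gossip_average(values, rounds=400, seed=0):
    """Pairwise gossip averaging (an assumed scheme, for illustration).

    Two randomly chosen nodes replace their local vectors with the average
    of both. The global sum is preserved, so every node's estimate
    converges toward the global mean.
    """
    rng = random.Random(seed)
    est = [list(v) for v in values]  # one local estimate per node
    n = len(est)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)
        avg = [(a + b) / 2.0 for a, b in zip(est[i], est[j])]
        est[i] = avg
        est[j] = list(avg)
    return est


def local_stats(points, centroids):
    """Per-cluster [sum, count] pairs for one node's local 1-D points."""
    k = len(centroids)
    stats = [0.0] * (2 * k)
    for x in points:
        c = min(range(k), key=lambda idx: abs(x - centroids[idx]))
        stats[2 * c] += x
        stats[2 * c + 1] += 1.0
    return stats


def decentralized_kmeans_step(node_points, centroids, rounds=400, seed=0):
    """One K-means update in which each node ends with its own codebook."""
    stats = [local_stats(p, centroids) for p in node_points]
    est = gossip_average(stats, rounds, seed)
    codebooks = []
    for e in est:
        # averaged sum / averaged count equals global sum / global count
        codebooks.append([
            e[2 * c] / e[2 * c + 1] if e[2 * c + 1] > 0 else centroids[c]
            for c in range(len(centroids))
        ])
    return codebooks


# Four nodes, each holding two local data items.
nodes = [[1.0, 2.0], [3.0, 10.0], [11.0, 12.0], [0.0, 13.0]]
codebooks = decentralized_kmeans_step(nodes, [0.0, 10.0])
```

With enough gossip rounds, every entry of `codebooks` approaches the centroids a centralized K-means step would compute on the pooled data, which is the sense in which the message bound matters: too few exchanges leave the local codebooks in disagreement.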

PROPOSED SYSTEM:

We propose GDCluster, which can cluster a data set that is dispersed among a large number of nodes in a distributed environment. It handles two classes of clustering, namely partition-based and density-based, while being fully decentralized and asynchronous. The system deals with dynamic data by evolving the clustering model, and it empowers each node to construct a summarized view of the data so that the node can execute a customized clustering algorithm independently. Nodes execute weighted clustering algorithms on this view to build the clustering models. The system also covers a distributed K-means clustering algorithm for P2P networks in which nodes communicate with their immediate neighbors; each node is required to store the history of cluster centroids for each K-means iteration.
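
As a rough illustration of clustering on a summarized view, the sketch below condenses points into weighted representatives and then runs a weighted K-means on them. The summarization rule (a fixed radius around the first point seen), the function names `summarize` and `weighted_kmeans`, and the explicit initial centroids are assumptions made for this example; they are not taken from the GDCluster specification.

```python
def summarize(points, radius):
    """Condense 2-D points into (representative, weight) pairs.

    Assumed rule: a point within `radius` of an existing representative
    adds 1 to that representative's weight; otherwise it becomes a new
    representative with weight 1.
    """
    reps = []  # list of [(x, y), weight]
    for (x, y) in points:
        for rep in reps:
            rx, ry = rep[0]
            if (x - rx) ** 2 + (y - ry) ** 2 <= radius ** 2:
                rep[1] += 1
                break
        else:
            reps.append([(x, y), 1])
    return [(r[0], r[1]) for r in reps]


def weighted_kmeans(reps, init_centroids, iters=20):
    """K-means in which each representative counts `weight` times."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    for _ in range(iters):
        sums = [[0.0, 0.0] for _ in range(k)]
        wts = [0.0] * k
        for (x, y), w in reps:
            c = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2
                + (y - centroids[i][1]) ** 2,
            )
            sums[c][0] += w * x
            sums[c][1] += w * y
            wts[c] += w
        new = [
            (sums[i][0] / wts[i], sums[i][1] / wts[i]) if wts[i] > 0
            else centroids[i]
            for i in range(k)
        ]
        if new == centroids:  # assignments stable, stop early
            break
        centroids = new
    return centroids


points = [(0, 0), (0.1, 0), (0, 0.1), (2, 0),
          (10, 10), (10.1, 10), (12, 10), (12, 10.1)]
view = summarize(points, radius=0.5)
centers = weighted_kmeans(view, [(0, 0), (10, 10)])
```

The weights let a node cluster a handful of representatives instead of every data item, at the cost of some accuracy that depends on how coarse the summary is.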
