With the explosive growth of information sources available on the World Wide Web, users increasingly need automated tools to find the information resources they want and to track and analyze their usage patterns. Association rule mining (ARM) is an active data mining research area. However, most ARM algorithms cater to a centralized environment. In contrast to previous ARM algorithms, ODAM is a distributed algorithm for geographically distributed data sets that reduces communication costs.
Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (DARM) algorithms have been developed. These algorithms, however, assume that the databases are either horizontally or vertically distributed. In the special case of databases populated with information extracted from textual data, existing DARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two.
Modern organizations are geographically distributed. Typically, each site locally stores its ever-increasing amount of day-to-day data. Using centralized data mining to discover useful patterns in such organizations’ data isn’t always feasible, because merging data sets from different sites into a centralized site incurs huge network communication costs. Data from these organizations are not only distributed over various locations but also vertically fragmented, making it difficult if not impossible to combine them in a central location. Distributed data mining has thus emerged as an active subarea of data mining research.
A significant area of data mining research is association rule mining. Unfortunately, most ARM algorithms focus on a sequential or centralized environment where no external communication is required. Distributed ARM (DARM) algorithms, on the other hand, aim to generate rules from different data sets spread over various geographical sites; hence, they require external communication throughout the entire process. DARM algorithms must reduce communication costs so that generating global association rules costs less than combining the participating sites’ data sets into a centralized site. However, most DARM algorithms don’t have an efficient message optimization technique, so they exchange numerous messages during the mining process.
We have developed a distributed algorithm, called Optimized Distributed Association Mining (ODAM), for geographically distributed data sets. ODAM generates support counts of candidate item sets faster than other DARM algorithms and reduces the average transaction size, the data set size, and the number of message exchanges.
ODAM: An Optimized Distributed Association Rule Mining Algorithm
EXISTING SYSTEM:
Data mining algorithms can be categorized as follows:
▪ Association algorithms
▪ Classification algorithms
PROPOSED SYSTEM:
Unlike other algorithms, ODAM offers better performance by minimizing candidate item set generation costs. It achieves this by focusing on two major DARM issues: communication and synchronization. Reducing communication is one of the most important DARM objectives: DARM algorithms perform better if we can reduce communication costs (for example, the size of the messages exchanged). Synchronization forces each participating site to wait until globally frequent item set generation completes, and each site waits longer if computing support counts takes more time. Hence, we reduce the computation time of candidate item sets’ support counts.
To reduce communication costs, we highlight several message optimization techniques. Based on the ARM algorithms and on the message exchange method, we can divide the message optimization techniques into two methods: direct and indirect support count exchange. Each method has different aims, expectations, advantages, and disadvantages. For example, the first method exchanges each candidate item set’s support count to generate the globally frequent item sets of that pass; CD (Count Distribution) and FDM (Fast Distributed Mining) are examples of this approach. All sites share a common globally frequent item set with identical support counts, so rules that are generated at different participating sites have identical confidence. This approach focuses on a rule’s exactness and correctness.
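The direct support count exchange described above can be illustrated with a small sketch. This is not ODAM itself, nor the exact CD or FDM procedure: it simulates two sites inside one process instead of sending network messages, and it counts 1-item and 2-item candidates in a single step rather than pass by pass. The function names, the sample transactions, and the 0.5 minimum support are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_support_counts(transactions, candidates):
    """Count, at one site, how many local transactions contain each candidate."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:
                counts[candidate] += 1
    return counts


def globally_frequent(site_counts, total_transactions, min_support):
    """Sum the exchanged local counts and keep candidates that meet min_support."""
    global_counts = Counter()
    for counts in site_counts:
        global_counts.update(counts)  # Counter.update adds counts together
    threshold = min_support * total_transactions
    return {c: n for c, n in global_counts.items() if n >= threshold}


# Two hypothetical sites holding horizontally partitioned transactions.
site_a = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]
site_b = [["milk"], ["bread", "milk"], ["butter"]]
sites = [site_a, site_b]

# Simplification: build all 1-item and 2-item candidates up front.
items = sorted({item for site in sites for t in site for item in t})
candidates = [frozenset(c) for k in (1, 2) for c in combinations(items, k)]

# Each site computes its local counts; the "exchange" is just collecting them.
site_counts = [local_support_counts(site, candidates) for site in sites]
total = sum(len(site) for site in sites)
frequent = globally_frequent(site_counts, total, min_support=0.5)

# Every site sees the same global counts, so the confidence of
# bread -> milk is the same no matter which site computes it.
confidence = frequent[frozenset({"bread", "milk"})] / frequent[frozenset({"bread"})]
print(confidence)
```

Because every site ends up with the same summed counts, a rule such as bread -> milk receives the same confidence wherever it is generated. The price is that every candidate's count is exchanged, which is the message volume that the optimization techniques aim to reduce.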