Maximum Likelihood Estimation from Uncertain Data in the Belief Function Framework

Abstract

We consider the problem of parameter estimation in statistical models in the case where data are uncertain and represented as belief functions. The proposed method is based on the maximization of a generalized likelihood criterion, which can be interpreted as a degree of agreement between the statistical model and the uncertain observations. We propose a variant of the EM algorithm that iteratively maximizes this criterion. As an illustration, the method is applied to uncertain data clustering using finite mixture models, in the cases of categorical and continuous attributes.
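The iterative maximization mentioned in the abstract can be illustrated on the simplest categorical case. The sketch below is not the authors' implementation. It assumes independent observations, each described by a plausibility value `pl[i][k]` for category `k`, and it assumes the generalized likelihood is the product over observations of the expected plausibility `sum_k theta[k] * pl[i][k]`. The function names and the stopping rule are hypothetical.

```python
import math


def log_generalized_likelihood(theta, pl):
    """Log of prod_i sum_k theta[k] * pl[i][k] (assumed form of the criterion)."""
    return sum(math.log(sum(t * p for t, p in zip(theta, row))) for row in pl)


def em_categorical(pl, n_iter=100, tol=1e-9):
    """EM-style estimation of category proportions from plausibility rows.

    pl[i][k] is the plausibility that observation i belongs to category k.
    Returns the estimated proportions and the final log-criterion value.
    """
    n, n_cat = len(pl), len(pl[0])
    theta = [1.0 / n_cat] * n_cat          # start from uniform proportions
    prev = log_generalized_likelihood(theta, pl)
    cur = prev
    for _ in range(n_iter):
        # E-step: combine the current model with each plausibility row
        # and normalize, giving expected category memberships.
        counts = [0.0] * n_cat
        for row in pl:
            weights = [t * p for t, p in zip(theta, row)]
            total = sum(weights)
            for k in range(n_cat):
                counts[k] += weights[k] / total
        # M-step: proportions are the average expected memberships.
        theta = [c / n for c in counts]
        cur = log_generalized_likelihood(theta, pl)
        if cur - prev < tol:               # stop when the criterion stalls
            break
        prev = cur
    return theta, cur
```

With fully precise data (each row has a single 1 and zeros elsewhere) this reduces to the usual frequency estimate, and with fully vacuous data (all plausibilities equal to 1) the estimate stays at its starting value, since the data carry no information.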

HARDWARE REQUIREMENTS:
  • Speed – 1 GHz
  • Processor – Pentium IV
  • RAM – 256 MB (min)
  • Hard Disk – 20 GB
  • Floppy Drive – 1.44 MB
  • Keyboard – Standard Windows Keyboard
  • Mouse – Two- or Three-Button Mouse
  • Monitor – SVGA

SOFTWARE REQUIREMENTS:
  • Operating System – Windows XP
  • Front End – Java JDK 1.7

EXISTING SYSTEM:

In uncertain data mining, probability theory has often been adopted as a formal framework for representing data uncertainty. Typically, an object is represented as a probability density function over the attribute space, rather than as a single point as usually assumed when uncertainty is neglected. Mining techniques that have been proposed for such data include clustering algorithms and density estimation techniques. Besides this recent body of literature, a lot of work has been devoted to the analysis of interval-valued or fuzzy data, in which ill-known attributes are represented, respectively, by intervals and possibility distributions.

Among the existing techniques developed for such data, we may mention principal component analysis, clustering, linear regression, and multidimensional scaling. Probability distributions, intervals, and possibility distributions may be seen as three instances of a more general model, in which data uncertainty is expressed by means of belief functions. The theory of belief functions, also known as Dempster-Shafer theory or Evidence theory, was developed by Dempster and Shafer and was further elaborated by Smets.
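To make the idea of a belief function concrete, here is a minimal sketch of a mass function on a finite frame, together with the belief and plausibility it induces. It uses the standard definitions: the belief of a set sums the masses of the non-empty focal sets contained in it, and the plausibility sums the masses of the focal sets that intersect it. The dictionary representation and the function names are illustrative choices, not taken from the source.

```python
def belief(mass, subset):
    """Bel(A): total mass of non-empty focal sets included in A."""
    return sum(m for focal, m in mass.items() if focal and focal <= subset)


def plausibility(mass, subset):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(m for focal, m in mass.items() if focal & subset)


def contour(mass, frame):
    """Plausibility of each single element of the frame."""
    return {x: plausibility(mass, frozenset([x])) for x in frame}


# Hypothetical example on a three-element frame.
frame = frozenset({"a", "b", "c"})
mass = {
    frozenset({"a"}): 0.5,        # evidence pointing exactly to "a"
    frozenset({"a", "b"}): 0.3,   # evidence that cannot separate "a" from "b"
    frame: 0.2,                   # mass left on the whole frame (ignorance)
}
print(belief(mass, frozenset({"a", "b"})))
print(contour(mass, frame))
```

The three special cases named above correspond to particular shapes of the mass function: all mass on single elements gives a probability distribution, all mass on one set gives an interval or set-valued observation, and nested focal sets give a possibility distribution.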

PROPOSED SYSTEM:

The best solution according to the observed-data likelihood was retained. Each object was then assigned to the class with the largest estimated posterior probability, and the resulting partition was compared to the true partition using the adjusted Rand index. The algorithm successfully exploits the additional information about attribute uncertainty, which makes it possible to better recover the true partition of the data.

The adjusted Rand index was also studied as a function of the mean error probability on class labels, for the E2M algorithm applied to data with uncertain labels and with noisy labels, as well as to unsupervised data. Here again, uncertainty on class labels appears to be successfully exploited by the algorithm. Remarkably, the results with uncertain labels never get worse than those obtained without label information, even for large error probabilities.

To corroborate the above results with real data, similar experiments were carried out with the well-known Iris data set. We recall that this data set is composed of 150 four-dimensional attribute vectors partitioned into three classes, corresponding to three species of Iris.
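The evaluation step described here, assigning each object to its most probable class and comparing the result with the true partition, can be sketched as follows. This is an illustrative implementation of the standard adjusted Rand index formula based on pair counts, not the code used for the reported experiments. The function names and the convention of returning 1.0 in degenerate cases are assumptions.

```python
from collections import Counter
from math import comb


def map_partition(posteriors):
    """Assign each object to the class with the largest posterior probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same objects."""
    n = len(labels_true)
    if n < 2:
        return 1.0                      # assumed convention: nothing to compare
    # Pair counts within contingency cells, true classes, predicted classes.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)   # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0                      # assumed convention for trivial partitions
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical posterior probabilities for four objects and two classes.
posteriors = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
predicted = map_partition(posteriors)
print(adjusted_rand_index([1, 1, 0, 0], predicted))
```

The index is invariant to how the classes are numbered, which is why it suits clustering results, where the estimated class labels are arbitrary.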
