Evaluating Data Reliability: An Evidential Answer with Application to a Web-Enabled Data Warehouse

Abstract  

There are many available methods to integrate information source reliability in an uncertainty representation, but there are only a few works focusing on the problem of evaluating this reliability. However, data reliability and confidence are essential components of a data warehousing system, as they influence subsequent retrieval and analysis.

In this paper, we propose a generic method to assess data reliability from a set of criteria using the theory of belief functions. Customizable criteria and insightful decisions are provided. The chosen illustrative example comes from real-world data issued from the Sym’Previus predictive microbiology oriented data warehouse.
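As a rough illustration of what the theory of belief functions offers here, the sketch below represents a reliability assessment as a mass function over a hypothetical five-level reliability scale (1 = least reliable, 5 = most reliable) and computes the standard belief and plausibility of an event. The scale, the function names and the example masses are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import Dict, FrozenSet

# A mass function maps focal sets of reliability levels to weights summing to 1.
# Assumption: no mass is assigned to the empty set.
Mass = Dict[FrozenSet[int], float]


def belief(m: Mass, event: FrozenSet[int]) -> float:
    """Total mass of the focal sets that are entirely contained in the event."""
    return sum(w for focal, w in m.items() if focal <= event)


def plausibility(m: Mass, event: FrozenSet[int]) -> float:
    """Total mass of the focal sets that overlap the event."""
    return sum(w for focal, w in m.items() if focal & event)


# Hypothetical assessment: mostly "levels 4 or 5", with some ignorance left over.
example_mass: Mass = {
    frozenset({4, 5}): 0.6,
    frozenset({3, 4, 5}): 0.3,
    frozenset({1, 2, 3, 4, 5}): 0.1,
}
reliable = frozenset({4, 5})
print(belief(example_mass, reliable), plausibility(example_mass, reliable))
```

The gap between belief and plausibility is what lets this representation express how imprecise the assessment is, rather than forcing a single number.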

HARDWARE & SOFTWARE REQUIREMENTS:
HARDWARE REQUIREMENTS:
  • Speed – 1 GHz
  • RAM – 256 MB (min)
  • Hard Disk – 20 GB
  • Floppy Drive – 1.44 MB
  • Key Board – Standard Windows Keyboard
  • Mouse – Two or Three Button Mouse
  • Monitor – SVGA
  • Processor – Pentium IV
SOFTWARE REQUIREMENTS:
  • Operating System : Windows XP
  • Application Server : .NET Web Server
  • Front End : Visual Studio 2008 ASP.NET
  • Scripts : C#
  • Database : SQL Server 2005
EXISTING SYSTEM:

Collected data are used in further inferences, so their reliability needs to be estimated. During collection, data reliability is mostly ensured by measurement device calibration, by adapted experimental design and by statistical repetition. However, full traceability is no longer ensured when data are reused at a later time by other scientists. Estimating reliability is especially important in areas where data are scarce and difficult to obtain, as is the case, for example, in Life Sciences. The growth of the web and the emergence of dedicated data warehouses offer great opportunities to collect additional data, be it to build models or to make decisions. The reliability of these data depends on many different aspects and on meta-information such as the data source and the experimental protocol. Developing generic tools to evaluate this reliability therefore represents a true challenge for the proper use of distributed data.

DISADVANTAGES:
  • Different criteria may provide conflicting information about the reliability of the same data.
  • Interval-valued evaluations based on lower and upper expectations are used to numerically summarize the results, because they can reflect the imprecision in the final knowledge.
  • The question of ordering data by groups of decreasing reliability, and then presenting informative results to end users, has to be addressed.
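The interval-valued summary and the ordering into groups mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: reliability is rated on a hypothetical scale of levels 1 to 5, each assessment is a mass function over sets of levels, and groups are built by repeatedly extracting the intervals that no other remaining interval dominates on both bounds. All function names and data are invented for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

Mass = Dict[FrozenSet[int], float]   # focal set of reliability levels -> mass
Interval = Tuple[float, float]       # (lower expectation, upper expectation)


def expectation_bounds(m: Mass) -> Interval:
    """Lower/upper expected level: each focal set counts at its min/max level."""
    lower = sum(w * min(focal) for focal, w in m.items())
    upper = sum(w * max(focal) for focal, w in m.items())
    return lower, upper


def dominates(a: Interval, b: Interval) -> bool:
    """True if a is at least as high as b on both bounds and differs from b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def reliability_groups(intervals: Dict[str, Interval]) -> List[List[str]]:
    """Peel off successive layers of non-dominated items, most reliable first."""
    remaining = dict(intervals)
    groups: List[List[str]] = []
    while remaining:
        top = sorted(
            name for name, iv in remaining.items()
            if not any(dominates(other, iv) for other in remaining.values())
        )
        groups.append(top)
        for name in top:
            del remaining[name]
    return groups


# Hypothetical assessments of three data items.
ALL_LEVELS = frozenset({1, 2, 3, 4, 5})
assessments: Dict[str, Mass] = {
    "a": {frozenset({4, 5}): 0.8, ALL_LEVELS: 0.2},
    "b": {frozenset({1, 2}): 0.5, ALL_LEVELS: 0.5},
    "c": {frozenset({3}): 1.0},
}
intervals = {name: expectation_bounds(m) for name, m in assessments.items()}
print(reliability_groups(intervals))
```

Items whose intervals overlap without one dominating the other end up in the same group, which keeps the imprecision visible to the end user instead of forcing an arbitrary total order.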
PROPOSED SYSTEM:

Once the reliability of data has been evaluated, it is natural to be interested in the reasons explaining why some particular data were assessed as (un)reliable. We show how maximal coherent subsets of criteria, i.e., groups of agreeing criteria, may provide some insight as to which reasons have led to a particular assessment.
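A small sketch may help make maximal coherent subsets concrete. It assumes a simplification that is not in the source: each criterion is summarized by a crisp set of reliability levels it considers possible (on a hypothetical scale of 1 to 5), and a group of criteria is coherent when these sets share at least one level. The criterion names and the brute-force search are illustrative choices only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def maximal_coherent_subsets(
    opinions: Dict[str, FrozenSet[int]],
) -> List[FrozenSet[str]]:
    """Return the groups of criteria whose level sets intersect and that
    cannot be enlarged without making the intersection empty."""
    names = sorted(opinions)
    found: List[FrozenSet[str]] = []
    # Scan from the largest groups down, so that every coherent superset
    # has already been recorded when a smaller group is examined.
    for size in range(len(names), 0, -1):
        for group in combinations(names, size):
            common = frozenset.intersection(*(opinions[n] for n in group))
            if common and not any(set(group) < kept for kept in found):
                found.append(frozenset(group))
    return found


# Hypothetical criteria and the reliability levels each one supports.
opinions = {
    "source": frozenset({4, 5}),
    "protocol": frozenset({3, 4}),
    "age": frozenset({4, 5}),
    "repetitions": frozenset({1, 2}),
}
for subset in maximal_coherent_subsets(opinions):
    print(sorted(subset))
```

In this example three criteria agree on a high level while one stands alone with a low level, which points directly at the criterion responsible for a conflicting assessment. The exhaustive search is exponential in the number of criteria, so it only suits small criteria sets.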

We present an application of the method to a web-enabled data warehouse. Indeed, the framework developed in this paper was originally motivated by the need to estimate the reliability of scientific experimental results collected in open data warehouses. To lighten the burden laid upon domain experts when selecting data for a particular application, it is necessary to give them indicative reliability estimations.

This will hopefully be a better asset for them to justify their choices and to capitalize knowledge than the use of an ad hoc estimation. Tools were carefully developed using Semantic Web recommended languages, so that they would be generic and reusable in other data warehouses. This required an advanced design step, which is important to ensure modularity and to foresee future evolutions.

ADVANTAGES:
  • The notion of trust only makes sense if the source can be suspected of lying in order to gain some advantage, and is distinct from reliability.
  • Individual-level and system-level trust are differentiated: the former concerns the trust one has in a particular agent, while the latter concerns the overall system and how it ensures that no one will be able to take advantage.
