Ranking on Data Manifold with Sink Points

Abstract

Ranking is an important problem in many applications, such as Information Retrieval (IR), natural language processing, computational biology, and the social sciences. Many ranking approaches have been proposed to rank objects according to their degree of relevance or importance. Beyond these two goals, diversity has also been recognized as a crucial criterion in ranking: top-ranked results are expected to convey as little redundant information as possible and to cover as many aspects as possible. However, existing ranking approaches either take no account of diversity or handle it separately with heuristics. In this paper, we introduce a novel approach, Manifold Ranking with Sink Points (MRSP), to address diversity as well as relevance and importance in ranking. Specifically, our approach runs a manifold ranking process over the data manifold, which naturally finds the most relevant and important data objects. Meanwhile, by turning already-ranked objects into sink points on the data manifold, we effectively prevent redundant objects from receiving a high rank. MRSP not only has a nice convergence property, but also has an interesting and satisfying optimization explanation. We applied MRSP to two tasks in which diversity is of great concern, update summarization and query recommendation. Experimental results on both tasks show that MRSP performs strongly compared with existing ranking approaches.

HARDWARE REQUIREMENTS:

Speed – 1 GHz
Processor – Pentium IV
RAM – 256 MB (min)
Hard Disk – 20 GB
Floppy Drive – 1.44 MB
Keyboard – Standard Windows Keyboard
Mouse – Two or Three Button Mouse
Monitor – SVGA

SOFTWARE REQUIREMENTS:

Operating System : Windows XP
Front End : JAVA
Scripts : JavaScript

EXISTING SYSTEM:

A large set of relevant objects may contain highly redundant, even duplicated, information, which is undesirable for users. Furthermore, the user's needs might be multifaceted or ambiguous, and redundancy in the top-ranked results reduces the chance of satisfying different users. For example, given the query “zeppelin,” if the top-ranked search results were all similar articles about the “Zeppelin iPod speaker,” the output space would be wasted and the search experience would be largely degraded, even though the results are all highly relevant to the query. Such results would not satisfy users who want to know about the rigid airship “Zeppelin” or the rock band “Zeppelin.” Thus, it is important to reduce redundancy in the top search results. Top-ranked results are expected to convey as little redundant information as possible and to cover as many aspects as possible. In this way, we minimize the risk that the user's information need will not be satisfied.

Many real application tasks demand diversity in ranking. In query recommendation, the recommended queries should capture the different query intents of different users. In text summarization, the candidate sentences of a summary are expected to be less redundant and to cover different aspects of the information delivered by the document. In e-commerce, a list of relevant but distinctive products helps users browse and make a purchase.

PROPOSED SYSTEM:

Many ranking approaches have been proposed to rank objects according to their degree of relevance or importance. Beyond these two goals, diversity has also been recognized as a crucial criterion in ranking, and it has been widely studied recently. Researchers from various domains have proposed many approaches to address this problem, such as Maximum Marginal Relevance (MMR), subtopic diversity, cluster-based centroid selection, categorization-based approaches, and many other redundancy penalty approaches. However, these methods often treat relevance and diversity separately in the ranking algorithm, sometimes with additional heuristic procedures. Our proposed approach, MRSP, has not only a nice convergence property but also a satisfying optimization explanation. The manifold ranking algorithm is based on the following two key assumptions:

Nearby data are likely to have close ranking scores;
Data on the same structure are likely to have close ranking scores.

Intuitively, the ranking algorithm works as follows: a weighted network is constructed first, where nodes represent all the data and query points, and an edge is placed between two nodes if they are “close.”
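
The network construction above, together with the sink-point idea from the abstract, can be illustrated with a minimal Java sketch. It is not the authors' implementation. It assumes a Gaussian kernel (with a hypothetical width `sigma`) as the measure of closeness, the commonly used manifold ranking iteration f ← αSf + (1 − α)y over the symmetrically normalized weight matrix S, a fixed iteration count in place of a convergence test, and a greedy loop that turns each selected point into a sink. The class name, method names, and parameters are all chosen for illustration only.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and parameters are assumptions, not the authors' code. */
final class ManifoldRankingSketch {

    /** Gaussian affinity W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2)), zero on the diagonal. */
    static double[][] affinity(double[][] points, double sigma) {
        int n = points.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d2 = 0.0;
                for (int k = 0; k < points[i].length; k++) {
                    double diff = points[i][k] - points[j][k];
                    d2 += diff * diff;
                }
                w[i][j] = w[j][i] = Math.exp(-d2 / (2.0 * sigma * sigma));
            }
        }
        return w;
    }

    /** Symmetric normalization S = D^{-1/2} W D^{-1/2}, with D[i][i] the i-th row sum of W. */
    static double[][] normalize(double[][] w) {
        int n = w.length;
        double[] invSqrtDeg = new double[n];
        for (int i = 0; i < n; i++) {
            double deg = Arrays.stream(w[i]).sum();
            invSqrtDeg[i] = deg > 0.0 ? 1.0 / Math.sqrt(deg) : 0.0;
        }
        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                s[i][j] = invSqrtDeg[i] * w[i][j] * invSqrtDeg[j];
            }
        }
        return s;
    }

    /** Iterates f <- alpha * S * f + (1 - alpha) * y while holding sink scores at zero. */
    static double[] rank(double[][] s, double[] y, boolean[] sink, double alpha, int iterations) {
        int n = y.length;
        double[] f = new double[n];
        for (int t = 0; t < iterations; t++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                if (sink[i]) {
                    continue; // a sink keeps score 0, so it passes nothing on to its neighbours
                }
                double spread = 0.0;
                for (int j = 0; j < n; j++) {
                    spread += s[i][j] * f[j];
                }
                next[i] = alpha * spread + (1.0 - alpha) * y[i];
            }
            f = next;
        }
        return f;
    }

    /** Greedy selection: rank, take the best remaining point, turn it into a sink, repeat. */
    static int[] diverseTopK(double[][] points, int queryIndex, double sigma,
                             double alpha, int k, int iterations) {
        if (k < 0 || k > points.length - 1) {
            throw new IllegalArgumentException("k must be between 0 and points.length - 1");
        }
        double[][] s = normalize(affinity(points, sigma));
        double[] y = new double[points.length];
        y[queryIndex] = 1.0; // only the query point carries an initial score
        boolean[] sink = new boolean[points.length];
        int[] picked = new int[k];
        for (int r = 0; r < k; r++) {
            double[] f = rank(s, y, sink, alpha, iterations);
            int best = -1;
            for (int i = 0; i < f.length; i++) {
                if (i == queryIndex || sink[i]) {
                    continue; // the query itself and earlier picks are not candidates
                }
                if (best < 0 || f[i] > f[best]) {
                    best = i;
                }
            }
            picked[r] = best;
            sink[best] = true;
        }
        return picked;
    }
}
```

Because a sink's score is pinned at zero, points that are strongly connected to an already selected point receive less score from it in the next round, which is how near-duplicates are pushed down the ranking. A call such as `ManifoldRankingSketch.diverseTopK(points, 0, 0.5, 0.9, 3, 200)` would return three distinct indices other than the query index 0; the values 0.5, 0.9, and 200 are arbitrary illustrative settings.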
