Efficient rank aggregation using partial data

A. Ammar, D. Shah
ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 1, pp.355-366

The need to rank items based on user input arises in many practical applications such as elections, group decision making and recommendation systems. The primary challenge in such scenarios is to decide on a global ranking based on partial preferences provided by users. The standard approach to address this challenge is to ask users to provide explicit numerical ratings (cardinal information) of a subset of the items. The main appeal of such an approach is the ease of aggregation. However, the rating scale as well as the individual ratings are often arbitrary and may not be consistent from one user to another. A more natural alternative to numerical ratings requires users to compare pairs of items (ordinal information). On the one hand, such comparisons provide an “absolute” indicator of the user’s preference. On the other hand, it is often hard to combine or aggregate these comparisons to obtain a consistent global ranking.

In this work, we provide a tractable framework for utilizing comparison data as well as first-order marginal information (see Section 2) for the purpose of ranking. We treat the available information as partial samples from an unknown distribution over permutations. We then reduce ranking problems of interest to performing inference on this distribution. Specifically, we consider the problems of (a) finding an aggregate ranking of n items, (b) learning the mode of the distribution, and (c) identifying the top k items. For many of these problems, we provide efficient algorithms to infer the ranking directly from the data without the need to estimate the underlying distribution. In other cases, we use the Principle of Maximum Entropy to devise a concise parameterization of a distribution consistent with observations using only O(n2) parameters, where n is the number of items in question. We propose a distributed, iterative algorithm for estimating the parameters of the distribution. We establish the correctness of the algorithm and identify its rate of convergence explicitly.