Blind regression: Nonparametric regression for latent variable models via collaborative filtering

D. Song, C.E. Lee, Y. Li, D. Shah
Advances in Neural Information Processing Systems, pp. 2155-2163

We introduce the framework of {em blind regression} motivated by {em matrix completion} for recommendation systems: given m users, n movies, and a subset of user-movie ratings, the goal is to predict the unobserved user-movie ratings given the data, i.e., to complete the partially observed matrix. Following the framework of non-parametric statistics, we posit that user u and movie i have features x1(u) and x2(i) respectively, and their corresponding rating y(u,i) is a noisy measurement of f(x1(u),x2(i)) for some unknown function f. In contrast with classical regression, the features x=(x1(u),x2(i))are not observed, making it challenging to apply standard regression methods to predict the unobserved ratings. Inspired by the classical Taylor’s expansion for differentiable functions, we provide a prediction algorithm that is consistent for all Lipschitz functions. In fact, the analysis through our framework naturally leads to a variant of collaborative filtering, shedding insight into the widespread success of collaborative filtering in practice. Assuming each entry is sampled independently with probability at least max(m1+δ,n1/2+δ) with δ>0, we prove that the expected fraction of our estimates with error greater than ϵ is less than γ2/ϵ2 plus a polynomially decaying term, where γ2 is the variance of the additive entry-wise noise term. Experiments with the MovieLens and Netflix datasets suggest that our algorithm provides principled improvements over basic collaborative filtering and is competitive with matrix factorization methods.