We are given a real-valued function f : {-1,1}^{n} → R on the Boolean hypercube that can be written as a sparse linear combination of s polynomials; the junta problem (cf. [1]) is one instance of this setting. Our goal is to learn f from its values at m points sampled uniformly at random from {-1,1}^{n}. In this paper, we draw connections between the sparse polynomial learning problem and compressed sensing. As a result, we provide a convex program that learns an s-sparse polynomial with high probability from m = O(s^{2}n) observations. We contrast this with the worst-case sample complexity, which requires O(n 2^{n}) random samples to learn an arbitrary function f. Our results extend naturally to the settings where the observations are noisy or f is only well approximated by an s-sparse polynomial. They also show that the solution adapts to the number of observations, yielding a natural approximation given the available information.
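The compressed-sensing view above can be illustrated with a minimal sketch: expand f in the parity (monomial) basis, observe it at m random points, and recover the sparse coefficient vector by ℓ1-regularized regression. This is an illustrative toy, not the paper's exact convex program; the choice of n, the degree cap, the ISTA solver, and all parameter values are assumptions made for the example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, deg, s, m = 6, 2, 3, 80  # assumed toy sizes, not from the paper

# Dictionary of parity features chi_S(x) = prod_{i in S} x_i for |S| <= deg.
subsets = [S for d in range(deg + 1)
           for S in itertools.combinations(range(n), d)]

def features(X):
    # One column per subset S; the empty product gives the constant feature.
    return np.column_stack([np.prod(X[:, list(S)], axis=1) for S in subsets])

# Ground-truth s-sparse polynomial: s nonzero parity coefficients.
support = rng.choice(len(subsets), size=s, replace=False)
c_true = np.zeros(len(subsets))
c_true[support] = rng.uniform(0.5, 2.0, size=s) * rng.choice([-1.0, 1.0], size=s)

# m samples drawn uniformly at random from {-1,1}^n.
X = rng.choice([-1.0, 1.0], size=(m, n))
y = features(X) @ c_true

# L1 recovery: min_c 0.5*||A c - b||^2 + lam*||c||_1, solved by ISTA.
A = features(X) / np.sqrt(m)
b = y / np.sqrt(m)
lam = 1e-3
step = 1.0 / np.linalg.norm(A, 2) ** 2
c = np.zeros(len(subsets))
for _ in range(3000):
    c = c - step * (A.T @ (A @ c - b))             # gradient step
    c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)  # soft threshold

recovered = set(np.flatnonzero(np.abs(c) > 1e-2))
print("recovered support:", sorted(recovered))
```

Because the parity functions form an orthonormal basis in expectation over uniform samples, the feature matrix A behaves like a restricted-isometry matrix, which is what makes the ℓ1 relaxation succeed with far fewer samples than the 2^n needed to tabulate f.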