A relative-error CUR decomposition for matrices and its data applications
Michael W. Mahoney
Yahoo Research
4/18 11am
Wean 8220
Much recent work in the Theoretical Computer Science, Linear Algebra, and Machine Learning has considered matrix decompositions of the following form: given an m-by-n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a few (typically a constant number of) columns of A, R consists of a few (typically a constant number of) rows of A, and U is a small carefully constructed matrix that guarantees that the product CUR is “close” to A. Applications of such decompositions include the computation of compressed “sketches” for large matrices in a pass-efficient manner, matrix reconstruction, speeding up kernel-based statistical learning computations, sparsity-preservation in low-rank approximations, and improved interpretability of data analysis methods. In this talk we shall discuss various choices for the matrices C, U, and R that are appropriate in different application domains. The main result will be an algorithm that computes matrices C, U, and R such that the (Frobenius) norm of the error matrix A - CUR is almost minimal. We will also discuss applications of this algorithm (and related heuristics) in the bioinformatics of human genetics, in recommendation system analysis, and in medical imaging.