Energy Landscape for `large average' Gaussian submatrices.
Time and place
1 PM on Tuesday, February 19th, 2013; NAC 6-113
Partha Dey (NYU)
Abstract
The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics, computer science to social sciences. We provide a detailed asymptotic analysis of large average submatrices of an $n \times n$ Gaussian random matrix. For $k<<\log n$ we identify the average and the joint distribution of the $k \times k$ submatrix with largest average value. As a dual result, we establish that, for any given $\tau > 0$, the size of the largest square sub-matrix with average bigger than $\tau$ is, for large $n$, equal to one of two consecutive integers near $4(\log n - \log \log n)/\tau^2$.
We then turn our attention to submatrices with dominant row and column sums, which arise as the local optima of iterative search procedures for large average submatrices. For fixed $k$, we identify the average and joint distribution of a typical $k \times k$ submatrices with dominant row and column sums, and we carry out a detailed analysis of the number $L_n(k)$ of such submatrices, beginning with the mean and variance of $L_n(k)$ which has a very atypical behavior. In particular, for $k=2$ and $k=3$, the order of the means are $o(n^2)$ and $o(n^3)$, while the variances are $n^{8/3}$ and $n^{9/2}$, respectively, with logarithmic corrections. Our principal result is a Gaussian central limit theorem for $L_n(k)$ based on a new variant of Stein's method, that is of independent interest. Based on joint work with Shankar Bhamidi and Andrew Nobel.