Journée Mathematical Foundations of Learning Theory
< précédent | suivant >
|The Dantzig Selector: Statistical Estimation when p is Larger than n|
Emmanuel Candes (California Institute of Technology)
31 mai 2006
In many important statistical applications, the number of variables or parameters is much larger than the number of observations. In radiology and biomedical imaging for instance, one is typically able to collect far fewer measurements about an image of interest than the unknown number of pixels. Examples in functional MRI and tomography immediately come to mind. Other examples of high-dimensional data in genomics, signal processing and many other fields abound. In the context of multiple linear regression for instance, a fundamental question is whether it is possible to estimate a vector of parameters of size p from a vector of observations of size n when n ≪ p. This seems a priori hopeless.
This talk introduces a new estimator, dubbed the “Dantzig selector” in honor of the late George Dantzig as it invokes linear programming, and which enjoys remarkable statistical properties. Suppose that the data or design matrix obeys a uniform uncertainty principle and that the true parameter vector is sufficiently sparse or compressible which roughly guarantees that the model is identifiable. Then the estimator achieves an accuracy which nearly equals that one would achieve with an oracle that would supply perfect information about which coordinates of the unknown parameter vector are nonzero and which were above the noise level. Our results connect with the important model selection problem. In effect, the Dantzig Selector automatically selects the subset of covariates with nearly the best predictive power, by solving a convenient linear program.
Our results are also inspired by a recent body of work perhaps now best known under the name of “Compressive Sampling,” a new sampling theory we introduced very recently. If time allows, I will discuss applications of Compressive Sampling in other fields such as coding theory.
Further references: The main paper but there are also many other papers on a related subject, for instance, Decoding by linear programming and Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information.
||Emmanuel Candes (California Institute of Technology)|
Professor of Applied and Computational Mathematics