Workshop: Mathematical Foundations of Learning Theory
An Isoperimetric Inequality with Applications to Learning
Shie Mannor (McGill University)
June 1, 2006
An issue of central importance is learning in the presence of data corruption, or noise. In this talk, we consider the case where data corruption has produced a data sample with a large margin. The essential question is: what does this margin cost in terms of generalization error? We provide an answer for the case where the underlying distribution has a nearly log-concave density.
First, we prove that given such a nearly log-concave density, in any partition of the space into two well-separated sets, the measure of the points that do not belong to these sets is large. Next, we apply this isoperimetric inequality to derive lower bounds on the generalization error in classification. We further consider regression problems and show that if the inputs and outputs are sampled from a nearly log-concave distribution, the measure of points for which the prediction is wrong by more than ε₀ and less than ε₁ is (roughly) linear in ε₁ − ε₀, as long as ε₀ is not too small and ε₁ is not too large. We also show that when the data are sampled from a nearly log-concave distribution, the margin cannot be large in a strong probabilistic sense.
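
For readers unfamiliar with the terminology, the following is a rough sketch of the objects the abstract refers to. The first display is the standard definition of a log-concave density. The second is one possible way to formalize "nearly log-concave", using a slack parameter β that is an assumption here and not taken from the abstract. The third shows only the general shape of an isoperimetric inequality of this kind: the symbols π, K₁, K₂, K₃, d and the constant c are illustrative names, and the abstract does not state the actual constant or its dependence on the distribution.

```latex
% Standard definition: a density f on R^n is log-concave if
\[
  f(\lambda x + (1-\lambda) y) \;\ge\; f(x)^{\lambda}\, f(y)^{1-\lambda},
  \qquad x, y \in \mathbb{R}^n,\ \lambda \in [0,1].
\]

% Assumed relaxation ("nearly" log-concave) with a hypothetical slack \beta \ge 0:
\[
  f(\lambda x + (1-\lambda) y) \;\ge\; e^{-\beta}\, f(x)^{\lambda}\, f(y)^{1-\lambda}.
\]

% Schematic shape of the isoperimetric statement:
% K_1, K_2 are the two well-separated sets, K_3 is the remainder of the space,
% \pi is the probability measure with density f, d(K_1, K_2) is the distance
% between the two sets, and c > 0 is an unspecified constant.
\[
  \pi(K_3) \;\ge\; c \, d(K_1, K_2)\, \min\{\pi(K_1),\, \pi(K_2)\}.
\]
```

Read this way, a large margin d(K₁, K₂) forces a proportionally large mass into the region K₃ between the two sets, which is the sense in which the margin has a cost.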