My path to becoming a PhD candidate in mathematical statistics is somewhat untraditional: I work on the applied side, which has become more common in the current Big Data era. My major is not in mathematics but in computer science, and through an additional master's in systems biology I got interested in mathematical statistics. Still, I often find it hard to bridge the gap between PhD-level math stats and my own level of mathematics.

Following parts of the excellent book Statistics for High-Dimensional Data by Sara van de Geer and Peter Bühlmann, I fill in below the details needed for someone at my level of mathematics to grasp how to prove the consistency and the oracle property of the LASSO (Least Absolute Shrinkage and Selection Operator) for prediction. The LASSO is a method for regression and variable selection in sparse high-dimensional linear models (p >> N). Its objective is similar to that of ridge regression, but instead of an L2-norm penalty on the coefficients it uses an L1-norm (absolute value) penalty. This sets some of the coefficients, hopefully the spurious ones, to exactly zero.
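To see the exact-zero behaviour concretely, here is a minimal sketch of the LASSO fitted by coordinate descent on simulated data. Everything in it (the function names, the toy data, the penalty level lam=20) is my own illustration, not from the book; the soft-thresholding update is the standard closed-form solution of the one-dimensional subproblem of (1/2)||y - Xb||² + λ||b||₁.

```python
import numpy as np

def soft_threshold(rho, lam):
    # Closed-form solution of the one-dimensional LASSO subproblem:
    # shrinks rho toward zero by lam, and returns exactly 0 if |rho| <= lam.
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution added back.
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
    return b

# Hypothetical sparse ground truth: only the first two coefficients are nonzero.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=20.0)
```

With a ridge (L2) penalty the spurious coefficients would merely be small; with the L1 penalty the soft-thresholding step makes them exactly zero, which is what gives the LASSO its variable-selection ability.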

Given enough data, the coefficients of the solved optimization problem converge to those of the underlying true model (consistency). The oracle property tells us that the method works as well as if we had known in advance which predictors belong to the true model. For this to hold we have to add some restrictions on the design and the noise.
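To make "as well as the oracle" precise, the kind of result proved in the book is an oracle inequality. A typical one, stated here from memory and in slightly simplified notation, bounds the prediction error and the estimation error simultaneously:

\[
\frac{\lVert X(\hat{\beta} - \beta^{0}) \rVert_2^2}{n} + \lambda \lVert \hat{\beta} - \beta^{0} \rVert_1 \;\le\; \frac{4 \lambda^2 s_0}{\phi_0^2},
\]

where \(\beta^{0}\) is the true coefficient vector, \(s_0\) the number of its nonzero entries, \(\lambda\) the penalty level, and \(\phi_0\) the compatibility constant of the design. With \(\lambda \asymp \sqrt{\log p / n}\), the right-hand side scales like \(s_0 \log p / n\), which up to the \(\log p\) factor matches what least squares on the true support would achieve.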

Hopefully this will be of some use to others. Please contact me if you find any errors or have other comments.

I used XeTeX and pdf2svg to render SVG files.