In our recent discussions, we saw how k-Nearest Neighbors can serve as a powerful non-parametric approximation to the optimal predictor that minimizes squared error loss.
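To make that connection concrete, here is a minimal sketch (toy 1-D data of my own choosing, not from the lecture) of kNN regression estimating the conditional mean E[Y | X = x], which is exactly the predictor that minimizes squared error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: y = sin(2*pi*x) + noise, so the squared-loss-optimal
# predictor is the conditional mean E[Y | X = x] = sin(2*pi*x).
X = rng.uniform(0.0, 1.0, size=500)
y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=X.shape)

def knn_predict(x0, X, y, k=15):
    """Average the targets of the k nearest training points --
    a local, non-parametric estimate of E[Y | X = x0]."""
    idx = np.argsort(np.abs(X - x0))[:k]
    return y[idx].mean()

# The kNN estimate should track the true conditional mean sin(2*pi*x0).
for x0 in (0.25, 0.50, 0.75):
    print(x0, knn_predict(x0, X, y), np.sin(2 * np.pi * x0))
```

With enough data in a low-dimensional space, the k nearest neighbors all sit close to the query point, so their average is a good local stand-in for the conditional mean. Keep this picture in mind: it is precisely the step that fails in high dimensions.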
Today, we’ll take the next step and explore why KNN breaks down in practice, and why this motivates the need for more structured machine learning models.
In particular, we’ll focus on:
- The curse of dimensionality
- Why distance-based methods struggle as feature spaces grow
- What properties we want from better predictors
To guide this, we’ll follow an excellent lecture by Prof. Kilian Weinberger (Cornell):
As you go through the material, think about:
- Which assumptions KNN implicitly makes about the data
- Which of those assumptions fail in high dimensions
- How these failures inform the design of more advanced models
Feel free to post questions, insights, or counterexamples in the thread below — especially if you can relate them back to squared-loss optimality or real-world data.
Enjoy, and looking forward to the discussion!