Last week our co-founder and machine learning architect, Madison May, presented at the Boston Machine Learning Meetup on various optimization methods based on trends from ICLR 2018. Video and slides below.
In recent years, adaptive momentum methods like Adam and RMSProp have become popular for reducing the sensitivity of machine learning models to optimization hyperparameters and increasing the rate of convergence for complex models. However, past research has shown that, when properly tuned, simple SGD with momentum yields better generalization and lower validation losses in the later stages of training. In a wave of papers submitted in early 2018, researchers suggested justifications for this unexpected behavior and proposed practical solutions to the problem. This talk will first provide a primer on optimization for machine learning, then summarize the results of these papers and suggest practical approaches to applying these findings.
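To make the contrast concrete, here is a minimal numpy sketch of the two update rules discussed above: classic SGD with (heavy-ball) momentum, and Adam, which scales each parameter's step by running estimates of the gradient's first and second moments. The toy quadratic objective and all hyperparameter values are illustrative choices, not from the talk.

```python
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.01, mu=0.9, steps=200):
    """SGD with heavy-ball momentum: one global learning rate."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - lr * grad_fn(w)  # accumulate velocity
        w = w + v
    return w

def adam(grad_fn, w, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    """Adam: per-parameter step sizes from running gradient moments."""
    m = np.zeros_like(w)  # first moment (mean of gradients)
    v = np.zeros_like(w)  # second moment (mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat = m / (1 - b1**t)  # bias correction for zero init
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy badly-conditioned quadratic: f(w) = 0.5 * w^T A w, minimum at 0.
A = np.diag([1.0, 100.0])
grad = lambda w: A @ w
w0 = np.array([1.0, 1.0])

print("SGD+momentum:", sgd_momentum(grad, w0.copy()))
print("Adam:        ", adam(grad, w0.copy()))
```

Adam's per-parameter normalization makes it far less sensitive to the conditioning of the problem, which is the hyperparameter-robustness advantage the abstract mentions; the papers surveyed in the talk examine why that same normalization can hurt generalization late in training.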