Let me be clear: I love deep learning. It has radically expanded the scope of problems that machine learning can practically be applied to. I've built a company on the back of deep learning and owe quite a lot to it.
DO NOT start with deep learning. I'm not saying that you shouldn't approach deep learning at all, or even that you shouldn't end up exclusively studying deep learning. But focusing on deep learning prematurely would, I believe, be a short-sighted decision.
Deep learning isn't a magic bullet, and in many (common) situations it is in fact a very bad fit for the problem in front of you. Precisely because deep learning is such a flexible and powerful tool, it's important to learn when not to use it. Deep learning is capable of attacking almost any problem, but the set of problems for which it is practically effective or useful is much smaller.
Let's use an example:
Manager: We're looking to predict customer churn. We only have about 10,000 customer records, and for each one we have about 10 categorical variables. We want to know if we can use those categorical variables to predict churn.
Approach 1: Hmm, categorical information isn't a perfect fit for any of the out-of-the-box architectures I'm familiar with. I'd probably set this up as a simple fully-connected network. That said, we don't have much data, so I'll probably need to play around with some smart initialization approaches to get the network into the right neighborhood. I'll start by getting a basic network up that we can test with, and then we can iterate on the architecture to improve accuracy. Do we have any GPUs lying around? It would be really helpful to have a couple to run experiments on.
Approach 2: I used a Random Forest from sklearn. It took five minutes, it's well suited to the problem as defined, and I was able to predict churn with 80% precision and 60% recall.
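For concreteness, here's a minimal sketch of what Approach 2 might look like. The file name, column names, and train/test split are assumptions for illustration, not details from the conversation above:

```python
# A minimal sketch of Approach 2: a Random Forest over one-hot-encoded
# categorical features. The CSV, column names, and 80/20 split are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("customers.csv")                    # hypothetical ~10k-row file
categorical_cols = [c for c in df.columns if c != "churned"]
X, y = df[categorical_cols], df["churned"]

model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)])),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))  # precision / recall
```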
Could Approach 1 have worked? Certainly. Would Approach 1 have given the Manager higher accuracy? Maybe, but probably not. This isn't a particularly complex problem, and it's not one where we're likely to benefit from combinations of factors more complex than those a Random Forest can already capture. The data is well structured and not very large, so our ability to learn anything truly sophisticated is limited.
Here's another example: Bag of Words Meets Bags of Popcorn. This tutorial created a bit of a stir when it first came out. (Note: the tutorial is NOT deep learning, despite being billed as such; it is, however, indicative of deep learning solutions in the NLP space.) It was originally intended as an example of how powerful word2vec is, billed as a taste of what deep learning could do for NLP. It offers a long, detailed walkthrough of how to apply word vectors to this problem, and it's a great introduction to the topic for people who aren't familiar with it.
Except that's not how you should do the problem. It created a stir because in this problem (as in many others) the more complex solution actually performs worse than the most basic, obvious approach you could think of. You would be shocked at how strong a benchmark logistic regression on top of tf-idf vectors is. It's a great example of a very simple solution to what may often seem like a very complex task.
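If you want a feel for that baseline, here's a rough sketch in scikit-learn. The review texts and labels below are placeholders; swap in the actual labeled reviews from the competition:

```python
# A simple tf-idf + logistic regression baseline of the kind described above.
# The texts and labels are placeholders standing in for the real review data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = [
    "a wonderful, heartfelt film",        # placeholder reviews; substitute the
    "dull plot and dreadful acting",      # labeled training reviews here
    "one of the best movies this year",
    "a complete waste of two hours",
] * 50
labels = [1, 0, 1, 0] * 50                # placeholder sentiment labels

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2, sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
print(cross_val_score(baseline, texts, labels, cv=5).mean())  # accuracy estimate
```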
The problem is that if you focus exclusively on deep learning very early on, you will make these mistakes. You'll miss obvious solutions sitting right in front of your face, you'll over-complicate solutions when something very simple and straightforward would have worked, and worst of all, you probably won't even know it. You'll likely end up shipping a very expensive model into production only to learn months or years later that it underperforms a simple benchmark. You'll also find yourself in over your head quickly when you try to move beyond being the ML version of a script-kiddie. Deep learning didn't invalidate the machine learning that came before it; it built upon it. You'll find that more cutting-edge deep learning papers are full of callbacks to older research and assume a lot of fundamental knowledge that you would likely never pick up if you studied deep learning exclusively.