The second iteration of the Dogs vs. Cats Redux playground competition challenged Kagglers to once again distinguish images of dogs from cats, this time relying on advances in computer vision and new tools like Keras. In this winner's interview, Kaggler Marco Lugo shares how he landed in 3rd place out of 1,314 teams using deep convolutional neural networks, a now classic approach. One of Marco's biggest takeaways from this for-fun competition was an improved processing pipeline for faster prototyping, which he can now apply to similar image-based challenges.
The basics
What was your background prior to entering this challenge?
I am an economist by training and have been immersed in econometrics, which I would describe as the more classical branch of statistics, where the main focus in economics is often on policy and therefore on causality.
I started programming with the C language about two decades ago and have always strived to keep learning about programming, eventually landing on R, which led me to discover machine learning in 2013. I was instantly hooked on predictive modeling.
Do you have any prior experience or domain knowledge that helped you succeed in this competition?
I have tried various computer vision datasets in the past but nothing that had forced me to push the envelope on hyperparameter optimization. This was my first image-related competition.
How did you get started competing on Kaggle?
I believe it was one night when I was searching for how to do something in R, and as it turns out, the code that ended up helping me understand how it was done was on the Kaggle website. I explored the site at the time and decided to enter a competition for fun, applying a linear regression and thinking it would be easy, but I obtained a less-than-stellar outcome instead. It was that somewhat humbling result that pushed me into machine learning.
What made you decide to enter this competition?
I was taking the excellent deep learning course by Jeremy Howard, Kaggle’s ex-president and ex-Chief Scientist, and one of the homework assignments was to enter the competition and get a top 50% ranking. I did my homework.
Let’s get technical
Did any past research or previous competitions inform your approach?
The online notes for Stanford's CS231n course by Andrej Karpathy were particularly useful. The Kaggle Blog winner's interviews were also a good way to spark new ideas when my score started to stall.
What preprocessing and feature engineering did you do?
I randomly partitioned the data to create a validation set containing only 8% of the training set. I also demeaned and normalized the data as needed and used data augmentation to varying degrees.
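Below is a minimal sketch of how such a split, normalization, and augmentation might look in Keras (the framework used for the neural networks). The directory layout, target size, and augmentation parameters are illustrative assumptions, not the settings used in the winning solution.

```python
# Illustrative sketch only: an ~8% validation split with demeaning/normalization
# and light augmentation via Keras' ImageDataGenerator.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    featurewise_center=True,             # demean (requires datagen.fit() on a sample of images)
    featurewise_std_normalization=True,  # normalize
    rotation_range=15,                   # augmentation: small rotations...
    zoom_range=0.1,                      # ...slight zooms...
    horizontal_flip=True)                # ...and horizontal flips

valid_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True)  # no augmentation on the validation set

# Assumes the images were already partitioned into train/ and valid/ folders
# (roughly 92% / 8% of the original training set).
train_gen = train_datagen.flow_from_directory(
    'data/train', target_size=(224, 224), batch_size=32, class_mode='binary')
valid_gen = valid_datagen.flow_from_directory(
    'data/valid', target_size=(224, 224), batch_size=32, class_mode='binary')
```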
What supervised learning methods did you use?
I used deep convolutional neural networks, both trained from scratch and pre-trained on the ImageNet database. My ensemble was a weighted average of the following models (a rough fine-tuning sketch follows the list):
- 1 VGG16 pre-trained on ImageNet and fine-tuned.
- 2 ResNet50s pre-trained on ImageNet and fine-tuned.
- 1 ResNet50 trained from scratch.
- 3 Xception models pre-trained on ImageNet and fine-tuned.
- Features extracted from pre-trained InceptionV3, ResNet50, VGG16, and Xception models, used as inputs to (1) LightGBM, Microsoft's implementation of gradient boosting, and (2) a 5-layer neural network.
- 2 VGG-inspired convolutional neural networks trained from scratch.
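To give a rough idea of what fine-tuning one of these pre-trained networks looks like in Keras, here is a minimal sketch for an ImageNet-pretrained ResNet50. The classification head, the number of unfrozen layers, the optimizer, and the learning rates are assumptions for illustration, not the actual winning configuration.

```python
# Hedged sketch: fine-tuning an ImageNet-pretrained ResNet50 for dogs vs. cats.
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)   # predicted probability of "dog"
model = Model(inputs=base.input, outputs=out)

# First pass: freeze the pre-trained convolutional base and train only the new head.
for layer in base.layers:
    layer.trainable = False
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit_generator(train_gen, steps_per_epoch=..., validation_data=valid_gen, ...)

# Second pass: unfreeze the deeper layers and fine-tune at a low learning rate.
for layer in base.layers[-20:]:
    layer.trainable = True
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss='binary_crossentropy', metrics=['accuracy'])
# model.fit_generator(train_gen, steps_per_epoch=..., validation_data=valid_gen, ...)
```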
Were you surprised by any of your findings?
I was pleasantly surprised by the effect of adding relatively poorly performing models into the mix. It was also interesting to play around with the different variations of rectified linear units (ReLU), as switching from standard ReLU to Leaky ReLU and Randomized Leaky ReLU had a noticeable impact.
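For reference, Leaky ReLU is available in Keras as a separate layer, so swapping it in mostly means dropping the activation argument and adding the layer explicitly. The tiny model below is purely illustrative; note that Randomized Leaky ReLU is not a built-in Keras layer and would need a custom implementation.

```python
# Illustrative only: replacing a standard ReLU activation with a Leaky ReLU layer.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.layers.advanced_activations import LeakyReLU

model = Sequential([
    Conv2D(32, (3, 3), input_shape=(224, 224, 3)),  # no activation argument here...
    LeakyReLU(alpha=0.1),                           # ...Leaky ReLU added as its own layer
    MaxPooling2D(),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
```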
Which tools did you use?
I used Keras, developed by François Chollet of Google, for all of the neural networks, with both the Theano and TensorFlow backends depending on the type of model I had to run. While the vast majority of the work was done in Python, I did use R to run the LightGBM model.
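One standard way to choose the backend per run, assuming a typical Keras installation, is to set the KERAS_BACKEND environment variable before importing Keras (it can also be set permanently in the ~/.keras/keras.json configuration file):

```python
# Switch the Keras backend for this run; must happen before "import keras".
import os
os.environ['KERAS_BACKEND'] = 'theano'   # or 'tensorflow'

import keras
print(keras.backend.backend())           # confirms which backend is active
```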
What does your hardware setup look like?
It’s a relatively old Windows 7 machine with an AMD FX-8350 CPU, 24GB of RAM, and an NVIDIA GTX 1060 6GB GPU. I also run an Ubuntu virtual machine on it, but most of the work is done on Windows. I plan on upgrading soon.
What was the run time for both training and prediction of your winning solution?
I remember that some models took over 74 hours to train, as I often trained for hundreds of epochs, but I cannot put an exact number on all the iterations and models that I had to run; I would estimate it at 3 or 4 weeks of running time. Predicting for the full test set took under an hour.
Words of wisdom
What have you taken away from this competition?
I learned how important it is to properly understand the evaluation function. It was worth my time to sit down with pen and paper to explore the mathematical properties of the logarithmic loss function. Understanding the formula is not the same as understanding its impact.
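As a small, made-up numerical illustration of that impact: under the logarithmic loss, a single highly confident wrong prediction can dominate many correct ones, which is the kind of property that only becomes obvious once you work through the formula.

```python
# Toy example (values invented for illustration): log loss punishes confident mistakes.
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    # logloss = -mean( y*log(p) + (1-y)*log(1-p) )
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 1, 1, 1, 0])
confident = np.array([0.99, 0.99, 0.99, 0.99, 0.99])  # last prediction is a confident mistake
hedged    = np.array([0.95, 0.95, 0.95, 0.95, 0.95])  # same mistake, slightly less confident

print(log_loss(y, confident))  # ~0.93: the single confident error dominates the score
print(log_loss(y, hedged))     # ~0.64: hedging that one mistake helps considerably
```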
Looking back, what would you do differently now?
I would have set up my processing pipeline much earlier in the competition. I only did it after cracking the top 40% and, unsurprisingly, it enabled faster prototyping and thus allowed me to start making real gains on a daily basis. It is also worth the investment because it can easily be reused: I was able to quickly recycle it for the Cervical Cancer Screening competition and land a top 10% position from the start, building on the same setup.
Do you have any advice for those just getting started in data science?
I would highly recommend trying out as many different problems as you can and getting your hands dirty, even if you do not fully grasp the theory behind them at the beginning. I will steal a page here from Jeremy Howard's deep learning course and refer you to a short essay that perfectly illustrates this point: A Mathematician's Lament by Paul Lockhart.
Bio
Marco Lugo currently works as a Senior Analyst at Canada Mortgage and Housing Corporation. He holds a B.Sc. in Economics and Philosophy and an M.Sc. in Economics from the University of Montreal.