Winners’ Interviews – No Free Hunch

Last year we took our annual data science survey to the next level by turning over the results to YOU through an open-ended Kernel competition.

We were overwhelmed by the response and quality of kernels submitted. Not only are Kagglers amazing data scientists, but they’re incredible storytellers as well!

Mhamed Jabri was one of those skillful enough to take our data and shape it into something meaningful, not just for Kaggle, but for the data science community at large. We hope you enjoy getting to know him as much as we did.

Congrats, Mhamed, on your win!

To take a look at Mhamed's winning Kernel, visit: AfricAI

What was your background prior to entering this challenge?

I’m currently pursuing an MS in Applied Mathematics at Ecole Centrale Lyon. My first formal exposure to machine learning came through several MOOCs and the internships I did during my gap year. I became active on Kaggle last year, winning multiple Kernel Awards and finishing in the top 1% of the Data Science Bowl 2018 competition.

What made you decide to enter?

Multiple reasons, actually. A couple of months ago, I had the chance to attend the Deep Learning Indaba in South Africa. The Indaba is a week-long conference that aims to strengthen machine learning in Africa through state-of-the-art teaching and networking in a very inclusive and diverse environment. My experience there was fantastic and made me realize how good African researchers really are. Since then, I had been looking for a way not only to share that experience but also to talk about AI in Africa in general and showcase the wonderful things that have happened, and are happening right now, on that continent for people who might be overlooking it. So the moment Kaggle sent us the first email about the survey, I decided that once the results were public I would use them to build a storytelling notebook, which resulted in “AfricAI”.

What was your most important insight into the data?

Hmmm, I’d probably say that the most important insight is how much MOOCs and online content matter to students in Africa, especially in ML. It shows not only in their survey answers (as shown in my notebook, most African students think that online resources are better than what they’re given in class) but also in the many success stories I keep encountering on Twitter, such as the one I took from Jeremy Howard’s feed.

The most popular way to start learning Data Science / Machine Learning is Online Courses.

Were you surprised by any of your insights?

There is one insight, but for me it was more of a disappointment than a surprise: fewer than 5% of respondents come from Africa. I definitely hope that the Kaggle community will have more Africans in its ranks in the years to come. It would also be amazing if Kaggle could host a competition by some African company or, even better, a data for good competition about any of the critical issues in Africa.

Which tools did you use?

I used Python and the common libraries for data analysis: Pandas, Matplotlib and seaborn. Using that, along with the storytelling skills I had gained thanks to my previous published kernels, I was able to come up with that notebook.
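As a rough illustration of the kind of tally behind a survey notebook, here is a minimal sketch that computes each region's share of respondents. It is not the code from the AfricAI kernel: the function name, the country-to-continent mapping, and the sample answers are all hypothetical, and it uses only the standard library so it runs anywhere. With pandas, `Series.value_counts(normalize=True)` would give the same proportions.

```python
from collections import Counter


def share_by_group(responses, group_of):
    """Return each group's fraction of the responses that map to a group.

    `responses` is a list of raw answers (e.g. country names) and
    `group_of` maps an answer to its group (e.g. a continent).
    Answers missing from `group_of` are ignored.
    """
    counts = Counter(group_of[r] for r in responses if r in group_of)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Hypothetical mapping and answers, for illustration only.
CONTINENT = {
    "Morocco": "Africa",
    "Kenya": "Africa",
    "France": "Europe",
    "India": "Asia",
}

if __name__ == "__main__":
    answers = ["India", "France", "Morocco", "India", "Kenya", "France", "India", "India"]
    for continent, share in sorted(share_by_group(answers, CONTINENT).items()):
        print(f"{continent}: {share:.1%}")
```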

What have you taken away from this competition?

For me, the first thing that I took away is the satisfaction of being able to share with the Kaggle community, where Africans are clearly underrepresented, an article to get them interested in the state of AI there, all while using the results of the survey rather than writing a regular blog post. The second thing is, of course, the many new tricks I picked up from other published notebooks during the competition. I mean, the visualizations in many kernels were just off the charts. If I had to pick a couple to recommend, I’d probably go with Heads or Tails’ (as always) and Andre Sionek’s.

Mhamed Jabri is an MS student in Applied Mathematics at Ecole Centrale Lyon in France. He aims to pursue a PhD, and his research interests include humanitarian AI and applications of ML to healthcare.

Last week we crowned the world’s first-ever Triple Grandmaster, Abhishek Thakur. In a video interview with Kaggle data scientist Walter Reade, Abhishek answered our burning questions about who he is, what inspires him to compete, and what advice he would give to others. If you missed the video interview, take a listen.

This week, he’s answering your questions!

See below for Abhishek's off-the-cuff responses to select Twitter questions. Have something more you want to know? Leave a comment on this post, or tweet him at @abhi1thakur.

Here's what YOU wanted to know...

I used to read and implement quite a lot of papers during my master's degree and then during my unfinished PhD. After that, I decided to join the industry, so now I read papers relevant to the industry I am working in. Sometimes I also read papers I come across on Reddit, Twitter, and Kaggle. Recently, I have read papers on XLNet and BERT.

As for my favorite tools, Python is my bread and butter. I love scikit-learn, XGBoost, Keras, TensorFlow and PyTorch.

It’s very difficult to find the time when you are working. Here's what I do: I wake up early and work 1-2 hours on a Kaggle problem before work each day. I try my best to start a model, and I have written scripts that do K-Fold training automatically. I also have some scripts that automate submissions. By the time I’m back home from work, these models have finished and I can work on post-processing or new models.
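The K-Fold training scripts mentioned above are not shown in the interview, so the following is only a minimal sketch of what such automation might look like. The helper names (`k_fold_indices`, `run_k_fold`, `fit_mean`, `predict_mean`) are hypothetical, and the "model" is a placeholder that predicts the training mean. It uses only the standard library; in practice scikit-learn's `KFold` would be the usual choice for the splitting step.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


def run_k_fold(features, targets, fit, predict, n_splits=5, seed=0):
    """Train one model per fold; return out-of-fold predictions and per-fold MAE."""
    oof = [None] * len(targets)
    fold_scores = []
    for train_idx, valid_idx in k_fold_indices(len(targets), n_splits, seed):
        model = fit([features[i] for i in train_idx], [targets[i] for i in train_idx])
        preds = predict(model, [features[i] for i in valid_idx])
        for i, p in zip(valid_idx, preds):
            oof[i] = p
        fold_scores.append(mean(abs(p - targets[i]) for i, p in zip(valid_idx, preds)))
    return oof, fold_scores


# Placeholder learner (an assumption for illustration): predicts the training mean.
def fit_mean(train_x, train_y):
    return mean(train_y)


def predict_mean(model, valid_x):
    return [model for _ in valid_x]


if __name__ == "__main__":
    xs = [[float(i)] for i in range(10)]
    ys = [2.0 * i for i in range(10)]
    oof, scores = run_k_fold(xs, ys, fit_mean, predict_mean, n_splits=5)
    print("per-fold MAE:", scores)
```

Passing `fit` and `predict` as arguments keeps the loop reusable: a script like this can be started in the morning with any learner plugged in, and the out-of-fold predictions are ready for post-processing later.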

A few hours every day if you are a student. If you are working, maybe an hour or two a day. You can invest a few more hours over the weekends. It’s less about the hours you invest and more about understanding the problem statement. I suggest writing down a few different approaches to try.

It’s also a very good idea to read the discussion forums, as a lot of ideas are shared there. If you're just starting with Kaggle, you also might want to take a look at past competitions and learn how the winners approached the problem. From there you can try to implement their approaches on your own without looking at the code.

Every competition brings its own challenges and there is something new to learn from each one. For example, an image segmentation competition can be approached with models like U-Net or Mask R-CNN. In a given image segmentation problem, one approach might outperform the other. So, you have to know which approach will work best in different scenarios, and you can only know that once you have worked on several image segmentation problems.

Same with tabular data competitions. You can get numerical variables or a mix of numerical and categorical variables. If you have experience with these, you will know right away which approach works well and which models you can start without a lot of processing on the dataset.
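One common first step for a table that mixes numerical and categorical columns is one-hot encoding the categorical ones. The sketch below illustrates that idea only; it is not taken from any competition solution, the function names and sample rows are hypothetical, and it uses plain dictionaries rather than a dataframe library so that it stays self-contained.

```python
def fit_one_hot(rows, categorical_columns):
    """Learn the sorted set of categories seen in each categorical column."""
    return {
        col: sorted({row[col] for row in rows})
        for col in categorical_columns
    }


def transform_one_hot(rows, categories):
    """Replace each categorical column with 0/1 indicators; keep other columns as-is."""
    encoded = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in categories}
        for col, values in categories.items():
            for value in values:
                # A category unseen during fit yields an all-zero indicator group.
                out[f"{col}={value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded


if __name__ == "__main__":
    # Hypothetical rows: one numerical column and one categorical column.
    train_rows = [
        {"age": 31, "role": "analyst"},
        {"age": 45, "role": "engineer"},
    ]
    categories = fit_one_hot(train_rows, ["role"])
    for encoded_row in transform_one_hot(train_rows, categories):
        print(encoded_row)
```

Fitting the categories on the training rows and reusing them at transform time keeps the column layout identical between training and test data.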

So, yes, the process becomes smoother with every competition you try. The more competitions you participate in, the more you learn. Once you have a lot of scripts and functions that you can re-use, you can just automate everything (well, most of the things).

One of the most difficult challenges I worked on was the StumbleUpon Evergreen Classification Challenge. Now, if you look at that competition, you might not even find it challenging. At that time though, I had no clue about NLP or about the tools and libraries we have available today to process text data and clean HTML.

Another tough one for me was the Amazon Employee Access Challenge. Here, we were given categorical data, which was again very new to me. Any time there's something in the data that you have less knowledge about or don’t know about at all, it can be challenging. The only way to avoid this is to learn the different approaches, and practice, practice, practice.

Check out Andrew Ng’s courses on Coursera. He explains everything in the simplest manner possible. I think you would need some basic mathematics background, which you might already have; if not, I suggest working a little with algebra, some basic calculus, and probability. The only way to learn is to solve some problems. When you have an idea about how the problems are being solved, dig more into the algorithms and see what happens in the background.

One of the best things I've learned is to never give up. When starting in any field, you will fail several times before you succeed. And if you give up after failing, you might not succeed at all. Another important thing I've learned is how to work on a team: how to manage time and divide tasks when working on the same problem. I also learned a lot about preprocessing and post-processing of data, different types of machine learning models, cross-validation techniques and how to improve on a given metric without compromising on the training or inference time.

The Future of AI in Africa Looks Bright | A Winner Interview with Mhamed Jabri

Triple GM Abhishek Thakur Answers Qs from the Kaggle Community
