March Machine Learning Mania 2016, Winner's Interview: 1st Place, Miguel Alomar

The annual March Machine Learning Mania competition sponsored by SAP challenged Kagglers to predict the outcomes of every possible match-up in the 2016 men's NCAA basketball tournament. Nearly 600 teams competed, but only the first place forecasts were robust enough against upsets to top this year's bracket. In this blog post, Miguel Alomar describes how calculating the offensive and defensive efficiency played into his winning strategy.

The Basics

What was your background prior to entering this challenge?

I earned a Master’s Degree in Computer Science from UIB in Mallorca, Spain. For nearly 20 years, I have been involved in software development, business intelligence and data warehousing. Recently, I have developed an interest in analytics and forecasting.

Miguel (AKA Mallorqui) on Kaggle

Do you have any prior experience or domain knowledge that helped you succeed in this competition?

In Spain, I played amateur basketball for 10 years. I like to think that is the reason I won.

The truth is I missed most of the basketball games this season and did not have a good feel for the quality of any of the teams. That most likely helped me, because if I had seen more games, my judgment might have changed some of the forecasts. Normally, I am pretty bad at picking winners.

How did you get started competing on Kaggle?

I found Kaggle through some data science lessons I was taking on Coursera.

What made you decide to enter this competition?

I really like analytics and sports so I thought it was a perfect competition for me.

But the key factor is that the moderators and other members make it easy to enter: they provide lots of help, data, advice and feedback. The data is already formatted and prepared, so gathering and manipulating it is very easy. Some members of the community seemed more interested in sharing and discovering new methods and insights than in winning the competition.

Let's get technical

What preprocessing and supervised learning methods did you use?

I used logistic regression and random forests. I did try AdaBoost but didn’t get very good results, so I didn’t use it in my final model.
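To make the regression approach concrete, here is a minimal sketch of a logistic regression that turns a rating difference between two teams into a win probability. It is not Miguel's model (he worked in R and does not list his features here). The single feature, the toy data, and the names `fit_logistic`, `diffs`, and `outcomes` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(diffs, outcomes, lr=0.01, epochs=5000):
    """Fit P(win) = sigmoid(w * diff + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(diffs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(diffs, outcomes):
            err = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: rating difference (team A minus team B)
# and whether team A won (1) or lost (0).
diffs = [12.0, 8.5, 3.0, 1.5, -2.0, -4.5, -9.0, 6.0, -1.0, 0.5]
outcomes = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

w, b = fit_logistic(diffs, outcomes)
p_favorite = sigmoid(w * 10.0 + b)   # team A rated 10 points better
p_underdog = sigmoid(w * -10.0 + b)  # team A rated 10 points worse
print(f"w={w:.3f}, b={b:.3f}, favorite={p_favorite:.2f}, underdog={p_underdog:.2f}")
```

Because the output is a probability rather than a hard pick, a model like this can express how confident it is in each match-up, which is what a submission covering every possible pairing needs.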

What was your most important insight into the data?

The data behind this competition is simple: box-score statistics from basketball games are easy to understand. The key factor for me was offensive and defensive efficiency. How do you calculate them? What weight should you give to strength of schedule? Can you "penalize" a team because it hasn’t played against the best teams in the nation? Can you lower its rating for something that didn’t happen?

Those are the kinds of questions I was trying to answer. I developed several models with different degrees of adjusted efficiency ratings and checked their scores against past seasons.
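The interview does not give Miguel's formulas, so the following is only a sketch of one common way to compute efficiency and to dial the schedule adjustment up or down. It assumes the widely used possession estimate FGA - ORB + TOV + 0.475 * FTA and a simple one-pass multiplicative adjustment. The `sos_weight` parameter, the function names, and the three teams are hypothetical.

```python
from collections import defaultdict


def possessions(fga, orb, tov, fta, ft_weight=0.475):
    """Estimate possessions from box-score totals (a common approximation)."""
    return fga - orb + tov + ft_weight * fta


def efficiency(points, poss):
    """Points scored (or allowed) per 100 possessions."""
    return 100.0 * points / poss


def mean(values):
    return sum(values) / len(values)


def adjusted_ratings(games, sos_weight=1.0):
    """One-pass schedule adjustment (an assumption, not the author's method).

    games: rows of (team, opponent, off_eff, def_eff), one row per team per game.
    sos_weight: 0.0 keeps raw efficiencies, 1.0 applies the full adjustment.
    Returns {team: (adjusted_offense, adjusted_defense)}.
    """
    raw_off, raw_def = defaultdict(list), defaultdict(list)
    for team, _opp, off_eff, def_eff in games:
        raw_off[team].append(off_eff)
        raw_def[team].append(def_eff)
    avg_off = {t: mean(v) for t, v in raw_off.items()}
    avg_def = {t: mean(v) for t, v in raw_def.items()}
    league = mean([off_eff for _t, _o, off_eff, _d in games])

    adj_off, adj_def = defaultdict(list), defaultdict(list)
    for team, opp, off_eff, def_eff in games:
        # Scoring against a stingy defense (avg_def[opp] < league) counts for more.
        adj_off[team].append(off_eff * (league / avg_def[opp]) ** sos_weight)
        # Allowing points to a strong offense (avg_off[opp] > league) is penalized less.
        adj_def[team].append(def_eff * (league / avg_off[opp]) ** sos_weight)
    return {t: (mean(adj_off[t]), mean(adj_def[t])) for t in raw_off}


# Hypothetical round robin, efficiencies in points per 100 possessions.
games = [
    ("Strong", "Mid", 110.0, 100.0), ("Mid", "Strong", 100.0, 110.0),
    ("Mid", "Weak", 110.0, 100.0), ("Weak", "Mid", 100.0, 110.0),
    ("Strong", "Weak", 120.0, 90.0), ("Weak", "Strong", 90.0, 120.0),
]
raw = adjusted_ratings(games, sos_weight=0.0)
full = adjusted_ratings(games, sos_weight=1.0)
net = {team: off - dfn for team, (off, dfn) in full.items()}
print(raw["Strong"], full["Strong"], net)
```

Running the same games with several values of `sos_weight` and scoring each against past tournaments is one way to explore how much schedule strength should count.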

Since my scores in Stage 1 of the competition were not very good, I kept changing my model after Stage 1 closed.

My goal for next year is to formally test those different models to find out if there is any validity to my ideas.

Were you surprised by any of your findings?

After building the submission files, I put them into brackets using a script provided by one of the Kaggle members. My first model had a more conservative look to it, and my second model (the final winner) just didn’t look right to me. Teams like SF Austin, Indiana and Gonzaga were predicted to go very far in the bracket. I almost scrapped it, but since it was my second model I decided to go with it. This model got most of the first-round upsets right, which surprised me.


Which tools did you use?

I used R, RStudio and SQL.

How did you spend your time on this competition?

I would say my time allocation was 35% reading forums and blogs, 15% manipulating data, 25% building models and 25% evaluating results.

What was the run time for both training and prediction of your winning solution?

Five minutes. I trained my model using only 2016 data, so the amount of data to process is very small.

Bio

Miguel Alomar has a Master’s Degree in Computer Science from UIB in Mallorca, Spain. For nearly 20 years, he has been involved in software development, business intelligence and data warehousing.

