Kaggle's inaugural code competition, the Two Sigma Financial Modeling Challenge, ran from December 2016 to March 2017. Over 2,000 players competed to search for signal in unpredictable financial markets data. Because this was the very first code competition, competitors experimented with the data, trained models, and made submissions directly via Kernels, Kaggle's in-browser code execution platform.
In this winners' interview, team Bestfitting describes how they managed to remain a top-5 team even after a wicked leaderboard shake-up by focusing on building stable models and working effectively as a team. Read on to learn how they accounted for volatile periods of the market and experimented with reinforcement learning approaches.
The basics
What was your background prior to entering this challenge?
Bestfitting: I’ve worked as a software developer for more than 15 years and as a machine learning & deep learning researcher for 5 years.
Zero: I worked as a bank data analyst for more than 4 years. I am enthusiastic about machine learning.
CircleCircle: I am a data analyst working on risk-control related solutions for banks.
Do you have any prior experience or domain knowledge that helped you succeed in this competition?
We learned many skills from previous Kaggle competitions, such as feature analysis, feature selection, building validation sets, and controlling over-fitting.
From our experience in bank data analysis, we prefer models that stay stable in all kinds of market situations. We don’t chase high public scores; a profitable yet stable model is the best.
How did you get started competing on Kaggle?
Bestfitting: I need all kinds of datasets and challenges to validate the algorithms I’ve learned.
Zero: When I took CS229 by Andrew Ng, he advised us to enter a competition.
CircleCircle: Kaggle is a great platform. I learned a lot from the forums, and then I decided to enter a competition and give it a try.
What made you decide to enter this competition?
We entered this competition for two reasons:
First, Two Sigma is a very successful and creative company, and we guessed that a competition they hosted would be very interesting.
Second, predicting financial markets is very hard, and we wanted to see how well we could do using machine learning.
Let’s get technical
Summary
We are very happy with the result: we are the only team that stayed in the top 5 on both the public and private leaderboards. We feel very lucky, but a competition cannot be won by luck alone; we did a lot of work to ensure our models were profitable yet stable.
We tried to build effective methods to evaluate our models and control risk. Although we don’t have much financial background, we wanted to learn some ideas from quantitative investment.
Features
We used four kinds of features:
- Basic features. The original features from the dataset that Two Sigma provided.
- Calculated features and lag features. Obtained by applying simple functions to the basic features, such as abs, log, and standard deviation. We also used the feature values from the last few timestamps, which we call lag-N features.
- Predicted features. Predictions from a first-level weak model, used as inputs to the second-level model.
- Whole-market features. We tried to build features that capture information about the whole market: increasing or decreasing, calm or volatile. They were also used in our self-adaptive strategy.
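To make the lag and whole-market ideas concrete, here is a minimal sketch, not the team's actual code. It assumes rows carry an asset id, a timestamp, and one feature value; the function name `add_lag_and_market_features` and the `default` fill value are hypothetical. The lag-1 feature is the same asset's value at its previous observed timestamp, and the whole-market feature is the mean of that value over all assets at the same timestamp.

```python
import numpy as np

def add_lag_and_market_features(ids, timestamps, values, default=0.0):
    """Return (lag1, market_mean) arrays aligned with the input rows.

    lag1[i]        : the same asset's value at its previous observed timestamp
                     (or `default` if the asset has not been seen before)
    market_mean[i] : mean of `values` over all assets at row i's timestamp
    """
    ids = np.asarray(ids)
    timestamps = np.asarray(timestamps)
    values = np.asarray(values, dtype=float)

    lag1 = np.full(len(values), default, dtype=float)
    market_mean = np.empty(len(values), dtype=float)

    last_seen = {}  # asset id -> most recent value seen so far
    for t in np.unique(timestamps):  # np.unique returns timestamps in sorted order
        rows = np.flatnonzero(timestamps == t)
        market_mean[rows] = values[rows].mean()
        # Read the lags first, then update, so a row never sees its own value.
        for i in rows:
            lag1[i] = last_seen.get(ids[i], default)
        for i in rows:
            last_seen[ids[i]] = values[i]
    return lag1, market_mean
```

Processing one timestamp at a time mirrors how a code competition feeds data to the model, so the same bookkeeping can be reused at prediction time.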
Validation
We want to introduce some validation methods that we used throughout the competition.
The first is Cumulative-R: we plotted the cumulative sum of R to see how performance evolved over time.
We also defined another simple reward value, which we called y-sign-R. For each sample, if the predicted y has the same sign as the real y, the reward is 1; otherwise it is -1. We summed the rewards and plotted the cumulative-sum curve, shown on the right side of the chart. If the cumulative sum of y-sign-R is below zero, we do not consider the model good, because it is no better than a random guess. We can see that neither the ET model nor the LR model performed well by this measure, especially the ET model.
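The y-sign-R reward described above can be sketched in a few lines. This is an illustration under assumptions: the function name `y_sign_reward_curve` is hypothetical, and a prediction of exactly zero paired with a true value of exactly zero is counted as a matching sign, a case the interview does not address.

```python
import numpy as np

def y_sign_reward_curve(y_true, y_pred):
    """Cumulative y-sign-R: +1 when predicted and real y share a sign, else -1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    same_sign = np.sign(y_true) == np.sign(y_pred)
    rewards = np.where(same_sign, 1, -1)
    # The running total is the curve to plot; a curve that stays below zero
    # means the model picks the sign no better than a random guess.
    return np.cumsum(rewards)
```

Samples should be ordered by timestamp before calling it, so that the curve reads as performance over time.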
Models
We developed our models independently before we teamed up. Bestfitting’s and Zero’s models could each reach the top 10 on the private LB individually. We did not use CircleCircle’s model in the final ensemble because of the competition’s run-time limit, but we used some features from her model.
Bestfitting’s model
We realized that our models could not identify the market environment, even though asset prices move along with the whole market. So we plotted the y-mean of each timestamp and found that there were two volatile periods. We needed to add this information to the models.
So we added the per-timestamp means of t_20 and t_30 and used them in the ET model. The public score improved a lot, and the private score also improved substantially.
Bestfitting’s post-processing
Bestfitting plotted the cumulative sum of the real y and of the y predicted by a ridge model, by the ET model, and by an ensemble model. He found that the ET model performed better in volatile periods, especially while the market was increasing, and that the ridge model performed better in relatively calm periods.
He also found that the ridge model predicted the sign of y much better, but the values it predicted were small. In volatile periods, the R reward is therefore small even when the sign is correct.
So he tried to make his model more adaptive, so that it could select the right model for each period.
By this stage, we had teamed up. We tried many methods, including reinforcement learning, but we couldn’t find a very stable reward and time was limited, so we chose a rule-based approach.
Our model had to know whether the market was increasing or not, and calm or volatile, so we defined some measurements. For example, we counted the signs of y_mean over the last 5 timestamps and used them as an indicator to ensemble the Ridge and ET models dynamically. After these efforts, his model rose to 7th on the public leaderboard. The private score dropped a little, but he considered that healthy.
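A rule of this kind might look like the sketch below. It is only an illustration: the class name `SignCountBlender`, the threshold of 4 positive signs out of 5, and the 0.7 / 0.3 weights are all hypothetical values chosen for the example, not the team's settings. The direction of the rule follows the observation above that the ET model did better when the market was increasing and the ridge model did better in calm periods.

```python
from collections import deque

import numpy as np

class SignCountBlender:
    """Blend Ridge and ET predictions using the signs of recent y_mean values."""

    def __init__(self, window=5, trend_threshold=4,
                 et_weight_trending=0.7, et_weight_calm=0.3):
        # All defaults are hypothetical illustration values.
        self.trend_threshold = trend_threshold
        self.et_weight_trending = et_weight_trending
        self.et_weight_calm = et_weight_calm
        self.recent_signs = deque(maxlen=window)  # keeps only the last `window` signs

    def update(self, y_mean):
        """Record the sign of the market-wide y_mean for the latest timestamp."""
        self.recent_signs.append(np.sign(y_mean))

    def et_weight(self):
        """Lean on ET only when the window is full and mostly positive."""
        if len(self.recent_signs) < self.recent_signs.maxlen:
            return self.et_weight_calm
        n_positive = sum(1 for s in self.recent_signs if s > 0)
        if n_positive >= self.trend_threshold:
            return self.et_weight_trending
        return self.et_weight_calm

    def blend(self, ridge_pred, et_pred):
        """Weighted average of the two models' predictions."""
        w = self.et_weight()
        return w * np.asarray(et_pred) + (1.0 - w) * np.asarray(ridge_pred)
```

Keeping the rule to a single counter makes it cheap to evaluate at every timestamp, which matters under a run-time limit.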
Journey of Bestfitting’s improvement
Let us look at Bestfitting’s journey of improvement in one chart. We think the performance improved in a stable way: whenever the competition had ended, the model would not have been over-fitted.
Zero’s model
Before we teamed up, Zero’s model had a decent public score, but its private score was not so good. After we teamed up, Zero added whole-market features to his models; his public scores improved a little and his private scores improved hugely.
Zero’s post-processing
After we teamed up, the most important thing we did was to make our models adaptive. Zero used a different strategy: he used the standard deviation to let the model know whether the market was in a volatile period. He computed the standard deviation of y_mean over the last 5 timestamps and compared it with the standard deviation of y_mean over the training set. He then ensembled the models dynamically.
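The comparison of recent and training-set standard deviations can be sketched as follows. This is a hedged illustration: the function names, the `ratio` multiplier, and the 0.7 weight are hypothetical, and the interview does not say how the comparison was turned into ensemble weights or how the recent y_mean values were obtained at prediction time.

```python
import numpy as np

def is_volatile(recent_y_means, train_y_mean_std, ratio=1.0):
    """Flag a volatile period when the recent std exceeds the training-set std.

    `ratio` is a hypothetical tuning knob; 1.0 is a plain comparison.
    """
    recent = np.asarray(recent_y_means, dtype=float)
    return bool(recent.std() > ratio * train_y_mean_std)

def blend_by_volatility(pred_calm, pred_volatile, recent_y_means,
                        train_y_mean_std, volatile_weight=0.7):
    """Weight the volatile-period model more heavily when the market is volatile."""
    if is_volatile(recent_y_means, train_y_mean_std):
        w = volatile_weight
    else:
        w = 1.0 - volatile_weight
    return w * np.asarray(pred_volatile) + (1.0 - w) * np.asarray(pred_calm)
```

Here `recent_y_means` would hold the y_mean of the last 5 timestamps, and `train_y_mean_std` would be computed once from the training set.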
Journey of Zero’s improvement
Now, let us look at the journey of Zero’s improvement. As we can see, his model’s private score improved hugely after we added whole-market features.
Ensemble
Now let us move on to the ensemble of the two models.
We used a weighted average of the two models. We found that Zero’s model performed worse when the market was calm, so we gave it a smaller weight.
Teamwork
How did your team form?
We built independent models before we teamed up, and we all had decent public scores, but other competitors overtook us as the competition went on. So we all realized that we needed to team up to get stable models and a good position.
How did your team work together?
After teaming up, Bestfitting was in charge of the structure of the stable model and of applying reinforcement learning, or ideas similar to it, so that the model was self-adaptive and stable.
Zero integrated the source code of the models and improved its performance so that it could finish within 1 hour. CircleCircle tried to build a reinforcement learning model and built better validation sets. She also evaluated the model under different market environments.
Words of wisdom
What have you taken away from this competition?
We would rather trust several local validation methods than public scores: the R value accumulated over timestamps, the comparison between the cumulative predicted y and the cumulative true y, and the accuracy of the predicted sign of y. In this way we could make sure the model was stable enough to earn a steady return while keeping risk low.
We used whole-market features to give our models information about the market, which made them more stable.
We also tried some strategies to make our models more adaptive to different periods.
Do you have any advice for those just getting started in data science?
Get knowledge from good courses such as Stanford CS229 and CS231n.
Get information from competitions on Kaggle, kernels, and starter scripts.
Enter Kaggle competitions and get feedback from them.
Thanks
This is the story of our hard but happy journey. We must give great thanks to Kaggle and Two Sigma. We think the code competition format is fairer, and it made us pay more attention to speed, which led to more useful models.