Can you put order to space and time? This was the challenge posed to competitors in the Draper Satellite Image Chronology Competition (Chronos), which ran on Kaggle from April to June 2016. Over four hundred Kagglers chose a path somewhere between man and machine to accurately determine the chronological order of satellite images taken over five-day spans.
In collaboration with Kaggle, Draper designed the competition to stimulate the development of novel approaches to analyzing satellite imagery and other image-based datasets. In this spirit, competitors’ solutions were permitted to rely on hand annotations as long as their methodologies were replicable. Indeed, the top three winners used non-ML approaches.
In this interview, Vicens Gaitan, a Competitions Master, describes how re-assembling the arrow of time was an irresistible challenge given his background in high-energy physics. Having chosen the machine-learning-only path, Vicens spent about half of his time on image processing, specifically registration, and his remaining efforts went into building the XGBoost model that landed him well within the top 10%.
The basics
What was your background prior to entering this challenge?
I’d like to define myself as a high-energy physicist, but this was in the 90’s ... ooh, I’m talking about the past century! At that time we used neural networks to classify events coming from the detectors, and NN were cool. Currently I’m managing a team of data scientists in a software engineering firm, and working actively on machine learning projects. Now NN are again cool ☺
Do you have any prior experience or domain knowledge that helped you succeed in this competition?
In fact, no. It was a very good opportunity to learn about image processing.
How did you get started competing on Kaggle?
I discovered Kaggle through the Higgs Boson challenge. I immediately realized that competing on Kaggle is the most efficient way to follow the state of the art in machine learning and to compare our own methodologies against those of the best practitioners in the world.
What made you decide to enter this competition?
The “time arrow” is still an open problem in science. It was interesting to see if a machine learning approach could say something interesting about it. Moreover, I have always been fascinated by satellite imagery. I could not resist the temptation to enter the challenge!
Let's get technical
What preprocessing and supervised learning methods did you use?
From the start, I realized that the first thing to be done was to “register” the images: scale and rotate them so that they match, the same way we do when building photo panoramas.
Because there was no image registration package available for R, I decided to build the process from scratch. And this was a lucky decision, because along the way I generated some features that turned out to be very useful for the learning task.
The registration process is described in detail in a notebook I published on Kaggle.
The main idea of the registration process is to find “interesting points” that can be identified across images. These will be the so-called keypoints. A simple way to define keypoints is to look for “corners”, because a corner will (approximately) remain a corner if you rotate and scale the images.
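To make the idea concrete, here is a minimal sketch of this kind of corner detection in R, using a Harris cornerness response built with the “imager” package. The blur scale, the 0.04 constant, the number of keypoints and the file name are illustrative assumptions, not the values used in the competition:

```r
# Sketch of corner-style keypoint detection via a Harris response.
library(imager)

harris_keypoints <- function(im, sigma = 2, n = 100) {
  gr  <- imgradient(im, "xy")          # x and y image gradients
  Ixx <- isoblur(gr$x^2, sigma)        # smoothed structure-tensor entries
  Iyy <- isoblur(gr$y^2, sigma)
  Ixy <- isoblur(gr$x * gr$y, sigma)
  cornerness <- Ixx * Iyy - Ixy^2 - 0.04 * (Ixx + Iyy)^2
  df <- as.data.frame(cornerness)      # columns x, y, value
  # keep the n strongest responses (a real detector would also apply
  # non-maximum suppression)
  df[order(-df$value), ][seq_len(n), c("x", "y")]
}

im <- grayscale(load.image("example_image.jpeg"))  # hypothetical file
kp <- harris_keypoints(im)
```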
Once you have the keypoints, it is necessary to describe them economically, with a few numbers, so that matches can be found efficiently. 30x30 rotated patches around the keypoints, subsampled to 9x9, seem to work well for the granularity of these images.
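As a rough illustration, this is how such a descriptor could be built with “imager”: cut a 30x30 patch around a keypoint and downsample it to 9x9. Rotating the patch to its dominant orientation, which the full method requires, is omitted here for brevity:

```r
# Sketch of a patch descriptor: 30x30 crop around (x0, y0), downsampled
# to 9x9 and flattened to a vector of 81 numbers.
library(imager)

patch_descriptor <- function(im, x0, y0, half = 15) {
  patch <- imsub(im, x > x0 - half & x <= x0 + half,
                     y > y0 - half & y <= y0 + half)
  small <- resize(patch, size_x = 9, size_y = 9,
                  interpolation_type = 3)   # linear interpolation
  as.vector(small)
}
```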
To find the matches, k-nearest neighbors on the descriptors comes into play, plus the magic of RANSAC to select the true matches from the noise.
Once we have the matching points, it is easy to fit a homographic transformation between the images, and then compare them and look for temporal differences.
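A toy version of this matching step might look as follows: nearest-neighbor matching of descriptors with “FNN”, then a RANSAC loop that repeatedly fits a homography by the direct linear transform (DLT) and keeps the sample with the most inliers. The iteration count and pixel tolerance are illustrative, and a robust implementation would also normalize the coordinates before the SVD:

```r
# Sketch of descriptor matching plus RANSAC homography fitting.
library(FNN)

# desc1, desc2: one descriptor per row; kp1, kp2: keypoint (x, y) matrices
match_keypoints <- function(desc1, desc2, kp1, kp2) {
  nn <- get.knnx(desc2, desc1, k = 1)      # nearest descriptor in image 2
  cbind(kp1, kp2[nn$nn.index[, 1], ])      # candidate matches (x1,y1,x2,y2)
}

fit_homography <- function(m) {            # direct linear transform
  A <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) {
    x <- m[i, 1]; y <- m[i, 2]; u <- m[i, 3]; v <- m[i, 4]
    rbind(c(-x, -y, -1,  0,  0,  0, u * x, u * y, u),
          c( 0,  0,  0, -x, -y, -1, v * x, v * y, v))
  }))
  h <- svd(A, nv = ncol(A))$v[, 9]         # null vector of A
  matrix(h, 3, 3, byrow = TRUE)
}

ransac_homography <- function(m, n_iter = 1000, tol = 3) {
  best <- rep(FALSE, nrow(m))
  for (i in seq_len(n_iter)) {
    H <- fit_homography(m[sample(nrow(m), 4), ])   # minimal 4-point sample
    p <- H %*% rbind(t(m[, 1:2]), 1)               # project image-1 points
    err <- sqrt((p[1, ] / p[3, ] - m[, 3])^2 +
                (p[2, ] / p[3, ] - m[, 4])^2)      # reprojection error
    inl <- err < tol
    if (sum(inl) > sum(best)) best <- inl
  }
  list(H = fit_homography(m[best, ]), inliers = best)  # refit on inliers
}
```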
Are you able to do that by hand? I’m not, so I took the machine learning path:
The number of available classified images is low, so it seemed more promising to build a model at the keypoint level (we have one hundred per image) using the descriptor information, and then average the predictions for each image. It also seemed a good idea to predict temporal differences between keypoints instead of directly predicting the position in the ordered sequence.
Let’s describe the procedure:
Feature engineering (this was the easiest part, because it was already done during the registration process):
a) Keypoint detection: for each image, detect keypoints and generate descriptors: 30x30 oriented patches (downsampled to 9x9) around each keypoint.
b) Intra-set registration: for every pair of images within each set, identify common points using RANSAC and fit a homographic transformation. Keep the values of the transformation and the patches for the identified inliers. The parameters of the homographic transformation turn out to be informative features when comparing pairs of images.
c) Extra-set registration: interestingly, the same idea can be used between images coming from different sets. Sampling from the full database of descriptors, use kNN to find candidate neighbor sets, and then register every image of one set to every image of the other. Look for “almost perfect” matches, using a combination of the number of identified common keypoints and the RMSE in the overlapping region as a matching value. This procedure automatically detects “neighbor sets”.
d) Set clustering: using the previous matching information, build a graph with sets as nodes and edges between neighbor sets, weighted proportionally to the matching value (see the sketch below). It is relatively easy to select pairs of images in neighbor sets with a “perfect match” (a high number of inliers and a very low RMSE in the intersecting area). The connected components of this graph give the set clusters without needing to explicitly georeference the images by hand, which saves a lot of manual work.
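The clustering step itself is a few lines with “igraph”; here “matches” is a hypothetical data frame with one row per detected neighbor-set pair (columns set_a, set_b, match_value) produced by step c):

```r
# Sketch of set clustering via connected components of the neighbor graph.
library(igraph)

g <- graph_from_data_frame(
  data.frame(from = matches$set_a, to = matches$set_b,
             weight = matches$match_value),  # weight = matching value
  directed = FALSE
)
membership   <- components(g)$membership     # set id -> cluster id
set_clusters <- split(names(membership), membership)
```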
PATCH-LEVEL MODEL (Siamese gradient boosting)
The model is trained over pairs of common keypoints in images of the same set, trying to predict the number of days between one image and the other (with sign). The features used are:
Patch of image 1 (9x9), patch of image 2 (9x9), the coefficients of the homographic transformation (h1..h9), the number of inliers between both images, and the RMSE in the overlapping region.
The model is trained with XGBoost with a reg:linear objective using 4-fold cross-validation (ensuring that images in the same cluster belong to the same fold). Surprisingly, this model is able to discriminate the time arrow quite well.
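A minimal sketch of this training loop, with a hypothetical feature matrix X (patch 1, patch 2, h1..h9, inlier count and RMSE per row), label vector y (the signed day difference) and a per-row cluster id; the hyperparameters shown are placeholders, not the competition settings:

```r
# Sketch of the patch-level model: 4-fold CV with cluster-grouped folds.
library(xgboost)

folds   <- split(seq_along(y), cluster %% 4)  # simple grouped fold assignment
cv_pred <- numeric(length(y))

for (fold in folds) {
  dtrain <- xgb.DMatrix(X[-fold, , drop = FALSE], label = y[-fold])
  model  <- xgb.train(params = list(objective = "reg:linear",  # squared loss
                                    eta = 0.1, max_depth = 6),
                      data = dtrain, nrounds = 500)
  cv_pred[fold] <- predict(model, xgb.DMatrix(X[fold, , drop = FALSE]))
}
```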
IMAGE-LEVEL MODEL
Average the contributions of all patches in an image, discarding patches whose absolute predicted deltaT falls below a threshold (alpha). If the patch-level model were “perfect”, this would yield the average difference in days from one image to the rest of the images in the set, so the expected values for an ordered set would be -2.5, -1.25, 0, 1.25 and 2.5. In practice, the ordering is obtained by sorting this average contribution.
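In code, the aggregation could be as simple as the following, where “preds” is a hypothetical data frame of patch-level predictions (columns image and pred_dt) for one set, and alpha is a placeholder threshold:

```r
# Sketch of image-level aggregation: threshold, average per image, sort.
alpha  <- 0.5
kept   <- preds[abs(preds$pred_dt) >= alpha, ]       # drop weak patches
avg_dt <- tapply(kept$pred_dt, kept$image, mean)     # mean deltaT per image
order_in_set <- rank(avg_dt)                         # 1 = earliest image
```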
Additionally, the average is refined by adding the averages of images from overlapping sets (using the graph defined above), taking into account the number of inliers between both images and the RMSE in their intersection, weighted by a parameter beta, and iterating until convergence.
The optimal alpha and beta were adjusted using public leaderboard feedback (the training set is not representative enough of the test set), and, as expected, there was some overfitting and a drop of five positions on the private leaderboard.
The local CV score obtained is 0.83 ± 0.05.
The model scored 0.76786 on the public leaderboard and 0.69209 on the private one. I’m convinced that with more training data this methodology could reach a score of 0.8x or even 0.9x.
What was your most important insight into the data?
Probably the fact that by comparing local keypoint descriptors it is possible to obtain some information about the deltaT between them: not only the number of days, but also the sign. Of course, this is a very weak model, but ensembling over all the keypoints in an image, and taking into account the deltaT obtained from neighbor sets (selecting the images with a “perfect match”), makes it strong enough to be useful.
Were you surprised by any of your findings?
Initially, I only used keypoints matched by RANSAC between pairs of images, which implies that the descriptors are very similar. When I tried to add all the keypoints, the result didn’t improve. This tells us that the temporal information is coded in subtle variations of pixel intensities, not in big changes of macroscopic objects, like the presence or absence of a car, for example.
In fact, the biggest surprise is that these descriptors are able to encode some information about the time arrow at all.
Which tools did you use?
I tried to develop the full pipeline in R. I mainly used the “imager” package for image processing and “xgboost” for the model. The “FNN” library for fast k-nearest neighbors was also very helpful because of its speed.
How did you spend your time on this competition? (For example: What proportion on feature engineering vs. machine learning?)
Half of the competition was devoted to developing the registration process. After that, building the model was relatively fast.
What was the run time for both training and prediction of your winning solution?
The keypoint extraction and descriptor calculation take about 2 hours on a 12-core machine for the full dataset. Registration (especially the extra-set one) is more CPU-intensive and takes more than 8 hours for the full dataset.
Then, model training can be done in a few hours (I do 4-fold cross-validation to select the hyperparameters).
Words of wisdom
The future of satellite imagery and technology will allow us to revisit locations with unprecedented frequency. Now that you've participated in this competition, what broader applications do you think there could be for images such as these?
Given the surprising fact that it is possible to correlate the time arrow with image information, why not try to predict changes in other indicators (ones that change over a time interval of several days) that may be weakly correlated with the images:
Predicting the risk of infrastructure failure, demographics, social behavior, level of wealth, happiness ... I’m just dreaming.
What have you taken away from this competition?
Knowledge in a previously unknown area. The fact that XGBoost can be useful for image challenges (in case you doubted it). And, most important, a lot of fun.
Do you have any advice for those just getting started in data science?
The only way to learn is to try things yourself. Don’t be afraid of problems where you have no background. There are a lot of very interesting resources out there (forums, code ... ), but the only way is hands-on.
Just for fun
If you could run a Kaggle competition, what problem would you want to pose to other Kagglers?
I think the most interesting open problem is, given a dataset and a learner, to predict the optimal set of hyperparameters for that dataset-model combination. I know, the problem is how to build a labeled set for this supervised problem... Maybe reinforcement learning could be a way to do it, but wait a moment... what would be the optimal hyperparameters for the policy learner?... It seems I’m entering a kind of Gödelian loop...
Better forget it. Why not try to predict TV audiences based on timestamp and EPG information?
What is your dream job?
I already have it.
Bio
Vicens Gaitan is R&D director in the Grupo AIA innovation area. He studied physics and received a PhD in machine learning applied to experimental high-energy physics in 1993 with the ALEPH collaboration at CERN. Since 1992 he has worked at AIA on complex problem solving and algorithm development applied to model estimation, simulation, forecasting and optimization, mainly in the energy, banking, telecommunications and retail sectors for big companies. He has in-depth knowledge of advanced methodologies including machine learning, numerical computation, game theory and graph analysis, working in lab environments like Python or R, or writing production code in C and Java.