Three years later and the contest is still ongoing. As I've been doing a lot of reading in statistics recently, and the best way to learn is by doing, I decided to enter. Solutions to the problem generally blend several quite different methods.
After two weeks I am at position 678 of about 5000 teams, with an RMSE of 0.9076. My solution is based on SVD à la Simon Funk. I added two twists to SVD to improve upon its initial RMSE of 0.91453: the first adds date information, and the second uses ranks rather than raw scores.
In the simplest form of Simon Funk's SVD, each user and each movie has some number of features, k. The predicted score p for user u and movie m is
```
p[u][m] = sum over k of movie_feature[k][m] * user_feature[k][u]
```
The movie and user features are first initialised to
```
sqrt( global average / number of features )
```
This gives an RMSE of about 1.04. The features are then trained using gradient descent over the list of known ratings r, one feature at a time:
```
error = r[u][m] - p[u][m]
movie_feature[k][m] = movie_feature[k][m] + learning_rate * error * user_feature[k][u]
user_feature[k][u]  = user_feature[k][u]  + learning_rate * error * movie_feature[k][m]
```
This training step is done approximately 100 times over the entire set of known ratings, and is repeated for each of the features. The first feature tends to be high for blockbuster-style movies, while later features tend to pick out other aspects such as romance, black-and-white classics, or comedies. (This is a very simplified description.)
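To make the above concrete, here is a minimal, runnable sketch of that one-feature-at-a-time training loop. All of the concrete values (the toy random ratings, the feature count, the learning rate and the regularisation constant) are placeholders for illustration rather than the settings used for my submission, and the regularisation term anticipates the one mentioned further down.

```python
import numpy as np

# Toy dimensions and training constants -- illustrative values only.
NUM_USERS, NUM_MOVIES = 100, 50
NUM_FEATURES = 10        # my real runs used about 200 movie/user features
LEARNING_RATE = 0.001
REGULARISATION = 0.015   # Funk-style regularisation, discussed below
EPOCHS_PER_FEATURE = 20  # the description above uses roughly 100 passes

# Fake (user, movie, rating) triples standing in for the Netflix training data.
rng = np.random.default_rng(0)
ratings = [(int(rng.integers(NUM_USERS)), int(rng.integers(NUM_MOVIES)),
            float(rng.integers(1, 6))) for _ in range(2000)]

global_average = sum(r for _, _, r in ratings) / len(ratings)

# Initialise every feature to sqrt(global average / number of features),
# so the initial prediction is roughly the global average.
init = np.sqrt(global_average / NUM_FEATURES)
user_feature = np.full((NUM_FEATURES, NUM_USERS), init)
movie_feature = np.full((NUM_FEATURES, NUM_MOVIES), init)

def predict(u, m):
    return float(np.dot(user_feature[:, u], movie_feature[:, m]))

# Train one feature at a time, sweeping the whole rating list each epoch.
for k in range(NUM_FEATURES):
    for _ in range(EPOCHS_PER_FEATURE):
        for u, m, r in ratings:
            error = r - predict(u, m)
            uf, mf = user_feature[k, u], movie_feature[k, m]  # pre-update values
            user_feature[k, u] += LEARNING_RATE * (error * mf - REGULARISATION * uf)
            movie_feature[k, m] += LEARNING_RATE * (error * uf - REGULARISATION * mf)

rmse = np.sqrt(np.mean([(r - predict(u, m)) ** 2 for u, m, r in ratings]))
print(f"training RMSE on the toy data: {rmse:.4f}")
```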
My quick and easy way of adding date information is to add two further sets of features: a pairing of movie and date of rating, and a pairing of user and date of rating. The hope is that the first captures seasonal effects such as Christmas and Halloween, while the second captures a user's changing tastes over time. Date information has been added to SVD before, but not, I believe, in as simple a manner. The dates of the ratings are binned, so that ratings made in the first month of the dataset are in bin one, ratings made in the second month are in bin two, and so on. With d the date bin of the rating, the prediction now has two extra terms:
```
p[u][m] = sum over k of movie_feature[k][m]     * user_feature[k][u]
        + sum over k of moviedate_feature[k][m] * datemovie_feature[k][d]
        + sum over k of userdate_feature[k][u]  * dateuser_feature[k][d]
```
(Though I found that I needed only about 10 features for the date aspects, rather than about 200 for the movie/user features.)
The learning rule for the new features is very similar, e.g. for the moviedate feature:
```
error = r[u][m] - p[u][m]
moviedate_feature[k][m] = moviedate_feature[k][m] + learning_rate * error * datemovie_feature[k][d]
datemovie_feature[k][d] = datemovie_feature[k][d] + learning_rate * error * moviedate_feature[k][m]
```
Finally, to prevent overtraining I used a regularisation term, as described in Simon Funk's blog.
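Continuing the earlier sketch, here is roughly what the date-binned extension looks like in code. The array names, the number of date bins, and the small initial value for the date features are my own guesses for illustration; only the overall shape of the prediction and of the update follows the description above.

```python
import numpy as np

NUM_USERS, NUM_MOVIES = 100, 50
NUM_DATE_BINS = 24            # one bin per month of the dataset (assumed span)
NUM_FEATURES = 10             # movie/user features, as before
NUM_DATE_FEATURES = 5         # only ~10 were needed for the date aspects
LEARNING_RATE = 0.001

global_average = 3.6          # placeholder; compute it from the data in practice
init = np.sqrt(global_average / NUM_FEATURES)
user_feature = np.full((NUM_FEATURES, NUM_USERS), init)
movie_feature = np.full((NUM_FEATURES, NUM_MOVIES), init)

# Start the date features small so they begin as a minor correction
# (my choice, not something specified above).
date_init = 0.01
moviedate_feature = np.full((NUM_DATE_FEATURES, NUM_MOVIES), date_init)
datemovie_feature = np.full((NUM_DATE_FEATURES, NUM_DATE_BINS), date_init)
userdate_feature = np.full((NUM_DATE_FEATURES, NUM_USERS), date_init)
dateuser_feature = np.full((NUM_DATE_FEATURES, NUM_DATE_BINS), date_init)

def predict(u, m, d):
    """Prediction with the two extra date terms."""
    return (np.dot(user_feature[:, u], movie_feature[:, m])
            + np.dot(moviedate_feature[:, m], datemovie_feature[:, d])
            + np.dot(userdate_feature[:, u], dateuser_feature[:, d]))

def train_moviedate_feature(k, ratings):
    """One pass of the movie/date update rule; the user/date rule is analogous."""
    for u, m, d, r in ratings:   # each rating now carries its date bin d
        error = r - predict(u, m, d)
        mdf, dmf = moviedate_feature[k, m], datemovie_feature[k, d]
        moviedate_feature[k, m] += LEARNING_RATE * error * dmf
        datemovie_feature[k, d] += LEARNING_RATE * error * mdf

# Tiny usage example with fake data.
rng = np.random.default_rng(0)
toy = [(int(rng.integers(NUM_USERS)), int(rng.integers(NUM_MOVIES)),
        int(rng.integers(NUM_DATE_BINS)), float(rng.integers(1, 6)))
       for _ in range(500)]
train_moviedate_feature(0, toy)
print(predict(0, 0, 0))
```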
My second idea was an attempt to tackle the different ways people rate movies. Imagine two people with identical tastes in movies. One might give all movies a rating between 2 and 5; the other gives exactly one less to each movie. Their preferences are the same, but their calibration on the allowed range of 1 to 5 is different. My idea was that, rather than training on the raw score, the features are trained using ranks. This way the impact of different scoring patterns is reduced a little. If a user has rated n movies, and given c[i] of them a rating of i, then, for example, the normalised rank score of a movie given a real score of 1 is
```
5 * (c[1] + 1) / (2 * n)
```
and for a movie given a rating of i it is
```
5 * ( (c[i] + 1)/2 + sum over j < i of c[j] ) / n
```
The factor of five is merely present so that the ratings are roughly in the range 1 to 5. The term (c[i] + 1)/2 selects the middle rank if a number of movies have been given the same rating.

This method proved less useful than I had hoped. However, my submission to Netflix was actually a blend of the two methods (using the Gravity linear blending tool), so it still contributed. In fact, one of the main findings to come out of the Netflix Prize is that a blend of multiple, preferably quite different, techniques always beats any one technique in isolation.
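As a concrete sketch of the rank normalisation (the function name and the toy examples are mine, not taken from my contest code):

```python
from collections import Counter

def normalised_rank_scores(user_ratings):
    """Map one user's raw ratings (integers 1-5) to normalised rank scores."""
    n = len(user_ratings)
    c = Counter(user_ratings)            # c[i] = how many movies got rating i
    def rank_score(i):
        below = sum(c[j] for j in range(1, i))       # movies ranked strictly lower
        return 5.0 * ((c[i] + 1) / 2.0 + below) / n  # middle rank of the tied group
    return [rank_score(r) for r in user_ratings]

# Two users with the same preference order but different calibration:
print(normalised_rank_scores([2, 3, 3, 5, 4]))   # rates everything between 2 and 5
print(normalised_rank_scores([1, 2, 2, 4, 3]))   # gives exactly one less to each movie
```

Both users come out with identical rank scores, which is exactly the effect the normalisation is aiming for.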
Finally, here are some of my results on the Probe10 dataset from the Gravity team.
| Method | RMSE on Probe10 dataset |
|---|---|
| SVD | 0.91453 |
| SVD + date | 0.908360 |
| SVD + normalised ranks | 0.920397 |
| Blended result | 0.903485 |
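The blended row in the table came from the Gravity team's linear blending tool. The sketch below is not that tool, just the same idea expressed as an ordinary least-squares fit of blending weights on probe predictions, with toy data standing in for real predictor outputs.

```python
import numpy as np

def blend_weights(pred_a, pred_b, actual):
    """Least-squares intercept and weights for combining two predictors."""
    X = np.column_stack([np.ones_like(pred_a), pred_a, pred_b])
    w, *_ = np.linalg.lstsq(X, actual, rcond=None)
    return w

def blend(w, pred_a, pred_b):
    return w[0] + w[1] * pred_a + w[2] * pred_b

# Toy stand-ins for the probe predictions of the two methods.
rng = np.random.default_rng(0)
actual = rng.integers(1, 6, size=1000).astype(float)
pred_a = actual + rng.normal(0.0, 0.90, size=1000)   # plays the role of SVD + date
pred_b = actual + rng.normal(0.0, 0.95, size=1000)   # plays the role of SVD + ranks
w = blend_weights(pred_a, pred_b, actual)
blended = blend(w, pred_a, pred_b)
print("blended RMSE:", np.sqrt(np.mean((blended - actual) ** 2)))
```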
I believe from reading the Netflix forums that, with careful choice of learning rates and by training all features simultaneously rather than one at a time, plain SVD can achieve a Quiz RMSE of 0.898. So there is certainly scope for further improvement of my date-aware method, either by tuning the learning parameters or by moving from individual to simultaneous training of features.
It's a fascinating machine learning problem, with applications ranging from general shopping recommender systems to dating agencies to computer bug analysis (which is why I was reading up on stats initially).