

Netflix contest: $1M to developer of best movie recommendation engine


mouser:
Alex (www.3form.org) has convinced me to join with him in working on this a little since both of us have backgrounds in machine learning.

I have my doubts about the possibility of winning this, but it seems an interesting challenge and worth a little bit of time playing with.  I'm happy to talk with anyone in the donationcoder IRC channel (#donationcoder on efnet, or hit the chat button above) who thinks they might also want to try entering.

urlwolf:
all,

Do you know of any webservice where users log which movies they have actually seen? Something like last.fm for music, but using movies instead...

Actually, the same thing for books read would be great too...
@mouser: I sent you an email; I can't reach the IRC channel for some reason...

CWuestefeld:
I made a stab at this late last year with a friend. The amount of data isn't a big deal for the storage capacity of a modern PC. More difficult is being able to process it in a reasonable time.

We started with a bit of success. Our very first attempt was enough to beat Netflix's baseline and yield a score that qualified for the leaderboard (at the time). Doing this was surprisingly easy. Nothing but a good working knowledge of statistics was needed to beat their baseline. Simply normalizing the scores of all the users according to their standard deviations was enough to do this.
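The normalization CWuestefeld describes can be sketched roughly like this. Nothing below is from their actual code; it's a minimal illustration, on a toy matrix, of predicting a rating by converting each user's ratings to z-scores, averaging per movie, and mapping back into each user's own scale:

```python
import numpy as np

# Toy ratings matrix (users x movies), 0 = unrated. Purely illustrative;
# the real Netflix set has ~480k users and ~17k movies.
ratings = np.array([
    [5, 3, 4, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
], dtype=float)

rated = np.where(ratings > 0, ratings, np.nan)

# Per-user mean and standard deviation over rated movies only.
mu = np.nanmean(rated, axis=1, keepdims=True)
sigma = np.nanstd(rated, axis=1, keepdims=True)
sigma[sigma == 0] = 1.0  # avoid division by zero for constant raters

# Normalize each user's ratings to z-scores, average per movie,
# then map back into each user's own scale to predict.
z = (rated - mu) / sigma
movie_z = np.nanmean(z, axis=0)   # average z-score per movie
pred = mu + sigma * movie_z       # predicted rating for every (user, movie)
```

The point of the per-user normalization is exactly what the post says: a user who hands out 4s and 5s and a user who hands out 2s and 3s may mean the same thing, and dividing by each user's own standard deviation puts them on a common scale before any cross-user averaging.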

Our project fizzled out after just a couple of weeks. The problem was that neither of us had an understanding of the collaborative filtering techniques that appear to be necessary to do really well. And experimentation was difficult: the size of the data set meant we could only really make one run a day (we were using MS SQL Server, if you're wondering).

Also, I was a little frustrated because I think their criterion measures the wrong thing. At first it was an intellectual challenge anyway, but that wore off. The challenge is to predict the score that a customer would assign to each of a set of movies. The thing is, this isn't really an interesting question to solve. I don't care very much whether Netflix thinks I'd give a movie a 3.1 or a 3.5. What's really interesting is selecting only the movies at the top end of the scale, so Netflix can give me a list of recommendations when I ask "what good movies have you got for me today?"

mouser:
great to hear you worked on it!!

my main source of skepticism is a suspicion/fear that there is some inherent noise in the data, and that winning the contest may in fact require being better than the noise allows without pure luck.  in other words, a person's rating for any given movie may vary a little depending on their mood at the time, or on some truly unpredictable factors.  for example, imagine I ask you to predict the votes of 100 people, and I happen to know that 95 of them always vote one way while the other 5 flip a coin to decide.  you can't hope for a perfect score, because of the element of true randomness (the coin flips).  at some point, trying to get better than a certain score on the netflix data is going to look like that -- whether that happens well before the million-dollar prize accuracy I don't know, but that's my fear.
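mouser's thought experiment about the noise floor can be checked with a quick simulation (this is just the 95-plus-5-coin-flips example from the post, not anything about the real Netflix data). Even a predictor that knows everything knowable gets the 95 fixed voters right and can do no better than chance on the 5 coin-flippers, so its expected accuracy caps out at (95 + 0.5 * 5) / 100 = 97.5%:

```python
import random

random.seed(0)

def run_election():
    # 95 voters always vote 1; 5 flip a fair coin.
    votes = [1] * 95 + [random.randint(0, 1) for _ in range(5)]
    # The best possible predictor: the known vote for the 95,
    # and an arbitrary guess (say, 1) for each coin-flipper.
    preds = [1] * 100
    return sum(p == v for p, v in zip(preds, votes))

trials = 20000
avg_correct = sum(run_election() for _ in range(trials)) / trials
print(avg_correct)  # ~97.5 correct out of 100, the irreducible ceiling
```

On the Netflix data the same logic would apply to RMSE rather than accuracy: whatever portion of a rating is genuine mood-of-the-moment randomness sets a floor below which no model can push the error.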

mouser:
the forum makes for great reading: http://www.netflixprize.com/community/
