Often without realizing it, more and more companies rely on Web Data (any data you can see in a web browser) as a critical foundation for making business decisions.
Ron’s post on Web Data reminded me of an interesting blog post, “More data usually beats better algorithms”, written by Anand Rajaraman, co-founder of Kosmix and a Consulting Assistant Professor of Data Mining at Stanford University.
The blog post describes how Anand’s students competed for the $1 Million Netflix Prize, a competition open to the public.
Netflix provides a huge data set of past customer movie ratings. The challenge is to use this data to build an algorithm that predicts which movies people will want to watch more accurately than the algorithm Netflix already has.
Anand’s students took on this challenge, and in his post he highlights two very different approaches. Team A focused on developing a sophisticated algorithm. Team B used a simple algorithm and focused on the data instead, pulling in additional movie data from IMDb (the Internet Movie Database).
Which team performed better?
Team B, who focused more on the data, got to the top of the Netflix Prize leaderboard.
Anand’s point? “…adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set. I’m often surprised that many people in business, and even in academia, don’t realize this.” Adding even one extra, independent data set can substantially improve the quality of your decision-making.
The key is not choosing between a better algorithm and better data, but improving the outcome of your decision-making by adding more data, namely Web Data. Think about the impact on your business if you could add high-value Web Data to your Market Intelligence, Pricing Intelligence, Financial Intelligence or any other Business Intelligence product.