Recommender Systems | ICM501 Quinnipiac Assignment #1

A Look at Ratings and Recommender Systems

Unlike the offline world, the Internet is chock-full of ratings and recommender systems. Why? Because metrics are everywhere. On YouTube, you know how many people watched a video, and how many made it to the end. On WordPress, you know where your readers came from. And on Facebook, you are constantly being served the likings, joinings, and friendings of your friends.

Plus you have likely provided any number of cues about your preferences, your age, your gender, your marital status and/or sexual preference, and your location, among dozens if not hundreds of other data points.

But Aren’t We Really Just Doing This to Ourselves?

Those cues include everything from indicating you are a Red Sox fan to watching a cooking demonstration video to its end, to listing your college graduation year on LinkedIn, to joining a group devoted to German Shepherd Dog rescue, to reviewing a book on GoodReads about bi-curiosity. With all of this data, attempts are made to get a clear(er) picture of a person. With the picture comes an effort at predictability.

Amusingly, as I’ve been writing this blog post, to prove the point, the WordPress Zemanta plugin is currently serving me pictures of German Shepherd Dogs and a number of articles about them, allegedly related to this post. But, surprise! This post isn’t really about German Shepherds at all, their images notwithstanding.

Close But No Cigar. Not Even a Kewpie Doll

Just as Zemanta screws up, so do plenty of other sites with recommender systems. Spotify, Pandora, and other music-matching sites seem to miss the mark fairly routinely. In September 2013, Forbes reporter Amadou Diallo wrote about a search for a perfect playlist.

In his article, Diallo compared iTunes Radio, Spotify, and Pandora, by using various seed artists to create playlists. He gave the matching algorithms Stevie Wonder, Herbie Hancock, and The Alabama Shakes.

Diallo concluded that Pandora had the best matching algorithm, but there were definite flaws with all three.

The Computers May Not Be Up to the Task

To my mind, a purely computer-driven search is a misplaced notion. One issue is categorization. Musical, film, book, and other recommendations are only as good as the categorization behind them, and often goods are poorly organized.

Consider Johnny Cash. A country artist? Sure. Male artist? Of course. He came from a particular time period and his work was generally guitar-heavy. And then, late in his career, he threw a curve and recorded a cover of Nine Inch Nails’ Hurt. If today’s music-matching services had been around when he released it, the song would have dented their algorithms, perhaps even fatally.

Granularity Fail

A further issue with recommender systems is that they seem to treat people’s preferences like computer problems. For example, if you like movies that involve the American South, history, and a strong male lead, a movie recommender system might serve you both Gone With The Wind and Midnight in the Garden of Good and Evil.

Yet one is a classic romance, whereas the other is based on a nonfiction work. Even if perfect granularity is achieved, and all of the seemingly relevant data points are hit, recommender systems still aren’t necessarily up to the task.
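To make the granularity problem concrete, here is a minimal sketch of a tag-matching recommender. Everything in it is my own assumption for illustration: the tag vocabulary, the third film in the catalog, and the function names are hypothetical, and no real service necessarily works this way.

```python
# Hypothetical catalog: the tags are illustrative, not taken from any real service.
CATALOG = {
    "Gone With The Wind": {
        "american south", "history", "strong male lead", "romance", "fiction",
    },
    "Midnight in the Garden of Good and Evil": {
        "american south", "history", "strong male lead", "true crime",
        "based on nonfiction",
    },
    "Sleepless in Seattle": {"romance", "fiction", "comedy"},
}


def tag_overlap_scores(liked_tags, catalog):
    """Score each title by the fraction of the viewer's liked tags it carries."""
    return {
        title: len(liked_tags & tags) / len(liked_tags)
        for title, tags in catalog.items()
    }


def recommend(liked_tags, catalog, threshold=1.0):
    """Return, in alphabetical order, every title scoring at or above threshold."""
    scores = tag_overlap_scores(liked_tags, catalog)
    return sorted(title for title, score in scores.items() if score >= threshold)


liked = {"american south", "history", "strong male lead"}
scores = tag_overlap_scores(liked, CATALOG)
picks = recommend(liked, CATALOG)
print(picks)
```

Both Southern films match every liked tag, so this kind of scoring cannot tell them apart. The tags that actually separate them (romance versus true crime) never enter the comparison, because the viewer never listed them.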

J. Ellenberg

As J. Ellenberg says in “This psychologist might outsmart the math brains competing for the Netflix Prize” (Wired, 2008, February 25): “Of course, this system breaks down when applied to people who like both of those movies. You can address this problem by adding more dimensions — rating movies on a ‘chick flick’ to ‘jock movie’ scale or a ‘horror’ to ‘romantic comedy’ scale. You might imagine that if you kept track of enough of these coordinates, you could use them to profile users’ likes and dislikes pretty well. The problem is, how do you know the attributes you’ve selected are the right ones? Maybe you’re analyzing a lot of data that’s not really helping you make good predictions, and maybe there are variables that do drive people’s ratings that you’ve completely missed.” (Page 3)
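The coordinate idea in that quote can be sketched in a few lines. This is only my rough illustration, not the method of Ellenberg or of any Netflix Prize team: the movie names, the coordinate values, and the choice to average a user's liked movies into a single taste point are all assumptions.

```python
import math

# Hypothetical coordinates on two made-up scales, each running from -1.0 to 1.0:
# (chick flick .. jock movie, horror .. romantic comedy)
MOVIES = {
    "Movie A": (-0.9, 0.8),
    "Movie B": (0.9, -0.7),
    "Movie C": (0.0, 0.0),
}


def profile(liked, movies):
    """Average the coordinates of the liked titles into one taste point."""
    points = [movies[title] for title in liked]
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def nearest(taste, movies):
    """Return the title whose coordinates lie closest to the taste point."""
    return min(movies, key=lambda title: math.dist(taste, movies[title]))


# A viewer who likes two movies at opposite ends of both scales.
taste = profile(["Movie A", "Movie B"], MOVIES)
best = nearest(taste, MOVIES)
print(taste, best)
```

The viewer likes two movies at opposite extremes, so the averaged taste point lands near the middle of both scales. The closest match is then the middle-of-the-road Movie C, which resembles neither film the viewer actually chose. That is the breakdown the quote describes, at least in this simplified setup.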

There are any number of thoroughly out-there reasons why people like or dislike something. Some are far from quantifiable, predictable, or replicable. They can’t be scaled to the entire population, or even one of its segments. Do we prefer a particular song because it reminds us of a point in our life that is no more? Do we avoid a film because it’s where we took our lost love on our first date?

Going Along to Get Along With Recommender Systems

Another issue with recommender systems is that they can often persuade people one way or another. The Salganik and Watts study is rather interesting in this regard. These two researchers presented subjects with a number of unreleased songs and asked them to rate the songs and also download whatever they liked.

Certain songs rose to the top of the charts (just like we normally see on the Billboard Hot 100 and the like) whereas others were clunkers that fell swiftly. When the researchers switched the presented numbers, showing higher ratings for the stinkers and lower ratings for the crowd favorites, test subjects changed their minds. All Salganik and Watts had to do was convince their test subjects that this was the right outcome.

Salganik, M. J., & Watts, D. J. (2008). Leading the herd astray: An experimental study of self-fulfilling prophecies in an artificial cultural market. Social Psychology Quarterly, 71(4), 338–355. “…over a wide range of scales and domains, the belief in a particular outcome may indeed cause that outcome to be realized, even if the belief itself was initially unfounded or even false.” (Page 2)
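For the curious, the self-fulfilling prophecy can be imitated with a toy calculation. To be clear, this is not the model Salganik and Watts used. The weighting formula, the appeal values, and the download counts below are all my own assumptions, picked only to show how displayed popularity could pull choices away from underlying appeal.

```python
def expected_shares(appeal, displayed, influence=1.0):
    """Expected share of the next listener's download going to each song.

    Assumed model: weight = appeal * (1 + displayed count) ** influence.
    """
    weights = {
        song: appeal[song] * (1 + displayed[song]) ** influence
        for song in appeal
    }
    total = sum(weights.values())
    return {song: weight / total for song, weight in weights.items()}


def invert(counts):
    """Give the least-downloaded song the highest count, and vice versa."""
    ranked = sorted(counts, key=counts.get)            # fewest downloads first
    values = sorted(counts.values(), reverse=True)     # largest count first
    return dict(zip(ranked, values))


# Hypothetical intrinsic appeal and honest download counts.
APPEAL = {"hit": 0.6, "middling": 0.3, "clunker": 0.1}
TRUE_COUNTS = {"hit": 60, "middling": 30, "clunker": 10}

honest = expected_shares(APPEAL, TRUE_COUNTS)
flipped = expected_shares(APPEAL, invert(TRUE_COUNTS))
print(honest)
print(flipped)
```

With these made-up numbers, the clunker's expected share climbs from roughly 2% to roughly 28% once the counts are flipped, while the hit drops from about 78% to 30%. Nothing about the songs changed, only the numbers listeners were shown.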

But What is it With Recommender Systems, Really?

Are these instances of undue influence? Self-fulfilling prophecies? Test subjects wanting to appear ‘cool’ or go along with the majority in order to increase personal social capital? And where are ratings and recommender systems in all of this? Are they measuring data?

Or is it, as with the observer effect, that the very acts of observation and measurement are skewing the numbers and generating false outcomes?

Or is it perhaps still the case that there’s no accounting for taste?

Enjoy Johnny Cash (but only if you want to).