Editor's desk: Rating, expectations, and experiments


Last night I watched Snowpiercer. I'd heard good things about it online, I like to support simultaneous releases on iTunes, and it had a whopping 95% rating on Rotten Tomatoes. I was hugely disappointed. It wasn't a terrible movie. It wasn't a great movie either. But that 95% had set such an expectation that the massive flaws made it seem so much worse than if I'd gone in thinking it was a 40% to 60% movie. I watch all sorts of silly sci-fi, and enjoy it. I just go in expecting silly sci-fi. How the rating influenced my perception and enjoyment of the film got me thinking: how do we rate things on iMore, and how can we do it better?

Ratings systems are something almost every organization that reviews almost anything has to figure out. (Even if that figuring leads to them not using a ratings system at all.) There are different systems, and pros and cons to each, as well as to using one or not.

By way of another example, I recently tried out a game that I didn't like. I found the first run experience and the design to be less than good. Georgia really liked the gameplay, however, and Chris the community. We all cared about different things. How do you account for that?

5-star scales are common. iTunes uses them. Amazon uses them. Each star can either be whole, allowing for a 5-point spread (0 to 100% in increments of 20), or halved, allowing for a 10-point spread (0 to 100% in increments of 10). They also allow for relative measure. A 4-star app is better than a 3-star app, for example. They're not so good at qualifying those measures. Why is the 4-star app better?
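The arithmetic above can be made concrete with a short sketch. This is a purely illustrative example, not any store's actual implementation; the function name and the half-star flag are invented for the demonstration.

```python
def stars_to_percent(stars, half_stars_allowed=False):
    """Convert a 0-5 star rating to a 0-100% score.

    Each star is worth 20 percentage points, so whole stars give
    increments of 20 and half stars give increments of 10.
    """
    step = 0.5 if half_stars_allowed else 1.0
    if not 0 <= stars <= 5 or stars % step != 0:
        raise ValueError("invalid star rating")
    return stars * 20.0

print(stars_to_percent(4))                           # 80.0
print(stars_to_percent(3.5, half_stars_allowed=True))  # 70.0
```

Note how coarse the scale really is: with whole stars there are only six possible values, which is exactly why a 4-star rating can't tell you *why* one app beats another.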

Thumbs up vs. thumbs down (recommended vs. not recommended) is also common. Instead of merely a lack of positives, it actually highlights negatives as well (-100% to +100%, with potentially 0 in the middle). You can easily tell that some apps are good and others bad, but the system lacks any relative measure. You can't tell how good or bad they are compared to other apps.

Sometimes elements of both are combined, and you get a recommendation scale. For example, must avoid, not recommended, recommended, must have.

All of them can suffer from similar problems. Should 1-star or 2-star, or non-recommended apps be reviewed at all? What's the real difference between a 3-star and 4-star or recommended and must-have app, beyond the personal opinion of the reviewer?

What happens if you rate an app 5 stars and a better app comes along? Or if that app gets better? What happens if parts of an app are great and others... not so much?

Most of all, how do you overcome the lack of nuance and specificity that, while keeping ratings highly glanceable, also makes them incredibly shallow?

One of the things I've been thinking about is to tie ratings to specific criteria. It dawned on me during Apple user experience evangelist Mike Stern's talk on Designing Intuitive User Interfaces at WWDC 2013 that a lot of the ideals set out for developers and designers could be used as measures for the resulting apps.

For example, how usable is an app? How simple, clear and intuitive are the navigation and controls? How useful is it? Are the features well defined, focused, and implemented? How well designed is it? Is the interface attractive and the interactions enjoyable? How accessible is it? Can it be used by as wide a range of people as possible?

I've not yet finished thinking it through, and there's a lot still to consider, weigh, and figure out how to map to ratings, but I think it could ultimately lead to something that provides really glanceable information backed by solid criteria. Especially if those criteria are elaborated upon in the review that includes the rating, everyone would know not only relative measures and like or dislike, but the areas where apps excel and where they don't. The information density increases, the precision increases, and hopefully the value increases.
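One way the criteria-to-rating mapping could work is a weighted average of per-criterion scores. This is only a sketch of the idea under stated assumptions: the four criteria mirror the questions above (usability, usefulness, design, accessibility), and the scores, weights, and function name are all hypothetical, not iMore's actual system.

```python
def overall_rating(scores, weights=None):
    """Combine per-criterion scores (0-5 each) into one weighted rating.

    If no weights are given, every criterion counts equally.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    weighted = sum(scores[name] * weights[name] for name in scores)
    return round(weighted / total_weight, 1)

# Hypothetical review scores for one app
review = {"usability": 4, "usefulness": 5, "design": 3, "accessibility": 4}
print(overall_rating(review))  # 4.0
```

The single number stays glanceable, while the per-criterion breakdown published alongside it supplies the nuance a bare star rating lacks.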

So, here's where I ask for all of your help. What ratings systems do you like most? What provides you with the most value? What would you like to see on iMore?

Assorted other stuff:

  • I also saw Guardians of the Galaxy, which is at 92% on Rotten Tomatoes. I find that score equally ridiculous, but I also enjoyed the hell out of it. It wasn't transcendent by any means, but it was a ton of fun.
  • Here's some more Rotten Tomatoes at the movies: Star Wars at 92%, The Matrix at 87%, The Avengers at 92%, The Godfather at 100%, and so on.
  • Guy English had some smart follow-up to our recent Debug podcast with Marco Arment of Overcast, which he posted on [Kicking Bear](http://kickingbear.com/blog/archives/464).
  • Speaking of smart stuff, read Ben Thompson's piece on the app business being a business.
  • Our friends Cali Lewis and John P. opened their new Geek House over the weekend. Huge congrats to both of them. Phil Nickinson went to celebrate along with them on our behalf.


Rene Ritchie

EiC of iMore, EP of Mobile Nations, Apple analyst, co-host of Debug, Iterate, Vector, Review, and MacBreak Weekly podcasts. Cook, grappler, photon wrangler. Follow him on Twitter and Google+.


Reader comments

Do you know how RT works? It's very different from other ratings; it's just the percentage of critics who were satisfied with the movie. If you see a 9.5 on IMDb, yeah, it's likely a masterpiece or something really great. But 95% on RT means nothing but the odds that it's likely to be a good movie, to satisfy you. Nothing more.

Yes, I know how RT works. 95% fresh is a fairly high satisfaction rate. Tremendous implications come along with that, especially for a sci-fi movie. i.e. through story and acting it did the job it was hired to do for almost all critics.

That's tough to reconcile with this film.

Anyone who doesn't see that Snowpiercer is pure shite and not actually science-fiction (which requires believability by definition), is a fool or someone who knows little about literature, history and indeed, science-fiction.

I found the discussion of star systems interesting though in that they are all (to me) fatally flawed in that they DON'T actually go from "0% to 100%." It is rare that a star system is ever used that allows the user to rate the movie (use the system), and yet give the movie zero stars. So every single rating of a movie that the user does is actually enforced positivity (you must give it at least one star), and there is no way to actually register a negative reaction to a movie.

In other words a seriously flawed and "gamed" system from the very start, every time.

The actual spread of 5 stars is commonly smaller than it seems, as most systems do not allow for zero stars at all.

A rating must be valid when it is made; worrying about giving 5 stars and a future app being better will lead to not rating or writing anything at all.

Your misinterpretation of RT brings up a good point though: it should be obvious for the reader what the ratings mean. 1 through 5 means nothing.

We do use the 5-star rating for reviews in our corporate portal and have provided the following legend:

  • 1: Can't be used for the purpose advertised.
  • 2: Performs advertised functions, but has stability or usability issues.
  • 3: Does everything as advertised.
  • 4: Better than 3 by performing ahead of expectations in one discipline (functionality, usability, design, workflow support).
  • 5: Better than 3 by performing ahead of expectations in more than one discipline.

Obviously, this approach targets productivity apps; the same legend would not work well for games. But it provides a very clear guideline for reviewers and readers, guaranteeing that ratings convey a consistent meaning.

I would not try to compare movie (book, album, TV show) review approaches with software reviews, though. Even if Hollywood claims otherwise, there is no universal criterion for a good movie. Art can only be rated against its own claims/goals, and even that is subjective.

I understand RT just fine thanks :)

That's not a bad star rating legend.

Things like interface, interactivity, etc. are personal even on apps, as are the way functions are implemented (Twitterrific vs. Tweetbot is a classic example) so I'm not sure there can be purely objective ratings for apps either.

Same experience here. Figured the studio was maybe trying to test a new delivery style and give a good movie at the same time. It was reminiscent of Waterworld. Stuck on a train. Lol

I like the systems which incorporate a number of different levels in the ratings. For example, the iTunes Store or Amazon, which have the 1-to-5-star rating, then the reviews that customers have left, and finally the yes/no on whether a review was helpful. I feel that by combining the different levels I get a reasonable picture of what's being reviewed.

I find the biggest problem with any rating system at the moment is the pressure being placed on reviewers to give 5 stars (or the equivalent). The number of app developers who want a 5-star review because it affects their livelihood, or my local car dealership, which wants at least a 9/10 review after they service my car because "anything lower reflects poorly on them." Instead of having a system which reflects whether a person likes or dislikes the product or service, we end up with skewed results which are meaningless and in some cases can be misleading.

What the solution is I don't know, for the time being I just rely on the reviews that other people find most useful.

Exactly. The app store rating system is fast becoming synonymous with the type of thing eBay does wherein everyone is basically forced (and threatened with expulsion if they don't comply) to ALWAYS give everyone five stars (unless you had a serious problem). The only allowable exceptions are if you can prove that something horrible happened, then they will allow you to make it three or four stars.

Any rating system that doesn't allow for negative ratings (or even average ratings) is a gigantic FAIL and an irrelevance just waiting to happen. Because the money is made off of the rating, the only pressure on ratings is for them to always be positive and to go up and up and up. It's already the case on the App Store that negative ratings and intelligent, reasoned reviews are marginalized at best: smothered, and drowning in a sea of irrelevant positivity that tells the user nothing about the product and is unable to weed out the bad product.

Take Sharknado, Sharknado 2, and the upcoming Sharknado 3: schlock, but they'll get high ratings for enjoyment factor even though Sharknadoes are improbable. Or take the original unedited Star Wars trilogy: there are clear flaws you can see (the see-through speeder console in "The Empire Strikes Back"), but people enjoy it.

I think there's a reason they're called "user reviews" or something similar on most services: it's one customer's opinion based on their experience (most of the time), and people like to be able to see what others have said about things before they buy them.

You might want to look at my series of articles on the design of rating systems starting at http://www.lifewithalacrity.com/2005/12/collective_choi.html — a lot of analysis about what makes a good rating system, why 5 star rating systems have problems, and a variety of other issues. In subsequent articles in this series we implemented some of our own research and ended up with a rating system with measurably better results.