Top 5 lessons on Star ratings as incentives

STAR RATINGS WORK TO INCENTIVISE AND REWARD GREAT SERVICE

Stars aren’t just for Christmas

In yesterday’s article, I described how Pret motivate their teams to deliver outstanding customer service.

The key principle was to ‘catch people doing things right’ (to use a favoured phrase of my former Chiltern Railways boss Rob Brighouse) and reward them for it - to ensure that teams are incentivised to deliver great service.

When we founded Snap, we adopted some of these principles.

We were fortunate to have some examples from previous tech platforms to learn from.

Many existing marketplace platforms (Amazon, Uber, Airbnb, etc) rely on simple star ratings out of 5. Unlike Pret, these platforms do not pay for expensive mystery shopping but instead rely on users to rate the service providers on the platform.

A screenshot of my Uber app. Like most people, I rate most trips 5 star

This is both efficient and reliable.

We adopted the same principle at Snap: every trip is star rated by customers.

From these ratings, we compile league tables: first of operators, comparing their overall aggregate scores.

An example of a Snap operator league table. As you can see, even the lower rated operators are still comfortably above 4.

We commercially incentivise operators to achieve outstanding star ratings, either by making it more likely an operator will win work if they have higher ratings (based on an algorithm), or increasing the revenue share of operators with the top ratings.
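The article doesn’t spell out how the allocation algorithm worked, but the idea of making higher-rated operators more likely to win work can be sketched as a rating-weighted draw. The function name and the exponent below are purely illustrative, not Snap’s actual mechanism:

```python
import random

def pick_operator(ratings, exponent=4.0, rng=None):
    """Choose an operator for a trip, with probability increasing
    sharply in its aggregate star rating.

    `exponent` is an illustrative tuning knob: a higher value
    favours top-rated operators more strongly.
    """
    rng = rng or random.Random()
    names = list(ratings)
    weights = [ratings[name] ** exponent for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

With an exponent of 4, an operator rated 4.9 wins roughly two-thirds of trips against one rated 4.3, so small rating differences translate into a meaningful commercial edge.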

We also have league tables of drivers, comparing drivers not just within an operator but between operators. These league tables are moderated using a statistical technique called Bayesian weighting, so that a driver with only a handful of ratings isn’t penalised by a few unrepresentative negative ones.
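A common form of Bayesian weighting (sometimes called a Bayesian average) blends a driver’s raw scores with the network-wide mean. The prior values below are illustrative assumptions, not Snap’s actual parameters:

```python
def bayesian_average(ratings, prior_mean=4.7, prior_weight=20):
    """Blend a driver's raw ratings with the network-wide average.

    With few ratings the score stays close to `prior_mean`, so a
    couple of harsh outliers cannot sink a new driver; as ratings
    accumulate, the driver's own data dominates.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)
```

A new driver with a single 1-star rating still scores about 4.5 under this scheme, while a veteran with hundreds of 5-star ratings converges towards 5: exactly the moderation the league tables need.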

An example of a Snap driver league table, showing three month average scores and the percentage of ratings that came in at 5 star in the last month. As you can see, even quite far down the league table, most drivers were getting a high proportion of 5 star scores in the last month.

We also automatically generate certificates for drivers who achieve above-average star ratings.

These certificates show the driver’s average star rating, the proportion of star ratings awarded that are five stars, and any verbatim comments from customers.

An example Snap driver certificate. It was great to see drivers sharing on their personal Instagram.

I make a point of travelling on Snap coaches every week (well, I used to, when there were Snap coaches!), talking to customers and drivers. I had an internal target that I would personally speak with 30 customers every week. This gave me a good sense of how well the algorithmic star ratings reflect the reality of customers’ experience and gave me a sense of how drivers reacted to the use of star ratings.

So what did I learn from all of this?

1) Customers are very happy to star-rate. Talking to people about using star ratings, I know there’s a fear that customers won’t take part. That isn’t an issue for us. In fact, we’ve never had a single journey that wasn't star rated. We achieved this by making it hyper easy for customers to star rate. They get a text message asking them to star rate their journey just after they leave the coach while it’s still fresh in their minds, and they simply need to send us a message back with the rating. The system automatically captures the number as their numeric star rating, and the words as their verbatim feedback. They don’t need to open an app - they can do the whole thing in two keystrokes on a text message. I can only think of one occasion this system went wrong (🙈) and this is described in the * below…
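The article doesn’t show the actual parsing code, but the digit/words split it describes might look something like this minimal sketch (function name and regex are my assumptions):

```python
import re

def parse_rating_sms(message):
    """Split an SMS reply into a numeric star rating and verbatim feedback.

    Assumes, as the article describes, that the reply contains a rating
    digit: the first digit 1-5 found is taken as the star rating, and
    the remaining words become the free-text comment.
    """
    match = re.search(r"[1-5]", message)
    if not match:
        return None, message.strip()
    rating = int(match.group())
    # Drop the rating digit, then tidy the leftover whitespace
    # for the verbatim-feedback field.
    rest = message[:match.start()] + message[match.end():]
    verbatim = re.sub(r"\s+", " ", rest).strip()
    return rating, verbatim
```

So a reply of “5 lovely driver” yields a 5-star rating and the comment “lovely driver” (the footnote at the end of this article shows where this split can go comically wrong).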

2) Customers rate high. Another common fear about star ratings is that staff will be demoralised by a constant stream of negative feedback. This is not the case. Indeed, we had the opposite problem. The average star rating is 4.7. I would sometimes travel on a coach with a broken loo and a grumpy driver (not too often, I’m glad to say!) and see it still score around 4.2. But trying to explain to the operator that 4.2 is a bad score was hard. We built into our contracts a ‘floor’ of 4.25, because a consistent run of scores below 4.25 is evidence of seriously unhappy customers. Believe it or not, most people are generous in their praise!

3) Drivers love it. Linked to the above point, one of the great things about people’s generosity in their feedback is that it provides a constant stream of positive feedback. Speaking to coach drivers, without Snap, the vast majority of customer feedback they receive is negative. Customers typically can’t be bothered to write with praise, so they only get in touch when something’s gone wrong. But by removing all the friction to feedback, Snap customers are liberated to say nice things, which means drivers get loads of positive reinforcement for doing the right thing. It was great talking to drivers and seeing just how much they appreciated hearing how much customers appreciated what they did. Even better was seeing drivers post their Snap certificates on their own personal Instagram or Facebook profiles, and realising just how much we were filling a gap in providing them with professional praise and validation.

4) The driver of success is the driver. We’ve done a lot of work correlating star ratings against different features of the journey. One thing we learned is that the number one driver of customer satisfaction is the driver. We also created word clouds of verbatim feedback and found that the driver is one of the most frequent subjects of feedback. Yet, in our industry, vastly more investment is put into the ‘hardware’ of the vehicle than the ‘software’ of the person driving it.

5) It works. We started doing this because we realised that as Snap grows (and, hopefully, one day, when all this horror is over, it will grow again!) we couldn’t possibly travel on every coach. But our customers can. And we shouldn’t determine what good looks like; our customers should. I’ll give you three case-studies that illustrate just how well this worked.

  • Case study 1: Phil (not his real name) seemed to be the model coach driver for Snap. He was young and enthusiastic, he was great with the customers, he loved the concept. He was so good that when we held an all-operator ‘summit’ for the leaders of our operators, Phil was one of the two drivers we invited along to give a driver perspective. And yet his scores were bizarre: they were either high 4s or below 3 (which is almost unheard of!). It turned out that Phil had a tendency to panic about his remaining drivers’ hours and make a late decision to stop at a service area for a break, often less than an hour from the end of the journey. Customers understandably hated it. It alerted us to something that turned out to be true across the board: the thing that depressed scores more than anything was breaks at services. So we were able to put extra effort into ensuring operators had pre-planned their drivers’ schedules to take breaks before they started their Snap trips. Had we not had star ratings, however, we’d probably have been blinded by our instinctive enthusiasm for Phil and not realised that he had a killer trait that customers hated.

  • Case study 2: An operator was in danger of falling below our 4.25 ‘floor’ threshold and leaving the Snap network. Their coaches were typically a lot older and shabbier than their peer group and their drivers were … curt. They made one last roll of the dice and recruited Jayne (again, different name) as a dedicated driver for their Snap trips. She was awesome. Customers adored her. This operator went from the worst-performing operator on the network to the best. Note that this was with the same shabby coaches: a reminder that the driver of success is the driver.

  • Case study 3: One operator, who shall be nameless, was a disaster. On one occasion two coaches travelling to different destinations made unnecessary stops at the same service area, delaying both trips by 30 minutes. When we fed this back to the operator, their MD responded (as if this excused the matter entirely): “Ah, well, those two drivers are married, so they wanted to see each other.” This operator’s star ratings started low, and declined. They rapidly sank below our floor threshold and were off the network. All based on an entirely empirical measure but without complex and expensive audits and surveys.

The thing I loved most about star ratings was that they were a win-win-win.

Operators loved the real-time feedback. One operator MD told me that the main reason they did Snap trips was because it provided them with a better source of performance management of their drivers than anything else they did.

Drivers loved the positive feedback and praise, and the fact that doing the right thing would be rewarded.

And we loved having an efficient management tool for driving up standards and generating a virtuous circle of increasing scores.

Often when I talk to people in the established transport industry, they look at services like Uber and think of their success as being down to having an app.

But it’s never about the technology: it’s about the product. As per yesterday’s article, Pret have achieved the same outcomes without any kind of rating technology at all. The key thing is to believe in rewarding the best to become better. Moving beyond, as Alex Hynes put it, ‘not enough carrot and not enough stick’.


WHAT DO YOU THINK? ARE STAR RATINGS A POSITIVE DEVELOPMENT? SHOULD TRANSPORT USE THEM MORE? Join the debate on LinkedIn.


* So remember what I said about how the system takes the digit in the customer’s message as the star rating, and the words as the verbatim feedback. It parses them into separate fields, so the number can go into the star-ratings algorithm and the verbatim onto the driver’s certificate. On one occasion, one of our operators used a coach without USB plug sockets. The customer sent this message in response to the ratings request: “4 but plugs would be nice”. The system, of course, separated this into numbers and words, so the verbatim feedback was the words without the number. And if you can see why the resulting words were a problem, you have a dangerously filthy mind…
