Performance Evaluation - the ‘why’ makes a difference • ScienceForWork

Key points

People providing performance evaluation ratings for salary or promotion purposes tend to be more generous and less accurate than when providing ratings for development or feedback purposes.
Like parents who always think their children are above average, the generous ratings are an unfortunate result of good intentions, but can have a detrimental effect on salary and promotion decisions.
Robust ‘frame of reference’ training, rating scales based on specific behaviours, and improving raters’ motivation and accountability can all help to reduce the effects of overgenerous ratings.

If you design or develop performance evaluation systems, then chances are you are aware of the problems in getting managers and employees to evaluate performance accurately. Regardless of whether the problem is getting raters to recognise poor performance, or getting them to rate it more accurately, bias is an issue.

Performance evaluation ratings – a bit like getting parents to rate their own children

A meta-analysis including 22 controlled studies with over 55,000 people found that the purpose of the evaluation, whether for salary/promotion or development/feedback, does affect the ratings that people provide (Jawahar and Williams, 1997).

In particular, people providing ratings for salary or promotion purposes tend to be more generous and less accurate than they are when rating employee development or giving feedback. It seems that in the real world, raters don’t like to mark people down, and they don’t like to make big distinctions in performance between people. This means that, like children rated by their own parents, most employees end up being ‘above average’. Because this finding was consistent across a large number of people, it raises an obvious question.

Why does this happen?

There are several possible causes of the overgenerous ratings used to make salary or promotion decisions. Raters may inflate ratings:

to try to get a positive outcome for the employee,
to motivate someone who hasn’t performed very well,
to avoid having to give negative feedback, or
simply to avoid the consequences of ‘harsh but accurate’ ratings.

This is by no means exhaustive. In much the same way as parents tend to overestimate their children’s potential, there are many other reasons why people might give lenient ratings.

Takeaways for your practice

If you are responsible for designing or implementing evaluation systems, you might be tempted to abandon using ratings altogether. However, there are a number of practical things you can do to address the problems of providing accurate performance ratings for salary/promotion purposes.

Define what “Good Performance” really means in your business.

Take into account that performance includes many different aspects, such as what people achieve (e.g., productivity, quality of output, number of errors), but also how they reach those results (e.g., extra tasks, effort, dedication, cooperating with others). Learn more about performance dimensions and indicators in Koopmans et al. 2011.
Choose specific behaviors and competencies as indicators of performance that you want people to demonstrate in the roles being assessed. In other words, specify what they need to do, how they need to do it and how well they need to do it, in a way that’s culturally appropriate to the organization.

Use standardized tools.

Develop and use Behaviourally-Anchored Rating Scales (BARS), which provide raters with specific examples of behaviors that meet, overshoot or undershoot expectations. Learn more on how to develop them in Kell et al. 2017.
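To make the idea of behavioural anchors concrete, here is a minimal sketch of how one scale could be represented. Everything in it is an assumption made for illustration: the "cooperation" dimension, the 1 to 5 range, the wording of the anchors and the helper name `describe` are hypothetical and are not taken from Kell et al. or any other source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    score: int      # numeric point on the rating scale
    label: str      # how the point relates to expectations
    example: str    # observable behaviour that illustrates the point


# Hypothetical anchors for a hypothetical "cooperation" dimension.
COOPERATION_SCALE = (
    Anchor(1, "Undershoots expectations",
           "Withholds information colleagues need to finish their work."),
    Anchor(3, "Meets expectations",
           "Shares relevant information when colleagues ask for it."),
    Anchor(5, "Overshoots expectations",
           "Shares information proactively and offers help before being asked."),
)


def describe(scale, score):
    """Return the anchor closest to a rating; ties go to the lower anchor."""
    lowest = min(a.score for a in scale)
    highest = max(a.score for a in scale)
    if not lowest <= score <= highest:
        raise ValueError(f"score must be between {lowest} and {highest}")
    return min(scale, key=lambda a: (abs(a.score - score), a.score))


if __name__ == "__main__":
    anchor = describe(COOPERATION_SCALE, 4)
    print(anchor.label, "-", anchor.example)
```

Resolving ties towards the lower anchor is a deliberate, debatable choice in this sketch: it asks the rater to justify the higher rating with observed behaviour, which works against the leniency described above.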

Provide training on performance evaluation.

Make raters aware of the biases that affect the quality of their decisions (such as contrast effects, halo effect, availability heuristic, confirmation bias, etc.). You can develop a checklist for managers to use when giving ratings which explains, with examples, what these biases entail.
Make sure that raters are clear which aspects of performance, and which behaviors and competencies they need to evaluate.
Provide opportunities to practice and receive feedback on the use of new evaluation standards. Learn more about performance evaluation training in Roch et al. 2012.

Increase accountability

Make raters accountable for their evaluations. For example, you can nudge people by telling them that their output will be reviewed by an expert or an independent third party (see Sylvia Roch’s research on accountability).

Trustworthiness score

We critically evaluated the trustworthiness of the study we used to inform this Evidence Summary. We found that the design of the study was highly appropriate to demonstrate a causal relationship, such as effect or impact. Therefore, based on this source of evidence, the claim that performance evaluation purpose has an effect on the ratings distribution is ninety percent (90%) trustworthy. We conclude that there’s a ten percent (10%) chance that the results are due to alternative explanations, including random effects.

References:

Jawahar, I. M., & Williams, C. R. (1997). Where all the children are above average: The performance appraisal purpose effect. Personnel Psychology, 50, 905-923.

Koopmans, L., Bernaards, C. M., Hildebrandt, V. H., Schaufeli, W. B., de Vet, H. C. W., & van der Beek, A. J. (2011). Conceptual frameworks of individual work performance: A systematic review. Journal of Occupational and Environmental Medicine, 53(8).

Roch, S. G., Woehr, D. J., Mishra, V., & Kieszczynska, U. (2012). Rater training revisited: An updated meta‐analytic review of frame‐of‐reference training. Journal of Occupational and Organizational Psychology, 85(2), 370-395.