How do we evaluate the trustworthiness of scientific evidence?

At ScienceForWork, we communicate the science of human behavior at work. This specialized body of knowledge tells us which workplace practices are good and which are harmful, why, and what we can do about it. Not all scientific evidence is created equal. Knowing this, we critically evaluate the relevance and trustworthiness of scientific studies to answer practical questions about the workplace.

What questions do managers ask?

As managers, we ask burning questions about cause and effect. These may concern:

effectiveness: does using structured interviews improve hiring accuracy compared to unstructured interviews?
safety: will identifying so-called ‘high potentials’ do more good than harm?
cost effectiveness: is flexible work cheaper? Is activity-based working more cost-effective than regular work arrangements?

We can also ask questions about:

process: how does the performance appraisal process work?
acceptability: will line managers accept a new organizational change?
satisfaction: are line managers satisfied with new methods of working?

Depending on what questions we ask, certain types of scientific studies give the most appropriate answer.

What research design is most appropriate to answer cause and effect questions?

When we critically appraise how far a study can be trusted to support causal claims, its methodological appropriateness sets the starting level of trustworthiness. Methodological appropriateness is the degree to which a study's design allows it to answer a given practical question. The design is the ‘blueprint’ of a study: it describes the steps, methods and techniques used to collect, measure and analyze data. We assess a study’s methodological appropriateness by referring to the pyramid of evidence (see image below). For example, to understand the effects of performance appraisal on workplace performance, a meta-analysis of randomized controlled studies has very high methodological appropriateness for demonstrating the underlying causal effects. By contrast, a cross-sectional study has low methodological appropriateness for this question. Still, depending on the question we ask, a cross-sectional study can be highly informative, for instance in telling us how satisfied people are with the performance appraisal practice we have in place.

Not all scientific evidence is created equal: Why does methodological quality matter?

Not all that glitters is gold: even a methodologically appropriate study can sometimes be untrustworthy! Trustworthiness is also affected by a study’s methodological quality, that is, the way the study was conducted. The trustworthiness of a study with very high methodological appropriateness can drop dramatically when the study is poorly conducted and therefore contains several weaknesses. In fact, if a meta-analysis of randomized controlled studies contains too many serious flaws, its trustworthiness can drop from 95% all the way to 55%, which is only slightly better than chance.

What is the best available scientific evidence?

The pyramid of evidence shows that scientific results can come from different research designs. At ScienceForWork, we communicate information with high generalizability (breadth) and reliability (consistency): the best available evidence at any time. We mainly choose meta-analyses and systematic reviews, which combine results from many studies to answer a detailed question. These large studies reflect reality better than a single study conducted in one situation. Building on the painstaking, incremental work of the teams of scientists who produce single studies, meta-analyses help us recognize the big insights about human behavior at work with the highest confidence.

The philosophy behind meta-analyses and systematic reviews echoes the old saying that

a dwarf standing on the shoulders of a giant may see farther than the giant himself.

Sometimes, we cover high-quality single studies. This trades some breadth for stronger evidence of cause and effect. When the single study takes place in a workplace rather than in a lab, we have more confidence that the same result can occur in your workplace.

In the end, what is our takeaway? The Trustworthiness Score

Combining the level of methodological appropriateness and the study’s quality gives us a measure of trustworthiness: the chance the outcome of the study was caused by the intervention or variable(s) after controlling for biases and statistical noise. We represent this as a Trustworthiness Score; red reflects the lowest degree of confidence, while green gives us the go-ahead to apply these insights. For example:

We critically evaluated the trustworthiness of the study we used to inform this article. We found that it has a moderately high (90%) trustworthiness level.

This means that there is only a 10% chance that alternative explanations for these results are possible, including random effects.
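
The arithmetic behind the score can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring procedure ScienceForWork actually uses: the ladder of levels is hypothetical (only 95%, 90% and 55% appear in this article, and the steps in between are assumed), and the number of downgrade steps that a given set of weaknesses costs is left as an input because the article does not spell it out. The function names are made up for this example.

```python
# Hypothetical ladder of trustworthiness levels (in percent), highest first.
# Only 95, 90 and 55 come from the article; 80, 70 and 60 are assumed steps.
LEVELS = [95, 90, 80, 70, 60, 55]


def trustworthiness(start_level: int, downgrades: int) -> int:
    """Return the score after moving `downgrades` steps down the ladder.

    `start_level` stands for methodological appropriateness (the design),
    `downgrades` for methodological quality (how much the flaws cost).
    The score never falls below the lowest level on the ladder.
    """
    if start_level not in LEVELS:
        raise ValueError(f"unknown starting level: {start_level}")
    if downgrades < 0:
        raise ValueError("downgrades must be zero or positive")
    index = min(LEVELS.index(start_level) + downgrades, len(LEVELS) - 1)
    return LEVELS[index]


def chance_of_alternative_explanations(score: int) -> int:
    """Return the complement of the score: what the score leaves unexplained."""
    return 100 - score


# A meta-analysis of randomized controlled studies starts at 95%.
score = trustworthiness(95, downgrades=1)
print(score, chance_of_alternative_explanations(score))

# With too many serious flaws, the same design ends at the bottom of the ladder.
print(trustworthiness(95, downgrades=5))
```

With one downgrade, the sketch reproduces the example above: a 90% score leaves a 10% chance of alternative explanations. With five or more downgrades, it reaches the 55% floor mentioned earlier.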

Ultimately, this is our philosophy: where evidence is strong, we should act on it. Where evidence is suggestive, we should consider it. Where evidence is weak, we should find reliable information and build the knowledge to support better decisions in the future.

You can learn how to assess scientific claims through the free e-learning course on Evidence-Based Practice in Management and Consulting offered by Carnegie Mellon University.

References

Barends, E., Poolman, R., Ubbink, D., & ten Have, S. (2015). Systematic reviews and meta-analysis in management practice: How quality and applicability are assessed? In: Barends, E. (2015). In search of evidence: Empirical findings and professional perspectives on evidence-based management. Center for Evidence-Based Management: www.cebma.com

You can find the original article here!

Author

Pietro Marenco, Editor and Critical Appraisal Specialist @ScienceForWork