
The effect of performance indicator category on estimates of intervention effectiveness

Alexander K. Rowe
DOI: http://dx.doi.org/10.1093/intqhc/mzt030. Pages 331–339. First published online: 10 April 2013

Abstract

Background A challenge for systematic reviews on improving health worker performance is that included studies often use different performance indicators, and the validity of comparing interventions with different indicators is unclear. One potential solution is to adjust comparisons by indicator category, with categories based on steps of the case-management process that can be easily recognized (assessment of symptoms, treatment etc.) and that might require different levels of effort to bring about improvements. However, this approach would only be useful if intervention effect sizes varied by indicator category. To explore this approach, studies were analyzed that evaluated the Integrated Management of Childhood Illness (IMCI) strategy.

Methods Performance indicators were grouped into four categories: patient assessment, diagnosis, treatment and counseling. An effect size of IMCI was calculated for each indicator. Linear regression modeling was used to test for differences among the mean effect sizes of the indicator categories.

Results Six studies were included, with data from 3136 ill child consultations. Mean effect sizes for the 63 assessment indicators, 12 diagnosis indicators, 31 treatment indicators and 34 counseling indicators were 50.9, 44.7, 36.5 and 46.6 percentage-points (%-points), respectively. After adjusting for baseline indicator value, compared with the assessment mean effect size, the diagnosis mean was 7.3%-points lower (P = 0.23), the treatment mean was 15.2%-points lower (P = 0.0004) and the counseling mean was 12.9%-points lower (P = 0.0027).

Conclusion Adjusting the results of systematic reviews for indicator category and baseline indicator value might be useful for improving the validity of intervention comparisons.

  • child health
  • developing country
  • health services research
  • methods
  • quality improvement
  • systematic reviews

Introduction

To identify the best ways to improve health worker performance, some systematic reviews compare the effectiveness of multiple interventions in one analysis [1–6]. However, a major methodological challenge is that studies in these reviews often use different indicators of performance, and the validity of comparing interventions with different indicators is unclear. The concern is that some performance indicators might be easier to improve than others. For example, it might not be valid to compare the effectiveness of two interventions when one is evaluated with the indicator ‘patient is checked for cough’ (a simple task) and the other is evaluated with ‘severe febrile illness is correctly treated’ (a complex task).

One option is to perform separate comparisons for each performance indicator. While this approach does solve the problem of potentially unfair comparisons due to dissimilar indicators, its drawback is that each indicator-specific analysis is likely to include few studies. This limitation was highlighted in a recent review [7].

If studies with different indicators are included in a single analysis, one could adjust comparisons by indicator category. This option has been tried in at least two reviews [5, 6]. Investigators used linear regression to model effect sizes and included a variable in the model that coded for the complexity of the targeted health worker behavior represented by the performance indicator. (Notably, the models included other variables such as baseline performance level and study quality.) This approach, however, seemed to have limited value because coding behavior complexity was considered to be subjective and the variable was not always associated with effect size.

Continuing with the scenario in which different indicators are included in a single analysis, another option is to categorize indicators according to the corresponding step of the case-management process: assessment of patient signs and symptoms, diagnosis, treatment or counseling. As identifying steps in the case-management process is fairly straightforward, this approach would probably be more objective than judging behavior complexity. However, no reviews appear to have used this approach, and it is not known whether indicator categories based on case-management steps are associated with intervention effect size (i.e. are indicator categories an effect modifier?). For example, are effect sizes for indicators of correct diagnosis different from effect sizes for indicators of correct treatment? To determine whether intervention effect size varies by indicator category based on case-management steps, data were analyzed from trials that evaluated the same basic intervention to improve health worker performance: the World Health Organization's Integrated Management of Childhood Illness (IMCI) strategy [8].

Primarily through an 11-day training course and job aids for health workers in developing countries, IMCI promotes the use of an evidence-based clinical algorithm on the assessment, diagnosis and treatment of ill children, as well as on counseling the child's caretaker on home care. Seventy-six countries have reportedly scaled up IMCI beyond a few pilot districts [9].

Methods

Studies were identified from a recent systematic review [10]. The search strategy and inclusion criteria are shown in Box 1. To evaluate IMCI's effectiveness, the studies used results from health facility surveys to compare the performance of health workers with and without IMCI training.

All studies utilized the same basic survey methodology [11]. A sample of outpatient health facilities is visited, and ill child consultations are silently observed. Then, caretakers (usually the child's parent) are interviewed, and the child is re-examined (out of view of the observed health workers to avoid influencing their practices) to obtain a ‘gold standard’ determination of the child's IMCI illness classifications (i.e. diagnoses). At the end of the day, health workers are interviewed and a health facility assessment collects information on the availability of equipment and drugs. Quality of care is assessed by comparing the practices of observed health workers to the gold standard results.

Values of dichotomous patient-level performance indicators were abstracted from study reports, and the indicators were grouped into four categories: patient assessment, diagnosis, treatment and counseling. When two performance indicators from the same study were very similar, only one (i.e. the indicator with the largest sample size) was included in the analysis. For example, ‘child correctly classified’ was included, but ‘child correctly classified omitting coughs and colds and no dehydration’ was not.

Box 1.

Search strategy and inclusion criteria

Search strategy

Five methods were used to identify studies of the effectiveness of the IMCI strategy (details in Rowe et al. [10]).

  1. MEDLINE search of the key word ‘IMCI’.

  2. Search of reports from World Health Organization (WHO) regional offices and WHO's Department of Child and Adolescent Health and Development (mainly unpublished reports).

  3. Search of the WHO/International Network for Rational Use of Medicines (INRUD) database for IMCI intervention studies (personal communication, K. Holloway, 27 June 2007).

  4. Search of the Health Care Provider Performance database with key words ‘Integrated Management of Childhood Illness’ and ‘IMCI’ (personal communication, S. Rowe, May 2006).

  5. Contacting investigators and content experts.

Inclusion criteria

From the 29 studies identified by the search (described above), studies were included in the present analysis if they:

  • Evaluated IMCI effectiveness with an intervention group of IMCI-trained health workers whose performance was compared with that of a control group of health workers without IMCI training (with or without randomized allocation).

  • Used WHO's standard IMCI training course that lasted at least 11 days, which is the minimum recommended by WHO.

  • Reported indicator values from at least three of the four indicator categories (assessment, diagnosis, treatment and counseling) that were based on at least 20 patients per study group and time point.

  • Included at least 20 health facilities.

An effect size of IMCI's impact was calculated for each indicator. For studies that only reported results after IMCI was implemented, the effect size was the absolute percentage-point (%-point) difference between the indicator value for the IMCI-trained health workers and the indicator value for workers without IMCI training. For studies that reported results before and after IMCI, the effect size was the ‘difference of differences’, i.e. (follow-up − baseline)_IMCI − (follow-up − baseline)_no-IMCI. When results from several follow-up time points were available, the latest results were used to reflect the longest possible follow-up period.
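To make the two calculations concrete, the sketch below implements them in Python (an illustrative choice; the paper does not state its analysis software, and the function and argument names are hypothetical):

```python
# Illustrative sketch of the two effect-size calculations described above.
# Indicator values are percentages (0-100); names are hypothetical.

def effect_size(imci_followup, no_imci_followup,
                imci_baseline=None, no_imci_baseline=None):
    """Return the IMCI effect size in absolute percentage-points.

    Post-only studies: IMCI follow-up value minus no-IMCI follow-up value.
    Pre/post studies: the 'difference of differences'.
    """
    if imci_baseline is None or no_imci_baseline is None:
        return imci_followup - no_imci_followup
    return ((imci_followup - imci_baseline)
            - (no_imci_followup - no_imci_baseline))

# Example: IMCI arm improves from 30% to 75%, comparison arm from 28% to 40%:
# (75 - 30) - (40 - 28) = 33 %-points.
print(effect_size(75, 40, imci_baseline=30, no_imci_baseline=28))
```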

Among the four indicator categories, it was hypothesized that effect sizes would be greatest for assessment indicators because they generally reflect performance of a single simple task (e.g. checking for fever). Following this logic, diagnosis effects would be smaller than assessment effects (diagnosis is more complex, requiring the integration of several pieces of information), and treatment effects would be the smallest (treatment requires correctly selecting one or more drugs at the appropriate dosage and duration, plus referring severely ill children for hospitalization). Counseling effects were hypothesized to be roughly similar to diagnosis effects, as the difficulty of counseling was thought to be in between that of assessment and treatment.

Three linear regression models were created with effect size as the dependent variable. Model 1 included only dummy variables for the indicator categories. As the baseline indicator value might be related to both effect size and indicator category, it was considered as a potential confounder. Thus, Model 2 included the indicator category dummy variables and baseline. As mentioned previously, other reviews have similarly included baseline indicator value when modeling intervention effect size [5, 6]. For studies that only reported results after IMCI was implemented, baseline = the indicator value in the no-IMCI group, and for studies that reported results before and after IMCI, baseline = (baseline_IMCI + baseline_no-IMCI)/2. Model 3 was the same as Model 2, except that it also included dummy variables that coded for the study site (or intervention arm for studies with multiple IMCI intervention arms). The study site variables helped adjust for contextual differences that influenced IMCI effectiveness (i.e. to account for the fact that some studies tended to have higher effect sizes than others).
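As a minimal sketch of how Models 1–3 could be fit, assuming pandas and statsmodels as tooling (the paper does not name its software) and a synthetic stand-in data frame rather than the study dataset:

```python
# A sketch of Models 1-3 on synthetic stand-in data (the real dataset has
# 140 rows, one per indicator); column names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "effect_size": [55.0, 48.0, 30.0, 42.0, 60.0, 25.0, 38.0, 50.0],  # %-points
    "category": ["assessment", "diagnosis", "treatment", "counseling"] * 2,
    "baseline": [10.0, 15.0, 25.0, 5.0, 20.0, 30.0, 12.0, 8.0],       # %
    "site": ["A"] * 4 + ["B"] * 4,   # one level per intervention arm
})

cat = "C(category, Treatment('assessment'))"   # assessment = reference level
m1 = smf.ols(f"effect_size ~ {cat}", data=df).fit()                       # Model 1
m2 = smf.ols(f"effect_size ~ {cat} + baseline", data=df).fit()            # Model 2
m3 = smf.ols(f"effect_size ~ {cat} + baseline + C(site)", data=df).fit()  # Model 3

# m2.params holds the three category contrasts and the baseline slope.
print(m2.params)
```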

In addition to the risk difference (used in the main analysis), two other measures of effect size were examined: the odds ratio (OR) and the percent reduction in defect rate (PRDR)—i.e. [(error prevalence in the no-IMCI group − error prevalence in the IMCI group)/(error prevalence in the no-IMCI group)] × 100%. In the OR analysis, to prevent division-by-zero problems, indicators with a value of zero were assigned a value of 0.5. To normalize the distribution of ORs, the natural logarithm of the OR (log OR) was analyzed.
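A sketch of the two alternative measures, again with hypothetical function names; only the 0.5 substitution for zero values comes from the text:

```python
# Sketch of the two alternative effect measures; function names are
# illustrative. Inputs are indicator values as percentages (0-100).
import math

def _odds(p):
    """Convert a percentage (0-100) to odds."""
    return (p / 100) / (1 - p / 100)

def log_odds_ratio(p_imci, p_no_imci):
    """Log odds ratio, with the 0.5 substitution for zero values (per text)."""
    p1 = 0.5 if p_imci == 0 else p_imci
    p0 = 0.5 if p_no_imci == 0 else p_no_imci
    return math.log(_odds(p1) / _odds(p0))

def prdr(p_imci, p_no_imci):
    """Percent reduction in defect rate; the 'defect rate' is 100 - value."""
    defect_imci, defect_no_imci = 100 - p_imci, 100 - p_no_imci
    return (defect_no_imci - defect_imci) / defect_no_imci * 100

print(round(log_odds_ratio(75, 40), 2))  # 1.5  (OR = 4.5)
print(round(prdr(75, 40), 1))            # 58.3
```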

Results

Six studies met the inclusion criteria (Table 1). Four studies (Bangladesh [12], Morocco [13], Tanzania [14] and Uganda [15]) each had one IMCI intervention arm and one no-IMCI comparison arm, and two studies (Benin [16] and Cambodia [17]) each had two IMCI intervention arms (i.e. IMCI with additional supports, and IMCI without additional supports) and one no-IMCI comparison arm. Thus, there were eight IMCI intervention arms. The six studies included 3136 ill children seen at 547 health facilities. The duration of IMCI training courses ranged from 11 to 16 days. The definitions of health worker performance indicators varied by study; however, most used standard indicators recommended by the World Health Organization [11].

Table 1.

Summary of studies included in the analysis

Columns: sample sizes (nHF, nHW, npatient); number of indicators in each category (assessment, diagnosis, treatment, counseling); percentage of indicators that were statistically significanta; reference details.

| Country [reference] | nHF | nHW | npatient | Assessment | Diagnosis | Treatment | Counseling | Significanta (%) | Reference details |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bangladesh [12] | 20 | NA | 176 | 5 | 1 | 4 | 4 | 93 | Table 4, 2005 results only, rows 4–8, 11, 13, 15–17, 19–21, 23 |
| Benin comparison 1b [16] | 101 | 196 | 790 | 18 | 4 | 4 | 6 | 94 | Supplementary data, Table S1 |
| Benin comparison 2b [16] | 101 | 218 | 893 | 18 | 4 | 4 | 6 | 75 | Supplementary data, Table S1 |
| Cambodia comparison 1b [17] | 80 | NA | 330 | 5 | 0 | 3 | 2 | 80 | Tables 4–7, 9–14 |
| Cambodia comparison 2b [17] | 80 | NA | 282 | 5 | 0 | 3 | 2 | 80 | Tables 4–7, 9–14 |
| Morocco [13] | 62 | 101 | 271 | 6 | 1 | 2 | 4 | 100 | Table 1, rows 1–4, 6–8, 11–12, 14–17; complement of indicator in row 12c |
| Tanzania [14] | 73 | NA | 404 | 5 | 1 | 7 | 5 | 94 | Table 2, rows 1–6, 8–19 |
| Uganda [15] | 158 | 211 | 686 | 1 | 1 | 4 | 5 | 91 | Table 2, 2002 results only, rows 1, 2, 4–12 |
| Totald | 547 | NA | 3136 | 63 | 12 | 31 | 34 | 88 | |

  • NA, not available; nHF, number of health facilities; nHW, number of health workers; npatient, number of patients.

  • aBased on what the authors reported in the article. For the Cambodia study, a result was considered statistically significant if the indicator's 95% confidence intervals for the IMCI group and no-IMCI group did not overlap. For the Benin study, results are shown in Supplementary data, Table S1.

  • bStudy had three arms: (i) IMCI-trained health workers with additional supports, (ii) IMCI-trained workers with no additional supports and (iii) workers without IMCI training. Comparison 1 compares workers who received IMCI with additional supports versus the no-IMCI group, and Comparison 2 compares IMCI without additional supports versus no IMCI. The additional support in the Cambodia study was extra supervision. The additional supports in the Benin study were extra supervision, job aids and non-financial incentives. Note that in Bangladesh and Tanzania, IMCI was also implemented with additional supports; but all IMCI-trained workers potentially benefited from the additional supports (i.e. these studies only had two arms: IMCI with additional supports, and a no-IMCI group).

  • cIndicator: oral antibiotic prescribed without IMCI indication. With this definition, a decrease in the indicator shows improvement. The complement of the indicator (i.e. 100% − indicator value) was used, so that an increase shows improvement.

  • dThe total number of facilities and patients is less than the sum of the nHF and npatient columns because the Benin and Cambodia studies used the same facilities and patients in the no-IMCI comparison group; the totals avoid this double-counting.

Altogether, 140 indicators were included in the analysis: 63 assessment, 12 diagnosis, 31 treatment and 34 counseling indicators. Most (123/140 or 87.9%) indicator estimates were statistically significant (i.e. IMCI effect sizes significantly >0).

Mean effect sizes for the assessment, diagnosis, treatment and counseling indicator categories were 50.9, 44.7, 36.5 and 46.6%-points, respectively (Table 2). An unadjusted model (Table 3, Model 1) showed that, compared with the mean assessment effect size, the treatment mean was 14.3%-points lower (P = 0.0071), with no significant differences for the diagnosis and counseling means.

Table 2.

Descriptive statistics of effect sizes of the IMCI strategy and baseline indicator values

| Statistic | Assessment | Diagnosis | Treatment | Counseling | All categories combined |
| --- | --- | --- | --- | --- | --- |
| Number of indicators | 63 | 12 | 31 | 34 | 140 |
| Unadjusted effect sizes (%-points) | | | | | |
|  Mean | 50.9 | 44.7 | 36.5 | 46.6 | 46.1 |
|  95% confidence interval | 44.5, 57.3 | 30.4, 59.0 | 28.7, 44.3 | 38.4, 54.9 | 42.1, 50.2 |
|  Minimum, maximum | −2.2, 95.2 | 17.1, 87.1 | 3.5, 83.0 | 3.7, 86.0 | −2.2, 95.2 |
| Effect sizes adjusted for baseline indicator valuea (%-points) | | | | | |
|  Mean | 53.3 | 45.9 | 38.1 | 40.4 | |
|  95% confidence interval | 48.5, 58.0 | 35.0, 56.8 | 31.2, 44.8 | 33.8, 47.0 | |
| Baseline indicator values (%) | | | | | |
|  Median | 11.8 | 15.5 | 21.0 | 6.3 | 11.0 |
|  Mean | 24.7 | 22.9 | 23.3 | 11.0 | 20.9 |
|  95% confidence interval | 17.6, 31.7 | 8.8, 36.9 | 16.5, 30.1 | 6.2, 15.8 | 17.0, 24.8 |
|  Minimum, maximum | 0, 96.3 | 0, 58.1 | 0, 69.2 | 0, 56.0 | 0, 96.3 |

  • aThe mean effect size for each indicator category was calculated with Model 2 results (see Table 3) while holding the baseline value constant at its overall mean of 20.9% (Table 2, last column of the baseline ‘Mean’ row).
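As a worked check of footnote a, using the rounded Model 2 coefficients reported in Table 3: the adjusted counseling mean is the intercept plus the counseling contrast plus the baseline slope times the mean baseline, i.e. 66.4 − 12.9 − (0.63 × 20.9) ≈ 40.3%-points, which matches the 40.4 in Table 2 up to coefficient rounding.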

Table 3.

Linear regression modeling results of the association between performance indicator category, baseline indicator value and IMCI effect sizes

| Variable | Parameter estimatea (95% confidence interval) | P-value | Adjusted R² |
| --- | --- | --- | --- |
| Model 1: Indicator categories only | | | 3.2% |
| Intercept | 50.9 (45.0, 56.8) | <0.0001 | |
| Dummy variables that code the four indicator categories (reference: assessment category) | | | |
|  Diagnosis indicator category | −6.2 (−21.0, 8.6) | 0.41 | |
|  Treatment indicator category | −14.3 (−24.6, −4.1) | 0.0071 | |
|  Counseling indicator category | −4.3 (−14.2, 5.7) | 0.40 | |
| Model 2: Indicator categories and baseline indicator values | | | 37.6% |
| Intercept | 66.4 (60.5, 72.3) | <0.0001 | |
| Dummy variables that code the four indicator categories (reference: assessment category) | | | |
|  Diagnosis indicator category | −7.3 (−19.2, 4.5) | 0.23 | |
|  Treatment indicator category | −15.2 (−23.4, −6.9) | 0.0004 | |
|  Counseling indicator category | −12.9 (−21.1, −4.6) | 0.0027 | |
| Baseline, per %-point increase in baseline indicator value | −0.63 (−0.77, −0.49) | <0.0001 | |
| Model 3: Indicator categories and baseline indicator values, adjusted for study site | | | 51.2% |
| Intercept | 45.5 (33.5, 57.5) | <0.0001 | |
| Dummy variables that code the four indicator categories (reference: assessment category) | | | |
|  Diagnosis indicator category | −5.9 (−16.5, 4.7) | 0.28 | |
|  Treatment indicator category | −15.8 (−23.4, −8.1) | <0.0001 | |
|  Counseling indicator category | −11.6 (−19.2, −4.1) | 0.003 | |
| Baseline, per %-point increase in baseline indicator value | −0.58 (−0.71, −0.45) | <0.0001 | |
| Dummy variables that code the eight intervention arms (reference: Uganda) | | | |
|  Bangladesh | 32.9 (19.1, 46.7) | <0.0001 | |
|  Benin (IMCI with additional supports) | 23.5 (11.4, 35.5) | 0.0002 | |
|  Benin (IMCI without additional supports) | 9.3 (−2.7, 21.4) | 0.13 | |
|  Cambodia (IMCI with additional supports) | 21.8 (7.0, 36.6) | 0.0045 | |
|  Cambodia (IMCI without additional supports) | 14.1 (−0.7, 28.8) | 0.065 | |
|  Morocco | 25.6 (11.7, 39.4) | 0.0004 | |
|  Tanzania | 29.4 (16.5, 42.2) | <0.0001 | |

  • IMCI, the Integrated Management of Childhood Illness strategy.

  • aThe units of all parameter estimates are absolute %-points. For example, in Model 1, the mean IMCI effect size for diagnosis indicators is 6.2%-points lower than the mean for assessment indicators, although this result was not statistically significant (P = 0.41).

After adjusting for baseline indicator value (Table 3, Model 2), compared with the assessment mean, the diagnosis mean was 7.3%-points lower (P = 0.23), the treatment mean was 15.2%-points lower (P = 0.0004) and the counseling mean was 12.9%-points lower (P = 0.0027). The confounding effect of baseline on the indicator category–effect size association can be explained by the strong negative association between baseline and effect size (Fig. 1) and the relatively low baseline values for counseling indicators (Table 2, baseline median and mean rows). Thus, after adjusting for the low counseling baseline values, the counseling mean effect size decreased (Table 2, adjusted mean row). Results from Model 3, which adjusts for baseline and study site, are very similar to those of Model 2, although Model 3 might be over-specified, given the relatively small size of the dataset.

Figure 1.

The relationship between effect size and baseline indicator value

Adjusted R² values (Table 3, last column) indicated that the model with only the indicator categories (Model 1) explained little (3.2%) of the variation in effect sizes, whereas the model with indicator categories and baseline (Model 2) explained 37.6% of the variation.

Results of the analysis that used the log OR as the effect size were similar to those in the main analysis (which used risk differences). In a model with only indicator categories, the indicator category variables were weakly associated with effect size. However, when adjusted for baseline indicator value (which had a strong negative association with the log OR), the indicator category–effect size association strengthened.

The analysis that used the PRDR as the effect size was complicated by two extreme outliers (PRDR values less than –100%), which were caused by a high baseline value (and thus very low baseline ‘defect’ rate) and the no-IMCI group slightly outperforming the IMCI group. With these outliers removed, the results showed no association between PRDR and baseline; and in a model with only indicator categories, the associations between indicator categories and effect size were highly significant. Notably, in analyses of both the log OR (adjusted for baseline) and PRDR, the results generally matched the hypothesized hierarchy of effect sizes: assessment > diagnosis and counseling > treatment.

Practically speaking, how would one use these results to improve the validity of comparisons among interventions? Consider the simple example in which a linear regression meta-analysis is performed to compare the mean effect sizes of eight mutually exclusive interventions (e.g. training, supervision etc.). Assume that effect sizes are defined as risk differences and that all studies have indicators that fall into the four categories discussed in this report. As shown below, a model could be constructed with an intercept (β0), seven dummy variables to code the eight interventions (β1–β7), three dummy variables to code the four indicator categories (β8–β10) and a term for the baseline indicator value (β11). The primary results of the model are β1–β7; the other four terms are intended to reduce confounding by indicator category and baseline:

effect size = β0 + β1·I1 + … + β7·I7 + β8·C1 + β9·C2 + β10·C3 + β11·(baseline) + error,

where I1–I7 are the intervention dummy variables and C1–C3 are the indicator category dummy variables.
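A minimal sketch of fitting this comparison model, under the same assumed tooling as the earlier sketch and with hypothetical data (only two interventions here, purely for illustration):

```python
# Sketch of the comparison model written out above, with hypothetical data;
# each row is one indicator-level effect size from one intervention's study.
import pandas as pd
import statsmodels.formula.api as smf

meta = pd.DataFrame({
    "effect_size": [12.0, 30.0, 8.0, 22.0, 15.0, 28.0, 5.0, 18.0],
    "intervention": ["training", "supervision"] * 4,
    "category": ["assessment", "treatment", "diagnosis", "counseling"] * 2,
    "baseline": [10.0, 40.0, 25.0, 15.0, 30.0, 20.0, 45.0, 12.0],
})

# C(intervention) generates the intervention dummies (beta_1..beta_7 when
# eight interventions are present); C(category) and baseline reduce confounding.
fit = smf.ols("effect_size ~ C(intervention) + C(category) + baseline",
              data=meta).fit()
print(fit.params.filter(like="intervention"))  # the primary results
```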

To illustrate potential results, the above model was explored using the IMCI studies previously discussed, with Uganda as the reference group (Table 4). (Consider that the eight IMCI intervention arms represent eight different interventions, which is partly true, as IMCI implementation did vary among studies.) Column 3 shows estimates of β1–β7 from an unadjusted model, i.e. no adjustment for indicator category or baseline. All ‘study group – Uganda’ differences in mean effect sizes are statistically significant. Unadjusted mean effect sizes are shown in column 7.

Table 4.

An example analysis to illustrate how adjustment for indicator category and baseline might benefit a systematic review

Columns 3–6 show the relative effect size compared with the Uganda reference group, and columns 7–10 show the mean effect size with all indicators combined; all effect sizes are risk differences in %-points. Within each group of four columns, estimates are unadjusted, adjusted for indicator category only, adjusted for baseline only, and adjusted for both indicator category and baseline.

| Country | Mean baseline value, all indicators combined (%) | Relative: unadjusted | Relative: category only | Relative: baseline only | Relative: both | Mean: unadjusted | Mean: category onlya | Mean: baseline onlyb | Mean: both |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bangladesh | 4.6 | 46.7* | 44.8* | 36.7* | 32.9* | 67.5 | 68.6 | 58.6 | 59.5 |
| Benin (IMCI with additional supports) | 22.6 | 30.3* | 25.7* | 30.1* | 23.5* | 51.1 | 49.5 | 52.0 | 50.1 |
| Benin (IMCI without additional supports) | 23.5 | 15.6* | 11.0 | 16.0* | 9.3 | 36.4 | 34.9 | 37.8 | 36.0 |
| Cambodia (IMCI with additional supports) | 20.1 | 27.9* | 25.5* | 26.4* | 21.8* | 48.7 | 49.3 | 48.2 | 48.4 |
| Cambodia (IMCI without additional supports) | 20.1 | 20.1* | 17.8 | 18.6* | 14.1 | 40.9 | 41.6 | 40.5 | 40.7 |
| Morocco | 28.2 | 27.5* | 23.7* | 30.5* | 25.6* | 48.3 | 47.5 | 52.3 | 52.2 |
| Tanzania | 20.3 | 32.7* | 32.3* | 31.3* | 29.4* | 53.5 | 56.1 | 53.2 | 56.0 |
| Uganda | 22.8 | Reference | Reference | Reference | Reference | 20.8 | 23.8 | 21.8 | 26.6 |

  • aMean effect sizes predicted from a model with seven dummy variables for the eight study sites and three dummy variables for the four indicator categories. Indicator category dummy variables were held constant at their mean values (i.e. 0.0857 for classification, 0.2214 for treatment and 0.2429 for counseling).

  • bMean effect sizes predicted from a model with seven dummy variables for the eight study sites and baseline, which was held constant at its mean value of 20.8886%.

  • *P < 0.05 for the test of the difference between the study group's mean effect size and the mean effect size of the reference group (Uganda).

When the model is adjusted for indicator category only (column 4), estimates of β1–β7 decrease and two of the seven results are no longer statistically significant. These changes reflect bias in the unadjusted results caused by studies having distributions of indicator categories that were different from the distribution of the entire dataset. For example, treatment indicators were more often measured in the Uganda study (4 of 11 indicators or 36%; Table 1) than in the entire dataset (31 of 140 indicators or 22%). As treatment indicators tended to have lower effect sizes, one could argue that the Uganda study's relatively high proportion of treatment indicators partly explains why IMCI's effectiveness was relatively low. Indeed, when the model is adjusted for indicator category, the mean effect from the Uganda study increases from 20.8%-points (column 7, last row) to 23.8%-points (column 8, last row).

Similarly, when the model is adjusted for baseline only (column 5) or for indicator category and baseline (column 6), additional changes in the β1–β7 estimates are seen, again reflecting bias in the unadjusted results. In the most extreme case (Benin IMCI without additional supports versus Uganda), the adjustments cause the β-estimate to decrease from 15.6%-points (unadjusted result in column 3, which was statistically significant) to 9.3%-points (adjusted result in column 6, which was non-significant), i.e. a 40% decrease and a change in statistical significance.

Discussion

In reviews on health worker performance, the heterogeneity of performance indicators can complicate comparisons among interventions. If enough studies use the same indicators, then separate indicator-specific analyses could be performed. If multiple indicators are included in the same analysis and indicators vary among studies, then one could adjust for indicator category to reduce bias. This analysis suggests that basing indicator categories on case-management steps might be a useful approach.

Although not all differences were statistically significant, after adjusting for baseline indicator values, the results generally matched the hypothesized hierarchy of effect sizes: assessment > diagnosis and counseling > treatment. In the main analysis, which used risk differences as the effect size measure, the mean effect size for indicators of relatively easier tasks (assessment) was considerably larger (by 13–15%-points) than mean effect sizes for more complex tasks (treatment) and tasks at the end of the consultation when health workers might feel pressure to conclude the patient encounter and move on to the next consultation (counseling). These 13–15%-point differences are greater than typical effect sizes of commonly used interventions such as education [6] and audit and feedback [5, 18]. In other words, the bias introduced by indicator type could be large relative to effect sizes that one might often encounter in a review.

A strength of this analysis is that it includes eight intervention arms from a variety of contexts, all of which examined the same basic intervention. Key limitations are that the indicator categories other than assessment had small sample sizes and that only process indicators of case-management were analyzed (e.g. no indicators of healthcare utilization or health impact). Additionally, this analysis did not explore all types of process indicators. For example, among treatment indicators, there might be differences in effect size depending on whether an intervention was encouraging workers to prescribe useful treatments versus discouraging the use of unnecessary treatments that they routinely give (with the latter behavior potentially more difficult to change).

The main analysis found that effect size decreased by about 0.6%-points for every 1%-point increase in baseline value (P < 0.0001). This inverse relationship between effect size and baseline has been noted elsewhere [7, 18, 19], and it is essentially a manifestation of the law of diminishing marginal returns. At low baseline values, there is much room for improvement; but as baseline increases, the potential for improvement decreases. However, these facts do not fully explain the phenomenon: having more room to improve does not, by itself, explain why health workers at lower baseline levels actually do improve more. Indeed, even if one only considers baseline values less than 20% (i.e. a range with enormous potential for improvement), the association is still significant (P = 0.0047). Perhaps, in some cases, a low baseline is a marker for new practices, and health workers are motivated by novelty. Alternatively, at low baseline levels, there might be more ‘easy’ opportunities to perform the new practice; thus, for a given amount of health worker effort, performance improvements are greater. A fuller explanation of the baseline–effect size relationship might be useful in designing more effective interventions for improving performance. Regardless of the cause, if it is a generalizable phenomenon that quality improvement interventions tend to have larger effect sizes in settings with lower baseline levels, whatever the intervention or the indicators used for evaluation, then perhaps the validity of comparisons could be improved by simply adjusting for baseline values.
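For illustration, the restricted-range check mentioned above could be coded as follows, assuming the indicator-level data frame `df` from the earlier modeling sketch:

```python
# Sketch of the restricted-range check: re-test the baseline-effect size
# association among indicators with baseline values below 20% only.
# `df` is the assumed indicator-level data frame from the earlier sketch.
import statsmodels.formula.api as smf

low = df[df["baseline"] < 20]
fit = smf.ols("effect_size ~ baseline", data=low).fit()
print(fit.pvalues["baseline"])  # the paper reports P = 0.0047 on its data
```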

Regarding choice of effect size, while the OR is often used in meta-analyses and the PRDR had the beneficial property of being independent of baseline, the risk difference measure was selected for the main analysis because it has been used in previous systematic reviews that compare multiple interventions to improve health worker performance and because it is relatively easy to interpret. The public health impact of ORs can be difficult for some policy-makers and program managers to grasp—and the problem is compounded when studies contain pre-intervention indicator values and the effect size is the ratio of ORs. While PRDRs are easier to understand, some individual values can be confusing. For example, in this analysis, among the 24 indicators with a risk difference effect size <20%-points (i.e. small-to-moderate effect sizes), six had very high PRDR values (50–91%).

Although results from a six-study analysis are suggestive, no firm recommendations can be made without confirmation. Perhaps investigators who conduct reviews could use their datasets to explore the question. A robust method for improving the validity of comparisons would be valuable to the field. Additionally, for future intervention trials, researchers should use standard indicators [11, 20] whenever possible to reduce unnecessary heterogeneity.

In conclusion, in systematic reviews on improving health worker performance that include multiple indicators in the same analysis, adjusting the results for indicator category and baseline indicator value might be useful for improving the validity of intervention comparisons.

Author's contributions

A.R. conceived of the study, conducted the analysis and wrote the manuscript.

Conflict of interest statement

None declared.

References
