
Does public disclosure of quality indicators influence hospitals' inclination to enhance results?

Kris H.A. Smolders, A. Lya Den Ouden, Willem A.H. Nugteren, Gerrit Van Der Wal
DOI: http://dx.doi.org/10.1093/intqhc/mzs003. Pages 129–134. First published online: 7 February 2012

Abstract

Objective The recommendation in the national guideline on oesophageal carcinoma of a minimum of 10 resections per year, together with the intervention of the Dutch Health Care Inspectorate, has highlighted hospitals' ‘need to score’ on the public quality indicator for the annual number of oesophageal resections. To determine whether low-volume hospitals are inclined to adjust their numbers, we studied the difference between the reported and actual numbers of oesophageal resections in 2005 and 2006.

Design A retrospective cohort study. Hospitals were asked to submit all operative reports on resections performed in 2005 and 2006. Two pairs of evaluators independently labelled all anonymous operative reports from the selected hospitals as resection or non-resection.

Settings Hospitals in the Netherlands.

Participants Ten hospitals that reported 10 or 11 resections in 2006, or an average of fewer than 10 resections per year in the period 2003–2006.

Interventions None.

Main outcome measure(s) Difference between the reported and actual numbers of oesophageal resections in 2005 and 2006.

Results Oesophageal resection criteria were not met in 7% of the 179 operative reports from the 10 selected hospitals. The difference between the reported and actual numbers of resections in 2005 was not significant, while in 2006 it was. Of the hospitals studied, 70% actually performed fewer resections than they reported.

Conclusion Our results support the assumption that low-volume hospitals are inclined to adjust their numbers when, because outcomes are public, the pressure to report a sufficient number is high. External verification of data is therefore essential when this ‘need to score’ is high.

  • quality indicators
  • measurement of quality
  • hospital care
  • setting of care
  • cancers
  • disease categories

Introduction

In 2003, the Dutch Health Care Inspectorate (inspectorate) [1] introduced a new supervision policy that makes use of public quality indicators [2, 3]. To systematically survey the quality of care, every year the inspectorate asks all Dutch hospitals (∼100) about the care they have provided. The hospitals report their results publicly on the Dutch Hospital Association's website [4]. Publishing results has a number of aims, with the main emphasis being on increasing transparency in health care and hospitals' accountability. When the indicators were introduced, it was also assumed that a public mirror would strengthen the inclination of hospitals and professionals to improve their quality of care and would support consumer choice [5].

Because of public awareness, these indicators can damage reputations or provoke government interference, and may therefore increase the chance of adverse effects. The literature mentions various unintended, negative consequences of public indicators [5–9], such as more defensive medicine, avoidance of difficult patients and decreasing quality in other domains of care because less attention is paid to them. Hospitals may also feel compelled to enhance their results by adjusting their figures [5, 8]. Although there is some evidence of dysfunctional behaviour associated with throughput measures (for example, the ‘hello nurse’ in the UK [6, 8, 9]), the potential unintended, negative consequences of public reporting have been largely unexplored [5, 7]. Despite the lack of evidence, the suggestion of negative effects is widely accepted in many parts of health care.

The inspectorate realizes that unintended, negative consequences could increase due to greater pressure to report a sufficient number in specific areas, especially when medical doctors have to give up part of their practice if they underachieve. One example of such an indicator is the number of performed oesophageal resections that include the proximal part of the stomach (hereafter described as resection). In contrast to the other 20 indicators, this indicator was supported by a guideline that strongly advised a minimum number of surgeries (in relation to the quality of care). The literature shows a consistent relationship between short-term mortality and the number of resections a hospital performs [10–13]. This signifies a better outcome in high-volume hospitals. For example, Lanschot et al. [12] found that hospital mortality for oesophageal resection in the Netherlands differed significantly, from 12.1% in low-volume hospitals to 4.9% in high-volume hospitals.

Every year in the Netherlands, 1500 people are diagnosed with oesophageal cancer [14]; according to the quality indicator (QI) data, ∼600 of these patients undergo an oesophagogastrectomy. The Dutch multidisciplinary guideline ‘Diagnosis and treatment of oesophageal carcinoma’, released in 2005, strongly recommended that these resections be performed only in hospitals that carry out at least 10–20 such procedures each year [15]. For the inspectorate, this guideline was reason to enforce increased centralization of resections. At that time, all 14 hospitals that structurally performed fewer than 10 resections per year between 2003 and 2005 were requested either to stop performing resections or to join forces with other hospitals so that at least 10 resections per year would be performed in future. This resulted in a decrease in low-volume hospitals [from 32 (33%) in 2003 to 22 (23%) in 2006] and an increase in hospitals that perform no resections [from 44 (46%) in 2003 to 53 (55%) in 2006] [16]. Owing to the increasing importance of a higher number of resections, the inspectorate assumed that hospitals would be inclined to enhance their results by adjusting their figures. This hypothesis was supported by a sudden increase in the national total number of resections in 2006: from a steady annual average of 602 between 2003 and 2005 to 652 in 2006. Furthermore, because the inspectorate takes action when hospitals do not meet the guideline's criteria, the actual number of resections per hospital, and therefore the reliability of the data, is crucial. This made it necessary to verify the data.

Methods

To determine whether low-volume hospitals were inclined to enhance their results by adjusting their figures, inspectorate researchers conducted a retrospective cohort study with hospitals that only just met the required number [15]. The number of resections reported in 2005 and 2006 (collected on the hospital association website as part of the inspectorate's indicators [4]) was compared with the actual number of resections performed during these years. We hypothesized there would be little or no difference between the reported number of resections and the number actually performed in 2005, and that there would be a significant difference in 2006.

Two criteria were used to select the Dutch hospitals: the hospital had either reported 10 or 11 resections in 2006, or had reported on average fewer than 10 resections per year during the 4-year period from 2003 to 2006. To determine the actual number of resections performed, the selected hospitals were asked to provide anonymous operative reports for every patient who underwent a resection in 2005 or 2006. The surgeon dictates these operative reports, which contain detailed information on the patient's condition and the step-by-step procedures performed during an operation. When hospitals did not provide the expected number of operative reports, they were asked to provide the missing reports or to explain why they were unable to do so. Obtaining the requested information was not expected to be a problem, because the inspectorate is legally authorized to require institutions to supply information when it suspects a care institution is not providing the necessary quality of patient care [17].

Hospital identities were concealed for the evaluation of the operative reports. Two pairs of evaluators (physicians working as inspectors) were formed to examine and assess the operative reports. Each report was presented to a pair of evaluators (Fig. 1, research design), who independently labelled the report as resection, non-resection or unknown. If there were two matching assessments labelled as resection or non-resection, this was recorded as the final conclusion. If the assessments did not correspond, or if one of the evaluators withheld judgement (with at least one labelled as unknown), the report was presented to the other pair of evaluators. If this evaluation did not result in a final conclusion either, the report in question was presented to a surgical expert, whose judgement was final. An operative report was labelled as a resection when the resection of the oesophagus (including the oesophagogastric junction) was described, as well as the reconstruction of the oesophagus with the stomach or colon. Although the guideline focuses on oesophageal carcinoma, the indication for surgery was not taken into account.

Figure 1

Research design for evaluating operative reports on resections. Each of the two pairs of two evaluators assessed half of the submitted operative reports. The evaluators independently labelled a report as resection, non-resection or unknown. Two matching assessments labelled resection or non-resection were recorded as the final conclusion. If the assessments did not correspond or if one of the evaluators withheld judgement (with at least one labelled as unknown), the report was presented to the other pair of evaluators. If this evaluation did not result in a final conclusion either, the report was presented to a surgical expert, whose judgement was final.
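This adjudication protocol is, in effect, a small decision procedure. The sketch below is purely illustrative: the study itself used human evaluators on paper reports, and the names Label, pair_verdict and adjudicate are ours. It encodes the escalation order described above: first pair, then the other pair, then the surgical expert.

```python
from enum import Enum
from typing import Optional, Tuple

class Label(Enum):
    RESECTION = "resection"
    NON_RESECTION = "non-resection"
    UNKNOWN = "unknown"

def pair_verdict(a: Label, b: Label) -> Optional[Label]:
    # A pair reaches a final conclusion only when both evaluators
    # independently give the same label and neither withholds judgement.
    if a == b and a is not Label.UNKNOWN:
        return a
    return None

def adjudicate(first_pair: Tuple[Label, Label],
               second_pair: Tuple[Label, Label],
               expert: Label) -> Label:
    # Escalation order from the Methods: first pair -> other pair ->
    # surgical expert, whose judgement is final.
    for pair in (first_pair, second_pair):
        verdict = pair_verdict(*pair)
        if verdict is not None:
            return verdict
    return expert

# Example: the first pair is inconclusive, the second pair agrees.
print(adjudicate((Label.RESECTION, Label.UNKNOWN),
                 (Label.RESECTION, Label.RESECTION),
                 expert=Label.NON_RESECTION))  # Label.RESECTION
```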

Analysis

For all analyses, the null hypothesis was evaluated at a two-sided significance level of 0.05. All statistical analyses were performed with Statistical Package for the Social Sciences (SPSS) version 15.0.0 (SPSS, Inc., Chicago, IL). The paired sampled t-test was applied to establish possible differences between the reported and the actual number of resections in 2005 and 2006. To observe the influence of the ‘need to score’ after release of the guideline in 2005, the 10 selected hospitals were divided into two groups. Group 1 consisted of 6 hospitals (A–F) with fewer than 10 resections for 2005, where the increase in reported resections concurred with the knowledge that an insufficient number might have consequences. Group 2 consisted of 4 hospitals (G–J) that reported 10 or more resections for 2005, where the number of resections in 2006 was not unexpected. The difference between the reported number of resections in 2005 and 2006 for these two groups was tested using the independent t-test.
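As an illustration of this analysis, the per-hospital counts later reported in Table 1 are sufficient to reproduce the tests. The following sketch uses Python with SciPy rather than the SPSS package named above, so it is an assumption about tooling, not the authors' actual code; it does, however, yield the mean differences and P-values reported in the Results.

```python
from scipy import stats

# Per-hospital counts for hospitals A-J, taken from Table 1.
reported_2005 = [2, 5, 6, 6, 8, 9, 10, 10, 11, 15]
actual_2005   = [2, 6, 6, 6, 7, 9, 10, 11, 10, 10]
reported_2006 = [11, 12, 11, 14, 12, 14, 10, 10, 10, 11]
actual_2006   = [8, 8, 11, 14, 12, 6, 9, 4, 8, 9]

# Paired-samples t-test: reported versus actual numbers within each year.
for year, rep, act in ((2005, reported_2005, actual_2005),
                       (2006, reported_2006, actual_2006)):
    t, p = stats.ttest_rel(rep, act)
    mean_diff = sum(r - a for r, a in zip(rep, act)) / len(rep)
    print(year, round(mean_diff, 2), round(p, 2))
# 2005: mean difference 0.5, P = 0.38; 2006: mean difference 2.6, P = 0.01

# Independent-samples t-test: change in reported resections
# (2006 minus 2005) for Group 1 (A-F) versus Group 2 (G-J).
change = [b - a for a, b in zip(reported_2005, reported_2006)]
t, p = stats.ttest_ind(change[:6], change[6:])
print(round(t, 2), round(p, 3))  # group difference in increase: 7.58
```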

Results

The 10 selected hospitals reported 82 resections in 2005 and 115 in 2006. The total of 179 submitted operative reports is considerably lower than the expected number of 197. Notably, for 2005, 1 hospital submitted 5 (6%) fewer operative reports and 4 hospitals submitted 5 (6%) more operative reports than expected, whereas for 2006, 4 hospitals submitted 18 (16%) fewer operative reports than expected (Table 1). After a specific reminder to the eight hospitals that had submitted fewer reports than expected, one missing report was provided for 2005, but none for 2006. The hospitals stated that either an error had been made in the number of resections reported on the website, or that the missing operative reports could not be provided because of registration problems.

Table 1

Reported number of resections on the hospital association website, number of submitted operative reports, number of operative reports evaluated as (non-)resections, and the difference between the reported and actual numbers of resections after evaluation of the operative reports, for 2005 and 2006

Hospital IDa                                 A    B    C    D    E    F    G    H    I    J  Total
2005
 Reported number of resections               2    5    6    6    8    9   10   10   11   15     82
 Submitted operative reports                 3    7    6    6    8   10   10   11   11   10     82
 Non-resection based on operative report     1    1    0    0    1    1    0    0    1    0      5
 Resection based on operative report         2    6    6    6    7    9   10   11   10   10     77
 Difference (actual − reported)              0    1    0    0   −1    0    0    1   −1   −5     −5
2006
 Reported number of resections              11   12   11   14   12   14   10   10   10   11    115
 Submitted operative reports                11   10   11   14   12    6   10    4   10    9     97
 Non-resection based on operative report     3    2    0    0    0    0    1    0    2    0      8
 Resection based on operative report         8    8   11   14   12    6    9    4    8    9     89
 Difference (actual − reported)             −3   −4    0    0    0   −8   −1   −6   −2   −2    −26
  • aGroup 1: hospitals A–F reported fewer than 10 resections for 2005; Group 2: hospitals G–J reported 10 or more resections for 2005.

A total of 178 operative reports from 2005 and 2006 were presented to the evaluators. One additional operative report was submitted after a specific reminder, and was therefore evaluated by the expert directly, who considered it a non-resection. The inter-observer agreement for the two pairs was, respectively, good (κ = 0.74) and moderate (κ = 0.55). After the first evaluation, 154 (86%) operative reports were labelled as resections and 7 (4%) were labelled as non-resections; the assessments for 17 (10%) reports were inconclusive. After these 17 operative reports were presented to the other pair of evaluators, another 9 (5%) were labelled as resections and 3 (2%) as non-resections. The remaining 5 (3%) operative reports were evaluated by the surgical expert: 3 (2%) were labelled as resections and 2 (1%) as non-resections.
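For readers unfamiliar with the statistic: Cohen's kappa measures how far two evaluators' agreement exceeds the agreement expected by chance, κ = (p_o − p_e)/(1 − p_e). A minimal sketch follows; the two label lists in the example are hypothetical, not the study's actual assessments.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Observed agreement corrected for the agreement expected by chance:
    # kappa = (p_o - p_e) / (1 - p_e).
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two evaluators for eight operative reports:
# R = resection, N = non-resection, U = unknown.
rater_1 = ["R", "R", "N", "R", "U", "R", "N", "R"]
rater_2 = ["R", "R", "N", "N", "U", "R", "R", "R"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.53
```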

Consequently, evaluation of all operative reports showed that 13 (7%) did not meet the oesophageal resection criteria: 5 from 2005 (6%) and 8 from 2006 (8%) were labelled as non-resections. Most of these reports described a partial or total gastrectomy, surgery for a complication after a previous oesophageal resection, operative treatment of an oesophageal perforation or an exploratory operation. In four hospitals, all submitted operative reports were labelled as resections; in three hospitals, one report was labelled as a non-resection; and in three hospitals, more than one report was labelled as a non-resection.

After including the additionally submitted operative report and excluding missing and rejected operative reports, the number of resections in 2005 increased by two (in two hospitals) and decreased by seven (in three hospitals). In 2006, 7 hospitals reported a total of 26 more resections than were actually performed. Although the number of resections per hospital is small, these results show a noticeable difference between the reported and actual numbers of resections in 2006. The mean difference between the reported and actual numbers of resections in 2005 was not significant [mean difference: 0.5 (95% CI: −0.73 to 1.73); P = 0.38]. Conversely, a significant difference was found between the reported and actual numbers of resections in 2006 [mean difference: 2.6 (95% CI: 0.66–4.54); P = 0.01].

Between 2005 and 2006, Group 1 hospitals (A–F) showed a significantly greater increase than Group 2 hospitals (G–J), both in the reported and in the actually performed numbers of resections [mean difference in reported increase: 7.58 (95% CI: 4.70–10.47); P = 0.00; mean difference in performed increase: 6.58 (95% CI: 1.33–11.84); P = 0.02]. The mean difference between the reported and performed numbers was similar in both groups. The only hospitals that reported a correct number for 2006 were three Group 1 hospitals that actually performed 10 or more resections.

Discussion

This study showed that the majority (70%) of the hospitals that reported a threshold number of resections did not actually perform the required number in 2006. However, the number of resections reported for 2005, before the guideline was accepted and the inspectorate announced enforcement, had no predictive value. Five of the six Group 1 hospitals that reported an increase between 2005 and 2006 showed a real increase in the number of resections actually performed in 2006, but only three of them increased their number enough to reach the required minimum. In the Group 2 hospitals, the decrease in actually performed resections was larger than reported, and these hospitals did not reach the threshold. Although the guideline concentrates on oesophageal carcinoma, we did not exclude operative reports based on non-oncological indications for resection. However, the 16 operative reports without the indication ‘oesophageal carcinoma’ were more often labelled as non-resections (63% versus 2% of the operative reports with this indication). A total of 93% of all submitted operative reports were labelled as resections. If the indication for surgery had been a reason for rejecting the submitted operative reports, the percentage of hospitals that enhanced their numbers would have been higher.

Whereas the increase in missing operative reports (from 6% in 2005 to 16% in 2006) may raise questions about the integrity of the hospitals' reported results, the share of submitted reports that did not concern an actual resection was consistent over the years: 6% in 2005 and 8% in 2006 did not describe the type of resection they were assumed to deal with.

The difference between the reported and actual numbers of resections is quite small. However, owing to the threshold effect of the limit dictated in the guideline, the effect of adjusting the numbers would be substantial if it went undetected, because the inspectorate only takes action against hospitals that perform fewer than 10 resections per year.

The study might have been more persuasive if it had also included hospitals with large numbers of resections. The data in this study were collected for the purposes of supervision, whereby the inspectorate forced hospitals with an actual number of fewer than 10 resections (in 2005 or 2006) to take action. It is mandatory for hospitals to provide the inspectorate with information about the care they provide. However, partly to minimize the administrative burden for hospitals, the inspectorate is not allowed to request information from hospitals without reason. Therefore, hospitals with a high volume of resections could not be audited in the same fashion. Nevertheless, we would expect fewer high-volume hospitals to adjust their figures, because the ‘need to score’ decreases as the number of resections increases.

After the first round of evaluation, the assessments differed for 16 (9%) operative reports and another 18 (10%) were labelled as unknown at least once. After the second round of evaluation of these same operative reports (by the other pair of evaluators), the assessments of five (3%) reports were once again inconclusive, and these were presented to the surgical expert for a final conclusion. The inter-observer reliability differed between the first and second pairs of evaluators. The moderate inter-observer reliability of the second pair can largely be explained by its set of operative reports, which contained a higher number that were hard to evaluate. For example, four of the five operative reports that were ultimately presented to the surgical expert because they had been labelled as unknown had first been evaluated by the second pair of evaluators. Producing a matching assessment for these operative reports also turned out to be too difficult for the other pair of evaluators.

Based on the years 2003–2005, the average total number of resections in the Netherlands is 602 per year. Except for the peak in 2006 (which can be explained in part by the over-reporting of the hospitals in our study), the total number of resections hardly changed over the years. It is therefore very unlikely that negative effects other than the adjusting of numbers have occurred. Because no operative reports contained aberrant or incorrect indications for resection, there was no reason to assume that the number was inflated by falsified operative reports or needless surgery.

Hospital administrators collect and report quality data on the public website using information from medical professionals, operative reports and hospital information systems. The data reported on the website become public only after a hospital's executive board has (electronically) signed them as correct. The hospitals are responsible for providing accurate data; there is no external quality check of these data. The inspectorate stresses that it is the health-care providers themselves who are primarily responsible for data quality within the health-care system [18]. These data can also be accessed by consumers, consumer organizations and the media. Therefore, these public data are used for the annual ranking of hospitals and to notify consumers about (alleged) poor outcomes. There is a substantial ‘need to score’, because hospitals' reputations are at stake. Furthermore, if fewer than 10 resections are performed per year, a hospital can lose a very complex and therefore prestigious procedure. There is also a great deal of pressure within the profession to keep the quality of care high by adhering to the guideline. In the Netherlands, the development of guidelines (and with this, the introduction of regulatory limits) is initiated by representatives of the professional associations.

More and more, the Dutch government is seeking to centralize specialized hospital care. Also, various associations of medical professionals are convinced that centralization will result in better quality of care. The publication of the indicators for oesophageal resections seems to have had the desired effect on centralization. Gradually, fewer and fewer hospitals have been performing this type of resection, while (with the exception of 2006) the total number of reported resections has remained stable. In some regions, care has been reallocated by assigning resections to one hospital and procedures like the Whipple to another. The increasing number of resections per hospital signifies a further centralization of oesophageal resections in the Netherlands.

We expect that the results of our study can be generalized to other volume indicators, both for high-risk surgery and for other complex procedures. Encouraged by the government, the Netherlands Association of Surgeons has started various initiatives to establish more standards for these kinds of surgery. The proof of the pudding will come next year, when hospitals have to report data on a new QI for resections of the pancreas.

Conclusion

Our results support the assumption that low-volume hospitals are inclined to enhance their numbers when, because outcomes are made public, pressure to increase the number reported is high. In 2005, before the introduction of the guideline and intervention by the inspectorate, no significant difference was found between the reported number of resections and the number of submitted operative reports. Conversely, a significant difference was found when, in 2006, it became clear that a low volume was no longer acceptable, indicating a ‘need to score’. In conclusion, external verification of data is essential when the ‘need to score’ is high (e.g. because of public attention).

Funding

This study was commissioned by the Dutch Health Care Inspectorate.

Acknowledgements

We would like to thank Jan Haeck, Jan Maarten van den Berg and Annette de Bruijne-Dobben for their medical expertise and evaluation skills. We would like to thank Gidi Smolders and Patricia and Nick Swift for their thoughtful review of the manuscript and for their suggestions. None of these persons were compensated for contributing to the study.

References
