
Evaluation of a pilot surgical adverse event detection system for Italian hospitals

Caterina Caminiti, Francesca Diodati, Donatella Bacchieri, Paolo Carbognani, Paolo Del Rio, Elisa Iezzi, Dante Palli, Isabella Raboini, Erica Vecchione, Luca Cisbani
DOI: http://dx.doi.org/10.1093/intqhc/mzr088, pp. 114-120. First published online: 24 January 2012


Objective To devise an adverse event (AE) detection system and assess its validity and utility.

Design Observational, retrospective study.

Setting Six public hospitals in Northern Italy including a Teaching Hospital.

Participants Eligible cases were all patients with at least one admission to a surgical ward, over a 3-month period.

Interventions Computerized screening of administrative data and review of flagged charts by an independent panel.

Main Outcome Measures Number of records needed to identify an AE using this detection system.

Results Out of the 3310 eligible cases, 436 (13%) were extracted by computerized screening. In addition, out of the 2874 unflagged cases, 77 randomly extracted records (3%) were added to the sample, to measure unidentified cases. Nursing staff judged 108 of 504 (21%) charts positive for one or more criteria; surgeons confirmed the occurrence of AEs in 80 of 108 (74%) of these. Compared with random chart review, the number of cases needed to detect an AE, with the computerized screening suggested by this study, was reduced by two-thirds, although sensitivity was low (41%).

Conclusions This approach has the potential to allow the timely identification of AEs, so that interventions can be devised quickly. This detection system could be of true benefit for hospitals that intend to assess their AEs.

  • adverse events
  • patient safety, quality improvement
  • quality management, safety indicators
  • surgery
  • teamwork
  • audit
  • external quality assessment, medical errors


Introduction

Adverse events (AEs) represent a serious problem; they affect nearly 1 in 10 hospitalized patients, and a substantial proportion of them are preventable [1]. It is increasingly accepted that most medical errors are not due to the ignorance or negligence of a single individual, the so-called ‘person approach’, but that the causes of errors must be sought within the system, towards which improvement interventions should be directed (the ‘systems approach’) [2]. This latter approach should be systematic, and should include the analysis of healthcare problems, selection of evidence-based interventions involving all stakeholders, context analysis for the identification of barriers and facilitators for change, and program testing and performance measurement [3, 4].

Different detection strategies aimed at identifying AEs in hospitals have been reported in the literature. Thomas and Petersen [5] examined as many as eight AE measurement methods, specifying that these were only the most commonly used out of a wide range of strategies applied in different settings. Each method had advantages and limitations, suggesting that the choice of method should be determined by various factors, such as aims, context and resource availability. Clinical documentation review seems to be the most widely used method for AE identification in hospitals. Despite its known limitations, chart review is very easy to implement and allows data collection to be planned; it is thus generally favored by staff [6].

In most studies, the clinical records to be reviewed are randomly selected [7–12], or all records from a given period are considered [13, 14]. In both strategies, without an initial screening of cases with potential AEs, the number of records to be examined in order to detect AEs must be very high, making investigations extremely time consuming and cumbersome. More recently, the use of indicators applied to administrative data, to screen cases with potential AEs for chart review, has been proposed [15].

This study intended to take advantage of both computerized screening methods and medical record review, with three objectives:

  1. devise an AE detection system;

  2. assess its validity;

  3. assess its utility (to what extent computerized screening improves efficiency).

The findings will determine whether this system could be introduced into routine practice for timely detection of AEs, allowing for the identification of corrective measures.



Methods

The following definitions are provided to enable comparison of the results with other international research data.

In this study, positively screened cases, or flagged cases, are cases selected by computerized screening because they exhibited one or more indicators. Unflagged cases are cases that were not selected by computerized screening.

An AE was defined as an injury that was caused by medical management (rather than the underlying disease) and that prolonged hospitalization, produced a disability at the time of discharge, or both. Causation is the degree to which the reviewer is confident that the event was caused by healthcare management, i.e. that an AE had occurred. Preventable AEs were those that would not have occurred if the patient had received ordinary standards of care, appropriate for the time of the study [6, 9]. For each AE found in this study, the degree of confidence in preventability was assigned on a six-point scale, and events were judged preventable if they received a confidence score of 4 or higher, as indicated by the Harvard Study [9].

The reference standard is the clinical chart review method used to confirm the occurrence of an AE. Findings on the comparison between computerized screening and the reference standard are described in terms of: sensitivity, the ability to correctly flag the cases with an AE; specificity, the ability to correctly leave unflagged the cases without an AE; positive predictive value (PPV), the proportion of flagged cases that were confirmed as having an AE; and negative predictive value (NPV), the proportion of unflagged cases that were confirmed as not having an AE.
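As a compact restatement of these four measures, they can be written in terms of a 2 × 2 table. The symbols TP, FP, FN and TN are introduced here for illustration only and are not the authors' notation: TP denotes flagged cases with a confirmed AE, FP flagged cases without an AE, FN unflagged cases with an AE and TN unflagged cases without an AE.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\[4pt]
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}.
\end{align*}
```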

System development

The study was conducted in six public hospitals within the Northern Italian Provinces of Parma and Piacenza, which volunteered to participate: the Parma Teaching Hospital (i.e. a hospital with full-time core residency training programs in medicine and surgery, 1334 beds), two Community Hospitals in the Parma Province (293 and 118 beds) and three Community Hospitals in the Piacenza Province (130, 178 and 524 beds). The geographical proximity of the two Provinces facilitated the conduct of the study. Data collection took place at the coordinating center, at the Parma Teaching Hospital. Since a large proportion of AEs occur in surgical wards (up to 60% according to different studies [1]), the investigation was limited to general surgical units.

The first phase of the study concerned the development of a computerized method for the screening of discharge summaries containing possible AEs. For this purpose, the working group, made up of three surgeons, four nurses and one epidemiologist, analyzed the literature to identify indicators which could be used to detect AEs in the hospital. Three sets of indicators were found: those used in the Harvard Study [9], the limited adverse occurrence screening by Wolff [13] and the Patient Safety Indicators (PSIs) developed by AHRQ [16, 17]. Although the first two sets of indicators were designed to be applied to clinical records, whereas the third is a screening tool for administrative data, the content of the three sets often overlapped. After eliminating redundancies, a list of 29 possible indicators was completed, and each item was reviewed and discussed by the working group, resulting in the selection of seven indicators relevant to surgical patients and judged most suitable according to the discharge summary coding rules used at the participating hospitals. The selected indicators are the following [13, 16]:

  1. Death—in-hospital death of surgical patients aged <75 years.

  2. Transfer to intensive care—surgical cases transferred from a general ward to the intensive care unit during the same hospitalization.

  3. Return to operating theater—surgical cases returning to the operating room within 7 days during the same hospitalization.

  4. Unplanned readmission—unplanned readmissions for surgical cases within 28 days of discharge, also to a different hospital.

  5. Length of stay (LOS)—surgical cases with LOS > 21 days.

  6. Postoperative PE or DVT—cases of deep vein thrombosis or pulmonary embolism in surgical discharges.

  7. Postoperative respiratory failure—cases of acute respiratory failure in elective surgical discharges.

The chosen indicators were used for the development of a computer program based on SAS software, by the Agenzia Sanitaria e Sociale Regionale of the Emilia Romagna Region, designed to screen discharge summaries to identify possible AEs using ICD-9-CM diagnosis codes. A hospital discharge case was flagged if at least one of the indicators scored positive based on the algorithms defined by their respective authors [13, 16]. A file containing all flagged cases, and a random sample of cases not identified by computerized screening (see the ‘Sample size’ section) was sent to the coordinating center for chart retrieval and review.
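The flagging rule described above (a case is flagged when at least one of the seven indicators is positive) can be sketched as follows. This is only an illustration and not the SAS program used in the study: the `Discharge` record and all of its field names are hypothetical, the ICD-9-CM code matching is reduced to pre-computed boolean fields, and treating "within 7 days" and "within 28 days" as inclusive limits is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Discharge:
    """Hypothetical discharge-summary record; every field name is illustrative."""
    age: int
    died_in_hospital: bool
    transferred_to_icu: bool
    days_to_return_to_theater: Optional[int]      # None if no return to theater
    days_to_unplanned_readmission: Optional[int]  # None if no unplanned readmission
    length_of_stay: int                           # days
    has_pe_or_dvt_code: bool                      # assumed derived from ICD-9-CM codes
    elective: bool
    has_acute_respiratory_failure_code: bool      # assumed derived from ICD-9-CM codes


def positive_indicators(d: Discharge) -> List[str]:
    """Return the names of the screening indicators that are positive for a case."""
    hits = []
    if d.died_in_hospital and d.age < 75:
        hits.append("Death")
    if d.transferred_to_icu:
        hits.append("Transfer to intensive care")
    # Inclusive day limits are an assumption of this sketch.
    if d.days_to_return_to_theater is not None and d.days_to_return_to_theater <= 7:
        hits.append("Return to operating theater")
    if d.days_to_unplanned_readmission is not None and d.days_to_unplanned_readmission <= 28:
        hits.append("Unplanned readmission")
    if d.length_of_stay > 21:
        hits.append("LOS > 21 days")
    if d.has_pe_or_dvt_code:
        hits.append("Postoperative PE or DVT")
    if d.elective and d.has_acute_respiratory_failure_code:
        hits.append("Postoperative respiratory failure")
    return hits


def is_flagged(d: Discharge) -> bool:
    """A case is flagged if at least one indicator is positive."""
    return len(positive_indicators(d)) > 0


# Invented example case: a 68-year-old with a 25-day stay and no other indicator.
example = Discharge(
    age=68, died_in_hospital=False, transferred_to_icu=False,
    days_to_return_to_theater=None, days_to_unplanned_readmission=None,
    length_of_stay=25, has_pe_or_dvt_code=False, elective=True,
    has_acute_respiratory_failure_code=False,
)
print(positive_indicators(example), is_flagged(example))
```

Returning the list of positive indicators, rather than a single boolean, mirrors the note under Table 3 that one case may be eligible for more than one indicator.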

The second phase of the study was the identification of a tool to be used for the two-stage (nursing and medical) manual chart review for the confirmation of AEs, following the methodology outlined by the Harvard Study [9]. The working group decided to employ the instruments developed by Michel et al. [6], which were kindly sent to us by the author, translated into Italian and adapted to the Italian healthcare system. The tool consisted of two questionnaires: the detection sheet, comprising 17 criteria and used by nurses to judge the presence of possible AEs, and the confirmation sheet, used by clinicians to confirm AEs and rate the degree of causation and preventability of each event.

After defining the instruments for the study, the project's Research Team was created, consisting of the nurses and clinicians who would be responsible for the manual chart review. The Research Team was made up of 10 nurses and 7 surgeons, including the healthcare professionals of the working group, and was representative of all participating institutions. The number of professionals was determined based on estimates of the number of charts to be reviewed and of the time required for the chart review. The nurses were employees of their institution's administration offices, and had experience of clinical record coding. The physicians were highly motivated surgeons mainly working in clinical practice. A 3-day training program was provided to all members of the Research Team, between December 2007 and January 2008, with the aim of ensuring uniformity in judgment and minimizing interpretation bias.

During the course, reviewers were introduced to the project and to the tools adapted from Michel et al. [6]. Training included simulations of the chart review process, both on an individual basis, with each person reviewing the same chart before and after receiving instructions, and in groups consisting of one nurse and one clinician, where each group examined the same 10 charts followed by discussion of results. The charts selected for training concerned events that were difficult to evaluate.

System evaluation

Findings of screening were validated using clinical records as the reference standard. Charts abstracted by the centralized software were reviewed in a two-stage process similar to the one used by the Harvard Study: charts of flagged cases were first reviewed by nurses of the Research Team, who identified those containing potential safety-related events. The selected records were then submitted to the physician reviewers of the Research Team, who were asked to confirm the presence of AEs and to rate the degree of causation and preventability of each event. To prevent diagnostic review bias, nursing and medical reviewers were blinded to the output of computerized screening, i.e. they did not know whether the charts they reviewed had been flagged as containing potential AEs. The two-stage process, initially used in the Harvard Study [9] and then adopted in many subsequent studies of this kind, is appealing because selection by qualified nurses saves more costly physician time while still enabling surgeons to examine the clinical details for causes and prevention strategies.

To avoid corporate bias, reviewers did not examine charts of patients hospitalized at the institution where they operated, and the names of the surgeons were deleted from the chart photocopies used during the review.

Discharge summaries of all patients with at least one admission in a surgical ward of the participating institutions, during the index period (3 months), were included in the investigation (eligible cases); no exclusion criteria were applied, to ensure sample representativeness.

The review of flagged charts required a total of 215 h/nurse and 96 h/physician; during each meeting, abstracted records relative to patients discharged in the previous 2 weeks were reviewed.

To test reliability, calculated for the presence of AEs, 10% of records were randomly selected and resubmitted 3 months after the first review, both to the same reviewer (intrarater reliability) and to a different reviewer (interrater reliability). Reviewers were blinded to the outcome of the first assessment. Overall, κ coefficient values for the measurement of agreement between reviewers were good (Table 1), better than those recorded in studies using a similar methodology [10].

Table 1

κ coefficient values for the measurement of agreement between reviewers

| Measure | κ | Classification | 95% CI |
|---|---|---|---|
| Nursing interrater reliability | 0.59 | Moderate | 0.53–0.65 |
| Nursing intrarater reliability | 0.75 | High | 0.69–0.81 |
| Medical interrater reliability | 0.50 | Moderate | 0.41–0.59 |
| Medical intrarater reliability | 0.86 | Almost perfect | 0.77–0.95 |
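For readers unfamiliar with the κ coefficient, the following minimal sketch shows how chance-corrected agreement on a binary judgment (AE present or absent) can be computed. It assumes Cohen's κ for two ratings of the same charts; the paper does not state which variant was used, and the toy judgments below are invented purely for illustration.

```python
from typing import Sequence


def cohen_kappa(first: Sequence[bool], second: Sequence[bool]) -> float:
    """Cohen's kappa for two sets of binary judgments on the same charts."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both ratings must cover the same non-empty set of charts")
    n = len(first)
    # Observed agreement: proportion of charts with identical judgments.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from each rating's marginal rate of "AE present".
    p_first = sum(first) / n
    p_second = sum(second) / n
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented toy data: four charts judged twice (True = AE present).
first_review = [True, True, False, False]
second_review = [True, False, False, False]
kappa = cohen_kappa(first_review, second_review)
print(f"kappa = {kappa:.2f}")
```

The same function covers both the intrarater case (same reviewer, two time points) and the interrater case (two different reviewers), since only the pairing of judgments per chart matters.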

Statistical analysis

Sample size

Because the study is descriptive in nature, the sample size was calculated based on practical considerations, since the literature provides no indications on the efficiency of a system of this kind. The sample size was estimated considering an expected accuracy of 0.75 and a minimal acceptable lower confidence limit of 0.65 [18]; the necessary number of cases (with confirmed AE) was 262. Assuming a prevalence value for the study population between 5 and 10%, a total sample of ∼4000 records, corresponding to 3 months of collection, was necessary.
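The step from 262 confirmed cases to a total sample in the region of 4000 records can be illustrated with simple arithmetic. The sketch below reflects one plausible reading of the text (records needed = confirmed cases divided by prevalence) and is not the authors' calculation; the function name is hypothetical, and the derivation of the figure 262 itself [18] is not reproduced.

```python
def records_needed(confirmed_cases: int, prevalence_percent: int) -> int:
    """Smallest number of records expected to contain `confirmed_cases` AEs."""
    if not 0 < prevalence_percent <= 100:
        raise ValueError("prevalence_percent must be in (0, 100]")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-confirmed_cases * 100 // prevalence_percent)


at_low_prevalence = records_needed(262, 5)    # prevalence of 5%
at_high_prevalence = records_needed(262, 10)  # prevalence of 10%
print(at_high_prevalence, at_low_prevalence)
```

Under this reading, the 5–10% prevalence range brackets the stated figure of about 4000 records.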

In addition to flagged records, to measure the proportion of false negatives and estimate the total prevalence of AEs (pretest probability), 3% of unflagged cases were abstracted at the same time, stratified by hospital.

Data analysis

Continuous data were reported as medians with interquartile ranges (25th–75th percentiles). Dichotomous variables were compared using the two-sided χ2 test, for the evaluation of patient characteristics.

All performance parameters were determined inferentially, by extrapolating the findings in the unflagged sample (3%) to the total of unflagged cases. Point estimates and 95% confidence intervals for the measures of diagnostic performance were determined.

SAS software release 8.2 was used for the statistical analysis.


Results

Figure 1 depicts the study's flow diagram. Discharge summaries of all patients hospitalized in surgical wards between February and April 2008 were screened by the computer program every 2 weeks.

Figure 1

Flow diagram. Numbers of medical records are shown.

Out of the 3310 eligible cases, 436 (13%) were abstracted. Out of 2874 unflagged cases, 77 randomly abstracted records (3%) were added to the sample to measure diagnostic performance. Missing data concerned 8 of 436 (2%) flagged charts and 1 of 77 (1%) unflagged charts; they could not be retrieved either because patients had been readmitted and were currently hospitalized or because they still had not been signed and archived. Nursing staff identified 108 of 504 charts (21%) as positive for one or more criteria and submitted these to medical review. Surgeons confirmed the occurrence of AEs in 80 charts, corresponding to 74% (80 of 108) of nurse-flagged cases and 16% (80 of 504) of all reviewed records.

Demographic and clinical features of all eligible cases, of flagged and unflagged cases, as well as of cases confirmed by nurse and physician review are reported in Table 2. Seventy-eight percent of patients under study were cases from the Parma Teaching Hospital; 46% were male, with a median age of 71 years, median LOS of 5 days. The most common major diagnostic category (MDC) was digestive system (MDC 6). Distribution of flagged cases suggested a higher prevalence in the community hospitals, with respect to the total number of eligible subjects in each institution. Notably, 80% of cases confirmed by nurse review and 77% of cases confirmed by physician review were concentrated in the community hospitals.

Table 2

Characteristics of eligible cases compared with flagged, unflagged and confirmed cases

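| Characteristic | Total of eligible cases, n (%) | Flagged cases, n (%) | Unflagged cases, n (%) | Confirmed by nurses, n (%) | Confirmed by physicians, n (%) |
|---|---|---|---|---|---|
| Local health area | | | | | |
| Parma Teaching Hospital | 2570 (78%) | 271 (62%) | 2299 (80%) | 16 (15%) | 14 (17%) |
| Piacenza Community Hospitals | 282 (9%) | 65 (15%) | 217 (8%) | 5 (4%) | 5 (6%) |
| Parma Community Hospitals | 458 (14%) | 100 (23%) | 358 (12%) | 87 (81%) | 61 (77%) |
| Male sex | 1509 (46%) | 250 (57%) | 1584 (55%) | 80 (74%) | 57 (71%) |
| Most frequent major diagnostic categories | | | | | |
| Digestive system (MDC 6) | 703 (21%) | 143 (33%) | 578 (20%) | 27 (25%) | 19 (24%) |
| Kidney and urinary tract (MDC 11) | 449 (14%) | 48 (11%) | 397 (14%) | 7 (6%) | 5 (6%) |
| Circulatory system (MDC 5) | 413 (12%) | 37 (8%) | 370 (13%) | 14 (12%) | 6 (8%) |
| Respiratory system (MDC 4) | 380 (11%) | 38 (9%) | 343 (12%) | 16 (14%) | 9 (12%) |
| Hepatobiliary system and pancreas (MDC 7) | 356 (11%) | 88 (20%) | 277 (10%) | 30 (28%) | 26 (32%) |
| Male reproductive system (MDC 12) | 179 (5%) | 16 (4%) | 162 (6%) | 7 (6%) | 3 (4%) |
| Skin, subcutaneous tissue and breast (MDC 9) | 165 (5%) | 29 (7%) | 137 (5%) | 2 (2%) | 2 (2%) |
| Age (years), P50 (median) | 71 | 70 | 72 | 71 | 70 |
| LOS (days), P50 (median) | 5 | 9 | 5 | 13 | 15 |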

The frequency of flagged cases for each individual indicator, and the corresponding percentages of confirmed and preventable AEs after medical review, are given in Table 3. The screening indicator ‘Return to operating theater’ was the most frequent, but also the least predictive indicator (only 16% of flagged cases turned out to be confirmed AEs).

Table 3

Distribution of flagged cases, confirmed and preventable AEs for each indicator

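| Indicator | Flagged cases, n (%) | Confirmed AEs, n (%) | Preventable AEs, n (%) |
|---|---|---|---|
| Postoperative respiratory failure | 9 (2%) | 6 (67%) | 3 (50%) |
| Transfer to intensive care | 56 (11%) | 28 (50%) | 3 (11%) |
| LOS > 21 days | 131 (27%) | 39 (30%) | 6 (15%) |
| Unplanned readmission | 11 (2%) | 2 (18%) | 0 (0%) |
| Return to operating theater | 264 (54%) | 42 (16%) | 5 (12%) |
| Postoperative PE or DVT | 5 (1%) | 3 (60%) | 1 (33%) |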
  • The same case may be reported more than once being eligible for more than one indicator; similarly, the same AE may be linked to more than one indicator.

The chart review data on AE frequency, preventability and outcome for flagged and unflagged cases are shown in Table 4. The frequency of AEs was 18% (77 of 428) for flagged cases and 4% (3 of 76) for unflagged cases. Of the 88 AEs confirmed by surgeons in the flagged case-mix, 17% were judged preventable (15 of 88), vs. one-third (33%, 1 of 3) in the unflagged case-mix.

Table 4

AEs, preventability and outcome

| Reviewed charts | Flagged | Unflagged random sample (a) |
|---|---|---|
| Fq (% no. of records) | | |
| No. of records | 428 (100) | 76 (100) |
| No. of patients with at least one AE | 77 (18) | 3 (4) |
| No. of AEs | 88 (21) | 3 (4) |
| No. of preventable AEs | 15 (4) | 1 (1) |
| Outcome, Fq (% no. of AEs) | | |
| The clinical event caused disability | 19 (22) | 0 (0) |
| The clinical event prolonged LOS | 43 (49) | 1 (33) |
| The clinical event influenced survival | 12 (14) | 0 (0) |
| The clinical event caused death | 7 (8) | 0 (0) |
  • (a) 3% of unflagged cases were randomly abstracted.

Concerning the outcome in flagged cases, 22% of AEs caused disability at discharge, 49% prolonged LOS, 14% influenced survival, and 8% caused death. AEs found in unflagged cases consisted of hematoma, bleeding and intense pain.


The diagnostic performance of computerized screening for the detection of AEs is detailed in Table 5. Sensitivity was 41%, specificity was 89%, PPV was 18% and NPV was 96%. Overall, the screening accuracy (true positives + true negatives over the total of cases) was 86%.

Table 5

Diagnostic performance—for estimates of unflagged cases inferential data were used

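| | Cases with AEs | Cases without AEs | Total |
|---|---|---|---|
| Flagged | 77 (77) | 351 (351) | 428 (428) |
| Unflagged | 112 (3) | 2725 (73) | 2837 (76) |
| Total | 189 (80) | 3076 (424) | 3265 (504) |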
  • The sample actually analyzed, equal to 3% of unflagged cases, is given in parentheses.
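The point estimates reported in the text can be reproduced from the counts in Table 5. The sketch below is illustrative only: the function and variable names are hypothetical, the flagged counts are obtained by subtracting the unflagged counts from the totals, and confidence intervals are deliberately left out because the method used for them is not described.

```python
def diagnostic_performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates of screening performance from a 2 x 2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Extrapolating the unflagged sample (3 AEs in 76 charts) to 2837 unflagged cases.
estimated_unflagged_with_ae = round(3 / 76 * 2837)

# Counts from Table 5 (unflagged figures are the extrapolated estimates).
unflagged_with_ae, unflagged_without_ae = 112, 2725
total_with_ae, total_without_ae = 189, 3076

# Flagged counts derived as totals minus unflagged.
flagged_with_ae = total_with_ae - unflagged_with_ae
flagged_without_ae = total_without_ae - unflagged_without_ae

performance = diagnostic_performance(
    tp=flagged_with_ae, fp=flagged_without_ae,
    fn=unflagged_with_ae, tn=unflagged_without_ae,
)
percentages = {name: round(100 * value) for name, value in performance.items()}
print(percentages)
```

Rounded to whole percentages, these values agree with the sensitivity, specificity, PPV, NPV and accuracy quoted in the text.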


Computerized screening reduced by two-thirds the number of charts to be reviewed to detect an AE compared with random sampling. This value was derived by comparing AE prevalence in the total of eligible cases (6%) with the frequency of AEs in flagged cases (18%). Specifically, without computerized screening, physicians would have had to review 17 medical records to detect one AE, while with the screening, physicians needed to review only 6 medical records to find an AE.
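The arithmetic behind these figures can be sketched as the reciprocal of the AE frequency in each review strategy. The function name is hypothetical, and the counts are taken from Tables 4 and 5 (189 estimated cases with AEs among 3265 cases overall, and 77 among the 428 flagged charts reviewed).

```python
def charts_per_ae(charts: int, charts_with_ae: int) -> float:
    """Average number of charts to review in order to find one chart with an AE."""
    if charts_with_ae <= 0:
        raise ValueError("at least one chart with an AE is required")
    return charts / charts_with_ae


random_review = charts_per_ae(3265, 189)   # no screening: overall prevalence of about 6%
screened_review = charts_per_ae(428, 77)   # flagged charts only: frequency of about 18%
reduction = 1 - screened_review / random_review

print(round(random_review), round(screened_review), f"{reduction:.0%}")
```

The resulting reduction of roughly two-thirds matches the figure quoted in the text.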


Discussion

The detection system used in this study improved efficiency in locating potential AE cases, enabling physicians to quickly converge on the true AE cases to find causes and potential corrective strategies. In fact, to detect an AE, the number of charts to be reviewed was reduced by two-thirds. Unfortunately, only 41% of AEs were captured (probably the most serious ones), implying a risk of underestimating problems and the need to adopt interventions. It should be emphasized, however, that work aimed at the refinement of surgical AE screening indicators is ongoing [19], and the computerized method will be gradually updated, thus improving its performance.

Although this study was not designed to measure validity and utility of each single indicator, our data suggested that the indicator ‘LOS > 21 days’ worked best, as it captured the highest number of confirmed and preventable AEs, requiring the lowest number of charts. Moreover, this indicator exhibited a highly significant association with the other indicators, i.e. frequently, cases flagged for this indicator were also flagged for other indicators. A curious and unexpected finding was the difference in the rate of confirmed cases among the hospitals; specifically, the highest concentration of AEs was found in one community hospital. This finding suggests a potential use of the system as a tool for the prioritization of corrective interventions when applied to multiple hospitals, for example, in a given geographical area.

To our knowledge, this is the first study investigating the importance of AEs in an Italian setting. With respect to other international research published in this field, our study is mainly characterized by two aspects. First, it is the product of cooperation between representatives of the management, medical and nursing staff, who worked together from the initial stages of protocol development through to the conduct of the study. This is, we believe, a very important fact which should favor the systematic use of this system in routine practice. Second, unlike most studies on AE detection using administrative data, which only refer to PSIs and generally measure the validity of individual indicators, our research tested the advantages of both computerized screening and manual review, using a set of indicators derived from different experiences. However, our research reflects the findings obtained by other PSI validation studies, which suggest a generally high specificity and modest sensitivity. In particular, two studies similar to ours have been published, by Romano et al. [19] and by Kaafarani and Rosen [20], who examined the use of this kind of screening approach using surgical AHRQ PSIs. Unlike our research, however, both papers consider selected PSIs individually, and do not combine individual values. Sensitivities found in these studies range from 19 to 63%, according to the PSI, against our overall value of 41%. Specificities reported by Romano et al. [19] range from 99.1 to 99.9%, higher than the overall value obtained in our study.

This work has a few limitations, as evidenced in other similar studies. First, its findings cannot be totally reproduced, as chart review implies some degree of subjective judgment. Moreover, the sample is relatively small and limited to a restricted geographical area; therefore, direct inferences cannot be made for other settings. Finally, the use of administrative data for screening may lead to underestimation of a problem, because the ascertainment of AEs depends on the quality and completeness of coding, and only events for which there are corresponding ICD-9-CM codes can be found [17].

Despite these limitations, this study suggests that this detection system offers a true benefit for hospitals which intend to assess their AEs.


Funding

This work was supported by the Emilia-Romagna Regional Health Trust.


Acknowledgements

We thank the medical and nursing reviewers, Alessandra Bardiani, Barbara Benoldi, Stefania Bertocchi, Maria Cristina Cornelli, Guglielmo Delfanti, Paolo De Tullio, Carlo Maggi, Emilio Marchionni, Libera Notarangelo, Chiara Rocchetta, Michela Squeri, Angela Villa, Pietro Vitolo and Mario Zecchinato, for their invaluable contribution to the study. We are grateful to the medical management of the participating institutions for their valuable support. Special thanks go to Dr Maria Elisabeth Street, for her linguistic revision of the paper.

