
Validation of a tool assessing appropriateness of hospital days in rehabilitation centres

Romain Guilé, Christophe Leux, Cécile Paillé, Pierre Lombrail, Leila Moret
DOI: http://dx.doi.org/10.1093/intqhc/mzp008, pp. 198–205. First published online: 27 February 2009


Objective To develop and validate a list of objective criteria to assess the appropriateness of hospital days for patients admitted to rehabilitation centres and sub-acute care units.

Design Sixteen appropriateness criteria were defined by a multidisciplinary panel of 33 experts using a formalized consensus method. A single ticked criterion classifies the hospital day as appropriate. Reliability was studied by measuring concordance between two independent and simultaneous ratings using the instrument. External validity was tested by comparing conclusions derived from the instrument with the individual judgements of one, two or three experts on the same random sample of hospital days.

Participants The assessment on these criteria was performed on a randomized sample of 406 hospital days from 17 French wards.

Main outcome measures Inter-rater reliability and external validity were evaluated using the κ statistic and prevalence-adjusted and bias-adjusted kappa (PABAK).

Results The inter-rater reliability test showed a κ-value of 0.71 [95% confidence interval (95% CI) 0.63–0.78] and a PABAK of 0.77 (95% CI 0.70–0.83). There was a good agreement between the conclusions reached using the instrument and the individual judgements of experts with a κ coefficient of 0.42 (95% CI 0.35–0.50) and a PABAK of 0.60 (95% CI 0.52–0.67).

Conclusions The instrument is reliable and valid for assessing appropriateness of hospital days in rehabilitation centres and sub-acute care units. The next step in this study is the development of a tool for the analysis of causes of inappropriateness.

  • evaluation of health care
  • quality of health care
  • appropriateness of hospital use
  • validity
  • reliability

Introduction


The appropriate use of hospital resources is a growing concern in Western countries, and in particular in France where hospital care still generates nearly 50% of health expenditure. In this context, inappropriate or excessive use of certain procedures or types of care, such as hospitalization, is a prominent issue. Being able to analyse inappropriate hospital days, related to the absence of alternative or downstream facilities or to malfunctioning hospital organization, is therefore doubly worthwhile. In addition to cost issues, an unnecessarily prolonged hospital stay does not favour quality of care, and it exposes patients, in particular the elderly, to iatrogenic complications [1, 2]. Reviews of the appropriateness of hospital admissions and hospital days could contribute to analysis of this nature [3, 4]. Several measures of appropriateness have been developed for short-stay (acute) facilities [5–8]. The most widely translated instrument worldwide, on account of its metric properties and ease of use, is the Appropriateness Evaluation Protocol (AEP) [5, 8, 9]. The instrument has been adapted in Europe, where it is mainly used as a tool for monitoring improvement in care quality, enabling healthcare workers to evaluate their own practice and to introduce improvement measures that reduce the number of inappropriate hospital days. The rates of inappropriate hospital days found in medical wards in Europe range from 7 to 41%, but most frequently fall in a narrower range, between 20 and 35% [10–18]. Among the most frequently identified causes of inappropriateness, the absence of downstream hospital beds, particularly in medium-stay units, is consistently reported, and it was the most frequent cause of inappropriate hospital days in several French studies [12, 15]. This reflects the fact that downstream facilities are overstretched, and the waiting time for beds is often very long.
In France, hospitalization in medium-stay units (Soins de Suite et de Réadaptation) is represented by two entities, Médecine Physique et de Réadaptation (or rehabilitation centres) and Soins de Suite (sub-acute care units).

To our knowledge, no instrument has been published to date to assess the appropriateness of hospital days in structures of this type. The studies published on this type of unit used either the classic AEP [13, 14, 19, 20] or a modified version, the Community Hospital AEP [21]. In the first instance, the population hospitalized in medium-stay facilities is unlikely to be comparable to that in short-stay facilities, so the classic AEP does not appear to be a relevant measure. In the second instance, the Community Hospital AEP has shown limitations in terms of reproducibility and validity [21]. Consequently, it appeared worthwhile to develop a specific measure for patients hospitalized in medium-stay facilities.

The aim of the present study was to develop and validate an instrument to assess the appropriateness of hospital days in medium-stay units, applicable to all patients and easy to use.

Materials and methods

Development of the instrument: qualitative phase

An organization committee was established and a panel of 33 health professionals working in a medium-stay unit was recruited in the Loire Atlantique département (administrative area) in western France. This developer-experts panel was made up of the following: 11 doctors, 11 health service executives, 5 quality officers, 2 social workers, 3 physiotherapists and 1 occupational therapist. The instrument was developed using a formalized expert consensus method derived from the Nominal Group Technique [22, 23] and the RAND Appropriateness Method (RAND/UCLA) [24]. The method comprised the following phases, conducted over five meetings. The first meeting consisted of a brainstorming session in which all suggested criteria for hospital day appropriateness were recorded. Redundancies were then removed. In the second phase, the panel members scored each criterion, anonymously and independently, on a discrete numerical scale ranging from 1 to 9, 1 indicating a criterion judged not very relevant, 9 a highly relevant criterion, and 2–8 intermediate opinions. The results were analysed using the method validated by RAND/UCLA [24]: when the median score was 3 or less, the criterion was removed. When it was 7 or more, the criterion was accepted. When it was more than 3 and less than 7, the criterion was qualified as ‘uncertain’ and re-discussed in the following meeting, during which all panel members were given the overall results for each criterion, together with their own scores, so that they could compare their position with that of the rest of the group. After this second meeting, the panel participants performed a second scoring of the uncertain criteria, also anonymously and independently. This was complemented by an analysis of the literature. Any extra criteria found in this way were submitted to the panel for scoring using the same method (Figure 1).
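The median-based decision rule described above can be sketched as follows. This is a minimal illustration, assuming the panel scores for one criterion are available as a list of integers from 1 to 9. The function name `classify_criterion` and the example scores are hypothetical and are not taken from the study.

```python
from statistics import median


def classify_criterion(scores):
    """Classify one candidate criterion from the panel's 1-9 relevance scores.

    Thresholds follow the rule described in the text: median <= 3 is removed,
    median >= 7 is accepted, anything in between is 'uncertain' and goes back
    to the panel for discussion and a second scoring round.
    """
    if not scores or any(s < 1 or s > 9 for s in scores):
        raise ValueError("scores must be non-empty and lie between 1 and 9")
    m = median(scores)
    if m <= 3:
        return "removed"
    if m >= 7:
        return "accepted"
    return "uncertain"


# Hypothetical scores, for illustration only
print(classify_criterion([8, 9, 7, 7, 6]))  # median 7
print(classify_criterion([2, 3, 3, 5, 1]))  # median 3
print(classify_criterion([4, 6, 5, 7, 3]))  # median 5
```

Using the median rather than the mean keeps the decision robust to a few extreme scores, which fits a panel mixing several professions.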

Figure 1

Summary of the construction phases for a scale measuring appropriateness of hospital days in medium-stay units.

Consensus was reached on an instrument comprising 16 appropriateness criteria (items) based on the clinical condition of the patient and care management parameters (Table 1), and a user's rating guide was drafted detailing inclusion and exclusion criteria, and information on how to complete the measure. A hospital day was to be deemed appropriate if at least one criterion was ticked. However, even if according to the instrument a day was not assessed as appropriate (i.e. none of the 16 appropriateness criteria were ticked), the raters could nevertheless classify that day as appropriate on the basis of their professional opinion (an ‘override’ provision), to allow for situations that had been missed in the development phase of the instrument; to do this they answered a supplementary item at the end of the measure. In analysis, all such expert opinions were taken into account and this enabled the measure to be refined. Certain frequently quoted situations were integrated into the completion guide or included in the measure itself. The appropriateness measure and the completion guide were tested on 60 hospital days by health-professional raters.
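The classification rule for a single hospital day, including the override provision, can be written as a short sketch. The function name `classify_day`, the argument names and the returned labels are assumptions made for illustration. Only the rule itself (at least one of the 16 criteria ticked, or a professional-opinion override) comes from the text.

```python
N_CRITERIA = 16  # number of items in the instrument (Table 1)


def classify_day(ticked, override=False):
    """Return the appropriateness classification of one hospital day.

    ticked   -- iterable of ticked criterion numbers (1..16)
    override -- True if the rater judged the day appropriate on professional
                opinion although no criterion applied
    """
    ticked = set(ticked)
    invalid = [c for c in ticked if not 1 <= c <= N_CRITERIA]
    if invalid:
        raise ValueError(f"unknown criterion number(s): {sorted(invalid)}")
    if ticked:
        # A single ticked criterion is enough to make the day appropriate.
        return "appropriate"
    if override:
        return "appropriate (override)"
    return "not appropriate"


print(classify_day({9, 12}))            # appropriate
print(classify_day([], override=True))  # appropriate (override)
print(classify_day([]))                 # not appropriate
```

Keeping the override as a separate label makes it possible to report results with and without rater opinion, as the study does.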

Table 1

Criteria for hospital day appropriateness in medium-stay units

Item | Criterion | Frequency (%) | Kappa | PABAK
1 | Vital sign monitoring | 1.0 | 0.66 | 0.99
2 | Complex wound dressings | 5.2 | 0.54 | 0.92
3 | Paramedical surveillance at least three times per 24 h of a given parameter on medical prescription | 6.2 | 0.41 | 0.88
4 | Surveillance of medication under direct medical supervision | 8.6 | 0.34 | 0.79
5 | Diagnostic exploration ongoing (list in user guide) | 6.2 | 0.43 | 0.86
6 | Specific nursing care | 5.2 | 0.22 | 0.83
7 | Provisional feeding tube or adaptation in progress | 1.5 | 0.54 | 0.98
8 | Invasive medical act on the day | 0.0 | n/a | n/a
9 | Sub-acute care units: review and/or coordinated care by at least two rehabilitation professionals on this day. Rehabilitation centres: review and/or coordinated care by at least two rehabilitation professionals, including at least one group 1, or by a group 1 professional for at least 1 hour on this day | 22.7 | 0.78 | 0.86
10 | Programme of specific care delivery | 2.2 | 0.62 | 0.97
11 | Palliative care | 7.1 | 0.82 | 0.96
12 | Patient in pain | 6.7 | 0.62 | 0.90
13 | Ongoing review of recent or non-stabilized loss of autonomy | 8.1 | 0.47 | 0.84
14 | Review and management of severe denutrition | 3.0 | 0.46 | 0.95
15 | Recent intercurrent pathology, evolving or not stabilized, that appeared in the course of hospitalization | 10.6 | 0.39 | 0.79
16 | Planned respite care where duration is restricted and fixed in time | 0.0 | n/a | n/a

Validation of the instrument

Study population

All the medium-stay units in Loire Atlantique in which the panel members were practising were involved in the study. Each unit was assessed on an appointed day in April or May 2007. In each unit, thirty patients who had been hospitalized throughout the day before the survey were selected randomly for inclusion in the study. In all, 19 wards volunteered to take part in the survey (14 sub-acute care units and 5 rehabilitation centres), but two wards were ultimately unable to take part because they had too few patients, so that the number of participating wards was 17 (13 sub-acute care units and 4 rehabilitation centres).

For each selected hospital day, the socio-demographic characteristics of the random selection of patients were recorded. A meeting prior to the evaluation was used to remind the professionals of the objectives, the study protocol and the requirement to assess the appropriateness of the particular day of hospitalization and not the hospital stay overall. The appropriateness of the hospital days was then rated using the instrument. Information could be obtained from patients' medical files, from the nursing files, from other available documents (monitoring sheets, prescriptions and so on) and by questioning the healthcare staff.

Inter-rater reliability

Two raters from each facility healthcare team concurrently and independently completed the appropriateness measure [25, 26] with the help of an external investigator acquainted with its user manual.

External validity

The conclusions reached using the appropriateness measure were compared with a concurrent criterion constructed from ‘expert’ opinions. Under supervision by the external investigator, these experts assessed, independently and concurrently and according to their own subjective judgement, whether or not the hospital day was appropriate, i.e. whether in their opinion it was absolutely necessary [25]. These experts were health professionals in the ward, who knew the patients but not the instrument. Several situations were possible, depending on the number of experts available. When there were three, if there was disagreement between the first two, the third made a decision on the appropriateness of the day; when only two were available, any disagreement was resolved by discussion between them to reach a consensus. When there was only one expert, it was this opinion that was taken into account. Because of the constraints imposed by this method, only nine facilities took part in this phase of scale validation.
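The way the reference judgement depends on the number of available experts can be summarized in a short sketch. The function name `reference_judgement` and its `consensus` argument are hypothetical. Opinions are encoded as booleans (`True` meaning the day is judged appropriate), which is an assumption made for illustration.

```python
def reference_judgement(opinions, consensus=None):
    """Combine one, two or three expert opinions (True = day appropriate).

    consensus -- outcome of the discussion, needed only when exactly two
                 experts are available and they disagree.
    """
    n = len(opinions)
    if n == 1:
        # A single expert: that opinion is the reference.
        return opinions[0]
    if n == 2:
        if opinions[0] == opinions[1]:
            return opinions[0]
        if consensus is None:
            raise ValueError("two experts disagree: a consensus is required")
        return consensus
    if n == 3:
        # The third expert decides only if the first two disagree.
        first, second, third = opinions
        return first if first == second else third
    raise ValueError("expected one, two or three expert opinions")


print(reference_judgement([True]))                          # True
print(reference_judgement([True, False], consensus=False))  # False
print(reference_judgement([True, False, True]))             # True
```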

Statistical analyses

Results were expressed as mean, standard deviation, median, minimum and maximum for quantitative variables and as percentages for qualitative variables. The characteristics of patients for whom the hospital day was assessed as appropriate by the raters were then sought using logistic regression with adjustment for confounders. The results are presented as odds ratios (OR) with 95% CI and the corresponding degree of significance. The significance threshold retained was P < 0.05.
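For a single binary factor, an unadjusted odds ratio can be obtained directly from a 2×2 table. The sketch below is an illustration under stated assumptions: it uses the Woolf (log-scale) confidence interval, which is not necessarily the method the authors used, and the function name `odds_ratio_woolf` is hypothetical. The counts are those reported in Table 3 for the type of facility.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio (a*d)/(b*c) with a Woolf (log-scale) 95% CI.

    a, b -- appropriate / inappropriate days in the group of interest
    c, d -- appropriate / inappropriate days in the reference group
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Counts from Table 3: rehabilitation centres, 3 inappropriate of 93 days;
# sub-acute care units (reference), 104 inappropriate of 313 days.
or_, lower, upper = odds_ratio_woolf(93 - 3, 3, 313 - 104, 104)
print(f"OR = {or_:.1f} [{lower:.2f}-{upper:.1f}]")  # OR = 14.9 [4.61-48.3]
```

With the cells arranged this way, the result matches the univariate value reported in Table 3 for rehabilitation centres. This suggests the tabulated odds ratios are expressed as odds of a day being appropriate. Multivariate values require a fitted logistic regression model and cannot be reproduced from the table alone.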

The inter-rater reliability and the external validity of the instrument were assessed by studying agreement among raters and experts, respectively. To quantify agreement between two raters/experts, Cohen's kappa coefficient [27] was used, and Light's kappa coefficient was used when there were more than two experts. Although the kappa coefficient is widely used to assess reliability and validity, its magnitude is subject to two well-known paradoxes, one arising from the prevalence of the rated category (prevalence index) and one from systematic disagreement between raters (bias index). We therefore adjusted kappa for the influence of prevalence and bias by calculating, where necessary, the prevalence-adjusted and bias-adjusted kappa (PABAK), as many authors have suggested [28]. The kappa and PABAK confidence intervals were calculated using a bootstrap procedure. Sensitivity and specificity of the measure were also estimated, using the assessment of rater A, who was a doctor (see Inter-rater reliability), as a reference.
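The agreement statistics can be illustrated with a short sketch for two raters and a binary judgement. Several points are assumptions rather than facts from the study. The function names are hypothetical. The percentile bootstrap is one possible variant, since the text does not say which one was used. The cell counts (149, 19, 32, 206) are not reported by the authors: they are back-calculated from the overall row of Table 2 (168 and 181 inappropriate days for raters A and B, 87.4% agreement on 406 days, i.e. 355 concordant days).

```python
import random


def kappa_pabak(a, b, c, d):
    """Cohen's kappa and PABAK for a 2x2 agreement table.

    a -- both raters: not appropriate    b -- rater A only: not appropriate
    c -- rater B only: not appropriate   d -- both raters: appropriate
    """
    n = a + b + c + d
    po = (a + d) / n                                       # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    pabak = 2 * po - 1                                     # two categories
    return kappa, pabak


def bootstrap_ci(a, b, c, d, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for kappa, resampling days with replacement."""
    rng = random.Random(seed)
    days = ["a"] * a + ["b"] * b + ["c"] * c + ["d"] * d
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(days, k=len(days))
        counts = [sample.count(cell) for cell in "abcd"]
        stats.append(kappa_pabak(*counts)[0])
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Back-calculated cells (assumption, see lead-in)
cells = (149, 19, 32, 206)
kappa, pabak = kappa_pabak(*cells)
low, high = bootstrap_ci(*cells)
print(f"kappa = {kappa:.2f}, PABAK = {pabak:.2f}")  # kappa = 0.74, PABAK = 0.75
print(f"bootstrap 95% CI for kappa: [{low:.2f}, {high:.2f}]")
```

For two categories PABAK depends only on observed agreement, which is why it is unaffected by unbalanced prevalence or by systematic differences between the raters' marginal totals.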

Univariate and multivariate analyses were performed to look for patient characteristics associated with appropriateness of hospital days. The explanatory variables used were as follows: age (under 60, between 60 and 80 and over 80), gender, place of residence prior to hospitalization, presence/absence of help in the home, living alone at home, availability of family or close friends, McCabe score (describing the state of health of the patient prior to hospitalization: 0, no illness or non-fatal illness; 1, illness fatal within 5 years; 2, illness fatal in the short term, i.e. within one year), patient autonomy, patient behaviour (suited, partially suited, not suited) and the type of ward in which the patient was hospitalized.

Results


Socio-demographic characteristics

Four hundred and six hospital days were analysed, most of which (n = 313, 77.1%) were in sub-acute care units. The mean age of patients was 73.1 (± 15.5) years. The mean length of hospital stay prior to the survey day was 52.4 (± 133.7) days, with a median of 24 days. More than 90% of the patients were living at home before hospitalization (n = 352). Among these, more than half had no help in the home (n = 208). Half of the patients living at home were living alone (52.3%, n = 184) and three quarters had relatives or friends available (76.4%, n = 309). Finally, less than a quarter of the patients were dependent (24.1%, n = 97).

Inter-rater reliability

Agreement between the conclusions reached by the two raters on the 16 criteria in the hospital days appropriateness measure was 87%, Kappa coefficient 0.74 and PABAK 0.75 (Table 2). The first of these two raters (A) was always a doctor, whereas the second (B) was a health executive in 78.1% of cases (317 days), a nurse for 14.5% (59 days) and an occupational therapist for 7.4% (30 days). The agreement observed ranged from 78.6 to 96.7% according to wards/units, and the Kappa coefficient from 0.51 to 0.89. Agreement observed per criterion varied from 89 to 99.5% and the Kappa coefficient from 0.22 to 0.82. Criteria 8 and 16 were never ticked. Finally, inter-rater agreement was identical whatever the profession of the rater and the type of unit.

Table 2

Reproducibility of the measure and proportion of inappropriate days according to the scale for 406 days

Type of facility | Number of days | Days not appropriate according to the measure, Rater A (doctor), n (%) | Days not appropriate according to the measure, Rater B, n (%) | Agreement observed (%) | Kappa [95% CI] | PABAK [95% CI]
Sub-acute care units | 313 | 153 (48.9) | 166 (53.1) | 85.6 | 0.71 [0.64–0.79] | 0.71 [0.64–0.78]
Rehabilitation centres | 93 | 15 (16.1) | 15 (16.1) | 93.5 | 0.76 [0.51–0.91] | 0.87 [0.76–0.96]
Overall | 406 | 168 (41.4) | 181 (44.6) | 87.4 | 0.74 [0.67–0.80] | 0.75 [0.68–0.81]

When the ‘override’ procedure was taken into account, observed agreement between the conclusions of the two raters was 88.7%, with a Kappa coefficient of 0.71 (95% CI 0.63–0.78) and a PABAK of 0.77 (95% CI 0.70–0.83).

External validity

It was possible to analyse 208 hospital days (51%). External validity was assessed using three ‘experts’ for 30 of the hospital days, two for 59 days and only one for 119 days.

When expert judgements were analysed, the kappa coefficient was 0.42 (95% CI 0.35–0.50) in case of evaluation by three experts, 0.34 (95% CI 0.24–0.45) when there were two experts, and 0.48 (95% CI 0.38–0.56) when the evaluation was performed by a single expert. PABAK were, respectively, 0.60 (95% CI 0.52–0.70), 0.55 (95% CI 0.46–0.63) and 0.58 (95% CI 0.49–0.65).

The instrument showed good sensitivity (80.3%) and moderate specificity (55.4%).

Appropriateness of hospital days

The proportion of hospital days that did not comply with any of the 16 criteria in the instrument was 43% overall (n = 174.5, the mean of the two raters): 41.4% (n = 168) for raters who were doctors (A) and 44.6% (n = 181) for the second raters (B) (not doctors) (Table 2).

Table 3

Multivariate analysis to look for predictive factors for inappropriateness of hospital days

Characteristic | Inappropriate days, n (%) | Univariate OR [95% CI] | Multivariate OR [95% CI]
Age (years)
 < 60 (n = 64) | 8 (12.5) | 2.20 [0.98–4.97] | 0.80 [0.24–2.72]
 60–80 (n = 192) | 46 (24.0) | 1 | 1
 > 80 (n = 150) | 53 (35.3) | 0.58a [0.36–0.92] | 0.82 [0.39–1.74]
Gender
 Men (n = 167) | 33 (19.8) | 1 | 1
 Women (n = 238) | 73 (30.7) | 0.56a [0.35–0.89] | 1.34 [0.65–2.77]
Preceding place of residence
 Home without help (n = 208) | 42 (20.2) | 1 | 1
 Home with help (n = 144) | 56 (38.9) | 0.40a [0.25–0.64] | 0.44a [0.21–0.90]
Living alone
 No (n = 165) | 35 (21.2) | 1 | 1
 Yes (n = 192) | 65 (33.9) | 0.53a [0.33–0.85] | 0.64 [0.31–1.30]
Family or close friends
 Not available (n = 60) | 25 (41.7) | 1 | 1
 Available (n = 309) | 71 (23.0) | 2.39a [1.34–4.27] | 1.75 [0.80–3.80]
McCabe score
 0 (n = 243) | 80 (32.9) | 1 | 1
 1 (n = 78) | 14 (17.9) | 2.24a [1.19–4.24] | 2.53a [1.09–5.84]
 2 (n = 30) | 3 (10.0) | 4.42a [1.30–15.0] | 9.00a [1.89–43.0]
Patient autonomy
 No (n = 97) | 17 (17.5) | 1 | 1
 Yes (n = 305) | 89 (29.2) | 1.94a [0.09–6.46] | 1.17a [1.05–1.31]
Behaviour score
 Suitable (n = 264) | 73 (27.7) | 1 | 1
 Partly suitable (n = 74) | 17 (22.8) | 1.28 [0.70–2.35] | 0.90 [0.38–2.15]
 Unsuitable (n = 35) | 11 (31.4) | 0.84 [0.39–1.79] | 0.37 [0.11–1.26]
Type of facility managing the patient
 Sub-acute care unit (n = 313) | 104 (33.2) | 1 | 1
 Rehabilitation centre (n = 93) | 3 (3.2) | 14.9a [4.61–48.3] | 14.5a [1.80–116]
  • a: OR significant at the 5% threshold.

When raters' personal judgements were taken into account (override procedure), the proportion of non-appropriate days was 26.4% for rater A and 27.8% (n = 113) for rater B.

Non-appropriate days were more often found among patients hospitalized in a sub-acute care unit, in cases where the patient had help in the home before hospitalization, where the patient was self-reliant and where the patient was free from any life-threatening pathology for the short or medium term (Table 3).

Discussion


The results obtained show good reproducibility of the appropriateness criteria for hospital days in medium-stay units, according to the Landis and Koch classification [29]. External validity, as measured by agreement between the conclusions reached by raters using the instrument and those reached by the ‘expert’ judges, was also good.

The method used to construct the instrument was suited to the objectives. First, the developer panel provided a good representation of healthcare professions [30], and the fact that its members were professionals involved in this particular field favoured the quality of the criteria retained [31]. Finally, the panel was large, so as to collect a large number of viewpoints in a field where the literature is sparse.

The results of the inter-rater reliability analysis obtained here are equivalent to those obtained in international studies for instrument validation, in particular concerning the AEP. Gertman & Restuccia developed the AEP and analysed inter-rater reliability of the initial version in two stages on 200 hospital days, and they found Kappa coefficients of 0.77 and 0.86 using a method identical to that used here [5]. Strumwasser [9] also analysed the American version of the AEP and found a Kappa coefficient of 0.59. This researcher, however, followed a retrospective protocol, and the AEP criteria were not completed concurrently but from the clinical records of 119 selected patients, which could explain the poorer reproducibility. Robain, in a study on the validation of the French version of the AEP, obtained a Kappa coefficient of 0.81 for 502 days analysed [25]. Other studies conducted in the Netherlands, Spain, Switzerland and Turkey found similar results [10, 14, 16, 26, 32]. These were, however, only adaptations of the AEP into different languages, and not newly developed measures.

The analysis of the present instrument's external validity yielded results that were equivalent to those obtained internationally, despite the difficulty in finding an external reference criterion. Robain found a Kappa coefficient of 0.61 (95% CI 0.53–0.68). The method used was practically identical to that used here, and in particular the same external criterion was used, but the size of the sample (501 days analysed) was larger, and the expert opinions were obtained from two doctors from the unit, with a third reaching the decision in case of disagreement between the first two. Strumwasser [9] also used the same concurrent criterion as Robain [25] but the expert judges, who were also doctors, did not know the patients and formed judgements from the medical file. In this case, agreement was less good, and closer to that found in the present study (Kappa coefficient 0.30–0.50 depending on the types of expert judges chosen) [9]. Other international studies also reported similar results [5, 16, 32, 33], despite the fact that the present study used only a first version of the measure.

Donald developed a measure from the AEP, the Community Hospital AEP, designed to assess the appropriateness of admissions and hospital days for patients in community hospitals in the UK. In the validation study, a single doctor assessed admission and hospital days for 440 patients using the measure developed. The results obtained on the Community Hospital AEP were compared retrospectively with the opinion of a nurse. The study concluded on poor validity of the measure, with a Kappa coefficient at 0.29 [0.1–0.71] for a study sample of 46 hospital days drawn randomly [21].

The results of the analysis of appropriateness of hospital days yielded by the present study are in line with those found by studies in other countries using the AEP in the same type of setting [13, 14, 19]. Figures just below 28% were found in studies conducted in Switzerland and France [10, 12]. Donald, in the analysis of 5951 hospital days in a community hospital, found a rate of 32% inappropriate days [21].

Certain limitations of the present work need to be underlined. While the absence of random selection among health professionals to form the instrument development panel is regrettable, since random selection would have improved validity, recruitment on a voluntary basis meant that the panel participants were highly motivated. The health facilities taking part in the instrument validation phase were not selected randomly either, which raises the question of the representativeness of the results. However, the survey was conducted in different types of healthcare establishment in France: public, semi-public and private. In addition, the majority of raters who were not doctors were healthcare executives. In the majority of studies across countries for the validation of the AEP, the raters taking part in the inter-rater reliability study were doctors and nurses. This may have reduced the inter-rater reliability, and the external validity, of the present scale, but it should on the other hand be possible for any health professional to use it. Furthermore, the ‘override’ rater opinion led to 15% of the days being classified as appropriate when doctors (rater A) provided their opinion, and 16.7% for healthcare staff who were not doctors (rater B). This proportion is well above the generally accepted threshold of 5% of the days that studies have applied to the AEP [12]. Most of the justifications given by raters resorting to the override procedure concerned rehabilitation care. If these particular causes of days being considered appropriate are removed, the rate of recourse to override was 5.6%. This relatively wide use of expert opinion probably leads to an under-estimation of reproducibility and external validity for the measure. It involved one or several clinical situations that were not incorporated into the initial design of the measure (e.g. patient not allowed to put foot on the ground, preventing him/her from returning home). Thus, when such situations were not mentioned among the criteria, either the raters ticked another criterion or they used the expert opinion override. This process led to better precision in the definition of several of the criteria.

There are many uses for this instrument: monitoring how the measure evolves over time, thereby providing a quality indicator for care delivered to hospitalized patients; revealing any malfunction in care management; and serving as a decision tool in hospital policy planning.

Conclusion


This study has enabled the construction of an instrument measuring the appropriateness of hospital days that is applicable to sub-acute care units and rehabilitation centres, and can be used by all types of healthcare professionals. The validation study has shown metric properties that are in line with this type of usage, supporting the implementation of objective criteria. A further evaluation could confirm the good performance of the instrument, and its contribution to the establishment of efficient improvement strategies.

As with the AEP, a second phase aiming to analyse the causes of inappropriate hospital days is underway, with a view to assisting healthcare teams in their work on quality of care.

Acknowledgements


The authors would like to thank Angela Swaine Verdier for help with the drafting of the English and also the panel members for their participation: C. Alglave, C. Arurault, P. Berna, T. Bocher, F. Boulet, F. Caharel, J. Cobigo-David, M.N. Coing, S. Coutault, C. Couturier, G. Dallongeville, I. Delory, F. Dos Reis, D. Eveno, S. Ferreol, N. Foucher, M. Gavelle, I. Guitton, D. Hamon, G. Jehenne, E. Lasseron, C. Le Gal, Y. Lequeux, I. Mahé-Galisson, V. Marx, M. Moriot, F. Moutet, S. Pergeline, T. Poujhon, G. Rince, P. Rolland, E. Terzidis, S. Vallier.

