The Patient Experiences Questionnaire: development, validity and reliability

Kjell I. Pettersen , Marijke Veenstra , Bjørn Guldvog , Arne Kolstad
DOI: http://dx.doi.org/10.1093/intqhc/mzh074. Pages 453–463. First published online: 22 November 2004

Abstract

Objectives. To describe the development of the Patient Experiences Questionnaire (PEQ) and to evaluate reliability and validity of constructed summed rating scales.

Design. Literature review, focus groups and pilot surveys. Two national cross-sectional studies performed in 1996 and 1998.

Setting. Two postal surveys in a national sample of 14 hospitals stratified by geographical region and hospital size.

Subjects. Patients consecutively discharged from surgical wards and wards of internal medicine. The surveys included 36 845 patients and 19 578 responded (53%).

Results. We constructed 10 summed rating scales based on factor analysis and theoretical considerations: Information on future complaints, Nursing services, Communication, Information examinations, Contact with next-of-kin, Doctor services, Hospital and equipment, Information medication, Organization and General satisfaction. Eight scales had a Cronbach alpha coefficient of >0.70; the remaining two had coefficients of >0.60. Repeatability was >0.70 for five scales and >0.60 for the remaining scales.

Conclusions. The PEQ is a self-report instrument covering the most important subjects of interest to hospital patients. Results are presented as 10 scales with good validity and reliability. The instrument emphasizes practicability and comprehensibility while at the same time providing sufficient information about domains applicable to most patients admitted to medical and surgical wards.

  • outcome assessment
  • patient satisfaction
  • psychometrics
  • quality of care
  • scale development

Surveys of patient experiences have become a common approach to monitoring and improving quality of health care. At the same time, this research has been criticized for its lack of a conceptual framework, contradictory results [1,2] and lack of valid and reliable instruments [3]. Therefore, an important task is to address some of this criticism by establishing reliable and valid questionnaires for measuring different aspects of patient-experienced quality of care.

Measuring patient experiences can have different purposes: (i) describing health care from the patient’s point of view; (ii) measuring the process of care, thereby both identifying problem areas and evaluating improvement efforts; (iii) evaluating the outcome of care [4–6]. During the past 7 years, patient experiences of hospital care have been used as a class of outcome measures in a national study carried out in Norway [7]. For use in this study, we developed a questionnaire for measuring patient experiences of hospital care, applicable both in quality improvement locally and for national surveillance. In the development of the questionnaire we focused on a central problem related to the construction of performance indices from patient surveys: how to ensure practicability and comprehensibility while at the same time providing maximum information about the most important domains of hospital care.

The objective of the current paper is to describe the development of the Patient Experiences Questionnaire (PEQ), and to evaluate reliability and validity of the summed rating scales constructed from items in the questionnaire.

Methods

There were five phases in the development of the PEQ and its scoring system: (i) preliminary work resulting in a first-generation questionnaire; (ii) constructing the PEQ; (iii) exploratory factor analysis of the PEQ; (iv) constructing summed rating scales; and (v) assessing the reliability and validity of the summed rating scales.

A first-generation questionnaire

We initially searched Scandinavian and Anglo-American literature for items related to the dimensions of the meta-analysis described by Hall and Dornan [8]: overall quality, humaneness, competence, outcome, facilities, continuity of care, access, informativeness, cost, bureaucracy and attention to psychosocial problems. We then selected items that satisfied the following criteria: (i) items relevant to at least 25% of patients admitted to general surgical or medical departments; (ii) items focusing on specific aspects of hospital care rather than satisfaction with care; (iii) items focusing on medical and nursing aspects of hospital services.

After minor modifications following pilot studies, patient interviews, review of patients’ written comments and thorough discussion with hospital staff, both clinicians and administrators, the first-generation questionnaire was completed. A description of the psychometric properties of the first-generation questionnaire has been published elsewhere [9].

Constructing the PEQ

The developmental process from the first-generation questionnaire leading to the PEQ consisted of several steps. (i) Exploratory factor analysis of the first-generation questionnaire suggested that we could eliminate items and still have both a good amount of shared variance and sufficient coverage of each concept. (ii) We brought items more in line with conversational Norwegian to minimize the risk of idiosyncratic interpretations. (iii) Building on experiences from former questionnaires, we changed the wording of some items to minimize skewed distribution of responses. (iv) To reduce correlated measurement errors associated with habits of expression and idiosyncratic interpretation of relationships between response scale and questions, we named the response scale endpoints with explicit reference to the wording of the item rather than administering a shared set of anchoring formulations [10–12]. (v) An increased number of response categories will increase the resolution of the scale, which in turn will increase reliability. The relationship between the number of response categories and reliability has been described by others [10,12–15]. Accordingly, to increase reliability, we changed the response scales from five to 10 categories.

We classified and discussed more than 600 written comments [16] and organized a focus group with former patients to discuss their experiences of hospital care. This led to additional items on the experience of having spent considerable time in bed in a corridor, the staff’s treatment of visiting relatives, and information and follow-up after discharge.

The final version had 35 items with 10-point ordinal response scales. Sixteen items had the response option ‘did not apply to me’. Items and anchoring formulations are presented in Appendix 1.

Exploratory factor analysis of the PEQ

We used exploratory factor analysis to examine the structure of relationships between the items. The data for the exploratory factor analysis were derived from the Norwegian study on Outcomes Research and Quality Improvement, a national survey of in-patient experiences of hospital care conducted in 1996 and 1998. The sample included 14 hospitals stratified by geographical region and hospital size. Each hospital was assigned a number of patients proportional to the number of beds in the participating departments. Patients 16 years or older, consecutively discharged from medical and surgical departments, received a questionnaire 6 weeks after discharge. Non-respondents were sent a reminder after 4 weeks. We sent questionnaires to 36 845 patients, and 20 890 (57%) returned the questionnaire. We excluded 1312 respondents who had not filled in the questionnaire properly or had responded to too few items. This left 19 578 patients (53%) for further analysis.
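
The response figures above can be re-derived with a few lines of arithmetic. The sketch below is only a check on the reported counts; the variable and function names are illustrative and not taken from the study.

```python
# Counts as reported in the text.
mailed = 36_845      # questionnaires sent
returned = 20_890    # questionnaires returned
excluded = 1_312     # improperly completed or too few items answered

analysed = returned - excluded


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(analysed)                # 19578
print(pct(returned, mailed))   # 57
print(pct(analysed, mailed))   # 53
```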

To extract the factors we applied the method of principal axis factoring in SPSS 9.0. We considered principal axis factoring more suitable than principal components because the items probably would have considerable amounts of both random measurement error and unique variance [17]. Furthermore, we expected correlated factors, and therefore used oblique rotation with Promax. Based on the results from exploratory factor analysis of the first-generation questionnaire we assumed a six-factor model and used the scree test criterion to select the number of factors to be extracted [9,17]. We included those items that shared at least 10% of their variance with the factor [18].
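
As a rough guide to the inclusion criterion: if a loading $\lambda$ is read as the correlation between an item and a factor, the shared variance is $\lambda^{2}$. This reading is an approximation in the present setting, because with an oblique rotation the pattern loadings are regression weights rather than correlations.

```latex
\lambda^{2} \ge 0.10
\quad\Longleftrightarrow\quad
|\lambda| \ge \sqrt{0.10} \approx 0.32
```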

We used listwise deletion in the exploratory factor analysis. To avoid extensive loss of responses due to single item missing values, we replaced missing values for the 19 items that did not have the response category ‘did not apply to me’. We used PRELIS software to replace missing values with a conditional modal value, with three socio-demographic items and three to four items from the PEQ as matching criteria [19,20]. This procedure replaced about 20% of the missing cases.
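
The idea of replacing a missing value with a conditional modal value can be sketched as follows. This is a simplified illustration and not the PRELIS algorithm: it assumes exact agreement on all matching variables, and the function name, field names and records are hypothetical.

```python
from collections import Counter


def impute_conditional_mode(rows, target, match_keys):
    """Return copies of rows with missing `target` values (None) filled in.

    A missing value is replaced by the most frequent observed value among
    rows that agree on every key in `match_keys`. If no such row exists,
    the value is left missing.
    """
    # Tally observed target values within each matching group.
    tallies = {}
    for row in rows:
        if row[target] is not None:
            group = tuple(row[k] for k in match_keys)
            tallies.setdefault(group, Counter())[row[target]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[target] is None:
            group = tuple(row[k] for k in match_keys)
            if group in tallies:
                new_row[target] = tallies[group].most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Hypothetical records: two matching variables and one questionnaire item.
patients = [
    {"age_group": "old", "sex": "F", "item9": 8},
    {"age_group": "old", "sex": "F", "item9": 8},
    {"age_group": "old", "sex": "F", "item9": 6},
    {"age_group": "old", "sex": "F", "item9": None},    # filled with 8
    {"age_group": "young", "sex": "M", "item9": None},  # no match, stays None
]
result = impute_conditional_mode(patients, "item9", ["age_group", "sex"])
```

In this sketch a value can only be replaced when at least one matching case with an observed value exists, so some missing values remain.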

Constructing summed rating scales

The literature supports the use of additive scales to improve reliability and comprehensibility in reporting on complex constructs [17,18]. We used the item structure suggested by the exploratory factor analysis as a basis in constructing summed rating scales. Items that were not included in scales this way were reconsidered in theoretically derived scales, using our knowledge, theoretical considerations and experience gained from the first-generation questionnaire. Only items that increased the Cronbach alpha coefficient when added were included in the final scales [21].
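
The inclusion rule in the last sentence can be illustrated with a minimal sketch of the Cronbach alpha coefficient, computed from the item variances and the variance of the summed score. The function names and the example scores are hypothetical; the study's own computations were carried out in statistical software.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns of equal length."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def improves_alpha(scale_items, candidate):
    """True if adding `candidate` raises alpha of the current scale."""
    return cronbach_alpha(scale_items + [candidate]) > cronbach_alpha(scale_items)


# Hypothetical scores for five respondents.
item_a = [8, 9, 6, 10, 7]
item_b = [7, 9, 5, 10, 8]
consistent = [8, 10, 6, 9, 7]   # tracks the other two items
unrelated = [9, 3, 8, 4, 10]    # runs against them

print(improves_alpha([item_a, item_b], consistent))  # True
print(improves_alpha([item_a, item_b], unrelated))   # False
```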

Assessment of aspects of reliability and validity of the summed rating scales

We evaluated internal consistency for the summed rating scales using the Cronbach alpha coefficient. These estimates were based on the same sample that was applied for the exploratory factor analysis.

We used the intraclass correlation to assess test–retest reliability for the summed rating scales [22–24]. Intraclass correlation takes into account not only the correlation, but also differences in intercept and slope between replicate measures [24]. We sent a second questionnaire to 242 consecutive respondents 2 weeks after we received their first response. The response rate on the retest was 67% (n = 162). We analysed test–retest reliability for the 150 patients who reported that they had not been re-hospitalized between completing the two questionnaires.
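
To illustrate why an intraclass correlation is stricter than an ordinary correlation, the sketch below computes a two-way, single-measure, absolute-agreement intraclass correlation for paired test and retest scores. The text does not state which form of the coefficient was used, so this choice is an assumption, and the function name and data are hypothetical.

```python
def icc_absolute_agreement(test, retest):
    """Two-way, single-measure, absolute-agreement ICC for paired scores."""
    n, k = len(test), 2
    if n != len(retest) or n < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    pairs = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(test) / n, sum(retest) / n]

    # Partition the total sum of squares into subjects, occasions and error.
    ss_total = sum((x - grand) ** 2 for pair in pairs for x in pair)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scale scores: the retest is exactly 10 points higher.
first = [50, 60, 70, 80]
second = [60, 70, 80, 90]
print(round(icc_absolute_agreement(first, second), 2))  # 0.77
```

In this example the Pearson correlation between the two occasions is 1, while the systematic shift of 10 points pulls the absolute-agreement coefficient down to about 0.77.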

To study construct validity we explored the general association between summed rating scales in the PEQ and measures external to the summed rating scales. Among socio-demographic factors, age and gender have been pointed out as the most consistent factors associated with patient satisfaction. Elderly patients tend to be more positive about their hospital stay than younger patients, and there is some evidence that men respond more positively than women [4,25–27]. Fulfilment of expectations is a strong predictor of patient experiences [4,28]. We hypothesized a priori that patients with unfulfilled expectations would have less positive experiences related to treatment outcome and the post-hospital situation. We classified patients as having experienced fulfilment of expectations if they had assigned the most positive scale score on item 24. Unfulfilled expectations were categorized by assigning a scale score in the lower half of the response scale.
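
The grouping rule for item 24 can be written out as a small function. This sketch assumes that the 10 response categories are coded 1 to 10 with 10 as the most positive answer; the coding and the function name are assumptions, not details given in the text.

```python
def expectation_group(item24_score):
    """Classify a respondent from the item 24 score (assumed coded 1-10)."""
    if not 1 <= item24_score <= 10:
        raise ValueError("score must lie between 1 and 10")
    if item24_score == 10:
        return "fulfilled"      # most positive category
    if item24_score <= 5:
        return "unfulfilled"    # lower half of the response scale
    return "unclassified"       # scores 6-9 fall in neither group


print([expectation_group(s) for s in (10, 3, 7)])
# ['fulfilled', 'unfulfilled', 'unclassified']
```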

In the analysis of construct validity we used Student’s t-test and analysis of covariance adjusting for education and self‐perceived health, and gender and age when applicable. Self-perceived health was measured with the Short Form 36 Physical and Mental component summary scales [29].

Ethics

The Norwegian Regional Committee for Medical Research Ethics, the Data Inspectorate and the Norwegian Board of Health approved the survey.

Results

Appendix 2 presents descriptive statistics for the 35 items of the PEQ. Mean item score ranged from 5.2 (Corridor stay) to 9.1 (Caring nurses) for the individual items.

Exploratory factor analysis

Nine of the 35 items in the PEQ were not included in the initial exploratory factor analysis. We did not include the two items (items 1 and 2) describing patients’ general satisfaction with hospital care, because they were likely to correlate with most of the other items and therefore limit the interpretability of the underlying constructs. Furthermore, we did not include four items (items 5, 7, 13 and 14) in which more than one-third of the patients replied, ‘did not apply to me’. The other items with a ‘did not apply to me’ option were included in the exploratory factor analysis. Lastly, we considered three items to be indicators of patients’ expectations of the medical outcome rather than experiences with care (items 3, 4 and 24), and did not include them in the exploratory factor analysis.

This left us with 26 items to be used in the initial exploratory factor analysis. Five items (items 6, 23, 27, 28 and 29) did not have factor loadings greater than 0.30 on any of the factors. One item (item 12) was excluded because it had loadings greater than 0.30 on two factors. Thus, the final version of the exploratory factor analysis included 20 items represented by six factors and accounted for 67% of the total variance. The scree test criterion also indicated a six-factor model as the best solution. The pattern matrix and communalities of this final solution are presented in Table 1. The average communality was 0.55. Items 8 and 20 had communalities below 0.30, contributing strongly to the low average. Using the same criteria, separate exploratory factor analyses for young patients, elderly patients, patients discharged from medical departments and surgical departments, men and women, yielded some differences in factor loadings, but the underlying factor structure did not differ from the final factor solution found for the whole sample.
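
The item counts in this section can be checked with simple set arithmetic. The sketch below uses only the item numbers given in the text; the variable names are illustrative.

```python
all_items = set(range(1, 36))            # items 1-35

general_satisfaction = {1, 2}
often_not_applicable = {5, 7, 13, 14}    # more than one-third 'did not apply to me'
expectation_items = {3, 4, 24}
not_entered = general_satisfaction | often_not_applicable | expectation_items

initial = all_items - not_entered        # items entered in the initial analysis
low_loadings = {6, 23, 27, 28, 29}       # no loading above 0.30 on any factor
cross_loading = {12}                     # loadings above 0.30 on two factors
final = initial - low_loadings - cross_loading

print(len(not_entered), len(initial), len(final))   # 9 26 20
print(sorted(final))
```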

Table 1

Pattern matrix with factor loadings and extraction communalities in the final exploratory factor analysis model

Construction of summed rating scales

We summed the items selected within a dimension of care. All sums were then rescaled to cover a range from 0 to 100, 100 representing the most positive evaluation.
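
A linear rescaling of this kind can be sketched as follows. The sketch assumes that each item is coded 1 to 10 with higher values more positive and that all items in the scale were answered; the handling of missing or ‘did not apply to me’ responses is not covered, and the function name is hypothetical.

```python
def scale_score(item_scores, low=1, high=10):
    """Sum item scores and rescale the sum linearly to the range 0-100."""
    k = len(item_scores)
    if k == 0:
        raise ValueError("need at least one item")
    if any(not low <= s <= high for s in item_scores):
        raise ValueError("item score outside the response scale")
    raw = sum(item_scores)
    # Lowest possible sum maps to 0, highest possible sum maps to 100.
    return 100 * (raw - k * low) / (k * (high - low))


print(scale_score([10, 10, 10]))  # 100.0
print(scale_score([1, 1]))        # 0.0
print(scale_score([5, 6]))        # 50.0
```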

In factor 1, we excluded items 32 and 33 and combined items 34 and 35 in the scale Information on future complaints. In factor 2, we excluded item 20 and combined items 17, 18 and 19 in the scale Nursing services. Supported by an apparent split in the size of factor loadings in factor 3, we divided the factor into two conceptually different scales. One scale, Communication, consisted of items 9, 10 and 11, and the other scale, Information examinations, included items 15 and 16. In factor 5, we excluded item 8 and combined the items 21 and 22 in the scale Doctor services. Factors 4 and 6 each constituted a consistent scale according to our criteria and we labelled the scales Contact with next-of-kin and Hospital and equipment, respectively.

We then reconsidered whether the items not included in summed rating scales could be grouped into conceptually meaningful dimensions based on theoretical considerations. In consequence of this process, we combined items 13 and 14 into the scale Information medication and items 1 and 2 into the scale General satisfaction. Items 20, 23, 28 and 29 measure aspects of hospital organization and continuity. We combined these items in the scale Organization.

In scales with few items, item–total correlations tend to be inflated. After correction for this inflation [30], the item–total correlation for the scale Organization ranged from 0.57 to 0.62. For all the other summed rating scales corrected item–total correlations were in the range 0.73–0.92. For all summed rating scales, corrected item–total correlations were higher than any correlation between the scale and items not included in the scale. Table 2 presents the correlations between the summed rating scales. Correlations varied between 0.21 (Hospital and equipment–Information future complaints) and 0.66 (Doctor services–General satisfaction).
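
The inflation arises because the item itself is part of the total it is correlated with. One common correction, shown in the sketch below, correlates each item with the sum of the remaining items instead. Whether this is the exact correction described in reference [30] is an assumption, and the function names and scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(items, index):
    """Return (uncorrected, corrected) item-total correlation for one item."""
    item = items[index]
    total = [sum(scores) for scores in zip(*items)]
    rest = [t - s for t, s in zip(total, item)]   # total without the item
    return pearson(item, total), pearson(item, rest)


# Hypothetical two-item scale scored by five respondents.
item_a = [8, 9, 6, 10, 7]
item_b = [7, 9, 5, 10, 8]
uncorrected, corrected = item_total_correlations([item_a, item_b], 0)
print(round(uncorrected, 2), round(corrected, 2))  # 0.97 0.9
```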

Table 2

Internal consistency of the summed rating scales and correlations (Pearson’s r) between the summed rating scales in the Patient Experiences Questionnaire

Descriptive statistics for the summed rating scales are presented in Table 3. Even though the scales were not normally distributed and had means in the upper half of the range (0–100), they all had an absolute skewness of less than 2 [31].
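
A skewness check of this kind can be sketched with the moment-based coefficient. The text does not say which skewness estimator was used, so the formula below is an assumption, and the scores are hypothetical.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical scale scores clustered near the top of the 0-100 range.
scores = [100, 100, 95, 90, 90, 85, 80, 70, 50, 20]
g1 = skewness(scores)
print(g1 < 0, abs(g1) < 2)  # True True
```

Scores piled up near the positive end of the range give a negative coefficient, as in this example.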

Table 3

Descriptive statistics for the summed rating scales in the Patient Experiences Questionnaire

Reliability and validity of the summed rating scales

All scales based on the factor model had good internal consistency with a Cronbach alpha coefficient of >0.70 and internal consistency was acceptable for all theoretically defined scales with a Cronbach alpha coefficient of >0.60 (Table 2).

The test–retest analysis yielded intraclass correlations between 0.62 and 0.85 (Table 4). Five of the 10 intraclass correlations exceeded 0.70.

Table 4

Test–retest intraclass correlation (ICC) with 95% confidence interval (CI) for the summed rating scales in the Patient Experiences Questionnaire; separate retest data were used

Table 5 presents the associations between summed rating scales and external measures. Unadjusted scores differed on all summed rating scales between the oldest and the youngest half of the patients, between men and women, and between patients with fulfilled and unfulfilled expectations. Adjusting the scores did not change the results.

Table 5

Associations between summed rating scales in the Patient Experiences Questionnaire and external measures

Discussion

The PEQ is a self-report instrument covering the most important subjects of interest to patients discharged from surgical wards and wards of internal medicine. Results are presented as 10 scales with good validity and reliability. The instrument can be used as a tool in local quality improvement initiatives and as a surveillance tool in national health care strategies. It emphasizes practicability and comprehensibility while at the same time providing sufficient information about domains applicable to most patients admitted to medical and surgical wards.

Validity

Our results support construct validity. Although the observed differences in scores between men and women and between the younger and the older groups of the respondents were small, they confirmed the association between socio-demographic factors and patient experiences. Patients with unfulfilled expectations scored substantially lower on all scales than patients with fulfilled expectations. This emphasized the potential importance of expectations for patient experiences of hospital care.

Reliability

Internal consistency was acceptable, with a Cronbach alpha coefficient of >0.70 for eight of the 10 scales. The scale Organization had a Cronbach alpha of just below 0.70. One explanation may be that hospital organization typically involves cooperation across different settings and types of staff. Yet, the evaluation of a hospital’s organizational traits is likely to belong to the same conceptual dimension. The other scale with a Cronbach alpha just below 0.70 was Information medication. This scale consists of one item on information about medication during the hospital stay and one on information at discharge and thus may constitute a description of two different situations: the in-hospital and post-hospital situations.

Short-term repeatability for the 10 scales as expressed by the intraclass correlation varied between 0.62 and 0.85. It is customary to state that measurements of repeatability for group comparisons should be at least 0.70 [32]. However, for scales likely to be affected by possible new experiences shortly after hospital discharge, like Information future complaints and Information medication, the stability can reasonably be expected to be lower. Stability and change in patient experiences of hospital care need to be assessed more closely, with repeated measurements on the same patients.

Limitations

The PEQ scales consist of two items or more and this raises the question of the minimum number of items to be included in each subscale. We will argue that two items in many instances may be sufficient. Firstly, from a factor analytical perspective, the items included in the summed rating scales are presented in the context of other dimensions and items. In such a context, two items are usually enough to identify the common dimension, if the factor model has a reasonably good fit and a positive number of degrees of freedom. Secondly, the selection of items in a summed rating scale is usually based on the alpha criterion. Provided the level of alpha is satisfactory, there are no other formal arguments for continuing to add items to a scale. Finally, the scales in the PEQ are meant as a screening tool and are not assumed to provide an exhaustive and detailed description of each dimension of hospital care. Given the purposes of the PEQ, two items may cover the dimension sufficiently.

Non-response has been analysed for the first-generation questionnaire [9]. This analysis indicated that results from surveys in general patient populations can be generalized to patients who are able to respond to the questionnaire. We do not know whether the results apply to patients who did not reply due to their medical condition.

Responsiveness of the scales in the PEQ has not been formally assessed. However, in one local study the PEQ was used to assess changes in patient experiences after reorganization of doctor services. The investigators observed statistically significant changes in scale scores on all the relevant items [33].

Conclusion

There is no standard method or ultimate instrument for measuring patient experiences. On the contrary, attention has been called to the lack of valid and reliable instruments [3]. To improve methodology it is necessary to develop new questionnaires and to scrutinize existing ones. However, which instrument to choose will depend not only on psychometric properties, but also on the health care system, the purpose of the study and the setting in which it is carried out. The PEQ is, in our view, one of the questionnaires to be considered when assessing in-patient experiences.

Appendix

Appendix 1

Item wording and anchoring phrases

Appendix 2

Abbreviated item content, descriptive statistics and number of patients who used the ‘did not apply to me’ option. Total number of respondents was 19 578

Acknowledgements

The authors thank Tomislav Dimoski for developing the software necessary for quality assurance and patient selection, and Knut Stavem for thorough comments on an earlier draft of the manuscript. The study was partly funded by the Norwegian Ministry of Health and Social Affairs.

References
