
Evaluating quality indicators for physical therapy in primary care

Marijn Scholte, Catharina W.M. Neeleman-van der Steen, Erik J.M. Hendriks, Maria W.G. Nijhuis-van der Sanden, Jozé Braspenning
DOI: http://dx.doi.org/10.1093/intqhc/mzu031. Pages 261–270. First published online: 3 April 2014


Objective To evaluate measurement properties of a set of public quality indicators on physical therapy.

Design An observational study with web-based collected survey data (2009 and 2010).

Setting Dutch primary care physical therapy practices.

Participants 11 274 physical therapists in 3743 physical therapy practices, each reporting on 30 patients.

Main Outcome Measure(s) Eight quality indicators were constructed: screening and diagnostics (n = 2), goal setting and subsequent intervention (n = 2), administrating results (n = 1), global outcome measures (n = 2) and patient's treatment agreement (n = 1). Measurement properties of the indicators, namely content and construct validity, reproducibility, floor and ceiling effects and interpretability, were assessed using comparative statistics and multilevel modeling.

Results Content validity was acceptable. Construct validity (using known-group techniques) of the two outcome indicators was acceptable; hypotheses on age, gender and chronic vs. acute care were confirmed. For the whole set of indicators, reproducibility was approximated by correlating the 2009 and 2010 data and was rated moderately positive (Spearman's ρ between 0.3 and 0.42 at practice level); interpretability was rated acceptable, as distinguishing between patient groups was possible. Ceiling effects were assessed as negative, as they were high to extremely high, ranging from 30% (outcome indicator 6) to 95% (administrating results).

Conclusion Weaknesses in the data collection should be addressed to reduce bias, and ceiling effects could be reduced by randomly extracting data from electronic medical records. The indicators also seem to need more specificity, which can be achieved by focusing on the most prevalent conditions, thus increasing the usability of the indicators for improving quality of care.

  • quality indicators
  • physical therapy
  • measurement properties
  • measurement of quality


Quality improvements are of increasing importance in the field of health care. Health-care professionals, such as physical therapists, constantly strive to improve the quality and professionalism of their care, and there is a growing awareness of the importance of evidence-based physical therapy [1]. Simultaneously, declining trust in health-care institutions has led to a call for audits and inspections, as well as for more transparency of quality of care [2, 3]. To promote transparency and accountability in health systems, performance indicators are needed [4]. Moreover, benchmarks are seen as a powerful incentive to improve the quality of care [5].

On behalf of the Dutch Healthcare Authority, 23 indicators capturing the quality of physical therapy in primary care [6] were developed in a systematic, iterative consensus procedure with all stakeholders, who agreed that the measurement aim was evaluative [7], meaning that quality of care was measured to detect longitudinal changes. The indicators covered three domains: the physical therapy care process (8 indicators), patient experience (10 indicators) and practice management (5 indicators).

In this study, the eight indicators that capture the physical therapy care process are the focal point. This set was based on the guideline describing how clinical reasoning is reported in patient records [8]. To our knowledge, this is the first time that such a set of indicators has been evaluated systematically against criteria for good measurement properties as defined by Terwee et al. [9], who provided a framework for evaluating health-care questionnaires. That framework was applied in this study to evaluate content validity, construct validity, reproducibility, floor and ceiling effects and interpretability of the quality indicators in the domain of the physical therapy process. The purpose of this study was to assess the quality indicators on these measurement properties.


Methods

Study population

Primary care physical therapy practices (n = 7199) were invited by the Royal Dutch Society for Physical Therapy to participate in a cohort study in 2009. In 2010, this invitation was repeated. Practices that did not participate in 2009 were urged to do so in 2010 by the health insurance companies with financial incentives.

Measures and data collection

The set of eight indicators is intended to capture the quality of the physical therapy care process. The indicators are based on a guideline that addressed the clinical reasoning process [8, 10–14]. The indicators focused on the process of screening and diagnostics (n = 2), goal setting and the subsequent intervention (n = 2), monitoring the results (n = 1), global outcome measures (n = 2) and patient's treatment agreement (n = 1). Each indicator is composed of one or more items (questions), see Appendix. The indicator scores were calculated at the patient level as the ratio of the sum of the scores of the rated items to the total of possible item scores [4] and then transformed to the physical therapist level and the practice level by determining the patients' median score per therapist and practice.
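The scoring rule described above can be sketched in a few lines of Python. This is a hypothetical illustration of the ratio-and-median aggregation only; item weights, maximum item scores and missing-data handling in the actual instrument may differ:

```python
from statistics import median

def indicator_score(item_scores, max_item_score=1):
    """Patient-level indicator score: sum of the rated item scores
    divided by the maximum possible sum, expressed as a percentage."""
    total_possible = len(item_scores) * max_item_score
    return 100 * sum(item_scores) / total_possible

def aggregate(patient_scores):
    """Therapist- or practice-level score: the median of the
    patient-level scores, as described in the text."""
    return median(patient_scores)

# Example: three of four dichotomous items answered positively
score = indicator_score([1, 1, 1, 0])      # 75.0 for this patient
practice = aggregate([75.0, 100.0, 50.0])  # 75.0 at practice level
```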

Data for these indicators were collected by the physical therapists themselves for 2 months in each year, using questionnaires to report retrospectively on 30 medical records. Therapists were asked to select patients who had completed their treatment and to stratify on acute vs. chronic patients in line with the habitual distribution in their own practice. For each patient record, therapists reported information on the questionnaires together with patient characteristics, including age, gender, direct access or referral, number of treatment sessions and treatment goal. The study was conducted in accordance with the Declaration of Helsinki.

Testing framework and statistical analyses

Content validity

Content validity was largely ensured by the development procedure. However, as new ground is being broken, it is important to also reflect on other aspects of content validity, such as item selection and reduction and the interpretability or understandability of the questions.

Construct validity

As construct validity relates scores on this instrument to those of other measures of the same underlying concept, it can be assessed, when no other instrument is available for comparison, by testing predefined hypotheses using known-group techniques, i.e. hypotheses about expected differences between ‘known’ groups [9]. Gijsbers van Wijk et al. [15] found that women were diagnosed and treated less quickly and less aggressively than men for ischemic heart disease, leading to less favorable outcomes. Deutscher et al. [16] found significantly lower functional status outcomes for female than for male patients with knee, cervical spine and shoulder impairments. It is hypothesized that this different view of men and women will also be present in physical therapy.

Hypothesis 1

Male patients will receive higher scores than female patients on outcome indicators 6 and 7.

Furthermore, Mayer et al. [17] reported that age has negative effects on the outcomes of tertiary rehabilitation for chronic disabling spinal disorders. Deutscher et al. [16] also found a negative effect of age on functional status outcomes in outpatient physical therapy practice, as well as higher outcome scores for patients with acute vs. chronic symptoms.

Hypothesis 2

Younger patients will receive higher scores than older patients on outcome indicators 6 and 7.

Hypothesis 3

Acute patients will receive higher scores than chronic patients on outcome indicators 6 and 7.

Hypotheses for the process indicators (indicators 1 through 5 and indicator 8) could not be formulated due to a lack of scientific evidence.

Logistic multilevel modeling in MLwiN, version 2.02, was used to examine the effect of each of the characteristics on the two outcome indicators while controlling for the others. Because the skewness of the outcome distribution was high, the indicator scores were dichotomized (indicator score ≥50 = 1; <50 = 0). Age, gender and chronicity were also dichotomized, with the following reference categories (code 0): younger than 65 years, female and acute patients. The use of multilevel analysis was necessary due to the nested data (i.e. patients within therapists within practices) [18]. A very strict significance level of P < 0.001 was used because of the large size of the data set.
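The dichotomization step, and the kind of odds comparison the Exp(B) estimates express, can be illustrated as follows. This is a simplified sketch with invented scores: the actual analysis fitted logistic multilevel models in MLwiN, which adjust for the other characteristics and the nesting, whereas this crude odds ratio does neither:

```python
def dichotomize(score, cutoff=50):
    """Indicator score >= 50 becomes 1, otherwise 0, as in the analysis."""
    return 1 if score >= cutoff else 0

def odds_ratio(exposed, reference):
    """Crude odds ratio of a high (dichotomized) score in the exposed
    group vs. the reference group; hypothetical, unadjusted data."""
    def odds(group):
        high = sum(group)
        return high / (len(group) - high)
    return odds(exposed) / odds(reference)

male = [dichotomize(s) for s in [80, 90, 40, 95]]    # [1, 1, 0, 1]
female = [dichotomize(s) for s in [55, 30, 85, 45]]  # [1, 0, 1, 0]
# male odds 3/1, female odds 2/2 -> crude odds ratio = 3.0
```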


Reproducibility

Reproducibility is ‘the degree to which repeated measures in stable persons (test–retest) provide similar answers’ [9]. The data set, collected in 2009 and 2010 with different patients, was not suited to a proper test–retest procedure. However, we decided to compare the 2009 and 2010 data at the therapist level and at the practice level using Spearman's rho to establish consistency.

The indicators were rated as positive with respect to the approximation of the reproducibility if the correlations between 2009 and 2010 were strong and significant at P < 0.001. As a rule of thumb, the guidelines by Cohen are used (ρ 0.1–0.3 is considered weak, 0.3–0.5 as moderate and >0.5 as strong) [19].
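The correlation and the rule of thumb above can be sketched in plain Python. This is an illustrative implementation (Spearman's rho as the Pearson correlation of average ranks, plus Cohen's labels); the study itself presumably used a statistical package rather than hand-rolled code:

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def cohen_label(rho):
    """Rule of thumb used in the text [19]."""
    rho = abs(rho)
    if rho > 0.5:
        return "strong"
    if rho >= 0.3:
        return "moderate"
    if rho >= 0.1:
        return "weak"
    return "negligible"
```

For example, a rho of 0.35 between the 2009 and 2010 practice-level scores would be labeled "moderate" under this rule.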

Floor and ceiling effects

Floor and ceiling effects describe the percentage of respondents who received the highest or the lowest possible score. If these effects are present, differentiation is not possible and changes cannot be measured, which threatens both reliability and responsiveness of the indicators. This property was assessed by examining the percentage of therapists and practices that received the highest or lowest possible score. Terwee et al. [9] have set the threshold at <15% receiving the highest or lowest possible score.
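The ceiling-effect check reduces to counting maximum scores, which can be sketched as follows (a hypothetical example; the 15% criterion is from Terwee et al. [9], and the sample scores are invented):

```python
def ceiling_effect(scores, max_score=100, threshold=15.0):
    """Return the percentage of respondents at the maximum possible
    score, and whether it exceeds the 15% criterion of Terwee et al."""
    pct = 100 * sum(s == max_score for s in scores) / len(scores)
    return pct, pct > threshold

# Hypothetical practice-level scores: 6 of 20 at the maximum
scores = [100] * 6 + [80, 90, 70, 95, 85, 60, 75, 88,
                      92, 99, 97, 83, 77, 81]
pct, flagged = ceiling_effect(scores)  # pct = 30.0, flagged = True
```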


Interpretability

As interpretability refers to the ability to assign qualitative meaning to quantitative scores [9], subgroups were compared to assign meaning to the scores of the whole group of patients. Patient groups categorized by age (below 65/65 and older), gender (male/female), chronic care (yes/no) or direct access (yes/no), which were expected to differ based on a mixture of evidence (see the section on construct validity) and consensus, were compared. Furthermore, in 2010 it became clear that health insurance companies had shifted the measurement aim from evaluative to discriminative, to help decide which practices would receive pay for performance. It is therefore conceivable that indicator scores in 2010 were higher than in 2009; taking the mean score over the 2 years would then lead to overestimation of the indicator scores. To test this, the group of therapists that provided data only in 2009 was compared with the group that provided data only in 2010. Logistic multilevel modeling was used for this property, with a significance level of P < 0.001.


Results

Study population

Of all physical therapy practices in the Netherlands invited to participate, a total of 3743 practices (52%) participated (41.3% only in 2009, 39.5% only in 2010 and 19.2% in both years). Within these 3743 practices, 11 274 therapists assessed a total of 311 751 patients. Table 1 presents the characteristics of the practices and patients in the analyses. The data were largely representative compared with national representative samples [20–22], although solo practices were underrepresented, larger practices were overrepresented and patients younger than 24 were also underrepresented.

Table 1

Characteristics from the participating practices and patients in comparison with representative samples

| | Study population, n = 3743 practices (%) | National representative sample 2010, n = 1969 therapists [20], n = 4719 practices [21] (%) |
| Practice type | | |
|  Monodisciplinary | 72.8 | 61 [20] |
|  Multidisciplinary | 27.2 | 39 [20] |
| No. of therapists per practice | | |
|  1 | 3.1 | 31.3 [21] |
|  2 | 19.5 | 16.2 [21] |
|  3–5 | 46.4 | 24.2 [21] |
|  More than 5 | 31.0 | 28.3 [21] |
| | Physical therapy care, n = 311 751 patients (%) | National representative sample 2010 [22], n = 13 180 patients (%) |
| Gender patients | | |
| Age patients | | |
|  0–11 years | 5.6 | 2.7 (0–14 years) |
|  12–18 years | 5.3 | |
|  19–24 years | 5.5 | 18.8 (15–24 years) |
|  25–44 years | 28.1 | 26.5 |
|  45–64 years | 35.1 | 37.7 |
|  65 years and older | 20.4 | 22.2 |
| Number of sessions | | |
|  More than 18 | 16.6 | |
| Direct access | 41.4 | 42.6 |

Content validity

Item selection and reduction and interpretability of the items

In three discussion rounds, indicator subjects were listed, prioritized by importance and, finally, defined as indicators. Consensus was reached on all indicators by all stakeholders, after which the indicator set was judged on terminology and definitions by an expert group. A drawback of this procedure is that the questionnaire became quite lengthy. As therapists were involved in developing the questionnaire, they should be able to understand and interpret the questions properly. Two questions used in indicators 1 and 2 (the screening and diagnostic processes for direct access and referred patients) combined more than one question at once. However, excluding these items did not change the outcomes for these indicators; it was therefore assumed that these items did not cause interpretation problems amongst the respondents. Overall, the content validity can be assessed as acceptable, but steps can be taken with respect to item reduction.

Descriptives of the indicator scores

Scores were high for all indicators, as seen in Fig. 1. The outcome indicators 6 and 7 showed the most room for improvement, followed by indicators 1 and 2 on screening and diagnostics and indicator 8 on patient involvement. Indicators 3, 4 and 5 on the intervention had median scores of 95% or higher.

Figure 1

Boxplots of median indicator scores at practice level. Note: outliers are not represented.

Construct validity

The results in Table 2 show that all three hypotheses were confirmed. On the outcome indicators, male patients had significantly higher odds than female patients of receiving a higher score [indicator 6: Exp(B) = 1.34; 99.9% CI 1.30–1.37; indicator 7: Exp(B) = 1.19; 99.9% CI 1.15–1.23]; elderly patients (over 65 years) had significantly lower odds than patients younger than 65 [indicator 6: Exp(B) = 0.52; 99.9% CI 0.47–0.56; indicator 7: Exp(B) = 0.67; 99.9% CI 0.61–0.73]; and chronic patients had significantly lower odds than patients with acute symptoms [indicator 6: Exp(B) = 0.67; 99.9% CI 0.62–0.73; indicator 7: Exp(B) = 0.69; 99.9% CI 0.63–0.75]. Construct validity of the two outcome measures is therefore assessed as acceptable.

Table 2

Mean scores for subgroups and results of the multilevel analyses with the indicators as dependent variable and age, gender, direct access vs. referred patients, chronic vs. acute patients and data provided only in 2009 vs. only in 2010 as independent variables (Exp(B), 99.9% confidence interval and P-value)

Age

| Dependent variable | <65 years (ref.), M (SD) | >65 years, M (SD) | Exp(B) | 99.9% CI | P-value |
| Indicator 1 | 90.8 (12.7) | 89.9 (13.2) | 0.93 | 0.85–1.00 | 0.001 |
| Indicator 2 | 86.2 (17.6) | 84.6 (18.6) | 0.93 | 0.88–0.97 | <0.001 |
| Indicator 3 | 95.2 (17.4) | 95.1 (17.2) | 0.93 | 0.87–0.99 | <0.001 |
| Indicator 4 | 91.5 (17.5) | 90.0 (17.4) | 0.80 | 0.75–0.84 | <0.001 |
| Indicator 5 | 95.3 (21.1) | 93.6 (24.5) | 0.86 | 0.78–0.94 | <0.001 |
| Indicator 6 | 78.9 (22.7) | 69.2 (24.7) | 0.52 | 0.47–0.56 | <0.001 |
| Indicator 7 | 80.3 (24.8) | 73.7 (26.3) | 0.67 | 0.61–0.73 | <0.001 |
| Indicator 8 | 89.4 (15.9) | 89.2 (16.1) | 0.96 | 0.92–1.00 | <0.001 |

Gender

| Dependent variable | Female (ref.), M (SD) | Male, M (SD) | Exp(B) | 99.9% CI | P-value |
| Indicator 1 | 90.5 (12.8) | 90.8 (12.7) | 1.03 | 0.98–1.08 | 0.08 |
| Indicator 2 | 85.5 (18.0) | 86.1 (17.8) | 1.04 | 1.00–1.08 | 0.002 |
| Indicator 3 | 95.1 (17.4) | 95.2 (17.3) | 1.03 | 0.98–1.07 | 0.054 |
| Indicator 4 | 90.9 (17.5) | 91.5 (17.5) | 1.08 | 1.05–1.11 | <0.001 |
| Indicator 5 | 94.8 (22.2) | 95.2 (21.4) | 1.03 | 0.97–1.10 | 0.11 |
| Indicator 6 | 75.6 (23.6) | 78.7 (23.1) | 1.34 | 1.30–1.37 | <0.001 |
| Indicator 7 | 78.0 (25.5) | 80.3 (24.8) | 1.19 | 1.15–1.23 | <0.001 |
| Indicator 8 | 88.7 (17.0) | 89.0 (16.8) | 1.00 | 0.97–1.03 | 0.58 |

Direct access vs. referred patients (a)

| Dependent variable | Referred (ref.), M (SD) | Direct access, M (SD) | Exp(B) | 99.9% CI | P-value |
| Indicator 3 | 95.5 (16.3) | 94.6 (18.8) | 0.89 | 0.84–0.94 | <0.001 |
| Indicator 4 | 90.9 (16.7) | 91.6 (18.7) | 1.30 | 1.26–1.34 | <0.001 |
| Indicator 5 | 94.1 (23.5) | 96.3 (18.8) | 1.34 | 1.27–1.42 | <0.001 |
| Indicator 6 | 74.3 (24.2) | 80.8 (21.7) | 1.33 | 1.29–1.37 | <0.001 |
| Indicator 7 | 76.1 (26.1) | 83.1 (23.2) | 1.33 | 1.28–1.38 | <0.001 |
| Indicator 8 | 88.4 (17.5) | 89.9 (15.9) | 0.88 | 0.84–0.92 | <0.001 |

Chronic vs. acute patients

| Dependent variable | Acute (ref.), M (SD) | Chronic, M (SD) | Exp(B) | 99.9% CI | P-value |
| Indicator 1 | 90.8 (12.6) | 89.9 (13.5) | 0.92 | 0.81–1.04 | 0.02 |
| Indicator 2 | 86.2 (17.5) | 86.2 (17.7) | 1.01 | 0.96–1.06 | 0.62 |
| Indicator 3 | 95.2 (17.2) | 95.6 (16.1) | 0.98 | 0.92–1.05 | 0.34 |
| Indicator 4 | 91.6 (17.3) | 90.4 (16.4) | 0.92 | 0.87–0.96 | <0.001 |
| Indicator 5 | 95.8 (20.1) | 92.6 (26.2) | 0.66 | 0.57–0.74 | <0.001 |
| Indicator 6 | 78.0 (23.5) | 72.0 (22.2) | 0.67 | 0.62–0.73 | <0.001 |
| Indicator 7 | 80.3 (25.2) | 74.1 (24.1) | 0.69 | 0.63–0.75 | <0.001 |
| Indicator 8 | 89.0 (16.6) | 88.7 (17.4) | 1.16 | 1.11–1.20 | <0.001 |

Data provided only in 2009 vs. only in 2010

| Dependent variable | Only 2010 (ref.), M (SD) | Only 2009, M (SD) | Exp(B) | 99.9% CI | P-value |
| Indicator 1 | 91.2 (12.1) | 90.2 (13.2) | 0.87 | 0.72–1.01 | 0.002 |
| Indicator 2 | 87.2 (17.0) | 84.9 (18.4) | 0.81 | 0.68–0.94 | <0.001 |
| Indicator 3 | 96.0 (15.8) | 94.6 (18.2) | 0.75 | 0.59–0.92 | <0.001 |
| Indicator 4 | 92.2 (16.2) | 90.4 (18.2) | 0.78 | 0.68–0.88 | <0.001 |
| Indicator 5 | 95.7 (20.2) | 94.5 (22.7) | 0.77 | 0.61–0.92 | <0.001 |
| Indicator 6 | 77.7 (23.2) | 76.4 (23.5) | 0.91 | 0.83–0.99 | <0.001 |
| Indicator 7 | 79.2 (25.1) | 78.7 (25.5) | 0.95 | 0.86–1.04 | 0.058 |
| Indicator 8 | 89.8 (16.1) | 88.0 (17.4) | 0.79 | 0.68–0.91 | <0.001 |
  • aIndicators 1 and 2 not calculated, because indicator 1 is only for direct access patients and indicator 2 only for referred patients

  • Bold: significant (P < 0.001).


Reproducibility

Spearman's rho was significant for all indicators and was moderately strong, ranging from 0.35 to 0.5 at therapist level and from 0.30 to 0.40 at practice level, as seen in Fig. 2. The approximation of reproducibility, measured as the correlation between 2009 and 2010, is therefore rated moderately positive overall. However, the correlations for indicators 3 and 5 could not be estimated properly because of the extremely high percentages of maximum scores: 83.9% of the therapists had the maximum score on indicator 3 in both 2009 and 2010, and 96.2% had the maximum score on indicator 5 in both years. The same pattern occurred at the practice level (90.1 and 98.3%, respectively). We therefore consider the correlations for indicators 3 and 5 to be very strong.

Figure 2

Median indicator scores in 2009 and 2010 at therapist and at practice level and Spearman's rho.

Floor and ceiling effects

No floor effects were measured. Ceiling effects, however, were present for all indicators at the patient, therapist and practice levels and in both years: from 29.1% (indicator 6) to 95% (indicator 5) at patient level, from 15% (indicator 6) to 93% (indicator 3) at therapist level and from 11% (indicator 6) to 95% (indicator 3) at practice level, as seen in Table 3 (2009 and 2010 combined). Overall, the ceiling effects are assessed as negative due to the high percentage of maximum scores on the indicators.

Table 3

Ceiling effects of indicators 1 through 8 in percentage of patients, therapists and practices that received the highest possible score (100%), possible contaminationa and ceiling effects without possible contamination

| | Indicator 1 | Indicator 2 | Indicator 3 | Indicator 4 | Indicator 5 (b) | Indicator 6 | Indicator 7 | Indicator 8 |
| Possible contamination (a) | 16.9 | 11.5 | 48. | | | | | |
| Ceiling effects without possible contamination | | | | | | | | |
  • Note: bold is below threshold of 15%, values for 2009 and 2010 combined.

  • aPossible contamination: percentage of therapist with highest score on indicator and no variance amongst their patients.

  • bIndicator 5 is a dichotomous indicator.


Interpretability

Age, gender, chronic care and direct access had significant effects on the outcome indicators 6 and 7, as discussed in the section on construct validity (see Table 2). In addition, elderly patients (>65 years) had significantly lower odds of receiving a higher score than patients younger than 65 years on all other indicators except indicator 1. Male patients had significantly higher odds than female patients only on indicator 4, and chronic patients had significantly lower odds than acute patients on indicators 4 and 5. Direct access patients had significantly higher odds than referred patients of receiving high scores on all indicators. Therapists who provided data only in 2009 had significantly lower odds than therapists who provided data only in 2010 on all indicators except indicators 1 and 7. The interpretability of all indicators but indicator 1 was assessed as acceptable, as distinctions between groups were possible.

Conclusion and discussion

Assessing the measurement properties of quality indicators is vital, but rarely described in the development process of quality indicators. The properties of the indicators for the physical therapy process showed a mixed picture in terms of their appropriateness for use in the public domain (see Table 4). Most properties were rated moderately positive (reproducibility) to acceptable (content validity, construct validity and interpretability), but the presence of ceiling effects is problematic and it is recommended that these effects be reduced before further implementation.

Table 4

Conclusions on properties per indicator

| Property | Indicator 1 | Indicator 2 | Indicator 3 | Indicator 4 | Indicator 5 | Indicator 6 | Indicator 7 | Indicator 8 |
| Content validity | + | + | + | + | + | + | + | + |
| Construct validity | Na | Na | Na | Na | Na | + | + | Na |
| Ceiling effects | ± | | | | | | | |
  • Note: +, rated positive; ±, rated neutral; −, rated negative.

Explanation of the findings

Although reproducibility could not be assessed properly, an approximation was made based on the correlations between 2009 and 2010. Higher correlations between 2009 and 2010 were expected for the process indicators, however, as these indicators consist of steps in clinical reasoning and should thus remain constant over time. One possible cause of the low correlations is improvement of the quality of care: coming into contact with the guideline recommendations on clinical reasoning might have caused the therapists to adjust their methods accordingly. Another possibility is selection bias, because in 2010 some health insurance companies began to reward higher quality practices. As the therapists could select the cases themselves, they could have influenced the outcomes of the indicators positively by selecting certain cases.

Further, the high ceiling effects make it difficult to detect relevant changes in quality and to differentiate between therapists and practices at the high end of the scale. The ceiling effects of indicators 3 and 5 might be partially explained by the fact that they consisted of dichotomous items and thus had a higher chance of receiving maximum scores. Also, some physical therapists showed no variance at all in the data set; even for a very consistent therapist, this result is unlikely. This could mean that the data are somewhat contaminated by these therapists, who rated high scores on all indicators. For therapists who showed any variance at all, ceiling effects were ∼30% for indicators 1, 2 and 7 and ∼45% for indicators 3, 4 and 8, as seen in Table 3. Furthermore, the indicators were based on the guidelines for physical therapy and capture the required level of care. It is therefore not surprising that the results were high, as most therapists are aware of these guidelines. Finally, selection bias could explain the high scores, as therapists could select the cases themselves.

Strengths and weaknesses

It is difficult to assess whether inadequate properties tell us something about the indicators themselves or whether they are caused by failures in the method of data collection. Selection bias could have caused higher ceiling effects as well as the differences between the findings for 2009 and 2010, and is a threat to validity. Validity was also threatened because not all physical therapists used electronic medical records (EMRs) to a full degree, in which case the questionnaires had to be filled out retrospectively, possibly causing recall bias. The self-selection of the participating physiotherapists in view of the prospect of pay for performance could also have led to overestimation of the indicator scores and possibly to higher ceiling effects. Furthermore, the data collection period was too short: in both years, therapists had 2 months to collect all data, and patients had to have completed their treatment episode, which possibly led to the exclusion of children (longer intervention period) and an underrepresentation of chronic and older patients.

A limitation of the study was the difference in measurement aims. At the beginning of the development process, the stakeholders agreed to use evaluative indicators. However, the health insurance companies later indicated that they wanted the indicators to be used for discriminative purposes as well. This could have affected the outcomes of the indicators, as the scores in 2010 were higher than in 2009; taking the mean over 2009 and 2010 would then lead to overestimation of the outcomes. Further, different aims require different approaches to testing the properties of the indicators. For example, with respect to reproducibility, the between-therapist and between-practice variance is important for discriminative purposes but not for evaluative purposes [7]. In a separate test, it was shown that this property was met for discriminative purposes (see Table 1, Supplementary Data Digital Content 1, in which the null models and intra-class correlation coefficients are presented for all indicators). However, to differentiate between therapists and practices, a norm for the required level of quality should have been established beforehand. Without a norm, inferences based on the mean score of all practices remain arbitrary.

A drawback of the development process by consensus was that the indicators became quite general. All stakeholders had to be satisfied and the indicators had to be general enough to apply to all physical therapists and all patients. These factors led to indicators that focused more on the administrative component of the physical therapy process and less on measuring whether the physical therapist's diagnosis was in concurrence with the question for help, the treatment, the treatment goals and the treatment effect.

Implications for policy, future research and physical therapy

In a random sample of 200 practices, it was shown that half of the practices already used the results to improve the quality of care, but the measurement properties need to be strengthened before public disclosure. Selection bias can be reduced by randomizing the selection of patient cases. One way to achieve this would be to extract data directly from EMRs, gaining more control over the selection procedure, which could decrease the ceiling effects and thus give the indicators more discriminative power. Moreover, using EMRs for data collection would reduce the extra administrative burden. In addition, establishing a clear evaluative or discriminative measurement aim would help to increase the feasibility of the instrument. All stakeholders should agree on the aim of the instrument, as disagreement could lead to mistrust among therapists and thus a declining willingness to participate. If stakeholders choose to use the indicators for discriminative purposes, establishing a minimal required level of quality is essential to formulate conclusions on quality of care.

With respect to the general nature of the instrument, choices must be made. Asking more specific questions to establish whether the diagnostics, treatment and treatment goals are adequate for the patient's condition will increase the usability of the indicators for improving the quality of care. The set of indicators could cover the five most common conditions, i.e. lower back, neck, back, shoulder and knee complaints [23]. This would also benefit the global outcome measures, which could then be described more precisely (i.e. condition specific), and would make it possible to include patient-reported outcomes as well. However, some physical therapists would not be reached because of their specialty, such as pediatric or geriatric physical therapists. If the comparability of the entire group of therapists is most important to the stakeholders, the instrument can be improved by excluding indicators that provide little information. We learned that keeping a broad scope might reduce the ability to draw meaningful conclusions on the level of quality in physical therapy.

To truly measure quality, it is imperative to have scientifically developed and tested quality measures. Too often, giant leaps in quality research are taken, under political pressure, time pressure or pressure from health insurance companies. Mountains are not climbed in one giant leap however, but step by step. With this study, the next small but valuable step in researching the quality of health care has been taken.


Funding

This work was supported by the Dutch Healthcare Authority (NZa) and the Dutch Ministry of Health.


Acknowledgements

First and foremost, we would like to thank the physical therapists and the physical therapy practices that participated in this study. Further, we would like to thank Prof. Dr R.A. de Bie and Prof. Dr R.A.B. Oostendorp for their valuable contribution to this project. We would also like to thank all stakeholders involved in the development process, more specifically: the Royal Dutch Society for Physical Therapy (KNGF), the Federation of Patients and Consumers Organizations (NCPF), the Healthcare Inspectorate (IGZ), the Association of Healthcare Insurance Companies (ZN), the Ministry of Health and the Dutch Healthcare Authority (NZa). Last, we would like to thank the Institute for Applied Social Sciences Nijmegen (ITS) for the data collection.


Appendix
Table A1

Quality indicators for physical therapy care process: short description, indicator, type of indicator and items measured

1. Direct access: screening and diagnostic process (process indicator)
Indicator: the average degree (in %) in which the direct access patients received a methodically performed screening and diagnostic process.
Items measured in questionnaire:
  • Request for help: 1.1. Asked and 1.2. Administrated
  • Conclusion screening: 1.3. Administrated
  • 1.4. Determined systematically and 1.5. Administrated

2. Referred patients: diagnostic process (process indicator)
Indicator: the average degree (in %) in which referred patients received a methodically performed diagnostic process.
Items measured in questionnaire:
  • Request for help: 2.1. Asked and 2.2. Administrated
  • 2.3. Determined systematically and 2.4. Administrated

3. Intervention goals (process indicator, formative)
Indicator: the average degree (in %) in which intervention goals were determined methodically for the patients.
Items measured in questionnaire:
  • Goal(s): 3.1. Defined and 3.2. Administrated
  • 3.3. Fitted to request for help
  • 3.4. Based on diagnosis

4. Intervention process (process indicator)
Indicator: the average degree (in %) in which patients received a methodically performed intervention process.
Items measured in questionnaire:
  • Goal(s): 4.1. Defined (see 3.1) and 4.2. Administrated (see 3.2); 4.3. Reached (main goal)
  • Intervention(s): 4.4. Administrated
  • Intervention result(s): 4.5. Administrated (see 5)

5. Administration of intervention results (process indicator)
Indicator: the percentage of patients whose notes record the intervention results.
Item measured in questionnaire: note in record

6. Perceived intervention results (outcome indicator)
Indicator: the average degree (in %) in which the intervention goals (total recovery, reduction of complaints or stabilization) in terms of function, activity and participation are considered to be reached for the patient.
Item measured in questionnaire: perceived result per goal (maximum of 15 goals); response: not at all, somewhat, largely, completely

7. Measured intervention results (outcome indicator)
Indicator: the average degree (in %) in which the intervention goals (total recovery, reduction of complaints or stabilization) in terms of function, activity and participation have been reached by use of measurement instruments.
Item measured in questionnaire: result of objective measure per goal (maximum of 15 goals); response: not at all, somewhat, largely, completely

8. Information shared and agreed with the patient (process indicator)
Indicator: the average degree (in %) in which information was shared with and agreed upon by patients.
Items measured in questionnaire, shared and agreed information on:
  • 8.1. Screening processa
  • 8.2. Diagnostic process
  • 8.3. Defined goals
  • 8.4. Intervention process
  • 8.5. (Interim) evaluation
  • 8.6. Outcomes
  • 8.7. Closure of episode
  • aOnly for direct access patients.

