
Classifying indicators of quality: a collaboration between Dutch and English regulators

Alex Mears, Jan Vesseur, Richard Hamblin, Paul Long, Lya Den Ouden
DOI: http://dx.doi.org/10.1093/intqhc/mzr055. pp. 637-644. First published online: 16 August 2011.


Introduction

Many approaches to measuring quality in healthcare exist, generally employing indicators or metrics. While there are important differences, most of these approaches share three key areas of measurement: safety, effectiveness and patient experience. The European Partnership for Supervisory Organisations in Health Services and Social Care (EPSO) exists as a working group and discussion forum for European regulators. This group undertook to identify a common framework within which European approaches to indicators could be compared.

Approach

A framework was developed to classify indicators, using four sets of criteria: conceptualization of quality, Donabedian definition (structure, process, outcome), data type (derivable, collectable from routine sources, special collections, samples) and data use (judgement, singular or as part of a framework; benchmarking; risk assessment). Indicators from English and Dutch hospital measurement programmes were placed in the framework, showing areas of agreement and levels of comparability. In the first instance, results are only illustrative.

Conclusions and implications

The EPSO has been a powerful driver for undertaking cross-European research, and this project is the first of many to take advantage of its access to international expertise. It has shown that, through development of a framework that deconstructs national indicators, commonalities can be identified. Future work will attempt to incorporate other nations' indicators and to undertake cross-national comparison.

  • indicator
  • metric
  • classify
  • categorize
  • measure
  • Europe


Throughout Europe (and indeed the broader developed world) there is increasing interest in measuring the quality of health and personal care services as a mechanism to stimulate improvement, inform consumer choice, provide public and payer accountability and populate risk management strategies [1, 2].

However, while this is occurring widely, the opportunity for international comparison, which would allow countries to compare themselves against global best practice, contextualize national performance and raise their sights above the local, has not yet been widely grasped. While some first steps are being taken by the Organisation for Economic Co-operation and Development [3], regulators and inspectors themselves have yet to collaborate widely on comparing existing indicator sets.

At this time there is not a clear method for classifying various indicators and thus ensuring that they are actually comparable. This is a task made difficult by the fact that each system inevitably collects slightly different data, classifies inputs, outputs and outcomes rather differently and has rather different conceptualizations of health and care. The task becomes more complex still because the required classification is multidimensional.

The following paper is a first attempt to define such a classification. Its genesis was in a meeting of The European Partnership for Supervisory Organisations in Health Services and Social Care (EPSO, 2010 [4]), where Dutch, English and Swedish representatives compared experiences in this field and considered how to advance the agenda. This conversation resulted in the commitment to develop a multidimensional approach to classifying indicators about care. The approach outlined below received broad support from all members present at the September 2009 meeting of the EPSO, in Stockholm, Sweden.


In 1996, the Dutch Health Care Inspectorate (IGZ [5]), in close co-operation with the Norwegian inspection body (Statens Helsetilsyn [6]), initiated a European Platform for Supervisory Organisations, now known as the European Partnership for Supervisory Organisations in Health and Social Care (EPSO).

The EPSO was intended as an informal group of governmental or government-related organizations working on supervision and enforcement in health services across the European Community and the European Free Trade Area. The objective was to establish a low-profile forum for networking, information exchange and co-operation.

One aim of the EPSO has been to stimulate and encourage the adoption of good practices, to compare performance and, more generally, to exchange experience in the field of healthcare regulation. More recently, a growing number of cross-border initiatives have been taking place.

In late 2007, a meeting of the EPSO held in the Netherlands was attended by the IGZ, Statens Helsetilsyn and the Healthcare Commission [7] from England. At this meeting, EPSO gained a renewed vigour, with workstreams beginning and an agreement to expand membership. In November 2010, 25 delegates from 13 countries across Europe attended.

The importance of healthcare indicators

Healthcare indicators can be used both to judge how well an organization is fulfilling its functions and to drive improvement [1]. From a regulatory perspective, both uses are central. While the work carried out by the Organisation for Economic Co-operation and Development (OECD) has provided an invaluable tool for national improvement, what has yet to be done is to examine how indicators are classified from a regulatory perspective.

Many developed countries have already instituted national strategies to collect technical quality indicators, often for benchmarking purposes [8]. Those efforts have brought about much progress in implementing quality indicators at the level of providers, such as hospitals or physicians. However, these national activities do not lead, except by accident, to internationally comparable quality indicators, because there is no international agreement on the most promising indicators and because many different definitions of each indicator could be adopted. Notwithstanding the complexities of international comparison, there is benefit in comparing developments across advanced healthcare systems so that we can understand and learn from them [9].

In the USA, the Department of Health and Human Services supports research to improve the outcomes and quality of healthcare through the production of indicators. The Agency for Healthcare Research and Quality (AHRQ) [10] sponsors the development of indicators in four areas: prevention quality, in-patient quality, patient safety and paediatric quality. These are available, along with indicators from a variety of US and other sources, via the National Quality Measures Clearinghouse (NQMC) [11]. In the UK, the improvement of quality indicators is now a key thrust of the changes to the National Health Service (NHS). In a recent review, the King's Fund sought to define good quality indicators and showed how they can be used effectively and how they can be misused [12]. The goals in promoting good indicator use are minimizing perverse incentives, engaging clinicians and minimizing misinterpretation in reporting and dissemination.

The OECD has, through its Health Care Quality Indicator project (HCQI) [13], developed a conceptual framework for healthcare indicators [14]. It also publishes data enabling direct comparison between member countries. This work is at the level of health systems and aims to complement nations' own improvement initiatives and stimulate cross-national learning.

Dimensions of a common framework for measurement classification

Many nations are now undertaking measurement of healthcare quality and/or performance for a variety of reasons. In each case, the exact drivers for measurement, the need being fulfilled and the care system evaluated are different. It was quickly realized that any classification of indicators required a common language as a basis. Four dimensions were selected by the group as fundamental to a common framework:

  • Conceptualization of quality.

  • Donabedian definition (structure, process, outcome) [15].

  • Data type (derivable, collectable from routine sources, special collections, samples).

  • Indicator use (judgement singular, judgement as part of framework, benchmarking, risk assessment).

By unpacking these dimensions, we were able to begin to understand how a classification system might be constructed.
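The four dimensions above can be encoded as a simple data structure. The sketch below is illustrative only: the category labels, the `Indicator` record and the `comparable` check are our own shorthand for the idea of the framework, not an official EPSO schema.

```python
from dataclasses import dataclass
from enum import Enum

# The four classification dimensions, as simple enumerations.
# Category names are illustrative labels, not an official EPSO schema.

class Quality(Enum):
    SAFETY = "safety"
    EFFECTIVENESS = "effectiveness"
    PATIENT_EXPERIENCE = "patient experience"

class Donabedian(Enum):
    STRUCTURE = "structure"
    PROCESS = "process"
    OUTCOME = "outcome"

class DataType(Enum):
    DERIVABLE = "derivable"
    ROUTINE = "routine collection"
    SPECIAL = "special collection"
    SAMPLE = "sample"

class Use(Enum):
    JUDGEMENT_SINGLE = "judgement (single)"
    JUDGEMENT_FRAMEWORK = "judgement (framework)"
    BENCHMARKING = "benchmarking"
    RISK_ASSESSMENT = "risk assessment"

@dataclass
class Indicator:
    name: str
    quality: Quality
    donabedian: Donabedian
    data_type: DataType
    uses: set[Use]

# Example: a condition-specific mortality indicator classified on all four axes.
mortality = Indicator(
    name="30-day mortality after AMI",
    quality=Quality.EFFECTIVENESS,
    donabedian=Donabedian.OUTCOME,
    data_type=DataType.DERIVABLE,
    uses={Use.BENCHMARKING, Use.RISK_ASSESSMENT, Use.JUDGEMENT_FRAMEWORK},
)

def comparable(a: Indicator, b: Indicator) -> bool:
    # Two indicators from different countries are candidates for comparison
    # when they agree on the first three dimensions (their uses may differ).
    return (a.quality, a.donabedian, a.data_type) == (b.quality, b.donabedian, b.data_type)
```

Classifying each nation's indicators in such a structure makes the later comparison step mechanical: agreement on the classification axes is a precondition for comparing the underlying values.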

Conceptualization of quality

There have been many conceptualizations of quality, and these form part of the classification frameworks developed by various national and international bodies (for example, the OECD [16], the Agency for Healthcare Research and Quality (USA) [17], the Department of Health (England) [18] and the Department of Health, Social Services and Public Safety (Northern Ireland) [19]). From these, the elements below have been identified as common themes and used as part of the model for this project.

The parameters of the conceptualization depend to some extent on how broadly the model of care is being assessed. It could cover purely medical care, or might encompass a more holistic view of care, including social and nursing care. If it is the latter, then some notion of healthy, independent or even fulfilled lives would be added to the conceptualization.

  • Safe care: avoidance of harmful intervention.

  • Effectiveness: care which conforms with best practice and which is most likely to maximize benefit for patients and service users.

  • Patient/service user experience: how a person experiences care and the extent to which that is positive.

There is then often a consideration of:

  • Efficiency: how effectively resources are distributed to maximize benefit to service users per resource expended.

Within a broader definition of health there is often some measure of how far individuals are living independently and healthily, and a consideration of population health. The latter looks at the broader determinants of good health at the population level, and includes prevention as well as treatment, and epidemiological factors. Accountability for population health within the care system is not easy to pin down. Responsibility for the narrower definition of quality (safety, effectiveness and patient experience) lies within the remit of the healthcare provider, which is accountable to the supervisory body or regulator. Population health, by contrast, must be traced back to a common responsible entity; in the UK, for example, accountability resides with local primary care provider bodies as well as local government, and the common entity is therefore at the national governmental level. For the purposes of regulation, only care providers fall within the remit of the Care Quality Commission; the broader conceptualization of population health is not regulated.

The Donabedian conceptualization

The Donabedian (structure, process, outcome) [15] conceptualization of achievement of quality is well known. Structure refers to the underpinning infrastructure and resources that an organization has in place to achieve its aims (people, material, policies and procedures). Process refers to what an organization actually does, and outcome refers to the results of what an organization does [20].

There is widespread enthusiasm for measuring outcomes [21]. Achievement of outcomes for individuals (as opposed to the performance of specific tasks) is, after all, the purpose of care. Yet as measures of quality, outcomes have limitations, largely associated with issues of causality. For example, does a low mortality rate in a hospital point to better care, healthier patients or (particularly where low numbers are involved) chance? The honest answer from the outcome measure alone is that we are unlikely to be able to tell.

Process measures have the advantage that they are easier to interpret [22]. For example, if prophylactic antibiotic use is indicated as good practice, then this guidance should be followed. Because of this, it is possible to construct indicators where higher or lower is unequivocally better. The argument against them is that their specificity can lose sight of the whole process of care, and they can end up rewarding organizations for doing the 'wrong thing' very well. Thus, a hospital may treat patients according to good clinical practice once admitted, yet have extraordinarily high admission rates because the system as a whole does not work together to minimize hospital admissions (which, if nothing else, is likely to be an inefficient use of resources) [23].

Structure is in general the least valuable of the three types of indicator for identifying areas of weakness or strength, although it has great value for providing hypotheses of why outcomes and processes look good or bad. For example, insufficient or insufficiently trained staff may explain poor processes and treatment.

Structural information is the most routinely available because it is necessary for the legal administration of organizations (e.g. expenditure, staff numbers etc), and so it is often used as a proxy for unavailable data concerning processes or outcomes. This, however, is risky as the relationship between structure and process or structure and outcome is often not well established [15].

The insufficiency of any one of the three types of measure alone means that we should look to use all three in a systematic way wherever possible. Outcomes and processes combined help us to identify if there may be problems, or indeed exceptional good practice. Structural data may help us determine causes of problems or share lessons about what is working well.

When constructing a framework that will form the basis for comparing indicators, the Donabedian conceptualization is both necessary and extremely helpful. The Donabedian way of thinking has become embedded in how many regulators measure quality, and as such it is unthinkable not to include it in this categorization. Further, it provides a highly convenient route to unpack sometimes complex indicator frameworks, facilitating comparison across different health systems and making it easier to identify areas of commonality and divergence.

Data type

Different types of data can be used, classified according to how they are collected. Each has its pros and cons.

Data derived from individual level data sets

These data are collected as part of the process of giving care. They are therefore timely and clinically relevant, and thus likely to be a more accurate reflection of what happened than data collected at a later stage and not at the level of the clinical event. The costs of collection are sunk into the costs of care (although, as this implies the development of electronic medical records, these sunk costs may be huge).

The indicators themselves will either be derived centrally from all organizations' data or will be derived according to a set process, meaning that there is less danger of inconsistent interpretation in collection and therefore of poor data quality than if this process were to take place locally on an ad hoc basis and later aggregated.

There are, however, two weaknesses in this type of data. It depends upon the same data being collected in the same way across all organizations. Theoretically (and only theoretically) this is achievable in centralized systems such as the NHS in the UK, and arguably harder in decentralized systems (although regulation could be used to insist upon the collection of a central core of data).

More fundamentally, it can only produce indicators covering those areas that the electronic medical record covers. Thus it is potentially an excellent source of measures of clinical process, but is unlikely to cover patient experience and perception, or outcomes beyond mortality and readmission. For these we need other sources of data.

Routinely collected data

Aggregate data (i.e. total numbers of admissions, contacts, deaths, etc.) collected 'after the fact' and returned at set time periods to central management, purchasers and others has been the traditional way of collecting data. As a method of collecting information this can be an expensive distraction and may deliver inconsistent and inaccurate data. However, long-standing familiarity with such collections may mean that systems are set up to gather the data semi-automatically, mitigating these risks.

Specially collected aggregate data

Where data are simply unavailable, one approach is to request that information be collected specifically by organizations. In many instances this is the only option available to collect information. However, there are specific risks about interpretation of what is meant to be collected and this could lead to inconsistent and low-quality information.

A specific instance of a special data collection is the sample. Samples are needed where we do not, or cannot, collect information routinely as part of the care giving process, and where there are too many individual instances to gather information from all of them. A good example is the use of survey mechanisms to gain a closer insight into the experiences of patients. Indeed for issues of experience and longer term outcomes, surveys, and thus samples, are likely to be the only practical way of gaining usable data.

Data use

The final parameter is how the data are actually used. Data may be used to generate direct judgements; used within a framework of different measures to make an overall judgement; used to compare with other organizations without making an explicit judgement; or used to assess the likelihood of overall good or poor performance without judging it. The sections below consider each of these in turn.

Judgement from single indicators

Single measures assess one focussed aspect of care with a defined threshold of acceptable performance. Their purpose is to measure in isolation from any other measures, typically with a percentage representing acceptable performance. Single measures have some clear advantages, as they are very visibly linked to policy and clearly highlight poor performance. They are also simple to understand for patients and public. From the negative perspective, they can be prone to gaming and manipulation, especially where linked to an incentive [24]. There is always a temptation to manipulate either the situation or the data themselves to portray performance as more favourable than it actually is; this effect is magnified where a large incentive is attached to a measure. This ‘gaming’ behaviour is undesirable from many perspectives, not least because it masks poor performance and may put patients' lives at risk.

Judgement framework

These share a common methodology and ideology but can vary considerably in scale, from small aggregate (composite) measures with a few underlying indicators through to large, complicated systems (for example, a review of the quality of an entire service). A structured system is used to summarize a wealth of data. Using many indicators gives a rounded, holistic view of performance in a service area. Data of different types (as above) can be included, and even qualitative information, once suitably coded and weighted, can be incorporated into the framework. In short, they are a useful way to summarize complexity. On the negative side, they can be time consuming and resource intensive: the more comprehensive frameworks become, the more complex the aggregation models need to be, which can make them opaque to those being judged [25].


Benchmarking

Benchmarks are peer group-based, data-driven performance measures [26]. This approach does not make judgements of absolute performance; rather, it establishes a realistic, achievable goal derived from comparative measurement. Measuring variation, either between similar organizations or from an accepted level of performance, is at the heart of benchmarking. The purpose of the benchmark is to inform organizations where service improvements could and should be made. Their particular power lies in allowing organizations to identify, and thus explore, areas of potential weakness and good practice.
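The idea of a peer-derived, achievable goal can be illustrated with a small sketch. The best-decile cut-off, the function names and the data shape here are illustrative assumptions, not a prescribed benchmarking method:

```python
import statistics

def decile_benchmark(rates: dict, best_is_low: bool = True) -> float:
    """Benchmark as the mean performance of the best decile of peers.

    `rates` maps organization -> indicator value (e.g. infection rate
    per 1000 bed days). With best_is_low=True, lower values are better.
    """
    ordered = sorted(rates.values(), reverse=not best_is_low)
    cutoff = max(1, len(ordered) // 10)       # size of the best decile
    return statistics.mean(ordered[:cutoff])  # achievable, peer-derived goal

def gap_to_benchmark(rates: dict, benchmark: float, best_is_low: bool = True) -> dict:
    # Variation from the benchmark tells each organization where
    # improvement could be made; no absolute judgement is implied.
    sign = 1 if best_is_low else -1
    return {org: sign * (value - benchmark) for org, value in rates.items()}
```

The point of the sketch is that the goal is relative: it moves with the peer group rather than being fixed by policy.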

Outlier identification

Outlier identification is, effectively, a more sophisticated application of the benchmarking idea: it looks for variation, but for statistically meaningful outliers rather than top or bottom deciles or quartiles. Two types can be considered.

Time series outliers

These draw on control chart-type methods to show when a particular measure is going 'out of control', that is, crossing pre-set boundaries which indicate unacceptable levels of performance. These then trigger some form of management or regulatory intervention, even if only, in the first instance, to establish whether the data represent reality. Such methods have been used to good effect at the Healthcare Commission [27].
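A minimal sketch of the control-chart idea, assuming simple mean plus-or-minus three-standard-deviation limits; regulators' actual implementations may use more sophisticated methods (e.g. adjustment for case mix or overdispersion):

```python
import statistics

def out_of_control(history, latest, n_sigma=3.0):
    """Flag a new value that crosses pre-set control limits.

    `history` is a list of past values for one indicator at one
    organization; limits are mean +/- n_sigma standard deviations.
    A flag is a trigger to investigate, not a judgement in itself:
    the first step is to check whether the data reflect reality.
    """
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    lower, upper = mu - n_sigma * sigma, mu + n_sigma * sigma
    return latest < lower or latest > upper
```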

Multiple outlier patterns

These pull together tangentially related measures, which, if all or a majority show statistically significant variation from the average, indicate a risk of overarching poor performance. Again these measures are used to trigger some form of management or regulatory intervention [28].
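The multiple-outlier pattern can similarly be sketched as a majority rule over z-scores against the peer average; the 1.96 threshold and the majority criterion are illustrative assumptions rather than any regulator's published method:

```python
def risk_flag(org_values, peer_means, peer_sds, z_crit=1.96):
    """Flag overarching risk when a majority of tangentially related
    measures deviate significantly (|z| > z_crit) from the peer average.

    All inputs are dicts keyed by measure name; values are illustrative.
    """
    significant = 0
    for measure, value in org_values.items():
        z = (value - peer_means[measure]) / peer_sds[measure]
        if abs(z) > z_crit:
            significant += 1
    return significant > len(org_values) / 2  # majority rule triggers review
```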

Developing a framework

The European Partnership for Supervisory Organisations in Health Services and Social Care working group

The concept of using the expertise within the EPSO to undertake cross-national research and development projects is a natural progression of its original purpose, to promote understanding and shared experience, and is well supported by members. In early 2009, preliminary development work on the indicator classification framework was informally discussed at the meeting of the EPSO in Cork, Ireland, among the 32 delegates from 12 European countries. A working group comprising leads from England and the Netherlands was formed at this meeting. Key experts were recruited to the group, which reported back to the meeting of the EPSO in Stockholm, Sweden, in autumn 2009. The presentation by the authors described the process outlined in this paper and led to interesting discussion of next steps.

The framework

The framework allows two things. First, we can use it to ensure that the indicators we have are broad enough in scope, collectable and of sufficient quality to allow us to form an accurate view of quality. Second, the classification gives us a check on the comparability between systems. The following sets out a tool that allows this to be done.

The template categorizes each measure according to the dimension of quality it addresses, whether it covers structure, process or outcome, and its data type, and records the use(s) to which it can be put.

To illustrate, a range of indicators being considered in England as part of the 'Quality Account' regime are categorized in Table 1 [29]. The healthcare-associated infection rates for meticillin-resistant Staphylococcus aureus and Clostridium difficile are, for example, safety-related outcome measures, collected routinely as aggregated data, which can in theory be put to any of the four data uses; condition-specific mortality rates, by contrast, are measures of effective clinical outcomes, derivable from routinely collected care-related data. These can be used to benchmark and assess risk, and in an overarching judgement framework, but cannot be used to make a judgement on their own (Table 1).

Table 1

An indicator classification framework

| Dimension of quality | Data type | Indicator |
| --- | --- | --- |
| Safety | Aggregate | MRSA and C. difficile rates per 1000 bed days |
| Safety | Aggregate | Mistakes in prescription of drugs and other medications |
| Safety | Special collection | Processes in place that identify events that may lead to avoidable patient harm |
| Clinical effectiveness | Derivable | Mortality rates for stroke, AMI, fractured neck of femur (FNOF) |
| Clinical effectiveness | Derivable | 30-day readmission rates |
| Clinical effectiveness | Aggregate | Patient recorded outcome measures |
| Clinical effectiveness | Special collection | Compliance with best practice care pathways and procedures (e.g. acute myocardial infarction, asthma management) |
| Patient experience | Aggregate | % of A&E patients seen within 4 h |
| Patient experience | Special collection | % of patients who always felt treated with respect and dignity |

Populating the framework

Cross-national comparison has yet to become a key consideration of European healthcare regulators and performance monitoring organizations, let alone a factor in routine data collection. This project does not seek to achieve this, for that, should it be deemed desirable, would be some years away. What it does propose to do is to examine current data collections, and categorize them to facilitate some proxy of national comparison through the framework above. It is unlikely, due to the differences in culture, policy and delivery system, that countries will collect data items that are directly comparable, measuring exactly the same thing. What the tool described above has enabled is the categorization of ostensibly different indicators to facilitate the construction of methods for indirect, blunt comparison.

For the pilot project, indicators in the acute setting and the safety domain from England and the Netherlands were subjected to analysis and categorization in the above template.

A pilot study

As described above, in an attempt to test the framework more thoroughly, a pilot has been conducted across two nations (England and the Netherlands), using a subset of the full framework: the safety dimension in acute hospital settings. The grid below shows a comparison of data streams in use in those two countries, highlighting indicators that are directly or indirectly comparable using a traffic-lighting system: amber where a loose proxy comparison can be done and green where the measures are directly comparable (Table 2).

Table 2

Selected comparative indicators between England and the Netherlands

| What are we measuring? | Country | Indicator | Source | Data type | Donabedian |
| --- | --- | --- | --- | --- | --- |
| Healthcare-associated infection | Netherlands | Post-operative wound infection (POWI), % per type of operative procedure (mean) | Netherlands Health Inspectorate (IGZ) | Aggregated | Outcome |
| Healthcare-associated infection | England | Surgical site infection (SSI), % rate per operation (mean) | Health Protection Agency | Aggregated | Outcome |
| Healthcare-associated infection | Netherlands | % compliance with POWI bundle | IGZ | Aggregated | Process |
| Healthcare-associated infection | England | Compliance with bundle of hygiene code measures | Care Quality Commission (CQC) | Aggregated | Process |
| Medication | Netherlands | % patients with medication verification at admission/discharge | IGZ | Aggregated | Process |
| Medication | England | Performance against basket of medication measures from NHS staff survey | CQC | Special | Process |

In the first comparison, the two indicators are highly comparable: post-operative wound infections and surgical site infections are very similar measures and can usefully be compared directly once standardized. Compliance with care bundles and packages gives a useful overview of good practice but cannot be directly compared; the same is true of compliance with medication protocols as against self-reported behaviours taken from a survey of staff.

Next steps

It is proposed in the first instance to use these three indicators as a starting point to examine the practicality and robustness of our approach. In all cases, the indicator described is a composite, and we need to ensure that our approach is robust if it is to yield meaningful outcomes. Our approach will be to develop a suite of indicators, where possible, to be available for comparison.

Exploratory analyses will examine national mean values, and also drill down to look at variance within nations. We will also look into the data to detect trends as well as national and regional variations.

Conclusions and implications

Through exchange of information and effective communication, the EPSO has helped all member organizations to consider their own regulatory activity and stimulate internal discussion. Through its workstreams, the EPSO can advance thinking in ways that only a multinational organization owned by its component organizations is able to. The first manifestation of this is the work on cross-border comparison of indicators. The methodology and framework that have come from this project will enable comparisons to be made first between England and the Netherlands, and then across a broader group of member nations. Being able to cross-compare in a meaningful way will be valuable in benchmarking performance and driving improvement.

