Quality indicators for international benchmarking of mental health care

Richard C. Hermann, Soeren Mattke, David Somekh, Helena Silfverhielm, Elliot Goldner, Gyles Glover, Jane Pirkis, Jan Mainz, Jeffrey A. Chan
DOI: http://dx.doi.org/10.1093/intqhc/mzl025. Pages 31-38. First published online: 5 September 2006


Objective. To identify quality measures for international benchmarking of mental health care that assess important processes and outcomes of care, are scientifically sound, and are feasible to construct from preexisting data.

Design. An international expert panel employed a consensus development process to select important, sound, and feasible measures based on a framework that balances these priorities with the additional goal of assessing the breadth of mental health care across key dimensions.

Participants. Six countries and one international organization nominated seven panelists consisting of mental health administrators, clinicians, and services researchers with expertise in quality of care, epidemiology, public health, and public policy.

Measures. Measures with a final median score of at least 7.0 for both importance and soundness, and data availability rated as ‘possible’ or better in at least half of participating countries, were included in the final set. Measures with median scores ≤3.0 or data availability rated as ‘unlikely’ were excluded. Measures with intermediate scores were subject to further discussion by the panel, leading to their adoption or rejection on a case-by-case basis.

Results. From an initial set of 134 candidate measures, the panel identified 12 measures that achieved moderate to high scores on desired attributes.

Conclusions. Although limited, the proposed measure set provides a starting point for international benchmarking of mental health care. It addresses known quality problems and achieves some breadth across diverse dimensions of mental health care.

  • benchmarking
  • consensus development
  • international
  • mental health
  • quality measures


Mental health and substance-related disorders are prevalent, disabling, and costly. An estimated 450 million people worldwide are affected by mental, neurological, or behavioral problems at any given time, and studies indicate that approximately one in five will experience a psychiatric disorder within a given year [1]. A World Health Organization (WHO) study found these conditions to account for almost 11% of the global burden of disease in 1990. Among the 10 leading causes of disability worldwide, 5 are psychiatric conditions—depression, bipolar disorder, schizophrenia, obsessive–compulsive disorder, and alcohol abuse [2].

The percentage of annual health care expenditures spent on mental health care varies widely, exceeding 20% among the highest spending nations [1]. Furthermore, mental health and substance-related disorders incur large indirect costs in utilization of other medical services, in lost work productivity, and in the burden on families and other caregivers. Effective medication and psychosocial treatments exist for many mental disorders [3]. However, research studies have documented wide variations in the quality of care including gaps between clinical practice and evidence-based guideline recommendations [4]. These findings have led to widespread attention to improving the quality of mental health care [5–9].

In this issue of the journal, Mattke et al. [10] describe the potential utility of measurement-based quality improvement and international comparisons of measure results. Data on key processes and outcomes of care can facilitate improvement within organizations delivering care, provide oversight of quality by public agencies and private payers, and provide insight into what levels of performance are feasible. These activities require robust measures that permit meaningful comparisons across providers, systems, or geographic regions. In many areas of health care, however, there is a lack of agreement on which measures should be used. In mental health care, the challenges are to some extent even greater—the diverse nature of the field and competing priorities among stakeholders have slowed consensus development on a core set of measures for common use [4]. Despite these limitations, several countries have implemented measures to evaluate mental health care [11–22].

The Organisation for Economic Co-operation and Development’s Health Care Quality Indicators Project (OECD–HCQI) is the first effort of which we are aware to identify measures for international benchmarking of the quality of mental health care. This article reports on the methods employed to develop consensus among participants and on the resulting measures, as well as on the challenges to be surmounted for further progress to be achieved.


We conducted a consensus development process with a panel of international experts, drawing on established procedures and based on a framework for quality-measure selection. The process employed elements of the modified Delphi method, which has previously been applied to the selection of quality measures [23,24]. The panelists’ evaluation was conducted in two phases. In the first phase, each panelist anonymously rated measures on numerical scales for importance, soundness, and feasibility. The results were used to identify measures with sufficient consensus for inclusion or exclusion. For measures where consensus was lacking, the panel conducted a second review, making decisions about inclusion/exclusion on a case-by-case basis.

Our framework [25] reflects the twin—and in some respects conflicting—goals of measure selection. In selecting quality measures, organizations typically seek to maximize desirable measure attributes, which the OECD–HCQI characterizes as: importance, scientific soundness, and feasibility. But organizations also typically seek measures that are representative of highly diverse health care systems. Dimensions of this diversity include domains of quality (e.g. prevention, access, assessment, treatment, continuity, coordination, safety, and outcomes). They also include breadth among clinical disorders (emphasizing conditions with high prevalence, morbidity, and treatability), treatment modalities (both medication and psychosocial variation), clinical settings across the continuum of care, and vulnerable subpopulations including children, the elderly, and racial/ethnic minorities. Complicating the process of measure selection are tensions among these goals. Clinically important, evidence-based measures typically require richer, more costly data sources [26]—thus, more important measures may be less feasible to implement. Measures of mental health care for children are at an earlier stage of development and testing than measures for adults—thus, broader representativeness may conflict with scientific soundness. These examples illustrate trade-offs in measure selection that our framework helps to make explicit and addressable [25].

In response to a call for participation from the OECD Secretariat, six countries (United Kingdom, Sweden, Canada, Australia, Denmark, and the United States) and one international organization, the European Society for Quality in Healthcare (ESQH), nominated seven panelists. Panel members included mental health administrators, clinicians, and services researchers with expertise in quality of care, epidemiology, public health, and public policy. Additionally, each panelist had experience in the development of national core sets of quality measures for mental health care.

The seven panelists participated in a multistage consensus development process (Figure 1). Firstly, panelists identified and reviewed 134 measures of care for mental health and substance-use disorders (the candidate set). Indicators were drawn from OECD member countries’ initiatives, conducted by national health departments, payers, accreditors, researchers, and other stakeholder organizations. Specific sources included the Canadian Mental Health Advisory Network, the United Kingdom Department of Health, the Center for Quality Assessment and Improvement in Mental Health’s National Inventory of Mental Health Quality Measures, numerous US stakeholder initiatives, and published research reports. Information about these sources is detailed in the report, Selecting Indicators for the Quality of Mental Health Care at the Health Systems Level in OECD Countries [27].

Figure 1

Measure selection process.

The 134 candidate measures were screened against the criteria established by the OECD Secretariat to identify measures appropriate for international benchmarking [28]. The criteria sought to establish a conceptual focus, standardized methods, and a preliminary, achievable goal that would serve as a foundation for future work. For inclusion in the screened set, measures had to meet each of the following:

  • Indicators focusing on quality (as opposed to cost or utilization).

  • Indicators relevant to assessing quality at the system level.

  • Indicators focusing on technical (rather than interpersonal) quality.

  • Indicators constructed from pre-existing administrative data based on standardized coding.

  • Single item indicators (rather than multi-item scales).

Of the candidate measures, 23 met screening criteria (the screened set). These measures, accompanied by information on their specifications and data sources, were reviewed and rated by the panel using a structured assessment process. Each measure was rated anonymously on 9-point Likert scales for indicator importance and scientific soundness. In rating importance, panelists were asked to consider import to policy, impact on health, and susceptibility to being influenced by the health care system. In rating scientific soundness, panelists were asked to consider face validity, content validity, and explicitness of the evidence base. Each panelist also rated the feasibility of data collection in his or her country as likely, possible, or unlikely for each measure. Ratings were obtained from five panelists (all but representatives from Denmark and ESQH); all seven panelists participated in discussions preceding and subsequent to the rating process.

In developing decision rules for inclusion/exclusion of measures, we adopted the approach used in the RAND appropriateness method [29], interpreting ratings on the 9-point scale as follows: 9–7 indicating agreement, 6–4 neither agreement nor disagreement, and 3–1 disagreement. To derive a preliminary set, we included measures for which (1) the median score was ≥7 for both importance and soundness and (2) more than half of participating panelists reported that data availability for the indicator was either ‘possible’ or ‘likely’. Excluded from this preliminary set were measures that had a median score ≤3 for either importance or soundness, or for which half or more of participating panelists reported that data availability was ‘unlikely’.

The remaining measures, with importance and soundness scores between 4 and 7 and with more than half of panelists reporting data availability to be at least ‘possible’, were returned to the panel for a second round of review. Each measure was discussed further with regard to its merit and its contribution to the breadth of topics addressed. Inclusion/exclusion decisions were then made on a case-by-case basis by consensus. The resulting recommended set was forwarded to the OECD–HCQI Steering Committee for its consideration.
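The decision rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure as actually implemented by the panel: the function name `classify_measure`, the label strings, and the example ratings are all hypothetical, and the thresholds are taken from the text as written.

```python
from statistics import median


def classify_measure(importance, soundness, availability):
    """Classify one measure from panelists' ratings (hypothetical helper).

    importance, soundness: lists of 1-9 ratings, one per panelist.
    availability: list of 'likely' / 'possible' / 'unlikely', one per panelist.
    """
    n = len(availability)
    unlikely = sum(1 for a in availability if a == "unlikely")
    available = n - unlikely  # panelists answering 'possible' or 'likely'
    imp, snd = median(importance), median(soundness)

    # Exclusion: median <= 3 on either attribute, or half or more say 'unlikely'.
    if imp <= 3 or snd <= 3 or unlikely * 2 >= n:
        return "exclude"
    # Inclusion: both medians >= 7 and more than half report data as available.
    if imp >= 7 and snd >= 7 and available * 2 > n:
        return "include"
    # Everything else is returned to the panel for case-by-case discussion.
    return "second-round review"


# Invented ratings from five panelists, for illustration only.
print(classify_measure([8, 7, 7, 6, 9], [7, 7, 8, 6, 7],
                       ["likely", "possible", "likely", "unlikely", "possible"]))
# include
print(classify_measure([6, 7, 5, 6, 8], [4, 5, 6, 4, 3], ["possible"] * 5))
# second-round review
print(classify_measure([8, 8, 7, 9, 7], [7, 8, 7, 7, 9],
                       ["unlikely", "unlikely", "unlikely", "possible", "likely"]))
# exclude
```

The inclusion and exclusion conditions cannot both hold for the same measure, so the order in which they are checked does not affect the result.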


Of the initial 134 candidate set measures, only 23 (17%) met the specified screening criteria and were rated by the expert panel. Of the 23 measures meeting screening criteria, 4 received ratings meeting the inclusion criteria for the preliminary set, 4 met exclusion criteria, and 15 with intermediate ratings were forwarded to the panel for further review. Panelists discussed the strengths and weaknesses of the measures receiving intermediate ratings along with their relative contribution to a broad and balanced measure set. Consensus was achieved on including 8 of the 15 measures, yielding a final, recommended set of 12 measures.
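The flow of measures through the selection stages can be checked with simple arithmetic. The short sketch below uses only the counts reported in this paragraph; the variable names are illustrative.

```python
# Counts as reported in the text.
candidate = 134
screened = 23
included_first_round = 4
excluded_first_round = 4
added_after_discussion = 8

# Measures with intermediate ratings go to the second-round review.
intermediate = screened - included_first_round - excluded_first_round
# The recommended set combines first-round inclusions and second-round additions.
recommended = included_first_round + added_after_discussion
# Share of candidate measures that passed screening, as a whole percentage.
share_screened = round(100 * screened / candidate)

print(intermediate, recommended, share_screened)  # 15 12 17
```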

Table 1 demonstrates the influence of each stage of the selection process on the diversity of the resulting measure sets. In proceeding from the candidate measure set to the criteria-based preliminary set, the number of quality domains represented by the resulting measure set decreased from eight to two, diagnostic group categories from six to three, treatment modalities from five to one, and vulnerable populations from five to one. In the subsequent phase of the selection process, panelists noted what they regarded as important gaps among the subjects of these measures and identified what they considered ‘good enough’ measures among those with intermediate scores to fill these gaps. This phase resulted in an expansion of the coverage within each dimension in the final, recommended measure set to four quality domains, three diagnostic groups, four treatment modalities, and four vulnerable populations.

Table 1

Characteristics of initial, screened, and recommended indicators

Indicator set | Candidate set (n = 134) | Screened set (n = 23) | Preliminary set (n = 4) | Recommended set (n = 12)
Quality domain
    Process | 113 | 18 | 4 | 11
    Outcome | 21 | 5 | 0 | 1
Clinical conditions
    Across diagnoses | 84 | 13 | 2 | 7
    Depressive disorders | 38 | 6 | 1 | 3
    Schizophrenia/Other psychotic disorders | 5 | 1 | 0 | 0
    Substance-related disorders | 3 | 1 | 1 | 2
    Bipolar disorder | 1 | 1 | 0 | 0
    Borderline personality disorder | 1 | 1 | 0 | 0
Treatment modalities
    Case management | 3 | 2 | 0 | 1
    Electroconvulsive Therapy (ECT) | 2 | 0 | 0 | 0
Vulnerable populations
    Severe persistent mental illness | 17 | 2 | 0 | 1
    Racial/ethnic minorities | 1 | 1 | 0 | 1
    Dual diagnosis MH/SA | 2 | 1 | 1 | 1

Table 2 describes the measures in the recommended set and their rating scores. The mean importance score for the set was 6.66, and the mean score for scientific soundness was 6.21. Data availability was assessed as ‘likely’ or ‘possible’ by all five panelists for seven measures and by four of the five panelists for the other five measures. Minor revisions were made to the recommended measures to refine specifications and eliminate redundancy.

Table 2

Ratings for recommended mental health and substance-related quality indicators

Name | Description | Importance median | Soundness median | Data availability (n = 5 panelists)
Visits during acute phase treatment of depression | % of persons with a new diagnosis of major depression who receive at least three medication visits or at least eight psychotherapy visits in a 12-week period [34]. | 7.00 | 7.50 | 1, 2, 2
Hospital readmissions for psychiatric patients | % of discharges from psychiatric in-patient care during a 12-month reporting period readmitted to psychiatric in-patient care that occurred within 7 and 30 days [12]. | 7.00 | 7.00 | 0, 0, 5
Length of treatment for substance-related disorders | % of persons initiating treatment for a substance-related disorder with treatment lasting at least 90 days [35]. | 6.00 | 6.50 | 0, 3, 2
Use of anticholinergic antidepressant drugs among elderly patients | % of persons age 65+ years prescribed antidepressants using an anticholinergic antidepressant drug [36]. | 6.00 | 6.00 | 1, 2, 2
Continuous antidepressant medication treatment in acute phase | % of persons age ≥18 years who are diagnosed with a new episode of depression and treated with antidepressant medication, with an 84-day (12-week acute treatment phase) treatment with antidepressant medication [15,37]. | 6.00 | 4.00 | 1, 2, 2
Continuous antidepressant medication treatment in continuation phase | % of persons age ≥18 years who are diagnosed with a new episode of depression and treated with antidepressant medication, with a 180-day treatment of antidepressant medication [15,37]. | 6.00 | 4.00 | 1, 2, 2
Timely ambulatory follow-up after mental health hospitalization | % of persons hospitalized for primary mental health diagnoses with an ambulatory mental health encounter with a mental health practitioner within 7 and 30 days of discharge [12]. | 8.00 | 7.00 | 0, 1, 4
Continuity of visits after hospitalization for dual psychiatric/substance-related conditions | % of persons discharged with a dual diagnosis of psychiatric disorder and substance abuse with at least four psychiatric and at least four substance abuse visits within the 12 months after discharge [38]. | 7.00 | 7.00 | 0, 3, 2
Racial/ethnic disparities in mental health follow-up rates | % of persons with a mental health-related visit receiving at least one visit in 12 months after initial visit stratified by race/ethnicity [18]. | 7.00 | 6.50 | 0, 3, 2
Continuity of visits after mental health-related hospitalization | % of persons hospitalized for psychiatric or substance-related disorder with at least one visit per month for 6 months after hospitalization [39]. | 6.00 | 6.00 | 0, 1, 4
Case management for severe psychiatric disorders | % of persons with a specified severe psychiatric disorder in contact with the health care system who receive case management (all types) [12]. | 7.00 | 6.50 | 0, 4, 1
Mortality for persons with severe psychiatric disorders | Standardized mortality rate for % of persons in total population with specified severe psychiatric disorders [12]. | 7.00 | 6.50 | 1, 2, 2
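As a rough consistency check, the median scores in Table 2 can be compared with the inclusion thresholds and the summary means reported in the text. The sketch below is illustrative only: the scores are transcribed as read from Table 2, the variable names are hypothetical, and it assumes (the text does not say) that the reported means are averages of the per-measure medians. Data availability is left out because, according to the text, at least four of the five panelists rated it ‘possible’ or ‘likely’ for every recommended measure.

```python
from statistics import mean

# (name, importance median, soundness median), as read from Table 2.
TABLE_2 = [
    ("Visits during acute phase treatment of depression", 7.0, 7.5),
    ("Hospital readmissions for psychiatric patients", 7.0, 7.0),
    ("Length of treatment for substance-related disorders", 6.0, 6.5),
    ("Use of anticholinergic antidepressant drugs among elderly patients", 6.0, 6.0),
    ("Continuous antidepressant medication treatment in acute phase", 6.0, 4.0),
    ("Continuous antidepressant medication treatment in continuation phase", 6.0, 4.0),
    ("Timely ambulatory follow-up after mental health hospitalization", 8.0, 7.0),
    ("Continuity of visits after hospitalization for dual conditions", 7.0, 7.0),
    ("Racial/ethnic disparities in mental health follow-up rates", 7.0, 6.5),
    ("Continuity of visits after mental health-related hospitalization", 6.0, 6.0),
    ("Case management for severe psychiatric disorders", 7.0, 6.5),
    ("Mortality for persons with severe psychiatric disorders", 7.0, 6.5),
]

# Measures whose medians reach 7 on both importance and soundness.
meets_threshold = [name for name, imp, snd in TABLE_2 if imp >= 7 and snd >= 7]

# Means of the per-measure medians (an assumption about how the text's means arise).
mean_importance = mean(imp for _, imp, _ in TABLE_2)
mean_soundness = mean(snd for _, _, snd in TABLE_2)

print(len(meets_threshold))          # 4
print(round(mean_importance, 2))     # 6.67
print(round(mean_soundness, 2))      # 6.21
```

Under these assumptions, four measures reach the median threshold of 7 on both attributes, which agrees with the four measures reported as meeting the inclusion criteria for the preliminary set, and the computed means are close to the 6.66 and 6.21 given in the text.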


The 12 indicators recommended by the Mental Health Panel assess clinically important processes and outcomes of care where there is known variation in the quality of clinical practice. The measures, selected by an international expert panel using a structured process, represent progress toward the goal of identifying consensus-based measures for international benchmarking of mental health care. This is only a step in a longer-term process, however, as indicated by the measures’ moderate ratings on importance, scientific soundness, and feasibility. Information on these measures, including their clinical rationale, basis in research evidence, specifications, prior results, and testing, can be found in the OECD report and detailed review of measures [4,27].

The selection process achieved some degree of success toward the objective of selecting a measure set reflecting the diversity of mental health systems. The majority of the measures are applicable across diagnostic categories, whereas five evaluate care specific to depression and substance-use disorders—conditions of high prevalence, morbidity, and treatability. The measures evaluate several domains of quality, including treatment, continuity, coordination, and outcome. They assess several modalities of treatment, including medication management, psychotherapy, and case management. They evaluate care in both in-patient and outpatient settings, as well as care for vulnerable subgroups such as elderly patients and racial/ethnic minorities.

Limited to measures previously implemented within OECD member countries and constructible from pre-existing administrative data sets, the resulting measure set also has significant gaps. Hundreds of quality measures have been proposed for mental health care; however, a US study found wide variability in their evidence base, operational development, validity and reliability, and adequacy of case mix adjustment [4,26]. Table 1 demonstrates the implications of these findings for this project—namely the loss of diversity among measure topics as standards for importance, soundness and feasibility are applied. The recommended measure set lacks indicators of prevention, access, assessment, and safety of care. Although several measures could be meaningfully stratified by age, the set does not include measures specific to children’s mental health services. Common comorbidities of mental illness—including substance-use disorders and medical conditions—are only indirectly addressed. Measures were not available to assess emergent care or services at intermediate levels of care, such as residential or partial programs. The medication measures examine appropriateness of care more effectively than the measures of psychosocial interventions such as therapy or case management. A contributing factor is that administrative data provide more detailed information about medication management (e.g. dose, duration, and intensity) than about the content of psychosocial treatments. These challenges have been encountered in other initiatives that have sought to identify measures based on administrative data [30]. They are intensified here by the further restriction to data available on a comparable basis across several countries.

Limitations to this study include the number of participants (seven individuals from six countries and one international organization). Panelists did have a unique depth of experience, each previously having participated in one or more national or international consensus development processes on this topic. The generalizability of the results may be limited to developed countries, because developing countries struggle with the adequacy of their data systems in addition to their systems of mental health care. The WHO has developed the WHO Assessment Instrument for Mental Health Systems (WHO–AIMS) to support quality assessment and planning in low- and middle-income countries lacking robust systems for administrative data [31].

It should additionally be noted that measures of rate-based processes and outcomes represent a subset of a broader range of approaches to quality assessment in mental health care. Other methods providing essential insights include: (i) evaluation of patient perceptions of care, (ii) measurement of clinical outcomes, such as change in symptoms, functioning, or quality of life, and (iii) assessment of the fidelity of evidence-based interventions to their empirically proven models. There has been little standardized implementation of such instruments either within or between nations [32].

Progress from this point forward is likely to be incremental and iterative. First, this measure set will need refinement as the OECD further investigates the availability of required data elements among countries participating in the benchmarking initiative. Secondly, pilot implementation of these measures would provide data allowing for measure testing and development of case mix adjustment. Thirdly, the measure of life expectancy among individuals with severe mental illness illustrates the potential for linkage among data sources to contribute to the development of meaningful quality measures. The indicator reflects research findings that individuals with severe mental illness die at a younger age than members of the general population and that better detection and general medical care for these individuals could contribute to narrowing this gap [33]. The OECD plans to further explore the ability of member countries to link national health care and mortality data sets. Analogous linkages may present opportunities to examine educational or criminal-justice outcomes of mental health care. Fourthly, these initial efforts at consensus development should be built upon by identifying key clinical variables that, if added to existing administrative data systems, would allow for measurement of important clinical processes. A more extensive but crucial undertaking is developing consensus nationally and internationally on standardized methods for structured assessment of clinical diagnoses, symptom severity, and functional impairment as well as on tools to evaluate patient experiences and treatment fidelity.

In the meantime, the measure set recommended herein provides a starting point for international benchmarking of mental health care. It covers several relevant dimensions of care and addresses known variations in important processes and outcomes. Although much work will be needed to refine, specify, implement, and augment these measures, they provide a foundation for further progress.


The authors thank Peter Hussey for his contributions to the indicator selection process, Elizabeth Cote and Leighna Kim for providing research support, and Victoria Braithwaite and Orla Kilcullen for their help in preparing this manuscript. The comments of the members of OECD Expert Group on the Health Care Indicators Project and the Ad Hoc Group on the OECD Health Project on an earlier version of this manuscript are greatly appreciated. The authors also thank John Martin, Martine Durand, Peter Scherer, and Jeremy Hurst for review and comments. This paper reflects the opinion of the authors and not an official position of the OECD, its Member countries or institutions participating in the project.

