Developing a function impairment measure for children affected by political violence: a mixed methods approach in Indonesia

Wietse A. Tol, Ivan H. Komproe, Mark J. D. Jordans, Dessy Susanty, Joop T.V.M. De Jong
DOI: http://dx.doi.org/10.1093/intqhc/mzr032. Pages 375-383. First published online: 15 June 2011


Objective Practitioners in political violence-affected settings would benefit from rating scales that assess child function impairment in a reliable and valid manner when designing and evaluating interventions. We developed a procedure to construct child function impairment rating scales using resources available in low- and middle-income countries.

Design We applied a mixed methods approach. First, rapid ethnographic methods (brief participant observation, collection of diaries and a focus group with children) were used to select daily activities that best represented children's functioning. Second, rating scales based on these activities were examined for their psychometric properties. Construct validity was assessed through a confirmatory factor analysis procedure.

Setting Central Sulawesi, Indonesia.

Participants Qualitative data were collected for 53 children and psychometric testing was done with 403 children [average age: 9.9 (SD = 1.21), 49% girls] and 385 parents.

Results Using locally available resources, we developed separate child-rated and parent-rated scales, both containing 11 items. The child-rated scale evidenced good internal, test–retest and inter-rater reliability and acceptable convergent and discriminant validity. Construct validity was confirmed by fit of the theorized factor structure—a social-ecological clustering of daily activities.

Conclusions The procedure resulted in a reliable and valid rating scale to assess child function impairment in the context of political violence. Practitioners can apply this procedure to develop new locally adequate rating scales to strengthen epidemiological surveys, baseline assessments, monitoring and evaluation and eventually, interventions. Further research should address the importance of gender differences and criterion-related validity.

  • function impairment
  • measurement
  • evaluation
  • violence
  • armed conflict
  • Indonesia
  • psychosocial

Introduction


Mental health is a major contributor to the global burden of disease [1]. This is also the case in low- and middle-income countries (LAMIC), where mental health problems constitute 11.1% of the burden of disease [2]. Globally, political violence is a major risk factor for public health, including for child [3], adolescent [4] and adult mental health [5, 6]. Political violence particularly affects LAMIC; 89% of the 36 armed conflicts recorded in 2009 took place in lower income settings, the majority in Asia (42%) and Africa (33%) [7]. In recognition of the importance of child mental health in settings affected by political violence, mental health and psychosocial support for this population group are increasingly integrated in humanitarian response [8].

A major controversy in this field, however, remains the emphasis on a biomedical framework to describe the mental health and psychosocial consequences of political violence on children and adolescents, especially the dominant focus on posttraumatic stress disorder (PTSD) as described in international psychiatric classification [9, 10]. Studies addressing child and adolescent mental health in areas of political violence have by and large consisted of (i) assessment of exposure to political violence, (ii) assessment of psychological symptoms, generally through symptom checklists, and subsequently (iii) the establishment of a statistical relationship (generally successfully) between exposure and psychological symptoms [9–11].

Such an isolated focus on a pathogen-symptomatology relationship (i.e. war-related stressor leads to psychological symptoms) problematically neglects the wider socio-cultural context in which the development of symptoms, symptom presentation and help-seeking are shaped. Socio-cultural contexts influence the way in which psychological symptoms are experienced and described, and what types of care are available and used. In Indonesia, for example, we found that children, parents and teachers often explained the impact of armed conflict on child psychosocial wellbeing in terms of somatic (e.g. fever and weakness) and moral complaints (e.g. ‘bad behavior’ such as smoking and early sexual interaction), for which massage healers and (religious) teachers were felt to be important healing resources [12].

In accordance with this critique, a variety of authors have called for the development of culturally sensitive measures that go beyond assessment of symptom clusters to measure child functioning more generally [4, 13–15], e.g. through the contextually sensitive development of function impairment measures [16]. The availability of a contextually sensitive measure of function impairment would support public health practitioners in a number of ways. First, it would help in assessing the burden associated with political violence in a manner considered relevant by local stakeholders (e.g. as interfering with children's daily activities such as schooling or household duties). Furthermore, inclusion of such an instrument in treatment outcome studies would enable the demonstration of the benefits of an intervention on indices of importance to local stakeholders and the reality of daily life in a specific context. Contextually sensitive assessment of functioning would also aid in statistically teasing out the specific relations that clusters of psychological symptoms may have with decreased functioning [17]. Such data may nuance positions in the universalism-cultural relativism debate, by showing which types of mental health symptoms are most strongly related to function impairment in specific population (sub)-groups and/or under what circumstances. Finally, a function impairment measure would be useful for strengthening psychiatric diagnosis with children. Although psychiatric classification systems specify function impairment as a diagnostic criterion for most psychiatric disorders, research with symptom checklists rarely takes function impairment into account [14].

Inclusion of function measures in studies of child mental health is increasingly common and scales with good psychometric properties are available for this purpose [18]. However, as concluded in a 10-year review of rating scales assessing functioning by Winters et al., ‘most scales do not optimally address cultural context’, which is problematic given the different values that are given in different socio-cultural contexts to participation in daily activities [18]. For example, transcultural researchers in non-industrialized rural settings are greatly challenged to find alternative activities for items such as ‘playing computer games’ or specific sports activities which may not be common in such settings. Even when finding suitable alternatives, differences remain in how activities are valued. For instance, parents in the developing world may place more emphasis on children's participation in the household's activities, including economic activities and care for siblings. Similarly, items that aim to convey the importance of ‘self-fulfillment’, ‘autonomy or independence’, ‘assertiveness’ or ‘interiorization of conscience’ may have questionable validity in settings with different notions regarding the ideal relations between self, the family and the community [19].

Given the absence of a function impairment measure for children and adolescents in LAMIC settings, the aim of this paper is to describe the development, piloting and results of a method to construct context-specific function impairment measures. We aimed to construct a useful measure for LAMIC with sound psychometric properties. First, for the measure to be useful in LAMIC settings (i.e. in settings with limited mental health research infrastructure), we aimed to develop a method that could be administered within a limited time frame with resources available to organizations working with vulnerable populations. Second, we aimed to assess psychometric properties of the developed rating scale. We assessed internal consistency, test–retest and inter-rater reliability. With regard to convergent and discriminant validity, we expected negative correlations of function impairment with strength measures (hope, social support, coping and family connectedness) and positive correlations with exposure to traumatic events and psychological symptom measures (PTSD, depression and a local idiom of distress of mainly somatic symptoms associated with trauma exposure including headaches, dizziness and trembling [12]). We hypothesized these correlations to be of a moderate size, given the variety of influences on function impairment in a low-resource politically unstable setting. Finally, we expected acceptable construct validity, evident by the fit of a theoretical factor structure to the data.

Methods



Research was conducted within the context of a public mental health project for children affected by political violence in Indonesia and other settings [20]. Within this project, a cluster randomized trial was implemented to assess the efficacy of a secondary prevention school-based intervention [21, 22]. The school-based intervention was aimed at reducing symptoms in children with psychological distress as well as increasing indicators of resilience [23]. The project was conducted in Poso on the island of Sulawesi, an area that has suffered communal violence between Protestant and Muslim groups since 1998, rooted in changing economic, demographic and governance relations [12]. Data were collected in two steps: (i) a scale construction phase consisting of qualitative research methods and (ii) the analysis of piloting and baseline survey data from the cluster randomized trial to assess psychometric properties of the constructed scale. To ensure the development of contextually valid tools, we chose to combine qualitative and quantitative methods in a (so-called) mixed methods approach, which is an often-advocated strategy in transcultural mental health research [24–26].

Scale construction

We aimed to adapt an earlier published strategy to develop function impairment measures in rural Rwanda and Uganda for adults by Bolton and Tang [16]. Their method uses qualitative free listing interviews (n = 20–40) to identify common daily tasks from the point of view of the target population themselves (e.g. ‘What are the tasks that men/women must do regularly to care for themselves?’). The most common activities thus identified are then placed in a pre-existing 10-item format that specifies three activities regarding individual functioning, three activities regarding family functioning, three activities regarding community functioning and one open item [16].

We adapted Bolton and Tang's methodology for adults to our purpose with children in two ways. First, we chose to implement different qualitative methods for data collection, since in previous use of free listing interviews among Nepali children we found high rates of socially desirable responses (data not shown). Second, as part of this pilot procedure we used more than one technique to collect data in order to enable triangulation. That is, three qualitative data techniques were implemented: brief participant observation, the collection of diaries and focus group discussions (see Table 1). In the first step (brief participant observation) we aimed to collect a more general outline of the places where diverse activities took place, as well as the time spent on these activities. Rather than assume the equal importance of three dimensions of functioning (i.e. three items on individual, family and community functioning as specified in the format proposed by Bolton and Tang cited above [16]), we were interested in how much time children spent in different contexts. In addition, given the mistrust that families may have towards researchers in a conflict-affected setting, this step was aimed at introducing the researchers to children and their families.

Table 1

Overview of qualitative data collected

| Step | Aim | Participants and data collected | Analysis |
| --- | --- | --- | --- |
| 1. Brief participant observation | Identify important dimensions of daily activities | Observation, note-taking and unstructured interviews, while spending 2 weeks with two boys and two girls, representative of the area's religion/livelihood | (a) Listing of activities noted, (b) categorizing activities in function domains and (c) estimating time spent per domain |
| 2. Diaries | Collect and count specific daily activities | All children in one grade of two schools (convenience sample) keep a diary for 2 weeks, noting hourly activities and their location | (a) Dividing activities over identified function domains (Step 1) and (b) counting occurrences |
| 3. Focus group | Explore children's perspectives on daily activities | Group interview (n = 9, mixed gender/ethnicity/religion) focused on (a) listing of daily activities, (b) categorizing these activities and (c) prioritizing these categories on a pie chart, all according to participants' perspectives | Comparison of function domains and their prioritization as described by children with findings from Steps 1 and 2 (triangulation) |

In the second step, we aimed to collect a set of specific activities in which children engaged in order to select the most-mentioned activities as items for our rating scale. We requested children from two classrooms in the area (convenience sample, n = 40) to keep a diary for 2 weeks, noting each hour what they had been doing and where.

In the last step we conducted a focus group discussion with children (nine participants, mixed religious group [three Christians and six Muslims], 11–12 years old) in order to triangulate data from the previous steps from children's own perspectives. In this focus group, children were asked to (i) brainstorm on activities commonly engaged in by children (in two sub-groups with mixed gender), (ii) divide these activities in categories as the group saw fit (whole group) and (iii) come to a consensus on how important these categories were according to the group as a whole, using the metaphor of dividing a pie.

Based on the data collected in these three steps, the research team made a decision regarding (i) the categories of the instrument (e.g. individual, family, community functioning or otherwise) and (ii) specific items to include under these categories (results described below). We asked about these items on a 4-point Likert scale indicating no impairment (1), a little impairment (2), moderate impairment (3) or severe impairment (4). We chose to use a 4-point scale rather than the 5-point scale proposed by Bolton and Tang, in order to simplify the scale for use with children and to avoid having a middle option. Scores on these items were subsequently added together to obtain a total score (e.g. on a 10-item questionnaire, the range of the scale would be 10–40). To help children choose among these alternatives, we used a pictogram consisting of a child carrying an increasingly heavy bucket of water. We administered instruments with the same items to both children and parents.
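The scoring rule described above (item ratings from 1 to 4, summed to a total) can be illustrated with a minimal sketch. The function names and the example responses are hypothetical and are not taken from the study materials; the sketch only assumes that each item is rated on the 4-point scale and that the total is a plain sum.

```python
from typing import Sequence

MIN_RATING = 1  # no impairment
MAX_RATING = 4  # severe impairment


def total_impairment_score(ratings: Sequence[int]) -> int:
    """Sum item ratings after checking that each lies on the 1-4 scale."""
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside {MIN_RATING}-{MAX_RATING}")
    return sum(ratings)


def score_range(n_items: int) -> tuple[int, int]:
    """Lowest and highest possible totals for a questionnaire of n_items."""
    return n_items * MIN_RATING, n_items * MAX_RATING


# Hypothetical responses to a 10-item questionnaire (not study data).
example = [1, 2, 1, 3, 1, 1, 2, 4, 1, 2]
print(total_impairment_score(example))  # 18
print(score_range(10))  # (10, 40)
```

Rejecting out-of-range ratings, rather than silently clipping them, is a design choice of this sketch; it makes data-entry errors visible before totals are analyzed.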

Evaluation of psychometric properties

We assessed the following psychometric properties: test–retest reliability, inter-rater reliability, internal consistency, discriminant and convergent validity and construct validity. Test–retest reliability was examined over a 2-week period with 51 children (15 boys and 36 girls) from a classroom of children in the area, selected through convenience sampling. Other forms of reliability and validity were calculated using the baseline data of the afore-mentioned cluster randomized trial, described in more detail earlier [21].


In short, we selected 14 schools at random in Central Sulawesi on the basis of a government-provided list in the area most affected by communal violence, the Poso district. Schools with single religions were deleted from the list, to ensure similar baseline characteristics on religion, livelihood and violence background for the treatment outcome study. Screening took place to identify children for a secondary prevention school-based intervention, using a locally constructed exposure checklist and screening checklists for posttraumatic stress disorder and anxiety.


Selection of these instruments was based on the literature and on a previous rapid ethnographic study [12]. We employed both locally constructed scales [exposure events (nine items) and a locally identified group of items including somatic complaints such as fever and fainting referred to as trauma in Bahasa Indonesia (six items, internal consistency: 0.80)], as well as standardized symptom checklists. Standardized checklists were used to assess PTSD symptoms (the Child PTSD Symptom Scale [27], internal consistency: 0.85), depressive complaints (Depression Self-Rating Scale [28], internal consistency: 0.42), coping (Kidcope [29], internal consistency: 0.68), social support (Social Support Inventory Scheme [30], internal consistency: 0.77), hope (Children's Hope Scale [31], internal consistency: 0.62) and family connectedness ([32], internal consistency: 0.72). All standardized instruments were translated with methodology proposed by Van Ommeren et al. [33]. This translation method systematizes strategies advocated previously by transcultural researchers, through the use of a Translation Monitoring Form, which records the items’ comprehensibility, acceptability, relevance and completeness while translation goes through five steps. First, the item is translated by a group of bilingual indigenous translators and back-translated. Second, an independent bilingual professional evaluates the translation, especially with regard to conceptual structure. Third, focus groups are organized to evaluate the translated items (at least one per item). Fourth, all translated items go through blind back-translation. Fifth, the translated items are pilot tested.


Data to construct the functioning scales were collected in a 2-month period in 2005. The baseline data for the outcome study were collected between March and May 2006. We trained four local research assistants with a Bachelor's degree in a related science in a 4-week period on qualitative data collection and a subsequent 5-week period on quantitative data collection. All interviews were administered after obtaining informed consent. Ethical clearance was provided by the International Review Board of the VU University Amsterdam, which evaluated the study as in accordance with the Declaration of Helsinki.


The relationship between child-rated and parent-rated scores on the functioning scales was calculated with Pearson correlations. Pearson correlations between total scores on the function impairment scale and the other measures were used to test discriminant and convergent validity. Test–retest reliability was assessed by calculating the Spearman–Brown correlation between the total score of the first measurement and the total score of the second measurement (2 weeks later). Inter-rater reliability was examined on item level, using the intra-class correlation coefficient. These calculations were performed with SPSS version 12.01.
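For readers without access to SPSS, the Pearson correlation between two sets of total scores can be sketched in a few lines. This is an illustrative sketch only, not the procedure used in the study: the function name and the paired scores below are hypothetical, and the code computes the plain Pearson coefficient without any Spearman–Brown adjustment.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical total scores for five children at two time points (not study data).
first_measurement = [12, 15, 20, 18, 25]
second_measurement = [13, 14, 22, 17, 24]
print(round(pearson_r(first_measurement, second_measurement), 2))
```

The same function applies to child-rated versus parent-rated totals, provided the two lists are paired by child.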

To explore construct validity, we used LISREL 8 to perform confirmatory factor analyses (CFA) of the function impairment scales. The CFA was done to test whether the underlying factor structure resembled the dimensions identified through qualitative enquiry. Factor structures were tested for absolute fit using four models: Model A: a one-factor solution, Model B: a multi-factor model with uncorrelated factors, Model C: a hierarchical model, with a general functioning factor predicting the factors from Model B, and Model D: Model B with correlated factors.

Results


Scale construction

Participant observation and discussions with the assessors suggested that data could best be analyzed according to a social-ecological framework, e.g. in individual, family, school and peer activities. Children seemed to spend most of their time in individual (13 h), school (4 h), family (3 h) and peer activities (3 h) and less time on community activities (1 h). We did not find very pronounced differences between genders in this division. Based on this division we decided to form a template for the rating scale with 4 items regarding individual activities, 2 items regarding family activities, 2 items regarding peer activities, 2 items regarding school functioning and 1 open item (totaling 11 items).

Similarly, we categorized all mentioned activities in diaries (total of 654 separate activities mentioned) along the four social-ecological dimensions. Within each category we selected the most frequently reported activities for inclusion in the rating scale (individual activities: keeping hygiene, sleeping, eating, praying; family activities: household chores, relating to parents; school activities: studying in school, school chores; peer activities: play, socializing with friends). We made sure to include separate examples of activities that were gender-specific, e.g. the probe for playing included examples specific to boys and girls.

Children in the focus group discussion made a division very similar to the social-ecological categories mentioned earlier (self, home, friends and school). A difference in categorization, however, was observed with girls in the focus group who decided on a separate category for religious activities. Furthermore, girls and boys did not agree on what priority to give religious activities. As praying was already included as an individual activity, we decided to keep the chosen scale categories and items from the second step. Appendix 1 contains the resulting rating scale.

Psychometric properties

The sample consisted of 403 children (385 parents) with an average age of 9.94 years (SD = 1.21) equally divided by gender (49% girls). Children reported being exposed to a mean of 3.9 different types of political violence events. Mean score on the function impairment instrument was 17.96 (SD= 5.49). Children scored highest on impairment in hygiene, sleeping and household chores. Child–parent agreement on total scores was low (r(403) = 0.27, P < 0.001).


Internal consistency of the measures was acceptable (Cronbach's alpha: child version, 0.77; parent version, 0.74). Test–retest reliability (2-week) of the child version was 0.78 (Spearman–Brown coefficient). Test–retest reliability of the parent version was not assessed. Inter-rater reliability of the child version was assessed through independent scoring of an interview conducted by one of the research assistants and watched by the other research assistants. Intra-class correlation coefficients approached 1.00.
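As an illustration of how internal consistency is obtained from item-level data, the following sketch computes Cronbach's alpha with the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the example responses are hypothetical; the sketch is not a reproduction of the SPSS analysis reported here.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `responses` holds one row per respondent, one column per item."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(responses[0])
    if n_items < 2 or any(len(row) != n_items for row in responses):
        raise ValueError("need at least two items and equally long rows")
    # Transpose rows into per-item columns and use sample variances throughout.
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical ratings of four children on three items (not study data).
ratings = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 4],
]
print(round(cronbach_alpha(ratings), 2))
```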


Discriminant and convergent validity were assessed by examining correlations between total scores on the functioning scales and total scores on measures for exposure to political violence events, symptom measures (PTSD and depression symptoms) and measures for strengths (social support, coping methods, hope and family connectedness) (see Table 2). Most correlations of the child-rated functioning measures were statistically significant and in the expected direction. However, correlation coefficients were not large in size for most relations. To interpret the size of correlation coefficients we followed Cohen [34], i.e. small correlations between −0.3 and −0.1 and between 0.1 and 0.3, medium correlations between −0.5 and −0.3 and between 0.3 and 0.5 and large correlations between −1.0 and −0.5 and between 0.5 and 1.0. Medium correlations were found for child-rated function impairment with depressive symptoms and the local trauma idiom. Small correlations were found with exposure to traumatic events, PTSD symptoms and social support. The child-rated measures did not correlate well with the other resilience constructs (hope, satisfaction with coping). Parent-rated function impairment scores generally correlated poorly with the child-rated measures, except with the family connectedness measure, with which they correlated moderately.
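The size bands described above can be expressed as a small classification helper. This is a sketch under two assumptions that are not stated in the text: a coefficient that falls exactly on a boundary (e.g. 0.30) is assigned to the higher band, and coefficients with an absolute value below 0.1 are labeled "negligible". The function name is hypothetical.

```python
def cohen_size(r: float) -> str:
    """Label a correlation coefficient using the bands attributed to Cohen [34]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    magnitude = abs(r)  # the bands are symmetric around zero
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "negligible"  # assumed label; the text defines no band below 0.1


# Child-rated coefficients as reported in Table 2.
child_rated = {
    "exposure": 0.18,
    "PTSD symptoms": 0.14,
    "depressive symptoms": 0.30,
    "trauma idiom": 0.39,
    "satisfaction with coping": -0.03,
    "social support": -0.22,
    "hope": -0.06,
    "family connectedness": -0.10,
}
for measure, r in child_rated.items():
    print(f"{measure}: {r:+.2f} ({cohen_size(r)})")
```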

Table 2

Correlations of function impairment scale with other measures

| | Exposure | PTSD symptoms | Depressive symptoms | Trauma idiom from qualitative study | Satisfaction with coping methods | Social support | Hope | Family connectedness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Function impairment Indonesia (child rated) | 0.18*** | 0.14** | 0.30*** | 0.39ª,*** | −0.03 | −0.22*** | −0.06 | −0.10* |
| Function impairment Indonesia (parent rated) | […] | […] | […]*** | 0.13** | 0.03 | −0.13* | 0.04 | −0.32*** |
  • ªSix somatic items related to the local idiom of ‘trauma’.

  • *P < 0.05; **P < 0.01; ***P < 0.001.

We assessed construct validity by fitting the hypothesized factor structure (the social–ecological division of items: four individual activities, two family activities, two school activities and two peer activities) using confirmatory factor analysis on the child-rated function impairment scales (Table 3). This resulted in excellent fit of the theoretical factor structure, with hypothesized functioning domains predicting their item scores as latent constructs. Functioning domains were strongly correlated (see Fig. 1).

Table 3

Confirmatory factor analyses

| Model | χ² | df | Δχ² | Δdf |
| --- | --- | --- | --- | --- |
| Model Aᵃ | 33.56 | 35 | | |
| Model Bᵇ | 402.03 | | −368.47 | 0** |
| Model Cᶜ | 26.41 | | 375.62 | 4** |
| Model Dᵈ | 18.46 | | 7.95 | 2 |

  • Note: ᵃA one-factor solution (all items loading on one general function impairment latent variable, no correlation of error terms). ᵇA multi-factor model (items loading on socio-ecological latent variables, no correlation of error terms). ᶜA hierarchical model with one general function impairment factor predicting the factors from Model B. ᵈModel B with correlated factors.

  • **P < 0.01.
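The comparison of nested models in Table 3 rests on chi-square difference tests. As a worked illustration, the sketch below recomputes the difference between Models C and D (χ² = 26.41 and 18.46, a difference of 2 degrees of freedom) and between Models B and C (χ² = 402.03 and 26.41, a difference of 4 degrees of freedom). The helper function is hypothetical and uses a closed-form expression for the upper tail of the chi-square distribution that holds only for even degrees of freedom; it is not the LISREL procedure used in the study.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable; valid for even, positive df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Model C versus Model D, values as reported in Table 3.
delta_cd = round(26.41 - 18.46, 2)
p_cd = chi2_upper_tail_even_df(delta_cd, 2)
print(delta_cd, round(p_cd, 4))

# Model B versus Model C.
delta_bc = round(402.03 - 26.41, 2)
p_bc = chi2_upper_tail_even_df(delta_bc, 4)
print(delta_bc, p_bc < 0.01)
```

Under these assumptions the B versus C difference falls well below the P < 0.01 threshold, while the C versus D difference does not, which matches the pattern of asterisks in Table 3.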

Figure 1

Factor structure of function impairment scale. Fit indices: χ² (29) = 18.46; P = 0.93; RMSEA = 0.00; CI RMSEA = 0.00–0.01; NFI = 0.98; NNFI = 0.99. All estimates in the model were standardized.

Discussion


We aimed to develop a method that could be used in a variety of settings to construct a function impairment measure, using resources available in LAMIC settings. A mixed qualitative–quantitative methods approach, including brief participant observation, collection of diaries and a focus group discussion, resulted in a measure with high face validity. Data were collected and analyzed locally within a 2-month period, thus largely satisfying our feasibility conditions set out in the introduction.

With regard to hypothesized psychometric properties of the instrument, the results showed good evidence for construct validity given the excellent fit of the social-ecological factor structure identified in the qualitative pre-phase. Convergent validity was as expected for depressive symptoms and a local trauma idiom. Evidence for discriminant validity, as assessed through correlations with measures of strength (hope, social support, satisfaction with coping methods and family connectedness), was weaker than expected, except for social support. We found low correlations between the child-rated and parent-rated measures, which, although unexpected, is in accordance with findings in the international literature [35]. Based on this latter finding, it may be recommended that only child-rated measures are developed for replication purposes.

The strongest correlations were found between function impairment on the one hand and depression, a local trauma idiom and social support on the other hand. Further analyses should be directed at further disentangling these relations, e.g. through structural equation modeling, preferably in prospective designs. A preliminary conclusion based on these data, however, would warrant screening for these sets of symptoms and low social support by primary care professionals, in order to prioritize treatment for children with likely high levels of function impairment.

Our findings must be interpreted in light of a number of limitations. First, based on the brief participant observation, we did not construct separate scales for male and female children. Future application of the method could address the question of whether gender-specific scales would yield a stronger correlation pattern with measures of mental health. Second, we did not examine the relation between function impairment and a more general external criterion such as school functioning or physical health status. This remains an important avenue for further research. Third, as previously mentioned, our cross-sectional design prohibits conclusions regarding causality, and future research should address the relations between function impairment and mental health measures in a longitudinal design. Future research could also address whether fewer qualitative techniques may be employed to construct measures with similar psychometric properties, to further reduce costs and increase feasibility. As part of this piloting procedure we included three different types of qualitative data collection (brief participant observation, diaries, focus group discussion) and felt this provided ample opportunity for triangulation. However, an alternative strategy may be to focus on brief participant observation and conduct multiple focus group discussions.

Despite these limitations, we feel that this method holds promise for use in the development and evaluation of mental health and psychosocial support programs for children affected by political violence internationally. An indication of the usefulness of the procedure may be seen in its uptake by authors in other settings [36, 37]. In addition, inclusion of the function impairment scale in a cluster randomized trial showed significant change in function impairment for girls in the intervention group over a 6-month period, thus indicating that the instrument is sensitive to change. The latter is an important asset of instruments for inclusion in evaluation studies [18]. We hope that further development of this procedure may aid in identifying symptom clusters and intervention elements that are capable of preventing and reducing function impairment in children affected by political violence.

Funding


This work was supported by PLAN Netherlands.

Acknowledgements


We would like to thank Prof. Ria Reis and Dr. Paul Bolton for their assistance in the design of the procedure and Dr. Paul Bolton for his comments to earlier drafts of this manuscript.

Appendix 1

Child functioning impairment rating scale—Central Sulawesi

Instructions [READ ALOUD]:

I would like to ask you now about activities that you do in your daily life. Please tell me if you have had any difficulties with these activities in the last 2 weeks. If you have had any difficulties with an activity in the last 2 weeks, please tell me how much difficulty you had with that activity. To help you decide how much difficulty you had, we can again use the drawing of the child carrying the water. Remember that drawing [SHOW FLASHCARD]? Please point to the picture on the drawing that is closest to how much difficulty you felt with the activity.


Sekarang kami akan menanyakan kamu tentang kegiatan-kegiatan yang kamu lakukan dalam hidup kamu sehari-hari. Tolong katakan pada kami jika kamu memiliki kesulitan-kesulitan dalam melakukan kegiatan-kegiatan ini dalam 2 minggu belakangan ini. Jika kamu memiliki kesulitan dengan kegiatan-kegiatan ini dalam 2 minggu belakangan, tolong katakan seberapa besar kesulitannya. Untuk menolong kamu menentukan seberapa besar kesulitan, kita dapat memilih gambar anak yang membawa air seperti sebelumnya. Ingat gambar ini [TUNJUKKAN GAMBAR]? Tolong tunjuk gambar yang sesuai dengan besarnya kesulitan kamu dalam melakukan kegiatan-kegiatan ini.
