What is the role of an MAU instrument in economic evaluation?
When the quality of life is affected by a medical intervention it must be included in an economic evaluation. The most common approach is to employ Cost Utility Analysis (CUA). This measures the cost of achieving an additional quality adjusted life year (QALY). All else equal interventions are preferred when the cost per QALY is low. (However other factors may be of importance such as fairness.) QALYs are calculated by multiplying life years by an index of utility measured on a 0-1 scale. MAU instruments are designed to measure utility and to facilitate the calculation of QALYs. The dimensions of an MAU instrument may also be used to profile the effect of an intervention (ie describe how dimensions of QoL vary because of the intervention).
What is utility?
MAU instruments measure 'utility' which is an index of the strength of a person's preference for a health state. This is usually measured on a scale on which zero (0.00) represents death and unity (1.00) is good health. When the utility index number is multiplied by the number of years in this health state we obtain the number of QALYs.
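The multiplication described above can be sketched in a few lines. This is a minimal illustration only; the function name `qalys` and the example figures are hypothetical and are not part of any AQoL algorithm.

```python
def qalys(utility: float, years: float) -> float:
    """Return the QALYs for a period spent in a single health state.

    utility: index on a scale where 1.0 is good health and 0.0 is death.
    years:   number of years spent in the health state.
    """
    if utility > 1.0:
        raise ValueError("utility cannot exceed 1.0 (good health)")
    if years < 0:
        raise ValueError("years must be non-negative")
    return utility * years


# Hypothetical example: 8 years in a health state with utility 0.75.
print(qalys(0.75, 8))  # 6.0
```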
Note that 'quality' actually means 'utility' which equates with the strength of a person's preference. This differs from other concepts of the quality of a health state.
Also note that while every MAU instrument purports to measure the same quantity, ie ‘utility’, the numbers produced by different instruments actually vary. This makes the choice of instrument important. (See Choice of instrument.)
Why use a multi attribute utility instrument?
See above. MAU instruments are also useful in clinical trials where the focus of the study is well defined such as a program for improving vision. While there is a plethora of disease specific instruments, the use of a broad based (multi attribute) instrument is often desirable as it has the potential to identify unexpected effects of a therapy. In particular a narrowly focused instrument may fail to detect psycho social changes which some MAU instruments were designed to measure.
An advantage of a MAU instrument is that it weights the various responses by the relative importance (preference weight or utility) to the public of each attribute which allows a meaningful summation of scores.
Are MAU instruments intended to measure individual utility?
No. Economic evaluation like all summative program evaluation measures population outcome: that is, the extent to which a program works. Similar to other outcome indicators, the utility scores are the average utilities obtained from a group of patients, trial participants and/or controls. Typically, there is a very large dispersion of individual results around the average score and, consequently, the average QALY value would be a highly unreliable predictor of the utility for any one individual person. It follows from this that QALYs are only appropriately used for assessing the overall impact of a program and not the benefit obtained by any given individual from that program. The purpose of cost utility is to rank the overall program and not to provide advice to clinicians or program managers about individuals.
Can the non-utility values from an MAU instrument be stand-alone units of outcome?
Yes. This could occur if participants in a study completed the AQoL before and after an intervention. The difference between the obtained scores would provide a measure of the effect size of the program. Non-utility values are commonly used in the psychometrics and medical literature. They are easily calculated from an instrument by assigning numbers (1…5) to the response categories when these are consistently in ascending or descending order of importance. The response numbers for different questions are simply added and the total number rescaled so that the instrument score varies between 0-1 or 0-100.
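The unweighted scoring just described (add the response numbers, then rescale) might be sketched as follows. The function name, the default of five response levels and the example responses are assumptions for illustration; whether a high score means better or worse health depends on how the response categories are ordered.

```python
def unweighted_score(responses, n_levels=5, scale_max=100.0):
    """Sum item responses coded 1..n_levels and rescale the total to 0..scale_max."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(not 1 <= r <= n_levels for r in responses):
        raise ValueError("responses must lie between 1 and n_levels")
    lowest = len(responses)              # every item answered 1
    highest = len(responses) * n_levels  # every item answered n_levels
    total = sum(responses)
    return scale_max * (total - lowest) / (highest - lowest)


# Hypothetical three-item questionnaire, each item answered with the middle category.
print(unweighted_score([3, 3, 3]))  # 50.0
```

Passing `scale_max=1.0` gives the 0-1 version of the same score.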
Can the QALY values obtained from MAU instruments be converted into dollars?
QALYs are intended to measure benefits, not costs. Typically cost utility analysis ranks programs by comparing the dollar cost per QALY obtained in the different programs of interest. In principle, a dollar value could be attached to each QALY and dollar costs compared with dollar benefits (thereby converting cost utility analysis into cost benefit analysis). The AQoL does not seek to do this. Cost effectiveness and cost utility analysis were introduced specifically to avoid the need to place a dollar value upon life per se. Some economists do convert life years or lives into dollar benefits using either the 'human capital' or the 'willingness to pay' approach (generally willingness to pay for risk reduction). These techniques are problematic.
Can the use of QALYs be justified?
There are a significant number of problems associated with the use of QALYs. Some of these are conceptual and arise from the fact that the quality and length of life are not the only outcomes from a program. For example, issues of process and the distribution of outcomes are important. Further there are a set of issues concerning the accuracy of utility measurement. For these reasons, it is important that the role, strengths and weaknesses of QALYs be understood when evaluation results are interpreted. Despite these difficulties, the majority of health service researchers appear to accept that the measurement of QALYs represents an important development since they explicitly recognize the importance of health related quality of life and enable its measurement during program evaluation. The assumptions employed in measuring QALYs are (or should be) transparent and may be subject to sensitivity analysis.
Do MAU instruments measure costs, benefits or both?
MAU and other QALY instruments have nothing to do with costs. They help to measure the benefits of health programs by quantifying the quality (utility) of different health states in such a way that the quality and length of life may be combined as quality adjusted life years (QALYs).
Do MAU instruments ‘add apples and oranges’?
Area 7.6 million square km; sheep 68.1 million; people 24.9 million = 100.6 million total
Some view the overall score of an MAU instrument this way: sight, pain and mental health cannot be added.
The criticism is invalid. Sight, pain and mental health are not added. Rather, it is the preference for these (or their value) which is added. Similarly the GDP does not add the number of transport services, holidays taken and commodities sold. Rather it adds the value of these.
Does the use of an MAU instrument replace the need for a clinical trial or evaluation?
No; ideally, economic evaluation incorporating utility measurement should be conducted alongside clinical trials. Economics is concerned with evaluating outcomes that are obtained from random control trials and other forms of experiments, quasi-experiments and pre-experimental research. It complements and does not replace other evaluative techniques. The result of an economic evaluation can be no better than the rigor of the clinical/epidemiological evaluation upon which it is based.
Does the use of the value of a statistical life (VSL) obviate the need for QALYs?
The theory underpinning the calculation of the Value of a Statistical Life (VSL) is logically invalid and the results are empirically inconsistent with an individual's ability to pay.
How can the quality of life (utility) be measured?
Utility may be measured in numerous ways. Each of these allows people to express the strength of their preference for a health state relative to death and good health. Common techniques are: (a) Rating Scale (b) Standard Gamble; (c) Time Trade-Off; and (d) Person Trade-Off.
A new technique the Relative Social Willingness to Pay (RS-WTP) is under development at the Monash CHE (see Richardson et al. (2007) Research Paper 22).
The AQoL instruments employ the Time-Trade-Off (TTO), in which a person indicates the proportion of a given number of remaining years of life (usually defined as 10 years by the interviewer) that they would be prepared to give up in order to avoid living in the health state being measured. For example, if a person with a life expectancy of ten years on a dialysis machine was prepared to give up two of these years (that is, 20%) to be in good health, then their utility score would be 0.8 (i.e., 1.0 - 0.2, where 1 represents good health). An adjustment may be made to this calculation to allow for a person's rate of time preference.
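The arithmetic of the dialysis example can be written out as a short sketch. The function name `tto_utility` is hypothetical, and the adjustment for time preference mentioned above is deliberately left out.

```python
def tto_utility(years_given_up: float, horizon_years: float = 10.0) -> float:
    """Utility implied by a time trade-off answer, with no time-preference adjustment.

    years_given_up: years the person would sacrifice to live in good health.
    horizon_years:  remaining life years offered by the interviewer (10 in the text).
    """
    if horizon_years <= 0:
        raise ValueError("horizon_years must be positive")
    if not 0 <= years_given_up <= horizon_years:
        raise ValueError("years_given_up must lie between 0 and horizon_years")
    return 1.0 - years_given_up / horizon_years


# The example from the text: giving up 2 of 10 years implies a utility of about 0.8.
print(round(tto_utility(2, 10), 2))
```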
How do the values (utilities) given to us by MAU instruments help answer the question: 'Should we undertake project/program X?'
Firstly, the values may be the basis for cost utility analysis. With this, the utilities obtained from the AQoL will be multiplied by the number of years spent in the health state. This gives the number of quality adjusted life years (QALYs). The additional QALYs arising from a health intervention (the health program) are compared with the program costs to give the cost per QALY for the program. This may be compared with the cost per QALY of other programs. Unless there are other relevant factors such as social equity we would normally prefer programs where the cost per QALY is lowest as we may thereby obtain the largest number of QALYs (the best outcome) from a given budget. Importantly, cost utility analysis of this type cannot answer the question 'Should we undertake Program X?' as it only enables programs to be ranked by their cost per QALY. Of course a judgement may be made that the cost of a QALY is clearly too high or so low that a decision about the program is self-evident.
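The ranking step in cost utility analysis can be illustrated with a small sketch. The program names, costs and QALY gains below are invented for illustration, and the class and function names are hypothetical; the sketch ranks programs but, as noted above, does not by itself say whether any program should be undertaken.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    cost: float          # additional program cost in dollars
    qalys_gained: float  # additional QALYs attributable to the program


def cost_per_qaly(program: Program) -> float:
    """Dollar cost of achieving one additional QALY."""
    if program.qalys_gained <= 0:
        raise ValueError("the program must gain QALYs for the ratio to be meaningful")
    return program.cost / program.qalys_gained


def rank_by_cost_per_qaly(programs):
    """Return programs ordered from lowest to highest cost per QALY."""
    return sorted(programs, key=cost_per_qaly)


# Hypothetical programs (all figures invented).
programs = [
    Program("Program B", cost=120_000, qalys_gained=4.0),
    Program("Program A", cost=50_000, qalys_gained=5.0),
]
for p in rank_by_cost_per_qaly(programs):
    print(p.name, cost_per_qaly(p))
```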
Secondly, the utility values may be compared before and after a program. Where this is done for several programs, the different utilities can be compared. If one program is superior with respect to all possible criteria (cost, length of life, quality of life, and any other relevant factor) then it dominates the 'alternatives' and it should be preferred.
Thirdly, the AQoL can be used in program evaluation to produce a profile of health-related quality of life (HRQoL) as defined by the five different dimensions of health contained within it. Where measurement is made repeatedly, changes in health profiles can be tracked over time. This can be done for each dimension separately, or using the overall AQoL utility values.
Isn’t the composite approach to the measurement of QALYs superior to the use of a generic MAU instrument such as the AQoL?
There are two ways of developing health state scenarios. With the composite approach, health state scenarios or vignettes or complex health state descriptions have been constructed and numerical values placed upon them using direct scaling. Under the multi-attribute utility (MAU) approach, health states are decomposed into a generic descriptive system and a set of scale values corresponding with each possible health state in the descriptive system developed. As with the scenario-based approach, the values are obtained using one of the standard scaling techniques, viz. rating scales, time trade-off, standard gamble, person trade-off or, most recently at CHE, the relative social willingness to pay. Both the approaches have strengths and weaknesses. The composite approach may include more context specific information and may describe a changing health scenario. It may include the risk and prognosis facing the patient. However, the validity of such scenarios is seldom (if ever) tested in the way in which the AQoL is being tested. In addition, because very few health states can be validated in this manner due to the time and cost of doing so, vignettes are limited in the range of health states covered and are insensitive. The use of vignettes in a series of studies increases the likelihood of comparability in the measurement of HRQoL. Generic instruments, on the other hand, are relatively cheap to administer and can be used across many health conditions and for these reasons they can be completed by the same group of patients periodically during a longitudinal study in order to create a time profile of the HRQoL rather than relying upon a single point in time estimate.
HRQoL Indices
As noted there is a large set of possible QALY-like indices. Each of these is defined (inter alia) by the choice of:
- the scaling instrument (Time Trade-off, Standard Gamble, Person Trade-off, Rating Scale, etc);
- the time frame evaluated (single year; duration of health state);
- the group which judges the health state (general public, patient, potential patient);
- the perspective, social or individual, which is adopted (imagine you are the patient versus imagine you are on a health committee judging social importance); and
- the inclusion or exclusion of additional value weights (for example, for age, initial severity, social group).
What are the chief criticisms of QALYs?
These are too numerous to summarise here. Despite the answer to the previous question the team leader, in particular, has been a critic of both QALYs and economic evaluation. Critiques available on the Monash Centre for Health Economics website (usually drafts of later publications) include the following research papers: 34 (2009); 18 (2007); 8 (2005); 7 (2005); 140 (2003); 134 (2002); 129 (2002); 120 (2001); 112 (2000); 111 (2000); 105 (2000); 108 (1999); 77 (1997); 50 (1995); 45 (1995); 23 (1992); 5 (1990); 1 (1990).
What are the differences between the holistic (composite) and the MAU (decomposed) approaches to measuring QALYs?
The two approaches are similar in principle but different in practice. Both commence with a description of a health state and, secondly, place a numerical value upon the health state. The holistic approach treats each health state as being unique. Typically, people who have experienced the health state will be interviewed and elements of particular importance for their quality of life will be summarised in a vignette or written scenario. Anything relevant to the QoL or which helps describe it may be included in a vignette for a CUA of breast cancer treatment. This vignette is subsequently presented to other individuals for assessment using one of the utility scaling instruments (TTO, etc) and a utility score is placed upon the entire health state. In contrast, and as described earlier, the MAU methodology employs a generic multi attribute descriptive system, ie a questionnaire. Utility scores are assigned to a health state (ie a combination of attribute levels) using a formula which has been constructed from the utility scores of (generally) a cross section of the population obtained during the construction of the MAU instrument.
What is the difference between QALYs and DALYs?
DALYs, HYEs and QALYs (narrowly defined) are three of a much larger group of possible metrics which combine life years and the health related quality of life. In principle, any of the metrics in the set could be candidates for use in health services research. Partly for historical reasons most have not been considered and only the DALY, HYE and QALY have received individualised names. This does not, however, imply that they are fundamentally different in kind or purpose from many of the unnamed alternative metrics.
QALYs: Quality Adjusted Life Years are calculated as life years times an index of utility (strength of preference) where the index varies from 1.0 (full health) to 0.0 (death). The index is measured as the average utility of a twelve month period in a particular health state and this has usually been measured using the Standard Gamble or Time Trade Off technique.
HYEs: Healthy Year Equivalents are calculated using only the standard gamble technique which its proponents claim to be the theoretically correct scaling instrument (a view which is disputed). An index of utility is calculated for the entire (multi-year) period in a health state using the standard gamble. This is subsequently converted into healthy year equivalents using a second stage standard gamble in which the probability is fixed (and equal to the value found in Stage 1) and the number of years of full health are varied.
DALYs: Disability Adjusted Life Years are calculated from years with disability or poor health, as with QALYs, by multiplying the unadjusted life years by an index of the health related quality of life where this index refers to a single year in a health state. In this case, the index is calculated from a scale which is calibrated at selected points initially using the Person Trade-Off technique. As this adopts an impersonal or societal perspective some argue it does not truly measure (individual) utility. In the WHO (Murray-Lopez) version, life years are also multiplied by an importance weight for people's age. The Australian DALY studies have not used these. DALYs have normally been calculated as a loss of utility which is numerically identical to 1.0 minus the utility of a health state. (This has no substantive significance.) In burden of disease (BoD) studies, DALYs calculated as described above are added to the years of life lost because of a disease to give the total DALY loss.
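The DALY arithmetic described above (a loss equal to 1.0 minus the index, plus years of life lost) can be sketched as follows. The function names and the example figures are hypothetical, age weights are omitted (as in the Australian studies mentioned), and the sketch shows only the numerical relationship; actual DALY weights come from a Person Trade-Off calibrated scale, not from an MAU instrument.

```python
def health_loss_weight(index: float) -> float:
    """Loss per year in a health state: 1.0 minus the quality of life index."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("index must lie between 0.0 and 1.0")
    return 1.0 - index


def total_daly_loss(years_in_state: float, index: float, years_of_life_lost: float) -> float:
    """DALYs from time lived in poor health plus years of life lost to the disease."""
    years_lost_to_disability = years_in_state * health_loss_weight(index)
    return years_lost_to_disability + years_of_life_lost


# Hypothetical case: 10 years lived at an index of 0.75, then death 5 years early.
print(total_daly_loss(10, 0.75, 5))  # 7.5
```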
What is the relationship between utility and non-utility instruments?
In addition to the utility instruments there are a very large number of disease-specific and a smaller number of generic non-utility instruments (the SF36, the Nottingham Health Profile and the Sickness Impact Profile being examples of the latter). Most of these purport to measure health status. We recommend that researchers should incorporate all three levels of measurement. Each level provides different information about the effectiveness of an intervention and the different levels complement each other. Thus evaluation studies should include a disease specific instrument, one of the generic health-status instruments and a utility instrument. The defining difference between the generic disease specific/health status instruments and the utility instruments is that the latter apply utility weights to different dimensions of health; these utilities (or disutilities) are then used in either a summative or multiplicative model to obtain a single index of HRQoL (or, more accurately, an index of the strength of a person's preference for this health state compared with full health and death). Without the utility weights the descriptive system of the AQoL could be (and has been (Lewis et al 1997)) used as a generic multi-attribute (psychometric) instrument where an overall score is obtained by summing the unweighted patient responses. For use as a generic utility instrument the descriptive system must have certain important characteristics; viz, response categories for each item must be hierarchical (as in a Guttman scale); broad health dimensions must be orthogonal (there must be no double counting of health attributes); and there should be preference independence between dimensions (the preference score for one item or dimension must not depend upon the level of health defined by another dimension in the instrument; for example, preference dependence would occur if the disutility of pain increased when a person's social relationships were poor).
Do different instruments give the same answer?
In principle each MAU instrument purports to measure the ‘utility’ of a health state; that is, each purports to measure the strength of a person’s preference for that health state. Consequently, the numbers produced by instruments should be the same. In practice they differ very significantly. Drawing upon results from 7720 respondents the ‘Multi Instrument Comparison’ (MIC) project has demonstrated that different instruments are sensitive to different dimensions or facets of a health state. The EQ-5D primarily measures physical function and pain. The AQoL-8D largely measures psycho-social facets to which the EQ-5D is relatively insensitive. The MIC research papers provide pairwise comparisons of all MAU instruments and quantify their responsiveness to different dimensions of the QoL (see Richardson, Iezzi, Khan, Maxwell 2012 A cross-national comparison of 12 quality of life instruments, MIC Paper 2: Australia Research Paper 78, CHE Monash University. Results for UK, USA, Canada and Norway are in subsequent reports).
Is the instrument too long?
The short answer is ‘no’, not if the measurement of QoL (50 percent of the QALY equation) is of importance.
The longest AQoL instrument – AQoL-8D – takes an average of 5.5 minutes to complete in its online version. (Of course some people will take longer.) A common comment is that clinicians are reluctant to include MAU questions in their already large battery of questionnaires. There is, however, some responsibility upon consultant economists to maintain the quality of the advice and service provided. If an instrument is insensitive to a health intervention – as a number of MAU instruments are to psycho-social interventions – then the ‘price’ of compromising with respect to the instrument may be an invalid evaluation, a high cost to QALY ratio and the failure of the intervention to be funded. There is, in fact, very limited evidence on patient resistance to relatively short questionnaires as demonstrated by the Multi Instrument Comparison (MIC) project where 7720 respondents completed 226 questions and an online Self TTO.
Should we use a single MAU instrument?
In the UK the National Institute for Health and Clinical Excellence (NICE) has mandated the use of a single instrument, the EQ-5D. The argument has been that the use of a single instrument achieves comparability of measurement. The logic of this argument is unambiguously wrong. Analogously we would not achieve comparability in the measurement of medical need through the use of a single and insensitive indicator such as blood pressure. To the contrary, the use of a single insensitive instrument ensures discrimination. EQ-5D primarily measures pain and physical function. Its use for psychological interventions discriminates against these interventions.
Can the AQoL measure changes through time attributable to normal disease progression?
Yes. This is one of the strengths of a simple generic instrument. It may be applied weekly, monthly or at any appropriate time interval.
What is the minimum (clinical or quantitative) difference which should be used for sample size calculations?
The minimum clinical difference is the clinical or quantitative change in a measure that would typically cause a clinician to change his or her treatment. For a researcher seeking to change practice a sample size is calculated to enable this difference to be detected with a given statistical power (usually 80 percent) at a conventional level of statistical significance (usually 5 percent).
No exact analogy exists in cost utility analysis as clinicians use clinical, not QoL, indices. For policy makers concerned with cost per QALY the relevant data relates to the best estimate where confidence (for each component of the cost per QALY) increases with the sample size.
Nevertheless there may be a context where a researcher wishes to ensure that a change will improve QoL sufficiently that it will be detected by patients. Drummond (1991) suggests a figure of 0.03 for this purpose. Subsequent research has reported that patients detect a change in their health status when the SF-6D changes by 0.04 or the EQ-5D by 0.075 (Walters and Brazier 2005).
Drummond M. (1991). ‘Introducing economic and quality of life measures into clinical studies’, Annals of Medicine, Special Edition 33:5, p344-349.
Walters S, Brazier J. (2005). ‘Comparison of the minimally important difference for two health states: EQ-5D and SF-6D’, Quality of Life Research, 14:1523-32.
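The sample size reasoning above can be illustrated with the textbook normal-approximation formula for comparing two group means. This is a sketch under stated assumptions, not a method prescribed by the AQoL team: it assumes two equal-sized groups, a two-sided test, and a common standard deviation of utility scores, which here is a purely hypothetical 0.2. The function name is also hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(min_difference: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group needed to detect a difference in mean scores.

    Uses n = 2 * ((z_(1-alpha/2) + z_power) * sd / difference)^2, rounded up.
    """
    if min_difference <= 0 or sd <= 0:
        raise ValueError("min_difference and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    n = 2 * ((z_alpha + z_power) * sd / min_difference) ** 2
    return math.ceil(n)


# Differences cited in the text; the standard deviation of 0.2 is an assumption.
for label, difference in [("Drummond", 0.03), ("SF-6D", 0.04), ("EQ-5D", 0.075)]:
    print(label, n_per_group(difference, sd=0.2))
```

The smaller the difference to be detected, the larger the required sample, which is why the choice of minimum difference matters for study design.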
Does the AQoL measure cost?
No. The AQoL assists in the measurement of benefits which are then compared with costs in order to make a decision.
Is the AQoL designed to measure individual utility or the utility of a group?
The utility of a group. QALYs are designed to measure the average utility of a group of patients or program participants.
What is the relationship between the AQoL and other MAU instruments?
See Richardson, McKie, Bariola (2011) Review and Critique of related multi attribute utility instruments, in AJ Culyer (ed) Online Encyclopedia of Health Economics, Elsevier Science, San Diego, reproduced in Research Paper 64, CHE, Monash University. This paper describes the construction, similarities and dissimilarities between the major instruments.
Also see Richardson, Iezzi, Khan, Maxwell (2012) A cross-national comparison of 12 quality of life instruments, MIC Paper 2: Australia, Research Paper 78, CHE Monash University. This paper presents results from a comparison of the major instruments using data from 7720 respondents in five countries. A pairwise comparison of instruments is undertaken which quantifies the advantage of each instrument with respect to different dimensions of the quality of life.
In principle the AQoL is similar to other MAU instruments; they all purport to measure the strength of preference for different health states on a 0-1 scale. In practice each of the existing MAU instruments differs in important respects.
Some conceptualise health in terms of disease characteristics: impairment and disability (HUI-I, II and III; 15D; DALY). Others have a heavier emphasis upon handicap: illness induced, or lack of capacity to carry out normal social activities (the AQoL, WHOQoL; SF36 and EuroQoL).
Even when broadly conceptualised in the same way, the descriptive systems of different generic instruments vary considerably with respect to the detail with which they describe different health dimensions. Instruments also differ with respect to the scaling (utility scoring) system adopted.
Some are based upon the use of rating scales (15D and QWB); others have used the time-trade-off (the AQoL, EuroQoL and HUI instruments); one version of the DALY has used the person trade-off instrument.
MAU instruments already exist. Do we need another?
In addition to the AQoL, there are eight other generic utility instruments in existence or which are/may be developed. These are the:
- Health Utility Index (HUI) Mark I, II, III (developed in Canada);
- Rosser-Kind index (UK);
- Quality of Wellbeing (QWB) instrument (USA);
- 15D (Finland);
- EuroQoL or EQ-5D (European);
- SF36 utility adaptation by Brazier, referred to as the SF6D (American/British);
- World Health Organization/World Bank DALY; and the
- World Health Organization's WHOQoL, which may eventually receive utility weights.
Several of these are now being calibrated using Australian preference scores. Some of the instruments are seriously compromised by the simplicity of their descriptive systems. The available evidence shows that the differences between instruments are primarily attributable to the questions asked, ie the descriptive system, and that variation in preferences between countries is relatively unimportant (despite the undemonstrated belief that Australians, Americans, English, etc have major differences in their preference for pain, happiness, physical dexterity, etc). Researchers must be cautious in their choice of instrument and ensure that the questions asked are sensitive to the health states of importance to them. In sum, the construction of the AQoL was motivated by deficiencies in existing instruments. (See Why AQoL?)
The major comparison of MAUI has been undertaken by the AQoL team using 8,000 respondents in 7 disease areas in 6 countries. See Richardson et al. (2014, 2016). Comparisons are published for Australia, Canada, Germany, Norway, UK and USA in Research Papers 85, 83, 82, 81.
Are MAU instruments such as the AQoL necessary?
A large number of disease-specific and a smaller number of generic QoL instruments exist (see, for example, Bowling 2001 for a review of over 200 scales in the areas of cancer, mental health, respiratory and neurological conditions, rheumatic, cardiovascular and other diseases). These instruments do not weight the different dimensions of HRQoL by utility or the strength of people's preferences: different dimensions of HRQoL are simply added up to obtain an overall score. This implies there is a need for instruments which weight the different dimensions such that they can legitimately be combined. Utility instruments achieve this weighting property through the elicitation of preferences for different health states, thus overcoming this criticism.
There are a number of situations in which it is necessary to know whether or not the overall quality of life has improved or deteriorated as the result of a health intervention; this is required for both summative program evaluation and for economic evaluation. With limited budgets we are commonly forced to select between programs. Consequently we must, implicitly or explicitly, compare the total benefits derived from competing programs. This requires an overall assessment of the program benefits; i.e. the derivation of a single index of HRQoL. Although, in principle, detailed utility studies could be carried out using the composite or vignette approach, in practice research budgets are limited and MAU instruments offer a low cost method for obtaining this information. Perhaps most importantly, MAU instruments have now gained world-wide acceptance and are being widely used. It is likely their use will continue to expand. It is therefore important to have instruments which minimise bias and maximise the likelihood of obtaining valid utility scores.
What are the unique features of the AQoL?
- The AQoL is the only utility instrument which employed correct psychometric techniques for instrument development for the construction of its descriptive system.
- The HUI instruments and the AQoL are the only utility instruments using a flexible multiplicative model for combining HRQoL dimensions. After subsequent second stage adaptation of the QoL scores, AQoL-6D, 7D and 8D have unique scoring algorithms.
- The AQoL is the only instrument which independently models all the sub-dimensions of health and then combines these sub-models.
- The AQoL project has undertaken a more exhaustive analysis of the exchange rate between HRQoL and life years than any other reported in the literature. At the time of writing (2009), this enquiry is ongoing.
- Similarly, the AQoL-8D has undergone a large scale validation study by comparing its predicted values with the values obtained from other generic instruments and from direct TTO self-assessment. Again, no similar study has been reported in the literature for any other instrument. Results from this study can be found in three publications:
(i) Richardson et al 2015 Comparing and explaining the differences in the magnitude, content and sensitivity of utilities predicted by the EQ-5D, SF-6D, HUI 3, 15D, QWB and AQoL-8D multi-attribute utility instruments MDM 35(3):276-91
(ii) Richardson et al 2015 Can multi-attribute utility instruments adequately account for subjective wellbeing? MDM
(iii) Richardson et al 2016 Measuring the sensitivity and construct validity of 6 utility instruments in 7 disease areas MDM 36(2):147-59
What is validation?
This is discussed in detail in Research Paper 57. Validation is an ongoing process of testing an instrument in different ways and in different contexts to determine whether or not the instrument measures what it purports to measure.
Unfortunately, the term validation is commonly misunderstood; many people think it describes whether an instrument is or is not valid. Instruments may be valid in some circumstances but not in others. There are three kinds of evidence suggesting validity.
Content validity: This is defined as how well an instrument's items may be considered to be a representative sample of the universe which the researcher is trying to measure. In HRQoL measurement this might be the extent to which the items in an instrument cover the full domain of HRQoL; that is, whether or not the instrument includes items which enquire about each of the dimensions of health that are included in the underlying concept of HRQoL. Where content validity is determined by looking at the instrument this is described as face validity; although popular, apparent face validity does not confer content validity.
Construct validity: Construct validity indicates whether or not instrument items truly represent the underlying construct that is of interest. This implies the researcher has to define an underlying construct: whatever it is that is being measured. If the construct is correctly represented by the instrument it is possible to draw a succession of inferences from the instrument concerning instrument scores in different contexts. Construct validity therefore involves an ongoing process of testing the instrument in different contexts.
Criterion validity: This describes the relationship between an instrument's scores and either other independent measures (the criteria) or other specific measures (predictors), where the criteria or predictors are the gold standard for the measurement (e.g. in the case of breast cancer, the gold standard is the histopathological confirmation of cancer). In the absence of a gold standard, confidence in criterion validity increases if the instrument correlates highly with each of the accepted extant instruments.
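Where no gold standard exists, the correlation check described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `pearson_r` and the two score lists are hypothetical and are not taken from any AQoL study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical utilities for the same five respondents on two instruments.
new_instrument = [0.92, 0.75, 0.60, 0.41, 0.88]
accepted_instrument = [0.95, 0.80, 0.55, 0.35, 0.90]
r = pearson_r(new_instrument, accepted_instrument)
```

A high value of `r` is evidence of the kind described above, but it says nothing about whether the two instruments produce utilities of the same magnitude.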
Which aspects of an MAU instrument need validation?
Both the descriptive system and the utility values associated with health states require validation. More specifically, the descriptive system should be shown to have content, construct and criterion validity. Utility scores should correctly reflect the strength of people's preference for health states. In addition, the model employed to combine the utility scores from different dimensions should be valid.
Can there be negative utilities?
Yes. A person might prefer death to spending any time in the health state. In that case a Worse-than-Death TTO question may be asked.
However, placing a numerical value on these states is difficult. If a person refused even one day in the health state followed by 10 years of full health, the implied numerical value of the health state is almost minus infinity. This problem is discussed at length in Richardson and Hawthorne (Working Paper 113 (2001)), where various options are set out and their numerical implications demonstrated. The final algorithm used for the calculation of utilities in the AQoL instruments transforms negative scores so that the lower boundary is U = -0.25; that is, the maximum disutility is 1.25.
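The arithmetic behind "almost minus infinity" can be sketched as follows. The sketch assumes a simple indifference condition in which time in the health state and time in full health sum to zero QALYs (the value of death); the function name and this condition are illustrative assumptions. It does not reproduce the AQoL transformation, which is described in the working paper cited above.

```python
def implied_worse_than_death_utility(years_in_state, years_full_health):
    """Utility U solving U * years_in_state + 1.0 * years_full_health = 0.

    Assumes the respondent is exactly indifferent between immediate death
    (value 0) and the profile 'time in the state, then time in full health'.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    return -years_full_health / years_in_state

# One day in the state followed by 10 years of full health.
u_one_day = implied_worse_than_death_utility(1 / 365, 10)   # about -3650

# A respondent who refuses this profile values the state below u_one_day,
# and the bound falls without limit as the time in the state shrinks.
u_one_hour = implied_worse_than_death_utility(1 / (365 * 24), 10)

# Disutility is 1 minus utility, so a floor of U = -0.25 is a disutility of 1.25.
floor_disutility = 1 - (-0.25)
```

Values of this size would dominate any average, which is one reason a bounded transformation of negative scores is used.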
Interview methodology is presented in detail in Iezzi and Richardson (2009). Measuring Quality of Life at the Centre for Health Economics. Research Paper 41. Melbourne, Monash University.
Have AQoL utility scores been validated?
The two largest validation studies to date have included the AQoL-8D. These are reported in Richardson et al. (2014, 2016), both articles published in Medical Decision Making. Numerous smaller validation tests are reported in the Research Paper series.
Have MAU instruments been validated satisfactorily?
No. The AQoL is the only instrument whose descriptive system was constructed using correct psychometric principles for instrument construction. Most instruments in the literature claim validation on the basis of a correlation between their results and the results from another instrument which has been validated (often in the same way!). This type of result is necessary but far from sufficient for confidence in an instrument. A valid instrument will produce valid utility scores. The criterion for achieving this is, in fact, exceedingly stringent. A given percentage increase in the numerical score of the utility index must indicate an increase in the quality of life which is valued equally to an identical percentage increase in the length of life. No instrument has been shown to have this property.
For a discussion of this so-called strong interval property see Richardson (1994) Cost utility analysis: what should be measured? Social Science and Medicine, 39(1):7-21. Importantly, an instrument may be valid in one context (disease area, intervention) but not in another.
The table below provides intraclass correlation coefficients for the AQoL-8D instrument and its dimensions after two weeks and after one month.
Test-retest reliability: intraclass correlation coefficients (ICC), reported for two intervals: Base – 2 weeks and Base – 1 month.
Test-retest reliability coefficients are sourced from J Richardson, A Iezzi (2011). Psychometric validity and the AQoL 8D Multi attribute utility instrument. Research Paper 71, Table 3, p13.
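For readers unfamiliar with the statistic, the following is a minimal sketch of how an intraclass correlation coefficient can be computed from test-retest data. It uses the one-way random-effects form, ICC(1,1), which is an assumption made for illustration and may not be the form used in Research Paper 71. The function name and the scores are hypothetical.

```python
def icc_one_way(scores):
    """One-way random-effects ICC(1,1).

    `scores` is a list of rows, one per respondent, each row holding that
    respondent's score at every administration (e.g. baseline and retest).
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 respondents, each with the same k >= 2 scores")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-respondent and within-respondent mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denom

# Hypothetical baseline and two-week utilities for five respondents.
baseline_vs_retest = [[0.80, 0.82], [0.55, 0.50], [0.91, 0.93],
                      [0.30, 0.36], [0.67, 0.66]]
icc = icc_one_way(baseline_vs_retest)
```

Values close to 1 indicate that respondents' scores are stable between administrations relative to the differences between respondents.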
Why is modelling necessary: have utility models been validated?
The AQoL-4D measures 1.07 billion health states, which is a very small subset of the number of health states defined by AQoL-8D (many of which are a little improbable, eg being blind, deaf, bedridden, full of energy and in control of your life). These health states cannot all be measured individually and, like all other MAU instruments (except for the Rosser-Kind index), AQoL models utility scores from a limited number of observations. To date, most instruments have adopted an additive model in which the disutility associated with each response from each item is independently measured, and the overall disutility is estimated or modelled as a weighted average of these disutilities, where the weights are also obtained empirically during the scaling survey. This additive model is probably invalid and the multiplicative model employed by the HUI and AQoL instruments is superior (Richardson & Hawthorne 1998). However, there is no certainty that even this more flexible model does not introduce significant estimation bias.
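A short calculation shows why individual measurement is impractical. The number of health states a questionnaire defines is the product of the number of response levels across its items. The item and level counts below are illustrative assumptions, chosen only because they give a count of the same order as the figure quoted above; they are not presented as the published structure of any AQoL instrument.

```python
def count_health_states(levels_per_item):
    """Number of distinct health states = product of response levels over items."""
    total = 1
    for levels in levels_per_item:
        if levels < 1:
            raise ValueError("each item needs at least one response level")
        total *= levels
    return total

# Hypothetical questionnaire: 15 items, each with 4 response levels.
states = count_health_states([4] * 15)   # 4**15 = 1,073,741,824

# Valuing one state per minute, around the clock, would take roughly 2,000 years.
years_to_value_all = states / (60 * 24 * 365)
```

Because the count grows multiplicatively with each added item, utilities must be modelled from a limited set of observed valuations.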
Can AQoL-4D be used for measuring the impact of health promotion upon population health?
Yes. For example, a road safety awareness program may prevent injury and death. Since the disutility associated with most injuries can be measured by the AQoL-4D, any reduction in these injuries can be quantified. Similarly it is likely that the AQoL-4D can measure the disutility associated with illnesses which would occur in the absence of an immunisation program. However, the AQoL-4D is relatively insensitive to changes in health in the vicinity of full health. AQoL-6D was constructed to overcome this limitation.
Could AQoL-4D, 6D, 8D produce different utility scores and different project ranking?
Yes. This is generally true for any two QoL instruments. In principle utility scores should be similar, but in practice they differ. For this reason the project team is constructing transformations between instruments.
If a separate instrument is used for measuring the impact of health promotion (ie AQoL-6D) will the utility scores be comparable with those obtained from other instruments?
No. Each utility instrument has different items, different dimensions of HRQoL, and different scaling properties. Given these differences there is no reason to expect that different instruments will provide comparable utilities. Empirical evidence on this point is presented in:
(i) Richardson et al (2015) Comparing and explaining the differences in the magnitude, content and sensitivity of utilities predicted by the EQ-5D, SF-6D, HUI 3, 15D, QWB and AQoL-8D multi-attribute utility instruments MDM 35(3):276-91
(ii) Richardson, McKie, Bariola (2014), Multi attribute utility instruments and their use, Chapter 5.5 in (ed) AJ Culyer, Encyclopedia of Health Economics, San Diego: Elsevier Volume 2, 341-357
It is shown that very different estimates are obtained on the different utility instruments and that this is a function of the coverage of the instruments (see Transformations between instruments).
Is a separate model needed to measure the outcome of health promotion?
This depends upon the outcome of the program and the sub-population which receives the benefits of the program. Researchers must use their discretion to assess whether or not the questions in an instrument are likely to detect the changes they expect their program to produce.
Is it possible for an MAU instrument such as AQoL to measure all of the outcomes of health promotion?
The AQoL-6D is able to measure some but not all relevant outcomes. Many more may be measured by AQoL-8D. See the answer to the previous question.
Can quality of life be reduced to a number?
There are different dimensions of health and combining them is said to be like combining apples and oranges. The index number produced by utility instruments is an index of the strength of preference for different health states. Combining dimensions is therefore analogous to combining the preference for apples with the preference for oranges; combining these is a valid procedure. However, a preference for a health state differs from other concepts of the quality of life.
Can people's preferences really be reduced to mathematical equations such as those employed by the AQoL?
Yes. The equations are nothing more than a sophisticated form of averaging individual utilities. The simplest form of averaging would be to add up the number of 'yes' answers in a simple yes/no questionnaire and then use a 'mathematical equation' which divides the total number of yeses by the number of people surveyed. With a more sophisticated approach answers would be multiplied by an 'importance weight' and the 'mathematical equation' would be equivalent to the formula for a weighted average. This is the approach adopted by simple additive MAU models. Thirdly, the model may combine importance scores multiplicatively, ie scores may be multiplied together after adjusting for their importance and then the results scaled to a 0-1 range. This is what the AQoL and HUI instruments do. The AQoL uses a two-stage procedure in which the multiplicative model is used to combine items within each of the four dimensions and then an overarching equation is used to obtain the final score. It remains true, however, that the multiplicative model does impose a particular structure upon utility values. To increase flexibility AQoL-6D, 7D and 8D add a third stage which adjusts scores to better fit independently measured holistic health states. Despite the apparent complexity, these methods all seek to average people's own preferences.
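The three forms of averaging described above can be sketched as follows. The multiplicative function uses the standard multiplicative multi-attribute form from decision analysis, which is an assumption made for illustration; it is not the published scoring algorithm of the AQoL or the HUI. All function names and weights are hypothetical, and the sketch only handles importance weights between 0 and 1 that sum to more than 1.

```python
def simple_average(yes_no_answers):
    """Share of 'yes' answers: the simplest form of averaging."""
    return sum(1 for a in yes_no_answers if a) / len(yes_no_answers)

def additive_disutility(disutilities, weights):
    """Weighted average of item disutilities (weights normalised to sum to 1)."""
    total_weight = sum(weights)
    return sum(w * d for w, d in zip(weights, disutilities)) / total_weight

def solve_scaling_constant(weights, tolerance=1e-12):
    """Find k in (-1, 0) with 1 + k = prod(1 + k * w) by bisection."""
    if sum(weights) <= 1 or not all(0 < w < 1 for w in weights):
        raise ValueError("this sketch needs weights in (0, 1) that sum to more than 1")

    def gap(k):
        product = 1.0
        for w in weights:
            product *= 1 + k * w
        return product - (1 + k)

    low, high = -1.0, -1e-9   # gap(low) > 0 and gap(high) < 0
    while high - low > tolerance:
        mid = (low + high) / 2
        if gap(mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def multiplicative_disutility(disutilities, weights):
    """Multiplicative combination, scaled so all-worst = 1 and all-best = 0."""
    k = solve_scaling_constant(weights)
    product = 1.0
    for w, d in zip(weights, disutilities):
        product *= 1 + k * w * d
    return (product - 1) / k

# Hypothetical importance weights for three items; disutility 1 = worst level.
weights = [0.6, 0.5, 0.4]
two_items_at_worst = [1.0, 1.0, 0.0]
additive = additive_disutility(two_items_at_worst, weights)
multiplicative = multiplicative_disutility(two_items_at_worst, weights)
utility = 1 - multiplicative   # utility on the 0-1 scale
```

With these weights the two models give different scores for the same responses, which illustrates why the choice of model matters.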
Can the average utility derived by the AQoL truly represent the utility of each individual?
No. Where individual choice is possible it is clearly better to allow individuals to select their own programs and to state their own preferences. Cost utility analysis is primarily useful for evaluating programs based on the common or average experience. For example, a new technology must either be installed or not installed in a hospital; a new procedure must be either included or not included in the Medicare schedule or in the benefits provided by a private health fund. In these cases it is necessary to make a collective choice and there is no mechanism for making such choices, other than consensus, which can reflect every person's preference. The ethical strength of the QALY procedure is that the final decision will be based upon public preferences and the strength of these preferences and not upon a process which disenfranchises those who are affected by the decision.
Can the quality and length of life be combined?
See previous answer.
Does CUA discriminate against the disabled by assigning them a lower utility score?
Possibly. This argument was effectively used to discredit the prioritisation process adopted in the State of Oregon in the USA. Life extension for a disabled person generates fewer QALYs than for a healthy person. However this outcome can be overridden. Policy makers can assign the same utility weight to the disabled. Further, when the total range of possible benefits is considered, the potential for health improvement by the disabled is greater and the potential QALY gain is larger for disabled persons since it may be possible to cure some disabilities.
For a discussion of this issue see in particular Nord (1999) Cost Value Analysis: Making sense out of QALYs, Cambridge University Press.
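The QALY arithmetic behind this argument can be illustrated with a minimal sketch. The utility values and time spans are hypothetical numbers chosen for illustration only.

```python
def qalys(utility, years):
    """QALYs = utility index (0-1 scale) multiplied by years lived in that state."""
    return utility * years

# Hypothetical utilities: 1.0 for full health, 0.6 for a person with a disability.
extension_years = 5
gain_healthy = qalys(1.0, extension_years)    # 5 QALYs
gain_disabled = qalys(0.6, extension_years)   # 3 QALYs

# Policy override: assign the same weight to both groups for life extension.
gain_disabled_equal_weight = qalys(1.0, extension_years)

# Curing the disability for 10 remaining years raises utility from 0.6 to 1.0,
# a gain that is not available to someone already in full health.
gain_from_cure = qalys(1.0, 10) - qalys(0.6, 10)   # 4 QALYs
```

The first comparison shows the source of the discrimination concern; the last line shows the larger potential gain from health improvement.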
Doesn’t the use of QALYs over-simplify measurement and ignore a large number of context-specific and process related factors?
Yes. QALYs measure only two dimensions of outcome, viz. quality and length of life. However, these two dimensions are generally considered to be very important. This does not deny the importance of other process, context and equity issues, which have been the subject of recent research.
Who should judge the utility scores?
There has been vigorous debate over this issue in the literature. AQoL-4D and 6D incorporate the utility values of a representative cross-section of the Australian population. In this respect AQoL-4D and 6D have adopted the common practice of using community values. AQoL-7D and 8D also include the values of patients with visual and psychiatric illnesses.
General Comment on the specificity of the former issues
The issues raised are not specific to the AQoL. They concern the use of all generic MAU instruments and, more generally, the use of any HRQoL generic or disease specific instrument and the use of any utility scoring.