Dietary Quality, Behavioural Factors and Cardiovascular Health

An Econometric Analysis of Structural Relationships with the Data of the National Health and Nutrition Examination Survey 2005-2006, USA

Dissertation to obtain the Doctoral Degree at the Faculty of Agricultural Sciences, Nutritional Sciences and Environmental Management, Justus Liebig University Giessen

submitted by Tetyana DEMYDAS, born in Kiev

First supervisor: Prof. Dr. Roland Herrmann
Second supervisor: Prof. Dr. P. Michael Schmitz
Date of disputation: October 28, 2014

Acknowledgement

This dissertation was prepared with the support and encouragement of my supervisors, colleagues, friends and family members, to whom I feel truly indebted. My deep gratitude goes to my first supervisor, Prof. Dr. Roland Herrmann. I could not have wished for better guidance and support throughout the whole course of the dissertation project. Special thanks go to him for his warm encouragement and valuable suggestions in the final phase of the thesis preparation. I also very much appreciate the involvement and input of my second supervisor, Prof. Dr. Michael Schmitz.

I extend my warm thanks to my colleagues at the Institute of Agricultural Policy and Market Research for stimulating discussions and many nice memories. It was truly a pleasure to be a part of the team. My gratitude also goes to the colleagues from the Center for International Development and Environmental Research (ZEU). It was an exciting time with interesting interdisciplinary projects and a great working atmosphere. I thank the University of Giessen for funding me through a graduate scholarship, which allowed me to undertake this research.

Over all the years of my studies, my family and friends living in Ukraine were my “home front” that provided me with faith and motivation. To them I express my deepest gratitude. Finally, the finalization of the thesis would not have been possible without the support of my husband, John.
Thank you for taking such great care of our little Philip and thus creating the space I needed so much for the final preparations.

Giessen, March 2015
Tetyana Demydas

Table of contents

List of tables
List of figures
List of abbreviations
1 Introduction
1.1 Problem statement and project goals
1.2 Structure of the thesis
2 Dietary quality and health: definition and measurement approaches
2.1 Dietary quality and its assessment
2.1.1 Theoretically defined indicators
2.1.2 Empirically derived dietary patterns
2.1.3 Subjective (self-assessed) dietary quality
2.2 Health status: definition and measurement
2.2.1 Concept of health
2.2.2 Measurement approaches
3 Theoretical approach to explaining health-related behaviour and outcomes
3.1 Consumer demand theory
3.2 Household production theory
3.3 Household production of health
3.3.1 Theoretical presentation
3.3.2 Empirical presentation
3.3.3 Challenges for empirical estimation
4 Methodological approaches to health production function estimation
4.1 Methods of simultaneous equations models estimation
4.2 Empirical examples of the two-stage estimation method
4.3 The structural equation modelling approach
4.3.1 Definition and main features
4.3.2 Aim and general form
4.3.3 Steps in the modelling process
4.3.4 Empirical examples in health economics research
5 Empirical analysis
5.1 Dataset: The U.S. National Health and Nutrition Examination Survey 2005-2006
5.1.1 Sampling method and content of the dataset
5.1.2 Data preparation and study sample
5.2 Dietary quality among adults in the USA
5.2.1 Subjective (self-assessed) dietary quality
5.2.2 Energy supply and its structure
5.2.3 Nutrient density and Index of Nutritional Quality
5.2.4 Biological markers of nutritional quality
5.2.5 Fruit and vegetable consumption
5.3 A model of cardiovascular health and its determinants
5.3.1 Theoretical specification of the model
5.3.2 Definition of model variables and expected signs
5.3.3 Main hypotheses
5.3.4 Empirical specification and analytical procedure
5.3.5 Evaluation of the model assumptions and descriptive statistics of the variables
5.3.6 Results and discussion: the structural model of cardiovascular health
5.3.7 Results and discussion: the alternative structural model of cardiovascular health
5.3.8 Discussion of reciprocal relations in the model
5.3.9 Critical consideration of the empirical analysis
6 Summary
Literature list
Appendix

List of tables

Table 1 Classification of variables in a health-production model
Table 2 Estimation strategies for empirical models
Table 3 Identification in SEM
Table 4 Socio-demographic characteristics of the sample
Table 5 Subjective dietary quality indicators
Table 6 Structure of recommended and actual energy supply in the sample
Table 7 Nutrient densities in the diet of adults from NHANES 2005-2006
Table 8 Biological markers of nutritional status of adults from NHANES 2005-2006
Table 9 F&V classification by degree of processing
Table 10 F&V intakes by degree of processing in the sample of adults from NHANES 2005-2006
Table 11 Summary statistics of the selected health indicators
Table 12 Summary statistics of the weight measurements
Table 13 Summary statistics of the indicators of smoking
Table 14 Summary statistics of the indicators of alcohol consumption
Table 15 Summary statistics of the main physical activity measurements
Table 16 Summary statistics of the indicators of medical care utilisation
Table 17 Summary statistics of the questions related to nutritional knowledge
Table 18 Definition, means and standard deviations of variables in the model
Table 19 Correlation matrix of the indicators of the measurement model for health
Table 20 Estimates from the health measurement model
Table 21 Estimation results of the structural model of cardiovascular health
Table 22 Effects decomposition: direct, indirect and total effects of socio-demographic variables on endogenous health inputs
Table 23 Direct, indirect (specific and total) and total effects of income and education on health
Table 24 Estimation results of the two-factor health measurement model
Table 25 Estimation results of the alternative full structural model of health production with multidimensional health status
Table 26 Non-nested model comparison
Table A1 Estimated calorie requirements (in kilocalories) per day by age, gender, and physical activity level
Table A2 Selected nutrients in the USDA Food Guide and the DASH Eating Plan
Table A3 Sample USDA Food Guide and the DASH Eating Plan at the 2000-Calorie Level
Table A4 Nutrient contributions of fruit and vegetable food groups averaged over food patterns at all energy levels
Table B1 Classification of total cholesterol, LDL cholesterol, HDL cholesterol and triglyceride
Table B2 Fit indices and their acceptable thresholds
Table C1 Correlation matrix of model variables

List of figures

Figure 1 A three-variable mediation model
Figure 2 Full structural equations model
Figure 3 SEM process
Figure 4 Types of non-recursive relationships
Figure 5 Share of adults from NHANES 2005-06 who comply with recommendations as measured by INQ of eleven key nutrients
Figure 6 Share of adults consuming exclusively one fruit/vegetable subgroup
Figure 7 Graphical presentation of the structural model of cardiovascular health
Figure 8 Measurement part of the structural model of cardiovascular health
Figure 9 Direct and indirect impact of education on health and health-related behaviour
Figure 10 Direct and indirect impact of income on health state and health-related behaviour
Figure 11 Unidentified two-factor measurement model of cardiovascular health
Figure 12 Modified two-factor measurement model of cardiovascular health
Figure 13 Schematic representation of the alternative full structural model of cardiovascular health

List of abbreviations

ADF  Asymptotically Distribution Free
AIC  Akaike Information Criterion
AMDR  Acceptable Macronutrient Distribution Ranges for Adults
AMOS  Analysis of MOment Structures
BMI  Body Mass Index
CALIS  Covariance Analysis and Linear Structural Equations
CDC  Centres for Disease Control and Prevention
CFA  Confirmatory Factor Analysis
CNPP  Centre for Nutrition Policy and Promotion
CVDs  Cardiovascular Diseases
DBP  Diastolic Blood Pressure
DF  Degrees of Freedom
DGA  Dietary Guidelines for Americans
EM  Expectation Maximisation
EQS  Equations
FAO  Food and Agriculture Organisation
FGP  Food Guide Pyramid
F&V  Fruit and Vegetable
GLM  General Linear Modelling
HDI  Healthy Diet Indicator
HDL  High Density Lipoprotein
HEI  Healthy Eating Index
HHS  United States Department of Health and Human Services
Hyp  Hypothesis
INQ  Index of Nutritional Quality
LDL  Low Density Lipoprotein
LISREL  Linear Structural Relationships
MAR  Missing at Random
MCAR  Missing Completely at Random
MEC  Mobile Examination Centres
MIMIC  Multiple Indicators Multiple Causes
NHANES  National Health and Nutrition Examination Survey
NIH  National Institutes of Health
RMSEA  Root Mean Square Error of Approximation
QOL  Quality-of-Life
SBP  Systolic Blood Pressure
SE  Standard Error
SEM  Structural Equation Modelling
SPSS  Statistical Product and Service Solutions
SUR  Seemingly Unrelated Regressions
USDA  United States Department of Agriculture
WC  Waist Circumference
WHO  World Health Organisation
2SLS  Two-Stage-Least-Squares

1 Introduction

“The greatest potential for improving the health of the American people . . . is to be found in what people do and don't do to and for themselves.” (FUCHS, 1967)

1.1 Problem statement and project goals

Cardiovascular diseases (CVDs) are a group of disorders of the heart and blood vessels, including coronary artery disease (e.g., heart attack), cerebrovascular disease (e.g., stroke), diseases of the aorta and arteries, hypertension, congenital heart disease and heart failure. In 2008, they were responsible for 30% of all deaths, making them the leading cause of death globally. Low- and middle-income countries are especially affected by CVDs (WHO, 2011). In high-income societies, the mortality rate attributable to CVDs has declined in recent decades owing to the availability of early detection services and improved medical treatment. However, the burden of the diseases remains high.

In the USA, CVDs are the leading cause of mortality among men and women (ROGER et al., 2012). In 2008 they were responsible for 35% of all deaths in the country. For comparison, cancers are estimated to account for 23% of deaths in the USA (WHO, 2011a). The high incidence of CVDs among the US population imposes large economic costs on the health care system. Currently about 17% of all health expenditures are attributable to CVDs, and a further increase in medical spending is projected due to an ageing population and an expected rise in the incidence of the diseases. By 2030, total direct medical costs of CVDs are projected to triple, while indirect costs (i.e., productivity loss due to morbidity and premature mortality) are projected to more than double (HEIDENREICH et al., 2011).
Adverse health behaviours such as physical inactivity, smoking, harmful alcohol consumption, eating a poor diet and being overweight or obese are considered to be important risk factors for heart disorders. In the long run, these factors increase the risk of hypertension, diabetes, heart attack and stroke (ROGER et al., 2012; LLOYD-JONES et al., 2010). Importantly, these behaviours are modifiable and thus, in the majority of cases, CVDs can be prevented. In addition, genetic predisposition is likely to play a role in the incidence of the diseases. Individuals with a family history of CVDs may also have an enhanced risk because unhealthy environments and lifestyles are shared within a family (CDC, 2013; ROGER et al., 2012).

Although the share of smokers among the US population declined by over 50% between 1965 and 2008, about 20% of all deaths in the country are attributed to tobacco use (AMERICAN LUNG ASSOCIATION, 2011). In 2010, 21% of adult men and 18% of adult women were characterised as regular smokers (ROGER et al., 2012). Furthermore, about one third of adults (33%) engage in no regular leisure-time physical activity and 20% are insufficiently active (ROGER et al., 2012).

American adults also fail to comply with the existing recommendations for a healthy diet (USDA and CNPP, 2013; SCHILLER et al., 2012). Findings of dietary studies indicate an overconsumption of added sugars (e.g., from sugar-sweetened beverages) and saturated fats (e.g., from fast food) and an underconsumption of fruits and vegetables (F&V). According to the data from NHANES 1999–2000, the diet of only 10% of Americans could be characterised as “good”, 74% of the population had a diet that “needs improvement”, and 16% had “poor” diets (BASIOTIS et al., 2002). No significant improvement was observed in the overall diet quality of Americans in 2007-2008 compared to 2001-2002.
The overall score of the Healthy Eating Index (HEI), computed annually by the USDA, was 53 and 52, respectively, out of a possible 100 points, indicating inadequate diets (USDA and CNPP, 2013).

A further major concern for the health care system is the dramatic rise in overweight and obesity among the population. In 2008, about 67% of US adults were characterised as overweight or obese; 34% of adults were obese. According to estimates, about 13% of CVD deaths in 2004 were attributable to obesity. Moreover, overweight and obesity are associated with numerous other negative health conditions including asthma, cancer, and diabetes mellitus (ROGER et al., 2012).

Various disciplines, e.g., economics, the social sciences, medicine, nutrition science and epidemiology, contribute to research on the determinants of health, which aims to support nutrition and health policies with scientific knowledge and to improve the population’s health. This dissertation project supports the idea that health-related research may benefit greatly from an interdisciplinary approach, as health status is affected by numerous factors present at every stage and in every area of human life. An individual is confronted with a number of choices, including decisions affecting his or her health (e.g., diet). Further complexity arises from the impact of economic constraints (e.g., income) on these choices and from usually unobserved personal characteristics such as genetic endowments. Applying knowledge and findings from various disciplines allows a more profound analysis of diverse health factors and their complex interrelations. Other authors (BERMAN et al., 1994; NAYGA, 2008; CHEN et al., 2002) have emphasised the benefits of such an approach.

This study employs the theoretical framework of household production that emerged from the groundwork of BECKER (1965) and its application to health and nutrition by BEHRMAN and DEOLALIKAR (1988).
This framework offers a basis for investigations in the field of health and its determinants and has found numerous applications in the health-economic literature. Moreover, it is viewed as an integrating concept for interdisciplinary research dealing with human health and its determinants (NAYGA, 2008).

The main goal of the project is to contribute to the analysis of structural relations between dietary quality, lifestyles and an individual’s state of health related to CVDs. Among the factors of cardiovascular health, special attention is devoted to the dietary quality of American adults because of its decisive role. Further, the endogenous nature of certain health inputs is recognised and discussed. A system of structural equations is specified, and all model parameters are estimated simultaneously. A special contribution of this study is its focus on the appropriate measurement of the state of health during model specification. In contrast to studies that use a single indicator to represent human health, a latent variable approach is employed in which cardiovascular health is represented by multiple indicators, with the aim of improving the measurement properties of the state-of-health construct.

The empirical analysis is based on data from the representative US National Health and Nutrition Examination Survey (NHANES) of 2005-2006. It provides information on the socio-economic characteristics of the adult population in the USA, detailed data on their 2-day dietary behaviour and usual lifestyles, as well as accurate medical information related to diet and health, obtained from blood and urine examinations. This diverse information facilitates the analysis of the various factors affecting a person’s health and of their interrelations.

1.2 Structure of the thesis

In the second section, the methods of dietary quality assessment are reviewed.
Further, the section discusses existing definitions of health status and gives an overview of its measurement approaches. The third part of the dissertation describes the theoretical background of the study. The fourth section discusses the existing estimation methods and provides the rationale for the selected methodology. The fifth section presents the results of the empirical analysis, starting with a description of the dataset and an outline of the main characteristics of the study sample. The empirical analysis begins with an investigation of dietary quality among the U.S. adult population, employing a number of approaches. Further, the structural model of cardiovascular health is presented, i.e., its theoretical and empirical specification, a description of the model variables and the main hypotheses. The estimated model parameters are discussed, followed by the formulation of an alternative structural model, its testing and a comparison of the models. This section concludes with a critical consideration of the empirical analysis performed. In the final, sixth section, the insights of this study are summarised and suggestions for future research are given.

2 Dietary quality and health: definition and measurement approaches

Chapter overview

Nutrition is considered to be one of the major determinants of human health. This chapter provides an overview of the approaches used to assess the quality of a person’s diet. These include single indicators such as total energy intake, under- and oversupply of particular nutrients, and self-evaluations, as well as more complex measures based on a number of parameters and their combinations, e.g., dietary indices. Furthermore, statistical methods such as factor and cluster analysis can be used to search for consumption patterns in the population of interest.
The application of particular methods largely depends on the research goals and data availability.

In the second subsection the concept of health is discussed. Depending on how health status is conceptualised, a number of approaches can be applied to its measurement. These include single objective measures, e.g., clinical data on cholesterol level and country mortality rates, and subjective self-reports about illness or disability. Moreover, a health status measure can be based on a number of scores derived from the answers to specific health-related questions. These scores are summed, which results in an overall health score (health index). Finally, the health state can be presented as a theoretical construct (latent variable) measured by a number of indicators.

2.1 Dietary quality and its assessment

2.1.1 Theoretically defined indicators

Theoretically defined indicators of dietary quality are related to the current knowledge about the effect of specific nutrients and foods on human health. This effect can be beneficial or harmful to health (WAIJERS and FESKENS, 2005).

Energy and nutrients supply

The rise in the obesity rate in the US and related health disorders are partially attributed to an increase in energy intake amongst the population combined with decreased energy expenditure (HUANG et al., 2004). Total energy supply is considered to be important when assessing a person’s diet (RÖDER, 1998). To evaluate its adequacy, actual intake is compared to the recommended intake, taking into account the age, gender, and physical activity of an individual (Appendix A, Table A1).[1] Besides information on total calories consumed, the composition of the diet matters, i.e., which foods these calories come from. Therefore, further investigations follow, such as of the under- or overconsumption of particular nutrients and/or food groups.
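The comparison of actual with recommended energy intake described above can be illustrated with a minimal sketch. The function name and the numbers in the usage line are hypothetical and are not taken from the thesis or from Table A1; the recommended value would in practice be looked up by age, gender and physical activity level.

```python
def energy_adequacy_ratio(actual_kcal: float, recommended_kcal: float) -> float:
    """Ratio of actual to recommended daily energy intake.

    A value of 1.0 means intake equals the recommendation; values above 1.0
    indicate a surplus, values below 1.0 a shortfall.
    """
    if recommended_kcal <= 0:
        raise ValueError("recommended_kcal must be positive")
    return actual_kcal / recommended_kcal


# Hypothetical example values (assumptions for illustration only):
# an adult reporting 2500 kcal/day against a recommendation of 2000 kcal/day.
ratio = energy_adequacy_ratio(2500.0, 2000.0)
```

In this hypothetical case the ratio exceeds 1, i.e., reported intake lies above the recommended level.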
In the USA, the Food Guide Pyramid for Americans of the United States Department of Agriculture (USDA) provides research-based guidance for the promotion of a better diet among Americans. A particular focus is placed on limiting the intake of fat, saturated fat, cholesterol, sugar, sodium and alcohol, owing to the evidence that the majority of the population overconsumes these elements (USDA, 2005). On the other hand, an increased intake of a number of nutrients is also recommended (e.g., fibre, folate, vitamins A and C). Several eating plans have been developed to simplify the dietary recommendations by providing examples of a balanced diet, e.g., the USDA Food Guide and the DASH Eating Plan (HHS and USDA, 2005) (Appendix A, Table A2). Comparing actual intakes of particular nutrients with the guideline values indicates their under- or overconsumption. An element is considered to be underconsumed if its intake is below 67% of the recommended amount (RÖDER, 1998: 101).

Nutrient density approach

According to the Dietary Guidelines for Americans (HHS and USDA, 2005), the recommendations for a number of nutrients should be met within a person’s calorie needs. In reality, most Americans consume more calories than they need without meeting the recommendations. The nutrient density approach is a method that allows nutritional adequacy to be examined within the calorie needs of a person. Empirically, densities of the selected nutrients can be presented either as a proportion of total energy or as an intake per 1000 calories (SIEGA-RIZ et al., 2000; PRYER et al., 2001; WILLET et al., 1997). To compare the actual density of a particular nutrient in a diet to the existing recommendation, an Index of Nutritional Quality (INQ) is calculated.
INQ is defined as the ratio of the nutrient density of the diet to the amount of this nutrient recommended for the maintenance of good health within a given calorie need (DREWNOWSKI, 2005; HUANG and MISRA, 1991). The index can take values above or below unity. INQ values above 1 are considered desirable for nutrients important to the diet and health, e.g., fibre, potassium, calcium, magnesium, and vitamins C and A, while for fats, sugar, sodium, and cholesterol the INQ should be below 1. The reference for the computation is chosen depending on the population of interest; e.g., for the American population it is the Dietary Guidelines for Americans (HHS and USDA, 2005).

[1] A number of studies apply various cut-offs in order to select individuals with plausible energy intakes (e.g., HUANG et al., 2004; BLACK, 2000; NIELSEN and ADAIR, 2007).

Food groups’ intake

Since we obtain nutrients from foods, attention needs to be given to which foods we consume and in what proportions. The Dietary Guidelines for Americans give recommendations on intake amounts from major food groups (Appendix A, Table A3). The importance of a variety of foods in one’s diet is stressed (HHS and USDA, 2005). At the same time, some food groups should be consumed in moderation, e.g., fats, oils, and sweets, while higher intakes of others are desirable, e.g., grains and F&V. This study puts a particular focus on F&V intake among American adults. This is due to broad scientific evidence on the importance of F&V, as naturally healthy, nutrient-dense[2] and low-energy foods, in a balanced diet and in the prevention of many chronic diseases including cardiovascular disease, diabetes and some types of cancer (VAN DUYN and PIVONKA, 2000; STEINMETZ and POTTER, 1996).
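Returning to the nutrient density approach, the following minimal sketch illustrates how a density per 1000 kcal and the INQ could be computed. It assumes that the recommended density is obtained from a recommended nutrient amount at a reference energy level; the function names and all numbers are hypothetical illustrations and are not taken from the thesis or the Dietary Guidelines.

```python
def nutrient_density(nutrient_intake: float, energy_kcal: float) -> float:
    """Nutrient intake expressed per 1000 kcal of energy."""
    if energy_kcal <= 0:
        raise ValueError("energy_kcal must be positive")
    return nutrient_intake / energy_kcal * 1000.0


def inq(nutrient_intake: float, energy_kcal: float,
        recommended_intake: float, recommended_energy_kcal: float) -> float:
    """Index of Nutritional Quality: actual density divided by recommended density."""
    actual = nutrient_density(nutrient_intake, energy_kcal)
    reference = nutrient_density(recommended_intake, recommended_energy_kcal)
    return actual / reference


# Hypothetical values (assumptions for illustration only):
# 18 g of fibre consumed with 2400 kcal, against a reference of 28 g at 2000 kcal.
fibre_density = nutrient_density(18.0, 2400.0)   # g per 1000 kcal
fibre_inq = inq(18.0, 2400.0, 28.0, 2000.0)
```

With these hypothetical numbers the INQ for fibre is below 1, which under the interpretation given above would indicate an insufficient fibre density relative to the reference.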
Biological markers

In the medical and epidemiological literature, biological markers derived from blood and urine examinations (e.g., vitamins, minerals and fats in the blood) are employed to assess the nutritional status of an individual. Biomarkers are considered to be an objective measure of dietary quality as they are believed to contain less error than self-reported dietary information (POTISCHMAN, 2003). However, the collection of these data is more costly and is done only in a limited number of population surveys. NHANES, which is used in this study, contains detailed information obtained under laboratory conditions, including results of blood analyses that can be used for dietary and health state assessments. Biological markers can be used to (dis)confirm the results obtained from usual dietary assessment methods such as the 24h recall or the food frequency questionnaire (see, e.g., NEUHOUSER et al., 2003; POTISCHMAN, 2003). However, it should be kept in mind that, similar to the dietary data, biomarkers reflect the diet only at a snapshot in time. Additionally, alcohol, antibiotics and particular diseases might influence the concentration of these elements in blood or urine.

[2] See Table A4 in Appendix A for nutrient contributions from the F&V food group.

A number of biomarkers are employed in the literature to characterise an individual’s dietary status. They can be divided into blood plasma biomarkers (e.g., serum albumin, serum total protein, haemoglobin, triglyceride, cholesterol, vitamins, and folate), urine markers and hair samples (HAVEMAN-NIES et al., 2001; KANT and GRAUBARD, 2008; WALTER et al., 2008).
Anthropometric measures

Anthropometric measures, e.g., a person’s height, weight, arm circumference, birth weight, body mass index (BMI), waist circumference (WC) and waist-to-hip ratio, are often used as indicators of nutritional status, especially in studies on developing countries (BEHRMAN and DEOLALIKAR, 1988; SAVY et al., 2005). A person’s BMI is calculated by dividing the person’s weight (in kg) by the square of his/her height (in m). It is a recognised marker of obesity. Moreover, BMI and WC have been shown to be related to the risks for hypertension, diabetes mellitus, cardiovascular disease, arthritis, various forms of cancer, and other diseases. Scientific evidence suggests that WC is a better predictor of cardiovascular disease than BMI or waist-to-hip ratio (DOBBELSTEYN et al., 2001; BUCHHOLZ and BUGARESTI, 2005).

Indices of dietary quality

Diet indices (scores) represent an approach that allows measuring dietary quality as a whole by assessing the supply of a number of nutrients simultaneously as well as their combination in the diet. Diet scores represent current nutrition guidelines and have been shown to be useful for the identification of groups with good/poor nutritional status (HAINES et al., 1999). However, several drawbacks of this method can be mentioned, e.g., arbitrary choices of components, cut-offs and scoring. Various diet indices have been developed for particular populations and their subgroups. They differ in the construction of scores and in the components included. They can be nutrient- or food-based as well as a combination of both.

To evaluate the dietary quality of Americans, in 1995 the USDA introduced the Healthy Eating Index (HEI), which was revised in 2006³. It is a tool to measure the compliance of diets with the diet-related recommendations of the 2005 Dietary Guidelines for Americans. The HEI-2005 consists of 12 components: total fruit; whole fruit (forms other than juice); total vegetables; dark green and orange vegetables and legumes; total grains; whole grains; milk; meat and beans (all meat, fish, eggs, soybean products, nuts, and seeds); oils; saturated fat; sodium; calories from solid fats, alcoholic beverages and added sugars (SoFAAS) (GUENTHER et al., 2008). Each component receives a score from zero to a maximum of 5 to 20 points, depending on the component. All component scores are summed up, producing a total score ranging between 0 (lowest compliance with recommendations) and 100 (best score, indicating full compliance with the guidelines). The HEI is calculated by the National Centre for Health Statistics annually for various population groups.

³ HEI was updated again in 2012 to reflect the 2010 Dietary Guidelines for Americans (GUENTHER et al., 2013).

Examples of other dietary indices are the Healthy Diet Indicator (HDI) of HUIJBREGTS et al. (1997) and the indices developed by THIELE et al. (2004). The HDI is based on the WHO dietary guidelines for the prevention of chronic diseases (WHO, 1990). It consists of nine food and nutrient groups: saturated fatty acids; polyunsaturated fatty acids; protein; complex carbohydrates; dietary fibre; fruit and vegetables; pulses/nuts/seeds; mono- and disaccharides; and cholesterol. If a person's intake is within the recommended range of the WHO guidelines, the component is scored “1”, and “0” otherwise. HUIJBREGTS et al. (1997) applied the index to study dietary patterns among Finnish, Italian and Dutch populations and showed a negative association between HDI and mortality. Using the data from the first German Nutrition Survey 1998, THIELE et al. (2004) constructed two indices of dietary quality.
The deficiency index consists of 13 vitamins and 12 minerals, while the excess index is made up of fats, cholesterol, sugar, alcohol etc. Application of these indices indicates whether a particular diet is a result of overconsumption or underconsumption of specific nutrients. The outcomes of the study showed a positive association between dietary quality on the one side, and higher income, education level, increasing age and healthier lifestyles on the other side.

Dietary diversity/variety approach

The Dietary Guidelines for Americans and the Food Guide Pyramid stress the importance of diversity in a diet. Applications of dietary diversity/variety measures to the assessment of overall dietary quality can be found in a number of studies (SAVY et al., 2005; STEWART and HARRIS, 2005; DRESCHER et al., 2007). However, the empirical results on associations between food diversity and health outcomes are ambiguous. While some studies show that low food diversity was connected with an increased risk of early mortality (KANT et al., 1993), others demonstrated opposite results (e.g., MCCANN et al., 1994). On the one hand, an increased number of different foods consumed might bring an individual a wider range of nutrients and lead to a better diet. On the other hand, a higher diversity in consumption may be accompanied by a generally higher total energy intake and result in overconsumption (RÖDER, 1998).

A number of approaches to measure dietary diversity have been developed. In some studies, the count diet diversity measure is applied. This index counts the total number of foods/food groups consumed daily (KANT et al., 1993; RÖDER, 1998; SAVY et al., 2005). For instance, the Dietary Variety Score developed by DREWNOWSKI et al. (1997) is based on the cumulative number of 164 different foods consumed over a 15-day period.
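The count measure described above, and the Berry-Index taken up in the next paragraph, can be sketched as follows. The Berry-Index is written here in its usual form, one minus the sum of squared shares. The function names, the food groups and the quantities are hypothetical, and shares are computed from quantities in grams purely as an assumption; the cited studies may define shares differently.

```python
def count_index(quantities):
    """Number of foods/food groups with a positive consumed quantity."""
    return sum(1 for q in quantities.values() if q > 0)


def berry_index(quantities):
    """Berry-Index: 1 - sum of squared shares s_i of each food in the total."""
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total quantity must be positive")
    shares = [q / total for q in quantities.values()]
    return 1.0 - sum(s * s for s in shares)


# Hypothetical daily intakes in grams per food group.
even_diet = {"grains": 300, "vegetables": 300, "fruit": 300, "dairy": 300}
skewed_diet = {"grains": 900, "vegetables": 100, "fruit": 0, "dairy": 0}

for name, diet in (("even", even_diet), ("skewed", skewed_diet)):
    print(name, count_index(diet), round(berry_index(diet), 2))
```

The sketch illustrates why the two measures can diverge: the count index registers only how many groups are consumed, whereas the Berry-Index also falls when consumption is concentrated in one group.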
In contrast to count indices, the Berry-Index (also known as the Simpson Index) allows for assessing dietary diversity not only in terms of the number of foods consumed but also in terms of the distribution of these foods (THIELE and WEISS, 2003; STEWART and HARRIS, 2005; LEE, 1987). DRESCHER et al. (2007) developed a healthy food diversity indicator (HFD-Index) that, in addition to the number and the distribution aspects, also considers the health value of the consumed foods. In their study, this indicator reflected healthy food diversity more appropriately than the Count-Index and the Berry-Index.

2.1.2 Empirically derived dietary patterns

The methods described above are called “a priori” methods because they are based on the existing knowledge about a “healthy” diet (i.e., they incorporate population dietary guidelines). Dietary quality can also be investigated by means of an “a posteriori” approach, which applies statistical methods such as factor and cluster analysis to find consumption patterns (if any) in the population of interest. These methods are also subject to criticism because they are based on available empirical data and might not represent optimal consumption patterns (RANDALL et al., 1990; WAIJERS and FESKENS, 2005). Cluster analysis is employed to group individuals with similar diets into homogeneous, mutually exclusive groups. Diverse criteria can be chosen as a basis for segmentation, e.g., the frequency of foods consumed (MILLEN et al., 1996), the percentage of energy contributed by each food or food group (WIRFÄLT and JEFFERY, 1997) and average food intakes (g) (HAVEMAN-NIES et al., 2001). In factor analysis, the dietary patterns, i.e., factors, are derived based on the correlations between variables, e.g., foods or food groups.
In order to interpret the identified patterns, both these techniques are usually followed by further statistical analyses to investigate the relation between various eating patterns and the outcome of interest, e.g., cardiovascular risk factors or biochemical indicators of health (WAIJERS and FESKENS, 2005).

2.1.3 Subjective (self-assessed) dietary quality

Dietary quality can also be assessed by respondents themselves, who are asked to judge the overall healthiness of their diet using a particular scale ranging, for instance, from “excellent” to “poor” (NAYGA, 1994). Research also focuses on the correspondence between perceived and actual dietary quality, which is investigated by comparing a subjective diet assessment with objective indicators (KENNEDY et al., 1995). In the study of VARIYAM et al. (2001), about 40% of the investigated population of household meal planners/preparers overestimated the quality of their diets.

2.2 Health status: definition and measurement

“Because the concept of health is so complex, its quantitative definition will necessarily be derived from a composite of several measures, rather than a direct observation on a single scale” (BUSH et al., 1972).

2.2.1 Concept of health

The health literature provides a number of diverse conceptual models of health. The medical model, which is the most basic one, defines health in physical terms, i.e., as the absence of disease and disability, and is primarily used by physicians (LARSON, 1997). In the wellness model, the focus is on physical health, but the feelings of an individual about his overall health and its possible improvements are also taken into account (LARSON, 1997).
Finally, the environmental model focuses on the complex interaction between an individual and his/her environment (e.g., the capability of growth and development in a particular environment), which is believed to affect one’s health more than single medical interventions (LARSON, 1997).

In 1946, the WHO proposed the most widely quoted definition of health. It refers to health as "a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity" (WHO, 2006: 1). Several decades later, the WHO’s Ottawa Charter formulated a new concept of health according to which health is not just a state of well-being, but “a resource for living” (WHO, 1986). Accordingly, health can be assessed, for instance, in terms of health-related behaviours (e.g., smoking and exercising) as they may have future health consequences. Or, in the case of the physical health dimension, a person’s BMI or blood pressure can indicate a person’s health state (BRESLOW, 2006).

A recent trend in health status measurement is connected with the concept of quality of life (QOL). The WHO defines QOL as a broad multidimensional concept that addresses individuals' perceptions of positive and negative dimensions of life, e.g., aspects of physical, psychological, social and spiritual life (THE WHOQOL GROUP, 1995). A further development is the concept of health-related quality of life (HRQOL), which focuses on those aspects of overall quality of life that may have an impact on physical/mental health (CDC, 2000).

Different measures of health status have been developed based on the outlined concepts. Some of them, such as comprehensive health surveys, aim to incorporate the aspects of several/all of the above presented concepts, while others focus on particular aspects of one/several health problems (e.g., disease-specific health status assessment, such as for cancer or CVDs).
In the following, a more detailed overview of the existing measurement approaches is given.

2.2.2 Measurement approaches

The measurement of health status should be based on an accepted health concept. However, due to the absence of agreement on the appropriate health definition, a large number of instruments to measure health status have been developed (MCHORNEY, 1999; BEHRMAN and DEOLALIKAR, 1988: 650; BEHRMAN et al., 1988; BOWLING, 1991: 2-11).

Generally, health indicators employed in empirical studies can be divided into objective and subjective ones. Objective measures include clinical and biochemical data such as blood pressure and cholesterol level (KENKEL, 1995; CHEN et al., 2002; CATANZARO and SUEN, 1996), anthropometric measures such as a person’s BMI (LOUREIRO and NAYGA, 2005; RASHAD, 2006; BEHRMAN et al., 1988), mortality rates (death rates, life expectancy) and statistics on health-service utilisation in a country (OR, 2000). Subjective health assessment is based on self-reports about illness or disability as well as on behavioural data, e.g., smoking status (DENTON and WALTERS, 1999; BLAYLOCK and BLISARD, 1992; CONTOYANNIS and JONES, 2004; FU et al., 2004). On the one hand, self-reports are claimed to be subject to measurement error as they might be correlated with the respondent’s education, culture and socioeconomic status. Thus, persons with higher socio-economic status might possess better health information and awareness of their own health status due to better access to medical services (STRAUSS, 1999; STRAUSS and THOMAS, 1996: 1919). On the other hand, self-reports are believed to be essential if the aim of the study is to obtain insights into a person’s subjective experiences or perceptions (BOWLING, 1991: 17).
There are empirical studies showing that self-assessed health status is a more powerful predictor of mortality than objective health indicators (DOMINICK et al., 2002; DESALVO et al., 2006).

Further, health measurements can take the form of a single-item measure (i.e., a single question or a measurement) such as blood glucose level, having teeth or eyesight problems, being currently treated for an illness, and not being able to work or to give blood. Alternatively, health status constructs can be based on multi-item scales. The latter consist of a number of indicators (i.e., questions in the questionnaires related to a number of health conditions, illnesses or symptoms) receiving numerical scores depending on the given answer. These scores are summed into an overall health score called a “health index” (MCDOWELL and NEWELL, 1987: 12; KAZIS et al., 1989; DWYER and MITCHELL, 1999). Scaling methods for item responses may reflect, for instance, the respondent’s opinion presented on a nominal (agree or disagree), categorical (strongly agree, agree, no opinion, disagree or strongly disagree), or continuous scale (a rating scale from “death” at 0 to “full health” at 100) (BOWLING, 1991: 17; GERDTHAM et al., 1999).

An example of the multi-item instruments is the standardised EuroQoL EQ-5D self-administered questionnaire developed in 1987 by an international research network. It has been used in a number of population surveys in the UK, Holland, Spain, Germany, and the USA (GREINER et al., 2003; KÖNIG et al., 2005; JOHNSON et al., 1998). It measures five dimensions of health: mobility, self-care, usual activities (work, study, housework, family, or leisure), pain/discomfort, and anxiety/depression. Respondents’ statements indicate whether they have no problem in the respective dimension, a moderate problem, or an extreme problem. The combination of responses provides a single index value for health status and may describe 243 different health states.
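The figure of 243 health states follows directly from five dimensions with three response levels each (3^5 = 243). The following minimal sketch enumerates them; the numeric coding of the levels as 1, 2 and 3 and the variable names are assumptions made for illustration.

```python
from itertools import product

# The five EQ-5D dimensions named in the text.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
# Assumed coding: 1 = no problem, 2 = moderate problem, 3 = extreme problem.
LEVELS = (1, 2, 3)

# Every health state is one combination of a level per dimension.
health_states = list(product(LEVELS, repeat=len(DIMENSIONS)))

print(len(health_states))   # 3 ** 5 = 243
print(health_states[0])     # (1, 1, 1, 1, 1): no problem in any dimension
```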
Additionally, persons are asked to rate the perception of their overall health on a scale from 0 to 100, with higher scores standing for a perception of better health (GREINER et al., 2003). An example of its application to the adult American population can be found in JOHNSON et al. (1998).

In recent decades, a number of instruments have been developed based on the quality-of-life concept, e.g., the Quality-of-Life Instrument (WHOQOL) of the World Health Organization and the Health-Related Quality-of-Life Measure (HRQOL) of the Centers for Disease Control and Prevention (CDC). The latter is an instrument of the CDC in the USA applied to assess the health state of the American population. This tool incorporates a set of questions also called the “Healthy Days Measures”, i.e., the days in the past 30 days when both physical and mental health was good (CDC, 2000). Since 2000, the Healthy Days Measures have been a part of the National Health and Nutrition Examination Survey (NHANES). HAYES et al. (2008) showed that a lower HRQOL was associated with several negative health conditions, e.g., hypertension.

The approaches to health status measurement (single indicators and indices) discussed above are not without shortcomings. Health status is a complex theoretical construct that is not directly observable. Consequently, a researcher must choose, among the available measurable variables, a single one that is believed to be reliable and able to capture important features of the theoretical construct, a requirement that may not be fulfilled in practice. Moreover, both a single indicator and an index (based on several observable variables) are believed to contain at least moderate amounts of error (HUGHES et al., 1986).
Another approach is to estimate theoretical constructs from multiple indicator measures in the form of latent variables, which are also called unmeasured variables, factors, constructs, or true scores (BOLLEN, 2002). Latent variables are part of a number of statistical models such as latent structure analysis, latent curve models, factor analysis and structural equation modelling. They are used particularly often in psychology and the social sciences, which usually have to deal with unobserved constructs, e.g., intelligence and self-esteem. This method has several strengths. First, it allows assessment of the adequacy with which theoretical constructs have been measured. Second, analysis of structural relationships among unobservable constructs can be performed. Third, this approach has a conceptual value because it provides a framework for theory conceptualization that involves thorough theoretical considerations in the construction and testing of statistical models (HUGHES et al., 1986). Latent variables can be formed a posteriori (derived from the data analysis via an exploratory factor analysis procedure) or a priori (hypothesised before data analysis and tested via confirmatory factor analysis) (BOLLEN, 2002). Further information related to latent variables, their construction, representation and estimation is given in section 4.3.

To summarise, the diet is an important factor that may contribute to chronic diseases including heart disease, stroke, certain types of cancer, and diabetes. Section 2 gave an overview of various approaches used to assess an individual’s diet. While individual components of dietary quality such as intakes of particular nutrients indicate which nutrients are under- or over-consumed, dietary indices deliver information about a diet as a whole and on how these elements are combined in the diet.
The application of statistical methods to diet assessment, e.g., cluster and factor analysis, takes into account interrelations and correlations between foods in a diet and allows the derivation of homogeneous dietary patterns from collected data on food intake. However, both approaches have their drawbacks. The same is true for self-assessment of the diet. An application of more objective indicators of nutritional quality derived, e.g., from blood analysis is usually confronted with high costs of such measurements and unavailability of these data.

The second subsection focuses on the concept of health, presenting its definitions and measurement approaches. The overview shows that the researcher is faced with many alternatives in the process of health status conceptualization and measurement selection. The choice depends on the particular health problem relevant to the study goals as well as on methodological considerations. The section discussed a latent-variable approach, which offers a number of advantages and is an alternative to a single health indicator and derived health indices. A detailed description of this method is given in section 4.3.

3 Theoretical approach to explaining health-related behaviour and outcomes

“The only way to keep your health is to eat what you don’t want, drink what you don’t like, and do what you’d rather not” Mark Twain’s quote in MEDICAL NEWS TODAY (2009).

Chapter overview

The chapter presents a theoretical approach to the analysis of health-related behaviour and outcomes. First, the theory of consumer demand is discussed. It is followed by the concept of household production introduced by BECKER (1965) and its extension to the field of health developed by GROSSMAN (1972). A number of literature sources are used to provide the views of different authors on the discussed theoretical approaches, i.e., their conceptualisation, main features, strengths and points for critical discussion.
After the introduction of the theoretical approach, the empirical representation of the household production function follows. A number of alternatives for the empirical representation (e.g., a reduced-form or quasi-reduced health function) that are employed in the literature are presented. Further, an overview of the potential difficulties connected with an empirical estimation of a health production function is provided. Among others, the endogeneity problem and the biases that may result from its occurrence in the model are discussed. It is stressed that an empirical analysis of the health production model needs a complex modelling approach that would provide a consistent and careful estimation.

3.1 Consumer demand theory

Consumer demand theory assumes that a consumption unit (a household) chooses, from the alternatives available on the market, such quantities of goods and services $x_i$ as maximise utility $U$ (3.1). Accordingly, consumers are believed to be rational and to make their choices taking into account the expected satisfaction from the chosen goods (YOUNG, 1996). These choices are limited by the available resources (3.2) (BECKER, 1965):

(3.1) $U = u(x_1, x_2, \ldots, x_n)$

(3.2) $\sum_{i=1}^{n} p_i x_i = I = W + V$,

where $p_i$ are the prices of the purchased goods and services $x_i$, $I$ is monetary income, $W$ is salary, and $V$ stands for other non-labour incomes.

Consumer demand theory attributes differences in behaviour mainly to changes in the prices of goods and in consumer incomes. Thus, the solution of the utility maximisation problem (3.1) subject to (3.2) yields a system of Marshallian demand functions depicting how changes in prices and income influence the consumer’s optimal choices (DRESCHER, 2007: 84). Unexplained variations in demand are considered to be related to changes in consumer tastes and preferences (MICHAEL and BECKER, 1973).
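As a brief worked sketch of how (3.1) and (3.2) lead to Marshallian demand functions, the problem can be written with a Lagrangian. The multiplier $\lambda$, the assumption of an interior solution and the Cobb-Douglas utility function with weights $\alpha_i$ are illustrative assumptions and are not part of the sources cited above.

```latex
\begin{align*}
\mathcal{L} &= u(x_1,\dots,x_n) + \lambda\Bigl(I - \sum_{i=1}^{n} p_i x_i\Bigr), \\
\frac{\partial u}{\partial x_i} &= \lambda\, p_i \quad (i = 1,\dots,n),
  \qquad \sum_{i=1}^{n} p_i x_i = I, \\
u &= \prod_{i=1}^{n} x_i^{\alpha_i},\quad \sum_{i=1}^{n}\alpha_i = 1
  \;\Longrightarrow\; x_i^{*}(p_1,\dots,p_n, I) = \frac{\alpha_i\, I}{p_i}.
\end{align*}
```

In this special case each demand function depends only on income and the good's own price, which illustrates the general point that observed choices are traced back to prices and income for given preferences.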
Although preferences play an important role in an explanation of consumer behaviour, the process of their formation and the possibility to forecast their effects are not discussed (MICHAEL and BECKER, 1973; DAVIS, 1982). Additionally, socio-demographic characteristics are not assumed to affect the demand explicitly, but rather via their impact on the preferences structure of an individual or household (DAVIS, 1982). Further critique of the traditional theory examines the limitations in respect to the non-incorporation in the analysis of non-monetary variables (e.g., attitudes, beliefs, and knowledge) that may also influence consumer choices (YOUNG, 1996; DRESCHER, 2007: 84). MORITZ (1993: 127) discusses that although the consumer demand theory provides a theoretical explanation of the demand behaviour of many goods, in some cases it may be treated as a “base model”, which, if modified appropriately, can deliver the framework for the analysis of further (more) complex problems. Another aspect important for further discussion is the assumption that goods purchased on the market deliver direct satisfaction of consumer needs. While some goods exist in the market in the “ready-to-consume” form, many of them presume a need for further transformation (MORITZ, 1993: 127). For example, meal preparation is connected with such inputs as particular foods obtained on the market (e.g., rice, vegetables), as well as time input of the household members needed for cooking and human capital (e.g., cooking knowledge and abilities). Therefore, the conventional consumer theory cannot provide an explanation to the demand for goods that do not exist in a final form on the market. The production-related activities performed within households are usually neglected. Health of an individual can also be considered to be produced inside a household, and along with other goods being a source of satisfaction. The next section presents a development of the demand theory proposed by BECKER (1965). 
This theory is an important contribution that proposes a conceptual framework for taking into account the production process that takes place within a household. It presumes that market goods are transformed into final “commodities”, which are the sources of actual utility in a household. It allows an explicit integration of non-market variables (e.g., household socio-demographic factors, attitudes, knowledge) into the traditional demand theory and thus proposes a framework for its application to diverse fields and problems, including those from non-economic areas such as marriage, good health or prestige (MICHAEL and BECKER, 1973).

3.2 Household production theory

BECKER (1965) proposed a new formulation of consumer demand theory that was the first one to give attention to the problem of non-market (or home) goods⁴ and household production processes, including time allocation within a household. According to BECKER, the utility $U$ is obtained not from the goods available in a marketplace, but from the more basic goods produced in a household. These basic goods are called “commodities” and denoted by $Z_i$. They are also known in the economic literature as “Z-goods”. BECKER (1965) provides examples of basic goods such as “seeing a play”, “leisure”, “reading a book”, “sleeping”, “transportation” and “business lunch”. In the later household production literature, Z-goods are considered to be even more fundamental, including such items as “prestige”, “good health”, “happiness”, “pleasure”, “social recognition” and “respect” (STAUDIGEL, 2012)⁵. Thus, the household's utility function can be written as:

(3.3) $U = U(Z_1, Z_2, \ldots, Z_n)$

A household is seen as a production unit and as a utility-maximiser.
In order to produce commodities $Z_i$, it combines market goods $y_i$ with further inputs such as time $t_i$ and human capital $HC_i$⁶ within a household production function (MICHAEL and BECKER, 1973):

(3.4) $Z_i = Z_i(y_i, t_i, HC_i)$

Since the available resources (income) and the time of household members are limited, together with the production function they present constraints to utility maximisation. Thus, the resources constraint is:

(3.5) $\sum_{i=1}^{m} p_i y_i = I = V + T_w w$,

where $p_i$ is a vector of prices for a unit of $y_i$, $I$ is household income, $T_w$ is a vector of working hours, $w$ shows the earnings for a unit of $T_w$, and $V$ is non-labour income.

⁴ EVENSON (1981: 181) defines home goods as “goods which are not traded and do not have market prices”.
⁵ For a detailed discussion of the nature of “Z-goods” see STAUDIGEL (2012).
⁶ BECKER (1993: 149) discusses that human capital (e.g., in the form of abilities and knowledge of the household members) belongs to the environmental variables that are related to the art of production and the technology level of the production process.

According to BECKER, a household allocates the total available time $T$ either to work activities $T_w$ or to consumption $T_c$ (or leisure)⁷. Therefore, the time restriction can be written as:

(3.6) $\sum_{i=1}^{m} T_i = T_c = T - T_w$,

Further, BECKER (1976: 92) discusses that the budget constraint depends on the time constraint as “[…] time may be converted into goods by using less time at consumption and more at work”. Therefore, he combines these two constraints into a single resource constraint $S$ called the “full income” restriction. It represents the income that households could earn if they used their available time only for working activities⁸:

(3.7) $S = \sum_{i=1}^{m} p_i y_i + \sum_{i=1}^{m} T_i w = V + T w$

Thus, a household maximises utility subject to its full income constraint and to a production technology. It aspires to utility maximisation by choosing an optimal combination of commodities. In addition, it chooses the least expensive way of producing them.
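The step from the two separate constraints to the full-income restriction can be made explicit in a short derivation. It assumes the reading of (3.5) to (3.7) that follows BECKER's notation, i.e., $\sum_i p_i y_i = V + T_w w$ for the budget and $\sum_i T_i = T_c = T - T_w$ for time; this reading is a reconstruction and should be checked against the original source.

```latex
\begin{align*}
T_w &= T - \sum_{i=1}^{m} T_i
  && \text{(time constraint solved for working time)} \\
\sum_{i=1}^{m} p_i y_i &= V + \Bigl(T - \sum_{i=1}^{m} T_i\Bigr) w
  && \text{(substituted into the budget constraint)} \\
\sum_{i=1}^{m} p_i y_i + \sum_{i=1}^{m} T_i\, w &= V + T w = S
  && \text{(full income)}
\end{align*}
```

Read this way, each hour spent on consumption enters the constraint at the wage rate $w$, which is why time used in household production carries an opportunity cost.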
Households allocate time between labour, home production and leisure in such a way that the cost of each commodity is minimised (EVENSON, 1981). The marginal cost of producing an additional commodity unit represents its “shadow price”. It is defined as “[…] weighted average of the value of home production time and the prices of the market goods used in the production of the home good” (EVENSON, 1981: 182). The value of home production time can be evaluated in terms of the money income (or wage) that could be obtained from an alternative labour activity on the market. Besides, the human capital of households and their production abilities play an important role in utility maximisation (MICHAEL and BECKER, 1973; EVENSON, 1981).

At this point, it can be mentioned that, due to the tendency towards rising incomes in many countries in recent decades, the opportunity cost of time has increased. This affects the allocation of time in a household. For example, a household with higher opportunity costs of time may shift from a time-intensive production technology such as cooking a dinner to a less time-consuming production technology aimed at the satisfaction of its nutritional needs, e.g., home delivery of ready-to-eat meals or convenience products, or even hiring someone to perform this task. Although the aspect of time is critical in the model of BECKER (1965), it is usually very difficult to account for it empirically due to the unavailability of data on time allocation in households.

⁷ In the model of BECKER (1976), leisure is a part of home production activities. A discussion of the need to separate leisure time, working and home production activities is available in GRONAU (1977).
⁸ For a detailed derivation of the full-income constraint see BECKER (1976: 92).

Importantly, traditional consumer demand theory attributes differences in behaviour not only to income and prices, but also to differences in consumer preferences.
However, the formation of preferences is not explained. Therefore, the changes in preferences cannot be predicted, which limits the possibilities for further research (STIGLER and BECKER, 1977). Taking the concept of household production as a base, STIGLER and BECKER (1977: 76) argue that tastes can be seen as “stable over time and similar among people”. They discuss that changes in behaviour of individuals over time, i.e., changing tastes, are due to changes in the constraints, which are used to produce utility from commodities. These constraints are prices and available incomes. The authors explain their view about stable preferences on a number of examples such as consumption of addictive goods, listening to classical music or advertising. Thus, according to traditional consumer demand theory, advertising has an influence on consumers’ preferences. When following the assumption of STIGLER and BECKER (1977), a consumer obtains utility not only from a good itself, but also from the information he possesses about this good, irrespective of whether the information is true or false. The notion on a yoghurt’s label “Calcium helps to maintain strong bones and teeth” is an example of such information. Based on the household production theory, households combine market goods with time, knowledge and other inputs to maximise their utility. In this case, the knowledge is influenced by advertising. The authors (STIGLER and BECKER, 1977: 84) discuss that a Z-good that is produced by a household can be written as: (3.8) Z = f (x, A, E, y), where x is the output of the firm, A is advertising of the firm about its good, E is the human capital of consumers and y refers to other variables such as advertising of other firms. In case of no changes in advertising, human capital and other variables, the amount of the Z-good is proportional to the amount of the firm’s output (x) used by the household to produce this commodity. 
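The proportionality statement can be written out as a short worked equation. The multiplicative form with a function $g$, the market price $p_x$ of the firm's output and the implied cost $\pi_Z$ of one unit of the commodity are notation assumed here for illustration; they are one way of reading (3.8) and are not taken from the text above.

```latex
\begin{align*}
Z &= g(A, E, y)\, x
  && \text{(for fixed $A$, $E$, $y$ the commodity is proportional to $x$)} \\
\pi_Z &= \frac{p_x}{g(A, E, y)}
  && \text{(cost of one unit of $Z$ when one unit of $x$ costs $p_x$)}
\end{align*}
```

Under this sketch, anything that raises $g$, for instance more advertising $A$ if $\partial g / \partial A > 0$, lowers $\pi_Z$ at an unchanged market price $p_x$.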
The authors argue that an increase in the advertising of the firm’s product lowers the price of the commodity produced and consumed by the household. As a consequence, the demand for the commodity rises, which in turn changes the demand for the firm’s output. According to STIGLER and BECKER (1977: 84) this is “[…] because the household is made to believe - correctly or incorrectly - that it gets a greater output of the commodity from a given input of the advertised product”. The authors conclude that advertising affects consumption due to its influence on the price of the commodity, not due to changes in consumers’ tastes.

The assumption of constant preferences and the Z-theory overall have been subject to criticism (see, e.g., COWEN, 1989). COWEN (1989: 129) argues that although the assertion about changing preferences is arbitrary to a certain degree, this is also true with regard to the changes postulated by the household production theory. Thus, the assumption that listening to classical music changes the ability of an individual to produce relaxation (Z-good) could be as arbitrary as the assumption that listening to the music changes the person’s taste for music. Also, STIGLER and BECKER (1977: 84) stress with regard to their theory that “[…] it is a thesis that does not permit of direct proof because it is an assertion about the world, not a proposition in logic.” Further, the abstract character of Z-goods is discussed in the literature, which relates to the ambiguity of their definition and quantification (see, e.g., STAUDIGEL, 2012).

In addition, several other critical points of the household production theory can be mentioned. Thus, HEIMAN et al. (2001) and BROWNING et al. (1994) emphasise the aspect of joint decisions made in households.
They argue that empirical studies usually treat a household as a single decision maker, ignoring the potential heterogeneity within it (e.g., religious and cultural factors, division of tasks) that may affect the behavioural outcomes. GRONAU (1977) stresses the inability of the household production model to separate leisure and home production time within the total time of home production activities and shows that work at home and leisure respond differently to their determinants (e.g., socioeconomic factors).

LANCASTER (1966) introduces an alternative approach to the theory of consumer behaviour. He argues that the objects of utility are not the goods, but rather the characteristics that these goods possess. Thus, consumers seek to obtain not the good itself (e.g., a meal), but the characteristics that this good contains (e.g., nutritional and aesthetic properties). Thereby, it is assumed that the characteristics of one or more goods are objective and perceived by all consumers as the same. Utility derived from these characteristics is subjective and depends on the preference structure of the individual. Thus, the demand for any good is due to the demand for the characteristics of this good (DRESCHER, 2007: 108). While the overall marginal utility of a good may be positive, some of the specific characteristics of this good can be perceived by a consumer as negative. HENDLER (1975) gives the example of eating a sandwich: while a consumer enjoys this food due to its flavour, he may also experience disutility because of its high caloric value.

The contribution of Lancaster has been applied especially in studies of hedonic price analysis, in which the price of a particular good is determined by a number of objective (measurable) characteristics that enter the equation as independent variables.
Applications of the hedonic model to agricultural products and foods aim to reveal how product characteristics affect the product price (TEUBER, 2010). This can be of interest for the food industry, which may add certain characteristics to a particular product and, thus, gain from consumers’ higher willingness to pay. For example, MELTON et al. (1996) conducted an experimental auction, where consumers were asked to evaluate and bid on several samples of fresh pork chops that varied in a number of attributes (e.g., size, colour). This was followed by an estimation of hedonic price equations, in which the effect on the pork price was derived from changes in the levels of the analysed attributes and from consumers’ socio-demographic characteristics. Further examples in the field of food and agricultural products can be found in STEINER (2004), HUANG and LIN (2007) or WARD et al. (2008).

3.3 Household production of health

3.3.1 Theoretical presentation

The household production approach has been applied to a variety of empirical problems in the fields of nutrition, fertility outcomes, child mortality, and labour supply (MICHAEL and BECKER, 1973; STRAUSS and THOMAS, 1995) and is particularly applicable to health-related research (ROSENZWEIG and SCHULTZ, 1983). Based on the theory of household production, GROSSMAN (1972) introduced the first formal economic model of health demand. In this framework, health is treated as a capital stock, which, however, is different from the other dimensions of human capital such as, for example, educational attainment⁹. He argues that “[…] a person's stock of knowledge affects his market and nonmarket productivity; while his stock of health determines the total amount of time he can spend producing money earnings and commodities”. On the one hand, the stock of health is seen as a consumption commodity that directly enters the utility function because there is a direct satisfaction from being healthy.
On the other hand, it is an investment commodity because the health of an individual affects the time devoted to (non)market activities (DAVANZO and GERTLER, 1990).

⁹ For further discussion on the types of human capital see, e.g., SCHULTZ (1997).

BERMAN et al. (1994: 206) define the household production of health as “[…] a dynamic behavioural process through which households combine their (internal) knowledge, resources, and behavioural norms and patterns with available (external) technologies, services, information, and skills to restore, maintain and promote the health of their members”. To put it differently, the health state of an individual (of each household member) is determined by his unique production function, which is formed by a number of health inputs, socio-demographic characteristics, own time, genetic endowment and characteristics of the environment. An overview of the variables in the health-production model is given in Table 1.

Table 1 Classification of variables in a health-production model

Exogenous variables
a. Personal characteristics:
   - unobserved individual endowments: e.g., genetic make-up, µ
   - observed individual endowments: e.g., age, gender, race, education, initial health, x
b. Observed environmental and community characteristics: e.g., availability of goods and services, their prices (p), wage rates, type and quality of health services, climate, infrastructure, availability of information on health messages and its usage, e

Endogenous variables
c. Demanded inputs or proximate health factors: e.g., diet, utilisation of and expenditures for medical care, time allocation, breastfeeding, smoking, exercising, alcohol and drugs intake, anthropometric measures, y
d. Health outcome, H: e.g., mortality and morbidity rates, disease-specific outcomes etc.

Source: Modified from SCHULTZ (1984) and DAVANZO and GERTLER (1990).
GROSSMAN (1972) stresses the importance of individual and household characteristics for the efficiency of health production. In particular, education is assumed to be very important in the process of health production. Better-educated persons may be more knowledgeable about the effects of a particular behaviour on their health, may make better nutritional choices based on the information available in the press or can better understand and follow the treatment prescribed by a doctor. The role of the environment, e.g., availability of clean water and quality of public health services, is also recognised.

According to GROSSMAN’S health model, a consumer demands health-related inputs and behaviours not because he values these goods, but because of their expected health impact. For example, a regular cholesterol check does not bring direct utility, but is valued by individuals because it may produce additional health. Therefore, demand for health inputs can be seen as “derived” from the demand for health (GROSSMAN, 1972).

Decisions regarding the selection of inputs are influenced not only by the household’s monetary and time constraints but also by the importance of this source of satisfaction (BERMAN et al., 1994). Clearly, an individual, besides good health, may also have other goals and sources of utility. The relative value of health in comparison to other objectives may be important for the person’s decisions about health-related inputs (DAVANZO and GERTLER, 1990). Thus, WAGSTAFF (1986: 2) argues that “[…] if people valued their health above all else, they would not over-eat, smoke or drive too fast. That people do engage in such activities […] makes it clear that although people do value their health, they do not place an over-riding value on it”. These different forms of values and preferences are represented in economic analysis by the utility function.
Thus, a person is faced with a number of trade-offs among desires for tasty food, good health and other goods as well as resource constraints (VARIYAM, 2003). A person might, for instance, prefer the enjoyment of eating fast food today to pursuing a healthier diet that could positively affect his health in the long run.

An important feature of the health production function is the diminishing marginal returns of the inputs to health status. Thus, an additional use of a health input (food, medicines etc.) in a developed country with a relatively high initial level of usage of this input will have a lower effect compared to its effect in a developing country with an initially low usage rate of this input. This may have implications for the success of health policy interventions in different settings (countries, regions etc.) (DAVANZO and GERTLER, 1990).

3.3.2 Empirical presentation

Empirically, health-production models may deliver information about the parameters of a) the health production function, i.e., the technical relationship between health inputs and health outcomes, b) the relationship between changes in the determinants of health input choices (e.g., prices, socio-demographic variables, and time) and the mix of these inputs employed by an individual, and finally c) the effect of changes in the determinants of input choices on the final health outcome (BERMAN et al., 1994).

As discussed, a household aims to maximise utility by consuming a range of commodities, one of which is health. Following CHEN et al. (2002) and the notation used in Table 1, the production function of health may be presented in a general form as:

(3.9) H = H (y, x, µ)

This equation is primarily concerned with the relationship between inputs (y) and output (H), whereas observed (x) and unobserved (µ) characteristics of the individual may affect the efficiency of health production with given inputs.
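The idea of diminishing marginal returns in a health production function of the form (3.9) can be illustrated with a minimal numerical sketch. The logarithmic functional form, the parameter values and the function names `health` and `marginal_gain` are assumptions made purely for illustration; they are not taken from CHEN et al. (2002) or from any other cited source.

```python
import math

def health(y, x=0.0, mu=0.0, a=1.0, b=0.5):
    """Hypothetical concave health production function H(y, x, mu).

    y  : amount of a health input (e.g., food, medicines)
    x  : observed individual characteristic (e.g., years of education)
    mu : unobserved endowment
    The log form is an assumption chosen only because it is concave in y.
    """
    return a * math.log1p(y) + b * x + mu

def marginal_gain(y, step=1.0):
    """Health gain from one additional unit of input at usage level y."""
    return health(y + step) - health(y)

# Gain from one extra unit of input at a low and at a high initial usage level.
gain_low_usage = marginal_gain(1.0)
gain_high_usage = marginal_gain(10.0)
```

Under these assumptions, the gain from an additional unit of input is larger where initial usage is low (`gain_low_usage`) than where it is already high (`gain_high_usage`), which mirrors the comparison between settings with low and high initial usage rates.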
Further, the demand functions for inputs (y) can be derived from utility maximisation, which is subject to technology, time, and income constraints. The “full-income budget constraint” separates non-labour income (V) from market wage (w) and takes into account the total time that is available to a household for health-related activities (T) (BECKER, 1965). Following the empirical application of CHEN et al. (2002), the general form of a demand function for health inputs can be written as:

(3.10) y* = y* (p, V, w, T, x, µ),

where y* = {y*1, y*2, …, y*n} is a set of utility-maximising demand functions for inputs (e.g., nutrients, medicines, exercise) and p is a vector of input prices. Equation (3.10) is a reduced-form demand function for inputs that shows how changes in prices, income, socio-demographic characteristics of the household’s members, as well as their endowments and community characteristics affect choices of health inputs. Similarly, a reduced-form health function (H) can be derived:

(3.11) H = H (p, V, w, T, x, µ).

This function relates input prices and personal socio-demographic and economic factors directly to health itself, and therefore describes the total effect of exogenous variables on the health outcome. Thus, reduced-form equations may deliver important insights for policy makers, as they show the direct effect of key socio-demographic and economic variables on health outcomes (3.11) or health-input choices (3.10).

While showing the impact of exogenous variables on health outcomes, reduced-form equations do not show the links through which prices and other exogenous variables influence health, i.e., how they impact health-input choices (CONTOYANNIS and JONES, 2004; DAVANZO and GERTLER, 1990). These linkages may be of interest for researchers as “[…] household characteristics generally do not affect health directly, but indirectly through the behaviours they affect” (DAVANZO and GERTLER, 1990: 19).
Moreover, a reduced-form health equation does not provide information on how lifestyles affect health. Further, BERMAN et al. (1994: 209) point out that “[…] since households are assumed to make their decisions on inputs in part with reference to their expectations about the health production function, these two dimensions of choice occur simultaneously or are interdependent”. Therefore, a two-stage model that simultaneously incorporates the production technology (3.9) and input choices (3.10) is believed to be more appropriate than single-equation models (either a reduced-form demand equation (3.10) or a single production function (3.9)) (SCHULTZ, 1984; BERMAN et al., 1994; DAVANZO and GERTLER, 1990).

Another alternative is the estimation of a quasi-reduced form of health equations (see, e.g., EDWARDS and GROSSMAN, 1979). Here, no distinction is made between the production stage and the input demand stage; instead, a “hybrid” health equation, which is a mixture of production and demand parameters, is estimated. BEHRMAN and DEOLALIKAR (1988: 648) point out that “[…] such quasi-reduced forms would seem to be of limited interest because they generally neither reveal all of the structural parameters nor the total impact of exogenous changes”. A hybrid health equation usually includes one or several health inputs, household income and individual characteristics as right-hand variables (ROSENZWEIG and SCHULTZ, 1983):

(3.12) H = H (y, V, w, T, x, µ)

While the empirical estimation of reduced-form equations is straightforward, this is not the case for health production functions and hybrid health equations. The aspects to be considered in the empirical analysis of these equations are discussed in the following section.

3.3.3 Challenges for empirical estimation

Parameters of reduced-form equations (3.10 and 3.11) can be estimated by means of the standard ordinary-least-squares method, as all the right-hand variables in these equations are exogenous and thus uncorrelated with the error term.
In contrast, a consistent estimation of the health production function (3.9) and the hybrid health equation (3.12) is confronted with certain difficulties. An important problem discussed in the economic literature is the endogeneity of health input variables and its consequences (ROSENZWEIG and SCHULTZ, 1983; BEHRMAN and DEOLALIKAR, 1988; SCHULTZ, 1984). In general, if an independent variable is not exogenous, it is correlated with the error term of the outcome equation. Because of this correlation, some effects of the error term may be wrongly attributed to the explanatory variable, which makes the estimates inconsistent. An explanatory variable may correlate with the error term due to simultaneous causation of predictor and outcome, omitted confounding variables, or errors in regression covariates (FOSTER and MCLANAHAN, 1996).

First, many health endowments (genetic or environmental) are unobservable to a researcher but may be known to an individual¹⁰. As health inputs may be affected by the same unobservable factors that affect the final health outcome, not accounting for this can lead to simultaneity bias. For instance, if an individual has a predisposition to high blood pressure that is unobserved by the researcher, he will have a comparably poorer health status (e.g., higher risk of cardiovascular diseases) than other persons, in spite of using larger amounts of relevant health inputs, e.g., medicines. Thus, the demand for medication may be related to his propensity to a higher blood pressure (unobserved endowment). Not accounting for this endowment may result in a downward bias of the estimated impact of medicine usage on the final health outcome.

¹⁰ Exogenous health factors, which are partially known to individuals but unobserved by the researcher, are referred to as health heterogeneity. A correlation between these factors and health inputs may pose a problem of simultaneous-equation bias (SCHULTZ, 1984: 217-218).
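The blood-pressure example can be mimicked in a small simulation. The sketch below is purely illustrative: the data-generating process, the coefficient values and all variable names are assumptions and do not come from the cited studies. It uses only the Python standard library and shows how a naive regression of health on medicine use understates the true effect when an unobserved predisposition raises medicine use and lowers health at the same time.

```python
import random

random.seed(0)
N = 20000
BETA = 0.5   # assumed true effect of medicine use on health
GAMMA = 1.0  # assumed health penalty of the unobserved predisposition

def cov(a, b):
    """Covariance of two equally long lists (divisor n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# mu: unobserved predisposition (e.g., to high blood pressure).
mu = [random.gauss(0, 1) for _ in range(N)]
# y: medicine use, which rises with the predisposition.
y = [m + random.gauss(0, 1) for m in mu]
# H: health, improved by medicine use and worsened by the predisposition.
H = [BETA * yi - GAMMA * m + random.gauss(0, 1) for yi, m in zip(y, mu)]

# Naive OLS slope of H on y alone: the researcher does not observe mu.
beta_naive = cov(y, H) / cov(y, y)

# Infeasible benchmark: OLS slope on y when mu is also included as a regressor
# (closed-form solution of the two-regressor normal equations).
det = cov(y, y) * cov(mu, mu) - cov(y, mu) ** 2
beta_with_mu = (cov(y, H) * cov(mu, mu) - cov(mu, H) * cov(y, mu)) / det
```

In this assumed setting, `beta_naive` lies well below the true value of 0.5, while `beta_with_mu` is close to it. The second estimator is not available in practice because the endowment is unobserved, which is why panel methods or instrumental variables are needed.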
In relation to the simultaneity problem, it is important to bear in mind that many health inputs are endogenous because they are subject to individual choice. ROSENZWEIG and SCHULTZ address this aspect in their seminal paper in the health production literature (1983: 723) as follows: “[…] Estimates of health technology must be obtained from a behavioural model in which health inputs are themselves choices”.

The use of longitudinal data that contain repeated measurements for each individual allows controlling for unobservable health heterogeneity. Empirically, this is done by applying fixed and random effects methods (JONES, 2000: 269-270), as the individual effects are the same in every period of time, whereas health inputs and outcomes may vary. However, due to the endogeneity of many health inputs, as discussed above, the utilisation of simultaneous equation techniques is still needed to avoid simultaneous-equation bias (MWABU, 2007)¹¹. One of these methods, which is widely employed, is the instrumental variables (IV) approach. It presumes that a number of truly exogenous variables, such as market prices and community characteristics, which are believed to affect the demand for health inputs but do not enter the health production function directly, are employed as instruments (BEHRMAN and DEOLALIKAR, 1988: 658; SCHULTZ, 1984). Such reduced-form equations incorporating all relevant exogenous variables are estimated for all the potentially endogenous health inputs in the model. Further related methodological approaches are discussed in the following chapter.

Second, a common problem in the estimation of a health production model is the omission of relevant determinants that correlate with those included in the model.
Thus, information such as time use or some individual characteristics (e.g., occupation), if excluded from the model, may be correlated with some of the included variables, which may result in omitted-variable bias (BEHRMAN and DEOLALIKAR, 1988: 643). For instance, the decisions on using health inputs may depend on the wage rate of individuals and input prices. However, a researcher is usually confronted with missing data. MAZZOCCHI and TRAILL (2008) discuss that socio-economic datasets do not include much information on health-related aspects, while nutritional surveys do not collect the full range of information on socio-economic characteristics of the individuals. According to them, there is no data set in Europe that incorporates individual data on nutrient intake, health outcomes, expenditure levels and input prices.

¹¹ For an example of the fixed effects method see BISHAI (1996), who estimates a production function of child’s health using a 2SLS fixed effects model with several lagged variables, e.g., childcare time.

BEHRMAN and DEOLALIKAR (1988: 646-648), who applied the framework of household production to the investigation of the demand for health and nutrients, in particular in developing countries, provide a detailed discussion of a wide array of exogenous variables that may affect such demand relations, e.g., a range of exogenous processes and components of genetic, environmental and community endowments. However, they stress that it is very difficult or most often impossible to account for all the relevant variables empirically. Therefore, omitted-variable bias may arise.

The problem of data unavailability also applies to the inclusion of endowments (e.g., genetics, environmental dimensions). This can also lead to omitted-variable bias in the estimation of reduced-form equations, if these variables are correlated with observed ones.
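The mechanics of omitted-variable bias can be made explicit in a stylised linear case. The following worked equation is a textbook result, not a derivation taken from BEHRMAN and DEOLALIKAR (1988); the symbols q (an omitted determinant such as occupation or a genetic endowment), δ and v are introduced here for illustration only, and the error term ε is assumed to be uncorrelated with both y and q. The block assumes the `amsmath` package.

```latex
% True model and auxiliary relation between the omitted variable q and the included input y
\begin{align}
  H &= \beta\, y + \gamma\, q + \varepsilon, \\
  q &= \delta\, y + v, \qquad
       \delta = \frac{\operatorname{Cov}(y, q)}{\operatorname{Var}(y)}, \\
% Probability limit of the OLS slope when H is regressed on y alone
  \operatorname{plim}\, \hat{\beta}_{\text{short}} &= \beta + \gamma\, \delta .
\end{align}
```

The bias term γδ vanishes only if the omitted variable has no effect on health (γ = 0) or is uncorrelated with the included input (δ = 0). Its sign depends on the signs of γ and δ, so the estimate may be biased upwards or downwards.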
Third, errors-in-variables problems may contribute to biased estimates, too. Nutrition and health data are often based on self-reports and available only for a short reference period (BEHRMAN and DEOLALIKAR, 1988: 658). Self-reports are believed to contain measurement errors and may also be correlated with the respondent’s education, culture and socioeconomic status (STRAUSS, 1999). BEHRMAN and DEOLALIKAR (1988: 659) suggest that estimating health status in the form of a latent variable could be a solution to reduce the bias. A latent variable is formed by multiple indicators and is believed to provide a more valid and reliable measurement of a given construct (KLINE, 1998). In empirical analysis in the field of health economics, structural equation models are often specified to represent health as a multiple-indicator (latent) variable (JONES, 2000: 270). Due to the relevance of structural equation modelling to the present study, its methodology is presented in more detail in the next chapter.

Another aspect relevant for the estimation of both reduced-form equations and production functions is the estimation of parameters aggregated at the household level (i.e., household averages). However, the relations may differ for different individuals in the same household (BEHRMAN and DEOLALIKAR, 1988: 659).

Finally, the impact of some inputs on health status can be lagged considerably (e.g., dietary quality), and if these lags are not specified, an estimation bias may arise (BEHRMAN and DEOLALIKAR, 1988: 659). However, the specification of appropriate lags is often confronted with data limitations as well.

To summarise, the household production model developed by BECKER (1965) is particularly applicable to modelling in the field of health, as the latter may be viewed as a commodity or an output of the production process taking place in the household.
Moreover, health directly enters the utility function of a household or an individual, as they benefit directly from being healthy. Further, the health status of an individual has multiple determinants and is produced by a number of choices that also interact with each other. The complexity of household health production implies a need to account for the discussed aspects in order to obtain unbiased estimates. To this end, a complex econometric model with simultaneous estimation of the postulated relations should be applied. The following chapter gives an overview of the estimation approaches with special attention to simultaneous equations models.

4 Methodological approaches to health production function estimation

Chapter overview

In this chapter, the approaches used in the economic literature to account for the complexity of health production in empirical analysis are discussed. The chapter starts with an overview of the existing estimation methods with a focus on simultaneous equations models. First, the method of instrumental variables, which is commonly used in the economic literature, is discussed, and relevant empirical examples from health economics are presented. Further, the chapter provides a detailed discussion of the structural equation modelling (SEM) approach, which can be used for the estimation of complex multi-equation models. The possibilities and special features of SEM are discussed. The fundamental issues of model specification, identification, parameter estimation, and evaluation of model fit are outlined in detail. The chapter concludes with empirical examples of SEM in the field of health.
4.1 Methods of simultaneous equations models estimation

A common strategy in economics for investigating a particular outcome (e.g., an event or behaviour) in response to some influencing factors is to specify an equation that captures the theoretical assumptions and to estimate its parameters, e.g., by the ordinary-least-squares method. However, if the assumed relationships among variables have a more complex character, a simultaneous equations model may be more appropriate (BERRY, 1984: 8).

A number of methods have been developed for simultaneous equations models. They can be grouped into two main categories based on whether the equations are estimated one at a time or jointly. Table 2 lists the most common estimation methods of both groups. These are: a) “single-equation” methods, which are also called “limited-information” approaches, and b) “systems methods”, known as “full-information” techniques.

Table 2 Estimation strategies for empirical models

Empirical modelling: Simultaneous equations methods, divided into single-equation (limited-information) methods and systems (full-information) methods

Estimation method:
Ordinary least squares (OLS), Probit, Logit
Single-equation (limited-information) methods:
   - Indirect least squares (ILS)
   - Instrumental variables (IVs)
   - Two-stage least squares (2SLS) (special case of IVs)
   - Limited-information maximum likelihood (LIML)
Systems (full-information) methods:
   - Three-stage least squares (3SLS)
   - Full-information maximum likelihood (FIML)
   - Structural equation modelling (SEM) (e.g., LISREL, MIMIC models)

Source: Own presentation based on KENNEDY (2003: 186-191) and GREENE (2003: 396-413).

The single-equation (limited-information) approach estimates each equation in the model separately (one at a time), and only the information on the restrictions in that particular equation is used.
On the contrary, the “systems” (full-information) methods allow all the equations to be estimated simultaneously; thereby, knowledge of all the restrictions in the model can be utilised (KENNEDY, 2003: 186). An application of single-equation methods is connected with the estimation of one (or several) structural equation(s) and some reduced-form equations. In the “systems” methods, by contrast, all equations in the model are structural (CAMERON and TRIVEDI, 2005: 35)¹².

There are other methods dealing with the estimation of multi-equation models. However, not all sets of equations are simultaneous. Thus, the frequently employed technique of “seemingly unrelated regressions” (SUR or SURE) proposed by ZELLNER (1962) is a generalisation of OLS for multi-equation systems. It represents a system of regression equations that are connected not because they interact, but because the error terms across the equations are correlated. Each equation in the model could be estimated separately by OLS and would deliver consistent estimates. However, the parameter estimates of the SUR method are more efficient. KENNEDY (2003: 192) gives an example of a two-equation model that presents demand functions for two different goods. If a shock affects the demand for one good, it might be transmitted to the other good. Therefore, estimating these equations as a set may deliver estimates that are more efficient.

¹² Equations are expressed in reduced form when each endogenous variable in the model is modelled to be caused only by predetermined variables (i.e., exogenous and lagged endogenous) and an error term. Therefore, estimators from the reduced-form equations show how much each endogenous variable in the model changes in response to a unit change in each predetermined variable.
In the structural equations, the endogenous variables are expressed as a function of the exogenous and endogenous variables in the model that are assumed to have a causal effect on them, and of an error term. Thus, the structural equations display the causal interrelations of the modelled process and, therefore, reveal the reason for the change of an endogenous variable in response to a unit change in a predetermined variable (BERRY, 1984: 28).

The full-information techniques are generally believed to deliver more efficient estimators than the limited-information methods (BERRY, 1984: 81; KLINE, 1998: 177). The drawback of these methods is their high computational cost. Additionally, they are very dependent on correct model specification. Thus, in case of a wrong specification all the parameter estimates in the model are affected, whereas in estimations by single-equation methods the impact of a wrong specification is confined to the parameters of the corresponding equation (KENNEDY, 2003: 190).

Of the techniques listed in Table 2, the 2SLS approach is applied most frequently. It replaced the earlier, computationally more complex LIML estimation procedure. In the first stage of 2SLS, the possibly endogenous causal variables are regressed on all the predetermined variables in the model, i.e., the reduced form is estimated. In the second stage, the estimated values of the endogenous variables from stage 1 (treated as instrumental variables) are included in the OLS equation as regressors along with the predetermined variables. The extensions of the 2SLS approach are the Two-stage Residual Inclusion (2SRI) and the Two-stage Predictor Substitution (2SPS). The latter is employed for modelling non-linear relationships (TERZA et al., 2008).
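The two stages described above can be sketched on simulated data. This is a minimal illustration under stated assumptions rather than a reproduction of any cited study: there is one endogenous health input, one instrument and no further exogenous regressors, and the data-generating process, coefficient values and names such as `ols` and `beta_2sls` are hypothetical. Only the Python standard library is used.

```python
import random

random.seed(1)
N = 20000
BETA = 0.5  # assumed true effect of the health input on health

def ols(x, t):
    """Return (intercept, slope) of a least-squares regression of t on x."""
    mx, mt = sum(x) / len(x), sum(t) / len(t)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    slope = sxt / sxx
    return mt - slope * mx, slope

# Simulated data; every coefficient below is an illustrative assumption.
z = [random.gauss(0, 1) for _ in range(N)]    # instrument (e.g., a community characteristic)
mu = [random.gauss(0, 1) for _ in range(N)]   # unobserved health endowment
y = [zi + mi + random.gauss(0, 1) for zi, mi in zip(z, mu)]         # endogenous input
H = [BETA * yi - mi + random.gauss(0, 1) for yi, mi in zip(y, mu)]  # health outcome

# Stage 1: regress the endogenous input on the instrument (reduced form).
a1, b1 = ols(z, y)
y_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress health on the fitted values from stage 1.
_, beta_2sls = ols(y_hat, H)

# Naive OLS for comparison; it ignores the correlation between y and mu.
_, beta_ols = ols(y, H)
```

In this assumed setting, `beta_2sls` is close to the true value of 0.5, whereas `beta_ols` is biased downwards. Note that standard errors computed naively from the second-stage regression would be incorrect, because they treat the fitted values as if they were observed data; dedicated IV routines correct for this.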
Although IV procedures including 2SLS have been shown to be appropriate for estimations that use non-experimental data, some pitfalls are pointed out in the literature (WOOLDRIDGE, 2002: 101). First, this approach relies strongly on the explanatory power of the instruments employed. However, empirical analysis has shown that the selected instruments are often only weakly correlated with the potentially endogenous variables and are therefore weak and less reliable measures of these variables (ROSENZWEIG and SCHULTZ, 1983; KENKEL, 1995; BOUND et al., 1995). A related potential problem that is connected with the quality of the instruments is that the standard errors tend to be “large” in these estimations, so that the estimated coefficients are non-significant. Additionally, a multicollinearity problem may arise in 2SLS when the newly created variables are entered as predictors together with the exogenous variables (BERRY, 1984: 69).

The systems counterpart of 2SLS is the 3SLS method. It is an extension of the 2SLS procedure that allows the incorporation of correlations between the disturbances of different equations (similarly to the way SUR extends OLS) (KENNEDY, 2003: 190). Another full-information approach to the estimation of complex multi-equation models is SEM, which is explained in more detail in section 4.3 together with the relevant empirical examples.

The following section gives empirical examples of the 2SLS method as an approach frequently applied in the field of health production modelling with the aim of accounting for the endogenous nature of certain inputs of health production (as discussed in section 3.3.3).

4.2 Empirical examples of the two-stage estimation method

The importance of accounting for endogenous health inputs is demonstrated in two early studies of ROSENZWEIG and SCHULTZ (1983) and SCHULTZ (1984) who provided a generic approach to the estimation of heal