USE OF ACCELEROMETRY AND MACHINE LEARNING TO MEASURE FREE-LIVING PHYSICAL ACTIVITY AND SEDENTARY BEHAVIOR

By

Alexander Henry Montoye

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Kinesiology – Doctor of Philosophy

2014

ABSTRACT

USE OF ACCELEROMETRY AND MACHINE LEARNING TO MEASURE FREE-LIVING PHYSICAL ACTIVITY AND SEDENTARY BEHAVIOR

By

Alexander Henry Montoye

Physical activity (PA) and sedentary behavior (SB) are important behavioral variables that are associated with many key short- and long-term health indices. Objective and highly accurate methods of measuring PA and SB are needed in order to better understand the relationships of PA and SB with various health outcomes, determine population levels of PA and SB, identify and target groups at high risk of having low PA or high SB, and assess the effectiveness of interventions aimed to increase PA and reduce SB in populations. Of the available measurement tools, accelerometer-based activity monitors have gained popularity due to their blend of feasibility for use and relatively high accuracy for assessing PA (by identifying specific activity types), SB, and energy expenditure (EE). However, little research has been done to compare the accuracy of accelerometers placed on different parts of the body, and current data modeling methods are either 1) simple to use but lack accuracy or 2) highly accurate but highly complex. Therefore, the purpose of this dissertation was 1) to develop accurate and relatively simple data processing and modeling methods for accelerometer data and 2) to compare accelerometers located on the right hip, right thigh, and both wrists for classification of activity type and prediction of SB and EE.

Healthy adults (n=44) were recruited to participate in a 90-minute simulated free-living protocol.
For the protocol, participants performed 14 activities for 3-10 minutes each, with the order, duration, and intensity of activities left up to participants. Participants wore a portable metabolic analyzer (for a criterion measure of EE) and four accelerometers, which were placed on the right hip, right thigh, and both wrists. The order and timing of the activities performed during the protocol were recorded by a trained research assistant (for a criterion measure of activity type and SB). Machine learning algorithms (i.e., artificial neural networks) were created by extracting simple-to-compute features from the data from each of the four accelerometers in order to classify activity type and predict SB and EE. Accuracy of the four accelerometers for each outcome variable was assessed by comparing predictions from the accelerometers to the actual values obtained by the criterion measures. Additionally, we processed, cleaned, and extracted features of the accelerometer data in Microsoft Excel and created the artificial neural networks using R software, thereby accomplishing our goal of using simple methods to create machine learning algorithms to model accelerometer data.

Overall, the thigh accelerometer provided the highest predictive accuracy for EE, although the wrist and hip accelerometers also provided highly accurate EE predictions. For recognition of activity type, the wrist accelerometers achieved the highest accuracy while the hip accelerometer had the lowest accuracy. Finally, for prediction of SB, the hip and left wrist accelerometers provided the highest accuracy while the right wrist accelerometer provided the lowest accuracy. Our study highlights the strengths and weaknesses of accelerometers placed on the hip, thigh, and wrists for prediction of activity type, SB, and EE.
These findings suggest that single accelerometers can be used for accurate measurement of PA, SB, and EE, although the optimal accelerometer placement site will depend on the specific research question. Further research should be conducted in a true free-living setting with a more diverse population, different sets of activities, and other types of machine learning to model the accelerometer data.

Copyright by
ALEXANDER HENRY MONTOYE
2014

I would like to dedicate this dissertation to my grandfather, Henry Montoye. You are a pioneer in the field of exercise physiology and have had a lasting positive impact on our world through your work. I feel privileged to get to follow in your footsteps, and I have had the opportunity to meet so many great scientists in the field due to my connection with you. More than that, though, you have been a wonderful grandfather. I will never forget all the card playing, drawings, broken cookies, Great Harvest breads, and Old Country Buffet trips you have shared with me over the years. You are a role model in how to lead a successful career and be an involved husband, father, grandfather, and great-grandfather. Thank you.

ACKNOWLEDGEMENTS

First, I would like to thank my advisor, Dr. Karin A. Pfeiffer, for her guidance and support in my four years at Michigan State University. You have been incredibly supportive of the different projects I have undertaken in my doctoral work, even when some of them did not directly push me toward completing my degree. I would also like to thank my dissertation committee for their assistance in designing and implementing a project that has established a solid line of research for me to continue in the future. Second, I want to thank the fellow doctoral students for making the graduate experience at Michigan State so rewarding.
They have been so helpful in learning the ins and outs of teaching and research, and they have also been supportive through the highs and lows of school and non-school events. I also want to give a shout out to Chris Connolly for being a great conference roommate and lifting buddy, Kimbo Yee for being a great teaching mentor and fellow fan of the Brody cafeteria, Catherine Gammon for teaching me the true art of tea drinking, and Ian Cowburn for putting up with the whirring of my stationary bike at all times of the day. I owe a special thank you to my parents, brother, and grandparents. I would not be where I am without your love and constant support. Lastly, I want to thank my soon-to-be wife, Laura Kohn. You have been so understanding and patient with me through my doctoral work, allowing me the time I need to complete my work but also making sure that I kept a work-life balance. I cannot thank you enough for keeping me grounded through school and helping to make our distance relationship work as well as it has. I love you and feel so lucky to get to spend my life with you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO SYMBOLS AND ABBREVIATIONS
CHAPTER 1: INTRODUCTION
  Physical activity and sedentary behavior
  Measurement of physical activity and sedentary behavior
SPECIFIC AIMS AND HYPOTHESES
CHAPTER 2: LITERATURE REVIEW
  Introduction
  The influence of physical activity and sedentary behavior on health
    Physical activity
    Sedentary behavior
  Accelerometry as a preferred method to measure physical activity, energy expenditure, sedentary behavior, and activity type
    Measurement methods
    The Large-Scale Integrated monitor and Caltrac
    Linear regression
    Multiple regression
    Measurement of sedentary behavior using accelerometers
    Machine learning
    Multiple sensor methods
    Accelerometer placement
    Laboratory-based vs. free-living settings
    Accelerometer reliability
    Identifying non-wear
  Summary of current evidence and future directions
CHAPTER 3: VALIDATION AND COMPARISON OF ACCELEROMETERS LOCATED ON THE WRISTS, HIP, AND THIGH FOR FREE-LIVING ENERGY EXPENDITURE PREDICTION
  ABSTRACT
  INTRODUCTION
  METHODS
    Summary of protocol
    Participants
    Instrumentation
      ActiGraph accelerometers
      GENEA accelerometers
      Oxycon portable metabolic analyzer
    Procedure
    Data reduction and modeling
      Artificial neural networks
      Window length
      Features
      Size of the hidden layer
      Oxycon data
    Statistical analyses
    Power analysis
  RESULTS
  DISCUSSION
    Study strengths and limitations
    Conclusions
CHAPTER 4: COMPARISON OF ACTIVITY TYPE CLASSIFICATION ACCURACY FROM ACCELEROMETERS WORN ON THE WRISTS, HIP AND THIGH
  ABSTRACT
  INTRODUCTION
  METHODS
    Summary of protocol
    Participants
    Instrumentation
      ActiGraph accelerometers
      GENEA accelerometers
      iPAQ portable digital assistant and direct observation
    Procedure
    Data reduction and modeling
      Artificial neural networks
      Window length
      Features
      Activity type classification
      Identifying non-wear
      Direct observation
    Statistical analyses
    Power analysis
  RESULTS
    Confusion matrices
    Activity categories
    Activity intensity categories
  DISCUSSION
    Strengths and limitations
    Conclusions
CHAPTER 5: VALIDATION AND COMPARISON OF ACCELEROMETERS WORN ON THE WRISTS, HIP, AND THIGH FOR MEASURING SEDENTARY BEHAVIOR
  ABSTRACT
  INTRODUCTION
  METHODS
    Summary of protocol
    Participants
    Instrumentation
      ActiGraph accelerometers
      GENEA accelerometers
      iPAQ portable digital assistant and direct observation
    Procedure
    Data reduction and modeling
      Artificial neural networks
      Assessing sedentary behavior using accelerometers
      Direct observation
    Statistical analyses
    Power analysis
  RESULTS
  DISCUSSION
    Strengths and limitations
    Conclusions
CHAPTER 6: DISSERTATION SUMMARY AND RECOMMENDATIONS
  Summary of results
    Chapter 3: Estimation of energy expenditure
    Chapter 4: Classification of activity type
    Chapter 5: Estimation of sedentary behavior
    Conclusions
  Recommendations for future research
APPENDICES
  APPENDIX A: Consent form
  APPENDIX B: Recruitment flyer
  APPENDIX C: Email flyer
  APPENDIX D: Supplemental figures
REFERENCES

LIST OF TABLES

Table 2.1. Comparison of wireless accelerometer systems for activity classification accuracy and EE prediction accuracy
Table 2.2. Comparison of different monitor placements for activity classification accuracy and EE prediction accuracy
Table 3.1. Activities performed during the simulated free-living protocol
Table 3.2. Features used for EE prediction
Table 3.3. Feature sets used for creation and testing of ANNs
Table 3.4. Minimum Pearson correlations detectable for a given sample size and power
Table 3.5. Demographic characteristics of participants enrolled in study
Table 3.6. Correlations of measured vs. predicted EE
Table 3.7. Bias for measured vs. predicted EE
Table 4.1. Activities performed during the simulated free-living protocol
Table 4.2. Features used for EE and activity type prediction
Table 4.3. Feature sets used for creation and testing of ANNs
Table 4.4. Demographic characteristics of participants enrolled in study
Table 4.5. Overall sensitivity, specificity, and AUC for each of the four accelerometer placements for feature set 1
Table 4.6. Confusion matrix for activity type classification from a hip-mounted ActiGraph accelerometer
Table 4.7. Confusion matrix for activity type classification from a thigh-mounted ActiGraph accelerometer
Table 4.8. Confusion matrix for activity type classification from a GENEA accelerometer mounted on the left wrist
Table 4.9. Confusion matrix for activity type classification from a GENEA accelerometer mounted on the right wrist
Table 4.10. Activity-specific sensitivity, specificity, and AUC among the four accelerometer placement sites
Table 4.11. Overall sensitivity, specificity, and AUC among the four accelerometer placement sites with combined activity categories
Table 4.12. Activities classified into activity intensities by the Compendium and by measured METs
Table 4.13. Overall sensitivity, specificity, and AUC among the four accelerometer placement sites for classification of activity intensity
Table 5.1. Activities performed during the simulated free-living protocol
Table 5.2. Features used for EE and activity type prediction
Table 5.3. Demographic characteristics of participants enrolled in study
Table 5.4. Root mean square error for prediction of total time spent in SB and breaks in SB
Table 6.1. Overall sensitivity, specificity, and AUC among the four accelerometer placement sites for classification of activity intensity using the energy expenditure ANNs (developed in Chapter 3)

LIST OF FIGURES

Figure 3.1. ANN for predicting EE
Figure 3.2. RMSE values for predicted vs. measured EE
Figure 4.1. ANN for predicting activity type
Figure 4.2. Sensitivity for the four accelerometers, compared among feature sets
Figure 4.3. Comparison of dominant and non-dominant wrist accelerometer sensitivities
Figure 5.1. ANN for predicting activity type and sedentary behavior
Figure 5.2. Predictions of total time spent in SB compared to a criterion measure (direct observation)
Figure 5.3. Predictions of breaks in SB using a five-second interval
Figure 5.4. Predictions of breaks in SB using a 30-second interval
Figure 5.5. Predictions of breaks in SB using a 60-second interval
Figure B.1. Recruitment flyer
Figure D.1. Equipment worn by participants during the 90-min protocol.
Participant shown is performing the lying activity (T1)
Figure D.2. Example of participant performing reading activity (T2)
Figure D.3. Example of participant performing computer use activity (T3)
Figure D.4. Example of participant performing standing activity (T4)
Figure D.5. Example of participant performing laundry activity (T5)
Figure D.6. Example of participant performing sweeping activity (T6)
Figure D.7. Example of participant performing walking slow and fast activities (T7 and T8)
Figure D.8. Example of participant performing jogging activity (T9)
Figure D.9. Example of participant performing cycling activity (T10)
Figure D.10. Example of participant performing stair use activity (T11)
Figure D.11. Example of participant performing biceps curls activity (T12)
Figure D.12. Example of participant performing squats activity (T13)
Figure D.13. Example of non-wear (T14)

KEY TO SYMBOLS AND ABBREVIATIONS

ANN             artificial neural network
AUC             area under the receiver operating characteristic curve
BMI             body mass index
counts/minute   accelerometer signal counts per minute
CSA             Computer Science Application accelerometer
CV              coefficient of variation
DO              direct observation
EE              energy expenditure
g               gravitational force
HR              heart rate
Hz              hertz
IDEEA           Intelligent Device for Energy Expenditure and Activity
kcal            kilocalorie (or Calorie)
kcal/wear time  kilocalories per hour of time the accelerometer was worn
kg              kilogram
kg/m2           kilograms per meter squared
LPA             light-intensity physical activity
LSI             Large-Scale Integrated motor activity monitor
MET-hour        metabolic equivalent hours
METs            metabolic equivalents
ml              milliliter
ml/kg/min       milliliters per kilogram body mass per minute
mph             miles per hour
MVPA            moderate-to-vigorous intensity physical activity
NHANES          National Health and Nutrition Examination Survey
PA              physical activity
PDA             personal digital assistant
r               Pearson correlation
RMANOVA         repeated measures analysis of variance
RMSE            root mean square error
rpm             revolutions per minute
SB              sedentary behavior
SD              standard deviation
TV              television
VCO2            volume of carbon dioxide expelled
VO2             volume of oxygen consumed
x-axis          vertical accelerometer axis
y-axis          medial-lateral accelerometer axis
z-axis          anterior-posterior accelerometer axis

CHAPTER 1

INTRODUCTION

Physical activity and sedentary behavior

Physical activity (PA) is widely recognized for its beneficial effects on many aspects of health, including reduced risk of obesity (King and Tribble 1991), hypertension (Paffenbarger, Wing et al. 1983; Chobanian, Bakris et al. 2003), diabetes (Healy, Wijndaele et al. 2008), cardiovascular disease (Paffenbarger, Hyde et al. 1986; Morris, Clayton et al. 1990), some cancers (Thune and Furberg 2001), and all-cause mortality (Lee and Skerrett 2001).
Based on most evidence, the US Department of Health and Human Services recommends a minimum of 150 min/week of moderate-intensity PA or 75 min/week of vigorous-intensity PA, defined as activities requiring an energy expenditure (EE) of at least 3.0 or 6.0 times the resting level (METs), respectively, to experience health benefits (2008). Activities below 3.0 METs do not qualify as moderate- or vigorous-intensity PA and instead are labelled as either light-intensity PA or sedentary behavior.

Sedentary behavior (SB) is defined as a supine or seated activity requiring low levels of EE (< 1.5 METs) (Owen, Healy et al. 2010; SBRN 2012). Examples of SB include watching television (TV), using a computer, or driving. SB has historically been viewed as a lack of moderate-to-vigorous PA (MVPA); however, recent epidemiological and laboratory-based evidence suggests that SB elicits distinct physiologic responses from MVPA, with high levels of SB associated with diminished metabolic (Hamilton, Hamilton et al. 2004; Hamilton, Hamilton et al. 2007), cardiovascular (Schrage 2008), and bone health (Zerwekh, Ruml et al. 1998) and increased risk of obesity (Hu, Li et al. 2003), some cancers (Howard, Freedman et al. 2008), and all-cause mortality (Katzmarzyk, Church et al. 2009). It is important to note that the associations between SB and negative health conditions exist independently of total MVPA (Owen, Healy et al. 2010). These associations are especially concerning given that technological advances (e.g., motor vehicles, TV, computers) have contributed to an increased time spent sedentary (Matthews, Chen et al. 2008). Moreover, there are several components of SB that may influence health, notably the total time spent in SB (Healy, Wijndaele et al. 2008) as well as the number of times SB is broken up by non-sedentary activities (breaks in SB) (Healy, Dunstan et al. 2008).
Thus, it is important to be able to accurately measure each of these components to determine the true influence of SB on health.

Despite the available evidence, there are still knowledge gaps regarding the specific effects of PA and SB on health (PAGAC 2008). For example, there is not enough research into SB to allow for evidence-based recommendations to be developed. Additionally, there is currently only limited evidence of dose-response or threshold effects of SB on chronic health conditions such as heart disease and cancer (Owen, Healy et al. 2010). These knowledge gaps are due mainly to the absence of a single measurement tool that is valid for measuring both PA and SB and that can be used for a variety of activities and environments (Owen, Healy et al. 2010). Without such a measurement tool, researchers will be unable to accurately assess the relationship of PA and SB to health outcomes, monitor precise levels of PA or SB, or evaluate the effectiveness of interventions aimed at increasing PA and decreasing SB.

Measurement of physical activity and sedentary behavior

PA and SB can be assessed using a number of different methods, but accelerometers have emerged as a preferred method of assessing free-living PA and SB due to their objectivity, minimal participant burden, and rich data that can be collected for periods of 4-6 weeks or longer (Welk 2002). Accelerometer data can be used to estimate energy expenditure (EE) and time spent in various activity intensities (sedentary, light, moderate, vigorous). Accelerometers are generally worn on the hip for comfort, convenience, and utility for measuring movements of the whole body. Additionally, hip-mounted accelerometers have shown good utility for measuring ambulatory activities (e.g., walking, running) in laboratory-based settings (Freedson, Melanson et al. 1998; Rothney, Schaefer et al. 2008; Lyden, Kozey et al. 2011).
Traditionally, accelerometer data have been filtered and then translated into ‘activity counts.’ These counts are then placed into simple linear regression equations to estimate EE (Montoye, Washburn et al. 1983; Freedson, Melanson et al. 1998). Linear regression works well for measuring the energy cost of ambulatory activities (i.e., walking and running), but it dramatically under- or overestimates the EE requirement of many sedentary, lifestyle, and exercise activities and does not allow for classification of activity type (i.e., classifying activities as sitting, walking, running, cycling, etc.) (Crouter, Churilla et al. 2006; Rothney, Schaefer et al. 2008; Lyden, Kozey et al. 2011). Other data processing methods, such as machine learning, have recently evolved as successful alternatives for analyzing data collected from accelerometers. Machine learning is the general term for an array of mathematical techniques that can be used to recognize patterns in data and use those patterns to accurately predict activity type or EE. Machine learning bears some similarities to linear regression; for example, both machine learning and linear regression use one or more input (independent) variables (e.g., accelerometer counts, heart rate, etc.) to predict an outcome (such as EE). However, unlike traditional linear regression, machine learning techniques do not assume a simple relationship between accelerometer counts and EE, and machine learning takes more information from the accelerometer than just counts (e.g., monitor orientation, patterns of count accrual). Machine learning techniques are more complicated than linear regression, but they show improvements in EE measurement (Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009) and allow for classification of activity type (Khan, Lee et al. 2008; Khan, Lee et al. 2010; Trost, Wong et al. 2012), thereby allowing estimation of time spent in SB and breaks in SB.
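The traditional count-based approach described above amounts to fitting a straight line to calibration data and reading EE off that line. The sketch below fits ordinary least squares in plain Python. The counts and MET values are invented purely for illustration and are not data from any cited study, and `fit_line` and `predict_mets` are hypothetical helper names.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_mets(counts_per_min: float, slope: float, intercept: float) -> float:
    """Apply the fitted line to a new counts/min value."""
    return slope * counts_per_min + intercept


# Invented calibration points (counts/min, METs), for illustration only.
counts = [0, 1000, 2000, 3000, 4000]
mets = [1.0, 2.0, 3.0, 4.0, 5.0]

slope, intercept = fit_line(counts, mets)
print(predict_mets(2500, slope, intercept))
```

A single line of this kind has no way to represent activities whose counts and EE do not rise together (e.g., cycling with a hip-worn monitor), which is the limitation that motivates the machine learning methods discussed here.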
Currently, there is no consensus on which machine learning technique is best for EE measurement or activity classification; however, artificial neural networks (ANNs) have received the most use in kinesiology-based studies because they can be used to predict both continuous variables (such as EE) and categorical variables (such as activity type classification). Additionally, ANNs can be applied to data from commonly used accelerometers and can be developed from freely available software packages (e.g., R statistical software) (Staudenmayer, Pober et al. 2009). Despite the emphasis on measurement of EE and time spent in MVPA, classification and measurement of SB have lagged behind. Only very recently have validation studies been conducted specifically to assess the ability of accelerometers to accurately measure time spent in SB and breaks in SB, and these studies have yielded mixed results (Grant, Ryan et al. 2006; Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Additionally, SB has rarely been included in protocols utilizing machine learning for EE and activity classification (Freedson, Lyden et al. 2011), leaving a large gap in the literature regarding the utility of accelerometers for measuring SB. Finally, standing has often been considered a type of SB, but standing involves significant contraction of muscles in the legs and postural muscles and does not have many of the negative physiologic effects of prolonged sitting or lying (Hamilton, Hamilton et al. 2004; Hamilton, Hamilton et al. 2007). Additionally, it may be that different types of SB elicit different amounts of muscle contraction (e.g., sitting at a computer might require postural muscles, while lying down may not). Therefore, an accurate measurement tool must be able to differentiate standing from SB and also differentiate among different types of SB in order to gain an understanding of the true health risks of SB.
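The definitions used in this chapter (SB as a seated or supine activity with low EE, moderate intensity beginning at 3.0 METs, vigorous intensity at 6.0 METs, and standing kept separate from SB) can be written as a small rule set. This is only a sketch: the posture labels are hypothetical, and treating exactly 1.5 METs as sedentary is an assumption, since the text gives the threshold both as "< 1.5 METs" and as "1.0-1.5 METs".

```python
# Hypothetical posture labels; SB requires a seated or supine posture.
SEDENTARY_POSTURES = {"sitting", "lying"}


def classify_intensity(mets: float, posture: str) -> str:
    """Classify one activity by EE (in METs) and posture."""
    if mets < 0:
        raise ValueError("METs cannot be negative")
    if mets >= 6.0:
        return "vigorous"
    if mets >= 3.0:
        return "moderate"
    # Assumed boundary: 1.5 METs itself counts as sedentary.
    if mets <= 1.5 and posture in SEDENTARY_POSTURES:
        return "sedentary"
    # Everything else below 3.0 METs, including low-EE standing,
    # falls into light-intensity PA rather than SB.
    return "light"


print(classify_intensity(1.2, "sitting"))   # sedentary
print(classify_intensity(1.2, "standing"))  # light
```

The second call shows why EE alone cannot separate standing from SB: the two activities have the same MET value and differ only in posture.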
Ultimately, there must be a balance between quality of information/data collected in a research study vs. the burden on participants and researchers. For accelerometer-based activity monitor data, use of multiple monitors and collection of several physiologic variables can improve EE measurement and activity classification (Rothney, Neumann et al. 2007; De Vries, Garre et al. 2011; Dong, Biswas et al. 2013). However, using more monitors also increases participant burden dramatically, which may lower compliance rates and, consequently, reduce the amount and quality of data collected. Additionally, large-scale studies cannot easily use multiple monitors due to the dramatic increase in time, burden, and cost necessary to collect, process, and analyze the data. Use of a single activity monitor that can collect data on one or more variables is strongly preferred for large, free-living studies due to ease of use for participants and researchers while still providing a valid measurement of the PA outcome variable(s) of interest. Additionally, machine learning techniques are much more complex to use and understand than traditional linear regression techniques. In order to make machine learning suitable for researchers to use, current approaches to developing and using machine learning must be simplified as much as possible without losing measurement accuracy. In summary, there is a need to refine the methodology for using a single activity monitor for measurement of EE and classification of both SB and non-sedentary activities, especially in free-living settings; additionally, efforts to reduce the complexity of machine learning will make this approach more accessible to researchers who want to measure PA, EE, and/or SB but who are not measurement specialists.

Hip-mounted accelerometers are commonly used for comfort and utility for measuring ambulatory activities, but they may offer a more limited ability to classify certain types of activities.
Machine learning techniques have been applied to hip-mounted accelerometers with a high degree of success for measuring EE (Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009; Trost, Wong et al. 2012) and activity type (when assessing non-sedentary activities) (Khan, Lee et al. 2008; Bonomi, Plasqui et al. 2009; Khan, Lee et al. 2010). Conversely, these techniques have rarely been used for measurement of total SB, distinguishing among standing and different types of SB, or measuring breaks in SB (Freedson, Lyden et al. 2011; Trost, Wong et al. 2012). Given the previous success of ANNs for improvement of activity type classification and EE prediction, it is likely that creation of ANNs trained on data that include both sedentary and non-sedentary activities will further improve assessment of activity type classification, time spent in SB, and breaks in SB while also improving EE measurement. Our study will address this shortcoming in the literature by creating and validating an ANN based on both sedentary and non-sedentary activities for a hip-mounted ActiGraph accelerometer. This ANN will be tested for its utility to correctly classify activity type and measure time in SB, breaks in SB, and EE. While the hip is the most common accelerometer placement location for measuring activity, there is evidence that placement on the other parts of the body, such as the thigh and wrist, can yield similar or slightly better accuracy for measuring PA and EE (Bouten, Sauren et al. 1997; Bao and Intille 2004; De Vries, Garre et al. 2011; Esliger, Rowlands et al. 2011; Mannini, Intille et al. 2013). Additionally, there is consistent evidence that thigh-mounted accelerometers can accurately measure total SB and breaks in SB, which may not be true of a hip-mounted accelerometer (Grant, Ryan et al. 2006; Hart, Ainsworth et al. 2011; Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). 
However, to date, no published study has evaluated a thigh-mounted accelerometer for its utility in assessing measurement of EE or classification of both PA and SB. Thus, our study will also develop and test an ANN for classifying activity type and measuring SB and EE using data from a thigh-mounted ActiGraph accelerometer.

The GENEA is a newly developed accelerometer (designed to be worn on the wrist) that has four features that serve to dramatically improve compliance and non-wear determination: 1) it has a thin, low-profile design, 2) it is waterproof, 3) it has a battery life and memory capacity of up to 45 days, and 4) it has a temperature sensor to help detect when it is being worn. Therefore, the monitor does not need to be removed for any reason during data collection, and if it is, the temperature sensor will help to determine exact wear-time. These advantages, along with the GENEA’s raw data recording and reasonable price, make the GENEA ideal for measuring EE and SB and classifying activity type in free-living situations and large studies. Using the traditional, cut-point approach, the GENEA has been shown to have high accuracy for measuring EE (r>0.80) in a validation study when worn on the hip and wrist (Esliger, Rowlands et al. 2011) but much lower accuracy for classifying activity intensity in a cross-validation of the cut-points (Welch, Bassett et al. 2013; Welch, Bassett et al. 2014). Recently, the wrist-worn GENEA was tested using machine learning and showed high accuracy (>95% classification accuracy) for identifying 10-12 types of activities in a laboratory-based setting (Zhang, Rowlands et al. 2012). However, there are still many unanswered questions regarding the GENEA, including the ability to use machine learning to predict EE and measure total SB and breaks in SB, especially in a free-living environment.
Therefore, our study developed and tested ANNs to measure SB and EE and classify activity type from raw data obtained from two wrist-mounted GENEA accelerometers. Additionally, it is conventional for wrist-mounted accelerometers to be worn on the non-dominant wrist due to perceived superior measurement accuracy, but there is little evidence to support this convention. Therefore, the current study tested and compared monitors worn on both wrists to examine differences in accuracy between the dominant and non-dominant wrists.

Finally, measurement techniques are often validated for use in heavily controlled, laboratory-based settings (Freedson, Melanson et al. 1998; Esliger, Rowlands et al. 2011; Zhang, Rowlands et al. 2012; Dong, Montoye et al. 2013). This method is important for providing a proof-of-concept that a technique can accurately measure what it is supposed to measure and for identifying potential limitations of the measurement technique. However, laboratory settings are very different from free-living conditions, and there is considerable evidence showing that predictive models developed in laboratory validations do not work well when applied to free-living settings (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000; Freedson, Lyden et al. 2011; Gyllensten and Bonomi 2011; van Hees, Golubic et al. 2013; Welch, Bassett et al. 2014). Therefore, it is important to incorporate aspects of a free-living setting into validation studies to increase their real-world generalizability.

In summary, our study developed and assessed the accuracy of ANNs for the measurement of EE, SB, and activity type using data collected from hip- and thigh-mounted ActiGraph accelerometers and two wrist-mounted GENEA accelerometers. These ANNs were created and validated in a free-living simulation, using a portable metabolic analyzer as the criterion measure of EE and direct observation (DO) as the criterion measure of activity type, SB, and breaks in SB.
SPECIFIC AIMS AND HYPOTHESES

Objective 1: In a simulated free-living setting, create and test an ANN to estimate EE for a hip-mounted ActiGraph GT3X+ accelerometer, a thigh-mounted ActiGraph GT3X+ accelerometer, and two wrist-mounted GENEActiv accelerometers (total of four ANNs).

Aim 1: Create EE ANNs for the four accelerometers using simple-to-understand accelerometer signal features and a freely available software package and test a range of potential features to identify which are most relevant for inclusion in the ANNs. This aim is not hypothesis-driven.

Aim 2: Assess the criterion validity of the hip, thigh, and wrist ANNs developed for the four accelerometers for estimating EE, using EE measured by a portable metabolic analyzer as a criterion.
- Hypothesis 2a: All four accelerometers would have at least moderately high validity for measuring EE, as demonstrated by Pearson correlation coefficients of r≥0.60.
- Hypothesis 2b: The thigh-mounted accelerometer would have the highest accuracy (as represented by the lowest root mean square error [RMSE] and highest Pearson correlations [r]) for predicting EE, and the wrist-mounted accelerometers would have the lowest accuracy (highest RMSE and lowest r values) for predicting EE. The hip accelerometer placement would be significantly less accurate than the thigh but significantly more accurate than the wrist accelerometers. Differences among RMSE and r values were evaluated using repeated-measures analysis of variance (RMANOVA).
- Hypothesis 2c: Accuracy for predicting EE would be similar for the accelerometers worn on the dominant and non-dominant wrists. Differences between RMSE and r values for the left and right wrist placement sites were evaluated using RMANOVA.
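The two accuracy measures named in Hypotheses 2a-2c, RMSE and the Pearson correlation, can be computed from paired predicted and criterion EE values as in the following sketch. The function names are illustrative and the example values are arbitrary; this is not the dissertation's R code.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error between predictions and the criterion."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must be paired and contain at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Arbitrary example values in METs (predicted by a model vs. criterion).
predicted = [1.3, 2.8, 4.1, 6.0]
measured = [1.1, 3.0, 4.5, 6.4]
print(round(rmse(predicted, measured), 3), round(pearson_r(predicted, measured), 3))
```

The two measures answer different questions: r reflects how well predictions track the criterion, while RMSE reflects the size of the errors in the original units, so a model can have a high r and still be biased.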
Objective 2: In a simulated free-living setting, create and test ANNs to correctly classify activity type from a hip-mounted ActiGraph GT3X+ accelerometer, a thigh-mounted ActiGraph GT3X+ accelerometer, and two wrist-mounted GENEActiv accelerometers (total of four ANNs).

Aim 3: Create activity type ANNs using simple-to-understand accelerometer signal features and a freely available software package and evaluate the utility of different sets of accelerometer features for inclusion in the ANNs. This aim is not hypothesis-driven.

Aim 4: Assess the criterion validity of the ANNs for the four accelerometers for classifying activity type, using direct observation (DO) of activity type as a criterion measure. For hypotheses 4b-4f, differences among accelerometer placement sites were evaluated by RMANOVA.
- Hypothesis 4a: Overall classification accuracies (determined by sensitivity of the ANNs) would be at least 70% for the thigh-, hip-, and wrist-mounted accelerometers.
- Hypothesis 4b: Overall classification accuracy would be significantly higher for the thigh-mounted accelerometer than the hip- or wrist-mounted accelerometers.
- Hypothesis 4c: For ambulatory activities (walking, jogging) and climbing/descending stairs, all four accelerometers would have classification accuracies no more than 5% different among accelerometers.
- Hypothesis 4d: For lifestyle activities (laundry and sweeping) and exercise activities (biceps curls and squats), the wrist-mounted accelerometers would yield significantly higher classification accuracy than the hip- or thigh-mounted accelerometers.
- Hypothesis 4e: For SB (lying and sitting), standing, and cycling, the thigh-mounted accelerometer would yield significantly higher classification accuracy than the hip- or wrist-mounted accelerometers.
- Hypothesis 4f: The dominant and non-dominant wrist accelerometers would yield classification accuracies not significantly different from each other.
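Hypotheses 4a-4f are stated in terms of classification accuracy determined by sensitivity, that is, the proportion of epochs of a given observed activity that the model labels correctly. A minimal sketch of that calculation follows. The activity labels are examples, the function names are hypothetical, and how the dissertation aggregates sensitivity across activities or participants is not specified in this section.

```python
from collections import Counter
from typing import Dict, Sequence


def sensitivity_by_class(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Per-activity sensitivity: correct predictions / observed epochs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of all epochs classified correctly."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Directly observed labels vs. model output for five example epochs.
observed = ["sitting", "sitting", "walking", "walking", "cycling"]
model = ["sitting", "standing", "walking", "walking", "walking"]
print(sensitivity_by_class(observed, model))
print(overall_accuracy(observed, model))
```

Reporting sensitivity per activity, as well as overall, keeps a common activity that is classified well (walking in this example) from masking a rarer one that is classified poorly (cycling).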
Objective 3: In a simulated free-living setting, use the activity type ANNs (created in Aim 3) for the four accelerometers for determining total time spent in SB and breaks in SB.

Aim 5: Assess the criterion validity of the activity type ANNs developed for the four accelerometers for estimating total time spent in SB, using DO as the criterion measure. For hypotheses 5a-5c, differences among accelerometer placement sites and the criterion measure were evaluated using RMANOVA.
- Hypothesis 5a: Total time spent in SB estimated from the thigh-mounted accelerometer would not be significantly different from DO-measured total time spent in SB (i.e., the thigh-mounted accelerometer would accurately measure total time spent in SB).
- Hypothesis 5b: The wrist-mounted accelerometers would significantly underpredict total time spent in SB compared to that measured by DO.
- Hypothesis 5c: The hip-mounted accelerometer would significantly overpredict total time spent in SB compared to that measured by DO.

Aim 6: Assess the criterion validity of the ANNs developed for the four accelerometers for classifying breaks in SB, using DO as the criterion measure. For hypotheses 6a-6c, differences among accelerometer placement sites and the criterion measure were evaluated using RMANOVA.
- Hypothesis 6a: Breaks in SB estimated from the thigh-mounted accelerometer would not be significantly different from DO-measured breaks in SB (i.e., the thigh-mounted accelerometer would accurately measure breaks in SB).
- Hypothesis 6b: The wrist-mounted accelerometers would significantly overpredict breaks in SB compared to that measured by DO.
- Hypothesis 6c: The hip-mounted accelerometer would significantly underpredict breaks in SB compared to that measured by DO.

This dissertation is split up into several chapters. Chapter 2 provides a comprehensive review of the literature regarding the use of accelerometers to measure physical activity and sedentary behavior.
Then, Chapter 3 addresses Objective 1 (EE estimation), Chapter 4 addresses Objective 2 (activity type prediction), and Chapter 5 addresses Objective 3 (sedentary behavior measurement). Finally, Chapter 6 summarizes the findings of the dissertation and provides areas for further study.

CHAPTER 2
LITERATURE REVIEW

Introduction

Both PA and SB have been shown to influence health. However, the bulk of research conducted to date has focused on PA, with SB being classified as a lack of PA. This conventional definition where PA and SB are on opposite ends of a continuum fails to recognize the complex nature of SB or the independent effects being sedentary may have on people’s health, even when they obtain the recommended weekly PA dose (Pate, O'Neill et al. 2008; Dunstan, Howard et al. 2012). For PA, SB, and EE measurement in surveillance, observational, and intervention studies, recall methods are commonly used due to their low cost and minimal burden on both participants and researchers. However, accelerometer-based measurement of PA, SB, and EE is preferred due to its objectivity and potentially improved capability for accurate measurement of these variables (Welk 2002). With recent technological improvements in accelerometer capabilities, machine learning has become a popular method used to process and analyze accelerometer data. While former processing techniques could only measure EE or activity intensity and were developed for hip-mounted accelerometers, machine learning allows researchers to use accelerometers to measure EE and classify activity type when worn on the hip or other parts of the body (Preece, Goulermas et al. 2009). Hip placement works well for ambulatory activities (Rosenberger, Haskell et al. 2013), and wrist placement improves compliance and allows for sleep measurement (Mannini, Intille et al. 2013; Rosenberger, Haskell et al. 2013). However, while hip placement is better than wrist placement for measurement of SB (Rosenberger, Haskell et al.
2013), neither hip nor wrist placement allows for acceptable accuracy for measurement of SB (Lyden, Kozey Keadle et al. 2012; Rosenberger, Haskell et al. 2013), which may be due partly to the lack of sedentary activities used to train machine learning algorithms. Thigh placement appears optimal for measuring SB (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Additionally, a few studies showing successful use of a thigh-mounted accelerometer for classification of SB and non-sedentary activities (Skotte, Korshoj et al. 2012; Dong, Montoye et al. 2013) and estimation of EE (Metcalf, Curnow et al. 2002) provide preliminary evidence that the thigh placement may be the ideal solution for comprehensive measurement of PA, SB, and EE. The current study will directly compare the utility of hip-, thigh-, and wrist-mounted accelerometers for classifying SB and non-sedentary activities and measuring total time in SB, breaks in SB, and EE in a simulated free-living setting.

This literature review begins by discussing the independent risks of low PA and high SB on multiple health outcomes. Then, the review addresses the strengths and weaknesses of available measurement methods, focusing on the progression in the use of accelerometers and the current state of accelerometer use. Finally, this review highlights several gaps that exist in measurement of EE, SB, and activity type, leading to the rationale for the current study.

The influence of physical activity and sedentary behavior on health

Physical activity

PA has long been recognized for its importance in maintaining and improving health. Historical figures such as Hippocrates recognized the beneficial effects of PA on health as early as 400 B.C., writing the following in his book called Regimen: “Eating alone will not keep a man well; he must also take exercise” (Precope 1952).
Since that time, substantial evidence has been collected to support the role of PA in lowering risk of depression (Martinsen, Hoffart et al. 1989), obesity (King and Tribble 1991; Blair 1993), hypertension (Paffenbarger, Wing et al. 1983; Chobanian, Bakris et al. 2003), type II diabetes (Manson, Nathan et al. 1992; Healy, Wijndaele et al. 2008), cardiovascular disease (Paffenbarger, Hyde et al. 1986; Morris, Clayton et al. 1990), some cancers (Shephard 1990; Thune and Furberg 2001; Slattery 2004), and all-cause mortality (Paffenbarger, Hyde et al. 1986; Kampert, Blair et al. 1996). By the 1990s, the evidence was sufficient to recommend a minimum of 30 min/day of PA on most or all days of the week to achieve health benefits (Pate, Pratt et al. 1995). Since these recommendations, several updates have been published, and the most recent recommendations include a more specific dose of PA (150 min/week in at least moderate intensity, defined as any activity eliciting an EE of at least 3.0 METs), separate recommendations for resistance training, and separate or modified recommendations for children, older adults, adults with disabilities, and pregnant individuals (2008). PA is commonly measured in min/day for comparison to recommendations, but PA can also be assessed indirectly through measuring EE, which is useful in terms of energy balance and assessing total PA. Therefore, an ideal measurement tool should be able to measure both constructs.

Sedentary behavior

Anyone who does not meet the national recommendations of obtaining at least 150 min/week of MVPA has traditionally been considered sedentary (2008; Pate, O'Neill et al. 2008). However, Pate et al. (Pate, O'Neill et al. 2008) emphasize that there is a marked difference between being sedentary and being physically inactive.
While it is often the case that individuals are physically inactive (do not meet PA recommendations) and engage in large amounts of SB, it is also fairly common for people to engage in high amounts of both PA and SB (Pate, O'Neill et al. 2008; Troiano, Berrigan et al. 2008), a categorization Owen et al. call the “Active Couch Potatoes” (Owen, Healy et al. 2010). To better address the problem of SB, the Sedentary Behavior Research Network recently redefined “sedentary” to indicate time spent in seated or supine behaviors (e.g., TV watching, computer use, and driving) that elicit an EE of 1.0-1.5 METs (Ainsworth, Haskell et al. 2011; SBRN 2012). It is important to assess these behaviors (PA and SB) separately as evidence is accumulating that each behavior appears to exert independent effects on health. The classic 1953 study by Morris et al. (Morris, Heady et al. 1953) highlighted the potential influence of SB on health by recognizing differences in heart disease incidence between London’s bus drivers and bus conductors. While the bus drivers spent the vast majority of their workday sitting in their driver seats, the conductors were constantly on their feet, accumulating little SB but a lot of LPA and some MVPA while walking through the double-decker bus and going up and down the stairs. Incidence of heart disease was higher in the drivers than the conductors, providing evidence that having high PA and low SB is associated with lower risk of developing heart disease. However, despite this initial evidence, follow-up studies focused less on SB and more on PA. Given that PA is easier to measure (especially with recall as the only available field method at the time) (Healy, Clark et al. 2011) and is arguably easier to prescribe as part of a lifestyle intervention, it is not surprising that follow-up research focused on the effects of PA on health.
The importance of SB as a determinant of health returned to prominence in the last 10-15 years with the recognition that our society is becoming increasingly sedentary (Matthews, Chen et al. 2008), likely due to technological advances which increase the number of sedentary jobs and allow for more motorized transportation. Using National Health and Nutrition Examination Survey (NHANES) data, Matthews et al. (Matthews, Chen et al. 2008) found that adults spend over 50% of their waking time (7.7 hours/day) engaged in SB, while Troiano et al. found that average adults spend only 27-29 min/day engaged in MVPA (Troiano, Berrigan et al. 2008). Together, this information indicates that adults spend an average of more than 10 times as much time in SB as in MVPA. Since SB comprises such a large percentage of the average person’s day, it is not surprising that SB has been linked to an array of health outcomes, including obesity (Shields and Tremblay 2008), metabolic and cardiovascular health (Healy, Dunstan et al. 2007), and all-cause mortality (Katzmarzyk, Church et al. 2009). From an energy balance perspective, SB requires much less energy than LPA or MVPA, resulting in lower daily EE and increasing risk for weight gain and obesity (Levine, Eberhardt et al. 1999; Hu, Li et al. 2003; Levine, Lanningham-Foster et al. 2005). For example, for a person with a resting EE of 70 kcal/hour, replacing two hours of SB with two hours of LPA could burn an extra 140 kcal/day (([2.5 METs * 2 hours] - [1.5 METs * 2 hours]) * 70 kcal/MET-hour), which is more than the 105 kcal required to walk at a moderate intensity for 30 minutes (3 METs * 0.5 hours = 1.5 MET-hours * 70 kcal/MET-hour). In fact, if this person maintained a constant energy balance but wanted to lose weight, replacing two hours of SB with LPA (and holding all other factors constant) would result in losing one pound of body weight every 25 days ([3,500 kcal/lb] / [140 kcal/day]), or nearly 15 pounds in a year.
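The energy-balance example above can be checked with a few lines of arithmetic. The sketch below uses only the values given in the example (a resting EE of 70 kcal/hour, 1.5 METs for SB, 2.5 METs for LPA, 3 METs for moderate walking, and 3,500 kcal per pound); the function and variable names are illustrative.

```python
RESTING_KCAL_PER_MET_HOUR = 70.0  # resting EE given in the example
KCAL_PER_POUND = 3500.0


def kcal(mets: float, hours: float) -> float:
    """Energy cost of an activity: METs x hours x resting kcal per MET-hour."""
    return mets * hours * RESTING_KCAL_PER_MET_HOUR


# Replacing two hours of SB (1.5 METs) with two hours of LPA (2.5 METs).
extra_kcal_per_day = kcal(2.5, 2.0) - kcal(1.5, 2.0)

# Walking at a moderate intensity (3 METs) for 30 minutes.
walk_kcal = kcal(3.0, 0.5)

days_per_pound = KCAL_PER_POUND / extra_kcal_per_day
pounds_per_year = 365.0 / days_per_pound

print(extra_kcal_per_day)          # 140.0
print(walk_kcal)                   # 105.0
print(days_per_pound)              # 25.0
print(round(pounds_per_year, 1))   # 14.6
```

The extra daily expenditure comes from the difference between the two activities' energy costs, not their ratio, and dividing 3,500 kcal by 140 kcal/day gives 25 days per pound.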
In a laboratory-based study of 20 adults, Swartz et al. (Swartz, Squires et al. 2011) put this theory into action, measuring EE while having participants complete four activity protocols. Each protocol lasted for 30 minutes; all four bouts started with SB, and then the participant either continued to sit or broke their SB with a one-, two-, or five-minute walk at a self-selected pace. When the results were extrapolated to the standard eight-hour workday, participants would burn 132 kcal/day more by taking five-minute walking breaks every 30 minutes (total of 80 minutes of walking) than by sitting for the entire eight hours. In summary, SB can have important implications for total energy balance and maintenance or attainment of a healthy body weight.

In addition to SB resulting in lower EE, high amounts of SB and prolonged SB have been shown to negatively affect metabolic and cardiovascular health in laboratory-based studies. Studies in mice and rats have introduced forced SB by immobilizing the animals’ hind limbs, and these studies show that in as little as a few hours, the muscular unloading caused by prolonged SB can result in reduced insulin sensitivity (Seider, Nicholson et al. 1982; Hamilton, Hamilton et al. 2007), poor glucose transport (Ploug, Ohkuwa et al. 1995), and suppression of muscle lipoprotein lipase (Bey and Hamilton 2003; Zderic and Hamilton 2006). Additionally, bed rest studies in humans reveal major negative changes in insulin sensitivity (sometimes reducing sensitivity by 40% or more) (Stuart, Shangraw et al. 1988; Mikines, Richter et al. 1991; Smorawinski, Kubala et al. 1996; Bergouignan, Rudwill et al. 2011) and high-density lipoprotein cholesterol levels (Yanagibori, Suzuki et al. 1997; Yanagibori, Kondo et al. 1998), as well as increased risk of blood clots (Bird 1972; Kierkegaard, Norgren et al. 1987), within the first day spent in bed. Similarly, impaired insulin action (Tobin, Uchakin et al.
2002) and blood pressure responses (Hargens and Richardson 2009) have been observed with spaceflights and simulated microgravity. All three of these research avenues point toward the contribution of SB and a lack of breaks in SB to negative health consequences; however, results from animal studies cannot be directly applied to humans, and bed rest and spaceflight studies represent an extreme situation to which humans are rarely exposed, limiting their generalizability to typical, free-living SB. Importantly, standing, which is often considered a sedentary activity, does not fit the definition of a sedentary behavior because it is not a supine or seated posture, even though it does elicit an energy cost of less than 1.5 METs (Ainsworth, Haskell et al. 2011). Moreover, standing requires significant and prolonged contraction of major muscle groups in the legs, and this does not fit the proposed mechanism for many of the negative physiologic effects seen with prolonged sitting or lying (Hamilton, Hamilton et al. 2004; Hamilton, Hamilton et al. 2007). Therefore, standing likely does not affect health in the same way as SB and must be assessed as a separate construct when identifying the health risks of SB.

To support the evidence from these laboratory-based and spaceflight studies, data from several large epidemiologic studies have been used to assess links between SB and health outcomes, both longitudinally and cross-sectionally. Cross-sectional studies have added considerably to our knowledge of SB and its relationship to health. Healy and colleagues have published several studies assessing the associations between SB and cardiometabolic health (Healy, Dunstan et al. 2007; Healy, Dunstan et al. 2008; Healy, Dunstan et al. 2008; Healy, Wijndaele et al. 2008; Wijndaele, Healy et al. 2010; Healy, Matthews et al. 2011).
Using accelerometer-derived SB (≤100 counts/min using the ActiGraph accelerometer), they found that US adults in the highest quartile of SB had several adverse cardiometabolic biomarkers, including 32% higher insulin and 12% higher C-reactive protein levels, a 5% drop in high-density lipoprotein, and a 1.6 cm larger average waist circumference when compared to adults in the lowest SB quartile (NHANES data) (Healy, Matthews et al. 2011). Similarly, in a subsample of participants enrolled in the Australian Diabetes, Obesity and Lifestyle Study, a 30-min decrease in SB was associated with a 7% lower waist circumference and a similar drop in clustered metabolic risk score (Healy, Dunstan et al. 2007; Healy, Wijndaele et al. 2008). Additionally, in several different samples, Healy et al. have found that adults in the highest quartile for rates of breaking up SB with short periods of non-sedentary activity tend to have better metabolic health as well as a 5% lower waist circumference than adults in the lowest quartile (Healy, Dunstan et al. 2008). Healy et al. also found an inverse dose-response relationship of SB breaks with BMI and plasma glucose, independent of total PA or SB (Healy, Dunstan et al. 2007; Healy, Dunstan et al. 2008; Healy, Matthews et al. 2011). These cross-sectional studies provide strong evidence of associations between SB and several health indices, but from these alone we cannot establish cause and effect.

Longitudinal evidence has also shown some support for the link between SB and many health conditions, although the evidence is less conclusive than in cross-sectional work. A 2011 review by Thorp et al. (Thorp, Owen et al. 2011) provides good insight into the state of the longitudinal evidence concerning the link between SB and health outcomes. Of the 48 studies included, 45 used self-report measures (TV watching and/or total sitting time), one used HR, one used both HR and self-report, and only one used accelerometry to measure PA and SB.
Thorp’s review showed consistent evidence of an association between high levels of SB and risk of cardiovascular disease, all-cause mortality, and obesity. In many of the studies, the authors statistically controlled for BMI and time spent in MVPA, but few accounted for variables such as education or socioeconomic status. In two studies included in the review, those in the highest SB category had 54-130% increased risk of cardiovascular disease and 52-54% increased risk of all-cause mortality in 4- and 12-year follow-ups (Katzmarzyk, Church et al. 2009; Stamatakis, Hamer et al. 2011). Similarly, two other studies (6.6- and 10-year follow-ups) showed a dose-response relationship, with each hour of extra television watched per day increasing risk of cardiovascular disease by 7-18% and all-cause mortality by 4-11%. In relation to obesity risk, several studies showed that high SB in childhood was related to a 22-42% increased risk of obesity in early adulthood (Boone, Gordon-Larsen et al. 2007; Erik Landhuis, Poulton et al. 2008). Thorp’s review also shows some evidence of an association between SB and risks of developing diabetes and certain types of cancer. For example, two studies found dose-response relationships between SB and risk of developing diabetes, with the highest SB group having a 61-187% increased risk of developing diabetes in 8-10 year follow-ups (Hu, Leitzmann et al. 2001; Ford, Schulze et al. 2010); in another study, each two-hour increase in SB was associated with a 7-14% increase in diabetes risk during a 6-year follow-up (Hu, Li et al. 2003). However, in the two studies using objectively measured SB (HR and accelerometry), there were conflicting results regarding the relationship between SB and insulin resistance (Ekelund, Brage et al. 2009; Helmerhorst, Wijndaele et al. 2009), and some of these studies also found that controlling for other factors such as PA moderated the associations.
Similarly, in relation to cancer risk, two studies (with 9- and 10-year follow-ups) found that those with high SB had a 55% increased risk of developing ovarian cancer in females and a 61% increased risk of developing colon cancer in males (but not females) (Patel, Rodriguez et al. 2006; Howard, Freedman et al. 2008), although findings from other studies and other types of cancer have been mixed (Howard, Freedman et al. 2008; Gierach, Chang et al. 2009). These findings are intriguing but far from conclusive, warranting more research examining SB in relation to these outcome variables. Moreover, in eight of the studies, PA appeared to mediate the effects of SB on health outcomes, casting some doubt on the robustness of SB as a risk factor independent of PA. In one such study, Katzmarzyk examined self-reported time spent standing and mortality and found an inverse dose-response relationship between standing time and both mortality and cardiovascular disease, but only among those with low PA levels (Katzmarzyk 2014). Yet, the considerable variation in self-report instruments used and the paucity of research using objective measures of PA or SB severely limit our understanding of the true risk of SB on health or what levels of SB are appropriate for maintaining or enhancing health. Previous evidence indicates that accelerometers yield higher quality data and stronger associations with health outcomes than self-report (Reilly, Penpraze et al. 2008; Celis-Morales, Perez-Bravo et al. 2012), and a recent review by Atkin et al. (Atkin, Gorely et al. 2012) found that most self-report measures of SB have poor validity. Therefore, it is likely that objectively-measured PA and SB will yield stronger and more consistent associations of SB with health and greatly enhance our understanding of the ways in which these behaviors influence health.
In conclusion, experimental studies have shown that prolonged SB has negative effects on metabolic variables that contribute to long-term disease risk. Also, there is evidence from cross-sectional and longitudinal studies showing that TV watching and overall SB have a strong and consistent association with risk of several chronic diseases, although results were based on poor measures of PA and SB. Therefore, to continue to determine the specific effects and true risk of SB on health, discover patterns of PA and SB associated with increased disease risk, and develop national recommendations for SB to improve health, methods for objective measurement of SB need to be utilized and refined for use in observational and intervention research. The next section of this literature review focuses on the progression of methods that have been used for measurement of PA and SB, limitations of the current methods, and gaps in the literature that are addressed with the current study.

Accelerometry as a preferred method to measure physical activity, energy expenditure, sedentary behavior, and activity type

Measurement methods

Many methods have been developed and used for measuring EE, PA, and SB. For smaller, laboratory-based studies, methods such as direct or indirect calorimetry can be used to obtain very accurate measurements of EE, and direct observation (DO) can be used to accurately record the time and type of PA being performed. However, calorimetry and DO are impractical for use in public health, surveillance, and epidemiologic research because these types of studies involve measurement of a large number of participants outside of the controlled laboratory environment. In large-scale studies, self-report measures such as questionnaires, diaries, and interviews are often used to measure EE, PA, and SB.
Self-report measures are relatively inexpensive, can yield estimates of EE, and can provide information about the timing, frequency, and types of PA and SB performed (Sallis and Saelens 2000). However, self-report is vulnerable to recall bias and substantial reporting error (LaPorte, Montoye et al. 1985; Sallis and Saelens 2000; Shephard 2003; van Poppel, Chinapaw et al. 2010). Measurement errors associated with self-report reduce or attenuate associations between PA or SB and disease (Frost and White 2005; Lagerros and Lagiou 2007); as a result, statistical power decreases when trying to detect significant relationships between self-reported measures of EE, PA, or SB and health outcomes, and the risk of type II error increases (Beaton, Milner et al. 1979; MacMahon, Peto et al. 1990). Additionally, measurement error reduces researchers’ ability to obtain valid measurements of EE, PA, and SB and hinders efforts to detect meaningful changes in these variables that may occur as the result of lifestyle interventions (Dale, Welk et al. 2002; Healy, Clark et al. 2011; Matthews, Moore et al. 2012). Self-report has been used to measure PA and SB with varying levels of success. Generally, both PA and SB can be measured with only low-to-moderate validity (van Poppel, Chinapaw et al. 2010; Healy, Clark et al. 2011; Lyden 2012), although MVPA can be assessed with higher validity than SB (Matthews, Moore et al. 2012). It is not surprising that self-report is more successful for measuring PA (especially MVPA) than SB. In adults, MVPA usually occurs during structured or planned activities and can be recalled with better accuracy than SB, which is typically more intermittent in nature and is, therefore, more difficult to recall (Healy, Clark et al. 2011). In addition, few self-report tools are properly designed for measurement of SB.
SB has traditionally been assessed using proxy measures such as time spent watching TV, driving, using a computer, work-based sitting time, and/or total screen time. In a recent review of the literature, Healy et al. (Healy, Clark et al. 2011) found that most studies support that specific sedentary activities can be recalled with acceptable reliability and validity (intraclass correlation > 0.50 and Pearson/Spearman correlations > 0.40). However, self-report of total SB generally has lower validity (Pearson/Spearman correlations < 0.40) when compared to accelerometer-derived SB in adults (Hagstromer, Oja et al. 2006; Healy, Clark et al. 2011). Similarly, it appears that breaks in SB cannot be accurately assessed using self-report. In 2011, Clark et al. (Clark, Thorp et al. 2011) found that 121 adult office workers recalled total SB with moderate validity (r=0.39) but had poor validity for recalling breaks in SB (r=0.26) during the work day. Moreover, most self-report measures contain few questions about sedentary activities or total SB and no questions about breaks in SB, making measurement of SB impossible using many current self-report tools (Healy, Clark et al. 2011). Limitations of self-report methods have led researchers to use pedometers, heart rate (HR) monitors, and accelerometers for objective measurement of EE, PA, and SB. Of these methods, pedometers can only measure steps taken and, therefore, provide no information on SB, activity intensity, or activity duration (Tudor-Locke and Myers 2001). HR monitors provide a good estimation of moderate-to-vigorous PA (MVPA), but optimal accuracy is dependent on developing individualized curves that match HR to EE values, which can vary considerably among people of different ages and cardiorespiratory fitness levels (Janz 2002). Additionally, HR monitors have limited utility for measuring light-intensity PA or SB because lower-intensity activities tend to elicit high HR variability (Spurr, Prentice et al. 1988).
Furthermore, HR is influenced by a number of external factors such as stress, caffeine intake, and temperature, which affect HR during SB and LPA much more than MVPA (Montoye, Kemper et al. 1996; Crouter, Albright et al. 2004). Finally, HR monitors can be cumbersome to wear, which may lower compliance rates compared to accelerometers or pedometers (Janz 2002; Andre and Wolf 2007). Accelerometers have become the preferred device for measuring EE, PA, and SB due to their objectivity, minimal participant and researcher burden, and ability to measure free-living activity for several weeks at a time. Accelerometers work by recording accelerations of a single part of the body and using this information to predict EE or activity type. Traditionally, these accelerations were passed through a filter to remove aberrant signals and then translated into ‘activity counts’ corresponding to the magnitude of the acceleration. In most studies, accelerometers have been worn on the hip to record vertical accelerations of the trunk; these vertical accelerations were found to correlate well with EE for ambulatory activities, such as walking and running (Montoye, Washburn et al. 1983; Freedson, Melanson et al. 1998). However, hip-mounted accelerometers have limited accuracy for measuring EE for SB and many lifestyle activities (e.g., household chores, gardening, climbing/descending stairs, and cycling) and, when using common linear-regression or cut-point methods, cannot classify activity type (Hendelman, Miller et al. 2000; Crouter, Churilla et al. 2006; Rothney, Schaefer et al. 2008; Lyden, Kozey et al. 2011). Recently, accelerometer battery and memory capacity have improved to allow measurement of three-dimensional, raw acceleration data for as long as several months at a time (Westerterp 1999). Following these technological improvements, data processing methods such as machine learning have emerged as superior methods for analyzing accelerometer data.
The following sections will review the progression of accelerometer data processing techniques leading up to the present time, address current limitations in data processing, and discuss how the current study will improve EE, PA, and SB measurement.

The Large-Scale Integrated monitor and Caltrac

In 1979, LaPorte et al. (LaPorte, Kuller et al. 1979) developed the Large-Scale Integrated (LSI) motor activity monitor for the measurement of EE. The LSI was slightly larger than a wrist-watch and contained a ball of mercury housed in a small cylinder. When the LSI was moved, the mercury would roll down the cylinder and run into a mercury switch. The number of times the switch was contacted was displayed on a small screen. In this way, the LSI functioned like a pedometer but was intended for use on the hip and other parts of the body (i.e., ankle, wrist). To assess the LSI’s ability to measure EE, LaPorte et al. designed a series of experiments where they had participants log their activity for two days while wearing the LSI on the hip and ankle. Activities in the activity logs were looked up in previously developed EE tables (that reported average EE required for each activity) to obtain a measure of total EE, and these EE values were correlated with the output from the LSI monitors. While both the hip- and ankle-mounted monitors had positive correlations with EE, the hip monitor performed significantly better (r=0.69) than the ankle monitor (r=0.43). This study provided a first step in validating activity monitors, but it used a poor criterion measure by estimating EE from tables instead of directly measuring EE and did not compare this monitor to other activity measures in use at the time (e.g., pedometer). In 1981, Wong et al. (Wong, Webster et al. 1981) developed an accelerometer that was later commercially produced as the Caltrac (Hemokinetics, Inc., Madison, WI).
The Caltrac had a piezoelectric sensor which recorded accelerations based on the output charge generated with a movement, with faster accelerations producing a greater charge. This method provided a significant advantage over the pedometer, which could only record steps and could not distinguish among speeds of movement. The Caltrac was worn on the hip or lower back and recorded total vertical accelerations accrued, allowing it to measure total EE over the time period it was worn. In two different laboratory experiments, Montoye et al. showed that the Caltrac had higher correlations with measured EE than other activity monitors. In the first experiment (Wong, Webster et al. 1981), 15 participants performed walking (at 2, 3, and 4 mph), running (at 6 and 8 mph), and stepping (80, 120, and 160 steps/min) for three minutes each. During the testing, the participants wore the Caltrac, two different pedometers, and a metabolic analyzer; the investigators found that the Caltrac had significantly higher correlations with measured EE than either of the pedometers (data displayed in a figure, but no exact correlation coefficients given). Next, they conducted a second study (Montoye, Washburn et al. 1983) where 21 adults performed level and inclined walking (2 and 4 mph at 0, 6, and 12% grades), level and inclined running (6 mph at 0 and 6% grades), stepping (20 and 35 steps/min), knee-bends (28 and 48 bends/min), and floor touches (24 and 36 touches/min) for four minutes each. During the activities, participants wore the Caltrac on the hip, two LSI activity monitors (worn on hip and wrist), and a metabolic analyzer. Similar to the previous study, the Caltrac had significantly higher correlations (r=0.79 vs. r=0.71 and r=0.40) and lower standard errors (S=6.63 vs. S=7.86 and S=9.16 ml/kg/min) for EE measurement than the hip- and wrist-worn LSI monitors.
These studies provided the first evidence of the advantage of accelerometers over pedometers and other kinds of activity monitors for EE measurement. Additionally, the comparison of the hip-mounted LSI to the wrist- and ankle-mounted LSIs provided preliminary evidence that the hip placement for activity monitors was preferable to limb placement when EE was the outcome variable of interest. However, these early monitors were restricted to measuring total EE and could not yield information about activity type, duration, or intensity.

Linear regression

For almost 15 years, the Caltrac was the most commonly used accelerometer for EE measurement in both adults and children (Sallis, Buono et al. 1990; Haymes and Byrnes 1993). Then, in the mid-1990s, newer accelerometers such as the Tritrac and the Computer Science Applications (CSA, also called the ActiGraph 5032) were developed, tested, and validated for measuring EE in a number of different studies (Janz, Witt et al. 1995; Melanson and Freedson 1995; Welk and Corbin 1995). In 1998, accelerometer data processing took a large step forward with a study by Freedson et al. (Freedson, Melanson et al. 1998), which was the first to use accelerometer data to measure PA intensity as well as EE using the uniaxial CSA 7164 accelerometer (a newer version of the CSA 5032). Their study was a laboratory-based validation study where 50 adult participants walked (3.0 and 4.0 mph) and jogged (6.0 mph) on a treadmill, for six minutes each, while wearing a metabolic analyzer and a CSA on the right hip. Accelerometer counts and EE data were collected at one-minute intervals, allowing minute-by-minute comparisons of counts and EE. Accelerometer counts were found to have high correlations (r=0.88) with EE, allowing a linear regression model to be created to predict EE (in METs) from accelerometer counts.
Furthermore, activity intensity could be derived from METs by establishing count thresholds (cut-points) to classify PA into light (<3.0 METs, <1952 counts/min), moderate (3.0-5.9 METs, 1952-5724 counts/min), hard (6.0-8.9 METs, 5725-9498 counts/min), and very hard (≥9.0 METs, ≥9499 counts/min) intensities. Since Freedson et al. published their study, the development of cut-points to classify activity intensity has been the preeminent method for validating and using accelerometers. Cut-point development is relatively simple for researchers to accomplish and understand, and the cut-point approach seems to work relatively well for measurement of ambulatory activities (Freedson, Melanson et al. 1998; Lyden, Kozey et al. 2011). However, a significant limitation of a linear regression equation developed using ambulatory activities is that it does not predict EE well when non-ambulatory activities are performed. Hendelman et al. (Hendelman, Miller et al. 2000) designed a free-living simulation that involved four self-selected speeds of walking (ambulatory activities), household chores (washing windows, dusting, vacuuming, lawn mowing, and planting shrubs), and two holes of golf. During the session, participants wore the CSA on the hip and had EE measured with a portable metabolic analyzer. Two regression equations were then developed, one for the walking activities (the “calibration” regression equation) and one for all activities. Similar to Freedson’s equation, Hendelman’s calibration regression equation performed significantly better for predicting EE during the walking activities (r=0.77) than for all activities (r=0.59), and underestimated EE by 30.5-56.8% in the free-living simulation. From Hendelman’s study, it is apparent that linear regression models developed using ambulatory activities perform much better when measuring EE during ambulatory activities than for non-ambulatory activities.
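As an illustration of how the cut-point approach operates, the following is a minimal sketch that applies the Freedson count thresholds quoted above to minute-by-minute counts. The function name, the data layout, and the example counts are assumptions made for illustration; they are not taken from the original study.

```python
# Upper bounds (counts/min, exclusive) for each intensity, as quoted in the
# text for the hip-worn CSA 7164. Names below are illustrative assumptions.
FREEDSON_CUT_POINTS = [
    (1952, "light"),      # < 1952 counts/min      (< 3.0 METs)
    (5725, "moderate"),   # 1952-5724 counts/min   (3.0-5.9 METs)
    (9499, "hard"),       # 5725-9498 counts/min   (6.0-8.9 METs)
]


def classify_intensity(counts_per_min):
    """Return the intensity label for one minute of activity counts."""
    if counts_per_min < 0:
        raise ValueError("counts/min cannot be negative")
    for upper_bound, label in FREEDSON_CUT_POINTS:
        if counts_per_min < upper_bound:
            return label
    return "very hard"    # >= 9499 counts/min (>= 9.0 METs)


# Made-up example minutes chosen to sit on either side of each threshold.
example_minutes = [150, 1952, 5724, 5725, 9499]
labels = [classify_intensity(c) for c in example_minutes]
```

Because the label depends only on the count total for the minute, any two activities that produce the same counts/min receive the same intensity regardless of their actual energy cost, which is the limitation discussed in the surrounding text.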
To support this finding, the regression equation and cut-points developed by Freedson et al. (Freedson, Melanson et al. 1998) have been studied by Crouter et al. (Crouter, Churilla et al. 2006), Lyden et al. (Lyden, Kozey et al. 2011), and Rothney et al. (Rothney, Schaefer et al. 2008); these studies support Hendelman’s conclusion that Freedson’s regression equation significantly underestimates EE when applied to non-ambulatory activities. In order to improve on the shortcomings of EE regression equations developed using only ambulatory activities, researchers began developing equations using both ambulatory and non-ambulatory activities. In 2000, Swartz et al. (Swartz, Strath et al. 2000) developed a linear regression equation using 28 activities, of which only two were ambulatory activities (walking at 2.9 and 3.7 mph) and the remaining 26 were non-ambulatory, lifestyle activities (e.g., sports such as tennis and softball, household chores such as cooking and laundry). Their regression equation yielded only moderate validity (r=0.56) for EE measurement, but the studies by Crouter et al. (Crouter, Churilla et al. 2006), Lyden et al. (Lyden, Kozey et al. 2011), and Rothney et al. (Rothney, Schaefer et al. 2008) confirmed that Swartz’s regression model had better validity for measuring MVPA in free-living settings than Freedson’s. One major difference between the cut-points developed by Freedson and those developed by Swartz is that, to overcome the underestimation of lifestyle activities, Swartz had a much lower cut-point for MVPA than Freedson (574 counts/min from Swartz’s equation vs. 1952 counts/min from Freedson’s equation). However, because of the lower MVPA cut-point, the regression line had a much flatter slope so that the EE of more intense activities would not be overestimated. Thus, Swartz’s equation had a y-intercept at 2.606, meaning that the predicted MET value for an activity registering 0 counts/min was 2.606 METs.
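The trade-off between the low MVPA cut-point and the high intercept can be made concrete with a short sketch. It assumes that the 574 counts/min cut-point corresponds to exactly 3.0 METs (the lower bound of moderate intensity) and derives the slope from that assumption and the quoted intercept; the slope is therefore implied by the values in the text rather than quoted from it, and the function name is hypothetical.

```python
SWARTZ_INTERCEPT_METS = 2.606   # y-intercept quoted in the text
SWARTZ_MVPA_CUT_POINT = 574     # counts/min at the MVPA boundary, from the text
MVPA_THRESHOLD_METS = 3.0       # assumption: the cut-point maps to exactly 3.0 METs

# Slope (METs per count/min) implied by the three values above; derived here,
# not quoted from the source.
IMPLIED_SLOPE = (MVPA_THRESHOLD_METS - SWARTZ_INTERCEPT_METS) / SWARTZ_MVPA_CUT_POINT


def swartz_style_mets(counts_per_min):
    """Predict METs from counts/min with a single straight line."""
    return SWARTZ_INTERCEPT_METS + IMPLIED_SLOPE * counts_per_min


resting_prediction = swartz_style_mets(0)       # equals the intercept
boundary_prediction = swartz_style_mets(574)    # about 3.0 METs by construction
```

Under this sketch the line rises by only about 0.4 METs between 0 and 574 counts/min, so a minute registering 0 counts/min is assigned the intercept value of 2.606 METs.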
This is a substantial error given that activities likely to record 0 counts/min (such as lying or sitting) elicit EE values of 1.0 METs (Ainsworth, Haskell et al. 2011). Thus, Swartz et al. improved the measurement of MVPA at the expense of measuring SB and LPA. In summary, the available evidence suggests that no simple linear regression model successfully classifies all activity intensities or accurately predicts EE across a variety of activities. As evidenced by Swartz’s study, improving measurement of certain PA intensities degrades measurement of the other intensities. Although the linear regression model is simple to understand and use, more complicated methods of accelerometer data processing are necessary to improve measurement of EE and activity intensity and classify activity type.

Multiple regression

Once it became apparent that single linear regression models could not adequately measure EE or activity intensity across a range of activity types, researchers moved toward creating more complex, multiple regression equations in an attempt to improve activity measurement. Heil (Heil 2006) was the first to experiment with a model where EE was predicted using one of two independent, linear regression models that had different slopes. Model 1 was used for activities eliciting 350-1,200 counts/min, and model 2 was used for activities eliciting >1,200 counts/min. Model 1 had a steeper slope than model 2, and this steeper slope helped predict EE of non-ambulatory activities (which were often underestimated by single regression equations) more accurately while reducing the overestimation of light-intensity activities. By utilizing this approach, Heil was able to significantly improve estimates of EE over single regression models (r=0.84 for single regression model, r=0.87 for model 1 of the two-regression model, and r=0.92 for model 2 of the two-regression model) when predicting EE across 10 activities (7 lifestyle, 3 ambulatory).
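The structure of such a two-regression model can be sketched as a simple branch on counts/min. The sketch below uses the count ranges given above together with the sedentary threshold of 350 counts/min that Heil applied (assigning 1.0 MET below it). The two regression lines are hypothetical placeholders supplied by the caller; their coefficients are NOT Heil's published values and were chosen only so that model 1 has the steeper slope.

```python
SEDENTARY_THRESHOLD = 350    # counts/min; below this, EE is set to 1.0 MET
MODEL_SWITCH_POINT = 1200    # counts/min; boundary between model 1 and model 2


def two_regression_mets(counts_per_min, model_1, model_2):
    """Select the prediction rule from the count range, then predict METs."""
    if counts_per_min < SEDENTARY_THRESHOLD:
        return 1.0
    if counts_per_min <= MODEL_SWITCH_POINT:
        return model_1(counts_per_min)
    return model_2(counts_per_min)


# Hypothetical placeholder lines (not published coefficients).
def placeholder_model_1(counts_per_min):
    return 1.5 + 0.0020 * counts_per_min   # steeper slope, lower count range


def placeholder_model_2(counts_per_min):
    return 3.3 + 0.0005 * counts_per_min   # flatter slope, higher count range


predictions = [
    two_regression_mets(c, placeholder_model_1, placeholder_model_2)
    for c in (200, 800, 3000)
]
```

Passing the regression lines in as arguments keeps the branching logic separate from any particular set of coefficients, which would need to come from a calibration study.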
Additionally, Heil was among the first to set a threshold for SB; in his model, any activity eliciting <350 counts/min was assigned an EE value of 1.0 MET instead of being input into a prediction equation. This sedentary threshold was implemented to alleviate the significant overestimation of EE present in single regression models. Despite the improvements in EE measurement seen with Heil’s two-regression model, the model had limited use because it was fully dependent on accelerometer counts and cut-points to estimate EE. Sole reliance on counts and cut-points for determination of EE is a problem because counts from a hip-mounted accelerometer do not track EE consistently across activities. For example, Lyden et al. (Lyden, Kozey et al. 2011) found that activities can elicit very different counts/min (e.g., 3,245 counts/min for descending stairs vs. 203 counts/min for raking) while having very similar EE requirements (5.0 METs for descending stairs vs. 5.2 METs for raking). Thus, using Freedson’s regression equation (Freedson, Melanson et al. 1998), descending stairs would be correctly classified as moderate-intensity PA, while raking would be incorrectly classified as SB or light-intensity PA. Similarly, Hendelman et al. (Hendelman, Miller et al. 2000) found that some activities can elicit similar counts/min (e.g., 1,982 counts/min for walking 2.0 mph vs. 2,144 counts/min for golfing) but elicit very different EE requirements (2.0 METs for walking 2.0 mph vs. 4.3 METs for golfing). In this example, walking and golfing would both be classified as moderate-intensity PA based on counts, even though walking at 2.0 mph is actually LPA. As one final example, both lying and standing quietly elicit close to 0 counts/min, so EE prediction would be the same for each (1.0 METs). However, lying requires an EE of 1.0 METs, while standing requires 1.3 METs (Ainsworth, Haskell et al.
2011); while the absolute difference is only 0.3 METs, misclassifying standing as lying is a 30% error in EE. This error is especially significant considering that adults in the US spend about 60% of their day engaged in SB (Matthews, Chen et al. 2008), highlighting the need to be able to detect small differences in EE that exist among different types of SB and LPA. In contrast to Heil’s method of choosing the regression line based on counts/min, Crouter et al. (Crouter, Clowers et al. 2006) developed a two-regression model where the regression line used was dependent on the variability of the activity being performed. They discovered that variability in counts for ambulatory activities is lower than the variability of non-ambulatory, lifestyle activities (which tend to be more intermittent in nature). In order to determine variability, they parsed the one-minute data into six 10-second segments and calculated the coefficient of variation (CV) for the minute. Then they developed a two-regression model where activities with a CV of ≤10 were analyzed using an exponential regression curve developed for ambulatory activities, and those with a CV of >10 were analyzed with a cubic regression curve developed for non-ambulatory activities. They chose exponential and cubic curves because these fit their data better than linear regression lines. When Crouter et al. tested their model in 48 participants performing 17 activities (4 ambulatory, 13 lifestyle), their model showed greatly improved accuracy for measuring METs (r=0.96) compared to Freedson’s, Hendelman’s, and Swartz’s (Swartz, Strath et al. 2000) linear regression models, where the highest correlation was r=0.70. A subsequent study by Crouter et al. (Crouter and Bassett 2008) produced a two-regression model for the Actical accelerometer, and they found similar improvements in EE compared to single, linear regression.
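The variability-based branching described above can be sketched as follows. The sketch computes the CV of the six 10-second count totals in a minute and reports which regression curve would be selected. It assumes the CV is expressed as a percentage and uses the sample standard deviation; the source does not specify either detail, the handling of an all-zero minute is also an assumption, and the example counts are made up for illustration. The exponential and cubic curves themselves are not reproduced because their coefficients are not given in the text.

```python
import statistics


def coefficient_of_variation(ten_second_counts):
    """CV (%) of the six 10-second count totals that make up one minute."""
    if len(ten_second_counts) != 6:
        raise ValueError("expected six 10-second segments")
    mean = statistics.mean(ten_second_counts)
    if mean == 0:
        # Assumption: a minute with no recorded counts has no variability.
        return 0.0
    # Assumption: sample standard deviation, CV expressed as a percentage.
    return 100.0 * statistics.stdev(ten_second_counts) / mean


def choose_curve(ten_second_counts):
    """Name the curve a CV-based two-regression model would apply."""
    cv = coefficient_of_variation(ten_second_counts)
    return "exponential (ambulatory)" if cv <= 10 else "cubic (lifestyle)"


steady_minute = [500, 510, 495, 505, 498, 502]    # made-up steady walking
intermittent_minute = [0, 900, 50, 700, 0, 300]   # made-up household chore

steady_choice = choose_curve(steady_minute)
intermittent_choice = choose_curve(intermittent_minute)
```

The two example minutes have broadly similar count magnitudes within their active segments but very different variability, which is the information that counts/min alone discards.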
Crouter’s model was a significant innovation for two reasons: 1) it was the first to use characteristics of accelerometer output (CV) other than counts for EE measurement and 2) it was the first to utilize a non-linear regression model, which is less restrictive than a linear regression line since it allows more freedom in fitting a relationship between EE and accelerometer output across both ambulatory and lifestyle activities. However, subsequent evidence suggests that Crouter’s model may suffer from over-fitting, where the extra freedom of the non-linear model allowed for construction of a more accurate model for a specific population (better internal validity) but a less accurate model when applied to other populations, data sets, or sets of activities (poorer generalizability). To demonstrate this point, Lyden et al. (Lyden, Kozey et al. 2011) tested Crouter’s model against the models of Freedson, Hendelman, and Swartz in a large (n=277), independent sample performing 23 activities (6 ambulatory, 17 lifestyle). While Crouter’s model performed best for lifestyle activities, both Freedson’s and Swartz’s linear models performed better for ambulatory activities. Therefore, while Crouter’s models did not solve the problem of accurately measuring both ambulatory and lifestyle activities, they showed the utility of using accelerometer features other than counts/min to distinguish different kinds of activities and improve measurement of EE.

Measurement of sedentary behavior using accelerometers

As mentioned previously, self-report has been largely inadequate for measuring SB, leading researchers to use objective measures for SB measurement. Accelerometers are seemingly ideal for the measurement of SB because they can capture both movement and non-movement and, therefore, should be able to measure total SB as well as detect breaks in SB. Using NHANES data to determine population levels of SB, Matthews et al.
used a count cut-point of <100 counts/min to determine SB and found that the average adult spends about 7.7 hrs/day in SB (Matthews, Chen et al. 2008). Since its first use, the 100 counts/min cut-point has been used by Healy et al. (Healy, Dunstan et al. 2007; Healy, Wijndaele et al. 2008; Healy, Matthews et al. 2011), who have found consistent associations between objectively-measured SB and poor metabolic health. However, the 100 counts/min cut-point for SB was chosen for its utility in detecting non-movement and has not been validated for use as an accurate measure of SB (Pate, O'Neill et al. 2008). Additionally, standing quietly elicits less than 100 counts/min, so the cut-point approach incorrectly classifies standing as SB when, as discussed previously, it exerts different effects on health and must be distinguished from SB. To directly test SB cut-points, a 2011 study by Kozey-Keadle et al. (Kozey-Keadle, Libertine et al. 2011) had 20 adult office workers wear ActiGraph accelerometers for six hours on two work days, with the second day spent performing more PA and less SB. In this study, DO was used as the criterion measure of SB. Using the 100 counts/min cut-point, the ActiGraph underestimated time in SB by 4.9% and could not detect the change in SB that occurred between the first and second days. Notably, 150 counts/min was identified as producing a more accurate estimation of total SB (underestimated total SB by 1.8%), although it also could not detect the change in SB. In a follow-up study, Lyden et al. tested the utility of the 100 counts/min and 150 counts/min SB cut-points by having 13 adults wear ActiGraph accelerometers for 10 hours on two separate days, while performing less SB and more breaks in SB on the second day. The investigators found that both the 100 and 150 counts/min cut-points led to overestimations of SB, overestimations of breaks in SB, and inadequate detection of the reduction in SB on the second day.
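To show why the choice of cut-point affects both total SB and breaks in SB, the following minimal sketch summarizes a minute-by-minute count series under a given cut-point. It assumes that a break is counted whenever a sedentary minute is immediately followed by a non-sedentary minute; the cited studies may define breaks differently. The function name and the example counts are made up for illustration.

```python
def summarize_sedentary(counts_per_min, cut_point=100):
    """Return (sedentary minutes, breaks in SB) for a series of counts/min."""
    sedentary = [c < cut_point for c in counts_per_min]
    total_sedentary_minutes = sum(sedentary)
    # Assumption: a break is any sedentary minute followed directly by a
    # non-sedentary minute.
    breaks = sum(
        1
        for previous, current in zip(sedentary, sedentary[1:])
        if previous and not current
    )
    return total_sedentary_minutes, breaks


# Ten made-up minutes that include values between the two cut-points.
example_counts = [20, 0, 50, 120, 400, 80, 90, 130, 60, 2000]

summary_100 = summarize_sedentary(example_counts, cut_point=100)
summary_150 = summarize_sedentary(example_counts, cut_point=150)
```

In this made-up series, the minutes registering 120 and 130 counts/min are non-sedentary under the 100 counts/min cut-point but sedentary under 150 counts/min, so raising the cut-point adds sedentary time and removes a break at the same time. Neither cut-point can separate quiet standing from sitting, since both produce counts below either threshold.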
Additionally, while the 100 counts/min cut-point was more accurate for total SB, the 150 counts/min cut-point was more accurate for breaks in SB. Together, these two studies indicate three issues: 1) defining a cut-point for SB is problematic since there does not appear to be a single one that can accurately measure total SB or breaks in SB, 2) no cut-point has been shown to accurately detect changes in SB, and 3) a consistent over- or under-estimation may be correctable, but the previous studies show no such consistent pattern (one showing underestimation and one showing overestimation). Therefore, measuring SB using the cut-point approach is not sufficient to capture the complex nature of SB, and the cut-point approach also cannot be used to identify specific activity types or distinguish standing from SB.

Machine learning

Regression techniques and cut-points were a logical first step in EE prediction due to their intuitive appeal and simplicity. The progression of regression equations from those developed for the Caltrac to Crouter’s multiple regression models for the ActiGraph and Actical has demonstrated that while newer regression models can address many of the problems of older models, the newer models also create new limitations. Given that two activities of very different intensities can elicit the same number of counts/min (Lyden, Kozey et al. 2011) and the fact that counts/min does not yield enough information to classify activity type, it is critical to move away from relying solely on accelerometer counts for activity measurement. Additionally, Crouter’s (Crouter, Clowers et al. 2006; Crouter and Bassett 2008) method of using the CV of an activity to differentiate ambulatory from lifestyle activities indicates that the accelerometer signal underlying the counts/min output contains rich information, and this information may be used for improving EE measurement and also allowing for classification of activity type.
Using average accelerometer counts/min to estimate EE was originally done as much for practical reasons as for scientific reasons. The Caltrac worked similarly to a pedometer in that it could only record total counts to give an overall estimation of EE or total PA level (Wong, Webster et al. 1981). The CSA represented a vast improvement in technology in that it could aggregate counts into one-minute increments, allowing minute-by-minute estimations of EE and estimations of PA intensity based on the EE in a given minute. Both accelerometers were uniaxial and could only record accelerations in the vertical plane. Further improvements in accelerometer technology have made the monitors smaller, lighter, cheaper, and able to record triaxial accelerations at a rate of up to 100 times per second (100 Hz) and for upwards of 45 days at a time on a single battery charge (GENEActiv 2013). This expansion in monitor capabilities has led to advanced methods of data management and processing, collectively called “machine learning,” that have been used to predict EE and classify activity type and which show great promise for improving measurement of SB. Machine learning is a term that describes an array of complex mathematical techniques and algorithms that, coupled with an appropriate software package, can learn to recognize and differentiate patterns in activities by examining certain input ‘features,’ which are summaries of the data (e.g., mean, standard deviation, or skewness of the acceleration signal). Thus, machine learning can be applied to accelerometer data in order to estimate EE, classify activity type, and possibly measure SB (Preece, Goulermas et al. 2009). In order to use machine learning, important features of the accelerometer data must be identified and extracted for use.
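As a rough illustration of feature extraction, the sketch below splits one axis of an acceleration signal into non-overlapping windows and summarizes each window by its mean, standard deviation, and skewness. The window length, sampling rate, synthetic signal, and function names are all assumptions chosen for the example, not the processing steps used in any of the cited studies.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0  # a constant window has no asymmetry
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def extract_features(signal, window_size):
    """Summarize each complete, non-overlapping window of one acceleration axis."""
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        features.append({
            "mean": statistics.fmean(window),
            "sd": statistics.pstdev(window),
            "skewness": skewness(window),
        })
    return features


# Hypothetical 20 Hz signal (in g): 2 s of stillness followed by 2 s of movement.
sampling_rate_hz = 20
still = [1.0] * (sampling_rate_hz * 2)
moving = [1.0 + 0.5 * math.sin(2 * math.pi * 2 * t / sampling_rate_hz)
          for t in range(sampling_rate_hz * 2)]

rows = extract_features(still + moving, window_size=sampling_rate_hz * 2)
for row in rows:
    print(row)
```

In this synthetic case the two windows have nearly the same mean, but the standard deviation separates the static window from the dynamic one; each row of features would then serve as one set of inputs to a learning algorithm.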
Then, these features can be used as inputs (or independent variables) into a machine learning algorithm, which then provides a specific output (dependent variable), such as EE or activity classification. Some features, such as root mean square error (RMSE) and CV, can be useful for differentiating between static and dynamic activities (e.g., sitting vs. walking). Others, such as monitor orientation, help differentiate body postures (e.g., lying vs. standing). Conversely, features such as mean, standard deviation, and entropy of the acceleration signal are useful for distinguishing among dynamic activities and activity intensities (Preece, Goulermas et al. 2009). Many machine learning algorithms exist, but those that have been used most commonly in PA research include artificial neural networks (ANNs), hidden Markov models, and decision trees (Bao and Intille 2004; Pober, Staudenmayer et al. 2006; Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011). While there is no consensus on which machine learning technique is most accurate for PA measurement, the ANN technique has a number of advantages over the other techniques: 1) the ability of ANNs to directly estimate continuous and categorical variables and 2) the ability to construct ANNs using freely-available software. First, a significant limitation of the decision tree and hidden Markov model is that they can directly predict categorical variables, such as activity type, but cannot directly predict continuous variables such as EE (Preece, Goulermas et al. 2009). Decision trees and hidden Markov models can estimate EE indirectly by first classifying activity type and then predicting EE using values from the Compendium of Physical Activities (Ainsworth, Haskell et al.
2011), but this method is limited to predicting EE from only the activities the decision trees and hidden Markov models were trained to classify and is subject to the same limitations for measuring EE as when using the Compendium (i.e., different people may have different EE when performing the same activities, EE values are averages, etc.). Second, while many machine learning techniques must be conducted using complicated and expensive software packages (Pober, Staudenmayer et al. 2006; Rothney, Neumann et al. 2007; Preece, Goulermas et al. 2009), a 2009 study by Staudenmayer et al. (Staudenmayer, Pober et al. 2009) implemented a relatively simple way to use freely available software, the R statistical software, to extract features and process accelerometer data using ANNs. Thus, their study offered a significant advancement in the field because it was the first to make a complicated machine learning technique accessible to researchers without extensive engineering, computer science, and/or mathematics backgrounds or those without access to expensive statistical software packages. In many respects, machine learning is similar to regression. First, ANNs, as well as other machine learning techniques, work by taking a set of input variables (e.g., accelerometer counts, raw acceleration data, monitor orientation, demographic variables) and using them to predict a certain output (e.g., EE, activity type). Then, to create an ANN, the network must first be calibrated or trained on a set of data where both the inputs and outputs are known. The ANN then assigns certain weights to the input variables based on how important they are for predicting the output (similar to coefficients in regression equations) (Preece, Goulermas et al. 2009). However, ANNs are different from regression in two important ways.
First, ANNs do not assume that simple models can be fit to complex data (derived in a variety of settings and from many different activities) (Preece, Goulermas et al. 2009). Thus, an ANN is much more flexible than a regression model because it does not have to have some predetermined shape (e.g., a line for a linear model or a curve for a quadratic model). Second, ANNs can take input variables that contain much more information about an activity than minute-by-minute accelerometer counts. For example, Staudenmayer’s model took second-by-second, uniaxial (vertical axis) accelerometer count data and extracted the 10th, 25th, 50th, 75th, and 90th percentiles from each minute’s data as the features to use as inputs into the ANN. By extracting these percentiles, it is possible to derive information about the average, variance, and CV of the accelerometer data. Thus, the model uses much more information from the accelerometer, which should make it more accurate in predicting the desired outcome variables. Using this approach, Staudenmayer et al. were able to improve EE estimates by 28-66% compared to linear and multiple regression models. Additionally, while regression cannot be used for activity type classification, Staudenmayer’s model correctly classified activities into four different types (sedentary, lifestyle, ambulatory, or sport) with 88% accuracy (Staudenmayer, Pober et al. 2009). A more detailed description of the ANN is offered in the Methods sections of Chapters 3-5 of this dissertation. Although Staudenmayer’s use of machine learning for predicting EE is a significant step forward, newer accelerometer models offer raw data recording in three axes and also provide information about monitor orientation. Since accelerometer counts are derived from proprietary filtering methods by the companies that manufacture each kind of accelerometer, use of accelerometer counts does not allow for comparability of different brands of accelerometer.
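The percentile features described above for Staudenmayer’s model can be sketched in a few lines of Python. This is a hedged illustration rather than the authors’ code (which was written in R): the linear-interpolation percentile rule, the function names, and the example counts are assumptions, and the published model may have computed percentiles differently.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0-100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (p / 100.0) * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def minute_percentile_features(second_counts, percentiles=(10, 25, 50, 75, 90)):
    """Summarize each complete 60-second block of counts by its percentiles."""
    features = []
    for start in range(0, len(second_counts) - 59, 60):
        minute = sorted(second_counts[start:start + 60])
        features.append([percentile(minute, p) for p in percentiles])
    return features


# Hypothetical second-by-second vertical-axis counts for two minutes.
sitting = [0] * 60
walking = [40 + (i % 10) for i in range(60)]

features = minute_percentile_features(sitting + walking)
for minute_index, row in enumerate(features):
    print(minute_index, row)
```

Each minute is reduced to five numbers that describe both the level and the spread of the counts within that minute, and these five values would form one input vector for the ANN.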
The move to raw data collection and analysis allows for comparison between accelerometer models. Also, more useful information can be extracted from the raw accelerometer data than from activity counts, so use of raw data will likely improve the use of ANNs for EE and SB measurement and activity classification. Additionally, while Staudenmayer et al. were able to classify activity into four categories, activity measurement will be significantly enhanced with proper identification of more specific activity types and a more thorough classification of SB (e.g., identifying sitting and standing separately instead of grouping them as ‘sedentary’). Thus, the current study will build on Staudenmayer’s research by including a slightly larger number of input features and raw data in order to further improve measurement of EE and SB and classification of more activity types.

Multiple sensor methods

Given the limitations of single, hip-mounted accelerometers for measuring the wide variety of activities that occur in free-living settings, some researchers have used multiple sensors to improve EE measurement and classify activity type. These efforts generally fall into one of two categories: 1) utilizing monitors that collect acceleration data along with other physiologic variables (e.g., HR, skin temperature) or 2) using multiple accelerometer-based monitors placed on different parts of the body. Both types will be discussed in the following text. First, the combination of accelerometry and physiologic measures has been used to improve EE and activity intensity measurement. HR and accelerometry are both popular methods of measuring PA intensity and EE, but both have notable limitations when used on their own. In an effort to minimize the limitations of each method while capitalizing on their strengths, researchers have developed regression methods which use both HR and accelerometer counts to predict EE. Haskell et al. (Haskell, Yee et al.
1993) were the first to use a combination of HR and movement data to try to improve EE estimation. The authors had 19 men perform seven ambulatory and exercise activities while wearing a HR monitor, two Vitalog activity monitors (on the wrist and thigh), and a metabolic analyzer. Overall, using both HR and body motion significantly improved overall EE estimation compared to using HR or accelerometry only, although much of the improvement came from using individualized HR-EE curves (as opposed to a general curve applied to all participants). Follow-up studies have also shown improvements in EE prediction with combined HR and accelerometer data (Moon and Butte 1996; Strath, Bassett et al. 2001; Strath, Bassett et al. 2002; Plasqui and Westerterp 2005), but these studies also used individual calibration curves for HR, dramatically increasing researcher burden for accurate data collection and limiting the generalizability of the regression models to the participants from whom they were created. The Actiheart activity monitor (Phillips, Bend, OR) attempts to reduce the burden of multiple monitors by combining an accelerometer and HR monitor into one device, which is fastened to a person’s chest with a sticky pad for continuous wear. The Actiheart tends to have good wear compliance, but men have to shave their chest to wear the monitor, and women tend to report lower comfort with the Actiheart than with hip-mounted monitors (Moy, Sallis et al. 2010). Thus, while using both HR and accelerometer counts seems to improve EE measurement, the added cost and burden to researchers in creating individual HR curves and using multiple monitors per participant prohibit the use of this method in large studies. Additionally, participant compliance with HR monitors tends to be lower than with self-report or accelerometer tools (Janz 2002; Andre and Wolf 2007), providing another limitation of their use for activity measurement.
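The difference between individualized HR-EE curves and a single general curve can be illustrated with a simple least-squares sketch. The calibration data below are invented, the linear form of the HR-EE relationship is an assumption, and the function names are hypothetical; none of this reproduces the models of the studies cited above.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(x, y, intercept, slope):
    """Root mean square error of the fitted line on the given data."""
    squared_errors = [(intercept + slope * xi - yi) ** 2 for xi, yi in zip(x, y)]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Hypothetical calibration data: HR in beats/min, EE in METs.
participants = {
    "A": {"hr": [70, 90, 110, 130, 150], "ee": [1.2, 2.5, 4.0, 5.4, 7.0]},
    "B": {"hr": [60, 80, 100, 120, 140], "ee": [1.5, 3.6, 5.5, 7.6, 9.4]},
}

# General curve: one line fitted to everyone's pooled data.
pooled_hr = [h for p in participants.values() for h in p["hr"]]
pooled_ee = [e for p in participants.values() for e in p["ee"]]
group_fit = fit_line(pooled_hr, pooled_ee)

results = {}
for name, data in participants.items():
    own_fit = fit_line(data["hr"], data["ee"])  # individualized curve
    results[name] = {
        "individual_rmse": rmse(data["hr"], data["ee"], *own_fit),
        "group_rmse": rmse(data["hr"], data["ee"], *group_fit),
    }
    print(name, results[name])
```

With these invented data, each participant’s own line fits that participant better than the pooled line does, but obtaining it requires a separate calibration session per person, which is the source of the researcher burden noted above.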
Another measurement device, the BodyMedia armband (formerly called the Sensewear armband; BodyMedia, Inc., Pittsburgh, PA), is a single monitor (worn on the upper arm) that records biaxial acceleration data as well as heat flux, galvanic skin response, skin temperature, and ambient temperature. The armband uses these variables, along with self-reported gender, age, height, and weight, to predict EE through proprietary algorithms developed by BodyMedia. The armband was first validated by Jakicic et al. (Jakicic, Marcus et al. 2004) in 2004 for estimating EE from walking, stepping, and leg and arm ergometry in 40 adults; their results indicate that the armband provided much better estimation of EE than a hip-mounted TriTrac accelerometer for these four exercise activities. Further research on the armband has validated its use for estimating exercise and free-living EE in many populations, including children (Arvidsson, Slinde et al. 2007), younger and older adults (Welk, McClain et al. 2007; Heiermann, Khalaj Hedayati et al. 2011), pregnant women (Berntsen, Stafne et al. 2011), and diseased or obese individuals (Mignault, St-Onge et al. 2005; Papazoglou, Augello et al. 2006; Dwyer, Alison et al. 2009). In studies comparing the armband to traditional accelerometers for EE measurement, the armband frequently performs similarly to hip-mounted accelerometers whose data are analyzed with linear regression models (Jakicic, Marcus et al. 2004; Welk, McClain et al. 2007; Berntsen, Hageberg et al. 2010; Colbert, Matthews et al. 2011), although some research indicates reductions in error by as much as 20% using the armband (Lee, Kim et al. 2014). Therefore, it is possible that the addition of physiologic measures improves estimates of EE. Additionally, the armband’s skin temperature and heat flux sensors help to verify time the monitor is actually being worn (wear time), which is an important issue in accelerometry-based measurement (Masse, Fuemmeler et al.
2005; Evenson and Terry 2009). Despite these advantages, the armband has some key limitations that prevent it from being an optimal measurement tool. The armband’s primary limitation is that it estimates EE using BodyMedia’s proprietary algorithms. While proprietary algorithms can be useful to consumers and end users who want EE estimation or time spent in MVPA without needing to develop their own prediction model, proprietary algorithms hinder scientific progress because they do not allow researchers transparency as to how EE is being predicted or which input variables are most useful for EE prediction. Without this knowledge, it becomes very difficult to identify armband strengths and limitations or identify variables that might be used to further improve EE measurement. Additionally, BodyMedia constantly refines its prediction algorithms to improve EE measurement, but without knowing how the algorithms work or which variables are most important, it is very difficult to compare results obtained using the different algorithms, hindering generalizability or comparability of study results. Finally, the armband only provides estimates of EE and activity intensity, and its proprietary data analysis prohibits researchers from accessing the raw data in order to be able to use newer data processing techniques to determine activity type. Another multi-sensor method researchers have studied is to use multiple accelerometers positioned on different parts of the body to improve EE measurement and activity classification. When creating their linear regression model for measuring EE of lifestyle activities (discussed earlier), Swartz et al. (Swartz, Strath et al. 2000) also had participants wear a CSA on the wrist to determine if using acceleration information from the hip and wrist locations simultaneously could improve EE measurement.
Compared to the correlation of r=0.56 for the hip regression equation, the combination of hip and wrist acceleration improved the correlation only minimally, to r=0.59. The minimal improvement seen in this study does not seem worth the added burden on participants (due to compliance issues) or researchers (for the added data to be analyzed). More recently, researchers have built and tested systems of accelerometers, where each accelerometer has a wired or wireless link to a central unit, allowing the unit to process data from the accelerometers simultaneously. A complete comparison of the systems can be found in Table 2.1. One example of this is the Intelligent Device for Energy Expenditure and Activity (IDEEA; MiniSun, Fresno, CA). Produced in the early 2000s, it is a system of five accelerometers (worn on both feet, both thighs, and the chest) that are wired to a processing unit worn on the hip. The sensors are taped to the skin, and the wires are to be worn underneath clothing to minimize risk of breaking. Data collected from the IDEEA monitor are processed via proprietary algorithms and are used to predict EE, activity type, activity duration and intensity, and activity speed (for walking and running). Validation studies have shown 98.7% accuracy for classifying 32 activities and postures (Zhang, Werner et al. 2003) and a correlation of r=0.973 for measuring EE during a simulated free-living setting (Zhang, Pi-Sunyer et al. 2004). Its high accuracy for measuring both activity type and EE makes it ideal as a criterion measure in short-term, free-living studies (Welk, McClain et al. 2007; Gyllensten and Bonomi 2011). However, the IDEEA’s limited battery life (48 hours), excessive participant burden, fragile design, proprietary algorithms for predicting its outcome variables, and high cost ($5,000 per unit) prohibit its use as a measurement tool for large, free-living studies. Since the creation of the IDEEA, another system has been developed by Tapia et al.
that uses five wireless sensors (placed on the right ankle, thigh, hip, wrist and upper arm), a heart rate monitor, and open-source data analysis to overcome many of the shortcomings of the IDEEA monitor. Using machine learning algorithms, Tapia et al. were able to classify 30 activities and postures with 56.3% accuracy with only accelerometer data and 58.4% using accelerometer and HR data (Tapia, Intillie et al. 2007). While their classification accuracy was much lower than that of the IDEEA system, the activities that Tapia et al. used were more similar to each other than the activities in the IDEEA validation, and the activities Tapia’s system misclassified were often different intensity levels of a given activity (e.g., cycling hard intensity at 30 rpm vs. cycling moderate intensity at 30 rpm). Importantly, Tapia’s study indicates that HR may have little value for classifying activity type, especially when using advanced processing techniques for accelerometer data. In a similar study, Dong et al. developed a wireless system of three accelerometers (worn on the right ankle, thigh, and wrist) for measurement of activity type and EE. In a validation of the system, they found that activity classification accuracy for 14 activities was 71.3-78.3% using only one accelerometer (with the thigh providing the best classification accuracy and the wrist providing the lowest) but improved to 89.6-96.2% using two accelerometers (with the ankle and wrist combination providing the highest classification accuracy) and only improved slightly (to 97.0%) using all three accelerometers (Dong, Montoye et al. 2013). The improvement in classification accuracy of this system compared to Tapia’s may be due to using fewer activities in the validation, but it may also be due to differences in the machine learning approach and the features used as input variables.
Despite dramatic improvements in activity type classification accuracy achieved when using multiple accelerometers, preliminary analyses with the system developed by Dong et al. indicate that use of data from all three accelerometers provides only minimal improvement over use of a single monitor for EE measurement (Dong, Biswas et al. 2013; Montoye, Dong et al. 2013). Overall, inclusion of physiologic variables and/or additional accelerometers appears to improve activity type classification and possibly EE measurement, but it is unclear which variables or monitor locations are most useful to be included. Inclusion of HR does not appear to improve activity classification (Tapia, Intillie et al. 2007). Also, using two or more accelerometers can markedly improve accuracy of activity classification (Zhang, Werner et al. 2003; Dong, Montoye et al. 2013) but may not be as useful for EE measurement (Metcalf, Curnow et al. 2002; Zhang, Pi-Sunyer et al. 2004; Dong, Biswas et al. 2013; Montoye, Dong et al. 2013). However, the added burden of measuring additional variables restricts the use of these technologies and methods to small, short-duration studies. To help ensure high compliance rates, reduce both researcher and participant burden, and allow for accurate measurement in large studies, development of accurate measurement techniques for single accelerometers is needed.

Table 2.1. Comparison of wireless accelerometer systems for activity classification accuracy and EE prediction accuracy.

Study | Participant characteristics | Placement of monitors | Number and types of activities | Activity classification accuracy | EE prediction accuracy
Dong et al. (Dong, Montoye et al. 2013) | 40 adults | Right wrist, thigh, and ankle | 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Laboratory-based protocol. | All 3 monitors: 97.0%; Ankle and wrist: 96.2%; Thigh and wrist: 91.0%; Ankle and thigh: 89.6%; Thigh: 78.3%; Ankle: 78.3%; Wrist: 71.5% | N/A
Tapia et al. (Tapia, Intillie et al. 2007) | 21 adults | Accelerometers: Right wrist, upper arm, thigh, hip, and ankle; HR monitor: Chest | 30 gymnasium activities (13 distinct, 17 variations). Laboratory-based protocol. | Without HR: 56.3%; With HR: 58.4% | N/A
Zhang et al. (Zhang, Werner et al. 2003) | 68 adults | IDEEA: Right and left foot, right and left thigh, chest | 32 activities (5 distinct, 22 variations, and 5 limb movements). Combination of laboratory-based and simulated free-living protocols. | 98.7% | N/A
(a) Dong et al. (Dong, Biswas et al. 2013); (b) Montoye et al. (Montoye, Dong et al. 2013) | 25 adults | Right wrist, thigh, and ankle | 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Simulated free-living protocol. | N/A | (a) 3-monitor system similar to or better than hip for 10 of 14 activities; (b) Correlations (r): 3-monitor system: 0.81; Thigh: 0.80; Ankle: 0.79; Wrist: 0.74. RMSE (METs): 3-monitor system: 1.61; Thigh: 1.61; Ankle: 1.69; Wrist: 1.85
Zhang et al. (Zhang, Pi-Sunyer et al. 2004) | 37 adults | IDEEA: Right and left foot, right and left thigh, chest | Lab-based: 11 activities (5 distinct, 6 variations). Simulated free-living: 2 required (walking and running), and the rest were left up to participant | N/A | Lab-based: 98.9%; Simulated free-living: 95.1%

Accelerometer placement

Accelerometers can be placed anywhere on the body in order to record movement of the head, limbs, torso, etc. From their first use in PA measurement, accelerometers were placed on the hip to measure whole-body movement, and preliminary studies showed the hip placement to have higher correlations with measured EE compared to wrist or ankle placements (LaPorte, Kuller et al. 1979; Montoye, Washburn et al.
1983). Additionally, validation and cross-validation studies (Freedson, Melanson et al. 1998; Lyden, Kozey et al. 2011; Sasaki, John et al. 2011) show high correlations of hip-mounted accelerometer counts to EE during ambulatory activities, which comprise a high percentage of the types of PA in which people engage (Ham, Kruger et al. 2009). Despite the common use of hip-mounted accelerometers for measuring PA, there are many notable limitations associated with their use. First and foremost, when count-based regression equations are used, hip-mounted accelerometers dramatically underestimate the EE associated with lifestyle activities while overestimating the EE cost of SB (Swartz, Strath et al. 2000; Crouter, Churilla et al. 2006; Rothney, Schaefer et al. 2008; Lyden, Kozey et al. 2011). Even with sophisticated machine learning techniques, hip-mounted accelerometers still cannot accurately classify time spent in SB or SB type (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011; Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Additionally, the newest ActiGraph GT3X+ accelerometer, which was built with an inclinometer (to improve detection of posture and differentiate among lying, sitting, standing, and movement), still frequently misclassifies SB type (Kozey-Keadle, Libertine et al. 2011; Carr and Mahar 2012; Hanggi, Phillips et al. 2012; Lyden, Kozey Keadle et al. 2012). Given the similar angle of the hip for sitting and standing (Parkka, Ermes et al. 2006; De Vries, Garre et al. 2011), it is not surprising that these studies have found frequent misclassification of sitting as standing (and vice versa). Another significant limitation of hip-mounted accelerometers is that it is not clear if they can be used effectively for measuring activity in pregnant or obese individuals.
In both pregnancy and obesity, hip-mounted accelerometers experience severe tilt, which changes the orientation of the accelerometer and alters the accelerations being measured, significantly lowering their accuracy for PA measurement (Shepherd, Toloza et al. 1999; Feito, Bassett et al. 2011; DiNallo, Downs et al. 2012). Additionally, when worn on the hip, accelerometers must be secured using a waist band, which may be uncomfortable for obese or pregnant individuals. To support this point, Harrison et al. (Harrison, Thompson et al. 2011) asked pregnant women at 26-28 weeks gestation to wear a pedometer and accelerometer for one week to measure free-living PA. Despite their stated efforts to minimize tilt angle and maximize comfort, 37% of their sample did not meet the minimum wear time requirements, and the authors attributed this in part to lack of comfort wearing the waist band to hold the accelerometer. Clearly, despite the widespread use of hip-mounted accelerometers, the hip is far from perfect as a placement site for measuring EE, SB, or activity type. Recently, researchers have renewed efforts to find alternate accelerometer locations for EE measurement and activity classification. Some locations have included the lower back, chest, wrist, ankle, and thigh. The strengths and weaknesses of each will be discussed in the following text. For a summary of current findings of accelerometer performance for different body locations, please see Table 2.2. First, the lower back and chest locations share many advantages and disadvantages of the hip location. Since these three locations are on the torso, they all measure total body movement and are minimally affected by erratic movements of the limbs, which can lower accuracy of EE prediction and hurt classification of activity type (Rosenberger, Haskell et al. 2013). 
Additionally, the chest location may be appealing in some contexts because a monitor there is worn under clothing and can easily be implemented in a device that also measures heart rate, allowing for measurement of multiple physiologic variables with a single device (Brage, Brage et al. 2005). Similarly, the lower back and chest have the advantage of being placed at the midline of the body (as opposed to the left or right sides), which removes any difficulties with discrepancies that can occur when monitors are worn on dominant vs. non-dominant sides of the body (Nichols, Morgan et al. 1999; Trost, McIver et al. 2005). However, the lower back and chest locations also suffer similar limitations as the hip in their poor measurement of SB and certain lifestyle activities (e.g., household chores or cycling) and their lack of feasibility for continuous wear. A popular accelerometer location in recent years is the wrist. Wrist-mounted accelerometers are appealing because they can be worn like a watch, attracting minimal attention and enhancing comfort. Also, wrist-worn accelerometers allow for continuous wear (assuming the monitor is waterproof), which is likely to increase compliance. Within the last few years, the National Health and Nutrition Examination Survey (NHANES) in the US and the Biobank study in the UK switched from hip-mounted to wrist-mounted accelerometers in the hope of improving compliance, which has been a significant issue in their surveillance efforts (UBCC 2009). Preliminary data from the latest NHANES cycle have indicated that compliance may be slightly improved, with average wear-time almost an hour longer per participant (Troiano and McClain 2012), lending support to the wrist as a viable location for large-scale studies measuring EE and some types of activity. Moreover, wrist-mounted accelerometers have long been recognized for their utility as objective measures of sleep (Kripke, Mullaney et al. 1978; Mullaney, Kripke et al.
1980) and have very high validity for measuring total sleep time and sleep quality (Jean-Louis, Kripke et al. 2001); therefore, wrist-worn accelerometers may allow 24-hour measurement of EE, activity type, and sleep (Webster, Kripke et al. 1982). Additionally, while regression approaches for wrist accelerometers have yielded lower accuracy than hip accelerometers (Montoye, Washburn et al. 1983; Swartz, Strath et al. 2000), machine learning techniques have dramatically improved the utility of wrist-worn accelerometers for measuring EE and activity type. Mannini et al. (Mannini, Intille et al. 2013) found that machine learning algorithms developed from a wrist-worn accelerometer classified 26 activities into four activity categories with about 84% accuracy, which is only slightly lower than the classification accuracies of algorithms developed from single, hip-mounted accelerometers (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011; Trost, Wong et al. 2012). Additionally, Montoye et al. (Montoye, Dong et al. 2013) found in preliminary analyses that the wrist accelerometer could achieve high correlations (r=0.71) when predicting EE in a simulated free-living environment, suggesting that use of machine learning could allow accurate EE measurement using wrist accelerometers. However, there are also a number of studies where direct comparisons of the hip and wrist show that the hip has higher accuracy for EE prediction, activity type classification, and SB measurement (Zhang, Rowlands et al. 2012; Rosenberger, Haskell et al. 2013). In a study by Rosenberger et al. (Rosenberger, Haskell et al. 2013), participants performed 20 activities while wearing wrist- and hip-mounted accelerometers and a portable metabolic analyzer (for a criterion measure of EE). The algorithms they created for the hip accelerometer had better sensitivity and specificity for SB (71% and 96% vs. 53% and 76%) and MVPA (70% and 83% vs.
30% and 69%) measurement compared with the wrist accelerometer, and their algorithms for EE had lower errors (0.55 vs. 0.82 METs) and higher correlations (r=0.72 vs. r=0.36) with the hip accelerometer than the wrist monitor. They attributed the superiority of the hip location for EE measurement and SB and MVPA classification to the fact that the trunk of the body requires more energy to move, so measurements of trunk movement represent the contraction of larger muscle masses. For SB classification, the authors postulated that SB can have significant variability in arm movement (e.g., working on the computer or driving vs. lying), diminishing the ability of a wrist-worn accelerometer to detect differences between these behaviors and lifestyle activities involving intermittent whole-body movement (e.g., sweeping or washing dishes). Despite some potential drawbacks of the wrist-worn accelerometer, its potential to promote high compliance rates as well as 24-hour measurement of PA, SB, and sleep make it an attractive site for accelerometer placement. Similar to wrist-mounted accelerometers, machine learning algorithms developed for ankle-mounted accelerometers can provide good detection accuracy of ambulatory and lifestyle activities (Mannini, Intille et al. 2013), and ankle-mounted accelerometers have also been shown to accurately estimate walking/running speeds (Foster, Lanningham-Foster et al. 2005), especially when used with machine learning algorithms. In a direct comparison of wrist- and ankle-mounted accelerometers, a study by Mannini et al. found overall classification accuracies of 95% for the ankle-mounted accelerometer vs. 84.7% for the wrist-mounted accelerometer. Similarly, analyses by Montoye et al. (Montoye, Dong et al.
2013) found that in a free-living simulation, ankle-mounted accelerometers had significantly higher correlations with measured EE than wrist-mounted accelerometers and similar correlations to thigh-mounted accelerometers (r=0.79, 0.71, and 0.80 for the ankle, wrist, and thigh, respectively). However, Dong et al. (Dong, Montoye et al. 2013) found that a single ankle-mounted accelerometer was unable to detect differences between sitting and standing since monitor motion and orientation were similar for these activities, rendering the ankle accelerometer ineffective for accurate measurement of SB. Additionally, compliance is a significant limitation of ankle-worn accelerometers because they cannot be worn with high-top shoes or boots, and they look somewhat similar to police tethers (Mannini, Intille et al. 2013). Thus, ankle-mounted accelerometers may be useful as part of a multi-sensor system, but as a single unit they do not function well for SB measurement or activity classification and may have limited compliance.

A thigh-mounted accelerometer may represent a good compromise between the advantages and disadvantages of the previously discussed accelerometer placements. Placement of an accelerometer on the thigh should allow for good accuracy for prediction of EE as well as measurement of SB and activity classification. Since the thigh is closer to the center of the body than the wrist or ankle, an accelerometer on the thigh is likely to have higher validity for estimating EE than one on the wrist or ankle, especially given that the thigh placement allows for tracking of some of the largest muscle groups in the body (gluteal and quadriceps muscles). Moreover, similar to an ankle-mounted accelerometer, a thigh-mounted accelerometer should be able to accurately capture stepping motions, allowing for good activity type classification with ambulatory and lifestyle activities.
Finally, whereas time in SB, SB type, and breaks in SB are poorly detected with the previously discussed accelerometer placements, thigh angle differs between sitting and standing, making the differentiation between SB and non-sedentary activities relatively simple using monitor orientation (for differentiating sitting from standing) and acceleration data (for differentiating sitting from cycling).

Despite the theoretical superiority of a thigh-mounted accelerometer, this placement has received little research attention until recently. Initial opposition to the thigh location was not due to lack of utility; an early study by an engineering group (Veltink, Bussmann et al. 1996) showed that acceleration signals from a thigh accelerometer could be used to distinguish between sitting and standing and among stair use, walking, and cycling. However, the accelerometers used by this group were not available commercially, preventing their widespread use in PA measurement. At the time, the Caltrac, ActiGraph, and other commercially available accelerometers were much too large and heavy to be worn on the thigh, and their size would have made them unattractive to wear in free-living environments. Use of a thigh-mounted accelerometer was recently brought to prominence by the development of the activPAL accelerometer (PAL Technologies, Glasgow, Scotland), a small, thin accelerometer specifically designed to be mounted on the thigh using a small strap or a sticky patch. In numerous studies, the activPAL accelerometer has shown high accuracy for quantifying time in SB and breaks in SB and for classifying SB type (Grant, Ryan et al. 2006; Kozey-Keadle, Libertine et al. 2011; Aminian and Hinckson 2012; Lyden, Kozey Keadle et al. 2012), and it is frequently used as a criterion measure of free-living SB (Hart, Ainsworth et al. 2011; Lord, Chastin et al. 2011; Martin, McNeill et al. 2011).
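The orientation-based separation of sitting from standing described above can be illustrated with a minimal sketch. It assumes a triaxial accelerometer on the thigh whose x axis runs along the length of the thigh (a hypothetical axis convention) and an illustrative 45-degree cut-point. Neither assumption is taken from the studies cited here, and the code is not the activPAL algorithm.

```python
import math

def thigh_inclination_deg(ax, ay, az):
    """Angle (degrees) between the thigh's long axis and the vertical.

    Assumes the segment is static or moving slowly, so the mean acceleration
    vector (in g) approximates gravity, and assumes the x axis runs along
    the thigh (hypothetical convention; check the device documentation).
    """
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude == 0:
        raise ValueError("zero acceleration vector; inclination undefined")
    # Clamp to guard against rounding slightly above 1.
    cosine = min(1.0, abs(ax) / magnitude)
    return math.degrees(math.acos(cosine))

def classify_posture(ax, ay, az, cutpoint_deg=45.0):
    """Label a window as 'upright' (thigh near vertical) or 'sitting/lying'
    (thigh near horizontal). The 45-degree cut-point is illustrative only."""
    angle = thigh_inclination_deg(ax, ay, az)
    return "sitting/lying" if angle > cutpoint_deg else "upright"

# Standing: gravity falls mostly along the thigh's long axis.
standing = classify_posture(0.98, 0.05, 0.10)
# Sitting: the thigh is roughly horizontal, so gravity is off-axis.
sitting = classify_posture(0.10, 0.05, 0.98)
```

Orientation alone cannot separate sitting from seated cycling; as noted above, that distinction relies on the acceleration signal itself (for example, its variability within a window), which this sketch does not model.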
However, a major shortcoming of the activPAL is that it relies on proprietary software for the determination of lying/sitting, standing, and stepping. While the proprietary software makes the device user-friendly, it does not allow researchers to identify important aspects of movement or improve on the company’s algorithms to allow the activPAL to predict EE or PA type. Additionally, at $600+ per monitor, the activPAL is considerably more expensive than the $300 ActiGraph GT3X+ or $225 GENEActiv, which are both now as small as the activPAL and have the added advantages of being water resistant (ActiGraph) or waterproof (GENEA) and allowing for raw data recording and extraction. Although the activPAL is limited in its utility for predicting EE or classifying non-sedentary activity type, its development and validation provide a proof of concept for the utility of the thigh location for classifying activity type and measuring SB and EE.

Recently, Skotte et al. (Skotte, Korshoj et al. 2012) developed a novel method to use a thigh-mounted ActiGraph accelerometer for measuring a total of six sedentary, ambulatory, and lifestyle activities. Using a combination of accelerometer orientation and acceleration data, they were able to correctly classify the six activities with close to 99% accuracy in a simulated free-living protocol. Additionally, the thigh-mounted accelerometer had significantly better sensitivity and specificity (98.2% and 93.3%) for measuring free-living SB than the hip-mounted accelerometer (72.8% and 58.0%). Moreover, analysis of a single, thigh-mounted accelerometer from a wireless accelerometer system developed at Michigan State University showed 78.3% classification accuracy for 14 sedentary, ambulatory, lifestyle, and exercise activities (Dong, Montoye et al. 2013).
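Because sensitivity and specificity are used above to compare placements for SB measurement, a short sketch of how they are computed from epoch-level labels may be useful. The function name and the ten-epoch data set are made up for illustration and do not come from any cited study.

```python
def sensitivity_specificity(criterion, predicted):
    """Return (sensitivity, specificity) for binary sedentary labels.

    criterion, predicted: equal-length sequences of booleans, one per
    epoch, where True means the epoch is sedentary.
    """
    if len(criterion) != len(predicted):
        raise ValueError("sequences must have equal length")
    pairs = list(zip(criterion, predicted))
    tp = sum(1 for c, p in pairs if c and p)          # sedentary, detected
    fn = sum(1 for c, p in pairs if c and not p)      # sedentary, missed
    tn = sum(1 for c, p in pairs if not c and not p)  # active, detected
    fp = sum(1 for c, p in pairs if not c and p)      # active, called sedentary
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both sedentary and non-sedentary epochs")
    return tp / (tp + fn), tn / (tn + fp)

# Made-up example: ten one-minute epochs (True = sedentary).
observed = [True, True, True, True, False, False, False, False, False, False]
monitor = [True, True, True, False, False, False, False, False, True, False]
sens, spec = sensitivity_specificity(observed, monitor)
```

In this toy example, three of the four sedentary epochs are detected (sensitivity of 75%) and five of the six non-sedentary epochs are correctly rejected (specificity of about 83%).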
Additionally, preliminary analyses of EE measurement accuracy indicated that the thigh-mounted accelerometer achieved high correlations (r=0.80) with criterion-measured EE (Metcalf, Curnow et al. 2002). However, the accelerometer used in the wireless system is not commercially available and only measures two axes of acceleration data. Validation of a triaxial, commercially available accelerometer mounted on the thigh for classifying activity type and measuring SB and EE is a logical next step in determining the utility of the thigh location as a measurement site.

In conclusion, while it appears that no single accelerometer placement is ideal for all movements or all contexts, the thigh location may represent the best compromise between comfort and measurement accuracy. The hip is well researched and provides good estimation of total body movements, ambulatory activities, and EE. The wrist seems to have slightly lower accuracies for activity type and EE prediction, but the ability to record sleep measures and improve participant compliance rates makes the wrist appealing for large studies and total day recording. The thigh appears to be a good compromise between the hip and wrist locations. Since the thigh is very close to the torso, it is less affected by erratic limb movements than the wrist or ankle. Also, placement on the thigh is beneficial for detecting certain lifestyle and cycling activities and shows the greatest promise for accurate measurement of SB. Additionally, with the low-profile, water-resistant/waterproof designs of the ActiGraph and GENEA accelerometers, thigh-mounted accelerometers could be placed under clothing with a small strap or sticky patch, allowing for continuous wear with minimal discomfort.
We believe that by applying machine learning techniques to thigh-mounted accelerometer data, we can develop algorithms with better accuracy for classifying activity type and measuring EE and SB than can be achieved with hip- or wrist-mounted accelerometers.

Table 2.2. Comparison of different monitor placements for activity classification accuracy and EE prediction accuracy.

| Study | Participant characteristics | Placement of monitors | Number and types of activities | Activity classification accuracy | EE prediction accuracy |
|---|---|---|---|---|---|
| Dong et al. (Dong, Montoye et al. 2013) | 40 adults | Right wrist, thigh, and ankle | 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Lab-based protocol. | Thigh: 78.3%; Ankle: 78.3%; Wrist: 71.5% | N/A |
| Mannini et al. (Mannini, Intille et al. 2013) | 33 adults | Right wrist and ankle | 26 sedentary, cycling, ambulatory, and lifestyle activities (10 distinct, 16 variations) | Ankle: 95.0%; Wrist: 84.7% | N/A |
| Staudenmayer et al. (Staudenmayer, Pober et al. 2009) | 48 adults | Right hip | 20 activities (18 distinct, 2 variations) | 88.8% | RMSE (METs): ANN: 1.22; Linear regression: 1.51 – 2.09. Bias (METs): ANN: 0.05; Linear regression: -0.30 – -1.21 |
| Zhang et al. (Zhang, Rowlands et al. 2012) | 60 adults | Right hip, right wrist, and left wrist | 12 sedentary, lifestyle, and ambulatory activities (8 distinct, 4 variations). Combination of lab-based and simulated free-living protocol. | Hip: 99.1%; Right wrist: 97.0%; Left wrist: 95.9% | N/A |
| Skotte et al. (Skotte, Korshoj et al. 2012) | 17 adults | Right thigh | 6 sedentary, lifestyle, and ambulatory activities | Sensitivity and specificity were both 99% for activity discrimination | N/A |
| Montoye et al. (Montoye, Washburn et al. 1983) | 21 adults | Wrist hip and left wrist | 14 ambulatory and exercise activities (6 distinct, 8 variations). Lab-based protocol. | N/A | Reliability: Wrist: r = 0.74, Hip: r = 0.63. Standard error: Wrist: 7.9 ml/kg/min, Hip: 9.2 ml/kg/min |
| Montoye et al. (Montoye, Dong et al. 2013) | 27 adults | Right wrist, thigh, and ankle | 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Simulated free-living protocol. | N/A | Correlations (r): Thigh: 0.80; Ankle: 0.79; Wrist: 0.74. RMSE (METs): Thigh: 1.61; Ankle: 1.69; Wrist: 1.85 |
| Rosenberger et al. (Rosenberger, Haskell et al. 2013) | 37 adults | Dominant hip and wrist | 13 sedentary, lifestyle, cycling, and ambulatory activities. Combination of lab-based and simulated free-living protocol. | N/A | Correlations (r): Hip: r = 0.72; Wrist: r = 0.36 |
| Swartz et al. (Swartz, Strath et al. 2000) | 70 adults | Right hip and dominant wrist | 27 lifestyle, occupational, exercise, and ambulatory activities. Combination of lab-based and simulated free-living protocol. | N/A | Correlations (r): Wrist: r = 0.18; Hip: r = 0.56 |

Laboratory-based vs. free-living settings

Ultimately, measurement techniques need to be validated in a context similar to the setting in which they will be used. When a technique is first tested, activities are generally performed in a laboratory-based setting, where maximum control can be exerted over the timing and order of activities performed. In these laboratory-based studies, activities are usually performed in order of increasing intensity for at least 5-7 minutes, allowing participants to reach steady-state EE (where EE matches the demands of the activity). Additionally, the activities must be performed in a specific manner (i.e., walking/jogging speeds and cycling cadences are the same for all participants), so that there is minimal variability in the activities (Freedson, Melanson et al. 1998). Laboratory-based validation studies are a crucial first step in the testing of measurement devices because they provide a proof of concept that a given measurement device or method can work well in a highly controlled environment.
Additionally, highly valid criterion measures, such as metabolic analyzers for measuring EE and DO for determining activity type, are available for use in laboratory-based settings.

However, once measurement methods are validated in a laboratory, they must then be tested in a free-living environment since laboratory conditions are very different from the activities and settings that people encounter in their everyday lives. In free-living settings, people are seldom engaged in steady-state activities and do not normally perform activities for defined amounts of time, and there can be substantial variability within activity types. For example, free-living walking rarely occurs at a constant speed, and preferred walking speed can differ considerably among individuals. Additionally, treadmill and non-treadmill walking elicit different gait patterns (Dingwell, Cusumano et al. 2001), lowering the potential to generalize detection of treadmill walking to the detection of free-living walking. To support this point, Gyllensten and Bonomi (Gyllensten and Bonomi 2011) found that an ANN created in the laboratory using data from a single accelerometer (located on the lower back) had 94% accuracy for classifying five categories of activities, but this accuracy dropped to 75% when the ANN was used in a free-living setting (with IDEEA used as criterion). Additionally, Lyden et al. (Lyden, Keadle et al. 2013) found that an ANN created in the laboratory performed well in the laboratory but very poorly when applied to a free-living scenario, with biases of 33% and 73% when estimating MET-hours and minutes spent in MVPA, respectively. Therefore, they recreated their ANNs using free-living data. These findings have been further confirmed by other studies (Bao and Intille 2004; Ermes, Parkka et al. 2008; Crouter, Kuffel et al. 2010), providing strong evidence that laboratory validations must be applied to free-living settings with caution.
Therefore, it is important to incorporate aspects of a free-living environment into validation protocols so that results obtained can be applied to real-world situations. However, conducting validation studies in a true free-living environment is not feasible due to the lack of a suitable criterion measure of EE or activity type. Doubly-labeled water is a commonly used method for assessing free-living EE, but this method only works well for measuring total EE over a period of 1-2 weeks and cannot yield information about timing, type, duration, or intensity of PA. Therefore, doubly-labeled water cannot give an indication of how well a measurement method predicts EE for specific activities. Additionally, since most activity monitors cannot be worn continuously (i.e., must be removed for showering and sleeping), doubly-labeled water captures a significant amount of EE that is not recorded by the monitors, precluding a comparison of monitor output to total EE.

Indirect calorimetry measured using a portable metabolic analyzer has also been used as a criterion measure of field-based EE, but use of a metabolic analyzer can result in participant reactivity and does not allow participants to perform many normal activities. While portable metabolic analyzers allow participants to perform activities outside of a laboratory, the analyzer requires participants to wear a mask (for collecting data on expired gas volumes and concentrations). Therefore, participants cannot eat or drink while wearing the equipment. Additionally, participants must wear a shoulder harness with multiple pieces of equipment strapped to their backs, making lying or reclining uncomfortable and unnatural. Another potential criterion measure is the whole-room indirect calorimeter chamber, which can measure oxygen consumption (to estimate EE) without participants needing to wear any equipment.
While this setting allows participants to perform some activities as they would in a free-living setting, being confined to a small room is unnatural and necessitates the use of exercise machines (e.g., treadmills, stair steppers, cycle ergometers) to perform many lifestyle and ambulatory activities, making it a poor substitute for true free-living. Additionally, whole-room indirect calorimeter chambers are very expensive and are only located in a few laboratories around the country, making them difficult to access.

For the measurement and classification of activity type, DO is commonly used as a criterion method for measuring free-living activity. DO allows researchers to capture participants’ actions in the field (Santos-Lozano, Marin et al. 2012), but the act of being observed is likely to cause reactivity in participants (McKenzie 2002), reducing the generalizability of findings to a true free-living setting. Additionally, DO would have to be performed for a period of days or weeks to capture participants’ true activity patterns (Santos-Lozano, Marin et al. 2012), but this is simply not feasible in a research context and would pose a significant burden on participants and observers. Finally, it is important to validate measurement techniques with participants performing a variety of activities. In free-living settings, adults spend the majority of their time in SB and much less time in household, exercise, or sport activities. Thus, observing participants for a shorter period of time would likely result in a lack of variety in activities detected, hindering the ability of the measurement technique to classify important lifestyle, household, or exercise activities and limiting the utility of the measurement’s validation to only the population in which it was validated.
Clearly, both laboratory and free-living validation studies are subject to limitations, but a combination of the two, also called ‘simulated free-living,’ may be an optimal balance between the two settings. In simulated free-living, researchers can exert control over the types of activities and the minimum amount of time participants need to perform the activities, but the participants can choose the amount of time and order in which they perform the activities as well as the technique they use to perform each activity (e.g., not everyone walks at the same speed or sweeps the same way). Also, since simulated free-living allows many activities to be performed in a relatively short period of time, both DO and indirect calorimetry can be used as criterion measures of activity type and EE. Therefore, simulated free-living provides better generalizability to real-world conditions than strict laboratory-based protocols, but it does not face the limitations of trying to find an appropriate criterion measure for testing the measurement methods in a true free-living setting. Importantly, simulated free-living has shown promise in several recent validation studies of accelerometers (Sun, Schmidt et al. 2008; Rumo, Amft et al. 2011), providing a strong case for its use in the current study. Once the accelerometers and machine learning algorithms have been validated in a simulated free-living setting, they can then be used in true free-living settings with reasonable confidence in their accuracy.

Accelerometer reliability

In order to be used effectively for measurement of PA, EE, and SB, accelerometers must exhibit high intra- and inter-monitor reliability. Reliability of accelerometers has been assessed in two main ways: 1) laboratory studies where accelerometers are placed on mechanical shakers and 2) free-living studies where accelerometers are placed either next to each other or on opposite sides of the body (e.g., left vs. right hip).
This section will focus on reliability studies of the ActiGraph and GENEA accelerometers since these are the two accelerometers being used in the current study.

In laboratory studies using mechanical shakers, accelerometers generally exhibit very high intra- and inter-monitor reliability over the range of intensities encountered in most lifestyle activities (indicated by high intraclass correlations and low coefficients of variation [CVs]). The ActiGraph accelerometer has been tested extensively for intra- and inter-monitor reliability, with intra-monitor intraclass correlations ranging from 0.84 to 0.92, inter-monitor intraclass correlations from 0.71 to 0.99, and CVs ranging from 1% to 9% (Metcalf, Curnow et al. 2002; Brage, Wedderkopp et al. 2003; Esliger and Tremblay 2006; McClain, Sisson et al. 2007; Santos-Lozano, Marin et al. 2012; Santos-Lozano, Torres-Luque et al. 2012; Troiano and McClain 2012). However, Santos-Lozano et al. (Santos-Lozano, Marin et al. 2012) found that CVs increased considerably (both intra- and inter-monitor) at very high and very low intensities when on the shaker. The high CV at low intensities is not concerning since it is likely driven by the very low mean acceleration during low-intensity activities. Similarly, the poor CV achieved during high-intensity shaking is not particularly concerning for the current study since the ActiGraph placements will be on the hip and mid-thigh. However, the high CV at high intensities may be problematic for studies with the accelerometer placed on the wrist or ankle, where accelerations are much more rapid than those experienced at the hip. Importantly, a study by Brage et al. (Brage, Brage et al. 2003) discovered that raw acceleration data have better inter-monitor reliability than activity count data, lending further support to the use of raw acceleration data with machine learning algorithms.
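The argument above, that a high CV at low intensities is largely a consequence of a low mean, follows directly from the definition of the CV as the standard deviation divided by the mean. A minimal sketch with made-up monitor outputs (arbitrary units, not data from any cited study) illustrates this:

```python
import statistics

def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0

# Made-up outputs from five monitors on the same shaker. The absolute
# spread between monitors is identical in both conditions; only the
# mean output differs.
low_intensity = [8, 9, 10, 11, 12]
moderate_intensity = [998, 999, 1000, 1001, 1002]

cv_low = coefficient_of_variation(low_intensity)
cv_moderate = coefficient_of_variation(moderate_intensity)
```

With the same between-monitor standard deviation, the CV is about 15.8% at the low mean and about 0.16% at the higher mean, so a large CV at very low intensities does not by itself indicate poor agreement between monitors.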
In free-living settings, intra- and inter-monitor reliability can be assessed by putting monitors on the same body part on opposite sides of the body (e.g., left vs. right hip). McClain et al. (McClain, Sisson et al. 2007) tested the inter-monitor reliability by comparing outputs from ActiGraph accelerometers mounted on the left vs. right hips and found an intraclass correlation of 0.99 and CV of 4.9% when measuring MVPA, providing evidence of the real-world reliability of the ActiGraph. McClain’s work has been supported by several other studies comparing accelerometers placed on the right and left hips (Brage, Wedderkopp et al. 2003; Vanhelst, Baquet et al. 2012). Additionally, Welk et al. (Welk 2002) conducted a study in which participants performed repeated walking and running trials on a treadmill while wearing only one monitor at a time and found that intra-monitor CVs were very similar to inter-monitor CVs. They postulated that differences seen in the accelerometer output were likely due to slight differences in monitor placement rather than variation in the accelerometers themselves, providing further evidence that the ActiGraph has good inter- and intra-monitor reliability in the free-living environment.

Additionally, although the GENEA accelerometer is relatively new, one study by Esliger et al. (Esliger, Rowlands et al. 2011) has evaluated the reliability of the GENEA in the laboratory and indirectly in a field-based setting. Using a mechanical shaker, they found an intra-monitor CV of 1.4% and an inter-monitor CV of 2.1% when assessing 47 monitors at 15 different shaker speeds. Also, in a free-living setting, the GENEA accelerometers worn on the left and right wrists both showed excellent validity (r=0.83-0.86) for estimating VO2 (Esliger, Rowlands et al. 2011), providing preliminary evidence of the reliability of the GENEA in both laboratory and free-living settings.
In summary, reliability studies performed in laboratory and free-living settings indicate that the ActiGraph and GENEA exhibit good intra- and inter-monitor reliability for measuring MVPA as well as raw accelerations, supporting their use in the current study.

Identifying non-wear

Determining when accelerometers are being worn vs. when they are removed is very important for calculating daily PA and SB. Logs or diaries can be used to help determine wear-time, but these are not ideal for large studies since they are subject to error in recording and increase participant and researcher burden. When establishing wear-time from accelerometer data, there are several criteria that must be addressed: non-wear vs. SB, minimum hours/day of wear, and minimum days/week of wear.

The first difficulty in identifying wear-time is distinguishing between non-wear and SB, both of which often result in accelerometers registering zero counts/min. Many data reduction methods have been created in order to identify and remove accelerometer non-wear time by setting a minimum amount of time with continuous zero counts/min. This minimum time has been set anywhere from 10 minutes (Riddoch, Bo Andersen et al. 2004) to 90 minutes (Choi, Liu et al. 2011), and there is no consensus on the optimal duration of continuous zero counts for determining non-wear.

Continuous wear and implementation of machine learning algorithms using raw data can help deal with non-wear more effectively. Conventionally, hip-mounted accelerometers were worn during waking hours and removed at night and before performing water-based activities (Welk 2002). Frequent removal of accelerometers is likely to lower compliance when participants forget to put the monitors back on in the morning or after swimming or showering (Kinder, Lee et al. 2012), and the data collected with the accelerometers during times of non-wear do not reflect actual activity levels.
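The count-based non-wear rule described above (a minimum run of consecutive zero counts/min) can be sketched in a few lines. This is a simplified illustration rather than any published algorithm: it makes no allowance for brief interruptions within a run of zeros, and the function name and the example record are made up.

```python
def flag_nonwear(counts_per_min, min_zero_run=60):
    """Return a list of booleans (True = non-wear), one per minute.

    A minute is flagged as non-wear when it belongs to a run of at least
    `min_zero_run` consecutive minutes with zero counts.
    """
    n = len(counts_per_min)
    nonwear = [False] * n
    run_start = None
    # Iterate one step past the end so a trailing run of zeros is closed.
    for i in range(n + 1):
        is_zero = i < n and counts_per_min[i] == 0
        if is_zero and run_start is None:
            run_start = i
        elif not is_zero and run_start is not None:
            if i - run_start >= min_zero_run:
                for j in range(run_start, i):
                    nonwear[j] = True
            run_start = None
    return nonwear

# Made-up record: 30 min of activity, 20 min of zeros (e.g., quiet sitting),
# 10 min of activity, then 90 min of zeros (e.g., monitor removed).
counts = [150] * 30 + [0] * 20 + [200] * 10 + [0] * 90

nonwear_10 = sum(flag_nonwear(counts, min_zero_run=10))
nonwear_60 = sum(flag_nonwear(counts, min_zero_run=60))
nonwear_90 = sum(flag_nonwear(counts, min_zero_run=90))
```

With a 10-minute rule, both runs of zeros (110 minutes in total) are discarded as non-wear, including the 20 minutes that may have been quiet sitting; with a 60- or 90-minute rule, only the 90-minute run is discarded. The same record therefore yields different estimates of sedentary time depending on the rule chosen.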
Hip-mounted accelerometers have been worn continuously in several studies (Hjorth, Chaput et al. 2012; Kinder, Lee et al. 2012), but having an accelerometer protruding from the hip could be uncomfortable during sleep and could lower participant compliance. Accelerometer placement on the wrist or thigh allows for continuous wear, giving these sites an advantage over the hip. As previously mentioned, wrist-worn accelerometers can be worn continuously and have improved compliance in NHANES data collection (Troiano and McClain 2012). Additionally, studies using the activPAL accelerometer (Hart, Ainsworth et al. 2011; Lord, Chastin et al. 2011; Martin, McNeill et al. 2011) indicate that thigh-mounted accelerometers can be worn continuously with minimal subject discomfort (Craft, Zderic et al. 2012; Feito, Bassett et al. 2012).

Choice of accelerometer may allow or preclude continuous wear. While the newest ActiGraph models are said to be waterproof, the GT3X+ and all older models are water-resistant at best, and the company recommends their removal for water-based activities (ActiGraph 2013). Therefore, use of protective sleeves or barriers is necessary to allow for continuous wear. Conversely, GENEA accelerometers are waterproof and can be worn 24 hours/day. Additionally, GENEA accelerometers contain a skin temperature sensor, which can help with the determination of wear-time and remove the need for the data reduction techniques described above. Therefore, GENEA accelerometers do not need to be removed for any reason during data collection, and if they are, the temperature sensor will help to determine exact wear-time, making them well suited for use in free-living settings. Moreover, machine learning algorithms are designed for pattern recognition and, with proper development and use of raw data, should be able to recognize non-wear as distinct from SB.
The acceleration and monitor orientation signals when an accelerometer is not being worn are likely very different from those recorded when the monitor is being worn during SB because when someone is engaged in SB, even the smallest movements will be detected by the accelerometer, allowing differentiation of SB from non-wear. Therefore, when developing machine learning algorithms, it is important to include non-wear as an activity so that the algorithms can detect non-wear as distinct from SB.

Additionally, there is a lack of consensus on the minimum amount of time per day a monitor must be worn in order to yield an accurate reflection of someone’s daily activity patterns, with minimum wear-time ranging from two hours/day (Brownson, Hoehner et al. 2009) to 16 hours/day (Slootmaker, Schuit et al. 2009), although most studies require a minimum of 8-12 hours/day (Masse, Fuemmeler et al. 2005). Finally, the minimum number of days of valid data needed for an accurate measure of true PA levels has ranged from one day (Le Masurier, Sidman et al. 2003) to seven days (Matthews, Ainsworth et al. 2002), with most studies using 3-4 as a minimum number of valid days (Trost, McIver et al. 2005). The choice of the minimum number of continuous zeroes for non-wear, the minimum number of hours/day of accelerometer wear, and the minimum number of days of wear can all significantly affect the results of subsequent analyses regarding total PA or SB and activity patterns (Trost, McIver et al. 2005; Evenson and Terry 2009; Oliver, Badland et al. 2011; Herrmann, Barreira et al. 2012). We expect that a more accurate way of recognizing non-wear may help to advance discussion on these compliance issues. It is important to note that these data reduction rules have been established with the intent of achieving a certain test-retest reliability (usually r=0.80-0.90) when using linear regression approaches for measuring PA and/or EE (Welk 2002).
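To make the effect of these wear-time criteria concrete, the following sketch applies a minimum hours/day rule and a minimum valid-days rule to one week of wear-time. The function names, default thresholds, and the example week are hypothetical choices within the ranges discussed above, not values prescribed by any cited study.

```python
def valid_days(wear_minutes_by_day, min_hours_per_day=10):
    """Return the days whose wear-time meets the daily minimum."""
    threshold = min_hours_per_day * 60
    return [day for day, minutes in wear_minutes_by_day.items()
            if minutes >= threshold]

def is_valid_participant(wear_minutes_by_day, min_hours_per_day=10, min_days=4):
    """True if enough days meet the daily wear-time minimum."""
    n_valid = len(valid_days(wear_minutes_by_day, min_hours_per_day))
    return n_valid >= min_days

# Made-up wear-time (minutes/day) for one participant over a week.
week = {"Mon": 840, "Tue": 610, "Wed": 450, "Thu": 720,
        "Fri": 590, "Sat": 300, "Sun": 660}

lenient = is_valid_participant(week, min_hours_per_day=8, min_days=3)
strict = is_valid_participant(week, min_hours_per_day=12, min_days=4)
```

The same participant is retained under the lenient rule (five days with at least 8 hours of wear) and excluded under the strict rule (only two days with at least 12 hours), which illustrates how the choice of criteria can change the analytic sample.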
With improvements in accelerometer technology, continuous wear of monitors, and machine learning techniques for data processing and analysis, these reduction rules may no longer apply. However, this issue lies outside the scope of the current study.

Summary of current evidence and future directions

In conclusion, there is considerable evidence linking both low PA and high SB to poor cardiometabolic health. However, without improvements in the measurement of PA and SB along with accurate determination of activity type, we will be limited in our ability to detect the true risks of SB or monitor the effectiveness of interventions for reducing SB. Machine learning techniques show great potential to improve measurement of SB as well as EE and classification of activity type, but their current complexity may prevent wide adoption by PA researchers. The current study aims to develop ANN algorithms for hip-, wrist-, and thigh-mounted accelerometers using simple-to-compute features from the accelerometer data and freely available software to allow for relatively simple creation and testing of the ANNs. Our study will directly compare the accuracy of the hip-, wrist-, and thigh-mounted accelerometers to measure EE, SB, and activity type in a simulated free-living setting.

CHAPTER 3

VALIDATION AND COMPARISON OF ACCELEROMETERS LOCATED ON THE WRISTS, HIP, AND THIGH FOR FREE-LIVING ENERGY EXPENDITURE PREDICTION

ABSTRACT

The purpose of this study was to develop, validate, and compare energy expenditure prediction models for accelerometers placed on the wrists, hip, and thigh. A secondary purpose was to achieve high measurement accuracy using simple accelerometer features as input variables in energy expenditure prediction models. METHODS: Forty-four healthy adults participated in a 90-minute simulated free-living activity protocol. During the protocol, participants engaged in a total of 14 different sedentary, ambulatory, lifestyle, and exercise activities for 3-10 minutes each.
Participants chose the order, duration, and intensity of activities. Four accelerometers were worn (right and left wrists, right hip, and right thigh), and energy expenditure predicted from each was compared with that measured by the criterion measure (portable metabolic analyzer). Artificial neural networks were created to predict energy expenditure from each accelerometer using a leave-one-out cross-validation approach. Accuracy of the neural networks was evaluated using Pearson correlations, root mean square error, and bias. Several models were developed using different input features in order to determine those most relevant for use in the models. RESULTS: All four accelerometers achieved high measurement accuracy, with correlations >0.80 for predicting energy expenditure. The thigh accelerometer provided the highest overall accuracy (r=0.89) and lowest root mean square error (1.05 METs), and the differences between the thigh and the other monitors were more pronounced when fewer input variables were used in the predictive models. None of the predictive models had an overall bias for estimation of energy expenditure. CONCLUSIONS: A single accelerometer placed on the thigh provided the highest accuracy for energy expenditure prediction, although monitors worn on the wrists or hip can also be used with high measurement accuracy.

INTRODUCTION

Physical activity (PA) has long been recognized for its beneficial effects on many aspects of health. Because of these known health benefits, the most recent PA guidelines advocate that adults obtain a minimum of 150 min/week of moderate-intensity PA, 75 min/week of vigorous-intensity PA, or a combination of the two (PAGAC 2008).
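The guideline can be expressed as a simple check on weekly minutes. The sketch below rests on one assumption that is not stated in the text above: for the combination option, one minute of vigorous-intensity PA is counted as two minutes of moderate-intensity PA, a commonly used equivalence. The function name is illustrative.

```python
def meets_aerobic_guideline(moderate_min_per_week, vigorous_min_per_week):
    """Check weekly minutes against the 150/75 min/week recommendation.

    Assumption: for the 'combination' option, 1 minute of vigorous-intensity
    PA counts as 2 minutes of moderate-intensity PA.
    """
    if moderate_min_per_week < 0 or vigorous_min_per_week < 0:
        raise ValueError("minutes must be non-negative")
    moderate_equivalent = moderate_min_per_week + 2 * vigorous_min_per_week
    return moderate_equivalent >= 150

only_moderate = meets_aerobic_guideline(150, 0)   # 150 moderate minutes
only_vigorous = meets_aerobic_guideline(0, 75)    # 75 vigorous minutes
combination = meets_aerobic_guideline(90, 30)     # 90 + 2 * 30 = 150
shortfall = meets_aerobic_guideline(100, 20)      # 100 + 2 * 20 = 140
```

Because the check depends on minutes accumulated at each intensity, any error in classifying intensity from predicted EE carries over directly into estimates of who meets the recommendation.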
Moderate- and vigorous-intensity PA can be defined according to the energy expenditure (EE) they elicit: moderate-intensity PA is any activity that elicits an EE of at least 3.0 times, but less than 6.0 times, the resting level (METs), and vigorous-intensity PA is any activity that elicits at least 6.0 METs. Accurate measurement of EE is vital for understanding prevalence of meeting PA recommendations, identifying populations who may benefit from interventions aimed at increasing PA, and better understanding the relationship between PA and health.

Objective PA measurement tools such as activity monitors have shown considerable promise due to their relative ease of use and accurate measurement of PA for days or weeks at a time (Welk 2002). Accelerometer-based activity monitors in particular have seen dramatically increased use for measurement of free-living PA. Accelerometers are generally worn on the hip and record accelerations of the trunk as a person moves. These accelerations have traditionally been used as an independent variable in linear regression equations to estimate EE. Linear regression approaches to prediction of EE are appealing due to their simplicity and their high accuracy in initial validation studies, which focused on measuring the EE of ambulatory activities (i.e., walking and running) in controlled settings (Freedson, Melanson et al. 1998). However, the linear relationship between accelerations and EE does not seem to hold when applied to non-ambulatory activities or free-living environments, resulting in much poorer prediction accuracy in such situations (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000).

To overcome these limitations, researchers have explored several avenues to improve PA measurement. One approach involves the use of more than one monitoring device to measure accelerations and/or other physiologic variables (e.g., heart rate) to improve EE measurement.
Use of multi-monitor systems has shown promise for improving EE measurement in several studies (Zhang, Pi-Sunyer et al. 2004; Albinali, Intille et al. 2010; Dong, Biswas et al. 2013), but the use of multiple monitors dramatically increases participant and researcher burden, preventing these methods from being feasible for use in large surveillance, intervention, or epidemiologic studies.

Another approach to improving EE prediction has involved using techniques other than linear regression for modeling the relationship between acceleration data and EE. Machine learning, a branch of artificial intelligence, has become a popular modeling technique and has been shown to improve EE measurement in both laboratory-based and free-living settings (Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011). However, there are still many unresolved questions regarding use of machine learning for predicting EE.

First, machine learning modeling may allow for accurate prediction of EE using accelerometers placed on body locations other than the hip (e.g., wrist, ankle, and thigh), but it is unclear if accelerometers placed on alternate body locations can achieve the same measurement accuracy as a hip-mounted accelerometer. The wrist is an appealing accelerometer placement site due to its utility in measuring sleep and activity type as well as ease of wear (Kripke, Mullaney et al. 1978; Jean-Louis, Kripke et al. 2001; Zhang, Rowlands et al. 2012; Mannini, Intille et al. 2013). Additionally, accelerometers worn on the thigh have shown high accuracy for measuring ambulatory activity and sedentary behavior (Grant, Ryan et al. 2006; Ryan, Grant et al. 2006). Despite the potential for the wrist and thigh as measurement sites, there is very limited evidence regarding their utility for measuring EE.
Second, a current limitation of using machine learning to model accelerometer data is that machine learning models are much more complex than traditional linear regression approaches, both in the extraction of useful information (features) from accelerometer data to use as inputs into the models and in the model creation itself. This complexity currently keeps machine learning from being used on a wider scale. However, there is some evidence that the process of developing and using machine learning can be simplified without compromising measurement accuracy. Staudenmayer et al. (Staudenmayer, Pober et al. 2009) took a large step toward simplifying the use of machine learning modeling. They used the R statistical software (a freely available, open-source software package) to develop a specific type of machine learning model (an artificial neural network [ANN]) to predict EE and activity type. Additionally, they used simple, time-domain features (percentiles of the acceleration signal and autocorrelation) as input variables and achieved dramatically improved EE estimations over linear regression approaches. However, it is unknown whether the features they used as input variables in their models represent an optimal set of input variables for maximizing EE prediction accuracy.

Third, most validation studies are carried out in laboratory-based settings, which allows for good control of type, duration, and intensity of activities performed. However, there is considerable evidence that laboratory-based validation techniques have considerably lower accuracy when applied to free-living situations (Swartz, Strath et al. 2000; Crouter and Bassett 2008; Lyden, Keadle et al. 2013).
The purpose of this study was fourfold: 1) to validate models for estimation of EE from accelerometers worn on the wrists, hip, and thigh in a simulated free-living setting, 2) to compare the accuracy of EE prediction for accelerometers located on the wrist, thigh, and hip, 3) to compare accuracies achieved by the left and right wrists, and 4) to compare different input features to determine an optimal set of simple input features that maximizes prediction accuracy while minimizing complexity of the machine learning technique.

METHODS

Summary of protocol

Participants were brought into the Human Energy Research Laboratory to participate in a 90-minute simulated free-living protocol. For the protocol, participants performed 14 activities for between 3-10 minutes, with order and duration of activities left up to participants. During the protocol, participants wore a portable metabolic analyzer (for a criterion measure of EE) and four accelerometers.

Participants

A total of 44 adults (22 male, 22 female) were recruited from the area of East Lansing, MI via email, flyers, and word of mouth for participation in this study. Participants were excluded if they 1) had known health conditions that prevented them from being able to perform MVPA safely, 2) were wheelchair-bound or had orthopedic limitations that invalidated the use of accelerometry for activity measurement, or 3) fell outside the age range of 18-44 years. Anyone over the age of 44 was excluded from participation as the American College of Sports Medicine asserts that those aged 45 and above are at higher risk for acute cardiovascular complications with exercise (ACSM 2009), and we did not have medical personnel available during testing to approve vigorous PA for older individuals.
Anyone under the age of 18 was excluded from these preliminary validations because children and adolescents have a higher relative EE for activities than adults due to normal growth and maturation (Krahenbuhl and Williams 1992), and their activity patterns are different from those of adults (Bailey, Olson et al. 1995). This study was approved by the Michigan State University Institutional Review Board prior to participant recruitment. Details of the study were described to each participant immediately upon arriving at the Human Energy Research Laboratory, and written informed consent was obtained prior to proceeding with the protocol.

Instrumentation

The instruments used in this study were ActiGraph GT3X+ accelerometers, GENEActiv accelerometers, and an Oxycon Mobile portable metabolic analyzer. The Oxycon portable metabolic analyzer provided a criterion measure of EE. The accelerometers and portable metabolic analyzer were synchronized to an external clock before each test; descriptions of the accelerometers and metabolic analyzer follow. Pictures of the equipment can be seen in Figure A.1 in the Appendix.

ActiGraph accelerometers

The ActiGraph (ActiGraph LLC, Pensacola, FL) is the most commonly used accelerometer on the market for PA research, and there is an abundance of literature regarding its reliability and validity for measurement of PA (Freedson, Melanson et al. 1998; Matthew 2005). Two GT3X+ models were placed on each participant during the study. One accelerometer was placed on the midline of the right thigh, one third of the way between the hip and knee, and adhered to the leg with hypoallergenic sticky tape. The other ActiGraph was mounted on the right hip at the anterior axillary line with an elastic belt. The ActiGraph GT3X+ records raw accelerations of up to ±6 times gravitational force (6g) in three axes of movement. For the current protocol, the GT3X+ accelerometers recorded at a rate of 40 samples per second (40 Hz).
GENEA accelerometers

The GENEActiv (Activinsights Ltd, Kimbolton, Cambridgeshire, UK) is a new accelerometer that has recently been validated for PA measurement (Esliger, Rowlands et al. 2011). Like the ActiGraph, the GENEA records raw data of up to ±6g in three axes of movement. The GENEAs were set to record acceleration data at a rate of 20 Hz for the current study. The GENEA is shaped like a watch and comes with a standard wrist strap, allowing for easy attachment to the wrist. Participants wore two GENEA accelerometers (one on each wrist) for this study. Each GENEA was fastened securely to the dorsal side of the wrist, between the styloid processes of the radius and ulna (Esliger, Rowlands et al. 2011).

The acceleration data for all four accelerometers were time stamped and stored within the monitors and later were downloaded to a computer for analysis. Additionally, the accelerometers were oriented so that the x-axis was the vertical axis, the y-axis was the medial-lateral axis, and the z-axis was the anterior-posterior axis.

Oxycon portable metabolic analyzer

The Oxycon Mobile (Cardinal Health, Yorba Linda, CA) portable metabolic analyzer was used to measure oxygen consumption (VO2) and carbon dioxide production (VCO2) during 13 of the 14 activities performed in the protocol (EE was recorded but not analyzed for the non-wear activity). The Oxycon is lightweight (950 g) and was worn on the back using a shoulder harness. Participants were fitted with a breathing mask (held in place by a mesh cap), which was attached to a digital turbine flowmeter and gas sampling tube, allowing the analyzer to measure inspired and expired air volume so that VO2 and VCO2 could be calculated on a breath-by-breath basis. VO2 data were expressed in ml/kg/min and converted to METs (by dividing VO2 by 3.5) for analysis. Prior to each test, the Oxycon was calibrated according to the manufacturer's specifications to ensure accurate measurements for flow rate and gas concentration.
The Oxycon has been shown to provide valid VO2 measures over a range of exercise intensities (Rosdahl, Gullstrand et al. 2010; Akkermans, Sillen et al. 2012) and was used as the criterion measure for EE in this study.

Procedure

Each participant reported to the Human Energy Research Laboratory for one visit. Participants were asked to refrain from eating for three hours prior to visiting the laboratory to minimize risk of discomfort while performing the activities and because food ingestion can affect EE values. Details of the study were discussed with each participant. Written informed consent was obtained, and a physical activity readiness questionnaire was administered to ensure that the participant was healthy and had no contraindications to engaging in MVPA. If participants had answered 'yes' to any question on the questionnaire, they would have been asked to obtain physician approval before being able to participate in the study; however, this did not occur.

Next, participant weight and height were taken by trained research assistants according to standardized methods (Malina 1995). Weight was measured to the nearest 0.1 kg using a Seca digital scale (Seca, Hanover, Germany), with shoes off and weight balanced on the center of the scale. Height was measured to the nearest 0.1 cm using a Harpenden stadiometer (Holtain Ltd., Crymych, United Kingdom). Before measurement, the participant removed his/her shoes, stood erect with feet flat on the floor, head aligned in the Frankfurt plane, and the back of the feet, shoulders, and head resting against the back of the board. Two measurements were taken and averaged for both weight and height. If the two weights differed by more than 0.3 kg or if the two heights differed by more than 0.4 cm, a third measurement was taken, and the closest two were averaged. Body mass index (BMI) was calculated by dividing body weight by the square of height (kg/m²).
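The duplicate-measurement rule and BMI calculation described above can be sketched as follows. This is only an illustration in Python; the function names and the example readings are hypothetical and do not come from the study.

```python
def resolve_measurement(first, second, tolerance, third=None):
    """Average two readings; if they differ by more than `tolerance`,
    require a third reading and average the closest two of the three."""
    if abs(first - second) <= tolerance:
        return (first + second) / 2
    if third is None:
        raise ValueError("readings differ by more than the tolerance; "
                         "a third measurement is required")
    pairs = [(first, second), (first, third), (second, third)]
    a, b = min(pairs, key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical readings: weights agree within 0.3 kg, heights do not
# agree within 0.4 cm, so a third height reading is used.
weight = resolve_measurement(70.0, 70.2, tolerance=0.3)
height = resolve_measurement(170.0, 171.0, tolerance=0.4, third=170.2)
participant_bmi = bmi(weight, height)
```

Passing the tolerance as a parameter lets one helper serve both weight (0.3 kg) and height (0.4 cm).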
Age was assessed by asking participants to state their age in years, and handedness was assessed by asking participants which hand they prefer to use for the majority of activities.

Each participant wore the Oxycon metabolic analyzer, one ActiGraph on the hip, another ActiGraph on the thigh, one GENEA on the left wrist, and one GENEA on the right wrist while performing 14 activities (activity descriptions are provided in Table 3.1, and pictures of the activities being performed can be found in Appendix D). These activities comprised a range of intensities from sedentary to vigorous and represented a mixture of sedentary, ambulatory, exercise, and lifestyle activities. Ambulatory activities (walking, running) are common in accelerometer validation literature; however, we added the sedentary, exercise, and lifestyle activities to determine the potential for the four accelerometers to measure a range of activity types and intensities often seen in free-living settings. Additionally, we added a non-wear activity so that the ANNs would be able to recognize when the accelerometers were not being worn, allowing for easy exclusion of non-wear time from data analyses. The non-wear activity was not included in our analysis of EE prediction.

Table 3.1. Activities performed during the simulated free-living protocol.
| Activity Category | Activity | Activity Intensity | Description of Activity* |
|---|---|---|---|
| Sedentary (SE) | Lying down (T1) | Sedentary | Lying on a mat on the floor |
| Sedentary (SE) | Reading (T2) | Sedentary | Reading a magazine article while sitting at a table |
| Sedentary (SE) | Computer (T3) | Sedentary | Sitting and playing a computer game that involves mouse clicking and typing |
| Standing (ST) | Standing (T4) | Light** | Standing still with arms at sides |
| Lifestyle (LI) | Laundry (T5) | Light | Folding towels and putting them in a laundry basket |
| Lifestyle (LI) | Sweeping (T6) | Light | Sweeping confetti into piles |
| Leisure walk (LW) | Walking slow (T7) | Light | Walking at a self-selected 'slow' pace in a hallway |
| Brisk walk (BW) | Walking fast (T8) | Moderate | Walking at a self-selected 'brisk' pace in a hallway |
| Jogging (JO) | Jogging (T9) | Vigorous | Jogging at a self-selected pace in a hallway |
| Cycling (CY) | Cycling (T10) | Moderate/Vigorous | Cycling on a cycle ergometer at a self-selected cadence of 50-100 rpm with 1 kg resistance |
| Stair use (SU) | Stair climbing and descending (T11) | Moderate/Vigorous | Walking up and down a flight of stairs at a self-selected pace |
| Exercise (EX) | Biceps curls (T12) | Light | Standing still while doing biceps curls with a 3-lb. weight in each hand |
| Exercise (EX) | Squats (T13) | Moderate | With feet shoulder-width apart, bending at the knees (to a 90° angle) while holding an unweighted broom behind the head |
| Non-wear (NW) | Non-wear of accelerometer (T14) | N/A | Not wearing the accelerometer |

* Activity order, intensity, and duration (3-10 minutes) were left up to participants.
** Standing has traditionally been considered SB; however, recent literature suggests that standing should be considered light-intensity instead of SB due to the differential physiologic effects of standing as compared to sitting/lying (Owen, Healy et al. 2010).

The 14 activities were performed in a 90-minute, simulated free-living setting which took place in a laboratory room inside the Human Energy Research Laboratory and a hallway and stairwell outside the laboratory.
A list of the activities was written on a whiteboard for participants at the beginning of the visit, and a description of each activity was given. The order of activities on the whiteboard was altered every 4-5 participants in order to avoid ordering effects during the visit. Participants completed each of the 14 activities for a total of at least three minutes and for no more than 10 minutes, but the order, intensity, and timing of the activities were left up to each participant. A research assistant observed and recorded each activity on a handheld computer while it was being performed and periodically updated participants on which activities they still needed to complete. The non-wear activity was saved until the end of the 90-minute protocol so that participants would not spend a significant portion of the 90-minute protocol trying to remove and reattach the accelerometers. Upon completion of the protocol, participants were given a $35 Target® gift card.

Data reduction and modeling

Artificial neural networks

ANNs are nonlinear models which take a set of inputs x1…xk and use them to predict a certain output variable y (e.g., EE or activity type), where k is the number of features used to predict y. An ANN designed to predict EE was developed for each accelerometer. Figure 3.1 shows a graphical depiction of the ANN. The general form of an ANN model can be seen in Equation 1.

Equation 1:

$$y = \sum_{h=1}^{H} \left[ w_h \, f\!\left( \sum_{i=1}^{k} w_{ih} x_i \right) \right]$$

In Equation 1, w are the weights that need to be estimated, f( ) is the activation function (which is a linear function), H is the size of the hidden layer, and y is EE in METs. In accordance with previous research (Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009; Trost, Wong et al. 2012), our ANNs contained only one hidden layer.

Figure 3.1. ANN for predicting EE.

Legend for Figure 3.1

The input layer contains the features used as input variables.
*The Hidden Layer contains 15 hidden units, but only three are shown for simplicity.
Accelerometer signal features (one of each per axis, three total of each per accelerometer)
1. Mean = mean
2. Var = variance
3. Cov = covariance
4. Min = minimum
5. Max = maximum
6. MeanOR = mean accelerometer orientation
7. VarOR = variance of accelerometer orientation
8. 10th %ile = 10th percentile
9. 25th %ile = 25th percentile
10. 50th %ile = 50th percentile
11. 75th %ile = 75th percentile
12. 90th %ile = 90th percentile

Participant characteristics features
13. Ht = participant height
14. Wt = participant weight
15. Gender = participant gender

Non-feature abbreviations
S = summations of the input layer in the hidden units
U = activation function for the hidden layer
W1 = the weight vectors for each of the inputs
W2 = the weight vectors for each of the summations

The ANNs were created and tested using a leave-one-participant-out approach. In this approach, the ANN was first created from a 'training' data set, where the input features and the outcome variable (EE) were used to estimate the weights for each input feature. This training set consisted of the data from all but one participant in the study. Then, the ANN was tested on the data from the participant left out of the training phase. This testing was conducted by supplying the input features and comparing the predicted EE from the ANNs to the measured EE from the criterion measure (Oxycon metabolic analyzer). This process was repeated so that each participant's data served as the testing data once, thereby obtaining one ANN for each participant in the study. Weights determined from each iteration of the leave-one-participant-out validation were averaged to obtain a final ANN. This process was conducted separately for each accelerometer, resulting in four distinct ANNs.

There were three additional considerations that were addressed in building our ANNs: 1) window length, 2) relevant features to use as input variables, and 3) size of the hidden layer.
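The leave-one-participant-out procedure described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study. The function names, the tuple layout, and the demonstration data are assumptions made for the example. The one-feature least-squares line is only a stand-in for the model; an ANN could be supplied through the `fit` and `predict` arguments instead.

```python
from collections import defaultdict


def fit_line(train):
    """Stand-in model (assumption): least-squares line on the first feature."""
    xs = [features[0] for features, _ in train]
    ys = [mets for _, mets in train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def predict_line(model, features):
    slope, intercept = model
    return slope * features[0] + intercept


def leave_one_participant_out(windows, fit=fit_line, predict=predict_line):
    """windows: iterable of (participant_id, features, measured_mets).

    Returns {participant_id: [(predicted_mets, measured_mets), ...]}, where
    each participant's predictions come from a model trained on everyone else.
    """
    by_participant = defaultdict(list)
    for pid, features, mets in windows:
        by_participant[pid].append((features, mets))

    results = {}
    for held_out, test_rows in by_participant.items():
        train_rows = [row for pid, rows in by_participant.items()
                      if pid != held_out for row in rows]
        model = fit(train_rows)
        results[held_out] = [(predict(model, x), y) for x, y in test_rows]
    return results


# Hypothetical data: three participants, one feature, METs = 2 * feature + 1.
demo_windows = [
    ("p1", [1.0], 3.0), ("p1", [2.0], 5.0),
    ("p2", [3.0], 7.0), ("p2", [4.0], 9.0),
    ("p3", [5.0], 11.0), ("p3", [6.0], 13.0),
]
results = leave_one_participant_out(demo_windows)
```

Grouping by participant rather than by window keeps all of a participant's windows out of training while that participant is being tested, which is what separates this approach from ordinary leave-one-out on individual windows.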
Window length

In order to analyze accelerometer data, it must first be divided into smaller segments, called 'epochs' or 'windows,' for analysis. By dividing the data into windows, EE can be assessed separately for each window to yield information on activity type, duration, intensity, etc. Windows of 60 seconds are commonly used for analyzing accelerometer data because outputting a given EE every minute is intuitively appealing and works well for steady-state activities (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011). Additionally, longer windows (i.e., 30-60 seconds) increase the amount of information available with which to determine activity type and have been shown to improve EE prediction accuracy (Trost, Wong et al. 2012). Finally, early accelerometers had limited data storage, so acceleration data had to be stored in 60-second windows in order to be able to record data for a period of several days.

Sixty-second windows work well for laboratory-based protocols where participants perform activities for a specific amount of time at steady state and then change to the next activity at known intervals (e.g., every five minutes) (Trost, Wong et al. 2012). However, these long windows may be less optimal in free-living situations, where steady-state EE is rarely achieved for physical activities and where activities rarely start or end exactly on the minute (Orendurff, Schoen et al. 2008; Lyden, Keadle et al. 2013). Other studies have shown similar or better accuracy of measurement of EE in adults when using shorter epochs (Gabriel, McClain et al. 2010; Ayabe, Kumahara et al. 2013; Orme, Wijndaele et al. 2014); therefore, we chose to use 30-second windows in the current study.

Features

As mentioned previously, the activity counts variable is commonly used as an input for linear regression equations used to measure EE and activity intensity.
However, contained within activity counts are useful data 'features' that can be extracted and used in either linear regression models or machine learning algorithms. There are several different types of features that can be used as input variables. Time-domain features are most commonly used because they can be directly extracted or computed from accelerometer signal data. Examples of time-domain features are mean, standard deviation, skewness, or percentiles of the acceleration signal. In addition to being directly available from accelerometer data, many time-domain features are easy to understand and interpret (Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009). The other main type of features, frequency-domain features, can be used either in conjunction with or independent from time-domain features, yielding similarly high accuracy for activity type classification as time-domain features in some studies (Preece, Goulermas et al. 2009; Mannini, Intille et al. 2013). However, frequency-domain features require additional steps such as framing the data, complex mathematical transformations, and filtering, and their calculation requires significant computational power (Preece, Goulermas et al. 2009). Additionally, several studies provide evidence that time-domain features can be used to achieve high activity classification accuracy (70-90% from a single accelerometer) and EE prediction accuracy without use of frequency-domain features (Herren, Sparti et al. 1999; Staudenmayer, Pober et al. 2009; Dong, Montoye et al. 2013; Montoye, Dong et al. 2013). Other than time- and frequency-domain features, simple descriptive features, such as accelerometer orientation or participant demographic variables, can also be used and may improve EE measurement accuracy.

Many accelerometer signal features have been used in previous research, and the models created have varied considerably in complexity and measurement accuracy.
While adding more features may improve accuracy of the ANN, it may also lead to overfitting ANNs to the data used for training the ANN, resulting in poorer generalizability of the model when applied to a new population (Preece, Goulermas et al. 2009). Additionally, a major drawback of many machine learning models is that while they tend to have high measurement accuracy, model complexity can quickly render them unusable for anyone who lacks considerable knowledge of mathematics or computer science and/or without access to expensive computing software (Pober, Staudenmayer et al. 2006; Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009). Thus, this study focuses on using easy-to-compute features as input variables and identifying a small number of these features that can achieve high measurement accuracy.

Before computing features, the 40 Hz data from the ActiGraph accelerometers were reintegrated to 20 Hz for comparison with the data from the GENEA. Table 3.2 provides a list of calculations for the 39 features tested and used in the analyses. Calculation and extraction of the accelerometer features were performed in Microsoft Excel. The 36 accelerometer features (12 features for each of three accelerometer axes) are all time-domain features that have been effectively utilized in previous studies; additionally, weight, height, and gender were included to account for demographic characteristics of participants. For the EE prediction, the 30-second windows allow 600 accelerometer signal samples for calculating the features (20 samples/second x 30 seconds). Mean, variance, covariance, minimum, maximum, mean and variance of monitor orientation, and the 10th, 25th, 50th, 75th, and 90th percentiles were calculated separately for x-, y-, and z-axes. These features were chosen to allow the ANN sufficient data to accurately predict EE.
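The windowing and several of the per-axis features described above can be sketched as follows. The study performed these calculations in Microsoft Excel, so this Python version is only an illustration, and the function names and demonstration signal are hypothetical. The percentile rule (sort the 600 samples and pick the 60th, 150th, 300th, 450th, or 540th value) follows Table 3.2. The use of population variance is an assumption, because the exact variance formula is not recoverable from the text. Covariance and orientation features are omitted for the same reason.

```python
from statistics import fmean, pvariance

SAMPLE_RATE_HZ = 20      # sampling rate after reintegration
WINDOW_SECONDS = 30      # window length used for EE prediction
PERCENTILES = (10, 25, 50, 75, 90)


def split_windows(signal, rate_hz=SAMPLE_RATE_HZ, window_s=WINDOW_SECONDS):
    """Split one axis of acceleration data into complete, non-overlapping
    windows; a trailing partial window is dropped."""
    size = rate_hz * window_s
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def window_features(samples):
    """Simple time-domain features for one axis of one window."""
    ordered = sorted(samples)
    n = len(ordered)
    features = {
        "mean": fmean(ordered),
        "variance": pvariance(ordered),  # assumption: population variance
        "min": ordered[0],
        "max": ordered[-1],
    }
    for p in PERCENTILES:
        rank = max(1, n * p // 100)      # e.g. 10th percentile of 600 -> 60th value
        features[f"p{p}"] = ordered[rank - 1]
    return features


# Hypothetical 75-second x-axis signal: two full windows plus a remainder.
demo_signal = list(range(1, 1501))
demo_windows = split_windows(demo_signal)
demo_features = window_features(demo_windows[0])
```

Repeating `window_features` for the x-, y-, and z-axes of each window gives the per-axis values that the feature sets draw on.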
After creating the ANNs using all 39 features, follow-up analyses were conducted to determine an optimal subset of features that reduced complexity of the ANNs with minimal loss of accuracy. In all, we used and compared five different sets of features. These feature sets can be seen in Table 3.3. Two feature sets (sets 2 and 5 in Table 3.3) were similar to those used successfully in previous studies for EE prediction (Staudenmayer, Pober et al. 2009; Dong, Biswas et al. 2013). Feature set 1 was the full set, consisting of many potentially important characteristics of the acceleration signal that have also been used in other studies, but not necessarily in the same combinations (Preece, Goulermas et al. 2009). From the 36 accelerometer features used in set 1, correlations were computed among the features to determine and remove redundancy in information available from the features (Rothney, Neumann et al. 2007). As with linear regression, highly correlated input variables can cause collinearity in the ANNs and may reduce their generalizability. Therefore, we chose two features from each accelerometer axis that were poorly correlated with each other (mean and variance) and then used a stepwise approach to select features that had correlations of less than r=0.70 with features already included in the set. Using this approach, we arrived at feature set 3, consisting of mean, variance, minimum, and maximum of the acceleration signal (for each axis of measurement) as well as participant weight, height, and gender (15 total features). Finally, in feature set 4 we initially included lag-one autocorrelation, which has been used in many other studies since it can yield valuable information on the temporal nature of activities by assessing the correlation of two adjacent windows of acceleration data.
However, the calculation of autocorrelation involves dividing by the variance of the acceleration data within the adjacent windows, which for many sedentary activities is 0 and results in an invalid calculation. Other studies using autocorrelation have used automatic rules for classifying EE of sedentary activities (e.g., Trost et al. automatically assigned all windows with an invalid lag-one autocorrelation a MET value of 1.0) (Trost, Wong et al. 2012), but we feel that this approach is limited since different sedentary activities may elicit slightly different EE. Instead of using lag-one autocorrelations with an automatic classification scheme for all sedentary activities, we calculated covariance as a feature since covariance is simpler to calculate, is defined even when variance is zero, and can provide information regarding similarity of the accelerometer signals of adjacent data windows (similar to autocorrelation). These feature sets were tested including and excluding the three participant characteristics (weight, height, and gender) in order to determine if demographic characteristics would improve accuracy of the models.

Table 3.2. Features used for EE prediction.
| Feature number | Feature used | Formula for calculating feature in each 30-second window |
|---|---|---|
| 1-3* | Mean acceleration signal | $\frac{1}{600}\sum_{i=1}^{600} A_{x,i}$ |
| 4-6* | Variance of acceleration signal | [formula illegible] |
| 7-9* | Covariance of acceleration signal | [formula illegible] |
| 10-12* | Minimum of acceleration signal | $\min(A_x)$ |
| 13-15* | Maximum of acceleration signal | $\max(A_x)$ |
| 16-18* | 10th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 60th value |
| 19-21* | 25th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 150th value |
| 22-24* | 50th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 300th value |
| 25-27* | 75th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 450th value |
| 28-30* | 90th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 540th value |
| N/A | Accelerometer orientation (needed for calculating features 31-36) | [formula illegible] |
| 31-33* | Mean accelerometer orientation | [formula illegible] |
| 34-36* | Variance of accelerometer orientation | [formula illegible] |
| 37 | Participant height | N/A |
| 38 | Participant weight | N/A |
| 39 | Participant gender | N/A |

The * signifies that one feature is included for each of the three accelerometer axes. The formulas shown are for the x-axis, but the formulas for the y- and z-axes are similar. $A_x$ is the acceleration in the direction of the x-axis.

Table 3.3. Feature sets used for creation and testing of ANNs.
| Feature set number | Features used | Total number of features used |
|---|---|---|
| 1 | Mean, variance, covariance, minimum, maximum, mean orientation, variance of orientation, and 10th, 25th, 50th, 75th, and 90th percentiles of acceleration signal, weight, height, and gender | 39 (12 accelerometer features per axis * 3 axes + weight + height + gender) |
| 2 | Mean and variance of acceleration signal, weight, height, and gender | 9 (2 accelerometer features per axis * 3 axes + weight + height + gender) |
| 3 | Mean, variance, minimum, and maximum of acceleration signal, weight, height, and gender | 15 (4 accelerometer features per axis * 3 axes + weight + height + gender) |
| 4 | Mean, variance, covariance, minimum, and maximum of acceleration signal, weight, height, and gender | 18 (5 accelerometer features per axis * 3 axes + weight + height + gender) |
| 5 | 10th, 25th, 50th, 75th, and 90th percentiles of acceleration signal, weight, height, and gender | 18 (5 accelerometer features per axis * 3 axes + weight + height + gender) |

Size of the hidden layer

As with the number of features used, more hidden units in the hidden layer allow for more flexibility in the ANNs, allowing the model to better fit the training data. However, having more units also increases the chances of overfitting. There is no consensus on the optimal number of hidden units to use, but some investigators have used a number of hidden units similar to the number of activities being identified and/or the number of input features used (Preece, Goulermas et al. 2009; De Vries, Garre et al. 2011). Since our aim is to minimize the number of features used and since our study contains 14 activities, we chose to use 15 hidden units in our hidden layer.

Oxycon data

In a previous study by members of our research group, we reintegrated breath-by-breath Oxycon portable metabolic analyzer data into 10- and then 15-second windows for analysis.
However, with both windows we found that data loss occurred in participants with slower breathing rates (especially during sedentary activities), resulting in our reintegrating the data into 30-second windows for our final analysis (Montoye, Dong et al. 2014). Correspondingly, breath-by-breath Oxycon data from the simulated free-living protocol were reintegrated into 30-second windows for measurement of EE in the current study. These 30-second windows of accelerometer data were used for training the ANNs to predict EE (as described earlier). Also, when testing the EE ANNs, 30-second windows were used for computing predicted EE for comparison to Oxycon-measured EE. Since the Oxycon recorded continuously and was not dependent on correctly identifying an activity type, all data, including transitions, were included for training and testing of the ANNs.

Statistical analyses
After downloading the accelerometer and Oxycon data, all data processing was conducted in Microsoft Excel (Microsoft Corporation, Redmond, WA), and ANN creation was performed using the R statistical software package (R-project, Vienna, Austria). We chose to use Microsoft Excel for data processing and R for our ANN creation in accordance with our intent to create and use ANNs with simple methodology that can be used by those without extensive computer programming skills or access to expensive computing software. Microsoft Excel is a very commonly used and widely accessible software package for personal computing, and R is relatively simple to use and manageable to learn for researchers who may have limited statistical or computing experience. Additionally, R is open-source software that is freely available for download and has a dedicated ANN library that can be used for development and testing of ANNs. Thus, development and application of ANNs in R is less costly and much less complicated than machine learning algorithms developed in previous studies (Pober, Staudenmayer et al.
2006; Rothney, Neumann et al. 2007; Preece, Goulermas et al. 2009; Mannini and Sabatini 2010), and R has been used successfully for creation of ANNs for predicting EE and activity type (Staudenmayer, Pober et al. 2009; Lyden, Keadle et al. 2013).

Three summary statistics were calculated in order to test the accuracy of each ANN for predicting EE: Pearson correlations, root mean square error, and bias. Operational definitions of these three measures are given below.

Pearson correlations (r): The covariance of two variables is divided by the product of the standard deviations of the two variables to obtain r. The range of possible r values is -1 to 1, with 1 being a perfect correlation and -1 being a perfect inverse correlation (Field 2009). A minimum correlation of r=0.60 has been defined as moderately high validity in the literature; therefore, we sought a correlation of r≥0.60 between predicted EE and Oxycon-measured EE (Safrit and Wood 1995). If this minimum correlation had not been met, we would have increased the window length and added features to improve correlations and meet our desired correlation.

Root mean square error (RMSE): The square root of the mean squared difference between values predicted by an estimator (the ANNs) and the true values (measured by the criterion measure) is the RMSE. Smaller RMSE values represent better prediction by the ANNs; thus, our goal was to minimize RMSE to maximize accuracy of the ANNs.

Bias: Bias is the difference between the estimated value of a measure and the true value. Bias allows for determinations of systematic over- or underprediction of EE; a negative bias represents underestimation of EE by the ANNs, and a positive bias represents overestimation. We desired to achieve bias values close to 0 in order to maximize accuracy of the ANNs.

Correlations, RMSE, and biases were calculated separately for each of the four accelerometers and each of the five different feature sets.
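The three summary statistics defined above can be sketched in a few lines of code. The example below is a minimal illustration in Python rather than the Excel and R workflow used in this study; the function names and the MET values are hypothetical and serve only to show the calculations. Bias is taken as predicted minus measured, so that negative values indicate underestimation.

```python
import math


def pearson_r(predicted, measured):
    """Covariance divided by the product of the two standard deviations."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so sums of cross-products and squared deviations are sufficient.
    cross = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    return cross / math.sqrt(ss_p * ss_m)


def rmse(predicted, measured):
    """Square root of the mean squared difference between predicted and measured."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def bias(predicted, measured):
    """Mean of (predicted - measured); negative means underestimation."""
    n = len(predicted)
    return sum(p - m for p, m in zip(predicted, measured)) / n


if __name__ == "__main__":
    # Hypothetical 30-second-window EE values in METs (not study data).
    measured_mets = [1.2, 3.5, 6.0, 2.0]
    predicted_mets = [1.5, 3.0, 5.5, 2.6]
    print("r    =", pearson_r(predicted_mets, measured_mets))
    print("RMSE =", rmse(predicted_mets, measured_mets))
    print("bias =", bias(predicted_mets, measured_mets))
```

In practice these statistics would be computed per participant for each accelerometer placement and feature set before the group-level comparisons.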
Differences in correlations, RMSE, and biases among the four accelerometers were assessed using repeated-measures analysis of variance (RMANOVA). Additionally, differences among feature sets were evaluated using RMANOVA. Since correlations tend to be negatively skewed, we first performed a Fisher's Z transformation to normalize the correlations before performing the RMANOVA. When the RMANOVA revealed statistically significant differences for any of the three analyses, post hoc dependent t-tests were conducted to determine differences among monitor placements or feature sets. The a priori alpha level was set at P<0.05 for determining statistical significance. Statistical analyses were performed using SPSS version 22 (IBM Corporation, Armonk, NY).

Power analysis
In the simulated free-living setting, correlations of r≥0.60 between measured EE and estimated EE from the four ANNs were desired to indicate moderately high validity of the accelerometers for EE estimation (Safrit and Wood 1995). Table 3.4 shows the minimum correlation that could have been detected with different sample sizes and power. For example, with 20 participants, power is 80% to detect a correlation of 0.591. Thus, our sample size of 44 was well above the minimum required number of 25 needed to detect our minimum desired correlation (0.60) with greater than 90% power. We chose to oversample in order to ensure adequate sample size due to the potential for occasional malfunction and/or loss of battery power of the accelerometers or the portable metabolic analyzer experienced in a previous study by members of our research group (Montoye, Dong et al. 2014).

Table 3.4. Minimum Pearson correlations detectable for a given sample size and power.
Sample size | 80% power | 90% power
18 | 0.619 | 0.684
20 | 0.591 | 0.656
24 | 0.545 | 0.609
30 | 0.492 | 0.554
36 | 0.452 | 0.511
42 | 0.395 | 0.477

RESULTS
Malfunction of the Oxycon metabolic analyzer (due to a bad battery) occurred in three participants, and accelerometer malfunction occurred in another two participants. These participants were excluded from further analyses, resulting in 39 participants included in model creation and validation. Means and standard deviations (SD) for participant characteristics (both those included in and excluded from analysis) are shown in Table 3.5. Although weight and BMI appeared higher in females excluded from the final analysis, these differences were not statistically significant. Of the 39 participants included in the analysis, 13 were either overweight or obese according to BMI (≥25 kg/m2). Additionally, four of the 39 participants included in the final analysis were left-hand dominant, with the remaining 35 being right-hand dominant.

Table 3.5. Demographic characteristics of participants enrolled in study.
Mean (SD) | Included in analysis: All (n=39) | Included in analysis: Males (n=19) | Included in analysis: Females (n=20) | Excluded from analysis: All (n=5) | Excluded from analysis: Males (n=3) | Excluded from analysis: Females (n=2)
Age (years) | 22.1 (4.3) | 23.7 (5.0) | 20.5 (2.7) | 21.2 (2.9) | 21.3 (4.0) | 21.0 (1.4)
Weight (kg) | 72.4 (16.2) | 84.5 (13.1) | 60.8 (8.9) | 78.2 (21.2) | 75.9 (4.9) | 81.6 (41.4)
Height (cm) | 171.4 (10.1) | 179.1 (7.7) | 164.1 (5.7) | 167.8 (9.8) | 175.9 (0.4) | 157.1 (2.4)
BMI (kg/m2) | 24.4 (3.6) | 26.3 (3.4) | 22.5 (2.6) | 28.0 (9.1) | 24.8 (1.7) | 32.8 (15.8)

In initial testing of the five feature sets, it was found that the addition of weight, height, and gender yielded no gains in predictive accuracy of the ANNs. Therefore, these features were removed when training and testing the ANNs.

Correlations for predicted EE are shown in Table 3.6. With correlations ranging from r=0.82-0.89 for the four accelerometers across the five sets of features, all four monitors achieved correlations well above the r=0.60 desired to indicate moderately high validity.
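As a rough check on the minimum detectable correlations in Table 3.4, the calculation can be approximated with the Fisher z transformation. The sketch below is an assumption about how such values can be obtained, not a description of the software or method used for the power analysis in this study, which the text does not state. It assumes a two-sided test at alpha=0.05, and the function name min_detectable_r is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_r(n, power, alpha=0.05):
    """Approximate smallest correlation detectable with a given sample size.

    Uses the Fisher z approximation, in which atanh(r) is roughly normal
    with standard error 1 / sqrt(n - 3), and a two-sided test at `alpha`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    z_effect = (z_alpha + z_power) / math.sqrt(n - 3)
    return math.tanh(z_effect)                     # back-transform from z to r


if __name__ == "__main__":
    for n in (20, 24):
        for power in (0.80, 0.90):
            print(n, power, round(min_detectable_r(n, power), 3))
```

Under these assumptions the approximation gives about 0.591 for 20 participants at 80% power and about 0.609 for 24 participants at 90% power, in line with the corresponding entries in Table 3.4.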
The RMANOVA test among accelerometer placement sites revealed a test statistic of F=4.36, indicating significant differences among the four placement sites. Post hoc tests revealed that the ActiGraph thigh accelerometer had higher correlations with measured EE (r=0.88-0.89) than the wrist accelerometers for all five feature sets (r=0.82-0.86) and higher correlations than the hip accelerometer (r=0.83-0.88) for all sets other than set 1 (which included all 39 features). Correlations achieved by the left and right wrist accelerometers were similar for each of the five feature sets.

When comparing accuracy achieved among the five sets of features, the thigh monitor accuracy was not affected by choice of feature set. Conversely, for the hip accelerometer, feature sets 2-5 resulted in slightly lower correlations; similarly, correlations dropped for the wrist accelerometers for feature sets 2-4 but not for set 5. Despite the statistical significance of the decreased correlations seen with the hip and both wrist accelerometers, the actual drop in correlations was quite small, especially for feature sets 3-5.

Table 3.6. Correlations of measured vs. predicted EE.
Correlations (SD) | ActiGraph Hip | ActiGraph Thigh | GENEA Left Wrist | GENEA Right Wrist
Set 1 (All accelerometer features) | 0.88 (0.05) | 0.89 (0.07) | 0.86 (0.05)* | 0.86 (0.06)*
Set 2 (Mean, Var) | 0.83 (0.06)^ | 0.88 (0.09)& | 0.82 (0.06)*^ | 0.82 (0.08)*^
Set 3 (Mean, Var, Min, Max) | 0.86 (0.04)*^ | 0.89 (0.08)& | 0.84 (0.05)*^ | 0.83 (0.06)*^
Set 4 (Mean, Var, Cov, Min, Max) | 0.86 (0.06)*^ | 0.89 (0.10)& | 0.84 (0.06)*^ | 0.85 (0.06)*^
Set 5 (10th, 25th, 50th, 75th, 90th percentiles) | 0.87 (0.04)*^ | 0.89 (0.05) | 0.86 (0.05)* | 0.86 (0.06)*
The * indicates significant differences from thigh accelerometer placement site. The & indicates significant differences from hip accelerometer placement site. The ^ indicates significant difference from feature set 1 (all features).

Root mean square error (RMSE) values for predicted vs.
measured EE for the four accelerometers are shown in Figure 3.2. The RMANOVA test revealed a test statistic of F=3.64, indicating significant differences in RMSE among placement sites. For all five feature sets, the thigh accelerometer placement (1.05-1.14 METs) had significantly lower RMSE values than the hip (1.12-1.42 METs), left wrist (1.18-1.36 METs), and right wrist (1.18-1.38 METs) accelerometer placements. Moreover, when comparing among the five feature sets, the RMSE for the thigh was not significantly different for any of the five. Conversely, RMSE values with the hip accelerometer placement were significantly higher with feature sets 2-5 than set 1. Similarly, with the two wrist accelerometer placement sites, RMSE was significantly higher with feature sets 2-4 than set 1, although feature set 5 yielded similar RMSE to set 1. There were no differences in RMSE values between left and right wrists.

Figure 3.2. RMSE values for predicted vs. measured EE.
[Bar chart: RMSE (METs) on the vertical axis by Feature Set (1-5) on the horizontal axis, with separate bars for ActiGraph Hip, ActiGraph Thigh, GENEA L. Wrist, and GENEA R. Wrist.]
The * indicates significant differences from other accelerometers. The ^ indicates significant difference from feature set 1 (all 38 features). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

Average biases for each accelerometer can be seen in Table 3.7. The RMANOVA test statistic was F=0.062, indicating no overall bias for any of the four monitor placements or for any of the five feature sets. This lack of bias indicates that none of the accelerometers had an overall overestimation or underestimation of EE in the total sample.

Table 3.7. Bias for measured vs. predicted EE.
Bias (SD) | ActiGraph Hip | ActiGraph Thigh | GENEA Left Wrist | GENEA Right Wrist
Feature Set 1 | 0.01 (0.35) | -0.01 (0.34) | -0.02 (0.32) | -0.01 (0.35)
Feature Set 2 | 0.02 (0.59) | 0.03 (0.42) | 0.01 (0.41) | -0.03 (0.49)
Feature Set 3 | -0.03 (0.46) | 0.01 (0.32) | -0.03 (0.35) | 0.00 (0.47)
Feature Set 4 | 0.05 (0.44) | 0.01 (0.35) | 0.00 (0.48) | -0.01 (0.48)
Feature Set 5 | 0.05 (0.43) | -0.05 (0.29) | 0.01 (0.46) | 0.05 (0.47)

DISCUSSION
The purposes of this study were 1) to validate accelerometers worn on the wrists and thigh for prediction of EE, 2) to compare the accuracy of EE prediction for accelerometers located on the wrists, thigh, and hip, 3) to compare accuracies of the left and right wrists, and 4) to use simple input features to maximize prediction accuracy while minimizing complexity of the machine learning technique. Our results showed strong correlations between measured and predicted EE for all accelerometer placements and for all five feature sets. Also, our results indicated no systematic bias by any of the accelerometer placements for estimating EE.

Overall, the thigh-mounted accelerometer provided the highest correlations with measured EE and also the lowest RMSE of the placement sites. With the full feature set (set 1), the thigh- and hip-mounted accelerometers provided similar EE prediction accuracy, but the thigh performed better when subsets of the full feature set were tested. Additionally, the thigh-mounted accelerometer performance was not diminished for any of the five feature sets tested, meaning that even very simple inputs such as mean and variance of the acceleration signal can be used to predict EE with a high degree of accuracy. Given previous work showing high accuracy for measuring sedentary behavior and ambulatory activities with thigh-mounted accelerometers (Grant, Ryan et al. 2006; Ryan, Grant et al. 2006; Skotte, Korshoj et al.
2012), the results of this study further illustrate the utility of the thigh as a highly accurate placement site for activity and EE measurement. Despite the superiority of the thigh-mounted accelerometer, it is worth emphasizing that the two wrist-mounted accelerometers provided only slightly lower accuracy than the thigh and comparable accuracy to the hip, resulting in high overall prediction accuracy of all four monitors.

Our finding of high prediction accuracy for the wrist accelerometer placement sites lies in contrast to studies that have used linear regression-based approaches for estimating EE. In the early days of activity monitors, Montoye et al. found significantly higher correlations for predicting EE using a hip-mounted motion sensor (r=0.71) compared to a wrist-mounted accelerometer (r=0.40) during ambulatory and exercise activities (Montoye, Washburn et al. 1983). Similarly, Swartz et al. found that in a simulated free-living setting, the hip-mounted accelerometer estimated EE with a moderate correlation of r=0.56, while the wrist-mounted accelerometer had a very poor correlation (r=0.18) with measured EE (Swartz, Strath et al. 2000). Finally, as recently as 2013, Rosenberger et al. found higher correlations (r=0.72 vs. r=0.36) and lower error (0.55 vs. 0.85 METs) when predicting EE from a hip-mounted accelerometer compared to a wrist-mounted accelerometer (Rosenberger, Haskell et al. 2013). It is important to note that these studies all used linear regression for their modeling technique; the consistent superiority of the hip to the wrist when linear regression is used is not surprising given that a hip monitor records movement of the trunk, while wrist monitors record arm movement that may or may not be coupled with movement of the rest of the body, resulting in poor correlations of activity counts and EE.
A significant advantage of machine learning is its ability to recognize patterns in an acceleration signal rather than simply using magnitude of acceleration for prediction. Recent studies by Mannini et al. (2013) and Zhang et al. (2012) show very high activity classification accuracies (85-97%) using a wrist accelerometer coupled with machine learning models, giving strong reason to believe that machine learning would also allow for high accuracy for EE prediction (Zhang, Rowlands et al. 2012; Mannini, Intille et al. 2013). The results of the current study support the utility of machine learning modeling as a viable approach to analyzing wrist-mounted accelerometer data and provide further evidence of the superiority of machine learning to linear regression for modeling of accelerometer data. Additionally, while the current convention is for wrist accelerometers to be worn on the non-dominant wrist, the results of this study support that wrist choice will not affect accuracy for estimation of EE. A 2012 study by Zhang et al. found that classification accuracy for identifying four types of activities was 97% and 96% for left and right wrist accelerometers, respectively, further supporting the idea that choice of wrist placement will not affect measurement accuracy.

The high accuracy of wrist-mounted accelerometers for EE prediction found in this study is especially encouraging given the utility of the wrist location for measuring sleep quality as well as its current use in large surveillance studies such as NHANES (Troiano and McClain 2012; Troiano, McClain et al. 2014). Additionally, wrist-mounted accelerometers are comfortable to wear and can be designed/disguised to look like watches, both of which may lead to improved compliance. With the ability to accurately measure sleep as well as activity type and EE, the wrist may represent an ideal blend of practicality and measurement accuracy for monitoring lifestyle behaviors and patterns.
Of note, the left and right wrist accelerometer placements achieved equally high accuracies for prediction of EE, which provides evidence that the popular convention for an accelerometer to be placed on the non-dominant wrist may be unnecessary.

The hip-mounted accelerometer achieved correlations of r=0.83-0.88 and RMSE values of 1.12-1.42 METs with the different feature sets, and these statistics compare favorably to those achieved in previous studies. In a study conducted in a laboratory-based setting, Staudenmayer et al. found that an ANN developed using data from a hip-mounted accelerometer predicted EE with an RMSE of 1.22 METs. This RMSE represented an improvement of 32-71% over previously developed linear regression approaches tested in the study (Staudenmayer, Pober et al. 2009). Additionally, their input features were very similar to our feature set 5 (the 10th, 25th, 50th, 75th, and 90th percentiles of the acceleration signal), lending additional support that this feature set is viable for use in different settings and populations. Similarly, Lyden et al. achieved intraclass correlation coefficients above 0.90 and RMSE values of 1.00 METs for predicting EE using ANNs developed from hip-mounted accelerometer data in a true free-living setting, again achieving superior accuracy when compared to linear regression approaches (Lyden, Keadle et al. 2013). In another study, Rothney et al. achieved a correlation of r=0.92 and RMSE of 0.50 METs when predicting EE using an ANN developed from a hip-mounted accelerometer in a simulated free-living setting. Their slightly better accuracy is likely due to study design, especially given that their use of a linear regression approach to EE prediction yielded a correlation of r=0.89 and an RMSE of 1.00 METs, both of which are considerably better than accuracy achieved in other studies (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000; Staudenmayer, Pober et al. 2009).
Despite the slightly higher RMSE values achieved by the ANNs in our study, our results are encouraging given that participants averaged an intensity of 3.3 METs across the duration of the protocol, which is higher than many other studies and likely contributes to higher RMSE, as seen in previous work by our research group (Montoye, Dong et al. 2014). Taken together, these studies reinforce the high accuracy for EE prediction achievable using machine learning techniques on data from a single, hip-mounted accelerometer, both in laboratory-based and free-living settings.

Our final objective in the current study was to use relatively simple methods for feature extraction and ANN creation and compare sets of input features in order to identify relevant feature sets that allow for high measurement accuracy while minimizing the complexity of the ANN, both in its structure and in its creation. In order to achieve the first part of this objective, all data cleaning and feature extraction were conducted in Microsoft Excel. Features were calculated and extracted using simple functions already built into Excel. While this method is somewhat labor intensive, the key strength of this approach is that it is a viable method for feature extraction without knowledge of or access to powerful, complicated software packages. Use of macros in Excel requires additional knowledge of the software package but can also streamline the process of feature extraction. Additionally, ANN creation was conducted using R statistical software, which is a freely available, open-source software package. Writing programs in R is complex and requires skill, but implementing programs which have already been written is relatively simple and can be accomplished with knowledge of only a few commands in R. Use of the nnet package for creating ANNs has been successfully accomplished by Staudenmayer et al.
and Lyden et al., and considerable detail of the approach, including some of the code for creating and testing the ANNs, can be found in their manuscripts (Staudenmayer, Pober et al. 2009; Lyden, Keadle et al. 2013).

To address the second part of our objective to simplify use of ANNs, we sought to define an optimal subset of features that can be used without sacrificing measurement accuracy. For the thigh accelerometer, we found that choice of features had minimal impact on measurement accuracy, even in the simplest feature set (set 2) consisting of only mean and variance of the acceleration signal. A very similar set of features was used in a study by members of our research group, in which they were able to classify 14 activities with an accuracy above 78% with a thigh accelerometer (Dong, Montoye et al. 2013). Therefore, this minimal feature set appears to provide strong accuracy for both activity type classification and EE prediction when using a thigh-mounted accelerometer. For the hip-mounted accelerometer, feature sets 2-5 all provided slightly lower prediction accuracy than set 1, although the drop in accuracy was very small (especially for sets 3-5) and may be of little practical significance. Finally, with the two wrist-mounted accelerometers, feature sets 2-4 resulted in significantly higher RMSE values and lower correlations with measured EE compared to feature set 1, although these differences were also small. Additionally, feature set 5 provided similar measurement accuracy to set 1.

The choice of features to use in a predictive model will be dependent on the emphasis on accuracy vs. the feasibility for use. For studies with an emphasis on accuracy of measurement, the larger feature sets used with the wrist and hip placements yielded better accuracies than the ANNs developed from the smaller feature sets.
On the other hand, the simplest ANNs developed for the thigh placement were able to predict EE with similar accuracy to the largest feature set. Also, the simplest models (i.e., feature set 2, which included only mean and variance of the acceleration signals) can be used with high accuracy and RMSE within 25% of that achieved with the largest feature set (set 1) for the wrist and hip accelerometer placements. Therefore, these smaller feature sets may be more appropriate for use in large-scale studies, where ease of use of the predictive models is of utmost importance.

Taken together, the findings of this study support the use of simple-to-compute acceleration features for achieving highly accurate estimates of free-living EE using machine learning. Moreover, choice of the number and type of features appears to alter EE prediction accuracy slightly, but the practical significance of these small differences is likely minimal, indicating that researchers may be able to use ANNs with only a few, simple-to-compute accelerometer features and achieve high measurement accuracy.

Study limitations and strengths
There were several limitations of the current study. First, study participants represented a fairly homogeneous group of college-age adults. Thus, our findings are not necessarily generalizable to older populations or children/adolescents and require further validation before use in these populations. Second, the use of a simulated free-living setting rather than a true free-living setting could be viewed as a limitation since some studies have used a true free-living setting for ANN creation and validation (Lyden, Keadle et al. 2013). Third, we did not measure resting VO2, which is known to vary across individuals (Ferro-Luzzi 1968).
However, like creation of individual HR curves for improving the accuracy of EE prediction using HR, taking individual resting EE into account results in dramatically increased burden on researchers and participants; more importantly, individual resting EE measurement would limit the generalizability of our findings since it is not often possible to measure resting EE in intervention or epidemiologic studies, where accelerometers are often used. Instead of measuring resting VO2, it may be useful to include variables such as age and fat-free mass in prediction models since these variables account for the majority of variation in resting VO2 (Johnstone, Murison et al. 2005). However, our study did not find that the inclusion of demographic variables such as weight, height, and gender improved EE prediction when added as input features. Last, we experienced some difficulties with keeping thigh-mounted accelerometers in their proper location during the protocol. Taping monitors on the thigh worked well initially but was less reliable once participants started to sweat. We attempted to secure the monitor using an elastic strap, but this often slipped throughout the session and was less comfortable for participants. There have been several studies that have successfully used thigh-mounted accelerometers for PA and SB measurement, and in future work we hope to communicate with other researchers regarding optimal strategies for mounting accelerometers on the thigh due to their high measurement accuracy and ability to be worn inconspicuously (i.e., under clothing) to enhance compliance.

There are also several notable strengths of this study. First and foremost, we believe the simulated free-living setting represents the best blend of exerting some control over participant activities while still allowing considerable freedom for the order, intensity, and duration of activities chosen by participants. Troiano et al.
identified that PA tends to be performed in short bouts, meaning that steady-state is rarely achieved during PA in free-living settings (Troiano, Berrigan et al. 2008). That finding provides a rationale for the inclusion of transitions and non-steady-state activities in our study since it is more similar to true free-living settings than a typical laboratory-based validation. A true free-living setting may theoretically have the most real-world generalizability, but a major issue in true free-living settings is lack of a good criterion measure. Doubly labeled water provides an accurate estimate of total EE but cannot measure activity EE or minute-to-minute EE. Also, Lyden et al. used a true free-living setting for their ANN creation and validation and direct observation as their criterion measure. Trained observers recorded activities being performed and later used activity classification to predict EE using the Compendium of Physical Activities (Ainsworth, Haskell et al. 2011). While this approach probably represents the best possible criterion in a true free-living setting, it is limited in that the Compendium is an estimate of activity EE and is not suitable for individual EE prediction. Also, without imposing some structure in which participants must perform certain activities for a minimum time, it is likely that participants will spend the majority of their time in activities such as sitting and walking and minimal or no time performing other activities, limiting the generalizability of ANNs created from these data. By utilizing a variety of activities across a wide range of intensities and including all transition data during the visit in our analysis, we incorporated many advantages of a true free-living setting while also exerting enough control to ensure that a variety of activities were performed.
Additionally, in the simulated free-living setting we were able to use a portable metabolic analyzer as our criterion measure, which is widely used as a criterion measure for EE measurement. Another strength of the study was the use of Microsoft Excel and R statistical software for all stages of data cleaning, feature computation and extraction, and ANN creation and validation. These software programs are widely available, and they can be used to create and test machine learning algorithms with minimal experience in computational programming. Finally, it can sometimes be difficult to compare results across studies due to differences in protocol, number and types of activities performed, population used, and modeling approach(es) tested. By simultaneously using four accelerometers, our study allows for direct comparisons of monitors worn on different places on the body for accuracy in EE prediction.

Conclusions
In summary, our study provides strong preliminary evidence that machine learning modeling allows for single accelerometers mounted on the thigh and wrists to provide highly accurate estimates of EE in a simulated free-living setting. Thigh-mounted accelerometers appear to perform with slightly better accuracy than hip- or wrist-mounted accelerometers, although this difference is fairly small. Also, we have shown that choice of wrist (dominant vs. non-dominant) does not affect accuracy of EE prediction. Finally, our study builds on the work of others and highlights ways of reducing complexity of ANN model creation, hopefully allowing for this approach to be used by a wider group of researchers with skills in areas other than activity measurement. In future studies we plan to extend our comparison of different
Also, we plan to experiment with using data from multiple monitors to further improve measurement accuracy over that achieved with a single monitor. Finally, we intend to cross-validate the algorithms developed in the study in a true free-living setting to provide support for their future use for EE prediction in epidemiologic or surveillance research.

CHAPTER 4
COMPARISON OF ACTIVITY TYPE CLASSIFICATION ACCURACY FROM ACCELEROMETERS WORN ON THE WRISTS, HIP AND THIGH

ABSTRACT
The purpose of this study was to develop, validate, and compare the accuracy of activity type prediction models for accelerometers placed on the wrist, hip, and thigh. Additionally, we compared classification of activity type between accelerometers worn on the left and right wrists. Finally, we compared prediction accuracies for specific categories of activities (e.g., sedentary activities). METHODS: Forty-four healthy adults participated in a 90-minute simulated free-living activity protocol, in which participants performed a total of 14 activities (sedentary, ambulatory, lifestyle, and exercise activities, standing, cycling, stairs, and non-wear) for 3-10 minutes each. The order, duration, and intensity of activities were dictated by participants and recorded using direct observation (for a criterion measure of activity type). Four accelerometers were worn (right and left wrists, right hip, and right thigh) in order to predict activity type using artificial neural networks. The artificial neural networks were created using several sets of input features in order to determine those most relevant to activity type prediction. Classification accuracy of the artificial neural networks was evaluated using sensitivity, specificity, and area under the curve, with direct observation used as the criterion measure of activity type.
RESULTS: The wrist accelerometers achieved the highest overall classification accuracies for identifying all 14 activities (80.9-81.1%) as well as when similar activities were grouped into categories (86.6-86.7%). Additionally, classification accuracies were similar between left and right wrists. The hip accelerometer had the lowest overall classification accuracies (66.2-72.5%), with the thigh accelerometer accuracy higher than the hip but lower than the wrists (71.4-84.0%). Sedentary, lifestyle, and exercise activities were detected best with the wrist accelerometers, whereas the ambulatory activities had similar classification accuracies with all four accelerometer placements. Unlike our previous work with energy expenditure prediction (Chapter 3), more input features significantly improved classification accuracy. CONCLUSIONS: A single accelerometer placed on the left or right wrist provided the highest overall classification accuracy for activity type prediction as well as the highest accuracy for sedentary, lifestyle, and exercise activity categories in a simulated free-living setting.

INTRODUCTION

Objective measurement of physical activity (PA) and sedentary behavior (SB) is important for determining relationships between these lifestyle behaviors and health indices, identifying populations at high risk of having low PA and high SB levels, and evaluating the effectiveness of interventions designed to increase PA and/or decrease SB. Because of the interest (at both a population and personal level) in measuring PA and SB, many wearable devices, such as heart rate monitors, pedometers, and accelerometers, have been used in an attempt to quantify PA and SB. Accelerometers have emerged as the most popular method due to their relatively low participant and researcher burden as well as high accuracy for measuring physiologic variables such as energy expenditure and activity intensity (Welk 2002).
Traditional use of accelerometers has involved linear regression for predicting energy expenditure from “activity counts”, which are pre-processed and filtered acceleration signals from an accelerometer (Freedson, Melanson et al. 1998). However, in recent years the field has started to move away from the count-based regression approach because linear regression is often inadequate to capture the complex relationship of acceleration patterns and movement that occurs in free-living settings (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000) and cannot allow for determination of the type of activity being performed (Preece, Goulermas et al. 2009). A large body of recent research has focused on machine learning, a pattern recognition approach for modeling data, in order to predict energy expenditure as well as activity type using features extracted from accelerometer data (Preece, Goulermas et al. 2009). Using machine learning, researchers have achieved activity classification accuracies consistently over 70% (Staudenmayer, Pober et al. 2009; Dong, Montoye et al. 2013) and often over 90% (Zhang, Rowlands et al. 2012; Cleland, Kikhia et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et al. 2014) using data from a single accelerometer worn on different parts of the body. Although accelerometers have traditionally been worn on the hip, machine learning has yielded high activity classification accuracy from accelerometers worn on the wrist, thigh, and ankle (Cleland, Kikhia et al. 2013; Dong, Montoye et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et al. 2014). Of the many placement sites tested in different studies, the wrist and thigh hold significant promise in the context of machine learning approaches to data analysis. Wrist-mounted accelerometers have been used successfully for sleep measurement (Jean-Louis, Kripke et al.
2001) and are also being used in the 2011-2014 cycle of NHANES data collection in the hope of improving compliance (Troiano and McClain 2012). Moreover, several studies have achieved high accuracy for activity type classification with wrist accelerometers (Zhang, Rowlands et al. 2012; Cleland, Kikhia et al. 2013; Mannini, Intille et al. 2013), further demonstrating the utility of the wrist as a promising measurement site. The convention is for wrist-mounted accelerometers to be worn on the non-dominant wrist, but it may be that this convention is unnecessary. A study by Zhang et al. found similar classification accuracies from accelerometers worn on the left and right wrists for four types of activities (sedentary, household, walking, and running). However, it is unknown if other kinds of activities, especially activities that may vary considerably between dominant and non-dominant hands (e.g., sweeping, computer use, etc.), are detected with similar accuracy for each wrist. If choice of wrist placement does not affect measurement accuracy, there may be important implications for improving compliance and comfort with wrist-worn accelerometers. Thigh-mounted accelerometers possess significant potential as a placement site due to their high accuracy for measuring SB (Grant, Ryan et al. 2006; Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012) and high accuracy for prediction of energy expenditure (Chapter 3). Studies have found that activity type classification accuracies from a thigh-mounted accelerometer are similar to or higher than accuracies achieved with a wrist-mounted accelerometer (Cleland, Kikhia et al. 2013; Dong, Biswas et al. 2013; Skotte, Korshoj et al. 2014), but it is unknown if the thigh can provide high measurement accuracies across a wide variety of activities.
Despite the utility of the hip, wrist, and thigh as accelerometer placement sites, few studies have directly compared activity classification accuracies among these sites to determine their overall accuracy as well as classification of different types of activities such as sedentary, ambulatory, or lifestyle activities. One study by Cleland et al. found classification accuracies above 95% for accelerometers located on the hip, wrist, and thigh, but the small number of activities performed and small number of participants limit our understanding of the advantages and disadvantages of each placement site. Other studies by Dong et al. (Dong, Montoye et al. 2013) and Skotte et al. (Skotte, Korshoj et al. 2014) compared two of these three placement sites but did not compare all three. Therefore, further research is needed to directly compare classification accuracies of hip-, wrist-, and thigh-mounted accelerometers. Another research gap is the lack of machine learning algorithm validation for activity type classification in free-living settings. Most previous studies have been conducted in laboratory-based settings with participants performing a set list of activities in a pre-specified order, for a pre-specified period of time, and at a constant, pre-specified intensity (Cleland, Kikhia et al. 2013; Dong, Montoye et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et al. 2014). These laboratory-based settings ensure that high control is exerted over the protocol and can provide valuable insight as to the strengths and weaknesses of predictive algorithms for classifying different types and intensities of activities. However, the lack of variation allowed in laboratory-based protocols makes the laboratory setting very different from a free-living environment, where individuals are not constrained to a certain order, intensity, or timing of activities.
Previous work with cut-points as well as machine learning provides evidence that predictive techniques validated in the laboratory perform with much lower accuracy when applied to a free-living setting (Swartz, Strath et al. 2000; Gyllensten and Bonomi 2011; Lyden, Keadle et al. 2013). In one such study, Lyden et al. (Lyden, Keadle et al. 2013) found that an ANN created from laboratory-based activity data had a bias of over 33% when used to predict energy expenditure from data collected in a free-living setting. Similarly, a study by Gyllensten et al. showed that activity type machine learning algorithms developed in a laboratory had classification accuracies 15-20% lower in a free-living setting than in the laboratory-based setting in which they were created (Gyllensten and Bonomi 2011). Therefore, activity type prediction algorithms need to be created and validated in a free-living setting in order to have true utility for activity measurement in epidemiologic, surveillance, or intervention studies. Given the current gaps that exist with regard to activity type classification, the purposes of our study were 1) to develop and validate ANNs (using several sets of features) for prediction of activity type from accelerometers worn on the wrists, hip, and thigh, 2) to compare the activity classification accuracies achieved among these accelerometer placement sites, 3) to compare the overall activity classification accuracies of accelerometers placed on the left and right wrists, and 4) to compare classification accuracies for specific activity types, activity categories (i.e., lifestyle, exercise, sedentary, and ambulatory activities), and activity intensities (i.e., sedentary, light, etc.) using data collected in a simulated free-living setting.
METHODS

Summary of protocol

Participants came to the Human Energy Research Laboratory to participate in a 90-minute simulated free-living protocol, for which they performed a total of 14 sedentary, ambulatory, lifestyle, and exercise activities. Each activity was performed for between 3-10 minutes, with the order, duration, and intensity of activities left up to participants. During the protocol, participants wore four accelerometers, and the order and durations of their activities were recorded by a trained observer and used as a criterion measure of activity type.

Participants

A total of 44 adults (22 male, 22 female) were recruited from the area surrounding East Lansing, MI via email, flyers, and word of mouth for participation in this study. Participants had to fulfill three criteria to be eligible for the study: 1) they had to be free of health conditions preventing them from being able to safely perform moderate- or vigorous-intensity activities, 2) they could not have orthopedic limitations that would invalidate the use of accelerometry for activity measurement, and 3) they had to fall within the age range of 18-44 years. Prior to participant recruitment, this study was approved by the Michigan State University Institutional Review Board. All participants provided written informed consent prior to their participation in the study.

Instrumentation

The activity monitors used in this study were ActiGraph GT3X+ accelerometers and GENEActiv accelerometers. Additionally, an iPAQ portable digital assistant (PDA) computer was used by observers to record the activities performed during the protocol. The acceleration data for all four accelerometers were time stamped and stored within the monitors and later were downloaded to a computer for analysis. Additionally, the accelerometers were oriented so that the x-axis was the vertical axis, the y-axis was the medial-lateral axis, and the z-axis was the anterior-posterior axis.
All accelerometers and the PDA were synchronized to an external clock prior to the start of data collection. Descriptions of the accelerometers and PDA follow.

ActiGraph accelerometers

The ActiGraph (ActiGraph LLC, Pensacola, FL) is a commonly used, commercially available accelerometer, and there is an abundance of literature regarding its reliability and validity for measurement of PA (Freedson, Melanson et al. 1998; Matthew 2005; McClain, Sisson et al. 2007). Two GT3X+ models were worn by each participant during the study. One accelerometer was placed on the midline of the right thigh, one third of the way between the hip and knee and adhered to the leg with hypoallergenic sticky tape. The other ActiGraph was mounted on the right hip, at the anterior axillary line, with an elastic belt. The ActiGraph GT3X+ records raw accelerations of up to ± 6 times the gravitational force (6g) in three axes of movement. For the current protocol, the accelerometers were set to record data at a rate of 40 samples per second (40 Hz).

GENEA accelerometers

The GENEActiv (Activinsights Ltd, Kimbolton, Cambridgeshire, UK) is a new accelerometer that has had preliminary validation for PA measurement (Esliger, Rowlands et al. 2011) as well as activity type classification (Zhang, Rowlands et al. 2012). Like the ActiGraph, the GENEA records raw data of up to ± 6g in three axes of movement. The GENEAs were set to record acceleration data at a rate of 20 Hz for the current study. Participants wore two GENEA accelerometers (one on each wrist) for this study. Each GENEA was fastened securely to the dorsal side of the wrist, between the styloid processes of the radius and ulna (Esliger, Rowlands et al. 2011).

iPAQ portable digital assistant and direct observation

Direct observation (DO) was conducted using an HP iPAQ PDA (HP Development Company, Palo Alto, CA) to obtain a criterion measure of activity type for this study.
During the protocol, a trained observer used a portable digital assistant with BEST software developed based on the Children’s Activity Rating Scale protocol (Puhl, Greaves et al. 1990). The numeric codes T1-T14 represented the 14 activities in the visit, and the observer recorded the activities being performed continuously as they occurred throughout the visit. A list of activities and their specific DO codes can be found in Table 4.1. Inter-rater reliability for DO was above r=0.90 for this study.

Procedure

Each participant reported to the Human Energy Research Laboratory, where details of the study were discussed with each participant. Written informed consent was obtained, and a physical activity readiness questionnaire was administered to ensure that the participant was healthy and had no contraindications to engaging in activity. If participants had answered ‘yes’ to any question on the questionnaire, they would have been required to obtain physician approval before being able to participate in the study; however, this did not occur. After consenting to participation, participant weight and height were taken by trained research assistants according to standardized methods (Malina 1995). Weight was measured to the nearest 0.1 kg using a Seca digital scale (Seca, Hanover, Germany), with shoes off and weight balanced on the center of the scale. Height was measured to the nearest 0.1 cm using a Harpenden stadiometer (Holtain Ltd., Crymych, United Kingdom). For measurement of height, the participant removed his/her shoes, stood erect with feet flat on the floor, aligned head in the Frankfurt plane, and placed the back of the feet, shoulders, and head against the back of the board. Two measurements were taken and averaged for both weight and height. If the two body weights differed by more than 0.3 kg or if the two heights differed by more than 0.4 cm, a third measurement was taken, and the closest two measurements were averaged to obtain a final value.
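The repeat-measurement rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure's actual software: the function name `final_measurement` and its arguments are hypothetical, and only the tolerances (0.3 kg, 0.4 cm) come from the text.

```python
from itertools import combinations


def final_measurement(first, second, tolerance, third=None):
    """Return the final value for a weight or height measurement.

    Hypothetical helper: averages the two measurements when they agree
    within `tolerance`; otherwise requires a third measurement and
    averages the two closest values.
    """
    if abs(first - second) <= tolerance:
        return (first + second) / 2.0
    if third is None:
        raise ValueError("measurements disagree; a third measurement is needed")
    # Pick the pair of measurements with the smallest absolute difference.
    closest = min(combinations((first, second, third), 2),
                  key=lambda pair: abs(pair[0] - pair[1]))
    return sum(closest) / 2.0


# Two weights (kg) within the 0.3 kg tolerance: a plain average.
weight = final_measurement(70.0, 70.2, tolerance=0.3)

# Two heights (cm) more than 0.4 cm apart: the third value breaks the tie.
height = final_measurement(170.0, 171.0, tolerance=0.4, third=170.2)
```

In the second call the closest pair is 170.0 and 170.2, so the outlying 171.0 reading does not influence the final value.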
Body mass index (BMI) was calculated by dividing body weight by the square of height (kg/m²). Age was assessed by asking participants to state their age in years. Handedness was determined by asking participants which hand they prefer to use for the majority of activities. Each participant wore one ActiGraph on the hip, another ActiGraph on the thigh, one GENEA on the left wrist, and one GENEA on the right wrist while performing 14 activities (activities shown in Table 4.1). These activities comprised a range of intensities from sedentary to vigorous and represented a mixture of sedentary, ambulatory, exercise, and lifestyle activities. Ambulatory activities (walking and jogging) are common in accelerometer validation literature; we added the sedentary, exercise, and lifestyle activities to determine the potential for the four accelerometer placements to accurately measure a range of activity types often seen in free-living settings. Additionally, we added an activity where participants removed the accelerometers so that the ANNs would be able to recognize non-wear, which is important to be able to detect in free-living environments for compliance purposes.

Table 4.1. Activities performed during the simulated free-living protocol.
| Activity Category | Activity | Activity Intensity | Description of Activity* |
| --- | --- | --- | --- |
| Sedentary (SE) | Lying down (T1) | Sedentary | Lying on a mat on the floor |
| Sedentary (SE) | Reading (T2) | Sedentary | Reading a magazine article while sitting at a table |
| Sedentary (SE) | Computer (T3) | Sedentary | Sitting and playing a computer game that involves mouse clicking and typing |
| Standing (ST) | Standing (T4) | Light** | Standing still with arms at sides |
| Lifestyle (LI) | Laundry (T5) | Light | Folding towels and putting them in a laundry basket |
| Lifestyle (LI) | Sweeping (T6) | Light | Sweeping confetti into piles |
| Leisure walk (LW) | Walking slow (T7) | Light | Walking at a self-selected ‘slow’ pace in a hallway |
| Brisk walk (BW) | Walking fast (T8) | Moderate | Walking at a self-selected ‘brisk’ pace in a hallway |
| Jogging (JO) | Jogging (T9) | Vigorous | Jogging at a self-selected pace in a hallway |
| Cycling (CY) | Cycling (T10) | Moderate/Vigorous | Cycling on a cycle ergometer at a self-selected cadence of 50-100 rpm with 1 kg resistance |
| Stair use (SU) | Stair climbing and descending (T11) | Moderate/Vigorous | Walking up and down a flight of stairs at a self-selected pace |
| Exercise (EX) | Biceps curls (T12) | Light | Standing still while doing biceps curls with a 3-lb. weight in each hand |
| Exercise (EX) | Squats (T13) | Moderate | With feet shoulder-width apart, bending at the knees (to a 90° angle) while holding an unweighted broom behind the head |
| Non-wear (NW) | Non-wear of accelerometer (T14) | N/A | Not wearing the accelerometer |

* Activity order, intensity, and duration (3-10 minutes) were left up to participants.
** Standing has traditionally been considered SB; however, recent literature suggests that standing should be considered light-intensity instead of SB due to the differential physiologic effects of standing as compared to sitting/lying (Owen, Healy et al. 2010).

Participants completed a 90-minute simulated free-living protocol, which took place in a laboratory within the Human Energy Research Laboratory as well as a hallway and stairwell.
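For readers who want to reproduce the groupings in Table 4.1 programmatically, the mapping from the 14 DO codes to the 10 activity categories can be written as a simple lookup table. The sketch below is illustrative: the codes and category abbreviations come from Table 4.1, while the names `ACTIVITY_TO_CATEGORY` and `to_categories` are hypothetical.

```python
# DO code -> activity category abbreviation, as listed in Table 4.1.
ACTIVITY_TO_CATEGORY = {
    "T1": "SE", "T2": "SE", "T3": "SE",   # lying down, reading, computer
    "T4": "ST",                           # standing
    "T5": "LI", "T6": "LI",               # laundry, sweeping
    "T7": "LW",                           # walking slow (leisure walk)
    "T8": "BW",                           # walking fast (brisk walk)
    "T9": "JO",                           # jogging
    "T10": "CY",                          # cycling
    "T11": "SU",                          # stair climbing and descending
    "T12": "EX", "T13": "EX",             # biceps curls, squats
    "T14": "NW",                          # non-wear
}


def to_categories(do_codes):
    """Translate a sequence of DO codes into activity categories."""
    return [ACTIVITY_TO_CATEGORY[code] for code in do_codes]


# Hypothetical sequence of observed codes for a few consecutive windows.
example = to_categories(["T2", "T2", "T6", "T13", "T14"])
```

Keeping the grouping in one table means the same predictions can be scored both against all 14 activities and against the 10 categories without retraining anything.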
During the protocol, participants performed the 14 activities listed in Table 4.1. A list of these activities was given to participants at the beginning of the visit along with a description of how to perform each activity. Participants completed each of the 14 activities for a total of at least three minutes and for no more than 10 minutes, but the order, intensity, and duration of the activities were left up to each participant. A research assistant directly observed and recorded each activity on a handheld PDA computer while it was being performed. Additionally, activities were written on a whiteboard and checked off as participants completed each activity so that participants knew which activities they still needed to complete. Every 4-5 participants, the activities were erased and rewritten in a different order to avoid possible effects from the order in which the activities were written. For this study, DO served as the criterion measure of activity type performed. The non-wear activity was saved until the end of the 90-minute protocol so that participants would not spend a significant amount of time trying to remove and reattach the accelerometers. Upon completion of the protocol, participants were given a $35 Target® gift card.

Data reduction and modeling

Artificial neural networks

ANNs are nonlinear models which take a set of inputs x1…xk and use them to predict a certain output variable y (e.g., EE or activity type), where k is the number of features used to predict y. Figure 4.1 provides a graphical depiction of one of the activity type ANNs. For activity type classification, the ANNs functioned similarly to a logistic regression model. Setting the activity types as the nominal values a1…a14, the ANN model can be seen in Equation 1.
Equation 1:

$$\Pr(y = a_j) = C \exp\left( \sum_{h=1}^{H} w_{hj}\, U\left( \sum_{i=1}^{k} w_{ih}\, x_i \right) \right), \qquad j = 1, \ldots, 14$$

In Equation 1, Pr is probability, C is a constant chosen so that $\Pr(y=a_1)+\ldots+\Pr(y=a_{14})=1$, w are the weights, U is a logistic activation function, and H is the number of hidden units in the hidden layer. In accordance with previous research, our models contained only one hidden layer (Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009; Trost, Wong et al. 2012).

Figure 4.1. ANN for predicting activity type.

Figure 4.1 legend

* The number of output variables shown matches the number used in the study, but the number of input features varied from 8-38, depending on the feature set tested. Additionally, three hidden units are shown in the figure for simplicity, but we used 15 hidden units for constructing our ANNs.

Accelerometer signal features (one of each per axis, three total of each per accelerometer)
1. Mean = mean
2. Var = variance
3. Cov = covariance
4. Min = minimum
5. Max = maximum
6. MeanOR = mean accelerometer orientation
7. VarOR = variance of accelerometer orientation
8. 10th %ile = 10th percentile
9. 25th %ile = 25th percentile
10. 50th %ile = 50th percentile
11. 75th %ile = 75th percentile
12. 90th %ile = 90th percentile

Participant characteristics features
13. Ht = participant height
14. Wt = participant weight

Non-feature abbreviations
S = summations of the input layer in the hidden units
U = activation function for the hidden layer
W1 = the weight vectors for each of the inputs
W2 = the weight vectors for each of the summations

The ANNs were created and tested using a leave-one-out approach. In this approach, data from all but one participant were used to estimate the weights for each input feature for predicting activity type. Then, the ANN was tested on the data from the participant left out of the training phase by supplying the input features and comparing the predicted activity type from the ANNs to the recorded activity type from DO.
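As an illustration of how such participant-wise splits can be generated, the following Python sketch yields one training/testing partition per participant. It is a minimal sketch under stated assumptions rather than the code used in this study (the ANNs here were built in R): the function name `leave_one_participant_out` and the participant labels are hypothetical, and each element of the input list is assumed to identify the participant who produced the corresponding window of data.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_indices, test_indices) once per participant.

    `participant_ids[i]` is assumed to be the participant who produced
    row i of the feature table. All rows from the held-out participant
    go to the test set; every other row goes to the training set.
    """
    for held_out in sorted(set(participant_ids)):
        train_idx = [i for i, p in enumerate(participant_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical example: six windows of data from three participants.
window_owner = ["P01", "P01", "P02", "P02", "P02", "P03"]
splits = list(leave_one_participant_out(window_owner))

for held_out, train_idx, test_idx in splits:
    print(held_out, "train rows:", train_idx, "test rows:", test_idx)
```

Splitting by participant rather than by individual window keeps all of one person's data out of training, so the test accuracy reflects performance on someone the model has not seen.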
The leave-one-out cross validation is an iterative approach and was repeated with each participant’s data used as the testing data once, therefore obtaining an ANN for activity type for each participant in the study. Weights determined from each iteration of the leave-one-participant-out validation were averaged to obtain a final ANN for each accelerometer placement site, resulting in four distinct ANNs. There were two important considerations that were addressed in building our ANNs: 1) window length and 2) relevant features to use as input variables.

Window length

In order to analyze accelerometer data, it must first be divided into segments, called ‘epochs’ or ‘windows,’ for analysis. By dividing the data into windows, activity type can be assessed separately for each window to yield information on which activities were being performed as well as when they were performed. Windows of 60 seconds are commonly used for predicting energy expenditure while analyzing accelerometer data because summarizing a given energy expenditure or activity performed each minute is intuitively appealing and works well for steady-state activities (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011). Additionally, longer windows (i.e., 30-60 seconds) increase the amount of information available with which to determine activity type, and they have been shown to improve activity classification accuracy (Trost, Wong et al. 2012). However, a significant limitation of longer windows is that they are less useful in free-living situations, where activities rarely start or end exactly on the minute and where activities may last less than a minute in length (Lyden 2012). Thus, a 60-second window is likely to encompass more than one activity, resulting in frequent activity misclassification due to too little granularity in the output.
On the other hand, very short windows (e.g., less than one second) may not allow enough time to capture a movement (e.g., in one second a person may only take part of a step when walking), therefore yielding insufficient information to classify the movement and resulting in lower classification accuracy (Preece, Goulermas et al. 2009; Trost, Wong et al. 2012). Machine learning techniques have been conducted with window lengths as short as 0.25 seconds, but many studies use window lengths of 4-6.7 seconds for classifying activity type (Preece, Goulermas et al. 2009). Therefore, in accordance with previous research and for simplicity, we employed five-second windows for activity type classification in our data processing and analyses.

Features

There are several different types of features that can be used as input variables: time-domain features, frequency-domain features, and participant characteristics. Time-domain features are most commonly used because they can be directly computed from the accelerometer signal data, making them simple to extract and understand (Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009). Examples of time-domain features are mean, variance, covariance, and percentiles of the acceleration signal. The other main type of features, frequency-domain features, can be used either in conjunction with or independent from time-domain features, yielding similarly high accuracy for activity type classification as time-domain features in some studies (Preece, Goulermas et al. 2009; Mannini, Intille et al. 2013). However, frequency-domain features require mathematical transformations prior to computation and may require significant computational power and specialized statistical software (Preece, Goulermas et al. 2009).
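To make the windowing and time-domain feature ideas concrete, the sketch below splits a single-axis acceleration signal into five-second windows and computes a few simple features for each window. It is an illustration only (the study itself used Microsoft Excel and R), and several choices are assumptions: the function names are hypothetical, trailing samples that do not fill a whole window are dropped, the variance is the sample variance, and percentiles use a nearest-rank rule.

```python
import statistics


def split_into_windows(samples, sample_rate_hz, window_seconds=5):
    """Cut a signal into consecutive, non-overlapping full windows.

    Assumption: trailing samples that do not fill a window are discarded.
    """
    size = int(sample_rate_hz * window_seconds)
    n_full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_full)]


def nearest_rank_percentile(ordered, percent):
    """Nearest-rank percentile of an already sorted list (an assumed rule)."""
    rank = max(1, -(-percent * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]


def time_domain_features(window):
    """Compute simple time-domain features for one window of one axis."""
    ordered = sorted(window)
    features = {
        "mean": statistics.mean(window),
        "variance": statistics.variance(window),  # sample variance (n - 1)
        "min": ordered[0],
        "max": ordered[-1],
    }
    for percent in (10, 25, 50, 75, 90):
        features[f"p{percent}"] = nearest_rank_percentile(ordered, percent)
    return features


# Synthetic x-axis signal: 12 seconds at 20 Hz (240 samples), values 0..99 repeating.
x_axis = [float(i % 100) for i in range(240)]
windows = split_into_windows(x_axis, sample_rate_hz=20)
first = time_domain_features(windows[0])
```

With a 20 Hz signal each five-second window holds 100 samples, so the synthetic signal yields two full windows and the last 40 samples are left over.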
Additionally, several studies provide evidence that time-domain features can be used to achieve high activity classification accuracy (71-99%) from a single accelerometer without use of frequency-domain features (Herren, Sparti et al. 1999; Staudenmayer, Pober et al. 2009; Dong, Montoye et al. 2013; Montoye, Dong et al. 2013). Other than time- and frequency-domain features, simple descriptive features, such as accelerometer orientation or participant demographic variables, can also be used to improve measurement accuracy. Many accelerometer signal features have been used in previous research, and the models created have varied considerably in complexity and measurement accuracy. Adding more features may improve accuracy of the ANN; however, similarly to linear regression, addition of too many input variables may lead to overfitting ANNs to the data used for training, resulting in poor generalizability of the model when applied to a new population. Therefore, there must be a balance of number of features used and accuracy achieved. Another consideration of adding too many features is that it increases complexity of the models created and requires more computational power to create. This added complexity can quickly render machine learning models difficult to create or use for anyone lacking experience with computer programming and/or access to expensive computing software (Pober, Staudenmayer et al. 2006; Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009). Thus, we experimented with different sets of features to determine a set that had high accuracy of measurement without being overly complex. Before computing features, the 40 Hz data from the ActiGraph accelerometers were reintegrated to 20 Hz for comparison with the data from the GENEA. Table 4.2 provides a list of the 38 features tested and used in the current analyses. 
The 36 accelerometer features (12 features 126 for each of three axes) are all time-domain features that have been effectively utilized in previous studies, and height and weight were included to account for different body sizes. Since fivesecond windows were used for activity type classification, there were 100 accelerometer data points within each five-second window with which to calculate the necessary features (20 samples/second * 5 seconds). Mean, variance, covariance, minimum, maximum, mean and variance of monitor orientation, and the 10th, 25th, 50th, 75th, and 90th percentiles of the acceleration signal were calculated separately for x-, y-, and z-axes. After creating the ANNs using all 38 features, follow-up analyses were conducted to determine if a subset of features could reduce complexity of the ANNs with minimal loss of accuracy. The subsets tested are shown in Table 4.3. Additionally, feature sets 1 and 2 were tested with and without height and weight included as input features to determine if including demographic characteristics impacted classification accuracy. 127 Table 4.2. Features used for EE and activity type prediction. 
| Feature number | Feature used | Formula for calculating feature |
| --- | --- | --- |
| 1-3* | Mean acceleration signal | $\frac{1}{100}\sum_{i=1}^{100} A_{x,i}$ |
| 4-6* | Variance of acceleration signal | [formula garbled in source] |
| 7-9* | Covariance of acceleration signal | [formula garbled in source] |
| 10-12* | Minimum of acceleration signal | $\min(A_x)$ |
| 13-15* | Maximum of acceleration signal | $\max(A_x)$ |
| 16-18* | 10th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 10th value |
| 19-21* | 25th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 25th value |
| 22-24* | 50th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 50th value |
| 25-27* | 75th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 75th value |
| 28-30* | 90th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 90th value |
| N/A | Accelerometer orientation (needed for calculating features 31-36) | [formula garbled in source] |
| 31-33* | Mean accelerometer orientation | [formula garbled in source] |
| 34-36* | Variance of accelerometer orientation | [formula garbled in source] |
| 37 | Participant height | N/A |
| 38 | Participant weight | N/A |

* Signifies that one feature is included for each of the three accelerometer axes. The formulas shown are for the x-axis, but the formulas for the y- and z-axes are similar. Ax is the acceleration in the direction of the x-axis.

Table 4.3. Feature sets used for creation and testing of ANNs.
| Feature set number | Features used | Total number of features used |
| --- | --- | --- |
| 1 | Mean, variance, covariance, minimum, maximum, mean orientation, variance of orientation, and 10th, 25th, 50th, 75th, and 90th percentiles of acceleration signal, weight, and height | 38 (12 accelerometer features/axis * 3 axes + weight + height) |
| 2 | Mean and variance of acceleration signal, weight, and height | 8 (2 accelerometer features/axis * 3 axes + weight + height) |
| 3 | Mean, variance, minimum, and maximum of acceleration signal, weight, and height | 14 (4 accelerometer features/axis * 3 axes + weight + height) |
| 4 | Mean, variance, covariance, minimum, and maximum of acceleration signal, weight, and height | 17 (5 accelerometer features/axis * 3 axes + weight + height) |
| 5 | 10th, 25th, 50th, 75th, and 90th percentiles of acceleration signal, weight, and height | 17 (5 accelerometer features/axis * 3 axes + weight + height) |

Activity type classification

Although 14 activities were performed in the protocol, some activities could be combined into common groupings. Differentiating among the sitting activities (computer use and reading) and lying may be difficult using a single accelerometer on the thigh or hip since thigh movement was minimal and thigh and hip orientation were similar for all three activities. However, these sedentary activities elicit similar physiologic responses (Bey and Hamilton 2003), so differentiation among them was not of central importance in this study. Therefore, these three activities were grouped into a ‘sedentary’ category. Conversely, standing, which is considered SB in most studies since its energy cost is less than 1.5 METs (Ainsworth, Haskell et al. 2011), requires significant postural muscle contraction and elicits different physiologic responses from sitting and lying down (Hamilton, Hamilton et al. 2004; Hamilton, Hamilton et al. 2007). Thus, standing had its own category, separate from the sedentary category.
Squats and biceps curls are both exercise activities, so they were grouped into an ‘exercise’ category. Finally, laundry and sweeping are both lifestyle activities that are light intensity and involve intermittent movement of both the upper and lower body. Thus, these were combined into the ‘lifestyle’ category. The rest of the activities had their own categories. In summary, we evaluated activity classification accuracy for all 14 activities and then for the 10 categories. These 10 categories are displayed in the leftmost column of Table 4.1. It is important to note that these categories were not meant to imply that certain types of activities could not be in a different category (e.g., walking and running are often used for exercise rather than ambulation). Rather, these categories were developed to group similar activities and so give a better idea of the utility of the ANNs for activity classification. Additionally, a third grouping placed activities into intensity categories (sedentary, light, moderate, vigorous) in order to determine how well the ANNs can predict the relative intensity of an activity.

Identifying non-wear

Non-wear was classified as a separate activity type from the 13 other activities performed by the participants in the 90-minute free-living simulation. By creating a distinct category and training the ANNs to recognize non-wear, we hoped to eliminate the need to establish coding rules for how many minutes of consecutive zero counts determine non-wear when accelerometers are worn in free-living settings (Masse, Fuemmeler et al. 2005; Evenson and Terry 2009).

Direct observation

With the BEST software, DO data were recorded instantaneously and were not reintegrated into predefined windows for analysis. Therefore, within each five-second window of accelerometer data, there were one or more activities performed.
When participants transitioned between activities, the transition usually occurred in the middle of a five-second window (as opposed to exactly at the end of one window and the start of another), meaning that it was not possible to accurately predict the activity performed in that window. Therefore, each five-second window in which a transition between activities occurred was removed from the data set before training and testing the activity type ANNs. The removal of transition windows was necessary for validation purposes; when implemented in a free-living setting, transitions between activities and multiple activities performed in a single window cannot be classified correctly since predictive models can only predict one activity for a window. To minimize this issue, we used five-second windows instead of the longer windows used in many previous studies (Rothney, Neumann et al. 2007; Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009).

Statistical analyses

Classification accuracies were determined by calculating the sensitivity, specificity, and area under the curve (AUC) for each ANN. Operational definitions of these three variables follow.

Sensitivity: Sensitivity refers to the ability of each ANN to correctly classify an activity when it occurs (Parikh, Mathai et al. 2008). It represents the proportion of times an activity was predicted when it actually occurred. Percent agreement, which is equivalent to sensitivity, is most often reported in the literature for defining classification accuracy and was our primary measure of classification accuracy in this study.

Specificity: Specificity refers to the ability of each ANN to correctly classify an activity as not occurring when it does not occur. It represents the proportion of times an activity was not predicted when the activity, in fact, did not occur (Parikh, Mathai et al. 2008).
Area under the curve: AUC is the area under the receiver operating characteristic curve created by graphing sensitivity of a variable on the y axis and 1-specificity on the x axis. A value of 1.00 represents perfect classification accuracy and a value of 0.50 represents accuracy which is no better than what would be attained from chance alone. According to Metz, AUC values of ≥ 0.90 are considered excellent, 0.80-0.89 are good, 0.70-0.79 are fair, and <0.70 are considered poor classification accuracy (Metz 1978).

Sensitivities, specificities, and AUC were calculated for each accelerometer and each iteration of the leave-one-out validation. Differences among the hip, wrists, and thigh accelerometers were evaluated using repeated measures analysis of variance (RMANOVA). If the RMANOVA test statistic was significant, post hoc tests were conducted using dependent-samples t-tests and a least significant difference (LSD) correction in order to account for multiple comparisons and avoid inflation of type I error. Additionally, RMANOVA was used to compare classification accuracies among different feature sets. The a priori alpha level was set at P<0.05. After running primary analyses for the left- and right-wrist accelerometer placements, the data set was rearranged to compare dominant vs. non-dominant wrist placements, and dependent-samples t-tests were run to compare overall sensitivities for the dominant vs. non-dominant wrist. Confusion matrices were created for each of the four accelerometer placements, with the actual activity performed as the rows of each matrix and the predicted activity as the columns of each matrix.

Power analysis

We desired 80% power to detect a difference of at least moderate effect size (ES=0.5) among accuracies of accelerometers compared to the criterion measure. Therefore, with the α level set at 0.05, we needed 34 participants to be sufficiently powered to detect a moderate effect size difference among groups.
We chose to oversample by 10 participants in order to have adequate sample size despite an expected loss of a few participants due to the possibility of equipment malfunction, especially when using multiple accelerometers, a handheld computer, and a portable metabolic analyzer (used for a different aim of the study).

RESULTS

Data were collected from 44 participants for the current study. However, significant data loss from the accelerometers occurred in two participants, and there was an Oxycon portable metabolic analyzer malfunction in three other participants which resulted in premature termination of data collection. The remaining 39 participants who completed the 90-minute protocol and had usable data were included in analyses (shown in Table 4.4). Those excluded from the analysis were not statistically different from those included in terms of demographic characteristics.

Table 4.4. Demographic characteristics of participants enrolled in study.
 | All (n=39) | Males (n=19) | Females (n=20)
Age (years) | 22.1 (4.3) | 23.7 (5.0) | 20.5 (2.7)
Weight (kg) | 72.4 (16.2) | 84.5 (13.1) | 60.8 (8.9)
Height (cm) | 171.4 (10.1) | 179.1 (7.7) | 164.1 (5.7)
BMI (kg/m^2) | 24.4 (3.6) | 26.3 (3.4) | 22.5 (2.6)
Data are displayed as mean (SD).

As most studies present classification accuracy in terms of sensitivity only, we present the first part of our analysis in terms of sensitivity. Sensitivities for each accelerometer placement are shown in Figure 4.2. The sensitivities were as high as 80.9% and 81.1% for the left and right wrist accelerometer placements, respectively, with feature set 1. Both wrist placements had significantly higher sensitivities than the thigh or hip placements, and this difference existed for all five sets of features tested. Additionally, the thigh placement had significantly higher sensitivity than the hip placement for all five feature sets. For all five sets of features, the two wrist placements achieved similar overall sensitivities.
Finally, feature sets 1 and 2 were modified to exclude height and weight as input features to determine if these demographic characteristics affected classification accuracy. For both feature sets, classification accuracies were unchanged by excluding height and weight as predictor variables.

Figure 4.2 also shows comparisons of classification accuracies achieved among the five feature sets. For all four accelerometers, feature set 1 provided the highest sensitivity, and feature set 2 provided the lowest sensitivity. Additionally, the ANNs created with feature sets 4 and 5 provided sensitivities similar to those achieved using feature set 1, but improvements from feature set 2 were no longer statistically significant for the wrist accelerometers. The ANNs created from feature set 3 yielded similar sensitivities to feature set 1 for the thigh and both wrist accelerometers but significantly lower sensitivity with the hip accelerometer. Due to the superiority of feature set 1 compared to the other feature sets, further analyses were performed using feature set 1.

Figure 4.2. Sensitivity for the four accelerometer placements, compared among feature sets.
[Figure 4.2: bar chart of sensitivity (%) on the y axis (0-100) against feature set (1-5) on the x axis, with one bar each for ActiGraph Hip, ActiGraph Thigh, GENEA L. Wrist, and GENEA R. Wrist.]
The * indicates significant differences from all other accelerometer placement sites. The † indicates significant differences from feature set 1 (all 38 features). The ^ indicates significant differences from feature set 2.

Table 4.5 provides a comparison of the sensitivity, specificity, and AUC of each accelerometer placement using feature set 1.
All three measures were significantly higher for the two wrist-mounted accelerometers than the thigh or hip accelerometer placements, and all three were also significantly higher for the thigh than the hip placement. The magnitude of differences was much larger for sensitivity than specificity, which was consistently high across all accelerometer placements. With AUC values of 0.90, both wrists achieved excellent classification accuracy; in contrast, the hip and thigh placement sites achieved good classification accuracy with AUC values of 0.82 and 0.84, respectively.

Table 4.5. Overall sensitivity, specificity, and AUC for each of the four accelerometer placements for feature set 1.
 | ActiGraph Hip | ActiGraph Thigh | GENEA Left Wrist | GENEA Right Wrist
Sensitivity (%) | 66.4 (65.9-66.8) | 71.7 (71.3-72.1)* | 81.3 (81.0-81.7)^ | 81.4 (81.1-81.8)^
Specificity (%) | 97.4 (97.2-97.5) | 97.8 (97.7-97.9)* | 98.5 (98.4-98.6)^ | 98.5 (98.4-98.7)^
AUC | 0.82 (0.80-0.84) | 0.84 (0.83-0.86)* | 0.90 (0.89-0.91)^ | 0.90 (0.88-0.91)^
Values are reported as mean (95% CI). The * indicates significant differences from the hip accelerometer placement. The ^ indicates significant differences from the hip and thigh accelerometer placements.

Confusion matrices

The confusion matrices for each of the four accelerometer placement sites (using the ANN created using feature set 1) can be found in Tables 4.6-4.9. The rows of each confusion matrix are the actual activities performed, and the columns in each matrix represent the activities predicted by the ANN. In Tables 4.6-4.9, the “Total” column represents the total number of five-second windows of data recorded for each activity, combined for all 39 participants. The cells highlighted in gray represent the number of windows correctly classified for each activity. Table 4.10 shows the overall sensitivity, specificity, and AUC values (calculated from the data in the confusion matrices) across all 14 activities for each of the four accelerometer placements.
Overall AUC was 0.90 for each of the wrist accelerometer placements, indicating excellent classification accuracy according to parameters suggested by Metz (Metz 1978). The hip and thigh placements achieved AUC values of 0.82 and 0.85, respectively, indicating good classification accuracy. To calculate sensitivity for a specific activity, we divided the number of correctly classified windows by the total number of windows in which that activity was performed. For example, in Table 4.6, lying was correctly classified by the hip accelerometer in 2,677 of the 2,948 windows in which lying took place, resulting in a sensitivity of 90.8%. For an example of specificity, activities other than lying were performed for a total of 39,277 windows for the hip placement (Table 4.6). Of these, the hip accelerometer ANN predicted lying as the activity performed only 179 times, resulting in a specificity of (39,277-179)/39,277 = 0.995. The AUC values were calculated, based on the sensitivity and specificity values, using Microsoft Excel.

A significant advantage of displaying data with a confusion matrix is that the matrix allows one to assess misclassification to determine potential weaknesses of activity classification from each accelerometer placement site and identify the types of activities for which each site has the highest classification accuracy. From the confusion matrices, it is apparent that the thigh placement performed best for the slow walk and stairs (although the hip was within 2% for stairs). At 77.1% sensitivity, the hip accelerometer placement site was best for the fast walk; conversely, the wrist sites were best for jogging, although all four placement sites correctly recognized jogging more than 90% of the time. For the exercise activities, the wrist accelerometer placements achieved sensitivities close to 90%. The thigh had similar sensitivity for squats but much lower sensitivity for classifying biceps curls (53.4%).
Similarly, for the lifestyle activities, the wrist placements outperformed the hip and thigh placements, achieving sensitivities close to 80% for each activity (with the hip at 50-60% and the thigh at 59-73%). Moreover, two of the three sedentary activities (reading and computer use) were least likely to be detected by the hip placement (Table 4.6) and most likely to be detected by the two wrist placements (Tables 4.8 and 4.9), although sensitivity for recognizing lying down was highest with the hip (90.8%).

With the hip and thigh accelerometer placements, reading and computer use had low sensitivities (36-55%) due to frequent misclassification of one activity as the other. Additionally, the hip accelerometer ANN often misclassified these activities as standing (13-23% of the time), while the thigh accelerometer ANN incorrectly predicted these activities as lying down 12.7-19.3% of the time. For sweeping and laundry (the lifestyle activities), the hip and thigh ANNs often misclassified one as the other (7-16% of the time) or as biceps curls (15-22% of the time). Additionally, activities that took place while standing with minimal movement (standing, biceps curls) were often classified incorrectly by the hip and thigh accelerometer placements, with one often mistaken as the other. Cycling was not well-recognized by the hip as it was often misclassified as a lifestyle activity (8.0% for laundry and 14.6% for sweeping); conversely, cycling was detected with >84% sensitivity with the other three accelerometer placements. Finally, all four accelerometers had trouble distinguishing between the two walking speeds, frequently misclassifying one as the other or as stairs (9-14% of the time).
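The per-activity calculations described in this section can be reproduced directly from a confusion matrix. The Python sketch below is illustrative only and is not the workflow used in this study (which relied on Microsoft Excel and R); all function names are hypothetical. It collapses the hip confusion matrix (Table 4.6) to "lying" versus "all other activities" and recomputes the sensitivity and specificity quoted in the text. The single-point AUC, taken here as the mean of sensitivity and specificity, is an assumption about how AUC can be obtained from one sensitivity/specificity pair; the text does not state the exact formula that was used.

```python
def sensitivity_specificity(matrix, k):
    """Return (sensitivity, specificity) for class index k.

    matrix[i][j] is the number of windows in which activity i was actually
    performed and activity j was predicted (rows = actual, columns = predicted).
    """
    total = sum(sum(row) for row in matrix)
    true_pos = matrix[k][k]
    actual_pos = sum(matrix[k])                      # windows where k occurred
    false_pos = sum(row[k] for row in matrix) - true_pos
    actual_neg = total - actual_pos                  # windows where k did not occur
    sensitivity = true_pos / actual_pos
    specificity = (actual_neg - false_pos) / actual_neg
    return sensitivity, specificity


def single_point_auc(sensitivity, specificity):
    """Area under a ROC curve drawn through one operating point (assumed formula)."""
    return (sensitivity + specificity) / 2


def metz_category(auc):
    """Qualitative label for an AUC value using the cut-points attributed to Metz."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    return "poor"


# Hip placement, lying vs. all other activities, collapsed from Table 4.6:
# 2,677 of 2,948 lying windows were predicted as lying (271 were missed);
# 179 of 39,277 non-lying windows were predicted as lying.
hip_lying = [
    [2677, 271],
    [179, 39277 - 179],
]

sens, spec = sensitivity_specificity(hip_lying, 0)
auc = single_point_auc(sens, spec)
print(f"sensitivity = {sens:.1%}")   # 90.8%
print(f"specificity = {spec:.3f}")   # 0.995
print(f"AUC = {auc:.2f} ({metz_category(auc)})")
```

The same function applies unchanged to the full 14 x 14 matrices, because it only relies on row sums, column sums, and the diagonal.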
Activity categories

Upon further examination of the four confusion matrices (Tables 4.6-4.9), it was apparent that classification accuracies were lowest among the sedentary activities for the hip and thigh accelerometer ANNs, with frequent misclassification of one sedentary activity as another sedentary activity (e.g., reading as computer use or vice versa). However, for our purposes, it was less critical to be able to differentiate among sedentary activities than it was to be able to correctly identify when a sedentary activity occurred (vs. a non-sedentary activity); therefore, we performed follow-up analyses combining lying, reading, and computer use into a ‘sedentary’ category. Similarly, we combined laundry and sweeping into a ‘lifestyle’ category and squats and biceps curls into an ‘exercise’ category, therefore leaving 10 activity categories. We chose not to group the two walking speeds since they represent different intensities of movement (i.e., light vs. moderate) and may have different health implications.

Instead of displaying four additional confusion matrices, we summarized overall classification accuracies for each accelerometer placement in Table 4.11. As can be seen, overall AUC values improved for all four accelerometer sites when combining similar activities into categories, with the largest improvement seen for the thigh accelerometer placement (AUC of 0.91). Additionally, when looking at overall sensitivity for classification, all four accelerometer placement sites saw increased sensitivity, with the largest improvement, 12.6% (71.4% to 84.0%), seen in the thigh placement. The hip ANN accuracy improved 6.1%, and wrist placements’ accuracies improved between 5.6-6.3%.
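Combining activities into categories amounts to summing the matching rows and columns of a confusion matrix. The following Python sketch shows one way to do this; it is not the procedure used in this study, and the `collapse` function and the "OTHER" label are hypothetical. The counts are the lying, reading, and computer rows of the thigh confusion matrix (Table 4.7), with the 11 non-sedentary predicted columns summed into "OTHER" for brevity.

```python
from collections import defaultdict


def collapse(counts, category_of):
    """Merge rows and columns of a confusion matrix by category.

    counts[actual][predicted] is a number of five-second windows;
    category_of maps every activity label to its category label.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for actual, row in counts.items():
        for predicted, n in row.items():
            merged[category_of[actual]][category_of[predicted]] += n
    return {actual: dict(row) for actual, row in merged.items()}


# Thigh placement (Table 4.7), sedentary rows only. "OTHER" is the sum of the
# remaining 11 predicted columns in each row (an aggregation made for this example).
thigh = {
    "LY": {"LY": 1590, "RE": 711, "CO": 310, "OTHER": 337},
    "RE": {"LY": 643, "RE": 1281, "CO": 1224, "OTHER": 184},
    "CO": {"LY": 440, "RE": 884, "CO": 1889, "OTHER": 244},
}
category_of = {"LY": "SE", "RE": "SE", "CO": "SE", "OTHER": "OTHER"}

merged = collapse(thigh, category_of)
sedentary_row = merged["SE"]
sedentary_sensitivity = sedentary_row["SE"] / sum(sedentary_row.values())
print(f"{sedentary_sensitivity:.1%}")  # 92.1%
```

Windows in which one sedentary activity was mistaken for another now count as correct, which is why the thigh sensitivity for the sedentary category (92.1% in Table 4.11) is far higher than its sensitivities for lying, reading, or computer use individually.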
After combining into 10 categories, sedentary activities were still classified with lowest sensitivity with the hip accelerometer placement (72.6%), although the sensitivity achieved with the thigh placement (92.1%) approached that achieved by the wrist-mounted accelerometers (92.7-93.5%). Standing was classified with much lower sensitivity by the hip and thigh placements (56.9-69.6%) than the wrist placements (90.0-90.2%) due to frequent misclassification with biceps curls (exercise category). Also, both the lifestyle and exercise activities were best classified by the wrist placement sites (89.8-90.5% and 88.9-90.1%) compared to the hip (68.5% and 60.6%) and thigh (83.1% and 70.3%) accelerometer placements. Furthermore, jogging was classified with over 90% sensitivity for all placement sites but was slightly better with the wrists (95.3-96.1%) than the hip (92.5%) or thigh (93.1%). Finally, non-wear was detected with over 80% sensitivity for all placements but was highest among the wrist sites (88.8-91.3%). Therefore, the two wrist accelerometer placements appeared to provide superior classification accuracies overall as well as for many of the specific activities and activity categories.

Activity intensity categories

Lastly, we combined the 14 activities according to their intensity in order to determine how well each accelerometer placement site could correctly classify activity intensity. The MET values elicited by each activity are shown in Table 4.12, both as estimated from the Compendium of Physical Activities and as measured (average METs) by the portable metabolic analyzer during the visit. Activities that were seated or lying and required less than 1.5 METs were classified as sedentary (SBRN 2012). Activities with an intensity of 1.5-2.9 METs were classified as light, those with an intensity of 3.0-5.9 METs as moderate, and those with an intensity at or above 6.0 METs as vigorous (PAGAC 2008).
Both methods for intensity classification resulted in the same intensity categorization for each activity, yielding three sedentary activities, five light-intensity activities, three moderate-intensity activities, and two vigorous-intensity activities (non-wear was not included in an intensity category). Sensitivity, specificity, and AUC for correct classification of activity intensity can be seen in Table 4.13. Overall sensitivities increased for all four accelerometer placements compared to sensitivities achieved when classifying individual activities or activity categories. Additionally, sensitivity increased the most in the thigh placement, surpassing the sensitivities achieved by the wrist placement sites. Specificity dropped slightly for all placements, resulting in no change in AUC for the hip or wrist placements compared to that achieved for classification of 10 activity categories. However, the AUC for the thigh placement increased to 0.94 and was significantly higher than that achieved by the wrist placements.

Examining differences between the left and right wrists for classification accuracy yielded only small differences. The right wrist classified lying and computer use better than the left wrist, whereas the left wrist had a higher classification accuracy for reading than the right wrist; however, when grouped into a sedentary category, classification accuracies were less than 1% different between the wrist monitors (93.5% for left wrist and 92.7% for right wrist). In fact, the only activities with more than a 1% difference in classification accuracies between wrists were cycling (3.1% higher for left wrist), exercise activities (1.2% higher for right wrist), stairs (1.1% better for right wrist), and non-wear (2.5% better for right wrist).
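The MET cut-points used for the intensity categories in this subsection can be expressed as a small rule. The Python sketch below is illustrative only; the function name and the posture flags are assumptions made for this example, and the MET values are the experimentally measured means from Table 4.12. It also assumes that an activity below 1.5 METs that is not seated or lying (standing, measured at 1.4 METs) is treated as light, which is how Table 4.12 categorizes it, and that the cut-points act as half-open intervals.

```python
from collections import Counter


def classify_intensity(mets, seated_or_lying):
    """Map a MET value and posture flag to an intensity category."""
    if seated_or_lying and mets < 1.5:
        return "sedentary"
    if mets < 3.0:
        return "light"       # includes standing below 1.5 METs (assumption)
    if mets < 6.0:
        return "moderate"
    return "vigorous"


# (measured mean METs from Table 4.12, seated-or-lying flag assumed here)
measured = {
    "Lying": (1.4, True),
    "Reading": (1.4, True),
    "Computer": (1.4, True),
    "Standing": (1.4, False),
    "Laundry": (2.1, False),
    "Sweeping": (2.5, False),
    "Biceps curls": (2.0, False),
    "Walk slow": (2.9, False),
    "Walk fast": (4.2, False),
    "Cycling": (4.4, True),
    "Squats": (4.5, False),
    "Stairs": (6.8, False),
    "Jogging": (8.0, False),
}

intensity = {name: classify_intensity(m, s) for name, (m, s) in measured.items()}
counts = Counter(intensity.values())
print(dict(counts))
```

Under these assumptions the rule yields three sedentary, five light, three moderate, and two vigorous activities, matching the counts reported in the text.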
Since four of the 39 participants included in the analyses reported being left-hand dominant, we also analyzed overall classification accuracy between the dominant and non-dominant wrist accelerometer placements (as opposed to strictly comparing left and right wrists). As can be seen in Figure 4.3, no significant differences in overall classification accuracy existed between the dominant and non-dominant wrist placements for any of the five feature sets.

Table 4.6. Confusion matrix for activity type classification from a hip-mounted ActiGraph accelerometer.
AG Hip | LY | RE | CO | ST | LA | SW | WS | WF | JO | CY | BC | SQ | SR | NW | Total
LY | 2677 | 17 | 1 | 11 | 70 | 3 | 2 | 1 | 0 | 6 | 53 | 0 | 1 | 106 | 2948
RE | 21 | 1215 | 1055 | 441 | 108 | 20 | 4 | 0 | 0 | 45 | 149 | 3 | 1 | 270 | 3332
CO | 2 | 729 | 1354 | 795 | 102 | 11 | 5 | 0 | 0 | 14 | 179 | 0 | 1 | 265 | 3457
ST | 0 | 273 | 429 | 1447 | 44 | 12 | 2 | 0 | 0 | 21 | 238 | 0 | 0 | 75 | 2541
LA | 111 | 27 | 38 | 21 | 1726 | 425 | 30 | 1 | 2 | 310 | 612 | 100 | 2 | 6 | 3411
SW | 2 | 11 | 4 | 9 | 395 | 1473 | 34 | 0 | 0 | 404 | 32 | 79 | 12 | 0 | 2455
WS | 0 | 3 | 0 | 0 | 1 | 49 | 2057 | 298 | 20 | 284 | 0 | 13 | 349 | 0 | 3074
WF | 0 | 3 | 0 | 0 | 3 | 7 | 249 | 2228 | 13 | 4 | 4 | 19 | 360 | 0 | 2890
JO | 0 | 0 | 0 | 0 | 25 | 3 | 6 | 26 | 2223 | 0 | 0 | 13 | 105 | 2 | 2403
CY | 1 | 8 | 3 | 0 | 239 | 435 | 125 | 6 | 0 | 1970 | 170 | 2 | 27 | 1 | 2987
BC | 3 | 220 | 133 | 269 | 585 | 24 | 7 | 0 | 0 | 153 | 1139 | 9 | 2 | 0 | 2544
SQ | 0 | 1 | 11 | 1 | 67 | 97 | 53 | 0 | 0 | 126 | 7 | 1664 | 81 | 1 | 2109
SR | 0 | 0 | 0 | 0 | 1 | 9 | 241 | 114 | 68 | 8 | 0 | 14 | 3116 | 0 | 3571
NW | 39 | 527 | 143 | 85 | 10 | 4 | 0 | 0 | 2 | 4 | 6 | 0 | 0 | 3683 | 4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants). Rows are actual activities performed, and columns are predicted activities. LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO = Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.7. Confusion matrix for activity type classification from a thigh-mounted ActiGraph accelerometer.
AG Thigh | LY | RE | CO | ST | LA | SW | WS | WF | JO | CY | BC | SQ | SR | NW | Total
LY | 1590 | 711 | 310 | 0 | 3 | 2 | 1 | 0 | 0 | 8 | 63 | 3 | 5 | 252 | 2948
RE | 643 | 1281 | 1224 | 0 | 40 | 5 | 2 | 0 | 0 | 6 | 0 | 3 | 2 | 126 | 3332
CO | 440 | 884 | 1889 | 0 | 32 | 0 | 4 | 0 | 0 | 4 | 0 | 1 | 2 | 201 | 3457
ST | 3 | 26 | 28 | 1768 | 149 | 13 | 4 | 0 | 0 | 1 | 548 | 0 | 1 | 0 | 2541
LA | 18 | 71 | 102 | 54 | 2024 | 528 | 33 | 1 | 0 | 46 | 518 | 15 | 1 | 0 | 3411
SW | 1 | 4 | 5 | 9 | 534 | 1786 | 42 | 0 | 0 | 53 | 8 | 2 | 11 | 0 | 2455
WS | 1 | 0 | 0 | 0 | 16 | 64 | 2527 | 322 | 10 | 80 | 2 | 0 | 52 | 0 | 3074
WF | 0 | 0 | 0 | 0 | 9 | 2 | 366 | 2078 | 43 | 80 | 7 | 7 | 298 | 0 | 2890
JO | 0 | 0 | 0 | 0 | 1 | 3 | 44 | 27 | 2238 | 20 | 0 | 0 | 70 | 0 | 2403
CY | 50 | 29 | 2 | 0 | 116 | 9 | 12 | 1 | 9 | 2694 | 4 | 9 | 50 | 2 | 2987
BC | 0 | 16 | 13 | 409 | 680 | 20 | 7 | 0 | 0 | 18 | 1359 | 22 | 0 | 0 | 2544
SQ | 0 | 2 | 0 | 2 | 44 | 30 | 75 | 5 | 0 | 36 | 15 | 1875 | 21 | 4 | 2109
SR | 0 | 0 | 0 | 0 | 4 | 3 | 137 | 104 | 57 | 83 | 0 | 1 | 3182 | 0 | 3571
NW | 566 | 7 | 35 | 3 | 8 | 0 | 0 | 0 | 1 | 2 | 4 | 3 | 1 | 3873 | 4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants). Rows are actual activities performed, and columns are predicted activities. LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO = Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.8. Confusion matrix for activity type classification for a GENEA accelerometer mounted on the left wrist.
GE Left Wrist | LY | RE | CO | ST | LA | SW | WS | WF | JO | CY | BC | SQ | SR | NW | Total
LY | 2246 | 461 | 86 | 1 | 23 | 1 | 2 | 1 | 0 | 13 | 64 | 5 | 0 | 45 | 2948
RE | 406 | 2228 | 381 | 31 | 100 | 36 | 3 | 0 | 0 | 80 | 6 | 29 | 3 | 29 | 3332
CO | 199 | 374 | 2722 | 6 | 15 | 16 | 3 | 0 | 0 | 91 | 4 | 11 | 2 | 14 | 3457
ST | 5 | 53 | 7 | 2291 | 38 | 33 | 15 | 1 | 0 | 9 | 22 | 3 | 0 | 64 | 2541
LA | 91 | 83 | 4 | 25 | 2857 | 225 | 22 | 2 | 1 | 23 | 23 | 6 | 49 | 0 | 3411
SW | 3 | 24 | 3 | 17 | 273 | 1952 | 36 | 22 | 0 | 16 | 5 | 2 | 97 | 5 | 2455
WS | 7 | 5 | 1 | 18 | 56 | 31 | 2198 | 354 | 11 | 11 | 15 | 4 | 363 | 0 | 3074
WF | 2 | 0 | 0 | 3 | 26 | 17 | 380 | 2019 | 1 | 1 | 18 | 9 | 412 | 2 | 2890
JO | 0 | 0 | 0 | 2 | 11 | 2 | 12 | 10 | 2291 | 0 | 0 | 0 | 75 | 0 | 2403
CY | 8 | 55 | 184 | 15 | 51 | 32 | 9 | 1 | 0 | 2606 | 10 | 2 | 13 | 1 | 2987
BC | 89 | 18 | 7 | 29 | 77 | 10 | 10 | 51 | 0 | 14 | 2225 | 8 | 2 | 4 | 2544
SQ | 4 | 53 | 39 | 1 | 32 | 4 | 6 | 10 | 0 | 51 | 11 | 1894 | 4 | 0 | 2109
SR | 2 | 0 | 0 | 1 | 95 | 109 | 273 | 300 | 147 | 2 | 8 | 15 | 2619 | 0 | 3571
NW | 85 | 5 | 70 | 326 | 7 | 0 | 1 | 0 | 0 | 0 | 8 | 0 | 1 | 4000 | 4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants). Rows are actual activities performed, and columns are predicted activities. LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO = Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.9. Confusion matrix for activity type classification for a GENEA accelerometer mounted on the right wrist.
GE Right Wrist | LY | RE | CO | ST | LA | SW | WS | WF | JO | CY | BC | SQ | SR | NW | Total
LY | 2391 | 398 | 38 | 1 | 34 | 2 | 1 | 0 | 0 | 17 | 62 | 1 | 3 | 0 | 2948
RE | 519 | 1995 | 406 | 63 | 98 | 42 | 4 | 0 | 0 | 102 | 7 | 17 | 1 | 78 | 3332
CO | 101 | 210 | 2971 | 0 | 14 | 7 | 1 | 0 | 0 | 142 | 0 | 0 | 1 | 10 | 3457
ST | 9 | 105 | 4 | 2287 | 23 | 25 | 25 | 2 | 0 | 7 | 11 | 4 | 2 | 37 | 2541
LA | 87 | 78 | 2 | 16 | 2705 | 321 | 41 | 6 | 0 | 43 | 27 | 8 | 77 | 0 | 3411
SW | 14 | 12 | 0 | 14 | 342 | 1902 | 54 | 0 | 2 | 27 | 11 | 6 | 71 | 0 | 2455
WS | 3 | 4 | 0 | 11 | 67 | 30 | 2219 | 352 | 9 | 10 | 20 | 81 | 268 | 0 | 3074
WF | 0 | 0 | 0 | 1 | 32 | 24 | 406 | 2006 | 0 | 1 | 14 | 15 | 391 | 0 | 2890
JO | 0 | 0 | 0 | 3 | 3 | 0 | 5 | 18 | 2310 | 0 | 3 | 0 | 61 | 0 | 2403
CY | 51 | 72 | 195 | 2 | 79 | 38 | 11 | 3 | 0 | 2513 | 7 | 1 | 14 | 1 | 2987
BC | 26 | 1 | 6 | 36 | 61 | 17 | 29 | 20 | 0 | 10 | 2310 | 17 | 10 | 1 | 2544
SQ | 1 | 20 | 1 | 2 | 24 | 18 | 57 | 7 | 1 | 66 | 13 | 1854 | 44 | 1 | 2109
SR | 6 | 1 | 0 | 0 | 119 | 96 | 266 | 271 | 112 | 4 | 15 | 25 | 2656 | 0 | 3571
NW | 114 | 88 | 13 | 168 | 4 | 2 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 4111 | 4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants). Rows are actual activities performed, and columns are predicted activities. LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO = Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.10. Activity-specific sensitivity, specificity, and AUC among the four accelerometer placement sites.
Sensitivity (% agreement)
Activity | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
LY | 90.8 (3.3)* | 53.9 (5.7)* | 76.2 (4.9)* | 81.1 (4.5)*
RE | 36.5 (5.2)* | 38.4 (5.3)* | 66.9 (5.1)* | 59.9 (5.3)*
CO | 39.2 (5.2)* | 54.6 (5.3)* | 78.7 (4.3)* | 85.9 (3.7)*
ST | 56.9 (6.1)* | 69.6 (5.7)* | 90.2 (3.7) | 90.0 (3.7)
LA | 50.6 (5.3)* | 59.3 (5.3)* | 83.8 (3.9)* | 79.3 (4.3)*
SW | 60.0 (6.2)* | 72.7 (5.6)* | 79.5 (5.1)* | 77.5 (5.3)*
WS | 66.9 (5.3)* | 82.2 (4.3)* | 71.5 (5.1) | 72.2 (5.0)
WF | 77.1 (4.9)* | 71.9 (5.2)* | 69.9 (5.3) | 69.4 (5.4)
JO | 92.5 (3.4) | 93.1 (3.2) | 95.3 (2.7)^ | 96.1 (2.5)^
CY | 66.0 (5.4)* | 90.2 (3.4)* | 87.2 (3.8)* | 84.1 (4.2)*
BC | 44.8 (6.2)* | 53.4 (6.2)* | 87.5 (4.1)* | 90.8 (3.6)*
SQ | 78.9 (5.5)* | 88.9 (4.3) | 89.8 (4.1) | 87.9 (4.4)
SR | 87.3 (3.5)* | 89.1 (3.3)* | 73.3 (4.6) | 74.4 (4.6)
NW | 81.8 (3.6)* | 86.0 (3.2)* | 88.8 (2.9)* | 91.3 (2.6)*
Total | 66.2 (1.4)* | 71.4 (1.4)* | 80.9 (1.2) | 81.1 (1.2)

Specificity (%)
Activity | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
LY | 99.5 (0.8)* | 95.6 (2.4)* | 97.7 (1.7) | 97.6 (1.8)
RE | 95.3 (2.3) | 95.5 (2.2) | 97.1 (1.8)^ | 97.5 (1.7)^
CO | 95.3 (2.2) | 95.6 (2.2) | 98.0 (1.5)^ | 98.3 (1.4)^
ST | 95.9 (2.5)* | 98.8 (1.3) | 98.8 (1.3) | 99.2 (1.1)*
LA | 95.7 (2.2) | 95.8 (2.1) | 97.9 (1.5) | 97.7 (1.6)
SW | 97.2 (2.1)* | 98.3 (1.6) | 98.7 (1.4) | 98.4 (1.6)
WS | 98.1 (1.5) | 98.1 (1.5) | 98.0 (1.6) | 97.7 (1.7)
WF | 98.9 (1.2) | 98.8 (1.3) | 98.1 (1.6) | 98.3 (1.5)
JO | 99.7 (0.7) | 99.7 (0.7) | 99.6 (0.8) | 99.7 (0.7)
CY | 96.5 (2.1)* | 98.9 (1.2) | 99.2 (1.0) | 98.9 (1.2)
BC | 96.3 (2.3)* | 97.1 (2.1) | 99.5 (0.9) | 99.5 (0.9)
SQ | 99.4 (1.1) | 99.8 (0.6) | 99.8 (0.6) | 99.6 (0.9)
SR | 97.6 (1.6) | 98.7 (1.2)* | 97.4 (1.7) | 97.6 (1.6)
NW | 98.1 (1.3) | 98.4 (1.2) | 99.6 (0.6)^ | 99.7 (0.5)^
Total | 97.4 (0.5)* | 97.8 (0.4)* | 98.5 (0.4) | 98.5 (0.4)

AUC
Activity | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
LY | 0.95 (0.01)* | 0.75 (0.01)* | 0.87 (0.01)* | 0.89 (0.01)*
RE | 0.66 (0.01)* | 0.67 (0.01)* | 0.82 (0.01)* | 0.79 (0.01)*
CO | 0.67 (0.01)* | 0.75 (0.01)* | 0.88 (0.01)* | 0.92 (0.01)*
ST | 0.76 (0.01)* | 0.84 (0.01)* | 0.94 (0.01)* | 0.95 (0.01)*
LA | 0.73 (0.01)* | 0.78 (0.01)* | 0.91 (0.01)* | 0.88 (0.01)*
SW | 0.79 (0.01)* | 0.86 (0.01)* | 0.89 (0.01)* | 0.88 (0.01)*
WS | 0.82 (0.01)* | 0.90 (0.01)* | 0.85 (0.01) | 0.85 (0.01)
WF | 0.88 (0.01)* | 0.85 (0.01)* | 0.84 (0.01) | 0.84 (0.01)
JO | 0.96 (0.01) | 0.96 (0.01) | 0.97 (0.01)* | 0.98 (0.01)*
CY | 0.81 (0.01)* | 0.95 (0.01)* | 0.93 (0.01)* | 0.92 (0.01)*
BC | 0.71 (0.01)* | 0.75 (0.01)* | 0.93 (0.01)* | 0.95 (0.01)*
SQ | 0.89 (0.01)* | 0.94 (0.01) | 0.95 (0.01)* | 0.94 (0.01)
SR | 0.92 (0.01)* | 0.94 (0.01)* | 0.85 (0.01)* | 0.86 (0.01)*
NW | 0.90 (0.01)* | 0.92 (0.01)* | 0.94 (0.01)* | 0.95 (0.01)*
Total | 0.82 (0.00)* | 0.85 (0.00)* | 0.90 (0.00) | 0.90 (0.00)

Values are shown as Mean (SD). The * indicates significant differences from all other accelerometer placements. The ^ indicates significant differences from the hip and thigh accelerometers. LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO = Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear. AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Table 4.11. Overall sensitivity, specificity, and AUC among the four accelerometer placement sites with combined activity categories.
Sensitivity (% agreement)
Category | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
SE | 72.6 (2.8)* | 92.1 (1.7)* | 93.5 (1.6)* | 92.7 (1.6)*
ST | 56.9 (6.1)* | 69.6 (5.7)* | 90.2 (3.7) | 90.0 (3.7)
LI | 68.5 (3.8)* | 83.1 (3.1)* | 90.5 (2.4) | 89.8 (2.5)
WS | 66.9 (5.3)* | 82.2 (4.3)* | 71.5 (5.1) | 72.2 (5.0)
WF | 77.1 (4.9)* | 71.9 (5.2)* | 69.9 (5.3) | 69.4 (5.4)
JO | 92.5 (3.4) | 93.1 (3.2) | 95.3 (2.7)^ | 96.1 (2.5)^
CY | 66.0 (5.4)* | 90.2 (3.4)* | 87.2 (3.8)* | 84.1 (4.2)*
EX | 60.6 (4.5)* | 70.3 (4.2)* | 88.9 (2.9)* | 90.1 (2.7)*
SR | 87.3 (3.5)* | 89.1 (3.3)* | 73.3 (4.6) | 74.4 (4.6)
NW | 81.8 (3.6)* | 86.0 (3.2)* | 88.8 (2.9)* | 91.3 (2.6)*
Total | 72.5 (1.4)* | 84.0 (1.1)* | 86.6 (1.0) | 86.7 (1.0)

Specificity (%)
Category | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
SE | 93.9 (1.5)* | 97.0 (1.1) | 97.2 (1.0) | 97.2 (1.0)
ST | 95.9 (2.5)* | 98.8 (1.4) | 98.8 (1.3) | 99.2 (1.1)*
LI | 94.7 (1.8)* | 96.6 (1.5)* | 97.7 (1.2) | 97.6 (1.2)
WS | 98.1 (1.6) | 98.1 (1.5) | 98.0 (1.6) | 97.7 (1.7)
WF | 98.9 (1.2) | 98.8 (1.2) | 98.1 (1.6) | 98.3 (1.5)
JO | 99.7 (0.7) | 99.7 (0.7) | 99.6 (0.8) | 99.7 (0.7)
CY | 96.5 (2.1)* | 98.9 (1.2) | 99.2 (1.0) | 98.9 (1.2)
EX | 94.7 (2.0)* | 95.7 (1.8)* | 97.3 (1.5) | 97.4 (1.4)
SR | 97.6 (1.6) | 98.7 (1.2)* | 97.4 (1.7) | 97.6 (1.6)
NW | 98.1 (1.3) | 98.4 (1.1) | 99.6 (0.6)^ | 99.7 (0.5)^
Total | 96.8 (0.5)* | 98.1 (0.4)* | 98.3 (0.4) | 98.3 (0.4)*

AUC
Category | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
SE | 0.83 (0.01)* | 0.95 (0.01) | 0.95 (0.01) | 0.95 (0.01)
ST | 0.76 (0.01)* | 0.84 (0.01)* | 0.94 (0.01)* | 0.95 (0.01)*
LI | 0.82 (0.01)* | 0.90 (0.01)* | 0.94 (0.01) | 0.94 (0.01)
WS | 0.82 (0.01)* | 0.90 (0.01)* | 0.85 (0.01) | 0.85 (0.01)
WF | 0.88 (0.01)* | 0.85 (0.01)* | 0.84 (0.01) | 0.84 (0.01)
JO | 0.96 (0.01) | 0.96 (0.01) | 0.97 (0.01)* | 0.98 (0.01)*
CY | 0.81 (0.01)* | 0.95 (0.01)* | 0.93 (0.01)* | 0.92 (0.01)*
EX | 0.78 (0.01)* | 0.83 (0.01)* | 0.93 (0.01)* | 0.94 (0.01)*
SR | 0.92 (0.01)* | 0.94 (0.01)* | 0.85 (0.01)* | 0.86 (0.01)*
NW | 0.90 (0.01)* | 0.92 (0.01)* | 0.94 (0.01)* | 0.95 (0.01)*
Total | 0.85 (0.00)* | 0.91 (0.00)* | 0.92 (0.00) | 0.92 (0.00)

Values are shown as Mean (SD). The * indicates significant differences from all other accelerometer placement sites. The ^ indicates significant differences from the hip and thigh accelerometers.
SE = Sedentary, ST = Standing, LI = Lifestyle, WS = Walk slow, WF = Walk fast, JO = Jogging, CY = Cycling, EX = Exercise, SR = Stairs, NW = Non-wear. AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Table 4.12. Activities classified into activity intensities by the Compendium and by measured METs.
Activity | Code | Description | Compendium METs | Compendium Intensity | Experimentally measured METs [Mean (SD)] | Experimental Intensity
Lying | 07011 | Lying quietly, doing nothing, lying in bed awake, listening to music (not talking or reading) | 1.3 | Sedentary | 1.4 (0.7) | Sedentary
Reading | 09030 | Sitting, reading, book, newspaper, etc. | 1.3 | Sedentary | 1.4 (0.6) | Sedentary
Computer | 09040 | Sitting, writing, desk work, typing | 1.3 | Sedentary | 1.4 (0.6) | Sedentary
Standing | 07041 | Standing, fidgeting | 1.8 | Light | 1.4 (1.0) | Light
Laundry | 05090 | Laundry, fold or hang clothes, put clothes in washer or dryer, packing suitcase, washing clothes by hand, implied standing, light effort | 2.0 | Light | 2.1 (0.6) | Light
Sweeping | 05011 | Cleaning, sweeping, slow, light effort | 2.3 | Light | 2.5 (0.5) | Light
Biceps curls | 09071 | Standing, miscellaneous | 2.5 | Light | 2.0 (0.6) | Light
Walk slow | 17152 | Walking, 2.0 mph, level, slow pace, firm surface | 2.8 | Light | 2.9 (0.7) | Light
Walk fast | 17200 | Walking, 3.5 mph, level, brisk, firm surface, walking for exercise | 4.3 | Moderate | 4.2 (1.1) | Moderate
Cycling | 02017 | Bicycling, stationary, 51-89 watts, light-to-moderate effort | 4.8 | Moderate | 4.4 (1.1) | Moderate
Squats | 02052 | Resistance (weight) training, squats, slow or explosive effort | 5.0 | Moderate | 4.5 (1.2) | Moderate
Stairs | 17130 | Stair climbing, using or climbing up ladder (Taylor Code 030) | 8.0 | Vigorous | 6.8 (1.5) | Vigorous
Jogging | 12030 | Running, 5 mph (12 min/mile) | 8.3 | Vigorous | 8.0 (1.8) | Vigorous
Non-wear | -- | -- | -- | -- | -- | --

Table 4.13.
Overall sensitivity, specificity, and AUC among the four accelerometer placement sites for classification of activity intensity.

Sensitivity (% agreement)
Intensity | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
Non-wear | 81.8 (3.6)* | 86.0 (3.2)* | 88.8 (2.9)* | 91.3 (2.6)*
Sedentary | 72.6 (2.8)* | 92.1 (1.7)* | 93.5 (1.6) | 92.7 (1.6)
Light-intensity | 75.8 (2.3)* | 93.4 (1.3)* | 89.1 (1.6) | 89.9 (1.6)
Moderate-intensity | 75.4 (3.0)* | 85.0 (2.5)* | 82.6 (2.7) | 81.0 (2.7)
Vigorous-intensity | 92.3 (2.2) | 92.9 (2.1) | 85.9 (2.8)^ | 86.0 (2.8)^
MVPA | 87.3 (1.8)* | 93.0 (1.3)* | 89.4 (1.6) | 88.6 (1.0)
Total | 78.0 (1.3)* | 90.7 (0.9)* | 88.4 (1.0) | 88.5 (1.0)

Specificity (%)
Intensity | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
Non-wear | 98.1 (1.3) | 98.4 (1.1) | 99.6 (0.6)^ | 99.7 (0.5)^
Sedentary | 93.9 (1.5)* | 97.0 (1.1) | 97.2 (1.0) | 97.2 (1.0)
Light-intensity | 86.5 (1.8)* | 96.3 (1.0)* | 93.7 (1.3) | 93.8 (1.3)
Moderate-intensity | 94.4 (1.6)* | 97.6 (1.1)* | 96.8 (1.2) | 96.5 (1.3)
Vigorous-intensity | 97.6 (1.2) | 98.6 (0.9)* | 97.4 (1.3) | 97.5 (1.3)
MVPA | 92.4 (1.4)* | 97.6 (0.8)* | 95.5 (1.1) | 95.3 (1.1)
Total | 92.5 (0.8)* | 97.2 (0.5)* | 96.2 (0.6) | 96.2 (0.6)

AUC
Intensity | AG Hip | AG Thigh | GE Left Wrist | GE Right Wrist
Non-wear | 0.90 (0.01)* | 0.92 (0.01)* | 0.94 (0.01)* | 0.95 (0.01)*
Sedentary | 0.83 (0.01)* | 0.95 (0.01) | 0.95 (0.01) | 0.95 (0.01)
Light-intensity | 0.81 (0.00)* | 0.95 (0.01)* | 0.91 (0.01)* | 0.92 (0.01)*
Moderate-intensity | 0.85 (0.01)* | 0.91 (0.01)* | 0.90 (0.01)* | 0.89 (0.01)*
Vigorous-intensity | 0.95 (0.01)* | 0.96 (0.01)* | 0.92 (0.01) | 0.92 (0.01)
MVPA | 0.90 (0.00)* | 0.95 (0.01)* | 0.92 (0.01) | 0.92 (0.01)
Total | 0.85 (0.00)* | 0.94 (0.00)* | 0.92 (0.00) | 0.92 (0.00)

Values are shown as Mean (SD). The * indicates significant differences from all other accelerometer placement sites. The ^ indicates significant differences from the hip and thigh accelerometers. AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Figure 4.3. Comparison of dominant and non-dominant wrist accelerometer sensitivities.
[Figure 4.3 plot: sensitivity (%) on the y-axis (0 to 100) by feature set (1 to 5) on the x-axis, with separate series for the non-dominant and dominant wrists.]

DISCUSSION

The purpose of this study was to develop and validate ANNs using data from accelerometers placed at several locations on the body in order to classify activity types. Specifically, we compared the accuracy of ANNs developed for wrist-, hip-, and thigh-mounted accelerometers and compared the accuracy of accelerometers placed on the left and right wrists. A secondary purpose was to assess the accuracies of the four accelerometer placement sites for classifying specific types of activities (e.g., sedentary, lifestyle, and exercise activities) and activity intensities and to test multiple feature sets.

The wrist-mounted accelerometers outperformed the hip- and thigh-mounted accelerometers for total classification accuracy, achieving over 80% sensitivity in our initial analysis and over 86% when combining similar activities into subcategories (i.e., combining lying, reading, and computer use as sedentary). Additionally, when looking solely at the three sedentary activities, the wrist monitors provided sensitivities of 92.7-93.5% when combined into a single sedentary category, which was slightly higher than the thigh (92.1%) and much higher than the hip (72.6%). The wrist accelerometer placement sites also had the highest sensitivities for detecting exercise and lifestyle activities, although the thigh had the highest sensitivity for classifying cycling. Also, the wrist monitors provided higher sensitivity for standing than the hip or thigh. In direct comparison of the left and right wrist placements, we found no differences in overall sensitivity and only very slight differences (1-5%) for specific activity types.
These small differences were statistically significant due to the large number of windows of data (>42,000) used when determining sensitivity, but the clinical or real-world significance of the differences between the left and right wrist accelerometer placements is likely minimal. Furthermore, follow-up analyses comparing dominant vs. non-dominant wrists yielded no differences in overall classification accuracy. These findings provide strong evidence that wrist accelerometers can be used to achieve high accuracy for recognition of a variety of sedentary, ambulatory, lifestyle, and exercise activities.

The superiority of the wrist accelerometer placements for the exercise and lifestyle activities was expected because these activities (with the exception of squats) utilize mostly upper-body movements. These activities would be easier to detect with monitors worn on the wrists compared to accelerometers worn on the hip or thigh since the patterns of wrist movement are likely more distinct than thigh or hip movement and, therefore, would be best recognized using pattern recognition approaches such as ANNs.

The superior accuracy of the wrist accelerometer placement sites for measurement of specific sedentary activities was initially surprising given that thigh-mounted accelerometers have consistently yielded high accuracy for measurement of time spent in sedentary activities as well as breaks in sedentary activities (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). However, a recent study by Rowlands et al. described an elegant and accurate way to use a concept called the “sedentary sphere” to identify specific types of sedentary activities from a wrist accelerometer (Rowlands, Olds et al. 2014). Our study provides further evidence that a wrist-worn accelerometer can provide an accurate indication of specific types of sedentary activities.
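The per-activity sensitivity and specificity values discussed above, and the effect of combining similar activities into one category, can be illustrated with a minimal sketch. It assumes one actual and one predicted label per window and treats each activity one-vs-rest; the function names, the `CATEGORY` mapping, and the toy labels are hypothetical and are not the code or data used in this study.

```python
# Hypothetical mapping that merges the three sedentary activities into one category.
CATEGORY = {"lying": "sedentary", "reading": "sedentary", "computer": "sedentary"}


def collapse(labels):
    """Replace individual activity labels with their broader category, if any."""
    return [CATEGORY.get(label, label) for label in labels]


def sensitivity_specificity(actual, predicted, target):
    """One-vs-rest sensitivity and specificity for a single target label."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == target and p == target:
            tp += 1  # target window correctly recognized
        elif a == target:
            fn += 1  # target window missed
        elif p == target:
            fp += 1  # other activity mistaken for the target
        else:
            tn += 1  # other activity correctly not labeled as the target
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy windows (illustrative only, not study data).
actual = ["lying", "reading", "computer", "standing", "lying", "standing"]
predicted = ["reading", "reading", "lying", "standing", "lying", "computer"]

lying_result = sensitivity_specificity(actual, predicted, "lying")
sedentary_result = sensitivity_specificity(
    collapse(actual), collapse(predicted), "sedentary"
)
print(lying_result, sedentary_result)
```

In this toy example, confusions among lying, reading, and computer use stop counting as errors once the labels are collapsed, so sensitivity for the combined category rises, while a standing window predicted as computer use still counts against specificity.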
The high overall sensitivities achieved with the wrist-mounted accelerometers were also surprising given previous research showing that wrist-mounted accelerometers are often outperformed by monitors on other parts of the body. A study by Mannini et al. (Mannini, Intille et al. 2013) showed higher classification accuracies of an ankle monitor (95%) compared to a wrist monitor (84.7%), although the overall accuracy of the wrist monitor is very similar to that achieved in our study. Furthermore, Skotte et al. found classification accuracies of 99% for classifying activity type using hip- and thigh-mounted accelerometers (Skotte, Korshoj et al. 2014), which is well above the classification accuracies achieved in our study. However, Skotte et al. tested only six activities, and the authors ended up removing one activity (stair climbing) since it had poor classification accuracy. Results from Cleland et al. (Cleland, Kikhia et al. 2013) showed very high classification accuracies for hip, wrist, and thigh accelerometers (95-97%), but again, only seven activities were used. In short, most research comparing several different accelerometer placement sites is limited by use of small subject numbers (i.e., < 20) and/or small numbers of activities, limiting comparisons of the advantages and disadvantages of each placement site.

In a recent study, members of our research team found higher classification accuracies of a thigh-mounted accelerometer compared to a wrist-mounted accelerometer (78% vs. 71%) for classifying 14 activities in a laboratory-based setting (Dong, Montoye et al. 2013). However, the accelerometers used in that study measured acceleration data in only two axes (Dong, Montoye et al. 2013), whereas accelerometers used in the current study measured accelerations in three planes of movement (triaxial).
It is reasonable to assume that the hip and thigh, which lie closest to the center of the body, would move mostly in the anterior-posterior and vertical planes; conversely, the wrist, which was the most distal accelerometer attachment site tested, would experience significant movement in the medial-lateral plane as well as the anterior-posterior and vertical planes. Therefore, addition of a third measurement axis in the current study may have benefitted the ANNs for the wrist accelerometers much more than the hip or thigh accelerometers and contributed to the much higher accuracy for the ANNs developed for the wrist-mounted accelerometers seen in this study compared to Dong’s work. Also, Dong’s study, as well as Cleland’s and Skotte’s, used a laboratory-based setting, which would have questionable generalizability to a free-living environment (Gyllensten and Bonomi 2011; Trost, Wong et al. 2012; Lyden, Keadle et al. 2013; Mannini, Intille et al. 2013). Our current study builds on this previous research by validating activity type recognition of ANNs developed for wrist-, hip-, and thigh-mounted accelerometers in a simulated free-living setting, with a wide range of activities and the ability to directly compare monitors located at these popular placement sites.

The utility of the wrist placement sites for activity type prediction found in this study is especially encouraging given its implementation in many studies, including the 2011-2014 NHANES data collection cycle. There is preliminary evidence that participant compliance is improved with wear on the wrist (Troiano, McClain et al. 2014), so the wrist holds promise for use in large studies due in part to this improved compliance but also its high accuracy of measurement for both activity type and energy expenditure prediction seen in this study as well as previous work. Additionally, we have found that choice of wrist (left vs. right and dominant vs.
non-dominant) does not lower accuracy of activity type prediction, which is encouraging as it may allow participants in large studies to choose the wrist on which they wear the accelerometer. The use of wrist-mounted monitors for sleep measurement in previous research (Kripke, Mullaney et al. 1978; Mullaney, Kripke et al. 1980; Jean-Louis, Kripke et al. 2001) suggests that there is potential for a single accelerometer placed on the wrist to measure physical activity, inactivity/sedentary behavior, and sleep accurately, thereby providing a comprehensive measurement tool for assessing several different behavioral characteristics known to have strong associations with health. Investigators have begun to use commercially available devices such as the Fitbit (Fitbit Inc., San Francisco, CA) and Nike Fuelband (Nike Inc., Beaverton, OR) for comprehensive measurement of activity and sleep, but the limited evidence available does not support their accuracy (Montgomery-Downs, Insana et al. 2012; Dannecker, Sazonova et al. 2013; Fortune, Lugade et al. 2014), and we know of no research-grade devices yet capable of accomplishing this task. Now that we have developed algorithms to classify activity type and predict energy expenditure from a wrist-worn accelerometer, we intend to expand our investigations and measure sleep duration and quality as well as sedentary behavior.

While the wrist accelerometer placements performed the best in this study, the performance of the hip and thigh accelerometer placements should not be overlooked. At over 70% for prediction accuracy when combining similar activities into categories, the hip placement performed well, although this accuracy is not the highest achieved in the literature. As previously discussed, Cleland et al. (Cleland, Kikhia et al. 2013) and Skotte et al. (Skotte, Korshoj et al. 2014) both achieved over 97% accuracy for the hip-mounted accelerometer for classifying activity type.
In the current study, the main weaknesses of the hip placement were encountered in classifying sedentary behaviors, standing, and lifestyle and exercise activities. Of these, only sitting and standing were included in the studies by Cleland and Skotte, likely contributing to their higher accuracy of measurement. Members of our research group (Dong, Montoye et al. 2013) used a similar activity set and achieved higher accuracy (78%) for the hip than in the current study, which may be partially attributed to the use of a simulated free-living setting in the current study (vs. a laboratory-based setting in the previous study). Additionally, our research group controlled the exact speed of the walking (2.0 and 4.0 miles/hour), jogging (6.0 miles/hour), and stair climbing (60 steps/min) in the previous study, whereas the current study included no such limitations on speeds, resulting in much more variability in speed of movement and potentially lower accuracy for classifying these tasks. The discrepancy in accuracies between the lab-based and free-living settings also indirectly shows the importance of validating predictive algorithms in a setting similar to that in which they are intended for use, thereby obtaining a more realistic view of their accuracy in the true free-living setting.

In terms of accurately classifying sedentary activities, the hip performed moderately well, with 72.6% accuracy for the combined category but only 36.5-90.8% for the individual sedentary activities. According to Table 4.10, sedentary activities were often misclassified as standing with the hip accelerometer placement, which is not surprising given the static nature of these activities as well as the similar hip angle seen with sitting and standing. Poor classification of sedentary behavior by hip-worn accelerometers was also seen in studies by Lyden et al. (Lyden, Kozey Keadle et al. 2012) and Kozey-Keadle et al. (Kozey-Keadle, Libertine et al.
2011), where the hip-worn accelerometer inaccurately predicted total sedentary time and breaks in sedentary time using the cut-point approach to classification. Despite fairly widespread use of hip-mounted accelerometers for measuring sedentary behavior in previous literature, our findings, along with those of Lyden and Kozey-Keadle, suggest that hip-mounted accelerometer estimates of sedentary behavior should be used with caution, regardless of whether cut-points or machine learning are used for prediction.

The current study showed that a thigh-mounted accelerometer performed better than the hip placement but worse than the wrist placements for activity type prediction. At 71.4%, the thigh placement had lower accuracy than achieved in Cleland’s and Skotte’s work (>95%) and slightly lower accuracy than in our previous, lab-based study (78%). Our lower measurement accuracy is likely due to the addition of more activities than the other studies and use of a simulated free-living setting. Notably, accuracy of the thigh accelerometer increased from 71.4% to 84.0% upon combining the categories for sedentary, lifestyle, and exercise activities. This increase was due mostly to improved measurement accuracy for sedentary and lifestyle activities upon combining them into categories. The inability to measure individual sedentary activities accurately was expected since the angle and movement of the thigh are very similar for lying and seated activities. However, of greater importance is that the thigh accelerometer achieved high accuracy differentiating sedentary activities from non-sedentary activities and was able to differentiate between sedentary activities and standing (as seen in Table 4.8), which the hip-mounted accelerometer was unable to accomplish.
The differentiation between sedentary activities and standing is important for measuring total sedentary time as well as breaks in sedentary behavior, and the high accuracy we found for the thigh is in accordance with previous studies showing excellent accuracy of thigh-mounted accelerometers for measuring sedentary time and breaks in sedentary behavior (Grant, Ryan et al. 2006; Lyden, Kozey Keadle et al. 2012). The high overall accuracy of thigh-mounted accelerometers for classifying sedentary and non-sedentary activities achieved in this study provides further rationale for their use in measuring PA as well as SB in free-living settings.

Grouping of activities by intensity resulted in highest sensitivity and AUC by the thigh accelerometer placement and slightly lower values for the wrist placements (Table 4.13). Given that the thigh accelerometer placement performed best for estimation of energy expenditure (Chapter 3), the higher performance of the thigh placement for prediction of activity intensity provides further evidence that the thigh accelerometer placement shows higher measurement accuracy than the hip or wrist placements for prediction of the energy cost of activities. In studies specifically focused on identifying time spent in different activity intensities (i.e., for identifying time spent sedentary or time spent in MVPA), the thigh accelerometer placement may be optimal. However, the wrist placements appear best if trying to identify individual types of activities.

Due to the different strengths of the monitors located on the hip, thigh, and wrists, choice of placement should depend on the population of interest as well as the specific research question. ANNs for hip-mounted accelerometers classified ambulatory activities and stair use well in this study and have previously been shown to provide highly accurate estimates of energy expenditure (Staudenmayer, Pober et al.
2009) (Chapter 3), but studies in pregnant or obese populations should consider avoiding use of a hip accelerometer due to monitor tilt that can occur (Feito, Bassett et al. 2011). For researchers interested specifically in measuring sedentary behavior, the thigh-mounted accelerometer may be preferable due to high accuracy of classification seen in this study as well as in previous work by Lyden et al. and Kozey Keadle et al. (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). In contrast, those seeking to maximize compliance or those interested in recognition of activity types or sleep may benefit from use of a wrist-mounted accelerometer (Jean-Louis, Kripke et al. 2001).

Strengths and limitations

This study had several limitations that must be addressed. First, our sample was relatively homogeneous and consisted mainly of younger adults who had a lower BMI than the general population. Individuals larger or smaller than those tested, or individuals who perform activities at a different intensity than performed in the study, may not be measured well using the current ANN algorithms. Additionally, our study provided only a sample of activities that people may perform on a daily basis, and therefore our models cannot necessarily be used for comprehensive assessment of everyday activities. Finally, our study did not record the walking or jogging speeds performed by participants, which may have been useful for evaluating the differences between these activities. However, MET values recorded for the slow walk averaged 2.9 METs, while the fast walk elicited an average MET value of 4.2 METs, providing evidence that the two walking speeds were distinct activities that fell into different intensity categories (i.e., light vs. moderate).

This study also had several notable strengths.
First, our models were created and tested using more than 42,000 five-second windows of data from 39 participants, which is larger than many data sets used in previous studies. Second, validation studies cannot reasonably test all activities that a person could perform, so it is important to pick a set of activities that encompasses a range of intensities and types as well as activities commonly performed in daily life. Our study incorporated a diverse collection of activities, including commonly performed activities such as walking and several sedentary activities, as well as lifestyle and exercise activities of varying intensities. Additionally, our use of a simulated free-living setting is a major advantage as it allowed for much greater variability in the movement patterns and intensities of the activities performed and did not require steady state to be achieved, as is usually the case for laboratory-based protocols. Our inclusion of wrist-, hip-, and thigh-mounted accelerometers is also a study strength as it allowed for direct comparison of the accuracy of models developed for each placement site. Finally, our use of Microsoft Excel for data processing, cleaning, and analysis and R statistical software for model creation and testing provides further evidence of the accessibility of machine learning to those without access to highly powered statistical software or computer programming experience. Staudenmayer et al. (Staudenmayer, Pober et al. 2009) and Lyden et al. (Lyden, Keadle et al. 2013) provide simple details on the code used for developing and testing ANNs using R software.

Conclusions

In conclusion, we tested four accelerometers located on the left and right wrists, right hip, and right thigh for their utility in classifying activity type across a wide range of activities performed in a simulated free-living setting. Overall sensitivity was moderately high at 66-81%, which improved to 73-87% when condensing similar activities into categories.
Both wrist accelerometer placement sites outperformed the hip and thigh placements for total classification accuracy as well as in many of the individual activities, providing further support of the wrist placement for use in large epidemiologic and surveillance studies. Our study builds upon previous work by using a simulated free-living setting, which enhances generalizability of the findings as well as the predictive models created. In the future, we intend to expand our algorithms to measure sleep quality and duration and validate the algorithms in a larger, more diverse sample.

CHAPTER 5

VALIDATION AND COMPARISON OF ACCELEROMETERS WORN ON THE WRISTS, HIP, AND THIGH FOR MEASURING SEDENTARY BEHAVIOR

ABSTRACT

The purpose of this study was to validate and compare the accuracy of activity type prediction models developed for accelerometers placed on the wrists, hip, and thigh for measurement of total time spent in sedentary behavior and breaks in sedentary behavior. METHODS: Forty-four healthy adults participated in a 90-minute simulated free-living activity protocol, in which participants performed a total of 14 sedentary, ambulatory, lifestyle, and exercise activities for 3-10 minutes each. Participants dictated the order, duration, and intensity of activities, which were recorded using direct observation (for a criterion measure of total time spent in sedentary behavior and breaks in sedentary behavior). All time spent in lying, reading, and computer use was summed to obtain a measure of total time spent in sedentary behavior. Any transition from one of these three activities to a non-sedentary activity was recorded to measure breaks in sedentary behavior.
Four accelerometers were worn (right and left wrists, right hip, and right thigh) in order to predict total time spent in sedentary behavior and breaks in sedentary behavior compared to that measured by direct observation (using the activity type prediction models developed in our previous research [Chapter 4]). We used and tested three break intervals (5, 30, and 60 seconds) in order to determine the best method of characterizing breaks in sedentary behavior from an accelerometer. Differences among accelerometer-predicted and criterion-measured total time spent in sedentary behaviors and breaks in sedentary behavior were evaluated using repeated measures analysis of variance and by non-overlap of 95% confidence intervals. RESULTS: For total time spent in sedentary behavior, all four accelerometers provided similar estimates to direct observation, but the wrist accelerometers had the lowest error for prediction (2.8-3.1 minutes), and the hip had the highest error (7.2 minutes). For breaks in sedentary behavior, the 30-second break interval provided the greatest predictive accuracy. Using this interval, the hip and left wrist accelerometers produced estimates similar to those measured by direct observation, but the thigh and right wrist underestimated breaks in sedentary behavior by 15-17%. CONCLUSIONS: Hip and left wrist accelerometer placements provided the highest overall accuracy for measuring the multiple constructs of sedentary behavior. These findings lie in contrast to previous research showing the utility of thigh accelerometers for measurement of sedentary behavior and therefore warrant confirmation. The superiority of the left wrist accelerometer over the right wrist accelerometer provides support for the convention that accelerometers be placed on the non-dominant wrist for sedentary behavior measurement.
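The two outcome measures described in the abstract can be sketched as follows. This is a minimal illustration, assuming one predicted activity label per five-second window and assuming that a break is counted only when the non-sedentary run that follows a sedentary bout lasts at least the chosen break interval; that interpretation of the break interval, the function name, and the toy label sequence are assumptions made for illustration and are not taken from the study's own processing.

```python
import math
from itertools import groupby

# The three activities summed as sedentary behavior in this chapter.
SEDENTARY = {"lying", "reading", "computer"}


def sedentary_summary(labels, window_s=5, min_break_s=30):
    """Return (total sedentary minutes, number of breaks in sedentary behavior).

    Assumed rule: a break is a run of non-sedentary windows that directly
    follows a sedentary run and lasts at least `min_break_s` seconds.
    """
    flags = [label in SEDENTARY for label in labels]
    sedentary_minutes = sum(flags) * window_s / 60

    # Minimum number of consecutive windows a break must span.
    min_windows = max(1, math.ceil(min_break_s / window_s))

    # Run-length encode the sedentary / non-sedentary sequence.
    runs = [(is_sed, len(list(group))) for is_sed, group in groupby(flags)]

    breaks = sum(
        1
        for (prev_sed, _), (is_sed, length) in zip(runs, runs[1:])
        if prev_sed and not is_sed and length >= min_windows
    )
    return sedentary_minutes, breaks


# Toy sequence of five-second windows (illustrative only, not study data).
labels = (
    ["reading"] * 12      # 60 s sedentary
    + ["standing"] * 2    # 10 s interruption
    + ["reading"] * 12    # 60 s sedentary
    + ["walking"] * 8     # 40 s interruption
    + ["computer"] * 6    # 30 s sedentary
)

for interval in (5, 30, 60):
    print(interval, sedentary_summary(labels, min_break_s=interval))
```

Under this assumed rule, a short interval counts even a brief misclassified window as a break, while a long interval can miss genuine short breaks, which is one way the choice among the 5-, 30-, and 60-second intervals could affect the predicted break count.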
INTRODUCTION

Physical activity (PA) has long been recognized for its beneficial effects on many health indices, such as lowering risk of obesity, cardiovascular disease, and certain cancers, just to name a few (Morris, Clayton et al. 1990; King and Tribble 1991; Thune and Furberg 2001). Correspondingly, the Physical Activity Guidelines Advisory Committee issued a report in 2008 detailing evidence-based recommendations that adults should attain 150 minutes/week of moderate-intensity PA or 75 minutes of vigorous-intensity PA to experience health benefits (2008). Sedentary behavior (SB) has traditionally been viewed as a lack of PA, and people were considered sedentary if not meeting the national PA recommendations (Pate, O'Neill et al. 2008). However, it is possible to meet PA recommendations and still spend substantial time engaged in sedentary activities (e.g., driving, using a computer, watching TV), a group Owen et al. called the “active couch potatoes” (Owen, Healy et al. 2010).

More recently, epidemiologic and laboratory-based studies have started uncovering associations between high amounts of SB and diminished metabolic, cardiovascular, and bone health as well as an increased risk of obesity, some cancers, and all-cause mortality (Zerwekh, Ruml et al. 1998; Hu, Li et al. 2003; Hamilton, Hamilton et al. 2004; Hamilton, Hamilton et al. 2007; Howard, Freedman et al. 2008; Schrage 2008; Katzmarzyk, Church et al. 2009; Owen, Healy et al. 2010). Notably, these associations are largely independent of level of PA. Additionally, it appears that the way SB is accrued may influence its effects on health, with longer periods of SB being worse than SB broken up periodically by short, non-sedentary activities (Healy, Dunstan et al. 2008; Owen, Healy et al. 2010).

Despite emerging findings of the potential health risks of SB, there is currently insufficient research to allow for evidence-based recommendations to be created with regard to SB.
Given that adults spend well over 50% of their waking hours in SB (Matthews, Chen et al. 2008), it is important to accurately measure SB in order to better determine health risks associated with SB and develop evidence-based recommendations for SB in order to improve health.

Accelerometer-based activity monitors have become a widely used and accepted method for PA and energy expenditure measurement due to their objectivity, relatively low participant and researcher burden, and high measurement accuracy in numerous validation studies conducted in laboratory-based and free-living environments (Welk 2002). Traditionally, accelerations of the body were recorded and translated into ‘activity counts,’ which correspond to magnitude of acceleration. Activity counts could then be placed into simple linear regression equations to estimate energy expenditure and activity intensity (Montoye, Washburn et al. 1983; Freedson, Melanson et al. 1998). A count cut-point of <100 counts/minute has been widely used as a threshold for estimating SB using accelerometers; however, this cut-point has been shown to provide inaccurate estimates of SB and an inability to measure breaks in SB in free-living settings (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Other cut-points ranging from 50-250 counts/minute have been used to define SB with varying degrees of accuracy (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Regardless of which cut-point is chosen to designate SB, the cut-point method has several notable limitations. First, the cut-point method does not allow for differentiation of SB from accelerometer non-wear without establishing additional data reduction rules (e.g., how many consecutive minutes of 0 counts should count as non-wear), which can affect estimates of SB and PA (Masse, Fuemmeler et al. 2005).
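The dependence of the cut-point method on a separate non-wear rule can be made concrete with a minimal sketch. It assumes one activity count value per minute; the <100 counts/minute threshold comes from the text above, whereas the function name, the `nonwear_zero_minutes` parameter, its default of 60, and the toy count series are hypothetical choices made only for illustration.

```python
def classify_minutes(counts, sedentary_cut=100, nonwear_zero_minutes=60):
    """Label each minute as 'non-wear', 'sedentary', or 'active'.

    Minutes below `sedentary_cut` counts are labeled sedentary, except that
    any run of at least `nonwear_zero_minutes` consecutive zero-count minutes
    is relabeled as non-wear (an assumed data reduction rule).
    """
    labels = ["sedentary" if c < sedentary_cut else "active" for c in counts]

    i = 0
    while i < len(counts):
        if counts[i] == 0:
            # Find the end of this run of consecutive zero-count minutes.
            j = i
            while j < len(counts) and counts[j] == 0:
                j += 1
            if j - i >= nonwear_zero_minutes:
                labels[i:j] = ["non-wear"] * (j - i)
            i = j
        else:
            i += 1
    return labels


# Toy minute-by-minute counts (illustrative only, not study data).
counts = [0] * 5 + [50, 250, 80] + [0] * 3 + [1200]

for rule in (3, 5, 60):
    labels = classify_minutes(counts, nonwear_zero_minutes=rule)
    print(rule, labels.count("sedentary"), labels.count("non-wear"))
```

With the same count data, the estimated sedentary time in this example ranges from 2 to 10 minutes depending only on the non-wear rule, which illustrates why such rules can affect estimates of SB.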
Moreover, the cut-point approach would likely classify standing as sedentary (since little movement occurs when standing), but several studies have provided evidence that standing elicits a different physiologic response than sitting or lying (Bey and Hamilton 2003). Additionally, standing has been shown to be inversely associated with all-cause mortality and cardiovascular disease, especially in individuals not meeting PA recommendations (Katzmarzyk 2014). Therefore, an accurate measurement tool for SB needs to be able to differentiate between non-wear and SB as well as standing and sitting/lying.

Due to limitations of the cut-point approach to measuring SB as well as energy expenditure, researchers have turned to more advanced data processing techniques, such as machine learning models, to improve accuracy of activity measurement. These studies show dramatically improved measurement of energy expenditure (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011; Lyden, Keadle et al. 2013) and highly accurate classification of activity type from a hip-mounted accelerometer (Pober, Staudenmayer et al. 2006; Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011). However, to our knowledge, only one study has used machine learning models developed for a hip-mounted accelerometer specifically to measure total time in SB and breaks in SB. In this study, Lyden et al. found that total time spent in SB and breaks in SB could be measured accurately in a free-living setting, but only when the machine learning model was also developed in the free-living setting in which it was subsequently used. Therefore, there is encouraging, but by no means conclusive, evidence that machine learning models can improve measurement of SB using hip-mounted accelerometers.

Despite the common use of hip-mounted accelerometers, there are advantages of wearing accelerometers on other parts of the body.
For example, tilt angle of a hip-mounted accelerometer will affect its measurement accuracy, which can pose problems when trying to measure pregnant or overweight individuals (Feito, Bassett et al. 2011; DiNallo, Downs et al. 2012). Additionally, the introduction of machine learning modeling to accelerometer data has dramatically improved measurement accuracy of accelerometers worn in various body locations, such as the wrist, thigh, ankle, lower back, and upper arm (Preece, Goulermas et al. 2009). Studies aiming to classify activity type using accelerometers placed on the wrist and thigh have consistently shown accuracies of >70% and often accuracies above 90% in laboratory-based studies (Zhang, Rowlands et al. 2012; Cleland, Kikhia et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et al. 2014). These two measurement sites are appealing not only for their high activity classification accuracy but also for their utility in measuring lifestyle behaviors such as sleep quality (wrist) (Webster, Kripke et al. 1982; Jean-Louis, Kripke et al. 2001) and SB (thigh) (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012; Skotte, Korshoj et al. 2012; Skotte, Korshoj et al. 2014), as well as their potential to improve participant compliance. Thigh-mounted accelerometers have been used with high accuracy for measuring total time spent in SB as well as breaks in SB and are often used as a criterion measure of SB in free-living environments. However, methods developed to classify SB from a thigh accelerometer provide accurate estimates of step count (Maddocks, Petrou et al. 2010; Harrington, Welk et al. 2011) but do not allow for detailed information on PA behaviors and appear to underestimate energy expenditure (Harrington, Welk et al. 2011). It would be useful to have a single measurement tool that could measure a variety of activity types as well as SB in a free-living setting; to our knowledge, no such method has yet been validated. 
Additionally, the wrist-mounted accelerometer has not yet been validated for measurement of total time spent in SB or breaks in SB. We have previously developed and validated machine learning algorithms for hip-, wrist-, and thigh-mounted accelerometers that can classify activity type with accuracies above 70% for the hip and above 80% for the wrists and thigh, but these have yet to be validated for measurement of SB (Chapter 4). Therefore, the primary purpose of our study was to develop, validate, and compare the accuracy of machine learning algorithms created from hip-, wrist-, and thigh-mounted accelerometers for measuring 1) total time spent in SB and 2) breaks in SB in a simulated free-living environment. A secondary purpose was to compare accelerometers located on the left and right wrists for prediction of total time spent in SB as well as breaks in SB.

METHODS

Summary of protocol

Participants came to the Human Energy Research Laboratory to participate in a 90-minute simulated free-living protocol, for which they performed a total of 14 sedentary, ambulatory, lifestyle, and exercise activities while wearing a total of four accelerometers (placed on the right hip, right thigh, and both wrists). Each activity was performed for between 3-10 minutes, with the order, duration, and intensity of activities left up to participants. During the protocol, the order and duration of participants’ activities as well as total time spent in SB and breaks in SB were recorded by a trained observer.

Participants

A total of 44 adults (22 male, 22 female) were recruited from the surrounding area of East Lansing, MI via email, flyers, and word of mouth for participation in this study.
In order to be eligible for participation, participants had to fulfill three criteria: 1) they had to be free of health conditions preventing them from being able to safely perform moderate- or vigorous-intensity physical activities, 2) they could not have any orthopedic limitations that would invalidate the use of accelerometry, and 3) they had to fall within the age range of 18-44 years. Prior to participant recruitment, this study was approved by the Michigan State University Institutional Review Board.

Instrumentation

Each participant wore four accelerometer-based activity monitors in this study: two ActiGraph GT3X+ accelerometers and two GENEActiv accelerometers. Additionally, a personal digital assistant (PDA) computer was used by observers to record the activities performed during the protocol. The accelerometers and PDA were synchronized to an external clock before each test; descriptions of the accelerometers and PDA follow. The acceleration data for all four accelerometers were time stamped and stored within the monitors until they could be downloaded to a computer for analysis. Additionally, the accelerometers were oriented so that the x-axis was the vertical axis, the y-axis was the medial-lateral axis, and the z-axis was the anterior-posterior axis.

ActiGraph accelerometers

The ActiGraph (ActiGraph LLC, Pensacola, FL) is a commonly used accelerometer for activity measurement, and there is an abundance of literature regarding its reliability and validity for measurement of PA (Freedson, Melanson et al. 1998; McClain, Sisson et al. 2007). Two GT3X+ models were worn by each participant during the study, one on the midline of the right thigh and adhered to the leg (with hypoallergenic sticky tape), and the other placed on the right hip at the anterior axillary line (with an elastic belt). The ActiGraph GT3X+ records raw accelerations of up to ±6 times the gravitational force (±6g) in three dimensions of movement.
For the current protocol, the accelerometers recorded at a rate of 40 samples per second (40 Hz).

GENEA accelerometers

The GENEActiv accelerometer (Activinsights Ltd, Kimbolton, Cambridgeshire, UK) has undergone preliminary validations for PA measurement (Esliger, Rowlands et al. 2011) as well as activity type classification (Zhang, Rowlands et al. 2012). The GENEA records raw accelerations of up to ±6g in three axes of movement, and the GENEA monitors used in this study were set to record acceleration data at a rate of 20 Hz. Participants wore two GENEA accelerometers, which were fastened to the dorsal side of each wrist using a watch strap supplied by the manufacturers (Esliger, Rowlands et al. 2011).

iPAQ personal digital assistant and direct observation

Direct observation (DO) was conducted using an HP iPAQ personal digital assistant (PDA) (HP Development Company, Palo Alto, CA) to obtain a criterion measure for total time spent in SB and breaks in SB. During the study protocol, a trained observer used a PDA with BEST software developed based on the Children’s Activity Rating Scale protocol (Puhl, Greaves et al. 1990). The observer used the codes T1-T14 to represent the 14 activities in the visit and recorded the activities being performed continuously as they occurred throughout the visit. The codes T1-T3 represented the three sedentary activities (lying, reading, and computer use) in the visit, and these were used to determine total time spent in SB and breaks in SB. Inter-rater reliability for DO was above r=0.90 for this study.

Procedure

Upon arriving at the Human Energy Research Laboratory, details of the study were discussed with each participant. Written informed consent was obtained, and a physical activity readiness questionnaire was administered to ensure that the participant had no contraindications to engaging in PA.
After consenting, participant weight and height were measured (to the nearest 0.1 kg and 0.1 cm, respectively) according to standardized methods (Malina 1995). Body mass index (BMI) was calculated by dividing body weight by the square of height (kg/m²). Participant age was assessed by asking participants to state their age in years, and handedness (left or right) was determined by asking participants which hand they prefer to use for the majority of everyday activities.

After being fitted with the four accelerometers, participants performed 14 activities, which were meant to include many different types and intensities of activities that would likely be seen in a free-living environment (shown in Table 5.1). Ambulatory activities (walking and jogging) are common in accelerometer validation literature; we added the sedentary, exercise, and lifestyle activities to determine the potential for the four accelerometers to measure SB accurately in a setting where a variety of activities was being performed, as is normally seen in free-living environments. Additionally, we added an activity where participants removed the accelerometers so that the ANNs would be able to recognize non-wear, which is important to be able to detect in free-living environments for compliance purposes and for differentiation of non-wear from SB.

Table 5.1. Activities performed during the simulated free-living protocol.
Activity Category | Activity | Activity Intensity | Description of Activity*
--- | --- | --- | ---
Sedentary behaviors (SB) | Lying down (T1) | Sedentary | Lying on a mat on the floor
Sedentary behaviors (SB) | Reading (T2) | Sedentary | Reading a magazine article while sitting at a table
Sedentary behaviors (SB) | Computer (T3) | Sedentary | Sitting and playing a computer game that involves mouse clicking and typing
Standing (ST) | Standing (T4) | Light** | Standing still with arms at sides
Lifestyle (LI) | Laundry (T5) | Light | Folding towels and putting them in a laundry basket
Lifestyle (LI) | Sweeping (T6) | Light | Sweeping confetti into piles
Leisure walk (LW) | Walking slow (T7) | Light | Walking at a self-selected ‘slow’ pace in a hallway
Brisk walk (BW) | Walking fast (T8) | Moderate | Walking at a self-selected ‘brisk’ pace in a hallway
Jogging (JO) | Jogging (T9) | Vigorous | Jogging at a self-selected pace in a hallway
Cycling (CY) | Cycling (T10) | Moderate/Vigorous | Cycling on a cycle ergometer at a self-selected cadence of 50-100 rpm with 1 kg resistance
Stair use (SU) | Stair climbing and descending (T11) | Moderate/Vigorous | Walking up and down a flight of stairs at a self-selected pace
Exercise (EX) | Biceps curls (T12) | Light | Standing still while doing biceps curls with a 3-lb. weight in each hand
Exercise (EX) | Squats (T13) | Moderate | With feet shoulder-width apart, bending at the knees (to a 90° angle) while holding an unweighted broom behind the head
Non-wear (NW) | Non-wear of accelerometer (T14) | N/A | Not wearing the accelerometer

* Activity order, intensity, and duration (3-10 minutes) were left up to participants.
** Standing has traditionally been considered SB; however, recent literature suggests that standing should be considered light-intensity instead of SB due to the differential physiologic effects of standing as compared to sitting/lying (Owen, Healy et al. 2010).

Participants completed the 14 activities (shown in Table 5.1) in a 90-minute, simulated free-living protocol which took place within the Human Energy Research Laboratory and in a building stairwell and hallway.
The 14 activities were described to each participant prior to the start of the protocol, and some of the less familiar activities (e.g., squats) were demonstrated to ensure understanding. Participants completed each of the 14 activities for at least three minutes and for no more than 10 minutes, but the order, intensity, and duration of the activities were left up to each participant. Participants were also free to perform activities more than once if they so chose. A research assistant directly observed and recorded each activity on a handheld PDA computer while activities were being performed. Additionally, the research assistant periodically updated participants on which activities they still needed to complete. The non-wear activity was saved until the end of the protocol so that participants would not spend a significant portion of the time trying to remove and reattach the accelerometers. For this study, direct observation (DO) served as the criterion measure of total time spent in SB and breaks in SB. Upon completion of the protocol, participants were given a $35 Target® gift card.

Data reduction and modeling

Artificial neural networks

Artificial neural networks (ANNs) are nonlinear models which predict an outcome or dependent variable y (e.g., energy expenditure or activity type) using a number of inputs x1…xk, where k is the number of features used to predict y. A graphical depiction of the ANNs created in the current study can be seen in Figure 5.1. The ANNs were used in our previous work (Chapter 4) to predict activity type; those activity type predictions were then used in the current study to predict total time in SB and breaks in SB. For activity type classification, the ANNs functioned similarly to a logistic regression model. Setting the activity types as the nominal values a1…a14, the ANN model can be seen in Equation 1.
Equation 1:

\[
\Pr(y = a_j) = C \exp\left( \sum_{h=1}^{H} w_{jh} \, U\left( \sum_{i=1}^{k} w_{hi} x_i \right) \right), \qquad j = 1, \dots, 14
\]

In Equation 1, Pr is probability, C is a constant chosen so that Pr(y=a1)+…+Pr(y=a14)=1, w are the weights (of the input features and of the hidden-unit summations), U is the activation function for the hidden layer, and H is the number of hidden units. For each activity, values closer to 1 represented a higher likelihood that the activity was being performed. The activity with the value closest to 1 was chosen as the predicted output by the ANN. In accordance with previous research, our models contained only one hidden layer (Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009; Trost, Wong et al. 2012).

After classifying into specific activity types, the three sedentary activities (lying, computer use, and reading) were collectively categorized as SB to allow for prediction of total time spent in SB. Likewise, the 10 non-sedentary activities (standing, laundry, sweeping, slow and fast walking, jogging, cycling, stair use, biceps curls, and squats) were collectively classified as non-SB in order to predict breaks in SB. Non-wear was classified into its own category and later removed from the dataset since there is no way to tell if a person is sedentary or non-sedentary if the accelerometer is not being worn.

Figure 5.1. ANN for predicting activity type and sedentary behavior.

Figure 5.1 legend
* The number of input features was 38, as described in Table 5.2. Additionally, three hidden units are shown in Figure 5.1 for simplicity, but 15 hidden units were used for construction of the ANNs.

Accelerometer signal features (one of each per axis, three total of each per accelerometer)
1. Mean = mean
2. Var = variance
3. Cov = covariance
4. Min = minimum
5. Max = maximum
6. MeanOR = mean accelerometer orientation
7. VarOR = variance of accelerometer orientation
8. 10th %ile = 10th percentile
9. 25th %ile = 25th percentile
10. 50th %ile = 50th percentile
11. 75th %ile = 75th percentile
12. 90th %ile = 90th percentile

Participant characteristics features
13. Ht = participant height
14. Wt = participant weight

Non-feature abbreviations
T1-T3 are sedentary activities, and T4-T13 are non-sedentary activities.
S = summations of the input layer in the hidden units
U = activation function for the hidden layer
W1 = the weight vectors for each of the inputs
W2 = the weight vectors for each of the summations

The ANNs were created and tested using a leave-one-out cross-validation. In this approach, data from all but one participant were used to estimate the weights for each input feature for predicting activity type. Then, the ANN was tested on the data from the participant left out of the training phase by supplying the input features and comparing the predicted activity type from the ANNs to the recorded activity type from DO. The leave-one-out cross-validation is an iterative approach and was repeated with each participant’s data used as the testing data once, thereby obtaining an ANN for activity type for each participant in the study. The weights determined from each iteration of the leave-one-out validation were averaged to obtain a final ANN. This process was conducted separately for each accelerometer, resulting in four distinct ANNs. The ANNs were developed with the intention to predict activity type, which could then be used to estimate total time spent in SB as well as breaks in SB.

In accordance with previous research, we chose to use five-second windows for creation and testing of our ANNs (Preece, Goulermas et al. 2009). Table 5.2 provides a list of the 38 features tested and used in the current analyses. The 36 accelerometer features (12 features for each of the three axes) are time-domain features that are simple to compute and have been used previously as inputs into machine learning algorithms. Additionally, we included height and weight to account for different body sizes.

Table 5.2. Features used for EE and activity type prediction.
Feature number | Feature used | Formula for calculating feature
--- | --- | ---
1-3* | Mean acceleration signal | [formula not legible in source]
4-6* | Variance of acceleration signal | [formula not legible in source]
7-9* | Covariance of acceleration signal | [formula not legible in source]
10-12* | Minimum of acceleration signal | [formula not legible in source]
13-15* | Maximum of acceleration signal | [formula not legible in source]
16-18* | 10th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 10th value
19-21* | 25th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 25th value
22-24* | 50th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 50th value
25-27* | 75th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 75th value
28-30* | 90th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 90th value
N/A | Accelerometer orientation (needed for calculating features 31-36) | [formula not legible in source]
31-33* | Mean accelerometer orientation | [formula not legible in source]
34-36* | Variance of accelerometer orientation | [formula not legible in source]
37 | Participant height | N/A
38 | Participant weight | N/A

Ax is the acceleration in the direction of the x-axis.
* signifies that one feature is included for each of the three accelerometer axes. The formulas shown are for the x-axis, but the formulas for the y- and z-axes are similar.

Assessing sedentary behavior using accelerometers

The ANNs were created in order to classify 10 different activity categories. In our initial testing of the ANNs (Chapter 4), we found that they correctly classified sedentary activities 72.6%, 92.1%, 93.5%, and 92.7% of the time for the hip, thigh, left wrist, and right wrist accelerometer placements, respectively. However, higher classification accuracy for sedentary activities does not necessarily ensure better accuracy for predicting total time spent in SB or breaks in SB.
Therefore, in the current study, total time spent in SB for each participant was estimated using each accelerometer. ANNs from each of the four accelerometers predicted the activities being performed throughout the protocol, and all time spent lying, reading, or in computer use was summed to obtain a prediction for total time spent in SB. Since each accelerometer predicted activity type separately from the other accelerometers, we obtained four estimates of total time spent in SB for each participant.

Similarly, breaks in SB were assessed for each participant and separately for each accelerometer placement. A break in SB has been defined in previous research as when an interval classified as SB is followed by an interval classified as a non-sedentary activity. We classified a break in SB using three different lengths of time that a non-sedentary activity must occur to constitute a break (we call this a break interval). First, since previous research using accelerometers to measure SB uses 60-second break intervals, we defined a break in SB as when 12 consecutive 5-second windows (12*5 = 60 seconds) of a sedentary activity were followed by 12 consecutive windows of a non-sedentary activity. Second, a shorter break in SB (i.e., < 60 seconds) might be physiologically meaningful but may be missed if using a 60-second break interval. Therefore, we also evaluated the accuracy of using two shorter intervals for measuring breaks in SB (30 seconds and 5 seconds). Using a 30-second break interval for estimating breaks in SB, we defined a break as six consecutive windows of a sedentary activity followed by six consecutive windows of a non-sedentary activity. Using a 5-second break interval for estimating SB breaks, we defined a break as one window of sedentary activity followed by one window of non-sedentary activity.
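The break-interval definitions above can be illustrated with a minimal sketch. It assumes that each five-second window has already been reduced to a Boolean (True for a window predicted as lying, reading, or computer use; False for any non-sedentary activity) and that non-wear windows have been removed. The function names `count_breaks` and `total_sb_minutes` are hypothetical and are not taken from the original analysis, which was not carried out in Python.

```python
def count_breaks(sedentary, n_windows):
    """Count breaks in SB for a given break interval.

    sedentary: list of bools, one per 5-second window (True = sedentary).
    n_windows: break interval expressed in windows (1, 6, or 12 for
               5-, 30-, and 60-second break intervals, respectively).
    A break is counted where n_windows consecutive sedentary windows are
    immediately followed by n_windows consecutive non-sedentary windows.
    """
    breaks = 0
    for i in range(n_windows, len(sedentary) - n_windows + 1):
        before = sedentary[i - n_windows:i]
        after = sedentary[i:i + n_windows]
        if all(before) and not any(after):
            breaks += 1
    return breaks


def total_sb_minutes(sedentary, window_seconds=5):
    """Sum all sedentary windows and convert the total to minutes."""
    return sum(sedentary) * window_seconds / 60


# Illustrative (made-up) sequence: 60 s sedentary, 60 s active,
# 15 s sedentary, 10 s active, 30 s sedentary, 30 s active.
windows = [True] * 12 + [False] * 12 + [True] * 3 + [False] * 2 + [True] * 6 + [False] * 6

five_second_breaks = count_breaks(windows, 1)
thirty_second_breaks = count_breaks(windows, 6)
sixty_second_breaks = count_breaks(windows, 12)
sb_minutes = total_sb_minutes(windows)
```

In this made-up sequence, the shortest interval counts every sedentary-to-active change, whereas the longer intervals ignore the brief 15-second sedentary bout, which is the trade-off the three break intervals are meant to probe.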
Direct observation

Direct observation has been used successfully as a criterion measure of SB in previous studies conducted in free-living settings (Lyden, Petruski et al. 2013) and served as our criterion measure for the current study. Data on activities performed were recorded on a handheld PDA using the BEST observation software. Using this software, activities performed during the visit were coded as T1-T13, as shown in Table 5.1. As the final activity in the visit, participants took off their accelerometers and set them on a table, and then the next 3-10 minutes was recorded as non-wear (T14) while the accelerometers sat on the table. Any activity coded as non-wear was not included when analyzing SB, since, by definition, we could not know if participants were engaging in SB if the accelerometer was not being worn. Exclusion of non-wear was necessary in order to determine the real-world suitability of the ANNs for measurement of SB. Additionally, as participants transitioned from one activity to another, we coded this time between activities in a special transition category (T15).

The recording of activities using DO took place continuously and in real time. Research assistants were trained to record an activity change as closely as possible to the moment it occurred. After collection, these DO data were synchronized with the accelerometer data so that each five-second window of accelerometer data was matched to the actual activity performed during that window. In most cases, only one activity occurred during a given five-second window. However, when transitioning between activities, two activities could occur in the same window. If this occurred, the window was automatically recoded as a transition. Additionally, we used the transition category to define all time between activities, such as walking from one activity to another or making an equipment adjustment between activities.
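The window-matching rule described above can be sketched as follows. This is only an illustration under stated assumptions: DO records are represented as a sorted list of (start second, code) pairs beginning at second 0, times are whole seconds, and each code lasts until the next record begins. The function name `label_windows` and the data layout are hypothetical; the source does not describe how the BEST software exports its data.

```python
TRANSITION = "T15"  # transition code, as named in the text


def label_windows(events, end_time, window=5):
    """Assign one DO code to each full window of accelerometer data.

    events:   list of (start_second, code), sorted by start_second and
              assumed to begin at second 0.
    end_time: second at which the last recorded activity ends.
    A window overlapping exactly one code receives that code; a window
    overlapping two or more codes is recoded as a transition.
    """
    labels = []
    last_full_window_end = end_time - end_time % window
    for w_start in range(0, last_full_window_end, window):
        w_end = w_start + window
        codes = set()
        for idx, (start, code) in enumerate(events):
            stop = events[idx + 1][0] if idx + 1 < len(events) else end_time
            if start < w_end and stop > w_start:  # intervals overlap
                codes.add(code)
        labels.append(codes.pop() if len(codes) == 1 else TRANSITION)
    return labels


# Made-up example: reading from 0-12 s, a transition from 12-20 s,
# then jogging from 20-30 s.
example_labels = label_windows([(0, "T2"), (12, "T15"), (20, "T9")], end_time=30)
```

The third window (10-15 s) contains both reading and the start of the transition, so it is recoded as a transition, mirroring the rule that any window containing two activities is treated as transition time.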
Thus, transitions did not represent a specific activity type but instead involved walking, standing, etc. that occurred at the end of one activity and before the next started. We did not include transitions as a separate activity in the ANN creation but instead removed them from the DO and accelerometer datasets prior to creation of the ANNs and before prediction of total time spent in SB. However, breaks in SB only occurred during the times coded as transitions in the dataset (e.g., transitioning from reading to jogging would represent a break in SB). Therefore, we added the transition data back to the dataset after creation of the ANNs but before testing the ANNs for their prediction of breaks in SB. For DO, any transition from a sedentary to a non-sedentary activity was considered a break in SB, no matter how short the transition may have been. Conversely, for the accelerometers, we predicted breaks in SB in three ways (using 5-, 30-, and 60-second break intervals), as described in the previous section.

Statistical analyses

A criterion value of total time spent in SB was assessed for each participant using DO and averaged for the entire sample. Similarly, estimates of SB from each of the four accelerometers were calculated for each participant and averaged together for the entire sample. Differences between criterion-measured and accelerometer-estimated total time spent in SB were evaluated using repeated measures analysis of variance (RMANOVA). If significant differences were revealed by the RMANOVA, post hoc dependent t-tests were conducted, with a least significant difference (LSD) correction used to account for multiple comparisons. Additionally, root mean square error (RMSE) values and their 95% confidence intervals (CIs) were calculated for predicted vs. measured total time spent in SB for each of the four accelerometers.
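For readers unfamiliar with RMSE, a minimal sketch of the calculation for one accelerometer placement is given here. It assumes one predicted and one DO-measured value per participant; the function name `rmse` and the numbers in the example are hypothetical and are not study data. How the 95% CIs were obtained is not specified in the text, so no CI calculation is attempted.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between paired predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up minutes of SB for three participants (not study data).
predicted_minutes = [18.0, 22.0, 25.0]
measured_minutes = [20.0, 20.0, 24.0]
example_rmse = rmse(predicted_minutes, measured_minutes)  # sqrt((4 + 4 + 1) / 3)
```

Because errors are squared before averaging, over- and underpredictions for individual participants do not cancel out, which is why RMSE can differ among placements even when mean predictions do not differ from the criterion.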
Significant differences for RMSE among monitor locations were determined by non-overlap of a 95% CI with the mean from another accelerometer location.

For breaks in SB, criterion-measured breaks were also obtained for each participant using DO and averaged for the entire sample. Estimates of breaks in SB from each accelerometer were obtained separately for five-, 30-, and 60-second break intervals for each participant and averaged for the sample. Differences among DO, the four accelerometers, and the three break intervals were evaluated with RMANOVA, and differences were evaluated using post hoc tests and an LSD correction. Moreover, RMSE values and their 95% CIs were computed to compare predicted to measured breaks in SB, with non-overlap of a 95% CI with the mean from another accelerometer location or break interval indicative of statistically significant differences.

Power analysis

We desired 80% power to detect a difference of at least moderate effect size (ES=0.5) among accelerometers and the criterion measure. Therefore, with the α level set at α = 0.05, we needed 34 participants to be sufficiently powered to detect this difference. We chose to oversample by 10 participants in order to have adequate sample size despite an expected loss of a few participants due to the possibility of equipment malfunction, especially when using multiple accelerometers and a handheld computer.

RESULTS

Of the 44 participants who participated in the study, significant data loss occurred for the thigh accelerometer in two participants, resulting in their exclusion from the data analysis. Additionally, the portable metabolic analyzer (used to address a study aim not part of the current manuscript) malfunctioned in three participants, resulting in premature termination of the protocol and exclusion of their data from the analysis. Therefore, 39 participants with viable data were included in the final data analysis. Sample demographics included in the analyses are displayed in Table 5.3.
Table 5.3. Demographic characteristics of participants enrolled in study.

Characteristic | All (n=39) | Males (n=19) | Females (n=20)
--- | --- | --- | ---
Age (years) | 22.1 (4.3) | 23.7 (5.0) | 20.5 (2.7)
Weight (kg) | 72.4 (16.2) | 84.5 (13.1) | 60.8 (8.9)
Height (cm) | 171.4 (10.1) | 179.1 (7.7) | 164.1 (5.7)
BMI (kg/m²) | 24.4 (3.6) | 26.3 (3.4) | 22.5 (2.6)

Data are displayed as mean (SD).

Predictions of total time spent in SB are shown in Figure 5.2. Overall, participants spent an average of 20.7 minutes engaged in SB during the visit, according to DO. The hip accelerometer tended to underpredict SB by 7.9% (19.1 minutes from the hip vs. 20.7 minutes from DO), but this difference did not reach statistical significance. The estimates of total time spent in SB predicted by the accelerometers placed on the thigh and both wrists were not significantly different from that measured by DO. Additionally, we rearranged the data to compare dominant and non-dominant wrist placements, but neither was significantly different from DO-measured total time spent in SB.

Figure 5.2. Predictions of total time spent in SB compared to a criterion measure (DO). [Bar chart; y-axis: Total Sedentary Behavior (min).]
AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Although there were no significant differences among the four accelerometers compared to DO for predicting total time spent in SB, there was considerable variation in RMSE (Table 5.4), ranging from 2.8 minutes with the accelerometer on the left wrist to 7.2 minutes with the hip-mounted accelerometer. Each monitor placement site had significantly different RMSE values than the other three, but it is notable that the two wrist accelerometer placements had RMSE values that were 49-61% lower than the RMSE values achieved with the hip and thigh accelerometer placements.
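As a worked check of the 49-61% range, the following assumes that the percentages are relative reductions computed from the rounded RMSE values reported in Table 5.4 (7.2 and 6.3 minutes for the hip and thigh; 2.8 and 3.2 minutes for the left and right wrists). The exact calculation used by the authors is not stated, so this is a plausible reading rather than a confirmed one.

```latex
\[
\text{smallest reduction (right wrist vs. thigh)}: \quad \frac{6.3 - 3.2}{6.3} \approx 0.49
\]
\[
\text{largest reduction (left wrist vs. hip)}: \quad \frac{7.2 - 2.8}{7.2} \approx 0.61
\]
```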
The left wrist placement had significantly lower RMSE than the right wrist placement; similarly, the non-dominant wrist placement had significantly lower RMSE than the dominant wrist placement.

Table 5.4. Root mean square error for prediction of total time spent in SB and breaks in SB.

Accelerometer location | RMSE for predicted total time spent in SB [Minutes (95% CI)] | RMSE for predicted 5-second breaks in SB [Breaks (95% CI)] | RMSE for predicted 30-second breaks in SB [Breaks (95% CI)] | RMSE for predicted 60-second breaks in SB [Breaks (95% CI)]
--- | --- | --- | --- | ---
AG Hip | 7.2 (6.9-7.4)* | 31.5 (30.7-32.2)* | 1.6 (1.5-1.6)* | 2.0 (2.0-2.1)
AG Thigh | 6.3 (5.0-6.5)* | 21.0 (20.5-21.6)* | 1.5 (1.4-1.5)* | 2.1 (2.0-2.1)
GE Left Wrist | 2.8 (2.7-2.9)* | 32.4 (31.7-33.1)* | 1.9 (1.8-1.9) | 2.1 (2.0-2.2)*
GE Right Wrist | 3.2 (2.8-3.5)* | 29.5 (28.9-30.0)* | 1.9 (1.8-1.9) | 2.2 (2.2-2.3)*
Dominant Wrist | 3.3 (2.9-3.5)^ | 30.7 (30.1-31.4) | 1.9 (1.8-2.0) | 2.2 (2.1-2.2)
Non-dominant Wrist | 2.7 (2.6-2.8) | 31.2 (30.6-31.9) | 1.9 (1.8-2.0) | 2.2 (2.1-2.2)

* indicates significant difference from all other accelerometer placements.
^ indicates significant difference from non-dominant wrist placement.
AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Predictions of breaks in SB compared to DO are shown in Figures 5.3-5.5. From these figures, it is apparent that choice of break interval for defining SB altered the accuracy of prediction of breaks in SB in this study. Choice of the five-second interval for defining SB (Figure 5.3) resulted in dramatic overestimations of breaks in SB for all four accelerometer placements compared to DO. The thigh accelerometer placement performed best for predicting breaks in SB for the five-second interval, but it still predicted over five times more breaks in SB than were actually taken.
On the other extreme, use of the 60-second interval for defining a break in SB (Figure 5.5) resulted in underprediction of breaks by all four accelerometer placements compared to DO, and none of the predictions were significantly different from each other. The 30-second interval for defining a break in SB resulted in highest accuracy of prediction of breaks in SB (Figure 5.4). The thigh and right wrist accelerometer placements underestimated breaks slightly, by an average of 0.7-0.8 breaks per visit. Conversely, the hip and left wrist accelerometer placements provided accurate predictions of breaks in SB with the 30-second interval. When analyzed by dominant and non-dominant wrists, the dominant wrist placement underpredicted breaks, whereas the non-dominant wrist accurately predicted breaks in SB.

Figure 5.3. Predictions of breaks in SB using a five-second interval. [Bar chart; y-axis: Number of Breaks in SB.]
* indicates significant difference from DO. ^ indicates significant difference from all other accelerometers.

Figure 5.4. Predictions of breaks in SB using a 30-second interval. [Bar chart; y-axis: Number of Breaks in SB.]
* indicates significant difference from DO.

Figure 5.5. Predictions of breaks in SB using a 60-second interval. [Bar chart; y-axis: Number of Breaks in SB.]
* indicates significant difference from DO.

Table 5.4 shows the RMSE values for predicted vs. measured breaks in SB, displayed separately for the five-, 30-, and 60-second break intervals. For the five-second break interval, the poor prediction accuracy for breaks in SB seen in Figure 5.3 was compounded by very high RMSE values for all four accelerometer placements, ranging from an error of 21.0 breaks for the thigh placement site to 32.4 breaks for the left wrist placement. The RMSE values for the 30- and 60-second break intervals were considerably lower than for the five-second break interval.
For all four accelerometer placements, the 30-second break interval had significantly lower RMSE than the 60-second interval, again indicating superior accuracy for the 30-second break interval. When comparing among the four accelerometer placements, the hip and thigh accelerometers had RMSE values 19-30% lower than both wrist placements for the 30-second break interval and 4-9% lower than the wrist placements for the 60-second break interval.

When comparing the two wrist placements, prediction of total time spent in SB was not significantly different between the two; however, RMSE for prediction of total time spent in SB was about 12% lower for the left wrist placement than the right wrist placement (and about 18% lower for the non-dominant wrist than the dominant wrist). For prediction of breaks in SB, both dramatically overpredicted breaks using the five-second break interval and underpredicted breaks using the 60-second break interval. Using the 30-second break interval, the RMSE values were similar between monitors, but the left wrist prediction of breaks was not significantly different from DO, whereas the right wrist underpredicted breaks compared to DO. Similarly, breaks were underpredicted when data were analyzed for the dominant wrist, but the non-dominant wrist placement resulted in accurate predictions of breaks with the 30-second break interval.

DISCUSSION

The purpose of this investigation was to develop, validate, and compare the accuracy of ANNs created to estimate total time spent in SB and breaks in SB from accelerometers located on the hip, wrists, and thigh. Additionally, we compared accuracy of accelerometers worn on the left and right wrists for prediction of time spent in SB and breaks in SB. The ANNs were developed in order to predict the type of activity being performed, and these were validated in our previous work (Chapter 4). For prediction of total time spent in SB, we summed time predicted as lying, reading, and computer use.
Similarly, we defined a break in SB as occurring when a bout of time spent in lying, reading, or computer use was followed by a bout predicted as a non-sedentary activity. When examining total time spent in SB, predictions from all four accelerometers were not significantly different from the criterion, although the hip trended toward underpredicting time spent in SB. Additionally, the two wrist accelerometer placements had significantly lower RMSE for predicting total time spent in SB compared to the hip and thigh placements, indicating that the wrist placement sites had less individual error (and superior accuracy) when predicting total time spent in SB. The hip accelerometer placement had the worst prediction of total time spent in SB, with an RMSE value more than 100% greater than those seen with the wrist placements and 14% higher than the RMSE from the thigh placement. The fact that the hip placement site performed worst of the four sites in terms of prediction error and its tendency to underpredict total time spent in SB are not surprising given previous studies by Kozey-Keadle et al. and Lyden et al. showing higher accuracy for measuring total time spent in SB using a thigh accelerometer than a hip accelerometer (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Additionally, Hart et al. used thigh- and hip-mounted accelerometers in a free-living setting and found that the thigh placement had higher convergent validity with other SB assessment measures than the hip placement (Hart, Ainsworth et al. 2011), again supplying evidence that thigh-mounted accelerometers are preferable to hip accelerometers for the measurement of total time spent in SB. Of note, the RMSE for the left wrist placement was 12% lower than the RMSE for the right wrist placement, which increased to an 18% difference when data were analyzed comparing the dominant and non-dominant wrists.
These findings indicate superior accuracy of the non-dominant wrist accelerometer for measurement of total time spent in SB. Implications of this finding are discussed later in this section. To our knowledge, this is the first study that assessed the utility of wrist-mounted accelerometers for measurement of SB. Initially, we were surprised by the superiority of the wrist accelerometers to the thigh accelerometer for measurement of total time spent in SB given the previous literature showing high accuracy of the thigh for measuring SB (Hart, Ainsworth et al. 2011; Kozey-Keadle, Libertine et al. 2011). However, in our previous work (Chapter 4), the left and right wrist accelerometer placements achieved activity type classification accuracies of 86.6% and 86.7%, respectively, which were slightly higher than the thigh placement accuracy (84.0%) and much higher than the accuracy achieved with the hip (72.5%). Moreover, the left and right wrist placements achieved prediction accuracies of 93.5% and 92.7%, respectively, for prediction of sedentary activities, which were higher than those of the thigh (92.1%) and hip (72.5%). Therefore, the highest overall prediction accuracies for activity type as well as the highest recognition of sedentary activities support the idea that wrist-mounted accelerometers may also be best for prediction of total time spent in SB.

In contrast, the wrist accelerometer placements did not outperform the hip and thigh placements for estimating breaks in SB. All four accelerometer placements performed best when using the 30-second break interval; using this break interval, the wrist placements had RMSE values 19-30% higher than the hip or thigh placements for estimating breaks in SB, indicating lower measurement accuracy from the wrist-mounted accelerometers. Additionally, only the non-dominant wrist placement accurately estimated breaks in SB for the 30-second break interval, with the dominant wrist underpredicting breaks by 17%.
The thigh placement also underpredicted breaks (by 15%) but had the smallest RMSE for prediction of breaks in SB. Surprisingly, the hip placement performed the best of the four accelerometer placements, accurately predicting breaks in SB while also yielding an RMSE only 5% higher than the thigh and 19-23% lower than the wrist accelerometers. The high accuracy of the hip placement is informative given mixed results reported by Lyden et al. concerning the utility of the hip for measurement of SB. In one study, Lyden et al. found that a thigh accelerometer was able to accurately classify breaks in SB, while a hip accelerometer overestimated breaks by 78-133% depending on the choice of cut-point used as the threshold for SB (Lyden, Kozey Keadle et al. 2012). In a more recent study, the authors found that when using ANNs, a hip-mounted accelerometer was able to accurately measure total time spent in SB as well as breaks in SB (Lyden, Keadle et al. 2013). The current study, in conjunction with Lyden’s work, provides further evidence of the advantages of using machine learning for modeling accelerometer data over using the traditional cut-point approach for measurement of SB using a hip-mounted accelerometer.

Our finding that the wrist accelerometer placements were outperformed by the hip placement for measurement of breaks in SB is surprising given that accuracy of activity type classification, specifically for recognizing SB, is higher for the wrists than the hip (Chapter 4). There are several possible reasons why higher classification accuracy did not translate to better measurement of breaks in SB. First, the hip is relatively insensitive to limb movements, whereas the thigh and wrists are not. Therefore, limb movements while sitting or lying (e.g., to drink water, scratch an itch, or adjust equipment or clothing) may cause misclassification of one or more five-second windows as non-sedentary activity.
While an occasional misclassification would have minimal effect on overall classification accuracy or on prediction of total time spent in SB, these misclassifications would disrupt periods of SB and therefore lead to incorrect prediction of a break in SB when one did not occur. It would seem that this type of misclassification would increase the number of breaks detected and result in overprediction, and this was the case with the 5-second break interval. However, given the relatively short periods of time for which some sedentary activities were performed, it is possible that periodic misclassification due to sporadic limb movement would keep the accelerometers from recognizing the activity as SB (especially with longer break windows), thereby not recording the subsequent transition as a break from SB and leading to underprediction of breaks in SB. Additionally, for measurement of breaks in SB, the left wrist accelerometer placement was able to predict breaks accurately with the 30-second break interval, while the dominant wrist underpredicted breaks by about 17%. Given that about 90% of our sample (all but four of the 39 participants) was right-hand dominant, it is not surprising that upon analyzing the data comparing the dominant and non-dominant wrists, the non-dominant wrist achieved better accuracy for measurement of total time in SB and breaks in SB. It may be that the dominant wrist accelerometer placement captured more irregular movement as participants performed the various activities in the visit, leading to misclassification of breaks in SB. Given this possibility, these results lend support to the convention in many large studies, such as NHANES, that wrist accelerometers should be worn on the wrist of the non-dominant hand (Troiano and McClain 2012; Troiano, McClain et al. 2014).
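
The interaction between window-level misclassification and the choice of break interval discussed above can be illustrated with a minimal sketch. This is not the procedure used in this study; it assumes that activity type is predicted for consecutive five-second windows and that a break is counted only when a sedentary bout and the non-sedentary bout that follows it each last at least as long as the break interval. The function name `count_breaks`, the constant names, and the example label sequence are hypothetical and introduced only for illustration.

```python
from itertools import groupby

# Sedentary activity labels named in the text; all other labels are non-sedentary.
SEDENTARY = {"lying", "reading", "computer use"}
WINDOW_S = 5  # assumed length of one classification window, in seconds


def count_breaks(labels, break_interval_s):
    """Count sedentary-to-non-sedentary transitions between qualifying bouts.

    Assumption (not from the source): both the sedentary bout and the
    following non-sedentary bout must last at least `break_interval_s`.
    """
    min_windows = max(1, break_interval_s // WINDOW_S)
    # Collapse the window labels into runs of (is_sedentary, run_length).
    runs = [(is_sed, sum(1 for _ in grp))
            for is_sed, grp in groupby(labels, key=lambda lab: lab in SEDENTARY)]
    breaks = 0
    for (sed_a, len_a), (sed_b, len_b) in zip(runs, runs[1:]):
        if sed_a and not sed_b and len_a >= min_windows and len_b >= min_windows:
            breaks += 1
    return breaks


# Hypothetical sequence: 60 s of reading, one misclassified window,
# 60 s more of reading, then 40 s of walking (one true break).
example = ["reading"] * 12 + ["walking"] + ["reading"] * 12 + ["walking"] * 8

for interval in (5, 30, 60):
    print(interval, count_breaks(example, interval))
```

Under these assumptions, the single misclassified window is counted as an extra break with the five-second interval, is ignored with the 30-second interval, and the true break is missed with the 60-second interval because the walking bout lasts only 40 seconds.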
It is important to reiterate that SB is a complex construct, and it is necessary to be able to measure the individual components of SB (i.e., total time and breaks) in order to better understand the influence of SB on health. Studies by Hamilton and colleagues provide evidence that prolonged SB is worse than an equivalent amount of time spent in SB which is frequently broken up by periods of non-sedentary activity (Bey and Hamilton 2003; Hamilton, Hamilton et al. 2004). Additionally, Healy and colleagues have published several studies showing inverse associations between breaks in SB and several health indices, independent of total time spent in PA or SB (Healy, Dunstan et al. 2008; Healy, Matthews et al. 2011). Our findings indicate that total time spent in SB may be best measured using wrist-mounted accelerometers, while breaks in SB may be better measured by a hip-mounted accelerometer. Therefore, as more research is conducted to better elucidate the health risks of total time spent in SB and breaks in SB, choice of accelerometer placement should be determined by the exact research question of interest.

Strengths and limitations

There were several limitations in this study. First, our sample consisted of mostly college students with interest in health sciences and may not be reflective of the wider college age/young adult population. Additionally, the amount of time spent sedentary as well as the number of breaks in SB by participants during the study protocol is probably not reflective of an average 90-minute segment of the day, so without further research it is not guaranteed that the monitors will perform with similar accuracy in a true free-living environment. This study also had a few noteworthy strengths.
To our knowledge, this was the first study to assess the utility of wrist-mounted accelerometers for measurement of total time spent in SB and breaks in SB, and our use of hip- and thigh-mounted accelerometers allowed for direct comparison of the accuracy of the wrist monitors to previously used methods of measuring SB. Second, while a simulated free-living setting may not be totally reflective of a true free-living environment, the simulated free-living setting allows for better generalizability of results than a heavily controlled, laboratory-based protocol. By utilizing a simulated free-living setting, we were able to allow some freedom in activity choice, intensity, and timing while still using high-quality criterion measures and examining a wide range of different activities in a relatively short period of time, thereby minimizing burden on participants and researchers.

Conclusions

Our results provide evidence that hip-, thigh-, and wrist-mounted accelerometers can provide accurate estimates of total time spent in SB, although measurement at the individual level may be most accurate using the wrist-mounted accelerometers. For measuring breaks in SB, the 30-second break interval appeared most accurate for all four accelerometers. When using the 30-second interval, the hip accelerometer performed best, although the left wrist accelerometer was also able to accurately predict breaks in SB. Together these results indicate that use of an accelerometer on the non-dominant wrist or the hip may be preferable for measurement of SB in a free-living setting, although the thigh accelerometer should be evaluated further due to its demonstrated utility for SB measurement in previous work.
Additionally, when combining these results with the results from Chapters 3 and 4 of this dissertation, it appears that the wrist-mounted accelerometers (especially the non-dominant wrist accelerometer) perform well for measurement of energy expenditure and best for classification of activity type and measurement of SB. Therefore, these results suggest that the wrist may be an ideal site for measurement of many behavioral characteristics. Given the previous and current use of wrist-mounted accelerometers for sleep measurement, we plan to expand our ANNs to recognize and classify sleep duration and quality in addition to the variables already assessed. Additionally, use of wrist-mounted accelerometers may allow researchers to design pattern recognition approaches to recognize eating behaviors; we plan to further explore this possibility in future work.

CHAPTER 6

DISSERTATION SUMMARY AND RECOMMENDATIONS

Summary of results

High levels of physical activity (PA) and low levels of sedentary behavior (SB) are known to be beneficial for improving physical and mental health and lowering the risk of many chronic diseases (PAGAC 2008). Valid measurement tools are required to accurately assess the relationship of PA and SB to health outcomes, monitor precise levels of PA or SB to identify groups of people who are attaining insufficient PA and/or too much SB, and evaluate the effectiveness of interventions aimed to increase PA and decrease SB. Accelerometers are commonly used for prediction of energy expenditure, activity type (to determine PA participation), and SB, but the models used to predict these outcomes vary considerably in their complexity and accuracy.
Therefore, the purposes of this dissertation were to 1) create predictive models from accelerometer data with the intent to predict energy expenditure, activity type, and SB, 2) compare the accuracy of models created from accelerometers worn on the right hip, right thigh, and both wrists, and 3) develop and test the models using simple input features and widely available computational software.

Chapter 3: Estimation of energy expenditure

The first part of our investigation focused on the ability of the four accelerometer placements to accurately estimate EE. We hypothesized that all four placements would achieve at least moderately high accuracy for predicting EE, as indicated by correlations of r ≥ 0.60 (Safrit and Wood 1995). The four placements achieved correlations of r = 0.82-0.89 with measured EE from the Oxycon, supporting our hypothesis and indicating high accuracy for prediction of EE from all four placements. Root mean square error (RMSE) was also calculated and ranged from 1.05-1.42 METs, which is in line with values seen in previous work. When comparing placement sites, we hypothesized that the thigh location would show the highest EE prediction accuracy. This hypothesis was supported, with the thigh accelerometer achieving higher correlations and lower RMSE for predicting EE than the hip or wrist accelerometer placements. Another important advantage of the thigh-mounted accelerometer over the other placements was that the use of fewer input features in the EE prediction model (which reduces its complexity) did not result in lower accuracy, whereas the predictive accuracy was lower when fewer features were used with the hip and two wrist accelerometer models. These findings lend support to the use of thigh-mounted accelerometers for achieving high predictive accuracy for measuring EE, even with relatively simple prediction models.
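
For readers less familiar with the two accuracy statistics reported above, the following minimal sketch shows how a Pearson correlation and an RMSE can be computed from paired predicted and measured EE values. It is illustrative only and is not the analysis code used in this dissertation; the function names and the MET values in the example are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, measured):
    """Root mean square error of predictions against a criterion measure."""
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical EE values in METs (not data from this study).
measured_mets = [1.0, 2.0, 3.0, 4.0]
predicted_mets = [1.5, 2.5, 2.5, 4.5]

print(round(pearson_r(predicted_mets, measured_mets), 2))
print(rmse(predicted_mets, measured_mets))
```

The correlation describes how closely predictions track the criterion across observations, whereas RMSE expresses the typical size of the prediction error in the units of the outcome (here, METs), which is why both are reported.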
However, the superiority of the thigh accelerometer placement should not overshadow the fact that both the hip and two wrist accelerometer placements also achieved highly accurate predictions of EE.

One significant hurdle in assessing EE in free-living settings is choice of a criterion measure. Doubly labeled water is often used as a criterion measure for total EE but cannot assess minute-to-minute EE. As an alternate approach, Lyden et al. used direct observation as a criterion measure of free-living EE by recording the activity performed and then looking up an EE value from the Compendium of Physical Activities in order to predict EE for each activity (Ainsworth, Haskell et al. 2011; Lyden, Keadle et al. 2013). A potential problem with this approach is that the Compendium represents an average value of EE for activities and is not necessarily accurate for a given individual, especially when the observer would have to record the activity and also estimate the activity intensity. Additionally, this method does not allow for prediction of EE during transitions between activities since a transition is not a defined activity type but instead is used to classify times when a person moves from one activity to another. Given the limitations of these methods, we chose to use indirect calorimetry via a portable metabolic analyzer as our criterion measure, which measures oxygen consumption to derive estimates of EE. Use of this method allowed us to record data during all activity times as well as during transitions. Indirect calorimetry provides a valid measure of EE when a person performs steady-state activities (Rosdahl, Gullstrand et al.
2010); however, when a person changes activities or moves to a different intensity of activity, change in oxygen consumption lags behind, meaning that indirect calorimetry may not capture the true energy requirement of a task unless the task is being performed at steady state, which may take several minutes to achieve after an activity is started (Kenney, Wilmore et al. 2012). In our study, participants performed 14 distinct activities but could perform an activity more than once; the actual number performed ranged from 14-20 and averaged about 16, with an average length of about five minutes per activity. Therefore, a significant portion of time during the protocol was likely not spent in steady-state EE. Despite these shortcomings, we deemed indirect calorimetry the best available criterion measure due to the limitations of doubly labeled water and direct observation (discussed earlier).

The lack of steady-state EE seen in our study likely relates to true free-living situations, at least for PA. In free-living settings, adults likely reach steady-state EE during SB since SB makes up the majority of waking time and since most SB bouts are performed for a prolonged period of time (i.e., > 10 minutes) (Matthews, Chen et al. 2008; Lyden, Kozey Keadle et al. 2012). However, non-sedentary activities make up a much smaller portion of the day and are generally performed in shorter bouts, especially with respect to higher-intensity activities (Troiano, Berrigan et al. 2008); therefore, we expect that steady state is rarely achieved during free-living PA. Accordingly, we feel that more research and discussion are needed to develop ways of improving the use of direct observation and/or indirect calorimetry for measurement of non-steady-state EE. One potential idea would be to perform a similar protocol to ours but to add a second visit where each participant can perform each activity at steady state while their EE is measured via indirect calorimetry.
Then, for the simulated free-living visit, direct observation could be used as the criterion (similar to Lyden’s study), but the individual’s measured EE values from the steady-state visit could be used to predict EE instead of using the Compendium for prediction of EE. This approach would likely increase validity of direct observation but would also significantly increase participant and researcher burden and cost of the study. However, we feel that our use of indirect calorimetry represented an appropriate criterion measure to answer our research questions and are confident that our results provide an accurate reflection of the true utility of the four accelerometer placements we tested for prediction of EE.

In conclusion, the thigh accelerometer performed best of the placement sites for prediction of EE, and the superiority of the thigh was more apparent with the simplest ANNs. However, the wrists and hip placements achieved correlations within 10% and error within 25% of that achieved by the thigh placement, indicating that high accuracy can also be achieved for measurement of EE using accelerometers placed on the hip and wrists. Therefore, thigh-mounted accelerometers should be used if EE measurement accuracy is of utmost importance, but the hip and wrists can be used for accurate measurement as well, if these placement sites are more practical for the population being tested or for the specific research question being addressed.

Chapter 4: Classification of activity type

The second major aim of this dissertation was evaluating the ability of accelerometers located on the hip, thigh, and wrists to correctly predict the specific type of activity being performed. Our first aim in this study was to create models to predict activity type using simple input features and widely available, easy-to-use software packages. We were successful in accomplishing this goal by using Microsoft Excel for data processing, cleaning, and reduction and R for ANN creation.
Our first hypothesis-driven aim was to compare overall classification accuracies among the four accelerometers as well as compare accuracies for detecting specific types of activities. From our results shown in Chapter 3 as well as in previous research by members of our research group and others (Cleland, Kikhia et al. 2013; Dong, Montoye et al. 2013; Skotte, Korshoj et al. 2014), we hypothesized that the thigh accelerometer would achieve the highest overall activity classification accuracy. However, our results did not support this hypothesis. When comparing classification accuracies for identifying all 14 activities, the two wrist accelerometers performed the best, with classification accuracies of 81.3-81.4%. They also showed the highest sensitivity and specificity for activity classification, whereas the thigh and hip accelerometers achieved accuracies of only 71.7% and 66.4%, respectively. When grouping similar activities into categories, the accuracies of all four monitors improved; the wrist accelerometers still had the highest classification accuracies at 86.6-86.7%, with the thigh being much closer in accuracy (84.0%) than the hip (72.5%). When looking at the classification accuracies of specific activity types, we hypothesized correctly that the wrist accelerometers would have the highest classification accuracies for lifestyle activities (laundry and sweeping) as well as other upper-body activities such as biceps curls. The wrist accelerometers also achieved the highest accuracy for classifying sedentary activities, which we hypothesized would be measured best with the thigh-mounted accelerometer given high accuracy for SB measurement by thigh-mounted accelerometers seen in previous research (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012).
However, the seated activities in this study (computer use and reading) involved arm movement, which we believe was detected better with the wrist accelerometers than the thigh accelerometer. Importantly, combining the sedentary activities into one category resulted in the thigh achieving an overall classification accuracy within 1.5% of that achieved by the wrists, providing evidence that even if the thigh cannot accurately classify specific types of sedentary activity (e.g., lying vs. sitting), the thigh is highly accurate for differentiating SB from non-sedentary activities. Our findings contrast somewhat with previous work showing comparable or higher measurement accuracy of hip and thigh accelerometers (Cleland, Kikhia et al. 2013; Dong, Montoye et al. 2013; Skotte, Korshoj et al. 2014). However, the current study tested a larger number of activities, and they occurred in a simulated free-living setting, which can yield very different results compared to those found in a laboratory-based setting (Gyllensten and Bonomi 2011; Lyden, Keadle et al. 2013; van Hees, Golubic et al. 2013). In addition, the current study design is more generalizable to true free-living settings than the designs used in previous work. We also compared classification accuracies of monitors worn on the left and right wrists. Since about 10% of the sample was left-hand dominant, we also analyzed the data comparing the dominant and non-dominant wrists. In both analyses, the classification accuracies of the monitors on the wrists were within 1.0%, signifying similar accuracy of measurement regardless of the wrist on which the accelerometer was worn. This finding suggests that the popular convention to wear
an accelerometer on the non-dominant wrist may be unnecessary for prediction of activity type, especially if compliance will be improved by allowing wearers to choose the wrist on which to wear the accelerometer (although our findings do not support that the wearer can switch between wrists within a study).

Comparison of classification accuracies achieved among different studies is notoriously difficult because classification accuracy is inversely related to the number of activities and the similarity among activities (all else held equal). Therefore, studies comparing the utility of different accelerometer placement sites must directly compare each placement site. We chose to test the hip, thigh, and wrists because they are the three most commonly used accelerometer placement sites, but other sites, such as the ankle or lower back, may have advantages in certain situations and should be considered for use in future studies.

Another difficulty of activity type classification studies is choice of activities to include in the testing set. Predictive models can only predict activities that were used in the model creation; for example, the models created in this dissertation can predict laundry but have no output variable for gardening or dishwashing. When creating models to recognize specific activity types, there is no way to include all activities that people may perform in their everyday lives. However, by collapsing activities into categories comprising similar activities, it is possible to develop an idea of how people spend their days and how active they are. The ANNs developed in this study showed an ability to classify 10 categories of activities with sensitivities from 72.2-86.7%.
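
The effect of collapsing specific activities into categories can be sketched as follows. This is a minimal illustration rather than the analysis performed in this dissertation: the mapping `CATEGORY` covers only six of the activities named in the text, the grouping shown is an assumption made for the example, and the true and predicted labels are hypothetical.

```python
from collections import Counter

# Illustrative grouping of a few activities named in the text (assumed mapping).
CATEGORY = {
    "lying": "sedentary",
    "reading": "sedentary",
    "computer use": "sedentary",
    "laundry": "lifestyle",
    "sweeping": "lifestyle",
    "biceps curls": "exercise",
}


def collapse(labels):
    """Replace each specific activity label with its category label."""
    return [CATEGORY[label] for label in labels]


def sensitivity_by_class(true_labels, pred_labels):
    """Fraction of windows of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical windows: confusions occur only between similar activities.
true = ["lying", "reading", "computer use", "laundry", "sweeping", "biceps curls"]
pred = ["lying", "computer use", "reading", "sweeping", "laundry", "biceps curls"]

specific = sensitivity_by_class(true, pred)
grouped = sensitivity_by_class(collapse(true), collapse(pred))
print(specific)
print(grouped)
```

In this example, reading and computer use are confused with each other, as are laundry and sweeping, so sensitivity for those specific activities is zero; after collapsing, every window falls in the correct category. Confusions between similar activities stop counting as errors once the activities share a category, which is one reason accuracy tends to rise as the number of output classes falls.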
Further reduction to identification of activity intensities improved the sensitivity and AUC for the thigh accelerometer placement and resulted in high classification accuracy by the thigh and wrist placement sites and good classification accuracy by the hip placement site (Metz 1978).

In free-living settings, adults perform a wide variety of activities not included in this study; therefore, we expect that the capability of our ANNs for prediction of specific types of activities will be decreased in free-living settings. However, we demonstrated high predictive accuracy by the thigh- and wrist-mounted accelerometer placements when collapsing our prediction into either activity categories or activity intensities, and we feel that this approach is much more generalizable because even activities not tested in this study can be grouped into an activity category or intensity in order to measure activity levels in a free-living setting.

Along with the discussion above, the importance of being able to classify specific activities vs. activity categories (i.e., lifestyle, exercise, etc.) vs. activity intensities (i.e., sedentary, light, etc.) will depend on the question of interest. For example, a physical therapist may be interested in measuring specific types of exercise activities to gauge compliance in a rehabilitation program, necessitating the differentiation of specific exercise activities. Alternatively, a mother might want to be able to differentiate between her child’s reading and TV watching but might be happy with any type of exercise or ambulatory activity. From a health behavior perspective, it is necessary to recognize specific types of ambulatory activity to differentiate incidental activity from health-enhancing activity. Specific exercise activities may not be as important to differentiate unless dictated by a specific research question. Lifestyle activities and standing are likely most important from an energy balance perspective or as breaks in SB.
Lastly, from a pure health standpoint, recognition of specific types of SB may not be as important; however, from an intervention perspective, recognition of specific types of SB may be critical because getting someone to watch less TV may require different techniques than getting someone to drive less or sit less at work.

In conclusion, our study indicates that a variety, but not all possible types, of sedentary, ambulatory, lifestyle, and exercise activities were classified most accurately with accelerometers placed on the left or right wrist, especially if classification of specific types of activities is of importance. When activities were combined into similar categories, the thigh accelerometer classification accuracy approached that achieved by the wrists, but the wrists remained superior. Conversely, when classifying activities by activity intensities, the thigh placement slightly outperformed the two wrist placements, although all three of these sites achieved high overall intensity classification accuracy. These findings may vary depending on the choice of activities included in a validation protocol; however, from these findings, it appears that upper-body movements may be more unique to an activity than lower-body movements, allowing for better recognition of activities when using simple input features for an ANN created from wrist-accelerometer data.

Chapter 5: Estimation of sedentary behavior

The final objective of this dissertation was to assess the ability of hip, thigh, and wrist accelerometers to accurately predict total time spent in SB as well as breaks in SB. Our first aim of this study was to compare the accuracy of the hip, thigh, and wrist accelerometers for measurement of total time spent in SB.
We expected the thigh accelerometer to provide the most accurate estimates, whereas we hypothesized that the hip would overestimate total SB due to misclassification of standing as SB and that the wrist would underestimate total SB due to misclassification of SB as a non-sedentary activity (due to aberrant wrist movement during SB). Overall, all four accelerometer placements provided similar predictions of total time spent in SB, but measurement error was considerably higher for the hip than the thigh, and the thigh had greater error than the two wrist accelerometers. The finding that the hip had higher error than the thigh supports previous work by Lyden, Kozey-Keadle, and Grant (Grant, Ryan et al. 2006; Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Our finding of superior accuracy of the left and right wrist accelerometer placements was contrary to our initial hypothesis, but not overly surprising, since the ANNs used to predict total SB were the same as the ones used to classify activity type in Chapter 4, where the wrists outperformed the hip and thigh for overall activity recognition as well as recognition of SB.

The second aim of the study was to compare the accuracy of the four accelerometers for estimating breaks in SB. We used three different break intervals (5, 30, and 60 seconds) for classifying a break in SB in order to determine an optimal break interval for measurement, as it is currently unknown what interval is best suited for recognizing breaks in SB. We found that the 5-second interval was too short, with the misclassification of single windows of accelerometer data resulting in dramatic overpredictions of breaks in SB by all four accelerometers. Conversely, the 60-second break interval appeared to be too long and resulted in underprediction of breaks in SB by all four accelerometers.
Using the 30-second interval, the hip and left wrist accelerometers predicted breaks in SB accurately, while the thigh and right wrist accelerometers underpredicted breaks. However, error in SB break prediction was lowest with the thigh and highest with the wrist accelerometers. These findings were unexpected given the superior accuracy of the wrists for predicting total time spent in SB; however, the findings point to the importance of measuring these two constructs separately. Even though total time in SB and breaks in SB are related, accurate measurement of one does not imply accurate prediction of the other. The hip accelerometer placement’s ability to measure SB breaks accurately may be due to its insensitivity to limb movement, whereas the wrists and thigh may have detected limb movement and potentially misclassified SB as a non-sedentary activity, therefore misclassifying breaks in SB. Previous research shows mixed findings on the accuracy of hip accelerometers for measurement of breaks in SB, but a recent study by Lyden et al. highlights dramatically improved measurement of SB using machine learning in comparison to the cut-point approach (Lyden, Keadle et al. 2013). Therefore, our study provides further support that machine learning may allow for improved measurement of SB using a hip accelerometer.

Interestingly, we found that the left and right wrist accelerometer placements performed with similar accuracy when predicting total time spent in SB, but the left wrist showed higher accuracy for prediction of SB breaks. Given that 90% of our sample was right-hand dominant, our findings indicate potential superiority of the non-dominant wrist for measurement of SB, which is in accordance with the convention for accelerometers to be placed on the non-dominant wrist.

We chose to predict time spent in SB and breaks in SB by first classifying into specific types of activity. An alternate way to classify SB would be to use an EE of < 1.5 METs as SB.
Breaks would occur any time a predicted EE of < 1.5 METs was followed by an EE of ≥ 1.5 METs. This approach is how cut-point methods have been used, but its major drawback is that standing typically elicits an EE of < 1.5 METs yet is defined as a non-sedentary activity by Owen et al. (Owen, Healy et al. 2010) due to evidence that standing may not have the same health implications as activities such as sitting or lying (Katzmarzyk 2014). Therefore, we determined that this method was inappropriate since it would likely misclassify standing as SB.

In conclusion, the findings for prediction of total time spent in SB closely mirrored our findings from Chapter 4 showing highest accuracy for the wrist accelerometers and lowest accuracy for the hip accelerometer, although all four accelerometers provided similar estimations of SB breaks. The results for prediction of breaks in SB were more mixed but indicated that the hip was superior for measurement of breaks in SB. Further work is needed to confirm these findings as there is limited and potentially conflicting research regarding the utility of different accelerometer placement sites for measurement of SB.

Conclusions

This dissertation provides a comparison of the utility of accelerometers placed on the hip, thigh, and wrists and machine learning models for measurement of three key behavioral variables (energy expenditure, activity type recognition, and sedentary behavior), which are important determinants of long-term health at an individual and population level. We sought to determine if accelerometer placement affected measurement accuracy and if an optimal placement existed for measurement of all three variables. Our study suggests that choice of placement site affects measurement accuracy.
Each outcome variable had a different optimal placement site, with the thigh being best for energy expenditure, the wrists being best for activity type classification, and the hip and left wrist being best for measurement of SB, although the SB findings were somewhat mixed. Additionally, although no single placement site was best for all measures, all placement sites allowed for high accuracy of measurement of energy expenditure, the wrists and thigh achieved over 80% accuracy for activity type classification, and all four monitors showed strengths and weaknesses for measurement of SB. Given these findings along with those of previous work, it seems that choice of accelerometer placement should depend on the specific research questions, the population being tested, the length of time monitors are to be worn, and the complexity of the models desired.

In an effort to compare the accuracy of our ANNs for measurement of energy expenditure vs. measurement of activity type, we have provided Table 6.1 below, which shows the sensitivity, specificity, and AUC of the energy expenditure ANNs for their accuracy in predicting activity intensity (similar to Table 4.9). With the activity type ANNs, AUC for activity intensity was as low as 0.85 for the hip accelerometer placement (indicating good accuracy) and as high as 0.94 for the thigh accelerometer placement (indicating high accuracy). Conversely, the activity intensity AUC was much lower for the energy expenditure ANNs, with AUC values of 0.75-0.76 for the hip and wrist placements and 0.79 for the thigh placement, indicating only fair accuracy for the four placement sites. Therefore, it appears that in terms of determining activity intensity, the activity type ANNs may be superior to the energy expenditure ANNs for all accelerometer placement sites tested.

Table 6.1.
Overall sensitivity, specificity, and AUC among the four accelerometer placement sites for classification of activity intensity using the energy expenditure ANNs (developed in Chapter 3).

Sensitivity (% agreement)
             AG Hip         AG Thigh       GE Left Wrist  GE Right Wrist
Sedentary    64.1 (7.3)*    70.4 (6.9)*    58.7 (7.5)*    51.8 (7.6)*
Light        60.2 (6.3)*    65.2 (6.1)     66.2 (6.1)     66.8 (6.0)
Moderate     69.1 (6.7)     72.3 (6.5)     74.4 (6.3)     72.6 (6.5)
Vigorous     70.6 (9.0)     75.8 (8.4)     62.2 (9.6)*    65.6 (9.4)*
MVPA         83.8 (4.3)     88.1 (3.8)     86.0 (4.0)     85.2 (4.1)
Total        65.1 (3.6)     69.9 (3.4)*    66.0 (3.6)     64.4 (3.6)

Specificity (%)
             AG Hip         AG Thigh       GE Left Wrist  GE Right Wrist
Sedentary    93.0 (3.9)     90.7 (4.4)     93.1 (3.8)     93.6 (3.7)
Light        78.0 (5.3)     82.9 (4.8)*    77.5 (5.4)     74.6 (5.6)
Moderate     82.1 (5.6)     87.9 (4.7)*    83.1 (5.4)     83.4 (5.4)
Vigorous     97.5 (3.1)     96.5 (3.6)     98.0 (2.7)     97.8 (2.9)
MVPA         84.1 (4.3)*    90.2 (3.5)*    87.3 (3.9)     86.7 (4.0)
Total        85.6 (2.6)     88.1 (2.4)*    85.8 (2.6)     85.0 (2.7)

AUC
             AG Hip         AG Thigh       GE Left Wrist  GE Right Wrist
Sedentary    0.79 (0.01)*   0.81 (0.01)*   0.76 (0.01)*   0.73 (0.01)*
Light        0.69 (0.01)*   0.74 (0.01)*   0.72 (0.01)*   0.71 (0.01)*
Moderate     0.76 (0.01)*   0.80 (0.01)*   0.79 (0.01)*   0.78 (0.01)*
Vigorous     0.84 (0.02)*   0.86 (0.02)*   0.80 (0.02)*   0.82 (0.02)*
MVPA         0.84 (0.01)*   0.89 (0.01)*   0.87 (0.01)*   0.86 (0.01)*
Total        0.75 (0.01)    0.79 (0.01)*   0.76 (0.01)*   0.75 (0.01)

Values are shown as Mean (SD). The * indicates significant differences from all other accelerometer placement sites.

Current use of machine learning suffers from several pitfalls that this dissertation sought to address. First, machine learning models are often built by engineers or computer scientists who have an understanding for model building far beyond that of the average physical activity researcher. The associated complexity of many machine learning models limits or prohibits their use by physical activity researchers. The artificial neural networks created in this dissertation were built with simple input variables that can easily be calculated in Microsoft Excel.
Additionally, we used pre-written R code for development and testing of our models, thereby accomplishing our goal of making artificial neural network creation understandable and accessible to non-experts.

Also, validation studies are often conducted in laboratories under strictly controlled protocols that require activities to be performed at a constant intensity, for a defined period of time, and in a specific order. These laboratory conditions are not similar to how people actually act in a free-living environment, and previous research shows consistent drops in performance when laboratory-validated techniques are applied to free-living situations (Gyllensten and Bonomi 2011; Lyden, Keadle et al. 2013; van Hees, Golubic et al. 2013). Therefore, we allowed participants considerable freedom in our protocol to make our setting as similar to a free-living environment as possible while still keeping the visit short and having participants perform all 14 activities.

The dissertation results provide several important advances to the field of physical activity and sedentary behavior measurement. First, we have improved measurement of energy expenditure using a single accelerometer far beyond what has been achieved using count-based regression and cut-points. With our energy expenditure models, it is possible to determine a person's daily kcal expenditure and therefore provide valuable information relevant to interventions such as weight loss. In addition, total daily energy expenditure can be used as a measure of total activity level in order to determine relationships with specific health outcomes. Alternatively, by using three METs as the threshold for MVPA, we can use the energy expenditure models to determine daily MVPA levels, measure adherence to the national physical activity recommendations, and identify individuals or groups accumulating inadequate physical activity or excessive sedentary behavior.
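As an illustration of these two uses of the energy expenditure predictions, the sketch below converts a sequence of predicted MET values into minutes of MVPA and an approximate kcal total. It is a hedged example rather than the dissertation's code: the function names, the 30-second window length, and the 70-kg body mass are hypothetical, and the kcal conversion assumes the common convention that 1 MET is roughly 1 kcal per kg of body mass per hour.

```python
def mvpa_minutes(predicted_mets, window_s, threshold=3.0):
    """Minutes spent at or above the MVPA threshold (three METs in the text)."""
    n_mvpa_windows = sum(1 for met in predicted_mets if met >= threshold)
    return n_mvpa_windows * window_s / 60.0


def total_kcal(predicted_mets, window_s, body_mass_kg):
    """Approximate energy expenditure in kcal.

    Assumes 1 MET is about 1 kcal per kg per hour (a convention, not a value
    taken from the dissertation).
    """
    hours_per_window = window_s / 3600.0
    return sum(met * body_mass_kg * hours_per_window for met in predicted_mets)


# Hypothetical predictions for one hour of 30-second windows:
# 30 min near rest, 20 min of moderate activity, 10 min of vigorous activity.
example_mets = [1.2] * 60 + [3.5] * 40 + [7.0] * 20

print(mvpa_minutes(example_mets, window_s=30))  # 30.0
print(round(total_kcal(example_mets, window_s=30, body_mass_kg=70.0), 1))
```

Summing `mvpa_minutes` over a day or week of predictions would give the quantity to compare against an activity recommendation; the window length should match whatever window the predictive model actually uses.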
Second, our activity type models are useful for determining times of the day when participants are most/least active in addition to knowing how much time they spend in certain behaviors. This information is important for individuals who tailor specific intervention strategies to help people become more active and less sedentary. Also, it may help for determining associations of specific behaviors (e.g., standing) with health outcomes.

Lastly, the emphasis on accurate sedentary behavior measurement in this dissertation was warranted given the current lack of a measurement tool that is valid for assessment of sedentary behavior as well as physical activity. A major purpose of this dissertation was to determine an optimal method for measuring total time spent in sedentary behavior and breaks in sedentary behavior since an accurate sedentary behavior measurement tool will facilitate further research into the health risks of sedentary behavior and allow for evidence-based recommendations to be developed regarding healthy levels of sedentary behavior. We believe that this dissertation offers a fairly accurate measure of sedentary behavior, but there is room for improvement in this measure.

It would be ideal if one accelerometer placement performed best for all variables of interest because that would allow for recommendation of a single monitor placement, but this did not occur. However, if we were to pick one accelerometer placement based on the results of this dissertation, we would choose an accelerometer placed on the non-dominant wrist. This placement showed the highest accuracy for activity type prediction and achieved high overall measurement accuracy for energy expenditure and sedentary behavior. The dominant wrist also performed well for activity type and energy expenditure prediction but with lower accuracy for prediction of sedentary behavior.
The thigh placement also performed well overall, but the wrist-mounted accelerometers were more comfortable and convenient for wear and still yielded high measurement accuracy. A good blend of practicality and accuracy is often desired for measurement tools used in large epidemiologic, surveillance, or intervention studies. Additionally, it appears that more accelerometer features may improve measurement accuracy, but simpler feature sets can still provide high accuracy while simplifying the predictive models. From our results, we would recommend the feature set consisting of the five accelerometer percentiles (10th, 25th, 50th, 75th, and 90th), which has been used in previous work and also showed high measurement accuracy in this dissertation.

Results of this dissertation encourage further exploration of accurate yet relatively simple ways of using accelerometers to measure several important behavioral variables known to influence health. Below, we have outlined future directions for exploration of sedentary behavior measurement as well as other areas that build off the findings from this dissertation.

Recommendations for future research

From the findings of this dissertation, we have several recommendations for further research. These recommendations are discussed below.

1. Further research should be conducted evaluating the accuracy of the hip, thigh, and wrist accelerometer placement sites for measurement of sedentary behavior. Evidence is emerging that sedentary behavior is an important health determinant, necessitating further refinement of measurement tools which can accurately measure the various aspects of sedentary behavior (total time, breaks, etc.) to better understand its health effects and determine evidence-based recommendations for limiting sedentary behavior to improve health.
We feel that the models we developed for measurement of sedentary behavior provided good accuracy but can be improved; some suggested areas for experimentation include use of different input features and machine learning techniques that may be better suited specifically for differentiating movement from non-movement.

2. The testing of the wrist, hip, and thigh placement sites should be expanded to a more diverse population. Children and older adults have very different movement patterns and physical activity levels (Bailey, Olson et al. 1995; Troiano, Berrigan et al. 2008), and certain placements may have advantages in these different populations. Additionally, overweight/obese or pregnant populations often feel uncomfortable while wearing hip accelerometers (Feito, Bassett et al. 2011), so wrist and thigh placements should be tested in these populations to determine if these are sufficient alternative sites for accelerometers to be worn.

3. Accelerometer placement sites should be qualitatively and quantitatively evaluated for wear preference, and compliance data for each site should be assessed. There is preliminary information from NHANES that the wrist accelerometer placement has slightly higher compliance than the hip accelerometer (Troiano and McClain 2012), but these findings must be verified and expanded.

4. Machine learning algorithms other than artificial neural networks should be utilized in model creation. Artificial neural networks are being studied more thoroughly, show high measurement accuracy, and are easier to compute using R software than many other types of algorithms, but they are also much less computationally efficient than other algorithms. Additionally, other algorithms may be able to achieve higher measurement accuracy than artificial neural networks (Preece, Goulermas et al. 2009). These possibilities should be explored in future work.

5.
The simulated free-living setting used in this study was a significant study strength, and the results are likely more generalizable than those achieved in laboratory-based settings. However, a simulated free-living setting does not provide a perfect representation of the true free-living environment; thus, the artificial neural networks created in this study should be evaluated in a true free-living setting.

6. Further work should be done to determine the optimal criterion measure for use in free-living measurement of energy expenditure. We chose to use indirect calorimetry as the criterion for this dissertation, even though the majority of time was likely not spent in steady state.

7. This dissertation provided preliminary validation of artificial neural networks developed from accelerometer data to detect several behavioral variables, including energy expenditure, recognition of common activities performed, and sedentary behavior. Others have used accelerometers (usually placed on the wrist or hip) for measurement of sleep quality and quantity (Jean-Louis, Kripke et al. 2001), both of which have known associations with many health indices (Hoevenaar-Blom, Spijkerman et al. 2011). Several proprietary activity monitors such as the Fitbit® or Fuelband® are designed to monitor both activity and sleep, but these have questionable accuracy, and we are unaware of a research-grade device that has been validated to accurately measure both sleep and activity variables. We would like to expand the use of the machine learning algorithms developed and validated in this dissertation to measure sleep quantity and quality.

8. One important finding of this dissertation is that upper-body activities and specific sedentary behaviors are detected well by wrist-mounted accelerometers.
Diet is a notoriously difficult variable to measure, and one reason for this difficulty is that diet is most often subjectively recalled via diary, interview, or food frequency questionnaire (Thompson and Subar 2013). Two objective methods exist to measure diet, direct observation and blood-based biomarkers, but direct observation is likely to cause reactivity and blood biomarkers are only useful for some nutrients and not overall diet quality (Park, Vollset et al. 2013). An interesting potential application of machine learning and pattern recognition would be to attempt to detect when someone is eating using acceleration data from a wrist accelerometer. Eating is typically a seated, sedentary behavior with predictable arm movement; these characteristics give us reason to believe that eating could be recognized using a wrist-mounted accelerometer. This approach may not be able to yield accurate estimates of diet quality or quantity of foods consumed, but it could provide valuable information about eating behaviors such as frequency and timing of meals. Also, there may be ways to use this information as feedback to the wearers to improve subjective recall of eating behaviors and to combine physical activity and eating behavior assessment to provide more accurate and focused health information.

APPENDICES

APPENDIX A
Consent form

APPENDIX B
Recruitment flyer
Figure B.1. Recruitment flyer

APPENDIX C
Recruitment email

APPENDIX D
Supplemental figures
Figure D.1. Equipment worn by participants during the 90-min protocol. Participant shown is performing the lying activity (T1).
Figure D.2. Example of participant performing reading activity (T2).
Figure D.3. Example of participant performing computer use activity (T3).
Figure D.4. Example of participant performing standing activity (T4).
Figure D.5. Example of participant performing laundry activity (T5).
Figure D.6.
Example of participant performing sweeping activity (T6).
Figure D.7. Example of participant performing walking slow and fast activities (T7 and T8).
Figure D.8. Example of participant performing jogging activity (T9).
Figure D.9. Example of participant performing cycling activity (T10).
Figure D.10. Example of participant performing stair use activity (T11).
Figure D.11. Example of participant performing biceps curls activity (T12).
Figure D.12. Example of participant performing squats activity (T13).
Figure D.13. Example of non-wear (T14).

REFERENCES

"R Core Development Team. R: A language and Environment for Statistical Computing. version 2.12.1." (2008).
"Physical Activity Guidelines Advisory Committee: 2008 Physical Activity Guidelines for Americans." (2008).
"US Department of Health and Human Services. 2008 physical activity guidelines for Americans." from http://www.health.gov/PAGuidelines/.
ACSM (2009). ACSM's Guidelines for Exercise Testing and Prescription, Lippincott Williams & Wilkins.
ActiGraph. (2013). "Products: GT3X+ Monitor." from http://www.actigraphcorp.com/products/gt3x-monitor/.
Ainsworth, B. E., W. L. Haskell, et al. (2011). "2011 Compendium of Physical Activities: a second update of codes and MET values." Medicine and science in sports and exercise 43(8): 1575-1581.
Akkermans, M. A., M. J. Sillen, et al. (2012). "Validation of the oxycon mobile metabolic system in healthy subjects." Journal of sports science & medicine 11(1): 182-183.
Albinali, F., S. Intille, et al. (2010). Using Wearable Activity Type Detection to Improve Physical Activity Energy Expenditure Estimation. ACM Conference on Ubiquitous Computing. Denmark: 311-320.
Aminian, S. and E. A. Hinckson (2012).
"Examining the validity of the ActivPAL monitor in measuring posture and ambulatory movement in children." The international journal of behavioral nutrition and physical activity 9: 119. Andre, D. and D. L. Wolf (2007). "Recent advances in free-living physical activity monitoring: a review." Journal of diabetes science and technology 1(5): 760-767. Arvidsson, D., F. Slinde, et al. (2007). "Energy cost of physical activities in children: validation of SenseWear Armband." Medicine and science in sports and exercise 39(11): 2076-2084. 243 Atkin, A. J., T. Gorely, et al. (2012). "Methods of Measurement in epidemiology: sedentary Behaviour." International journal of epidemiology 41(5): 1460-1471. Ayabe, M., H. Kumahara, et al. (2013). "Epoch length and the physical activity bout analysis: an accelerometry research issue." BMC research notes 6: 20. Bailey, R. C., J. Olson, et al. (1995). "The level and tempo of children's physical activities: an observational study." Med Sci Sports Exerc 27(7): 1033-1041. Bailey, R. C., J. Olson, et al. (1995). "The level and tempo of children's physical activities: an observational study." Medicine and science in sports and exercise 27(7): 1033-1041. Bao, L. and S. S. Intille (2004). "Activity recognition from user-annotated acceleration data." Proceedings of PERVASIVE 2004 LNCS 3001: 1-17. Beaton, G. H., J. Milner, et al. (1979). "Sources of variance in 24-hour dietary recall data: implications for nutrition study design and interpretation." Am J Clin Nutr 32(12): 25462559. Bergouignan, A., F. Rudwill, et al. (2011). "Physical inactivity as the culprit of metabolic inflexibility: evidence from bed-rest studies." Journal of applied physiology 111(4): 12011210. Berntsen, S., R. Hageberg, et al. (2010). "Validity of physical activity monitors in adults participating in free-living activities." British journal of sports medicine 44(9): 657-664. Berntsen, S., S. N. Stafne, et al. (2011). 
"Physical activity monitor for recording energy expenditure in pregnancy." Acta obstetricia et gynecologica Scandinavica 90(8): 903-907. Bey, L. and M. T. Hamilton (2003). "Suppression of skeletal muscle lipoprotein lipase activity during physical inactivity: a molecular reason to maintain daily low-intensity activity." The Journal of physiology 551(Pt 2): 673-682. Bird, A. D. (1972). "The effect of surgery, injury, and prolonged bed rest on calf blood flow." The Australian and New Zealand journal of surgery 41(4): 374-379. Blair, S. N. (1993). "Evidence for success of exercise in weight loss and control." Annals of internal medicine 119(7 Pt 2): 702-706. Bonomi, A. G., G. Plasqui, et al. (2009). "Improving assessment of daily energy expenditure by identifying types of physical activity with a single accelerometer." Journal of applied physiology 107(3): 655-661. 244 Boone, J. E., P. Gordon-Larsen, et al. (2007). "Screen time and physical activity during adolescence: longitudinal effects on obesity in young adulthood." The international journal of behavioral nutrition and physical activity 4: 26. Bouten, C. V., A. A. Sauren, et al. (1997). "Effects of placement and orientation of body-fixed accelerometers on the assessment of energy expenditure during walking." Medical & biological engineering & computing 35(1): 50-56. Brage, S., N. Brage, et al. (2005). "Reliability and validity of the combined heart rate and movement sensor Actiheart." European journal of clinical nutrition 59(4): 561-570. Brage, S., N. Brage, et al. (2003). "Reliability and validity of the Computer Science and Applications accelerometer in a mechanical setting." Measurement in Physical Education and Exercise Science 7: 101-119. Brage, S., N. Wedderkopp, et al. (2003). "Reexamination of validity and reliability of the CSA monitor in walking and running." Medicine and science in sports and exercise 35(8): 14471454. Brownson, R. C., C. M. Hoehner, et al. (2009). 
"Measuring the built environment for physical activity: state of the science." American journal of preventive medicine 36(4 Suppl): S99123 e112. Carr, L. J. and M. T. Mahar (2012). "Accuracy of intensity and inclinometer output of three activity monitors for identification of sedentary behavior and light-intensity activity." Journal of obesity 2012: 460271. Celis-Morales, C. A., F. Perez-Bravo, et al. (2012). "Objective vs. self-reported physical activity and sedentary time: effects of measurement method on relationships with risk biomarkers." PloS one 7(5): e36345. Chobanian, A. V., G. L. Bakris, et al. (2003). "The Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure: the JNC 7 report." JAMA 289(19): 2560-2572. Choi, L., Z. Liu, et al. (2011). "Validation of accelerometer wear and nonwear time classification algorithm." Medicine and science in sports and exercise 43(2): 357-364. Clark, B. K., A. A. Thorp, et al. (2011). "Validity of self-reported measures of workplace sitting time and breaks in sitting time." Medicine and science in sports and exercise 43(10): 19071912. Cleland, I., B. Kikhia, et al. (2013). "Optimal placement of accelerometers for the detection of everyday activities." Sensors 13(7): 9183-9200. 245 Colbert, L. H., C. E. Matthews, et al. (2011). "Comparative validity of physical activity measures in older adults." Medicine and science in sports and exercise 43(5): 867-876. Craft, L. L., T. W. Zderic, et al. (2012). "Evidence that women meeting physical activity guidelines do not sit less: an observational inclinometry study." The international journal of behavioral nutrition and physical activity 9: 122. Crouter, S. E., C. Albright, et al. (2004). "Accuracy of polar S410 heart rate monitor to estimate energy cost of exercise." Medicine and science in sports and exercise 36(8): 1433-1439. Crouter, S. E. and D. R. Bassett, Jr. (2008). 
"A new 2-regression model for the Actical accelerometer." British journal of sports medicine 42(3): 217-224. Crouter, S. E., J. R. Churilla, et al. (2006). "Estimating energy expenditure using accelerometers." European journal of applied physiology 98(6): 601-612. Crouter, S. E., K. G. Clowers, et al. (2006). "A novel method for using accelerometer data to predict energy expenditure." Journal of applied physiology 100(4): 1324-1331. Crouter, S. E., E. Kuffel, et al. (2010). "Refined two-regression model for the ActiGraph accelerometer." Medicine and science in sports and exercise 42(5): 1029-1037. Dale, D., G. J. Welk, et al. (2002). Methods for Assessing Physical Activity and Challenges for Research. Physical Activity Assessments for Health-Related Research. G. J. Welk. Champaign, IL, Human Kinetics, Inc.: 19-36. Dannecker, K. L., N. A. Sazonova, et al. (2013). "A comparison of energy expenditure estimation of several physical activity monitors." Medicine and science in sports and exercise 45(11): 2105-2112. De Vries, S. I., F. G. Garre, et al. (2011). "Evaluation of neural networks to identify types of activity using accelerometers." Medicine and science in sports and exercise 43(1): 101-107. DiNallo, J. M., D. S. Downs, et al. (2012). "Objectively assessing treadmill walking during the second and third pregnancy trimesters." J Phys Act Health 9(1): 21-28. DiNallo, J. M., D. S. Downs, et al. (2012). "Objectively assessing treadmill walking during the second and third pregnancy trimesters." Journal of physical activity & health 9(1): 21-28. Dingwell, J. B., J. P. Cusumano, et al. (2001). "Local dynamic stability versus kinematic variability of continuous overground and treadmill walking." Journal of biomechanical engineering 123(1): 27-32. 246 Dong, B., S. Biswas, et al. (2013). "Comparing metabolic energy expenditure estimation using wearable multi-sensor network and single accelerometer." Conference proceedings : ... 
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 2013: 2866-2869.
Dong, B., A. Montoye, et al. (2013). "Energy-aware activity classification using wearable sensor networks." 87230Y-87230Y.
Dunstan, D. W., B. Howard, et al. (2012). "Too much sitting--a health hazard." Diabetes research and clinical practice 97(3): 368-376.
Dwyer, T. J., J. A. Alison, et al. (2009). "Evaluation of the SenseWear activity monitor during exercise in cystic fibrosis and in health." Respiratory medicine 103(10): 1511-1517.
Ekelund, U., S. Brage, et al. (2009). "Objectively measured moderate- and vigorous-intensity physical activity but not sedentary time predicts insulin resistance in high-risk individuals." Diabetes care 32(6): 1081-1086.
Erik Landhuis, C., R. Poulton, et al. (2008). "Programming obesity and poor fitness: the long-term impact of childhood television." Obesity 16(6): 1457-1459.
Ermes, M., J. Parkka, et al. (2008). "Detection of daily activities and sports with wearable sensors in controlled and uncontrolled conditions." IEEE transactions on information technology in biomedicine : a publication of the IEEE Engineering in Medicine and Biology Society 12(1): 20-26.
Esliger, D. W., A. V. Rowlands, et al. (2011). "Validation of the GENEA Accelerometer." Medicine and science in sports and exercise 43(6): 1085-1093.
Esliger, D. W. and M. S. Tremblay (2006). "Technical reliability assessment of three accelerometer models in a mechanical setup." Medicine and science in sports and exercise 38(12): 2173-2181.
Evenson, K. R. and J. W. Terry, Jr. (2009). "Assessment of differing definitions of accelerometer nonwear time." Research quarterly for exercise and sport 80(2): 355-362.
Feito, Y., D. R. Bassett, et al. (2012). "Evaluation of activity monitors in controlled and free-living environments." Medicine and science in sports and exercise 44(4): 733-741.
Feito, Y., D. R. Bassett, et al.
(2011). "Effects of body mass index and tilt angle on output of two wearable activity monitors." Med Sci Sports Exerc 43(5): 861-866.
Ferro-Luzzi, A. (1968). "[Inter- and intra-individual variability of the human energy expenditure in the rest position]." Bollettino della Societa italiana di biologia sperimentale 44(7): 633-637.
Field, A. (2009). Discovering Statistics Using SPSS. London, SAGE Publications Ltd.
Ford, E. S., M. B. Schulze, et al. (2010). "Television watching and incident diabetes: Findings from the European Prospective Investigation into Cancer and Nutrition-Potsdam Study." Journal of diabetes 2(1): 23-27.
Fortune, E., V. Lugade, et al. (2014). "Validity of using tri-axial accelerometers to measure human movement - Part II: Step counts at a wide range of gait velocities." Medical engineering & physics.
Foster, R. C., L. M. Lanningham-Foster, et al. (2005). "Precision and accuracy of an ankle-worn accelerometer-based pedometer in step counting and energy expenditure." Prev Med 41(3-4): 778-783.
Freedson, P. S., K. Lyden, et al. (2011). "Evaluation of artificial neural network algorithms for predicting METs and activity type from accelerometer data: validation on an independent sample." Journal of applied physiology 111(6): 1804-1812.
Freedson, P. S., E. Melanson, et al. (1998). "Calibration of the Computer Science and Applications, Inc. accelerometer." Medicine and science in sports and exercise 30(5): 777-781.
Frost, C. and I. R. White (2005). "The effect of measurement error in risk factors that change over time in cohort studies: do simple methods overcorrect for 'regression dilution'?" Int J Epidemiol 34(6): 1359-1368.
Gabriel, K. P., J. J. McClain, et al. (2010).
"Issues in accelerometer methodology: the role of epoch length on estimates of physical activity and relationships with health outcomes in overweight, post-menopausal women." The international journal of behavioral nutrition and physical activity 7: 53. GENEActiv. (2013). "GENEAction: comprehensive data collection for every body." from http://www.geneactive.co.uk/products/geneactiv-action.aspx. Gierach, G. L., S. C. Chang, et al. (2009). "Physical activity, sedentary behavior, and endometrial cancer risk in the NIH-AARP Diet and Health Study." International journal of cancer. Journal international du cancer 124(9): 2139-2147. 248 Grant, P. M., C. G. Ryan, et al. (2006). "The validation of a novel activity monitor in the measurement of posture and motion during everyday activities." British journal of sports medicine 40(12): 992-997. GRPS. (2012). "GRPS School Choice Expo." from http://www.grps.org/ourschools/high-schools. Gyllensten, I. C. and A. G. Bonomi (2011). "Identifying types of physical activity with a single accelerometer: evaluating laboratory-trained algorithms in daily life." IEEE transactions on bio-medical engineering 58(9): 2656-2663. Hagstromer, M., P. Oja, et al. (2006). "The International Physical Activity Questionnaire (IPAQ): a study of concurrent and construct validity." Public Health Nutr 9(6): 755-762. Ham, S. A., J. Kruger, et al. (2009). "Participation by US adults in sports, exercise, and recreational physical activities." Journal of physical activity & health 6(1): 6-14. Hamilton, M. T., D. G. Hamilton, et al. (2004). "Exercise physiology versus inactivity physiology: an essential concept for understanding lipoprotein lipase regulation." Exercise and sport sciences reviews 32(4): 161-166. Hamilton, M. T., D. G. Hamilton, et al. (2007). "Role of low energy expenditure and sitting in obesity, metabolic syndrome, type 2 diabetes, and cardiovascular disease." Diabetes 56(11): 2655-2667. Hanggi, J. M., L. R. Phillips, et al. (2012). 
"Validation of the GT3X ActiGraph in children and comparison with the GT1M ActiGraph." Journal of science and medicine in sport / Sports Medicine Australia. Hargens, A. R. and S. Richardson (2009). "Cardiovascular adaptations, fluid shifts, and countermeasures related to space flight." Respiratory physiology & neurobiology 169 Suppl 1: S30-33. Harrington, D. M., G. J. Welk, et al. (2011). "Validation of MET estimates and step measurement using the ActivPAL physical activity logger." Journal of sports sciences 29(6): 627-633. Harrison, C. L., R. G. Thompson, et al. (2011). "Measuring physical activity during pregnancy." The international journal of behavioral nutrition and physical activity 8: 19. Hart, T. L., B. E. Ainsworth, et al. (2011). "Objective and subjective measures of sedentary behavior and physical activity." Medicine and science in sports and exercise 43(3): 449456. 249 Haskell, W. L., M. C. Yee, et al. (1993). "Simultaneous measurement of heart rate and body motion to quantitate physical activity." Medicine and science in sports and exercise 25(1): 109-115. Haymes, E. M. and W. C. Byrnes (1993). "Walking and running energy expenditure estimated by Caltrac and indirect calorimetry." Med Sci Sports Exerc 25(12): 1365-1369. Healy, G. N., B. K. Clark, et al. (2011). "Measurement of adults' sedentary time in populationbased studies." American journal of preventive medicine 41(2): 216-227. Healy, G. N., D. W. Dunstan, et al. (2007). "Objectively measured light-intensity physical activity is independently associated with 2-h plasma glucose." Diabetes care 30(6): 1384-1389. Healy, G. N., D. W. Dunstan, et al. (2008). "Breaks in sedentary time: beneficial associations with metabolic risk." Diabetes care 31(4): 661-666. Healy, G. N., D. W. Dunstan, et al. (2008). "Television time and continuous metabolic risk in physically active adults." Medicine and science in sports and exercise 40(4): 639-645. Healy, G. N., C. E. Matthews, et al. (2011). 
"Sedentary time and cardio-metabolic biomarkers in US adults: NHANES 2003-06." European heart journal 32(5): 590-597. Healy, G. N., K. Wijndaele, et al. (2008). "Objectively measured sedentary time, physical activity, and metabolic risk: the Australian Diabetes, Obesity and Lifestyle Study (AusDiab)." Diabetes care 31(2): 369-371. Heiermann, S., K. Khalaj Hedayati, et al. (2011). "Accuracy of a portable multisensor body monitor for predicting resting energy expenditure in older people: a comparison with indirect calorimetry." Gerontology 57(5): 473-479. Heil, D. P. (2006). "Predicting activity energy expenditure using the Actical activity monitor." Research quarterly for exercise and sport 77(1): 64-80. Helmerhorst, H. J., K. Wijndaele, et al. (2009). "Objectively measured sedentary time may predict insulin resistance independent of moderate- and vigorous-intensity physical activity." Diabetes 58(8): 1776-1779. Hendelman, D., K. Miller, et al. (2000). "Validity of accelerometry for the assessment of moderate intensity physical activity in the field." Medicine and science in sports and exercise 32(9 Suppl): S442-449. Herren, R., A. Sparti, et al. (1999). "The prediction of speed and incline in outdoor running in humans using accelerometry." Medicine and science in sports and exercise 31(7): 10531059. 250 Herrmann, S. D., T. V. Barreira, et al. (2012). "Impact of accelerometer wear time on physical activity data: a NHANES semisimulation data approach." British journal of sports medicine. Hjorth, M. F., J. P. Chaput, et al. (2012). "Measure of sleep and physical activity by a single accelerometer: Can a waist-worn Actigraph adequately measure sleep in children?" Sleep and Biological Rhythms 10(4): 328-335. Hoevenaar-Blom, M. P., A. M. Spijkerman, et al. (2011). "Sleep duration and sleep quality in relation to 12-year cardiovascular disease incidence: the MORGEN study." Sleep 34(11): 1487-1492. Howard, R. A., D. M. Freedman, et al. (2008). 
"Physical activity, sedentary behavior, and the risk of colon and rectal cancer in the NIH-AARP Diet and Health Study." Cancer causes & control : CCC 19(9): 939-953. Hu, F. B., M. F. Leitzmann, et al. (2001). "Physical activity and television watching in relation to risk for type 2 diabetes mellitus in men." Archives of internal medicine 161(12): 15421548. Hu, F. B., T. Y. Li, et al. (2003). "Television watching and other sedentary behaviors in relation to risk of obesity and type 2 diabetes mellitus in women." JAMA : the journal of the American Medical Association 289(14): 1785-1791. Jakicic, J. M., M. Marcus, et al. (2004). "Evaluation of the SenseWear Pro Armband to assess energy expenditure during exercise." Medicine and science in sports and exercise 36(5): 897-904. Janz, K. F. (2002). Use of Heart Rate Monitors to Assess Physical Activity. Physical Activity Assessments for Health-Related Research. G. J. Welk. Champaign, IL, Human Kinetics, Inc.: 143-162. Janz, K. F., J. Witt, et al. (1995). "The stability of children's physical activity as measured by accelerometry and self-report." Medicine and science in sports and exercise 27(9): 13261332. Jean-Louis, G., D. F. Kripke, et al. (2001). "Sleep detection with an accelerometer actigraph: comparisons with polysomnography." Physiol Behav 72(1-2): 21-28. Johnstone, A. M., S. D. Murison, et al. (2005). "Factors influencing variation in basal metabolic rate include fat-free mass, fat mass, age, and circulating thyroxine but not sex, circulating leptin, or triiodothyronine." The American journal of clinical nutrition 82(5): 941-948. 251 Kampert, J. B., S. N. Blair, et al. (1996). "Physical activity, physical fitness, and all-cause and cancer mortality: a prospective study of men and women." Annals of epidemiology 6(5): 452-457. Katzmarzyk, P. T. (2014). "Standing and mortality in a prospective cohort of canadian adults." Medicine and science in sports and exercise 46(5): 940-946. Katzmarzyk, P. T., T. S. 
Church, et al. (2009). "Sitting time and mortality from all causes, cardiovascular disease, and cancer." Medicine and science in sports and exercise 41(5): 998-1005. Kenney, W., J. Wilmore, et al. (2012). Physiology of sport and exercise. Champaign, IL, Human Kinetics. Khan, A. M., Y. K. Lee, et al. (2008). "Accelerometer signal-based human activity recognition using augmented autoregressive model coefficients and artificial neural nets." Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 2008: 5172-5175. Khan, A. M., Y. K. Lee, et al. (2010). "A triaxial accelerometer-based physical-activity recognition via augmented-signal features and a hierarchical recognizer." IEEE transactions on information technology in biomedicine : a publication of the IEEE Engineering in Medicine and Biology Society 14(5): 1166-1172. Kierkegaard, A., L. Norgren, et al. (1987). "Incidence of deep vein thrombosis in bedridden nonsurgical patients." Acta medica Scandinavica 222(5): 409-414. Kinder, J. R., K. A. Lee, et al. (2012). "Validation of a hip-worn accelerometer in measuring sleep time in children." J Pediatr Nurs 27(2): 127-133. King, A. C. and D. L. Tribble (1991). "The role of exercise in weight regulation in nonathletes." Sports medicine 11(5): 331-349. Kozey-Keadle, S., A. Libertine, et al. (2011). "Validation of wearable monitors for assessing sedentary behavior." Medicine and science in sports and exercise 43(8): 1561-1567. Krahenbuhl, G. S. and T. J. Williams (1992). "Running economy: changes with age during childhood and adolescence." Medicine and science in sports and exercise 24(4): 462-466. Kripke, D. F., D. J. Mullaney, et al. (1978). "Wrist actigraphic measures of sleep and rhythms." Electroencephalogr Clin Neurophysiol 44(5): 674-676. 252 Lagerros, Y. T. and P. Lagiou (2007). 
"Assessment of physical activity and energy expenditure in epidemiological research of chronic diseases." Eur J Epidemiol 22(6): 353-362. LaPorte, R. E., L. H. Kuller, et al. (1979). "An objective measure of physical activity for epidemiologic research." American journal of epidemiology 109(2): 158-168. LaPorte, R. E., H. J. Montoye, et al. (1985). "Assessment of physical activity in epidemiologic research: problems and prospects." Public health reports 100(2): 131-146. Le Masurier, G. C., C. L. Sidman, et al. (2003). "Accumulating 10,000 steps: does this meet current physical activity guidelines?" Research quarterly for exercise and sport 74(4): 389394. Lee, I. M. and P. J. Skerrett (2001). "Physical activity and all-cause mortality: what is the doseresponse relation?" Medicine and science in sports and exercise 33(6 Suppl): S459-471; discussion S493-454. Lee, J. M., Y. Kim, et al. (2014). "Validity of Consumer-Based Physical Activity Monitors." Medicine and science in sports and exercise. Levine, J. A., N. L. Eberhardt, et al. (1999). "Role of nonexercise activity thermogenesis in resistance to fat gain in humans." Science 283(5399): 212-214. Levine, J. A., L. M. Lanningham-Foster, et al. (2005). "Interindividual variation in posture allocation: possible role in human obesity." Science 307(5709): 584-586. Lord, S., S. F. Chastin, et al. (2011). "Exploring patterns of daily physical and sedentary behaviour in community-dwelling older adults." Age Ageing 40(2): 205-210. Lyden, K. (2012). Refinement, validation and application of a machine learning method for estimating physical activity and sedentary behavior in free-living people. Dissertation. Amherst, MA. Lyden, K., S. K. Keadle, et al. (2013). "A Method to Estimate Free-Living Active and Sedentary Behavior from an Accelerometer." Medicine and science in sports and exercise. Lyden, K., S. L. Kozey Keadle, et al. (2012). "Validity of two wearable monitors to estimate breaks from sedentary time." 
Medicine and science in sports and exercise 44(11): 22432252. Lyden, K., S. L. Kozey, et al. (2011). "A comprehensive evaluation of commonly used accelerometer energy expenditure and MET prediction equations." European journal of applied physiology 111(2): 187-201. 253 Lyden, K., N. Petruski, et al. (2013). "Direct Observation is a Valid Criterion for Estimating Physical Activity and Sedentary Behavior." Journal of physical activity & health. MacMahon, S., R. Peto, et al. (1990). "Blood pressure, stroke, and coronary heart disease. Part 1, Prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias." Lancet 335(8692): 765-774. Maddocks, M., A. Petrou, et al. (2010). "Validity of three accelerometers during treadmill walking and motor vehicle travel." British journal of sports medicine 44(8): 606-608. Malina, R. (1995). "Anthropometry." Physiological assessment of human fitness: 205-219. Mannini, A., S. S. Intille, et al. (2013). "Activity recognition using a single accelerometer placed at the wrist or ankle." Med Sci Sports Exerc. Mannini, A. and A. M. Sabatini (2010). "Machine learning methods for classifying human physical activity from on-body accelerometers." Sensors (Basel) 10(2): 1154-1175. Manson, J. E., D. M. Nathan, et al. (1992). "A prospective study of exercise and incidence of diabetes among US male physicians." JAMA : the journal of the American Medical Association 268(1): 63-67. Martin, A., M. McNeill, et al. (2011). "Objective measurement of habitual sedentary behavior in pre-school children: comparison of activPAL With Actigraph monitors." Pediatric exercise science 23(4): 468-476. Martinsen, E. W., A. Hoffart, et al. (1989). "Comparing aerobic with nonaerobic forms of exercise in the treatment of clinical depression: a randomized trial." Comprehensive psychiatry 30(4): 324-331. Masse, L. C., B. F. Fuemmeler, et al. (2005). 
"Accelerometer data reduction: a comparison of four reduction algorithms on select outcome variables." Medicine and science in sports and exercise 37(11 Suppl): S544-554. Matthews, C. E. (2005). "Calibration of accelerometer output for adults." Medicine and science in sports and exercise 37(11 Suppl): S512-522. Matthews, C. E., B. E. Ainsworth, et al. (2002). "Sources of variance in daily physical activity levels as measured by an accelerometer." Medicine and science in sports and exercise 34(8): 1376-1381. Matthews, C. E., K. Y. Chen, et al. (2008). "Amount of time spent in sedentary behaviors in the United States, 2003-2004." American journal of epidemiology 167(7): 875-881. 254 Matthews, C. E., S. C. Moore, et al. (2012). "Improving self-reports of active and sedentary behaviors in large epidemiologic studies." Exercise and sport sciences reviews 40(3): 118126. McClain, J. J., S. B. Sisson, et al. (2007). "Actigraph accelerometer interinstrument reliability during free-living in adults." Medicine and science in sports and exercise 39(9): 1509-1514. McKenzie, T. (2002). Use of direct observation to assess physical activity. Physical Activity Assessments for Health-Related Research. G. Welk. Champaign, IL, Kunan Kinetics, Inc.: 179-195. Melanson, E. L., Jr. and P. S. Freedson (1995). "Validity of the Computer Science and Applications, Inc. (CSA) activity monitor." Medicine and science in sports and exercise 27(6): 934-940. Metcalf, B. S., J. S. Curnow, et al. (2002). "Technical reliability of the CSA activity monitor: The EarlyBird Study." Medicine and science in sports and exercise 34(9): 1533-1537. Metz, C. E. (1978). "Basic principles of ROC analysis." Seminars in nuclear medicine 8(4): 283298. Mignault, D., M. St-Onge, et al. (2005). "Evaluation of the Portable HealthWear Armband: a device to measure total daily energy expenditure in free-living type 2 diabetic individuals." Diabetes care 28(1): 225-227. Mikines, K. J., E. A. Richter, et al. (1991). 
"Seven days of bed rest decrease insulin action on glucose uptake in leg and whole body." Journal of applied physiology 70(3): 1245-1254. Montgomery-Downs, H. E., S. P. Insana, et al. (2012). "Movement toward a novel activity monitoring device." Sleep & breathing = Schlaf & Atmung 16(3): 913-917. Montoye, A., B. Dong, et al. (2013). Assessing the effect of accelerometer placement and modeling method on energy expenditure measurement. East Lansing, MI. Montoye, A., B. Dong, et al. (2014). "Use of a wireless network of accelerometers for improved measurement of human energy expenditure." Electronics 3(2): 205-220. Montoye, H., H. Kemper, et al. (1996). Measuring physical activity and energy expenditure. Champaign, IL, Human Kinetics. Montoye, H. J., R. Washburn, et al. (1983). "Estimation of energy expenditure by a portable accelerometer." Medicine and science in sports and exercise 15(5): 403-407. 255 Moon, J. K. and N. F. Butte (1996). "Combined heart rate and activity improve estimates of oxygen consumption and carbon dioxide production rates." Journal of applied physiology 81(4): 1754-1761. Morris, J. N., D. G. Clayton, et al. (1990). "Exercise in leisure time: coronary attack and death rates." Br Heart J 63(6): 325-334. Morris, J. N., D. G. Clayton, et al. (1990). "Exercise in leisure time: coronary attack and death rates." British heart journal 63(6): 325-334. Morris, J. N., J. A. Heady, et al. (1953). "Coronary heart-disease and physical activity of work." Lancet 265(6796): 1111-1120; concl. Moy, K. L., J. F. Sallis, et al. (2010). "Culturally-specific physical activity measures for Native Hawaiian and Pacific Islanders." Hawaii medical journal 69(5 Suppl 2): 21-24. Mullaney, D. J., D. F. Kripke, et al. (1980). "Wrist-actigraphic estimation of sleep time." Sleep 3(1): 83-92. Nichols, J. F., C. G. Morgan, et al. (1999). "Validity, reliability, and calibration of the Tritrac accelerometer as a measure of physical activity." Med Sci Sports Exerc 31(6): 908-912. 
Oliver, M., H. M. Badland, et al. (2011). "Identification of accelerometer nonwear time and sedentary behavior." Research quarterly for exercise and sport 82(4): 779-783. Orendurff, M. S., J. A. Schoen, et al. (2008). "How humans walk: bout duration, steps per bout, and rest duration." Journal of rehabilitation research and development 45(7): 1077-1089. Orme, M., K. Wijndaele, et al. (2014). "Combined influence of epoch length, cut-point and bout duration on accelerometry-derived physical activity." The international journal of behavioral nutrition and physical activity 11(1): 34. Owen, N., G. N. Healy, et al. (2010). "Too much sitting: the population health science of sedentary behavior." Exercise and sport sciences reviews 38(3): 105-113. Paffenbarger, R. S., Jr., R. T. Hyde, et al. (1986). "Physical activity, all-cause mortality, and longevity of college alumni." N Engl J Med 314(10): 605-613. Paffenbarger, R. S., Jr., A. L. Wing, et al. (1983). "Physical activity and incidence of hypertension in college alumni." Am J Epidemiol 117(3): 245-257. PAGAC (2008). Physical Activity Guidlines Advisory Committee Report, 2008. Washington, DC, US Department of Health and Human Services. 256 Papazoglou, D., G. Augello, et al. (2006). "Evaluation of a multisensor armband in estimating energy expenditure in obese individuals." Obesity 14(12): 2217-2223. Parikh, R., A. Mathai, et al. (2008). "Understanding and using sensitivity, specificity and predictive values." Indian journal of ophthalmology 56(1): 45-50. Park, J. Y., S. E. Vollset, et al. (2013). "Dietary intake and biological measurement of folate: a qualitative review of validation studies." Molecular nutrition & food research 57(4): 562581. Parkka, J., M. Ermes, et al. (2006). "Activity classification using realistic data from wearable sensors." IEEE transactions on information technology in biomedicine : a publication of the IEEE Engineering in Medicine and Biology Society 10(1): 119-128. Pate, R. R., J. R. 
O'Neill, et al. (2008). "The evolving definition of "sedentary"." Exercise and sport sciences reviews 36(4): 173-178. Pate, R. R., M. Pratt, et al. (1995). "Physical activity and public health. A recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine." JAMA : the journal of the American Medical Association 273(5): 402-407. Patel, A. V., C. Rodriguez, et al. (2006). "Recreational physical activity and sedentary behavior in relation to ovarian cancer risk in a large cohort of US women." American journal of epidemiology 163(8): 709-716. Plasqui, G. and K. R. Westerterp (2005). "Accelerometry and heart rate as a measure of physical fitness: proof of concept." Medicine and science in sports and exercise 37(5): 872-876. Ploug, T., T. Ohkuwa, et al. (1995). "Effect of immobilization on glucose transport and glucose transporter expression in rat skeletal muscle." The American journal of physiology 268(5 Pt 1): E980-986. Pober, D. M., J. Staudenmayer, et al. (2006). "Development of novel techniques to classify physical activity mode using accelerometers." Medicine and science in sports and exercise 38(9): 1626-1634. Precope, J. (1952). Hippocrates on diet and hygiene. London, UK, Williams, Lea, and Company. Preece, S. J., J. Y. Goulermas, et al. (2009). "Activity identification using body-mounted sensors--a review of classification techniques." Physiological measurement 30(4): R1-33. Puhl, J., K. Greaves, et al. (1990). "Children's Activity Rating Scale (CARS): description and calibration." Research quarterly for exercise and sport 61(1): 26-36. 257 Reilly, J. J., V. Penpraze, et al. (2008). "Objective measurement of physical activity and sedentary behaviour: review with new data." Archives of disease in childhood 93(7): 614-619. Riddoch, C. J., L. Bo Andersen, et al. (2004). "Physical activity levels and patterns of 9- and 15-yrold European children." Medicine and science in sports and exercise 36(1): 86-92. 
Rosdahl, H., L. Gullstrand, et al. (2010). "Evaluation of the Oxycon Mobile metabolic system against the Douglas bag method." European journal of applied physiology 109(2): 159-171. Rosenberger, M. E., W. L. Haskell, et al. (2013). "Estimating activity and sedentary behavior from an accelerometer on the hip or wrist." Med Sci Sports Exerc 45(5): 964-975. Rothney, M. P., M. Neumann, et al. (2007). "An artificial neural network model of energy expenditure using nonintegrated acceleration signals." Journal of applied physiology 103(4): 1419-1427. Rothney, M. P., E. V. Schaefer, et al. (2008). "Validity of physical activity intensity predictions by ActiGraph, Actical, and RT3 accelerometers." Obesity 16(8): 1946-1952. Rowlands, A. V., T. S. Olds, et al. (2014). "Assessing Sedentary Behavior with the GENEActiv: Introducing the Sedentary Sphere." Medicine and science in sports and exercise 46(6): 1235-1247. Rumo, M., O. Amft, et al. (2011). "A stepwise validation of a wearable system for estimating energy expenditure in field-based research." Physiological measurement 32(12): 19832001. Ryan, C. G., P. M. Grant, et al. (2006). "The validity and reliability of a novel activity monitor as a measure of walking." British journal of sports medicine 40(9): 779-784. Safrit, M. and T. Wood (1995). Introduction to measurement in physical education and exercise science. St. Louis, MO, Mosby. Sallis, J. F., M. J. Buono, et al. (1990). "The Caltrac accelerometer as a physical activity monitor for school-age children." Medicine and science in sports and exercise 22(5): 698-703. Sallis, J. F. and B. E. Saelens (2000). "Assessment of physical activity by self-report: status, limitations, and future directions." Research quarterly for exercise and sport 71(2 Suppl): S1-14. Santos-Lozano, A., P. J. Marin, et al. (2012). "Technical variability of the GT3X accelerometer." Med Eng Phys 34(6): 787-790. 258 Santos-Lozano, A., G. Torres-Luque, et al. (2012). 
"Intermonitor variability of GT3X accelerometer." International journal of sports medicine 33(12): 994-999. Sasaki, J. E., D. John, et al. (2011). "Validation and comparison of ActiGraph activity monitors." Journal of science and medicine in sport / Sports Medicine Australia 14(5): 411-416. SBRN (2012). "Letter to the Editor: Standardized use of the terms "sedentary" and "sedentary behaviours"." Appl Physiol Nutr Metab 37: 540-542. Schrage, W. G. (2008). "Not a search in vein: novel stimulus for vascular dysfunction after simulated microgravity." Journal of applied physiology 104(5): 1257-1258. Seider, M. J., W. F. Nicholson, et al. (1982). "Insulin resistance for glucose metabolism in disused soleus muscle of mice." The American journal of physiology 242(1): E12-18. Shephard, R. J. (1990). "Physical activity and cancer." International journal of sports medicine 11(6): 413-420. Shephard, R. J. (2003). "Limits to the measurement of habitual physical activity by questionnaires." British journal of sports medicine 37(3): 197-206; discussion 206. Shepherd, E. F., E. Toloza, et al. (1999). "Step activity monitor: increased accuracy in quantifying ambulatory activity." J Orthop Res 17(5): 703-708. Shields, M. and M. S. Tremblay (2008). "Sedentary behaviour and obesity." Health reports / Statistics Canada, Canadian Centre for Health Information = Rapports sur la sante / Statistique Canada, Centre canadien d'information sur la sante 19(2): 19-30. Skotte, J., M. Korshoj, et al. (2012). "Detection of Physical Activity Types Using Triaxial Accelerometers." Journal of Physical Activity & Health. Skotte, J., M. Korshoj, et al. (2014). "Detection of physical activity types using triaxial accelerometers." Journal of physical activity & health 11(1): 76-84. Slattery, M. L. (2004). "Physical activity and colorectal cancer." Sports medicine 34(4): 239-252. Slootmaker, S. M., A. J. Schuit, et al. (2009). 
"Disagreement in physical activity assessed by accelerometer and self-report in subgroups of age, gender, education and weight status." The international journal of behavioral nutrition and physical activity 6: 17. Smorawinski, J., P. Kubala, et al. (1996). "Effects of three day bed-rest on circulatory, metabolic and hormonal responses to oral glucose load in endurance trained athletes and untrained subjects." Journal of gravitational physiology : a journal of the International Society for Gravitational Physiology 3(2): 44-45. 259 Spurr, G. B., A. M. Prentice, et al. (1988). "Energy expenditure from minute-by-minute heart-rate recording: comparison with indirect calorimetry." The American journal of clinical nutrition 48(3): 552-559. Stamatakis, E., M. Hamer, et al. (2011). "Screen-based entertainment time, all-cause mortality, and cardiovascular events: population-based study with ongoing mortality and hospital events follow-up." Journal of the American College of Cardiology 57(3): 292-299. Staudenmayer, J., D. Pober, et al. (2009). "An artificial neural network to estimate physical activity energy expenditure and identify physical activity type from an accelerometer." Journal of applied physiology 107(4): 1300-1307. Strath, S. J., D. R. Bassett, Jr., et al. (2001). "Simultaneous heart rate-motion sensor technique to estimate energy expenditure." Medicine and science in sports and exercise 33(12): 21182123. Strath, S. J., D. R. Bassett, Jr., et al. (2002). "Validity of the simultaneous heart rate-motion sensor technique for measuring energy expenditure." Medicine and science in sports and exercise 34(5): 888-894. Stuart, C. A., R. E. Shangraw, et al. (1988). "Bed-rest-induced insulin resistance occurs primarily in muscle." Metabolism: clinical and experimental 37(8): 802-806. Sun, D. X., G. Schmidt, et al. (2008). "Validation of the RT3 accelerometer for measuring physical activity of children in simulated free-living conditions." 
Pediatric exercise science 20(2): 181-197. Swartz, A. M., L. Squires, et al. (2011). "Energy expenditure of interruptions to sedentary behavior." The international journal of behavioral nutrition and physical activity 8: 69. Swartz, A. M., S. J. Strath, et al. (2000). "Estimation of energy expenditure using CSA accelerometers at hip and wrist sites." Medicine and science in sports and exercise 32(9 Suppl): S450-456. Tapia, E. M., S. S. Intillie, et al. (2007). "Real-time recognition of physical activities and their intensities using wireless accelerometers and a heart rate monitor." Proceedings of the International Symposium on Wearable Computers. Thompson, F. and A. Subar (2013). Dietary assessment methodology. Nutrition in the prevention and treatment of disease. A. Coulston, C. Boushey and M. Ferruzzi. London, UK, Elsevier. 3. 260 Thorp, A. A., N. Owen, et al. (2011). "Sedentary behaviors and subsequent health outcomes in adults a systematic review of longitudinal studies, 1996-2011." American journal of preventive medicine 41(2): 207-215. Thune, I. and A. S. Furberg (2001). "Physical activity and cancer risk: dose-response and cancer, all sites and site-specific." Medicine and science in sports and exercise 33(6 Suppl): S530550; discussion S609-510. Tobin, B. W., P. N. Uchakin, et al. (2002). "Insulin secretion and sensitivity in space flight: diabetogenic effects." Nutrition 18(10): 842-848. Troiano, R. P., D. Berrigan, et al. (2008). "Physical activity in the United States measured by accelerometer." Medicine and science in sports and exercise 40(1): 181-188. Troiano, R. P. and J. J. McClain (2012). Objective measuremes of physical activity, strength, sleep, and strength in US National Health and Nutrition Examination Survey (NHANES) 20112014. The 8th International Conference on Diet and Activity Methods, Rome, Italy. Troiano, R. P., J. J. McClain, et al. (2014). "Evolution of accelerometer methods for physical activity research." 
British journal of sports medicine. Trost, S. G., K. L. McIver, et al. (2005). "Conducting accelerometer-based activity assessments in field-based research." Medicine and science in sports and exercise 37(11): S531-S543. Trost, S. G., W. K. Wong, et al. (2012). "Artificial neural networks to predict activity type and energy expenditure in youth." Medicine and science in sports and exercise 44(9): 18011809. Tudor-Locke, C. E. and A. M. Myers (2001). "Challenges and opportunities for measuring physical activity in sedentary adults." Sports medicine 31(2): 91-100. UBCC. (2009). "Category 2 enhanced phenotyping at baseline assessment visit in last 100-150,000 participants." from http://www.ukbiobank.ac.uk/wpcontent/uploads/2011/06/Protocol_addendum_2.pdf. van Hees, V. T., R. Golubic, et al. (2013). "Impact of study design on development and evaluation of an activity-type classifier." Journal of applied physiology 114(8): 1042-1051. van Poppel, M. N., M. J. Chinapaw, et al. (2010). "Physical activity questionnaires for adults: a systematic review of measurement properties." Sports medicine 40(7): 565-600. Vanhelst, J., G. Baquet, et al. (2012). "Comparative interinstrument reliability of uniaxial and triaxial accelerometers in free-living conditions." Percept Mot Skills 114(2): 584-594. 261 Veltink, P. H., H. B. Bussmann, et al. (1996). "Detection of static and dynamic activities using uniaxial accelerometers." IEEE Trans Rehabil Eng 4(4): 375-385. Webster, J. B., D. F. Kripke, et al. (1982). "An activity-based sleep monitor system for ambulatory use." Sleep 5(4): 389-399. Welch, W. A., D. R. Bassett, et al. (2014). "Cross-validation of Waist-Worn GENEA Accelerometer Cut-Points." Medicine and science in sports and exercise. Welch, W. A., D. R. Bassett, et al. (2013). "Classification accuracy of the wrist-worn gravity estimator of normal everyday activity accelerometer." Medicine and science in sports and exercise 45(10): 2012-2019. Welk, G. J. (2002). 
"Reliability of the CSA activity monitor for assessing physical activity." Research quarterly for exercise and sport 73: A14. Welk, G. J. (2002). Use of Accelerometry-Based Activity Monitors to Assess Physical Activity. Physical Activity Assessments for Health-Related Research. G. J. Welk. Champaign, IL, Human Kinetics, Inc.: 125-142. Welk, G. J. and C. B. Corbin (1995). "The validity of the Tritrac-R3D Activity Monitor for the assessment of physical activity in children." Research quarterly for exercise and sport 66(3): 202-209. Welk, G. J., J. J. McClain, et al. (2007). "Field validation of the MTI Actigraph and BodyMedia armband monitor using the IDEEA monitor." Obesity 15(4): 918-928. Westerterp, K. R. (1999). "Assessment of physical activity level in relation to obesity: current evidence and research issues." Medicine and science in sports and exercise 31(11 Suppl): S522-525. Wijndaele, K., G. N. Healy, et al. (2010). "Increased cardiometabolic risk is associated with increased TV viewing time." Medicine and science in sports and exercise 42(8): 15111518. Wong, T. C., J. G. Webster, et al. (1981). "Portable accelerometer device for measuring human energy expenditure." IEEE transactions on bio-medical engineering 28(6): 467-471. Yanagibori, R., K. Kondo, et al. (1998). "Effect of 20 days' bed rest on the reverse cholesterol transport system in healthy young subjects." Journal of internal medicine 243(4): 307-312. Yanagibori, R., Y. Suzuki, et al. (1997). "The effects of 20 days bed rest on serum lipids and lipoprotein concentrations in healthy young subjects." Journal of gravitational physiology : a journal of the International Society for Gravitational Physiology 4(1): S82-90. 262 Zderic, T. W. and M. T. Hamilton (2006). "Physical inactivity amplifies the sensitivity of skeletal muscle to the lipid-induced downregulation of lipoprotein lipase activity." Journal of applied physiology 100(1): 249-257. Zerwekh, J. E., L. A. Ruml, et al. (1998). 
"The effects of twelve weeks of bed rest on bone histology, biochemical markers of bone turnover, and calcium homeostasis in eleven normal subjects." Journal of bone and mineral research : the official journal of the American Society for Bone and Mineral Research 13(10): 1594-1601. Zhang, K., F. X. Pi-Sunyer, et al. (2004). "Improving energy expenditure estimation for physical activity." Medicine and science in sports and exercise 36(5): 883-889. Zhang, K., P. Werner, et al. (2003). "Measurement of human daily physical activity." Obesity research 11(1): 33-40. Zhang, S., A. V. Rowlands, et al. (2012). "Physical activity classification using the GENEA wristworn accelerometer." Medicine and science in sports and exercise 44(4): 742-748. 263