USE OF ACCELEROMETRY AND MACHINE LEARNING TO MEASURE FREE-LIVING
PHYSICAL ACTIVITY AND SEDENTARY BEHAVIOR
By
Alexander Henry Montoye

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Kinesiology – Doctor of Philosophy
2014

ABSTRACT
USE OF ACCELEROMETRY AND MACHINE LEARNING TO MEASURE FREE-LIVING PHYSICAL ACTIVITY AND SEDENTARY BEHAVIOR
By
Alexander Henry Montoye
Physical activity (PA) and sedentary behavior (SB) are important behavioral variables
that are associated with many key short- and long-term health indices. Objective and highly
accurate methods of measuring PA and SB are needed in order to better understand the
relationships of PA and SB with various health outcomes, determine population levels of PA and
SB, identify and target groups at high risk of having low PA or high SB, and assess the
effectiveness of interventions aimed to increase PA and reduce SB in populations. Of the
available measurement tools, accelerometer-based activity monitors have gained popularity due
to their blend of feasibility for use and relatively high accuracy for assessing PA (by identifying
specific activity types), SB, and energy expenditure (EE). However, little research has been
done to compare the accuracy of accelerometers placed on different parts of the body, and
current data modeling methods are either 1) simple to use but lack accuracy or 2) highly accurate
but highly complex. Therefore, the purpose of this dissertation was 1) to develop accurate and
relatively simple data processing and modeling methods for accelerometer data and 2) to
compare accelerometers located on the right hip, right thigh, and both wrists for classification of
activity type and prediction of SB and EE.
Healthy adults (n=44) were recruited to participate in a 90-minute simulated free-living
protocol. For the protocol, participants performed 14 activities for 3-10 minutes each, with
order, duration, and intensity of activities left up to participants. Participants wore a portable
metabolic analyzer (for a criterion measure of EE) and four accelerometers, which were placed
on the right hip, right thigh, and both wrists. The order and timing of the activities performed
during the protocol were recorded by a trained research assistant (for a criterion measure of
activity type and SB). Machine learning algorithms (i.e., artificial neural networks) were
created by extracting simple-to-compute features from the data from each of the four
accelerometers in order to classify activity type and predict SB and EE. Accuracy of the four
accelerometers for each outcome variable was assessed by comparing predictions from the
accelerometers to the actual values obtained by the criterion measures. Additionally, we
processed, cleaned, and extracted features of the accelerometer data in Microsoft Excel and
created the artificial neural networks using R software, thereby accomplishing our goal of
using simple methods to create machine learning algorithms to model accelerometer data.
Overall, the thigh accelerometer provided the highest predictive accuracy for EE,
although both the wrists and hip accelerometers also provided highly accurate EE predictions.
For recognition of activity type, the wrist accelerometers achieved the highest accuracy while
the hip accelerometer had the lowest accuracy. Finally, for prediction of SB, the hip and left
wrist accelerometers provided the highest accuracy while the right wrist accelerometer
provided the lowest accuracy.
Our study highlights the strengths and weaknesses of accelerometers placed on the hip,
thigh, and wrists for prediction of activity type, SB, and EE. These findings suggest that single
accelerometers can be used for accurate measurement of PA, SB, and EE, although the optimal
accelerometer placement site will depend on the specific research question. Further research
should be conducted in a true free-living setting, with a more diverse population and different
sets of activities, and using other types of machine learning to model the accelerometer data.

Copyright by
ALEXANDER HENRY MONTOYE
2014

I would like to dedicate this dissertation to my grandfather, Henry Montoye. You are a
pioneer in the field of exercise physiology and have had a lasting positive impact on our world
through your work. I feel privileged to get to follow in your footsteps, and I have had the
opportunity to meet so many great scientists in the field due to my connection with you. More
than that, though, you have been a wonderful grandfather. I will never forget all the card
playing, drawings, broken cookies, Great Harvest breads, and Old Country Buffet trips you have
shared with me over the years. You are a role model in how to lead a successful career and be an
involved husband, father, grandfather, and great-grandfather. Thank you.

ACKNOWLEDGEMENTS
First, I would like to thank my advisor, Dr. Karin A. Pfeiffer, for her guidance and
support in my four years at Michigan State University. You have been incredibly supportive of
the different projects I have undertaken in my doctoral work, even when some of them did not
directly push me toward completing my degree. I would also like to thank my dissertation
committee for their assistance in designing and implementing a project that has established a
solid line of research for me to continue in the future. Second, I want to thank the fellow
doctoral students for making the graduate experience at Michigan State so rewarding. They have
been so helpful in learning the ins and outs of teaching and research, and they have also been
supportive through the highs and lows of school and non-school events. I also want to give a
shout out to Chris Connolly for being a great conference roommate and lifting buddy, Kimbo
Yee for being a great teaching mentor and fellow fan of the Brody cafeteria, Catherine Gammon
for teaching me the true art of tea drinking, and Ian Cowburn for putting up with the whirring of
my stationary bike at all times of the day.
I owe a special thank you to my parents, brother, and grandparents. I would not be where
I am without your love and constant support. Lastly, I want to thank my soon-to-be wife, Laura
Kohn. You have been so understanding and patient with me through my doctoral work, allowing
me the time I need to complete my work but also making sure that I kept a work-life balance. I
cannot thank you enough for keeping me grounded through school and helping to make our
distance relationship work as well as it has. I love you and feel so lucky to get to spend my life
with you.

TABLE OF CONTENTS

LIST OF TABLES ...........................................................................................................................x
LIST OF FIGURES ...................................................................................................................... xii
KEY TO SYMBOLS AND ABBREVIATIONS ........................................................................ xiv
CHAPTER 1: INTRODUCTION .................................................................................................1
Physical activity and sedentary behavior .................................................................................1
Measurement of physical activity and sedentary behavior.......................................................2
SPECIFIC AIMS AND HYPOTHESES .........................................................................................9
CHAPTER 2: LITERATURE REVIEW...................................................................................13
Introduction ........................................................................................................................13
The influence of physical activity and sedentary behavior on health ................................14
Physical activity .....................................................................................................14
Sedentary behavior.................................................................................................15
Accelerometry as a preferred method to measure physical activity, energy expenditure,
sedentary behavior, and activity type.................................................................................23
Measurement methods ...........................................................................................23
The Large-Scale Integrated monitor and Caltrac ...................................................26
Linear regression ....................................................................................................28
Multiple regression ................................................................................................31
Measurement of sedentary behavior using accelerometers ...................................34
Machine learning ...................................................................................................36
Multiple sensor methods ........................................................................................41
Accelerometer placement.......................................................................................49
Laboratory-based vs. free-living settings ...............................................................60
Accelerometer reliability .......................................................................................64
Identifying non-wear ..............................................................................................66
Summary of current evidence and future directions ..........................................................69
CHAPTER 3: VALIDATION AND COMPARISON OF ACCELEROMETERS
LOCATED ON THE WRISTS, HIP, AND THIGH FOR FREE-LIVING ENERGY
EXPENDITURE PREDICTION ................................................................................................70
ABSTRACT ...................................................................................................................................70
INTRODUCTION .........................................................................................................................72
METHODS ....................................................................................................................................76
Summary of protocol .........................................................................................................76
Participants.........................................................................................................................76
Instrumentation ..................................................................................................................77
ActiGraph accelerometers......................................................................................77
GENEA accelerometers .........................................................................................78
Oxycon portable metabolic analyzer .....................................................................78
Procedure ...........................................................................................................................79
Data reduction and modeling .............................................................................................82
Artificial neural networks ......................................................................................82
Window length .......................................................................................................85
Features ..................................................................................................................86
Size of the hidden layer..........................................................................................91
Oxycon data ...........................................................................................................92
Statistical analyses .............................................................................................................92
Power analysis ...................................................................................................................94
RESULTS ......................................................................................................................................96
DISCUSSION ..............................................................................................................................100
Study strengths and limitations ........................................................................................106
Conclusions ......................................................................................................................108
CHAPTER 4: COMPARISON OF ACTIVITY TYPE CLASSIFICATION ACCURACY
FROM ACCELEROMETERS WORN ON THE WRISTS, HIP AND THIGH.................110
ABSTRACT .................................................................................................................................110
INTRODUCTION .......................................................................................................................112
METHODS ..................................................................................................................................116
Summary of protocol .......................................................................................................116
Participants.......................................................................................................................116
Instrumentation ................................................................................................................116
ActiGraph accelerometers....................................................................................117
GENEA accelerometers .......................................................................................117
iPAQ portable digital assistant and direct observation ........................................118
Procedure .........................................................................................................................118
Data reduction and modeling ...........................................................................................121
Artificial neural networks ....................................................................................121
Window length .....................................................................................................124
Features ................................................................................................................125
Activity type classification ..................................................................................129
Identifying non-wear ............................................................................................130
Direct observation ................................................................................................131
Statistical analyses ...........................................................................................................131
Power analysis .................................................................................................................133
RESULTS ....................................................................................................................................134
Confusion matrices ..........................................................................................................137
Activity categories ...........................................................................................................139
Activity intensity categories ............................................................................................141
DISCUSSION ..............................................................................................................................155
Strengths and limitations..................................................................................................163
Conclusions ......................................................................................................................164
CHAPTER 5: VALIDATION AND COMPARISON OF ACCELEROMETERS WORN
ON THE WRISTS, HIP, AND THIGH FOR MEASURING SEDENTARY BEHAVIOR
......................................................................................................................................................165
ABSTRACT .................................................................................................................................165
INTRODUCTION .......................................................................................................................167
METHODS ..................................................................................................................................172
Summary of protocol .......................................................................................................172
Participants.......................................................................................................................172
Instrumentation ................................................................................................................172
ActiGraph accelerometers....................................................................................173
GENEA accelerometers .......................................................................................173
iPAQ portable digital assistant and direct observation ........................................174
Procedure .........................................................................................................................174
Data reduction and modeling ...........................................................................................177
Artificial neural networks ....................................................................................177
Assessing sedentary behavior using accelerometers............................................182
Direct observation ................................................................................................183
Statistical analyses ...........................................................................................................184
Power analysis .................................................................................................................185
RESULTS ....................................................................................................................................187
DISCUSSION ..............................................................................................................................194
Strengths and limitations..................................................................................................199
Conclusions ......................................................................................................................199
CHAPTER 6: DISSERTATION SUMMARY AND RECOMMENDATIONS...................201
Summary of results ..........................................................................................................201
Chapter 3: Estimation of energy expenditure ......................................................201
Chapter 4: Classification of activity type.............................................................205
Chapter 5: Estimation of sedentary behavior .......................................................209
Conclusions ..........................................................................................................212
Recommendations for future research .............................................................................218
APPENDICES ............................................................................................................................222
APPENDIX A: Consent form ...................................................................................................223
APPENDIX B: Recruitment flyer ............................................................................................227
APPENDIX C: Email flyer .......................................................................................................228
APPENDIX D: Supplemental figures ......................................................................................229
REFERENCES ...........................................................................................................................242

LIST OF TABLES

Table 2.1. Comparison of wireless accelerometer systems for activity classification accuracy and
EE prediction accuracy ..................................................................................................................47
Table 2.2. Comparison of different monitor placements for activity classification accuracy and
EE prediction accuracy ..................................................................................................................58
Table 3.1. Activities performed during the simulated free-living protocol .....................................81
Table 3.2. Features used for EE prediction ....................................................................................90
Table 3.3. Feature sets used for creation and testing of ANNs .......................................................91
Table 3.4. Minimum Pearson correlations detectable for a given sample size and power ...............95
Table 3.5. Demographic characteristics of participants enrolled in study .......................................96
Table 3.6. Correlations of measured vs. predicted EE ...................................................................97
Table 3.7. Bias for measured vs. predicted EE...............................................................................99
Table 4.1. Activities performed during the simulated free-living protocol...................................120
Table 4.2. Features used for EE and activity type prediction .......................................................128
Table 4.3. Feature sets used for creation and testing of ANNs .....................................................129
Table 4.4. Demographic characteristics of participants enrolled in study .....................................134
Table 4.5. Overall sensitivity, specificity, and AUC for each of the four accelerometer
placements for feature set 1 .........................................................................................................137
Table 4.6. Confusion matrix for activity type classification from a hip-mounted ActiGraph
accelerometer ...............................................................................................................................143
Table 4.7. Confusion matrix for activity type classification from a thigh-mounted ActiGraph
accelerometer ...............................................................................................................................144
Table 4.8. Confusion matrix for activity type classification from a GENEA accelerometer
mounted on the left wrist .............................................................................................................145
Table 4.9. Confusion matrix for activity type classification from a GENEA accelerometer
mounted on the right wrist ...........................................................................................................146
Table 4.10. Activity-specific sensitivity, specificity, and AUC among the four accelerometer
placement sites. ............................................................................................................................147
Table 4.11. Overall sensitivity, specificity, and AUC among the four accelerometer placement
sites with combined activity categories. ......................................................................................149
Table 4.12. Activities classified into activity intensities by the Compendium and by measured
METs............................................................................................................................................151
Table 4.13. Overall sensitivity, specificity, and AUC among the four accelerometer placement
sites for classification of activity intensity...................................................................................153
Table 5.1. Activities performed during the simulated free-living protocol...................................176
Table 5.2. Features used for EE and activity type prediction .......................................................181
Table 5.3. Demographic characteristics of participants enrolled in study .....................................187
Table 5.4. Root mean square error for prediction of total time spent in SB and breaks in SB ...189
Table 6.1. Overall sensitivity, specificity, and AUC among the four accelerometer placement
sites for classification of activity intensity using the energy expenditure ANNs (developed in
Chapter 3).....................................................................................................................................214

LIST OF FIGURES

Figure 3.1. ANN for predicting EE .................................................................................................84
Figure 3.2. RMSE values for predicted vs. measured EE ..............................................................98
Figure 4.1. ANN for predicting activity type ...............................................................................123
Figure 4.2. Sensitivity for the four accelerometers, compared among feature sets ....................136
Figure 4.3. Comparison of dominant and non-dominant wrist accelerometer sensitivities........154
Figure 5.1. ANN for predicting activity type and sedentary behavior ..........................................179
Figure 5.2. Predictions of total time spent in SB compared to a criterion measure (direct
observation)..................................................................................................................................188
Figure 5.3. Predictions of breaks in SB using a five-second interval .........................................191
Figure 5.4. Predictions of breaks in SB using a 30-second interval ...........................................191
Figure 5.5. Predictions of breaks in SB using a 60-second interval ...........................................192
Figure B.1. Recruitment flyer .......................................................................................................227
Figure D.1. Equipment worn by participants during the 90-min protocol. Participant shown is
performing the lying activity (T1) .................................................................................................229
Figure D.2. Example of participant performing reading activity (T2) ..........................................230
Figure D.3. Example of participant performing computer use activity (T3) .................................231
Figure D.4. Example of participant performing standing activity (T4) .........................................232
Figure D.5. Example of participant performing laundry activity (T5) ..........................................233
Figure D.6. Example of participant performing sweeping activity (T6) .......................................234
Figure D.7. Example of participant performing walking slow and fast activities (T7 and T8) ......235
Figure D.8. Example of participant performing jogging activity (T9) ..........................................236
Figure D.9. Example of participant performing cycling activity (T10) .........................................237
Figure D.10. Example of participant performing stair use activity (T11) ......................................238
Figure D.11. Example of participant performing biceps curls activity (T12) ................................239
Figure D.12. Example of participant performing squats activity (T13) .........................................240
Figure D.13. Example of non-wear (T14) .....................................................................................241

KEY TO SYMBOLS AND ABBREVIATIONS
ANN               artificial neural network
AUC               area under the receiver operating characteristic curve
BMI               body mass index
counts/minute     accelerometer signal counts per minute
CSA               Computer Science Application accelerometer
CV                coefficient of variation
DO                direct observation
EE                energy expenditure
g                 gravitational force
HR                heart rate
Hz                hertz
IDEEA             Intelligent Device for Energy Expenditure and Activity
kcal              kilocalorie (or Calorie)
kcal/wear time    kilocalories per hour of time the accelerometer was worn
kg                kilogram
kg/m2             kilograms per meter squared
LPA               light-intensity physical activity
LSI               Large-Scale Integrated motor activity monitor
MET-hour          metabolic equivalent hours
METs              metabolic equivalents
ml                milliliter
ml/kg/min         milliliters per kilogram body mass per minute
mph               miles per hour
MVPA              moderate-to-vigorous intensity physical activity
NHANES            National Health and Nutrition Examination Survey
PA                physical activity
PDA               personal digital assistant
r                 Pearson correlation
RMANOVA           repeated measures analysis of variance
RMSE              root mean square error
rpm               revolutions per minute
SB                sedentary behavior
SD                standard deviation
TV                television
VCO2              volume of carbon dioxide expelled
VO2               volume of oxygen consumed
x-axis            vertical accelerometer axis
y-axis            medial-lateral accelerometer axis
z-axis            anterior-posterior accelerometer axis

CHAPTER 1
INTRODUCTION
Physical activity and sedentary behavior
Physical activity (PA) is widely recognized for its beneficial effects on many aspects of
health, including reduced risk of obesity (King and Tribble 1991), hypertension (Paffenbarger,
Wing et al. 1983; Chobanian, Bakris et al. 2003), diabetes (Healy, Wijndaele et al. 2008),
cardiovascular disease (Paffenbarger, Hyde et al. 1986; Morris, Clayton et al. 1990), some cancers
(Thune and Furberg 2001), and all-cause mortality (Lee and Skerrett 2001). Based on this
evidence, the US Department of Health and Human Services recommends a minimum of 150
min/week of moderate-intensity PA or 75 min/week of vigorous-intensity PA, defined as activities
requiring an energy expenditure (EE) of at least 3.0 or 6.0 times the resting level (METs),
respectively, to experience health benefits (2008).
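The MET thresholds and weekly minimums stated above can be illustrated with a minimal Python sketch. The function names and the (minutes, METs) bout representation are hypothetical and chosen only for illustration. The sketch checks the moderate and vigorous minimums separately, exactly as they are stated here, and makes no assumption about how mixed-intensity weeks should be combined.

```python
from typing import Iterable, Tuple

MODERATE_METS = 3.0        # lower bound for moderate-intensity PA (from the text)
VIGOROUS_METS = 6.0        # lower bound for vigorous-intensity PA (from the text)
MODERATE_MIN_PER_WEEK = 150.0
VIGOROUS_MIN_PER_WEEK = 75.0


def intensity_from_mets(mets: float) -> str:
    """Label an activity by its MET value using the two thresholds above."""
    if mets >= VIGOROUS_METS:
        return "vigorous"
    if mets >= MODERATE_METS:
        return "moderate"
    return "below moderate"


def meets_weekly_minimum(bouts: Iterable[Tuple[float, float]]) -> bool:
    """Check one week of (minutes, METs) bouts against either minimum.

    Assumption: moderate and vigorous minutes are tallied separately;
    no equivalence between the two intensities is modeled.
    """
    moderate_minutes = 0.0
    vigorous_minutes = 0.0
    for minutes, mets in bouts:
        label = intensity_from_mets(mets)
        if label == "moderate":
            moderate_minutes += minutes
        elif label == "vigorous":
            vigorous_minutes += minutes
    return (moderate_minutes >= MODERATE_MIN_PER_WEEK
            or vigorous_minutes >= VIGOROUS_MIN_PER_WEEK)
```

For example, five 30-minute bouts at 4.0 METs would satisfy the moderate-intensity minimum under these assumptions, whereas four such bouts would not.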
Activities below 3.0 METs do not qualify as moderate- or vigorous-intensity PA and
instead are labelled as either light-intensity PA or sedentary behavior. Sedentary behavior (SB) is
defined as a supine or seated activity requiring low levels of EE (< 1.5 METs) (Owen, Healy et al.
2010; SBRN 2012). Examples of SB include watching television (TV), using a computer, or
driving. SB has historically been viewed as a lack of moderate-to-vigorous PA (MVPA); however,
recent epidemiological and laboratory-based evidence suggests that SB elicits distinct physiologic
responses from MVPA, with high levels of SB associated with diminished metabolic (Hamilton,
Hamilton et al. 2004; Hamilton, Hamilton et al. 2007), cardiovascular (Schrage 2008), and bone
health (Zerwekh, Ruml et al. 1998) and increased risk of obesity (Hu, Li et al. 2003), some cancers
(Howard, Freedman et al. 2008), and all-cause mortality (Katzmarzyk, Church et al. 2009). It is
important to note that the associations between SB and negative health conditions exist
independently of total MVPA (Owen, Healy et al. 2010). These associations are especially
concerning given that technological advances (e.g., motor vehicles, TV, computers) have
contributed to increased time spent sedentary (Matthews, Chen et al. 2008). Moreover, there
are several components of SB that may influence health, notably the total time spent in SB (Healy,
Wijndaele et al. 2008) as well as the number of times SB is broken up by non-sedentary activities
(breaks in SB) (Healy, Dunstan et al. 2008). Thus, it is important to be able to accurately measure
each of these components to determine the true influence of SB on health.
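The two components named above, total time in SB and breaks in SB, can be computed from a sequence of epoch-by-epoch activity labels. The sketch below is illustrative only: the label names and the one-minute epoch length are assumptions, and a break is counted here as any transition from a sedentary epoch to a non-sedentary epoch.

```python
SEDENTARY = {"sitting", "lying"}  # hypothetical label names for seated/supine SB


def summarize_sb(labels, epoch_min=1.0):
    """Return (total sedentary minutes, number of breaks in SB).

    A break is counted each time a sedentary epoch is immediately
    followed by a non-sedentary epoch.
    """
    total = sum(epoch_min for lab in labels if lab in SEDENTARY)
    breaks = sum(
        1
        for prev, cur in zip(labels, labels[1:])
        if prev in SEDENTARY and cur not in SEDENTARY
    )
    return total, breaks


# Hypothetical seven-minute record; standing is treated as non-sedentary.
record = ["sitting", "sitting", "standing", "walking", "sitting", "lying", "walking"]
total_sb_min, sb_breaks = summarize_sb(record)
```

In this example the record contains four sedentary minutes and two breaks (sitting to standing, and lying to walking).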
Despite the available evidence, there are still knowledge gaps regarding the specific effects
of PA and SB on health (PAGAC 2008). For example, there is not enough research into SB to
allow for evidence-based recommendations to be developed. Additionally, there is currently only
limited evidence of dose-response or threshold effects of SB on chronic health conditions such as
heart disease and cancer (Owen, Healy et al. 2010). These knowledge gaps are due mainly to the
absence of a single measurement tool that is valid for measuring both PA and SB and that can be
used for a variety of activities and environments (Owen, Healy et al. 2010). Without such a
measurement tool, researchers will be unable to accurately assess the relationship of PA and SB to
health outcomes, monitor precise levels of PA or SB, or evaluate the effectiveness of interventions
aimed to increase PA and decrease SB.
Measurement of physical activity and sedentary behavior
PA and SB can be assessed using a number of different methods, but accelerometers have
emerged as a preferred method of assessing free-living PA and SB due to their objectivity, minimal
participant burden, and rich data that can be collected for periods of up to 4-6 weeks and beyond
(Welk 2002). Accelerometer data can be used to estimate energy expenditure (EE) and time spent
in various activity intensities (sedentary, light, moderate, vigorous). Accelerometers are generally
worn on the hip for comfort, convenience, and utility for measuring movements of the whole body.
Additionally, hip-mounted accelerometers have shown good utility for measuring ambulatory
activities (e.g., walking, running) in laboratory-based settings (Freedson, Melanson et al. 1998;
Rothney, Schaefer et al. 2008; Lyden, Kozey et al. 2011).
Traditionally, accelerometer data have been filtered and then translated into ‘activity
counts.’ These counts are then placed into simple linear regression equations to estimate EE
(Montoye, Washburn et al. 1983; Freedson, Melanson et al. 1998). Linear regression works well
for measuring the energy cost of ambulatory activities (i.e., walking and running), but it
dramatically under- or overestimates the EE requirement of many sedentary, lifestyle, and exercise
activities and does not allow for classification of activity type (i.e., classifying activities as sitting,
walking, running, cycling, etc.) (Crouter, Churilla et al. 2006; Rothney, Schaefer et al. 2008;
Lyden, Kozey et al. 2011). Other data processing methods, such as machine learning, have
recently evolved as successful alternatives for analyzing data collected from accelerometers.
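The count-based regression approach described above can be sketched as follows. The calibration data are made up for illustration and are not taken from any published equation; the point is only that a single fitted line maps a given count value to one EE value, regardless of which activity produced the counts.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical calibration data: activity counts per minute vs. measured METs.
counts_per_min = [0, 1000, 2000, 4000, 6000]
measured_mets = [1.5, 2.5, 3.5, 5.5, 7.5]

intercept, slope = fit_line(counts_per_min, measured_mets)


def predict_mets(counts):
    """Estimate EE (METs) for one minute from its activity count."""
    return intercept + slope * counts
```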
Machine learning is the general term for an array of mathematical techniques that can be
used to recognize patterns in data and use those patterns to accurately predict activity type or EE.
Machine learning bears some similarities to linear regression; for example, both machine
learning and linear regression use one or more input (independent) variables (e.g., accelerometer
counts, heart rate, etc.) to predict an outcome (such as EE). However, unlike traditional linear
regression, machine learning techniques do not assume a simple relationship between
accelerometer counts and EE, and machine learning takes more information from the
accelerometer than just counts (e.g., monitor orientation, patterns of count accrual). Machine
learning techniques are more complicated than linear regression, but they show improvements in
EE measurement (Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009) and allow
for classification of activity type (Khan, Lee et al. 2008; Khan, Lee et al. 2010; Trost, Wong et
al. 2012), thereby allowing estimation of time spent in SB and breaks in SB. Currently, there is
no consensus on which machine learning technique is best for EE measurement or activity
classification; however, artificial neural networks (ANNs) have received the most use in
kinesiology-based studies because they can be used to predict both continuous variables (such as
EE) and categorical variables (such as activity type classification). Additionally, ANNs can be
applied to data from commonly used accelerometers and can be developed from freely available
software packages (e.g., R statistical software) (Staudenmayer, Pober et al. 2009).
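To illustrate why one ANN structure can serve both purposes, the sketch below passes a set of input features through a single hidden layer and then produces either a continuous output (an EE estimate) or a categorical output (one probability per activity type). It is a forward pass only: the features and weights are hypothetical placeholders, training is omitted, and Python is used here for illustration rather than the R software mentioned above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(features, weights, biases):
    """One hidden layer: weighted sum of inputs passed through a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_w, features)) + b)
        for unit_w, b in zip(weights, biases)
    ]


def linear_output(hidden, weights, bias):
    """Continuous output, e.g. an EE estimate in METs."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def softmax_output(hidden, weights, biases):
    """Categorical output: one probability per activity type."""
    scores = [
        sum(w * h for w, h in zip(class_w, hidden)) + b
        for class_w, b in zip(weights, biases)
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical inputs and untrained weights, for illustration only.
features = [0.2, 0.9]  # e.g. scaled mean and SD of acceleration in one window
hidden = hidden_layer(features, [[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0])

ee_estimate = linear_output(hidden, [2.0, 3.0], 1.0)
activity_probs = softmax_output(
    hidden, [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]
)
```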
Despite the emphasis on measurement of EE and time spent in MVPA, classification and
measurement of SB have lagged behind. Only very recently have validation studies been
conducted specifically to assess the ability of accelerometers to accurately measure time spent in
SB and breaks in SB, and these studies have yielded mixed results (Grant, Ryan et al. 2006;
Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Additionally, SB has
rarely been included in protocols utilizing machine learning for EE and activity classification
(Freedson, Lyden et al. 2011), leaving a large gap in the literature regarding the utility of
accelerometers for measuring SB. Finally, standing has often been considered a type of SB, but
standing involves significant contraction of muscles in the legs and postural muscles and does not
have many of the negative physiologic effects of prolonged sitting or lying (Hamilton, Hamilton et
al. 2004; Hamilton, Hamilton et al. 2007). Additionally, it may be that different types of SB elicit
different amounts of muscle contraction (e.g., sitting at a computer might require postural muscles,
while lying down may not). Therefore, an accurate measurement tool must be able to differentiate
standing from SB and also differentiate among different types of SB in order to gain an
understanding of the true health risks of SB.
Ultimately, there must be a balance between quality of information/data collected in a
research study vs. the burden on participants and researchers. For accelerometer-based activity
monitor data, use of multiple monitors and collection of several physiologic variables can
improve EE measurement and activity classification (Rothney, Neumann et al. 2007; De Vries,
Garre et al. 2011; Dong, Biswas et al. 2013). However, using more monitors also increases participant
burden dramatically, which may lower compliance rates and, consequently, reduce the amount
and quality of data collected. Additionally, large-scale studies cannot easily use multiple
monitors due to the dramatic increase in time, burden, and cost necessary to collect, process, and
analyze the data. Use of a single activity monitor that can collect data on one or more variables
is strongly preferred for large, free-living studies due to ease of use for participants and
researchers while still providing a valid measurement of the PA outcome variable(s) of interest.
Additionally, machine learning techniques are much more complex to use and understand than
traditional linear regression techniques. In order to make machine learning suitable for
researchers to use, current approaches to developing and using machine learning must be
simplified as much as possible without losing measurement accuracy. In summary, there is a
need to refine the methodology for using a single activity monitor for measurement of EE and
classification of both SB and non-sedentary activities, especially in free-living settings;
additionally, efforts to reduce the complexity of machine learning will make this approach more
accessible to researchers who want to measure PA, EE, and/or SB but who are not measurement
specialists.
Hip-mounted accelerometers are commonly used for comfort and utility for measuring
ambulatory activities, but they may offer a more limited ability to classify certain types of
activities. Machine learning techniques have been applied to hip-mounted accelerometers with a
high degree of success for measuring EE (Rothney, Neumann et al. 2007; Staudenmayer, Pober
et al. 2009; Trost, Wong et al. 2012) and activity type (when assessing non-sedentary activities)
(Khan, Lee et al. 2008; Bonomi, Plasqui et al. 2009; Khan, Lee et al. 2010). Conversely, these
techniques have rarely been used for measurement of total SB, distinguishing among standing
and different types of SB, or measuring breaks in SB (Freedson, Lyden et al. 2011; Trost, Wong
et al. 2012). Given the previous success of ANNs for improvement of activity type classification
and EE prediction, it is likely that creation of ANNs trained on data that include both sedentary
and non-sedentary activities will further improve assessment of activity type classification, time
spent in SB, and breaks in SB while also improving EE measurement. Our study will address
this shortcoming in the literature by creating and validating an ANN based on both sedentary and
non-sedentary activities for a hip-mounted ActiGraph accelerometer. This ANN will be tested
for its utility to correctly classify activity type and measure time in SB, breaks in SB, and EE.
While the hip is the most common accelerometer placement location for measuring
activity, there is evidence that placement on the other parts of the body, such as the thigh and
wrist, can yield similar or slightly better accuracy for measuring PA and EE (Bouten, Sauren et
al. 1997; Bao and Intille 2004; De Vries, Garre et al. 2011; Esliger, Rowlands et al. 2011;
Mannini, Intille et al. 2013). Additionally, there is consistent evidence that thigh-mounted
accelerometers can accurately measure total SB and breaks in SB, which may not be true of a
hip-mounted accelerometer (Grant, Ryan et al. 2006; Hart, Ainsworth et al. 2011; Kozey-Keadle,
Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). However, to date, no published study
has evaluated a thigh-mounted accelerometer for its utility in measuring EE or
classifying both PA and SB. Thus, our study will also develop and test an ANN for
classifying activity type and measuring SB and EE using data from a thigh-mounted ActiGraph
accelerometer.
The GENEA is a newly developed accelerometer (designed to be worn on the wrist) that
has four features that serve to dramatically improve compliance and non-wear determination: 1)
it has a thin, low-profile design, 2) it is waterproof, 3) it has a battery life and memory capacity
of up to 45 days, and 4) it has a temperature sensor to help detect when it is being worn.
Therefore, the monitor does not need to be removed for any reason during data collection, and if
it is, the temperature sensor will help to determine exact wear-time. These advantages, along
with the GENEA’s raw data recording and reasonable price, make the GENEA ideal for
measuring EE and SB and classifying activity type in free-living situations and large studies.
Using the traditional, cut-point approach, the GENEA has been shown to have high accuracy for
measuring EE (r>0.80) in a validation study when worn on the hip and wrist (Esliger, Rowlands
et al. 2011) but much lower accuracy for classifying activity intensity in a cross-validation of the
cut-points (Welch, Bassett et al. 2013; Welch, Bassett et al. 2014). Recently, the wrist-worn
GENEA was tested using machine learning and showed high accuracy (>95% classification
accuracy) for identifying 10-12 types of activities in a laboratory-based setting (Zhang,
Rowlands et al. 2012). However, there are still many unanswered questions regarding the
GENEA, including the ability to use machine learning to predict EE and measure total SB and
breaks in SB, especially in a free-living environment. Therefore, our study developed and tested
ANNs to measure SB and EE and classify activity type from raw data obtained from two
wrist-mounted GENEA accelerometers. Additionally, it is conventional to wear wrist-mounted
accelerometers on the non-dominant wrist due to perceived superior measurement
accuracy, but there is little evidence to support this convention. Therefore, the current study
tested and compared monitors worn on both wrists to examine differences in accuracy between
the dominant and non-dominant wrists.
Finally, measurement techniques are often validated for use in heavily controlled,
laboratory-based settings (Freedson, Melanson et al. 1998; Esliger, Rowlands et al. 2011; Zhang,
Rowlands et al. 2012; Dong, Montoye et al. 2013). This method is important for providing a
proof-of-concept that a technique can accurately measure what it is supposed to measure and
identify potential limitations of the measurement technique. However, laboratory settings are
very different than free-living conditions, and there is considerable evidence showing that
predictive models developed in laboratory validations do not work well when applied to free-living
settings (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000; Freedson, Lyden et al.
2011; Gyllensten and Bonomi 2011; van Hees, Golubic et al. 2013; Welch, Bassett et al. 2014).
Therefore, it is important to incorporate aspects of a free-living setting into validation studies to
increase their real-world generalizability.
In summary, our study developed and assessed the accuracy of ANNs for the
measurement of EE, SB, and activity type using data collected from hip- and thigh-mounted
ActiGraph accelerometers and two wrist-mounted GENEA accelerometers. These ANNs were
created and validated in a free-living simulation, using a portable metabolic analyzer as the
criterion measure of EE and direct observation (DO) as the criterion measure of activity type,
SB, and breaks in SB.
SPECIFIC AIMS AND HYPOTHESES
Objective 1: In a simulated free-living setting, create and test an ANN to estimate EE for a
hip-mounted ActiGraph GT3X+ accelerometer, a thigh-mounted ActiGraph GT3X+ accelerometer,
and two wrist-mounted GENEActiv accelerometers (total of four ANNs).
Aim 1: Create EE ANNs for the four accelerometers using simple-to-understand accelerometer
signal features and a freely available software package, and test a range of potential features to
identify which are most relevant for inclusion in the ANNs. This aim is not hypothesis-driven.
Aim 2: Assess the criterion validity of the hip, thigh, and wrist ANNs developed for the four
accelerometers for estimating EE, using EE measured by a portable metabolic analyzer as a
criterion.
- Hypothesis 2a: All four accelerometers would have at least moderately high validity for
  measuring EE, as demonstrated by Pearson correlation coefficients of r≥0.60.
- Hypothesis 2b: The thigh-mounted accelerometer would have the highest accuracy (as
  represented by the lowest root mean square error [RMSE] and highest Pearson
  correlations [r]) for predicting EE, and the wrist-mounted accelerometers would have the
  lowest accuracy (highest RMSE and lowest r values) for predicting EE. The hip
  accelerometer placement would be significantly less accurate than the thigh but
  significantly more accurate than the wrist accelerometers. Differences among RMSE and
  r values were evaluated using repeated-measures analysis of variance (RMANOVA).
- Hypothesis 2c: Accuracy for predicting EE would be similar for the accelerometers worn on the
  dominant and non-dominant wrists. Differences between RMSE and r values for the left and
  right wrist placement sites were evaluated using RMANOVA.
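The two accuracy measures named in these hypotheses, RMSE and Pearson r, can be computed as sketched below. The EE values are hypothetical and chosen to show why both measures are reported: a prediction with a constant bias can correlate perfectly with the criterion while still having a non-zero RMSE.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predicted and criterion values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical EE values (METs): every prediction is 0.5 METs too high.
measured_ee = [1.0, 2.0, 3.0, 4.0]
predicted_ee = [1.5, 2.5, 3.5, 4.5]

error = rmse(predicted_ee, measured_ee)        # 0.5 METs
correlation = pearson_r(predicted_ee, measured_ee)  # 1.0
```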
Objective 2: In a simulated free-living setting, create and test ANNs to correctly classify activity
type from a hip-mounted ActiGraph GT3X+ accelerometer, a thigh-mounted ActiGraph GT3X+
accelerometer, and two wrist-mounted GENEActiv accelerometers (total of four ANNs).
Aim 3: Create activity type ANNs using simple-to-understand accelerometer signal features and
a freely available software package, and evaluate the utility of different sets of accelerometer
features for inclusion in the ANNs. This aim is not hypothesis-driven.
Aim 4: Assess the criterion validity of the ANNs for the four accelerometers for classifying
activity type, using direct observation (DO) of activity type as a criterion measure. For
hypotheses 4b-4f, differences among accelerometer placement sites were evaluated by
RMANOVA.
- Hypothesis 4a: Overall classification accuracies (determined by sensitivity of the ANNs)
  would be at least 70% for the thigh-, hip-, and wrist-mounted accelerometers.
- Hypothesis 4b: Overall classification accuracy would be significantly higher for the
  thigh-mounted accelerometer than the hip- or wrist-mounted accelerometers.
- Hypothesis 4c: For ambulatory activities (walking, jogging) and climbing/descending
  stairs, classification accuracies would differ by no more than 5% among the four
  accelerometers.
- Hypothesis 4d: For lifestyle activities (laundry and sweeping) and exercise activities
  (biceps curls and squats), the wrist-mounted accelerometers would yield significantly
  higher classification accuracy than the hip- or thigh-mounted accelerometers.
- Hypothesis 4e: For SB (lying and sitting), standing, and cycling, the thigh-mounted
  accelerometer would yield significantly higher classification accuracy than the hip- or
  wrist-mounted accelerometers.
- Hypothesis 4f: The dominant and non-dominant wrist accelerometers would yield
  classification accuracies not significantly different from each other.
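Sensitivity for each activity type can be computed from paired criterion (DO) and ANN-predicted labels, as sketched below. The labels are hypothetical, and taking overall accuracy as the fraction of all epochs classified correctly is an assumption about how the per-activity values are combined.

```python
from collections import Counter


def sensitivity_by_class(observed, predicted):
    """Per-activity sensitivity: correctly labeled epochs / observed epochs."""
    totals = Counter(observed)
    correct = Counter(o for o, p in zip(observed, predicted) if o == p)
    return {activity: correct[activity] / totals[activity] for activity in totals}


def overall_accuracy(observed, predicted):
    """Fraction of all epochs whose predicted label matches the criterion."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical six-epoch example.
observed = ["sitting", "sitting", "standing", "walking", "walking", "walking"]
predicted = ["sitting", "standing", "standing", "walking", "walking", "jogging"]

per_activity = sensitivity_by_class(observed, predicted)
overall = overall_accuracy(observed, predicted)
```

In this example, sitting is classified correctly in one of two epochs, standing in one of one, and walking in two of three, so four of the six epochs are correct overall.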

Objective 3: In a simulated free-living setting, use the activity type ANNs (created in Aim 3) for
the four accelerometers to determine total time spent in SB and breaks in SB.
Aim 5: Assess the criterion validity of the activity type ANNs developed for the four
accelerometers for estimating total time spent in SB, using DO as the criterion measure. For
hypotheses 5a-5c, differences among accelerometer placement sites and the criterion measure
were evaluated using RMANOVA.
- Hypothesis 5a: Total time spent in SB estimated from the thigh-mounted accelerometer
  would not be significantly different from DO-measured total time spent in SB (i.e., the
  thigh-mounted accelerometer would accurately measure total time spent in SB).
- Hypothesis 5b: The wrist-mounted accelerometers would significantly underpredict total
  time spent in SB compared to that measured by DO.
- Hypothesis 5c: The hip-mounted accelerometer would significantly overpredict total time
  spent in SB compared to that measured by DO.
Aim 6: Assess the criterion validity of the ANNs developed for the four accelerometers for
classifying breaks in SB, using DO as the criterion measure. For hypotheses 6a-6c, differences
among accelerometer placement sites and the criterion measure were evaluated using
RMANOVA.
- Hypothesis 6a: Breaks in SB estimated from the thigh-mounted accelerometer would not
  be significantly different from DO-measured breaks in SB (i.e., the thigh-mounted
  accelerometer would accurately measure breaks in SB).
- Hypothesis 6b: The wrist-mounted accelerometers would significantly overpredict breaks
  in SB compared to that measured by DO.
- Hypothesis 6c: The hip-mounted accelerometer would significantly underpredict breaks
  in SB compared to that measured by DO.
This dissertation is organized into several chapters. Chapter 2 provides a comprehensive
review of the literature regarding the use of accelerometers to measure physical activity and
sedentary behavior. Then, Chapter 3 addresses Objective 1 (EE estimation), Chapter 4 addresses
Objective 2 (activity type prediction), and Chapter 5 addresses Objective 3 (sedentary behavior
measurement). Finally, Chapter 6 summarizes the findings of the dissertation and identifies
areas for further study.
CHAPTER 2
LITERATURE REVIEW
Introduction
Both PA and SB have been shown to influence health. However, the bulk of research
conducted to date has focused on PA, with SB being classified as a lack of PA. This
conventional definition, in which PA and SB are on opposite ends of a continuum, fails to recognize
the complex nature of SB or the independent effects being sedentary may have on people’s
health, even when they obtain the recommended weekly PA dose (Pate, O'Neill et al. 2008;
Dunstan, Howard et al. 2012). For PA, SB, and EE measurement in surveillance, observational,
and intervention studies, recall methods are commonly used due to their low cost and minimal
burden on both participants and researchers. However, accelerometer-based measurement of PA,
SB, and EE is preferred due to its objectivity and potentially improved capability for accurate
measurement of these variables (Welk 2002).
With recent technological improvements in accelerometer capabilities, machine learning
has become a popular method used to process and analyze accelerometer data. While earlier
processing techniques could only measure EE or activity intensity and were developed for
hip-mounted accelerometers, machine learning allows researchers to use accelerometers to measure
EE and classify activity type when worn on the hip or other parts of the body (Preece, Goulermas
et al. 2009). Hip placement works well for ambulatory activities (Rosenberger, Haskell et al.
2013), and wrist placement improves compliance and allows for sleep measurement (Mannini,
Intille et al. 2013; Rosenberger, Haskell et al. 2013). However, while hip placement is better
than wrist placement for measurement of SB (Rosenberger, Haskell et al. 2013), neither hip nor
wrist placement allows for acceptable accuracy for measurement of SB (Lyden, Kozey Keadle et
al. 2012; Rosenberger, Haskell et al. 2013), which may be due partly to lack of sedentary
activities used to train machine learning algorithms. Thigh placement appears optimal for
measuring SB (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012).
Additionally, a few studies showing successful use of a thigh-mounted accelerometer for
classification of SB and non-sedentary activities (Skotte, Korshoj et al. 2012; Dong, Montoye et
al. 2013) and estimation of EE (Metcalf, Curnow et al. 2002) provide preliminary evidence that
the thigh placement may be the ideal solution for comprehensive measurement of PA, SB, and
EE. The current study will directly compare the utility of hip-, thigh-, and wrist-mounted
accelerometers for classifying SB and non-sedentary activities and measuring total time in SB,
breaks in SB, and EE in a simulated free-living setting. This literature review begins by
discussing the independent risks of low PA and high SB on multiple health outcomes. Then, the
review addresses the strengths and weaknesses of available measurement methods, focusing on
the progression in the use of accelerometers and the current state of accelerometer use. Finally,
this review highlights several gaps that exist in measurement of EE, SB, and activity type,
leading to the rationale for the current study.
The influence of physical activity and sedentary behavior on health
Physical activity
PA has long been recognized for its importance in maintaining and improving health.
Historical figures such as Hippocrates recognized the beneficial effects of PA on health as early
as 400 B.C., writing the following in his book called Regimen: “Eating alone will not keep a man
well; he must also take exercise” (Precope 1952). Since that time, substantial evidence has been
collected to support the role of PA in lowering risk of depression (Martinsen, Hoffart et al. 1989),
obesity (King and Tribble 1991; Blair 1993), hypertension (Paffenbarger, Wing et al. 1983;
Chobanian, Bakris et al. 2003), type II diabetes (Manson, Nathan et al. 1992; Healy, Wijndaele et
al. 2008), cardiovascular disease (Paffenbarger, Hyde et al. 1986; Morris, Clayton et al. 1990),
some cancers (Shephard 1990; Thune and Furberg 2001; Slattery 2004), and all-cause mortality
(Paffenbarger, Hyde et al. 1986; Kampert, Blair et al. 1996). By the 1990s, the evidence was
sufficient to recommend a minimum of 30 min/day of PA on most or all days of the week to
achieve health benefits (Pate, Pratt et al. 1995). Since these recommendations, several
updates have been published, and the most recent recommendations include a more specific dose
of PA (150 min/week in at least moderate intensity, defined as any activity eliciting an EE of at
least 3.0 METs), separate recommendations for resistance training, and separate or modified
recommendations for children, older adults, adults with disabilities, and pregnant individuals
(2008). PA is commonly measured in min/day for comparison to recommendations, but PA can
also be assessed indirectly through measuring EE, which is useful in terms of energy balance and
assessing total PA. Therefore, an ideal measurement tool should be able to measure both
constructs.
Sedentary behavior
Anyone who does not meet the national recommendations of obtaining at least 150
min/week of MVPA has traditionally been considered sedentary (2008; Pate, O'Neill et al. 2008).
However, Pate et al. (Pate, O'Neill et al. 2008) emphasize that there is a marked difference between
being sedentary and being physically inactive. While it is often the case that individuals are
physically inactive (do not meet PA recommendations) and engage in large amounts of SB, it is
also fairly common for people to engage in high amounts of PA and SB (Pate, O'Neill et al. 2008;
Troiano, Berrigan et al. 2008), a categorization Owen et al. call the “Active Couch Potatoes”
(Owen, Healy et al. 2010). To better address the problem of SB, the Sedentary Behavior Research
Network recently redefined “sedentary” to indicate time spent in seated or supine behaviors (e.g.,
TV watching, computer use, and driving) that elicit an EE of 1.0-1.5 METs (Ainsworth, Haskell et
al. 2011; SBRN 2012). It is important to assess these behaviors (PA and SB) separately as
evidence is accumulating that each behavior appears to exert independent effects on health.
The classic 1953 study by Morris et al. (Morris, Heady et al. 1953) highlighted the potential
influence of SB on health by recognizing differences in heart disease incidence between London’s
bus drivers and bus conductors. While the bus drivers spent the vast majority of
their workday sitting in their driver seats, the conductors were constantly on their feet,
accumulating little SB but a lot of light-intensity PA (LPA) and some MVPA while walking through the double-decker
bus and going up and down the stairs. Incidence of heart disease was higher in the drivers than the
conductors, providing evidence that having high PA and low SB is associated with lower risk of
developing heart disease. However, despite this initial evidence, follow-up studies focused less on
SB and more on PA. Given that PA is easier to measure (especially with recall as the only
available field method at the time) (Healy, Clark et al. 2011) and is arguably easier to prescribe as
part of a lifestyle intervention, it is not surprising that follow-up research focused on the effects of
PA on health.
The importance of SB as a determinant of health returned to prominence in the last 10-15
years with the recognition that our society is becoming increasingly sedentary (Matthews, Chen et
al. 2008), likely due to technological advances which increase the number of sedentary jobs and
allow for more motorized transportation. Using National Health and Nutrition Examination
Survey (NHANES) data, Matthews et al. (Matthews, Chen et al. 2008) found that adults spend
over 50% of their waking time (7.7 hours/day) engaged in SB, while
Troiano et al. found that average adults spend only 27-29 min/day engaged in MVPA (Troiano, Berrigan
et al. 2008). Together, this information indicates that adults spend an average of more than 10
times as much time in SB as in MVPA. Since SB comprises such a large percentage of the average
person’s day, it is not surprising that SB has been linked to an array of health outcomes, including
obesity (Shields and Tremblay 2008), metabolic and cardiovascular health (Healy, Dunstan et al.
2007), and all-cause mortality (Katzmarzyk, Church et al. 2009).
From an energy balance perspective, SB requires much less energy than LPA or MVPA,
resulting in lower daily EE and increasing risk for weight gain and obesity (Levine, Eberhardt et al.
1999; Hu, Li et al. 2003; Levine, Lanningham-Foster et al. 2005). For example, for a person with a
resting EE of 70 kcal/hour, replacing two hours of SB with two hours of LPA could burn an extra
140 kcal/day (([2.5 METs * 2 hours] - [1.5 METs * 2 hours]) * 70 kcal/MET-hour), which is more than
the 105 kcal required to walk at a moderate intensity for 30 minutes (3 METs * 0.5 hours = 1.5
MET-hours * 70 kcal/MET-hour). In fact, if this person maintained a constant energy balance but
wanted to lose weight, replacing two hours of SB with LPA (and holding all other factors constant)
would result in losing one pound of body weight every 25 days ([3,500 kcal/lb] / [140 kcal/day]),
or nearly 15 pounds in a year. In a laboratory-based study of 20 adults, Swartz et al. (Swartz,
Squires et al. 2011) put this theory into action, measuring EE while having participants complete
four activity protocols. Each protocol lasted for 30 minutes; all four bouts started with SB, and
then the participant either continued to sit or broke their SB with a one-, two-, or five-minute walk
at a self-selected pace. After extrapolating the results to the standard eight-hour workday,
participants would burn 132 kcal/day more by taking five-minute walking breaks every 30 minutes
(total of 80 minutes of walking) than by sitting for the entire eight hours. In summary, SB can
have important implications for total energy balance and maintenance or attainment of a healthy
body weight.
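The energy balance arithmetic in this example can be reproduced with a few lines of code. This is a worked sketch only; the MET values, resting EE, and 3,500 kcal/lb conversion are those used in the example above, and the variable names are hypothetical.

```python
RESTING_KCAL_PER_MET_HOUR = 70.0  # resting EE of the example person
SB_METS = 1.5                     # MET value assigned to SB in the example
LPA_METS = 2.5                    # MET value assigned to LPA in the example
KCAL_PER_POUND = 3500.0           # energy equivalent of one pound of body weight


def kcal(mets, hours):
    """Energy cost of an activity: METs * hours * resting kcal per MET-hour."""
    return mets * hours * RESTING_KCAL_PER_MET_HOUR


# Replacing two hours of SB with two hours of LPA.
extra_kcal_per_day = kcal(LPA_METS, 2) - kcal(SB_METS, 2)  # 350 - 210 = 140

# Energy cost of a 30-minute walk at 3 METs, for comparison.
walk_kcal = kcal(3.0, 0.5)  # 105

# Time needed for the daily difference to add up to one pound.
days_per_pound = KCAL_PER_POUND / extra_kcal_per_day  # 25
pounds_per_year = 365 / days_per_pound                # 14.6
```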
In addition to SB resulting in lower EE, high amounts of SB and prolonged SB have been
shown to negatively affect metabolic and cardiovascular health in laboratory-based studies.
Studies in mice and rats have introduced forced SB by immobilizing the animals’ hind limbs, and
these studies show that in as little as a few hours, the muscular unloading caused by prolonged SB
can result in reduced insulin sensitivity (Seider, Nicholson et al. 1982; Hamilton, Hamilton et al.
2007), poor glucose transport (Ploug, Ohkuwa et al. 1995), and suppression of muscle lipoprotein
lipase (Bey and Hamilton 2003; Zderic and Hamilton 2006). Additionally, bed rest studies in
humans reveal major negative changes in insulin sensitivity (sometimes reducing sensitivity by
40% or more) (Stuart, Shangraw et al. 1988; Mikines, Richter et al. 1991; Smorawinski, Kubala et
al. 1996; Bergouignan, Rudwill et al. 2011) and high-density lipoprotein cholesterol levels
(Yanagibori, Suzuki et al. 1997; Yanagibori, Kondo et al. 1998), as well as increased risk of blood clots
(Bird 1972; Kierkegaard, Norgren et al. 1987), within the first day spent in bed. Similarly,
impaired insulin action (Tobin, Uchakin et al. 2002) and blood pressure responses (Hargens and
Richardson 2009) have been observed with spaceflights and simulated microgravity. All three of
these research avenues suggest that SB and a lack of breaks in SB carry
negative health consequences; however, results from animal studies cannot be directly applied to
humans, and bed rest and spaceflight studies represent an extreme situation to which humans are
rarely exposed, limiting their generalizability to typical, free-living SB. Importantly, standing,
which is often considered a sedentary activity, does not fit the definition of a sedentary behavior
because it is not a supine or seated posture, even though it does elicit an energy cost of less than
1.5 METs (Ainsworth, Haskell et al. 2011). Moreover, standing requires significant and prolonged
contraction of major muscle groups in the legs, and this does not fit the proposed mechanism for
many of the negative physiologic effects seen with prolonged sitting or lying (Hamilton, Hamilton
et al. 2004; Hamilton, Hamilton et al. 2007). Therefore, standing likely does not affect health in
the same way as SB and must be assessed as a separate construct when identifying the health risks
of SB.
To support the evidence from these laboratory-based and spaceflight studies, data from
several large epidemiologic studies have been used to assess links between SB and health
outcomes, both longitudinally and cross-sectionally. Cross-sectional studies have added
considerably to our knowledge of SB and its relationship to health. Healy and colleagues have
published several studies assessing the associations between SB and cardiometabolic health
(Healy, Dunstan et al. 2007; Healy, Dunstan et al. 2008; Healy, Dunstan et al. 2008; Healy,
Wijndaele et al. 2008; Wijndaele, Healy et al. 2010; Healy, Matthews et al. 2011). Using
accelerometer-derived SB (≤100 counts/min using the ActiGraph accelerometer), they found that
US adults in the highest quartile of SB had several adverse cardiometabolic biomarkers, including
32% higher insulin and 12% higher C-reactive protein levels, a 5% drop in high-density
lipoprotein, and a 1.6 cm larger average waist circumference when compared to adults in the
lowest SB quartiles (NHANES data) (Healy, Matthews et al. 2011). Similarly, in a subsample of
participants enrolled in the Australian Diabetes, Obesity and Lifestyle Study, a 30-min decrease
in SB was associated with a 7% lower waist circumference, and a similar drop in clustered
metabolic risk score (Healy, Dunstan et al. 2007; Healy, Wijndaele et al. 2008). Additionally, in
several different samples, Healy et al. have found that adults in the highest quartile for rates of
breaking up SB with short periods of non-sedentary activity tend to have better metabolic health as
well as a 5% lower waist circumference than adults in the lowest quartile (Healy, Dunstan et al.
2008). Healy et al. also found an inverse dose-response relationship of SB breaks with BMI and
plasma glucose, independent of total PA or SB (Healy, Dunstan et al. 2007; Healy, Dunstan et al.
2008; Healy, Matthews et al. 2011). These cross-sectional studies provide strong evidence of
associations between SB and several health indices, but from these alone we cannot establish cause
and effect.
Longitudinal evidence has also shown some support for the link between SB and many
health conditions, although the evidence is less conclusive than in cross-sectional work. A 2011
review by Thorp et al. (Thorp, Owen et al. 2011) provides good insight into the state of the
longitudinal evidence concerning the link between SB and health outcomes. Of the 48 studies
included, 45 used self-report measures (TV watching and/or total sitting time), one used HR, one
used both HR and self-report, and only one used accelerometry to measure PA and SB. Thorp’s
review showed consistent evidence of an association between high levels of SB and risk of
cardiovascular disease, all-cause mortality, and obesity. In many of the studies, the authors
statistically controlled for BMI and time spent in MVPA, but few accounted for variables such as
education or socioeconomic status. In two studies included in the review, those in the highest SB
category had 54-130% increased risk of cardiovascular disease and 52-54% increased risk of all-cause mortality in 4- and 12-year follow-ups (Katzmarzyk, Church et al. 2009; Stamatakis, Hamer
et al. 2011). Similarly, two other studies (6.6- and 10-year follow-ups) showed a dose-response relationship,
with each hour of extra television watched per day increasing risk of cardiovascular disease by 7-18% and all-cause mortality by 4-11%. In relation to obesity risk, several studies showed that
high SB in childhood was related to a 22-42% increased risk of obesity in early adulthood (Boone,
Gordon-Larsen et al. 2007; Erik Landhuis, Poulton et al. 2008).

Thorp’s review also shows some evidence of an association between SB and risks of
developing diabetes and certain types of cancer. For example, two studies found dose-response
relationships between SB and risk of developing diabetes, with the highest SB group having a 61-187% increased risk of developing diabetes in 8-10 year follow-ups (Hu, Leitzmann et al. 2001;
Ford, Schulze et al. 2010); in another study, each two-hour increase in SB was associated with a 7-14% increase in diabetes risk during a 6-year follow-up (Hu, Li et al. 2003). However, in the two
studies using objectively measured SB (HR and accelerometry), there were conflicting results
regarding the relationship between SB and insulin resistance (Ekelund, Brage et al. 2009;
Helmerhorst, Wijndaele et al. 2009), and some of these studies also found that controlling for other
factors such as PA moderated the associations. Similarly, in relation to cancer risk, two studies
(with 9- and 10-year follow-ups) found that those with high SB had a 55% increased risk of
developing ovarian cancer in females and a 61% increased risk of developing colon cancer in
males (but not females) (Patel, Rodriguez et al. 2006; Howard, Freedman et al. 2008), although
findings from other studies and other types of cancer have been mixed (Howard, Freedman et al.
2008; Gierach, Chang et al. 2009). These findings are intriguing but far from conclusive,
warranting more research examining SB in relation to these outcome variables.
Moreover, in eight of the studies, PA appeared to mediate the effects of SB on health
outcomes, casting some doubt on the robustness of SB as a risk factor independent of PA. In one
such study, Katzmarzyk examined self-reported time spent standing and mortality and found an
inverse dose-response relationship between standing time and both mortality and cardiovascular
disease, but only among those with low PA levels (Katzmarzyk 2014). Yet, the considerable
variation in self-report instruments used and the paucity of research using objective measures of
PA or SB severely limit our understanding of the true risk of SB on health or what levels of SB
are appropriate for maintaining or enhancing health. Previous evidence indicates that
accelerometers yield higher quality data and stronger associations with health outcomes than self-report (Reilly, Penpraze et al. 2008; Celis-Morales, Perez-Bravo et al. 2012), and a recent
review by Atkin et al. (Atkin, Gorely et al. 2012) found that most self-report measures of
SB have poor validity. Therefore, it is likely that objectively-measured PA and SB will yield
stronger and more consistent associations of SB with health and greatly enhance our understanding
of the ways in which these behaviors influence health.
In conclusion, experimental studies have shown that prolonged SB has negative effects on
metabolic variables that contribute to long-term disease risk. Also, there is evidence from cross-sectional and longitudinal studies showing that TV watching and overall SB have a strong and
consistent association with risk of several chronic diseases, although results were based on poor
measures of PA and SB. However, to continue to determine the specific effects and true risk of SB
on health, discover patterns of PA and SB associated with increased disease risk, and develop
national recommendations for SB to improve health, methods for objective measurement of SB
need to be utilized and refined for use in observational and intervention research.
The next section of this literature review focuses on the progression of methods that have
been used for measurement of PA and SB, limitations of the current methods, and gaps in the
literature that are addressed with the current study.

22

Accelerometry as a preferred method to measure physical activity, energy expenditure,
sedentary behavior, and activity type
Measurement methods
Many methods have been developed and used for measuring EE, PA, and SB. For
smaller, laboratory-based studies, methods such as direct or indirect calorimetry can be used to
obtain very accurate measurements of EE, and direct observation (DO) can be used to accurately
record the time and type of PA being performed. However, calorimetry and DO are impractical
for use in public health, surveillance, and epidemiologic research because these types of studies
involve measurement of a large number of participants outside of the controlled laboratory
environment.
In large-scale studies, self-report measures such as questionnaires, diaries, and interviews
are often used to measure EE, PA, and SB. Self-report measures are relatively inexpensive, can
yield estimates of EE, and can provide information about the timing, frequency, and types of PA
and SB performed (Sallis and Saelens 2000). However, self-report is vulnerable to recall bias
and substantial reporting error (LaPorte, Montoye et al. 1985; Sallis and Saelens 2000; Shephard
2003; van Poppel, Chinapaw et al. 2010). Measurement errors associated with self-report reduce
or attenuate associations between PA or SB and disease (Frost and White 2005; Lagerros and
Lagiou 2007); as a result, statistical power decreases when trying to detect significant
relationships between self-reported measures of EE, PA, or SB and health outcomes, and the risk
of type II error increases (Beaton, Milner et al. 1979; MacMahon, Peto et al. 1990).
Additionally, measurement error reduces researchers’ ability to obtain valid measurements of
EE, PA, and SB and hinders efforts to detect meaningful changes in these variables that may
occur as the result of lifestyle interventions (Dale, Welk et al. 2002; Healy, Clark et al. 2011;
Matthews, Moore et al. 2012).
Self-report has been used to measure PA and SB with varying levels of success.
Generally, both PA and SB can be measured with only low-moderate validity (van Poppel,
Chinapaw et al. 2010; Healy, Clark et al. 2011; Lyden 2012), although MVPA can be assessed
with higher validity than SB (Matthews, Moore et al. 2012). It is not surprising that self-report is
more successful for measuring PA (especially MVPA) than SB. In adults, MVPA usually occurs
during structured or planned activities and can be recalled with better accuracy than SB, which is
typically more intermittent in nature and is, therefore, more difficult to recall (Healy, Clark et al.
2011). In addition, few self-report tools are properly designed for measurement of SB. SB has
traditionally been assessed using proxy measures such as time spent watching TV, driving, using
a computer, work-based sitting time, and/or total screen time. In a recent review of the literature,
Healy et al. (Healy, Clark et al. 2011) found that most studies support that specific sedentary
activities can be recalled with acceptable reliability and validity (intraclass correlation > 0.50 and
Pearson/Spearman correlations >0.40). However, self-report of total SB generally has lower
validity (Pearson/Spearman correlations < 0.40) when compared to accelerometer-derived SB in
adults (Hagstromer, Oja et al. 2006; Healy, Clark et al. 2011).
Similarly, it appears that breaks in SB cannot be accurately assessed using self-report. In
2011, Clark et al. (Clark, Thorp et al. 2011) found that 121 adult office workers recalled total SB
with moderate validity (r=0.39) but had poor validity for recalling breaks in SB (r=0.26) during
the work day. Moreover, most self-report measures contain few questions about sedentary
activities or total SB and no questions about breaks in SB, making measurement of SB
impossible using many current self-report tools (Healy, Clark et al. 2011).
Limitations of self-report methods have led researchers to use pedometers, heart rate
(HR) monitors, and accelerometers for objective measurement of EE, PA, and SB. Of these
methods, pedometers can only measure steps taken and, therefore, provide no information on SB,
activity intensity, or activity duration (Tudor-Locke and Myers 2001). HR monitors provide a
good estimation of moderate-to-vigorous PA (MVPA), but optimal accuracy is dependent on
developing individualized curves that match HR to EE values, which can vary considerably
among people of different ages and cardiorespiratory fitness levels (Janz 2002). Additionally,
HR monitors have limited utility for measuring light-intensity PA or SB because lower-intensity
activities tend to elicit high HR variability (Spurr, Prentice et al. 1988). Furthermore, HR is
influenced by a number of external factors such as stress, caffeine intake, and temperature, which
affect HR during SB and LPA much more than MVPA (Montoye, Kemper et al. 1996; Crouter,
Albright et al. 2004). Finally, HR monitors can be cumbersome to wear, which may lower
compliance rates compared to accelerometers or pedometers (Janz 2002; Andre and Wolf 2007).
Accelerometers have become the preferred device for measuring EE, PA, and SB due to
their objectivity, minimal participant and researcher burden, and ability to measure free-living
activity for several weeks at a time. Accelerometers work by recording accelerations of a single
part of the body and using this information to predict EE or activity type. Traditionally, these
accelerations were passed through a filter to remove aberrant signals and then translated into
‘activity counts’ corresponding to the magnitude of the acceleration.
In most studies, accelerometers have been worn on the hip to record vertical accelerations
of the trunk; these vertical accelerations were found to correlate well with EE for ambulatory
activities, such as walking and running (Montoye, Washburn et al. 1983; Freedson, Melanson et
al. 1998). However, hip-mounted accelerometers have limited accuracy for measuring EE for
SB and many lifestyle activities (e.g., household chores, gardening, climbing/descending stairs,
and cycling) and, when using common linear-regression or cut-point methods, cannot classify
activity type (Hendelman, Miller et al. 2000; Crouter, Churilla et al. 2006; Rothney, Schaefer et
al. 2008; Lyden, Kozey et al. 2011). Recently, accelerometer battery and memory capacity have
improved to allow measurement of three-dimensional, raw acceleration data for as long as
several months at a time (Westerterp 1999). Following these technological improvements, data
processing methods such as machine learning have emerged as superior methods for analyzing
accelerometer data.
The following sections will review the progression of accelerometer data processing
techniques leading up to the present time, address current limitations in data processing, and
discuss how the current study will improve EE, PA, and SB measurement.
The Large-Scale Integrated monitor and Caltrac
In 1979, Laporte et al. (LaPorte, Kuller et al. 1979) developed the Large-Scale Integrated
(LSI) motor activity monitor for the measurement of EE. The LSI was slightly larger than a
wristwatch and contained a ball of mercury housed in a small cylinder. When the LSI was
moved, the mercury would roll down the cylinder and run into a mercury switch. The number of
times the switch was contacted was displayed on a small screen. In this way, the LSI functioned
like a pedometer but was intended for use on the hip and other parts of the body (i.e., ankle,
wrist). To assess the LSI’s ability to measure EE, Laporte et al. designed a series of experiments
where they had participants log their activity for two days while wearing the LSI on the hip and
ankle. Activities in the activity logs were looked up in previously developed EE tables (that
reported average EE required for each activity) to obtain a measure of total EE, and these EE
values were correlated with the output from the LSI monitors. While both the hip- and ankle-mounted monitors had positive correlations with EE, the hip monitor performed significantly
better (r=0.69) than the ankle monitor (r=0.43). This study provided a first step in validating
activity monitors, but it used a poor criterion measure by estimating EE from tables instead of
directly measuring EE and did not compare this monitor to other activity measures in use at the
time (e.g., pedometer).
In 1981, Wong et al. (Wong, Webster et al. 1981) developed an accelerometer that was
later commercially produced as the Caltrac (Hemokinetics, Inc., Madison, WI). The Caltrac had
a piezoelectric sensor which recorded accelerations based on the output charge generated with a
movement, with faster accelerations producing a greater charge. This method provided a
significant advantage over the pedometer, which could only record steps and could not
differentiate different speeds of movement. The Caltrac was worn on the hip or lower back and
recorded total vertical accelerations accrued, allowing it to measure total EE over the time period
it was worn. In two different laboratory experiments, Montoye et al. showed that the Caltrac had
higher correlations with measured EE than other activity monitors. In the first experiment
(Wong, Webster et al. 1981), 15 participants performed walking (at 2, 3, and 4 mph), running (at
6 and 8 mph), and stepping (80, 120, and 160 steps/min) for three minutes each. During the
testing the participants wore the Caltrac, two different pedometers, and a metabolic analyzer;
they found that the Caltrac had significantly higher correlations with measured EE than either of
the pedometers (data displayed in a figure, but no exact correlation coefficients given). Next,
they conducted a second study (Montoye, Washburn et al. 1983) where 21 adults performed level
and inclined walking (2 and 4 mph at 0, 6, and 12% grades), level and inclined running (6 mph at
0 and 6% grades), stepping (20 and 35 steps/min), knee-bends (28 and 48 bends/min), and floor
touches (24 and 36 touches/min) for four minutes each. During the activities, participants wore
the Caltrac on the hip, two LSI activity monitors (worn on hip and wrist), and a metabolic
analyzer. Similar to the previous study, the Caltrac had significantly higher correlations (r=0.79
vs. r=0.71 and r=0.40) and lower standard errors (S=6.63 vs. S=7.86 and S=9.16 ml/kg/min) for
EE measurement than the hip- and wrist-worn LSI monitors.
These studies provided the first evidence of the utility of accelerometers for EE
measurement over pedometers or other kinds of activity monitors. Additionally, the comparison
of the hip-mounted LSI to the wrist- and ankle-mounted LSIs provided preliminary evidence that
the hip placement for activity monitors was preferable to limb placement when EE was the
outcome variable of interest. However, use of these early monitors was restricted to measuring
total EE and could not yield information about activity type, duration, or intensity.
Linear regression
For almost 15 years, the Caltrac was the most commonly used accelerometer for EE
measurement in both adults and children (Sallis, Buono et al. 1990; Haymes and Byrnes 1993).
Then, in the mid-1990s, newer accelerometers such as the Tritrac and the Computer Science
Applications (CSA, also called the ActiGraph 5032) were developed, tested, and validated for
measuring EE in a number of different studies (Janz, Witt et al. 1995; Melanson and Freedson
1995; Welk and Corbin 1995).
In 1998, accelerometer data processing took a large step forward with a study by
Freedson et al. (Freedson, Melanson et al. 1998) which was the first to use accelerometer data to
measure PA intensity as well as EE using the uniaxial CSA 7164 accelerometer (a newer version
of the CSA 5032). Their study was a laboratory-based validation study where 50 adult
participants walked (3.0 and 4.0 mph) and jogged (6.0 mph) on a treadmill, for six minutes each,
while wearing a metabolic analyzer and a CSA on the right hip. Accelerometer counts and EE
data were collected at one-minute intervals, allowing minute-by-minute comparisons of counts
and EE. Accelerometer counts were found to have high correlations (r=0.88) with EE, allowing
a linear regression model to be created to predict EE (in METs) from accelerometer counts.
Furthermore, activity intensity could be derived from METs by establishing count thresholds
(cut-points) to classify PA into light (<3.0 METs, <1952 counts/min), moderate (3.0-5.9 METs,
1952-5724 counts/min), hard (6.0-8.9 METs, 5725-9498 counts/min), and very hard (≥9.0
METs, ≥9499 counts/min) intensities.
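The cut-point approach described above can be sketched as a simple lookup. This is a minimal illustration using only the count thresholds reported in this section; the function and variable names are hypothetical and are not taken from Freedson et al.

```python
# Upper bounds (counts/min, exclusive) for each intensity category, taken from
# the thresholds reported above for the hip-worn CSA 7164.
FREEDSON_CUT_POINTS = [
    (1952, "light"),      # <1952 counts/min, <3.0 METs
    (5725, "moderate"),   # 1952-5724 counts/min, 3.0-5.9 METs
    (9499, "hard"),       # 5725-9498 counts/min, 6.0-8.9 METs
]


def classify_intensity(counts_per_min):
    """Return the intensity category for one minute of activity counts."""
    if counts_per_min < 0:
        raise ValueError("counts/min cannot be negative")
    for upper_bound, label in FREEDSON_CUT_POINTS:
        if counts_per_min < upper_bound:
            return label
    return "very hard"    # >=9499 counts/min, >=9.0 METs
```

Because the category depends only on counts/min, any activity that produces few hip accelerations is labeled as light regardless of its true energy cost.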
Since Freedson et al. published their study, the development of cut-points to classify
activity intensity has been the preeminent method for validating and using accelerometers. Cut-point development is relatively simple for researchers to accomplish and understand, and the cut-point approach seems to work relatively well for measurement of ambulatory activities
(Freedson, Melanson et al. 1998; Lyden, Kozey et al. 2011). However, a significant limitation of
a linear regression equation developed using ambulatory activities is that it does not predict EE
well when non-ambulatory activities are performed. Hendelman et al. (Hendelman, Miller et al.
2000) designed a free-living simulation that involved four self-selected speeds of walking
(ambulatory activities), household chores (washing windows, dusting, vacuuming, lawn mowing,
and planting shrubs), and two holes of golf. During the session, participants wore the CSA on
the hip and had EE measured with a portable metabolic analyzer. Two regression equations were
then developed, one for the walking activities (the “calibration” regression equation) and one for
all activities. Similar to Freedson’s equation, Hendelman’s calibration regression equation
performed significantly better for predicting EE during the walking activities (r=0.77) than for all
activities (r=0.59), and underestimated EE by 30.5-56.8% in the free-living simulation.
From Hendelman’s study, it is apparent that linear regression models developed using
ambulatory activities perform much better when measuring EE during ambulatory activities than
for non-ambulatory activities. To support this finding, the regression equation and cut-points
developed by Freedson et al. (Freedson, Melanson et al. 1998) have been studied by Crouter et
al. (Crouter, Churilla et al. 2006), Lyden et al. (Lyden, Kozey et al. 2011), and Rothney et al.
(Rothney, Schaefer et al. 2008); these studies support Hendelman’s conclusion that Freedson’s
regression equation significantly underestimates EE when applied to non-ambulatory activities.
In order to improve on the shortcomings of EE regression equations developed using only
ambulatory activities, researchers began developing equations using both ambulatory and non-ambulatory activities. In 2000, Swartz et al. (Swartz, Strath et al. 2000) developed a linear
regression equation using 28 activities, of which only two were ambulatory activities (walking at
2.9 and 3.7 mph) and the remaining 26 were non-ambulatory, lifestyle activities (e.g., sports such
as tennis and softball, household chores such as cooking and laundry). Their regression equation
yielded only moderate validity (r=0.56) for EE measurement, but the studies by Crouter et al.
(Crouter, Churilla et al. 2006), Lyden et al. (Lyden, Kozey et al. 2011), and Rothney et al.
(Rothney, Schaefer et al. 2008) confirmed that Swartz’s regression model had better validity for
measuring MVPA in free-living settings than Freedson’s.
One of the big differences between the cut-points developed by Freedson and those
developed by Swartz is that to overcome the underestimation of lifestyle activities, Swartz had a
much lower cut-point for MVPA than Freedson (574 counts/min from Swartz’s equation vs.
1952 counts/min from Freedson’s equation). However, because of the lower MVPA cut-point,
the regression line had a much flatter slope so that the EE of more intense activities would not be
overestimated. Thus, Swartz’s equation had a y-intercept at 2.606, meaning that the predicted
MET value for an activity registering 0 counts/min was 2.606 METs. This is a substantial error
given that activities likely to record 0 counts/min (such as lying or sitting) elicit EE values of 1.0
METs (Ainsworth, Haskell et al. 2011). Thus, Swartz et al. improved the measurement of
MVPA at the expense of measuring SB and LPA.
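The trade-off can be made concrete with a short calculation. The sketch below uses only the two values reported above (the 2.606 MET intercept and the 574 counts/min MVPA cut-point); the slope is not stated in this section, so it is back-calculated under the assumption that the line reaches 3.0 METs at exactly 574 counts/min. All names are hypothetical.

```python
INTERCEPT_METS = 2.606        # reported y-intercept of the regression line
MVPA_CUT_POINT = 574          # counts/min at which the line is assumed to reach 3.0 METs
MVPA_THRESHOLD_METS = 3.0

# Assumption: slope derived from the two reported values, not quoted from the source.
SLOPE = (MVPA_THRESHOLD_METS - INTERCEPT_METS) / MVPA_CUT_POINT


def predict_mets(counts_per_min):
    """Predict METs from counts/min with the single linear equation."""
    return INTERCEPT_METS + SLOPE * counts_per_min


def overestimate_at_rest(true_mets=1.0):
    """Prediction error (METs) for an activity that registers 0 counts/min."""
    return predict_mets(0) - true_mets
```

Under these assumptions, a minute of lying or sitting at 0 counts/min is predicted at 2.606 METs, about 1.6 METs above the 1.0 METs such activities actually elicit.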
In summary, the available evidence suggests that no simple linear regression model
successfully classifies all activity intensities or accurately predicts EE across a variety of
activities. As evidenced by Swartz’s study, improving measurement of certain PA intensities hurts
measurement of the other intensities. Although the linear regression model is simple to
understand and use, more complicated methods of accelerometer data processing are necessary
to improve measurement of EE and activity intensity and classify activity type.
Multiple regression
Once it became apparent that single linear regression models could not adequately
measure EE or activity intensity across a range of activity types, researchers moved toward
creating more complex, multiple regression equations in an attempt to improve activity
measurement. Heil (Heil 2006) was the first to experiment with a model where EE was
predicted using one of two independent, linear regression models that had different slopes.
Model 1 was used for activities eliciting 350-1,200 counts/min, and model 2 was used for
activities eliciting >1,200 counts/min. Model 1 had a steeper slope than model 2, and this steeper
slope helped predict EE of non-ambulatory activities (which were often underestimated by single
regression equations) more accurately while reducing the overestimation of light-intensity
activities. By utilizing this approach, Heil was able to significantly improve estimates of EE
over single regression models (r=0.84 for single regression model, r=0.87 for model 1 of the two-regression model, and r=0.92 for model 2 of the two-regression model) when predicting EE
across 10 activities (7 lifestyle, 3 ambulatory). Additionally, Heil was among the first to set a
threshold for SB; in his model, any activity eliciting <350 counts/min was assigned an EE value
of 1.0 MET instead of being input into a prediction equation. This sedentary threshold was
implemented to alleviate the significant overestimation of EE present in single regression
models.
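The structure of a count-based two-regression model with a sedentary threshold can be sketched as follows. Only the 350 and 1,200 counts/min thresholds and the 1.0 MET sedentary value come from the description above; the regression coefficients are not reported in this section, so the example coefficients below are placeholders chosen purely for illustration and are not Heil's published values.

```python
SEDENTARY_THRESHOLD = 350    # counts/min; below this, EE is fixed rather than predicted
MODEL_SWITCH_POINT = 1200    # counts/min; model 1 applies up to this value, model 2 above it
SEDENTARY_METS = 1.0


def predict_ee(counts_per_min, model_1, model_2):
    """Predict EE (METs) from counts/min.

    model_1 and model_2 are (intercept, slope) pairs for the two linear equations.
    """
    if counts_per_min < SEDENTARY_THRESHOLD:
        return SEDENTARY_METS
    if counts_per_min <= MODEL_SWITCH_POINT:
        intercept, slope = model_1
    else:
        intercept, slope = model_2
    return intercept + slope * counts_per_min


# Hypothetical coefficients for illustration only. Model 1 has the steeper slope,
# matching the description above.
EXAMPLE_MODEL_1 = (1.0, 0.002)
EXAMPLE_MODEL_2 = (2.8, 0.0005)
```

For example, `predict_ee(200, EXAMPLE_MODEL_1, EXAMPLE_MODEL_2)` bypasses both equations and returns 1.0, which is how the sedentary threshold avoids the overestimation produced by a non-zero intercept.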
Despite the improvements in EE measurement seen with Heil’s two regression model, the
model had limited use because it was fully dependent on accelerometer counts and cut-points to
estimate EE. Sole reliance on counts and cut-points for determination of EE is a problem
because some activities with different EE requirements yield similar numbers of counts when
measured with a hip-mounted accelerometer. For example, Lyden et al. (Lyden, Kozey et al.
2011) found that activities can elicit very different counts/min (e.g., 3,245 counts/min for
descending stairs vs. 203 counts/min for raking) while having very similar EE requirements (5.0
METs for descending stairs vs. 5.2 METs for raking). Thus, using Freedson’s regression
equation (Freedson, Melanson et al. 1998), descending stairs would be correctly classified as
moderate-intensity PA, while raking would be incorrectly classified as SB or light-intensity PA.
Similarly, Hendelman et al. (Hendelman, Miller et al. 2000) found that some activities can elicit
similar counts/min (e.g., 1,982 counts/min for walking 2.0 mph vs. 2,144 counts/min for golfing)
but elicit very different EE requirements (2.0 METs for walking 2.0 mph vs. 4.3 METs for
golfing). In this example, walking and golfing would both be classified as moderate-intensity
PA based on counts, even though walking at 2.0 mph is actually LPA. As one final example,
both lying and standing quietly elicit close to 0 counts/min, so EE prediction would be the same
for each (1.0 METs). However, lying requires an EE of 1.0 METs, while standing requires 1.3
METs (Ainsworth, Haskell et al. 2011); while the absolute difference is only 0.3 METs,
misclassifying standing as lying is a 30% error in EE. This error is especially significant
considering that adults in the US spend about 60% of their day engaged in SB (Matthews, Chen
et al. 2008), highlighting the need to be able to detect small differences in EE that exist among
different types of SB and LPA.
In contrast to Heil’s method of choosing the regression line based on counts/min, Crouter
et al. (Crouter, Clowers et al. 2006) developed a two regression model where the regression line
used was dependent on the variability of the activity being performed. They discovered that
variability in counts for ambulatory activities is lower than the variability of non-ambulatory,
lifestyle activities (which tend to be more intermittent in nature). In order to determine
variability, they parsed the one-minute data into six 10-second segments and calculated the
coefficient of variation (CV) for the minute. Then they developed a two regression model where
activities with a CV of ≤10 were analyzed using an exponential regression curve developed for
ambulatory activities, and those with a CV of >10 were analyzed with a cubic regression curve
developed for non-ambulatory activities. They chose exponential and cubic curves because these
fit their data better than linear regression lines. When Crouter et al. tested their model in 48
participants performing 17 activities (4 ambulatory, 13 lifestyle), their model showed greatly
improved accuracy for measuring METs (r=0.96) compared to Freedson’s, Hendelman’s, and
Swartz’s (Swartz, Strath et al. 2000) linear regression models, where the highest correlation was
r=0.70. A subsequent study by Crouter et al. (Crouter and Bassett 2008) produced a two
regression model for the Actical accelerometer, and they found similar improvements in EE
compared to single, linear regression.
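The variability-based selection step can be sketched as below. The six 10-second segments and the CV threshold of 10 follow the description above; the use of the sample standard deviation, the handling of a motionless minute, and all names are assumptions. The exponential and cubic curves are passed in as functions because their coefficients are not given in this section.

```python
import statistics

CV_THRESHOLD = 10.0   # percent; at or below this, the ambulatory curve is used


def coefficient_of_variation(ten_second_counts):
    """CV (%) of the six 10-second count values that make up one minute."""
    if len(ten_second_counts) != 6:
        raise ValueError("expected six 10-second segments")
    mean = statistics.mean(ten_second_counts)
    if mean == 0:
        return 0.0    # assumption: a motionless minute has zero variability
    # Assumption: sample standard deviation (n - 1 denominator).
    return 100.0 * statistics.stdev(ten_second_counts) / mean


def predict_mets(ten_second_counts, ambulatory_curve, lifestyle_curve):
    """Choose a regression curve from the minute's CV, then apply it to counts/min."""
    counts_per_min = sum(ten_second_counts)
    cv = coefficient_of_variation(ten_second_counts)
    curve = ambulatory_curve if cv <= CV_THRESHOLD else lifestyle_curve
    return curve(counts_per_min)
```

A steady minute such as `[480, 500, 520, 510, 490, 500]` has a CV of about 2.8% and is routed to the ambulatory curve, whereas an intermittent minute with the same total counts is routed to the lifestyle curve.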
Crouter’s model was a significant innovation for two reasons: 1) it was the first to use
characteristics of accelerometer output (CV) other than counts for EE measurement and 2) it was
the first to utilize a non-linear regression model, which is less restrictive than a linear regression
line since it allows more freedom in fitting a relationship between EE and accelerometer output
across both ambulatory and lifestyle activities. However, subsequent evidence suggests that
Crouter’s model may suffer from over-fitting, where the extra freedom of the non-linear model
allowed for construction of a more accurate model for a specific population (better internal
validity) but a less accurate model when applied to other populations, data sets, or sets of
activities (poorer generalizability). To demonstrate this point, Lyden et al. (Lyden, Kozey et al.
2011) tested Crouter’s model against the models of Freedson, Hendelman, and Swartz in a large
(n=277), independent sample performing 23 activities (6 ambulatory, 17 lifestyle). While
Crouter’s model performed best for lifestyle activities, both Freedson’s and Swartz’s linear
models performed better for ambulatory activities. Therefore, while Crouter’s models did not
solve the problem of accurately measuring both ambulatory and lifestyle activities, they showed
the utility of using accelerometer features other than counts/min to distinguish different kinds of
activities and improve measurement of EE.
Measurement of sedentary behavior using accelerometers
As mentioned previously, self-report has been largely inadequate for measuring SB,
leading researchers to use objective measures for SB measurement. Accelerometers are
seemingly ideal for the measurement of SB because they can capture both movement and
non-movement and, therefore, should be able to measure total SB as well as detect breaks in SB.
Using NHANES data to determine population levels of SB, Matthews et al. used a count cut-point of <100 counts/min to determine SB and found that the average adult spends about 7.7
hrs/day in SB (Matthews, Chen et al. 2008). Since its first use, the 100 counts/min cut-point has
been used by Healy et al. (Healy, Dunstan et al. 2007; Healy, Wijndaele et al. 2008; Healy,
Matthews et al. 2011), who have found consistent associations between objectively-measured SB
and poor metabolic health. However, the 100 counts/min cut-point for SB was chosen for its
utility in detecting non-movement and has not been validated for use as an accurate measure of SB
(Pate, O'Neill et al. 2008). Additionally, standing quietly elicits less than 100 counts/min, so the
cut-point approach incorrectly classifies standing as SB when, as discussed previously, it exerts
different effects on health and must be distinguished from SB.
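The cut-point approach described above can be illustrated with a minimal sketch. The function names, the example counts, and the definition of a break (a sedentary minute immediately followed by a non-sedentary minute) are assumptions made for illustration and are not taken from the cited studies.

```python
SEDENTARY_CUTPOINT = 100  # counts/min; minutes below this value are labelled SB


def classify_minutes(counts_per_min, cutpoint=SEDENTARY_CUTPOINT):
    """Return True for each minute whose count falls below the cut-point."""
    return [count < cutpoint for count in counts_per_min]


def summarize_sb(counts_per_min, cutpoint=SEDENTARY_CUTPOINT):
    """Return (minutes labelled SB, number of breaks in SB).

    Assumption: a break is counted each time a minute labelled SB is
    immediately followed by a minute that is not labelled SB.
    """
    sedentary = classify_minutes(counts_per_min, cutpoint)
    total_sb = sum(sedentary)
    breaks = sum(
        1 for prev, cur in zip(sedentary, sedentary[1:]) if prev and not cur
    )
    return total_sb, breaks


# Hypothetical ten-minute record: three minutes of sitting, two minutes of
# quiet standing (40 and 60 counts/min), a two-minute walk, then sitting.
example_counts = [0, 12, 5, 40, 60, 850, 1200, 30, 0, 8]
sb_minutes, sb_breaks = summarize_sb(example_counts)
```

For this hypothetical record, the sketch labels eight minutes as SB and detects one break. The two minutes of quiet standing are counted as SB because they fall below the cut-point, which is the misclassification described above.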
To directly test SB cut-points, a 2011 study by Kozey-Keadle et al. (Kozey-Keadle,
Libertine et al. 2011) had 20 adult office workers wear ActiGraph accelerometers for six hours on
two work days, with the second day spent performing more PA and less SB. In this study, DO was
used as the criterion measure of SB. Using the 100 counts/min cut-point, the ActiGraph
underestimated time in SB by 4.9% and could not detect the change in SB that occurred between
the first and second days. Notably, 150 counts/min was identified as producing a more accurate
estimation of total SB (underestimated total SB by 1.8%), although it also could not detect the
change in SB. In a follow-up study, Lyden et al. tested the utility of the 100 counts/min and 150
counts/min SB cut-points by having 13 adults wear ActiGraph accelerometers for 10 hours on two
separate days, while performing less SB and more breaks in SB on the second day. The
investigators found that both the 100 and 150 counts/min cut-points led to overestimations of SB,
overestimations of breaks in SB, and inadequate detection of the reduction in SB on the second
day. Additionally, while the 100 counts/min cut-point was more accurate for total SB, the 150
counts/min cut-point was more accurate for breaks in SB. Together, these two studies indicate
three issues: 1) defining a cut-point for SB is problematic since there does not appear to be a single
one that can accurately measure total SB or breaks in SB, 2) no cut-point has been shown to
accurately detect changes in SB, and 3) a consistent over- or under-estimation may be correctable,
but the previous studies show no such consistent pattern (one showing underestimation and one
showing overestimation). Therefore, measuring SB using the cut-point approach is not sufficient
to capture the complex nature of SB, and the cut-point approach also cannot be used to identify
specific activity types or distinguish standing from SB.
Machine learning
Regression techniques and cut-points were a logical first step in EE prediction due to
their intuitive appeal and simplicity. The progression of regression equations from those
developed for the Caltrac to Crouter’s multiple regression models for the ActiGraph and Actical
has demonstrated that while newer regression models can address many of the problems of older
models, the newer models also create new limitations. Given that two activities of very different
intensities can elicit the same number of counts/min (Lyden, Kozey et al. 2011) and the fact that
counts/min does not yield enough information to classify activity type, it is critical to move away
from relying solely on accelerometer counts for activity measurement. Additionally, Crouter’s
(Crouter, Clowers et al. 2006; Crouter and Bassett 2008) method of using the CV of an activity
to differentiate ambulatory from lifestyle activities indicates that the accelerometer signal underlying
the counts/min output contains rich information, and this information may be used for improving EE
measurement and also allowing for classification of activity type.

Using average accelerometer counts/min to estimate EE was originally done as much for
practical reasons as for scientific reasons. The Caltrac worked similarly to a pedometer in that it
could only record total counts to give an overall estimation of EE or total PA level (Wong,
Webster et al. 1981). The CSA represented a vast improvement in technology in that it could
aggregate counts into one-minute increments, allowing minute-by-minute estimations of EE and
estimations of PA intensity based on the EE in a given minute. Both accelerometers were
uniaxial and could only record accelerations in the vertical plane. Further improvements in
accelerometer technology have made the monitors smaller, lighter, cheaper, and able to record
triaxial accelerations at a rate of up to 100 times per second (100 Hz) and for upwards of 45 days
at a time on a single battery charge (GENEActiv 2013). This expansion in monitor capabilities
has led to advanced methods of data managing and processing, collectively called “machine
learning,” that have been used to predict EE and classify activity type and which show great
promise for improving measurement of SB.
Machine learning is a term that describes an array of complex mathematical techniques
and algorithms that, coupled with an appropriate software package, can learn to recognize and
differentiate patterns in activities by examining certain input ‘features,’ which are summaries of
the data (e.g., mean, standard deviation, or skewness of the acceleration signal). Thus, machine
learning can be applied to accelerometer data in order to estimate EE, classify activity type,
and possibly measure SB (Preece, Goulermas et al. 2009). In order to use machine learning,
important features of the accelerometer data must be identified and extracted for use. Then,
these features can be used as inputs (or independent variables) into a machine learning algorithm,
which then provides a specific output (dependent variable), such as EE or activity classification.
Some features, such as root mean square error (RMSE) and CV, can be useful for differentiating
between static and dynamic activities (e.g., sitting vs. walking). Others, such as monitor
orientation, help differentiate different body postures (e.g., lying vs. standing). Conversely,
features such as mean, standard deviation, and entropy of the acceleration signal are useful for
distinguishing among dynamic activities and activity intensities (Preece, Goulermas et al. 2009).
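As a minimal sketch of the feature extraction step described above, the following code summarizes one window of single-axis acceleration samples by its mean, standard deviation, and CV. The function name, the window contents, and the choice to express the CV as a percentage are assumptions made for illustration; the cited studies may define and compute these features differently.

```python
import statistics


def window_features(signal):
    """Summarize one window of single-axis acceleration samples.

    Returns the mean, sample standard deviation, and coefficient of
    variation (CV, expressed here as a percentage of the mean).
    """
    mean = statistics.fmean(signal)
    sd = statistics.stdev(signal)  # sample standard deviation (n - 1)
    cv = sd / abs(mean) * 100 if mean != 0 else float("nan")
    return {"mean": mean, "sd": sd, "cv": cv}


# Hypothetical vertical-axis windows (in g): a static activity such as sitting
# and a dynamic activity such as walking.
sitting_window = [1.00, 1.01, 0.99, 1.00, 1.00]
walking_window = [0.60, 1.40, 0.80, 1.30, 0.90]

sitting_features = window_features(sitting_window)
walking_features = window_features(walking_window)
```

In these hypothetical windows the two means are nearly identical, while the standard deviation and CV are far larger for walking, which illustrates why variability features help separate static from dynamic activities.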
Many machine learning algorithms exist, but those that have been used most commonly
in PA research include artificial neural networks (ANNs), hidden Markov models, and decision
trees (Bao and Intille 2004; Pober, Staudenmayer et al. 2006; Preece, Goulermas et al. 2009;
Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011). While there is no consensus on
which machine learning technique is most accurate for PA measurement, the ANN technique has
a number of advantages over the other techniques: 1) the ability of ANNs to directly estimate
continuous and categorical variables and 2) the ability to construct ANNs using freely-available
software. First, a significant limitation of the decision tree and hidden Markov model is that they
can directly predict categorical variables, such as activity type, but cannot directly predict
continuous variables such as EE (Preece, Goulermas et al. 2009). Decision trees and hidden
Markov models can estimate EE indirectly by first classifying activity type and then predicting EE
using values from the Compendium of Physical Activities (Ainsworth, Haskell et al. 2011), but
this method is limited to predicting EE from only the activities the decision trees and hidden
Markov models were trained to classify and is subject to the same limitations for measuring EE
as when using the Compendium (i.e., different people may have different EE when performing
the same activities, EE values are averages, etc.). Second, while many machine learning
techniques must be conducted using complicated and expensive software packages (Pober,
Staudenmayer et al. 2006; Rothney, Neumann et al. 2007; Preece, Goulermas et al. 2009), a 2009
study by Staudenmayer et al. (Staudenmayer, Pober et al. 2009) implemented a relatively simple
way to use freely available software, the R statistical software, to extract features and process
accelerometer data using ANNs. Thus, their study offered a significant advancement in the field
because it was the first to make a complicated machine learning technique accessible to
researchers without extensive engineering, computer science, and/or mathematics backgrounds
or those without access to expensive statistical software packages.
In many respects, machine learning is similar to regression. First, ANNs, as well as other
machine learning techniques, work by taking a set of input variables (e.g., accelerometer counts,
raw acceleration data, monitor orientation, demographic variables) and using them to predict a
certain output (e.g., EE, activity type). Then, in order to create an ANN, the ANN must first be
calibrated or trained on a set of data where both the inputs and outputs are known. The ANN
then assigns certain weights to the input variables based on how important they are for predicting
the output (similar to coefficients in regression equations) (Preece, Goulermas et al. 2009).
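To make the role of the weights concrete, the following is a minimal sketch of how a small ANN with one hidden layer turns a set of input features into a single predicted output. The network size, the tanh activation, and all weight values are hypothetical; in practice the weights would be learned during training on data with known inputs and outputs, and the networks used in the cited studies may be structured differently.

```python
import math


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute the output of a one-hidden-layer network.

    Each hidden unit takes a weighted sum of the input features, adds a bias,
    and passes the result through tanh. The output is a weighted sum of the
    hidden unit values plus a bias.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical example: two input features and two hidden units.
example_features = [0.8, 0.3]
example_hidden_weights = [[0.5, -1.2], [1.5, 0.4]]  # one row per hidden unit
example_hidden_biases = [0.1, -0.3]
example_output_weights = [2.0, 1.0]
example_output_bias = 1.5

predicted_output = forward(
    example_features,
    example_hidden_weights,
    example_hidden_biases,
    example_output_weights,
    example_output_bias,
)
```

The weights play a role comparable to regression coefficients, but because each hidden unit applies a nonlinear function, the relationship between inputs and output is not restricted to a line or other fixed shape.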
However, ANNs are different from regression in two important ways. First, ANNs do
not assume that simple models can be fit to complex data (derived in a variety of settings and
from many different activities) (Preece, Goulermas et al. 2009). Thus, an ANN is much more
flexible than a regression model because it does not have to have some predetermined shape
(e.g., a line for a linear model or a curve for a quadratic model). Second, ANNs can take input
variables that contain much more information about an activity than minute-by-minute
accelerometer counts. For example, Staudenmayer’s model took second-by-second, uniaxial
(vertical axis) accelerometer count data and extracted the 10th, 25th, 50th, 75th, and 90th percentiles
from each minute’s data as the features to use as inputs into the ANN. By extracting these
percentiles, it is possible to derive information about the average, variance, and CV of the
accelerometer data. Thus, the model being created is using much more information from the
accelerometer, which should make it more accurate in predicting the desired outcome variables.
Using this approach, Staudenmayer et al. were able to improve EE estimates by 28-66%
compared to linear and multiple regression models. Additionally, while regression cannot be
used for activity type classification, Staudenmayer’s model correctly classified activities into
four different types (sedentary, lifestyle, ambulatory, or sport) with 88% accuracy
(Staudenmayer, Pober et al. 2009). A more detailed description of the ANN is offered in the
Methods sections of Chapters 3-5 of this dissertation.
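The percentile features described above can be sketched as follows. This is an illustration only, not Staudenmayer's implementation (which was written in R): the function names are hypothetical, and the linear-interpolation percentile rule is an assumption, since the text does not state which percentile definition was used.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile (0-100) of pre-sorted data.

    Assumption: linear interpolation between the two nearest ranks.
    """
    if not sorted_values:
        raise ValueError("percentile requires at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def minute_percentile_features(second_counts, percentiles=(10, 25, 50, 75, 90)):
    """Split second-by-second counts into whole minutes and return, for each
    minute, the requested percentiles as a list of input features.

    Any incomplete final minute is ignored.
    """
    features = []
    for start in range(0, len(second_counts) - 59, 60):
        minute = sorted(second_counts[start:start + 60])
        features.append([percentile(minute, p) for p in percentiles])
    return features


# Hypothetical record: one full minute of counts followed by 30 extra seconds.
example_seconds = list(range(60)) + [0] * 30
example_features = minute_percentile_features(example_seconds)
```

Each minute is thereby reduced to five numbers that describe the spread of the signal within the minute as well as its center, rather than to a single count total.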
Although Staudenmayer’s use of machine learning for predicting EE is a significant step
forward, newer accelerometer models offer raw data recording in three axes and also provide
information about monitor orientation. Since accelerometer counts are derived from proprietary
filtering methods by the companies that manufacture each kind of accelerometer, use of
accelerometer counts does not allow for comparability of different brands of accelerometer. The
move to raw data collection and analysis allows for comparison between accelerometer models.
Also, more useful information can be extracted from the raw accelerometer data than from
activity counts, so use of raw data will likely improve the use of ANNs for EE and SB
measurement and activity classification.
Additionally, while Staudenmayer et al. were able to classify activity into four categories,
activity measurement will be significantly enhanced with proper identification of more specific
activity types and a more thorough classification of SB (e.g., identifying sitting and standing
separately instead of grouping them as ‘sedentary’). Thus, the current study will build off of
Staudenmayer’s research by including a slightly larger number of input features and raw data in
order to further improve measurement of EE and SB and classification of more activity types.

Multiple sensor methods
Given the limitations of single, hip-mounted accelerometers for measuring the wide
variety of activities that occur in free-living settings, some researchers have used multiple
sensors to improve EE measurement and classify activity type. These efforts generally fall into
one of two categories: 1) utilizing monitors that collect acceleration data along with other
physiologic variables (e.g., HR, skin temperature) or 2) use of multiple accelerometer-based
monitors placed on different parts of the body. Both types will be discussed in the following
text.
First, the combination of accelerometry and physiologic measures has been used to
improve EE and activity intensity measurement. HR and accelerometry are both popular
methods of measuring PA intensity and EE, but both have notable limitations when used on their
own. In an effort to minimize the limitations of each method while capitalizing on their
strengths, researchers have developed regression methods which use both HR and accelerometer
counts to predict EE. Haskell et al. (Haskell, Yee et al. 1993) were the first to use a combination
of HR and movement data to try to improve EE estimation. The authors had 19 men perform
seven ambulatory and exercise activities while wearing a HR monitor, two Vitalog activity
monitors (on the wrist and thigh), and a metabolic analyzer. Overall, using both HR and body
motion significantly improved overall EE estimation compared to using HR or accelerometry
only, although much of the improvement came from using individualized HR-EE curves (as
opposed to a general curve applied to all participants). Follow-up studies have also shown
improvements in EE prediction with combined HR and accelerometer data (Moon and Butte
1996; Strath, Bassett et al. 2001; Strath, Bassett et al. 2002; Plasqui and Westerterp 2005), but
these studies also used individual calibration curves for HR, dramatically increasing researcher
burden for accurate data collection and limiting the generalizability of the regression models to
the participants from whom they were created. The Actiheart activity monitor (Phillips, Bend,
OR) attempts to reduce burden of multiple monitors by combining accelerometer and HR
monitor into one device, which is fastened to a person’s chest with a sticky pad for continuous
wear. The Actiheart tends to have good wear compliance, but men have to shave their chest to
wear the monitor, and women tend to report lower comfort with the Actiheart than with
hip-mounted monitors (Moy, Sallis et al. 2010). Thus, while using both HR and accelerometer
counts seems to improve EE measurement, the added cost and burden to researchers in creating
individual HR curves and using multiple monitors per participant prohibits the use of this method
in large studies. Additionally, participant compliance with HR monitors tends to be lower than
with self-report or accelerometer tools (Janz 2002; Andre and Wolf 2007), providing another
limitation of their use for activity measurement.
Another measurement device, the BodyMedia armband (formerly called the Sensewear
armband; BodyMedia, Inc., Pittsburgh, PA), is a single monitor (worn on the upper arm) that
records biaxial acceleration data as well as heat flux, galvanic skin response, skin temperature,
and ambient temperature. The armband uses these variables, along with self-reported gender,
age, height, and weight to predict EE through proprietary algorithms developed by BodyMedia.
The armband was first validated by Jakicic et al. (Jakicic, Marcus et al. 2004) in 2004 for
estimating EE from walking, stepping, and leg and arm ergometry in 40 adults; their results
indicate that the armband provided much better estimation of EE than a hip-mounted TriTrac
accelerometer for these four exercise activities. Further research on the armband has validated
its use for estimating exercise and free-living EE in many populations, including children
(Arvidsson, Slinde et al. 2007), younger and older adults (Welk, McClain et al. 2007;
Heiermann, Khalaj Hedayati et al. 2011), pregnant women (Berntsen, Stafne et al. 2011), and
diseased or obese individuals (Mignault, St-Onge et al. 2005; Papazoglou, Augello et al. 2006;
Dwyer, Alison et al. 2009). In studies comparing the armband to traditional accelerometers for
EE measurement, the armband frequently performs similarly to hip-mounted accelerometer
data that are analyzed with linear regression models (Jakicic, Marcus et al. 2004; Welk, McClain
et al. 2007; Berntsen, Hageberg et al. 2010; Colbert, Matthews et al. 2011), although some
research indicates reductions in error by as much as 20% using the armband (Lee, Kim et al.
2014). Therefore, it is possible that the addition of physiologic measures improves estimates of
EE. Additionally, the armband’s skin temperature and heat flux sensors help to verify the time the
monitor is actually being worn (wear time), which is an important issue in accelerometry-based
measurement (Masse, Fuemmeler et al. 2005; Evenson and Terry 2009).
Despite these advantages, the armband has some key limitations that prevent it from
being an optimal measurement tool. The armband’s primary limitation is that it estimates EE
using BodyMedia’s proprietary algorithms. While proprietary algorithms can be useful to
consumers and end users who want EE estimation or time spent in MVPA without needing to
develop their own prediction model, proprietary algorithms hinder scientific progress because
they do not allow researchers transparency as to how EE is being predicted or which input
variables are most useful for EE prediction. Without this knowledge, it becomes very difficult to
identify armband strengths and limitations or identify variables that might be used to further
improve EE measurement. Additionally, BodyMedia constantly refines its prediction algorithms
to improve EE measurement, but without knowing how the algorithms work or which variables
are most important, it is very difficult to compare results obtained using the different algorithms,
hindering generalizability or comparability of study results. Finally, the armband only provides
estimates of EE and activity intensity, and its proprietary data analysis prohibits researchers from
accessing the raw data in order to be able to use newer data processing techniques to determine
activity type.
Another multi-sensor method researchers have studied is to use multiple accelerometers
positioned on different parts of the body to improve EE measurement and activity classification.
When creating their linear regression model for measuring EE of lifestyle activities (discussed
earlier), Swartz et al. (Swartz, Strath et al. 2000) also had participants wear a CSA on the wrist to
determine if using acceleration information from the hip and wrist locations simultaneously
could improve EE measurement. Compared to the correlation of r=0.56 for the hip regression
equation, the combination of hip and wrist acceleration improved the correlation only minimally,
to r=0.59. The minimal improvement seen in this study does not seem worth the added burden
on participants (due to compliance issues) or researchers (for the added data to be analyzed).
More recently, researchers have built and tested systems of accelerometers, where each
accelerometer has a wired or wireless link to a central unit, allowing the unit to process data from
the accelerometers simultaneously. A complete comparison of the systems can be found in
Table 2.1. One example of this is the Intelligent Device for Energy Expenditure and Activity
(IDEEA; MiniSun, Fresno, CA). Produced in the early 2000s, it is a system of five
accelerometers (worn on both feet, both thighs, and the chest) that are wired to a processing unit
worn on the hip. The sensors are taped to the skin, and the wires are to be worn underneath
clothing to minimize risk of breaking. Data collected from the IDEEA monitor are processed via
proprietary algorithms and are used to predict EE, activity type, activity duration and intensity,
and activity speed (for walking and running). Validation studies have shown 98.7% accuracy for
classifying 32 activities and postures (Zhang, Werner et al. 2003) and a correlation of r=0.973 for
measuring EE during a simulated free-living setting (Zhang, Pi-Sunyer et al. 2004). Its high
accuracy for measuring both activity type and EE makes it ideal as a criterion measure in
short-term, free-living studies (Welk, McClain et al. 2007; Gyllensten and Bonomi 2011). However,
the IDEEA’s limited battery life (48 hours), excessive participant burden, fragile design,
proprietary algorithms for predicting its outcome variables, and high cost ($5,000 per unit)
prohibit its use as a measurement tool for large, free-living studies.
Since the creation of the IDEEA, another system has been developed by Tapia et al. that
uses five wireless sensors (placed on the right ankle, thigh, hip, wrist and upper arm), a heart rate
monitor, and open-source data analysis to overcome many of the shortcomings of the IDEEA
monitor. Using machine learning algorithms, Tapia et al. were able to classify 30 activities and
postures with 56.3% accuracy with only accelerometer data and 58.4% using accelerometer and
HR data (Tapia, Intille et al. 2007). While their classification accuracy was much lower than the
IDEEA system, the activities that Tapia et al. used were more similar to each other than the
activities in the IDEEA validation, and the activities Tapia’s system misclassified were often
different intensity levels of a given activity (e.g., cycling hard intensity at 30 rpm vs. cycling
moderate intensity at 30 rpm). Importantly, Tapia’s study indicates that HR may have little
value for classifying activity type, especially when using advanced processing techniques for
accelerometer data. In a similar study, Dong et al. developed a wireless system of three
accelerometers (worn on the right ankle, thigh, and wrist) for measurement of activity type and
EE. In a validation of the system, they found that activity classification accuracy for 14 activities
was 71.3-78.3% using only one accelerometer (with the thigh providing the best classification
accuracy and the wrist providing the lowest) but improved to 89.6-96.2% using two
accelerometers (with the ankle and wrist combination providing the highest classification
accuracy) and only improved slightly (to 97.0%) using all three accelerometers (Dong, Montoye
et al. 2013). The improvement in classification accuracy of this system compared to Tapia’s
may be due to using fewer activities in the validation, but it may also be due to difference in
machine learning approach and features used as input variables. Despite dramatic improvements
in activity type classification accuracy achieved when using multiple accelerometers, preliminary
analyses with the system developed by Dong et al. indicate that use of data from all three
accelerometers provides only minimal improvement over use of a single monitor for EE
measurement (Dong, Biswas et al. 2013; Montoye, Dong et al. 2013).
Overall, inclusion of physiologic variables and/or additional accelerometers appears to
improve activity type classification and possibly EE measurement, but it is unclear which
variables or monitor locations are most useful to be included. Inclusion of HR does not appear to
improve activity classification (Tapia, Intille et al. 2007). Also, using two or more
accelerometers can markedly improve accuracy of activity classification (Zhang, Werner et al.
2003; Dong, Montoye et al. 2013) but may not be as useful for EE measurement (Metcalf,
Curnow et al. 2002; Zhang, Pi-Sunyer et al. 2004; Dong, Biswas et al. 2013; Montoye, Dong et
al. 2013). However, the added burden of measuring additional variables restricts the use of these
technologies and methods to small, short-duration studies. To help ensure high compliance rates,
reduce both researcher and participant burden, and allow for accurate measurement in large
studies, development of accurate measurement techniques for single accelerometers is needed.

Table 2.1. Comparison of wireless accelerometer systems for activity classification accuracy and EE prediction accuracy.

Study: Dong et al. (Dong, Montoye et al. 2013)
Participant characteristics: 40 adults
Placement of monitors: Right wrist, thigh, and ankle
Number and types of activities: 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Laboratory-based protocol.
Activity classification accuracy: All 3 monitors: 97.0%; Ankle and wrist: 96.2%; Thigh and wrist: 91.0%; Ankle and thigh: 89.6%; Thigh: 78.3%; Ankle: 78.3%; Wrist: 71.5%
EE prediction accuracy: N/A

Study: Tapia et al. (Tapia, Intille et al. 2007)
Participant characteristics: 21 adults
Placement of monitors: Accelerometers: Right wrist, upper arm, thigh, hip, and ankle; HR monitor: Chest
Number and types of activities: 30 gymnasium activities (13 distinct, 17 variations). Laboratory-based protocol.
Activity classification accuracy: Without HR: 56.3%; With HR: 58.4%
EE prediction accuracy: N/A

Study: Zhang et al. (Zhang, Werner et al. 2003)
Participant characteristics: 68 adults
Placement of monitors: IDEEA: Right and left foot, right and left thigh, chest
Number and types of activities: 32 activities (5 distinct, 22 variations, and 5 limb movements). Combination of laboratory-based and simulated free-living protocols.
Activity classification accuracy: 98.7%
EE prediction accuracy: N/A

Study: (a) Dong et al. (Dong, Biswas et al. 2013); (b) Montoye et al. (Montoye, Dong et al. 2013)
Participant characteristics: 25 adults
Placement of monitors: Right wrist, thigh, and ankle
Number and types of activities: 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Simulated free-living protocol.
Activity classification accuracy: N/A
EE prediction accuracy: (a) 3-monitor system similar to or better than hip for 10 of 14 activities; (b) Correlations (r): 3-monitor system: 0.81; Thigh: 0.80; Ankle: 0.79; Wrist: 0.74. RMSE (METs): 3-monitor system: 1.61; Thigh: 1.61; Ankle: 1.69; Wrist: 1.85

Study: Zhang et al. (Zhang, Pi-Sunyer et al. 2004)
Participant characteristics: 37 adults
Placement of monitors: IDEEA: Right and left foot, right and left thigh, chest
Number and types of activities: Lab-based: 11 activities (5 distinct, 6 variations). Simulated free-living: 2 required (walking and running), and the rest were left up to participant.
Activity classification accuracy: N/A
EE prediction accuracy: Lab-based: 98.9%; Simulated free-living: 95.1%

Accelerometer placement
Accelerometers can be placed anywhere on the body in order to record movement of the
head, limbs, torso, etc. From their first use in PA measurement, accelerometers were placed on the
hip to measure whole-body movement, and preliminary studies showed the hip placement to have
higher correlations with measured EE compared to wrist or ankle placements (LaPorte, Kuller et
al. 1979; Montoye, Washburn et al. 1983). Additionally, validation and cross-validation studies
(Freedson, Melanson et al. 1998; Lyden, Kozey et al. 2011; Sasaki, John et al. 2011) show high
correlations of hip-mounted accelerometer counts to EE during ambulatory activities, which
comprise a high percentage of the types of PA in which people engage (Ham, Kruger et al. 2009).
Despite the common use of hip-mounted accelerometers for measuring PA, there are many notable
limitations associated with their use. First and foremost, when count-based regression equations
are used, hip-mounted accelerometers dramatically underestimate the EE associated with lifestyle
activities while overestimating the EE cost of SB (Swartz, Strath et al. 2000; Crouter, Churilla et
al. 2006; Rothney, Schaefer et al. 2008; Lyden, Kozey et al. 2011). Even with sophisticated
machine learning techniques, hip-mounted accelerometers still cannot accurately classify time
spent in SB or SB type (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011;
Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Additionally, the newest
ActiGraph GT3X+ accelerometer, which was built with an inclinometer (to improve detection of
posture and differentiate among lying, sitting, standing, and movement), still frequently
misclassifies SB type (Kozey-Keadle, Libertine et al. 2011; Carr and Mahar 2012; Hanggi, Phillips
et al. 2012; Lyden, Kozey Keadle et al. 2012). Given the similar angle of the hip for sitting and
standing (Parkka, Ermes et al. 2006; De Vries, Garre et al. 2011), it is not surprising that these
studies have found frequent misclassification of sitting as standing (and vice versa).
Another significant limitation of hip-mounted accelerometers is that it is not clear if they
can be used effectively for measuring activity in pregnant or obese individuals. In both pregnancy
and obesity, hip-mounted accelerometers experience severe tilt, which changes the orientation of
the accelerometer and alters the accelerations being measured, significantly lowering their
accuracy for PA measurement (Shepherd, Toloza et al. 1999; Feito, Bassett et al. 2011; DiNallo,
Downs et al. 2012). Additionally, when worn on the hip, accelerometers must be secured using a
waist band, which may be uncomfortable for obese or pregnant individuals. To support this point,
Harrison et al. (Harrison, Thompson et al. 2011) asked pregnant women at 26-28 weeks gestation
to wear a pedometer and accelerometer for one week to measure free-living PA. Despite their
stated efforts to minimize tilt angle and maximize comfort, 37% of their sample did not meet the
minimum wear time requirements, and the authors attributed this in part to lack of comfort wearing
the waist band to hold the accelerometer.
Clearly, despite the widespread use of hip-mounted accelerometers, the hip is far from
perfect as a placement site for measuring EE, SB, or activity type. Recently, researchers have
renewed efforts to find alternate accelerometer locations for EE measurement and activity
classification. Some locations have included the lower back, chest, wrist, ankle, and thigh. The
strengths and weaknesses of each will be discussed in the following text. For a summary of current
findings of accelerometer performance for different body locations, please see Table 2.2.
First, the lower back and chest locations share many advantages and disadvantages of the
hip location. Since these three locations are on the torso, they all measure total body movement
and are minimally affected by erratic movements of the limbs, which can lower accuracy of EE
prediction and hurt classification of activity type (Rosenberger, Haskell et al. 2013). Additionally,
the chest location may be appealing in some contexts because it is worn under clothing and can
easily be implemented in a device that also measures heart rate, allowing for measurement of
multiple physiologic variables with a single device (Brage, Brage et al. 2005). Similarly, the lower
back and chest have the advantage of being placed at the midline of the body (as opposed to the
left or right sides), which removes any difficulties with discrepancies that can occur when monitors
are worn on dominant vs. non-dominant sides of the body (Nichols, Morgan et al. 1999; Trost,
McIver et al. 2005). However, the lower back and chest locations also suffer similar limitations as
the hip in their poor measurement of SB and certain lifestyle activities (e.g., household chores or
cycling) and their lack of feasibility for continuous wear.
A popular accelerometer location in recent years is the wrist. Wrist-mounted
accelerometers are appealing because they can be worn like a watch, attracting minimal attention
and enhancing comfort. Also, wrist-worn accelerometers allow for continuous wear (assuming the
monitor is waterproof), which is likely to increase compliance. Within the last few years, the
National Health and Nutrition Examination Survey (NHANES) in the US and the Biobank study in
the UK switched from hip-mounted to wrist-mounted accelerometers in the hope of improving
compliance, which has been a significant issue in their surveillance efforts (UBCC 2009).
Preliminary data from the latest NHANES cycle have indicated that compliance may be slightly
improved, with average wear-time almost an hour longer per participant (Troiano and McClain
2012), lending support that the wrist may be a viable location for large-scale studies measuring EE
and some types of activity. Moreover, wrist-mounted accelerometers have long been recognized
for their utility as objective measures of sleep (Kripke, Mullaney et al. 1978; Mullaney, Kripke et
al. 1980) and have very high validity for measuring total sleep time and sleep quality (Jean-Louis,
Kripke et al. 2001); therefore, wrist-worn accelerometers may allow 24-hour measurement of EE,
activity type, and sleep (Webster, Kripke et al. 1982). Additionally, while regression approaches
for wrist accelerometers have yielded lower accuracy than hip accelerometers (Montoye,
Washburn et al. 1983; Swartz, Strath et al. 2000), machine learning techniques have dramatically
improved the utility of wrist-worn accelerometers for measuring EE and activity type. Mannini et
al. (Mannini, Intille et al. 2013) found that machine learning algorithms developed from a wrist-worn accelerometer classified 26 activities into four activity categories with about 84% accuracy,
which is only slightly lower than the classification accuracies of algorithms developed from single,
hip-mounted accelerometers (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011; Trost,
Wong et al. 2012). Additionally, in preliminary analyses, Montoye et al. (Montoye, Dong et al.
2013) found that the wrist accelerometer achieved high correlations (r=0.71) with measured
EE in a simulated free-living environment, suggesting that use of machine learning could allow
accurate EE measurement using wrist accelerometers.
However, there are also a number of studies where direct comparisons of the hip and wrist
show that the hip has higher accuracy for EE prediction, activity type classification, and SB
measurement (Zhang, Rowlands et al. 2012; Rosenberger, Haskell et al. 2013). In a study by
Rosenberger et al. (Rosenberger, Haskell et al. 2013), participants performed 20 activities while
wearing wrist- and hip-mounted accelerometers and a portable metabolic analyzer (for a criterion
measure of EE). The algorithms they created for the hip accelerometer had better sensitivity and
specificity for SB (71% and 96% vs. 53% and 76%) and MVPA (70% and 83% vs. 30% and 69%)
measurement compared with the wrist accelerometer, and their algorithms for EE had lower errors
(0.55 vs. 0.82 METs) and higher correlations (r=0.72 vs. r=0.36) with the hip accelerometer than
the wrist monitor. They attributed the superiority of the hip location for EE measurement and SB
and MVPA classification to the fact the trunk of the body requires more energy to move, so
measurements of trunk movement represent the contraction of larger muscle masses. For SB
classification, the authors postulated that SB can have significant variability in arm movement (e.g.,
working on the computer or driving vs. lying), diminishing the ability of a wrist-worn
accelerometer to detect differences between these behaviors and lifestyle activities involving
intermittent whole-body movement (e.g., sweeping or washing dishes). Despite some potential
drawbacks of the wrist-worn accelerometer, its potential to promote high compliance rates as well
as 24-hour measurement of PA, SB, and sleep makes it an attractive site for accelerometer
placement.
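As an illustration of how sensitivity and specificity values such as those cited above are obtained, the following Python sketch computes both statistics for SB from paired criterion and predicted labels. The function name, the label strings, and the ten-epoch example are hypothetical; they are not data from Rosenberger et al. or any other cited study.

```python
def sensitivity_specificity(criterion, predicted, positive="SB"):
    """Return (sensitivity, specificity), treating `positive` as the target class."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(criterion, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1  # SB epoch correctly labeled SB
            else:
                fn += 1  # SB epoch missed
        else:
            if guess == positive:
                fp += 1  # non-SB epoch wrongly labeled SB
            else:
                tn += 1  # non-SB epoch correctly labeled non-SB
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical epoch-by-epoch labels (criterion vs. accelerometer algorithm).
criterion = ["SB", "SB", "SB", "SB", "LPA", "LPA", "MVPA", "MVPA", "LPA", "SB"]
predicted = ["SB", "SB", "LPA", "SB", "LPA", "SB", "MVPA", "MVPA", "LPA", "SB"]
sens, spec = sensitivity_specificity(criterion, predicted)
print(f"SB sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Sensitivity answers how much true SB the monitor detects, whereas specificity answers how often non-sedentary time is correctly kept out of the SB category; a monitor placement can perform well on one and poorly on the other.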
Similar to wrist-mounted accelerometers, machine learning algorithms developed for
ankle-mounted accelerometers can provide good detection accuracy of ambulatory and lifestyle
activities (Mannini, Intille et al. 2013), and they have also been shown to accurately estimate
walking/running speeds (Foster, Lanningham-Foster et al. 2005), especially when used with
machine learning algorithms. In a direct comparison of wrist- and ankle-mounted accelerometers,
a study by Mannini et al. found overall classification accuracies of 95% for the ankle-mounted
accelerometer vs. 84.7% for the wrist-mounted accelerometer. Similarly, analyses by Montoye et
al. (Montoye, Dong et al. 2013) found that in a free-living simulation, ankle-mounted
accelerometers had significantly higher correlations with measured EE than wrist-mounted
accelerometers and similar correlations to thigh-mounted accelerometers (r=0.79, 0.71, and 0.80
for the ankle, wrist, and thigh, respectively). However, Dong et al. (Dong, Montoye et al. 2013)
found that a single ankle-mounted accelerometer was unable to detect differences between sitting
and standing since monitor motion and orientation were similar for these activities, rendering the
ankle accelerometer ineffective for accurate measurement of SB. Additionally, compliance is a
significant limitation of ankle-worn accelerometers because they cannot
be worn with high-top shoes or boots, and they look somewhat similar to police tethers (Mannini,
Intille et al. 2013). Thus, ankle-mounted accelerometers may be useful as part of a multi-sensor
system, but as a single unit they do not function well for SB measurement or activity classification
and may have limited compliance.
A thigh-mounted accelerometer may represent a good compromise of the advantages and
disadvantages of the previously discussed accelerometer placements. Placement of an
accelerometer on the thigh should allow for good accuracy for prediction of EE as well as
measurement of SB and activity classification. Since the thigh is closer to the center of the body
than the wrist or ankle, an accelerometer on the thigh is likely to have higher validity for
estimating EE than one on the wrist or ankle, especially given that the thigh placement allows for
tracking of some of the largest muscle groups in the body (gluteal and quadriceps muscles).
Moreover, similar to an ankle-mounted accelerometer, a thigh-mounted accelerometer should be
able to accurately capture stepping motions, allowing for good activity type classification with
ambulatory and lifestyle activities. Finally, whereas time in SB type and breaks in SB are poorly
detected with the previously discussed accelerometer placements, thigh angle is different between
sitting and standing, making the differentiation between SB and non-sedentary activities relatively
simple using monitor orientation (for differentiating sitting from standing) and acceleration data
(for differentiating sitting from cycling).
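The logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not a published algorithm: it assumes that the accelerometer's x-axis is aligned with the long axis of the thigh, that signals are expressed in g, and that a 45-degree inclination cut-point and a 0.10 g variability cut-point (both chosen only for illustration) separate the categories. The function names and the four coarse labels are likewise hypothetical.

```python
import math
import statistics


def thigh_inclination_deg(ax, ay, az):
    """Angle (degrees) between the assumed thigh axis (x) and the gravity vector.

    The absolute value of ax is used so that the result does not depend on
    whether the monitor is mounted with the x-axis pointing up or down.
    """
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude == 0:
        raise ValueError("zero acceleration vector; inclination is undefined")
    cos_angle = min(1.0, abs(ax) / magnitude)
    return math.degrees(math.acos(cos_angle))


def classify_window(samples, angle_cut=45.0, movement_cut=0.10):
    """Label one window of (x, y, z) samples; both cut-points are assumptions."""
    mean_x = statistics.fmean([s[0] for s in samples])
    mean_y = statistics.fmean([s[1] for s in samples])
    mean_z = statistics.fmean([s[2] for s in samples])
    # Orientation: mean acceleration approximates the direction of gravity.
    angle = thigh_inclination_deg(mean_x, mean_y, mean_z)
    # Movement: variability of the acceleration magnitude within the window.
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    moving = statistics.pstdev(magnitudes) > movement_cut
    if angle < angle_cut:  # thigh roughly vertical
        return "stepping" if moving else "standing"
    return "cycling" if moving else "sitting"  # thigh roughly horizontal


# Hypothetical windows (values in g), constructed only to exercise each branch.
standing = [(1.0, 0.0, 0.0)] * 10
sitting = [(0.0, 0.0, 1.0)] * 10
cycling = [(0.2, 0.0, 1.2), (-0.2, 0.0, 0.8)] * 5
print(classify_window(standing), classify_window(sitting), classify_window(cycling))
```

Orientation alone separates sitting from standing, while the variability term is what separates quiet sitting from seated activities with leg movement such as cycling.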
Despite the theoretical superiority of a thigh-mounted accelerometer, this placement has
received little research attention until recently. Initial opposition to the thigh location was not due
to lack of utility; an early study by an engineering group showed
that acceleration signals from a thigh accelerometer could be used to distinguish between sitting
and standing and among stair use, walking, and cycling (Veltink, Bussmann et al. 1996).
However, the accelerometers used by this group were not available commercially, prohibiting their
widespread use in PA measurement. At the time, the Caltrac, ActiGraph, and other commercially
available accelerometers were much too large and heavy to be worn on the thigh, and their size
would have made them unattractive to wear in free-living environments.
Use of a thigh-mounted accelerometer was recently brought to prominence by the
development of the activPAL accelerometer (PAL Technologies, Glasgow, Scotland), a small, thin
accelerometer specifically designed to be mounted on the thigh using a small strap or a sticky
patch. In numerous studies, the activPAL accelerometer has shown high accuracy for quantifying
time in SB, breaks in SB, and classifying SB type (Grant, Ryan et al. 2006; Kozey-Keadle,
Libertine et al. 2011; Aminian and Hinckson 2012; Lyden, Kozey Keadle et al. 2012), and it is
frequently used as a criterion measure of free-living SB (Hart, Ainsworth et al. 2011; Lord, Chastin
et al. 2011; Martin, McNeill et al. 2011). However, a major shortcoming of the activPAL is that it
relies on proprietary software for the determination of lying/sitting, standing, and stepping. While
the proprietary software makes the device user friendly, it does not allow researchers to identify
important aspects of movement or improve on the company’s algorithms to allow the activPAL to
predict EE or PA type. Additionally, at $600+ per monitor, the activPAL is considerably more
expensive than the $300 ActiGraph GT3X+ or $225 GENEActiv, which are both now as small as
the activPAL and have the added advantages of being water resistant (ActiGraph) or waterproof
(GENEA) and allowing for raw data recording and extraction.
Although the activPAL has limited utility for predicting EE or classifying non-sedentary
activity type, its development and validation provide a proof of concept for the utility of the thigh
location for classifying activity type and measuring SB and EE. Recently, a study by
Skotte et al. (Skotte, Korshoj et al. 2012) developed a novel method to use a thigh-mounted
ActiGraph accelerometer for measuring a total of six sedentary, ambulatory, and lifestyle activities.
Using a combination of accelerometer orientation and acceleration data, they were able to correctly
classify the six activities with close to 99% accuracy in a simulated free-living protocol.
Additionally, the thigh-mounted accelerometer had significantly better sensitivity and specificity
(98.2% and 93.3%) for measuring free-living SB than the hip-mounted accelerometer (72.8% and
58.0%).
Moreover, analysis of a single, thigh-mounted accelerometer from a wireless accelerometer
system developed at Michigan State University showed 78.3% classification accuracy for 14
sedentary, ambulatory, lifestyle, and exercise activities (Dong, Montoye et al. 2013). Additionally,
preliminary analyses of EE measurement accuracy indicated that the thigh-mounted accelerometer
achieved high correlations (r=0.80) with criterion-measured EE (Montoye, Dong et al. 2013).
However, the accelerometer used in the wireless system is not commercially available and only
measures two axes of acceleration data. Validation of a triaxial, commercially available
accelerometer mounted on the thigh for classifying activity type and measuring SB and EE is a
logical next step in determining the utility of the thigh location as a measurement site.
In conclusion, while it appears that no single accelerometer placement is ideal for all
movements or all contexts, the thigh location may represent the best compromise of comfort and
measurement accuracy. The hip is well researched and provides good estimation of total body
movements, ambulatory activities, and EE. Additionally, the wrist seems to have slightly lower
accuracies for activity type and EE prediction, but the ability to record sleep measures and improve
participant compliance rates makes the wrist appealing for large studies and total day recording.
The thigh appears to be a good compromise of the hip and wrist locations. Since the thigh is very
close to the torso, it is less affected by erratic limb movements than the wrist or ankle. Also,
placement on the thigh is beneficial for detecting certain lifestyle and cycling activities and shows
the greatest promise for accurate measurement of SB. Additionally, with the low-profile, water
resistant/waterproof designs of the ActiGraph and GENEA accelerometers, thigh-mounted
accelerometers could be placed under clothing with a small strap or sticky patch, allowing for
continuous wear with minimal discomfort. We believe that by applying machine learning
techniques to thigh-mounted accelerometer data, we can develop algorithms with better accuracy
for classifying activity type and measuring EE and SB than can be achieved with hip- or wrist-mounted accelerometers.

Table 2.2. Comparison of different monitor placements for activity classification accuracy and EE prediction accuracy.

| Study | Participant characteristics | Placement of monitors | Number and types of activities | Activity classification accuracy | EE prediction accuracy |
| --- | --- | --- | --- | --- | --- |
| Dong et al. (Dong, Montoye et al. 2013) | 40 adults | Right wrist, thigh, and ankle | 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Lab-based protocol. | Thigh: 78.3%; Ankle: 78.3%; Wrist: 71.5% | N/A |
| Mannini et al. (Mannini, Intille et al. 2013) | 33 adults | Right wrist and ankle | 26 sedentary, cycling, ambulatory, and lifestyle activities (10 distinct, 16 variations) | Ankle: 95.0%; Wrist: 84.7% | N/A |
| Staudenmayer et al. (Staudenmayer, Pober et al. 2009) | 48 adults | Right hip | 20 activities (18 distinct, 2 variations) | 88.8% | RMSE (METs): ANN: 1.22; Linear regression: 1.51 to 2.09. Bias (METs): ANN: 0.05; Linear regression: -0.30 to -1.21 |
| Zhang et al. (Zhang, Rowlands et al. 2012) | 60 adults | Right hip, right wrist, and left wrist | 12 sedentary, lifestyle, and ambulatory activities (8 distinct, 4 variations). Combination of lab-based and simulated free-living protocol. | Hip: 99.1%; Right wrist: 97.0%; Left wrist: 95.9% | N/A |
| Skotte et al. (Skotte, Korshoj et al. 2012) | 17 adults | Right thigh | 6 sedentary, lifestyle, and ambulatory activities | Sensitivity and specificity were both 99% for activity discrimination | N/A |
| Montoye et al. (Montoye, Washburn et al. 1983) | 21 adults | Wrist hip and left wrist | 14 ambulatory and exercise activities (6 distinct, 8 variations). Lab-based protocol. | N/A | Reliability: Wrist: r = 0.74; Hip: r = 0.63. Standard error: Wrist: 7.9 ml/kg/min; Hip: 9.2 ml/kg/min |
| Montoye et al. (Montoye, Dong et al. 2013) | 27 adults | Right wrist, thigh, and ankle | 14 sedentary, ambulatory, lifestyle, and exercise activities (11 distinct, 3 variations). Simulated free-living protocol. | N/A | Correlations (r): Thigh: 0.80; Ankle: 0.79; Wrist: 0.74. RMSE (METs): Thigh: 1.61; Ankle: 1.69; Wrist: 1.85 |
| Rosenberger et al. (Rosenberger, Haskell et al. 2013) | 37 adults | Dominant hip and wrist | 13 sedentary, lifestyle, cycling, and ambulatory activities. Combination of lab-based and simulated free-living protocol. | N/A | Correlations (r): Hip: r = 0.72; Wrist: r = 0.36 |
| Swartz et al. (Swartz, Strath et al. 2000) | 70 adults | Right hip and dominant wrist | 27 lifestyle, occupational, exercise, and ambulatory activities. Combination of lab-based and simulated free-living protocol. | N/A | Correlations (r): Wrist: r = 0.18; Hip: r = 0.56 |
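
The EE accuracy statistics summarized in Table 2.2 (bias, RMSE, and correlation) can be computed from paired criterion-measured and predicted values, as in the Python sketch below. The function name and the four MET values are hypothetical and serve only to show the arithmetic; bias is defined here as predicted minus measured, which is an assumed sign convention and may differ from that of individual studies.

```python
import math


def ee_agreement(measured, predicted):
    """Return bias, RMSE, and Pearson r between measured and predicted EE."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    bias = sum(errors) / n  # mean signed error (predicted - measured)
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # typical error size
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_m * ss_p)
    return {"bias": bias, "rmse": rmse, "r": r}


# Hypothetical criterion and predicted EE values in METs.
measured_mets = [1.2, 3.0, 5.5, 8.0]
predicted_mets = [1.5, 2.5, 6.0, 7.0]
stats = ee_agreement(measured_mets, predicted_mets)
print({name: round(value, 3) for name, value in stats.items()})
```

The three statistics are complementary: a high correlation can coexist with a large RMSE or a systematic bias, which is one reason studies in the table report more than one of them.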

Laboratory-based vs. free-living settings
Ultimately, measurement techniques need to be validated in a context similar to the
setting in which they will be used. When a technique is first tested, activities are generally
performed in a laboratory-based setting, where maximum control can be exerted over the timing
and order of activities performed. In these laboratory-based studies, activities are usually
performed in order of increasing intensity for at least 5-7 minutes, allowing participants to reach
steady-state EE (where EE matches the demands of the activity). Additionally, the activities
must be performed in a specific manner (i.e., walking/jogging speeds and cycling cadences are
the same for all participants), so that there is minimal variability in the activities (Freedson,
Melanson et al. 1998).
Laboratory-based validation studies are a crucial first step in the testing of measurement
devices because they provide a proof of concept that a given measurement device or method can
work well in a highly controlled environment. Additionally, highly valid criterion measures,
such as metabolic analyzers for measuring EE and DO for determining activity type, are
available for use in laboratory-based settings. However, once measurement methods are
validated in a laboratory, they must then be tested in a free-living environment since laboratory
conditions are very different from activities and settings that people encounter in their everyday
lives. In free-living settings, people are seldom engaged in steady-state activities and do not
normally perform activities for defined amounts of time, and there can be substantial variability
within activity types. For example, free-living walking rarely occurs at a constant speed, and
preferred walking speed can differ considerably among individuals. Additionally, treadmill and
non-treadmill walking elicit different gait patterns (Dingwell, Cusumano et al. 2001), lowering
the potential to generalize detection of treadmill walking to the detection of free-living walking.
To support this point, a study by Gyllensten et al. (Gyllensten and Bonomi 2011) found that an
ANN created in the laboratory using data from a single accelerometer (located on the lower
back) had 94% accuracy for classifying five categories of activities, but this accuracy dropped to
75% when used in a free-living setting (with IDEEA used as criterion). Additionally,
Lyden et al. (Lyden, Keadle et al. 2013) found that an ANN created in the laboratory performed
well in the laboratory but very poorly when applied to a free-living scenario, with biases of 33%
and 73% when estimating MET-hours and minutes spent in MVPA, respectively. Therefore,
they recreated their ANNs using free-living data. These findings have been further confirmed by
other studies (Bao and Intille 2004; Ermes, Parkka et al. 2008; Crouter, Kuffel et al. 2010),
providing strong evidence that laboratory validations must be applied to free-living settings with
caution. Therefore, it is important to incorporate aspects of a free-living environment into
validation protocols so that results obtained can be applied to real-world situations.
However, conducting validation studies in a true free-living environment is not feasible
due to the lack of a suitable criterion measure for measuring EE or activity type. Doubly-labeled
water is a commonly used method for assessing free-living EE, but this method only works well
for measuring total EE over a period of 1-2 weeks and cannot yield information about timing,
type, duration, or intensity of PA. Therefore, doubly-labeled water cannot give an indication of
how well a measurement method predicts EE for specific activities. Additionally, since most
activity monitors cannot be worn continuously (i.e., must be removed for showering and
sleeping), doubly-labeled water captures a significant amount of EE that is not recorded by the
monitors, precluding a comparison of monitor output to total EE.
Indirect calorimetry measured using a portable metabolic analyzer has also been used as a
criterion measure of field-based EE, but use of a metabolic analyzer can result in participant
reactivity and does not allow participants to perform many normal activities. While portable
metabolic analyzers allow for participants to perform activities outside of a laboratory, the
analyzer requires participants to wear a mask (for collecting data on expired gas volumes and
concentrations). Therefore, consumption of food or beverage is prohibited during the course of
wearing the equipment. Additionally, participants must wear a shoulder harness with multiple
pieces of equipment strapped to their backs, making lying or reclining uncomfortable
and unnatural. Another potential criterion measure could be use of whole room indirect
calorimeter chambers, which can measure oxygen consumption (to estimate EE) without
participants needing to wear any equipment. While this setting allows participants to perform
some activities as they would in a free-living setting, being confined to a small room is unnatural
and necessitates the use of exercise machines (e.g., treadmills, stair steppers, cycle ergometers)
to perform many lifestyle and ambulatory activities, making it a poor substitute for true free-living. Additionally, whole room indirect calorimeter chambers are very expensive and are only
located in a few laboratories around the country, making them difficult to access.
For the measurement and classification of activity type, DO is commonly used as a
criterion method for measuring free-living activity. DO allows researchers to capture
participants’ actions in the field (Santos-Lozano, Marin et al. 2012), but the act of being
observed is likely to cause reactivity in participants (McKenzie 2002), reducing the
generalizability of findings to a true free-living setting. Additionally, DO would have to be
performed for a period of days or weeks to capture participants’ true activity patterns (Santos-Lozano, Marin et al. 2012), but this is simply not feasible in a research context and would pose a
significant burden on participants and observers. Finally, it is important to validate measurement
techniques with participants performing a variety of activities. In free-living settings, adults
spend the majority of their time in SB and much less time in household, exercise, or sport
activities. Thus, observing participants for a shorter period of time would likely result in a lack
of variety in activities detected, hindering the ability of the measurement technique to classify
important lifestyle, household, or exercise activities and limiting the utility of the measurement’s
validation to only the population in which it was validated.
Clearly, both laboratory and free-living validation studies are subject to limitations, but a
combination of the two, also called ‘simulated free-living,’ may be an optimal balance of the two
settings. In simulated free-living, researchers can exert control over the types of activities and
the minimum amount of time participants need to perform the activities, but the participants can
choose the amount of time and order in which they perform the activities as well as the technique
they use to perform each activity (i.e., not everyone walks at the same speed or sweeps the same
way). Also, since simulated free-living allows many activities to be performed in a relatively
short period of time, both DO and indirect calorimetry can be utilized for criterion measures of
activity type and EE. Therefore, simulated free-living provides better generalizability to real-world conditions than strict laboratory-based protocols, but it does not face the limitations of
trying to find an appropriate criterion measure for testing the measurement methods in a true
free-living setting. Importantly, simulated free-living has shown promise in several recent
validation studies of accelerometers (Sun, Schmidt et al. 2008; Rumo, Amft et al. 2011),
providing a strong case for its use in the current study. Once the accelerometers and machine
learning algorithms have been validated in a simulated free-living setting, they can then be used
in true free-living settings with reasonable confidence of their accuracy.

Accelerometer reliability
In order to be used effectively for measurement of PA, EE, and SB, accelerometers must
exhibit high intra- and inter-monitor reliability. Reliability of accelerometers has been assessed
in two main ways: 1) laboratory studies in which accelerometers are placed on mechanical shakers
and 2) free-living studies in which accelerometers are worn either next to each other or on
opposite sides of the body (e.g., left vs. right hip). This section will focus on reliability
studies of the ActiGraph and GENEA accelerometers since these are the two accelerometers
being used in the current study.
In laboratory studies using mechanical shakers, accelerometers generally exhibit very
high intra- and inter-monitor reliability over the range of intensities encountered in most lifestyle
activities (indicated by high intraclass correlations and low coefficients of variation [CVs]). The
ActiGraph accelerometer has been tested extensively for intra- and inter-monitor reliability, with
intra-monitor intraclass correlations ranging from 0.84-0.92, inter-monitor intraclass correlations
from 0.71-0.99, and CVs ranging from 1-9% (Metcalf, Curnow et al. 2002; Brage, Wedderkopp
et al. 2003; Esliger and Tremblay 2006; McClain, Sisson et al. 2007; Santos-Lozano, Marin et al.
2012; Santos-Lozano, Torres-Luque et al. 2012; Troiano and McClain 2012). However, Santos-Lozano et al. (Santos-Lozano, Marin et al. 2012) found that CVs increased considerably (both
intra- and inter-monitor) at very high and very low intensities when on the shaker. The high CV
at low intensities is not concerning since the high CV is likely being driven by the very low mean
acceleration during low-intensity activities. Similarly, the poor CV achieved during high
intensity shaking is not particularly concerning for the current study since the ActiGraph
placements will be on the hip and mid-thigh. However, the high CV at high intensities may be
problematic for studies with the accelerometer placed on the wrist or ankle, where accelerations
are much more rapid than those experienced at the hip. Importantly, a study by Brage et al.
(Brage, Brage et al. 2003) discovered that raw acceleration data has better inter-monitor
reliability than activity count data, lending further support to the use of raw acceleration data
with machine learning algorithms.
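To make the CV-based reliability statistics concrete, the Python sketch below computes intra-monitor CVs (repeated trials of one monitor) and an inter-monitor CV (across monitor means) for a single shaker condition. All monitor names and output values are hypothetical, and published studies may compute these statistics somewhat differently (e.g., pooling across conditions).

```python
import statistics


def cv_percent(values):
    """Coefficient of variation: sample SD divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


# Hypothetical outputs from repeated trials at one shaker speed.
trials = {
    "monitor_A": [100.0, 102.0, 98.0],
    "monitor_B": [110.0, 108.0, 112.0],
}

# Intra-monitor CV: variability across repeated trials of the same monitor.
intra = {name: cv_percent(values) for name, values in trials.items()}

# Inter-monitor CV: variability across the mean outputs of different monitors.
inter = cv_percent([statistics.fmean(values) for values in trials.values()])

# The same absolute spread (SD = 2) around a much lower mean inflates the CV.
low_intensity_cv = cv_percent([3.0, 5.0, 7.0])

print(intra, round(inter, 2), low_intensity_cv)
```

The last line of the example illustrates the point made above about low intensities: because the mean is in the denominator, a small absolute spread produces a large CV when mean acceleration is very low.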
In free-living settings, intra- and inter-monitor reliability can be assessed by putting
monitors on the same body part but on the opposite side of the body (i.e., left vs. right hip).
McClain et al. (McClain, Sisson et al. 2007) tested the inter-monitor reliability by comparing
outputs from ActiGraph accelerometers mounted on the left vs. right hips and found an intraclass
correlation of 0.99 and CV of 4.9% when measuring MVPA, providing evidence of the real-world reliability of the ActiGraph. McClain’s work has been supported by several other studies
comparing accelerometers placed on the right and left hips (Brage, Wedderkopp et al. 2003;
Vanhelst, Baquet et al. 2012). Additionally, Welk et al. (Welk 2002) conducted a study in which
participants performed repeated walking and running trials on a treadmill while wearing only one
monitor at a time and found that intra-monitor CVs were very similar to inter-monitor CVs.
They postulated that differences seen in the accelerometer output were likely due to slight
differences in monitor placement rather than the variation in the accelerometers themselves,
providing further evidence that the ActiGraph has good inter- and intra-monitor reliability in the
free-living environment.
Additionally, although the GENEA accelerometer is relatively new, one study by Esliger
et al. (Esliger, Rowlands et al. 2011) has evaluated the reliability of the GENEA in the laboratory
and indirectly in a field-based setting. Using a mechanical shaker, they found an intra-monitor
CV of 1.4% and an inter-monitor CV of 2.1% when assessing 47 monitors at 15 different shaker
speeds. Also, in a free-living setting, the GENEA accelerometers worn on the left and right
wrists both showed excellent validity (r=0.83-0.86) for estimating VO2 (Esliger, Rowlands et al.
2011), providing preliminary evidence of the reliability of the GENEA in both laboratory and
free-living settings.
In summary, reliability studies performed in laboratory and free-living settings indicate
that the ActiGraph and GENEA exhibit good intra- and inter-monitor reliability for measuring
MVPA as well as raw accelerations, supporting their use in the current study.
Identifying non-wear
Determining when accelerometers are being worn vs. when they are removed is very
important for calculating daily PA and SB. Logs or diaries can be used to help determine wear-time, but these are not ideal for large studies since they are subject to error in recording and
increase participant and researcher burden. When establishing wear-time from accelerometer data,
there are several criteria that must be addressed: non-wear vs. SB, minimum hours/day of wear,
and minimum days/week of wear.
The first difficulty in identifying wear-time is distinguishing between non-wear and SB,
both of which often result in accelerometers registering zero counts/min. Many data reduction
methods have been created in order to identify and remove accelerometer non-wear time by
setting a minimum amount of time with continuous zero counts/min. This minimum time has
been set anywhere from 10 minutes (Riddoch, Bo Andersen et al. 2004) to 90 minutes (Choi, Liu
et al. 2011), and there is no consensus on the optimal length of 0 counts to determine non-wear.
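A minimal Python sketch of the consecutive-zeroes rule described above is given below. The function name, the default 60-minute threshold, and the example count series are illustrative assumptions rather than any specific published algorithm; published rules often add refinements (e.g., tolerance for brief interruptions) that are omitted here.

```python
def nonwear_mask(counts_per_min, min_zero_run=60):
    """Flag minutes that belong to a run of at least `min_zero_run` zero counts."""
    mask = [False] * len(counts_per_min)
    run_start = None
    # A non-zero sentinel is appended so that a run ending at the last minute is closed.
    for i, count in enumerate(list(counts_per_min) + [1]):
        if count == 0:
            if run_start is None:
                run_start = i  # a new run of zeroes begins
        else:
            if run_start is not None and i - run_start >= min_zero_run:
                for j in range(run_start, i):
                    mask[j] = True  # run is long enough to count as non-wear
            run_start = None
    return mask


# Hypothetical record: a 20-minute and a 70-minute stretch of zero counts.
counts = [50] * 5 + [0] * 20 + [30] * 5 + [0] * 70 + [10] * 5
for threshold in (10, 60, 90):
    flagged = sum(nonwear_mask(counts, min_zero_run=threshold))
    print(f"threshold = {threshold} min -> {flagged} non-wear minutes")
```

Running the same record through thresholds spanning the 10- to 90-minute range cited above changes the amount of time classified as non-wear considerably, which illustrates why the lack of consensus matters for estimates of SB.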
Continuous wear and implementation of machine learning algorithms using raw data can
help more effectively deal with non-wear. Conventionally, hip-mounted accelerometers were
worn during waking hours and removed at night and before performing water-based activities
(Welk 2002). Frequent removal of accelerometers is likely to lower compliance when
participants forget to put the monitors back on in the morning or after swimming or showering
(Kinder, Lee et al. 2012), and the data collected with the accelerometers during times of non-wear do not reflect actual activity levels. Hip-mounted accelerometers have been worn
continuously in several studies (Hjorth, Chaput et al. 2012; Kinder, Lee et al. 2012), but having
an accelerometer protruding from the hip could be uncomfortable for sleeping and lower
participant compliance. Accelerometer placement on the wrist or thigh allows for continuous
wear, providing an advantage of these sites over the hip. As previously mentioned, wrist-worn
accelerometers can be worn continuously and have improved compliance in NHANES data
collection (Troiano and McClain 2012). Additionally, studies using the activPAL accelerometer
(Hart, Ainsworth et al. 2011; Lord, Chastin et al. 2011; Martin, McNeill et al. 2011) indicate that
thigh-mounted accelerometers can be worn continuously with minimal subject discomfort (Craft,
Zderic et al. 2012; Feito, Bassett et al. 2012).
Choice of accelerometer may allow or preclude continuous wear. While the newest
ActiGraph models are said to be waterproof, the GT3X+ and all older models are water resistant
at best, and the company recommends their removal for water-based activities (ActiGraph 2013).
Therefore, use of protective sleeves or barriers is necessary to allow for continuous wear.
Conversely, GENEA accelerometers are waterproof and can be worn 24 hours/day.
Additionally, GENEA accelerometers contain a skin temperature sensor, which can help with the
determination of wear-time and remove the need for using the data reduction techniques
described above for identifying wear-time. Therefore, GENEA accelerometers do not need to be
removed for any reason during data collection, and if they are, the temperature sensor will help
to determine exact wear-time, making them well-suited for use in free-living settings.
Moreover, machine learning algorithms are designed for pattern recognition and, with
proper development and use of raw data, should be able to recognize non-wear as distinct from
SB. The acceleration and monitor orientation signals when an accelerometer is not being worn
are likely very different than when the monitor is being worn during SB because when someone
is engaged in SB, even the smallest movements will be detected by the accelerometer, allowing
differentiation of SB from non-wear. Therefore, when developing machine learning algorithms,
it is important to include non-wear as an activity so that the algorithms can detect non-wear as
distinct from SB.
Additionally, there is a lack of consensus on the minimum amount of time per day a
monitor must be worn in order to yield an accurate reflection of someone’s daily activity
patterns, with minimal wear-time ranging from two hours/day (Brownson, Hoehner et al. 2009)
to 16 hours/day (Slootmaker, Schuit et al. 2009), although most studies require a minimum of 8-12 hours/day (Masse, Fuemmeler et al. 2005). Finally, the minimal number of days of valid data
needed for an accurate measure of true PA levels has ranged from one day (Le Masurier, Sidman
et al. 2003) to seven days (Matthews, Ainsworth et al. 2002), with most studies using 3-4 as a
minimum number of valid days (Trost, McIver et al. 2005). Choice of minimum number of
continuous zeroes for non-wear, the minimal number of hours/day of accelerometer wear, and
the minimum number of days of wear all can significantly affect the results of subsequent
analyses regarding total PA or SB and activity patterns (Trost, McIver et al. 2005; Evenson and
Terry 2009; Oliver, Badland et al. 2011; Herrmann, Barreira et al. 2012). We expect that a more
accurate way of recognizing non-wear may help to advance discussion on these compliance
issues.
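The effect of the hours/day and days/week criteria can be illustrated with the short Python sketch below. The function name, the default criteria (10 hours/day and 4 valid days, chosen from within the ranges cited above), and the one-week wear record are hypothetical.

```python
def valid_days(wear_minutes_by_day, min_hours_per_day=10, min_valid_days=4):
    """Return the days meeting the wear criterion and whether the record is usable."""
    required_minutes = min_hours_per_day * 60
    valid = [day for day, minutes in wear_minutes_by_day.items()
             if minutes >= required_minutes]
    return valid, len(valid) >= min_valid_days


# Hypothetical wear-time (minutes/day) for one participant.
week = {"Mon": 840, "Tue": 610, "Wed": 300, "Thu": 720,
        "Fri": 590, "Sat": 905, "Sun": 120}

lenient_days, lenient_ok = valid_days(week, min_hours_per_day=10)
strict_days, strict_ok = valid_days(week, min_hours_per_day=12)
print(lenient_days, lenient_ok)
print(strict_days, strict_ok)
```

In this example the participant is retained under a 10 hours/day criterion but excluded under a 12 hours/day criterion, showing how the choice of rule can change which participants and which days enter subsequent analyses.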
It is important to note that these data reduction rules have been established with the intent
of achieving a certain test-retest reliability (usually r=0.80-0.90) when using linear regression
approaches for measuring PA and/or EE (Welk 2002). With improvements in accelerometer
technology, continuous wear of monitors, and machine learning techniques for data processing
and analysis, these reduction rules may no longer apply. However, this issue lies outside the
scope of the current study.
Summary of current evidence and future directions
In conclusion, there is considerable evidence linking both PA and SB to
cardiometabolic health. However, without improvements in the measurement of PA and SB
along with accurate determination of activity type, we will be limited in our ability to detect the
true risks of SB or monitor the effectiveness of interventions at reducing SB. Machine learning
techniques show great potential to improve measurement of SB as well as EE and classification
of activity type, but their current complexity may prohibit wide adoption by PA researchers.
The current study aims to develop ANN algorithms for hip-, wrist-, and thigh-mounted
accelerometers using simple-to-compute features from the accelerometer data and freely
available software to allow for relatively simple creation and testing of the ANNs. Our study
will directly compare the accuracy of the hip-, wrist-, and thigh-mounted accelerometers to
measure EE, SB, and activity type in a simulated free-living setting.

CHAPTER 3
VALIDATION AND COMPARISON OF ACCELEROMETERS LOCATED ON THE
WRISTS, HIP, AND THIGH FOR FREE-LIVING ENERGY EXPENDITURE
PREDICTION

ABSTRACT
The purpose of this study was to develop, validate, and compare energy expenditure
prediction models for accelerometers placed on the wrists, hip, and thigh. A secondary purpose
was to achieve high measurement accuracy using simple accelerometer features as input
variables in energy expenditure prediction models. METHODS: Forty-four healthy adults
participated in a 90-minute simulated free-living activity protocol. During the protocol,
participants engaged in a total of 14 different sedentary, ambulatory, lifestyle, and exercise
activities for 3-10 minutes each. Participants chose the order, duration, and intensity of
activities. Four accelerometers were worn (right and left wrists, right hip, and right thigh) in
order to predict energy expenditure compared to that measured by the criterion measure (portable
metabolic analyzer). Artificial neural networks were created to predict energy expenditure from
each accelerometer using a leave-one-out cross-validation approach. Accuracy of the neural
networks was evaluated using Pearson correlations, root mean square error, and bias. Several
models were developed using different input features in order to determine those most relevant
for use in the models. RESULTS: All four accelerometers achieved high measurement
accuracy, with correlations >0.80 for predicting energy expenditure. The thigh accelerometer
provided the highest overall accuracy (r=0.89) and lowest root mean square error (1.05 METs),
and the differences between the thigh and the other monitors were more pronounced when fewer
input variables were used in the predictive models. None of the predictive models had an overall
bias for estimation of energy expenditure. CONCLUSIONS: A single accelerometer placed on
the thigh provided the highest accuracy for energy expenditure prediction, although monitors
worn on the wrists or hip can also be used with high measurement accuracy.

INTRODUCTION
Physical activity (PA) has long been recognized for its beneficial effects on many aspects
of health. Because of these known health benefits, the most recent PA guidelines advocate that
adults obtain a minimum of 150 min/week of moderate-intensity PA, 75 min/week of vigorous-intensity PA, or a combination of the two (PAGAC 2008). Moderate- and vigorous-intensity PA
can be defined according to the amount of energy they elicit, with moderate-intensity PA being
any activity that elicits an energy expenditure (EE) of at least 3.0 times, but less than 6.0 times,
the resting level (METs) and vigorous-intensity PA as an activity that elicits at least 6.0 METs.
Accurate measurement of EE is vital for understanding prevalence of meeting PA
recommendations, identifying populations who may benefit from interventions aimed at
increasing PA, and better understanding the relationship between PA and health.
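The MET cut-points and weekly targets described above can be written as a short sketch. The function names are hypothetical, and the rule for combining intensities (counting each vigorous minute as two moderate minutes) is an assumption based on the common rule of thumb rather than something stated in this chapter.

```python
def classify_intensity(mets):
    """Classify an activity from its MET value using the cut-points above."""
    if mets >= 6.0:
        return "vigorous"
    if mets >= 3.0:
        return "moderate"
    return "below moderate"


def meets_guideline(moderate_min_per_week, vigorous_min_per_week):
    """Check the 150 min/week target in moderate-equivalent minutes.

    Assumption: one vigorous minute counts as two moderate minutes.
    """
    equivalent = moderate_min_per_week + 2 * vigorous_min_per_week
    return equivalent >= 150
```

For example, 90 minutes of moderate-intensity PA plus 30 minutes of vigorous-intensity PA would satisfy the target under this assumed weighting, whereas 60 plus 30 would not.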
Objective PA measurement tools such as activity monitors have shown considerable
promise due to their relative ease of use and accurate measurement of PA for days or weeks at a
time (Welk 2002). Accelerometer-based activity monitors in particular have seen dramatically
increased use for measurement of free-living PA. Accelerometers are generally worn on the hip
and record accelerations of the trunk as a person moves. These accelerations have traditionally
been used as an independent variable in linear regression equations to estimate EE. Linear
regression approaches to prediction of EE are appealing due to their simplicity and their high
accuracy in initial validation studies, which focused on measuring the EE of ambulatory
activities (i.e., walking and running) in controlled settings (Freedson, Melanson et al. 1998).
However, the linear relationship between accelerations and EE does not seem to hold when
applied to non-ambulatory activities or free-living environments, resulting in much poorer
prediction accuracy in such situations (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000).
To overcome these limitations, researchers have explored several avenues to improve PA
measurement. One approach involves the use of more than one monitoring device to measure
accelerations and/or other physiologic variables (e.g., heart rate) to improve EE measurement.
Use of multi-monitor systems has shown promise for improving EE measurement in several
studies (Zhang, Pi-Sunyer et al. 2004; Albinali, Intille et al. 2010; Dong, Biswas et al. 2013), but
the use of multiple monitors dramatically increases participant and researcher burden, preventing
these methods from being feasible for use in large surveillance, intervention, or epidemiologic
studies.
Another approach to improving EE prediction has involved using techniques other than
linear regression for modeling the relationship between acceleration data and EE. Machine
learning, a branch of artificial intelligence, has become a popular modeling technique and has
been shown to improve EE measurement in both laboratory-based and free-living settings
(Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011).
However, there are still many unresolved questions regarding use of machine learning for
predicting EE. First, machine learning modeling may allow for accurate prediction of EE using
accelerometers placed on body locations other than the hip (e.g., wrist, ankle, and thigh), but it is
unclear if accelerometers placed on alternate body locations can achieve the same measurement
accuracy as a hip-mounted accelerometer. The wrist is an appealing accelerometer placement
site due to its utility in measuring sleep and activity type as well as ease of wear (Kripke,
Mullaney et al. 1978; Jean-Louis, Kripke et al. 2001; Zhang, Rowlands et al. 2012; Mannini,
Intille et al. 2013). Additionally, accelerometers worn on the thigh have shown high accuracy
for measuring ambulatory activity and sedentary behavior (Grant, Ryan et al. 2006; Ryan, Grant
et al. 2006). Despite the potential for the wrist and thigh as measurement sites, there is very
limited evidence regarding their utility for measuring EE.
Second, a current limitation of using machine learning to model accelerometer data is that
machine learning models are much more complex than traditional linear regression approaches,
both in the extraction of useful information (features) from accelerometer data to use as inputs
into the models as well as the model creation itself. This complexity currently limits the use of
machine learning and keeps it from being used on a wider scale. However, there is some
evidence that the process of developing and using machine learning can be simplified without
compromising measurement accuracy. In 2009, Staudenmayer et al. (Staudenmayer, Pober et al.
2009) took a large step toward simplifying the use of machine learning modeling. They used the
R statistical software (a freely available, open-source software package) to develop a specific
type of machine learning model (an artificial neural network [ANN]) to predict EE and activity
type. Additionally, they used simple, time-domain features (percentiles of the acceleration signal
and autocorrelation) as input variables and achieved dramatically improved EE estimations over
linear regression approaches. However, it is unknown whether the features they used as input
variables in their models represent an optimal set of input variables for maximizing EE
prediction accuracy.
Third, most validation studies are carried out in laboratory-based settings, which allows
for good control of type, duration, and intensity of activities performed. However, there is
considerable evidence that laboratory-based validation techniques have considerably lower
accuracy when applied to free-living situations (Swartz, Strath et al. 2000; Crouter and Bassett
2008; Lyden, Keadle et al. 2013).

The purpose of this study was fourfold: 1) to validate models for estimation of EE from
accelerometers worn on the wrists, hip, and thigh for prediction of EE in a simulated free-living
setting, 2) compare the accuracy of EE prediction for accelerometers located on the wrist, thigh,
and hip, 3) compare accuracies achieved by the left and right wrists, and 4) compare different
input features to determine an optimal set of simple input features that maximizes prediction
accuracy while minimizing complexity of the machine learning technique.

METHODS
Summary of protocol
Participants were brought into the Human Energy Research Laboratory to participate in
a 90-minute simulated free-living protocol. For the protocol, participants performed 14
activities for between 3-10 minutes, with order and duration of activities left up to participants.
During the protocol, participants wore a portable metabolic analyzer (for a criterion measure of
EE) and four accelerometers.
Participants
A total of 44 adults (22 male, 22 female) were recruited from the area of East Lansing,
MI via email, flyers, and word of mouth for participation in this study. Exclusion criteria
included the following: 1) if participants had known health conditions that prevented them from
being able to perform MVPA safely, 2) if they were wheelchair-bound or had orthopedic
limitations that invalidated the use of accelerometry for activity measurement, or 3) if they fell
outside the age range of 18-44 years.
Anyone over the age of 44 was excluded from participation as the American College of
Sports Medicine asserts that those aged 45 and above are at higher risk for acute cardiovascular
complications with exercise (ACSM 2009), and we did not have medical personnel available
during testing to approve vigorous PA for older individuals. Anyone under the age of 18 was
excluded from these preliminary validations because children and adolescents have a higher
relative EE for activities than adults due to normal growth and maturation (Krahenbuhl and
Williams 1992), and their activity patterns are different than those of adults (Bailey, Olson et
al. 1995).
This study was approved by the Michigan State University Institutional Review
Board prior to participant recruitment. Details of the study were described to each participant
immediately upon arriving at the Human Energy Research Laboratory, and written informed
consent was obtained prior to proceeding with the protocol.
Instrumentation
The instruments used in this study were ActiGraph GT3X+ accelerometers, GENEActiv
accelerometers, and an Oxycon Mobile portable metabolic analyzer. The Oxycon portable
metabolic analyzer provided a criterion measure of EE. The accelerometers and portable
metabolic analyzer were synchronized to an external clock before each test; descriptions of the
accelerometers and metabolic analyzer follow. Pictures of the equipment can be seen in Figure
A.1 in the Appendix.
ActiGraph accelerometers
The ActiGraph (ActiGraph LLC, Pensacola, FL) is the most commonly used accelerometer
on the market for PA research, and there is an abundance of literature regarding its reliability and
validity for measurement of PA (Freedson, Melanson et al. 1998; Matthew 2005). Two GT3X+
models were placed on each participant during the study. One accelerometer was placed on the
midline of the right thigh, one third of the way between the hip and knee and adhered to the leg
with hypoallergenic sticky tape. The other ActiGraph was mounted on the right hip at the anterior
axillary line with an elastic belt. The ActiGraph GT3X+ records raw accelerations of up to ± 6
times gravitational force (6g) in three axes of movement. For the current protocol, the GT3X+
accelerometers recorded at a rate of 40 samples per second (40 Hz).

GENEA accelerometers
The GENEActiv (Activinsights Ltd, Kimbolton, Cambridgeshire, UK) is a new
accelerometer that has recently been validated for PA measurement (Esliger, Rowlands et al.
2011). Like the ActiGraph, the GENEA records raw data of up to ± 6g in three axes of
movement. The GENEAs were set to record acceleration data at a rate of 20 Hz for the current
study. The GENEA is shaped like a watch and comes with a standard wrist strap, allowing for
easy attachment to the wrist. Participants wore two GENEA accelerometers (one on each wrist)
for this study. Each GENEA was fastened securely to the dorsal side of the wrist, between the
styloid processes of the radius and ulna (Esliger, Rowlands et al. 2011).
The acceleration data for all four accelerometers were time stamped and stored within the
monitors and later were downloaded to a computer for analysis. Additionally, the accelerometers
were oriented so that the x-axis was the vertical axis, the y-axis was the medial-lateral axis, and the
z-axis was the anterior-posterior axis.
Oxycon portable metabolic analyzer
The Oxycon Mobile (Cardinal Health, Yorba Linda, CA) portable metabolic analyzer was
used to measure oxygen consumption (VO2) and carbon dioxide production (VCO2) during 13 of
the 14 activities performed in the protocol (EE was recorded but not analyzed for the non-wear
activity). The Oxycon is lightweight (950 g) and was worn on the back using a shoulder harness.
Participants were fitted with a breathing mask (held in place by a mesh cap), which was attached to
a digital turbine flowmeter and gas sampling tube, allowing the analyzer to measure inspired and
expired air volume so that VO2 and VCO2 could be calculated on a breath-by-breath basis. VO2
data were expressed in ml/kg/min and converted to METs (by dividing VO2 by 3.5) for analysis.
Prior to each test, the Oxycon was calibrated according to manufacturer’s specifications to ensure
accurate measurements for flow rate and gas concentration. The Oxycon has been shown to
provide valid VO2 measures over a range of exercise intensities (Rosdahl, Gullstrand et al. 2010;
Akkermans, Sillen et al. 2012) and was used as the criterion measure for EE in this study.
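The conversion from oxygen consumption to METs described above is a single division, sketched below. The function names and the helper for converting absolute VO2 to a body-mass-relative value are hypothetical illustrations; only the divisor of 3.5 ml/kg/min comes from the text.

```python
RESTING_VO2_ML_KG_MIN = 3.5  # VO2 corresponding to 1 MET, as stated above


def relative_vo2(vo2_ml_min, body_mass_kg):
    """Convert absolute VO2 (ml/min) to relative VO2 (ml/kg/min)."""
    return vo2_ml_min / body_mass_kg


def vo2_to_mets(vo2_ml_kg_min):
    """Convert relative VO2 (ml/kg/min) to METs."""
    return vo2_ml_kg_min / RESTING_VO2_ML_KG_MIN
```

Under this conversion, a relative VO2 of 21.0 ml/kg/min corresponds to 6.0 METs, the lower bound of vigorous intensity.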
Procedure
Each participant reported to the Human Energy Research Laboratory for one visit.
Participants were asked to refrain from eating for three hours prior to visiting the laboratory to
minimize risk of discomfort while performing the activities and because food ingestion can affect
EE values. Details of the study were discussed with each participant. Written informed consent
was obtained, and a physical activity readiness questionnaire was administered to ensure that the
participant was healthy and had no contraindications to engaging in MVPA. If participants had
answered ‘yes’ to any question on the questionnaire, they would have been asked to obtain
physician approval before being able to participate in the study; however, this did not occur. Next,
participant weight and height were taken by trained research assistants according to standardized
methods (Malina 1995). Weight was measured to the nearest 0.1 kg using a Seca digital scale
(Seca, Hanover, Germany), with shoes off and weight balanced on the center of the scale. Height
was measured to the nearest 0.1cm using a Harpenden stadiometer (Holtain Ltd., Crymych, United
Kingdom). Before measurement, the participant removed his/her shoes, stood erect with feet flat
on the floor, head aligned in the Frankfurt plane, and the back of the feet, shoulders, and head
resting against the back of the board. Two measurements were taken and averaged for both weight
and height. If the two weights differed by more than 0.3 kg or if the two heights differed by more
than 0.4 cm, a third measurement was taken, and the closest two were averaged. Body mass index
(BMI) was calculated by dividing body weight by the square of height (kg/m2). Age was assessed
by asking participants to state their age in years, and handedness was assessed by asking
participants which hand they prefer to use for the majority of activities.
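The anthropometric rules above (average two measurements, take a third if they disagree by more than the tolerance, then average the closest two) and the BMI calculation can be sketched as follows. The function names are hypothetical, and the sketch assumes height is supplied in centimeters and converted to meters for BMI.

```python
def average_measurements(first, second, tolerance, third=None):
    """Average two measurements, or the closest two of three if they disagree."""
    if abs(first - second) <= tolerance:
        return (first + second) / 2
    if third is None:
        raise ValueError("measurements differ by more than the tolerance; "
                         "a third measurement is required")
    # Pick the pair of measurements with the smallest difference.
    pairs = [(first, second), (first, third), (second, third)]
    a, b = min(pairs, key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2
```

With the tolerances given above, weight would be averaged with `tolerance=0.3` (kg) and height with `tolerance=0.4` (cm).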
Each participant wore the Oxycon metabolic analyzer, one ActiGraph on the hip, another
ActiGraph on the thigh, one GENEA on the left wrist, and one GENEA on the right wrist while
performing 14 activities (activity descriptions provided in Table 3.1, and pictures of the activities
being performed can be found in Appendix D). These activities comprised a range of intensities
from sedentary to vigorous and represented a mixture of sedentary, ambulatory, exercise, and
lifestyle. Ambulatory activities (walking, running) are common in accelerometer validation
literature; however, we added the sedentary, exercise, and lifestyle activities to determine the
potential for the four accelerometers to measure a range of activity types and intensities often seen
in free-living settings. Additionally, we added a non-wear activity so that the ANNs would be able
to recognize when the accelerometers were not being worn, allowing for easy exclusion of non-wear time from data analyses. The non-wear activity was not included in our analysis of EE
prediction.

Table 3.1. Activities performed during the simulated free-living protocol.
Activity Category | Activity | Activity Intensity | Description of Activity*
Sedentary (SE) | Lying down (T1) | Sedentary | Lying on a mat on the floor
Sedentary (SE) | Reading (T2) | Sedentary | Reading a magazine article while sitting at a table
Sedentary (SE) | Computer (T3) | Sedentary | Sitting and playing a computer game that involves mouse clicking and typing
Standing (ST) | Standing (T4) | Light** | Standing still with arms at sides
Lifestyle (LI) | Laundry (T5) | Light | Folding towels and putting them in a laundry basket
Lifestyle (LI) | Sweeping (T6) | Light | Sweeping confetti into piles
Leisure walk (LW) | Walking slow (T7) | Light | Walking at a self-selected ‘slow’ pace in a hallway
Brisk walk (BW) | Walking fast (T8) | Moderate | Walking at a self-selected ‘brisk’ pace in a hallway
Jogging (JO) | Jogging (T9) | Vigorous | Jogging at a self-selected pace in a hallway
Cycling (CY) | Cycling (T10) | Moderate/Vigorous | Cycling on a cycle ergometer at a self-selected cadence of 50-100 rpm with 1 kg resistance
Stair use (SU) | Stair climbing and descending (T11) | Moderate/Vigorous | Walking up and down a flight of stairs at a self-selected pace
Exercise (EX) | Biceps curls (T12) | Light | Standing still while doing biceps curls with a 3-lb. weight in each hand
Exercise (EX) | Squats (T13) | Moderate | With feet shoulder-width apart, bending at the knees (to a 90° angle) while holding an unweighted broom behind the head
Non-wear (NW) | Non-wear of accelerometer (T14) | N/A | Not wearing the accelerometer

* Activity order, intensity, and duration (3-10 minutes) were left up to participants.
** Standing has traditionally been considered SB; however, recent literature suggests that standing
should be considered light-intensity instead of SB due to the differential physiologic effects of
standing as compared to sitting/lying (Owen, Healy et al. 2010).

The 14 activities were performed in a 90-minute, simulated free-living setting which took
place in a laboratory room inside the Human Energy Research Laboratory and a hallway and
stairwell outside the laboratory. A list of the activities was written on a whiteboard for participants
at the beginning of the visit and a description of each activity was given. The order of activities
on the whiteboard was altered every 4-5 participants in order to avoid ordering effects during the
visit. Participants completed each of the 14 activities for a total of at least three minutes and for no
more than 10 minutes, but the order, intensity, and timing of the activities were left up to each
participant. A research assistant observed and recorded each activity on a handheld computer
while it was being performed and periodically updated participants on which activities they still
needed to complete. The non-wear activity was saved until the end of the 90-minute protocol so
that participants would not spend a significant portion of the 90-minute protocol trying to remove
and reattach the accelerometers. Upon completion of the protocol, participants were given a $35
Target® gift card.
Data reduction and modeling
Artificial neural networks
ANNs are nonlinear models which take a set of inputs x1…xk and use them to predict a
certain output variable y (e.g., EE or activity type), where k is the number of features used to
predict y. An ANN designed to predict EE was developed for each accelerometer. Figure 3.1
shows a graphical depiction of the ANN. The general form of an ANN model can be seen in
Equation 1.
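Equation 1:

y = \sum_{h=1}^{H} w_h \left[ f\left( \sum_{i=1}^{k} w_{ih} x_i \right) \right]

In Equation 1, w are the weights that need to be estimated, f( ) is the function applied to each inner summation (which is a linear
function), H is the size of the hidden layer, and y is EE in METs. In accordance with previous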
research (Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009; Trost, Wong et al.
2012), our ANNs contained only one hidden layer.
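A single-hidden-layer network of the kind described above can be sketched in a few lines. This is an illustrative reading of the model rather than the code used in the study: the function and argument names are hypothetical, bias (intercept) terms are omitted because the text does not mention them, and the identity default for the activation follows the text's reference to a linear function. The logistic function is included only as a common alternative and is an assumption.

```python
import math


def ann_predict(x, w1, w2, activation=lambda s: s):
    """Predict y from k input features with one hidden layer of H units.

    x  : list of k feature values
    w1 : H lists of k input-to-hidden weights
    w2 : list of H hidden-to-output weights
    """
    hidden = []
    for unit_weights in w1:
        # Weighted summation of the inputs for this hidden unit.
        s = sum(w * xi for w, xi in zip(unit_weights, x))
        hidden.append(activation(s))
    # Weighted summation of the hidden-unit outputs gives the prediction.
    return sum(w * u for w, u in zip(w2, hidden))


def logistic(s):
    """Logistic activation, shown as an assumed alternative."""
    return 1.0 / (1.0 + math.exp(-s))
```

Here `len(w1)` plays the role of H and `len(x)` the role of k. Estimating the weights from training data is a separate step that this sketch does not cover.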

Figure 3.1. ANN for predicting EE.

Legend for Figure 3.1
The input layer contains the features used as input variables
*The Hidden Layer contains 15 hidden units, but only three are shown for simplicity.
Accelerometer signal features (one of each per axis, three total of each per accelerometer)
1. Mean = mean
2. Var = variance
3. Cov = covariance
4. Min = minimum
5. Max = maximum
6. MeanOR = mean accelerometer orientation
7. VarOR = variance of accelerometer orientation
8. 10th %ile = 10th percentile
9. 25th %ile = 25th percentile
10. 50th %ile = 50th percentile
11. 75th %ile = 75th percentile
12. 90th %ile = 90th percentile
Participant characteristics features
13. Ht = participant height
14. Wt = participant weight
15. Gender = participant gender
Non-feature abbreviations
S = summations of the input layer in the hidden units
U = activation function for the hidden layer
W1 = the weight vectors for each of the inputs
W2 = the weight vectors for each of the summations
The ANNs were created and tested using a leave-one-participant-out approach. In this
approach, the ANN was first created from a ‘training’ data set, where the input features and the
outcome variable (EE) were used to estimate the weights for each input feature. This training set
consisted of the data from all but one participant in the study. Then, the ANN was tested on the
data from the participant left out of the training phase. This testing was conducted by supplying
the input features and comparing the predicted EE from the ANNs to the measured EE from the
criterion measure (Oxycon metabolic analyzer). This process was conducted with each
participant’s data used as the testing data once, therefore obtaining an ANN for each participant in
the study. Weights determined from each iteration of the leave-one-participant-out validation were
averaged to obtain a final ANN. This process was conducted separately for each accelerometer,
resulting in four distinct ANNs.
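The leave-one-participant-out loop can be sketched as follows. To keep the example short and self-contained, a one-feature least-squares line stands in for ANN training; this substitution is an assumption made for illustration only, as are the function names and the data layout. The averaging of weights across iterations described above is not shown.

```python
import math


def fit_line(xs, ys):
    """Least-squares line; a stand-in for ANN training (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def leave_one_participant_out(data):
    """data maps participant id -> list of (feature, measured_EE) windows."""
    predicted, measured = [], []
    for held_out in data:
        # Train on every participant except the one held out.
        train = [row for pid, rows in data.items() if pid != held_out
                 for row in rows]
        slope, intercept = fit_line([f for f, _ in train],
                                    [ee for _, ee in train])
        # Test on the held-out participant only.
        for feature, ee in data[held_out]:
            predicted.append(slope * feature + intercept)
            measured.append(ee)
    return predicted, measured


def rmse(predicted, measured):
    """Root mean square error of the predictions."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def bias(predicted, measured):
    """Mean signed difference (predicted minus measured)."""
    return sum(p - m for p, m in zip(predicted, measured)) / len(measured)
```

Because every window from the held-out participant is excluded from training, the resulting error estimates reflect how the model performs on a person it has not seen.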
There were three additional considerations that were addressed in building our ANNs: 1)
window length, 2) relevant features to use as input variables, and 3) size of the hidden layer.
Window length
In order to analyze accelerometer data, it must first be divided into smaller segments, called
‘epochs’ or ‘windows,’ for analysis. By dividing the data into windows, EE can be assessed
separately for each window to yield information on activity type, duration, intensity, etc. Windows
of 60 seconds are commonly used for analyzing accelerometer data because outputting a given EE
every minute is intuitively appealing and works well for steady-state activities (Staudenmayer,
Pober et al. 2009; Freedson, Lyden et al. 2011). Additionally, longer windows (e.g., 30-60
seconds) increase the amount of information available with which to determine activity type and
have been shown to improve EE prediction accuracy (Trost, Wong et al. 2012). Finally, early
accelerometers had limited data storage, so acceleration data had to be stored in 60-second
windows in order to be able to record data for a period of several days.
Sixty-second windows work well for laboratory-based protocols where participants
perform activities for a specific amount of time at steady state and then change to the next activity
at known intervals (e.g. every five minutes) (Trost, Wong et al. 2012). However, these long
windows may be less optimal in free-living situations, where steady-state EE is rarely achieved for
physical activities and where activities rarely start or end exactly on the minute (Orendurff, Schoen
et al. 2008; Lyden, Keadle et al. 2013). Other studies have shown similar or better accuracy of
measurement of EE in adults when using shorter epochs (Gabriel, McClain et al. 2010; Ayabe,
Kumahara et al. 2013; Orme, Wijndaele et al. 2014); therefore, we chose to use 30-second
windows in the current study.
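Dividing a raw signal into windows is a simple slicing operation, sketched below for one axis. The function name is hypothetical, the 20 Hz default is an assumed sampling rate for illustration, and discarding a trailing partial window is an assumption rather than a documented rule of the study.

```python
def split_into_windows(samples, sample_rate_hz=20, window_seconds=30):
    """Split one axis of acceleration samples into non-overlapping windows.

    Any trailing samples that do not fill a complete window are dropped
    (an assumption made for this sketch).
    """
    size = sample_rate_hz * window_seconds
    n_full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_full)]
```

At 20 Hz, a 30-second window holds 600 samples, so 1,500 samples would yield two complete windows with 300 samples left over.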
Features
As mentioned previously, the activity counts variable is commonly used as an input for
linear regression equations used to measure EE and activity intensity. However, contained within
activity counts are useful data ‘features’ that can be extracted and used in either linear regression
models or machine learning algorithms. There are several different types of features that can be
used as input variables. Time-domain features are most commonly used because they can be
directly extracted or computed from accelerometer signal data. Examples of time-domain features
are mean, standard deviation, skewness, or percentiles of the acceleration signal. In addition to
being directly available from accelerometer data, many time-domain features are easy to
understand and interpret (Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009). The
other main type of features, frequency-domain features, can be used either in conjunction with or
independent from time-domain features, yielding similarly high accuracy for activity type
classification as time-domain features in some studies (Preece, Goulermas et al. 2009; Mannini,
Intille et al. 2013). However, frequency-domain features require additional steps such as framing
the data, complex mathematical transformations, and filtering, and their calculation requires
significant computational power (Preece, Goulermas et al. 2009). Additionally, several studies
provide evidence that time-domain features can be used to achieve high activity classification (70-90% from a single accelerometer) and EE prediction accuracy without use of frequency-domain
features (Herren, Sparti et al. 1999; Staudenmayer, Pober et al. 2009; Dong, Montoye et al. 2013;
Montoye, Dong et al. 2013). Other than time- and frequency-domain features, simple descriptive
features, such as accelerometer orientation or participant demographic variables, can also be used
and may improve EE measurement accuracy.
Many accelerometer signal features have been used in previous research, and the models
created have varied considerably in complexity and measurement accuracy. While adding more
features may improve accuracy of the ANN, it may also lead to overfitting ANNs to the data used
for training the ANN, resulting in poorer generalizability of the model when applied to a new
population (Preece, Goulermas et al. 2009). Additionally, a major drawback of many machine
learning models is that while they tend to have high measurement accuracy, model complexity can
quickly render them unusable for anyone who lacks considerable knowledge of mathematics or
computer science and/or without access to expensive computing software (Pober, Staudenmayer et
al. 2006; Rothney, Neumann et al. 2007; Staudenmayer, Pober et al. 2009). Thus, this study
focuses on using easy-to-compute features as input variables and identifying a small number of
these features that can achieve high measurement accuracy.
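To show how little computation simple time-domain features require, the sketch below derives several of them from one window of one axis. The function names are hypothetical. Two choices are assumptions: the use of the sample variance (dividing by n - 1), and taking each percentile as a ranked value from the sorted window (for example, the 60th of 600 sorted samples for the 10th percentile).

```python
import statistics


def percentile_by_rank(sorted_values, pct):
    """Return the value at the rank corresponding to the given percentile."""
    rank = max(1, round(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]


def axis_features(window):
    """Compute simple time-domain features for one axis of one window."""
    ordered = sorted(window)
    features = {
        "mean": statistics.fmean(window),
        "variance": statistics.variance(window),  # sample variance (assumption)
        "min": ordered[0],
        "max": ordered[-1],
    }
    for pct in (10, 25, 50, 75, 90):
        features[f"p{pct}"] = percentile_by_rank(ordered, pct)
    return features
```

Calling `axis_features` once per axis and per window would give the per-axis inputs for a model; none of these steps requires signal transformation or filtering.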

Before computing features, the 40 Hz data from the ActiGraph accelerometers were
reintegrated to 20 Hz for comparison with the data from the GENEA. Table 3.2 provides a list of
calculations for the 39 features tested and used in the analyses. Calculation and extraction of the
accelerometer features were performed in Microsoft Excel. The 36 accelerometer features (12
features for each of three accelerometer axes) are all time-domain features that have been
effectively utilized in previous studies; additionally, weight, height, and gender were included to
account for demographic characteristics of participants. For the EE prediction, the 30-second
windows allow 600 accelerometer signal samples for calculating the features (20 samples/second x
30 seconds). Mean, variance, covariance, minimum, maximum, mean and variance of monitor
orientation, and the 10th, 25th, 50th, 75th, and 90th percentiles were calculated separately for x-, y-,
and z-axes. These features were chosen to allow the ANN sufficient data to accurately predict EE.
After creating the ANNs using all 39 features, follow-up analyses were conducted to determine an
optimal subset of features that reduced complexity of the ANNs with minimal loss of accuracy. In
all, we used and compared five different sets of features. These feature sets can be seen in Table
3.3. Two feature sets (sets 2 and 5 in Table 3.3) were similar to those used successfully in previous
studies for EE prediction (Staudenmayer, Pober et al. 2009; Dong, Biswas et al. 2013). Feature set
1 was the full set, consisting of many potentially important characteristics of the acceleration signal
that have also been used in other studies, but not necessarily in the same combinations (Preece,
Goulermas et al. 2009). From the 36 accelerometer features used in set 1, correlations were
computed among the features to determine and remove redundancy in information available from
the features (Rothney, Neumann et al. 2007). As with linear regression, highly correlated input
variables can cause collinearity in the ANNs and may reduce their generalizability. Therefore, we
chose two features from each accelerometer axis that were poorly correlated with each other (mean
and variance) and then used a stepwise approach to select features that had correlations of less than
r=0.70 with features already included in the set. Using this approach, we arrived at feature set 3,
consisting of mean, variance, minimum, and maximum of the acceleration signal (for each axis of
measurement) as well as participant weight, height, and gender (15 total features). Finally, in
feature set 4 we initially included lag-one autocorrelation, which has been used in many other
studies since it can yield valuable information on the temporal nature of activities by assessing the
correlation of two adjacent windows of acceleration data. However, the calculation of
autocorrelation involves dividing by the variance of the acceleration data within the adjacent
windows, which for many sedentary activities is 0 and results in an invalid calculation. Other
studies using autocorrelation have used automatic rules for classifying EE of sedentary activities
(i.e., Trost et al. automatically assigned all windows with an invalid lag-1 autocorrelation a MET
value of 1.0) (Trost, Wong et al. 2012), but we feel that this approach is limited since different
sedentary activities may elicit slightly different EE. Instead of using lag-one autocorrelations with
an automatic classification scheme for all sedentary activities, we calculated covariance as a
feature since covariance is simpler to calculate, is defined even when variance is zero, and can
provide information regarding similarity of the accelerometer signals of adjacent data windows
(similar to autocorrelation). These feature sets were tested including and excluding the three
participant characteristics (weight, height, and gender) in order to determine if demographic
characteristics would improve accuracy of the models.
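The stepwise selection described above can be illustrated with a short sketch. It is written in Python rather than the Excel and R workflow used in this study, and the function names, the toy data, and the use of the absolute value of r are assumptions made only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def select_features(features, seed, threshold=0.70):
    """Greedy filter: start from the seed features and add a candidate only if
    its |r| with every feature already kept is below the threshold."""
    kept = list(seed)
    for name, values in features.items():
        if name in kept:
            continue
        if all(abs(pearson_r(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical per-window feature values for a single axis (not study data).
toy = {
    "mean":     [1.0, 2.0, 3.0, 4.0, 5.0],
    "variance": [3.0, 1.0, 4.0, 1.0, 5.0],
    "minimum":  [3.0, 5.0, 1.0, 2.0, 4.0],
    "median":   [1.1, 2.0, 3.2, 3.9, 5.1],  # nearly redundant with the mean
}
selected = select_features(toy, seed=["mean", "variance"])
print(selected)  # ['mean', 'variance', 'minimum']
```

In this toy example the median is dropped because it is almost perfectly correlated with the mean, which mirrors the redundancy argument made in the text.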

Table 3.2. Features used for EE prediction.

| Feature number | Feature used | Formula for calculating feature in each 30-second window |
|---|---|---|
| 1-3* | Mean acceleration signal | $\bar{A}_x = \frac{1}{n}\sum_{i=1}^{n} A_{x,i}$ |
| 4-6* | Variance of acceleration signal | [formula illegible in source] |
| 7-9* | Covariance of acceleration signal | [formula illegible in source] |
| 10-12* | Minimum of acceleration signal | $\min(A_x)$ |
| 13-15* | Maximum of acceleration signal | $\max(A_x)$ |
| 16-18* | 10th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 60th value |
| 19-21* | 25th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 150th value |
| 22-24* | 50th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 300th value |
| 25-27* | 75th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 450th value |
| 28-30* | 90th percentile of acceleration signal | For every 600 accelerations, arrange in order from smallest to largest and pick the 540th value |
| N/A | Accelerometer orientation (needed for calculating features 31-36) | [formula illegible in source] |
| 31-33* | Mean accelerometer orientation | [formula illegible in source] |
| 34-36* | Variance of accelerometer orientation | [formula illegible in source] |
| 37 | Participant height | N/A |
| 38 | Participant weight | N/A |
| 39 | Participant gender | N/A |

The * signifies that one feature is included for each of the three accelerometer axes. The formulas
shown are for the x-axis, but the formulas for the y- and z-axes are similar. $A_x$ is the acceleration in
the direction of the x-axis, and $n$ is the number of accelerations in each window (600).
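As an illustration of how several of the Table 3.2 features could be computed for one window, the following is a minimal Python sketch; it is not the Excel procedure used in this study. The percentile rule follows the table's description (for example, the 60th of 600 sorted values for the 10th percentile). The sample-variance denominator (n - 1) and the definition of covariance between paired samples of two adjacent windows are assumptions, and the function names are hypothetical.

```python
from statistics import mean, variance

PERCENTILES = (10, 25, 50, 75, 90)

def window_features(ax):
    """Summary features for one window of x-axis accelerations."""
    ordered = sorted(ax)
    n = len(ordered)
    feats = {
        "mean": mean(ax),
        "variance": variance(ax),  # assumption: sample variance (n - 1 denominator)
        "min": ordered[0],
        "max": ordered[-1],
    }
    for p in PERCENTILES:
        # 1-based rank in the sorted window, e.g. rank 60 of 600 for p = 10.
        rank = max(1, (p * n) // 100)
        feats[f"p{p}"] = ordered[rank - 1]
    return feats

def adjacent_covariance(prev, curr):
    """Assumed definition: sample covariance between paired samples of two
    adjacent, equal-length windows. Unlike a correlation, it never divides by
    a variance, so it is still defined (as 0) when a window is constant."""
    mp, mc = mean(prev), mean(curr)
    return sum((a - mp) * (b - mc) for a, b in zip(prev, curr)) / (len(curr) - 1)

# A synthetic 600-sample window (30 s at an assumed 20 samples per second).
feats = window_features(list(range(1, 601)))
print(feats["p10"], feats["p50"], feats["p90"])  # 60 300 540

# A motionless (constant) window next to a moving one: covariance is 0.0.
print(adjacent_covariance([0.0] * 4, [1.0, 2.0, 3.0, 4.0]))  # 0.0
```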

Table 3.3. Feature sets used for creation and testing of ANNs.

| Feature set number | Features used | Total number of features used |
|---|---|---|
| 1 | Mean, variance, covariance, minimum, maximum, mean orientation, variance of orientation, and 10th, 25th, 50th, 75th, and 90th percentiles of acceleration signal, weight, height, and gender | 39 (12 accelerometer features per axis * 3 axes + weight + height + gender) |
| 2 | Mean and variance of acceleration signal, weight, height, and gender | 9 (2 accelerometer features per axis * 3 axes + weight + height + gender) |
| 3 | Mean, variance, minimum, and maximum of acceleration signal, weight, height, and gender | 15 (4 accelerometer features per axis * 3 axes + weight + height + gender) |
| 4 | Mean, variance, covariance, minimum, and maximum of acceleration signal, weight, height, and gender | 18 (5 accelerometer features per axis * 3 axes + weight + height + gender) |
| 5 | 10th, 25th, 50th, 75th, and 90th percentiles of acceleration signal, weight, height, and gender | 18 (5 accelerometer features per axis * 3 axes + weight + height + gender) |


Size of the hidden layer
As with the number of features used, more hidden units in the hidden layer allow for more
flexibility in the ANNs, letting the model better fit the training data. However, having more
units also increases the chances of overfitting. There is no consensus on the optimal number of
hidden units to use, but some investigators have used a number of hidden units similar to the
number of activities being identified and/or the number of input features used (Preece, Goulermas
et al. 2009; De Vries, Garre et al. 2011). Since our aim is to minimize the number of features used
and since our study contains 14 activities, we chose to use 15 hidden units in our hidden layer.
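To make the flexibility versus overfitting trade-off concrete, the short Python sketch below counts the adjustable weights in a network with 15 hidden units for each feature-set size listed in Table 3.3. It assumes a fully connected network with a single hidden layer, one output (predicted EE), a bias term for every hidden and output unit, and no skip-layer connections; these architectural details are assumptions, not details confirmed by the text.

```python
def n_weights(n_inputs, n_hidden, n_outputs=1):
    """Number of weights (including bias terms) in a fully connected
    single-hidden-layer network without skip-layer connections."""
    input_to_hidden = (n_inputs + 1) * n_hidden     # +1 for each hidden unit's bias
    hidden_to_output = (n_hidden + 1) * n_outputs   # +1 for each output unit's bias
    return input_to_hidden + hidden_to_output

# Total feature counts per feature set, as listed in Table 3.3.
feature_counts = {1: 39, 2: 9, 3: 15, 4: 18, 5: 18}
weights = {s: n_weights(k, n_hidden=15) for s, k in feature_counts.items()}
print(weights)  # {1: 616, 2: 166, 3: 256, 4: 301, 5: 301}
```

Under these assumptions the full feature set requires several times as many weights as the smallest set, which is one reason smaller feature sets are less prone to overfitting.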

Oxycon data
In a previous study by members of our research group, we reintegrated breath-by-breath
Oxycon portable metabolic analyzer data into 10- and then 15-second windows for analysis.
However, with both windows we found that data loss occurred in participants with slower
breathing rates (especially during sedentary activities), resulting in our reintegrating the data into
30-second windows for our final analysis (Montoye, Dong et al. 2014). Correspondingly, breath-by-breath Oxycon data from the simulated free-living protocol were reintegrated into 30-second
windows for measurement of EE in the current study. These 30-second windows of accelerometer
data were used for training the ANNs to predict EE (as described earlier). Also, when testing the
EE ANNs, 30-second windows were used for computing predicted EE for comparison to Oxycon-measured EE. Since the Oxycon recorded continuously and was not dependent on correctly
identifying an activity type, all data, including transitions, were included for training and testing of
the ANNs.
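The effect of window length on data loss can be sketched as follows. This Python example is illustrative only: it assumes each breath is stored as a (time in seconds, value) pair and that reintegration is a simple mean of the breaths falling in each window, which may differ from the Oxycon software's own averaging. The breath times and values are made up.

```python
def reintegrate(breaths, window_s=30):
    """Average breath-by-breath values within fixed windows.

    breaths: iterable of (time_s, value) pairs.
    Returns {window_index: mean value}; a window that contains no breath is
    simply absent, which corresponds to the data loss described in the text.
    """
    sums = {}
    for t, v in breaths:
        idx = int(t // window_s)
        total, count = sums.get(idx, (0.0, 0))
        sums[idx] = (total + v, count + 1)
    return {idx: total / count for idx, (total, count) in sorted(sums.items())}

# Hypothetical slow breathing: breaths at 4 s, 18 s, and 33 s.
breaths = [(4, 3.0), (18, 5.0), (33, 4.0)]

print(reintegrate(breaths, window_s=10))  # {0: 3.0, 1: 5.0, 3: 4.0}  (window 2 is missing)
print(reintegrate(breaths, window_s=30))  # {0: 4.0, 1: 4.0}          (no gaps)
```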
Statistical analyses
After downloading the accelerometer and Oxycon data, all data processing was conducted
in Microsoft Excel (Microsoft Corporation, Redmond, WA), and ANN creation was performed
using the R statistical software package (R-project, Vienna, Austria). We chose to use Microsoft
Excel for data processing and R for our ANN creation in accordance with our intent to create and
use ANNs using simple methodology which can be used by those without extensive computer
programming skills or access to expensive computing software. Microsoft Excel is a very
commonly used and widely accessible software package for personal computing, and R is
relatively simple to use and is manageable to learn for researchers who may have limited
statistical or computing experience. Additionally, R is open-source software that is freely
available for download and has an ANN library that can be used for development and
testing of ANNs. Thus, development and application of ANNs in R is less costly and much less
complicated than machine learning algorithms developed in previous studies (Pober,
Staudenmayer et al. 2006; Rothney, Neumann et al. 2007; Preece, Goulermas et al. 2009; Mannini
and Sabatini 2010), and R has been used successfully for creation of ANNs for predicting EE and
activity type (Staudenmayer, Pober et al. 2009; Lyden, Keadle et al. 2013).
Three summary statistics were calculated in order to test the accuracy of each ANN for
predicting EE: Pearson correlations, root mean square error, and bias. Operational definitions of
these three measures are given below.
Pearson correlations (r): The covariance of two variables is divided by the product of the standard
deviations of the two variables to obtain r. The range of possible r values is -1 to 1, with 1 being a
perfect correlation and -1 being a perfect inverse correlation (Field 2009). A minimum correlation
of r=0.60 has been defined as moderately high validity in the literature; therefore, we sought to
obtain a correlation of r≥0.60 between predicted EE and Oxycon-measured EE (Safrit and Wood
1995). Had this minimum correlation not been met, we would have increased the window length and
added features to reach our desired correlation.
Root mean square error (RMSE): The square root of the mean squared difference between values
predicted by an estimator (the ANNs) and the true values (measured by the criterion measure) is
the RMSE. Smaller RMSE values represent better prediction of the ANNs; thus, our goal was to
minimize RMSE to maximize accuracy of the ANNs.

Bias: Bias is the difference between the estimated value of a measure and the true value. Bias
allows for determinations of systematic over- or underprediction of EE; a negative bias represents
underestimation of EE by the ANNs, and a positive bias represents overestimation. We aimed to
achieve bias values close to 0 in order to maximize accuracy of the ANNs.
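The three summary statistics defined above can be written compactly. The following Python sketch is only an illustration of the definitions (the study's own calculations were not done in Python); the function names and the example MET values are hypothetical. Bias is computed as predicted minus measured, so that a positive value indicates overestimation, as in the definition above.

```python
from math import sqrt

def pearson_r(pred, meas):
    """Covariance divided by the product of the standard deviations."""
    mp, mm = sum(pred) / len(pred), sum(meas) / len(meas)
    sxy = sum((p - mp) * (m - mm) for p, m in zip(pred, meas))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((m - mm) ** 2 for m in meas)
    return sxy / sqrt(sxx * syy)

def rmse(pred, meas):
    """Square root of the mean squared difference between predicted and measured."""
    return sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(pred))

def bias(pred, meas):
    """Mean of (predicted - measured); positive means overestimation."""
    return sum(p - m for p, m in zip(pred, meas)) / len(pred)

# Hypothetical METs for four windows: every prediction is 0.5 METs too high.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(pearson_r(predicted, measured))  # 1.0
print(rmse(predicted, measured))       # 0.5
print(bias(predicted, measured))       # 0.5
```

The example shows why the three measures are reported together: a perfect correlation can coexist with a systematic overestimation that only RMSE and bias reveal.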
Correlations, RMSE, and biases were calculated separately for each of the four
accelerometers and each of the five different feature sets. Differences in correlations, RMSE,
and biases among the four accelerometers were assessed using repeated-measures analysis of
variance (RMANOVA). Additionally, differences among feature sets were evaluated using
RMANOVA. Since correlations tend to be negatively skewed, we first performed a Fisher’s Z
transformation to normalize the correlations before performing the RMANOVA. When the
RMANOVA revealed statistically significant differences for any of the three analyses, post hoc
dependent t-tests were conducted to determine differences among monitor placements or feature
sets. The a priori alpha level was set at 0.05 for determining statistical significance. Statistical
analyses were performed using SPSS version 22 (IBM Corporation, Armonk, NY).
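For readers unfamiliar with the Fisher's Z transformation mentioned above, it maps a correlation r to z = atanh(r) = 0.5 ln((1 + r)/(1 - r)), which stretches values near 1 and reduces the skew of the distribution. The Python sketch below is illustrative only (the study's analyses were run in SPSS), and the example correlations are hypothetical.

```python
from math import atanh, tanh

def fisher_z(r):
    """Fisher's Z transformation of a correlation coefficient (|r| < 1)."""
    return atanh(r)

def inverse_fisher_z(z):
    """Back-transform a Fisher z value to the correlation scale."""
    return tanh(z)

# Hypothetical per-participant correlations, equally spaced on the r scale.
for r in (0.81, 0.88, 0.95):
    print(r, round(fisher_z(r), 3))
# 0.81 1.127
# 0.88 1.376
# 0.95 1.832
```

Equal steps in r become progressively larger steps in z as r approaches 1, which is what makes the transformed values better suited to an analysis that assumes approximately normal data.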
Power analysis
In the simulated free-living setting, correlations of r≥0.60 between measured EE and
estimated EE from the four ANNs were desired to indicate moderately high validity of the
accelerometers for EE estimation (Safrit and Wood 1995). Table 3.4 shows the minimum
correlation that could have been detected with different sample sizes and power. For example,
with 20 participants, power is 80% to detect a correlation of 0.591. Thus, our sample size of 44
was well above the minimum required number of 25 needed to detect our minimum desired
correlation (0.60) with greater than 90% power. We chose to oversample in order to ensure
adequate sample size due to the potential for occasional malfunction and/or loss of battery power
of the accelerometers or the portable metabolic analyzer experienced in a previous study by
members of our research group (Montoye, Dong et al. 2014).

Table 3.4. Minimum Pearson correlations detectable for a given sample size and power.

| Sample size | 80% power | 90% power |
|---|---|---|
| 18 | 0.619 | 0.684 |
| 20 | 0.591 | 0.656 |
| 24 | 0.545 | 0.609 |
| 30 | 0.492 | 0.554 |
| 36 | 0.452 | 0.511 |
| 42 | 0.395 | 0.477 |
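The text does not state how the values in Table 3.4 were obtained. One common approximation, shown here purely as a sketch, uses the Fisher z transformation with a two-sided alpha of 0.05: the minimum detectable correlation is tanh((z_{1-alpha/2} + z_{power}) / sqrt(n - 3)). The function name and the choice of this particular approximation are assumptions; it appears to agree with at least some of the tabulated entries (for example, 0.591 for n=20 at 80% power and 0.609 for n=24 at 90% power).

```python
from math import sqrt, tanh
from statistics import NormalDist

def min_detectable_r(n, power, alpha=0.05):
    """Smallest correlation detectable with the given power, using the
    Fisher z approximation and a two-sided test (assumed method)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return tanh((z_alpha + z_power) / sqrt(n - 3))

print(round(min_detectable_r(20, 0.80), 3))  # 0.591
print(round(min_detectable_r(24, 0.90), 3))  # 0.609
```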

RESULTS
Malfunction of the Oxycon metabolic analyzer (due to a bad battery) occurred in three
participants, and accelerometer malfunction occurred in another two participants. These
participants were excluded from further analyses, resulting in 39 participants included in model creation
and validation. Means and standard deviations (SD) for participant characteristics (both those included
in and excluded from analysis) are shown in Table 3.5. Although weight and BMI appeared higher
in females excluded from the final analysis, these differences were not statistically significant. Of the
39 participants included in the analysis, 13 were either overweight or obese according to BMI (≥25
kg/m²). Additionally, four of the 39 participants included in the final analysis were left-hand
dominant, with the remaining 35 being right-hand dominant.
Table 3.5. Demographic characteristics of participants enrolled in study.

| Mean (SD) | Included in analysis: All (n=39) | Included in analysis: Males (n=19) | Included in analysis: Females (n=20) | Excluded from analysis: All (n=5) | Excluded from analysis: Males (n=3) | Excluded from analysis: Females (n=2) |
|---|---|---|---|---|---|---|
| Age (years) | 22.1 (4.3) | 23.7 (5.0) | 20.5 (2.7) | 21.2 (2.9) | 21.3 (4.0) | 21.0 (1.4) |
| Weight (kg) | 72.4 (16.2) | 84.5 (13.1) | 60.8 (8.9) | 78.2 (21.2) | 75.9 (4.9) | 81.6 (41.4) |
| Height (cm) | 171.4 (10.1) | 179.1 (7.7) | 164.1 (5.7) | 167.8 (9.8) | 175.9 (0.4) | 157.1 (2.4) |
| BMI (kg/m²) | 24.4 (3.6) | 26.3 (3.4) | 22.5 (2.6) | 28.0 (9.1) | 24.8 (1.7) | 32.8 (15.8) |

In initial testing of the five feature sets, it was found that the addition of weight, height, and
gender yielded no gains in predictive accuracy of the ANNs. Therefore, these features were
removed when training and testing the ANNs. Correlations for predicted EE are shown in Table
3.6. With correlations ranging from r=0.82-0.89 for the four accelerometers across the five sets of
features, all four monitors achieved correlations well above the r=0.60 desired to indicate
moderately high validity. The RMANOVA test among accelerometer placement sites revealed a
test statistic of F=4.36, indicating significant differences among the four placement sites. Post-hoc
tests revealed that the ActiGraph thigh accelerometer had higher correlations with measured EE
(r=0.88-0.89) than the wrist accelerometers for all five feature sets (r=0.82-0.86) and higher
correlations than the hip accelerometer (r=0.83-0.88) for all sets other than set 1 (which included
all 39 features). Correlations achieved by the left and right wrist accelerometers were similar for
each of the five feature sets.
When comparing accuracy achieved among the five sets of features, the thigh monitor
accuracy was not affected by choice of feature set. Conversely, for the hip accelerometer, feature
sets 2-5 resulted in slightly lower correlations; similarly, correlations dropped for the wrist
accelerometers for feature sets 2-4 but not for set 5. Despite the statistical significance of the
decreased correlations seen with the hip and both wrist accelerometers, the actual drop in
correlations was quite small, especially for feature sets 3-5.
Table 3.6. Correlations of measured vs. predicted EE.

| Correlations (SD) | ActiGraph Hip | ActiGraph Thigh | GENEA Left Wrist | GENEA Right Wrist |
|---|---|---|---|---|
| Set 1 (All accelerometer features) | 0.88 (0.05) | 0.89 (0.07) | 0.86 (0.05)* | 0.86 (0.06)* |
| Set 2 (Mean, Var) | 0.83 (0.06)^ | 0.88 (0.09)& | 0.82 (0.06)*^ | 0.82 (0.08)*^ |
| Set 3 (Mean, Var, Min, Max) | 0.86 (0.04)*^ | 0.89 (0.08)& | 0.84 (0.05)*^ | 0.83 (0.06)*^ |
| Set 4 (Mean, Var, Cov, Min, Max) | 0.86 (0.06)*^ | 0.89 (0.10)& | 0.84 (0.06)*^ | 0.85 (0.06)*^ |
| Set 5 (10th, 25th, 50th, 75th, 90th percentiles) | 0.87 (0.04)*^ | 0.89 (0.05) | 0.86 (0.05)* | 0.86 (0.06)* |

The * indicates significant differences from thigh accelerometer placement site.
The & indicates significant differences from hip accelerometer placement site.
The ^ indicates significant difference from feature set 1 (all features).

Root mean square error (RMSE) values for predicted vs. measured EE for the four
accelerometers are shown in Figure 3.2. The RMANOVA test revealed a test statistic of F=3.64,
indicating significant differences in RMSE among placement sites. For all five feature sets, the
thigh accelerometer placement (1.05-1.14 METs) had significantly lower RMSE values than the
hip (1.12-1.42 METs), left wrist (1.18-1.36 METs), and right wrist (1.18-1.38 METs)
accelerometer placements. Moreover, when comparing among the five feature sets, the RMSE for
the thigh was not significantly different for any of the five. Conversely, RMSE values with the hip
accelerometer placement were significantly higher with feature sets 2-5 than set 1. Similarly, with
the two wrist accelerometer placement sites, RMSE was significantly higher with feature sets 2-4
than set 1, although feature set 5 yielded similar RMSE to set 1. There were no differences in
RMSE values between left and right wrists.

Figure 3.2. RMSE values for predicted vs. measured EE.
[Figure: RMSE (METs) on the vertical axis (0 to 1.9) plotted by feature set (1-5) on the horizontal axis, with one series each for ActiGraph Hip, ActiGraph Thigh, GENEA L. Wrist, and GENEA R. Wrist.]
The * indicates significant differences from other accelerometers.
The ^ indicates significant difference from feature set 1 (all 39 features).
For interpretation of the references to color in this and all other figures, the reader is referred to the
electronic version of this dissertation.

Average biases for each accelerometer can be seen in Table 3.7. The RMANOVA test
statistic was F=0.062, indicating no overall bias for any of the four monitor placements or for any
of the five feature sets. This lack of bias indicates that none of the accelerometers had an overall
overestimation or underestimation of EE in the total sample.
Table 3.7. Bias for measured vs. predicted EE.

| Bias (SD) | ActiGraph Hip | ActiGraph Thigh | GENEA Left Wrist | GENEA Right Wrist |
|---|---|---|---|---|
| Feature Set 1 | 0.01 (0.35) | -0.01 (0.34) | -0.02 (0.32) | -0.01 (0.35) |
| Feature Set 2 | 0.02 (0.59) | 0.03 (0.42) | 0.01 (0.41) | -0.03 (0.49) |
| Feature Set 3 | -0.03 (0.46) | 0.01 (0.32) | -0.03 (0.35) | 0.00 (0.47) |
| Feature Set 4 | 0.05 (0.44) | 0.01 (0.35) | 0.00 (0.48) | -0.01 (0.48) |
| Feature Set 5 | 0.05 (0.43) | -0.05 (0.29) | 0.01 (0.46) | 0.05 (0.47) |

DISCUSSION
The purposes of this study were 1) to validate accelerometers worn on the wrists and thigh
for prediction of EE, 2) to compare the accuracy of EE prediction for accelerometers located on
the wrists, thigh, and hip, 3) to compare accuracies of the left and right wrists, and 4) to use
simple input features to maximize prediction accuracy while minimizing complexity of the
machine learning technique.
Our results showed strong correlations between measured and predicted EE for all
accelerometer placements and for all five feature sets. Also, our results indicated no systematic
bias by any of the accelerometer placements for estimating EE. Overall, the thigh-mounted
accelerometer provided the highest correlations with measured EE and also the lowest RMSE of
the placement sites. With the full feature set (set 1), the thigh- and hip-mounted accelerometers
provided similar EE prediction accuracy, but the thigh performed better when subsets of the full
feature set were tested. Additionally, the thigh-mounted accelerometer performance was not
diminished for any of the five feature sets tested, meaning that even very simple inputs such as
mean and variance of the acceleration signal can be used to predict EE with a high degree of
accuracy. Given previous work showing high accuracy for measuring sedentary behavior and
ambulatory activities with thigh-mounted accelerometers (Grant, Ryan et al. 2006; Ryan, Grant
et al. 2006; Skotte, Korshoj et al. 2012), the results of this study further illustrate the utility of the
thigh as a highly accurate placement site for activity and EE measurement.
Despite the superiority of the thigh-mounted accelerometer, it is worth emphasizing that
the two wrist-mounted accelerometers provided only slightly lower accuracy than the thigh and
comparable accuracy to the hip, resulting in high overall prediction accuracy of all four monitors.

Our finding of high prediction accuracy for the wrist accelerometer placement sites lies in
contrast to studies that have used linear regression-based approaches for estimating EE. In the
early days of activity monitors, Montoye et al. found significantly higher correlations for
predicting EE using a hip-mounted motion sensor (r=0.71) compared to a wrist-mounted
accelerometer (r=0.40) during ambulatory and exercise activities (Montoye, Washburn et al.
1983). Similarly, Swartz et al. found that in a simulated free-living setting, the hip-mounted
accelerometer estimated EE with a moderate correlation of r=0.56, while the wrist-mounted
accelerometer had a very poor correlation (r=0.18) with measured EE (Swartz, Strath et al.
2000). Finally, as recently as 2013, Rosenberger et al. found higher correlations (r=0.72 vs.
r=0.36) and lower error (0.55 vs. 0.85 METs) when predicting EE from a hip-mounted
accelerometer compared to a wrist-mounted accelerometer (Rosenberger, Haskell et al. 2013). It
is important to note that these studies all used linear regression for their modeling technique; the
consistent superiority of the hip to the wrist when linear regression is used is not surprising given
that a hip monitor records movement of the trunk, while wrist monitors record arm movement that
may or may not be coupled with movement of the rest of the body, resulting in poor correlations
of activity counts and EE.
A significant advantage of machine learning is its ability to recognize patterns in an
acceleration signal rather than simply using magnitude of acceleration for prediction. Recent
studies by Mannini et al. (2013) and Zhang et al. (2012) show very high activity classification
accuracies (85-97%) using a wrist accelerometer coupled with machine learning models, giving
strong reason to believe that machine learning would also allow for high accuracy for EE
prediction (Zhang, Rowlands et al. 2012; Mannini, Intille et al. 2013). The results of the current
study support the utility of machine learning modeling as a viable approach to analyzing wrist-mounted
accelerometer data and provide further evidence of the superiority of machine learning
to linear regression for modeling of accelerometer data. Additionally, while the current
convention is for wrist accelerometers to be worn on the non-dominant wrist, the results of this
study support that wrist choice will not affect accuracy for estimation of EE. A 2012 study by
Zhang et al. found that classification accuracy for identifying four types of activities was 97%
and 96% for left and right wrist accelerometers, respectively, further supporting the idea that
choice of wrist placement will not affect measurement accuracy.
The high accuracy of wrist-mounted accelerometers for EE prediction found in this study
is especially encouraging given the utility of the wrist location for measuring sleep quality as
well as its current use in large surveillance studies such as NHANES (Troiano and McClain
2012; Troiano, McClain et al. 2014). Additionally, wrist-mounted accelerometers are
comfortable to wear and can be designed/disguised to look like watches, both of which may lead
to improved compliance. With the ability to accurately measure sleep as well as activity type
and EE, the wrist may represent an ideal blend of practicality and measurement accuracy for
monitoring lifestyle behaviors and patterns. Of note, the left and right wrist accelerometer
placements achieved equally high accuracies for prediction of EE, which provides evidence that
the popular convention for an accelerometer to be placed on the non-dominant wrist may be
unnecessary.
The hip-mounted accelerometer achieved correlations of r=0.83-0.88 and RMSE values
of 1.12-1.42 METs with the different feature sets, and these statistics compare favorably to those
achieved in previous studies. In a study conducted in a laboratory-based setting, Staudenmayer
et al. found that an ANN developed using data from a hip-mounted accelerometer predicted EE
with an RMSE of 1.22 METs. This RMSE represented an improvement of 32-71% over
previously developed linear regression approaches tested in the study (Staudenmayer, Pober et
al. 2009). Additionally, their input features were very similar to our feature set 5 (the 10th, 25th,
50th, 75th, and 90th percentiles of the acceleration signal), lending additional support that this
feature set is viable for use in different settings and populations. Similarly, Lyden et al. achieved
intraclass correlation coefficients above 0.90 and RMSE values of 1.00 METs for predicting EE
using ANNs developed from hip-mounted accelerometer data in a true free-living setting, again
achieving superior accuracy when compared to linear regression approaches (Lyden, Keadle et
al. 2013). In another study, Rothney et al. achieved a correlation of r=0.92 and RMSE of 0.50
METs when predicting EE using an ANN developed from a hip-mounted accelerometer in a
simulated free-living setting. Their slightly better accuracy is likely due to study design,
especially given that their use of a linear regression approach to EE prediction yielded a
correlation of r=0.89 and an RMSE of 1.00 METs, both of which are considerably better than
accuracy achieved in other studies (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000;
Staudenmayer, Pober et al. 2009). Despite the slightly higher RMSE values achieved by the
ANNs in our study, our results are encouraging given that participants averaged an intensity of
3.3 METs across the duration of the protocol, which is higher than many other studies and likely
contributes to higher RMSE, as seen in previous work by our research group (Montoye, Dong et
al. 2014). Taken together, these studies reinforce the high accuracy for EE prediction achievable
using machine learning techniques on data from a single, hip-mounted accelerometer, both in
laboratory-based and free-living settings.
Our final objective in the current study was to use relatively simple methods for feature
extraction and ANN creation and compare sets of input features in order to identify relevant
feature sets that allow for high measurement accuracy while minimizing the complexity of the
ANN, both in its structure and in its creation. In order to achieve the first part of this objective,
all data cleaning and feature extraction were conducted in Microsoft Excel. Features were
calculated and extracted using simple functions already built into Excel. While this method is
somewhat labor intensive, the key strength of this approach is that it is a viable method for
feature extraction without knowledge of or access to powerful, complicated software packages.
Use of macros in Excel requires additional knowledge of the software package but can also
streamline the process of feature extraction. Additionally, ANN creation was conducted using R
statistical software, which is a freely available, open-source software package. Writing programs
in R is complex and requires skill, but implementing programs which have already been written
is relatively simple and can be accomplished with knowledge of only a few commands in R. Use
of the nnet package for creating ANNs has been successfully accomplished by Staudenmayer et
al. and Lyden et al., and considerable detail of the approach, including some of the code for
creating and testing the ANNs, can be found in their manuscripts (Staudenmayer, Pober et al.
2009; Lyden, Keadle et al. 2013).
To address the second part of our objective to simplify use of ANNs, we sought to define
an optimal subset of features that can be used without sacrificing measurement accuracy. For the
thigh accelerometer, we found that choice of features had minimal impact on measurement
accuracy, even in the simplest feature set (set 2) consisting of only mean and variance of the
acceleration signal. A very similar set of features was used in a study by members of our
research group, in which they were able to classify 14 activities with an accuracy above 78%
with a thigh accelerometer (Dong, Montoye et al. 2013). Therefore, this minimal feature set
appears to provide strong accuracy for both activity type classification and EE prediction when
using a thigh-mounted accelerometer. For the hip-mounted accelerometer, feature sets 2-5 all
provided slightly lower prediction accuracy than set 1, although the drop in accuracy was very
small (especially for sets 3-5) and may be of little practical significance. Finally, with the two
wrist-mounted accelerometers, feature sets 2-4 resulted in significantly higher RMSE values and
lower correlations with measured EE compared to feature set 1, although these
differences were also small. Additionally, feature set 5 provided similar measurement accuracy
to set 1. The choice of features to use in a predictive model will be dependent on the emphasis
on accuracy vs. the feasibility for use. For studies with an emphasis on accuracy of
measurement, the larger feature sets used with the wrist and hip placements yielded better
accuracies than the ANNs developed from the smaller feature sets. On the other hand, the
simplest ANNs developed for the thigh placement were able to predict EE with similar accuracy
to the largest feature set. Also, the simplest models (i.e., feature set 2, which included only mean
and variance of the acceleration signals) can be used with high accuracy and RMSE within 25%
of that achieved with the largest feature set (set 1) for the wrist and hip accelerometer
placements. Therefore, these smaller feature sets may be more appropriate for use in large-scale
studies, where ease of use of the predictive models is of utmost importance.
Taken together, the findings of this study support the use of simple-to-compute
acceleration features for achieving highly accurate estimates of free-living EE using machine
learning. Moreover, choice of the number and type of features appears to alter EE prediction
accuracy slightly, but the practical significance of these small differences is likely minimal,
indicating that researchers may be able to use ANNs with only a few, simple-to-compute
accelerometer features and achieve high measurement accuracy.

Study limitations and strengths
There were several limitations of the current study. First, study participants represented a
fairly homogenous group of college-age adults. Thus, our findings are not necessarily
generalizable to older populations or children/adolescents and require further validation before
use in these populations. Second, the use of a simulated free-living setting rather than a true
free-living setting could be viewed as a limitation since some studies have used a true free-living
setting for ANN creation and validation (Lyden, Keadle et al. 2013). Third, we did not measure
resting VO2, which is known to vary across individuals (Ferro-Luzzi 1968). However, like
creation of individual HR curves for improving the accuracy of EE prediction using HR, taking
individual resting EE into account results in dramatically increased burden on researchers and
participants; more importantly, individual resting EE measurement would limit the generalizability
of our findings since it is not often possible to measure resting EE in intervention or epidemiologic
studies, where accelerometers are often used. Instead of measuring resting VO2, it may be useful
to include variables such as age and fat-free mass into prediction models since these variables
account for the majority of variation in resting VO2 (Johnstone, Murison et al. 2005). However,
our study did not find that the inclusion of demographic variables such as weight, height, and
gender improved EE prediction when added as input features. Last, we experienced some
difficulties with keeping thigh-mounted accelerometers in their proper location during the
protocol. Taping monitors on the thigh worked well initially but was less reliable once
participants started to sweat. We attempted to secure the monitor using an elastic strap, but this
often slipped throughout the session and was less comfortable to participants. There have been
several studies that have successfully used thigh-mounted accelerometers for PA and SB
measurement, and in future work we hope to communicate with other researchers regarding
optimal strategies for mounting accelerometers on the thigh due to their high measurement
accuracy and ability to be worn inconspicuously (i.e., under clothing) to enhance compliance.
There are also several notable strengths of this study. First and foremost, we believe the
simulated free-living setting represents the best blend of exerting some control over participant
activities while still allowing considerable freedom for the order, intensity, and duration of
activities chosen by participants. Troiano et al. identified that PA tends to be performed in short
bouts, meaning that steady-state is rarely achieved during PA in free-living settings (Troiano,
Berrigan et al. 2008). This study provides rationale for the inclusion of transitions and non-steady-state activities in our study since it is more similar to true free-living settings than a
typical laboratory-based validation.
A true free-living setting may theoretically have the most real-world generalizability, but
a major issue in true free-living settings is lack of a good criterion measure. Doubly labeled
water provides an accurate estimate of total EE but cannot measure activity EE or minute-to-minute EE. Also, Lyden et al. used a true free-living setting for their ANN creation and
validation and direct observation as their criterion measure. Trained observers recorded
activities being performed and later used activity classification to predict EE using the
Compendium of Physical Activities (Ainsworth, Haskell et al. 2011). While this approach
probably represents the best possible criterion in a true free-living setting, it is limited in that the
Compendium is an estimate of activity EE and is not suitable for individual EE prediction. Also,
without imposing some structure in which participants must perform certain activities for a
minimum time, it is likely that participants will spend the majority of their time in activities such
as sitting and walking and minimal or no time performing other activities, limiting the
generalizability of ANNs created from these data. By utilizing a variety of activities across a
wide range of intensities and including all transition data during the visit in our analysis, we
incorporated many advantages of a true free-living setting while also exerting enough control to
ensure that a variety of activities were performed. Additionally, in the simulated free-living
setting we were able to use a portable metabolic analyzer as our criterion measure, which is
widely used as a criterion measure for EE measurement. Another strength of the study was the
use of Microsoft Excel and R statistical software for all stages of data cleaning, feature
computation and extraction, and ANN creation and validation. These software programs are
widely available, and they can be used to create and test machine learning algorithms with
minimal experience in computational programming. Finally, it can sometimes be difficult to
compare results across studies due to differences in protocol, number and types of activities
performed, population used, and modeling approach(es) tested. By simultaneously using four
accelerometers, our study allows for direct comparisons of monitors worn on different places on
the body for accuracy in EE prediction.
Conclusions
In summary, our study provides strong preliminary evidence that machine learning
modeling allows for single accelerometers mounted on the thigh and wrists to provide highly
accurate estimates of EE in a simulated free-living setting. Thigh-mounted accelerometers
appear to perform with slightly better accuracy than hip- or wrist-mounted accelerometers,
although this difference is fairly small. Also, we have shown that choice of wrist (dominant vs.
non-dominant) does not affect accuracy of EE prediction. Finally, our study builds off the work
of others and highlights ways of reducing complexity of ANN model creation, hopefully
allowing for this approach to be used by a wider group of researchers with skills in areas other
than activity measurement. In future studies we plan to extend our comparison of different
placement sites for accuracy of activity classification as well as measurement of SB and sleep
across different populations. Also, we plan to experiment with using data from multiple
monitors to further improve measurement accuracy over that achieved with a single monitor.
Finally, we intend to cross-validate the algorithms developed in the study in a true free-living
setting to provide support for their future use for EE prediction in epidemiologic or surveillance
research.

CHAPTER 4
COMPARISON OF ACTIVITY TYPE CLASSIFICATION ACCURACY FROM
ACCELEROMETERS WORN ON THE WRISTS, HIP AND THIGH

ABSTRACT
The purpose of this study was to develop, validate, and compare the accuracy of activity type
prediction models for accelerometers placed on the wrist, hip, and thigh. Additionally, we
compared classification of activity type between accelerometers worn on the left and right wrists.
Finally, we compared prediction accuracies for specific categories of activities (e.g., sedentary
activities). METHODS: Forty-four healthy adults participated in a 90-minute simulated free-living activity protocol, in which participants performed a total of 14 activities (sedentary,
ambulatory, lifestyle, and exercise activities, standing, cycling, stairs, and non-wear) for 3-10
minutes each. The order, duration, and intensity of activities were dictated by participants and
recorded using direct observation (for a criterion measure of activity type). Four accelerometers
were worn (right and left wrists, right hip, and right thigh) in order to predict activity type using
artificial neural networks. The artificial neural networks were created using several sets of input
features in order to determine those most relevant to activity type prediction. Classification
accuracy of the artificial neural networks was evaluated using sensitivity, specificity, and area
under the curve, with direct observation used as the criterion measure of activity type.
RESULTS: The wrist accelerometers achieved the highest overall classification accuracies for
identifying all 14 activities (80.9-81.1%) as well as when similar activities were grouped into
categories (86.6-86.7%). Additionally, classification accuracies were similar between left and
right wrists. The hip accelerometer had the lowest overall classification accuracies (66.2-72.5%), with the thigh accelerometer accuracy higher than the hip but lower than the wrists
(71.4-84.0%). Sedentary, lifestyle, and exercise activities were detected best with the wrist
accelerometers, whereas the ambulatory activities had similar classification accuracies with all
four accelerometer placements. Unlike our previous work with energy expenditure prediction
(Chapter 3), more input features significantly improved classification accuracy.
CONCLUSIONS: A single accelerometer placed on the left or right wrist provided the highest
overall classification accuracy for activity type prediction as well as the highest accuracy for
sedentary, lifestyle, and exercise activity categories in a simulated free-living setting.

INTRODUCTION
Objective measurement of physical activity (PA) and sedentary behavior (SB) is
important for determining relationships between these lifestyle behaviors and health indices,
identifying populations at high risk of having low PA and high SB levels, and evaluating the
effectiveness of interventions designed to increase PA and/or decrease SB. Because of the
interest (at both a population and personal level) in measuring PA and SB, many wearable
devices, such as heart rate monitors, pedometers, and accelerometers, have been used in an
attempt to quantify PA and SB. Accelerometers have emerged as the most popular method due
to their relatively low participant and researcher burden as well as high accuracy for measuring
physiologic variables such as energy expenditure and activity intensity (Welk 2002). Traditional
use of accelerometers has involved linear regression for predicting energy expenditure from
“activity counts”, which are pre-processed and filtered acceleration signals from an
accelerometer (Freedson, Melanson et al. 1998). However, in recent years the field has started to
move away from the count-based regression approach because linear regression is often
inadequate to capture the complex relationship of acceleration patterns and movement that
occurs in free-living settings (Hendelman, Miller et al. 2000; Swartz, Strath et al. 2000) and
cannot allow for determination of the type of activity being performed (Preece, Goulermas et al.
2009).
A large body of recent research has focused on machine learning, a pattern recognition
approach for modeling data, in order to predict energy expenditure as well as activity type using
features extracted from accelerometer data (Preece, Goulermas et al. 2009). Using machine
learning, researchers have achieved activity classification accuracies consistently over 70%
(Staudenmayer, Pober et al. 2009; Dong, Montoye et al. 2013) and often over 90% (Zhang,
Rowlands et al. 2012; Cleland, Kikhia et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et
al. 2014) using data from a single accelerometer worn on different parts of the body. Although
accelerometers have traditionally been worn on the hip, machine learning has yielded high
activity classification accuracy from accelerometers worn on the wrist, thigh, and ankle (Cleland,
Kikhia et al. 2013; Dong, Montoye et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et al.
2014). Of the many placement sites tested in different studies, the wrist and thigh hold
significant promise in the context of machine learning approaches to data analysis.
Wrist-mounted accelerometers have been used successfully for sleep measurement (Jean-Louis, Kripke et al. 2001) and are also being used in the 2011-2014 cycle of NHANES data
collection in the hope of improving compliance (Troiano and McClain 2012). Moreover, several
studies have achieved high accuracy for activity type classification with wrist accelerometers
(Zhang, Rowlands et al. 2012; Cleland, Kikhia et al. 2013; Mannini, Intille et al. 2013), further
demonstrating the utility of the wrist as a promising measurement site. The convention is for
wrist-mounted accelerometers to be worn on the non-dominant wrist, but it may be that this
convention is unnecessary. A study by Zhang et al. found similar classification accuracies from
accelerometers worn on the left and right wrists for four types of activities (sedentary, household,
walking, and running). However, it is unknown if other kinds of activities, especially activities
that may vary considerably between dominant and non-dominant hands (e.g., sweeping,
computer use, etc.), are detected with similar accuracy for each wrist. If choice of wrist
placement does not affect measurement accuracy, there may be important implications for
improving compliance and comfort with wrist-worn accelerometers.
The thigh possesses significant potential as an accelerometer placement site due to
the high accuracy of thigh-mounted accelerometers for measuring SB (Grant, Ryan et al. 2006; Kozey-Keadle, Libertine et al.
2011; Lyden, Kozey Keadle et al. 2012) and high accuracy for prediction of energy expenditure
(Chapter 3). Studies have found that activity type classification accuracies from a thigh-mounted
accelerometer are similar to or higher than accuracies achieved with a wrist-mounted
accelerometer (Cleland, Kikhia et al. 2013; Dong, Biswas et al. 2013; Skotte, Korshoj et al.
2014), but it is unknown if the thigh can provide high measurement accuracies across a wide
variety of activities.
Despite the utility of the hip, wrist, and thigh as accelerometer placement sites, few
studies have directly compared activity classification accuracies among these sites to determine
their overall accuracy as well as classification of different types of activities such as sedentary,
ambulatory, or lifestyle activities. One study by Cleland et al. found classification accuracies
above 95% for accelerometers located on the hip, wrist, and thigh, but the small number of
activities performed and small number of participants limits our understanding of the advantages
and disadvantages of each placement site. Other studies by Dong et al. (Dong, Montoye et al.
2013) and Skotte et al. (Skotte, Korshoj et al. 2014) compared two of these three placement sites
but did not compare all three. Therefore, further research is needed to directly compare
classification accuracies of hip-, wrist-, and thigh-mounted accelerometers.
Another research gap is the lack of machine learning algorithm validation for activity
type classification in free-living settings. Most previous studies have been conducted in
laboratory-based settings with participants performing a set list of activities in a pre-specified
order, for a pre-specified period of time, and at a constant, pre-specified intensity (Cleland,
Kikhia et al. 2013; Dong, Montoye et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et al.
2014). These laboratory-based settings ensure that high control is exerted over the protocol and
can provide valuable insight as to the strengths and weaknesses of predictive algorithms for
classifying different types and intensities of activities. However, the lack of variation allowed in
laboratory-based protocols makes the laboratory setting very different from a free-living
environment, where individuals are not constrained to a certain order, intensity, or timing of
activities. Previous work with cut-points as well as machine learning provides evidence that
predictive techniques validated in the laboratory perform with much lower accuracy when
applied to a free-living setting (Swartz, Strath et al. 2000; Gyllensten and Bonomi 2011; Lyden,
Keadle et al. 2013). In one such study, Lyden et al. (Lyden, Keadle et al. 2013) found that an
ANN created from laboratory-based activity data had a bias of over 33% when used to predict
energy expenditure from data collected in a free-living setting. Similarly, a study by Gyllensten
et al. showed that activity type machine learning algorithms developed in a laboratory had
classification accuracies 15-20% lower in a free-living setting than in the laboratory-based
setting in which they were created (Gyllensten and Bonomi 2011). Therefore, activity type
prediction algorithms need to be created and validated in a free-living setting in order to have
true utility for activity measurement in epidemiologic, surveillance, or intervention studies.
Given the current gaps that exist with regard to activity type classification, the purposes
of our study were 1) to develop and validate ANNs (using several sets of features) for prediction
of activity type from accelerometers worn on the wrists, hip, and thigh, 2) to compare the activity
classification accuracies achieved among these accelerometer placement sites, 3) to compare the
overall activity classification accuracies of accelerometers placed on the left and right wrists, and
4) to compare classification accuracies for specific activity types, activity categories (i.e.,
lifestyle, exercise, sedentary, and ambulatory activities), and activity intensities (i.e., sedentary,
light, etc.) using data collected in a simulated free-living setting.

METHODS
Summary of protocol
Participants came to the Human Energy Research Laboratory to participate in a 90minute simulated free-living protocol, for which they performed a total of 14 sedentary,
ambulatory, lifestyle, and exercise activities. Each activity was performed for between 3-10
minutes, with the order, duration, and intensity of activities left up to participants. During the
protocol, participants wore four accelerometers, and the order and durations of their activities
were recorded by a trained observer and used as a criterion measure of activity type.
Participants
A total of 44 adults (22 male, 22 female) were recruited from the area surrounding East
Lansing, MI via email, flyers, and word of mouth for participation in this study. Participants
had to fulfill three criteria to be eligible for the study: 1) they had to be free of health
conditions preventing them from being able to safely perform moderate- or vigorous-intensity
activities, 2) they could not have any orthopedic limitations that would invalidate the use of
accelerometry for activity measurement, and 3) they had to fall within the age range of 18-44
years. Prior to participant recruitment, this study was approved by the Michigan State
University Institutional Review Board. All participants provided written informed consent
prior to their participation in the study.
Instrumentation
The activity monitors used in this study were ActiGraph GT3X+ accelerometers and
GENEActiv accelerometers. Additionally, an iPAQ portable digital assistant (PDA) computer was
used by observers to record the activities performed during the protocol. The acceleration data for
all four accelerometers were time stamped and stored within the monitors and later were
downloaded to a computer for analysis. Additionally, the accelerometers were oriented so that the
x-axis was the vertical axis, the y-axis was the medial-lateral axis, and the z-axis was the anterior-posterior axis. All accelerometers and the PDA were synchronized to an external clock prior to the
start of data collection. Descriptions of the accelerometers and PDA follow.
ActiGraph accelerometers
The ActiGraph (ActiGraph LLC, Pensacola, FL) is a commonly used, commercially
available accelerometer, and there is an abundance of literature regarding its reliability and validity
for measurement of PA (Freedson, Melanson et al. 1998; Matthew 2005; McClain, Sisson et al.
2007). Two GT3X+ models were worn by each participant during the study. One accelerometer
was placed on the midline of the right thigh, one third of the way between the hip and knee and
adhered to the leg with hypoallergenic sticky tape. The other ActiGraph was mounted on the right
hip, at the anterior axillary line, with an elastic belt. The ActiGraph GT3X+ records raw
accelerations of up to ± 6 times the gravitational force (6g) in three axes of movement. For the
current protocol, the accelerometers were set to record data at a rate of 40 samples per second (40
Hz).
GENEA accelerometers
The GENEActiv (Activinsights Ltd, Kimbolton, Cambridgeshire, UK) is a new
accelerometer that has had preliminary validation for PA measurement (Esliger, Rowlands et al.
2011) as well as activity type classification (Zhang, Rowlands et al. 2012). Like the ActiGraph,
the GENEA records raw data of up to ± 6g in three axes of movement. The GENEAs were set to
record acceleration data at a rate of 20 Hz for the current study. Participants wore two GENEA
accelerometers (one on each wrist) for this study. Each GENEA was fastened securely to the
dorsal side of the wrist, between the styloid processes of the radius and ulna (Esliger, Rowlands et
al. 2011).
iPAQ portable digital assistant and direct observation
Direct observation (DO) was conducted using an HP iPAQ PDA (HP Development
Company, Palo Alto, CA) to obtain a criterion measure of activity type for this study. During the
protocol, a trained observer used a portable digital assistant with BEST software developed based
on the Children’s Activity Rating Scale protocol (Puhl, Greaves et al. 1990). The codes
T1-T14 represented the 14 activities in the visit, and the observer recorded the activities being
performed continuously as they occurred throughout the visit. A list of activities and their specific
DO codes can be found in Table 4.1. Inter-rater reliability for DO was above r=0.90 for this study.
Procedure
Each participant reported to the Human Energy Research Laboratory, where details of the
study were discussed with each participant. Written informed consent was obtained, and a
physical activity readiness questionnaire was administered to ensure that the participant was
healthy and had no contraindications to engaging in activity. If participants had answered ‘yes’ to
any question on the questionnaire, they would have been required to obtain physician approval
before being able to participate in the study; however, this did not occur. After consenting to
participation, participant weight and height were taken by trained research assistants according to
standardized methods (Malina 1995). Weight was measured to the nearest 0.1 kg using a Seca
digital scale (Seca, Hanover, Germany), with shoes off and weight balanced on the center of the
scale. Height was measured to the nearest 0.1cm using a Harpenden stadiometer (Holtain Ltd.,
Crymych, United Kingdom). For measurement of height, the participant removed his/her shoes,
stood erect with feet flat on the floor, aligned head in the Frankfurt plane, and placed the back of
the feet, shoulders, and head against the back of the board. Two measurements were taken and
averaged for both weight and height. If the two body weights differed by more than 0.3 kg or if
the two heights differed by more than 0.4 cm, a third measurement was taken, and the closest two
measurements were averaged to obtain a final value. Body mass index (BMI) was calculated by
dividing body weight by the square of height (kg/m2). Age was assessed by asking participants to
state their age in years. Handedness was determined by asking participants which hand they prefer
to use for the majority of activities.
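
The averaging rule for duplicate measurements and the BMI calculation described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names are hypothetical, the example readings are invented, and the tie-breaking choice (preferring the lower pair when both gaps are equal) is an assumption not stated in the text.

```python
def final_measurement(first, second, tolerance, third=None):
    """Average two readings; if they differ by more than `tolerance`,
    average the two closest of three readings instead."""
    if abs(first - second) <= tolerance:
        return (first + second) / 2.0
    if third is None:
        raise ValueError("readings differ by more than the tolerance; a third is required")
    readings = sorted([first, second, third])
    # After sorting, the closest pair must be adjacent.
    low_gap = readings[1] - readings[0]
    high_gap = readings[2] - readings[1]
    # Assumption: on a tie, the lower pair is used.
    pair = readings[:2] if low_gap <= high_gap else readings[1:]
    return sum(pair) / 2.0


def body_mass_index(weight_kg, height_cm):
    """BMI in kg/m^2 from weight in kilograms and height in centimeters."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


# Invented readings: weights agree within 0.3 kg; heights differ by more than 0.4 cm.
weight = final_measurement(70.0, 70.2, tolerance=0.3)
height = final_measurement(175.0, 175.6, tolerance=0.4, third=175.5)
bmi = body_mass_index(weight, height)
```

Here the two weights are simply averaged, whereas the heights trigger the third-measurement rule and the closest pair (175.5 and 175.6 cm) is averaged.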
Each participant wore one ActiGraph on the hip, another ActiGraph on the thigh, one
GENEA on the left wrist, and one GENEA on the right wrist while performing 14 activities
(activities shown in Table 4.1). These activities comprised a range of intensities from sedentary to
vigorous and represented a mixture of sedentary, ambulatory, exercise, and lifestyle activities. Ambulatory
activities (walking and jogging) are common in accelerometer validation literature; we added the
sedentary, exercise, and lifestyle activities to determine the potential for the four accelerometer
placements to accurately measure a range of activity types often seen in free-living settings.
Additionally, we added an activity where participants removed the accelerometers so that the
ANNs would be able to recognize non-wear, which is important to be able to detect in free-living
environments for compliance purposes.

Table 4.1. Activities performed during the simulated free-living protocol.
Activity Category | Activity | Activity Intensity | Description of Activity*
Sedentary (SE) | Lying down (T1) | Sedentary | Lying on a mat on the floor
Sedentary (SE) | Reading (T2) | Sedentary | Reading a magazine article while sitting at a table
Sedentary (SE) | Computer (T3) | Sedentary | Sitting and playing a computer game that involves mouse clicking and typing
Standing (ST) | Standing (T4) | Light** | Standing still with arms at sides
Lifestyle (LI) | Laundry (T5) | Light | Folding towels and putting them in a laundry basket
Lifestyle (LI) | Sweeping (T6) | Light | Sweeping confetti into piles
Leisure walk (LW) | Walking slow (T7) | Light | Walking at a self-selected ‘slow’ pace in a hallway
Brisk walk (BW) | Walking fast (T8) | Moderate | Walking at a self-selected ‘brisk’ pace in a hallway
Jogging (JO) | Jogging (T9) | Vigorous | Jogging at a self-selected pace in a hallway
Cycling (CY) | Cycling (T10) | Moderate/Vigorous | Cycling on a cycle ergometer at a self-selected cadence of 50-100 rpm with 1 kg resistance
Stair use (SU) | Stair climbing and descending (T11) | Moderate/Vigorous | Walking up and down a flight of stairs at a self-selected pace
Exercise (EX) | Biceps curls (T12) | Light | Standing still while doing biceps curls with a 3-lb. weight in each hand
Exercise (EX) | Squats (T13) | Moderate | With feet shoulder-width apart, bending at the knees (to a 90° angle) while holding an unweighted broom behind the head
Non-wear (NW) | Non-wear of accelerometer (T14) | N/A | Not wearing the accelerometer

* Activity order, intensity, and duration (3-10 minutes) were left up to participants.
** Standing has traditionally been considered SB; however, recent literature suggests that standing
should be considered light-intensity instead of SB due to the differential physiologic effects of
standing as compared to sitting/lying (Owen, Healy et al. 2010).

Participants completed a 90-minute simulated free-living protocol, which took place in a
laboratory within the Human Energy Research Laboratory as well as in a hallway and stairwell.
During the protocol, participants performed the 14 activities listed in Table 4.1. A list of these
activities was given to participants at the beginning of the visit along with a description of how to
perform each activity. Participants completed each of the 14 activities for a total of at least three
minutes and for no more than 10 minutes, but the order, intensity, and duration of the activities
were left up to each participant. A research assistant directly observed and recorded each activity
on a handheld PDA computer while it was being performed. Additionally, activities were written
on a whiteboard and checked off as participants completed each activity so that participants knew
which activities they still needed to complete. Every 4-5 participants, the activities were erased
and rewritten in a different order to avoid possible effects from the order in which the activities
were written. For this study, DO served as the criterion measure of activity type performed. The
non-wear activity was saved until the end of the 90-minute protocol so that participants would not
spend a significant amount of time trying to remove and reattach the accelerometers. Upon
completion of the protocol, participants were given a $35 Target® gift card.
Data reduction and modeling
Artificial neural networks
ANNs are nonlinear models which take a set of inputs x1…xk and use them to predict a
certain output variable y (e.g., EE or activity type), where k is the number of features used to
predict y. Figure 4.1 provides a graphical depiction of one of the activity type ANNs. For activity
type classification, the ANNs functioned similarly to a logistic regression model. Setting the activity
types as the nominal values a1…a14, the ANN model can be seen in Equation 1.

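Equation 1:

$$\Pr(y = a_j \mid x_1, \ldots, x_k) = C \exp\left( \sum_{h=1}^{H} w_{hj}\, U\left( \sum_{i=1}^{k} w_{ih}\, x_i \right) \right), \quad j = 1, \ldots, 14$$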

In Equation 1, Pr is probability, C is a constant chosen so that Pr(y=a1)+…+Pr(y=a14)=1, w are
the weights of the input features, U is a logistic activation function, and H is the number of hidden
units. In accordance with previous research, our models contained only one hidden layer (Preece,
Goulermas et al. 2009; Staudenmayer, Pober et al. 2009; Trost, Wong et al. 2012).
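
To make the structure of such a model concrete, the following is a minimal sketch of the forward pass of a network with one hidden layer, a logistic activation function, and class probabilities normalized to sum to 1. It is illustrative only: the weights are invented toy values rather than fitted ones, intercept (bias) terms are omitted for brevity, and the function and variable names are hypothetical. The sketch also uses 3 classes and 3 hidden units instead of the 14 activity types and 15 hidden units used in the study.

```python
import math


def logistic(z):
    """Logistic activation U(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probabilities(x, w1, w2):
    """Forward pass of a one-hidden-layer network.

    x  : list of k input features
    w1 : H rows, each with k input-to-hidden weights (toy values here)
    w2 : one row of H hidden-to-output weights per activity class
    Returns one probability per class; the probabilities sum to 1.
    """
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    # Subtracting the maximum score leaves the result unchanged but avoids overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)  # dividing by this total plays the role of the constant C
    return [e / total for e in exps]


# Toy example: k = 2 features, H = 3 hidden units, 3 classes.
x = [0.5, -1.0]
w1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
w2 = [[1.0, -1.0, 0.5], [0.0, 0.3, -0.2], [-0.5, 0.8, 0.1]]
probs = predict_probabilities(x, w1, w2)
predicted_class = probs.index(max(probs))
```

The predicted activity type is the class with the largest probability, which mirrors how a multinomial logistic regression model assigns a category.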

Figure 4.1. ANN for predicting activity type.

Figure 4.1 legend
* The number of output variables shown matches the number used in the study, but the
number of input features varied from 8-38, depending on the feature set tested. Additionally,
three hidden units are shown above for simplicity, but we used 15 hidden units for
constructing our ANNs.
Accelerometer signal features (one of each per axis, three total of each per accelerometer)
1. Mean = mean
2. Var = variance
3. Cov = covariance
4. Min = minimum
5. Max = maximum
6. MeanOR = mean accelerometer orientation
7. VarOR = variance of accelerometer orientation
8. 10th %ile = 10th percentile
9. 25th %ile = 25th percentile
10. 50th %ile = 50th percentile
11. 75th %ile = 75th percentile
12. 90th %ile = 90th percentile
Participant characteristics features
13. Ht = participant height
14. Wt = participant weight
Non-feature abbreviations
S = summations of the input layer in the hidden units
U = activation function for the hidden layer
W1 = the weight vectors for each of the inputs
W2 = the weight vectors for each of the summations

The ANNs were created and tested using a leave-one-out approach. In this approach, data
from all but one participant were used to estimate the weights for each input feature for predicting
activity type. Then, the ANN was tested on the data from the participant left out of the training
phase by supplying the input features and comparing the predicted activity type from the ANNs to
the recorded activity type from DO. The leave-one-out cross validation is an iterative approach
and was repeated with each participant’s data used as the testing data once, thereby obtaining an
ANN for activity type for each participant in the study. Weights determined from each iteration of
the leave-one-participant-out validation were averaged to obtain a final ANN for each
accelerometer placement site, resulting in four distinct ANNs.
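
The splitting logic behind this cross-validation can be sketched as below. This is a hedged illustration, not the code used in the study: the function name and participant identifiers are hypothetical, and the model-fitting and weight-averaging steps are left out so that only the train/test partitioning is shown.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_indices, test_indices) once per unique participant."""
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test


# Hypothetical window-level data: each window is tagged with its participant.
participant_ids = ["P01", "P01", "P02", "P02", "P02", "P03"]
folds = list(leave_one_participant_out(participant_ids))
```

Splitting by participant rather than by window keeps all of one person's data out of training when that person is used for testing, which is what allows the test accuracy to reflect performance on a new individual.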
There were two important considerations that were addressed in building our ANNs: 1)
window length and 2) relevant features to use as input variables.
Window length
In order to analyze accelerometer data, it must first be divided into segments, called
‘epochs’ or ‘windows,’ for analysis. By dividing the data into windows, activity type can be
assessed separately for each window to yield information on which activities were being performed
as well as when they were performed. Windows of 60 seconds are commonly used for predicting
energy expenditure while analyzing accelerometer data because summarizing a given energy
expenditure or activity performed each minute is intuitively appealing and works well for steady-state activities (Staudenmayer, Pober et al. 2009; Freedson, Lyden et al. 2011). Additionally,
longer windows (i.e., 30-60 seconds) increase the amount of information available with which to
determine activity type, and they have been shown to improve activity classification accuracy
(Trost, Wong et al. 2012). However, a significant limitation of longer windows is that they are less
useful in free-living situations, where activities rarely start or end exactly on the minute and where
activities may last less than a minute in length (Lyden 2012). Thus, a 60-second window is likely
to encompass more than one activity, resulting in frequent activity misclassification because the
output is too coarse to resolve short activity bouts.
On the other hand, very short windows (e.g., less than one second) may not allow enough
time to capture a movement (e.g., in one second a person may only take part of a step when
walking), therefore yielding insufficient information to classify the movement and resulting in
lower classification accuracy (Preece, Goulermas et al. 2009; Trost, Wong et al. 2012). Machine
learning techniques have been conducted with window lengths as short as 0.25 seconds, but many
studies use window lengths of 4-6.7 seconds for classifying activity type (Preece, Goulermas et al.
2009). Therefore, in accordance with previous research and for simplicity, we employed five-second windows for activity type classification in our data processing and analyses.
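
As a rough illustration of the windowing step, the sketch below splits a single-axis signal sampled at 20 Hz into consecutive five-second windows of 100 samples each. The function name and the synthetic signal are hypothetical, and two choices are assumptions not stated in the text: windows do not overlap, and a trailing partial window is dropped.

```python
def split_into_windows(samples, sample_rate_hz=20, window_seconds=5):
    """Split a sequence into consecutive, non-overlapping windows.

    Assumption: a trailing partial window is discarded.
    """
    size = sample_rate_hz * window_seconds
    n_full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_full)]


# One minute of synthetic single-axis data at 20 Hz, plus 30 leftover samples.
signal = [0.01 * i for i in range(20 * 60 + 30)]
windows = split_into_windows(signal)
```

One minute of data yields 12 windows, so an activity lasting only 20 seconds would still fill four complete windows instead of being absorbed into a single 60-second window.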
Features
There are several different types of features that can be used as input variables: time-domain features, frequency-domain features, and participant characteristics. Time-domain features
are most commonly used because they can be directly computed from the accelerometer signal
data, making them simple to extract and understand (Preece, Goulermas et al. 2009; Staudenmayer,
Pober et al. 2009). Examples of time-domain features are mean, variance, covariance, and
percentiles of the acceleration signal. The other main type of features, frequency-domain features,
can be used either in conjunction with or independently of time-domain features, yielding
accuracy for activity type classification similar to that of time-domain features in some studies
(Preece, Goulermas et al. 2009; Mannini, Intille et al. 2013). However, frequency-domain features
require mathematical transformations prior to computation and may require significant
computational power and specialized statistical software (Preece, Goulermas et al. 2009).
Additionally, several studies provide evidence that time-domain features can be used to achieve
high activity classification accuracy (71-99%) from a single accelerometer without use of
frequency-domain features (Herren, Sparti et al. 1999; Staudenmayer, Pober et al. 2009; Dong,
Montoye et al. 2013; Montoye, Dong et al. 2013). Other than time- and frequency-domain
features, simple descriptive features, such as accelerometer orientation or participant demographic
variables, can also be used to improve measurement accuracy.
Many accelerometer signal features have been used in previous research, and the models
created have varied considerably in complexity and measurement accuracy. Adding more features
may improve accuracy of the ANN; however, similarly to linear regression, addition of too many
input variables may lead to overfitting ANNs to the data used for training, resulting in poor
generalizability of the model when applied to a new population. Therefore, there must be a
balance of number of features used and accuracy achieved. Another consideration of adding too
many features is that it increases complexity of the models created and requires more
computational power to create. This added complexity can quickly render machine learning
models difficult to create or use for anyone lacking experience with computer programming and/or
access to expensive computing software (Pober, Staudenmayer et al. 2006; Rothney, Neumann et
al. 2007; Staudenmayer, Pober et al. 2009). Thus, we experimented with different sets of features
to determine a set that had high accuracy of measurement without being overly complex.
Before computing features, the 40 Hz data from the ActiGraph accelerometers were
reintegrated to 20 Hz for comparison with the data from the GENEA. Table 4.2 provides a list of
the 38 features tested and used in the current analyses. The 36 accelerometer features (12 features
for each of three axes) are all time-domain features that have been effectively utilized in previous
studies, and height and weight were included to account for different body sizes. Since five-second
windows were used for activity type classification, there were 100 accelerometer data
points within each five-second window with which to calculate the necessary features (20
samples/second * 5 seconds). Mean, variance, covariance, minimum, maximum, mean and
variance of monitor orientation, and the 10th, 25th, 50th, 75th, and 90th percentiles of the acceleration
signal were calculated separately for x-, y-, and z-axes. After creating the ANNs using all 38
features, follow-up analyses were conducted to determine if a subset of features could reduce
complexity of the ANNs with minimal loss of accuracy. The subsets tested are shown in Table
4.3. Additionally, feature sets 1 and 2 were tested with and without height and weight included as
input features to determine if including demographic characteristics impacted classification
accuracy.
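The feature computation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the use of sample (n - 1) variance and covariance is an assumption, the pairing of axes for the covariance feature is an assumption, and the orientation formula (angle between one axis and the resultant acceleration vector) is an assumption because the text does not state it. Percentiles follow the "sort and pick the n-th value" rule given in Table 4.2.

```python
import math
import statistics

SAMPLES_PER_WINDOW = 100  # 20 samples/second * 5 seconds


def nth_smallest(values, n):
    """Sort ascending and pick the n-th value (1-based), as Table 4.2 describes."""
    return sorted(values)[n - 1]


def sample_covariance(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def orientation_angles(axis, magnitudes):
    """ASSUMPTION: orientation is the angle (degrees) between the axis and the
    resultant acceleration vector; the source does not give the formula."""
    angles = []
    for value, magnitude in zip(axis, magnitudes):
        ratio = value / magnitude if magnitude > 0 else 0.0
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, ratio)))))
    return angles


def axis_features(axis, paired_axis, magnitudes):
    """Return the 12 per-axis features in the order used by Table 4.2."""
    angles = orientation_angles(axis, magnitudes)
    return [
        statistics.fmean(axis),
        statistics.variance(axis),
        sample_covariance(axis, paired_axis),
        min(axis),
        max(axis),
        nth_smallest(axis, 10),
        nth_smallest(axis, 25),
        nth_smallest(axis, 50),
        nth_smallest(axis, 75),
        nth_smallest(axis, 90),
        statistics.fmean(angles),
        statistics.variance(angles),
    ]


def window_features(ax, ay, az, height_cm, weight_kg):
    """Build the 38-element input vector for one five-second window."""
    if not (len(ax) == len(ay) == len(az) == SAMPLES_PER_WINDOW):
        raise ValueError("expected 100 samples per axis")
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]
    # ASSUMPTION: covariance pairs each axis with the next one (x-y, y-z, z-x).
    per_axis = [
        axis_features(ax, ay, magnitudes),
        axis_features(ay, az, magnitudes),
        axis_features(az, ax, magnitudes),
    ]
    # Interleave so features 1-3 are the x, y, z means, 4-6 the variances, etc.
    features = [per_axis[a][k] for k in range(12) for a in range(3)]
    return features + [height_cm, weight_kg]
```

Dropping entries from the list returned by `axis_features` gives the smaller feature sets; for example, keeping only the mean and variance yields 2 features per axis, or 8 inputs once height and weight are appended.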

Table 4.2. Features used for EE and activity type prediction.
Feature number   Feature used                                  Formula for calculating feature
1-3*             Mean acceleration signal                      [formula not legible in source]
4-6*             Variance of acceleration signal               [formula not legible in source]
7-9*             Covariance of acceleration signal             [formula not legible in source]
10-12*           Minimum of acceleration signal                [formula not legible in source]
13-15*           Maximum of acceleration signal                [formula not legible in source]
16-18*           10th percentile of acceleration signal        For every 100 accelerations, arrange in order from smallest to largest and pick the 10th value
19-21*           25th percentile of acceleration signal        For every 100 accelerations, arrange in order from smallest to largest and pick the 25th value
22-24*           50th percentile of acceleration signal        For every 100 accelerations, arrange in order from smallest to largest and pick the 50th value
25-27*           75th percentile of acceleration signal        For every 100 accelerations, arrange in order from smallest to largest and pick the 75th value
28-30*           90th percentile of acceleration signal        For every 100 accelerations, arrange in order from smallest to largest and pick the 90th value
N/A              Accelerometer orientation (needed for         [formula not legible in source]
                 calculating features 31-36)
31-33*           Mean accelerometer orientation                [formula not legible in source]
34-36*           Variance of accelerometer orientation         [formula not legible in source]
37               Participant height                            N/A
38               Participant weight                            N/A
* Signifies that one feature is included for each of the three accelerometer axes. The formulas
shown are for the x-axis, but the formulas for the y- and z-axes are similar. Ax is the acceleration in
the direction of the x-axis.

Table 4.3. Feature sets used for creation and testing of ANNs.
Feature set number   Features used                                     Total number of features used
1                    Mean, variance, covariance, minimum, maximum,     38 (12 accelerometer features/axis
                     mean orientation, variance of orientation, and    * 3 axes + weight + height)
                     10th, 25th, 50th, 75th, and 90th percentiles of
                     acceleration signal, weight, and height
2                    Mean and variance of acceleration signal,         8 (2 accelerometer features/axis
                     weight, and height                                * 3 axes + weight + height)
3                    Mean, variance, minimum, and maximum of           14 (4 accelerometer features/axis
                     acceleration signal, weight, and height           * 3 axes + weight + height)
4                    Mean, variance, covariance, minimum, and          17 (5 accelerometer features/axis
                     maximum of acceleration signal, weight, and       * 3 axes + weight + height)
                     height
5                    10th, 25th, 50th, 75th, and 90th percentiles of   17 (5 accelerometer features/axis
                     acceleration signal, weight, and height           * 3 axes + weight + height)


Activity type classification
Although 14 activities were performed in the protocol, some activities could be combined
into common groupings. Differentiating among the sitting activities (computer use and reading)
and lying may be difficult using a single accelerometer on the thigh or hip since thigh movement
was minimal and thigh and hip orientation were similar for all three activities. However, these
sedentary activities elicit similar physiologic responses (Bey and Hamilton 2003), so
differentiation among them was not of central importance in this study. Therefore, these three
activities were grouped into a ‘sedentary’ category. Conversely, standing, which is considered SB
in most studies since its energy cost is less than 1.5 METs (Ainsworth, Haskell et al. 2011),
requires significant postural muscle contraction and elicits different physiologic responses from
sitting and lying down (Hamilton, Hamilton et al. 2004; Hamilton, Hamilton et al. 2007). Thus,
standing had its own category, separate from the sedentary category. Squats and biceps curls are
both exercise activities, so they were grouped into an ‘exercise’ category. Finally, laundry and
sweeping are both lifestyle activities that are light intensity and involve intermittent movement of
both the upper and lower body. Thus, these were combined into the ‘lifestyle’ category. The rest
of the activities had their own categories. In summary, we evaluated activity classification
accuracy for all 14 activities and then for the 10 categories. These 10 categories are displayed in
the leftmost column of Table 4.1. It is important to note that these categories were not meant to
imply that certain types of activities could not be in a different category (e.g., walking and running
are often used for exercise rather than ambulation). Rather, these categories were developed to
group similar activities to offer a better idea of the utility of the ANNs for activity classification
accuracy. Additionally, a third grouping was performed by grouping activities into intensity
categories (sedentary, light, moderate, vigorous) in order to determine how well the ANNs can
predict the relative intensity of an activity.
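The grouping of the 14 activity labels into 10 categories can be written as a simple lookup. This is a sketch for illustration; the label strings and the helper name `to_categories` are hypothetical, and only the grouping itself comes from the text.

```python
# Mapping from the 14 activity labels to the 10 categories described above.
ACTIVITY_TO_CATEGORY = {
    "lying": "sedentary",
    "reading": "sedentary",
    "computer": "sedentary",
    "standing": "standing",
    "laundry": "lifestyle",
    "sweeping": "lifestyle",
    "biceps curls": "exercise",
    "squats": "exercise",
    "walk slow": "walk slow",
    "walk fast": "walk fast",
    "jogging": "jogging",
    "cycling": "cycling",
    "stairs": "stairs",
    "non-wear": "non-wear",
}


def to_categories(labels):
    """Translate a sequence of activity labels into category labels."""
    return [ACTIVITY_TO_CATEGORY[label] for label in labels]
```

Applying `to_categories` to both the observed and the predicted labels before scoring means that a reading window predicted as computer use counts as a correct 'sedentary' classification.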
Identifying non-wear
Non-wear was classified as a separate activity type from the 13 other activities performed
by the participants in the 90-minute free-living simulation. By creating a distinct category and
training the ANNs to recognize non-wear, we hoped to eliminate the need to establish coding rules
for how many minutes of consecutive zero counts determine non-wear when accelerometers are
worn in free-living settings (Masse, Fuemmeler et al. 2005; Evenson and Terry 2009).

Direct observation
With the BEST software, DO data were recorded instantaneously and were not reintegrated
into predefined windows for analysis. Therefore, within each five-second window of
accelerometer data, there were one or more activities performed. When participants transitioned
between activities, usually the activity transition occurred in the middle of a five-second window
(as opposed to perfectly at the end of one window/start of another), meaning that it was not
possible to accurately predict the activity performed in the transition. Therefore, each five-second window in which a transition between activities occurred was removed from
the data set before training and testing the activity type ANNs. The removal of transition windows
was necessary for validation purposes; when implemented in a free-living setting, transitions
between activities and multiple activities performed in a single window cannot be classified
correctly since predictive models can only predict one activity for a window. To minimize this
issue, we used five-second windows instead of the longer windows used in many previous studies
(Rothney, Neumann et al. 2007; Preece, Goulermas et al. 2009; Staudenmayer, Pober et al. 2009).
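The removal of transition windows can be sketched as below. The sketch assumes the direct observation record has already been expanded to one activity label per accelerometer sample (100 labels per five-second window at 20 Hz); that preprocessing step and the function name are assumptions made for illustration.

```python
def label_windows(sample_labels, samples_per_window=100):
    """Return one label per complete window, or None if the window contains
    more than one activity (that is, a transition occurred inside it)."""
    labels = []
    last_start = len(sample_labels) - samples_per_window
    for start in range(0, last_start + 1, samples_per_window):
        window = sample_labels[start:start + samples_per_window]
        labels.append(window[0] if len(set(window)) == 1 else None)
    return labels


# 150 samples of standing followed by 150 samples of slow walking:
# the middle window straddles the transition and is dropped.
samples = ["standing"] * 150 + ["walk slow"] * 150
window_labels = label_windows(samples)
kept = [i for i, label in enumerate(window_labels) if label is not None]
```

Only the windows listed in `kept` would be passed on to training and testing.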
Statistical analyses
Classification accuracies were determined by calculating the sensitivity, specificity, and
area under the curve (AUC) for each ANN. Operational definitions of these three variables follow.
Sensitivity: Sensitivity refers to the ability of each ANN to correctly classify an activity when it
occurs (Parikh, Mathai et al. 2008). It represents the proportion of times an activity was predicted
when it actually occurred. Percent agreement, which is equivalent to sensitivity, is most often
reported in the literature for defining classification accuracy and was our primary measure of
classification accuracy in this study.
Specificity: Specificity refers to the ability of each ANN to correctly classify an activity as not
occurring when it does not occur. It represents the proportion of times an activity was not
predicted when the activity, in fact, did not occur (Parikh, Mathai et al. 2008).
Area under the curve: AUC is the area under the receiver operating characteristic curve created by
graphing sensitivity of a variable on the y axis and 1-specificity on the x axis. A value of 1.00
represents perfect classification accuracy and a value of 0.50 represents accuracy which is no better
than what would be attained from chance alone. According to Metz, AUC values of ≥ 0.90 are
considered excellent, 0.80-0.89 are good, 0.70-0.79 are fair, and <0.70 are considered poor
classification accuracy (Metz 1978).
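The Metz thresholds translate directly into a small lookup. The AUC helper below is an assumption: the text states only that AUC was calculated from sensitivity and specificity in Microsoft Excel, and for a single sensitivity/specificity pair the trapezoidal area under the ROC curve through (0, 0), (1 - specificity, sensitivity), and (1, 1) reduces to their average. Function names are hypothetical.

```python
def single_point_auc(sensitivity, specificity):
    """Trapezoidal AUC for an ROC curve with one operating point.

    ASSUMPTION: the ROC curve is the two line segments joining (0, 0),
    (1 - specificity, sensitivity), and (1, 1). Inputs are proportions (0-1).
    """
    return (sensitivity + specificity) / 2.0


def metz_rating(auc):
    """Qualitative rating of an AUC value using the Metz (1978) cut-points."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    return "poor"
```

For instance, a sensitivity of 0.664 and a specificity of 0.974 give an area of about 0.82 under this assumption, which falls in the "good" band.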
Sensitivities, specificities, and AUC were calculated from each accelerometer and each
iteration of the leave-one-out validation. Differences among the hip, wrists, and thigh
accelerometers were evaluated using repeated measures analysis of variance (RMANOVA). If the
RMANOVA test statistic was significant, post hoc tests were conducted using dependent-samples
t-tests with a least significant difference (LSD) correction in order to account for multiple
comparisons and avoid inflation of type I error. Additionally, RMANOVA was used to compare
classification accuracies among different feature sets. The a priori alpha level was set at α = 0.05.
After running primary analyses for the left- and right-wrist accelerometer placements, the data set
was rearranged to compare dominant vs. non-dominant wrist placements, and dependent-samples
t-tests were run to compare overall sensitivities for the dominant vs. non-dominant wrist.
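The leave-one-out validation mentioned above can be sketched as a generator of train/test splits. This assumes the unit left out in each iteration is a participant (all of that participant's windows are held out together); the text does not spell this out, and the function name is hypothetical.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices) for each participant.

    participant_ids holds one entry per window, naming the participant
    who produced that window.
    """
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test


# Five windows from three hypothetical participants.
ids = ["p1", "p1", "p2", "p3", "p3"]
folds = list(leave_one_participant_out(ids))
```

Each fold would train a model on `train`, score it on `test`, and contribute one set of sensitivity, specificity, and AUC values to the comparisons among placements.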
Confusion matrices were created for each of the four accelerometer placements, with the
actual activity performed as the rows of each matrix and the predicted activity as the columns of
each matrix.
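A confusion matrix with this layout (rows for the actual activity, columns for the predicted activity) can be tallied from paired label lists. The sketch below uses hypothetical labels and is only meant to show the row/column convention.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a square matrix: rows are actual labels, columns are predictions."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


labels = ["lying", "reading", "standing"]
actual = ["lying", "lying", "reading", "standing"]
predicted = ["lying", "reading", "reading", "lying"]
matrix = confusion_matrix(actual, predicted, labels)
```

Here `matrix[0]` is the row for windows in which lying actually occurred, and the diagonal entries count the correctly classified windows.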

Power analysis
We desired 80% power to detect a difference of at least moderate effect size (ES=0.5)
among accuracies of accelerometers compared to the criterion measure. Therefore, with the α level
set at α = 0.05, we needed 34 participants to be sufficiently powered to detect a moderate effect
size difference among groups. We chose to oversample by 10 participants in order to have
adequate sample size despite an expected loss of a few participants due to the possibility of
equipment malfunction, especially when using multiple accelerometers, a handheld computer, and
a portable metabolic analyzer (used for a different aim of the study).

RESULTS
Data were collected from 44 participants for the current study. However, significant data
loss from the accelerometers occurred in two participants, and there was an Oxycon portable
metabolic analyzer malfunction in three other participants which resulted in premature
termination of data collection. The remaining 39 participants who completed the 90-minute
protocol and had usable data were included in analyses (shown in Table 4.4). Those excluded
from the analysis were not statistically different from those included in terms of demographic
characteristics.
Table 4.4. Demographic characteristics of participants enrolled in study.
               All (n=39)     Males (n=19)   Females (n=20)
Age (years)    22.1 (4.3)     23.7 (5.0)     20.5 (2.7)
Weight (kg)    72.4 (16.2)    84.5 (13.1)    60.8 (8.9)
Height (cm)    171.4 (10.1)   179.1 (7.7)    164.1 (5.7)
BMI (kg/m²)    24.4 (3.6)     26.3 (3.4)     22.5 (2.6)
Data are displayed as mean (SD).

As most studies present classification accuracy in terms of sensitivity only, we present
the first part of our analysis in terms of sensitivity. Sensitivities for each accelerometer
placement are shown in Figure 4.2. The sensitivities were as high as 80.9% and 81.1% for the
left and right wrist accelerometer placements, respectively, with feature set 1. Both wrist
placements had significantly higher sensitivities than the thigh or hip placements, and this
difference existed for all five sets of features tested. Additionally, the thigh placement had
significantly higher sensitivity than the hip placement for all five feature sets. For all five sets of
features, the two wrist placements achieved similar overall sensitivities. Finally, feature sets 1
and 2 were modified to exclude height and weight as input features to determine if these
demographic characteristics affected classification accuracy. For both feature sets, classification
accuracies were unchanged by excluding height and weight as predictor variables.
Figure 4.2 also shows comparisons of classification accuracies achieved among the five
feature sets. For all four accelerometers, feature set 1 provided the highest sensitivity, and feature
set 2 provided the lowest sensitivity. Additionally, the ANNs created with feature sets 4 and 5
provided sensitivities similar to that achieved using feature set 1, but improvements from feature
set 2 were no longer statistically significant for the wrist accelerometers. The ANNs created
from feature set 3 yielded similar sensitivities to feature set 1 for the thigh and both wrist
accelerometers but significantly lower sensitivity with the hip accelerometer. Additionally,
inclusion and exclusion of height and weight as input variables had no effect on classification
accuracy. Due to the superiority of feature set 1 compared to the other feature sets, further
analyses were performed using feature set 1.

Figure 4.2. Sensitivity for the four accelerometer placements, compared among feature sets.
[Bar chart. Y-axis: Sensitivity (%), 0.0 to 100.0. X-axis: Feature Set (1 to 5). Series: ActiGraph Hip, ActiGraph Thigh, GENEA L. Wrist, GENEA R. Wrist.]

The * indicates significant differences from all other accelerometer placement sites.
The † indicates significant differences from feature set 1 (all 38 features).
The ^ indicates significant differences from feature set 2.

Table 4.5 provides a comparison of the sensitivity, specificity, and AUC of each
accelerometer placement using feature set 1. All three measures were significantly higher for
the two wrist-mounted accelerometers than the thigh or hip accelerometer placements, and all
three were also significantly higher for the thigh than the hip placement. The magnitude of
differences was much larger for sensitivity than specificity, which was consistently high across
all accelerometer placements. With AUC values of 0.90, both wrists achieved excellent
classification accuracy; in contrast, the hip and thigh placement sites achieved good
classification accuracy with AUC values of 0.82 and 0.84, respectively.

Table 4.5. Overall sensitivity, specificity, and AUC for each of the four accelerometer
placements for feature set 1.
                  ActiGraph Hip      ActiGraph Thigh     GENEA Left Wrist    GENEA Right Wrist
Sensitivity (%)   66.4 (65.9-66.8)   71.7 (71.3-72.1)*   81.3 (81.0-81.7)^   81.4 (81.1-81.8)^
Specificity (%)   97.4 (97.2-97.5)   97.8 (97.7-97.9)*   98.5 (98.4-98.6)^   98.5 (98.4-98.7)^
AUC               0.82 (0.80-0.84)   0.84 (0.83-0.86)*   0.90 (0.89-0.91)^   0.90 (0.88-0.91)^
Values are reported as mean (95% CI).
The * indicates significant differences from the hip accelerometer placement.
The ^ indicates significant differences from the hip and thigh accelerometer placements.

Confusion matrices
The confusion matrices for each of the four accelerometer placement sites (using the
ANN created using feature set 1) can be found in Tables 4.6-4.9. The rows of each confusion
matrix are the actual activities performed, and the columns in each matrix represent the activities
predicted by the ANN. In Tables 4.6-4.9, the “Total” column represents the total number of five-second
windows of data recorded for each activity, combined for all 39 participants. The cells
highlighted in gray represent the number of windows correctly classified for each activity. Table
4.5 shows the overall sensitivity, specificity, and AUC values (calculated from the data in the
confusion matrices) across all 14 activities for each of the four accelerometer placements.
Overall AUC was 0.90 for each of the wrist accelerometer placements, indicating excellent
classification accuracy according to parameters suggested by Metz (Metz 1978). The hip and
thigh placements achieved AUC values of 0.82 and 0.85, respectively, indicating good
classification accuracy. To calculate sensitivity for a specific activity, we divided the number of
correctly classified windows by the total number of windows in which that activity was
performed. For example, in Table 4.6, lying was correctly classified by the hip accelerometer
2,677 out of the 2,948 windows when lying took place, resulting in a sensitivity of 90.8%. For
an example of specificity, activities other than lying were performed for a total of 39,277
windows for the hip placement (Table 4.6). Of these, the hip accelerometer ANN predicted lying
as the activity performed only 179 times, resulting in a specificity of (39,277-179)/39,277 =
0.995. The AUC values were calculated, based on the sensitivity and specificity values, using
Microsoft Excel.
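The worked example for lying can be reproduced from the hip confusion matrix. The counts below are the cell values of Table 4.6 (without the "Total" column); the function names are hypothetical and the code is a sketch of the arithmetic described above, not the procedure used in the study.

```python
# Rows are actual activities, columns are predicted activities, in the order
# LY, RE, CO, ST, LA, SW, WS, WF, JO, CY, BC, SQ, SR, NW (hip placement, Table 4.6).
HIP = [
    [2677, 17, 1, 11, 70, 3, 2, 1, 0, 6, 53, 0, 1, 106],
    [21, 1215, 1055, 441, 108, 20, 4, 0, 0, 45, 149, 3, 1, 270],
    [2, 729, 1354, 795, 102, 11, 5, 0, 0, 14, 179, 0, 1, 265],
    [0, 273, 429, 1447, 44, 12, 2, 0, 0, 21, 238, 0, 0, 75],
    [111, 27, 38, 21, 1726, 425, 30, 1, 2, 310, 612, 100, 2, 6],
    [2, 11, 4, 9, 395, 1473, 34, 0, 0, 404, 32, 79, 12, 0],
    [0, 3, 0, 0, 1, 49, 2057, 298, 20, 284, 0, 13, 349, 0],
    [0, 3, 0, 0, 3, 7, 249, 2228, 13, 4, 4, 19, 360, 0],
    [0, 0, 0, 0, 25, 3, 6, 26, 2223, 0, 0, 13, 105, 2],
    [1, 8, 3, 0, 239, 435, 125, 6, 0, 1970, 170, 2, 27, 1],
    [3, 220, 133, 269, 585, 24, 7, 0, 0, 153, 1139, 9, 2, 0],
    [0, 1, 11, 1, 67, 97, 53, 0, 0, 126, 7, 1664, 81, 1],
    [0, 0, 0, 0, 1, 9, 241, 114, 68, 8, 0, 14, 3116, 0],
    [39, 527, 143, 85, 10, 4, 0, 0, 2, 4, 6, 0, 0, 3683],
]


def sensitivity(matrix, k):
    """Correctly classified windows of activity k / all windows of activity k."""
    return matrix[k][k] / sum(matrix[k])


def specificity(matrix, k):
    """Windows of other activities not predicted as k / all windows of other activities."""
    others = [row for i, row in enumerate(matrix) if i != k]
    negatives = sum(sum(row) for row in others)
    false_positives = sum(row[k] for row in others)
    return (negatives - false_positives) / negatives


lying_sensitivity = sensitivity(HIP, 0)   # 2677 / 2948
lying_specificity = specificity(HIP, 0)   # (39277 - 179) / 39277
```

The same two functions apply to any row index, so the activity-specific values for the other placements follow from their confusion matrices in the same way.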
A significant advantage of displaying data with a confusion matrix is that the matrix
allows one to assess misclassification to determine potential weaknesses of activity classification
from each accelerometer placement site and identify the types of activities for which each site
has the highest classification accuracy. From the confusion matrices, it is apparent that the thigh
placement performed best for the slow walk and stairs (although the hip was within 2% for stairs). At
77.1% sensitivity, the hip accelerometer placement site was best for the fast walk; conversely,
the wrist sites were best for jogging, although all four placement sites correctly recognized
jogging greater than 90% of the time.
For the exercise activities, the wrist accelerometer placements achieved sensitivities close
to 90%. The thigh had similar sensitivity for squats but much lower sensitivity for classifying
biceps curls (53.4%). Similarly, for the lifestyle activities, the wrist placements outperformed the
hip and thigh placements, achieving sensitivities close to 80% for each activity (with the hip at
50-60% and the thigh at 59-73%). Moreover, two of the three sedentary activities (reading and
computer use) were least likely to be detected by the hip placement (Table 4.6) and most likely
to be detected by the two wrist placements (Tables 4.8 and 4.9), although sensitivity for
recognizing lying down was highest with the hip (90.8%).
With the hip and thigh accelerometer placements, reading and computer use had low
sensitivities (36-55%) due to frequent misclassification of one activity as the other.
Additionally, the hip accelerometer ANN often misclassified these activities as standing (13-23%
of the time), while the thigh accelerometer ANN incorrectly predicted these activities as lying
down 12.7-19.3% of the time. For sweeping and laundry (the lifestyle activities), the hip and
thigh ANNs often misclassified one as the other (7-16% of the time) or as biceps curls (15-22%
of the time). Additionally, activities that took place while standing with minimal movement
(standing, biceps curls) were often classified incorrectly by the hip and thigh accelerometer
placements, with one often mistaken as the other. Cycling was not well-recognized by the hip as
it was often misclassified as a lifestyle activity (8.0% for laundry and 14.6% for sweeping);
conversely, cycling was detected with >84% sensitivity with the other three accelerometer
placements. Finally, all four accelerometers had trouble distinguishing between the two walking
speeds, frequently misclassifying one as the other or as stairs (9-14% of the time).
Activity categories
Upon further examinations of the four confusion matrices (Tables 4.6-4.9), it was
apparent that classification accuracies were lowest among the sedentary activities for the hip and
thigh accelerometer ANNs, with frequent misclassification of one sedentary activity with another
sedentary activity (i.e., reading as computer use or vice versa). However, for our purposes, it
was less critical to be able to differentiate among sedentary activities than it was to be able to
correctly identify when a sedentary activity occurred (vs. a non-sedentary activity); therefore, we
performed follow-up analyses combining lying, reading, and computer use into a ‘sedentary’
category. Similarly, we combined laundry and sweeping into a ‘lifestyle’ category and squats
and biceps curls into an ‘exercise’ category, leaving 10 activity categories. We chose
not to group the two walking speeds since they represent different intensities of movement (i.e.,
light vs. moderate) and may have different health implications.
Instead of displaying four additional confusion matrices, we summarized overall
classification accuracies for each accelerometer placement in Table 4.11. As can be seen, overall
AUC values improved for all four accelerometer sites when combining similar activities into
categories, with the largest improvement seen for the thigh accelerometer placement (AUC of
0.91). Additionally, when looking at overall sensitivity for classification, all four accelerometer
placement sites saw increased sensitivity, with the largest improvement, 12.6% (71.4% to
84.0%), seen in the thigh placement. The hip ANN accuracy improved 6.1%, and wrist
placements’ accuracies improved between 5.6-6.3%.
After combining into 10 categories, sedentary activities were still classified with lowest
sensitivity with the hip accelerometer placement (72.6%), although the sensitivity achieved with
the thigh placement (92.1%) approached that achieved by the wrist-mounted accelerometers
(92.7-93.5%). Standing was classified with much lower sensitivity by the hip and thigh
placements (56.9-69.6%) than the wrist placements (90.0-90.2%) due to frequent
misclassification with biceps curls (exercise category). Also, both the exercise and lifestyle
activities were best classified by the wrist placement sites (89.8-90.5% and 88.9-90.1%)
compared to the hip (68.5% and 60.6%) and thigh (83.1% and 70.3%) accelerometer placements.
Furthermore, jogging was classified with over 90% sensitivity for all placement sites but was
slightly better with the wrists (95.3-96.1%) than the hip (92.5%) or thigh (93.1%). Finally, non-wear was detected with over 80% sensitivity for all placements but was highest among the wrist
sites (88.8-91.3%). Therefore, the two wrist accelerometer placements appeared to provide
superior classification accuracies overall as well as for many of the specific activities and activity
categories.
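Collapsing a confusion matrix into categories amounts to summing the rows and columns that share a category. The sketch below uses a small matrix with hypothetical counts (not study data) to show why sensitivity rises when reading and computer use are merged: confusions between the two become correct 'sedentary' classifications. The function name is hypothetical.

```python
def collapse(matrix, labels, label_to_category):
    """Merge rows and columns of a confusion matrix by category.

    Returns the category order and the merged matrix.
    """
    categories = []
    for label in labels:
        category = label_to_category[label]
        if category not in categories:
            categories.append(category)
    index = {category: i for i, category in enumerate(categories)}
    merged = [[0] * len(categories) for _ in categories]
    for i, actual in enumerate(labels):
        for j, predicted in enumerate(labels):
            row = index[label_to_category[actual]]
            col = index[label_to_category[predicted]]
            merged[row][col] += matrix[i][j]
    return categories, merged


# Hypothetical counts: reading and computer use are often confused with each other.
labels = ["reading", "computer", "standing"]
toy = [
    [60, 35, 5],
    [30, 65, 5],
    [10, 10, 80],
]
mapping = {"reading": "sedentary", "computer": "sedentary", "standing": "standing"}
categories, merged = collapse(toy, labels, mapping)
```

In this toy case, reading and computer use have sensitivities of 0.60 and 0.65 on their own, while the merged 'sedentary' category reaches 190/200 = 0.95.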
Activity intensity categories
Lastly, we combined the 14 activities according to their intensity in order to determine
how well each accelerometer placement site could correctly classify activity intensity. The MET
values elicited by each activity are shown in Table 4.12, both as estimated from the
Compendium of Physical Activities and from the average METs measured by the portable
metabolic analyzer during the visit. Activities that were seated or lying and required less than
1.5 METs were classified as sedentary (SBRN 2012). Activities with an intensity between 1.5
and 2.9 METs were classified as light, those between 3.0 and 5.9 METs were considered
moderate, and those at or above 6.0 METs were considered vigorous (PAGAC 2008).
Both methods for intensity classification resulted in the same intensity categorization for each
activity, yielding three sedentary activities, five light-intensity activities, three moderate-intensity
activities, and two vigorous-intensity activities (non-wear was not included in an intensity
category). Sensitivity, specificity, and AUC for correct classification of activity intensity can be
seen in Table 4.13. Overall sensitivities increased for all four accelerometer placements
compared to sensitivities achieved when classifying individual activities or activity categories.
Additionally, sensitivity increased the most in the thigh placement, surpassing the sensitivities
achieved by the wrist placement sites. Specificity dropped slightly for all placements, resulting
in no change in AUC for the hip or wrist placements compared to that achieved for classification
of 10 activity categories. However, the AUC for the thigh placement increased to 0.94 and was
significantly higher than that achieved by the wrist placements.
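The intensity cut-points can be expressed as a short function. This is a sketch: the function name is hypothetical, and treating an activity below 1.5 METs that is not seated or lying (for example, standing) as light intensity is an assumption, since the text defines sedentary only for seated or lying activities.

```python
def intensity_category(mets, seated_or_lying):
    """Classify an activity by MET value using the cut-points in the text.

    ASSUMPTION: activities below 1.5 METs that are not seated or lying
    are placed in the light category.
    """
    if mets < 1.5 and seated_or_lying:
        return "sedentary"
    if mets < 3.0:
        return "light"
    if mets < 6.0:
        return "moderate"
    return "vigorous"
```

Because non-wear has no MET value, it would be excluded before this function is applied, consistent with its omission from the intensity categories.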

Examining differences between the left and right wrists for classification accuracy
yielded only small differences. The right wrist classified lying and computer use better than the
left wrist, whereas the left wrist had a higher classification accuracy for reading than the right
wrist; however, when grouped into a sedentary category, classification accuracies were less than
1% different between the wrist monitors (93.5% for left wrist and 92.7% for right wrist). In fact,
the only activities with more than a 1% difference in classification accuracies between wrists
were cycling (3.1% higher for left wrist), exercise activities (1.2% higher for right wrist), stairs
(1.1% better for right wrist), and non-wear (2.5% better for right wrist).
Since four of the 39 participants included in the analyses reported being left-hand
dominant, we also analyzed overall classification accuracy between the dominant and non-dominant
wrist accelerometer placements (as opposed to strictly comparing left and right wrists).
As can be seen in Figure 4.3, no significant differences in overall classification accuracy
existed between the dominant and non-dominant wrist placements for any of the five feature sets.

Table 4.6. Confusion matrix for activity type classification from a hip-mounted ActiGraph accelerometer.
AG Hip      LY    RE    CO    ST    LA    SW    WS    WF    JO    CY    BC    SQ    SR    NW  Total
LY        2677    17     1    11    70     3     2     1     0     6    53     0     1   106   2948
RE          21  1215  1055   441   108    20     4     0     0    45   149     3     1   270   3332
CO           2   729  1354   795   102    11     5     0     0    14   179     0     1   265   3457
ST           0   273   429  1447    44    12     2     0     0    21   238     0     0    75   2541
LA         111    27    38    21  1726   425    30     1     2   310   612   100     2     6   3411
SW           2    11     4     9   395  1473    34     0     0   404    32    79    12     0   2455
WS           0     3     0     0     1    49  2057   298    20   284     0    13   349     0   3074
WF           0     3     0     0     3     7   249  2228    13     4     4    19   360     0   2890
JO           0     0     0     0    25     3     6    26  2223     0     0    13   105     2   2403
CY           1     8     3     0   239   435   125     6     0  1970   170     2    27     1   2987
BC           3   220   133   269   585    24     7     0     0   153  1139     9     2     0   2544
SQ           0     1    11     1    67    97    53     0     0   126     7  1664    81     1   2109
SR           0     0     0     0     1     9   241   114    68     8     0    14  3116     0   3571
NW          39   527   143    85    10     4     0     0     2     4     6     0     0  3683   4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants).
Rows are actual activities performed, and columns are predicted activities.
LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO
= Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.7. Confusion matrix for activity type classification from a thigh-mounted ActiGraph accelerometer.
AG Thigh    LY    RE    CO    ST    LA    SW    WS    WF    JO    CY    BC    SQ    SR    NW  Total
LY        1590   711   310     0     3     2     1     0     0     8    63     3     5   252   2948
RE         643  1281  1224     0    40     5     2     0     0     6     0     3     2   126   3332
CO         440   884  1889     0    32     0     4     0     0     4     0     1     2   201   3457
ST           3    26    28  1768   149    13     4     0     0     1   548     0     1     0   2541
LA          18    71   102    54  2024   528    33     1     0    46   518    15     1     0   3411
SW           1     4     5     9   534  1786    42     0     0    53     8     2    11     0   2455
WS           1     0     0     0    16    64  2527   322    10    80     2     0    52     0   3074
WF           0     0     0     0     9     2   366  2078    43    80     7     7   298     0   2890
JO           0     0     0     0     1     3    44    27  2238    20     0     0    70     0   2403
CY          50    29     2     0   116     9    12     1     9  2694     4     9    50     2   2987
BC           0    16    13   409   680    20     7     0     0    18  1359    22     0     0   2544
SQ           0     2     0     2    44    30    75     5     0    36    15  1875    21     4   2109
SR           0     0     0     0     4     3   137   104    57    83     0     1  3182     0   3571
NW         566     7    35     3     8     0     0     0     1     2     4     3     1  3873   4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants).
Rows are actual activities performed, and columns are predicted activities.
LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO
= Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.8. Confusion matrix for activity type classification for a GENEA accelerometer mounted on the left wrist.
GE Left Wrist     LY    RE    CO    ST    LA    SW    WS    WF    JO    CY    BC    SQ    SR    NW  Total
LY              2246   461    86     1    23     1     2     1     0    13    64     5     0    45   2948
RE               406  2228   381    31   100    36     3     0     0    80     6    29     3    29   3332
CO               199   374  2722     6    15    16     3     0     0    91     4    11     2    14   3457
ST                 5    53     7  2291    38    33    15     1     0     9    22     3     0    64   2541
LA                91    83     4    25  2857   225    22     2     1    23    23     6    49     0   3411
SW                 3    24     3    17   273  1952    36    22     0    16     5     2    97     5   2455
WS                 7     5     1    18    56    31  2198   354    11    11    15     4   363     0   3074
WF                 2     0     0     3    26    17   380  2019     1     1    18     9   412     2   2890
JO                 0     0     0     2    11     2    12    10  2291     0     0     0    75     0   2403
CY                 8    55   184    15    51    32     9     1     0  2606    10     2    13     1   2987
BC                89    18     7    29    77    10    10    51     0    14  2225     8     2     4   2544
SQ                 4    53    39     1    32     4     6    10     0    51    11  1894     4     0   2109
SR                 2     0     0     1    95   109   273   300   147     2     8    15  2619     0   3571
NW                85     5    70   326     7     0     1     0     0     0     8     0     1  4000   4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants).
Rows are actual activities performed, and columns are predicted activities.
LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO
= Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.9. Confusion matrix for activity type classification for a GENEA accelerometer mounted on the right wrist.
GE Right Wrist    LY    RE    CO    ST    LA    SW    WS    WF    JO    CY    BC    SQ    SR    NW  Total
LY              2391   398    38     1    34     2     1     0     0    17    62     1     3     0   2948
RE               519  1995   406    63    98    42     4     0     0   102     7    17     1    78   3332
CO               101   210  2971     0    14     7     1     0     0   142     0     0     1    10   3457
ST                 9   105     4  2287    23    25    25     2     0     7    11     4     2    37   2541
LA                87    78     2    16  2705   321    41     6     0    43    27     8    77     0   3411
SW                14    12     0    14   342  1902    54     0     2    27    11     6    71     0   2455
WS                 3     4     0    11    67    30  2219   352     9    10    20    81   268     0   3074
WF                 0     0     0     1    32    24   406  2006     0     1    14    15   391     0   2890
JO                 0     0     0     3     3     0     5    18  2310     0     3     0    61     0   2403
CY                51    72   195     2    79    38    11     3     0  2513     7     1    14     1   2987
BC                26     1     6    36    61    17    29    20     0    10  2310    17    10     1   2544
SQ                 1    20     1     2    24    18    57     7     1    66    13  1854    44     1   2109
SR                 6     1     0     0   119    96   266   271   112     4    15    25  2656     0   3571
NW               114    88    13   168     4     2     1     0     0     1     1     0     0  4111   4503
The “Total” column is the total number of five-second intervals in which each activity was performed (data from all 39 participants).
Rows are actual activities performed, and columns are predicted activities.
LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO
= Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.

Table 4.10. Activity-specific sensitivity, specificity, and AUC among the four accelerometer placement sites.

Sensitivity (% agreement)
        AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
LY      90.8 (3.3)*   53.9 (5.7)*   76.2 (4.9)*     81.1 (4.5)*
RE      36.5 (5.2)*   38.4 (5.3)*   66.9 (5.1)*     59.9 (5.3)*
CO      39.2 (5.2)*   54.6 (5.3)*   78.7 (4.3)*     85.9 (3.7)*
ST      56.9 (6.1)*   69.6 (5.7)*   90.2 (3.7)      90.0 (3.7)
LA      50.6 (5.3)*   59.3 (5.3)*   83.8 (3.9)*     79.3 (4.3)*
SW      60.0 (6.2)*   72.7 (5.6)*   79.5 (5.1)*     77.5 (5.3)*
WS      66.9 (5.3)*   82.2 (4.3)*   71.5 (5.1)      72.2 (5.0)
WF      77.1 (4.9)*   71.9 (5.2)*   69.9 (5.3)      69.4 (5.4)
JO      92.5 (3.4)    93.1 (3.2)    95.3 (2.7)^     96.1 (2.5)^
CY      66.0 (5.4)*   90.2 (3.4)*   87.2 (3.8)*     84.1 (4.2)*
BC      44.8 (6.2)*   53.4 (6.2)*   87.5 (4.1)*     90.8 (3.6)*
SQ      78.9 (5.5)*   88.9 (4.3)    89.8 (4.1)      87.9 (4.4)
SR      87.3 (3.5)*   89.1 (3.3)*   73.3 (4.6)      74.4 (4.6)
NW      81.8 (3.6)*   86.0 (3.2)*   88.8 (2.9)*     91.3 (2.6)*
Total   66.2 (1.4)*   71.4 (1.4)*   80.9 (1.2)      81.1 (1.2)

Specificity (%)
        AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
LY      99.5 (0.8)*   95.6 (2.4)*   97.7 (1.7)      97.6 (1.8)
RE      95.3 (2.3)    95.5 (2.2)    97.1 (1.8)^     97.5 (1.7)^
CO      95.3 (2.2)    95.6 (2.2)    98.0 (1.5)^     98.3 (1.4)^
ST      95.9 (2.5)*   98.8 (1.3)    98.8 (1.3)      99.2 (1.1)*
LA      95.7 (2.2)    95.8 (2.1)    97.9 (1.5)      97.7 (1.6)
SW      97.2 (2.1)*   98.3 (1.6)    98.7 (1.4)      98.4 (1.6)
WS      98.1 (1.5)    98.1 (1.5)    98.0 (1.6)      97.7 (1.7)
WF      98.9 (1.2)    98.8 (1.3)    98.1 (1.6)      98.3 (1.5)
JO      99.7 (0.7)    99.7 (0.7)    99.6 (0.8)      99.7 (0.7)
CY      96.5 (2.1)*   98.9 (1.2)    99.2 (1.0)      98.9 (1.2)
BC      96.3 (2.3)*   97.1 (2.1)    99.5 (0.9)      99.5 (0.9)
SQ      99.4 (1.1)    99.8 (0.6)    99.8 (0.6)      99.6 (0.9)
SR      97.6 (1.6)    98.7 (1.2)*   97.4 (1.7)      97.6 (1.6)
NW      98.1 (1.3)    98.4 (1.2)    99.6 (0.6)^     99.7 (0.5)^
Total   97.4 (0.5)*   97.8 (0.4)*   98.5 (0.4)      98.5 (0.4)

AUC
        AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
LY      0.95 (0.01)*  0.75 (0.01)*  0.87 (0.01)*    0.89 (0.01)*
RE      0.66 (0.01)*  0.67 (0.01)*  0.82 (0.01)*    0.79 (0.01)*
CO      0.67 (0.01)*  0.75 (0.01)*  0.88 (0.01)*    0.92 (0.01)*
ST      0.76 (0.01)*  0.84 (0.01)*  0.94 (0.01)*    0.95 (0.01)*
LA      0.73 (0.01)*  0.78 (0.01)*  0.91 (0.01)*    0.88 (0.01)*
SW      0.79 (0.01)*  0.86 (0.01)*  0.89 (0.01)*    0.88 (0.01)*
WS      0.82 (0.01)*  0.90 (0.01)*  0.85 (0.01)     0.85 (0.01)
WF      0.88 (0.01)*  0.85 (0.01)*  0.84 (0.01)     0.84 (0.01)
JO      0.96 (0.01)   0.96 (0.01)   0.97 (0.01)*    0.98 (0.01)*
CY      0.81 (0.01)*  0.95 (0.01)*  0.93 (0.01)*    0.92 (0.01)*
BC      0.71 (0.01)*  0.75 (0.01)*  0.93 (0.01)*    0.95 (0.01)*
SQ      0.89 (0.01)*  0.94 (0.01)   0.95 (0.01)*    0.94 (0.01)
SR      0.92 (0.01)*  0.94 (0.01)*  0.85 (0.01)*    0.86 (0.01)*
NW      0.90 (0.01)*  0.92 (0.01)*  0.94 (0.01)*    0.95 (0.01)*
Total   0.82 (0.00)*  0.85 (0.00)*  0.90 (0.00)     0.90 (0.00)


Values are shown as Mean (SD). The * indicates significant differences from all other accelerometer placements. The ^ indicates
significant differences from the hip and thigh accelerometers.
LY = Lying, RE = Reading, CO = Computer, ST = Standing, LA = Laundry, SW = Sweeping, WS = Walk slow, WF = Walk fast, JO
= Jogging, CY = Cycling, BC = Biceps curls, SQ = Squats, SR = Stairs, NW = Non-wear.
AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor
placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Table 4.11. Overall sensitivity, specificity, and AUC among the four accelerometer placement sites with combined activity categories.

Sensitivity (% agreement)
        AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
SE      72.6 (2.8)*   92.1 (1.7)*   93.5 (1.6)*     92.7 (1.6)*
ST      56.9 (6.1)*   69.6 (5.7)*   90.2 (3.7)      90.0 (3.7)
LI      68.5 (3.8)*   83.1 (3.1)*   90.5 (2.4)      89.8 (2.5)
WS      66.9 (5.3)*   82.2 (4.3)*   71.5 (5.1)      72.2 (5.0)
WF      77.1 (4.9)*   71.9 (5.2)*   69.9 (5.3)      69.4 (5.4)
JO      92.5 (3.4)    93.1 (3.2)    95.3 (2.7)^     96.1 (2.5)^
CY      66.0 (5.4)*   90.2 (3.4)*   87.2 (3.8)*     84.1 (4.2)*
EX      60.6 (4.5)*   70.3 (4.2)*   88.9 (2.9)*     90.1 (2.7)*
SR      87.3 (3.5)*   89.1 (3.3)*   73.3 (4.6)      74.4 (4.6)
NW      81.8 (3.6)*   86.0 (3.2)*   88.8 (2.9)*     91.3 (2.6)*
Total   72.5 (1.4)*   84.0 (1.1)*   86.6 (1.0)      86.7 (1.0)

Specificity (%)
        AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
SE      93.9 (1.5)*   97.0 (1.1)    97.2 (1.0)      97.2 (1.0)
ST      95.9 (2.5)*   98.8 (1.4)    98.8 (1.3)      99.2 (1.1)*
LI      94.7 (1.8)*   96.6 (1.5)*   97.7 (1.2)      97.6 (1.2)
WS      98.1 (1.6)    98.1 (1.5)    98.0 (1.6)      97.7 (1.7)
WF      98.9 (1.2)    98.8 (1.2)    98.1 (1.6)      98.3 (1.5)
JO      99.7 (0.7)    99.7 (0.7)    99.6 (0.8)      99.7 (0.7)
CY      96.5 (2.1)*   98.9 (1.2)    99.2 (1.0)      98.9 (1.2)
EX      94.7 (2.0)*   95.7 (1.8)*   97.3 (1.5)      97.4 (1.4)
SR      97.6 (1.6)    98.7 (1.2)*   97.4 (1.7)      97.6 (1.6)
NW      98.1 (1.3)    98.4 (1.1)    99.6 (0.6)^     99.7 (0.5)^
Total   96.8 (0.5)*   98.1 (0.4)*   98.3 (0.4)      98.3 (0.4)*

AUC
        AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
SE      0.83 (0.01)*  0.95 (0.01)   0.95 (0.01)     0.95 (0.01)
ST      0.76 (0.01)*  0.84 (0.01)*  0.94 (0.01)*    0.95 (0.01)*
LI      0.82 (0.01)*  0.90 (0.01)*  0.94 (0.01)     0.94 (0.01)
WS      0.82 (0.01)*  0.90 (0.01)*  0.85 (0.01)     0.85 (0.01)
WF      0.88 (0.01)*  0.85 (0.01)*  0.84 (0.01)     0.84 (0.01)
JO      0.96 (0.01)   0.96 (0.01)   0.97 (0.01)*    0.98 (0.01)*
CY      0.81 (0.01)*  0.95 (0.01)*  0.93 (0.01)*    0.92 (0.01)*
EX      0.78 (0.01)*  0.83 (0.01)*  0.93 (0.01)*    0.94 (0.01)*
SR      0.92 (0.01)*  0.94 (0.01)*  0.85 (0.01)*    0.86 (0.01)*
NW      0.90 (0.01)*  0.92 (0.01)*  0.94 (0.01)*    0.95 (0.01)*
Total   0.85 (0.00)*  0.91 (0.00)*  0.92 (0.00)     0.92 (0.00)

Values are shown as Mean (SD). The * indicates significant differences from all other accelerometer placement sites. The ^ indicates
significant differences from the hip and thigh accelerometers.
SE = Sedentary, ST = Standing, LI = Lifestyle, WS = Walk slow, WF = Walk fast, JO = Jogging, CY = Cycling, EX = Exercise, SR =
Stairs, NW = Non-wear.
AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor
placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Table 4.12. Activities classified into activity intensities by the Compendium and by measured METs.

Activity      Code   Compendium  Compendium  Experimentally measured   Experimental  Description
                     METs        Intensity   METs [Mean (SD)]          Intensity
Lying         07011  1.3         Sedentary   1.4 (0.7)                 Sedentary     Lying quietly, doing nothing, lying in bed awake, listening to music (not talking or reading)
Reading       09030  1.3         Sedentary   1.4 (0.6)                 Sedentary     Sitting, reading, book, newspaper, etc.
Computer      09040  1.3         Sedentary   1.4 (0.6)                 Sedentary     Sitting, writing, desk work, typing
Standing      07041  1.8         Light       1.4 (1.0)                 Light         Standing, fidgeting
Laundry       05090  2.0         Light       2.1 (0.6)                 Light         Laundry, fold or hang clothes, put clothes in washer or dryer, packing suitcase, washing clothes by hand, implied standing, light effort
Sweeping      05011  2.3         Light       2.5 (0.5)                 Light         Cleaning, sweeping, slow, light effort
Biceps curls  09071  2.5         Light       2.0 (0.6)                 Light         Standing, miscellaneous
Walk slow     17152  2.8         Light       2.9 (0.7)                 Light         Walking, 2.0 mph, level, slow pace, firm surface
Walk fast     17200  4.3         Moderate    4.2 (1.1)                 Moderate      Walking, 3.5 mph, level, brisk, firm surface, walking for exercise
Cycling       02017  4.8         Moderate    4.4 (1.1)                 Moderate      Bicycling, stationary, 51-89 watts, light-to-moderate effort
Squats        02052  5.0         Moderate    4.5 (1.2)                 Moderate      Resistance (weight) training, squats, slow or explosive effort
Stairs        17130  8.0         Vigorous    6.8 (1.5)                 Vigorous      Stair climbing, using or climbing up ladder (Taylor Code 030)
Jogging       12030  8.3         Vigorous    8.0 (1.8)                 Vigorous      Running, 5 mph (12 min/mile)
Non-wear      --     --          --          --                        --            --
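
The intensity labels in Table 4.12 track the MET values closely. As a rough illustration, the Python sketch below bins a MET value using commonly cited cut-offs (sedentary at or below 1.5 METs, light below 3.0, moderate below 6.0, vigorous at 6.0 and above). These thresholds and the function name are assumptions, since the table itself does not state the cut-offs. Note that standing is labeled light in the table despite a measured mean of 1.4 METs, so a MET-only rule does not reproduce every row.

```python
def intensity_from_mets(mets):
    """Bin a MET value into an intensity category (assumed cut-offs)."""
    if mets <= 1.5:
        return "Sedentary"
    if mets < 3.0:
        return "Light"
    if mets < 6.0:
        return "Moderate"
    return "Vigorous"


# Experimentally measured mean METs for a few activities in Table 4.12.
MEASURED_METS = {
    "Lying": 1.4,
    "Laundry": 2.1,
    "Walk slow": 2.9,
    "Walk fast": 4.2,
    "Cycling": 4.4,
    "Stairs": 6.8,
    "Jogging": 8.0,
}

for activity, mets in MEASURED_METS.items():
    print(f"{activity}: {mets} METs -> {intensity_from_mets(mets)}")
```

Under these assumed cut-offs the slow walk (2.9 METs) falls in the light category and the fast walk (4.2 METs) in the moderate category, in line with the table.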

Table 4.13. Overall sensitivity, specificity, and AUC among the four accelerometer placement sites for classification of activity
intensity.

Sensitivity (% agreement)
                    AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
Non-wear            81.8 (3.6)*   86.0 (3.2)*   88.8 (2.9)*     91.3 (2.6)*
Sedentary           72.6 (2.8)*   92.1 (1.7)*   93.5 (1.6)      92.7 (1.6)
Light-intensity     75.8 (2.3)*   93.4 (1.3)*   89.1 (1.6)      89.9 (1.6)
Moderate-intensity  75.4 (3.0)*   85.0 (2.5)*   82.6 (2.7)      81.0 (2.7)
Vigorous-intensity  92.3 (2.2)    92.9 (2.1)    85.9 (2.8)^     86.0 (2.8)^
MVPA                87.3 (1.8)*   93.0 (1.3)*   89.4 (1.6)      88.6 (1.0)
Total               78.0 (1.3)*   90.7 (0.9)*   88.4 (1.0)      88.5 (1.0)

Specificity (%)
                    AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
Non-wear            98.1 (1.3)    98.4 (1.1)    99.6 (0.6)^     99.7 (0.5)^
Sedentary           93.9 (1.5)*   97.0 (1.1)    97.2 (1.0)      97.2 (1.0)
Light-intensity     86.5 (1.8)*   96.3 (1.0)*   93.7 (1.3)      93.8 (1.3)
Moderate-intensity  94.4 (1.6)*   97.6 (1.1)*   96.8 (1.2)      96.5 (1.3)
Vigorous-intensity  97.6 (1.2)    98.6 (0.9)*   97.4 (1.3)      97.5 (1.3)
MVPA                92.4 (1.4)*   97.6 (0.8)*   95.5 (1.1)      95.3 (1.1)
Total               92.5 (0.8)*   97.2 (0.5)*   96.2 (0.6)      96.2 (0.6)

AUC
                    AG Hip        AG Thigh      GE Left Wrist   GE Right Wrist
Non-wear            0.90 (0.01)*  0.92 (0.01)*  0.94 (0.01)*    0.95 (0.01)*
Sedentary           0.83 (0.01)*  0.95 (0.01)   0.95 (0.01)     0.95 (0.01)
Light-intensity     0.81 (0.00)*  0.95 (0.01)*  0.91 (0.01)*    0.92 (0.01)*
Moderate-intensity  0.85 (0.01)*  0.91 (0.01)*  0.90 (0.01)*    0.89 (0.01)*
Vigorous-intensity  0.95 (0.01)*  0.96 (0.01)*  0.92 (0.01)     0.92 (0.01)
MVPA                0.90 (0.00)*  0.95 (0.01)*  0.92 (0.01)     0.92 (0.01)
Total               0.85 (0.00)*  0.94 (0.00)*  0.92 (0.00)     0.92 (0.00)

Values are shown as Mean (SD).
The * indicates significant differences from all other accelerometer placement sites.
The ^ indicates significant differences from the hip and thigh accelerometers.
AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor, GE Left Wrist = GENEA monitor
placed on the left wrist, GE Right Wrist = GENEA monitor placed on the right wrist.

Figure 4.3. Comparison of dominant and non-dominant wrist accelerometer sensitivities.
[Figure: sensitivity (%) on the y-axis (0 to 100) against feature set (1 to 5) on the x-axis, with separate series for the non-dominant and dominant wrists.]

DISCUSSION
The purpose of this study was to develop and validate ANNs using data from
accelerometers placed at several locations on the body in order to classify activity types.
Specifically, we compared the accuracy of ANNs developed for wrist-, hip-, and thigh-mounted
accelerometers as well as compared accuracy of accelerometers placed on the left and right
wrists. A secondary purpose was to assess the accuracies of the four accelerometer placement
sites for classifying specific types of activities (e.g., sedentary, lifestyle, and exercise activities)
and activity intensities and to test multiple feature sets.
The wrist-mounted accelerometers outperformed the hip- and thigh-mounted
accelerometers for total classification accuracy, achieving over 80% sensitivity in our initial
analysis and over 86% when combining similar activities into subcategories (i.e., combining
lying, reading, and computer use as sedentary). Additionally, when looking solely at the three
sedentary activities, the wrist monitors provided sensitivities of 92.7-93.5% when combined into
a single sedentary category, which was slightly higher than the thigh (92.1%) and much higher
than the hip (72.6%). The wrist accelerometer placement sites also had the highest sensitivities
for detecting exercise and lifestyle activities, although the thigh had the highest sensitivity for
classifying cycling. Also, the wrist monitors provided higher sensitivity for standing than the hip
or thigh. In direct comparison of the left and right wrist placements, we found no differences in
overall sensitivity and only very slight differences (1-5%) for specific activity types. These
small differences were statistically significant due to the large number of windows of data
(>42,000) used when determining sensitivity, but the clinical or real-world significance of the
differences between the left and right wrist accelerometer placements is likely minimal.
Furthermore, follow-up analyses comparing dominant vs. non-dominant wrists yielded no
differences in overall classification accuracy. These findings provide strong evidence that wrist
accelerometers can be used to achieve high accuracy for recognition of a variety of sedentary,
ambulatory, lifestyle, and exercise activities.
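
The gain from combining similar activities can be reproduced from a confusion matrix by pooling rows and columns. The Python sketch below is illustrative rather than the code used in this study. It assumes the three sedentary rows of the right-wrist confusion matrix (Table 4.9), and the helper name `pooled_sensitivity` is hypothetical.

```python
# Column order of the confusion matrix (predicted activities).
LABELS = ["LY", "RE", "CO", "ST", "LA", "SW", "WS",
          "WF", "JO", "CY", "BC", "SQ", "SR", "NW"]

# Rows of Table 4.9 (GENEA, right wrist) for the three sedentary activities.
ROWS = {
    "LY": [2391, 398, 38, 1, 34, 2, 1, 0, 0, 17, 62, 1, 3, 0],
    "RE": [519, 1995, 406, 63, 98, 42, 4, 0, 0, 102, 7, 17, 1, 78],
    "CO": [101, 210, 2971, 0, 14, 7, 1, 0, 0, 142, 0, 0, 1, 10],
}


def pooled_sensitivity(rows, labels, group):
    """Sensitivity after merging the activities in `group` into one category.

    A window counts as correct when both its actual and its predicted
    activity belong to the group, so confusions inside the group
    (for example reading predicted as computer use) are no longer errors.
    """
    cols = [labels.index(name) for name in group]
    correct = sum(rows[name][c] for name in group for c in cols)
    total = sum(sum(rows[name]) for name in group)
    return correct / total


reading_alone = ROWS["RE"][LABELS.index("RE")] / sum(ROWS["RE"])
sedentary = pooled_sensitivity(ROWS, LABELS, ["LY", "RE", "CO"])
print(f"Reading alone:      {100 * reading_alone:.1f}%")
print(f"Combined sedentary: {100 * sedentary:.1f}%")
```

Reading on its own is recognized in 59.9% of windows, whereas the pooled sedentary category reaches 92.7%, the right-wrist value quoted above, because most reading errors are confusions with lying or computer use.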
The superiority of the wrist accelerometer placements for the exercise and lifestyle
activities was expected because these activities (with the exception of squats) utilize mostly
upper-body movements. These activities would be easier to detect with monitors worn on the
wrists than with accelerometers worn on the hip or thigh since the patterns of wrist movement
are likely more distinct than those of thigh or hip movement and, therefore, would be best recognized
using pattern recognition approaches such as ANNs. The superior accuracy of the wrist
accelerometer placement sites for measurement of specific sedentary activities was initially
surprising given that thigh-mounted accelerometers have consistently yielded high accuracy for
measurement of time spent in sedentary activities as well as breaks in sedentary activities
(Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). However, a recent
study by Rowlands et al. described an elegant and accurate way to use a concept called the
“sedentary sphere” to identify specific types of sedentary activities from a wrist accelerometer
(Rowlands, Olds et al. 2014). Our study provides further evidence that a wrist-worn
accelerometer can provide an accurate indication of specific types of sedentary activities.
The high overall sensitivities achieved with the wrist-mounted accelerometers were also
surprising given previous research showing that wrist-mounted accelerometers are often
outperformed by monitors on other parts of the body. A study by Mannini et al. (Mannini, Intille
et al. 2013) showed higher classification accuracies of an ankle monitor (95%) compared to a
wrist monitor (84.7%), although the overall accuracy of the wrist monitor is very similar to that
achieved in our study. Furthermore, Skotte et al. found classification accuracies of 99% for
classifying activity type using hip- and thigh-mounted accelerometers (Skotte, Korshoj et al.
2014), which is well above the classification accuracies achieved in our study. However, Skotte
et al. tested only six activities, and the authors ended up removing one activity (stair climbing)
since it had poor classification accuracy. Results from Cleland et al. (Cleland, Kikhia et al.
2013) showed very high classification accuracies for hip, wrist, and thigh accelerometers
(95-97%), but again, only seven activities were used. In short, most research comparing several
different accelerometer placement sites is limited by use of small subject numbers (i.e., < 20)
and/or small numbers of activities in their studies, limiting their comparisons of the advantages
and disadvantages of each placement site.
In a recent study, members of our research team found higher classification accuracies of
a thigh-mounted accelerometer compared to a wrist-mounted accelerometer (78% vs. 71%) for
classifying 14 activities in a laboratory-based setting (Dong, Montoye et al. 2013). However, the
accelerometers used in this study measured acceleration data in only two axes (Dong, Montoye
et al. 2013), whereas accelerometers used in the current study measured accelerations in three
planes of movement (triaxial). It is reasonable to assume that the hip and thigh, which lie closest
to the center of the body, would move mostly in the anterior-posterior and vertical planes;
conversely, the wrist, which was the most distal accelerometer attachment site tested, would
experience significant movement in the medial-lateral plane as well as the anterior-posterior and
vertical planes. Therefore, addition of a third measurement axis in the current study may have
benefitted the ANNs for the wrist accelerometers much more than the hip or thigh
accelerometers and contributed to the much higher accuracy for the ANNs developed for the
wrist-mounted accelerometers seen in this study compared to Dong’s work. Also, Dong’s study,
as well as Cleland’s and Skotte’s, used a laboratory-based setting, which would have
questionable generalizability to a free-living environment (Gyllensten and Bonomi 2011; Trost,
Wong et al. 2012; Lyden, Keadle et al. 2013; Mannini, Intille et al. 2013). Our current study
builds off of this previous research by validating activity type recognition of ANNs developed
for wrist-, hip-, and thigh-mounted accelerometers in a simulated free-living setting, with a wide
range of activities and the ability to directly compare monitors located at these popular
placement sites.
The utility of the wrist placement sites for activity type prediction found in this study is
especially encouraging given its implementation in many studies, including the 2011-2014
NHANES data collection cycle. There is preliminary evidence that participant compliance is
improved with wear on the wrist (Troiano, McClain et al. 2014), so the wrist holds promise for
use in large studies due in part to this improved compliance but also its high accuracy of
measurement for both activity type and energy expenditure prediction seen in this study as well
as previous work. Additionally, we have found that choice of wrist (left vs. right and dominant
vs. non-dominant) does not lower accuracy of activity type prediction, which is encouraging as it
may allow participants in large studies to choose the wrist on which they wear the accelerometer.
The use of wrist-mounted monitors for sleep measurement in previous research (Kripke,
Mullaney et al. 1978; Mullaney, Kripke et al. 1980; Jean-Louis, Kripke et al. 2001) suggests that
there is potential for a single accelerometer placed on the wrist to measure physical activity,
inactivity/sedentary behavior, and sleep accurately, thereby providing a comprehensive
measurement tool for assessing several different behavioral characteristics known to have strong
associations with health. Investigators have begun to use commercially available devices such as
the Fitbit (Fitbit Inc., San Francisco, CA) and Nike Fuelband (Nike Inc., Beaverton, OR) for
comprehensive measurement of activity and sleep, but limited evidence available does not
support their accuracy (Montgomery-Downs, Insana et al. 2012; Dannecker, Sazonova et al.
2013; Fortune, Lugade et al. 2014), and we know of no research-grade devices yet capable of
accomplishing this task. Now that we have developed algorithms to classify activity type and
predict energy expenditure from a wrist-worn accelerometer, we intend to expand our
investigations and measure sleep duration and quality as well as sedentary behavior.
While the wrist accelerometer placements performed the best in this study, the
performance of the hip and thigh accelerometer placements should not be overlooked. At over
70% for prediction accuracy when combining similar activities into categories, the hip placement
performed well, although this accuracy is not the highest achieved in the literature. As
previously discussed, Cleland et al. (Cleland, Kikhia et al. 2013) and Skotte et al. (Skotte,
Korshoj et al. 2014) both achieved over 97% accuracy for the hip-mounted accelerometer for
classifying activity type. In the current study, main weaknesses of the hip placement were
encountered in classifying sedentary behaviors, standing, and lifestyle and exercise activities. Of
these, only sitting and standing were included in the studies by Cleland and Skotte, likely
contributing to their higher accuracy of measurement. Members of our research group (Dong,
Montoye et al. 2013) used a similar activity set and achieved higher accuracy (78%) for the hip
than in the current study, which may be partially attributed to the use of a simulated free-living
setting in the current study (vs. a laboratory-based setting in the previous study). Additionally,
our research group controlled the exact speed of the walking (2.0 and 4.0 miles/hour), jogging
(6.0 miles/hour), and stair climbing (60 steps/min) in the previous study, whereas the current
study included no such limitations on speeds, resulting in much more variability in speed of
movement and potentially lower accuracy for classifying these tasks. The discrepancy in
accuracies between the lab-based and free-living settings also indirectly shows the importance of
validating predictive algorithms in a setting similar to that in which they are intended for use,
thereby obtaining a more realistic view of their accuracy in the true free-living setting. In terms
of accurately classifying sedentary activities, the hip performed moderately well, with 72.6%
accuracy for the combined category but only 36.5-90.8% for the individual sedentary activities.
According to Table 4.10, sedentary activities were often misclassified as standing with the hip
accelerometer placement, which is not surprising given the static nature of these activities as well
as the similar hip angle seen with sitting and standing. Poor classification of sedentary behavior
by hip-worn accelerometers was also seen in studies by Lyden et al. (Lyden, Kozey Keadle et al.
2012) and Kozey-Keadle et al. (Kozey-Keadle, Libertine et al. 2011), where the hip-worn
accelerometer inaccurately predicted total sedentary time and breaks in sedentary time using the
cut-point approach to classification. Despite fairly widespread use of hip-mounted
accelerometers for measuring sedentary behavior in previous literature, our findings, along with
those of Lyden and Kozey-Keadle, suggest that hip-mounted accelerometer estimates of
sedentary behavior should be used with caution, regardless of whether cut-points or machine
learning are used for prediction.
The current study showed that a thigh-mounted accelerometer performed better than the
hip placement but worse than the wrist placements for activity type prediction. At 71.4%, the
thigh placement had lower accuracy than achieved in Cleland’s and Skotte’s work (>95%) and
slightly lower accuracy than in our previous, lab-based study (78%). Our lower measurement
accuracy is likely due to the addition of more activities than the other studies and use of a
simulated free-living setting. Notably, accuracy of the thigh accelerometer increased from
71.4% to 84.0% upon combining the categories for sedentary, lifestyle, and exercise activities.
This increase was due mostly to improved measurement accuracy for sedentary and lifestyle
activities once they were combined into categories. The inability to measure individual
sedentary activities accurately was expected since the angle and movement of the thigh are very
similar for lying and seated activities. However, of greater importance is that the thigh
accelerometer achieved high accuracy differentiating sedentary activities from non-sedentary
activities and was able to differentiate between sedentary activities and standing (as seen in
Table 4.8), which the hip-mounted accelerometer was unable to accomplish. The differentiation
between sedentary activities and standing is important for measuring total sedentary time as well
as breaks in sedentary behavior, and the high accuracy we found for the thigh is in accordance
with previous studies showing excellent accuracy of thigh-mounted accelerometers for
measuring sedentary time and breaks in sedentary behavior (Grant, Ryan et al. 2006; Lyden,
Kozey Keadle et al. 2012). The high overall accuracy of thigh-mounted accelerometers for
classifying sedentary and non-sedentary activities achieved in this study provides further
rationale for their use in measuring PA as well as SB in free-living settings.
Grouping of activities by intensity resulted in highest sensitivity and AUC by the thigh
accelerometer placement and slightly lower values for the wrist placements (Table 4.13). Given
that the thigh accelerometer placement performed best for estimation of energy expenditure
(Chapter 3), the higher performance of the thigh placement for prediction of activity intensity
provides further evidence that the thigh accelerometer placement shows higher measurement
accuracy than the hip or wrist placements for prediction of the energy cost of activities. In
studies specifically focused on identifying time spent in different activity intensities (i.e., for
identifying time spent sedentary or time spent in MVPA), the thigh accelerometer placement
may be optimal. However, the wrist placements appear best if trying to identify individual types
of activities.
Due to the different strengths of the monitors located on the hip, thigh, and wrists, choice
of placement should depend on the population of interest as well as the specific research
question. ANNs for hip-mounted accelerometers classified ambulatory activities and stair use
well in this study and have previously been shown to provide highly accurate estimates of energy
expenditure (Staudenmayer, Pober et al. 2009) (Chapter 3), but studies in pregnant or obese
populations should consider avoiding use of a hip accelerometer due to monitor tilt that can
occur (Feito, Bassett et al. 2011). For researchers interested specifically in measuring sedentary
behavior, the thigh-mounted accelerometer may be preferable due to high accuracy of
classification seen in this study as well as in previous work by Lyden et al. and Kozey Keadle et
al. (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). In contrast, those
seeking to maximize compliance or those interested in recognition of activity types or sleep may
benefit from use of a wrist-mounted accelerometer (Jean-Louis, Kripke et al. 2001).

Strengths and limitations
This study had several limitations that must be addressed. First, our sample was
relatively homogeneous and consisted mainly of younger adults who had a lower BMI than the
general population. Individuals larger or smaller than those tested, or individuals who perform
activities at a different intensity than performed in the study, may not be measured well using the
current ANN algorithms. Additionally, our study provided only a sample of activities that
people may perform on a daily basis, and therefore our models cannot necessarily be used for
comprehensive assessment of everyday activities. Finally, our study did not record the walking
or jogging speeds performed by participants, which may have been useful for evaluating the
differences between these activities. However, MET values recorded for the slow walk averaged
2.9 METs, while the fast walk elicited an average MET value of 4.2 METs, providing evidence
that the two walking speeds were distinct activities that fell into different intensity categories
(i.e., light vs. moderate). This study also had several notable strengths. First, our models were
created and tested using more than 42,000 five-second windows of data from 39 participants,
which is larger than many data sets used in previous studies. Second, validation studies cannot
reasonably test all activities that a person could perform, so it is important to pick a set of
activities that encompasses a range of intensities and types as well as activities commonly
performed in daily life. Our study incorporated a diverse collection of activities, including
commonly performed activities such as walking and several sedentary activities, as well as
lifestyle and exercise activities of varying intensities. Additionally, our use of a simulated
free-living setting is a major advantage as it allowed for much greater variability in the movement
patterns and intensities of the activities performed as well as not requiring steady-state to be
achieved, as is usually the case for laboratory-based protocols. Our inclusion of wrist-, hip-, and
thigh-mounted accelerometers is also a study strength as it allowed for direct comparison of the
accuracy of models developed for each placement site. Finally, our use of Microsoft Excel for
data processing, cleaning, and analysis and R statistical software for model creation and testing
provides further evidence of the accessibility of machine learning to those without access to
highly powered statistical software or computer programming experience. Staudenmayer et al.
(Staudenmayer, Pober et al. 2009) and Lyden et al. (Lyden, Keadle et al. 2013) provide simple
details on the code used for developing and testing ANNs using R software.
Conclusions
In conclusion, we tested four accelerometers located on the left and right wrists, right hip,
and right thigh for their utility in classifying activity type across a wide range of activities
performed in a simulated free-living setting. Overall sensitivity was moderately high at 66-81%,
which improved to 73-87% when condensing similar activities into categories. Both wrist
accelerometer placement sites outperformed the hip and thigh placements for total classification
accuracy as well as in many of the individual activities, providing further support of the wrist
placement for use in large epidemiologic and surveillance studies. Our study builds upon
previous work by using a simulated free-living setting, which enhances generalizability of the
findings as well as the predictive models created. In the future, we intend to expand our
algorithms to measure sleep quality and duration and validate the algorithms in a larger, more
diverse sample.

CHAPTER 5
VALIDATION AND COMPARISON OF ACCELEROMETERS WORN ON THE
WRISTS, HIP, AND THIGH FOR MEASURING SEDENTARY BEHAVIOR

ABSTRACT
The purpose of this study was to validate and compare the accuracy of activity type prediction
models developed for accelerometers placed on the wrists, hip, and thigh for measurement of
total time spent in sedentary behavior and breaks in sedentary behavior. METHODS: Forty-four
healthy adults participated in a 90-minute simulated free-living activity protocol, in which
participants performed a total of 14 sedentary, ambulatory, lifestyle, and exercise activities for
3-10 minutes each. Participants dictated the order, duration, and intensity of activities, which were
recorded using direct observation (for a criterion measure of total time spent in sedentary
behavior and breaks in sedentary behavior). All time spent in lying, reading, and computer use
was summed to obtain a measure of total time spent in sedentary behavior. Any transition from
one of these three activities to a non-sedentary activity was recorded to measure breaks in
sedentary behavior. Four accelerometers were worn (right and left wrists, right hip, and right
thigh) in order to predict total time spent in sedentary behavior and breaks in sedentary behavior
compared to that measured by direct observation (using the activity type prediction models
developed in our previous research [Chapter 4]). We used and tested three break intervals (5-,
30-, and 60-seconds) in order to determine the best method of characterizing breaks in sedentary
behavior from an accelerometer. Differences among accelerometer-predicted and
criterion-measured total time spent in sedentary behaviors and breaks in sedentary behavior were
evaluated using repeated measures analysis of variance and by non-overlap of 95% confidence
intervals. RESULTS: For total time spent in sedentary behavior, all four accelerometers
provided similar estimates to direct observation, but the wrist accelerometers had the lowest error
for prediction (2.8-3.1 minutes), and the hip had the highest error (7.2 minutes). For breaks in
sedentary behavior, the 30-second break interval provided the greatest predictive accuracy.
Using this interval, the hip and left wrist accelerometers produced estimates similar to those
measured by direct observation, but the thigh and right wrist underestimated breaks in sedentary
behavior by 15-17%. CONCLUSIONS: Hip and left wrist accelerometer placements provided
the highest overall accuracy for measuring the multiple constructs of sedentary behavior. These
findings lie in contrast to previous research showing the utility of thigh accelerometers for
measurement of sedentary behavior and therefore warrant confirmation. The superiority of the
left wrist accelerometer over the right wrist accelerometer provides support for the convention
that accelerometers be placed on the non-dominant wrist for sedentary behavior measurement.
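
The two sedentary constructs described above can both be derived from a sequence of per-window activity predictions. The Python sketch below is illustrative only. It assumes five-second windows (as in Chapter 4), treats lying, reading, and computer use as sedentary, and interprets the break interval as the minimum time a non-sedentary bout must last before it counts as a break. That interpretation, the function names, and the label codes are assumptions, not the study's implementation.

```python
SEDENTARY = {"LY", "RE", "CO"}  # lying, reading, computer use


def sedentary_minutes(labels, window_s=5):
    """Total sedentary time in minutes from per-window activity labels."""
    return sum(1 for lab in labels if lab in SEDENTARY) * window_s / 60


def count_breaks(labels, window_s=5, min_break_s=30):
    """Count breaks in sedentary behavior.

    Assumed definition: a break is a run of non-sedentary windows that
    follows sedentary time and lasts at least `min_break_s` seconds.
    """
    breaks = 0
    seen_sedentary = False  # a break must follow some sedentary time
    run = 0                 # current non-sedentary run length, in windows

    for lab in labels:
        if lab in SEDENTARY:
            if seen_sedentary and run > 0 and run * window_s >= min_break_s:
                breaks += 1
            seen_sedentary = True
            run = 0
        else:
            run += 1

    # A qualifying non-sedentary run at the end of the record also counts.
    if seen_sedentary and run > 0 and run * window_s >= min_break_s:
        breaks += 1
    return breaks


# Hypothetical record: 60 s reading, 10 s standing, 60 s reading,
# 40 s slow walking, 30 s computer use.
example = ["RE"] * 12 + ["ST"] * 2 + ["RE"] * 12 + ["WS"] * 8 + ["CO"] * 6

print("Sedentary minutes:", sedentary_minutes(example))
for interval in (5, 30, 60):
    print(f"Breaks with {interval}-second interval:",
          count_breaks(example, min_break_s=interval))
```

In this made-up record the 5-second interval counts both interruptions, the 30-second interval counts only the 40-second walk, and the 60-second interval counts none, which shows why the choice of interval changes the estimated number of breaks.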

INTRODUCTION
Physical activity (PA) has long been recognized for its beneficial effects on many health
indices, such as lowering risk of obesity, cardiovascular disease, and certain cancers, just to
name a few (Morris, Clayton et al. 1990; King and Tribble 1991; Thune and Furberg 2001).
Correspondingly, the Physical Activity Guidelines Advisory Committee issued a report in 2008
detailing evidence-based recommendations that adults should attain 150 minutes/week of
moderate-intensity PA or 75 minutes of vigorous-intensity PA to experience health benefits
(2008). Sedentary behavior (SB) has traditionally been viewed as a lack of PA, and people were
considered sedentary if not meeting the national PA recommendations (Pate, O'Neill et al. 2008).
However, it is possible to meet PA recommendations and still spend substantial time engaged in
sedentary activities (e.g., driving, using a computer, watching TV), a group Owen et al. called
the “active couch potatoes” (Owen, Healy et al. 2010). More recently, epidemiologic and
laboratory-based studies have started uncovering associations between high amounts of SB and
diminished metabolic, cardiovascular, and bone health as well as an increased risk of obesity,
some cancers, and all-cause mortality (Zerwekh, Ruml et al. 1998; Hu, Li et al. 2003; Hamilton,
Hamilton et al. 2004; Hamilton, Hamilton et al. 2007; Howard, Freedman et al. 2008; Schrage
2008; Katzmarzyk, Church et al. 2009; Owen, Healy et al. 2010). Notably, these associations are
largely independent of level of PA. Additionally, it appears that the way SB is accrued may
influence its effects on health, with longer periods of SB being worse than SB broken up
periodically by short, non-sedentary activities (Healy, Dunstan et al. 2008; Owen, Healy et al.
2010).

Despite emerging findings of the potential health risks of SB, there is currently
insufficient research to allow for evidence-based recommendations to be created with regard to
SB. Given that adults spend well over 50% of their waking hours in SB (Matthews, Chen et al.
2008), it is important to accurately measure SB in order to better determine health risks
associated with SB and develop evidence-based recommendations for SB in order to improve
health.
Accelerometer-based activity monitors have become a widely used and accepted method
for PA and energy expenditure measurement due to their objectivity, relatively low participant
and researcher burden, and high measurement accuracy in numerous validation studies
conducted in laboratory-based and free-living environments (Welk 2002). Traditionally,
accelerations of the body were recorded and translated into ‘activity counts,’ which correspond to
magnitude of acceleration. Activity counts could then be placed into simple linear regression
equations to estimate energy expenditure and activity intensity (Montoye, Washburn et al. 1983;
Freedson, Melanson et al. 1998). A count cut-point of <100 counts/minute has been widely used
as a threshold for estimating SB using accelerometers; however, this cut-point has been shown to
provide inaccurate estimates of SB and an inability to measure breaks in SB in free-living settings
(Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Other cut-points ranging
from 50-250 counts/minute have been used to define SB with varying degrees of accuracy (Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). Regardless of which cut-point is
chosen to designate SB, the cut-point method has several notable limitations. First, the cut-point
method does not allow for differentiation of SB from accelerometer non-wear without establishing
additional data reduction rules (e.g., how many consecutive minutes of 0 counts should count as
non-wear), which can affect estimates of SB and PA (Masse, Fuemmeler et al. 2005). Moreover,
the cut-point approach would likely classify standing as sedentary (since little movement occurs
when standing), but several studies have provided evidence that standing elicits a different
physiologic response than sitting or lying (Bey and Hamilton 2003). Additionally, standing has
been shown to be inversely associated with all-cause mortality and cardiovascular disease,
especially in individuals not meeting PA recommendations (Katzmarzyk 2014). Therefore, an
accurate measurement tool for SB needs to be able to differentiate between non-wear and SB as
well as standing and sitting/lying.
Due to limitations of the cut-point approach to measuring SB as well as energy
expenditure, researchers have turned to more advanced data processing techniques, such as
machine learning models, to improve accuracy of activity measurement. These studies show
dramatically improved measurement of energy expenditure (Staudenmayer, Pober et al. 2009;
Freedson, Lyden et al. 2011; Lyden, Keadle et al. 2013) and highly accurate classification of
activity type from a hip-mounted accelerometer (Pober, Staudenmayer et al. 2006; Staudenmayer,
Pober et al. 2009; Freedson, Lyden et al. 2011). However, to our knowledge, only one study has
used machine learning models developed for a hip-mounted accelerometer specifically to measure
total time in SB and breaks in SB. In this study, Lyden et al. found that total time spent in SB and
breaks in SB could be measured accurately in a free-living setting, but only when the machine
learning model was also developed in the free-living setting in which it was subsequently used.
Therefore, there is encouraging, but by no means conclusive, evidence that machine learning
models can improve measurement of SB using hip-mounted accelerometers.

Despite the common use of hip-mounted accelerometers, there are advantages of wearing
accelerometers on other parts of the body. For example, tilt angle of a hip-mounted accelerometer
will affect its measurement accuracy, which can pose problems when trying to measure pregnant or
overweight individuals (Feito, Bassett et al. 2011; DiNallo, Downs et al. 2012). Additionally, the
introduction of machine learning modeling to accelerometer data has dramatically improved
measurement accuracy of accelerometers worn in various body locations, such as the wrist, thigh,
ankle, lower back, and upper arm (Preece, Goulermas et al. 2009). Studies aiming to classify
activity type using accelerometers placed on the wrist and thigh have consistently shown
accuracies of >70% and often accuracies above 90% in laboratory-based studies (Zhang, Rowlands
et al. 2012; Cleland, Kikhia et al. 2013; Mannini, Intille et al. 2013; Skotte, Korshoj et al. 2014).
These two measurement sites are appealing not only for their high activity classification accuracy
but also for their utility in measuring lifestyle behaviors such as sleep quality (wrist) (Webster,
Kripke et al. 1982; Jean-Louis, Kripke et al. 2001) and SB (thigh) (Kozey-Keadle, Libertine et al.
2011; Lyden, Kozey Keadle et al. 2012; Skotte, Korshoj et al. 2012; Skotte, Korshoj et al. 2014),
as well as their potential to improve participant compliance. Thigh-mounted accelerometers have
been used with high accuracy for measuring total time spent in SB as well as breaks in SB and are
often used as a criterion measure of SB in free-living environments. However, methods developed
to classify SB from a thigh accelerometer provide accurate estimates of step count (Maddocks,
Petrou et al. 2010; Harrington, Welk et al. 2011) but do not allow for detailed information on PA
behaviors and appear to underestimate energy expenditure (Harrington, Welk et al. 2011). It
would be useful to have a single measurement tool that could measure a variety of activity types as
well as SB in a free-living setting; to our knowledge, no such method has yet been validated.
Additionally, the wrist-mounted accelerometer has not yet been validated for measurement of total
time spent in SB or breaks in SB.
We have previously developed and validated machine learning algorithms for hip-, wrist-,
and thigh-mounted accelerometers that can classify activity type with accuracies above 70% for the
hip and above 80% for the wrists and thigh, but these have yet to be validated for measurement of
SB (Chapter 4). Therefore, the primary purpose of our study was to develop, validate, and
compare the accuracy of machine learning algorithms created from hip-, wrist-, and thigh-mounted
accelerometers for measuring 1) total time spent in SB and 2) breaks in SB in a simulated free-living environment. A secondary purpose was to compare accelerometers located on the left and
right wrists for prediction of total time spent in SB as well as breaks in SB.

METHODS
Summary of protocol
Participants came to the Human Energy Research Laboratory to participate in a 90-minute simulated free-living protocol, for which they performed a total of 14 sedentary,
ambulatory, lifestyle, and exercise activities while wearing a total of four accelerometers
(placed on the right hip, right thigh, and both wrists). Each activity was performed for between
3-10 minutes, with the order, duration, and intensity of activities left up to participants. During
the protocol, the order and duration of participants’ activities as well as total time spent in SB
and breaks in SB were recorded by a trained observer.
Participants
A total of 44 adults (22 male, 22 female) were recruited from the surrounding area of
East Lansing, MI via email, flyers, and word of mouth for participation in this study. In order
to be eligible for participation, participants had to fulfill three criteria: 1) they had to be free of
health conditions preventing them from being able to safely perform moderate- or vigorous-intensity physical activities, 2) they could not have any orthopedic limitations that would
invalidate the use of accelerometry, and 3) they had to fall within the age range of 18-44 years.
Prior to participant recruitment, this study was approved by the Michigan State University
Institutional Review Board.
Instrumentation
Each participant wore four accelerometer-based activity monitors in this study: two
ActiGraph GT3X+ accelerometers and two GENEActiv accelerometers. Additionally, a portable
digital assistant (PDA) computer was used by observers to record the activities performed during
the protocol. The accelerometers and PDA were synchronized to an external clock before each
test; descriptions of the accelerometers and PDA follow.
The acceleration data for all four accelerometers were time stamped and stored within the
monitors until they could be downloaded to a computer for analysis. Additionally, the
accelerometers were oriented so that the x-axis was the vertical axis, the y-axis was the medial-lateral axis, and the z-axis was the anterior-posterior axis.
ActiGraph accelerometers
The ActiGraph (ActiGraph LLC, Pensacola, FL) is a commonly used accelerometer for
activity measurement, and there is an abundance of literature regarding its reliability and validity
for measurement of PA (Freedson, Melanson et al. 1998; McClain, Sisson et al. 2007). Two
GT3X+ models were worn by each participant during the study, one on the midline of the right
thigh and adhered to the leg (with hypoallergenic sticky tape), and the other placed on the right hip
at the anterior axillary line (with an elastic belt). The ActiGraph GT3X+ records raw
accelerations of up to ± 6 times the gravitational force (6g) in three dimensions of movement. For
the current protocol, the accelerometers recorded at a rate of 40 samples per second (40 Hz).
GENEA accelerometers
The GENEActiv accelerometer (Activinsights Ltd, Kimbolton, Cambridgeshire, UK) has
undergone preliminary validations for PA measurement (Esliger, Rowlands et al. 2011) as well
as activity type classification (Zhang, Rowlands et al. 2012). The GENEA records raw
accelerations of up to ± 6g in three axes of movement, and the GENEA monitors used in this study
were set to record acceleration data at a rate of 20 Hz. Participants wore two GENEA
accelerometers which were fastened to the dorsal side of each wrist using a watch strap supplied by
the manufacturers (Esliger, Rowlands et al. 2011).
iPAQ portable digital assistant and direct observation
Direct observation (DO) was conducted using an HP iPAQ personal digital assistant (PDA)
(HP Development Company, Palo Alto, CA) to obtain a criterion measure for total time spent in
SB and breaks in SB. During the study protocol, a trained observer used a PDA with BEST
software developed based on the Children’s Activity Rating Scale protocol (Puhl, Greaves et al.
1990). The observer used the codes T1-T14 to represent the 14 activities in the visit and recorded
the activities being performed continuously as they occurred throughout the visit. The codes T1-T3
represented the three sedentary activities (lying, reading, and computer use) in the visit, and these
were used to determine total time spent in SB and breaks in SB. Inter-rater reliability for DO was
above r=0.90 for this study.
Procedure
Upon arriving at the Human Energy Research Laboratory, details of the study were
discussed with each participant. Written informed consent was obtained, and a physical activity
readiness questionnaire was administered to ensure that the participant had no contraindications to
engaging in PA. After consenting, participant weight and height were measured (to the nearest 0.1
kg and 0.1 cm, respectively) according to standardized methods (Malina 1995). Body mass index
(BMI) was calculated by dividing body weight by the square of height (kg/m²). Participant age
was assessed by asking participants to state their age in years, and handedness (left or right) was
determined by asking participants which hand they prefer to use for the majority of everyday
activities.
After being fitted with the four accelerometers, participants performed 14 activities (shown in
Table 5.1), which were meant to include many different types and intensities of activities that would
likely be seen in a free-living environment. Ambulatory activities (walking and jogging) are
common in accelerometer validation literature; we added the sedentary, exercise, and lifestyle
activities to determine the potential for the four accelerometers to measure SB accurately in a
setting where a variety of activities was being performed, as is normally seen in free-living
environments. Additionally, we added an activity where participants removed the accelerometers
so that the ANNs would be able to recognize non-wear, which is important to be able to detect in
free-living environments for compliance purposes and for differentiation of non-wear from SB.

Table 5.1. Activities performed during the simulated free-living protocol.
Activity Category | Activity | Activity Intensity | Description of Activity*
Sedentary behaviors (SB) | Lying down (T1) | Sedentary | Lying on a mat on the floor
Sedentary behaviors (SB) | Reading (T2) | Sedentary | Reading a magazine article while sitting at a table
Sedentary behaviors (SB) | Computer (T3) | Sedentary | Sitting and playing a computer game that involves mouse clicking and typing
Standing (ST) | Standing (T4) | Light** | Standing still with arms at sides
Lifestyle (LI) | Laundry (T5) | Light | Folding towels and putting them in a laundry basket
Lifestyle (LI) | Sweeping (T6) | Light | Sweeping confetti into piles
Leisure walk (LW) | Walking slow (T7) | Light | Walking at a self-selected ‘slow’ pace in a hallway
Brisk walk (BW) | Walking fast (T8) | Moderate | Walking at a self-selected ‘brisk’ pace in a hallway
Jogging (JO) | Jogging (T9) | Vigorous | Jogging at a self-selected pace in a hallway
Cycling (CY) | Cycling (T10) | Moderate/Vigorous | Cycling on a cycle ergometer at a self-selected cadence of 50-100 rpm with 1 kg resistance
Stair use (SU) | Stair climbing and descending (T11) | Moderate/Vigorous | Walking up and down a flight of stairs at a self-selected pace
Exercise (EX) | Biceps curls (T12) | Light | Standing still while doing biceps curls with a 3-lb. weight in each hand
Exercise (EX) | Squats (T13) | Moderate | With feet shoulder-width apart, bending at the knees (to a 90° angle) while holding an unweighted broom behind the head
Non-wear (NW) | Non-wear of accelerometer (T14) | N/A | Not wearing the accelerometer

* Activity order, intensity, and duration (3-10 minutes) were left up to participants.
** Standing has traditionally been considered SB; however, recent literature suggests that standing
should be considered light-intensity instead of SB due to the differential physiologic effects of
standing as compared to sitting/lying (Owen, Healy et al. 2010).
Participants completed the 14 activities (shown in Table 5.1) in a 90-minute, simulated
free-living protocol which took place within the Human Energy Research Laboratory and in a
building stairwell and hallway. The 14 activities were described to each participant prior to the
start of the protocol, and some of the less familiar activities (e.g., squats) were demonstrated to
ensure understanding. Participants completed each of the 14 activities for at least three minutes
and for no more than 10 minutes, but the order, intensity, and duration of the activities were left up
to each participant. Participants were also free to perform activities more than once if they so
chose. A research assistant directly observed and recorded each activity on a handheld PDA
computer while activities were being performed.
Additionally, the research assistant periodically updated participants on which activities
they still needed to complete. The non-wear activity was saved until the end of the protocol so that
participants would not spend a significant portion of the time trying to remove and reattach the
accelerometers. For this study, direct observation (DO) served as the criterion measure of total
time spent in SB and breaks in SB. Upon completion of the protocol, participants were given a
$35 Target® gift card.
Data reduction and modeling
Artificial neural networks
Artificial neural networks (ANNs) are nonlinear models which predict an outcome or
dependent variable y (e.g., energy expenditure or activity type) using a number of inputs x1…xk,
where k is the number of features used to predict y. A graphical depiction of the ANNs created in
the current study can be seen in Figure 5.1. The ANNs were used in our previous work (Chapter 4)
for predicting activity type, and these predictions were then used in the current study to predict total time in SB
and breaks in SB. For activity type classification, the ANNs functioned similarly to a logistic
regression model. Setting the activity types as the nominal values a1…a14, the ANN model can be
seen in Equation 1.

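Equation 1:

\Pr(y = a_j) = C \exp\left( \sum_{h=1}^{H} w_{hj} \, U\left( \sum_{i=1}^{k} w_{ih} x_i \right) \right), \quad j = 1, \ldots, 14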

In Equation 1, Pr is probability, C is a constant chosen so that Pr(y=a1)+…+Pr(y=a14)=1,
w are the weights of the input features, and H is the number of hidden units. For each activity,
values closer to 1 represented a higher likelihood that the activity was being performed. The activity
with the value closest to 1 was chosen as the predicted output by the ANN. In accordance with
previous research, our models contained only one hidden layer (Preece, Goulermas et al. 2009;
Staudenmayer, Pober et al. 2009; Trost, Wong et al. 2012).
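
The model form described above can be illustrated with a minimal sketch of a single-hidden-layer network that outputs one probability per activity type. This is not the fitted model from this study: the logistic activation function, the random weights, and the names `ann_forward`, `w1`, and `w2` are assumptions made for the example. Only the 38 inputs, 15 hidden units, and 14 activity types come from the text.

```python
import math
import random

def ann_forward(x, w1, w2):
    """Return one probability per activity type for a feature vector x.

    x  : list of k input features
    w1 : list of H weight vectors (each of length k), hypothetical values
    w2 : list of n_classes weight vectors (each of length H), hypothetical values
    """
    # Hidden units: weighted sum of inputs passed through a logistic function (assumed).
    hidden = [1.0 / (1.0 + math.exp(-sum(xi * w for xi, w in zip(x, unit))))
              for unit in w1]
    scores = [sum(h * w for h, w in zip(hidden, out)) for out in w2]
    top = max(scores)                      # subtract the maximum for numerical stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)                      # dividing by the total plays the role of C
    return [e / total for e in expo]

random.seed(0)
k, H, n_classes = 38, 15, 14
x = [random.gauss(0, 1) for _ in range(k)]
w1 = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(H)]
w2 = [[random.gauss(0, 1) for _ in range(H)] for _ in range(n_classes)]

probs = ann_forward(x, w1, w2)
predicted_code = probs.index(max(probs)) + 1   # 1..14, i.e., T1..T14
```

The probabilities sum to 1 by construction, and the predicted activity is the one with the largest probability, which mirrors the selection rule stated in the text.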
After classifying into specific activity types, the three sedentary activities (lying, computer
use, and reading) were collectively categorized as SB to allow for prediction of total time spent in
SB. Likewise, the 10 non-sedentary activities (standing, laundry, sweeping, walk slow and fast,
jogging, cycling, stair use, biceps curls, and squats) were collectively classified as non-SB in order
to predict breaks in SB. Non-wear was classified into its own category and later removed from the
dataset since there is no way to tell if a person is sedentary or non-sedentary if the accelerometer is
not being worn.

Figure 5.1. ANN for predicting activity type and sedentary behavior.

Figure 5.1 legend
* The number of input features was 38, as described in Table 5.2. Additionally, three hidden
units are shown in Figure 5.1 for simplicity, but 15 hidden units were used for construction of
the ANNs.
Accelerometer signal features (one of each per axis, three total of each per accelerometer)
1. Mean = mean
2. Var = variance
3. Cov = covariance
4. Min = minimum
5. Max = maximum
6. MeanOR = mean accelerometer orientation
7. VarOR = variance of accelerometer orientation
8. 10th %ile = 10th percentile
9. 25th %ile = 25th percentile
10. 50th %ile = 50th percentile
11. 75th %ile = 75th percentile
12. 90th %ile = 90th percentile
Participant characteristics features
13. Ht = participant height
14. Wt = participant weight
Non-feature abbreviations
T1-T3 are sedentary activities, and T4-T13 are non-sedentary activities.
S = summations of the input layer in the hidden units
U = activation function for the hidden layer
W1 = the weight vectors for each of the inputs
W2 = the weight vectors for each of the summations
The ANNs were created and tested using a leave-one-out cross-validation. In this
approach, data from all but one participant were used to estimate the weights for each input feature
for predicting activity type. Then, the ANN was tested on the data from the participant left out of
the training phase by supplying the input features and comparing the predicted activity type from
the ANNs to the recorded activity type from DO. The leave-one-out cross-validation is an iterative
approach and was repeated with each participant’s data used as the testing data once, thereby
obtaining an ANN for activity type for each participant in the study. The weights determined from
each iteration of the leave-one-out validation were averaged to obtain a final ANN. This process
was conducted separately for each accelerometer, resulting in four distinct ANNs.
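
The leave-one-out procedure described above can be sketched as follows. To keep the example short and dependency-free, a nearest-centroid classifier stands in for the ANN; that substitution, the toy data, and all function names are assumptions for illustration only. The sketch also omits the averaging of weights across iterations.

```python
def fit_centroids(rows):
    """rows: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for feats, label in rows:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, feats):
    """Return the label whose centroid is closest to feats."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], feats)))

def leave_one_participant_out(data):
    """data: {participant_id: [(features, label), ...]}. Returns {participant_id: accuracy}."""
    accuracy = {}
    for held_out in data:
        # Train on every participant except the one held out.
        train = [row for pid, rows in data.items() if pid != held_out for row in rows]
        model = fit_centroids(train)
        # Test on the held-out participant only.
        test_rows = data[held_out]
        hits = sum(predict(model, feats) == label for feats, label in test_rows)
        accuracy[held_out] = hits / len(test_rows)
    return accuracy

toy = {  # hypothetical two-feature windows for three participants
    "p1": [([0.0, 0.1], "sedentary"), ([1.0, 0.9], "jogging")],
    "p2": [([0.1, 0.0], "sedentary"), ([0.9, 1.1], "jogging")],
    "p3": [([0.05, 0.05], "sedentary"), ([1.1, 1.0], "jogging")],
}
results = leave_one_participant_out(toy)
```

Splitting by participant rather than by window keeps all windows from the test participant out of training, which is the property the procedure relies on.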
The ANNs were developed with the intention to predict activity type, which could then be
used to estimate total time spent in SB as well as breaks in SB. In accordance with previous
research, we chose to use five-second windows for creation and testing of our ANNs (Preece,
Goulermas et al. 2009). Table 5.2 provides a list of the 38 features tested and used in the current
analyses. The 36 accelerometer features (12 features for each of the three axes) are time-domain
features that are simple to compute and have been used previously as inputs into machine learning
algorithms. Additionally, we included height and weight to account for different body sizes.
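
As a rough illustration of how several of these time-domain features could be computed for one axis, the sketch below cuts a signal into five-second windows and extracts the mean, variance, minimum, maximum, and percentiles. It is a sketch under stated assumptions, not the study's code: the function names are hypothetical, the use of the sample variance is an assumption, and the covariance and orientation features are omitted because their exact definitions are not given in the text.

```python
import statistics

WINDOW_SECONDS = 5  # window length stated in the text

def split_windows(samples, rate_hz, window_s=WINDOW_SECONDS):
    """Cut one axis of raw samples into non-overlapping windows."""
    size = rate_hz * window_s
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def value_at_percentile(ordered, pct):
    """Rank-based percentile: with 100 sorted samples, pct=10 picks the 10th value."""
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct*n/100
    return ordered[rank - 1]

def window_features(axis_samples):
    """Return a dict of time-domain features for one axis of one window."""
    ordered = sorted(axis_samples)
    feats = {
        "mean": statistics.fmean(axis_samples),
        "var": statistics.variance(axis_samples),  # sample variance (assumed)
        "min": ordered[0],
        "max": ordered[-1],
    }
    for pct in (10, 25, 50, 75, 90):
        feats[f"p{pct}"] = value_at_percentile(ordered, pct)
    return feats

# Hypothetical signal: 10 seconds at 20 Hz gives two windows of 100 samples each.
x_axis = [i / 100 for i in range(200)]
windows = split_windows(x_axis, rate_hz=20)
first = window_features(windows[0])
```

Running the same function on the y- and z-axes, then appending height and weight, would give a feature vector of the kind listed in Table 5.2, minus the omitted covariance and orientation terms.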

Table 5.2. Features used for EE and activity type prediction.
Feature number | Feature used | Formula for calculating feature
1-3* | Mean acceleration signal | (1/n) \sum_{i=1}^{n} A_{x,i}, where n is the number of accelerations in the window
4-6* | Variance of acceleration signal | [formula not recoverable from source]
7-9* | Covariance of acceleration signal | [formula not recoverable from source]
10-12* | Minimum of acceleration signal | \min(A_x)
13-15* | Maximum of acceleration signal | \max(A_x)
16-18* | 10th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 10th value
19-21* | 25th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 25th value
22-24* | 50th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 50th value
25-27* | 75th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 75th value
28-30* | 90th percentile of acceleration signal | For every 100 accelerations, arrange in order from smallest to largest and pick the 90th value
N/A | Accelerometer orientation (needed for calculating features 31-36) | [formula not recoverable from source]
31-33* | Mean accelerometer orientation | [formula not recoverable from source]
34-36* | Variance of accelerometer orientation | [formula not recoverable from source]
37 | Participant height | N/A
38 | Participant weight | N/A

Ax is the acceleration in the direction of the x-axis.
*signifies that one feature is included for each of the three accelerometer axes. The formulas
shown are for the x-axis, but the formulas for the y-and z-axes are similar.

Assessing sedentary behavior using accelerometers
The ANNs were created in order to classify 10 different activity categories. In our initial
testing of the ANNs (Chapter 4), we found that they correctly classified 72.6%,
92.1%, 93.5%, and 92.7% of sedentary activities for the hip, thigh, left wrist, and right wrist accelerometer placements,
respectively. However, higher classification accuracy for sedentary activities does not necessarily
ensure better accuracy for predicting total time spent in SB or breaks in SB. Therefore, in the
current study, total time spent in SB for each participant was estimated using each accelerometer.
ANNs from each of the four accelerometers predicted the activities being performed throughout
the protocol, and all time spent lying, reading, or in computer use was summed to obtain a
prediction for total time spent in SB. Since each accelerometer predicted activity type separately
from the other accelerometers, we obtained four estimates of total time spent in SB for each
participant.
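
The summation described above amounts to counting the five-second windows predicted as lying, reading, or computer use and converting the count to minutes. The following minimal sketch shows this; the label strings and the function name are hypothetical, and the example sequence is invented.

```python
SEDENTARY = {"lying", "reading", "computer"}   # the three sedentary activities (T1-T3)
WINDOW_SECONDS = 5                             # window length stated in the text

def total_sb_minutes(predicted_windows):
    """Sum the time in windows predicted as a sedentary activity.

    Non-wear windows are not sedentary labels, so they never add to the total.
    """
    sb_windows = sum(1 for activity in predicted_windows if activity in SEDENTARY)
    return sb_windows * WINDOW_SECONDS / 60

# Hypothetical predictions from one accelerometer for one participant.
example = ["reading"] * 36 + ["standing"] * 12 + ["computer"] * 24 + ["non-wear"] * 12
minutes = total_sb_minutes(example)   # 60 sedentary windows * 5 s = 5.0 minutes
```

Applying the function to the predictions from each of the four accelerometers gives the four per-participant estimates mentioned in the text.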
Similarly, breaks in SB were assessed for each participant and separately for each
accelerometer placement. A break in SB has been defined in previous research as when an interval
classified as SB is followed by an interval classified as a non-sedentary activity. We classified a
break in SB using three different lengths of time that a non-sedentary activity must occur to
constitute a break (we call this a break interval). First, since previous research using
accelerometers to measure SB uses 60-second break intervals, we defined a break in SB as
when 12 consecutive 5-second windows (12*5 = 60 seconds) of a sedentary activity were followed
by 12 consecutive windows of a non-sedentary activity. Second, a shorter break in SB (i.e., < 60
seconds) might be physiologically meaningful but may be missed if using a 60-second break
interval. Therefore, we also evaluated the accuracy of using two shorter intervals for
measuring breaks in SB (30 seconds and 5 seconds). Using a 30-second break interval for
estimating breaks in SB, we defined a break as six consecutive windows of a sedentary activity
followed by six consecutive windows of a non-sedentary activity. Using a 5-second break interval
for estimating SB breaks, we defined a break as one window of sedentary activity followed by one
window of non-sedentary activity.
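
The three break definitions differ only in how many consecutive windows are required on each side of the change from sedentary to non-sedentary. The sketch below is one possible reading of that rule, with a hypothetical function name and an invented window sequence; it is not the code used in the study.

```python
def count_breaks(is_sedentary, n_windows):
    """Count breaks in SB.

    is_sedentary : list of booleans, one per 5-second window
    n_windows    : windows required on each side (12 = 60 s, 6 = 30 s, 1 = 5 s)
    A break is counted where n_windows sedentary windows are immediately
    followed by n_windows non-sedentary windows.
    """
    breaks = 0
    for i in range(n_windows, len(is_sedentary) - n_windows + 1):
        before = is_sedentary[i - n_windows:i]
        after = is_sedentary[i:i + n_windows]
        if all(before) and not any(after):
            breaks += 1
    return breaks

# Hypothetical sequence: 60 s sedentary, 15 s up, 60 s sedentary, 60 s up.
sequence = [True] * 12 + [False] * 3 + [True] * 12 + [False] * 12

breaks_5s = count_breaks(sequence, 1)    # both interruptions count
breaks_30s = count_breaks(sequence, 6)   # the 15-second interruption is too short
breaks_60s = count_breaks(sequence, 12)  # only the final, long interruption counts
```

In this invented sequence the 15-second interruption is counted only with the 5-second break interval, which illustrates why shorter intervals can detect breaks that a 60-second interval misses.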
Direct observation
Direct observation has been used successfully as a criterion measure of SB in previous
studies conducted in free-living settings (Lyden, Petruski et al. 2013) and served as our criterion
measure for the current study. Data on activities performed were recorded on a handheld PDA
using the BEST observation software. Using this software, activities performed during the visit
were coded as T1-T13, as shown in Table 5.1.
As the final activity in the visit, participants took off their accelerometers and set them on a
table, and then the next 3-10 minutes were recorded as non-wear (T14) while the accelerometers sat
on the table. Any activity coded as non-wear was not included when analyzing SB, since, by
definition, we cannot know whether participants are engaging in SB if the accelerometer is not being
worn. Exclusion of non-wear was necessary in order to determine the real-world suitability of the
ANNs for measurement of SB. Additionally, as participants transitioned from one activity to
another, we coded this time between activities in a special transition category (T15).
The recording of activities using DO took place continuously and in real time. Research
assistants were trained to record an activity change as closely as possible to the moment it
occurred. After collection, these DO data were synchronized with the accelerometer data so that
each five-second window of accelerometer data was matched to the actual activity performed
during that window. In most cases, only one activity occurred during a given five-second window.
However, when transitioning between activities, two activities could occur in the same window. If
this occurred, the window was automatically recoded as a transition. Additionally, we used the
transition category to define all time between activities, such as walking from one activity to
another or making an equipment adjustment between activities. Thus, transitions did not represent
a specific activity type but instead involved walking, standing, etc. that occurred at the end of one
activity and before the next started. We did not include transitions as a separate activity in the
ANN creation but instead removed them from the DO and accelerometer datasets prior to creation
of the ANNs and before prediction of total time spent in SB.
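
The matching of DO records to five-second windows, including the rule that a window containing two activities is recoded as a transition, can be sketched as below. The one-second time resolution, the event-log format, and the function name are assumptions for illustration; the codes follow Table 5.1, with T15 as the transition category.

```python
TRANSITION = "T15"

def label_windows(events, total_seconds, window_s=5):
    """Assign one DO code to each window.

    events : list of (start_second, code), sorted by start time (hypothetical format)
    A window in which more than one code occurs is recoded as a transition.
    """
    def code_at(second):
        current = events[0][1]
        for start, code in events:
            if start <= second:
                current = code
        return current

    labels = []
    for w_start in range(0, total_seconds - window_s + 1, window_s):
        codes = {code_at(t) for t in range(w_start, w_start + window_s)}
        labels.append(codes.pop() if len(codes) == 1 else TRANSITION)
    return labels

# Hypothetical log: reading (T2), then a transition at 12 s, then jogging (T9) at 20 s.
events = [(0, "T2"), (12, "T15"), (20, "T9")]
labels = label_windows(events, total_seconds=30)

# Windows used for ANN creation and total SB prediction exclude transitions.
without_transitions = [code for code in labels if code != TRANSITION]
```

Here the third window (10-14 s) contains both reading and the start of the transition, so it is recoded as a transition in its entirety.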
However, breaks in SB only occurred during the times coded as transitions in the dataset
(e.g., transitioning from reading to jogging would represent a break in SB). Therefore, we added
the transition data back to the dataset after creation of the ANNs but before testing the ANNs for
their prediction of breaks in SB. For DO, any transition from a sedentary to a non-sedentary
activity was considered a break in SB, no matter how short the transition may have been.
Conversely, for the accelerometers, we predicted breaks in SB in three ways (using 5-, 30-, and 60-second break intervals), as described in the previous section.
Statistical analyses
A criterion value of total time spent in SB was assessed for each participant using DO and
averaged for the entire sample. Similarly, estimates of SB from each of the four accelerometers
were calculated for each participant and averaged together for the entire sample. Differences
between criterion-measured and accelerometer-estimated total time spent in SB were evaluated
using repeated measures analysis of variance (RMANOVA). If significant differences were
revealed by the RMANOVA, post hoc dependent t-tests were conducted, with a least significant
difference (LSD) correction used to account for multiple comparisons. Additionally, root mean
square error (RMSE) values and their 95% confidence intervals (CIs) were calculated for predicted
vs. measured total time spent in SB for each of the four accelerometers. Significant differences for
RMSE among monitor locations were determined by non-overlap of a 95% CI with the mean from
another accelerometer location.
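
For reference, RMSE for predicted versus measured total time in SB can be computed as in the short sketch below. The participant values are invented for illustration, and the function name is hypothetical. The method used to obtain the 95% CIs is not specified in the text, so it is not reproduced here.

```python
import math

def rmse(predicted, measured):
    """Root mean square error between paired predictions and criterion values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical minutes of SB for four participants.
do_minutes = [20.0, 22.0, 18.0, 21.0]      # criterion (direct observation)
wrist_minutes = [21.0, 20.0, 18.0, 24.0]   # accelerometer prediction

error = rmse(wrist_minutes, do_minutes)
```

Because errors are squared before averaging, over- and underpredictions do not cancel, which is why RMSE can differ among placements even when mean estimates do not differ from DO.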
For breaks in SB, criterion-measured breaks were also obtained for each participant using
DO and averaged for the entire sample. Estimates of breaks in SB from each accelerometer were
obtained separately for five-, 30-, and 60-second break intervals for each participant and averaged
for the sample. Differences among DO, the four accelerometers, and the three break intervals were
evaluated with RMANOVA, and differences were evaluated using post hoc tests and an LSD
correction. Moreover, RMSE values and their 95% CIs were computed to compare predicted to
measured breaks in SB, with non-overlap of a 95% CI with the mean from another accelerometer
location or break interval indicative of statistically significant differences.
Power analysis
We desired 80% power to detect a difference of at least moderate effect size (ES=0.5)
among accelerometers and the criterion measure. Therefore, with the α level set at 0.05, we
needed 34 participants to be sufficiently powered to detect this difference. We chose to
oversample by 10 participants in order to have adequate sample size despite an expected loss of a
few participants due to the possibility of equipment malfunction, especially when using multiple
accelerometers and a handheld computer.
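
The required sample size can be approximated with a standard formula. The text does not state which test the power calculation was based on, so the sketch below assumes a two-sided paired comparison and uses the normal approximation with a small-sample correction term; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def paired_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate n for a two-sided paired comparison (assumed test).

    n = ((z_{1-alpha/2} + z_{power}) / d)^2 + z_{1-alpha/2}^2 / 2
    The second term is a correction for using the t rather than the
    normal distribution.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) / effect_size) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)

n_required = paired_sample_size(0.5)   # ES = 0.5, alpha = 0.05, power = 0.80
```

Under these assumptions the approximation gives 34 participants, which agrees with the figure reported above.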

RESULTS
Of the 44 participants who participated in the study, significant data loss occurred for the
thigh accelerometer in two participants, resulting in their exclusion from the data analysis.
Additionally, the portable metabolic analyzer (used to address a study aim not part of the current
manuscript) malfunctioned in three participants, resulting in premature termination of the
protocol and exclusion of their data from the analysis. Therefore, 39 participants with viable
data were included in the final data analysis. Sample demographics included in the analyses are
displayed in Table 5.3.
Table 5.3. Demographic characteristics of participants enrolled in study.
Characteristic | All (n=39) | Males (n=19) | Females (n=20)
Age (years) | 22.1 (4.3) | 23.7 (5.0) | 20.5 (2.7)
Weight (kg) | 72.4 (16.2) | 84.5 (13.1) | 60.8 (8.9)
Height (cm) | 171.4 (10.1) | 179.1 (7.7) | 164.1 (5.7)
BMI (kg/m²) | 24.4 (3.6) | 26.3 (3.4) | 22.5 (2.6)
Data are displayed as mean (SD).

Predictions of total time spent in SB are shown in Figure 5.2. Overall, participants spent an
average of 20.7 minutes engaged in SB during the visit, according to DO. The hip
accelerometer tended to underpredict SB by 7.9% (19.1 minutes from the hip vs. 20.7 minutes
from DO), but this difference did not reach statistical significance. The estimates of total time
spent in SB predicted by the accelerometers placed on the thigh and both wrists were not
significantly different from that measured by DO. Additionally, we rearranged the data to
compare dominant and non-dominant wrist placements, but neither was significantly different
from DO-measured total time spent in SB.

Figure 5.2. Predictions of total time spent in SB compared to a criterion measure (DO).
[Figure: y-axis, Total Sedentary Behavior (min).]

AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor,
GE Left Wrist = GENEA monitor placed on the left wrist, GE Right Wrist = GENEA monitor
placed on the right wrist.

Although there were no significant differences among the four accelerometers compared
to DO for predicting total time spent in SB, there was considerable variation in RMSE (Table
5.4), ranging from 2.8 minutes with the accelerometer on the left wrist to 7.2 minutes with the
hip-mounted accelerometer. Each monitor placement site had significantly different RMSE
values than the other three, but it is notable that the two wrist accelerometer placements had
RMSE values that were 49-61% lower than the RMSE values achieved with the hip and thigh
accelerometer placements. The left wrist placement had significantly lower RMSE than the right
wrist placement; similarly, the non-dominant wrist placement had significantly lower RMSE
than the dominant wrist placement.
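The 49-61% range quoted above follows directly from the RMSE values in Table 5.4. As a short worked check (the helper name is hypothetical), the largest gap is between the left wrist and the hip, and the smallest is between the right wrist and the thigh:

```python
def percent_lower(value, reference):
    """Percentage by which `value` is lower than `reference`."""
    return (reference - value) / reference * 100

# RMSE (minutes) for total time spent in SB, taken from Table 5.4.
rmse_minutes = {
    "AG Hip": 7.2,
    "AG Thigh": 6.3,
    "GE Left Wrist": 2.8,
    "GE Right Wrist": 3.2,
}

largest = percent_lower(rmse_minutes["GE Left Wrist"], rmse_minutes["AG Hip"])
smallest = percent_lower(rmse_minutes["GE Right Wrist"], rmse_minutes["AG Thigh"])
print(round(largest), round(smallest))  # prints "61 49"
```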

Table 5.4. Root mean square error for prediction of total time spent in SB and breaks in SB.
                       RMSE for predicted    RMSE for predicted    RMSE for predicted    RMSE for predicted
Accelerometer          total time spent      5-second breaks       30-second breaks      60-second breaks
location               in SB                 in SB                 in SB                 in SB
                       [Minutes (95% CI)]    [Breaks (95% CI)]     [Breaks (95% CI)]     [Breaks (95% CI)]
AG Hip                 7.2 (6.9-7.4)*        31.5 (30.7-32.2)*     1.6 (1.5-1.6)*        2.0 (2.0-2.1)
AG Thigh               6.3 (5.0-6.5)*        21.0 (20.5-21.6)*     1.5 (1.4-1.5)*        2.1 (2.0-2.1)
GE Left Wrist          2.8 (2.7-2.9)*        32.4 (31.7-33.1)*     1.9 (1.8-1.9)         2.1 (2.0-2.2)*
GE Right Wrist         3.2 (2.8-3.5)*        29.5 (28.9-30.0)*     1.9 (1.8-1.9)         2.2 (2.2-2.3)*
Dominant Wrist         3.3 (2.9-3.5)^        30.7 (30.1-31.4)      1.9 (1.8-2.0)         2.2 (2.1-2.2)
Non-dominant Wrist     2.7 (2.6-2.8)         31.2 (30.6-31.9)      1.9 (1.8-2.0)         2.2 (2.1-2.2)

* indicates significant difference from all other accelerometer placements.
^ indicates significant difference from non-dominant wrist placement.
AG Hip = Hip-mounted ActiGraph monitor, AG Thigh = Thigh-mounted ActiGraph monitor,
GE Left Wrist = GENEA monitor placed on the left wrist, GE Right Wrist = GENEA monitor
placed on the right wrist.

Predictions of breaks in SB compared to DO are shown in Figures 5.3-5.5. From these
figures, it is apparent that choice of break interval for defining SB altered the accuracy of
prediction of breaks in SB in this study. Choice of the five-second interval for defining SB
(Figure 5.3) resulted in dramatic overestimations of breaks in SB for all four accelerometer
placements compared to DO. The thigh accelerometer placement performed best for predicting
breaks in SB for the five-second interval, but it still predicted over five times more breaks in SB
than were actually taken. At the other extreme, use of the 60-second interval for defining a
break in SB (Figure 5.5) resulted in underprediction of breaks by all four accelerometer
placements compared to DO, and none of the predictions were significantly different from each
other. The 30-second interval for defining a break in SB resulted in highest accuracy of
prediction of breaks in SB (Figure 5.4). The thigh and right wrist accelerometer placements
underestimated breaks slightly, by an average of 0.7-0.8 breaks per visit. Conversely, the hip
and left wrist accelerometer placements provided accurate predictions of breaks in SB with the
30-second interval. When analyzed by dominant and non-dominant wrists, the dominant wrist
placement underpredicted breaks, whereas the non-dominant wrist accurately predicted breaks in
SB.

Figure 5.3. Predictions of breaks in SB using a five-second interval.
[Figure: y-axis, Number of Breaks in SB.]
* indicates significant difference from DO.
^ indicates significant difference from all other accelerometers.

Figure 5.4. Predictions of breaks in SB using a 30-second interval.
[Figure: y-axis, Number of Breaks in SB.]
* indicates significant difference from DO.

Figure 5.5. Predictions of breaks in SB using a 60-second interval.
[Figure: y-axis, Number of Breaks in SB.]

* indicates significant difference from DO.
Table 5.4 shows the RMSE values for predicted vs. measured breaks in SB, displayed
separately for the five-, 30-, and 60-second break intervals. For the five-second break interval,
the poor prediction accuracy for breaks in SB seen in Figure 5.3 was compounded by very high
RMSE values for all four accelerometer placements, ranging from an error of 21.0 breaks for the
thigh placement site to 32.4 breaks for the left wrist placement. The RMSE values for the 30-
and 60-second break intervals were considerably lower than for the five-second break interval.
For all four accelerometer placements, the 30-second break interval had significantly lower
RMSE than the 60-second interval, again indicating superior accuracy for the 30-second break
interval. When comparing among the four accelerometer placements, the hip and thigh
accelerometers had RMSE values 19-30% lower than both wrist placements for the 30-second break
interval and 4-9% lower than the wrist placements for the 60-second break interval.

When comparing the two wrist placements, prediction of total time spent in SB was not
significantly different between the two; however, RMSE for prediction of total time spent in SB
was about 12% lower for the left wrist placement than the right wrist placement (and about 18%
lower for the non-dominant wrist than the dominant wrist). For prediction of breaks in SB, both
dramatically overpredicted breaks using the five-second break interval and underpredicted breaks
using the 60-second break interval. Using the 30-second break interval, the RMSE values were
similar between monitors, but the left wrist prediction of breaks was not significantly different
from DO, whereas the right wrist underpredicted breaks compared to DO. Similarly, breaks
were underpredicted when data were analyzed for the dominant wrist, but the non-dominant
wrist placement resulted in accurate predictions of breaks with the 30-second break interval.

DISCUSSION
The purpose of this investigation was to develop, validate, and compare the accuracy of
ANNs created to estimate total time spent in SB and breaks in SB from accelerometers located
on the hip, wrists, and thigh. Additionally, we compared the accuracy of accelerometers worn on the
left and right wrists for prediction of time spent in SB and breaks in SB. The ANNs were
developed to predict the type of activity being performed, and they were validated in
our previous work (Chapter 4). For prediction of total time spent in SB, we summed time
predicted as lying, reading, and computer use. Similarly, we counted a break in SB whenever a
bout of time predicted as lying, reading, or computer use was followed by a bout predicted as a
non-sedentary activity.
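The two derived outcomes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the processing pipeline used in the study: it assumes one predicted activity label per five-second window, and it assumes that a break is counted only when both the sedentary bout and the following non-sedentary bout last at least the chosen break interval. The function names and the example sequence are hypothetical.

```python
from itertools import groupby

SEDENTARY = {"lying", "reading", "computer use"}  # activities summed as SB in the text
WINDOW_S = 5  # ASSUMED length of each classified window, in seconds

def total_sb_minutes(labels):
    """Total SB time: number of windows predicted as sedentary, in minutes."""
    return sum(label in SEDENTARY for label in labels) * WINDOW_S / 60

def to_bouts(labels):
    """Collapse consecutive windows into (is_sedentary, duration_s) bouts."""
    flags = (label in SEDENTARY for label in labels)
    return [(flag, len(list(run)) * WINDOW_S) for flag, run in groupby(flags)]

def count_breaks(labels, min_bout_s):
    """Count sedentary bouts that are followed by a non-sedentary bout.

    ASSUMPTION: both bouts must last at least min_bout_s seconds to count.
    """
    bouts = to_bouts(labels)
    return sum(
        1
        for (sed_a, dur_a), (sed_b, dur_b) in zip(bouts, bouts[1:])
        if sed_a and not sed_b and dur_a >= min_bout_s and dur_b >= min_bout_s
    )

# Hypothetical sequence: 60 s of reading, one 5 s window misread as sweeping,
# 60 s of reading, then 60 s of sweeping.
labels = ["reading"] * 12 + ["sweeping"] + ["reading"] * 12 + ["sweeping"] * 12
```

In this example, `count_breaks(labels, 5)` returns 2 because the single misread window is counted as a break, whereas `count_breaks(labels, 30)` returns 1. Total SB time is 2.0 minutes in both cases.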
When examining total time spent in SB, predictions from all four accelerometers were
not significantly different from the criterion, although the hip trended toward underpredicting
time spent in SB. Additionally, the two wrist accelerometer placements had significantly lower
RMSE for predicting total time spent in SB compared to the hip and thigh placements, indicating
that the wrist placement sites had less individual error (and superior accuracy) when predicting
total time spent in SB. The hip accelerometer placement had the worst prediction of total time
spent in SB, with an RMSE value more than 100% greater than those seen with the wrist
placements and 14% higher than the RMSE from the thigh placement. The fact that the hip
placement site performed worst of the four sites in terms of prediction error and the tendency for
underprediction of total time spent in SB is not surprising given previous studies by Kozey
Keadle et al. and Lyden et al. showing higher accuracy for measuring total time spent in SB
using a thigh accelerometer than a hip accelerometer (Kozey-Keadle, Libertine et al. 2011;
Lyden, Kozey Keadle et al. 2012). Additionally, Hart et al. used thigh- and hip-mounted
accelerometers in a free-living setting and found that the thigh placement had higher convergent
validity with other SB assessment measures than the hip placement (Hart, Ainsworth et al. 2011),
again supplying evidence that thigh-mounted accelerometers are preferable to hip accelerometers
for the measurement of total time spent in SB. Of note, the RMSE for the left wrist placement
was 12% lower than the RMSE for the right wrist placement, which increased to an 18%
difference when data were analyzed comparing the dominant and non-dominant wrists. These
findings indicate superior accuracy of the non-dominant wrist accelerometer for measurement of
total time spent in SB. Implications of this finding are discussed later in this section.
To our knowledge, this is the first study that assessed the utility of wrist-mounted
accelerometers for measurement of SB. Initially, we were surprised by the superiority of the
wrist accelerometers to the thigh accelerometer for measurement of total time spent in SB given
the previous literature showing high accuracy of the thigh for measuring SB (Hart, Ainsworth et
al. 2011; Kozey-Keadle, Libertine et al. 2011). However, in our previous work (Chapter 4), the
left and right wrist accelerometer placements achieved activity type classification accuracies of
86.6% and 86.7%, respectively, which was slightly higher than the thigh placement accuracy
(84.0%) and much higher than the accuracy achieved with the hip (72.5%). Moreover, the left
and right wrist placements achieved prediction accuracies of 93.5% and 92.7%, respectively, for
prediction of sedentary activities, which was higher than the thigh (92.1%) and hip (72.5%).
Therefore, the highest overall prediction accuracies for activity type as well as the highest
recognition of sedentary activities support the idea that wrist-mounted accelerometers may also be best
for prediction of total time spent in SB.
In contrast, the wrist accelerometer placements did not outperform the hip and
thigh placements for estimating breaks in SB. All four accelerometer placements performed best
when using the 30-second break interval; using this break interval, the wrist placements had
RMSE values 19-30% higher than the hip or thigh placements for estimating breaks in SB,
indicating lower measurement accuracy from the wrist-mounted accelerometers. Additionally,
only the non-dominant wrist placement accurately estimated breaks in SB for the 30-second
break interval, with the dominant wrist underpredicting breaks by 17%. The thigh placement
also underpredicted breaks (by 15%) but had the smallest RMSE for prediction of breaks in SB.
Surprisingly, the hip placement performed the best of the four accelerometer placements,
accurately predicting breaks in SB while also yielding an RMSE only 5% higher than the thigh
and 19-23% lower than the wrist accelerometers. The high accuracy of the hip placement is
noteworthy given mixed results reported by Lyden et al. concerning the utility of the hip for
measurement of SB. In one study, Lyden et al. found that a thigh accelerometer was able to
accurately classify breaks in SB, while a hip accelerometer overestimated breaks by 78-133%
depending on choice of cut-point used as the threshold for SB (Lyden, Kozey Keadle et al.
2012). In a more recent study, the authors found that when using ANNs, a hip-mounted
accelerometer was able to accurately measure total time spent in SB as well as breaks in SB
(Lyden, Keadle et al. 2013). The current study, in conjunction with Lyden’s work, provides
further evidence of the advantages of using machine learning for modeling accelerometer data
over using the traditional cut-point approach for measurement of SB using a hip-mounted
accelerometer.

Our finding that the wrist accelerometer placements were outperformed by the hip
placement for measurement of breaks in SB is surprising given that accuracy of activity type
classification, specifically for recognizing SB, is higher for the wrists than the hip (Chapter 4).
There are several possible reasons why higher classification accuracy did not translate to better
measurement of breaks in SB. First, the hip is relatively insensitive to limb movements, whereas
the thigh and wrists are not. Therefore, limb movements while sitting or lying (e.g., to drink
water, scratch an itch, or adjust equipment/clothing) may cause misclassification of one or
more five-second windows as non-sedentary activity. While an occasional misclassification
would have minimal effect on overall classification accuracy or prediction of total time spent in SB,
these misclassifications would disrupt periods of SB and therefore lead to incorrect prediction of
a break in SB when one did not occur. It would seem that this type of misclassification would
increase the number of breaks detected and result in overprediction, and this was the case with
the 5-second break interval. However, given the relatively short periods of time some sedentary
activities were performed, it is possible that periodic misclassification due to sporadic limb
movement would keep the accelerometers from recognizing the activity as SB (especially with
longer break windows), thereby not recording the subsequent transition as a break from SB and
leading to underprediction of breaks in SB.
Additionally, for measurement of breaks in SB, the left wrist accelerometer placement
was able to predict breaks accurately with the 30-second break interval, while the dominant wrist
underpredicted breaks by about 17%. Given that about 90% of our sample (all but four of the 39
participants) was right-hand dominant, it is not surprising that upon analyzing the data comparing the
dominant and non-dominant wrists, the non-dominant wrist achieved better accuracy for
measurement of total time in SB and breaks in SB. It may be that the dominant wrist
accelerometer placement captured more irregular movement as participants performed the
various activities in the visit, leading to misclassification of breaks in SB. Given this possibility,
these results lend support to the convention in many large studies, such as NHANES, that wrist
accelerometers should be worn on the wrist of the non-dominant hand (Troiano and McClain
2012; Troiano, McClain et al. 2014).
It is important to reiterate that SB is a complex construct, and it is necessary to be able to
measure the individual components of SB (i.e., total time and breaks) in order to better
understand the influence of SB on health. Studies by Hamilton and colleagues provide
evidence that prolonged SB is worse than an equivalent amount of time spent in SB that is
frequently broken up by periods of non-sedentary activity (Bey and Hamilton 2003; Hamilton,
Hamilton et al. 2004). Additionally, Healy and colleagues have published several studies
showing inverse associations between breaks in SB and several health indices, independent of
total time spent in PA or SB (Healy, Dunstan et al. 2008; Healy, Matthews et al. 2011). Our
findings indicate that total time spent in SB may be best measured using wrist-mounted
accelerometers, while breaks in SB may be better measured by a hip-mounted accelerometer.
Therefore, as more research is conducted to better elucidate the health risks of total time spent in
SB and breaks in SB, choice of accelerometer placement should be determined by the exact
research question of interest.

Strengths and limitations
There were several limitations in this study. First, our sample consisted of mostly college
students with interest in health sciences and may not be reflective of the wider college age/young
adult population. Additionally, the amount of time spent sedentary as well as the number of
breaks in SB by participants during the study protocol is probably not reflective of an average
90-minute segment of the day, so without further research it is not guaranteed that the monitors
will perform with similar accuracy in a true free-living environment.
This study also had a few noteworthy strengths. To our knowledge, this was the first
study to assess the ability of wrist-mounted accelerometers for measurement of total time spent
in SB and breaks in SB, and our use of hip- and thigh-mounted accelerometers allowed for direct
comparison of accuracy of the wrist monitors to previously used methods of measuring SB.
Second, while a simulated free-living setting may not be totally reflective of a true free-living
environment, the simulated free-living setting allows for better generalizability of results than a
heavily controlled, laboratory-based protocol. By utilizing a simulated free-living setting, we
were able to allow some freedom in activity choice, intensity, and timing while still using
high-quality criterion measures and examining a wide range of different activities in a relatively
short period of time, thereby minimizing burden on participants and researchers.
Conclusions
Our results provide evidence that hip-, thigh-, and wrist-mounted accelerometers can
provide accurate estimates of total time spent in SB, although measurement at the individual
level may be most accurate using the wrist-mounted accelerometers. For measuring breaks in
SB, the 30-second break interval appeared most accurate for all four accelerometers. When
using the 30-second interval, the hip accelerometer performed best, although the left wrist
accelerometer was also able to accurately predict breaks in SB. Together these results indicate
that use of an accelerometer on the non-dominant wrist or the hip may be preferable for
measurement of SB in a free-living setting, although the thigh accelerometer should be evaluated
further due to its demonstrated utility for SB measurement in previous work. Additionally, when
combining these results with the results from Chapters 3 and 4 of this dissertation, it appears that
the wrist-mounted accelerometers (especially the non-dominant wrist accelerometer) perform
well for measurement of energy expenditure and best for classification of activity type and
measurement of SB. Therefore, these results suggest that the wrist may be an ideal
site for measurement of many behavioral characteristics. With the previous and current use of
wrist-mounted accelerometers for sleep measurement, we plan to expand our ANNs to recognize
and classify sleep duration and quality in addition to the variables already assessed.
Additionally, use of wrist-mounted accelerometers may allow researchers to design pattern
recognition approaches to recognize eating behaviors; we plan to further explore this possibility
in future work.

CHAPTER 6
DISSERTATION SUMMARY AND RECOMMENDATIONS
Summary of results
High levels of physical activity (PA) and low levels of sedentary behavior (SB) are
known to be beneficial for improving physical and mental health and lowering the risk of many
chronic diseases (PAGAC 2008). Valid measurement tools are required to accurately assess the
relationship of PA and SB to health outcomes, monitor precise levels of PA or SB to identify
groups of people who are attaining insufficient PA and/or too much SB, and evaluate the
effectiveness of interventions aimed to increase PA and decrease SB. Accelerometers are
commonly used for prediction of energy expenditure, activity type (to determine PA participation),
and SB, but the models used to predict these outcomes vary considerably in their complexity and
accuracy. Therefore, the purposes of this dissertation were to 1) create predictive models from
accelerometer data with the intent to predict energy expenditure, activity type, and SB, 2) compare
the accuracy of models created from accelerometers worn on the right hip, right thigh, and both
wrists, and 3) develop and test the models created using simple input features and widely
available computational software.
Chapter 3: Estimation of energy expenditure
The first part of our investigation focused on the ability of the four accelerometer
placements to accurately estimate EE. We hypothesized that all four placements would achieve at
least moderately high accuracy for predicting EE, as indicated by correlations of r ≥ 0.60 (Safrit
and Wood 1995). The four placements achieved correlations of r = 0.82-0.89 with measured EE
from the Oxycon, supporting our hypothesis and indicating high accuracy for prediction of EE
from all four placements. Root mean square error (RMSE) was also calculated and ranged from
1.05-1.42 METs, which falls in line with values seen in previous work. When comparing
placement sites, we hypothesized that the thigh location would show the highest EE prediction
accuracy. This hypothesis was supported, with the thigh accelerometer achieving higher
correlations and lower RMSE for predicting EE than the hip or wrist accelerometer placements.
Another important advantage of the thigh-mounted accelerometer over the other placements was
that the use of fewer input features in the EE prediction model (which reduces its complexity) did
not result in lower accuracy, whereas the predictive accuracy was lower when fewer features were
used with the hip and two wrist accelerometer models. These findings lend support to the use of
thigh-mounted accelerometers for achieving high predictive accuracy for measuring EE, even with
relatively simple prediction models. However, the superiority of the thigh accelerometer
placement should not overshadow the fact that both the hip and two wrist accelerometer
placements also achieved highly accurate predictions of EE.
One significant hurdle in assessing EE in free-living settings is choice of a criterion
measure. Doubly labeled water is often used as a criterion measure for total EE but cannot assess
minute-to-minute EE. As an alternate approach, Lyden et al. used direct observation as a criterion
measure of free-living EE by recording the activity performed and then looking up an EE value
from the Compendium of Physical Activities in order to predict EE for each activity (Ainsworth,
Haskell et al. 2011; Lyden, Keadle et al. 2013). A potential problem with this approach is that the
Compendium represents an average value of EE for activities and is not necessarily accurate for a
given individual, especially when the observer would have to record the activity and also estimate
the activity intensity. Additionally, this method does not allow for prediction of EE during
transitions between activities since a transition is not a defined activity type but instead is used to
classify times when a person moves from one activity to another. Given the limitations of these
methods, we chose to use indirect calorimetry via a portable metabolic analyzer as our criterion
measure, which measures oxygen consumption to derive estimates of EE. Use of this method
allowed us to record data during all activity times as well as during transitions.
Indirect calorimetry provides a valid measure of EE when a person performs steady-state
activities (Rosdahl, Gullstrand et al. 2010); however, when a person changes activities or moves to
a different intensity of activity, change in oxygen consumption lags behind, meaning that indirect
calorimetry may not capture the true energy requirement of a task unless the task is being
performed at steady state, which may take several minutes to achieve after an activity is started
(Kenney, Wilmore et al. 2012). In our study, participants performed 14 distinct activities but could
perform an activity more than once; the actual number performed ranged from 14-20 and averaged
about 16, with an average length of about five minutes per activity. Therefore, a significant portion
of time during the protocol was likely not spent in steady-state EE. Despite these shortcomings,
we deemed indirect calorimetry the best available criterion measure due to the limitations of
doubly labeled water and direct observation (discussed earlier).
The lack of steady-state EE seen in our study likely also occurs in true free-living situations, at
least for PA. In free-living settings, adults likely reach steady-state EE during SB since SB makes
up the majority of waking time and since most SB bouts are performed for a prolonged period of
time (i.e., > 10 minutes) (Matthews, Chen et al. 2008; Lyden, Kozey Keadle et al. 2012).
However, non-sedentary activities make up a much smaller portion of the day and are generally
performed in shorter bouts, especially with respect to higher-intensity activities (Troiano, Berrigan
et al. 2008); therefore, we expect that steady state is rarely achieved during free-living PA.
Accordingly, we feel that more research and discussion are needed to develop ways of improving
the use of direct observation and/or indirect calorimetry for measurement of non-steady-state EE.
One potential idea would be to perform a similar protocol to ours but to add a second visit where
each participant can perform each activity at steady state while their EE is measured via indirect
calorimetry. Then, for the simulated free-living visit, direct observation could be used as the
criterion (similar to Lyden’s study), but the individual’s measured EE values from the steady-state visit
could be used to predict EE instead of using the Compendium for prediction of EE. This approach
would likely increase validity of direct observation but would also significantly increase participant
and researcher burden and cost of the study. However, we feel that our use of indirect calorimetry
represented an appropriate criterion measure to answer our research questions and are confident
that our results provide an accurate reflection of the true utility of the four accelerometer
placements we tested for prediction of EE.
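The two-visit idea proposed above could be sketched as a simple lookup, shown here only as an illustration of the concept rather than a tested method. Each directly observed bout is assigned the participant's own steady-state MET value when one was measured, with an optional population-average table (such as the Compendium) as a fallback. All names and all MET values in the example are hypothetical placeholders and are not study data.

```python
def ee_from_observation(log, individual_mets, fallback_mets=None):
    """Assign a MET value to each directly observed bout.

    log: list of (activity, minutes) tuples from direct observation.
    individual_mets: this participant's measured steady-state METs per activity.
    fallback_mets: optional population-average METs per activity.
    Bouts with no available value (such as transitions) are assigned None.
    """
    fallback_mets = fallback_mets or {}
    assigned = []
    for activity, minutes in log:
        mets = individual_mets.get(activity, fallback_mets.get(activity))
        assigned.append((activity, minutes, mets))
    return assigned

def met_minutes(assigned):
    """Sum MET-minutes over bouts that have a MET value."""
    return sum(minutes * mets for _, minutes, mets in assigned if mets is not None)

# Hypothetical placeholder values for illustration only.
steady_state = {"reading": 1.2, "sweeping": 3.0}
log = [("reading", 5), ("transition", 1), ("sweeping", 4)]
assigned = ee_from_observation(log, steady_state)
```

Note that the transition bout receives no value in this sketch, which reflects the limitation of direct observation for transitions discussed earlier.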
In conclusion, the thigh accelerometer performed best of the placement sites for prediction
of EE, and the superiority of the thigh was more apparent with the simplest ANNs. However, the
wrists and hip placements achieved correlations within 10% and error within 25% of that achieved
by the thigh placement, indicating that high accuracy can also be achieved for measurement of EE
using accelerometers placed on the hip and wrists. Therefore, thigh-mounted accelerometers
should be used if EE measurement accuracy is of utmost importance, but the hip and wrists can be
used for accurate measurement as well, if these placement sites are more practical for the
population being tested or for the specific research question being addressed.
Chapter 4: Classification of activity type
The second major aim of this dissertation was evaluating the ability of accelerometers
located on the hip, thigh, and wrists to correctly predict the specific type of activity being
performed. Our first aim in this study was to create models to predict activity type using simple
input features and widely available, easy-to-use software packages. We were successful in
accomplishing this goal by using Microsoft Excel for data processing, cleaning, and reduction and
R for ANN creation. Our first hypothesis-driven aim was to compare overall classification
accuracies among the four accelerometers as well as compare accuracies for detecting specific
types of activities. Based on our results shown in Chapter 3 as well as previous research by
members of our research group and others (Cleland, Kikhia et al. 2013; Dong, Montoye et al.
2013; Skotte, Korshoj et al. 2014), we hypothesized that the thigh accelerometer would achieve the
highest overall activity classification accuracy. However, our results did not support this
hypothesis. When comparing classification accuracies for identifying all 14 activities, the two
wrist accelerometers performed the best, with classification accuracies of 81.3-81.4%. They also
showed the highest sensitivity and specificity for activity classification accuracy, whereas the thigh
and hip accelerometers achieved accuracies of only 71.7% and 66.4%, respectively. When
grouping similar activities into categories, the accuracies of all four monitors improved; the wrist
accelerometers still had the highest classification accuracies at 86.6-86.7%, with the thigh being
much closer in accuracy (84.0%) than the hip (72.5%).
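The improvement from grouping similar activities can be illustrated with a small sketch: confusions between activities in the same category stop counting as errors once labels are collapsed. The category names below follow the groupings mentioned in the text (sedentary and lifestyle activities), but the full category scheme used in the study is not reproduced here, and the example labels and function names are hypothetical.

```python
# Partial, illustrative mapping of specific activities to broader categories.
CATEGORY = {
    "lying": "sedentary",
    "reading": "sedentary",
    "computer use": "sedentary",
    "laundry": "lifestyle",
    "sweeping": "lifestyle",
}

def accuracy(true, pred):
    """Fraction of windows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def collapse(labels):
    """Replace each activity with its category; unmapped activities stay as-is."""
    return [CATEGORY.get(label, label) for label in labels]

# Hypothetical labels for six windows.
true = ["lying", "reading", "computer use", "laundry", "sweeping", "biceps curls"]
pred = ["lying", "computer use", "computer use", "sweeping", "sweeping", "biceps curls"]

specific = accuracy(true, pred)                     # 4 of 6 windows correct
grouped = accuracy(collapse(true), collapse(pred))  # 6 of 6 windows correct
```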
When looking at the classification accuracies of specific activity types, we hypothesized
correctly that the wrist accelerometers would have the highest classification accuracies for lifestyle
activities (laundry and sweeping) as well as other upper-body activities such as biceps curls. The
wrist accelerometers also achieved the highest accuracy for classifying sedentary activities, which
we hypothesized would be measured best with the thigh-mounted accelerometer given high
accuracy for SB measurement by thigh-mounted accelerometers seen in previous research (Kozey-Keadle,
Libertine et al. 2011; Lyden, Kozey Keadle et al. 2012). However, the seated activities in
this study (computer use and reading) involved arm movement, which we believed would be
detected better with the wrist accelerometers than the thigh accelerometer. Importantly, combining
the sedentary activities into one category resulted in the thigh achieving an overall classification
accuracy within 1.5% of that achieved by the wrists, providing evidence that even if the thigh
cannot accurately classify specific types of sedentary activity (e.g., lying vs. sitting), the thigh is
highly accurate for differentiating SB from non-sedentary activities. Our findings contrast
somewhat with previous work showing comparable or higher measurement accuracy of hip and thigh
accelerometers (Cleland, Kikhia et al. 2013; Dong, Montoye et al. 2013; Skotte, Korshoj et al.
2014). However, the current study tested a larger number of activities, and they occurred in a
simulated free-living setting, which can yield very different results compared to those found in a
laboratory-based setting (Gyllensten and Bonomi 2011; Lyden, Keadle et al. 2013; van Hees,
Golubic et al. 2013). In addition, the current study design is more generalizable to true free-living
settings than the findings from previous work.
We also compared classification accuracies of monitors worn on the left and right wrists.
Since about 10% of the sample was left-hand dominant, we also analyzed the data comparing the
dominant and non-dominant wrists. In both analyses, the classification accuracies of the monitors
on the wrists were within 1.0%, signifying similar accuracy of measurement regardless of the wrist
on which the accelerometer was worn. This finding suggests that the popular convention to wear
an accelerometer on the non-dominant wrist may be unnecessary for prediction of activity type,
especially if compliance will be improved by allowing wearers to choose the wrist on which to
wear the accelerometer (although our findings do not support that the wearer can switch between
wrists within a study).
Comparison of classification accuracies achieved among different studies is notoriously
difficult because classification accuracy is inversely related to the number of activities and the
similarity among activities (all else held equal). Therefore, studies comparing the utility of different
accelerometer placement sites must directly compare each placement site. We chose to test the
hip, thigh, and wrists because they are the three most commonly used accelerometer placement
sites, but other sites, such as the ankle or lower back, may have advantages in certain situations and
should be considered for use in future studies.
Another difficulty of activity type classification studies is choice of activities to include in
the testing set. Predictive models can only predict activities that were used in the model creation;
for example, the models created in this dissertation can predict laundry but have no output variable
for gardening or dishwashing. When creating models to recognize specific activity types, there is
no way to include all activities that people may perform in their everyday lives. However, by
collapsing activities into categories comprising similar activities, it is possible to develop an idea of
how people spend their days and how active they are. The ANNs developed in this study showed
an ability to classify 10 categories of activities with sensitivities from 72.2-86.7%. Further
reduction to identification of activity intensities improved the sensitivity and AUC for the thigh
accelerometer placement and resulted in high classification accuracy by the thigh and wrist
placement sites and good classification accuracy by the hip placement site (Metz 1978).
In free-living settings, adults perform a wide variety of activities not included in this study;
therefore, we expect that the capability of our ANNs for prediction of specific types of activities
will be decreased in free-living settings. However, we demonstrated high predictive accuracy by
the thigh- and wrist-mounted accelerometer placements when collapsing our prediction into either
activity categories or activity intensities, and we feel that this approach is much more generalizable
because even activities not tested in this study can be grouped into an activity category or intensity
in order to measure activity levels in a free-living setting.
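The effect of collapsing specific activities into a broader category, as discussed above, can be illustrated with a minimal sketch. The grouping of lying, reading, and computer use as sedentary, the label names, and the example predictions are all assumptions made for illustration; they are not the dissertation's actual mapping or data.

```python
# Hypothetical grouping: which specific activities count as sedentary is an
# assumption made for this example, not the dissertation's actual mapping.
SEDENTARY = {"lying", "reading", "computer use"}


def collapse(label):
    """Map a specific activity label to 'sedentary' or leave it unchanged."""
    return "sedentary" if label in SEDENTARY else label


def accuracy(true_labels, predicted_labels):
    """Fraction of windows whose predicted label matches the criterion label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Made-up criterion and predicted labels for six windows of data.
true = ["lying", "reading", "computer use", "standing", "walking", "jogging"]
pred = ["reading", "reading", "lying", "standing", "walking", "walking"]

# Accuracy when every specific activity must be matched exactly.
specific_accuracy = accuracy(true, pred)

# Accuracy after confusions among sedentary activities no longer count as errors.
collapsed_accuracy = accuracy([collapse(t) for t in true],
                              [collapse(p) for p in pred])
```

In this made-up example, confusions among the sedentary activities stop counting as errors once the labels are collapsed, so agreement rises from 3 of 6 windows to 5 of 6 windows.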
Building on the discussion above, the importance of being able to classify specific activities
vs. activity categories (e.g., lifestyle, exercise) vs. activity intensities (e.g., sedentary, light)
will depend on the question of interest. For example, a physical therapist may be interested in
measuring specific types of exercise activities to gauge compliance in a rehabilitation program,
necessitating the differentiation of specific exercise activities. Alternatively, a mother might want
to be able to differentiate between her child’s reading and TV watching but might be happy with
any type of exercise or ambulatory activity. From a health behavior perspective, it is necessary to
recognize specific types of ambulatory activity to differentiate incidental activity from health-enhancing activity. Specific exercise activities may not be as important to differentiate unless
dictated by a specific research question. Lifestyle activities and standing are likely most important
from an energy balance perspective or as breaks in SB. Lastly, from a pure health standpoint,
recognition of specific types of SB may not be as important; however, from an intervention
perspective, recognition of specific types of SB may be critical because getting someone to watch
less TV may require different techniques than getting someone to drive less or sit less at work.

In conclusion, our study indicates that a variety, but not all possible types, of sedentary,
ambulatory, lifestyle, and exercise activities were classified most accurately with
accelerometers placed on the left or right wrist, especially if classification of specific types of
activities is of importance. When activities were combined into similar categories, the thigh
accelerometer classification accuracy approached that achieved by the wrists, but the wrists
remained superior. Conversely, when classifying activities by activity intensities, the thigh
placement slightly outperformed the two wrist placements, although all three of these sites
achieved high overall intensity classification accuracy. These findings may vary depending on the
choice of activities included in a validation protocol; however, from these findings, it appears that
upper-body movements may be more distinctive of an activity than lower-body movements, allowing
for better recognition of activities when using simple input features for an ANN created from
wrist-accelerometer data.
Chapter 5: Estimation of sedentary behavior
The final objective of this dissertation was to assess the ability of hip, thigh, and wrist
accelerometers to accurately predict total time spent in SB as well as breaks in SB. Our first aim
of this study was to compare the accuracy of the hip, thigh, and wrist accelerometers for
measurement of total time spent in SB. We expected the thigh accelerometer to provide the most
accurate estimates, whereas we hypothesized that the hip would overestimate total SB due to
misclassification of standing as SB and that the wrist would underestimate total SB due to
misclassification of SB as a non-sedentary activity (due to aberrant wrist movement during SB).
Overall, all four accelerometer placements provided similar predictions of total time spent in SB,
but measurement error was considerably higher for the hip than the thigh, and the thigh had greater
error than the two wrist accelerometers. The finding that the hip had higher error than the thigh
supports previous work by Lyden, Kozey-Keadle, and Grant (Grant, Ryan et al. 2006; Kozey-Keadle, Libertine et al. 2011; Lyden, Kozey-Keadle et al. 2012). Our finding of superior accuracy
of the left and right wrist accelerometer placements was contrary to our initial hypothesis, but not
overly surprising, since the ANNs used to predict total SB were the same as the ones used to
classify activity type in Chapter 4, where the wrists outperformed the hip and thigh for overall
activity recognition as well as recognition of SB.
The second aim of the study was to compare the accuracy of the four accelerometers for
estimating breaks in SB. We used three different break intervals (5, 30, and 60 seconds) for
classifying a break in SB in order to determine an optimal break interval for measurement, as it is
currently unknown what interval is best suited for recognizing breaks in SB. We found that the 5-second interval was too short, with the misclassification of single windows of accelerometer data
resulting in dramatic overpredictions of breaks in SB by all four accelerometers. Conversely, the
60-second break interval appeared to be too long and resulted in underprediction of breaks in SB
by all four accelerometers. Using the 30-second interval, the hip and left wrist accelerometers
predicted breaks in SB accurately, while the thigh and right wrist accelerometers underpredicted
breaks. However, error in SB break prediction was lowest with the thigh and highest with the wrist
accelerometers. These findings were unexpected given the superior accuracy of the wrists for
predicting total time spent in SB; however, the findings point to the importance of measuring these
two constructs separately. Even though total time in SB and breaks in SB are related, accurate
measurement of one does not imply accurate prediction of the other. The hip accelerometer
placement’s ability to measure SB breaks accurately may be due to its insensitivity to limb
movement, whereas the wrists and thigh may have detected limb movement and potentially
misclassified SB as a non-sedentary activity, therefore misclassifying breaks in SB. Previous
research shows mixed findings of the accuracy of hip accelerometers for measurement of breaks in
SB, but a recent study by Lyden et al. highlights dramatically improved measurement of SB using
machine learning in comparison to the cut-point approach (Lyden, Keadle et al. 2013). Therefore,
our study provides further support that machine learning may allow for improved measurement of
SB using a hip accelerometer.
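The sensitivity of break counts to the choice of break interval, described above, can be illustrated with a minimal sketch. The 5-second window length, the exact rule used to define a break (a sedentary period followed by a non-sedentary run of at least the break interval), and the example sequence are assumptions made for illustration and may differ from the procedure used in the dissertation.

```python
def count_sb_breaks(is_sedentary, window_s, min_break_s):
    """Count breaks in SB from window-level sedentary/non-sedentary predictions.

    A break is counted when sedentary time is followed by a run of
    non-sedentary windows lasting at least min_break_s seconds. Shorter
    non-sedentary runs are ignored (treated as part of the sedentary bout).
    """
    breaks = 0
    seen_sedentary = False
    i = 0
    n = len(is_sedentary)
    while i < n:
        if is_sedentary[i]:
            seen_sedentary = True
            i += 1
            continue
        # Find the end of this run of non-sedentary windows.
        j = i
        while j < n and not is_sedentary[j]:
            j += 1
        run_s = (j - i) * window_s
        if seen_sedentary and run_s >= min_break_s:
            breaks += 1
        i = j
    return breaks


# Made-up predictions for 5-second windows (hypothetical window length):
# one isolated non-sedentary window (possibly a misclassification) and one
# 30-second non-sedentary run.
predictions = [True] * 3 + [False] + [True] * 3 + [False] * 6 + [True] * 2

breaks_5s = count_sb_breaks(predictions, window_s=5, min_break_s=5)
breaks_30s = count_sb_breaks(predictions, window_s=5, min_break_s=30)
breaks_60s = count_sb_breaks(predictions, window_s=5, min_break_s=60)
```

With this made-up sequence, the 5-second interval counts the isolated window as a break (two breaks in total), the 30-second interval counts only the longer run (one break), and the 60-second interval counts none, which mirrors the pattern of overprediction and underprediction described in the text.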
Interestingly, we found that the left and right wrist accelerometer placements performed
with similar accuracy when predicting total time spent in SB, but the left wrist showed higher
accuracy for prediction of SB breaks. Given that 90% of our sample was right-hand dominant, our
findings indicate potential superiority of the non-dominant wrist for measurement of SB, which is
in accordance with the convention for accelerometers to be placed on the non-dominant wrist.
We chose to predict time spent in SB and breaks in SB by first classifying into specific
types of activity. An alternate way to classify SB would be to use an EE of < 1.5 METs as SB.
Breaks would occur any time a predicted EE of < 1.5 METs was followed by an EE of ≥ 1.5
METs. This approach is how cut-point methods have been used, but the major drawback of this
method is that standing is an activity that typically elicits an EE of < 1.5 METs but is defined as a
non-sedentary activity by Owen et al. (Owen, Healy et al. 2010) due to evidence that standing may
not have the same health implications as activities such as sitting or lying (Katzmarzyk 2014).
Therefore, we determined that this method was inappropriate since it would likely misclassify
standing as SB.
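The drawback of the EE-based alternative can be shown with a short sketch. The 1.5 MET threshold comes from the text; the activity sequence and the MET values assigned to each window are made-up illustrative numbers, not measured data.

```python
SB_THRESHOLD_METS = 1.5  # threshold described in the text


def is_sb_by_mets(mets):
    """Classify a window as SB purely from its predicted energy expenditure."""
    return mets < SB_THRESHOLD_METS


def count_breaks_by_mets(met_series):
    """Count transitions from < 1.5 METs to >= 1.5 METs between windows."""
    return sum(
        1
        for prev, cur in zip(met_series, met_series[1:])
        if prev < SB_THRESHOLD_METS and cur >= SB_THRESHOLD_METS
    )


# Illustrative (made-up) windows: sitting, then standing, then walking.
activities = ["sitting", "sitting", "standing", "standing", "walking", "sitting"]
mets = [1.1, 1.2, 1.3, 1.4, 3.0, 1.1]

sb_flags = [is_sb_by_mets(m) for m in mets]
sb_windows_by_mets = sum(sb_flags)
sb_windows_by_posture = sum(a == "sitting" for a in activities)
breaks_by_mets = count_breaks_by_mets(mets)
```

In this example the standing windows fall below 1.5 METs and are therefore counted as SB, so the EE-based rule reports five sedentary windows where the activity labels indicate three, and it does not register the sit-to-stand transition as a break.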

In conclusion, the findings for prediction of total time spent in SB closely mirrored our
findings from Chapter 4 showing highest accuracy for the wrist accelerometers and lowest
accuracy for the hip accelerometer, although all four accelerometers provided similar estimations
of total time spent in SB. The results for prediction of breaks in SB were more mixed but indicated that the
hip was superior for measurement of breaks in SB. Further work is needed to confirm these
findings as there is limited and potentially conflicting research regarding the utility of different
accelerometer placement sites for measurement of SB.
Conclusions
This dissertation provides a comparison of the utility of accelerometers placed on the hip,
thigh, and wrists and machine learning models for measurements of three key behavioral
variables (energy expenditure, activity type recognition, and sedentary behavior) which are
important determinants for long-term health at an individual and population level. We sought to
determine if accelerometer placement affected measurement accuracy and if an optimal
placement existed for measurement of all three variables. Our study suggests that choice of
placement site affects measurement accuracy. Each outcome variable had a different optimal
placement site, with the thigh being best for energy expenditure, the wrists being best for activity
type classification, and the hip and left wrist being best for measurement of SB, although the
SB findings were somewhat mixed. Additionally, although one placement site was not best for
all measures, all placement sites allowed for high accuracy of measurement of energy
expenditure, the wrists and thigh achieved over 80% accuracy for activity type classification, and
all four monitors showed strengths and weaknesses for measurement of SB. Given these findings
along with those of previous work, it seems that choice of accelerometer placement should
depend on the specific research questions, the population being tested, the length of time monitors
are to be worn, and the complexity of the models desired.
In an effort to compare the accuracy of our ANNs for measurement of energy expenditure
vs. measurement of activity type, we have provided Table 6.1 below, which shows the
sensitivity, specificity, and AUC of the energy expenditure ANNs for their accuracy in
predicting activity intensity (similar to Table 4.9). With the activity type ANNs, AUC for
activity intensity was as low as 0.85 for the hip accelerometer placement (indicating good
accuracy) and as high as 0.94 for the thigh accelerometer placement (indicating high accuracy).
Conversely, the activity intensity AUC was much lower for the energy expenditure ANNs, with
AUC values of 0.75-0.76 for the hip and wrist placements and 0.79 for the thigh placement,
indicating only fair accuracy for the four placement sites. Therefore, it appears that in terms of
determining activity intensity, the activity type ANNs may be superior to the energy expenditure
ANNs for all accelerometer placement sites tested.
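For readers less familiar with these metrics, the following minimal sketch shows how sensitivity and specificity can be computed for one intensity category treated as the positive class, and how an AUC can be obtained when a classifier yields a single operating point rather than a full ROC curve. In that case the area reduces to the mean of sensitivity and specificity. The values in Table 6.1 appear consistent with this relationship (for example, 64.1% sensitivity and 93.0% specificity for sedentary classification at the hip, with a reported AUC of 0.79), but whether the dissertation computed AUC this way is an assumption. The label names and toy data are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """One-vs-rest sensitivity and specificity for the `positive` category."""
    tp = fn = tn = fp = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def single_point_auc(sensitivity, specificity):
    """Area under the ROC curve drawn through (0, 0), one operating point, (1, 1)."""
    return (sensitivity + specificity) / 2


# Toy criterion and predicted intensity labels (hypothetical data).
true = ["sed", "sed", "light", "mod", "sed", "light"]
pred = ["sed", "light", "light", "mod", "sed", "sed"]

sens, spec = sensitivity_specificity(true, pred, positive="sed")
toy_auc = single_point_auc(sens, spec)

# Hip sedentary values from Table 6.1, expressed as proportions.
hip_sedentary_auc = single_point_auc(0.641, 0.930)
```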

Table 6.1. Overall sensitivity, specificity, and AUC among the four accelerometer placement sites for classification of activity
intensity using the energy expenditure ANNs (developed in Chapter 3).

Sensitivity (% agreement)
            AG Hip         AG Thigh       GE Left Wrist   GE Right Wrist
Sedentary   64.1 (7.3)*    70.4 (6.9)*    58.7 (7.5)*     51.8 (7.6)*
Light       60.2 (6.3)*    65.2 (6.1)     66.2 (6.1)      66.8 (6.0)
Moderate    69.1 (6.7)     72.3 (6.5)     74.4 (6.3)      72.6 (6.5)
Vigorous    70.6 (9.0)     75.8 (8.4)     62.2 (9.6)*     65.6 (9.4)*
MVPA        83.8 (4.3)     88.1 (3.8)     86.0 (4.0)      85.2 (4.1)
Total       65.1 (3.6)     69.9 (3.4)*    66.0 (3.6)      64.4 (3.6)

Specificity (%)
            AG Hip         AG Thigh       GE Left Wrist   GE Right Wrist
Sedentary   93.0 (3.9)     90.7 (4.4)     93.1 (3.8)      93.6 (3.7)
Light       78.0 (5.3)     82.9 (4.8)*    77.5 (5.4)      74.6 (5.6)
Moderate    82.1 (5.6)     87.9 (4.7)*    83.1 (5.4)      83.4 (5.4)
Vigorous    97.5 (3.1)     96.5 (3.6)     98.0 (2.7)      97.8 (2.9)
MVPA        84.1 (4.3)*    90.2 (3.5)*    87.3 (3.9)      86.7 (4.0)
Total       85.6 (2.6)     88.1 (2.4)*    85.8 (2.6)      85.0 (2.7)

AUC
            AG Hip         AG Thigh       GE Left Wrist   GE Right Wrist
Sedentary   0.79 (0.01)*   0.81 (0.01)*   0.76 (0.01)*    0.73 (0.01)*
Light       0.69 (0.01)*   0.74 (0.01)*   0.72 (0.01)*    0.71 (0.01)*
Moderate    0.76 (0.01)*   0.80 (0.01)*   0.79 (0.01)*    0.78 (0.01)*
Vigorous    0.84 (0.02)*   0.86 (0.02)*   0.80 (0.02)*    0.82 (0.02)*
MVPA        0.84 (0.01)*   0.89 (0.01)*   0.87 (0.01)*    0.86 (0.01)*
Total       0.75 (0.01)    0.79 (0.01)*   0.76 (0.01)*    0.75 (0.01)

Values are shown as Mean (SD). The * indicates significant differences from all other accelerometer placement sites.

Current use of machine learning suffers from several pitfalls that this dissertation sought
to address. First, machine learning models are often built by engineers or computer scientists
who have an understanding of model building far beyond that of the average physical activity
researcher. The associated complexity of many machine learning models limits or prohibits their
use by physical activity researchers. The artificial neural networks created in this dissertation
were built with simple input variables that can easily be calculated in Microsoft Excel.
Additionally, we used pre-written R code for development and testing of our models, therefore
accomplishing our goal of making artificial neural network creation understandable and
accessible to non-experts.
Also, validation studies are often conducted in laboratories under strictly controlled
protocols that require activities to be performed at a constant intensity for a defined period of
time and in a specific order. These laboratory conditions are not similar to how people actually
act in a free-living environment, and previous research shows consistent drops in performance
when laboratory-validated techniques are applied to free-living situations (Gyllensten and
Bonomi 2011; Lyden, Keadle et al. 2013; van Hees, Golubic et al. 2013). Therefore, we allowed
participants considerable freedom in our protocol to make our setting as similar to a free-living
environment as possible while still keeping the visit short and having participants perform all 14
activities.
The dissertation results provide several important advances to the field of physical
activity and sedentary behavior measurement. First, we have improved measurement of energy
expenditure using a single accelerometer far beyond what has been achieved using count-based
regression and cut-points. With our energy expenditure models, it is possible to determine a
person’s daily kcal expenditure and therefore provide valuable information relevant to
interventions such as weight loss. In addition, total daily energy expenditure can be used as a
measure of total activity level in order to determine relationships with specific health outcomes.
Alternately, by using three METs as the threshold for MVPA, we can use the energy expenditure
models to determine daily MVPA levels, measure adherence to meeting the national physical
activity recommendations, and identify individuals or groups accumulating inadequate physical
activity or excessive sedentary behavior. Second, our activity type models are useful for
determining times of the day when participants are most/least active in addition to knowing how
much time they spend in certain behaviors. This information is important for individuals who
tailor specific intervention strategies to help people become more active and less sedentary.
Also, it may help in determining associations of specific behaviors (e.g., standing) with health
outcomes.
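The use of a three-MET threshold to summarize MVPA from predicted energy expenditure, mentioned above, can be sketched as follows. The 30-second window length and the predicted MET values are hypothetical and chosen only for illustration.

```python
MVPA_THRESHOLD_METS = 3.0  # threshold for MVPA described in the text


def mvpa_minutes(predicted_mets, window_s):
    """Total minutes spent at or above the MVPA threshold.

    predicted_mets: one predicted MET value per window of accelerometer data.
    window_s: window length in seconds (hypothetical value in this example).
    """
    mvpa_windows = sum(1 for m in predicted_mets if m >= MVPA_THRESHOLD_METS)
    return mvpa_windows * window_s / 60


# Made-up predictions for six 30-second windows.
predicted = [1.2, 2.5, 3.0, 4.5, 6.1, 2.9]
minutes = mvpa_minutes(predicted, window_s=30)
```

Summing such values over a day or week would give the MVPA totals that could then be compared against physical activity recommendations.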
Lastly, the emphasis on accurate sedentary behavior measurement in this dissertation was
warranted given the current lack of a measurement tool that is valid for assessment of sedentary
behavior as well as physical activity. A major purpose of this dissertation was to determine an
optimal method for measuring total time spent in sedentary behavior and breaks in sedentary
behavior since an accurate sedentary behavior measurement tool will facilitate further research
into the health risks of sedentary behavior and allow for evidence-based recommendations to be
developed regarding healthy levels of sedentary behavior. We believe that this dissertation
offers a fairly accurate measure of sedentary behavior, but there is room for improvement in this
measure.

It would be ideal if one accelerometer placement performed best for all variables of interest
because that would allow for recommendation of a single monitor placement, but this did not
occur. However, if we were to pick one accelerometer placement based on the results of this
dissertation, we would choose an accelerometer placed on the non-dominant wrist. This placement
showed the highest accuracy for activity type prediction and achieved high overall measurement
accuracy for energy expenditure and sedentary behavior. The dominant wrist also performed well
for activity type and energy expenditure prediction but with lower accuracy for prediction of
sedentary behavior. The thigh placement also performed well overall, but the wrist-mounted
accelerometers were more comfortable and convenient for wear and still yielded high measurement
accuracy. A good blend of practicality and accuracy is often desired for measurement tools used in
large epidemiologic, surveillance, or intervention studies. Additionally, it appears that more
accelerometer features may improve measurement accuracy, but simpler feature sets can still
provide high accuracy while simplifying the predictive models. From our results, we would
recommend the feature set consisting of the five accelerometer percentiles (10th, 25th, 50th, 75th, and
90th), which has been used in previous work and also showed high measurement accuracy in this
dissertation.
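As an illustration of how simple the recommended feature set is, the sketch below computes the five percentiles for one window of acceleration values. The percentile definition (linear interpolation between ranks), the function names, and the example window are assumptions made for illustration; the dissertation's exact calculation may differ.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentile_features(window):
    """Return the 10th, 25th, 50th, 75th, and 90th percentiles of a window."""
    return [percentile(window, q) for q in (10, 25, 50, 75, 90)]


# Made-up acceleration values for a single axis within one window.
window = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
features = percentile_features(window)
```

Repeating this calculation for each accelerometer axis and each window would yield the input features for a model such as an ANN.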
Results of this dissertation encourage further exploration of accurate yet relatively simple
ways of using accelerometers to measure several important behavioral variables known to
influence health. Below, we have outlined future directions for exploration of sedentary behavior
measurement as well as other areas that build off the findings from this dissertation.

Recommendations for future research
From the findings of this dissertation, we have several recommendations for further
research. These recommendations are discussed below.
1. Further research should be conducted evaluating the accuracy of the hip, thigh, and wrist
accelerometer placement sites for measurement of sedentary behavior. Evidence is
emerging that sedentary behavior is an important health determinant, necessitating further
refinement of measurement tools which can accurately measure the various aspects of
sedentary behavior (total time, breaks, etc.) to better understand its health effects and
determine evidence-based recommendations for limiting sedentary behavior to improve
health. We feel that the models we developed for measurement of sedentary behavior
provided good accuracy but can be improved; some suggested areas for experimentation
include use of different input features and machine learning techniques that may be better
suited specifically for differentiating movement from non-movement.
2. The testing of the wrist, hip, and thigh placement sites should be expanded to a more
diverse population. Children and older adults have very different movement patterns and
physical activity levels (Bailey, Olson et al. 1995; Troiano, Berrigan et al. 2008), and it
may be that certain placements have advantages in these different populations.
Additionally, overweight/obese or pregnant populations often feel uncomfortable while
wearing hip accelerometers (Feito, Bassett et al. 2011), so wrist and thigh placements
should be tested in these populations to determine if these are sufficient alternative sites
for accelerometers to be worn.

3. Accelerometer placement sites should be qualitatively and quantitatively evaluated for
wear preference, and compliance data for each site should be assessed. There is
preliminary information from NHANES that the wrist accelerometer placement has
slightly higher compliance than the hip accelerometer (Troiano and McClain 2012), but
these findings must be verified and expanded.
4. Machine learning algorithms other than artificial neural networks should be utilized in
model creation. Artificial neural networks are being studied more thoroughly, show high
measurement accuracy, and are easier to compute using R software than many other types
of algorithms, but they are also much less computationally efficient than other
algorithms. Additionally, other algorithms may be able to achieve higher measurement
accuracy than artificial neural networks (Preece, Goulermas et al. 2009). These
possibilities should be explored in future work.
5. The simulated free-living setting used in this study was a significant study strength, and
the results are likely more generalizable than those achieved in laboratory-based settings.
However, a simulated free-living setting does not provide a perfect representation of the
true free-living environment; thus, the artificial neural networks created in this study
should be evaluated in a true free-living setting.
6. Further work should be done to determine the optimal criterion measure for use in free-living measurement of energy expenditure. We chose to use indirect calorimetry as the
criterion for this dissertation, even though the majority of time was likely spent not in
steady-state.

7. This dissertation provided preliminary validation of artificial neural networks developed
from accelerometer data to detect several behavioral variables, including energy
expenditure, recognition of common activities performed, and sedentary behavior.
Others have used accelerometers (usually placed on the wrist or hip) for measurement of
sleep quality and quantity (Jean-Louis, Kripke et al. 2001), both of which have known
associations with many health indices (Hoevenaar-Blom, Spijkerman et al. 2011).
Several proprietary activity monitors such as the Fitbit® or Fuelband® are designed to
monitor both activity and sleep, but these have questionable accuracy, and we are
unaware of a research-grade device that has been validated to accurately measure both
sleep and activity variables. We would like to expand the use of the machine learning
algorithms developed and validated in this dissertation to measure sleep quantity and
quality.
8. One important finding of this dissertation is that upper-body activities and specific
sedentary behaviors are detected well by wrist-mounted accelerometers. Diet is a
notoriously difficult variable to measure, and one reason for this difficulty is that diet is
most often subjectively recalled via diary, interview, or food frequency questionnaire
(Thompson and Subar 2013). Two objective methods exist to measure diet, direct
observation and blood-based biomarkers, but direct observation is likely to cause
reactivity and blood biomarkers are only useful for some nutrients and not overall diet
quality (Park, Vollset et al. 2013). An interesting potential application of machine
learning and pattern recognition would be to attempt to detect when someone is eating
using acceleration data from a wrist accelerometer. Eating is typically a seated, sedentary
behavior with predictable arm movement; these characteristics give us reason to believe
that eating could be recognized using a wrist-mounted accelerometer. This approach may
not be able to yield accurate estimates of diet quality or quantity of foods consumed, but
it could provide valuable information about eating behaviors such as frequency and
timing of meals. Also, there may be ways to use this information as feedback to the
wearers to improve subjective recall of eating behaviors and to combine physical activity
and eating behavior assessment to provide more accurate and focused health information.

APPENDICES

APPENDIX A
Consent form

APPENDIX B
Recruitment flyer
Figure B.1. Recruitment flyer

APPENDIX C
Recruitment email

APPENDIX D
Supplemental figures
Figure D.1. Equipment worn by participants during the 90-min protocol. Participant shown is
performing the lying activity (T1).

Figure D.2. Example of participant performing reading activity (T2).

Figure D.3. Example of participant performing computer use activity (T3).

Figure D.4. Example of participant performing standing activity (T4).

Figure D.5. Example of participant performing laundry activity (T5).

Figure D.6. Example of participant performing sweeping activity (T6).

Figure D.7. Example of participant performing walking slow and fast activities (T7 and T8).

Figure D.8. Example of participant performing jogging activity (T9).

Figure D.9. Example of participant performing cycling activity (T10).

Figure D.10. Example of participant performing stair use activity (T11).

Figure D.11. Example of participant performing biceps curls activity (T12).

Figure D.12. Example of participant performing squats activity (T13).

Figure D.13. Example of non-wear (T14).

REFERENCES

"R Core Development Team. R: A language and Environment for Statistical Computing. version
2.12.1."
(2008) "Physical Activity Guidelines Advisory Committee: 2008 Physical Activity Guidelines for
Americans."
(2008). "US Department of Health and Human Services. 2008 physical activity guidelines for
Americans." from http://www.health.gov/PAGuidelines/.
ACSM (2009). ACSM's Guidelines for Exercise Testing and Prescription, Lippincott Williams &
Wilkins.
ActiGraph. (2013). "Products: GT3X+ Monitor." from
http://www.actigraphcorp.com/products/gt3x-monitor/.
Ainsworth, B. E., W. L. Haskell, et al. (2011). "2011 Compendium of Physical Activities: a second
update of codes and MET values." Medicine and science in sports and exercise 43(8):
1575-1581.
Akkermans, M. A., M. J. Sillen, et al. (2012). "Validation of the oxycon mobile metabolic system
in healthy subjects." Journal of sports science & medicine 11(1): 182-183.
Albinali, F., S. Intille, et al. (2010). Using Wearable Activity Type Detection to Improve Physical
Activity Energy Expenditure Estimation. ACM Conference on Ubiquitous Computing.
Denmark: 311-320.
Aminian, S. and E. A. Hinckson (2012). "Examining the validity of the ActivPAL monitor in
measuring posture and ambulatory movement in children." The international journal of
behavioral nutrition and physical activity 9: 119.
Andre, D. and D. L. Wolf (2007). "Recent advances in free-living physical activity monitoring: a
review." Journal of diabetes science and technology 1(5): 760-767.
Arvidsson, D., F. Slinde, et al. (2007). "Energy cost of physical activities in children: validation of
SenseWear Armband." Medicine and science in sports and exercise 39(11): 2076-2084.
Atkin, A. J., T. Gorely, et al. (2012). "Methods of Measurement in epidemiology: sedentary
Behaviour." International journal of epidemiology 41(5): 1460-1471.
Ayabe, M., H. Kumahara, et al. (2013). "Epoch length and the physical activity bout analysis: an
accelerometry research issue." BMC research notes 6: 20.
Bailey, R. C., J. Olson, et al. (1995). "The level and tempo of children's physical activities: an
observational study." Medicine and science in sports and exercise 27(7): 1033-1041.
Bao, L. and S. S. Intille (2004). "Activity recognition from user-annotated acceleration data."
Proceedings of PERVASIVE 2004 LNCS 3001: 1-17.
Beaton, G. H., J. Milner, et al. (1979). "Sources of variance in 24-hour dietary recall data:
implications for nutrition study design and interpretation." Am J Clin Nutr 32(12): 2546-2559.
Bergouignan, A., F. Rudwill, et al. (2011). "Physical inactivity as the culprit of metabolic
inflexibility: evidence from bed-rest studies." Journal of applied physiology 111(4): 1201-1210.
Berntsen, S., R. Hageberg, et al. (2010). "Validity of physical activity monitors in adults
participating in free-living activities." British journal of sports medicine 44(9): 657-664.
Berntsen, S., S. N. Stafne, et al. (2011). "Physical activity monitor for recording energy
expenditure in pregnancy." Acta obstetricia et gynecologica Scandinavica 90(8): 903-907.
Bey, L. and M. T. Hamilton (2003). "Suppression of skeletal muscle lipoprotein lipase activity
during physical inactivity: a molecular reason to maintain daily low-intensity activity." The
Journal of physiology 551(Pt 2): 673-682.
Bird, A. D. (1972). "The effect of surgery, injury, and prolonged bed rest on calf blood flow." The
Australian and New Zealand journal of surgery 41(4): 374-379.
Blair, S. N. (1993). "Evidence for success of exercise in weight loss and control." Annals of
internal medicine 119(7 Pt 2): 702-706.
Bonomi, A. G., G. Plasqui, et al. (2009). "Improving assessment of daily energy expenditure by
identifying types of physical activity with a single accelerometer." Journal of applied
physiology 107(3): 655-661.

Boone, J. E., P. Gordon-Larsen, et al. (2007). "Screen time and physical activity during
adolescence: longitudinal effects on obesity in young adulthood." The international journal
of behavioral nutrition and physical activity 4: 26.
Bouten, C. V., A. A. Sauren, et al. (1997). "Effects of placement and orientation of body-fixed
accelerometers on the assessment of energy expenditure during walking." Medical &
biological engineering & computing 35(1): 50-56.
Brage, S., N. Brage, et al. (2005). "Reliability and validity of the combined heart rate and
movement sensor Actiheart." European journal of clinical nutrition 59(4): 561-570.
Brage, S., N. Brage, et al. (2003). "Reliability and validity of the Computer Science and
Applications accelerometer in a mechanical setting." Measurement in Physical Education
and Exercise Science 7: 101-119.
Brage, S., N. Wedderkopp, et al. (2003). "Reexamination of validity and reliability of the CSA
monitor in walking and running." Medicine and science in sports and exercise 35(8): 1447-1454.
Brownson, R. C., C. M. Hoehner, et al. (2009). "Measuring the built environment for physical
activity: state of the science." American journal of preventive medicine 36(4 Suppl): S99-123 e112.
Carr, L. J. and M. T. Mahar (2012). "Accuracy of intensity and inclinometer output of three
activity monitors for identification of sedentary behavior and light-intensity activity."
Journal of obesity 2012: 460271.
Celis-Morales, C. A., F. Perez-Bravo, et al. (2012). "Objective vs. self-reported physical activity
and sedentary time: effects of measurement method on relationships with risk biomarkers."
PloS one 7(5): e36345.
Chobanian, A. V., G. L. Bakris, et al. (2003). "The Seventh Report of the Joint National
Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure:
the JNC 7 report." JAMA 289(19): 2560-2572.
Choi, L., Z. Liu, et al. (2011). "Validation of accelerometer wear and nonwear time classification
algorithm." Medicine and science in sports and exercise 43(2): 357-364.
Clark, B. K., A. A. Thorp, et al. (2011). "Validity of self-reported measures of workplace sitting
time and breaks in sitting time." Medicine and science in sports and exercise 43(10): 1907-1912.
Cleland, I., B. Kikhia, et al. (2013). "Optimal placement of accelerometers for the detection of
everyday activities." Sensors 13(7): 9183-9200.

Colbert, L. H., C. E. Matthews, et al. (2011). "Comparative validity of physical activity measures
in older adults." Medicine and science in sports and exercise 43(5): 867-876.
Craft, L. L., T. W. Zderic, et al. (2012). "Evidence that women meeting physical activity guidelines
do not sit less: an observational inclinometry study." The international journal of behavioral
nutrition and physical activity 9: 122.
Crouter, S. E., C. Albright, et al. (2004). "Accuracy of Polar S410 heart rate monitor to estimate
energy cost of exercise." Medicine and science in sports and exercise 36(8): 1433-1439.
Crouter, S. E. and D. R. Bassett, Jr. (2008). "A new 2-regression model for the Actical
accelerometer." British journal of sports medicine 42(3): 217-224.
Crouter, S. E., J. R. Churilla, et al. (2006). "Estimating energy expenditure using accelerometers."
European journal of applied physiology 98(6): 601-612.
Crouter, S. E., K. G. Clowers, et al. (2006). "A novel method for using accelerometer data to
predict energy expenditure." Journal of applied physiology 100(4): 1324-1331.
Crouter, S. E., E. Kuffel, et al. (2010). "Refined two-regression model for the ActiGraph
accelerometer." Medicine and science in sports and exercise 42(5): 1029-1037.
Dale, D., G. J. Welk, et al. (2002). Methods for Assessing Physical Activity and Challenges for
Research. Physical Activity Assessments for Health-Related Research. G. J. Welk.
Champaign, IL, Human Kinetics, Inc.: 19-36.
Dannecker, K. L., N. A. Sazonova, et al. (2013). "A comparison of energy expenditure estimation
of several physical activity monitors." Medicine and science in sports and exercise 45(11):
2105-2112.
De Vries, S. I., F. G. Garre, et al. (2011). "Evaluation of neural networks to identify types of
activity using accelerometers." Medicine and science in sports and exercise 43(1): 101-107.
DiNallo, J. M., D. S. Downs, et al. (2012). "Objectively assessing treadmill walking during the
second and third pregnancy trimesters." Journal of physical activity & health 9(1): 21-28.
Dingwell, J. B., J. P. Cusumano, et al. (2001). "Local dynamic stability versus kinematic variability
of continuous overground and treadmill walking." Journal of biomechanical engineering
123(1): 27-32.


Dong, B., S. Biswas, et al. (2013). "Comparing metabolic energy expenditure estimation using
wearable multi-sensor network and single accelerometer." Conference proceedings : ...
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society. IEEE Engineering in Medicine and Biology Society. Conference 2013: 2866-2869.
Dong, B., A. Montoye, et al. (2013). "Energy-aware activity classification using wearable sensor
networks." 87230Y-87230Y.
Dunstan, D. W., B. Howard, et al. (2012). "Too much sitting--a health hazard." Diabetes research
and clinical practice 97(3): 368-376.
Dwyer, T. J., J. A. Alison, et al. (2009). "Evaluation of the SenseWear activity monitor during
exercise in cystic fibrosis and in health." Respiratory medicine 103(10): 1511-1517.
Ekelund, U., S. Brage, et al. (2009). "Objectively measured moderate- and vigorous-intensity
physical activity but not sedentary time predicts insulin resistance in high-risk individuals."
Diabetes care 32(6): 1081-1086.
Erik Landhuis, C., R. Poulton, et al. (2008). "Programming obesity and poor fitness: the long-term
impact of childhood television." Obesity 16(6): 1457-1459.
Ermes, M., J. Parkka, et al. (2008). "Detection of daily activities and sports with wearable sensors
in controlled and uncontrolled conditions." IEEE transactions on information technology in
biomedicine : a publication of the IEEE Engineering in Medicine and Biology Society
12(1): 20-26.
Esliger, D. W., A. V. Rowlands, et al. (2011). "Validation of the GENEA Accelerometer."
Medicine and science in sports and exercise 43(6): 1085-1093.
Esliger, D. W. and M. S. Tremblay (2006). "Technical reliability assessment of three accelerometer
models in a mechanical setup." Medicine and science in sports and exercise 38(12): 2173-2181.
Evenson, K. R. and J. W. Terry, Jr. (2009). "Assessment of differing definitions of accelerometer
nonwear time." Research quarterly for exercise and sport 80(2): 355-362.
Feito, Y., D. R. Bassett, et al. (2012). "Evaluation of activity monitors in controlled and free-living
environments." Medicine and science in sports and exercise 44(4): 733-741.
Feito, Y., D. R. Bassett, et al. (2011). "Effects of body mass index and tilt angle on output of two
wearable activity monitors." Medicine and science in sports and exercise 43(5): 861-866.


Ferro-Luzzi, A. (1968). "[Inter- and intra-individual variability of the human energy expenditure in
the rest position]." Bollettino della Societa italiana di biologia sperimentale 44(7): 633-637.
Field, A. (2009). Discovering Statistics Using SPSS. London, SAGE Publications Ltd.
Ford, E. S., M. B. Schulze, et al. (2010). "Television watching and incident diabetes: Findings
from the European Prospective Investigation into Cancer and Nutrition-Potsdam Study."
Journal of diabetes 2(1): 23-27.
Fortune, E., V. Lugade, et al. (2014). "Validity of using tri-axial accelerometers to measure human
movement - Part II: Step counts at a wide range of gait velocities." Medical engineering &
physics.
Foster, R. C., L. M. Lanningham-Foster, et al. (2005). "Precision and accuracy of an ankle-worn
accelerometer-based pedometer in step counting and energy expenditure." Prev Med 41(3-4): 778-783.
Freedson, P. S., K. Lyden, et al. (2011). "Evaluation of artificial neural network algorithms for
predicting METs and activity type from accelerometer data: validation on an independent
sample." Journal of applied physiology 111(6): 1804-1812.
Freedson, P. S., E. Melanson, et al. (1998). "Calibration of the Computer Science and Applications,
Inc. accelerometer." Medicine and science in sports and exercise 30(5): 777-781.
Frost, C. and I. R. White (2005). "The effect of measurement error in risk factors that change over
time in cohort studies: do simple methods overcorrect for 'regression dilution'?" Int J
Epidemiol 34(6): 1359-1368.
Gabriel, K. P., J. J. McClain, et al. (2010). "Issues in accelerometer methodology: the role of epoch
length on estimates of physical activity and relationships with health outcomes in
overweight, post-menopausal women." The international journal of behavioral nutrition
and physical activity 7: 53.
GENEActiv. (2013). "GENEAction: comprehensive data collection for every body." from
http://www.geneactive.co.uk/products/geneactiv-action.aspx.
Gierach, G. L., S. C. Chang, et al. (2009). "Physical activity, sedentary behavior, and endometrial
cancer risk in the NIH-AARP Diet and Health Study." International journal of cancer.
Journal international du cancer 124(9): 2139-2147.


Grant, P. M., C. G. Ryan, et al. (2006). "The validation of a novel activity monitor in the
measurement of posture and motion during everyday activities." British journal of sports
medicine 40(12): 992-997.
GRPS. (2012). "GRPS School Choice Expo." from http://www.grps.org/ourschools/high-schools.
Gyllensten, I. C. and A. G. Bonomi (2011). "Identifying types of physical activity with a single
accelerometer: evaluating laboratory-trained algorithms in daily life." IEEE transactions on
bio-medical engineering 58(9): 2656-2663.
Hagstromer, M., P. Oja, et al. (2006). "The International Physical Activity Questionnaire (IPAQ): a
study of concurrent and construct validity." Public Health Nutr 9(6): 755-762.
Ham, S. A., J. Kruger, et al. (2009). "Participation by US adults in sports, exercise, and recreational
physical activities." Journal of physical activity & health 6(1): 6-14.
Hamilton, M. T., D. G. Hamilton, et al. (2004). "Exercise physiology versus inactivity physiology:
an essential concept for understanding lipoprotein lipase regulation." Exercise and sport
sciences reviews 32(4): 161-166.
Hamilton, M. T., D. G. Hamilton, et al. (2007). "Role of low energy expenditure and sitting in
obesity, metabolic syndrome, type 2 diabetes, and cardiovascular disease." Diabetes
56(11): 2655-2667.
Hanggi, J. M., L. R. Phillips, et al. (2012). "Validation of the GT3X ActiGraph in children and
comparison with the GT1M ActiGraph." Journal of science and medicine in sport / Sports
Medicine Australia.
Hargens, A. R. and S. Richardson (2009). "Cardiovascular adaptations, fluid shifts, and
countermeasures related to space flight." Respiratory physiology & neurobiology 169
Suppl 1: S30-33.
Harrington, D. M., G. J. Welk, et al. (2011). "Validation of MET estimates and step measurement
using the ActivPAL physical activity logger." Journal of sports sciences 29(6): 627-633.
Harrison, C. L., R. G. Thompson, et al. (2011). "Measuring physical activity during pregnancy."
The international journal of behavioral nutrition and physical activity 8: 19.
Hart, T. L., B. E. Ainsworth, et al. (2011). "Objective and subjective measures of sedentary
behavior and physical activity." Medicine and science in sports and exercise 43(3): 449-456.


Haskell, W. L., M. C. Yee, et al. (1993). "Simultaneous measurement of heart rate and body
motion to quantitate physical activity." Medicine and science in sports and exercise 25(1):
109-115.
Haymes, E. M. and W. C. Byrnes (1993). "Walking and running energy expenditure estimated by
Caltrac and indirect calorimetry." Medicine and science in sports and exercise 25(12): 1365-1369.
Healy, G. N., B. K. Clark, et al. (2011). "Measurement of adults' sedentary time in
population-based studies." American journal of preventive medicine 41(2): 216-227.
Healy, G. N., D. W. Dunstan, et al. (2007). "Objectively measured light-intensity physical activity
is independently associated with 2-h plasma glucose." Diabetes care 30(6): 1384-1389.
Healy, G. N., D. W. Dunstan, et al. (2008). "Breaks in sedentary time: beneficial associations with
metabolic risk." Diabetes care 31(4): 661-666.
Healy, G. N., D. W. Dunstan, et al. (2008). "Television time and continuous metabolic risk in
physically active adults." Medicine and science in sports and exercise 40(4): 639-645.
Healy, G. N., C. E. Matthews, et al. (2011). "Sedentary time and cardio-metabolic biomarkers in
US adults: NHANES 2003-06." European heart journal 32(5): 590-597.
Healy, G. N., K. Wijndaele, et al. (2008). "Objectively measured sedentary time, physical activity,
and metabolic risk: the Australian Diabetes, Obesity and Lifestyle Study (AusDiab)."
Diabetes care 31(2): 369-371.
Heiermann, S., K. Khalaj Hedayati, et al. (2011). "Accuracy of a portable multisensor body
monitor for predicting resting energy expenditure in older people: a comparison with
indirect calorimetry." Gerontology 57(5): 473-479.
Heil, D. P. (2006). "Predicting activity energy expenditure using the Actical activity monitor."
Research quarterly for exercise and sport 77(1): 64-80.
Helmerhorst, H. J., K. Wijndaele, et al. (2009). "Objectively measured sedentary time may predict
insulin resistance independent of moderate- and vigorous-intensity physical activity."
Diabetes 58(8): 1776-1779.
Hendelman, D., K. Miller, et al. (2000). "Validity of accelerometry for the assessment of moderate
intensity physical activity in the field." Medicine and science in sports and exercise 32(9
Suppl): S442-449.
Herren, R., A. Sparti, et al. (1999). "The prediction of speed and incline in outdoor running in
humans using accelerometry." Medicine and science in sports and exercise 31(7): 1053-1059.

Herrmann, S. D., T. V. Barreira, et al. (2012). "Impact of accelerometer wear time on physical
activity data: a NHANES semisimulation data approach." British journal of sports
medicine.
Hjorth, M. F., J. P. Chaput, et al. (2012). "Measure of sleep and physical activity by a single
accelerometer: Can a waist-worn Actigraph adequately measure sleep in children?" Sleep
and Biological Rhythms 10(4): 328-335.
Hoevenaar-Blom, M. P., A. M. Spijkerman, et al. (2011). "Sleep duration and sleep quality in
relation to 12-year cardiovascular disease incidence: the MORGEN study." Sleep 34(11):
1487-1492.
Howard, R. A., D. M. Freedman, et al. (2008). "Physical activity, sedentary behavior, and the risk
of colon and rectal cancer in the NIH-AARP Diet and Health Study." Cancer causes &
control : CCC 19(9): 939-953.
Hu, F. B., M. F. Leitzmann, et al. (2001). "Physical activity and television watching in relation to
risk for type 2 diabetes mellitus in men." Archives of internal medicine 161(12): 1542-1548.
Hu, F. B., T. Y. Li, et al. (2003). "Television watching and other sedentary behaviors in relation to
risk of obesity and type 2 diabetes mellitus in women." JAMA : the journal of the
American Medical Association 289(14): 1785-1791.
Jakicic, J. M., M. Marcus, et al. (2004). "Evaluation of the SenseWear Pro Armband to assess
energy expenditure during exercise." Medicine and science in sports and exercise 36(5):
897-904.
Janz, K. F. (2002). Use of Heart Rate Monitors to Assess Physical Activity. Physical Activity
Assessments for Health-Related Research. G. J. Welk. Champaign, IL, Human Kinetics,
Inc.: 143-162.
Janz, K. F., J. Witt, et al. (1995). "The stability of children's physical activity as measured by
accelerometry and self-report." Medicine and science in sports and exercise 27(9): 1326-1332.
Jean-Louis, G., D. F. Kripke, et al. (2001). "Sleep detection with an accelerometer actigraph:
comparisons with polysomnography." Physiol Behav 72(1-2): 21-28.
Johnstone, A. M., S. D. Murison, et al. (2005). "Factors influencing variation in basal metabolic
rate include fat-free mass, fat mass, age, and circulating thyroxine but not sex, circulating
leptin, or triiodothyronine." The American journal of clinical nutrition 82(5): 941-948.


Kampert, J. B., S. N. Blair, et al. (1996). "Physical activity, physical fitness, and all-cause and
cancer mortality: a prospective study of men and women." Annals of epidemiology 6(5):
452-457.
Katzmarzyk, P. T. (2014). "Standing and mortality in a prospective cohort of Canadian adults."
Medicine and science in sports and exercise 46(5): 940-946.
Katzmarzyk, P. T., T. S. Church, et al. (2009). "Sitting time and mortality from all causes,
cardiovascular disease, and cancer." Medicine and science in sports and exercise 41(5):
998-1005.
Kenney, W., J. Wilmore, et al. (2012). Physiology of sport and exercise. Champaign, IL, Human
Kinetics.
Khan, A. M., Y. K. Lee, et al. (2008). "Accelerometer signal-based human activity recognition
using augmented autoregressive model coefficients and artificial neural nets." Conference
proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and
Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 2008:
5172-5175.
Khan, A. M., Y. K. Lee, et al. (2010). "A triaxial accelerometer-based physical-activity recognition
via augmented-signal features and a hierarchical recognizer." IEEE transactions on
information technology in biomedicine : a publication of the IEEE Engineering in
Medicine and Biology Society 14(5): 1166-1172.
Kierkegaard, A., L. Norgren, et al. (1987). "Incidence of deep vein thrombosis in bedridden
non-surgical patients." Acta medica Scandinavica 222(5): 409-414.
Kinder, J. R., K. A. Lee, et al. (2012). "Validation of a hip-worn accelerometer in measuring sleep
time in children." J Pediatr Nurs 27(2): 127-133.
King, A. C. and D. L. Tribble (1991). "The role of exercise in weight regulation in nonathletes."
Sports medicine 11(5): 331-349.
Kozey-Keadle, S., A. Libertine, et al. (2011). "Validation of wearable monitors for assessing
sedentary behavior." Medicine and science in sports and exercise 43(8): 1561-1567.
Krahenbuhl, G. S. and T. J. Williams (1992). "Running economy: changes with age during
childhood and adolescence." Medicine and science in sports and exercise 24(4): 462-466.
Kripke, D. F., D. J. Mullaney, et al. (1978). "Wrist actigraphic measures of sleep and rhythms."
Electroencephalogr Clin Neurophysiol 44(5): 674-676.


Lagerros, Y. T. and P. Lagiou (2007). "Assessment of physical activity and energy expenditure in
epidemiological research of chronic diseases." Eur J Epidemiol 22(6): 353-362.
LaPorte, R. E., L. H. Kuller, et al. (1979). "An objective measure of physical activity for
epidemiologic research." American journal of epidemiology 109(2): 158-168.
LaPorte, R. E., H. J. Montoye, et al. (1985). "Assessment of physical activity in epidemiologic
research: problems and prospects." Public health reports 100(2): 131-146.
Le Masurier, G. C., C. L. Sidman, et al. (2003). "Accumulating 10,000 steps: does this meet
current physical activity guidelines?" Research quarterly for exercise and sport 74(4): 389-394.
Lee, I. M. and P. J. Skerrett (2001). "Physical activity and all-cause mortality: what is the
dose-response relation?" Medicine and science in sports and exercise 33(6 Suppl): S459-471;
discussion S493-494.
Lee, J. M., Y. Kim, et al. (2014). "Validity of Consumer-Based Physical Activity Monitors."
Medicine and science in sports and exercise.
Levine, J. A., N. L. Eberhardt, et al. (1999). "Role of nonexercise activity thermogenesis in
resistance to fat gain in humans." Science 283(5399): 212-214.
Levine, J. A., L. M. Lanningham-Foster, et al. (2005). "Interindividual variation in posture
allocation: possible role in human obesity." Science 307(5709): 584-586.
Lord, S., S. F. Chastin, et al. (2011). "Exploring patterns of daily physical and sedentary behaviour
in community-dwelling older adults." Age Ageing 40(2): 205-210.
Lyden, K. (2012). Refinement, validation and application of a machine learning method for
estimating physical activity and sedentary behavior in free-living people. Dissertation.
Amherst, MA.
Lyden, K., S. K. Keadle, et al. (2013). "A Method to Estimate Free-Living Active and Sedentary
Behavior from an Accelerometer." Medicine and science in sports and exercise.
Lyden, K., S. L. Kozey Keadle, et al. (2012). "Validity of two wearable monitors to estimate
breaks from sedentary time." Medicine and science in sports and exercise 44(11): 2243-2252.
Lyden, K., S. L. Kozey, et al. (2011). "A comprehensive evaluation of commonly used
accelerometer energy expenditure and MET prediction equations." European journal of
applied physiology 111(2): 187-201.

Lyden, K., N. Petruski, et al. (2013). "Direct Observation is a Valid Criterion for Estimating
Physical Activity and Sedentary Behavior." Journal of physical activity & health.
MacMahon, S., R. Peto, et al. (1990). "Blood pressure, stroke, and coronary heart disease. Part 1,
Prolonged differences in blood pressure: prospective observational studies corrected for the
regression dilution bias." Lancet 335(8692): 765-774.
Maddocks, M., A. Petrou, et al. (2010). "Validity of three accelerometers during treadmill walking
and motor vehicle travel." British journal of sports medicine 44(8): 606-608.
Malina, R. (1995). "Anthropometry." Physiological assessment of human fitness: 205-219.
Mannini, A., S. S. Intille, et al. (2013). "Activity recognition using a single accelerometer placed at
the wrist or ankle." Medicine and science in sports and exercise.
Mannini, A. and A. M. Sabatini (2010). "Machine learning methods for classifying human physical
activity from on-body accelerometers." Sensors (Basel) 10(2): 1154-1175.
Manson, J. E., D. M. Nathan, et al. (1992). "A prospective study of exercise and incidence of
diabetes among US male physicians." JAMA : the journal of the American Medical
Association 268(1): 63-67.
Martin, A., M. McNeill, et al. (2011). "Objective measurement of habitual sedentary behavior in
pre-school children: comparison of activPAL with Actigraph monitors." Pediatric exercise
science 23(4): 468-476.
Martinsen, E. W., A. Hoffart, et al. (1989). "Comparing aerobic with nonaerobic forms of exercise
in the treatment of clinical depression: a randomized trial." Comprehensive psychiatry
30(4): 324-331.
Masse, L. C., B. F. Fuemmeler, et al. (2005). "Accelerometer data reduction: a comparison of four
reduction algorithms on select outcome variables." Medicine and science in sports and
exercise 37(11 Suppl): S544-554.
Matthews, C. E. (2005). "Calibration of accelerometer output for adults." Medicine and science in
sports and exercise 37(11 Suppl): S512-522.
Matthews, C. E., B. E. Ainsworth, et al. (2002). "Sources of variance in daily physical activity
levels as measured by an accelerometer." Medicine and science in sports and exercise
34(8): 1376-1381.
Matthews, C. E., K. Y. Chen, et al. (2008). "Amount of time spent in sedentary behaviors in the
United States, 2003-2004." American journal of epidemiology 167(7): 875-881.

Matthews, C. E., S. C. Moore, et al. (2012). "Improving self-reports of active and sedentary
behaviors in large epidemiologic studies." Exercise and sport sciences reviews 40(3): 118-126.
McClain, J. J., S. B. Sisson, et al. (2007). "Actigraph accelerometer interinstrument reliability
during free-living in adults." Medicine and science in sports and exercise 39(9): 1509-1514.
McKenzie, T. (2002). Use of direct observation to assess physical activity. Physical Activity
Assessments for Health-Related Research. G. Welk. Champaign, IL, Human Kinetics, Inc.:
179-195.
Melanson, E. L., Jr. and P. S. Freedson (1995). "Validity of the Computer Science and
Applications, Inc. (CSA) activity monitor." Medicine and science in sports and exercise
27(6): 934-940.
Metcalf, B. S., J. S. Curnow, et al. (2002). "Technical reliability of the CSA activity monitor: The
EarlyBird Study." Medicine and science in sports and exercise 34(9): 1533-1537.
Metz, C. E. (1978). "Basic principles of ROC analysis." Seminars in nuclear medicine 8(4): 283-298.
Mignault, D., M. St-Onge, et al. (2005). "Evaluation of the Portable HealthWear Armband: a
device to measure total daily energy expenditure in free-living type 2 diabetic individuals."
Diabetes care 28(1): 225-227.
Mikines, K. J., E. A. Richter, et al. (1991). "Seven days of bed rest decrease insulin action on
glucose uptake in leg and whole body." Journal of applied physiology 70(3): 1245-1254.
Montgomery-Downs, H. E., S. P. Insana, et al. (2012). "Movement toward a novel activity
monitoring device." Sleep & breathing = Schlaf & Atmung 16(3): 913-917.
Montoye, A., B. Dong, et al. (2013). Assessing the effect of accelerometer placement and
modeling method on energy expenditure measurement. East Lansing, MI.
Montoye, A., B. Dong, et al. (2014). "Use of a wireless network of accelerometers for improved
measurement of human energy expenditure." Electronics 3(2): 205-220.
Montoye, H., H. Kemper, et al. (1996). Measuring physical activity and energy expenditure.
Champaign, IL, Human Kinetics.
Montoye, H. J., R. Washburn, et al. (1983). "Estimation of energy expenditure by a portable
accelerometer." Medicine and science in sports and exercise 15(5): 403-407.


Moon, J. K. and N. F. Butte (1996). "Combined heart rate and activity improve estimates of
oxygen consumption and carbon dioxide production rates." Journal of applied physiology
81(4): 1754-1761.
Morris, J. N., D. G. Clayton, et al. (1990). "Exercise in leisure time: coronary attack and death
rates." British heart journal 63(6): 325-334.
Morris, J. N., J. A. Heady, et al. (1953). "Coronary heart-disease and physical activity of work."
Lancet 265(6796): 1111-1120; concl.
Moy, K. L., J. F. Sallis, et al. (2010). "Culturally-specific physical activity measures for Native
Hawaiian and Pacific Islanders." Hawaii medical journal 69(5 Suppl 2): 21-24.
Mullaney, D. J., D. F. Kripke, et al. (1980). "Wrist-actigraphic estimation of sleep time." Sleep
3(1): 83-92.
Nichols, J. F., C. G. Morgan, et al. (1999). "Validity, reliability, and calibration of the Tritrac
accelerometer as a measure of physical activity." Medicine and science in sports and exercise 31(6): 908-912.
Oliver, M., H. M. Badland, et al. (2011). "Identification of accelerometer nonwear time and
sedentary behavior." Research quarterly for exercise and sport 82(4): 779-783.
Orendurff, M. S., J. A. Schoen, et al. (2008). "How humans walk: bout duration, steps per bout,
and rest duration." Journal of rehabilitation research and development 45(7): 1077-1089.
Orme, M., K. Wijndaele, et al. (2014). "Combined influence of epoch length, cut-point and bout
duration on accelerometry-derived physical activity." The international journal of
behavioral nutrition and physical activity 11(1): 34.
Owen, N., G. N. Healy, et al. (2010). "Too much sitting: the population health science of sedentary
behavior." Exercise and sport sciences reviews 38(3): 105-113.
Paffenbarger, R. S., Jr., R. T. Hyde, et al. (1986). "Physical activity, all-cause mortality, and
longevity of college alumni." N Engl J Med 314(10): 605-613.
Paffenbarger, R. S., Jr., A. L. Wing, et al. (1983). "Physical activity and incidence of hypertension
in college alumni." Am J Epidemiol 117(3): 245-257.
PAGAC (2008). Physical Activity Guidelines Advisory Committee Report, 2008. Washington, DC,
US Department of Health and Human Services.

Papazoglou, D., G. Augello, et al. (2006). "Evaluation of a multisensor armband in estimating
energy expenditure in obese individuals." Obesity 14(12): 2217-2223.
Parikh, R., A. Mathai, et al. (2008). "Understanding and using sensitivity, specificity and predictive
values." Indian journal of ophthalmology 56(1): 45-50.
Park, J. Y., S. E. Vollset, et al. (2013). "Dietary intake and biological measurement of folate: a
qualitative review of validation studies." Molecular nutrition & food research 57(4): 562-581.
Parkka, J., M. Ermes, et al. (2006). "Activity classification using realistic data from wearable
sensors." IEEE transactions on information technology in biomedicine : a publication of the
IEEE Engineering in Medicine and Biology Society 10(1): 119-128.
Pate, R. R., J. R. O'Neill, et al. (2008). "The evolving definition of 'sedentary'." Exercise and sport
sciences reviews 36(4): 173-178.
Pate, R. R., M. Pratt, et al. (1995). "Physical activity and public health. A recommendation from
the Centers for Disease Control and Prevention and the American College of Sports
Medicine." JAMA : the journal of the American Medical Association 273(5): 402-407.
Patel, A. V., C. Rodriguez, et al. (2006). "Recreational physical activity and sedentary behavior in
relation to ovarian cancer risk in a large cohort of US women." American journal of
epidemiology 163(8): 709-716.
Plasqui, G. and K. R. Westerterp (2005). "Accelerometry and heart rate as a measure of physical
fitness: proof of concept." Medicine and science in sports and exercise 37(5): 872-876.
Ploug, T., T. Ohkuwa, et al. (1995). "Effect of immobilization on glucose transport and glucose
transporter expression in rat skeletal muscle." The American journal of physiology 268(5
Pt 1): E980-986.
Pober, D. M., J. Staudenmayer, et al. (2006). "Development of novel techniques to classify
physical activity mode using accelerometers." Medicine and science in sports and exercise
38(9): 1626-1634.
Precope, J. (1952). Hippocrates on diet and hygiene. London, UK, Williams, Lea, and Company.
Preece, S. J., J. Y. Goulermas, et al. (2009). "Activity identification using body-mounted sensors--a
review of classification techniques." Physiological measurement 30(4): R1-33.
Puhl, J., K. Greaves, et al. (1990). "Children's Activity Rating Scale (CARS): description and
calibration." Research quarterly for exercise and sport 61(1): 26-36.

Reilly, J. J., V. Penpraze, et al. (2008). "Objective measurement of physical activity and sedentary
behaviour: review with new data." Archives of disease in childhood 93(7): 614-619.
Riddoch, C. J., L. Bo Andersen, et al. (2004). "Physical activity levels and patterns of 9- and
15-yr-old European children." Medicine and science in sports and exercise 36(1): 86-92.
Rosdahl, H., L. Gullstrand, et al. (2010). "Evaluation of the Oxycon Mobile metabolic system
against the Douglas bag method." European journal of applied physiology 109(2): 159-171.
Rosenberger, M. E., W. L. Haskell, et al. (2013). "Estimating activity and sedentary behavior from
an accelerometer on the hip or wrist." Medicine and science in sports and exercise 45(5): 964-975.
Rothney, M. P., M. Neumann, et al. (2007). "An artificial neural network model of energy
expenditure using nonintegrated acceleration signals." Journal of applied physiology
103(4): 1419-1427.
Rothney, M. P., E. V. Schaefer, et al. (2008). "Validity of physical activity intensity predictions by
ActiGraph, Actical, and RT3 accelerometers." Obesity 16(8): 1946-1952.
Rowlands, A. V., T. S. Olds, et al. (2014). "Assessing Sedentary Behavior with the GENEActiv:
Introducing the Sedentary Sphere." Medicine and science in sports and exercise 46(6):
1235-1247.
Rumo, M., O. Amft, et al. (2011). "A stepwise validation of a wearable system for estimating
energy expenditure in field-based research." Physiological measurement 32(12): 1983-2001.
Ryan, C. G., P. M. Grant, et al. (2006). "The validity and reliability of a novel activity monitor as a
measure of walking." British journal of sports medicine 40(9): 779-784.
Safrit, M. and T. Wood (1995). Introduction to measurement in physical education and exercise
science. St. Louis, MO, Mosby.
Sallis, J. F., M. J. Buono, et al. (1990). "The Caltrac accelerometer as a physical activity monitor
for school-age children." Medicine and science in sports and exercise 22(5): 698-703.
Sallis, J. F. and B. E. Saelens (2000). "Assessment of physical activity by self-report: status,
limitations, and future directions." Research quarterly for exercise and sport 71(2 Suppl):
S1-14.
Santos-Lozano, A., P. J. Marin, et al. (2012). "Technical variability of the GT3X accelerometer."
Med Eng Phys 34(6): 787-790.


Santos-Lozano, A., G. Torres-Luque, et al. (2012). "Intermonitor variability of GT3X
accelerometer." International journal of sports medicine 33(12): 994-999.
Sasaki, J. E., D. John, et al. (2011). "Validation and comparison of ActiGraph activity monitors."
Journal of science and medicine in sport / Sports Medicine Australia 14(5): 411-416.
SBRN (2012). "Letter to the Editor: Standardized use of the terms 'sedentary' and 'sedentary
behaviours'." Appl Physiol Nutr Metab 37: 540-542.
Schrage, W. G. (2008). "Not a search in vein: novel stimulus for vascular dysfunction after
simulated microgravity." Journal of applied physiology 104(5): 1257-1258.
Seider, M. J., W. F. Nicholson, et al. (1982). "Insulin resistance for glucose metabolism in disused
soleus muscle of mice." The American journal of physiology 242(1): E12-18.
Shephard, R. J. (1990). "Physical activity and cancer." International journal of sports medicine
11(6): 413-420.
Shephard, R. J. (2003). "Limits to the measurement of habitual physical activity by
questionnaires." British journal of sports medicine 37(3): 197-206; discussion 206.
Shepherd, E. F., E. Toloza, et al. (1999). "Step activity monitor: increased accuracy in quantifying
ambulatory activity." J Orthop Res 17(5): 703-708.
Shields, M. and M. S. Tremblay (2008). "Sedentary behaviour and obesity." Health reports /
Statistics Canada, Canadian Centre for Health Information = Rapports sur la sante /
Statistique Canada, Centre canadien d'information sur la sante 19(2): 19-30.
Skotte, J., M. Korshoj, et al. (2012). "Detection of Physical Activity Types Using Triaxial
Accelerometers." Journal of Physical Activity & Health.
Skotte, J., M. Korshoj, et al. (2014). "Detection of physical activity types using triaxial
accelerometers." Journal of physical activity & health 11(1): 76-84.
Slattery, M. L. (2004). "Physical activity and colorectal cancer." Sports medicine 34(4): 239-252.
Slootmaker, S. M., A. J. Schuit, et al. (2009). "Disagreement in physical activity assessed by
accelerometer and self-report in subgroups of age, gender, education and weight status."
The international journal of behavioral nutrition and physical activity 6: 17.
Smorawinski, J., P. Kubala, et al. (1996). "Effects of three day bed-rest on circulatory, metabolic
and hormonal responses to oral glucose load in endurance trained athletes and untrained
subjects." Journal of gravitational physiology : a journal of the International Society for
Gravitational Physiology 3(2): 44-45.
Spurr, G. B., A. M. Prentice, et al. (1988). "Energy expenditure from minute-by-minute heart-rate
recording: comparison with indirect calorimetry." The American journal of clinical
nutrition 48(3): 552-559.
Stamatakis, E., M. Hamer, et al. (2011). "Screen-based entertainment time, all-cause mortality, and
cardiovascular events: population-based study with ongoing mortality and hospital events
follow-up." Journal of the American College of Cardiology 57(3): 292-299.
Staudenmayer, J., D. Pober, et al. (2009). "An artificial neural network to estimate physical activity
energy expenditure and identify physical activity type from an accelerometer." Journal of
applied physiology 107(4): 1300-1307.
Strath, S. J., D. R. Bassett, Jr., et al. (2001). "Simultaneous heart rate-motion sensor technique to
estimate energy expenditure." Medicine and science in sports and exercise 33(12): 2118-2123.
Strath, S. J., D. R. Bassett, Jr., et al. (2002). "Validity of the simultaneous heart rate-motion sensor
technique for measuring energy expenditure." Medicine and science in sports and exercise
34(5): 888-894.
Stuart, C. A., R. E. Shangraw, et al. (1988). "Bed-rest-induced insulin resistance occurs primarily
in muscle." Metabolism: clinical and experimental 37(8): 802-806.
Sun, D. X., G. Schmidt, et al. (2008). "Validation of the RT3 accelerometer for measuring physical
activity of children in simulated free-living conditions." Pediatric exercise science 20(2):
181-197.
Swartz, A. M., L. Squires, et al. (2011). "Energy expenditure of interruptions to sedentary
behavior." The international journal of behavioral nutrition and physical activity 8: 69.
Swartz, A. M., S. J. Strath, et al. (2000). "Estimation of energy expenditure using CSA
accelerometers at hip and wrist sites." Medicine and science in sports and exercise 32(9
Suppl): S450-456.
Tapia, E. M., S. S. Intille, et al. (2007). "Real-time recognition of physical activities and their
intensities using wireless accelerometers and a heart rate monitor." Proceedings of the
International Symposium on Wearable Computers.
Thompson, F. and A. Subar (2013). Dietary assessment methodology. Nutrition in the prevention
and treatment of disease. A. Coulston, C. Boushey and M. Ferruzzi. London, UK, Elsevier.
3.

Thorp, A. A., N. Owen, et al. (2011). "Sedentary behaviors and subsequent health outcomes in
adults: a systematic review of longitudinal studies, 1996-2011." American journal of
preventive medicine 41(2): 207-215.
Thune, I. and A. S. Furberg (2001). "Physical activity and cancer risk: dose-response and cancer,
all sites and site-specific." Medicine and science in sports and exercise 33(6 Suppl):
S530-550; discussion S609-610.
Tobin, B. W., P. N. Uchakin, et al. (2002). "Insulin secretion and sensitivity in space flight:
diabetogenic effects." Nutrition 18(10): 842-848.
Troiano, R. P., D. Berrigan, et al. (2008). "Physical activity in the United States measured by
accelerometer." Medicine and science in sports and exercise 40(1): 181-188.
Troiano, R. P. and J. J. McClain (2012). Objective measures of physical activity, sleep,
and strength in US National Health and Nutrition Examination Survey (NHANES)
2011-2014. The 8th International Conference on Diet and Activity Methods, Rome, Italy.
Troiano, R. P., J. J. McClain, et al. (2014). "Evolution of accelerometer methods for physical
activity research." British journal of sports medicine.
Trost, S. G., K. L. McIver, et al. (2005). "Conducting accelerometer-based activity assessments in
field-based research." Medicine and science in sports and exercise 37(11): S531-S543.
Trost, S. G., W. K. Wong, et al. (2012). "Artificial neural networks to predict activity type and
energy expenditure in youth." Medicine and science in sports and exercise 44(9): 1801-1809.
Tudor-Locke, C. E. and A. M. Myers (2001). "Challenges and opportunities for measuring
physical activity in sedentary adults." Sports medicine 31(2): 91-100.
UBCC. (2009). "Category 2 enhanced phenotyping at baseline assessment visit in last 100-150,000
participants." from
http://www.ukbiobank.ac.uk/wp-content/uploads/2011/06/Protocol_addendum_2.pdf.
van Hees, V. T., R. Golubic, et al. (2013). "Impact of study design on development and evaluation
of an activity-type classifier." Journal of applied physiology 114(8): 1042-1051.
van Poppel, M. N., M. J. Chinapaw, et al. (2010). "Physical activity questionnaires for adults: a
systematic review of measurement properties." Sports medicine 40(7): 565-600.
Vanhelst, J., G. Baquet, et al. (2012). "Comparative interinstrument reliability of uniaxial and
triaxial accelerometers in free-living conditions." Percept Mot Skills 114(2): 584-594.
Veltink, P. H., H. B. Bussmann, et al. (1996). "Detection of static and dynamic activities using
uniaxial accelerometers." IEEE Trans Rehabil Eng 4(4): 375-385.
Webster, J. B., D. F. Kripke, et al. (1982). "An activity-based sleep monitor system for ambulatory
use." Sleep 5(4): 389-399.
Welch, W. A., D. R. Bassett, et al. (2014). "Cross-validation of Waist-Worn GENEA
Accelerometer Cut-Points." Medicine and science in sports and exercise.
Welch, W. A., D. R. Bassett, et al. (2013). "Classification accuracy of the wrist-worn gravity
estimator of normal everyday activity accelerometer." Medicine and science in sports and
exercise 45(10): 2012-2019.
Welk, G. J. (2002). "Reliability of the CSA activity monitor for assessing physical activity."
Research quarterly for exercise and sport 73: A14.
Welk, G. J. (2002). Use of Accelerometry-Based Activity Monitors to Assess Physical Activity.
Physical Activity Assessments for Health-Related Research. G. J. Welk. Champaign, IL,
Human Kinetics, Inc.: 125-142.
Welk, G. J. and C. B. Corbin (1995). "The validity of the Tritrac-R3D Activity Monitor for the
assessment of physical activity in children." Research quarterly for exercise and sport
66(3): 202-209.
Welk, G. J., J. J. McClain, et al. (2007). "Field validation of the MTI Actigraph and BodyMedia
armband monitor using the IDEEA monitor." Obesity 15(4): 918-928.
Westerterp, K. R. (1999). "Assessment of physical activity level in relation to obesity: current
evidence and research issues." Medicine and science in sports and exercise 31(11 Suppl):
S522-525.
Wijndaele, K., G. N. Healy, et al. (2010). "Increased cardiometabolic risk is associated with
increased TV viewing time." Medicine and science in sports and exercise 42(8): 1511-1518.
Wong, T. C., J. G. Webster, et al. (1981). "Portable accelerometer device for measuring human
energy expenditure." IEEE transactions on bio-medical engineering 28(6): 467-471.
Yanagibori, R., K. Kondo, et al. (1998). "Effect of 20 days' bed rest on the reverse cholesterol
transport system in healthy young subjects." Journal of internal medicine 243(4): 307-312.
Yanagibori, R., Y. Suzuki, et al. (1997). "The effects of 20 days bed rest on serum lipids and
lipoprotein concentrations in healthy young subjects." Journal of gravitational physiology :
a journal of the International Society for Gravitational Physiology 4(1): S82-90.
Zderic, T. W. and M. T. Hamilton (2006). "Physical inactivity amplifies the sensitivity of skeletal
muscle to the lipid-induced downregulation of lipoprotein lipase activity." Journal of
applied physiology 100(1): 249-257.
Zerwekh, J. E., L. A. Ruml, et al. (1998). "The effects of twelve weeks of bed rest on bone
histology, biochemical markers of bone turnover, and calcium homeostasis in eleven
normal subjects." Journal of bone and mineral research : the official journal of the
American Society for Bone and Mineral Research 13(10): 1594-1601.
Zhang, K., F. X. Pi-Sunyer, et al. (2004). "Improving energy expenditure estimation for physical
activity." Medicine and science in sports and exercise 36(5): 883-889.
Zhang, K., P. Werner, et al. (2003). "Measurement of human daily physical activity." Obesity
research 11(1): 33-40.
Zhang, S., A. V. Rowlands, et al. (2012). "Physical activity classification using the GENEA
wrist-worn accelerometer." Medicine and science in sports and exercise 44(4): 742-748.
