OCCUPANT BEHAVIOR PREDICTION MODEL BASED ON ENERGY CONSUMPTION USING MACHINE LEARNING APPROACHES

By Yunjeong Mo

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Planning, Design and Construction - Doctor of Philosophy

2018

ABSTRACT

The building sector uses more energy than any other energy-consuming sector, and the residential sector accounts for 39 percent of electricity consumption in the United States, the highest share among the electricity-consuming sectors. The goals of this research are to identify the relationship between energy consumption and occupant behavior at a detailed level while also considering building technology, and to build a behavior prediction model from energy consumption data using machine learning approaches. This research consists of four main parts: (1) Part I provides a theoretical foundation for the rest of the research, develops the Occupant Behavior Prediction Model, and applies the model to the American Time Use Survey (ATUS) data; (2) Part II analyzes energy usage-related behaviors and activities with the ATUS data; (3) Part III analyzes building technologies, including appliances, and energy usage with the Residential Energy Consumption Survey (RECS) data; and (4) Part IV combines the findings from the previous parts and applies the Occupant Behavior Prediction Model to a sensor-measured dataset. This research will have an impact on residential occupant behavior by helping occupants better understand the effects of their own behaviors on energy usage and identify which changes would improve energy efficiency in their homes. The findings will be beneficial to energy-related industries and to the energy research area.
In addition, the Occupant Behavior Prediction Model has the potential to be further integrated with research in other fields.

Copyright by
YUNJEONG MO
2018

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
OVERVIEW OF THE RESEARCH
  Introduction
  Problem Statement
  Goals and Objectives
  Research Structure
    1.4.1. Research Design and Structure
    1.4.2. Main Datasets
    1.4.3. Main Methodology: Machine Learning
  Research Scope and Assumptions
  Definition of Occupancy
  Relationship between Energy, Technology, and Behavior
  Summary
OCCUPANT BEHAVIOR PREDICTION MODEL ON ENERGY CONSUMPTION IN RESIDENTIAL BUILDINGS
  Abstract
  Introduction
  Theoretical Background
    2.2.1. Habitual Occupant Behavior
    2.2.2. Energy, Building Technologies and Occupants
  Behavior Prediction Model
    2.3.1. Occupant Behavior Prediction Model
    2.3.2. Main Components of the Model
  Case Study: Extract Attributes from the ATUS to Fit the Model
    2.4.1. Overview of the ATUS Data
    2.4.2. Reclassification of Activities
    2.4.3. Variables (Attributes)
  Case Study: ML Classification Process
    2.5.1. Pre-Analysis
    2.5.2. Algorithm Selection
    2.5.3. Feature Engineering
    2.5.4. Parameter Tuning
    2.5.5. Subgroup Analysis
  Case Study: Result
    2.6.1. Pre-Analysis
      2.6.1.1. Features (Variables) and Instances
      2.6.1.2. Descriptive Analysis
    2.6.2. Algorithm Selection
    2.6.3. Feature Engineering
    2.6.4. Parameter Tuning
    2.6.5. Subgroup Analysis
  Discussion
DAILY BEHAVIOR PATTERN AND FACTORS AFFECTING OCCUPANT BEHAVIOR IN RESIDENTIAL BUILDINGS
  Abstract
  Introduction
  Background
    3.2.1. Occupant Behavior Prediction Model
    3.2.2. Use of the ATUS in Occupant Behavior Studies
    3.2.3. Use of GIS in Building/Construction Studies
  Methodology
    3.3.1. Clustering of Occupant Daily Activities by Time
      3.3.1.1. Data Preparation
      3.3.1.2. K-modes Clustering
    3.3.2. Comparative Analysis for Energy Usage-Related Activities
    3.3.3. GIS Analysis for Habitual Energy Usage-Related Activities
      3.3.3.1. GIS Visualization: Comparison of Activities by States
      3.3.3.2. GIS Grouping Analysis: Grouping of Activities with K-means Clustering
  Result
    3.4.1. Clustering of Occupant Daily Activities by Time
    3.4.2. Comparative Analysis for Energy Usage-Related Activities
      3.4.2.1. Activities by Region
      3.4.2.2. Activities by Day of the Week
      3.4.2.3. Activities by Gender
      3.4.2.4. Activities by Job Status
    3.4.3. Spatial Analysis for Habitual Energy Usage-Related Activities
  Discussion and Conclusion
EFFECTIVE FACTORS TO PREDICT RESIDENTIAL ENERGY CONSUMPTION USING MACHINE LEARNING
  Abstract
  Introduction
  Background
  Data
    4.3.1. Overview of RECS Data
    4.3.2. Data Pre-Process
  Methodology
    4.4.1. Feature Selection
    4.4.2. Algorithm Selection
  Result
    4.5.1. Main Factors of Energy Consumption
    4.5.2. Energy Consumption Prediction
  Conclusion
VALIDATION OF THE OCCUPANT BEHAVIOR PREDICTION MODEL USING REAL-WORLD HOME ENERGY SENSORS
  Abstract
  Introduction
  Background
    5.2.1. Data Used for Existing Studies
      5.2.1.1. Measured Data (Energy, Occupant Behavior)
      5.2.1.2. Survey Data
      5.2.1.3. RECS
      5.2.1.4. ATUS
    5.2.2. Methods Used for Existing Studies
      5.2.2.1. Machine Learning / Data Mining
      5.2.2.2. Statistics
      5.2.2.3. Simulation / Modeling
  Data
    5.3.1. Sensor Measured Data
    5.3.2. Other Data
    5.3.3. Data Pre-Process
  Methodology
    5.4.1. Classification: Predicting Appliances
    5.4.2. Clustering: Grouping Electricity Usage Pattern
    5.4.3. Descriptive Analysis: Connecting Energy – Technology – Behavior
  Result
    5.5.1. Classification
    5.5.2. Clustering
    5.5.3. Descriptive Analysis
      5.5.3.1. ATUS: Activity
      5.5.3.2. RECS: Energy and Appliance
  Discussion and Conclusion
SUMMARY AND CONCLUSION OF THE RESEARCH
  Summary of Research
  Summary of Findings
  Contributions
  Intellectual Merit
  Broad Impacts
  Limitations
  Future Research
APPENDICES
  APPENDIX A. GIS Analysis for Main Activities: All Maps
  APPENDIX B. Descriptive Analysis for Activities: Full Tables
BIBLIOGRAPHY

LIST OF TABLES

Table 2-1. ATUS 1st Tier Activities
Table 2-2. Energy Usage-Related Activities (3rd Tier)
Table 2-3. New Activity Code, Energy, Appliances
Table 2-4. Partner Code
Table 2-5. Place Code
Table 2-6. Distribution of Partner for Each Activity
Table 2-7. Performance of Different Algorithms
Table 2-8. Pearson Correlations between Features
Table 2-9. Performance of Additional Features
Table 2-10. Problematic Cells in Confusion Matrix
Table 2-11. Descriptive Analysis for Problematic Cells
Table 2-12. Predictive Performance of Each Activity
Table 2-13. Performance of Subgroups
Table 3-1. Sample Inputs for Clustering Analysis
Table 3-2. Energy Usage-Related Activities (3rd Tier) (=Table 2-2)
Table 3-3. Activities and Associated Energy and Appliances (=Table 2-3)
Table 3-4. Census Regions
Table 3-5. Main Habitual Energy Usage-Related Activities
Table 3-6. Number of Occupants by Cluster
Table 3-7. Distribution of Data by Cluster
Table 3-8. Centroid of Occupant Cluster 1
Table 3-9. Centroid of Occupant Cluster 2
Table 3-10. Centroid of Occupant Cluster 3
Table 3-11. Centroid of Occupant Cluster 4
Table 3-12. Centroid of Occupant Cluster 5
Table 3-13. Centroid of Occupant Cluster 6
Table 3-14. Difference in Activities by Region
Table 3-15. Differences in Activities by Day of the Week
Table 3-16. Differences in Activities by Gender
Table 3-17. Differences in Activities by Job Status
Table 4-1. Categories of RECS Data
Table 4-2. Selected Features from All
Table 4-3. Selected Features from Appliances
Table 4-4. Selected Features from Behavior
Table 4-5. Selected Features from Technology
Table 4-6. Selected Features from Demographic
Table 4-7. Selected Features from Application and Behavior
Table 4-8. Algorithm Performance with All Features
Table 4-9. Algorithm Performance with Appliance Features
Table 4-10. Algorithm Performance with Behavior Features
Table 4-11. Algorithm Performance with Technology Features
Table 4-12. Algorithm Performance with Demographic Features
Table 4-13. Algorithm Performance with Application and Behavior Features
Table 4-14. Performance Comparison by Different Features
Table 5-1. Selected Features from Appliances (=Table 4-3)
Table 5-2. Activities and Associated Energy and Appliances (=Table 2-3)
Table 5-3. Appliance List from Sensor Data
Table 5-4. Performance of Appliance Prediction
Table 5-5. Descriptive Analysis of Clusters
Table 5-6. Weekday Activities and Appliances
Table 5-7. Weekend Activities and Appliances
Table 5-8. Energy Usage-Related Activities and Appliances
Table 5-9. Yearly Electricity Usage of the Selected RECS Samples
Table 5-10. Appliances of the Selected RECS Samples
Table A-1. Mean and CV of Activities by Cluster
Table A-2. Mean and CV of Activities by Region
Table A-3. Mean and CV of Activities by Day
Table A-4. Mean and CV of Activities by Gender

LIST OF FIGURES

Figure 1-1. Research Goal Overview
Figure 1-2. Research Structure Overview
Figure 1-3. Research Design and Structure
Figure 1-4. Plots of Polynomials Having Various Orders (Bishop, 2006)
Figure 1-5. Reducing Over-Fitting with More Data (Bishop, 2006)
Figure 1-6. Reducing Over-Fitting with Regularization (Bishop, 2006)
Figure 1-7. Bias-Variance Tradeoff (Dubrawski, 2015)
Figure 1-8. Dependence of Bias and Variance on Model Complexity (Bishop, 2006)
Figure 1-9. Differences of Models
Figure 2-1. Chapter Outline
Figure 2-2. Formation of Occupant Behavior
Figure 2-3. Subcategories of Energy-Tech-Occupants
Figure 2-4. Occupant Behavior/Activity Prediction Model
Figure 2-5. Data Analysis Process
Figure 2-6. Confusion Matrix
Figure 2-7. Range of Frequency for Each Activity
Figure 2-8. Range of Duration for Each Activity (a: top, b: bottom)
Figure 2-9. Range of Start Time for Each Activity
Figure 2-10. Range of End Time for Each Activity
Figure 2-11. Accuracy by Different gamma Values (a: left, b: right)
Figure 2-12. Accuracy by Different C Values
Figure 2-13. Confusion Matrix: Comparison of Actual vs. Predicted Numbers
Figure 2-14. Confusion Matrix: Comparison of Actual vs. Predicted Accuracy
Figure 3-1. BIC for Number of K
Figure 3-2. Daily Activity Routines of Occupant Clusters
Figure 3-3. Comparison of Frequency by Region
Figure 3-4. Comparison of Duration per Act by Region
Figure 3-5. Comparison of Duration per Day by Region
Figure 3-6. Comparison of Start Time by Region
Figure 3-7. Comparison of End Time by Region
Figure 3-8. Comparison of Partner by Region
Figure 3-9. Comparison of Frequency by Day of the Week
Figure 3-10. Comparison of Duration per Act by Day of the Week
Figure 3-11. Comparison of Duration per Day by Day of the Week
Figure 3-12. Comparison of Start Time by Day of the Week
Figure 3-13. Comparison of End Time by Day of the Week
Figure 3-14. Comparison of Partner by Day of the Week
Figure 3-15. Comparison of Frequency by Gender
Figure 3-16. Comparison of Duration per Act by Gender
Figure 3-17. Comparison of Duration per Day by Gender
Figure 3-18. Comparison of Start Time by Gender
Figure 3-19. Comparison of End Time by Gender
Figure 3-20. Comparison of Partner by Gender
Figure 3-21. Comparison of Frequency by Job Status
Figure 3-22. Comparison of Duration per Act by Job Status
Figure 3-23. Comparison of Duration per Day by Job Status
Figure 3-24. Comparison of Start Time by Job Status
Figure 3-25. Comparison of End Time by Job Status
Figure 3-26. Comparison of Partner by Job Status
Figure 3-27. LL01 Number of K
Figure 3-28. LL01 Group Analysis
Figure 3-29. LL01 State Clusters by Grouping Analysis
Figure 3-30. LL01 Frequency by Quantiles
Figure 3-31. LL01 Duration by Quantiles
Figure 3-32. LL01 Start Time by Quantiles
Figure 3-33. LL03 End Time by Quantiles
Figure 3-34. LL01 Partner
Figure 3-35. AA01 State Clusters by Grouping Analysis
Figure 3-36. CD01 State Clusters by Grouping Analysis
Figure 3-37. BB03 State Clusters by Grouping Analysis
Figure 3-38. BB04 State Clusters by Grouping Analysis
Figure 4-1. Performance Comparison by Different Features
Figure 5-1. Data Collection Process
Figure 5-2. Daily Activity Routines of Occupant Clusters (=Figure 3-2)
Figure 5-3. Appliance/Activity Prediction with Occupant Behavior Prediction Model
Figure 5-4. Overall Research Flow
Figure 5-5. Elbow Method with Distortion
Figure 5-6. Daily Electricity Usage of Clusters
Figure 5-7. Mode Activities of the Selected ATUS Samples
Figure 5-8. Weekday Activities
Figure 5-9. Weekend Activities
Figure 6-1. Summary of the Research
Figure 6-2. Research Contributions
Figure A-1. AA01 State Clusters by Grouping Analysis
Figure A-2. AA01 Frequency by Quantiles
Figure A-3. AA01 Duration by Quantiles
Figure A-4. AA01 Start Time by Quantiles
Figure A-5. AA01 End Time by Quantiles
Figure A-6. AA01 Partner
Figure A-7. LL01 State Clusters by Grouping Analysis
Figure A-8. LL01 Frequency by Quantiles
Figure A-9. LL01 Duration by Quantiles
Figure A-10. LL01 Start Time by Quantiles
Figure A-11. LL01 End Time by Quantiles
Figure A-12. LL01 Partner
Figure A-13. CD01 State Clusters by Grouping Analysis
Figure A-14. CD01 Frequency by Quantiles
Figure A-15. CD01 Duration by Quantiles
Figure A-16. CD01 Start Time by Quantiles
Figure A-17. CD01 End Time by Quantiles
Figure A-18. CD01 Partner
Figure A-19. BB03 State Clusters by Grouping Analysis
Figure A-20. BB03 Frequency by Quantiles
Figure A-21. BB03 Duration by Quantiles
Figure A-22. BB03 Start Time by Quantiles
Figure A-23. BB03 End Time by Quantiles
Figure A-24. BB03 Partner
Figure A-25. BB04 State Clusters by Grouping Analysis
Figure A-26. BB04 Frequency by Quantiles
Figure A-27. BB04 Duration by Quantiles
Figure A-28. BB04 Start Time by Quantiles
Figure A-29. BB04 End Time by Quantiles
Figure A-30. BB04 Partner

OVERVIEW OF THE RESEARCH

Introduction

The building sector uses more energy than any other energy-consuming sector: in the United States, it consumes more than 70 percent of electricity and 50 percent of natural gas (Diao, Sun, Chen, & Chen, 2017). The residential sector accounts for 39 percent of electricity consumption in the United States, the highest share among the electricity-consuming sectors (Johnson, Starke, Abdelaziz, Jackson, & Tolbert, 2014).
Residential building energy consumption is affected by various factors, such as climate, physical properties of the building, building services and energy systems, appliances in the household, occupants’ activities and behavior, and the interactions among them (Widén & Wäckelgård, 2010). As the quality of thermal properties improves and the technologies for energy-efficient appliances become more advanced, the energy consumption associated with buildings’ physical properties and appliances is decreasing. Despite these technological gains and the stricter requirements regarding energy efficiency of buildings and appliances, overall building energy consumption has not decreased (Chen et al., 2015). This can be explained by the influence of occupant behavior and living style, which emphasizes the significant role of occupant behavior in residential energy savings.

Unlike commercial building occupants, residential occupants have a high degree of energy control. They can control heating, ventilation, and air conditioning (HVAC) systems, lighting and electronic devices, and kitchen and laundry appliances, which are the main consumers of energy in residential buildings (Li & Jiang, 2006). This suggests that residential energy consumption can be significantly reduced by changing the energy usage-related behaviors of the occupants.

Various models explaining occupant behavior have been developed to estimate residential energy consumption. Darby (2006) stated that energy consumption was reduced by up to 20 percent when improved energy feedback was provided to the occupants. Wood and Newborough (2003) reported energy savings of more than 10 percent by using more specific information strategies. Similarly, Ouyang and Hokao (2009) reported an average of 14 percent energy savings achieved solely by improving occupant behavior.
Compared to the climate or buildings’ physical attributes, occupant behavior is more difficult to quantify and assess. Recent studies (Aksanli, Akyurek, & Rosing, 2016; Diao et al., 2017; Sanquist, Orr, Shui, & Bittner, 2012; Santin, Itard, & Visscher, 2009) have analyzed detailed usage data of each appliance in a household to measure occupant behavior, since the use of appliances is heavily influenced by occupants’ behavioral patterns at varying times and days. However, limitations still exist in the previous studies. A more rational and systematic classification method for occupant behaviors and building attributes, and a solid model explaining their relationships, are needed to improve energy strategies.

Problem Statement

Several studies have examined occupant behavior with regard to energy consumption in residential buildings. However, a more comprehensive and systematic study is still needed to solve the existing problems outlined below.

Problem #1: There is a lack of comprehensive understanding of occupant behavior with regard to building technology and energy consumption in the residential sector.

One of the significant barriers to finding a measurable relationship between occupant behavior and energy consumption is the lack of a thorough understanding of occupant behaviors in residential buildings. Traditionally, behavioral patterns have been classified based on occupants’ socioeconomic factors, such as age, gender, marital status, number of children, employment status, and income level. However, this method has significant shortcomings, as socioeconomic factors cannot fully explain occupants’ energy consumption patterns. Similar occupant characteristics do not guarantee similar behavioral patterns (Diao et al., 2017). Occupant behavior is associated with more than just socioeconomic factors, and actual occupant behavior is determined by multifaceted variables.
It is critical to comprehensively identify all the relevant occupant characteristics and the hierarchy of occupant behavior, along with other external factors such as building attributes and climate. Therefore, a systematic and thorough approach is necessary to define and understand occupant behavior comprehensively, and furthermore, to predict the resulting energy consumption in a more consistent and accurate way (Chen et al., 2015).

Problem #2: There is an absence of systematic structures explaining the relationship between occupant behavior, building technology, and energy consumption.

Several studies have made efforts to identify the influences of occupant behavior and building technology on building energy consumption. Researchers also measured the influence of occupant behavior on energy consumption through observations and surveys. Although it is obvious that occupant behavior influences building energy usage, previous studies have lacked thorough and clear methods to quantify the effects of occupant behavior. The main reason is that various factors have influences on energy consumption simultaneously, and the individual and interactive effects of these factors are not clearly identified yet (Chen et al., 2015). Yu et al. (2011) identified the simultaneous influences of behavior, physical building attributes, and external environmental factors on building energy consumption. However, the existing methods could not isolate the sole influence of occupant behaviors by removing the effects of other factors.

Compared to physical building attributes, such as thermal environment and envelope of the building, occupant behavior is difficult to assess and measure. In addition, when occupant behavior is combined with energy consumption and building attributes, the quantitative assessment of occupant behavior becomes even more complicated.
The absence of a systematic model explaining the relationship between behavior, technology, and energy is another significant obstacle to assessing occupant behavior quantitatively.

Problem #3: There is a lack of models to explain and predict occupant behavior.

Although the American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) suggests a standardized occupancy schedule to assess building energy, occupant behavior patterns and time use can differ for each household due to the occupants’ different lifestyles, preferences, and other factors. Most of the existing occupant behavior models have been implemented using survey data. They concentrated on statistical analysis of occupants’ sociodemographic characteristics to predict their energy consumption, which implies the actual user behaviors were mostly guessed (Aksanli et al., 2016; Diao et al., 2017). To be applied to real-world cases to predict and quantify occupant behavior, a model needs to be based more directly on actual measured data, but existing studies lack such a detailed prediction model.

In addition, occupant behavior, building technology, and energy consumption simultaneously interact with one another, and their data are usually recorded as different types, such as numerical, ordinal, categorical, or text types. The concurrent effects of multiple factors and the various types of data are difficult for quantitative data analysis to process, and they increase the complexity of distinguishing the effects of occupant behavior from other building-related factors. Thus, an occupant behavior prediction model is needed to solve these issues.

To address the problems of the existing studies, the following hypotheses are established:
• Occupant behavior, energy consumption, and building technology interact with one another, and their interaction can be explained more effectively by understanding the process of behavior formation.
• Occupant behavior can be predicted based on occupants’ energy consumption patterns.

Goals and Objectives

The goals of this research are to understand occupant behavior based on energy consumption while also considering building technology, and to build a behavior prediction model using machine learning approaches on energy consumption data. This model can potentially be used for efficient building operation and control strategies. Unlike previous studies, which focused on using occupant behavior to predict energy consumption, or changing occupant behavior with interventions or education, this study investigates the reverse prediction model: using energy consumption to predict occupant behavior. In this research, human behavior is narrowed down to building occupant behavior. Energy consumption and building technology information are used as inputs to predict occupant behavior as the output. The research aims to reduce the gap between energy consumption and occupant behavior, and to optimize technologies for occupant behavior (Figure 1-1).

Figure 1-1. Research Goal Overview (diagram linking Energy Usage, Technology, and Occupant Behavior; labels: “Reduce the gap” and “Optimize Technologies”)

Based on the problems stated earlier, the objectives for developing the prediction model are as follows.

Objective 1: Create a structured list of occupant behaviors, building technologies, and energy consumption based on comprehensive and refined definitions of each category.

In order to achieve this objective, first, each main category (occupant behavior, building technology, energy usage) will be examined individually. The subcategories and elements under the main categories will be specified with various techniques, such as literature reviews and machine learning algorithms based on the interactions between energy consumption and other elements.
The properties of occupant behavior will be assessed at the level of a single activity, quantitatively measured with its frequency per day, duration per day, and energy impact. Building technology will be defined based on the subcategories of (1) heating and cooling, (2) light and appliances, (3) ventilation, (4) water, (5) design and construction, and (6) insulation. In addition, the ideal time interval (e.g., 5-minute interval, daily/weekly/monthly data interval) of time-series data will be examined for each critical element of occupant behavior and energy usage.

The comprehensive and inclusive elements of energy, building technologies, and behavior will be collected from existing study results and national public databases, such as the Residential Energy Consumption Survey (RECS) (EIA, 2018) and the American Time Use Survey (ATUS) (U.S.BLS, 2018) results. Among all of the elements collected, the ones critical to energy consumption will be identified and subsets of the highest-impact elements will be selected. The refined final list will be used for the next step.

Objective 2: Establish an occupant behavior model that explains the systematic relationships between categories (occupant behavior, building technology, energy usage), and interactions between a more detailed level of features.

The relationships between the categories and the interactions between individual elements will be evaluated. An occupant behavior model will be established using network analysis and meta-analysis based on the relationships between categories. The model will be applied to a building simulation in order to evaluate the behavior patterns of the residential building’s occupants, and the effectiveness of the model will be evaluated.

Objective 3: Explain and predict occupant behavior using machine learning algorithms.
The model will be used to explain and make detailed predictions about occupant behavior, including activity, frequency, duration, and effect on energy consumption, using machine learning algorithms. The complex relationships between building technology, energy usage, and user behavior will be simultaneously modeled by different subcategories using multi-task learning techniques. Unlike most of the existing models, this research uses occupant activity data showing the interactions between occupants and appliances, measured in five-minute intervals using sensors. In this step, actual sensor-measured data will be used to train the model. The machine learning model will be evaluated by applying it to other datasets.

Research Structure

1.4.1. Research Design and Structure

Figure 1-2 explains the overall research structure: developing a behavior prediction model and validating the model using two different datasets. Additional analysis supports the second validation of the model. The reliability and validity of the research are achieved by the triangulation of dual validations using different data sources and types.

Figure 1-2. Research Structure Overview (diagram with four blocks: Behavior Prediction Model, Supporting Analysis, Model Validation 1, and Model Validation 2)

Figure 1-3 summarizes the detailed research process, grouped into its major parts. Each part is independent, but all four parts are connected. First, the Occupant Behavior Prediction Model is developed and validated by applying it to the American Time Use Survey (ATUS), the Residential Energy Consumption Survey (RECS), and sensor-measured data. Each part is explained as follows.
Figure 1-3. Research Design and Structure (flowchart of Parts I to IV showing, for each part, the model/framework, the method, and the input/output data)

Part I (Chapter 2): Objectives 1, 2, 3

Part I aims to achieve Objectives 1, 2, and 3. It provides a theoretical foundation for the rest of the research, and develops the Occupant Behavior Prediction Model with the following steps. The ATUS data are used in Part I.
● Review existing literature/studies about occupant behavior and energy usage, and perform meta-analysis to combine the results of the selected studies.
● Derive structured lists and relationships between occupant behavior, building technology, and energy usage.
● Review existing literature/studies about habitual behaviors and activities from psychology, business, and building energy studies, then delineate the main characteristics of habitual behaviors and activities.
● Develop the Occupant Behavior Prediction Model based on the structured lists and the characteristics of habitual activities.
● Apply the model to the ATUS data using machine learning classification algorithms to predict energy usage–related activities and to define habitual/predictable activities (Model Validation 1).

Part II (Chapter 3): Objectives 1, 3

Part II aims to achieve Objectives 1 and 3, and focuses on analyzing energy usage–related behaviors and activities with the steps below.
The ATUS habitual energy usage–related activities defined in Part I are used as input data in Part II.
● Perform descriptive analysis and K-modes clustering to identify patterns in the energy usage–related activities.
● Perform spatial analysis (K-means clustering) and demonstrate the geographical differences of the activities using a geographical information system (GIS).
● Detect the activity patterns by region, gender, day of the week, etc.

Part III (Chapter 4): Objectives 1, 3

Part III aims to achieve Objectives 1 and 3, and focuses on analyzing building technologies, including appliances, and energy usage. The RECS data are used in Part III.
● Select features that have significant impacts on energy usage. The categories of the features include home appliances, building envelopes, demographic information of the respondents, occupant behavior, etc.
● Predict energy consumption with the selected features using machine learning algorithms to verify the features’ predictive effectiveness.

Part IV (Chapter 5): Objectives 2, 3

Part IV aims to achieve Objectives 2 and 3. It combines the findings from the previous parts and applies the Occupant Behavior Prediction Model to the sensor-measured dataset. The sensor-measured data are mainly used, and the ATUS and the RECS are also used to support further analysis in Part IV.
● Predict appliances with the features specified by the Occupant Behavior Prediction Model using machine learning numeric prediction algorithms on the sensor-measured data.
● Estimate the related activities using the appliance-activity mapping table defined in Part I.
● Support appliance information with the selected features and energy consumption from Part III, and additional descriptive analysis of the RECS.
● Support activity information with the habitual activities from Part I, activity patterns from Part II, and additional descriptive analysis of the ATUS.

1.4.2. Main Datasets

In this research, various types of empirical data are collected as follows.
● 1. Energy Consumption Data
  - Install sensors (electricity) in participants’ residential buildings
  - Weekly data downloads to save more granular data (5-minute intervals, .csv file type)
● 2. Building Technology Data
  - Technical data (year built, building type, size, materials, energy certification, green building technology, etc.) is collected by survey or site visit
  - Weather data is collected from weather station websites during the measurement period
● 3. User Behavior Data
  - Major occupant behaviors are recorded as a form of appliance-level energy data measured by the sensors
  - They are also captured by analyzing patterns in the aggregated energy consumption data using nonintrusive load monitoring (NILM)
● 4. Residential Energy Consumption Survey (RECS) Data
  - Download from the RECS website
  - Latent variables represent qualities that are not directly measured, but rather inferred from the observed covariation among a set of variables (Piedmont, 2014). Latent variables are examined among the three types of datasets above, and the RECS data are also analyzed to identify other potential latent variables.
● 5. American Time Use Survey (ATUS) Data
  - Download from the ATUS website
  - Includes respondents’ time use data of each activity done in a day
  - Occupant behavior activities are derived from the ATUS datasets.

1.4.3. Main Methodology: Machine Learning

In this research, Machine Learning (ML) approaches are mainly used for data analysis, which is novel in occupant behavior studies. Other quantitative methods can also be considered depending on the characteristics of the input datasets and the format of the expected outcome. This study uses multiple large datasets with more than 270 features and/or more than 70,000 instances.
Therefore, ML methods are selected since ML involves searching a very large space of possible hypotheses to find one that best fits the observed data and any prior knowledge held by the learner (Mitchell, 1997). ML is concerned with answering questions such as the following (Bishop, 2006; Mitchell, 1997):
● What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
● How much training data is sufficient?
● When and how can prior knowledge held by the learner guide the process of generalizing from examples?
● What is the best strategy for choosing a useful next training experience?
● What is the best way to reduce the learning task to one or more function approximation problems?
● How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

As described above, generalization performance and model complexity regarding testing/training data are fundamental concerns in ML, and they will be further examined in the following sections.

Generalization Performance

Generalization is the ability to correctly categorize new examples that differ from those used for training. In practical applications, the variability of the input vectors will be such that the training data can comprise only a tiny fraction of all possible input vectors, so the ability to generalize and make accurate predictions for new data is a central goal in ML (Bishop, 2006).

Figure 1-4. Plots of Polynomials Having Various Orders (Bishop, 2006)

Figure 1-4 illustrates plots of polynomials with various orders and the root-mean-squared-error (RMSE) of training and test sets for each order M. As the order increases (signifying more complex models), the training set error goes to zero (when M = 9).
However, this model has an over-fitting problem, and the test set error value becomes very large (Bishop, 2006). The model’s generalization performance can be improved with some strategies, and two examples are explained below.

Figure 1-5. Reducing Over-Fitting with More Data (Bishop, 2006)

As seen in Figure 1-5, for a given model complexity, the over-fitting problem becomes less severe as the size of the dataset increases. In other words, the larger the dataset, the more complex (more flexible) the model we can afford to fit to the data.

Figure 1-6. Reducing Over-Fitting with Regularization (Bishop, 2006)

However, it is not always possible to have enough data, and we should consider how we can apply the model to datasets of limited size where we may still wish to use relatively complex and flexible models. One technique that is often used to control the over-fitting phenomenon in such cases is that of regularization, which involves adding penalty terms to the error function. Figure 1-6 shows the results of fitting the polynomial of order M = 9 to the same dataset as before, but now using the regularized error function. For a value of ln λ = −18, the over-fitting has been suppressed and the model obtains a much closer representation of the underlying function. However, if we use too large a value for λ, then we again obtain a poor fit, as shown for ln λ = 0 (Bishop, 2006).

Model Complexity

If we were trying to solve a practical application using this approach of minimizing an error function, we would have to find a way to determine a suitable value for the model complexity. In the previous example of polynomial curve fitting using least squares, we saw that there was an optimal order of polynomial that gave the best generalization. The order of the polynomial controls the number of free parameters in the model and thereby governs the model’s complexity.
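The polynomial curve-fitting behavior described above (training error vanishing at M = 9 while test error grows, and a penalty term restraining the fit) can be reproduced with a short sketch. This is only an illustration under stated assumptions: it uses NumPy, synthetic data drawn from sin(2πx) with Gaussian noise in the spirit of Bishop's example, and a squared-norm penalty weighted by λ. The helper names (`make_data`, `design_matrix`, `fit_polynomial`, `rmse`), the noise level, and the sample sizes are hypothetical choices, not taken from this research.

```python
import numpy as np


def make_data(n, rng, noise_std=0.3):
    """Return n evenly spaced inputs on [0, 1] and noisy sin(2*pi*x) targets."""
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise_std, size=n)
    return x, t


def design_matrix(x, order):
    """Polynomial features 1, x, x**2, ..., x**order as columns."""
    return np.vander(x, order + 1, increasing=True)


def fit_polynomial(x, t, order, lam=0.0):
    """Minimize the sum of squared errors plus the penalty lam * ||w||**2."""
    phi = design_matrix(x, order)
    if lam == 0.0:
        # Plain least squares; lstsq copes with the ill-conditioned features.
        w, *_ = np.linalg.lstsq(phi, t, rcond=None)
        return w
    # Regularized normal equations: (phi^T phi + lam * I) w = phi^T t.
    return np.linalg.solve(phi.T @ phi + lam * np.eye(order + 1), phi.T @ t)


def rmse(x, t, w):
    """Root-mean-squared error of the polynomial with coefficients w."""
    pred = design_matrix(x, len(w) - 1) @ w
    return float(np.sqrt(np.mean((pred - t) ** 2)))


rng = np.random.default_rng(0)
x_train, t_train = make_data(10, rng)    # small training set (assumed size)
x_test, t_test = make_data(100, rng)     # independent test set (assumed size)

print("order  train_rmse  test_rmse")
for order in (0, 1, 3, 9):
    w = fit_polynomial(x_train, t_train, order)
    print(f"{order:5d}  {rmse(x_train, t_train, w):10.4f}  "
          f"{rmse(x_test, t_test, w):9.4f}")

print("ln_lambda  train_rmse  test_rmse  (order 9, regularized)")
for ln_lam in (-18.0, 0.0):
    w = fit_polynomial(x_train, t_train, 9, lam=np.exp(ln_lam))
    print(f"{ln_lam:9.1f}  {rmse(x_train, t_train, w):10.4f}  "
          f"{rmse(x_test, t_test, w):9.4f}")
```

With ten training points, the order-9 polynomial has as many coefficients as data points, so the unregularized fit passes through every training point; comparing the printed test errors across orders and across values of ln λ is one way to pick a complexity setting on held-out data.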
With regularized least squares, the regularization coefficient λ also controls the effective complexity of the model, whereas for more complex models, such as mixture distributions or neural networks, there may be multiple parameters governing complexity. In practical applications, we need to determine the values of such parameters, and the principal objective in doing so is usually to achieve the best predictive performance on new data. As well as finding the appropriate values for complexity parameters within a given model, we may also wish to consider a range of different types of model in order to find the best one for our particular application (Bishop, 2006).

ML Model Selection

The phenomenon of model complexity and over-fitting can be considered with bias-variance tradeoff. Bias is the extent to which the average prediction over all datasets differs from the desired regression function, and variance is the extent to which the solutions for individual datasets vary around their average (Zhou, 2016). In ML models, low bias and low variance are the most desirable, and high bias and high variance are the least desirable. However, bias and variance have a trade-off relationship in model complexity as shown in Figure 1-7. Thus, it is important to find the optimal model complexity to minimize the expected loss.

Expected Loss = (Bias)² + Variance + Noise

Figure 1-7. Bias-Variance Tradeoff (Dubrawski, 2015)

In a given relationship, flexible models with strong approximators (high degree polynomials) have low bias and high variance, and rigid models with weak approximators (low degree polynomials) have high bias and low variance (Zhou, 2016). This is further explained in the next example.

Figure 1-8. Dependence of Bias and Variance on Model Complexity (Bishop, 2006)

Figure 1-8 illustrates the dependence of bias and variance on model complexity, governed by a regularization parameter λ.
The left column shows the result of fitting the model to the datasets for various values of ln λ (for clarity, only 20 of the 100 fits are shown). The center column shows the corresponding average of the 100 fits (red) along with the sinusoidal function from which the datasets were generated (green). The right graph is the plot of squared bias and variance, together with their sum, corresponding to the results shown on the left side. Also, the average test set error for a test dataset size of 1000 points is shown. In this example, the minimum value of (bias)² + variance occurs around ln λ = −0.31, which is close to the value that gives the minimum error on the test data (Bishop, 2006).

Although the bias-variance decomposition may provide some interesting insights into the model complexity issue from a frequentist perspective, it has its limitations. Bias-variance decomposition is based on averages with respect to ensembles of datasets, whereas in practice, we have only a single observed dataset. If we had a large number of independent training sets of a given size, we would be better off combining them into a single large training set, which of course would reduce the level of over-fitting for a given model complexity (Bishop, 2006).

In the larger picture, in order to choose the best ML models and to design a good learning system, we should consider (1) training experience, (2) target function, (3) representation of the target function, and (4) function approximation algorithm. The type of training experience available can have a significant impact on the success or failure of the learner (Mitchell, 1997). There are three key attributes to a good training experience:
● Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
● The degree to which the learner controls the sequence of training examples.
● How well the training experience represents the distribution of examples over which the final system performance must be measured.

Research Scope and Assumptions

In this research, the building type is confined to residential buildings, and the boundary of occupant behavior for the machine learning (ML) model is limited to energy consumption–related behaviors. Due to limitations of the measurements, the ML model contains occupant behavior and energy consumption data regarding electricity and gas, but does not include water-related data.

Definition of Occupancy

Depending on the purpose of the research, researchers define “occupant behavior” from different perspectives. Thus, the definition and scope of occupant behavior need to be defined at the early stages of research. In this research, occupant behavior is limited to only energy use within a built environment, especially in residential buildings. Existing studies generally divide the effects of occupant behavior into two categories: (1) simple occupancy effects on building energy consumption, and (2) occupants’ actions/activities influencing energy consumption (Yu et al., 2011).

Chen et al. (2015) defined behavior as discernable actions or reactions of a person in response to external or internal stimuli, or to adapt to external conditions such as weather or indoor air quality. In built environments, the impact of behavior on building energy consumption is closely related to building elements, such as windows and curtains, and appliances controlled by the occupants. Thus, the operation of building elements and appliances indicates occupant behavior (Chen et al., 2015).

Santin (2011) defined behavior to include all activities of occupants in the residential building. In particular, “use” was defined as the direct interaction between an occupant and an action to accomplish a certain goal.
Occupant behavior was specified further as the use of residential space, building systems, and other services in the house that can affect energy consumption, including space and water heating. In many cases, an occupant’s psychological factors that lead to a specific action, including attitudes and motivations, are explained separately (Chen et al., 2015).

Relationship between Energy, Technology, and Behavior

Energy, technology, and behavior are the three main concepts in residential energy studies. In most of the current research, the influences of technology and behavior on energy consumption have been studied separately. However, recent studies have introduced a novel way to explore the relationship among the main concepts, and this research adopts that point of view. Zhao et al. (2017) explained these concepts as follows. (1) Home energy consumption is measured and recorded by utility companies. They combine monthly consumption data, distribution, transmission, taxes, and service charges to produce a monthly energy bill that is sent to the occupant for the previous month’s service. (2) “Green building technology” refers to the collection of advanced technologies and products for building design and construction that reduce overall energy use and carbon emissions. (3) Occupant behavior and its position as part of the overall development process, and specifically with building systems, is critical for understanding residential energy consumption. Occupant actions impacting energy use can be divided into three categories: time-related usage, environment-related mode, and quantitatively described behavior.

Existing studies suggest that the efficiency and efficacy of building technology, such as heating and cooling systems, have a considerable impact on residential energy consumption. The literature also asserts that some resident behaviors considerably affect home energy use.
Most of the existing studies investigated the effects of either occupant behaviors or technologies. However, Zhao et al. (2017) identified a new point of view on energy efficiency in residential buildings. Unlike earlier studies, which isolated the effects of technology or behavior on energy consumption, their study investigated the interaction between building technology and occupant behavior and their joint impact on energy use (Figure 1-9).

Figure 1-9. Differences of Models

Occupant behavior and building technology are two indispensable factors for enabling energy efficiency. In addition, the effect of either behavior or technology depends on the specific level of the other. The effects of one level of technology vary for different occupants, and vice versa. It may seem obvious that a higher level of green building technology will lead to less energy use. However, Zhao et al. (2017) argued that when considering the interaction with occupant behavior, the most advanced technologies might not necessarily be the optimal option for all occupants. They asserted that the identified interaction effects are mutual rather than one-way, and thus implied that behavior can impact the technology’s performance, and that performance can influence occupant behavior in kind (Zhao et al., 2017).

Summary

In this chapter, problems of the existing studies were examined, and the goals and objectives of this research were defined based on the hypotheses to address the problems. Then, the research design and structure were explained. The term “occupancy” was defined for this research, and the relationship between the three main data categories (occupant behavior, building technology, and energy usage) was discussed. More details will be studied in the following chapters.
OCCUPANT BEHAVIOR PREDICTION MODEL ON ENERGY CONSUMPTION IN RESIDENTIAL BUILDINGS

Abstract

Occupant behavior consists of multifaceted variables and thus a systematic approach is required to comprehensively understand occupant behavior. This research aims to define a structure of relationship between energy consumption, building technology, and occupant behavior, using the Occupant Behavior Prediction Model. The model can predict and explain occupant energy usage-related activities. This model can also identify the predictability and habitual characteristics of each activity. A machine learning approach is used to develop the model, and datasets from the American Time Use Survey (ATUS) are used to verify the model. The results show that the energy use activities with higher predictive performances are more stable and habitual compared to the ones with lower predictive performances. Occupants’ habitual behaviors are difficult to change, but they are more predictable. The prediction accuracy achieved by this model for these habitual activities reached as high as 99%. For example, the accuracy was 99% when predicting washing and grooming activity, and 82% for watching TV. Such findings imply that the building systems and control strategies need to be adjusted to accommodate habitual energy use behaviors, rather than changing the behaviors. In addition, educational interventions seem more effective on the less habitual behaviors, which often change.

Introduction

Residential building energy consumption is affected by climate, physical properties of the building, building services and energy systems, appliances in the household, occupant behavior, and the interactions among them (Widén & Wäckelgård, 2010).
As building technologies grow more advanced, energy consumption in residential buildings becomes more influenced by occupant behavior and living style, which emphasizes the need to understand occupant behavior and the relationship between occupant behavior and energy consumption.

Occupant behaviors have often been studied based on socioeconomic factors, such as age, gender, marital status, number of children, employment status, and income level. However, this method has significant shortcomings, in that socioeconomic factors cannot fully explain occupants’ energy consumption patterns. Even if occupants have similar socioeconomic characteristics, these similar characteristics do not guarantee similar behaviors. When an analysis only considers socioeconomic factors, the result will provide limited information (Diao et al., 2017).

Occupant behavior is associated with more than socioeconomic factors, and it can be caused by a variety of factors. It is critical to comprehensively identify not just occupant-specific characteristics like socioeconomic status and behavior hierarchy, but also external factors such as building attributes and climate. Therefore, a model is necessary to define and understand occupant behavior comprehensively (Chen et al., 2015).

Many studies have examined the relationship between occupant behavior and energy consumption in residential buildings. However, a more comprehensive study is still needed to solve two existing problems: first, there is a lack of comprehensive understanding of occupant behavior regarding building technology and energy consumption in the residential sector, and second, there is an absence of systematic models able to predict the behaviors and the habitual properties of activities related to residential energy usage.
In order to solve these problems, this research aims to define a model of relationships between energy consumption, building technology, and energy usage-related behavior, then uses that model to explain occupant behavior. This model is applied to predict occupants’ behavior and to identify how predictable and habitual each activity is. “Habitual behavior” denotes a behavior influenced by habits. This new model integrates the concept of habitual behavior and reduces the gap between energy consumption and occupant behavior. The outline of this chapter is summarized in Figure 2-1.

[Figure 2-1. Chapter Outline. Input: Systematic Relationship among Energy, Tech & Occupants; Background of Habitual Behavior. Model: Occupant Behavior Prediction Model. Output: Prediction of Future Behavior; Predictability of Each Activity.]

The structure of this chapter is as follows. First, the background section explains the categories of behavior, introduces the stability and habitual characteristic of behavior, and analyzes the relationship between energy, building technology, and occupant behavior. Then, a behavior prediction model is defined based on the concept of habitual behavior, and the main components of the model are specified. This model is applied to a case study with a machine learning approach. In the case study, an overview of the dataset is explained, and the methodology and results follow. Finally, the implications and limitations of this study are discussed.

Theoretical Background

2.2.1. Habitual Occupant Behavior

Behavioral routines and lifestyles are critical for energy saving because they have significant influences on daily energy use, but they are difficult to affect, changing gradually over time or not at all. Most people want to maintain their existing behavioral routines, lifestyles, and habits.
Therefore, changing attitudes is easier than changing behaviors: many studies report that building occupants' attitudes have become more energy-conscious, but occupants are unlikely to change their behaviors to match (Lutzenhiser, 1993).

[Figure 2-2. Formation of Occupant Behavior. Cognitive beliefs about an object and affective evaluations of the beliefs form personal attitudes; social norms and the motivation to comply with the norms form subjective norms; personal attitudes and subjective norms form behavioral intention, which leads to behavior subject to situational constraints; habit and recommendation/information also act on behavior.]

Energy usage-related behavior is determined by behavioral intention and influenced by habit and situational constraints (Figure 2-2). Van Raaij and Verhallen (1983) explain that behavioral intention determines behavior. Behavioral intention is the subjective probability that a person will perform a behavior, and it is created by personal attitudes and subjective norms, if there are no unanticipated situational constraints. Personal attitudes about an object consist of cognitive beliefs about the object and affective evaluations of those beliefs. Subjective norms are determined by social norms and the motivation to abide by the norms.

Energy-related personal attitudes include concerns about energy price, environment, building energy efficiency, health, and personal comfort (Van Raaij & Verhallen, 1983). As discussed before, these attitudes influence behavior by affecting behavioral intention. Van Raaij and Verhallen (1983) explained that people try to be consistent in their personal attitudes and behaviors, so if behaviors are changed to be more energy-saving, people may develop energy-conscious attitudes. However, energy-conscious attitudes do not always cause energy-saving behavior. Also, certain behaviors may be directly changed through recommendations, prompts, and information (e.g., rewards, information about energy costs) without changing attitude first (Lutzenhiser, 1993).
Attitudes may develop good behavioral intentions, but when the subjective norms are weak, the behavioral intention cannot be fully influenced. Situational constraints may also hinder behavioral intentions from being realized as actual behaviors. Thus, a desired behavior can be achieved when a person has positive personal attitudes and subjective norms without situational constraints (Van Raaij & Verhallen, 1983).

Additionally, repeated past behaviors form habits, which affect future behavior. This means that not only changes in personal attitudes or subjective norms but also changes in earlier behavior may cause a desired behavior (Van Raaij & Verhallen, 1983). In this chapter, “habitual behavior” refers to behavior influenced by habits. In sum, behavioral intention and habit lead to behavior when situational constraints do not exist.

Danner et al. (2008) studied the role of habit and intention in the prediction of people’s future behavior. They suggest that the frequency of past behavior and the stability of its context mediate the role of intention. Intention has more influence on future behavior when habits are weak, with low frequency or an unstable context, while it has less influence when habits are strong, with high frequency and a stable context. Similarly, Triandis (1979) suggested a model explaining the interaction between habit and intention in the prediction of future behavior: when a habit is stronger, the relationship between intention and behavior becomes weaker.

Energy consumption and energy usage-related behavior are highly patterned. Daily and weekly energy consumption patterns within a household (such as appliance usage, hot water usage, and thermostat settings) are quite stable over time, but energy consumption patterns of households are often different from one another (Lutzenhiser, 1993). Some energy consumption occurs under conscious control, while other consumption is associated with habitual or unconscious activities (e.g.
habitual water usage patterns or keeping the lights on). The micro-behavioral research explained that significant differences in energy consumption can be derived from patterned behavior, including conscious vs. habitual activities, and routine vs. extraordinary activities. The differences are influenced by the interactions of buildings, equipment, and actors (Lutzenhiser, 1993).

Residential energy consumption can be classified with regard to occupant behavior, building, and events as follows (Bernard, McBride, Desmond, & Collings, 1988; Lutzenhiser, 1993):
● Habitual consumption: It is caused by a routine of conscious and unconscious management.
● Structural consumption: It happens when the building is unoccupied.
● Daily variation consumption: It results from unusual events such as vacations, parties, holidays, visitors, sick children, or broken windows.

2.2.2. Energy, Building Technologies and Occupants

In the previous subsection, behavior is examined regarding internal factors that influence the formation of behavior, especially focusing on habitual occupant behavior. In this subsection, occupant behavior is explained with other external factors, including energy factors and building technologies.

In order to understand energy usage-related behavior in residential buildings, overall factors affecting energy, physical building properties, and occupants should be examined, and the relationship between energy, building technology, and occupant behavior should be understood (Figure 2-3).

[Figure 2-3. Subcategories of Energy-Tech-Occupants. Energy Factors: Climate, BD Envelope, Energy System, BD O&M, IEQ, OB. Building Technologies: Design/Construction, Insulation, Heating, Cooling, Ventilation, DHW, Lighting, Appliances (RECS). Occupants: Social/Economic Factors, Occupancy, Purchase-Related Behavior, Maintenance-Related Behavior, Usage-Related Behavior 1, Usage-Related Behavior 2 (Occasional, Daily).]

Energy Factors

Building energy consumption is mainly influenced by six factors (Hong, Taylor-Lange, D’Oca, Yan, & Corgnati, 2016; Yoshino, Hong, & Nord, 2017; Yu et al., 2011): (1) climate, (2) building envelope, (3) building services and energy systems, (4) building operation and maintenance, (5) indoor environmental quality (IEQ) provided, and (6) occupant activities and behaviors. The former three are external factors and the latter three are behavior-related factors.
● Climate: The climate of the region and weather, such as outdoor air temperature, solar radiation, wind velocity, etc.
● Building envelope: The physical characteristics of the building, including orientation, building type, shape, area, insulation, windows, materials, etc.
● Building services and energy system: This includes building services and physical characteristics of energy systems, such as space cooling/heating, hot water supply, etc.
● Building operation and maintenance: This includes building operation hours, week/weekend usage schedule, etc. In residential buildings, the usage patterns of HVAC (heating, ventilation, air-conditioning), lighting, and appliances are included in this category.
● Indoor environmental quality: This includes indoor air quality, thermal and visual comfort, occupants’ satisfaction with indoor conditions, etc.
● Occupant activities and behavior: This includes user-related characteristics, social and economic factors, occupants’ activities in the building, and energy usage-related behaviors.

Building Technologies

Zhao et al. (2017) defined the main categories of current green building technology based on IECC 2009 (ICC, 2009): (1) Design/Construction, (2) Heating/Cooling, (3) Hot Water, (4) Ventilation, (5) Insulation, and (6) Lighting/Appliances.
Each category is further specified for residential houses as follows.
● Design and Construction: The physical condition of the building. Main parameters affecting energy efficiency are the size of the house, number of bedrooms, house type, and foundation type.
● Heating and Cooling: The main energy consumption in residential buildings. Important parameters are heat pump fuel, heating seasonal performance factor, and seasonal energy efficiency ratio (SEER).
● Water: Domestic hot water consumes a significant amount of fuel, and main parameters are water heater type, water heater energy factor, and water heater tank size. Also, the amount of water usage is related to weather and occupants’ behavior.
● Ventilation: Ventilation is another critical category of HVAC (Heating, Ventilation, and Air-Conditioning) systems. Important factors include duct leakage, ventilation system type, and ventilation system air flow.
● Insulation: Insulation is highly correlated with energy consumption for heating and cooling. Main factors are R-value, U-value, solar heat gain coefficient (SHGC), and infiltration rate.
● Lights and Appliances: Energy efficient light bulbs and appliances contribute to energy savings in the residential sector. Main factors are the energy consumption of interior lighting, exterior lighting, refrigerators, dishwashers, ranges and ovens, clothes dryers, and ceiling fans.

Occupants

Yu et al. (2011) described occupants of buildings in terms of (1) user-related characteristics, (2) social and economic factors, and (3) occupants’ activities in the building and behavior about energy usage.
● User-related characteristics: This includes number of occupants, occupancy (user presence in a building), etc.
● Social and economic factors: This includes age, gender, job, degree of education, energy cost, etc.
● Occupant behavior and activities: This includes what occupants do in the buildings, energy use behavior, and activities regarding temperature settings, appliance purchases, energy usage, etc.

According to the American Psychological Association (APA) Dictionary of Psychology, behavior is “an organism’s activities in response to external or internal stimuli, including objectively observable activities, introspectively observable activities (see covert behavior), and nonconscious processes” (APA, 2018). In this chapter, “activities” refers simply to “objectively observable activities”.

Occupants’ activities and behavior can be further specified. Van Raaij and Verhallen (1983) categorized energy usage-related behaviors as (1) purchase, (2) maintenance, and (3) usage-related behaviors.
● Purchase-related behavior: The process of purchasing Heating, Ventilating, Air-Conditioning (HVAC) equipment, household appliances, and energy-using products. It includes the consideration of the energy attribute of the appliances regarding energy efficiency in daily use.
● Maintenance-related behavior: The behavior to maintain HVAC systems and appliances, including repairs, home improvements, and servicing.
● Usage-related behavior: The daily energy consumption of household appliances (usage-related behavior 1 in Figure 2-3), lighting, and HVAC systems in the home (usage-related behavior 2 in Figure 2-3) regarding frequency, duration, and intensity of the energy use. It includes energy-conscious behaviors such as setting the set-point temperature of thermostats and using ventilation systems. This usage-related behavior is more directly related to habits and behavioral patterns, which are generally more difficult to change.

All of the factors about energy, building technology, and occupants are illustrated in Figure 2-3. In this figure, the heights of the bars indicate the relationships between the factors.
For example, occupant behavior (OB) in energy factors is mainly related to lighting and appliances in building technologies, but in the long term, it can be related to insulation, heating, cooling, ventilation, or domestic hot water (DHW) in building technologies. Occupant factors are more detailed in the right part of the figure. The latter part of this research will focus more on the highlighted parts of the figure: lighting and appliances of building technologies, and occupants’ usage-related behavior in daily life.

Behavior Prediction Model

2.3.1. Occupant Behavior Prediction Model

The Occupant Behavior Prediction Model aims to predict occupant behavior through energy consumption data. In addition, this model can identify habitual and non-habitual behaviors, which can potentially be used for efficient building operation/control strategies, interventions/education, and so on. Unlike previous models that focused on predicting energy consumption by occupant behavior, or changing occupant behavior through intervention or education, this model investigates the reverse: predicting occupant behavior based on energy consumption.

The Occupant Behavior Prediction Model incorporates the function of habit on the formation of behavior, which is innovative in residential energy and occupant behavior studies. Existing studies (Ouellette & Wood, 1998; W. Wood, Tam, & Witt, 2005) suggested that the strength of a habit should be measured by its frequency and the stability of its context. They estimated the strength of habits by multiplying a measure of past behavior frequency with a measure of context stability. This provided a habit scale, where a higher score indicates a strong habit with high frequency in a stable context, and a lower score indicates a weak or nonexistent habit with low frequency in an unstable context. Given that the contexts remain relatively stable, past choice of behavior can have more influence on later choice of behavior (C.-F.
Chen & Chao, 2011). Wood et al. (2002) defined habits as behaviors that are performed repeatedly in stable contexts, because context stability is important for automatic responding.

[Figure 2-4. Occupant Behavior/Activity Prediction Model. Components: Frequency (Frequency per Day); Context: Time (Duration, Start Time, End Time), Place, and Situation (Partner, Weather, Other Circumstances); these describe the Behavior/Activity.]

The components of this Occupant Behavior/Activity Prediction Model are extracted from those habitual behavior studies and used to measure the strength of habit in occupant behavior. Behaviors and activities are explained with the following main components (Figure 2-4).
● Frequency: Number of times a single activity is performed per day
● Context: Context is broken down into Time, Place, and Situation
  o Time
    ▪ Duration: Total minutes of an activity, from the start time to the end time
    ▪ Start Time: Start time (HH:MM) of an activity
    ▪ End Time: End time (HH:MM) of an activity
  o Place (Where): Physical location where an activity is performed
  o Situation
    ▪ Partner (Who): Person/people with whom an activity is performed
    ▪ Weather: Weather conditions when an activity is performed
    ▪ Other Circumstances: Other circumstances affecting an activity

In this study, Frequency, Duration, Start Time, End Time, Place (Where), and Partner (Who) are mainly used as input features of machine learning algorithms, which are then used to predict occupants’ behaviors and activities and to identify the predictability of each activity.

2.3.2. Main Components of the Model

In order to predict a person’s behavior, we must first determine how predictable and habitual that behavior is. This section will examine the main components used to predict behavior. Habits and intentions jointly predict future actions, and strong habits are difficult to change with intentions. New intentions must be sufficiently strong to override stable habits.
Continuous control is required until the new behavior is more strongly settled than existing habits. If the new behavior is not as well established as the existing habits, because the behavior is new and not performed frequently enough, or because the context of the behavior is unstable or difficult, behavior is more likely to be influenced by intentions and by conscious, controlled processes (Ouellette & Wood, 1998).

The relationship between the existing habits, the new behavior, and intentions implies that education or intervention on behavior aims to influence intentions and, by doing so, change the behavior. However, education or intervention might be less effective on strong habits. Thus, after identifying which habits are strong or weak, researchers and stakeholders can set more effective strategies to change behavior by focusing on the weak habits, which have more potential to be easily changed. In contrast, a different approach is required to deal with strong habits. Energy control systems need to understand the patterns behind occupants’ strong habits and set effective control strategies following those patterns, rather than trying to change the behavior directly.

Effective interventions to change weak habits tend to involve stimulus control (i.e., limitation of exposure to stimulus cues) and response substitution (i.e., linkage of a competing response to the cues). In addition, effective interventions to change intentional action tend to give new information that changes the value of behavioral outcomes (W. Wood et al., 2005).

Habits are constructed when one behavior is frequently and consistently repeated for the same purpose in similar contexts (Danner et al., 2008; Ouellette & Wood, 1998). Habits are signs of the cognitive and motivational changes caused by repeated behavior.
With repetition, the practical action is associated with the times, locations, and other features of that context, and these associations form habitual actions which are automatically triggered by those features (W. Wood et al., 2002; W. Wood et al., 2005).

Frequency

Frequency of past behavior plays a significant role in the prediction of future behavior, over and above intention (Ajzen, 1991; Ouellette & Wood, 1998), which means that those behaviors are performed without much thought and deliberation (Danner et al., 2008). The impact of the frequency of past behavior on future behavior emphasizes how heavily behavior is influenced by habit (Danner et al., 2008).

Context: Time, Place, Situation

Although frequency plays a significant role in forming habits and predicting future behavior, it is not the sole factor needed to form habits. Another important factor is the consistency of the behavior (Danner et al., 2008; Ouellette & Wood, 1998; W. Wood et al., 2005). Consistency denotes the stability of the context in which the behavior has happened in the past. The stability of the context contributes to habit formation based on the assumption that people tend to be sensitive to changes in a given context.

The context includes place, time, and situation. The time is the time of day, the place is the physical location, and the situation includes circumstances such as other people and weather (Danner et al., 2008). Kahneman et al. (2004) described the situation mainly in terms of interaction partners. They asked structured questions about respondents’ daily activities: what they were doing (activities), when they started and ended (time), where they were (place), and whom they were with (interaction partner).

A context is considered stable when the time, place, and situation (partner) in which the behavior is performed are always similar (Danner et al., 2008). Aarts et al.
(1997) explained that habits are supposed to be developed when a behavior is frequently performed at the same time, in the same place, in the same situation. If a behavior is performed very frequently, but it is always performed in different contexts (time, place, situation/partner), the behavior will be more dependent on intentions and will not be established as a habit. Similarly, if a behavior is always executed in the same context, but it only occurs occasionally, it will again be determined more by intentions than by a stable habit (Danner et al., 2008).

Case Study: Extract Attributes from the ATUS to Fit the Model

2.4.1. Overview of the ATUS Data

The American Time Use Survey (ATUS) is an annual national survey conducted by the U.S. Bureau of Labor Statistics (U.S. BLS) (Kahneman et al., 2004) on how the population allocates time in daily life. The ATUS assesses what (activity), where (place), and with whom (partner) a nationally representative sample of Americans spends their time in a regular day. The survey has been conducted annually since 2003, and it contains detailed daily activities from more than 10,000 respondents per year (U.S.BLS, 2018). The activity diary covers 4 AM to 4 AM of the next day.

In this study, the ATUS 2015 data are used to examine energy usage-related behavior, focusing on habitual consumption among habitual, structural, and daily variation consumptions (see the end of subsection 2.2.1). The survey results are recorded in the following seven basic data files (U.S.BLS, 2018).
● Respondent file: Contains data about respondents, including their workforce status and earnings.
● Roster file: Contains data about household members and non-household children of the respondents, including age and sex.
● Activity file: Contains data about how the respondents spent a day, including activity codes, locations, and start/end times.
● Activity summary file: Contains data about the total time each respondent spent on each activity during the day.
● Who file: Contains data about who was with the respondent during each activity.
● Eldercare roster file: Contains data about elderly people whom the respondents take care of, including duration of care, age, and sex.
● Current population survey (CPS) file: Contains data about all individual household members who were selected to take part in the survey. These data were collected 2-5 months ahead of the actual ATUS interview.

Table 2-1. ATUS 1st Tier Activities
Code | Activity
01 | Personal care
02 | Household activities
03 | Caring for and helping household members
04 | Caring for and helping non-household members
05 | Work and work related activities
06 | Education
07 | Consumer purchases
08 | Professional and personal care services
09 | Household services
10 | Government services and civic obligations
11 | Eating and drinking
12 | Socializing, relaxing, and leisure
13 | Sports, exercise, and recreation
14 | Religious and spiritual activities
15 | Volunteer activities
16 | Telephone calls
18 | Traveling
50 | Data codes

The main data for this research are extracted from the activity file, and other supporting information is extracted from the who, respondent, roster, and CPS files. The activities are defined in three tiers: the first tier has 18 overall categories of activities (Table 2-1), the second tier has 110 more detailed subcategories under the first tier, and the third tier has the 465 most detailed categories under the first and second tiers.

2.4.2. Reclassification of Activities

Most of the existing studies using the ATUS data analyzed the activities at the 1st tier level (Aksanli et al., 2016; Diao et al., 2017). However, the 1st tier activity categories are too broad to explain energy usage-related behaviors. In order to understand residential energy behaviors more accurately, this study uses the 3rd tier categories.

Table 2-2. Energy Usage-Related Activities (3rd Tier)
New Code | 3rd Tier Code | Activity
AA01 | 010201 | Washing, dressing, and grooming oneself
BB01 | 020101 | Interior cleaning
BB02 | 020102 | Laundry
BB03 | 020201 | Food and drink preparation
BB04 | 020203 | Kitchen and food clean-up
BB05 | 020303 | Heating and cooling
BB06 | 020501 | Lawn, garden, and houseplant care
BB06 | 020502 | Ponds, pools, and hot tubs
BB07 | 020601 | Care for animals and pets (not veterinary care)
BB08 | 020701 | Vehicle repair and maintenance (by self)
CD01 | 030101 | Physical care for household children
CD01 | 040101 | Physical care for non-household children
CD02 | 030401 | Physical care for household adults
CD02 | 030501 | Helping household adults
CD02 | 040401 | Physical care for non-household adults
EF01 | 050101 | Work, main job
EF01 | 050102 | Work, other job(s)
EF01 | 060301 | Research/homework for class for degree, certification, or licensure
LL01 | 120303 | Television and movies (not religious)
LL01 | 120304 | Television (religious)
LL02 | 120305 | Listening to the radio
LL02 | 120306 | Listening to/playing music (not radio)
LL03 | 020904 | Household & personal e-mail and messages
LL03 | 050401 | Job search activities
LL03 | 120307 | Playing games
LL03 | 120308 | Computer use for leisure (exc. Games)
LL03 | 150101 | Computer use

Since the ATUS data record all of the respondents’ activities on the diary day, the dataset contains both energy usage-related and non-energy-usage-related activities. Among the 465 activities, 27 activities with the potential to use electricity, gas, or water were selected by examining their descriptions. The selected activities were re-grouped based on their similarity and on the energy types and appliances that they could use. Table 2-2 shows the new codes for the modified groups of activities, the original 3rd tier activity codes from the ATUS, and the descriptions of the activities. The 3rd tier code shows the hierarchy of the activities: the first 2 digits indicate the 1st tier activity groups, the middle 2 digits indicate the 2nd tier activity groups, and the last 2 digits indicate the 3rd tier activity groups.
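The six-digit layout described above can be illustrated with a short sketch. The function names `split_tiers` and `regroup` are hypothetical, and the lookup holds only two example entries taken from Table 2-2 (010201 and 020303); it is not the full regrouping used in this study.

```python
from typing import Optional, Tuple


def split_tiers(code: str) -> Tuple[str, str, str]:
    """Split a 6-digit ATUS 3rd tier code into (1st, 2nd, 3rd) tier digits."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit activity code, got {code!r}")
    return code[:2], code[2:4], code[4:]


# Partial, illustrative lookup from 3rd tier code to the regrouped code.
NEW_CODE = {"010201": "AA01", "020303": "BB05"}


def regroup(code: str) -> Optional[str]:
    """Return the regrouped code, or None if the activity was not selected."""
    return NEW_CODE.get(code)


tier1, tier2, tier3 = split_tiers("020303")
print(tier1, tier2, tier3, regroup("020303"))
```

Keeping the codes as strings preserves the leading zeros, which would be lost if the codes were read as integers.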
Table 2-3 explains the new code of activities and the energy types and appliances for the activities.

Table 2-3. New Activity Code, Energy, Appliances
Code | Activity | Energy | Appliances (Electricity and Gas)
AA01 | Washing, dressing, and grooming | E,W,G | Lighting, Shower, Hair dryer, Shaving
BB01 | Interior cleaning | E | Lighting, Vacuum
BB02 | Laundry | E,W,G | Lighting, Washer, Dryer
BB03 | Food and drink preparation | E,W,G | Lighting, Oven, Stove, Toaster, Blender, Coffee machine, Cooker, etc.
BB04 | Kitchen and food clean-up | E,W | Lighting, Dish washer
BB05 | Heating and cooling | E,G | Lighting, HVAC
BB06 | Gardening, ponds, pools, and hot tubs | W,G,E | Lighting
BB07 | Care for animals and pets | E,W | Lighting
BB08 | Vehicle repair and maintenance | E | Lighting, Repair tools
CD01 | Physical care for children | E,W | Lighting
CD02 | Physical care for/helping adults | E,W | Lighting
EF01 | Work for job(s)/research/homework | E | Lighting, Computer
LL01 | Television | E | Lighting, TV
LL02 | Listening to/playing radio or music | E | Lighting, Computer, Music player, Radio
LL03 | General computer use | E | Lighting, Computer
** E: Electricity, W: Water, G: Gas

In the ATUS, the Heating and cooling activity (new code BB05, 3rd tier code 020303) does not mean operating HVAC systems or setting the set-point temperature of a thermostat, but means “collecting/chopping woods, lighting fireplace, shoveling coal, filling heater with fuel, installing fireplace etc.” (U.S. BLS, 2018), which is less usual in households. This specific meaning of the Heating and cooling activity in the ATUS should be kept in mind in the later data analysis in this chapter.

2.4.3. Variables (Attributes)

Based on the Occupant Behavior Prediction Model defined in Section 2.3.2, Frequency and Context (Time, Place, Partner) variables were extracted from the ATUS files as follows.
● Activity: New group of activities
● Frequency: Number of times the activity was recorded during the day
● Start Time: Start time of the activity in minutes
● End Time: End time of the activity in minutes
● Duration: Minutes spent doing the activity from start time to end time
● Place: Place where the activity was performed
● Partner: Partner with whom the activity was performed

Table 2-4. Partner Code
Code | Partner | ATUS Code | Detailed Partner
1 | Alone | 18 | Alone
1 | Alone | 19 | Alone
2 | Household | 20 | Spouse
2 | Household | 21 | Unmarried partner
2 | Household | 22 | Own household child
2 | Household | 23 | Grandchild
2 | Household | 24 | Parent
2 | Household | 25 | Brother/sister
2 | Household | 26 | Other related person
2 | Household | 27 | Foster child
2 | Household | 28 | Housemate/roommate
2 | Household | 29 | Roomer/boarder
2 | Household | 30 | Other nonrelative
3 | Non-Household (Friends, Acquaintances) | 40 | Own non-household child < 18
3 | Non-Household (Friends, Acquaintances) | 51 | Parents (not living in household)
3 | Non-Household (Friends, Acquaintances) | 52 | Other non-household family members < 18
3 | Non-Household (Friends, Acquaintances) | 53 | Other non-household family members 18 and older (including parents-in-law)
3 | Non-Household (Friends, Acquaintances) | 54 | Friends
3 | Non-Household (Friends, Acquaintances) | 56 | Neighbors/acquaintances
3 | Non-Household (Friends, Acquaintances) | 57 | Other non-household children < 18
3 | Non-Household (Friends, Acquaintances) | 58 | Other non-household adults 18 and older
4 | Work-Related | 59 | Boss or manager
4 | Work-Related | 60 | People whom I supervise
4 | Work-Related | 61 | Co-workers
4 | Work-Related | 62 | Customers

Activity is the dependent variable and the others are independent variables. Activity has 15 unique values, as explained in Table 2-3. Frequency, Start Time, End Time, and Duration are numeric variables, and Partner and Place are categorical variables, explained in Table 2-4 and Table 2-5. The ATUS defines Partner with 25 categories, but this research simplifies them to Alone, Household, Non-Household, and Work-Related people.

Table 2-5. Place Code
Code | Place
1 | Respondent's home or yard
2 | Respondent's workplace
3 | Someone else's home
4 | Restaurant or bar
5 | Place of worship
6 | Grocery store
7 | Other store/mall
8 | School
9 | Outdoors away from home
10 | Library
11 | Other place
12 | Car, truck, or motorcycle (driver)
13 | Car, truck, or motorcycle (passenger)
14 | Walking
15 | Bus
16 | Subway/train
17 | Bicycle
18 | Boat/ferry
19 | Taxi/limousine service
20 | Airplane
21 | Other mode of transportation
30 | Bank
31 | Gym/health club
32 | Post Office
89 | Unspecified place
99 | Unspecified mode of transportation

Case Study: ML Classification Process

This research used a machine learning (ML) approach to understand energy usage-related behavior based on the behavior prediction model using the ATUS data. The model is used to predict energy usage-related activities and to identify the predictability and the habitual characteristic of each activity. For the data analysis, various packages in Python and R are used. The process is explained as follows (Figure 2-5).

Figure 2-5. Data Analysis Process (flowchart of five stages. Pre-Analysis: select residential energy-related data, perform descriptive analysis, prepare dataset by scaling and splitting. Algorithm Selection: run main algorithms, evaluate performance and select the best algorithm. Feature Engineering: check collinearity, remove or add features, evaluate performance and finalize input features. Parameter Tuning: change parameter values (gamma, C, kernel), evaluate performance and finalize parameter setting. Subgroup Analysis: define subgroups, evaluate sensitivity.)

2.5.1. Pre-Analysis

The goal of the pre-analysis is to understand the characteristics of the overall dataset and the data distribution of each variable. This step includes the following tasks: (1) select data based on the given conditions, (2) perform a descriptive data analysis on the main variables, and (3) split datasets for machine learning processes.

2.5.2. Algorithm Selection

Machine learning algorithms show different performances depending on the characteristics of the given dataset. Thus, multiple machine learning algorithms are compared in this step and the algorithm with the best performance is selected for further improvement. Algorithms that are frequently used for classification include Naïve Bayes (NB), Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), and Support Vector Machine (SVM) (Mayfield & Rose, 2010; Shermis & Burstein, 2013).

NB is a probabilistic classifier built on Bayes’ theorem with the assumption of independence between attributes. NB is often employed as a baseline algorithm due to its easy and fast implementation. NB performs well with plenty of fairly weak predictors and efficiently extends to classification tasks with multiple class values (Mayfield, Adamson, & Rose, 2014).

LR is a conditional probability model that builds a linear model by reducing incorrect probability values based on a transformed target variable. LR generates accurate probability estimates by maximizing the probability of the training data (Witten, Frank, Hall, & Pal, 2016).

KNN is an instance-based classifier. Nearest-neighbor classification compares each new instance with existing ones using a distance metric, and assigns the new instance the class of the closest existing instance. With k neighbors, the majority class of the k nearest neighbors is assigned for a categorical class, or their distance-weighted average for a numeric class. KNN is simple and often works efficiently, and each attribute has the same effect on the decision. However, it is easily influenced by noisy data (Witten et al., 2016).

DT is a divide-and-conquer approach. Each node in a decision tree tests a particular attribute, typically by comparing the attribute value with a constant, and divides the data at that node. A DT constructs the comparisons recursively.
First, it selects an attribute, places it at the root node, and creates a branch for each possible value. Then it splits the dataset into subsets and repeats the process recursively for each branch until all instances at a node have the same class value (Witten et al., 2016).

SVM is based on the maximum-margin hyperplane, an algorithm used to find a special type of linear model (Witten et al., 2016). SVM adapts linear models to investigate nonlinear class boundaries with a focus on marginal instances.

Two different versions of the dataset are compared: one where the numeric variables are used without standardization, and a second where the numeric variables are standardized to a mean of 0 and a standard deviation of 1.

The predictive performance of the algorithms is evaluated with Accuracy, Precision, Recall, and F1-score. Accuracy is the percentage of correct predictions, or the ratio of true predictions to the total number of instances. Precision and Recall are the indexes of relevance. Precision is the ratio of correct positive predictions to all positive predictions. A low precision implies a large number of false positives. Recall is the ratio of correct positive predictions to the sum of correct positive predictions and wrong negative predictions. A low recall implies a large number of false negatives. F1-score is the harmonic mean of precision and recall. Their definitions are given in Figure 2-6 and Equations 1 through 4.

Predicted \ Actual | Positive | Negative
Positive | True Positive | False Positive
Negative | False Negative | True Negative
Figure 2-6. Confusion Matrix

$$\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{Total}}$$ (Equation 1)

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$ (Equation 2)

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$ (Equation 3)

$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$ (Equation 4)

2.5.3. Feature Engineering

The goal of feature engineering is to identify the combination of features that achieves the highest performance. First, problems among the existing features are examined, such as collinearity or noisy features. In order to diagnose the collinearity of the variables, correlations between variables are checked. Then, additional features are considered. All of the possible combinations of additional features are applied to the model, and the performance is evaluated. Finally, the highest-performing combination of features is selected for the next step of analysis.

2.5.4. Parameter Tuning

The goal of parameter tuning is to further refine the performance of the selected algorithm. Most machine learning algorithms’ performance varies by parameter setting. The performance of SVM, in particular, is heavily influenced by its parameters, such as kernel, C, and gamma. Several different values of the parameters are tested and the values with the best performance are selected.

2.5.5. Subgroup Analysis

Subgroup analysis finds patterns in a subset of the dataset, and it is useful to assess whether different types of subsets respond differently to the model (Lagakos, 2006). In this study, subgroups are defined by (1) quantile of feature values, (2) predictive performance of each activity, and (3) number of instances of activities.

• (1) Quantile: Each feature has outliers, and subgroup analysis is performed to evaluate the influence of outliers and of various quantile ranges. For each feature, quantiles are calculated and subgroups are formed from the instances whose values fall within a given middle range, such as the middle 95%, 90%, 80%, and 50%.
• (2) Performance: Activities with low performance might include noisy data, which have a negative impact on the overall model performance. In this subgroup analysis, the model’s performance is compared between high- and low-performance activities.
• (3) Number of Instances: Activities with a high number of instances are more common among respondents and activities with a lower number of instances are less common. In order to examine the influence of an activity’s frequency on the model’s performance, subgroups are defined based on the number of activity instances.

Case Study: Result

2.6.1. Pre-Analysis

2.6.1.1. Features (Variables) and Instances

Based on the Occupant Behavior Prediction Model, the Frequency, Duration, Start Time, End Time, and Partner variables are initially selected. The Place variable is excluded in the baseline algorithm selection, since it only contains the values of Home (1) and Not Collected (-1).

Originally, the activity file from the 2015 ATUS contained 214,429 activities from 10,905 respondents. For this study, only energy usage-related activities were selected, so 76,980 activities from 10,849 respondents remained. Since this study focuses on residential energy behaviors, those activities were narrowed down to only include the ones that happened in the respondent’s home or yard. The ATUS does not collect the location and partner information for certain types of activities, such as sleeping and grooming, due to privacy concerns. Therefore, it was assumed that those activities happened alone at home (Diao et al., 2017). This left 67,115 activities from 10,772 respondents, which this study used for the analysis. Seventy percent of the whole dataset was set as the training set and the remaining 30 percent was set as the testing set.

2.6.1.2. Descriptive Analysis

Descriptive analysis helps researchers understand the overall properties of a dataset and provides ways to help analyze the given data more efficiently. In this section, the distribution of categorical data values and the range of numeric data values are examined.

Table 2-6 summarizes the data distribution of a dependent variable, Activity, and a categorical variable, Partner.
Watching television (LL01), Washing, dressing, and grooming (AA01), and Food and drink preparation (BB03) have the highest numbers, which means they are the most common and frequent energy usage-related activities in daily life. Physical care for/helping adults (CD02), Vehicle repair and maintenance (BB08), and Heating and cooling (BB05) have the lowest numbers among the energy usage-related activities in the ATUS data. In the ATUS, heating and cooling activity includes preparation of fuels (such as collecting/chopping/stacking wood, shoveling coals, or filling a heater with fuel), and installing and maintaining heating and cooling systems (such as installing a fireplace or window air-conditioning unit, or changing a furnace filter), which are less common activities in households with gas or electricity-based heating and cooling systems. This may be the reason why the number of heating and cooling activities is very low in this dataset.

Table 2-6. Distribution of Partner for Each Activity
Code | Total | Not Collected(-1) | Alone(1) | Household(2) | Non-Household(3) | Work-Related(4)
AA01 | 15266 | 15266 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%)
BB01 | 3535 | 3 (0%) | 2486 (70%) | 975 (28%) | 71 (2%) | 0 (0%)
BB02 | 2952 | 0 (0%) | 2419 (82%) | 483 (16%) | 50 (2%) | 0 (0%)
BB03 | 9986 | 7 (0%) | 6191 (62%) | 3450 (35%) | 336 (3%) | 2 (0%)
BB04 | 3360 | 1 (0%) | 2231 (66%) | 1035 (31%) | 93 (3%) | 0 (0%)
BB05 | 111 | 0 (0%) | 85 (77%) | 23 (21%) | 3 (3%) | 0 (0%)
BB06 | 1345 | 0 (0%) | 1016 (76%) | 290 (22%) | 39 (3%) | 0 (0%)
BB07 | 2167 | 2 (0%) | 1858 (86%) | 279 (13%) | 26 (1%) | 2 (0%)
BB08 | 228 | 0 (0%) | 158 (69%) | 55 (24%) | 15 (7%) | 0 (0%)
CD01 | 5063 | 0 (0%) | 87 (2%) | 4819 (95%) | 157 (3%) | 0 (0%)
CD02 | 253 | 0 (0%) | 11 (4%) | 225 (89%) | 17 (7%) | 0 (0%)
EF01 | 2422 | 0 (0%) | 1844 (76%) | 459 (19%) | 61 (3%) | 58 (2%)
LL01 | 16334 | 7 (0%) | 8750 (54%) | 6940 (42%) | 637 (4%) | 0 (0%)
LL02 | 371 | 0 (0%) | 295 (80%) | 63 (17%) | 13 (4%) | 0 (0%)
LL03 | 3722 | 1 (0%) | 2714 (73%) | 891 (24%) | 116 (3%) | 0 (0%)

Due to privacy issues, the ATUS does not collect Partner information for some activities, including Washing, dressing, and grooming (AA01). Except for Caring for children and adults (CD01, CD02), most of the activities are done alone. Care for animals and pets (BB07), Laundry (BB02), and Listening to/playing radio or music (LL02) are the activities most likely to be performed alone. Following Caring for children and adults (CD01, CD02), the next most likely activities to be performed with household members are Watching television (LL01), Food and drink preparation (BB03), Kitchen and food clean-up (BB04), and Interior cleaning (BB01). This shows that people tend to watch television, prepare meals, and do household chores with family members. Physical care for/helping adults (CD02) and Vehicle repair and maintenance (BB08) are relatively likely to be performed with non-household members, which implies that these activities need more help from other experts. The only activity that was likely to be performed with work-related people (2%) was Work for job(s)/research/homework (EF01).

Since only the activities that were performed at home are selected, the Place variable has only “home,” except for Washing, dressing, and grooming (AA01), for which location information was not collected due to privacy concerns.

Figure 2-7 shows the range of Frequency values for each activity. Most activities are performed 1-3 times a day. Physical care for children (CD01) has the highest value of Frequency and it also has the highest-value outlier. Physical care for/helping adults (CD02) and Work for job(s)/research/homework (EF01) show higher values than other activities. This implies that caring for others (especially children) happens more frequently because they (children or other adults) need help often. Also, when respondents report their Work for job(s)/research/homework (EF01) activity at home, the number of instances of the working activity in a day is relatively high.

Figure 2-7. Range of Frequency for Each Activity

Figure 2-8 illustrates the range of Duration per instance of an activity for each activity type. The ranges of Duration values vary by activity, which suggests that the Duration variable can have discriminative power to predict activities. The respondents spend longer times Watching television (LL01) and Working for job(s)/research/homework (EF01), and spend shorter times on Care for animals and pets (BB07), Care for children and adults (CD01, CD02), Kitchen and food clean-up (BB04), and Heating and cooling (BB05) in a single instance of the activity. Watching television has the highest-value outlier (1400 minutes).

Taken together with Frequency, the data show that if the respondents watch television (LL01), most of them watch television 2-3 times a day, and spend approximately 60-144 minutes each time. If they work at home for jobs/research/homework (EF01), most of them work 1-4 times a day and spend about 30-150 minutes each time. Watching television has a relatively low frequency per day, but once people start watching TV, they spend longer times on it compared to other activities. Similarly, if the respondents take care of children or adults (CD01, CD02), most of them take care of children 2-6 times and adults 1-4 times a day, and spend 10-30 minutes each time. The respondents with children to take care of do physical care for children (CD01) most often among the given activities, but they spend relatively short amounts of time on each instance.

Figure 2-8. Range of Duration for Each Activity (a: top, b: bottom)

Figure 2-9 and Figure 2-10 illustrate the range of Start Time and End Time for each activity, and the values vary by activity.
Most of the respondents start Washing, dressing, and grooming (AA01), Care for animals and pets (BB07), Physical care for children and adults (CD01, CD02), and Food and drink preparation (BB03) earlier than other activities (before 8 AM) in the morning. In contrast, most of them start watching television (LL01) and listening to/playing radio or music (LL02) in the afternoon (after 12 PM), and the end times of these activities are later than those of other activities.

Figure 2-9. Range of Start Time for Each Activity

Figure 2-10. Range of End Time for Each Activity

2.6.2. Algorithm Selection

Table 2-7 lists the results of the algorithm selection for the prediction of energy usage-related activities. First, Naïve Bayes (NB), Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), and Support Vector Machine (SVM) were tested with the original data. Then, the same algorithms were run with standardized data. When using the original non-standardized features, LR showed the best performance (Accuracy 0.57). However, SVM improved significantly with standardized features: Accuracy improved from 0.53 to 0.61, which is better than LR performed with non-standardized features.

Table 2-7. Performance of Different Algorithms
Algorithm | No-Standardization: Accuracy | Precision | Recall | F1-score | Standardization: Accuracy | Precision | Recall | F1-score
NB | 0.56 | 0.53 | 0.56 | 0.53 | 0.56 | 0.53 | 0.56 | 0.53
LR | 0.57 | 0.46 | 0.57 | 0.49 | 0.57 | 0.46 | 0.57 | 0.49
KNN | 0.51 | 0.47 | 0.51 | 0.48 | 0.57 | 0.55 | 0.57 | 0.55
DT | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53
SVM | 0.53 | 0.48 | 0.53 | 0.47 | 0.61 | 0.58 | 0.61 | 0.55

Among the features, Partner is a categorical variable, and Frequency, Duration, Start Time, and End Time are numeric variables. As examined in the previous descriptive analysis, the value ranges are very different among the numeric variables, which can affect the performance of the machine learning algorithms.
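The standardization compared in Table 2-7 rescales each numeric feature to a mean of 0 and a standard deviation of 1. A minimal sketch using only the Python standard library is given here; the function name `standardize` and the sample values are illustrative assumptions (not ATUS data), and the population standard deviation is assumed.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up values on very different scales, as with Duration and Frequency.
duration = [10.0, 30.0, 60.0, 144.0, 1400.0]
frequency = [1.0, 2.0, 3.0, 2.0, 6.0]

z_duration = standardize(duration)
z_frequency = standardize(frequency)
print(round(pstdev(z_duration), 6), round(pstdev(z_frequency), 6))
```

After the transformation both features have the same spread, so neither dominates a distance-based computation merely because of its unit.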
For example, many algorithms (such as the radial basis function (RBF) kernel of SVM) assume that all input features are centered around 0 and have variances of the same order of magnitude. Thus, if one feature has a much larger variance than the others, it might dominate the objective function and prevent the estimator from learning from the other features as expected (Scikit-Learn, 2017). This explains the large performance improvement of SVM with standardized features, because the RBF kernel is used in this run. In the following steps, SVM with standardized features is further developed to improve its predictive performance.

2.6.3. Feature Engineering

In the baseline algorithm selection, the Frequency, Duration, Start Time, End Time, and Partner features were used. Since collinearity can cause biased estimates, or make estimates that were considered important appear insignificant (Belsley, Kuh, & Welsch, 2005), collinearity among the existing features is tested before adding more features that could potentially improve the model’s performance. In order to diagnose the collinearity, Pearson correlations between features are calculated as summarized in Table 2-8. Start Time and End Time have a high correlation, 0.86. Although the baseline algorithm selection did not include the Place variable, it is included when checking collinearity, and the result indicates a very high correlation (0.90) between the Partner and Place variables.

Table 2-8. Pearson Correlations between Features
 | Frequency | Duration | Start Time | End Time | Partner | Place
Frequency | 1.00 | -0.08 | -0.02 | -0.02 | 0.09 | 0.04
Duration | -0.08 | 1.00 | 0.02 | 0.11 | 0.19 | 0.22
Start Time | -0.02 | 0.02 | 1.00 | 0.86 | 0.14 | 0.11
End Time | -0.02 | 0.11 | 0.86 | 1.00 | 0.16 | 0.13
Partner | 0.09 | 0.19 | 0.14 | 0.16 | 1.00 | 0.90
Place | 0.04 | 0.22 | 0.11 | 0.13 | 0.90 | 1.00
* All correlations are significant at the 0.01 level (2-tailed)

Johnson et al. (2014) studied the probability of transitioning from a current activity to the next activity using a Markov Chain behavior model, which shows that the previous activity is related to the next activity. Thus, previous activities are also considered as additional features in this section.

Based on the correlations between features and past studies about the influence of previous activities on the next activity, input features are adjusted with the following options: (1) baseline features are Frequency, Duration, Start Time, End Time, and Partner, (2) exclude End Time from baseline, (3) add Place to baseline, (4) add previous activity (A-1) to baseline, (5) add the 2 steps previous activity (A-2) to baseline, (6) add Place and A-1 to baseline, (7) add Place and A-2 to baseline, (8) add A-1 and A-2 to baseline, and (9) add Place, A-1, and A-2 to baseline. The performances are compared in Table 2-9.

Table 2-9. Performance of Additional Features
Features | Accuracy | Precision | Recall | F1-score
(1) Baseline | 0.61 | 0.58 | 0.61 | 0.55
(2) - End Time | 0.61 | 0.57 | 0.61 | 0.55
(3) + Place | 0.61 | 0.58 | 0.61 | 0.55
(4) + A-1 | 0.64 | 0.61 | 0.64 | 0.60
(5) + A-2 | 0.62 | 0.60 | 0.62 | 0.58
(6) + Place, A-1 | 0.64 | 0.60 | 0.64 | 0.59
(7) + Place, A-2 | 0.62 | 0.60 | 0.62 | 0.58
(8) + A-1, A-2 | 0.59 | 0.58 | 0.59 | 0.55
(9) + Place, A-1, A-2 | 0.60 | 0.58 | 0.60 | 0.56

When the End Time feature was excluded (option 2), the performance remained almost the same as with the baseline features. Adding the Place feature (option 3) did not improve the performance, as was expected given the previous collinearity test. However, when adding the previous activity, A-1, as a feature (option 4), the performance improved. Adding the 2 steps previous activity, A-2, (option 5) slightly improved performance compared to the baseline. Adding only A-1 (option 4) showed the highest performance among all options of new features.
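The previous-activity features A-1 and A-2 can be derived from a respondent's ordered activity sequence, as in the following sketch. It assumes that the activities of one respondent are already sorted by start time; the function name `add_previous_activities` and the placeholder value "NONE" for missing predecessors are illustrative assumptions, not the encoding used in this study.

```python
from typing import Dict, List, Sequence


def add_previous_activities(
    activities: Sequence[str], missing: str = "NONE"
) -> List[Dict[str, str]]:
    """Attach the previous (A-1) and 2-steps-previous (A-2) activity to each row."""
    rows = []
    for i, activity in enumerate(activities):
        rows.append(
            {
                "activity": activity,
                "A-1": activities[i - 1] if i >= 1 else missing,
                "A-2": activities[i - 2] if i >= 2 else missing,
            }
        )
    return rows


# Made-up diary of one respondent, in chronological order.
diary = ["AA01", "BB03", "BB04", "LL01"]
rows = add_previous_activities(diary)
for row in rows:
    print(row)
```

Because A-1 and A-2 are categorical, they would still need to be encoded numerically (for example, one indicator column per activity code) before being passed to an SVM.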
Some options, such as adding A-1 and A-2 together (option 8) and adding Place, A-1, and A-2 all together (option 9), showed even lower performance than the baseline. Based on this feature engineering, the Frequency, Duration, Start Time, Partner, and A-1 features are used for the next step.

2.6.4. Parameter Tuning

The performance of SVM is sensitive to parameter settings. Based on the previous studies about parameter tuning (Dong, Cao, & Lee, 2005; Friedrichs & Igel, 2005; C.-L. Huang & Wang, 2006), the gamma and C values of the RBF kernel are tested in this section. The performance is compared by changing the values of gamma and C. In the previous steps, the default parameters of support vector classification (SVC) from the Python Scikit-Learn package were used: the RBF kernel, C as 1, and gamma as auto, which is automatically calculated as 1/number of features. Since the model had five input features, gamma was 0.2.

Figure 2-11. Accuracy by Different gamma Values (a: left, b: right) (y-axis: accuracy; x-axis: gamma, from 0.0001 to 10 in panel a and from 0.1 to 1 in panel b)

First, C is fixed as 1 (default value), and only gamma values are changed as 10^-4, 10^-3, 10^-2, 10^-1, 10^0, and 10^1. As shown in Figure 2-11a, the accuracy is highest (0.6368) when gamma is 1 (10^0), which is slightly higher than the accuracy (0.6361) with the default gamma value (0.2). In Figure 2-11b, gamma values are more finely tested between 0.1 and 1, and the accuracy (0.6405) is highest when gamma is 0.6.

Figure 2-12. Accuracy by Different C Values (y-axis: accuracy; x-axis: C, from 0.1 to 100)

Next, the gamma value is fixed as 0.6, and C values are changed to 10^-1, 10^0, 10^1, and 10^2. As shown in Figure 2-12, the accuracy is highest (0.6405) when the C value is 1 (10^0).
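The tuning order described above (sweep gamma with C fixed, then sweep C with the best gamma fixed) can be sketched as a small search routine. The function `sequential_search` and the scoring function `toy_score` are hypothetical: in practice the score would come from training the SVM with the given gamma and C and measuring its accuracy, whereas the toy surface used here is made up and does not reproduce the accuracies reported in this section.

```python
import math
from typing import Callable, Sequence, Tuple


def sequential_search(
    score: Callable[[float, float], float],
    gammas: Sequence[float],
    cs: Sequence[float],
    default_c: float = 1.0,
) -> Tuple[float, float, float]:
    """Pick gamma with C fixed at default_c, then pick C with that gamma fixed."""
    best_gamma = max(gammas, key=lambda g: score(g, default_c))
    best_c = max(cs, key=lambda c: score(best_gamma, c))
    return best_gamma, best_c, score(best_gamma, best_c)


def toy_score(gamma: float, c: float) -> float:
    # Made-up smooth surface that peaks at gamma = 0.6 and C = 1.
    return 0.64 - 0.01 * (gamma - 0.6) ** 2 - 0.001 * math.log10(c) ** 2


gammas = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
cs = [0.1, 1.0, 10.0, 100.0]
best_gamma, best_c, best_score = sequential_search(toy_score, gammas, cs)
print(best_gamma, best_c, best_score)
```

A sequential sweep evaluates far fewer combinations than a full grid, at the cost of possibly missing an optimum that only appears when gamma and C change together.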
With C value 1 and gamma value 0.6, the sigmoid, linear, and polynomial kernels are tried, but the sigmoid kernel shows very low accuracy (0.2434), and the linear and polynomial kernels are much more computationally expensive than the RBF kernel. Therefore, the RBF kernel with C value 1 and gamma value 0.6 is used for the final SVM model. The final performance has Accuracy 0.6405, which is slightly higher than the accuracy of the baseline SVM (0.6382), and Precision 0.61, Recall 0.64, and F1-score 0.60, which are almost the same as the baseline. Since the default parameter settings were already compatible with the given dataset, the final parameter tuning result showed similar performance to the baseline.

Figure 2-13 and Figure 2-14 display the confusion matrix of actual activity and predicted activity. In the matrix, cells on the main diagonal (top left to bottom right) represent the correct predictions, while cells off the diagonal represent incorrect predictions. Actual activities are on the Y-axis and predicted activities are on the X-axis. The sum of each row in Figure 2-13 is the total number of each activity in the testing set, and similarly, the sum of each row in Figure 2-14 is 1 (100%) for each activity. For example, 2133 instances (71%) of BB03 are correctly predicted as BB03, but 435 instances (14%) of BB03 are incorrectly predicted as LL01.

Figure 2-13. Confusion Matrix: Comparison of Actual vs. Predicted Numbers
** Numbers are from the testing set, which is 30% of the whole dataset

Figure 2-14. Confusion Matrix: Comparison of Actual vs. Predicted Accuracy

Confusion matrices can be used to identify problematic cells, where the ML algorithm has less power to distinguish the differences between classes. Table 2-10 summarizes some problematic cells with error numbers higher than 400 or error rates higher than 50%.
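The selection rule for problematic cells (error number above 400 or error rate above 50%) can be sketched as follows, with actual activities as rows and predicted activities as columns. The function name `problematic_cells` and the small three-class matrix are illustrative assumptions; the counts are made up and are not the values of Figure 2-13.

```python
from typing import List, Sequence, Tuple


def problematic_cells(
    matrix: Sequence[Sequence[int]],
    labels: Sequence[str],
    min_count: int = 400,
    min_rate: float = 0.5,
) -> List[Tuple[str, str, int, float]]:
    """Return off-diagonal cells whose count or row-wise rate exceeds a threshold."""
    flagged = []
    for i, actual in enumerate(labels):
        row_total = sum(matrix[i])
        if row_total == 0:
            continue  # no instances of this actual class
        for j, predicted in enumerate(labels):
            if i == j:
                continue  # diagonal cells are correct predictions
            count = matrix[i][j]
            rate = count / row_total
            if count > min_count or rate > min_rate:
                flagged.append((actual, predicted, count, round(rate, 2)))
    return flagged


labels = ["BB03", "EF01", "LL01"]
matrix = [
    [2100, 50, 450],   # actual BB03
    [30, 40, 110],     # actual EF01
    [410, 20, 4400],   # actual LL01
]
for cell in problematic_cells(matrix, labels):
    print(cell)
```

Using both thresholds catches two different failure modes: large classes that lose many instances in absolute terms, and small classes that are misclassified most of the time.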
Actual EF01 (Work for job(s)/research/homework) being predicted as LL01 (Watching Television) is the most problematic, since its error number is 452 (higher than 400) and its error rate is 61% (higher than 50%).

Table 2-10. Problematic Cells in Confusion Matrix
Actual | Predicted | Error Number | Error Rate
BB05 | BB03 | 20 | 0.59
BB07 | BB03 | 329 | 0.52
LL01 | BB03 | 412 | 0.08
BB03 | LL01 | 435 | 0.14
EF01 | LL01 | 452 | 0.61
LL03 | LL01 | 456 | 0.41

Table 2-11 compares the mean values of the numeric attributes and the ratio of the categorical attributes of EF01 and LL01. The mean values of Frequency, Duration, Start Time, and End Time are similar, and the mode value of Partner is Alone (1), although the ratios of the categories are different. It can be inferred that the main factors of the ML algorithm are the numeric attributes and that the algorithm has less power of classification when most of the attribute values are similar. In order to improve overall accuracy by lowering the error numbers and error rates of the problematic cells, additional feature engineering, such as adding features or transforming variables, can be considered in future research.

Table 2-11. Descriptive Analysis for Problematic Cells
Mean Code EF01 LL01 Start Time End Time 909.65 1024.48 846.36 975.05 Frequency Duration 109.67 114.23 2.63 2.59 Ratio Partner 76:19:3 54:42:4
** Partner Ratio is Alone(1) : Household(2) : Non-Household(3)

Based on the confusion matrix, the performance of the model for each activity is calculated as summarized in Table 2-12. Washing, dressing, and grooming (AA01) shows the highest accuracy (0.99), which means this model predicts 99% of AA01 activity correctly. The model predicts Watching television (LL01), Physical care for children (CD01), and Food and drink preparation (BB03) with higher performance. However, the model incorrectly predicts Heating and cooling (BB05), Vehicle repair and maintenance (BB08), and Listening to/playing radio or music (LL02).
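Per-activity scores of this kind can be derived from a confusion matrix with actual classes as rows and predicted classes as columns. The sketch below applies the Precision, Recall, and F1-score definitions one class at a time; the function name `per_class_scores` and the two-class matrix are illustrative assumptions with made-up counts.

```python
from typing import Dict, Sequence


def per_class_scores(
    matrix: Sequence[Sequence[int]], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each class of a confusion matrix."""
    scores = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # actual k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp   # another class, predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


labels = ["X", "Y"]
matrix = [
    [8, 2],  # actual X: 8 predicted as X, 2 predicted as Y
    [1, 9],  # actual Y: 1 predicted as X, 9 predicted as Y
]
scores = per_class_scores(matrix, labels)
print(scores["X"])
```

A class that is never predicted gets zero for all three scores here, which avoids division by zero for very rare activities.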
The number of instances of an activity is also relevant to the model’s predictive performance for that activity, since the model can be trained better with more data.

Table 2-12. Predictive Performance of Each Activity
Code | Description | Accuracy | Precision | Recall | F1-score | Count
AA01 | Washing, dressing, and grooming oneself | 0.99 | 1.00 | 0.99 | 0.99 | 4607
LL01 | Watching TV | 0.82 | 0.62 | 0.82 | 0.71 | 4900
CD01 | Physical care for children | 0.65 | 0.63 | 0.65 | 0.64 | 1527
BB03 | Food and drink preparation | 0.71 | 0.45 | 0.71 | 0.55 | 3003
BB04 | Kitchen and food clean-up | 0.50 | 0.49 | 0.50 | 0.49 | 1031
BB01 | Interior cleaning | 0.28 | 0.35 | 0.28 | 0.31 | 1028
BB07 | Care for animals and pets | 0.14 | 0.47 | 0.14 | 0.22 | 635
LL03 | General computer use | 0.15 | 0.39 | 0.15 | 0.21 | 1101
EF01 | Work for job(s)/research/homework | 0.06 | 0.42 | 0.06 | 0.11 | 742
BB02 | Laundry | 0.06 | 0.31 | 0.06 | 0.10 | 860
CD02 | Physical care for/helping adults | 0.02 | 1.00 | 0.02 | 0.04 | 88
BB06 | Gardening, ponds, pools, and hot tubs | 0.01 | 0.08 | 0.01 | 0.01 | 400
BB05 | Heating and cooling | 0.00 | 0.00 | 0.00 | 0.00 | 34
BB08 | Vehicle repair and maintenance (by self) | 0.00 | 0.00 | 0.00 | 0.00 | 66
LL02 | Listening to/playing radio or music | 0.00 | 0.00 | 0.00 | 0.00 | 113
** Counts are from the testing set, which is 30% of the whole dataset

2.6.5. Subgroup Analysis

To further examine the sensitivity of the model, its performance is tested with different subgroups, which are defined as follows.

● Quantile Subgroups: Quantiles of 2.5%, 5%, 10%, 25%, 75%, 90%, 95%, and 97.5% are calculated, and subgroups are formed from instances whose feature values fall within a given middle range.
- 95%: Instances with all features between the 2.5% and 97.5% range are selected.
- 90%: Instances with all features between the 5% and 95% range are selected.
- 80%: Instances with all features between the 10% and 90% range are selected.
- 50%: Instances with all features between the 25% and 75% range are selected.
● Performance Subgroups: Subgroups are set with activities showing high performance and activities showing low performance.
- High: Activities with high performance (accuracy above 0.49) are included in this group (AA01, LL01, BB03, CD01, BB04).
- Low: More activities with lower performance are added to the High group (High + LL03, BB07, BB02, EF01, CD02). Activities with almost 0 accuracy are excluded (BB06, BB05, BB08, LL02).
● Number of Instances Subgroups: Subgroups are set with activities having high numbers of instances and activities having lower numbers of instances.
- Major: Activities with high numbers of instances are included in this group (LL01, AA01, BB03, CD01, LL03, BB01, BB04, BB02).
- Minor: Additional activities with lower numbers of instances are added to the Major group (Major + EF01, BB07, BB06). LL02, CD02, BB08, and BB05, which each represent less than 1% of the total activity instances, are excluded.

Table 2-13 summarizes the results of the subgroup analysis.

● Quantile Subgroups: The overall accuracy of SVM after the parameter tuning with the whole dataset was 0.6405, and the performance of all quantile subgroups was lower than the overall performance. The result indicates that the model loses its discriminating power when the value ranges of the instances are reduced.
● Performance Subgroups: The accuracy of the high performance subgroup reaches 0.8282, which means the model predicts correctly more than 82% of the time for the activities of washing, dressing, and grooming (AA01), watching television (LL01), physical care for children (CD01), food and drink preparation (BB03), and kitchen and food clean-up (BB04). The accuracy of the low performance subgroup (0.6585) is also higher than the overall accuracy (0.6405). The result shows that when the quality of the data is improved, the model reaches higher performance.
● Number of Instances Subgroups: The accuracy of the major subgroup is 0.7047 and that of the minor subgroup is 0.6438. Both are higher than the overall accuracy.
These subgroups excluded activities with very few instances, and the result shows that the performance of the model can be improved with more training data for each class. Also, the result suggests that the data quality of the activities with few instances might not be good in this dataset.

Table 2-13. Performance of Subgroups
Criteria | Group | Accuracy | Precision | Recall | F1-score | Count
Quantile | 95% | 0.6245 | 0.59 | 0.62 | 0.59 | 59040
Quantile | 90% | 0.6117 | 0.57 | 0.61 | 0.57 | 52715
Quantile | 80% | 0.5918 | 0.55 | 0.59 | 0.55 | 39677
Quantile | 50% | 0.5414 | 0.49 | 0.54 | 0.49 | 16061
Performance | High | 0.8282 | 0.83 | 0.83 | 0.83 | 50009
Performance | Low | 0.6585 | 0.63 | 0.66 | 0.62 | 65060
# Instances | Major | 0.7047 | 0.68 | 0.70 | 0.68 | 60218
# Instances | Minor | 0.6438 | 0.62 | 0.64 | 0.61 | 66152
** Counts are from the whole dataset

The results of the subgroup analysis demonstrate that the performance of the model can be further improved depending on the quality of the data. Its performance with the whole dataset is lower than its performance with only the subgroups with better data quality, since the whole dataset includes outliers and generally poorer quality of data. Depending on the data quality, the model could reach 83% accuracy, and it can be further improved with other datasets of better quality.

Discussion

The Occupant Behavior Prediction Model can predict occupant behavior with overall 64% accuracy for the ATUS dataset, and its accuracy can reach up to 83% for a subgroup of habitual activities. Notably, the model shows 99% accuracy for predicting washing, dressing, and grooming activity and 82% accuracy for predicting watching television activity. Multi-class classification problems are challenging, and achieving high accuracy in them is more difficult than in binary classification problems (Farid, Zhang, Rahman, Hossain, & Strachan, 2014; Guyon & Elisseeff, 2003). The Occupant Behavior Prediction Model is applied for multi-class classification with 15 classes (15 activities).
The result demonstrates high performance for multi-class classification, especially considering that random guessing among 15 equally likely classes would be correct only 6.7% (1/15) of the time.

This model can identify more-habitual activities and less-habitual activities based on the prediction performance of each activity. The model was tested on the ATUS data to predict activities of the general occupants from nationally representative samples. According to the results, people tend to wash, dress, and groom (AA01) as a highly predictable routine, and watch television (LL01) in a predictable pattern. They take care of children (CD01) frequently when the children are in need of their care and help. Food and drink preparation (BB03) and kitchen and food clean-up (BB04) are habitual and predictable behaviors. Interior cleaning (BB01), laundry (BB02), care for adults (CD02) or pets (BB07), general computer use (LL03), and working at home (EF01) are less predictable, meaning they are less habitual behaviors. Heating and cooling (BB05), vehicle maintenance (BB08), and listening to radio/music or playing music (LL02) are very difficult to predict, and therefore they are non-habitual behaviors.

There exist some limitations to this study. The ATUS collects diary data for only one specific day from a respondent and does not ensure that it is a typical day for the respondent. Although this shortcoming is compensated for by the large number of samples collected, another study using occupants’ daily records of multiple days is suggested to identify more precise and specific patterns of occupants’ behavior. Also, while the ATUS records one activity at a time, multiple activities can happen concurrently in reality. For example, people may do laundry while watching television. Thus, the complexity of the activities should be considered when applying this model to another dataset.
The Occupant Behavior Prediction Model incorporates the concept of habit to predict occupant behaviors and identify habitual/non-habitual activities, while previous studies about occupant behaviors have tended to focus more on socioeconomic attributes to predict energy consumption. This novel approach explores the past habitual characteristics of the households, predicts their future behaviors, and identifies their habitual behaviors. Habitual behaviors are more difficult to change, but they are easier to predict. For these activities and behaviors, energy systems need to find efficient control strategies that are suitable for these behaviors rather than trying to change the behaviors. In contrast, less habitual behaviors, which are difficult to predict, might be easier to change, and education or intervention might be more effective on these activities. The result can be used to develop improved occupant schedules and to set specific energy control strategies. Also, the results can be used to develop effective intervention or education for residential occupants. This model will be further applied to examine the geographical patterns of activities (horizontal analysis), and the temporal patterns of activities (vertical analysis) in the following chapters.

DAILY BEHAVIOR PATTERN AND FACTORS AFFECTING OCCUPANT BEHAVIOR IN RESIDENTIAL BUILDINGS

Abstract

Residential occupants have a high degree of energy control, unlike commercial building occupants, which implies that residential energy consumption is significantly influenced by the energy usage-related behaviors of the occupants. This study aims to identify the daily routines of habitual behaviors and activities in residential buildings with diverse methods including clustering, comparative analysis, and Geographical Information System (GIS) using the American Time Use Survey (ATUS) data.
The patterns of occupant energy usage-related activities are identified using K-modes clustering, and the activities are compared from different perspectives including region, day of the week, gender, and job status. The main energy usage-related activities are analyzed with GIS at the state level. The findings include that (1) day of the week, gender, and job status affect the similarities and differences in energy usage-related activities, and (2) watching TV is one of the most common activities in the cluster analysis, and it happens between 18:30 and 21:30. The results can be used to provide more realistic information regarding energy and behavior to the occupants in residential buildings, and they can be applied to new energy and behavior strategies and policies for residential building energy plans.

Introduction

Residential occupants have significant influences on and control over energy consumption. Residential building energy consumption is affected by climate, physical properties of the building, building services and energy systems, appliances in the household, occupant activities and behavior, and the interactions among them (Widén & Wäckelgård, 2010). As the quality of thermal properties is improved and the technology for energy efficient appliances grows more advanced, the energy consumption associated with buildings’ physical properties and appliances is decreasing. For example, the U.S. Department of Energy (DOE) reports that most recently built houses are 14 percent more energy efficient than houses built 30 years ago, and 40 percent more energy efficient than houses built 60 years ago (U.S.DOE, 2015; Zhao et al., 2017). Also, building design standards and requirements are becoming stricter with regard to energy efficiency of buildings and appliances. However, overall building energy consumption has not decreased (Chen et al., 2015).
This sustained energy consumption can be explained by the influence of occupant behavior and living style, and it emphasizes the role of occupant behavior in residential energy savings.

Residential energy consumption can be significantly reduced by changing the energy usage-related behaviors of the occupants. Unlike commercial building occupants, residential occupants have a high degree of energy control. They can control heating, ventilation, and air conditioning (HVAC) systems, lighting and electronic devices, and kitchen and laundry appliances, which are the main causes of energy consumption in residential buildings (Li & Jiang, 2006). In recent years, the relationship between occupant behavior and residential energy consumption has been studied more actively. However, the existing studies have been less focused on the occupants’ habitual activities and daily routines.

The goal of this study is to identify the habitual daily routines of occupant behavior in different groups of occupants, and to find the factors that influence the similarities and differences of energy usage-related activities. To achieve this goal and to solve existing problems, this study (1) defines an occupant behavior prediction model that includes the concept of habit to help understand occupant behavior in a more realistic way by considering past behavior patterns, (2) uses detailed levels of activities for occupant behavior and activity analysis, (3) analyzes the pattern of occupant habitual energy usage-related behavior using the U.S. national behavior data separated out by diverse context, such as by region, day of the week, gender, and job status, and (4) uses GIS to identify if geographical location affects the characteristics of an activity. The identified habitual daily routine of occupant behavior and the factors affecting energy usage-related activities can be used to set more realistic occupant schedules for energy control strategies or energy simulation in residential buildings.
The structure of this paper is as follows. First, the background section explains the Occupant Behavior Prediction Model, the use of the American Time Use Survey (ATUS) dataset which records participants’ daily activity logs in a national survey, and the use of Geographic Information System (GIS) as a research method to geographically explain and analyze datasets. Then, the methodologies for clustering analysis, comparative analysis, and GIS techniques are described, and finally, the results are explained and discussed.

Background

3.2.1. Occupant Behavior Prediction Model

Behavior refers to occupants’ activities, including introspectively observable activities, objectively observable activities, and non-conscious processes responding to internal or external stimuli (APA, 2018). In this research, “behaviors” refers to broader activities as explained, and “activities” refers to a narrower range of objectively observable activities. In addition, “habitual behavior” refers to a behavior influenced by habits.

The Occupant Behavior Prediction Model aims to predict occupant behaviors and identify habitual activities. The model incorporates the concept of habit with the components of activity frequency and context. Context includes time (start time, end time, duration), place, and situation (partner, weather, other circumstances). These components are derived from habitual behavior studies to measure the strength of habit in occupant behavior. The predicted occupant activities and the identified habitual and non-habitual activities can be used for efficient building operation and control strategies, more effective interventions, or education on occupant energy usage-related behaviors. In this chapter, this model will be used to identify the habitual daily routines of occupants and the factors affecting their energy usage-related activities.

3.2.2. Use of the ATUS in Occupant Behavior Studies

The American Time Use Survey (ATUS) is a survey conducted by the U.S.
Bureau of Labor Statistics every year. The purpose of the survey is to record the activities, locations, and demographic information of the respondents on a regular day from 4 AM to 4 AM of the next day (Diao et al., 2017). The ATUS provides (1) population measurement and (2) participant measurement. The population measurement provides the average time of an activity for a particular population. The participant measurement estimates the average time spent on an activity per day (Diao et al., 2017). While the time use surveys conducted in other countries, such as the United Kingdom and Sweden, require respondents to record their activity at 5- or 10-minute intervals, the ATUS asks participants to report the start and end times of an activity. The activities in the ATUS data are in a hierarchical tree structure with 3 tiers. The 1st tier consists of overall categories of activities, the 2nd tier consists of intermediate categories of activities, and the 3rd tier contains the most detailed activities.

The ATUS data have been used in many behavioral studies since the ATUS records detailed daily diaries for each respondent with activities, times, places, partners, and so on. In addition, the ATUS provides the respondent’s socioeconomic information which supports behavior data analysis. Some examples of past analyses are as follows.

Johnson et al. (2014) presented a statistical model for the behavior of residential occupants with the ATUS data. They specified ten simplified activities from the 1st and 2nd tiers of activities, which correspond to the major energy-consuming appliances in a household. Then, they developed time-based statistical models by different occupant types (working male occupant, working female occupant, non-working male occupant, non-working female occupant, and child occupant) using the Markov chain method. The models were applied to energy simulations to show how a residential occupant affects major energy consumption during a day.
Diao et al. (2017) identified and classified occupant behavior with energy consumption outcomes. They used 8-17 activities from the 1st tier and 2nd tier ATUS activities. They derived occupant behavior patterns using K-modes clustering, and extracted occupant features from the 2009 ATUS data. These were combined with demographic-based probability neural networks (PNN) to identify 10 behavior patterns.

Aksanli et al. (2016) developed a residential energy modeling framework based on human activities to estimate the energy consumption in residential buildings. They used seven simplified activities: sleeping, personal grooming, cooking, cleaning, entertainment, working at home, and going to work, which are derived from the 1st tier activities in the ATUS. They extracted action- and activity-related parameters from the ATUS, such as the duration of each activity by user group, where the groups are based on the demographic information of the occupants, such as age, gender, job status, and number of household members. These parameters were applied to a probabilistic model to capture the time-series characteristics of occupant behavior.

While most of the current studies used the 1st tier or 2nd tier activities, this study uses the 3rd tier activity list to provide more detailed and realistic behavioral analysis. In particular, all of the original 3rd tier activities are included to identify the habitual daily routines of occupants, and the main energy usage-related activities are directly selected from the 3rd tier activities in this study.

3.2.3. Use of GIS in Building/Construction Studies

Geographic Information System (GIS) has been used often in research. GIS is a useful cognitive tool for gathering and analyzing spatial data, and its visual interface can help experts from other areas understand the data easily (Fonseca & Schlueter, 2015). It allows the researchers and other stakeholders to quickly identify data patterns and outliers (Kolter & Ferreira Jr, 2011).
GIS is used not only as a tool to display data on the map, but also as a method to analyze data by geographical location.

Recently, building and other construction fields have been actively employing GIS as a part of their research methods as well, since GIS can capture, store, analyze, manage, and present spatial or geographic data including not only the energy or construction related data but also the location (e.g., address, city, state) and physical properties of buildings (e.g., size, height) (Ma & Cheng, 2016). Also, city-wide GIS databases have become available in many regions of the world and accessible to the general public (Reinhart & Davila, 2016). GIS also has the potential to be combined with 3D models of buildings, energy simulations, and real-time databases at a large geographical scale. Some examples are as follows.

Fonseca and Schlueter (2015) developed an integrated model for building energy consumption patterns in city districts. They used spatial analysis, dynamic building energy modeling, and energy mapping combined with GIS. The model focused on determination of the spatiotemporal variability of energy services in existing and future buildings in commercial, residential, and industrial sectors. It provided detailed assessments of potential energy efficiency measures at the city district scale.

Howard et al. (2012) developed a model to estimate the energy end-use intensity (EUI) in the building sector for space heating, cooling, domestic hot water, and appliances in New York City. They assumed that energy consumption primarily depended on building functions (residential, office, educational, etc.) and not on the physical characteristics of a building (construction type, age of building, etc.). The end-use ratios were obtained from the Microdata of the Residential Energy Consumption Survey (RECS) and the Commercial Building Energy Consumption Survey (CBECS), and they estimated the energy consumption.
The modeled energy usage and the percent differences between the measured and predicted consumption were calibrated at the ZIP code level and displayed on a map using GIS.

Heiple and Sailor (2008) presented a technique to estimate hourly and seasonal energy consumption profiles in buildings at detailed spatial scales, tax lots, or parcels. They combined a GIS framework and annual building energy simulations for city-specific prototypes. They applied the method to Houston, TX, and the result can be used to estimate sensible and latent waste heat emissions related to building energy consumption.

Kolter and Ferreira (2011) suggested a system to model end use energy consumption in residential and commercial buildings in Cambridge, MA with a data-driven approach. They combined monthly electricity and gas bill data, tax assessor records, and the GIS database containing polygonal outlines and estimated roof heights for buildings and parcels in the city. They predicted energy distributions using both parametric and non-parametric methods, and provided a system that visualized each building’s energy consumption in the city using GIS.

Most of the existing building and construction studies using GIS have focused more on building energy consumption, physical building properties, or demographic information of occupants. However, this study combines GIS with the Occupant Behavior Prediction Model, and uses spatial analysis (the grouping analysis with K-means clustering) to explain the similarities and differences in energy usage-related behaviors of residential occupants by state.

Methodology

The behaviors and activity patterns in the ATUS data will be analyzed in the following parts.

● Part 1
- Data Selection: In this part, all of the ATUS activity data are used. All recorded activities from all places are included – neither limited to energy usage-related activities nor limited to the activities at home.
- Instance: One row of the dataset is one respondent’s daily activities in one-minute intervals during 24 hours from 4:00 AM to 4:00 AM the next day.
- Expected Outcome: Identify different groups of occupants based on their daily behavioral patterns using clustering analysis.
● Part 2
- Data Selection: In this part, only energy usage-related activities at home are analyzed. Also, the data format is different from the format used in Part 1.
- Instance: One row of the dataset is one energy usage-related activity with its respondent code, region, day of the week, gender, job status, and properties of the activity including Frequency, Duration, Start Time, End Time, and Partner.
- Expected Outcome: Identify similarities and differences in energy usage-related activities by the clusters in Part 1, region, day of the week, gender, and job status using comparative analysis.
● Part 3
- Data Selection: Among the energy usage-related activities in Part 2, the five most habitual activities are selected.
- Instance: One row of the dataset is similar to an instance in Part 2. Instances in this part include state information.
- Expected Outcome: Identify geographical similarities and differences between selected energy usage-related activities using GIS.

In all parts, the original HH:MM format of Start Time and End Time is converted to a numeric minute format, counted from midnight. For example, 13:10 is converted to 790 minutes (13 × 60 + 10). The detailed process is further explained in the following subsections.

3.3.1. Clustering of Occupant Daily Activities by Time

Clustering methods are used to identify distinctive groups of occupants based on their daily activities. The ATUS data is pre-processed for the clustering analysis, and K-modes clustering is selected based on the data type. For clustering of occupants, national level occupant data is used without segmentation by state.

3.3.1.1.
Data Preparation

The original ATUS data recorded the activities of an occupant as a sequential list with activity names (codes), places, partners, start times, and end times. In this chapter, all activities (not only limited to energy usage-related activities) from all places (not only limited to activities at home) are included.

For the clustering analysis, the data are re-organized with the features (columns) of one-minute interval timestamps and the instances (rows) of occupants. This standardizes the data format of occupant activities by the same timestamps and helps the clustering algorithm identify the pattern of activities more clearly. The number of features is 1440 (1440 minutes per day), and the number of instances is 10772 (10772 respondents). The sample inputs are described in Table 3-1. All of the 465 original 3rd tier activities in the ATUS data are included in this clustering analysis with the initial 3rd tier activity codes in a numeric format. For example, sleeping is coded as 010101, grooming as 010299, cleaning as 020101, working as 050101, watching TV as 120303, and cooking as 150201.

Table 3-1. Sample Inputs for Clustering Analysis
 | t1 | t2 | t3 | … | t1438 | t1439 | t1440
Person 1 | 010101 | 010101 | 010299 | … | 010101 | 010101 | 010101
Person 2 | 150201 | 150201 | 150201 | … | 020101 | 120303 | 120303
… | | | | | | |
Person 10772 | 010299 | 050101 | 050101 | … | 010299 | 010101 | 010101

3.3.1.2. K-modes Clustering

Clustering divides a set of instances into a number of groups (clusters) so that instances in the same cluster are similar to each other and different from those in other clusters. K-means clustering is one of the most common clustering algorithms for numeric values. However, another approach is necessary for categorical data. K-modes clustering employs a simple matching dissimilarity measure which is suitable for categorical data. It uses the modes of the clusters instead of means, and updates modes in the clustering to minimize the cost function using a frequency-based method (Z.
Huang, 1998). K-modes clustering uses a function minimizing cluster distance as follows (Diao et al., 2017).

$$D(X, C) = \sum_{k=1}^{K} \sum_{X_i \in C_k} d(X_i, C_k)$$

Where D (X, C) is the sum of within-cluster distance. X = {X1, X2, X3, …, Xn} is the dataset with n instances with a vector of categorical attributes {A1, A2, A3, …, Am}, and Xi is the ith instance of X. C = {C1, C2, C3, …, Ck} is the centers of K different clusters, and Ck is the center of the kth cluster. d (x, c) is the distance function for calculating the distance between two categorical vectors. As described in Table 3-1, the inputs are categorical data, and K-modes clustering is selected for the clustering of the occupant activities.

To determine a suitable number of clusters, the Bayesian information criterion (BIC) is selected. BIC is explained as follows (Jain, 2010; Kodinariya & Makwana, 2013).

$$\mathrm{BIC} = -2 \times \ln(\text{likelihood}) \ldots$$

[…]

… | … | 1 | 0: None, 1: 1-4, 2: 5-9, 3: >=10

Table 5-10 (cont’d)
(2) Additional General Appliances
STOVEN | Number of stoves | 1.0 | Number of stoves
TOAST | Toaster used | 1 | 1: Yes, 0: No
TOASTOVN | Toaster oven used | 0 | 1: Yes, 0: No
CROCKPOT | Crockpot or slow cooker used | 0 | 1: Yes, 0: No
FOODPROC | Food processor used | 0 | 1: Yes, 0: No
RICECOOK | Rice cooker used | 0 | 1: Yes, 0: No
BLENDER | Blender or juicer used | 0 | 1: Yes, 0: No
DISHWASH | Have dishwasher | 1 | 1: Yes, 0: No
CWASHER | Have clothes washer in home | 1 | 1: Yes, 0: No
DRYER | Have clothes dryer in home | 1 | 1: Yes, 0: No
PLAYSTA | Number of video game consoles | 1.1 | Number of video game consoles
DVD | Number of DVD players | 1.1 | Number of DVD players
VCR | Number of VCRs | 0.3 | Number of VCRs
NUMLAPTOP | Number of laptop computers | 1.5 | Number of laptop computers
NUMTABLET | Number of tablet computers | 1.6 | Number of tablet computers
ELPERIPH | Number of printers, scanners, fax machines, or copiers | 0.9 | Number of printers, scanners, fax machines, or copiers
CELLPHONE | Number of other cell phones | 0.4 | Number of other cell phones
MOISTURE | Humidifier used | 0 | 1: Yes, 0: No
NUMWHOLEFAN | Number of whole house fans used | 0.1 | Number of whole house fans used
LGTINCAN | Portion of inside light bulbs that are incandescent | 4 | 1: All, 2: Most, 3: About half, 4: Some, 0: None
LGTINCFL | Portion of inside light bulbs that are CFL | 4 | 1: All, 2: Most, 3: About half, 4: Some, 0: None
LGTINLED | Portion of inside light bulbs that are LED | 0 | 1: All, 2: Most, 3: About half, 4: Some, 0: None
ESCWASH | Energy Star qualified clothes washer | 1 | 1: Yes, 0: No
ESDISHW | Energy Star qualified dishwasher | 1 | 1: Yes, 0: No
ESDRYER | Energy Star qualified clothes dryer | 1 | 1: Yes, 0: No
ESFREEZE | Energy Star qualified freezer | -2 | 1: Yes, 0: No, -2: N/A
ESFRIG | Energy Star qualified refrigerator | 1 | 1: Yes, 0: No
ESLIGHT | Energy Star qualified lightbulbs | 1 | 1: Yes, 0: No
ESWATER | Energy Star qualified water heating | 0 | 1: Yes, 0: No
ESWIN | Energy Star qualified windows | 0 | 1: Yes, 0: No

Table 5-10 specifies the average numbers or mode values of appliances that the selected households own. It provides a list of common appliances and their numbers in the households that are similar to the sensor data sample. The table has 2 sections: (1) appliances that are selected features for the effective prediction of electricity usage (Mo, 2018)(Chapter 4), and (2) additional common appliances in general households.

Discussion and Conclusion

The features derived from the Occupant Behavior Prediction Model predicted residential appliances with 96% accuracy using a Decision Tree algorithm. It implies that the daily appliance usage and associated activities of a household are quite well patterned, and can be precisely predicted. This information can be further used to set efficient energy saving strategies for a household by analyzing the impact and usage time of the appliances and associated activities.

Clustering analysis provided further energy consumption characteristics of the households by identifying and analyzing the days and times when energy was used.
Daily energy consumption at minute-level intervals is clustered into 4 groups, which are mainly influenced by CDD and HDD (hot, warm, cool, and cold days). It shows that energy consumption for cooling and heating has a strong influence on total energy consumption in residential buildings. Also, the minute-level daily energy consumption can be used for detailed energy strategies for households with conditions similar to this testbed.

Additional descriptive analysis of the ATUS and the RECS supplemented the sensor-measured data by providing detailed activity schedules and appliance lists from households that were similar to the household in the sensor data sample.

There are also some limitations to the datasets. First, the sensor does not identify appliances exactly. It identifies the differences between appliances, but it does not precisely recognize if, for example, an appliance is a coffee maker or a toaster. It still requires appliance names to be manually identified. As a result, the appliances from the sensor data have arbitrary names such as Appliance 12, Heating Element 7, etc. Since the names of the appliances are important to estimate the activities associated with them, the ambiguous appliance names are a barrier to precisely predicting activities by time. Once this limitation is resolved, activities can be predicted with other methodologies such as machine learning.

The individual datasets, the ATUS, the RECS, and the sensor-measured data, have different data formats. The ATUS collects activities from several respondents, and records a daily diary from a single person doing one activity per timestamp during one specific day. It lacks a record of other household members’ activities and simultaneous activities. While the ATUS data reflect individual activities with state-level demographic information, the RECS data are household-level with census division–level demographic information.
In this study, the 2015 RECS and the 2015 ATUS are used, as 2015 was the most recent matching year when this study started. Although the ATUS collects individual time-series activity data and the RECS collects household yearly survey data, many studies extract the structures and important concepts from both datasets and use them for further analysis.

Future research will conduct more refined behavior prediction. Once the appliance names are clearly identified with enhanced nonintrusive load monitoring (NILM) and manual detection in the sensor-measured event data, the time-series appliance data can be mapped to the time-series electricity usage. More advanced methodologies can then be applied to these minute-level datasets to predict time-series activities more precisely.

SUMMARY AND CONCLUSION OF THE RESEARCH

Summary of Research

The purpose of this research is to identify a relationship between energy consumption and occupant behavior in detail and with consideration of building technology, and to build a model that predicts behavior from energy data using machine learning approaches, which can potentially be used to create efficient building operation and control strategies. At the beginning of the study, the Occupant Behavior Prediction Model was developed, and this model was applied to national survey data (the ATUS) and to the sensor-measured data (Figure 6-1). In order to define the structural relationship between occupant behavior, building technology, and energy usage, the ATUS, RECS, and sensor data are analyzed with several methods, including machine learning classification, numeric prediction, and clustering (K-modes and K-means). GIS was also used for spatial analysis and geographical representation.

[Figure 6-1 elements: Behavior Predictive Model; National Data (ATUS); National Data (RECS); Measured Data (SMAPPEE); Frequency; Activity; Energy Usage; Appliance; Context (Time, Place, Situation)]
Figure 6-1.
Summary of the Research

Summary of Findings

In Part I (Chapter 2), the Occupant Behavior Prediction Model was developed based on habitual behavior studies. The model was applied to the ATUS data, and the findings include the prediction of occupant energy usage–related activities and behaviors with the components of the Occupant Behavior Prediction Model, and the identification of habitual energy usage–related activities. The Occupant Behavior Prediction Model can predict occupant behavior with 64% overall accuracy for the ATUS dataset, and its accuracy can reach up to 83% for a subgroup of habitual activities. Notably, the model shows 99% accuracy for predicting washing, dressing, and grooming activity and 82% accuracy for predicting watching television activity.

In Part II (Chapter 3), occupant clusters’ daily routines of activities by time are derived from the ATUS data, and the time ranges of major habitual energy usage–related activities are identified. In the latter sections of this chapter, the influences of major factors (occupant clusters, day of the week, gender, region) on habitual energy usage–related activities are identified. GIS analysis identified the geographical pattern of the selected energy usage–related activities. Watching TV is one of the most habitual energy usage–related activities, and it is included in 5 clusters. Based on the time range shared by these 5 clusters, occupants watch TV from around 18:30 to 21:30, which means that one of the most habitual energy usage–related activities strongly tends to occur during this period. Day of the week, gender, and job status strongly influence the differences in energy usage–related activities.

In Part III (Chapter 4), features with significant impacts on energy consumption are selected from the RECS data, and energy consumption is predicted with the selected features. The model’s prediction performances with all features vs.
with selected features are compared, and the effectiveness of the selected features is measured. The findings include the features that efficiently predict energy consumption, and the prediction of total energy consumption in residential buildings. The main selected features are refrigerator, freezer, oven, and television from the Appliance category; TV, clothes dryer, and swimming pool usage from the Behavior category; housing type and number of rooms from the Technology category; and number of household members and number of young (under 18 years old) household members from the Demographic category. The 32 selected features predict total electricity consumption with 78% accuracy, which is close to the 80% accuracy obtained with all 271 features. The selected features therefore retain about 98% of the prediction power of the full feature set (78/80 ≈ 0.98).

In Part IV (Chapter 5), the Occupant Behavior Prediction Model is applied to the household sensor-measured dataset. This chapter synthesized the findings from Parts I through III. Using machine learning approaches, appliances and associated activities are predicted from electricity consumption data. The findings are as follows. The appliance names are predicted with 96 percent accuracy based on the Occupant Behavior Prediction Model using a Decision Tree (DT) algorithm. Daily energy consumption at minute-level intervals is clustered into 4 groups, which are mainly influenced by CDD and HDD (hot, warm, cool, and cold days). This shows that energy consumption for cooling and heating has a strong influence on total energy consumption in residential buildings. In addition, the minute-level daily energy consumption can inform detailed energy strategies for households with conditions similar to those of this testbed.

Contributions

Unlike existing studies, which focused on predicting energy consumption based on occupant behavior, this study developed the reverse prediction model: predicting occupant behavior based on energy consumption.
This model is valuable in that it provides more detailed and precise occupant behavior patterns, including the daily schedule and the habitual characteristics of the occupant activities. The contribution of this research is described in Figure 6-2. The structured list and model from each research step will reveal detailed and dynamic interactions between occupant behavior, energy usage, and building technology. These findings can contribute to three different groups: residential occupants, the energy industry, and researchers.

[Figure 6-2 elements: Research Findings (Structural List of Occupant Behavior; Framework of OB, Energy, Tech; Predictive Model of Occupant Behavior); Residential Occupants: Understand Behavior & Save Energy; Energy Industry: Improve Measurement & Strategies; Researchers: Improve Analytic Analysis]
Figure 6-2. Research Contributions

First, this research will have an impact on residential occupant behavior by helping occupants better understand their own behaviors’ effects on energy usage and detect what changes would improve energy efficiency in their homes, since the Occupant Behavior Prediction Model can explain the unique behavior pattern of each household based on its energy consumption data. The detailed breakdown of energy consumption will explain occupants’ behavior patterns of heating, cooling, and appliance usage. It will then identify their most energy-consuming behaviors, which will help with setting effective energy saving strategies for the residential building. In addition, the analysis will indicate which appliances use the most energy and help occupants select energy-efficient appliances.

Second, the findings will be beneficial to energy-related industries. The Occupant Behavior Prediction Model can be applied to energy sensors and energy dashboards to improve measurement and analysis strategies.
The most important behavior factors will improve the strategies around residential energy monitoring sensor development and placement by providing guidance on what to measure, which factors to focus on, and how to measure them. For example, the findings indicate where to install energy monitoring sensors to collect the information that is critical for analyzing occupant behavior and energy consumption. Furthermore, the model can be used to optimize heating, cooling, and appliance schedules.

Third, the structured list of behaviors will enhance the methodology for future building energy research areas, such as statistical analysis, case studies, and energy simulation. The model will deepen the understanding of occupant behavior with regard to residential energy usage, and will improve the analysis of energy and occupant behavior. In addition, the findings will be beneficial for national energy and time-use surveys, such as the RECS and the ATUS. They can be further used to develop more meaningful energy policy.

Intellectual Merit

Most occupant behavior studies used occupant activities and behaviors to predict energy consumption. However, this research approached the topic in a novel way, reversing past studies’ approaches and using the concept of habit to predict occupant behaviors from energy consumption data. This research developed the Occupant Behavior Prediction Model, which has the potential to be used for efficient energy control strategies, occupant interventions, and education for energy savings. The model showed high performance when predicting occupant activities and behaviors, which was verified with two different types of datasets: the national survey data and the sensor-measured energy consumption data of a specific household. The study can be scaled up to larger datasets from households throughout a city or state.

Broad Impacts

The machine learning approaches used in this study can be applied to a wide range of other studies.
In particular, the Occupant Behavior Prediction Model, which combines the concept of habit formation with the methods of machine learning, has strong potential to be applied to various research fields. In the building and construction domain, this model can be extended to other energy types, occupant types, and building types in the near future. In broader domains, this model has the potential to be further integrated with research in psychology, sociology, economics, and other fields.

Limitations

There are some limitations in this study, mainly related to the datasets. First, the sensor did not identify appliances exactly. It recognized that different appliances were separate, but could not identify what a given appliance actually was, for example, whether a small appliance was a coffee maker or a toaster. It still required manual identification of the appliance names. As a result, the appliances in the sensor data have arbitrary names such as Appliance 12, Heating Element 7, etc. Because the appliance names are important for estimating the activities associated with them, the ambiguous names are a barrier to precisely predicting activities by time. Once this limitation is resolved, activities can be predicted with other methodologies such as machine learning.

Second, the individual datasets (the ATUS, the RECS, and the sensor-measured data) have different data formats. The ATUS collects activity data from multiple respondents and records a daily diary for each person, with one activity per timestamp during one specific day. It lacks other household members’ activities and simultaneous activities. While the ATUS data record individual activities with state-level demographic information, the RECS data are household-level data with census division–level demographic information. In this study, the 2015 RECS and the 2015 ATUS are used so that the data collection years are consistent, as 2015 was the most recent matching year when this study started.
Since then, the 2017 RECS started to record state information, which will help with a more precise regional comparison between the ATUS and the RECS. Although the ATUS is individual time-series activity data and the RECS is household yearly survey data, many studies extract the structures and important concepts from these datasets and use them together for further analysis.

Future Research

Future research will conduct more refined behavior prediction using data of improved quality. Future research will also include more sensor-measured data from residential buildings: other types of energy, more building technologies including heating, cooling, and ventilation systems (HVAC), and indoor environmental quality (IEQ). The model and data analysis methods can be expanded to fit larger areas and other types of buildings (commercial buildings, educational buildings, etc.).

Refining Behavior Prediction

Future research will conduct more refined behavior prediction. Once the appliance names are clearly identified with enhanced nonintrusive load monitoring (NILM) and manual detection in the sensor-measured event data, the time-series appliance data can be mapped to the time-series electricity usage. More advanced methodologies can then be applied to these minute-level datasets, and with more detailed data, they can predict time-series activities more precisely.

Including Gas and Water Consumption

This research concentrated on occupant activities and behaviors that affected electricity consumption and appliances. Future research will include occupant behaviors’ effects on gas and water consumption, and will include renewable energy production (such as residential solar energy production with photovoltaic panels) if applicable.
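
The mapping step described under Refining Behavior Prediction, in which time-series appliance events are aligned with minute-level electricity usage, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in this research: the event format (appliance name, start minute, end minute), the function names, the toy readings, and the even split of a reading among overlapping appliances are all hypothetical.

```python
from collections import defaultdict


def label_minutes(events, minutes):
    """Return, for each minute, the set of appliances that are switched on.

    `events` holds (appliance_name, start_minute, end_minute) tuples with
    the start inclusive and the end exclusive (hypothetical format).
    """
    active = [set() for _ in range(minutes)]
    for name, start, end in events:
        for minute in range(max(start, 0), min(end, minutes)):
            active[minute].add(name)
    return active


def usage_by_appliance(events, usage_per_minute):
    """Sum the metered usage over the minutes in which each appliance is on.

    When several appliances overlap, the reading for that minute is split
    evenly among them. This is a simplifying assumption, not a NILM method.
    """
    active = label_minutes(events, len(usage_per_minute))
    totals = defaultdict(float)
    for names, reading in zip(active, usage_per_minute):
        for name in names:
            totals[name] += reading / len(names)
    return dict(totals)


# Toy data: a 10-minute window instead of a full 1,440-minute day.
events = [("Appliance 12", 2, 5), ("Heating Element 7", 4, 8)]
usage = [1, 1, 6, 6, 21, 16, 16, 16, 1, 1]  # energy units per minute
totals = usage_by_appliance(events, usage)
print(totals)
```

Minutes in which no appliance is active (the baseline load in the toy data) are left unattributed. Once appliance names are reliably identified, the per-minute labels returned by `label_minutes` could serve as targets for a time-series classifier.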
Applying to HVAC Systems and IEQ

While this research focused on appliance usage and the associated occupant behaviors, future research will apply the Occupant Behavior Prediction Model to heating, cooling, and ventilation systems and to indoor environmental quality criteria, including thermal comfort, lighting, noise, and air quality.

Expanding to Broader Area for Measured Data

Future studies will collect more measured data on occupant behaviors and energy consumption at the city or state level. Data collection will include more detailed geographical location information, which will enable more precise GIS analysis.

Expanding to Other Types of Buildings and Occupants

The Occupant Behavior Prediction Model can be expanded to occupants in other types of buildings (commercial buildings, educational buildings, facilities for the elderly, etc.), and the results can contribute to improving their energy strategies by taking into account occupant behavior in their facilities.

APPENDICES

APPENDIX A. GIS Analysis for Main Activities: All Maps

Figure A-1. AA01 State Clusters by Grouping Analysis
Figure A-2. AA01 Frequency by Quantiles
Figure A-3. AA01 Duration by Quantiles
Figure A-4. AA01 Start Time by Quantiles
Figure A-5. AA01 End Time by Quantiles
Figure A-6. AA01 Partner
Figure A-7. LL01 State Clusters by Grouping Analysis
Figure A-8. LL01 Frequency by Quantiles
Figure A-9. LL01 Duration by Quantiles
Figure A-10. LL01 Start Time by Quantiles
Figure A-11. LL01 End Time by Quantiles
Figure A-12. LL01 Partner
Figure A-13. CD01 State Clusters by Grouping Analysis
Figure A-14. CD01 Frequency by Quantiles
Figure A-15. CD01 Duration by Quantiles
Figure A-16. CD01 Start Time by Quantiles
Figure A-17. CD01 End Time by Quantiles
Figure A-18. CD01 Partner
Figure A-19. BB03 State Clusters by Grouping Analysis
Figure A-20. BB03 Frequency by Quantiles
Figure A-21. BB03 Duration by Quantiles
Figure A-22.
BB03 Start Time by Quantiles
Figure A-23. BB03 End Time by Quantiles
Figure A-24. BB03 Partner
Figure A-25. BB04 State Clusters by Grouping Analysis
Figure A-26. BB04 Frequency by Quantiles
Figure A-27. BB04 Duration by Quantiles
Figure A-28. BB04 Start Time by Quantiles
Figure A-29. BB04 End Time by Quantiles
Figure A-30. BB04 Partner

APPENDIX B. Descriptive Analysis for Activities: Full Tables

Table A-1. Mean and CV of Activities by Cluster
Mean (Mode) CV Start 662.09 911.07 958.78 819.86 29.90 41.71 36.46 27.35 22.86 13.45 68.24 11.96 67.53 21.13 23.00 End Freq Dur/a Dur/d 687.09 51.51 1.88 952.78 47.57 1.13 982.47 43.48 1.30 847.22 37.77 1.49 26.06 1019.70 1042.57 1.16 858.79 19.24 872.25 1.35 950.53 1018.77 75.71 1.12 736.01 724.05 18.19 1.60 78.05 821.42 888.95 1.11 905.08 883.96 44.31 2.17 707.52 684.52 1.13 27.29 2.76 119.99 222.08 819.13 928.85 93.41 128.16 1112.22 1171.92 1.52 75.66 1072.90 1092.81 61.05 1.26 71.57 1010.24 1027.30 54.00 1.19 830.36 50.13 32.83 1.62 812.98 1.23 65.17 76.46 808.13 749.46 843.05 70.00 55.76 1.56 796.20 844.57 52.70 34.32 1.65 812.72 939.22 38.18 29.78 1.33 917.94 796.44 48.19 46.31 1.13 750.13 1.13 73.46 82.12 858.67 785.21 824.68 23.12 15.01 1.70 816.13 897.70 99.05 88.27 1.17 809.43 935.31 24.88 3.05 65.85 910.43 1.95 31.93 71.08 780.18 748.25 867.91 1.52 893.67 86.12 123.52 981.30 1011.05 2.45 162.92 338.39 854.77 999.71 73.23 1.15 83.76 1.33 81.54 119.81 901.68 900.09 738.04 731.64 53.33 32.24 1.82 806.92 752.67 64.11 59.92 1.08 860.23 820.63 48.99 39.59 1.36 802.72 792.07 37.38 27.74 1.49 1.32 25.09 30.93 952.59 966.02 936.86 894.71 42.14 42.14 1.00 873.14 783.27 93.65 89.86 1.05 774.23 768.90 14.44 1.77 24.05 1.20 91.40 102.07 811.87 903.27 799.39 794.07 47.72 26.04 2.06 467.50 447.50 1.00 20.00 20.00 955.77 2.81 133.73 264.40 925.90 81.35 110.13 1020.23 1.48 819.75 92.67 685.54 814.49 63.06 1.20 853.55 974.37 83.12 60.11 1.28 862.88 842.77 1.75 55.12 34.01 825.17 718.53 1.34 108.83
136.88 1.33 68.11 749.33 799.12 791.41 757.83 61.99 1.89 1.40 41.30 899.46 917.23 57.48 35.69 29.91 Partner Freq Dur/a Dur/d Start End Partner 0.00 0.39 0.37 0.39 0.38 0.44 0.38 0.37 0.44 0.10 0.18 0.54 0.35 0.40 0.40 0.00 0.38 0.37 0.41 0.41 0.45 0.40 0.34 0.45 0.12 0.13 0.40 0.40 0.37 0.41 0.00 0.37 0.33 0.41 0.44 0.55 0.35 0.23 0.46 0.10 0.00 0.47 0.41 0.54 0.44 0.00 0.40 0.40 0.41 0.40 0.64 0.41 0.39 1.04 0.34 0.33 1.08 0.31 0.32 0.80 0.36 0.36 0.74 0.25 0.24 0.83 0.40 0.39 0.92 0.24 0.24 1.20 0.44 0.43 1.02 0.39 0.40 1.13 0.36 0.35 1.55 0.58 0.57 0.93 0.35 0.29 0.66 0.18 0.22 1.07 0.25 0.30 1.23 0.30 0.33 0.79 0.37 0.37 0.93 0.32 0.30 0.80 0.27 0.26 0.92 0.27 0.27 1.20 0.27 0.27 1.91 0.42 0.40 0.75 0.25 0.22 1.16 0.37 0.37 0.89 0.27 0.23 0.96 0.31 0.30 1.48 0.40 0.39 1.13 0.31 0.34 0.53 0.17 0.30 0.97 0.35 0.55 0.97 0.32 0.37 0.59 0.38 0.39 0.84 0.39 0.36 0.96 0.38 0.36 0.91 0.37 0.38 0.77 0.31 0.33 1.04 0.40 0.39 0.80 0.32 0.28 1.11 0.40 0.40 1.32 0.26 0.21 1.11 0.46 0.46 0.35 0.04 0.05 0.87 0.28 0.33 0.67 0.36 0.61 1.09 0.55 0.70 1.17 0.37 0.52 0.72 0.36 0.35 0.81 0.25 0.21 1.08 0.31 0.29 0.95 0.29 0.28 0.91 0.33 0.34 -1 0.50 1 0.32 1 0.54 1 0.53 1 0.35 1 0.45 1 0.36 1 0.62 1 0.29 2 0.66 2 0.30 1 0.48 2 0.50 1 0.65 1 0.40 -1 0.52 1 0.43 1 0.73 1 0.53 1 0.45 1 0.30 1 0.33 1 0.61 1 0.32 2 0.95 2 1.19 1 0.60 1 0.47 1 0.41 1 0.47 -1 0.49 1 0.28 1 0.51 1 0.55 1 0.52 1 0.00 1 0.22 1 0.89 1 0.47 2 0.85 2 0.00 1 0.49 1 0.52 1 0.47 1 0.54 -1 0.52 1 0.46 1 0.54 1 0.62 1 0.49 0.60 0.95 1.07 0.77 0.63 0.58 0.94 0.96 0.91 0.84 1.23 0.91 0.66 1.04 1.02 0.68 0.92 0.89 0.94 1.33 1.99 0.74 1.11 0.89 0.95 1.39 0.95 0.70 0.92 0.94 0.61 0.85 0.93 0.97 0.64 1.04 0.81 1.10 1.42 1.08 0.35 0.68 0.65 0.75 0.90 0.67 0.81 1.18 1.07 0.68 CL Code 1 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 2 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 3 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 4 AA01 BB01 
BB02 BB03 BB04 226 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 5 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 6 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 7 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 781.56 67.50 1.25 828.44 749.34 95.63 1.18 834.20 854.51 1.89 34.31 873.33 707.68 1.21 118.75 138.93 826.43 891.63 3.20 73.26 25.48 911.43 927.99 32.49 1.71 68.93 939.91 92.54 127.45 1.39 827.31 896.62 74.70 100.87 1007.47 1.41 940.17 1.33 58.93 915.51 73.74 974.44 902.21 70.00 123.37 1.44 922.01 812.50 1.73 32.04 51.98 830.94 618.59 1.40 107.43 139.71 725.33 1.47 68.02 765.39 52.86 815.66 797.88 73.35 1.98 41.12 839.00 904.45 40.48 1.40 29.91 934.36 789.11 68.60 1.33 52.60 841.71 769.94 73.53 1.22 62.11 832.06 1.82 39.04 801.84 22.02 823.85 780.54 78.26 1.13 69.13 849.67 890.97 66.56 2.98 24.49 908.13 677.58 31.03 2.14 66.64 708.61 1.46 70.16 107.90 889.85 938.00 89.89 138.57 1070.05 1141.48 1.67 930.54 70.11 1.20 78.50 938.05 62.44 1.31 97.28 1.71 30.59 870.00 48.06 810.50 68.67 1.30 81.96 829.46 57.69 1.51 75.01 848.86 36.63 1.82 62.14 926.81 27.22 1.31 35.51 1.25 45.75 902.50 47.00 845.58 1.46 163.20 219.04 766.04 20.66 1.62 31.79 859.74 81.83 110.52 1.35 2.89 29.15 65.02 936.18 625.25 19.25 1.25 26.75 1.59 86.62 146.83 878.30 1.72 108.02 167.63 1107.38 1162.35 1.18 89.64 1032.73 1108.55 931.39 88.06 1.30 805.55 45.60 1.62 746.86 68.61 1.21 811.49 66.47 1.44 1.81 55.06 783.43 884.46 1.31 35.01 646.31 1.15 103.19 106.46 803.25 69.94 1.12 1.69 28.43 728.78 860.96 93.84 1.16 912.60 53.53 2.48 717.74 29.85 1.58 1.29 88.90 878.96 977.21 2.70 156.38 362.63 872.23 99.76 115.06 1.17 1.29 78.79 112.29 886.70 921.62 777.14 687.33 757.27 751.51 856.95 543.12 740.51 710.68 778.12 888.62 698.83 829.41 839.20 832.47 815.67 956.43 895.93 851.31 747.17 771.77 812.24 904.29 856.75 682.38 787.73 777.91 907.03 606.00 896.04 75.82 61.77 30.25 59.53 54.23 31.92 27.50 62.74 18.10 82.84 
23.98 18.91 72.05 Table A-1 (cont’d) 46.88 84.85 18.82 1 0.37 1 0.38 1 0.65 1 0.35 2 0.96 2 0.66 1 0.54 2 0.46 1 0.59 1 0.54 -1 0.52 1 0.52 1 0.56 1 0.57 1 0.46 1 0.46 1 0.44 1 0.79 1 0.30 2 0.77 2 0.71 1 0.55 2 0.50 1 0.40 1 0.45 -1 0.53 1 0.40 1 0.55 1 0.56 1 0.44 2 0.40 1 0.50 1 0.54 1 0.42 2 0.95 2 0.40 1 0.56 2 0.50 1 0.51 1 0.45 -1 0.53 1 0.40 1 0.54 1 0.53 1 0.44 1 0.33 1 0.31 1 0.67 1 0.32 2 0.89 2 0.84 1 0.40 1 0.50 1 0.32 1 0.52 0.93 0.93 0.96 0.84 0.65 0.97 0.92 0.64 0.81 0.79 0.97 0.77 1.09 0.91 0.73 1.30 0.83 1.11 0.92 0.66 1.44 0.84 0.58 0.66 0.87 0.57 0.83 1.36 0.75 0.58 1.36 0.65 1.06 0.90 0.97 0.70 1.07 0.80 0.53 0.82 0.65 0.92 0.92 0.94 0.86 1.46 0.75 1.05 0.86 0.73 1.11 0.92 0.76 0.87 0.79 1.10 0.56 0.54 0.90 0.32 0.26 1.27 0.38 0.37 0.91 0.29 0.23 0.97 0.33 0.32 1.57 0.27 0.29 1.10 0.35 0.32 0.72 0.32 0.45 0.78 0.36 0.34 1.06 0.35 0.36 0.91 0.36 0.35 0.77 0.31 0.24 0.98 0.30 0.28 0.82 0.25 0.25 0.79 0.27 0.27 1.21 0.37 0.37 0.86 0.31 0.28 1.37 0.38 0.37 1.01 0.26 0.24 0.88 0.30 0.30 1.30 0.37 0.38 1.25 0.31 0.30 0.61 0.17 0.19 0.58 0.33 0.43 1.04 0.31 0.32 0.59 0.35 0.35 0.79 0.32 0.29 1.26 0.32 0.28 0.86 0.26 0.26 0.69 0.30 0.29 1.31 0.37 0.29 0.64 0.22 0.16 1.04 0.43 0.45 1.20 0.23 0.21 0.91 0.32 0.30 0.93 0.48 0.48 1.37 0.29 0.35 0.65 0.16 0.22 0.68 0.25 0.22 0.90 0.31 0.35 0.68 0.39 0.37 0.92 0.34 0.31 0.86 0.31 0.29 0.89 0.29 0.28 0.92 0.29 0.28 1.40 0.37 0.44 0.78 0.27 0.25 1.18 0.39 0.39 0.96 0.26 0.24 0.89 0.32 0.31 1.23 0.42 0.40 1.02 0.32 0.33 0.60 0.19 0.18 0.87 0.46 0.44 0.94 0.36 0.34 0.31 0.39 0.43 0.51 0.11 0.17 0.43 0.38 0.41 0.43 0.00 0.38 0.35 0.39 0.38 0.35 0.39 0.33 0.42 0.12 0.24 0.38 0.36 0.41 0.40 0.00 0.38 0.38 0.37 0.36 0.38 0.40 0.37 0.36 0.09 0.00 0.34 0.33 0.34 0.37 0.00 0.39 0.34 0.40 0.39 0.33 0.40 0.31 0.53 0.15 0.10 0.42 0.39 0.41 0.40 227 8 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 9 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 10 AA01 BB01 BB02 BB03 
BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 721.85 723.70 767.04 757.66 881.29 797.45 793.66 789.57 891.21 812.98 899.44 64.28 756.88 37.83 1.90 57.13 766.40 50.52 1.21 56.60 809.62 44.13 1.49 52.65 790.80 33.14 1.72 35.08 901.36 27.38 1.34 35.25 1026.94 1042.61 15.67 2.25 862.42 71.73 64.97 1.18 810.49 27.75 16.84 1.79 83.57 864.75 75.18 1.21 914.67 54.76 23.46 2.51 839.17 26.19 32.19 1.31 1.55 71.54 115.44 905.53 1.73 109.51 168.79 1066.48 1124.03 79.08 117.64 972.58 1.36 78.71 60.07 889.11 1.23 52.88 33.91 845.85 1.71 73.22 64.04 783.14 1.17 53.53 63.65 1.31 867.30 52.58 34.74 886.96 1.62 41.47 34.33 928.53 1.28 36.67 24.17 716.17 1.33 80.91 88.16 1.11 766.41 30.91 22.65 703.43 1.55 91.25 63.44 844.19 1.38 857.17 78.90 32.26 2.86 34.60 34.60 1.00 552.60 949.22 1.67 100.74 132.19 955.86 88.39 132.87 1.61 950.92 70.97 112.17 1.40 918.05 94.96 147.07 1.47 1.76 32.55 53.87 829.12 828.56 68.49 58.48 1.21 860.54 68.81 51.07 1.55 49.40 32.21 1.64 817.69 1.32 27.36 35.85 952.18 5.00 1351.00 1356.00 5.00 1.00 886.00 79.58 70.90 1.15 803.14 35.08 18.83 1.67 1.29 85.75 96.57 934.07 957.17 64.01 25.82 2.62 770.43 27.65 1.70 32.90 69.37 108.30 1.44 932.85 93.18 132.27 1092.85 1120.86 1.55 1.29 69.88 983.77 96.07 949.46 68.29 109.24 1.32 893.50 920.12 819.67 775.45 830.51 852.22 909.04 692.00 685.50 680.78 780.75 824.91 518.00 885.78 946.89 879.96 928.30 809.71 777.70 813.61 785.47 924.81 815.09 820.59 848.32 931.35 742.78 903.48 -1 0.53 1 0.41 1 0.56 1 0.57 1 0.45 1 0.67 1 0.45 1 0.77 1 0.35 2 0.67 2 0.60 1 0.63 1 0.52 1 0.48 1 0.43 -1 0.54 1 0.41 1 0.50 1 0.56 1 0.42 1 0.43 1 0.35 1 0.76 1 0.38 2 0.85 2 0.00 1 0.61 1 0.60 1 0.55 1 0.61 -1 0.49 1 0.40 1 0.60 1 0.50 1 0.47 2 NA 1 0.48 1 0.74 1 0.36 2 0.77 2 0.74 1 0.58 2 0.50 1 0.36 1 0.53 0.63 1.24 0.94 1.00 0.72 0.76 0.75 1.46 0.92 0.61 1.11 0.84 0.78 0.81 1.06 0.73 0.89 0.92 0.94 1.04 0.70 0.70 1.64 0.64 0.91 1.27 0.83 0.74 0.78 0.81 0.60 0.91 0.78 0.78 0.73 NA 0.73 1.27 0.88 0.68 1.24 0.91 0.62 0.64 0.84 0.60 
0.34 0.31 1.13 0.38 0.36 0.87 0.32 0.31 0.93 0.31 0.30 0.79 0.31 0.31 0.74 0.28 0.27 0.73 0.29 0.27 1.47 0.41 0.40 0.82 0.30 0.29 0.91 0.32 0.31 0.97 0.42 0.42 1.14 0.37 0.40 0.69 0.18 0.25 1.08 0.40 0.37 1.03 0.35 0.42 0.73 0.36 0.35 0.91 0.35 0.37 0.96 0.33 0.33 0.93 0.26 0.25 0.91 0.32 0.32 0.97 0.72 0.72 0.75 0.29 0.27 1.36 0.41 0.40 0.83 0.23 0.17 0.97 0.36 0.34 1.27 0.57 0.55 0.76 0.35 0.34 0.75 0.28 0.33 1.18 0.32 0.28 0.92 0.31 0.37 0.63 0.36 0.35 0.88 0.36 0.34 0.77 0.33 0.31 0.83 0.30 0.29 1.06 0.29 0.28 NA NA NA 0.70 0.29 0.28 1.97 0.36 0.38 0.83 0.25 0.24 0.94 0.28 0.27 1.11 0.52 0.52 1.14 0.33 0.34 0.64 0.21 0.29 0.74 0.33 0.35 0.99 0.32 0.33 0.00 0.41 0.36 0.41 0.41 0.00 0.43 0.38 0.45 0.12 0.24 0.44 0.36 0.43 0.40 0.00 0.38 0.39 0.41 0.37 0.43 0.44 0.38 0.54 0.18 0.00 0.46 0.40 0.18 0.35 0.00 0.42 0.39 0.39 0.41 NA 0.45 0.39 0.35 0.06 0.15 0.36 0.33 0.35 0.41 Table A-1 (cont’d) 965.32 913.08 228 Table A-2. Mean and CV of Activities by Region Mean (Mode) CV 982.55 891.45 732.07 735.04 804.29 803.18 905.89 747.64 793.86 744.87 795.76 894.20 734.20 871.51 Start 762.97 721.91 798.69 806.22 938.78 812.09 759.48 798.02 754.46 868.03 729.59 846.52 50.16 31.09 86.10 69.75 64.36 52.17 56.01 35.01 34.04 26.55 25.17 42.00 98.45 122.33 26.93 16.77 56.17 56.17 24.85 62.00 24.10 52.50 94.10 160.29 End Freq Dur/a Dur/d 779.15 1.74 788.12 1.28 847.65 1.43 839.76 1.72 963.83 1.34 837.25 1.40 857.92 1.24 807.51 1.78 810.63 1.04 892.88 2.66 743.40 1.54 2.14 912.99 1.94 115.34 209.09 1026.86 1062.05 87.43 109.86 939.08 1.27 71.23 100.22 1.29 906.00 50.63 31.94 756.79 1.74 92.24 76.72 799.74 1.25 59.50 46.55 844.11 1.48 49.87 31.46 833.25 1.70 27.63 34.76 1.28 928.86 62.74 70.58 810.37 1.33 890.84 96.98 119.92 1.25 760.42 1.71 15.55 26.13 1.28 108.18 134.06 903.94 915.49 53.16 2.52 753.11 1.31 24.25 2.19 104.71 180.76 933.24 1.99 120.83 221.60 1008.25 1054.79 70.54 98.45 853.13 1.31 69.99 104.62 920.12 1.31 54.47 33.09 796.02 1.78 88.90 73.58 803.17 1.24 52.72 65.44 1.42 
855.72 51.53 33.28 824.52 1.67 950.03 34.57 26.94 1.31 910.00 1.22 41.44 44.50 1.20 103.93 121.24 869.85 796.05 25.97 1.64 879.96 92.39 1.23 883.79 59.50 2.65 1.83 57.47 733.35 2.18 109.99 189.03 931.10 2.01 123.36 230.85 1005.78 1047.22 986.23 1.21 932.77 1.29 1.74 785.51 800.29 1.25 850.02 1.47 814.31 1.71 1.31 940.34 776.62 1.10 1.26 831.36 68.21 84.99 71.37 101.53 31.11 50.70 87.81 72.40 63.52 50.54 51.98 31.85 30.07 38.36 57.74 59.29 87.25 114.19 863.88 907.73 774.07 733.69 807.68 792.82 928.72 868.56 765.92 782.12 800.01 861.34 701.49 880.99 967.68 931.78 761.52 730.23 807.31 784.08 914.14 718.88 744.11 16.72 79.95 24.42 31.86 Partner Freq Dur/a Dur/d Start End Partner 0.00 0.37 0.33 0.40 0.38 0.42 0.36 0.35 0.34 0.13 0.16 0.49 0.39 0.40 0.40 0.00 0.40 0.35 0.40 0.40 0.35 0.39 0.36 0.42 0.11 0.15 0.47 0.37 0.39 0.41 0.00 0.39 0.39 0.41 0.40 0.47 0.41 0.35 0.48 0.11 0.14 0.48 0.38 0.41 0.41 0.00 0.38 0.37 0.38 0.39 0.36 0.42 0.63 0.39 0.38 0.97 0.34 0.31 1.00 0.31 0.30 0.88 0.29 0.28 0.86 0.27 0.27 1.45 0.47 0.45 0.85 0.27 0.22 1.23 0.38 0.38 1.13 0.35 0.34 1.08 0.34 0.33 1.55 0.43 0.43 1.14 0.36 0.33 0.80 0.22 0.29 0.97 0.29 0.41 1.01 0.35 0.39 0.71 0.40 0.38 0.94 0.35 0.31 0.89 0.32 0.31 0.98 0.30 0.30 0.90 0.29 0.28 1.63 0.48 0.44 0.93 0.26 0.22 1.54 0.43 0.43 1.01 0.23 0.22 0.87 0.34 0.33 1.17 0.44 0.44 1.04 0.32 0.32 0.81 0.23 0.29 1.08 0.44 0.48 1.10 0.33 0.35 0.71 0.37 0.37 0.98 0.35 0.31 1.00 0.33 0.31 0.92 0.31 0.31 0.89 0.28 0.28 1.17 0.32 0.32 0.94 0.29 0.25 1.31 0.38 0.38 0.86 0.28 0.25 1.03 0.35 0.34 1.56 0.47 0.45 1.02 0.32 0.33 0.81 0.22 0.30 0.97 0.34 0.36 0.99 0.33 0.37 0.71 0.39 0.38 0.99 0.35 0.32 0.91 0.32 0.30 0.90 0.32 0.31 1.13 0.30 0.30 1.57 0.41 0.42 1.05 0.30 0.26 -1 0.52 1 0.46 1 0.59 1 0.57 1 0.45 1 0.46 1 0.41 1 0.68 1 0.20 2 0.86 2 0.70 1 0.58 1 0.58 1 0.47 1 0.47 -1 0.53 1 0.43 1 0.64 1 0.55 1 0.43 1 0.57 1 0.46 1 0.69 1 0.42 2 0.85 2 0.59 1 0.59 1 0.57 1 0.56 1 0.48 -1 0.51 1 0.43 1 0.56 1 0.56 1 0.45 1 0.35 1 0.43 1 0.67 1 0.35 
2 0.83 2 1.07 1 0.63 1 0.56 1 0.44 1 0.49 -1 0.52 1 0.45 1 0.64 1 0.56 1 0.46 1 0.27 1 0.47 0.61 0.96 1.15 0.87 0.80 1.13 0.82 1.20 1.13 0.87 1.02 1.00 0.78 0.89 0.98 0.75 0.97 0.93 1.01 0.81 1.85 0.78 1.29 0.94 0.83 1.11 0.91 0.80 0.74 0.95 0.68 0.91 1.04 0.94 0.76 1.27 0.95 1.17 0.91 0.88 1.37 0.87 0.82 0.88 0.84 0.63 0.98 0.93 0.81 1.21 1.62 0.90 2 Rgn Code AA01 1 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 AA01 BB01 BB02 BB03 BB04 BB05 BB06 3 4 24.09 18.91 229 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 Day Code WD AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 WE AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 Table A-2 (cont’d) 28.49 16.87 87.67 76.30 56.86 24.12 28.39 44.66 99.44 178.62 1.72 747.64 1.16 873.33 2.55 940.88 1.53 812.55 2.12 914.95 1.90 116.50 207.95 1015.14 1071.83 918.74 1.19 1.30 941.82 749.97 797.03 922.73 784.17 848.30 72.31 83.98 66.10 102.89 933.17 924.03 1 0.72 1 0.32 2 0.86 2 0.80 1 0.60 2 0.56 1 0.46 1 0.52 1.16 0.80 0.69 1.07 0.86 0.72 0.85 0.94 1.54 0.42 0.43 0.93 0.28 0.24 1.01 0.32 0.32 1.40 0.41 0.42 1.10 0.32 0.30 0.80 0.22 0.27 0.83 0.38 0.45 1.11 0.33 0.36 0.37 0.42 0.11 0.21 0.45 0.36 0.39 0.39 Table A-3. 
Mean and CV of Activities by Day Mean (Mode) CV Start 728.65 729.77 824.48 804.55 937.48 764.81 797.75 753.25 774.71 829.08 666.13 856.13 End Freq Dur/a Dur/d 750.85 51.15 1.80 30.91 794.41 81.50 1.26 66.52 860.96 56.12 1.41 45.23 833.06 48.14 1.71 30.03 962.35 1.30 33.87 26.75 812.38 47.57 1.26 56.60 885.69 87.94 109.68 1.22 762.12 25.61 15.99 1.69 853.98 1.20 79.27 95.55 851.80 56.61 23.05 2.61 686.18 1.54 23.70 43.25 2.52 107.13 197.43 930.26 1.93 110.28 200.81 1033.02 1073.30 94.19 943.88 1.25 939.10 97.11 951.24 1.32 936.91 53.06 816.12 1.71 794.15 95.62 803.23 1.25 732.37 69.89 1.48 840.42 787.57 56.31 819.65 1.68 785.22 925.45 37.06 1.31 903.68 847.49 1.29 52.36 801.98 1.24 105.76 128.19 844.39 738.64 801.07 28.09 1.69 17.11 789.06 890.99 97.60 1.21 85.46 805.53 970.47 58.79 2.57 25.82 950.40 1.67 50.77 831.63 30.52 801.11 875.67 1.71 917.35 98.86 157.45 990.39 1039.78 2.01 129.81 239.92 910.96 940.60 73.99 1.23 91.55 1.28 74.36 107.77 896.74 902.75 74.18 65.19 33.35 79.52 55.49 35.97 28.84 45.50 Partner Freq Dur/a Dur/d Start End Partner 0.00 0.38 0.35 0.40 0.39 0.39 0.38 0.35 0.39 0.11 0.16 0.50 0.38 0.38 0.40 0.00 0.39 0.38 0.40 0.41 0.42 0.41 0.36 0.45 0.12 0.17 0.44 0.37 0.41 0.40 0.70 0.40 0.39 1.00 0.36 0.32 0.98 0.33 0.32 0.89 0.32 0.31 0.84 0.28 0.27 1.68 0.46 0.44 1.02 0.28 0.24 1.47 0.40 0.40 1.11 0.29 0.27 1.08 0.37 0.36 1.71 0.48 0.47 1.04 0.33 0.31 0.85 0.22 0.28 1.04 0.38 0.42 1.02 0.32 0.35 0.70 0.36 0.36 0.94 0.33 0.30 0.93 0.31 0.29 0.93 0.29 0.29 1.06 0.29 0.29 1.42 0.40 0.38 0.88 0.28 0.24 1.34 0.40 0.40 0.93 0.27 0.23 0.91 0.29 0.29 1.46 0.39 0.38 1.07 0.32 0.33 0.76 0.23 0.29 0.89 0.34 0.42 1.07 0.34 0.38 -1 0.51 1 0.45 1 0.57 1 0.58 1 0.45 1 0.43 1 0.44 1 0.66 1 0.36 2 0.84 2 1.07 1 0.54 1 0.57 1 0.51 1 0.49 -1 0.52 1 0.44 1 0.63 1 0.54 1 0.44 1 0.49 1 0.45 1 0.72 1 0.36 2 0.86 2 0.77 1 0.63 1 0.56 1 0.45 1 0.49 0.70 0.97 1.04 0.85 0.78 1.91 0.91 1.20 1.00 0.88 1.00 0.93 0.79 0.92 0.90 0.65 0.92 1.00 0.96 1.03 1.56 0.85 1.19 0.91 0.77 
1.39 0.86 0.78 0.78 0.93

Table A-4. Mean and CV of Activities by Gender
Sex Code M AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03 F AA01 BB01 BB02 BB03 BB04 BB05 BB06 BB07 BB08 CD01 CD02 EF01 LL01 LL02 LL03
Mean (Mode) CV 27.99 71.82 50.98 31.68 27.42 66.33 End Start Freq Dur/a Dur/d 762.11 742.09 43.50 1.68 832.97 768.12 83.16 1.18 879.36 831.62 58.79 1.31 809.33 780.19 44.83 1.49 953.66 926.24 32.83 1.24 897.52 831.18 1.27 73.29 867.86 755.71 1.26 112.15 138.56 792.26 777.90 17.67 1.61 26.90 876.31 785.88 90.43 107.93 1.24 950.74 931.85 24.53 1.93 44.14 751.28 723.11 1.50 33.93 63.02 857.27 914.77 2.19 101.67 177.66 999.04 1049.63 2.00 128.62 238.84 955.06 931.17 80.61 104.82 1.26 909.89 925.88 1.34 79.60 120.42 796.10 772.52 58.24 35.02 1.81 787.47 718.51 90.91 73.89 1.28 842.25 797.91 64.58 50.51 1.48 835.40 803.03 55.52 33.38 1.79 920.81 943.13 1.33 27.78 36.04 733.29 715.40 27.19 17.89 1.28 859.33 779.80 96.48 79.53 1.20 772.32 764.15 26.57 15.88 1.73 814.49 873.27 1.09 58.78 61.06 887.65 865.04 63.37 24.24 2.88 758.37 734.80 1.66 23.56 38.94 2.13 105.18 181.49 872.30 934.06 1.95 113.08 205.37 1022.01 1062.16 897.34 1.21 1.27 928.33 950.43 922.43 78.38 88.91 66.01 62.35
Partner Freq Dur/a Dur/d Start End Partner 0.00 0.38 0.33 0.40 0.38 0.41 0.39 0.36 0.45 0.09 0.16 0.45 0.37 0.40 0.40 0.00 0.39 0.38 0.40 0.40 0.38 0.42 0.35 0.41 0.12 0.17 0.49 0.38 0.40 0.40 0.66 0.41 0.39 0.97 0.35 0.32 0.90 0.33 0.31 1.02 0.35 0.35 0.87 0.30 0.29 1.36 0.36 0.33 0.86 0.28 0.23 1.46 0.42 0.41 0.96 0.26 0.23 0.97 0.35 0.34 1.51 0.43 0.43 1.10 0.33 0.32 0.80 0.23 0.29 1.00 0.36 0.38 1.02 0.33 0.37 0.69 0.37 0.36 0.97 0.34 0.31 0.97 0.32 0.30 0.87 0.28 0.28 0.97 0.28 0.28 1.66 0.53 0.52 1.02 0.29 0.25 1.39 0.40 0.39 0.98 0.32 0.30 0.98 0.33 0.32 1.54 0.44 0.44 1.03 0.33 0.32 0.80 0.22 0.28 0.85 0.37 0.47 1.04 0.33 0.37 -1 0.52 1 0.38 1 0.61 1 0.52 1 0.41 1 0.42 1 0.46 1 0.76 1 0.37 2 0.68 2 0.69 1 0.57 1 0.56 1 0.49 1 0.50 -1 0.51 1 0.46 1 0.60 1
0.56 1 0.45 1 0.52 1 0.42 1 0.65 1 0.26 2 0.84 2 1.00 1 0.64 1 0.57 1 0.48 1 0.49 0.62 0.95 0.92 1.03 0.84 1.49 0.82 1.18 0.90 0.87 1.33 0.91 0.78 0.86 0.88 0.68 0.95 1.05 0.87 0.93 1.34 0.94 1.20 1.01 0.81 1.13 0.90 0.80 0.84 0.93