SMARTPHONE-BASED SENSING SYSTEMS FOR DATA-INTENSIVE APPLICATIONS

By

Mohammad-Mahdi Moazzami

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science - Doctor of Philosophy

2017

ABSTRACT

SMARTPHONE-BASED SENSING SYSTEMS FOR DATA-INTENSIVE APPLICATIONS

By Mohammad-Mahdi Moazzami

Supported by advanced sensing capabilities, increasing computational resources and advances in Artificial Intelligence, smartphones have become our virtual companions in our daily lives. An average modern smartphone is capable of handling a wide range of tasks including navigation, advanced image processing, speech processing and cross-app data processing. The key facet common to all of these applications is data-intensive computation. In this dissertation we have taken steps towards realizing the vision of the smartphone as a true platform for data-intensive computation by proposing frameworks, applications and algorithmic solutions, following a data-driven approach to system design. To this end, several challenges must be addressed before smartphones can be used as a system platform for data-intensive applications. The major challenges addressed in this dissertation include high power consumption, the high computation cost of advanced machine learning algorithms, the lack of real-time functionality, the lack of embedded programming support, heterogeneity in apps and communication interfaces, and the lack of customized data processing libraries.

The contributions of this dissertation can be summarized as follows. We present the design, implementation and evaluation of the ORBIT framework, which represents the first system that combines the design requirements of a machine learning system and a sensing system at the same time. We ported, for the first time, off-the-shelf machine learning algorithms for real-time sensor data processing to smartphone devices. We highlighted how machine learning on smartphones comes with severe costs that need to be mitigated in order to make smartphones capable of real-time data-intensive processing.

From the application perspective we present SPOT. SPOT aims to address some of the challenges discovered in mobile-based smart-home systems. These challenges prevent us from achieving the promises of smart-homes due to heterogeneity in different aspects of smart devices and the underlying systems. We face the following major heterogeneities in building smart-homes: (i) diverse appliance control apps, (ii) communication interfaces, and (iii) programming abstractions. SPOT makes the heterogeneous characteristics of smart appliances transparent, and thereby minimizes the burden on home automation application developers and the effort of users who would otherwise have to deal with appliance-specific apps and control interfaces.

From the algorithmic perspective we introduce two systems in the smartphone-based deep learning area: Deep-Crowd-Label and Deep-Partition. Deep neural models are both computationally and memory intensive, making them difficult to deploy in mobile applications with limited hardware resources. On the other hand, they are the most advanced machine learning algorithms suitable for real-time sensing applications used in the wild. Deep-Partition is an optimization-based partitioning meta-algorithm featuring a tiered architecture for the smartphone and the back-end cloud.
Deep-Partition provides profile-based model partitioning, allowing it to intelligently execute deep learning algorithms among the tiers to minimize the smartphone's power consumption by minimizing the deep models' feed-forward latency. Deep-Crowd-Label is prototyped for semantically labeling the user's location. It is a crowd-assisted algorithm that uses crowd-sourcing at both training and inference time. It builds deep convolutional neural models using crowd-sensed images to detect the context (label) of indoor locations. It features domain adaptation and model extension via transfer learning to efficiently build deep models for image labeling.

The work presented in this dissertation covers three major facets of data-driven and compute-intensive smartphone-based systems: platforms, applications and algorithms. It helps spur new areas of research and opens up new directions in mobile computing research.

Dedicated to my beloved parents and family ...

ACKNOWLEDGMENTS

First, I thank my co-authors; more than anyone else they have influenced my view of the research process and instilled in me the importance of aiming to produce quality research with the potential for impact. I would like to thank my co-advisers Guoliang Xing and Matt Mutka. I value their candid and honest opinions, their calmness and clarity of advice amid difficult times, and their patience and understanding over the past several years.

I feel fortunate to have had the opportunity to work closely with Ulrich Herberg, Daisuke Mashima and Wei-Peng Chen during a one-year internship at Fujitsu Research Lab in Sunnyvale, California. I thoroughly enjoyed the chance to work with not only Ulrich, Daisuke and Wei-Peng but also the many other exceptional researchers and interns at FLA.

As a member of the Mobile Sensing Group and ELANS Lab at Michigan State University, I count myself lucky to have been surrounded by a number of outstanding individuals who, at different stages of my PhD, have been part of the lab. In particular, I must make special mention of Dennis Philips. Over the years we have shared many long hours working together in the lab, night and day. He is not only a good colleague, but a good friend.

I am definitely lucky to have had support from another amazing person over these years. I would especially like to thank Abdol Esfahanian for being there. Over the last few years I have had many ups and downs amid a particular family situation. Abdol was the very first, and most of the time the only, person I could go to.

I apologize to all my family and friends for the past years. I appreciate your understanding of the unreasonably long delays in my replies to phone calls and emails. I thank you all for not giving up on me, and I plan on keeping in closer contact in the future. Thank you all for your love and support.

I would like to give special thanks to my wife, then my girlfriend, Samaneh, who was always with me in all difficulties and was willing to go through them with me, like a true friend. I would like to thank her for her calmness, for her emotional support and for her countless kindness. She is indeed my true friend.

Finally, I cannot thank enough my parents, my brothers Manoochehr and Hamidreza, my sister, Zahra, and my sister-in-law, Azadeh, for being accepting and loving irrespective of the unpredictable nature of my grad-school life and work schedule and the uncertainty it brings.
Living halfway across the world complicates many things for a family, and they have always stood by me, sacrificed for me and showed me there are things above material life.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1 Introduction
1.1 Overview
1.2 Thesis Outline
1.2.1 ORBIT: A Smartphone-Based Platform for Data-Intensive Sensing Applications
1.2.2 SPOT: A Smartphone-Based Platform to Tackle Heterogeneity in Smart-Home Systems
1.2.3 On-device Deep-Learning
1.3 Thesis Contribution

Chapter 2 ORBIT: A Smartphone-Based Platform for Data-Intensive Sensing Applications
2.1 Introduction
2.2 Related Work
2.3 Motivation and System Overview
2.3.1 Motivation and Challenges
2.4 System Overview
2.5 Measurement-Based Latency and Power Profiling
2.5.1 Timing Accuracy and Latency Profiling
2.5.2 Power Profiling
2.5.3 Summary
2.6 Design and Implementation
2.6.1 Application Pipeline
2.6.2 Data Processing Library
2.6.2.1 Adaptive Delay/Quality Trade-off
2.6.2.2 Data Partitioning via Multi-threading
2.6.3 Task Partitioning and Energy Management
2.6.3.1 Power Management Model
2.6.3.2 Execution Time Profiler
2.6.3.3 Partitioning with Sequential Execution
2.6.3.4 Partitioning with Branches
2.6.4 Task Controllers
2.6.4.1 Smartphone Task Controller
2.6.4.2 extBoard and Cloud Task Controllers
2.7 Microbenchmark
2.8 Case Studies
2.8.1 Robotic Sensing
2.8.2 Event Timing
2.8.3 Multi-camera 3D Reconstruction
2.8.4 Discussion
2.9 Summary

Chapter 3 SPOT: A Smartphone-Based Platform to Tackle Heterogeneity in Smart-Home Systems
3.1 Introduction
3.2 Related Work
3.3 Requirements and Challenges
3.4 System Overview
3.5 Design and Implementation
3.5.1 XML Driver Model
3.5.1.1 Driver Units
3.5.1.2 Device Driver Usage
3.5.2 Appliance Driver by SPOT JAVA Library
3.5.3 Appliance/State Consistency
3.5.4 Appliance Discovery and Bootstrap
3.5.5 Application Manager
3.6 Application Scenarios
3.6.1 Application 1: Cross-Device Programming
3.6.2 Application 2: Residential Automated Demand Response
3.6.3 Application 3: Central Usage Analytics
3.7 Evaluation
3.8 Summary

Chapter 4 On-device Deep Learning
4.1 Introduction
4.2 Related Work
4.3 Architectural Observations
4.4 Partitioning
4.5 Layer-wise Profiling of Representative Deep Networks
4.6 Evaluation of Model Partitioning
4.7 Application Use-case: Deep-Learning Based Crowd-Assisted Location Labeling System
4.7.1 Traditional Approaches to Location Labeling
4.7.2 Deep Learning-based Approach
4.7.3 Training
4.7.4 Labeling and Aggregation by Crowd-Sourcing
4.7.5 Data Collection and Dataset Preparation
4.7.6 Evaluation
4.8 Summary

Chapter 5 Conclusion
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: ORBIT-based applications.
Table 3.1: Smart appliances tested with SPOT.
Table 4.1: The breakdown of the model shown in Fig. 4.1: each layer's output dimension and execution time profile on two different smartphones with two different processors, i.e., Exynos 7420 and Intel i7-4500.
Table 4.2: Representative deep neural network models.
Table 4.3: Models built in Deep-Crowd-Label via model adaptation and model extension.
Table 4.4: Location labeling results. Each table represents one store with its name and ground-truth type (top row). Top-5 prediction results with confidence values (prediction probabilities) are presented in each row. Each prediction is the aggregated result of crowd-sensed images for each store (Sec. 4.7.4).

LIST OF FIGURES

Figure 2.1: ORBIT nodes for seismic sensing and robots.
Figure 2.2: System architecture of ORBIT.
Figure 2.3: Distribution of the intervals between two interrupts raised by a software timer of Android.
Figure 2.4: Distribution of execution time of the SIFT algorithm on 640x480 images on Nexus S.
Figure 2.5: Execution time of signal processing algorithms (error bar: standard deviation).
Figure 2.6: Arduino and Nexus S power consumption profiles.
Figure 2.7: An example ORBIT application. The numbers beside the leaf nodes in the execution tree are the priorities assigned by the application developer; the tag Sx of a task represents the set it belongs to in the task partitioning solution.
Figure 2.8: Pseudo-code for generating an application pipeline.
Figure 2.9: Power management scheme.
Figure 2.10: Delay/quality trade-off (r = step size).
Figure 2.11: Smartphone multi-threading reduces processing delay of compute-intensive tasks.
Figure 2.12: The data-dependent algorithms.
Figure 2.13: The results of various partition schemes.
Figure 2.14: Impact of delay bound setting on the task assignment and total energy consumption.
Figure 2.15: The block diagram of the seismic event timing application. The white blocks are pre-processing algorithms; the gray blocks are the earthquake detection algorithms; the black blocks are the P-phase estimation algorithms.
Figure 2.16: Application specification of event timing. The "sampler" is a special task running on the extBoard. Specific tasks with different parameters are defined.
For example, the parameters "1600" and "1" indicate the number of input and/or output data samples for different tasks; the parameter "1,6" of the band-pass filter specifies the two corner frequencies; and the parameter "4" of wavelet specifies the level of transform.
Figure 2.17: The results of various partition schemes.
Figure 2.18: Impact of delay bound setting on the task assignment and total energy consumption. Top: the number of tasks assigned to the extBoard versus delay bound. Bottom: total energy consumption versus delay bound.
Figure 2.19: The measured extBoard processing delay and smartphone energy consumption versus delay bound.
Figure 2.20: Projected lifetime vs. extBoard duty cycle.
Figure 2.21: The energy consumption trace of a node.
Figure 2.22: The block diagram of the multi-camera 3D reconstruction application.
Figure 2.23: The results of various partition schemes.
Figure 3.1: Heterogeneity in today's smart-home systems. Each appliance in (c) requires its own app, as shown in (a), that communicates with the appliance using its own protocol via the cloud, a bridge and/or directly, as shown in (b). Each appliance in (c) has different functionality, and each smartphone app in (a) neither supports appliances from more than one vendor nor shares data with other apps. The user has to switch between apps to operate different appliances.
Figure 3.2: Heterogeneity in programming abstraction.
Figure 3.3: Examples of heterogeneity in message schema when setting different configurations.
Figure 3.4: SPOT system architecture.
Figure 3.5: XML driver of the Philips HUE light.
Figure 3.6: A snippet of the XML schema for SPOT's driver model.
Figure 3.7: A snippet of the common driver unit in the driver model.
Figure 3.8: A snippet of the read action unit in the driver model.
Figure 3.9: The write actions using the driver model.
Figure 3.10: A snippet of the write action unit in the driver model.
Figure 3.11: JAVA and XML specifications for dynamic GUI generation.
Figure 3.12: An example dynamically generated GUI in SPOT.
Figure 3.13: Smart appliances tested with SPOT.
Figure 3.14: The length comparison (LOC) of different kinds of drivers.
Figure 3.15: Latency of dynamically loading the XML drivers (error bars: standard deviation).
Figure 3.16: Latency of database query.
Figure 3.17: The effect of polling interval and number of appliances (devices) on the smartphone's energy consumption. A shorter polling interval and a larger number of devices lead to a higher rate of smartphone energy consumption.
Figure 3.18: SPOT records the state of appliances and maintains appliance/state consistency in its internal DB with frequent polling.
Figure 3.19: The smoothness of displaying the GUI: the regular rhythm in the SurfaceFlinger process indicates smooth display rendering. The regular rhythm in the CPU state over the same period of time indicates no interference between the threads in the app.
Figure 3.20: Latency of whole-home application runtime.
Figure 4.1: The output volume (feature maps) of different layers in a deep neural network.
Figure 4.2: Sparsity in feature maps in convolutional neural nets: this figure shows a typical-looking feature map on the first convolutional layer of a trained AlexNet while processing an image of a cat as the input. Every box shows an activation map corresponding to a filter. This figure shows how sparse the activations are (most values are zero and shown in black) [Kaparaty 2016].
Figure 4.3: The profiling of the execution time, layer-wise latency and the activation's tensor size for the three major representative deep neural models.
Figure 4.4: Partitioning results and end-to-end model latency for representative models when Deep-Partition is applied.
Figure 4.5: The impact of communication bit-rate on the partitioning and model latency.
Figure 4.6: The impact of feature-map sparsity on the partitioning and model latency (AlexNet).
Figure 4.7: An indoor area with semantic labels.
Figure 4.8: Our model adaptation schema and ensemble of adapted deep neural models. Left/green: several deep neural models pre-trained or extended using transfer learning. Middle/red: the adaptation layer. Right/blue: the aggregation layer.
Figure 4.9: Predictions on real samples collected from indoor shops. Bars below each image show the top-5 model predictions using our deep learning method, sorted in ascending order.

Chapter 1

Introduction

1.1 Overview

Smartphones are becoming more and more embedded in our daily lives; they have changed the way we interact with our environment and the way we perform our daily activities. While early smartphones were designed primarily to support voice communication and basic activities like surfing the web, checking email and listening to music, technological advances have reduced the gap between what we consider conventional smartphones and advanced computers. As this technological divide has diminished, a new paradigm is emerging fast: smartphones are beginning to take over the so-called "intelligent" or "smart" aspects of many sensing and embedded applications. The increasing computational resources offered by smartphones allow us to interface them with many other legacy and new systems directly and continuously, more than ever before. Another critical component that makes smartphones a central enabler of new advances across a wide spectrum of application domains is the collection of sensors embedded in these devices.
Multi-modal sensing capability in smartphones is set to become even more critical to sensing systems as they become intertwined with existing applications such as global environmental monitoring and new emerging domains such as smart appliances and smart-home systems, sensor-based augmented and virtual reality, personal and community health-care and sport systems, and intelligent transportation systems. As such, the ubiquity of smartphones together with their multi-modal sensing capabilities has enabled ground-breaking ways to build a wide spectrum of mobile sensing applications that were not possible before. These applications are usually human-centric in that the smartphone utilizes on-board sensors to sense people and the characteristics of their contexts. These advances are enabled not only by rich and multiple sensing capabilities, but also by a number of other factors, including increased battery capacity, communication and computational resources (CPU, RAM), and new large-scale application distribution channels [Miluzzo, 2011]. By building innovative smartphone applications and embedding novel data processing algorithms to mine large-scale sensing data on smartphones, it is now possible to perform scientific discoveries using ensembles of smartphones in a more cost-effective way than before [Eagle and Pentland, 2006, Moazzami et al., 2015].

Different from these sensing-centric applications, this dissertation considers an emerging class of smartphone-based compute-centric applications. In contrast to the sensing-centric nature of participatory sensing, in which the application complexity mainly lies in the sensing algorithms while complex processing is offloaded to the cloud, smartphones in these applications are embedded into environments to sense and interact with the physical world autonomously over long periods of time. Applications on the smartphones are more complex and are equipped with advanced machine learning algorithms capable of processing large volumes of complex sensor data on the go, on-device. For instance, in the Floating Sensor Network project [Amin et al., 2007], smartphone-equipped drifters are rapidly deployed to collect real-time data about the flow of water through a river. The smartphone's GPS allows the drifter to measure the volume and direction of water flow based on its real-time location and transmit the data back to the server through cellular networks. Smartphones have also been employed for monitoring earthquakes [Faulkner et al., 2011], volcanoes [VolcanoSRI, 2012], and even operating miniature satellites [NASA PhoneSat, 2013]. Another important class of smartphone-based embedded systems is cloud robots [Guizzo, 2011, Kehoe et al., 2015]. By integrating smartphones, these robots can leverage a plethora of phone sensors to realize complex sensing and navigation capabilities and offload compute-intensive cognitive tasks like image and voice recognition to the cloud.

Compared with traditional mote-class sensing platforms, smartphones have several salient advantages that make them promising system platforms for the aforementioned applications. These features include high-speed multi-core processors that are capable of executing advanced data processing algorithms, multiple network interfaces, various integrated sensors, friendly user interfaces and advanced programming languages. Moreover, the price of smartphones has been dropping significantly in the last decade.
Many Android phones with reasonable configurations (up to 800 MHz CPU and 2 GB memory) cost less than US$50 [LG Optimus Net]. However, several challenges must be addressed before smartphones can be used as a system platform for data-intensive applications. We face the following major challenges:

(1) High power consumption: The smartphone power management schemes are designed to adapt to user activities to extend battery time. However, they are not suitable for untethered embedded sensing systems. If the smartphone samples sensors continually, its CPU cannot enter a deep sleep state to save energy. Low-power coprocessors (e.g., the M7 in the iPhone 5s) can handle continuous sampling, but are available on a few high-end models only.

(2) Lack of real-time functionality: Many sensing applications have stringent real-time requirements, such as a constant sampling rate and precise timestamping. However, modern smartphone OSes are not designed for meeting these real-time requirements. For instance, sensor sampling can be delayed by high-priority CPU tasks such as Android system services or user interface drawing. Our measurements show that the software timer provided by Android may be blocked by Android core system services for up to 110 milliseconds. Moreover, the Android programming library does not provide native interfaces that allow developers to express timing requirements.

(3) Lack of embedded programming support: The programming environment of the smartphone is designed to facilitate the development of networked, human-centric mobile applications. However, it lacks important embedded programming support such as resource-efficient signal processing libraries and unified primitives for controlling and communicating with peripheral accessories such as external sensors.

(4) Heterogeneity in mobile apps, communication interfaces and programming abstractions: The lack of global standards in communication, control and data management for smartphones and smart devices results in highly fragmented systems consisting of proprietary solutions provided by multiple device vendors. Users are required to use different control interfaces to interact with smart appliances in their homes, or are forced to use devices sold by a single vendor to get interconnection among the apps. Cross-app scenarios also suffer from heterogeneity in the programming abstraction owing to the lack of a global standard and a dominating architecture.

(5) High computation cost of advanced machine learning algorithms: Machine learning algorithms are usually computationally intensive, take a considerable amount of time and resources to train, and sometimes the resulting models take up a lot of space on disk, making them difficult to deploy on resource-limited embedded systems such as smartphones and smart devices. In addition, the majority of advanced machine learning algorithms commonly used in real applications are supervised algorithms, which require labeled data to train, making them difficult to use in large-scale sensing applications.

In this dissertation, we propose a number of platforms, models, algorithms and applications that advance our ability to build smartphone-based data-intensive applications by addressing these challenges collectively. We first conduct a series of systematic measurement experiments to study the limitations of smartphones and highlight the challenges of developing such applications.
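To make the timer observation in challenge (2) concrete, jitter of this kind can be reproduced with a small measurement harness along the following lines. This is an illustrative sketch of ours, not the instrumentation used in this dissertation; the class name and log tag are hypothetical.

import android.os.Handler;
import android.os.Looper;
import android.util.Log;

/** Minimal jitter probe (illustrative sketch): schedule a nominally
 *  periodic 10 ms callback and record the actual intervals observed. */
public class TimerJitterProbe {
    private static final long PERIOD_MS = 10;   // desired timer interval
    private static final int SAMPLES = 10000;   // intervals to record

    private final Handler handler = new Handler(Looper.getMainLooper());
    private final long[] intervalsUs = new long[SAMPLES];
    private long lastNs = -1;
    private int count = 0;

    private final Runnable tick = new Runnable() {
        @Override public void run() {
            long nowNs = System.nanoTime();
            if (lastNs >= 0 && count < SAMPLES) {
                intervalsUs[count++] = (nowNs - lastNs) / 1000;  // microseconds
            }
            lastNs = nowNs;
            if (count < SAMPLES) {
                handler.postDelayed(this, PERIOD_MS);
            } else {
                report();
            }
        }
    };

    public void start() { handler.postDelayed(tick, PERIOD_MS); }

    private void report() {
        long maxUs = 0, sumUs = 0;
        for (int i = 0; i < count; i++) {
            maxUs = Math.max(maxUs, intervalsUs[i]);
            sumUs += intervalsUs[i];
        }
        // A long-tailed maximum (tens of milliseconds for a 10 ms timer)
        // indicates the timer was blocked by higher-priority system work.
        Log.i("TimerJitterProbe", "mean=" + (sumUs / count) + "us, max=" + maxUs + "us");
    }
}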
We describe the design and implementation of a generic multi-tier platform called ORBIT [Moazzami et al., 2015] that provides the inference and machine learning algorithms for data-intensive continuous sensing applications deployed on off-the-shelf smartphones. Moreover, we provide case studies of several such applications built upon ORBIT and discuss the challenges ORBIT addresses as well as the opportunities it provides. In addition, we describe SPOT, a platform for a new emerging class of data-intensive applications in the smart-home context. We describe how SPOT addresses multiple kinds of heterogeneity while providing an extensible ecosystem toward truly connected smart-homes. We describe applications built on top of SPOT and discuss how they benefit from SPOT. Finally, we describe how advanced machine learning algorithms, like deep learning models, can be efficiently integrated into smartphone applications as the core computation pipeline, by introducing Deep-Partition and Deep-Crowd-Label. Deep-Partition provides a systematic approach to analyzing the architecture of deep neural networks for efficient deployment. It minimizes the feed-forward execution time of deep models by partitioning them between the phone and the back-end cloud subject to the application requirements, the available resources and the model architecture. While Deep-Partition addresses inference-time efficiency for single-smartphone scenarios, the corresponding chapter also offers methods to efficiently build such deep models for sensing applications, as well as a novel method to improve inference accuracy by using the power of the crowd at inference time. It presents these methods with a real use-case, a crowd-sourced indoor location labeling application, Deep-Crowd-Label.

1.2 Thesis Outline

The smartphone sensing platforms, system architectures, algorithms, and applications proposed in this dissertation are rigorously evaluated using a combination of analysis and experimental studies in real environments. Experimental research plays a key role in the work presented. We build large-scale experimental data-intensive applications over smartphones that are optionally connected to different devices and embedded boards, such as the Arduino and IOIO boards, or to different smart appliances. We study the behavior of these applications and apply our findings toward the construction of larger and more scalable smartphone-based data-intensive sensing systems. By implementing sensing systems, algorithms, and applications on off-the-shelf smartphones and leveraging large-scale data processing algorithms, we discover and highlight the challenges presented by realistic mobile sensing system deployments and propose solutions to address them.

1.2.1 ORBIT: A Smartphone-Based Platform for Data-Intensive Sensing Applications

In Chapter 2 we present the design, implementation, and evaluation of the ORBIT platform and report our experience building data-intensive applications over ORBIT in real scenarios such as environmental monitoring and robotic sensing. ORBIT is a smartphone-based platform for data-intensive embedded sensing applications and features a tiered architecture, in which a smartphone can interface to an energy-efficient peripheral board and/or a cloud service. ORBIT as a platform addresses the shortcomings of current smartphones while utilizing their strengths. ORBIT provides profile-based task partitioning, allowing it to intelligently dispatch the processing tasks among the tiers to minimize the system power consumption.
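Schematically, profile-based partitioning of this kind can be viewed as the following optimization; the notation here is ours, for illustration only, and the formal treatment appears in Section 2.6.3. For a pipeline of n tasks with tier assignments a_i:

\[
\min_{a_1,\dots,a_n}\ \sum_{i=1}^{n} E_i(a_i)
\quad \text{subject to} \quad
\sum_{i=1}^{n} T_i(a_i) + \sum_{i=1}^{n-1} C_i(a_i, a_{i+1}) \le D,
\qquad a_i \in \{\text{extBoard},\ \text{phone},\ \text{cloud}\},
\]

where \(E_i(a_i)\) is the smartphone energy attributable to executing task \(i\) on tier \(a_i\), \(T_i(a_i)\) is its profiled execution time, \(C_i(a_i, a_{i+1})\) is the cross-tier transfer delay of its output (zero when consecutive tasks share a tier), and \(D\) is the application's delay bound.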
ORBIT also provides a data processing library that includes two mechanisms, namely an adaptive delay/quality trade-off and data partitioning via multi-threading, to optimize resource usage. Moreover, ORBIT supplies an annotation-based programming API that significantly simplifies application development and provides programming flexibility. We conduct measurement-based profiling of the latency and power consumption of different Android smartphones and various peripheral boards. Our results suggest that time-critical tasks, such as high-rate sensor sampling and lightweight signal processing, should be executed on the peripheral board, while compute-intensive tasks should be offloaded to the smartphone or the cloud. By intelligently dispatching the processing tasks among the smartphone, the board, and the cloud, ORBIT minimizes the system energy consumption subject to upper-bounded processing delays. ORBIT also includes a signal processing library and a component-based programming environment, which support task partitioning with embedded programming primitives. An extensive microbenchmark evaluation and three case studies, including seismic sensing, visual tracking using an ORBIT robot, and multi-camera 3D reconstruction, validate the generic design of ORBIT.
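To give a flavor of the annotation-based API, the sketch below shows how a pipeline task might be declared. The annotation names (@OrbitTask, @Connect, @DeadlineMs) and the filter stub are hypothetical stand-ins of ours, not ORBIT's actual identifiers.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotations standing in for ORBIT's actual API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface OrbitTask {
    String name();
    int priority() default 0;   // hint consumed by the task partitioner
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Connect {
    String from();              // upstream task whose output feeds this task
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DeadlineMs {
    long value();               // delay bound the partitioner must respect
}

// Example declaration: a band-pass filter fed by the extBoard sampler.
@OrbitTask(name = "bandpass", priority = 1)
@Connect(from = "sampler")
@DeadlineMs(500)
class BandPassFilter {
    double[] process(double[] samples) {
        // filtering logic would come from the signal processing library
        return samples;
    }
}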
1.2.2 SPOT: A Smartphone-Based Platform to Tackle Heterogeneity in Smart-Home Systems

In Chapter 3 we expand the scope of our study to one of the emerging areas of data-intensive sensing applications: smart-homes. In this chapter we look at the recent advancements of smart-home technologies, including the broad penetration of Internet-connected smart appliances such as remotely controllable LED lights, thermostats, cameras, motion sensors, and door locks. We elaborate on how these technologies have changed the way we interact with appliances and perform our daily activities, and on what the challenges toward truly connected smart-homes are. In general, the significant heterogeneity in smart appliances has led to isolated smart-home systems in which each single appliance vendor provides a proprietary solution for appliance-specific connectivity and user experience. In particular, the heterogeneity exists in different aspects of smart appliances such as control apps, communication protocols, messaging schemas, data structures and variable naming.

To address these challenges, we present SPOT, a user-centric, smartphone-based platform for multi-vendor heterogeneous smart-home appliances. SPOT consists of several novel mechanisms including XML and JAVA appliance driver models, an annotation-based API, an appliance-adaptive user interface and appliance/state control. To validate the flexibility and generality of our approach, we have built a SPOT prototype based on 7 real appliances. Our extensive microbenchmark evaluation and case studies show that SPOT tackles different types of heterogeneity in smart appliances, significantly eases the development of cross-device smart-home applications, and improves user experience while incurring low runtime overhead. We believe SPOT is a promising solution toward truly connected smart-homes.

1.2.3 On-device Deep-Learning

Chapter 4 presents a novel partitioning framework for deploying deep neural models in mobile applications more efficiently. The chapter describes the design, implementation and evaluation of Deep-Partition, which represents the first system that combines task offloading with the architecture of deep neural network models. We propose a novel model partitioning framework that enables us to embed deep learning models into mobile applications by decomposing them and assigning layers of the model to different tiers based on their time-criticality, compute-intensity, and heterogeneous latency/memory consumption profiles. To this end, we look into the key standard layers that deep learning frameworks provide to build any deep neural architecture. We benchmark them to obtain the layer-wise latency for each architecture. In addition, we exploit the neural network architecture further and consider the 3D output volume of each layer and the encoding and sparsity of each output volume. These factors drive the design of Deep-Partition. To validate the performance of this framework, we build three major representative deep neural models and partition them with Deep-Partition under several conditions. We show how Deep-Partition minimizes the end-to-end execution time of embedded deep neural models. In addition, this chapter embodies an application of deep neural models in a crowd-assisted system for location labeling. Deep-Crowd-Label uses crowd-sourcing in both training and execution, and shows how a crowd-sourcing architecture can be leveraged to decrease the uncertainty in the predictions of sensing pipelines. It presents a novel model adaptation and transfer learning mechanism to build deep neural models for mobile applications more efficiently, especially when proper training data is not available. We believe the methods provided by Deep-Partition and Deep-Crowd-Label significantly facilitate building high-performance mobile sensing applications.
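The gist of this layer-wise treatment can be sketched as follows: time each layer's forward pass on the phone, then choose the split point that minimizes the estimated end-to-end latency given the size of the activation that would have to be shipped to the cloud. The Layer interface, the cloudLatencyMs profile and the bytesPerMs uplink estimate below are simplifications of ours for illustration, not Deep-Partition's actual API.

import java.util.List;

/** Schematic stand-in for a deep learning runtime's layer abstraction. */
interface Layer {
    float[] forward(float[] input);
    int outputBytes();   // size of this layer's activation tensor in bytes
}

/** Illustrative split-point search. cloudLatencyMs[i] (assumed given)
 *  estimates cloud time for the layers after layer i. */
class PartitionPlanner {
    static int bestSplit(List<Layer> layers, float[] sample,
                         double[] cloudLatencyMs, double bytesPerMs) {
        int n = layers.size();
        double[] phoneMs = new double[n];
        float[] x = sample;
        for (int i = 0; i < n; i++) {                // layer-wise profiling
            long t0 = System.nanoTime();
            x = layers.get(i).forward(x);
            phoneMs[i] = (System.nanoTime() - t0) / 1e6;
        }
        int best = n;                                 // n: run fully on-phone
        double bestMs = sum(phoneMs, n);
        for (int split = 0; split < n; split++) {
            double ms = sum(phoneMs, split + 1)                       // phone part
                      + layers.get(split).outputBytes() / bytesPerMs  // upload
                      + cloudLatencyMs[split];                        // cloud part
            if (ms < bestMs) { bestMs = ms; best = split; }
        }
        return best;   // index of the last layer executed on the phone
    }

    private static double sum(double[] a, int upTo) {
        double s = 0;
        for (int i = 0; i < upTo; i++) s += a[i];
        return s;
    }
}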
1.3 Thesis Contribution

In this dissertation, we make several broad contributions to the field of smartphone-based sensing and data-intensive sensing applications, as summarized in the following.

1- The work in this dissertation contributes to spearheading the emerging area of data-intensive sensing applications and provides generic and universal platforms for such applications. In Chapter 2 we conduct systematic measurement and modeling to understand the opportunities as well as the challenges of using smartphones for data-intensive embedded sensing applications. Our measurement results are also useful for the design of a broad class of smartphone-based sensing systems. Second, we provide an implementation of several data processing algorithms as a library, several mechanisms that improve the efficiency of data processing algorithms for smartphones, and mechanisms to extend the hardware platform with extension boards like the Arduino board, if needed. To the best of our knowledge, ORBIT is the first general-purpose, extensible, application-aware, and end-to-end sensing and processing platform for smartphone-based data-intensive embedded applications. Lastly, we demonstrate the generality and flexibility of ORBIT as a platform by presenting our experience in prototyping three applications upon ORBIT: seismic sensing, multi-camera 3D reconstruction and robotic sensing. The flexible task partitioning and dispatching framework allows ORBIT to adapt to different task structures, application deadlines, and communication delays.

2- In Chapter 3 we perform a systematic study to understand the characteristics of smart-home appliances as well as the opportunities and challenges of using the smartphone as the central gateway to control smart-home appliances. The result of our study shows multiple aspects of heterogeneity in smart appliances. Second, we provide a flexible, extensive and extensible device driver model that supports a number of smart appliances available on the market. The driver model addresses multiple types of heterogeneity observed in our study. Third, we provide the design and implementation of the proposed platform as a smart-home system that loads the drivers at runtime along with a dynamic user interface adaptive to the features of each appliance. Lastly, we demonstrate the generality and flexibility of our system by presenting our experience in prototyping the drivers for several real appliances as well as a cross-device home application. We also discuss examples of other home applications that we have prototyped on top of our platform. We believe this work is a crucial solution for the current highly fragmented smart-home systems and is a major step toward having a truly connected smart-home.

3- In Chapter 4 we present a collection of methods to address the training and execution challenges of mobile sensing pipelines that embed deep neural models, one of the most compute-intensive data processing methods for mobile applications. In this chapter we address both system and algorithmic challenges from two different yet complementary perspectives: (a) building the processing pipeline, and (b) runtime execution of the pipeline. The methods provided in this chapter enable researchers and developers to build more efficient mobile sensing applications with more accurate built-in data processing pipelines.

We believe that ORBIT, SPOT, Deep-Partition and Deep-Crowd-Label significantly advance the understanding of the opportunities and challenges in the design of smartphone-based data-intensive sensing systems. By proposing some early solutions to tackle these challenges, and ways to seize the opportunities provided, this dissertation opens up new research directions in this emerging area.

Chapter 2

ORBIT: A Smartphone-Based Platform for Data-Intensive Sensing Applications

2.1 Introduction

Owing to their rich processing, multi-modal sensing, and versatile networking capabilities, smartphones are increasingly used to build data-intensive embedded sensing applications. However, various challenges must be systematically addressed before smartphones can be used as a generic embedded sensing platform, including high power consumption and the lack of real-time functionality and user-friendly embedded programming support. In this chapter, we take the first step toward addressing these challenges collectively. We present ORBIT, a smartphone-based platform for embedded sensing systems. In particular, ORBIT leverages off-the-shelf smartphones to meet the energy-efficiency and timeliness requirements of data-intensive embedded sensing applications.

ORBIT is based on a tiered architecture that comprises up to three tiers: the cloud, the smartphone, and one or more energy-efficient peripheral boards (referred to as extBoard) that are interfaced with the smartphone. A number of extBoard platforms are currently available, such as Arduino [Arduino Board] and IOIO [IOIO for Android]. Therefore, if the built-in sensors on the smartphone are not suitable for a sensing application, these boards can readily integrate various accessories, such as external sensors, with an Android phone via a USB or Bluetooth interface. We conduct a measurement study on the latency and power consumption of Android smartphones and extBoard platforms.
Our results show that the two platforms have highly heterogeneous but complementary power/latency profiles: the smartphone features higher energy efficiency due to its faster processing capability, while yielding poor timing accuracy due to OS overhead. These results have important implications for efficient task partitioning. In particular, while the smartphone and cloud should handle long-running compute-intensive tasks, time-critical functions such as high-rate sensor sampling and precise event timestamping must be shifted to the extBoard owing to its hardware timers and efficient interrupt handling. Motivated by the above observations, we propose a task partitioning framework that assigns tasks to different tiers based on their time-criticality, compute-intensity, and heterogeneous latency/power consumption profiles. Furthermore, to take advantage of the increasing availability of multiple cores on smartphones, ORBIT implements a data partitioning scheme that decomposes matrix-based computation into multiple threads. ORBIT also integrates a data processing library that supports high-level, Java-annotation-based application programming. The design of this library facilitates the resource management of embedded applications by promoting a delay/quality trade-off mechanism. To enable dynamic task dispatch and runtime task profiling, we develop an ORBIT runtime environment consisting of task controllers running on each tier. These controllers coordinate task execution through a unified messaging protocol. Owing to these features, ORBIT is a powerful system toolkit for building a wide spectrum of data-intensive embedded sensing applications.

The contributions of our work are outlined as follows. First, we conduct systematic measurement and modeling to understand the opportunities as well as the challenges of using smartphones for data-intensive embedded sensing applications. Our measurement results are also useful for the design of a broad class of smartphone-based sensing systems. Second, we provide an implementation of several data processing algorithms as a library, as well as several mechanisms that improve the efficiency of data processing algorithms for both the smartphone and the extension board. Several components of ORBIT bear some similarity to existing embedded system platforms [Cuervo et al., 2010a, Girod et al., 2004, Newton et al., 2009, Sorber et al., 2005]. However, to the best of our knowledge, ORBIT is the first general-purpose, extensible, application-aware, and end-to-end sensing and processing platform for smartphone-based data-intensive embedded applications. (The source code of ORBIT is available at https://github.com/msu-sensing/ORBIT.) Lastly, we demonstrate the generality and flexibility of ORBIT as a platform by presenting our experience in prototyping applications upon ORBIT, including seismic sensing and multi-camera 3D reconstruction. The flexible task partitioning and dispatching framework allows ORBIT to adapt to different task structures, application deadlines, and communication delays. The experiments show ORBIT reduces energy consumption by up to 50% compared to baseline approaches.

2.2 Related Work

Mobile sensing based on smartphones has recently received significant interest. Most studies focus on issues related to human-centric context, including coordination among multiple concurrent sensing applications [Kang et al., 2008, Kang et al., 2010, Ju et al., 2012] and sensing algorithms such as context classifiers [Chu et al., 2011]. Recently, smartphones have been used in a number of embedded sensing applications.
In [Faulkner et al., 2011], smartphones are used to build an earthquake early warning system using the onboard accelerometer. In the Floating Sensor Network project [Floating sensor network project], smartphone-equipped drifters are deployed to monitor waterways and collect the real-time volume and direction of water flow based on the phone's GPS. The NASA PhoneSat project [NASA PhoneSat, 2013] has launched low-cost satellites equipped with Android smartphones. Controlled by a smartphone, such small satellites could perform various tasks such as earth observation and space debris tracking. Several recent efforts focus on building cloud robots [Guizzo, 2011] that integrate smartphones with robots. The phone's built-in sensors are used for sensing and navigation, while compute-intensive tasks like image and voice recognition are offloaded to the cloud.

Various task offloading schemes for smartphones have been developed recently. Spectra [Flinn et al., 2002] allows programmers to specify task partitioning plans given application-specific service requirements. Chroma [Balan et al., 2003] aims to reduce the burden of manually defining detailed partitioning plans. Medusa [Ra et al., 2012] features a distributed runtime system to coordinate the execution of tasks between smartphones and the cloud. Turducken [Sorber et al., 2005] adopts a hierarchical power management architecture, in which a laptop can offload lightweight tasks to tethered PDAs and sensors. While Turducken provides a tiered hardware architecture for partitioning, it relies on the application developer to design a partitioned application across the tiers to achieve energy efficiency. Different from these task partitioning schemes, ORBIT dispatches the execution of sensing and processing tasks in a smartphone-based multi-tier architecture to meet the requirements of data-intensive applications. ORBIT maximizes the battery lifetime subject to application-specific latency constraints. Moreover, in order to support fine-grained task partitioning across the tiers, the developer specifies the application's task structure as well as real-time requirements via either Java annotations or an XML-based application model provided by ORBIT. ORBIT also provides a messaging interface to support a unified data passing mechanism between heterogeneous tiers and between different application components. The details of this messaging protocol are described in a technical report [Moazzami et al., 2013].

The MAUI system [Cuervo et al., 2010a] enables a fine-grained offloading mechanism to prolong the smartphone's battery lifetime. However, MAUI relies on the properties of the Microsoft .NET managed code environment to identify the functions that can be executed remotely.
Wishbone uses the CPU and network timing profiles only to find the optimal task partition, while ORBIT considers the measured latency and power consumption, which leads to more energy-efficient task partitions. Moreover, Wishbone depends on the timing profiles based on sample data under the assumption that the sample data can represent actual runtime data. However, our measurement study shows that the signal processing timing profiles can exhibit significantly variations in real scenarios. To address this, ORBIT measures the statistical timing profiles at runtime, and periodically refines the partitioning results. Moreover, Wishbone formulates the partitioning problem as a 0/1 integer linear programming problem and thus supports two tiers only. In contrast, ORBIT formulates the problem as a non-linear optimization problem and supports three or more tiers. RTDroid [Yan et al., 2014] tackles the lack of hard real-time capability of Android system and addresses the problem by redesigning and replacing several Android components in Dalvik, e.g., Looper-Handler and Alarm-Manager. In contrast, ORBIT requires no changes to the Android system. ORBIT accounts for statistical properties of task execution, and finds the best execution assignment by its task partitioning mechanism. Hence, although RTDroid and ORBIT address different sets of issues, they are complementary. In fact, ORBIT can run on RTDroid and the 16 ORBIT-based sensing applications can benefit from both. Similar to ORBIT, EmStar [Girod et al., 2004] provides an environment to implement distributed embedded systems for sensing applications based on Linux-class Microservers. However ORBIT takes one major step further and proposes a design based on smartphones for the purpose they are not originally designed for, which is embedded systems. This difference in underlying technology leads to totally different design and implementation. Although EmStar and ORBIT have similar modular designs, unlike ORBIT, EmStar does not have any partitioning mechanism and it is not strictly tiered. More importantly, ORBIT provides a library of data processing algorithms that are efficient on the resource-constrained smartphone and extension board. This is not a design goal of EmStar. 2.3 Motivation and System Overview In this section, we discuss the motivation of using smartphone as a system platform for dataintensive embedded sensing applications and the design objectives of ORBIT. 2.3.1 Motivation and Challenges Mote-class sensing platforms such as TelosB have been widely adopted by embedded sensing applications in the past decade. However, due to the limited processing and storage capabilities, they are ill-suited for high-sampling-rate sensing applications. Recently, several single-board computers such as Gumstix [Gumstix, ], SheevaPlug [Marvell Sheevaplug, ], and Raspberry Pi [Raspberry Pi, ], which are equipped with rich processing and storage capabilities, have been increasingly used in embedded applications. However, their designs are not particularly optimized for low-power sensing. Moreover, without on-board sensors and wireless interfaces, they need to be equipped with various peripherals for different applications. 17 Different from the above platforms, commercial off-the-shelf smartphones offer several salient advantages that make them a promising system platform for data-intensive embedded sensing applications. 
The advantages include rich computation and storage resources, multiple network interfaces and sensing modalities, increasingly available multi-core architectures, and low cost. Moreover, smartphones come with advanced programming languages and friendly user interfaces, such as a touch screen that enables rich and interactive displays, unlike the limited user interfaces of motes and embedded computers (e.g., LEDs and buttons). However, we still face the following major challenges in building an embedded sensing platform based on COTS smartphones:

(1) High power consumption: The smartphone power management schemes are designed to adapt to user activities to extend battery time. However, they are not suitable for untethered embedded sensing systems. If the smartphone samples sensors continually, its CPU cannot enter a deep sleep state to save energy. Low-power coprocessors (e.g., the M7 in the iPhone 5s) can handle continuous sampling, but are available on a few high-end models only.

(2) Lack of real-time functionality: Many sensing applications have stringent real-time requirements, such as a constant sampling rate and precise timestamping. However, modern smartphone OSes are not designed for meeting these real-time requirements. For instance, sensor sampling can be delayed by high-priority CPU tasks such as Android system services or user interface drawing. Our measurements show that the software timer provided by Android may be blocked by Android core system services for up to 110 milliseconds. Moreover, the Android programming library does not provide native interfaces that allow developers to express timing requirements.

(3) Lack of embedded programming support: The programming environment of the smartphone is designed to facilitate the development of networked, human-centric mobile applications. However, it lacks important embedded programming support such as resource-efficient signal processing libraries and unified primitives for controlling and communicating with peripheral accessories such as external sensors.

2.4 System Overview

In this chapter, we present ORBIT, which is designed to address the above three major challenges. An ORBIT node comprises an Android smartphone, an extBoard (e.g., IOIO [IOIO for Android] or Arduino [Arduino Board]), and possibly a runtime system on the cloud. The extBoard is connected to the smartphone through a USB cable or Bluetooth for communication. It is equipped with a low-power MCU, e.g., an ATmega2560 running at 16 MHz with 8 KB RAM, and an analog-to-digital (A/D) converter that can integrate various analog sensors.

[Figure 2.1: ORBIT nodes for seismic sensing and robots. (a) Seismic ORBIT node; (b) robotic ORBIT node.]

[Figure 2.2: System architecture of ORBIT.]

Fig. 2.1 shows two ORBIT prototypes, a seismic monitoring node and a robot sensing node, that are used in the evaluation (cf. Section 2.8). Fig. 2.2 shows the overall system architecture of ORBIT. ORBIT is designed to meet the following three requirements.

(1) Energy efficiency while accounting for timeliness requirements: ORBIT leverages the heterogeneous power/latency characteristics of the multiple tiers (e.g., extBoard, smartphone and cloud server) to minimize the overall energy consumption. It also models the timing latency of the application statistically and applies these models in task partitioning and execution. We note that ORBIT cannot achieve hard real-time guarantees. However, the statistical task timing model allows the task deadlines to be met with higher probability.
(2) Programmability: ORBIT provides a component-based programming environment that allows developers to build sensing applications without dealing with the low-level issues of system design. (3) Compatibility: ORBIT relies solely on the out-of-the-box functionality of COTS smartphones, without requiring kernel-level customization or device rooting. This not only minimizes the burden on application developers, but also ensures compatibility with diverse smartphone models. In the following, the major ORBIT components are described.

ORBIT Library and Application Model: ORBIT provides a library of signal processing algorithms with unified interfaces that can easily be composed into various advanced sensing applications. The library provides a programming primitive, referred to as a connection, allowing programmers to specify the application composition in an XML file or through Java annotations. In particular, each algorithm can be executed on any tier, enabling flexible task dispatching.

Task/Data Partitioner and Execution Time Profiler: To meet the deadlines of sensing applications, time-critical tasks should be executed on the extBoard, while compute-intensive tasks should be executed on the smartphone and/or the cloud. We formally formulate a task partitioning problem that aims to minimize the energy usage of the smartphone subject to a processing delay bound on time-critical tasks. The Task Partitioner solves this problem and obtains the optimal task dispatch plan. A challenge presented by this design is that signal processing tasks may have highly variable execution times. We design an online profiler that measures task execution times at runtime and re-runs the task partitioner dynamically. Moreover, ORBIT adopts a data partitioning scheme that decomposes matrix-based computation into multiple threads to take advantage of the increasing availability of multiple cores on smartphones.

Task Controllers and Unified Messaging Protocol: At runtime, the Task Controllers on the different tiers collaboratively instantiate the tasks and execute them following the task dispatch plan. The extBoard runs low-level, real-time functions such as sensor sampling and lightweight signal processing tasks. The smartphone and cloud run compute-intensive tasks that require data from a single ORBIT node and from multiple ORBIT nodes, respectively. To facilitate such flexible task dispatching and control, we develop a unified messaging protocol for communication across the tiers on top of native communication channels such as USB (between phone and extBoard) and HTTP (between phone and cloud server). Due to space constraints, the details of the messaging protocol are omitted here and can be found in a technical report [Moazzami et al., 2013].

2.5 Measurement-Based Latency and Power Profiling

To use smartphones as a system platform for data-intensive sensing applications, it is important to understand the characteristics of their latency and power consumption. This section presents a measurement study of the latency and power consumption of different smartphones. The measurement study provides insights into the limitations of smartphones and motivates several key design decisions in ORBIT. For instance, the designs of the task partitioner, the execution time profiler, and the adaptive delay/quality trade-off in the library are based on the findings of the measurement study discussed in this section.

2.5.1 Timing Accuracy and Latency Profiling

Timing accuracy is critical for many sensing applications.
For instance, acoustic or seismic source localization [Liu et al., 2013] typically requires millisecond-level precision for the timestamps of sensor samples. In this section, we measure the accuracy of the software timer and event timestamping of Android smartphones and discuss the impact on the design of ORBIT. First, an event timer is commonly used to implement constant-rate sensor sampling, and its accuracy determines the sampling-rate precision that can be supported. Second, timestamping an external event, which may be triggered by a GPS receiver or a sensor connected to the smartphone through USB, is also essential for many embedded applications. Our measurements are conducted using an LG GT540, a Nexus S, and a Galaxy Nexus, representing three typical low- to medium-end smartphone models. They run three versions of the Android distribution: 2.1, 4.0.4, and 4.2.2. The LG GT540 results discussed here are representative of the measured phones in terms of the level of timing variability.

Figure 2.3: Distribution of the intervals between two interrupts raised by a software timer of Android.

Figure 2.4: Distribution of execution time of the SIFT algorithm on 640x480 images on Nexus S.

Software timer: Fig. 2.3 plots the distribution of the intervals between two interrupts generated by a software periodic timer with a desired interval of 10 ms, while only Android core system services are running. Although most intervals are close to 10 ms, the distribution has a long tail with a maximum interval above 110 ms.

Event timestamping: We then measure the delay between the time instant when a pulse signal is received by a digital pin of an extBoard (which triggers a USB interrupt to Android) and when the USB interrupt is received in an Android application. Our measurements show that this delay is highly variable and can be up to 5 ms. Given the poor timing accuracy of Android suggested by these results, it is difficult to implement high-constant-rate sensor sampling or precise event timestamping on the phone. In contrast, our measurements show that the timing error of an Arduino extBoard is no greater than 12 µs, owing to the availability of hardware timers and efficient interrupt handling.

We then investigate the execution times of the signal processing algorithms. We find that most algorithms have relatively constant execution times for fixed input sizes. However, the execution time of a few algorithms depends on the input data. Fig. 2.4 shows the distribution of the execution time of the scale-invariant feature transform (SIFT) used for detecting features of different 640x480-pixel images on a Nexus S. This example suggests that the statistical properties of signal processing delays must be accounted for at runtime to ensure the real-time performance of the application. The execution times of the tasks determine their energy consumption and strongly affect the real-time performance of the application.
Fig. 2.5 plots the execution times of four signal processing algorithms on an Arduino extBoard and a Nexus S smartphone versus the length of the input signal. It can be seen that the extBoard's and the smartphone's latencies are on the order of seconds and milliseconds, respectively. However, they have comparable power consumption, as will be shown in Section 2.5.2. Therefore, the smartphone can process the signals with less energy and shorter delays.

Figure 2.5: Execution time of signal processing algorithms (error bar: standard deviation). (a) On Arduino; (b) On Nexus S.

2.5.2 Power Profiling

As computation is typically the dominant source of power consumption in data-intensive sensing applications, we focus on profiling the CPU of smartphones. The power consumption of other components (e.g., the radio) can easily be integrated with the measured CPU power profiles. We measure the current draw of several Android smartphones using an Agilent 34411A multimeter. Fig. 2.6a shows the power consumption of a Samsung Nexus S and an Arduino board in different processing states. We observed similar CPU state transitions and power consumption characteristics across multiple smartphone models. Initially, the smartphone is in the sleep state and hence draws little current (less than 5 mA). At the 5th second, the extBoard requests the smartphone to execute an FFT algorithm. Upon receiving the request, the phone first acquires a wake-lock, the Android mechanism that prevents the phone from going to sleep. At the 25th second, the FFT completes and releases the wake-lock. Before the phone fully wakes up or goes to the sleep state, there is a transitional phase with a few power spikes. Fig. 2.6b shows an expanded view of these two transitional phases. We refer to them as the wake-up and tail phases, lasting approximately 200 ms and 755 ms, respectively. There are also two spikes in Fig. 2.6a, caused by the communications between the phone and the extBoard. Since these spikes are very short and have limited current draw, their energy consumption is negligible. Based on these results, we define four CPU states: sleep, wake-up, active, and tail. The Arduino extBoard has three states: active, idle, and sleep. Its average current draw in these states is 90 mA, 66 mA, and near zero, respectively. In contrast to the smartphones, the transitional states of the Arduino are very short (on the order of µs) and hence their energy consumption is negligible.

Figure 2.6: Arduino and Nexus S power consumption profiles. (a) Power consumption comparison of Arduino and Nexus S in a 30-second experiment; (b) Zoomed-in view of the wake-up and tail states of Nexus S.
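To make the cost of these states concrete, the sketch below shows one way the energy of a single phone-side task execution could be accounted for. Only the 200 ms wake-up and 755 ms tail durations come from the measurements above; the power levels and all names are illustrative assumptions, not measured constants.

    // Illustrative energy bookkeeping for one smartphone task execution,
    // using the four CPU states identified above. Power values are assumed.
    public final class PhoneEnergyModel {
        static final double P_ACTIVE_MW = 600.0; // assumed active power (mW)
        static final double P_TRANS_MW = 400.0;  // assumed transitional power (mW)
        static final double T_WAKEUP_S = 0.200;  // wake-up phase (measured above)
        static final double T_TAIL_S = 0.755;    // tail phase (measured above)

        // Energy (mJ) of a task that keeps the CPU active for activeSeconds,
        // including the wake-up and tail overhead around the active period.
        static double taskEnergyMilliJoules(double activeSeconds) {
            double overhead = P_TRANS_MW * (T_WAKEUP_S + T_TAIL_S);
            return overhead + P_ACTIVE_MW * activeSeconds;
        }
    }

Under these assumed powers, a 100 ms task would cost about 442 mJ, of which roughly 86% is transition overhead; this is why the task partitioner in Section 2.6.3 treats wake-up and tail as first-class terms.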
2.5.3 Summary

The above profiling results show significant heterogeneity in the power and latency profiles of the different tiers (extBoard and smartphone). Although similar measurement studies have been reported in the literature [Newton et al., 2009, Cuervo et al., 2010a], we collectively report our measurement results and show how these findings carry important implications, both challenges and opportunities, for the design of ORBIT. First, as the Android system has poor timing accuracy, time-critical functions such as high-rate sensor sampling and precise sensor event timestamping must be shifted to the extBoard, owing to its hardware timers and efficient interrupt handling. Second, signal processing algorithms may have dynamic execution times, which require online profiling to ensure that the critical time deadlines of the application are met. Third, smartphones have much lower latency and higher energy efficiency than the extBoard. However, if the extBoard must stay active to continually sample sensors, it is desirable to utilize its spare time to process signals so that the smartphone can sleep to save energy. Lastly, the transitional phases (wake-up and tail) and the data transfers among the tiers incur non-negligible overhead in both energy consumption and latency. When dispatching signal processing tasks to the different tiers, these characteristics must be carefully considered in order to minimize the total system energy consumption while meeting application latency constraints.

2.6 Design and Implementation

This section presents the design of ORBIT to achieve the objectives discussed in Section 2.3.1.

2.6.1 Application Pipeline

An ORBIT application pipeline can be represented by a graph in which the nodes are the processing tasks and the edges are the data flows. The application pipeline, which defines the sequence in which the tasks are executed, is used by ORBIT's component-based programming model and task partitioning module. Each task implements an elementary sensing or processing operation, such as computing a mean, computing an FFT, or converting an image to grayscale. For example, an application pipeline can be: sample the sensor (camera) → low-pass filter → face recognition → write into file. Each task can itself be composed of a few smaller tasks. Such an application model offers two benefits. First, through the notion of a task, we can build the latency profile of each task (as explained in Section 2.5.1) and use it for task partitioning (as described in the next section). Second, the ORBIT application model can significantly simplify application development and reduce the effort to create an application, especially for developers who are not familiar with embedded system design. In particular, ORBIT presents application developers with a single programming abstraction without burdening them with low-level details such as where and how the tasks are executed and how they communicate across the tiers. ORBIT supports two methods for specifying an application: a developer can either write Java code using the ORBIT API or write an XML file. In either case, the application pipeline specifies which tasks are used, what parameters are set for each task, and how the tasks are connected to form the pipeline. From this point forward, we will use a running example, shown in Fig. 2.7, to illustrate how tasks are connected to build an application, as well as the automatic execution optimization and manipulation in later sections.

Figure 2.7: An example ORBIT application with data connections, sequential execution connections, and branching execution connections. The numbers beside the leaf nodes in the execution tree are the priorities assigned by the application developer; the tag Sx of a task represents the set it belongs to in the task partitioning solution.

The sample application has 12 tasks (i.e., T_1 to T_12). The main way to define an application is to use the ORBIT API, which is based on Java annotations [Oracle, ].
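The text does not show how these annotations are declared; the following is a minimal sketch of what a runtime-visible connection annotation could look like in standard Java. The names and the use of field-name strings are assumptions (a Java annotation cannot reference fields directly, so a runtime would likely resolve names via reflection).

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical declaration of a pipeline-connection annotation. Keeping it
    // at runtime lets the framework reflect over a pipeline class, discover the
    // annotated task fields, and wire their input/output pins together.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Next {
        // Names of the downstream task fields fed by this task's outputs.
        String[] value() default {};
    }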
By using this API, an application developer implements the application pipeline as a Java class, specifying each task in the pipeline as a field and using ORBIT-provided annotations to annotate each task. Through annotations, the developer indicates which task is connected to which other task(s), as well as which output data pins of the source task are connected to which input data pins of the destination task. For instance, a Java class generating the application pipeline in Fig. 2.7 can be implemented as shown in Fig. 2.8, where Task_i is an algorithm in the ORBIT library and the param_i's specify the input and output parameters for each task, including the input and output data, the data sizes (number of samples), and other algorithm-specific parameters, e.g., threshold, window size, etc. The @Next annotation is defined by the ORBIT API and used by the application developer to connect the tasks and form the pipeline. The annotations @Source and @Sink indicate the source and sink tasks in the pipeline.

    /** import ORBIT API **/
    public class Sample_application_pipeline extends ORBIT_pipeline_model {

        @Source
        @Next{T_2, T_3}
        private Task T_1 = new Task_1(param_1, param_2, ..., param_N);

        @Next{T_4, T_5{2}, T_6{1}}
        private Task T_2 = new Task_2(param_1, param_2, ..., param_N);

        @Next{T_7, T_8}
        private Task T_3 = new Task_3(param_1, param_2, ..., param_N);
        ...
        @Sink
        private Task T_10 = new Task_10(param_1, param_2, ..., param_N);
    }

Figure 2.8: Pseudo-code for generating an application pipeline.

A key advantage of the annotation-based ORBIT programming model is that developers can use the advanced features of Java supported by Android and take advantage of the ease of use of the Java language to set up the application pipeline, without being burdened with error-prone embedded programming in low-level languages.

2.6.2 Data Processing Library

ORBIT provides a library of data processing algorithms ranging from common learning algorithms and utilities (e.g., classification, regression, clustering, filtering, and dimensionality reduction) to primitives like gradient descent optimization. Using these well-tested functions and the provided APIs, developers can quickly construct sensing applications by simply connecting different building blocks via the ORBIT application pipeline model. This library has two main design objectives. First, it is extensible, so that developers can easily add more algorithms or port legacy signal processing libraries. Second, it is designed to be resource-friendly on the smartphone and the extBoard (if utilized by the application). Several algorithms are implemented in Java, while others are written in C++ and connected with the rest of the ORBIT components via a Java Native Interface (JNI) bridge.
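As an illustration of this bridging pattern, a minimal sketch of how a C++ routine could be exposed to the Java side via JNI is shown below; the class, method, and library names are hypothetical, not ORBIT's actual API.

    // Hypothetical Java-side declaration of a natively implemented algorithm.
    // The C++ body would be compiled into the named shared library and bound
    // to this signature through the standard JNI naming convention.
    public final class NativeFilters {
        static {
            System.loadLibrary("orbit_dsp"); // assumed native library name
        }

        // Band-pass filter implemented in C++ for efficiency; callable from
        // ORBIT tasks like any other Java method.
        public static native double[] bandPass(double[] signal,
                                               double lowHz, double highHz);
    }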
A key challenge in the design of the ORBIT programming library is that many ORBIT applications have stringent timing and overhead requirements. The library therefore includes two mechanisms that optimize resource usage while providing programming flexibility: an adaptive delay/quality trade-off and data partitioning via multi-threading. These mechanisms allow programmers to develop resource-friendly applications on smartphone platforms.

2.6.2.1 Adaptive Delay/Quality Trade-off

The goal of this feature is to shorten the execution time of many tasks without substantially impeding the quality of their output. ORBIT achieves this by taking advantage of a property common to many algorithms: they are iterative and based on an optimization function. The most commonly used methods for solving optimization problems, including the gradient descent method and Newton's method, are implemented as low-level primitives in the ORBIT library. Gradient descent is an iterative process that moves in the direction of the negative derivative in each step (or iteration) to decrease the loss; once the loss is less than a threshold, the algorithm stops. Similarly, Newton's method uses the second derivative to take a better route. Thus, a task that goes through more iterations to find the optimal solution of an objective function experiences a longer execution time and consequently causes the application to consume more energy on the smartphone. One way to shorten this latency, and thus decrease the energy consumption, is simply to stop the algorithm earlier, e.g., when the solution at step t is satisfactory. This approach is motivated by the principles of anytime algorithms [Zilberstein, 1996]. The early-stopping mechanism for these iterative optimization tasks in ORBIT is controlled by three parameters: stepSize, numOfIterations, and samplingFraction, where samplingFraction is the fraction of the total data sampled in each iteration to compute the gradient direction. In the ORBIT library, these parameters are used as input parameters to the quality controller for each task while still satisfying the quality level of the entire application pipeline.
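A minimal sketch of such an early-stopped gradient descent primitive is given below. The one-dimensional loss interface and the explicit threshold are illustrative simplifications; only the controlling parameters stepSize and numOfIterations come from the text (samplingFraction would enter through how the gradient estimate is computed).

    import java.util.function.DoubleUnaryOperator;

    // Sketch of an early-stopping ("anytime") gradient descent controlled by
    // the parameters named in the text.
    public final class AnytimeGradientDescent {
        // loss: f(x); grad: f'(x), possibly estimated from a samplingFraction
        // of the data as described above.
        static double minimize(DoubleUnaryOperator loss, DoubleUnaryOperator grad,
                               double x0, double stepSize, int numOfIterations,
                               double lossThreshold) {
            double x = x0;
            for (int i = 0; i < numOfIterations; i++) {
                if (loss.applyAsDouble(x) < lossThreshold) {
                    break; // anytime behavior: good enough, so stop early
                }
                x -= stepSize * grad.applyAsDouble(x); // step against the gradient
            }
            return x; // best solution found within the allowed iterations
        }
    }

For example, minimizing f(x) = (x - 3)^2 from x0 = 0 with stepSize 0.1 and a loss threshold of 1e-6 converges in a few dozen iterations, far below a cap of several thousand; this unused headroom is exactly what ORBIT's quality controller reclaims.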
2.6.2.2 Data Partitioning via Multi-threading

One of the key advantages of the smartphone, in comparison to mote-class platforms, is the availability of high-speed multi-core processors. Many smartphones today have two or more cores; for instance, the Moto G costs less than $110 and has 4 cores. However, in spite of the availability of multi-core CPUs, multi-threaded programming remains challenging. ORBIT can automatically partition long-running, compute-intensive tasks into different threads and run them on different cores. This allows users to focus on the domain-specific aspects when designing the task structure for their sensing applications. There are two different approaches to transforming an application into multiple threads. First, we can schedule different tasks of the application to execute on a pool of worker threads; in particular, ORBIT can parse the task structure and schedule tasks to different threads accordingly. However, many embedded applications contain a small number of "bottleneck" tasks in the signal processing pipeline whose execution time dominates the total latency. As a result, such a task-level multi-threading strategy would not significantly reduce the end-to-end latency. ORBIT therefore adopts a data-driven multi-threading approach to partition these tasks. We now use the matrix-vector multiplication operation as an example to illustrate this approach. Many signal processing algorithms (e.g., various transforms and compressive sampling) are based on matrix multiplication. The output y is the matrix-vector product y = Ax, where A ∈ Z^{m×l} is the computation matrix applied to the input x ∈ Z^{l×1}. Suppose matrix A is evenly split into K sub-matrices, i.e., A = [A_1, A_2, ..., A_K], where A_k ∈ Z^{(m/K)×l}. The k-th sub-task computes y_k = A_k x, and the final result is y = [y_1, y_2, ..., y_K]. Each sub-task thus also performs a matrix-vector multiplication. ORBIT picks the value of K based on the number of cores available on the phone (which can be queried through an Android API). ORBIT creates the computation threads on-the-fly and assigns them the maximum priority so that other threads running on the device cannot take resources away from them. In this manner, ORBIT splits all matrix-based signal processing tasks assigned to the smartphone. A number of signal processing algorithms based on matrix operations can benefit from ORBIT's data partitioning scheme; examples include Singular Value Decomposition (SVD), Eigenvalue Decomposition, Principal Component Analysis (PCA), and mean computation. These fundamental algorithms are often used in the design of other, more advanced algorithms. Since the extBoard does not support multi-threading, the extBoard versions are implemented in C++ without the use of any matrix libraries. A key design consideration of multi-threading is to minimize the overhead of inter-thread communication. In ORBIT, the matrices are passed to the threads by reference, and each thread computes a partial, non-overlapping (disjoint) part of the result. In other words, different threads access the same data structure but disjoint parts of it. For example, thread k computes y_k = A_k x, and the sub-vectors y_k do not overlap. The vector y = [y_1, y_2, ..., y_K] is likewise accessed by the main thread without conflicts or memory copies between threads. The avoidance of inter-thread communication in ORBIT is important for data-intensive tasks that deal with large matrices.
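The following is a minimal sketch of this row-block partitioning using plain Java threads. It follows the scheme just described (disjoint row blocks, shared result written without locking, maximum thread priority); the class and method names are illustrative, as the actual ORBIT implementation is not shown in the text.

    // Sketch of data-partitioned y = A x: A is split into K row blocks and
    // each block's partial product is computed on its own thread. Threads
    // write disjoint slices of the shared result, so no locking is needed.
    public final class PartitionedMatVec {
        public static int[] multiply(int[][] A, int[] x) throws InterruptedException {
            int m = A.length;
            int k = Math.min(Runtime.getRuntime().availableProcessors(), m);
            int[] y = new int[m]; // shared result; each thread owns a slice
            Thread[] workers = new Thread[k];
            for (int t = 0; t < k; t++) {
                final int lo = t * m / k;
                final int hi = (t + 1) * m / k;
                workers[t] = new Thread(() -> {
                    for (int i = lo; i < hi; i++) { // rows [lo, hi) only
                        int sum = 0;
                        for (int j = 0; j < x.length; j++) sum += A[i][j] * x[j];
                        y[i] = sum;
                    }
                });
                workers[t].setPriority(Thread.MAX_PRIORITY); // as described above
                workers[t].start();
            }
            for (Thread w : workers) w.join(); // main thread then reads y safely
            return y;
        }
    }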
2.6.3 Task Partitioning and Energy Management

A key design objective of ORBIT is to provide an energy-efficient smartphone-based platform. For this purpose, ORBIT adopts a task partitioning framework that exploits the heterogeneity in the power consumption and latency profiles of the different tiers. The task partitioning algorithm minimizes system energy consumption while meeting the processing deadlines of sensing applications. In addition, to reduce application delays, ORBIT implements a data partitioning scheme that decomposes matrix-based computation into multiple threads, which are scheduled to execute on different CPU cores.

2.6.3.1 Power Management Model

Based on the key observations from the measurement study in Section 2.5, ORBIT employs different power management strategies for different tiers. Specifically, the extBoard operates in a duty cycle in which it remains active for T_a seconds and sleeps for T_s seconds in each cycle. During the active period, the extBoard samples the sensors at constant rates. The time duration for sampling a signal segment is referred to as the sampling duration, denoted T_d. The active period contains multiple sampling periods. A signal segment collected during the current sampling period is processed by the ORBIT application (e.g., the one shown in Fig. 2.7) in the next sampling period. The values of T_a, T_s, and T_d are determined by the expected system lifetime and the timeliness requirements of the sensing application. Moreover, the sampling and processing on the extBoard are often subject to stringent delay bounds. Modern microprocessors also offer low-power sleep states with wake-on-interrupt, which can be utilized to further reduce the extBoard power consumption during the sampling period. Different from the extBoard, the smartphone adopts an on-demand sleep strategy in which it remains asleep unless activated by the extBoard or by cloud messages. Fig. 2.9 illustrates the extBoard's duty cycle and the smartphone's on-demand sleep schedule.

Figure 2.9: Power management scheme.

2.6.3.2 Execution Time Profiler

The extBoard and smartphone power profiles are unlikely to change substantially during the lifetime of the application. However, the latency profile of a task may contain errors and be subject to change after deployment, as shown in the example of Fig. 2.4. To address this issue, ORBIT continuously measures the latency of each task at runtime and periodically re-runs the Task Partitioner to update the task partitioning scheme. Specifically, we designed an Execution Time Profiler that builds statistical latency models for all tasks based on runtime measurements. It measures the execution time of each task by reading the system time before and after the task executes, and it maintains a Gaussian model of each task's execution time, T_i ∼ N(µ_i, σ_i^2). The parameters of this distribution are updated with each new measurement t as

µ'_i = µ_i + (1/n)(t − µ_i)   and   σ'_i^2 = (1/n)((n − 1)σ_i^2 + (t − µ_i)(t − µ'_i)),

where n is the number of measurements collected so far. Based on these models, percentiles with a high rank p are used to set the execution times (i.e., t_i^b, t_i^p, and t_i^c). Under this approach, ORBIT can achieve an optimal partitioning solution while meeting the timing requirements statistically.
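The update above is the standard online mean/variance recurrence; a direct Java sketch of such a per-task profile follows. The class and method names are illustrative, and the 95th percentile is used only as an example of a high-rank percentile.

    // Sketch of a per-task latency profile implementing the recurrences above.
    public final class TaskLatencyProfile {
        private long n = 0;
        private double mu = 0.0;  // running mean of the execution time
        private double var = 0.0; // running variance estimate

        // t: execution time of one run, e.g., taken with System.nanoTime()
        // before and after the task executes.
        public void addMeasurement(double t) {
            n++;
            double muPrev = mu;
            mu = muPrev + (t - muPrev) / n;                      // mu' update
            var = ((n - 1) * var + (t - muPrev) * (t - mu)) / n; // sigma'^2 update
        }

        // A high-rank percentile used as the planning-time execution time;
        // 1.645 is the Gaussian z-score of the 95th percentile.
        public double percentile95() {
            return mu + 1.645 * Math.sqrt(var);
        }
    }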
2.6.3.3 Partitioning with Sequential Execution

As discussed in Section 2.6.3.1, the extBoard has a fixed duty cycle and hence consumes relatively constant energy. Therefore, ORBIT aims to minimize the total energy consumption of the smartphone, subject to a processing delay upper bound for each tier. Consider a sensing application consisting of n tasks (denoted T_1, ..., T_n) with an execution pipeline expressed as a sequential set of tasks T = T_1 → T_2 → ... → T_n. Let I_i denote the execution tier of T_i, where I_i ∈ {(1, 0, 0), (0, 1, 0), (0, 0, 1)} represents the extBoard, smartphone, and cloud, respectively. Let τ_b, τ_p, τ_c, and τ_A denote the execution times on the extBoard, smartphone, and cloud, and the end-to-end delay of the whole application (or of the delay-critical portion of the application), respectively, in a sampling period. We now formulate the task partitioning problem for sequential execution; the case of branching execution is discussed in a technical report [Moazzami et al., 2013].

Task Partitioning Problem. For the sequential execution T = T_1 → T_2 → ... → T_n, the Task Partitioner finds an execution assignment set S = {I_1, I_2, ..., I_n} that minimizes the total smartphone energy consumption in a sampling duration (denoted E) subject to τ_b ≤ D_b, τ_p ≤ D_p, τ_c ≤ D_c, and τ_A ≤ D_A.

The processing delay upper bounds D_b, D_p, D_c, and D_A are typically set according to the timeliness requirements of the application, e.g., the constant rate of sensor sampling, the time period within which a moving object must be detected before it moves away, etc. As sensor sampling and timestamping introduce little overhead (cf. Section 2.6.4.2), it is safe to set D_b to a value slightly smaller than the sampling period. It is shown in [Newton et al., 2009] and [Cuervo et al., 2010b] that this partitioning problem can be modeled as an integer linear program (ILP) that minimizes a linear combination of network bandwidth and CPU consumption subject to upper bounds on these resources. It is important to note that the conventional ILP partitioning model only takes the execution time latency (i.e., CPU consumption) and the data copy latency between tiers (e.g., network bandwidth) into account. In contrast, ORBIT extends this model with two additional terms: the wake-up and tail times of the smartphone, and the (instantaneous) power consumption of each tier. Also, with the help of the Execution Time Profiler, ORBIT considers one more factor, the uncertainty of execution times. Thus, ORBIT provides a more realistic partitioning model.

We now derive E and the delays (τ_A, τ_b, τ_c, and τ_p). We first define the following notation. The execution times of task T_i on the extBoard, smartphone, and cloud are denoted by t_i^b, t_i^p, and t_i^c, respectively. Let P denote power consumption, where the superscripts 'p', 'b', and 'c' represent the smartphone, extBoard, and cloud, and the subscripts 'a' and 's' represent the active power and sleep power of the smartphone and the extBoard. Denote by t_{b↔p} the latency of downloading/uploading a data unit between the phone and the extBoard, by t_{p↔c} the latency of downloading/uploading a data unit between the phone and the cloud, by J_i the number of input pins of T_i, and by l_{i,j} the signal length of the j-th input pin of T_i. We now analyze the energy consumption and processing delay of an application in a sampling period. Note that we only need to analyze the energy consumption of the smartphone, for two reasons. First, the cloud's energy consumption does not count toward the system's total energy consumption. Second, as the extBoard stays active to continually sample sensors or activate the smartphone, its power consumption is fixed.

(1) Processing energy consumption and delay: Let E_1 and τ_1 denote the smartphone energy and delay in processing the signal collected during a sampling duration. Analysis shows

E_1 = Σ_{i=1}^{n} [ I_i · ( (P_a^b + P_s^p) t_i^b, (P_s^b + P_a^p) t_i^p, (P_s^b + P_s^p) t_i^c )^T + d(I_i, I_{i−1}) · (E_w + E_T)/2 ],

τ_1 = Σ_{i=1}^{n} [ I_i · ( t_i^b, t_i^p, t_i^c )^T + d(I_i, I_{i−1}) · (T_w + T_T)/2 ],

where E_w, T_w, E_T, and T_T are the energy consumed and the time spent during the wake-up and tail phases, respectively. The function d(I_i, I_{i−1}) accounts for the data-copy overhead between the tiers by measuring the distance between the positions of the '1' entries in I_i and I_{i−1}. For instance, when I_i = I_{i−1} = (1, 0, 0), d(I_i, I_{i−1}) = 0; when I_i = (1, 0, 0) and I_{i−1} = (0, 1, 0), d(I_i, I_{i−1}) = 1. Since there are three tiers, the maximum distance is d_max(I_i, I_{i−1}) = 2, which occurs when a task is assigned to the cloud (e.g., I_i = (0, 0, 1)) while its previous task is assigned to the extBoard (e.g., I_{i−1} = (1, 0, 0)), or vice versa. In this case, the data has to be transferred between these two tiers through the smartphone (there is no direct communication between the extBoard and the cloud server).
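Since d(·, ·) drives the overhead terms in both formulas, a tiny sketch of it may help. The one-hot encoding follows the text; the Java representation is illustrative.

    // Sketch of the tier-distance function d(Ii, Ii-1): assignments are
    // one-hot vectors over (extBoard, smartphone, cloud), and the distance
    // is the gap between the positions of their '1' entries.
    public final class TierDistance {
        static int tierDistance(int[] a, int[] b) {
            return Math.abs(indexOfOne(a) - indexOfOne(b));
        }

        static int indexOfOne(int[] v) {
            for (int i = 0; i < v.length; i++) {
                if (v[i] == 1) return i;
            }
            throw new IllegalArgumentException("not a one-hot assignment vector");
        }
    }

For example, tierDistance(new int[]{0, 0, 1}, new int[]{1, 0, 0}) returns 2, the extBoard-to-cloud case that must route its data through the smartphone.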
Moreover, we obtain the execution times of the extBoard, smartphone, and cloud portions of the application as follows:

τ_b = Σ_{i=1}^{n} I_i · (t_i^b, 0, 0)^T,   τ_p = Σ_{i=1}^{n} I_i · (0, t_i^p, 0)^T,   τ_c = Σ_{i=1}^{n} I_i · (0, 0, t_i^c)^T.

(2) Overhead of phone state transitions and cross-tier data copy: Let E_2 and τ_2 denote the energy consumption and the delay for copying data. We define a function s(i, j) based on the application pipeline, which returns the ID of the source task connected to T_i's j-th input parameter. If the tasks T_i and T_{s(i,j)} are executed on different tiers, the j-th input data parameter of T_i needs to be copied between the consecutive tiers, causing a smartphone energy consumption of P_a^p t_c l_{i,j} and a delay of t_c l_{i,j}, where t_c is the applicable per-unit data-copy latency (t_{b↔p} or t_{p↔c}). Thus,

E_2 = Σ_{i=1}^{n} Σ_{j=1}^{J_i} d(I_i, I_{s(i,j)}) P_a^p t_c l_{i,j},

τ_2 = Σ_{i=1}^{n} Σ_{j=1}^{J_i} d(I_i, I_{s(i,j)}) t_c l_{i,j}.

Therefore, the total smartphone energy consumption and the delay for processing the sensor data collected in a sampling period are E = E_1 + E_2 and τ_A = τ_1 + τ_2. Note that E does not include the sleep energy consumption of the smartphone from the end of the current execution cycle to the beginning of the next cycle, when the new sensor data become available. However, as the Task Partitioner fully utilizes the allowed processing time D to reduce the smartphone energy consumption, the duration of an execution cycle will be close to the sampling period if D is close to the sampling period. Therefore, the sleep energy consumption of the smartphone during this gap is negligible. Based on the above delay and energy models, the task partitioning problem is a constrained non-linear optimization problem; the nonlinearity comes from the formulas of E and τ. ORBIT uses brute-force search to solve the problem. As the number of tasks in a sensing application is typically small, our measurements in Section 3.7 show that the brute-force search introduces little overhead even when the Task Partitioner is executed periodically by the smartphone (cf. Section 2.6.3.4). For instance, its execution time is less than 10 ms on a Galaxy Nexus for up to 20 tasks.

2.6.3.4 Partitioning with Branches

While the previous section focuses on sequential execution, real applications can contain branches in their execution flow. To discuss our approach to partitioning tasks containing branches, we continue with the running example shown in Fig. 2.7. Different from sequential tasks, a key challenge is that a task partitioning solution optimal for all branches may not exist. As an example, consider the part of Fig. 2.7 that includes T_1, T_2, and T_3 only, and suppose we run the task partitioning algorithm for the two execution paths T_1 → T_2 and T_1 → T_3. The two solutions can conflict because T_1 may be assigned to a different tier (smartphone or extBoard) in each solution. ORBIT adopts a priority-based approach to resolve such potential conflicts. In the execution tree of Fig. 2.7, there are six paths from the root node to the leaf nodes. We assign an integer priority to each path, where a smaller number means a higher priority. As each leaf node is associated with a unique path from the root node, the priorities can also be associated with the leaf nodes. A higher priority means that the corresponding path will be executed with higher probability. The priorities can be assigned by the developer or set randomly by default. In our approach, we run the task partitioning algorithm of Section 2.6.3.3 for each path, in order of increasing integer priority. For instance, we first run the task partitioning algorithm over the path with the highest priority (i.e., T_1 → T_2 → T_6), yielding solution S_1. We then choose the path with the second highest priority (i.e., T_1 → T_2 → T_5 → T_9). As T_1 and T_2 have already been included in S_1, we run the task partitioning algorithm for the residual path (i.e., T_5 → T_9) only, with the assignment of T_2 fixed by S_1. We apply this procedure to all other paths. At runtime, the Task Controller executes the tasks according to the assignment set. For instance, in Fig. 2.7, if it decides to execute T_5 based on the result of T_2, it dispatches T_5 according to S_2.

2.6.4 Task Controllers

Task Controllers (TCs) on the smartphone, the extBoard, and the cloud execute the entire sensing application according to the assignment computed by the Task Partitioner. Fig. 2.2 shows the interaction of the TCs with the other components in ORBIT.
2.6.4.1 Smartphone Task Controller

The smartphone TC is designed as an Android background service that manages the execution of the tasks and communicates with the extBoard and the cloud. When the ORBIT application is launched, the smartphone TC creates instances of the tasks in T and allocates buffers for all inputs and outputs. After this initialization phase, the TC checks the partitioning assignments and begins execution of the first task. When the smartphone is not executing a task, it switches to the sleep state to conserve energy. When task execution needs to return to the smartphone, a notification message wakes the smartphone and activates its TC to execute the next tasks assigned to it. The smartphone TC also continuously updates the task meta-information (e.g., execution times) and the branch priorities. Our measurement study shows that the smartphone consumes considerable energy during the wake-up and tail phases (cf. Section 2.5.2). We therefore optimize the design of the TC to start a task as soon as the smartphone wakes up and to let the smartphone sleep as soon as no more tasks need to run. After the TC executes a task T on the smartphone, it checks whether any task assigned to the other two tiers takes T's output as input. If so, the TC sends T's output data to the other tier using ORBIT's messaging protocol [Moazzami et al., 2013]. This allows the other tiers to run tasks with input data from the smartphone without re-activating it, avoiding extra wake-up and tail energy consumption. A side effect of this design is that, if the application has branches, the data transmitted to another tier may not be used; however, typical signal processing pipelines contain a limited number of branches.

2.6.4.2 extBoard and Cloud Task Controllers

The extBoard TC continually checks for the arrival of messages from the smartphone. When it receives a start-task-execution message from the smartphone, it begins executing the first task in its assignment. When starting a sampling task, the extBoard creates a periodic timer to control the sampling. The timer interrupt handling routine reads a sensor sample from the ADC, timestamps it, and inserts it into a circular buffer. This process involves only a few instructions and is optimized to reduce the interrupt handling delay. Once the sampling task has obtained the number of samples specified in its input parameter, task execution continues with the task following the sampling task according to the execution tree T. The cloud TC is implemented as a Linux daemon that checks for the arrival of messages from one or multiple smartphones. Two types of tasks run on the cloud: computationally intensive tasks assigned by the Task Partitioner, and tasks that take input data from multiple ORBIT nodes. Upon the completion of a task T in the cloud, the cloud TC sends T's output to all the smartphones that require it. If any of these smartphones is in the sleep mode, it wakes up when the cloud message is received. The cloud TC, like the smartphone TC, continuously updates the task meta-information (e.g., execution times) and the branch priorities. This ensures that fresh meta-information is used for the task partitioning.
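A minimal sketch of how such a background-service controller could be set up with the standard Android Service API is shown below; the class name and the body are illustrative, as the text does not show the TC's code.

    import android.app.Service;
    import android.content.Intent;
    import android.os.IBinder;

    // Illustrative skeleton of a task controller hosted in an Android
    // background service, per the design described above.
    public class TaskControllerService extends Service {
        @Override
        public int onStartCommand(Intent intent, int flags, int startId) {
            // Initialization phase: instantiate the tasks in T, allocate I/O
            // buffers, then execute the first task in the assignment.
            return START_STICKY; // keep the controller alive between activations
        }

        @Override
        public IBinder onBind(Intent intent) {
            return null; // started (not bound) service
        }
    }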
2.7 Microbenchmark

In this section, we evaluate the overall memory and CPU usage of ORBIT, as well as the overhead introduced by online task partitioning. We also evaluate the effect of multi-threading on reducing task processing delays.

CPU and memory footprint on smartphone: We measure the CPU and memory footprints of ORBIT using the Android utility application System Monitor, running different applications on ORBIT. ORBIT runs as an app, and its CPU utilization may vary with the smartphone hardware, the Android version, and the other apps running on the smartphone. We measure the CPU footprint of ORBIT as the increase in CPU utilization when it runs tasks. Our measurements show that ORBIT's CPU footprint ranges from 10% to 15%. The memory usage is about 22.5 MB when idle, but reaches 33.8 MB for a sensing application, as heap space is dynamically allocated for the processing tasks. The total size of the ORBIT binary is only 2.84 MB.

Delay/quality trade-off: In Section 2.6.2.1 we discussed how algorithms are tuned for desirable trade-offs between quality and delay. Fig. 2.10 shows the convergence of the gradient descent algorithm for different step sizes r and numbers of iterations. As expected, and as illustrated in the figure, the gradient value decreases as the number of iterations increases until it finally converges to the solution. Larger step sizes result in faster convergence. However, the rate of decrease slows after a certain iteration for each step size, meaning the task no longer benefits from more iterations. Thus, gradient descent can often find a good-enough solution in fewer iterations than the number provided as an input parameter, allowing ORBIT to stop it early without losing significant accuracy. Examples of algorithms that can benefit from this feature are SVM, linear regression, and K-means clustering. This feature not only provides insights for choosing better parameter values for each task in the application pipeline, but also gives ORBIT a systematic mechanism to terminate tasks early while keeping the results within an expected accuracy range.

Figure 2.10: Delay/quality trade-off (r = step size).

Effect of data partitioning and multi-threading: As discussed in Section 2.6.2.2, the smartphone TC can partition compute-intensive tasks into multiple threads to reduce processing delays. Fig. 2.11 shows the performance gain of a matrix-vector multiplication task, y = Ax, on two different smartphones: a Moto G with a quad-core processor and a Galaxy Nexus with a dual-core processor. In this example, the vector x ∈ Z^{l×1} is the input signal and the matrix A ∈ Z^{m×l} is the computation matrix. l is fixed at 2000 data samples and m varies across operations (the horizontal axis in the figure); larger values of m indicate more data-intensive computation. The results show that the computation delay is reduced by 44.7% on average for the Moto G and by 36.2% for the Galaxy Nexus when the task is partitioned into 2 threads. It is further reduced by 56.1% for the Moto G when the computation is partitioned into 4 threads. As we can see from the figure, multi-threading reduces the computation delay more for larger matrices (more data-intensive computation), which agrees with our design objectives. Another important result from this figure is that 4 threads on the Moto G do not provide a significant improvement over 2 threads.
This is because, once the computation is partitioned into 2 threads, the problem size is reduced by half; consequently, when each thread is further split into 2 new threads, the split affects a smaller problem, and the reduction in computation delay is smaller. This agrees with the intuition that multi-threading provides less improvement for smaller problems.

Figure 2.11: Smartphone multi-threading reduces the processing delay of compute-intensive tasks. (a) Moto G (4 cores); (b) Galaxy Nexus (2 cores).

Effect of data dependency: A salient feature of ORBIT is that it takes input data size and input data content into account in modeling task energy consumption and in partitioning. In contrast, conventional task modeling and partitioning schemes measure the time latency offline and often assume the average value as the latency, without considering the observed variance in execution time. However, our measurement study shows that the execution time of a data-dependent algorithm can vary significantly with different input sizes and input content. We now use several examples to illustrate the effect of data dependency on system energy consumption. Fig. 2.12a shows the distribution of the execution time of the SIFT algorithm for input images with different dimensions and numbers of SIFT features. Fig. 2.12b shows the difference between the energy consumption estimates for the SIFT algorithm under the Wishbone approach and under the approach adopted by ORBIT. Since Wishbone does not consider the differences between input data, it uses the average value of offline measurements as the execution time. Therefore, when the execution time of SIFT for an image is close to the average, e.g., for the house image, the energy estimated by the two approaches is similar. However, when the execution time for an image is below the average, e.g., for the ET, kermit, and animal images, ORBIT's energy estimate outperforms Wishbone's. Conversely, if the execution time is longer than the average, e.g., for the chair image, the energy estimated by ORBIT is larger than Wishbone's but closer to the true value. Thus, ORBIT provides a more realistic approach to modeling the execution times and energy consumption of data-intensive algorithms.

Figure 2.12: Data-dependent algorithms. (a) Execution time distribution of SIFT; (b) Energy consumption of SIFT for different images.

2.8 Case Studies

To demonstrate the expressivity of ORBIT application scripting, as well as the generality and flexibility of ORBIT as a platform, we have prototyped three different data-intensive embedded sensing applications (cf. Table 2.1). Each application demonstrates a different facet of ORBIT, varying the number of tasks in the task structure, the sensors used, the number of tiers the application is partitioned across, and the data fidelity requirement of the application. Our goal is to demonstrate the capabilities and effectiveness of the platform rather than to present novel applications.

Table 2.1: ORBIT based applications.
                     Robotic Sensing          Event Timing           Multi-Camera 3D Reconstruction
    Script Length    35                       27                     20
    Number of Tasks  11                       7                      10
    Sensors          IR, Camera, Ultrasound   GPS, Geophone          Camera, GPS
    Tiers            extBoard, smartphone     extBoard, smartphone   extBoard, smartphone, cloud
    Data Fidelity    5 fps                    100 Hz                 640x480 px

2.8.1 Robotic Sensing

We choose a robotic sensing application for our first case study. In this case study, the application estimates the presence, distance, and direction of an approaching object using an IR sensor and a sonar sensor attached to the robot, as well as the smartphone's built-in camera (cf. Fig. 2.1b). Once an object is detected by the IR sensor, its distance is estimated using the sonar sensor. If the object is within a specified range, the extBoard sends a command to the smartphone, waking it up and activating the camera. Once the system captures an image, it converts the image to grayscale and then computes the threshold that separates the background from the foreground. Next, the system detects the object and computes its bounding rectangle. As the object moves, its bounding rectangle changes, and the direction and velocity of the movement are determined by computing the changes in the bounding rectangles. This information is then used to move the robot to track the object. We used a portion of the code from an open-source soccer robot project [Object Tracking Robot, ].

Compared to the event timing application, this application involves actuators in addition to sensors. A few tasks cannot be partitioned, since they directly sample the sensors or actuate the robot servos; the other processing tasks can be partitioned between tiers. The processing delay determines the maximum image capture frame rate, which in turn determines the maximum speed of a moving object that can be tracked.

Effectiveness of Task Partitioner: We first evaluate the effectiveness of the task partitioning algorithm by comparing it with two baselines, extBoard-only and smartphone-only. Fig. 2.13 shows the energy consumption and the application execution time when the delay bound D is set to 0.4 s. Fig. 2.13a and Fig. 2.13b plot the estimated total energy consumption and total execution time (i.e., smartphone + extBoard) of an ORBIT node in one execution cycle. As the extBoard is slow and power-inefficient for intensive computation, it cannot meet the delay bound and consumes the most energy. Our partitioning approach in ORBIT achieves the lowest energy consumption across different smartphones.

Figure 2.13: The results of various partition schemes. (a) Total energy consumption; (b) Total execution time.

Impact of delay bound: We then evaluate the impact of the delay bound D on the task assignment and smartphone energy consumption. Assume at least n frames are required to detect the object and track its trajectory, the smartphone camera's angle of view is θ, the object's distance to the robot is d, and v is the speed of the object. The time the object takes to move out of the camera's view is then t = 2d·tan(θ)/v, so the system has to process n frames in t seconds.
As a result, the upper bound D for processing one frame is D = t/n = 2d·tan(θ)/(n·v), which shows how the delay bound is inversely related to the object's velocity. For example, if the camera's angle of view is 18 degrees, the object's distance to the robot is 3 m, and 5 frames are required to detect and estimate an object's moving direction, then for an object moving at 5 km/h the system must process each frame in less than 0.28 seconds. Fig. 2.14 shows the smartphone energy consumption versus D. We can see that the total energy consumption decreases with D. This is because, with a higher delay bound, more tasks are assigned to the extBoard, allowing the smartphone to sleep for a longer time; this is consistent with our analysis. In this case the smartphone only wakes up when an image must be captured; it then preprocesses the image and sends the results to the extBoard to detect the object.

Figure 2.14: Impact of the delay bound setting on the task assignment and total energy consumption.

2.8.2 Event Timing

This application estimates the arrival time of an acoustic/seismic event, a building block of many acoustic/seismic monitoring applications such as distributed event timing [Liu et al., 2013] and source localization. Seismic event source localization requires events to be timed to sub-millisecond precision and time synchronization between nodes to be within a few microseconds. Fig. 2.16 shows the application specification of this case study. The incoming signal is first pre-processed by mean removal and bandpass filtering. A wavelet transform is then applied to the filtered signal, and the signal sparsity and a coarse arrival time are computed from the wavelet coefficients. This application requires a sampling rate of 100 Hz. In the context of early earthquake detection, the system must have a response time on the order of a few seconds. The following paragraphs present the evaluation results.

    package com.my_sensing_app.pipeline;

    import ORBIT.pipeline_model.*;
    import ORBIT.io.*;
    import ORBIT.generic_task;
    import java.util.Vector;
    ...

    public class Event_timing extends ORBIT_pipeline_model {

        @Source @fixed{"extBoard"}
        @Next{T_1, T_2{0}}
        private Task T_0 = new sampler("extBoard", 1600);

        @Next{T_2{1}}
        private Task T_1 = new compute_mean(1600, 1);

        @Next{T_3}
        private Task T_2 = new remove_mean(1600, 1, 1600);

        @Next{T_4}
        private Task T_3 = new filter("band_pass", 1600, 1600, 1, 6);

        @Next{T_5, T_6}
        private Task T_4 = new wavelet("haar", 1600, 1600, 4);

        @Next{T_7}
        private Task T_5 = new compute_sparsity(1600, 1);

        @Next{T_7}
        private Task T_6 = new coarse_picker(1600, 1);

        @Sink
        private Task T_7 = new write_into_file("results.txt");
    }

Figure 2.16: Application specification of event timing. The "sampler" is a special task running on the extBoard. Specific tasks with different parameters are defined. For example, the parameters "1600" and "1" indicate the number of input and/or output data samples for different tasks; the parameter "1,6" of the bandpass filter specifies the two corner frequencies; the parameter "4" of wavelet specifies the level of the transform.

Effectiveness of Task Partitioner: We first evaluate the effectiveness of the task partitioning algorithm presented in Section 2.6.3.3 by comparing the following partitioning approaches: extBoard-only, phone-only, and greedy. The greedy approach assigns as many processing tasks to the extBoard as can be supported by the delay bound.
Fig. 2.17 shows the task partitioning results of these approaches using a delay bound D of 1.8 s. The extBoard processing delay meets this bound except under the extBoard-only approach. Fig. 2.17a and Fig. 2.17b plot the estimated total energy consumption and total execution time (i.e., phone + extBoard) of an ORBIT node in one execution cycle under the different partitioning approaches. As the extBoard is slow and power-inefficient for intensive computation, it cannot meet the delay bound and consumes the most energy. Our partitioning approach ("ORBIT") achieves the lowest energy consumption on the smartphones tested.

Figure 2.17: The results of various partition schemes. (a) Total energy consumption; (b) Total execution time.

We next evaluate the impact of the delay bound D on the task assignment and smartphone energy consumption. The top portion of Fig. 2.18 shows the number of tasks assigned to the extBoard versus D. We can see that the Task Partitioner generally assigns more tasks to the extBoard for larger D, which is consistent with our analysis in Section 2.6.3.3. However, we can also see a number of drops in the top portion of Fig. 2.18. For instance, when D increases from 1.37 s to 1.38 s, the number of extBoard tasks drops from 4 to 1. This is due to a compute-intensive task replacing the previous four lightweight tasks, increasing the CPU utilization of the extBoard and reducing the smartphone energy consumption. The bottom portion of Fig. 2.18 shows that the total energy consumption decreases with D, which is also consistent with our analysis.

Figure 2.18: Impact of the delay bound setting on the task assignment and total energy consumption. Top: the number of tasks assigned to the extBoard versus the delay bound. Bottom: total energy consumption versus the delay bound.

Measured execution time and energy consumption: Based on the obtained task partitioning results, we use a Nexus ORBIT node to run the application over real-time sensor readings. Fig. 2.19 plots the measured extBoard processing delay and the smartphone energy consumption versus the specified delay bound. Note that the smartphone processing delay is less than 5 ms for all settings of the delay bound; therefore, the extBoard processing delay dominates. From Fig. 2.19a we can see that the specified delay bound is always met. Moreover, the extBoard processing delay increases with the delay bound, demonstrating effective utilization of the allowed extBoard CPU time. From Fig. 2.19b, the smartphone energy consumption decreases with the delay bound, which is consistent with our analysis.

Figure 2.19: The measured extBoard processing delay and smartphone energy consumption versus delay bound. (a) extBoard processing delay; (b) Phone energy consumption.
Duty cycle of extBoard and lifetime: Based on the measured energy consumption, we calculate the projected node lifetime on four D-cell batteries (capacity: 1.2 × 10^4 mAh) versus the duty cycle of the extBoard under various settings of the delay bound. The results are plotted in Fig. 2.20. When the duty cycle is 100%, the projected lifetime is 5.8 days; when the duty cycle is 20%, the node can live for up to 2 months. As shown in Fig. 2.19b, the smartphone energy consumption is tens of millijoules, while the extBoard energy consumption is about one joule when the duty cycle is 100%. Since the active powers of the extBoard and smartphone are comparable (cf. Section 2.5), the extBoard energy consumption dominates when its duty cycle is large. In such cases, the major role of the smartphone is to help meet the tight delay bound, and the node lifetimes are similar for different delay bounds. However, when the duty cycle is 20%, the lifetime can be extended by 18.4% if the delay bound increases from 0.1 s to 2.0 s. Nevertheless, with the help of the smartphone, the ORBIT node can meet tight delay bounds, which is critical to the success of many sensing applications that require continuous sensor sampling.

Figure 2.20: Projected lifetime vs. extBoard duty cycle.

Effect of branches: To evaluate the effect of branches, we integrate the event timing application in [Moazzami et al., 2013] with an event detection approach [Tan et al., 2010]. Fig. 2.15 shows the block diagram of the application.

Figure 2.15: The block diagram of the seismic event timing application. The white blocks are pre-processing algorithms; the gray blocks are the earthquake detection algorithms; the black blocks are the P-phase estimation algorithms.

The sampling rate is 2.5 kHz and the sampling duration is 40 milliseconds. In each cycle, the ORBIT node inserts the most recent 100 seismic samples into a queue of size 1600. Next, the truncator task copies the 100 samples at the middle of the queue to its output. A Bayesian event detection approach [Tan et al., 2010], which consists of multiple tasks, is applied to the output of the truncator. If the detector makes a positive decision, the node runs a primary-wave arrival time (i.e., P-phase) estimation algorithm [Sleeman and van Eck, 1999] over all samples in the queue; otherwise, the node uses the input to the Bayesian detector to update the noise model used by the detector. Since the application monitors rare events (e.g., earthquakes), the execution path that branches to the noise model updater is assigned the higher priority, as it occurs most frequently. In this case, all tasks except the compute-intensive bandpass and fine-grained picker are assigned to the extBoard. Fig. 2.21 plots the trace of the energy consumption of a tested ORBIT node in each sampling duration over time. In the first 7 seconds, the detector makes negative decisions and the node executes the branch to the noise model updater. From the 7th second, the detector makes positive decisions for about 0.5 seconds and the extBoard activates the phone to execute the bandpass and fine-grained picker; hence, we observe increased energy consumption.

Figure 2.21: The energy consumption trace of a node.

We compare our approach with the extBoard-only approach, under which the delay bound cannot be met when the detector makes positive decisions. Therefore, we cannot run this approach directly on the node.
Effect of branches: To evaluate the effect of branches, we integrate the event timing application in [Moazzami et al., 2013] with an event detection approach [Tan et al., 2010]. Fig. 2.15 shows the block diagram of the application. The sampling rate is 2.5 kHz and the sampling duration is 40 milliseconds. In each cycle, the ORBIT node inserts the most recent 100 seismic samples into a queue of size 1600. Next, the truncator task copies 100 samples at the middle of the queue to its output. A Bayesian event detection approach [Tan et al., 2010], which consists of multiple tasks, is applied to the output of the truncator. If the detector makes a positive decision, the node runs a primary-wave arrival time (i.e., P-phase) estimation algorithm [Sleeman and van Eck, 1999] on all samples in the queue; otherwise, the node uses the input to the Bayesian detector to update the noise model used by the detector. Because the application monitors rare events (e.g., earthquakes), the execution path that branches to the noise model updater is assigned the higher priority, since it occurs most frequently. In this case all tasks except the compute-intensive bandpass and fine-grained picker are assigned to the extBoard. Fig. 2.21 plots the trace of energy consumption of a tested ORBIT node in each sampling duration over time. In the first 7 seconds, the detector makes negative decisions and the node executes the branch to the noise model updater. From the 7th second, the detector makes positive decisions for about 0.5 seconds and the extBoard activates the phone to execute the bandpass and fine-grained picker. Hence, we observe increased energy consumption.

Figure 2.21: The energy consumption trace of a node.

We compare our approach with the extBoard-only approach. Under the extBoard-only approach, the delay bound cannot be met when the detector makes positive decisions; therefore, we cannot directly run this approach on the node. Instead, we estimate its energy consumption based on the extBoard power model and task meta records. From Fig. 2.21, it can be seen that, when a seismic event occurs, the energy consumption of extBoard-only is significantly higher than that of our approach, primarily due to the long execution time of the fine-grained picker on the extBoard.

Figure 2.22: The block diagram of the multi-camera 3D reconstruction application.

2.8.3 Multi-camera 3D reconstruction

The final case study is inspired by Phototourism [Snavely et al., 2006] and involves opportunistic sensing wherein smartphone-equipped robots capture location-based images to collaboratively reconstruct a 3D structure. Compared to the previous two case studies, this application is partitioned across three tiers. The captured image is partially processed on the phone, and the remainder of the processing as well as the distributed tasks are offloaded to the cloud server. Once an image has been processed, the robot is directed to move to a new spot to capture a new image. In addition to CPU, we also account for radio power consumption in this case study. Fig. 2.22 shows the task structure of this application. The cloud server is emulated by a Sun Ultra 20 workstation.

Effectiveness of Task Partitioner: For this case study, with the addition of the cloud tier and more complex input data, both the communication delay between the smartphone and the cloud server and the complexity of the input data impact the partitioning result. We evaluate the effectiveness of the task partitioning algorithm by comparing ORBIT with the phone-only and cloud-only baselines. Fig. 2.23a and Fig. 2.23b compare two cases with different input images: (a) the house image, a bigger image (in terms of number of pixels) with less complexity (in terms of number of SIFT features), and (b) the kermit image, a smaller image with higher complexity.

Figure 2.23: The results of various partition schemes. (a) House image (1280x722 px). (b) Kermit image (640x480 px).

The difference between the partitioning assignments in these two cases is the assignment of the SIFT task. For the house image, the results show that it is more energy efficient to run SIFT on the phone, because: 1) the image is less complex, so SIFT runs faster and the application consumes less energy; and 2) transmitting the large image to the cloud for SIFT processing would consume more energy. For the kermit image, it is more energy efficient to run SIFT in the cloud because the image is smaller but has more SIFT features. Thus, in both cases ORBIT arrives at the most energy-efficient partition. In addition, this result demonstrates that ORBIT considers the execution time of data processing tasks not only as a function of input size but also as a function of input content. Existing task partitioning approaches [Newton et al., 2009, Cuervo et al., 2010a] often do not address these two factors.
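The SIFT placement decision above can be viewed as a simple energy comparison. The sketch below, with assumed profiling constants, weighs the energy of local execution (which grows with image content, i.e., feature count) against the radio energy of shipping the image to the cloud (which grows with image size). The cost model and numbers are illustrative only, not ORBIT's profiler output.

// Illustrative offload decision for a single task (e.g., SIFT):
// run locally iff local compute energy is below the radio energy
// needed to ship the input to the cloud. Constants are assumptions.
final class OffloadDecision {
    static boolean runLocally(double imageBytes, double featureCount) {
        double localJPerFeature = 2e-3;  // assumed phone energy per SIFT feature
        double radioJPerByte   = 5e-6;   // assumed WiFi transmit energy per byte
        double localEnergy  = featureCount * localJPerFeature;
        double uploadEnergy = imageBytes * radioJPerByte;
        // Energy depends on input content (features) as well as size (bytes).
        return localEnergy < uploadEnergy;
    }
    public static void main(String[] args) {
        // house: large but few features; kermit: small but feature-rich
        System.out.println("house:  local=" + runLocally(1280 * 722 * 3, 400));
        System.out.println("kermit: local=" + runLocally(640 * 480 * 3, 2500));
    }
}

Under these assumed constants, the large, low-complexity house image stays on the phone while the small, feature-rich kermit image is offloaded, mirroring the measured result.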
2.8.4 Discussion

These case studies demonstrate the generality of ORBIT's design. In particular, the three example applications differ significantly in task structure, computational intensity of tasks, delay requirements, input data, and the tiers involved in task partitioning. Overall, ORBIT achieves energy savings of up to 50% compared to baseline approaches. An interesting observation from case studies 1 and 2 is that the system power consumption is highly probabilistic. For instance, whether the power-hungry image processing is needed in case study 1 depends on the decision of the first-stage IR/sonar sensing, which is subject to false alarms and misses. However, such runtime dynamics are unknown to ORBIT at design time. As a result, ORBIT partitions the tasks according to the worst-case scenario, in which a target is assumed to be present. A possible improvement would be to provide ORBIT with runtime feedback, such as the detection history and an estimate of system detection performance. This would allow ORBIT to optimize the wiring and priorities of tasks to reduce power consumption. Such runtime adaptation is supported by ORBIT's flexible task partition and dispatch framework. Case study 3 suggests that the input data plays an important role in the partitioning result. For example, when the input data is not very complex, the tasks may not be offloaded to the cloud, whereas the same image processing task may be offloaded if it receives a complex input image. Making this decision at runtime demonstrates ORBIT's flexibility.

2.9 Summary

This chapter presented ORBIT, a smartphone-based platform for data-intensive, embedded sensing applications. ORBIT features a tiered architecture, in which a smartphone is optionally interfaced with an energy-efficient peripheral board and a cloud server. By fully exploiting the heterogeneity in the power/latency characteristics of multiple tiers, ORBIT minimizes the system energy consumption subject to upper-bounded processing delays. ORBIT also integrates a data processing library that supports high-level, Java-annotated application programming. The design of this library facilitates the resource management of embedded applications and provides programming flexibility through adaptive delay/quality trade-off and multi-threaded data partitioning mechanisms. ORBIT is evaluated through several benchmarks and three case studies: seismic sensing, multi-camera 3D reconstruction, and visual tracking using an ORBIT robot. This extensive evaluation demonstrates the generality of ORBIT's design. Moreover, our results show that ORBIT can save up to 50% energy consumption compared to baseline approaches.

Chapter 3

SPOT: A Smartphone-Based Platform to Tackle Heterogeneity in Smart-Home Systems

3.1 Introduction

The vision of smart, connected homes has been around for decades. In this vision, users easily perform tasks involving diverse sets of devices in their home without the need for painstaking configuration and custom programming. For example, imagine a home with remotely controllable lights, air-conditioning systems, cameras, windows, and door locks. It should be easy to set up this home to automatically adjust windows and lights based on the outside temperature and lighting, or to remotely view who is at the front door and then open the door. While modern homes have many network-capable devices, applications that coordinate them for cross-device tasks have yet to appear in any significant numbers. Smart-homes differ in the appliances they contain and in how these appliances are connected and operated by users. The lack of global standards for smart-appliance communication, control, and data management results in highly fragmented systems consisting of proprietary solutions provided by each device vendor.
Thus, users are required to use different control interfaces, e.g., mobile apps, to interact with the smart appliances in their homes, or are forced to use devices sold by a single vendor. For instance, a user needs to operate the Philips HUE [Phi] app and the WeMo [wem] app separately to control the lighting and a smart power plug in the same room. Recently, cloud-based solutions for multi-vendor IoT interaction, such as IFTTT [IFT], have appeared, but the user's flexibility is still limited to what the service provider offers. Moreover, third-party involvement might raise privacy concerns. Therefore, a user-centric platform that enables users to manage, monitor, and control heterogeneous smart appliances without relying on external parties is highly desirable.

Figure 3.1: Heterogeneity in today's smart-home systems. Each appliance in (c) requires its own app, as shown in (a), that communicates with the appliance using its own protocol via a cloud, a bridge, and/or directly, as shown in (b). Each appliance in (c) has different functionality, and each smartphone app in (a) neither supports appliances from more than one vendor nor shares data with other apps. The user has to switch between apps to operate different appliances.

In this chapter, we expand the scope of our study to smart-home systems as another emerging class of data-intensive sensing applications. We propose and demonstrate a platform built on a dynamic device-driver abstraction model that tackles different aspects of heterogeneity in smart appliances and smart-home systems. The device-control capability can be defined and expanded by means of "device drivers" that are composed using XML or annotation-based Java APIs. The specification of the driver will be made public so that device vendors, the open-source community, or users themselves can create or modify drivers. Besides communication with smart devices, our device-driver framework automatically generates an appropriate graphical user interface based on each device's driver definition, which minimizes the developer's implementation effort and improves the user experience through a consistent look-and-feel throughout the system. Moreover, our framework provides a unified abstraction of the data structure in different smart devices that helps application developers build "whole-home" applications involving a diverse set of appliances. In addition to providing a single point of control over multiple smart devices, our platform enables centralized collection of data about smart appliances used in the entire household, such as historical data of device statuses (e.g., on/off and set point) and control operations, as well as analytics and utilization of such data for energy efficiency, home automation, automated demand response, and so forth. We believe a generic platform like this is an important step toward a smart-home ecosystem from which smart-appliance vendors, application developers, and users can all benefit.

In summary, we make the following contributions. First, we conduct a systematic study to understand the characteristics of smart-home appliances as well as the opportunities and challenges of using the smartphone as the central gateway to control them. The results of our study reveal multiple aspects of heterogeneity in smart appliances. Second, we provide a flexible and extensible device-driver model that supports a number of smart appliances available on the market.
The driver model addresses the multiple types of heterogeneity observed in our study. Third, we provide the design and implementation of the proposed platform as a smart-home system that loads drivers at runtime along with a dynamic user interface adaptive to the features of each appliance. Lastly, we demonstrate the generality and flexibility of our system by presenting our experience in prototyping drivers for several real appliances as well as a cross-device home application. We also discuss examples of other home applications that we have prototyped on top of our platform.

3.2 Related Work

While we draw on many strands of existing work, we are unaware of a system similar to SPOT that provides a unified solution to the heterogeneity problem that smart-home systems suffer from. We categorize related work into the following groups:

Mobile apps for smart-home appliances: Many commercial off-the-shelf smart appliances are shipped with mobile apps that allow users to remotely control them, such as the Nest thermostat and smoke detector [Nes], Philips [Phi], and GE Brillion [GE]. However, these apps support only their own devices. Such a vendor-centric solution requires users to use multiple apps. In contrast to these systems, SPOT provides a device- and vendor-independent, user-centric system that dynamically supports multiple devices of different types (e.g., thermostats, lighting, home security) from different vendors.

Solutions based on hubs and central controllers: Some solutions use hub devices to enable centralized control and management of multiple appliances, such as Wink [win] and SmartThings [sma]. However, appliances still have to be designed and developed according to the proprietary specification provided by such solutions. Moreover, each hub-based system uses different communication protocols, making interoperability highly challenging. Previous work also advocates using a central controller to simplify integration [Escoffier et al., 2008, Rosen et al., 2004, Dixon et al., 2010]. These works have different scopes than SPOT. For instance, Rosen et al. [Rosen et al., 2004] focus on providing context, such as user location, to applications. Other systems have employed services in the home environment: iCrafter is a system for UI programmability [Ponnekanti et al., 2001], and ubiHome [Ha et al., 2007] aims to program ubiquitous computing devices inside the home using Web services. However, these systems focus only on the programmability of smart appliances and do not reduce the burden on users of controlling diverse appliances in a home.

Multi-appliance platforms: Many commercial home automation and security systems integrate multiple devices in the home. For instance, HomeOS [Dixon et al., 2010] provides users and developers a PC-like abstraction for digital devices, but it supports only devices with existing drivers and does not provide any dynamic driver-loading mechanism. Control4 [Con] supports only its own devices (and a limited set of ZigBee devices) and offers only a limited form of programming between devices. Furthermore, the technical complexity of installing and configuring Control4 and similar systems (e.g., HomeSeer [Hom], Elk M1 [ELK]) can be handled only by professionals. EasyLiving [Krumm et al., 2000] is a monolithic system with a fixed set of applications geared to a specific domain (visual tracking via multiple cameras).
Similarly, IFTTT.com offers the ability to define or download predefined "recipes" of the form "if this then that," giving users access to shared device configurations. However, IFTTT.com supports only a certain number of devices, called channels, and does not scale to user-preferred devices unless new channels are added. In addition, reliance on a third party may cause privacy concerns given the sensitivity of the data collected by devices in the home. In contrast to such systems, we focus on building a device-independent system that can be extended easily with new devices and applications by non-experts.

3.3 Requirements and Challenges

In recent years, a growing number of smart appliances have appeared on the market. However, these appliances exhibit significant heterogeneity in communication protocols and architecture, device functionality, and programming abstraction, owing to the lack of dominant standards in smart-home technologies. In this section, we highlight the different kinds of heterogeneity among smart appliances that bring challenges for different groups, including users, home-application developers, and appliance vendors. We also discuss requirements to address each kind of heterogeneity based on our investigation of commodity smart appliances available on the market. We face the following major heterogeneities in building smart-homes:

Diverse appliance control apps: Currently, each modern smart appliance comes with its own smartphone app. These apps provide neither a consistent user interface nor any interaction with other apps. Consequently, users have to switch between multiple apps to operate different devices, which deteriorates the user experience. Users need to learn app-specific functionality in addition to appliance functionality. Such diversity in appliance control apps presents a barrier to the adoption of smart appliances.

Communication interface: The communication interfaces of smart appliances vary widely in protocol and architecture. Although most smart appliances allow remote control via HTTP (or HTTPS) over WiFi, the messages conveying control commands differ across devices. Many devices implement RESTful APIs with JSON message formats, while a few others, such as WeMo devices [wem], support only SOAP messaging that exchanges XML messages. Another aspect of heterogeneity is the communication architecture. As depicted in Fig. 3.1, there are roughly three different communication architectures: appliances that are accessible directly via a local IP address in the home network; appliances that require a dedicated "hub" to communicate with; and appliances that are controlled via a vendor-provided cloud service. For example, thermostats from Venstar [Ven] and Radio Thermostat [Rad] are controlled by devices in the same WiFi network using a local IP address. Similarly, Wink [win], SmartThings [sma], and Philips HUE lights [Phi] use local IP addresses but require a gateway hub that implements the WiFi interface for remote control.

Figure 3.2: Heterogeneity in programming abstraction. (a) Ecobee thermostat, (b) Nest, (c) Philips light.
On the other hand, Nest's thermostat/smoke detector and Ecobee's thermostats [eco] are controlled remotely (even from outside a user's premises) via their own cloud services. Other aspects of heterogeneity in communication lie in the security and user-authentication mechanisms. Nest and Ecobee thermostats require users to use OAuth2 authorization tokens [Hammer-Lahav and Hardt], while the Philips HUE light relies simply on a username/password. We also found diversity in the appliances' initial setup. Many appliances require a simple configuration and use a static server port to accept control commands; the external system controlling such appliances needs to be configured with the devices' IP addresses and port numbers, and reconfiguration is rarely necessary unless a device's IP address changes. However, there are appliances such as the WeMo power plug [wem] in which the port number is randomly selected among non-privileged ports and changes upon device restart. In this case, the app provided by Belkin uses UPnP service discovery to identify the WeMo device's IP address and port number. Besides WeMo devices, Philips HUE also adopts a device discovery service. The heterogeneity in communication architecture and protocol presents serious hurdles for developers who want to build generic applications that suit multiple appliances from different vendors. From the developer's point of view, it is highly desirable to standardize the communication interface and messaging schema, which are essential for building extensible applications. Moreover, the lack of services such as device discovery in the majority of smart appliances requires users to perform the entire setup process for most of the appliances they use, which is not necessarily a trivial process for non-expert users. This raises barriers to the adoption of smart-home appliances and diminishes ease of use.

Programming abstraction: Even among devices using the same protocol to communicate, e.g., RESTful APIs with JSON messages, the variable names and the structure that forms a control command are not standardized. For example, to access the set-point configuration, the most common variable in setting a thermostat's target temperature, the Nest thermostat uses a variable named target_temperature_f or target_temperature_c depending on the unit used, the Radio thermostat uses either t_heat or t_cool, and the Venstar thermostat uses heattemp or cooltemp depending on the thermostat's operation mode. Thus, not only do different appliances adopt different names for the same setting, but the structures of the control messages also differ. Fig. 3.2 and Fig. 3.3 illustrate the different variable naming for different appliances. In addition, several appliances constrain their settings by defining dependencies between the values of different variables. For instance, in the Venstar thermostat, all control messages with mode must include the heattemp and cooltemp variables, and when setting mode to Auto, cooltemp must be greater than heattemp, whereas in the Nest thermostat the variables are set independently. Smart appliances also vary in terms of their internal data structure.

Figure 3.3: Examples of heterogeneity in message schema when setting different configurations.
Ecobee thermostat: "thermostat": {"settings" : {"heatRangeHigh" : "72", "hvacMode" : "on"}}
Radio thermostat: {"tmode" : 1, "t_heat" : 72}
Philips HUE light: {"on" : true, "sat" : 255, "bri" : 255, "hue" : 1000}
Venstar thermostat: {'mode': 3, 'fan': 0, 'heattemp': 75, 'cooltemp': 78}
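To illustrate how the same logical operation must be rendered differently per vendor, the following sketch builds a set-point control message for each of the schemas shown in Fig. 3.3, using the variable names given in the text. It is a minimal illustration, not SPOT code; real devices additionally need authentication headers and device-specific URLs.

// Builds vendor-specific set-point messages for the schemas in Fig. 3.3.
// Minimal illustration; auth, URLs, and error handling are omitted.
final class SetPointMessages {
    static String build(String vendor, int targetF) {
        switch (vendor) {
            case "nest":    // flat JSON, unit encoded in the variable name
                return "{\"target_temperature_f\": " + targetF + "}";
            case "radio":   // flat JSON, mode-dependent variable name
                return "{\"tmode\": 1, \"t_heat\": " + targetF + "}";
            case "venstar": // mode messages must carry both set points
                return "{\"mode\": 1, \"heattemp\": " + targetF
                     + ", \"cooltemp\": " + (targetF + 3) + "}";
            case "ecobee":  // nested ("thermostat" -> "settings") structure
                return "{\"thermostat\": {\"settings\": {\"heatRangeHigh\": \""
                     + targetF + "\", \"hvacMode\": \"on\"}}}";
            default:
                throw new IllegalArgumentException("unknown vendor: " + vendor);
        }
    }
}

A developer without a unifying abstraction must hand-write one such branch per appliance; SPOT's driver model, introduced next, moves this knowledge into declarative drivers.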
Most commonly, the appliances use a hierarchical data structure for their variables. Fig. 3.2 illustrates the data structures of three common smart appliances, i.e., the Nest and Ecobee thermostats and the Philips HUE light. As depicted in the figure, both the grouping of variables and the relationships in the hierarchy differ. From Fig. 3.2 and Fig. 3.3, we see that the control messages are crafted differently depending on the data structure designed for the appliance. Fig. 3.3 depicts the different messaging schemas. Unlike the Radio thermostat and Philips HUE light, which support a flat message structure, the Ecobee thermostat adopts a "nested" structure. The differences in data structure, together with the differences in messaging schema and variable naming, make it challenging for developers to design a common data management system for multiple appliances.

The above heterogeneities in control apps, communication interfaces, and programming abstractions bring significant challenges to designing smart-home systems. These challenges motivate us to develop a platform that is universal across appliances and yet flexible and extensible. Commercial off-the-shelf (COTS) smartphones offer several salient advantages that make them a promising system platform for smart-home applications, including rich computation and storage resources, multiple network interfaces and sensing modalities, rapidly increasing availability in homes, and low cost. Moreover, smartphones come with advanced programming languages and friendly user interfaces, such as touch screens that enable rich and interactive displays, unlike the limited user interfaces of existing smart-home hubs. Modern smartphones have powerful multi-core processors, which make it possible to run analytical applications locally and provide better privacy guarantees to users, since personal information does not need to be transmitted to the cloud. Another key advantage of using smartphones to control modern appliances is that smartphones are personal devices: users are more comfortable using them than learning to operate new appliances directly or learning other control interfaces, e.g., hubs. Moreover, a central, unified smartphone app makes it possible to log appliance usage data in one place and perform further analytical processing. It also enables developers to build whole-home applications that involve multiple appliances.

Figure 3.4: SPOT system architecture. (a) Zoomed-in view of SPOT's driver manager with the dynamic driver-loading mechanism; (b) SPOT components on the smartphone (yellow area); (c) appliances.

3.4 System Overview

In this section we present SPOT, which is designed to address the above challenges. The key design principle of SPOT is to leverage the availability of smartphones in any house to support multiple heterogeneous smart appliances and to facilitate the transition of current homes to smart-homes. Moreover, SPOT relies only on the out-of-the-box functionality of commercial off-the-shelf (COTS) smartphones, without requiring kernel-level customization or rooting the device. This not only minimizes the burden on application developers, but also ensures compatibility with diverse smartphone models. SPOT is designed as a mobile app that can optionally be backed by a cloud server, as illustrated in Fig. 3.1 and Fig. 3.4.
It enables efficient collaboration between the smart appliances and the smartphone to realize connected smart-homes, including smart-appliance control, home energy management and automation, and centralized usage analytics. In the following, the major components of SPOT are described.

Appliance driver model and dynamic driver loading: SPOT provides a generic model that defines the properties and communication interfaces of a smart appliance in a unified structure referred to as a driver. A driver defines each variable and configuration setting, the details of the communication protocol and access control for each variable, the way the variables appear in the user interface, and the range of accepted values for each variable. There are two types of appliance drivers: XML drivers and Java drivers written with the SPOT API. SPOT implements mechanisms to load drivers dynamically at runtime; Fig. 3.4a depicts the dynamic driver-loading mechanism.

Dynamic user interface (GUI): To enable users to control smart appliances, SPOT generates a specific GUI for each appliance based on its particular variables and configuration settings. How the GUI is generated at runtime for each appliance, which variables are shown in the GUI, and the access-control information are all defined in the XML driver. By generating the GUI dynamically at runtime, SPOT enables access to any smart appliance without requiring changes to the app.

Appliance discovery and status consistency: SPOT requires users to add smart appliances in its settings before accessing them. To facilitate this process, we implement a service discovery protocol based on UPnP that finds the smart appliances available in the home network and maps the discovered appliances to the associated device drivers. Moreover, SPOT adopts a monitoring scheme that polls the status of smart appliances periodically and keeps the usage information in an internal database. This enables application developers to implement data analytics about appliance usage and extract information such as the daily patterns of a user's appliance usage. If desired, SPOT can report this information, e.g., on/off status and thermostat set-point values, to the cloud server for further data processing.
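Conceptually, a loaded driver, whether XML- or Java-based, exposes a small, appliance-independent surface to the rest of SPOT. The following Java interface is a hypothetical rendering of that abstraction (SPOT's actual internal interfaces are not reproduced in this chapter); it simply names the operations that every driver must ultimately provide.

import java.util.Map;

// Hypothetical appliance-independent driver surface; illustrative only.
interface ApplianceDriver {
    /** Common data: driver name, appliance type (e.g., HVAC), vendor. */
    Map<String, String> commonData();

    /** Read action: fetch the current values of the declared variables. */
    Map<String, Object> readVariables() throws Exception;

    /** Write action: set one writable variable to a validated new value. */
    void writeVariable(String canonicalName, Object value) throws Exception;

    /** Authentication action: OAuth2 token refresh or user/pass login. */
    void authenticate() throws Exception;
}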
The SPOT design has several key advantages. First, it benefits users by providing a single gateway to their smart-home appliances and applications. It eases device setup and management and provides a central place for users to trace their activities, e.g., track their energy usage. Moreover, users can benefit from accelerated device support, as the open XML-based device-driver framework is expected to encourage contributions from the open-source developer community. In addition, the dynamic, adaptive user interface provided by SPOT improves the user experience and flattens the learning curve. Second, it benefits home-appliance vendors by supporting, in one smartphone-based platform, the features and intelligence common to many appliances. Vendors therefore do not need to provide all of these features in their appliances, leading to simpler designs and reduced product prices. Vendors only need to provide a device driver according to our XML schema to enable the platform to leverage the device's capabilities. Third, SPOT benefits application developers by enabling them to build applications that support multiple appliances from different vendors. SPOT also addresses each kind of heterogeneity outlined in Section 3.3 and brings a data abstraction and a unified data structure across heterogeneous appliances. This reduces the burden on application developers of dealing with appliance-specific interfaces and data structures. Lastly, as a gateway to the smart-home that operates multiple diverse appliances, SPOT allows third-party service providers, e.g., utility companies, to access the user's data and provide services in a unified fashion.

3.5 Design and Implementation

The core component of SPOT is the appliance driver layer, which abstracts access to heterogeneous devices. In this section, we elaborate on the design and implementation of this driver mechanism. SPOT supports two types of driver implementation, namely XML-based and Java-library-based; the discussion primarily focuses on the former owing to space limitations.

3.5.1 XML Driver Model

We designed a simple XML-based driver model for SPOT. The driver model defines multiple driver units, where each unit describes either an elementary appliance data unit, referred to as a variable, or an elementary appliance function, such as the details of the protocol used to communicate with the appliance to retrieve data or change a setting by sending a command. An appliance driver is composed of a number of driver units specifying the sequence of actions and the list of variables based on the appliance's specific design and functionality. Such a driver model offers several benefits. First, with a generic driver model, we can achieve extensible and universal mobile-based appliance control without requiring knowledge of the whole app, its source code, or the design details of the smart appliances. Second, the system can dynamically load a device driver into the app at runtime without changing the app structure or even downloading a new version of the app. More specifically, the XML drivers can be installed from the cloud via HTTP or from a smartphone's local storage, and are then parsed by the app according to the standardized XML schema. Third, with the notion of driver units, we can build a unified system by addressing each kind of heterogeneity through its corresponding driver unit. Fourth, the driver model can significantly simplify application development and save users' effort, especially for those unfamiliar with embedded-system design and app programming. Users and application developers can implement cross-device applications using the same abstraction provided by the driver model, without dealing with appliance-specific programming APIs. In particular, SPOT presents application developers a single programming abstraction without burdening them with low-level details such as the data structure of each appliance or how each appliance's communication protocol works. A high-level view of the XML driver is shown in Fig. 3.5. The key component of SPOT's driver model is its several kinds of units. Specifically, a driver unit is either a data unit or an action unit. The data units are classified into common data and variables, and the action units are classified into three groups: read actions, write actions, and authentication actions. We explain the role of each unit below.

3.5.1.1 Driver Units

Variables: The variables driver unit is the most important unit in the SPOT driver model.
This driver unit contains the list of variables and detailed information about each variable. A variable is any elementary configuration setting or data element that can be read from or written to a smart appliance. The XML schema snippet illustrated in Fig. 3.6 details the data structure in the driver model. Most of the elements are self-explanatory. canonicalName is an enumeration, such as "Status", "SetPoint", or "CurrentTemperature" for thermostats, that tells the app the meaning of the variable. Such semantic mapping is needed to address the heterogeneity in variable names and allows the app to process each variable appropriately. parent is used to define hierarchical relationships among variables, addressing the heterogeneity in data structures demonstrated in Section 3.3. The uiType, uiHelperText, and uiCaption elements are used in dynamic GUI creation.

Common data: The common data unit defines data common to all kinds of appliances. Currently this unit is simple and includes the driver name, the appliance type, e.g., HVAC or lighting, and the appliance vendor information. The common data is used by SPOT to categorize devices in its data structure for group management and data analytics. A snippet of the XML schema for the common driver unit is shown in Fig. 3.7.

Figure 3.5: XML driver of the Philips HUE light.
Figure 3.6: A snippet of the XML schema for SPOT's driver model.
Figure 3.7: A snippet of the common driver unit in the driver model.

Read action: A read action unit specifies the details of the communication protocol and its settings for reading the device's data variables. As shown in the schema snippet in Fig. 3.8, this driver unit specifies the HTTP method, e.g., GET or POST, to be used to request the variable values from the appliance. It also specifies the base URL that the read request has to use and any extension to the base URL needed to access each variable. The read action unit further indicates the pattern of the read response, e.g., JSON, XML, or MIME (responsePattern). Thus, the read action driver unit provides all the information SPOT needs to establish a successful communication with each appliance and retrieve its data. uriExtensionPattern can be used to attach additional information as part of the URL, e.g., a query string in HTTP GET requests.

Figure 3.8: A snippet of the read action unit in the driver model.
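As an illustration of how a read action unit could drive communication at runtime, the sketch below issues an HTTP GET to a driver-specified URL and returns the payload for pattern matching according to responsePattern. It uses only standard java.net APIs; the field names mirror the driver elements described above, while the surrounding class is a hypothetical stand-in for SPOT's driver manager logic.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical executor for a read action unit; field names mirror the
// driver elements (baseUri, uriExtensionPattern, responsePattern).
final class ReadActionExecutor {
    final String baseUri;             // e.g., "http://192.168.1.20/api"
    final String uriExtensionPattern; // e.g., "/status?var=%s"
    final String responsePattern;     // "json", "xml", or "mime"

    ReadActionExecutor(String base, String ext, String pattern) {
        baseUri = base; uriExtensionPattern = ext; responsePattern = pattern;
    }

    String read(String variableName) throws Exception {
        URL url = new URL(baseUri + String.format(uriExtensionPattern, variableName));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");   // method comes from the driver unit
        StringBuilder payload = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = in.readLine()) != null; ) payload.append(line);
        } finally {
            conn.disconnect();
        }
        // The payload is next pattern-matched per responsePattern and the
        // variable values are extracted for the UI and the database.
        return payload.toString();
    }
}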
Write action: The write action unit describes the communication method for setting or changing the current value of any variable (with write permission) in each appliance. It indicates whether the new value and the variable name have to be placed in the URL extension or in the body of the HTTP message. It also describes the pattern of the message body (bodyPattern) if the write command has to be carried in the body. Currently the SPOT driver model supports several body patterns, including JSON, XML, MIME, and URL-encoding. A snippet of the write action unit is shown in Fig. 3.10, and Fig. 3.9 illustrates the write process.

Figure 3.9: The write actions using the driver model.
Figure 3.10: A snippet of the write action unit in the driver model.

Authentication action: As discussed in Section 3.3, one aspect of the heterogeneity of smart appliances is their authentication and user-management mechanisms. One of the essential parts of the driver model therefore defines the authentication mechanism for each appliance. Our exploration of several smart appliances suggests that vendors adopt two major authentication mechanisms: OAuth2, e.g., Nest and Ecobee, and simple username/password authentication, e.g., the Philips HUE light. SPOT supports both. At a high level, the former is addressed by defining an optional auth element, which stores the token expiration time, the URL for token refreshment, and so on, while the latter is handled by including the username/password as part of baseUri or uriExtensionPattern.

3.5.1.2 Device Driver Usage

Reading values from the appliance: SPOT reads variable values from the appliances over a network connection, using RESTful/SOAP protocols. SPOT initiates an HTTP request to an appliance, querying for the list of variables listed in the XML driver, possibly with additional headers, using the HTTP GET/POST method as indicated, and possibly with a body as defined in the driver. Upon receiving the HTTP response from the appliance, the payload of the response message is pattern-matched against the format indicated in the XML driver, e.g., different JSON or XML formats. After the pattern matcher verifies that the message is well-formed, the driver manager extracts the variable values, and if some variables are marked to be shown in the UI, it passes them to the user-interface component. The user interface then presents the variables and their values in the defined format using the indicated UI component, e.g., as text or as the value of a progress bar. Finally, if necessary, variables are stored in the SPOT internal database or processed according to their canonicalName definitions.

Setting/updating values on the device: To set values on the appliance (update the appliance configuration), SPOT sends an HTTP request for one or multiple fields, possibly with additional headers, using POST/PUT as indicated, and possibly with a message body. When a user sets new variable values from the user interface, SPOT, similar to the read action, maps the variable values to the accepted format and range defined for each variable in the XML driver. It then creates the message according to the settings defined in the write action unit and sends it to the associated appliance. If necessary, the value of the variable is updated in the database upon receiving an acknowledgment from the target appliance. The value shown in the UI is also updated after a successful write action. Fig. 3.9 illustrates this process.

Dynamic UI creation: A salient feature of SPOT is dynamic UI creation. Section 3.5.1 describes how device drivers are abstracted and how one can easily craft an XML driver for a new appliance using our XML structure. The system utilizes the XML-formatted driver and extracts the fields, access controls, and message formats needed to communicate with the device. Similarly, upon receiving the XML-formatted driver, the system reads the XML tags indicating what types of GUI components have to be generated and where they appear in the associated section of the mobile app for any device the user wants to control. This information is obtained from the user-interface-related tags, uiType, uiHelperText, and uiCaption, defined in the variables driver unit. The GUI components in Android are all subclasses of a parent class called View, e.g., Button, TextView, and ProgressBar, and the user interface is rendered at runtime. This feature not only enables users to control different variables of different appliances without a separately designed app, but also lets appliance vendors indicate which variables users may access and how (access control). By changing the values of the UI-related fields in the XML driver, the GUI changes accordingly on the next run of the app, without any change to the app's design.

Figure 3.11: Java and XML specifications for dynamic GUI generation.
Figure 3.12: An example dynamically generated GUI in SPOT.
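The following Android sketch suggests how a uiType tag could be mapped to a View at runtime. The tag values ("switch", "slider", "text") and the Variable holder are assumptions for illustration; only the Android widget classes are real.

import android.content.Context;
import android.view.View;
import android.widget.SeekBar;
import android.widget.Switch;
import android.widget.TextView;

// Illustrative mapping from a driver's uiType tag to an Android View.
// The tag values and the Variable fields are hypothetical.
final class UiFactory {
    static final class Variable {
        String uiType;      // e.g., "switch", "slider", "text"
        String uiCaption;   // label shown next to the component
        int min, max;       // accepted range, used by sliders
    }

    static View build(Context ctx, Variable v) {
        switch (v.uiType) {
            case "switch": {              // boolean variables (on/off)
                Switch s = new Switch(ctx);
                s.setText(v.uiCaption);
                return s;
            }
            case "slider": {              // bounded numeric variables
                SeekBar bar = new SeekBar(ctx);
                bar.setMax(v.max - v.min);
                return bar;
            }
            default: {                    // read-only values fall back to text
                TextView t = new TextView(ctx);
                t.setText(v.uiCaption);
                return t;
            }
        }
    }
}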
3.5.2 Appliance Driver by SPOT Java Library

Besides the XML-based device-driver framework, SPOT provides application developers an API based on Java annotations. Using this API, a driver developer implements the device driver as a Java class, specifying each device variable as a field of the class and annotating it with SPOT-provided annotations. Through annotations, the developer indicates what information the system can fetch from the device and how it is structured, e.g., a RESTful URI. The developer can also set which variables appear in the UI, what access level the user has, e.g., read-only or read/write, and what kind of UI component SPOT should use for each variable. Additional metadata, such as the default, maximum, and minimum values for each variable, can also be specified. In addition, the driver can mark the variables that must be kept persistent in the database using the appropriate annotation tag. Most importantly, the driver specifies the URIs to access in read and write scenarios, as well as the message patterns, e.g., JSON/XML formats, for the request and response messages to and from the device. Using the SPOT annotation API, writing such a class is very simple: only the variables and their metadata need to be defined, and no method has to be implemented. For instance, a snippet of a Java class implementing a driver for the Nest thermostat is shown in Fig. 3.11a. The Java classes are packed as jar files; after they are submitted to SPOT, the system verifies each jar file and creates a DEX module [DEX] from the legitimate ones. The DEX module is dynamically loaded into the system and becomes functional immediately. The SPOT API provides a compact and efficient way to implement appliance drivers; however, it requires programming knowledge and learning the API, and is thus suitable for developers rather than end users, whereas this is not the case for the XML-based method.
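The sketch below conveys the flavor of such an annotation-based driver, in the spirit of Fig. 3.11a. The annotation names (@SpotDriver, @SpotVariable) and their elements are invented stand-ins so the example is self-contained; SPOT's actual annotation API may differ, and the base URI is a placeholder.

import java.lang.annotation.*;

// Minimal stand-ins so the sketch is self-contained; the real SPOT
// annotations are richer.
enum Access { READ_ONLY, READ_WRITE }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface SpotDriver { String vendor(); String type(); String baseUri(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface SpotVariable {
    String canonicalName();
    Access access();
    String uiType();
    int min() default 0;
    int max() default 100;
    boolean persistent() default false;
}

// Hypothetical driver class: only fields and metadata, no methods.
@SpotDriver(vendor = "Nest", type = "thermostat",
            baseUri = "https://developer-api.example.com/devices")
class NestThermostatDriver {
    @SpotVariable(canonicalName = "SetPoint", access = Access.READ_WRITE,
                  uiType = "slider", min = 50, max = 90, persistent = true)
    int target_temperature_f;

    @SpotVariable(canonicalName = "CurrentTemperature",
                  access = Access.READ_ONLY, uiType = "text")
    int ambient_temperature_f;
}

The driver manager can then reflect over these annotations to assemble the variable list, actions, and UI description, which is why no methods need to be written.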
3.5.3 Appliance/State Consistency

To effectively manage smart-homes, having the up-to-date status of each smart appliance is mandatory. However, since the state of a smart device can change via other mobile apps, e.g., vendor-specific apps, or physically, e.g., turning off a light via a wall switch, the device states can easily become inconsistent with the state stored in the SPOT database. To address this issue, SPOT periodically synchronizes its database by polling the devices, using the mechanisms described in Section 3.5.1.2. However, such changes may not happen frequently, and the states of different devices may not change with the same frequency (e.g., one may change the set point of a thermostat more often than turning a bedroom light on or off). Thus, to make device/state persistence more efficient and to prevent draining the smartphone's battery with unnecessary polls, SPOT can employ an adaptive polling mechanism. An intelligent adaptive polling mechanism is left for future work.

3.5.4 Appliance Discovery and Bootstrap

To minimize users' efforts in registering new smart appliances with the system, SPOT supports a device discovery mechanism based on UPnP. Using the cybergarage-upnp library [upn], SPOT listens for SSDP NOTIFY messages broadcast by UPnP-capable smart appliances in the network. From these messages, SPOT identifies the IP address and port number of the appliance to which read/write commands are to be sent. Moreover, when the default device name is included in the message, the app automatically finds an appropriate device driver; if none is found, the app can download one from the cloud. Further utilization of UPnP is part of our future work.

3.5.5 Application Manager

SPOT enables home-wide application execution through its Application Manager component. Developers or hobbyists can use the SPOT API to write home-wide applications and submit them to SPOT. SPOT checks the legitimacy of each application and dynamically loads it into the system (with the same technique used to dynamically load annotation-based Java drivers). SPOT persists each application in the database and passes its handler to the Application Manager. The Application Manager is implemented as a background process that periodically checks the active applications in the system; when an application's execution requirements are satisfied, it executes the application. Since home applications involve one or more smart appliances, and each appliance can be associated with one or more applications, the Application Manager also keeps the device states and configuration changes consistent with the requirements of the associated applications, in cooperation with the device-management and database-management services.
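A minimal sketch of such a background execution loop is shown below, assuming a HomeApplication interface with an isRunnable() guard; both names are hypothetical stand-ins for SPOT's internal types.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical home-wide application with an execution guard.
interface HomeApplication {
    boolean isRunnable();   // are the app's execution requirements satisfied?
    void execute();         // run the app's cross-device logic
}

// Background manager that periodically runs eligible applications.
final class ApplicationManager {
    private final List<HomeApplication> active = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void register(HomeApplication app) { active.add(app); }

    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            for (HomeApplication app : active) {
                if (app.isRunnable()) app.execute();
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }
}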
3.6 Application Scenarios

To demonstrate the extensibility and universality of the SPOT driver model and data abstraction, and the generality and flexibility of SPOT as a platform, we have prototyped three different smart-home applications. Each application demonstrates the advantages of having SPOT as a central platform for home automation and the opportunities it provides for application developers. Our goal is to demonstrate the capabilities and effectiveness of the platform and the opportunities provided by SPOT, rather than to present novel applications.

3.6.1 Application 1: Cross-Device Programming

Unlike other contexts (e.g., enterprise or ISP networks), the intended administrators of home networks are non-expert users. The management challenge is particularly noteworthy for inter-device control and for applications that involve the entire smart-home, or part of it, but require more than one device. SPOT provides a platform for such cross-device applications. For this purpose, SPOT provides an application layer that enables users and developers to design custom applications that operate home-wide. SPOT supplies an application management service to execute the defined applications, along with a "trigger-target" application inspired by the IFTTT system [IFT]. A "trigger-target" application is defined by trigger-target rules. In trigger-target programming, end users specify the behavior of the system as an event (trigger) and a corresponding action (target) to be taken whenever that trigger occurs. Both the trigger and the target can contain parameters that customize their behavior, and each involves one device (the trigger device and the target device). The trigger also defines the circumstances that lead to its occurrence: when defining the trigger, the user selects a variable of the trigger appliance, a condition, and a triggering threshold. When defining the target, the user selects the target device, one of its variables as the target variable, and the target value for that variable. Once the trigger and target are defined, a trigger-target rule is programmed and becomes active immediately. SPOT's Application Manager then takes the rule and places it in the application pool, to be executed when the trigger occurs. For example: if the target temperature set on the Nest thermostat equals 70, then turn off the Philips HUE light. Without SPOT, users can program only simple inter-device applications using third-party cloud-based systems such as IFTTT. However, the involvement of third-party systems may raise privacy concerns for users, and systems like IFTTT provide interfaces to only a limited set of appliances that are integrated with their implemented interfaces.
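Expressed over the canonical variable abstraction, a trigger-target rule reduces to a small data object plus a periodic check; the sketch below encodes the thermostat-to-light example. The Rule class and the DeviceRegistry read/write helpers are hypothetical stand-ins for SPOT's internals.

// Hypothetical encoding of a trigger-target rule over SPOT's canonical
// variable abstraction.
final class TriggerTargetRule {
    final String triggerDevice, triggerVariable, condition;
    final double threshold;
    final String targetDevice, targetVariable;
    final Object targetValue;

    TriggerTargetRule(String trigDev, String trigVar, String cond, double thr,
                      String tgtDev, String tgtVar, Object tgtVal) {
        triggerDevice = trigDev; triggerVariable = trigVar; condition = cond;
        threshold = thr; targetDevice = tgtDev; targetVariable = tgtVar;
        targetValue = tgtVal;
    }

    /** Evaluated by the Application Manager on each polling cycle. */
    void evaluate(DeviceRegistry registry) throws Exception {
        double v = registry.readVariable(triggerDevice, triggerVariable);
        boolean fired;
        if ("==".equals(condition))      fired = v == threshold;
        else if (">".equals(condition))  fired = v > threshold;
        else if ("<".equals(condition))  fired = v < threshold;
        else                             fired = false;
        if (fired) registry.writeVariable(targetDevice, targetVariable, targetValue);
    }

    interface DeviceRegistry {
        double readVariable(String device, String variable) throws Exception;
        void writeVariable(String device, String variable, Object value) throws Exception;
    }
}

// Usage: if the Nest set point equals 70, turn off the HUE light.
// new TriggerTargetRule("nest", "SetPoint", "==", 70, "hue", "Status", false)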
3.6.2 Application 2: Residential Automated Demand Response

Demand response (DR) is a technology for balancing electricity supply and demand. In DR, utility companies send signals to electricity customers requesting demand curtailment, in order to handle situations where a significant demand peak is expected or electricity prices are high. By participating in DR, customers can gain monetary incentives or incur lower electricity costs. SPOT can benefit users (i.e., electricity customers) by facilitating DR participation through home automation. Specifically, SPOT works as a home gateway that interacts with the electricity utility's DR server and controls the smart appliances in a household according to the server's signals. For instance, in the case of DLC (direct load control), probably the most popular DR program, SPOT can automatically change the operation state of the targeted appliance (e.g., turning off lighting and/or changing the set point of an air-conditioning system). Dynamic pricing, which encourages voluntary demand curtailment by adjusting electricity prices during peak hours, has also gained popularity; in this case, SPOT interprets pricing information from a DR server and automatically reduces or shifts appliance usage. While DR participation traditionally required manual control of individual appliances, which deterred penetration into residential sectors, SPOT is expected to significantly reduce the user's effort and hence lower the barrier to participation. We have implemented a prototype DLC app that implements the OpenADR 2.0b protocol [Ope], the globally recognized standard for automated demand response, on top of SPOT for automated appliance control. Functionality to report appliance status (e.g., on/off and set point) is also implemented using SPOT so that the DR server can use such data to validate DR participation. Moreover, although none of the smart devices discussed in this work (listed in Table 3.1) was OpenADR-compliant as of April 2015, SPOT immediately allows their integration into automated demand response systems.

3.6.3 Application 3: Central Usage Analytics

As explained in Section 3.5.1, SPOT provides a way to collect the status of all smart appliances in a household. In other words, a smartphone app can access rich historical data on when each appliance was active. Such centralized data collection enables a variety of data analytics useful to users. Combined with the user's energy-usage data, which can be obtained via services such as Green Button Download My Data or Connect My Data [Gre], the app can accurately label the energy-consumption trace with the operation status of each appliance, without requiring the user's effort. Such labeled traces can be used, for example, for electricity load disaggregation, which takes a whole-home energy signal and separates it into component appliances, giving the user a fine-grained understanding of her own energy usage. Namely, the app can tell how much electricity is consumed by each appliance and how much electricity cost is attributed to it. Furthermore, other types of supervised data analytics also become possible.

Table 3.1: Smart appliances tested with SPOT.

Appliance | Vendor | Type | Protocol | Architecture | UPnP | Auth. | Msg. schema | Driver
Philips HUE light [Phi] | Philips | light | RESTful | Hub | yes | user/pass | JSON | all*
Nest smart thermostat [Nes] | Nest | thermostat | RESTful | Cloud | no | OAuth2.0 | JSON | all
WeMo Switch [wem] | WeMo | plug | SOAP | Local IP addr. | yes | none | XML | XML
Venstar thermostat [Ven] | Venstar | thermostat | RESTful | Local IP addr. | no | none | JSON | all
UFO Smart Plug [ufo] | UFO | plug | RESTful | Local IP addr. | no | none | JSON | all
Radio Thermostat [Rad] | Radio Therm. | thermostat | RESTful | Local IP addr. | no | none | JSON | XML
Ecobee Thermostat [eco] | Ecobee | thermostat | RESTful | Local IP addr. | no | OAuth2.0 | JSON | XML

*: "all" means all three types of drivers (XML, pure Java, SPOT annotation-based Java API) are implemented for this appliance.

3.7 Evaluation

To demonstrate the usability of SPOT as a platform, we conducted an extensive analysis of SPOT's performance. Our goal is to achieve latency low enough to handle the application scenarios discussed in Section 3.6 and to offer scalability and throughput that can handle large, complex smart-homes. In this section, we report our system evaluation against several criteria, from the memory and CPU usage of SPOT as a smartphone app and the latency of appliance control-command invocation, to the overhead introduced by dynamic driver loading, appliance status polling, and so forth. We also examine the scalability of the system in managing multiple smart appliances. To show the universality of our driver model and the SPOT app, we prototyped device drivers for multiple appliances on the market from major home-appliance vendors. Table 3.1 and Fig. 3.13 show the appliances currently tested with our driver model and the app. As elaborated in Section 3.3, the appliances vary in several aspects, including communication protocol and architecture, messaging schema and data structure, authentication mechanism, and support for device discovery.

Figure 3.13: Smart appliances tested with SPOT.

Comparison of XML-based and Java-based drivers: As discussed in Section 3.5, there are basically three ways of implementing an appliance driver: (a) a conventional driver, which implements each driver without the SPOT driver model, e.g., a proprietary Java class per appliance that implements all functionality and variables independently; (b) using the SPOT Java API (cf. Section 3.5.2); and (c) an XML-based driver (cf. Section 3.5.1).
We compare the ease of programming a driver in terms of lines of code (LOC) for each kind of driver across multiple appliances. Fig. 3.14 shows that the implementation code of drivers using the SPOT driver model (XML-based and API-based) is significantly shorter than that of conventional drivers, independently of the other benefits of the SPOT driver model such as dynamic UI generation and automatic DB connection. In addition, drivers implemented with the SPOT API are even shorter than the XML models. This is because in the XML model, as illustrated in Section 3.5.1, defining each driver unit, e.g., variables and action units, requires a few lines of XML, whereas with the SPOT API one can define all the meta-information about each driver unit in a single compact annotation (cf. Fig. 3.11a). Moreover, Java drivers implemented with the SPOT API, unlike conventional drivers, do not need to implement the action units, because SPOT's driver manager provides the actions, e.g., read, write, and authentication, for each driver according to the driver model. Note again that although the SPOT API provides a compact and efficient way to implement appliance drivers, it is suitable only for application developers rather than end users, since it requires programming knowledge and learning the API, while this is not the case for the XML-based method.

Figure 3.14: Comparison of driver length (LOC) for the different kinds of drivers.

CPU and memory footprint on the smartphone: We measure the CPU and memory footprints of SPOT across the different features it provides, using the System Monitor utility application. The CPU usage of SPOT is almost 0.0%. The memory usage is about 49 MB when idle, but reaches 108 MB when visualizing reports; this is due to the dynamic growth of the heap as data is loaded for visualization. The total size of the SPOT binary is only 9.69 MB.

Latency of appliance control command invocation: We also measure the latency of an operation invocation with different numbers of variables to set, in two situations: (a) when the smartphone and smart appliance communicate directly over local WiFi, e.g., the Philips HUE light and Venstar thermostat, or (b) when they communicate via the device's (or a third party's) cloud, e.g., the Nest thermostat. Our measurements show that a command takes less than 150 ms when invoked over local WiFi and around 395 ms when invoked via the Internet (device cloud). This low latency makes SPOT a suitable platform for applications with specific timing requirements. For example, there are demand response services (cf. Section 3.6.2), such as Fast DR [fas], that require operations on the targeted appliances to finish within 4 seconds; the command invocation time of SPOT is short enough to support this use case.

Latency of dynamically loading drivers: Although loading and processing device drivers does not happen frequently, it must not interfere with the user experience or with time-critical operations. We measure the time it takes to load and process an XML driver with 10 variables. Our measurements on the Galaxy Nexus, Nexus 5, and Nexus 7 show that such drivers are loaded in less than 25 ms.
SPOT processes XML drivers in a background thread and thus does not interrupt the user experience. Fig. 3.15 shows the latency of dynamically loading an XML driver versus the number of fields defined for the smart appliance.

Figure 3.15: Latency of dynamically loading the XML drivers (error bars: standard deviation).

In Section 3.5.2 we discussed that an alternative to XML-based drivers is to write the appliance driver as a minimal Java class using the annotation-based SPOT API. These driver classes are packaged as DEX files and loaded dynamically. We measure the time it takes the SPOT driver manager to load the DEX drivers and process the annotations to extract the driver-specific information. As in Fig. 3.15, the measurement is done for drivers with 10 variables on different smartphones. Our measurements show that the DEX drivers are loaded and processed in less than 20 ms. This is an I/O-bound process and is executed in the background to prevent any interruption of the user experience or the main thread.
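On Android, loading such a packaged driver at runtime can be done with the platform's DexClassLoader, sketched below. The jar path and driver class name are placeholders; SPOT's actual loading code is not reproduced in this chapter.

import android.content.Context;
import dalvik.system.DexClassLoader;

// Illustrative dynamic loading of a DEX-packaged driver class; the
// file name and class name are placeholders.
final class DriverLoader {
    static Object loadDriver(Context ctx, String jarPath, String className)
            throws Exception {
        DexClassLoader loader = new DexClassLoader(
                jarPath,                                   // path to the driver jar/dex
                ctx.getCodeCacheDir().getAbsolutePath(),   // app-private cache dir
                null,                                      // no native libraries
                ctx.getClassLoader());                     // parent class loader
        Class<?> driverClass = loader.loadClass(className);
        // Annotations on the class and its fields are then reflected over
        // to build the variable list, actions, and UI description.
        return driverClass.getDeclaredConstructor().newInstance();
    }
}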
In addition, we see that when SPOT polls 6 appliances rather than 3, the drop rate in battery level increases as expected, but the increase in the energy consumption rate is moderate (a few minutes). These results suggest that, in this context, the polling interval plays a more important role than the number of appliances. Based on our experiments using 5 real devices (UFO, Venstar, WeMo, Radio Thermostat, HUE), we confirmed that SPOT can handle over 800 device state pollings per minute. Therefore the polling mechanism is not only practical but also scalable to more appliances.

Figure 3.17: The effect of polling interval and number of appliances (devices) on the smartphone's energy consumption. A shorter polling interval and a larger number of devices lead to a higher rate of smartphone energy consumption.

Fig. 3.18 is a screenshot of the page in SPOT that shows a part of the appliance status stored in the SPOT internal DB. We can see multiple appliances and their defined types as well as their current status, including on/off state, current setpoint and current temperature (if any). By providing this central and consistent information about the smart appliances at home, SPOT facilitates the development of different kinds of whole-home applications. For instance, maintaining consistent information about the appliances' status is not only useful for home automation and essential for users to control appliances, but also crucial for applications like data analytics and demand response.

Figure 3.18: SPOT records the state of appliances and keeps the appliance state consistent in its internal DB with frequent polling.

Latency of dynamically generating the GUI: A salient feature of SPOT is its capability to dynamically create the appropriate graphical user interface (GUI) for controlling smart appliances. The components used in the GUI are indicated by designated tags in the XML driver or by the appointed Java annotations in the Java API drivers. A component is specified for each field in the driver and has to be selected from the supported components available in the Android library. Moreover, different appliances can have different numbers of fields accessible in the UI. As a result, the UI load time can vary with the number of UI components. Our measurements show that the user interface load time is around 5 ms and changes by at most 1 ms for up to seven components in the UI. It should be noted that maintaining a small variation in loading the user interface is essential for achieving a good user experience.

The smoothness of displaying the dynamic GUI: For a mobile app that provides a dynamic user interface, drawing screen frames with a regular rhythm is essential for good performance and user experience. We analyze this using an Android system tool, Systrace [sys, ], which is particularly useful in analyzing application display slowness or pauses in rendering the UI components. Typically the analysis of the display components (UI threads) by Systrace is reported under the SurfaceFlinger process, as shown in Fig. 3.19. Having a regular rhythm for display ensures that UI components appear smoothly on the screen [sys, ]. Fig. 3.19 illustrates the execution pattern of the display component in SPOT. The regularity of the SurfaceFlinger process suggests a smooth GUI rendering in the app.
Moreover, the regular pattern in the CPU state in the upper panel of Fig. 3.19 indicates that there are no other threads in the app, e.g., network communication, disk operations for DB access, or loading of UI components like images, that may interrupt the rendering of the user interface. This validates the efficient architecture of SPOT, which achieves smoothness in displaying the UI.

Figure 3.19: The smoothness of displaying the GUI: the regular rhythm in the SurfaceFlinger process indicates smooth display rendering. The regular rhythm in the CPU state over the same period of time indicates no interference between the threads in the app.

Latency of the cross-device application runtime: A salient feature of SPOT is that it enables developers to create whole-home applications that involve and connect multiple smart appliances. In order to show the performance of such applications, we use Application 1 discussed in Section 3.6.1. Fig. 3.20 shows the running time of this application for different numbers of smart appliances. Note that this execution time only reflects the application runtime delay and does not include the delay of command invocation to the appliances, as the appliances' response times can vary significantly owing to the diversity in their communication architectures. In this figure we can see that running a smart-home application on top of SPOT (cf. Section 3.6) adds at most 13 ms to the execution time when handling the information of 14 appliances. This shows the practicality of building whole-home applications.

Figure 3.20: Latency of the whole-home application runtime on Galaxy Nexus, Moto G and Nexus 7

3.8 Summary

In this chapter we presented SPOT, a smartphone-based platform for home automation systems. SPOT addresses the extreme heterogeneity and diversity among different smart appliances in the smart-home context. SPOT features a driver abstraction, in which smart appliances are modeled by an XML-based or Java-based driver structure that specifies the data and actions supported by the appliances. By making the heterogeneous characteristics of smart appliances transparent, SPOT minimizes the burden of home automation application developers and the efforts of users who would otherwise have to deal with appliance-specific apps and control interfaces. SPOT is evaluated through several benchmarks and three case studies: cross-device programming, central usage analytics and residential energy management via demand response commands. Our evaluation demonstrates the generality of SPOT's design and its driver model. Although our prototype and experiments focus on RESTful and SOAP-based appliances, it should be noted that the driver model and the way SPOT tackles appliance heterogeneity are not limited to these appliances.

Chapter 4

On-device Deep Learning

4.1 Introduction

Applications of deep learning have seen a great leap in inference accuracy in a number of fields. Neural networks, the algorithmic core of deep learning, have become ubiquitous in several applications including speech recognition [Deng et al., 2013], computer vision [Moazzami et al., 2017] and natural language processing [Collobert et al., 2011]. In sensory systems, deep learning has revolutionized the way sensor measurements are processed and interpreted.
However, significant memory requirements and computational latency have been the main bottlenecks to wide adoption of these novel computational techniques on resource-constrained smartphones and embedded platforms. Deep learning models are both computationally intensive and take up a lot of storage space, making them difficult to deploy on resource-limited embedded systems such as smartphones. For example, the original AlexNet [Krizhevsky et al., 2012] and R-CNN [Girshick et al., 2014] are more than 200 MB, and VGGNet [Simonyan and Zisserman, 2014] is more than 520 MB. Almost all of that size is taken up by the weights of the neural connections, which are learned during the training process. There are many millions of these connections in a single model. For example, VGGNet contains more than 135 million weights in its structure [Kaparaty 2016, ]. In 1998, LeCun et al. [LeCun et al., ] classified handwritten digits with fewer than 1M parameters, while in 2012 Krizhevsky et al. [Krizhevsky et al., 2012] won the ImageNet competition with a network with 60M parameters. DeepFace [Taigman et al., 2014] classified human faces with a network with 120M parameters. These parameters are arranged in large layers, in which each layer takes input from the previous layer and, after applying specific processing to the data, passes its output to the next layer. Fig. 4.1 illustrates this hierarchy of layers and the input and output of each layer in a sample network architecture.

Figure 4.1: The output volume (feature maps) of different layers in a deep neural network

The sheer complexity and associated heavy computation, memory and energy demands of deep learning models cause the majority of mobile sensor-based apps to rely on simpler methods with lower resource overhead (e.g., decision trees, Gaussian Mixture Models (GMM)), resulting in lower accuracy and robustness for real-time inference in mobile sensing applications. However, it is critical to attain the inference accuracy and robustness provided by deep models in future generations of mobile applications. Motivated by the above observations, in this chapter we make important strides towards the adoption of deep learning models by applications on mobile and embedded devices. We propose a novel model partitioning framework that enables us to embed deep learning models into mobile applications by decomposing them into their layers, and assigning each layer of the model to a different tier based on the layer's time-criticality, compute-intensity, and heterogeneous latency/memory consumption profiles. This is an inter-layer partitioning in which a neural network is partitioned into two sub-networks that are deployed on the smartphone and the cloud.

4.2 Related Work

On-device deployment by model compression: There have been various proposals to compress deep models by removing the redundancy in the weight values. Vanhoucke et al. [Gupta et al., ] explored a fixed-point implementation with 8-bit integer (vs. 32-bit floating point) activations. Hwang and Sung [Shin et al., 2016] proposed an optimization method for fixed-point networks with ternary weights and 3-bit activations. The authors of [Denton et al., 2014] exploited the linear structure of the neural network by finding an appropriate low-rank approximation of the parameters while keeping the accuracy within 1% of the original model. Much work has been focused on binning the network parameters into buckets, so that only the values in the buckets need to be stored.
HashedNets [Chen et al., ] reduce model size by using a hash function to randomly group connection weights, so that all connections within the same hash bucket share a single parameter value. In their method, the weight binning is pre-determined by the hash function, instead of being learned through training.

Partitioning and task offloading: Various task offloading schemes for smartphones have been developed recently. Spectra [Flinn et al., 2002] allows programmers to specify task partitioning plans given application-specific service requirements. Chroma [Balan et al., 2003] aims to reduce the burden of manually defining detailed partitioning plans. Medusa [Ra et al., 2012] features a distributed runtime system to coordinate the execution of tasks between smartphones and the cloud. Turducken [Sorber et al., 2005] adopts a hierarchical power management architecture, in which a laptop can offload lightweight tasks to tethered PDAs and sensors. While Turducken provides a tiered hardware architecture for partitioning, it relies on the application developer to design a partitioned application across the tiers to achieve energy efficiency. ORBIT [Moazzami et al., 2015] dispatches the execution of sensing and processing tasks in a smartphone-based multi-tier architecture to meet the requirements of data-intensive applications. ORBIT maximizes the battery lifetime subject to application-specific latency constraints. Wishbone [Newton et al., 2009] also features a task dispatch scheme and, unlike Turducken, uses a profile-based approach to find the optimal partition. It only considers two tiers: in-network and on-server. In this chapter, we take a different approach. We show how to efficiently embed deep models in mobile applications through partitioning, by exploiting the model architecture and layer properties.

4.3 Architectural Observations

The layered architecture of deep models: Deep neural models are sequences of layers, in which each layer takes input from the previous layer and, after applying specific processing to the data, passes its output to the next layer. We use three main types of layers to build ConvNet architectures: the convolutional layer, the pooling layer, and the fully-connected layer (exactly as seen in regular neural networks). We stack these layers to form a full deep model architecture. A deep neural model can have as many of each of these kinds of layers as needed. Fig. 4.1 illustrates a typical deep neural network with several layers.

The volumetric but sparse output of each layer: Except for the very first layer of a deep neural network, which takes in the raw sensory data (e.g., sound, image, etc.), the architecture of deep neural models is typically constrained in a particular way. Specifically, in convolutional neural networks the layers have neurons (a.k.a. nodes) arranged in three dimensions: width, height and depth. The depth here refers to the third dimension of an activation volume, the output of each layer. In a well-trained neural network each neuron in each layer is responsible for extracting a certain feature from the input data as a feature map, which is normally a 2D matrix. As illustrated in Fig. 4.1, the 3D feature map of a layer is generated by concatenating the feature maps from all neurons in that layer. For example, the CIFAR-10 model [Krizhevsky, 2009] is a deep neural model for detecting objects in tiny images. In this model the input data is an image and the volume of activations has dimensions 32x32x3 (width, height, depth, respectively).
Obviously not all the features are present in every input, meaning that many of the neurons are not activated when processing a certain input at inference time. This results in a highly sparse feature map as the output of each layer. For instance, Fig. 4.2 shows a typical-looking feature map on the first convolutional layer of a trained AlexNet when it processes an image of a cat as input. In this figure, values of zero are shown in black. As we can see, the feature maps are extremely sparse, since most of the activation values are zero. Moreover, as the data goes through the network the output volume changes. The final output layer for CIFAR-10 has dimensions 1x1x10, because by the end of the architecture the model reduces the input to a single vector of class scores.

Smartphone architecture: Smartphones have several salient advantages that make them promising system platforms for executing applications with embedded deep learning models. These features include high-speed multi-core processors capable of executing advanced data processing algorithms, various integrated sensor modalities and multiple network interfaces. Motivated by these observations, we propose a novel partitioning mechanism that exploits the above properties of deep learning models and the smartphone architecture to achieve the best schema for embedding deep models in smartphone sensory apps.

Figure 4.2: Sparsity of feature maps in convolutional neural nets: a typical-looking feature map on the first convolutional layer of a trained AlexNet while processing an image of a cat as input. Every box shows an activation map corresponding to a filter. Most activation values are zero (shown in black) [Kaparaty 2016, ]

4.4 Partitioning

This section describes our model partitioning framework, Deep-Partition. Deep-Partition aims to minimize the end-to-end delay of the feed-forward execution of a deep neural network on smartphones and embedded devices. Deep-Partition employs a novel model partitioning framework that treats the deep neural architecture as a computation graph and finds the optimal graph cut for each model. Consider a deep neural network model N consisting of n layers (denoted by L_1, ..., L_n), with a feed-forward execution pipeline expressed as a sequential set of layers: N := L_1 → L_2 → ... → L_n. Let I_i denote the execution tier of L_i; since we consider two tiers here, I_i is a binary indicator with I_i = 1 when L_i runs on the smartphone and I_i = 0 when it runs on the cloud. Let τ_p, τ_c and τ_A denote the feed-forward execution time on the smartphone, on the cloud, and end-to-end, respectively. Fig. 4.1 illustrates a typical deep learning model (a deep convolutional neural network) with its hierarchy of layers and the tensor sizes of each layer's output, a.k.a. the layer's activations. We now formulate the model partitioning problem for a feed-forward execution.

Model Partitioning Problem. For the feed-forward execution N = L_1 → L_2 → ... → L_n, the Model Partitioner finds a graph cut, defined by the set S = {I_1, I_2, ..., I_n}, that minimizes the total execution time of a feed-forward execution, denoted by T, subject to the processing constraints specific to each model.
The execution times of layer L_i on the smartphone and the cloud are denoted by t_i^p and t_i^c, respectively, where the superscripts 'p' and 'c' represent the smartphone and the cloud. Denote by t^{p↔c} the latency of downloading/uploading a data unit (a layer's output feature map, a tensor with dimensions w_i, h_i, c_i) between the phone and the cloud. As we will see later in the evaluation section, unlike in other partitioning schemas this parameter is critical in determining the optimal partition, due to the 3D output volume of each layer described in Section 4.3. Thus, let κ_i denote the size of the output tensor of layer L_i, κ_i = w_i × h_i × c_i. In addition, let K_i indicate the size of this tensor under the coding schema. For example, if the tensors are stored like images in RGBA-8888 format, each pixel/item in the tensor is stored as a 32-bit float, and therefore K_i = 32 × κ_i. To keep the formulation general, let b denote the coding bit rate (e.g., 32 for RGBA-8888), so that K_i = b × κ_i for the output tensor of layer L_i. We now analyze the processing delay of a deep model over a feed-forward period.

Processing execution delay: Let T denote the end-to-end delay in processing the input data. Analysis shows that

T = \sum_{i=1}^{n} \left[ I_i \cdot t_i^p + (1 - I_i) \cdot t_i^c + d(I_i, I_{i-1}) \cdot \frac{K_{i-1}}{\lambda} \right],   (4.1)

where, similar to Chapter 2, the function d(I_i, I_{i-1}) accounts for the data-copy overhead between the tiers (i.e., uploading the tensor output of layer L_{i-1} to the cloud). Since, unlike Chapter 2, we only consider two tiers here (smartphone and cloud), the function d(·,·) can simply be defined as the XOR of the assignments of two consecutive layers, d(I_i, I_{i-1}) = I_i ⊕ I_{i-1}; it is therefore either 0 or 1 depending on whether the partition falls between layers L_{i-1} and L_i. As we see from this analysis, the actual model execution time (and thus the partitioning result) depends not only on each layer's computation time but also on the layers' output volumes. The parameter λ is the network's upstream bit-rate.

4.5 Layer-wise Profiling of Representative Deep Networks

For efficient partitioning, it is important to understand the characteristics of the execution latency of deep learning models and of their internal volumetric outputs. This section presents a measurement study of the layer-wise output volume and latency of a few of the most common representative deep models on different smartphones. The measurement study provides insights into the computational intensity of deep neural models and their layer-wise delay and output size, and it motivates the key design decision in our model partitioning schema. We investigate the latency of each layer for three representative and commonly used deep neural models, namely AlexNet, GoogleNet and VGGNet, when running on a Galaxy S7 smartphone. Fig. 4.3a, 4.3c and 4.3e summarize these results. Moreover, Table 4.1 shows the breakdown of the output volume and execution delay for the sample neural network shown in Fig. 4.1. In addition, Table 4.2 summarizes the end-to-end feed-forward execution times for these architectures.

Figure 4.3: The profiling of the execution time, layer-wise latency and activation tensor size for the three major representative deep neural models: (a) AlexNet execution time, (b) AlexNet tensor size, (c) GoogleNet execution time, (d) GoogleNet tensor size, (e) VGGNet execution time, (f) VGGNet tensor size.
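Given layer profiles of this kind (per-layer execution times t_i^p and t_i^c, coded output sizes K_i, and the up-stream bit-rate λ), the optimal assignment under Eq. (4.1) can be found with a simple dynamic program over the layer chain. The sketch below is our illustrative reading of the formulation, not the dissertation's implementation; all identifiers, and the assumption that the input tensor initially resides on the phone, are ours.

/** Minimal sketch of the two-tier cut search behind Deep-Partition,
 *  following Eq. (4.1). Times are in ms, sizes in bits, lambda in bits/ms. */
public class DeepPartitionSketch {

    /** Returns a tier assignment (true = smartphone) minimizing Eq. (4.1).
     *  tPhone[i], tCloud[i]: execution time of layer i on each tier;
     *  outBits[i]: coded size K of layer i's output tensor;
     *  inputBits: coded size of the raw input tensor. */
    static boolean[] partition(double[] tPhone, double[] tCloud,
                               double[] outBits, double inputBits, double lambda) {
        int n = tPhone.length;
        double[][] best = new double[n][2]; // best[i][0]: layer i on phone, [1]: on cloud
        int[][] prev = new int[n][2];       // tier of layer i-1 on the best path
        // Assumption: the sensor input starts on the phone, so running the
        // first layer on the cloud pays for uploading the input tensor first.
        best[0][0] = tPhone[0];
        best[0][1] = tCloud[0] + inputBits / lambda;
        for (int i = 1; i < n; i++) {
            for (int tier = 0; tier < 2; tier++) {
                double stay  = best[i - 1][tier];                               // d = 0
                double cross = best[i - 1][1 - tier] + outBits[i - 1] / lambda; // d = 1
                best[i][tier] = (tier == 0 ? tPhone[i] : tCloud[i]) + Math.min(stay, cross);
                prev[i][tier] = (stay <= cross) ? tier : 1 - tier;
            }
        }
        // Backtrack from the cheaper terminal tier; shipping the final class
        // scores back to the phone is tiny and ignored here.
        boolean[] onPhone = new boolean[n];
        int tier = (best[n - 1][0] <= best[n - 1][1]) ? 0 : 1;
        for (int i = n - 1; i >= 0; i--) {
            onPhone[i] = (tier == 0);
            if (i > 0) tier = prev[i][tier];
        }
        return onPhone;
    }
}

Because there are only two tiers, this search is linear in the number of layers, so it can be re-run cheaply whenever the profiled up-stream bit-rate λ changes.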
Table 4.1: The breakdown of the model shown in Fig. 4.1: each layer's output dimensions and execution time, profiled on two different processors, i.e., Exynos 7420 and Intel i7-4500.

layer name    w    h    c    pixels (w*h*c)   size (bits)   time @1.5 GHz (ms)   time @1.8 GHz (ms)
input image   32   32   3    3072             98304         NA                   NA
conv1         28   28   10   7840             250880        45.56                9.39
pool1         14   14   19   3724             119168        30.66                30.49
conv2         10   10   25   2500             80000         80.86                35.29
pool2         5    5    25   625              20000         71.46                20.54
conv3         2    2    100  400              12800         50.07                21.2
pool3         1    1    100  100              3200          42.34                17.55
output        1    1    10   10               NA            NA                   NA

w, h, c indicate the width, height and number of channels of the associated tensor, respectively. Bit sizes are based on 32-bit RGBA-8888 coding. The 1.5 GHz column is the Exynos 7420; the 1.8 GHz column is the Intel i7-4500.

4.6 Evaluation of Model Partitioning

To demonstrate the expressivity of Deep-Partition and its generality and flexibility, we consider three representative deep convolutional neural networks (ConvNets) from the above list. As we discussed earlier (cf. Section 4.3), each ConvNet consists of a different layering architecture with different intermediate feature volumes. Our goal is to demonstrate the capabilities and effectiveness of Deep-Partition rather than to compare the performance of these ConvNets. We evaluate the effectiveness of the model partitioning algorithm presented in Section 4.4 by comparing against the following partitioning baselines: a) smartphone-only, when the entire model is deployed and running on the device, and b) server-only, when the model is running on the back-end server.

Table 4.2: Representative deep neural network models.

Network     Architecture                  End-to-end latency (ms)
                                          Smartphone              Server
                                          1-thread   2-threads   CPU-only   CPU+GPU
AlexNet     5(CV) 3(FC) 1000(O)           450        350         100        15
GoogleNet   5(CV) 3(FC) 1000(O)           500        300         150        12
VGGNet      5(CV) 3(FC) 4(PL) 1000(O)     400        220         50         10

FC: fully-connected layer, CV: convolutional layer, PL: pooling layer, O: output (number of classes).

For the execution on the smartphone we consider two cases: a) when each layer's computation is performed with a single thread, and b) when the layers' computations use two threads. Similarly, for the execution on the server side two cases are considered: a) when the computation happens on the CPU, and b) when it happens on the GPU. Fig. 4.3a, 4.3e and 4.3c plot the measured execution times of the AlexNet, VGGNet and GoogleNet models, respectively, on different platforms and with different configurations for each platform. Moreover, Fig. 4.3b, 4.3f and 4.3d show the output/activation volume of each layer of these models. We can see in these figures how the model architecture, and in particular the layer parameters, impact the run-time performance of the models. As discussed in Section 4.4, the objective of Deep-Partition is to minimize the model's latency (feed-forward execution time). Fig. 4.4a, 4.4e and 4.4c show the partitioning results for these models. As we can see, Deep-Partition suggests a different partitioning layer for each model, corresponding to the lowest execution time for that model. In particular, Deep-Partition suggests partitioning AlexNet at layer pool5 and offloading the layers from fc6 onward; partitioning VGGNet at layer pool3 and offloading the execution from layer conv4_1 onward to the cloud; and partitioning GoogleNet at layer pool13x3_s2 and continuing the execution of the network from that layer on the cloud.
Therefore, three major factors impact the partitioning results. First, the execution time of each layer on each platform, as shown in the layer-wise breakdowns in Fig. 4.3a. Second, the communication delay between the smartphone and the cloud server: for example, whether it is a broadband 4G connection or a WiFi connection, and what the upstream bit-rate of the communication is. Third, the output volume of the layers in the model. The latter two together determine the time it takes to upload the output tensor at the edge of the partition. It is easy to see that a smaller tensor volume and a higher bit-rate lead to faster uploading and thus a better partition. As we can see, the partitioning algorithm in Deep-Partition finds the minimum execution time for all the models subject to the above three factors. In order to see the end-to-end execution time of the models, that is, the total execution time on the smartphone and the cloud including the uploading latency, we plot Fig. 4.4b, 4.4d and 4.4f. These figures show the total execution time of the models versus the partitioning layer for four different platform configurations. As we can see in these figures, Deep-Partition attains the minimum execution time for all the models.

An important and interesting observation from the measurements and the results shown in Figures 4.3 and 4.4 is that a few of the output tensors, especially those of the convolutional layers, are very voluminous. This is critical for Deep-Partition's partitioning decision, because larger tensors face longer upload delays. As we can see in Fig. 4.4, the upload time (green bars) is the dominating factor in the latency. This shifts the decision of Deep-Partition in favor of layers with smaller output tensor sizes, i.e., pooling layers or fully connected layers toward the end of the network, resulting in most of the layers being assigned to the phone and thus a larger execution time on the phone.

Figure 4.4: Partitioning results and end-to-end model latency for the representative models when Deep-Partition is applied: (a, b) AlexNet, (c, d) GoogleNet, (e, f) VGGNet (VGG16).

Therefore, in order to improve this situation, we consider two major factors.

The impact of λ, the cloud-phone communication bit-rate: Although uploading the voluminous tensors generated by the intermediate hidden layers of deep models takes a relatively long time compared with the layers' computation delays, a higher communication rate can decrease this time and reduce the dominance of the output tensor sizes. Therefore, we evaluate the (up-stream) communication data rate and its impact on the decision made by Deep-Partition. Fig. 4.5 shows the model's end-to-end latency when partitioned at each layer for different communication data rates, R, for the AlexNet model (without loss of generality, and because of space limitations, we only show this result for AlexNet). As we see, increasing the data rate not only decreases the total execution time (obviously), but also, at R = 25 Mbps, changes the partitioning result from layer fc6 to pool1.

The impact of ||K||_0, the layer output sparsity: Another important observation about deep neural models is that the intermediate features (feature maps) generated by the hidden layers are quite sparse. Higher sparsity means the layer's output admits higher compression.
Similar to the communication bit-rate, higher compression of the layers' outputs decreases the dominance of the output tensor sizes in the partitioning results. Fig. 4.6 illustrates how the partition layer shifts back to norm1/pool1 from norm2/pool2 in the AlexNet architecture.

Figure 4.5: The impact of communication bit-rate on the partitioning and model latency

Figure 4.6: The impact of feature map sparsity on the partitioning and model latency (AlexNet)

4.7 Application Use-case: Deep-Learning Based Crowd-Assisted Location Labeling System

In this section we take a mobile crowd-sourcing based application that we prototyped as a use-case of deep learning in mobile sensing applications. We describe Deep-Crowd-Label, a deep-learning based crowd-assisted location labeling system. Deep-Crowd-Label is motivated by the observation that a vast majority of location-based applications desire semantic labels for locations, e.g., "coffee shop". Past work in location-based systems has mostly focused on achieving localization accuracy, while assuming that the translation of physical locations to semantic labels will be done manually. In this section, we explore an opportunity for automatic labeling of the user's location. We propose a system called Deep-Crowd-Label that uses crowd-sensing to obtain data and builds a powerful and scalable prediction model using deep neural networks. It applies convolutional neural models to predict the semantic label of the user's location. Deep-Crowd-Label also uses the power of the crowd to aggregate the predictions of the model on the data samples associated with each location.

Prior work has shown how places can be discovered from temporal streams of user location coordinates [Chon et al., 2012]. However, if we can automatically characterize places by linking them with attributes, such as place categories (e.g., clothing store, restaurant), we can realize powerful location- and context-based scenarios [Pei et al., 2013]. Indoor maps and the semantic understanding of a place are equally crucial pieces of the bigger localization puzzle, but only recently have researchers begun to focus on these aspects. Deep-Crowd-Label aims at semantically labeling the user's location. Despite the generality of our approach, we focus on the processing of indoor locations, in which people spend a large portion of their time. In our approach we focus on location-tagged visual data, i.e., images and videos, and use deep neural networks as the core of our processing pipeline. Note that the choice of deep neural networks is due to the promising performance of the recent methods developed in this area, which outperform other data mining and machine learning methods, as mentioned earlier in this chapter. Example methods include the recent algorithms developed for object detection [Zhou et al., 2014, Sharif Razavian et al., 2014, Johnson et al., 2015] or for processing sequential information [Kelly and Knottenbelt, 2015, Pichotta and Mooney, 2016]. The advantages of the crowd-sourcing architecture in this work are twofold. First, the training data used to train the deep models are crowd-sourced (crowd-sensing). Second, not only at training time but also at inference time, the final label of the user's location is aggregated over the predictions resulting from the processing of several data samples obtained by the crowd (crowd-prediction).
In particular, Deep-Crowd-Label leverages the crowd-sensed data to build a large dataset suitable for training deep neural networks, and proposes novel techniques based on model adaptation and model extension via transfer learning that exploit pre-trained models to build several robust and accurate prediction models for semantically labeling the user's location. Since training deep neural networks requires a lot of data to prevent over-fitting [Bishop, 2001], Deep-Crowd-Label retrofits pre-trained models via novel transfer learning techniques and builds an ensemble of models to cope with the over-fitting problem. Moreover, Deep-Crowd-Label uses the power of the crowd to aggregate the individual predictions to generate the final location label.

Figure 4.7: An indoor area with semantic labels

4.7.1 Traditional Approaches to Location Labeling

The notion of semantic localization is not new. Works like SurroundSense [Azizyan et al., 2009] and SenseLock [Kim et al., 2010] utilize sensor data from smartphones to characterize the ambiance and translate it into semantic information about the user's location. The authors of [Sapiezynski et al., 2015] and [Wind et al., 2016] attempt to categorize places by training a model on WiFi, GSM and sensor data collected from frequently visited places. These approaches assume the availability of labeled ambiance data. Several other works attempt automatic place identification (e.g., home, office, gym) based on the analysis of user trajectories and the frequency and timing of visits [Krumm and Rouhana, 2013, Liu et al., 2006]. Work by [Chon et al., 2013] tries to connect the text in crowd-sensed pictures with posts in social networks to infer business names. AutoLabel [Meng et al., 2015] aims to automatically identify the name of a store by correlating the words inside the store's WiFi-tagged pictures with the keywords found on the store's website, producing a WiFi-AP-to-StoreName table. Perhaps closest to our approach is the work by [Zamir et al., 2013], which attempts to identify the stores that appear in a photo by matching it against images of the exteriors of nearby stores extracted from the web. This approach relies on conventional computer vision techniques and is neither scalable nor robust. In contrast, on the one hand, Deep-Crowd-Label does not rely on conventional computer vision techniques and instead uses deep neural networks, which have recently driven remarkable performance in computer vision and machine learning. On the other hand, Deep-Crowd-Label relies on crowd-sourcing in both the training and inference phases, which increases the generality and robustness of the system.

4.7.2 Deep Learning-based Approach

Many shortcomings of traditional sensor data processing in modeling context data can be overcome through the use of deep learning, which has been successfully applied to, e.g., image captioning [Johnson et al., 2015] and time series data [Kelly and Knottenbelt, 2015]. Such deep algorithms (e.g., CNN, RNN) learn a number of hierarchical layers of dense feature representations, which has two important benefits. First, deep neural networks do not need hand-crafted features, which allows us to deploy the models in real applications [LeCun et al., 2015]. Second, the deep features learned by deep neural networks are more generic and extremely robust, which allows us to use features learned in one domain (the source domain) to build models in another domain (the target domain), e.g., from object detection to scene understanding.
In this section we present our deep learning approach and the design of our processing pipeline.

Domain Adaptation with Pre-Trained Models

Domain adaptation aims at training a classifier in one problem space and applying it to a related but not identical problem [Zhang et al., 2015]. We adopt existing practice in DNN modeling, called pre-trained models [Jia et al., 2014], and apply it to our location labeling problem. Our "domain adaptation" in this case is limited to "label space adaptation", that is, adapting the output of the final layer without tuning the learned parameters (weights) or the internal network structure. In this approach, pre-trained models trained with an arbitrary number of class labels are used for the task of classifying images that cover only a subset of the classes identifying the location context, i.e., store types. Table 4.3 summarizes the pre-trained models used in our domain adaptation mechanism. The final layer of these DNNs commonly uses the SoftMax classifier [Bishop, 2001]. For this layer we have:

P(y = j | X_i) = \frac{e^{w_j X_i^T}}{\sum_{k=1}^{n} e^{w_k X_i^T}}   (4.2)

where X_i is the feature vector extracted by the deep neural network for input sample i (a single captured image), w_j is the weight vector learned by the neural network, and y is the predicted class label, with j ∈ N, the set of all class labels the pre-trained model is trained on (the source domain). For example, the size of the label space, |N|, for the original AlexNet [Krizhevsky et al., 2012] trained on ImageNet [Deng et al., 2009] is 1000 labels. In order to adapt such a pre-trained model to the task of interest (the target domain), we follow the Bayesian chain rule [Bishop, 2001] and apply the prior knowledge specific to the target space to the model prediction, and thus we have:

P_s(y = j | X_i) = \frac{1(j \in L) \cdot P(y = j | X_i)}{\sum_{l \in L} P(y = l | X_i)}   (4.3)

where 1(·) is the indicator function and L is the label set of the application the pre-trained model is adapted to (the target domain). The denominator is the normalization factor, and thus P_s(y = j | X_i) indicates the probability of a class (label) given the feature vector X_i for the application-specific labels j ∈ L. Therefore, given a pre-trained model M with label space N in the source domain, and the loss function shown in Equation 4.2, our domain adaptation approach adapts the model M to the target application with label space L ⊂ N using Equation 4.3. Fig. 4.8 illustrates the layout of our pipeline; the adaptation layer in this figure implements Equation 4.3.

Figure 4.8: Our model adaptation schema and ensemble of adapted deep neural models. Left/Green: several deep neural models, pre-trained or extended using transfer learning. Middle/Red: the adaptation layer. Right/Blue: the aggregation layer.

Model Extension with Transfer Learning

In practice, very few people train an entire deep neural network from scratch, because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pre-train a deep neural net on a very large standard dataset, e.g., ImageNet, which contains 1.2 million images with 1000 categories, or the Places dataset, with 500K images and 205 categories, and then use the resulting model (the pre-trained model) either as an initialization or as a fixed feature extractor for the task of interest, e.g., location-context/scene understanding. In the previous section, we described how these models are adapted to our application without further training.
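As a concrete illustration of the adaptation layer, the renormalization of Equation 4.3 amounts to a few lines of code. The sketch below is a minimal illustration with hypothetical names, not the system's implementation.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Minimal sketch of the label-space adaptation layer of Eq. (4.3):
 *  keep only the source-model probabilities whose labels fall in the
 *  target label set L, then renormalize. */
public class LabelSpaceAdaptation {

    /** sourceProbs: P(y = j | Xi) over the source label space N;
     *  targetLabels: the application's label set L (a subset of N). */
    static Map<String, Double> adapt(Map<String, Double> sourceProbs,
                                     Set<String> targetLabels) {
        double z = 0.0; // normalization factor (denominator of Eq. 4.3)
        for (String label : targetLabels) {
            z += sourceProbs.getOrDefault(label, 0.0);
        }
        Map<String, Double> adapted = new HashMap<>();
        for (String label : targetLabels) {
            adapted.put(label,
                    z > 0.0 ? sourceProbs.getOrDefault(label, 0.0) / z : 0.0);
        }
        return adapted;
    }
}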
This approach works well for cases where the target labels are a subset of the original labels, L ⊂ N. However, we found that, in our problem, none of the original models covers the entire label space. Therefore, we have two cases: a) there are class labels in L that do not have any high-level representation in the pre-trained model's label space N, e.g., "computer store" is missing from the ImageNet label space; and b) there are class labels in L that match more than one class label in N, e.g., "shoe-shop" has multiple corresponding categories in the AlexNet-ImageNet model (shoe, loafer shoe or sport shoe). To address this issue we propose a "transfer learning" [Pan and Yang, 2010] approach. In this approach we keep the feature extractor layers (the convolutional layers) in the pre-trained model frozen by setting the learning rate of these layers to zero. The last fully connected layer, instead, is initialized with random weights and is trained with the data collected and labeled by us. This allows us to use the feature extractors trained in the pre-trained models while using our collected data to train the final fully connected layers to cover the entire label space L. This enables us to train a deep model with a limited amount of training data without over-fitting.

Table 4.3: Models built in Deep-Crowd-Label via model adaptation (MA, Sec. 4.7.2) and model extension (ME, Sec. 4.7.2).

DNN Model                  Architecture             Dataset (# of classes / total size)                                                  Method
imagenet-alexnet           5(CV), 3(FC), 1000(O)    ImageNet (1000/1.2M)                                                                 MA
places-alexnet             5(CV), 3(FC), 205(O)     Places (205/2.5M)                                                                    MA
places-hybrid              5(CV), 3(FC), 1183(O)    ImageNet (978) + Places (205)                                                        MA
places-googleNet           59(CV), 5(FC), 205(O)    Places (205/2.5M)                                                                    MA
shops-alexnet              5(CV), 3(FC), 26(O)      Places205 ⟹ shops from Places + SUN397 dataset (205/2.5M)                            ME
indoor67-alexnet           5(CV), 3(FC), 67(O)      Indoor 67 (67/15.6K)                                                                 MA
indoorshops-alexnet        5(CV), 3(FC), 26(O)      ImageNet ⟹ indoor shops from SUN + Places ⟹ data collected by ourselves (15/1.5k)    ME
imagenet-stores-alexnet    5(CV), 3(FC), 9(O)       ImageNet ⟹ indoor shops from ImageNet (9/10k)                                        ME

FC: fully-connected layer, CV: convolutional layer, O: output (number of classes). ⟹ indicates the direction of transfer learning: base model ⟹ new model.

Model Ensemble

To further improve the accuracy and increase the robustness of our final label prediction model (for a single image), we use an ensemble of models, as illustrated in Fig. 4.8. Our ensemble model is simply the weighted average of the prediction probabilities of the individual models, each either adapted or extended by the methods explained earlier (cf. Sec. 4.7.2). Table 4.3 summarizes these models.

4.7.3 Training

We train our network with a combination of available public datasets and our own data collected as video frames and still images. Training is performed when the Model Extension (ME) approach (cf. Sec. 4.7.2) is used. Each model in this approach is trained independently, regardless of which pre-trained model is chosen as the base model. In the training process, the convolutional layers are taken from the convolutional layers of the base model, e.g., [Jia et al., 2014, Krizhevsky et al., 2012]. We pass the output of these convolutional layers (i.e., the pool5 features) into a single feature vector. This vector is the input to the fully connected layers, which are taken from the base model and trained on our dataset.
Finally, we redefine the last layer (i.e., the softmax layer) to have a number of outputs equal to the number of classes in our label space. During the training procedure the learning rate is set to zero for the convolutional layers. This is because we do not have enough data to train these layers; freezing them (learning rate = 0) prevents our model from over-fitting to the training data while still taking advantage of the pre-trained model as a feature extractor. The learning rates for the fully connected layers are taken from the defaults of the base layers, while the learning rate for the output layer is set to 10 times the maximum of the learning rates of the fully connected layers. This is because the output layer is defined specifically for our task; unlike the other fully connected layers, it has no corresponding layer in the base model and its weights are initialized randomly, which requires this layer to be trained faster than the other fully connected layers. Besides the learning rates and the structure of the output layer, the other network hyper-parameters are taken from the base models. In our model we use ReLU [Nair and Hinton, 2010] as the activation function in each fully-connected layer and dropout [Srivastava et al., 2014] after each one, as in the base models. Our neural network is trained using Caffe [Jia et al., 2014].

4.7.4 Labeling and Aggregation by Crowd-Sourcing

The last stage of our location-labeling pipeline is to aggregate the prediction results of the individual images associated with one location, k. We do not need to perform this step at training time but only at inference time, that is, when the trained model is used to label the collection of images from a location. Let P_I(y = j | X_i) be the prediction probability of classifying an image feature vector X_i using our ensemble of deep neural network models (cf. Section 4.7.2), and let Γ_k indicate the set of images collected for location k. Then we have:

P_Γ(y = j | Γ_k) = \frac{1}{|Γ_k|} \sum_{X_i \in Γ_k} P_I(y = j | X_i)

where P_Γ(y = j | Γ_k) is the aggregated prediction for location k. The system labels each location by picking the label with the maximum prediction probability; in other words:

label_k = \arg\max_{j} P_Γ(y = j | Γ_k)   (4.4)

4.7.5 Data Collection and Dataset Preparation

Data collection: We have collected data from 26 different indoor locations, mostly shops in malls and supermarkets. The data is collected using a smart-watch and a smart-phone in the form of videos and still images. The videos are converted to frames. It is important to remove very similar frames from the training data to prevent bias in the model; therefore we only extract the key-frames from each video using FFMPEG. Moreover, 80% of the data is used for training and 20% is used for inference (labeling). Since having a balanced dataset is crucial in the training phase, the number of images per class is kept balanced in the training set. This is not necessary in the inference phase.

Automatic rotation and noise reduction: During data collection we observed that the camera API on each device rotates the captured image arbitrarily. Since our deep neural network models are not rotation-invariant, we use the camera information in each image's Exif metadata to rotate the image to the right orientation automatically. In addition, we apply a standard denoising method to improve the quality of the collected images. Images whose blurriness measure is above a certain threshold are not used in the training data.
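The rotation step can be done with Android's standard Exif support; a minimal sketch (ours, not the system's code) follows.

import android.graphics.Bitmap;
import android.graphics.Matrix;
import android.media.ExifInterface;
import java.io.IOException;

/** Minimal sketch of the automatic rotation step: read the Exif
 *  orientation tag written by the camera API and rotate the bitmap
 *  upright before it enters the training set. */
public class ExifRotation {

    static Bitmap rotateUpright(Bitmap bitmap, String imagePath) throws IOException {
        ExifInterface exif = new ExifInterface(imagePath);
        int orientation = exif.getAttributeInt(
                ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL);
        int degrees;
        switch (orientation) {
            case ExifInterface.ORIENTATION_ROTATE_90:  degrees = 90;  break;
            case ExifInterface.ORIENTATION_ROTATE_180: degrees = 180; break;
            case ExifInterface.ORIENTATION_ROTATE_270: degrees = 270; break;
            default: return bitmap; // already upright, nothing to do
        }
        Matrix matrix = new Matrix();
        matrix.postRotate(degrees);
        return Bitmap.createBitmap(bitmap, 0, 0,
                bitmap.getWidth(), bitmap.getHeight(), matrix, true);
    }
}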
Data augmentation: Deep neural networks require a lot of data to train. The easiest and most common method to cope with data scarcity is to artificially enlarge the dataset, a.k.a. data augmentation. Following the technique in [Krizhevsky et al., 2012], we augment our dataset by extracting random 224 × 224 patches (and their horizontal reflections) from the 256 × 256 images and training our network on these extracted patches. This increases the size of our training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent [Krizhevsky et al., 2012]. Without this scheme, our network suffers from substantial over-fitting even with transfer learning (cf. Sec. 4.7.2).

4.7.6 Evaluation

Our proposed method is applied to the real data collected from 26 stores. Fig. 4.9 shows the prediction results of our pipeline when it labels a single image. The figure shows the top-5 prediction results for each image, with the confidence values shown in the bar chart in increasing order. Moreover, Tables 4.4(a)-(f) show the results of the aggregated predictions for 6 different indoor locations (5 different kinds of stores and 1 food court) in a shopping mall. For each location the ground-truth label is given in the top row and the top-5 prediction results are reported in descending order of confidence. Although the confidence values differ, the results show a clear gap between the top-1 prediction and the other four, verifying the applicability and generality of our method in predicting the right label for each location. Moreover, as we can see in Fig. 4.9, there are several examples where even the top-1 prediction on a single image is not correct (a false positive), while this is not the case when the results are aggregated by crowd-sourcing, as shown in Tables 4.4(a)-(f). These results show the effectiveness of aggregating with crowd-sensing (crowd-prediction) in improving prediction accuracy.

Figure 4.9: Predictions on real samples collected from indoor shops. Bars below each image show the top-5 model predictions using our deep learning method, sorted in ascending order.

Table 4.4: Location labeling results. Each table represents one store with its name and ground-truth type (top row). Top-5 prediction results with confidence values (prediction probabilities) are presented in each row. Each prediction is the aggregated result of the crowd-sensed images for each store (Sec. 4.7.4).
(a) Safeway (supermarket)
68.52%  supermarket
6.31%   cottage-garden
5.06%   crevasse
4.89%   valley
4.81%   mountain

(b) Macy's (clothing-store)
35.71%  clothing-store
5.75%   gift-shop
3.99%   staircase
3.81%   shoe-shop
3.13%   beauty-salon

(c) Disney Store (gift-shop)
52.60%  gift-shop
11.97%  candy-store
5.18%   market
3.06%   game-room
2.42%   supermarket

(d) Apple Store (computer store)
16.47%  computer store
6.42%   food-court
6.38%   art-gallery
5.49%   cafeteria
5.26%   art-studio

(e) Zara (clothing-store)
40.40%  clothing-store
5.20%   garbage-dump
3.98%   slum
2.91%   excavation
2.83%   railroad-track

(f) DSW (shoe-shop)
19.15%  bookstore
11.39%  airport-terminal
7.32%   shoe-shop
6.32%   supermarket
4.86%   clothing-store

4.8 Summary

Deep neural models are both computationally and memory intensive, making them difficult to deploy in mobile applications with limited hardware resources. In this chapter we presented Deep-Partition, an optimization-based partitioning pipeline featuring a tiered architecture spanning the smartphone and the back-end cloud to deploy and execute deep neural models more efficiently. Deep-Partition provides profile-based model partitioning, allowing it to intelligently dispatch the processing tasks among the tiers to minimize the smartphone's power consumption and the deep model's feed-forward latency. Extensive microbenchmark evaluation and three case studies on representative deep neural models validate the performance gain of Deep-Partition.

In addition, this chapter presented Deep-Crowd-Label, a novel system for semantically labeling the user's location. Deep-Crowd-Label is a crowd-assisted system that uses crowd-sourcing at both training and inference time. It builds deep convolutional neural models using crowd-sensed images to infer the context (label) of indoor locations. It features domain adaptation and model extension via transfer learning to efficiently build deep models for image labeling. By fully exploiting pre-trained models and available datasets, Deep-Crowd-Label builds an ensemble of models to increase the robustness and improve the accuracy of prediction. Moreover, Deep-Crowd-Label aggregates the several individual predictions on images obtained from the same location to infer the contextual label of that location. We also provide layer-wise benchmarks of our deep models and apply novel compression techniques to the trained models to facilitate the deployment of deep neural networks on smartphones. The prototyped system and the preliminary experiments on 26 different stores show the high accuracy of the model and demonstrate the generality and robustness of the underlying approach. Future plans include extending the model to more diverse types of locations as well as improving on-device performance. In addition, we plan to merge our ensemble of models into one unified deep neural network by exploiting the shared parts of the models.

Chapter 5

Conclusion

Supported by advanced sensing capabilities, increasing computational resources and advances in Artificial Intelligence, smartphones have become our virtual companions in daily life. An average modern smartphone is capable of handling a wide range of tasks including navigation, advanced image processing, speech processing and cross-app data processing. The key facet common to all of these applications is data-intensive computation. In this dissertation we have taken steps towards realizing the vision that makes the smartphone truly a platform for data-intensive computations by proposing frameworks, applications and algorithmic solutions. We followed a data-driven approach to system design. To this end, several challenges must be addressed before smartphones can be used as a system platform for data-intensive applications. The major challenges addressed in this dissertation include high power consumption, the high computation cost of advanced machine learning algorithms, lack of real-time functionality, lack of embedded programming support, heterogeneity in apps, communication interfaces and programming abstractions, and lack of customized data processing libraries. The contributions of this dissertation can be summarized as follows.
We presented the design, implementation and evaluation of the ORBIT framework, which represents the first system that combines the design requirements of a machine learning system and a sensing system at the same time. We ported, for the first time, off-the-shelf machine learning algorithms for real-time sensor data processing to smartphone devices. In this process we considered the power and memory limitations of smartphones, and for each algorithm we provided two versions: a light and a heavy version. This is a leap forward from previous approaches, which relied on custom-designed sensing and computing platforms. We highlighted how machine learning on smartphones comes with severe costs that need to be mitigated in order to make smartphones capable of real-time data-intensive processing. Some of these costs can be managed by an adaptive re-design of the off-the-shelf processing pipeline, with additional real-time hyper-parameter controls that trade the precision of the pipeline against its computation cost with respect to the smartphone's available resources, in terms of battery duration. We showed that some of the limitations imposed by a mobile sensing application can be overcome with a multi-tier framework that splits the computation pipeline between the smartphone and two other tiers, namely an extension board and the cloud, by identifying the bottlenecks in the computation graph. We showed that computation blocks can be adapted at execution time, leading to further improvements in resource consumption while maintaining the algorithm's accuracy and yet shortening the computation time. We reported on our experience deploying ORBIT at scale with a few case studies as well as multiple deployments on active volcanoes in Ecuador and Chile.

We extended the scope of our work from platforms to applications and presented SPOT. SPOT aims to address some of the challenges discovered in mobile-based smart-home systems. These challenges prevent us from achieving the promise of smart-homes due to heterogeneity in different aspects of smart devices and the underlying systems. This owes to the lack of dominant standards in smart-home technologies, leading to fragmented digital homes rather than truly smart homes. We face the following major heterogeneities in building smart-homes: (i) diverse appliance control apps, (ii) communication interfaces, and (iii) programming abstractions. SPOT is an enabling technology for smart-home systems that makes the integration of heterogeneous smart devices seamless by proposing a novel dynamic driver loading schema. SPOT introduces two driver models, namely XML-based and library-based, making the integration and manipulation of smart devices easy for both programmers and users. SPOT makes the heterogeneous characteristics of smart appliances transparent, and thereby minimizes the burden of home automation application developers and the efforts of users who would otherwise have to deal with appliance-specific apps and control interfaces. SPOT is evaluated through several benchmarks and three case studies: cross-device programming, central usage analytics and residential energy management via demand response commands. Our evaluation demonstrates the generality of SPOT's design and its driver model.
After discussing two aspects of this dissertation, namely the framework and the application, we finally presented the algorithmic aspect of the dissertation by introducing two systems in the smartphone-based deep learning area: Deep-Crowd-Label and Deep-Partition. Deep neural models are both computationally and memory intensive, making them difficult to deploy on mobile applications with limited hardware resources. On the other hand, they are the most advanced machine learning algorithms suitable for real-time sensing applications used in the wild. Deep-Partition is an optimization-based partitioning meta-algorithm featuring a tiered architecture for the smartphone and the back-end cloud, which helps to deploy and execute deep neural models more efficiently. Deep-Partition provides profile-based model partitioning, allowing it to intelligently execute the deep learning algorithms among the tiers to minimize the smartphone's power consumption by minimizing the deep model's feed-forward latency. Extensive microbenchmark evaluation and three case studies on representative deep neural models validate the performance gain of Deep-Partition. In addition, we presented Deep-Crowd-Label, a novel algorithm designed for distributed collaborative smartphone systems for crowd-sourcing applications. Deep-Crowd-Label is prototyped for semantically labeling the user's location. Deep-Crowd-Label is a crowd-assisted algorithm that uses crowd-sourcing at both training and inference time. It builds deep convolutional neural models using crowd-sensed images to detect the context (label) of indoor locations. It features domain adaptation and model extension via transfer learning to efficiently build deep models for image labeling. By fully exploiting pre-trained models and available datasets, Deep-Crowd-Label builds an ensemble of models to increase the robustness and improve the accuracy of prediction. Moreover, Deep-Crowd-Label aggregates several individual predictions on images obtained from the same location to infer the contextual label of a location. The prototyped system and the preliminary experiments on 26 different indoor locations show the high accuracy of the model and demonstrate the generality and robustness of the underlying approach.

The work presented in this dissertation covers three major facets of data-driven and compute-intensive smartphone-based systems: platforms, applications and algorithms. It helps to spur a new area of research on smartphone sensing and opens up new directions in mobile computing research.

BIBLIOGRAPHY

[Hom, ] Control your home. http://www.homeseer.com/. [Online; accessed 2-Nov-2014].

[DEX, ] Dalvik executable format. https://source.android.com/devices/tech/dalvik/dex-format.html. [Online; accessed 03-Apr-2015].

[sys, ] Display and performance analysis in android apps. http://developer.android.com/tools/debugging/systrace.html. [Online; accessed 06-Apr-2015].

[ELK, ] Elk products. http://www.elkproducts.com/security-automation-connection. [Online; accessed 2-Nov-2014].

[fas, ] Fast demand response (white paper). https://www.parc.com/content/attachments/energy_fastdemandresponse_wp_parc.pdf. [Online; accessed 06-Apr-2015].

[GE, ] Ge brillion appliances. http://www.geappliances.com/connected-home-smart-appliances/. [Online; accessed 2-Nov-2014].

[Gre, ] Green button data. http://www.greenbuttondata.org/. [Online; accessed 4-Nov-2014].

[Con, ] Home automation and control. http://www.control4.com. [Online; accessed 2-Nov-2014].
The work presented in this dissertation covers three major facets of data-driven and compute-intensive smartphone-based systems: platforms, applications and algorithms. It helps to spur a new area of research on smartphone sensing and opens up new directions in mobile computing research.