LOW-POWER ARTIFICIAL INTELLIGENCE OF THINGS (AIOT) SYSTEMS

By

Li Liu

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computer Science—Doctor of Philosophy

2023

ABSTRACT

Internet-of-Things (IoT) is another excellent innovation after the Internet and mobile networks in

the information era, aiming at connecting billions of end-devices across scales. A multitude of

IoT applications often operate under conditions of constrained energy resources, which has ren-

dered low-power IoT systems a subject of considerable research interest. The increasing need

for AI in complex scenario-based composite tasks has led to the rise of Artificial Intelligence of
Things (AIoT), which encompasses research in two major directions: AI for IoT, which solves problems
in IoT systems with AI techniques, and IoT for AI, which adopts IoT infrastructure/data to advance

the development of AI models. While AIoT systems in low-power scenarios offer significant bene-

fits, they also face specific challenges that are inherent to their design and operational requirements.

This dissertation delves into low-power AIoT from both angles. 1) We endeavor to harness

the capabilities of AI to predict and analyse the communication channels of dynamic long links

in LoRaWAN, which is one of the Low-Power Wide-Area Networks (LPWANs). DeepLoRa adopts
deep neural networks based on bidirectional LSTM (Long Short-Term Memory) to capture the
sequential influence of the environment on LoRa link performance for accurate LoRa

link path-loss estimation. It reduces the path-loss estimation error to less than 4 dB, which is 2×

smaller than state-of-the-art models. LoSee extends the contributions of DeepLoRa. It measures

the real-world fine-grained performance, including detailed coverage study and feasibility analysis

of fingerprint-based localization, of a self-deployed LoRaWAN system with temporal dynamics

and spatial dynamics. 2) We design energy-efficient IoT systems that facilitate the deployment

of AI models for practical applications. FaceTouch enables accurate face touch detection with a

multimodal wearable system consisting of an inertial sensor on the wrist and a novel vibration

sensor on the finger. We leverage a cascading classification model, including simple filters and a

DNN, to significantly extend the battery life while keeping a high recall. FaceTouch achieves a

93.5% F-1 score and can continuously detect face-touch events for 79 – 273 days using a small 400

mWh battery depending on usage.

In general, this dissertation studies both theoretical and practical aspects in the field of low-

power AIoT systems, including LoRaWAN link behavior analysis and building practical wearable

systems. These advancements not only underscore the feasibility of deploying low-power AIoT in

real-world settings but also pave the way for future research and development in this domain, aiming

to bridge the gap between IoT and AI for the creation of smarter, sustainable, and more efficient

technologies.

Copyright by
LI LIU
2023

ACKNOWLEDGEMENTS

I am profoundly grateful for the guidance, support, and opportunities provided to me during my

journey as a Ph.D. candidate.

First and foremost, I would like to extend my deepest appreciation to Dr. Yunhao Liu, who

not only gave me the chance to embark on this academic voyage but also led me into the enriching

world of scientific research in the AIoT area.

My sincere gratitude goes to Dr. Zhichao Cao, my advisor during my Ph.D. program. His

unwavering support, guidance, and patient tolerance have greatly facilitated my scientific pursuits.

His mentorship has been a beacon of light, guiding me through the challenges and triumphs of

academic research.

I am also deeply thankful to my graduate committee members, Dr. Li Xiao, Dr. Mi Zhang,

and Dr. Tianxing Li. They have witnessed the milestones throughout my Ph.D. program. Their

presence and input have been invaluable to my academic and personal growth.

To my parents, who have always been my sanctuary, my haven of warmth, and my steadfast

support, my gratitude is boundless. They are the cornerstone of my strength, providing me with

love and encouragement every step of the way. I wish them happiness, health, and consistent pride

in me.

I owe a debt of gratitude to Manni Liu, my senior colleague and dearest sister without blood,

who has been my comrade in this journey. Together, we have faced challenges and setbacks, forging

an unbreakable bond in the pursuit of our dreams in life and scientific research. Our friendship is

eternal, and we’ll always be on the same side.

Special thanks to my best friend, Maolin Gan, who has been a source of joy, hope, and motiva-

tion in my life. Time is the touchstone of our friendship. Let’s continue to be there for each other,

find happiness in life, and shine together in our shared career path.

I also want to acknowledge my other friends who have generously supported and helped me

along this journey. Their belief in me has made me stronger and more determined to move forward,

reminding me that I deserve the best and can overcome any obstacle.

Lastly, but certainly not least, I want to thank myself for the courage to embrace the unknown

and the perseverance to withstand the ups and downs. I hope to always believe in myself, love my

career, cherish the people I meet in life, and continue to make further progress throughout the long

journey of life.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
1.1 Conducted Studies and Proposed Techniques
1.2 Low-power And Accurate Face Touch Detection
1.3 Organization

CHAPTER 2 DEEPLORA: LEARNING ACCURATE PATH LOSS MODEL FOR LONG DISTANCE LINKS IN LPWAN
2.1 Introduction
2.2 Related Work
2.3 Preliminary and Empirical Study
2.4 System Design
2.5 Experiment Details
2.6 Evaluation
2.7 Conclusion

CHAPTER 3 IS LORAWAN REALLY WIDE? FINE-GRAINED LORA LINK-LEVEL MEASUREMENT IN AN URBAN ENVIRONMENT
3.1 Introduction
3.2 Related Work
3.3 System and Dataset Overview
3.4 Link Behavior Study
3.5 Coverage Area Study
3.6 Localization Accuracy Study
3.7 Observations, Insight, and Discussion

CHAPTER 4 FACETOUCH: PRACTICAL FACE TOUCH DETECTION WITH A MULTIMODAL WEARABLE SYSTEM FOR EPIDEMIOLOGICAL SURVEILLANCE
4.1 Introduction
4.2 System Overview
4.3 Vibration-based Surface Touching Classification
4.4 Wrist-IMU based Face-Touch Gesture Detection
4.5 Implementation
4.6 Evaluation
4.7 Discussion and Future Work
4.8 Related Work
4.9 Conclusion

CHAPTER 5 CONCLUSION

BIBLIOGRAPHY

CHAPTER 1

INTRODUCTION

Internet-of-Things (IoT) is another excellent innovation after the Internet and mobile networks in the

information era, aiming to connect billions of end-devices across scales, among which many devices

operate in an energy-constrained, unattended manner. The development of low-power IoT systems

has thus become a focal point of research, driven by the need to efficiently manage energy while

maintaining functionality. The integration of AI with IoT has given rise to the concept of AIoT,

a field that has become increasingly relevant in complex composite tasks. This integration

brings both opportunities and challenges to low-power IoT systems. AIoT research branches into

two primary directions: 1) AI for IoT: this direction focuses on applying AI techniques to enhance
IoT systems. AI algorithms are used to process and analyze the vast amounts of data generated
by IoT devices, leading to more efficient operations, predictive maintenance, intelligent decision-making,
and energy consumption optimization. 2) IoT for AI: conversely, this approach utilizes

the infrastructure and data generated by IoT systems to improve and advance AI models. The real-

world data collected by IoT devices provide a rich, diverse, and often real-time dataset for training

and refining AI algorithms. This collaboration is crucial in developing more accurate and robust

AI models that can adapt to various scenarios and environments.

In the realm of AI for IoT, my research focuses on harnessing the power of AI for better mea-

surement of network conditions and optimization of network deployment for low-power IoT sys-

tems. LoRa (Long Range) is an emerging technique that enables long-distance communication

while keeping power consumption low. It supports IoT applications in many large-scale environments
where various types of land covers usually exist. However, due to the high cost of densely
deploying end nodes, the understanding of the LoRa link channel is still coarse-grained. Numerous
empirical studies [1–7] have been conducted to measure LoRa coverage ranges. The specific range

varies with experiment environments. Also, these works are not able to address the link dynamics,

fine-grained networking coverage, and localization accuracy of LoRaWAN. Besides, the complexity
of the environment makes it challenging to build an accurate LoRa link model for further analysis
of the channel and related applications. There are link models that integrate environmental
factors [1, 8–11], but they do not fully utilize the fine-grained environment information. Also, they
cannot transfer well to new environments due to fixed environment modeling. In this dissertation,

we design DeepLoRa that adopts deep learning for accurate path loss estimation of long-distance

LoRa links using environment information. DeepLoRa achieves an estimation error of less than 4 dB,
which is 2× smaller than that of state-of-the-art models, and is more transferable because it uses raw
environment data with less information loss and highly generalizable RNN models. We also conduct a

fine-grained link-level measurement in LoSee that shows spatial-temporal link dynamics, coverage,

and link-information-based localization of LoRaWAN in an urban area which can benefit the de-

ployment of LoRa gateways, service quality in mobile applications, and network management in

practice.

Conversely, in the realm of IoT for AI, I design energy-efficient IoT systems that enable AI

for real-world applications. Face touch is an unconscious and high-frequency behavior that most

of us have [12–15]. Amidst the global pandemic of COVID-19, face-touch detection becomes

imperative for reducing epidemiological risk [16–19]. Prior work has investigated a variety of

emerging sensing techniques [20–29] to measure the distance between the hand and the face to

detect face-touch events, or adopts on-body sensors to extract features from the movement of hands

to classify face-touch gestures [30–34]. However, these approaches either confuse face touches with
similar gestures or drain the battery quickly. Therefore, there is a significant need for an accurate and

low-power face-touch monitoring system. In this dissertation, we propose FaceTouch, a low-power

and versatile method that enables accurate face touch detection with a multimodal wearable system

and AI model. FaceTouch achieves a 93.5% F-1 score of face touch detection with low power

consumption.

1.1 Conducted Studies and Proposed Techniques

1.1.1 DNN-based LoRa Link Model

As LoRa enables long-distance communication in diverse IoT applications in lots of environ-

ments where various types of land covers usually exist, it is challenging to conduct thorough field

measurements on a large scale or precisely predict a LoRa link’s path loss. A few models integrate
environmental factors to reflect the different rates at which path loss increases [1, 8, 9],

but those models only adopt regional environment information for prediction. State-of-the-art mod-

els [10, 11] adopt remote sensing techniques to analyze the composition of land covers along LoRa

links quantitatively and select the corresponding empirical model to use for prediction, which does

not fully utilize the fine-grained environment information. Also, they cannot transfer well to new

environments due to fixed environment modeling.

In Chapter 2, we present DeepLoRa. DeepLoRa adopts a deep learning-based approach to ac-

curately estimate the path loss of long-distance LoRa links in complex environments. Specifically,

DeepLoRa relies on remote sensing to automatically recognize land-cover types along a LoRa link.

Then, DeepLoRa utilizes Bi-LSTM (Bidirectional Long Short Term Memory) to develop a land-

cover aware path loss model.

We implement DeepLoRa and use the data gathered from a real LoRaWAN deployment on

campus to evaluate its performance extensively in terms of estimation accuracy and model trans-

ferability. The results show that DeepLoRa reduces the estimation error to less than 4 dB, which is

2× smaller than state-of-the-art models.

1.1.2 Fine-grained LoRa link-level Measurement

In Chapter 3, we further present LoSee which shows a fine-grained LoRa link-level measure-

ment via a self-deployed LoRaWAN system consisting of 2 gateways and 6 mobile end nodes in

a 6 × 6 km² urban area. Through this measurement, LoSee studies three fundamental research issues

and draws the following conclusions: 1) The spatial and temporal behavior of LoRa links is quite

dynamic due to environmental factors; 2) The coverage of LoRa gateways is anisotropic; 3) The median
error of RSSI-fingerprint-based localization in the given setting is about 400 m. Without densely

deployed LoRa gateways, the SOTA LoRa localization can support road-level localization.

1.2 Low-power And Accurate Face Touch Detection

During the COVID-19 pandemic, protecting ourselves from virus infection has been of vital

importance. There are many studies conducted in both IoT and AI to provide health management

applications, and face-touch is one of them. The mainstream automatic face-touch monitoring is

currently performed by recognizing face-touching gestures using emerging sensing techniques and

wireless signals, including acoustic [21–23, 35–38], radio frequency signals [24–26, 39], and mag-

netic signals [27,28], to measure the distance between the hand and the face and recognize potential

hand-to-face gestures. On-body sensors like inertial sensors have also been investigated to extract

features from the movement of hands to classify face-touch gestures [30–34]. However, many sim-

ilar gestures (e.g., picking up the phone, wearing a hat, or adjusting eyeglasses) can significantly

degrade their system performance and generate lots of false alarms, causing unnecessary panic

and/or bringing medical resources to a place where they are not needed. To filter out these false-

positive gestures, a recent work leverages sensors in the ear to accurately detect touch events [29].

However, since it relies on always-on sensing and signal processing to guarantee high recall, the

battery life is extremely limited (e.g., the system requires multiple charging times per day), increas-

ing the user burdens and degrading the user experience. Therefore, there is a significant need for

an accurate and low-power face-touch monitoring system.

To fill the gaps, in Chapter 4, we propose to leverage the wrist inertial sensor to detect the face-

touch gesture that the hand moves towards the face area and utilize the channel response of chirp

vibration signal propagating through the human body to detect events that touch the skin to com-

pensate for the ambiguity of gesture classification. To achieve this goal in a computation-efficient

manner, we develop a cascading classification model including three classifiers (one of which is a

DNN) to filter out irrelevant gestures to significantly extend the battery life while keeping a high re-

call. Once a face-touch gesture is triggered, we activate the vibration sensor to detect touch events.

We implement FaceTouch using commercial off-the-shelf hardware components and evaluate its

performance with various user activities and false-positive behaviors. FaceTouch achieves a 93.5%

F-1 score of face touch detection. The entire system consumes only 60.89 μW of power on average in
normal daily usage and 209.15 μW in extremely heavy usage, which is several orders of magnitude lower

than the state-of-the-art systems, and FaceTouch can continuously detect face-touch events for 79

– 273 days using a small 400 mWh battery depending on usage.
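As a rough sanity check on these figures, the battery-life range can be reproduced with a back-of-the-envelope calculation. The sketch below assumes an ideal battery (no self-discharge, conversion loss, or capacity derating) and a constant average power draw; the function name `battery_life_days` is illustrative only.

```python
def battery_life_days(capacity_mwh: float, avg_power_uw: float) -> float:
    """Estimate days of continuous operation under an ideal-battery assumption."""
    capacity_uwh = capacity_mwh * 1000.0   # convert mWh to uWh
    hours = capacity_uwh / avg_power_uw    # energy / power = time in hours
    return hours / 24.0


# Average power figures quoted in the text for FaceTouch
normal_days = battery_life_days(400.0, 60.89)   # normal daily usage
heavy_days = battery_life_days(400.0, 209.15)   # extremely heavy usage

print(f"normal usage: {normal_days:.1f} days")
print(f"heavy usage:  {heavy_days:.1f} days")
```

Under these assumptions the two estimates come out at roughly 274 and 80 days, in line with the 79 – 273 day range reported above.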

1.3 Organization

The remainder of this dissertation is organized as follows. In Chapter 2, we discuss the LoRa link model
that exploits fine-grained environment information; in Chapter 3, we show LoRa link-level measurement
results that address fundamental issues for deploying LoRa in the real world; in Chapter 4, we propose
a low-power system that detects face-touch events utilizing vibration chirp signals integrated with
IMU data; in Chapter 5, we conclude this dissertation.

CHAPTER 2

DEEPLORA: LEARNING ACCURATE PATH LOSS MODEL FOR LONG DISTANCE
LINKS IN LPWAN

2.1 Introduction

The development of the Internet of Things (IoT) has brought broader applications, a growing number
of IoT devices, and expanding network sizes. In many scenarios (e.g., agriculture, industry,
logistics, city, home, healthcare), a large number of unattended IoT devices are deployed, sending
small volumes of data periodically or sporadically, and are expected to keep working for years

given limited energy budget. To simultaneously fulfill all these requirements, various short-range

and low-power wireless techniques (e.g., BLE, 802.15.4) have been widely adopted in body-area

and local-area IoT. To extend the scale of these networks, ad-hoc architecture (e.g., wireless sensor

networks [40–42]) has been further utilized in the past decades, but suffers from dramatically increasing
deployment and maintenance costs as the network scale grows. To bridge this gap, long-distance
and low-power wireless techniques (e.g., LoRa, Sigfox, NB-IoT) have recently emerged to

enable LPWANs (Low-Power Wide Area Networks). Due to low-cost COTS radio/gateway (e.g.,

Semtech) and open-source development (e.g., LoRa Alliance), LoRa is gaining popularity in both

industry and academia [1, 43, 44]. One of the most popular LPWANs built on LoRa is
LoRaWAN.

LoRa operates in license-free frequency bands, so LoRaWAN avoids the cost of subscribing to

any telecommunication operator. LoRaWAN consists of end nodes equipped with LoRa radio and

LoRa gateways. An end node directly connects with several LoRa gateways in its communication

range. The LoRa physical layer adopts chirp spread spectrum (CSS) modulation [45] to enable data
packet reception under a low signal-to-noise ratio (SNR) (e.g., -20 dB) while keeping power
consumption low (e.g., 400 mW when transmitting at 20 dBm, 5 μW in idle mode) owing to its low duty cycle and

narrow bandwidth. Moreover, COTS LoRa radio and gateway usually have a high signal sensitivity

to receive the potential weak signals. Hence, LoRa obtains large link budget [1, 7, 46] which ac-

counts for its high maximum feasible power loss along the signal propagation between an end node

and a gateway. The sufficient link budget is capable of providing reliable coverage spanning from

several kilometers to tens of kilometers in various environments (e.g., urban area, rural area).

Though LoRa establishes long-distance links, we observe that the communication distance may
vary greatly in real-world deployments. When an end node is deployed in different directions relative
to a gateway, the power attenuation of the link between them, dubbed path loss, changes
due to the different types of land-covers (e.g., trees, buildings, roads, rivers) along the path. An accurate

path loss model is vitally important for LoRaWAN applications such as gateway deployment and

end-client localization. Specifically, since path loss correlates to the packet delivery probability of

a link [47], if we can accurately predict the path loss associated with a LoRa gateway before it is

deployed, we can optimize the LoRaWAN coverage by selecting gateway locations. Moreover, in

LoRaWAN, end node localization [11, 48–50] relies on the matching of the signal fingerprint (e.g.,

received signal strength indicator (RSSI)) observed by several gateways. If we can accurately predict
path loss without exhaustive site surveys, the localization system can be deployed and maintained

with low overhead. However, facing the environment diversity in different and large coverage areas

of LoRaWAN, it is challenging to develop such an accurate and general path loss model with low

overhead [10, 11].

Most existing methods of path loss estimation depend on various physical models (e.g., Friis [51],

Bor [1], Okumura-Hata [8]), which depict the influence of environmental reflection, refraction and

diffraction on wireless signal attenuation. The Friis transmission formula can be used to calculate path
loss when a wireless signal propagates in free space, but free-space conditions rarely hold
in field studies [2–7, 47, 52]. Moreover, Petäjäjärvi et al. [2] and Bor et al. [1] explore the
log-normal shadowing model [53] to estimate path loss. Environment-related signal shadowing
is modeled as log-distance attenuation and measured by field study, and the derived parameters are
used to adjust the Friis transmission formula. Okumura [8] and Hata [9] study the path loss of

cellular signal in urban, suburban, rural areas based on the data collected in Tokyo, then propose

empirical formulas for different areas. Although these fine-tuned physical models consider the in-

fluence of surrounding environments, the accuracy of path loss estimation may vary significantly

since their environment classifications are too coarse-grained to model the diverse per-link envi-

ronment characteristics such as the types and order of land-covers appearing along the path. How

to break the ceiling of these physical models and accurately estimate path loss of long-distance

wireless links is still a challenging problem.

In this chapter, we propose DeepLoRa, a learning framework for accurate path loss estimation

of long distance LoRa links. We have two key observations. First, some public remote sensing

images [10, 11] can be utilized to recognize the fine-grained land-covers distributed along a link.

Second, the influence of the land-covers on path loss is complicated: not only do
the types of land-covers matter, but the order in which they appear along the link also makes a difference
(Section 2.3). We therefore resort to deep learning techniques [54–56] to model the influence of a

specific land-cover distribution on path loss.

Specifically, instead of considering the environment of a LoRa link as a whole, DeepLoRa

divides it into an ordered sequence of short links (called micro links) of the same length. The
detailed land-covers of each micro link are recognized from remote sensing images. Then
we apply a supervised Long Short-Term Memory (LSTM) network, a kind of Recurrent Neural
Network (RNN) for sequence analysis, to learn a path loss model based on the measurements collected
from the area of interest. Our LSTM model inherently captures the relationship between the
types and order of land-covers and path loss. Once we have trained an LSTM model using the data
collected from a gateway, with only a little extra data collection and model training, the model can

be directly transferred to accurately estimate path loss for other gateways in the areas with similar

land-cover composition.
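The micro-link idea can be illustrated with a small sketch. It is not the DeepLoRa implementation: it assumes planar coordinates in meters, labels each micro link by the land cover at its midpoint (a simplification of recognizing the detailed land-covers from remote sensing images), and uses a hypothetical lookup function `toy_land_cover` in place of a classified image. The sequence model itself is omitted.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # planar (x, y) in meters; an assumed coordinate system


def split_into_micro_links(tx: Point, rx: Point, max_len_m: float) -> List[Tuple[Point, Point]]:
    """Divide the straight link tx -> rx into ordered, equal-length micro links."""
    total = math.dist(tx, rx)
    n = max(1, math.ceil(total / max_len_m))  # number of micro links
    points = [
        (tx[0] + (rx[0] - tx[0]) * i / n, tx[1] + (rx[1] - tx[1]) * i / n)
        for i in range(n + 1)
    ]
    return list(zip(points[:-1], points[1:]))


def land_cover_sequence(tx: Point, rx: Point, max_len_m: float,
                        land_cover_at: Callable[[Point], str]) -> List[str]:
    """Label each micro link by the land cover at its midpoint (a simplification)."""
    labels = []
    for (x0, y0), (x1, y1) in split_into_micro_links(tx, rx, max_len_m):
        labels.append(land_cover_at(((x0 + x1) / 2.0, (y0 + y1) / 2.0)))
    return labels


def toy_land_cover(p: Point) -> str:
    """Hypothetical map: buildings for x < 500 m, trees beyond."""
    return "building" if p[0] < 500.0 else "tree"


gateway, node = (0.0, 0.0), (1000.0, 0.0)
forward = land_cover_sequence(gateway, node, 250.0, toy_land_cover)
backward = land_cover_sequence(node, gateway, 250.0, toy_land_cover)
print(forward)   # land covers ordered from gateway to node
print(backward)  # same land covers, opposite order
```

The two sequences contain the same land-cover types in opposite order, which is exactly the kind of ordering information that a sequence model such as an LSTM can exploit and that a single per-link environment class discards.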

We implement DeepLoRa and extensively evaluate its performance on the dataset collected
from a campus LoRaWAN deployment spanning a 6 × 6 km² area in an urban scenario. The dataset
includes data recorded by two gateways, $G_1$ and $G_2$, placed on the roofs of two buildings with different
locations and heights. Six mobile end nodes mounted on 5 bicycles and a car are used to collect GPS
and wireless signal data. 16071 data records are collected for $G_1$ and 15192 records are collected for
$G_2$. The experimental results show that DeepLoRa achieves a mean error of 3.56 dB, which is 2×

smaller than state-of-the-art models. When we transfer the model to the other gateway and fine-tune

it with less than 200 data records, the model achieves a mean error of 4.79dB. Our contributions

are summarized as follows:

• Instead of relying on physical models, we are the first to propose utilizing deep learning for path loss
estimation of long-distance LoRa links across large areas in outdoor scenarios.

• We empirically study the influence of detailed land-cover sequence on path loss in a real

LoRaWAN system. We propose DeepLoRa, which utilizes an adaptive LSTM model to learn the relationship

between path loss and the corresponding types and order of land-covers.

• We implement DeepLoRa and evaluate its performance in real LoRaWAN deployment. The

experimental results show that the mean error is 2× smaller than that of state-of-the-art models and the LSTM

model can be generalized with low overhead.

The rest of the chapter is organized as follows. Section 2.2 introduces the related work. We present
the preliminary knowledge and illustrate our empirical study in Section 2.3. The system design of
DeepLoRa follows in Section 2.4. Section 2.5 and Section 2.6 present the implementation and
evaluation, respectively. We conclude the chapter in Section 2.7.

2.2 Related Work

The characteristics of long-distance wireless links have been empirically studied and theoret-

ically modeled in the past decades. We summarize the existing efforts from the following three

aspects.

LoRaWAN field studies: In LoRaWAN, path loss estimation is facilitated by the study of LoRa

coverage in the real world. The LoRa radios and gateways usually adopt commodity products
from Semtech. LoSee [47] deploys a LoRaWAN network consisting of one gateway and a mobile
end node in the campus environment of Tsinghua University and utilizes the log-normal shadowing model
to predict the path loss. It shows that two gateways are needed to ensure full coverage of the 4.5 km²
campus. Liando et al. [52] deploy 3 gateways and 50 end nodes. The maximum line-of-sight
(LOS) and non-line-of-sight (NLOS) communication distances are approximately 9.08 km and 2 km, respectively,
when the packet delivery ratio is higher than 70%. Numerous other empirical studies [1–7] have
been conducted to measure the LoRa coverage ranges. The specific range varies with experiment
environments. For example, Centenaro et al. [5] observe a range of 2 km in an area of high-rise buildings.
Bor et al. [1] obtain a range of 2.6 km in rural areas and a range of 100 m in an environment with dense
buildings. Wixted et al. [7] observe ranges of 1 km to 20 km in the central business district. However,
these empirical studies do not model or answer the question of how path loss increases with communication
distance at different rates in various environments. In DeepLoRa, we deploy 2 gateways

and 6 mobile end nodes in campus environment to study the detailed relationship between the types

and order of land-covers and signal attenuation.

Land-cover and environment aware models: A few models integrate environmental
factors to reflect the different rates at which path loss increases. The empirical models of Okumura [8]
and Hata [9], originally used in cellular scenarios, can be applied to LoRa path loss estimation.
The Okumura-Hata model provides ready-to-use formulas that are suitable for different
environments (e.g., urban, suburban, rural areas). Bor et al. [1] adopt the well-known log-normal
shadowing path loss model [53]. Unlike free-space path loss, the Bor model estimates the
absolute path loss as the reference path loss plus the relative path loss between two distances. It also
introduces a parameter, called the path loss exponent, that accounts for the rate of path loss increase

in diverse environments, but estimating this value requires extra on-site measurements. Demetri et

al. [10] and Lin et al. [11] use remote sensing to quantitatively analyze the composition of land-cover

types along a signal propagation path, then based on the types of land-cover, they select appropriate

combinations of Okumura-Hata model and Bor model for further path loss estimation, respectively.

Instead of adopting physical path loss model, DeepLoRa utilizes deep learning to develop a learning

model which can depict the complex relationship between the path loss and the types and order of

land-cover along the path. Thus, DeepLoRa can achieve more accurate path loss estimation.

Machine learning based models: Some works [57–61] use machine learning to model the path
loss with regard to the influence of surrounding environments. Oroza et al. [57] adopt a random forest
algorithm to predict the path loss for the wireless links in the American River Hydrologic Observatory
(ARHO) system. The model takes link-specific features as inputs and achieves an average prediction
error of 3.74 dB with a standard deviation of 3.40 dB. Zhang et al. [58] propose path loss models
for evaluating the unmanned aerial vehicle (UAV) communication channels based on machine

learning models (e.g., random forest and KNN). They take propagation distance, Tx altitude, Rx

altitude, path visibility (binary parameter indicating if there exists LOS path between the Tx and Rx

UAVs), elevation angle as features. Cheng et al. [60] associate the floor plan of a building to RSSI

values in each indoor Wi-Fi measurement. They trained Convolutional Neural Networks (CNNs)

to capture the underlying path loss model. The model takes images (e.g., floor plan) as input and

generates predictions of received RSSI, achieving Root Mean Square Error (RMSE) of 3.9404 dBm

and good generalizability. The existing learning models, however, cannot guarantee the accuracy

and generalizability at the same time, especially for long-distance links. In comparison, DeepLoRa

utilizes Bi-LSTM to depict the sequential influence of different land-covers along the link path and

shows good generalizability.

2.3 Preliminary and Empirical Study

2.3.1 Physical Path Loss Model

When the transmitter and receiver antennas are placed in ideal free space, the Friis transmission formula [51] gives the free-space path loss (FSPL) in dB as follows:

$$\mathrm{FSPL}(d) = 20\log_{10}(d) + 20\log_{10}(f) - 27.55 \tag{2.1}$$

where $d$ is the distance in meters between the transmitter's antenna and the receiver's antenna and $f$ is the frequency in MHz.

Based on the Friis model in Eq. 2.1, models such as the Bor model and the Okumura-Hata model integrate environmental information to provide more accurate path loss estimation. Specifically, in the Bor model, the environmental information is reflected in an additional parameter, the path loss exponent:

$$PL(d) = PL(d_0) + 10 \cdot n \cdot \log_{10}\left(\frac{d}{d_0}\right) + X_\sigma \tag{2.2}$$

where $PL(d)$ is the path loss when the distance between the receiver's antenna and the transmitter's antenna is $d$, $PL(d_0)$ is the path loss at a known reference distance $d_0$, $n$ is the environment-specific path loss exponent, which needs to be estimated from empirical data, and $X_\sigma$ is zero-mean Gaussian random noise with standard deviation $\sigma$.
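As a minimal sketch of how Eq. 2.2 can be used, the following Python snippet evaluates the mean of the log-distance model (without the noise term) and fits the exponent $n$ by least squares with $PL(d_0)$ held fixed. The function names and the choice of an ordinary least-squares fit are illustrative assumptions, not the fitting procedure of [11] or of this work.

```python
import math


def log_distance_path_loss(d, d0, pl_d0, n):
    """Mean path loss in dB at distance d (Eq. 2.2 without the noise term)."""
    return pl_d0 + 10.0 * n * math.log10(d / d0)


def fit_path_loss_exponent(distances, losses, d0, pl_d0):
    """Least-squares estimate of n with PL(d0) held fixed (assumed procedure).

    Minimises sum_i (PL_i - PL(d0) - n * 10 * log10(d_i / d0))^2 over n,
    which has the closed form n = sum(x*y) / sum(x*x).
    """
    xs = [10.0 * math.log10(d / d0) for d in distances]
    ys = [pl - pl_d0 for pl in losses]
    denom = sum(x * x for x in xs)
    if denom == 0.0:
        raise ValueError("need at least one distance different from d0")
    return sum(x * y for x, y in zip(xs, ys)) / denom
```

With noise-free synthetic data generated from a known exponent, the fit returns that exponent; with real measurements the residual spread gives an estimate of $\sigma$.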

The Okumura-Hata model requires selecting one of several formulas based on the surrounding environment of the end node. The main formulas involved in our following discussion are given by:

• The formula used in urban environments is as follows:

$$L_U(d) = 69.55 + 26.16\log_{10} f - 13.82\log_{10} h_B - C_H + (44.9 - 6.55\log_{10} h_B)\log_{10} d \tag{2.3}$$

where $h_B$ (m) is the height of the LoRa gateway, $h_M$ (m) is the height of the LoRa end node, and $C_H$ is the antenna height correction factor, defined as follows:

$$C_H = 0.8 + (1.1\log_{10} f - 0.7)\,h_M - 1.56\log_{10} f \tag{2.4}$$

• Similarly, the formula used in suburban environments is as follows:

$$L_{SU}(d) = L_U(d) - 2\left(\log_{10}\frac{f}{28}\right)^2 - 5.4 \tag{2.5}$$
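The Okumura-Hata formulas above can be sketched in Python as follows. The units are an assumption based on the usual Hata convention (frequency in MHz, distance in km, antenna heights in m); the text only states the units of the heights. Function and parameter names are illustrative.

```python
import math


def hata_antenna_correction(f_mhz, h_m):
    """Antenna height correction factor C_H (Eq. 2.4)."""
    log_f = math.log10(f_mhz)
    return 0.8 + (1.1 * log_f - 0.7) * h_m - 1.56 * log_f


def hata_urban(d_km, f_mhz, h_b, h_m):
    """Urban path loss L_U in dB (Eq. 2.3); d in km is an assumed unit."""
    c_h = hata_antenna_correction(f_mhz, h_m)
    return (69.55 + 26.16 * math.log10(f_mhz) - 13.82 * math.log10(h_b) - c_h
            + (44.9 - 6.55 * math.log10(h_b)) * math.log10(d_km))


def hata_suburban(d_km, f_mhz, h_b, h_m):
    """Suburban path loss L_SU in dB (Eq. 2.5)."""
    return (hata_urban(d_km, f_mhz, h_b, h_m)
            - 2.0 * math.log10(f_mhz / 28.0) ** 2 - 5.4)
```

For any fixed link, the suburban estimate is lower than the urban one by a frequency-dependent constant, which is why the choice between the two formulas matters for links with the same geometry.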

To acquire the environmental information required by the aforementioned two models, one typical way is on-site measurement, which is usually labor-intensive. Demetri et al. [10] and Lin et al. [11] instead adopt public remote sensing images to quantitatively analyse the composition of land-covers along the propagation route; based on the recognized types of land-covers, they train and select an appropriate physical model for path loss estimation.

2.3.2 Remote Sensing based Land-cover Recognition

Remote sensing is the acquisition of information about large-scale areas of the earth, which can be the surface, the atmosphere, or the oceans, using aircraft or satellites equipped with sensors that detect radiation reflected or emitted from target objects.

To recognize different types of land-covers from remote sensing images, Demetri et al. [10] define the types of land-covers shown in Table 2.1, which are representative enough to characterize the environmental factors that affect LoRa signal attenuation. The land-covers are divided into two groups according to whether they may lead to NLOS signal attenuation or not (i.e., LOS transmission).

Table 2.1 The types of land-covers.

Group  Type        Description
NLOS   BUILDING    buildings
NLOS   GREENHOUSE  greenhouse structures
NLOS   TREES       trees
LOS    FIELD       farming field or grassland
LOS    SOIL        bare soil
LOS    ROAD        streets, roads and highways
LOS    WATER       lakes and rivers

A few features are extracted from the multi-spectral images [62, 63], which capture the different radiation reflected by the land-covers. Then, each 10 × 10 m² area in geographic space is classified into one type of land-cover by a classifier trained with Support Vector Machines (SVM) [64–66]. Given the types of land-covers along a LoRa link, they decide which Okumura-Hata formula to use based on the dominating land-cover type: the suburban formula if the dominating type belongs to the LoS category, and the urban formula otherwise.
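The selection rule described above can be sketched as a small Python function. The membership of the LOS group is an assumption taken from the grouping in Table 2.1, and the function name and the tie-breaking behaviour (first type encountered wins) are illustrative choices, not details from [10].

```python
from collections import Counter

# Assumed LOS group, following the grouping of Table 2.1.
LOS_TYPES = {"FIELD", "SOIL", "ROAD", "WATER"}


def choose_hata_variant(land_covers):
    """Return 'suburban' if the dominating land-cover is a LOS type, else 'urban'.

    land_covers: sequence of land-cover type names along one LoRa link.
    """
    if not land_covers:
        raise ValueError("the link must contain at least one land-cover label")
    dominant, _count = Counter(land_covers).most_common(1)[0]
    return "suburban" if dominant in LOS_TYPES else "urban"
```

Note that this rule only looks at counts, so two links with the same land-cover composition in a different order receive the same formula.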

Similarly, Lin et al. [11] classify different types of land-covers using Random Forest. They achieve an area resolution of 0.6 × 0.6 m² on their map thanks to a fine-grained remote sensing dataset. After extracting the sequence of land-covers along a link, they split the link into segments at the boundaries between adjacent different land-covers and adopt the Bor model to estimate the overall path loss segment by segment. The path loss exponent for each type of land-cover is trained on site-surveyed data.

To evaluate the influence of the surrounding environment on signal attenuation, we deploy a LoRaWAN system in a campus environment, which contains many different types of land-covers. We also adopt remote sensing techniques for land-cover recognition.

2.3.3 Campus LoRaWAN System and Dataset

Figure 2.1 shows the overview and hardware of our campus LoRaWAN system. The system is built on the LoRaWAN protocol. We deployed 2 gateways, $G_1$ and $G_2$. Each of them is equipped with an MCU, an SX1276 transceiver and a Raspberry Pi 3 for remote programming. They are located on the roofs of two different buildings on the campus, as shown in Figure 2.1(a). Their altitudes are 84 m and 68 m, respectively. The ground altitude of the campus area is about 52 m. Our LoRa end nodes are implemented with an MCU, an SX1278 transceiver and a GPS unit, as shown in Figure 2.1(b). They are mounted on 5 bicycles and a car, as shown in Figure 2.1(c). While the bicycles and the car are moving, the LoRa end nodes send packets to the gateways. All packets are transmitted with spreading factor SF = 12, bandwidth BW = 125 kHz, and coding rate CR = 4/5. The TX power together with the antenna gains is about 19 dB. The 6 end nodes use channels at 486.3 MHz, 486.5 MHz, 486.7 MHz, 486.9 MHz, 487.1 MHz and 487.3 MHz, respectively. The interval between two adjacent packets is 5 s. A packet includes the GPS coordinates, a timestamp and a sequence number. The corresponding SNR and RSSI are logged at the gateway.

Figure 2.1 The overview of the deployment and dataset of our campus LoRaWAN system.

We completed deploying the system in Dec 2018. All the data were collected on the campus or in the surrounding area from Dec 22, 2018 to Mar 15, 2019. We logged over 30,000 records at the two gateways in total. From the GPS readings, we can calculate the link distance $d$ and the height difference $h$ between the end node's antenna and the gateway's antenna. As shown in Figure 2.1(a), the measurement locations are along the main roads in or around the campus. The whole region of interest is a 6 km × 6 km square area in which the land-covers include buildings, roads, parking lots, lakes, a river, grassland, trees and a playground. The red points are the locations of our two gateways $G_1$ and $G_2$. The yellow points are the locations of all packets transmitted by the moving end nodes.

[Figure 2.1 labels: (a) 6 km × 6 km map with gateways G1 and G2; (b) end node with GPS unit, SX1278 transceiver, battery and STM32L0 MCU]

Figure 2.2 Path loss vs. distance for links dominated by different land-covers, for gateways (a) $G_1$ and (b) $G_2$.
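One way the link distance $d$ and the height difference $h$ could be derived from GPS readings is sketched below. The use of the haversine great-circle formula, the spherical Earth radius, and all names are assumptions for illustration; the text does not state how the distance was computed.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1.0 - a))


def link_geometry(node, gateway):
    """Return (d, h) for a link; node and gateway are (lat, lon, altitude_m) tuples."""
    d = haversine_m(node[0], node[1], gateway[0], gateway[1])
    h = gateway[2] - node[2]
    return d, h
```

Over a 6 km × 6 km area the spherical approximation is more than accurate enough relative to the 10 m pixel resolution used later.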

To study the environmental effects, we adopt the method proposed by Demetri et al. [10] to obtain a sequence of land-covers for each LoRa link. The detailed implementation is described in Section 2.5. We clean the data and remove redundancy as described in Section 2.5.2. Finally, we obtain a dataset of over 4,000 unique records for the two gateways: 2,301 records from $G_1$ and 1,780 records from $G_2$.

In practice, given a received LoRa packet, we can obtain its RSSI and SNR from the gateway, but the RSSI is the combined power of the LoRa signal and various noises. To eliminate the influence of the noises, we use the Expected Signal Power (ESP) as a metric of the actual received signal power, which is derived as follows:

$$ESP = RSSI + SNR - 10 \cdot \log_{10}\left(1 + 10^{0.1\,SNR}\right) \tag{2.6}$$

Then, we use the following equation to obtain the ground truth of the signal path loss:

$$PL = P_t + G_r + G_t - ESP \tag{2.7}$$

where $P_t$ is the power fed into the transmitter's antenna, and $G_r$ and $G_t$ are the power gains at the receiver and transmitter sides, respectively. In our system, the sum of $P_t$, $G_r$ and $G_t$ is 19 dB.
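Eqs. 2.6 and 2.7 translate directly into code. The sketch below uses hypothetical function names; the default of 19 dB is the sum of $P_t$, $G_r$ and $G_t$ reported for this deployment.

```python
import math


def expected_signal_power(rssi_dbm, snr_db):
    """ESP (Eq. 2.6): received signal power with the noise contribution removed."""
    return rssi_dbm + snr_db - 10.0 * math.log10(1.0 + 10.0 ** (0.1 * snr_db))


def measured_path_loss(rssi_dbm, snr_db, tx_power_plus_gains_db=19.0):
    """Ground-truth path loss (Eq. 2.7) from one logged packet."""
    return tx_power_plus_gains_db - expected_signal_power(rssi_dbm, snr_db)
```

Two limiting cases help as a sanity check: at high SNR the correction vanishes and ESP approaches the RSSI, while at SNR = 0 dB signal and noise have equal power and ESP is about 3 dB below the RSSI.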

2.3.4 Empirical Study

We already know that the free-space path loss does not take the attenuation caused by the environment into account, and thus underestimates the true path loss. How to model the impact of the environment on path loss is the focus of path loss model design. We conduct an empirical study based on our LoRaWAN system measurements to find how the environment information affects the path loss of LoRa links. In Figure 2.2, we color the measured points according to the dominating land-cover along the link. We can see that different links have different dominating land-covers (e.g., buildings, trees, roads and fields). Moreover, even at the same distance, the distribution of path loss varies across types of dominating land-cover. Buildings and fields make the path loss more variable than other types of land-covers. The results therefore show that different types of land-covers have different effects on the path loss.

The main problem of existing environment-aware path loss models is that the land-cover information they use is extracted either from the surrounding environment of the end node or from statistics of the whole link, which does not make full use of the fine-grained land-cover information. In Figure 2.2, we notice that even for links with the same type of dominating land-cover, the path loss variance is still very significant, especially for $G_1$.

To discuss the problem in detail, we select four links $R_1$, $R_2$, $R_3$ and $R_4$ from our dataset, as shown in Figure 2.3(a). The properties of those links can be found in Table 2.2. The links within each pair, ($R_1$, $R_2$) and ($R_3$, $R_4$), have several things in common: 1) their lengths are nearly the same; 2) the dominating land-cover type of both links is BUILDING; 3) the percentages of NLoS land-cover of both links are very close. If we adopt the aforementioned empirical models, we should get very close path loss estimates for the two links of each pair. However, as shown by our real measurements, the path loss difference within each pair is more than 20 dB, which cannot be ignored.

Table 2.2 The properties of two link pairs.

Link Index  Length [m]  Dominating Land-cover  NLoS Land-cover Percentage  Path Loss [dB]
R1          46.04       BUILDING               0.61                        140.61
R2          47.27       BUILDING               0.62                        114.74
R3          76.85       BUILDING               0.52                        149.34
R4          75.45       BUILDING               0.52                        127.40

We plot the detailed types of NLoS and LoS land-covers of these links at different distances along the path from their end nodes to the gateway in Figure 2.3(b) and Figure 2.3(c). We can see that the links in each pair differ significantly in the order in which NLoS and LoS land-covers appear along their paths. The links with less path loss (e.g., $R_2$ and $R_4$) have fewer NLoS land-covers near the end node. As the other properties of the links remain similar, we believe that not only the types of land-covers affect the path loss, but their order along a link also matters. The reason is that the closer an obstacle is to the end node, the more likely it is to block the signal. Although buildings may block the signal between the end node and the gateway, a building of limited height is less likely to block the signal if it is far away from the end node.

Figure 2.3 Four example links in our area of interest.

Though Lin et al. [11] divide the whole link into segments by land-cover and calculate the overall path loss segment by segment, they only use the type of land-cover in each segment to determine the path loss exponent for the Bor model. The result would be the same if the order of those segments in the link changed. There is still much room to improve LoRa path loss estimation by fully exploiting the fine-grained order in which land-covers appear along the link.

2.3.5 Recurrent Networks

To model the observation in the last section, we look at the signal-propagation link at a finer granularity rather than as a whole. The link path loss can be regarded as the result of traversing a sequence of micro-environments. Based on this understanding, we need sequence analysis techniques to build the model. We therefore resort to the Recurrent Neural Network (RNN), a main Deep Neural Network (DNN) approach for handling sequence data. Though effective on some tasks [67,68], the simple RNN architecture suffers from the vanishing and exploding gradient problem [69], which makes gradients hard to back-propagate through a long sequence of hidden units. It is thus difficult for the traditional RNN to learn long-term dynamics. Since a LoRa link can be very long, the traditional RNN may fail in long-distance cases. To address the vanishing and exploding gradient problem, LSTM [69] was proposed with a memory and forgetting mechanism. Besides, the Bidirectional Recurrent Neural Network (BRNN) [70] can encode the temporal information in both the sequence order and the reverse order, which better captures the properties of the sequence. Considering all those factors, we adopt Bidirectional LSTM (Bi-LSTM) in our model to capture the fine-grained and order-dependent environmental information of LoRa links.
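For reference, the memory and forgetting mechanism can be written out in the commonly used LSTM formulation with a forget gate. The notation below ($x_t$ for the input at step $t$, $h_t$ for the hidden state, $c_t$ for the cell state, $W$, $U$, $b$ for learned parameters, $y_t$ for the bidirectional output) is standard textbook notation assumed for illustration and is not taken from this chapter; in our setting $x_t$ would be the land-cover vector of one micro-link.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)} \\
y_t &= [\overrightarrow{h}_t \,;\, \overleftarrow{h}_t] && \text{(Bi-LSTM output)}
\end{aligned}
```

The additive update of $c_t$ is what lets gradients survive over long sequences, and the concatenation in the last line combines one LSTM run from the end node to the gateway with another run in the reverse direction.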

2.4 System Design

2.4.1 Overview

Figure 2.4 shows the overall workflow of our system. DeepLoRa consists of three parts. To start with, given a location where we intend to deploy a LoRa gateway, we obtain free multi-spectral images of the related area of interest via the Sentinel-2 open access API. Next, we generate a land-cover map from the multi-spectral images through land-cover classification. Each pixel in the land-cover map is the class label that represents the true land-cover type in the real map. For regional estimation, we assume that an end node can be deployed at any possible point in the area, so we obtain a list of coordinate pairs of the gateway and an end node, which is exactly a list of LoRa links. Then, link segmentation and embedding produces formalized descriptions, in the form of sequences, of the land-covers traversed by each LoRa link based on the land-cover map. Finally, our DNN-based path loss model takes the sequences together with experiment-specific parameters as inputs and predicts the corresponding path loss, from which the ESP received by the gateway can be calculated. The regional estimation of the ESP received by the gateway can then be visualized as a heatmap.

2.4.2 Land-cover Classification

Land-cover classification is the first step of DeepLoRa. It provides fine-grained knowledge of the land-covers traversed by a LoRa link. To formalize the problem, given an area of interest of size 10M × 10N m² and its multi-spectral images (10 m is the pixel resolution of the multi-spectral images; $M$ and $N$ are the width and height of the images, respectively), we want to obtain a land-cover map in which each pixel $p_{ij}$, $0 \le i < M$, $0 \le j < N$, indicates the land-cover type $c_k$, $0 \le k \le 5$, that the corresponding 10 × 10 m² area $a_{ij}$ in the real map belongs to. We consider 6 land-cover types in total, i.e., those in Table 2.1 except GREENHOUSE, since this class is not present in our experiment area. This is a per-pixel classification problem.

Figure 2.4 Overview of our system design.

For each unit area $a_{ij}$, we extract a feature vector $\mathbf{f}$ consisting of the raw spectral values of the corresponding pixel, the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI) from the corresponding multi-spectral images. We then feed the feature vector to SVMs with a Radial Basis Function (RBF) kernel, each of which predicts whether $a_{ij}$ belongs to land-cover type $c_k$ or not. We train 6 binary classifiers, one for each land-cover type, and select the type whose classifier gives the highest confidence score as the final prediction. Applying this to every pixel yields the land-cover classification map.
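The feature construction and the one-vs-rest decision can be sketched as follows. Several details are assumptions for illustration: the band names, the NDWI variant (green and near-infrared, one common definition), and the representation of the 6 classifier outputs as a dictionary of confidence scores. The SVM training itself is omitted.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR variant, assumed)."""
    total = green + nir
    return (green - nir) / total if total != 0 else 0.0


def feature_vector(bands):
    """Raw band values (sorted by band name) followed by NDVI and NDWI.

    bands: dict with at least the hypothetical keys 'green', 'red' and 'nir'.
    """
    raw = [bands[name] for name in sorted(bands)]
    return raw + [ndvi(bands["nir"], bands["red"]), ndwi(bands["green"], bands["nir"])]


def predict_land_cover(confidences):
    """Pick the land-cover type whose binary classifier is most confident.

    confidences: dict mapping land-cover type to the score of its classifier.
    """
    return max(confidences, key=confidences.get)
```

Vegetation reflects strongly in the near-infrared, so NDVI is high for TREES and FIELD, while open water absorbs near-infrared and gives a positive NDWI; this is why the two indices are useful additions to the raw band values.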

2.4.3 Path Loss Estimation

Once land-cover classification is done, we can exploit the detailed environment information to design our DNN-based path loss model. We first need to select a region that represents the LoRa link. Then, following the discussions in Section 2.3.4 and Section 2.3.5, we arrange the types of land-covers in the link region as a sequence of micro-environments and formalize it as the input of the DNN learning framework with Bi-LSTM units.

Figure 2.5 Deep neural network based on Bi-LSTM for path loss prediction.

2.4.3.1 Link Segmentation and Embedding

To represent the land-cover composition of a LoRa link, we take not just a "line" but a rectangular area connecting the end node and the gateway from the land-cover classification map, as shown in Figure 2.6. In our scenario, the direct link is usually an NLoS path and the attenuation caused by the environment can be quite complex due to reflection, diffraction, diffusion and so on. Moreover, factors such as the orientations of the transmitter's and receiver's antennas can affect the actual propagation route, which makes the "line" hard to determine. Besides, the misclassification of a few pixels on the line would affect the whole sequence if only one line of pixels were taken into account. Selecting a rectangular area provides fault tolerance against these concerns. The width of this rectangle needs to be selected based on experiments and empirical knowledge.

Then, we segment the rectangular area and embed it into sequence format. Figure 2.6 takes a closer look at the embedding step of Figure 2.5. We divide the extracted link region of length $d$ and width $w$ into micro-link regions of length $d'$ from the end node to the gateway, which gives $n = \lceil d/d' \rceil$ micro rectangles in total. If the remainder $r \neq 0$, we still regard the remaining part as a micro-link. The granularity and length of the sequence are determined by $d'$ and have a direct impact on the estimation accuracy. We discuss the selection of $d'$ in Section 2.6.2. Say that micro-link region $l_i$, $0 \le i < n$, contains $m_k$ pixels of each land-cover type $c_k$, $0 \le k \le 5$. Then each micro-link region is embedded into a 1 × 6 vector $v_i$ holding the proportions of the 6 land-cover types:

$$v_i = [v_i^0, v_i^1, v_i^2, v_i^3, v_i^4, v_i^5], \qquad v_i^k = m_k \Big/ \sum_{j=0}^{5} m_j \tag{2.8}$$

[Figure 2.5 labels: Land-cover Map (Gateway, End Node), Embedding, Bi-LSTM, 3 × 1 Convolution, ReLU Layer, Max Pooling, Adjustable Parameters, Linear Layer, Sigmoid Function, Output, True Path Loss, Loss; dimension annotations: n × 6, n × 128, 128 × n, 128 × 1]

Figure 2.6 Link segment and embedding.

In this way, we transform the rectangular area into an ordered sequence $s = [v_0, v_1, \ldots, v_{n-1}]$ of land-cover vectors. After embedding, we input the sequence to our deep neural network.
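The segmentation and embedding step can be sketched in Python as follows. The sketch assumes the link region has already been extracted from the land-cover map as a grid of class indices with one row per pixel step along the link (row 0 nearest the end node) and $w$ columns; how the rotated rectangle is resampled from the map is not shown, and all names are hypothetical.

```python
import math

NUM_TYPES = 6  # land-cover types c_0 .. c_5


def embed_link_region(region, d_prime, num_types=NUM_TYPES):
    """Embed a link region into a sequence of land-cover proportion vectors.

    region: list of rows, each a list of w class indices in [0, num_types).
    d_prime: micro-link length in pixels (d' in the text).
    Returns n = ceil(d / d') vectors; a shorter remainder at the gateway
    end is kept as its own micro-link.
    """
    if d_prime < 1:
        raise ValueError("d_prime must be at least 1")
    n = math.ceil(len(region) / d_prime)
    sequence = []
    for i in range(n):
        counts = [0] * num_types
        for row in region[i * d_prime:(i + 1) * d_prime]:
            for label in row:
                counts[label] += 1
        total = sum(counts)
        sequence.append([m / total for m in counts])  # Eq. 2.8
    return sequence
```

Because each vector stores proportions rather than counts, the shorter remainder micro-link is represented on the same scale as the full-length ones.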

2.4.4 DNN based Path Loss Model

The architecture of our neural-network-based path loss model is shown in Figure 2.5. The sequence of vectors is first input to the Bi-LSTM unit to extract order dependencies. As RNNs can unfold along the time axis (in our case, distance), they let information flow from the start of the sequence to the end of the sequence, and thus capture the forward dependency and connect the output of the current frame (timestamp, location, etc.) to the previous frames. This capability suits our need to estimate the path loss at the gateway, which is the last frame in our sequence, while considering the attenuation from the start of the sequence. One concern is that RNNs are not good at learning long-term dependencies, whereas LoRa links can span extremely long distances, resulting in quite long sequences. Even with a quite coarse link segmentation granularity, e.g., $d' = 3$ (corresponding to a distance of 30 m in reality), we get a sequence of length 100 for a link of 3 km. Such a sequence is comparable to several paragraphs in a machine translation task, which is known to be hard to process even with more complicated neural architectures, let alone longer links with finer segmentation granularity. LSTM is more capable of learning long-term dependencies, such as those in sequences containing hundreds of elements. To better relate the information at the start of the sequence to the end, since land-covers near the end node are more likely to block the signal, we further adopt Bi-LSTM.

All RNNs have a chain-like information flow from one frame to the next, shown as the forward-layer flow (blue arrows) in Figure 2.7. Bi-LSTM also contains an information flow in the other direction, shown as the backward-layer flow (red arrows) in Figure 2.7. This ensures that the land-cover information from both the start and the end of the sequence is captured instead of being forgotten when the sequence is very long.

Figure 2.7 The information flow of Bi-LSTM.

The outputs of the Bi-LSTM are fed into convolution layers to extract local features and context dependency. A Rectified Linear Unit (ReLU) layer [71] introduces non-linearity into the model. Max pooling then down-samples the output features and reduces their dimensionality. After that, we linearly map the features to the path loss. Note that, at the linear mapping, we can add extra parameters to the network, which can be any factor that affects the path loss (we can also introduce non-linearity with other units). In this way, our network is extensible when other properties of the LoRa link, e.g., weather conditions or temperature, become available, so that new influencing factors can be studied quantitatively in the future. For now, we only input the link distance and the height difference between the transmitting and receiving antennas.

The actual path loss of a LoRa link is bounded: it cannot be less than 0, and it cannot exceed the maximum link budget, which is constrained by the maximum transmitting power and the end node sensitivity. We therefore squash our final estimate to a value between 0 and 1 using the sigmoid function. In this way, we control the range of the losses, which is convenient for training. We only need to scale the estimate by the upper bound to obtain the predicted path loss. A path loss larger than the upper bound is clipped to the upper bound, which indicates failure of packet delivery. In our experiments, we take 160 dB as the upper bound.
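The bounded output described above can be illustrated with a few lines of Python. This is a sketch of the scaling idea only, with hypothetical names; `z` stands for the raw output of the last linear layer, and the clipping of training targets to the upper bound follows the description in the text.

```python
import math

UPPER_BOUND_DB = 160.0  # upper bound on path loss used in the experiments


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bounded_path_loss(z, upper=UPPER_BOUND_DB):
    """Map the raw network output z to a path loss in [0, upper] dB."""
    return upper * sigmoid(z)


def normalised_target(path_loss_db, upper=UPPER_BOUND_DB):
    """Training target in [0, 1]; values above the bound are clipped to 1."""
    return min(max(path_loss_db / upper, 0.0), 1.0)
```

Training against the normalised target keeps the loss values in a fixed range regardless of the link budget, and the prediction is recovered by multiplying the sigmoid output by the upper bound.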

An important concern for DeepLoRa is whether it can be used in new environments. Our system design enables 3 levels of generality, so that it can be transferred to a new environment with little effort.

• We do not manually select features but use sequences derived from the real land-cover map, together with other factors, as inputs, so that our model can learn a mapping that approximates the law of signal propagation. This ensures the first level of model generality.

• When we train our model, we endeavour to select training data that covers links of various distances and land-cover compositions, as introduced in Section 2.5.2, so that our training set spans a large part of the full feature space. This ensures the second level of model generality.

• We adopt a Bi-LSTM based DNN in our path loss model. Neural networks trained on a large historical dataset can be fine-tuned with a small dataset of new data to adjust their weights to fit new observations. So when we fine-tune our model with just a small amount of data from the new environment, it can achieve higher accuracy in the new environment than the original model. This is an advantage over many other machine learning based models, which need to be retrained from scratch on a fixed dataset with no promise of better results. This ensures the third level of model generality.

The first two levels of generality enable fair transferability of the original model, while the last level provides a feasible way to fine-tune the model when we have a higher demand for accuracy and are willing to do lightweight on-site measurements for it. We evaluate the generality and transferability of our model in Section 2.6.3.

2.5 Experiment Details

2.5.1 Land-cover Classification

We obtain multi-spectral images (hereafter tiles) of 10 m resolution from the Sentinel-2 open access hub. From those images we build a dataset, consisting of small portions of micro-areas on the map, to train and evaluate our land-cover classifiers. We manually labeled these areas using the online tool semantic-segmentation-editor [72], which supports annotating pixels in RGB images by marking areas. We obtained 900 labeled pixels in total, 180 pixels per class. The dataset is then divided into a training set and a testing set with a ratio of 9 : 1. The training set is used for training and model selection; the testing set is used for evaluating model performance.

Figure 2.8 RGB map and land-cover map of the area of interest: (a) RGB map; (b) land-cover map.

The testing result shows a high overall classification accuracy of 97.4% across all land-cover types, which indicates that we can regard the obtained land-cover map as a reflection of the true environment. Figure 2.8(b) shows the generated land-cover map of our area of interest. Different colors indicate different types of land-covers. We can see that most of the area is covered by buildings. Trees, fields and roads occupy a large part of the rest. Water and soil appear in only a few parts of the area.

2.5.2 Path Loss Estimation

Our path loss model is based on a DNN, which we implemented using PyTorch [73]. We use our collected data for model training. During training, we have to make sure that two identical inputs are mapped to the same output; otherwise the model receives conflicting supervision. So we clean our data before training. Since our data is collected continuously as the bicycles and the car move, the locations logged by the GPS unit are continuous on the map. Due to the 10 m resolution of the multi-spectral images we use, we regard every 10 × 10 m² area in reality as one pixel on the map. When we transform the GPS coordinates into map coordinates, many real locations are mapped to the same pixel with different ground truth path losses. To remove this redundancy and obtain a unique ground truth for each input to the path loss model, we calculate the mean path loss of the measurements whose locations fall into the same pixel.

To train and evaluate our path loss model, we split the dataset into a training set and a testing set with a ratio of 9 : 1. Since the principle behind our model is sequence processing and the length of the sequence has a significant impact on path loss, we separate our data into bins based on sequence length before splitting the dataset. In this way, we make sure that our training set contains sequences of diverse lengths and that their proportions in the training set are close to those in the testing set. Such a balanced training set promises a more general model. However, there is still a gap in link composition between the training sets of the two gateways, and the two gateways are located at different altitudes, so it remains a challenge to directly apply the model trained for one gateway to a new gateway or a new environment. We also extract data from the training sets of the two gateways in several proportions and use those data to conduct experiments on model transferability between different environments or gateways. We discuss this further in Section 2.6.3.
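A length-stratified 9 : 1 split of this kind can be sketched as follows. The bin width, the random seed and the function names are hypothetical; the text does not state how the bins were chosen.

```python
import random
from collections import defaultdict


def stratified_split(samples, seq_len, bin_width=50, test_ratio=0.1, seed=0):
    """Split samples into (train, test) while preserving the length distribution.

    samples: list of records.
    seq_len: function returning the sequence length of a record.
    bin_width: width of the length bins (assumed value).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sample in samples:
        bins[seq_len(sample) // bin_width].append(sample)

    train, test = [], []
    for key in sorted(bins):
        items = bins[key][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)  # per-bin share of the test set
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

Splitting inside each bin keeps short and long links in roughly the same proportion in both sets, whereas a plain random split could leave the rare very long links entirely in one of them.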

For link segmentation and embedding, we tried multiple values of the rectangle width $w$ and the micro-link length $d'$; the comparison and evaluation can be found in Section 2.6.2. We finally select $d' = 3$ and $w = 7$ (representing 30 m and 70 m, respectively) in the following experiments. We train our model with learning rate $lr = 0.0001$ and batch size $train_{bs} = 16$, and test the model every 5 epochs.

2.6 Evaluation

2.6.1 Overall Performance

We evaluate the accuracy of our model in comparison with the free-space model, the Bor model, and the two models PATH and INTERSECTION proposed by Demetri et al. [10] on the same testing set by calculating the absolute difference between the path loss estimate and the ground truth value. The path loss exponent of the Bor model is fitted on the same training set as our model. PATH and INTERSECTION use the settings of the original paper [10]. The results are reported in Table 2.3. Among all these models, DeepLoRa achieves the lowest error, less than 4 dB for both gateways and as low as 3.29 dB in the best case, which outperforms the other models by at least 50%. The standard deviation of DeepLoRa is also limited to about 3.2 dB (Table 2.3), so the stability of the estimation is ensured.

Table 2.3 Absolute estimation error on path loss: average (avg [dB]) and standard deviation (stdd [dB]) for different models.

                G1             G2             all
Model           avg    stdd    avg    stdd    avg    stdd
DeepLoRa        3.29   3.12    3.94   3.21    3.56   3.17
INTERSECTION    19.91  7.13    17.55  10.00   18.88  8.58
PATH            20.93  7.68    17.35  10.35   19.36  9.12
Bor             8.70   7.18    12.24  8.83    10.25  8.14
free-space      52.23  6.04    47.52  9.64    50.17  8.16

Figure 2.9 The distribution of the estimation errors on the full testing set: (a) $G_1$; (b) $G_2$.

We plot the raw estimation errors of these models, except the free-space model (whose error is too large), on the full testing set as a box plot in Figure 2.9. We can see that the raw estimation errors of DeepLoRa are centered around 0 dB, which means that it has no tendency to underestimate or overestimate the path loss, while the estimates of the other models all show an obvious offset towards one side of 0. The error distribution of DeepLoRa is much narrower than that of the other models: the magnitude of the largest error is less than 10 dB and the magnitude of 50% of the errors is less than 5 dB. This further shows that DeepLoRa achieves higher estimation accuracy with low variance.
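The two summary statistics reported in Table 2.3 can be computed as in the sketch below. Whether the reported standard deviation is the population or the sample version is not stated; the population form is assumed here, and the function name is hypothetical.

```python
import statistics


def absolute_error_stats(predicted_db, truth_db):
    """Return (mean, standard deviation) of the absolute estimation errors in dB."""
    if len(predicted_db) != len(truth_db):
        raise ValueError("predictions and ground truth must have the same length")
    errors = [abs(p - t) for p, t in zip(predicted_db, truth_db)]
    return statistics.fmean(errors), statistics.pstdev(errors)
```

The signed errors `p - t` (without the absolute value) are what a box plot such as Figure 2.9 would show, since only they reveal a systematic tendency to over- or underestimate.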

2.6.2 Link Segmentation

To study the impact of different link segmentation parameters on the final performance of the

path loss model, we train models for several combinations of link region width 𝑤 and micro-link

length 𝑑′. The performance of those models are compared based on absolute average estimation

error. The results is reported in Table 2.4 Though the variance of estimation error among all the

settings is not significant, we can still find some changing rule from it. When we fix the width 𝑤,

and vary length 𝑑′, 𝑑′ = 3 gives lower error than 𝑑′ = 5, 7, this is consistent with our intuition

26

GeoLoRaPATHBor3020100102030Error[dB]INTERSECTIONGeoLoRaPATHBor3020100102030Error[dB]INTERSECTIONTable 2.4 Absolute average estimation error with different link segmentation parameters.

𝑑′(10m) 𝑤(10m)

1
3
5
7
3
3
3

7
7
7
7
3
5
11

avg(dB)
3.79
3.29
3.75
3.54
3.85
3.28
3.30

(a) free-space

(b) Bor

(c) INTERSECTION

(d) DeepLoRa

Figure 2.10 ESP heatmaps of a 6 × 6𝑘𝑚2 area with regarding to gateway 𝐺2.

that the finer the granularity of the sequence, the better the result. However, 𝑑′ = 1 does not produce the best result. There are two main reasons: 1) for the same link, the sequence with 𝑑′ = 1 is 3 times longer than the sequence with 𝑑′ = 3, and long sequences are harder to learn; 2) a smaller 𝑑′ means a smaller micro-link region, and there are so few pixels in such a region for embedding that it provides little fault tolerance for land-cover classification. So we choose 𝑑′ = 3. When we fix the length 𝑑′ and vary the width 𝑤, 𝑤 = 3 gives the highest error, while the model achieves similar results with 𝑤 ≥ 5. Setting 𝑤 = 3 is similar to selecting a “line” to represent the link, which is too narrow, as we already explained in Section 2.4.3.1. However, wider is not always better, since a wider link region means more computation during embedding. To balance fault tolerance and computation, we select 𝑤 = 7.
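The effect of 𝑑′ on the sequence length can be checked with a short calculation. The sketch below assumes that each micro-link spans 𝑑′ × 10 m along the link and that a partial final segment still counts as one step; the function name and the 1,500 m example link are illustrative and not part of DeepLoRa.

```python
import math

def sequence_length(link_length_m: float, d_prime: int, unit_m: float = 10.0) -> int:
    """Number of micro-links (sequence steps) for a link of the given length.

    Assumption: each micro-link is d_prime * unit_m long, and a trailing
    partial micro-link is kept as one extra step.
    """
    return math.ceil(link_length_m / (d_prime * unit_m))

# Hypothetical 1.5 km link evaluated for the d' values in Table 2.4.
for d_prime in (1, 3, 5, 7):
    print(d_prime, sequence_length(1500.0, d_prime))
```

For this example link, 𝑑′ = 1 yields a sequence three times as long as 𝑑′ = 3, which matches the first reason given above.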

2.6.3 Model Generality and Transferability

To compare the performance of DeepLoRa and the other models when applied in a new environment, we test the model trained on the training set of one gateway on the testing set of the other. Since the free-space model, PATH, and INTERSECTION do not apply in such a scenario (their performance should be the same as in Table 2.3), we only compare our model with the Bor model


Table 2.5 Absolute average estimation error during model transfer.

Model      𝐺1 → 𝐺2 avg (dB)   𝐺1 → 𝐺2 stdd (dB)   𝐺2 → 𝐺1 avg (dB)   𝐺2 → 𝐺1 stdd (dB)
DeepLoRa    9.58                0.76                 8.92                6.51
Bor        10.20                9.16                10.00                8.91

(a) 𝐺1

(b) 𝐺2

Figure 2.11 CDF of the absolute estimation error when applying the model in a new environment with different amounts of fine-tuning data.

as in Table 2.5.

We can see that when the DeepLoRa model trained on the 𝐺1 training set is applied to the 𝐺2 testing set, the average estimation error and standard deviation of DeepLoRa are 9.58 dB and 0.76 dB, which are lower than those of the Bor model. The same holds if we swap 𝐺1 and 𝐺2. This result indicates that DeepLoRa generalizes well: when we transfer DeepLoRa to a new environment, it still retains satisfactory estimation accuracy.

The above result only shows the generality of DeepLoRa at the first two levels; we also conduct experiments to verify its generality at the third level. Before applying the model in the new environment, we use different percentages of the training data belonging to the new gateway to fine-tune the base model. Adding 0% of the data means using the base model directly without fine-tuning. The result is obtained by testing the fine-tuned model on the testing set of the new gateway, as shown in Figure 2.11. We can see from the CDF that, when testing on 𝐺1 data, using 10% of the 𝐺1 training data for fine-tuning keeps 80% of the estimation errors within 7 dB; when testing on 𝐺2 data, using 10% of the 𝐺2 training


data for fine-tuning keeps 80% of the estimation errors within 8.5 dB, which approximates the performance when using 100% of the training data of the new gateway for fine-tuning (i.e., equivalent to training the model from scratch for the new gateway).

In Table 2.6, we report the absolute average estimation error of the above experiment. We can see that using 10% of the training data for fine-tuning improves the estimation accuracy by up to 2× compared with no fine-tuning. We also see greater improvement when fine-tuning the 𝐺2 model and testing on 𝐺1 data. This is because the dataset collected for 𝐺2 is more diverse than that of 𝐺1, resulting in a more general base model. The extra accuracy gained by increasing the amount of fine-tuning data is negligible once we use 20% or more of the data. In our context, 10% of the training data is around 200 records, which can be easily collected with our LoRaWAN system. In fact, we may not even need 10%; 5% or less may be enough. Based on this result, we suggest first training a base model with large-scale historical data, which can be obtained from existing real-world deployments, and then fine-tuning the base model with a small amount of data collected in the new environment when higher accuracy is required.

2.6.4 Generating ESP Heatmap

To show the performance of DeepLoRa more intuitively, we perform per-link path loss estimation using DeepLoRa for each unit area in the 6 × 6 km² area shown in Figure 2.1. We then draw the ESP heatmap of this area with regard to gateway 𝐺2 (the heatmap for 𝐺1 is similar). We also draw heatmaps using the free-space model, the Bor model, and the INTERSECTION model for comparison. Figure 2.10 shows the heatmaps. In these heatmaps, we use the same color scale of [−140, −40] dBm for all models; a darker color means a lower ESP value (i.e., a larger path loss). An ESP value equal to or lower than −140 dBm means that the packet cannot be delivered (no coverage). It is clear that the free-space model and the Bor model only provide isotropic path loss estimation with lower accuracy. The INTERSECTION model reflects the anisotropy to some extent, but the granularity is not fine enough. With DeepLoRa, we can clearly see the difference between individual links, and many coverage holes that are hidden in the other heatmaps now show up.

Combining the quantitative experimental results with the large-scale visualization of the estimates,


Table 2.6 Absolute average estimation error with model fine-tuning.

Fine-tune data   𝐺1 → 𝐺2 (dB)   𝐺2 → 𝐺1 (dB)
0%               9.58            8.92
10%              4.06            5.37
20%              3.42            4.80
50%              3.40            4.21
100%             3.15            3.99

we conclude that DeepLoRa is a more accurate and robust path loss model that is capable of providing fine-grained ESP/coverage estimation for the area of interest.

2.7 Conclusion

To conclude, we propose DeepLoRa, a learning framework enabling accurate and general path loss estimation for long-distance wireless links in LPWAN. By deploying a real LoRaWAN system in a campus environment, we empirically study the relationship between the path loss of a link and the land covers along the link. We have observed that not only do the types of land covers lead to different signal attenuation, but the order of these land covers also has a significant influence. Given the position of an end node, we utilize remote sensing images to recognize the types of land covers between the end node and a gateway. Then, we use Bi-LSTM to develop a learning-based path loss model that captures the influence of both the type and the order of these land covers on the path loss. We implement our learning model and evaluate it on our dataset. In comparison with state-of-the-art physical models, the experimental results show that DeepLoRa achieves more accurate and fine-grained path loss estimation and incurs little training overhead when transferred to a new environment.


CHAPTER 3

IS LORAWAN REALLY WIDE? FINE-GRAINED LORA LINK-LEVEL
MEASUREMENT IN AN URBAN ENVIRONMENT

3.1 Introduction

Low-power Wide Area Network (LPWAN) is an emerging IoT paradigm aiming for low-power wireless communication over kilometer-long links. Several LPWANs (e.g., Long Range (LoRa) [74], Narrow-Band (NB)-IoT [75], SIGFOX [5]) with different physical layer designs have been commercialized, enabling city-scale IoT applications at a low cost. For example, NB-IoT [75] and LTE-M operate on the LTE band as a part of 5G for the massive IoT. SIGFOX [5] uses an unlicensed band but is a proprietary network. In contrast, LoRaWAN [74] operates in the unlicensed spectrum and follows an open standard, attracting much attention from the academic and industrial communities.

The LoRa networking stack adopts the Chirp Spread Spectrum (CSS) modulation at the physical layer (LoRa-PHY). By suppressing the background noise on the spectrum in CSS, LoRa-PHY can successfully demodulate a symbol even if its SNR is as low as −20 dB [76, 77], making it representative of low-power, long-distance communication. With such LoRa links, spatial-temporal link dynamics, coverage, and link-information-based localization are three fundamental research issues [78], which can be formulated as follows:

• For spatial-temporal link dynamics, the critical questions are whether different links with the same distance show similar link performance and whether a link’s performance is stable over a long period.

• For coverage, the critical question is whether the conceptual “long-distance” can be realized in a wide area with a few LoRa gateways, enabling smart-city applications (e.g., vehicle sharing [47], environment monitoring [40, 41], metering, logistics).

• For link-information-based localization, the critical question is whether an end node can be accurately localized with LoRa link fingerprints in a wide area with a sparse gateway deployment.

With the answers to these questions, a fine-grained link-level measurement can benefit the deployment of LoRa gateways, service quality in mobile applications, and network management in practice.

Status Quo and their Limitations: Several works [10, 79, 80] have observed the spatial diversity of LoRa links but lack a detailed analysis at different distance scales. To the best of our knowledge, no work reports the temporal performance of LoRa links. Similarly, to answer the coverage question, some measurement studies [2, 47, 52, 81–84] deployed real LoRaWAN systems to study the coarse-grained communication range in real environments. For example, Liando et al. [52] deployed three LoRa gateways and more than 50 static LoRa end nodes in a 3×3 km² campus environment to conduct a coverage measurement. They further use a 70% packet delivery ratio (PDR) as a threshold to approximate the communication range of a LoRa link. The results show that the maximum communication range is 9.08 km and 2 km in Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) scenarios, respectively. However, a communication range alone is not enough: communication heterogeneity [79, 85] causes significant uncertainty in the coverage area of a gateway. Thus, the coverage problem is not fully addressed.

Compared to the energy-consuming GPS-based localization, LoRa link fingerprint-based localization consumes much less power at the expense of accuracy. To answer the localization question, the SOTA LoRa localization method, SateLoc [11], reported a median localization error of 43.5 m in a 350×650 m² urban area with three gateways. However, the size of the evaluated area is limited, and the cost of a dense gateway deployment is unaffordable. Thus, whether we can achieve the same localization accuracy in a larger area with sparsely deployed gateways is still questionable.

Challenges: To achieve fine-grained measurement of spatial-temporal dynamics, coverage, and localization, the key is to obtain the link PDR and signal fingerprint at a fine-grained geography scale. We take a 6×6 km² area as an example to demonstrate the difficulty of obtaining such information. If we split the whole area into 100×100 m² cells (i.e., the geography scale) and deploy a LoRa end node in each cell, 3,600 LoRa end nodes are required. The number of LoRa end nodes increases as the geography scale becomes more fine-grained. This cost makes it impractical to achieve fine-grained link-level measurement with a static deployment.
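The node count for a static deployment can be reproduced with a few lines. This is a minimal sketch that assumes square cells 100 m on a side, which is the reading consistent with the 3,600 figure for a 6×6 km² area; the function name is hypothetical.

```python
def static_nodes_needed(area_side_m: float, cell_side_m: float) -> int:
    """One static end node per square cell covering a square area."""
    cells_per_side = int(area_side_m // cell_side_m)
    return cells_per_side ** 2

print(static_nodes_needed(6000, 100))  # 100 m cells
print(static_nodes_needed(6000, 10))   # 10 m cells, a finer geography scale
```

Shrinking the cell side by a factor of ten multiplies the required number of nodes by one hundred, which is why a static deployment does not scale to fine-grained measurement.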


In this chapter, we deploy a mobile LoRaWAN system and propose novel methods to measure the LoRa link-level coverage area and localization accuracy in a wide urban area at a fine-grained geography scale. Our deployed mobile LoRaWAN system consists of two LoRa gateways and six mobile LoRa end nodes in a 6×6 km² urban area; the end nodes continuously transmit data packets carrying their location information while they are moving. Thanks to the mobility of the LoRa end nodes, thousands of LoRa links are recorded efficiently, covering a variety of locations. However, we still encounter two challenges in achieving fine-grained, whole-area measurement. On the one hand, since a LoRa end node keeps moving, it travels some distance during the time needed to observe enough packets for PDR calculation. Such mobility leads to a granularity tradeoff between the PDR calculation and the geography scale. On the other hand, the users carrying the mobile end nodes moved freely in their daily life, without any requirement on their movement. Thus, the locations of the collected data are not uniformly distributed across the areas of interest. Although we have data records spanning four months, some locations and roads are still uncovered. In such areas, it is not trivial to infer the coverage performance and establish a fingerprint map for localization.

To solve the first challenge, we define the PDR granularity as the PDR estimation precision we can achieve by observing a given number of packet transmissions. The more packets we count, the finer the precision is. For example, the precision is 0.1 if we only count 10 packets in total, but it is 0.01 if 100 packets are counted. We estimate the speed of each LoRa end node (§3.3.3), then adaptively adjust the geography scale to ensure the PDR granularity is no coarser than 0.1 (§3.4.1). Moreover, we adopt DeepLoRa [79] to generate the expected signal power (ESP) [10] for every location in the area. DeepLoRa [79] is a deep neural network (DNN) based ESP estimation model that predicts accurate ESP values by taking a land-cover type sequence as input. For coverage, with the calculated PDRs in the covered locations, we establish an ESP-based PDR prediction model to infer the PDRs in the uncovered locations (§3.4.5). For localization, we use the ESPs observed by different gateways as fingerprints to generate a fingerprint map for each location.
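The PDR granularity described above is simply the reciprocal of the number of observed packets. The following sketch makes the relation explicit; the function names are illustrative, and the small epsilon only guards against floating-point rounding.

```python
import math

def pdr_granularity(num_packets: int) -> float:
    """Smallest PDR step resolvable from num_packets transmissions."""
    if num_packets <= 0:
        raise ValueError("num_packets must be positive")
    return 1.0 / num_packets

def packets_needed(target_granularity: float) -> int:
    """Minimum number of packets so that the granularity is no coarser than the target."""
    return math.ceil(1.0 / target_granularity - 1e-9)

print(pdr_granularity(10), pdr_granularity(100))
print(packets_needed(0.1))
```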

With the ESP, PDR, and fingerprint map, our link-level measurement includes the following

aspects. First, with the ESPs and PDRs in the covered locations, we analyze the overall, spatial


and temporal link dynamics for link property analysis (§3.4.3 and §3.4.4). Second, we estimate the

coverage area of each gateway with/without link ESP gains (§3.5). Third, we study the localization

accuracy with the fingerprint map under various settings (§3.6). Our measurement study presents

three key observations, and the conclusions are as follows:

• Distance alone cannot reflect the link quality, and the temporal link behavior is highly dynamic due to micro-environment changes.

• Although the maximum communication range of a gateway observed by us is over 3 km, its actual coverage area is irregular and only about 11.3 km², which is much less than expected.

• The fingerprint-based LoRa localization accuracy is quite limited in a sparse gateway deployment. More gateways, site surveys, and dynamic calibration are needed.

We summarize our contributions as follows:

• We deploy a real mobile LoRaWAN system in an urban area and measure massive LoRa links

over four months.

• We propose several methods to measure spatial/temporal link dynamics and enable coverage

area calculation using sparsely received LoRa packets.

• We report the localization accuracy in such a wide-area deployment providing more insights

for future localization method design in LoRaWAN.

3.2 Related Work

LoRa Link Dynamic Study. To estimate the coverage of LoRa gateways without deployment and on-site measurements, Demetri et al. [10], SateLoc [11], and DeepLoRa [79] develop different models to accurately estimate the signal path loss by understanding the impact of land-cover types in an urban environment. A variety of remote sensing techniques are adopted to recognize the land covers along the LoRa link. For example, Demetri et al. [10] first design an automated processing toolchain for multi-spectral remote sensing images and then apply the Okumura-Hata formula [8] for path loss prediction. Similarly, SateLoc [11] proposes a segmented Bor model [1] to capture the different path loss exponents of the corresponding land covers.

DeepLoRa [79] further incorporates deep learning techniques for LoRa link estimation. It develops a land-cover aware path loss model based on Bi-LSTM (Bidirectional Long Short-Term Memory) and reduces the estimation error to less than 4 dB, which is 2× smaller than state-of-the-art models [10]. In contrast, we study the relationship between path loss and the resulting PDR, which is crucial in bridging the gap between link behavior and network coverage.

Figure 3.1 Illustration of our LoRaWAN architecture.

LoRa Coverage Measurement. Recent years have witnessed several measurement works [5, 7, 47, 52, 80, 86, 87] that reveal the LoRaWAN performance in real environments. Liando et al. [52] deploy three gateways and more than 50 static end nodes in a 3×3 km² campus to measure the LoRaWAN performance, including the communication range, network throughput, and energy efficiency. Results show that the LoS and NLoS communication ranges are 9.08 km and 2 km, respectively. Similarly, Centenaro et al. [5] observe a communication range of 2 km in an area of high buildings. The communication range reported in [7] varies from 1 km to 20 km in the central business district. Besides, LoSee [47] adopts a mobile end node mounted on a bike to study the LoRaWAN coverage ability on the campus scale (e.g., 4.5 km²). For reliable PDR calculation, the mobile end node must transmit 50 to 100 packets on the spot. Focusing on indoor environments (e.g., office buildings, residential buildings, car parks, warehouses), Xu et al. [86] study the LoRa link behavior and energy profile by deploying ten static and two mobile LoRa end nodes. Compared to these measurement studies, which only focus on the spatial link behavior, we analyze the temporal characteristics of LoRa links and provide a more fine-grained coverage area study than existing works in a 6×6 km² urban area.

LoRa Localization Method. Studies mainly adopt two kinds of technologies for LoRa localization: 1) TDoA-based localization; 2) RSSI-based localization. TDoA-based approaches utilize the time differences of the same signal arriving at different gateways. TDoA has been implemented in LoRaWAN networks to perform localization in both stationary [88] and mobile scenarios [89–91]. However, due to the limited bandwidth of commercial LoRa end nodes, the TDoA-based localization error can reach hundreds of meters, since only μs-level time resolution [92, 93] can be achieved. Researchers have improved TDoA-based localization to meter-level accuracy by customizing dedicated LoRa devices. Nandakumar et al. [94] proposed a multi-band LoRa backscatter device based on CSS modulation. Bansal et al. [95] present a distributed software-radio-based station network that spans a wide bandwidth encompassing the TV whitespaces and offers a high aperture. Those approaches, however, cannot be applied directly in existing LoRaWAN systems. Besides, TDoA-based systems require at least three gateways that are strictly time-synchronized or phase-synchronized, which is not feasible in scenarios with sparse gateway deployment.
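The link between μs-level time resolution and errors of hundreds of meters follows from the propagation speed of radio signals. A back-of-the-envelope sketch, assuming propagation at the speed of light in vacuum and taking 1 μs as an illustrative resolution:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def range_uncertainty_m(time_resolution_s: float) -> float:
    """Distance travelled by a radio signal during one timing-resolution step."""
    return SPEED_OF_LIGHT_M_PER_S * time_resolution_s

print(range_uncertainty_m(1e-6))  # about 300 m for a 1 microsecond resolution
```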

Received signal strength indicator (RSSI) measurements can be used for localization [48] according to the path loss models mentioned above [1, 9]. However, the performance is highly affected by channel dynamics in complicated environments [10, 11, 79]. Fingerprint-based approaches [49, 50, 96, 97] also use RSSI values as a fingerprint and locate an end node by matching its fingerprint with known reference locations in a database. Machine learning approaches have been adopted for fingerprint matching, such as k-Nearest-Neighbor (KNN) [49], SVM [96], and Bayesian inference [50, 97]. SateLoc [11] proposes a weighted combination strategy for multi-gateway likelihood maps based on fingerprint matching and selects the point with the highest likelihood as the predicted location. SateLoc achieves a 43.5 m median localization error in a 227,500 m² urban area. Based on our LoRaWAN setting, we adopt link RSSI localization, which is similar to SateLoc, and provide a detailed localization comparison with the data collected from our mobile LoRa system.
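As an illustration of fingerprint matching in general, the sketch below implements a plain k-nearest-neighbor lookup. It is not SateLoc's likelihood-map method or the procedure evaluated in this chapter; the database layout, the two-gateway fingerprints, and all values are hypothetical.

```python
import math

def knn_locate(fingerprint, database, k=3):
    """Estimate a position as the centroid of the k closest reference fingerprints.

    fingerprint: tuple of signal values, one per gateway (e.g., in dBm).
    database: list of ((x_m, y_m), reference_fingerprint) entries.
    """
    ranked = sorted(database, key=lambda entry: math.dist(fingerprint, entry[1]))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return (x, y)

# Hypothetical reference map with two gateways.
reference_map = [
    ((0.0, 0.0), (-90.0, -100.0)),
    ((100.0, 0.0), (-95.0, -92.0)),
    ((0.0, 100.0), (-110.0, -85.0)),
]
print(knn_locate((-94.0, -93.0), reference_map, k=1))
```

With only two gateways, each fingerprint has two dimensions, so distinct locations can easily share similar fingerprints; this is one intuition for why sparse deployments limit accuracy.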

3.3 System and Dataset Overview

In this section, we first briefly review the LoRaWAN technical specification and define the

LoRa coverage problem. An overview is then given on the system architecture, configuration, and

deployment. Finally, we show the measurements and analysis results from our deployed mobile

LoRa system.


Figure 3.2 The structure of a LoRa packet.

3.3.1 LoRaWAN Primer

We illustrate the architecture of LoRaWAN in Figure 3.1, which operates in the infrastructure mode. Multiple LoRa end nodes run the LoRa-MAC (media access control) and LoRa-PHY protocols and connect to the gateways in their communication range. Upper-layer protocols such as TCP, 6LoWPAN, and CoAP are not involved in the LoRaWAN networking stack yet. Hence, we mainly focus on the link-layer performance. Upon receiving the LoRa packets, gateways forward them to LoRa network servers for further processing. Note that there is no energy constraint on the gateway in most scenarios [77, 98], since gateways are usually connected to the network servers through cellular or wired networks. As packet forwarders, gateways also forward the control messages (e.g., PHY configurations, MAC settings) from network servers to end nodes. Finally, network servers filter duplicated LoRa packets and deliver the valid ones to application servers for different applications.

As for LoRa networking, LoRa-PHY uses CSS to modulate data symbols. Figure 3.2 shows the structure of a LoRa packet, which consists of the preamble, start frame delimiter (SFD), and payload. Specifically, the preamble consists of multiple base up-chirps, followed by the SFD with 2.25 base down-chirps for packet detection and alignment. The payload contains multiple modulated chirps whose shifted initial frequencies encode the data bits. In LoRa-PHY, three parameters (i.e., bandwidth (BW), spreading factor (SF), and coding rate (CR)) can be configured to adapt the communication range. For example, BW determines the frequency range of a chirp symbol, such as 125, 250, and 500 kHz, in which a small BW corresponds to an extensive communication range [77]. SF denotes the number of data bits a chirp symbol represents, ranging from 7 to 12.


Figure 3.3 We deploy two gateways and six mobile nodes in the urban areas, covering various land
cover types.

The communication range gets larger as the SF increases, because a larger SF enhances the noise resilience of LoRa signals. Besides, CR introduces data redundancy in the coding process for extra noise tolerance and can be set to 4/5, 4/6, 4/7, or 4/8.
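The cost of a larger SF can be seen from the standard LoRa relations for symbol duration, T_sym = 2^SF / BW, and nominal bit rate, R_b = SF · (BW / 2^SF) · CR. The sketch below evaluates them; the function names are illustrative, and the bit rate ignores preamble and header overhead.

```python
def symbol_duration_s(sf: int, bw_hz: float) -> float:
    """Duration of one chirp symbol: 2^SF chips sent at BW chips per second."""
    return (2 ** sf) / bw_hz

def nominal_bit_rate_bps(sf: int, bw_hz: float, cr: float) -> float:
    """Nominal bit rate: SF bits per symbol, scaled by the coding rate (e.g., 4/5)."""
    return sf * (bw_hz / (2 ** sf)) * cr

for sf in (7, 12):
    t_ms = symbol_duration_s(sf, 125_000) * 1e3
    rate = nominal_bit_rate_bps(sf, 125_000, 4 / 5)
    print(f"SF{sf}: symbol {t_ms:.3f} ms, bit rate {rate:.1f} bit/s")
```

Each increment of SF doubles the symbol duration, so the extra range of SF = 12 comes at the price of a much lower data rate and longer time on air.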

Sitting upon LoRa-PHY, LoRa-MAC adopts an ALOHA-based protocol that allows end nodes to transmit as soon as they wake up, with exponential back-off in case of collisions. However, ISM band regulations impose a maximum 1% transmission duty cycle on end nodes and gateways when using an ALOHA MAC. As a result, the downlink capacity of the gateways is significantly limited, as they need to serve all the surrounding end nodes with relatively scarce transmission opportunities.
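To see how restrictive a 1% duty cycle is, the following sketch computes the silent time a device must observe after one transmission and the resulting hourly packet budget. The 1 s airtime is a hypothetical value chosen for illustration, and the function names are not part of any LoRaWAN library.

```python
import math

def min_off_time_s(airtime_s: float, duty_cycle: float = 0.01) -> float:
    """Silent time required after one transmission to stay within the duty cycle."""
    return airtime_s * (1.0 / duty_cycle - 1.0)

def max_packets_per_hour(airtime_s: float, duty_cycle: float = 0.01) -> int:
    """Number of packets of the given airtime that fit in the hourly airtime budget."""
    budget_s = 3600.0 * duty_cycle
    return math.floor(budget_s / airtime_s + 1e-9)  # epsilon guards float rounding

print(min_off_time_s(1.0))        # seconds of silence after a 1 s packet
print(max_packets_per_hour(1.0))  # packets per hour at 1 s airtime
```

A gateway shares this budget among all the end nodes it serves, which is why downlink opportunities are scarce.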

3.3.2 Our System Overview

We first introduce the hardware and deployment of our mobile LoRaWAN system. As illustrated in Figure 3.3, two gateways 𝐺1 and 𝐺2 and six mobile end nodes (mounted on bicycles and a car) are deployed in the 6×6 km² urban area. Both gateways are equipped with an MCU, an SX1276 transceiver [76], and a Raspberry Pi 3 for remote programming. We further mark the locations of our two gateways 𝐺1 and 𝐺2 on the campus as white points in Figure 3.3(a); they are located on the rooftops of two different buildings at heights of 84 m and 68 m, respectively. Note that the ground altitude of the campus area is about 52 m, and the distance between 𝐺1 and 𝐺2 is 1332.14 m. The gateways are


(a) Gateway 𝐺1

(b) Gateway 𝐺2

(c) Movement Speed

Figure 3.4 The accumulated number of different locations observed by (a) 𝐺1 and (b) 𝐺2 across
different users on different days. (c) Movement speed distribution of six mobile end nodes.

powered by PoE (Power over Ethernet) and provided with Internet access. Thus they can forward the LoRa packets to our network and application servers running in the cloud (e.g., DigitalOcean). On the transmitter side, the LoRa end nodes are implemented with an MCU, an SX1278 transceiver, and a GPS unit, as shown in Figure 3.3(b). Figure 3.3(c) illustrates the five LoRa end nodes mounted on different bicycles; the remaining end node is placed inside a BYD car under the front windshield. These end nodes move freely with the bicycles/car in the users’ daily life without any constraints. To save power, they send a packet to the gateways every five seconds only while they are moving.

By default, our experiment uses the spreading factor 𝑆𝐹 = 12, bandwidth 𝐵𝑊 = 125 kHz, and coding rate 𝐶𝑅 = 4/5. We enable a regulation-compatible power amplifier controlled by the register PA_HP [76] and connected to the pin PA_BOOST [76] on the SX1278 transceiver. The total transmission power reaches 19 dBm, which complies with the regulations. The operating channels of the six end nodes are set to 486.3 MHz, 486.5 MHz, 486.7 MHz, 486.9 MHz, 487.1 MHz, and 487.3 MHz, respectively, so that we avoid potential packet loss due to collisions between different end nodes. The experiment spans four months, during which the end node owners traveled as usual (e.g., to eat, to the office, and home). Thus the collected data records only cover parts of the whole area. To obtain the land-cover types in this area (including buildings, roads, parking lots, lakes, a river, grassland, trees, and a playground) for the LoRa-based localization, we apply satellite remote sensing imaging to the whole area of interest, following the instructions in existing works [10, 11, 79].


(a) Gateway 𝐺1

(b) Gateway 𝐺2

Figure 3.5 The spatial distribution of data records in the view of 𝐺1 and 𝐺2.

3.3.3 Collected Dataset Overview

This section provides a detailed description of our collected LoRa dataset, spanning from Dec 22 to Mar 15. Considering the fast movement of an end node, the transmission interval between two adjacent packets is set to 5 s. We encode the GPS coordinates, timestamps, and sequence numbers into the payload of LoRa packets, and the corresponding SNR and RSSI are logged at the gateway side. After the packets are received, we extract the logged data records from the network server, keeping the duplicate packets received at both gateways, which yields over 30,000 records in total. Besides, we can calculate the link distance and the height difference between each end node and gateway pair by decoding the GPS data in the payloads.
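One common way to derive the link distance from decoded GPS fixes is the haversine great-circle formula. The sketch below is an assumption about how such a computation could look, not the procedure used in this study; it further assumes that each fix carries an altitude, and the coordinates in the example are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def link_geometry(node, gateway):
    """Return (horizontal distance in m, height difference in m).

    node, gateway: (lat_deg, lon_deg, altitude_m) tuples.
    """
    distance = haversine_m(node[0], node[1], gateway[0], gateway[1])
    return distance, gateway[2] - node[2]

# Hypothetical fixes: a node at ground level (52 m) and a rooftop gateway (84 m).
print(link_geometry((30.000, 114.000, 52.0), (30.010, 114.000, 84.0)))
```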

We further illustrate the measuring locations on the main roads of the 6×6 km² urban area in Figure 3.5. The yellow points indicate the moving end nodes and the red ones indicate the gateways 𝐺1 and 𝐺2, which act as the receivers, for successful packet transmissions. The maximum communication range of 𝐺1 and 𝐺2 can be larger than 3 km. Additionally, we observe similar trajectories of end nodes for 𝐺1 and 𝐺2, but the PDR on the same road is quite diverse. For example, both 𝐺1 and 𝐺2 perform poorly on the right-center roads. However, 𝐺1 has better coverage of the left-bottom road, while 𝐺2 performs better on the middle-top road, especially the part north of the river at the top. This observation shows that the maximum communication range is too coarse-grained to understand the coverage of LoRaWAN in an urban area, and a finer-grained


report on the measurement of LoRaWAN is required.

To demonstrate the coverage of both gateways statistically in our mobile LoRa system, we show the total number of covered locations with successful transmissions per end node and per day in Figure 3.4. Specifically, we use a 10×10 m² square block to define a “location”, so the whole area is divided into 600×600 locations. For each end node, we count the transmitting locations along its trajectories. For example, if more than one packet is received in a new location, we count that location once for the current end node when deriving the total number of covered locations. Figures 3.4(a) and 3.4(b) show that 𝐺1 observes covered locations on 23 different days, while 𝐺2 observes data on 19 days. Regarding the successful transmissions on each day, the maximum and minimum numbers of locations observed by 𝐺1 are 452 and 7, while those observed by 𝐺2 are 352 and 10, respectively. From the view of mobile end nodes, end nodes 1 (red) and 2 (deep blue) contribute the most data records in different locations on most days. The other nodes cover a varying number of locations. For example, end node 4 (orange) delivers the most covered locations on only two days.
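Counting covered locations amounts to mapping each received packet to a grid cell and counting the distinct cells. A minimal sketch, assuming the GPS fixes have already been projected to local planar coordinates in meters measured from one corner of the 6×6 km² area; the names and sample points are illustrative.

```python
def location_index(x_m: float, y_m: float, cell_m: float = 10.0) -> tuple:
    """Map a planar coordinate to the (column, row) index of its square cell."""
    return int(x_m // cell_m), int(y_m // cell_m)

def covered_locations(points, cell_m: float = 10.0) -> set:
    """Distinct cells that contain at least one received packet."""
    return {location_index(x, y, cell_m) for x, y in points}

# Three hypothetical packets: the first two fall into the same 10 m cell.
packets = [(1.0, 1.0), (9.0, 9.0), (11.0, 1.0)]
print(len(covered_locations(packets)))
```

Using a set makes each location count once, no matter how many packets were received there.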

To measure the mobility of our end nodes, we further calculate the speed of each end node from the timestamps of two adjacent locations in a trajectory. The speed distributions (i.e., min, 25%, median, 75%, and max) of different end nodes are shown in Figure 3.4(c). The maximum observed speed is about 25 m/s (90 km/h) from end node 1 (i.e., the BYD car). The median speed is less than 5 m/s (18 km/h) for most nodes, which is reasonable for a bicycle. Note that the data records related to end node 1 are taken during the morning and afternoon traffic peak hours. Since LoRa-PHY is resilient to the Doppler effect [52] in the range of our observed speeds, we can use these data records to estimate an equivalent PDR for the different transmitting locations of the nodes.

3.4 Link Behavior Study

Given our collected dataset with mobile LoRa nodes, we study the LoRa link behavior in the urban area. Two metrics, ESP and PDR, are adopted to indicate the signal path loss over a physical channel and reliable coverage in an area. By carefully analyzing their spatial and temporal distributions, we establish a PDR prediction model that associates a position’s computed ESP value to the estimated PDR.

3.4.1 Estimation Methodology on Metrics

ESP Estimation. We use ESP to depict the LoRa signal attenuation over a long-distance transmission. Although RSSI is a widely adopted indicator of the signal attenuation of a physical link in WSNs [99–101] and Wi-Fi [102], it can be more error-prone below the noise floor in LoRaWAN. Thus, we choose ESP, which combines RSSI and SNR, to calibrate the expected signal path loss in our measurement study. It can be calculated as follows [49, 79]:

$$\mathrm{ESP} = \mathrm{RSSI} + \mathrm{SNR} - 10\log_{10}\left(1 + 10^{0.1\,\mathrm{SNR}}\right) \tag{3.1}$$

where RSSI is the received signal strength indicator, and SNR is the signal-to-noise ratio. Given a received data packet, its RSSI and SNR are automatically calculated by the gateway and forwarded to the network server.
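Equation (3.1) translates directly into code. The sketch below assumes RSSI in dBm and SNR in dB, as reported by the gateway; the function name and the sample values are illustrative.

```python
import math

def esp_dbm(rssi_dbm: float, snr_db: float) -> float:
    """Expected signal power according to Eq. (3.1)."""
    return rssi_dbm + snr_db - 10.0 * math.log10(1.0 + 10.0 ** (0.1 * snr_db))

print(esp_dbm(-100.0, 10.0))   # high SNR: ESP stays close to RSSI
print(esp_dbm(-120.0, -10.0))  # negative SNR: ESP drops to about RSSI + SNR
```

When the SNR is high, the last term nearly cancels the SNR and ESP approaches the RSSI; when the signal is below the noise floor, the last term vanishes and ESP approaches RSSI + SNR, which removes the noise contribution from the reading.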

PDR Estimation. Given a PDR threshold, the PDR at each position can be used to determine the coverage of our mobile LoRa system. Due to the mobility of the end nodes, the data packets are scattered along various trajectories. Our basic idea is to calculate the PDR of a specific position from all trajectories that pass through that position according to their coordinates.

Given this trajectory-based PDR estimation method, a trade-off should be considered between the position granularity and the estimation accuracy. On the one hand, a fine-grained position granularity is desirable so that the micro-differences across the “positions” observed by our mobile end nodes can be reflected. On the other hand, the number of available trajectories per observed location decreases if we split the urban area at a much finer-grained scale to represent a position. Consequently, the PDR accuracy of mobile end nodes can suffer from the estimation bias caused by limited trajectories. For example, assuming the true PDR of a position is 90%, the calculated PDR is only 80% if one of five observed packets is lost. More than ten packet records are required for each position to mitigate the scarce trajectory distribution.

In practice, we first divide the 6×6 𝑘𝑚2 area into 1,600 150×150 𝑚2 square blocks. Each block


represents a position denoted as 𝑝(𝑖, 𝑗) to balance the estimation granularity and the estimation

error, where 𝑖 and 𝑗 are the coordinates of the corresponding block. Assuming the average speed

of an end node is 3 𝑚/𝑠 from Figure 3.4(c), the packet interval between two adjacent transmissions

is set as 5 𝑠. Thus the end node can travel through 150 𝑚 for ten continuous packet records.
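The block arithmetic above can be checked with a short sketch. It assumes node coordinates have already been converted from GPS to local planar offsets in metres from one corner of the area; that conversion and the helper name `block_index` are assumptions made for illustration.

```python
AREA_SIDE_M = 6000      # 6 km x 6 km urban area
BLOCK_SIDE_M = 150      # 150 m x 150 m square blocks
AVG_SPEED_MPS = 3       # average node speed, as read from Figure 3.4(c)
PACKET_INTERVAL_S = 5   # interval between adjacent transmissions


def block_index(x_m: float, y_m: float, block_side_m: int = BLOCK_SIDE_M):
    """Map a local planar offset (metres) to block coordinates (i, j)."""
    return int(x_m // block_side_m), int(y_m // block_side_m)


blocks_per_side = AREA_SIDE_M // BLOCK_SIDE_M                      # 40 blocks per side
total_blocks = blocks_per_side ** 2                                # 1,600 blocks in total
distance_ten_packets_m = AVG_SPEED_MPS * PACKET_INTERVAL_S * 10    # 150 m
```

Under these settings a node moving at the average speed crosses roughly one block while sending ten packets, which is why the block side and the ten-packet requirement fit together.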

Upon receiving the LoRa packets at the gateway side, we first extract all trajectories for each

end node. Then, we estimate all 𝑛 positions that a trajectory 𝑡 covers. For the 𝑘 𝑡ℎ position 𝑝𝑡 (𝑘)

of trajectory 𝑡, we use the sequence numbers of the data records to count the total number of trans-

mitted LoRa packets passing through the current position, denoted as 𝑐𝑡 ( 𝑝𝑡 (𝑘)). And the number

of successfully received LoRa packets is denoted as 𝑐𝑟 ( 𝑝𝑡 (𝑘)). The trajectory 𝑡 only contributes a

valid PDR estimation as 𝑐𝑟 ( 𝑝𝑡 (𝑘))/𝑐𝑡 ( 𝑝𝑡 (𝑘)) for the position 𝑝𝑡 (𝑘) when 𝑐𝑡 ( 𝑝𝑡 (𝑘)) is larger than

10. When we traverse all trajectories to compute their PDR estimations for the covered positions,

we calculate the average value with all PDR estimations for each position.
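The trajectory-based estimation can be sketched as follows. The sketch assumes that each trajectory is available as a list of `(sequence_number, position)` pairs for the packets that were received, and that the number of transmitted packets inside a position is the span between the first and last sequence number seen there. The text does not spell out that counting rule, so it is an assumption, as are the function names.

```python
from collections import defaultdict

MIN_TX = 10  # a trajectory contributes to a position only when c_t exceeds this


def trajectory_pdr(records):
    """Per-position PDR contributed by ONE trajectory.

    records: list of (seq_no, position) for the received packets.
    """
    seqs_by_pos = defaultdict(list)
    for seq_no, pos in records:
        seqs_by_pos[pos].append(seq_no)
    estimates = {}
    for pos, seqs in seqs_by_pos.items():
        c_r = len(set(seqs))              # received packets at this position
        c_t = max(seqs) - min(seqs) + 1   # assumption: sequence gaps are lost packets
        if c_t > MIN_TX:
            estimates[pos] = c_r / c_t
    return estimates


def position_pdr(trajectories):
    """Average the valid per-trajectory estimates for each position."""
    per_pos = defaultdict(list)
    for records in trajectories:
        for pos, pdr in trajectory_pdr(records).items():
            per_pos[pos].append(pdr)
    return {pos: sum(v) / len(v) for pos, v in per_pos.items()}


# Synthetic example: trajectory 1 loses one of 12 packets in block (3, 4);
# trajectory 2 receives all 12 there, plus 3 packets in block (3, 5).
t1 = [(s, (3, 4)) for s in range(12) if s != 5]
t2 = [(s, (3, 4)) for s in range(100, 112)] + [(s, (3, 5)) for s in range(112, 115)]
result = position_pdr([t1, t2])
```

In this example block (3, 5) receives no estimate because only three packets were sent there. Note that the span-based count cannot see packets lost at the very start or end of a block traversal.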

Furthermore, we adaptively enlarge the area of a position where the number of observed packets

is fewer than five but not zero. Specifically, if the total number of packet transmissions is less than 5

for position 𝑝𝑡 (𝑘), we keep increasing the area of the position by adding its adjacent blocks until

more than 5 data records are reported. For 𝐺1, blocks with fewer than 5 packets account for 12.16% of

all blocks. For 𝐺2, blocks with fewer than 5 packets account for 12.97%. In this way, we deliver

a reliable PDR estimation for those covered positions with one or two lossy trajectories (e.g., the

right-middle roads for 𝐺1 and 𝐺2 in Figure 3.5).

3.4.2 Overall PDR and ESP Distribution

We further demonstrate the estimated PDR and ESP across different positions for 𝐺1 and 𝐺2.

Illustrated in Figure 3.6(a), the CDFs of PDR are distributed similarly for 𝐺1 (blue dashed

curve) and 𝐺2 (solid red curve). In comparison, 𝐺1 provides a slightly better PDR for the covered

positions than 𝐺2 does. For 𝐺1, 60% of links are reliable with a PDR higher than 90%, and

the remaining 40% of LoRa links are intermediate links with dynamic link behaviors.

Figure 3.6(b) further shows the CDFs of ESP in all recorded data packets. We can observe that

the minimum ESP is −142.3 𝑑𝐵𝑚 for all packets, which is consistent with the reported −148 𝑑𝐵𝑚


(a) CDF of PDR

(b) CDF of ESP

Figure 3.6 CDFs of PDR and ESP observed at 𝐺1 and 𝐺2.

for the sensitivity of SX1276 [76]. Notice that LoRa gateways with different transceiver types

definitely receive signals at different sensitivity levels, resulting in a varied link budget. Compared

with 𝐺1, the ESP observed at 𝐺2 is much higher. For example, 𝐺1 has 20% ESP higher than

−120 𝑑𝐵𝑚 and the maximum ESP is −80 𝑑𝐵𝑚. However, 80% ESP of 𝐺2 is higher than −120 𝑑𝐵𝑚,

and the maximum ESP is approaching −47.34 𝑑𝐵𝑚. As shown in Figure 3.3(a), we attribute the

ESP difference to the deployment environment. 𝐺1’s antenna is partially hidden by the wall and

railing while there is no obstacle for 𝐺2.

Remark. Figure 3.6 reflects the distribution inconsistency between PDR and ESP. Due to the

strong noise tolerance ability of LoRa, low ESP (e.g., median value -127 dB) can achieve similar

PDR distribution as high ESP (e.g., median value -87 dB) does.

3.4.3 Spatial PDR and ESP Distribution

We study the spatial distribution of PDR and ESP regarding the link distance. For each position

(e.g., 150×150 𝑚2 block), the distance between the center of the block to a gateway location is first

calculated as its distance. And we use the GPS coordinates to compute the distance between the

end node and a gateway for each data packet. The spatial PDR distribution is shown in Figure 3.7.

A similar spatial distribution can be observed at 𝐺1 and 𝐺2, where the intermediate links with

low PDR are scattered at all distance levels. We further illustrate the spatial ESP distribution in

Figure 3.8. As the distance increases, the ESP values are reduced for both 𝐺1 and 𝐺2 and scattered

in a relatively wide range at different distance levels. Specifically, the maximum range of ESP

values is from −140 𝑑𝐵𝑚 to −95 𝑑𝐵𝑚 at 𝐺1 in Figure 3.8(a), when the distance is about 1,000 𝑚.


(a) PDR vs Distance at 𝐺1

(b) PDR vs Distance at 𝐺2

Figure 3.7 The spatial distribution for PDR and distance.

In contrast, it is from −139 𝑑𝐵𝑚 to −100 𝑑𝐵𝑚 for 𝐺2 in Figure 3.8(b). Additionally, the longest

distance observed by ESP is about 3.5 𝑘𝑚, which is longer than 3.2 𝑘𝑚 observed by PDR. The

main reason is that the data records reported at those long-distance positions are from the end node

1 (i.e., the car). And it becomes hard to observe enough data records in our defined position area

due to the high mobility, resulting in a failed estimation of PDR in long-distance areas.

Remark. The distance of a LoRa link is weakly associated with its PDR and ESP. A rough

estimation of ESP can be given with the link distance (Figure 3.8), but the link distance cannot be

used for fine-grained PDR prediction (Figure 3.7).

3.4.4 Temporal PDR and ESP Distribution

The temporal distribution of PDR and ESP is evaluated for transmission days. We first associate

the trajectories per day to each position and then compute the standard deviation of per-day PDR

values to depict the temporal PDR changes for each position. As for ESP, we first divide the whole

area into 360,000 10×10 𝑚2 blocks and then calculate the average ESP of the associated data records

to represent the ESP of the block. The standard deviation of ESP values can be further derived for

each block.
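A minimal sketch of the per-position temporal deviation is given below. It assumes the per-day PDR (or average ESP) values have already been grouped by position, and it uses the population standard deviation; the text does not state whether the population or sample form was used, so that choice is an assumption.

```python
from statistics import pstdev


def temporal_deviation(per_day_values):
    """Standard deviation of per-day values for each position.

    per_day_values: {position: [value_day1, value_day2, ...]}
    Positions observed on fewer than two days are skipped.
    """
    return {
        pos: pstdev(values)
        for pos, values in per_day_values.items()
        if len(values) >= 2
    }


# Synthetic per-day PDR values for two hypothetical positions.
per_day_pdr = {(3, 4): [0.9, 1.0, 0.8], (7, 2): [0.95]}
deviation = temporal_deviation(per_day_pdr)
```

The same helper applies to ESP by passing per-day average ESP values for each 10×10 m block instead of per-day PDR values.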

We show the CDFs of PDR and ESP deviation in Figure 3.9(a) and 3.9(b), respectively. On the

one hand, 𝐺1 and 𝐺2 exhibit analogous temporal deviation on PDR and ESP. For example, 30% of

positions have more than 5 dB variance for ESP. And the maximum ESP deviation is about 15 𝑑𝐵.

Besides, more than 10% variances of PDR are reported for 40% of positions. And the maximum

PDR deviation is larger than 30%. On the other hand, the only difference in temporal distribution


(a) ESP vs Distance at 𝐺1

(b) ESP vs Distance at 𝐺2

Figure 3.8 The spatial distribution for ESP and distance.

(a) PDR deviation

(b) ESP deviation

Figure 3.9 The standard deviation of ESP and PDR observed on different days.

over time is from the micro-environment (e.g., surrounding obstacles like other bicycles and cars),

demonstrating the significant impact of the micro-environment patterns on the link performance for

different end nodes.

Remark. LoRa links are highly dynamic over time in an urban environment, shown in Fig-

ure 3.9, which can be attributed to the frequently varying micro-environment [80].

3.4.5 ESP based PDR Prediction

Based on the above observations on PDR and ESP distributions, we build a PDR prediction

model by feeding ESP as input. First, we calculate the average ESP of all observed data records for

each associated position in the urban area. Given the measured PDR for covered areas, we obtain a

variety of pairs of PDR and ESP. Then, the Gaussian process regression (GPR) [103] is adopted to

predict the numerical PDR from ESP for those uncovered areas.

To achieve a more accurate regression learner, we choose the exponential function as the kernel

function and complete the fitting processing, shown in Figure 3.10. Statistically, the root-mean-


(a) PDR vs ESP at 𝐺1

(b) PDR vs ESP at 𝐺2

Figure 3.10 Gaussian process regression analysis between PDR and ESP at 𝐺1 and 𝐺2.

square error is 0.12448 and 0.13678 for 𝐺1 and 𝐺2, and the coefficient of determination is 0.84

and 0.82, respectively. From the raw data pairs (e.g., blue dot), when ESP is lower than −133 𝑑𝐵𝑚

and −131 𝑑𝐵𝑚 for both gateways, the measured PDR nears 0 based on our measurement study.

Additionally, an 11 𝑑𝐵 wide transition zone (i.e., [−131 𝑑𝐵𝑚, −120 𝑑𝐵𝑚]) can be observed in both

𝐺1 and 𝐺2, which is larger than a 3 𝑑𝐵 transition zone in WSNs [101]. The reason is that in

LoRa long-distance communication, LoRa links are affected by more complicated factors and are

less predictable with only ESP, thus introducing more ambiguity. Even when the ESP is larger than

−120 𝑑𝐵𝑚, the PDR achieves a high performance but is not always 100%. And it can decrease below

70% due to a large temporal variance of PDR and ESP observed in §3.4.4. As for the uncovered

areas with the given ESP, the predicted data points (e.g., yellow triangle) exhibit a good match

with the ground truth. However, it cannot reflect the dynamic PDR accurately in our mobile LoRa

system.
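To illustrate the regression step, the following is a dependency-free sketch of Gaussian process regression with an exponential kernel. It is not the fitting procedure used in this study: the kernel length scale, the noise term, the constant-mean treatment, the clipping of predictions to [0, 1], and the (ESP, PDR) pairs are all assumptions chosen for illustration. In practice the hyperparameters would be fitted to the data.

```python
import math


def exp_kernel(a, b, length=5.0, sigma_f=1.0):
    """Exponential kernel k(a, b) = sigma_f^2 * exp(-|a - b| / length)."""
    return sigma_f ** 2 * math.exp(-abs(a - b) / length)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gpr_fit(esp, pdr, noise=1e-4):
    """Precompute alpha = (K + noise * I)^-1 (y - mean)."""
    mean = sum(pdr) / len(pdr)
    K = [[exp_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(esp)] for i, a in enumerate(esp)]
    alpha = solve(K, [y - mean for y in pdr])
    return list(esp), alpha, mean


def gpr_predict(model, x):
    """Posterior mean at ESP value x, clipped to a valid PDR."""
    esp, alpha, mean = model
    y = mean + sum(a * exp_kernel(x, xi) for a, xi in zip(alpha, esp))
    return min(1.0, max(0.0, y))


# Synthetic (ESP in dBm, PDR) pairs that loosely mimic a transition zone.
ESP = [-135.0, -130.0, -125.0, -120.0, -110.0, -100.0]
PDR = [0.0, 0.1, 0.5, 0.9, 0.95, 1.0]
model = gpr_fit(ESP, PDR)
predicted = gpr_predict(model, -128.0)  # PDR estimate for an uncovered position
```

With a small noise term the posterior mean nearly interpolates the training pairs, so any smoothing across the transition zone comes from the kernel length scale.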

Remark. ESP is a relatively good indicator to predict the PDR of a position. A 13 dB transition

zone and the PDR dynamic under large ESP indicate LoRa links are less predictable than other

wireless techniques like Wi-Fi and Zigbee.

3.5 Coverage Area Study

3.5.1 LoRa Coverage Problem

The coverage area indicates where a gateway can reliably communicate with any end node

and is determined by LoRa-PHY and LoRa-MAC. The influence of LoRa-PHY on coverage is

explicit. LoRa-PHY determines a signal-to-noise ratio (SNR) threshold, under which LoRa chirp


symbols cannot be decoded correctly. The SNR thresholds are determined by different LoRa-PHY

configurations [77]. The observed ESP of various LoRa links is related to their distance. Thus,

LoRa-PHY determines the link reliability for LoRa transmissions.

Besides, LoRa-MAC may influence the coverage, too. For example, LoRa-MAC determines

collision probability when multiple LoRa end-nodes are deployed in the same area and share an

identical gateway. WiChronos [104] reported that when an end node transmits a 1-byte message

every ten minutes, the collision probability is about 1.4% for 100 nodes, increasing to 12.75%

for 1000 nodes. However, the influence of collision on the coverage is implicit since the collision

is not determined by link distance but by the transmission schedule. If the schedule is not well

adjusted, the end nodes far from the gateway may not have a higher collision probability than the

end nodes near the gateway even if the transmission of the far end nodes is using a longer signal

on-air time (e.g., larger spreading factor). Therefore, if the transmission schedules of all end-nodes

are uniformly random, the collision will uniformly degrade the transmission reliability for long and

short links, leaving parts of the LoRa-PHY covered area unreliable. Many works [98, 105] focus

on resolving collided LoRa signals to enhance the LoRa transmission reliability.

In our measurement work, we focus on the LoRa-PHY coverage to determine the maximum

area a LoRa gateway can cover. By adjusting the channel of each mobile end node to a different

frequency (§3.3.2), there is no signal collision in our collected datasets.

(a) 𝐺1

(b) 𝐺2

Figure 3.11 The heatmap of PDR values for different positions in the urban area.


(a) 𝐺1

(b) 𝐺2

Figure 3.12 CDF of predicted PDR with different ESP gains.

3.5.2 Methodology and Implementation

In this section, we study the coverage of each gateway in our deployed mobile LoRa system.

The coverage area is defined as the total area of the positions with a PDR value

larger than 70%. Specifically, by dividing the urban area into “positions” (150×150 𝑚2), we first

compute the corresponding PDR with our data records for those covered areas. For the uncovered

positions, we adopt DeepLoRa [79] to estimate an average ESP for each position. Then, we

can predict the associated PDR based on the PDR-ESP regression model in § 3.4.5. Figure 3.11

illustrates the distribution of calculated and predicted PDR values for all positions in the urban

areas. We can observe an irregular PDR distribution in different directions for both 𝐺1 and 𝐺2.

And the covered positions for 𝐺1 and 𝐺2 are distributed non-uniformly. Statistically, the coverage

area of 𝐺1 and 𝐺2 is 11.4 𝑘𝑚2 and 11.6 𝑘𝑚2, respectively, far from covering all 6×6 𝑘𝑚2 reliably.
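The coverage-area definition reduces to counting the blocks above the PDR threshold and multiplying by the block area. A minimal sketch follows; the function name and the example PDR values are hypothetical.

```python
BLOCK_AREA_KM2 = 0.150 * 0.150   # one 150 m x 150 m position, in square kilometres
PDR_THRESHOLD = 0.70             # a position counts as covered above this PDR


def coverage_area_km2(pdr_by_block, threshold=PDR_THRESHOLD):
    """Total area of the positions whose PDR exceeds the threshold."""
    covered = sum(1 for pdr in pdr_by_block.values() if pdr > threshold)
    return covered * BLOCK_AREA_KM2


# Hypothetical measured or predicted PDR values for four positions.
example = {(0, 0): 0.95, (0, 1): 0.71, (1, 0): 0.70, (1, 1): 0.20}
area = coverage_area_km2(example)   # two positions qualify
```

Because the comparison is strict, a position with a PDR of exactly 70% is not counted, matching the "larger than 70%" wording.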

3.5.3 Coverage Improvement with ESP Gain

To enhance the coverage area of each gateway in the wild, several systems [50, 106, 107] have

been proposed to cooperate with multiple gateways for extra SNR gains of received LoRa signals.

For example, an SNR gain of 2 ∼ 3 𝑑𝐵 can be achieved through the coherent combining across three

or more gateways [50, 106]. Equation 3.1 shows the SNR gain is equivalent to the ESP gain. To

quantify the relationship between the ESP gains and the coverage area in our deployed system,

we manually add an ESP gain for each position and then recalculate the corresponding PDR under

the enhanced ESP. For fairness, different ESP gains from 2 𝑑𝐵 to 10 𝑑𝐵 are selected randomly,

resulting in the CDF of predicted PDR in Figure 3.12. As the extra ESP gains go up, the PDR


increases as well. For example, the median PDR improvement can reach 48.6% to 62.8% at 𝐺1

with a 3 𝑑𝐵 ESP gain, shown in Figure 3.12. And it gets larger from 50.3% to 62.7% when the ESP

gain is 10 𝑑𝐵, delivering a covered area with all PDR values larger than 70%. The observations in

Figure 3.12 verify the effectiveness of the SNR enhancement method.
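The equivalence between SNR gain and ESP gain can be made explicit from Equation 3.1. The short derivation below is a sketch that assumes the RSSI term is unchanged and that the link operates below the noise floor, which is the regime where coverage is marginal.

```latex
\frac{\partial\,\mathrm{ESP}}{\partial\,\mathrm{SNR}}
  = 1 - \frac{10^{0.1\,\mathrm{SNR}}}{1 + 10^{0.1\,\mathrm{SNR}}}
  = \frac{1}{1 + 10^{0.1\,\mathrm{SNR}}},
\qquad
\mathrm{SNR} \ll 0\ \mathrm{dB}
  \;\Rightarrow\;
  \mathrm{ESP} \approx \mathrm{RSSI} + \mathrm{SNR},
  \quad
  \Delta\mathrm{ESP} \approx \Delta\mathrm{SNR}.
```

At an SNR of −10 dB the slope is 1/1.1 ≈ 0.91, so an extra decibel of SNR yields almost a full decibel of ESP. The approximation loosens as the SNR approaches 0 dB, where the slope falls to 0.5.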

Table 3.1 Coverage area under different ESP gains.

ESP Gains (dB)              0       2       3       6       10

𝐺1 Coverage Area (𝑘𝑚²)      11.4    15.2    17.7    27.1    35.9

𝐺2 Coverage Area (𝑘𝑚²)      11.6    15.3    17.3    23.7    33.0

Illustrated in Table 3.1, we further adopt the enhanced PDR to calculate the coverage area. And

a steady improvement of the coverage area can be observed at 𝐺1 and 𝐺2 as the ESP gains increase.

Given a 2 𝑑𝐵 ESP gain, the coverage area can be increased by 32.6% on average. And we can approximately

cover the whole urban area of 6×6 𝑘𝑚2 via only one gateway, with a given ESP gain of 10 𝑑𝐵.
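The relative improvements can be recomputed directly from the values in Table 3.1. In the sketch below, the 32.6% figure for a 2 dB gain appears to correspond to the mean of the two gateways' relative increases; that reading is an inference from the table rather than something stated in the text.

```python
GAINS_DB = [0, 2, 3, 6, 10]
COVERAGE_KM2 = {                       # values copied from Table 3.1
    "G1": [11.4, 15.2, 17.7, 27.1, 35.9],
    "G2": [11.6, 15.3, 17.3, 23.7, 33.0],
}
TOTAL_AREA_KM2 = 36.0                  # the 6 km x 6 km urban area


def relative_gain(gateway: str, gain_db: int) -> float:
    """Relative coverage increase over the 0 dB baseline."""
    areas = COVERAGE_KM2[gateway]
    return areas[GAINS_DB.index(gain_db)] / areas[0] - 1.0


mean_2db = (relative_gain("G1", 2) + relative_gain("G2", 2)) / 2
covered_fraction_10db = {g: a[-1] / TOTAL_AREA_KM2 for g, a in COVERAGE_KM2.items()}
print(f"{mean_2db:.1%}")
```

With a 10 dB gain the covered fractions of the 36 km² area are about 99.7% for G1 and 91.7% for G2, consistent with the statement that one gateway can approximately cover the whole area.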

Remark. Due to the observed link dynamics, the coverage area of a gateway is usually irregular.

Beyond deploying new gateways, it can be more effective to enlarge the coverage area of a gateway

by capturing extra SNR gains of LoRa signals.

3.6 Localization Accuracy Study

3.6.1 Methodology and Implementation

Recent years have witnessed a variety of localization systems [11, 47, 79, 108–111] built on the

knowledge of LoRa link behaviors with path loss. Among them, SateLoc [11] is the state-of-the-art method.

The basic method is illustrated as follows:

Suppose we have several gateways to cover a certain area for localization. Each gateway will gener-

ate an ESP map as a part of the fingerprint map. The whole area is split into many geography cells,

which indicate the location unit in the localization process. Given the 𝑚𝑡ℎ gateway’s ESP map, the

likelihood 𝐿𝑚,𝑖 that an end node 𝑒 is located in the 𝑖𝑡ℎ cell can be formulated as follows:

$$L_{m,i} = 1 - \frac{|E_{m,e} - E_{m,i}|}{\max(|E_{m,e} - E_{m}|) - \min(|E_{m,e} - E_{m}|)} \tag{3.2}$$


where 𝐸𝑚,𝑒 is the average ESP value of the packets transmitted by the end node 𝑒 and

received at the 𝑚𝑡ℎ gateway, 𝐸𝑚,𝑖 is the ESP value predicted by path loss models at the 𝑖𝑡ℎ cell in

the 𝑚𝑡ℎ ESP map, and 𝐸𝑚 denotes the ESP values over all cells of that map. The likelihood is thus scaled and normalized according to the value range of

differences between the received ESP and the ESP values in the 𝑚𝑡ℎ ESP map. Given the likelihood map for

each gateway, the fingerprint-based localization leverages the joint likelihood of multiple gateways,

in which the cell with the highest likelihood is selected as the predicted location:

$$\mathit{Location} = \arg\max_{i} \sum_{m=1}^{M} L_{m,i} \tag{3.3}$$
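Equations 3.2 and 3.3 can be sketched in a few lines of Python. The data layout (one dictionary of predicted ESP values per gateway, keyed by cell), the function names, and the handling of a map with no spread in ESP differences are assumptions made for illustration; they do not reproduce SateLoc's implementation.

```python
def likelihood_map(e_obs, esp_map):
    """Equation 3.2 for one gateway.

    e_obs: observed average ESP of the end node at this gateway (dBm).
    esp_map: {cell: predicted ESP in dBm} for this gateway.
    """
    diffs = {cell: abs(e_obs - e) for cell, e in esp_map.items()}
    span = max(diffs.values()) - min(diffs.values())
    if span == 0:
        # Assumption: Equation 3.2 is undefined here; treat the map as uninformative.
        return {cell: 0.0 for cell in diffs}
    return {cell: 1.0 - d / span for cell, d in diffs.items()}


def locate(observations, esp_maps):
    """Equation 3.3: the cell that maximizes the summed likelihood over gateways."""
    maps = [likelihood_map(observations[g], esp_maps[g]) for g in esp_maps]
    cells = list(maps[0].keys())
    return max(cells, key=lambda c: sum(m[c] for m in maps))


# Hypothetical three-cell ESP maps for two gateways.
esp_maps = {
    "G1": {(0, 0): -100.0, (0, 1): -110.0, (1, 0): -120.0},
    "G2": {(0, 0): -120.0, (0, 1): -105.0, (1, 0): -95.0},
}
observations = {"G1": -111.0, "G2": -104.0}
best_cell = locate(observations, esp_maps)
```

Following Equation 3.2 as written, the likelihood of a poorly matching cell can be negative when the smallest difference is not zero. This does not affect the arg max in Equation 3.3.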

To evaluate the performance of LoRa link-based localization systems in our deployment, we

implement SateLoc based on four different path loss models for ESP map generation, including

Bor model [1], PATH/INTERSECTION [10], SateLoc [11] and DeepLoRa [79]. To obtain the

remote sensing images for the environmental analysis for PATH/INTERSECTION, SateLoc, and

DeepLoRa, we first use the Sentinel-2 open-access API to get multi-spectral images of 10 𝑚 res-

olution for all four path loss models. The models are then trained with the collected dataset in

our deployed system, delivering two ESP maps, one for each gateway. Each pixel in our ESP map

corresponds to a 10 × 10 𝑚2 cell in a real map. Note that the evaluated data points are filtered from

the whole dataset, in which each packet record contains the ESP values of the same frame from

the end node received at two gateways. Finally, we collected available data records covering 1,495

different 10 × 10 𝑚2 locations.

3.6.2 Overall Comparison of Localization Accuracy

Figure 3.14(a) shows the CDF of localization error for the comparison study of localization

accuracy. On our dataset, even with the most accurate model, DeepLoRa [79], the median localization

error is as high as 400 𝑚, while the approach in SateLoc [11] yields a median localization

error of about 500 𝑚. The worst localization error of those state-of-the-art models can even

reach 2,000 𝑚.

This localization accuracy is much worse than that reported by SateLoc [11]. The best accuracy

achieved by SateLoc shows that 100% localization error is within 100 𝑚 and the median localization


Figure 3.13 Spatial distribution of the localization errors.

error is 43.5 𝑚 given the multi-spectral images of 50 𝑐𝑚 resolution for three gateways.

This is reasonable due to the property difference of the datasets used. On the one hand, only two

gateways are deployed in our system, resulting in serious fingerprint ambiguity compared to three

or more gateways. On the other hand, the localization accuracy is bounded by the resolution of cell

splitting: the fine-grained link estimation is based on cell splitting, and a cell is the smallest

unit of distance comparison in our system. Therefore, a less fine-grained cell splitting can induce

much higher localization errors in urban areas. For example, our median localization

error is 40 cells (which equals 400 𝑚 since each cell is a 10 𝑚 × 10 𝑚 area). With a similar cell-wise

error, we could get a median error of 20 𝑚 if we had access to remote sensing images of 50 𝑐𝑚 resolution,

which would outperform the localization error reported by SateLoc. Thus, improving the resolution of

multi-spectral images can improve localization accuracy. Compared with other models, DeepLoRa

achieves the best performance, consistent with the reported results [79], since DeepLoRa can provide

more accurate ESP estimation than the others, which mitigates many fingerprint ambiguities.

Besides, PATH/INTERSECTION has the worst performance among all approaches.

Given the two generated ESP maps for 𝐺1 and 𝐺2, we further show the spatial distribution of

the localization errors of DeepLoRa [79] in Figure 3.13. The lighter the color is, the smaller the

localization error is. And the PDR reaches 0 for the black areas. An interesting observation is that

the evaluated data records near 𝐺2 have the best accuracy, while the accuracy degrades for the data

records near 𝐺1. The evaluated data records far from both 𝐺1 and 𝐺2 have the worst localization

performance. The reason is twofold. First, the ESP dynamics increase at distant locations, which

makes it hard for DeepLoRa to predict the ESP fingerprint accurately. Second, the ESP


(a) whole area

(b) 500 m Circle (DeepLoRa)

Figure 3.14 The CDF of localization errors under different ESP estimation methods in different
ranges.

value is close to the LoRa sensitivity at long distances. The fingerprint ambiguity is increasing at

many borderline areas in the whole area.

We further reduce the number of evaluated data records to see whether we can achieve a better

localization accuracy when the evaluated data records are close to either 𝐺1 or 𝐺2. We only select

the cells whose distance from the gateway is smaller than 500 m. We use DeepLoRa to generate

the two ESP maps of 𝐺1 and 𝐺2. Figure 3.14(b) shows that the data records around 𝐺2 have more

accurate localization results than those around 𝐺1. The reason is that the ESP observed by 𝐺1

is much more dynamic than 𝐺2 (Section 3.4.3). For 𝐺2, the median localization error is about

220 m. Regarding the 500 m range, it is still hard to support fine-grained localization. As shown

in Figure 3.15, we can still detect different traffic trends under the current median localization error of 500 m by

drawing part of the trajectories of a single end node. The trends of the predicted locations roughly

follow the actual movement of the end node when it stays around a gateway (a), moves across the

blocks, or moves towards (b) or away from (c) a gateway. It is thus possible to apply the localization model to traffic

trend prediction.

Remark. The ESP fingerprint-based localization highly depends on the granularity of the po-

sition unit, the number of gateways, and the distance to gateways. Given two gateways at 100 𝑚2

granularity, a sparse site survey can only achieve road-level localization for traffic trend tracking.

Additionally, the dynamic nature of link ESP in urban areas degrades the localization accuracy.


Figure 3.15 Tracking different traffic trends under median localization error of 500 m.

3.7 Observations, Insight, and Discussion

Observations. We deploy a LoRaWAN with two gateways and six mobile LoRa end nodes. By

taking advantage of mobility, we accumulate data records that last more than 20 days to cover a large

area. Moreover, we develop a mobility adaptive method to achieve the PDR estimation and coverage

area calculation. Based on our link behavior study, we further verify the feasibility of fingerprint-

based LoRa localization in practice. We have three key observations: 1) The temporal link behavior

is highly dynamic, mainly because of micro-environment changes; 2) Obtaining SNR gains

of LoRa signals is an efficient way to enlarge the network coverage; 3) The localization accuracy achieved by

taking LoRa signals as the fingerprint falls far short of what is needed. It highly depends on system deployment

and the granularity of the site survey.

Our Insights. We present a few key insights for the LoRa communication stack and localization

method design in the future as follows:

• To deal with the link dynamics, SF 12 may not be resilient enough. We need a flexible way to

extend SF to 13 or more, which is not supported on commercial off-the-shelf LoRa radios, to avoid

temporal disconnection.

• To deal with the link dynamics, the ad-hoc multiple-hop relay may be an alternative way to

forward the data reliably. How to reduce the energy consumption for forwarders searching at a very

low duty cycle and extra cost to maintain the network status is a critical problem.

• Obtaining SNR gains at a LoRa gateway is an efficient way to enhance the coverage ability.

Hence, how to detect and recover weak signals with less overhead is another important problem.

• The fingerprint-based LoRa localization suffers from hard-to-tame link behavior. More

sophisticated techniques are needed to achieve accurate localization with narrow bandwidth and


low-cost end nodes.

Measurement Universality and Deployment Diversity. With the similar settings of the LoRa

transceivers, our observations may be applicable to other typical urban areas with high-density

obstacles and frequent micro-environment changes as shown in Figure 3. For new areas with great

disparities (e.g., rural areas, forest areas, mountain areas) from our current urban environment, the

results may vary since the link behavior is highly related to the types of different land covers along

the link path. The gateway siting and deployment also affect final results. A higher antenna and

fewer obstacles would result in higher PDR, higher ESP, and better coverage with more LOS links.


CHAPTER 4

FACETOUCH: PRACTICAL FACE TOUCH DETECTION WITH A MULTIMODAL
WEARABLE SYSTEM FOR EPIDEMIOLOGICAL SURVEILLANCE

4.1 Introduction

Face touch is an unconscious and automatic behavior that most of us have. In general, the

face-touch frequency is 15-54 times per hour, and almost half of the face touches come

in contact with mucous membranes [12–15]. The frequency of face-touch behaviors is uneven

between the two hands for humans and other primates, and most self-touches are with the non-dominant

hand [13, 112–114]. During the COVID-19 pandemic, face-touch behaviors significantly expose

humans to an epidemiological risk. Clinical studies have shown that the main route of transmission

of infectious diseases (e.g., COVID-19, Ebola, influenza) is through droplets that are sneezed or coughed

out by infected people and then inhaled by someone else [16–18]. Those droplets can land on

surfaces like tables or chairs, where the viruses can survive for more than 72 hours. When we touch the surfaces, the

viruses on contaminated hands can be easily transferred [14, 19, 115] to the eyes, mouth,

and nose, and then cause respiratory diseases.

Face touch cannot be diminished by physical protection. Also, due to its unconscious nature,

it is a hard-to-break habit that requires much conscious effort. To avoid the potentially serious

consequences of face-touch behaviors and track people’s face-touch patterns, prior work analyzed

the occurrence and the frequency of face-touch to evaluate the progress of face-touch behavioral

change during the pandemic [19, 116]. For example, a monitoring system can send haptic feed-

back to users when they touch the face. Such systems can alert people who work in sensitive

domains like airports and hospitals not to touch their faces, which helps stop the spread of epidemic

virus [116]. Besides infectious diseases, many studies showed that face touch is associated with

the user’s emotional status [117], mental health conditions [118], and workload evaluation [119].

However, existing studies still rely on clinicians or professionals to manually collect the occurrence

and frequency of face touches, which introduces high labor costs and significant overhead for

data analytics.


Figure 4.1 Illustration of our multimodal wearable system for practical face touch detection.

Wearable Awareness Enhancement Devices (AEDs), such as smartwatches and smart rings, that

support automatic behavior logging and facilitate behavioral changes (e.g., sleep trackers [120] and

smoking detectors [121]) can be used to address this problem. Mainstream automatic face-touch

monitoring is currently performed by recognizing face-touching gestures. Prior work has inves-

tigated a variety of emerging sensing techniques and wireless signals, including acoustic [20–23,

35–38], radio frequency signals [24–26, 39], and magnetic signals [27, 28], to measure the distance

between the hand and the face and recognize potential hand-to-face gestures. On-body sensors

like inertial sensors also have been investigated to extract features from the movement of hands to

classify face-touch gestures [30–34]. However, many similar gestures (e.g., picking up the phone,

wearing a hat, or adjusting eyeglasses) can significantly degrade their system performance and gen-

erate lots of false alarms, causing unnecessary panic and/or bringing medical resources to a place

where they are not needed. To filter out these false-positive gestures, a recent work leverages sen-

sors in the ear to accurately detect touch events [29]. However, since it relies on always-on sensing

and signal processing to guarantee high recall, the battery life is extremely limited (e.g., the system

requires charging multiple times per day), increasing the user burdens and degrading the user expe-

rience. Therefore, there is a significant need for an accurate and low-power face-touch monitoring

system. To fill the gaps, we propose FaceTouch (Figure 4.1), a novel wearable system consisting of

a ring and a wristwatch using three sensing modalities: acceleration, rotation, and vibration. The

wristwatch is equipped with an inertial sensor, and the ring contains a pair of small vibration trans-


Figure 4.2 Overview of our multimodal wearable system for face-touch gesture detection.

ducers¹. As shown in Figure 4.2, we divide the face touch detection into two sequential tasks. The

first task is face-touch gesture detection which detects the movements when the hand moves toward

the face, and the other is the surface-touch classification to determine whether the fingers touch the

skin. We first leverage the inertial sensor on the wrist to detect whether an arm gesture is towards

the user’s face (called face-touch gesture) through the pattern of wrist movement and rotation. The

energy consumption of the MEMS-based inertial sensor is negligible, so we can run it in an always-

on manner. Then, we design and implement a novel wearable ring to extract touch-related vibration

features due to the changes in propagation paths when the finger touches different surfaces. By com-

bining these sensing modalities together, FaceTouch can precisely distinguish face-touch behaviors

from ambiguous arm gestures while minimizing energy consumption. By accurately capturing the

evolution of users’ face-touch patterns over time in different activities, FaceTouch can provide users

with records of when/where they touch their face to increase their awareness, and trigger alerts when

users touch their face at a very high frequency or engage in high-risk behaviors such as prolonged face

scratching. In the long term, FaceTouch can help reduce the frequent face-touch

behaviors of the users.
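The two sequential tasks described above can be sketched as a gated cascade, in which the always-on wrist stage decides whether the ring's vibration sensing is triggered at all. The function names, the stub classifiers, and the threshold below are hypothetical placeholders for illustration; they are not FaceTouch's actual models.

```python
def face_touch_detected(imu_window, is_face_touch_gesture, is_skin_touch):
    """Two-stage cascade: the cheap wrist stage gates the ring stage."""
    if not is_face_touch_gesture(imu_window):
        return False           # vibration sensing on the ring is never triggered
    return is_skin_touch()     # runs only after a candidate face-touch gesture


ring_activations = {"count": 0}


def stub_gesture_detector(window):
    """Hypothetical stand-in for the wrist-IMU gesture detector."""
    return max(abs(a) for a in window) > 1.5


def stub_surface_classifier():
    """Hypothetical stand-in for the ring's surface-touch classifier."""
    ring_activations["count"] += 1
    return True


resting = face_touch_detected([0.1, 0.2], stub_gesture_detector, stub_surface_classifier)
reaching = face_touch_detected([0.1, 2.0], stub_gesture_detector, stub_surface_classifier)
```

In this sketch the ring stage runs once across the two windows, which illustrates how gating keeps the power-hungry component idle while the wrist is at rest.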

To realize FaceTouch, we face three technical challenges: 1) Face touch detection requires

always-on sensing and real-time signal processing that can drain the battery quickly. Since both

vibration sensing and gesture recognition contain power-hungry components, it is challenging to

¹The vibration transducers can send haptic feedback to users when they touch faces


design a face touch detection algorithm that achieves high recall and precision while minimizing

power consumption. 2) Existing vibration sensing requires two separate devices (i.e., one on the

moving body part and the other on the body part being touched) to sense vibration waves from the

transmitter and the receiver, which is impractical and leads to a high bar for many users. How-

ever, simply placing the transmitter and the receiver close together on a single device results in

saturation of the receiver device and poor sensitivity to the touch event. 3) Considering the limited
computation resources of a wristwatch and the importance of user experience, our face-touch gesture
detection methods must be computationally efficient to limit energy consumption while keeping
high precision and recall for diverse users without exhaustive user training or cooperation.

We design three key components in FaceTouch to address the challenges. First, we design

FaceTouch as an end-to-end system that adopts a cascading classification model to ensure high

detection precision/recall while minimizing power consumption. The cascading model not only ef-

fectively fuses results from complementary sensor modalities, including IMU and vibration sensing,

but also only triggers the energy-consuming component when it is essential, such that it balances

the trade-off between performance and computation/energy overhead to achieve a practical system.

Specifically, FaceTouch leverages two lightweight classifiers to lower the duty cycle of DNN-based

face-touch gesture detection and the vibration-based surface-touch classification. The first one is

an acceleration-threshold-based classifier, which filters out the static wrist gestures when the wrist

is at rest. The second classifier is a logistic regression model [122, 123] for detecting small movements
such as typing on a keyboard and turning pages. Second, we propose a novel vibration sensor that

requires only a single point of instrumentation. To detect touch events and overcome the signal

saturation problem, we inject a chirp vibration signal in the finger, extract unique features caused

by materials’ properties, and design a lightweight boosted tree classification model to precisely detect
face-touch events across different users and surfaces. Third, we adopt the computationally efficient GRU
[124] instead of LSTM [30] for sequence processing in DNN-based IMU arm gesture recognition.
To train or fine-tune a DNN model for a specific group of people (e.g., medical staff, farmers,
etc.) with only a little conscious input from new users, we adopt consistency-regularization-based
semi-supervised learning [125], which trains the deep model with a small portion of labeled data and a
large amount of unlabeled data to achieve performance comparable to supervised learning. This
enables a quick start of our system in the initial phase of real-life usage. The system can be continuously
improved as we collect user-specific data: the subsequently collected data can be used for
training via semi-supervised learning without requiring the users to provide labels.

We implement a FaceTouch prototype using off-the-shelf hardware components and commercial

wearable devices. The ring prototype is low-cost (< $80) and low-power (60.89 𝜇W). FaceTouch

utilizes the sensing data from the always-on inertial sensor on the wristwatch and vibration fea-

tures from the ring to precisely monitor face-touch behaviors. We conduct a user study with 10

participants. The participants touch their faces and various surfaces (e.g., glass, cloth, rubber,
wood) during daily activities (sitting, standing, walking) and perform confounding behaviors that can
cause false positives (drinking, calling, adjusting glasses, etc.). Experimental results show that FaceTouch achieves a 93.5% F1-score for

face-touch detection with leave-one-user-out cross-validation, which is 9% higher than the state-

of-the-art method [29]. The F1-score is close to that of the personalized model (93.9%), which

demonstrates the generality of FaceTouch across diverse users.

The contributions of this work are:

• We propose FaceTouch, a multi-modal wearable system capable of detecting face touch in

practical scenarios. To the best of our knowledge, FaceTouch is the first system that can achieve

long-term (e.g., 79-day to 273-day usage without charging) and accurate face touch monitoring.

The system can be used to prevent the potentially serious consequences of face-touch behaviors.

• We design an effective cascading model using always-on inertial sensors to lower the duty

cycle of power-hungry components while maintaining high recall. Unlike prior work that only

detects face-touch gestures, FaceTouch can distinguish face-touch from various false-positive be-

haviors while the user performs daily activities.

• We design a novel vibration-based sensing unit to extract unique vibration features while the

user touches the face. It is the first vibration sensor that requires only a single point of instrumenta-

tion, containing both the transmitter and the receiver. As a result, it can sense the reflected vibration

signal from the fingertips and classify surface materials.

• We implement the FaceTouch prototype using off-the-shelf hardware components and commercial
wearable devices. We evaluated the prototype and validated its performance with 10 participants.
Overall, FaceTouch achieves an average F1-score of 93.5%. The average power is only

60.89 𝜇W in normal daily usage and 209.15 𝜇W in extremely heavy usage.

4.2 System Overview

In this paper, we design a novel wearable system on the wrist and the finger using three sens-

ing modalities: accelerations, rotations, and vibration waves. The key idea to minimize energy

consumption is to use low-power sensors and lightweight signal processing to filter out irrelevant

hand movements. Face-touch gestures and similar gestures (e.g., drinking) involve large angular
variations at the elbow and small angular variations at the shoulder [30]. In the long term, the

frequency of such gestures is extremely low (e.g., less than 30 times per hour) for regular users in

working environments (e.g., college students attending a class) [126]. Therefore, we design a cas-

cading classifier to turn each component on step by step so that the energy-consuming components

remain asleep for most of the time without missing any key movements. FaceTouch leverages low-

power and ubiquitous IMU sensors in existing wearable devices to detect gestures that approach

the face. We then design a novel vibration sensor on a smart ring to distinguish the face touch

from false-positive behaviors that involve different touching surfaces. Specifically, we inject vibra-

tion chirps (1–10 kHz) into the finger and monitor the vibration that propagates through the finger

to classify surface materials. Figure 4.2 illustrates the overview of FaceTouch, containing four

individual classifiers:

• Static wrist classifier: The first classifier leverages an always-on accelerometer and a simple

threshold-based approach to classify whether the wrist is static.

• Small movement classifier: If the system detects a hand movement, it turns on the gyroscope.

Then, both accelerometer and gyroscope data are fed into a logistic regression classifier to
determine whether the hand movement is small or significant.

• DNN-based face-touch detector: The temporal sequences of accelerometer and gyroscope data are
fed into a recurrent neural network (RNN) based deep neural network to classify whether the
hand is approaching the face.

• Surface-touch classifier: If the system detects a face-touch gesture, it turns on the vibration
sensor. The signal captured by the sensor is then processed with a band-pass filter (1–10 kHz),
a Savitzky–Golay filter, and a Fast Fourier Transform (FFT) to extract frequency responses,
which are fed into a series of boosted trees to classify the type of the touched surface. The system
detects a face touch only if the movement involves both a face-touch gesture and a skin-touch event.

Table 4.1 Example of scenarios that trigger the cascading classifiers at different steps. ✓ indicates
a possible face touch event and will trigger the next classifier for further validation. ✗ indicates the
next classifier will not be triggered due to irrelevance between the current event and a face-touch
event. FaceTouch detects a face touch only if the results of all classifiers are ✓.

                                          Cascading Steps
Scenario                                  Static Wrist   Small Movement   DNN-based             Vibration-based
                                          Classifier     Classifier       Face-touch Detector   Surface Classifier
Static                                    ✗              ✗                ✗                     ✗
Small Movements (typing, reading, ...)    ✓              ✗                ✗                     ✗
Large Movements:
  Confounding Gestures
  (drinking water, smoking, ...)          ✓              ✓                ✓                     ✗
  Face Touch                              ✓              ✓                ✓                     ✓
  Other Large Movements                   ✓              ✓                ✗                     ✗

Table 4.1 demonstrates how FaceTouch filters out various non-face-touch events while guaranteeing
high precision and recall. For example, all four classifiers will be triggered for confounding
gestures that are similar to face-touch gestures. However, these confounding gestures do not involve
a skin-touch event and will be filtered out by the vibration-based surface classifier. For other
large movements that do not involve face-touch gestures, FaceTouch reduces energy consumption
by keeping the vibration-based classifier asleep. It detects a face touch only if all classifiers are
triggered, and the vibration-based classifier returns true.
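
The cascading logic summarized in Table 4.1 can be sketched as follows. This is a minimal illustration rather than the FaceTouch implementation: the names `detect_face_touch`, `stub_stages`, `SCENARIOS`, and `run_scenario` are hypothetical, and the stub classifiers simply return the fixed outcomes listed in the table. The point is the early exit, which leaves the later and costlier stages asleep.

```python
from typing import Callable, List, Sequence, Tuple

Window = Sequence[float]  # placeholder type for one window of sensor samples


def detect_face_touch(
    window: Window, stages: Sequence[Callable[[Window], bool]]
) -> Tuple[bool, int]:
    """Run the stages in order and stop at the first one that returns False.

    Returns (face_touch_detected, number_of_stages_executed).
    """
    executed = 0
    for stage in stages:
        executed += 1
        if not stage(window):
            return False, executed  # later stages are never woken up
    return True, executed


def stub_stages(outcomes: Sequence[bool]) -> List[Callable[[Window], bool]]:
    """Build stand-in classifiers with fixed outcomes (illustration only)."""
    return [lambda _window, o=o: o for o in outcomes]


# Outcomes per stage, in the order: static wrist, small movement,
# DNN-based face-touch detector, vibration-based surface classifier.
SCENARIOS = {
    "static": [False, False, False, False],
    "small movement": [True, False, False, False],
    "confounding gesture": [True, True, True, False],
    "face touch": [True, True, True, True],
    "other large movement": [True, True, False, False],
}


def run_scenario(name: str) -> Tuple[bool, int]:
    return detect_face_touch([], stub_stages(SCENARIOS[name]))
```

In this sketch, a static wrist stops after one stage, while a confounding gesture such as drinking runs all four stages and is rejected only by the last one.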

4.3 Vibration-based Surface Touching Classification

In this section, we first present how touching various surfaces affects vibration waves, followed

by how we measure the wave changes using a novel vibration sensor and extract unique features

associated with surface-touch events. The vibration sensing serves as a skin-touch detector. Face-

Touch leverages it to filter out false-positive gestures (e.g., drinking, calling, etc.) because the user

does not touch a skin surface while performing these gestures.

Vibration Through Human Skin The propagation of vibration appears differently through

media like gases, liquids, and solids. By analyzing the propagation of vibration along the touch

surfaces we can estimate various material properties, including stiffness, damping ratio, elastic

constants, and viscoelasticity. These material properties can in turn be used to characterize
composite materials such as human skin [127]. Prior work has shown that
the finger is a good conductor for vibration propagation [128], and demonstrated the capability
of leveraging vibration through the finger to transmit data to other surfaces [128], recognize hand
gestures [129], localize finger taps on unmodified surfaces [130], and authenticate users [131].
However, all of these efforts place the transmitter and the receiver on different surfaces or parts of the
body. Therefore, they all require two separate devices on the body (e.g., one on the finger and
the other on the face), which is neither user-friendly nor practical. In this paper, we aim to design a
novel vibration sensor using only a single point of instrumentation on the finger for surface touch
detection.

4.3.1 Vibration-based Surface-touch Detection

We exploit the spectral properties of reflected vibration changes to classify touch surfaces. Com-

pared with no-touch events, touching different materials has a significant impact on how the touch

surface reflects the vibrations. As shown in Figure 4.3, we place both the transmitter and the re-

ceiver on the index proximal. The vibration can propagate from the transmitter to the receiver via

two major paths. One is the direct path from the transmitter to the receiver (marked as the green

line), and the vibration amplitude remains the same for touch and no-touch events. The other is

the indirect path from the fingertip to the receiver (marked as blue, purple, and black lines). The

amplitude of vibration via indirect path depends on the material’s properties (e.g., mass, density,

and spring constant) and vibration frequency [128, 131]. For example, when the vibrations propagate
from the finger to a less dense medium (e.g., water), the phase of the reflected wave will be
inverted, and the amplitude of the reflected waves will be smaller than when they pass from the finger
to a denser medium (e.g., metal). Therefore, by monitoring the amplitude of reflected vibration,
our system can detect whether the finger touches a surface and classify the material.

(a) No-touch event

(b) Surface 1 touch event

(c) Surface 2 touch event

Figure 4.3 The rationale of proposed surface touch detection using vibration sensing.

We validate the sensing rationale using two Piezo-based transducers [132]. In this experiment,

we place one sensor on the top of the index proximal as the transmitter and the other at the bottom

of the index proximal as the receiver. The transmitter leverages an LM386 amplifier to inject the

vibration at 2.5 kHz. The captured signal is amplified using an LMV358 amplifier and sampled

by a 12-bit ADC at 100 ksps. After applying a bandpass filter, we perform a 2048-point FFT with

a sliding window to continuously extract the frequency response at 2.5 kHz. With this setup, we

asked a user to perform a pointing gesture and touch five different surfaces (skin, glass, wood, cloth,

and rubber).
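
To make the measurement concrete, the sketch below computes the magnitude of a single frequency bin over a sliding window, using the sampling rate (100 ksps), FFT length (2048), and target frequency (2.5 kHz) from the text. It is only an illustration under stated assumptions: the hop size of 1024 samples, the rectangular window, and all function names are hypothetical, and the bin is evaluated directly with a one-bin DFT, which gives the same magnitude as the corresponding FFT output. With these parameters the bin spacing is 100000/2048 ≈ 48.8 Hz, so 2.5 kHz falls closest to bin 51.

```python
import cmath
import math

FS_HZ = 100_000    # ADC sampling rate from the text (100 ksps)
N_FFT = 2048       # FFT length from the text
TARGET_HZ = 2_500  # injected vibration frequency from the text


def bin_index(freq_hz, fs_hz=FS_HZ, n_fft=N_FFT):
    """Index of the FFT bin closest to freq_hz."""
    return round(freq_hz * n_fft / fs_hz)


def bin_magnitude(frame, k):
    """Magnitude of DFT bin k of one frame (same value as |FFT(frame)[k]|)."""
    n = len(frame)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
    return abs(acc)


def sliding_response(samples, hop=1024, n_fft=N_FFT, freq_hz=TARGET_HZ):
    """Frequency response at freq_hz for each window; hop is an assumed value."""
    k = bin_index(freq_hz, n_fft=n_fft)
    return [
        bin_magnitude(samples[start:start + n_fft], k)
        for start in range(0, len(samples) - n_fft + 1, hop)
    ]


# Synthetic check: a unit-amplitude tone located exactly on bin 51.
tone = [math.sin(2 * math.pi * 51 * i / N_FFT) for i in range(2 * N_FFT)]
responses = sliding_response(tone)
```

For a unit-amplitude tone centered on the bin, each window yields a magnitude of N_FFT/2, so a stronger reflected vibration shows up directly as a larger value.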

The red error bars in Figure 4.4 show the mean frequency response at 2.5 kHz with 90% con-

fidence intervals. We have two observations. First, since the frequency response of surface-touch

events is significantly higher than that of no-touch events, the frequency response change can be

used to detect surface-touch events. Second, we observe a negative correlation between the fre-

quency response and material hardness. Soft materials absorb more energy from mechanical waves,

which decreases the corresponding frequency response [133]. Thus, we can leverage the observa-

tion to classify the surface material. We also repeated the experiment on ten users and observed

similar results, which validated the feasibility of using the wearable ring to capture the reflected

vibration change caused by various materials.

Identification of Sensing Sites We investigated various skin sites on the hand for surface-touch

detection. As shown in Figure 4.5, we asked a user to wear our sensor on 14 phalanges of the five

fingers and then touch the face ten times using the corresponding fingertip (e.g., using the index finger
to touch the face when the user wears the sensor on the three index phalanges). Figure 4.6 shows

Figure 4.4 Frequency response at 2.5 kHz: the user touches five surfaces using either a fixed gesture
or other gestures.

Figure 4.5 The 14 different measurement locations on the hand.

the mean frequency response at 2.5 kHz (with 90% confidence intervals as error bars) when the user

wears the sensor on the 14 phalanges of the five fingers (x-axis). Overall, the frequency response

increases when the sensor is closer to the fingertip. We observe that the frequency response on the thumb
is relatively low because the thumb has more dedicated muscle and a bigger nail, which can either
block or absorb some vibration and decrease the frequency response [134]. Except for the thumb,
the sensing granularity is similar across the other fingers. Since a ring is normally worn
on the proximal phalanx and most face touch events involve an index finger touch [135], we select the index
proximal as the optimal sensing site. In this paper, we focus on optimizing the sensing sites on the
finger due to the signal attenuation on the skin surface. When supporting long-distance sensing,
the vibration generator may leak audible sound into the environment, degrading the user experience.
We carefully design the vibration sensor and fine-tune the duty cycle to minimize the sound audible
to the user, and we discuss how to block the sound leaking into the environment in Sec. 4.7.

We also explore the capability of detecting surface touch events using other fingers while the

sensor is placed on the index proximal. Figure 4.7 shows the comparison of the frequency response

Figure 4.6 Comparison of the frequency response at 2.5 kHz on 14 skin sites.

Figure 4.7 Comparison of the frequency response at 2.5 kHz at five fingers.

at 2.5 kHz when different fingers touch the face. We observe the frequency response drops when

other fingers touch the face because the wave propagation distance increases. However, it is still

possible to reliably detect the touch events when other fingers touch the face. We leave it as future

work.

4.3.2 Practical surface-touch detection

Although we have shown the capability of detecting surface-touch events using a novel vibration
sensor on the finger, all of the experiments above were conducted while the user performed a fixed pointing gesture.
However, hand gestures affect skin tension, which can generate features similar to those of face-touch signals.
The green bars in Figure 4.4 show the mean frequency response with 90% confidence intervals
as error bars when the user performs various hand gestures and touches surfaces. We observe that
the error bars overlap among surfaces, which degrades the sensing granularity and robustness.
Therefore, the frequency response on a narrow band (e.g., 2.5 kHz) can be noisy across diverse
users.

To overcome the challenge, we inject vibration chirps (1–10 kHz²) on the finger to extract unique

²The minimum frequency of the chirp signal is 1 kHz, which is high enough that users can hardly feel the vibra-

tion [136].

Figure 4.8 Frequency responses of the chirp signal (1–10 kHz) when touching five materials.

features related to face touch events. Vibration chirps have been investigated to recognize finger

gestures, and their frequency responses vary based on how vibration wave travels through the finger,

namely tissue, muscle, blood, and bone [129]. Therefore, by scanning a wide variety of vibration

frequencies, FaceTouch leverages the properties of different waves propagating through the fin-

ger to extract unique features for surface-touch classification. We conducted a user study with ten

participants to validate the design, and participants touched five surfaces with various gestures. Fig-

ure 4.8 shows the mean frequency responses within 1–10 kHz. Overall, the frequency responses

of no-touch events are significantly lower than those of touch events for most of the frequencies.

The observation implies fewer vibration waves can be reflected from the fingertip during no-touch

events. Besides, we observe that surface materials affect the power of frequency responses at different
vibration frequencies. For example, the frequency responses when touching the skin are the
highest among all touch events above 3.5 kHz. The frequency responses of the rubber touch are the
highest within 2.5–3.5 kHz.

4.3.3 Touch event detection and touch surface classification

According to the observations above, our feature extraction is based on a comparison to a no-

touch signal, which can be recorded when the user wears the sensor for the first time. Then, we

compute the mean Euclidean distance of 192 frequency responses between the current chirp signal

and the no-touch signal. To classify the surface materials, we extend the feature vector from 192 to

235 by adding the Mel-frequency cepstral coefficients (MFCC), zero-crossing rate, mean energy,

and energy variation. The features are then fed into a supervised learning model to capture the

relationship between the feature vector and surface materials. Specifically, we choose boosted tree

classification [137] that optimizes a sequence of classification trees with weights associated with
decisions. As the number of trees grows, it can correct errors made by the previously trained tree.
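
The comparison against the no-touch signal can be sketched as follows. This is one possible reading of the "mean Euclidean distance" over 192 frequency responses, not a confirmed implementation: we assume each bin holds a scalar magnitude, so the per-bin Euclidean distance reduces to an absolute difference, and the decision threshold and function names are hypothetical.

```python
N_BINS = 192  # number of frequency responses per chirp, from the text


def mean_distance(current, baseline):
    """Mean per-bin distance between a chirp response and the no-touch baseline.

    Assumes scalar magnitudes, so the Euclidean distance in each bin is the
    absolute difference between the two values.
    """
    if len(current) != len(baseline):
        raise ValueError("responses must have the same number of bins")
    return sum(abs(c - b) for c, b in zip(current, baseline)) / len(current)


def is_touch(current, baseline, threshold):
    """Flag a touch when the response departs enough from the no-touch baseline.

    The threshold is an assumed tuning parameter, not a value from the text.
    """
    return mean_distance(current, baseline) > threshold
```

Note that extending the vector from 192 to 235 entries leaves 43 slots for the MFCC, zero-crossing rate, mean energy, and energy variation features.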

We also implement our touch event classification model using other classification backends (e.g.,
SVM and deep neural networks). However, these models do not outperform boosted tree classification
because the feature dimensionality is relatively small in our system, and running a large
number of iterations degrades the performance due to overfitting. Another benefit of
boosted tree classification is its low computational complexity because it only involves comparison
and addition operations during real-time inference. When we deploy the model on embedded systems
(e.g., STM32 or MSP432), the inference latency and energy consumption are significantly
lower than those of SVMs and deep neural networks. Specifically, the time complexity of boosted
tree classification is less than 5% and 11% of that of a four-layer feed-forward neural network
and SVMs, respectively. As a result, those models consume 98% and 93% more energy to classify
surface materials. Overall, boosted tree classification achieves a better trade-off between efficiency
and accuracy.
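
The claim that inference needs only comparisons and additions can be seen in a small sketch of boosted tree scoring. The tree structure, thresholds, leaf values, and names below are made up for illustration and do not come from the trained FaceTouch model; a binary touch/no-touch decision with a zero score threshold is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    feature: int = 0                 # index into the feature vector
    threshold: float = 0.0           # split value for internal nodes
    left: Optional["Node"] = None    # taken when x[feature] < threshold
    right: Optional["Node"] = None   # taken otherwise
    value: float = 0.0               # leaf score (used when there are no children)


def tree_score(node: Node, x: Sequence[float]) -> float:
    """Walk one tree: one comparison per level, no multiplications."""
    while node.left is not None and node.right is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.value


def ensemble_score(trees: Sequence[Node], x: Sequence[float]) -> float:
    """Sum the leaf scores of all trees: one addition per tree."""
    total = 0.0
    for tree in trees:
        total += tree_score(tree, x)
    return total


def predict_touch(trees: Sequence[Node], x: Sequence[float]) -> bool:
    return ensemble_score(trees, x) > 0.0


# Two toy depth-1 trees with made-up splits and leaf values.
TOY_TREES = [
    Node(feature=0, threshold=5.0, left=Node(value=-1.0), right=Node(value=1.0)),
    Node(feature=1, threshold=2.0, left=Node(value=-0.5), right=Node(value=0.25)),
]
```

For a microcontroller port, the same traversal maps onto flat arrays of feature indices, thresholds, and child offsets, which avoids pointer-heavy structures.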

The vibration-based module can only detect skin touches and cannot distinguish face touch from

other skin touches (e.g., hand touch, clenching fist). However, our cascading mechanism mentioned

in Sec. 4.2 can effectively filter these confounding skin-touch events because these movements do

not involve face-touch gestures.

4.4 Wrist-IMU based Face-Touch Gesture Detection

Based on the acceleration and rotation data obtained from a wrist-IMU, we design a sequential

face-touch gesture detector with three classifiers to optimize energy efficiency and detection preci-

sion for long-term continuous usage. First, we design a threshold-based static wrist classifier that

filters out the static periods in which no gesture is involved. For the rest of the wrist movements,
we use a small-movement classifier with explicitly defined features to further filter out those
movements that cannot be a face-touch gesture. Finally, we develop a DNN-based face-touch gesture
classifier to accurately distinguish face-touch gestures from other confounding gestures. Compared
with the first classifier, the computational overhead of the later classifiers is high, but they are
triggered infrequently. Thus, we optimize the energy efficiency for continuous monitoring while
keeping a high precision and recall.

4.4.1 IMU Data Input

We use the accelerometer and gyroscope of a 9-axis IMU equipped on a commercial off-the-shelf
smartwatch to record the acceleration and rotation data of the wrist. Specifically, at any
timestamp $t$ under a 50 Hz sampling rate, we use Android API 28 to obtain the 3-axis linear acceleration
$a(t)$ and the orientation $o(t)$ of the device in a Global Reference Frame (GRF). The orientation takes the
form of a quaternion, a mathematical entity that is convenient for representing
three-dimensional orientations and rotations of objects. The linear acceleration reflects the linear
movements of the wrist, while from the change of device orientation in GRF we can calculate the
rotation angles of the wrist in GRF [138]. As face-touch gestures involve a series of wrist movements
and rotations, to determine whether a user touches the face at time $t_i$, $i \in \mathbb{N}$, we look back
over a short time interval $\Delta t$ and take two sequences of the acceleration and rotation data samples,
$S_a^i = \{a(t_i - \Delta t), \ldots, a(t_i)\}$ and $S_o^i = \{o(t_i - \Delta t), \ldots, o(t_i)\}$, directly as the input of our face-touch
gesture detector. Since we do not leverage accelerations and orientations for location tracking, the
drifts of IMU sensors will not be accumulated over time, resulting in acceptable bounded errors.

4.4.2 Always-on Static Wrist Classifier

According to the common daily life pattern, people’s wrists are almost static most of the time in

a day due to sleeping, resting, etc. Face touch rarely happens during static periods. In addition, it

would be a waste of energy to run all three classifiers to detect face-touch gestures during the static

periods. Therefore, as the basic step, we filter out all static periods to improve the energy efficiency

without losing detection accuracy by designing a computation-efficient static-wrist classifier.

Method Design Specifically, when a user’s wrist remains static, both the average and variance
of linear acceleration values within the static period should be close to 0. With two non-negative pre-configured
threshold vectors $T_1 = [T_1^0, T_1^1, T_1^2]$ and $T_2 = [T_2^0, T_2^1, T_2^2]$, we compare the absolute average
and variance of the 3-axis acceleration sequence,

$$\bar{a}(t_i) = \frac{1}{|S_a^i|} \sum_{a_j \in S_a^i} a_j = [\bar{a}(t_i)_0, \bar{a}(t_i)_1, \bar{a}(t_i)_2]$$

and $\sigma(a(t_i)) = [\sigma(a(t_i)_0), \sigma(a(t_i)_1), \sigma(a(t_i)_2)]$, with $T_1$ and $T_2$, respectively. If both $|\bar{a}(t_i)|$ and
$\sigma(a(t_i))$ are lower than $T_1$ and $T_2$ on all three axes, the status of the user’s wrist is recognized as static.
As such, the next-tier classifier is not triggered, and the static wrist classifier continues to check the next acceleration
sequence.
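
A minimal sketch of this threshold test is given below. The function name, the use of the population variance, and the threshold values in the usage note are assumptions for illustration; the text does not report the configured thresholds.

```python
from statistics import fmean, pvariance


def is_static(window, t_mean, t_var):
    """Return True if the wrist is static over one acceleration window.

    window: sequence of (ax, ay, az) linear-acceleration samples.
    t_mean, t_var: per-axis thresholds for the absolute mean and the variance
    (the roles of T1 and T2 in the text; values are chosen by the caller).
    """
    for axis in range(3):
        values = [sample[axis] for sample in window]
        if abs(fmean(values)) >= t_mean[axis]:
            return False  # sustained acceleration on this axis
        if pvariance(values) >= t_var[axis]:
            return False  # acceleration fluctuates on this axis
    return True
```

For example, with hypothetical thresholds of 0.05 on the mean and 0.02 on the variance, a window of near-zero samples is classified as static, while a window alternating between +1 and -1 on one axis is not, even though its mean is zero.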

4.4.3 Small Movement Classifier

Among all possible arm gestures that human beings can perform, face-touch gestures are only

a small part of them. Common face-touch gestures involve relatively large movements of the whole

forearm and even the elbow. In this case, the wrist moves faster, and we can expect that the wrist
acceleration changes rapidly and the wrist rotates noticeably. We further utilize the gap between

small movements and face-touch gestures to improve computational efficiency by filtering out those

gestures that only involve small wrist movements.

Method Design First, we calculate the rotation matrix $R(t_i)$ from the orientation quaternion $o(t_i)$,
and then obtain the related 3-axis Euler angles $\theta(t_i) = [\theta(t_i)_0, \theta(t_i)_1, \theta(t_i)_2]$. Then, given a rotation
sequence $S_o^i$, we can convert it to a rotation-angle sequence $S_\theta^i = \{\theta(t_i - \Delta t), \ldots, \theta(t_i)\}$. We use
$\bar{\theta}(t_i)$ and $\sigma(\theta(t_i))$ to denote the average and variance of the rotation-angle sequence. Moreover,
we mainly consider five statistical values of wrist movements: $\bar{a}(t_i)$, $\sigma(a(t_i))$, $\bar{\theta}(t_i)$, $\sigma(\theta(t_i))$, and
$\theta(t_i) - \theta(t_i - \Delta t)$, which reflect the displacement, speed, and rotation changes of wrist
movements. With these features, we train a logistic regression model [122, 123, 139] to recognize
whether a wrist movement is small or not. If a small wrist movement is detected, we
switch back to running the static-wrist classifier for continuous static wrist checking. Finally, the logistic regression
model is trained so that face-touch gestures are not classified into the
category of small wrist movements.
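
The feature computation described above can be sketched as follows. Several details are assumptions made for illustration: the quaternion is taken as a unit quaternion in (w, x, y, z) order, the Euler angles are computed directly in the roll, pitch, yaw convention rather than through an explicit rotation matrix, each of the five statistics is kept per axis (15 values in total), and the logistic regression weights are placeholders to be learned from data.

```python
import math
from statistics import fmean, pvariance


def quat_to_euler(w, x, y, z):
    """Unit quaternion (w, x, y, z) to (roll, pitch, yaw) in radians."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))  # clamp rounding error
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw


def axis_stats(seq):
    """Per-axis mean and variance of a sequence of 3-axis samples."""
    columns = list(zip(*seq))
    return [fmean(c) for c in columns], [pvariance(c) for c in columns]


def movement_features(acc_seq, quat_seq):
    """Mean/variance of acceleration and angles, plus the net angle change."""
    angles = [quat_to_euler(*q) for q in quat_seq]
    a_mean, a_var = axis_stats(acc_seq)
    th_mean, th_var = axis_stats(angles)
    delta = [last - first for first, last in zip(angles[0], angles[-1])]
    return a_mean + a_var + th_mean + th_var + delta


def small_movement_probability(features, weights, bias):
    """Logistic regression output; weights and bias must come from training."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

A window would be treated as a small movement when the probability exceeds a chosen cut-off, at which point control returns to the static wrist classifier.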

4.4.4 DNN-based Face-touch Detector

After the processing of the first two classifiers, most gestures that are not related to face-touch

events are filtered out with low computation cost. To accurately detect face-touch gestures out of

other wrist movements with the same magnitude, we need to extract more gesture-specific features

from the raw data sequence with a more powerful model. As the gesture data consists of sequential

accelerations and rotations, it contains temporal relation between data at different time steps where

such relationship is unique across different gesture types. Therefore, we need a sequence prediction
model to capture such temporal relations.

Figure 4.9 The DNN model for detecting face-touch gestures. The DNN consists of a single-layer
GRU and a 2-layer MLP. The GRUs take the raw data sequences as input and the MLP outputs
whether a face-touch gesture is detected.

Table 4.2 DNN-based Face-touch Detector configurations.

Input (6 × 𝑡𝑖 sequence)
GRU-32
FC-16
ReLU
FC-2
soft-max
Output (2 × 1)

Method Design LSTM [30] is a widely used DNN unit to handle sequential data. For
computational efficiency, however, we adopt a GRU [124], which requires less computation
while achieving performance similar to LSTM on short sequence data. A GRU-based DNN
can learn the inherent relationship between sequential IMU data and face-touch/non-face-touch
behaviors as long as the DNN is trained with a dataset that covers the feature space as much as
possible. To determine whether a user touches the face at time $t_i$, we generate the network input data
sequence from the acceleration sequence $S_a^i$ and the angle sequence $S_\theta^i$ as follows:

$$
\begin{aligned}
X &= x_{t_1}, \ldots, x_{t_j}, \ldots, x_{t_i} \\
x_{t_j} &= [a(t_j)_0, a(t_j)_1, a(t_j)_2, \theta(t_j)_0, \theta(t_j)_1, \theta(t_j)_2],
\end{aligned}
\tag{4.1}
$$

where $t_1$ is $t_i - \Delta t$ and $t_j$ is a timestamp within the range $[t_i - \Delta t, t_i]$ at which acceleration and orientation are
sampled at 50 Hz.
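
To illustrate how the input sequence of Eq. (4.1) flows through the layers listed in Table 4.2, the sketch below implements a forward pass in plain Python. It is not the trained FaceTouch model: the weights are random and untrained, the class and helper names are hypothetical, and the GRU update follows one common formulation (the new state is a gated mix of the previous state and a tanh candidate), which may differ in detail from the variant used in this work.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class FaceTouchNet:
    """GRU-32 -> FC-16 -> ReLU -> FC-2 -> softmax with untrained random weights."""

    def __init__(self, n_in=6, n_hidden=32, n_fc=16, n_out=2, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One (input weights, recurrent weights, bias) triple per gate.
        self.gates = {
            name: (rand_matrix(n_hidden, n_in, rng),
                   rand_matrix(n_hidden, n_hidden, rng),
                   [0.0] * n_hidden)
            for name in ("update", "reset", "candidate")
        }
        self.fc1 = (rand_matrix(n_fc, n_hidden, rng), [0.0] * n_fc)
        self.fc2 = (rand_matrix(n_out, n_fc, rng), [0.0] * n_out)

    def _gate(self, name, x, h, activation):
        w, u, b = self.gates[name]
        return [activation(p + q + r) for p, q, r in zip(matvec(w, x), matvec(u, h), b)]

    def gru_step(self, x, h):
        z = self._gate("update", x, h, sigmoid)
        r = self._gate("reset", x, h, sigmoid)
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = self._gate("candidate", x, gated_h, math.tanh)
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def forward(self, sequence):
        """sequence: list of 6-value samples [a0, a1, a2, theta0, theta1, theta2]."""
        h = [0.0] * self.n_hidden
        for x in sequence:
            h = self.gru_step(x, h)  # only the last hidden state is kept
        w1, b1 = self.fc1
        hidden = [max(0.0, v + b) for v, b in zip(matvec(w1, h), b1)]  # ReLU
        w2, b2 = self.fc2
        logits = [v + b for v, b in zip(matvec(w2, hidden), b2)]
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]
```

Calling `FaceTouchNet().forward(window)` on a window of 6-value samples returns two probabilities that sum to one; with trained weights they would correspond to the touch and non-touch classes.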

As shown in Figure 4.9, our network architecture consists of a GRU network, denoted as $F_{\mathrm{gru}}$, and a Multilayer
Perceptron (MLP) network, denoted as $F_{\mathrm{mlp}}$. The GRU network accepts the sequence input $X$ and
outputs a hidden feature $h_i$ at the last time step $t_i$, which is then taken by the MLP network as input.

The MLP network outputs the prediction results. For training with labeled data, we adopt cross-

sha1_base64="4CJFm8f544A3bEk+whf9v5cjEk0=">AAAB7nicbVBNS8NAEJ34WetX1aOXxSJ4KomIeix68VjBfkAbwma7aZduNmF3IpbQH+HFgyJe/T3e/Ddu2xy09cHA470ZZuaFqRQGXffbWVldW9/YLG2Vt3d29/YrB4ctk2Sa8SZLZKI7ITVcCsWbKFDyTqo5jUPJ2+Hoduq3H7k2IlEPOE65H9OBEpFgFK3UfgpyDLxJUKm6NXcGsky8glShQCOofPX6CctirpBJakzXc1P0c6pRMMkn5V5meErZiA5411JFY278fHbuhJxapU+iRNtSSGbq74mcxsaM49B2xhSHZtGbiv953Qyjaz8XKs2QKzZfFGWSYEKmv5O+0JyhHFtCmRb2VsKGVFOGNqGyDcFbfHmZtM5r3mXNu7+o1m+KOEpwDCdwBh5cQR3uoAFNYDCCZ3iFNyd1Xpx352PeuuIUM0fwB87nD2DAj5k=</latexit>xt1Figure 4.10 FaceTouch ring prototype.

entropy loss:

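\[
\mathcal{L}_l = \mathbb{E}_{(X,y)\sim\mathcal{X}_l}\Big[(1-y)\log\big(1-F_{\mathrm{mlp}}(F_{\mathrm{gru}}(X))\big) + y\log F_{\mathrm{mlp}}(F_{\mathrm{gru}}(X))\Big] \tag{4.2}
\]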

where X𝑙 is the set of labeled training data. We adopt consistency regularization to train the DNN model

with unlabeled data so that its predictions are robust to small perturbations, which can improve

the performance of the face-touch detection model when labeled data is scarce. In this case, the loss

is defined as:

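\[
\mathcal{L}_u = \mathbb{E}_{X\sim\mathcal{X}_u,\ \delta\sim\mathcal{N}(0,\sigma_n)}\big|F_{\mathrm{mlp}}(F_{\mathrm{gru}}(X+\delta)) - F_{\mathrm{mlp}}(F_{\mathrm{gru}}(X))\big| \tag{4.3}
\]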

where X𝑢 is the set of unlabeled training data and 𝛿 is a Gaussian noise N (0, 𝜎𝑛) to perturb the

input, where 𝜎𝑛 is much smaller than the scale of 𝑋.
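
The two losses can be illustrated with a minimal Python sketch. It is not the dissertation's training code: `toy_model` is a hypothetical stand-in for 𝐹mlp(𝐹gru(𝑋)), the labeled term is negated so that lower values are better (a sign convention assumed here), and the weight `lambda_u` used to combine the two terms is also an assumption.

```python
import math
import random


def labeled_loss(model, labeled_batch):
    """Cross-entropy over (X, y) pairs as in Eq. (4.2), negated so lower is better."""
    eps = 1e-12
    total = 0.0
    for x, y in labeled_batch:
        p = min(max(model(x), eps), 1.0 - eps)  # clamp to keep log() finite
        total += (1 - y) * math.log(1 - p) + y * math.log(p)
    return -total / len(labeled_batch)


def consistency_loss(model, unlabeled_batch, sigma_n, rng):
    """Mean |F(X + delta) - F(X)| with delta ~ N(0, sigma_n), as in Eq. (4.3)."""
    total = 0.0
    for x in unlabeled_batch:
        x_noisy = [v + rng.gauss(0.0, sigma_n) for v in x]
        total += abs(model(x_noisy) - model(x))
    return total / len(unlabeled_batch)


def toy_model(x):
    """Hypothetical stand-in for F_mlp(F_gru(X)): sigmoid of the input mean."""
    return 1.0 / (1.0 + math.exp(-sum(x) / len(x)))


rng = random.Random(0)
labeled = [([0.5, 1.0, 1.5], 1), ([-1.0, -0.5, -1.5], 0)]
unlabeled = [[0.1, 0.2, 0.3], [-0.3, 0.0, 0.4]]

l_l = labeled_loss(toy_model, labeled)
l_u = consistency_loss(toy_model, unlabeled, sigma_n=0.01, rng=rng)
lambda_u = 1.0  # assumed weight; the text does not state how the terms are combined
total_loss = l_l + lambda_u * l_u
```

Because the noise scale is much smaller than the scale of the input, the consistency term stays small for a smooth model and only penalizes predictions that change under tiny perturbations.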

4.5 Implementation

The prototype consists of two main components, namely a novel vibration sensing unit and

an IMU sensing unit. For vibration-based sensing, we designed and implemented a compact and

low-cost (e.g., < $80) wearable ring (Figure 4.10) using off-the-shelf hardware components. The

sensing unit is powered by a small 400 mWh battery. For IMU sensing, we adopt a commercial

off-the-shelf Moto 360 smartwatch.

Vibration Sensing Unit To minimize the form factor of the sensing unit, we utilized two Piezo

transducers (PHUA2010), each of which is 20 mm by 10 mm in size. We built a ring-like device to

attach both transducers. One is placed on top of the proximal phalanx of the index finger to inject a vibration chirp (1–10

kHz) into the finger, and the other is placed at the bottom to capture the surface waves from the finger.

The transmitter sends one 50-ms chirp of 1.65 Vpp per second. We utilize an LM386 amplifier to

amplify the chirp signal. The frequency range of the chirp signal was optimized in Sec. 4.3.2

to retain maximum vibration information propagating in the finger. The receiver is connected to

an LMV358 amplifier (amplification gain is 32) to amplify the vibration signals. The amplified

signals are then sampled by a 12-bit ADC at 100 ksps. Both amplifiers have large quiescent current

draws (e.g., 0.2–4 mA). To minimize the power consumption of the sensing unit, the positive supply

voltages (VCC) of amplifiers are connected to an analog switch, and we switch the amplifiers off

when the sensing unit is not triggered by the cascading model as mentioned in Sec. 4.2.
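
As a rough illustration of the excitation signal, the sketch below generates one 50-ms chirp sweeping 1–10 kHz at 1.65 Vpp. The linear sweep shape and the 100 kHz generation rate (chosen to match the ADC rate) are assumptions; the text specifies only the band, duration, amplitude, and repetition rate.

```python
import math


def linear_chirp(f0_hz, f1_hz, duration_s, sample_rate_hz, vpp):
    """Return samples of a linear chirp sweeping f0 -> f1 with peak-to-peak amplitude vpp."""
    n = int(round(duration_s * sample_rate_hz))
    sweep_rate = (f1_hz - f0_hz) / duration_s  # Hz per second
    amplitude = vpp / 2.0
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        # Instantaneous phase is the integral of the instantaneous frequency.
        phase = 2.0 * math.pi * (f0_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples


chirp = linear_chirp(1_000.0, 10_000.0, 0.050, 100_000.0, 1.65)
duty_cycle = 0.050 / 1.0  # one 50-ms chirp per second
```

With these settings the transmitter is active only 5% of the time, which is part of why the amplifiers can be switched off between chirps.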

We use a MINI-M4 for MSP432 microcontroller board³ to digitize the vibration signals from the amplifiers,

extract touch-related features, and run the boosted-tree model that detects

touch events and classifies the surface materials. When the microcontroller is not active, it

consumes only 40 𝜇W in idle mode. The board has a low-power MCU that draws 80 𝜇A/MHz

in active mode and a 12-bit ADC that draws 450 𝜇A at 100 ksps. During signal digitization, we use

the direct memory access (DMA) controller of the microcontroller so that the main processor stays in

low-power mode during data sampling. We implement the touch event detection algorithm in

Sec. 4.3.3 for binary skin-touch classification on the microcontroller. The number of boosted trees

is 50, and the maximum tree depth is 4. The prediction results will be transmitted to the nearby

devices via a Bluetooth transceiver [140]. We integrate a 400 mWh Lithium-ion battery underneath

the MINI-M4 board to power the sensing unit.

Wrist-IMU Sensing Unit We implement the face-touch gesture detection models including

Static Wrist Classifier, Small Movement Classifier, and DNN-based Face-touch Detector, and de-

ploy these models on a smartwatch (Moto 360 3rd Gen). The IMU data collected by the smartwatch

is sampled at 50 Hz. Based on our real-world experiments, the average length of face-touch gestures

is around 1.5 seconds. Thus, to capture all face-touch events while ensuring the energy efficiency

of the system, we trigger the static wrist classifier every 500 ms and take the 1.5 s of data preceding the

trigger timestamp. The system then processes the raw sensor data based on our wrist-based gesture

detection in Sec. 4.4.
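
The triggering scheme above amounts to a sliding window with a 25-sample hop and a 75-sample length at 50 Hz. The following sketch shows that bookkeeping only; the function name `windows` and the use of a ring buffer are illustrative assumptions, not the smartwatch implementation.

```python
from collections import deque

SAMPLE_RATE_HZ = 50
TRIGGER_PERIOD_S = 0.5   # the static wrist classifier runs every 500 ms
WINDOW_S = 1.5           # average face-touch gesture length

HOP = int(SAMPLE_RATE_HZ * TRIGGER_PERIOD_S)  # 25 samples between triggers
WINDOW = int(SAMPLE_RATE_HZ * WINDOW_S)       # 75 samples per window


def windows(stream):
    """Yield the most recent 1.5 s of samples at every 500-ms trigger."""
    buf = deque(maxlen=WINDOW)
    for i, sample in enumerate(stream, start=1):
        buf.append(sample)
        if i % HOP == 0 and len(buf) == WINDOW:
            yield list(buf)


# Four seconds of dummy samples (0..199) stand in for IMU readings.
out = list(windows(range(200)))
```

Consecutive windows overlap by one second, so a 1.5-s gesture that starts between two triggers is still fully contained in a later window.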

³To minimize the power consumption of the board, we removed all irrelevant components on the MINI-M4 FOR

MSP432 board, including the USB bridge and LED indicators.

We empirically selected the thresholds of the static classifier by calculating the average and

variance of linear acceleration values of multiple collected static wrist data records and data with

wrist movements. The thresholds are 𝑇1 = [0.1, 0.1, 0.1] and 𝑇2 = [0.5, 0.5, 0.5]. Since the differences

between static/non-static data are significant, our static classifier achieves 100% precision and recall

in filtering out static data. The small movement classifier is fitted from historical data. We conduct

model selection on the small movement classifier by tuning the precision of the model to be as

high as possible while keeping 100% recall, which benefits the final task performance. Table 4.2 shows the

configuration of the DNN-based Face-touch Detector in Sec. 4.4.4. We adopt a single-layer GRU with

a hidden dimension of 32 and a 2-layer MLP with two fully connected layers of size 16 and size 2.

The DNN model is trained on data described in section 4.6.
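
A threshold-based static check of this kind could look like the sketch below. The mapping of the thresholds is an assumption made for illustration: 𝑇1 is taken as a per-axis bound on the absolute mean and 𝑇2 as a per-axis bound on the variance of linear acceleration; the text gives the threshold values but not this mapping.

```python
from statistics import fmean, pvariance

T1 = (0.1, 0.1, 0.1)  # assumed: per-axis bound on |mean| of linear acceleration
T2 = (0.5, 0.5, 0.5)  # assumed: per-axis bound on variance of linear acceleration


def is_static(window):
    """window: list of (ax, ay, az) linear-acceleration samples."""
    for axis in range(3):
        values = [sample[axis] for sample in window]
        if abs(fmean(values)) > T1[axis] or pvariance(values) > T2[axis]:
            return False
    return True


still = [(0.01, -0.02, 0.0)] * 75           # 1.5 s of near-zero acceleration
moving = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)] * 37  # large swings on the x axis
```

Only a handful of additions and comparisons per window are needed, which is consistent with this stage being the always-on, lowest-energy step of the cascade.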

4.6 Evaluation

In this section, we first discuss the experimental setup and the process of data collection, fol-

lowed by FaceTouch’s overall performance and practical considerations.

4.6.1 Experiment Setup and Data Collection

In our study, we recruited 10 participants (8 males and 2 females with ages ranging from 20

to 30+). For each participant, we collected both face-touch and non-touch data throughout vari-

ous daily activities (sitting, standing, and walking), touching surfaces (e.g., glass, wood, rubber),

confounding gestures in which the hand moves toward the head but does not touch the

face (e.g., drinking, picking up the phone), and face-touch areas (e.g., jaw, forehead). By covering

as many scenarios as possible, we obtained a comprehensive dataset that includes various combinations

of movement displacement/speed/angle sequences, both when users touch the face and when they do not. During

the data collection process, the participants wore FaceTouch on their wrists and index fingers.

They were asked to perform gestures naturally as in daily life during the data collection. Though

they were wearing the prototype, their arm movements were not restricted, and the influence of the

prototype on their behaviors was minimized.

The two devices were clock-synchronized by network time. The inertial data consists of 3-axis

accelerometer data and orientation as a quaternion-form rotation vector, acquired at 50 Hz from Android

Figure 4.11 The illustration of the three scenarios in our dataset: (a) Sitting, (b) Standing: the four
colored spaces represent regions from which face-touch movements may start, and (c) Walking.

Table 4.3 Data composition of participants in our dataset.

#Gestures
Type

Activity Sitting
70 × 10
70 × 10

Face-touch
Non-touch
Confounding Gestures

Standing

70 × 10
70 × 10

Walking
Swing No Swing
20 × 10
70 × 10
20 × 10
20 × 10

100 × 10

API 28. The vibration data consists of 235 features described in Sec. 4.3.3. For data annotation,

participants were accompanied by an instructor who promptly logged the start and end times of

face-touch events, while an RGB camera, which was also clock-synchronized, recorded the session. During the

data segmentation and labeling process, the annotator integrated the manually logged timestamps

of events with camera videos to get accurate labels. To collect comprehensive face-touch and non-

touch gestures, participants performed various arm and hand gestures in three typical daily activities

as in Figure 4.11. We collected about three hours of sensing data from each participant. Table 4.3

shows the data composition of each participant.

For face-touch behaviors, participants moved their hands from various starting points, which

are randomly and evenly sampled, and then touched one of the five areas on the face shown in

Figure 4.12. For non-touch behaviors, participants performed various daily activities, including

staying still, typing, reading books, fetching objects, lifting, putting things down, exercising, and walking

with swinging or non-swinging arms. We also collected many confounding gestures, including

drinking water, eating, brushing teeth, adjusting the glasses/masks/hair, picking up the phone, and

raising hands. These gestures inevitably confuse many existing systems using near-field sensing.

4.6.2 Overall Performance

Precision, Recall and F-1 Score We evaluate FaceTouch using leave-one-user-out cross-validation

[Figure 4.11 panel labels: Side View, Front View, Arm Sagging, Arm Swinging; (a) Sitting, (b) Standing, (c) Walking]

Figure 4.12 Five face areas covered in our experiments.

Table 4.4 Comparison of experimental settings.

System     #Participants  Session Duration (per user)  #Activity Scenarios  Confounding Gestures  Cross-user model
FaceTouch  10             3 hrs                        3                    Various types         ✓
FaceSense  14             15 mins                      1                    Only facial           ✓

which takes data of 9 participants as training data and data of the remaining participant as testing

data. We compare our results with three baselines. The first baseline is FaceTouch without the

vibration sensing module. The second method is the personalized model, in which the data of the

same participant is split into a training set and a testing set at a ratio of 8:2. The third is

FaceSense (generic model) [29], a state-of-the-art face-touch detection

system using a dedicated earbud. Table 4.4 compares the experimental settings. Both

studies have a similar user group. We evaluate FaceTouch via a long-term session (3 hours), while

FaceSense was evaluated for only 15 minutes per user. FaceTouch’s dataset contains three activity

scenarios (sitting, standing, and walking) and various types of confounding gestures (introduced in

section 4.6.1) that are similar to face touch. For the FaceSense study, all data are collected when

the participants are sitting, and only facial movements that are confounded with face touch are

considered. Both studies leverage the cross-user model to evaluate their systems.
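
The two evaluation protocols can be sketched as follows. This only illustrates the data partitioning; the helper names are hypothetical, and the shuffling before the 8:2 split is an assumption, since the text does not say how the personalized split was drawn.

```python
import random


def leave_one_user_out(user_ids):
    """Yield (training users, held-out user) pairs, one fold per participant."""
    for held_out in user_ids:
        train = [u for u in user_ids if u != held_out]
        yield train, held_out


def personalized_split(samples, train_fraction=0.8, seed=0):
    """Split one participant's samples 8:2 (shuffling is an assumption)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


users = list(range(1, 11))                      # 10 participants
folds = list(leave_one_user_out(users))         # generic (cross-user) model
train, test = personalized_split(range(100))    # personalized model
```

In the generic setting the test user never contributes training data, so the reported scores reflect performance on unseen users; the personalized split does not have this property.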

Figure 4.13 shows the mean precision, recall, and F-1 score of FaceTouch’s generic model, as well as the

three baselines. The main metric we adopt for comparison is the F-1 score. Overall, FaceTouch

achieves a 93.5% F-1 score. Without our vibration sensing module, the F-1 score drops by 4%. The

improvement mainly comes from reducing false positives, since the vibration sensor can effectively

filter out non-skin-touch events such as drinking and making phone calls. Also, the F-1 score of FaceTouch

is only 1% lower than that of the personalized model, which demonstrates the generality of the

[Figure 4.12 labels: Eye-nose Area, Left Cheek Area, Forehead Area, Right Cheek Area, Jaw Area]

Figure 4.13 FaceTouch’s overall performance.

Figure 4.14 Energy consumption of each step in FaceTouch’s cascading classification.

system. Finally, FaceTouch’s generic model outperforms FaceSense by 9% in F-1 score.

Energy Consumption and Inference latency We measure the energy consumption of Face-

Touch using a Monsoon High Voltage Power Monitor [141]. The energy consumption of each step

in FaceTouch’s cascading classification is shown in Figure 4.14. When the wrist is static, Face-

Touch only consumes 14.9 𝜇J every 500 ms to monitor the accelerometer data. Then, depending

on the wrist movement, different IMU-based classifiers will be triggered step by step, and the total

energy consumption to produce a face-touch inference is 1287.07 𝜇J. We also measure the energy

consumption of the vibration sensing unit, which is shown in Table 4.5. Overall, each face-touch in-

ference consumes about 2570 𝜇J with signal digitization (750 𝜇J), injecting vibration chirp signals

(330 𝜇J), and Bluetooth communication (1050 𝜇J) as the main contributors. When the cascading

IMU-based classifier does not trigger the vibration-sensing module, the main processor will remain

in low-power mode, and the vibration-sensing unit will be turned off. The power consumption in

this mode is around 40 𝜇W.
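
The reported totals can be checked by summing the per-step figures. The sketch below uses the per-step IMU values as they appear in the Figure 4.14 labels and the component values from Table 4.5; the dictionary keys are illustrative names, and the surface-touch classification entry is an upper bound (reported as <240 𝜇J).

```python
# Energy per face-touch inference on the IMU side (values in microjoules).
imu_steps_uj = {
    "static_wrist_classifier": 14.9,
    "small_movement_classifier": 14.17,
    "dnn_based_detection": 1258.0,
}

# Energy per inference on the vibration-sensing side (values in microjoules).
vibration_steps_uj = {
    "transmitter": 330.0,
    "receiver": 200.0,
    "adc": 750.0,
    "surface_touch_classification": 240.0,  # upper bound, reported as <240
    "bluetooth": 1050.0,
}

imu_total_uj = sum(imu_steps_uj.values())
vibration_total_uj = sum(vibration_steps_uj.values())

# Average power of the always-on step: 14.9 uJ spent every 500 ms.
static_monitor_power_uw = imu_steps_uj["static_wrist_classifier"] / 0.5
```

The sums reproduce the 1287.07 𝜇J and roughly 2570 𝜇J quoted in the text, and the always-on monitoring step works out to about 29.8 𝜇W on average.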

Table 4.5 Energy consumption for producing one face-touch inference using our vibration sensing unit.

          Vibration-sensing Unit          Microcontroller
          Transmitter    Receiver         ADC             Surface-touch classification   Bluetooth
Energy    330 𝜇J (± 2)   200 𝜇J (± 2)     750 𝜇J (± 5)    <240 𝜇J (± 1)                  1050 𝜇J (± 20)

[Figure 4.13 axes: Percentage (%) for Ours (GM), Ours (GM, IMU only), Ours (PM), FaceSense (GM); series: Precision, Recall, F1]

[Figure 4.14 axes: log E (𝜇J) per model; labels: Static Wrist Classifier 14.9 𝜇J, Small Movement Classifier 14.17 𝜇J, DNN-based Detection 1258 𝜇J, Vibration-based Detection 2570 𝜇J]

The inference latency of FaceTouch depends on the type of gestures. Overall, the inference

latency for static wrist filtering, small movement filtering, face-touch gesture detection, and surface

classification is 0.049 ms, 0.046 ms, 3.01 ms, and 25 ms, respectively. In the worst case, when all

the classifiers are triggered, the total inference latency is 28.105 ms, which is still significantly less

than the face touch detection interval (500 ms).
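
The worst-case figure follows directly from adding the four stage latencies, as the short sketch below shows; the dictionary keys are illustrative names for the stages listed in the text.

```python
# Per-stage inference latency in milliseconds.
latency_ms = {
    "static_wrist_filtering": 0.049,
    "small_movement_filtering": 0.046,
    "face_touch_gesture_detection": 3.01,
    "surface_classification": 25.0,
}

worst_case_ms = sum(latency_ms.values())   # all four classifiers triggered
detection_interval_ms = 500.0
budget_fraction = worst_case_ms / detection_interval_ms
```

Even in the worst case the cascade uses under 6% of the 500-ms detection interval, so inference finishes well before the next trigger.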

Practicality To evaluate the practicality of using FaceTouch in the wild, we conducted a two-

hour user study in four typical scenarios. In scenario 𝑆1, the participant frequently touched the

face or performed confounding gestures and remained in the static state (e.g., the wrist is static)

for the rest of the time. In scenario 𝑆2, the participant was in the active state (e.g., the wrist moves

frequently) and frequently touched the face or performed confounding gestures. In scenario 𝑆3, the

participant conducted various activities (e.g., drinking, eating, etc.) involving many confounding

gestures. In scenario 𝑆4, the participant was in the active state but touched his/her face or per-

formed confounding gestures at normal frequency. Table 4.6 summarizes the frequency of face

touch and confounding gestures in the study, as well as the corresponding power consumption. We

have two observations. First, for normal scenarios (e.g., 𝑆4 where the participant performs various

non-face-touch gestures and occasionally touches the face), the total power consumption is below

61 𝜇W, which supports a run time of 273 days using a 400 mWh Lithium-ion battery. In this case,

the vibration sensing unit was rarely triggered since most irrelevant gestures were filtered by the

IMU sensing module. Second, 𝑆1, 𝑆2, and 𝑆3 involve many confounding gestures which trigger

all four classifiers frequently. However, in these extreme scenarios, the total power consumption is

still below 210 𝜇W, which supports a run time of 79 days using a 400 mWh Lithium-ion battery.

Scenarios 𝑆1 and 𝑆2 involve touch events at least once per minute and confounding gestures at least

three times per minute, and 𝑆3 assumes an even higher frequency of confounding gestures (e.g.,

25+ per min). Such scenarios either rarely occur in real life or do not last for hours. In general,

the real-life running time of FaceTouch using a 400 mWh Lithium-ion battery should be between

79 and 273 days, depending on the frequency of face-touch behaviors and confounding gestures.

Therefore, the long-term study demonstrates the practicality of FaceTouch in real use cases. Moreover,

(a) Face touch detection performance

(b) Energy consumption

Figure 4.15 The overall performance and the energy consumption of FaceTouch in four daily scenarios.

Table 4.6 Details of long-term experiments in four different scenarios and resulting average power.

Scenario  Frequency of       Frequency of                 Average Power (𝜇W)
          Face Touch (/h)    Confounding Gestures (/h)    Vibration Sensing   IMU Sensing
1         71                 279                          52.83               72.84
2         76                 297                          57.83               82.91
3         33                 1645                         54.26               154.89
4         24                 15                           16.42               44.47

the battery life is several orders of magnitude longer than that of existing solutions [28, 29, 142]. Figure 4.15(a)

shows the overall performance of the long-term experiments using FaceTouch in four daily scenarios.

FaceTouch achieves at least a 97% F-1 score, demonstrating its robustness in

realistic scenarios. Figure 4.15(b) shows the corresponding composition of energy consumption,

which varies among the four scenarios. The always-on Static Wrist Classifier

consumes the same amount of energy in all the scenarios and most of the energy is consumed by

the IMU sensor. The energy consumption of the Small movement Classifier is negligible due to

its lightweight computation overhead. In contrast, the DNN and vibration sensing consume the most

energy in FaceTouch. Depending on the occurrence of confounding gestures and touch events, the

DNN-based Face-touch Gesture Detector accounts for 20–60% of the total energy consumption.

Interestingly, since the IMU-based sensing filters out many irrelevant gestures, the vibration sensing

unit, which is the most power-hungry component in FaceTouch, is triggered less often and consumes less

than 25% of the total energy in all scenarios.
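
The run-time estimates follow from dividing the battery capacity by the average power. The sketch below reproduces them from the Table 4.6 values under a simplifying assumption that is not stated in the text: the full 400 mWh is usable, with no self-discharge or voltage-conversion losses.

```python
BATTERY_MWH = 400.0

# (vibration sensing, IMU sensing) average power in microwatts, from Table 4.6.
table_4_6_uw = {
    "S1": (52.83, 72.84),
    "S2": (57.83, 82.91),
    "S3": (54.26, 154.89),
    "S4": (16.42, 44.47),
}


def runtime_days(total_power_uw, battery_mwh=BATTERY_MWH):
    """Ideal run time: capacity (Wh) / power (W), converted from hours to days."""
    hours = (battery_mwh * 1e-3) / (total_power_uw * 1e-6)
    return hours / 24.0


total_uw = {name: vib + imu for name, (vib, imu) in table_4_6_uw.items()}
days = {name: runtime_days(p) for name, p in total_uw.items()}
```

The totals for S4 and S3 come out at 60.89 𝜇W and 209.15 𝜇W, and the rounded bounds of 61 𝜇W and 210 𝜇W give about 273 and 79 days, matching the figures quoted in the text.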

[Figure 4.15(a) axes: scenarios S1–S4 vs. Percentage (%); series: Precision, Recall, F1]

Figure 4.16 Impact of user diversity among 10 users.

Figure 4.17 Impact of user activity.

4.6.3 Practical Considerations

We then examine the impact of user diversity, activities, confounding gestures, surface materi-

als, and training the DNN-based face-touch detector with unlabeled data.

User Diversity We first look at the impact of user diversity. Figure 4.16 compares the precision,

recall, and F-1 score across 10 participants using leave-one-user-out cross-validation. We observe that

the F-1 score remains high (87–98%) across the 10 participants, which confirms that the multi-modal

sensing system significantly reduces false positives.

Activity Figure 4.17 shows the precision, recall, and F-1 score across three types of daily ac-

tivities. The performance in the standing scenario is the best, followed by that of the sitting and

walking scenarios. For standing scenarios, users often touch the face from the body side. Thus, the

movement of the face-touch gesture tends to be more significant than that when the user is sitting,

resulting in high detection accuracy. For walking scenarios, when the upper arm moves fast, the

movement of the lower arm can be ambiguous, resulting in erroneous features.

Confounding Gestures Confounding gestures like drinking and calling can degrade the sys-

tem’s accuracy and robustness. Figure 4.18 shows the impact of five false positive behaviors. The

recall of all these behaviors is almost 100%. Since none of the false positive behaviors involves

skin touch events, these behaviors can be effectively filtered out by our vibration-sensing module.

[Figure 4.16 axes: User ID (1–10) vs. Percentage (%); series: Precision, Recall, F1]

[Figure 4.17 axes: Daily activity (Sitting, Standing, Walking) vs. Percentage (%); series: Precision, Recall, F1]

Figure 4.18 Impact of confounding gestures.

Figure 4.19 Confusion matrix: surface material classification.

Surface Materials Classification In this experiment, each participant touched five surfaces

(cloth, glass, rubber, skin, and wood) ten times in random order. Figure 4.19 shows the confusion

matrix across the five surface materials using leave-one-user-out cross-validation. We observe Face-

Touch can precisely classify surface materials, which further distinguishes confounding gestures

that may be mistaken as face-touch events by the IMU-sensing module. Note that both face-touch

gesture detection and surface material detection are indispensable. The former relies on the latter

to reduce false positives, while the latter relies on the former to filter out irrelevant gestures that touch the

skin of other body parts (e.g., leg, arm) but do not approach the face. For example, in the case of

drinking water or making a phone call, FaceTouch may first recognize the touch event and the face-approaching

gesture. However, since the user does not touch the skin, the system can successfully filter out the

false-positive behavior by classifying the touched material.

Training DNN-based detector with Unlabeled Data In Section 4.4.4, we use unlabeled data

to train the face-touch detection DNN model through the loss in Equation (4.3). We conduct experiments

with 𝜎𝑛 = 0.01 for the Gaussian noise. For the generic model, we use 80% of all

the data as training data; a random 10% of the training data is treated as labeled, and the rest

is treated as unlabeled. We compare the DNN performance of training with/without semi-supervised

[Figure 4.18 axes: Face touching, Drinking, Calling, Adjusting glasses, Raising hand vs. Percentage (%); series: Precision, Recall, F1]

Figure 4.20 Training DNN-based detector with/without semi-supervised learning. Only 10% of the
training data are labeled while the rest 90% are unlabeled.

learning in Figure 4.20. The DNN model trained with semi-supervised learning achieves 85% precision,

86.8% recall, and an 85.9% F-1 score, which outperforms the DNN model trained without

semi-supervised learning (79.3% precision, 81.2% recall, and 80.2% F-1 score) by more than 5% in F-1 score

and is close to our best F-1 score (89.4%), achieved with 100% labeled data. The results demonstrate

that semi-supervised learning makes it possible to use large-scale unlabeled data collected from different

users in real life to improve the performance of deep face-touch gesture detection.
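
The reported F-1 scores are consistent with the reported precision and recall, since F-1 is their harmonic mean. A short check, using only the numbers quoted above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2.0 * precision * recall / (precision + recall)


f1_semi = f1_score(85.0, 86.8)      # trained with semi-supervised learning
f1_no_semi = f1_score(79.3, 81.2)   # trained on the labeled 10% only
gain = f1_semi - f1_no_semi
```

Rounded to one decimal place these give 85.9% and 80.2%, a gain of roughly 5.7 percentage points from using the unlabeled data.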

4.7 Discussion and Future Work

Face Touch using Different Fingers or Hands: In our experiments, the participants wore the

ring on the index finger, and the face-touch events involved the index finger touching the face by

default, which matches most cases in daily life. We explored the capability of detecting

surface-touch events using other fingers with the sensor placed on the proximal phalanx of the index finger and observed that

the frequency response drops when other fingers touch the face because the wave propagation distance

increases. Thus, in our current implementation, if the users touch their faces without using

the index finger, the precision of face touch detection decreases as the features from the vibration

sensing module can be erroneous. To solve this problem, one solution is moving the vibration sen-

sor from finger to wrist. However, the attenuation of the reflected vibration signal increases as the

propagation path increases. Once the SNR is below a certain threshold, the system cannot reliably

extract features related to face-touch events. We can leverage a powerful vibration sensor to in-

crease the sensing range, but such a sensor may degrade the user experience. We leave it as future

work.

Miniaturization: In our current implementation, the IMU and vibration sensors are placed

separately on a smartwatch and a ring prototype. The ring prototype is built with off-the-shelf

components that can be easily purchased from the market. There are more miniature components

that can be used to reduce the size of the prototype. Besides, there are already many commodity

smart rings on the market (e.g., Oura Ring). It is practically feasible to integrate vibration units

into such ring systems so that the prototype can be further miniaturized. The use of smartwatches

[143, 144] and smart rings to improve people’s quality of life has been well studied (e.g.,

sleep tracking [145], heart rate monitoring [146], menstrual cycle tracking [147]), so wearing such a

wrist-ring system introduces only acceptable overhead in our experiments and users’ daily life.

It is possible to implement a single-piece sensing system (e.g., a wearable ring) containing both

IMU-based and vibration-based sensing modules. However, the main challenge is to deploy our

DNN model on a resource-constrained microcontroller, which has limited computation resources

and memory space. We can also offload the sensing data to a remote server (e.g., a smartphone), but

the data communication can be extremely energy-consuming. Prior work has investigated various

ways to compress the DNN model that reduces both storage and computation requirements [148].

Therefore, we will investigate these approaches to compress our DNN model without sacrificing

performance and then combine the wrist-ring system into a single device. We leave it as future

work.

Noticeable Vibration Sound: The skin vibration generator operates at human-audible frequencies.

In our experiments, once the transceiver is triggered, it stays on for a while. Thus, the

vibration sound may be slightly annoying. We conducted a small-scale user study among

our 10 experiment subjects on the acceptance of the vibration sound: the average acceptance

score is 4.2 out of 5, the score variance is 0.75, and the lowest score is 3. A large-scale user study

remains future work. We intend to address the vibration sound issue in two ways to make it negligible

to the users. First, we will design a soft cover to absorb the sound propagating into the air while

maintaining the vibration signals on the skin. If the soft material has similar acoustic properties to

the skin, it will maximize the signal strength to the sensor [130]. For this purpose, we consider us-

ing inert silicone material to build the soft cover in our future work. Second, we can further reduce

the duty cycle of the vibration signal to turn the vibration sound down. However, a low duty cycle

can cause missed detections of short face-touch behaviors. Thus, we will explore an adaptive

duty cycle scheme to optimize the user experience and sensing performance.

ML Classifier Generalization: Due to the user diversity, especially the length of arms, when

we apply our pre-trained IMU-based classifiers to new users, the performance may depend on the

size of the training dataset. If the training dataset covers face-touch arm gestures performed

by users with similar body characteristics, the performance will be promising. To avoid performance degradation,

we will also introduce a bootstrapping process to calibrate the parameters of the IMU-based

classifiers. Specifically, before a new user uses our system, we will ask the user to perform several

pre-configured arm and hand gestures to fine-tune our classifiers. For larger user

groups in which the user diversity is higher and less data is available (perhaps only the bootstrapping

data) for unseen users, the semi-supervised learning that we explored in the experiments can provide

a quick start for training a preliminary model with fair performance. Moreover, we will pursue

a data-augmentation method to generate synthetic training data covering more diverse users

to enhance the generality of our IMU-based classifier.

4.8 Related Work

Near-field Sensing Methods Near-field sensing systems leverage wireless signals, including

acoustic, Wi-Fi, Bluetooth, or magnetic signals, to extract unique features from the movement of

hand gestures. Acoustic-based systems utilize microphones to capture reflected ultrasound signals

and recognize hand/arm gestures [21–23,35–38]. Recent work has turned existing smartphones and

earsets into a sonar sensing system to detect face touching in the presence of user mobility [20].

Radio frequency-based methods measure the reflection of electromagnetic waves from the human

body to recognize hand gestures [24–26, 39]. However, the main issue of these near-field sensing

systems is that they cannot distinguish many false-positive gestures (e.g., calling, drinking) from

face touch, since these gestures have hand trajectories similar to those of face-touching gestures. Our

system leverages a novel vibration sensor on a ring to filter out false-positive gestures. The per-

formance of our system which is reported with confounding gestures present in the testing dataset

outperforms the most recent work [20].

On-body Sensing Methods On-body sensing systems utilize off-the-shelf or customized wear-

able devices (e.g., wristbands, earbuds) and various sensing techniques (e.g., ECG, EOG) to de-

tect face touching. No Face-touch [30], FaceOff [31], Nudge [32], Immutouch [33], and Face

Touch Aware [34] developed mobile apps on off-the-shelf smartwatches or smartbands to detect

face touching using accelerometers. FaceSense designed a customized earbud with impedance

sensing and thermal sensing [29]. ElectroRing [142], SkinTrack [149], and ActiTouch [150] built

customized rings that couple a high-frequency AC signal (10–80 MHz) to the finger to detect

skin-touch events. However, some of these on-body systems require wearing a separate transceiver

on the body part being touched, which is impractical in face-touch scenarios. Our ring prototype

requires only a single point of instrumentation of both the transmitter and receiver and overcomes

the signal saturation problem, achieving precise face-touch detection across different users and sur-

faces. Besides, all of these on-body sensing systems drain the battery in a few hours because they

require always-on sensing and continuous signal processing to monitor and alert the users. Our

system leverages three sensing modalities and a cascading classifier to filter out irrelevant hand

gestures and minimize the active time of the energy-consuming components. As a result, the power

consumption is several orders of magnitude lower than that of existing solutions.

4.9 Conclusion

To conclude, we propose FaceTouch, a low-power, practical, and user-friendly face-touch detection

system for epidemiological surveillance. FaceTouch consists of a wrist-based IMU sensor and a novel

ring-based vibration sensor. To simultaneously achieve high detection precision and low energy

consumption, our design relies on four hierarchical classifiers. We first apply two energy-efficient

IMU-based classifiers to filter out irrelevant gestures. For face-touch-like gestures, we use an

IMU-based DNN classifier and a vibration-wave-based classifier to guarantee high precision and recall.

We implement FaceTouch using off-the-shelf hardware components and evaluate its performance

under various complex scenarios across ten participants. Experimental results show that the F-1

score is 93.5%. The power consumption is 60.89 𝜇W in normal usage and 209.15 𝜇W in extremely

heavy usage.
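
The hierarchical design summarized above can be illustrated with a minimal sketch of an early-exit cascade. The stage functions, thresholds, and relative cost values below are hypothetical placeholders, not FaceTouch's trained classifiers or measured energy figures; the sketch only shows how cheap stages gate the more expensive ones so that most windows never reach them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Window = Sequence[float]  # one window of sensor samples (hypothetical format)


@dataclass
class Stage:
    name: str
    accept: Callable[[Window], bool]  # True -> window may still be a face touch
    cost: float                       # relative energy cost (illustrative only)


def run_cascade(stages: List[Stage], window: Window) -> Tuple[bool, float, str]:
    """Run stages in order and stop at the first one that rejects the window.

    Returns (is_face_touch, total_cost, name_of_last_stage_run).
    """
    total_cost = 0.0
    last = ""
    for stage in stages:
        total_cost += stage.cost
        last = stage.name
        if not stage.accept(window):
            return False, total_cost, last  # early exit: later stages stay off
    return True, total_cost, last


# Hypothetical stand-ins for the four classifiers: two light IMU filters,
# an IMU-based DNN, and a vibration-based check. The rules are toy thresholds.
stages = [
    Stage("imu_filter_1", lambda w: max(w) - min(w) > 0.5, cost=1.0),
    Stage("imu_filter_2", lambda w: w[-1] > w[0], cost=2.0),
    Stage("imu_dnn", lambda w: sum(w) / len(w) > 0.4, cost=50.0),
    Stage("vibration", lambda w: w[-1] > 0.9, cost=100.0),
]

idle = [0.1, 0.1, 0.2, 0.1]        # little motion: rejected by the first filter
raise_hand = [0.0, 0.4, 0.8, 1.0]  # passes every stage in this toy setup

print(run_cascade(stages, idle))
print(run_cascade(stages, raise_hand))
```

In this toy setup the idle window pays only for the first stage, while the full cost is incurred only for windows that every earlier stage lets through, which is the mechanism behind the low average power reported above.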

CHAPTER 5

CONCLUSION

This dissertation presents research and development in low-power AIoT systems, focusing

on optimizing energy efficiency, data processing, and network reliability.

In Chapter 2, we present DeepLoRa. DeepLoRa adopts a deep learning-based approach that

utilizes fine-grained landcover information extracted from remote sensing images to accurately esti-

mate the path loss of long-distance LoRa links in complex environments. Compared with previous

environment-aware models, DeepLoRa enables per-link estimation that takes both the type and or-

der of landcovers into account. DeepLoRa also shows fair transferability. The evaluation shows

that DeepLoRa reduces the estimation error to less than 4 dB, which is 2× smaller than that of state-of-the-art

models. In Chapter 3, we further present LoSee, a fine-grained LoRa link-level

measurement study in a 6×6 km² urban area. Through this measurement, LoSee studies three fundamental

research issues and draws the following conclusions: 1) The spatial and temporal behavior of

LoRa links is quite dynamic due to environmental factors; 2) The coverage of LoRa gateways is

anisotropic; 3) The median error of RSSI-fingerprint-based localization in a given setting is about

400 m. Without densely deployed LoRa gateways, the SOTA LoRa localization can support road-

level localization.
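
To make the fingerprint-based localization idea concrete, the following is a minimal sketch of nearest-neighbor matching in RSSI space. It is not the localization pipeline evaluated in Chapter 3: the use of k-nearest neighbors, the number of gateways, and all RSSI values and coordinates are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A fingerprint pairs the RSSI (dBm) seen at each gateway with a known position.
Fingerprint = Tuple[Sequence[float], Tuple[float, float]]


def rssi_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two RSSI vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_locate(db: List[Fingerprint], query: Sequence[float], k: int = 2) -> Tuple[float, float]:
    """Estimate a position as the centroid of the k closest fingerprints."""
    nearest = sorted(db, key=lambda fp: rssi_distance(fp[0], query))[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return x, y


# Made-up database: RSSI at three gateways, position in meters on a local grid.
db = [
    ([-90.0, -110.0, -120.0], (0.0, 0.0)),
    ([-95.0, -105.0, -118.0], (400.0, 0.0)),
    ([-115.0, -92.0, -100.0], (2000.0, 1500.0)),
]

estimate = knn_locate(db, [-92.0, -108.0, -119.0], k=2)
print(estimate)
```

Because the estimate is an average of surveyed positions, its resolution is bounded by how densely fingerprints and gateways cover the area, which is consistent with the road-level accuracy noted above.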

In Chapter 4, we propose FaceTouch, a low-power multimodal wearable system that enables AI

algorithms to monitor face-touch events to protect people from virus infection during the COVID-

19 pandemic. We implement FaceTouch using commercial off-the-shelf hardware components

and evaluate its performance with various user activities and false-positive behaviors. FaceTouch

achieves a 93.5% F-1 score for face-touch detection and a power consumption of 60.89–209.15 𝜇W, depending

on usage, which is several orders of magnitude lower than that of state-of-the-art systems.

Many research topics related to this dissertation remain open for future work. For improving large-scale

LoRa deployment with AI techniques, weak signal detection, optimal transmission configuration prediction,

and gateway deployment planning are critically important. There is also much room for improving localization

accuracy in LoRaWAN systems with advanced models. To enable AI algorithms with IoT infrastructures

in complicated real-world applications, challenges like selecting and combining appropriate

sensing modalities as information sources in IoT systems, designing lightweight AI models that can

achieve efficient on-device computing, and further miniaturizing system components still require

exploration and investigation.

BIBLIOGRAPHY

[1] M. C. Bor, U. Roedig, T. Voigt, and J. M. Alonso, “Do lora low-power wide-area networks

scale?” in Proceedings of ACM MSWiM, 2016.

[2] J. Petajajarvi, K. Mikhaylov, A. Roivainen, T. Hanninen, and M. Pettissalo, “On the
coverage of lpwans: range evaluation and channel attenuation model for lora technology,” in
Proceedings of IEEE ITST, 2015.

[3] O. Iova, A. Murphy, G. P. Picco, L. Ghiro, D. Molteni, F. Ossi, and F. Cagnacci, “Lora from
the city to the mountains: Exploration of hardware and environmental factors,” in
Proceedings of EWSN, 2017.

[4] S. Kartakis, B. D. Choudhary, A. D. Gluhak, L. Lambrinos, and J. A. McCann, “Demystifying
low-power wide-area communications for city iot applications,” in Proceedings of ACM
WiNTECH, 2016.

[5] M. Centenaro, L. Vangelista, A. Zanella, and M. Zorzi, “Long-range communications in
unlicensed bands: The rising stars in the iot and smart city scenarios,” IEEE Wireless Com-
munications, 2016.

[6] B. Moyer, “Low power wide area: A survey of longer-range iot wireless protocols (2015),
retrieved sept. 7, 2015.”

[7] A. J. Wixted, P. Kinnaird, H. Larijani, A. Tait, A. Ahmadinia, and N. Strachan, “Evaluation
of lora and lorawan for wireless sensor networks,” in Proceedings of IEEE SENSORS, 2016.

[8] Y. Okumura, “Field strength and its variability in vhf and uhf land-mobile radio service,”
Rev. Electr. Commun. Lab., 1968.

[9] M. Hata, “Empirical formula for propagation loss in land mobile radio services,” IEEE Trans-

actions on Vehicular Technology, 1980.

[10] S. Demetri, M. Zúñiga, G. P. Picco, F. Kuipers, L. Bruzzone, and T. Telkamp, “Automated
estimation of link quality for lora: a remote sensing approach,” in Proceedings of ACM/IEEE
IPSN, 2019.

[11] Y. Lin, W. Dong, Y. Gao, and T. Gu, “Sateloc: A virtual fingerprinting approach to outdoor
lora localization using satellite images,” in Proceedings of ACM/IEEE IPSN, 2020.

[12] T. Hatta and S. J. Dimond, “Differences in face touching by japanese and british people,”

Neuropsychologia, vol. 22, no. 4, pp. 531–534, 1984.

[13] S. Dimond and R. Harries, “Face touching in monkeys, apes and man: Evolutionary origins

and cerebral asymmetry,” Neuropsychologia, vol. 22, no. 2, pp. 227–233, 1984.

[14] Y. L. A. Kwok, J. Gralton, and M.-L. McLaws, “Face touching: a frequent habit that has
implications for hand hygiene,” American journal of infection control, vol. 43, no. 2, pp.
112–114, 2015.

[15] K. Morita, K. Hashimoto, M. Ogata, H. Tsutsumi, S.-i. Tanabe, and S. Hori, “Measurement
of face-touching frequency in a simulated train,” in E3S Web of Conferences, vol. 111. EDP
Sciences, 2019, p. 02027.

[16] T. Singhal, “A review of coronavirus disease-2019 (covid-19),” The Indian Journal of Pedi-

atrics, pp. 1–6, 2020.

[17] M.-L. McLaws, A. A. Chughtai, S. Salmon, and C. R. MacIntyre, “A highly precautionary
doffing sequence for health care workers after caring for wet ebola patients to further reduce
occupational acquisition of ebola,” American journal of infection control, vol. 44, no. 7, pp.
740–744, 2016.

[18] M. Liu, J. Ou, L. Zhang, X. Shen, R. Hong, H. Ma, B.-P. Zhu, and R. E. Fontaine, “Protective
effect of hand-washing and good hygienic habits against seasonal influenza: a case-control
study,” Medicine, vol. 95, no. 11, 2016.

[19] N. Zhang, P. Wang, T. Miao, P.-T. Chan, W. Jia, P. Zhao, B. Su, X. Chen, and Y. Li, “Real
human surface touch behavior based quantitative analysis on infection spread via fomite route
in an office,” Building and Environment, vol. 191, p. 107578, 2021.

[20] C. Rojas, N. Poulsen, M. Van Tuyl, D. Vargas, Z. Cohen, J. Paradiso, P. Maes, K. Esvelt,
and F. Adib, “A scalable solution for signaling face touches to reduce the spread of surface-
based pathogens,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous
Technologies, vol. 5, no. 1, 2021.

[21] S. Yun, Y.-C. Chen, H. Zheng, L. Qiu, and W. Mao, “Strata: Fine-grained acoustic-based
device-free tracking,” in Proceedings of the 15th annual international conference on mobile
systems, applications, and services, 2017, pp. 15–28.

[22] R. Nandakumar, V. Iyer, D. Tan, and S. Gollakota, “Fingerio: Using active sonar for fine-
grained finger tracking,” in Proceedings of the 2016 CHI Conference on Human Factors in
Computing Systems, 2016, pp. 1515–1525.

[23] W. Mao, J. He, and L. Qiu, “Cat: high-precision acoustic motion tracking,” in Proceedings
of the 22nd Annual International Conference on Mobile Computing and Networking, 2016,
pp. 69–81.

[24] B. Kellogg, V. Talla, and S. Gollakota, “Bringing gesture recognition to all devices,” in
11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14),
2014, pp. 303–316.

[25] C. Zhao, K.-Y. Chen, M. T. I. Aumi, S. Patel, and M. S. Reynolds, “Sideswipe: detecting
in-air gestures around mobile devices using actual gsm signal,” in Proceedings of the 27th
annual ACM symposium on User interface software and technology, 2014, pp. 527–534.

[26] F. Adib, Z. Kabelac, D. Katabi, and R. C. Miller, “3d tracking via body radio reflections,”
in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14),
2014, pp. 317–329.

[27] N. D’Aurizio, T. L. Baldi, S. Marullo, G. Paolocci, and D. Prattichizzo, “Reducing
face-touches to limit covid-19 outbreak: an overview of solutions,” in 2021 29th Mediterranean
Conference on Control and Automation (MED). IEEE, 2021, pp. 645–650.

[28] D. Chen, M. Wang, C. He, Q. Luo, Y. Iravantchi, A. Sample, K. G. Shin, and X. Wang,
“Magx: wearable, untethered hands tracking with passive magnets,” in Proceedings of the
27th Annual International Conference on Mobile Computing and Networking, 2021, pp. 269–
282.

[29] V. Kakaraparthi, Q. Shao, C. J. Carver, T. Pham, N. Bui, P. Nguyen, X. Zhou, and T. Vu,
“Facesense: Sensing face touch with an ear-worn system,” Proceedings of the ACM on In-
teractive, Mobile, Wearable and Ubiquitous Technologies, vol. 5, no. 3, pp. 1–27, 2021.

[30] S. Marullo, T. L. Baldi, G. Paolocci, N. D’Aurizio, and D. Prattichizzo, “No face-touch:
Exploiting wearable devices and machine learning for gesture detection,” in 2021 IEEE In-
ternational Conference on Robotics and Automation (ICRA).
IEEE, 2021, pp. 4187–4193.

[31] X. ‘Anthony’ Chen, “Faceoff: Detecting face touching with a wrist-worn accelerometer,”
arXiv e-prints, pp. arXiv–2008, 2020.

[32] “Nudge,” https://www.nudgeband.co.uk/, 2021, accessed: 2021-11-03.

[33] “Immutouch: Stay healthy and hygienic with immutouch,” https://immutouch.com/, 2021,
accessed: 2021-11-03.

[34] “Face Touch Aware: the apple watch app,” https://facetouch.app, 2021, accessed: 2021-11-03.

[35] S. Gupta, D. Morris, S. Patel, and D. Tan, “Soundwave: using the doppler effect to sense ges-
tures,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
2012, pp. 1911–1914.

[36] Y. Qifan, T. Hao, Z. Xuebing, L. Yin, and Z. Sanfeng, “Dolphin: Ultrasonic-based gesture
recognition on smartphone platform,” in 2014 IEEE 17th International Conference on
Computational Science and Engineering. IEEE, 2014, pp. 1461–1468.

[37] W. Ruan, Q. Z. Sheng, L. Yang, T. Gu, P. Xu, and L. Shangguan, “Audiogest: enabling fine-
grained hand gesture detection by decoding echo signal,” in Proceedings of the 2016 ACM
international joint conference on pervasive and ubiquitous computing, 2016, pp. 474–485.

[38] X. Li, H. Dai, L. Cui, and Y. Wang, “Sonicoperator: Ultrasonic gesture recognition
with deep neural network on mobiles,” in 2017 IEEE SmartWorld, Ubiquitous Intelligence
& Computing, Advanced & Trusted Computed, Scalable Computing & Communications,
Cloud & Big Data Computing, Internet of People and Smart City Innovation
(SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, 2017, pp. 1–7.

[39] Q. Pu, S. Gupta, S. Gollakota, and S. Patel, “Whole-home gesture recognition using wireless
signals,” in Proceedings of the 19th annual international conference on Mobile computing
& networking, 2013, pp. 27–38.

[40] L. Mo, Y. He, Y. Liu, J. Zhao, S.-J. Tang, X.-Y. Li, and G. Dai, “Canopy closure estimates

with greenorbs: sustainable sensing in the forest,” in Proceedings of ACM SenSys, 2009.

[41] X. Mao, X. Miao, Y. He, X.-Y. Li, and Y. Liu, “Citysee: Urban CO₂ monitoring with sen-

sors,” in Proceedings of IEEE INFOCOM, 2012.

[42] M. Ceriotti, M. Corrà, L. D’Orazio, R. Doriguzzi, D. Facchin, S. Guna, G. P. Jesi, R. L.
Cigno, L. Mottola, A. L. Murphy et al., “Is there light at the ends of the tunnel? wireless
sensor networks for adaptive lighting in road tunnels,” in Proceedings of IEEE/ACM IPSN,
2011.

[43] J. Haxhibeqiri, E. De Poorter, I. Moerman, and J. Hoebeke, “A survey of lorawan for iot:
From technology to application,” Sensors, 2018.

[44] L. Li, J. Ren, and Q. Zhu, “On the application of lora lpwan technology in sailing monitoring
system,” in Proceedings of IEEE WONS. IEEE, 2017.

[45] B. Reynders and S. Pollin, “Chirp spread spectrum as a modulation technique for long range

communication,” in Proceedings of IEEE SCVT, 2016.

[46] L. Vangelista, A. Zanella, and M. Zorzi, “Long-range iot technologies: The dawn of lora™,”

in Future access enablers of ubiquitous and intelligent infrastructures. Springer, 2015.

[47] Y. Yao, Z. Ma, and Z. Cao, “Losee: Long-range shared bike communication system based

on lorawan protocol,” in Proceedings of EWSN, 2019.

[48] K.-H. Lam, C.-C. Cheung, and W.-C. Lee, “Lora-based localization systems for noisy out-

door environment,” in Proceedings of IEEE WiMob, 2017.

[49] M. Aernouts, R. Berkvens, K. Van Vlaenderen, and M. Weyn, “Sigfox and lorawan datasets

for fingerprint localization in large urban and rural areas,” Data, 2018.

[50] W. Choi, Y.-S. Chang, Y. Jung, and J. Song, “Low-power lora signal-based outdoor position-

ing using fingerprint algorithm,” ISPRS International Journal of Geo-Information, 2018.

[51] H. T. Friis, “A note on a simple transmission formula,” Proceedings of IEEE IRE, 1946.

[52] J. C. Liando, A. Gamage, A. W. Tengourtius, and M. Li, “Known and unknown facts of lora:
Experiences from a large-scale measurement study,” ACM Transactions on Sensor Networks,
2019.

[53] T. S. Rappaport et al., Wireless communications: principles and practice. Prentice Hall
PTR New Jersey, 1996, vol. 2.

[54] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, 2015.

[55] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural networks, 2015.

[56] L. Deng, D. Yu et al., “Deep learning: methods and applications,” Foundations and Trends®

in Signal Processing, 2014.

[57] C. A. Oroza, Z. Zhang, T. Watteyne, and S. D. Glaser, “A machine-learning-based connec-
tivity model for complex terrain large-scale low-power wireless deployments,” IEEE Trans-
actions on Cognitive Communications and Networking, 2017.

[58] Y. Zhang, J. Wen, G. Yang, Z. He, and X. Luo, “Air-to-air path loss prediction based on
machine learning methods in urban environments,” Wireless Communications and Mobile
Computing, 2018.

[59] S. I. Popoola, S. Misra, and A. A. Atayero, “Outdoor path loss predictions based on extreme

learning machine,” Wireless Personal Communications, 2018.

[60] H. Cheng, H. Lee, and S. Ma, “Cnn-based indoor path loss modeling with reconstruction of

input images,” in Proceedings of IEEE ICTC, 2018.

[61] E. Ostlin, H.-J. Zepernick, and H. Suzuki, “Macrocell path-loss prediction using artificial

neural networks,” IEEE Transactions on Vehicular Technology, 2010.

[62] A. Novelli, M. A. Aguilar, A. Nemmaoui, F. J. Aguilar, and E. Tarantino, “Performance
evaluation of object based greenhouse detection from sentinel-2 msi and landsat 8 oli data:
A case study from almería (spain),” International journal of applied earth observation and
geoinformation, 2016.

[63] M. Pesaresi, C. Corbane, A. Julea, A. J. Florczyk, V. Syrris, and P. Soille, “Assessment of

the added-value of sentinel-2 for detecting built-up areas,” Remote Sensing, 2016.

[64] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classifi-

cation,” IEEE Transactions on Geoscience and Remote Sensing, 2005.

[65] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,”

IEEE transactions on Neural Networks, 2002.

[66] C. Huang, L. Davis, and J. Townshend, “An assessment of support vector machines for land

cover classification,” International Journal of remote sensing, 2002.

[67] O. Vinyals, S. V. Ravuri, and D. Povey, “Revisiting recurrent neural networks for robust asr,”

in Proceedings of IEEE ICASSP, 2012.

[68] I. Sutskever, J. Martens, and G. E. Hinton, “Generating text with recurrent neural networks,”
in Proceedings of ICML, 2011.

[69] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, 1997.

[70] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions

on Signal Processing, 1997.

[71] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings

of the international conference on artificial intelligence and statistics, 2011.

[72] D. Mandrioli, “Semantic segmentation editor,”
https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor.

[73] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen,
Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito,
M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala,
“Pytorch: An imperative style, high-performance deep learning library,” in
Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle,
A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc.,
2019, pp. 8024–8035. [Online]. Available:
http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

[74] L. Alliance, “A technical overview of lora and lorawan,” in
https://lora-alliance.org/resource-hub/what-lorawanr, Retrieved by July 19th 2021.

[75] A. Research, “Nb-iot and lte-m issues to boost lora and sigfox near and long-term lead in
lpwa network connections,” in https://tinyurl.com/2026-cellular-iot, Retrieved by July 19th
2021.

[76] L. SX1276, “77/78/79 datasheet, rev. 4,” Semtech, March, 2015.

[77] C. Li, H. Guo, S. Tong, X. Zeng, Z. Cao, M. Zhang, Q. Yan, L. Xiao, J. Wang, and Y. Liu,
“Nelora: Towards ultra-low snr lora communication with neural-enhanced demodulation,”
in Proceedings of ACM SenSys, 2021.

[78] C. Li and Z. Cao, “Lora networking techniques for large-scale and long-term iot: A down-

to-top survey,” ACM Computing Surveys, 2022.

[79] L. Liu, Y. Yao, Z. Cao, and M. Zhang, “Deeplora: Learning accurate path loss model for
long distance links in lpwan,” in Proceedings of IEEE INFOCOM, 2021.

[80] J. Yang, Z. Xu, and J. Wang, “Ferrylink: Combating link degradation for practical lpwan
deployments,” in Proceedings of IEEE ICPADS. IEEE, 2021.

[81] J. Navarro-Ortiz, S. Sendra, P. Ameigeiras, and J. M. Lopez-Soler, “Integration of lorawan
and 4g/5g for the industrial internet of things,” IEEE Communications Magazine, 2018.

[82] J. Haxhibeqiri, A. Karaagac, F. Van den Abeele, W. Joseph, I. Moerman, and J. Hoebeke,
“Lora indoor coverage and performance in an industrial environment: Case study,” in
Proceedings of IEEE international conference on emerging technologies and factory automation
(ETFA), 2017.

[83] J. Petäjäjärvi, K. Mikhaylov, M. Hämäläinen, and J. Iinatti, “Evaluation of lora lpwan
technology for remote health and wellbeing monitoring,” in Proceeding of the International
Symposium on Medical Information and Communication Technology, 2016.

[84] D. Magrin, M. Centenaro, and L. Vangelista, “Performance evaluation of lora networks in a

smart city scenario,” in Proceeding of IEEE ICC, 2017.

[85] Y. Wang, X. Zheng, L. Liu, and H. Ma, “Polartracker: Attitude-aware channel access for
floating low power wide area networks,” in Proceedings of IEEE INFOCOM, 2021.

[86] W. Xu, J. Y. Kim, W. Huang, S. S. Kanhere, S. K. Jha, and W. Hu, “Measurement, charac-
terization, and modeling of lora technology in multifloor buildings,” IEEE Internet of Things
Journal, 2019.

[87] C. Li, X. Guo, L. Shuangguan, Z. Cao, and K. Jamieson, “Curvinglora to boost lora network

throughput via concurrent transmission,” in Proceedings of USENIX NSDI, 2022.

[88] GPS-free geolocation using LoRa in low-power WANs. IEEE, 2017.

[89] N. Podevijn, D. Plets, J. Trogh, L. Martens, P. Suanet, K. Hendrikse, and W. Joseph, “Tdoa-
based outdoor positioning with tracking algorithm in a public lora network,” Wireless Com-
munications and Mobile Computing, 2018.

[90] N. Podevijn, D. Plets, M. Aernouts, R. Berkvens, L. Martens, M. Weyn, and W. Joseph, “Ex-
perimental tdoa localisation in real public lora networks,” in Proceedings of CEUR Workshop
Proceedings, 2019.

[91] D. F. Carvalho, A. Depari, P. Ferrari, A. Flammini, S. Rinaldi, and E. Sisinni, “On the
feasibility of mobile sensing and tracking applications based on lpwan,” in Proceedings of IEEE
SAS. IEEE, 2018.

[92] A. Dongare, C. Hesling, K. Bhatia, A. Balanuta, R. L. Pereira, B. Iannucci, and A. Rowe,
“Openchirp: A low-power wide-area networking architecture,” in Proceedings of IEEE Per-
Com Workshops, 2017.

[93] C. Gu, L. Jiang, and R. Tan, “Lora-based localization: Opportunities and challenges,” arXiv

preprint arXiv:1812.11481, 2018.

[94] R. Nandakumar, V. Iyer, and S. Gollakota, “3d localization for sub-centimeter sized devices,”

in Proceedings of ACM SenSys, 2018.

[95] A. Bansal, A. Gadre, V. Singh, A. Rowe, B. Iannucci, and S. Kumar, “Owll: Accurate lora
localization using the tv whitespaces,” in Proceedings of the 20th International Conference
on Information Processing in Sensor Networks, 2021.

[96] H. Sallouha, A. Chiumento, and S. Pollin, “Localization in long-range ultra narrow band iot

networks using rssi,” in Proceedings of IEEE ICC, 2017.

[97] Y. Li, Z. He, Y. Li, H. Xu, L. Pei, and Y. Zhang, “Towards location enhanced iot:
Characterization of lora signal for wide area localization,” in Proceedings of IEEE UPINLBS.
IEEE, 2018.

[98] S. Tong, J. Wang, and Y. Liu, “Combating packet collisions using non-stationary signal scal-

ing in LPWANs,” in Proceedings of ACM MobiSys, 2020.

[99] Y. Liu, Y. He, M. Li, J. Wang, K. Liu, and X. Li, “Does wireless sensor network scale? a
measurement study on greenorbs,” IEEE Transactions on Parallel and Distributed Systems,
2012.

[100] W. Dong, Y. Liu, Y. He, T. Zhu, and C. Chen, “Measurement and analysis on the packet de-
livery performance in a large-scale sensor network,” IEEE/ACM Transactions on Networking,
2013.

[101] J. Wang and W. Dong, “Understanding the link-level behaviors of a large scale urban sensor

network,” in Proceedings of IEEE MSN, 2016.

[102] D. Aguayo, J. Bicket, S. Biswas, G. Judd, and R. Morris, “Link-level measurements from an

802.11 b mesh network,” in Proceedings of ACM SIGCOMM, 2004.

[103] C. K. Williams and C. E. Rasmussen, “Gaussian processes for regression,” 1996.

[104] Y. Sangar and B. Krishnaswamy, “Wichronos: Energy-efficient modulation for long-range,

large-scale wireless networks,” in Proceedings of ACM MobiCom, 2020.

[105] C. Li, Z. Cao, and L. Xiao, “Curvealoha: Non-linear chirps enabled high throughput random

channel access for lora,” in Proceedings of IEEE INFOCOM, 2022.

[106] A. Dongare, R. Narayanan, A. Gadre, A. Luong, A. Balanuta, S. Kumar, B. Iannucci, and
A. Rowe, “Charm: exploiting geographical diversity through coherent combining in low-
power wide-area networks,” in Proceedings of ACM/IEEE IPSN, 2018.

[107] A. Gadre, R. Narayanan, A. Luong, A. Rowe, B. Iannucci, and S. Kumar, “Frequency
Configuration for Low-Power Wide-Area Networks in a Heartbeat,” in Proceedings of
USENIX NSDI, 2020.

[108] L. Chen, J. Xiong, X. Chen, S. I. Lee, K. Chen, D. Han, D. Fang, Z. Tang, and Z. Wang,
“WideSee: towards wide-area contactless wireless sensing,” in Proceedings of ACM SenSys,
2019.

[109] R. Nandakumar, V. Iyer, and S. Gollakota, “3D Localization for Sub-Centimeter Sized De-

vices,” in Proceedings of ACM SenSys, 2018.

[110] F. Zhang, Z. Chang, K. Niu, J. Xiong, B. Jin, Q. Lv, and D. Zhang, “Exploring LoRa for

Long-range Through-wall Sensing,” Proceedings of ACM IMWUT, 2020.

[111] S. Zhang, W. Wang, N. Zhang, and T. Jiang, “RF Backscatter-based State Estimation for

Micro Aerial Vehicles,” in Proceedings of IEEE INFOCOM, 2020.

[112] W. C. McGrew and L. F. Marchant, “On the other hand: current issues in and meta-analysis
of the behavioral laterality of hand function in nonhuman primates,” American Journal of
Physical Anthropology: The Official Publication of the American Association of Physical
Anthropologists, vol. 104, no. S25, pp. 201–232, 1997.

[113] L. J. Rogers, G. Vallortigara, and R. J. Andrew, Divided brains: the biology and behaviour

of brain asymmetries. Cambridge University Press, 2013.

[114] N. Zhang, W. Jia, P. Wang, M.-F. King, P.-T. Chan, and Y. Li, “Most self-touches are with

the nondominant hand,” Scientific reports, vol. 10, no. 1, pp. 1–13, 2020.

[115] S. L. Warnes, Z. R. Little, and C. W. Keevil, “Human coronavirus 229e remains infectious

on common touch surface materials,” MBio, vol. 6, no. 6, pp. e01 697–15, 2015.

[116] A. Guellich, E. Tella, M. Ariane, C. Grodner, H.-N. Nguyen-Chi, and E. Mahé, “The face
mask-touching behavior during the covid-19 pandemic: Observational study of public trans-
portation users in the greater paris region: The french-mask-touch study,” Journal of trans-
port & health, vol. 21, p. 101078, 2021.

[117] Z. Witkower and J. L. Tracy, “Bodily communication of emotion: Evidence for extrafacial
behavioral expressions and available coding systems,” Emotion Review, vol. 11, no. 2, pp.
184–193, 2019.

[118] R. Haratian, “Assistive wearable technology for mental wellbeing: Sensors and signal
processing approaches,” in 2019 5th International Conference on Frontiers of Signal Processing
(ICFSP). IEEE, 2019, pp. 7–11.

[119] S. M. Mueller, S. Martin, and M. Grunwald, “Self-touch: contact durations and point of
touch of spontaneous facial self-touches differ depending on cognitive and emotional load,”
PloS one, vol. 14, no. 3, p. e0213677, 2019.

[120] M. De Zambotti, N. Cellini, A. Goldstone, I. M. Colrain, and F. C. Baker, “Wearable sleep
technology in clinical and research settings,” Medicine and science in sports and exercise,
vol. 51, no. 7, p. 1538, 2019.

[121] A. Parate, M.-C. Chiu, C. Chadowitz, D. Ganesan, and E. Kalogerakis, “Risq: Recognizing
smoking gestures with inertial sensors on a wristband,” in Proceedings of the 12th annual
international conference on Mobile systems, applications, and services, 2014, pp. 149–161.

[122] R. E. Wright, “Logistic regression.” 1995.

[123] D. G. Kleinbaum, K. Dietz, M. Gail, M. Klein, and M. Klein, Logistic regression. Springer,

2002.

[124] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and
Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical ma-
chine translation,” arXiv preprint arXiv:1406.1078, 2014.

[125] T. Miyato, S.-i. Maeda, M. Koyama, and S. Ishii, “Virtual adversarial training: a regular-
ization method for supervised and semi-supervised learning,” IEEE transactions on pattern
analysis and machine intelligence, vol. 41, no. 8, pp. 1979–1993, 2018.

[126] R. O. Nelson, R. A. Boykin, and S. C. Hayes, “Long-term effects of self-monitoring on
reactivity and on accuracy,” Behaviour research and therapy, vol. 20, no. 4, pp. 357–363,
1982.

[127] S. Foti, C. G. Lai, G. J. Rix, and C. Strobbia, Surface wave methods for near-surface site

characterization. CRC press, 2014.

[128] N. Roy, M. Gowda, and R. R. Choudhury, “Ripple: Communicating through physical
vibration,” in 12th USENIX Symposium on Networked Systems Design and Implementation
(NSDI 15), 2015, pp. 265–278.

[129] C. Zhang, Q. Xue, A. Waghmare, R. Meng, S. Jain, Y. Han, X. Li, K. Cunefare, T. Ploetz,
T. Starner et al., “Fingerping: Recognizing fine-grained hand poses using active acoustic on-
body sensing,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing
Systems, 2018, pp. 1–10.

[130] J. Gong, A. Gupta, and H. Benko, “Acustico: Surface tap detection and localization using
wrist-based acoustic tdoa sensing,” in Proceedings of the 33rd Annual ACM Symposium on
User Interface Software and Technology, 2020, pp. 406–419.

[131] W. Wang, L. Yang, and Q. Zhang, “Touch-and-guard: secure pairing through hand reso-
nance,” in Proceedings of the 2016 ACM International Joint Conference on Pervasive and
Ubiquitous Computing, 2016, pp. 670–681.

[132] “TDK piezolisten™ piezo transducers,”
https://product.tdk.com/en/search/sw_piezo/speaker/piezolisten/info?part_no=PHUA2010-049B-00-000&utm_source=piezolisten_commercial_phu_en.pdf&utm_medium=catalog,
2010, accessed: 2010-09-30.

[133] R. A. Serway and C. Vuille, College physics. Cengage Learning, 2014.

[134] D. Rachaveti, N. Chakrabhavi, V. Shankar, and S. Varadhan, “Thumbs up: movements made
by the thumb are smoother and larger than fingers in finger-thumb opposition tasks,” PeerJ,
vol. 6, p. e5763, 2018.

[135] F. W. Jones, Finger-ring Lore: Historical, Legendary, Anecdotal. Good Press, 2019.

[136] N. J. Mansfield, Human response to vibration. CRC press, 2004.

[137] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of

statistics, pp. 1189–1232, 2001.

[138] J. Diebel, “Representing attitude: Euler angles, unit quaternions, and rotation vectors,” Ma-

trix, vol. 58, no. 15-16, pp. 1–35, 2006.

[139] S. Menard, Applied logistic regression analysis. Sage, 2002, vol. 106.

[140] “Adafruit bluefruit le uart friend - bluetooth low energy (ble),” 2022, accessed: 2022-01-30.

[141] “Monsoon high voltage power monitor,”
https://www.msoon.com/high-voltage-power-monitor, 2021, accessed: 2021-11-03.

[142] W. Kienzle, E. Whitmire, C. Rittaler, and H. Benko, “Electroring: Subtle pinch and touch
detection with a ring,” in Proceedings of the 2021 CHI Conference on Human Factors in
Computing Systems, 2021, pp. 1–12.

[143] Z. Song, Z. Cao, Z. Li, and J. Wang, “Magic wand: Towards plug-and-play gesture
recognition on smartwatch,” in 2020 16th International Conference on Mobility, Sensing and
Networking (MSN). IEEE, 2020, pp. 275–282.

[144] Z. Song, Z. Cao, Z. Li, J. Wang, and Y. Liu, “Inertial motion tracking on mobile and wearable
devices: Recent advancements and challenges,” Tsinghua Science and Technology, vol. 26,
no. 5, pp. 692–705, 2021.

[145] M. Altini and H. Kinnunen, “The promise of sleep: A multi-sensor approach for accurate

sleep stage detection using the oura ring,” Sensors, vol. 21, no. 13, p. 4302, 2021.

[146] D. Phan, L. Y. Siong, P. N. Pathirana, and A. Seneviratne, “Smartwatch: Performance
evaluation for long-term heart rate monitoring,” in 2015 International symposium on
bioelectronics and bioinformatics (ISBB). IEEE, 2015, pp. 144–147.

[147] A. Maijala, H. Kinnunen, H. Koskimäki, T. Jämsä, and M. Kangas, “Nocturnal finger skin
temperature in menstrual cycle tracking: ambulatory pilot study using a wearable oura ring,”
BMC Women’s Health, vol. 19, no. 1, pp. 1–10, 2019.

[148] R. Mishra, H. P. Gupta, and T. Dutta, “A survey on deep neural network compression: Chal-

lenges, overview, and solutions,” arXiv preprint arXiv:2010.03954, 2020.

[149] Y. Zhang, J. Zhou, G. Laput, and C. Harrison, “Skintrack: Using the body as an electri-
cal waveguide for continuous finger tracking on the skin,” in Proceedings of the 2016 CHI
Conference on Human Factors in Computing Systems, 2016, pp. 1491–1503.

[150] Y. Zhang, W. Kienzle, Y. Ma, S. S. Ng, H. Benko, and C. Harrison, “Actitouch: Robust touch
detection for on-skin ar/vr interfaces,” in Proceedings of the 32nd Annual ACM Symposium
on User Interface Software and Technology, 2019, pp. 1151–1159.
