DIET MONITORING THROUGH BREATHING SIGNAL ANALYSIS USING
WEARABLE SENSORS
By
Bo Dong

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Electrical Engineering - Doctor of Philosophy
2014

ABSTRACT
DIET MONITORING THROUGH BREATHING SIGNAL ANALYSIS USING WEARABLE
SENSORS
By
Bo Dong

This dissertation presents a framework for a wearable food and drink intake monitoring
system that analyzes the human breathing signal to identify swallows during the intake process.
The system is based on the key observation that a person’s otherwise continuous breathing
cycles are interrupted by brief intra-cycle apneas during swallows. This dissertation develops
wireless wearable electronics for capturing and processing the human breathing signal, and
algorithms for identifying intake-related swallows by recognizing apneas extracted from the
breathing signal. A family of apnea detection mechanisms, including matched filters and machine
learning, has been developed. Algorithms are developed for detecting various types of
swallowing events, including solid and liquid swallows, in the presence of the many artifacts
found in free-living conditions. It is demonstrated that, using these algorithms and the
electronics, run-time intake monitoring and analysis are feasible at acceptable accuracy levels.
Further accuracy improvements were explored using a Hidden Markov Model (HMM) based mechanism
that leverages known temporal locality in the human swallow sequence. Finally, it was demonstrated
that by combining swallowing signatures from the breathing signal with hand movement signatures
from accelerometers, it is possible to train hierarchical Support Vector Machine (SVM)
classifiers and a Hidden Markov Model (HMM) for accurate mealtime and duration estimation.
The developed wearable system, along with a smartphone App, was experimentally validated on
tens of subjects with approval from MSU’s Institutional Review Board (IRB).

To My Family
For All Their
Love and Support


ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Subir Biswas, for his support and instruction
throughout my PhD study. His academic rigor guided me to keep improving
in every aspect of research. I would also like to thank my committee members, Dr. Lalita Udpa,
Dr. Sandeep Kulkarni, Dr. Nihar Mahapatra and Dr. Rama Mukkamala, for their time and support.
I would also like to thank my lab mates Qiong Huo, Mahmoud Taghizadeh, Debasmit
Banerjee, Faeze Hajiaghajani, Stephan Lorenz, Muhannad Quwaider, William Tomlinson,
Clifton Watson, Yan Shi, Saptarshi Das, and Dezhi Feng for participating in my experiments,
brainstorming and implementation discussions.
Last but not least, I would like to give my thanks to my friends for supporting me through
this process.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1: Introduction
1.1 Background
1.2 Objectives
1.3 Summary of Proposed Solutions
1.4 Dissertation Structure

Chapter 2: Related Works
2.1 Invasive Swallow Detection
2.2 Non-Invasive Swallow Detection
2.2.1 Non-Invasive Swallow Detection using Chewing and Swallowing Sounds
2.2.2 Non-Invasive Swallow Detection using Other Modalities
2.2.3 Non-Invasive Diet Monitoring using Infrastructural Sensors
2.3 Proposed Mechanism

Chapter 3: Instrumentation System
3.1 Breathing Process
3.2 Swallowing Apnea
3.3 Sensors for Collecting Respiratory Signal
3.3.1 Conductive Rubber Sensor
3.3.2 Piezoelectric PVDF Sensor
3.3.3 Respiratory Inductance Plethysmography (RIP) Sensor
3.4 Swallow Detection System
3.4.1 Signal Shaping Circuit
A. Signal Shaping Circuit for Piezoelectric Belts
B. Signal Shaping Circuit for RIP Belts
3.4.2 µController and Bluetooth Module
3.4.3 Breathing Signal Logging App

Chapter 4: Swallow Detecting using Matched Filters
4.1 Processing Methods
4.1.1 Breathing, Apnea and Swallow Signature
4.1.2 Matched Filter Method
4.1.3 Machine Learning based Detection
4.1.4 Artifacts Handling
4.2 Results
4.2.1 Results for Matched Filter based Detection
4.2.2 Performance for Machine Learning based Method with Time Domain Features
4.2.3 Performance for Machine Learning based Method with Frequency Domain Features
4.3 Discussion
4.3.1 Iterative Template Refinement
4.3.2 Discrimination Power of Time Domain Features
4.3.3 Discrimination Power of Frequency Domain Features
4.3.4 Artifacts Handling
4.4 Summary

Chapter 5: Machine Learning Based Processing Algorithms
5.1 Processing Methods
5.1.1 Machine Learning Algorithms
5.1.2 Breathing Apnea and Swallowing Signature
5.1.3 Detection Scheme
5.2 Experiments
5.3 Results and Discussion
5.3.1 Feature Extraction and Selection
5.3.2 Swallow Detection
5.4 Conclusion

Chapter 6: Support Vector Machine and Hidden Markov Model based Processing Algorithms
6.1 Processing Methods
6.1.1 Two-tier Swallow Detection
6.1.2 SVM-based Swallow Detection with Posterior Probability
6.1.3 Hidden Markov Model with Swallow Sequence Locality
6.1.4 HMM Processing
6.2 Results and Discussion
6.2.1 Experimental Methods
6.2.2 Performance Indices
6.2.3 Feature Extraction for Stage-1 Detection using SVM
6.2.4 Swallow Detection with SVM
6.2.5 Improved Detection using HMM
6.3 Conclusion

Chapter 7: Mealtime and Duration Monitoring
7.1 System Architecture
7.2 Processing Methods
7.3 Results
7.3.1 Experimental Methods
7.3.2 Performance Evaluation
7.3.3 Feature Extraction
7.3.4 Performance of Food Intake Detection
7.3.5 Performance of Meal Intake Analysis
7.4 Discussion
7.4.1 Restrictive Feature Selection
7.4.2 Benefits of Hierarchical Classifier
7.4.3 Performance of Existing Research
7.4.4 Spontaneous Swallows
7.5 Conclusion

Chapter 8: Proposed Work
8.1 Diet Volume Detection
8.2 Choking and Coughing Detection

BIBLIOGRAPHY


LIST OF TABLES

Table 4-1: Performance of classifiers using time domain features
Table 4-2: Performance of all three classifiers using frequency domain features. Performance on
both individual dataset and the combined dataset are presented in this table.
Table 5-1: Durations of different breathing cycle types
Table 5-2: Features selected for classification
Table 5-3: Performance of the first stage of the hierarchical classifier
Table 5-4: Performance of the second stage of the hierarchical classifier
Table 6-1: Comparison between fixed threshold SVM-only and two-tier SVM+HMM mechanism
Table 7-1: Features Extracted for SVM Classifiers
Table 7-2: Comparison Between SVM-Only and SVM+HMM Solutions
Table 7-3: Comparison of 3-Stage Hierarchical Classifier and Single Classifier


LIST OF FIGURES

Figure 1-1: Prevalence of obesity worldwide (From International Association For the Study of
Obesity)
Figure 1-2: Increase of obesity in US
Figure 1-3: Dissertation flowchart
Figure 2-1: Schematic sensor positioning in [29]
Figure 2-2: In-ear microphone used in [33]
Figure 2-3: Sensors used in [36]
Figure 2-4: Sensor positioning in [39]
Figure 2-5: Prototype gyroscope sensor used in [42]
Figure 2-6: Magnetic coils and microphone system proposed in [46]
Figure 2-7: Camera based system proposed in [47]
Figure 2-8: The embedded RFID and weighing table surfaces
Figure 3-1: 3 steps of swallowing
Figure 3-2: Breathing signal of two subjects in the experiments
Figure 3-3: Resistive stretch sensor (a) and its static response (b)
Figure 3-4: Transient response of stretch sensors
Figure 3-5: Piezoelectric belt (a) and its static response (b)
Figure 3-6: Equivalent circuit of piezoelectric sensor (a), Isolation circuit (b), and Transient
response of piezo respiratory sensor (c)
Figure 3-7: RIP sensor
Figure 3-8: System architecture of swallow detection system using piezoelectric belt
Figure 3-9: System architecture of swallow detection system using RIP belts
Figure 3-10: Signal shaping circuit for piezoelectric belt
Figure 3-11: Schematic of signal shaping circuit
Figure 3-12: Logic layering of LiveActive
Figure 3-13: Design of the webserve module
Figure 3-14: Graphic User Interface (GUI) of LiveActive
Figure 4-1: Examples of Breathing Cycles (BC), Normal Breathing Cycles (NBC), Breathing
Cycles with Inhale Swallow (BC-IS), Breathing Cycles with Exhale Swallow (BC-ES) and apnea
Figure 4-2: Detection process of matched filter based detection algorithm
Figure 4-3: Breathing signal variability before, during, and after talking.
Figure 4-4: ROC Distribution for all seven subjects with arbitrary templates.
Figure 4-5: Similarity score space for: a) initial matched filter template used as a starting point,
and b) the final template obtained at stabilization of the iterative algorithm. The tighter clustering
of the points in the bottom graph indicates iterative improvement of the template quality.
Figure 4-6: Iterative template refinement performance; a) true positive rate, and b) false positive
rate evolution with iterations.
Figure 4-7: Utility of the time domain features for Subject-1; three peaks in the left figure are
caused by different types of breathing cycles with feature distribution shown in the right figure.
Figure 4-8: (a) utility of frequency domain features, (b) comparison between time and frequency
domain features; results are presented with limited number of features that are chosen using a
method as described.
Figure 4-9: Power spectral density (PSD) of breathing signals with talking and without talking,
when normal breathing or breathing with swallows are executed.
Figure 4-10: Breathing signal: (a) with upper body rocking movement, and (b) without the
movement
Figure 5-1: Examples of Breathing Cycles (BC), Normal Breathing Cycles (NBC), Breathing
Cycles with Inhale Swallow (BC-IS), Breathing Cycles with Exhale Swallow (BC-ES) and apnea
Figure 5-2: Example breathing signals for solid and liquid swallows
Figure 5-3: Logic for swallow signature detection
Figure 5-4: Discriminative property of time and frequency domain features
Figure 5-5: Benefits of ±1 crossings as a classification feature
Figure 6-1: Respiratory signal with swallow signature
Figure 6-2: Processing scheme for swallow detection
Figure 6-3: (a) Hidden breathing state machine and (b) HMM processing components
Figure 6-4: Experimental setup
Figure 6-5: Feature discriminative property and ±10 crossings as a classification feature
Figure 6-6: Distribution of posterior probabilities with and without swallows
Figure 6-7: Comparison between SVM-only and two-tier SVM+HMM mechanism
Figure 7-1: Components of the mealtime and duration monitoring system
Figure 7-2: The mealtime and duration detection scheme
Figure 7-3: Meal intake analysis algorithm
Figure 7-4: Experimental setup
Figure 7-5: Performance of SVM-only food intake detection method
Figure 7-6: An example temporal dynamics of the meal intake analysis process
Figure 7-7: Threshold selection for different window sizes
Figure 7-8: Performance of meal intake analysis module
Figure 7-9: Performance of Classifier-1, 2 and 3 with different feature count
Figure 8-1: Anticipated output of chest belts during coughing (a) one coughing within a
breathing cycle (b) two consecutive coughs in a breathing cycle


Chapter 1: Introduction

1.1 Background
According to data from the World Health Organization, worldwide obesity has increased by over
200% since 1980 [1]. Figure 1-1 shows the prevalence of obesity in different countries based on
data collected by the International Association for the Study of Obesity. Observe that obesity is
currently less prevalent in Asian countries such as China, India, and Japan, while it is a severe
problem in countries such as the United States, Canada, and Australia. In particular, in 2010,
35.5% of men and 35.8% of women in the US were obese.

Figure 1-1: Prevalence of obesity worldwide (From International Association For the
Study of Obesity)
The prevalence of obesity in the US has been increasing steadily. Figure 1-2 depicts the increase
in the percentage of the population that is obese from 1960 to 2010. Similar trends are also
predicted for developing countries, such as China [2][3] and India [4][5].

Figure 1-2: Increase of obesity in US
The prevalence of obesity brings many health problems, both physical and mental, as well as
social issues. Eckel et al. [6] showed that obesity is a major risk factor for coronary heart
disease, and it is known that a 5% to 10% weight reduction can decrease blood pressure and total
blood cholesterol, improve glucose tolerance for patients with diabetes, and reduce the severity
of obstructive sleep apnea. Visscher [7] noted that obesity can cause cardiovascular disease,
type-2 diabetes mellitus, cancer, osteoarthritis, work disability and sleep apnea, and that it has
a pronounced impact on morbidity. Moreover, Jia [8] has shown that obesity also affects
health-related quality-of-life (HRQL) [9], a multi-dimensional concept that includes domains
related to physical, mental, emotional and social functioning, and assesses the positive aspects of
a person’s life, such as positive emotions and life satisfaction. People with obesity were found
to have significantly lower HRQL than those with normal weight, and such low HRQL
was also seen in people without chronic diseases caused by obesity.
Many factors have been found to be associated with obesity. Nielsen et al. [10]
demonstrated that short sleep duration was consistently associated with the development of obesity
in children and young adults. It has also been found that environmental, perinatal and genetic
factors induce neuroendocrine perturbation followed by abdominal obesity [11]. However,
Astrup et al. [12] pointed out that the current prevalence of obesity is unlikely to be caused
by genetic factors, because obesity has developed worldwide too rapidly to be explained
by genetic changes, and only a few humans have been shown to have genetic obesity.
From an energy balance point of view, obesity is caused by an imbalance between the energy
we derive from food and drink and the energy we expend on metabolism and physical activity
[12][13].
1.2 Objectives
Diet control and physical exercise are the two most important components of obesity
control. Traditionally, self-reported questionnaires have been widely used by researchers for
estimating both food intake and physical activity levels for high-risk individuals. In most such
studies, participants have shown a tendency to underreport. Additionally, self-reporting by the
elderly population is often unreliable due to poor memory. These factors make questionnaire-only
methods subjective and unreliable [14][15][16].
In recent years, accelerometers, gyroscopes, and pressure sensors have been widely utilized
for instrumented physical activity monitoring with high detection accuracy [17][18][19][20].
However, few efforts on instrumented diet monitoring are reported in the literature. Instrumented
diet monitoring can reduce the subjectivity [14] associated with questionnaire-based
self-reporting systems.
An instrumented system can potentially detect each instance of food/drink intake, and can
have a significant impact on obesity and overall health monitoring and management. Together
with high-level self-reporting of dietary habits, the system can quantify calorie intake trends
and estimates for its users. It has also been shown that such health monitoring can improve the
effectiveness and quality of healthcare services [21].
We present a wearable sensor system for solid food intake monitoring based on swallows
detected in breathing signals. Using a wearable chest-belt, we detect swallows by
detecting apneas in the breathing signal captured by the chest-belt. Since the belt can be
worn inside, outside, or between garments (it does not need skin contact), it has the potential for
prolonged, comfortable daily use without raising cosmetic or comfort concerns.
1.3 Summary of Proposed Solutions
The sensor system and intake monitoring algorithms developed in this thesis are based
on the key observation that a person’s otherwise continuous breathing process is interrupted by a
short apnea during a swallow, which is a part of the swallowing process [22]. We first detect
swallows by detecting apneas in the breathing signal captured by a wearable
wireless chest-belt. Afterwards, swallow pattern analysis is used to distinguish solid and liquid
swallows. Together with high-level self-reporting of overall diet habits (e.g., the types of
food and drinks), the instrumented detection of swallow counts can offer an objective way
to: 1) study food and drink intake trends, and 2) estimate calorie intake. In this thesis,
however, we only address the automatic swallow detection part of the process.
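The key observation can be illustrated with a minimal sketch. It is not the detection method
developed in this thesis (which relies on matched filters and machine learning); it is a simple
slope-threshold stand-in, and the synthetic signal, sampling rate, threshold, and function names
are all assumptions chosen for illustration. A breathing trace is modeled as a sinusoid whose
phase is frozen for one second to mimic a swallowing apnea, and a flat run lasting at least half a
second is reported as an apnea.

```python
import math


def synth_breathing(fs=20.0, duration_s=20.0, breath_hz=0.25,
                    apnea_start_s=9.5, apnea_len_s=1.0):
    """Hypothetical breathing trace: a sinusoid whose phase is frozen during one apnea."""
    samples, phase = [], 0.0
    for i in range(int(duration_s * fs)):
        t = i / fs
        in_apnea = apnea_start_s <= t < apnea_start_s + apnea_len_s
        if not in_apnea:
            # Breathing advances only outside the apnea; inside, the chest is still.
            phase += 2.0 * math.pi * breath_hz / fs
        samples.append(math.sin(phase))
    return samples


def detect_apneas(signal, fs, slope_thresh=0.005, min_len_s=0.5):
    """Return (start_s, end_s) for runs where |x[i+1] - x[i]| stays below
    slope_thresh for at least min_len_s (illustrative threshold values)."""
    min_len = int(min_len_s * fs)
    events, run_start = [], None
    for i in range(len(signal) - 1):
        flat = abs(signal[i + 1] - signal[i]) < slope_thresh
        if flat and run_start is None:
            run_start = i
        elif not flat and run_start is not None:
            if i - run_start >= min_len:
                events.append((run_start / fs, i / fs))
            run_start = None
    # Close a flat run that extends to the end of the recording.
    if run_start is not None and len(signal) - 1 - run_start >= min_len:
        events.append((run_start / fs, (len(signal) - 1) / fs))
    return events


if __name__ == "__main__":
    trace = synth_breathing()
    for start, end in detect_apneas(trace, fs=20.0):
        print(f"apnea from {start:.2f} s to {end:.2f} s")
```

The minimum-duration check is what separates the apnea from the brief flat spots at the top and
bottom of every normal breath. Real chest-belt signals also contain noise and motion artifacts,
which a fixed slope threshold of this kind would not handle.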


1.4 Dissertation Structure
Figure 1-3 illustrates the structure of the thesis:

Figure 1-3: Dissertation flowchart
Chapter 1 introduces the prevalence of obesity and overweight, its causes, and the
drawbacks of existing questionnaire-based methods.
Chapter 2 first reviews the existing invasive and non-invasive swallow detection
methods and analyzes their drawbacks. It then describes the proposed swallow detection
mechanism and its advantages over the existing methods.


Chapter 3 first introduces the physiological process of breathing and swallowing, and
introduces the concept of swallowing apnea. It then describes the sensors and the swallow
detection system for diet monitoring.
Chapter 4 introduces the matched filter method used for liquid swallow detection. A machine
learning method is also proposed for comparison. Iterative template refinement, feature selection,
and artifact handling are also discussed in this chapter.
Chapter 5 proposes a hierarchical classification algorithm for solid and liquid swallow
detection. It compares the performance of different features and different popular machine
learning algorithms.
Chapter 6 presents a method that cascades a Support Vector Machine (SVM) with a Hidden
Markov Model (HMM) to improve the accuracy of food intake detection. The modeling,
processing, and performance of this method are discussed in detail.
Chapter 7 proposes a hierarchical classification method cascaded with an HMM. Mealtime and
duration analysis is performed to indicate dietary habits. The algorithms are validated
through least controlled experiments. Artifacts such as spontaneous swallows, talking, laughing,
coughing and throat clearing are considered in the algorithm.
Chapter 8 summarizes the thesis and discusses future work.


Chapter 2: Related Works

As mentioned in Chapter 1, diet control and physical exercise are the two most important
aspects of obesity control. With the development of microelectromechanical systems (MEMS),
accelerometers have been widely used in physical activity detection and energy expenditure
estimation. Zhang et al. [23] showed that, together with some other sensors (such as
temperature sensors and heart rate sensors), the IDEEA® (Intelligent Device for Energy Expenditure
and Physical Activity) device is able to provide over 95% accuracy in energy expenditure
estimation. In this dissertation, we focus on swallow detection, which is strongly
correlated with food and drink intake. Existing swallow detection methods are generally
classified into two groups: invasive and non-invasive.
2.1

Invasive Swallow Detection
Many patients with neurologic issues due to stroke, multiple sclerosis, trauma, bulbar palsy

and other impairment may have difficulties in swallowing. Videofluoroscopy is therefore used in
[24] to provide important information on impairment of the swallowing process, providing
essential information to the doctors for arranging treatment accordingly. The paper (a) described
the indications for videofluoroscopic swallowing studies by evaluating patients with neurological
disorders affecting swallowing, (b) described the techniques for evaluating the swallow
mechanism with videofluoroscopy in a standardized manner, and (c) used cine videofluoroscopy
to illustrate the range of abnormalities that can be demonstrated for some of these conditions and
discussed the effect of patient treatment. This method provides the images to demonstrate each
stage of swallowing, and shows the movement of bolus in detail. However, it requires
swallowing food or water mixed with barium, which labels the bolus under X-ray, and patients
7

need to be exposed under radiation. For those reasons, this method is not applicable for swallow
detection for obesity control purposes. Perlman et al [25] analyzed the duration and temporal
relationship of electromyographic activity from the submental complex, superior pharyngeal
constrictor, cricopharyngeus, thyroarytenoid and interarytenoid muscles during swallowing
saliva, 5ml and 10ml water. Bipolar, hooked-wire electrodes were inserted into the muscles
mentioned above except the submental complex, which was analyzed with bipolar surface
electrodes. The experiment included 8 healthy subjects executing 5 swallows for saliva, 5ml and
10ml water individually composing a 120-swallow data set. The total activation duration of all
the muscles during pharyngeal phase of the swallow did not change with bolus size, but some
muscles did demonstrate a difference in electromyograph duration and time of firing between
saliva and 10ml water. Submental muscle activity was longest for saliva swallows. The
interarytenoid muscle showed a significant difference in duration between the saliva and 10ml
water swallow. Finally, the interval between the start of laryngeal muscle activity and pharyngeal
muscle firing pattern decreased as the bolus volume increased. The muscle activation pattern
showed a high correlation within the dataset of each subject and high variance across different
subjects.
2.2 Non-Invasive Swallow Detection
Non-invasive methods use accelerometers, microphones, surface electromyography,
piezoelectric sensors etc, to collect physiological signals related to swallowing without involving
tools that break the skin or physically enter the body. Compared with invasive methods, the non-invasive alternatives have some important advantages:
Safety: invasive methods need tools such as needle electrodes to penetrate the skin to
collect electromyography signals, or require patients to ingest barium to label the bolus under X-ray,
which should be carried out by medical providers with special training. Incorrect operations,
such as contaminated needle electrodes and long-term exposure to X-rays, may severely harm
the health of the patient and even cause death. In contrast, non-invasive methods do not need to break the
skin, and misplacement of the devices will not lead to negative impacts.
Ease of use: experiments using invasive methods need to be executed in hospitals under
the supervision of professionals, while the non-invasive devices can be even embedded into
clothes [26] or packaged as hearing aid devices [27]. Swallowing detection algorithms can be
implemented in microcontrollers, and results may be wirelessly transferred to health care
providers for analysis.
Long-term monitoring: as invasive experiments need to be carried out in health care
facilities with the help of specially trained professionals, they are not suitable for long-term
monitoring. However, non-invasive sensors can be designed to be portable and suitable for long-term usage.
Non-invasive sensors do have the drawback of providing less
information than invasive methods; for example, they cannot provide detailed
information about the activation of the muscles involved in the swallowing process. However, non-invasive
sensors are a suitable solution for food intake monitoring for diet control,
which mainly relies on the number and timing of swallowing events.
2.2.1 Non-Invasive Swallow Detection using Chewing and Swallowing Sounds

Takahashi et al [28] suggested that cervical auscultation in the evaluation of the
pharyngeal swallow may become a part of the clinical evaluation of dysphagic patients. The
presented study investigated three aspects of the methodology for detecting swallowing sounds:
(1) the type of acoustic detector for the analysis of pharyngeal swallow, (2) the type of adhesive
suitable for attaching the sensor, (3) the optimal sites for detection. An accelerometer attached with
double-sided paper tape was selected as the optimal detector due to its wide frequency
response and small attenuation level. Using this sensor, swallowing sounds and noises were
collected at 24 sites on the neck for 14 healthy subjects. The optimal position for collecting
swallowing sounds was selected as the one with the largest signal-to-noise ratio and the smallest variance,
and the site over the lateral border of the trachea immediately inferior to the
cricoid cartilage was found to be optimal. The site over the center of the cricoid cartilage and the
midpoint between the site over the center of the cricoid cartilage and the site immediately
superior to the jugular notch were also considered appropriate sites. This study
provides some guidance for swallowing detection using acoustic information, but the
sensors need to be attached in the neck region, which raises cosmetic and safety issues.

Figure 2-1: Schematic sensor positioning in [29]
Cichero et al [30] presented a hypothesis on the cause of swallowing sounds. It was
suggested that the pharynx contains a number of valves and pumps that produce vibrations and
reverberations within the pharynx, which generate swallowing sounds. An analogy was proposed
between swallowing sounds and heart sounds that propagate via vibration of muscles and valves.
Therefore, many studies use swallowing sounds as a metric for detecting food intake. In
order to capture clear swallowing sounds, microphones are normally attached to the neck region
to be close to the pharynx.
Amft et al [29] presented an investigation to detect and classify normal swallowing during
eating and drinking from surface electromyography (SEMG) and microphone sensors as shown
in Figure 2-1. Gel electrodes were placed in the submental and infra-hyoid regions, and signals were recorded
at 24 bit, 2 kHz. Swallowing sound was recorded using an electret condenser microphone placed
inferior midline from the cricoid cartilage. The non-invasive sensors were selected to be
integrated into a collar-like fabric for continuous monitoring of swallowing activity over long
periods. A feature similarity detection mechanism was applied to both the SEMG signal and the sound
signal. Signals were first segmented using the Sliding-Window And Bottom-up (SWAB) algorithm
[31], which partitioned the continuous stream of sensor data by sequentially testing the
approximation of the signal through linear regression lines and using the boundaries of these
approximations as segments. Each segment of sensor data was then compared to a trained pattern
using Euclidean distance for calculating feature similarities. The detection results based on
SEMG and sound signals were then fused using a parameter training method. Overall, 80% recall
and 70% precision were achieved using the proposed methods. This paper further utilized
machine learning algorithms to differentiate high vs. low volume swallows, and high vs. low
viscosity bolus, with normalized accuracy around 70%. This paper is one of the earliest works on
swallow detection. However, as the electrodes need gel to guarantee electric contact with the
skin and the microphone needs to be closely attached to the neck, the system may not be
comfortable for prolonged usage. Moreover, because the sensor system is worn on the neck
region, it raises some cosmetic issues.
Passler et al [32] proposed a method for non-invasive monitoring of human food intake
behavior and long-term dietary protocol by using only chewing and swallowing sound sensors.
The sensor system was built using an in-ear microphone and a reference microphone integrated in
a hearing aid case for recording chewing and swallowing sounds in the ear canal and
environmental noise. It was observed that food intake sounds recorded by the in-ear microphone
had slightly higher signal amplitude than the same sounds recorded by the reference microphone,
while environmental sounds and speech of the participant have comparable signal energies in
both records. Another parameter, magnitude squared coherence function (MSC), was also used,
as the MSC of the environmental sound is high, while that of the food intake sound is low. It was
demonstrated that the detection algorithm based on comparing the signal energy of the two
microphones outperformed the method using MSC, and the precision and recall reached 91.3%
and 81.8% respectively. However, this method suffers from some drawbacks: (1) the experiment
was done in a quiet room or office without other disturbing environmental noise; (2) the
performance of the system in differentiating swallowing sounds from chewing sounds was not
reported. It is very common that some people chew food for a longer time, while others
swallow before the food is fully chewed.
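The two-microphone energy comparison described above can be sketched as follows. This is only a minimal illustration and not the implementation in [32]: the frame length, the energy ratio threshold of 2.0, and the function names are assumptions chosen for the example.

```python
def frame_energy(samples, frame_len):
    """Return the sum of squared samples for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def intake_frames(in_ear, reference, frame_len=4, ratio_threshold=2.0):
    """Flag frames whose in-ear energy clearly exceeds the reference energy.

    Environmental sounds and speech are assumed to reach both microphones
    with comparable energy, so only frames dominated by the in-ear channel
    are treated as candidate food intake sounds. The threshold value is an
    assumption for illustration, not a value taken from [32].
    """
    eps = 1e-12  # avoids division by zero on silent frames
    flags = []
    for e_in, e_ref in zip(frame_energy(in_ear, frame_len),
                           frame_energy(reference, frame_len)):
        flags.append(e_in / (e_ref + eps) > ratio_threshold)
    return flags


# Synthetic example: frame 0 is ambient noise, frame 1 is a chewing burst.
in_ear = [0.1, -0.1, 0.1, -0.1, 0.9, -0.8, 0.9, -0.7]
reference = [0.1, -0.1, 0.1, -0.1, 0.2, -0.2, 0.2, -0.2]
print(intake_frames(in_ear, reference))
```

In this synthetic case only the second frame is flagged, because the ambient frame has equal energy in both channels. A loud environment would raise the reference energy and could mask intake sounds, which is consistent with drawback (1) above.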
Similarly, Nishimura et al [33] developed an in-ear microphone embedded into a common
Bluetooth headset, which is used to capture the sound generated by chewing, as shown in
Figure 2-2. A two-stage recognition algorithm was proposed. First, the “Chew-like” signal
detection was performed by using the number of zero-crossings of the regression coefficients
with a negative valued slope and the local peak of LPF output. Second, the chewing sound
verification was performed by comparing the extracted features of the testing sound signal with
those of the training sound signal. A high accuracy of 98.7% was reported for 5 food categories, including
chips, salad, rice, wafers, and banana. However, details about the verification process were not
provided. The authors did not give the details of the experiments either, such as the number of
subjects, and how the training data set and testing data set were formed.

Figure 2-2: In-ear microphone used in [33]
Makeyev et al [34] presented a fully automatic food intake detection methodology, with
the aim of improving understanding of eating behaviors associated with obesity and eating
disorders. The system proposed used a miniature throat microphone attached over the
laryngopharynx, which had a dynamic range of 46±3dB with a frequency range of 20-8000Hz.
The proposed method consisted of two stages. First, acoustic detection of swallowing instances
based on mel-scale Fourier spectrum features and classification using support vector machines
was performed. Principal component analysis (PCA) and smoothing algorithms were performed
to improve detection accuracy. Second, the frequency of swallowing was used as a predictor for
detection of food intake interval. Experiments were carried out on 12 subjects with different
degrees of adiposity. Average accuracies of >80% and 75% were obtained for intra-subject and
inter-subject models, respectively. However, similar to [29], the microphone attached to the neck may not be
comfortable for long-term usage, and it also raises cosmetic issues. External noise was also not
considered during the experiments.
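The second stage, in which swallowing frequency serves as a predictor of food intake intervals, can be illustrated with a simple sliding-window count. This is a hedged sketch rather than the classifier used in [34]: the 60 s window, the 30 s step, the threshold of 3 swallows, and the function name are all assumptions made for the example.

```python
def intake_intervals(swallow_times, total_s, window_s=60, step_s=30, min_swallows=3):
    """Return start times (s) of windows whose swallow count reaches min_swallows.

    swallow_times : timestamps, in whole seconds, of detected swallows
    total_s       : length of the recording in seconds
    All default parameter values are illustrative assumptions.
    """
    starts = []
    for start in range(0, total_s - window_s + 1, step_s):
        count = sum(1 for t in swallow_times if start <= t < start + window_s)
        if count >= min_swallows:
            starts.append(start)
    return starts


# Synthetic recording: sparse spontaneous swallows with a burst around 130-155 s.
swallows = [5, 130, 138, 147, 155, 290]
print(intake_intervals(swallows, total_s=300))
```

Only the windows that overlap the burst of swallows are reported, while the isolated spontaneous swallows at 5 s and 290 s are ignored. The quality of such an estimate depends directly on the accuracy of the first-stage swallow detector.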
Walker et al [35] also used a throat microphone system for collecting swallowing sounds.
Short Time Fourier Transform (STFT) was performed on the collected audio data, and it was
found that swallow sounds have a stronger presence in the upper frequencies compared with
other sounds such as vocal cord activation (hum, whispering, and speaking), clearing the throat,
and coughing. Discrete Wavelet Transform (DWT) was then used to obtain higher temporal
resolution than that offered by STFT at high frequency intervals. Windowed signal energy and
windowed maximum were used to perform swallow event detection. The proposed mechanisms
were tested on two male subjects. The main drawbacks of this work are that only two subjects were
included in the experiments and that environmental noises were not fully considered.

Figure 2-3: sensors used in [36]
Sazonov et al [36] developed a swallowing and chewing monitoring system to study the
behavioral patterns of food consumption and to produce volumetric and weight estimates of
energy intake, as shown in Figure 2-3. The system worked by detecting swallowing with a
sound sensor located over the laryngopharynx or with a bone conduction microphone, and chewing
with a below-the-ear strain sensor. The system can be implemented in a wearable monitoring
device, and is thus suitable for monitoring ingestive behavior in free-living settings. Experiments were
carried out on 21 subjects during eating and quiet sitting. Video and sensor data were manually
labeled by trained professionals. The reliability of manual labels was tested on 5 subjects and it
was demonstrated that the intra-class correlation coefficients are 0.996 for bites, 0.988 for chews
and 0.98 for swallows. The collected sensor signals and the resulting manual scores were left for
future research.
Aboofazeli et al [37] presented a Hidden Markov Model (HMM) based method for
swallowing sound segmentation and classification. Swallowing sounds of 15 healthy and
11 dysphagic subjects were studied. The swallowing sound signals were segmented into 25 ms
segments, and 7 features were extracted. A trained HMM classified the sound signals into
three phases: initial quiet phase, initial discrete sound (IDS) and bolus transit sound (BTS).
Multi-scale products of wavelet coefficients proved to be the most effective feature for the
HMM. An HMM was also built to differentiate the swallowing sounds of healthy subjects and
patients with dysphagia, and an accuracy of 85.5% was achieved. However, as the experiment was
done in a strictly controlled environment, and artifacts such as ambient noise and talking were not
included, the method may not be suitable for everyday diet monitoring.
2.2.2 Non-Invasive Swallow Detection using Other Modalities

Other than sounds generated during chewing and swallowing, inertial sensors have also been
widely used to monitor other physiological phenomena related to eating and drinking.
Amft et al [38] proposed a two-stage recognition system for detecting food intake related
arm gestures. Information derived from this system can be used for automatic food intake
monitoring in the domain of behavioral medicine. It is demonstrated that arm gestures can be
clustered and detected using inertial sensors on the arm. Experiments were carried out on 2
subjects performing 384 gestures, with 4 sensors attached to the right and left lower and upper arms. The
subjects were asked to eat or drink using cutlery, spoon, hand and glass. An accuracy of 94% was
achieved using the HMM method. When analyzing continuous data, an accuracy of 87%
was reached. However, the experiments were done on a limited data set, i.e., only two subjects
were included, and the food types lacked variety. Moreover, the experiment
was done in a controlled environment, and many false positives would be expected if gestures such as
smoking or scratching the head or face were included.

Figure 2-4: Sensor positioning in [39]
Amft and Troster [39] proposed a dietary monitoring system using multiple
modalities as shown in Figure 2-4. The on-body sensing approach was chosen based on three
core activities during food intake: arm movement, chewing and swallowing. The arm and trunk
movements associated with food intake were measured using inertial sensors, i.e., accelerometer,
gyroscope and magnetometer, chewing sounds were recorded using an in-ear microphone, and
swallowing activities were acquired by a sensor-collar containing surface electromyography
(SEMG) electrodes and a stethoscope microphone. In three independent evaluation studies, the
continuous recognition of activity events had been investigated and performances were evaluated.
An event recognition procedure was deployed that addresses multiple challenges of continuous
activity recognition, including the dynamic adaptability for variable-length activities and flexible
deployment by supporting one to many independent classes. The approach uses a sensitive
activity event search followed by a selective refinement of the detection using different
information fusion schemes. In experiments, four intake gesture categories from arm
movements and two food types from chewing sounds were detected with a recall of 80-90% and
a precision of 50-64%. A recall of 68% and a precision of 20% were achieved for individual swallows.
Although this work is, to the best of our knowledge, one of the most comprehensive works in dietary monitoring,
it suffers from the following drawbacks: (1) the system used three groups of sensors
on the arms, on the neck and in the ear respectively, which may not be convenient for everyday usage
and may cause cosmetic issues, (2) the proposed algorithms had low precision for swallow detection,
meaning many false positives would be expected.
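To make the consequence of these figures concrete, the sketch below converts a recall and precision pair into expected counts of correct detections and false alarms. The function name and the choice of 100 true swallows are illustrative assumptions; only the 68% recall and 20% precision come from [39].

```python
def expected_counts(true_events, recall, precision):
    """Return (true positives, false positives) implied by recall and precision.

    recall    = TP / (TP + FN)  ->  TP = recall * true_events
    precision = TP / (TP + FP)  ->  FP = TP * (1 - precision) / precision
    """
    tp = recall * true_events
    fp = tp * (1.0 - precision) / precision
    return tp, fp


# 100 true swallows is a hypothetical count chosen for easy reading.
tp, fp = expected_counts(true_events=100, recall=0.68, precision=0.20)
print(f"correct detections: {tp:.0f}, false alarms: {fp:.0f}")
```

Under these numbers, every correctly detected swallow is accompanied by four false alarms on average, which is why low precision is a serious limitation for counting-based intake estimation.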
Mioch et al [40] examined the pattern of activity in masseter and temporalis muscles during
mastication of different food samples with known textural properties and analyzed the inter-individual variations. Surface electromyography (SEMG) signals were recorded from the right
and left masseter and temporalis muscles in 36 young adults during free-living and side-imposed
mastication. 5 different types of food with known rheological properties were used. Both
masseter and temporalis activity increased with increased stress at measurements of food, which
confirmed that the mastication process was adjusted according to the food texture. Temporalis
muscle activity was more influenced by food texture than masseter muscle activity. Less muscle activity
was observed when chewing the food in the free-living scenario. However, 25% of the subjects did not
show any differences between side-imposed mastication and free-living scenarios, indicating that
they may have greater chewing efficiency on one side. Therefore, measuring the activities of
masseter and temporalis muscles may be used for analyzing the food intake of subjects. However,
there are cases in which foods with the same texture have very different calorie densities, such as
mushroom and meat.
Nahrstaedt et al [41] investigated the use of a combined electromyography (EMG) and
bioimpedance (BI) measurement at the throat to automatically detect swallowing events. The
measured BI indicated the closure of larynx. There is a typical drop in BI during swallowing.
The activations of the muscles involved were measured using EMG. A valley detection algorithm
was used to segment BI signals. Additionally, only BI valleys that coincide with EMG
activations are selected for feature extraction. Then the extracted features from BI and EMG
signal were classified using Support Vector Machine (SVM) to identify BI valleys related to
swallowing events. The proposed methods were tested on 9 healthy subjects. The data set
contained 1370 swallow events with different bolus size with artifacts such as movement and
speech. The combined BI/EMG segmentation detected 99.3% of all swallow events. The
subsequent SVM classifier had a sensitivity of 96.1% and a specificity of 97.1%. However,
similar to other methods based on EMG, skin contact using conductive gel is required and the
suitability for long-term usage is questionable.
Dong et al [42] proposed a method for measuring food or drink intake through automated
tracking of wrist motion as shown in Figure 2-5. A watch-like device with a
microelectromechanical (MEMS) gyroscope was used to detect and record the motion of the hand,
which was believed to be related to food or drink intake. This method was found to have 94%
sensitivity in controlled meal setting and 86% sensitivity in uncontrolled setting, and both had
one false positive out of every 5 bites. Preliminary data showed that bites measured by the device
were positively related to calorie intake indicating the potential of the device to monitor energy
intake. However, the watch was designed to be on during eating only.

Figure 2-5: Prototype gyroscope sensor used in [42]
Moreau-Gaudrey et al [43] proposed a user-friendly non-invasive bedside procedure for
studying swallowing and swallowing disorders in the elderly considering the frailty of this age
group. In this study, respiratory inductance plethysmography (RIP) was proposed. An automated
process for the detection of swallowing was designed, and the first derivative of the breathing
signal was used to pick up the apnea during breathing. An accuracy of 90% was reported given
that an appropriate threshold had been selected. However, only 56 swallows from 14 subjects
were recorded, and no artifacts, such as motion and speech, were considered. Moreover, people
breathing at a lower rate may have longer apneas between consecutive breathing cycles, which
can be detected as false positives using the proposed mechanism.
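The derivative-based idea can be sketched as follows: an apnea appears as a run of samples over which the breathing signal barely changes. This is a simplified illustration and not the procedure of [43]; the first-difference approximation, the parameter names, and the synthetic trace are assumptions.

```python
def detect_apneas(signal, fs, slope_threshold, min_duration):
    """Return (start_time, end_time) pairs where the signal stays flat.

    signal          : breathing samples (e.g., RIP belt readings)
    fs              : sampling rate in Hz
    slope_threshold : largest |first difference| per sample still counted as flat
    min_duration    : shortest flat run, in seconds, reported as an apnea
    """
    # First-difference approximation of the derivative.
    diffs = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    apneas = []
    run_start = None
    for i, d in enumerate(diffs + [float("inf")]):  # sentinel closes a trailing run
        if abs(d) <= slope_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / fs >= min_duration:
                apneas.append((run_start / fs, i / fs))
            run_start = None
    return apneas


# Synthetic trace sampled at 10 Hz: inhale, a 1 s plateau, then exhale.
trace = list(range(0, 50, 5)) + [50] * 11 + list(range(45, -5, -5))
print(detect_apneas(trace, fs=10, slope_threshold=1, min_duration=0.5))
```

The minimum duration matters because the derivative is also close to zero at the peak and trough of every normal breathing cycle. For a slow breather those natural pauses grow longer, so a fixed threshold pair can report them as apneas, which is the false-positive risk noted above.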
Damouras et al [44] proposed a method of using an accelerometer for swallowing
detection. A dual-axis accelerometer was attached to the participant’s neck (anterior to the
cricoid cartilage) using double-sided tape. In the paper, the acceleration signal was considered
as a stochastic diffusion where movement was associated with drift and swallowing with
volatility. Consequently, a volatility-based online swallow event detector that operated on the
raw acceleration signal was developed. With data from healthy subjects and subjects with
dysphagia, the proposed method was shown to perform as well as their previous work [45]
on the same data, but without preprocessing. However, the experiment was done in a
more strictly controlled environment, and fewer artifacts were considered.

Figure 2-6: Magnetic coils and microphone system proposed in [46]
Kandori et al [46] developed a swallowing detection system that can detect swallowing
sounds and measure the distance between two magnetic coils as shown in Figure 2-6. The coils
were set on both sides of the thyroid cartilage, and the distance between them changes in
accordance with the movement of the thyroid cartilage. Swallowing sounds were detected by a
piezoelectric microphone attached to the neck. The coils and microphones were installed on a
holding structure, which was positioned in the neck region. The system was validated using
videofluorography (VF), and it was concluded that high correlation existed between the results
from the proposed mechanism and VF. However, the paper did not consider artifacts such as
motion and speech.
2.2.3 Non-Invasive Diet Monitoring using Infrastructural Sensors

Saeki et al [47] proposed a food intake measuring system using an image processing
method as shown in Figure 2-7. The system was composed of 4 incandescent lamps, a USB
camera with 320×240 resolution and a computer running the proposed algorithm. The system
photographed a tray holding plates and bowls before and after the food intake
experiments. The software running on the computer consisted of two parts, i.e., the image
processing program, and the data base program (DBP). The image processing program included
the communication program, photography processing program, and the measuring processing
program, while the DBP was composed of the dish database, food menu database, food database,
food stuff database and personal database. When comparing the images before and after the
experiment, the nutrition intake was calculated by referring to the detailed information from the
database. This method has the advantage of being able to estimate the energy intake directly and
not requiring any wearable devices. However, it also has some drawbacks: (1) the system is not
portable, so it cannot be used when the subjects eat outside; (2) the accuracy is expected to
be much lower if the food is layered, as the camera only captures the image from the top; (3) the
usefulness of the database is questionable, because there are too many kinds of food around the world to
be included completely, and people may improvise when cooking.

Figure 2-7: Camera based system proposed in [47]
Chang et al [48] designed and implemented a diet-aware dining table that could track the
type and volume of food the subjects had taken as shown in Figure 2-8. The dining table was
augmented with two layers of weighing and RFID sensor surfaces, where the RFID tag at the
bottom of the container indicated the type of food, and the weighing sensor measured the
changes in weight. A weight-RFID matching algorithm was proposed to detect and distinguish
how people eat. Experiments were carried out to validate this method including scenarios such as
live dining (afternoon tea and Chinese-style dinner), multiple dining participants, and
concurrent activities chosen randomly. An accuracy of 80% was reported through the
experiments. This work is able to report the per-dinner energy intake. However, it has the
following drawbacks: (1) it can only monitor energy intake when people eat at the
table; (2) food in the container is sometimes heterogeneous, for example, a dish may consist of
low calorie vegetables and high calorie beef, thus measuring the weight of the whole dish may
not indicate how much energy the subject has taken; (3) the system has the limitation that dishes
should be placed within, rather than across, the cells where the RFID antennas and weighing sensors are
located, and subjects should not place their elbows or hands on the table.

Figure 2-8: The embedded RFID and weighing table surfaces
2.3 Proposed Mechanism
This dissertation presents the design, system level details and algorithms of a wearable
food and drink intake monitoring system that analyzes human breathing signal. Food and drink
intake can be detected by detecting a person’s swallow events. The system works
based on a key observation that a person’s otherwise continuous breathing process is interrupted
by a short apnea when she or he swallows as a part of the intake process. We detect the swallows
by recognizing apneas in the signal captured by a wearable chest-belt sensor. Such apnea detection is
performed using matched filters and machine learning mechanisms, and further refined using a
Hidden Markov Model (HMM) based mechanism that leverages known locality in the sequence
of human swallows. This dissertation also demonstrates the effectiveness of the proposed
mechanisms using experimental data.
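As a rough illustration of the matched-filter idea, the sketch below slides a short apnea-shaped template over a breathing trace and scores each position by normalized cross-correlation, a common matched-filter variant. It is not the filter design developed in this dissertation: the template shape, the synthetic signal, and the function name are hypothetical.

```python
import math


def matched_filter_scores(signal, template):
    """Slide the template over the signal and return normalized correlations.

    Each score is the correlation coefficient between the mean-removed
    template and the mean-removed signal window, so it lies in [-1, 1].
    """
    n = len(template)
    t_mean = sum(template) / n
    t = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t))
    scores = []
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # a flat window or template carries no shape
        else:
            scores.append(sum(a * b for a, b in zip(t, w)) / (t_norm * w_norm))
    return scores


# Hypothetical apnea template: rise, short plateau, fall.
template = [0, 2, 4, 4, 4, 2, 0]
signal = [0, 1, 0, 1, 0, 2, 4, 4, 4, 2, 0, 1, 0, 1]
scores = matched_filter_scores(signal, template)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, round(scores[best], 3))
```

The highest score falls where the template shape is embedded in the trace. Normalizing by the window energy makes the score insensitive to signal amplitude, which is one plausible way to cope with differences in belt tightness between subjects.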
Compared with existing work using non-invasive sensors for food intake monitoring, our
work has the following advantages:
(1) Ease of Usage: the work in [28]-[37] attached sensors to the neck region for collecting
swallowing sounds using elastic bands or adhesive tapes. However, the elastic bands
and wiring cause cosmetic issues, so people may be reluctant to wear those
systems. In contrast, our system uses piezoelectric belts or RIP belts for collecting the breathing
signal, which can be worn inside or between garments with no visible sign of
wearing. Moreover, since the piezoelectric belt relies solely on a small
piezoelectric sensor, which can even be embedded into clothes, it is very suitable for
long-term diet monitoring.
(2) Usage Comfort: in our experiments [49], a microphone and an elastic belt were
initially used as control. However, subjects complained that the belt was uncomfortable
and might affect their swallowing patterns, i.e., they might have more spontaneous
swallows. Therefore, the long-term usage of a microphone fixed to the neck region using
elastic belts is questionable. In some other work [29] [39] [40] [41], SEMG
electrodes were used to collect EMG signals on the skin in the neck region. However, in
order to provide reliable contact, conductive gel was normally used, which may
contaminate the clothes and is not comfortable for everyday usage.

Chapter 3: Instrumentation System

This chapter first introduces the concept of apnea during swallow, which is the key
observation that we leveraged for intake monitoring, and then compares a number of different
sensors that can be used for breathing signal extraction in our system. Lastly, our proposed
system is described in detail.
3.1 Breathing Process
Breathing is the process during which air moves in and out of the lungs to deliver oxygen
and remove carbon dioxide. The lungs can be expanded and contracted in two ways: (1) the
human diaphragm moves up and down to lengthen or shorten the chest cavity, and (2) the ribs
move back and forth to increase and decrease the anteroposterior diameter of the chest cavity
[50]. It has been found that for healthy subjects, the movement of the rib cage contributes to
around 75% of the tidal volume for both resting and exercising scenarios, while tidal volumes
increase with the intensity of exercise [51]. For healthy people, chest and abdomen movements
are synchronous with tidal air flow, but for some patients with chronic obstructive pulmonary
disease (COPD), chest movement is synchronous with flow of air, but the abdomen moves
asynchronously during parts of the breathing cycle [52].
3.2 Swallowing Apnea
Anatomically, breathing is inhibited in a part of the swallow process, thus causing a
swallow apnea. The swallowing process is divided into three steps [53]: 1) the oral preparation
phase, 2) the pharyngeal phase, and 3) the esophageal phase as shown in Figure 3-1. During the
oral phase, food is chewed into a viscous bolus, and liquid is also considered as bolus with very
high fluidity. The volume and viscosity of bolus is also sensed in this phase, so that the
swallowing apparatus can adapt to the bolus. In the pharyngeal phase, the bolus travels through
the pharynx and passes the upper esophageal sphincter. A set of muscles are activated to propel
the bolus and the epiglottis moves downward to cover the vocal folds and to protect the trachea
from contamination. Finally, the bolus is pushed towards the stomach during the esophageal
phase. During the pharyngeal phase, since the trachea is blocked by the epiglottis, breathing is
temporarily stopped, thus causing the apnea.

Figure 3-1: 3 steps of swallowing
Figure 3-2 shows the breathing signal of two subjects. The rising edges correspond to
inhalations and the falling edges correspond to exhalations. As shown in the figure, a breathing
cycle can be either normal (i.e. Normal Breathing Cycle or NBC) or elongated due to swallow-triggered apnea. A cycle that is elongated due to an apnea at the beginning of an exhale (see the
top figure on the left in Figure 3-2 for subject-1, session-1) is termed a Breathing Cycle with
Exhale Swallow (BC-ES). For a second subject, the bottom figure on the left in Figure 3-2 shows
swallows (i.e. apneas) during the inhale process, which are termed Breathing Cycles with Inhale
Swallow (BC-IS).
Swallow apnea localization with regard to breathing phases has been investigated in this
dissertation. Martin et al [54] reported that swallow apnea happened during expiration with a
probability between 0.94 and 1, and the swallow apnea was followed by expiration with a
probability of 100% for 3, 10 and 20ml bolus size experimented. Klahn and Perlman [55]
reported similar results where swallow apnea occurs during expiration 93% of the time and was
followed by expiration with probability 1 in their experiment with 5ml of water and applesauce.
As a comparison, it was reported in [56] that the exhale-swallow-exhale pattern accounted for 77% of
100 swallows. Therefore, BC-ESs as shown in Figure 3-2 are much more common than BC-ISs,
which is validated in our following experiments.

[Figure: two panels of breathing signal traces, (a) Subject-1 and (b) Subject-2, plotted as ADC readings versus Time (Second). Annotations: Normal Breathing Cycle (NBC), Breathing Cycle (BC), Breathing Cycle with Exhale Swallow (BC-ES), Breathing Cycle with Inhale Swallow (BC-IS), Apnea.]

Figure 3-2: Breathing signal of two subjects in the experiments
By detecting swallow events, the amount of food or liquid consumed can be estimated. Nilsson et al
[57] reported that the average bolus volume for 292 healthy adults is
25.6±8.5 ml during single swallows and 21.1±8.2 ml during repetitive swallows. When gender was considered, they
demonstrated that the mean bolus size for single swallows is 28.1±9.1 ml for males and 21.6±5.5
ml for females, and for repetitive swallows, 23.2±9.2 ml for males and 17.9±4.8 ml for females.
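A back-of-the-envelope version of this estimate multiplies the swallow count by a mean bolus volume. The sketch below is only illustrative: it assumes that the ± values in [57] are standard deviations and that swallows are independent draws from one distribution, neither of which is stated in the source, and the function name is hypothetical.

```python
import math


def estimate_intake_ml(num_swallows, mean_bolus_ml, sd_bolus_ml):
    """Return (expected total volume, standard deviation of the total) in ml.

    Assumes every swallow draws independently from the same bolus-size
    distribution, so variances add across swallows. This independence is
    an assumption of the sketch, not a result from [57].
    """
    total = num_swallows * mean_bolus_ml
    total_sd = math.sqrt(num_swallows) * sd_bolus_ml
    return total, total_sd


# 16 repetitive swallows using the population figures reported in [57].
total, sd = estimate_intake_ml(16, mean_bolus_ml=21.1, sd_bolus_ml=8.2)
print(f"{total:.1f} ml +/- {sd:.1f} ml")
```

Because bolus size varies with gender and between individuals, a per-subject mean would likely give a tighter estimate than the population average used here.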

3.3 Sensors for Collecting Respiratory Signal
Generally, there are some popular non-intrusive breathing detection techniques for
obtaining a respiratory signal. Dietz [58] proposed a method using a flexible tube worn on the
chest of the subject and connected to external equipment that measures the airflow through the
tube. Corbishley et al [59] used a miniaturized microphone in an aluminum conical bell on the
neck region, and they proposed algorithms to handle noises from the body, the sensor itself and
environment. Chia et al [60] developed a UWB radar based system, which is able to detect
breathing and heart beat remotely even through walls. Mukai et al [61] used a 40kHz ultrasonic
transmitter and receiver installed into a bed mattress for monitoring respiration, cardiac vibration
and movement. Masks are also widely used for extracting breathing signal, and by detecting the
oxygen concentration, they are able to estimate the energy consumption of human subjects[62]
[63]. Karlen et al used ECG and photoplethysmography (PPG) to estimate respiratory rate. A
pressure sensor array was used in [64][65] to detect breathing signal during sleep and a algorithm
was proposed to select proper sensor sets during movement in [65]. Bates et al [66] Jin et al [67]
Hung et al [68] and Reinvuo et al [69] used accelerometer to detection breathing signal and
proposed algorithms to handle the artifacts caused by body movements and speech. Koo et al
[70] proposed using a piezoelectric polyvinylidene fluoride (PVDF) impedance sensor for
measuring respiratory signal and compared its performance with respiratory inductance
plethysmography (RIP) and nasal-oral pneumotachography. Conductive rubber [71] can also be
used to measure respiratory signal. The sensor is made of a string of carbon black and carbon
filled silicone rubber, whose resistance changes when it is stretched.
Compared with other sensing methods, conductive rubber, piezoelectric PVDF and
respiratory inductance plethysmography (RIP) have the advantages of (1) non-intrusiveness:
these sensors can be put on and taken off easily, (2) no skin contact: these sensors can be
worn between garments without causing any cosmetic issue, and (3) comfort: these sensors can be
embedded into garments and worn all day long without causing any discomfort.
In the following sections, we compare the conductive rubber, piezoelectric PVDF and
respiratory inductance plethysmography (RIP) solutions for respiratory signal detection, and
discuss their advantages and disadvantages.
3.3.1 Conductive Rubber Sensor

An elastic belt on the chest or abdomen is able to capture the change in tension when the
chest or abdomen expands or contracts. The tension is directly reflected as a change in the
resistance, which can be easily converted to a voltage by connecting the sensor in series with
a constant-value resistor.
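The resistance-to-voltage conversion described above is a plain voltage divider, and can be sketched as follows. The supply voltage, the fixed resistor value, the sensor resistances and the function name are assumptions chosen for illustration; none of them is stated in the text.

```python
def divider_output(v_supply, r_sensor, r_fixed):
    """Voltage across the fixed resistor when it is in series with the sensor."""
    return v_supply * r_fixed / (r_sensor + r_fixed)


# Hypothetical operating point: 3 V supply and a 5 kOhm fixed resistor.
V_SUPPLY = 3.0
R_FIXED = 5e3

# As the belt stretches and its resistance rises, the measured voltage falls.
for r_sensor in (4e3, 6e3, 8e3):
    v_out = divider_output(V_SUPPLY, r_sensor, R_FIXED)
    print(f"sensor {r_sensor / 1e3:.0f} kOhm -> {v_out:.3f} V")
```

Measuring across the fixed resistor rather than across the sensor only flips the sign of the relationship; either choice would serve for breathing detection.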

Figure 3-3: Resistive stretch sensor (a) and its static response (b)
In order to evaluate the characteristics of the resistive belt, we chose the stretch sensor
manufactured by Scientific Instruments, shown in Figure 3-3:(a). The diameter of the sensor is
1.5mm, and the length is 15cm.

The static and transient properties of the sensor are analyzed by stretching the sensor to
different lengths. Figure 3-3:(b) demonstrates the static property of the sensor. In this
experiment, the resistance of the sensor at each sample point is read 1 minute after it is
stretched so that the impact of the transient response is minimized. As Figure 3-3:(b) shows,
the sensor demonstrates good linearity in static experiments.
To analyze the transient property, the sensor is first stretched by 5cm to make it taut so
that the impact of slack is minimized. Then the sensor is stretched by another 5 cm, and the
response is recorded. After 1 minute, when the resistance of the sensor is stable, the
sensor is released by 5 cm. Figure 3-4 shows the resistive response of the sensor during such
an experiment.

[Figure: panels (a) and (b) each plot Resistance (KOhm) against Time (Second).]
Figure 3-4: Transient response of stretch sensors
As Figure 3-4 shows, when the sensor is stretched, its resistance jumps and then decays.
Similarly, when the sensor is released, its resistance jumps up quickly before decaying to its
final value.
In general, the resistive belt provides a cheap solution for breathing detection. However, its
transient response distorts the breathing signal despite its linearity in the static case.

3.3.2 Piezoelectric PVDF Sensor

The device contains a piezoelectric sensor placed between two elastic strips. Stretching the
belt exerts a strain on the sensor, which generates a voltage proportional to the strength of
the force. Compared with other transduction principles, such as capacitive, inductive and
piezoresistive sensing, piezoelectric sensors provide the highest sensitivity [72] and
excellent linearity over a wide amplitude range.

Figure 3-5: Piezoelectric belt (a) and its static response (b)
In order to evaluate the characteristics of the piezo respiratory sensor, both static and
transient response analyses are carried out as for the resistive belt. Figure 3-5:b shows the
static response of the piezo respiratory sensor; the sensor exhibits good linearity in the
experiment.
Piezoelectric devices have very high output impedance, and the equivalent circuit is
shown in Figure 3-6. As the input resistance of a typical oscilloscope is around 1MΩ and the
Cs of the piezo respiratory belt we are using is 2.2µF, the time constant of the circuit is
2.2 seconds when the output of the sensor is fed directly into the oscilloscope. Therefore, in
order to analyze the transient response of the sensor when it is stretched or released, we need
an isolation circuit that provides a larger time constant. The MAX406 operational amplifier
provides an input impedance of ~10^11 Ω, which brings the time constant up to 2.2×10^5
seconds. Consequently, the isolation circuit shown in Figure 3-6:b is adopted to separate the
oscilloscope from the sensor.
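The two time constants quoted above follow directly from τ = R·Cs. The short sketch below reproduces them and, as an added illustration, estimates how much of a step response survives a 10-second hold in each case. The hold duration and the function names are assumptions, and the sensor is treated as an ideal first-order RC circuit.

```python
import math


def time_constant(r_ohm, c_farad):
    """First-order RC time constant in seconds."""
    return r_ohm * c_farad


def remaining_fraction(t_seconds, tau_seconds):
    """Fraction of a step that is still present after t seconds of RC decay."""
    return math.exp(-t_seconds / tau_seconds)


C_S = 2.2e-6                                # sensor capacitance from the text
tau_scope = time_constant(1e6, C_S)         # direct oscilloscope connection
tau_buffered = time_constant(1e11, C_S)     # behind the high-impedance buffer

# Hypothetical 10 s hold: the unbuffered signal has almost fully leaked away.
for name, tau in (("oscilloscope", tau_scope), ("buffered", tau_buffered)):
    print(f"{name}: tau = {tau:.3g} s, "
          f"fraction left after 10 s = {remaining_fraction(10.0, tau):.4f}")
```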
[Figure: (a) equivalent circuit with voltage source Vs and capacitance Cs; (b) isolation
circuit built around an operational amplifier; (c) Sensor output (mV) against Time (Second).]
Figure 3-6: Equivalent circuit of piezoelectric sensor (a), Isolation circuit (b), and
Transient response of piezo respiratory sensor(c)
Figure 3-6:c shows the signal captured by the oscilloscope when the piezo
respiratory sensor is stretched by 15mm and then released. The output signal is very clean and
follows the mechanical input closely. Compared with the resistive belt, the piezo respiratory
belt has much better characteristics.

3.3.3 Respiratory Inductance Plethysmography (RIP) Sensor

Figure 3-7: RIP sensor
An inductance belt relies on Faraday's Law and Lenz's Law: a magnetic field is
generated when current flows through a loop of wire, and a change in the area enclosed by the
loop induces a current in the loop in the opposite direction, proportional to the change in the
area. When the inductance belt is used, a low-amplitude sine wave of ~20 mV at ~300 kHz is
injected through the belt. Inhalation and exhalation change the area enclosed by the
belt, introducing an opposing current in the belt and thus deforming the applied current and
changing the frequency. The frequency is then demodulated to produce an analog waveform
reflecting the change of the area. It is reported that the output of the belt changes linearly
with the cross-sectional area [73]. A RIP respiratory system is normally composed of two RIP
belts, worn on the chest and abdomen respectively.
Consequently, a driver module is needed to measure the breathing signal, which consists of
a frequency generator, signal processor and analog/digital converter. The price of an inductance
belt set is therefore much higher than the other two sensors analyzed.
In conclusion, the conductive rubber sensor is inferior in terms of transient response.
The piezoelectric belt provides a better solution with good linearity and transient response,
yet its position needs to be adjusted for each subject for optimum signal amplitude. The RIP
belt system provides the best signal quality and does not need position adjustment, as it has a
dedicated signal processing driver module and two belts worn on the chest and abdomen
individually, although it costs much more than piezoelectric belts. Therefore, in this
dissertation, we have carried out experiments using both piezoelectric belts and RIP belts for
swallow detection. For the RIP belts, we used the summed output of the two belts, i.e., on the
chest and abdomen.
3.4 Swallow Detection System

Based on the analysis in Section 3.3, we developed two systems using the piezoelectric belt
and the RIP belts respectively. In the experiments carried out in Chapters 4, 5 and 6, the
piezoelectric sensor belt is used, while the RIP belts are adopted in our future experiments.

Figure 3-8: System architecture of swallow detection system using piezoelectric belt

The system architecture of the swallow detection system using the piezoelectric belt is shown
in Figure 3-8. The embedded wearable sensor system is worn on the chest for collecting the
breathing signal and transmitting it to a smart phone through Bluetooth. The embedded belt
system contains: 1) a piezo-respiratory belt for converting the changes of tension during
breathing to a voltage signal, 2) an amplifier and signal shaping circuit for converting the
raw voltage signal to a form suitable for the ADC chip, 3) a processor and radio subsystem
(EZ430-RF256x from Texas Instruments), and 4) a 3.7V 300mAh polymer rechargeable battery. The
entire package weighs approximately 40 grams. The 300mAh polymer battery is able to support the
system for more than 30 hours of continuous operation on a single charge. After the signal is
received by the smart phone, it is stored on an SD card attached to the phone.

Figure 3-9: System architecture of swallow detection system using RIP belts
The swallow detection system using RIP belts is shown in Figure 3-9. Similar to the
system based on the piezoelectric belt, it includes: 1) a pair of RIP belts for collecting the
breathing signal, 2) a signal shaping circuit for amplifying and filtering the raw signal from
the sensor to meet the requirements of the ADC stage, 3) a processor and Bluetooth subsystem
(TI EZ430-RF256x), which is able to sample the signal at 100Hz and transmit it over Bluetooth
to the external smart phone, and 4) a 3.7V 300mAh polymer rechargeable battery. The whole
package weighs approximately 45 grams. The 300mAh polymer battery is able to support the system
for around 20 hours.
Because the output signals from the piezoelectric belt and the RIP belts are very
different in terms of output impedance, signal amplitude and signal to noise ratio (SNR), the
signal shaping circuits need to be designed individually to suit their characteristics.
However, the battery, the µController and Bluetooth module, and the smart phone app can be
shared between the two systems.

3.4.1 Signal Shaping Circuit

The outputs of these sensors are noisy and weak. For example, the output of the piezoelectric
belt on a subject during normal breathing is only about 10mV, while the corresponding output
for the RIP belts is only 20µV. Moreover, since the RIP respiratory belts are driven by a
high-frequency current as discussed in Section 3.3.3, the output signal is also contaminated
with those high-frequency components.
A. Signal Shaping Circuit for Piezoelectric Belts
A signal shaping circuit is needed for the piezoelectric belt for two reasons. First, based on
our observation, when people are breathing, the change in the circumference of the chest is
normally within several millimeters, which makes the output of the belt fluctuate within 10
mV. In order to capture this signal using mainstream ADCs on mobile sensors, an amplification
circuit is necessary. Second, the output signal of the piezo respiratory belt is proportional
to the tension on the belt, so when the belt is worn on the chest or abdomen, a certain amount
of charge accumulates on the two sides of the sensor because of the tension. This charge
becomes the DC part of the respiratory signal collected on the belt. This DC value
may change from person to person depending on how they wear the belt, and it may also change
from time to time when the tension on the belt changes slowly. The uncertainty of the DC value
complicates the design of the amplification circuit. Therefore, this amplification circuit
must be designed carefully to meet these two requirements.

Figure 3-10: Signal shaping circuit for piezoelectric belt
The amplification circuit shown in Figure 3-10 is designed to meet the two requirements
mentioned above. The circuit can be divided into four parts. The first part is drift control.
It adds an 8.2MΩ resistor to the output of the belt, which damps the DC drift when the
belt is worn by different people. The time constant introduced by the 8.2MΩ resistor is 18.04
seconds, which reduces the impact of DC drift while having minimal effect on the
respiratory signal. The second part is voltage shifting. It sets the default DC voltage to
63mV, so that the output signal stays positive. The third part is impedance matching. It
isolates the previous parts from the amplification circuit. The last part is amplification. It
consists of a low-pass filter and an amplification circuit. The time constant of the low-pass
filter is 5.6 milliseconds, and it filters out the noise caused by the preceding circuit. The
gain of the circuit is controlled by an adjustable resistor.
B. Signal Shaping Circuit for RIP Belts
Since the output signal amplitude of the RIP sensing module is very small (around 20µV
peak-to-peak for normal breathing), a high amplification ratio is
necessary for the signal shaping circuit. A low-pass filter is also implemented to filter out
the high-frequency components in the breathing signal caused by the alternating current
injected into the belts. Figure 3-11 shows the schematic of the circuit. The first op-amp
(i.e., MAX412 (1) in the figure) provides a stable 1.5V reference to the input and the
amplifier on the right. The second op-amp (i.e., MAX412 (2) in the figure) forms the amplifier
with an amplification ratio of 10,000. The low-pass filter accompanying the amplifier has a
cutoff frequency of 28Hz. After the amplification, the breathing signal has a peak-to-peak
value of around 200mV.
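The gain, cutoff frequency and output amplitude quoted above can be cross-checked against the component values that appear in Figure 3-11. The sketch below assumes, without confirmation from the text, that the 560KΩ resistor and the 10nF capacitor sit in parallel in the feedback path of the second op-amp and that the 56Ω resistor is the gain-setting resistor; the variable names are illustrative.

```python
import math

R_FEEDBACK = 560e3   # 560 kOhm, assumed to be the feedback resistor
R_GAIN = 56.0        # 56 Ohm, assumed to be the gain-setting resistor
C_FEEDBACK = 10e-9   # 10 nF, assumed to be in parallel with R_FEEDBACK

# Resistor-ratio gain (a non-inverting stage would add 1 to this ratio).
gain = R_FEEDBACK / R_GAIN

# Single-pole low-pass corner formed by the feedback resistor and capacitor.
cutoff_hz = 1.0 / (2.0 * math.pi * R_FEEDBACK * C_FEEDBACK)

# A 20 uV peak-to-peak breathing signal after amplification.
output_vpp = 20e-6 * gain

print(f"gain = {gain:.0f}")
print(f"cutoff = {cutoff_hz:.1f} Hz")
print(f"output = {output_vpp * 1e3:.0f} mV peak-to-peak")
```

Under these assumptions the three values agree with the 10,000 gain, 28Hz cutoff and roughly 200mV peak-to-peak output stated in the text.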
The MAX412 (MAX412BCSA+) op-amp is chosen because of its low input offset voltage
and low input noise-current density, i.e., the input offset voltage VOS=±150µV, and input
noise-current density in=1.5nV/√Hz.

[Figure: schematic with an Input node, a 3V supply, two op-amps labeled MAX412 (1) and
MAX412 (2), resistors of 560KΩ, 56Ω, 10KΩ and 10KΩ, a 10nF capacitor, and an Output node.]
Figure 3-11: Schematic of signal shaping circuit
3.4.2 µController and Bluetooth Module

The processed signal from the signal shaping circuit module is fed into an ADC channel of
the µController and Bluetooth module (EZ430-RF256x). The ADC has a resolution of
14 bits and a sampling rate of 100Hz. For a breathing signal with a 200mV peak-to-peak value,
the SNR with respect to the quantization noise is 60dB.
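The quantization SNR can be estimated with the usual uniform-quantizer model, in which the noise has an RMS value of one LSB divided by √12. The sketch below is only a rough check: the 3 V full-scale range and the sinusoidal signal shape are assumptions that the text does not state, and the function name is illustrative.

```python
import math


def quantization_snr_db(signal_vpp, n_bits, v_full_scale):
    """SNR of a sine of the given peak-to-peak size against quantization noise."""
    lsb = v_full_scale / (2 ** n_bits)
    noise_rms = lsb / math.sqrt(12.0)                  # uniform quantizer model
    signal_rms = (signal_vpp / 2.0) / math.sqrt(2.0)   # RMS of a sine wave
    return 20.0 * math.log10(signal_rms / noise_rms)


# 200 mV peak-to-peak signal, 14-bit ADC, assumed 3 V full-scale range.
snr_db = quantization_snr_db(0.2, 14, 3.0)
print(f"quantization SNR = {snr_db:.1f} dB")
```

With these assumed values the estimate lands close to the 60dB figure quoted above; a different full-scale range would shift it by a few dB.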
The sampled data is then sent via Bluetooth to the Android smart phone for food intake
monitoring application.
In addition to the breathing signal from the belts, acceleration data from the
accelerometer on the µController and Bluetooth module is also collected and transmitted to the
smart phone for future use.
3.4.3 Breathing Signal Logging App

An Android app, named LiveActive, was developed on the smart phone to perform food
intake detection based on the breathing signal received over Bluetooth.

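[Figure: the LiveActive App contains the LiveActive Activity together with the Bluetooth, GUI,
Food intake detection and Webservice modules; the Bluetooth connection goes through the
Android API, and the Webservice module uses an Ethernet connection.]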
Figure 3-12: Logic layering of LiveActive
Figure 3-12 shows the logic layering of LiveActive. The LiveActive activity
coordinates the other modules. The Bluetooth connection module receives the breathing signal
from the Bluetooth API provided by Android. The Graphic User Interface (GUI) plots the
breathing signal on the screen. The food intake detection module runs in the background to
perform the detection algorithm illustrated in Section 3. The webservice module uploads the
detection results to the server using either Wi-Fi or a cellular network. Currently, the food
intake detection module and the webservice module are still under development, and the
collected data are stored on the microSD card in the smart phone for further processing. The
proposed design of the webservice module is shown in Figure 3-13.

[Figure: on the smart phone side, Bluetooth interface, Packetizing and JSON serialization; on
the server side, JSON deserialization, Unpacketizing and Intake detection.]

Figure 3-13: Design of the webservice module

Figure 3-14: Graphic User Interface (GUI) of LiveActive
As shown in Figure 3-13, the breathing signal is first received through the Bluetooth
interface as a continuous data stream, which is then packetized into data packets. The size of
the packet will be optimized to achieve shorter transmission delay and lower memory
usage. A JSON serializer is used to convert the data packet into a data array, which can be
transferred through the Ethernet to the server side. On the server side, the data is converted
back to a continuous data stream, and the intake detection algorithm is launched to detect
swallow events embedded in the breathing signal.
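The packetize, serialize, deserialize and reassemble path described above can be sketched in a few lines. This is a minimal illustration rather than the LiveActive implementation: the packet size, the JSON field names `seq` and `samples`, and all function names are hypothetical.

```python
import json


def packetize(samples, packet_size):
    """Split a sample stream into consecutive fixed-size packets."""
    return [samples[i:i + packet_size]
            for i in range(0, len(samples), packet_size)]


def serialize(packet, sequence_number):
    """Encode one packet as a JSON string (field names are assumptions)."""
    return json.dumps({"seq": sequence_number, "samples": packet})


def reassemble(messages):
    """Server side: decode the packets and restore the continuous stream."""
    packets = sorted((json.loads(m) for m in messages), key=lambda p: p["seq"])
    stream = []
    for packet in packets:
        stream.extend(packet["samples"])
    return stream


# Hypothetical stream of 250 ADC readings sent in packets of 100 samples.
stream = list(range(250))
messages = [serialize(p, i) for i, p in enumerate(packetize(stream, 100))]
restored = reassemble(messages)
```

Carrying a sequence number in each packet lets the server restore sample order even if packets arrive out of order, at the cost of a few extra bytes per packet.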
Figure 3-14 shows the GUI of the app LiveActive. The GUI is mainly divided into three
parts. The first part sits on the top of the screen showing the raw data received from Bluetooth
interface, the second part plots the breathing signal, and the last part plots the acceleration data of
three axes. All the three parts get updated when a new sample point is received.

Chapter 4:

Swallow Detecting using Matched Filters

The sensor system and detection algorithms developed in this chapter work based on a
key observation that a person's otherwise continuous breathing process is interrupted by a
brief apnea during a swallow, which is a part of the intake process [74]. We first detect
swallows by detecting apneas extracted from the breathing signal captured by a wearable
wireless chest-belt with piezoelectric sensors, as described in Section 3.4. Afterwards,
swallow pattern analysis is used for identifying drinking swallows. Together with
self-reporting of overall liquid intake habits at a high level (i.e., the types of drinks
etc.), the instrumented detection of liquid swallow counts can offer an objective way to:
1) study the liquid intake trends, and 2) estimate calorie intake.
4.1 Processing Methods

Matched filters [75] are widely used in signal processing. A matched filter is implemented
by correlating a known signal, or reference, with an unknown signal to detect the presence of
the reference in the unknown signal. This is equivalent to convolving the unknown signal with
the time-reversed reference signal. The matched filter is the optimal linear filter for
maximizing the signal to noise ratio (SNR) in the presence of additive noise.
In this dissertation, we used matched filters to detect the presence of swallow apnea for
swallow detection purposes.
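The equivalence between correlation and convolution with the time-reversed reference can be checked numerically. The sketch below uses a toy reference pulse embedded in a zero signal; the function names and the data are illustrative and are not taken from the breathing recordings.

```python
def correlate(x, s):
    """Slide reference s over signal x and return the dot product at each lag."""
    m = len(s)
    return [sum(x[k + j] * s[j] for j in range(m))
            for k in range(len(x) - m + 1)]


def convolve_valid(x, h):
    """Convolution of x with impulse response h, fully overlapping part only."""
    m = len(h)
    return [sum(x[k + m - 1 - j] * h[j] for j in range(m))
            for k in range(len(x) - m + 1)]


# Toy data: the reference pulse is hidden at offset 6 of a longer signal.
reference = [0.0, 1.0, 2.0, 1.0, 0.0]
signal = [0.0] * 6 + reference + [0.0] * 6

by_correlation = correlate(signal, reference)
by_convolution = convolve_valid(signal, reference[::-1])

# The matched filter output peaks where the reference is present.
best_lag = max(range(len(by_correlation)), key=by_correlation.__getitem__)
```

Both routes give the same output sequence, and its maximum falls at the lag where the reference was embedded.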
4.1.1 Breathing, Apnea and Swallow Signature

Figure 4-1 shows two representative human breathing signal segments. The ADC
readings in the figure are directly proportional to the elongation of the piezoelectric sensing
belt described in Chapter 3. A breathing cycle can be either normal (i.e., Normal Breathing
Cycle or NBC) or elongated due to a swallow-triggered apnea. A cycle that is elongated due to
an apnea at the beginning of an exhale (see Figure 4-1:a) is termed a Breathing Cycle with
Exhale Swallow (BC-ES). Figure 4-1:b shows swallows (i.e., apneas) during the inhale process;
such cycles are termed Breathing Cycles with Inhale Swallow (BC-IS). During our experiments,
it was also found that BC-ES is much more prevalent than BC-IS, which agrees with
previous research in [54][55][56].

[Figure: panel (a) Subject-1 marks Normal Breathing Cycles (NBC), a Breathing Cycle (BC) and
Breathing Cycles with Exhale Swallow (BC-ES) with their apnea; panel (b) Subject-2 marks
Breathing Cycles with Inhale Swallow (BC-IS) with their apnea. Both panels plot ADC readings
against Time (Second).]

Figure 4-1: Examples of Breathing Cycles (BC), Normal Breathing Cycles (NBC),
Breathing Cycles with Inhale Swallow (BC-IS), Breathing Cycles with Exhale Swallow (BC-ES)
and apnea
The objective in this chapter is to classify NBC, BC-ES, and BC-IS with high
accuracy. The challenges in detection stem from the fact that there is significant variability
in breathing waveforms across different: 1) subjects, 2) measurement instances for the same
subject, and most importantly, 3) locations and durations of the apnea with respect to its
breathing cycle. Moreover, the breathing signal may also be contaminated by artifacts such as
movement, speech and coughing.
4.1.2 Matched Filter Method

In what follows we present the performance of a matched filter based template matching
mechanism for swallow detection. The template signals for the matched filters are chosen from
NBCs, BC-ESs, and BC-ISs, so that a breathing cycle can be classified as one of those three by
observing the similarity scores produced by the respective filters.
As shown in Figure 4-2, the signal sampled by the ADC at 30Hz is first fed into a low-pass
filter for removing any quantization noise. The second step is to extract individual breathing
cycles through a peak and valley detector. The next module normalizes the extracted
cycles in both time and amplitude, so that the input waveform and the reference waveforms of
the matched filters have the same amplitude and the same number of sample points. Each
breathing cycle is normalized to be between 0 and 100 (ADC output units) vertically, and
interpolated to 128 sample points. Considering the average length of a breathing cycle of 3.77
seconds in our experiments, the normalized sampling rate after interpolation is approximately
34Hz. Note that the time-normalization provides a way to handle variable-duration breathing
cycles and variable-duration apneas (i.e., caused by different amounts of liquid intake in one
swallow) by creating a uniform-duration swallow signature. Such a uniform-duration signature is
then presented to the proposed matched filter and machine learning algorithms.
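The amplitude and time normalization can be sketched as follows. The text does not name the interpolation method, so linear interpolation is assumed here; the function name and the synthetic example cycle are also illustrative.

```python
import math


def normalize_cycle(cycle, n_points=128, top=100.0):
    """Scale a cycle to the range 0..top and resample it to n_points samples."""
    lo, hi = min(cycle), max(cycle)
    span = hi - lo
    scaled = [(v - lo) * top / span if span else 0.0 for v in cycle]
    if len(scaled) == 1:
        return scaled * n_points
    last = len(scaled) - 1
    resampled = []
    for k in range(n_points):
        position = k * last / (n_points - 1)   # fractional index in the input
        i = min(int(position), last - 1)
        frac = position - i
        # Linear interpolation between the two neighbouring input samples.
        resampled.append(scaled[i] * (1.0 - frac) + scaled[i + 1] * frac)
    return resampled


# Synthetic 3.77 s cycle sampled at 30 Hz (113 samples), in raw ADC units.
raw_cycle = [200.0 + 150.0 * math.sin(math.pi * i / 112) for i in range(113)]
normalized = normalize_cycle(raw_cycle)

# Equivalent sampling rate after resampling an average-length cycle.
effective_rate_hz = 128 / 3.77
```

Resampling every cycle to the same length is what lets a single fixed-length template be compared against cycles of different durations.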

[Figure: ADC readings pass through a Low Pass Filter, a Breathing Cycle Extractor and
Normalization to give the input x(n). Three matched filters, each with a reference s(n) for
BC-IS, BC-ES and NBC, produce Similarity Score-1 (µ1), Similarity Score-2 (µ2) and Similarity
Score-3 (µ3), which a Comparator turns into the Detection Result. A parallel path performs
Feature extraction and Machine learning based detection to give a second Detection Result.]
Figure 4-2: Detection process of matched filter based detection algorithm
The normalized breathing cycle waveforms are fed into three separate filters, each with a
specific type of reference template waveform. The filters use reference waveforms corresponding
to NBC, BC-ES, and BC-IS. The similarity score outputs are compared in order to classify a
breathing cycle as one of the above three types of BCs.
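The comparator stage can be sketched as picking the template with the highest similarity score. The text does not define the score, so the sketch assumes a zero-lag Pearson correlation between the normalized cycle and each template; the toy cycle generator and all names are hypothetical and only stand in for real breathing data.

```python
import math


def similarity(x, s):
    """Zero-lag Pearson correlation between cycle x and template s (assumed)."""
    mean_x, mean_s = sum(x) / len(x), sum(s) / len(s)
    dx = [v - mean_x for v in x]
    ds = [v - mean_s for v in s]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in ds))
    return sum(a * b for a, b in zip(dx, ds)) / denom if denom else 0.0


def classify(cycle, templates):
    """Return the best-matching label and the score of every template."""
    scores = {label: similarity(cycle, t) for label, t in templates.items()}
    return max(scores, key=scores.get), scores


def toy_cycle(hold_at, hold_len, n=128):
    """Rise-then-fall cycle frozen for hold_len samples to mimic an apnea."""
    moving, phase, out = n - hold_len, 0, []
    for i in range(n):
        if not (hold_at <= i < hold_at + hold_len):
            phase += 1                      # the phase advances except in apnea
        out.append(100.0 * math.sin(math.pi * phase / moving))
    return out


templates = {
    "NBC": toy_cycle(0, 0),        # no apnea
    "BC-ES": toy_cycle(48, 40),    # apnea just after the peak (start of exhale)
    "BC-IS": toy_cycle(20, 40),    # apnea during the rise (inhale)
}

# A BC-ES-like cycle with a small deterministic ripple added.
test_cycle = [v + 2.0 * math.sin(0.9 * i)
              for i, v in enumerate(templates["BC-ES"])]
label, scores = classify(test_cycle, templates)
```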
Note that the bottom part of Figure 4-2 shows how the shaped signal is used for feature
extraction and swallow classification using the machine learning based methods presented in
Section 4.1.3.

4.1.3 Machine Learning based Detection

A machine learning approach with time domain features is applied using all 128 sample
points in a normalized breathing cycle. The Weka toolkit [76] was used for implementing three
different classifiers, namely, Support Vector Machine (SVM), Decision Tree (J48), and Naïve
Bayes. The classifier parameters are optimized to provide the best accuracy. For SVM, a
polynomial kernel function is used and the features are normalized. All the other parameters
are set to default values. A 10-fold cross-validation approach is used in which the collected
breathing cycles are randomly divided into 10 subsets of equal size and a classifier is run 10
times. In each run, one subset is used for testing while the others are used for training. The
10-fold validation method is used to avoid over-fitting [77].
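The 10-fold splitting procedure can be sketched independently of any particular classifier. The code below is not Weka; it only illustrates how the cycles are divided into training and testing subsets, and the function name and seed are assumptions.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)          # random assignment to folds
    folds = [order[i::k] for i in range(k)]     # k subsets of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example with a library of 300 breathing cycles, as formed per subject.
splits = list(k_fold_indices(300))
for train, test in splits:
    # A classifier would be trained on `train` and evaluated on `test` here.
    pass
```

Every cycle appears in exactly one test subset, so each cycle is scored once by a classifier that did not see it during training.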
Breathing signal power at different frequencies, computed using FFT, is also used as
features in machine learning. Since the FFT is applied on normalized 128-point breathing cycles
with a normalized sampling frequency of 34Hz, it produces spectral coefficients for frequencies
up to 34Hz with a granularity of 0.27Hz. As each normalized breathing cycle is a real
finite-length series, the resulting 128-point power spectrum is symmetric about $f_s/2$ [78].
Therefore, the first 64 spectral power values are used as the features for driving the
classifiers.
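The frequency-domain features can be sketched with a direct discrete Fourier transform. A plain O(N²) DFT is used here in place of an FFT so that the example needs only the standard library; for 128 points the result is the same. The function names and the test tone are illustrative.

```python
import cmath
import math


def power_spectrum(x):
    """Squared magnitude of the DFT of x, one value per frequency bin."""
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def frequency_features(cycle):
    """Keep the first half of the spectrum; the second half mirrors it."""
    return power_spectrum(cycle)[: len(cycle) // 2]


N_POINTS = 128
FS_HZ = 34.0
resolution_hz = FS_HZ / N_POINTS        # spacing between adjacent bins

# Toy input: a tone that completes exactly three periods in 128 samples.
tone = [math.sin(2.0 * math.pi * 3 * t / N_POINTS) for t in range(N_POINTS)]
full = power_spectrum(tone)
features = frequency_features(tone)
strongest_bin = max(range(len(features)), key=features.__getitem__)
```

For real-valued input, bin k and bin N-k carry the same power, which is why only the first 64 values are kept as features.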
4.1.4 Artifacts Handling

Breathing signals may suffer from artifacts, and in this dissertation, we propose methods
for handling artifacts caused by body movement and speech.
Anatomically, it is not possible to swallow while talking, but people can talk right before
or after swallowing. Therefore, it is necessary to detect talking so that swallow detection
from the breathing signal can be paused whenever talking is detected. Figure 4-3 shows an
example of a breathing signal with speech artifacts. Observe that the exhalation parts have
larger slopes and more undulation, which is caused by modulated air flow through the vocal
folds. In this dissertation, we propose using power spectral density analysis to identify
breathing cycles with speech.
[Figure: ADC readings against Time (Second), showing a segment of normal breathing followed by
breathing between the points marked "Start talking" and "End of talking".]
Figure 4-3: Breathing signal variability before, during, and after talking.
When people are eating or drinking while sitting, slight upper body movement may be
involved. Therefore, exaggerated upper body movement was also included in the experiments to
analyze its impact on the breathing signal.
4.2 Results

Experiments using the piezoelectric chest belt system were carried out with seven subjects
(five male and two female) without any known respiratory or swallowing disorders. The belt was
worn immediately inferior to the sternum, where the best signal strength was obtained across
all subjects. Each subject performed three sessions of 10 minutes each. In the first 5 minutes,
the subject was asked to drink water from a flask, with a swallow instruction given once every
20 seconds. Then the subject conversed with the experimenter for 3 minutes, and in the last 2
minutes, the subject shook their upper body and drank every 20 seconds. Breathing signals from
the first 5-minute phase were used for swallow detection, and those from the last 5 minutes
were used for artifact handling. The first phase of each session resulted in approximately 80
Normal Breathing Cycles (NBCs) and 20 breathing cycles with swallows (both BC-ESs and BC-ISs),
i.e., approximately 100 breathing cycles in total. Please note that spontaneous swallows were
also included. For each subject, a library containing cycles from three such sessions (i.e.,
around 300 cycles) was then formed.
4.2.1 Results for Matched Filter based Detection

Templates or reference waveforms for the matched filters are computed based on cycles
from the library constructed above. A template for NBC is created by sample-by-sample
averaging of three randomly chosen NBC breathing cycles from the library. Such randomness
adds the desired variability while forming the template. A similar process is adopted for
constructing the templates for the BC-IS and BC-ES breathing cycle types. One set of NBC, BC-IS
and BC-ES templates is referred to as a template combination.
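Template construction by sample-by-sample averaging can be sketched as follows. The library layout (a dictionary from cycle type to a list of equally long normalized cycles), the function names and the toy data are assumptions made for illustration.

```python
import random


def build_template(cycles, n_pick=3, rng=None):
    """Average n_pick randomly chosen cycles sample by sample."""
    rng = rng or random.Random()
    chosen = rng.sample(cycles, n_pick)
    return [sum(column) / n_pick for column in zip(*chosen)]


def build_template_combination(library, rng=None):
    """Build one template per cycle type, i.e. one template combination."""
    rng = rng or random.Random()
    return {label: build_template(cycles, 3, rng)
            for label, cycles in library.items()}


# Toy library with very short "cycles" so the averaging is easy to follow.
toy_library = {
    "NBC": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    "BC-IS": [[0.0, 9.0], [3.0, 6.0], [6.0, 3.0]],
    "BC-ES": [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
}
combination = build_template_combination(toy_library, random.Random(1))
```

Calling `build_template_combination` repeatedly with different random states would produce the many template combinations evaluated in this section.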
Figure 4-4 shows the performance of matched filter based swallow detection method while
using a large number of template combinations as the reference signals (i.e., S(n)) for all three
breathing cycle types. For each subject, 3500 different template combinations are first created
from the library by choosing different random combinations of NBC, BC-IS and BC-ES cycles
while forming the templates. Then, each waveform in the library is classified to be one of NBC,
BC-IS or BC-ES using the matched filter-based detection. An ROC pair (True Positive Rate,
False Positive Rate) is finally computed for each of the 3500 template combinations. Figure 4-4
shows the resulting ROC distributions.
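One ROC pair can be computed from the true and predicted cycle labels as shown below. The text does not spell out the positive class, so the sketch assumes that a swallow cycle (BC-ES or BC-IS) counts as positive and an NBC as negative; the function name and the example labels are illustrative.

```python
SWALLOW_LABELS = {"BC-ES", "BC-IS"}   # assumed positive class


def roc_pair(true_labels, predicted_labels):
    """Return (true positive rate, false positive rate) for swallow detection."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        is_swallow = truth in SWALLOW_LABELS
        said_swallow = predicted in SWALLOW_LABELS
        if is_swallow and said_swallow:
            tp += 1
        elif is_swallow:
            fn += 1
        elif said_swallow:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical outcome for ten cycles: one missed swallow, one false alarm.
truth = ["NBC"] * 8 + ["BC-ES"] * 2
predicted = ["NBC"] * 7 + ["BC-ES"] + ["BC-ES", "NBC"]
tpr, fpr = roc_pair(truth, predicted)
```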

Figure 4-4: ROC Distribution for all seven subjects with arbitrary templates.
The cluster of high-value columns in Figure 4-4 indicates that even with arbitrarily chosen
template combinations, the majority of them offer high True Positive and low False Positive
rates. The spread in the distribution indicates that there exist NBC, BC-IS, and BC-ES
waveforms which, if chosen to generate templates, can indeed degrade the system performance.
We experimented with different numbers of cycles (i.e., three for the above results) for
generating the matched filter templates. It was observed that with more cycles, the spread in
ROC performance was smaller.
4.2.2 Performance for Machine Learning based Method with Time Domain Features

Classification accuracies for machine learning based methods using time domain features
are summarized in Table 4-1 on both per-subject basis and on combined data set from all seven
subjects. In the subject specific case, SVM and J48 perform better in terms of both True and
False Positive rates. In the combined case, J48 provides the best swallow detection performance.
Subject               Classifier     True Positive Rate ±std (%)    False Positive Rate ±std (%)
Individual subject    SVM            98.69±2.03                     0.14±0.16
Individual subject    J48            98.8±0.49                      0.32±0.14
Individual subject    Naïve Bayes    97.7±2.62                      2.63±2.3
Combined data set     SVM            87.6                           1.5
Combined data set     J48            97.5                           0.7
Combined data set     Naïve Bayes    85.5                           9.5

Table 4-1: Performance of classifiers using time domain features
4.2.3 Performance for Machine Learning based Method with Frequency Domain Features

Table 4-2 shows the frequency domain detection performance of the three classifiers on seven
subjects using 10-fold validation. Similar to the time-domain results, SVM and J48 provide
better True and False Positive rates for subject-specific classification. When classification
is done on combined data from all seven subjects, J48 outperforms the other two.
Subject               Classifier     True Positive Rate ±std (%)    False Positive Rate ±std (%)
Individual subject    SVM            99.29±1.25                     0.09±0.15
Individual subject    J48            98.8±0.51                      0.37±0.17
Individual subject    Naïve Bayes    95.96±3.03                     2.96±1.26
Combined data set     SVM            88.8                           2.2
Combined data set     J48            96.6                           0.8
Combined data set     Naïve Bayes    82.1                           4.8

Table 4-2: Performance of all three classifiers using frequency domain features.
Performance on both individual dataset and the combined dataset are presented in this table.
Note that in Tables 4-1 and 4-2, the standard deviation comes from the variation of true
and false positive rates across the subjects. Since for the combined data set scenario in both the
tables, the data from all seven subjects are combined into a single data set, standard deviation
does not apply to that scenario.
4.3 Discussion

4.3.1 Iterative Template Refinement

The performance of matched filter based detection method heavily depends on the
selection of reference waveforms. As shown in Figure 4-4, improper selection of reference
waveform can deteriorate the performance significantly. In this section we develop an iterative
template refinement algorithm for incrementally improving the swallow classification
performance. First, an NBC, a BC-IS, and a BC-ES waveform are chosen from the breathing
cycle library. Second, all collected breathing cycles are classified using those three waveforms as
the templates for the three matched filters. At this stage, each collected cycle is classified as NBC,
BC-IS, or BC-ES. Third, all cycles that are classified as NBC are sorted based on their similarity
score obtained from the NBC matched filter in the second step. Now, the top 50% of those NBC
cycles are sample-by-sample averaged to create the NBC template for the second iteration. The
same process is also executed for BC-IS and BC-ES to form the templates for the second
iteration. The second and third steps are iteratively repeated till the breathing cycles selected in
the third step for generating templates stabilize.
Stabilization is reached when the differences between the matched filter similarity scores
across consecutive iterations fall below a pre-defined threshold, which, in turn, dictates the
overall error performance of the mechanism. The algorithm is summarized in Algorithm 1.
Algorithm 1: Iterative template refinement algorithm
Input: Initial templates $T_{\text{NBC}}$, $T_{\text{BC-IS}}$, and $T_{\text{BC-ES}}$
while (templates have not converged)
    for (all collected breathing cycles $x_i$)
        Compute similarity scores $\mu_i^{\text{NBC}}$, $\mu_i^{\text{BC-ES}}$, $\mu_i^{\text{BC-IS}}$ for $x_i$
        if ($\mu_i^{\text{NBC}} = \max(\mu_i^{\text{NBC}}, \mu_i^{\text{BC-ES}}, \mu_i^{\text{BC-IS}})$) then
            $x_i$ is NBC;
        if ($\mu_i^{\text{BC-ES}} = \max(\mu_i^{\text{NBC}}, \mu_i^{\text{BC-ES}}, \mu_i^{\text{BC-IS}})$) then
            $x_i$ is BC-ES;
        if ($\mu_i^{\text{BC-IS}} = \max(\mu_i^{\text{NBC}}, \mu_i^{\text{BC-ES}}, \mu_i^{\text{BC-IS}})$) then
            $x_i$ is BC-IS;
    Generate a new set of templates as:
        $T_{\text{NBC}}$ = average (detected NBCs with top 50% $\mu_i^{\text{NBC}}$)
        $T_{\text{BC-ES}}$ = average (detected BC-ESs with top 50% $\mu_i^{\text{BC-ES}}$)
        $T_{\text{BC-IS}}$ = average (detected BC-ISs with top 50% $\mu_i^{\text{BC-IS}}$)
return
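The loop in Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity score is taken to be the Pearson correlation between a cycle and a template (the matched filter score used in this chapter may be defined differently), and the names `similarity`, `refine_templates`, `tol`, and `max_iter` are hypothetical. Stabilization is checked by comparing each cycle's best similarity score across consecutive iterations against a threshold.

```python
import math

def similarity(x, t):
    """Assumed similarity score: Pearson correlation between a cycle and a template."""
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    num = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0

def refine_templates(cycles, templates, tol=1e-3, max_iter=50):
    """cycles: equal-length waveforms; templates: dict mapping class label -> waveform."""
    templates = dict(templates)
    prev_best = None
    labels = []
    for _ in range(max_iter):
        # Classify every cycle by the template with the highest similarity score.
        scores = [{k: similarity(x, t) for k, t in templates.items()} for x in cycles]
        labels = [max(s, key=s.get) for s in scores]
        # For each class, average the top 50% best-matching cycles sample by sample.
        for k in templates:
            members = sorted((i for i, lab in enumerate(labels) if lab == k),
                             key=lambda i: scores[i][k], reverse=True)
            top = members[:max(1, len(members) // 2)]
            if top:
                templates[k] = [sum(cycles[i][j] for i in top) / len(top)
                                for j in range(len(cycles[0]))]
        # Stabilization: best scores change by less than tol between iterations.
        best = [s[lab] for s, lab in zip(scores, labels)]
        if prev_best is not None and max(abs(a - b) for a, b in zip(best, prev_best)) < tol:
            break
        prev_best = best
    return templates, labels

# Synthetic two-class illustration (not data from the experiments).
ramp = [0, 1, 2, 3, 4, 5, 6, 7]
tri = [0, 2, 4, 6, 6, 4, 2, 0]
cycles = [[v + 0.1 * ((i + j) % 3) for j, v in enumerate(ramp)] for i in range(4)] \
       + [[v + 0.1 * ((i * j) % 2) for j, v in enumerate(tri)] for i in range(4)]
final_templates, final_labels = refine_templates(
    cycles, {"NBC": cycles[0], "BC-ES": cycles[4]})
```

The sketch uses two classes for brevity; adding a third template under the key "BC-IS" requires no code change.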
The key concept here is that even when the initial matched filter template quality is poor,
by choosing the top 50% of the cycles, the algorithm is able to iteratively refine the template
quality, thus delivering good final detection performance. Figure 4-5 depicts the algorithm
dynamics in the form of the similarity score state space at the start and at stabilization of
template refinement. The top graph shows the location of all the collected breathing cycles in the
similarity score space obtained from the starting template waveforms. The graph has 247 points,
corresponding to 247 collected breathing cycles. The bottom graph corresponds to similarity
score space obtained from the template waveforms when the algorithm stabilizes.
Observe that the overlapping among the three types of breathing cycles is much less in the
bottom graph compared to the top one. This indicates a clear improvement of the matched filter
template quality, leading to improved separation of different classified cycle types. The tighter
clustering of the points in the bottom graph provides additional indication of better template
quality compared to the starting set. The patterns in Figure 4-5 have been consistently observed
for a wide range of initial template quality applied to the data from all seven subjects.
Figure 4-6 shows the representative performance of iterative template refinement for a
specific subject (i.e., subject-2). The evolution of true and false positive rates is reported for
three different starting template sets, termed Good Starting Point (GSP), Moderate Starting
Point (MSP), and Poor Starting Point (PSP). GSP represents the NBC, BC-IS, and BC-ES
combination in the breathing cycle library that provides the highest true positive rate and the
lowest false positive rate as evaluated in Figure 4-4. PSP, on the other hand, represents the
combination in the library that provides the lowest true positive rate and the highest false
positive rate. Finally, MSP is chosen to be a combination for which the true positive and false
positive rates are somewhere in between.

[Figure 4-5 plots: two 3-D scatter plots of the similarity score space with axes µ NBC, µ BC-IS, and µ BC-ES. Top panel: "Algorithm Start"; bottom panel: "Algorithm End". Legend: + NBC, ○ BC-IS, * BC-ES.]
Figure 4-5: Similarity score space for: a) initial matched filter template used as a starting
point, and b) the final template obtained at stabilization of the iterative algorithm. The tighter
clustering of the points in the bottom graph indicates iterative improvement of the template
quality.
Observe that the true positive rate for PSP consistently improves with iterations. For MSP,
such rates either improve or remain constant. With GSP, true positive rates go down slightly,
although the decrement is always observed to be much less than the improvements observed for
PSP, thus establishing the effectiveness of the approach.
[Figure 4-6 plots, Subject-2: (a) True Positive Rate versus Iteration Count and (b) False Positive Rate versus Iteration Count, each with curves for GSP, MSP, and PSP.]

Figure 4-6: Iterative template refinement performance; a) true positive rate, and b) false
positive rate evolution with iterations.
Note that for a few PSPs with highly deformed BC-ES or BC-IS waveforms, the false
positive rates temporarily go up with iterations before they settle down to lower values. This
explains the temporary increase in the false positive rate in Figure 4-6:b. For the majority of the
PSPs, however, the false positive rate remains acceptably low. Results with waveforms from
other subjects demonstrated very similar performance patterns.
4.3.2 Discrimination Power of Time Domain Features

The results of discrimination power analysis of the time-domain features (i.e., all 128
sample points of a breathing cycle) are shown in Figure 4-7. Figure 4-7:a depicts the overall
importance of each feature in swallow classification in terms of merit. The merit here refers to
information gain [79], which is defined as the reduction in class entropy (i.e., $H(\cdot)$) with
additional information provided by the feature about the target classes. Assuming A is the
feature, and C is the set of classes, the following two equations indicate the class entropy before
and after providing the feature:
$$H(C) = -\sum_{c \in C} p(c) \log_2 p(c)$$

$$H(C \mid A) = -\sum_{a \in A} p(a) \sum_{c \in C} p(c \mid a) \log_2 p(c \mid a)$$

[Figure 4-7 plots: (a) Merit versus feature index (1 to 128), with peak regions A, B, and C marked; (b) Normalized ADC readings versus feature index for NBC, BC-IS, and BC-ES, with regions A, B, and C marked.]

Figure 4-7: Utility of the time domain features for Subject-1; three peaks in the left figure
are caused by different types of breathing cycles with feature distribution shown in the right
figure.
A feature with higher merit indicates lower class entropy when this feature is adopted. It
also indicates higher utility of the feature, which can be used as guidance when feature
reduction is needed in the presence of limited computational and storage resources.
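The merit computation, i.e., the difference H(C) − H(C | A) between the two entropies defined above, can be illustrated with a short Python sketch. It assumes that the feature values have already been discretized into bins; how the 128 sample points were binned is not stated here, so the bin labels "lo" and "hi" and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy H(C) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Merit of a discretized feature A: H(C) - H(C | A)."""
    n = len(labels)
    conditional = 0.0
    for a in set(feature_values):
        subset = [c for v, c in zip(feature_values, labels) if v == a]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Toy illustration: one feature separates the classes, the other does not.
classes = ["NBC", "NBC", "BC-ES", "BC-ES"]
gain_informative = information_gain(["lo", "lo", "hi", "hi"], classes)
gain_useless = information_gain(["lo", "hi", "lo", "hi"], classes)
```

In this toy case the informative feature removes all class uncertainty (a gain of one bit), while the uninformative feature has zero merit.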
There are three distinct utility peaks in Figure 4-7:a, which can be explained using the
breathing cycles shown in Figure 4-7:b. The sample points in peak region A are instrumental in
distinguishing NBC from BC-IS and BC-ES, and those in region B help distinguish BC-IS.
Finally, the sample points in region C distinguish all three breathing cycle types. The implication
of these results is that if a feature reduction is needed, unimportant samples can be eliminated
from the areas not in the vicinity of the peaks in Figure 4-7:a. While the results in Figure 4-7:a
are for subject-1, we have observed very similar patterns of discrimination power for all seven
subjects.
4.3.3 Discrimination Power of Frequency Domain Features

Figure 4-8:a depicts the overall importance of the spectral power at each frequency. The
power in the frequency range 0 (i.e., DC) to approximately 3Hz contains the most information
for differentiating the three target breathing cycle types. Figure 4-8:b reports the ROC graph with
both time and frequency domain features, when only up to 5 features are allowed. The feature
sets are selected using the subset evaluation method [79] as follows.
[Figure 4-8 plots: (a) Merit versus Frequency (Hz), 0 to 16 Hz; (b) True Positive Rate versus False Positive Rate, with legend entries Time, Raw, and FFT, and points labeled by the number of features used (1 to 5).]

Figure 4-8: (a) utility of frequency domain features, (b) comparison between time and
frequency domain features; results are presented with limited number of features that are chosen
using a method as described.
The first feature is selected using the method illustrated in Section 4.3.2 which ensures the
largest possible reduction in class entropy. The resulting first features are the 10th sample point
(out of 128) in time domain, and the DC spectral power in frequency domain. These first features
in time and frequency domains can be observed in Figure 4-7:a and Figure 4-8:a respectively.
Using the same procedure as above, the remaining four features are added iteratively while
maximizing the reduction in class entropy [79].
With more features, the difference in detection performance between the time and
frequency domain approaches is negligible. When only one feature is used, however, the
frequency domain approach outperforms the time domain approach for the following reason.
Figure 4-8:a demonstrates that the DC component has the highest discriminative power, which
can be expressed as:

$$X_{k=0} = \sum_{n=1}^{N-1} x_n \cdot e^{-i 2\pi \frac{k}{N} n} \Big|_{k=0} = \sum_{n=1}^{N-1} x_n,$$

where $X_{k=0}$ represents the area under the curve of the breathing cycle waveform. For BC-ES,
since the apnea is located at the beginning of an exhale, its area under the curve is much higher than
that of NBC and BC-IS. Moreover, the majority of the swallows are found to be BC-ES, which is
why $X_{k=0}$ alone can be used to achieve considerable detection accuracy. However, no single
feature in the time domain is able to provide similar discriminative power.
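The statement that the DC coefficient equals the area under the normalized waveform can be checked numerically. The following Python sketch is an illustration only: it uses the standard 0-based DFT indexing (an assumption about the indexing convention), and the two eight-sample waveforms are hypothetical shapes, not measured cycles.

```python
import cmath

def dft_coefficient(x, k):
    """k-th DFT coefficient of sequence x, using 0-based sample indices."""
    n_total = len(x)
    return sum(x_n * cmath.exp(-2j * cmath.pi * k * n / n_total)
               for n, x_n in enumerate(x))

# Hypothetical normalized cycles (0 to 100), eight samples each.
nbc = [0, 25, 50, 75, 100, 67, 33, 0]        # rise, then immediate exhale
bc_es = [0, 50, 100, 100, 100, 100, 50, 0]   # plateau (apnea) at the start of exhale

dc_nbc = dft_coefficient(nbc, 0)
dc_bc_es = dft_coefficient(bc_es, 0)
```

Because the exponential term equals 1 for k = 0, each DC coefficient is simply the sum of the samples, and the plateau at the top of the BC-ES shape makes its DC value larger than that of the NBC shape.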
Observe that machine learning can provide higher detection accuracy compared to the
matched filters, although it requires a priori training. The matched filters, on the other hand,
can achieve acceptable performance using the iterative template refinement algorithm as
presented in Section 4.3.1.
It should be noted that both the presented mechanisms are subject-dependent and require
personalized training. The subject-dependency stems from the wide diversity of the breathing
signals across subjects and their inherently different breathing patterns. In spite of such diversity,
however, the swallow signatures were found to be detectable through appropriate algorithm
training as proposed for both the techniques.

4.3.4 Artifacts Handling

We analyzed the undulated exhalation of breathing cycles during talking using power
spectral density (PSD). Figure 4-9 shows the comparison between PSD of breathing signals
during talking (solid lines) and those during NBCs and swallows (dashed lines). The density is
computed over 4.27-second windows to facilitate a 128-point FFT at the 30Hz sampling rate
(128 points / 30Hz ≈ 4.27 seconds). Observe that the PSDs with talking contain many more variations
between 0 and 2Hz mainly because of the undulations during exhalation as illustrated in Figure
4-3. This was consistently observed across a large number of subjects and sessions.

[Figure 4-9 plots: PSD versus Frequency (Hz), 0 to 2 Hz, one panel per subject (Subject-1 through Subject-7); curves for Talking and for NBC/Swallow.]

Figure 4-9: Power spectral density (PSD) of breathing signals with talking and without
talking, when normal breathing or breathing with swallows are executed.
Using the variance of the difference between the PSD of the received breathing signal and that of the
reference NBC signal, it is possible to identify talking so that swallow detection can be paused
during talking. Talking is detected by applying a threshold to this variance of difference; the
threshold can either be set manually or be trained using the variance of difference as a
feature.
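The talking detector described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the periodogram scaling, the removal of the ADC offset before the transform, the threshold value, and the synthetic signals are all assumptions, and the function names are hypothetical. Only the 30Hz sampling rate and the 128-point window come from the text.

```python
import cmath
import math

FS = 30.0  # sampling rate in Hz, as stated in the text
N = 128    # window length in samples (128 / 30 Hz is about 4.27 s)

def psd(window, fs=FS):
    """One-sided periodogram of one window (bins 0 .. N/2)."""
    n = len(window)
    mean = sum(window) / n
    x = [v - mean for v in window]  # assumption: remove the ADC offset first
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
        spectrum.append(abs(x_k) ** 2 / (fs * n))
    return spectrum

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def is_talking(window, reference_psd, threshold):
    """Flag talking when the variance of the PSD difference exceeds the threshold."""
    diff = [p - r for p, r in zip(psd(window), reference_psd)]
    return variance(diff) > threshold

# Synthetic illustration: slow breathing, with and without a 1.5 Hz undulation.
reference = [100 * math.sin(2 * math.pi * 0.25 * i / FS) for i in range(N)]
talking = [r + 40 * math.sin(2 * math.pi * 1.5 * i / FS) for i, r in enumerate(reference)]
reference_psd = psd(reference)
HYPOTHETICAL_THRESHOLD = 1.0
```

With these synthetic signals, `is_talking(talking, reference_psd, HYPOTHETICAL_THRESHOLD)` is true and the same call on `reference` is false. In practice the threshold would be set manually or learned from labeled sessions, as noted above.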
In order to analyze the impacts of upper body movements, in the last 2 minutes of each
session, the subject shook their upper body and drank every 20 seconds to simulate changing
postures and rocking, which often occurs during food and drink intake sessions. Figure 4-10
shows the breathing signals for a subject both with and without such upper torso movements
while swallowing. Note that both signals belong to the same subject; the same behavior was
consistently observed for the other subjects.

[Figure 4-10 plots, Subject-1: ADC readings versus Time (Second), with swallows marked; (a) with artifact, 290 to 340 s; (b) without artifact, 200 to 250 s.]

Figure 4-10: Breathing signal: (a) with upper body rocking movement, and (b) without the
movement
By comparing the signals with and without artifacts, one can observe that such movement
artifacts do not introduce any noticeable changes to the breathing signal, mainly due to the fact
that such movement does not change the circumference of the chest area where the belt is placed.
Because of this very minimal impact, the swallow signatures are well preserved, and can still be
clearly discerned using the mechanisms described earlier in the chapter. Therefore, the proposed
mechanism for swallow signature detection does work even when natural upper body movements
are present during a swallow process.
4.4 Summary

This chapter reported the algorithm design for a wearable liquid intake monitoring system
using the piezoelectric chest belt illustrated in Section 3.4. A matched-filter-based template
matching framework, along with a number of template design mechanisms, both static and
iterative, was developed for swallow detection with a high true positive rate and a low false
positive rate. This chapter also presented the preliminary results for classifier-based
detection using both time and frequency domain features. Finally, talking and upper body motion
artifacts were analyzed.
Please note that this chapter only focused on the detection of liquid intake, although
swallowing apnea is also present for solid food intake. Using the features extracted from
breathing cycles, we may be able to distinguish solid and liquid intake using the proposed chest
belts.

Chapter 5: Machine Learning Based Processing Algorithms

In this chapter, we present a swallow detection algorithm based on machine learning
methods. The system works based on the key observation that during swallowing, because the
trachea is blocked, a person is not able to breathe, thus causing a temporary apnea. Using the
wearable piezoelectric chest belt introduced in Chapter 3, we detect swallows by detecting the
apneas captured by the chest belt. After the swallow sequence is recorded, a swallow
pattern analysis can potentially be used for identifying non-intake swallows, solid intake
swallows, and drinking swallows.
Compared with the algorithms proposed in Chapter 4, where only liquid intake monitoring was
tested, this chapter demonstrates the algorithm and software extension of the same concept
for monitoring both solid and liquid intakes.
5.1 Processing Methods

5.1.1 Machine Learning Algorithms

Machine learning is a branch of artificial intelligence, and it provides methods and
algorithms for building systems that can learn from the data. For example, a machine learning
system can be trained on a large number of handwritten digits to recognize them, and after
training, the system can be used to classify new handwritten digits.
A number of machine learning algorithms have been developed, such as Artificial Neural
Network (ANN), Support Vector Machine (SVM), Naïve Bayes, and Decision Tree.
Artificial Neural Network: An ANN is composed of a network of interconnected neurons
in different layers [80][81]. A typical ANN has three layers: the first layer has input neurons,
which send data via synapses to the second layer of neurons, and then via more synapses to the
third layer of output neurons. More complex systems have more layers of neurons or more
neurons in each layer. The synapses have parameters called weights that manipulate
the data in the calculation. Each neuron has a transfer function which determines its output based
on the weighted inputs. The output of a neuron can be expressed as:
$$y = F\left(\sum_{i=0}^{m} w_i x_i + b\right)$$

Where:

$x_i$ is the $i$th input value,
$w_i$ is the weight associated with the $i$th input value,
$b$ is the bias,
$F$ is the transfer function,
$y$ is the output value.

When the structure and transfer function of an ANN are predefined, by adjusting the
weights and biases in an ANN, we can train the network to produce the output we want for
specific inputs.
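The neuron output expression above maps directly to code. The sketch below is illustrative only; the logistic sigmoid is used as the transfer function F purely as an assumption, since no particular transfer function is specified here, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Assumed transfer function F (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, transfer=sigmoid):
    """y = F(sum_i w_i * x_i + b) for a single neuron."""
    return transfer(sum(w * x for w, x in zip(weights, inputs)) + bias)

y_sigmoid = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)              # weighted sum is 0
y_linear = neuron_output([1.0, 2.0], [3.0, 4.0], 1.0, transfer=lambda z: z)
```

Training adjusts `weights` and `bias` so that the outputs match the desired targets; the sketch covers only the forward computation.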
Support Vector Machine: SVM is a classifier that constructs a hyperplane in a high
dimensional space, which is able to classify the data with the largest separation, or margin
between the two classes. For data sets that are not linearly separable, SVM maps the original
space into a much higher dimensional space using a kernel function. The hyperplane is known as
the maximum-margin hyperplane, and the classifier is also called the maximum margin
classifier. The decision function can be expressed as:
$$D(\vec{d}) = \sum_{k=1}^{T} c_k a_k K(\vec{d}_k, \vec{d}) + b$$

Where:

$(\vec{d}_k, c_k),\ k = 1, 2, \ldots, T$ is the training set, $\vec{d}_k$ is the feature vector of the $k$th instance in the training set,
and $c_k$ is the class label for the training instance with feature vector $\vec{d}_k$,

$a_k$ and $b$ are parameters trained using the training data set.
In the case of classifying more than two classes, the method one-versus-all is normally
used, in which one class is distinguished against all the other classes.
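As an illustration of the decision function and of the one-versus-all scheme, the following Python sketch evaluates D(d) for already-trained parameters. The RBF kernel, its `gamma` value, the two-point support sets, and the function names are assumptions made for the example; the kernel actually used in the experiments is not specified here, and the training step that produces $a_k$ and $b$ is not shown.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Assumed kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision(d, support, b, kernel=rbf_kernel):
    """D(d) = sum_k c_k * a_k * K(d_k, d) + b; support holds (d_k, c_k, a_k) triples."""
    return sum(c_k * a_k * kernel(d_k, d) for d_k, c_k, a_k in support) + b

def one_vs_all(d, models):
    """Pick the class whose one-versus-all decision value is largest."""
    return max(models, key=lambda label: decision(d, *models[label]))

# Hypothetical trained parameters for two classes.
models = {
    "NBC":   ([([0.0, 0.0], +1, 1.0), ([2.0, 2.0], -1, 1.0)], 0.0),
    "BC-ES": ([([0.0, 0.0], -1, 1.0), ([2.0, 2.0], +1, 1.0)], 0.0),
}
near_origin = one_vs_all([0.1, 0.1], models)
near_two = one_vs_all([1.9, 2.0], models)
```

Each entry of `models` plays the role of one binary classifier that separates its class from all the others.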
Naïve Bayes: Naïve Bayes classifier is a probabilistic classifier based on Bayes’ theorem
with the assumption of strong independence among the input features. Bayes' theorem
states:

$$P(c_j \mid \vec{d}) = \frac{P(\vec{d} \mid c_j)\, P(c_j)}{P(\vec{d})}, \qquad \vec{d} \in \mathbb{R}^n,\ c_j \in \mathbb{Z}$$

Where:

$P(c_j \mid \vec{d})$ is the probability of an instance with feature vector $\vec{d}$ being in class $c_j$, also known as the posterior,

$P(\vec{d} \mid c_j)$ is the probability of generating an instance with feature vector $\vec{d}$ given class $c_j$, also known as the likelihood,

$P(c_j)$ is the probability of occurrence of class $c_j$, also known as the prior,

$P(\vec{d})$ is the probability of occurrence of an instance with feature vector $\vec{d}$, also known as the evidence.

In practice, since $P(\vec{d})$ does not depend on the class $c_j$, it is constant across classes in a classification problem and is therefore ignored. $P(\vec{d} \mid c_j)$ and $P(c_j)$ can be estimated from the training data set. Therefore,

$$\text{class of } \vec{d} = \arg\max_{c_j} P(c_j \mid \vec{d})$$
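The decision rule above can be illustrated with a small Python sketch. It assumes a Gaussian likelihood for each feature, which is one common choice and is not necessarily the model used in the experiments; the feature values and function names are hypothetical. Log-probabilities are used so that the product over independent features becomes a sum, and the evidence term is dropped because it is the same for every class.

```python
import math

def fit(X, y):
    """Estimate the prior and per-feature (mean, variance) for every class."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[c] = (len(rows) / len(X), stats)
    return model

def log_posterior(x, prior, stats):
    """log P(c) + sum_i log P(x_i | c), up to the class-independent evidence term."""
    total = math.log(prior)
    for v, (mean, var) in zip(x, stats):
        total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
    return total

def predict(model, x):
    return max(model, key=lambda c: log_posterior(x, *model[c]))

# Hypothetical two-feature training set.
X = [[1.0, 3.2], [1.2, 3.0], [0.9, 3.1], [3.0, 5.0], [3.1, 5.3], [2.9, 5.1]]
y = ["NBC"] * 3 + ["swallow"] * 3
nb_model = fit(X, y)
```

For example, `predict(nb_model, [1.1, 3.1])` returns "NBC" for this toy data set.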

Decision Tree: A decision tree is a classifier expressed as a recursive partition of the
feature space. A decision tree has three types of nodes:
(1) Root node: the root node is the entry point of a decision tree; it has no input edge and
multiple output edges,
(2) Test node: a test node has exactly one input edge and evaluates an if / else-if / else test; it
may have two or more output edges,
(3) Leaf node: a leaf node has one input edge, and it indicates the class that an instance
belongs to.
We use J48 (also known as C4.5) in this dissertation. In each test node, J48 chooses the
feature that most effectively splits the data set. The splitting criterion is the normalized
information gain, which is discussed in Chapter 6. The feature with the highest normalized
information gain is chosen to make the decision. The algorithm is then recursively applied on the
smaller subsets.
Decision trees can become undesirably complex and over-fit the training data, to the point where each training
instance takes one node. Therefore, stopping criteria are used by decision tree algorithms.
The typical stopping criteria include [82]:
• The number of cases in the node is less than a threshold
• The percentage of instances in a node that belong to a single class exceeds a predefined limit
• The depth of the tree reaches a predefined limit
5.1.2 Breathing Apnea and Swallowing Signature

[Figure 5-1 plots: ADC readings versus Time (Second). (a) Subject-1, 150 to 190 s, with a Normal Breathing Cycle (NBC), a Breathing Cycle (BC), and a Breathing Cycle with Exhale Swallow (BC-ES) and its apnea annotated; (b) Subject-2, 110 to 150 s, with a Breathing Cycle with Inhale Swallow (BC-IS) and its apnea annotated.]

Figure 5-1: Examples of Breathing Cycles (BC), Normal Breathing Cycles (NBC),
Breathing Cycles with Inhale Swallow (BC-IS), Breathing Cycles with Exhale Swallow
(BC-ES) and apnea
Figure 5-1 demonstrates two representative human breathing signal segments. The ADC
readings in the figure are directly proportional to the elongation of the piezo-electric sensing belt
shown in Chapter 3. A breathing cycle can be either normal (i.e., Normal Breathing Cycle or
NBC) or elongated due to a swallow-triggered apnea. A cycle that is elongated due to an apnea at
the beginning of an exhale (see Figure 5-1:a) is termed as Breathing Cycle with Exhale Swallow
(BC-ES). Figure 5-1:b shows swallows (i.e., apnea) during the inhale process which are termed
as Breathing Cycles with Inhale Swallow (BC-IS). During our experiments, it was also found
that BC-ES is much more prevalent than BC-IS, which is consistent with previous research
[54][55][56].

[Figure 5-2 plots, Subject-1: ADC readings versus Time (Second), 0 to 80 s; top panel "Solid" with three solid swallows marked, bottom panel "Liquid" with three liquid swallows marked.]

Figure 5-2: Example breathing signals for solid and liquid swallows
Figure 5-2 shows example breathing signals with solid and liquid swallows. As can be
seen, for solid swallows, breaths are deeper and contain more temporal fluctuations. The key
objective is to be able to classify three types of breathing cycles, namely, NBC, BC-ES, and
BC-IS, and to detect if the swallow is a solid or liquid one. The challenges stem from the fact that
there is significant variability in breathing waveforms across different: 1) subjects, 2)
measurement instances for the same subject, and most importantly, 3) the location and duration
of the apnea with respect to its breathing cycle. Among other things, this depends a great deal on
the swallowing habits and the texture of the material that is being swallowed.
5.1.3 Detection Scheme

Figure 5-3 depicts the logic for classifying breathing cycles towards swallow detection.
The raw data sampled by ADC at 100Hz is first fed into a low-pass filter for removing
quantization noise caused by the A-to-D conversion process. Because the power spectrum of the
breathing signal lies mainly below 2.5Hz, the 100Hz sampling rate is more than sufficient. The second step is to run
the filtered data stream through a peak and valley detection module in order to extract the
individual breathing cycles. The next module is used for normalizing the extracted cycles in both
time and amplitude dimensions. Each breathing cycle is normalized to be between 0 and 100
vertically, and interpolated to 128 sample points. Considering the average length of a breathing
cycle of 3.77 seconds in our experiments, the normalized sampling rate after interpolation is
mapped to 34Hz. The objective of normalization is to make sure that although different cycles
may have different time and amplitude ranges (person-to-person or cycle-to-cycle for the same
person), they can be effectively identified based on the apnea caused by swallowing.
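The normalization step can be sketched as follows. Linear interpolation is assumed here because the interpolation method is not specified, and the synthetic half-sine cycle and the function name `normalize_cycle` are hypothetical; the 0 to 100 amplitude range, the 128 output points, and the 3.77-second average cycle length come from the text.

```python
import math

def normalize_cycle(samples, n_points=128, top=100.0):
    """Scale one extracted cycle to 0..top and resample it to n_points samples."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # guard against a perfectly flat segment
    scaled = [(s - lo) / span * top for s in samples]
    out = []
    for j in range(n_points):
        pos = j * (len(scaled) - 1) / (n_points - 1)  # fractional source index
        i = int(pos)
        frac = pos - i
        nxt = scaled[min(i + 1, len(scaled) - 1)]
        out.append(scaled[i] * (1 - frac) + nxt * frac)  # assumed linear interpolation
    return out

# Synthetic cycle: 3.77 s sampled at 100 Hz (377 samples), half-sine shape.
raw = [1500 + 400 * math.sin(math.pi * i / 376) for i in range(377)]
cycle = normalize_cycle(raw)

# 128 points spread over an average 3.77 s cycle correspond to roughly 34 Hz.
effective_rate = 128 / 3.77
```

After this step every cycle has the same length and amplitude range, so the sample points can be compared position by position across cycles and subjects.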
The normalized breathing cycle waveforms are fed into a feature extraction module which
extracts time domain or frequency domain features. These extracted features are then selected
based on their discriminative power, and fed into a classifier for training or testing purposes.
The number of features affects the complexity and performance of classification. A classifier
with very few features is simple but may perform poorly. Classifiers with
a large number of features, however, are complex but do not necessarily provide superior
performance [77].

[Figure 5-3 flowchart: Raw data → Low pass filter → Peak and valley detection → Normalization → Feature extraction → Feature selection → Swallow detection, whose outputs are Normal breathing or Solid/Liquid detection, the latter yielding Solid or Liquid.]
Figure 5-3: Logic for swallow signature detection
A hierarchical classification scheme is used for solid and liquid swallow detection. The
first classifier detects whether a breathing cycle is an NBC or a breathing cycle with a swallow. When the
first classifier reports a swallow, the second classifier detects whether that swallow is solid or liquid.
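The control flow of the two-stage scheme can be sketched in a few lines. The two stage classifiers are passed in as callables; the threshold rules used below as stand-ins, the feature names, and the numeric values are purely hypothetical and are not the trained classifiers from this chapter.

```python
def hierarchical_classify(features, is_swallow, is_solid):
    """Stage 1: NBC versus swallow. Stage 2 (only for swallows): solid versus liquid."""
    if not is_swallow(features):
        return "normal breathing"
    return "solid" if is_solid(features) else "liquid"

# Hypothetical stand-ins for the trained first- and second-stage classifiers.
def stage1(features):
    return features["cycle_length"] > 4.0

def stage2(features):
    return features["crossings"] > 3

result_nbc = hierarchical_classify({"cycle_length": 3.2, "crossings": 1}, stage1, stage2)
result_solid = hierarchical_classify({"cycle_length": 6.2, "crossings": 5}, stage1, stage2)
result_liquid = hierarchical_classify({"cycle_length": 4.8, "crossings": 2}, stage1, stage2)
```

The second stage is never consulted for cycles that the first stage labels as normal breathing, which mirrors the branch structure of Figure 5-3.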
5.2 Experiments

Experiments using the piezoelectric breathing belt were carried out for swallow detection
with three subjects, including 2 male and 1 female. Each subject performed three liquid swallow
sessions and three solid swallow sessions, each session lasting for five minutes. Each subject was
asked to wear the instrumented chest-belt and sit still throughout the experiment. During the
liquid swallow session, the subject drank water from a flask with a swallow instruction given
once every 20 seconds. 20 ml of water was added to the flask for each swallow, ensuring a
swallow volume of 20 ml. Each liquid swallow session resulted in approximately 80 Normal
Breathing Cycles (NBC) and approximately 15 breathing cycles with swallows (both Breathing
Cycles with Exhale Swallow (BC-ES) and Breathing Cycles with Inhale Swallow (BC-IS)).
During the solid swallow sessions, the subject was asked to eat 6 grams of crackers each time at
a comfortable rate and to note the time of each swallow. Considering that the
cracker would be chewed and mixed with saliva, the formed bolus was roughly the same volume
as a 20 ml water swallow. The resulting swallow signals were collected over the Bluetooth
channel on a smartphone.

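Subject   | Statistic | NBC (Seconds) | Solid swallow (Seconds) | Liquid swallow (Seconds)
Subject 1 | Maximum   | 5.61          | 6.81                    | 7.56
Subject 1 | Minimum   | 2.36          | 2.91                    | 3.36
Subject 1 | Average   | 3.24          | 4.86                    | 4.79
Subject 2 | Maximum   | 5.88          | 9                       | 5.56
Subject 2 | Minimum   | 1.64          | 3.54                    | 3.57
Subject 2 | Average   | 3.44          | 6.27                    | 4.27
Subject 3 | Maximum   | 4.51          | 9.33                    | 6.64
Subject 3 | Minimum   | 1.93          | 4.26                    | 2
Subject 3 | Average   | 3.05          | 6.22                    | 4.27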

Table 5-1: Durations of different breathing cycle types
Table 5-1 summarizes the duration of different types of breathing cycles. In addition to the
spread of the cycle durations across subjects, it should be observed that the cycles with swallows
(i.e., both solid and liquid) are consistently longer than the normal breathing cycles. This is
mainly due to the short apnea introduced by the swallow events. Moreover, it can also be
observed that there is a significant difference in the lengths of solid swallows and liquid swallows,
which is mainly because of the different textures of the bolus in solid and liquid swallows.
5.3 Results and Discussion

5.3.1 Feature Extraction and Selection

As analyzed in our previous work [83], both time domain and frequency domain features
can perform well in detecting liquid swallows. The discriminative power of those feature types,
however, can be different. As shown in Figure 5-4:a, for time domain features, sample points
with indices near 16 and 90 are more important than other sample points in classification. As
shown in Figure 5-4:b, for frequency domain features, lower frequency components have more
discriminative power. It was also found that the discriminative power distribution of the frequency
domain features is more consistent across subjects, which is why the frequency-domain features are
used in this dissertation.
[Figure 5-4 plots: (a) Merit versus time domain feature index (1 to 128), with peak regions A, B, and C marked; (b) Merit versus Frequency (Hz), 0 to 16 Hz.]

Figure 5-4: Discriminative property of time and frequency domain features

The second set of classification features is derived from the first derivative of the breathing
signal. As shown in Section 5.1.2, the solid swallows generally create more
fluctuations in the breathing signal compared to the liquid swallows. To capture such
fluctuations, an additional classification feature was derived from the first derivative of the
breathing signal. More specifically, the number of ±10 crossings is used as the feature, which is
defined as the number of points in the breathing signal at which the first derivative of the signal
equals +10 or -10. Compared to the number of zero crossings, the number of ±10 crossings
not only captures the fluctuations observed in solid swallows, but also helps in detecting the
swallows in the first place. Figure 5-5 shows an example of the benefits of ±10 crossings of the first
derivative in detecting swallows. In this case, the number of zero crossings of the first derivative is
1, which is the same as for NBCs and is not sufficient for detecting the swallow, but the number of
-10 crossings of the first derivative is 2 instead of the 1 observed for an NBC, which helps to detect the
swallow.
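A level-crossing count of the first derivative can be sketched as follows. Several details are assumptions: the derivative is approximated by a first difference with a hypothetical step `dt`, a crossing is counted whenever consecutive derivative values fall on opposite sides of the level (in either direction), and the two short waveforms are synthetic. Because of these choices, the absolute counts need not match those quoted for Figure 5-5; the sketch only illustrates why a non-zero level can separate cycles that have the same number of zero crossings.

```python
def first_derivative(x, dt=1.0):
    """Finite-difference approximation of the first derivative."""
    return [(b - a) / dt for a, b in zip(x, x[1:])]

def count_level_crossings(values, level):
    """Number of times consecutive values move from one side of `level` to the other."""
    sides = [v > level for v in values]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)

# Synthetic shapes: a plain cycle, and a cycle whose exhale is interrupted by a pause.
nbc = [0, 12, 24, 12, 0]
swallow = [50, 62, 50, 48, 36]

d_nbc = first_derivative(nbc)          # [12, 12, -12, -12]
d_swallow = first_derivative(swallow)  # [12, -12, -2, -12]

zero_nbc = count_level_crossings(d_nbc, 0)
zero_swallow = count_level_crossings(d_swallow, 0)
minus10_nbc = count_level_crossings(d_nbc, -10)
minus10_swallow = count_level_crossings(d_swallow, -10)
```

Here both shapes have a single zero crossing of the derivative, yet the interrupted exhale produces more -10 crossings than the plain cycle.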
[Figure 5-5 plots: Normalized amplitude versus Normalized Time (sample index 1 to 128) for two panels, "Normal Breathing" and "Breathing with Swallow", each annotated with the zero crossing and the -10 crossing of the first derivative.]

Figure 5-5: Benefits of ±10 crossings as a classification feature
The third set of features is derived from various length distributions of the breathing
cycles. Table 5-2 summarizes all the features used in this chapter.
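Feature group             | Feature
Frequency domain features | 1st order Fourier transform coefficient
Frequency domain features | 2nd order Fourier transform coefficient
Frequency domain features | 3rd order Fourier transform coefficient
Frequency domain features | 4th order Fourier transform coefficient
Frequency domain features | 5th order Fourier transform coefficient
Features from waveform    | Number of +10 crossings in first derivative
Features from waveform    | Number of -10 crossings in first derivative
Features from waveform    | Breathing cycle length
Features from waveform    | Inhalation length
Features from waveform    | Exhalation length
Features from waveform    | Inhalation depth
Features from waveform    | Exhalation depth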
Table 5-2: Features selected for classification

5.3.2 Swallow Detection

All the above features are fed into the hierarchical classifier for solid and liquid swallow
detection. To assess generalizability, we adopt the leave-one-out method, in which
data from all subjects except one are used for training, and the data from the remaining subject
are used for testing.
Table 5-3 and Table 5-4 report the performance of the hierarchical classifier using the
leave-one-out method. As can be seen, SVM provides the best performance among all the
applied methods for both the classifier stages. For the first stage, with SVM the true positive
rate was at least 82.9% and the false positive rate at most 1.6% for all subjects. The
second stage classifier has accuracy ranging from 73.33% to 88% when SVM is applied.
Testing the system with more subjects is under way.
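The leave-one-out protocol over subjects can be sketched as below. The nearest-centroid classifier is only a stand-in used to keep the example self-contained (the chapter uses SVM, J48, and Naïve Bayes), and the single-feature data set and all function names are hypothetical.

```python
def leave_one_subject_out(data, train, predict):
    """data: dict mapping subject -> list of (features, label). Returns accuracy per subject."""
    results = {}
    for held_out in data:
        training = [row for subj, rows in data.items() if subj != held_out for row in rows]
        model = train(training)
        test_rows = data[held_out]
        correct = sum(predict(model, f) == label for f, label in test_rows)
        results[held_out] = correct / len(test_rows)
    return results

def train_centroids(samples):
    """Stand-in classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for f, label in samples:
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(model, f):
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(f, model[label])))

# Hypothetical data: one feature (cycle length in seconds) per breathing cycle.
data = {
    "subject 1": [([3.2], "NBC"), ([4.9], "swallow")],
    "subject 2": [([3.4], "NBC"), ([6.3], "swallow")],
    "subject 3": [([3.0], "NBC"), ([6.2], "swallow")],
}
accuracy = leave_one_subject_out(data, train_centroids, predict_centroid)
```

Each subject's cycles are classified by a model that never saw that subject during training, which is what makes the reported rates an indication of cross-subject generalization.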

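Subject   | Classifier  | True positive rate (%) | False positive rate (%)
Subject 1 | SVM         | 82.9                   | 1.6
Subject 1 | J48         | 76                     | 2.4
Subject 1 | Naïve Bayes | 100                    | 1.2
Subject 2 | SVM         | 84                     | 0
Subject 2 | J48         | 88.6                   | 4.9
Subject 2 | Naïve Bayes | 97.1                   | 4.1
Subject 3 | SVM         | 86.7                   | 0
Subject 3 | J48         | 83.3                   | 8.6
Subject 3 | Naïve Bayes | 93.3                   | 8.6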

Table 5-3: Performance of the first stage of the hierarchical classifier

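Subject   | Classifier  | Accuracy (%)
Subject 1 | SVM         | 82.86
Subject 1 | J48         | 80
Subject 1 | Naïve Bayes | 76
Subject 2 | SVM         | 88
Subject 2 | J48         | 80
Subject 2 | Naïve Bayes | 68.6
Subject 3 | SVM         | 73.33
Subject 3 | J48         | 70
Subject 3 | Naïve Bayes | 70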

Table 5-4: Performance of the second stage of the hierarchical classifier

5.4 Conclusion

This chapter reported the algorithm and performance of the machine learning based food
and drink intake detection system. It presented the machine learning based swallow detection
method using a hierarchical classification scheme.
During the experiment and analysis it was found that food intake swallows have very
regular temporal patterns during a lunch session. Such temporal information can therefore be used for
improving the system detection accuracy, which is analyzed in the next chapter.

Chapter 6: Support Vector Machine and Hidden Markov Model based Processing Algorithms

This chapter presents a wearable solid food intake monitoring system that analyzes human
breathing signal and swallow sequence locality. Food intake is identified by detecting a
person’s swallow events, each of which causes a brief apnea in the breathing signal. A Support Vector Machine (SVM)
is first used for detecting such apneas in breathing signals collected from a wearable chest-belt.
The resulting swallow detection is then refined using a Hidden Markov Model (HMM) based
mechanism that leverages known locality in the sequence of human swallows. The chapter
experimentally demonstrates the effectiveness of such two-stage SVM-HMM based mechanism
for solid food intake detection via analyzing breathing signal and human swallow sequence
locality.
In our previous work [83][84], we reported the effectiveness of such a system for
monitoring liquid-only intake using swallow signal analysis. As an extension of that work, this
chapter focuses on solid-only intake detection using the two-stage SVM-HMM processing strategy
outlined above: after the breathing signal is recorded, the SVM detects swallow-induced apneas,
and the HMM then refines the SVM output using swallow sequence locality. In a future
publication we plan to report processing mechanisms and their effectiveness for joint
liquid-solid intake monitoring.
The contributions of this chapter are: 1) combining SVM and HMM methods for
processing breathing signals for solid food intake detection, and 2) experimentally demonstrating
the detection accuracy and effectiveness of the proposed system and the signal processing
methods.
6.1 Processing Methods
[Figure 6-1 panels: (a) Subject-1, Session-1, showing a Normal Breathing Cycle (NBC) and
Breathing Cycles with Exhale Swallow (BC-ES); (b) Subject-2, Session-1, showing Breathing
Cycles with Inhale Swallow (BC-IS). Axes: ADC readings versus Time (Second). Apneas are marked
in both panels.]
Figure 6-1: Respiratory signal with swallow signature
The piezoelectric belt based breathing signal collection system proposed in Chapter 3 is
used in this chapter. Figure 6-1 shows a number of experimentally obtained breathing
signal segments from different human subjects. The ADC readings in the figure are directly
proportional to the elongation and contraction of the piezoelectric sensing belt. The rising edges
correspond to inhalations and the falling edges correspond to exhalations. As shown in the
figure, a breathing cycle can be either normal (i.e., Normal Breathing Cycle or NBC) or elongated
due to a swallow-triggered apnea. A cycle that is elongated due to an apnea at the beginning of an
exhale (see the top panel in Figure 6-1 for Subject-1, Session-1) is termed a Breathing Cycle
with Exhale Swallow (BC-ES). For a second subject, the bottom panel in Figure 6-1 shows
swallows (i.e., apneas) during the inhale process; such cycles are termed Breathing Cycles with
Inhale Swallow (BC-IS).

[Figure 6-2 block diagram: ADC readings, Low Pass Filter, Breathing Cycle Extractor,
Normalization, Feature extraction, SVM detection (Posterior Probability), HMM (Improved
Detection using Swallow Sequence Locality)]

Figure 6-2: Processing scheme for swallow detection
Figure 6-2 depicts the logic for classifying breathing cycles towards swallow detection.
Before the data is sent to the ADC, an analog anti-aliasing low-pass filter circuit with a cutoff
frequency of 30 Hz is applied. The signal is then sampled by the ADC at 100 Hz and fed into a
software-based low-pass filter that removes the quantization noise introduced by the A-D
conversion. Because the power spectrum of the breathing signal lies mainly below 2.5 Hz, 100 Hz
is a sufficiently fast sampling rate. The next step is to run the filtered data stream through a
peak and valley detection software module in order to extract the individual breathing cycles.
For peak and valley detection, the data stream is first divided into 10-second windows with 30%
overlap, and then a threshold-based algorithm from [85] is used. The threshold is set to

\[ 0.3 \left( \max_{d(m) \in C} d(m) - \min_{d(n) \in C} d(n) \right), \]

where $C$ is the set of all data points in the 10-second window, and $d(m)$ and $d(n)$ are the
$m$-th and $n$-th sample points in that window.
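The windowing and threshold computation described above can be sketched as follows. This is only an illustration: the function names and the synthetic sine-wave signal are assumptions, and the peak and valley search of [85] itself is not reproduced because the text does not describe it.

```python
import math

def overlapping_windows(samples, fs_hz=100, window_s=10.0, overlap=0.3):
    """Split a sample list into fixed-length windows with fractional overlap."""
    size = int(round(window_s * fs_hz))         # 1000 samples per 10 s window at 100 Hz
    step = int(round(size * (1.0 - overlap)))   # 700-sample hop gives 30% overlap
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

def window_threshold(window, fraction=0.3):
    """Threshold = 0.3 * (max - min) over all samples in the window."""
    return fraction * (max(window) - min(window))

# Synthetic stand-in for ADC readings: a 0.25 Hz "breathing" sine, 30 s at 100 Hz.
signal = [275 + 75 * math.sin(2 * math.pi * 0.25 * n / 100) for n in range(3000)]
windows = overlapping_windows(signal)
thresholds = [window_threshold(w) for w in windows]
```

With a peak-to-peak swing of 150 ADC counts, each window yields a threshold of about 45 counts.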
After individual breathing cycles are extracted, they are normalized in both the time and
amplitude dimensions. Each cycle is normalized to be between 0 and 100 vertically, and
interpolated to 128 sample points in time. Given the average breathing cycle length of
3.77 seconds in our experiments, the normalized sampling rate after interpolation is
approximately 34 Hz (128 / 3.77). Although different cycles may originally have different time
and amplitude ranges (person-to-person or cycle-to-cycle for the same person), the normalization
process removes such variance in duration and amplitude, thus making the cycles more suitable
for the apnea detection process.
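A minimal sketch of this normalization step is given below. Linear interpolation is an assumption, since the text does not name the interpolation method, and `normalize_cycle` is a hypothetical helper name.

```python
def normalize_cycle(cycle, n_points=128, lo=0.0, hi=100.0):
    """Rescale one breathing cycle to [lo, hi] and resample it to n_points samples."""
    if len(cycle) < 2:
        raise ValueError("a cycle needs at least two samples")
    c_min, c_max = min(cycle), max(cycle)
    span = c_max - c_min
    if span == 0:
        raise ValueError("flat cycle cannot be amplitude-normalized")
    scaled = [lo + (hi - lo) * (v - c_min) / span for v in cycle]
    last = len(scaled) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into the original cycle
        i = min(int(pos), last - 1)       # left neighbour, clamped at the end
        frac = pos - i
        out.append(scaled[i] * (1.0 - frac) + scaled[i + 1] * frac)
    return out

normalized = normalize_cycle([200, 300, 400, 300, 200])
```

After this step every cycle has the same length and amplitude range, regardless of its original duration or depth.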
The feature extraction module takes breathing cycles before and after normalization and
extracts the following features: 1) breathing cycle length, 2) inhalation duration and depth, and
3) exhalation duration and depth, all from the cycles before normalization; and 4) ±10 crossing
counts and 5) the first 5 Fast Fourier Transform (FFT) coefficients, both from the normalized
cycles. The features are described in detail in Section 6.2.3. The features are then fed
into the Support Vector Machine (SVM) detection module with posterior probability outputs,
which is described in more detail in Section 6.1.2. At this stage, a posterior probability
indicates the SVM-detected probability that a given breathing cycle is of type normal
breathing or breathing with swallow. Information about swallow sequence locality is not utilized
at this stage. Finally, the Hidden Markov Model (HMM) is applied to the posterior probability
outputs of the SVM module to improve the detection performance by leveraging a-priori
knowledge about swallow sequence locality. The HMM modeling is presented in Section 6.1.3.
6.1.1 Two-tier Swallow Detection

In our previous work [83][84], the Support Vector Machine (SVM) was shown to be the best
classifier for liquid swallow detection. As in the traditional usage of SVM [77], the
classification output for each breathing cycle was a class label, either normal breathing or
breathing with swallow. After analyzing the classification errors in [83][84], it was realized
that many of those errors can be corrected by applying known locality information in human
swallow sequences. For example, people rarely swallow in many consecutive breathing cycles.
Thus, whenever the classification output shows many consecutive breathing cycles with swallows,
errors can be suspected, and the misclassified instances can be identified and removed by
applying higher-level techniques such as the Hidden Markov Model (HMM). This motivates the
two-tier detection using SVM and HMM presented in the next subsections.
6.1.2 SVM-based Swallow Detection with Posterior Probability

Consider the following training set of size $T$:

\[ (x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_T, y_T) \]

In each training instance $(x_i, y_i)$, $x_i \in R^n$ represents a set of $n$ input features, and
$y_i$ is the corresponding class label. For the binary class system in our case, $y_i$ can be
defined as

\[ y_i = \begin{cases} 1 & \text{if } x_i \in \text{Breathing cycle with swallows} \\ -1 & \text{if } x_i \in \text{Normal breathing} \end{cases} \qquad i = 1, 2, \ldots, T \]

A traditional SVM decision function can be derived as [86]:

\[ D(x) = \sum_{k=1}^{T} y_k a_k K(x_k, x) + b \qquad (1) \]

where $a_k$ and $b$ are trained using the training dataset, $T$ is the number of training
instances, and $K(x_k, x)$ is the kernel function of the SVM. Classification of a test feature
set $x_j$ using the decision function is as follows:

\[ \begin{cases} x_j \in \text{Breathing cycle with swallows}, & \text{if } D(x_j) > 0 \\ x_j \in \text{Normal breathing}, & \text{otherwise} \end{cases} \]
The distance between $x_j$ and the maximum-margin decision boundary (i.e., the boundary that
separates breathing cycles with swallows from normal breathing) can be expressed as $D(x_j)/C$
[86], where $C$ is a positive constant depending on $a_k$ ($k = 1, 2, \ldots, T$), the training
feature set $x_k$ ($k = 1, 2, \ldots, T$), and the kernel function. Therefore, the magnitude of
$D(x_j)$ is positively correlated with the confidence of correct detection: the closer a point is
to the decision boundary, the lower the confidence in correct classification.
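As an illustration of Equation (1) and the sign-based classification rule, the sketch below evaluates a kernel decision function on a toy model. The RBF kernel, its `gamma` value, the two support vectors, and the coefficients are all assumptions chosen for readability; the text does not state which kernel or parameters were used.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel (assumed; the kernel is not specified in the text)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision_value(x, train_x, train_y, alphas, b, kernel=rbf_kernel):
    """D(x) = sum_k y_k * a_k * K(x_k, x) + b, as in Equation (1)."""
    return sum(y_k * a_k * kernel(x_k, x)
               for x_k, y_k, a_k in zip(train_x, train_y, alphas)) + b

def classify(x, train_x, train_y, alphas, b):
    """+1: breathing cycle with swallow if D(x) > 0; -1: normal breathing otherwise."""
    return 1 if decision_value(x, train_x, train_y, alphas, b) > 0 else -1

# Toy model with one training point per class (hypothetical values).
train_x = [(0.0, 0.0), (2.0, 2.0)]
train_y = [-1, 1]
alphas = [1.0, 1.0]
bias = 0.0

label_near_swallow = classify((2.0, 2.0), train_x, train_y, alphas, bias)
label_near_normal = classify((0.0, 0.0), train_x, train_y, alphas, bias)
```

Points far from the boundary produce decision values of larger magnitude, which is the property the posterior probability mapping relies on.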
In order for the HMM to be able to process the SVM output, the latter needs to be in the
form of posterior probability [32][86], as opposed to the class labels used by
traditional SVM models [77] as described above. An appropriately designed SVM [86] can
indicate the probability that a given input feature set corresponds to a specific class. This
probability is referred to as the Posterior Probability for that class. In what follows we describe
the mechanism for computing such probabilities, which are the input for the swallow sequence
based Hidden Markov Model processing presented in Section 6.1.4.
The posterior probability for class $i$ is formally defined as
$prob(\text{class}_i \mid \text{input features}) = prob(\pm 1 \mid x_i)$. This indicates the
probability that a given input feature set $x_i$ corresponds to a breathing cycle with swallow or
a normal breathing cycle. It follows that $prob(1 \mid x_i) + prob(-1 \mid x_i) = 1$. We use the
following method for computing the posterior probability from the SVM decision function $D(x)$,
as proposed by Wahba in [87]:

\[ prob(\text{class}_i \mid \text{input features}) = prob(y = 1 \mid x) = \frac{1}{1 + \exp(A \cdot D(x) + B)} \qquad (2) \]

where $A$ and $B$ are constants estimated by minimizing the negative log likelihood of the
training data set using regression methods.
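One way to estimate $A$ and $B$ in Equation (2) is sketched below. Plain gradient descent on the negative log likelihood is an assumption standing in for the unspecified "regression methods", and the decision values and labels are made up for illustration.

```python
import math

def posterior(d, A, B):
    """prob(y = 1 | x) = 1 / (1 + exp(A * D(x) + B)), as in Equation (2)."""
    return 1.0 / (1.0 + math.exp(A * d + B))

def fit_sigmoid(decision_values, labels, lr=0.05, epochs=2000):
    """Fit A and B by gradient descent on the negative log likelihood (assumed optimizer)."""
    A, B = 0.0, 0.0
    n = len(decision_values)
    for _ in range(epochs):
        grad_A = grad_B = 0.0
        for d, y in zip(decision_values, labels):
            t = 1.0 if y == 1 else 0.0      # target probability for this instance
            p = posterior(d, A, B)
            grad_A += (t - p) * d           # d(NLL)/dA for this parametrization
            grad_B += (t - p)               # d(NLL)/dB
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B

# Hypothetical SVM decision values with slightly overlapping classes.
decision_values = [-2.0, -1.5, -1.0, -0.6, 0.5, -0.3, 0.4, 1.0, 1.8, 2.2]
labels = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
A_fit, B_fit = fit_sigmoid(decision_values, labels)
```

With this parametrization a well-fitted $A$ is negative, so that larger decision values map to posterior probabilities closer to 1.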
6.1.3 Hidden Markov Model with Swallow Sequence Locality

The key concept of HMM in swallow detection is as follows. A sequence of breathing
cycles is represented by a discrete-time Markov Chain consisting of two states (i.e., normal
breathing cycles and breathing cycles with swallows) that are hidden from an observer, meaning
that an observer cannot directly determine which state the system is in at any given point in time.
However, the posterior probability output by the SVM, which indicates the likelihood of the
system being in each state, is visible to the observer. The idea of the HMM formulation is that
if the locality in swallow sequence dynamics and the mapping between the system’s state and the
posterior probability observation are known (or measurable), then the current state in the
Markov Chain can be estimated by observing the posterior probability output by the SVM.
Hidden State Space: As shown in Figure 6-3:a, a breathing cycle sequence can be modeled
as a hidden state machine with two hidden states, namely, Normal Breathing and Breathing
Cycle with Swallows. The states are hidden because they are not deterministically known from
the posterior probabilities computed by the SVM processing.

[Figure 6-3 diagram: (a) two-state machine with states i = Normal Breathing and j = Breathing
Cycle with Swallows, transition probabilities a11, a12, a21, a22, and observations Oi and Oj;
(b) Features, SVM Module, sequence of posterior probability observations (O), HMM Processing
Module, sequence of estimated states (qt, t = 1, 2, ..., T). The HMM Processing Module takes as
inputs the Transition Probability Matrix (A) [swallow sequence locality], the Observation
Matrix (B) [observation to hidden states mapping], and the Initial Probability Array (π).]
Figure 6-3: (a) Hidden breathing state machine and (b) HMM processing components
Transition Probability Matrix: It is defined as $A = \{a_{ij}\}$, where $a_{ij}$ represents the
probability of transitioning from state $S_i$ to state $S_j$:

\[ a_{ij} = prob(q_k = S_j \mid q_{k-1} = S_i) \]

It is assumed that $q_k$ depends only on $q_{k-1}$, which means

\[ prob(q_k \mid q_{k-1}) = prob(q_k \mid q_{k-1}, q_{k-2}, \ldots, q_1) \]

$A$ is a 2×2 matrix for the two breathing cycle types in our case. The transition probability
matrix is constructed from the true swallow sequence recorded by a video camera and push
button. The probabilities in this matrix represent the swallow sequence locality information
which is leveraged by the HMM processing.
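A minimal sketch of how such a matrix could be estimated from an annotated state sequence is shown below. Counting consecutive state pairs and normalizing each row is an assumption about the construction, and the example sequence is hypothetical (0 = normal breathing, 1 = breathing cycle with swallow).

```python
def estimate_transition_matrix(states, n_states=2):
    """Estimate a_ij = prob(q_k = S_j | q_(k-1) = S_i) by counting consecutive pairs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row if a state never appears as a predecessor.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix

# Hypothetical ground-truth sequence of breathing cycle types.
true_states = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
A_est = estimate_transition_matrix(true_states)
```

Each row of the estimated matrix sums to 1, and a small value in the swallow-to-swallow entry encodes the observation that consecutive swallow cycles are rare.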
Observation Matrix: Although the states are considered hidden, the SVM-computed
posterior probability at each state can be considered as an observable parameter for HMM
modeling purposes. For a given state-i, the posterior probability $prob(y_i = 1 \mid x_i)$,
generated by the SVM detector, is utilized for constructing an observation bitmap $O_i$ in the
following manner.
The probability range [0, 1] is divided into $N$ equal windows (we used $N = 10$), and each
window is represented as a bit in the $N$-bit long bitmap $O_i$. The bit corresponding to the
window in which the posterior probability $prob(y_i = 1 \mid x_i)$ falls is set to 1, and all
other bits in $O_i$ are set to 0. For example, with $N = 10$ and
$prob(y_i = 1 \mid x_i) = 0.71$, the observation bitmap for state-i will be
$O_i = \{0,0,0,0,0,0,0,1,0,0\}$.
Now let $b_{jm}$ be the probability that the $m$-th bit of the observation bitmap is 1 (i.e., all
other bits are 0s) given that the system is in hidden state $j$. Formally stated:

\[ b_{jm} = prob(O = \{bit_1 = 0, \ldots, bit_m = 1, \ldots\} \mid State = S_j) \]

An observation matrix of size M×N (M: number of states, N: number of bits in the
observation bitmap) is constructed as $B = \{b_{jm}\}$. In this case, a 2×10 matrix is
constructed by combining the true swallow events recorded by a video camera and push button with
the SVM outputs $prob(y_i = 1 \mid x_i)$ obtained from the chest belt sensor data. This
observation matrix, together with the transition probabilities and the following initial
probability array, is used for HMM processing as described in Section 6.1.4.
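The bitmap construction and the estimation of $B$ can be sketched as follows. The helper names, the handling of values that fall exactly on a window boundary, and the example data are assumptions; the text only specifies the N = 10 equal windows and the use of ground-truth annotations.

```python
def observation_bin(p, n_bins=10):
    """Index (0-based) of the window of [0, 1] that contains posterior p."""
    return min(int(p * n_bins), n_bins - 1)   # p = 1.0 is placed in the last window

def observation_bitmap(p, n_bins=10):
    """N-bit bitmap with a single 1 at the window containing p."""
    bits = [0] * n_bins
    bits[observation_bin(p, n_bins)] = 1
    return bits

def estimate_observation_matrix(true_states, posteriors, n_states=2, n_bins=10):
    """Estimate b_jm = prob(bit m is set | state j) from annotated cycles."""
    counts = [[0] * n_bins for _ in range(n_states)]
    for state, p in zip(true_states, posteriors):
        counts[state][observation_bin(p, n_bins)] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 1.0 / n_bins for c in row])
    return matrix

# Hypothetical annotated cycles: 0 = normal breathing, 1 = cycle with swallow.
states = [0, 0, 0, 0, 1, 1]
svm_posteriors = [0.05, 0.02, 0.08, 0.95, 0.91, 0.15]
B_est = estimate_observation_matrix(states, svm_posteriors)
bitmap = observation_bitmap(0.71)
```

The call `observation_bitmap(0.71)` reproduces the example in the text, with the eighth bit set.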
Initial Probability Array: The initial probability array is represented by a vector
$\pi = [\pi_i]$ of length $M$ (i.e., 2), in which

\[ \pi_i = prob(q_1 = S_i), \qquad 1 \le i \le M \]

$\pi_i$ indicates the probability that the initial state of the hidden state machine is $S_i$.
Therefore, by definition

\[ \sum_{i=1}^{M} \pi_i = 1 \]

This array is formed using true swallow data gathered by the experimental system as
described in Chapter 3. The swallow system as modeled by HMM can be expressed as a tuple
$\lambda = (A, B, \pi)$, where $A$, $B$, and $\pi$ represent the hidden state transition matrix
(i.e., known swallow sequence locality), the observation matrix, and the initial condition of the
state machine, respectively.

6.1.4 HMM Processing

As shown in Figure 6-3:b, the processing module is fed by the HMM model
$\lambda = (A, B, \pi)$ and the posterior probability observation sequence, and its outcome is an
estimate of the current system state $\hat{q}_t$.
The probability of observing a sequence for a given model, $prob(O \mid \lambda)$, can be
expressed as [88]:

\[ prob(O \mid \lambda) = \sum_{Q} prob(O \mid Q, \lambda) \, prob(Q \mid \lambda) \qquad (3) \]

where $O = \{O_1 O_2 O_3 \ldots O_T\}$ is a sequence of observations and
$Q = \{q_1 q_2 q_3 \ldots q_T\}$ is a sequence of states, both of length $T$. Note that $q_i$ is
the $i$-th state in the state sequence $Q$, and it can represent any state $S_j$,
$1 \le j \le M$ (i.e., $M = 2$). $prob(O \mid Q, \lambda)$ is the probability of having the
observation sequence $O$ given the state sequence $Q$ and the model $\lambda$.
$prob(Q \mid \lambda)$ is the probability of having the state sequence $Q$ for the model
$\lambda$. $prob(O \mid Q, \lambda)$ and $prob(Q \mid \lambda)$ can be expressed as:

\[ prob(O \mid Q, \lambda) = \prod_{i=1}^{T} prob(O_i \mid q_i, \lambda) \qquad (4) \]

\[ prob(Q \mid \lambda) = prob(q_1) \, prob(q_2 \mid q_1) \cdots prob(q_T \mid q_{T-1}) \qquad (5) \]

where $prob(O_i \mid q_i, \lambda)$ is the probability of observing $O_i$ in state $q_i$, which
corresponds to an element in the observation matrix $B$; $prob(q_1)$ is the initial probability
of state $q_1$, corresponding to the relevant element in the initial probability array $\pi$;
and $prob(q_{i+1} \mid q_i)$ is the probability of transitioning from state $q_i$ to state
$q_{i+1}$, which corresponds to one element in the transition probability matrix $A$.
By substituting $prob(O \mid Q, \lambda)$ and $prob(Q \mid \lambda)$ in Equation (3) using
Equations (4) and (5),

\[ prob(O \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} prob(q_1) \, prob(O_1 \mid q_1, \lambda) \, prob(q_2 \mid q_1) \cdots prob(q_T \mid q_{T-1}) \, prob(O_T \mid q_T, \lambda) \qquad (6) \]

It can be interpreted as follows. Initially the system is in state $q_1$ with probability
$prob(q_1)$ and generates the observation $O_1$ with probability $prob(O_1 \mid q_1, \lambda)$.
At the next time slot, the state transitions from $q_1$ to $q_2$ with probability
$prob(q_2 \mid q_1)$ and produces an observation $O_2$ with probability
$prob(O_2 \mid q_2, \lambda)$, and so on. Finally, $prob(O \mid \lambda)$ is derived by summing
the products over all possible state sequences.
The Forward-Backward Procedure [89] is adopted to simplify the calculation. Consider the
forward variable $\alpha_t(i)$ defined as:

\[ \alpha_t(i) = prob(O_1 O_2 \ldots O_t, \; q_t = S_i \mid \lambda) \]

where $\alpha_t(i)$ indicates the joint probability of observing the partial sequence
$O_1 O_2 \ldots O_t$ and being in state $S_i$ at time slot $t$, given the model $\lambda$. It can
be proved [88] through induction that:

\[ prob(O \mid \lambda) = \sum_{i=1}^{M} \alpha_T(i) \]

Consider the backward variable $\beta_t(i)$, defined as:

\[ \beta_t(i) = prob(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i, \lambda) \]

which represents the probability of observing the partial sequence from time slot $t+1$ to
the end, given the current state $S_i$ and the model $\lambda$. Now another variable is defined
as:

\[ \gamma_t(i) = prob(q_t = S_i \mid O, \lambda) \]

indicating the probability of being in state $S_i$ at time slot $t$ given the observation
sequence $O$ and the model $\lambda$. The equation can be rewritten in terms of the forward and
backward variables:

\[ \gamma_t(i) = \frac{\alpha_t(i) \beta_t(i)}{prob(O \mid \lambda)} = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{M} \alpha_t(j) \beta_t(j)} \]

By maximizing $\gamma_t(i)$, the estimated state $\hat{q}_t$ at time stamp $t$ can be detected
using

\[ \hat{q}_t = \arg\max_{1 \le i \le M} [\gamma_t(i)], \qquad 1 \le t \le T \]

The quantity $\hat{q}_t$ is the estimated system state at the $t$-th instance of the state
sequence of length $T$.
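The forward-backward computation of $\gamma_t(i)$ and the per-cycle state estimate can be sketched as below. This is an unscaled textbook version suitable only for short sequences; the matrices are the Subject-1 values reported in Section 6.2.5, while the initial probability array and the observation sequence are assumptions.

```python
def forward_backward(A, B, pi, obs):
    """Return (gamma, states): per-step state posteriors and their argmax.

    obs holds 0-based window indices of the SVM posterior for each breathing cycle.
    """
    M, T = len(pi), len(obs)
    alpha = [[0.0] * M for _ in range(T)]
    beta = [[1.0] * M for _ in range(T)]      # beta_T(i) = 1 by definition
    for i in range(M):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(M):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(M)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(M):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(M))
    gamma, states = [], []
    for t in range(T):
        weights = [alpha[t][i] * beta[t][i] for i in range(M)]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("observation sequence has zero probability under the model")
        g = [w / total for w in weights]
        gamma.append(g)
        states.append(max(range(M), key=lambda i: g[i]))
    return gamma, states

A = [[0.76, 0.24], [0.96, 0.04]]              # Subject-1 transition matrix
B = [[0.84, 0.03, 0.01, 0.0, 0.03, 0.01, 0.01, 0.01, 0.01, 0.06],
     [0.08, 0.02, 0.0, 0.0, 0.04, 0.0, 0.04, 0.02, 0.06, 0.74]]
pi = [0.8, 0.2]                               # assumed initial probabilities
gamma, decoded = forward_backward(A, B, pi, [0, 9, 0])
```

Here state 0 is normal breathing and state 1 is a breathing cycle with swallow. For long recordings the forward and backward variables would need rescaling at each step to avoid numerical underflow.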
6.2 Results and Discussion

6.2.1 Experimental Methods

During an experiment, a subject was instructed to press a button whenever she or he
swallowed, and the smartphone shown in Figure 6-4 was used to record the breathing signal
sensed by the wireless chest belt. A video camera was connected to a computer to record the
movement of the mouth and laryngopharynx during the experiment for validation purposes. The
computer, smartphone, and button recorder were synchronized before each experiment session.
The experiment setup is shown in Figure 6-4.

Figure 6-4: Experimental setup
Experiments were carried out on 6 subjects (2 female and 4 male) without any known
swallow abnormalities. Each subject was asked to wear the instrumented chest-belt and have his
or her lunch at his or her own pace. The lunch type was chosen by individual subjects based on
their dietary preferences. It included diverse food types including rice, bread, salad, and cooked
vegetarian and non-vegetarian items. Note that the subjects were allowed to drink during the
experiments. However, since the results in this chapter focus only on solid intake, the
breathing cycles affected by drinking were first identified from the video recording, and then
removed during data processing.
6.2.2 Performance Indices

To evaluate the detection performance (i.e., both SVM-only and SVM followed by HMM),
we adopted the metrics Precision and Recall, commonly used [27][53] in biomedical signal
processing and information retrieval. Precision and Recall are defined as:

\[ \text{Precision} = \frac{TP}{TP + FP} = \frac{\text{Recognized swallows}}{\text{Retrieved swallows}} \]

\[ \text{Recall} = \frac{TP}{P} = \frac{\text{Recognized swallows}}{\text{Relevant swallows}} \]

In this definition, recognized swallows (i.e., true positives, TP) indicates the number of
swallow events that are correctly detected. Retrieved swallows correspond to the number of
detected swallows including both the TPs and the incorrectly detected swallows (i.e., false
positives, FP). Relevant swallows (i.e., positives, P) refer to the number of actual swallow
events annotated from video observations reflecting the ground truth.
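As a small worked example of these definitions, the sketch below computes both metrics from per-cycle labels. The label convention (+1 for a cycle with swallow, -1 for normal breathing) follows Section 6.1.2; the function name and the example labels are hypothetical.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for the swallow class (+1) from per-cycle labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a != 1)
    positives = sum(1 for a in actual if a == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / positives if positives else 0.0
    return precision, recall

# Hypothetical detector output against ground truth for eight breathing cycles.
predicted = [1, 1, 1, 1, -1, -1, -1, -1]
actual    = [1, 1, 1, -1, 1, -1, -1, -1]
precision, recall = precision_recall(predicted, actual)
```

In this example three of the four detections are correct and three of the four actual swallows are found, so both precision and recall are 0.75.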
6.2.3 Feature Extraction for Stage-1 Detection using SVM

As reported in our previous work [83], both time domain and frequency domain features
can be used for detecting liquid swallows. The discriminative power of those feature types,
however, can be different. Figure 6-5 shows the discriminative power of time (Figure 6-5:a) and
frequency (Figure 6-5:b) domain features in solid swallow detection using the SVM classifier. The
merit of a feature in Figure 6-5:a and Figure 6-5:b refers to information gain [90], which is
defined as the reduction in classification entropy $H(\cdot)$ obtained from the additional
information that the corresponding feature provides about the target classes. Assuming $A$ is
the feature and $C$ is the set of classes, the following two equations give the class entropies
before and after using the feature:

\[ H(C) = -\sum_{c \in C} p(c) \log_2 p(c) \]

\[ H(C \mid A) = -\sum_{a \in A} p(a) \sum_{c \in C} p(c \mid a) \log_2 p(c \mid a) \]

The merit of the feature is then the difference $H(C) - H(C \mid A)$.

[Figure 6-5 panels: (a) merit of the time-domain features (feature index 1 to 128); (b) merit of
the frequency-domain features versus Frequency (Hz); (c) normal breathing cycle: normalized
amplitude and first derivative versus normalized time, NZC = 2, NTC = 3; (d) breathing cycle
with swallow: normalized amplitude and first derivative versus normalized time, NZC = 2,
NTC = 5, with the swallow and the ±10 crossings marked.]
Figure 6-5: Feature discriminative property and ±10 crossings as a classification
feature

A feature with higher merit indicates lower class entropy when this feature is adopted.
Merit can be also used when feature reduction is needed in the presence of limited computational
and storage resources.
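The merit computation can be illustrated with the sketch below, which evaluates the entropy reduction for a discrete feature. Real-valued features such as sample amplitudes or FFT coefficients would first have to be discretized, which is not shown; the function names and the toy data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """H(C) = -sum_c p(c) * log2 p(c) over the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Merit = H(C) - H(C | A) for a discrete feature A."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [c for f, c in zip(feature_values, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Toy example: two swallow cycles (+1) and two normal cycles (-1).
classes = [1, 1, -1, -1]
gain_informative = information_gain(["high", "high", "low", "low"], classes)
gain_useless = information_gain(["high", "low", "high", "low"], classes)
```

A feature that separates the classes perfectly removes the full 1 bit of class entropy, while a feature that is independent of the class has zero merit.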
For time domain features, as shown in Figure 6-5:a, where the 128 sample points in
normalized breathing cycles are used as features, sample points near the 27th and 53rd sample
points are more important than others in classification. For frequency domain features, as shown
in Figure 6-5:b, where the first 64 FFT coefficients are used as features, lower frequency
components have more discriminative power. It was also found that the discriminative power
distribution of frequency domain features is more consistent across subjects, which is why the
frequency-domain features were finally used in [83].
The second set of SVM classification features is derived from the first derivative of the
breathing signal. As shown in Section 6.1, it was found that swallows generally create more
fluctuations in the breathing signal compared to normal breathing cycles. To capture such
fluctuations, an additional classification feature was derived from the first derivative of the
breathing signal. More specifically, the number of ±10 crossings (NTC) is used as the feature,
which is defined as the number of points in the breathing signal at which the first derivative of
the signal is exactly +10 or -10. Compared to the number of zero crossings (NZC), NTC can
better capture the swallow signatures.
Figure 6-5:c and Figure 6-5:d show an example comparison between a representative
normal breathing cycle and a representative breathing cycle with swallow, together with the
corresponding NZC and NTC of their first derivatives. Observe that while the NZC is 2 for both
types of breathing cycles, the number of ±10 crossings (NTC) is 3 for the normal breathing cycle
and 5 for the breathing cycle with swallow. The additional 2 NTCs (i.e., -10 crossings) are
contributed by the swallow event. Differences in NTCs were consistently observed between
breathing cycles with and without swallows, indicating that the NTC of the first derivative is a
useful classification feature for the SVM engine.
The third set of features is derived from the duration and amplitude of the breathing cycles
before normalization. In summary, the SVM features used in this chapter include: the first 5
Fourier transform coefficients, NTC, inhalation duration, exhalation duration, total breathing
cycle duration, inhalation amplitude, and exhalation amplitude.
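The NTC feature can be sketched as follows. Two points are assumptions: the derivative is taken as the difference between consecutive samples, and a crossing is counted whenever the derivative passes through +10 or -10 between two samples, since sampled values rarely equal ±10 exactly.

```python
def first_derivative(samples):
    """Discrete first derivative: difference between consecutive samples."""
    return [b - a for a, b in zip(samples, samples[1:])]

def level_crossings(values, level):
    """Count sign changes of (value - level) between consecutive samples."""
    return sum(1 for a, b in zip(values, values[1:])
               if (a - level) * (b - level) < 0)

def ntc(samples, level=10.0):
    """Number of +level and -level crossings of the first derivative."""
    derivative = first_derivative(samples)
    return level_crossings(derivative, level) + level_crossings(derivative, -level)

# Toy signal whose derivative is [0, 20, 0, -20, 0].
toy_cycle = [0, 0, 20, 20, 0, 0]
toy_ntc = ntc(toy_cycle)
```

In the toy signal the derivative rises through +10, falls back through +10, falls through -10, and rises back through -10, giving an NTC of 4.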
6.2.4 Swallow Detection with SVM

The features mentioned above were fed into the posterior probability SVM classifier
described in Section 6.1.2. The classifier was trained and validated using data collected through
the experiments outlined in Figure 6-4. We used a leave-one-out validation approach, meaning
that a subject’s data is excluded from the training set when his or her data is used as the test
set.
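The subject-wise splitting described above can be sketched as a small generator. The data layout (a dictionary from subject identifier to that subject's feature rows) and the function name are assumptions for illustration.

```python
def leave_one_subject_out(data_by_subject):
    """Yield (held_out_subject, training_rows, test_rows) for each subject in turn."""
    for held_out in data_by_subject:
        training_rows = [row
                         for subject, rows in data_by_subject.items()
                         if subject != held_out
                         for row in rows]
        yield held_out, training_rows, data_by_subject[held_out]

# Hypothetical per-subject data; each integer stands in for one feature vector.
example_data = {"S1": [1, 2], "S2": [3], "S3": [4, 5]}
folds = list(leave_one_subject_out(example_data))
```

Each fold trains on every subject except one, so the reported performance reflects detection on a person the classifier has not seen.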
Figure 6-6 reports the distribution of the SVM-produced posterior probabilities (i.e., the
probability of a cycle containing a swallow signature) for breathing cycles with swallows and
normal breathing cycles. The distribution was plotted from all classification data obtained during
the experiment. In the absence of classification errors, there would have been only one bar at
probability 1 for the cycles with swallows. Similarly, there would have been only one bar at
probability 0 for the cycles without swallows. In Figure 6-6, it can be observed that in spite of
some classification errors (indicated by the scattered bars over the probability axis), the SVM
is able to separate the two cycle types fairly distinctly. Such errors are often caused by swallow
signatures that are too short (in time) to be captured by the specified features, and by breathing
cycle modulation by adjacent swallows [49].

[Figure 6-6 plot: Posterior Probability (0 to 1) versus Percentage (0 to 0.4), with separate
bars for Normal breathing cycle and Breathing cycle with swallow.]

Figure 6-6: Distribution of posterior probabilities with and without swallows
Applying a probability threshold $P_{th}$ to the SVM-produced posterior probabilities, it is
possible to classify each cycle as a normal or swallow-containing breathing cycle in the
following manner:

\[ \begin{cases} \text{normal breathing cycle}, & \text{if } prob(1 \mid x_i) < P_{th} \\ \text{breathing cycle with swallow}, & \text{otherwise} \end{cases} \]
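The thresholding rule above, and its effect on precision and recall as the threshold is swept, can be sketched as follows. The posterior values and ground-truth labels are hypothetical and are not taken from the experiments.

```python
def classify_cycles(posteriors, p_th):
    """+1 (cycle with swallow) if prob(1 | x) >= P_th, else -1 (normal breathing)."""
    return [1 if p >= p_th else -1 for p in posteriors]

def precision_recall_at(posteriors, truth, p_th):
    """Precision and recall of the swallow class at a given threshold."""
    predicted = classify_cycles(posteriors, p_th)
    tp = sum(1 for p, a in zip(predicted, truth) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, truth) if p == 1 and a == -1)
    positives = sum(1 for a in truth if a == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / positives if positives else 0.0
    return precision, recall

# Hypothetical SVM posteriors and ground truth for eight breathing cycles.
posteriors = [0.05, 0.15, 0.25, 0.35, 0.55, 0.65, 0.85, 0.95]
truth      = [-1,   -1,    1,   -1,   -1,    1,    1,    1]

# One (threshold, precision, recall) point per threshold from 0.1 to 0.9.
sweep = [(k / 10, *precision_recall_at(posteriors, truth, k / 10)) for k in range(1, 10)]
```

On this toy data a low threshold of 0.1 gives full recall with low precision, while a high threshold of 0.9 gives perfect precision with low recall.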

Figure 6-7: Comparison between SVM-only and two-tier SVM+HMM mechanism
Figure 6-7 shows the SVM-only classification performance (i.e., precision and recall) while
the threshold Pth is varied in the range from 0.1 to 0.9. In Figure 6-7, each SVM-only
performance point on the curve for a given subject corresponds to one threshold value and its
corresponding precision and recall performance pair. When the threshold varies from low to
high, the precision increases while the recall decreases, indicating both fewer false positives
and fewer true positives. Since the breathing signal during lunch is imbalanced, meaning there
are many more normal breathing cycles than breathing cycles with swallows, a higher threshold
gives more preference to normal breathing cycles and therefore reduces the false positives, thus
increasing the precision and decreasing the recall. The reverse effect was observed while
lowering the detection threshold Pth. The SVM-only performance lines in Figure 6-7 provide a
means for choosing an appropriate Pth for a required balance between precision and recall.
Table 6-1 presents the precision and recall performance for all six subjects when an even
threshold of 0.5 is chosen for swallow classification.
            SVM                            SVM+HMM
            Precision (%)   Recall (%)     Precision (%)   Recall (%)
Subject 1   68              86             74              82
Subject 2   67              100            74              96
Subject 3   49              90             81              81
Subject 4   54              98             72              93
Subject 5   45              91             66              87
Subject 6   80              80             83              74

Table 6-1: Comparison between fixed threshold SVM-only and two-tier SVM+HMM
mechanism
6.2.5 Improved Detection using HMM

Hidden Markov Model processing, as outlined in Section 6.1.4, was applied to the
posterior probability output of the SVM for improving detection performance. Such improvements
are accomplished by correcting some of the SVM errors using known
locality information in human swallow sequences. As described in Section 6.1.3, for each
subject, both the A (2×2) and B (2×10) matrices in the HMM model were computed based on
experimental observations using the video camera, push button, and data from the wireless
chest-belt system. As an example, for Subject-1 the A matrix was found to be:

\[ A = \begin{bmatrix} 0.76 & 0.24 \\ 0.96 & 0.04 \end{bmatrix} \]

and the B matrix was found to be:

\[ B = \begin{bmatrix} 0.84 & 0.03 & 0.01 & 0 & 0.03 & 0.01 & 0.01 & 0.01 & 0.01 & 0.06 \\ 0.08 & 0.02 & 0 & 0 & 0.04 & 0 & 0.04 & 0.02 & 0.06 & 0.74 \end{bmatrix} \]

Swallow detection performance of the SVM+HMM approach is presented in Table 6-1 and
in Figure 6-7. For each subject, there is one point for the SVM+HMM approach indicating the
corresponding precision and recall performance. Observe that for subjects 1, 2, 3, and 6, the
SVM+HMM point is situated higher and to the right in comparison to the line for the SVM-only
approach. This indicates better performance of SVM+HMM compared to the SVM-only
approach with all possible posterior probability thresholds. For the remaining two subjects (i.e., 4
and 5), the SVM+HMM performance point is on the SVM-only line, indicating that with a certain
posterior probability threshold, SVM-only can perform as well as the SVM+HMM approach.
These results validate the overall usefulness of the proposed HMM processing, which leverages
known swallow sequence locality information to remove certain classification errors that are
introduced by the SVM-only approach.
6.3 Conclusion
In this chapter, we have presented a wireless and wearable solid food intake monitoring
system. A novel Support Vector Machine (SVM) and Hidden Markov Model (HMM) based
processing mechanism, which analyzes the collected breathing signal and previously known swallow
sequence locality information, was also presented. The system and processing mechanism were
experimentally shown to be effective for solid food intake detection. Ongoing work on this topic
includes: 1) developing an unsupervised swallow (both solid and liquid) detection mechanism for
generalizability, 2) developing a detection and filtering mechanism for artifacts introduced by
movement and speech, and 3) implementing a real-time swallow detection system that can be
used by health researchers for retrieving dietary information from a targeted population.

Chapter 7:

Mealtime and Duration Monitoring

In this chapter, we present a wearable sensor system for estimating mealtime (i.e., the time of
day of a meal) and meal duration based on the swallow-triggered apneas detected in
breathing signals. Swallow-triggered apneas are detected using two Respiratory Inductance
Plethysmography (RIP) belts worn on the chest and abdomen, as shown in Figure 7-1. Since the RIP
belts do not rely on pressure or skin contact to pick up the breathing signal, they can be
suitable for prolonged usage with minimal cosmetic and comfort issues.
Time and duration of meals have been shown to be highly correlated with obesity. Ma et al [91]
analyzed the mealtime of 499 people for 1 year and concluded that a greater number of eating
episodes each day was associated with lower risk of obesity, whereas practices such as skipping
breakfast were linked with higher obesity risk. Gluck et al [92] conducted experiments on 55
subjects and demonstrated that nighttime eaters, who frequently consume a considerable amount of
food at night, tend to gain weight faster than non-nighttime eaters. Cleator [93] also showed
that 52% of obese nighttime eaters reported normal weight before the onset of their nighttime
eating habit. Andersen et al [94] carried out a study that lasted for 10 years, and reported that
obese women with nighttime eating experienced an average weight gain of 5.2kg over 6
years.
Improper mealtime can have other negative impacts. Rogers et al [95] showed that subjects
with night eating syndrome had less stage 2 and stage 3 sleep, which contributed to shorter total
sleep time and lower sleep efficiency, and that they were more likely to suffer from depression.
Sassaroli et al [96] also found that night eating syndrome is highly correlated with anxiety.
In addition to the wearable belts, an accelerometer on the wrist of the dexterous hand has
also been used to improve the swallow detection performance. Hand movement supplies side
information that helps resolve the confusion caused by uncertainties in detection based on the
breathing signal alone. The accelerometer is able to capture the characteristics of hand movement
during eating, drinking, sitting, and talking.
In our previous work [97][98], we presented an early stage swallow detection system and
algorithms for detecting liquid and solid food intake in the absence of artifacts. This chapter
builds on those core system capabilities and leverages those algorithms in order to develop a
system specifically for estimating the meal intake time and duration.
Specific contributions of this chapter include: 1) developing algorithmic solutions for
handling artifacts including spontaneous swallows, talking, laughing, coughing and clearing
throat, and 2) experimentally demonstrating the effectiveness of the proposed system and
algorithms in a semi-controlled environment.
7.1 System Architecture
Figure 7-1 shows the wearable sensor system, which is an extension of what was
reported in [98]. The wearable system includes: 1) a pair of RIP belts and their associated control
box (from zRIP Durabelt sum kit, Pro-Tech, Murrysville, PA) for collecting breathing signal, 2)
a signal shaping circuit for amplifying and filtering the raw signal from the sensor to optimize the
signal-to-noise ratio (SNR) of the ADC stage, 3) a µController and Bluetooth subsystem equipped
with 14-bit ADC channels and an 8-bit accelerometer for sampling and transmitting breathing
and hand movement signals over Bluetooth to an external Android smartphone, and 4) a 3.7V
340mAh polymer rechargeable battery.


Figure 7-1: Components of the mealtime and duration monitoring system
The signal shaping circuit, µController and Bluetooth subsystem, and battery are placed in
a 4cm×2.5cm×2cm 3D-printed watch-like wrist unit worn on the wrist of the dexterous arm of a
subject. The system is able to collect data continuously for over 20 hours on a single battery
charge.
7.2 Processing Methods
Figure 7-2 depicts the overall logic and algorithmic architecture for the proposed mealtime
and duration monitoring. The overall logic can be divided into three parts, namely,
preprocessing, food intake detection, and meal intake analysis.
The preprocessing stage takes the respiratory signal and the wrist acceleration signal, and
generates feature vectors. Each feature vector contains a number of features, which represent the
unique characteristics of breathing in a cycle-by-cycle manner and of the wrist acceleration signal.
More specifically, features from the breathing signal capture the signature of swallow apnea,
whereas those from wrist acceleration represent hand movement during eating, drinking, sitting,
talking, and other artifacts.
The food intake detection module utilizes a 3-stage hierarchical classifier, where the first
classifier differentiates normal breathing cycles from non-normal breathing cycles (i.e., cycles
with talking, solid swallow, and liquid swallow). The second classifier detects talking and
swallowing (i.e., both solid and liquid swallows), and the third classifier differentiates between
solid and liquid swallows.
The final architectural component in Figure 7-2 is the meal intake analysis stage, which
estimates the time and duration of each meal intake episode based on the solid swallows detected
in the previous food intake detection stage.
In the preprocessing stage, the respiratory signal from the RIP sensors and the wrist
acceleration signal, as described in Section 7.1, are sampled at 100Hz and fed into a digital low
pass filter for removing quantization noise caused during the A-D conversion. The respiratory
signal then passes through a breathing cycle extraction module, which performs a threshold
based peak and valley detection operation as described in [85] on 33% overlapping 30-second
windows.
The peak-valley detection algorithm selects peaks and valleys alternately, such that the
amplitude difference between each neighboring pair is larger than a threshold. The threshold is
set to $0.3\,(\max_{d(m)\in C} d(m) - \min_{d(n)\in C} d(n))$, where C is the data set that includes
all the data points in the target 30-second window, and $\max_{d(m)\in C} d(m)$ and
$\min_{d(n)\in C} d(n)$ are the amplitudes of the highest and the lowest sample points in the window
respectively. In case of fluctuations below the threshold near a peak or valley, the highest or
lowest points are selected. A threshold on $\max_{d(m)\in C} d(m) - \min_{d(n)\in C} d(n)$ is also
applied in order to detect improper wear of the respiratory belts, in which case the received
respiratory signal contains only low amplitude noise caused by its electronic components and the ADC.
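The alternating peak and valley selection can be illustrated with a minimal Python sketch. It is not the implementation from [85]: the function name `detect_peaks_valleys`, the `ratio` parameter, and the choice to start by looking for a peak are assumptions made for illustration. Only the 0.3 × (max − min) threshold and the rule of keeping the highest or lowest point among sub-threshold fluctuations come from the text.

```python
def detect_peaks_valleys(window, ratio=0.3):
    """Return (peaks, valleys) as lists of sample indices for one window.

    Hypothetical helper: a swing only counts as a new peak or valley when
    it exceeds ratio * (max - min) of the window.
    """
    if len(window) < 2:
        return [], []
    threshold = ratio * (max(window) - min(window))
    if threshold <= 0:
        return [], []  # flat window, nothing to detect

    peaks, valleys = [], []
    hi_idx, lo_idx = 0, 0   # running extrema since the last confirmed turn
    seeking_peak = True     # assumption: start by looking for a peak
    for i, value in enumerate(window):
        if value > window[hi_idx]:
            hi_idx = i
        if value < window[lo_idx]:
            lo_idx = i
        if seeking_peak and value < window[hi_idx] - threshold:
            # Signal dropped far enough: the running maximum is a peak.
            peaks.append(hi_idx)
            lo_idx = i
            seeking_peak = False
        elif not seeking_peak and value > window[lo_idx] + threshold:
            # Signal rose far enough: the running minimum is a valley.
            valleys.append(lo_idx)
            hi_idx = i
            seeking_peak = True
    return peaks, valleys


# Small dips near the crest (10 -> 9 -> 10.5) stay below the threshold,
# so only the highest point (index 4) is reported as the peak.
print(detect_peaks_valleys([0, 5, 10, 9, 10.5, 5, 0, 1, -0.5, 5, 10]))
```

Because extrema are confirmed only after the signal moves away by more than the threshold, the last extremum in a window may remain unconfirmed; overlapping windows, as used in the text, are one way to cover such boundary cases.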

[Figure 7-2 block diagram: the respiratory signal and the wrist acceleration signal from the ADC each pass through a low pass filter. Preprocessing: breathing cycle extraction, normalization, feature extraction. Food intake detection: Classifier-1, HMM-1, Classifier-2, HMM-2, Classifier-3, with outputs normal breathing cycle, talking, solid swallow, and liquid swallow. Meal intake analysis: dietary behavior analyzer, which outputs the detected mealtime and duration of each meal (example: Meal-1 at 7am for 15min, Meal-2 at 12pm for 30min, Meal-3 at 8pm for 45min).]

Figure 7-2: The mealtime and duration detection scheme
Each separated breathing cycle is normalized amplitude-wise and interpolated in time. The
normalized and interpolated breathing cycles span between 0 and 100 amplitude-wise and have
128 sample points each. More specifically, the normalization of a cycle is performed for the
valley-to-peak (inhale) and peak-to-valley (exhale) data segments individually. Each
valley-to-peak and peak-to-valley data segment is normalized as follows:

\[ \mathrm{normalized}(i) = \frac{[\mathrm{data}(i) - \mathrm{valley}] \cdot 100}{\mathrm{peak} - \mathrm{valley}} \]

where data(i) corresponds to a data point in the valley-to-peak (inhale) or peak-to-valley (exhale)
data segment, normalized(i) is the normalized data point corresponding to data(i), and peak and
valley are the peak and valley points of the segment. Linear interpolation is adopted to
interpolate each breathing cycle.
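As an illustration of the normalization and interpolation steps, the following Python sketch normalizes the inhale and exhale segments separately and then resamples the joined cycle to 128 points. The function names and the convention that the peak sample is shared by both segments are assumptions made for this sketch, not details given in the text.

```python
def normalize_segment(segment, peak, valley):
    """Map a segment to the 0..100 range: (data(i) - valley) * 100 / (peak - valley)."""
    span = peak - valley
    if span == 0:
        raise ValueError("peak and valley must differ")
    return [(x - valley) * 100.0 / span for x in segment]


def resample_linear(samples, n_points=128):
    """Linearly interpolate samples onto n_points evenly spaced positions."""
    if len(samples) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(samples) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into samples
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def normalize_cycle(inhale, exhale, n_points=128):
    """inhale: valley-to-peak samples; exhale: peak-to-valley samples.

    Assumption: inhale[-1] and exhale[0] are the same peak sample, so the
    duplicate is dropped when the two normalized segments are joined.
    """
    inhale_norm = normalize_segment(inhale, peak=inhale[-1], valley=inhale[0])
    exhale_norm = normalize_segment(exhale, peak=exhale[0], valley=exhale[-1])
    return resample_linear(inhale_norm + exhale_norm[1:], n_points)


cycle = normalize_cycle([2.0, 4.0, 6.0], [6.0, 3.0, 1.0])
print(len(cycle), cycle[0], cycle[-1])
```

Normalizing the two segments individually lets a cycle whose starting and ending valleys differ in amplitude still begin and end at 0.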
Considering the average cycle length of 3.77 seconds in our experiments, the
normalized sampling rate after interpolation is 34Hz (128 sample points/3.77 seconds result in
34Hz). Although each cycle may have different time and amplitude ranges caused by varying
tidal volume [97] and respiratory frequency (i.e., variable person-to-person or cycle-to-cycle for
the same person), the normalization process removes such variances, thus improving the
generalizability of the proposed system in handling respiratory signals with various amplitudes
and frequencies. The Feature Extraction module takes both the breathing signal and wrist
acceleration data and generates 29 features from the breathing signal and 12 features from the wrist
acceleration signal. Details about the features are provided in Section 7.3.3.
In the food intake detection phase, we used a 3-stage hierarchical classifier, which contains
3 individual classifiers: Classifier-1 for detecting normal breathing cycles and non-normal
breathing cycles (including breathing cycles with talking, solid swallows and liquid swallows),
Classifier-2 for identifying swallows (including both solid and liquid swallows) from breathing
cycles with talking, and Classifier-3 for detecting solid and liquid swallows.


For all three classifiers, Support Vector Machines (SVMs) with posterior probability [99] are
used. A posterior probability indicates the SVM-estimated probability that a given breathing cycle
belongs to one of the two classes that the classifier is designed to differentiate. Details about the
SVM with posterior probability are introduced in Section 6.1.2.
A Hidden Markov Model (HMM) is applied to the posterior probability outputs of Classifier-1
and Classifier-2 in order to improve the detection performance by leveraging a-priori
knowledge about any temporal locality present in the swallow and talking sequence. HMM is not
applied to Classifier-3 (which differentiates solid and liquid swallows) because, based on our
experiments and observations, solid and liquid swallows do not demonstrate strong temporal
locality; that is, swallows (solid or liquid) are generally not likely to happen in
consecutive breathing cycles. Details about HMM modeling and its processing are discussed in
Sections 6.1.3 and 6.1.4.
All 3 classifiers in the hierarchical classifier system use features extracted in the
preprocessing module. Classifier-2 and Classifier-3 are triggered by the output of HMM-1 and
HMM-2 respectively. That is, Classifier-2 is deployed only when a breathing cycle is classified
as non-normal (i.e., cycles with talking, solid swallows and liquid swallows), and Classifier-3 is
applied only when it is classified as a swallow (i.e., either a solid or a liquid swallow).
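The triggering logic of the three stages can be sketched in Python as follows. This is a hedged illustration only: the posterior functions `p_non_normal`, `p_swallow`, and `p_solid` are hypothetical stand-ins for trained SVMs, and `threshold_decoder` is a plain 0.5 threshold used as a placeholder where the actual system applies HMM-1 and HMM-2 (the HMMs themselves are described in Sections 6.1.3 and 6.1.4 and are not reproduced here).

```python
NORMAL, TALKING, SOLID, LIQUID = "normal", "talking", "solid", "liquid"


def threshold_decoder(probabilities, threshold=0.5):
    """Placeholder for an HMM stage: thresholds each posterior independently.

    Assumption: a real HMM decoder would use the whole sequence; this
    stand-in has no temporal model at all.
    """
    return [p > threshold for p in probabilities]


def classify_cycles(features, p_non_normal, p_swallow, p_solid,
                    decode_1=threshold_decoder, decode_2=threshold_decoder,
                    threshold_3=0.5):
    """Label each breathing cycle using the 3-stage cascade."""
    labels = [NORMAL] * len(features)

    # Stage 1: normal versus non-normal, on every cycle.
    non_normal = decode_1([p_non_normal(f) for f in features])
    stage2_idx = [i for i, flag in enumerate(non_normal) if flag]

    # Stage 2: talking versus swallow, only on non-normal cycles.
    is_swallow = decode_2([p_swallow(features[i]) for i in stage2_idx])

    for i, swallow in zip(stage2_idx, is_swallow):
        if not swallow:
            labels[i] = TALKING
        # Stage 3: solid versus liquid, only on swallows (no HMM here).
        elif p_solid(features[i]) > threshold_3:
            labels[i] = SOLID
        else:
            labels[i] = LIQUID
    return labels


# Toy features: each tuple directly holds the three posteriors.
toy = [(0.3, 0.0, 0.0), (0.9, 0.2, 0.0), (0.8, 0.9, 0.7), (0.7, 0.6, 0.1)]
print(classify_cycles(toy, lambda f: f[0], lambda f: f[1], lambda f: f[2]))
```

Passing the decoders as arguments keeps the cascade structure separate from the choice of sequence model, so an HMM-based decoder could be swapped in without changing the triggering logic.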

[Figure 7-3 flowchart: Start; move to the next breathing cycle; if the current breathing cycle is not a solid swallow, move to the next breathing cycle; otherwise select a window centered at the current breathing cycle; if the number of solid swallows in the window is not greater than the threshold, move to the next breathing cycle; otherwise set the breathing cycles between the first and last solid swallow as part of the meal episode.]

Figure 7-3: Meal intake analysis algorithm
In our previous work [97][98], Support Vector Machine (SVM) was shown to be the most
effective classifier for swallow detection. As with most commonly used SVMs [77], the classification
output for each breathing cycle is a class label, which is either normal breathing cycle or breathing
cycle with swallows. After analyzing the classification errors, it was found that many of those
errors could be corrected by applying known temporal locality information in human swallow
and breathing sequences. For example, people rarely swallow in many consecutive breathing
cycles. Thus, whenever the classification output shows many consecutive breathing cycles with
swallows, errors can be suspected, and misclassified instances can be identified and removed by
applying higher level techniques such as Hidden Markov Model (HMM). This motivates the
cascading of SVM and HMM, as shown in Figure 7-2.
The meal intake analysis module takes the detected solid swallows from the food intake
detection phase. Empirically, people execute solid swallows periodically during a meal. Therefore,
when more than N (i.e., a threshold count) solid swallows are detected in a window of M breathing
cycles, cycles in that window are categorized as part of a meal intake episode. The detailed
algorithm is illustrated in Figure 7-3. When a moving window is centered at a breathing cycle
with a solid swallow, and the number of solid swallows in the window exceeds the threshold N,
the cycles between the first and last solid swallows in the window are classified as part of the
meal intake episode.
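A minimal Python sketch of the moving-window logic in Figure 7-3 is given below. The function name `detect_meal_cycles` and its parameters are hypothetical. The sketch assumes the window size is expressed in breathing cycles and uses the strict comparison (count greater than the threshold) shown in the flowchart.

```python
def detect_meal_cycles(is_solid_swallow, window_size, threshold):
    """Return a per-cycle list of booleans: True if the cycle is in a meal.

    is_solid_swallow: list of booleans, one per breathing cycle.
    window_size: window length M in breathing cycles (assumed unit).
    threshold: count N that the number of solid swallows must exceed.
    """
    n = len(is_solid_swallow)
    in_meal = [False] * n
    half = window_size // 2
    for center in range(n):
        if not is_solid_swallow[center]:
            continue  # windows are only centered at solid swallows
        start = max(0, center - half)
        stop = min(n, center + half + 1)
        swallows = [i for i in range(start, stop) if is_solid_swallow[i]]
        if len(swallows) > threshold:
            # Mark everything between the first and last solid swallow.
            for i in range(swallows[0], swallows[-1] + 1):
                in_meal[i] = True
    return in_meal


# Solid swallows at cycles 2, 4, 6, 8 (a meal) and one isolated at 17.
flags = [i in (2, 4, 6, 8, 17) for i in range(20)]
meal = detect_meal_cycles(flags, window_size=9, threshold=3)
print([i for i, m in enumerate(meal) if m])
```

In this toy sequence the isolated detection at cycle 17 never accumulates enough neighbors to pass the threshold, which mirrors how sparse false positives are kept out of meal episodes.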
7.3 Results
7.3.1 Experimental Methods

The experiments were carried out on 14 subjects (5 female and 9 male) without any known
swallow abnormalities. The experiments were approved by Michigan State University’s
Institutional Review Board. Subjects were required to participate in at least 3 sessions of
experiments. The first 2 sessions were Type-1, and the last session was Type-2.


Figure 7-4: Experimental setup
During a Type-1 session, a subject was asked to first have lunch without talking, then drink
water from a flask 10 times at 20-second intervals, then rest for 10 minutes, and lastly
converse freely with the experimenter for 10 minutes. A Type-2 session was the same as a
Type-1 session except that the subject was allowed to talk while having lunch. Coughing, clearing
throat, and other activities that impact breathing were also allowed and constantly observed
during the whole experiment (both Type-1 and Type-2), and laughing was allowed during the
10-minute conversation of each experiment session. The food for lunch during the experiment
included rice, bread, salad, fruit, and cooked vegetarian and non-vegetarian items. Among the 14
subjects, 4 performed 3 Type-1 sessions and 1 Type-2 session, and the rest performed 2
Type-1 sessions and 1 Type-2 session. Each Type-1 or Type-2 session lasted around 45 minutes,
and in total, approximately 34 hours 30 minutes of data were collected.
The experimental setup is shown in Figure 7-4. During an experiment, a subject, wearing
the instrumented system as described in Figure 7-1, was instructed to press a button whenever
she or he swallowed during lunch or drinking, and the smartphone was used for recording the
respiratory and hand movement signals captured by the system. A video camera was connected to
a computer to record the movement of the mouth and laryngopharynx during the experiment to
indicate when the subject was talking and to validate the swallows. The computer, smartphone,
and button recorder were synchronized before each experiment session. The push-button and
video information was used as the ground truth for verification purposes.
Note that the method of using a press button and video camera is more natural compared to
the observer-based experiments as reported in [29][36], where an observer recorded the swallow
events. Such observer-based method suffers from the observer-expectancy effect [100], meaning
the expectation of experimenter affects the behavior of participants. In our experiment, we used a
press button to indicate the ground truth, and a camera for verification purposes to avoid such
observer-expectancy effect.
Normal breathing cycles, breathing cycles with talking, solid swallows, and liquid swallows
were labelled correspondingly. Laughing, coughing, clearing throat and other artifacts were
labelled as talking. Data collected during the 10-minute rest were only used in the meal intake
analysis phase as described in Section 7.3.5 for analyzing the impact of spontaneous swallows.
7.3.2 Performance Evaluation

To evaluate the performance of the SVM-only and the SVM-followed-by-HMM
arrangements, we have used performance indices that are common in biomedical signal processing
and information retrieval [29][32], namely, Precision, Recall, and F-measure. They are
defined as follows:

\[ \mathrm{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{11} \]

\[ \mathrm{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{12} \]

\[ \text{F-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{13} \]

For Classifier-1 and HMM-1 (see Figure 7-2), which classify normal breathing cycles and
non-normal breathing cycles (i.e., breathing cycles with talking, solid swallows and liquid
swallows), True Positives are the non-normal breathing cycles that are correctly detected,
False Positives are the normal breathing cycles that are mistakenly classified as non-normal
breathing cycles, and False Negatives are the non-normal breathing cycles that are
wrongly detected as normal breathing cycles.
For Classifier-2 and HMM-2, which differentiate breathing cycles with talking from
swallows (i.e., solid swallows and liquid swallows), True Positives are the swallows that are
correctly detected, False Positives are the breathing cycles with talking that are
mistakenly classified as swallows, and False Negatives are the swallows detected as breathing
cycles with talking.
For Classifier-3, which classifies solid and liquid swallows, True Positives are the solid
swallows that are correctly classified, False Positives are the liquid swallows that are
incorrectly detected as solid swallows, and False Negatives are the solid swallows detected as
liquid swallows.
F-measure is the harmonic mean of Precision and Recall, and it is used when Precision and
Recall individually lead to contradictory conclusions in comparing the performance of two
different classifiers. For example, when comparing two classifiers A and B, if the
Precision of A is higher than that of B whereas A’s Recall is lower than B’s, F-measure can then
be used to draw a conclusion.
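The three indices can be computed directly from the counts, as in the short Python sketch below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration, and the counts in the usage example are invented toy values, not experimental results.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, f_measure) from raw counts.

    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy counts: classifier A is more precise, classifier B has higher recall.
print(precision_recall_f(true_pos=6, false_pos=2, false_neg=6))   # A
print(precision_recall_f(true_pos=9, false_pos=9, false_neg=3))   # B
```

For these toy counts both classifiers end up with the same F-measure of 0.6, which shows how the harmonic mean balances a precision advantage against a recall advantage.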


7.3.3 Feature Extraction

Table 7-1: Features Extracted for SVM Classifiers

Category                             Features
-----------------------------------  ------------------------------------------------------------
Respiratory signal related,          Duration of inhalation and exhalation (4)
based on non-normalized              Amplitude of inhalation and exhalation (4)
breathing cycles                     Breathing cycle duration and frequency (3)
                                     Standard deviation of breathing signal in a cycle (1)

Respiratory signal related,          Number of local peaks (1)
based on normalized                  Hist-60 (1)
breathing cycles                     Mean and standard deviation of breathing signal in a cycle (2)
                                     First 10 FFT coefficients (10)
                                     Energy of high frequency (>3 Hz) components (1)
                                     ±10 crossing of first derivative (2)

Hand movement related                Mean and standard deviation of X and Y axis acceleration
                                     in the current breathing cycle (4)
                                     Mean and standard deviation of X and Y axis acceleration
                                     in the previous breathing cycle (4)
                                     Mean and standard deviation of X and Y axis acceleration
                                     in the breathing cycle before the previous one (4)
Table 7-1 lists the 41 features extracted for the SVM classifiers for food intake
detection (numbers in parentheses indicate the number of features for each category). Note
that the first 3 sets of features, i.e., duration of inhalation and exhalation, amplitudes of
inhalation and exhalation, and breathing cycle duration, include both absolute and relative
numbers. For example, both the absolute duration of inhalation and exhalation and the proportion of
inhalation and exhalation as part of the whole breathing cycle are used as features. The absolute
breathing cycle duration and the ratio between the duration and the average duration of 2
neighboring breathing cycles are also used.
Hist-60 is derived as follows: 1) first divide the amplitude of the normalized breathing
cycle into 10 equal intervals with ID 1 to 10, and then 2) set the number of samples falling into
interval i as bin-i, i=1,2,…,10. Hist-60 is set to i when

\[ \sum_{j=1}^{i} \text{bin-}j > 0.6 \cdot (\text{total sample points}) > \sum_{j=1}^{i-1} \text{bin-}j \tag{8} \]

That is, Hist-60 is the ID of the interval at which the cumulative sample count first exceeds 60%
of the sample points in the cycle.
±10 crossing of first derivative is defined as the number of sample points whose first
derivatives are either +10 or -10. ±10 crossing of first derivative has been used in [98] and
proved to be effective in swallow detection.
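The Hist-60 computation can be sketched in Python as below. The function name, the handling of a sample at exactly amplitude 100 (placed in interval 10), and the fallback return value are assumptions for illustration; the 10 equal intervals over the 0 to 100 range and the 0.6 fraction come from the text.

```python
def hist_60(normalized_cycle, n_bins=10, fraction=0.6, max_amplitude=100.0):
    """Return the interval ID (1..n_bins) where the cumulative sample
    count first exceeds `fraction` of all samples in the cycle."""
    if not normalized_cycle:
        raise ValueError("cycle must contain at least one sample")
    width = max_amplitude / n_bins
    bins = [0] * n_bins
    for value in normalized_cycle:
        # Assumption: amplitude == max_amplitude falls in the last interval.
        index = min(int(value / width), n_bins - 1)
        bins[index] += 1

    target = fraction * len(normalized_cycle)
    cumulative = 0
    for interval_id, count in enumerate(bins, start=1):
        cumulative += count
        if cumulative > target:
            return interval_id
    return n_bins  # not reached for fraction < 1; kept as a safe fallback


# 2 samples in interval 1, 5 in interval 5, 3 in interval 10:
# the cumulative count passes 60% of 10 samples at interval 5.
print(hist_60([5.0] * 2 + [45.0] * 5 + [95.0] * 3))
```

A low Hist-60 value indicates that most samples of the normalized cycle sit at low amplitudes, for example when the signal dwells near the valley for a large part of the cycle.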
7.3.4 Performance of Food Intake Detection

The food intake detection module, as described in Figure 7-2, uses the features extracted
by the preprocessing module (listed in Table 7-1), performs SVM and HMM classification, and
detects normal breathing cycles, breathing cycles with talking, solid swallows, and liquid
swallows. The detected normal breathing cycles, breathing cycles with talking, solid
swallows, and liquid swallows are then fed into the meal intake analysis module for meal intake
episode detection.
SVM classifiers (i.e., Classifier-1, 2, and 3) can be used without involving the HMMs. As
presented in Section 6.1.2, Classifier-1, 2 and 3 produce the posterior probability of each
breathing cycle being a non-normal breathing cycle, a swallow, and a solid swallow respectively.
By comparing the posterior probability with a predefined threshold, detection results can be
derived. For unbiased classifiers, which have no preference for any class, the threshold 0.5 can be
adopted. For example, if the posterior probability produced by Classifier-1 is 0.3, which is below
the 0.5 threshold, the corresponding breathing cycle is classified as a normal breathing cycle.
Figure 7-5 shows the performance measures as defined in Section 7.3.2 of this SVM-only
food intake detection method with and without the hand movement features as illustrated in
Table 7-1. Results are reported both for subject dependent and subject independent models.

[Figure 7-5 plots: Precision, Recall, and F-measure (Performance (%), axis range 60 to 100) for Classifier 1, Classifier 2, and Classifier 3. Panels: (a) subject dependent, without hand movement features; (b) subject dependent, with hand movement features; (c) subject independent, with hand movement features.]

Figure 7-5: Performance of SVM-only food intake detection method
Subject dependent model: SVM classifiers were tested with data from one session and trained on
the data from other sessions for each subject.
Subject independent model: SVM classifiers were trained using the data collected from all
subjects except the one whose data was used for testing.
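The two evaluation protocols correspond to different train and test splits, which the following Python sketch enumerates. The data layout (a dictionary from subject ID to a list of sessions), the function names, and the choice to rotate every session of a subject through the test role are assumptions for illustration; the text only states which data are used for training and testing.

```python
def subject_dependent_splits(sessions_by_subject):
    """Yield (subject, train_sessions, test_session) within each subject."""
    for subject, sessions in sessions_by_subject.items():
        for k, test_session in enumerate(sessions):
            # Train on the subject's other sessions only.
            train_sessions = sessions[:k] + sessions[k + 1:]
            yield subject, train_sessions, test_session


def subject_independent_splits(sessions_by_subject):
    """Yield (held_out_subject, train_sessions, test_sessions)."""
    for held_out in sessions_by_subject:
        # Train on every session from all other subjects.
        train_sessions = [
            session
            for subject, sessions in sessions_by_subject.items()
            if subject != held_out
            for session in sessions
        ]
        yield held_out, train_sessions, list(sessions_by_subject[held_out])


# Hypothetical session identifiers for two subjects.
data = {"subject_1": ["s1_a", "s1_b"], "subject_2": ["s2_a", "s2_b", "s2_c"]}
for split in subject_dependent_splits(data):
    print("dependent:", split)
for split in subject_independent_splits(data):
    print("independent:", split)
```

The subject independent split never lets the classifier see the test subject's data during training, which is why it is the more demanding check of generalizability.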
Note that the results shown in Figure 7-5 were based on the Type-1 experiments, during which
talking was not allowed during lunch. Figure 7-5(a) shows the precision, recall and F-measure
for Classifier-1, Classifier-2 and Classifier-3 for the subject dependent model without hand
movement features. Figure 7-5(b) is for the subject dependent model with hand
movement features. Finally, Figure 7-5(c) is for the subject independent model with hand
movement features.
As expected, by providing useful side information, the optional hand movement features did
improve the overall detection performance. As shown in Figure 7-5(a) and (b), when the hand
movement features are used, the performance of Classifier-2 improved significantly. However,
the hand movement features have very limited impact on the performance of Classifier-1 and
Classifier-3. The subject independent model has performance similar to the subject dependent
model according to Figure 7-5(b) and (c), which indicates good generalizability of the proposed
method for food intake detection.
Table 7-2: Comparison Between SVM-Only and SVM+HMM Solutions

            SVM only                                  SVM+HMM
Classifier  Precision (%)  Recall (%)  F-measure (%)  Precision (%)  Recall (%)  F-measure (%)
----------  -------------  ----------  -------------  -------------  ----------  -------------
1           86.6           76.6        81.3           85.1           81.7        83.4
2           60.4           80.4        69.0           72.8           71.3        72.0


HMM modeling and processing, as introduced in Sections 6.1.3 and 6.1.4, were applied to
the posterior probability produced by the SVM classifiers to improve the performance by
leveraging the temporal locality present in human swallowing and talking behavior. Table 7-2
compares the SVM-only solution and the SVM+HMM solution on the Type-2
dataset, in which subjects can talk during lunch. The F-measure of the SVM+HMM mechanism,
which represents a harmonic mean of precision and recall, is consistently higher than that of the
SVM-only solution.
[Figure 7-6 plots: panels (a) Subject 1 and (b) Subject 2, each covering the Lunch, Drinking, Rest, and Talking phases of a session. Each panel stacks three graphs against Time (Minute): actual events, detected events, and detected meal intake episode.
T: Talking LS: Liquid Swallow SS: Solid Swallow NBC: Normal Breathing Cycle
MIE: Meal Intake Episode Non-MIE: Non-Meal Intake Episode]

Figure 7-6: An example temporal dynamics of the meal intake analysis process

7.3.5 Performance of Meal Intake Analysis

As the final step of mealtime and duration analysis, the meal intake analysis module, as
described in Figure 7-2, analyzes the detected solid swallows from the food intake detection
module and detects the meal intake episodes. Since people generally execute solid swallows
periodically during a meal, a subject is considered to be within a meal when more than a
threshold number of solid swallows (i.e., N) are detected in a window of M breathing cycles. The
detailed algorithm is depicted in Figure 7-3.

[Figure 7-7 plot: average error (Minutes) versus threshold (number of breathing cycles, 2 to 6) for window sizes of 100 sec and 140 sec.]

Figure 7-7: Threshold selection for different window sizes

The optimum threshold N depends on the window size M. When M is large, the threshold
also needs to be large. With a large M, a small number of false positives may lead to a large
error in food intake episode detection, whereas a small M can lead to short detected food intake
episodes for an actual meal. Figure 7-7 shows the optimum threshold selection for example
window sizes of 100 seconds and 140 seconds. The error is defined as the difference in time
between the recorded lunch intake duration and the duration detected by the system. In the case of a
window size of 100 seconds, the optimum threshold is 3 breathing cycles, whereas when the
window size is 140 seconds, the threshold needs to be adjusted to 4 breathing cycles. Note that
the optimized M and N can be different for different subjects or even types of food, but the
analysis here provides general guidance in selecting the parameters. In our experiments, a
window size of 100 seconds and a threshold of 3 were selected across all subjects and experiments.
Figure 7-6 shows an example of meal intake analysis with average food intake detection
performance. Figure 7-6(a) shows the results for Type-1 experiments, in which talking was not
allowed during the meal, and Figure 7-6(b) depicts the results for Type-2 experiments, in which
talking was allowed. The graphs at the top indicate the ground truth recorded by button pressing
and the video camera. For instance, each solid swallow (SS) point during lunch corresponds to a
solid swallow recorded by a button press. Similarly, each point during the drinking phase is a
liquid swallow (LS) recorded by a button press, and the video record was used to
verify the boundary between lunch and drinking and to indicate breathing cycles with talking.
Detection results from the food intake detection module are shown in the middle graphs in Figure
7-6. Each point in these graphs corresponds to a normal breathing cycle, breathing cycle with
talking, solid swallow or liquid swallow detected by the food intake detection module as
described in Figure 7-2. The graphs at the bottom of Figure 7-6 are the meal intake episodes
detected by the meal intake analysis module in Figure 7-2. Each point in these graphs corresponds
to a breathing cycle which either belongs to the meal intake episode (MIE) or not (Non-MIE). It can
be seen that although there are quite a few detection errors in the food intake detection module,
the meal intake analysis module can still detect the time and duration of meal intake episodes
with fairly high accuracy.

[Figure 7-8 plot: actual duration and detected duration of the meal intake episode (Minutes) for the Type-1 and Type-2 experiments.]

Figure 7-8: Performance of meal intake analysis module
More specifically, in Figure 7-6(a), a few breathing cycles are mistakenly classified as
talking during the meal. However, since the meal intake analysis module relies only on solid
swallows, those few misdetections do not reduce the number of solid swallows in the window
below the threshold N. As a result, the meal intake episode is still detected fairly accurately.
Similarly, although a few breathing cycles are erroneously classified as solid swallows during the
talking session, the detected solid swallows are not enough (i.e., far below the threshold N) to be
detected as a meal intake episode. In Figure 7-6(b), a few of the solid swallows are not detected
correctly at the beginning of the experiment, thus the detected meal intake episode is slightly
shorter than the actual episode. Observe that many spontaneous swallows are detected during the
resting session. At the beginning of the talking session, three breathing cycles are erroneously
classified as solid swallows, thus a short false positive is observed.
Figure 7-8 shows the actual and detected duration of the meal intake episode for both
Type-1 and Type-2 experiments. It can be seen that the meal intake analysis module can
generally estimate the meal intake episode for Type-1 experiments in an unbiased manner, but
underestimates the episode for Type-2 experiments because during lunch people
sometimes converse for a while without feeding themselves.


7.4 Discussion
7.4.1 Restrictive Feature Selection

In the analysis in Section 7.3, we have used all the features listed in Table 7-1. While using
all the features yields the best detection performance, it may not be desirable in situations where
the available computational power is limited on devices such as smart phones, sensors, etc. In
order to evaluate the proposed algorithms in such limited resource scenarios, the features can be
ranked based on their discrimination power, which refers to information gain [90] defined as the
reduction in classification entropy (i.e., H(*)) with additional information provided by the
corresponding feature about the target classes. Assuming A as a feature and C as the set of target
classes, the following equations indicate class entropy before and after using the feature:

\[ H(C) = -\sum_{c \in C} p(c) \log_2 p(c) \tag{9} \]

\[ H(C \mid A) = -\sum_{a \in A} p(a) \sum_{c \in C} p(c \mid a) \log_2 p(c \mid a) \tag{10} \]

A feature with high discrimination power brings low class entropy when the feature is
utilized. This information gain-based feature selection algorithm has been used in [101] [102],
and proven to be effective.
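The information gain of a feature is the difference H(C) − H(C|A) between equations (9) and (10), which the Python sketch below computes for a discrete feature. The function names are hypothetical, and treating the feature as discrete is an assumption: continuous features such as those in Table 7-1 would first need to be discretized, and the discretization used in [101] [102] is not described here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(C): class entropy in bits, following equation (9)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(feature_values, labels):
    """H(C|A): class entropy after observing the feature, equation (10)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weight the entropy within each feature value by p(a).
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(feature_values, labels):
    """Reduction in class entropy provided by the feature."""
    return entropy(labels) - conditional_entropy(feature_values, labels)


# Toy labels with a perfectly informative and an uninformative feature.
labels = ["swallow", "swallow", "normal", "normal"]
print(information_gain(["high", "high", "low", "low"], labels))
print(information_gain(["x", "y", "x", "y"], labels))
```

Ranking features by this value and adding them one at a time gives a simple way to trade detection performance against computational cost.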

[Figure 7-9 plots: Precision and Recall (axis range 0.6 to 1) versus number of features (0 to 40), one panel each for Classifier 1, Classifier 2, and Classifier 3.]

Figure 7-9: Performance of Classifier-1, 2 and 3 with different feature count

Figure 7-9 shows the performance of Classifier-1, 2 and 3 when the number of
adopted features is changed. Note that the HMM stage is bypassed in this analysis, and SVM
[99] is used as the machine learning algorithm. Features are added into the feature set one by one
based on their discrimination power, evaluated using the mechanism suggested in [101] [102].
More specifically, the feature with the highest discrimination power is used first, and then the
feature with the second highest discrimination power is added, and so on. The results in the
figure represent the average performance for all the subjects using the subject dependent model
as described in Section 7.3.4.
As shown in Figure 7-9, the performance of Classifier-1 stabilizes with 25 or more features,
while the performance of Classifier-2 stabilizes with 20 or more, and the performance of
Classifier-3 stabilizes with 10 or more features.


These results can be used to determine how many and which features should be used based
on how much computational resources are available in the target platform such as a smart phone
or sensor mote.
7.4.2

Benefits of Hierarchical Classifier

We have evaluated the performance difference between the hierarchical classifier proposed
in Section 7.2 and a corresponding single-stage classifier with the same classification
objective. In the 3-stage hierarchical classifier, the first classifier separates normal breathing
cycles from non-normal breathing cycles (breathing cycles with talking, solid swallows, or
liquid swallows). The second classifier separates talking from swallowing (both solid and liquid
swallows), and the third classifier separates solid swallows from liquid swallows.
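The cascade can be sketched as follows, assuming each stage is available as a trained binary decision function. The three callables, the feature names, and the thresholds in the usage lines are hypothetical stand-ins for the trained SVMs, chosen only to make the example runnable.

```python
def classify_breathing_cycle(features, is_non_normal, is_swallow, is_liquid):
    """Route one breathing cycle through three binary decisions in sequence."""
    if not is_non_normal(features):   # Classifier-1: normal vs. non-normal
        return "normal"
    if not is_swallow(features):      # Classifier-2: talking vs. swallowing
        return "talking"
    # Classifier-3: solid vs. liquid swallow
    return "liquid swallow" if is_liquid(features) else "solid swallow"


# Hypothetical stand-ins for the three trained classifiers.
stage1 = lambda f: f["apnea_length"] > 0.5
stage2 = lambda f: f["speech_energy"] < 0.2
stage3 = lambda f: f["apnea_length"] < 1.0

label = classify_breathing_cycle(
    {"apnea_length": 0.8, "speech_energy": 0.1}, stage1, stage2, stage3
)
```

A later stage is consulted only when the earlier stage passes the cycle on, so each classifier solves a two-class problem.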
A corresponding single-stage classifier would assign each cycle directly to one of four
classes: normal breathing cycles, breathing cycles with talking, solid swallows, and liquid
swallows. Such a classifier would use the features extracted by the Preprocessing module in
Figure 7-2, and apply them to a single machine learning algorithm. We have used SVM for both
the hierarchical classifier and the single-stage classifier; their performance is compared in
Table 7-3.
Table 7-3: Comparison of 3-Stage Hierarchical Classifier and Single Classifier

            3-stage hierarchical classifier               Single classifier
Classifier  Precision (%)  Recall (%)  F-measure (%)      Precision (%)  Recall (%)  F-measure (%)
1           84.5           79.6        82.0               85.7           70.9        77.5
2           82.1           82.9        82.5               73.2           84.6        78.5
3           87.0           87.5        87.2               64.7           96.2        77.4


The performance of each classifier in the 3-stage hierarchical classification method is reported
individually. For the single classifier, the performance is mapped to the corresponding stages of
the hierarchical method. For example, the performance of the single classifier compared to
Classifier-1 (i.e., the one that differentiates between normal and non-normal breathing cycles)
in the hierarchical method was computed by combining breathing cycles with talking, solid
swallows and liquid swallows into a single category of non-normal breathing cycles. Comparing
the F-measures of the two solutions shows that the proposed 3-stage hierarchical classifier
consistently outperforms the single classifier solution, which justifies its additional
complexity.
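The F-measure used for this comparison is the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it for the hierarchical rows of Table 7-3. The function name is illustrative, and values recomputed from the rounded table entries may differ from the reported ones in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) of the hierarchical classifier, from Table 7-3.
hierarchical = {1: (84.5, 79.6), 2: (82.1, 82.9), 3: (87.0, 87.5)}
recomputed = {stage: round(f_measure(p, r), 1)
              for stage, (p, r) in hierarchical.items()}
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier with very high recall but low precision (such as the single classifier at stage 3) still receives a modest F-measure.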
7.4.3 Performance of Existing Research

Passler et al. [32] achieved 91.3% precision and 81.8% recall with an in-ear microphone,
although only solid swallows were considered and artifacts were excluded in their work. Amft et
al. [39] used inertial sensors to track arm and trunk movement, an ear microphone to record
food breakdown sounds, and surface electromyography (SEMG) electrodes together with a
stethoscope microphone to detect swallowing activities. Their detection of individual swallows
achieved 20% precision and 68% recall. Makeyev et al. [34] deployed a throat microphone
located over the laryngopharynx to detect swallow events, and reported an average accuracy of
66.7% for the inter-subject model (cross-validation). In this work, swallows are detected by
cascading Classifier-1 and Classifier-2 using the SVM+HMM scheme, for which the combined
precision and recall (derived using the method described in [103], [104]) are 71.3% and 60.6%,
respectively. Based on these observations, it can be concluded that the proposed detection
algorithms offer a competitive mealtime estimation method with a new sensing modality.

7.4.4 Spontaneous Swallows

Spontaneous swallows were identified and handled in the reported system. A spontaneous
swallow is a protective aero-digestive reflex for airway protection, caused by accumulated
saliva and/or food remnants in the mouth [105]. For healthy subjects, the frequency of
spontaneous swallows is about 1.22 per minute [106].
In our experiments, the 10-minute rest within each experiment session was used to
evaluate how well spontaneous swallows are handled. As shown in Figure 7-6(a), spontaneous
swallows during resting were sometimes misdetected as talking, and as shown in Figure 7-6(b),
the majority of them were detected as liquid swallows. However, because spontaneous swallows
are relatively infrequent and the meal intake analysis module depends only on solid swallows,
spontaneous swallows generally did not affect the detection performance for meal intake
episodes.
7.5 Conclusion

This chapter presents a wearable sensor system for mealtime and duration detection based on
breathing signal and hand movement analysis. Unlike previous research, the proposed
mechanism not only detects individual swallows, but also detects mealtime and duration
information. Experiments were carried out on 14 subjects, considering various artifacts that
affect the breathing signal, such as spontaneous swallows, talking, laughing, coughing, and
throat clearing. Experimental results show that the proposed system and mechanism are
effective for mealtime and duration monitoring.


Chapter 8: Proposed Work

In this thesis it was demonstrated that piezoelectric and RIP-based sensor systems, together
with the proposed intake monitoring algorithms, are able to detect swallow events, mealtime,
and duration by observing a person’s breathing signal. We first detect swallows by detecting
apneas extracted from the breathing signal captured by a wearable wireless chest belt.
Afterwards, swallow pattern analysis is used for identifying swallows. Lastly, mealtime and
duration detection is performed. Together with self-reporting of overall diet habits at a high
level (i.e., the types of food and drinks), the instrumented detection of swallow counts can
offer an objective way to: 1) study food and drink intake trends, and 2) estimate calorie intake.
Building on the work done so far, we propose the following future research items.
8.1 Diet Volume Detection

To the best of our knowledge, most of the existing diet monitoring methods and algorithms
focus on detecting swallowing events. The evaluation metrics include precision, recall and
sensitivity. Although swallow counts are generally correlated with the volume of food
consumed, the actual bolus volume of each swallow may vary over time and across subjects.
Consequently, detecting swallow events without considering the volume of food intake may
introduce inaccuracy in estimating energy intake.
According to our experiments in Chapter 7, the volume for each swallow varies
significantly across subjects, and also depends on the nature of the food. The following methods
may be adopted for volume detection:
1) Performing controlled experiments on subjects with various types of food to obtain the
volume per swallow.

2) Using the swallow detection mechanism proposed to count the number of swallows.
Together with the self-report of food type, the total volume of food can be estimated.
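The two steps above amount to multiplying each swallow count by a calibrated per-swallow volume and summing over food types. The sketch below illustrates this arithmetic only: the function name, the food types, and the calibration values are hypothetical and are not measurements from this work.

```python
def estimate_total_volume_ml(swallow_counts, volume_per_swallow_ml):
    """Sum count * calibrated volume per swallow over all reported food types."""
    return sum(count * volume_per_swallow_ml[food]
               for food, count in swallow_counts.items())


# Hypothetical per-subject calibration (step 1) and detected counts (step 2).
calibration_ml = {"water": 15.0, "rice": 5.0}
detected_swallows = {"water": 10, "rice": 20}

total_ml = estimate_total_volume_ml(detected_swallows, calibration_ml)
```

Because the per-swallow volume varies across subjects, the calibration table would need to be obtained per subject (or per group of similar subjects) rather than shared globally.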
8.2 Choking and Coughing Detection

According to the National Safety Council [107], choking is the third leading cause of home
injury death for adults over 76, and the second for people over 89. Choking can also be fatal
to children under 1 year old. Many factors can lead to choking, such as improperly chewed
food, drinking alcohol, Parkinson’s disease, etc.
Choking is the obstruction of the flow of air into the lungs, and it prevents breathing.
Prolonged choking can result in asphyxia, which leads to anoxia and is potentially fatal.
Measures need to be taken within minutes, before the oxygen stored in the blood and lungs is
depleted.
As shown in Chapter 3, holding one’s breath can cause a constant output from the proposed
chest belt systems. While choking, a victim may move irregularly, causing motion effects in the
signal. Therefore, choking can potentially be detected by analyzing the components of the
breathing signal at different frequencies.
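One way to examine the frequency content is to compare the signal energy in a respiration band with the energy in other bands. The sketch below is only an illustration under stated assumptions: the 0.1 to 0.5 Hz respiration band, the 4 Hz sampling rate, and the synthetic signals are hypothetical and are not taken from the experiments in this work. A flat (breath-hold) segment has no energy in the respiration band, whereas a regular breathing segment does.

```python
import cmath
import math


def band_energy(signal, sample_rate_hz, low_hz, high_hz):
    """Sum of squared DFT magnitudes for bins with low_hz <= f < high_hz."""
    n = len(signal)
    mean = sum(signal) / n  # remove the constant offset before transforming
    energy = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        if low_hz <= freq < high_hz:
            coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(signal))
            energy += abs(coeff) ** 2
    return energy


# Illustrative signals sampled at 4 Hz for 40 s (all values are assumptions).
RATE_HZ = 4
breathing = [math.sin(2 * math.pi * 0.25 * i / RATE_HZ) for i in range(160)]
breath_hold = [1.0] * 160

breathing_band = band_energy(breathing, RATE_HZ, 0.1, 0.5)
held_band = band_energy(breath_hold, RATE_HZ, 0.1, 0.5)
```

The same function can be evaluated over a higher band to quantify motion effects; suitable band limits and decision thresholds would have to be determined from collected data.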
Coughing is a protective reflex that clears the respiratory passages. It can be divided into
three phases: (a) inhalation, (b) forced exhalation against a closed glottis, and (c) violent
release of air from the lungs with opening of the glottis. Coughing is a common symptom of
many diseases, such as viral and bacterial infections, respiratory tract infection, asthma,
gastroesophageal reflux disease, etc.


[Figure 8-1 sketch: chest belt output versus time, with the three cough phases marked. Phase (a): inhalation; Phase (b): forced exhalation against a closed glottis; Phase (c): violent release of air from lungs with opening of glottis. Panel (a) shows one cough; panel (b) shows two consecutive coughs.]

Figure 8-1: Anticipated output of chest belts during coughing: (a) one cough within a
breathing cycle; (b) two consecutive coughs in a breathing cycle

Figure 8-1 shows the anticipated output of the chest belts during coughing, based on the
fact that coughing has three phases. If a similar breathing signal is observed during data
collection, we should be able to detect coughing using a matched-filter-based mechanism or a
machine learning mechanism.
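A template-matching detector of this kind can be sketched as follows. The sketch uses normalized cross-correlation, a normalized variant of the matched filter, and is not the detector developed in this work: the function names, the 0.9 threshold, and the five-sample cough template are hypothetical and would have to be replaced by a template derived from recorded data.

```python
import math


def normalized_correlation(signal, template):
    """Correlation coefficient between the template and each signal window."""
    m = len(template)
    t_mean = sum(template) / m
    t_centered = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(t * t for t in t_centered))
    scores = []
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        w_mean = sum(window) / m
        w_centered = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_centered))
        if w_norm == 0 or t_norm == 0:
            scores.append(0.0)  # flat window: no shape to compare
        else:
            dot = sum(a * b for a, b in zip(w_centered, t_centered))
            scores.append(dot / (w_norm * t_norm))
    return scores


def detect_events(signal, template, threshold=0.9):
    """Return the start indices of windows whose score reaches the threshold."""
    scores = normalized_correlation(signal, template)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical template: rise (inhalation), peak, sharp release.
cough_template = [0.0, 1.0, 3.0, -2.0, 0.0]
recording = [0.0] * 5 + cough_template + [0.0] * 5
cough_starts = detect_events(recording, cough_template)
```

Normalizing each window makes the score insensitive to signal offset and amplitude, which is useful when belt tightness differs between sessions.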


BIBLIOGRAPHY


[1] WHO, “WHO | Obesity and overweight,” WHO. [Online]. Available:
http://www.who.int/mediacentre/factsheets/fs311/en/. [Accessed: 26-Jan-2013].
[2] T. O. Cheng, “Fast food and obesity in China,” J. Am. Coll. Cardiol., vol. 42, no. 4, pp.
773–773, Aug. 2003.
[3] Y. Wu, “Overweight and obesity in China,” BMJ, vol. 333, Aug. 2006.
[4] J. Chhatwal, M. Verma, and S. Riar, “Obesity among pre-adolescent and adolescents of a
developing country (India).,” Asia Pac. J. Clin. Nutr., vol. 13, no. 3, pp. 231–235, 2004.
[5] A. Misra, R. Pandey, J. Devi, R. Sharma, N. Vikram, and N. Khanna, “High prevalence of
diabetes, obesity and dyslipidaemia in urban slum population in northern India.,” Int. J.
Obes. Relat. Metab. Disord. J. Int. Assoc. Study Obes., vol. 25, no. 11, pp. 1722–1729,
Nov. 2001.
[6] R. H. Eckel and R. M. Krauss, “American Heart Association Call to Action: Obesity as a
Major Risk Factor for Coronary Heart Disease,” Circulation, vol. 97, no. 21, pp. 2099–
2100, Jun. 1998.
[7] T. L. Visscher and J. C. Seidell, “The Public Health Impact of Obesity,” Annu. Rev. Public
Health, vol. 22, no. 1, pp. 355–375, 2001.
[8] H. Jia and E. I. Lubetkin, “The impact of obesity on health-related quality-of-life in the
general adult US population,” J. Public Health, vol. 27, no. 2, pp. 156–164, Jun. 2005.
[9] K. Weis, V. R. Taylor, and Centers for Disease Control and Prevention, Measuring Healthy
Days: Population Assessment of Health-Related Quality of Life.
[10] L. S. Nielsen, K. V. Danielsen, and T. I. A. Sørensen, “Short sleep duration as a possible
cause of obesity: critical analysis of the epidemiological evidence,” Obes. Rev., vol. 12, no.
2, pp. 78–92, 2011.
[11] P. Björntorp, “Do stress reactions cause abdominal obesity and comorbidities?,” Obes.
Rev., vol. 2, no. 2, pp. 73–86, 2001.
[12] A. Astrup, J. O. Hill, and S. Rössner, “The cause of obesity: are we barking up the wrong
tree?,” Obes. Rev., vol. 5, no. 3, pp. 125–127, 2004.
[13] E. Jéquier, “Pathways to obesity,” Int. J. Obes., vol. 26, no. Suppl2, pp. S12–S17, 2002.


[14] V. A. Vance, S. J. Woodruff, L. J. McCargar, J. Husted, and R. M. Hanning, “Self-reported
dietary energy intake of normal weight, overweight and obese adolescents,” Public Health
Nutr., vol. 12, no. 2, pp. 222–227, Feb. 2009.
[15] D. A. Schoeller, “Limitations in the assessment of dietary energy intake by self-report,”
Metabolism, vol. 44, Supplement 2, pp. 18–22, Feb. 1995.
[16] C. D. Samuel-Hodge, L. M. Fernandez, C. F. Henríquez-Roldán, L. F. Johnston, and T. C.
Keyserling, “A comparison of self-reported energy intake with total energy expenditure
estimated by accelerometer and basal metabolic rate in African-American women with
type 2 diabetes,” Diabetes Care, vol. 27, no. 3, pp. 663–669, Mar. 2004.
[17] B. Dong and S. Biswas, “Wearable networked sensing for human mobility and activity
analytics: A systems study,” in 2012 Fourth International Conference on Communication
Systems and Networks (COMSNETS), 2012, pp. 1–6.
[18] U. Maurer, A. Smailagic, D. P. Siewiorek, and M. Deisher, “Activity recognition and
monitoring using multiple sensors on different body positions,” presented at the
International Workshop on Wearable and Implantable Body Sensor Networks, 2006. BSN
2006, 2006, p. 4 pp. –116.
[19] M. Sun and J. O. Hill, “A method for measuring mechanical work and work efficiency
during human activities,” J. Biomech., vol. 26, no. 3, pp. 229–241, Mar. 1993.
[20] S. E. Crouter, K. G. Clowers, and D. R. Bassett Jr, “A novel method for using
accelerometer data to predict energy expenditure,” J. Appl. Physiol. Bethesda Md 1985,
vol. 100, no. 4, pp. 1324–1331, Apr. 2006.
[21] U. Varshney, “Pervasive Healthcare and Wireless Health Monitoring,” Mob Netw Appl, vol.
12, no. 2–3, pp. 113–127, Mar. 2007.
[22] R. O. Dantas, M. K. Kern, B. T. Massey, W. J. Dodds, P. J. Kahrilas, J. G. Brasseur, I. J.
Cook, and I. M. Lang, “Effect of swallowed bolus variables on oral and pharyngeal phases
of swallowing,” Am. J. Physiol., vol. 258, no. 5 Pt 1, pp. G675–681, May 1990.
[23] K. Zhang, F. X. Pi-Sunyer, and C. N. Boozer, “Improving energy expenditure estimation
for physical activity,” Med. Sci. Sports Exerc., vol. 36, no. 5, pp. 883–889, May 2004.
[24] J. Gates, G. G. Hartnell, and G. D. Gramigna, “Videofluoroscopy and Swallowing Studies
for Neurologic Disease: A Primer,” Radiographics, vol. 26, no. 1, pp. e22–e22, Jan. 2006.
[25] A. L. Perlman, P. M. Palmer, T. M. McCulloch, and D. J. Vandaele, “Electromyographic
activity from human laryngeal, pharyngeal, and submental muscles during swallowing,” J.
Appl. Physiol. Bethesda Md 1985, vol. 86, no. 5, pp. 1663–1669, May 1999.
[26] B. Dong and S. Biswas, “Swallow monitoring through apnea detection in breathing signal,”
presented at the 2012 Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), 2012, pp. 6341–6344.

[27] S. Passler and W.-J. Fischer, “Food Intake Activity Detection Using a Wearable
Microphone System,” in 2011 7th International Conference on Intelligent Environments
(IE), 2011, pp. 298–301.
[28] K. Takahashi, M. E. Groher, and K. Michi, “Methodology for detecting swallowing
sounds,” Dysphagia, vol. 9, no. 1, pp. 54–62, Dec. 1994.
[29] O. Amft and G. Troster, “Methods for Detection and Classification of Normal Swallowing
from Muscle Activation and Sound,” presented at the Pervasive Health Conference and
Workshops, 2006, 2006, pp. 1–10.
[30] J. A. Cichero and B. E. Murdoch, “The physiologic cause of swallowing sounds: answers
from heart sounds and vocal tract acoustics,” Dysphagia, vol. 13, no. 1, pp. 39–52, 1998.
[31] E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time
series,” in ICDM 2001, Proceedings IEEE International Conference on Data Mining,
2001, 2001, pp. 289–296.
[32] S. Passler and W.-J. Fischer, “Food Intake Activity Detection Using a Wearable
Microphone System,” presented at the 2011 7th International Conference on Intelligent
Environments (IE), 2011, pp. 298–301.
[33] J. Nishimura and T. Kuroda, “Eating habits monitoring using wireless wearable in-ear
microphone,” in 3rd International Symposium on Wireless Pervasive Computing, 2008.
ISWPC 2008, 2008, pp. 130–132.
[34] O. Makeyev, P. Lopez-Meyer, S. Schuckers, W. Besio, and E. Sazonov, “Automatic food
intake detection based on swallowing sounds,” Biomed. Signal Process. Control, vol. 7,
no. 6, pp. 649–656, Nov. 2012.
[35] W. P. Walker and D. Bhatia, “Towards automated ingestion detection: swallow sounds,”
Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., vol. 2011, pp. 7075–7078, 2011.
[36] E. Sazonov, S. Schuckers, P. Lopez-Meyer, O. Makeyev, N. Sazonova, E. L. Melanson, and
M. Neuman, “Non-invasive monitoring of chewing and swallowing for objective
quantification of ingestive behavior,” Physiol. Meas., vol. 29, no. 5, pp. 525–541, May
2008.
[37] M. Aboofazeli and Z. Moussavi, “Analysis of swallowing sounds using hidden Markov
models,” Med. Biol. Eng. Comput., vol. 46, no. 4, pp. 307–314, Apr. 2008.
[38] O. Amft, H. Junker, and G. Troster, “Detection of eating and drinking arm gestures using
inertial body-worn sensors,” in Ninth IEEE International Symposium on Wearable
Computers, 2005. Proceedings, 2005, pp. 160–163.
[39] O. Amft and G. Tröster, “Recognition of dietary activity events using on-body sensors,”
Artif. Intell. Med., vol. 42, no. 2, pp. 121–136, Feb. 2008.


[40] L. Mioche, P. Bourdiol, J. F. Martin, and Y. Noël, “Variations in human masseter and
temporalis muscle activity related to food texture during free and side-imposed
mastication,” Arch. Oral Biol., vol. 44, no. 12, pp. 1005–1012, Dec. 1999.
[41] C. S. Holger Nahrstaedt, “Swallow Detection Algorithm Based on Bioimpedance and EMG
Measurements,” pp. 91 – 96, 2012.
[42] Y. Dong, A. Hoover, J. Scisco, and E. Muth, “A new method for measuring meal intake in
humans via automated wrist motion tracking,” Appl. Psychophysiol. Biofeedback, vol. 37,
no. 3, pp. 205–215, Sep. 2012.
[43] A. Moreau-Gaudry, A. Sabil, G. Benchetrit, and A. Franco, “Use of
Respiratory Inductance Plethysmography for the Detection of Swallowing in the Elderly,”
Dysphagia, vol. 20, no. 4, pp. 297–302, Oct. 2005.
[44] S. Damouras, E. Sejdic, C. M. Steele, and T. Chau, “An Online Swallow Detection
Algorithm Based on the Quadratic Variation of Dual-Axis Accelerometry,” IEEE Trans.
Signal Process., vol. 58, no. 6, pp. 3352–3359, 2010.
[45] E. Sejdic, C. M. Steele, and T. Chau, “Segmentation of Dual-Axis Swallowing
Accelerometry Signals in Healthy Subjects With Analysis of Anthropometric Effects on
Duration of Swallowing Activities,” IEEE Trans. Biomed. Eng., vol. 56, no. 4, pp. 1090–
1097, 2009.
[46] A. Kandori, T. Yamamoto, Y. Sano, M. Oonuma, T. Miyashita, M. Murata, and S. Sakoda,
“Simple Magnetic Swallowing Detection System,” IEEE Sens. J., vol. 12, no. 4, pp. 805–
811, 2012.
[47] Y. Saeki and F. Takeda, “Proposal of Food Intake Measuring System in Medical Use and
Its Discussion of Practical Capability,” in Knowledge-Based Intelligent Information and
Engineering Systems, R. Khosla, R. J. Howlett, and L. C. Jain, Eds. Springer Berlin
Heidelberg, 2005, pp. 1266–1273.
[48] K. Chang, S. Liu, H. Chu, J. Y. Hsu, C. Chen, T. Lin, C. Chen, and P. Huang, “The
Diet-Aware Dining Table: Observing Dietary Behaviors over a Tabletop Surface,” in Pervasive
Computing, K. P. Fishkin, B. Schiele, P. Nixon, and A. Quigley, Eds. Springer Berlin
Heidelberg, 2006, pp. 366–382.
[49] B. Dong and S. Biswas, “Swallow monitoring through apnea detection in breathing signal,”
in 2012 Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), 2012, pp. 6341–6344.
[50] J. E. Hall and A. C. Guyton, Textbook of medical physiology. Philadelphia, Pa.; London:
Saunders, 2010.
[51] G. Grimby, J. Bunn, and J. Mead, “Relative contribution of rib cage and abdomen to
ventilation during exercise.,” J. Appl. Physiol., vol. 24, no. 2, pp. 159–166, Feb. 1968.


[52] K. Ashutosh, R. Gilbert, J. Auchincloss, and D. Peppi, “Asynchronous breathing
movements in patients with chronic obstructive pulmonary disease.,” CHEST J., vol. 67,
no. 5, pp. 553–557, May 1975.
[53] O. Amft and G. Troster, “Methods for Detection and Classification of Normal Swallowing
from Muscle Activation and Sound,” in Pervasive Health Conference and Workshops,
2006, 2006, pp. 1–10.
[54] B. J. Martin, J. A. Logemann, R. Shaker, and W. J. Dodds, “Coordination between
respiration and swallowing: respiratory phase relationships and temporal integration,” J.
Appl. Physiol. Bethesda Md 1985, vol. 76, no. 2, pp. 714–723, Feb. 1994.
[55] M. S. Klahn and A. L. Perlman, “Temporal and durational patterns associating respiration
and swallowing,” Dysphagia, vol. 14, no. 3, pp. 131–138, 1999.
[56] H. G. Preiksaitis and C. A. Mills, “Coordination of breathing and swallowing: effects of
bolus consistency and presentation in normal adults,” J. Appl. Physiol. Bethesda Md 1985,
vol. 81, no. 4, pp. 1707–1714, Oct. 1996.
[57] H. Nilsson, O. Ekberg, R. Olsson, O. Kjellin, and B. Hindfelt, “Quantitative assessment of
swallowing in healthy adults,” Dysphagia, vol. 11, no. 2, pp. 110–116, 1996.
[58] H. G. Dietz, “Pneumatic breathing belt sensor with minimum space maintaining tapes,”
4602643, 29-Jul-1986.
[59] P. Corbishley and E. Rodriguez-Villegas, “Breathing Detection: Towards a Miniaturized,
Wearable, Battery-Operated Monitoring System,” IEEE Trans. Biomed. Eng., vol. 55, no.
1, pp. 196–204, 2008.
[60] M. Y.-W. Chia, S. W. Leong, C. K. Sim, and K. M. Chan, “Through-wall UWB radar
operating within FCC’s mask for sensing heart beat and breathing rate,” in Microwave
Conference, 2005 European, 2005, vol. 3, p. 4 pp.–.
[61] K. Mukai, Y. Yonezawa, H. Ogawa, H. Maki, and W. M. Caldwell, “A remote monitor of
bed patient cardiac vibration, respiration and movement,” Conf. Proc. Annu. Int. Conf.
IEEE Eng. Med. Biol. Soc. IEEE Eng. Med. Biol. Soc. Conf., vol. 2009, pp. 5191–5194,
2009.
[62] S. E. Crouter, K. G. Clowers, and D. R. Bassett Jr, “A novel method for using
accelerometer data to predict energy expenditure,” J. Appl. Physiol. Bethesda Md 1985,
vol. 100, no. 4, pp. 1324–1331, Apr. 2006.
[63] A. M. Swartz, S. J. Strath, D. R. Bassett Jr, W. L. O’Brien, G. A. King, and B. E.
Ainsworth, “Estimation of energy expenditure using CSA accelerometers at hip and wrist
sites,” Med. Sci. Sports Exerc., vol. 32, no. 9 Suppl, pp. S450–456, Sep. 2000.


[64] M. H. Jones, R. Goubran, and F. Knoefel, “Reliable respiratory rate estimation from a bed
pressure array,” Conf. Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. IEEE Eng. Med.
Biol. Soc. Conf., vol. 1, pp. 6410–6413, 2006.
[65] M. Holtzman, D. Townsend, R. Goubran, and F. Knoefel, “Breathing sensor selection
during movement,” Conf. Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. IEEE Eng.
Med. Biol. Soc. Conf., vol. 2011, pp. 381–384, 2011.
[66] A. Bates, M. Ling, C. Geng, A. Turk, and D. K. Arvind, “Accelerometer-Based Respiratory
Measurement During Speech,” in 2011 International Conference on Body Sensor Networks
(BSN), 2011, pp. 95–100.
[67] A. Jin, B. Yin, G. Morren, H. Duric, and R. M. Aarts, “Performance evaluation of a tri-axial
accelerometry-based respiration monitoring for ambient assisted living,” Conf. Proc. Annu.
Int. Conf. IEEE Eng. Med. Biol. Soc. IEEE Eng. Med. Biol. Soc. Conf., vol. 2009, pp.
5677–5680, 2009.
[68] P. D. Hung, S. Bonnet, R. Guillemaud, E. Castelli, and P. T. N. Yen, “Estimation of
respiratory waveform using an accelerometer,” in 5th IEEE International Symposium on
Biomedical Imaging: From Nano to Macro, 2008. ISBI 2008, 2008, pp. 1493–1496.
[69] M. H. T. Reinvuo, “Measurement of respiratory rate with high-resolution accelerometer and
emfit pressure sensor,” pp. 192 – 195, 2006.
[70] B. B. Koo, C. Drummond, S. Surovec, N. Johnson, S. A. Marvin, and S. Redline,
“Validation of a Polyvinylidene Fluoride Impedance Sensor for Respiratory Event
Classification during Polysomnography,” J. Clin. Sleep Med., Oct. 2011.
[71] K. P. Sau, D. Khastgir, and T. K. Chaki, “Electrical conductivity of carbon black and
carbon fibre filled silicone rubber composites,” Angew. Makromol. Chem., vol. 258, no. 1,
pp. 11–17, 1998.
[72] G. Gautschi, Piezoelectric Sensorics: Force, Strain, Pressure, Acceleration and Acoustic
Emission Sensors, Materials and Amplifiers. Springer, 2002.
[73] G. G. Mazeika, “Respiratory Inductance Plethysmography An Introduction,” 2007.
[74] R. O. Dantas, M. K. Kern, B. T. Massey, W. J. Dodds, P. J. Kahrilas, J. G. Brasseur, I. J.
Cook, and I. M. Lang, “Effect of swallowed bolus variables on oral and pharyngeal phases
of swallowing,” Am. J. Physiol., vol. 258, no. 5 Pt 1, pp. G675–681, May 1990.
[75] G. Turin, “An introduction to matched filters,” IRE Trans. Inf. Theory, vol. 6, no. 3,
pp. 311–329, Jun. 1960.
[76] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA
data mining software: an update,” SIGKDD Explor Newsl, vol. 11, no. 1, pp. 10–18, Nov.
2009.


[77] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Wiley-Interscience,
2000.
[78] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing (3rd Edition), 3rd
ed. Prentice Hall, 2009.
[79] M. A. Hall and G. Holmes, “Benchmarking attribute selection techniques for discrete class
data mining,” IEEE Trans. Knowl. Data Eng., vol. 15, no. 6, pp. 1437–1447, 2003.
[80] L. Tarassenko, A Guide to Neural Computing Applications. John Wiley & Sons, 1998.
[81] A. K. Jain, J. Mao, and K. M. Mohiuddin, “Artificial neural networks: a tutorial,”
Computer, vol. 29, no. 3, pp. 31–44, 1996.
[82] L. Rokach, Data Mining with Decision Trees: Theory and Applications. World Scientific,
2008.
[83] B. Dong and S. Biswas, “Noninvasive wearable diet monitoring through breathing signal
analysis,” presented at the SPIE Defense, Security, and Sensing, Baltimore, Maryland,
United States, 2013.
[84] B. Dong and S. Biswas, “Liquid Intake Monitoring Through Breathing Signal Using
Machine Learning,” in SPIE Defense, Security, and Sensing, Baltimore, Maryland, United
States, 2013.
[85] A. J. Wilson, C. I. Franks, and I. L. Freeston, “Algorithms for the detection of breaths from
respiratory waveform recordings of infants,” Med. Biol. Eng. Comput., vol. 20, no. 3, pp.
286–292, May 1982.
[86] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A Training Algorithm for Optimal Margin
Classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning
Theory, 1992, pp. 144–152.
[87] G. Wahba, Support Vector Machines, Reproducing Kernel Hilbert Spaces
and the Randomized GACV. 1998.
[88] L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech
recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[89] L. E. Baum and J. A. Eagon, “An inequality with applications to statistical estimation for
probabilistic functions of Markov processes and to a model for ecology,” Bull. Am. Math.
Soc., vol. 73, no. 3, pp. 360–363, 1967.
[90] M. A. Hall and G. Holmes, “Benchmarking attribute selection techniques for discrete class
data mining,” IEEE Trans. Knowl. Data Eng., vol. 15, no. 6, pp. 1437–1447, 2003.


[91] Y. Ma, E. R. Bertone, E. J. Stanek, G. W. Reed, J. R. Hebert, N. L. Cohen, P. A. Merriam,
and I. S. Ockene, “Association between Eating Patterns and Obesity in a Free-living US
Adult Population,” Am. J. Epidemiol., vol. 158, no. 1, pp. 85–92, Jul. 2003.
[92] M. E. Gluck, C. A. Venti, A. D. Salbe, and J. Krakoff, “Nighttime eating: commonly
observed and related to weight gain in an inpatient food intake study,” Am. J. Clin. Nutr.,
vol. 88, no. 4, pp. 900–905, Oct. 2008.
[93] J. Cleator, J. Abbott, P. Judd, C. Sutton, and J. P. H. Wilding, “Night eating syndrome:
implications for severe obesity,” Nutr. Diabetes, vol. 2, no. 9, p. e44, Sep. 2012.
[94] G. S. Andersen, A. J. Stunkard, T. I. A. Sørensen, L. Petersen, and B. L. Heitmann, “Night
eating and weight change in middle-aged men and women,” Int. J. Obes. Relat. Metab.
Disord. J. Int. Assoc. Study Obes., vol. 28, no. 10, pp. 1338–1343, Oct. 2004.
[95] N. L. Rogers, D. F. Dinges, K. C. Allison, G. Maislin, N. Martino, J. P. O’Reardon, and A.
J. Stunkard, “Assessment of sleep in women with night eating syndrome,” Sleep, vol. 29,
no. 6, pp. 814–819, Jun. 2006.
[96] S. Sassaroli, G. M. Ruggiero, P. Vinai, S. Cardetti, G. Carpegna, N. Ferrato, P. Vallauri, D.
Masante, S. Scarone, S. Bertelli, R. Bidone, L. Busetto, and S. Sampietro, “Daily and
nightly anxiety among patients affected by night eating syndrome and binge eating
disorder,” Eat. Disord., vol. 17, no. 2, pp. 140–145, Apr. 2009.
[97] B. Dong and S. Biswas, “Wearable Diet Monitoring through Breathing Signal Analysis,”
presented at the 2013 Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), Osaka, Japan, 2013.
[98] B. Dong, S. Biswas, R. Gernhardt, and J. Schlemminger, “A Mobile Food Intake
Monitoring System Based on Breathing Signal Analysis,” in Proceedings of the 8th
International Conference on Body Area Networks, ICST, Brussels, Belgium,
2013, pp. 165–168.
[99] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A Training Algorithm for Optimal Margin
Classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning
Theory, New York, NY, USA, 1992, pp. 144–152.
[100] D. F. Balph and M. H. Balph, “On the Psychology of Watching Birds: The Problem of
Observer-Expectancy Bias,” The Auk, vol. 100, no. 3, pp. 755–757, Jul. 1983.
[101] H. Liu, J. Sun, L. Liu, and H. Zhang, “Feature selection with dynamic mutual
information,” Pattern Recognit., vol. 42, no. 7, pp. 1330–1339, Jul. 2009.
[102] C. S. Dhir, N. Iqbal, and S.-Y. Lee, “Efficient feature selection based on information gain
criterion for face recognition,” in International Conference on Information Acquisition,
2007. ICIA ’07, 2007, pp. 523–527.


[103] H. B. Borges, C. N. Silla Jr., and J. C. Nievola, “An evaluation of global-model
hierarchical classification algorithms for hierarchical classification problems with single
path of labels,” Comput. Math. Appl., vol. 66, no. 10, pp. 1991–2002, Dec. 2013.
[104] E. P. Costa, A. C. Lorena, A. C. P. L. F. Carvalho, and A. A. Freitas, A Review of
Performance Evaluation Measures for Hierarchical Classifiers.
[105] M. A. Crary, G. D. Carnaby, I. Sia, A. Khanna, and M. F. Waters, “Spontaneous
swallowing frequency has potential to identify dysphagia in acute stroke,” Stroke J. Cereb.
Circ., vol. 44, no. 12, pp. 3452–3457, Dec. 2013.
[106] M. Pehlivan, N. Yüceyar, C. Ertekin, G. Çelebi, M. Ertaş, T. Kalayci, and I. Aydoğdu, “An
electronic device measuring the frequency of spontaneous swallowing: Digital
Phagometer,” Dysphagia, vol. 11, no. 4, pp. 259–264, Sep. 1996.
[107] National Safety Council. [Online]. Available:
http://www.nsc.org/safety_home/HomeandRecreationalSafety/Pages/Choking.aspx.
[Accessed: 26-Nov-2013].
