This is to certify that the dissertation entitled ON PREDICTION AND DETECTION OF EPILEPTIC SEIZURES BY MEANS OF GENETIC PROGRAMMING ARTIFICIAL FEATURES presented by HIRAM ALEXER FIRPI has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering. Major Professor's Signature. Date.

ON PREDICTION AND DETECTION OF EPILEPTIC SEIZURES BY MEANS OF GENETIC PROGRAMMING ARTIFICIAL FEATURES

By

Hiram Alexer Firpi

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical and Computer Engineering

2005

ABSTRACT

ON PREDICTION AND DETECTION OF EPILEPTIC SEIZURES BY MEANS OF GENETIC PROGRAMMING ARTIFICIAL FEATURES

By Hiram Alexer Firpi

This work presents a novel, general-purpose algorithm called Genetic Programming Artificial Features (GPAF), which consists of a genetic programming (GP) algorithm and a k-nearest neighbor classifier, and which surpasses the performance of another recently published method, called Genetically Found, Neurally Computed Artificial Features, for addressing similar classes of problems. Unlike conventional features, which are designed based on human knowledge, experience, and/or intuition, the artificial features (i.e., features that are computer-crafted and may not have a known physical meaning) are systematically and automatically designed by a computer from the data provided. In this dissertation, we apply the GPAF algorithm to one of the most puzzling brain-disorder problems: the prediction and detection of epileptic seizures. Epilepsy is a neurological condition that makes people susceptible to brief electrical disturbances in the brain, producing a change in sensation, awareness, and/or behavior, and is characterized by recurrent seizures. It affects up to one percent of the worldwide population, or sixty million people, and 25% of cases cannot be fully controlled by current pharmacological or surgical treatment. The possibility that an implantable device might eventually warn patients of an impending seizure is of utmost importance, allowing on-the-spot medication or safety measures. Epileptic electroencephalographic (EEG) signals were treated from a chaos-theory perspective. First, we reconstructed the EEG state-space trajectories via a delay-embedding scheme. Then these pseudo-state-space vectors were input to a genetic programming algorithm, which designed one or more (non)linear features providing an artificial space where the baseline (nonseizure data) and preictal (preseizure data, or ictal data in the case of detection) classes are sufficiently separated for a classifier to achieve better accuracy than using principal components analysis, our benchmark feature extractor.
The GPAF algorithm was applied to data segments extracted from 730 hours of EEG recordings obtained from seven patients. The machine automatically discovered one or more patient-specific features that predicted epileptic seizures with a time horizon of one to five minutes before the unequivocal electrographic onset of each seizure. Results showed that 43 of 55 seizures were correctly predicted, for a 78.18% correct classification rate, while 55 of 59 epochs representative of baseline conditions were classified correctly, for a low false-positive rate of 0.0508 per hour. In the case of detection, a low false-positives-per-hour rate and a high detection rate were also achieved. A generic (cross-patient) model for prediction of epileptic seizures was also found, at the expense of decreased performance, with an average of 69.09% sensitivity. The GPAF algorithm was additionally investigated as a means to design seizure detectors. Evaluation over 730 hours of EEG recordings showed that with customized, artificially designed detectors, 83 of 86 seizures were detected. Seven previously unreported seizures were also detected in this work.

[Dedication page, partially illegible in the source: "To Hiram, Elsira, Amed ... people with whom I have ..."]

"Discovery consists of seeing what everybody has seen and thinking what nobody has thought"
—Albert von Szent-Gyorgyi

A good feature extractor consists of "seeing" what every method has seen and computing what no method has computed.

ACKNOWLEDGMENTS

First, I want to thank Dr. Erik Goodman for his advice, dedication, and time as my advisor. Second, to Dr. Javier Echauz, who has been not just a member of the Ph.D. committee but also a friend. I want to thank him for his dedication, motivation, and time to teach me, with lessons even delivered by phone. Also, I want to give my gratitude to Barbara O'Kelly for her help and time. Additionally, I want to give many thanks to the Harriet G. Jenkins Predoctoral Fellowship for their financial support throughout the years to complete this work.

I want to thank my father Hiram Firpi-Valencia and my mother Elsira Cruz-Castro for everything that they have done for me, for their unconditional support, their motivation, encouragement, and advice throughout all my life. They are my heroes and role models to follow because they have been by far the most persistent people I have ever known. Without their guidance and inspiration, nothing of what I have done could have been possible. Additionally, I want to thank my brother Hiram Amed Firpi-Cruz and my sister Jennette E. Firpi-Cruz for their support and all the moments that we have shared together. Also, I want to thank my wife Linés Ortiz-Molina for her help and motivation. She has been with me even in the most difficult moments that a person can experience. I also want to thank my uncle Francisco Cruz-Castro for the sense of adventure and exploration that he gave me, which in part is the spark of my life. Finally, I want to thank Miguel A. Figueroa-Villanueva for his time and programming contribution to this work. Also, to all those people who in some way or another contributed to make this work possible.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 OBJECTIVES
1.3 METHODS
1.4 CONTRIBUTIONS
1.5 OUTLINE

CHAPTER 2 LITERATURE REVIEW
2.1 EPILEPSY OVERVIEW
2.2 RELEVANCE OF FEATURES
2.3 CONVENTIONAL MEASURES USED IN SIGNAL ANALYSIS
2.4 SIGNAL ANALYSIS AND FEATURE EXTRACTION IN EPILEPSY
2.4.1 Fourier Transform
2.4.2 Wavelet Transform
2.4.3 Fractal Dimension
2.4.4 Artificial Neural Networks
2.4.5 Lyapunov Exponents
2.4.6 Fuzzy Logic
2.4.7 Cellular Neural Networks
2.4.8 Genetic Algorithm
2.4.9 Additional Work
2.5 GENETICALLY FOUND, NEURALLY COMPUTED ARTIFICIAL FEATURES
2.6 EVOLUTIONARY ALGORITHMS
2.7 GENETIC ALGORITHMS
2.8 GENETIC PROGRAMMING

CHAPTER 3 GENETIC PROGRAMMING TO CREATE ARTIFICIAL FEATURES
3.1 GFNCAF ALGORITHM LIMITATIONS
3.2 GENETIC PROGRAMMING ARTIFICIAL FEATURES
3.3 COMPARING GPAF AND GFNCAF ALGORITHMS
3.4 RESULTS AND COMPARISON
3.5 SUMMARY

CHAPTER 4 GP ARTIFICIAL FEATURES APPLIED TO SEIZURE PREDICTION
4.1 CLINICAL CONTEXT
4.1.1 Data Acquisition Equipment and MRIs
4.1.2 EEG Data Description
4.2 EEG SIGNALS AT A GLANCE
4.3 NONLINEAR DYNAMICAL SYSTEMS AND STATE-SPACE RECONSTRUCTION
4.4 STATE-SPACE RECONSTRUCTION AND THE GPAF ALGORITHM
4.4.1 Classifier
4.4.2 Detailed Example
4.5 AUTOCORRELATION FUNCTION
4.6 PERFORMANCE INDEX
4.7 BENCHMARK CLASSIFIER
4.8 RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
4.9 FUNCTION AND TERMINAL SETS FOR GP
4.10 RESULTS
4.10.1 Patient A
4.10.2 Patient B
4.10.3 Patient C
4.10.4 Patient D
4.10.5 Patient E
4.10.6 Patient F
4.10.7 Patient G
4.11 DISCUSSION
4.12 STATISTICAL TEST
4.13 GENERIC GPAF EQUATIONS
4.14 TRAINING THE GPAF ALGORITHM USING THE LEAVE-ONE-OUT METHOD
4.15 COMPARISON WITH ANOTHER PROPOSED METHOD
4.16 SUMMARY

CHAPTER 5 DETECTION OF SEIZURE ONSET BY MEANS OF GPAF
5.1 WHY DETECT ONSET OF SEIZURES?
5.2 ICTAL EEG SIGNALS
5.3 DETECTION WITH GP ARTIFICIAL FEATURES
5.4 BENCHMARK FEATURE
5.5 DECISION INTEGRATION WINDOW
5.6 RESULTS
5.6.1 Patient A
5.6.2 Patient B
5.6.3 Patient C
5.6.4 Patient D
5.6.5 Patient E
5.6.6 Patient F
5.6.7 Patient G
5.7 WHAT CAUSES FALSE POSITIVES?
5.8 A GENERIC DETECTOR
5.9 A DETECTOR WITH τ = 1
5.10 TRAINING BY LEAVE-ONE-OUT METHOD
5.11 DISCUSSION
5.12 SUMMARY

CHAPTER 6 CONCLUSIONS AND RECOMMENDATIONS
6.1 BENEFITS OF THE ALGORITHM
6.2 CONCLUSIONS
6.3 LIMITATIONS AND RECOMMENDATIONS

APPENDICES
BIBLIOGRAPHY

LIST OF FIGURES

Figure 2.1 WAVEFORM OF AN ARBITRARY SIGNAL
Figure 2.2 GENETICALLY FOUND, NEURALLY COMPUTED ARTIFICIAL FEATURE ALGORITHM BLOCK DIAGRAM
Figure 2.3 STEPS OF A CANONICAL GENETIC ALGORITHM (GA)
Figure 2.4 A TREE DIAGRAM OF A GP CHROMOSOME
Figure 2.5 ILLUSTRATION OF A TYPICAL CROSSOVER OPERATOR IN GP
Figure 2.6 ILLUSTRATION OF THE MUTATION OPERATOR IN A GP FRAMEWORK
Figure 3.1 DIAGRAM OF THE GP-BASED ARTIFICIAL FEATURES ALGORITHM
Figure 3.2 TREE DIAGRAM FOR THE ARTIFICIAL FEATURE FOUND BY THE GPAF ALGORITHM
Figure 3.3 EVOLUTION OF FITNESS, LEVELS (DEPTH), AND NODES FOR GP FEATURE WITH 4 INPUTS
Figure 3.4 EVOLUTION OF FITNESS, LEVELS (DEPTH), AND NODES FOR GP FEATURE WITH 8 INPUTS
Figure 3.5 TREE DIAGRAM FOR THE ARTIFICIAL FEATURE WITH 8 FEATURE INPUTS
Figure 4.1 ELECTRODES PLACED IN THE HOLE DRILLED IN THE BACK AREA OF THE SKULL
Figure 4.2 (a) ILLUSTRATION OF TEN-MINUTE-DURATION BASELINE SIGNAL AND (b) TEN-MINUTE-DURATION PREICTAL SIGNAL, EXTRACTED UP UNTIL FIVE MINUTES BEFORE UEO
Figure 4.3 PROBABILITY DENSITY FUNCTIONS FOR TWO TEN-MINUTE EPOCHS: ONE BASELINE (SOLID LINE) AND ONE PREICTAL (DASHED LINE)
Figure 4.4 RECONSTRUCTION OF THE DELAY-EMBEDDING TRAJECTORY FROM A TIME SERIES
Figure 4.5 (a) LORENZ ATTRACTOR AND (b) ITS RECONSTRUCTION USING THE PREVIOUS VALUES OF A STATE VARIABLE
Figure 4.6 SLIDING WINDOW (SOLID RECTANGLE) SLIDING OVER AN EEG SIGNAL
Figure 4.7 DIAGRAM OF THE GENETIC PROGRAMMING ARTIFICIAL FEATURES ALGORITHM
Figure 4.8 SCHEMATIC OF THE NONLINEAR TRANSFORMATION PERFORMED BY THE GPAF ALGORITHM
Figure 4.9 DETAILED ILLUSTRATION OF THE GPAF ALGORITHM PROCESS
Figure 4.10 CHARACTERISTIC AUTOCORRELATION PLOT
Figure 4.11 PDFS FOR A BINARY HYPOTHESIS TEST
Figure 4.12 STAGES OF THE PCA K-NN BENCHMARK CLASSIFIER
Figure 4.13 ILLUSTRATION SHOWING FROM WHERE PREICTAL EPOCHS ARE EXTRACTED
Figure 4.14 POINT AND BLOCK BASIS EXAMPLES
Figure 4.15 RECEIVER OPERATING CHARACTERISTIC CURVE FROM TESTING DATA FOR PATIENT A
Figure 4.16 RECEIVER OPERATING CHARACTERISTIC CURVE FROM TESTING DATA FOR PATIENT B
Figure 4.17 RECEIVER OPERATING CHARACTERISTIC CURVE FOR PATIENT C
Figure 4.18 RECEIVER OPERATING CHARACTERISTIC CURVE FROM TESTING DATA FOR PATIENT D
Figure 4.19 RECEIVER OPERATING CHARACTERISTIC CURVE FROM TESTING DATA FOR PATIENT E
Figure 4.20 ROC CURVE FOR TESTING DATA FOR PATIENT F
Figure 4.21 RECEIVER OPERATING CHARACTERISTIC CURVE FOR VALIDATION DATA FOR PATIENT G
Figure 4.22 PDFS FOR THE PERCENTAGE OF CORRECT CLASSIFICATION OF THE EPOCHS FOR THE THREE ALGORITHMS (FROM APPENDIX 1)
Figure 5.1 A TWO-MINUTE-DURATION ICTAL SIGNAL
Figure 5.2 CLASSIFICATION LABELING FOR A TWO-MINUTE ICTAL EPOCH
Figure 5.3 LINE LENGTH FEATURE BENCHMARK CLASSIFIER
Figure 5.4 A SEIZURE FROM PATIENT B, WHICH DEPICTS A BETA-BUZZ PATTERN
Figure 5.5 CLINICAL SEIZURE PREVIOUSLY UNREPORTED FOR PATIENT F
Figure 5.6 SUBCLINICAL SEIZURE
Figure 5.7 CLASSIFICATION LABELING FOR SUBCLINICAL SEIZURES
Figure 5.8 BLACKOUT OCCURRING IN THE EEG RECORDING OF PATIENT B
Figure 5.9 FULL-SCALE SWING OCCURRING DURING EEG RECORDING OF PATIENT D
Figure 5.10 FALSE START OCCURRING DURING THE EEG RECORDING OF PATIENT E
Figure 5.11 FALSE START OCCURRING DURING EEG RECORDING OF PATIENT E
Figure 5.12 DELTA TRAIN IN EEG RECORDING OF PATIENT F

LIST OF TABLES

Table 2.1 CONVENTIONAL VS. ARTIFICIAL FEATURES
Table 3.1 RESULTS OF THE CLASSIFIER USING THE INDICATED FEATURE INPUTS
Table 4.1 NUMBER OF BASELINE AND SEIZURE TAPES AVAILABLE
Table 4.2 CONFUSION MATRIX OF THE CLASSIFIER DECISIONS
Table 4.3 MULTIPLE CLASSIFIER-BASED PERFORMANCE METRICS FOR THE GP COST FUNCTION
Table 4.4 PRIOR PROBABILITY METRICS FROM THE CONFUSION MATRIX
Table 4.5 LIST OF FUNCTIONS AND THEIR RESPECTIVE SYMBOLS FOR THE GP ALGORITHM
Table 4.6 RESULTS FOR TRAINING DATA FOR PATIENT A (POINT BASIS)
Table 4.7 RESULTS FOR VALIDATION DATA FOR PATIENT A (BLOCK BASIS)
Table 4.8 RESULTS FOR TRAINING DATA FOR PATIENT B (POINT BASIS)
Table 4.9 RESULTS FOR VALIDATION DATA FOR PATIENT B (BLOCK BASIS)
Table 4.10 RESULTS FOR TRAINING DATA FOR PATIENT C (POINT BASIS)
Table 4.11 RESULTS FOR VALIDATION DATA FOR PATIENT C (BLOCK BASIS)
Table 4.12 RESULTS FOR TRAINING DATA FOR PATIENT D (POINT BASIS)
Table 4.13 RESULTS FOR VALIDATION DATA FOR PATIENT D (BLOCK BASIS)
Table 4.14 RESULTS FOR TRAINING DATA FOR PATIENT E (POINT BASIS)
Table 4.15 RESULTS FOR VALIDATION DATA FOR PATIENT E (BLOCK BASIS)
Table 4.16 RESULTS FOR TRAINING DATA FOR PATIENT F (POINT BASIS)
Table 4.17 RESULTS FOR VALIDATION DATA FOR PATIENT F (BLOCK BASIS)
Table 4.18 RESULTS FOR TRAINING DATA FOR PATIENT G (POINT BASIS)
Table 4.19 RESULTS FOR VALIDATION DATA FOR PATIENT G (BLOCK BASIS)
Table 4.20 OVERALL RESULTS FOR ALL THREE CLASSIFIERS
Table 4.21 RESULTS FOR VALIDATION DATA ON BLOCK BASIS FOR PATIENT D USING 50% THRESHOLD
Table 4.22 VALIDATION RESULTS FOR SEVEN PATIENTS
Table 4.23 RESULTS USING LEAVE-ONE-OUT CROSS VALIDATION
Table 4.24 COMPARISON OF D'ALESSANDRO'S ALGORITHM AND GPAF RESULTS
Table 5.1 RESULTS FOR VALIDATION DATA FOR PATIENT A
Table 5.2 RESULTS FOR VALIDATION DATA FOR PATIENT B
Table 5.3 RESULTS FOR VALIDATION DATA FOR PATIENT C
Table 5.4 RESULTS FOR VALIDATION DATA FOR PATIENT D
Table 5.5 RESULTS FOR VALIDATION DATA FOR PATIENT E
Table 5.6 RESULTS FOR VALIDATION DATA FOR PATIENT F
Table 5.7 RESULTS FOR VALIDATION DATA FOR PATIENT G
Table 5.8 RESULTS FOR VALIDATION DATA USING THE NEW CLASSIFICATION LABELING
Table 5.9 RESULTS FOR VALIDATION DATA FOR PATIENT G USING NEW CLASSIFICATION LABELING
Table 5.10 RESULTS FOR VALIDATION DATA FOR THE GENERAL DETECTOR
Table 5.11 VALIDATION DATA RESULTS FOR THE GENERAL DETECTOR WITH DELAY EQUAL TO ONE
Table 5.12 RESULTS FOR THE LEAVE-ONE-OUT METHOD

LIST OF ACRONYMS

ANN: Artificial Neural Network
EEG(s): Electroencephalograph(s)
EEGer(s): Electroencephalographer(s)
FN: False Negative
FP: False Positive
FPH: False Positives per Hour
GA: Genetic Algorithm
GFNCAF: Genetically Found, Neurally Computed Artificial Feature
GP: Genetic Programming
GPAF: Genetic Programming Artificial Feature
k-NN: k-Nearest Neighbor Classifier
MTLE: Mesial Temporal Lobe Epilepsy
PCA: Principal Components Analysis
PDF: Probability Density Function
Pfa: Probability of False Alarms
Pfn: Probability of False Negatives
UCO: Unequivocal Clinical Onset
UEO: Unequivocal EEG (Electrographic) Onset

Chapter 1
Introduction

A woman wakes up and brushes her teeth.
Then she goes to the kitchen and prepares her breakfast. Later, the woman gets dressed, takes her belongings, and is ready to go to work. On her way to work, a small device suddenly starts to beep; she stops the vehicle and takes a pill. The beeping indicates that she will soon suffer an epileptic seizure.

In the near future, the above hypothetical scenario is likely to be possible with the help of a new device that will be marketed to detect or predict epileptic seizures. Although epileptic seizures have been studied extensively in recent decades, many of the circumstances surrounding them remain unknown to this day. Researchers have used several techniques to analyze electroencephalographic signals (EEGs) in order to develop a methodology to predict and detect such seizures. In this work, we shall present an algorithm that creates artificial features directly from EEG signals. In theory, these features should provide a classifier with equal or better discrimination power to predict or detect a seizure than the classical features used for such tasks.

1.1 Motivation

Since the invention of the electroencephalograph, great progress has been made in studying brain disorders. One of the most studied brain disorders is epilepsy, a neurological condition that makes people susceptible to brief electrical disturbances in the brain and that manifests as a change in sensation, awareness, and/or behavior. Epilepsy is characterized by recurrent seizures. It affects up to one percent of the population of the world, or sixty million people, and 25% of cases cannot be fully controlled by current medical or surgical treatment. Many approaches have been proposed to extract information from EEG signals in seeking to develop algorithms to predict or detect epileptic seizures. To extract the relevant information that can facilitate such prediction or detection, features are calculated using conventional techniques and methodologies, which are time-consuming, trial-and-error processes requiring a great deal of effort from researchers. In this manuscript, we propose a systematic methodology to automatically create artificial features (features that are computer-generated and may not have a known physical meaning) directly from EEG signals, features that we project to have similar or better discrimination power to categorize EEG signals between preictal and baseline, in the case of prediction, and between ictal and baseline, in the case of detection.

1.2 Objectives

The main objectives of this work are as follows:

1. Extend the framework of Firpi and Echauz [31] by creating artificial features directly from EEG signals. In [31], features were created on a features-from-features basis. Here, a features-from-data basis will be used.
2. Prescribe a general-purpose, systematic methodology that can automatically create features that capture generalizations, without the intervention of researchers performing exhaustive analysis to select features.
3. Generalize the capabilities of the Genetically Found, Neurally Computed Artificial Features (GFNCAF) algorithm, proposed by the author in [31], by using Genetic Programming (GP), which allows the evolution of more complex networks, unlike GFNCAF, which used an arbitrary, fixed network.
4. Apply the GFNCAF extension, called the Genetic Programming Artificial Features (GPAF) algorithm, to create artificial features directly from raw EEG data to predict and detect epileptic seizures.
1.3 Methods

A large archive (730 hours) of digital intracranial EEG recordings obtained from patients undergoing presurgical evaluation at the Emory Epilepsy Center of Emory University in Atlanta, GA, was available for this purpose. Data were obtained as a routine procedure independent of our work. No human subject was placed at risk by or at Michigan State University. The prediction and detection problems will be cast as dichotomous classifications, in which EEG signals are fed to the GP algorithm in order to produce artificial features that grant as much separability as possible between ictal and baseline EEG signals, or between preictal (preseizure) and baseline EEG signals. A library of mathematical operators (or functions) will be provided to the GP learning/classification system in order for it to develop complex artificial features.

1.4 Contributions

The contributions of this work include:

• An algorithm that can automatically generate artificial features from raw EEG signals to predict epileptic seizures with similar or better classification rates than those obtained through previously published methods.
• An algorithm that can automatically generate artificial features directly from raw EEG signals to detect epileptic seizures, similarly to or better than previously published methods.
• The first step toward a generic model of artificial features for prediction of epileptic seizures.
• A generic model for detection of epileptic seizures with performance similar to or better than that of current methods/features.

1.5 Outline

Chapter 2 introduces the epilepsy disorder and defines some of the terminology used in the medical community. A brief overview of the literature is presented on the importance of relevant measurements for the prediction and detection of epileptic seizures. A review is provided of the classical and non-classical techniques used today to extract features from EEG signals for prediction and/or detection applications. A brief overview of a related algorithm, the Genetically Found, Neurally Computed Artificial Features algorithm, is presented. Finally, a brief explanation of genetic algorithms and genetic programming is provided.

In Chapter 3, we introduce the genetic programming artificial features (GPAF) algorithm. The GPAF algorithm is then tested on a problem in which randomly generated vectors are classified as either parallel or nonparallel.

In Chapter 4, we present the implementation of the GPAF algorithm to design artificial features for the prediction of epileptic seizures. We explain a method to reconstruct state-space trajectories, called delay embedding (a minimal code sketch of this step follows the outline). These reconstructed states are then input to the GPAF algorithm, which constructs the predictors. Many experiments are run on the EEG data of seven patients.

In Chapter 5, we implement the algorithm to design artificial features for the detection of epileptic seizures. Details about detection are explained. Detectors are tested on the EEG data of seven patients.

Finally, Chapter 6 concludes with benefits, limitations, and recommendations regarding the GPAF algorithm.
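Because delay embedding recurs throughout Chapters 4 and 5, a minimal sketch of the idea is given below. It is an illustration only: the function name and the example dimension and delay are assumptions of this sketch, not the parameters actually used in the dissertation.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Reconstruct pseudo-state-space vectors from a scalar time series.

    Each row is [x(t), x(t + tau), ..., x(t + (dim-1)*tau)], the
    delay-embedding scheme described in Chapter 4.
    """
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for this dim/tau")
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Toy usage: embed a sine wave in 3 dimensions with a delay of 5 samples.
t = np.linspace(0, 8 * np.pi, 1000)
states = delay_embed(np.sin(t), dim=3, tau=5)
print(states.shape)  # (990, 3)
```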
Chapter 2
Literature Review

This chapter first defines what epilepsy is. Also provided is a review of previous work, methodologies, and techniques that have been used to predict and detect epileptic seizures. Classical signal analysis methods used in recent decades to analyze electroencephalographic signals (EEGs), such as Fourier transforms and wavelets, as well as others, will be discussed. Furthermore, an overview will be presented of related work presented in [30] and [31]. Finally, the foundation of genetic programming is laid out.

2.1 Epilepsy Overview

According to the Epilepsy Foundation of America [22], epilepsy is a neurological condition that makes people susceptible to seizures. A seizure is a term used to describe a change in sensation, awareness, or behavior brought about by a brief electrical disturbance in the brain. Seizures vary from momentary disruptions of the senses, to short periods of unconsciousness or staring spells, to convulsions. There are many types of seizures; some people have only one type of seizure, while others may have more than one type. Even though they look different, all seizures are caused by a sudden change in how the neurons of the brain send electrical signals to each other [22].

Epilepsy can be caused by anything that affects the brain, such as tumors or strokes. Sometimes epilepsy is inherited; sometimes it is caused by trauma, such as a head injury from a car accident; however, in many cases no cause can be found for the disorder. Today, one way to treat epilepsy is through the use of anticonvulsant medicines. These medications control seizures for many people with this disorder; however, they do not cure it. Other alternatives treat the condition by means of surgery, by a demanding ketogenic (high-fat) diet, which is principally administered to children, or by electrical stimulation of the vagus nerve, a large nerve leading into the brain, through implantation of a vagus nerve stimulator that offers partial relief [1]. From a control systems perspective, all of these treatment options are still a form of sensorless, open-loop control. Regardless, the goal of any treatment is to stop seizures with as few side effects as possible.

There are some controversial findings concerning seizure prediction by animals. Some studies have suggested that dogs can predict seizures in children. In [29] and [54], Kirton et al. reported that dogs could predict epileptic seizures from minutes to hours before the seizure occurred. The researchers collected the data by means of questionnaires sent to owners who claimed their dogs could predict epileptic seizures in their children. The study reported that among 60 dogs, nine were judged able to predict epileptic seizures with 80% accuracy (of course, this measure is difficult to state with high reliability, as it clearly depends on the interpretation of the dog's owner). The dogs were reported to let the children know that they were about to suffer a seizure by licking, whimpering, or standing next to them. (Of course, these behaviors are not unusual, and it is easy to see how the observer's biases might influence the interpretation of the data.) In previous studies, the dogs were reported to bark for no apparent reason and to behave erratically (much as other animals foreshow tsunamis, etc.). One interesting point is that those dogs were not trained to do such a task. Therefore, such behavior, if present, was somehow learned by the dogs without human training. The study does not indicate how the dogs learned such an "ability" or what the dogs are measuring or seeing in the child that enables them to predict the seizure. However, some researchers have conjectured that these dogs can "see" or "smell" small electrical or chemical changes, using visual, olfactory, or increased-sweating information to predict seizures.
Critics in the epilepsy medical community have pointed to the fact that the evidence collected for such claims is weak, that the dogs' bizarre behavior may in fact be what triggers some of the "predicted" seizures, and that in a large number of cases the seizures are psychogenic (clinical fits without corresponding EEG evidence); however, such dogs seem to be of great psychological value to the patients.

Since the invention of the electroencephalogram, most studies of epileptic seizures have been carried out using electroencephalographic (EEG) signals. Many approaches have been proposed to extract information from EEG signals that can be used to develop algorithms to predict or detect epileptic seizures (many of them are discussed in Section 2.4). Features are highly informative measures or attributes derived from raw data that can facilitate tasks such as prediction or detection. These are calculated using conventional methodologies that are time-consuming, trial-and-error processes requiring a great deal of effort from researchers. All of these techniques rely on knowledge of a feature formula or algorithm that may have been obtained from intuition, tradition, the physics of the problem, analogies to problems in other fields, etc. There is no guarantee that any of these conventional features extracts maximally relevant information from the raw data. Nevertheless, seizure prediction studies using conventional features increasingly hint at the fact that the information is there, waiting to be fully extracted. Litt et al., in [60], presented evidence that mesial temporal lobe seizures are generated in a series of events that evolve over hours, leading to the clinical seizure onset. This series of events can be recorded by depth intracranial EEG. However, this work was conducted by manually scoring many hours of EEGs, and the predictability was largely limited to what a basic energy feature could reveal about preictal changes. More recently, it has been shown that complicated feature calculations can be realized in miniaturized hardware using a cellular neural network approximation of the feature on a chip [70], [90].

Before beginning a full discussion of previous work in the literature review, we note from the preceding paragraph the importance of selecting good measures to detect and predict epileptic seizures. As noted, the prediction/detection of epileptic seizures can be viewed from a pattern recognition point of view. The stages that constitute pattern recognition are detailed in [18]. The system is summarized in three stages: preprocessing, feature extraction, and classification.

Preprocessing refers to those processes or operations performed on the data before other stages are started. The main task of preprocessing is to improve the quality of the data/images. In an EEG analysis framework, the preprocessing stage is used to maintain the integrity of the signal and to filter out line noise using a 60 Hz notch filter. Also, a low-pass filter is used prior to digital sampling to avoid aliasing of signals.
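As an illustration of this preprocessing stage, the following sketch applies a 60 Hz notch and a low-pass filter to a signal using SciPy. It is a rough stand-in: the sampling rate, filter order, and cutoff are assumed values, and in practice the anti-aliasing filter is analog and applied before digitization.

```python
import numpy as np
from scipy import signal

fs = 200.0  # sampling rate in Hz -- an assumption for this sketch

# 60 Hz notch filter to suppress power-line interference.
b_notch, a_notch = signal.iirnotch(w0=60.0, Q=30.0, fs=fs)

# 70 Hz low-pass filter standing in for the analog anti-aliasing
# stage described in the text.
b_lp, a_lp = signal.butter(4, 70.0, btype="low", fs=fs)

eeg = np.random.randn(10 * int(fs))            # placeholder EEG segment
clean = signal.filtfilt(b_notch, a_notch, eeg)  # zero-phase notch
clean = signal.filtfilt(b_lp, a_lp, clean)      # zero-phase low-pass
```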
However, other processes, like the feature creation process, in which new features are created from other features or directly from raw data, can also be performed in this stage. By means of feature extraction techniques, we can also reduce the number of features to be used and then select those features that improve the discrimination task. Because this stage is the main issue in this work, in Section 2.2 we cite, discuss, and extend the theory on this subject. Finally, classification is the process by which a system categorizes an object, making use of its features. There are two types Of classifiers: supervised classifiers and unsupervised classifiers (although there already are classifiers proposed that combine supervised and unsupervised techniques). Supervised classifiers are those that must be trained using labeled samples, i. e., samples whose classes are known a priori. There are two kinds of supervised classifiers: parametric and nonparametric. Parametric classifiers are those classifiers that assume a probability density fimction (PDF) for the data. Among those are Gaussian and mixture model [18] classifiers. On the other hand, nonparametric classifiers do not assume a probability density function. Among those classifiers are k- Nearest Neighbors [76], its fuzzy version, Fuzzy k-Nearest Neighbor [51], Parzen- windows [18], and Genetic Classifiers [73]. Artificial Neural Networks [41] are also nonparametric classifiers, given that they do not assume a PDF, though they are considered to be hybrid methods because the parameters, called weights, must be 10 calculated. More details about classifiers are beyond the scope of this work, however, for in-depth information, readers may refer to the references cited above. Nevertheless, before we continue with the next section, we need to define and state clearly some terminology about epilepsy that is used in the medical community. A preictal signal refers to an EEG Signal from a time period preceding the known occurrence of an epileptic seizure. The term preseizure will be used interchangeably with preictal throughout the text. An ictal signal is the EEG recorded during a seizure. A baseline signal is referred to as a normal signal (nonseizure signal), Specifically excluding preictal signals. It will also be called seizure-free or nonseizure data throughout the text. Unequivocal clinical onset (U C0) of the seizure is when an Observer can see the consequences of the seizure in the person, such as a convulsion. Unequivocal electrographic onset (UEO) is when the seizure onset is identified in EEG by an expert electroencephalographer. 2.2 Relevance of Features As was stated previously, features are highly informative measures or attributes eluded from raw data for tasks such as classification, regression, and probability density estimation. Feature extraction have been pinpointed as the cornerstone of most proposed works in order to advance the seizure prediction and detection Objectives. For instance, Figure 2.1 depicts the waveform of a Signal from which we can obtain several measures such as the spectral components derived from the Fast Fourier Transform, and statistics, such as mean, variance, skewness, etc. These measurements, which make this Signal 11 Magnitude ”F M I L l l 0 500 1000 1500 2000 2500 Time Figure 2.1 Waveform of an arbitrary signal. different from others, are named features. 
Many authors have agreed that the specification of useful features (sensitive, specific, robust, computationally efficient, among others) is the most important key to machine learning with statements like: “In general, feature extraction is the most crucial aspect of classification” [52], “An ideal feature extractor would yield a representation that makes the job of the classifier trivial” [18], “The precise choice of features is perhaps the most diflicult task in pattern processing ” [65], ”the selection of variables is a key problem in pattern recognition” [36]. In fact, according to Kil and Shin [52], a good set of features should have the following attributes: o A large interclass mean and small intraclass variance 0 Discard irrelevant variables - Computationally inexpensive to measure 12 o Uncorrelated with other features 0 Mathematical definable o Explainable in physical terms From the feature PDFS, we can acquire information to design Optimal or close-to- optimal decision structures (i. e., classifiers). However, the performer of the classifiers are linked to the relevance of the input features whose power set (all possible subsets) may not be carrying the maximum information available from the raw data. The action of establishing features have been relied on humans by means of trail- and-error or previous knowledge or intuition of the physics as approaches tO create them. With the exponential increasing of the computing power of processors, a new realm has been opened to explore new methods to systematize the prescription of features in a given problem. In works [8], [31], and [89], the authors took advantage of the increasing power of computers and proposed algorithms whose results defy the classical subset selection results. In these manuscripts the creation of artificial features was proposed. Artificial features are defined as features computer-generated and do not necessary have a physical meaning. By definition, artificial features possess the first five attributes mentioned above; however, the Sixth attribute not necessarily has to be true. In fact, the creation artificial features is motivated by the fact that conventional features (which possesses the Six attributes), when chosen arbitrarily or Simply because of tradition, in situations where domain knowledge is insufficient, often fall far short of performance requirements. It is the relaxation of this last attribute that may give to the artificial features the capability to “capture” relevant information from the raw data. Soft-computing methods like 13 evolutionary algorithms (discussed further) can be used to construct and optimize artificial features for a given problem. These artificial features are expected to resemble or surpass the classification accuracy of conventionally chosen features. The contrasting characteristics between conventional and the desired characteristics for the artificial features are provided in Table 2.1. Table 2.1 Conventional vs. artificial features. Conventional Features Artificial Features Serial Parallel Von Neumann architecture Neural architecture Programmed Learned Combinatorial Inductive Ad—hoc Optimized Slow computation Fast computation Based on intuition or sufficient domain knowledge Data-drrven Generally, when computing features for a given problem, many of them are calculated or collected. 
Later, it is in the feature extraction stage where commonly the dimensionality of the feature space is reduced or the informative ones are selected by means of classical algorithms like principal components analysis, making them “top- down” approaches. In contrast, artificial features algorithms are “bottom-up” approaches, creating just the right set of features. That is, the artificial features algorithm decides the nature and the number of features that are necessary, from few to many features. In the literature, feature extraction methods (which are also called feature creation, feature optimization, feature learning, feature discovery, feature mapping, feature augmentation, feature transformation [104], and Signal or data projection [52]) have been proposed in two ways: linear feature extractors and nonlinear feature extractors. Linear feature extractors create features as a linear combination of input features (e. g., principal components analysis) or create features as a linear combination of the raw inputs by means of such methods as adaptive noise filtering and time-frequency transform [52] [32] [79]. On the other hand, nonlinear feature extractors create features as a nonlinear combination of input features by means of a hidden layer in a neural network [10] or by combining them using mathematical operators [8] [30] [89] [55]. In the next section, we mention the classical measures or features used to analyze EEG signals, and related work that proposes different approaches to Obtain better and more relevant (informative) features. 2.3 Conventional Measures Used in Signal Analysis Although there exist many measures from various research fields, in this work, we recall some of the measures most used for prediction and detection of signals, not just in epilepsy applications, but also in different areas like signal processing, system identification, stochastic systems, and so on. Arithmetic Mean: statistical measure that yields the expected value (average or first moment statistic) of a given set of data X. This mean is the most commonly used. l1: xl , (2.1) 'Ms L N Geometric Mean: another kind of mean used to determine average factors. N UN #6 = [1'] x,] . (2.2) i=1 Harmonic Mean: another of the various ways to calculate the average of a set of data. 1 ' 2.3 2_ ( ) Variance: measures how widely Spread or distributed the data are from the mean ,u. N XXX.- _ #)2 - (2.4) i=l az=i N Standard Deviation: the square root of the variance. 0: Linen (25) N—l i=1 ' ' ° Skewness: measures the degree of asymmetry of the distribution of a set of data. 71 N [:1 a - - Kurtosis: measures the degree of peakedness of the distribution of a set data X (the 3 is subtracted to make it 0 for Gaussian distribution). 1 N x —,u 4 =— —"— —3. 2.7 72 NE( a J ( ) Signal Energy: measures the energy of a given signal x(t). l6 15-2202» i=1 Energy of Wavelet Packets: the energy Of wavelet-packet coefficients is derived from a five-level decomposition of signal within a sliding Observation window into wavelet- packet coefficients based, e. g., on the Daubechies 4 mother wavelet. The sum Of squared coefficients at each terminal node of the decomposition tree gives spectral energies. N ~22 22.1.2. 09) Spectral Entropy: the normalized Spectral entropy is the entropy of the spectrum of the signal x normalized by log2(length(S)), where S =f(x)/Z(f(x)) and f(-) is the power Spectral density function. The normalized spectral entropy is an index from 0 to 1 almost insensitive to the signal length. H=_N Silog2(S.~). 
(2.10) i=1 1032(N) Signal Curve length: it is calculated as the sum of the lengths of the vertical line segments between successive samples of a signal. The signal curve length is simultaneously sensitive to amplitude (AM) and frequency modulation (FM). N L=z|xi—x,-_,I- (2.11) i=1 2.4 Signal Analysis and Feature Extraction in Epilepsy Various methodologies have been used for decades to analyze EEG signals to extract relevant features that give insight as to when an epileptic seizure will occur. Among the 17 most classical methods are Fourier transform, wavelets, and chaos analyses. A brief overview of these methodologies is provided subsequently. The brain disorder we deal with here is mesial temporal lobe epilepsy (MTLE). MTLE is strongly associated with complex partial seizures, the most common type of seizure, being present in 40% of all epilepsy cases reported [14]. In this section, we cite papers and manuscripts that present approaches to predict and detect MTLE seizures. We mention for each citation what problems the author(s) are dealing with. 2.4.1 Fourier Transform Early on, when EEG Signals began to be studied, Fourier transform was the most logical method applied to study such Signals. Fourier transform is a technique used to represent aperiodic signals for all values of time in the frequency domain, unlike the Fourier series, which is a technique to represent periodic signals over specific values of time. This transformation gives us the frequency components, better known as the frequency spectrum, of the signal at hand. The Fourier Transform (FT) is F(w)= j: f(t)—fw’dz, (2.12) where a) is the frequency (co = 211/) and f(t) is the signal in the time domain. As mentioned, FT shows the spectral components of the signal at hand; and in EEG signals the frequency bands most widely studied to track the seizures are the alpha band, a, (8-13 Hz), theta band, 0, (4-8 Hz), delta band, 6, (.5-3.5 Hz) and beta band, B, (14-30 Hz, although it can reach up to a maximum of 40 Hz). Authors [103] have applied the FT to see if there are indications that can Shed a clue to predict seizures. However, one of the 18 main problems of FT is that it can Show the frequency components of the signal but it cannot indicate in what instant of time these occur because time is integrated out in Equation (2.12). However, EEGS are non-stationary signals time localization is of interest. 2.4.2 Wavelet Transform In [94], the wavelet transform has been used to detect epileptic seizures. Wavelet transform [39] is a mathematical transformation that uses scale and translation properties to decompose the input signal into its components of frequency that appear at different resolutions. One of the main properties of wavelets is that they decompose the frequency components so as to indicate in what interval Of time a given component occurs. The wavelet transform is defined by the following mathematical expression: 1 a l - 2' ‘1’“725): \l—H— jx(’)/’ {7}” 2 (2.13) where s and r are the scale and translation parameters respectively, Mt) is the transformation function, often called the mother wavelet, and x(t) is the input signal. In [94], Szilagyi, Benyo, and Szilagyi used wavelets to decompose the EEG signals into their spectral components a, 0, 6, and fl bands. Later, these spectral measurements were input to an artificial neural network (discussed in detail below) that was trained to output whether an EEG signal is an epileptic or a baseline (normal) wave. 
In [9], Chen, Zhong, and Yao used wavelet-transformed features to extract or detect the characteristic values (the authors refer to these values as singular values) of the sharp 19 waves and spikes, which indicate a clinical onset, and which are embedded in EEG signals. The authors defined a wavelet that permits reconstruction of the high frequency components of the EEG and detection of the spikes by taking those as singular values and using local maxima of their wavelet transform modulus. In [63], where Mehta, Koser, and Venziale are also interested in detecting ictal events, they proposed an automated method to detect ictal processes based on wavelet decomposition. In their publication, the authors claimed that the average power spectrum of the EEG of a resting, healthy person follows power law attenuation with a scale- invariant property over a range of clinically interesting frequencies, suggesting a hidden scale-invariant property. This statistical scale-invariant property was monitored, and when appreciable changes in the statistical measurement were detected, an onset of seizure could be determined. These statistical changes were represented using wavelets with which coefficients measured the variance of progression over different scales, thus, reflecting changes of scale-invariant signals. 2.4.3 Fractal Dimension Fractal dimension [28] is a concept that represents the space-filling ability of an irregular geometric object of interest with an infinite nesting of structure (e. g. line segment, square, etc.). While the familiar dimensions, such as topological dimension, are specified as integer values, a fractal dimension may be a fractional value. The fractal dimension is defined by the following equation in the limit as r vanishes: FD = 10g(N), log(r) 20 (2.14) where N is the number or bulk measure of self-similar pieces and r is the magnification factor. One area where fractal dimension is usefully applied is analysis of time series. In [26] and [27] the application of FD to predict and detect the clinical onset of seizures was proposed as a real-time tool. In their work, the authors used the Katz method to calculate the FD of the EEG waveforms. They used baseline signals as well as ictal data to test the techniques. Esteller et al. [25], determined that using the Katz approach to FD was a good way to detect onset of seizures. On the other hand, in [25], Esteller et al. proposed an even more efficient measure to detect seizure onsets. The feature, called curve length or line length, is less computationally expensive. In their work, the authors Showed that, by processing the EEG signals with the curve length feature, they were able to detect seizure onsets with a delay of 4.1 seconds, with 0.051 false positives per hour and one false negative out of 111 seizures analyzed in over 1,215 hours of intracranial EEG data. In [53], Kirlangic et al. also proposed the use of FD as a feature for an adaptive EEG segmentation algorithm (AEEGS) in epilepsy, instead of using Fourier transforms (FT) and Fast Fourier Transform used previously in the AEEGS algorithm. 2.4.4 Artificial Neural Networks Artificial neural networks (ANNS) [41] are networks (i.e. equations) that can do a mapping of input to output with high levels of complexity. This complexity depends on the number of layers and neurons that the network contains. 
ANNS can perform many of 21 the complex tasks that are present in real-world application problems, like nonlinear classification, function approximation, and control of dynamic systems, among Others. ANNS have also been used to classify or detect epileptic EEG signals; however, they have been used more to classify EEG states rather than to create features. In [67], the authors used ANNS to classify EEG Signals between normal (baseline) and ictal signals. The authors used, as feature inputs, classical statistics of signals, such as zero-crossing, variance, integral of absolute value, parameters extracted from amplitude histogram, and power spectrum values from the four different spectral components of EEGS. In [72], Ozdamar et al. used a variant of neural networks, called adaptive resonance theory neural network (ARTNN), for floating point values. This network is an unsupervised self-organizing system that can cluster data into different classes; such networks are suitable for input patterns that cannot be defined by a static input. In this work, the ARTNN should be able to detect the transient patterns of EEG signals and classify them into their respective classes. When a signal that cannot be classified is input, the algorithm stores the Signal in a database for future classification. Since the network is an unsupervised technique, an external trainer was used to label output from the network that classifies the output between spikes or non-spikes. EEG raw data were the input to the network. Another algorithm to detect epileptic waveforms was proposed in [74], in which Park et al. used a wavelet transform, a neural network, and an expert system to design a multiple channel system to detect epileptiform signals. The EEG signals (raw data) were preprocessed by the wavelet transform (WT) to reduce the number of inputs to the neural 22 network, thus the WT works as the feature extractor. Later, once the preprocessed data are input to the neural network, the ANN performs signal processing and accentuates the epileptic spikes. The network was trained, using the backpropagation method, to output a threshold value that is input to an expert system, a database of 16 if-then rules that captures the knowledge and experience Of EEGers, and takes this input along with the inputs of neighboring channels (surrounding electrodes) to make a decision whether the Spikes at hand are epileptic or not. The authors reported up to an average of 5.5 false detections per hour. Bigan [4] presented an automatic system to identify EEG waveforms that use an ANN, whose inputs (features) are measurements of frequency changes in the EEG signals during an interval of time. Additionally, classical measurements such as magnitude and power spectrum analysis were also calculated from EEGS and those were input, along with the ANN output, to create a final decision regarding seizure onset. Other works using variants of artificial neural networks to predict or model epileptic seizures have been proposed in [6], [7], and [19]. 2.4.5 Lyapunov Exponents Lyapunov exponents [68] (LE) are standard measures to determine whether a system is chaotic. LE measures the average rate of convergence or divergence of nearby trajectories. The LE is defined as _ - 1 WHO“) ’1 " Ilium t In leol ’ (2.15) [MOI—>0 23 where Ax(x0, t) is the radius of an ellipsoid evolving in time. LE can also be viewed as a measure of how easy it is to perform prediction in a given system: positive exponents indicate chaoticity and thus unpredictability. 
Iasemidis et al. [45] proposed the design of a mesial temporal lobe epileptic seizure prediction implantable system using a version of LE, which they called Short-term maximum LE. Using LE to measure the rate of divergence in the brain is just an approximation of the LE, given that the brain is a highly nonstationary system. The authors claimed to have obtained good results, predicting 82% of all cases of seizures using EEG data from different patients. The authors also claimed to have predicted seizures with an average Of 71.1 min. before the ictal onset. Controversy over these results remains, since at the reported high FPH levels, random chance cannot be ruled out. 2.4.6 Fuzzy Logic Fuzzy logic [102] is a technique that offers some Of the linguistic ambiguity that exists in human language (e.g., low, medium, high). This property makes fuzzy logic suitable to deal with ambiguous and noisy data. In fuzzy logic, rules are stated using multivalent if- then statements, i. e., if an event occurs to some degree, then its respective output is fired to some other degree. For this reason, Harikumar and Sabarish [42] proposed the use of fuzzy logic to classify the risk level of epileptic seizures. In this work, the authors calculated five features: signal energy, number of positive and negative spikes exceeding a threshold, number Of sharp waves, total number of spikes and sharp waves in 24 neighboring channels, and variance. These were then input to the fuzzy logic system and three outputs per channel were calculated indicating the seizure risk. The output was expressed with five fuzzy sets: normal, low risk, medium risk, high risk, and very high risk. Inputs were also represented with similar statements. With these inputs, rules could be constructed to design the fuzzy logic system. A post-processing Of the fuzzy logic output was carried out to obtain better classification. The authors Obtained up to 80% correct classification of the onset of seizures. 2.4.7 Cellular Neural Networks A cellular neural network (CNN) [11] [12] is a recurrent nonlinear network in which neurons are locally connected and the dynamics are identical for each node. Commonly, these neurons are called cells. CNNS of one or two dimensions are the most commonly used. Each cell in the CNN contains an input, an internal state, and an output. The state equation for cell C2,- is defrned as follows: in; = -x2-j + 224mm. + Zanumn + 22,-, (2.16) mneNij mneNy- where x2,- 6 R is the state, y,-,- e R is the output, u,~j e R is the input, and 22,- e R is the bias. Am, the feedback weights, and B,,,,,, the feedforward weights, are connected from cell CW, to C2}. N2,- is the l-neighborhood of C,-,-, where C,,,,. is also a member (CM 6 N.-,-). The output equation is described by the following piecewise linear equation: yg. a f(xij)= é—Qxy. +1|—|x,.j -1|). (2.17) 25 Tezlafl' et al., in [97], proposed the use of CNN to anticipate or predict epileptic seizures, first studying the patterns that the CNN outputs when the onset Of a seizure was occurring vs. not occurring, in a software Simulation environment. Then the authors implemented the CNN parameters (i. e., control and feedback weights) in a Universal Machine CNN (UM-CNN), a CNN computer. The authors used this to state the viability of a portable device to predict epileptic seizures. We note that the feature extraction here is performed by the CNN. Similar work was presented in [96], except that this time they used a discrete time-delay CNN, a variant in the universe of CNNS. 
2.4.8 Genetic Algorithm The genetic algorithm (GA) is a global search, general-purpose Optimization algorithm in which, by means of so-called genetic operators, many possible solutions are evolved in an attempt to locate the best solution. Because our approach involves evolutionary algorithms (i.e., GA or genetic programming), we leave firrther details for the next section to explain those algorithms. However, the definition provided for GA gives an intuitive idea to understand the papers that will be discussed next. In [15], [16], and [17], unlike in previous work mentioned where a few features were used as inputs, the EEG signals were processed by multiple levels of feature extraction, starting with six features: curve length, signal energy, nonlinear energy, spectral entropy, and energy of wavelet packets. Later, signals processed by those features were analyzed visually and processed into a second stage of features, such as maximum spikes, minimum spikes, kurtosis, variance, and skewness, among others. These processed EEG 26 signals were again processed in a third stage of feature extraction, where similar features to those calculated in the second stage were used. After this stage, a GA was used to look for the Optimal set of features that would be input to a classifier, which would decide if the EEG Signal is a baseline (normal) or a preictal Signal. Also, the GA would select which channels (i.e., electrodes) would be used for the study. A probabilistic neural network (PNN) was used as a classifier. Experiments were run using data from Six different patients. For each patient, a different set of features was the optimal set to perform the best classification. The results suggested that a specific predictor is necessary for every patient. 2.4.9 Additional Work Although several approaches have been proposed to address epilepsy prediction and detection problems, in this work we do not discuss all of them in detail. Briefly, other approaches have been proposed in [101], where nonlinear dynamic systems theory was used to detect seizures. Another scheme was proposed in [99], a method for feature extraction using an extension of singular value decomposition (SVD) to detect epileptic seizures. Other researchers have tried to establish an analytical model for epilepsy, as was the case in [69]. 2.5 Genetically Found, Neurally Computed Artificial Features Regarding the approaches discussed in the previous section, classical features are selected manually, i. e., by the expert or researcher who is studying the problem. Those procedures are what we term conventional flzatures-from—data-based. On the other hand, in the work 27 of D’Alessandro et al., a higher level of intelligence was added to select the optimal set by using a GA; however, the set of features used were still conventional features, and so analysis and study by the researcher were required. In [31], Firpi and Echauz proposed an approach called genetically found, neurally computed artificial features (GFNCAF). In their work, the GFNCAF algorithm was used to create an artificial feature from EEG signals that were processed using conventional features: wavelet packets, skewness, and curve length. This procedure is termed artificial-features-from-conventional—feature-based. The GFNCAF algorithm is composed of a genetic algorithm, a neural network and a classifier, all connected in cascade. Figure 2.2 depicts the GFNCAF algorithm. Briefly, it works as follows: the conventional features (i. 
Briefly, it works as follows: the conventional features (i.e., processed EEG signals), denoted x_i in Figure 2.2, are input to the neural network, which in [31], for convenience, is implemented as an algebraic network (called so because it uses only elementary mathematical operators). The operators, denoted o_i in Figure 2.2, on each node of the network are selected by the GA to create the artificial feature. The classifier then receives the output from the network, denoted y_i, and evaluates the training or testing data to decide whether the EEG signal was preictal vs. baseline, in the case of prediction, or baseline vs. ictal, in the case of detection. For this work, the authors implemented a k-nearest neighbor classifier.

Figure 2.2 Genetically found, neurally computed artificial feature algorithm block diagram.

The GFNCAF algorithm was used to create artificial features that can help to predict a seizure 10 minutes before it occurs. In fact, the algorithm was able to find features that gave the classifier more discrimination power than the three original classical features. Similar results were also obtained when GFNCAF was used to find artificial features that could detect seizures with equal or better performance than the original three features. In the conclusion, the authors touched upon some points that could improve the GFNCAF algorithm. Among them, they indicated that more flexibility needed to be allowed in the neural network, and not just in the inputs, in order for the GFNCAF algorithm to advance from creating simple features to creating complex artificial features that may give more discrimination power among the classes.

2.6 Evolutionary Algorithms

Evolutionary algorithms (EAs) are a family of algorithms that draw analogies from natural evolution. Some of the evolutionary processes mimicked by such algorithms include reproduction, selection (or survival of the fittest), mutation, and genetic crossover. In conjunction with a fitness function evaluation, combinations of any of these processes give rise to algorithms like genetic algorithms (GA, discussed in more detail in Section 2.7), evolution strategies (ES) [34] [80] [85], and evolutionary programming (EP) [35]. In addition, the GA was the basis for the development of other evolutionary algorithms, for example, genetic programming (GP) [57] [58]. Other similarities among EAs are: most maintain population-based solutions (individuals); the score of the individuals is obtained via evaluation of an objective (fitness) function; many of them run a fixed number of iterations (or generations); and various parameters (i.e., strategy parameters, which define the behavior of the EA, for example, mutation rate, crossover probability, selection probability, size of the population, and number of generations) must be set.

2.7 Genetic Algorithms

The genetic algorithm (GA) [38] [40] [43] [66], introduced by Holland in the late 1960s [44], is a general-purpose, global-search optimization method that imitates a small subset of the processes of natural evolution. The main properties of the GA are its operators: selection, crossover, and mutation. There are two primary types of GA: binary and real-valued. In a binary GA, solutions, also called individuals or chromosomes, are represented as strings or vectors of binary values.
Because most real-world problems deal with numbers (integer or real), a mapping function is often needed to map a binary string to a set of numbers, formally called a phenotype. These phenotypes are evaluated using an objective or fitness function, which is an equation or algorithm used by the GA to assign a cost or value to each evaluated solution. This score is called the fitness of the solution. Fitness functions are either to be maximized or minimized, depending on the formulation of the problem. Another kind of GA, the real-valued GA, uses a continuum of real values (or, in fact, finite-length, floating-point representations of such numbers) to represent the parameters of the problem under evaluation. A brief overview of the GA operators mentioned previously follows; a minimal sketch of all three appears after these descriptions.

Selection

The selection operator is the process by which a set of individuals, also known as an intermediate population, is picked from a population using a criterion based on the fitness of the individuals. There are several selection types. One of the most popular is fitness proportional selection, often called roulette wheel selection. This selection type relates the probability of selection of an individual in a linear fashion to its relative fitness in the population. This means that individuals with better fitness have a proportionally higher probability of being selected. Another commonly used selection method is tournament selection, which closely emulates the mating competition in natural selection. In this approach, a small set of chromosomes is randomly selected, and then the chromosome among them with the best fitness is selected. These tournaments are repeated as many times as parents are needed.

Crossover

Crossover, or recombination, is the operator by which the GA creates new individuals, called offspring, which are combinations of (usually pairs of) the parents selected using the previous operator (i.e., the intermediate population) and which will belong to the new population. One-point crossover randomly picks a crossover point between two loci in the bit strings and exchanges the bit strings after this point. Two-point crossover is similar to the former, but instead picks two positions on the string and exchanges the values at the loci between them. In a real-valued GA, we can use recombination similarly to its use in a binary GA; however, other approaches to recombination have also been studied, such as blending methods, blend crossover (BLX-α), and quadratic crossover, among others [2] [24].

Mutation

The main idea behind a mutation operator is to introduce diversity into the population. In a binary GA, mutation is quite simple: invert the bit value (i.e., negate, 0 to 1 or vice versa) at a locus, based on a probability called the mutation rate. For a real-valued GA, although there are different types of mutation [64], one of the simplest is uniform mutation. The number of genes to be mutated is derived by multiplying the dimension (i.e., length) of the individual by the mutation rate and the population size; this determines the number of genes randomly selected on the chromosomes. Finally, the genes selected for mutation are replaced with new uniform-randomly generated values.
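The following toy run (plain Python; the one-max objective, population size, tournament size, and mutation rate are illustrative choices only) shows tournament selection, one-point crossover, and bit-flip mutation working together:

    import random

    def tournament_select(pop, fitness, k=3):
        # Pick k chromosomes at random; the fittest one wins the tournament.
        return max(random.sample(pop, k), key=fitness)

    def one_point_crossover(p1, p2):
        # Exchange the bit strings after a randomly chosen crossover point.
        cut = random.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

    def mutate(bits, rate=0.01):
        # Invert each bit independently with probability `rate`.
        return [b ^ 1 if random.random() < rate else b for b in bits]

    # Toy objective ("one-max"): maximize the number of ones in a 20-bit string.
    fitness = lambda c: sum(c)
    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    for generation in range(50):
        offspring = []
        while len(offspring) < len(pop):
            a = tournament_select(pop, fitness)
            b = tournament_select(pop, fitness)
            c1, c2 = one_point_crossover(a, b)
            offspring += [mutate(c1), mutate(c2)]
        pop = offspring
    print(max(map(fitness, pop)))   # typically reaches 20, the optimum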
The canonical GA is implemented as follows (Figure 2.3 shows a flowchart of the GA):

1. The problem to be solved is cast as the maximization (or minimization) of a fitness function that indicates the desirability of a potential solution.
2. A population of candidate solutions is initialized randomly, subject to certain constraints.
3. Each chromosome is evaluated using the fitness or cost function and assigned a fitness score.
4. Each chromosome is assigned a probability of reproduction in each iteration, better called a generation. This probability is assigned using a selection operator that generally depends upon the fitness distribution of the individuals.
5. During each generation, a new population is evolved by applying the operations of selection, crossover, and mutation.

Typically, the algorithm is terminated upon reaching a predefined maximum number of generations or a desired level of fitness.

Figure 2.3 Steps of a canonical genetic algorithm (GA). The variable g denotes the number of generations.

2.8 Genetic Programming

Genetic programming (GP) [57], established formally by Koza [56], is closely related to the GA, although it has a few critical differences. In GP, the length of the chromosome is variable and the representation is formulated using trees instead of the strings used in the GA. Another important aspect of GP, unlike the GA, in which chromosomes often directly encode the solution to the problem, is that the tree provides a program or solver that is used to solve the problem, or instructions for how to construct a solution. In GP, there are two types of nodes: functions, which have some number of arguments that they operate on, and terminals. Figure 2.4 depicts an example of a GP program (function), which illustrates the functions {cos, log, ( )², +, ×} and the terminals {v1, v2} and encodes the function cos(v1)·(log(v2) + v1²).

Figure 2.4 A tree diagram of a GP chromosome. The dark circles indicate the function nodes and the light circles indicate the terminal nodes.

Given that in the GP algorithm the result is a program rather than a solution, the evaluation that determines the performance of the resultant program is somewhat different than for a GA. In GP, a set of input data is provided to the program (i.e., the program is executed in some environment) and the output data of the program are evaluated in order to assess the performance of the program. For GP problems in which the performance is partially determined by the environment, each program should be evaluated using many samples as input to determine the fitness of the tree. As in the GA, in GP the genetic operators (selection, crossover, and mutation) are used as the engine to create new programs and add diversity to the population. Selection in GP is typically implemented using the same operators as in the GA, such as tournament selection and fitness proportional selection (see Section 2.7), although other selection routines have also been proposed, such as fitness over-selection. Among crossover routines for GP are subtree exchange crossover, self-crossover, module crossover, and context-preserving crossover. Crossover in GP typically creates two different trees from two parent trees by selecting random points in each tree and swapping the subtrees below them. Figure 2.5 shows an illustration of crossover in a GP context. There are many mutation routines similar to those used in the GA, and others specifically designed for GP. Some of these are point mutation, permutation, hoist, collapse subtree mutation, and gene duplication [3], among others. Figure 2.6 illustrates a typical form of mutation in a GP context.
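A minimal sketch of these ideas in plain Python (the nested-list node encoding, the protected logarithm, and the root-level subtree swap are illustrative conventions, not the representation used later in this work):

    import copy, math, random

    FUNCTIONS = {'+': 2, '*': 2, 'cos': 1, 'log': 1, 'sq': 1}
    TERMINALS = ['v1', 'v2']

    def random_tree(depth=3):
        # Grow a random expression tree; a node is [function, child, ...].
        if depth == 0 or random.random() < 0.3:
            return random.choice(TERMINALS)
        op = random.choice(list(FUNCTIONS))
        return [op] + [random_tree(depth - 1) for _ in range(FUNCTIONS[op])]

    def evaluate(node, env):
        if isinstance(node, str):                  # terminal node
            return env[node]
        op, args = node[0], [evaluate(c, env) for c in node[1:]]
        if op == '+':   return args[0] + args[1]
        if op == '*':   return args[0] * args[1]
        if op == 'cos': return math.cos(args[0])
        if op == 'sq':  return args[0] ** 2
        # Protected log keeps the function defined for any argument (closure).
        if op == 'log': return math.log(abs(args[0]) + 1e-12)
        raise ValueError(op)

    def subtree_crossover(p1, p2):
        # Swap one randomly chosen subtree of each parent (children of the
        # roots here; full GP picks swap points uniformly over all nodes).
        c1, c2 = copy.deepcopy(p1), copy.deepcopy(p2)
        if isinstance(c1, list) and isinstance(c2, list):
            i, j = random.randrange(1, len(c1)), random.randrange(1, len(c2))
            c1[i], c2[j] = c2[j], c1[i]
        return c1, c2

    # The example tree of Figure 2.4: cos(v1) * (log(v2) + v1^2).
    tree = ['*', ['cos', 'v1'], ['+', ['log', 'v2'], ['sq', 'v1']]]
    print(evaluate(tree, {'v1': 0.5, 'v2': 2.0}))
    print(evaluate(random_tree(), {'v1': 1.0, 'v2': 3.0}))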
The iteration of the GP algorithm is similar to the steps presented for the GA in Figure 2.3.

Figure 2.5 Illustration of a typical crossover operator in GP. (a) The parents, indicating the subtrees that will be swapped. (b) The results (children) after crossover is performed.

Figure 2.6 Illustration of the mutation operator in a GP framework. (a) The individual, indicating the subtree that will be mutated. (b) The result after mutation is performed.

Chapter 3 Genetic Programming to Create Artificial Features

In this chapter we discuss the limitations of the GFNCAF algorithm and introduce and discuss the genetic-programming-derived artificial feature (GPAF) algorithm, which is the proposed solution to overcome the limitations encountered in the GFNCAF approach. Later, we apply the GPAF algorithm to a demonstration problem in which the algorithm should learn the concept of parallelism, creating an equation able to determine whether or not two randomly generated vectors are parallel. A comparison with GFNCAF results on this problem is provided.

3.1 GFNCAF Algorithm Limitations

The previous chapter discussed the genetically found, neurally computed artificial feature algorithm. This algorithm was intended to create artificial features, that is, computer-crafted features, that have the same or greater relevance as the classical features used to create them, and therefore to give the classifier more power to distinguish among classes. However, the GFNCAF algorithm still has various limitations, which were noted by Firpi in [30]. The limitations are summarized as follows:

- The algorithm starts creating features from a limited library of features, rather than creating the features directly from the data under study.
- The unary and n-ary operator set was limited to only a few mathematical operations.
- The neural network used in their work was based on an arbitrarily chosen, fixed-architecture network (number of nodes determined a priori), in which the only topological connections that could be connected or disconnected were the inputs.
- The topological structure of the network does not evolve, limiting the space of possible artificial features (solutions).

The authors suggested how to extend the capability and performance of the algorithm to give more relevance and power to the new features. Those points are:

- Allow arbitrary topological connections in the neural network to be active or inactive when creating new artificial features.
- Let the evolutionary part of the algorithm develop its own neural network structure and control the number of unary/n-ary layers, nodes, and connections.
- Allow more complex unary and n-ary operators in the neural network nodes (e.g., trigonometric and transcendental operations).
- Allow the algorithm to start creating features directly from the data under study, rather than using a limited library of pre-defined features.

In the next section, we discuss an approach to implementing some of the above ideas.

3.2 Genetic Programming Artificial Features

The limitations described in the first and second bullets above regarding the GFNCAF algorithm can be overcome by using GP. The typical tree representation of GP programs can be seen as an algebraic network, or vice versa, from a GFNCAF point of view. Similarities and differences between the GFNCAF and GP algorithms were noted in [30].
After depth and size parameters for the tree are set up, the GP can evolve any tree structure allowed by its syntax rules, creating simple features (functions) as well as complex ones. This is unlike the GFNCAF algorithm, in which the algebraic network (replaced by a tree diagram in the GP framework) is fixed (although prior knowledge could be added to a network when it is being designed): its topological connections are fixed, and the only connections that can be switched on or off are the inputs. GP has been used to create artificial features before; similar approaches appeared in [55] [89], where the authors used GP to create features (although they refer to this process as feature extraction) that give the classifier better performance than the original features. Because GP can use any type of function with verified closure (i.e., the function should be defined for any argument the system can present to it), many functions are available, such as trigonometric, polynomial, and transcendental functions. This richness helps to address the limitation stated in the third bullet of the previous section.

Figure 3.1 shows the block diagram of the genetic programming artificial features (GPAF) algorithm. The genetic programming module produces the program (function) that creates the artificial feature. The program receives as input (i.e., as a terminal in the GP tree) the data of interest (input signal), and the output is the result of this artificial feature. The signals input to the GP are not limited to one; many inputs can be provided, which results in a GP program with multiple terminals.

Figure 3.1 Diagram of the GP-based artificial features algorithm. The input data (signal) are processed by the program generated by GP. Later, the output is used to train the classifier.

Data processed by the artificial feature are used to train a classifier, which serves as a supportive component to evaluate the performance of the GP program, or artificial feature. In addition, testing data must be processed by the artificial feature in order to be evaluated by the classifier. The classifier can be parametric or nonparametric, such as a maximum likelihood or k-nearest neighbor classifier, among others. Because one of the contributions of this work is to prescribe a general-purpose algorithm that automatically creates artificial features from conventional features or raw data, we selected the k-nearest neighbor classifier: it is a nonparametric, nonlinear classifier capable of producing multiple thresholds, and its training procedure is simple. The error, or misclassification of samples, in the classifier is the measure used by the GP for the fitness or cost function.

3.3 Comparing GPAF and GFNCAF Algorithms

In [30], the feasibility of the GFNCAF algorithm was tested on the problem of trying to learn the concept of parallelism from a set of randomly generated vectors. In the present work, we use this same problem to show clearly the implementation of the GP-based artificial features; the results obtained are also compared with those obtained in [30]. In the vector-parallelism learning problem, it is desired to find an efficient method of testing whether two given vectors are parallel (to within some error tolerance) or not.
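As a sketch of the pipeline in Figure 3.1 (assuming NumPy; the function names and the reshaping convention are hypothetical, not code from this work), the fitness of one candidate artificial feature can be scored by transforming the data and measuring the k-NN misclassification rate on the transformed set:

    import numpy as np

    def knn_error(train_x, train_y, test_x, test_y, k=3):
        # Misclassification rate of a k-NN classifier (Euclidean distance);
        # train_y and test_y are integer class labels (0 or 1).
        errors = 0
        for x, y in zip(test_x, test_y):
            d = np.linalg.norm(train_x - x, axis=1)
            vote = np.bincount(train_y[np.argsort(d)[:k]]).argmax()
            errors += int(vote != y)
        return errors / len(test_y)

    def feature_fitness(feature_fn, train, test):
        # Map raw inputs through the candidate artificial feature, then
        # measure k-NN error; this is the quantity the GP would minimize.
        (Xtr, ytr), (Xte, yte) = train, test
        Ftr = np.apply_along_axis(feature_fn, 1, Xtr).reshape(len(Xtr), -1)
        Fte = np.apply_along_axis(feature_fn, 1, Xte).reshape(len(Xte), -1)
        return knn_error(Ftr, ytr, Fte, yte)

    # Tiny usage example on two synthetic clusters.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
    y = np.array([0] * 50 + [1] * 50)
    print(feature_fitness(lambda v: v.sum(), (X[::2], y[::2]), (X[1::2], y[1::2])))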
A simple analytic geometric equation that precisely measures whether two vectors are parallel is the absolute cosine formula

$$|\cos\theta| = \frac{\left|(\Delta x_1, \Delta y_1)\cdot(\Delta x_2, \Delta y_2)\right|}{\left\|(\Delta x_1, \Delta y_1)\right\|\,\left\|(\Delta x_2, \Delta y_2)\right\|} = \frac{\left|\Delta x_1 \Delta x_2 + \Delta y_1 \Delta y_2\right|}{\sqrt{\left(\Delta x_1^2 + \Delta y_1^2\right)\left(\Delta x_2^2 + \Delta y_2^2\right)}}, \qquad (3.1)$$

where (Δx1, Δy1) and (Δx2, Δy2) are the increments of each component for the two given vectors (the vectors' tails need not originate from the point (0, 0) on the plane). These data are relevant features. If |cos θ| equals one, the vectors are parallel; otherwise, the vectors are nonparallel. This equation can be seen as a nonlinear reduction transformation, given that it reduces the feature space (dimensionality) from four features to one.

In a first experiment, we use the GPAF algorithm to find an artificial feature with performance similar or equal to the absolute cosine equation, using as possible terminals any of the increment features, i.e., Δx1, Δy1, Δx2, and/or Δy2. Later, in a second experiment, we input to the GPAF algorithm the terminal sets (Δx1, Δy1, x1, y1) and (Δx2, Δy2, x2, y2), where (x1, y1) and (x2, y2) are the tail coordinates of the two vectors at hand. It should be noted that these initial points of a vector do not in themselves provide relevant information for determining whether two vectors are parallel. Therefore, it is expected that the GPAF algorithm will realize which features are irrelevant and disregard them, i.e., not use them in designing the artificial feature. The functions available for the GP are {+, −, ×, ÷, | |, ( )², √}, the same set of mathematical operators used in the experiments with the GFNCAF algorithm. Because we are providing conventional (original) features to the algorithm to create an artificial one, this problem follows what we term the artificial-features-from-classical-features approach. To measure the performance of a GP program (artificial feature), we use Equation (3.2):

$$P_{error} = \frac{\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|}{N}, \qquad (3.2)$$

where P_error is the percentage of misclassified samples, y_i is the output of the classifier, taking a value of 0 or 1, ŷ_i is the desired output, i is the sample index, and N is the total number of samples evaluated. In these experiments, for the classifier stage (see Figure 3.1), we use the same classifier used in [30], k-nearest neighbor with the Euclidean distance as metric, to be able to compare the results of the GPAF algorithm presented here with those reported by Firpi and Echauz using the GFNCAF algorithm. The constant k is set to three. Again, the output of the classifier is one if the vectors are parallel and zero if they are not.
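To make the setup concrete, a small sketch (assuming NumPy; the positive-only scale factors used to build parallel pairs are an illustrative simplification) generates labeled vector pairs with increments in (−5, 5) and checks that the absolute cosine feature of Equation (3.1) separates the classes:

    import numpy as np

    rng = np.random.default_rng(1)

    def make_pairs(n, parallel):
        # Increments drawn uniformly from (-5, 5), as in the experiments.
        d1 = rng.uniform(-5, 5, size=(n, 2))
        if parallel:
            d2 = d1 * rng.uniform(0.1, 3.0, size=(n, 1))   # scaled copies
        else:
            d2 = rng.uniform(-5, 5, size=(n, 2))           # independent draws
        return np.hstack([d1, d2])

    def abs_cosine(v):
        # Equation (3.1) applied to one sample (dx1, dy1, dx2, dy2).
        dx1, dy1, dx2, dy2 = v
        num = abs(dx1 * dx2 + dy1 * dy2)
        return num / np.sqrt((dx1**2 + dy1**2) * (dx2**2 + dy2**2))

    X = np.vstack([make_pairs(128, True), make_pairs(128, False)])
    y = np.array([1] * 128 + [0] * 128)
    feat = np.apply_along_axis(abs_cosine, 1, X)
    # Parallel pairs map to exactly 1; random pairs almost never do.
    print((feat[y == 1] > 0.999).mean(), (feat[y == 0] > 0.999).mean())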
3.4 Results and Comparison

Similar to the data generated by Firpi and Echauz, in these experiments 128 parallel and 128 nonparallel vector pairs were generated for training, while another set of equal size was generated for testing. The vectors had randomly generated increments in the range (−5, 5) and initial points in the range (−10, 10). Table 3.1 shows the classification accuracies on all sets of test vectors. The number of neighbors, k, was set to 5. From the table, it can be observed that by using the k-NN classifier with the 4 increment features as inputs, up to 75% of classifications are correct. Also, as expected, when the four features are processed by Equation (3.1) (the absolute cosine formula), the classifier categorizes with 100% accuracy.

Table 3.1 Results of the classifier using the indicated feature inputs.

Feature inputs                     | Input to classifier                | Percent correct
Δx1, Δy1, Δx2, Δy2                 | Δx1, Δy1, Δx2, Δy2                 | 75%
Δx1, Δy1, Δx2, Δy2, x1, y1, x2, y2 | Δx1, Δy1, Δx2, Δy2, x1, y1, x2, y2 | 55%
Δx1, Δy1, Δx2, Δy2                 | 1 absolute cosine feature          | 100%
Δx1, Δy1, Δx2, Δy2                 | 1 GP feature (Eq. (3.3))           | 100%
Δx1, Δy1, Δx2, Δy2                 | 1 GFNC feature (Eq. (3.4))         | 100%
Δx1, Δy1, Δx2, Δy2, x1, y1, x2, y2 | 1 GP feature (Eq. (3.5))           | 100%
Δx1, Δy1, Δx2, Δy2, x1, y1, x2, y2 | 1 GFNC feature (Eq. (3.6))         | 84.2%

When the four relevant features were processed by the GP feature, 100% correct classification was obtained. Figure 3.2 depicts the tree diagram for the GP artificial feature; from this tree, the GP artificial feature designed by the GPAF algorithm, Equation (3.3), can be obtained (the exact expression, a composition of Δx1, Δy1, Δx2, and Δy2 under the available operator set, is the one encoded by the tree in Figure 3.2). It can be noted that the GP algorithm examines every input feature, even though it is able to disregard any input. For comparison, Equation (3.4) is the formula that was found by the GFNCAF algorithm with the four relevant features; the interested reader is referred to [30] for the exact expression. The two equations are different; however, both of them still meet the requirement of classifying random vectors correctly. The evolution of fitness, number of nodes, and number of levels (depth) through the generations is shown in Figure 3.3. This figure portrays the accuracy of the program (artificial feature) versus its complexity.

In the second experiment, running the GPAF algorithm with eight possible terminal values, a classification accuracy of 100% was obtained. The tree diagram is shown in Figure 3.5. From the tree diagram, the following analytical expression can be obtained:

$$\left|y_1 + \Delta x_2\right|\,\Delta x_2\left(\Delta y_1 \Delta x_2 - \Delta x_1 \Delta y_2\right). \qquad (3.5)$$

All four relevant features appear in the equation. Also, an irrelevant feature, y1 (an initial point), was included by the algorithm. This variable can be considered noise; nonetheless, the artificial feature achieves perfect classification at the expense of added computational effort. It is likely that a bigger sampling of random vectors, in which y1 would have been represented throughout the whole plane, would have made this variable's irrelevance more obvious to the algorithm. Figure 3.4 presents the accuracy-versus-complexity graph for this experiment. Again, for comparison purposes, the expression found by the GFNCAF algorithm for the same problem (i.e., with eight feature inputs) is given as Equation (3.6) in [30]; we note that GFNCAF was also confused by a non-essential feature (y2) in this case. Table 3.1 shows that this equation achieved 84.2% correct classification of the vectors. Although the GFNCAF algorithm could not find an artificial feature that performed perfect classification, the tendency of the algorithm to pay attention to the relevant features and disregard those that are irrelevant (the same tendency noticed in the GPAF algorithm) is apparent.

The GP used for these experiments included simple crossover and mutation operators with variable probability of occurrence [87]. To avoid bloat (i.e., unwarranted growth of tree size until reaching the maximum depth), a dynamic depth technique was used [88]. The number of generations was set to 100 and the population size was 50 individuals. Lexicographic parsimony pressure [61] was the selection method used. The initial population was created using the Ramped Half-and-Half method [57] with size variation.

Figure 3.2 Tree diagram for the artificial feature found by the GPAF algorithm.
Figure 3.3 Evolution of fitness, levels (depth), and nodes for the GP feature with 4 inputs (accuracy versus complexity).

Figure 3.4 Evolution of fitness, levels (depth), and nodes for the GP feature with 8 inputs.

Figure 3.5 Tree diagram for the artificial feature with 8 feature inputs.

3.5 Summary

In this chapter, we introduced, discussed, and tested the feasibility of the GP-based artificial features (GPAF) algorithm to create artificial features for a test problem in which a function was sought that could help categorize a set of random vectors as parallel vs. nonparallel. The problem used an artificial-features-from-classical-features approach. The algorithm found artificial features that process the original features, yielding a resultant feature that was then input to the classifier. A k-NN classifier was used, classifying the inputs into their respective classes. A comparison of results with related work, the genetically found, neurally computed artificial feature (GFNCAF) algorithm [30], was presented. From the results, it can be seen that both GFNCAF and GPAF are able to create highly complex, nontrivial, nonlinear transformations that can increase the separability among the classes (in this case, between parallel and nonparallel vectors), helping the classifier make better decisions when classifying the data. However, the properties of the GP algorithm (one need only indicate the maximum depth and the number of nodes, and the algorithm takes care of the feature's topology) make the GPAF algorithm more flexible for creating simpler as well as more complex features. For this reason, the parsimony of the GPAF algorithm's results may sometimes be lower. For instance, in Equation (3.5), the GPAF algorithm created an equation that contains more elements (i.e., inputs and operators) than that created by the GFNCAF algorithm (Equation (3.6)) for the same experiment, but the GPAF-designed feature achieved perfect classification. The GP algorithm cannot ensure that it will find results that achieve the intended objectives with the maximum parsimony possible within a limited-time framework (in fact, no evolutionary algorithm can), but only that at least a suboptimal solution will be found. Because the GP algorithm is the search engine of the GPAF algorithm, the previous statement also applies to it. On the other hand, with the inclusion of the GP algorithm to replace the evolutionary component and the algebraic network, the first and second recommendations bulleted in Section 3.1 for improving the GFNCAF algorithm were addressed and implemented. The third and fourth bullets were not implemented because the degree of complexity of the problem in this chapter (i.e., categorization of parallel and nonparallel vectors) did not require addressing those limitations. In the next chapter, however, those limitations will be addressed for the prediction and detection of epileptic seizures.

Chapter 4 GP Artificial Features Applied to Seizure Prediction

In this chapter, we explain the application of the GPAF algorithm to epileptic seizure prediction.
The next section describes how the electroencephalographic signals (EEGs) used in this research were obtained. The EEGs were intracranial voltage signals measured in patients with mesial temporal (middle of the lateral brain lobes) epilepsy. We also present some concepts from nonlinear dynamic systems theory concerning state reconstruction via a delay-embedding method, and provide an example of using such a method. Later, we merge the nonlinear dynamic systems concepts with the GPAF algorithm. Additionally, we define the benchmark classifier whose performance will be compared with that of the classifiers designed by the GPAF algorithm. The following sections contain the mathematical development of the performance metrics for the GP version. Also, we present the function and terminal sets chosen for the GP algorithm. Results for the various patients are presented in the last section.

4.1 Clinical Context

The anonymized intracranial EEG data archive used for these experiments was obtained from a tripartite database from the Georgia Institute of Technology, Emory University, and the University of Pennsylvania. The EEG data were recorded from epileptic patients undergoing a pre-surgical evaluation. According to the Epilepsy Foundation of America [23], of every 100 new cases of epilepsy reported, one patient is a candidate for surgery, and 70% of those are cured by removing a part of the affected region of the brain. To perform these evaluations, recordings were obtained by surgically implanting scalp electrodes or intracranial electrodes deep inside the brain. Intracranial electrodes are preferred over scalp electrodes because the recorded EEG signals are practically clear of the undesired artifacts created by physiological activities like heartbeat, chewing, moving, blinking, etc. The patients suffered complex partial seizures: seizures that arise focally and cause loss of consciousness. Intracranial recordings allow neurologists to identify more accurately the foci where the seizures originate; identification of the correct foci was not possible using scalp electrodes. For each patient, a six-contact depth electrode was placed in each temporal lobe through a hole drilled in the back of the skull (occipital region), as depicted in Figure 4.1. High-resolution MRI images of the patient were used to determine the coordinates at which the electrodes should be placed.

Figure 4.1 Electrodes placed in the hole drilled in the back area of the skull.

Epileptologists visually reviewed all the EEG recordings and marked each seizure, first identifying seizure activity on the EEG and then moving backward in time to determine where the earliest EEG changes from baseline to epileptic seizure activity occurred. Second, the unequivocal electrographic onset (UEO) was identified. This is the point in time that marks the onset of a seizure [82] without any doubt that a seizure is occurring. Additionally, for each patient, the intracranial EEG archive, synchronized video, and medical records were reviewed. Other relevant clinical aspects, like clinical activity, sleep-cycle times, and the times and doses of medication administered during the hospital stay, were also recorded.

4.1.1 Data Acquisition Equipment and Process

Continuous intracranial EEG and video of the patients were collected on a digital, 64-channel, 12-bit Nicolet BMSI-5000 epilepsy monitoring system. The EEG signal was downloaded from tape to CD-ROM. EEGs were digitized at 200 Hz and recorded after filtering through a bandpass of 0.1–100 Hz.
A digital 60-Hz notch filter was also employed to suppress line noise. Hospital stays varied from 3 to 14 days. For this work, 66 CDs totaling several gigabytes of raw data were available. Figure 4.2 shows the electroencephalographic signal for a case in which an epileptic seizure will not occur immediately thereafter, as well as the signal shortly before an epileptic seizure; the durations of these signals are 10 minutes.

4.1.2 EEG Data Description

As mentioned previously, the anonymized intracranial EEG data archive used for these experiments was obtained mostly from the Emory University School of Medicine. From this archive, we have EEG data available for seven patients: baseline signals as well as (pre)ictal signals. Table 4.1 shows, for each patient, the number of tapes that are seizure-free and the number of tapes that contain one or more seizures.

Table 4.1 Number of baseline and seizure tapes available.

Patient | No. of baselines | No. of seizures
A       | 2                | 4
B       | 2                | 4
C       | 1                | 8
D       | 14               | 6
E       | 20               | 5
F       | 3                | 6
G       | 11               | 8

The durations of the EEG recordings vary from 38 hours of recording for patient D up to 275 hours available for patient G. Each EEG recording tape lasts approximately eight hours (although a few are shorter). Because much interference, such as 60 Hz line voltage, blackouts, and other forms of noise, can be introduced during the EEG recording procedure, the recordings are not pure or clean. To remove some of these artifacts, a preprocessing step is carried out; even then, many artifacts pass through the filters. For this reason, if we design an algorithm to predict seizures, the algorithm may trigger (indicate a seizure) because of one of those artifacts and not because a seizure will actually occur. Given this situation, if all the available EEG hours were evaluated, we would need to analyze and study all the recordings and verify, one by one, all the places throughout the EEG signal where the algorithm triggered, in order to see whether the algorithm was responding to valid EEG data or, instead, to a recognizable artifact. This verification would be tedious beyond practical limits. For this reason, baseline and preictal data segments of ten-minute duration were extracted from the whole EEG data set of each patient for training and validation purposes. The extracted EEG data segments are a statistical sample representing the whole EEG archive of a patient. These segments were analyzed by eye to make sure that they were clean and that no recognizable artifacts were present. As a general rule, we extracted as many baseline epochs as there were epileptic seizures in the patient's EEG record; in some cases, however, a few more baseline epochs were extracted.
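For readers who want a comparable offline preprocessing chain, the following sketch (assuming SciPy; the filter order and notch quality factor are illustrative choices, and the acquisition system's own filtering differed) applies a 0.1 Hz high-pass (at a 200 Hz sampling rate the 100 Hz upper band edge coincides with the Nyquist frequency, so no separate low-pass is applied here) plus a 60 Hz notch:

    import numpy as np
    from scipy import signal

    FS = 200.0  # sampling rate of the archive, in Hz

    def preprocess(eeg):
        # 0.1 Hz high-pass, then a 60 Hz notch, both applied zero-phase.
        b_hp, a_hp = signal.butter(4, 0.1, btype='highpass', fs=FS)
        b_n, a_n = signal.iirnotch(60.0, Q=30.0, fs=FS)
        x = signal.filtfilt(b_hp, a_hp, eeg)
        return signal.filtfilt(b_n, a_n, x)

    # A ten-minute epoch: 600 s * 200 Hz = 120,000 samples.
    epoch = np.random.default_rng(2).standard_normal(120_000)
    print(preprocess(epoch).shape)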
4.2 EEG Signals at a Glance

In this work, our interest is to apply the GPAF algorithm to extract or create artificial features that increase the performance of the classifier. In this chapter, we apply the GPAF algorithm to create a feature that can achieve an acceptable classification rate in categorizing EEG data as either baseline or preictal waveforms. Here we deal with mesial temporal lobe epilepsy (MTLE). First, let us look at the EEG data for both the baseline and preictal classes. Figure 4.2 shows two epochs (i.e., data segments), one baseline and one preictal, both of ten-minute duration (as mentioned before, the EEG signals were sampled at 200 Hz; therefore, a ten-minute epoch contains 120,000 points). The preictal epoch was taken to begin fifteen minutes before the UEO and end five minutes before the UEO. That is, in order to obtain a prediction sufficiently in advance of the seizure onset, we wanted to base the prediction on a signal epoch ending at least five minutes before the UEO. From the illustration, it can be noted that there is no obvious visual pattern that helps to distinguish one signal from the other.

Figure 4.2 Illustration of a ten-minute-duration baseline signal (a) and a ten-minute-duration preictal signal, extracted until five minutes before the UEO (b).

Figure 4.3 presents the (empirically determined) probability density functions of both EEG epochs. It can be seen that the baseline data and preictal data overlap almost completely. In fact, if we calculate the first- and second-order statistics for both signals, we obtain means for both classes of approximately zero (i.e., μ_baseline ≈ 0 and μ_preictal ≈ 0) and variances of σ²_baseline = 2550.0 and σ²_preictal = 707.02, making a high classification rate virtually impossible without any preprocessing of the EEG signals.

Figure 4.3 Probability density functions for two ten-minute epochs: one baseline (solid line) and one preictal (dashed line).

4.3 Nonlinear Dynamical Systems and State-Space Reconstruction

It has been proposed that EEG signals are nonlinear [77], nonstationary [103], high-dimensional (because of the number of neurons in the brain), and chaotic (complicated, but of relatively low dimension) [47], [78]. Whether or not EEG signals are nonlinear is still argued by other researchers, some of whom propose that EEG signals are linearly generated, i.e., that one should treat EEG signals like the output of a linear system whose input is noise [86], [98]. Nonetheless, in this work, we treat EEG signals as nonlinear and chaotic signals. EEG signals have been analyzed from many perspectives and using different methodologies, most of which were cited and briefly discussed in Chapter 2. Many authors have applied nonlinear dynamics tools to analyze EEG data. Chaos theory [91] states that in a system with apparently disordered dynamics, an underlying order resides in the data. Chaotic systems have various characteristics [91]:

- Chaotic systems are aperiodic.
- They are sensitive to initial conditions: a slight change in them can cause a completely different long-term output response (hence, unpredictability in the long term).
- The equations that dictate the dynamics of the system are nonlinear.

The second point states that long-term prediction is impossible; however, it has also been stated that short-term prediction in chaotic systems is possible.
Let us consider a system with dynamics described by

$$\dot{\mathbf{x}} = g\left(x_1(t), x_2(t), \ldots, x_n(t)\right) \qquad (4.1)$$

with a unique observable output described by

$$y(t) = f\left(x_1(t), x_2(t), \ldots, x_n(t)\right), \qquad (4.2)$$

where n is the number of equations that describe the dynamic behavior of the system. Evaluating these equations from an initial state x₀, a trace is left by the state vector x while traversing the state space Rⁿ. This state trajectory is denoted B_t(x₀). Nonlinear dynamic theory asserts that the state trajectory B_t(x₀) can be reconstructed by uniformly sampling Equation (4.2) to obtain a time series, which we denote y_k, where k = 0, ..., n_s − 1, n_s is the number of samples, and

$$y_k \equiv y(k\tau), \qquad (4.3)$$

where τ is the delay, or sampling interval. This strategy is called time-delay embedding [95]. Figure 4.4 illustrates a state trajectory being reconstructed by delay embedding. The method involves building a pseudo-state vector λ_k (also known as the embedding vector) of n_e delayed samples (using Equation (4.3)) from the observable measurements y(t):

$$\lambda_k = \left[\, y_{k-(n_e-1)} \;\; y_{k-(n_e-2)} \;\; \cdots \;\; y_k \,\right]. \qquad (4.4)$$

The parameter n_e is also known as the embedding dimension. Equation (4.4) is a pseudo-state vector at time k. To perform the state reconstruction completely, we need to slide this window through all n_s samples taken. This can be formulated as a matrix: if we stack all the embedding vectors, we obtain

$$\Gamma = \begin{bmatrix} y_0 & y_1 & y_2 & \cdots & y_{n_e-1} \\ y_1 & y_2 & y_3 & \cdots & y_{n_e} \\ y_2 & y_3 & y_4 & \cdots & y_{n_e+1} \\ \vdots & & & & \vdots \\ y_{N_e-1} & y_{N_e} & y_{N_e+1} & \cdots & y_{n_s-1} \end{bmatrix}, \qquad (4.5)$$

where N_e denotes the number of embedding vectors and equals n_s − (n_e − 1). This matrix is called the embedding matrix. Such a matrix contains, or samples, the trajectory of B_t(x₀). However, the embedding matrix Γ does not contain an exact copy of the trajectory, but a distorted copy of the "genuine" trajectory of B_t(x₀). To be more consistent with topological concepts, the embedding matrix is a diffeomorphism of the original trajectory. A diffeomorphism [37] is a differentiable homeomorphism (a smoothly distorted copy) of the original trajectory that preserves the topological properties of the trajectory, such as number of holes and sharp corners, and thus properties like fractal dimension and Lyapunov exponents. Although more details are beyond the scope of this work, interested readers are referred to a full explanation in [20].

Figure 4.4 Reconstruction of the delay-embedding trajectory from a time series.

One valid question that comes up is: what value should n_e be set to? If the embedding dimension is set such that n_e ≥ n (recalling that n is the order, or the number of equations that describe the system in Equation (4.1)), a state reconstruction can be generated, though it is not guaranteed (e.g., a Klein bottle needs a four-dimensional Cartesian space to describe its trajectory even though it is a two-dimensional manifold), because only n equations are needed to solve for n unknowns. On the other hand, if n_e < n, it is still possible to "recover" the state space if the trajectory settles in a lower-dimensional region. Takens [95] proposed the embedding theorem, which gives an upper bound on the embedding dimension n_e: Takens' bound guarantees that the state space can be reconstructed by setting n_e = 2n + 1. However, in practice, the value of n is not known; but given such an upper bound for n_e, various strategies have been conceived.
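A minimal sketch of Equations (4.4) and (4.5), assuming NumPy (the function name is hypothetical): it stacks the N_e = n_s − (n_e − 1) embedding vectors of a sampled observable into the matrix Γ, here with a unit delay between successive samples (a delay of τ > 1 samples simply strides the slices):

    import numpy as np

    def embedding_matrix(y, n_e, tau=1):
        # Rows are the pseudo-state vectors lambda_k of Equation (4.4);
        # stacked together they form the matrix Gamma of Equation (4.5).
        N_e = len(y) - (n_e - 1) * tau      # number of embedding vectors
        return np.column_stack([y[j * tau : j * tau + N_e] for j in range(n_e)])

    y = np.arange(10.0)                     # a toy "observable" time series
    print(embedding_matrix(y, n_e=3))       # 8 rows; first row [0. 1. 2.]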
One of the simplest methods used is to start evaluating the model with n_e = 1, 2, ..., n_s and increase it until an acceptable prediction error is obtained, i.e., until the difference between the predicted value and the real value is low [75]. Additionally, some researchers have noted that what the embedding theorem states is that even if a system is of infinite order, a finite number of delays of the observable measurement (the observable output) can be used to realize the whole dynamics, given that the state confines its trajectory to a space of finite dimension. Such cases have been reported in studies of turbulent fluid flow [84].

For the sake of clarity and comprehension, consider a well-known chaotic system: the Lorenz attractor. The equations that govern its dynamics, although simple, produce chaotic behavior. When this three-dimensional system is plotted in a three-dimensional space, the trajectory of the system can be observed. Figure 4.5(a) illustrates the Lorenz attractor in a three-dimensional plot. Equation (4.4) states that we can reconstruct a diffeomorphism of the original trajectory using previous values of an observable measure from the system under study. In this case, we know that the order of the system is n = 3. Therefore, using a pseudo-state vector formed by one state variable and two delayed samples of it, we reconstruct the trajectory. Figure 4.5(b) shows the state reconstruction of the Lorenz attractor.

Figure 4.5 (a) Lorenz attractor and (b) its reconstruction using the previous values of a state variable.
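The reconstruction in Figure 4.5(b) can be reproduced with a short sketch (assuming SciPy and NumPy; the integration span, step size, and the delay of 8 samples are illustrative choices): integrate the Lorenz equations, keep only the x component as the observable output, and stack it against its delayed copies:

    import numpy as np
    from scipy.integrate import solve_ivp

    def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = s
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    # Sample the observable output, the x component only (cf. Equation (4.2)).
    sol = solve_ivp(lorenz, (0.0, 50.0), [1.0, 1.0, 1.0],
                    t_eval=np.arange(0.0, 50.0, 0.01))
    obs = sol.y[0]

    # Pseudo-state vectors [x(t), x(t - tau), x(t - 2 tau)], tau = 8 samples.
    tau, n_e = 8, 3
    N_e = len(obs) - (n_e - 1) * tau
    traj = np.column_stack([obs[j * tau : j * tau + N_e] for j in range(n_e)])
    print(traj.shape)   # plotting the three columns reproduces Figure 4.5(b)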
4.4 State-Space Reconstruction and the GPAF Algorithm

Earlier in the previous section, it was mentioned that the dynamics of EEG signals are considered to be high-dimensional. This is discernible because the human brain has approximately 10¹¹ neurons, and each of these neurons can be described with an equation that dictates its dynamics, making the order of the brain practically infinite, or at least very high. However, it was also mentioned that researchers have stated that, by the embedding theorem, a finite number of delays of the observable measurement (the observable output) can be used to reproduce the whole dynamics when the state confines its trajectory to a space of finite dimension. Thus, we can hope to capture a "pocket" of deterministic components of the state trajectory of the EEG signals by taking previous samples of the observable output and creating an artificial state vector with n_e elements (denoted λ_k in Section 4.3). The idea of modeling EEG signals by a delay-embedding method has been proposed before in [20]. Given that we can reconstruct the state trajectory from previous values, our approach is to input such deterministic components, represented as pseudo-state vectors (i.e., λ_k) evolving in time (i.e., the embedding matrix Γ), to a genetic programming module and, by means of the algorithm, to find a pattern or transformation, probably nonlinear (although a linear transformation is possible), that yields a subspace where the baseline and preictal classes exhibit the maximum separability possible, making the prediction of epileptic seizures viable. In other words, the GP algorithm should combine the inputs (states) in a (non)linear way and output a function that separates the classes such that the performance of the classifier is better than without a transformation, or better than a previously defined benchmark.

Let us define an expression that captures mathematically the essence of what was stated above. Let

$$y_i[k] = \phi_i(\lambda_k), \qquad (4.6)$$

where φ_i is the transformation function (or artificial feature) designed by the GP algorithm, i is the artificial feature index (equal to 1 if the GP module is designing just one artificial feature), and

$$\Lambda = \left\{\, x[n],\; x[n-\tau],\; x[n-2\tau],\; \ldots,\; x[n-(n_e-1)\tau] \,\right\} \qquad (4.7)$$

is the set of delayed samples (inputs) that the GP will use, in any combination (with replacement), to construct the artificial features. Therefore, the artificial feature is a function φ_i: R^(n_e) → R. The parameter τ is the delay time, which will be determined using the autocorrelation function (discussed in Section 4.5), and n_e is the embedding dimension. There are various methodologies to determine an adequate embedding dimension, the simplest being trial and error in increasing dimension, as mentioned before.

One work of particular interest in this area, due to its relationship with our work, was presented in [62]. McConaghy et al. proposed the use of a GP algorithm to reconstruct the chaotic dynamic state of a given system. The GP system designed polynomials that reconstructed the exact dynamics of the chaotic system. Also, the GP determined the embedding dimension, given that the authors added the delay operator z⁻¹ to the function set, which applies a unit delay to all terminals in its argument. Therefore, before evaluating a program (individual, or chromosome) to score its fitness, a preprocessing step is performed on each individual to apply the delay operators so that the terminals in the argument receive their respective delays.

As is done with many conventional features (for example, the Fourier coefficients or the energy of a signal, in which the feature is calculated over data viewed through an observation window), in the genetic programming artificial feature (GPAF) algorithm we also use an observation window, also known as a sliding window, which is slid through the entire signal being processed. The sliding window is defined by two parameters: the length of the window, denoted L, and the displacement factor, denoted D, alternatively described via the percentage of overlap between adjacent sampling windows. Figure 4.6 depicts the sliding-window concept over an arbitrarily selected fifteen-second EEG signal.

Figure 4.6 Sliding window (solid rectangle) over an EEG signal. The dashed rectangle indicates the position to which the window will be displaced in the next computation.

The length L defines the number of points that are analyzed at the same time. On the other hand, D is defined as the number of new points used in the next evaluation, such that D = L − O, where O is the overlap, i.e., the number of points that are evaluated again from the previous position of the window. To compress and smooth out the variability in the data, a summation operator is applied at each window position. Recalling Equation (4.6) and adding to it the sliding window, i.e., a summation operator, the resultant equation is

$$y_i[k] = \sum_{n=1+(k-1)D}^{(k-1)D+L} \phi_i\left(x[n],\, x[n-\tau],\, \ldots,\, x[n-(n_e-1)\tau]\right). \qquad (4.8)$$
Here y_i[k] is the GP artificial feature of the EEG signal x[n], termed the artificial feature time series; the subscript i denotes the number of the feature (i.e., we can have a single artificial feature or a vector of such features), n is the index that controls the displacement inside the sliding window, and k is the discrete time unit of this time series (or the index that indicates the next sliding-window position). The sliding window observes all L points in window position k, whereas index n is delimited by 1 + D(k−1) ≤ n ≤ D(k−1) + L (the subscript and superscript on the summation symbol). In this work, we selected n to advance in steps of one, thus taking advantage of all of the EEG data available and scrutinizing the data in maximum detail while looking for patterns. A point y_i[k] in Equation (4.8) is proportional to an average of the GPAF-processed points observed through the sliding window, in which the argument of the summation, φ_i (i.e., the GPAF function), is found by the GP algorithm, and in which, at each window position k, the L points are reduced, by means of Equation (4.8), to a single point at the output. Thus, if we have a signal that contains P_tot points and we set the displacement to D, the number of output points after the entire signal is processed by the GP artificial feature is

$$K = \frac{P_{tot}}{D} - \frac{L}{D} + 1, \qquad (4.9)$$

where the term 1 − L/D adjusts for the beginning and end of the signal, where the window of length L is not complete; some of those positions (or points) are therefore not calculated.

To summarize all the components of the GPAF algorithm, a block diagram is presented in Figure 4.7. Figure 4.8 illustrates the transformation performed by the GPAF.

Figure 4.7 Diagram of the genetic programming artificial features algorithm. The input signal is processed by the program generated by the GP. Later, the output is used to train the classifier.

The function φ is the artificial feature, i.e., the program generated by the GP that increases the separation between classes. Once the EEG data are processed by the artificial feature, y (the artificial feature space) is obtained and input to the classifier (denoted u). The classifier is used as an auxiliary component to calculate the fitness of the program (artificial feature); a researcher may decide not to use a classifier and instead let the GP algorithm design the decision structure (i.e., the classifier), but that carries an enormous computational cost. Additionally, if the researcher decides to use a classifier to calculate the fitness, he or she must be aware that the artificial features are optimized for the classifier that was selected. Although it may be desired that the artificial features work well with any classifier, such performance cannot be guaranteed. The GPAF algorithm continues evaluating new programs (artificial features, or individuals) until the GP reaches a pre-set maximum number of generations or the fitness goal is met, according to a fitness function explained in Section 4.6.

Figure 4.8 Schematic of the nonlinear transformation performed by the GPAF algorithm, denoted φ, in cascade with a classifier, denoted u, which categorizes the data into one of two possible classes: baseline or preictal. The variable x is the EEG signal at hand and the variable y represents the data after processing by the GP artificial feature.
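A minimal sketch of the sliding-window computation of Equations (4.8) and (4.9), assuming NumPy (the function name and the toy feature φ = |x[n] − x[n−τ]| are hypothetical illustrations, not a feature designed by the GP):

    import numpy as np

    def artificial_feature_series(x, phi, n_e, tau, L, D):
        # Stack delayed copies so column n holds (x[n], x[n-tau], ...,
        # x[n-(n_e-1)tau]), then apply phi pointwise and sum per window.
        start = (n_e - 1) * tau
        delays = np.stack([x[start - j * tau : len(x) - j * tau]
                           for j in range(n_e)])
        pointwise = phi(delays)                  # phi over every sample n
        K = (len(pointwise) - L) // D + 1        # cf. Equation (4.9)
        return np.array([pointwise[k * D : k * D + L].sum() for k in range(K)])

    x = np.random.default_rng(3).standard_normal(120_000)  # ten-minute epoch
    y = artificial_feature_series(x, lambda d: np.abs(d[0] - d[1]),
                                  n_e=2, tau=5, L=400, D=200)
    print(y.shape)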
4.4.1 Classifier

Although any classifier may be used in the GPAF algorithm, in this work we selected the classifier with two aspects in mind. Recalling from Chapter 1, one of the contributions of this work is to prescribe a general-purpose algorithm that automatically creates artificial features from conventional features or raw data. Therefore, we selected the k-nearest neighbor (k-NN) classifier as the classifier component of the GPAF algorithm. This classifier is nonparametric and nonlinear, and thus a multi-threshold one, making it suitable even for multi-dimensional, multi-modal problems. In addition, the training process is relatively easy: the whole training dataset is simply given to the classifier. Again, as mentioned previously, because we selected the k-NN classifier, the artificial features are optimized for that classifier; we cannot guarantee that such artificial features will perform well with an arbitrary classifier. There are many metrics or rules used in k-nearest neighbor classifiers; among them are the Minkowski metric, the Manhattan distance (also known as the city-block distance), and the Tanimoto metric. In this work, we use the Euclidean distance, the most common metric used and a special case of the Minkowski metric, defined as

$$D(a, b) = \sqrt{\sum_{i=1}^{d}\left(a_i - b_i\right)^2}, \qquad (4.10)$$

where a and b are the vectors between which the metric is to be calculated.

4.4.2 Detailed Example

Figure 4.9 shows all the components that comprise the GPAF algorithm and the steps involved in the evaluation of one individual. In the figure, we provide an example in which two artificial features are being sought (thus, an individual corresponds to a two-tree "forest"). The first step is the generation of the initial population of artificial features (I). The artificial features are transformed from vectors to trees for comprehension purposes (II); such trees encode the artificial features. From these, the artificial features are translated into mathematical expressions (the arguments of the summation operators) to be evaluated (III).
(4,11) What the autocorrelation tells is how quickly the self—similarity of a process (or a given Signal) changes with respect to time. To graph an autocorrelogram, Equation (4.11) is evaluated for different lags, m, and the results are displayed in the graph. A 72 II. First set of individuals (artificial features) Tree 1 Tree 2 11 x5 X2 X3 X1 x, X: I. Population (forests) ] ”121:0“? L —» l+D(k—l)x5 + x6 III. Artificial features —> (programs) D(k—l L _x yzlkl= z” "‘3'—'—+ I+D(k—I)x5 '(xz )2 IV. EEG signal and sliding window V. Artificial feature signals y: [k] y2[k] Labels -25. 4 12.5 0 -10. 2 16.5 0 —65. 8 3.25 0 ' ‘ 1.25 16.9 0 Training or I Error- checking data - - - calculation 1 . . 1 -3. 25 29.3 VI. k-NN classifier Figure 4.9 Detailed illustration of the GPAF algorithm process. 73 .0 m I ,5 0.6“ ~ :26 e §o.4~ ~ ‘5 < 0.2- - OF- ”0.2 E I I LL 1 I L L I I o 100 200 300 400 500 600 700 800 900 1000 Lags Figure 4.10 Characteristic autocorrelation plot. characteristic normalized autocorrelation plot is shown in Figure 4.10. It can be noted that the closer m (the lag) is to zero, the larger is the autocorrelation. One parameter that is determined from the autocorrelation plot is the autocorrelation time. It is the value of m for which the autocorrelation value, qu(n,n+m), is zero or crosses zero for the first time. In Figure 4.10, this value (point) is indicated, on the lower left side of the plot, with two dashed-lines. To determine the delay time I (see Section 4.3), the autocorrelation plot of an EEG preictal Signal (selected for training purposes) will be Obtained and the autocorrelation time will be visually determined for each patient. The autocorrelation time (delay) value can be represented using time units, e. g. , seconds. 74 4.6 Performance Index The desirability of the artificial features is quantified using classifier-based performance metrics. These performance metrics drive the search of the GP algorithm for the “best” set of artificial features. The prediction/detection epileptic seizure problem is a classical detection problem, with the sole difference that in a classical detection the test is whether the event at hand is noise or a Signal-plus-noise. In our case, we are the test is to know whether the event at hand is nonseizure (analogous to noise) or preseizure (analogous to a signal) data. We can model this more formally by letting H0: y 6 NS (4.12) H]: y e S. This is the classical binary hypothesis test. Typically, H0 is known as the null hypothesis, and states that y is coming from the nonseizure class NS, whereas H1 is the alternative hypothesis that states that y is coming from the preseizure S class. Figure 4.11 Shows an illustration of two conditional PDFS, where p(y|NS) denotes the conditional PDF for the nonseizure class, while p(yIS) denotes the conditional PDF for the preseizure class. There are various metrics that can be defined from this illustration. These metrics comprise integrals, determination of PDFS, and class-decision boundaries. Thus, we develop empirical measures of such metrics, which are unbiased estimates based on counting. Counting can be performed either on a point basis or a block basis. In point-basis counting, a sample (i. e., a point) in the artificial feature space is counted as one example, whereas in block-basis counting, a series of artificial feature samples (i.e., a string of 75 p(y|NS) p(yIS) \ \\ \ PCN PFN Figure 4.11 PDFS for a binary hypothesis test. points) is counted as one example. 
From the above hypotheses, we can define a set of classifier (detector) decisions, which are presented as a confusion matrix in Table 4.2.

Table 4.2 Confusion matrix of the classifier decisions.

              Declared S^      Declared NS^
  True S        N_{CP}            N_{FN}
  True NS       N_{FP}            N_{CN}

S and NS are the labels for the true preseizure and nonpreseizure classes, while their "hats" (circumflex marks) indicate the classes declared by the classifier. N_{CP} is the number of correct positives (preseizure class detections), N_{CN} is the number of correct negatives, N_{FP} is the number of false positives (false alarms: the algorithm or device says that a seizure will occur and it does not), and N_{FN} denotes the number of false negatives (preseizure class misses: the algorithm says that a seizure will not occur, and then it does). Additionally, we define N_S = N_{CP} + N_{FN}, the number of preseizure examples; N_{NS} = N_{CN} + N_{FP}, the number of nonpreseizure examples; and N_{tot} = N_S + N_{NS} = N_{CP} + N_{FN} + N_{CN} + N_{FP}, the total number of examples.

Using the classifier decisions stated above, we can derive a set of empirical formulas to calculate various classifier-based performance metrics. These metrics are provided in Table 4.3. The quantities P_{CP}, P_{FN}, P_{FP}, and P_{CN} are denoted in Figure 4.11.

Table 4.3 Multiple classifier-based performance metrics for the GP cost function.

  Metric                                            Formula
  Probability of correct positive (sensitivity)     P_{CP} = N_{CP} / N_S
  Probability of false negative                     P_{FN} = N_{FN} / N_S = 1 - P_{CP}
  Probability of correct negative (specificity)     P_{CN} = N_{CN} / N_{NS} = 1 - P_{FP}
  Probability of false positive                     P_{FP} = N_{FP} / N_{NS}
  Selectivity                                       Sel = N_{CP} / (N_{CP} + N_{FP})
  Probability of correct classification             P_C = (N_{CP} + N_{CN}) / N_{tot}
  Probability of error                              P_E = (N_{FN} + N_{FP}) / N_{tot} = 1 - P_C

The selectivity is an alternative to the specificity quantity [100]; it denotes how many of the feature vectors (or single features) categorized as preseizure actually were preseizure. Additionally, we can estimate the prior probabilities empirically as shown in Table 4.4, where P(S) denotes the prior probability of the preseizure data and P(NS) denotes the prior probability of the nonseizure data (baseline).

Table 4.4 Prior probability metrics from the confusion matrix.

  Quantity                              Formula
  Prior probability of preseizure       P(S) = N_S / N_{tot}
  Prior probability of nonpreseizure    P(NS) = N_{NS} / N_{tot} = 1 - P(S)

From Tables 4.3 and 4.4, we can state the performance index (or fitness function) for the GP algorithm. The probability of error can be stated as

P_E = P(E|S) P(S) + P(E|NS) P(NS) = P_{FN} P(S) + P_{FP} P(NS).    (4.13)

Equation (4.13) penalizes the two types of error—the false positives and the false negatives—scaled by their respective prior probabilities. In some problems, it is sufficient to use the priors to weight the errors; for prediction and detection of epileptic seizures, however, this may not be enough, because P(S) is very small on the time scales used (seizures last only on the order of minutes, while the nonseizure data may last hours), and missed seizures (false negatives) are relatively unacceptable. For this reason, we invoke the error risk metric, which assigns risk factors r > 0 to adjust the relative costs of the errors. Equation (4.14) shows the error risk metric

R_E = P_{FN} P(S) r_{FN} + P_{FP} P(NS) r_{FP},    (4.14)

where r_{FN} is a risk term associated with missing preictal signals and r_{FP} is a risk term associated with declaring false positives.
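As a concrete illustration of the counting-based estimates, the following minimal Python sketch (ours; names are hypothetical) computes the quantities of Tables 4.3 and 4.4 together with Equations (4.13) and (4.14) from the four confusion-matrix counts:

```python
def performance_metrics(n_cp, n_fn, n_fp, n_cn, r_fn=1.0, r_fp=1.0):
    """Empirical metrics of Tables 4.3-4.4 plus the probability of error
    (Eq. 4.13) and the error risk (Eq. 4.14); risk factors default to 1."""
    n_s, n_ns = n_cp + n_fn, n_cn + n_fp      # preseizure / nonpreseizure totals
    n_tot = n_s + n_ns
    p_cp, p_fn = n_cp / n_s, n_fn / n_s       # sensitivity and its complement
    p_cn, p_fp = n_cn / n_ns, n_fp / n_ns     # specificity and its complement
    p_s, p_ns = n_s / n_tot, n_ns / n_tot     # empirical priors (Table 4.4)
    p_e = p_fn * p_s + p_fp * p_ns            # probability of error, Eq. (4.13)
    r_e = p_fn * p_s * r_fn + p_fp * p_ns * r_fp   # error risk, Eq. (4.14)
    sel = n_cp / (n_cp + n_fp) if n_cp + n_fp else 0.0   # selectivity
    return {"P_CP": p_cp, "P_FN": p_fn, "P_CN": p_cn, "P_FP": p_fp,
            "Sel": sel, "P_E": p_e, "R_E": r_e}
```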
We can interpret P(S) r_{FN} and P(NS) r_{FP} as penalty factors (or weight factors), \pi_{FN} and \pi_{FP}; thus, we can rewrite Equation (4.14) as

R_E = P_{FN} \pi_{FN} + P_{FP} \pi_{FP}.    (4.15)

Typically, missing a seizure is considered worse than declaring a false alarm; therefore, it is intuitive to set \pi_{FN} > \pi_{FP} to compensate for the effect of a low P(S). For example, we can assign \pi_{FN} = 0.75 and \pi_{FP} = 0.25, which makes the GP algorithm pay attention to both types of error. However, the exact values of the penalty weights can be adjusted according to the patient's untreated seizure frequency, form of therapy, or other factors.

For a given feature vector or a single artificial feature, the minimum error risk problem is

\min_{\Omega} R_E,    (4.16)

where \Omega is the space of all possible class-decision boundaries, or the space that contains all possible mappings from the input feature y to the classifier output. The optimal decision structure [10] is given by

C = \arg\max_{C \in \{S, NS\}} \{\, p(y|S)\, \pi_{FN},\; p(y|NS)\, \pi_{FP} \,\}.    (4.17)

If we define the likelihood ratio, the decision rule declares class S if

\frac{p(y|S)}{p(y|NS)} > \frac{P(NS)\, r_{FP}}{P(S)\, r_{FN}} = \frac{\pi_{FP}}{\pi_{FN}},    (4.18)

where \pi_{FP}/\pi_{FN} is the threshold. The equivalent decision rule with a threshold on the posterior probability declares class S if

P(S|y) > \frac{\pi_{FP} P(S)}{\pi_{FP} P(S) + \pi_{FN} P(NS)}.    (4.19)

If P(S) is very small, this threshold can be very small, as in the maximum-likelihood rule. For example, if P(S) = 0.005, \pi_{FN} = 0.75, and \pi_{FP} = 0.25—values that place more stress on avoiding false negatives than maximum likelihood does—the threshold is 0.0017.

To discourage the GP algorithm from designing artificial features for which its classifier makes too many errors of one type and too few of the other, we added a constraint to the minimum risk problem, as stated in Equation (4.20):

\min_{\Omega} \{\, R_E \mid b \geq b_0 \,\}.    (4.20)

For instance, the constant-declaration classifier whose output is always the preictal class (i.e., it always declares that a seizure is occurring) has only 25% error risk:

R_E = 0(0.75) + 1(0.25) = 0.25.    (4.21)

However, such a classifier is merely a device that medicates the patient in a prescribed open-loop regime. The balance constraint drives the algorithm to make more useful decisions based on the input artificial feature. In the cost function of the genetic programming system, this constraint is added as a penalty term to the error risk when the balance falls below 50%:

Cost = (0.75) P_{FN} + (0.25) P_{FP},                     b \geq 0.5
Cost = (0.75) P_{FN} + (0.25) P_{FP} + \gamma (1 - b),    b < 0.5    (4.22)

where \gamma is a weight commonly equal to 1 and b is the balance, defined as

b = 1 - \frac{2\, |P_{FN} - P_{FP}|}{|0.5 - P_{FN}| + |0.5 - P_{FP}| + 1}.    (4.23)

The parameter b is 0 in the worst case, when either (P_{FN}, P_{FP}) = (0, 1) or (1, 0), and is 1 in the best case, when P_{FN} = P_{FP} [21].

Depending on the value of the threshold arrived at by changing the probability or the likelihood ratio, we move the operating point of the classifier along the receiver operating characteristic (ROC [92]; P_{CP} vs. P_{FP}) curve, trading off the ability to detect preseizures for the ability to detect nonpreseizures (and losing optimality as initially defined by the performance metrics). If the risk r_{FP} is increased, the probability threshold is also increased, reducing the propensity of the system to declare S. In contrast, to make the detector (i.e., classifier) more sensitive, the risk r_{FN} should be increased. The complexity of the classifier decision boundaries in the n-dimensional feature space changes as the value of the threshold on the likelihood ratio or the probability changes.
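The balance-penalized cost can be written compactly; the following sketch (ours, using our reconstruction of the balance term in Equation (4.23)) implements the cost function of Equation (4.22):

```python
def gp_cost(p_fn, p_fp, gamma=1.0, w_fn=0.75, w_fp=0.25):
    """Cost of Eq. (4.22): weighted error risk plus a penalty gamma*(1 - b)
    when the balance b of Eq. (4.23) falls below 0.5."""
    b = 1.0 - (2.0 * abs(p_fn - p_fp)
               / (abs(0.5 - p_fn) + abs(0.5 - p_fp) + 1.0))
    risk = w_fn * p_fn + w_fp * p_fp
    return risk if b >= 0.5 else risk + gamma * (1.0 - b)
```

For the constant-declaration classifier of Equation (4.21), with (P_FN, P_FP) = (0, 1), the balance is 0 and the cost rises from 0.25 to 1.25, which is what drives the GP away from such degenerate solutions.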
4.7 Benchmark Classifiers

As a frame of reference for our proposed methods, principal components analysis [49] (also known as the Karhunen-Loève transform in engineering) will be implemented as the feature extractor. Principal components analysis (PCA) is a well-known, established nonparametric feature extractor that performs a linear transformation of the input space, returning a subspace with the best (in the least-sum-of-squared-error sense) uncorrelated representation of it. Mathematically, PCA is defined as

Y = \Phi^T X,    (4.24)

where X is the input data and \Phi^T contains the eigenvectors associated with the largest eigenvalues. Such eigenvalues are obtained from the covariance matrix of the input data X. The number of eigenvalues (and thus the number of eigenvectors) retained equals the number of dimensions to which it is desired to reduce the input space (i.e., X).

The embedding matrix (denoted as F in Section 4.3), with n_e columns (equal in number to the embedding dimension), is input to the PCA. That is,

X = \begin{bmatrix}
x[1] & x[1-\tau] & x[1-2\tau] & \cdots & x[1-(n_e-1)\tau] \\
x[2] & x[2-\tau] & x[2-2\tau] & \cdots & x[2-(n_e-1)\tau] \\
x[3] & x[3-\tau] & x[3-2\tau] & \cdots & x[3-(n_e-1)\tau] \\
\vdots & & & & \vdots \\
x[n] & x[n-\tau] & x[n-2\tau] & \cdots & x[n-(n_e-1)\tau] \\
\vdots & & & & \vdots \\
x[n_s] & x[n_s-\tau] & x[n_s-2\tau] & \cdots & x[n_s-(n_e-1)\tau]
\end{bmatrix},    (4.25)

where x[\,] is the EEG signal and n is the time index. The parameter n_e is the embedding dimension, whose value defines the number of features input to the PCA. The number of components selected to reduce the dimensionality of the input data X will be the same as the number of artificial features that the GPAF algorithm needed to achieve a satisfactory result. These components are then processed by the sliding window, yielding

y_j[k] = \sum_{n=1+(k-1)D}^{D(k-1)+L} o_j[n],    (4.26)

where o_j is the principal component j selected by the PCA, j = 1, 2, ..., n_c, and n_c is the number of principal components. The parameters L and D are set to the values used for the GPAF features. The summation of all points in the window can be considered an elemental feature (the summation is proportional to the mean). For instance, if n_c = 2 and the principal components are o_1 = x[n-\tau] and o_2 = x[n], the features in Equation (4.27) are obtained:

y_1[k] = \sum_{n=1+(k-1)D}^{D(k-1)+L} x[n-\tau],
y_2[k] = \sum_{n=1+(k-1)D}^{D(k-1)+L} x[n].    (4.27)

[Figure 4.12: Stages of the PCA k-NN benchmark classifier: multidimensional EEG raw data -> PCA -> sliding window -> classifier.]

Finally, the sliding-window-processed data are input to a k-NN classifier, which categorizes the data into the respective classes, i.e., the baseline or preictal class. Figure 4.12 illustrates the process of the first benchmark classifier. For instance, if for a given patient the GPAF algorithm found three artificial features, the dimension of X is reduced by the PCA to three. Each of these components is then processed by the sliding window, yielding three feature vectors that are input to the classifier.

A k-NN classifier alone is the second benchmark classifier used for comparison against the performance of the GPAF algorithm. That is, the matrix X in Equation (4.25) (the pseudo-state vectors) is processed through the sliding window, and a vector of size n_e is input directly to the k-NN classifier, so there is no dimensionality reduction; the number of feature vectors input to the classifier is equal to n_e, the dimension of X.
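A minimal NumPy sketch of this benchmark pipeline (our illustration; function names are hypothetical, and indexing is zero-based rather than the one-based convention of Equation (4.25)):

```python
import numpy as np

def delay_embed(x, n_e, tau):
    """Rows are the delay vectors [x[n], x[n-tau], ..., x[n-(n_e-1)tau]]
    of the embedding matrix in Equation (4.25)."""
    start = (n_e - 1) * tau
    return np.column_stack([x[start - j * tau: len(x) - j * tau]
                            for j in range(n_e)])

def pca_window_features(x, n_e, tau, n_c, L=800, D=400):
    """PCA k-NN front end of Section 4.7: project the embedding onto the
    n_c leading eigenvectors of the covariance matrix, then sum each
    component over the sliding window (Equation (4.26))."""
    X = delay_embed(np.asarray(x, dtype=float), n_e, tau)
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_c]   # largest eigenvalues first
    comps = Xc @ eigvecs[:, order]            # the n_c principal components
    n_win = (len(comps) - L) // D + 1
    return np.array([comps[k * D: k * D + L].sum(axis=0)
                     for k in range(n_win)])
```

The rows of the returned array are then fed to the k-NN classifier exactly as the GPAF feature signals would be.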
4.8 Receiver Operating Characteristic (ROC) Curve

For the experiments presented in Section 4.10, we will provide receiver operating characteristic curve plots, an easy, visual way to compare the performance of the GPAF algorithm and the benchmark classifiers. We will also provide the area-under-curve statistic, a measure that gives us an easier way to compare performance among classifiers.

There are two ways to plot an ROC curve: theoretical and empirical. In the theoretical way, the PDFs of both classes (i.e., seizure and nonseizure) are needed. If the more typical ROC curve is to be plotted (i.e., P_{CP} vs. P_{FA}), we calculate P_{FA} as

P_{FA} = \int_{\delta}^{\infty} p(y|NS)\, dy,    (4.28)

where \delta is the threshold, and P_{CP} as

P_{CP} = \int_{\delta}^{\infty} p(y|S)\, dy,    (4.29)

for each value of \delta from -\infty to \infty [50]. Each pair (P_{FA}, P_{CP}) corresponding to a value of \delta is plotted in the graph; both axes go from 0 to 1. An ideal classifier would be one for which the ROC curve is a vertical line from coordinates (0,0) to (0,1) and a horizontal line from (0,1) to (1,1), with (0,1) being the point at which the classifier would yield perfect discrimination. However, this framework is not possible here, because the distributions of the conditional PDFs are unknown. Additionally, the classifier used for the GPAF algorithm and the benchmarks is a nonparametric classifier, which outputs not a probability but a label. For this reason, to plot the ROC—which may be considered an empirical or estimated ROC—we define a set of equations to estimate the probabilities P_{CP} and P_{FP} using the output of the classifier. Let us define

L(y) = \frac{p(y|S)}{p(y|NS)} \gtrless \delta,    (4.30)

where L(y) is the likelihood ratio, \delta denotes the threshold, and p(y|S) and p(y|NS) are the conditional probabilities, which are estimated as

\hat{p}(y|S) = \frac{1}{\sum_{i=1}^{k} d_i^S}  and  \hat{p}(y|NS) = \frac{1}{\sum_{i=1}^{k} d_i^{NS}},    (4.31)

where k is the number of nearest neighbors, d_i^S is the Euclidean distance between the sample point being tested and its i-th closest preseizure point, and d_i^{NS} is the Euclidean distance between the sample point being tested and its i-th closest nonseizure point. It can be noted that the farther away the closest points of a class are, the lower is the probability that y belongs to that class; correspondingly, the closer the k closest points of a given class are, the greater is the probability that y belongs to that class. Equation (4.31) provides an estimate of the conditional probabilities that allows us to calculate and computationally graph the likelihood ratio stated in Equation (4.30). Moving the threshold \delta over (-\infty, \infty) allows us to estimate P_{FP} and P_{CP} by counting the nonseizure samples that were misclassified (i.e., exceeded the threshold and were classified as preseizure samples) and the preseizure points that were classified correctly, as shown in Table 4.3 (i.e., the probability of false positives and the sensitivity). For each evaluation of \delta, P_{FP} and P_{CP} are calculated, yielding one point of the (empirical) ROC curve. To smooth the ROC curve, it is plotted using point-basis samples; that is, each point in the artificial feature space is counted as one example.

4.9 Function and Terminal Sets for GP

In GP algorithms, the user must specify the function set and terminal set that the GP algorithm will use to create the programs. Table 4.5 shows the functions that will be available to the GP. Because functions need to satisfy the closure property in GP, the square root, logarithm, and division operators are implemented in a protected way.
Protected division works identically to ordinary division except that it outputs one when its denominator input is zero. In order to avoid complex values, the protected square root operator applies an absolute value operator to its input before taking the square root. Likewise, the logarithms are protected versions that output zero for an argument of zero and apply an absolute value operator to the argument to handle negative arguments.

By enlarging the function set with more functions (i.e., trigonometric and transcendental functions), we address and implement the third recommendation for overcoming the limitations found in the GFNCAF algorithm experiments in [30]. We recall that in that work, the function set was limited to {+, -, \times, \div, |\cdot|, (\cdot)^2, \sqrt{\cdot}}. With the function set provided in Table 4.5, we give more richness to the search space and increase the likelihood of creating better artificial features.

Table 4.5 List of functions and their respective symbols for the GP algorithm.

  Function              Symbol
  addition              +
  subtraction           -
  multiplication        \times
  division              \div
  square                (\cdot)^2
  square root           \sqrt{\cdot}
  natural logarithm     ln
  logarithm base 2      log_2
  logarithm base 10     log_10
  absolute value        |\cdot|
  sine                  sin
  cosine                cos

As a terminal set, the GP has available {x[n], x[n-\tau], ..., x[n-(n_e-1)\tau]}; i.e., based on an insight from chaos theory, the number of elements in the terminal set is n_e, the embedding dimension. Again, the delay time \tau is determined from the autocorrelation plot, as explained in Section 4.5. With this information (the embedding vector) as input to the GP algorithm, we address the fourth and last recommendation for overcoming the limitations of the GFNCAF algorithm (bulleted early in Section 3.1). By essentially inputting raw data to the GP algorithm, we expect the GP to explore and exploit the EEG data, increasing the chance of finding more informative and successful patterns with which to discriminate the classes.

In this chapter, the maximum tree depth was set to 10 and the population size to 1000. The initial population was generated using the ramped half-and-half method. The GP algorithm used a crossover operator with a rate of 0.9, fitness-proportional selection, and a breeding operator with a rate of 0.1. The GP algorithm was extended to allow it to evaluate not just one tree per individual, but multiple trees per individual (i.e., a forest), allowing us to generate a prespecified number of features for a patient simultaneously. All the trees of an individual were evaluated at the same time; however, there was no crossover among trees (features) of the same individual—crossover was done only among homologous trees. Although it is possible to configure the GP software to decide how many artificial features to evaluate per forest (individual), in this work we started the algorithm looking for one artificial feature. If no satisfactory results were obtained, we increased the number of artificial features to two, and, if still necessary, to three; we never used more than three artificial features. To reduce the computing time, the LilGP software was used, a popular genetic programming system implemented in C (the genetic programming algorithm implemented in Chapter 3 could not be used because, being Matlab-based code, its computing time for these experiments would have been impractically high).
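The protected operators reduce to a few lines; a minimal sketch consistent with the closure rules just described:

```python
import math

def pdiv(a, b):
    """Protected division: outputs one when the denominator is zero."""
    return a / b if b != 0 else 1.0

def psqrt(a):
    """Protected square root: absolute value taken before the root."""
    return math.sqrt(abs(a))

def plog(a, base=math.e):
    """Protected logarithm: zero for a zero argument; |a| otherwise."""
    return 0.0 if a == 0 else math.log(abs(a), base)
```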
4.10 Results

In the experiments in this section, all baseline and preictal epochs were of ten minutes duration, thus containing 120,000 points each. Preictal segments were selected to end five minutes before the UEO occurred; Figure 4.13 illustrates how the preictal epochs were selected. The length of the sliding window was set to L = 800 points (or 4 s) and D = 400 points (or 2 s). Additionally, the maximum for the embedding dimension n_e was set to 6, a value that was arbitrarily selected. The number of nearest neighbors for the k-NN classifier was set to k = 5.

For training, we used a holdout validation technique. Each patient had two epochs for training data: a baseline epoch and a preictal epoch. In this chapter, baseline epochs are labeled with small letters, whereas preictal epochs are labeled with capital letters.

[Figure 4.13: Illustration of where preictal epochs are extracted: each 10-minute epoch starts 15 min before the UEO and ends 5 min before the UEO; the data between -5 min and the UEO are not used.]

Experiments were carried out using point-basis classification. As stated previously, in point-basis classification each point counts as an example, whereas in block-basis classification an entire segment (epoch) of data is considered one sample. Figure 4.14 shows an example of each basis: a hypothetical artificial signal sampled at \Delta t is presented, in which, for the point basis, each point is considered an example for the classifier, while in the block basis a segment (string of points) is considered an example for the classifier.

[Figure 4.14: Point and block basis examples. A signal with sampling interval \Delta t is illustrated: (a) in point basis, each point is an example for the classifier, while (b) in block basis a segment of points is considered one example.]

From this, it is intuitive that while the block basis is more informative for making a decision, given that we are using more points, much more data will be necessary to train the classifier. On the other hand, the point basis is not as informative as the block basis, given that we use a single point as an example; however, it makes many examples available for training the classifier. Here, a ten-minute-long epoch is treated as one block.

The problem with reporting results (to the drug delivery component) on the point basis is that we do not want an implantable device that medicates the patient every \Delta t at which the classifier says a point in the signal is preictal; the question that matters is whether we can predict the seizure or not. Therefore, we need to transform the point-basis results to a block basis in order to integrate the classifier's decisions over a predetermined, fixed period. When the evolution of the classifier's decisions is monitored over a fixed period, the device (classifier) can make a better decision on whether or not to medicate the patient. To convert from point basis to block basis, we define a threshold, also known as the "chance level". The chance level was set to 65% in order to raise the confidence of the decision beyond the noise floor that a "coin flipper" at 50% would create. If the fraction of point-basis classifications of an epoch that declare it preseizure equals or exceeds the threshold, the epoch is labeled preictal; otherwise, it is categorized as baseline (nonseizure).
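As a sketch of the mechanics just described (ours; the GP expression f and all names are placeholders), evaluating an artificial feature over the sliding window and converting point decisions to a block decision could look like this:

```python
import numpy as np

L, D = 800, 400  # window length (4 s) and displacement (2 s) at 200 Hz

def feature_signal(x, f):
    """y[k]: sum of the pointwise expression f over each window of L points
    displaced by D, as in the patient equations of this section."""
    x = np.asarray(x, dtype=float)
    n_win = (len(x) - L) // D + 1
    return np.array([f(x[k * D: k * D + L]).sum() for k in range(n_win)])

def block_label(point_labels, chance_level=0.65):
    """Point-to-block conversion: declare the epoch preictal (1) when at
    least 65% of its point-basis decisions are preictal."""
    return 1 if np.mean(point_labels) >= chance_level else 0

# e.g., the elemental energy feature: feature_signal(epoch, lambda w: w**2)
```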
Equivalently, an epoch for which more than 35% of the point-basis classifications are baseline is labeled baseline by the block-basis classifier; otherwise, it is labeled preictal.

In the next subsections, for the sake of simplicity, we refer to the GPAF algorithm, which includes the GP algorithm and the k-NN classifier, as the GPAF classifier or just GPAF, while the benchmark algorithm that includes the PCA transformation and the k-NN classifier is called the PCA k-NN classifier or simply PCA. Of course, k-NN alone refers to a k-NN classifier used on full-dimensionality data, without preprocessing by PCA or GP-evolved artificial features.

4.10.1 Patient A

For this patient, seizures were diagnosed as coming from the left anterior hippocampus. Two kinds of seizure were clinically identified: simple partial and complex partial seizures. For the training stage, we used one baseline epoch and one preictal epoch. To validate the artificial features, we had three baseline and three preseizure epochs available. The time delay obtained from the autocorrelation function was set to \tau = 32 (or 0.16 s). Thus, the terminal set for this patient was {x[n], x[n-32], x[n-64], x[n-96], x[n-128], x[n-160]}.

As stated above, the default prediction horizon was set to five minutes before the UEO. However, after several evaluations, the GPAF algorithm could not find a set of artificial features that achieved good performance. Therefore, we moved the prediction horizon to one minute before the UEO. The GPAF algorithm was executed again and a set of features was found. Equation (4.32) shows the artificial features y_1[k], y_2[k], and y_3[k]:

y_1[k] = \sum_{n=1+400(k-1)}^{400k+400} \log_{10}( \cdots x[n-32] \cdots ),
y_2[k] = \sum_{n=1+400(k-1)}^{400k+400} \frac{\cdots (x[n-160])}{x[n-96]},    (4.32)
y_3[k] = \sum_{n=1+400(k-1)}^{400k+400} (x[n-128])^4.

Table 4.6 shows the results for the training epochs (point basis), along with the results for the PCA k-NN and k-NN classifier benchmarks. The results for the training data show that the transformation provided by the artificial features increases the overall performance of the classifier.

Table 4.6 Results for training data for patient A (point basis).

  Epoch     PCA k-NN    k-NN      GPAF
  a         65.55%      65.55%    87.87%
  A         67.89%      65.89%    67.10%
  Overall   66.72%      65.72%    77.44%

Table 4.7 presents the results for the validation data, reported on a block basis using the threshold previously defined (for point-basis results, see Appendix I). The table shows the number of preseizure epochs (PSz), the number of baseline epochs (BS), the number of false positives (FP), the rate of false positives as a percentage (FP rate), the number of false negatives (FN), and the rate of false negatives as a percentage (FN rate).

Table 4.7 Results for validation data for patient A (block basis).

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN     3     3    0     0%       3    100%
  PCA      3     3    0     0%       3    100%
  GPAF     3     3    0     0%       0     0%

While all classifiers show good performance in classifying all the baseline epochs correctly, the GPAF classifier was the only one that classified all three preictal epochs correctly.

[Figure 4.15: Receiver operating characteristic curves from testing data for patient A (hit rate, 1 - P_{FN}, versus false alarm rate, P_{FP}).]

Figure 4.15 depicts the receiver operating characteristic (ROC) curves for the three classifiers.
Although not close to ideal, the curves show that the GPAF classifier exceeds the performance of the benchmark classifiers. Because the ROC curve is a visual tool, we also compare the three classifiers by calculating the area under the curve (AUC) of each ROC, an easier way to compare performance among classifiers. The AUC values for the classifiers are AUC_{k-NN} = 0.499, AUC_{PCA} = 0.499, and AUC_{GPAF} = 0.688. Theoretically, a classifier cannot achieve an AUC worse than 0.5, because the class labels could always be reversed; thus, effectively AUC_{k-NN} = 0.501 and AUC_{PCA} = 0.501. The AUC statistic shows that the transformation provided by the artificial features produces a better classifier system than the benchmark classifiers.

4.10.2 Patient B

For patient B, seizures were found to come from the right inferior temporal neocortex and were diagnosed as complex partial. There were seven baseline epochs and five preictal epochs for training and validation purposes. One baseline and one preictal epoch were used to train the GPAF algorithm. The delay for this patient was set to \tau = 30 (or 0.15 s) based on autocorrelation calculations; thus, the terminal set was {x[n], x[n-30], x[n-60], x[n-90], x[n-120], x[n-150]}. Equation (4.33) shows the artificial features found for this patient by way of the GPAF algorithm:

y_1[k] = \sum_{n=1+400(k-1)}^{400k+400} x[n] \cdot (x[n-30])^2,
y_2[k] = \sum_{n=1+400(k-1)}^{400k+400} \ln(x[n]) - x[n-90] + (x[n-150])^2.    (4.33)

Table 4.8 shows point-basis results for the training epochs, along with the results for the benchmark classifiers, using a prediction horizon of five minutes. Table 4.9 shows the results for the validation data on a block basis (see Appendix I for point-basis results). All baseline epochs were classified correctly by all the classifiers. On the other hand, the k-NN and PCA k-NN classifiers missed two and three preictal epochs, respectively, while GPAF failed to predict only one seizure. Figure 4.16 shows the ROC curves for the three classifiers. The k-NN has AUC_{k-NN} = 0.735, the PCA k-NN has AUC_{PCA} = 0.609, and GPAF has AUC_{GPAF} = 0.874. Looking at the ROC curves and AUC statistics, it is clear that the GPAF classifier performed best.

Table 4.8 Results for training data for patient B (point basis).

  Epoch     PCA k-NN    k-NN      GPAF
  a         68.46%      59.06%    92.29%
  C         72.15%      83.56%    90.94%
  Overall   70.40%      71.24%    91.62%

Table 4.9 Results for validation data for patient B (block basis).

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN     4     6    0     0%       2     50%
  PCA      4     6    0     0%       3     75%
  GPAF     4     6    0     0%       1     25%

[Figure 4.16: Receiver operating characteristic curves from testing data for patient B.]

4.10.3 Patient C

This patient had complex partial seizures. The seizures were found to be multifocal, coming from the right and left inferior frontal neocortex, the left temporal neocortex, and the left hippocampus. The delay was set to \tau = 31 (or 0.155 s), and thus the terminal set was {x[n], x[n-31], x[n-62], x[n-93], x[n-124], x[n-155]}. For training and validation, we had eleven baseline and eleven preictal epochs. For this patient, only one artificial feature was needed. Equation (4.34) shows the GPAF formula derived. As well as being relatively simple, it used only three of the six possible dimensions in the terminal set:

y[k] = \sum_{n=1+400(k-1)}^{400k+400} \ln(x[n]) \cdot (x[n-155])^2 \cdot \log_{10}(\cos(x[n-124])).    (4.34)

Table 4.10 presents the training results of the GPAF equation and the benchmark classifiers k-NN and PCA k-NN (the PCA algorithm reduced the dimensionality of the data to 1).
Table 4.11 shows the results for the validation data. The benchmark classifiers classified all the baseline epochs correctly; however, the k-NN classifier missed one seizure, whereas the PCA k-NN classifier missed four seizures. The GPAF classifier, on the other hand, achieved perfect classification. Figure 4.17 shows the ROC curves. We note that both k-NN and GPAF achieved near-optimal curves (a sharp 90-degree elbow along the axes) in this case. From the figure, we can determine the AUC values, yielding AUC_{k-NN} = 0.952, AUC_{PCA} = 0.778, and AUC_{GPAF} = 0.967. The GPAF algorithm not only found a single artificial feature that achieved perfect classification (in the block-basis sense); that single feature, which uses just three of the six available inputs, also yielded higher-accuracy classification in the point-basis sense (see Appendix I).

Table 4.10 Results for training data for patient C (point basis).

  Epoch     PCA k-NN    k-NN      GPAF
  a         85.95%      98.66%    96.66%
  D         66.88%      87.27%    97.66%
  Overall   76.42%      92.97%    97.16%

Table 4.11 Results for validation data for patient C (block basis).

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN    10    10    0     0%       1     10%
  PCA     10    10    0     0%       4     40%
  GPAF    10    10    0     0%       0      0%

[Figure 4.17: Receiver operating characteristic curves from testing data for patient C.]

4.10.4 Patient D

This patient was diagnosed with complex partial seizures coming from the right inferior frontal region. The delay for this patient was set at \tau = 27 (or 0.135 s). Therefore, the terminal set was {x[n], x[n-27], x[n-54], x[n-81], x[n-108], x[n-135]}. For training and validation purposes, we had twelve baseline and eleven preictal epochs available. The prediction horizon for this patient was five minutes, the default horizon. Following the training procedure described earlier in this section, one baseline and one preseizure epoch were used at first. However, when the GPAF algorithm was executed, it found a set of features that achieved acceptable performance over the preictal epochs (six out of ten), but not over some baseline epochs. Because of this, we decided to use two five-minute-duration baseline epochs for the training process (plus the single preictal epoch), thus giving the GPAF algorithm a better representation of this patient's baseline data. Under these conditions, two artificial features were found that achieved a more practical performance. Equation (4.35) shows the genetically designed features:

y_1[k] = \sum_{n=1+400(k-1)}^{400k+400} \cdots\, x[n-27] \cdots,
y_2[k] = \sum_{n=1+400(k-1)}^{400k+400} x[n-54] + \frac{\cdots}{x[n-81]}.    (4.35)

Table 4.12 shows the results for the training epochs. Table 4.13 presents the validation results on a block basis (see Appendix I for point-basis results). It shows that the benchmark classifiers, while correctly classifying all baseline epochs, misclassified all ten of the preictal epochs. On the other hand, the GPAF classifier misclassified one baseline epoch and just four preseizure epochs (six seizures were predicted correctly five minutes before the seizure occurred). Finally, Figure 4.18 shows the ROC curves for the validation data (point basis). The GPAF ROC is not close to ideal; however, it still does better than those of the benchmark classifiers. To measure this, we calculated the AUC values, yielding AUC_{k-NN} = 0.519, AUC_{PCA} = 0.513, and AUC_{GPAF} = 0.673.

Table 4.12 Results for training data for patient D (point basis).

  Epoch     PCA k-NN    k-NN      GPAF
  c and e   70.57%      70.90%    83.61%
  A         76.92%      72.24%    87.96%
  Overall   73.75%      71.57%    85.79%
Although the AUC for the GPAF is far from the ideal of 1, it is not nearly as close to the "coin-flipper" case as are the benchmark classifiers k-NN and PCA k-NN.

Table 4.13 Results for validation data for patient D (block basis).

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN    10    10    0     0%      10    100%
  PCA     10    10    0     0%      10    100%
  GPAF    10    10    1    10%       4     40%

[Figure 4.18: Receiver operating characteristic curves from testing data for patient D.]

4.10.5 Patient E

Patient E was found to have complex partial seizures coming from the left hippocampus and left anterior temporal cortex. The EEG data include sixteen baseline epochs and the same number of preictal epochs. Preictal epochs were extracted ending five minutes before the UEO (unequivocal electrographic onset of seizure). The delay factor was set to \tau = 49 (or 0.245 s). Consequently, the variables in the terminal set are {x[n], x[n-49], x[n-98], x[n-147], x[n-196], x[n-245]}. For this patient, two artificial features were found, whose equations are shown in (4.36):

y_1[k] = \sum_{n=1+400(k-1)}^{400k+400} \cdots\, x[n-147] \cdots,
y_2[k] = \sum_{n=1+400(k-1)}^{400k+400} (\log_{10}(x[n]))^2.    (4.36)

Table 4.14 Results for training data for patient E (point basis).

  Epoch     PCA k-NN    k-NN      GPAF
  g         74.16%      84.9%     85.23%
  P         76.51%      74.16%    94.3%
  Overall   75.42%      79.6%     89.63%

Table 4.15 Results for validation data for patient E (block basis).

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN    15    15    0     0%       6     40%
  PCA     15    15    1    6.67%     6     40%
  GPAF    15    15    1    6.67%     4    26.67%

Table 4.14 shows the results for the benchmark and GPAF classifiers evaluated with the training data. The GPAF algorithm created two artificial features that, for the training data, achieved about 10% higher correct classification than the benchmark schemes. Table 4.15 presents the results for the validation EEG data. The k-NN classifier misclassified one baseline epoch and six preictal epochs. The PCA k-NN classified all baseline epochs correctly but missed six preictal epochs. Finally, the GPAF classifier failed to categorize one baseline epoch correctly and misclassified four preictal epochs; in other words, it was able to correctly predict two more seizures than the benchmark classifiers did. Figure 4.19 shows the ROC curves. From these, we can calculate the AUC values, yielding AUC_{k-NN} = 0.821, AUC_{PCA} = 0.749, and AUC_{GPAF} = 0.884. Although the k-NN classifier had good performance, the GPAF classifier had a still better AUC value. We also remark that the k-NN classifier used all six dimensions in the terminal set, whereas the GPAF features used just two inputs (states): x[n] and x[n-147].

[Figure 4.19: Receiver operating characteristic curves for patient E.]

4.10.6 Patient F

This patient was diagnosed with complex partial seizures coming from bilateral regions: some seizures came from the left hippocampus and others from the right temporal region. This patient had seven seizures in the data; one epileptic seizure occurred very early in the recording, leaving us without enough data to extract the ten minutes prior to the UEO mark.
Therefore, we had six preictal epochs and eight baseline epochs for training and validation purposes. Because the delay was \tau = 55 (or 0.275 s), the elements of the terminal set are {x[n], x[n-55], x[n-110], x[n-165], x[n-220], x[n-275]}. Equation (4.37) shows the GP artificial feature for this patient:

y[k] = \sum_{n=1+400(k-1)}^{400k+400} \cos(\log_{10}(x[n] \cdot x[n-55])).    (4.37)

Table 4.16 shows the training results. The artificial feature achieved an overall improvement of 7% in classification accuracy over the benchmark classifiers.

Table 4.16 Results for training data for patient F (point basis).

  Epoch     PCA k-NN    k-NN      GPAF
  a         84.56%      94.63%    86.91%
  A         68.79%      78.19%    96.31%
  Overall   76.59%      86.45%    93.65%

Table 4.17 shows the results for the validation data. Although the k-NN classifier categorized all baseline segments correctly, it misclassified one preictal epoch, falling short by 0.79% (see Appendix I, epoch F) of being classified correctly at five minutes before the UEO. The PCA k-NN and GPAF classifiers, on the other hand, categorized all epochs correctly. However, when the ROC curves are plotted in Figure 4.20, the curve for the k-NN classifier is closer to the ideal ROC curve than that for the PCA k-NN classifier; in fact, the AUC values are AUC_{k-NN} = 0.979 and AUC_{PCA} = 0.834.

Table 4.17 Results for validation data for patient F (block basis).

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN     5     7    0     0%       1     20%
  PCA      5     7    0     0%       0      0%
  GPAF     5     7    0     0%       0      0%

[Figure 4.20: ROC curves from testing data for patient F.]

Neither benchmark, however, surpassed the performance of the GPAF classifier, for which AUC_{GPAF} = 0.985. Equation (4.37) shows that the GP artificial feature used just two dimensions (or states), x[n] and x[n-55], while the k-NN classifier necessarily used all six dimensions in the terminal set. In other words, the GPAF algorithm created an artificial feature that performs slightly better than the k-NN classifier while using fewer dimensions—in this case, two.

On the other hand, although the PCA k-NN classifier classified all epochs correctly, it reached an overall point-basis classification of 78.2%, while the GPAF classifier reached 94.45%. Therefore, the GPAF is not just classifying epochs correctly, but classifying them with high accuracy, allowing a larger margin for error.

4.10.7 Patient G

Like most of the patients, this patient also suffered complex partial seizures, which came from the left hippocampus and left inferior temporal neocortex. This patient had 31 seizures, of which twenty were subclinical, so only eleven were clinical seizures. Although it is known that subclinical seizures are disruptive to the brain [59], our target in this work is the prediction of clinical epileptic seizures. In the next chapter, we discuss subclinical seizures more fully, in the context of the seizure detection (rather than prediction) problem.

There were eleven preseizure epochs and eleven baseline epochs for the training and validation processes. The delay was \tau = 33 (or 0.165 s), and the terminal set was {x[n], x[n-33], x[n-66], x[n-99], x[n-132], x[n-165]}.
Initially, in the training phase, we used one baseline epoch and one preictal epoch, yielding a set of artificial features that had good performance on preictal data; however, they displayed poor performance on non-training baseline data, classifying some epochs correctly but others completely wrong. Therefore, we again decided to use two five-minute-duration baseline epochs, from two different timeframes in the training data, in order to give the GPAF algorithm a better representation of this patient's baseline data. Using these data, the GPAF algorithm was able to design the features shown in Equation (4.38):

y_1[k] = \sum_{n=1+400(k-1)}^{400k+400} \cdots,
y_2[k] = \sum_{n=1+400(k-1)}^{400k+400} \cos((x[n-66])^2) + \left| x[n-99] - (x[n] + \log(x[n])) \right|.    (4.38)

Table 4.18 Results for training data for patient G (point basis).

  Epoch     PCA k-NN    k-NN      GPAF
  a and h   49.83%      67.89%    79.26%
  A         50.17%      38.8%     81.94%
  Overall   50%         53.35%    80.36%

Table 4.19 Results for validation data for patient G (block basis).

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN     8     8    2    25%       4     50%
  PCA      8     8    0     0%       8    100%
  GPAF     8     8    2    25%       3    37.5%

Table 4.18 shows the results for the GPAF equations, along with the results for the benchmark k-NN and PCA k-NN classifiers. In Table 4.19, the validation data results are presented. The k-NN classifier misclassified two baseline epochs and four preictal epochs. The PCA k-NN classified all baseline epochs correctly but misclassified all eight preictal epochs. Finally, the GPAF classifier misclassified two baseline and three preictal epochs; compared with the k-NN classifier, it predicted one more seizure correctly. Figure 4.21 shows the ROC curves. The areas under the curves are AUC_{k-NN} = 0.632, AUC_{PCA} = 0.469 (or 0.531 by the worst-binary-classifier argument), and AUC_{GPAF} = 0.733.

[Figure 4.21: Receiver operating characteristic curves for validation data for patient G.]

Again, the GPAF classifier achieved better performance than the benchmark classifiers, obtaining an overall 9.64% increase in correct classification (for the point-basis results, see Appendix I) over the k-NN classifier and an overall 18.33% increase in correct classification (on the point basis) over the PCA k-NN. This means that the GPAF classifier classified epochs correctly with higher accuracy than its benchmark counterparts.

4.11 Discussion

In this section, we summarize the results described in the preceding subsections. Table 4.20 shows the overall results, where the performance of the benchmark classifiers and the GPAF classifier is measured by counting all the epochs: a total of 59 baseline epochs and 55 preseizure epochs. The table shows that the number of misclassified baseline epochs was small, with the GPAF classifier misclassifying the most, 4 out of 59, for a false positive rate of 6.78%. On the other hand, the k-NN classifier showed poor prediction performance, missing 27 out of 55 seizures (i.e., it correctly predicted 28 seizures) for a false negative rate of 49.09%. The other benchmark, the PCA k-NN classifier, obtained the worst performance, misclassifying 34 of 55 preseizure epochs (it correctly predicted only 21 seizures), resulting in a 61.82% false negative rate.
The high false negative rates yielded by the benchmark classifiers make them impractical for clinical purposes. The GPAF classifier, on the other hand, was able to predict 43 of the 55 preictal epochs (it missed twelve of them), yielding a 21.81% false negative rate. These results make the GPAF classifier by far the best of the three classifiers for this purpose.

Table 4.20 Overall results for all three classifiers.

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN    55    59    3    5.08%    27    49.09%
  PCA     55    59    1    1.69%    34    61.82%
  GPAF    55    59    4    6.78%    12    21.81%

It is clear that the artificial features crafted by the GP algorithm play an important role in obtaining a better separation of the classes and thus a better classification. We also make clear that the block-basis classification depends on the value set for the threshold used to transform from point-basis to block-basis classification. For all patients evaluated above, this threshold was set to 65%. If we move the threshold to a lower value, we obtain an increase in the number of preictal epochs classified correctly. For example, consider patient D. If we set the threshold to 50%, the block-basis classification rate changes drastically. Table 4.21 presents the results with the new threshold. The k-NN classifier now misclassifies two additional baseline epochs, but is able to predict four seizures, unlike at 65%, when all seizures were missed. The PCA classifier misclassified most of the baseline epochs (seven of them) while classifying seven preseizure epochs correctly. Finally, the GPAF classifier misclassified two more baseline epochs and classified one additional preictal epoch correctly (it missed only three seizures). A reduction of the false negative rate was obtained at the expense of an increase in the false positive rate. However, the GPAF still was not drastically affected by the new threshold, given its superior point-basis classification.

Table 4.21 Results for validation data on block basis for patient D using a 50% threshold.

          PSz   BS   FP   FP rate   FN   FN rate
  k-NN    10    10    3    30%       6     60%
  PCA     10    10    7    70%       2     20%
  GPAF    10    10    3    30%       3     30%

The overall point-basis classification for this patient using the GPAF classifier was 65.10% (with a standard deviation of ±32.16%), whereas the overall point-basis classification for the k-NN was 51.49 ± 15.32%, and the overall classification using the PCA k-NN averaged 50.86% with a standard deviation of ±15.44%. The mean for the GPAF classifier was roughly 15% above the average of the benchmark classifiers (the standard deviation was high because one baseline and three preictal epochs were poorly classified). This means that the GPAF classifier yielded high-accuracy classification, so we were able to define the threshold for the block-basis decision with higher confidence.

The overall average of the point-basis classification accuracy of GPAF across all patients was 78.55 ± 25.26%, whereas the PCA k-NN classifier achieved 62.46 ± 17.75% and the k-NN classifier achieved 69 ± 23.53%. The GPAF classifier surpassed the best benchmark classifier, the k-NN, by about 9% in the mean. The GPAF classifiers achieved an average prediction time of 4.43 minutes (before the UEO).

4.12 Statistical Test

The comparison above, however, is not sufficient to establish whether our algorithm is better or not; therefore, we wished to perform a statistical test.
To determine whether the statistical test should be parametric or nonparametric, we plotted the probability density functions (PDFs) of the classification accuracies of the epochs across all patients (114 epochs in total) for the three different algorithms. Figure 4.22 depicts the PDFs. From the figure, it is easy to see that the PDFs are not normally distributed; a nonparametric statistical test should therefore be employed. We selected the Mann-Whitney U test [81], also called the Wilcoxon test. Although nonparametric tests are not as powerful as parametric tests, according to Conover [13] the relative efficiency of the Mann-Whitney test is close to that of the two-sample t-test, which is parametric.

[Figure 4.22: PDFs of the percentage of correct classification of the epochs for the three algorithms, GPAF, k-NN, and PCA k-NN (from Appendix I).]

We are interested in knowing whether the GPAF algorithm's performance is equal to or better than the benchmark algorithms' performances across the ensemble of data. Thus, the hypotheses for the statistical test are as follows:

H_0: E[X] = E[Y]
H_a: E[Y] < E[X],    (4.39)

where H_0 is the null hypothesis, H_a is the alternative hypothesis, X denotes the epochs processed with the GPAF algorithm, Y denotes the epochs evaluated with the PCA k-NN, and E[·] is the expected value. A Wilcoxon test was also established between the GPAF classifier and the k-NN classifier, with the hypotheses

H_0: E[X] = E[Z]
H_a: E[Z] < E[X],    (4.40)

where Z denotes the epochs processed by the k-NN classifier. The level of significance α was set to 0.05. Because 114 samples (the number of epochs) is considered a large number, the P-value is calculated using a normal (z) distribution (by means of the central limit theorem). For the hypotheses in Equation (4.39), the test statistic was 6.59 and the P-value was 4.54×10^{-11} ≈ 0. The critical value (the value that must be exceeded in order to reject the null hypothesis) for a one-tailed test at α = 0.05 is 1.659. Therefore, the null hypothesis is rejected and the alternative hypothesis is accepted; that is, the expected value (or mean) of the GPAF classification is larger than the mean of the PCA k-NN results. In the second case, for the hypotheses in Equation (4.40), the test statistic was 3.61 and the P-value was 3.09×10^{-4}; the null hypothesis was again rejected and the alternative accepted, because the critical value was 1.659. In other words, the mean of the GPAF results was larger than that of the k-NN classifier. The statistical tests therefore establish that the improvement in the mean performance of the GPAF over the other two methods is statistically significant to a very high level, well beyond the 5% level represented by a critical value of 1.659. Of course, the statistical test results do not imply that the GPAF algorithm is inherently better than the benchmark classifiers; they simply assure us that the GPAF results on these data are better.

4.13 Generic GPAF Equations

Ideally, it is desirable to find not just a good predictor for a given patient, but a general set of features that will work for all patients, tuning a set of parameters to customize the predictor to the needs of each patient.
Although such an approach has been proposed in the past, little effort has been exerted in this direction—in part because of the lack of an algorithm that allows researchers to explore the precursor patterns for predicting seizures directly from the raw EEG data.

In this section, we present the results of an experiment intended to find a set of artificial features, or generic predictor model, that yields acceptable results. In the experiment, we set the number of artificial features to three. A holdout validation technique was used for training and validation purposes, where the baseline and preictal epochs were clipped to five minutes in order to reduce the computational load. Preictal epochs were the five minutes immediately leading up to the UEO, except for patient A, for whom EEG data were extracted up to one minute before the UEO. Additionally, we used the same delay values previously determined for each patient in the terminal set; that is, during training, when a given patient was being evaluated, the algorithm set the delay values in the artificial features to integer multiples of that patient's autocorrelation time. The parameters of the GP were set as before, except for the number of generations, which was increased to 250 because of the complexity of the problem. Equation (4.41) shows the artificial features found by the GPAF algorithm. Surprisingly and interestingly, the GPAF algorithm came up with a set of artificial features that yielded an acceptable—indeed, surprising—result:

y_1[k] = \sum_{n=1+400(k-1)}^{400k+400} x[n] + \log_2(\ln(x[n-2\tau])) + \ln(x[n-\tau]),
y_2[k] = \sum_{n=1+400(k-1)}^{400k+400} x[n-\tau] + x[n-5\tau],    (4.41)
y_3[k] = \sum_{n=1+400(k-1)}^{400k+400} |x[n-\tau]|.

Table 4.22 Validation results for seven patients.

  Patient   PSz   BS   FP   FP rate   FN   FN rate
  A          3     3    0     0%       1     33%
  B          4     6    0     0%       2     50%
  C         10    10    0     0%       0      0%
  D         10    10    1    10%       3     30%
  E         15    15    1    6.67%     4    26.67%
  F          5     7    0     0%       0      0%
  G          8     8    2    25%       7    87.5%
  Overall   55    59    4    6.78%    17    30.91%

Table 4.22 shows the results for the validation data. Overall, the set of features misclassified just 4 out of 59 baseline epochs, for a false positive rate of 6.78%, whereas 17 preictal epochs out of 55 were not classified correctly, for a false negative rate of 30.91%. Comparing these results against those of the previous section, where a set of features was optimized for each patient, reveals that Equation (4.41) misclassified the same number of baseline epochs, while missing five more seizures than were missed collectively by the customized artificial features shown in the previous section. Most of those misclassifications were for patient G, for whom just one seizure was predicted. As with the results presented before for patients C and F, Equation (4.41) also achieved perfect classification for those patients. The average overall accuracy achieved was 76.79 ± 27.56%, roughly 2% less in the mean than that achieved by the "customized" artificial features, although the standard deviation was also a little higher. The results in Table 4.22 demonstrate that it is possible to optimize a generic model to capture the characteristics of individual patients. These results might pave the way for more research on a general model that captures the dynamics hidden behind epileptic seizures.

4.14 Training the GPAF Algorithm Using the Leave-One-Out Method

It is generally desirable to train a classifier using the leave-one-out cross-validation method.
The leave-one-out (LOO) method tends to extract from a given dataset the model with the highest classification accuracy, as it uses the largest possible training set for each remaining test classification, but at the price of the highest computational requirements. This computational expense is precisely the reason that the LOO method was not chosen as the training method for most runs of the GPAF algorithm. Nonetheless, as a way to reinforce this work, we provide the results for one patient evaluated with the GPAF algorithm using the leave-one-out method.

Data from patient B were selected for the experiment. The total number of epochs available, twelve, created too heavy a processing load for a full LOO method; therefore, three baseline epochs (a, c, e) and three preictal epochs (C, D, E) were randomly selected from the twelve epochs available for this patient. Since only these six datasets were used for both training and validation, the analysis corresponds to LOO over that universe of six datasets, and it demonstrates the sorts of gains in accuracy available with more computational resources. To train, we set the algorithm to leave one baseline and one preictal epoch out for validation and to use the remaining four for training, doing this for all combinations of baseline and preictal epochs. The parameters for the GP algorithm and k-NN classifier were set as before. The number of features was set to two, thus producing two artificial features. The terminal set for the patient was the same, given that the delay value was unaltered. Equation (4.42) shows the equations found by the GPAF algorithm using LOO cross-validation for patient B. The first artificial feature, y_1[k], is the signal energy of the state x[n-60]:

y_1[k] = \sum_{n=1+400(k-1)}^{400k+400} (x[n-60])^2,
y_2[k] = \sum_{n=1+400(k-1)}^{400k+400} x[n-90] \cdot (x[n-150] - x[n-120]).    (4.42)

Table 4.23 provides the results using LOO cross-validation, where Pc denotes the percentage of correct classification.

Table 4.23 Results using leave-one-out cross validation.

  Epochs    Pc
  a, C      79.3
  c, D      84.31
  e, E      85.14
  a, D      77.63
  a, E      68.78
  c, C      73.96
  e, C      80.63
  c, E      78.46
  e, D      86.14
  Average   79.59 ± 5.94%

According to the results in the table, using the LOO method the GPAF equations achieved good, low-variance classification results, even when epoch E—which in holdout validation was misclassified—is included (see Appendix I, patient B). These results show that, to design good artificial features with high accuracy and low variability, the leave-one-out method can definitely be useful. However, running many experiments this way, as was done here across many patients, would have required impractically large computing resources. In this work, there were large archives of EEG signals containing many epochs, but using all of them would have consumed more computer time than was available. This LOO run demonstrates that better accuracy could be obtained for clinical application if significant computing resources were available for analyzing each patient's EEG signals, or if a satisfactory cross-patient model could be found.

Finally, we remark that the GPAF algorithm did not just design artificial features with unknown physical meanings; it was also able to produce some interpretable equations that are well known to researchers in epilepsy.
In this case, the GPAF algorithm designed a feature measuring the energy of the signal, which is one of the conventional features used for prediction and detection of epileptic seizures.

4.15 Comparison with Another Proposed Method

Because the benchmarks used for comparison with the GPAF algorithm in this chapter are neither proven nor well-known methods for extracting features to predict epileptic seizures, in this section we compare our results with another reported method that used the same EEG database. In Chapter 2, we briefly discussed a method proposed by D'Alessandro et al. [15], in which a GA was used to find a set of conventional features (and channels, or electrodes) that best predicted seizures while at the same time keeping a low false positive rate. Table 4.24 shows the average results reported in their work, where "baseline tested" is the number of ten-minute-duration baseline epochs used for the experiments, "preictal tested" is the number of preictal epochs used for the experiments, "FP rate" is the fraction of baseline epochs misclassified, "FN rate" is the fraction of preictal epochs misclassified, and FPH is the rate of false positives per hour. The table also displays the results obtained by the GPAF algorithm's customized equations and by the generic (cross-patient) GPAF equations (denoted "Gen. GPAF"). The table shows that both GPAF-generated sets of features performed better than D'Alessandro's method, although it is clear that we tested a smaller number of baseline epochs. Additionally, D'Alessandro et al. reported an average prediction time of 3.46 minutes (before the UEO), whereas the GPAF and generic GPAF achieved an average prediction time of 4.43 minutes (before the UEO).

We should also consider that in [15] the authors used 70% of the preseizure data available (they had 46 preictal epochs) for training, with the rest used for validation. For this reason, surrogate preictal data sets were artificially generated to increase the amount of preictal data available. The authors did not indicate how many surrogate epochs were generated (therefore, the "preictal tested" cell indicates only that more than 46 seizures were used). In contrast, we used roughly 11% of the preictal data available (we had 62 preictal epochs in total) for training purposes, and 89% of the data for validation purposes. This reliance on a smaller training set appears to be another advantage of the GPAF method. Even though the comparison is not under completely identical conditions, since the training and validation sets are not the same, the data are drawn from the same pool, and the results are favorable relative to the pre-existing method. The results do not demonstrate that our algorithm outperforms D'Alessandro's approach in all respects; they simply indicate that the GPAF algorithm produced promising results that are worthy of further investigation.

Table 4.24 Comparison of D'Alessandro's algorithm and GPAF results.

  System         Number of Patients   Baseline Tested   Preictal Tested   FP Rate   FN Rate   FPH
  D'Alessandro          4                  276               >46           9.53      37.5     0.278
  GPAF                  7                   59                55           6.78      21.81    0.068
  Gen. GPAF             7                   59                55           6.78      30.91    0.068

4.16 Summary

This chapter presented the application of the GPAF algorithm, introduced in Chapter 3, to the design of sets of artificial features that help predict epileptic seizures. The theory behind the reconstruction of the state-space trajectory of a chaotic system was explained early in the chapter.
We reconstructed the state-space trajectory for a time-varying empirical model of the EEG. Those reconstructed dynamic variables were input to the GP algorithm, which designed a set of artificial features to separate the baseline data from the preictal data. The discrimination task was carried out by the classifier component of the GPAF algorithm—in this case, a k-nearest neighbor classifier. By implementing the GP algorithm as the search engine for artificial features, using the delay-embedding scheme as input data to the GP, and adding more functions to the function set of the GP, we addressed every one of the recommendations suggested by the author in [30] and stated in Section 3.1.

The algorithm was evaluated over seven patients: six with a prediction horizon of five minutes, and one with a prediction horizon of one minute before the UEO. Results from two benchmark classifiers were used for comparison against the results of the GPAF algorithm. In all the experiments carried out, the GPAF-designed features surpassed the performance of the benchmark classifiers, yielding high accuracy on a point basis as well as on a block basis. Although the training for the customized artificial equations was carried out using holdout validation, because of computing resource limitations, we carried out an experiment using a reduced LOO cross-validation on six epochs from patient B, which produced a pair of artificial features that yielded good results with high confidence. Results in this chapter also illustrate that, although a unified model for the epilepsy prediction problem is still far from trivial, the design of optimized, nontrivial features for each patient makes viable a custom-implantable device with the artificial features integrated into the chip.

Finally, we evaluated an algorithm to design a generic set of artificial features that works across patients, using only one pre-calculated patient-specific characteristic, the delay parameter. The results were much better than expected, yielding three simple equations that predicted the epileptic seizures across patients better than the benchmark classifiers. These equations are a step forward in the search for a general model for epilepsy prediction. In the next chapter, we use the GPAF algorithm to design features for detection (rather than prediction) of epileptic seizures—a relatively easier problem, yet still far from trivial.

Chapter 5

Detection of Seizure Onset by Means of GPAF

In this chapter, we discuss the development of features that can help detect, with high accuracy, the onset of seizures. We then present the application of the GPAF algorithm to the detection case. It is implemented for different patients such that the algorithm can create a feature with a detection rate similar to or better than that of the benchmark classifier.

5.1 Why Detect Onset of Seizures?

At first glance, detection of seizure onset does not seem very beneficial from a medical point of view when compared with the obvious benefit offered by prediction of such attacks. However, detection of seizure onset is as important as prediction of seizures. There is one benefit from a diagnostic perspective and two potential benefits from a therapeutic perspective:

• To diagnose what area of the brain is generating the seizures. Patients with epileptic disorders that are untreatable with current medications are candidates for surgery.
Evaluation of surgical candidates consists of hospitalization, stopping all anti-seizure medications and any other such treatment, inserting EEG probes, and observing patients to determine in what area of the brain seizures are generated. If seizures are generated in critical areas of the brain, the subject is rejected for surgery; otherwise, surgery is possible. In some cases, the patient presses a button each time he or she feels that an attack is imminent. This procedure can be performed only by the roughly 50% of patients with MTLE who maintain volitional control of their fingers at the start of a seizure [5]. Monitoring also includes video recordings in order to visually confirm clinical events. Review of such video is an exhaustive, tedious, and time-consuming process. In the video recording analysis, labeling of seizures is limited to clinical seizures (i.e., those during which the patient exhibits clinical symptoms); therefore, subclinical seizures—those where the patient does not exhibit any externally observable symptoms—are not labeled as seizures. Developing an algorithm that can detect all the (clinical and subclinical) seizures and record the times when they occurred, regardless of the patient's display of clinical symptoms, would give neurologists a more accurate "profile" of the brain of the patient. It could also allow performing surgeries after a shorter period of observation.

• Early detection can trigger immediate therapy. If early detection of seizures can be achieved by a detection algorithm, therapy (e.g., a drug or an electrical stimulus) from an implantable device could be initiated, avoiding a seizure's complete evolution and allowing the patient to continue ordinary daily activities like driving or talking, without any clinical consequence like loss of consciousness or convulsions.

• A chronic therapeutic effect on patients. Just as traditional medications, used to attempt to control or relieve a given condition, must often be administered over an extended time period (e.g., antibiotics usually require days to weeks to kill bacteria), similarly extended treatment may be needed to see all the beneficial therapeutic effects of an implantable detection device firing an electrical stimulus each time the device detects epileptiform activity (i.e., abnormal behavior of the EEG signals). It is hypothesized that such extended treatment may have a chronic therapeutic effect on the patient, reducing the number of epileptic seizures suffered and thus improving the patient's quality of life.

5.2 Ictal EEG Signals

Unlike preictal signals, in which no obvious sign is visible, in ictal signals noticeable changes can be observed throughout the evolution of the seizure. This means that detection of EEG seizures is somewhat easier than prediction of seizures. The ictal period includes the Unequivocal Electroencephalographic Onset (UEO) of the seizure, the indicator observable by EEGers that denotes that a seizure is starting. Figure 5.1 depicts an ictal signal of two minutes duration. Patients' data used for these experiments were the same as in Chapter 4; thus, so are their clinical profiles. In the training phase of the experiments, we used baseline and ictal epochs of two minutes duration, thus containing 24,000 points each (recall that the EEG signals were sampled at 200 Hz). For ictal epochs, we used one minute of data before the UEO and one minute of data after it (for example, see Figure 5.1). The training set for each patient was three baseline epochs and three ictal epochs.

Figure 5.1  A two-minute duration ictal signal. The dashed line indicates the UEO.
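As a concrete illustration of how a two-minute ictal epoch is clipped around the UEO, consider the following sketch; the 200 Hz sampling rate is the one stated above, while the function and variable names are ours, not the dissertation's.

```python
import numpy as np

FS = 200  # sampling rate of the EEG archives [Hz]

def clip_ictal_epoch(eeg, ueo_index):
    """Return a two-minute ictal epoch: one minute of EEG before the
    UEO and one minute after it (24,000 samples in total at 200 Hz)."""
    half = 60 * FS
    return eeg[ueo_index - half : ueo_index + half]

# Example with a synthetic one-hour recording and a UEO at the 30 min mark
eeg = np.random.randn(3600 * FS)
epoch = clip_ictal_epoch(eeg, ueo_index=1800 * FS)
assert epoch.size == 24_000
```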
Whereas in Chapter 4 the extracted baseline and preictal epochs of ten minutes duration were used for validation purposes, in this chapter we used the entire EEG archive available for each patient in the validation stage. Because the detector is less sensitive than the predictor, the detector should be able to ignore artifacts that might be considered by the predictor as precursors of an epileptic seizure. Still, after classification of the validation data was completed, further analyses were conducted to verify that any false positives were not due to such artifacts.

5.3 Detection with GP Artificial Features

Although detection of seizures is not as hard as prediction of seizures, the problem is still far from trivial. Many algorithms and methods have been developed for such tasks, several of which were cited and briefly discussed in Chapter 2; however, most algorithms still fail to detect some seizures. In this chapter, we apply the genetic programming artificial features (GPAF) algorithm to create a feature that finds a pattern, or gives a transformation, that delivers more separability between the baseline and ictal classes, with a similar or better classification rate than the benchmark classifier explained in Section 4.7. In other words, this transformation makes the task of the classifier (i.e., categorization) easier, therefore allowing detection of the seizures with higher accuracy. The implementation of the GPAF algorithm for detection is exactly as explained in Section 4.4, with the GP algorithm using the same cost function as presented in Section 4.6.

Unlike the prediction case, where a whole preictal epoch was labeled as belonging to one class, in detection we need an additional labeling detail because the ictal data are clipped. For ictal epochs, the first minute before the UEO was labeled as baseline data and the rest of the epoch (i.e., the minute of EEG data after the UEO) was labeled as ictal. Figure 5.2 depicts the new classification labeling (the EEG signal was normalized to one) and makes the motivation clear: the one minute of EEG data before the UEO has an amplitude similar to baseline data. Given that, for the detection case, a large part of the information used to detect seizures is provided by the amplitude of the signal, it is reasonable that our previous labeling (of the whole epoch as belonging to one class) would be confusing to the GPAF (or any other) algorithm.

The GP algorithm used the same fitness function as in the previous chapter. Therefore, we measured metrics such as the numbers of false positives and false negatives. In this chapter, we sought to achieve a false negative rate of 10% or less and no more than two false positives per day (a false positive rate of 0.0833 per hour).

Figure 5.2  Classification labeling for a two-minute ictal epoch. The dashed straight line (step) indicates the classification for this epoch.
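The labeling rule just described amounts to a step function aligned with the UEO: the pre-UEO minute is treated as baseline and the post-UEO minute as ictal. A minimal sketch, with illustrative names of our own, follows.

```python
import numpy as np

def label_ictal_epoch(num_samples=24_000, fs=200):
    """Step labeling of Figure 5.2 for a two-minute ictal epoch: the
    minute before the UEO is labeled 0 (baseline) and the minute after
    the UEO is labeled 1 (ictal)."""
    labels = np.zeros(num_samples, dtype=int)
    labels[60 * fs:] = 1  # the UEO sits exactly at the one-minute mark
    return labels
```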
5.4 Benchmark Feature

In the case of seizure prediction, a benchmark had to be created against which to compare the GPAF algorithm. In the seizure detection case, however, a benchmark was already available: a conventional (handcrafted) feature called line length or curve length, which may well be the best one implemented to date for detection of epileptic seizures [25]. Equation (5.1) shows the mathematical expression:

    LL(k) = \sum_{n=k-L+1}^{k} |x[n] - x[n-1]|.                        (5.1)

Although the equation is simple, the feature can detect abrupt changes in a given signal, and that is precisely the kind of activity present when a seizure occurs. That is, the line length feature is sensitive to sudden changes in amplitude (it is the discrete analog of the absolute value of the derivative of the function) and frequency. Line length linearly tracks AM and FM activity in a signal.

Figure 5.3  Line length feature benchmark classifier.

In this chapter, we compare the results of the GPAF algorithm against the results provided by the k-NN classifier after processing the EEG data with the line length feature. Figure 5.3 illustrates the benchmark classifier using the line length feature, where x[n] is the EEG signal, y[k] is the output of the line length feature (calculated from Equation (5.1); we use the y[k] notation to be consistent with previous equations and diagrams), and c denotes the output label of the classifier. This benchmark does not use the delays used for the GPAF algorithm. However, Equation (5.1) can also be viewed as a state-space reconstruction using the two dimensions x[n] and x[n-1] (that is, the embedding dimension is two), with delay τ = 1.
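Equation (5.1) translates directly into a short routine. The sketch below is our own rendering, with hypothetical names; note that the L differences in the sum span the L + 1 samples x[k-L], ..., x[k].

```python
import numpy as np

def line_length(x, k, L):
    """Line (curve) length of Eq. (5.1): the sum of |x[n] - x[n-1]| for
    n = k-L+1, ..., k, i.e., the L absolute first differences ending
    at sample k (0-based indexing)."""
    window = x[k - L : k + 1]            # L + 1 samples
    return np.sum(np.abs(np.diff(window)))

def line_length_series(x, L=200, D=50):
    """Slide the line length over a whole signal with window length L
    and displacement D (the Section 5.6 settings are the defaults)."""
    ks = np.arange(L, len(x), D)
    return np.array([line_length(x, k, L) for k in ks])
```

Because the feature is just a sum of absolute first differences, it tracks both amplitude and frequency changes at negligible computational cost, which is part of what makes it attractive for an implantable device.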
5.5 Decision Integration Window

As in the seizure prediction experiments, detection experiments were conducted on a point basis, where each point in an artificial feature epoch counts as one example; i.e., each point k is an example for the classifier (see Figure 4.14). Similarly, if an implantable device (with GPAF features integrated) classified an incoming EEG signal every Δt seconds, the device could not be allowed to medicate the patient based on the decision that the classifier makes each Δt seconds. The device needs a longer, fixed-length window, so that it can observe the past evolution of the point-basis classification during a defined period in order to decide whether or not a patient is suffering a seizure. Even when this window is buffered, the point basis creates too much flickering at the binary output. We denote the length of this window L_DIW. This parameter controls a tradeoff between the number of false positives and the detection delays of seizures (thus, it also controls the number of seizures that are detected; however, because most seizures are detected sooner or later given the sudden and large changes in the EEG signal's amplitude, what is also of great interest is how early those seizures can be detected). The effect of the window length on the number of false positives, false negatives, and detection delay is as follows:

  L_DIW              Small       Large
  False positives    Increase    Decrease
  False negatives    Decrease    Increase
  Detection delay    Decreases   Increases

This parameter could be fixed as a constant, that is, the same L_DIW value for all patients. However, leaving it free allows more flexibility in our approach, making the system tunable for better patient-specific performance. Here we selected L_DIW by trial and error; the ability of the GP algorithm to automatically select the optimal value will be investigated in the future.

The detection algorithm uses the concept of a time tolerance. A detection is considered correct from one minute before through one minute after the UEO. If the detector fires earlier than a minute before the UEO, the event is counted as a false positive, whereas if the seizure is not detected within one minute after the UEO, it is counted as a false negative.
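The dissertation does not spell out the exact integration rule, so the sketch below realizes the decision integration window as a trailing majority vote over the point-basis outputs; the threshold, the function name, and the vote rule itself are our assumptions.

```python
import numpy as np

def integrate_decisions(point_outputs, window_pts, threshold=0.5):
    """Smooth 0/1 point-basis classifier outputs with a trailing window
    of `window_pts` points (L_DIW expressed in feature points).  A
    detection is declared once the fraction of ictal-labeled points in
    the window exceeds `threshold` (a majority vote when 0.5)."""
    point_outputs = np.asarray(point_outputs)
    out = np.zeros(point_outputs.size, dtype=int)
    for i in range(window_pts - 1, point_outputs.size):
        if point_outputs[i - window_pts + 1 : i + 1].mean() > threshold:
            out[i] = 1
    return out
```

Lengthening the window suppresses isolated false positives but postpones the first post-UEO alarm, which is exactly the tradeoff tabulated above.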
5.6 Results

The feature-sampling periods for creating the data sets were different from those used in prediction. Since the relative amplitude variability of the EEG signals at the moment of an attack is much higher than the relative variability of the preictal EEG (see Section 4.2), features were generated every 0.125 seconds. The displacement of the sliding window was set to D = 50 points (0.25 s) and its length to L = 200 points (1 s). The same sliding window parameters were used for the curve length feature. Thus, the number of points K in an epoch processed by a GP-generated equation was 477. Holdout validation was used as the training technique. The training data consisted of three baseline and three ictal epochs for each patient. Euclidean distance was set as the metric, and the number of nearest neighbors for the k-NN classifier was set to k = 5, a value typically used. The classifier was set to output one when ictal data are present (that is, a seizure is being detected) and zero otherwise. Finally, as stated before, the embedding dimension was set to six. In the experiments, the L_DIW parameter was set identical for both the GPAF algorithm and the curve length benchmark for each patient.

In these experiments, we set the number of trees for each individual in the GP module to two; that is, the GPAF algorithm created two artificial features for each patient. The GP algorithm used the same set of functions presented in Section 4.8. The maximum depth for a tree was set to 10 and the population size to 1000 individuals. The initial population was generated using the ramped half-and-half method. A crossover operator with a rate of 0.9, fitness-proportional selection, and a mutation operator with a rate of 0.1 were used.
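The window bookkeeping above can be verified in a few lines: stepping a 200-point (1 s) window through a 24,000-point epoch in 50-point (0.25 s) increments yields exactly the K = 477 feature points quoted. The generic evaluation loop is a sketch; `feature_fn` stands in for any GP-designed equation, and the names are ours.

```python
import numpy as np

def window_starts(epoch_len=24_000, L=200, D=50):
    """Start indices of the sliding windows behind each feature point."""
    return np.arange(0, epoch_len - L + 1, D)

def evaluate_feature(x, feature_fn, L=200, D=50):
    """Apply a feature (any callable over an L-sample window) at every
    window position of the epoch x."""
    return np.array([feature_fn(x[s:s + L])
                     for s in window_starts(len(x), L, D)])

assert window_starts().size == 477  # the K quoted in the text
```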
5.6.1 Patient A

This patient suffered six seizures through 45.0 hours of EEG recording. Based on autocorrelation analysis of the ictal signal, the delay was set to τ = 19 (or 0.095 s) and the terminal set was {x[n], x[n-19], x[n-38], x[n-57], x[n-76], x[n-95]}. Equation (5.2) shows the artificial features found by the GPAF algorithm for the detector of this patient's seizures. Table 5.1 provides the results for the GPAF and curve length equations, where FP denotes the number of false positives, FPH is the rate of false positives per hour, FN denotes the number of false negatives, L_DIW is the length of the decision integration window, ADT denotes the average delay of detection time, SZ is the number of seizures that occurred throughout the EEG archive, and HP is the number of hours of EEG recordings processed. This patient had a high average detection time because one seizure was detected with an outlier delay of 35.75 s. For this patient, the traditional curve length measure outperformed GPAF overall.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \frac{(x[n-76])^2}{\log(x[n-19])},
                                                                       (5.2)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} x[n-95] \cdot x[n-76].

Table 5.1  Results for validation data for patient A.

  System         FP   FPH     FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           6    0.133   0    3.75        15.67     6    45.0
  Curve length   5    0.111   0    3.75        7.67      6    45.0

5.6.2 Patient B

Through 46.2 hours of EEG recording, this patient suffered five seizures. The delay was τ = 11 (or 0.055 s) and the terminal set was {x[n], x[n-11], x[n-22], x[n-33], x[n-44], x[n-55]}. Equation (5.3) shows the equations designed by the GPAF algorithm, and Table 5.2 provides the results obtained from the validation of the GPAF equations and the curve length feature. The table shows that with the GPAF equations just one false positive occurred. GPAF was able to detect all five seizures that occurred, and the GPAF-based detector achieved a lower average detection delay.

Figure 5.4  A seizure from patient B, which depicts a beta-buzz pattern.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \log_{10}\!\left(x[n-11] - x[n-22] + \sqrt{x[n-11]}\right),
                                                                       (5.3)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} \log_2\!\left(\frac{\log_{10}(x[n-11])}{x[n-33] + x[n-44]}\right).

Table 5.2  Results for validation data for patient B.

  System         FP   FPH     FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           1    0.022   0    3.75        1.15      5    46.2
  Curve length   1    0.022   0    3.75        4.35      5    46.2

A good average detection delay was achieved because this patient had a characteristic pattern called "beta-buzz," in which the amplitude drops dramatically before the UEO. Figure 5.4 shows the characteristic pattern, which occurs a few seconds before the UEO, located exactly at one minute in the plot. Summarizing across measures, GPAF never underperformed the curve length detector, and outperformed it on average time to detection.

5.6.3 Patient C

This patient had an archive of 65.7 hours of EEG recording, in which eleven clinical seizures were diagnosed. The delay was τ = 19 (or 0.095 s) and the terminal set was {x[n], x[n-19], x[n-38], x[n-57], x[n-76], x[n-95]}. The pair of artificial features found by the GPAF algorithm is given in Equation (5.4). Table 5.3 shows the results for the validation data: zero false positives (and thus a zero false positive rate per hour) occurred, and all eleven seizures were detected. Finally, though not better than the curve length, the artificial equations produced a low average detection delay.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \frac{\cdots}{x[n-95]},
                                                                       (5.4)
    y_2[k] = \sum_{n=1+150(k-1)}^{150k+50} \frac{(x[n-19] - x[n])^2}{\left|\log_{10}(\log_{10}(x[n-38]))\right|}.

Table 5.3  Results for validation data for patient C.

  System         FP   FPH   FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           0    0     0    3           1.93      11   65.7
  Curve length   0    0     0    3           −1.82     11   65.7

5.6.4 Patient D

This patient has 156.4 hours of EEG recording, in which eleven clinical seizures were found. The delay was τ = 9 (or 0.045 s) and the terminal set was {x[n], x[n-9], x[n-18], x[n-27], x[n-36], x[n-45]}. Equation (5.5) shows the artificial functions found by the GPAF. Some very interesting results were found for this patient: the first artificial equation, y1[k], is the curve length feature, but with a delay of 9 instead of 1. As seen in Table 5.4, the GPAF equations and the curve length both performed well, producing no false positives or false negatives, and the average detection delay (ADT) with GPAF was lower than that with curve length.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} |x[n-9] - x[n]|,
                                                                       (5.5)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} |x[n]|.

Table 5.4  Results for validation data for patient D.

  System         FP   FPH   FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           0    0     0    2.75        2.1       11   156.4
  Curve length   0    0     0    2.75        4         11   156.4
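Since y1 in Equation (5.5) is just the curve length computed with a delay of 9 samples, it can be stated generically; the following sketch (names ours) parameterizes line length by the delay, with delay = 1 recovering Equation (5.1).

```python
import numpy as np

def delayed_line_length(window, delay=9):
    """Curve length generalized to an arbitrary delay: the sum of
    |x[n - delay] - x[n]| over one window, as in y1 of Eq. (5.5)."""
    window = np.asarray(window, dtype=float)
    return np.sum(np.abs(window[:-delay] - window[delay:]))
```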
5.6.5 Patient E

This patient has the longest EEG archive, lasting 197.3 hours, in which fifteen seizures were found. The delay was τ = 9 (or 0.045 s) and the terminal set was {x[n], x[n-9], x[n-18], x[n-27], x[n-36], x[n-45]}. The pair of artificial features is presented in Equation (5.6). Table 5.5 provides the results for both the artificial features and the curve length feature. Both produced the same number of false positives, giving a false positive rate of 0.025 per hour, which is acceptable. All seizures were detected, with the GPAF features producing a slightly better average detection delay, although this difference does not appear to be statistically significant.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} x[n-18] \cdot x[n-36],
                                                                       (5.6)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} \left(|x[n]| + x[n-9] + x[n-36] - x[n-45]\right).

Table 5.5  Results for validation data for patient E.

  System         FP   FPH     FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           5    0.025   0    4.25        10.9      15   197.3
  Curve length   5    0.025   0    4.25        11.05     15   197.3

5.6.6 Patient F

The entire EEG database of this patient lasts 66.2 hours, during which six clinical seizures were identified. The delay was τ = 20 (or 0.1 s) and the terminal set was {x[n], x[n-20], x[n-40], x[n-60], x[n-80], x[n-100]}. Equation (5.7) shows the artificial features designed by the GPAF algorithm, and Table 5.6 shows the results obtained with the GPAF equations and the curve length feature. For this patient, the GPAF-designed equations produced three false positives, whereas the curve length feature produced fifteen of them. The average detection delay was lower for the GPAF equations. On the other hand, both algorithms failed to detect one previously reported seizure. However, the GPAF algorithm detected three clinical seizures, one of which is depicted in Figure 5.5, and one subclinical seizure that had not been previously reported, whereas the curve length feature detected an extra subclinical seizure (that is, the GPAF features did not detect that additional subclinical seizure). The clinical seizures lasted roughly one minute, whereas the subclinical seizures lasted 16 to 18 seconds.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \log\!\left(\sqrt{|x[n-100]|} \cdot x[n-20]\right),
                                                                       (5.7)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} \frac{x[n-100]}{\left|x[n] \cdot \log_{10}(x[n-60])\right|}.

Table 5.6  Results for validation data for patient F.

  System         FP   FPH     FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           3    0.045   1    3.75        −9.21     7    66.2
  Curve length   4    0.060   1    3.75        5.33      7    66.2

Figure 5.5  Clinical seizure previously unreported for patient F.

5.6.7 Patient G

This patient had 153.8 hours of EEG available for the training and validation procedures. This patient suffered 31 seizures, of which twenty were subclinical. Unlike in the prediction problem, here we also included the subclinical seizures, so that the GPAF algorithm could design a detector that also warns about such seizures. Subclinical seizures are epileptic seizures, but usually of duration no more than 30 seconds and without overt, clinically evident symptoms; i.e., they are detected only with EEG. Figure 5.6 shows a subclinical seizure.

Figure 5.6  Subclinical seizure. The signal is normalized to one.

The delay was τ = 15 (or 0.075 s) and the terminal set was {x[n], x[n-15], x[n-30], x[n-45], x[n-60], x[n-75]}. Equation (5.8) shows the artificial features found by the GPAF, and Table 5.7 shows the results for the validation data. Although the number of false negatives was zero, the number of false positives was so high that the artificial equations were impractical (see below). A similar result was obtained for the curve length feature.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \left| x[n-30] + \sqrt{(x[n-45])^2 + \left(\log(x[n] \cdot x[n-60])\right)^2} \right|,
                                                                       (5.8)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} \sqrt{x[n-60]}.

Table 5.7  Results for validation data for patient G.

  System         FP   FPH   FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           —    —     0    2.75        0.105     31   153.8
  Curve length   —    —     0    2.75        3.07      31   153.8

We realized that, unlike the other patients, this patient had many subclinical seizures. Figure 5.2 illustrates what was happening: the classification labeling used before was also being applied to subclinical seizures, but Figure 5.6 shows that once a subclinical seizure ends, the amplitude of the signal returns to normal. In fact, the amplitude of the signal after the seizure is similar to the amplitude before the seizure. Therefore, to address this problem, we redefined the classification labeling for the subclinical seizures: the classifier should output one only while the seizure is occurring, and zero both before and after it. Also, this patient had some seizures that created a "catch-22"—the situation in which neurologists, trying to place the intracranial electrodes at the seizure foci, do not know exactly where those foci are located. Because of this, the electrodes are sometimes misplaced, creating delays of up to tens of seconds in the EEG recordings of seizure activity. For this reason, because our observation of the data suggested the existence of such delays, for six seizures we corrected the UEOs forward in time by 6 to 12 seconds.

Although the GPAF algorithm was run again to see what features would be designed using the revised information, we also re-evaluated Equation (5.8) using the new classification labeling. Table 5.8 shows the results for the validation data. The number of false positives is approximate, given that there were still too many events for an accurate count. Additionally, approximately eight hours of EEG recording, during which there were no seizures, were highly misclassified; therefore, the real number of false positives is much larger. Increasing the length of the DIW did not yield an acceptable solution; with the setting L_DIW = 2.75 s, four seizures were still missed (although all were subclinical).

Table 5.8  Results for validation data using the new classification labeling.

  System   FP   FPH     FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF     85   0.553   4    2.75        8.361     31   153.8

The GPAF algorithm was then run using the new classification labeling. Equation (5.9) shows the artificial features found, and Figure 5.7 depicts the revised labeling. Table 5.9 provides the new results for the validation data. Clearly, these artificial features achieve good performance. The two seizures missed by the GPAF equations were both subclinical. However, an additional unreported subclinical seizure was detected by both methods, and a low false positive rate was achieved. In contrast, the curve length feature had a high false positive rate (although its number of false positives is an approximation, as explained above); in fact, the number of false positives for the curve length feature was larger because it misclassified almost eight hours of EEG recording. If the parameter L_DIW was increased, several seizures were missed.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \left( x[n-75] \cdot x[n-60] + \frac{\cdots}{x[n-30]} \right),
                                                                       (5.9)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} \frac{(x[n-60])^2}{\ln(x[n-45])}.

Figure 5.7  Classification labeling for subclinical seizures. The dashed straight line indicates the ideal classification for this epoch.

Table 5.9  Results for validation data for patient G using the new classification labeling.

  System         FP   FPH     FN   L_DIW [s]   ADT [s]   SZ   HP [h]
  GPAF           11   0.072   2    2.75        5.81      31   153.8
  Curve length   81   0.527   0    2.75        3.40      31   153.8
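Under the revised rule for patient G, a subclinical epoch is labeled ictal only between seizure onset and offset, since the amplitude returns to baseline once the event ends. A sketch of this interval labeling, with illustrative names of our own, follows.

```python
import numpy as np

def label_subclinical_epoch(num_samples, onset, offset):
    """Revised labeling for subclinical seizures (Figure 5.7): output 1
    only while the seizure is occurring (samples onset..offset-1), and
    0 both before and after it."""
    labels = np.zeros(num_samples, dtype=int)
    labels[onset:offset] = 1
    return labels
```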
5.7 What Causes False Positives?

It is natural to wonder about the anomalies that tend to make the GPAF classifier produce false positives. Some of the false positives occur because artifacts are present, as discussed below. In the results of the previous section, artifact-type false positives were not counted; in this section, however, we show how a false positive is produced by an artifact. In the following figures, the means of the artificial features were subtracted to shift the artificial signals to zero for easier comparison.

Figure 5.8 shows an artifact, called a blackout, detected in the EEG recording of patient B. The picture shows that as soon as the blackout starts, the artificial features change suddenly, making the detector "fire." Blackouts were excluded as false positives, given that such artifacts arise from the recording equipment and process. Another artifact that leads to false positives is the full-scale swing. Figure 5.9 depicts a full-scale swing detected by the artificial features for patient D; once again, it represents a flaw in the EEG recording system.

However, not all false positives are due to artifacts; some result from brain-generated events. Figures 5.10 and 5.11 show seizure-like events that occurred in patient E. These "false start" events may last just a few seconds; if their duration is more than 30 seconds, they can be considered epileptic seizures. Both graphs illustrate that the detector was "firing" during the false start (the frequency increased). Additionally, the EEG of patient E also exhibited "delta trains," whose frequency components are between 0.5 and 4.0 Hz. Figure 5.12 presents an example of a delta train.

Figure 5.8  Blackout occurring in the EEG recording of patient B.

Figure 5.9  Full-scale swing occurring during the EEG recording of patient D.

Figure 5.10  False start occurring during the EEG recording of patient E.

Figure 5.11  False start occurring during the EEG recording of patient E.
Figure 5.12  Delta train in the EEG recording of patient E.

5.8 A Generic Detector

A universal, generic detector, such as the curve length feature, is desirable because it can be applied to any patient without the need to extract data and design a detector for the seizures of the patient at hand. Using only a small amount of baseline and ictal data, the detector should be able to detect most seizures while keeping a low false positive rate. In this section, we show a general detector designed by the GPAF algorithm. For this experiment, we used only one baseline epoch and one ictal epoch of two minutes duration from each available patient. The training data set therefore consisted of seven baseline and seven ictal epochs (a total of 28 minutes of data). The training ictal epoch for patient G was a subclinical seizure. The holdout method was used for training.

In searching for a generic detector, we also fixed the delay parameter by averaging the delays calculated previously. The average delay was 16; therefore, the terminal set was {x[n], x[n-16], x[n-32], x[n-48], x[n-64], x[n-80]}. The L_DIW parameter, which is not specifically tied to any feature, was adjusted for each patient and was not necessarily the same value used for the customized detector in the previous section. The parameters for the GP algorithm and k-NN classifier were set as before. Equation (5.10) shows the artificial equations found by the GPAF algorithm.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \frac{(x[n-64] \cdot x[n-80])^2 + x[n] \cdot \sin(\sin(x[n])) \cdot x[n-48]}{(x[n-16])^2 - \log_{10}(x[n-16]) \cdot \log_2(x[n-64])},
                                                                       (5.10)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} \left( x[n] + x[n-80] + x[n-32] \cdot x[n-16] - \cos(x[n-48]) \right).

Table 5.10 provides the results for the seven patients using the validation data, their respective whole EEG archives. The results were acceptable for most patients, with performance similar to the curve length feature. Five patients out of seven met the specifications of a 0.0833 false positive rate per hour and less than 10% false negatives.

Table 5.10  Results for validation data for the general detector.

  Patient   FP     FPH     FN      L_DIW [s]   ADT [s]   SZ   HP [h]
  A         3      0.067   0       3.75        15.08     6    45.0
  B         2      0.043   0       3.75        0.1       5    46.2
  C         0      0       0       3.25        0.16      11   65.7
  D         0      0       0       2.5         1.55      11   156.4
  E         25     0.159   0       6.25        19.13     15   197.3
  F         8      0.121   1       6.25        16.17     7    66.2
  G         2      0.013   2       2.75        6.22      31   153.8
  Overall   5.71   0.058   0.429   4.07        8.34      86   730.6
For patient E, a 0% false negative rate was achieved; however, this patient had a flurry of EEG seizure-like bursts in the form of false starts and delta trains, which caused all of the false positives. Because of this, the length of the decision window was increased to filter out several false positives; however, this decision affected the average detection delay. Patient F had a rate of 14.29% false negatives. However, three clinical and two subclinical previously unreported seizures were detected (the same ones found by the customized and curve length detectors presented in Subsection 5.6.6); additionally, the detector found an additional subclinical seizure, which was not detected by either the customized or the curve length detector. If all seizures are counted, the false negative rate for this patient drops to 7.69%. On the other hand, this patient had many epileptiform discharges (spike-and-wave patterns) that lasted only a few seconds. These electrical discharges made the generic detector fire, producing several false positives. Finally, a similar situation occurred with the results for patient G. Two subclinical seizures were missed, yielding a false negative rate of 6.45%. However, a previously unreported subclinical seizure was found, and a low false positive rate per hour was achieved. Additionally, a series of nineteen epileptiform discharges—spike-and-wave EEG patterns—was also detected.

5.9 A Detector with τ = 1

It is clear, as mentioned previously, that the curve length feature exhibits quite good detection performance for all patients. Even for patient D, where the GPAF found the curve length, the original curve length equation performed slightly better than the GPAF equation. (Note that it was not possible for the GPAF search to find the identical curve length feature, because it was not given the possibility of using τ = 1 in its terminal set; instead, for previously justified reasons, τ was set using autocorrelation data.) Therefore, it is natural to ask whether the better performance was due to the maximum correlation that the curve length feature exploits; that is, a delay equal to one. We therefore selected patient G to rerun the GPAF algorithm, but with delay τ = 1. The set of terminals (delays) was {x[n], x[n-1], x[n-2], x[n-3], x[n-4], x[n-5]}. Patient G was selected because that patient exhibited the most false negatives. The rest of the parameters were set as before. Equation (5.11) shows the artificial features found by the GPAF algorithm.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} \left( |x[n] - x[n-1]| - \log_{10}\!\left(\log_2(x[n-1] \cdot x[n-3])\right) \right),
                                                                       (5.11)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} x[n].

Perhaps not surprisingly, the algorithm found a pair of features in which the first, y1[k], contains the term |x[n] - x[n-1]|, which is precisely the curve length feature, whereas the second term in the same equation is a "tuning" or "noise" term, given that its relative magnitude is so small that it has little effect. The second feature was not contributing to better discrimination: when the GPAF features were compared with the curve length, the latter still performed better. When the second feature was removed, the curve length feature and y1[k] in Equation (5.11) performed similarly, with a difference of 0.25 s in lead or delay of detection of some seizures, caused by the log term in the GPAF feature.

In a second experiment, we ran the GPAF algorithm to find a generic set of artificial features (that is, using the EEG data epochs from all patients), in the same way as in the previous section, but with the delay equal to one. Equation (5.12) shows the equations found by the GPAF algorithm, and Table 5.11 provides the results for the validation data.

    y_1[k] = \sum_{n=1+50(k-1)}^{50k+150} (x[n-3])^2,
                                                                       (5.12)
    y_2[k] = \sum_{n=1+50(k-1)}^{50k+150} x[n-3] \cdot (x[n] + x[n-5]).
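The two generic τ = 1 features of Equation (5.12) are simple enough to write out directly; the sketch below evaluates both over a single sliding window, using 0-based indexing and a short warm-up so the five-sample delay stays in range (our own boundary handling and names).

```python
import numpy as np

def generic_tau1_features(window):
    """Evaluate y1 and y2 of Eq. (5.12) over one L-sample window:
    a delayed signal energy and a delayed cross-product term."""
    x = np.asarray(window, dtype=float)
    n = np.arange(5, x.size)                    # all delays (3 and 5) defined
    y1 = np.sum(x[n - 3] ** 2)
    y2 = np.sum(x[n - 3] * (x[n] + x[n - 5]))
    return y1, y2
```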
Unlike the generic detector found before, these equations are simpler; nevertheless, according to the table, the general artificial detector achieved good performance. Except for one patient, the patients averaged two or fewer false positives per day, and at the same time a low average false negative rate was achieved. Patient F had the most seizures missed and the longest decision window: because the detector was being triggered by epileptiform discharges (in the form of spike-and-wave artifacts) lasting no more than three seconds, the length of the window was increased in order to filter out these EEG discharges, at the expense of two (reported) clinical seizures missed. The detector did, however, catch three clinical and one subclinical seizures, although another subclinical seizure that had previously been detected by the curve length was missed. On the other hand, although postictal activity was not counted as false positives, such activity was observed for patient C using Equation (5.12), whereas it was not seen using the customized detector (i.e., Equation (5.4)) or the generic detector (i.e., Equation (5.10)). That means that this set of features is very sensitive for patient C. Additionally, all the false positives for patient F were false starts; such EEG patterns also affected the previous set of artificial features.

Table 5.11  Validation data results for the general detector with delay equal to one.

  Patient   FP   FPH     FN      L_DIW [s]   ADT [s]   SZ   HP [h]
  A         2    0.044   0       3.75        14.42     6    45.0
  B         2    0.043   0       3.75        25.2      5    46.2
  C         0    0       0       3.75        −1.3      11   65.7
  D         0    0       0       2.5         9.34      11   156.4
  E         12   0.061   0       3.75        5.65      15   197.3
  F         13   0.196   2       5           9.75      7    66.2
  G         13   0.084   1       2.75        4.53      31   153.8
  Overall   6    0.061   0.429   3.61        9.66      86   730.6

5.10 Training by Leave-One-Out Method

All the training for detection (as well as for prediction) was done using a holdout method (or split-sample validation). As previously mentioned, a problem with this method is variability, given that the classification is tied to the training data selected. To avoid (or at least reduce) this effect, a leave-one-out method can be used; it is, however, computationally intensive. Because of this, we did not use the leave-one-out (LOO) method for most experiments; here, though, we show results for patient G using a form of the LOO method. Unlike prediction, where we were using EEG epochs for training and validation purposes, in detection we had an entire EEG archive—in this case, patient G had 153.8 hours of EEG recording—which made it impractical to segment all the data and run the algorithm on two-minute segments. Therefore, our LOO implementation was very limited, using three baseline and three ictal epochs of two minutes duration (two subclinical seizures were included, one of them being the one missed by all the detectors when patient G was processed). Although the length of the epochs is shorter for detection, we needed to calculate more points, given that the sliding window is 0.25 s long. The parameters for the GP algorithm and k-NN classifier were the same as those used to obtain the customized artificial features for patient G. Equation (5.13) gives the GPAF-designed features.

    y_1[k] = \sum_{n=1+150(k-1)}^{150k+50} (x[n-20])^2,
                                                                       (5.13)
    y_2[k] = \sum_{n=1+150(k-1)}^{150k+50} x[n-30] \cdot (x[n-50] - x[n-40]).

The results are shown in Table 5.12, where Pc denotes the percent of correct classification, FP denotes false positives, and FN denotes false negatives. The length of the decision window was set to 2.5 s. Two false positives were detected among all the combinations, and one seizure was misclassified—the one missed by all the detectors previously tested. Looking at the mean and standard deviation, which was ± 7.19%, we can state that a detector with high confidence was found by the GPAF algorithm. It seems clear that using the LOO method for training produces models—detectors, in this case—that are more accurate; on the other hand, they are computationally expensive. High-end computers are needed in order to make LOO practical for this application, considering the amount of data that must be processed, at least when conducting many experiments in a short timeframe.
Table 5.12  Results for the leave-one-out method.

  Set       Pc      FP     FN
  1         96.13   0      1
  2         82.34   1      0
  3         81.57   0      0
  4         95.40   0      0
  5         81.51   0      0
  6         94.25   0      0
  7         82.55   0      0
  8         96.34   1      0
  9         95.72   0      0
  Overall   89.53   0.22   0.11

5.11 Discussion

Section 5.6 shows that the GPAF algorithm can design sets of equations with performance similar to or better than that of a conventional, human-crafted feature—in this case, the curve length. For the customized artificial features, just four seizures were missed (three of them subclinical, one of which was found in this work by other detectors), for a false negative rate of 4.35% (88 of 92 seizures caught, counting the previously unreported ones), which more than satisfies the requirement of no more than 10%. Also, in 730.6 hours (roughly 30.4 days) of EEG recordings, just 23 false positives were signaled, for a rate of 0.0315 false positives per hour.
The false positive requirement was 0.0833 false positives per hour, so the system exceeded expectations. The curve length feature achieved an average of 0.1314 false positives per hour; however, this value was affected most by the results for patient G. It also missed just one previously reported seizure, on patient F, yielding a false negative rate of 1.09%. Note, however, that GPAF was using two equations (artificial features) while the curve length is just one equation, allowing the GPAF equations more flexibility.

A generic detector for all patients was also successful. This model was sufficiently general to allow use of a fixed (average) delay parameter. The overall false positive rate per hour was a low 0.055, and only three seizures were missed, of which two were subclinical. Moreover, this model was able to detect an additional subclinical seizure in patient F, yielding a low false negative rate of 3.22% (90 seizures of 93 were detected, after counting the seizures that had not been previously reported). This general detector was created by the GPAF algorithm with just one baseline and one ictal epoch of two minutes duration from each patient, a very limited dataset (due to computational expense) compared with the entire body of data available. Time (generation) limits on the evolution of GPAF features were also imposed, for the same reason. Our point is that a better model might be obtained if more data, more generations, and a leave-one-out method were used. With an experiment, we demonstrated that a high-accuracy, low-variance model could be designed by using a larger dataset with leave-one-out cross validation.

Similar results were obtained by the generic model with the delay set to one, yielding an overall false positive rate of 0.058 per hour and a missed-seizure rate of 4.35% (88 of 92 seizures were detected, adding the previously unreported seizures). Clearly, for seizure detection, exploiting the maximum correlation (that is, using τ = 1) to design new artificial features may also lead to better artificial detectors.

As in prediction, in detection the numbers of false positives and false negatives (and thus the average detection delay) are affected by the length of the decision window. Although the length could be held constant, it proved beneficial to tune it for each patient. Additionally, the design of artificial features by the GPAF algorithm is affected by where we start to label the ictal class. Initially, we labeled the ictal epochs starting one minute before the UEO in order to give the GPAF algorithm a way to track some sort of signal indicating that an epileptic seizure was imminent. However, the artificial features then produced too many false positives, because part of the data was in conflict with the post-UEO data, as was explained earlier in this chapter. Therefore, we decided to start labeling data as ictal class only after the UEO; after that, good artificial features were obtained. We also learned, however, that the classification should not start at the UEO for all patients. For example, patient B has a characteristic pattern before the UEO, known as beta-buzz, in which amplitude collapses and frequency increases considerably. This pattern was precisely the one that made the detector fire, allowing early detection of the seizures.

In general, the length of the decision integration window, the classification labeling, and the feature complexity provided by the GP algorithm offer much flexibility for designing good artificial features. First, the GP algorithm explores the search space looking for the "optimal" artificial features, based on the data and the classification labeling. Later, misclassified points can be corrected by the decision integration window, whose length parameter controls the tradeoff between false positives and negatives. This architecture allows us to design detectors that pay attention to the EEG patterns and specifics of each patient.

Unlike in the prediction case, where the GPAF algorithm far outperformed the benchmark classifiers, in this chapter we cannot claim that the GPAF algorithm outperformed the curve length feature; rather, the GPAF algorithm did better for some patients and the curve length feature did better for others. However, the GPAF algorithm has a desirable property: it provides room for improvement. By providing more data, adding a term to the fitness function that better accounts for detection delay, and using high-end computers to enable use of more data and more thorough search, there is a high probability that the GPAF algorithm will find artificial features that respond to the characteristics of a particular patient better than one universal feature does. Additionally, we can increase the number of artificial features for additional discriminating power. Finally, this work has shown that the GPAF algorithm is not only able to create artificial features with no known physical meaning, but is also able to find some features that are known to be good—specifically, this was the case for patient D and for patient G with the delay set to one. Although not the central object of this study, these results confirm the claims by Esteller et al. [25] that the line length feature is a good and efficient feature for detection of epileptic seizure onsets.

5.12 Summary

In this chapter, we presented the GPAF algorithm for the seizure detection problem, demonstrating its use in several experiments. These experiments yielded detectors with low false positive rates per hour and low false negative rates (in total, 88 of 92 seizures were detected). In addition to customized equations that were compared with the curve length feature and had similar or better results than that benchmark, we also presented a generic detector that showed good performance by detecting 90 seizures (clinical and subclinical, including the seizures discovered during the experiments) out of 93, and by detecting an extra seizure that was not found by either the other GPAF-generated detectors or the benchmark.
Artificial features were also designed using a terminal set with delay τ set to 1. In the first such experiment, using EEG data only from patient G, the GPAF algorithm found an expression that resembled the curve length feature, the benchmark feature. In a second experiment, this time to find a generic detector, the GPAF algorithm came up with a set of simple equations that yielded a low false positive rate and detected 88 of 92 seizures (counting even those that were discovered in these experiments). Finally, an experiment was run with the GPAF algorithm using a leave-one-out method on an expanded dataset as the training process. Although this method is of limited practical value because of its computational intensiveness, this experiment showed that a detector with low variability can be found by such a process.

Chapter 6

Conclusions and Recommendations

In this dissertation, we proposed the genetic programming artificial features (GPAF) algorithm, intended to overcome the reported limitations of the genetically found, neurally computed artificial features (GFNCAF) algorithm. The GPAF algorithm was first tested on the problem of designing an artificial feature that discriminates between parallel and nonparallel vectors, and the results were compared with those of the GFNCAF algorithm. Later, we used the GPAF algorithm to design artificial features for epileptic seizure prediction directly from the raw intracranial EEG data via state-space reconstruction for seven patients. In the last chapter, the methods were applied to machine design of epileptic seizure detectors.

6.1 Benefits of the Algorithm

The GPAF algorithm, along with the state-space reconstruction technique, allows researchers and medical personnel to design a set of features that take into account the characteristics or patterns "hidden" in the intracranial EEG signals of a given patient in order to predict or detect epileptic seizures. Similar to an optometric test that prepares the prescription for a patient's eyeglasses, a set of features optimized for the needs of a patient could someday be programmed into a chip to be implanted in the patient by a surgical procedure, providing warnings or automatically administering medication or electrical stimulation.
On the other hand, the GPAF algorithm could also be applied in several other fields of study, such as signal analysis, optimization, intelligent control, pattern recognition, and biomedical engineering. The computational framework described in this dissertation also represents an advance in the development of new analyses and practical techniques to predict epileptic seizures, while paving the way for application to other important problems that remain unsolved, such as prediction of earthquakes or cerebral infarcts.

The GPAF algorithm offers a modest contribution to advancing the area of pattern recognition. Previous work, in which preprocessing or transformation of the input data to the classifier has been recognized as enabling better classification, has relied upon scaling the input features (that is, multiplying the original features by coefficients) or selecting the most informative features (that is, turning the original features on or off), usually using a genetic algorithm. In this work, we have extended the meaning of feature extraction by allowing the GP algorithm to create artificial features, based on case-specific but algorithmically determined terminal and function sets that allow enough flexibility to make either simple or complex transformations. This makes the task of the classifier easier.

Additionally, in this work we dealt with a dynamic system: the brain. Using state-space reconstruction allows us to work directly in the time domain—avoiding the need for a transformation to another domain, such as frequency—while trying to capture the deterministic dynamics underlying what appears to be random behavior of the EEG signals. Although it does not make sense to apply state-space reconstruction to a dynamic system in which the state variables are already well known, the GPAF algorithm could still be used for tasks that include diagnosis, estimation, and detection, by highlighting patterns that can be confused with other normal behavior or are submerged in undesired anomalies like noise.

6.2 Conclusions

In this dissertation, we have proposed an algorithm aimed at capturing a very broad set of artificial feature characteristics, intended to overcome the limitations of the traditional features that have been used in the past for prediction and detection of epileptic seizures, and to overcome the limitations of the feature-creator algorithm GFNCAF. The genetic programming artificial features algorithm extracts relevant information from the EEG data. Looking at the raw EEG data—in this case, through state-space reconstruction—to search for patterns was valuable, endowing the algorithm with the ability to pay attention to the dynamics of each individual patient as a system. At the same time, with the inclusion of a genetic programming (GP) module as the search engine for the artificial features, the GPAF algorithm overcame the limitations of the GFNCAF algorithm by having vastly expanded control over the structure (connections, functions, and inputs) of the features.

Experiments showed that predictive patterns vary from patient to patient, given that the artificial features designed differed on a per-patient basis in structure as well as in the number of features needed for good prediction. It was thus also shown that, although prediction of epileptic seizures can be difficult, seizures can be predicted with a reasonable time margin before onset, potentially allowing intervention with a medical treatment.
A generic (cross-patient) detector was designed that could easily be programmed into an implantable chip. With the GPAF algorithm, we are automating much of the process involved in designing customized predictors and detectors. Once the patient is instrumented for EEG recording, the EEG data can be fed to the GPAF algorithm, which can design the predictor or detector (artificial features). Later, these equations could be programmed into an implantable chip, which could then intelligently trigger various forms of intervention. Finally, this work has shown that the GPAF algorithm is not only able to create artificial features without needing to ascribe physical meaning to the signals, but is also able to find some man-made features that are known to be good features.

6.3 Limitations and Recommendations

Although this work was intended to overcome the shortcomings of the GFNCAF algorithm, we still found some limitations, which we summarize below along with some recommendations, giving the GPAF algorithm more room for improvement.

• Increasing computing performance: Although we ran the experiments on a 3 GHz Pentium IV PC, that was not enough to run the GPAF algorithm for an acceptable number of generations. According to Banzhaf et al., GP population sizes typically should range between 500 and 5000 individuals; they also mention that for good search on challenging problems, the population size should be even larger. Our population sizes ranged between 500 and 1000 individuals, and the number of generations was no larger than 200. It is well known that the GP algorithm is computationally intensive, consuming a large amount of memory and CPU time. Because of this, there were also limitations on the amount of EEG data that could be used for training in each run, with consequent limitations on the GP algorithm's ability to look for a better set of features. To increase the available memory and CPU time considerably, parallel computing or high-performance computers (supercomputers) could be considered.

• Extend the GPAF algorithm's capability: In this work, the GPAF algorithm was limited to designing or finding the arguments (GP program) of a summation, and could not produce a set of artificial features containing embedded summations or other vector-handling operators, because the GP algorithm used was single-typed (scalar-typed). For instance, the GPAF algorithm presented in this work cannot produce artificial features that include quotients of summations, because the GP algorithm does not handle vectors inside the programs. This shortcoming limited the flexibility of the GP algorithm and thus the complexity of the artificial features that it could explore or design. To allow the GPAF algorithm to produce more complex features, a strongly typed genetic programming system could be implemented as the search engine for artificial features—i.e., a GP algorithm that contains mechanisms to prevent policy violations (enforce well-formed formulas) when mixing different types of data, such as vectors and scalars. With this, the GPAF algorithm would have much more flexibility to create both the simplest and very complicated features, probably yielding more accurate results.

• Use all EEG channels: In this work, the intracranial EEG data available came from one channel per patient. Similar to the approach used by D'Alessandro et al., we could provide multiple channels to the GPAF algorithm and let it decide which channels are more "informative" for extracting patterns in order to predict (or detect) the epileptic seizures.

• Support vector machine as classifier: An SVM could be implemented as the classifier component of the GPAF algorithm. An SVM processes the data and projects them into a high-dimensional space by means of a nonlinear function. This classifier might yield a better separation of the classes, increasing the capability of the GPAF algorithm.

Appendix I

This appendix shows the point-basis results for all the available epochs, evaluated with the benchmark classifiers, as explained in Section 4.9, and with the GPAF algorithm, for each patient. Lowercase letters denote baseline epochs, whereas capital letters denote preictal epochs.

Patient A

Point-basis classification results for patient A.
Appendix I

This appendix shows the point-basis results for all of the available epochs, evaluated with the benchmark classifiers (as explained in Section 4.9) and with the GPAF algorithm, for each patient. Lowercase letters denote baseline epochs, whereas capital letters denote preictal epochs. A brief sketch illustrating how such point-basis scores can be computed appears after the tables.

Patient A

Point-basis classification results for patient A.

Epoch     PCA+k-NN   k-NN     GPAF
b         50.5       53.51    61.54
c         46.82      52.51    57.86
d         45.82      53.85    57.19
B         55.18      47.49    80.33
C         49.83      45.15    69.23
D         54.52      42.14    73.91
Overall   50.45      49.11    66.68

Patient B

Point-basis classification results for patient B.

Epoch     PCA+k-NN   k-NN     GPAF
b         63.88      57.86    82.61
c         67.22      72.91    89.63
d         67.22      67.89    86.96
e         63.88      53.85    88.29
f         67.56      62.21    81.61
g         66.56      51.51    77.26
B         59.2       85.62    85.62
C         40.8       57.19    66.56
D         67.89      87.63    94.65
E         34.78      54.85    53.17
Overall   59.9       65.15    80.64

Patient C

Point-basis classification results for patient C.

Epoch     PCA+k-NN   k-NN     GPAF
b         87.63      99.67    100
c         50.17      79.59    100
d         84.62      98.99    40.8
e         87.96      100      70.9
f         76.92      95.65    99.0
g         86.62      100      100
h         87.96      99.33    100
i         88.63      100      87.63
j         72.58      91.97    100
k         90.3       100      97.32
A         80.93      89.63    100
B         69.57      83.27    99.66
C         51.17      76.92    87.63
E         63.88      81.94    81.61
F         75.25      87.29    99.0
G         55.85      77.59    97.99
H         30.1       15.38    96.99
I         75.59      89.29    99.0
J         67.22      85.28    96.66
K         70.57      89.29    100
Overall   72.68      87.27    92.71

Patient D

Point-basis classification results for patient D.

Epoch     PCA+k-NN   k-NN     GPAF
a         77.93      75.25    96.66
b         78.26      68.9     96.33
d         52.17      47.83    47.16
f         38.46      71.24    96.65
g         46.15      64.88    91.30
h         43.14      64.88    90.97
i         41.81      58.53    45.82
j         42.81      35.12    20.74
k         44.48      43.14    51.84
l         40.47      63.88    88.63
B         10.70      12.37    0.33
C         53.85      50.50    74.92
D         56.86      34.45    8.03
E         63.21      44.15    53.18
F         57.19      55.18    86.29
G         57.86      60.20    77.93
H         61.54      51.17    86.96
I         31.77      34.78    17.07
J         58.19      47.16    86.29
K         59.87      46.15    84.95
Overall   50.86      51.49    65.10

Patient E

Point-basis classification results for patient E.

Epoch     PCA+k-NN   k-NN     GPAF
a         31.88      42.28    26.85
b         56.71      77.52    72.82
c         82.55      99.66    100.0
d         86.58      100.0    100.0
e         48.32      64.43    70.47
f         89.93      97.31    100.0
h         87.85      82.21    100.0
i         85.9       100.0    100.0
j         46.64      73.15    57.05
k         80.87      74.83    100.0
l         85.23      100.0    100.0
m         86.58      95.97    100.0
n         91.95      100.0    100.0
o         58.39      71.81    53.69
p         59.06      80.87    74.5
A         62.75      41.61    79.53
B         70.13      68.12    93.29
C         57.05      48.99    48.99
D         74.16      79.53    92.95
E         78.19      75.5     94.63
F         74.83      78.52    92.28
G         76.51      74.16    94.3
H         79.53      74.16    94.97
I         36.58      28.86    17.11
J         79.87      79.87    97.32
K         76.85      78.86    97.32
L         79.19      79.19    98.32
M         43.62      36.91    49.66
N         44.3       31.21    78.52
O         20.81      4.03     9.4
Overall   67.79      71.28    79.8

Patient F

Point-basis classification results for patient F.

Epoch     PCA+k-NN   k-NN     GPAF
b         74.24      94.63    83.56
c         75.59      94.97    86.58
d         71.57      90.27    66.56
e         88.63      100.0    100.0
f         85.28      100.0    100.0
g         83.95      100.0    100.0
h         83.28      100.0    100.0
B         72.57      89.63    97.99
C         76.58      88.29    99.33
D         78.93      92.98    100.0
E         80.2       89.3     100.0
F         67.89      64.21    99.33
Overall   78.2       92.02    94.45

Patient G

Point-basis classification results for patient G.

Epoch     PCA+k-NN   k-NN     GPAF
b         61.54      80.6     60.2
c         48.49      55.52    9.03
d         50.84      60.54    26.76
e         52.84      69.23    71.57
f         54.18      67.22    59.2
g         61.54      80.6     79.26
i         41.14      11.37    94.65
j         37.46      13.38    96.66
B         42.48      59.53    29.77
C         48.91      66.89    79.6
D         42.81      63.21    90.97
E         48.5       61.54    84.28
F         45.15      67.56    51.17
G         50.84      68.23    90.64
H         46.82      66.56    89.97
I         38.46      19.4     51.51
Overall   48.25      56.94    66.58
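For orientation, the following is a minimal sketch of how a PCA + k-NN benchmark column of these tables could be computed, under our assumption that a point-basis score is simply the percentage of individual feature vectors in an epoch that are classified correctly. All arrays and parameter values are synthetic placeholders, not the data or settings of this work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Placeholder data: delay-embedded EEG vectors, 0 = baseline, 1 = preictal.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 10))
y_train = rng.integers(0, 2, size=500)
X_epoch = rng.normal(size=(300, 10))    # one evaluation epoch
y_epoch = rng.integers(0, 2, size=300)

# Benchmark: project onto leading principal components, then k-NN (k = 3).
bench = make_pipeline(PCA(n_components=2), KNeighborsClassifier(n_neighbors=3))
bench.fit(X_train, y_train)

# Point-basis score for the epoch: percent of vectors classified correctly.
point_basis = 100.0 * bench.score(X_epoch, y_epoch)
print(f"point-basis accuracy: {point_basis:.2f}%")
```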
Appendix II

Glossary

ANN: Artificial neural network — a network of interconnected units, called neurons, which activate or fire an output signal depending on the input signal. The network is trained using a training algorithm (e.g., the backpropagation algorithm) and training data. Neural networks are also known as universal approximators, given that they are able to approximate any function if sufficient nodes and training data are provided.

EA: Evolutionary algorithm — a family of computer algorithms inspired by natural selection. In addition to GP, these algorithms include the genetic algorithm (GA), evolutionary programming (EP), and evolution strategies (ES). These algorithms are population-based and use one (or all) of the so-called genetic operators: selection, crossover, and mutation.

Epilepsy: (ĕp′ə-lĕp′sē) — a neurological condition that makes people susceptible to seizures. A seizure is a term used to describe a change in sensation, awareness, or behavior brought about by a brief electrical disturbance in the brain. Seizures vary from a momentary disruption of the senses, to short periods of unconsciousness or staring spells, to convulsions.

GA: Genetic algorithm — a general-purpose, global-optimization search algorithm that emulates the natural selection process by using the genetic operators: selection, crossover, and mutation.

GFNCAF: Genetically found, neurally computed artificial feature — an algorithm proposed in [31] that uses a GA and a classifier to create relevant artificial features from a given set of conventional features or directly from raw data. The algorithm may also reduce the dimensionality of the feature space.

GP: Genetic programming — an extension of the GA; genetic programming uses the genetic operators to evolve programs (solvers), unlike the GA, which evolves solutions to given problems, among them real-world applications. One of the most important aspects of GP is its variable-length representation, which allows it to find simple as well as complex programs.

GPAF: Genetic programming artificial feature — an algorithm that uses GP in combination with a classifier to construct artificial features from a set of classical features or from raw data. This approach overcomes some of the limitations of the GFNCAF algorithm.

Ictal: (ĭk′təl) — the period or signal in which a seizure occurs.

k-NN: k-nearest neighbor classifier — a supervised, nonparametric classifier that uses a distance measure, commonly the Euclidean distance, to categorize an input pattern according to the majority class among the training samples closest to it (its nearest neighbors). Typically, k is assigned to be three or five.

PFA: Probability of false alarm — the probability that the next negative-class feature vector sample (which for practical purposes is approximately all samples) will be a false positive. A sketch of the empirical estimate appears after this glossary.

PFN: Probability of false negative — the probability that the next positive-class feature vector sample will be a false negative.

Preictal: (prē-ĭk′təl) — refers to an EEG signal from a time period just prior to an epileptic seizure.

UCO: Unequivocal clinical onset — the time when an observer can see the consequences of a seizure in the person, for example, a convulsion.

UEO: Unequivocal EEG onset — the time of seizure onset identified in an EEG signal by an electroencephalographer.
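The PFA and PFN quantities above have simple empirical counterparts. The following is a minimal sketch, assuming `y_true` and `y_pred` are 0/1 label arrays (illustrative names): PFA is estimated as the fraction of negative-class samples flagged positive, and PFN as the fraction of positive-class samples flagged negative.

```python
import numpy as np

def pfa_pfn(y_true, y_pred):
    """Empirical estimates of PFA and PFN as defined in the glossary.

    PFA: fraction of negative-class samples flagged positive.
    PFN: fraction of positive-class samples flagged negative.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    neg = y_true == 0
    pos = y_true == 1
    pfa = float(np.mean(y_pred[neg] == 1)) if neg.any() else 0.0
    pfn = float(np.mean(y_pred[pos] == 0)) if pos.any() else 0.0
    return pfa, pfn

# Illustrative usage with placeholder labels.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1])
print(pfa_pfn(y_true, y_pred))   # (0.25, 0.25)
```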
BIBLIOGRAPHY

[1] "Device to cut seizures gets F.D.A.'s nod," Dow Jones, Washington, July 16, 1997.

[2] A. A. Adewuya, New Methods in Genetic Search with Real-Valued Chromosomes, Master's Thesis, Massachusetts Institute of Technology, Cambridge, MA, 1996.

[3] W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone, Genetic Programming: An Introduction. San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1998.

[4] C. Bigan, "A system for neural networks detection and automatic identification of EEG epileptic events," IEE Colloquium on Intelligent Methods in Healthcare and Medical Applications, no. 1998/514, pp. 13/1-13/4, 1998.

[5] A. Block and R. Fisher, "Can patients perform volitional motor acts at the start of a seizure?," Journal of Clinical Neurophysiology, vol. 16, no. 2, pp. 141-145, Mar. 1999.

[6] V. E. Bondarenko, "Epilepsy-like phenomena in chaotic neural networks," IEEE International Conference on Neural Networks 1996, vol. 2, pp. 774-777, Nov. 8-11, 1996.

[7] D. A. Campbell and L. W. Cahill, "Transient discrimination in nonlinear time series using linear-phase time-delay neural networks," IEEE International Symposium on Circuits and Systems 1996, vol. 2, pp. 524-527, May 12-15, 1996.

[8] E. Chang, R. Lippmann, and D. Tong, "Using genetic algorithms to select and create features for pattern classification," Proceedings IEEE International Joint Conference on Neural Networks, vol. 3, pp. 747-752, 1990.

[9] H. Chen, S. Zhong, and D. Yao, "Detection singularity value of character wave in epileptic EEG by wavelet," IEEE International Conference on Communications, Circuits and Systems and West Sino Expositions 2002, vol. 2, pp. 1094-1097, Jun. 29-Jul. 1, 2002.

[10] V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. New York, NY: Wiley, 1998.

[11] L. O. Chua and L. Yang, "Cellular neural networks: Theory," IEEE Transactions on Circuits and Systems-I, vol. 35, no. 10, pp. 1257-1272, Oct. 1988.

[12] L. O. Chua and L. Yang, "Cellular neural networks: Applications," IEEE Transactions on Circuits and Systems-I, vol. 35, no. 10, pp. 1273-1290, Oct. 1988.

[13] W. J. Conover, Practical Nonparametric Statistics, 3rd ed. New York, NY: John Wiley & Sons, Inc., 1998.

[14] M. D'Alessandro, The Utility of Intracranial EEG Feature and Channel Synergy for Evaluating the Spatial and Temporal Behavior of Seizure Precursors, Ph.D. Dissertation, Georgia Institute of Technology, Atlanta, 2001.

[15] M. D'Alessandro, R. Esteller, G. Vachtsevanos, A. Hinson, J. Echauz, and B. Litt, "Epileptic seizures prediction using hybrid feature selection over multiple intracranial EEG electrode contacts: a report of four patients," IEEE Transactions on Biomedical Engineering, vol. 50, no. 5, pp. 603-615, May 2003.

[16] M. D'Alessandro, G. Vachtsevanos, R. Esteller, J. Echauz, D. Sewell, and B. Litt, "A systematic approach to seizure prediction using genetic algorithm and classifier based feature selection," 14th International Conference on Digital Signal Processing 2002, vol. 2, pp. 603-606, July 1-3, 2002.

[17] M. D'Alessandro, G. Vachtsevanos, A. Hinson, R. Esteller, J. Echauz, and B. Litt, "A genetic approach to selecting the optimal feature for epileptic seizure prediction," Proceedings of the 23rd Annual EMBS International Conference 2001, pp. 1703-1706, Oct. 25-28, 2001.

[18] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York, NY: John Wiley & Sons, Inc., 2001.

[19] R. C. Eberhart, R. W. Dobbins, and W. R. S. Webber, "Neural network design considerations for EEG spike detection," Proceedings of the 1989 15th Annual Northeast Bioengineering Conference, pp. 97-98, 1989.

[20] J. Echauz, Wavelet Neural Networks for EEG Modeling and Classification, Ph.D. Dissertation, Georgia Institute of Technology, Aug. 1995.

[21] J. Echauz, S. Kim, V. Ramani, and G. Vachtsevanos, "Neuro-fuzzy approaches to decision making: A comparative study with an application to check authorization," Journal of Intelligent and Fuzzy Systems, no. 6, pp. 259-278, June 1998.

[22] Epilepsy Foundation of America, Landover, MD, http://www.efa.org.

[23] Epilepsy Foundation of America, Landover, MD, Epilepsy Facts & Figures, http://www.efa.org/education/facts.html, 1999.
[24] L. J. Eshelman and J. D. Schaffer, "Real-coded genetic algorithms and interval-schemata," Foundations of Genetic Algorithms 2. San Mateo, CA: Morgan Kaufmann, pp. 187-202, 1993.

[25] R. Esteller, J. Echauz, T. Tcheng, B. Litt, and B. Pless, "Line length: An efficient feature for seizure onset detection," Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2001, pp. 1707-1710, Oct. 25-28, 2001.

[26] R. Esteller, G. Vachtsevanos, J. Echauz, M. D'Alessandro, C. Bowen, R. Shor, and B. Litt, "Fractal dimension detects seizure onset in mesial temporal lobe epilepsy," Proceedings of the First Joint BMES/EMBS Conference: Serving Humanity, Advancing Technology, Atlanta, GA, p. 442, Oct. 13-16, 1999.

[27] R. Esteller, G. Vachtsevanos, J. Echauz, T. Henry, P. Pennell, C. Epstein, R. Bakay, C. Bowen, and B. Litt, "Fractal dimension characterizes seizure onset in epileptic patients," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing 1999, vol. 4, pp. 2343-2346, 1999.

[28] K. J. Falconer, Fractal Geometry: Mathematical Foundations and Applications. John Wiley & Sons, 2003.

[29] P. J. Flegg and A. Kirton, "Seizure-alerting and -response behaviors in dogs living with epileptic children," Neurology, vol. 64, p. 581, Feb. 2005.

[30] H. Firpi, Genetically Found, Neurally Computed Artificial Features with Applications to Epileptic Seizure Detection and Prediction, Master's Thesis, University of Puerto Rico-Mayagüez, Mayagüez, PR, 2001.

[31] H. Firpi and J. Echauz, "Genetically found, neurally computed artificial features from relevant and irrelevant data," Group Technology/Cellular Manufacturing World Symposium 2000, San Juan, PR, pp. 327-331, Mar. 27-29, 2000.

[32] H. Firpi and E. Goodman, "Swarmed feature selection," Proceedings of Emerging Technologies and Applications for Imagery Pattern Recognition (AIPR 2004), Washington, DC, pp. 112-118, 2004.

[33] D. B. Fogel, Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. New York, NY: IEEE Press, 1995.

[34] D. B. Fogel, Evolutionary Computation: The Fossil Record. New York, NY: IEEE Press, pp. 301-309, 1998.

[35] L. J. Fogel, Intelligence through Simulated Evolution: Forty Years of Evolutionary Programming. New York, NY: John Wiley & Sons, Inc., 1999.

[36] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York, NY: Academic Press, 1972.

[37] R. Gilmore and M. Lefranc, The Topology of Chaos: Alice in Stretch and Squeezeland. John Wiley & Sons, Inc., 2002.

[38] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[39] J. C. Goswami and A. K. Chan, Fundamentals of Wavelets: Theory, Algorithms, and Applications. John Wiley & Sons, Inc., 1999.

[40] R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms. New York, NY: John Wiley & Sons, Inc., 1998.

[41] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, Inc., 1999.

[42] R. Harikumar and B. Sabarish Narayanan, "Fuzzy techniques for classification of epilepsy risk level from EEG signals," Conference on Convergent Technologies for Asia-Pacific Region, vol. 1, pp. 209-213, Oct. 15-17, 2003.

[43] J. H. Holland, "Genetic algorithms," Scientific American, vol. 267, pp. 66-72, July 1992.

[44] J. H. Holland, "Genetic algorithms and the optimal allocation of trials," SIAM Journal on Computing, vol. 2, no. 2, pp. 88-105.

[45] L. Iasemidis, D.-S. Shiau, W. Chaovalitwongse, J. C. Sackellares, P. M. Pardalos, J. C. Principe, P. R. Carney, A. P. Prasad, B. Veeramani, and K. Tsakalis, "Adaptive epileptic seizure prediction system," IEEE Transactions on Biomedical Engineering, vol. 50, no. 5, May 2003.
[46] L. Iasemidis, R. Savit, and J. Sackellares, "The use of dynamical analysis of EEG frequency content in seizure prediction," Proceedings American Electroencephalographic Society Annual Conference, New Orleans, LA (abstract), 1993.

[47] B. H. Jansen, "Quantitative analysis of electroencephalograms: Is there chaos in the future?," International Journal of Biomedical Computing, vol. 27, pp. 95-123, 1991.

[48] K. Jerger et al., "Early seizure detection," Journal of Clinical Neurophysiology, vol. 18, no. 3, pp. 259-268, 2001.

[49] R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 5th ed. Prentice Hall, 2002.

[50] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, vol. 2. Upper Saddle River, NJ: Prentice Hall, 1998.

[51] J. M. Keller, R. Gray, and J. A. Givens Jr., "A fuzzy k-nearest neighbor algorithm," IEEE Transactions on Systems, Man and Cybernetics, vol. 15, no. 4, pp. 258-263, 1985.

[52] D. H. Kil and F. B. Shin, Pattern Recognition and Prediction with Applications to Signal Characterization. Woodbury, NY: AIP Press, 1996.

[53] M. E. Kirlangic, D. Perez, S. Kudryavtseva, G. Griessbach, G. Henning, and G. Ivanova, "Fractal dimension as a feature for adaptive electroencephalogram segmentation in epilepsy," Proceedings of the 23rd Annual EMBS International Conference 2001, pp. 1573-1576, Oct. 25-28, 2001.

[54] A. Kirton, E. Wirrell, J. Zhang, and L. Hamiwka, "Seizure-alerting and -response behaviors in dogs living with epileptic children," Neurology, vol. 62, pp. 2303-2305, Jun. 2004.

[55] M. Kotani, M. Nakai, and K. Akazawa, "Feature extraction using evolutionary computation," Proceedings of the Congress on Evolutionary Computation 1999, pp. 1230-1236, 1999.

[56] J. R. Koza, "Hierarchical genetic algorithms operating on populations of computer programs," Proceedings of the 11th International Joint Conference on Artificial Intelligence, vol. 1, pp. 768-774, 1989.

[57] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press, 1992.

[58] J. R. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs. Cambridge, MA: MIT Press, 1994.

[59] S. A. Lee, D. D. Spencer, and S. S. Spencer, "Intracranial EEG seizure-onset patterns in neocortical epilepsy," Epilepsia, vol. 41, no. 3, pp. 297-307, 2000.

[60] B. Litt, R. Esteller, J. Echauz, M. D'Alessandro, R. Shor, T. Henry, P. Pennell, C. Epstein, R. Bakay, M. Dichter, and G. Vachtsevanos, "Epileptic seizures may begin hours in advance of clinical seizures: A report of four patients," Neuron, vol. 29, no. 4, 2001.

[61] S. Luke and L. Panait, "Lexicographic parsimony pressure," Proceedings of the Genetic and Evolutionary Computation Conference 2002, San Francisco, CA: Morgan Kaufmann, pp. 829-836, 2002.

[62] T. McConaghy, H. Leung, and V. Varadan, "Functional reconstruction of dynamical systems from time series using genetic programming," 26th Annual Conference of the IEEE Industrial Electronics Society 2000, vol. 3, pp. 2031-2034, 2000.

[63] S. Mehta, R. W. Koser, and P. Venziale, "Wavelet analysis as a potential tool for seizure detection," Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis 1994, pp. 584-587, 1994.

[64] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed. New York, NY: Springer-Verlag, 1994.
[65] E. Micheli-Tzanakou, Supervised and Unsupervised Pattern Recognition: Feature Extraction and Computational Intelligence. New York, NY: CRC Press, 2000.

[66] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, 1998.

[67] M. M. Mirbagheri, K. Badie, R. M. Hashemi Golpayegani, and M. Amir Ahmadi, "A neural network approach to EEG classification for the purpose of differential diagnosis between epilepsy and normal EEG states," Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 14, pp. 2649-2650, Oct. 29-Nov. 1, 1992.

[68] A. H. Nayfeh and B. Balachandran, Applied Nonlinear Dynamics: Analytical, Computational, and Experimental Methods. John Wiley & Sons, Inc., 1994.

[69] J. J. Niederhauser, "Simple electric model for epilepsy (SEMEY)," IEEE Potentials, vol. 21, no. 1, pp. 35-39, Feb.-Mar. 2002.

[70] C. Niederhoefer, F. Gollas, A. Chernihovskyi, K. Lehnertz, and R. Tetzlaff, "Detection of seizure precursors in the EEG with cellular neural networks," Epilepsia, vol. 45, suppl. 7, p. 245 (abstract), 2004.

[71] S. J. Orfanidis, Optimum Signal Processing: An Introduction, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1996.

[72] O. Ozdamar, C. N. Lopez, and I. Yaylali, "Detection of transient EEG patterns with adaptive unsupervised neural networks," IEEE International Biomedical Engineering Days, pp. 192-197, 1992.

[73] S. K. Pal, S. Bandyopadhyay, and C. A. Murphy, "Genetic algorithms for generation of class boundaries," IEEE Transactions on Systems, Man and Cybernetics, vol. 28, pp. 816-828, 1998.

[74] H. S. Park, Y. H. Lee, D. S. Lee, and S. I. Kim, "Detection of epileptiform activity using wavelet and neural network," Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 1194-1197, Oct. 30-Nov. 2, 1997.

[75] T. S. Parker and L. O. Chua, "Chaos: A tutorial for engineers," Proceedings of the IEEE, pp. 982-1008, Aug. 1987.

[76] E. A. Patrick and F. P. Fischer III, "A generalized k-nearest neighbor rule," Information and Control, vol. 16, no. 2, pp. 128-152, 1970.

[77] N. Pradhan and P. K. Sadasivan, "Validity of dimensional complexity measurements of EEG signals," International Journal of Bifurcation and Chaos, vol. 7, no. 1, pp. 173-186, 1997.

[78] W. S. Pritchard and D. W. Duke, "Measuring chaos in the brain: a tutorial review of nonlinear dynamical EEG analysis," International Journal of Neuroscience (preprint), Apr. 1992.

[79] W. Punch, E. Goodman, M. Pei, L. Chia-Shun, P. Hovland, and R. Enbody, "Further research on feature selection and classification using genetic algorithms," ICGA 1993, pp. 557-564.

[80] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart: Frommann-Holzboog, 1973.

[81] J. A. Rice, Mathematical Statistics and Data Analysis, 2nd ed. Duxbury Press, 1994.

[82] M. Risinger et al., "Ictal localization of temporal seizures with scalp-sphenoidal recordings," Neurology, vol. 39, pp. 1288-1293, 1989.

[83] Z. Rogowski, I. Gath, and E. Bental, "On the prediction of epileptic seizures," Biological Cybernetics, vol. 42, pp. 9-15, 1981.

[84] D. Ruelle, Chaotic Evolution and Strange Attractors: The Statistical Analysis of Time Series for Deterministic Nonlinear Systems. Cambridge: Cambridge University Press, 1989.

[85] H. P. Schwefel, Evolution and Optimum Seeking: The Sixth Generation. John Wiley & Sons, 1993.
[86] D. A. Scott and S. J. Schiff, "Predictability of EEG interictal spikes," Biophysical Journal, vol. 69, pp. 1748-1757, Nov. 1995.

[87] S. Silva and J. Almeida, "GPLab: A genetic programming toolbox for Matlab," Proceedings of the Nordic MATLAB Conference 2003, pp. 273-278, 2003.

[88] S. Silva and E. Costa, "Dynamic limits for bloat control: Variations on size and depth," Proceedings of GECCO 2004, Berlin, Germany: Springer-Verlag, pp. 666-677, 2004.

[89] J. Sherrah, R. Bogner, and A. Bouzerdoum, "Automatic selection of features for classification using genetic programming," Proceedings of the Australian and New Zealand Conference on Intelligent Information Systems 1996, pp. 284-287, 1996.

[90] R. Sowa, F. Mormann, A. Chernihovskyi, C. Niederhoefer, R. Tetzlaff, C. Elger, and K. Lehnertz, "Seizure prediction: Measuring EEG phase synchronization with cellular neural networks," Epilepsia, vol. 45, suppl. 7, p. 244 (abstract), 2004.

[91] J. C. Sprott, Chaos and Time-Series Analysis. New York, NY: Oxford University Press, 2003.

[92] M. D. Srinath, P. K. Rajasekaran, and R. Viswanathan, Introduction to Statistical Signal Processing with Applications. Englewood Cliffs, NJ: Prentice Hall, 1996.

[93] S. D. Stearns, "On selecting features for pattern classifiers," Third International Joint Conference on Pattern Recognition, Coronado, CA, 1976.

[94] L. Szilagyi, Z. Benyó, and S. M. Szilagyi, "A new method for epileptic waveform recognition using wavelet decomposition and artificial neural networks," Proceedings of the Second Joint EMBS/BMES Conference, Houston, TX, pp. 2025-2026, Oct. 23-26, 2002.

[95] F. Takens, "Detecting strange attractors in turbulence," Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics 898. Berlin, Germany: Springer-Verlag, pp. 336-381, 1981.

[96] R. Tetzlaff and D. Weiß, "Cellular neural networks for the anticipation of epileptic seizures," IEEE International Symposium on Circuits and Systems 2002, vol. 4, pp. IV-177-IV-180, May 26-29, 2002.

[97] R. Tetzlaff, C. Niederhofer, and P. Fischer, "Feature extraction in epilepsy using a cellular neural network based device — first results," Proceedings of the International Symposium on Circuits and Systems 2003, vol. 3, pp. III-850-III-853, May 25-28, 2003.

[98] J. Theiler, "On the evidence for low-dimensional chaos in an epileptic electroencephalogram," Physics Letters A, vol. 196, pp. 335-341, Sept. 1995.

[99] P. Wahlberg, A. Grennberg, and G. Salomonsson, "Feature extraction from multichannel EEG transients for clustering purposes," IEEE 17th Annual Conference Engineering in Medicine and Biology Society, vol. 2, pp. 915-916, Sept. 20-23, 1995.

[100] W. R. S. Webber, B. Litt, K. Wilson, and R. Lesser, "Practical detection of epileptiform discharges (EDs) in the EEG using an artificial neural network: A comparison of raw and parameterized EEG data," Electroencephalography and Clinical Neurophysiology, vol. 91, pp. 194-204, 1994.

[101] I. Yaylali, H. Kocak, and P. Jayakar, "Detection of seizures from small samples using nonlinear dynamic system theory," IEEE Transactions on Biomedical Engineering, vol. 43, no. 7, pp. 743-751, July 1996.

[102] J. Yen and R. Langari, Fuzzy Logic: Intelligence, Control, and Information. Upper Saddle River, NJ: Prentice Hall, 1999.

[103] J. Zhu, N. Hazarika, A. C. Tsoi, and A. A. Sergejew, "Classification of EEG signals using wavelet coefficients and an ANN," Workshop on Brain Electric and Magnetic Topography, Sydney, Australia, Feb. 1994.
[104] B. Zupan, M. Bohanec, J. Demsar, and I. Bratko, "Feature transformation by function decomposition," IEEE Intelligent Systems, pp. 38-43, March/April 1998.

Index

artificial feature
autocorrelation function
baseline signal
block-basis
computer-generated — See artificial feature
crossover
curve length
diffeomorphism
embedding dimension
embedding matrix
error risk
features
functions
generation
genetic algorithm
genetic programming
genetic programming artificial features
genetically found, neurally computed artificial features
ictal signal
k-nearest neighbor
line length — See curve length
mutation
nonseizure — See baseline signal
point-basis
preictal signal
preseizure — See preictal signal
seizure
seizure-free — See baseline signal
selection
terminals
time-delay embedding
unequivocal clinical onset
unequivocal electrographic onset