TRACKING SINGLE-UNITS IN CHRONIC NEURAL RECORDINGS FOR BRAIN MACHINE INTERFACE APPLICATIONS By Ahmed Ibrahim Eleryan A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Electrical Engineering - Master of Science 2013 ABSTRACT TRACKING SINGLE-UNITS IN CHRONIC NEURAL RECORDINGS FOR BRAIN MACHINE INTERFACE APPLICATIONS By Ahmed Ibrahim Eleryan Ensemble recording of multiple single-unit activity has been used to study the mechanisms of neural population coding over prolonged periods of time, and to perform reliable neural decoding in neuroprosthetic motor control applications. However, there are still many challenges towards achieving reliable stable single-units recordings. One primary challenge is the variability in spike waveform features and firing characteristics of single units recorded using chronically implanted microelectrodes, making it challenging to ascertain the identity of the recorded neurons across days. In this study, I present a fast and efficient algorithm that tracks multiple single-units recorded in non-human primates performing brain control of a robotic limb, based on features extracted from units’ average waveforms and interspike intervals histograms. The algorithm requires a relatively short recording duration to perform the analysis and can be applied at the start of each recording session without requiring the subject to be engaged in a behavioral task. The algorithm achieves a classification accuracy of up to 90% compared to manual tracking. I also explore using the algorithm to develop an automated technique for unit selection to perform reliable decoding of movement parameters from neural activity. TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Chapter 1 Introduction . . . . . . . . . . . . . . . . . 1.1 Background . . . . . . . . . . . . . . . . . . . . . . 1.2 Challenges Facing Brain Machine Interface Systems 1.3 Related Work . . . . . . . . . . . . . . . . . . . . . 1.4 BMI Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 9 10 11 Chapter 2 Single-Unit Tracking . . . . . . . . . . . . 2.1 Algorithm Overview . . . . . . . . . . . . . . . . . 2.2 Single-Unit Waveform Characteristics . . . . . . . . 2.2.1 Correlation . . . . . . . . . . . . . . . . . . 2.2.2 Normalized Peak-to-Peak Height Difference . 2.2.3 Normalized Peak-to-Peak Time Difference . 2.2.4 Peak Matching . . . . . . . . . . . . . . . . 2.3 Single-Unit Firing Characteristics . . . . . . . . . . 2.3.1 KullbackLeibler Divergence . . . . . . . . . 2.3.2 Bhattacharyya Distance . . . . . . . . . . . 2.3.3 KolmogorovSmirnov Statistic . . . . . . . . 2.3.4 Earth Mover’s Distance . . . . . . . . . . . . 2.4 Tracking Classifier . . . . . . . . . . . . . . . . . . 2.4.1 Support Machine Vectors (SVM) . . . . . . 2.4.2 Relevance Machine Vectors (RVM) . . . . . 2.4.3 Maximum A Posteriori Classification (MAP) 2.4.4 Naive Bayesian Classification (NB) . . . . . 2.5 Tracking Algorithm . . . . . . . . . . . . . . . . . . 2.5.1 Pre-processing . . . . . . . . . . . . . . . . . 2.5.2 Algorithm . . . . . . . . . . . . . . . . . . . 2.5.3 Post-processing . . . . . . . . . . . . . . . . 2.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . 
2.6.1 Data Acquisition and Behavioral Task . . . 2.6.2 Training Features . . . . . . . . . . . . . . . 2.6.3 Classifiers Evaluation . . . . . . . . . . . . . 2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 13 14 15 15 16 16 17 18 18 19 19 20 22 22 23 23 24 24 24 26 28 28 30 30 41 Chapter 3 Unit Selection Algorithm for Decoding . . . . . . . . . . . . . . . 3.1 Unit Selection Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 44 iii 3.2 3.3 3.4 3.5 3.6 3.7 Tracking Unit Stability . . . . . . . . . . . . . . . . . . . . . . Clusters of Functionally Connected Units . . . . . . . . . . . . 3.3.1 A Multi-Scale Approach . . . . . . . . . . . . . . . . . Balancing Units . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Measuring the functional connectivity between units . . 3.4.2 Balancing Algorithm . . . . . . . . . . . . . . . . . . . Unit Selection Algorithm . . . . . . . . . . . . . . . . . . . . . Unit Replacement Strategies . . . . . . . . . . . . . . . . . . . 3.6.1 Replacement Strategy 1: Unit Closest to Cluster . . . . 3.6.2 Replacement Strategy 2: Unit Closest to Dropped Unit Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7.1 Tracking Single-Units Stability . . . . . . . . . . . . . 3.7.2 Stability Analysis . . . . . . . . . . . . . . . . . . . . . 3.7.3 Unit Selection . . . . . . . . . . . . . . . . . . . . . . . 3.7.4 Replacement Strategies . . . . . . . . . . . . . . . . . . Chapter 4 . . . . . . . . . . . . . . . 44 46 46 47 48 48 51 53 53 54 56 56 60 61 62 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 APPENDICES . . . . . . . . . . . . . . . Appendix A: Peak Matching Appendix B: Functional Connectivity Algorithm BIBLIOGRAPHY . . . . . . . . . . . . . iv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 . . . . . . . . . . . . . . . . . 88 LIST OF TABLES Table 2.1: Waveform-derived distance measures . . . . . . . . . . . . . . . . . . 15 Table 2.2: ISIH-derived distance measures . . . . . . . . . . . . . . . . . . . . . 17 Table 2.3: Classifiers’ Performance - Classification Accuracy . . . . . . . . . . . 40 Table 2.4: Classifiers’ Performance - Percentage of Correct Profiles . . . . . . . 40 Table 2.5: Average execution time per channel when the RVM was used to track 15 datasets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Table 3.1: Classifiers’ Performance - Classification Accuracy . . . . . . . . . . . 56 Table 3.2: Classifiers’ Performance - Percentage of Correct Profiles . . . . . . . 
60 Table 3.3: Replacement Strategies Performance - average MSE between the reach velocity decoded using the original cluster and that decoded using the cluster with a replacement unit across all cluster units. . . . . . . . 69 Replacement Strategies Performance - the number of times the strategy provided the replacement unit that has the least MSE among the free units pool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Replacement Strategies Performance - average MSE between the reach velocity decoded using the original cluster and that decoded using the cluster with a replacement unit across all cluster units - Testing datasets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Table 3.4: Table 3.5: v LIST OF FIGURES Figure 1.1: Figure a shows a sample neural trace recorded on an electrode. Figure b displays the APs detected in the trace (surrounded by boxes). The APs are obtained after filtering the raw data with Butterworth filter with frequency range 300-5000 Hz. The figure is taken from [36]. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis. . . . . 2 Figure 1.2: A spike-sorted channel with two single-units. . . . . . . . . . . . . . 4 Figure 1.3: Recorded waveforms of a single-unit along with their average. The unit’s waveforms form a stochastic process with a number of random variables equals to the number of sample in each waveform snippet (in this case each waveform is made up of 48 samples). Each waveform is a realization of this stochastic process. . . . . . . . . . . . . . . . 6 The spike train for a single-unit is shown in Figure a (top) and the ISIH inferred from it is shown in Figure b (bottom). . . . . . . . . 8 Components of a typical motion-restoration BMI system: a recording device, a signal processing module to extract information from raw signals, a spike sorter to identify single-units, a decoder to map the neuronal input signal to movement parameters, and an actuator to perform the action. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 Tracking Algorithm: The tracking module takes in the spike sorted units and the profiles stored on the channel and calculates the distance between units and stored profiles. It then assigns each unit to either an existing profile or a new profile based on the distance computed. The output is the updated set of profiles for that channel. 14 Process of training a waveform-based classifier. Using spike sorted units recorded across several days, experts track the units manually and produce pairs of matching average waveforms (true positives). Using the same set of units, pairs of non-matching average waveforms are generated by comparing spike sorted units recorded simultaneously on the same channel. . . . . . . . . . . . . . . . . . . . . 21 Figure 1.4: Figure 1.5: Figure 2.1: Figure 2.2: vi Figure 2.3: Application of a waveform-based classifier. On any given day x, the spike sorted units of a given channel c are compared against the stored profiles on the same channel c. Pairwise distance vectors are calculated from features extracted from pairs of waveforms. The tracking classifier makes the decision of whether to add a unit to an existing profile or to create a new profile for it. . . . . . . . . . . . . . . . . 27 R Spike sorting on the Cerebus⃝ Online Sorter. Each single-unit is defined by a set of “hoops” where any waveform intersecting with all hoops of a given unit is added to that unit. . . . . 
. . . . . . . . . . 29 Probability distributions of the training true positives (in blue) and true negatives (in red) for each of the four wavefrom-based features. 31 Figure a shows an example of two average waveforms that have very similar peak shapes but different peak-to-peak height, while Figure b shows an example of two average waveforms that have very similar peak shapes but different peak-to-peak time. . . . . . . . . . . . . . 32 Probability distributions of the training true positives (in blue) and true negatives (in red) for each of the four ISIH-based features. . . 33 The figure shows an example of two instances of a single-unit on two successive days where the average waveform experienced some variability and the ISIH remained the same. . . . . . . . . . . . . . 34 Comparison of the classifiers performance using cross validation. Four different sets of features were used. Cross validation was performed using 6 partitions (each had 5 training datasets and 5 testing datasets). The errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2 and 4 (confirmed by t-test). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 Figure 2.10: Comparison of the classifier performance based on the average execution time per channel. Cross validation was performed using 6 partitions and 4 feature sets were defined. The errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2, and 4 (confirmed by t-test). . . . . . 37 Figure 2.11: RoC curves for different classifiers using 4 different feature sets. The top left figure is for set 1, the top right figure is for set 2, and so on. 38 Figure 2.4: Figure 2.5: Figure 2.6: Figure 2.7: Figure 2.8: Figure 2.9: vii Figure 2.12: Decision boundaries for the RVM classifier using 4 different feature sets For visualization, the RVM classifier was trained on the first two principal components of the training features. The top left figure is for set 1, the top right figure is for set 2, and so on. The black circles are the basis vectors of the RVM classifier. . . . . . . . . . . . . . . 39 Figure 2.13: Two of the profiles tracked by the RVM classifier across 15 datasets. The profiles were recorded on channels 4 and 90 respectively. . . . . 39 Figure 2.14: Average execution time per channel when feature set 4 and the RVM classifier were used. The classifier was run on 35 datasets spanning 88 days. Convergence in the execution time happens on day 34 of the analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 Figure 3.1: Figure 3.2: Figure 3.3: Classification of single-units. For a stability window of 7 days, stable units are those units that were identified as similar on a daily basis throughout the stability window. Partially stable units are those units that were identified throughout more that half the stability window. Unstable units are all the remaining units. . . . . . . . . . . . . . . 45 Unit selection algorithm: On each channel, spike sorted units are compared against stored profiles on the same channel. The tracking classifier determines whether to add a single-unit to an existing profile or to create a new profile for it. The updated units profiles are input to a stability filter that classifies units into stable, partially stable and unstable units. Functional connectivity matrix is calculated for stable units. 
The matrix is then input to a clustering algorithm that outputs a set of functionally connected clusters of single-units. However, these clusters are unbalanced in terms of the number of units. Therefore, the clusters pass through a balancing algorithm that maintains the functional connection relations between units while balancing the number of units across clusters. Each cluster is used to control a kinematic variable. . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 An example for an MST constructed from a graph containing 10 nodes. An MST can be constructed using either Prim’s or Kruskal’s algorithms. Both algorithms are greedy in the way they choose the MST edges. In both algorithms, edges are sorted ascendingly based on their weights and the lowest weight edges are added one by one as long as a loop is not formed and until all nodes are added to the tree. The boldface edges in the figure are the ones that were included in the MST. Total cost of the MST is 31. Taken from [50]. . . . . . . 55 viii Figure 3.4: Comparison of the classifiers performance using cross validation. Four different sets of features were used. Cross validation was performed using 6 partitions (each had 5 training datasets and 5 testing datasets). The errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2 and 4 (confirmed by t-test). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 Comparison of the classifier performance based on the average execution time per channel. Cross validation was performed using 6 partitions and 4 feature sets were defined. and the errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2, and 4 (confirmed by t-test). 58 Two of the profiles tracked by the RVM classifier across 15 datasets. The profiles were recorded on channels 4 and 90 respectively. . . . . 59 Average execution time per channel when feature set 4 and the RVM classifier were used. The classifier was run on 35 datasets spanning 88 days. Convergence in the execution time happens on day 34 of the analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 For a given day x: the black line shows total number of units recorded on day x, the blue and green lines indicate the number units that were recorded on day 1 and were also recorded or dropped respectively on day x, and the red line shows the number of new units recorded on day x relative to day 1. . . . . . . . . . . . . . . . . . . . . . . . . 61 Number of stable, partially stable, and unstable units on any given day. An increase in the number of units is reflected as an increase in the number of unstable units and a decrease in the number of units is reflected as a decrease in the number of stable units. . . . . . . . 62 Figure 3.10: The figure shows 4 sample profiles recorded on channels 30, 32, 49, and 50 respectively. The single-units represented by the profiles were stable throughout the duration of the analysis (5 months) and were recorded on all 55 datasets. Each subplot displays the mean of the average waveforms of the instances of each profile. The color range surrounding the plots repesents the standard deviation across all instances average waveforms. We can see that all 4 single-units experienced very little changes in their average waveforms throughout the duration of the analysis. . . . . . . . . . . . . . . . . . . . . . . . . 
63 Figure 3.5: Figure 3.6: Figure 3.7: Figure 3.8: Figure 3.9: ix Figure 3.11: The figure shows 4 sample profiles recorded on channels 23, 25, 29, and 42 respectively. The single-units represented by the profiles were stable throughout the duration of the analysis (5 months) and were recorded on all 55 datasets. Each subplot displays the mean of the average waveforms of the instances of each profile. The color range surrounding the plots repesents the standard deviation across all instances average waveforms. The single-units in this experienced bigger changes in their average waveforms throughout the duration of the analysis than those in Figure 3.10. The algorithm was still able to capture those changes and correctly classify the single-units instances. 64 Figure 3.12: The figure shows 2 sample profiles recorded on channels 44 and 48 respectively. The single-units represented by the profiles were unstable and they were only recorded on 18 and 15 datasets respectively (out of 55 datasets). We can see that the average waveforms of both units are unstable and there is much variability across days. . . . . 65 Figure 3.13: Histogram of the number of contiguous days a unit is likely to be recorded on until it drops. The large peaks on the left and right of the plot demonstrate that once a unit becomes stable, it is likely to remain stable for a long time. . . . . . . . . . . . . . . . . . . . . . 65 Figure 3.14: The first figure shows the correlation matrix obtained by the functional connectivity algorithm where spectral clustering was applied on the matrix to partition the units into 5 clusters. The numbers on the axes indicated the start of each cluster. It is clear that the clustering algorithm results in unbalanced clusters, this is especially true for cluster 3 that contains only 3 units while cluster 1 contains 30 units. The second figure shows that correlation matrix obtained by balancing the number of units across clusters while taking the functional connectivity between units into consideration. The strongly connected units in the initial clusters remained while the weakly connected ones were transferred to other clusters. . . . . . . . . . . . . 66 Figure 3.15: Mapping of the cluster units on the physical recording array. . . . . 67 Figure 3.16: A subset of the session recorded on day 1. One unit was dropped and each replacement strategy provided a replacement unit to be used in the offline decoding of the reach velocity instead of the dropped one. 70 x Figure 3.17: The figure summarizes the performance of the three replacement strategies when tested on 9 unseen datasets. The replacement unit in each strategy was selected on the first day and then it was used on the 9 subsequent days. The error bars are the standard deviation of the average MSE across days. . . . . . . . . . . . . . . . . . . . . . 71 Figure 3.18: Figure a displays the augmented correlation matrix of the cluster units on day 1. Figure b shows the MST formed from the correlation matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 Figure 1: The figure shows the steps of calculating the peak matching similarity of a signal x (blue) to another signal y (red). Both x and y have two peaks. Weights are given to the peaks of x, µ1 and µ2 . Closeness factors are calculated between pairwise peaks in both signals. In this case, there are 4 closeness factors to calculate (c11 , c12 , c21 , and c22 ). For each signal in x, we select the maximum closeness factor it has with any of the peaks in y. 
We end up with 2 factors, c1 and c2 , one for each peak in x. The total similarity of x to y is then the sum of the peak weights multiplied by the closeness factors for all peaks of x (i.e. µ1 c1 + µ2 c2 ). . . . . . . . . . . . . . . . . . . . . . . . . . . xi 85 Chapter 1 Introduction 1.1 Background Extracellular recording is the observation of the membrane potential of an ensemble of neurons using a voltage sensing device. Different types of signals that can be extracted from extracellular recording are described in [36]. Invasive methods have higher time and space resolutions than non-invasive methods. In this study, we are primarily interested in action potentials. An action potential (AP), or a spike, is the main communication method between neurons and it is a short event (< 1ms) where the neuron’s membrane potential rises and falls due to the inflow and outflow of different types of ions. These signals can be recorded invasively using a single electrode or an array of electrodes. Figure 1.1 shows the neural trace recorded on an electrode. The signal is band-filtered using a Butterworth filter with frequency range 300-5000 Hz in order to extract spikes [36]. Detection of spikes occurs when the filtered signal crosses a user-defined threshold (typically equals to three times the root mean square of the noise [52]) and snippets of the signal surrounding the spikes are extracted which we call “spike waveforms”. Spike waveforms are time series (since they are subsets of the recorded signal) and they are aligned using one of two sample points: the threshold-crossing point or the waveform minimum peak. The signal recorded in extracellular recordings is the aggregate contribution of the mem- 1 Figure 1.1: Figure a shows a sample neural trace recorded on an electrode. Figure b displays the APs detected in the trace (surrounded by boxes). The APs are obtained after filtering the raw data with Butterworth filter with frequency range 300-5000 Hz. The figure is taken from [36]. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis. 2 brane potential of all the neurons in the vicinity of the electrode (i.e. the spikes detected on one electrode might belong to more than one neuron). To be able to understand the brain encoding mechanisms and the response of individual neurons to different stimuli, we need to distinguish between the events of different neurons. Therefore, a spike sorting algorithm is needed to sort out the input spike waveforms into different clusters where each represents a single-unit (neuron) (Figure 1.2). There are many algorithms to perform spike sorting such as: using wavelets, principal components analysis, and k-means. The spike waveforms are characteristic for each single-unit and their shape depends on the type of neuron and its location relative to the recording electrode [7]. The spike waveforms from a single-unit form a stochastic process, (denote it as W), consisting of a number of random variables equals to the number of samples in each waveform snippet. W = [W(1) . . . W(m)]T (1.1) where m is the number of random variables in the waveforms stochastic process (= number of samples in the waveform snippets). Each random variable reflects the variability at a time instance (i.e. 
sample) along the spike waveform and it can be modelled as [36]: W(t) = S(t) + n(t) (1.2) where t ∈ [1, m] is the random variable index (or time instance), W(t) ∈ R is the recorded signal, S(t) ∈ R is the real membrane potential, and n(t) is the noise resulting from other neurons. n(t) can be assumed to be colored and Gaussian distributed with zero mean and 3 300 200 uV 100 0 −100 −200 −300 0 10 20 30 Time(sample#) 40 50 10 20 30 Time(sample#) 40 50 300 200 uV 100 0 −100 −200 −300 0 Figure 1.2: A spike-sorted channel with two single-units. 4 coveriance Σ (i.e. n N (0, Σ)) [36]. Although the Gaussian assumption may not be realistic, it is reasonable for sufficiently large observation intervals. Each recorded waveform belonging to a single-unit is a realization of the stochastic process forming its waveforms: Wi = [Wi (1) . . . Wi (m)]T (1.3) The average waveform for a single-unit with l recorded waveforms (i.e. realizations) is:  ¯ W= l ∑ Wi (1)/l · · · i=1 l ∑ T Wi (m)/l (1.4) i=1 Figure 1.3 displays an example of a single-unit waveforms stochastic process. The figure shows different realizations of the process and their average. The number of random variables in this process is 48. The waveforms were aligned based on the point of threshold crossing. A spike train is a sequence of zeros and ones at millisecond precision. A “one” means the occurrence of a spike at that time instance, and a zero means the absence of a spike. Another single-unit characteristic that can be inferred from its spike train is the “interspike interval histogram (ISIH)”, which is a popular feature to characterize the firing pattern of a neuron. An ISIH is a plot of the distribution of the times between a neuron’s spiking events. The standard ISIH fits an exponential distribution reflecting the underlying homogenuous poisson process that governs the firing of the neuron [10]. Each neuron has a refractory period after each spike where it is hard (but not impossible) for it to fire another spike. In the presence of a stimulus in the receptive field of a neuron, the firing characteristics of the neuron become time-varying and they can be modelled by an inhomogenuous poisson process. During that period, the shape of the neuron’s ISIH changes to reflect the changes 5 250 200 150 100 uV 50 0 −50 −100 −150 −200 −250 0 10 20 30 Time(sample#) 40 50 Figure 1.3: Recorded waveforms of a single-unit along with their average. The unit’s waveforms form a stochastic process with a number of random variables equals to the number of sample in each waveform snippet (in this case each waveform is made up of 48 samples). Each waveform is a realization of this stochastic process. 6 in the surrounding environment [40]. The ISIH can also be used to show whether the neuron phase-locks to a periodic input. Next, I show how a unit’s ISIH is computed. Let n be the number of desired bin in the histogram, ST be the set of the unit’s time stamps. Then the time interval between any two consecutive spikes is: ∆ti = STi+1 − STi (1.5) where i ∈ [1, ∥ST ∥ − 1] is the index of the spike. Let B be the set of bin boundaries in the ISIH such that: B0 = 0 (1.6) Bi = Bi−1 + step, ∀i ∈ [1, n] (1.7) where the bins step is defined as: step = max(∆t) − min(∆t) n (1.8) Finally, the value of the ith ISIH bin is H[i] = {∆ti : ∆ti ≥ Bi , ∆ti < Bi+1 }, ∀i ∈ [1, n − 1] (1.9) with the only exception: H[n] = {∆tn : ∆tn ≥ Bn } Figure 1.4 shows the spike train of a single-unit and the ISIH inferred from it. 
7 (1.10) Time(ms) 1 0.5 0 0 1 2 3 4 Spikes x 10 4 Frequency 6 5 4 x 10 4 2 0 25 100 175 250 325 400 Interspike Interval(ms) 475 Figure 1.4: The spike train for a single-unit is shown in Figure a (top) and the ISIH inferred from it is shown in Figure b (bottom). 8 1.2 Challenges Facing Brain Machine Interface Systems While chronically implanted multi-electrode silicon arrays have been very popular in recording neural signals, these recordings, however, frequently exhibit variability in spike waveform shapes and firing characteristics over days and months, making it hard to assess whether recordings on consecutive days are made from the same neurons [9, 31]. This variability potentially results from micro-movements of neurons and electrodes, and the brain tissue reaction to the abiotic interface (electrodes), or from functional plasticity [15, 16, 37]. This variability presents a major challenge to neural decoding - the process of translating the firing pattern of a fixed subset of the recorded neurons into control signals to maintain a fixed mapping between the neural input space and the kinematic task space. Most of the current decoding techniques necessitate the calibration of the decoder at the start of each session where signals are recorded while the subject observes the task being performed and a new decoder is trained using those signals. This process is a major limitation towards the practical use of Brain Machine Interface (BMI) systems because it requires an operator to perform this operation. For subjects (non-human primates) to routinely use decoders without the need for frequent calibration, the mapping from the neural input space to the kinematic task space needs to be fixed, and that requires the accurate identification of neurons to use at the start of each recording session, preferably when the subject is not engaged in any behavioral task. Using the same set of neurons during each session would ensure the fixation of this mapping. Hence, there is a need for a fast and efficient single-unit tracking algorithm that analyzes neural data recorded over a short duration of time and that uses only features extracted 9 from the units’ waveforms (as opposed to their stimulus-driven firing characteristics) for the decoder’s routine calibration. This can pave the way for fully automated tracking and thereby eliminating the need for human supervision all-together. 1.3 Related Work Previous work on assessing the stability of these signals has focused on qualitative measures to track units based primarily on visual inspection [6, 8, 15, 29, 34, 41]. Few studies have attempted to develop automated solutions to the unit tracking problem. Jackson et al. [23] tracked units using the peak of the normalized cross-correlation between average waveforms. They reported that 80% of the stable units had correlation values of > 0.95 across days. In [45], Suner et al. approach the problem from a signal reliability perspective. They compare the signals recorded on the same channel across days by comparing the centers of clusters in the principal components space and the Kolmogorov-Smirnov statistic between interspike intervals histograms of the formed clusters. Movable tetrodes were used in [46]. The authors determined a so-called “null distribution” of pairwise distances by comparing signals before and after adjusting the depths of the tetrodes. The “null distribution” can then be used to train a classifier to track the stability of single-units. 
Their method is unapplicable in the case of a fixed array. Dickey et al. [11] trained a classifier on features extracted from both the unit’s average waveform and interspike interval histogram (ISIH). The authors argued that using both the average waveforms and ISIHs gave a better accuracy. However, they also stated that subjects were overtrained on the behavioral task which minimized the variability of the units’ ISIHs across days. Furthermore, the method the authors used to match ISIHs is 10 Figure 1.5: Components of a typical motion-restoration BMI system: a recording device, a signal processing module to extract information from raw signals, a spike sorter to identify single-units, a decoder to map the neuronal input signal to movement parameters, and an actuator to perform the action. computationally intensive and time consuming, which makes it unsuitable for real time Brain Machine Interface applications. The authors reported that 57% of the units were stable through 7 days and 39% of the units were stable through 15 days. They concluded that if units are stable beyond the 7 days time marker, they are more likely to remain stable in subsequent recording sessions. Fraser et al. [14] used features extracted from functional relations between units, such as pairwise cross correlograms, to track units across days. However, this method requires the subject to be engaged in a behavioral task and may be susceptible to plasticity-mediated changes in functional connectivity between units. 1.4 BMI Systems A BMI system divides its operation among several sequential modules (Figure 1.5): a recording device, a signal processing module, a spike sorting module, a decoding module and an actuator [18, 35]. 11 The neuronal signals recorded from the brain undergo some signal processing such as: band-pass filtering to extract spiking events, denoising of the extracted spikes (typically by thresholding), and artifact rejection. The processed signal is then spike sorted to produce spike trains of individual neurons. Spike trains are all-or-none representation of the spiking events of a neuron. Spike trains are then input to the decoding module. A decoder is what translates the neuronal activity into motion. It extracts information from the spike trains and outputs the equivalent movement parameters to perform the desired motion. In this thesis, I present a fast and accurate single-unit tracking algorithm that uses features extracted from single-units’ waveform shapes and firing characteristics. The algorithm achieves an accuracy of up to 90% when compared to manual tracking. I describe the details of the algorithm in Chapter 2. I also present one application of single-units tracking in Chapter 3: unit selection for BMI decoding. 12 Chapter 2 Single-Unit Tracking In this chapter, I present an algorithm to track neurons recorded using a chronically implanted silicon array. Section 2.1 gives an overview of the algorithm. Sections 2.2 and 2.3 list some features that can be extracted from single-units. In Section 2.4, I describe four types of classifiers that can be used in the tracking algorithm, and in Section 2.5 I describe the algorithm in more details. Finally, I show the results I obtained by applying the tracking algorithm in Section 2.6 . 2.1 Algorithm Overview Figure 2.1 illustrates the algorithm flow diagram. The algorithm builds profiles of singleunits from instances of unit recordings across days. 
The inputs to the tracking module are the spike sorted units recorded on a given day, and the output is a set of updated profiles of single-units for that day. The tracking module makes the choice of either assigning a unit to an existing profile or creating a new profile for that unit on any given day. The distance vectors used in this decision are computed using different characteristics of a single-unit: the waveform shape, and firing patterns. 13 Figure 2.1: Tracking Algorithm: The tracking module takes in the spike sorted units and the profiles stored on the channel and calculates the distance between units and stored profiles. It then assigns each unit to either an existing profile or a new profile based on the distance computed. The output is the updated set of profiles for that channel. 2.2 Single-Unit Waveform Characteristics A single-unit’s average waveform (defined in Equation 1.4) can be used to distinguish it from the surrounding units. Waveforms features, such as amplitude, temporal dynamics, time between minimum and maximum peak and peaks shapes, can all be used to track single-units across days. However, due to small movements in the recording electrode’s location relative to the single-unit, the shape of the recorded waveform differs across days. Therefore, we need measures to compute how similar two waveforms recorded on the same channel on two different days, and based on such measures, we could decide whether the two waveforms are recorded from the same single-unit. In this study, I chose four distance measures to quantify the difference between two ¯ ¯ recorded average waveforms Wx and Wy . Table 2.1 lists the chosen measures and the waveform features each of them uses to compute the distance. 14 Table 2.1: Waveform-derived distance measures Distance Measure Correlation Normalized Peak-to-Peak Height Difference Normalized Peak-to-Peak Time Difference Peak Matching 2.2.1 Used Feature Waveform temporal dynamics Amplitude Transition time between minimum and maximum peaks Peaks shape Correlation The Correlation (PC ) is used to compare the temporal dynamics of the waveforms during an action potential. The PC is calculated by: ∑m ¯ ¯ ¯ ¯ i=1 (Wx (i) − Wx )(Wy (i) − Wy ) ¯ x , Wy ) = √ ¯ √∑ P C(W ∑m ¯ m ¯ 2 ¯ 2 ¯ i=1 (Wx (i) − Wx ) i=1 (Wy (i) − Wy ) (2.1) ¯ where m is the number of samples in each of the two average waveforms, W is the mean of an average waveform. 2.2.2 Normalized Peak-to-Peak Height Difference The Normalized Peak-to-Peak Height Difference (PH ) is defined as: ( ¯ ¯ P H(Wx , Wy ) = ) ( ) ¯ ¯ ¯ ¯ max(Wy ) − min(Wy ) − max(Wx ) − min(Wx ) ( ) ¯ ¯ max(Wx ) − min(Wx ) (2.2) where max and min are the maximum and minimum functions. This metric quantifies the difference in the amplitude of the two waveforms and normalizes it to the amplitude of one of them. Therefore, a larger difference between two high amplitude 15 units would result in a smaller PH distance than if the difference was between two low amplitude units. This is desirable to reflect that a large difference between two low amplitude units indicate a higher variability than in the case of two high amplitude units. 2.2.3 Normalized Peak-to-Peak Time Difference The Normalized Peak-to-Peak Time Difference (PT ) is defined as: ( ¯ ¯ P T (Wx , Wy ) = ) ( ) ¯ ¯ ¯ ¯ t[max(Wy )] − t[min(Wy )] − t[max(Wx )] − t[min(Wx )] ( ) ¯ ¯ t[max(Wx )] − t[min(Wx )] (2.3) where t[.] is a function that retrieves the sample number (i.e. time instance) of the passed argument. 
This metric quantifies the change in the transition time between the minimum and maximum peaks. Combined with the PH, it essentially compares the slopes of the lines joining the minimum and maximum peaks of the two waveforms. 2.2.4 Peak Matching The Peak Matching (PM ) is a heuristic asymmetric measure developed by Strelkov [44] that measures the difference in the shapes of peaks of two signals. The algorithm used to calculate the distance finds the peaks in both signals and assigns weights to each one of them. It then finds heuristic pairwise closeness factors between peaks and finally calculates the total distance as the sum of peak weights multiplied by the computed closeness factors. Refer to Appendix A for a full description of this metric. 16 2.3 Single-Unit Firing Characteristics Different single-units have different firing patterns and these patterns can be used to distinguish units from each other [7, 30]. Several features can be extracted from a unit’s firing patterns such as: • Average firing rate (λ) • Coefficient of variation (CoV) • Interspike interval histogram (ISIH) I chose to use the ISIH since it can provide more information about firing characteristics of a neuron and how it responds to external stimuli. Distance measures that are used to compare two probability distributions can be used to compare two normalized ISIHs Hx and Hy . I chose four of such distance measures in this study, where each quantifies the differences between two probability distributions from a different prespective. Table 2.2 lists the chosen measures and the criterion each of them uses to compute the distance. Table 2.2: ISIH-derived distance measures Distance Measure KL-Divergence Bhattacharyya Distance KS-Statistic Earth Mover’s Distance Distance Criterion Total information divergence Amount of overlap Maximum divergence Cost of turning one dist. to the other 17 2.3.1 KullbackLeibler Divergence The KullbackLeibler Divergence (KLD) is a non-symmetric measure that measures how divergent two statistical populations are [27]. The KLD is calculated as: D(Hx ||Hy ) = l ∑ ( ln i=1 Hx [i] Hy [i] ) Hx [i] (2.4) where l is the number of bins in each ISIH, H[i] is the value of the ith bin in the histogram H. To compute a symmetric measure out of the KLD, we simply compute the following: D(Hx , Hy ) = 2.3.2 D(Hx ||Hy ) + D(Hy ||Hx ) 2 (2.5) Bhattacharyya Distance The Bhattacharyya Distance (BD) is closely related to the Bhattacharyya coefficient (BC ) which is a measure of the amount of overlap between two statistical populations [3]. It is calculated using: BD(Hx , Hy ) = − ln BC(Hx , Hy )   l ∑√ BC(Hx , Hy ) =  Hx [i]Hy [i] (2.6) (2.7) i=1 where l is the number of bins in each ISIH, H[i] is the value of the ith bin in the histogram H. 18 2.3.3 KolmogorovSmirnov Statistic The KolmogorovSmirnov Statistic (KS ) quantifies the distance between two empirical distributions [19] and it is calculated as: l KS(x, y) = sup Fx [i] − Fy [i] (2.8) i=1 where Fx and Fy are the cumulative distribution function derived from Hx and Hy , l is the number of bins in each ISIH, and F[i] is the value of the cumulative distribution at the ith bin. 2.3.4 Earth Mover’s Distance The Earth Mover’s Distance (EMD) is a measure of the distance between two probability distributions. The EMD is the minimum cost of turning one distribution into the other by moving distribution mass around. 
For 1-D distributions, the EMD is computed by the following algorithm: EM D0 = 0 (2.9) EM Di+1 = (Hx [i] + EM Di ) − Hy [i] (2.10) EM D(Hx , Hy ) = l ∑ |EM Di | (2.11) i=1 where l is the number of bins in each ISIH, H[i] is the value of the ith bin in the histogram. Each of the previous distance measures computes the distance between two probability distributions from different perspectives. While the KLD calculates the total divergence 19 between two distributions, the KS approximates that divergence by just including the maximum difference at any point in the cumulative distributions. Both the KLD and the KS impose different constraints when comparing two distributions. The former measure overlooks individual points and is only concerned by the total divergence across all the points, which is opposite to what the latter measure does. The BD quantifies the amount of overlap between the two distributions to decide how similar they are, and the EMD computes the minimum cost of converting one distribution to the other. 2.4 Tracking Classifier Given a vector of distances between the features extracted from two single-units recorded on the same channel on two different days, the tracking classifier makes the decision of whether these two single-units represent different instances of the same single-unit. For training a waveform-based classifiers, true positives, which correspond to pairs of matched waveforms, were obtained by two experts who manually tracked the units. True negatives, which correspond to pairs of unmatched waveforms are obtained by comparing units recorded simultaneously on the same channel on any given day. Figure 2.2 illustrates this process. The same process can be applied for training classifiers that use other singleunits characteristics. Several types of classifiers were compared in this study. The following few sections give a brief summary of the classifers I compared. The input to all classifier is a vector, x, of the distances between the features extracted from two units recorded on the same channel but on different days. The length, n, of the distance vector is the number of features used in the comparison. The classifiers’ decision is binary, either to assign x to class 0 (c0 ), which 20 Figure 2.2: Process of training a waveform-based classifier. Using spike sorted units recorded across several days, experts track the units manually and produce pairs of matching average waveforms (true positives). Using the same set of units, pairs of non-matching average waveforms are generated by comparing spike sorted units recorded simultaneously on the same channel. 21 corresponds to non-matching units, or to class 1 (c1 ), which corresponds to matching units. 2.4.1 Support Machine Vectors (SVM) The SVM is a supervised binary classification method that finds a high-dimensional decision plane where the “margin” is maximized. The margin is defined as the smallest distance between the decision plane and any of the data points. The SVM model is in the form [33]: y(x) = wT ϕ(x) + b (2.12) where, w defines the decision plane, x is the input distance vector, ϕ(x) is projection of the input into the feature space (basis function), and b is a offset term to account for any biasedness in the training data. Although the SVM model in Equation 2.12 is linear, non-linearity can be added to the model by the use of a non-linear basis function such a Gaussian radial basis function. 
2.4.2 Relevance Machine Vectors (RVM) The RVM is a Bayesian sparse kernel technique for classification and it shares many of the characteristics of the SVM. However, unlike the SVM, which gives the classification decision, the RVM provides the classification probability. In other words, the RVM combines the advantages of the sparse SVM and the probabilistic Bayesian methods. The model then becomes of the form [33]: ( y(x) = σ wT ϕ(x) + b where, σ(.) is the sigmoid function. 22 ) (2.13) 2.4.3 Maximum A Posteriori Classification (MAP) The MAP is a probablistic classifier and it is used to estimate how likely a new point belongs to a distribution based on empirical data. In other words, the MAP finds the most probable class given the available training data. Given the distance vector between two units, x, the MAP classification would be [33]: c1 p(x1 , . . . , xn |c1 )P (c1 ) ≷ p(x1 , . . . , xn |c0 )P (c0 ) (2.14) c0 where, p(x1 , . . . , xn , |c0 ) and p(x1 , . . . , xn , |c1 ) are the likelihood probabilities that x belongs to c0 and c1 respectively, and P (c0 ) and P (c1 ) are the prior probabilities of c0 and c1 respectively. The likelihood and prior probability distributions are constructed using the training data, where the true positives are instances of c1 and the true negatives are instances of c0 . 2.4.4 Naive Bayesian Classification (NB) The NB is a simple probabilistic classifier that is based on Bayes theorem, but unlike the MAP, the NB assumes independence between features. Therefore, instead of using a joint conditional probability distribution as in the case of MAP, the NB uses multiple individual conditional probability distributions to infer its decision. Despite its unrealistic features independence assumption, the NB is surprisingly effective in practice [12]. For a distance vector between two units, x, the NB decision is [4]: n ∏ i=1 n c1 ∏ p(xi |c1 ) ≷ c0 23 i=1 p(xi |c0 ) (2.15) where p(xi |c0 ) and p(xi |c1 ) are the conditional probability distributions that the ith feature of x belongs to c0 and c1 respectively. As in the MAP, the conditional probability distributions are constructed using the training data, where the true positives are instances of c1 and the true negatives are instances of c0 . 2.5 2.5.1 Tracking Algorithm Pre-processing The average waveforms were smoothed using a Gaussian kernel (σ = 2) to get rid of the high frequency components (which results from the short amount of recording tie I used). 2.5.2 Algorithm As stated above, the tracking algorithm builds a database of profiles of recorded singleunits. Each profile stores instances of the corresponding single-unit recorded across days. On a given day x, units on a channel c are spike sorted and passed to the feature extraction module. Pairwise distance vectors are computed between the stored profiles on c and the input spike sorted units. A distance vector is the concatenation of the distance measures computed over an input unit and an instance stored in a profile. A distance matrix is built using the pairwise distance vectors between the stored profiles and the spike sorted units. 
The distance matrix is then input to the tracking classifier to build a membership matrix, where each cell indicates whether an input single unit is another instance of a stored profile: Denote the set of profiles stored on channel c as Pc , the set of spike sorted input units 24 on channel c as Uc , the distance between the ith profile and the jth unit as Dist(Pc,i , Uc,j ), and the membership of the jth unit to the ith profile as M em(Pc,i , Uc,j ). Assume there are m stored profiles on c, and n spike sorted units on day x. The distance matrix is formed as:    Dist(Pc,1 , Uc,1 ) Dist(Pc,1 , Uc,2 )    Dist(Pc,2 , Uc,1 ) Dist(Pc,2 , Uc,2 )  DistM atc (P, U ) =   . . . .  . .   Dist(Pc,m , Uc,1 ) Dist(Pc,m , Uc,2 ) . . . Dist(Pc,1 , Uc,n )    . . . Dist(Pc,2 , Uc,n )     . .. .  . .   . . . Dist(Pc,m , Uc,n ) And the membership matrix is:    M em(Pc,1 , Uc,1 ) M em(Pc,1 , Uc,2 )    M em(Pc,2 , Uc,1 ) M em(Pc,2 , Uc,2 )  M emM atc (P, U ) =   . . . .  . .   M em(Pc,m , Uc,1 ) M em(Pc,m , Uc,2 ) . . . M em(Pc,1 , Uc,n )    . . . M em(Pc,2 , Uc,n )     . .. .  . .   . . . M em(Pc,m , Uc,n ) Finally, the relationship between the distance and membership matrices can be modeled as: DistM atc (P, U ) ⇒ T rackingClassif ier ⇒ M emM atc (P, U ) (2.16) Units are added as instances to the profiles they belong to. If a new unit is identified, a new profile is initialized and the unit instance is added to it. A profile is composed of all the recorded instances of the single-unit it represents. When an input unit is compared to a profile to make the decision of whether the unit is another instance of the profile, the tracking classifier needs to decide which profile instance to use in such comparison. In [11], the authors stated that once a unit remains stable (does not 25 experience much variability) beyond a threshold (seven days), it is more likely to maintain its stability. Based on this result, I developed a technique that I call “First Matching Distance (FMD)”. In this technique the most recent seven instances in a profile are used in this comparison from newest to oldest. Once the tracking classifier indicates that a profile instance and the unit are recorded from the same singe-unit (a match), the comparison stops and this instance is used for distance calculations. Furthermore, two classes of profiles are defined: active and inactive. Active profiles are those profiles whose single-units have been recorded on any day within the seven days before. On the other hand, inactive profiles are those profiles whose units did not appear in the recording on any day during the seven days before, in which case I consider them “dropped units” and I do not include them in future comparisons. 2.5.3 Post-processing The training classifier results in a membership matrix where each cell in this matrix indicates whether an input unit on a given day can be added to a stored profile or not. In many cases, because of the similarity of the waveform shapes of different units or misclassifications in the spike sorting algorithm, the classifier might indicate that the input unit can be added to more than one profile. However, a unit can be only added to one profile. Therefore, I developed a backtracking algorithm to find the best assignment of units instances to profiles. Backtracking is a standard general algorithm for finding all the possible solutions to a problem and then selecting one that most satisfies a set of user-defined objectives. 
The objectives I specified were as follows: • Primary objective: Maximize the number of assigned units to already existing stored 26 Figure 2.3: Application of a waveform-based classifier. On any given day x, the spike sorted units of a given channel c are compared against the stored profiles on the same channel c. Pairwise distance vectors are calculated from features extracted from pairs of waveforms. The tracking classifier makes the decision of whether to add a unit to an existing profile or to create a new profile for it. 27 profiles. • Secondary objective: Maximize the total sum of distances to the decision boundary of the assigned units (since the larger the distance to the boundary, the more certain the decision of the tracking classifier is). If two solutions have the same number of units assigned to existing stored profiles, the one with a larger sum of distances to the decision boundary is selected. The tracking algorithm, on any given day x, is summarized in Algorithm 1. Algorithm 1 Tracking algorithm for each channel c do Spike sort the signal recorded on c Input the spike sorted units to the tracking module Compute distance vectors between each of the input units and the stored profiles associated with c Using the distance vectors computed, build a membership matrix between the input units and the stored profiles on c using the tracking classifier Using back-tracking, find the best assignment that maximizes the number of assigned units to existing profiles Create new profiles for unassigned units end for 2.6 2.6.1 Results Data Acquisition and Behavioral Task Recordings were obtained using a 100-microelectrode silicon array (Blackrock Microsystems, Inc., Salt Lake City, UT) chronically implanted in the primary motor cortex (M1) of a Rhesus Macaque. The implant was performed three months before the recording of the first dataset used in this study. The surgical and behavioral procedures involved in this study were approved by the University of Chicago Institutional Animal Care and Use Committee and 28 R Figure 2.4: Spike sorting on the Cerebus⃝ Online Sorter. Each single-unit is defined by a set of “hoops” where any waveform intersecting with all hoops of a given unit is added to that unit. conform to the principles outlined in the Guide for the Care and Use of Laboratory Animals. R Spike sorting was done using the Cerebus⃝ Online Sorter and the sorting templates were updated on a daily basis. The Hoops method was used to perform the spike sorting where a set of bars is defined for each single-unit and when a waveform intersects with all the bars of one set, the waveform is classified as part of the unit corresponding to this set. Figure 2.4 displays a screen shot of the process of defining hoops for single-units. The subject performed a brain-controlled reach and grasp task. Details of the behavioral task can be found in [2]. Fifteen datasets, spanning a period of 26 days, were used in training (7 datasets) and testing (8 datasets) the classifiers. I used only the first 15 minutes of each dataset.This typically corresponds to the period of updating the spike sorting templates where the subject is not engaged in any behavioral task. 29 2.6.2 Training Features Figures 2.5 and 2.7 show the distributions of the waveform- and ISIH- based distance measures when applied on the true positives and true negatives training datasets. The degree of separability between the distributions provided by each of the features I selected varied. 
While the peak matching distance provided the best separability between matching and non-matching waveforms, the other three waveform-based distance measures did not provide separation between classes (except in regions where it was very clear that there is a significant difference in the amplitude or time between peaks). The inclusion of the latter three features, however, was necessary to account for cases where two waveforms had very similar peak shapes but with significantly different amplitude or time between peaks or temporal dynamics, which occurred frequently in our data (see Figure 2.6). On the other hand, all the ISIH-based distance measures did not provide the degree of separation between classes provided by the peak matching. However, their inclusion is nessecary to account for cases where the waveform shapes of two unit instances experience variability but their firing properties remain similar (see Figure 2.8). 2.6.3 Classifiers Evaluation The performance of the classifiers was assessed based on two metrics: 1. Classification Accuracy: The percentage of correct classification made on a day-to-day basis compared to manual tracking. A correct classification happens when a unit on day x is matched with the correct unit on day x − 1 (as compared to manual tracking). 2. Percentage of Correct Profiles: The percentage of profiles that were correctly tracked across all days of the testing datasets compared to manual tracking. A correct profile 30 Figure 2.5: Probability distributions of the training true positives (in blue) and true negatives (in red) for each of the four wavefrom-based features. 31 80 60 60 40 40 20 uV uV 20 0 0 −20 −20 −40 −40 −60 0 −60 −80 0 20 40 Time (sample#) 20 40 Time (sample#) Figure 2.6: Figure a shows an example of two average waveforms that have very similar peak shapes but different peak-to-peak height, while Figure b shows an example of two average waveforms that have very similar peak shapes but different peak-to-peak time. is the profile that matches a manually tracked profile across all days. Classifiers implementation was provided by the newFolder liberary [47]. I used a radial basis function as the kernel for the SVM classifier. I compared the performance of the classifiers using four different sets of features: • Set 1: 4 waveform-based features (PC, PH, PT, PM ) and 4 ISIH-based features (KLD, BD, KS, EMD). • Set 2: 3 waveform-based features (PH, PT, PM ) and 3 ISIH-based features (KLD, BD, KS ) • Set 3: 3 ISIH-based features (KLD, BD, KS ). • Set 4: 3 waveform-based features (PH, PT, PM ). 32 Figure 2.7: Probability distributions of the training true positives (in blue) and true negatives (in red) for each of the four ISIH-based features. 33 200 uV 100 0 −100 −200 0 10 20 30 Time (sample#) 50 0.6 Prob. 0.8 0.6 Prob. 0.8 40 0.4 0.2 0 0.4 0.2 0 25 150 275 400 Interspike Interval (ms) 25 150 275 400 Interspike Interval (ms) Figure 2.8: The figure shows an example of two instances of a single-unit on two successive days where the average waveform experienced some variability and the ISIH remained the same. 34 Performing cross validation was challenging because the order of the datasets had to be maintained. This is because changes in the characteristics of a single unit might be small over two successive datasets but much larger across longer duration. We divided the fifteen datasets (explained in Section 2.6.1) into six partitions. Each partition is divided into five training datasets and five testing datasets. 
The first partition contained datasets 1 to 10, the second partition contained datasets 2 to 11, and so on. Figure 2.9 shows the classifiers performance under the four feature sets. It is clear that feature sets 1, 2 and 4 give better performance than feature set 3. This shows that adding the ISIH-based features can help improve the performance but by themselves they cannot successfully track single-units. This can be explained by the fact that the subject are not overtrained and the single-units’ firing characteristics experience variability during the behavioral task learning phase [21]. Furthermore, I performed t-test to see if any of the classifiers is better than others. For feature sets 1, 2 and 4 all classifiers performed similarly. However, for feature set 3, the RVM and SVM classifiers were better than the NB and MAP classifiers (p > 0.01 - p-value is small because of the small number of samples (6 samples)). Figure 2.10 displays the average execution time per channel for the classifiers under different feature sets. The machine used in running the algorithm had an Intel-i7 quad-core processor (3.40GHz) and 16GB of RAM. Execution times were almost the same for the different classifiers. Feature set 1 has the largest execution time (as we might expected because of the larger number of features used) and feature set 3 has the smallest one. Furthermore, I compared the receiver operating characteristic (RoC) curves of the four classifiers under the four feature sets. For feature set 1, the RVM performed slightly better than other classifiers. For feature set 2, the NB performed the worst. For feature set 3, the RVM and SVM were superior to the NB and MAP. Finally, all classifiers performed 35 Per. of Correct Classification RVM 100 NB MAP 80 60 40 20 0 Set1 Set2 RVM 100 Per. of Correct Profiles SVM Set3 Feature Set SVM Set4 NB MAP 80 60 40 20 0 Set1 Set2 Set3 Feature Set Set4 Figure 2.9: Comparison of the classifiers performance using cross validation. Four different sets of features were used. Cross validation was performed using 6 partitions (each had 5 training datasets and 5 testing datasets). The errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2 and 4 (confirmed by t-test). 36 Avg. Execution Time Per Channel (Sec.) 2 1.5 1 RVM SVM NB MAP 0.5 0 Set1 Set2 Set3 Feature Set Set4 Figure 2.10: Comparison of the classifier performance based on the average execution time per channel. Cross validation was performed using 6 partitions and 4 feature sets were defined. The errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2, and 4 (confirmed by t-test). similarly under feature set 4. We also plotted the RoC curve of an RVM classifier trainined by randomly shuffling label of training datasets (we call it Rand). We repeated this process for 100 times and plotted the average RoC curve which runs along the chance level showing that my method is unbiased. Finally, Tables 2.3 and 2.4 show the classifiers’ performance measured by the metrics above when trained and tested using all 15 datasets. All classifiers performed very well, however, the RVM was superior to other classifiers in terms of both metrics. We believe that the reason behind this is that it combines the advantages of SVM and those of a probabilistic classifier. I decided to use it to perform further analysis. 
Figure 2.12 shows the decision boundary of the RVM classifier under the 4 feature sets. Figure 2.13 displays two of the profiles tracked by the RVM classifier. 37 Figure 2.11: RoC curves for different classifiers using 4 different feature sets. The top left figure is for set 1, the top right figure is for set 2, and so on. 38 Same Unit Different Units 1 0 PC2 PC2 2 −2 Same Unit Different Units −4 0 2 4 6 8 0 −1 10 0 2 4 PC1 PC1 0.5 1 PC2 PC2 0 −1 0 2 4 6 PC1 8 Same Unit Different Units 0.5 0 −0.5 Same Unit Different Units 6 −0.5 8 0 0.5 PC1 1 1.5 26 24 17 12 10 7 4 1 Day Day Figure 2.12: Decision boundaries for the RVM classifier using 4 different feature sets For visualization, the RVM classifier was trained on the first two principal components of the training features. The top left figure is for set 1, the top right figure is for set 2, and so on. The black circles are the basis vectors of the RVM classifier. 26 24 17 12 10 7 4 1 57 uV 109 uV 0.125 ms 0.125 ms 0 0.5 1.0 Time (ms) 1.5 0 0.5 1.0 Time (ms) 1.5 Figure 2.13: Two of the profiles tracked by the RVM classifier across 15 datasets. The profiles were recorded on channels 4 and 90 respectively. 39 Table 2.3: Classifiers’ Performance - Classification Accuracy Set/Class. Set 1 Set 2 Set 3 Set 4 RVM 90.51% 89.14% 61.16% 89.62% SVM 87.44% 88.05% 61.92% 89.41% NB 87.30% 87.91% 40.61% 89.62% MAP 86.62% 88.39% 41.36% 90.10% Table 2.4: Classifiers’ Performance - Percentage of Correct Profiles Set/Class. Set 1 Set 2 Set 3 Set 4 RVM 78.34% 74.01% 20.47% 75.98% SVM 73.62% 72.83% 15.74% 77.16% NB 74.01% 74.01% 10.23% 76.37% MAP 71.65% 72.83% 9.84% 76.77% One of the goals of this study is to develop a fast and accurate algorithm to track singleunits. A trade-off exists between accuracy and speed and it can be observed clearly when comparing the performance results for feature sets 1 and 4 (See Tables 2.3, 2.4, and 2.5). Feature set 1 achieves a higher performance but at the cost of a longer execution time. On the other hand, feature set 4 has a smaller execution time but at the cost of a lower accuracy. However, the gain in using feature set 4 is greater than that in the case of using feature set 1, since the drop in accuracy is very small (1% drop) compared to a larger increase in the average execution time per channel (17% increase). Therefore, I used feature set 4 to perform any additional analysis. Table 2.5: Average execution time per channel when the RVM was used to track 15 datasets. Feature Set Time (sec.) 1 2 5.24 4.03 40 3 0.62 4 4.48 Avg. Execution Time Per Channel (Sec.) 10 RVM 8 6 4 2 0 1 4 7 10 12 17 24 26 34 40 44 46 50 53 57 59 86 88 Day Index Figure 2.14: Average execution time per channel when feature set 4 and the RVM classifier were used. The classifier was run on 35 datasets spanning 88 days. Convergence in the execution time happens on day 34 of the analysis. Figure 2.14 shows the convergence of the average execution time per channel as the algorithm is run on more datasets. The convergence happens on day 34 with an average exection time of 8 seconds per channel. This convergence occurrs because the number of active profiles stabilizes after some time. Profiles inactive for a long time (> 7 datasets) are considered droppped and are not included in the tracking. 2.7 Conclusion In this chapter, I described the details of a single-unit tracking algorithm. 
I compared the performance of four different classifiers using four different sets of features that were extracted from a unit’s average waveforms and firing characteristics. Using all the eight fea- 41 tures propsed led to the best performance but at the cost of a longer execution time. I found that using three waveform-based features achieved the best trade-off between execution time and accuracy. Using cross validation, the classifiers performed equally. However, when more training datasets were added, the RVM classifier performed slightly better than other classifiers. Finally, when the algorithm was run on 35 datasets spanning 88 days, the execution time converged on day 34 with an average execution time of 8 seconds per channel. 42 Chapter 3 Unit Selection Algorithm for Decoding A neural decoding is the process of translating the firing pattern of a fixed subset of the recorded neurons into control signals to actuate an artificial limb. For maximizing success of this control, it is important to maintain a fixed mapping between the neural input space and the kinematic task space, which is known as “Memory Motor Consolidation”. To automate the process of decoder calibration and to fix this mapping, the accurate identication of neurons to use at the start of each recording session is required, preferably when the subject (non-human primate) is not engaged in any behavioral task. However, to apply a fixed decoder that only uses a fixed subset of the recorded units, one has to make several decisions, among these: which units to use? how to ensure the stability of these units? what to do when a unit stops showing up on recordings (dropped )? In this section, I propose a units selection algorithm that addresses these questions in a systematic way. The algorithms targets the use of a fixed linear decoder [49] following an operant-conditioning approach [24] where only a finite subset of units (cluster) is used for decoding a kinematic variable. 43 3.1 Unit Selection Criteria Four criteria for the unit selection algorithm were specified: 1. All units must be stable. 2. Units within a cluster should be functionally connected. 3. Stable units are evenly distributed across clusters. 4. Units recorded on the same channel must belong to the same cluster. The rationale behind each criterion is explained in the next three sections. 3.2 Tracking Unit Stability We define a stable single-unit as a unit that was recorded on a daily basis and that maintained similar properties for at least a certain number of days. In [11], the authors found out that once a unit maintains its stability beyond a certain number of days, it is more likely to stay stable. They postulated that this threshold is 7 days. Therefore, using only stable units in decoding may ensure the reliability of decoding and the minimal inconsistency in the subject’s experience with the BMI. I used the algorithm described in Section 2.5 to track units across days. Based on the results of [11], we defined a variable which we called “Stability Window ”, and three categories of units were defined based on it: • Stable Units: The units that were recorded and maintained similar properties on every day throughout the past “Stability Window ”. 44 Figure 3.1: Classification of single-units. For a stability window of 7 days, stable units are those units that were identified as similar on a daily basis throughout the stability window. Partially stable units are those units that were identified throughout more that half the stability window. 
Unstable units are all the remaining units. • Partially Stable Units: The units that were recorded and maintained similar properties for more than half of the past “Stability Window ”. • Unstable Units: All the remaining units. Figure 3.1 illustrates the process of classifying the single-units based on their stability. Furthermore, I modified the tracking algorithm such that the comparison between a single-unit and a stored profile is performed by comparing the unit against all the profile instances that occurred within the past “Stability Window ” of days. The distance vector resulting from this comparison is the distance between the unit and the profile instance closest to it (determined using the distance to the decision boundary). The intuition behind this modification is that this way I am more tolerant towards a unit that undergoes some minor instability for a short period of time and then goes back to its stable state. One of the causes of this short term instability might be the high interference from the electric circuitry used in recording on one of the days (which was observed on some channels). This tolerance comes from the fact that now the unit is compared against more than one profile instance. In 45 addition, this would increase the accuracy of classification since the closest matching profile instance is used in the comparison (instead of the first matching profile instance as was done in Chapter 2). I call this new technique “Maximum Matching Distance (MMD)”. 3.3 Clusters of Functionally Connected Units Neuronal assemblies cooperate to perform complex behavioral tasks and this determines the functional connections between neurons. Within an assembly, each neuron assumes a role depending on its tuning function, which depends on the behavioral task being performed at the time of recording [13]. It was found that changes in the tuning function of singleunits occur as subjects perform the behavioral task [15]. Correlated neurons are potential cell assemblies that share common coding [22]. In [1], the authors stated that changes in functional connectivity between units and the relevance of the behavioral task are necessary for cortical plasticity to occur. Based on this result, we hypothesized that using already functionally connected units in decoding can help to speed the cortical plasticity occurring from learning the behavioral task. 3.3.1 A Multi-Scale Approach I used the algorithm in Eldawlatly et al. [13] to infer the functional connectivity between units. The algorithm has a primary parameter and that is the number of time scales in which the correlation is calculated. The average firing rate is calculated at each time scale using wavelets. Pairwise correlations are then calculated between units’ spike trains at different time scales resulting in a set of correlation matrices that are fused into one similarity matrix using SVD. Algorithm 2 provides a summary of the algorithm and Appendix B provides 46 more details on the computations of each step. 
Algorithm 2 Functional connectivity algorithm for j = 1 to J (timescale index) do for p = 1 to P (neuron index) do j Obtain the wavelet representation sp of each spike train sp end for Compute the P × P correlation matrix Σ(j) of the spike trains at timescale j end for Form the block diagonal correlation matrix R from the matrices Σ(j) and obtain its spectral decomposition using SVD Obtain the pairwise similarity of neurons using the weighted sum of the first few dominant modes of R The result of this algorithm is an augmented correlation matrix (augmented because it is the result of fusing the correlation matrices across different time scales), where each cell in the matrix indicates the strength of functional connectivity between the two units (i.e. the augmented correlation between two units is an indicator of their functional connectivity). Note that from now on when I use the term “correlation”, I mean the “augmented correlation” between the units. An additional clustering step is required to divide the units into different partitions, each controlling a degree of freedom. I used the same clustering algorithm used in [13], and that is the Spectral Clustering [32]. 3.4 Balancing Units The clusters produced from the previous step are generally unbalanced in terms of the number of units. This is undesirable because one cluster might contain a large number of units while another cluster contains a much smaller number of units. In the event of a unit drop from the smaller cluster, the subject may be unable to control the degree of freedom associated with that cluster. I developed an algorithm to balance the number of units across clusters 47 while maintaining the functional connectivity between units within a cluster. The aim of this algorithm is to provide the subject with the same degree of control for each degree of freedom by assigning equal number of units across clusters. This way we also divide the labor equally over units so that the control would not deteriorate in case of a unit drop. 3.4.1 Measuring the functional connectivity between units Swapping units between clusters during the balancing process requires quantifying the connectivity of each unit to the rest of the cluster units. The Interquartile Mean (IQM) of the correlations between a unit and the rest of the cluster units can be used for this purpose. The IQM can be calculated by throwing away the first and last quartiles of the correlations and averaging the rest [42]. Assume that ρ is the sorted array of the correlations between a unit and the rest of cluster units and it is of length n, the IQM is: 3n 4 2 ∑ IQM (ρ) = ρi n n (3.1) i= 4 +1 The IQM combines the benefits of both the average and the median. It handles the effect of the outliers by computing the first and third quartiles and at the same time the result of the metric does not have to be a value in the series. 3.4.2 Balancing Algorithm The balancing algorithm is summarized in Algorithm 3. An extra criteria is added to this algorithm where units recorded on the same channel are assigned to the same cluster. This criteria is specified because units recorded on the same electrode are located in close proximity 48 and it was found that neighboring neurons tend to share common inputs of the same sign [48]. This enables them to work together as a coherent functional group when their common input is activated. The modified algorithm is summarized in Algorithm 4. 
Algorithm 3 Balancing Algorithm Define a minimum number of units for each cluster Run the clustering algorithm described in 3.3.1 and get the functional clusters Sort the units in the clusters based on a functional connectivity metric Determine clusters that have excess units (P ) Determine clusters that are in short of units (R) while R is not empty && P is not empty do for each cluster C in set P do Find the connectivity of the least connected unit to the rest of the cluster units M inConC end for Select the cluster C from the set P with the minimum M inConC (i.e. the cluster that has the least connected unit) Declare the least connected unit in C a free unit for each cluster C in set R do Find the connectivity of the free unit to its units F reeU nitConC end for Add the free unit to the cluster C with maximum F reeU nitConC (i.e. the cluster that the free unit is most connected to) Update stats of sets P and R end while if there are clusters with excess number of units above the minimum number specified then Distribute them using the same method used above end if 49 Algorithm 4 Modified Balancing Algorithm - Units on the same channel belong to the same cluster Set a minimum number of units for each cluster Run the clustering algorithm described in 3.3.1 and get the functional clusters for each channel C do if units recorded on C do not belong to the same cluster then Find the average connectivity of each cluster to each of the units recorded on C Move all units recorded on C to the cluster they are most connected to end if end for Sort channels in the clusters based on average connectivity of each channel units to the rest of the cluster units Determine clusters that have excess units (P ) Determine clusters that are in short of units (R) while R is not empty && P is not empty do for each cluster C in set P do Find the connectivity of the least connected channel to the rest of the cluster units (average connectivity of channel units) M inConC end for Select the cluster C from the set P with the minimum M inConC (i.e. the cluster that has the least connected channel) Declare the units of the least connected channel in C free units for each cluster C in set R do Find the average connectivity of the free units to its units F reeU nitConC end for Add the free units to the cluster C with maximum F reeU nitConC (i.e. the cluster that the free units are most connected to) Update stats of sets P and R end while if there are clusters with excess number of units above the minimum number specified then Distribute them using the same method used above end if 50 3.5 Unit Selection Algorithm Figure 3.2 illustrates all the steps of the unit selection algorithm. For each channel, spike sorting is performed on the recorded signal. The spike sorted units together with the stored units profiles previously recorded on this channel are then input to the tracking module which finds matches between the units profiles and the spike sorted units. The tracking module starts its operation by extracting features from instances of single-units (spike sorted units and units profiles). It then finds pairwise distances between the features of each spike sorted single-unit and the instances of each stored profiles. After finding the best assignment, the tracking module outputs the updated profiles recorded on the channel. Those updated profiles are then input to a stability units filter, that extracts only stable units according to the criteria specified in Section 3.2. 
Stable units from all channels are then lumped together and input to the functional connectivity module which calculates a similarity correlation matrix using the stable single-units spike trains. A spectral clustering algorithm is then applied to the similarity matrix where the number of clusters is the number of degrees of freedom (DOFs) the subject needs to control. The resulting clusters will vary in the number of single-untis belonging to them. Therefore, a balancing algorithm is applied to the clusters to: 1) balance the number of units across all clusters, 2) maintain the functional connectivity relations between units within a cluster. The clusters output from the balancing algorithm are used to control different DOFs through the decoder. 51 Figure 3.2: Unit selection algorithm: On each channel, spike sorted units are compared against stored profiles on the same channel. The tracking classifier determines whether to add a single-unit to an existing profile or to create a new profile for it. The updated units profiles are input to a stability filter that classifies units into stable, partially stable and unstable units. Functional connectivity matrix is calculated for stable units. The matrix is then input to a clustering algorithm that outputs a set of functionally connected clusters of single-units. However, these clusters are unbalanced in terms of the number of units. Therefore, the clusters pass through a balancing algorithm that maintains the functional connection relations between units while balancing the number of units across clusters. Each cluster is used to control a kinematic variable. 52 3.6 Unit Replacement Strategies Even if I only used stable units, units will continue to drop over time [25]. Among the reasons of this drop is the formation of a glial scar consisting of reactive astrocytes and microglia around the electrodes site which affects the quality of recording [17]. The decoder should be prepared with a strategy to replace the dropped unit with another unit that would best match the dropped one to get the closest decoding performance and to make the replacement unnoticeable to the subject as much as possible One way of measuring this is by observing how much the behavioral performance dropped after the replacement. In this section, I propose two replacement strategies, • Strategy 1 is based on the relationship between the replacement unit with the rest of the cluster units, • Strategy 2 is based on the relationship between the dropped unit and its replacement Furthermore, a small modification is made to the units selection algorithm where the units in the free pool (not assigned to any cluster) are divided evenly on the clusters using the same balancing algorithm described in Section 3.4.2. These units are regarded as substitute units for each cluster (i.e. each cluster has its own free pool of substitute units). 3.6.1 Replacement Strategy 1: Unit Closest to Cluster This replacement strategy looks for the unit from the free units pool that has the highest correlation (using the metric described in Section 3.4.1) with the rest of the cluster units. The intuition behind this strategy is the same as the one behind choosing the functionally connected units for decoding. We postulate that this would make it easier for the subject 53 to control the robotic arm since we pick the unit that may constitute a member of the same cell assembly as the rest of the units. 
3.6.2 Replacement Strategy 2: Unit Closest to Dropped Unit This replacement strategy finds the unit that has the highest correlation to the dropped unit. The intuition behind this strategy is to find the unit that would act the same as the dropped unit. In principle, the decoder weights corresponding to the dropped unit should be suitable for the new unit and the replacement process would be transparent to the subject. I propose two methods to apply this strategy: • By using the correlation matrix (computed in Section 3.3.1) directly. • By applying graph-theory algorithms on the correlation matrix (computed in Section 3.3.1). The second method models the cluster as an undirectional weighted graph where the graph nodes are the cluster units and the weights of edges between nodes are the distances between nodes. The distance measure between two units x and y is derived from the pairwise correlation ρ(x, y) such that [5]: D(x, y) = √ 2 ∗ (1 − ρ(x, y) (3.2) Given an adjacency correlation matrix, the model results in a fully connected graph where each node is connected to all other nodes. The weight on a given edge is the distance between the two nodes connecting the edge (distance is obtained from correlation as in equation 3.2). From the fully connected graph, the minimum spanning tree (MST) is constructed using 54 Figure 3.3: An example for an MST constructed from a graph containing 10 nodes. An MST can be constructed using either Prim’s or Kruskal’s algorithms. Both algorithms are greedy in the way they choose the MST edges. In both algorithms, edges are sorted ascendingly based on their weights and the lowest weight edges are added one by one as long as a loop is not formed and until all nodes are added to the tree. The boldface edges in the figure are the ones that were included in the MST. Total cost of the MST is 31. Taken from [50]. any of the well-known MST algorithms (Prim’s [38] or Kruskal’s [26] algorithms). A tree is an acyclic graph and a spanning tree is a subgraph that contains all of that graph’s vertices and is a single tree [43]. Finally, an MST is is a spanning tree whose weight (the sum of the weights of its edges) is no larger than the weight of any other spanning tree [43]. Figure 3.3 shows an example of an MST for a 10-node graph. The MST in our case is just a representation of the structure of the underlying network formed by the cluster nodes. It reveals the underlying network that connects all the nodes of the cluster and includes only the essential strong connections (edges with high correlation) to connect all the nodes. My algorithm forms the MST of the cluster units and the free units using the popular Kruskal algorithm [26], and from the MST it computes all pairs shortest paths using FloydWarshall algorithm [28]. The free node that is closest to the dropped one is selected as the 55 replacement unit. This is different from merely selecting the unit that is most correlated to the dropped unit because only a small subset of the edges are included in the MST (most of the connections are considered redundant and not included in the formed tree). 3.7 3.7.1 Results Tracking Single-Units Stability I repeated the same analysis I did in Section 2.6.3 after applying the modification described in Section 3.2 to the tracking algorithm to verify that the modification resulted in an improved performance. I chose the stability window to be 7 days based on the results in [11]. Figures 3.4 and 3.5 compare the performance of the four classifiers using cross validation. 
The results were very similar to the results I obtained in Section 2.6.3. Tables 3.1 and 3.2 compare the classifiers’ performance when all the 15 datasets were used in training and testing. Again, the RVM classifier was slightly superior to other classifiers in terms of both metrics, that is why I chose to use it for the rest of the analysis. Furthermore, feature set 4 gave the best trade-off between accuracy and speed (similar to what I obtained in Section 2.6.3). Table 3.1: Classifiers’ Performance - Classification Accuracy Set/Class. Set 1 Set 2 Set 3 Set 4 RVM 91.67% 89.89% 62.11% 90.30% SVM 88.05% 88.32% 67.84% 89.55% NB 88.12% 89.48% 57.26% 90.03% MAP 87.44% 85.87% 57.13% 89.89% Figure 3.6 shows the same profiles I tracked in Section 2.6.3. Both methods obtained the 56 Per. of Correct Classification RVM 100 NB MAP 80 60 40 20 0 Set1 Set2 RVM 100 Per. of Correct Profiles SVM Set3 Feature Set SVM Set4 NB MAP 80 60 40 20 0 Set1 Set2 Set3 Feature Set Set4 Figure 3.4: Comparison of the classifiers performance using cross validation. Four different sets of features were used. Cross validation was performed using 6 partitions (each had 5 training datasets and 5 testing datasets). The errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2 and 4 (confirmed by t-test). 57 Avg. Execution Time Per Channel (Sec.) 2.5 2 1.5 1 0.5 0 RVM SVM NB MAP Set1 Set2 Set3 Feature Set Set4 Figure 3.5: Comparison of the classifier performance based on the average execution time per channel. Cross validation was performed using 6 partitions and 4 feature sets were defined. and the errorbars indicate the standard deviation across the 6 partitions. All classifiers performed similarly for feature sets 1, 2, and 4 (confirmed by t-test). same results. Finally, Figure 3.7 shows the average execution time per channel when the algorithm is run on 35 datasets spanning a period of 88 days. The execution time converges on day 34 of the analysis. Comparing the results I obtained using FMD and MMD, I found the following: • MMD has a higer average execution time per channel and that results from the additional number of comparisons the algorithm has to perform to find the closest matching unit within the “Stability Window ” history. • MMD has a higer accuracy than FMD and that is because the technique aims at finding the closest match to the new unit. I decided to use MMD for the rest of the analysis. 58 Day Day 26 24 17 12 10 7 4 1 26 24 17 12 10 7 4 1 57 uV 109 uV 0.125 ms 0.125 ms 0 0.5 1.0 Time (ms) 1.5 0 0.5 1.0 Time (ms) 1.5 Avg. Execution Time Per Channel (Sec.) Figure 3.6: Two of the profiles tracked by the RVM classifier across 15 datasets. The profiles were recorded on channels 4 and 90 respectively. 14 RVM 12 10 8 6 4 2 0 1 4 7 10 12 17 24 26 34 40 44 46 50 53 57 59 86 88 Day Index Figure 3.7: Average execution time per channel when feature set 4 and the RVM classifier were used. The classifier was run on 35 datasets spanning 88 days. Convergence in the execution time happens on day 34 of the analysis. 59 Table 3.2: Classifiers’ Performance - Percentage of Correct Profiles Set/Class. Set 1 Set 2 Set 3 Set 4 3.7.2 RVM 82.67% 75.98% 26.37% 77.16% SVM 74.80% 74.80% 29.92% 75.19% NB 75.19% 66.53% 16.14% 75.98% MAP 73.22% 65.35% 15.74% 75.19% Stability Analysis Next, I investigated the stability of the units for longer durations. I ran the tracking algorithm on 55 datasets spanning the duration of 5 months. 
Figure 3.8 shows different statistics about the neural population on any given day of the analysis. The number of units was almost constant across days and it shows that we still have not reached the point where the array becomes incapable of detecting spikes on a significant number of channels [25]. The duration after which the spiking activity fades away varries from one subject to another, and the reason behind this fading away is believed to be the formation of a glial scar consisting of reactive astrocytes and microglia around the electrodes site which affects the quality of recording [17]. The slow decay of the curve of the number of units that were recorded on day 1 and recorded on any given day (the blue line) confirms the results in [11] which stated that once a unit becomes stable it is more likely to remain stable. I also plotted the number of units of each of the categories defined in Section 3.2 on any given day in Figure 3.9. As what we expected from observing Figure 3.8, the stable units form the majority of the population. Fluctuations in the total number of recorded units are reflected on the number of stable and unstable units. An increase in the number of units is reflected as an increase in the number of unstable units and a decrease in the number of units is reflected as a decrease in the number of stable units. Furthermore, Figures 3.10, 60 250 Num Units 200 150 Total Day1 units New Dropped 100 50 0 10/26 11/14 01/02 01/23 Date 02/16 03/19 Figure 3.8: For a given day x: the black line shows total number of units recorded on day x, the blue and green lines indicate the number units that were recorded on day 1 and were also recorded or dropped respectively on day x, and the red line shows the number of new units recorded on day x relative to day 1. 3.11, and 3.12 display different samples of the profiles tracked. Finally, I conclude the units tracking analysis by plotting a histogram of the number of contiguous days a unit is recorded on until it drops. The histogram is displayed in Figure 3.13. The mass of the histogram is mainly divided in two regions: • Days < 20 • Days > 140 Again, my results proves that once a unit becomes stable, it is more likely to remain stable for a long time. 3.7.3 Unit Selection We used the algorithm to select the units of five clusters for one subject. We assigned 20 units for each cluster. Figure 3.14 shows the units correlation matrix before and after 61 250 Num Units 200 150 100 Stable PartiallyStable Unstable 50 0 10/26 11/14 01/02 01/23 Date 02/16 03/19 Figure 3.9: Number of stable, partially stable, and unstable units on any given day. An increase in the number of units is reflected as an increase in the number of unstable units and a decrease in the number of units is reflected as a decrease in the number of stable units. applying the balancing algorithm. The figure shows that the correlation within a cluster is higher than that between clusters. The figure also shows that the balancing algorithm only swaps the weakly connected units between clusters and therefore, it maintains the same clusters skeleton resulting from the functional connectivity algorithm. The mapping of the clusters on the physical electrodes is displayed in Figure 3.15. Some of the clusters reflect some degree of localization on the array and this agrees with the fact that neighboring units tend to be functionally connected [48] (because there is a high probability of having common inputs.) 
3.7.4 Replacement Strategies I compared the performance of three replacement strategies using 10 datasets: 1. Finding the most correlated unit to the dropped unit (UnitFuncCon). 62 Figure 3.10: The figure shows 4 sample profiles recorded on channels 30, 32, 49, and 50 respectively. The single-units represented by the profiles were stable throughout the duration of the analysis (5 months) and were recorded on all 55 datasets. Each subplot displays the mean of the average waveforms of the instances of each profile. The color range surrounding the plots repesents the standard deviation across all instances average waveforms. We can see that all 4 single-units experienced very little changes in their average waveforms throughout the duration of the analysis. 63 Figure 3.11: The figure shows 4 sample profiles recorded on channels 23, 25, 29, and 42 respectively. The single-units represented by the profiles were stable throughout the duration of the analysis (5 months) and were recorded on all 55 datasets. Each subplot displays the mean of the average waveforms of the instances of each profile. The color range surrounding the plots repesents the standard deviation across all instances average waveforms. The single-units in this experienced bigger changes in their average waveforms throughout the duration of the analysis than those in Figure 3.10. The algorithm was still able to capture those changes and correctly classify the single-units instances. 64 Figure 3.12: The figure shows 2 sample profiles recorded on channels 44 and 48 respectively. The single-units represented by the profiles were unstable and they were only recorded on 18 and 15 datasets respectively (out of 55 datasets). We can see that the average waveforms of both units are unstable and there is much variability across days. 120 Num of Units 100 80 60 40 20 0 0 50 100 Contiguous Recording Days 150 Figure 3.13: Histogram of the number of contiguous days a unit is likely to be recorded on until it drops. The large peaks on the left and right of the plot demonstrate that once a unit becomes stable, it is likely to remain stable for a long time. 65 1 1 0.9 0.8 Unit Num. 31 0.7 0.6 0.5 53 56 0.4 0.3 78 0.2 0.1 1 31 536 5 Unit Num. 78 0 1 1 0.9 21 0.8 Unit Num. 0.7 41 0.6 0.5 61 0.4 0.3 81 0.2 0.1 1 21 41 61 Unit Num. 81 0 Figure 3.14: The first figure shows the correlation matrix obtained by the functional connectivity algorithm where spectral clustering was applied on the matrix to partition the units into 5 clusters. The numbers on the axes indicated the start of each cluster. It is clear that the clustering algorithm results in unbalanced clusters, this is especially true for cluster 3 that contains only 3 units while cluster 1 contains 30 units. The second figure shows that correlation matrix obtained by balancing the number of units across clusters while taking the functional connectivity between units into consideration. The strongly connected units in the initial clusters remained while the weakly connected ones were transferred to other clusters. 66 Figure 3.15: Mapping of the cluster units on the physical recording array. 67 2. Finding the most correlated unit to the dropped unit using MST as described in Section 3.6.2 (MSTFuncCon). 3. Find the unit that is most functionally connected to the remaining units in the cluster as described in Section 3.6.1 (ClusterFuncCon). The comparison was performed by dropping each unit in the reach cluster and replacing the dropped unit with the replacement unit provided by each strategy. 
Two metrics were used in this comparison: 1. The average of the mean squared error (MSE) between the decoded velocity using the original cluster and that using the cluster with the replacement unit across all units of the cluster. The lower the average MSE, the better the replacement strategy is, because this is an indication that it provides replacement units close to the dropped one. 2. The number of times the strategy provided the replacement unit that has the least MSE among the free units pool (CorRepl). The higher the value of this metric, the better the replacement strategy is. This is because it is indication that the replacement strategy provides the best replacement unit available in the pool of free units a greater of number of times. Tables 3.3 and 3.4 show the performance of the three strategies based on the two metrics. Both UnitFuncCon and MSTFuncCon provided replacement units that gave the lowest average MSE in 40% of the sessions under test. However, in 70% of the sessions MSTFuncCon had the highest number of correct replacements, compared to 50% of the sessions in case of UnitFuncCon (there was a tie in 20% of the sessions). ClusterFuncCon gave the 68 Table 3.3: Replacement Strategies Performance - average MSE between the reach velocity decoded using the original cluster and that decoded using the cluster with a replacement unit across all cluster units. Strat./Day 1 2 UnitFuncCon 0.24 0.29 MSTFuncCon 0.22 0.25 ClusterFuncCon 0.29 0.30 3 4 0.28 0.30 0.29 0.33 0.29 0.28 5 6 0.27 0.23 0.28 0.31 0.28 0.25 7 0.24 0.38 0.28 8 9 0.37 0.41 0.30 0.24 0.29 0.31 10 0.30 0.26 0.31 Table 3.4: Replacement Strategies Performance - the number of times the strategy provided the replacement unit that has the least MSE among the free units pool. Strat./Day 1 UnitFuncCon 8 MSTFuncCon 8 ClusterFuncCon 3 2 6 10 0 3 7 4 0 4 5 2 0 5 5 7 0 6 5 6 1 7 4 1 3 8 3 3 1 9 2 4 0 10 3 5 0 worst performance in both metrics and that is expected since this analysis was done offline (after recording) and in this strategy I was looking for the unit that would be in synchrony with the remaining cluster unit rather than looking for the closest unit to the dropped one. The effect of this choice can be better evaluated online (while the subject is controlling the robotic arm). In Figure 3.16, I show a subset of the results of the offline reach velocity decoding using each of the replacement strategies. The firing rates of the replacement unit were measured during the same interval as those of the dropped one. I also tested the effect of replacing a unit on the decoding of the succeeding datasets and that can be achieved by using the replacement units found on day 1 in all succeeding days as well. Table 3.5 gives the MSE between the velocity decoded using the original cluster and that decoded using the cluster with the replacement unit averaged across all the units of the cluster and Figure 3.17 summarizes the testing results. As can be observed, the average MSE of all replacement strategies fluctuates around a constant value (shown by the short error 69 Original MostCorr MostCorrMst MostCorrFuncConn 1 Decoded reach velocity 0 −1 −2 −3 −4 −5 1.62 1.625 1.63 1.635 1.64 1.645 1.65 Time (50ms bins) 1.655 1.66 1.665 4 x 10 Figure 3.16: A subset of the session recorded on day 1. One unit was dropped and each replacement strategy provided a replacement unit to be used in the offline decoding of the reach velocity instead of the dropped one. 
70 Table 3.5: Replacement Strategies Performance - average MSE between the reach velocity decoded using the original cluster and that decoded using the cluster with a replacement unit across all cluster units - Testing datasets. Strat./Day 1 2 UnitFuncCon 0.25 0.31 MSTFuncCon 0.23 0.29 ClusterFuncCon 0.30 0.32 3 4 0.30 0.29 0.29 0.28 0.31 0.28 5 6 0.30 0.28 0.30 0.28 0.32 0.29 7 0.29 0.29 0.33 8 9 0.31 0.29 0.34 0.31 0.35 0.36 10 0.26 0.26 0.35 0.4 avg. MSE 0.3 0.2 0.1 0 MostCorr MostCorrMst MostFuncConn Repl. Strategy Figure 3.17: The figure summarizes the performance of the three replacement strategies when tested on 9 unseen datasets. The replacement unit in each strategy was selected on the first day and then it was used on the 9 subsequent days. The error bars are the standard deviation of the average MSE across days. bars) and does not deteriorates with time which proves the effectiveness of the replacement strategies. I also performed a t-test to see if one strategy performed better than the other. Both the UnitFuncCon and MSTFuncCon were better than the ClusterFuncCon (p > 0.004 and p > 0.008 respectively). Finally, I show the correlation matrix of the cluster and it corresponding MST on day 1 in Figure 3.18. Using the formula defined in equation 3.2, pairwise correlations are converted √ into distances. This is a non-linear transformation where a zero-correlation is mapped to 2 distance and a unit-correlation is mapped to zero-distance. A correlation of -1 is mapped to 71 a distance equals 2. Therefore, the boundaries of the calculated distance is [0, 2]. The MST construction algorithm is greedy and it includes edges with small weights (high correlation) as long as it does not form a loop in the MST. Looking at the correlation matrix, we can expect certain edges to be present in the MST, such as (5,15) and (5,6) since they have a higher correlation than other edges. 72 0.3 Unit# 5 0.2 10 0.1 15 20 5 10 Unit# 14 1.25 17 16 15 0 20 15 36 1. 1. 33 1 12 .2 9 19 13 1.3 18 10 6 30 1. 3 1 .3 11 1.28 1.29 1 1.28 1.18 1. 16 1.28 0 8 1 40 1. .2 6 2 1 1.30 .3 0 5 9 1.14 4 20 33 1. 7 6 Figure 3.18: Figure a displays the augmented correlation matrix of the cluster units on day 1. Figure b shows the MST formed from the correlation matrix. 73 Chapter 4 Discussion In this chapter, I give a summary of the topics presented in this study, list the advantages and drawbacks of the techniques used, and give directions to future studies and improvements. In this study, I presented a single-unit tracking algorithm that is based on waveform features and firing characteristics of single-units and that does not require the subject to be engaged in any behavioral task. It also requires a short recording time (15 minutes of recording time) to perform the tracking. I also specified a set of criteria to assess the stability of a single-unit and to classify it into one of three classes: stable, partially stable, and unstable. I tracked the stability of single-units over the duration of 150 days where the first session was recorded three months after implant. The number of recorded units was almost constant during this duration and there were many stable units. This shows that the arrays has not reached the stage where it is unable to detect spike on a significant number of its electrodes. 
I also used the stability criteria to boost the performance of the tracking classifier by comparing the newly recorded unit against more than one instance of the previously recorded single units on a given channel and selecting the instance that is closest to the new unit. When the algorithm was tested on eight datasets, the accuracy of classification reached 91% when using all the features I proposed and 90% when using only 37% of the features. 82% of the profiles tracked by the algorithm matched the experts’ manual tracking. Furthermore, the running time was short and the convergence occurred 74 at about 12 seconds per channel. It can also reach 8 seconds per channel at a slightly reduced accuracy. This makes the algorithm suitable for real-time BMI applications such as identifying units at the start of a recording session to be used in brain-control decoding. One possible improvement to the tracking algorithm is to perform a two-level classification paradigm. The current paradigm is based on one classifier that is used for all channels. However, different channels have different properties, for example units might experience more variability on one channel than on others. A human expert can capture the variability of a channel across days by looking at the channel’s waveforms, while an automated method cannot because the training points extracted from this channel would contaminate the classification decision or might be considered outliers. In the two-level classification paradigm, a custom classifier for each channel can be trained using training features extracted from this channel’s units only. The challenges in this paradigm include the need for a large number of training datasets to generate enough data points for classifiers’ training and ways to handle the introduction of new single-units on the channel. These challenges should be addressed before implementing this paradigm. A drawback of my tracking technique is that it does not include any tuning features, which are considered a strong proof of whether the recordings on two different days were from the same unit or not. Although each unit recorded on an electrode has its characteristic waveform, it was shown that the shape of the waveform depends on the relative location of the recording electrode to the cell [20]. It may happen that the electrode lies exactly midway between two cells and the same waveform shape is recorded for both cells. Also, the micromovement occurring in the brain might have a major effect on the shape of the recorded waveform if a different part of the cell body is closer to the recording electode. However, there are some drawbacks in using the tuning features in tracking units. First of all, the 75 subject has to be engaged in a behavioral task and the tuning features depend on this task Second, changes might occurr to the tuning function of units as the subject learns the task. I also proposed a unit selection algorithm for BMI decoding as an application of the unit tracking algorithm. For a fixed decoder, the mapping from the neural space to the kinematic parameters should be fixed, and in order to do so, the same units have to be used with the same decoder coefficients across days. I use the tracking algorithm to identify units at the start of each recording session for this purpose. But then comes the question of how to select the units to use for decoding. I specified four criteria for this and explained the reasoning behind each: 1. 
Units have to be stable to guarantee the reliability and long lasting of those units (based on the results of [11]). 2. Units within a cluster have to be functionally connected because functionally connected units are potential cell assemblies that share common coding. We also hypothesized that using functionally connected units can speed the cortical pasticity reflecting learning of the behavioral task [1]. 3. The number of units across the clusters should be the same to provide the subject with the same degree of control for each degree of freedom by assigning equal number of units across clusters. This way, we also divide the labor equally over units so that the control would not deteriorate in case of a unit drop. 4. Units recorded on the same electrode should be put in the same cluster to control the same degree of freedom. This is due to the high probability of them sharing the same source of input [48] and to minimize the effect of misclassifications in spike sorting and units tracking by making the units still control the same degree of freedom. 76 Some of clusters that resulting from the unit selection algorithm show some degree of localization on the physical recording array. The balancing algorithm also maintained the same connectivity strength in the clusters output from the functional connectivity algorithm and it only swapped the loosely connected units. I also described the details of three strategies that provide replacements for dropped decoding units and they are all based on the augmented correlation matrix resulting from the functional connectivity algorithm. Using an offline analysis, selecting the most correlated replacement to the dropped unit using graph-theory algorithms provided the best results. However, furhter verification of these replacement techniques have to be done online. One possible strategy to replace units is to use a longer lasting signal than the spikes, the local field potentials (LFP). The LFPs are a much slower wave than the APs and they are the aggregate of synchronized activity of the ensemble of neurons relatively away from the voltage sensing device (> 250µ m) [36]. In [39], the authors report that they were able to infer the spiking activity of single-units from LFP in the primary visual cortex (V1). Zhuang et al. in [51] report that LFP can be used to decode hand 3-D position and grasp aperture. LFP can be used to build what might be called a “Pseudo Unit” by training a classifier or an artificial neural network (ANN) to predict the firing rates of the dropped unit from features extracted from the LFP signal. The LFP features include the signal power and phase across different frequency bands. One challenge towards the use of ANN to predict firing rates from LFP is how to quantize the resulting output to reflect the discrete-valed firing rates. K-means clustering is a possible solution to this problem, but it requires the knowledge of the maximum firing rate in advance. Another possible application of the tracking algorithm is to track brain-mediated plasticity. This plasticity occurrs while the subject is learning the task where the tuning functions 77 of units and their functional connections with other units may change over time. However, to do so we need the accurate identification of units which can be provided by the tracking algorithm. Finally, I point out that the techniques discussed here promotes the practicality of BMI system. 
Using the tracking algorithm, units can be accurately identified across days and this enables us to maintain a fixed mapping between the neuronal space and decoded movement parameters (i.e. fixed decoder coefficients). In addition, using the replacement strategies discussed, the system can react automatically to units dropping and maintain a relatively fixed mapping that is unnoticeable to the subject. This alleviates the need of a human operator that performs decoder calibration on a daily basis. Also, since the tracking algorithm does consume much time for its operation, any automated decoder calibration can run fast. 78 APPENDICES 79 Peak Matching The peak matching finds the distance between two signals using the maximum and minimum peaks in both signals. The purpose of the peak matching is to provide comparable results to those given by an expert. It focuses on the features an expert pays attention to in comparing signals and the main ones of those are the maximal and minimal peaks. Preprocessing Cubic spline interpolation is performed on the signal to generate more data points. This aims at considering the signal a continuous one. Algorithm The algorithm finds the peaks of both signals, assigns weights to them (based on their shape) and finally quantifies the pairwise distance between peaks from both signals and summarizes them in one final distance. Given two signals, H(x) and L(x), each of length n, the similarity of H(x) to L(x), ˜(H, L), is: r ˜(H, L) = K1 (H, L)K2 (H, L) r 80 (1) where ˜(H, L) is the asymmetric similarity of H to L, and K1 (H, L) and K2 (H, L) are factors r defined in Equations 2 and 4. The first factor, K1 (H, L), quantizes the absolute distance between the two signals using the Manhattan distance and can be viewed as a scaling factor: ( K1 (H, L) = exp S(H, L) ¯ S ) (2) ¯ where S is a parameter and S(H, L) is the Manhattan distance between the two signals and is defined as: S(H, L) = n ∑ |H(x) − L(x)| (3) x=1 where H(x) and L(x) are the xth points in signals H and L respectively. The second factor, K2 (H, L), finds a weighted distance between the peaks of the two signals using some heuristic measures. K2 (H, L) = ∑ µi (H)ci (H, L) (4) i where µi (H)’s are a set weights given to the peaks of H(x) (see Section ) and ci (H, L)’s are heuristic closeness factors between peaks in H(x) and the closest peaks to them in L(x) (see Section ). Peak Weights Calculation In this section, I show how the weights given to peaks of a signal are calculated. The value of the weight given to a peak depends on its height compared to its neighboring points. 81 The first and second derivates of a signal H(x) are defined as: H (1) (x) = δH(x)/δx (5) H (2) (x) = δ 2 H(x)/δx2 (6) A peak at a point xi is defined as a point in which the second derivative has a local minimum (this definition includes “shoulders” as well). Define also two points related to each lef t peak, xi right and xi as the nearest points on the left and right of each peak respectively where the second derivate is zero. Finally, the weight of a peak at point xi is: H (2) (xi )li µi = ∑ η(H) (7) (2) j=1 H (xj )lj lef t where li is the maximal distance from the points on the signal in the interval [xi lef t and the line joining xi right and xi right , xi ] . η(H) is the total number of peaks in signal H. 
Closeness Factor Calculation The closeness factor (similarity) between peak i in signal H(x) and peak j in signal L(x), ˜ij , is the multiplication of three factors: c y f orm ˜ij = cx cij cij c ij 82 (8) The first factor is the distance between the x-location of the two peaks: { ( ) } △xij 2 cx = exp − ij δx (9) where δx is a parameter, and △xij = xi − xj ˜ (10) xi is the x-location of peak i in signal H(x), and xj is the x-location of peak j in signal ˜ L(x). Similarly, the second factor is the distance between the y-location of the two peaks: { ( ) } △yij 2 y cij = exp − δy (11) where δy is a parametre, and △yij = H(xi ) − L(˜j ) x (12) Finally, the third factor aims at finding the distance between the shapes of the two peaks: f orm = exp cij { fij ν ¯ − η(H) + η(L) 83 } (13) lef t H (1) (xi fij = lef t H (1) (xi H (1) (xi right H (1) (xi right + (xi right (xi lef t − xi lef t − xi lef t ) + L(1) (˜j x right + lef t ) − L(1) (˜j x ) + ε1 right ) − L(1) (˜j x right ) + L(1) (˜j x right ) − (˜j x right ) + (˜j x lef t − xj ˜ ) ) + ε1 lef t − xj ˜ ) ) (14) ) + ε2 where ε1 and ε2 are parameters. The closeness factor of a peak is defined as the maximum closeness factor it has with any of the peaks in the other signal: ci = max(˜ij ) c j∈J (15) The calculated similarity is asymmetric and the symmetric peak matching similarity between two signals is calculated as the geometric mean of the asymmetric ones: r(H, L) = √ ˜(H, L)˜(L, H) r r (16) and the peak matching distance is: D(H, L) = 1 − r(H, L) (17) Figure 1 shows an illustration of the previous steps to calculate the peak matching similarity of one signal to another. 84 Figure 1: The figure shows the steps of calculating the peak matching similarity of a signal x (blue) to another signal y (red). Both x and y have two peaks. Weights are given to the peaks of x, µ1 and µ2 . Closeness factors are calculated between pairwise peaks in both signals. In this case, there are 4 closeness factors to calculate (c11 , c12 , c21 , and c22 ). For each signal in x, we select the maximum closeness factor it has with any of the peaks in y. We end up with 2 factors, c1 and c2 , one for each peak in x. The total similarity of x to y is then the sum of the peak weights multiplied by the closeness factors for all peaks of x (i.e. µ1 c1 + µ2 c2 ). 85 Functional Connectivity Algorithm The following are the details of computing each step of the functional connectivity algorithm used in Section 3.3.1. The details below are mostly adapted from [13]. At any given timescale j, the spike train of a neuron p can be expressed as: j sp (t) = W (j) sp (t), j = 0, 1...J (18) where sp (t) is the spike train (0’s and 1’s)of neuron p, W (j) is a lumped matrix representing the Haar wavelet transform and J is the total number of timescales. For instance, equation shows the Haar wavelet transform lumped matrix for the first time scale, W (1) .   0 0 0 0 0 0  1/2 1/2      0 0 1/2 1/2 0 0 0 0         0 0 0 0 1/2 1/2 0 0        0 0 0 0 0 1/2 1/2  √  0  W (1) = 2    1/2 −1/2 0 0 0 0 0 0         0 0 1/2 −1/2 0 0 0 0         0 0 0 0 1/2 −1/2 0 0      0 0 0 0 0 0 1/2 −1/2 86 “Multiplying by the top half of W (1) by the spike train sp gives the probability of firing in successive bins that are two samples in size. 
Appendix B: Functional Connectivity Algorithm

The following are the details of each step of the functional connectivity algorithm used in Section 3.3.1. The details below are mostly adapted from [13].

At any given timescale j, the spike train of a neuron p can be expressed as

$$s_p^j(t) = W^{(j)} s_p(t), \quad j = 0, 1, \ldots, J \tag{18}$$

where $s_p(t)$ is the spike train (0's and 1's) of neuron p, $W^{(j)}$ is a lumped matrix representing the Haar wavelet transform, and J is the total number of timescales. For instance, the lumped Haar wavelet transform matrix for the first timescale, $W^{(1)}$, is

$$W^{(1)} = \sqrt{2} \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\ 1/2 & -1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & -1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1/2 & -1/2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & -1/2 \end{pmatrix}$$

"Multiplying the top half of $W^{(1)}$ by the spike train $s_p$ gives the probability of firing in successive bins that are two samples in size. Multiplying the bottom half of $W^{(1)}$ by the spike train $s_p$ gives the relative change in the firing pattern in successive bins that are two samples in size."

The sample cross-correlation between neurons p and q at timescale j can be expressed as

$$\sigma_{pq}^{j}(t_i, t_j) = E\{ s_p^j(t_i)\, s_q^j(t_j) \} \tag{19}$$

The full correlation matrix at timescale j, $\Sigma^{(j)}$, can be formed using Equation 19 for all $p, q \in \{1, \ldots, P\}$, where P is the number of neurons in the population. A block diagonal matrix can then be formed as

$$R = \begin{pmatrix} \Sigma^{(0)} & & & \\ & \Sigma^{(1)} & & \\ & & \ddots & \\ & & & \Sigma^{(J)} \end{pmatrix} \tag{20}$$

Next, R is decomposed using the SVD:

$$R = \sum_{i=1}^{Q} \lambda_i u_i u_i^T \tag{21}$$

and finally, the pairwise similarity (augmented correlation) of the neurons is obtained using a weighted sum of the first few dominant modes of R.
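As a rough illustration of this pipeline, the Python sketch below builds the multiscale correlation blocks, stacks them into R, and extracts a low-rank similarity. It is a loose approximation under stated assumptions, not the algorithm of [13]: the function names are mine, the per-timescale coarsening reuses only the averaging (top) half of the Haar matrix, Equation 19 is estimated by a simple time average, and the final collapse of the per-timescale diagonal blocks into a single P x P matrix is a simplification of the mode-weighting described above.

import numpy as np
from scipy.linalg import block_diag

def coarsen(s):
    """One Haar smoothing step: the averaging (top) half of W^(1) applied to
    each row, i.e., sqrt(2)/2 times the sum of successive bin pairs."""
    T = s.shape[1] - (s.shape[1] % 2)             # drop an odd trailing bin
    return np.sqrt(2) * 0.5 * (s[:, 0:T:2] + s[:, 1:T:2])

def augmented_correlation(spikes, J=3, Q=5):
    """spikes: P x T binary matrix, one row per neuron. Returns a P x P
    pairwise similarity built from the first Q SVD modes of R (Eqs. 18-21)."""
    s = spikes.astype(float)
    P = s.shape[0]
    blocks = []
    for j in range(J + 1):
        blocks.append(s @ s.T / s.shape[1])       # Sigma^(j): time-averaged Eq. 19
        s = coarsen(s)                            # spike trains at the next timescale
    R = block_diag(*blocks)                       # Eq. 20
    U, lam, _ = np.linalg.svd(R)                  # modes of R, Eq. 21
    R_hat = (U[:, :Q] * lam[:Q]) @ U[:, :Q].T     # weighted sum of dominant modes
    # collapse the per-timescale diagonal blocks into one P x P similarity;
    # this averaging step is a simplifying assumption of the sketch
    return sum(R_hat[j * P:(j + 1) * P, j * P:(j + 1) * P]
               for j in range(J + 1)) / (J + 1)

For example, augmented_correlation(np.random.binomial(1, 0.1, (20, 1024))) returns a 20 x 20 similarity matrix for twenty surrogate spike trains of 1024 bins each.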
BIBLIOGRAPHY

[1] Ahissar, Vaadia, Ahissar, Bergman, Arieli, and Abeles, Dependence of cortical plasticity on correlated activity of single neurons and on behavioral context, Science 257 (1992), no. 5075, 1412–1415.

[2] Balasubramanian, Southerland, Vaidya, Qian, Eleryan, Fagg, Slutzky, Oweiss, and Hatsopoulos, Operant conditioning of a multiple degree-of-freedom brain-machine interface in a primate model of amputation, Engineering in Medicine and Biology Society, 2013. EMBS'13. 35th Annual International Conference of the IEEE, IEEE, 2013, to appear.

[3] Bhattacharyya, On a measure of divergence between two statistical populations defined by their probability distributions, Bull. Calcutta Math. Soc. 35 (1943), no. 99-109, 4.

[4] Bishop, Pattern recognition and machine learning, vol. 1, Springer New York, 2006.

[5] Borgatti, Multivariate statistics, University Lecture, 2007.

[6] Carmena, Lebedev, Henriquez, and Nicolelis, Stable ensemble performance with single-neuron variability during reaching movements in primates, The Journal of Neuroscience 25 (2005), no. 46, 10712–10716.

[7] Chen and Fetz, Characteristic membrane potential trajectories in primate sensorimotor cortex neurons recorded in vivo, Journal of Neurophysiology 94 (2005), no. 4, 2713–2725.

[8] Chestek, Batista, Santhanam, Byron, Afshar, Cunningham, Gilja, Ryu, Churchland, and Shenoy, Single-neuron stability during repeated reaching in macaque premotor cortex, The Journal of Neuroscience 27 (2007), no. 40, 10742–10750.

[9] Chestek, Gilja, Nuyujukian, Foster, Fan, Kaufman, Churchland, Rivera-Alvidrez, Cunningham, Ryu, et al., Long-term stability of neural prosthetic control signals from silicon cortical arrays in rhesus macaque motor cortex, Journal of Neural Engineering 8 (2011), no. 4, 045005.

[10] Dayan and Abbott, Theoretical neuroscience: Computational and mathematical modeling of neural systems, Taylor & Francis, 2001.

[11] Dickey, Suminski, Amit, and Hatsopoulos, Single-unit stability using chronically implanted multielectrode arrays, Journal of Neurophysiology 102 (2009), no. 2, 1331–1339.

[12] Domingos and Pazzani, On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning 29 (1997), no. 2-3, 103–130.

[13] Eldawlatly, Jin, and Oweiss, Identifying functional connectivity in large-scale neural ensemble recordings: a multiscale data mining approach, Neural Computation 21 (2009), no. 2, 450–477.

[14] Fraser and Schwartz, Recording from the same neurons chronically in motor cortex, Journal of Neurophysiology 107 (2012), no. 7, 1970–1978.

[15] Ganguly and Carmena, Emergence of a stable cortical map for neuroprosthetic control, PLoS Biology 7 (2009), no. 7, e1000153.

[16] Gilletti and Muthuswamy, Brain micromotion around implants in the rodent somatosensory cortex, Journal of Neural Engineering 3 (2006), no. 3, 189.

[17] Griffith and Humphrey, Long-term gliosis around chronically implanted platinum electrodes in the rhesus macaque motor cortex, Neuroscience Letters 406 (2006), no. 1, 81–86.

[18] Hatsopoulos and Donoghue, The science of neural interface systems, Annual Review of Neuroscience 32 (2009), 249.

[19] Hazewinkel, Encyclopaedia of mathematics, supplement III, vol. 13, Springer, 2001.

[20] Henze, Borhegyi, Csicsvari, Mamiya, Harris, and Buzsáki, Intracellular features predicted by extracellular recordings in the hippocampus in vivo, Journal of Neurophysiology 84 (2000), no. 1, 390–400.

[21] Hikosaka, Nakamura, Sakai, and Nakahara, Central mechanisms of motor skill learning, Current Opinion in Neurobiology 12 (2002), no. 2, 217–222.

[22] Humphries, Spike-train communities: finding groups of similar spike trains, The Journal of Neuroscience 31 (2011), no. 6, 2321–2336.

[23] Jackson and Fetz, Compact movable microwire array for long-term chronic unit recording in cerebral cortex of primates, Journal of Neurophysiology 98 (2007), no. 5, 3109–3118.

[24] Jackson, Mavoori, and Fetz, Long-term motor cortex plasticity induced by an electronic neural implant, Nature 444 (2006), no. 7115, 56–60.

[25] Krüger, Caruana, Dalla Volta, and Rizzolatti, Seven years of recording from monkey cortex with a chronically implanted multiple microelectrode, Frontiers in Neuroengineering 3 (2010).

[26] Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem, Proceedings of the American Mathematical Society 7 (1956), no. 1, 48–50.

[27] Kullback and Leibler, On information and sufficiency, The Annals of Mathematical Statistics 22 (1951), no. 1, 79–86.

[28] Leiserson, Rivest, Stein, and Cormen, Introduction to algorithms, The MIT Press, 2001.

[29] Linderman, Gilja, Santhanam, Afshar, Ryu, Meng, and Shenoy, Neural recording stability of chronic electrode arrays in freely behaving primates, Engineering in Medicine and Biology Society, 2006. EMBS'06. 28th Annual International Conference of the IEEE, IEEE, 2006, pp. 4387–4391.

[30] Liu, McCreery, Carter, Bullara, Yuen, and Agnew, Stability of the interface between neural tissue and chronically implanted intracortical microelectrodes, Rehabilitation Engineering, IEEE Transactions on 7 (1999), no. 3, 315–326.

[31] Lütcke, Margolis, and Helmchen, Steady or changing? Long-term monitoring of neuronal population activity, Trends in Neurosciences (2013).

[32] Von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (2007), no. 4, 395–416.

[33] Manning et al., Introduction to information retrieval, vol. 1, Cambridge University Press, 2008.

[34] Nicolelis, Dimitrov, Carmena, Crist, Lehew, Kralik, and Wise, Chronic, multisite, multielectrode recordings in macaque monkeys, Proceedings of the National Academy of Sciences 100 (2003), no. 19, 11041–11046.

[35] Nicolelis and Lebedev, Principles of neural ensemble physiology underlying the operation of brain–machine interfaces, Nature Reviews Neuroscience 10 (2009), no. 7, 530–540.

[36] Oweiss (ed.), Statistical signal processing for neuroscience and neurotechnology, 1st ed., Academic Press, 2010.

[37] Polikov, Tresco, and Reichert, Response of brain tissue to chronically implanted neural electrodes, Journal of Neuroscience Methods 148 (2005), no. 1, 1–18.
[38] Prim, Shortest connection networks and some generalizations, Bell System Technical Journal 36 (1957), no. 6, 1389–1401.

[39] Rasch, Gretton, Murayama, Maass, and Logothetis, Inferring spike trains from local field potentials, Journal of Neurophysiology 99 (2008), no. 3, 1461–1476.

[40] Reich, Mechler, Purpura, and Victor, Interspike intervals, receptive fields, and information encoding in primary visual cortex, The Journal of Neuroscience 20 (2000), no. 5, 1964–1974.

[41] Rousche and Normann, Chronic recording capability of the Utah intracortical electrode array in cat sensory cortex, Journal of Neuroscience Methods 82 (1998), no. 1, 1–15.

[42] Salkind, Encyclopedia of research design, vol. 1, Sage Publications, Incorporated, 2010.

[43] Sedgewick, Algorithms in Java, parts 1–4, vol. 1, Addison-Wesley Professional, 2002.

[44] Strelkov, A new similarity measure for histogram comparison and its application in time series analysis, Pattern Recognition Letters 29 (2008), no. 13, 1768–1774.

[45] Suner, Fellows, Vargas-Irwin, Nakata, and Donoghue, Reliability of signals from a chronically implanted, silicon-based electrode array in non-human primate primary motor cortex, Neural Systems and Rehabilitation Engineering, IEEE Transactions on 13 (2005), no. 4, 524–541.

[46] Tolias, Ecker, Siapas, Hoenselaar, Keliris, and Logothetis, Recording chronically from the same neurons in awake, behaving primates, Journal of Neurophysiology 98 (2007), no. 6, 3780–3790.

[47] Peter Torrione, Sam Keene, and Kenneth Morton, PRT: The pattern recognition toolbox for MATLAB, 2011, Software available at http://newfolderconsulting.com/prt.

[48] Vaadia, Haalman, Abeles, Bergman, Prut, Slovin, and Aertsen, Dynamics of neuronal interactions in monkey cortex in relation to behavioural events, Nature 373 (1995), no. 6514, 515–518.

[49] Warland, Reinagel, and Meister, Decoding visual information from a population of retinal ganglion cells, Journal of Neurophysiology 78 (1997), no. 5, 2336–2350.

[50] Wikipedia, Minimum spanning tree, July 2013.

[51] Zhuang, Truccolo, Vargas-Irwin, and Donoghue, Decoding 3-D reach and grasp kinematics from high-frequency local field potentials in primate primary motor cortex, Biomedical Engineering, IEEE Transactions on 57 (2010), no. 7, 1774–1784.

[52] Zumsteg, Kemere, O'Driscoll, Santhanam, Ahmed, Shenoy, and Meng, Power feasibility of implantable digital spike sorting circuits for neural prosthetic systems, Neural Systems and Rehabilitation Engineering, IEEE Transactions on 13 (2005), no. 3, 272–279.