c5 ‘1? E This is to certify that the thesis entitled Methods for Neural Signal Processing and Analysis presented by Yasir Suhail LIBRARY Michigan State Umversity has been accepted towards fulfillment of the requirements for the MS. degree in Electrical Engineering War Professor/s Signature 0:3 1/ z 3 ,1 o f Date MSU is an Affinnative Action/Equal Opportunity Institution PLACE IN RETURN Box to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECAILED with earlier due date if requested. DATE DUE DATE DUE DATE DUE Pitt» we 2/05 c:/ClRC/DateDue.ind¢p.15 Methods for Neural Signal Processing and Analysis By Yasir Suhail A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Department of Electrical and Computer Engineering 2005 ABSTRACT Methods for Neural Signal Processing and Analysis By Yasir Suhail High-density microelectrode arrays offer indispensable tools for neuroscience re- search and potential benefits for neuroprosthetic applications at the micro and nano scales. Advances in microfabrication and microelectronics have enabled integration of large number of electrodes on a single device with modest signal processing capa- bilities, which in turn triggered many new scientific discoveries in the neuroscience community. From a system viewpoint, these rapid developments present numerous challenges to the associated data processing and analysis stages. In this work, we present algorithms and performance analyses of a neural interface system designed for large scale processing of the neural data obtained. We evaluate the performance and tradeoffs of the system’s computational complexity versus data precision and signal fidelity during real-time telemetry transmission. We also evaluate perfor- mance tradeoffs between single and multiple channel signal detection techniques. On the data mining aspect, we present a new multiscale processing and clustering technique to infer functional interdependency between populations of neurons from the recorded mixture to ease the identification of local and global biological neural circuits. Copyright © by Yasir Suhail 2005 To my parents and my sister iv ACKNOWLEDGEMENTS I would like to express my gratitude to my thesis advisor Dr. Karim G. Oweiss for his constant guidance, support and encouragement throughout my two years of graduate study. I must credit him with finding time whenever it was needed for discussing any results and progress. His support has been indispensable not only for this work, but also for my graduate study and career. I also wish to record the support and camaraderie of my lab mates and fellow graduate students, notably Kyle Thomson, Lance Slifka, and April Thompson which made it easier to get through the work and enjoy it. I have benefitted greatly from the company of many people at Michigan State, most notably Professors Hayder Radha and Selin Aviyente who have taught me most of what I know of Signal Processing, Information Theory, and Statistics. Professor Rong Jin has been very helpful with his collaboration in the area of probabilistic clustering. Michigan State University hosts one of the friendliest people to learn from and spend time with. A diverse community of wonderful students within the Department and the University in general have been a bedrock of support and have made my stay memorable. I would not have been able to achieve any of this without the unconditional love, support, and good wishes of my parents and my little sister. I do not have the words to express my feelings and appreciation to them. Most of all, I thank the Almighty for bestowing on me these blessings so I could comfortably engage myself in the pursuit of knowledge that I have enjoyed so much. vi TABLE OF CONTENTS LIST OF TABLES LIST OF FIGURES 1 Wavelet Transform Block for Implantable Neural Interfaces 1.1 Introduction ............................... 1.2 Theory .................................. 1.2.1 The Traditional Mallat’s Algorithm .............. 1.2.2 The Lifting Approach ...................... 1.2.3 Integer Lifting DWT ...................... 1.2.4 Quantization of the Filter Coefficients ............ 1.2.5 Wavelet Thresholding ...................... 1.2.6 Computation and Data Rate Savings ............. 1.3 Methodology .............................. 1.4 Results .................................. 1.5 Conclusion ................................ 2 Spike Detection 2.1 Introduction ............................... 2.2 Data model ............................... 2.3 Wavelet domain detection ....................... 2.3.1 The Stationary Wavelet Packet Tree ............. 2.4 Analysis of the Bayesian Tests for Spike Detection .......... 2.4.1 Likelihood Ratio Test for the known signal case ....... CDOOOOO'IuD-iiki—il-t i—Ii—ii—ar—a \IOOMM 19 19 29 29 2.4.2 Likelihood Ratio Test for the unknown signal case ...... 2.5 Results. . . 2.6 Conclusions ooooooooooooooooooooooooooooooo 3 Clustering Neural Spike Trains 3.1 Introduction ............................... 3.1.1 Background ........................... 3.2 Simulation Method ........................... 3.2.1 Proposed Technique ...................... 3.3 Multi-resolution Approach ....................... 3.4 A Principal Components Approach for Multi-scale Clustering . . . . 3.5 Graph Theoretic Clustering ...................... 3.6 Clustering Results ........................... 3.7 Conclusions BIBLIOGRAPHY ooooooooooooooooooooooooooooooo viii 36 44 49 51 51 52 54 55 56 65 66 69 74 76 1.1 2.1 3.1 3.2 LIST OF TABLES Update and Prediction filter coefficients for symmlet 4. ....... 8 Signal to Noise Ratio in the different SWPT nodes. ......... 29 Eigenvalues associated with the Principal Components ....... 66 Clustering Errors obtained in the k-means algorithm ........ 72 ix 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 2.1 2.2 2.3 2.4 2.5 LIST OF FIGURES The basic analysis block of the pyramidal algorithm proposed by Mallat[10] for 3 decomposition levels. ................. Lifting scheme for computing a single level of DWT decomposition. Comparison of denoising performance for various precisions of filter coefficients for a sample channel of a 4-channel array with the data quantized to 10 bits. . . . '. ...................... Denoising performance for various wavelet transform computations. Performance in the distance metric of the signal subspace for data denoised using various wavelet transform computations ........ Performance in the distance metric of the signal subspace for data denoised using various wavelet transform computations ........ Example spike waveforms obtained with DWT computation with different filter coefficient bit widths ................... Effect of round off and quantization errors on the signal fidelity as a function of multiplier complexity. ................... The Stationary Wavelet Packet Tfee. ................. A single channel represented in different SWPT nodes. ....... Receiver operating Characteristics for Array and singe channel de- tection ................................... Receiver operating Characteristics for Array and singe channel de- tection ................................... ROC curves for Multiresoution and Time domain detection for a single channel data at an SN R of 7.6024. ............... 15 16 16 17 25 28 41 43 45 2.6 ROC curves for Multiresoution and Time domain detection for a single channel data at an SNR of 8.7623. ............... 2.7 ROC curves for Multiresoution and Time domain detection for a single channel data at an SNR of 10.857. ............... 2.8 ROC curves for Multiresoution and Time domain detection for a two channel array at an SNR of 8.1177. .................. 2.9 ROC curves for Multiresoution and Time domain detection for a two channel array at an SNR of 6.9579. .................. 2.10 ROC curves for Multiresoution and Time domain detection for a two channel array at an SNR of 10.2124. ................. 2.11 Noisy data on one channel used to detect spikes ............ 2.12 The clean spike data to be detected. ................. 2.13 A clean and noisy spike example. ................... 3.1 The Haar basis approximation function. ............... 3.2 The Haar basis detail function. .................... 3.3 The Wavelet 'Iree representation to 5 levels. ............. 3.4 Correlations in different subbands. .................. 3.5 Principal Components of correlations .................. 3.6 Fused profile of the first 4 principal components ............ 3.7 The true clusters depicted in the first two components resolved from the fused matrix D ............................ 3.8 Result of the k—means algorithm applied to the first two principal components of the fused matrix ..................... xi 45 46 46 47 48 48 49 59 64 67 70 71 73 CHAPTER 1 Wavelet Transform Block for Implantable Neural Interfaces 1 . 1 Introduction Electrical recordings from neural tissue give direct observations of the neural activ- ity. Compared to Electroencephelogram (EEG) recordings recordings or chemical detection of neurotransmitters and neuromodulators, invasive recordings from elec- trodes placed inside the neural tissue provide data with high temporal (of the order of 30 kHz) and spatial (micron range) resolution. Cellular recordings from neurons have been used to study the neural systems such as the motor, somatosensory, and visual cortical areas of the brain. With the advances in micro fabrication, it is possible to manufacture dense arrays of microeletrodes[1]. These arrays can record from a large number of neu- rons and provide high volumes of data for further analysis. For artificially grown neural tissue, it is possible to have relatively long term cellular recordings from individual neurons through techniques such as patch clamping[2] and directed neu- ral growth[3]. ln-situ recordings from animals, however is more complicated for various reasons. The recorded data is corrupted to a large extent with noise from the neighboring tissue, and the signal conditions evolve over time. Another prob- lem is the actual transmission and processing of this information. The traditional technique relies on amplifiers situated on the recording probe relaying the recorded signals over wires to a bank of filters, analog to digital converters, and a computer for signal processing. This complicates things for experiments in live, freely behav- ing animals. It is not always possible to have bulky wired connections to a large data processing system. The opportunities for clinical neural prosthetic devices also present similar challenges. These devices have the potential to help patients with motor disorders and spinal cord injuries lead more normal lives. Our motivation is the development of on-chip signal processing and wireless data transmission for high density, implantable neural recording probes. Researchers have tried to move towards wireless telemetry[4] to make the neural probes less obtrusive and suitable for both long term behavior experiments and prosthetic devices. It is clear that complex signal processing is required on this data for compression, detection of action potentials and local field potentials, separation of neural sources, and decoding the neural signals. Passive probes transmitting all the data wirelessly face a deluge of data. A modestly sized array with 96 electrodes recording at a sampling rate of 25kHz per channel to a precision of 12bits / sample, required bit rate would be 29 Mbps. Further complicating this problem is that the biological environment places stringent limits on the size and power availability for such devices. We focus on the development of on-chip signal processing prior to data trans- mission to extract some valuable information and reduce the telemetry bandwidth requirements. Studies have shown that on-chip processing may reduce the data throughput to approximately 75% for bursting activity and 85-90% for spontaneous activity and reasonable bit rates are achieved for a 32 channel device sampling at 25 kHz, with 8 bit sample quantization[5]. This technique relied heavily on the Discrete Wavelet Transform (DWT) [6] The Discrete Wavelet Transform is also the basis of many algorithms for spike detection[7], noise reduction[8], spike sorting[9] of multichannel microelectrode recordings, including those presented in this thesis. On-chip DWT computation of the recorded data will be an essential component of any implantable electrode recording array with intelligent wireless transmission of data. With the stringent power and ares limitations imposed on the system, a suitable implementation of the DWT is paramount. The use of the lifting technique for DWT computation results in considerable savings in computational hardware, memory, and bandwidth requirements. 1 .2 Theory The compaction property of the wavelet transform, which expresses the signal in a few larger coefficients while expressing the noise in many small coefficients is used in most of the algorithms for neural signal processing. Discarding the smaller coefficients results in denoising[8]. Also, the fact that the signal of interest (neural spikes) occupy a small number of coefficients is beneficial for compression strategies. 1.2.1 The Traditional Mallat’s Algorithm The traditional Mallat’s algorithm[10] for evaluating the wavelet transform of a given signal involves recursively convolving the signal through two decomposition filters h and g, and decimating the result to obtain the approximation and detail coefficients at every decomposition level j as illustrated in Fig.1.1. These filters l and h are the low pass scaling function and high pass wavelet functions respectively. The approximation and detail coefficients at any arbitrary level are given by . K-1 . a-7(z') = Z aJ—1(2z' — km, k=0 K—l dj(i) = Z aj_1(2z’— k)hk (1.1) 1:20 where K is the length of the filters, also known as the support of the wavelet dj-l-l IQ a, Kn. A Ht?) t2 r:— dj+3 - L(:) - t2 ~—~ “1 as (: t2 ~3- L(.:) 4,2 L—g a HZ) l2”— IQ j+3 IQ Figure 1.1. The basic analysis block of the pyramidal algorithm proposed by Mallat for 3 decomposition levels. function. 1.2.2 The Lifting Approach From the computational standpoint, the lifting scheme for DWT computation has shown numerous benefits over traditional DWT computation[11]. It is a fast and efficient mechanism for implementing the DWT. For asymptotically long filters, the computational cost of the lifting algorithm is one half of the cost of the standard algorithm, and it requires only iii-place computations. It can also be adapted to a lossless integer—to—integer transform which does not require any floating point computations. The lifting scheme relies on factorizing the l and h filters to obtain the filters S N and TN as hit/E) + law—7) W?) + M?) ' 2 hffi) - h(\/:E) 2 l(x/E) - INT-3) _ 2 p K 0 0 K 0 I SN(Z) I 2 0 I TN(z) 1 d 1 d (1.2) l 0 T1(Z) 1 In this scheme, there is no need to compute the two different discrete convolu- tions for the whole data record. The memory required is the same as that required to store the data, while Mallat’s algorithm requires twice as much memory as the initial data to compute the wavelet transform. In addition to these savings, an integer approximation of the data and quantization of the filter coefficients further reduces the computational load. _f_. 1. JD ‘1’ —_——i Split 71(2) 81(2) ... 1,, ‘ fl SN(z) —1 8." G j+l Figure 1.2. Lifting scheme for computing a single level of DWT decomposition. The lifting scheme of Fig.1.2 replaces one stage of the Mallat’s algorithm shown in Fig.1.1. It is based on three steps: splitting the data into even and odd samples, predicting the odd samples from the even samples, and updating the even samples to obtain the approximation coefficients. For our hardware implementation with strict power and area constraints, we con- sider an integer to integer wavelet transform[12]. Here, the first factor in Eq. (1.3) is discarded, and the realized transform is in fact a scaled version of the original wavelet and scaling functions. The symmleM wavelet basis has been found suitable for processing of neural recordings[9]. Factoring the scaling and wavelet functions for this basis results in the following steps to be used in the scheme of Fig.1.2. stepl: gilt]: lgolil+01folill step2= filil = [folil+C2gilil+C3glli+1lJ step3= 92M = lgilil +C4f1lil +05f1li-1ll (13) steel: alil = lfllil+0692lil+0792li-lll step5 : d[i] = [g2[i] + Cga[i + 1]] where the input registers go[i] and f0[i], are the pair of even and odd samples of the signal, a[i] and d[i] are a pair of approximation and detail coefficients computed, and [.j denotes the floor function. Note that we truncate the result at every stage, since we wish to implement the integer version of the transform. The constants 01,02, ..., C8, which are the coefficients of the filters Tn(z) and Table 1.1. Update and Prediction filter coefficients for symmlct 4. Coefficient Value Coefficient. Value C1 0.39114 C5 0.16203 Cg -0.12439 CG 0.43128 C3 -0.33924 C7 0.43128 C4 -1.41951 C8 -1.04925 571(2) are given given in Table 1.1. 1.2.3 Integer Lifting DWT The data recorded at any channel is scaled and quantized to obtain data sam- ples within a 1—bit integer precision. The DWT is then computed using the lift- ing scheme described in Eq. (1.3) to the required number of decomposition levels. The last step (of the corresponding floating point lifting) of multiplication by the constants K and K ‘1 (in Eq 1.3) is omitted. Hence, the dynamic range of the transform at each level will change by K. 1.2.4 Quantization of the Filter Coefficients The filter coefficients C1 through C8 have real values given in Table 1.1. However, since we are working with 10-bit integer data, we wish to remove all the floating point operations from the DWT computation strategy. The filter coefficients are therefore quantized to a fixed number of bits. We have experimented with 4, 6, 8, 10, and 12-bit precision of the lifting factorized coefficients. Quantization to a small number of bits results in a significant reduction in the number of additions and shift operations required for each multiplication in Eq. (1.3). 1.2.5 Wavelet Thresholding The DWT representation of neural signals is suited for noise reduction as the sig- nal components are compactly represented by few large amplitude coefficients be- cause of the inherent similarity between the wavelet basis functions and the spike waveforms[13], and the noise is spread across all decomposition levels in many small amplitude coefficients. This allows thresholding[14] the small coefficients to take place and simultaneously reduces the number of coefficients to code for extracuta— neous transmission. The coefficients that are thresholded can be run-length encoded to give significant savings in bandwidth[5]. We formulate a measures to quantitatively assess the performance of our DWT module for typical neural signal processing problems. The Signal to Noise Ratio of the recorded data at any stage of processing is defined as Average Spike Peak to Peak Amplitude SNR (dB) = 20109.10 Noise RMS (1.4) This definition of the SN R is relevant for spike detection in noisy data as dis- cussed in Chapter 2. We also wish to evaluate a measure of neural source separability in multiunit array recordings. The wavelet based MASSIT algorithm[9] for spike sorting relies on the invariance of the signal subspace in the “best” subbands of the DWT expansion. The approximations performed in the DWT module design should not compromise the accuracy of the spike sorting stage. To compute the eigenvectors for the MASSIT spike sorter, we model the ob- served spike waveform across multiple channels as Y=X+Z am where Y E ERMXN denotes the observed spike in the interval [1, . . .,N] across an array of M channels, X = apsg is the noise free observed spike across the array such that the column vector ap E ERMXI describes the array response to the spike waveform Sp 6 RN X1 elicited by neuron p, and Z 6 giMXN denotes a zero mean additive noise component with a non-diagonal spatial covariance matrix RzéiliMXM. The column vector ap spans the same subspace as the first eigenvector "1 of the spatial covariance matrix of Y, computed as M Ry = E[YYT] = UYDYU$ = Z amumug (1.6) m=1 Therefore, a qualitative assessment of the neural source separability is a distance metric between the eigenvector obtained with floating-point standard Mallat DWT 10 algorithm and the eigenvector obtained using fixed-point integer lifting DWT com- putation. The distance between both vectors, in a mean square sense is computed as dp = ”111 - at”2 (1.7) where [III is the Euclidean norm. This distance metric computed after thresholding in the Integer Lifting DWT module, and the regular DWT module provides a qualitative measure of the per- formance in spike sorting. The distortion in the reconstructed spike waveform due to the quantization and approximation errors in the DWT module is also considered. Faithful reconstruc- tion of the spike waveform is essential for all spike sorting algorithms, whether for single channel[15] or array data[9]. We compute the distortion as xxT (X — Yr)(x — Yr)?” Distortion SNR (dB) 2 10log10 (1.8) where Yr 6 §R1XN is the reconstructed signal after wavelet thresholding for the DWT module, and X E ili‘lXN is the true neural signal for any single channel. This distortion metric can be evaluated for the Integer Lifting DWT module with arbitrary data and filter quantization, and compared to the “ideal” distortion obtained with the traditional floating point DWT module. 11 1.2.6 Computation and Data Rate Savings The lifting scheme for DWT computation requires less computation than the tra- ditional Mallat’s algorithm, even without any data or filter quantization. For the symmleU, orthogonal wavelet with a filter length of eight (Ihl = 8, [II = 8), the number of floating point operations (flops) required for the standard algorithm is 34, while the lifting algorithm requires only 20 flops [11]. Further, the quantization of the data and filter coefficients C1, C2, ..., C8 reduce the complexity of the multipliers used in the lifting scheme computations (Eq. (1 ..3)) If the data is quantized to Nd bits and the lifting filter coefficients are quantized to N f bits, the product will be truncated to Nd bits. Hence, we estimate the total number of 1-bit addition steps required for one multiplication in the lifting module to be Nd x N f. The required precision of the data is directly proportional to the data rate needed for transmission or storage on on-chip buffers. 1.3 Methodology A template spike waveform from a single neural source was used to generate a simulated spike train across a four channel array. The inter-spike interval times were realizations of an exponentially distributed random process. Experimental neural noise from a closely spaced array recording was superimposed on the spike train. The noise level was varied to obtain carious SNR conditions. To evaluate the expected performance in source separation, we used Monte Carlo simulations for various choices of mixing matrices for different single units. The dominant eigenvectors were computed from the denoised data and their distance metric was evaluated from the original mixing matrix A. The distortion in the spike waveform in an individual channel as a function of the multiplier complexity and transmission data rate are also plotted 1.4 Results Figure 1.3(a) illustrates the noisy signal, and its denoised versions using the tra- ditional floating point DWT, Integer Lifting DWT with 10-bit filter coefficients, and Integer Lifting DWT with 4-bit filter coefficients. Figure 1.4 illustrates the de- noising performance comparison of the floating point DWT, Integer Lifting DWT with 10—bit and 4-bit filter coefficients on 10-bit data. The interesting result here is that the performance of the Integer Lifting DWT module with 4-bit filter coeffi- cients is close to that of the 10-bit DWT module, which translates into tremendous computational and memory savings. The performance of the source separation distance metric was evaluated and illustrated in Figure 1.6. The plot indicates that the signal subspace is maintained when replacing the floating point DWT to the Integer to Integer Lifting based DWT. Figure 1.7 shows that the reconstructed waveform after denoising for the different cases is very similar. We can locate an optimal operating point for minimum 13 2m .. -—‘~— —a ——v~~--—v-—-~—1 15w .- —~ __._. -— ._ ~ ~ —« “”Wtl‘tit‘ll'i’li‘iilli’tlilllimit l l h l" f l l‘ . I i l l -1 50 [ I . I l -9“) L- -- .1. A D.L_._c_fn_. Ll -25(_ii,___1.m.1. .4 L. A.” .___.._ 1_J i. liv‘lj )l I: ',“‘KI.‘,,‘,'. i'."j 1 Oil J?" 5i; {ill l. lull} ll ".li.‘ ii“ l‘i'il' f‘ \11 ,v (a) A single channel of noisy data. (b) Wavelet thresholded data using the tra- ditional DWT — m —r— “ r— r m 200 * *— rrrr Hf * *Fr‘r‘ “’fi 180; i i ‘ . . l l ' l r. : ‘ , , . ‘ . l.-\"ii,"’ f. ' 19". ‘1‘ n”. "i." .3: ‘hu; 4‘," v. " . h - ' - ”i159." ‘ .‘ “"23ut»+~’\;(f’i' .‘|.'. i‘f“.‘_.'r"(_P‘ ii If). ”LIN! , 0 I' "1' -i ‘, '.. ‘ " ' l "r" i' ' .‘. ill ‘ Er " ,. 3 J7 ‘it’ “w ‘ L ' 'l :4 ' . 'l l :39": 1) "ml -:,.l ./, - 'j , l l I l 150i i | J -200, ‘ l L -__ _.. A -._r_-_ . . _ L_ _r__._ 1-..-.. _.__- L _ (c) Wavelet thresholded data with Integer (d) Wavelet thresholded data with Integer Lifting DWT with 10-bit filter coefficients. Lifting DWT with 4—bit filter coeflicients. Figure 1.3. Comparison of denoising performance for various precisions of filter coefficients for a sample channel of a 4-channel array with the data quantized to 10 bits. computational complexity while giving good signal fidelity from Figure 1.8. We see that using 10-bits to represent the data, and constructing the DWT module with 4-bit filter coeflicients gives very little waveform distortion, while huge savings on the hardware are obtained. 14 30L —Floating Point DWT SNR after denoising (dB) to N o o: .3 01 10 8 10 12 14 16 --------- IWT 10 bit quant. ' _ v" - ~~IWT 4 bit quant. . . '/ l l l l L 41 L I J 18 20 SNR of noisy data (dB) 22 24 26 l 28 Figure 1.4. Denoising performance for various wavelet transform computations. p p p p a: a: '4 on I S3 00 Distance for denoised data 0 _ a p N 0.1 . Data denoised with Floating point DWT 0 Data denoised with Integer Lifting for 10 bit coefficients ' Data denoised with Integer Lifting for 4 bit coefficients —Floating point (best fit) "mm-10 bit coefficients (best fit) ' "4 bit coefficients jbest fit) no. 0. 1 L i 1 0" 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.3 Distance for noisy data Figure 1.5. Performance in the distance metric of the signal subspace for data denoised using various wavelet transform computations. 15 0 Data denoised with Floating point DWT ° Data denoised with Integer Lifting for 10 bit coefficients 0 Data denoised with Integer Lifting for 4 bit coefficients —Floating point (best fit) ' ....... 10 bit coefficients (best fit) ' ‘ '4 bit coefficients (best f'rL F3 on o q P 0) .0 an I Distance for denoised data 9 o w a I L I J 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Distance for noisy data Figure 1.6. Performance in the distance metric of the signal subspace for data denoised using various wavelet transform computations. i. 2 i: 'I :E it 1!, :I 32 ii , I‘ E: ii i l' 3': ,' I 5. .‘. it, 2‘. l! l: 2: ti, 2‘, ii. _ A | E ' I 1 (U \J‘] ‘ $5 \ 'L- ~ u,“ \ g“) i: a; v ‘i, x ‘0‘) l: i3 l i: ii H i H 32 9i : i: a; :i 1 It 'i ,! ll i: E: i! 3 'I 2; i! 1 II :3 i: it i ‘ i ? Time -— Thresholded with floating point DWT -—- Thresholded with Integer Lifting ........ Thresholded with Integer Lifting. 4—bit Wavelet -'-‘- Thresholded with Integer Lifting, 6-bit Wavelet —-- Thresholded with Integer Lifting, 8-bit Wavelet Figure 1.7. Example spike waveforms obtained with DWT computation with dif- ferent filter coefficient bit widths. 16 Distortlon in spike waveform vs Computations g . I T f 6 ' _,._.-—- - W“ s ' l/ T g i. / 4 / j/ I a 2 ~ —4 bit Wavelet fiter coefficients £1 — 6 bit Wavelet filter coefficients 5) 0 _ -—- 8 bi Wavelet filter coefficients — 10 bit Wavelet filter coefficierts -—- 12 bit Wavelet filter coefficients -2 - O 4 bit data quantization ,', ,z r; X 6 be data quantization 4, .. T + 8b't data quantization ” 0 10 bit data quantization V 12 bit data quantization -6 1 l l 0 50 100 1 50 Mulfipier Complexity (equivalent bit additions/sample) Figure 1.8. Effect of round off and quantization errors on the signal fidelity as a function of multiplier complexity. 1 .5 Conclusion A robust, computationally simple module with anticipated major reduction in chip area and power consumption for implementing the DWT has been demonstrated. The proposed design was motivated by the need to efficiently process neural record- ings from high—density implantable neural interface modules in the nervous system. The ability of the DWT to preserve the signal characteristics in the temporal, spectral and spatial domains intuitively makes it a desirable signal processing tool to pipeline the neural data with reduced bandwidth in any associated telemetry system. The anticipated reduction in chip area and power consumption without re- ducing the performance to a noticeable degree strongly suggests that the proposed 17 module design can be a key processing stage in the next generation of neuropros— thetic devices. A circuit design and simulation of the proposed module has indeed confirmed the savings in chip area and power requirements in addition to the versatility of the design for real-time and streaming data processing[16]. 18 CHAPTER 2 Spike Detection 2.1 Introduction Spike Detection refers to the detection of action potentials generated at the mem- brane of neurons from electrical signals recorded from neural electrodes. There is generally a noise process from the surrounding neural tissue corrupting the recorded signal. The neural spikes or action potentials are transient signals of small duration that appear as local voltage surges. The wavelet transform appears as a natural way to represent data containing transient signals due to its localization properties in both time and frequency. We have demonstrated use of the wavelet transform to do low level processing on the recorded data for denoising[17] and compression[18]. Therefore, the wavelet transform computations would need to be performed in our framework of a modular design of the prosthetic devices, with the detection algo— rithms adding very little overhead. 19 The algorithms presented here detect the spikes through the peaks in signal amplitude occurring at any point over its waveform. For wavelet bases which might be correlated with the signal waveform to some extent, the ratio of the signal peaks associated with the spike to the noise variance increases in some wavelet subbands. The wavelet transform essentially gives a higher SNR for detection in the relevant subbands The waveform of spikes generated by a neuron may change over time, and in general cannot be assumed known. For chronic experiments or neural prosthetic devices, active neurons whose action potentials are observed at the electrodes may appear and go due to electrode movement, and changes in neural tissue. For these reasons we assume no knowledge of the spike waveform or amplitude. Nenadic[19] also gives a wavelet based method for spike detection heavily based on Oweiss[20]. However, Nenadic’s method implicitly assumes knowledge of the spike waveform by choosing the wavelet function and decomposition levels which are highly correlated with the spike waveform. Further, the estimation of signal strength and signal probability makes the detection susceptible to changes in signal conditions. These assumptions and estimations will run into more problems when action potentials from more than one neural source are to be detected, and the problem becomes cou- pled with that of estimating the number of sources. Nenadic’s approach also relies on a continuous wavelet transform which is highly computationally expensive. Our emphasis is evaluating approaches for a very blind signal detection scenario with very little assumptions about the signal and noise conditions. The work presented 20 here uses algorithms for single electrode channel and array observations similar to those presented in [20] and derives results for performance in different cases. 2.2 Data model We assume the following data model for the an M electrode array (Eq. (2.1)). Y = AS+Z = x+z (2.1) where S E §RPXN is the neural signal, A E §RMXP is the mixing matrix, Z E meN is the noise and Y E ERMXN is the observed data, N being the number of time samples, and P being the number of neurons. For simplicity, we assume stationarity and iid distributions at each time sample. Hence, for any time snapshot we get Eq. (2.2). y = As+z : x + z (2.2) 21 where s E mel is the neural signal, 2 E ERMXI is the noise and y E ERMXI is the observed data. z ~ N(O, R) (2.3) We assume the noise (Eq. (2.3)) to be temporally white Gaussian, with the spatial correlation structure given by R 6 RM XM . 2.3 Wavelet domain detection The compaction property of the wavelet transform condenses the energy of the signal into a few coefficients, while the noise (assumed white) is distributed across all the nodes. Hence, it should be easier to detect the signals in the wavelet domain. Our approach is to perform a multi level stationary wavelet packet decomposition of the observed data. The stationary (or undecimated) wavelet transform is performed because the truncated wavelet transform reduces the number of samples at each subsequent level by half. Though it takes less memory or registers, the truncated wavelet transform reduces the temporal resolution for transient signal detection. The wavelet packet tree decomposes both the approximations and details at each level, as opposed to decomposing only the approximations as in a wavelet tree. 22 2.3.1 The Stationary Wavelet Packet Tree The Stationary Wavelet Transform seeks to represent the signal in a redundant, shift-invariant basis with uniform time-scale tiling. To aid the understanding of the transform as a matrix operation, consider the following definition of a translation operator[21]. Definition 2.1 Suppose z is a vector. Define (Rk2)n = zn_k. Rk is a translation operator. If the low pass filter function (also called the father wavelet) be designated as u E gile, and the high pass filter function (also called the mother wavelet) designated v 6 Elile, where f is the length of the filters (compact support of the wavelet), then the matrices for the approximations and detail decomposition of a signal of length N at any level can be written as in Eq. (2.4) and Eq. (2.5). Wa = [Rlu R211 RNu] (2.4) W(1 = [Rlv Rgv RNV] (2.5) 23 Here, Wa E me N is the transformation matrix for the approximations, and Wd 6 iliNx N is the transformation matrix for the details. The approximations sa 6 mlxN and details sd 6 ERUN for the signal s E ililxN can be obtained as in Eq. (2.6). s8 = sWa sd = SWd (2.6) Note that at every level of decomposition the amount of data (or memory re- quirement) is doubled for a stationary wavelet packet tree. The multilevel Station- ary Wavelet Packet Tree can be represented as in Fig.2.1. The node 1 in this tree corresponds to the un-processed time domain signal. We can therefore write the transformation matrix for obtaining the signal at any node as a product of Wa and Wd. For example, to obtain the wavelet transformed signal at the node 5 we have, 4 5 5 7 W W” w W“ wa dea wd 8 9 10 11 12 13 14 15 Figure 2.1. The Stationary Wavelet Packet Tree. => W(5) = wawd. (2.7) Looking back at the data model for the observations, we formulate the wavelet transform equations for the array data 25 Y = AS+Z = x+z y(.i) : yw(j) Y0) : xw(j)+zw(j) 3(0) 2 x(j)+zfj) (2.8) where YO) E iIiMXN is the wavelet transform in the jth node of the observations YERMXN. The observations are assumed to be sampled at 20kHz, and since the spike waveforms are usually concentrated at bandwidths less than 10kHz, the first level details (labelled Node 3 in Fig.2.1) and its offspring do not include data of interest. Hence, to save computations the complete wavelet tree is not computed. Rather, only Node 2 and its children are computed. The detection tests are run only on these subbands. Similar to Eq. (2.2), we describe a snapshot of the data at a single time sample 351 ,0) 2 x0) + 20'), (2.9) 26 Here, yo.) 6 ERMxl is simply one column of the Y matrix, corresponding to the time instant of interest. The algorithms considered in this work operate on single time snapshots. The spike detection tests are carried out independently across all the wavelet sub-bands and the decisions are logically OR’ed. For convenience and reducing visual clutter, the subscript corresponding to the node (subband) of the Wavelet Packet Tree will be dropped, and all the data will be understood to be from a particular wavelet subband. It is shown in the sections on detection performance evaluation, that the detection performance is a function of the Signal to Noise Ratio of a channel. The wavelet transform helps in the detection performance because neural spike waveform, being localized in time would be correlated to some extent with the wavelet functions. This in effect, increases the equivalent SNR at some wavelet subbands. Previous work[8],[17] has shown the symmlet 4 to be a good wavelet basis for neural spike signals, and this is the wavelet basis used in this study. As an illustration, we have plotted the signal in various nodes of the Stationary Wavelet Packet Tree for a single channel of data containing 4 spikes in Fig.2.2. The red dots denote the instant of the largest absolute peak for a spike, which is to be detected. As we can see, some of the nodes, for example Nodes 8 and 17 are well tuned for capturing the spike energy. The SNR’s in the different SWPT nodes for this example are tabulated in Table 2.1. 27 Amplitude Amplltude Amplltude Amplltude Amplltude Amplltude Amplitude Amplitude 500 0 -500 500 0 -500 500 0 -500 200 0 500 -500 200 0 -200 200 0 -200 200 0 -200 Mi in MM 200 NE Node1 ‘ W kmwlllMWW "0601M WWWMMMWN it ill-iii Milli it'll tilifiiill l 100 300 Time Samples Mill MM ,1 M 500 WWW? til” will I“ 'l'l‘l'IW Elli ll ,,h‘1WV1.,lil lI‘IlIIlIIill IIIIIIIII‘II ”#11le Will} WWW WW W‘ltll'illilii r 1,13,11,ng111‘111, “Will" iI'l 12011“.th 100 Time Samples Figure 2.2. A single channel represented in different SWPT nodes. 28 Table 2.1. Signal to Noise Ratio in the different SWPT nodes. SWPT Node Signal to Noise Ratio 1 10.4799 2 13.3108 4 14.8127 5 6.1543 8 15.5377 9 13.8415 10 3.3308 11 7.2197 16 11.9609 17 16.7398 18 5.9863 19 14.3290 20 4.1822 21 1.2984 22 3.0082 23 6.4636 2.4 Analysis of the Bayesian Tests for Spike De- tection 2.4.1 Likelihood Ratio Test for the known signal case For simplicity, we consider a single neural source impinging on an array. In this case, we have a binary hypothesis. Under hypothesis H0, no signal is present and s = 0. Under the alternative hypothesis H1, the signal (spike) is present and As=x=p.. 29 The LRT then becomes, — A) = g [1— Q(I£TR31M\/§)] (2.15) Correspondingly, we have similar expressions for single channel spike detection too. .2 ‘ 21-” 1 a]- PD =PH1(T>/\)=-2- l-Q ”.2 - 03' fl _ J 1 A PF =PHO(T>/\)=-2- l—Q ”.2 U'zfi - .7 (2.16) Hence, for a Gaussian distributed Sufficient Statistic T, the distance metric between the hypotheses is simply “TR-1p. For a single channel case, this is simply the energy of the signal divided by the variance of the noise. 32 To compare the performance of the array detection vs. the single channel de- tection, we observe the distance measure between the hypotheses in the two sce- narios. The array processing will give better performance if the following condition (Eq. (2.17)) holds true. 0 ”TR—1p. > El??- (2.17) 0]- where pj is the strength of the signal at the the channel j where we are detecting the signal, and aj is the variance of the noise at that channel. Hence, we can describe “TR—1p, as an equivalent Signal to Noise Ratio for the array case. The observations (array or channel) with a better Signal to Noise Ratio give better detection performance. Performance Comparison for Spatially Uncorrelated Noise If we assume the noise covariance to be diagonal (Eq. (2.18)), we can get results on the equivalent SNR’S for a single channel and the array data. Eq. (2.19) shows that the equivalent SNR, and hence the performance of a detector using the array data will be better than that working on any single channel. 33 012 0 0 0 0 022 0 0 R: (2.18) L 0 0 0 0M2 01—2 0 #TR_1#=#T u 1 0 ”NI—2- M #2 =25» —:—:Vie{1,2.. M} (2.19) 1210i Performance Comparison for Spatially Correlated Noise In this case, the noise covariance R is not diagonal. Since we have already have the expressions for the detection performance in various cases, our purpose here is simply to qualitatively describe the effect of the correlation. For mathematical simplicity, we assume a two channel array. The signal and noise covariances are described in Eq. (2.20). 34 #1 ’1’: H2 2 01 0102/) R: (2.20) 0102/) 012 An inequality condition (Eq. (2.21)) can be derived for the array and channel SNRs. By symmetry, a similar condition holds for channel 2. Hence, we conclude that the array performance is better than a single channel performance. T -1 it? 031% -- 2H1H20102P + #30? it? " R f‘ ‘ _2 = 2 2 2 - ‘2 01 0102 — (0102p) 01 2 (#2002 (1- £323) — “2 1 > 0 (2.21) clog (1 - p2) We can also try to understand the effect of the noise correlation on the per- formance. For some fixed signal conditions, and fixed channel noise variances, the array SNR performance improves with the noise correlation coefficient opposite in sign to the signal correlation, and deteriorates when the noise correlation coefficient is of the same sign as the signal. 2.4.2 Likelihood Ratio Test for the unknown signal case Here, we construct the generalized likelihood ratio test because we do not know the signal waveform or strength. The generalized estimate for the signal therefore is the observed data itself. -(y - y)TR"1(Y - y) 1 e 2 ,f‘zn'IRl-Nfl H1 T _1 §H 77 (2.22) 1 -y R y 0 , e «mm-M? This reduces to _ H yTR 1y §Hé2logn (2.23) This is essentially a blind energy detector. We therefore define the Sufficient Statistic T for the array spike detection with Eq. (2.24). T = yTR-ly (2.24) The Sufficient Statistic in this case is X2 distributed with degrees of freedom equal to the dimension of y. The X2 statistics can be proven if we consider the singular value decomposition of the noise covariance (Eq. (2.25)). 36 R = UDUT (2.25) The Sufficient Statistic can thus be expressed as the norm of the whitened observations y. T = {IR—13' = yT(UDUT)_1y : (D’1/2U‘1y)T(D’1/2U‘1y) = 9% (2.26) Now, 57 = D—l/zU—ly, being a linear relation on a Gaussian Random variable y, is also Gaussian. The norm of a multidimensional Gaussian is X2 distributed, and hence the Sufficient Statistic (Eq. (2.26)) for unknown signal detection in an array is X2 distributed. We can calculate some statistics for the two hypotheses (Eq. (2.27)) in terms of the whitened observations. 37 = 0 E[(y— “Xi—WT] =1 '9'” = 0T0 = 0 (2.27) H1 2 = E121 = E [D—l/zU‘ly] = D—1/2U—1M E [(9 — m - W] = 1 if? = (D”1/2U‘IM)T(D_1/2U‘1u) = ”TR—III A Therefore, we can construct the probability density functions and cumulative distribution functions for the Sufficient Statistic T in terms of the non-central X2 distribution functions. Tn/2-le—T/2 H : T = n T,0 = 0 p( ) p( ) I‘(%n)2n/2 F(T) = F(T|n, 0) (2.28) 38 H1 : p = pn(T,uTR‘1u) = MT, 1) _ T(n/2—1)e—(T+/\)/2 00 (ATV: _ 2n/2 k=0 22kk!r(k + $72.) P(T) : firm, A) (2.29) where F (T ln, A) is the cumulative distribution function of a non-central X2 dis- tributed random variable with 17. degrees of freedom and non-centrality parameter )1. These statistics (Eq. (2.28), Eq. (2.29)) can help us calculate the performance of the detection as follows. PD 2 P(T > 2logn|H1)=1— FH1(2logn) =1— F(2 log nln, A) = 1 — F(2logn|n,pTR“1p) PF 2 P(T > 2log77|H0) =1— FH0(2logn) = 1 — F(2 log nln,0) (2.30) where PD is the probability of detection, and PF is the probability of false alarm. To compare the performance of the array detection with that of the single channel detection, we write the corresponding probabilities of detection and false 39 alarm for the single channel case. 2 fl. PD =1-FH1(210g77)=1-F(210gn|1,fi) .7 PF =1—FH0(2logn)=1—F(2logn|1,0) (2.31) where 113- is the mean of the signal at the channel j where we are detecting the signal. In this case of unknown signal detection, we do not have one expression similar to Eq. (2.17) to compare the performance of array and single channel detection. However, it is possible to compare the performance in different cases nu- merically using the tabulated values of various X2 cumulative distribution functions. Let us consider an example of a three channel electrode array. Assume the signal strength at the 3 electrodes in the presence of a spike given by Eq. (2.32), and the noise covariance given by Eq. (2.33). 2.2 As : z = p = 2 (2.32) 1.8 40 1 0.4 0.3 R: 0.4 1 0.2 (2-33) 0.3 0.2 1 The detection performance in terms of the Receiver Operating Characteristics are plotted for this case in F ig.2.3. HOB"; t'“:“ 6 ‘I “an“ 09 ——“ 0'8 ---- Channel 1 c - - Channel 2 g 0-7 - - - Channel 3 0 Q) 15 0.6 o '5 0.5 g g 0.4 .0 9 o. 0.3 0.2 0.1 0 1 1 l 1 l 0 0.2 0.4 0.6 0.8 1 Probability of False Alarm Figure 2.3. Receiver operating Characteristics for Array and singe channel detec- tion. It is seen that in this case, spike detection using the array data gives better performance. 41 Let us now consider a slightly different case with the same noise covari- ance(Eq. (2.33)), but the signal strengths given by Eq. (2.34). 2.48 As=x=u= 2.1 (2-34) 1.8 The detection performance in terms of the Receiver Operating Characteristics are plotted for this case in Fig.2.4. On close examination, it is revealed that for low values of Probability of False Alarm, Channel 1 detection gives better performance compared to detection with the Array data. The situation is reversed for higher values of Probability of False Alarm. We thus observe that for the generalized likelihood ratio detection, we can not derive a Uniformly Most Powerful property for array vs. channel detection. It will be worthwhile to calculate the values of the Signal to Noise Ratio at channel 1, and the equivalent array Signal to Noise Ratio we had derived in the known signal detection case (Eq. (2.35)). 42 0.9 , _ y’ , ’ ' — Array 0'8 .z ’ ’ , ’ ’ ----- Channel 1 :07. ,I" x - - Channel2 2 ' .I ,’ - -- Channel 3 8 ' I, g 0.6 r / '5 0.5 g: a 0.4 .D 9 ’I n. 0.3 . I l 0.2 I 0.1 l O L L L l l 0 0.2 0.4 0.6 0.8 1 Probability of False Alarm Figure 2.4. Receiver operating Characteristics for Array and singe channel detec- tion. 2 Channell SNR = 1010g10 (5%) = 7.889 dB (7 1 Equivalent Array SNR = lologlo (pTR_1 p) = 9.3516 dB (235) Channel 1 data and the Array data give comparable detection performance in this case. However, if we had known the signal (and applied the test of Eq. (2.10), the performance of the Array detection would have been much better due to its 43 higher equivalent SN R. Thus, we can conclude that the performance gain in array detection is more pronounced in the case of known signals. 2.5 Results we run Monte Carlo simulations for spike detection of unknown signals for single channel and array observations at various SNR conditions. The performance of spike detection in the time domain and multiresolution (wavelet) domain is compared. Fig.2.5 shows the performance of detection for single channel detection in multiresolution and time domain for an SNR of 7.6024. As we can see, the multiresoltion detection performs much better than the time domain detection. Figs.2.6 and 2.7 also show similar improvements in detection performance due to the multiresolution analysis. F igs.2.8, 2.9, and 2.10 Show ROC curves for multiresolution and time domain detection at various SNR conditions. An example of the noisy data and the real locations of spikes for an SNR of 8.117 is shown in Figs.2.11 and 2.12. At these noise levels, the spikes are well masked by the noise so as to make detection by visual inspection quite difficult. The level of masking can be seen in 44 c .9 8 -7 cf v Multiresolution Scatter 8 0.6 f1 i 1 Time Domain Scatter 3 "‘~ ' —Multiresolution Best Fit 1; --'-'T1me Domain Best Fit 3 0. (U .D 9 o. 0. . of a 1 1 1 a 0 0.2 0.4 0.6 0.8 1 Probabiity'of False Alarm Figure 2.5. ROC curves for Multiresoution and Time domain detection for a single channel data at an SNR of 7.6024. c 0. v Multiresolution Scatter 1%: 1 Time Domain Scatter % 0 —Multiresolution Best Fit 0 ‘ '-~-'Time Domain Best Fit "5 2* . =3 0.4 - (U .D 9 ‘1 0.2 Gfi 1 1 1 1 1 O 0.2 0.4 0.6 0.8 1 Probabiity of False Alarm Figure 2.6. ROC curves for Multiresoution and Time domain detection for a single channel data at an SNR of 8.7623. 45 c - .9 g v Multiresolution Scatter g _ A Time Domain Scatter ‘6 — Multiresolution Best Fit 3‘ '- . -'Time Domain Best Fit ‘5 0.4 <5 .0 9 “L 0.2 0 1 1 1 1 1 0 0.2 0.4 0.6 0.8 1 Probabiity of False Alarm Figure 2.7. ROC curves for Multiresoution and Time domain detection for a single channel data at an SNR of 10.857. 1 c 0.8 .Q '3 ‘55 0 6 _ v Multiresolution Scatter e ., A Time Domain Scatter : . — Multiresolution Best Fit :15: Q4 '-'-‘Time Domain Best Fit 9 a. J I 0 0.2 0.4 0.6 0.8 1 Probabiity of False Alarm Figure 2.8. ROC curves for Multiresoution and Time domain detection for a two channel array at an SNR of 8.1177. 46 c 0.8 .9 *6 (D g 0.6 .. ‘5 I | .f "1% v Multiresolution Scatter % Q4141: 5 A Time Domain Scatter S ‘ 1 — Multiresolution Best Fit 9 1- , -'Time Domain Best Fit :1 0.2 Or: I A 1 I l 0 0.2 0.4 0.6 0.8 1 Probabiity 01 False Alarm Figure 2.9. ROC curves for Multiresoution and Time domain detection for a two channel array at an SN R of 6.9579. c 2% 19 v Multiresolution Scatter {3 0 6 ‘ A; A Time Domain Scatter c1 ' , A — Multiresolution Best Fit ° v-~-1Time Domain Best Fit E :_‘ '5 0.4 ~ (1! " .0 e : ‘L 0.25 C. - . e 1 . 0 0.2 0.4 0.6 0.8 1 Probabiity of False Alarm Figure 2.10. ROC curves for Multiresoution and Time domain detection for a two channel array at an SN R of 10.2124 47 Noisy Data (channel 1) i I L i L I 400 Time (ms) Figure 2.11. Noisy data on one channel used to detect spikes. Clean Data (channel 1) 300 400 500 fine (ms) Figure 2.12. The clean spike data to be detected. 48 the spike which is shown in detail in Fig.2.13. 25 20* 15* 10* 100 10 -20» 20 40 6080 Figure 2.13. A clean and noisy spike example. 2.6 Conclusions We have described Bayesian detection tests for spike detection in single electrode and array data. Analytical results for the Receiver Operating Characteristics have been derived for the known and unknown signal cases. Detection using the multi- electrode array data gives potentially better performance than single channel spike detection. A Uniformly Most Powerful property for the array detection versus the single channel detection is proved for the known signal case in terms of an “equiva- 49 lent array SNR”. The effect of the spatial correlation of the noise on the performance of the array detection is also investigated. In the case of an unknown signal case, the array detection gives better performance than the single channel detection for certain signal and noise cases, but it may not be Uniformly Most Powerful. We have concentrated on the unknown signal case, as we assume spike detection to occur in the preliminary stages of the neural signal processing before any signal conditions are estimated. Transforming the data in a multi-resolution space like the wavelet transform and running the tests on the various time scales improves detection because the wavelet compresses the signal energy in a small number of coefficients. The performance gain in wavelet domain detection has been experi- mentally verified for various Signal to Noise Ratio conditions in single channel and array cases. 50 CHAPTER 3 Clustering Neural Spike Trains 3.1 Introduction Ensemble coding of stimuli[22], motor commands[23], and memory[24] has been demonstrated by neuroscientists and is a subject of contemporary research. We consider high dimensional electrode arrays recording from subthalamic regions of the central nervous system. After the preliminary signal processing stages of compression, telemetry, spike detection, and spike sorting, we obtain spike trains of individual neurons. The recorded data usually comprise spike trains of a large number of neurons. While such neural yield presents rich data sets for understand- ing complex neural circuits and coding mechanisms, it makes inferring relationships between neurons quite a formidable task. However, not all the recorded neurons might be responding to a common stimulus under study. Our goal here is to cluster the data into groups of neurons which have correlated neural activity. This re- 51 duces the data dimensions for any secondary processing of the recorded spike trains. 3. 1 .1 Background The spike trains, which are a series of action potentials, are characterized as discrete-event random processes. The exact statistical characteristics of the spike trains is still a research topic in the neuroscience community, with most studies sub- scribing to either a rate coding or temporal coding mechanism in the neural system. The Poisson process has been traditionally used to model the arrival times of spike events. Alternatively, the inter spike interval generally obeys an exponential density. If the spike train is modelled as a homogenous Poisson process, it will be described completely by its mean firing rate. However, a homogenous Poisson assumption implies stationarity. Since neural signals encode time varying stimuli, commands, and higher brain functions, inhomogenous Poisson processes appear as a more natural way of modelling spike trains. An inhomogenous Poisson process is completely characterized by a time varying rate function. The rate function describes the probability of an action potential in a unit time interval, as a function of time. In this study we assume all neural spike trains to be inhomogenous spike trains. Various methods exist to characterize a spike train’s Poisson rate function. The simplest way to do so is to count the number of spikes occurring in a time 52 window of some duration 6T. This is known as binning the data with a bin width of size 6T. Inferring relationships between neural spike trains is an integral part of any multi unit spike train analysis[25]. Presently a number of methods exist to analyze these associations. The cross-corellogram is the cross variance of two spike trains calculated for different time lags. Cross-intensity is a similar relation that characterizes the firing rate of one neuron as a function of the firing time of the other neuron. These methods can be characterized as time domain methods[26]. Another widely used approach is the characterization of the firing activity of two or more neurons after the presentation of a stimulus. The Joint Peri-Stimulus Time Histogram (JPSTH) for two neurons is a matrix of the firing rate of one neuron at a time t1 after the application of the stimulus, and firing rate of the other neuron at a time t2 after the application of the stimulus. Computing the J PSTH requires repeatedly recording the same neurons under repetetive controlled application of the stimulus. Analogous to these time domain methods, there are various frequency domain methods that infer relationships between spike trains. Computing the correlation between the Fourier transforms of spike trains leads to measures of coherence. 53 Operating in the frequency domain, these methods are not bin size dependent but assume stationarity on the data. Spectrograms and coherograms can be seen as the frequency domain counterparts of the JPSTH as they characterize the joint frequency domain firing rates of neurons in the presence of stimuli[27]. Multiple spike trains can also be modelled as jointly random discrete event processes. The joint probability functions can thus be characterized by a number of parameters, depending on the form of the probability function assumed. Standard statistical techniques can be used to estimate these parameters like maximum likelihood[28] and minimum errors. However, this method requires a priori knowledge of the parametric form of the multidimensional point process probability function, and sufficient data to estimate the parameters. 3.2 Simulation Method Our simulation consists of 120 spike trains with 4 groups of 30 neurons each having related firing rates within the group, and independent of neurons outside the group. Every group of neurons had a basic firing rate function. For example, the kth group Sk is associated with a firing rate function fk E 3?le sampled at N points in time. A basic firing rate function was selected independently for each of the four groups. 54 For each group Sk, firing rate functions fm E 3?le were generated for all neurons m, m E Sk with linear and delay relations as in Eq. (3.1). fm[n] =afk[n—T] +5 (3.1) where a and B are randomly generated parameters for a linear relation, 7' is a randomly generated delay parameter. For each neural firing rate fm, a spike train sm is obtained as a realization of an in—homogenous Poisson process.‘ Spike trains were thus generated for 30 neurons m1, m2, - ~ ,m30 based on the basic firing rate function of each group k, where m1,m2, - -- ,m30 E Sk. For 4 groups k = {1,2, 3, 4}, a total of 120 spike trains are thus generated. Our aim is to cluster these 120 neuronal spike trains into the 4 natural clusters. 3.2.1 Proposed Technique . In order to overcome the difficulties in predetermining the bin size in time domain methods, the stationarity assumptions in frequency domain methods, and the data requirements of the the parametric methods, we rely on a novel method to cluster these spike trains. A discrete wavelet transform representation of the spike train is obtained to 8 levels. The correlations among the neuronal spike trains in different time scales are calculated and plotted in F ig.3.4. For visual clarity, each group of related neural spike train is numbered consecutively. Hence, a cluster appears as a continuous band of 30 neurons. We see that different clusters appear with different strengths in different time scales. Graph theoretic and k-means clustering results with features extracted by Principal Component Analysis are reported. 3.3 Multi-resolution Approach We assume that functionally interdependent neurons due to synaptic connections or shared stimuli will have related firing rates. The relations in the firing rate can be characterized in the pair-wise correlations of the estimated firing rates. We have indicated that time domain methods like cross-corellograms require a fixed bin width depending on the assumed frequency content of the signal. Increasing the bin width gives a more efficient estimate of the firing rate in the sense of a smaller variance of the estimate. However, the temporal resolution in the estimation of a varying firing rate would degrade with larger bin widths. Therefore, 56 this results in a variance-resolution trade off in the firing rate estimation. The optimal bin width will depend on the firing rate, its rate of variation, and its relation to the stimulus or signal being encoded by the neural signals. Frequency domain methods are not bin-size dependent but assume stationarity on the signals. Multi-resolution estimation of the firing rates resolves these problems by de- composing the spike trains with a wavelet transform (Fig.3.3). We propose the use of the Haar wavelet basis to represent the observed spike trains. The use of the Haar is well-suited for the spike trains because the wavelet coefficient at each level 3' of the wavelet tree can be related to the firing rate computed at a certain bin width Zj . The Haar basis approximation u[n] and detail v[n] functions are shown in Figs.3.1, and 3.2. 05* i Amplitude —0.5* .- L -1 -0.5 0 0.5 1 1.5 2 Time Figure 3.1. The Haar basis approximation function u[n]. 57 0.5* 8 a 3 c S -o.5» 11 -o.5 o 0.5 1 1.5 2 Time Figure 3.2. The Haar basis detail function v[n]. At any level j, the details coefficients d[n] of a signal s[n] in the wavelet transform are obtained as din] = $(v[2j(r — n)]s[r]), and the approximations coefficients a[n] by a[n] = Eli-(2423.0 — n)]s[r]). The Haar basis functions have compact support, and the constant absolute magnitude within the support does not scale the data as in the case of any higher order basis function. With the wavelet functions for the Haar basis, the details coefficient d[n] is simply the change in the firing rate at the nth bin of size 2j_1. The approximation coefficient a[n] is the firing rate at the nth of size 2j . A wavelet representation of the neural spike trains to characterize processes as Poisson or non-Poisson has been reported by Cao[29] for decoding neural spike trains in neural prosthetic devices. However, our purpose here is to have a convenient representation for the neural spike train firing rate for inferring clusters of neurons with correlated activity. 58 Figure 3.3. The Wavelet Tree representation to 5 levels. The wavelet transformation at level j can be represented as a lumped matrix W0) E ifffl x N where N is the length of the spike train being decomposed[21]. We initially aszshme the spike trains to be binned to a very small bin width 6T where within 6T no more than one event can occur. The spike train expressing the neural activity during a duration N 6T will be a vector of length N. Therefore, the wavelet details of the mth spike train sm at level j can be computed as in Eq. (3.2). 53m = W(j)sm (3.2) where S‘' 6 ER is the representation of sJ at node j. m 97.. x1 m 59 The representation of the M spike trains S E ERNX M at node j of the wavelet tree is given by Sj 6 RN xM (Eq. (3.3)). 2.7 sj = SW“) = [9'1 s1; 55M] (3.3) We can assume the spike train vector at bin width 6T to be a noisy estimate of its underlying firing rate fm = [fm[1] fm [2] fm [N ]]T sampled at intervals of 6T. Hence, for a neuron m with firing rate given by fm E 9?le (sampled at N intervals 6T time apart), the spike train data can be represented as in Eq. (3.4). where qm is expresses a random process associated with the uncertainty due to using a single realization of the random process (spike train Sm) to estimate the parameter (firing rate fm). Note that we chose 6T small enough so that not more than a single spike can occur in one bin (due to the refractory period). This implies We can derive some properties of the ”noise” term qm. At a particular instance of the discrete time, p, the probability of observing n spikes due to its Poisson nature, is given by Eq. (3.5). 60 ._. (fmipiérwe-(fmipiazr) PlSmlpl = ni n, (3.5) We thus observe that qm[p], at any time instance 19 is given by Eq. (3.6). Writing out its probability distribution function(Eq. (3.7)), we observe that it is a zero mean, non-stationary, temporally white process which is uncorrelated across neurons(Eq. (3.8)). Qmipl : Smlpl " fmlPl5T (3-6) _ (fm[p]6T)(m+fm[Pl5T)e-(fm[p]6T) Pllepl = ml (m + fmlplciT)! ’ (3-7) Vm -— fm[p]6T E {..., —-1,0, 1,2, ...} ElQmiPll = 0 Ei(qmlpl)zl = fmlplé‘T — e '. 1 g: i '1 4o; _ '2- ; ‘ EH ’1" t 20:1: . . ' w i: fimwmxmr ,0" '\ ‘ll “3"?! 'IHM' l l 4‘ 'u‘ ,, ii . "1" -.‘ 3. . figm Imefimh "14$ Wm wean-‘1: ‘mflMfiTfi! 405030100120 204060n80100120 2040m80100120 Neuron Nouro Neuron Figure 3.4. Correlations in different subbands. 64 3.4 A Principal Components Approach for Multi—scale Clustering Eq. (3.11) hints that some clusters can be identified in certain time scales with different strengths 6%. Correlated neural activity at time scale 3' appears as non zero correlation among the neurons at that time scale. Consider a cluster correlation A(k) E me M for every cluster k = 1,... K with the m1,m2th element being A(k)[m1,m2] = am1 kamz k- The cluster correlations A(k) can be K thought of as the essential features making up the correlation CO) = Z A0045";c at any arbitrary time scale 3'. Therefore, Principal Component Analysifgll] should resolve the cluster energies in the first few dominant principal components, with the noise energy distributed among the smaller principal components. The correlation profiles in different principal components are given in Fig.3.5. Figures in this thesis appear in color. The dominant principal components express the energy of the correlated firing rates in them, while the independent noise occupy the less dominant principal components. The eigenvalues associated with the principal components are tabulated in Table 3.1. The first four principal components capture 75.5% of the energy, and we assume these to capture the required information for all the clusters. We form a fused matrix from the principal components to be used for clustering the neurons. The entries of this matrix D are calculated as in Eq. (3.13) using the 4 most dominant principal components. 65 Table 3.1. Eigenvalues associated with the Principal Components Principal Component Eigenvalue 1 0.2537 0.1926 0.1739 0.1349 0.1136 0.0598 0.0326 0.0226 0.0162 QDOOKIOBCJWiD-CON 4 dm1,m2 = Z laiiiimglAm (3'13) . p=1 (p) where cm m is the correlation between the neurons m and m in rincipal com- 11 2 1 2 p h ponent p, and A(p) is the eigenvalue associated with the pt principal component. This resulting matrix has a profile plotted in F ig.3.6. As we can see, all the four natural clusters are well represented and distinctly visible in this matrix. 3.5 Graph Theoretic Clustering A graph theoretic clustering approach[32] is needed in our application because the neurons to be clustered are related through pair-wise similarity measures 66 .11. ...-a ”an? :23. Principal Component 1 Principal Component 2 Principal Component 3 . “i -_I I' s {I 1 1 l I.— r111 l 21.-.... _ _._J .. ..., 5 4 20 40 80 no 100120 20 40 60 80 100120 20 40 60 80 100120 Neuron Neuron Neuron Figure 3.5. Principal Components of correlations. 67 rather than a low dimensional vector associated with each neuron as required in vector clustering algorithms[33]. The true clusters of neurons are plotted in the space of two components resolved from the edge weight matrix of Eq. (3.13) are plotted in Fig.3.7. While our choice of the edge weights computed from the principal components of the correlation matrices in different subbands offers a good representation of the clusters, the data cannot be satisfactorily clustered with any of the classical “vector data” algorithms like k—means or k-Nearest Neighbors. The neural spike trains are represented as the vertices of an undirected graph. The weight, or capacity of an edge connecting any two neurons m1 and m2 is a measure of correlation among the neurons. The weights of the edges are assigned from the D matrix, with the edge weight between neurons m1 and m2 given by Eq. (3.13). We use a minimum cut type of algorithm on our graph representation to separate the neurons in clusters. Probabilistic clustering, with “sof ” member- ship functions to characterize the membership of a neuron in a cluster is used. Probabilistic clustering is a technique to reduce the computational complexity of an otherwise NP-hard clustering problem[34]. We wish to find clusters that have maximum edge weights between neurons within a cluster, and minimum edge weights between neurons in two different clusters, or across a cut. With the pm”; denoting the membership of neuron ml in cluster k, we form an objective 68 function[35] described by Eq. (3.14). We note that intuitively, we want pch to have a high value if amk is statistically non-zero. MM 2 Z Pi,kpj,kdz',j K i=13=1 i: Z (3.14) Maximizing the objective function 1 attempts to minimize the edge weights (which are measures of correlated firing) across cuts separating the clusters. Finally, crisp decisions on the cluster memberships are calculated by assigning the neuron to the cluster in which it has the highest probabilistic membership. 3.6 Clustering Results To compare the performance of our probabilistic graph clustering with traditional methods, we tried to use the first few resolved components from the fused matrix D to perform clustering with the k-means algorithm. As we can see in Fig.3.7, neural groups 1 and 3 are well mixed, and can not be linearly separated in the first two component space. Fig.3.8 shows a k-means result for the first two principal components of the fused matrix. As expected, this gives quite a significant clus- tering error of 32.5% We define the clustering error as the number of incorrectly 69 Neuron 120 100 80 60 4o 20 4 Principal Components (Ii. - «1.5 20 40 60 80 100120 Neuron Figure 3.6. Fused profile of the first 4 principal components. 70 2.5 w r 0 Ar Cluster 1 2 _ 0 O O Cluster 2 H O o O g 06) o 0 U Cluster 3 O O 0 O A Cluster 4 N 1.5“ O 00 no ‘ "" O C a) O O 000 .59; 3%“ g 1 l. g '* A.“ A A w .1 Q D O “A A E [:1 :D at ”E El Dalia/3} 1 as 8 EU fi%‘*fl A 0 5 - 1:1 Cl I D a - D #0 Cl D are are * * :1 D * Oi . 13 _0.5 l 1 r 1 —2 —1.5 -1 —0.5 0 0.5 Component 1 Figure 3.7. The true clusters depicted in the first two components resolved from the fused matrix D. 71 Table 3.2. Clustering Errors obtained in the k-means algorithm No. of Principal Components Clustering Error 2 33.21% 3 27.49% 4 14.61% 5 16.87% 6 16.93% 7 17.57% 15 16.41% classified neurons as a fraction of the total number of neurons. Since the results of the k-means algorithm depend on the initial guess of cluster centers, we run the k-means in 1000 iterations and compute the mean error. Table 3.2 shows the mean clustering errors for the k-means algorithms applied to different numbers of principal components. As we see, the clustering error does not decrease uniformly by increasing the number of principal components. Though increasing the number of principal components captures more useful information, it also adds statistical noise to the data. This shows that the performance of the k—means algorithm is limited due to the inherent nature of the pair-wise similarity measures in our prob— lem, which lends itself better to a graph theoretic representation. Probabilistic clustering with the edge weights of Eq. (3.13) gave a clustering error of 2.5% z'.e.,only 3 neurons out of the 120 being classified in the wrong cluster. 72 2.5 O 1. Cluster 1 2 _ 0 O O Cluster 2 O o O 806) o O D Cluster 3 1 5 O 8 go 0 A Cluster 4 N - b A E C] O . 1 X1”; [Tic a:J DB ”L. 1.35"“. ’ 451,0. 8 1 h D are ALA /. E 3 SE C] D D *gie; ; ‘E 8 05’ 0633?] DD *1#*’21:* H DD C] a?“ * 1:1 ,.. #91; 0 F E] O.§2 -1.5 —1 -O.5 0 0.5 Component 1 Figure 3.8. Result of the k-means algorithm applied to the first two principal components of the fused matrix. 73 3.7 Conclusions We have demonstrated the use of a novel multi-scale clustering algorithm for par- titioning a large set of neural spike trains into groups with correlated firing. A multi-scale or wavelet domain representation of the neural spike train eliminates the need of fixing a bin width and assuming stationarity, which are the limita- tions of traditional time domain and frequency domain methods respectively. We have demonstrated that the probabilistic graph theoretic clustering approach out- performs traditional algorithms like k-means for our application. We believe that this algorithm can go a long way in enabling neuroscientists to analyze large scale neural recordings and give a better systems understanding of neural activities. In future, we need to test this approach on data sets generated from a more biologically faithful representation of neural circuits or on real large scale data sets. 74 BIBLIOGRAPHY BIBLIOGRAPHY [1] K.D Wise, D.J. Anderson, J.F. Hetke, D.R. Kipke, and K. Najafi. Wireless implantable microsystems: High-density electronic interfaces to the nervous system. In Proc. of the IEEE, volume 92, pages 76—97, 2004. [2] FA. Edwards, A. Konnerth, B. Sakmann, and T. Takahashi. A thin slice preparatiOn for patch clamp recordings from neurones of the mammalian central nervous system. Pfiugers Archive: European Journal of Physiology, 414(5):600—612, September 1989. [3] J .C. Chang, G.J. Brewer, and BC. Wheeler. Microelectrode array recordings of patterned hippocampal neurons for four weeks. Biomedical Microdevices, 2(4):245—253, December 2000. [4] M. Ghovanloo and K. Najafi. A wideband frequency-shift keying wireless ink for inductively powered biomedical implants. IEEE Transactions on Circuits and Systems I, 51(12):2374—2383, December 2004. [5] KC. Oweiss, D.J. Anderson, and M.M. Papaefthymiou. Optimizing signal coding in neural interface system-on-a-chip modules. In Proceedings of the 25th IEEE Conference of EMBS, pages 2016-2019, September 2003. [6] S. Mallat. A Wavelet Tour of Signal Processing. Academic Publishers, 2 edition, 1999. [7] K. G. Oweiss and D. J. Anderson. A multiresolution generalized maximum likelihood approach for the detection of unknown transient multichannel signals in colored noise with unknown covariance. In Proceedings of the I CASSP ’2002, volume 3, pages 2993—2996, May 2002. [8] KC. Oweiss and DJ. Anderson. Noise reduction in multichannel neural recordings using a new array wavelet denoising algorithm. Neurocomputing, 38-4021687—1693, July 2001. [9] KC. Oweiss and DJ. Anderson. Massit-multiresolution analysis of signal sub- space invariance technique: A novel algorithm for blind source separation. In Pmc. of the IEEE 35th Asilomar Conference on Signals, Systems and Com- puters, pages 819—823, November 2001. 76 [10] [11] Ml [13l l14l [15] [16] [17] [18] [19] [20] [21] S.G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 11(7):674—693, July 1989. I. Daubechies and W. Swedens. Factoring wavelet transform into lifting steps. Journal of Fourier Anaysis Apllications, 41(3):247—269, 1998. W. Sweldens, A.R. Calderbank, I. Daubechies, and BL. Yeo. Wavelet trans- forms that map integers to integers. Applied Computational Harmonic Analy- sis, 5(3):332-369, 1998. K. G. Oweiss and D. J. Anderson. A unified framework for advancing ar- ray signal processing technology of multichannel microprobe neural recording devices. In Proceedings of the 2nd IEEE Conference on Microtechnology in Medicine and Biology, pages 245—250, May 2002. David L. Donoho and Iain M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425—455, 1994. Michael S. Lewicki. A review of methods for spike sorting: the detection and classification of neura action potentials. Network: Computational Neura Systems, 9:R53—R78, 1998. Karim Oweiss, Andrew Mason, Kyle Thomson, Jian Li, and Yasir Suhail. A scalable wavelet transform vlsi architecture for real-time neural signal condi- tioning in implantable multichannel neuroprosthetic devices. Under review, for IEEE Transactions on Circuits and Systems. Y. Suhail and KC. Oweiss. A reduced complexity integer lifting wavelet- based module for real-time processing in implantable neural interface devices. In Conference Proceedingsof the 26th Annual International Conference of the Engineering in Medicine and Biology Society, pages 4552 — 4555, Sept. 2004. K. Oweiss, K. Thomson, , and D. Anderson. A systems approach for real- time data compression in advanced brain-machine interfaces. In Conference Proceedings of the 2nd International IEEE EMBS Conference on Neural En- gineering, 2005. Z. Nenadic and J. W. Burdick. Spike detection using the continuous wavelet transform. IEEE Transactions on Biomedical Engineering, 52(1):74—87, Jan- uary 2005. K. G. Oweiss. Multiresolution Analysis of Multichannel Neural Recordings in the Context of Signal Detection, Estimation, Classification and Noise Suppres- sion. PhD thesis, University of Michigan, Ann Arbor, 2002. Michael W. Frazier. An Introduction to Wavelets Through Linear Algebra. Springer-Verlag, 1999. 77 [22] Paul M. Gochin, Michael Colombo, A. Gerald Dorfman., George L. Gerstein, and Charles G. Gross. Neural ensemble coding in inferior temporal cortex. Journal of Neurophysiogy, 71(6):2325—2327, June 1994. [23] Mark Laubach, Johan Wessberg, and Miguel A. L. Nicolelis. Cortical ensemble activity increasingly predicts behaviour outcomes during learning of a motor task. Nature, 405:567—571, June 2000. [24] H Eichenbaum, SI Wiener, ML Shapiro, and NJ Cohen. The organization of spatial coding in the hippocampus: a study of neural ensemble activity. Journal of Neuroscience, 9:2764—2775, August 1989. [25] Emery N. Brown, Robert E. Kass, and Partha P. Mitra. Multiple neural spike train data analysis: state-of-the—art and future challenges. Nature Neu- roscience, 7(5):456—461, May 2004. [26] Z. H. Perkel, G. L. Gerstein, and G. P. Moore. Neuronal spike trains and stochastic point processes. ii. simultaneous spike trains. Biophysics Journal, 7(4):419—40, July 1967. [27] M. R. Jarvis and Mitra P. P. Sampling properties of the spectrum and co- herency of sequences of action potentials. Neural Computation, 13:717-749, 2001. [28] ES. Chornoby, L.P. Schramm, and A. F. Karr. Maximum likelihood estimation of neural point processes. Biological Cybernetics, 59:265—275, 1988. [29] Shiyan Cao. Spike Train Characterization and Decoding for Neural Prosthetic Devices. PhD thesis, California Institute of Technology, 2003. [30] Y. Hata, T. T sumoto, H. Sato, and H. Tamura. Horizontal interactions between visual cortical neurones studied by cross-correlation analysis in the cat. Journal of Physiology, 441:593—614, September 1991. [31] IT. Jolliffe. Principal Component Analysis. Springer, 2 edition, October 2002. [32] D. W. Matula. Classification and Clustering, chapter Graph theoretic tech- niques for cluster anaysis algorithms, pages 95—129. Academic Press, 1977. [33] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001. [34] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 14, 2001. [35] K. Oweiss, R. Jin, and Y. Suhail. Scale-space processing and clustering for ef- ficient multi-electrode data analysis of large-size neuronal ensebles. In Confer- ence Proceedings of the 2nd International IEEE EMBS Conference on Neural Engineering, 2005. ‘Q 00