GRAPH-BASED METHODS FOR INFERRING NEURONAL CONNECTIVITY FROM SPIKE TRAIN ENSEMBLES By Seif El-din El-dawlatly A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Electrical Engineering 2011 ABSTRACT GRAPH-BASED METHODS FOR INFERRING NEURONAL CONNECTIVITY FROM SPIKE TRAIN ENSEMBLES By Seif El-din El-dawlatly Understanding the brain’s inner workings requires studying the underlying complex networks that bind its basic computational elements, the neurons. Advances in extracellular neural recording techniques have enabled simultaneous recording of spike trains from multiple single neurons in awake, behaving subjects. Yet, devising methods to infer connectivity among these neurons has been significantly lacking. We introduce a connectivity inference framework based on graphical models. We first infer the functional connectivity between neurons by searching for clusters of statistically dependent spike trains. We then infer the effective connectivity between neurons within each cluster by building Dynamic Bayesian Network (DBN) model fit to the spike train data. Using probabilistic models of neuronal firing, we demonstrate the utility of this framework to infer neuronal connectivity in moderate and large scale networks with a substantial gain in performance compared to classical methods. We further use this framework to examine the role of spike timing correlation in infragranular layer V of the primary somatosensory cortex (S1) of the rat during unilateral whisker stimulation in vivo. Stable, whisker-specific networks provided more information about the stimulus than individual neurons’ response. We finally demonstrate how this framework enables tracking and quantifying plastic changes in connectivity in biologically-plausible models of spike-timing-dependent-plasticity as well as changes in S1 response maps following sensory deprivation in the awake, behaving rat. Copyright by SEIF EL-DIN EL-DAWLATLY 2011 th To the martyrs of the January 25 , 2011 Egyptian revolution iv ACKNOWLEDGMENTS First, I want to express my gratitude to Allah whose blessings made my efforts fruitful. Second, I could not have done this without many people who have directly or indirectly helped me. I would like to thank them all, but there are some people who deserve special recognition. I would like to thank my advisor Dr. Karim Oweiss for his continuous guidance. His encouragement, patience and meeting with me every single week are the primary reasons this work has progressed. He has provided me with an excellent atmosphere for doing research as well as generous support to present our work at many prestigious conferences. I would like to express my appreciation to my dissertation committee members Dr. Hayder Radha and Dr. Hongbing Wang. I am also grateful to Dr. Rong Jin for providing valuable comments and suggestions throughout our collaboration. I would also like to thank all my current and previous lab mates who were more of friends than colleagues: Mehdi Aghagolzadeh, Ki Yong Kwon, Ali Mohebi, John Daly, Fei Zhang, Debbie Mukherjee, Jianbo Liu, Jasmina Jakupovic, Jennifer Parker and Faisal Abu-Nimeh. My thanks also go to my friends who have always been there for me: Kareem Refaat, Sameh Zakhary, Cherif Salama, Rana Alkhamra, Jameel Haddad, Mohamed El-gafy, Sara Al-alili, Mona Alchaer, Bassem Amin, Ahmed Mamoun and the TAP Wednesday soccer team. I would also like to thank my parents and my brother for their support and encouragement during years of my work. Living thousands of miles away has not prevented them from supporting me every single day during my PhD. Finally, I would like to thank the generous funding of my PhD by the National Institute of Neurological Disorders and Stroke (NINDS) grant number NS054148 and the Graduate School. v TABLE OF CONTENTS LIST OF TABLES....................................................................................................................... viii LIST OF FIGURES ....................................................................................................................... ix LIST OF ABBREVIATIONS..................................................................................................... xvii Chapter 1: Introduction ................................................................................................................... 1 Chapter 2: Background and Overview............................................................................................ 7 2.1 Neural Activity................................................................................................................ 7 2.2 Recording and Pre-processing of Extracellular Neural Activity .................................... 9 2.3 Existing Methods for Inferring Neuronal Connectivity................................................ 13 2.3.1 The Crosscorrelogram........................................................................................... 13 2.3.2 Information-theoretic Methods ............................................................................. 15 2.3.3 Granger Causality ................................................................................................. 17 2.3.4 Generalized Linear Model .................................................................................... 19 2.4 Graphical Models for Inferring Neuronal Connectivity ............................................... 22 Chapter 3: A Multiscale Clustering Approach for Identifying Neuronal Functional Connectivity...................................................................................................................................26 3.1 Neuron Population Model............................................................................................. 26 3.2 Scale-space Representation........................................................................................... 30 3.3 Similarity Measures ...................................................................................................... 32 3.4 Spectral Clustering........................................................................................................ 34 3.5 Results........................................................................................................................... 36 3.5.1 Clustering Synaptically-Coupled Populations ...................................................... 37 3.5.1.1 Population with Fixed History Interval................................................................. 37 3.5.1.2 Population with Variable History Interval............................................................ 40 3.5.1.3 Clusters with Distinct Temporal Interactions ....................................................... 42 3.5.1.4 Distinct within-cluster Temporal Interactions ...................................................... 42 3.5.2 Variability in Network Topology.......................................................................... 44 3.5.2.1 Structural Plasticity............................................................................................... 45 3.5.2.2 Spike-Timing Dependent Plasticity ...................................................................... 47 3.5.3 Clustering Large Populations................................................................................ 49 3.5.4 Estimating the Time Scale of Maximum Accuracy.............................................. 51 3.6 Validation using Barrel Cortex Data............................................................................. 52 3.7 Discussion and Conclusion ........................................................................................... 56 Chapter 4: Inferring Neuronal Effective Connectivity using Dynamic Bayesian Networks........ 59 4.1 Dynamic Bayesian Network Theory............................................................................. 59 4.2 Learning Bayesian Networks........................................................................................ 63 4.3 DBN and Spike Trains .................................................................................................. 65 4.4 Simulation Model.......................................................................................................... 67 vi 4.5 Results........................................................................................................................... 69 4.5.1 Networks with Fixed Synaptic Latency................................................................ 70 4.5.2 Mismatch between Actual Synaptic Latency and DBN Markov Lag................... 75 4.5.3 Networks with Variable Synaptic Latencies......................................................... 78 4.5.4 Identifying Monosynaptic Connectivity ............................................................... 79 4.5.5 Networks with Unobserved and Independent Neurons ........................................ 82 4.5.6 Identifying Connectivity in Large Populations..................................................... 86 4.5.7 Estimating the Markov Lag of Maximum Accuracy ............................................ 88 4.6 Discussion and Conclusion ........................................................................................... 90 Chapter 5: Millisecond-Timescale Stimulus-specific Network Coding ....................................... 93 5.1 Rat Barrel Cortex .......................................................................................................... 93 5.2 Methods......................................................................................................................... 95 5.2.1 Barrel Cortex Recording ....................................................................................... 95 5.2.2 Single Unit Analysis ............................................................................................. 98 5.2.3 Dynamic Bayesian Networks Analysis................................................................. 99 5.2.4 Network Similarity and Network Information...................................................... 99 5.2.5 Decoding Whisker Identity ................................................................................. 102 5.2.6 Crosscorrelogram Analysis................................................................................. 103 5.2.7 Predicting Single Neuron Firing from Pre-synaptic Peers.................................. 104 5.3 Results......................................................................................................................... 106 5.3.1 Firing Characteristics .......................................................................................... 106 5.3.2 Whisker-specific Networks................................................................................. 108 5.3.3 Relating Network Properties to Individual Neuronal Responses ....................... 116 5.3.4 Crosscorrelogram Comparison ........................................................................... 118 5.3.5 Network Model-based Prediction of Single Neuron Firing................................ 120 5.4 Position-specific Networks in the Medial Prefrontal Cortex...................................... 122 5.5 Discussion and Conclusion ......................................................................................... 124 Chapter 6: Network Dynamics Associated with Experience-dependent Plasticity in the Rat Somatosensory Cortex................................................................................................................. 130 6.1 Experience-dependent Plasticity in the Barrel Cortex ................................................ 130 6.2 Methods....................................................................................................................... 132 6.2.1 Barrel Cortex Recording and Electrode Implantation ........................................ 132 6.2.2 Data Analysis ...................................................................................................... 134 6.3 Results......................................................................................................................... 135 6.3.1 Changes in Individual Neuron Response............................................................ 135 6.3.2 Changes in Network Response ........................................................................... 137 6.4 Discussion and Conclusion ......................................................................................... 139 Chapter 7: Concluding Remarks and Future Directions ............................................................. 142 REFERENCES ........................................................................................................................... 148 vii LIST OF TABLES Table 4.1:Conditional probabilities Pr(W|R) and Pr(U|R)............................................................ 61 t+1 Table 4.2: Conditional probabilities Pr(S t t | S , R )..................................................................... 62 viii LIST OF FIGURES Figure 2.1: (a) Simple neuron structure (Adapted from www.solarnavigator.net/human_brain.htm). (b) Two neurons in contact. (c) A train of three action potentials or spikes. (d) Excitatory post-synaptic potential (top) and inhibitory post-synaptic potential (bottom) (Adapted from [32]). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. ......................................................................... 8 Figure 2.2: (a) Micro-wire arrays. Wires made of tungsten with a diameter of 50μm and a minimum inter-wire separation of 250μm (Adapted from www.tdt.com). (b) Siliconbased multi-electrode array (Michigan array) with (Left) 8 shanks and (Right) 4 shanks and 4 electrodes/shank [44]. (c) Another example of silicon-based arrays (Utah array) with 100 single electrodes [45]. ............................................................ 10 Figure 2.3: Extracellular recording pre-processing. Spikes are first detected (shown in red) from the filtered raw signal. Characteristic features are extracted from the detected spikes shape (using principal component analysis in the diagram). Each point in the feature space represents one spike. A clustering algorithm is then used to identify the spikes belonging to each neuron. In this example, 3 different neurons are recorded (red, blue and magenta) while the black cluster represents miss-detected spikes (i.e. noise). ................................................................................................................................... 12 Figure 2.4: (a) Crosscorrelogram and (b) Post Stimulus Time Histogram (PSTH) and Joint PSTH (JPSTH) computed from the spike trains of two neurons in the rat somatosensory (barrel) cortex in response to whisker stimulation. Bin size was set to 10 ms in both figures. The crosscorrelogram in (a) indicates an excitatory interaction between the two neurons as indicated by the significant narrow peak around 10 ms. The JPSTH in (b) indicates that the significant synchrony between the two neurons occurs about 30 ms post stimulus presentation and therefore may indicate that the stimulus represents the cause of the synchrony. ....................................................................................... 14 Figure 2.5: (a) GLM of a 2-neuron population used to fit spike trains from stimulus-driven retinal ganglion cell (RGC) responses in vitro. Each neuron input is processed by a stimulus filter ki, a post-spike filter αii and coupling filters that capture other neurons’ dependencies, αij’s. (b) Parameters of an example retinal ganglion ON cell model: Temporal and spatial components of center and surround filter, post-spike filter and coupling filters from neighboring OFF and ON cells. (Adapted from [83]). ............ 20 Figure 2.6: (a) Undirected graph of a sample neuronal population where edges indicate the membership of the connected neurons in the same cluster. (b) Directed graph of the same population in (a) where edges indicate the causal influence among the neurons. (c) Sample small-world network. Connections are mainly local with the exception of a few distal connections (neurons 2 to 6 and neurons 5 to 8). (d) Sample scale-free network, where a small number of neurons appears as hubs (neurons 2 and 8)........ 24 ix Figure 3.1: Synaptic weights for (a) Mpq = 40 bins (Hpq = 120 ms) and (b) Mpq =120 bins (Hpq = 360 ms). In both cases, Apq = 2.5, B = 2000 and F = 3000. When the history interval is limited to 40 bins, the function exhibits smaller time constant within the short time support where the function is non-zero. In contrast, the function exhibits larger time constant when Mpq =120. ........................................................................ 30 Figure 3.2: Two-level Haar scale space representation of a sample spike train. .......................... 32 Figure 3.3: A sample undirected graph with 4 objects and arbitrary edge weights for which the spectral clustering algorithm clusters objects with strong similarity (small edge weight indicates stronger similarity between objects)............................................... 35 Figure 3.4: Pseudocode of the multiscale clustering method. ...................................................... 36 Figure 3.5: (a) Network topology for the independent clusters population (solid lines), I = inhibition, E = excitation. (b) Sample 3-sec trace of the obtained spike trains. The initialization period is indicated by a black bar. (c) Sample ISI histogram for a neuron in the population and an independent neuron at 12 ms and 360 ms history intervals, respectively. Coefficient of variation of the ISI histogram. (d) Augmented correlation matrix using time scales up to 96 ms. (e) Average clustering accuracy vs. time scale for a fixed 360 ms history interval obtained using probabilistic spectral clustering and k-means clustering.............................................................................. 38 rd Figure 3.6: (a) Clustering accuracy vs. time scale fitted with 3 order polynomial for the network in Figure 3.5a for variable history intervals. (b) (Gray dots) Time scale of maximum clustering accuracy obtained for the 25 trials vs. the population true history interval. (Black dots) Average of maximum accuracy time scales. .............. 41 rd Figure 3.7: (a) Average clustering accuracy ± SD vs. time scale fitted with 3 order polynomial 3 4 1 2 3 4 for different settings of M and M (M and M fixed to 20 bins). H = H = H . (b) (Gray dots) Time scale of maximum clustering accuracy vs. history interval of clusters 3 and 4 for each of the 25 trials. (Black dots) Average of maximum accuracy time scales.................................................................................................................. 42 Figure 3.8: (a) Average clustering accuracy ± SD over all trials vs. time scale. (b) Average clustering accuracy using crosscorrelograms for different window sizes. (c) Average clustering accuracy using JPSTH for different window sizes. .................................. 43 Figure 3.9: (a) Network topology with across-cluster synaptic coupling (dotted lines). (b) Average clustering accuracy vs. time scale. (c) Augmented R matrix using time scales up to 96 ms. ..................................................................................................... 46 x Figure 3.10: (a) Network topology with paired stimulation to induce STDP. (b) EPSP slope as a function of pre-post synaptic spikes amplitude (A) and time constant (τ) were chosen as 1.01 and 14.8 ms for LTP, and -0.52 and 33.8 ms for LTD, respectively. (c) Average clustering accuracy post stimulation phases assuming 4 clusters and 2 clusters. (d) Similarity matrix using time scales up to 96 ms post the 10 stimulation phases and (e) post 120 stimulation phases. .............................................................. 48 Figure 3.11: (a) Network topology for a population of 120 neurons, where each cluster has 30 neurons. The bidirectional connections refer to an excitation connection in one direction and an inhibition in the opposite one. Black connections represent the within cluster connections, while gray connections represent the across clusters 1 and 2 connections that undergo functional plasticity. (b) Average pair-wise distance between neurons in clusters 1 and 2 for variable connectivity strengths. A connectivity strength of 1 indicates a connection across-clusters of the same strength as a connection within clusters. (c) Average clustering accuracy for variable acrossclusters connectivity strengths (clusters 1 and 2). ..................................................... 50 rd Figure 3.12: Average Dunn index ± SD over all trials vs. time scale fitted with 3 order polynomial for the network in Figure 3.5a for different history intervals................. 52 Figure 3.13: (a) Sample stimulus-specific networks formed of clusters for 3 different whiskers (C1, D1 and D2) of rat R3. (b) Network feature space of rat R3 for 3 different whiskers (C1, D1 and D2). Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. (c) Network information for rat R3 as a function of the number of clusters used for each of whiskers C1, D1 and D2............................................................................................ 54 Figure 3.14: (a) Histogram of the spatial difference between the connected neurons quantified in terms of the horizontal and vertical separations between the electrodes on which neurons were recorded. (b) Histogram of the temporal difference between the connected neurons quantified in terms of the difference in the connected neurons’ response latency to the stimulus. The number of clusters for each rat in both (a) and (b) was set to the number that maximizes the mutual information between the networks and the stimulus. ........................................................................................ 55 Figure 4.1: (a) An example of a Bayesian Network. Random variable R indicates if it rains or not, W indicates if the grass is wet or not, and U indicates if people bring umbrella or not. (b) An example of a Dynamic Bayesian Network. Each morning an automatic sprinkler randomly decides whether it should water the lawn or not conditioned on whether it watered the day before and whether there was rain the day before.......... 61 Figure 4.2: Schematic of the DBN structure search algorithm..................................................... 64 Figure 4.3: (a) A simple causal network where nodes represent neurons and edges represent connections. lij indicates the synaptic latency associated with each connection. (b) xi Corresponding DBN where black nodes represent the neuron firing state at a specific Markov lag while white nodes represent the present state......................................... 66 Figure 4.4: An illustrative graph of α ij (t ) where lijΔ is the synaptic latency, Aij models the + connectivity strength, Mij/3000 is the time constant of the decaying exponential where Mij denotes the number of history bins, and MijΔ is the history interval........ 68 Figure 4.5: (a) DBN performance vs. number of excitatory input (pre-synaptic) connections. Inset: Connection strengths Aij for each choice of that number. (b) DBN performance vs. firing history interval length. Inset: Connection strengths Aij for each choice of this length. (c) DBN performance (solid) vs. ratio of inhibitory to excitatory synaptic strength (I/E Ratio). The accuracy gets closer to unity as inhibition strength surpasses excitation strength (I/E ratio increases above 1). Partial Directed Coherence (PDC) (dashed) and Generalized Linear Model (GLM) fit (dotted) performance shown for comparison. (d) Coefficient of variation for the data analyzed in (c). (e) DBN performance vs. background rate............................................................................... 71 + Figure 4.6: (a) Coupling function α ij (t ) for different history intervals M (in bins). (b) DBN performance vs. Markov lag for variable history intervals and fixed synaptic latency (4 bins).(c) PDC performance vs. model order for the data analyzed in (a). (d) GLM fit performance vs. detection threshold for the data analyzed in (b). (e) GLM fit performance vs. GLM parameter WGLM when the model parameter M was set to 60 bins..............................................................................................................................76 Figure 4.7: (a) A simple network illustrating the dependence on the synaptic heterogeneity index (HI). The subpopulation inside the dashed circle has a HI of 3 while the entire population (neurons 1 to 6) has a HI of 5. The firing history was fixed at 60 bins (180 ms). (b) DBN, PDC and GLM-fit performance vs. HI.............................................. 79 Figure 4.8: (a) Example 3-neuron chain, where solid arrows represent true connections while the dotted one represents a spurious connection, l indicates the synaptic latency associated with each connection. (b) Average number of spurious connections inferred per chain....................................................................................................... 80 Figure 4.9: (a) Network structure examined where black nodes 1-5 in the dashed rectangle represent unobserved neurons. Solid arrows indicate real connections while dashed ones indicate the expected connections to be inferred by DBN. (b) DBN performance vs. difference between the synaptic latency of the connections to odd numbered neurons and even numbered neurons (lo − le) where le is fixed at 1 bin. (c) A network of 20 neurons, where white nodes indicate observed neurons while black nodes indicate unobserved neurons. (d) A network with 15 observed neurons; neurons 1-10 (inside the dashed circle) receive connections while neurons 11-15 are not connected to any neurons. The history interval was set to 60 bins (180ms) and the synaptic latency was fixed to 1 bin (3 ms) for all connections. ............................................... 84 xii Figure 4.10: (a) DBN performance vs. number of neurons/subpopulation for different search intervals. 10 different populations of 120 neurons each were simulated. (b) Search time required by DBN to achieve 100% accuracy (F-Measure = 1) vs. number of neurons/subpopulation. (c) DBN performance vs. search time for 120 neuron populations with no clusters. ..................................................................................... 87 Figure 4.11: (a) Mean influence score (IS) vs. Markov lag for variable history intervals for the same data analyzed in Figure 4.6a. Note that the mean IS has similar profile to the accuracy. (b) DBN performance vs. Markov lag for networks with HI equals 3. Maximum accuracy is attained when the maximum Markov lag matches the maximum synaptic latency in the population. (c) Mean influence score vs. Markov lag for networks with HI equals 3.............................................................................. 89 Figure 5.1: (a) (Left) The rat’s right snout with the whiskers organized in rows, indexed in letters, and columns, indexed in numbers. (Right) A horizontal section of the barrel cortex at layer IV showing the whiskers mapping (Adapted from [197]). (b) Flow of information from the Thalamus through the barrel system. The thickness of the arrows correspond to the strength of the connections [198]...................................... 94 Figure 5.2: (a) Experimental setup where rats were anesthetized and whiskers were deflected one at a time using a capillary tube glued to a piezoelectric bimorph. (b) Whiskers were deflected horizontally for 100 ms. Red curve indicates whisker displacement......... 97 Figure 5.3: Firing characteristics of the recorded neurons. Two sample neurons: (a) Neuron 2a and (b) Neuron 10a from rat R2 response to stimulation of whiskers C2, D1 and D2. (Top) Whisker displacement. (Middle) Spike raster over multiple repeated trials. (Bottom) Post-stimulus Time Histogram (PSTH) with 0.5ms bin size. Neuron 2a shows stronger and faster response to whisker C2 than other whiskers while Neuron 10a shows a slightly stronger and faster response to whisker D1. (c) Nissl stained coronal section (50 μm) in rat R5. This rat was chronically implanted over 35 days. Dashed curve indicates the original shape of the section that was damaged due to the implant. Black arrowhead points to an electrolytic lesion mark of the deepest recording site on one of the shanks of the multi-electrode array. The depth of the lesion mark (~1250 μm) is consistent with the depth recorded using the micromanipulator during the surgery and corresponds to layer Vb of the barrel cortex (1.1 mm posterior and 5.2 mm lateral to bregma). (d) Peak PSTH counts of each neuron for each whisker versus its mean first-spike latency (n = 240). Each dot corresponds to the response of a single neuron to a single whisker. (e) Normalized mutual information between spike count/first-spike latency and whisker identity. Each dot corresponds to one neuron. Black diagonal line represents equal information (n = 80)................................................................................................. 107 Figure 5.4: Sample whisker-specific networks. (a) DBN networks inferred from one population (Rat R2) for 3 individually stimulated whiskers: C2, D1 and D2. Undirected links indicate bidirectional connections. Network of each whisker was inferred from a xiii dataset of length 18 sec (180 trials x 100 ms). (b) Connection probability in the DBNs as a function of the horizontal and vertical separations between the electrodes on which neurons were recorded. The number of connections inferred at each distance was normalized by the corresponding total number of possible connections. ................................................................................................................................. 110 Figure 5.5: Networks similarity within- and across-whiskers. (a) Network feature space of rat R2 for 3 different whiskers (C2, D2 and D1) and a shuffled dataset. Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. Insets: Sample networks from the 3 whiskers. Black edges represent common connections with the top left sample network (whisker C2). (b) Similarity between networks inferred for the same whisker and between networks inferred for different whiskers averaged across subjects (mean ± SD). Similarity for a given pair of networks was quantified as 1 – the normalized distance between the projections of the pair in the principal component space. Within whisker similarity was corrected for the bias resulting from the overlap in the data (estimated from the shuffled data). The figure indicates that the within whisker networks cluster more closely compared to across-whisker networks. *P < 1e-6, two-sample t-test. (c) Normalized mutual information between each of the networks inferred from the original and shuffled data, and the stimulus averaged across subjects (mean ± SD). *P < 0.01, two-sample t-test. .................... 112 Figure 5.6: Overlap in the original and the shuffled datasets. (a) Distribution of the amount of overlap between any pair of datasets for (Right) the original data and (Left) the shuffled data. Blue curve indicates a binomial distribution fit with parameters p = 0.2 and n = 180. Both figures indicate that both the original and the shuffled data have the same degree of overlap where any given pair of datasets would have an overlap of 20±4%. (b) Network feature space of the shuffled datasets extracted from rat R2 data. Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. (c) Similarity between networks inferred for the same shuffled dataset and between networks inferred for different shuffled datasets averaged across subjects (mean ± SD). Similarity for a given pair of networks was quantified as 1 – the distance between the projections of the two networks in the principal component feature space................................................. 113 Figure 5.7: Individual neurons subnetworks information. (a) Network feature space of one sample neuron (Neuron 16b of rat R2) for 3 different whiskers (C2, D1 and D2). Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. (b) Information in the individual neurons’ networks inferred for the shuffled data versus the original data (mean ± SD). * P = 0, two-sample t-test. (c) Normalized information in first-spike latency and spike count versus normalized information in the networks of each neuron. Each dot corresponds to one neuron. Black diagonal line represents equal information. Network information was corrected for any statistical bias by subtracting the network information computed for each neuron from the shuffled data in (b). .................... 115 xiv Figure 5.8: Network information in the original data is consistently higher than that in the shuffled data, independent of the bin size. (A) Network information in the original and the shuffled data as a function of the bin size used to estimate the mutual information averaged across the 5 subjects (mean ± SD). (B) Information in the network of individual neurons computed from the original and the shuffled data as a function of the bin size used to estimate the mutual information, averaged across the 80 neurons (mean ± SD). ......................................................................................... 116 Figure 5.9: Network properties and individual neuronal responses. (a) Histogram of the difference between the mean first-spike latency of the post-synaptic cells and the pre-synaptic cells for each inferred connection. Only unidirectional connections were counted in the histogram. Red bars indicate the fraction of connections consistent with the difference between latencies while blue bars indicate connections that are not. (b) Ratio between the number of outgoing connections and ingoing connections for each neuron (source index) as a function of its mean first-spike latency for each whisker. Each dot corresponds to one neuron for a given whisker (n = 240). Z-scores of the mean first-spike latency and the source index are reported on the X-axis and the Y-axis, respectively. Gray curve indicates decaying exponential fit. (c) Information in the networks of each neuron as a function of the source index averaged across whiskers (n = 80). Each dot corresponds to the standardized z-scores of one neuron. Gray line indicates regression line. ................................................. 117 Figure 5.10: Comparison with networks inferred using crosscorrelograms. (a) Histogram of the difference between the mean first-spike latency of the post-synaptic neurons and the pre-synaptic neurons for each connection inferred using the crosscorrelogram technique. Only unidirectional connections were counted in the histogram. Red bars indicate the fraction of connections consistent with the difference between latencies while blue bars indicate connections that are not. (b) Ratio between the number of outgoing connections and ingoing connections for each neuron (source index) as a function of its mean first-spike latency for each whisker. Each dot corresponds to one neuron for a given whisker (n = 240). Z-scores of the mean first-spike latency and the source index are reported on the X-axis and the Y-axis, respectively. Gray curve indicates decaying exponential fit. (c) Fraction of possible chain effect-induced connections and common input-induced connections inferred by crosscorrelogram and DBN (mean ± SD). * P < 0.001, two-sample t-test. ......................................... 119 Figure 5.11: Predicting single neuron firing. (a) ROC curves of sample neuron 2b in response to whisker C2 deflection (left) and neuron 17b in response to whisker D2 deflection (right) in rat R2. Predicted spike trains were obtained using the pre-synaptic cells’ history inferred by DBN and stimulus onset time, pre-synaptic cells’ history inferred by the GLM and stimulus onset time, and the stimulus onset time only. Each curve was computed from tenfold cross-validation datasets. (b) Predictive power comparison across all rats for multiple models (mean ± SD). * P < 0.001, twosample t-test. (c) GLM predictive power versus that obtained using DBN and stimulus onset time. Black diagonal line represents equal prediction (n = 240). (d) Predictive power obtained using DBN and stimulus as a function of the variability in xv mean first-spike latency. (e) Difference between predictive power for each neuron obtained using DBN and stimulus and that obtained using GLM as a function of the variability in mean first-spike latency. Each dot corresponds to one neuron for a given whisker (n = 240). Gray lines in (d) and (e) indicate regression lines........... 121 Figure 5.12: Partitioning the spike trains to obtain position-specific datasets for 24 right-turn trials (similar analysis for 18 left-turn trials)........................................................... 123 Figure 5.13: Networks inferred from left and right trials data for different maze position intervals. Only connections appeared in at least 3 out of the 8 datasets of each position interval are shown. Clear dissimilarity between network models of left and right trials can be seen. ......................................................................................................................... 125 Figure 5.14: (a) Similarity between the inferred networks at different maze positions computed using the F-Measure (equation 4.9) averaged across both sides. On-diagonal entries are eliminated. (b) Average similarity between the inferred networks vs. distance between maze position intervals. Values were extracted from the matrix shown in (a). It can be seen that networks become increasingly dissimilar as a function of the distance between position intervals, suggesting a differential population encoding mechanism that depends on the behavioral goal. .................................................... 126 Figure 6.1: (a) Timeline of the experiment. (b) Difference in the evoked response and (c) in the first-spike latency across the spared whiskers averaged across neurons and subjects for the control data (blue) and the plasticity data (red) recorded 1-2 days and 6-7 days post whisker-trimming. Y-axis was normalized by the maximum evoked response in (b) and the maximum first-spike latency in (c) across the spared whiskers for each neuron. *P < 0.05, two-sample t-test......................................................... 136 Figure 6.2: (a) Control (left) and 7-days post whisker-pairing (right) network feature space of a sample rat for 3 different whiskers (D4, D5 and D6). Whiskers D4 and D5 were paired after recording the control data. Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. (b) Network similarity across spared whiskers for the control data (blue) and data recorded 1-2 days and 6-7 days post-pairing (red) averaged across 4 subjects (mean ± SD). Similarity for a given pair of networks was quantified as 1 – the normalized distance between the projections of the pair in the principal component space. *P < 0.05, two-sample t-test............................................................................................. 138 Figure 7.1: A DBN model that distinguishes between signal and noise correlations. Gray node (W) takes a value ∈ {0, 1}, where 0 indicates no whisker deflection, while 1 indicates the deflection of any whisker. Dotted edges indicate signal influence while solid edges indicate noise influence. Signal correlation can be inferred from the graph by searching for neurons receiving connections from W (such as neurons 1 and 2) whereas noise correlation can be estimated by searching for neurons receiving connections from a common parent (such as neurons 3 and 4)............................... 145 xvi LIST OF ABBREVIATIONS AUC Area Under the Curve DBN Dynamic Bayesian Networks EPSP Excitatory Post-synaptic Potential GLM Generalized Linear Model HI Heterogeneity Index IPSP Inhibitory Post-synaptic Potential IS Influence Score ISI Inter-spike Interval JPSTH Joint Post-stimulus Time Histogram mPFC Medial Prefrontal Cortex PC Principal Component PDC Partial Directed Coherence PSTH Post-stimulus Time Histogram ROC Receiver Operating Characteristics SD Standard Deviation STDP Spike Timing-dependent Plasticity xvii Chapter ____________________________________________ 1 Introduction ______________________________________________________________________________ Hundred billion neurons with a total of two million miles of axons and trillion connections make the human brain the most complex organ in the human body [1, 2]. Studying the brain and its inner workings is therefore an extremely complex task. The functionality of the brain is known to lie in the enormous number of connections among the neurons that interact, rather than act independently. These interactions mediate many complex brain functions along numerous parallel and sequential pathways with highly intricate network structures. The massive size of these networks suggests the vastly complex information processing mechanism that underlies their operation [3, 4]. Many nervous system diseases and disorders are also known to result from abnormal changes in the function and/or the structure of these networks [5-7]. Deciphering the neural circuitry underlying the observed behavior is therefore of utmost importance in systems and clinical neuroscience. The long standing goal of inferring neural circuitry in vivo has always been challenged by the lack of technology that enables simultaneous recording of brain activity at the individual neuron level. Despite the emergence of different non-invasive techniques to record brain activity such as Electroencephalogram (EEG), Functional Magnetic Resonance Imaging (fMRI) and Magnetoencephalogram (MEG), these techniques do not directly measure the activity of individual neurons [8]. They rather measure the aggregate activity in a certain brain area which 1 leads into low spatial and temporal resolutions. On the other hand, invasive recording techniques have progressed over decades from recording the activity of an individual neuron using a single penetrating electrode to simultaneously recording the activity of multiple neurons using microelectrode arrays [9-12]. Implantable microelectrode arrays have enabled scrutinizing activity from cortical neurons at an unprecedented scale, and greatly accelerated our ability to monitor functional alterations of cortical networks in awake, behaving subjects [13-15]. The concomitant neurobiological discoveries have paved the way for these devices to become an essential constituent in neuroprosthetic devices to provide assistive technology to people with severe disability. Consequently, capitalizing on these advances by developing data analysis tools to identify neuronal connectivity from simultaneously recorded neuronal activity becomes the next essential step towards understanding how the brain encodes information in the population activity. There has been extensive research in the past few decades to develop methods to discover neuronal connectivity from simultaneously recorded spike trains. Neuronal connectivity can be classified into two different types: anatomical connectivity and inferred connectivity. Anatomical connectivity refers to the direct unidirectional synaptic links between the neurons which can be excitatory or inhibitory. On the other hand, inferred connectivity refers to the connectivity that can be identified between observed neurons in the presence or absence of a specific stimulus presentation. The inferred connections represent statistical dependence between the observed neurons that may or may not necessarily match the anatomical connections. Two types of inferred connectivity are frequently used in the literature: functional connectivity and effective connectivity [16, 17]. Functional connectivity refers to the undirected statistical dependence between the observed variables that is often represented in the form of clusters [18, 19]. At the 2 neuronal level, this can result from the presence of a synaptic link between the neurons, or can be observed when two unlinked neurons respond to a common driving input, for e.g., an unobserved neuron. Effective connectivity, on the other hand, refers to the causal relationships governing this dependence and involves identifying the putative pre- and post-synaptic neurons in the observed mixture. Existing neuronal connectivity inference methods include cross correlation-based methods, spike pattern classification, likelihood, information theory and frequency-domain methods [16, 20-22]. However, many of these methods were developed for analyzing single electrode recordings, and therefore are constrained by the number of neurons that a single depth electrode can pick, mostly pairs or triplets of neurons [23, 24]. When large-scale recording of multiple single units became more common, usage of these methods to infer a complex dynamic network structure has shown to be a nontrivial task. In addition some of these methods suffer from a number of limitations such as relying on pair-wise correlations and detecting linear relationships only. The objective of this dissertation is to: Develop advanced graph-based techniques to infer neuronal connectivity in large scale ensembles and demonstrate their analysis of simultaneously recorded spike trains Graph-based techniques have become increasingly popular in the analysis of large scale neural data [25, 26]. Advantages of graph-based techniques over classical measures of neuronal connectivity include the ability to consider the activity of the entire observed neuronal population rather than relying on pair-wise measures, the ability to detect non-linear relationships and the intuitive graphical representation of the inferred neuronal networks which allows visualizing and quantifying the properties of these networks as well as the plastic changes that occur to these networks, for example, as a result of learning and memory. 3 This dissertation reviews previous methods that have been proposed to identify neuronal connectivity from population activity and discusses their utility as well as their limitations. Graph-based methods are emphasized as a way to circumvent many of the limitations of the reviewed methods. We introduce two graphical approaches in this dissertation: First, the multiscale clustering algorithm for inferring functional connectivity which attempts to find disjoint subspaces (clusters) with arbitrary structures in the entire neural space using unlabeled data [27]. Second, using Dynamic Bayesian Networks (DBNs) to find the intrinsic structure of the system underlying each subspace by transforming the inference problem to a structure learning problem for which many efficient search algorithms exist [28]. In addition to validating the introduced approaches using data simulated by biologically-plausible models, we also provide evidences for the validity of these approaches by applying them to data recorded from the rat somatosensory cortex. Together, the two algorithms constitute a two-stage framework for statistical inference in large scale neural population data [29]. Chapter 2 provides the background and overview of the research presented in this dissertation. The chapter begins with a brief introduction of the biological aspects of neural activity since it is crucial to the understanding of the nature of the analyzed data. This is followed by the pre-processing steps required to extract neuronal spike trains from recorded extracellular activity. We then survey the most-widely used connectivity inference methods, stating the advantages and the limitations of each method. We finally introduce the utility of graph-based methods as a general framework for inferring neuronal connectivity. Chapter 3 suggests a multiscale clustering approach to identify functional connectivity between neuronal elements from their simultaneously recorded spike trains [27]. In particular, we identify clusters of neurons that exhibit functional interdependency over variable spatial and 4 temporal patterns of interaction. We represent neurons as objects in a graph and connect them using arbitrarily-defined similarity measures calculated across multiple time scales. We then use a probabilistic spectral clustering algorithm to cluster the neurons in the graph by solving a minimum graph cut optimization problem. Using point process theory to model population activity, we demonstrate the robustness of the approach in tracking a broad spectrum of neuronal interaction, from synchrony to rate co-modulation, by systematically varying the length of the firing history interval and the strength of the connections that govern the discharge pattern of each neuron. We also demonstrate how activity-dependent plasticity can be tracked and quantified in multiple network topologies built to mimic distinct behavioral contexts. We compare the performance to classical approaches to illustrate the substantial gain in performance. We finally validate the approach using real data recorded from the rat somatosensory cortex. Chapter 4 investigates the applicability of Dynamic Bayesian Networks (DBNs) in inferring the effective connectivity between spiking cortical neurons from their observed spike trains [28]. We demonstrate that DBNs can infer the underlying non-linear and time-varying causal interactions between these neurons and can discriminate between mono- and poly-synaptic links between them under certain constraints governing their putative connectivity. We analyzed conditionally-Poisson spike train data mimicking spiking activity of cortical networks of small and moderately-large sizes. The performance was assessed and compared to other methods under systematic variations of the network structure to mimic a wide range of responses typically observed in the cortex. Results demonstrate the utility of DBN in inferring the effective connectivity in cortical networks. Chapter 5 presents a comprehensive study in which stimulus-specific networks are inferred in layer V of the rat somatosensory cortex [30]. We examined spike timing correlation between 5 simultaneously recorded layer V neurons within and across columns of the primary somatosensory cortex of anesthetized rats during unilateral whisker stimulation. For each stimulated whisker, we inferred stable, whisker-specific, DBNs over many repeated trials, with network similarity of 83.3±6% within whisker, compared to only 50.3±18% across whiskers. These networks provided information about whisker identity that was 6 times higher than that provided by the latency to first spike and 22 times higher than that provided by the spike count of individual neurons. Furthermore, prediction of individual neurons’ precise firing conditioned on knowledge of putative pre-synaptic cell firing was 3 times higher than predictions conditioned on stimulus onset alone. These results suggest the presence of a temporally precise network coding mechanism that integrates information across neighboring columns within layer V about vibrissa position to mediate whisker movement by motor areas innervated by layer V. In addition, we demonstrate the inference of stable position-specific networks in the rat medial prefrontal cortex (mPFC) during a working memory task [29]. Chapter 6 demonstrates the use of DBNs to track plastic changes in layer V of the rat somatosensory cortex resulting from sensory alteration induced by trimming all whiskers but two, referred to as whisker pairing. Stimulus-driven activity before pairing and after 1-2 days and 6-7 days of pairing were analyzed using DBN. An increased similarity between the networks inferred for the spared whiskers was observed as a result of pairing where the similarity for the control data was 0.58±0.1, while after 1-2 days of pairing it increased to 0.66±0.1, followed by a further increase after 6-7 days to 0.71±0.1. These changes are consistent with previous studies that hypothesized that plasticity resulting from whisker pairing follows Hebb’s postulate of “Neurons that fire together, wire together”. Finally, Chapter 7 provides concluding remarks and future directions. 6 Chapter ____________________________________________ 2 Background and Overview ______________________________________________________________________________ 2.1 Neural Activity Neurons have four distinct regions that can be ordered according to the signal flow direction as: the dendrites, the cell body (the soma), the axon, and the axon terminals as shown in Figure 2.1a [31]. The dendrites serve as the antennae of the neuron through which the neuron receives inputs from other neurons. When the received inputs are strong enough, the cell body decides to fire an output signal, called an action potential or a spike, shown in Figure 2.1b. The spike is then propagated through the axonal terminals that deliver the output of the neuron to other adjacent and distal neurons. Neurons communicate with each other through specialized sites called synapses as shown in Figure 2.1c [31]. A synapse is the point at which the axon terminal of the pre-synaptic neuron (transmitter neuron) comes in contact with the dendrites of the post-synaptic neuron (receiver neuron). In general, connections between neurons are unidirectional, and are either excitatory or inhibitory. A spike from the pre-synaptic neuron increases the post-synaptic neuron firing probability in case of an excitatory connection, while for an inhibitory connection, it decreases the post-synaptic neuron firing probability. A neuron generates a spike if the integration of the inputs coming through the dendrites is higher than a predefined membrane threshold. In case of 7 Dendrites AP Direction Cell Body Synapse Axon (b) Post-synaptic Cell EPSP Memb. Potential +50 mV (a) Action Potential Pre-synaptic Cell Axon Terminals 10 - 500 ms 10 - 500 ms -60 mV Time IPSP 2 ms (c) (d) Figure 2.1: (a) Simple neuron structure (Adapted from www.solarnavigator.net/human_brain.htm). (b) Two neurons in contact. (c) A train of three action potentials or spikes. (d) Excitatory post-synaptic potential (top) and inhibitory postsynaptic potential (bottom) (Adapted from [32]). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. an excitatory connection, a spike from the pre-synaptic neuron leads to an increase in the membrane potential of the post-synaptic neuron. This increase is called excitatory post-synaptic potential (EPSP), which is shown in Figure 2.1d [32]. For an inhibitory connection, a presynaptic spike causes an inhibitory post-synaptic potential (IPSP) leading to a decrease in the membrane potential of the post-synaptic cell. For both EPSP and IPSP, the membrane potential decays back to the resting potential 10 to 500 ms (depending on the synapse properties) after the occurrence of the pre-synaptic spike at time 0. This implies that the effect of a pre-synaptic spike on the firing of the post-synaptic neuron lasts for a certain period of time during which that effect varies. The influence of that pre- 8 synaptic spike on the post-synaptic neuron firing at a certain time instant is the amount of change caused by the corresponding EPSP or IPSP signal in the post-synaptic neuron membrane potential at that time instant. Thus from the post-synaptic neuron perspective, it considers at each time instant the effects caused by previous pre-synaptic spikes, which leads to the notion of the firing history. The firing history of pre-synaptic neurons plays an important role in governing the firing of the post-synaptic neurons. 2.2 Recording and Pre-processing of Extracellular Neural Activity Techniques for recording brain activity can be classified into two categories: non-invasive techniques and invasive Electroencephalogram techniques. (EEG), Functional Examples Magnetic of non-invasive Resonance techniques Imaging (fMRI) are and Magnetoencephalogram (MEG). Being non-invasive, these methods measure the aggregate activity in a certain brain area not the activity of individual neurons leading to low spatial and temporal resolutions [8]. Numerous functional neuroimaging studies suggest that cortical regions selectively couple to one another [33-35], for example, during complex visual stimulus presentations [36-38], during motor prehension and execution in visuomotor tasks [39, 40], or in working memory and auditory-related tasks [41, 42]. It is widely believed, however, that the low temporal and spatial resolutions of these techniques do not enable the investigation of the brain’s functional dynamics at the single neuron level [43], nor do they enable the identification of causal relationships between multi-units across distant cortical regions. This dissertation is mainly focused on signals recorded using invasive techniques, specifically, signals recorded using implantable high-density microelectrode arrays [11, 12] which have enabled scrutinizing activity from cortical neurons at temporal and spatial resolutions 9 (a) (b) (c) Figure 2.2: (a) Micro-wire arrays. Wires made of tungsten with a diameter of 50μm and a minimum inter-wire separation of 250μm (Adapted from www.tdt.com). (b) Silicon-based multielectrode array (Michigan array) with (Left) 8 shanks and (Right) 4 shanks and 4 electrodes/shank [44]. (c) Another example of silicon-based arrays (Utah array) with 100 single electrodes [45]. (1-2 ms and 10-100 μm) much higher than what can be achieved using non-invasive recording techniques [24, 46]. Figure 2.2 illustrates two major kinds of microelectrode arrays: the tungstenbased arrays and the silicon-based arrays. The geometry of these arrays enables them to record from individual neurons extracellularly across different brain layers. The high temporal and spatial resolution of these arrays opened the door to investigating how populations of neurons encode information at the individual neuron level. They have been extensively used in studies addressing crucial neuroscience questions such as how neuronal ensembles represent sensory inputs [47-49], plan motor actions [50-52] and perform complex cognitive tasks such as decision making [13, 53]. Although the methods presented in this dissertation are introduced to analyze data recorded using microelectrode arrays, these methods can be applied to other signal modalities such as spiking recorded using 2-photon Calcium imaging [54]. 10 It is widely accepted in the literature that information processing in the brain is carried out using information in the spike event timing and/or rate, and not in the spike shape. Given that extracellular recording of ensembles of neurons is inevitable with multi-electrode arrays, spike sorting typically takes place to convert the neural recording mixture to spike trains of the individual neurons observed as illustrated in Figure 2.3 [55-57]. First, the recorded raw signals are band-pass filtered in the range [300, 5000] Hz which is the frequency band of the spikes. Second, spikes are detected on each electrode using a spike detection algorithm which typically identifies the time of the spikes by comparing the recorded signal amplitude to a pre-defined threshold [58]. In the time-domain, a threshold of 3 to 5 times the standard deviation of the noise amplitude, depending on the Signal-to-Noise Ratio (SNR), is normally used for spike detection [59, 60]. Third, a window with a pre-defined width, typically 1 to 2 ms, around the detection point of each spike is extracted. The extracted spikes are then aligned at either their peaks or their troughs depending on the recorded signals. Fourth, characteristic features are extracted from the spike shapes. Although spikes of all neurons are stereotypical with the same shape when measured intracellularly, they would have different shapes when recorded extracellularly. For example, the amplitude of the spikes emitted by a given neuron recorded on an electrode is inversely proportional to the distance between that neuron and the electrode. Therefore, given that spikes from different neurons will be recorded with different shapes, features extracted from the spike shapes can be used for spike sorting in which the recorded spikes are assigned to different neurons. Simple features can be extracted from the spike shapes such as the amplitude, the spike width and rise time. However, many more accurate approaches exist for feature extraction that lead to more compact clusters and more separation between the clusters of different neurons in the feature space and thus better spike sorting results. The most commonly 11 Raw Data 0 1 Time (sec) 2 Detection 0 Feature Extraction 1 Time (sec) 2 Feature Space After Clustering Feature Space PC2 PC 1 Clustering PC 2 PC1 Spike Assignment Figure 2.3: Extracellular recording pre-processing. Spikes are first detected (shown in red) from the filtered raw signal. Characteristic features are extracted from the detected spikes shape (using principal component analysis in the diagram). Each point in the feature space represents one spike. A clustering algorithm is then used to identify the spikes belonging to each neuron. In this example, 3 different neurons are recorded (red, blue and magenta) while the black cluster represents miss-detected spikes (i.e. noise). 12 used approach is principal component analysis (PCA) in which the first 2 or 3 principal components are used as the classification features [59]. Finally, a clustering algorithm such as kmeans, fuzzy c-means or expectation maximization is used to cluster the feature space and identify the spikes belonging to each neuron [61, 62]. 2.3 Existing Methods for Inferring Neuronal Connectivity The problem of inferring connectivity between neurons can be stated as follows: we observe the spike trains of a population of N neurons, denoted S=[Si]N×T, during an interval of length T. We want to identify the set of neurons, πi, in the set of the observed N neurons that affect the firing of neuron i, Si, in a statistical sense. We assume that S is expressed in the form of digital spike trains with a very small bin width Δ (usually 1-3 ms). This form assumes that at time bin m, Si(mΔ)=1 would indicate a spike is present, while Si(mΔ)=0 would indicate no spike. 2.3.1 The Crosscorrelogram The Crosscorrelogram is one of the oldest and most widely used methods for inferring neuronal connectivity [23, 63]. It is computed as a histogram of spike counts for a given neuron within a bin of pre-defined width ω, relative to the spike count of another neuron. For a given pair of neurons (i, j), and a given lag kω, the crosscorrelogram CCij is expressed as CC ij (kω ) = T kω Δ m =1 l =1+ (k −1)ω Δ ∑ S i (mΔ ) ∑ S j ((m + l )Δ ) (2.1) The peak of this histogram at a certain lag indicates excitation, while a trough indicates inhibition. The sign of the lag at which the peak or the trough occurs determines the direction of 13 the inferred connection. A significant peak or a trough at a positive lag indicates a connection from neuron i to neuron j, while a significant peak or a trough at a negative lag indicates a connection from neuron j to neuron i. An extension to the crosscorrelogram method is the Joint Peri-Stimulus Time Histogram (JPSTH) which measures the lagged synchrony between pairs of neurons relative to the onset of an external stimulus [21]. JPSTH results in a 2-dimensional histogram in which the value at a given element (t1, t2) represents the number of times the first neuron fired at time t1 and the second neuron fired at time t2 as shown in Figure 2.4. Both t1 and PSTH of Unit 1 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 -400 -200 Average Spikes/bin Normalized Counts/bin Sample Crosscorrelogram 0 200 0.2 0.1 0 PSTH of Unit 2 0.2 0.1 0 400 Time (ms) 1 (a) 200 Time (ms) 400 JPSTH 400 2 0 200 0 0 200 400 Time (ms) (b) Figure 2.4: (a) Crosscorrelogram and (b) Post Stimulus Time Histogram (PSTH) and Joint PSTH (JPSTH) computed from the spike trains of two neurons in the rat somatosensory (barrel) cortex in response to whisker stimulation. Bin size was set to 10 ms in both figures. The crosscorrelogram in (a) indicates an excitatory interaction between the two neurons as indicated by the significant narrow peak around 10 ms. The JPSTH in (b) indicates that the significant synchrony between the two neurons occurs about 30 ms post stimulus presentation and therefore may indicate that the stimulus represents the cause of the synchrony. 14 t2 are measured relative to the stimulus onset. The further the distance from the diagonal, the more lagged the synchrony is. Both methods have a number of limitations. First, the analysis bin size is fixed and this means that neuronal firing is assumed stationary for both neurons during this interval.. There is ample evidence, however, suggesting that the time scale of neuronal excitability is largely timevarying, possibly to maintain a principled excitation–inhibition homeostatic balance [64, 65]. Most importantly, recent studies on spike-timing-dependent plasticity (STDP) have demonstrated that neurochemical receptors can generate excitatory post synaptic potentials (EPSPs) at resting potential that can last a few hundred milliseconds, which elongates the time window required for synaptic integration and eventually alters the temporal length for neuronal excitability [64, 6671]. Second, both the CC and JPSTH methods only consider correlation between pairs of neurons. Considering the effect of the entire population becomes important when quantifying information transmission in large networks that is not accounted for by using pair-wise association measures such as cross correlation. Third, cross correlation can only be interpreted as a measure of statistical dependence when the pair-wise responses are jointly Gaussian, which may not always be the case. Lastly, the analysis is typically limited to very short coincidence intervals above which the effect of covariation of firing rates in response to a common driving input cannot be ruled out [72]. 2.3.2 Information-theoretic Methods Information theoretic (IT) methods attempt to circumvent the limitations of crosscorrelograms by considering measures, such as mutual information [73], that rely on higherorder statistics but may differ in the neuronal response property used. The advantage is that the 15 analysis can be carried out independent of the detailed system properties as it relies on building empirical histogram estimates of joint response distributions. In some cases, precise spike timing is used to estimate these distributions [74-76], thereby revealing the tendency of neurons to synchronize their firing. In others, a binned version of the spike trains is used, thereby measuring stimulus-dependent covariation of firing rates. One example is the transfer entropy measure (TE) defined as [77] TE j →i = I ⎛ Si ; S h ⎜ j ⎝ f S ih ⎞ = ⎟ ⎠ ∑∑∑ Pr ( f si , s h , sih j f h h si s j si Pr ⎛ si s h , sih ⎞ ⎜ ⎟ j ⎝ ⎠, log f Pr ⎛ si sih ⎞ ⎜ ⎟ ⎝ ⎠ ) f (2.2) where sif , s h and s ih are binary variables, I ⎛ S i f ; S h S ih ⎞ is the mutual information between the ⎜ ⎟ j j ⎝ ⎠ f future firing of neuron i, S i , and the past firing of neuron j, S h , that cannot be accounted for by j using the past firing of neuron i, S ih . This measure can also be computed using binned versions of the spike trains. IT methods can also be viewed as pattern detection methods that search for frequent patterns of spikes (symbols) appearing more than chance level [78, 79]. The size of the pattern (word length) is an important feature that has to be identified to determine if the frequency of occurrence of a certain pattern above chance is indicative of a functional relationship. In a few studies [80, 81], the choice of the timescale was found to yield strongly correlated network states, even though pair-wise correlations computed over a fixed bin width was statistically insignificant. One potential caveat of IT methods is the need for large amounts of data where the connectivity can be assumed stationary to provide unbiased estimates of the joint distributions of the patterns. Neural plasticity may affect these estimates if it occurs over short time scales [77]. 16 The most popular among IT methods is the Maximum Entropy model (MaxEnt), rooted in statistical physics, in which the joint density of the population is expressed as an exponential function of the weighted individual and pair-wise activities ⎞ ⎛ ⎟ ⎜N N N 1 1 Pr (S1 , S 2 ,..., S N ) = exp⎜ ∑ δ i S i + ∑ ∑ γ ij S i S j ⎟ ⎟ ⎜ Z 2 i =1 j =1 ⎟ ⎜ i =1 j ≠i ⎠ ⎝ (2.3) where Z is a normalization factor, and δi and γij are the unknown parameters of the model. These parameters are estimated using an iterative scaling algorithm so that the expected values of the individual and pair-wise responses match the measurable information in the experiment [82] the mean spike count within a fixed window and pair-wise correlations. Pair-wise MaxEnt models have been shown to fit experimental data in vitro for small systems such as retinal ganglion cell cultures [83]. Their performance for larger systems (> ~20 neurons) remains questionable [84]. Recently, we proposed a Minimum Entropy Distance (MinED) algorithm to alleviate this curse of dimensionality when characterizing larger systems [85]. The method attempts to find the network structure that minimizes the distance between the entropy of single trials and the true entropy of the data, estimated from a large number of repeated trials. It has been tested on data from biologically plausible models of spiking neurons with natural adaptation mediated by an STDP mechanism [86]. The method demonstrated good generalization capability given limited data available. 2.3.3 Granger Causality Granger causality is one popular method that has been applied to study causal links between random variables [87]. Specifically, suppose that the spike train of neuron i at time bin m can be 17 predicted given the neuron’s own firing history and that of another neuron j using the bivariate autoregressive model S i (mΔ ) = K ∑ Aii (k )Si ((m − k )Δ ) + k =1 K ∑ Aij (k )S j ((m − k )Δ ) + ε i j (mΔ ) (2.4) k =1 where K is the maximum number of lags (model order) and the A’s represent the linear regression coefficients obtained by minimizing the squared prediction error εi|j when Sj is used to predict Si. Neuron j is said to Granger cause neuron i if the inclusion of Sj in equation (2.4) reduces the variance of the prediction error. This bivariate model can be extended to a multivariate model by summing the effect of other neurons in the population in the second and third terms of equation (2.4) [88]. Extending the time domain model in equation (2.4) to the frequency-domain yields the Partial Directed Coherence (PDC) and Directed Transfer Function (DTF) approaches [89, 90]. The main advantage is the abolition of the dependence of the time domain models on a specific bin size, under the assumption of stationarity within the analysis interval. Granger causality, whether computed in the time-domain or the frequency-domain, assumes linear interactions by virtue of the autoregressive model structure. This might lead to erroneous conclusions when nonlinear interactions occur, particularly in higher-level association cortices [91]. For example, auditory neurons have been consistently found to perform coincidence detection of their dendritic inputs [92, 93], while posterior parietal cortex neurons perform gain modulation to guide motor behavior in response to visual stimuli [94, 95]. It has also been reported that Granger causality cannot detect inhibitory connections with the same accuracy as excitatory ones [96, 97], and cannot discriminate between mono and polysynaptic connections. 18 2.3.4 Generalized Linear Model The Generalized Linear Model (GLM) is a generative model of wide use in many statistical problems. In the context of modeling neuronal population activity, GLM models the output of each neuron in terms of a conditional intensity function, λ(t F(t)) of a stochastic point process, where F(t) models all the factors that influence the neuron output [98, 99]. For inhomogeneous Poisson processes, the conditional intensity of neuron i is expressed as ⎛ λi (t Fi (t )) = exp⎜ β i + k i .ξ h + ⎜ ⎝ ⎞ αij .S h ⎟, ∑ j⎟ j ∈π i ⎠ (2.5) h where Fi(t) is a linear sum of the neuron‘s background rate βi, an extrinsic covariate ξ that the neuron independently encodes, and the firing history S h of the neurons driving neuron i forming j the set πi prior to time t (inclusive of neuron i to model self-inhibition caused by refractoriness). The parameters ki and αij model the neuron’s receptive field and synaptic coupling with neuron j, respectively. While the GLM has been used as a generative model and not as a method to infer neuronal connectivity, optimizing its parameters to fit the observed data involves calculating the coupling parameters αij’s and therefore can be viewed as a tool to identify neuronal connectivity. Typically, αij’s are calculated using maximum likelihood estimation techniques [100]. GLMs have been successfully shown to fit data in vivo from primary motor cortex during goal-directed 2D arm reach movements [98], as well hippocampus place cell firing activity of rats in spatial navigation tasks [101]. They have also been shown to fit population data from in vitro neural cultures such as the primate retinal ganglion cells (RGCs) [83], a structure with wellcharacterized anatomical connectivity. Figure 2.5 shows a block diagram of the model for a pair 19 Stimulus Filter Coupled Spiking Model Stimulus Stochastic Filter Nonlinearity Spiking -200 -100 0 Time (ms) Post-spike Filter Post-spike Filter 1 Gain Neuron 1 Coupling Filters 0 20 40 Time (ms) Incoming Coupling Filters OFF ON Neuron 2 3 2 1 0 (a) 0 1.0 0.8 0.6 20 40 0 20 Time (ms) 40 (b) Figure 2.5: (a) GLM of a 2-neuron population used to fit spike trains from stimulus-driven retinal ganglion cell (RGC) responses in vitro. Each neuron input is processed by a stimulus filter ki, a post-spike filter αii and coupling filters that capture other neurons’ dependencies, αij’s. (b) Parameters of an example retinal ganglion ON cell model: Temporal and spatial components of center and surround filter, post-spike filter and coupling filters from neighboring OFF and ON cells. (Adapted from [83]). of RGC neurons recorded in the study by Pillow et al. [83], where an example of the estimated parameters for one ON cell is illustrated, including the estimated synaptic coupling parameters with the neighboring ON and OFF cells. A critical assumption in the GLM is that the dependence of the firing rate on the external covariates is implicitly linear, with possibly the inclusion of a nonlinearity, such as the exponential function in the Poisson case. There are many prime differences, however, between the structure of cortical networks observed in vivo and that of neuronal cultures in vitro such as 20 the well-characterized spatial layout of the RGC system. More specifically, the linearity property may not always hold in many cortical areas where nonlinear mechanisms and non-Poisson output characteristics tend to predominate [13, 102-104]. For example, neuronal responses in prefrontal areas have been shown to represent discrete – possibly binary – values (rather than a continuum) reflecting possible “decision making” mechanisms reminiscent of sensorimotor integration (e.g. [105]). Many complex brain functions such as attention [106-108] and motion perception are thought to be mediated by highly heterogeneous and nonlinear firing patterns of neurons in other areas such as area MT [104, 109-111]. Most importantly, the GLM assumes the neuron responds h to a covariate ξ that is known or can be explicitly measured. However, in many cortical areas (e.g. premotor), the intrinsic driving input to the observed population may not be explicitly measured, and therefore the exact functional form of the encoding mechanism – the h mathematical relationship between extrinsic covariates, ξ ’s, and individual neurons’ outputs, λ ’s - remains a subject of intense debate [112-115]. From a computational standpoint, GLM fitting - like most maximum likelihood optimization techniques - is computationally intense and may tend to overfit the data because of the large number of parameters to be estimated. Some potential remedies include Bayesian regularization techniques that attempt to maximize the posterior density instead of the likelihood, assuming a priori beliefs about receptive field parameters. Another way is to assume sparse connectivity and use known anatomical properties of the populations being measured, or by assuming a specific functional form for the coupling between the neurons [83, 98], effectively reducing the number of parameters. Recent experimental evidence from selected brain areas seems to support this approach [116, 117]. 21 2.4 Graphical Models for Inferring Neuronal Connectivity Graphical models are considered an elegant blend of probability and graph theories to capture complex dependencies between random variables. They are a cornerstone in statistical inference and have been widely used in mathematics and machine learning applications such as bioinformatics, communication theory, statistical physics, signal and image processing, and information retrieval [118]. Their use in modeling neuroscience data has been surprisingly very limited, but is increasingly gaining strong attention in the computational and systems neuroscience communities. A graphical model uses a node to represent a random variable and an edge to connect two nodes representing a probabilistic relationship between the corresponding random variables. Formally, a graph G consists of a set of nodes V and edges E, written as G =< V, E >. Each node in V, denoted by vi, corresponds to a random variable xi. Each edge in E ⊆ V × V , denoted by vi → vj for directed graphs, indicates the causal influence of random variable xi on random variable xj (i.e. effective connectivity). For undirected graphs, edges indicate statistical dependence that is symmetric between the two random variables xi and xj (i.e. functional connectivity). An unweighted graph can be represented as an n × n binary adjacency matrix A, where n is the total number of nodes in the graph. Each element A(i, j) takes the value ‘1’ if there is a connection from vi to vj and ‘0’ if there is no connection between the corresponding nodes. An undirected edge between nodes vi and vj is represented by A(i, j) = 1 and A(j, i) = 1. In the context of neuronal networks, each neuron is represented by a node in a graph and each connection between a given pair of neurons is represented by an edge. Figure 2.6a illustrates an example of an undirected graph of neurons where an edge between a pair of neurons indicates their membership 22 in the same cluster (functional connectivity), while Figure 2.6b illustrates a directed graph for the same population with edges indicating causal influence (effective connectivity). Throughout this dissertation, for any effective connection a b, neuron a will be referred to as the pre-synaptic neuron while neuron b will be referred to as the post-synaptic neuron. These terms refer to both monosynaptic (direct anatomical connection) and polysynaptic (indirect influence through a chain of neurons interconnected by monosynaptic connections) connectivity unless otherwise indicated. Graphical representation of neuronal interactions is intuitive because it can be easily visualized and allows the use of many established graph metrics to quantify certain features of the inferred graph that can be compared to imaging and anatomical neural data [17, 119]. For instance, using graph-theoretic metrics, it has been discovered that anatomical and functional neuronal networks in C. elegans, vertebrates and primates follow a small-world structure [120123]. In this structure, nodes tend to connect to their nearest neighbors with a very high probability in addition to some distant nodes with a very low probability as illustrated in Figure 2.6c [120]. This structure results in a short path length between any given pair of nodes Lα log(n ) , where n is the total number of neurons, facilitating the information transfer while maintaining a low connection cost. Another network structure that has been argued to co-exist in brain networks is the scale-free structure shown in Figure 2.6d. In this structure, few nodes appear as hubs in the network with a large number of connections while the rest of the nodes would have small number of connections asymptotically following a power law where the probability of having k connections for a given neuron Pr (k )α k −c , where c is a constant [124, 125]. These discoveries all became possible as a result of using graphical representations. 23 1 5 2 3 7 4 1 6 3 8 5 2 6 7 4 (a) (b) 3 8 3 4 4 2 5 5 1 6 2 1 6 8 8 7 7 (c) (d) Figure 2.6: (a) Undirected graph of a sample neuronal population where edges indicate the membership of the connected neurons in the same cluster. (b) Directed graph of the same population in (a) where edges indicate the causal influence among the neurons. (c) Sample smallworld network. Connections are mainly local with the exception of a few distal connections (neurons 2 to 6 and neurons 5 to 8). (d) Sample scale-free network, where a small number of neurons appears as hubs (neurons 2 and 8). In addition to the measures graph-theory offers to post-process the inferred networks, they can provide more accurate models compared to existing methods of inferring neuronal connectivity. Existing directed and undirected traditional techniques for identifying neuronal connectivity surveyed above, such as crosscorrelograms and partial directed coherence, mainly rely on pair-wise correlations assessed over a very fine time scale (typically 0-5 milliseconds). While these techniques may be feasible to implement for a small number of neurons, they become computationally prohibitive for large number of cells and inadequate to collectively assess causal relationships in networks with polysynaptic connectivity in which interaction occurs over broader time scales (10’s-100’s of milliseconds). Graphical models, on the other 24 hand, do not rely on pair-wise relationships as detailed in Chapters 4 and 5. They rather take into account the activity of the entire observed ensemble when searching for relationships between the neurons. This enables identifying only direct, and not indirect, relationships between observed neurons. This is particularly important when dealing with cortical networks with complex connectivity patterns that are likely to span broader temporal and spatial scales. In such a case, simple pair-wise measures may erroneously lead to classify indirect relationships as direct, or entirely miss certain connections. An added advantage is that graphical models have a unique ability to detect nonlinear relationships that other techniques for inferring connectivity (such as Granger causality) cannot detect. Thus, using graphical models in analyzing neuronal spike trains might facilitate revealing how the underlying cortical circuits are reconfigured in response to complex, nonlinear stimuli features. Moreover, they can be used to explain the observed correlation in cortical network nd states beyond what 2 order maximum entropy models can reveal, given their sensitivity to variations in temporal dependence [82, 126]. 25 Chapter ____________________________________________ 3 A Multiscale Clustering Approach for Identifying Neuronal Functional Connectivity ______________________________________________________________________________ 3.1 Neuron Population Model Here we introduce the multiscale clustering approach to identify neuronal functional connectivity [27, 29]. A common assumption among all of the classical techniques of inferring neuronal connectivity surveyed in Chapter 2 is that the interval over which the functional relationship between any neuron pair remains fixed. While this may be the case in local populations, such a simplistic assumption does not take into account a number of factors governing the discharge probability of a neuron. This probability can vary considerably over short latencies of a few milliseconds depending on the strength of synaptic connectivity to other adjacent neurons [65]. It can also vary over relatively longer latencies, possibly hundreds of milliseconds [127], depending on the dynamics of neurotransmitter release. In addition, information content in spike trains was shown to vary considerably depending on the timescale chosen for the analysis, and, depending on a particular choice of that timescale, strongly correlated network states can emerge even though correlations over a fixed bin width may be statistically insignificant [80, 81]. Thus, there is an urgent need to assess the interaction between simultaneously recorded neuronal elements over a multitude of timescales to better understand the nature of neural computations that give rise to behavioral events. 26 A neuron population model can be formulated as follows: suppose that in a P neuron population, we want to approximate the probability of the pth neuron firing within an interval of length T. If the interval is binned to a very small bin width Δ chosen to account for the refractory period (i.e., the digital form of the spike train can accommodate consecutive ‘1’ bins, and no more than one event can occur in any given bin), then the neural discharge pattern during T will be a vector of length N= T/ Δ , and the spikes will occupy some fraction Np/N of these N bins, where Np denotes the number of events within T. This simple calculation gives an empirical estimate of the firing rate of the neuron within T. Alternatively, if T = [0, t ) , this probability can ( ) also be calculated with knowledge of the conditional intensity function λ p t H p (t ) , where H p (t ) denotes the history of all the processes that affect the firing probability of the neuron p up to time t (e.g. stimulus feature, other neurons’ interaction, the neuron’s own firing history, etc…). In this model, the history H p (t ) models the effects caused by post-synaptic potentials. Specifically, the probability of a spike event for small Δ is [128] ( ) Pr{spike from neuron p in (t , t + Δ ] | H p (t ) } ≅ λ p t H p (t ) Δ (3.1) The spike train Sp can be modeled as a conditionally independent Bernoulli random process with probability mass function [129] ( ) ⎧ λ p t H p (t ) Δ, ⎪ f s p (t ) = ⎨ ⎪1 − λ p t H p (t ) Δ, ⎩ ( ) ( ) if S p (t ) = 1 (3.2) if S p (t ) = 0 Here the success probability of the Bernoulli random variable is time-varying, given its dependence on the history interval. This interval is likely to be of variable duration when the 27 intrinsic properties of the synapses undergo substantial changes, for example, as a result of Hebbian learning. Consider for example a version of the conditional intensity of an inhomogeneous Poisson model [128] ⎞ ⎛ P M pq ⎜β + λ p t i α p , H p (t i ) = exp⎜ p ∑ ∑ α pq (mΔ )S q (t i − mΔ )⎟ ⎟ ⎟ ⎜ q =1 m=1 ⎠ ⎝ ( ) (3.3) th where α pq denotes the synaptic coupling between neurons p and q in the m time bin, Mpq is the length (in bins) of the history interval that relates neuron p’s firing probability to activity from neuron q and can vary as q varies, S q (t i − mΔ ) is an indicator function that takes the value th of ‘1’ if neuron q fired a spike in the m time bin and 0 otherwise, and βp is the log of the background rate of neuron p. The α pq ’s are time-varying weight factors governing how much the occurrence of a spike from neuron q, going back Mpq bins in time, influences the probability of firing of neuron p at time ti. The model in equation (3.3) has been previously demonstrated to be powerful in fitting data from motor cortex during goal-directed arm reach movements [98, 130], as well as data from place cells in rat hippocampus area CA1 during foraging in a linear track [131]. A related model in [99] was restricted to a fixed length history interval among all neurons in the ensemble. The model in equation (3.3) is more realistic for two reasons: first, it takes into account variable length of the history interval among any given neuron pair consistent with the expected variations in the post-synaptic potentials duration depending on the cell morphology; and second, it accounts for temporal precision in spike occurrence when Δ is too small (for example, 1 ms) which is important for identifying spike-timing dependent plasticity (STDP) in synaptic connectivity as will be detailed later. 28 Consider the weight function α pq that models the connection from neuron q to p given by α pq (t ) = A pq sin (Fπt M pq )exp(− Bt M pq ) (3.4) where F and B are constants, and Apq is a magnitude controlled by synaptic modification. The damped sinusoidal form in this expression implies that a spike from a neuron q occurring towards the end of the history interval has a smaller contribution to the firing probability of neuron p than if it were to occur at an earlier time. This weight function mimics the change in the membrane potential of post-synaptic neurons caused by an excitatory post-synaptic potential (EPSP) when Apq is positive or an inhibitory post-synaptic potential (IPSP) when Apq is negative. The damped sinusoidal form of α pq models a synchronized oscillatory impulse response of the effect of potentially many unobserved, bidirectional, excitatory and inhibitory neurons that affect the discharge pattern of neurons p and q. The time-varying characteristics of α pq are controlled by the sinusoid frequency and the time constant of the exponential term, which is equivalent to the bandpass filtering effect of synaptic transmission [132]. Other functions can also be used to mimic different characteristics. A sample function is shown in Figure 3.1 for two different history intervals while other constants are fixed. A narrow analysis time scale will be suitable to track the dynamics of α pq when Mpq is small, while a larger analysis time scale will be needed if Mpq is large. It is very likely that the interaction history intervals between multiple neurons in an ensemble will be distinct, given the possibility of a neuron to simultaneously participate in multiple neuronal clusters consistent with Hebbian learning. 29 2 1.5 1 0.5 0 -0.5 -1 -1.5 -2 Amplitude Amplitude 2 1.5 1 0.5 0 -0.5 -1 -1.5 -2 0 50 100 Time (ms) (a) 150 0 50 100 Time (ms) (b) 150 Figure 3.1: Synaptic weights for (a) Mpq = 40 bins (Hpq = 120 ms) and (b) Mpq =120 bins (Hpq = 360 ms). In both cases, Apq = 2.5, B = 2000 and F = 3000. When the history interval is limited to 40 bins, the function exhibits smaller time constant within the short time support where the function is non-zero. In contrast, the function exhibits larger time constant when Mpq =120. The above analysis emphasizes the fact that correlations are highly dependent on the time scale that govern the functional connectivity among neurons, which in turn depends on the strengths and directions of the synaptic coupling as well as the particular elements of a given cluster. In a real situation, specific information about these elements and the associated history is not available to the experimenter, and therefore functional interdependence between any neuron pair has to be identified without knowledge of these parameters. 3.2 Scale-space Representation The dependence of the correlation on the variable-duration history terms necessitates a preprocessing step to identify any intricate variations in neuronal discharge probability across multiple time scales. This step consists of projecting the spike train onto a scale space using a variable duration Haar rectangular basis [133]. At any given time scale j, the spike train can be expressed as 30 j S p (t ) = W ( j ) S p (t ) where W (j) j = 0,1,..., J (3.5) is a lumped matrix representing the Haar wavelet transform, and J is the total number of time scales, i.e., depth of temporal analysis. This is a parameter to be chosen by the user. Figure 3.2 shows an example of a scale space representation of a spike train snippet up to two (j) time scales. The Haar transformation operator W for the first two time scales takes the form 0 0 0 0 0 0 ⎡1 2 1 2 ⎢ 0 0 0 0 0 0 12 12 ⎢ ⎢ 0 12 0 0 0 0 0 12 ⎢ 0 0 12 12 0 0 0 0 W (1) = ⎢ ⎢1 2 − 1 2 0 0 0 0 0 0 ⎢ 0 0 0 0 0 1 2 −1 2 ⎢0 ⎢0 1 2 −1 2 0 0 0 0 0 ⎢ 0 0 0 0 0 1 2 −1 2 ⎢0 ⎣ ⎡1 4 ⎢ 0 (2) W =⎢ ⎢1 4 ⎢ ⎣ 0 14 0 14 0 14 0 0 14 0 14 0 14 14 0 −1 4 0 −1 4 0 0 14 0 14 0 −1 4 where multiplying the top half of W (1) ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 0 ⎤ 14 ⎥ ⎥ 0 ⎥ ⎥ −1 4 ⎦ (3.6) by the spike train Sp gives the probability of firing in 1 successive bins that are two samples in size. These represent the approximation coefficients a p . Multiplying the bottom half of W (1) by the spike train gives the relative change in the firing pattern in successive bins that are two samples in size. These represent the detail coefficients (2) d 1 . Similarly, a 2 and d 2 can be obtained by using W , which is equivalent to computing the p p p probability of firing and the relative change in successive bins that are four samples in size, and so on. Two important features can be extracted from the scale space representation: 1) the j probability of firing at time scale j indicated by the sequence a p (t ) along the transform approximation path (left branch); 2) the relative change in firing probability across successive 31 S p = {1, 0, 0, 0, 0,1,1,1} d 1 = { 1 2 ,0, −1 2 , 0} p a1 = { 1 2 ,0, 1 2 , 1} p a 2 = { 1 4 , 3 4} p d 2 = { 1 4 , −1 4} p Figure 3.2: Two-level Haar scale space representation of a sample spike train. j bins indicated by the sequence d p (t ) along the details path (right branch), where positive/negative values indicate a decrease/increase in the probability of firing relative to the previous bin, respectively. At this stage, the scale space representation incorporates all the desired information about the dynamics of the discharge patterns. 3.3 Similarity Measures Once the scale space representation is obtained for every spike train, a similarity measure is computed between pair-wise neurons in each time scale up to a temporal depth set by the user. The choice of this depth largely depends on the expected latency of interaction between neurons, which typically depends on the brain region(s) from which these neurons are recorded. The similarity measure used throughout this chapter is the sample cross-correlation between neurons p and q at time scale j, and can be expressed as [ ] j j σ pq (t i , t l ) = E S p (t i )S qj (t l ) (3.7) ( j) ∈ ℜ P × P , can be formed using equation (3.7) The full correlation matrix at time scale j, ∑ for all p, q ∈ { ,..., P} . Other similarity measures can also be used (e.g. mutual information, 1 coherence, Granger causality, etc…). 32 Let’s now define the total “instantaneous energy” of a given cluster, C k ∈ ℜ P ×P , as the algebraic sum of all instantaneous pair-wise correlations between the elements of that cluster, i.e., C k = ∑ j ∑ σ pq , where the time indices have been dropped for notation simplicity. We j p ,q∈{k } note here that all the off-diagonal entries C k ( p, q ) are zero for p, q ∉ {k } . The objective is to determine the nonzero entries C k ( p, q), ∀p, q ∈ {k} that can serve as similarity measures between pair-wise neurons, irrespective of the length of the history interval governing the interaction between the two neurons. These entries can be obtained by simultaneously ( j) diagonalizing the correlation matrices ∑ computed from equation (3.7) for j = 0,1,..., J . Specifically, a block diagonal matrix R ∈ ℜ ( P×J )×( P×J ) that has the j th ( j) P × P block as ∑ is formed. This matrix is diagonalized using Singular Value Decomposition (SVD) as ⎡ ∑ ( 0) ⎢ R=⎢ ⎢ ⎢ ⎢ ⎣ ⎤ ⎥ Q ∑ (1) ⎥ ≅ λ u uT ⎥ ∑ i i i O ⎥ i =1 (J ) ∑ ⎥ ⎦ (3.8) th where λ i u i denote the eigenvalue/eigenvector pair associated with the i dominant mode of the block diagonal matrix R . To compute an instantaneous similarity measure between pair-wise neurons, the dominant Q modes of the decomposition in equation (3.8) are used to approximate the augmented correlation ( j) matrix R. In practice, this is achieved by first vectorizing the J matrices ∑ to P 2 × 1 - dimensional vectors, concatenating the J vectors obtained in a matrix of dimension P 2 × J , performing SVD on the resulting matrix and keeping the first Q modes in a matrix of size 33 P 2 × Q , then finally reshaping the P 2 × Q matrix to Q matrices of size P × P. SVD is used here as an efficient way to fuse the correlation matrices obtained at different time scales into one similarity matrix while taking into account the expected variance in neuronal temporal interactions. SVD captures second order statistics while simultaneously decorrelating the information across multiple time scales, thereby eliminating any redundancy in the cross correlations obtained. 3.4 Spectral Clustering The next step relies on clustering a graphical representation of the population. Using graph theory, each neuron is represented as an object. Each object p, in our case a neuronal spike train, is represented as a node in the graph. Any two objects, p and q, are connected by a directed edge whose weight w pq is the similarity between the two objects. An example of an undirected th graph is shown in Figure 3.3. In our case, the weight w pq is the sum of the ( p, q ) entry of the Q principal matrices weighted by the corresponding eigenvalue obtained from equation (3.8). The present task is to cluster the neurons in this graphical representation by identifying the groups of strongly similar objects. In our case, we use a spectral clustering approach drawn from Machine Learning theory [134, 135]. A spectral clustering algorithm performs graph partitioning using a minimum cut algorithm. Specifically, the connected graph is divided into multiple disconnected sub-graphs such that only edges of small weights are removed. In the example shown in Figure 3.3, by removing the edges with small weights, we are able to identify two clusters of strongly correlated objects, with object 1 and 2 in the first cluster and object 3 and 4 in the second cluster. Since a typical graph cut problem is NP-hard, some approximation 34 O1 2 O2 6 7 7 6 O3 2 O4 Figure 3.3: A sample undirected graph with 4 objects and arbitrary edge weights for which the spectral clustering algorithm clusters objects with strong similarity (small edge weight indicates stronger similarity between objects). methods exist to efficiently solve the problem using “soft” memberships to reduce the computational complexity [136]. th Let’s denote by a pk the probability of the p neuron to be in the k th cluster and solve the minimum cut problem by finding the set of probabilities {a pk } that maximize the following objective function P P l= K ∑ k =1 (3.9) ∑ ∑ a pk a qk w pq p =1q =1 P P ∑ ∑ a pk w pq p =1q =1 The final cluster memberships are derived from {a pk } by assigning each object to the cluster * with the largest probability, i.e., k p = arg max a pk . Note that although many clustering k∈[1,...,K ] algorithms exist, a number of studies have shown that spectral clustering is able to outperform many well known clustering algorithms, particularly when cluster boundaries are complex to define [137]. Our results demonstrate that this is indeed the case. Figure 3.4 illustrates the steps of the entire algorithm. 35 • For j = 1 to J (Time scale index) For p = 1 to P (Neuron index) j Obtain the wavelet representation S p of each spike train S p End Compute the P × P correlation matrix ∑ ( j ) of the spike trains at time scale j End • Form the block diagonal correlation matrix R from the matrices ∑ ( j ) and obtain its spectral decomposition using SVD • Obtain the pair-wise similarity of neurons using the weighted sum of the first few dominant modes of R • Compute the cluster membership of each neuron by maximizing the objective function (3.9) Figure 3.4: Pseudocode of the multiscale clustering method. 3.5 Results At that stage of development, it was essential to test the algorithm on spike train data for which we knew the absolute truth. Therefore, we extensively tested the algorithm on data simulated using the point process model of equation (3.3) with known relationships among neuronal elements. We also propose metrics to assess the performance when the ground truth is not available when analyzing real data. 36 3.5.1 Clustering Synaptically-Coupled Populations We used synaptic coupling in equation (3.3) according to the following expressions [99] ( ) ( ) α + (t ) = 2.5 sin(4000πt M pq ) exp(− 3000t M pq ), pq α − (t ) = −3 sin(2000πt M pq )exp(− 3000t M pq ), pq α 0 (t ) = −2 sin 3000πt M pp exp − 3000t M pp , pp (3.10) where t is in seconds, α 0 , α ± are the auto-inhibitory, and excitatory/cross-inhibitory pp pq interactions, respectively. We generated data sets (n=25) containing 16 neurons each with a total duration of 98 sec in each trial. In each dataset, the 16 neurons were divided into 4 clusters with 4 neurons each. 3.5.1.1 Population with Fixed History Interval Figure 3.5a shows the network topology of the 16-neuron population with no across cluster connectivity. Within a cluster, each neuron inhibits itself as well as its neighbor in the clockwise direction, while excites the other neighbor in the anti-clockwise direction. In this dataset, the parameter Mpq was set to 120 for all interactions. Figure 3.5b shows a 3-seconds raster of the generated spike trains. A sample inter-spike interval (ISI) histogram of a neuron in the population and that of an independent Poisson model for two choices of Mpq (4 and 120 bins), is shown in Figure 3.5c demonstrating the effect of inhibition (first 40 ms) and the excitation (40100 ms) intervals resulting from interaction with other neurons when Mpq is large, while this interaction effect diminishes when Mpq decreases. We also computed the Coefficient of Variation (CV) of the Inter-spike Interval (ISI) histograms ( CV = std (ISI ) mean ( ISI ) ) to compare the statistics of the data generated by the model to those typically observed in cortical 37 I 1 E I 2 I I C1 I E I 4 E 3 I I E 13 I 14 6 I E I C2 I I 8 E 7 I I E I I E 5 I E I I E I C4 I I 16 E 15 I I E E 9 I 10 I E I C3 I E I 12 E 11 I I (a) C1 C2 C3 C4 80 Count 60 0.75 1.5 Time (s) (b) Dep. Indep. Dep. Indep. 40 20 0 0 3 1 0.9 0.8 60 120 240 360 Hist. Interval (ms) 150 300 450 ISI (ms) (c) Figure 3.5: (a) Network topology for the independent clusters population (solid lines), I = inhibition, E = excitation. (b) Sample 3-sec trace of the obtained spike trains. The initialization period is indicated by a black bar. (c) Sample ISI histogram for a neuron in the population and an independent neuron at 12 ms and 360 ms history intervals, respectively. Coefficient of variation of the ISI histogram. (d) Augmented correlation matrix using time scales up to 96 ms. (e) Average clustering accuracy vs. time scale for a fixed 360 ms history interval obtained using probabilistic spectral clustering and k-means clustering. 150 300 ISI (ms) 450 0 2.25 Coeff. of Variation __________ 0 38 2 4 6 8 10 12 14 16 Neuron (d) Figure 3.5 (cont’d) 10 5 0 Avg. Clustering Acc. Neuron 16 14 12 10 8 6 4 2 Spectral k-means 1 0.75 0.5 0.001 0.01 0.1 1 Time Scale (s) (e) 10 100 neurons [106, 138]. The rightmost plot in Figure 3.5c illustrates that as the history interval increases, the CV decreases below 1 (CV = 1 is equivalent to a Poisson neuron), consistent with characteristics of many types of cortical neurons [139]. In each trial, four across-cluster interactions pairs were randomly added to simulate noise or potential error margins that may be encountered, for example, during spike sorting of simultaneously recorded neurons [59]. Each pair consisted of an excitation in one direction and an inhibition in the other using the same expressions in equation (3.10). Figure 3.5d shows the augmented R matrix obtained by considering correlation matrices up to a time scale of 96 ms illustrating the computed similarity between pair-wise neurons averaged across trials. Neurons were ordered such that neurons 1-4 represent cluster 1, neurons 5-8 represent cluster 2, and so on. Neurons in the same cluster are observed to share significantly larger similarity than those across-clusters. The average clustering accuracy using the proposed approach shown in Figure 3.5e (the solid curve) was above 96% for time scales in the range of ~200 ms to ~1.5 sec and peaks around 360 ms, consistent with the fixed history interval chosen in the model ( M pq Δ = 360 ms), then it declines below 80% for smaller (< 10 ms) as well as larger time scales (> 10 39 sec). Since we did not account for synchrony explicitly in this dataset, these results confirm our expectation that the clustering accuracy is maximized at temporal resolution matching the network properties. To assess the effect of each of the computational steps in the algorithm on the clustering accuracy, we first examined the contribution of the scale space step. When the similarity matrix was computed using a fixed bin width, i.e., without the scale space representation prior to clustering, an average accuracy of only ~51% was achieved. We then compared the performance when the k-means was used as opposed to spectral clustering. Figure 3.5e shows that the clustering accuracy deteriorates (maximum accuracy in this case is 61%) compared to the case when spectral clustering is used. Therefore, we concluded that both steps are important for superior performance. 3.5.1.2 Population with Variable History Interval As derived earlier, the interaction history interval with neighboring neurons plays a fundamental role on the probability of firing of the neuron under consideration. For example, if the time constant of an excitatory synapse between neurons p and q is small (e.g. Mpq = 4 bins as in Figure 3.5c), neuron p is likely to fire Mpq times the likelihood of firing independently. In contrast, when the time constant is large (e.g. Mpq = 120 bins), neuron p incorporates the history of neuron q’s firing over a much longer time scale and therefore ignores the precise timing of neuron q’s firing and emphasizes firing rate information. Thus, the former case represents synchrony effects governing the probability of firing, while the latter represents response to rate modulation effects, both believed to be major constituents of the neural code [140, 141]. 40 We varied the history parameter Mpq and examined the algorithm performance as illustrated in Figure 3.6a. The performance systematically peaked around the specific history interval chosen in the model. This is further indicated by observing the monotonic relationship in Figure 3.6b between the time scale where maximum accuracy was attained and the history interval in the population model. In our model, the firing pattern of each neuron is a highly nonlinear function of the individual pre-synaptic neurons’ firing pattern as indicated by the exponential term and the nonlinear relation between Mpq and the synaptic strength in equation (3.4). Therefore, we did not expect the relationship between the time scale of maximum accuracy and the history interval to be linear. It is also noteworthy that we are not interested in estimating the history intervals of interaction, but rather identify the clusters of functionally interdependent neurons. Max. Acc. Time Scale (s) Avg. Clustering Acc. 1 0.8 0.6 0.4 H=60 ms H=120 ms 0.2 H=240 ms H=360 ms 0 0.001 0.01 0.1 1 10 Time Scale (s) 10 1 0.1 0.01 0.001 100 (a) 0.1 History Interval (s) 1 (b) Figure 3.6: (a) Clustering accuracy vs. time scale fitted with 3 order polynomial for the network in Figure 3.5a for variable history intervals. (b) (Gray dots) Time scale of maximum clustering accuracy obtained for the 25 trials vs. the population true history interval. (Black dots) Average of maximum accuracy time scales. rd 41 3.5.1.3 Clusters with Distinct Temporal Interactions We further examined the effect of varying history intervals across clusters, while maintaining it fixed among neurons within a given cluster. We used the same topology in Figure 3.5a with 1 2 3 4 k M = M = 20, and we varied both M and M , where M denotes the history interval (in bins) of interaction between neurons in cluster k. Figure 3.7a shows the clustering accuracy for different 3 4 k k settings of H and H (H = M × Δ) with a peak accuracy of more than 90% in all cases. As can be seen from Figure 3.7b, the monotonic relationship indicates that the accuracy is not diminished as a result of the distinct patterns of interactions expressed by the variable history intervals. 3.5.1.4 Distinct within-cluster Temporal Interactions We also examined the performance of the approach when variable history intervals of Max. Acc. Time Scale (s) Avg. Clustering Acc. interaction occur within a given cluster. This case is relatively challenging since the dependence 1 0.9 0.8 0.7 0.6 0.5 0.4 H=120 ms H=240 ms H=420 ms H=840 ms 0.001 0.01 0.1 1 10 Time Scale (s) 100 (a) 1 0.1 0.01 0.001 0.1 History Interval (s) (b) 1 rd Figure 3.7: (a) Average clustering accuracy ± SD vs. time scale fitted with 3 order polynomial 3 4 1 2 3 4 for different settings of M and M (M and M fixed to 20 bins). H = H = H . (b) (Gray dots) Time scale of maximum clustering accuracy vs. history interval of clusters 3 and 4 for each of the 25 trials. (Black dots) Average of maximum accuracy time scales. 42 between any pair of neurons’ firing pattern is only governed by the presence of a non-zero α pq , while dependence on similar Mpq is not present. The same topology of Figure 3.5a was used with M1i, M5i, M9i, and M13i set to 10, M2i, M6i, M10i, and M14i set to 20, M3i, M7i, M11i, and M15i set to 110, and M 4i , M 8i , M 12i , and M 16i set to 120, where i ∈ {1, 2, ...,16} . Figure 3.8a demonstrates that the average clustering accuracy reaches ~97% around the 100 ms time scale 1 Avg. Clustering Acc. Avg. Clustering Acc. 1 0.8 0.6 0.4 0.2 0.001 0.01 0.1 1 10 Time Scale (s) (a) 100 0.8 0.6 0.4 0 0.6 0.12 Window Size (s) (b) 0.18 Aveg. Clustering Acc. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.01 0.1 1 10 100 Window Size (s) (c) Figure 3.8: (a) Average clustering accuracy ± SD over all trials vs. time scale. (b) Average clustering accuracy using crosscorrelograms for different window sizes. (c) Average clustering accuracy using JPSTH for different window sizes. 43 showing that the neuronal elements in each cluster are correctly identified despite the variability in the history intervals within the cluster. For comparison, we assessed the performance to that obtained using two standard techniques: the crosscorrelogram and the Joint Post Stimulus Time Histogram (JPSTH) [21]. Both methods rely on using a fixed bin width correlation analysis as detailed in Chapter 2, section 2.3.1, and therefore are not expected to be robust to variations in the history interval. We tested window sizes of various numbers of bins (each bin was 4 samples). A correlation matrix was formed in each case, where each entry corresponded to the peak of the crosscorrelogram/JPSTH obtained for the corresponding pair of neurons that exceeded a predefined significance level. Figure 3.8b and Figure 3.8c show the average clustering accuracy for different window sizes, obtained using the crosscorrelogram and JPSTH-based similarity matrices as input to a standard hierarchical clustering. In this analysis, we used the average linkage hierarchical clustering algorithm in which the distance between two clusters is computed as the average of the pair-wise distance between the members of the two clusters [142]. Comparing Figure 3.8a to Figure 3.8b and Figure 3.8c, better clustering accuracy is obtained using the proposed approach. 3.5.2 Variability in Network Topology So far we have examined the performance of the algorithm under systematic variation in temporal dependence to illustrate its ability to track nonstationarity in the temporal interaction between neurons within a fixed network structure. Herein, we examine the ability of the algorithm to track variations in spatial interactions. This was investigated in two different ways: 1) by manually inducing changes in the structure of the network in which new connections randomly appear; and 2) by gradually varying the connectivity strength based on an activity- 44 dependent spike-timing dependent plasticity (STDP) model, widely believed to underlie Hebbian learning. 3.5.2.1 Structural Plasticity To invoke structural plasticity in the model, new connections were instantaneously created in the population between subsets of neurons from different clusters as shown in Figure 3.9a. This can mimic changes in neuronal circuitry resulting from instantaneous activation of distinct pathways, for example, when cortical rewiring takes place post traumatic brain injury or stroke [143]. These structural changes are long thought to accompany functional plasticity across multiple time scales, even under normal conditions [144, 145]. Another example of structural changes triggered by functional plasticity is the sudden transition in behavioral goal during a motor task [146]. This may trigger an observed neuron -previously not involved in encoding behavioral task parameters- to be instantaneously “recruited” based on the correlation between its tuning characteristics and the new task demands. In such case, the synaptic connections linking this neuron to other across-cluster neurons can be modeled as a step function centered at the time marker of the response trigger, and weighted by an excitatory/inhibitory coupling α pq with appropriate strength. rd For illustration purposes, the 3 st neuron in each cluster now excites the 1 neuron from the th nd neighboring cluster in the clockwise direction, while the 4 neuron in a cluster inhibits the 2 neuron from the neighboring cluster in the anti-clockwise direction. The synaptic coupling was introduced over distinct time markers and over multiple time scales. Noise was added in the same way as previously described. The across-cluster interaction expressions used for synaptic coupling are 45 I 1 E I 2 I I I E C1 I E I 4 E 3 I I E I I E E I E I 13 I 14 I E 5 I 6 I E I C2 I I 8 E 7 I E E I I E I C4 I I 16 E 15 I I E E 9 I I 10 I E I C3 I E I 12 E 11 I I 1 0.9 Neuron Avg. Clustering Acc. (a) 0.8 0.7 0.6 16 14 12 10 8 6 4 2 15 10 5 Isolated 0 Interacting 2 4 6 8 10 12 14 16 0.4 0.001 0.01 0.1 1 10 100 Neuron Time Scale (s) (b) (c) Figure 3.9: (a) Network topology with across-cluster synaptic coupling (dotted lines). (b) Average clustering accuracy vs. time scale. (c) Augmented R matrix using time scales up to 96 ms. 0.5 46 α + (t ) = 1.25 sin (4000πt M pq )exp (− 3000t M pq ) pq α − (t ) = −1.5 sin pq (2000πt M pq )exp(− 3000t M pq ) (3.11) The expression in equation (3.11) differs from equation (3.10) by the constant amplitude term Apq, which indicates relatively weaker coupling between clusters. The history interval was fixed to 120 bins for all interactions. The similarity matrix shown in Figure 3.9b shows significant correlation appearing between neuron 5 and neurons 2, 3, and 4 consistent with the model since neuron 5 is excited by neuron 3, and neuron 3 is coupled to neurons 2 and 4. Similarly, significant correlation appears between neuron 8 and neurons 1, 2, and 3. These changes in the correlation structure lead to a slightly diminished clustering accuracy (about ~4% decrease) compared to the isolated cluster topology illustrated in Figure 3.9c. These results demonstrate that despite the presence of weak interactions in the similarity matrix, the clustering accuracy was not significantly affected, indicating the ability of the algorithm to detect subtle differences between weak across-cluster interactions relative to stronger within-cluster ones. 3.5.2.2 Spike-Timing Dependent Plasticity Herein, we investigated an activity-dependent model of plasticity by modifying synaptic strength based on the timing of pre- and post-synaptic spike occurrences [86, 147]. Figure 3.10a shows the topology used where STDP was induced by simultaneously stimulating two neurons from each cluster. This was achieved by simultaneously increasing the background rate β p in equation (3.3) for the paired neurons at specific phases. By doing so, we increase the likelihood of observing two spikes from the pre- and the post-synaptic neurons within a shorter interval, thereby potentiating their synaptic connections. We carried out 120 stimulation phases at 0.2 Hz 47 I E I I E 2 E C1 E E 4 3 I I E E E 1 I 14 I E C4 E E 16 15 I I 13 5 E Normalized EPSP slope (%) Stimulation I 6 C2 E E E 7 8 I I E E 10 I E C3 E E 11 12 I I I 9 Avg. Clustering Acc. (a) 2 1.5 1 0.5 -100 -50 0 50 100 Pre/post Synaptic Spike Interval (ms) (b) 1 0.9 0.8 0.7 0.6 0 4 Clusters 2 Clusters 20 40 60 80 100 120 Stimulation Phases 16 14 12 10 8 6 4 2 4 Neuron Neuron (c) 2 16 14 12 10 8 6 4 2 15 10 5 0 2 4 6 8 10 12 14 16 2 4 6 8 10 12 14 16 Neuron Neuron (d) (e) Figure 3.10: (a) Network topology with paired stimulation to induce STDP. (b) EPSP slope as a function of pre-post synaptic spikes amplitude (A) and time constant (τ) were chosen as 1.01 and 14.8 ms for LTP, and -0.52 and 33.8 ms for LTD, respectively. (c) Average clustering accuracy post stimulation phases assuming 4 clusters and 2 clusters. (d) Similarity matrix using time scales up to 96 ms post the 10 stimulation phases and (e) post 120 stimulation phases. 48 with 12 ms duration in each. The parameter Mpq was set to 40 bins (120 ms) for all synaptic couplings. The percentage change in synaptic strength was modeled using the relation dα pq = A exp(− dt τ ) , where dt is the pre/post inter-spike interval. The parameters A and τ were chosen as 1.01 and 14.8 ms respectively for Long Term Potentiation (LTP), and -0.52 and 33.8 ms for Long Term Depression (LTD), respectively. These parameters were chosen according to the model described in [147] in which the measured synaptic changes were fitted with exponential functions similar to Figure 3.10b. Increasing stimulation time results in LTP, and this means that the post-synaptic neuron should gradually “merge” into the pre-synaptic neuron’s cluster. For example, neuron pair (2, 5) was weakly connected before stimulation. During each stimulation phase, these two neurons were stimulated simultaneously. This repetitive pairing resulted in LTP, and the same process was applied to neuron pairs (3, 8), (9, 14), and (12, 15). Thus, the clustering accuracy should decrease if the number of clusters is held equal to the number of clusters prior to the paired stimulation, which is indeed the case in Figure 3.10c. Figure 3.10d and Figure 3.10e show the similarity matrix obtained using time scales up to 96 ms after 10 and 120 stimulation phases, respectively. The results demonstrate that the algorithm is able to track the effect of naturally occurring spikes as a result of STDP and the corresponding changes in the network topology. 3.5.3 Clustering Large Populations We examined the performance of the proposed approach when applied to a larger population of 120 neurons. The population consisted of 4 clusters, 30 neurons each, where each cluster is formed of a 2D grid of neurons. The strength of the connections between clusters 1 and 2 was manually varied similar to the 16-neuron population above as shown in Figure 3.11a. 49 Avg. Pairwise Distance Cluster 1 Cluster 2 Cluster 4 Cluster 3 8 7 6 5 4 3 0 0.2 0.4 0.6 0.8 Connectivity Strength 1 (b) Avg. Clustering Acc. (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0 4 Clusters 3 Clusters 0.5 1 Connectivity Strength (c) Figure 3.11: (a) Network topology for a population of 120 neurons, where each cluster has 30 neurons. The bidirectional connections refer to an excitation connection in one direction and an inhibition in the opposite one. Black connections represent the within cluster connections, while gray connections represent the across clusters 1 and 2 connections that undergo functional plasticity. (b) Average pair-wise distance between neurons in clusters 1 and 2 for variable connectivity strengths. A connectivity strength of 1 indicates a connection across-clusters of the same strength as a connection within clusters. (c) Average clustering accuracy for variable across-clusters connectivity strengths (clusters 1 and 2). Figure 3.11b shows the average pair-wise distance between neurons of clusters 1 and 2 for variable connectivity strengths between the clusters. These distances were computed using the similarity matrix prior to probabilistic spectral clustering. As expected, the distance decreases as the connectivity strength between the two clusters increases, which is reflected on the average 50 clustering accuracy in Figure 3.11c. As the across-clusters connections strengthen, the clustering accuracy decreases when the number of clusters is fixed to be 4, while it increases when the number of clusters is fixed to be 3. 3.5.4 Estimating the Time Scale of Maximum Accuracy We have demonstrated that a functional relationship exists between the time scale at which maximum accuracy occurs and the history interval length chosen to model the population interaction. When dealing with real data, prior knowledge of this history interval is not available and therefore this relationship cannot be exactly discovered. For that purpose, we computed a clustering validity index, the Dunn Index (DI) [148], defined as DI = min 1≤ i ≤ N,1 ≤ j ≤ N i≠ j ( d Ci , C j ) max diam(C k ) 1≤ k ≤ N (3.12) where d(Ci, Cj) is the average distance between spike trains of pair-wise neurons in clusters i and j ( i ≠ j ), diam(Ck) is the maximum pair-wise distance between spike trains of neurons in cluster k, and N is the total number of clusters. This index will be maximized when the across-clusters distance (the numerator) is maximized, while the within-cluster distance (the denominator) is minimized. Thus, it can identify sets of clusters that are compactly separated. Figure 3.12 shows the average Dunn index at different time scales for the data sets analyzed in section 3.5.1.2. The values shown were computed directly using the clustering outcome from the proposed approach. Comparing Figure 3.12 with the clustering accuracy shown previously in Figure 3.6a, it can be seen that for each choice of history interval H, the Dunn index is maximized around the same time scale where the clustering accuracy peaks. In practice, the Dunn index is computed using entries of the augmented R matrix by normalizing the entries in 51 Average Dunn Index 1 0.5 0 H=60 ms H=120 ms H=240 ms H=360 ms 0.01 0.1 1 Time Scale (s) 10 100 rd Figure 3.12: Average Dunn index ± SD over all trials vs. time scale fitted with 3 polynomial for the network in Figure 3.5a for different history intervals. order the R matrix by the maximum value, then subtracting each entry in the normalized R from one. The resulting matrix thus comprises the distance between all pair-wise neurons and is eventually used in the computation of equation (3.12). 3.6 Validation using Barrel Cortex Data To demonstrate the performance of the multiscale clustering algorithm in the analysis of real data, we applied the algorithm to stimulus-driven activity recorded from the rat barrel cortex. Details of the experimental procedure are presented in Chapter 5, section 5.2.1. Briefly, 5 adult female Sprague Dawley rats were used in the experiment. Subjects were anesthetized and their left somatosensory cortex was exposed. An array of 32-channel microelectrode array was advanced into layer V of the barrel field. For each rat, 3 whiskers were selected for mechanical stimulation where each whisker was horizontally deflected 900 times. For each deflected whisker, spike trains within 100 ms of each stimulus onset were considered as one trial. A total 52 of 100 datasets, 18 sec each, for each whisker were extracted from the recorded 900 trials/whisker, where each dataset was formed by concatenating 180 trials that were randomly chosen with a uniform distribution from the 900 trials. Multiscale clustering was applied to each of the stimulus-specific datasets with variable number of clusters. Figure 3.13a illustrates sample inferred networks for three distinct whiskers in one rat showing 2 clusters for whisker C1, 3 clusters for whisker D1 and 6 clusters for whisker D2. To determine the optimal number of clusters for each rat, we computed the mutual information between the inferred networks and the stimulus (i.e. whisker identity) for each possible setting of the number of clusters. The optimal number of clusters was set as the number of clusters that maximizes the mutual information. Details of the mutual information (network information) computation are presented in Chapter 5, section 5.2.4. Briefly, principal component analysis (PCA) feature space of the inferred networks was constructed as illustrated in Figure 3.13b for each setting of the number of clusters where each point represents a network of one dataset. Network information was then computed from the obtained feature space. Figure 3.13c demonstrates the mutual information obtained for each setting of the number of clusters for one sample rat. For this particular example, the maximum network information was achieved by clustering C1 data to 2 clusters, D1 data to 3 clusters and D2 data to 6 clusters. The network information obtained across all rats was 1.4±0.23 bits. To assess the validity of these networks in the absence of knowledge of the underlying true connectivity, we first examined the connection probability as a function of the horizontal and vertical separations between electrodes for optimal setting of the number of clusters. We expected that neurons recorded on the same or adjacent electrodes are more likely to be connected in the inferred networks, consistent with anatomical and physiological studies in 53 Whisker C1 16a 8b 25c 1b 8a 2a 25b 17b 25a 7a 24b 17a 25a 7a 2b 1a 9c 9b Whisker D1 17b 7b 24a 2a 9a 8a 9a 1a 1b 24c 24a 25b Whisker D2 25a 8b 1a 7a 24a 17a 24c 25c 17a 8b 2b 9b 16b 16a 25b 24b 9c 7b 16b 2a 16a 24b 24c 16b 9b 9a 9c 2b 17b 8a 1b 25c 7b (a) Network Info. (bits) Network Space 0 -5 -10 -10 C1 D1 D2 5 D2 Clusters PC2 5 7 6 5 4 3 2 1.5 1 0.5 7 6 5 4 3 0 2 D1 Clusters 7 6 5 4 0 3 2 C1 Clusters PC1 (c) (b) Figure 3.13: (a) Sample stimulus-specific networks formed of clusters for 3 different whiskers (C1, D1 and D2) of rat R3. (b) Network feature space of rat R3 for 3 different whiskers (C1, D1 and D2). Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. (c) Network information for rat R3 as a function of the number of clusters used for each of whiskers C1, D1 and D2. -5 54 the neocortex suggesting that connectivity tends to be mostly local for economic wiring [13, 149, 150]. Figure 3.14a demonstrates that neurons recorded on the same or on adjacent electrodes have a higher probability of being connected. Furthermore, the connection probability decreased with increasing electrode separation (r = -0.42, P < 0.05, n = 24, t-test). We also validated the identified clusters temporally where neurons with similar temporal characteristics were expected to belong to the same cluster. We examined whether neurons with similar response latencies to whisker stimulation were clustered together where the response st latency was quantified as the time to the 1 spike fired in response to whisker deflection (Chapter 5, equation (5.1)). Figure 3.14b illustrates a histogram of the connections as a function of the difference in the response latency between the connected pairs using the optimal setting of the number of clusters. As can be seen, neurons with small difference tend to be clustered together more than neurons with larger difference (r = -0.76, P < 0.05, n = 7, t-test). 500 400 300 200 100 0 0.06 0.04 0.02 0 Fraction of Conn. Vertical Sep. (μm) Connection Prob. 0.8 0.6 0.4 0.2 0 0 5 10 15 20 25 30 35 Response Latency Difference (msec) (a) (b) Figure 3.14: (a) Histogram of the spatial difference between the connected neurons quantified in terms of the horizontal and vertical separations between the electrodes on which neurons were recorded. (b) Histogram of the temporal difference between the connected neurons quantified in terms of the difference in the connected neurons’ response latency to the stimulus. The number of clusters for each rat in both (a) and (b) was set to the number that maximizes the mutual information between the networks and the stimulus. 0 400 800 1200 Horizontal Sep. (μm) 55 3.7 Discussion and Conclusion Two major features of the proposed approach are specifically tailored to the type of neuronal correlation likely to be observed in cortical data: First, the projection of the spike trains onto a scale space to account for any synergistic interplay between temporal and rate codes that may be simultaneously present in a given ensemble activity pattern; and second, the probabilistic spectral clustering to overcome any complexity of clustering data with arbitrary structures in a feature space. This latter feature yields two additional advantages: First, it is nonparametric and therefore can track a wide variety of cluster shapes, and second, its probabilistic nature is useful in quantifying the degree of uncertainty in the membership of a given neuron to each cluster. This implies more robustness to plastic changes in cortical circuitry within and across multiple trials. The results demonstrated in this chapter show the effectiveness of the proposed multiscale clustering approach in identifying functional connectivity in both simulated data under different model parameters and real data recorded from the rat barrel cortex. A typical problem arises in the use of clustering algorithms in general is the knowledge of the exact number of independent clusters prior to the analysis. While a few heuristic approaches have been proposed in the literature for many types of data, this problem can be quite challenging to the application at hand given the ever-present cortical adaptation caused by synaptic plasticity, cellular conditioning and/or representational plasticity [151]. One approach for estimating the number of clusters to look for in the data is the elbow criterion [142], which was further improved by the introduction of the gap statistic [152]. Specifically, the gap statistic criterion is based on the observation that the within-cluster dispersion, defined as the overall distance between pair-wise elements in a cluster, tends to decrease significantly when the 56 number of clusters exceeds the true number. Therefore, it is possible to estimate the correct number of clusters by identifying the elbow (or the “knee”) in the graph of within-cluster dispersion and comparing it to the reduction observed in the within-cluster dispersion of a reference distribution. It has been demonstrated that the gap statistic approach was capable of detecting the correct number of independent neuronal clusters when applied to a population of leaky integrate-and-fire (LIF) neurons [153]. Similar to the Dunn index used in this chapter to determine the maximum clustering accuracy, the gap statistic can also be considered as a clustering validity index. However, the main difference between the use of both measures is that the gap statistic is used prior to the clustering to estimate the number of clusters to look for in the data, while the Dunn index is used post clustering to choose the clustering result corresponding to the time scale that best represents the temporal interaction between neuronal elements in the observed activity patterns. In case of stimulus-specific data, we used the mutual information between the obtained clustering result of all trials and the stimulus to determine the optimal number of clusters. In addition to the numerous systems neuroscience applications that may benefit from the proposed approach, a fundamental objective in many recent Brain Machine Interface (BMI) applications is to decode ensemble spike trains to better characterize the nature of cortical motor control associated with distinct motor behaviors [50, 154-158]. Many of these studies have examined the utility of multiple single unit recordings in the primary motor, premotor and supplementary motor areas of the cortex in controlling artificial devices. In the overwhelming majority of these studies, neuronal firing rates estimated over a fixed bin width (typically 50-100 ms) are believed to contain all the information needed to estimate the neural correlate of behavioral. However, the large variability in individual firing rates typically observed across 57 multiple repeated trials results in unreliable decoding. These algorithms require extensive periodic “calibration”, even when recordings are stable and the subject’s performance remains steady [159]. Faithful decoding of motor cortex activity patterns when the subject is freely behaving and naturally interacting with the surrounding requires continuous identification of clusters of neurons involved in encoding behavioral parameters at different time scales. Clustering techniques such as the one reported in this chapter may constitute an important predecoding step to identify the underlying clusters constituting a lower dimensional subspace of the observed neural space. 58 Chapter ____________________________________________ 4 Inferring Neuronal Effective Connectivity using Dynamic Bayesian Networks ______________________________________________________________________________ 4.1 Dynamic Bayesian Network Theory This chapter introduces the use of Dynamic Bayesian Networks (DBNs) in inferring the effective connectivity between spiking cortical neurons from their observed spike trains [28, 29]. Inferring causal links between neurons is a challenging problem because observing an effect may have many causes, some of which may be unobservable, leading to the possibility of inferring spurious connections between neurons. In the case of time varying stochastic processes with potentially many possible causes such as spike trains, Bayesian analysis provides a powerful framework because it treats all model quantities as random variables.  Causal links between these variables – now explicitly appearing as vertices in a graph - can therefore be represented as directed edges. In such case, DBN can be used as a graphical model of spiking neurons because it explicitly models time dependencies between the variables [160, 161] . A Bayesian Network (BN) is a graphical representation of statistical relationships between random variables, widely used for statistical inference and machine learning [162, 163]. A BN is denoted by B =< G, P >, where G is a directed acyclic graph (DAG) and P is a set of conditional probabilities. Each graph G consists of a set of nodes V and edges E, and is usually written as G =< V, E >. Each node in V, denoted by vi, corresponds to a random variable xi. Each directed 59 edge in E, denoted by vi → vj, indicates that node vi (i.e., random variable xi) is a parent of node vj (i.e., random variable xj). Conditional probabilities in P are used to capture the statistical dependence between child nodes and parent nodes. In particular, given a random variable xi in the graph, we denote by xπ(i) the set of random variables that are parents of xi. The statistical dependence between xi and its parent nodes xπ(i) is captured by the conditional probabilities Pr(xi|xπ(i)). More precisely, the value of random variable xi is decided by the values of its parents via the conditional probabilities Pr(xi|xπ(i)), and is independent from the values of other random variables in the graph given the value of xπ(i). Thus, the joint probability distribution of the random variables xi can be expressed given the conditional dependence on the parents using the chain rule N ( Pr( x1 , x2 ,..., x N ) = ∏ Pr xi xπ (i ) i =1 ) (4.1) Figure 4.1a shows an example of a Bayesian Network, in which we have three binary random variables: random variable R indicates “if it rains” or not, W indicates “if the grass is wet” or not, and U indicates “if people bring umbrella” or not. The arc from R to W indicates that “the grass being wet” is decided by “if it rains”. Similarly, the arc from R to U indicates that “whether or not people bring umbrella” is decided by “if it rains”. The conditional probabilities Pr(U|R) and Pr(W|R) are summarized in Table 4.1. According to the chain rule described before, the joint probability Pr(R, U, W) is written as Pr (U , R, W ) = Pr (R ) Pr (U R ) Pr (W R ). (4.2) 60 R W U (a) Day t-1 Day t Day t+1 Sprinkle Sprinkle Sprinkle Rain Rain Rain (b) Figure 4.1: (a) An example of a Bayesian Network. Random variable R indicates if it rains or not, W indicates if the grass is wet or not, and U indicates if people bring umbrella or not. (b) An example of a Dynamic Bayesian Network. Each morning an automatic sprinkler randomly decides whether it should water the lawn or not conditioned on whether it watered the day before and whether there was rain the day before. A Dynamic Bayesian Network (DBN) is an extension of BN to handle time-series or sequential data [161] and, therefore, are much more suitable to analyze nonstationary random processes. In a DBN, the status of a node (variable) at time t0 is conditionally-dependent on its parents’ state in history. Specifically, given a random variable xi at time T = t + 1, denoted by ( ) ( ) xi t +1 , and its parents xπ(i), the value of xi t +1 is decided by the values of its parents xπ(i) (1 observed during the interval T = 1 to T = t, denoted by x π :(ti)) . Note that T here is assumed to be a ( ) t +1 and discrete variable. Similar to Bayesian networks, the statistical dependence between xi Pr(W|R) R = true R = false Pr(U|R) W = true 0.7 0.4 U = true W = false 0.3 0.6 U = false Table 4.1:Conditional probabilities Pr(W|R) and Pr(U|R). 61 R = true 0.9 0.1 R = false 0.2 0.8 (t +1) | x (1:t ) ), and the conditional probability π (i ) (1 x π :(ti)) is captured by conditional probabilities Pr( xi ( ( Pr ⎛ x1t +1) , x 2t +1) ,..., x (t +1) x (1:t ) ⎞ is computed as ⎜ ⎟ N ⎝ ⎠ N (1:t ) ( ( Pr⎛ x1t +1) , x2t +1) ,..., x (t +1) x (1:t ) ⎞ = ∏ Pr⎛ xi(t +1) xπ (i ) ⎞. ⎜ ⎟ ⎟ ⎜ N ⎝ ⎠ ⎠ ⎝ (4.3) i =1 ( ) In many cases, and for the sake of simplicity, it is often assumed that xi t +1 is only dependent on the value of its parents observed at time T = t, which simplifies the conditional (t +1) x (1:t ) ⎞ to Pr ⎛ x (t +1) x (t ) ⎞ . This is known as the Markov assumption probabilities Pr ⎛ x i ⎜ ⎜ i π (i ) ⎟ π (i ) ⎟ ⎝ ⎠ ⎝ ⎠ with Markov lag equal to 1. A special case of DBN is the well-known Hidden Markov Model (HMM) [164]. This simplification to the Markov assumption can be extended to include multiple ( ) Markov lags. For instance, a DBN with maximum Markov lag equals 3 implies that xi t +1 is (t −2:t ) decided by the value of its parents observed at time T = t, t − 1, t − 2, or Pr⎛ xi(t +1) xπ (i ) ⎞ . ⎜ ⎟ ⎝ ⎠ Figure 4.1b is an example of a DBN. Each morning, an automatic sprinkler decides whether to water a lawn. The sprinkler makes a probabilistic decision based on 1) whether it watered the lawn the day before and 2) whether there was rain the day before. The conditional probability distribution table of Pr(S t+1 t t = true| S , R ) is listed in Table 4.2. Note that in this example, the variable S is also one of its parents. t+1 Pr(S t t = true| S , R ) 0.8 0.3 0.3 0.1 t+1 Table 4.2: Conditional probabilities Pr(S t t t S false true false true | S , R ). 62 t R false false true true 4.2 Learning Bayesian Networks Learning a Bayesian network from data involves two tasks: learning the structure of the network and learning the parameters of the conditional probability distributions. Structure learning of Bayesian networks is much more difficult compared to parameter learning because once the structure is known, it is easy to learn the parameters of the conditional probability distributions using existing algorithms such as Maximum Likelihood Estimation (MLE). Learning the structure of the network can be formulated as searching for a network structure G* that best fits the observed data D. Formally, this is expressed as G* = arg max(log Pr (G D )) (4.4) G = arg max(log Pr(D G ) + log Pr(G )), G where Pr(D|G) is the likelihood of the data D given the structure G and Pr(G) is the prior probability of G. In our experiments, we assumed a uniform distribution for Pr(G). There are many structure learning algorithms, and the most popular among them are the score-based approaches [163, 165, 166]. Figure 4.2 outlines the steps of the DBN structure search algorithm. A scoring function is first defined to estimate the likelihood Pr(D|G) by which a given Bayesian network structure is evaluated on a given dataset, and then a search is performed through the space of all possible structures to find the one with the highest score. Score-based approaches are typically based on well-established statistical principles such as Minimum Description Length (MDL) [167], Bayesian Dirichlet equivalent (BDe) score [168] or Bayesian Information Criterion (BIC) [169]. In our analysis, we used the BDe score. Let θG denote the parameters for the conditional probability distribution for structure G. To evaluate the 63 Spike Trains D Initial Random Structure Structure Markov Lags Compute Bayesian Score Pr(G|D) G Score No Modify End Search? Yes Inferred Structure (G*) Figure 4.2: Schematic of the DBN structure search algorithm. posterior Pr(D|G), one needs to consider all possible parameter assignments of θG, namely log Pr(D G ) = log ∫ Pr (D G, θ G ) Pr (θ G G ) dθ G , (4.5) where Pr(D|G,θG) is the probability of the data given the network structure and the parameters and Pr(θG|G) is the prior probability of the parameters which penalizes complex structures. Under the assumption that the distribution of each node in the network can be learned independently of all other distributions in the network and assuming Dirichlet priors, the BDe score can be expressed in a closed form of equation (4.5) as [163, 170] ( ) ( ' ⎛ ' Γ Z ijk + Z ijk Γ Z ik ⎜ + ∑ log log Pr (D G ) = ∑ ⎜ log ' ' Γ Z ijk Γ Z ik + Z ik i ,k ⎜ j ⎝ ( ) ( ) )⎞⎟ (4.6) ⎟ ⎟ ⎠ ∞ x −1 −t where Γ( x ) = ∫ t e dt is the Gamma function satisfying Г(x+1)=xГ(x) and Г(1)=1, 0 (1 Z ik = ∑ Z ijk , Zijk is the number of times random variable xi(t ) = j and x π :(ti)) = k , ' Z ik = ' ( ' ∑ Z ijk and Z ijk = a Pr(xi(t ) = j, xπ1:(ti)) = k G0 ), where a is the equivalent sample size and j j G0 is a prior structure. 64 The associated optimization of this approach, however, is intractable [171]. Commonly used methods to alleviate this problem include: 1) using some pre-processing methods like conditional independence test to infer the Markov blanket for each node, and thus limiting the candidate structure space; 2) using heuristics, like greedy search or simulated annealing, to find the suboptimal structures. Herein, we use a score-based approach with simulated annealing search. Simulated annealing search avoids falling in local maxima by choosing certain modifications to the structure that do not necessarily increase the score. Specifically, if for a given modification the score increases, then this modification is accepted, while if the score decreases, then this modification is accepted with probability exp(Δe/T0) where Δe is the change in the score, which is negative in this case, and T0 is the system ‘temperature’. The search starts with a very high T0 so that almost every structure modification is accepted, and then decays gradually as the search process progresses [172]. 4.3 DBN and Spike Trains DBN has been recently used in inferring transcriptional regulatory networks from gene expression data [173-175]. For brain connectivity, DBN has been coarsely applied to infer effective connectivity between multiple brain areas from fMRI data [176, 177] and, at a more refined resolution, from multi-unit activity recorded throughout the songbird auditory pathway [178]. Nevertheless, the use of DBNs to characterize cortical networks at the single neuron and the ensemble level has not been fully explored. This chapter demonstrates how DBN can be used to infer neuronal connectivity from spike trains. In the context of spiking cortical networks, we model each neuron in the data by a set of nodes where each node corresponds to the neuron’s firing state (‘0’ or ‘1’) at a given Markov 65 lag. A directed edge between two nodes represents a causal relationship between them that is detected at the associated Markov lag. Figure 4.3 illustrates an example of a causal network and the corresponding DBN. Consider for example the connection between neurons 3 and 4 with a synaptic latency of 2 bins. This connection is represented in the DBN as a directed edge from one node representing neuron 3 firing state at a Markov lag of 2 to another node representing the current firing state of neuron 4. Modeling time dependency allows DBN to model cycles or feedback loops as illustrated by the loop involving neurons 1, 2 and 3 (represented by a DAG in the DBN). Moreover, this makes DBN capable of capturing relationships that appear at different time lags. This is particularly useful considering that many unobserved neurons, e.g. interneurons, may exist in the pathway between observed efferent and afferent neurons. This may contribute considerable latency between causal events as a result of the aggregate synaptic delay. 2 ... 3 ... ... 1 l31=1 l32=3 3 l43=2 2 Neuron l12=2 1 4 1 Markov Lag . . . t-3 t-2 t-1 ... 4 (a) t (b) Figure 4.3: (a) A simple causal network where nodes represent neurons and edges represent connections. lij indicates the synaptic latency associated with each connection. (b) Corresponding DBN where black nodes represent the neuron firing state at a specific Markov lag while white nodes represent the present state. 66 4.4 Simulation Model To examine the performance of DBN in reconstructing a spiking neuronal network, it was essential to use a model to simulate the spike train data in which the “ground truth” was known. In our simulations, we used a variant of the generalized linear model (GLM) proposed by [98] similar to the model used in Chapter 3. In this model, the spike train Si of neuron i is expressed ( ) as a conditionally-Poisson point process with mean intensity function λi t H i (t ) , where H i (t ) denotes the firing history of all the processes that affect the firing probability of neuron i up to time t (e.g. stimulus feature, other neurons’ interaction, the neuron’s own firing history, etc…). Herein, we do not consider an explicit external stimulus to drive the network, albeit this can be included as an object in the graphical representation in the stationary case or as a set of objects with known transitions in the nonstationary case. We focus on two main variants contributing to λi (t H i (t )) : a) the neuron’s background level of activity; and b) the spiking history of the neuron itself and that of other neurons connected to it. Mathematically, the firing probability f (S i (t ) = 1) of neuron i at time t can be expressed as M ij ⎛ ⎞⎞ ⎛ ⎜ ⎟⎟ ⎜ f (S i (t ) = 1) ≅ λi (t H i (t ))Δ = ⎜ exp⎜ β i + ∑ ∑ α ij (mΔ )S j (t − mΔ )⎟ ⎟Δ, ⎟⎟ ⎜ ⎜ j ∈π (i ) m = 0 ⎠⎠ ⎝ ⎝ (4.7) where Δ is a very small bin width, βi is the log of the background rate of neuron i, π(i) is the set of pre-synaptic neurons of neuron i (this set includes neuron i itself to model self-inhibition), Mij is the number of history bins that relate the firing probability of neuron i to activity from neuron j, αij models the connection between neuron i and neuron j (excitatory or inhibitory), and Sj(t − mΔ) is the state of neuron j in bin m (takes the value of 0 or 1). Unlike the model in [98], the length of the history interval of interaction between neurons i and j (equals Mij × Δ) was not 67 fixed for all neuron pairs, consistent with the notion of cell assemblies [179]. To mimic the influence of excitatory post-synaptic potential (EPSP) and inhibitory post-synaptic potential (IPSP), we used the following decaying exponential functions for synaptic coupling [180-182] ⎧ ⎪0 ⎪ ⎩± Aij exp − 3000 t − lij Δ M ij ± α ij (t ) = ⎨ ( ( ) , if t < lij Δ ) (4.8) , if t ≥ lij Δ where t is the time (in seconds), +/- indicate excitatory/inhibitory interactions, Aij models the strength of the connection, and lij models the synaptic latency (in bins) associated with that connection. Similar to the effects of EPSP and IPSP, the decaying exponential in these expressions implies that a spike from a neuron j occurring towards the end of the history interval has a smaller influence on the firing of neuron i than if it were to occur at a closer time. Figure 4.4 illustrates α ij (t ) showing that α ij (t ) attains its peak value at lijΔ and that the time constant + + of the decaying exponential is M ij 3000 . ⎧0 ⎪ ⎪ Aij exp − 3000 t − lij Δ M ij ⎩ + α ij (t ) = ⎨ Aij lijΔ ( ( l ij Δ + M ij 3000 ) ) , if t < lij Δ , if t ≥ lij Δ M ij Δ t + Figure 4.4: An illustrative graph of α ij (t ) where lijΔ is the synaptic latency, Aij models the connectivity strength, Mij/3000 is the time constant of the decaying exponential where Mij denotes the number of history bins, and MijΔ is the history interval. 68 4.5 Results We tested the algorithm on multiple network structures simulated using the point process model in equation (4.7). For each parameter setting in the results that will follow, we generated 100 networks of different structures, each containing 10 randomly connected neurons, where for each neuron, the indices of pre-synaptic neurons were drawn from a uniform distribution to avoid biasing the algorithm to a specific model. In addition, each neuron had a self-inhibitory connection to model post-firing refractoriness and recovery effects. The duration of the generated spike trains was set to 1 minute with a bin width of 3 ms that included the refractory period. In our experiments, we used the Bayesian Network Inference with Java Objects (BANJO) toolbox [178]. We used the simulated annealing search algorithm in all the analyses [172]. The search time was set to 1 minute in all the analyses except for Section 4.5.6, where the performance was examined as the search time was increased for large populations. All the analyses were performed on a Dual Intel Xeon machine (2.33 GHz, 64 bit) with 8 GB of memory. We used the F-measure to quantify the inference accuracy [183]. This measure is the harmonic mean of two quantities: the recall R and the precision P, defined by C C ,P = C+M C +W 2 RP 2C = F= , R + P 2C + M + W (4.9) R= where C is the number of correctly inferred connections, M is the number of missed connections and W is the number of erroneously inferred spurious connections. Thus, F will be ‘0’ if and only if none of the true connections are inferred (C = 0) and will be ‘1’ if and only if all the true connections are inferred (M = 0) and no spurious connections are inferred (W = 0). Note that we 69 are interested here in comparing the connections inferred by the DBN to the true connections regardless of the Markov lag they appear at or the strength of the actual connections. For example, if a true connection has a synaptic latency of 1 bin, it should be inferred at a Markov lag of 1. If it were to be inferred at a Markov lag of 2, it will still be considered as correct when computing F. 4.5.1 Networks with Fixed Synaptic Latency We thoroughly investigated the performance of DBN by varying the model parameters in equations (4.7) and (4.8) while keeping the synaptic latency for all neurons fixed at 3 ms. This mimics direct connectivity that may exist in a local population rather than diffuse connections that may be observed across different parts of the cortex. Figure 4.5 shows the performance of DBN for different settings of the model parameters when the DBN Markov lag was set to match the synaptic latencies. Each point represents the mean and standard deviation of the inference accuracy for 100 different network structures. We initially examined the performance as the number of pre-synaptic neurons was varied between 1 and 6 while fixing all other parameters. All the connections here were excitatory with a history interval of 180 ms. We decreased the strength of the connections Aij in equation (4.7) as shown in the inset in Figure 4.5a as the number of pre-synaptic neurons increased in order to keep the mean firing rate in the range of 20 to 25 spikes/sec and prevent unstable network dynamics while the background rate of each neuron was set to 10 spikes/sec. The results in Figure 4.5a suggest that DBN achieves 100% accuracy with a slight decline above 4 input connections. This decline can be attributed to the decrease in the synaptic strength 70 0.9 0.9 0.6 1 0 2 4 6 1 2 3 4 5 6 # Pre-synaptic Connections (a) F-Measure 0.9 0.8 0.7 0.6 No 0.25 0.5 1 2 I/E Ratio Inhib. (c) 0.7 2 1 0 0 200 400 60 180 300 420 History Interval (ms) (b) DBN PDC GLM 1 0.8 0.6 Coefficient of Variation 0.7 2 Aij 0.8 F-Measure 1 Aij F-Measure 1 4 1.4 1.2 1 0.8 0.6 0.4 No 0.25 0.5 1 I/E Ratio Inhib. (d) 2 4 F-Measure 1 0.9 0.8 0.7 0 5 10 15 20 25 Background Rate (spikes/sec) (e) Figure 4.5: (a) DBN performance vs. number of excitatory input (pre-synaptic) connections. Inset: Connection strengths Aij for each choice of that number. (b) DBN performance vs. firing history interval length. Inset: Connection strengths Aij for each choice of this length. (c) DBN performance (solid) vs. ratio of inhibitory to excitatory synaptic strength (I/E Ratio). The accuracy gets closer to unity as inhibition strength surpasses excitation strength (I/E ratio increases above 1). Partial Directed Coherence (PDC) (dashed) and Generalized Linear Model (GLM) fit (dotted) performance shown for comparison. (d) Coefficient of variation for the data analyzed in (c). (e) DBN performance vs. background rate. 71 Aij, thereby reducing the influence an input spike from neuron j has on the firing probability of neuron i. We then varied the history interval of the interaction by varying Mij and examined the performance. Each neuron received two excitatory connections of equal weight from two neurons. The weights were adjusted in order to keep the mean firing rate in the same regime for different settings of Mij as illustrated in the inset of Figure 4.5b. Given that the synaptic latency was fixed among all neurons and matched the DBN Markov lag used, we expected the performance to be invariant to changes in the history interval length. Figure 4.5b illustrates that this is indeed the case and that the DBN performance is almost steady regardless of the length of the history interval. When inhibitory connections exist, the inference task becomes more complicated, particularly for hypoexcitable neurons, due to the fact that the neuron’s spiking pattern becomes very sparse. As a result, observing a spike event may contain a lot more information about causal effects than not observing one. We further investigated the performance in the presence of inhibitory connections with variable degrees of strength. Each neuron received two input connections, one excitatory and one inhibitory. The excitatory synaptic strengths in equation (4.8) were fixed at 2.5, while those of the inhibitory connections were varied. We defined I/E ratio as the ratio of the cross-inhibitory to cross-excitatory synaptic strength and tested ratios of 0.25, 0.5, 1, 2, and 4, respectively. All other parameters were set as previously described. Taking the self-inhibition mechanism inherent in our model into account (Aii = -2.5), an I/E ratio of 4 would correspond to a neuron with equal degree of inhibition and excitation. Such neuron should display homogeneous – Poisson like - characteristics reminiscent of independent firing. Figure 4.5c illustrates the inference accuracy as a function of the I/E ratio. For ratios below 1, a neuron 72 is more affected by the excitatory connection than the cross-inhibitory one. Therefore, the drop in performance observed suggests that DBN is unable to detect the presence of weak inhibition in the presence of a strong excitation. This can be attributed to the relatively insignificant effect of weak inhibitory input on the firing characteristics of post-synaptic neurons conditioned on receiving a strong excitation. When the I/E ratio increases above 1, the accuracy rapidly gets closer to unity and does not deteriorate even when the inhibitory connections are 4 times stronger than the excitatory connections. To interpret these results, we used the coefficient of variation (CV) of the inter-spike interval (ISI) histograms (CV = std(ISI)/mean(ISI)) to quantify the variability in the firing characteristics for variable I/E ratios. We hypothesized that if weak inhibitory connections do not significantly influence the firing of these neurons, then the CV should be close to those receiving pure excitation. Moreover, one would not expect to see a sharp transition in CV characteristics for I/E values adjacent to purely-excitatory connections. Figure 4.5d demonstrates that this is indeed the case. The average CV for I/E ratios of 0.25 is not significantly different from that of purelyexcitatory connections. This suggests that weak cross-inhibition did not result in large effects on the neurons’ firing characteristics, making it more difficult for DBN to detect this type of connectivity. The CV monotonically increases as the I/E ratio increases, reaching an average of 1 (similar to that of an independent Poisson neuron) when cross-inhibition, on average, is 4 times stronger than excitation (I/E ratio = 4). We compared DBN performance to a related measure of connectivity - Partial Directed Coherence (PDC) – that has been proposed to study causal relationships between signal sources at coarser resolution in EEG and fMRI data [89]. PDC is the frequency domain equivalent of Granger causality that is based on vector autoregressive models of certain order. A connection is 73 inferred between a given pair of neurons if the PDC at any frequency exceeded a threshold of 0.1. For each network, PDC was applied using models of orders 1 to 30. The accuracy shown by the dashed plot in Figure 4.5c represents the maximum accuracy achieved across all model orders. As can be seen, DBN outperformed PDC over the entire range of the I/E ratio, while exhibits similar performance around an I/E ratio of 0.25. Closer examination of the networks inferred using PDC for I/E ratios greater than 0.25 revealed that the deterioration in the PDC performance compared to the DBN was due to its inability to detect most of the inhibitory connections, consistent with previous findings with spectral coherence as well as Granger causality [96, 97]. We also compared DBN performance to that obtained using a Generalized Linear Model (GLM) fit [98, 184]. Ideally, GLM fit should yield the best result for our data since this is the generative model we used in equation (4.7). A maximum likelihood estimate of the coupling function αij in equation (4.8) for each neuron i is computed in terms of the spiking history of all other neurons j within a certain window of length WGLM. Since our goal is to identify the existence of connections and not to estimate the coupling function, we post-processed the estimated coupling functions such that a connection was inferred if the estimated coupling function was larger/lower than a given threshold for excitatory/inhibitory connections, respectively, for 3 consecutive bins while their p-values were significant. The spiking history considered for fitting WGLM was set to 20 bins (60 ms). The threshold was varied between 0 and ±1.5 and the p-value between 0.1 and 0.0001. The dotted plot in Figure 4.5c shows the maximum accuracy obtained across all thresholds and p-values. Superior performance of the GLM can be seen compared to DBN at I/E ratios below 1, while comparable if not slightly inferior for I/E ratios above 1. We note, however, that the GLM approach has a number of limitations: First, the 74 performance is highly dependent on the choice of the fitting spiking history interval WGLM. Second, the inference threshold and the p-values have to be carefully set to identify the connections. Finally, the GLM method search time was approximately 20 times that needed by the DBN to estimate the coupling functions for each 10-neuron population. Finally, we examined the performance as a function of the background rate in equation (4.7). This may mimic variations in the afferent input current to the neuron, and increments in this input when there is not coupling to other neurons are known to impact estimates of correlation between their output spike trains [185]. This is because correlation between a neuron pair cannot be orthogonally separated from their firing rates, thereby potentially leading to spurious connectivity inference [186]. Here, we had 2 input connections per neuron, one excitatory and one inhibitory having the same strength. Figure 4.5e illustrates that the inference accuracy was above 96% for background rates higher than 5 spikes/sec. Most remarkable is the ability of DBN to infer roughly 70% of the connections at background rates around 2 spikes/sec, despite that, at this low rate, neurons are “silent” most of the time. 4.5.2 Mismatch between Actual Synaptic Latency and DBN Markov Lag In practice, the synaptic latency between any two observed neurons is unknown and it is more likely that synaptic latencies between network elements to be heterogeneous, reflecting the highly distributed nature of cortical processing. For example, the synaptic weight functions expressing the EPSP and IPSP characteristics in Figure 4.6a for different choices of M illustrate that for relatively longer history intervals, the effect of an input spike on the firing probability of a post-synaptic neuron should last longer. This should enable the DBN to infer these connections even if the Markov lag is chosen to be larger than the true synaptic latency. On the 75 M=8 M=20 M=60 M=100 M=140 α+(t) ij 0.8 0.6 0.4 1 F-Measure 1 0.2 0 0.8 0.6 0.4 M=8 M=20 M=60 M=100 M=140 0.2 0 20 40 0 60 80 100 Bins (a) 2 4 6 Markov Lag (bins) (b) 8 1 F-Measure 0.8 0.6 0.4 M=8 M=20 M=60 M=100 M=140 0.2 0 0 10 20 PDC Order (c) 1 1 0.5 M=8 0 M=20 M=60 M=100 M=140 0 0.5 1 Threshold F-Measure F-Measure 30 0.8 0.6 0.4 0.2 0 0 1.5 20 40 60 80 WGLM (bins) 100 (e) (d) + Figure 4.6: (a) Coupling function α ij (t ) for different history intervals M (in bins). (b) DBN performance vs. Markov lag for variable history intervals and fixed synaptic latency (4 bins).(c) PDC performance vs. model order for the data analyzed in (a). (d) GLM fit performance vs. detection threshold for the data analyzed in (b). (e) GLM fit performance vs. GLM parameter WGLM when the model parameter M was set to 60 bins. 76 other hand, the influence of an input spike on the post-synaptic neuron firing for a short history interval should behave in a similar fashion. Figure 4.6b demonstrates the performance of the DBN when there is a mismatch between the DBN Markov lag and the synaptic latency in the model. In this experiment, we generated 100 different networks, 10 neurons each, in which each neuron had two input connections, one excitatory and one inhibitory of the same strength. The synaptic latency was fixed for all neurons at 4 bins. As can be seen, when the Markov lag was set to a value smaller than the populations’ true average synaptic latency, almost none of the connections was inferred. On the other hand, when the Markov lag was set to be larger than the true synaptic latency, the inference accuracy deteriorated only slightly for relatively long history intervals, while deteriorated significantly for shorter history intervals. To interpret this result, let’s first denote by lij the synaptic latency between neurons j and i. The firing of neuron i at time bin m is only affected by the firing of neuron j in the range [m−Mij, m−lij], where Mij is the number of bins in the history interval. Thus, setting the Markov lag to a value less than lij (for example, trying to relate the firing of neuron i at time bin m to that of neuron j at time bin m − lij + 1) did not help to detect the presence of the connection between the neuron pair. Not surprisingly, DBN attained accuracy close to unity when the Markov lag matched the synaptic latency, consistent with the results previously shown in Figure 4.5b. For the sake of comparison, Figure 4.6c and Figure 4.6d show the performance of PDC and GLM-fit, respectively. For PDC, almost none of the true connections were inferred when the model order was less than the synaptic latency. The performance improved when the model order was set greater than or equal to the synaptic latency, but the accuracy never exceeded 0.8 across the entire range of history intervals considered in the model. The drop in performance was 77 again due to the inability of PDC to infer inhibitory connections. For the GLM fit shown in Figure 4.6d, the spiking history interval WGLM considered in the fitting procedure was kept fixed at 20 bins (60 ms) for all the examined populations. As can be seen, the GLM fit highly depends on the threshold used to detect the existence of connections. To demonstrate GLM fit dependency on the choice of WGLM, we examined its performance in Figure 4.6e when the history Mij in the model was fixed to 60 bins. Peak performance was obtained when WGLM was set to 20 bins. For this choice of Mij (60 bins), the weight function α ij (t ) in equation (4.8) falls to roughly 95% of its peak value after 20 bins (Figure 4.6a). Therefore, a choice of WGLM that does not match the GLM model order may result in over or under fitting. 4.5.3 Networks with Variable Synaptic Latencies As stated before, a functional cortical network is more likely to have heterogeneous synaptic latencies. Typically, synaptic latencies can range from a few milliseconds for monosynaptic connections [31, 187] to a few tens of milliseconds for polysynaptic connections [188, 189]. In addition, the limited sample size precludes the ability to record all the neurons that are directly connected in a given population. It is therefore much more realistic for our model to have variable synaptic latencies in order to test the applicability of the method to data from real cortical networks. We investigated the performance when multiple synaptic latencies exist within the population. We defined a heterogeneity index (HI) as the number of distinct synaptic latencies that exist in the network as illustrated by the simple network in Figure 4.7a. Figure 4.7b demonstrates the accuracy for networks of 10 neurons with different HIs where HI = 1 implies 78 1 l=4 5 l=2 l=1 3 l=3 2 F-Measure 1 l=2 4 l=5 6 0.9 0.8 DBN GLM PDC 0.7 0.6 1 2 3 HI 4 5 (a) (b) Figure 4.7: (a) A simple network illustrating the dependence on the synaptic heterogeneity index (HI). The subpopulation inside the dashed circle has a HI of 3 while the entire population (neurons 1 to 6) has a HI of 5. The firing history was fixed at 60 bins (180 ms). (b) DBN, PDC and GLM-fit performance vs. HI. that all neurons had the same synaptic latency while 5 implies that every 20% of the neurons in the network had similar synaptic latency. Each neuron received 2 random connections, one excitatory and one inhibitory. In order to identify connections at different synaptic latencies, we applied DBN with a range of Markov lags such that the maximum lag matched the maximum synaptic latency in the population while the minimum lag was set to 1. In this case, DBN considers those lags simultaneously and tries to identify the best lag at which each connection is most prominent. Surprisingly, Figure 4.7b demonstrates that the accuracy was not significantly impacted at high HI. In addition, the DBN performance was superior to that of the GLM-fit and PDC for the same populations. 4.5.4 Identifying Monosynaptic Connectivity One important feature of using DBN in structure learning is its ability to detect causal relationships between the children nodes and their immediate monosynaptic parents and not the 79 higher-order polysynaptic parents. Figure 4.8a shows a simple network consisting of a 3-neuron chain where a directed information flow characterizes the connectivity such that neuron A excites neuron B, while in turn neuron B excites neuron C. The synaptic latency of both connections was set to 1 bin. In this case, neurons A and B are considered monosynaptic parents of neurons B and nd C, respectively. On the other hand, neuron A is considered a 2 order polysynaptic parent (ancestor) of neuron C. It is expected that the connections between neurons B and C and their monosynaptic parents can be best detected using a DBN with a single bin Markov lag, while a nd spurious monosynaptic relationship between neuron A and neuron C (2 order connection) might be detected with a 2-bin Markov lag as a result of the polysynaptic pathway between the two neurons. The complexity in discriminating between these two types of connections would be most pronounced in the absence of prior knowledge of the best Markov lag to use, particularly when analyzing a heterogeneous network such as the ones analyzed in Figure 4.7. Herein, we demonstrate the ability of DBN to restrict its inference to only direct causal relationships despite False Connections/Chain using a non-optimal Markov lag. l=2 A l=1 B l=1 C 0.15 0.1 0.05 0 1 2 3 (1, 2) Markov Lag (bins) (b) (a) Figure 4.8: (a) Example 3-neuron chain, where solid arrows represent true connections while the dotted one represents a spurious connection, l indicates the synaptic latency associated with each connection. (b) Average number of spurious connections inferred per chain. 80 We investigated the performance using 100 different network structures with a history interval fixed to 60 bins (180 ms), a fixed synaptic latency of 1 bin, and 1 excitatory and 1 inhibitory input connections per neuron. Each of the 100 networks examined had a total of 40 chains with 3 neurons each similar to the one in Figure 4.8a. A successful DBN inference should nd yield two connections matching the true monosynaptic connections and no 2 nd per chain. Figure 4.8b shows the number of erroneously inferred 2 order connections order connections per chain versus Markov lag. It can be seen that when a Markov lag of 1 bin was used, DBN did not infer nd any of the spurious 2 order connections as expected since the lag is shorter than the superposition of two synaptic latencies. When a Markov lag of only 2 bins was used, DBN surprisingly had a very low error rate despite that true monosynaptic connections should be nd inferred in addition to the spurious 2 order connection in each chain as shown in Figure 4.8b. We attributed this superior performance to two factors: first, the effect of the pre-synaptic neuron on the probability of firing of the post-synaptic neuron lasts for a period of time governed by the synaptic coupling function given in equation (4.8). Given the exponential decay form of this coupling, it is expected that the probability of firing at times beyond those governed by the synaptic latency will be influenced by the time constant of that exponential function. Consider the example in Figure 4.8a, when the synaptic latency is 3 ms (1 bin) and given the firing of neuron A at time t, equation (4.8) yields a probability of firing for neuron B at time t + 3 ms of 0.25, while that at time t + 6 ms (i.e. after 2 bins of neuron A firing) to be 0.2. Thus, the probability of firing at t + 6 ms has only declined by 20% of its maximum attained at t + 3 ms. Therefore, the connection A→B can still appear at a Markov lag of 2. The same applies to the connection B→C. Second, as the DBN tends to find the network structure with the least number 81 of edges to explain the data, the connection A→C is a redundant one given the presence of the connections A→B and B→C. Figure 4.8b also shows that a Markov lag of 3 results in a slight increase in the number of nd spurious 2 order connections. This can also be attributed to the effect of the time constant of the synaptic coupling. As the lag increases, the effect of the input spike decreases, making it relatively harder to detect true monosynaptic connections, while increasing the probability of nd detecting only spurious 2 order connections. Finally, at a Markov lag range of 1 and 2 in which nd connections at both lags should be simultaneously identified, no 2 order connections were inferred since in this case DBN finds the best lag (or a combination of lags) that best explains the data. These results indicate that even if the Markov lag is not correctly set, DBN can still succeed in identifying only true monosynaptic connections provided that the lag range is chosen equal to or more than the maximum anticipated synaptic latency in the population. 4.5.5 Networks with Unobserved and Independent Neurons Cortical neurons are known to receive common excitatory and inhibitory inputs from other regions [190-193]. These common inputs are a major source of complexity in inferring causal relationships since they cannot be distinguished from actual coupling caused by true synaptic links. We investigated DBN performance when unobserved common inputs exist in the form of synaptic coupling from unobserved neurons. Figure 4.9a shows a sample structure of the network examined in which unobserved neurons in the top row simultaneously excite pairs of observable neurons in the bottom row. The dashed connections in the figure indicate the connections that are expected to be inferred by DBN when the synaptic latency is fixed. These bidirectional 82 connections indicate that each pair of neurons receives the same input at the same time, and therefore there should be no causal, time-dependent relationship between them. We studied the performance of DBN in inferring connectivity between the observed neurons (indexed 6 to 15). The synaptic latency of connections to even numbered neurons le was kept fixed at 1 bin (3 ms) while that to odd numbered neurons lo was varied. Figure 4.9b demonstrates the inference accuracy plotted as a function of the difference l o - l e . We considered a bidirectional connection as an indicator of a common input. The Markov lag of the DBN was set to 1 bin (3 ms). As can be seen, the inference accuracy peaks when the synaptic latency of the connections to both odd and even numbered neurons was the same (i.e. the difference in synaptic latency lo − le is 0). As the difference in synaptic latency increases, the accuracy decreases. This may be interpreted as follows: when there is a difference between the synaptic latencies of the common input, an indirect relationship between the neurons receiving that common input emerges. For example, consider the common input from neuron 1 to neurons 6 and 7 shown in Figure 4.9a when the synaptic latency of the connection 1→ 6 is 1 bin (3 ms) while that of 1→7 is 2 bins (6 ms). In this case, the influence of an event from the common input (neuron 1) on neuron 7 probability of firing has a latency of 1 bin relative to its influence on neuron 6. Thus, not observing the common input coupled with a difference in synaptic latency between that input and the observed neurons results in a spurious connection between the observed neurons directed from the shorter latency neuron to the longer latency one. On the other hand, an inferred bidirectional connection between two observed neurons might indicate a common input simultaneously received by both neurons. Nevertheless, it can still be indicative of a pair of unidirectional physical connection between the two neurons rather than a common input. 83 1 … 2 le lo 7 6 le 8 lo 9 5 le 14 … lo 15 (a) F-Measure 1 0.8 0.6 0.4 0.2 0 0 1 2 l - l (bins) o (b) 15 1 8 20 2 3 9 14 2 4 6 11 10 15 13 16 9 3 6 5 12 17 7 13 8 1 4 5 12 e 14 18 10 3 11 19 7 (c) (d) Figure 4.9: (a) Network structure examined where black nodes 1-5 in the dashed rectangle represent unobserved neurons. Solid arrows indicate real connections while dashed ones indicate the expected connections to be inferred by DBN. (b) DBN performance vs. difference between the synaptic latency of the connections to odd numbered neurons and even numbered neurons (lo − le) where le is fixed at 1 bin. (c) A network of 20 neurons, where white nodes indicate observed neurons while black nodes indicate unobserved neurons. (d) A network with 15 observed neurons; neurons 1-10 (inside the dashed circle) receive connections while neurons 1115 are not connected to any neurons. The history interval was set to 60 bins (180ms) and the synaptic latency was fixed to 1 bin (3 ms) for all connections. 84 Discrimination between the two cases can be achieved if prior knowledge about the anatomy of the brain area and the duration of the synaptic latencies are available. We further examined the performance in the case of unobserved neurons within the population. We were interested in assessing whether this would hinder our ability to detect functional relationships between the actual observed neurons. We randomly selected 6 neurons to be unobserved from 20-neuron populations as shown in Figure 4.9c. DBN was applied to the spike trains of the remaining 14 neurons. The synaptic latency of all the neurons was set to 1 bin (3 ms) and the Markov lag was also set to 1 bin to match the synaptic latency. The inference accuracy achieved by examining 100 different networks was 0.96±0.03. When computing the accuracy, all the connections that involved the unobserved neurons in the simulated networks were not considered. We also examined the performance when some of the observed neurons are not elements of the functional network. These can be typically thought of as task-independent neurons. Identifying these neurons can be useful in the context of decoding spike trains. In such a case, the performance of a decoder should improve if the posterior probability of the stimulus is expressed in terms of the conditional probabilities of the task-dependent neurons, while excluding the task-independent ones from the computation of the joint prior distribution [194]. Figure 4.9d shows a sample network of 15 neurons in which neurons 1-10 receive 2 input connections each (1 excitatory and 1 inhibitory), while neurons 11-15 are not connected to any other neurons. The Markov lag was set at 1 bin. The inference accuracy obtained by applying the DBN to 100 networks was 0.98±0.07. The number of spurious connections inferred per network between any of the connected neurons (i.e. neurons 1-10) and the unconnected ones (i.e. neurons 11-15) was 0.1±0.3, and occurred in 10 out of 100 networks. The number of connections inferred 85 per network among unconnected neurons (i.e. neurons 11-15) was 0.05±0.2 and occurred in only 5 out of 100 networks. To compare this number with what would be expected by chance, we simulated 100 different networks with 10 unconnected neurons each. The number of connections inferred per network was 0.02±0.14, and occurred in 2 out of 100 networks. 4.5.6 Identifying Connectivity in Large Populations All the analyses shown so far were carried out on relatively small populations of 10 to 15 neurons each. Given that simultaneous recording of multiple single units in excess of these numbers is expected, we were interested to investigate how the method scales as a function of the number of neurons in the population. We simulated 10 different populations, 120 neurons each. Each population consisted of 12 clusters, 10 neurons each, in which each neuron received 3 excitatory connections from neurons belonging only to its own cluster. The history interval was set to 60 bins (180 ms) and the synaptic latency to 1 bin (3 ms). Figure 4.10a shows the inference accuracy when DBN was applied to the entire population with a Markov lag set to 1 bin (3 ms) for different search times. For a search time of 1 min, it is clearly seen that DBN was unable to infer the network structure (only 15% accuracy). To improve the accuracy, two different approaches were examined. First, when allowing the DBN more search time (2 hours), an inference accuracy of 100% was attained for the 120 neuron population. Second, one can try to break down the large population into smaller, functionallyinterdependent subpopulations if prior knowledge of any clusters in the neural space is available. When this is the case, we found that applying DBN to subpopulations when keeping the search interval fixed at 1 min significantly improved the performance. Specifically, we divided each of the 120 neuron populations into 2 subpopulations with 60 neurons each (i.e. clusters 1-6 in one 86 100 Time for F=1 (min) 1 F-Measure 0.8 0.6 0.4 0.2 0 1 min 5 min 10 min 60 min 120 min 0 30 60 90 120 # Neurons/Subpopulation (a) 10 1 0.1 0 30 60 90 120 # Neurons/Subpopulation (b) F-Measure 1 0.8 0.6 0.4 0.2 0 40 80 120 160 200 240 Search Time (min) (c) Figure 4.10: (a) DBN performance vs. number of neurons/subpopulation for different search intervals. 10 different populations of 120 neurons each were simulated. (b) Search time required by DBN to achieve 100% accuracy (F-Measure = 1) vs. number of neurons/subpopulation. (c) DBN performance vs. search time for 120 neuron populations with no clusters. subpopulation and clusters 7-12 in the other). The performance increased in this case by more than two fold as indicated in Figure 4.10a. When subsequent division is carried out into 3, 4, 6 and 12 subpopulations, with 40, 30, 20, and 10 neurons each, respectively, accuracy of 100% was achieved on populations of sizes 10 to 20 neurons each. In this realm, the multiscale clustering algorithm reported in Chapter 3 [27] was applied prior to DBN analysis. In summary, for large populations, a two-stage framework is suggested in which the neural space is first 87 clustered to identify any potential statistical dependence between neuronal elements. This is then followed by DBN analysis to identify the effective connectivity structure within each cluster. Figure 4.10b illustrates the minimum search time needed to achieve 100% accuracy for each subpopulation size. It can be seen that the computational complexity increases exponentially with the population size. Finally, we investigated the performance when no clusters are present. As can be seen in Figure 4.10c, when more search time is allowed (in this case by 2 fold), the underlying network was correctly inferred. 4.5.7 Estimating the Markov Lag of Maximum Accuracy We have demonstrated how the DBN performance is highly dependent on the selection of the Markov lag and that the accuracy is maximized when the Markov lag is equal to or slightly greater than the population true average synaptic latency. When dealing with real neural data, prior knowledge of this synaptic latency may be unavailable, and therefore some measure is needed to devise the best Markov lag to use for a given population. For that purpose, we computed the mean influence score (IS), defined as the average absolute value of the influence score for each inferred connection. The influence score IS(i, j) measures the degree to which neuron j influences neuron i’s firing, independent of the output of the other parents of neuron i [195]. Its computation is based on the same conditional probabilities used in inferring the network structure. IS can be either positive or negative depending on whether the connection is excitatory or inhibitory, respectively. Since IS measures the degree of influence of the presynaptic neuron on the post-synaptic neuron for a given inferred connection, it is expected that the mean of the absolute value of the IS of all the inferred connections for a given network structure will be maximum if the inferred structure matches the true one. 88 Mean IS 0.03 0.02 0.01 M=8 M=20 M=60 M=100 M=140 0 0 2 4 6 Markov Lag (bins) (a) 8 0.025 0.8 Mean IS F-Measure 1 0.6 0.4 0.2 0.02 0.015 0.01 0.005 0 0 1 2 3 4 5 1 2 3 4 5 Maximum Markov Lag (bins) Maximum Markov Lag (bins) (b) (c) Figure 4.11: (a) Mean influence score (IS) vs. Markov lag for variable history intervals for the same data analyzed in Figure 4.6a. Note that the mean IS has similar profile to the accuracy. (b) DBN performance vs. Markov lag for networks with HI equals 3. Maximum accuracy is attained when the maximum Markov lag matches the maximum synaptic latency in the population. (c) Mean influence score vs. Markov lag for networks with HI equals 3. Figure 4.11a shows the mean IS at different Markov lags for the data sets previously analyzed in Figure 4.6 in which the heterogeneity index (HI) was 0. Comparing the results in Figure 4.11a with those in Figure 4.6, it can be seen that for each choice of history interval length M, the mean IS profile matches the F-measure profile obtained when exact knowledge of the network structures was available. Thus, the mean IS can be used to quantify the degree of confidence in the inferred network and to potentially estimate the best Markov lag that explains the data. Figure 4.11b shows the inference accuracy for more complex networks with HI = 3 at 89 different ranges of Markov lags. The inference accuracy peaks when the maximum Markov lag matches the maximum synaptic latency in the population. Figure 4.11c shows similar result using the mean IS. It is clearly seen that, similar to the fixed synaptic latency case (HI = 0), the mean IS behaves similar to the F-measure. This confirms the utility of the mean IS as a metric for determining which network best explains the data when information about the anticipated synaptic latency is unavailable. 4.6 Discussion and Conclusion Identifying the effective connectivity between cortical neurons is a fundamental goal in systems neuroscience. In this chapter, we demonstrated the utility of Dynamic Bayesian Networks (DBN) in inferring this connectivity from the observed spike trains under a wide variety of conditions. Our main conclusion is that DBN is a useful tool for analyzing spike trains, and is capable of identifying causal relationships between distinct neuronal elements in small and moderately-large populations of neurons, particularly when nonlinear interaction between these neurons is present. We have applied the method to probabilistic neuronal network models. These models are increasingly being used in the neuroscience community because they are highly non-linear, stochastic, and faithfully model the dynamic discharge patterns of many cortical neurons. DBN outperformed Partial Directed Coherence (PDC), particularly when temporal dependence was variable among the population elements. The diminished PDC performance was mainly attributed to its inability to detect inhibitory connections. Compared to GLM fit, DBN performance was comparable overall, and occasionally superior in highly heterogeneous network structures. The GLM performance was not surprising, given that it was tested on data simulated 90 using the same generative model, while the DBN did not assume any specific generative model. Therefore, we expect DBN to have more powerful generalization capability compared to GLM fit, although not tested here. Some disadvantages of the GLM method are the need to specify a detection threshold (with associated p-values) due to the dependence of the neuron’s firing on the spiking history of other neurons that is likely to be variable across neuron pairs. In addition, the orders of magnitude difference in computational time, in favor of the DBN, makes the inference task more daunting for the GLM. Two issues deserved detailed investigations in the proposed approach. The first is the dependence of the DBN performance on the Markov lag, which can be regarded as playing the same role as the model order in the GLM fit. Nevertheless, choosing the Markov lag to be exactly matching the synaptic latencies is not necessary, as long as it is not smaller than the maximum synaptic latency expected in the population. Information about synaptic latency may be available if knowledge of the anatomical structure of the sampled cortical regions is available. In such a case, running the DBN using different Markov lags and finding the Markov lag(s) at which the mean influence score is maximized should yield the best accuracy. The structure obtained would be the most likely network structure that best explains the data. The second issue is the complexity of the search in the high-dimensional neural space that is directly proportional to the number of the neurons in the population. We found that the computational complexity increases exponentially with the population size. However, we found that when clusters of functionally interdependent neurons do exist, the computational complexity can be reduced. This is important, given the plethora of studies suggesting that neuronal interaction across multiple time scales may underlie multisensory neuronal integration in naturalistic behavior, perception, and cognition. The results suggest that a two-step framework 91 may be best suited for large-scale analysis, whereby populations are first clustered to discover any inherent statistical dependency between their elements. These clusters are subsequently analyzed by DBN to infer their underlying structures [29]. The clustering step, however, can be eliminated without affecting the DBN performance if longer search time is permitted. As detailed later in Chapter 6, the proposed approach can also be useful in quantifying synaptic plasticity that is strongly believed to underlie learning and memory formation. In such case, graphical approaches such as DBNs can reveal potential plastic changes reminiscent of cortical re-organization during learning or recovery from injury. It can also be used to identify task-dependent neurons when neural decoding is sought to reconstruct a sensory stimulus or predict an intended motor behavior in a wide range of neuroprosthetic applications. 92 Chapter ____________________________________________ 5 Millisecond-Timescale Stimulusspecific Network Coding ______________________________________________________________________________ 5.1 Rat Barrel Cortex Temporal precision plays an important role in the coordination among neocortical neurons and has been hypothesized to enhance information flow along sensory pathways [196]. However, its role in mediating the integration of information at the output of these pathways remains poorly understood. Using DBN and information theoretic measures, this chapter demonstrates some evidence for the presence of stable, stimulus-specific networks that emerge in the barrel cortex in response to individual whisker deflection [30]. The chapter also provides evidence for the existence of position-specific networks in medial prefrontal cortex (mPFC) as rats forage in a T-shaped maze during a working memory task [29]. The barrel cortex is part of the rat somatosensory cortex in which neurons encode information about whiskers deflection [31]. Rats use their whiskers to gather information about their surrounding environment and recognize features of external objects, much like humans use their hands and digits for tactile sensation. The barrel cortex is organized into vertical columns, or barrels, across different cortical layers. The cells within each barrel respond maximally to the deflection of a single principal whisker. Figure 5.1a illustrates the organization of the barrel cortex (layer IV) and the associated whiskers where whiskers/barrels are denoted by A-E for the 93 Rat Snout A 1 2 34 B C D E Barrel Cortex Layer II/III E 1 D B C 3 A 4 P M L A IV Va VPM Thalamus (a) (b) Figure 5.1: (a) (Left) The rat’s right snout with the whiskers organized in rows, indexed in letters, and columns, indexed in numbers. (Right) A horizontal section of the barrel cortex at layer IV showing the whiskers mapping (Adapted from [197]). (b) Flow of information from the Thalamus through the barrel system. The thickness of the arrows correspond to the strength of the connections [198]. rows and 1, 2, 3, etc. for the columns [197]. Figure 5.1b shows a schematic of the information flow to the barrel system [198]. Specifically, upon a certain whisker deflection, mechanoreceptors in its follicle modulate their firing activity. This activity is carried by sensory fibers projecting to the brain stem and subsequently project to the thalamus [199]. Axons from thalamic neurons innervate the barrel at layer IV corresponding to the deflected whisker as shown in Figure 5.1b. This layer is considered as the entry point of information to the barrel system at the cortical level. Inside each column, excitatory connections project from layer IV up to layer II/III and down to layer V. Weaker excitatory and inhibitory connections extend to neighboring barrels. It has been suggested that precise spike timing relative to stimulus onset carries most of the information about whisker displacement [48], and that this mechanism aids the animal during active whisking episodes to recognize external objects. This is primarily present in layer IV neurons as this layer receives numerous inputs from the ventral posterial medial (VPM) thalamus through the lemniscal and paralemniscal pathways [200, 201]. Neurons in infragranular layer V, on the other hand, exhibit more complex dynamics as they integrate inputs from multiple barrels 94 within and across hemispheres [202-204]. This integration creates larger receptive fields than those typically found in layer IV cells [205, 206]. Because layer V is a major output layer to multiple structures such as the posterior medial nucleus, zona incerta, pontine nuclei and the primary motor cortex [207-209], the frequently observed stimulus-dependent correlation among layer V cells is believed to play an important role in providing two streams of information [210]: a spatial code from ascending pathways representing current whisker position, and a corticothalamic feedback representing information about past whisker position [211]. These two streams are necessary to provide sufficient information to primary motor cortex for mediating active whisking cycles during adaptive exploratory behavior [212]. Despite the large body of reports suggesting that individual layer V neurons encode spatial information, the precise mapping of this spatial information to temporal coordination among layer V neurons during whisker movements remains poorly understood. Given the information integration that occurs in layer V, we hypothesized that: For a given stimulated whisker, a stable network representation would be obtained - as measured by the degree of structural similarity between networks inferred across multiple repeated trials. In addition, the structure of these networks would be less similar to those inferred during stimulation of other whiskers. 5.2 Methods 5.2.1 Barrel Cortex Recording Five adult female Sprague Dawley rats weighing ~300 g were used in this study (Rats R1R5). All procedures involving animals were approved by the Michigan State University 95 Institutional Animal Care and Use Committee (IACUC). Animals were anesthetized using a cocktail of ketamine and xylazine (75 and 5 mg/kg injected intrapertoneally, respectively). The left somatosensory cortex was exposed (4 × 4 mm craniotomy, 0-4 mm posterior and 4-8 mm lateral to bregma). A 32-channel microelectrode silicon array (NeuroNexus Technologies, Ann Arbor, MI, USA) with 4 shanks, 8 recording sites/shank, 400 μm shank separation and 100 μm electrode separation within shank was advanced into the barrel field in 100 μm steps. Acquired signals were amplified and band-pass filtered in the range 300-5000 Hz and sampled at 25 KHz. Stimulus-driven activity was recorded at depths of 1100-1500 μm corresponding to layer V of the barrel cortex. Subjects were perfused at the end of the experiments using 0.1 M phosphatebuffered saline and 4% paraformaldehyde. Coronal sections (50 μm) were cut and sections were Nissl-stained. The laminar depth of the arrays was confirmed to be in layer V by examining either the length of the electrode tracks or electrolytic lesions created by passing 4 μA current for 5 sec. Prior to vibrissae stimulation, whiskers were all trimmed to 6 mm length. For each rat, 3 whiskers were selected for mechanical stimulation that resulted in maximal modulation of the firing rate based on the observed neuronal response to manual deflection (Whiskers C2, C3 and D2 for R1; C2, D2 and D1 for R2; C1, D1 and D2 for R3; B2, B3 and B4 for R4; B1, B2 and B3 for R5). The selected whiskers were deflected one at a time by inserting each whisker into a capillary tube glued to a piezoelectric bimorph (Piezo Systems, Cambridge, MA, USA). Each whisker was horizontally deflected 900 times with a displacement of 80 μm for 100 ms (rise time and fall time were each set to 1 ms) at 1 Hz frequency as illustrated in Figure 5.2 [48, 213]. Spikes in multiple single unit activity were detected and sorted using NeuroQuest; a MATLAB toolbox for neural data processing and analysis [214, 215]. Spikes were declared 96 Deflection (μm) 80 0 1 98 Time (ms) 1 (b) (a) Figure 5.2: (a) Experimental setup where rats were anesthetized and whiskers were deflected one at a time using a capillary tube glued to a piezoelectric bimorph. (b) Whiskers were deflected horizontally for 100 ms. Red curve indicates whisker displacement. present if the raw waveform surpassed a threshold set at 3 times the noise standard deviation. Due to the observed overlap in the recorded spikes in the data, a short spike length of 0.5 ms was used for spike sorting (0.25 ms pre threshold crossing and 0.25 ms post threshold crossing). Spikes were aligned at their trough. Principal Component Analysis (PCA) was applied to the detected spikes, and the first 2 principal components were used as features for spike sorting. An average population size of 16±7.8 single units/rat was recorded (12 units for R1, 27 for R2, 21 for R3, 8 for R4 and 12 for R5). Spike trains were binned at Δ=0.5 ms. Quality of the spike sorting was assessed using inter-spike interval histogram (ISI) to ensure that no spikes with interspike intervals of less than 1.5 ms were classified as belonging to the same unit. Neurons were indexed by channel number (1-32) and unit number (a, b, c, etc ...). 97 5.2.2 Single Unit Analysis For each neuron, Post-stimulus Time Histograms (PSTHs) were computed as the average firing across trials for each stimulated whisker with 0.5 ms bin size within a window of 100 ms post stimulus onset. The peak PSTH count for a given neuron for each whisker was then extracted. The mean first-spike latency Liw of each neuron i for a given whisker w was computed as [48, 216] (5.1) T 1 w tw Liw = ∑ τ (1) Tw t =1 i w t where Tw is the total number of trials for whisker w and τ i w is a vector of the spike times of neuron i on trial tw relative to stimulus onset. We quantified the amount of information present in the individual neuron and population response property, namely the first-spike latency/spike count and network representation, respectively, about the deflected whisker identity using mutual information [48, 217]. For a given neuron i, the mutual information between its response property Xi and the stimulus W (in our case the whisker identity) was computed as ⎛ Pr ( xi , w ) ⎞ I ( X i ;W ) = ∑ ∑ Pr ( xi , w ) log ⎜ ⎜ Pr ( x ) Pr (w ) ⎟ ⎟ i ⎝ ⎠ xi w (5.2) where Xi corresponds to the time stamp of the first spike of neuron i within 100 ms of the stimulus onset in the case of first-spike latency, Xi corresponds to the total number of spikes fired by neuron i during the same time interval in the case of spike count, and Xi corresponds to the subnetwork of neuron i in the case of network representation as detailed later. The more distinct the response distribution of a given neuron to different whiskers is, the higher the mutual 98 information, and thus, the higher the information it conveys about the identity of the deflected whisker. To normalize I(Xi;W) between 0 and 1, we divided equation (5.2) by the joint entropy H(Xi,W). 5.2.3 Dynamic Bayesian Networks Analysis To infer the whisker-specific networks, we used the Bayesian Network Inference with Java Objects (BANJO) toolbox [178] with simulated annealing search algorithm [172]. For each deflected whisker, spike trains within 100 ms of each stimulus onset were considered as one trial. A total of 100 datasets, 18 sec each, for each whisker were extracted from the recorded 900 trials/whisker, where each dataset was formed by concatenating 180 trials that were randomly chosen with a uniform distribution from the 900 trials. This results in an overlap between any given pair of datasets that follows a binomial distribution with a mean overlap of 20% and a standard deviation of 4% (as will be shown later in Figure 5.6). The spike trains of each dataset were analyzed using DBN with Markov lags in the range [1, 10] bins ([0.5, 5] ms). A Markov lag range of [1, 5] bins ([0.5, 2.5] ms) was found to be the best range for all datasets based on calculations of an influence score that measures the degree of influence each pre-synaptic cell has on post-synaptic cells [28, 195]. In case a connection was inferred at more than one Markov lag, only the largest lag was considered. The maximum number of pre-synaptic cells for each cell was set to 10. 5.2.4 Network Similarity and Network Information To quantify the similarity between the inferred networks, we first represented each inferred network as an n × n binary adjacency matrix A, where n is the total number of neurons in the 99 network. Each element A(i, j) takes the value ‘1’ if there is a connection from neuron i to neuron j and ‘0’ if there is no connection between the corresponding neurons. For a given population of n neurons, K deflected whiskers and M datasets per whisker, all the adjacency matrices of the 2 inferred networks were vectorized and stacked together into one KM × n matrix. Principal component analysis (PCA) was then applied to this matrix to extract significant features from the inferred networks by projecting the adjacency matrices into a p-dimension network space, where 2 p ≤ n , that accounts for most of the variance in the networks [218]. The similarity R(Al, Am) between a pair of adjacency matrices Al and Am was defined as R( Al , Am ) = 1 − ql − q m (5.3) where ql and qm are the projections of Al and Am in the p-dimension network space, respectively, and ||.|| is the Euclidean distance (lp-norm) between the two projections. The network space was normalized such that the maximum possible distance between any pair of projections is 1. The number of principal components used p was set to 2. The average across-whiskers similarity R Across and the average within-whisker similarity RWithin for a given population were therefore defined as (5.4) 2 R Across = ∑ ∑ R (w1 , w2 ), K (K − 1) w w ≠ w 1 RWithin = 2 1 1 ∑ R (w, w) K w where the average similarity between the networks inferred for a given pair of whiskers w1 and w2, R (w1 , w2 ) , and within a given whisker w, R (w, w ) , were defined as 100 R (w1 , w2 ) = 2 w w R (Al 1 , Am 2 ) 2 ∑∑ M l m 2 w R (w, w) = ∑ ∑ R Alw , Am M (M − 1) l m ≠ l ( (5.5) ) Similar to the expression given in equation (5.2), the mutual information between the network projection Q and the stimulus W was computed as ⎛ Pr (q , w ) ⎞ I (Q;W ) = ∑ ∑ Pr (q , w ) log ⎜ ⎜ Pr (q ) Pr (w ) ⎟ ⎟ ⎝ ⎠ q w (5.6) where the probabilities used in equation (5.6) were computed from the 2-dimensional network space after discretizing it into a 10 × 10 bins. For the mutual information measure of individual neurons’ subnetworks, Q corresponds to the projection of the subnetworks of each neuron onto the network space. The normalized I(Q;W) was computed by dividing equation (5.6) by the joint entropy H(Q,W). In order to estimate any bias in R (w, w ) and I(Q;W) that results from the overlap in the datasets of each whisker, surrogate datasets were created from the original data such that each surrogate contains trials from all whiskers that has the same degree of overlap as the original datasets. Thus, within a surrogate, any excess similarity between the networks inferred for the datasets extracted from it would only result from the overlap between these datasets and not from the stimulus. To form the shuffled datasets, the 900 trials recorded for each whisker were first split into 3 different groups, 300 trials each. One group was then selected from each whisker and concatenated with 1 group from each of the other 2 whiskers forming a 900 trial shuffled surrogate (For example: group 1 of whisker 1, group 1 of whisker 2 and group 1 of whisker 3 form one shuffled surrogate; group 2 of whisker 1, group 2 of whisker 2 and group 2 of whisker 101 3 form another shuffled surrogate, ... etc). For each subject, 3 shuffled surrogates, 900 trials each, were formed. A total of 100 datasets, 18 sec each, were then extracted from each shuffled surrogate by concatenating 180 trials that were randomly chosen with a uniform distribution from the 900 trials. Therefore, these datasets are not whisker-specific and thus any similarity between the networks inferred within the same shuffled surrogate that is larger than that across surrogates would only result from the overlap between the datasets extracted from the same surrogate. Similarly, any non-zero mutual information between the networks inferred for the shuffled surrogates and the identity of these surrogates would also result from the overlap between the datasets extracted from the same surrogate. We used the difference between the within-surrogate similarity and the across-surrogates similarity as an estimate of the bias in the within-whisker similarity of the original data that results from the overlap. We also used the mutual information computed for the shuffled datasets as an estimate of the bias resulting from the overlap. 5.2.5 Decoding Whisker Identity We used a leave-one-out cross-validation approach to test the ability to decode the identity of the deflected whisker using the inferred networks [219]. The network obtained for each 18 sec dataset of a given whisker was compared to the other networks inferred for the same whisker (M - 1 networks) and the other whiskers (M(K-1) networks). The identity of the deflected whisker w* for a given test dataset was computed as w* = arg max w ( 1 Z ∑ R A, Azw Z z =1 ) (5.7) 102 where Z = M when w ≠ w* or Z = M - 1 when w = w* (to exclude the test dataset network from the similarity comparison). Therefore, w* is the whisker whose inferred networks (templates) result in maximum similarity with the test data fit A. Decoding for a given test dataset based on the response latency of individual neurons was obtained by, first, computing the mean first-spike latency for each neuron from all datasets of all whiskers excluding the test dataset (training datasets). The mean first-spike latency for each neuron in the test dataset was computed and compared to that obtained from the training datasets. Whiskers whose training datasets for each cell had the least absolute latency difference were identified as the ones being stimulated. Single-cell decoding accuracy was computed as the percentage of test datasets for which the identified whiskers matched the actual whiskers. The overall decoding accuracy was computed by averaging across cells. For comparison, we also computed the decoding accuracy using a majority-voting rule in which whisker identity was determined as the one with the majority of cells having the least absolute latency difference between training and test datasets. 5.2.6 Crosscorrelogram Analysis For the sake of comparison, we computed the standard crosscorrelogram approach to assess potential causal influence between pairs of neurons with bin size of 0.5 ms and range [-5, 5] ms [63]. Ten jittered versions of each of the datasets analyzed using DBN were formed in which each spike was randomly displaced over a uniform interval in the range [-10, 10] ms around the original spike time. A peak or trough (indicating excitation or inhibition, respectively) in the crosscorrelogram for a given neuron pair in the original dataset was determined to represent a connection if it crossed a confidence band computed from the jittered datasets. The upper and 103 lower limits of the confidence band were computed from the maximum and minimum counts of the jittered crosscorrelograms, respectively, with an acceptance level of 0.99 [13]. 5.2.7 Predicting Single Neuron Firing from Pre-synaptic Peers The firing probability of each neuron was estimated using the firing history (5 bins or 2.5 ms) of pre-synaptic cells determined from the networks inferred for each whisker and the stimulus. The conditional firing probability of a given neuron i at time t was estimated as { ( ( ) )} Pr⎛ ri (t ) = 1 r j t − lij − 5 Δ : t − lij Δ , S (t − Liw − 5Δ : t − Liw )⎞ ⎜ ⎟ j∈π (i ) ⎝ ⎠ { ( ( ) { ( ( ) (5.8) )} Pr⎛ ri (t ) = 1, r j t − lij − 5 Δ : t − lij Δ , S (t − Liw − 5Δ : t − Liw )⎞ ⎜ ⎟ j∈π (i ) ⎝ ⎠ = Pr⎛ r j t − lij − 5 Δ : t − lij Δ , S (t − Liw − 5Δ : t − Liw )⎞ ⎜ ⎟ j∈π (i ) ⎝ ⎠ )} where π(i) is the set of pre-synaptic neurons inferred for whisker w, lij is the Markov lag at which a connection from neuron j to neuron i was inferred, S(t) is a binary vector with a nonzero entry of ‘1’ only at the stimulus onset and 0 otherwise, and Liw is the mean first-spike latency computed in equation (5.1). Using ten-fold cross-validation, 10 training datasets (72 sec duration each) were extracted from each 90 sec whisker dataset by sliding a 72 sec window with 5 sec steps. The joint probabilities on the right hand side of equation (5.8) were computed from each training dataset using kernel density estimation with a normal function kernel and a bandwidth of 0.001 [220]. The probabilities estimated from each training dataset were used to predict the firing of each neuron in the remaining 18 sec test dataset. When conditioned on the stimulus only, equation (5.8) can be re-written as 104 Pr(ri (t ) = 1 S (t − Liw − 5Δ : t − Liw )) = Pr(ri (t ) = 1, S (t − Liw − 5Δ : t − Liw )) . Pr(S (t − Liw − 5Δ : t − Liw )) (5.9) The probability estimates were then smoothed by applying a moving average filter with a window of 15 ms. Predicted spike trains were computed for test datasets (10 datasets/whisker, 18 sec each) using the smoothed estimates for threshold values in the range [0, 1] with a step of 0.001. At each time point, the original spike trains were used in the prediction. True and false positive rates were computed from the predicted spike trains for each threshold. Receiver Operating Characteristic (ROC) curves were constructed from the true and false positive rates for each cell for a given whisker. The area under the ROC curve was used as a measure of the predictive power as 2 × (Area under ROC – 0.5) [221]. We compared the predictive power using the DBN method to that obtained using Generalized Linear Models (GLMs) [98]. GLM expresses the firing probability of a neuron i as {( ( ) )} Pr⎛ ri (t ) = 1 r j t − lij − 5 Δ : t − lij Δ , S (t − Liw )⎞ ⎜ ⎟ ∀j ≠i ⎝ ⎠ 5 5 ⎛ ⎞ = exp⎜ ∑ ∑αij (mΔ )r j (t − mΔ ) + ∑αiw (mΔ )S (t − Liw − mΔ )⎟Δ ⎜ ⎟ m =1 ⎝ j ≠i m=1 ⎠ (5.10) where αij models the coupling filters between neurons i and j and αiw models the stimulus filter. The history interval was set to 5 bins (2.5 ms) similar to the DBN analysis. The coupling and stimulus filters were estimated using the same training datasets for the DBNs using iterative reweighted least squares (IRLS). These filters were then used to compute the firing probability of each neuron at each time point of the test datasets. Spike train predictions were computed by comparing the estimated firing probability to threshold values in the range [0, 1]. ROC curves 105 were constructed and the predictive power was also computed for each neuron for a given whisker. 5.3 Results 5.3.1 Firing Characteristics We recorded a total of 80 single units from layer V of the barrel cortex in five anesthetized rats (R1-R5). Microelectrode arrays with 32 channels were used in each rat to record neural responses to unilateral stimulation of individual whiskers, one at a time. Each whisker was stimulated 900 times at a frequency of 1 Hz. Compared to previous studies that addressed individual neurons’ firing characteristics in which 50 trials only were used [48, 205, 222], we used this large number of trials (900 trials) to guarantee enough data to infer causal networks. Figure 5.3a and b illustrate the discharge patterns of two sample neurons to the deflection of three whiskers in rat R2. Onset response latencies were in the range of 10+ ms, corresponding to typical response patterns of pyramidal cells in layer V of the barrel cortex [223, 224]. The laminar depth of the arrays was also confirmed to be in layer V by examining electrode tracks or electrolytic lesions as illustrated in Figure 5.3c. Neurons showed significant whisker-specific modulation of their firing pattern in response to whisker deflection, with 71.3% of the recorded units exhibiting a significantly stronger response to stimulation of a principal whisker during the first 100 ms post stimulus onset compared to other non-principal whiskers (P < 0.05, two-sample t-test for each pair of whiskers). 106 Raster for Neuron 2a PSTH PSTH 400 PSTH 0 50 100 150 Time (msec) 0 50 100 150 Time (msec) 0 50 100 150 Time (msec) (a) Whisker D1 Stim. Whisker D2 Stim. Raster for Neuron 10a Raster for Neuron 10a PSTH PSTH Whisker C2 Stim. 80 0 Raster for Neuron 10a 800 400 0 0.05 0 PSTH 0 50 100 150 Time (msec) 0 50 100 150 Time (msec) 0 (b) 0.4 Sp. Count Info. D (μm) Whisker D2 Stim. Raster for Neuron 2a Raster for Neuron 2a 800 0 0.05 0 Sp./Trial Trial Whisker D1 Stim. Whisker C2 Stim. Peak PSTH (Sp./Tr.) D (μm) Sp./Trial Trial 80 0 0.2 0 50 100 150 Time (msec) 0.1 0.05 0 20 40 0 0.05 0.1 st st Mean 1 -sp. Lat. (ms) 1 -sp. Lat. Info. (c) (d) (e) Figure 5.3: Firing characteristics of the recorded neurons. Two sample neurons: (a) Neuron 2a and (b) Neuron 10a from rat R2 response to stimulation of whiskers C2, D1 and D2. (Top) Whisker displacement. (Middle) Spike raster over multiple repeated trials. (Bottom) Poststimulus Time Histogram (PSTH) with 0.5ms bin size. Neuron 2a shows stronger and faster response to whisker C2 than other whiskers while Neuron 10a shows a slightly stronger and faster response to whisker D1. (c) Nissl stained coronal section (50 μm) in rat R5. This rat was chronically implanted over 35 days. Dashed curve indicates the original shape of the section that was damaged due to the implant. Black arrowhead points to an electrolytic lesion mark of the deepest recording site on one of the shanks of the multi-electrode array. The depth of the lesion mark (~1250 μm) is consistent with the depth recorded using the micromanipulator during the surgery and corresponds to layer Vb of the barrel cortex (1.1 mm posterior and 5.2 mm lateral to bregma). (d) Peak PSTH counts of each neuron for each whisker versus its mean first-spike latency (n = 240). Each dot corresponds to the response of a single neuron to a single whisker. (e) Normalized mutual information between spike count/first-spike latency and whisker identity. Each dot corresponds to one neuron. Black diagonal line represents equal information (n = 80). 200 μm 107 It has been reported that the first spike post-stimulus onset in a barrel column conveys most of the information about the corresponding whisker deflection [48, 216]. Therefore, we used the post stimulus first-spike latency as a measure of temporal response fidelity of the recorded neurons. We found that 83.8% of the recorded neurons had a significantly smaller first-spike latency for a single whisker compared to other whiskers (P < 0.05, two-sample t-test for each pair of whiskers). A fraction of 46.3% of the recorded units showed preference to the same whisker in terms of both firing rate and first-spike latency, suggesting the presence of both rate and temporal coding mechanisms [225]. Neurons with strong response modulation to a given whisker also exhibited short latency as illustrated in Figure 5.3d (r = -0.76, P < 0.0001, n = 240, t-test). To confirm the salience of spike timing in encoding whisker movements, we computed the mutual information between the stimulus and each individual response property (namely, the first-spike latency and the spike count) [48, 217]. More information was conveyed about the stimulus by the first-spike latency than the spike count as illustrated in Figure 5.3e (P < 0.0001, n = 80, two-sample t-test). Indeed, 96.25% of the recorded neurons had larger first-spike latency information than spike count information, indicating that temporal coding was more pronounced compared to rate coding. (Information in first-spike latency: 0.19±0.08 bits, information in spike count: 0.05±0.04 bits; Normalized information in first-spike latency: 0.04±0.02, normalized information in spike count: 0.01±0.01, mean ± SD). 5.3.2 Whisker-specific Networks As illustrated in Figure 5.3, variability in the temporal characteristics of the responses across whiskers was observed. We asked whether this variability could be accounted for using a 108 network model of the recorded ensemble beyond what is provided by individual neurons’ response variability. To address this question, we analyzed the data by fitting a dynamic Bayesian network (DBN) model [28]. For each stimulated whisker, one hundred 18-sec long spike train datasets were formed from the 900 trials of each whisker by randomly choosing 180 trials out of the 900 trials following a uniform distribution. DBN fit was then obtained for each of these datasets. Figure 5.4a illustrates sample inferred networks for three distinct whiskers in one rat. To assess the validity of these networks in the absence of knowledge of the underlying true connectivity, we examined the connection probability as a function of the horizontal and vertical separation between electrodes. We expected that neurons recorded on the same or adjacent electrodes are more likely to be connected in the inferred networks, consistent with anatomical and physiological studies in the neocortex suggesting that connectivity tends to be mostly local for economic wiring [13, 149, 150]. Figure 5.4b demonstrates that neurons recorded on the same or on adjacent electrodes have a higher probability of being connected. Furthermore, the connection probability decreased with increasing electrode separation (r = -0.47, P < 0.05, n = 24, t-test). To examine whether the inferred networks are indeed whisker-specific, we measured the similarity between the networks inferred for the same whisker (termed herein within-whisker similarity) and the similarity between the networks inferred for different whiskers (termed across-whisker similarity). Principal component analysis (PCA) was used to construct a network feature space where each point in that space corresponds to one network as illustrated in Figure 5.5a [218]. We quantified the similarity between a pair of networks as (1 – the normalized distance between their corresponding projections in the PCA network space). Distance normalization ensured that the maximum possible pair-wise distance measure in the network 109 Whisker C2 16a 16b 17a 17b 24a 24b 25a 25b 10b 26a Whisker D1 10a 9b 9a 6b 26b 26c 31a 31b 6a 16a 16b 17a 17b 24a 24b 25a 25b 2b 2a 1b 1a 32c 32b 32a 31c 10b 26a 10a 9b 9a 6b 26b 26c 31a 31b 6a 2b 2a 1b 1a 32c 32b 32a 31c Whisker D2 10b Ver. Separation ( m) μ 16a 16b 17a 17b 24a 24b 25a 25b 26a 10a 9b 9a 6b 26b 26c 31a 31b 6a 2b 2a 1b 1a 32c 32b 32a 31c (a) Conn. Probability 500 400 300 200 100 0 0.08 0.06 0.04 0.02 0 0 400 800 1200 Hor. Separation (μ m) (b) Figure 5.4: Sample whisker-specific networks. (a) DBN networks inferred from one population (Rat R2) for 3 individually stimulated whiskers: C2, D1 and D2. Undirected links indicate bidirectional connections. Network of each whisker was inferred from a dataset of length 18 sec (180 trials x 100 ms). (b) Connection probability in the DBNs as a function of the horizontal and vertical separations between the electrodes on which neurons were recorded. The number of connections inferred at each distance was normalized by the corresponding total number of possible connections. 110 feature space does not exceed ‘1’. As illustrated by Figure 5.5b, networks inferred for the same whisker were significantly more similar compared to those inferred for other whiskers (83.3±6% within whisker, 50.3±18% across whiskers, P < 1e-6, two-sample t-test). This suggests that the inferred networks are whisker-specific, and that temporal coordination among the observed neurons bared a signature of the deflected whisker identity. We next examined whether the overlap between the datasets extracted for the same whisker (20±4% overlap between any given pair of datasets) may have resulted in a statistical bias towards the within-whisker similarity measures. We estimated that bias by creating multiple shuffled surrogate datasets for each population. Each surrogate dataset consisted of 900 trials, 300 trials from each whisker. One hundred 18-sec long datasets were then extracted from each surrogate dataset in the same way as the original 100 datasets/whisker were extracted. This shuffling procedure destroyed whisker-specific features within each dataset but maintained the same amount of overlap (Figure 5.6). Therefore, similarity within each surrogate dataset that exceeds similarity across surrogate datasets would only result from the overlap between the datasets within each surrogate. We found the bias of within-whisker similarity to be 1.8±3% (Figure 5.6). In addition, the projection of the networks inferred from the surrogate data clustered around the origin in the network space as seen in Figure 5.5a. This demonstrates that the networks inferred from the shuffled data represented pure noise and, therefore, the inferred networks from the original data are whisker-specific (distance from the origin for the original data: 1.83±0.9, distance from the origin for the surrogate data: 0.75±0.5, P = 0, two-sample ttest). We then examined the amount of information conveyed by the networks about the stimulus and compared it to those conveyed by single individual neuron responses. Averaged across 111 Network Space 10a 9b 9a 10b 6b 2 6a 16a 2b 16b 2a 17a 1b 10a 9b 9a 10b 6b 6a 16a 2b 16b 2a 17b 17a 1b 1a 24a 17b 32c 1a 24b 24a 32b 32c 25a 24b 32a 25b 32b 25a 31c 26a 26b 26c 31a 31b 32a 25b 31c 26a PC2 26b 9b 9a C2 D2 D1 Shuffled 10b 6b 6a 16a 2b 16b 2a 17a 1b 17b 1a 24a 32c 24b 32b 25a 32a 25b 31c 26a 26b 26c 31a 31b 31a 9b 9a 31b 0 -2 10a 26c -4 -4 10a 10b 6b 6a 16a 2b 16b 2a 17a 1b 17b 1a -2 0 PC1 2 24a 32c 24b 32b 25a 32a 25b 31c 26a 26b 26c 31a 31b (a) * Network Information Network Similarity 1 0.8 0.6 0.4 0.2 0 Within Whisker 0.4 0.3 0.2 0.1 0 Across Whisker * Original Data Shuffled Data (b) (c) Figure 5.5: Networks similarity within- and across-whiskers. (a) Network feature space of rat R2 for 3 different whiskers (C2, D2 and D1) and a shuffled dataset. Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. Insets: Sample networks from the 3 whiskers. Black edges represent common connections with the top left sample network (whisker C2). (b) Similarity between networks inferred for the same whisker and between networks inferred for different whiskers averaged across subjects (mean ± SD). Similarity for a given pair of networks was quantified as 1 – the normalized distance between the projections of the pair in the principal component space. Within whisker similarity was corrected for the bias resulting from the overlap in the data (estimated from the shuffled data). The figure indicates that the within whisker networks cluster more closely compared to across-whisker networks. *P < 1e-6, two-sample t-test. (c) Normalized mutual information between each of the networks inferred from the original and shuffled data, and the stimulus averaged across subjects (mean ± SD). *P < 0.01, two-sample t-test. 112 Original Data 0.05 0 0 50 Overlap (%) 100 Fraction of Pairs of Datasets Fraction of Pairs of Datasets 0.1 0.1 Shuffled Data 0.05 0 0 50 Overlap (%) 100 (a) Shuff. 1 Shuff. 2 Shuff. 3 2 PC2 Network Similarity Shuffled Data Network Space 1 0 -1 -1 0 1 1 0.8 0.6 0.4 0.2 0 2 Within Dataset Across Datasets PC1 (b) (c) Figure 5.6: Overlap in the original and the shuffled datasets. (a) Distribution of the amount of overlap between any pair of datasets for (Right) the original data and (Left) the shuffled data. Blue curve indicates a binomial distribution fit with parameters p = 0.2 and n = 180. Both figures indicate that both the original and the shuffled data have the same degree of overlap where any given pair of datasets would have an overlap of 20±4%. (b) Network feature space of the shuffled datasets extracted from rat R2 data. Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. (c) Similarity between networks inferred for the same shuffled dataset and between networks inferred for different shuffled datasets averaged across subjects (mean ± SD). Similarity for a given pair of networks was quantified as 1 – the distance between the projections of the two networks in the principal component feature space. subjects, information of 0.28±0.08 (Un-normalized: 1.4±0.3 bits) was obtained from the original data compared to only 0.09±0.07 (Un-normalized: 0.28±0.28 bits) from the shuffled data which can be used as an estimate of the bias resulting from the overlap among the within-whisker datasets (Figure 5.5c). Therefore, after correcting for the bias, the inferred whisker-specific 113 networks convey an average net normalized information of ~0.2. Compared to individual neurons’ response information, we found network information to be approximately 6 times the information provided by the first-spike latency and 13 times the information provided by the spike count. This suggests that the network code provides orders of magnitude more information about the stimulus compared to rate and temporal codes combined. Within each of the networks inferred, some neurons exhibited strong participation in subnetworks. We investigated whether, within each subnetwork, individual neurons’ conveyed more information about the stimulus compared to their individual response characteristics. This was done by, first, extracting the subnetwork where each neuron was an actual element, and second, computing the PC’s of these subnetworks as illustrated in the example of Figure 5.7a. Individual neuron subnetworks conveyed more information about the stimulus, albeit at a reduced precision compared to the parent network case as illustrated in Figure 5.7b (Normalized information for original data: 0.15±0.06, shuffled data: 0.08±0.02, P = 0, two-sample t-test; Unnormalized information for original data: 0.82±0.3 bits, shuffled data: 0.4±0.2 bits, P = 0, twosample t-test). Moreover, 78.6% and 94.3% of the neurons conveyed more information through their subnetworks than their individual first- spike latency and spike count, respectively, as can be seen in Figure 5.7c. This suggests that the network code provides better stimulus discrimination than rate and temporal code not only at the global population level but also at the local subnetwork level. It is noteworthy that the estimated probabilities, and so the computed mutual information, vary with the network space discretization bin size. Our results, however, did not seem to be affected by the choice of the bin size (Figure 5.8). The significant amount of information conveyed by the network suggests that decoding whisker identity based on network features would be more accurate. For a given 18-sec long 114 Neuron 16b Subnetwork Space 1 PC2 9b 2b 9a 1b 16a 0.5 1a 16b 32c 17a 25a 2a 2a 31a 31b 0 1b 16b 2b -0.5 -1 -1.5 C2 D2 D1 -2 -1 PC1 0 1 0.2 st * Resp. Info. Subnetwork Info. (a) 0.1 0.2 1 -sp. Lat. Sp. Count 0.1 0 Original Shuffled 0 0.1 0.2 Subnet. Info. Data Data (b) (c) Figure 5.7: Individual neurons subnetworks information. (a) Network feature space of one sample neuron (Neuron 16b of rat R2) for 3 different whiskers (C2, D1 and D2). Each dot corresponds to the projection of one network onto a 2-dimension principal components (PC1 and PC2) feature space. (b) Information in the individual neurons’ networks inferred for the shuffled data versus the original data (mean ± SD). * P = 0, two-sample t-test. (c) Normalized information in first-spike latency and spike count versus normalized information in the networks of each neuron. Each dot corresponds to one neuron. Black diagonal line represents equal information. Network information was corrected for any statistical bias by subtracting the network information computed for each neuron from the shuffled data in (b). 0 dataset of a given whisker, the identity of the deflected whisker was decoded as the whisker whose inferred networks had the highest similarity to the network of that dataset (using leaveone-out cross-validation method). Overall, decoding accuracy reached 97.6±3% across subjects 115 1.5 1 0.5 0 -0.5 0 Original Shuffled 80 100 Indiv.l Neuron Net. Info. (bits) Network Information (bits) 2 2 1.5 1 0.5 Original Shuffled 80 100 0 40 60 0 20 40 60 Bin Size Bin Size (a) (b) Figure 5.8: Network information in the original data is consistently higher than that in the shuffled data, independent of the bin size. (A) Network information in the original and the shuffled data as a function of the bin size used to estimate the mutual information averaged across the 5 subjects (mean ± SD). (B) Information in the network of individual neurons computed from the original and the shuffled data as a function of the bin size used to estimate the mutual information, averaged across the 80 neurons (mean ± SD). 20 (1464 of 1500 datasets were classified correctly), compared to only 79.7±12% when the response latency of each neuron was used as a feature for decoding, whereas using a majority voting method using all neurons reached an accuracy of 87.7±13%. 5.3.3 Relating Network Properties to Individual Neuronal Responses To examine whether the inferred networks are consistent with individual neuron response properties, we first examined whether the direction of an inferred connection between a given pair of neurons is consistent with the sign of the difference between their mean first-spike latencies. As Figure 5.9a illustrates, we found that for 93.3% of the inferred unidirectional connections, neurons acting as putative pre-synaptic cells had smaller latencies than neurons acting as putative post-synaptic cells. We next examined the relationship between the number of connections each neuron sends/receives and its response latency. Neurons with relatively large number of outgoing 116 0.3 0.2 0.1 0 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 Post-Pre First-spike Latency (ms) (a) Subnet. Info. (z-score) Fraction of Connections Source Index (z-score) 6 0.4 4 2 0 -2 2 1 0 -1 -2 -2 0 2 -1 0 1 2 3 Mean 1st-sp. Lat. (z-score) Source Index (z-score) (b) (c) Figure 5.9: Network properties and individual neuronal responses. (a) Histogram of the difference between the mean first-spike latency of the post-synaptic cells and the pre-synaptic cells for each inferred connection. Only unidirectional connections were counted in the histogram. Red bars indicate the fraction of connections consistent with the difference between latencies while blue bars indicate connections that are not. (b) Ratio between the number of outgoing connections and ingoing connections for each neuron (source index) as a function of its mean first-spike latency for each whisker. Each dot corresponds to one neuron for a given whisker (n = 240). Z-scores of the mean first-spike latency and the source index are reported on the X-axis and the Y-axis, respectively. Gray curve indicates decaying exponential fit. (c) Information in the networks of each neuron as a function of the source index averaged across whiskers (n = 80). Each dot corresponds to the standardized z-scores of one neuron. Gray line indicates regression line. connections constituted central hubs in the network, thereby acting as “source” nodes, as can be seen in Figure 5.4a. The identity of these source neurons was whisker–specific (for e.g., neuron 2b for whisker C2, neuron 16b for whisker D1, and neuron 17b for whisker D2). We observed that the identity of these neurons correlates with neurons with short response latency. On the other hand, neurons with large response latency were observed to have more ingoing connections 117 than ones with short response latency, thereby acting as “sink” nodes. Figure 5.9b illustrates the ratio of the number of outgoing (fan-out) to ingoing connections (fan-in) for each neuron, termed the source index, as a function of its response latency. This index decays exponentially with the mean first-spike latency (Time constant = -0.67, r2 = 0.53, n = 240, t-test). One way to interpret these observations is that neurons with short response latency receive the information about whisker deflection before neurons with larger response latency. In addition to these findings, neurons with high source index were found to convey more information about the stimulus than neurons with low source index as seen in Figure 5.9c (r = 0.54, P < 1e-6, n = 80, t-test). This suggests that few strongly connected hub neurons are key players in orchestrating the local population response to the stimulus. 5.3.4 Crosscorrelogram Comparison The crosscorrelogram is a known method to identify functional connectivity between cells over very short time scales (< 5 ms) [13, 63]. Crosscorrelogram analysis of our data revealed some similarity but also some substantial differences from the DBN analysis. In particular, Figure 5.10a shows the connection probability calculated from crosscorrelogram analysis as a function of the difference between post and pre-synaptic neurons latencies. Contrary to the DBN result in Figure 5.9a, the probability of inferring a connection from small latency neurons to large latency neurons was not significantly higher than inferring a connection in the opposite direction. (Crosscorrelogram: 50.1%, DBN: 93.3%). Similar to Figure 5.9b, Figure 5.10b shows a negative correlation between the source index and response latency but less significant than 118 4 2 0 -2 0.3 0.2 0.1 0 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 Post-Pre First-spike Latency (ms) (a) Fraction of Connections Fraction of Connections Source Index (z-score) 6 0.4 1 Crosscorr. DBN * * 0.5 0 -2 0 2 ChainCommon st Mean 1 -sp. Lat. (z-score) effect Input (b) (c) Figure 5.10: Comparison with networks inferred using crosscorrelograms. (a) Histogram of the difference between the mean first-spike latency of the post-synaptic neurons and the pre-synaptic neurons for each connection inferred using the crosscorrelogram technique. Only unidirectional connections were counted in the histogram. Red bars indicate the fraction of connections consistent with the difference between latencies while blue bars indicate connections that are not. (b) Ratio between the number of outgoing connections and ingoing connections for each neuron (source index) as a function of its mean first-spike latency for each whisker. Each dot corresponds to one neuron for a given whisker (n = 240). Z-scores of the mean first-spike latency and the source index are reported on the X-axis and the Y-axis, respectively. Gray curve indicates decaying exponential fit. (c) Fraction of possible chain effect-induced connections and common input-induced connections inferred by crosscorrelogram and DBN (mean ± SD). * P < 0.001, two-sample t-test. what was obtained using DBN analysis (Crosscorrelogram: time constant = -0.23, r2 = 0.04, n = 240; DBN: time constant = -0.67, r2 = 0.53, n = 240). The crosscorrelogram analysis, however, revealed some inconsistency with individual neuron response analysis. In particular, more connections were observed that were inconsistent with the response latency of the neurons, indicating they mostly represent spurious connections. This was 119 expected because, in contrast to DBN, the crosscorrelogram method – by virtue of the fact that it is a pair-wise measure – does not explain away unlikely causes of correlation. To demonstrate that this is indeed the case, consider for example the case of 3 neurons A, B and C forming a 3neuron chain where A→B and B→C but no connection exists between A and C. A pair-wise measure such as the crosscorrelogram would infer a direct connection A→C. In the case of a common input A driving both B and C, the crosscorrelogram would detect the spurious connections B→C or C→B. We quantified the ability of the DBN method to infer direct causal influence in the 3-neuron chain case as well as the common input case and compared it to the crosscorrelogram method. We found that the crosscorrelogram inferred a significantly higher number of spurious connections compared to DBN (P < 0.001, two-sample t-test). On average, the crosscorrelogram inferred 36.2% more connections than DBN for the 3-neuron chain case, and 64.4% more connections for the common input case as shown in Figure 5.10c. 5.3.5 Network Model-based Prediction of Single Neuron Firing We further examined the ability of the inferred networks to predict the firing of individual neurons. The probability of firing of each neuron at any given time point was estimated based on the firing history of the neuron’s input connections as determined by the network structure for a given whisker dataset and stimulus onset. Predicted spike trains were obtained by comparing the probability of firing to a variable threshold (see section 5.2.7) and computing the percentage of times the spikes in the predicted spike train matched the original spike train (True positives) and the percentage of times they did not match (False positives) for each threshold value [221]. We then compared the Receiver Operating Characteristics (ROC) obtained using DBN fit to those obtained using Generalized Linear Models (GLM), conditioned on the pre-synaptic cells’ history 120 Neuron 17b - Whisker D2 0.5 DBN (Pre-syn. + Stim) GLM (Pre-syn. + Stim) Stim 0 0 0.5 False Positive Rate * GLM Predictive Power (a) 1 * 0.5 Stim DBN Predictive Power 0 1 GLM DBN (Pre-syn.+Stim) (Pre-syn.+Stim) (b) 0.5 0 0 20 40 Within Whisker Lat. Std (ms) Neuron 2b - Whisker C2 0.5 0 0 1 DBN - GLM Pred. Pow. Predictive Power 1 True Positive Rate True Positive Rate 1 DBN (Pre-syn. + Stim) GLM (Pre-syn. + Stim) Stim 0.5 False Positive Rate 1 1 0.5 0 0 0.5 1 DBN Predictive Power (c) 0.4 0.2 0 -0.2 -0.4 0 20 40 Within Whisker Lat. Std (ms) (d) (e) Figure 5.11: Predicting single neuron firing. (a) ROC curves of sample neuron 2b in response to whisker C2 deflection (left) and neuron 17b in response to whisker D2 deflection (right) in rat R2. Predicted spike trains were obtained using the pre-synaptic cells’ history inferred by DBN and stimulus onset time, pre-synaptic cells’ history inferred by the GLM and stimulus onset time, and the stimulus onset time only. Each curve was computed from tenfold cross-validation datasets. (b) Predictive power comparison across all rats for multiple models (mean ± SD). * P < 0.001, two-sample t-test. (c) GLM predictive power versus that obtained using DBN and stimulus onset time. Black diagonal line represents equal prediction (n = 240). (d) Predictive power obtained using DBN and stimulus as a function of the variability in mean first-spike latency. (e) Difference between predictive power for each neuron obtained using DBN and stimulus and that obtained using GLM as a function of the variability in mean first-spike latency. Each dot corresponds to one neuron for a given whisker (n = 240). Gray lines in (d) and (e) indicate regression lines. 121 and the stimulus [98]. GLM has been shown to fit spike train data in a number of brain structures [83, 98, 221]. Figure 5.11a demonstrates sample ROC curves of two cells. As can be seen, the predictive power of DBN conditioned on the pre-synaptic cells’ history as well as the stimulus onset was the highest, suggesting that network inference was the most accurate. To quantify the predictive power for each neuron, we used the area under the ROC curve (AUC) [221]. An AUC of 0 indicates a prediction that is similar to chance level, while an AUC of 1 indicates perfect prediction with a highly deterministic conditioned response. Figure 5.11b shows a comparison between the predictive power of different models. DBN predictive power conditioned on the pre-synaptic cells’ history and the stimulus onset time was significantly better than that obtained conditioned on the stimulus onset time only (stimulus only: 0.2±0.24, DBN pre-synaptic cells’ history and stimulus: 0.58±0.16, P = 0, two-sample t-test). The predictive power of the DBN was slightly better than that of the GLM (GLM: 0.53±0.16, P < 0.001, n = 240, two-sample t-test) as illustrated in Figure 5.11c. DBN predictive power was inversely proportional to the variability in response latency as seen in Figure 5.11d (r = -0.56, P < 1e-20, n = 240, t-test). Thus, better overall prediction was obtained for cells with small variance in the response temporal precision compared to cells with high variance. For this group of cells, DBN prediction was better than GLM as shown in Figure 5.11e (r = -0.26, P < 0.001, n = 240, t-test). 5.4 Position-specific Networks in the Medial Prefrontal Cortex We used DBN to infer causal relationships between simultaneously recorded neurons in the medial prefrontal cortex (mPFC) of an awake, behaving rat, during a working memory task [13, 29]. This brain area is known to integrate multiple inputs during decision making to guide motor behavior. This experiment was carried out in Gyorgy Buzsaki’s laboratory at Rutgers University 122 in which rats were trained to decide, based on an odor cue, whether to traverse the left arm or the right arm of a T-maze to be rewarded. A session consisted of 24 right-turn trials and 18 left-turn trials. Using crosscorrelograms features, recorded neurons were found to fire selectively in different regions of the maze and exhibit some form of pair-wise connectivity. These mPFC neurons were therefore argued to encode behavioral goal representation that is mediated by integrating a myriad of sensory inputs. In addition, inferring monosynaptic connectivity between some of the recorded neurons was hypothesized to form local dynamic circuits indicative of short-term plasticity [13]. Spike trains from 39 simultaneously recorded neurons in layer 2/3 of the mPFC were examined here for the existence of consistent networks using the graphical models discussed above. Spike trains were first downsampled to 2000 Hz (0.5 ms bins). This permitted detecting connectivity with synaptic latency of 0.5 ms or more. The length of each maze arm was normalized between 0 and 1 and divided into 5 intervals (0 0.2, 0.2 0.4, 0.4 0.6, 0.6 0.8 and 0.8 1). The spike train data corresponding to each interval were concatenated together to nd form 5 position-specific datasets for each side (2 Trial 1 Trial 2 row in Figure 5.12). Each of the positionTrial 3 Trial 24 ... 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 T1 T2 . . . T24 0 0.2 T1 T2 . . . T24 0.2 0.4 D1 D2 D3 D4 D5 D6 D7 D8 D1 D2 D3 D4 D5 D6 D7 D8 0 0.2 0.4 0.6 0.8 1 T1 T2 . . . T24 T1 T2 . . . T24 T1 T2 . . . T24 0.4 0.6 0.8 1 0.6 0.8 D1 D2 D3 D4 D5 D6 D7 D8 D1 D2 D3 D4 D5 D6 D7 D8 D1 D2 D3 D4 D5 D6 D7 D8 Figure 5.12: Partitioning the spike trains to obtain position-specific datasets for 24 right-turn trials (similar analysis for 18 left-turn trials). 123 specific datasets was subsequently divided into 8 smaller datasets, where each dataset was rd formed by randomly selecting 6 sub-trials and grouping them together (3 row in Figure 5.12). For each position-specific dataset, connections with synaptic latency in the range [0.5, 3] ms with a step of 0.5 ms were identified. Figure 5.13 illustrates the networks obtained for each maze position interval for the left and right datasets. Only connections inferred in at least 3 out of the 8 datasets examined are shown. We then examined the similarity between the inferred networks for different maze positions in Figure 5.14a. Figure 5.14b illustrates that the similarity decreases with increasing separation between maze positions. This suggests a graded transition between network states as the rat navigates through the maze. In addition, we found that networks corresponding to left arm datasets have more connections (17.8±5.3) than those of right arm datasets (12.6±7). This may be attributed to the difference in firing rate of each neuron for left and right trials, where 62% of the analyzed neurons were more active during left trials than right trials. 5.5 Discussion and Conclusion Neural coding theories in the sensory neocortex of many species posit that fast integration of sensory information is crucial to the organism’s ability to guide motor actions. In the rat somatosensory cortex, previous studies have consistently demonstrated that putative pyramidal neurons in layer IV receiving input from trigeminal nuclei through VPM thalamus showed the prevalence of temporal coding over rate and correlation coding [48, 205, 225-227]. Whether temporal coding at the population level provides sufficient information to subserve the rapid sensorimotor integration mechanisms needed to perform active whisking remained poorly understood. 124 192 186 25 130 49 246 271 135 30 22 32 295 55 207 246 139 182 30 133 271 135 30 247 131 245 244 133 Position = 0 -> 0.2 Right 201 Position = 0 -> 0.2 Left 190 143 32 72 201 22 207 295 55 73 40 Position = 0.2 -> 0.4 Left 45 295 246 22 30 55 139 32 131 19 130 135 197 49 40 186 143 271 207 271 201 192 52 73 190 247 71 138 72 192 Position = 0.2 -> 0.4 Right 135 245 54 Position = 0.4 -> 0.6 Right 143 244 295 22 135 190 186 246 45 139 190 40 135 49 216 249 72 130 25 192 143 247 244 52 133 295 30 245 186 246 182 131 186 71 201 246 201 Position = 0.4 -> 0.6 Left 131 197 295 190 72 52 139 182 247 271 241 40 201 130 30 131 Figure 5.13: Networks inferred from left and right trials data for different maze position intervals. Only connections appeared in at least 3 out of the 8 datasets of each position interval are shown. Clear dissimilarity between network models of left and right trials can be seen. 125 Position Interval Network Similarity 0.19 0.18 0.17 0.16 0.15 0-0.2 0.2-0.4 0.4-0.6 0.6-0.8 0.8-1 Average F-Measure Position = 0.6 -> 0.8 Right Position = 0.6 -> 0.8 Left 245 295 182 143 182 197 244 19 246 192 49 22 52 247 25 49 130 150 30 72 135 25 190 30 135 216 186 73 131 186 249 40 139 72 245 130 32 19 246 216 150 295 201 201 133 207 Position = 0.8 -> 1 Right Position = 0.8 -> 1 Left 138 52 197 25 130 30 138 246 45 40 186 131 245 72 19 241 283 247 133 143 271 192 216 190 135 54 249 216 55 22 73 71 201 135 201 Figure 5.13 (cont’d) 283 54 271 138 241 0.2 0.18 0.16 0.2 0.4 0.6 0.8 0-0.2 0.2-0.40.4-0.60.6-0.8 0.8-1 Position Intervals Distance Position Interval (b) (a) Figure 5.14: (a) Similarity between the inferred networks at different maze positions computed using the F-Measure (equation 4.9) averaged across both sides. On-diagonal entries are eliminated. (b) Average similarity between the inferred networks vs. distance between maze position intervals. Values were extracted from the matrix shown in (a). It can be seen that networks become increasingly dissimilar as a function of the distance between position intervals, suggesting a differential population encoding mechanism that depends on the behavioral goal. 126 Here, we examined the dynamics of spike timing correlation between local, simultaneously observed neurons in layer V in response to unilateral whisker stimulation in the anesthetized rat. We showed that rapid network dynamics between these neurons, as determined by the stable, whisker-specific dynamic Bayesian networks, provided evidence of a synergistic code that mediates information flow. Furthermore, we were able to demonstrate that the effective connectivity revealed by the structure of these networks provided more information about the stimulus than what is provided by both temporal and rate codes of each neuron analyzed individually. In particular, we showed that the decoding performance of these networks was ~18% higher compared to that of the first-spike latency, and that the inferred connections were important in predicting individual neurons’ firing patterns. This agrees with previous studies that addressed similar questions in other brain areas [53, 80, 83], albeit at a much finer temporal and spatial resolution. It is widely accepted that complex, possibly nonlinear, response characteristics occurring at the population level such as those described here are more pronounced in higher cortical areas such as the prefrontal cortex, or subcortical areas such as the hippocampus and the thalamus [13, 228, 229]. These characteristics have been also hypothesized to underlie the fast oscillatory patterns observed throughout many cortical layers [230, 231]. These patterns are known to originate in the cortex and do not require rhythmic drives from the thalamus, as demonstrated by direct electrical stimulation studies [116, 212]. Here we demonstrate that the origin of these dynamics may be rooted in the precise temporal coordination among layer V neurons possibly to subserve the integration of multiple information processing streams needed to mediate sensorimotor transformations by areas innervated by layer V, and in particular the motor cortex. Whether the connectivity inferred in this study represents actual anatomical connectivity 127 between the cells was not possible to achieve with our recording or analysis methods. Nevertheless, there is substantial evidence in the literature indicating that local anatomical connectivity in sensory cortices is predominantly present [232, 233], and our results provide strong support of these findings. A plausible interpretation of our findings would be that since layer V is a major output of the barrel cortex to other brain areas such as the thalamus, the motor cortex and the pontine nuclei [207], information has to be rapidly integrated across cortical columns to regulate the whisking behavior needed to discriminate between different objects that come in contact with the whiskers [211]. Thus, these high-level functions might require a gain-modulation mechanism that is dynamically shaped by a variable number of participating neurons coordinating their information-bearing signals to represent stimulus attributes that cannot be provided by single units responding individually to their principal whiskers. Our results therefore suggest a strong account for a synergistic coding mechanism in layer V that may reflect stimulus-dependent states of the observed population. While these findings agree with previous reports of multi-whisker integration in layer V of the barrel cortex at the single cell level [205, 207], they provide the first evidence that this integration occurs at the network level within millisecond timescales. It is noteworthy that the recurrent nature of cortical circuits, particularly those present in layer V, makes it especially difficult for pair-wise connectivity measures such as crosscorrelograms to differentiate between direct and indirect coupling, such as in a neuronal chain, or when a common input is present. The DBN approach overcomes these limitations, as it integrates immediate evidence with prior information (long-term knowledge) and uses this process to explain away unlikely causes of firing. 128 It is important to bring our findings in perspective with two related hypotheses, namely cellassemblies and synfire chains [179, 234]. Integration across neighboring columns as demonstrated by our analysis seems to support the cell assembly hypothesis given the active, dynamic participation of the observed neurons in whisker-specific networks. This implies that each post-synaptic neuron in the assembly “reads” patterns of firing of its pre-synaptic peers within temporal integration windows with possibly variable lengths. This is particularly interesting, in part because cell-assemblies that reflect different degrees of synchrony between their elements have been hypothesized to correlate with presumed ‘top-down’ processing of sensory information [228, 235]. Herein, our findings suggest that similar mechanisms occur during bottom-up processing where sensory information is propagated to upstream cortical networks, and that coordination among the cells at millisecond timescale is a more wide spread phenomenon than previously thought. Finally, consistent with these findings in the somatosensory cortex, data from the medial prefrontal cortex (mPFC) showed a similar trend in which networks inferred during a working memory task were position-specific. We found that the similarity between the inferred networks is inversely proportional to the distance between the corresponding positions, suggesting a graded transition between network states as the rat navigates through the maze. These findings in both somatosensory cortex and mPFC suggest that localized network codes may represent a universal encoding mechanism for mediating information flow across many neocortical structures. 129 Chapter ____________________________________________ 6 Network Dynamics Associated with Experience-dependent Plasticity in the Rat Somatosensory Cortex ______________________________________________________________________________ 6.1 Experience-dependent Plasticity in the Barrel Cortex Brain plasticity, or plasticity in short, refers to the continuous change in the neurons’ intrinsic properties or the connections among them that occurs due to a variety of reasons such as experience, aging, brain diseases or injury [31, 236]. The problem of identifying plasticity in a recorded neural population has long been the subject of intense research given its importance in understanding how the brain develops and adapts [66, 86, 237, 238]. Experience-dependent plasticity might be the most vital to brain development because it includes memory formation, accompanies learning and sensory alteration. The latter results in reorganization of the brain area that represents the altered sensory inputs [239, 240]. For example, sensory alteration has been studied in humans and monkeys by amputating hand fingers or blocking nerves that innervate them resulting in reorganization of connectivity in the somatosensory cortex [241, 242]. Similar phenomena have been observed in the somatosensory cortex of adult rats as a result of whisker trimming [198]. For example, sparing a single whisker results in an expansion of the barrel corresponding to that whisker and shrinkage of the barrels corresponding to the deprived whiskers [243]. In this chapter, we focus on experience-dependent plasticity induced by trimming all whiskers but two, termed whisker pairing. Consistent with previous studies, we 130 hypothesize that whisker pairing will result in merging of the two barrels corresponding to the spared whiskers [198]. This merging form of plasticity is consistent with Hebb’s hypothesis which postulates that “neurons that fire together, wire together” [179]. This can be explained by virtue of the fact that in the case of whisker pairing, rats are likely to rely on the two spared whiskers during active whisking episodes to palpate new objects and explore the surroundings. Subsequently, neurons of the two corresponding barrels would likely fire nearly simultaneously than if all whiskers were intact. There are multiple physiological mechanisms behind this type of plasticity. For example, the blockade of NMDA receptors or the lack of α-CaMKII (calcium/calmodulin-dependent protein kinase II, type α) impairs plasticity in the somatosensory cortex [198]. In addition, inhibitory connections are also altered where the number and density of GABA synapses in layer IV was observed to decrease significantly as a result of whisker deprivation. Other studies have also demonstrated a significant increase in excitatory axonal projections from the barrels corresponding to the deprived whiskers to the barrels corresponding to the spared whiskers, whereas a significant increase in the inhibitory axonal projections occurs in the opposite direction [244]. At the individual neuron level, neuronal firing rates in layer IV in one principal sparedwhisker (W1) barrel become more similar to those in the other spared-whisker (W2) barrel in response to deflection of W2 [245]. Plasticity in layer IV starts 3 days post whisker trimming and reaches its peak 7-10 days post whisker trimming [246], while in layers II/III and layer V it starts as early as 1 day post whisker trimming [247]. The elevated response of the neurons of one whisker to the other spared whisker is hypothesized to result from the emergence of new connections between the two corresponding barrels or the strengthening of already existing connections [198]. Changes in the effective connectivity, measured using crosscorrelograms, 131 across the barrels corresponding to the spared whiskers have been reported in layer IV indicating an increase in the strength of these connections [213]. In this chapter, we examine the applicability of Dynamic Bayesian Networks (DBNs) in tracking plasticity induced by whisker pairing at the network level. We hypothesized that stimulus-specific networks inferred for the spared whiskers will exhibit more similarity after pairing compared to networks inferred before pairing in the same animal. We provide evidences that plasticity induced by whisker pairing occurs in layer V and that DBN can track changes in the stimulus-specific networks. 6.2 Methods 6.2.1 Barrel Cortex Recording and Electrode Implantation Four adult female Sprague Dawley rats weighing ~300 g were used in this study. All procedures involving animals were approved by the Michigan State University Institutional Animal Care and Use Committee (IACUC). Animals were anesthetized using a cocktail of ketamine and xylazine (75 and 5 mg/kg injected intrapertoneally, respectively). The left somatosensory cortex was exposed (4 × 4 mm craniotomy, 0-4 mm posterior and 4-8 mm lateral to bregma). A 32-channel microelectrode silicon array (NeuroNexus Technologies, Ann Arbor, MI, USA) with 8 shanks, 4 recording sites/shank, 200 μm shank separation and 100 μm electrode separation within shank was advanced into the barrel field in 100 μm steps. Acquired signals were amplified and band-pass filtered in the range 300-5000 Hz and sampled at 25 KHz. Stimulus-driven activity was observed at depths of 1100-1500 μm corresponding to layer V of 132 the barrel cortex. After reaching the desired depth in layer V, the electrode array was secured in place using dental cement. Rats were left to recover for 7-10 days post surgery after which they were anesthetized and stimulus-driven S1 activity was recorded (control data). For each rat, 3 whiskers contralateral to the side of the implant were selected for mechanical stimulation based on the observed neuronal response to manual deflection (mapping experiment). The selected whiskers were deflected one at a time by inserting each whisker into a capillary tube glued to a piezoelectric bimorph (Piezo Systems, Cambridge, MA, USA). Each whisker was horizontally deflected 900 times with a displacement of 80 μm for 100 ms (rise time and fall time were each set to 1 ms) at 1 Hz frequency. Two whiskers were then selected to be paired and all other whiskers on the same side of the rat’s mystacial pad were trimmed to the skin level. Rats were returned back to their cages where enrichments were introduced to encourage them to actively whisk new objects using the spared whiskers. After 1-2 days and 6-7 days post-whisker trimming, rats were anesthetized and stimulus-driven activity was recorded (plasticity data) in response to spared whiskers deflection. At the end of the 1-2 days recording session, re-grown whiskers (not previously spared) were trimmed again to restrict the pairing to the same pair of whiskers. For both control and plasticity data, spikes from multiple single unit activity were detected and sorted using NeuroQuest; a MATLAB toolbox for neural data processing and analysis [214, 215]. Spikes were declared present if the raw waveform surpassed a threshold set at 3 times the noise standard deviation. A spike length of 1 ms was used for spike sorting (0.25 ms pre threshold crossing and 0.75 ms post threshold crossing). Spikes were aligned at their trough. Principal Component Analysis (PCA) was applied to the detected spikes, and the first 2 principal components were used as features for spike sorting. An average population size of 23.2±6.7 133 single units/rat was recorded. Spike trains were binned at Δ=1 ms. Quality of the spike sorting was assessed using inter-spike interval histogram (ISI) to ensure that no spikes with inter-spike intervals of less than 2 ms were classified as belonging to the same unit. 6.2.2 Data Analysis For each neuron, the total number of evoked spikes in a given trial for each stimulated whisker was computed as the difference between the total number of spikes fired by the neuron within the 100 ms-trial and the total number of spikes fired by the neuron in the 100 ms window preceding the trial. The mean first-spike latency of each neuron for a given whisker was computed using the same expression given in equation (5.1). To infer the stimulus-specific networks for both control and plasticity data, a total of 100 datasets, 18 sec each, for each whisker were extracted in a similar manner to the datasets used in Chapter 5 from the recorded 900 trials/whisker. Each dataset was formed by concatenating 180 trials that were randomly chosen with a uniform distribution from the 900 trials. The spike trains of each dataset were analyzed using DBN with Markov lags in the range [1, 5] bins ([1, 5] ms). In case a connection was inferred at more than one Markov lag, only the largest lag was considered. The maximum number of pre-synaptic cells for each cell was set to 10. Similar to the analysis presented in Chapter 5, the similarity between the inferred networks was quantified as follows: we first represented each inferred network as an n × n binary adjacency matrix A, where n is the total number of neurons in the network. Each element A(i, j) takes the value ‘1’ if there is a connection from neuron i to neuron j and ‘0’ if there is no connection between the corresponding neurons. For a given population of n neurons, K deflected whiskers and M datasets per whisker, all the adjacency matrices of the inferred networks were 134 2 vectorized and stacked together into one KM × n matrix. Principal component analysis (PCA) was then applied to this matrix to extract significant features from the inferred networks by 2 projecting the adjacency matrices into a p-dimension network space, where p ≤ n , that accounts for most of the variance in the networks [218]. The distance D(Al, Am) between a pair of adjacency matrices Al and Am was defined as D( Al , Am ) = ql − q m (6.1) where ql and qm are the projections of Al and Am in the p-dimension network space, respectively, and ||.|| is the Euclidean distance (lp-norm) between the two projections. The number of principal components used p was set to 2. The average distance between the networks inferred for a given pair of whiskers w1 and w2, D (w1 , w2 ) , was defined as D (w1 , w2 ) = 2 M w w D (Al 1 , Am 2 ) 2 ∑∑ (6.2) l m 6.3 Results 6.3.1 Changes in Individual Neuron Response We recorded a total of 325 single units from layer V of the barrel cortex in four chronically implanted rats. Microelectrode arrays with 32 channels were used in each rat to record neural responses to unilateral stimulation of individual whiskers, one at a time. One week postimplantation, stimulus-driven activity was recorded (control data) over a number of days after which all but two whiskers on one side of the rat’s face were trimmed to the level of the skin to induce experience-dependent plasticity. Stimulus-driven activity was then recorded 1-2 days and 135 6-7 days post whisker trimming (plasticity data) as illustrated by the timeline in Figure 6.1a. Each spared whisker was stimulated 900 times at a frequency of 1 Hz. To assess the changes in the response of individual neurons as a result of whisker pairing, we computed the number of evoked spikes by each neuron as well as the first-spike latency in response to the deflection of each of the spared whiskers. Whisker pairing was expected to result in an increased similarity in the response of the neurons to the deflection of each of the spared Record Record Trim all but two Trim re-grown Keep two Start Pairing Control Diff. in 1 Sp. Lat. * 0.4 st Diff. in Response * 0.6 0.2 0 Control Maintain Pairing 6-7 Days 1-2 Days (a) 0.8 Record 1-2 6-7 Days Post-Trimming 0.4 * 0.3 * 0.2 0.1 0 Control 1-2 6-7 Days Post-Trimming (c) (b) Figure 6.1: (a) Timeline of the experiment. (b) Difference in the evoked response and (c) in the first-spike latency across the spared whiskers averaged across neurons and subjects for the control data (blue) and the plasticity data (red) recorded 1-2 days and 6-7 days post whiskertrimming. Y-axis was normalized by the maximum evoked response in (b) and the maximum first-spike latency in (c) across the spared whiskers for each neuron. *P < 0.05, two-sample t-test. 136 whiskers [213, 245-247]. We quantified the similarity in the response by the difference in the evoked number of spikes and in the first-spike latency across the spared whiskers. There was no significant change in this difference 1-2 days post-trimming, however, this difference decreased significantly 6-7 days post-trimming as illustrated in Figure 6.1b and Figure 6.1c (Normalized difference in evoked spikes for control data: 0.41±0.23, 6-7 days post-trimming: 0.29±0.21; Normalized difference in first-spike latency for control data: 0.18±0.12, 6-7 days post-trimming: 0.15±0.11, P < 0.05, two-sample t-test). This indicates an increased similarity in the response of individual neurons to the spared whiskers 6-7 days post-trimming as a result of pairing. 6.3.2 Changes in Network Response For each stimulated whisker in both control and plasticity data, one hundred 18-sec long spike train datasets were formed from the 900 trials of each whisker by randomly choosing 180 trials out of the 900 trials following a uniform distribution. DBN fit was then obtained for each of these datasets. Similar to the analysis in Chapter 5, principal component analysis (PCA) was used to construct a network feature space where each point in that space corresponds to one network. Therefore, we quantified the similarity between a pair of networks as (1 – the normalized distance between their corresponding projections in the PCA network space). The normalization constant was computed as the maximum distance between the projections of the spared whiskers’ networks obtained from both control and plasticity data of all rats. Distance normalization ensured that the maximum possible pair-wise distance measure in the network feature space does not exceed ‘1’. Figure 6.2a illustrates the network space of one sample rat for the control data and after 7 days of pairing of whiskers D4 and D5. The figure clearly shows the increased similarity 137 between the networks of these two whiskers compared to the control data illustrated by the merging of the two clusters of whiskers D4 and D5. Figure 6.2b summarizes the average similarity across all rats between the networks of the spared whiskers before and after pairing. A significant increase in the similarity was observed that is proportional to the number of days post-trimming (Network similarity for control data: 0.58±0.1, 1-2 days post-trimming: 0.66±0.1, Day 7 Control D4 D5 D6 0 D4 D5 D6 2 PC2 PC2 2 -2 0 -2 -2 0 PC1 2 -2 Paired Whiskers Net. Sim. (a) 1 * 0 PC1 2 * 0.5 0 Control 1-2 6-7 Days Post-trimming (b) Figure 6.2: (a) Control (left) and 7-days post whisker-pairing (right) network feature space of a sample rat for 3 different whiskers (D4, D5 and D6). Whiskers D4 and D5 were paired after recording the control data. Each dot corresponds to the projection of one network onto a 2dimension principal components (PC1 and PC2) feature space. (b) Network similarity across spared whiskers for the control data (blue) and data recorded 1-2 days and 6-7 days post-pairing (red) averaged across 4 subjects (mean ± SD). Similarity for a given pair of networks was quantified as 1 – the normalized distance between the projections of the pair in the principal component space. *P < 0.05, two-sample t-test. 138 6-7 days post-trimming: 0.71±0.1, P < 0.05, two-sample t-test). This finding suggests that the change in the functional/effective connectivity between neurons as a result of whisker pairing could be tracked using DBN and is consistent with the longstanding Hebbian plasticity hypothesis. 6.4 Discussion and Conclusion Identifying plasticity in cortical neural ensembles in vivo is important in studying systems neuroscience to permit tracking the dynamics of biological cortical networks during learning and behavior. In this chapter, we provide evidence that graphical models such as DBNs can track experience-dependent plasticity in layer V of the rat barrel cortex after sensory deprivation. Our results indicate a gradual increase in the similarity among stimulus-specific networks inferred for the spared whiskers that peaks after 6-7 days of pairing. Although no significant change was observed after 1-2 days of pairing at the individual neuron level, we were able to detect increased similarity at the network level after the same time period. This suggests the effectiveness of such algorithms in tracking plasticity that might not be feasible to detect using simple measures such as changes in firing rates or response latency. Our experimental paradigm provides a more accurate quantification of the effects induced by plasticity compared to previous studies for two reasons: First, changes due to plasticity in this study were identified in the same animal by using a chronic implant to record neuronal activity whereas in previous studies, comparisons were carried out between two different groups of animals using acute recordings: a plasticity group and a control group [213, 245-247]. Identifying plasticity effects in the same animal could provide results that are more objective than comparing effects across animals. The downside with our experimental paradigm might be 139 the issues associated with chronic implants such as losing the signals after some time and the need to anesthetize the animals before every recording session. This could affect the overall condition of the animals. Second, this study is the first study, to our knowledge, that attempts to quantify whisker pairing-induced plasticity at the ensemble level in the barrel cortex in vivo. With the exception of only one study that quantified changes in the crosscorrelograms of pairs of neurons in the barrels of the spared whiskers [213], other studies only hypothesized that the changes in neuronal responses due to whisker pairing result from changes in the neuronal networks in the corresponding barrels [198, 243]. These changes were hypothesized to be in the form of either emergence of new connections across the two barrels or strengthening already existing connections [198, 243]. The study presented in this chapter provides evidence for changes that occur at the ensemble level using a more powerful tool than pair-wise measures to support such hypothesis. Although this study provides evidence for the possibility of tracking plasticity at the population level in vivo, several modifications to the experimental paradigm should be considered to improve this study: First, more control data needs to be recorded across a number of days before whisker trimming to establish a baseline of day-to-day changes in neural responses that should not be attributed to experience-dependent plasticity. The control data used in this study did not follow the same timeline used for recording the plasticity data, where for 2 rats, 1 control dataset was recorded, while for each of the other 2 rats, 3 control datasets were recorded over a period of 1 week. Second, more data needs to be recorded during the plasticity period within the 1 week pairing period as well as after that to determine whether the similarity observed scales linearly, sublinearly or supralinearly with the plasticity period. Third, the reversibility of the process needs to be examined, i.e., if all whiskers were allowed to re-grow, 140 the mapping should be repeated to determine if ensemble representation of whisker deflection would become gradually dissimilar across barrels. Repeated anesthesia might prevent implementing these suggestions. Recording from awake, head-fixed animals might overcome such challenge, and there is evidence that plasticity at the individual neuron level could be detected within only few hours after whisker pairing in these rats [248]. Another challenge, however, would manifest itself in the need to ensure rats are trained not to move their whiskers voluntarily during the experimental trial. 141 Chapter ____________________________________________ 7 Concluding Remarks and Future Directions ______________________________________________________________________________ Devising methods to identify neuronal connectivity from simultaneously observed neurons is crucial to understand how neurons collectively and dynamically respond to external stimuli or give rise to an observed motor behavior. We have reviewed and proposed a number of methods that could be used to infer neuronal connectivity and discussed their utility as well as their limitations in analyzing population activity. In particular, we have emphasized the use of graphical models to circumvent many of these limitations. The multiscale clustering algorithm for inferring functional connectivity attempts to find disjoint subspaces with arbitrary structures in the entire neural space using unlabeled data. Dynamic Bayesian Networks (DBNs), on the other hand, finds the intrinsic structure of the population underlying each subspace by transforming the inference problem to a structure learning problem for which many efficient search algorithms exist. This two-step approach provides a framework to decipher the complex connectivity patterns that may exist in large neural populations. From a system identification standpoint, the multiscale clustering step helps alleviate the uncertainty about the temporal dynamics of the system that are governed by a time-scaledependent interaction between the elements. It therefore takes into consideration the effect of pre-synaptic cells on the post-synaptic cell firing probability without relying on the hard-tomeasure synaptic latencies. The latter would require measuring post-synaptic potentials in the 142 post-synaptic cell in response to controlled input to a physiologically tagged pre-synaptic cell (for example, using a current or voltage clamp technique) – a technique that is simply not feasible to perform in large populations with current state of the art technology. The DBN, on the other hand, helps alleviate the uncertainty about the spatial dynamics of the system that are governed by the specific labels of the elements. It does so by integrating the immediate evidence (which cell fired) with the long-term knowledge (prior probability) and uses it to explain away unlikely causes of the observed effects. In doing so, it searches for the best graphical model that fits the data. Taken together, both steps would be essential to capture uncertainties about dynamic interactions between neurons in large complex networks. Despite the superior performance of the methods introduced in this dissertation compared to other classical methods, more work is needed to devise algorithms for estimating some parameters during the analysis such as the best time scale and the best wavelet for the multiscale clustering algorithm, and the Markov lag for the DBN. We have provided a number of metrics the Dunn Index, maximum mutual information and the Mean Influence Score – to devise some suggested values to use for these parameters. These metrics, however, require running the algorithms using a range of values for each parameter and then choosing the value that maximizes them. This is a computationally intense task. Future work should provide more systematic ways of estimating these parameters from the data, for example using maximum likelihood, prior to running the algorithms. In addition, larger network structures, compared to the small size of the networks we demonstrated in Chapters 3 and 4, could be used to simulate the spiking data of populations mimicking real anatomic neuronal networks in which each neuron receives thousands of connections from other unobserved neurons. 143 Using data recorded from the rat barrel cortex, we showed evidence for the validity of both multiscale clustering and DBN in Chapters 3 and 5. In particular, we showed that the inferred connectivity was consistent with known connection probability in local cortical areas and that this connectivity was sufficient to build a model that accurately predicted the firing of individual neurons. In addition, we showed that the inferred connectivity using both methods provided a substantial amount of information about the stimulus that was higher than the information carried by individual neurons’ response properties. Although the results reported are encouraging to claim the validity of both methods, more direct validation is still needed in which the anatomical properties of the connections - rather than their functional properties - are used as ground-truth. For example, recording neuronal activity from layers II/III, IV and V of the rat barrel cortex and comparing the direction of the inferred connections across layers to the well-documented anatomical connections might provide a strong argument towards the validity of these methods. Although it is almost impossible in this case to confirm that the inferred connectivity directly corresponds to anatomical connectivity in the same animal because it is practically impossible to confirm the identity of the recorded cells post-mortem to permit anatomical tract tracing studies. Matching the direction of the inferred connectivity with what has been reported in the literature (From layer IV to layer II/III and from layer IV to layer V [198]), however, should be a good first step. Another way to validate the proposed methods using anatomical ground-truth is to compare properties of the networks inferred from animal models of brain disorders (such as stroke or Alzheimer’s disease) to control subjects. For example, Alzheimer’s disease is known to correlate with a significant loss in the total number of connections [249]. Therefore, we would expect that the networks inferred from Alzheimer’s subjects using the proposed methods would have a significant decrease in the total number of connections compared to control subjects. 144 Another extension to the barrel cortex study is to examine the separation between signal and noise correlations [81, 250]. Signal correlation refers to the correlation between the mean firing rates of a pair of neurons in response to stimulus presentation. Noise correlation, on the other hand, refers to the correlation observed in the trial-to-trial deviation of the firing rates from their mean values. Noise correlation has been often attributed to the possibility of anatomical connectivity among the neurons. This inherently assumes that the stimulus directly influences the firing of all of the observed neurons and that the contribution of the anatomical connectivity is purely noise [83]. The distinction between signal and noise correlation in the barrel cortex data could be examined using DBN by adding a variable (a node) in the graphical model to represent whisker deflection as illustrated by the simple example in Figure 7.1. In this case, connections that are inferred using DBN between the added node and the neurons will correspond to the signal influence (dotted edges) whereas connections among the neurons will correspond to noise (solid edges). Signal correlation for a given pair of neurons can be inferred from the graph if both neurons receive a connection from the whisker node whereas noise correlation can be inferred if they both receive a connection from another common neuron. In this model, a pair of neurons can be correlated in both signal and noise if they receive common connections from the whisker 1 3 W 5 2 4 Figure 7.1: A DBN model that distinguishes between signal and noise correlations. Gray node (W) takes a value ∈ {0, 1}, where 0 indicates no whisker deflection, while 1 indicates the deflection of any whisker. Dotted edges indicate signal influence while solid edges indicate noise influence. Signal correlation can be inferred from the graph by searching for neurons receiving connections from W (such as neurons 1 and 2) whereas noise correlation can be estimated by searching for neurons receiving connections from a common parent (such as neurons 3 and 4). 145 node as well as another common neuron. The model given in Figure 7.1 can be further extended to include multiple whisker nodes in the graph, one for each whisker, to investigate whiskerspecific signal correlations. This proposed model could help in distinguishing between connections inferred from the data as a result of stimulus-driven statistical dependence and those that represent other causal influence among the neurons. Spontaneous data could be utilized as well to examine connectivity that is not stimulus-driven. However, low firing rates of neurons associated with spontaneous activity compared to stimulus-driven activity might hinder the ability of the proposed methods to infer the connections using the same amount of data. The methods presented in this dissertation were designed specifically to analyze spike train data. An important factor to consider that may affect the performance is the accuracy of spike sorting [55, 251]. Although we did not assess the effect of spike sorting errors on the accuracy, we suspect that errors above certain level may affect the accuracy, for example, when resolving spike overlaps. A detailed simulation study is needed to address how sensitive these methods are to spike detection and sorting deficiencies. Connectivity inference methods used here were designed for offline analysis. Developing real-time implementation of these methods would be extremely useful for applications that require continuous knowledge of the underlying neural dynamics such as decoding motor cortex neural signals for Brain Machine Interfaces (BMI) or tracking rapid plasticity. For decoding applications, knowledge of the stimulus prior as well as the connectivity between observed neurons may improve the ability to decode spike trains in a Bayesian sense [101, 194, 252, 253]. One way to think about this is to consider factorizing the likelihood function - the joint neural response given the stimulus - as a product of marginal densities. Each density would express an independent neural response to the stimulus given its pre-synaptic inputs, which would be 146 inferred by the connectivity algorithm. In a simulation study, our results suggested that distinct and consistent network connectivity patterns may arise during goal-directed 2D arm reach movements [85]. When these connectivity patterns were used to estimate the parameters of a Bayesian decoder offline, the decoding performance was significantly improved compared to the case when information about network connectivity was absent [194]. Experimental results by other groups using the activity of retinal ganglion cells in vitro to decode visual scenes agree with the above findings, i.e., better scene reconstruction was obtained when information about connectivity was taken into account [83]. Tracking the underlying connectivity online could therefore improve the decoding performance. Given the extensive computational run-time of the methods presented here, a potential alternative is to use non-negative matrix factorization (NMF) to speed the analysis. NMF factorizes a matrix into the product of a non-negative bases matrix and a non-negative encoding matrix [254]. It has been used to relate the firing rates of simultaneously recorded single neurons to behavioral variables [255] and to infer the functional 2+ connectivity from firing rates extracted from Ca imaging data [256]. Inferring neuronal connectivity using NMF from extracellular recordings has not been fully explored yet. This could be achieved by applying NMF to the continuous-time firing rates of the neurons to eliminate the combinatorial optimization required by DBN to find the best discrete structure that fits the data. NMF, however, assumes linear relationships between the examined random variables. Lastly, quantifying neural plasticity can greatly benefit from the ability to systematically identify network connectivity. For example, learning and memory formation has long been hypothesized to accompany changes in network connectivity [145, 257-262], as well as recovery after spinal cord injury, traumatic brain injury and stroke [143, 259, 263, 264]. We have shown 147 that the methods presented here can track changes in the shape and number of neuronal clusters in simulated data [27]. We have also shown how they can track network dynamics associated with experience-dependent plasticity after sensory deprivation. An ultimate goal would be to validate these methods with anatomical and/or imaging data. 148 REFERENCES 149 REFERENCES [1] R. W. Williams and K. Herrup, "The control of neuron number," Annu. Rev. Neurosci., vol. 11, pp. 423–453, 1988. [2] Y. Tang, J. R. Nyengaard, D. M. D. Groot, and H. J. Gundersen, "Total regional and global number of synapses in the human brain neocortex," Synapse, vol. 41, pp. 258–273, 2001. [3] S. L. Bressler, "Large-scale cortical networks and cognition," Brain Res., vol. 20, pp. 288–304, 1995. [4] F. Varela, J. P. Lachaux, E. Rodriguez, and J. Martinerie., "The brainweb: phase synchronization and large-scale integration," Nat. Rev. Neurosci., vol. 2, pp. 229–239, 2001. [5] M. Catani and D. H. fftyche, "The rises and falls of disconnection syndromes," Brain, vol. 128, pp. 2224–2239, 2005. [6] K. Supekar, V. Menon, D. Rubin, M. Musen, and M. D. Greicius, "Network analysis of intrinsic functional brain connectivity in Alzheimer's disease," PLoS Comput. Biol., vol. 4, p. e1000100, 2008. [7] Y. Liu, M. Liang, Y. Zhou, Y. He, Y. Hao, M. Song, C. Yu, H. Liu, Z. Liu, and T. Jiang, "Disrupted small-world networks in schizophrenia," Brain, vol. 131, pp. 945–961, 2008. [8] M. v. Gerven, J. Farquhar, R. Schaefer, R. Vlek, J. Geuze, A. Nijholt, N. Ramsey, P. Haselager, L. Vuurpijl, S. Gielen, and P. Desain, "The brain–computer interface cycle," Journal of Neural Engineering, vol. 6, p. 041001, 2009. [9] R. C. Gesteland, B. Howland, J. Y. Lettvin, and W. H. Pitts, "Comments on microelectrodes," Proc. IRE, vol. 47, pp. 1856–1862, 1959. [10] E. E. Fetz, "Operant conditioning of cortical unit activity," Science, vol. 163, pp. 955– 958, 1969. [11] R. Normann, E. M. Maynard, P. J. Rousche, and D. J. Warren, "A neural interface for a cortical vision prosthesis," Vision Research, vol. 39, pp. 2577–2587, 1999. 150 [12] K. Wise, D. Anderson, J. Hetke, D. Kipke, and K. Najafi, "Wireless Implantable Microsystems: High-Density Electronic Interfaces to the Nervous System," Proc. of the IEEE, vol. 92-1, pp. 76–97, 2004. [13] S. Fujisawa, A. Amarasingham, M. Harrison, and G. Buzsáki, "Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex," Nat. Neurosci., vol. 11, pp. 823–833, 2008. [14] A. B. Schwartz, "Cortical neural prostheses," Ann. Rev. Neurosci., vol. 27, pp. 487-507, 2004. [15] A. Jackson, J. Mavoori, and E. E. Fetz, "Long-term motor cortex plasticity induced by an electronic neural implant," Nature, vol. 444, pp. 56–60, 2006. [16] K. J. Friston, "Functional and effective connectivity in neuroimaging: A synthesis," Hum Brain Mapping, vol. 2, pp. 56-78, 1994. [17] E. T. Bullmore and O. Sporns, "Complex brain networks: graph-theoretical analysis of structural and functional systems," Nature Rev Neurosci., vol. 10, pp. 186–198, 2009. [18] S. Dodel, J. M. Hermann, and T. Geisel, "Functional connectivity by cross-correlation clustering," Neurocomputing, pp. 1065-1070, 2002. [19] H. Tamura, H. Kaneko, and I. Fujita, "Quantitative analysis of functional clustering of neurons in the macaque inferior temporal cortex," Neurosci Res, vol. 52, pp. 311–322, 2005. [20] A. M. Aertsen, G. L. Gerstein, M. K. Habib, and G. Palm, "Dynamics of neuronal firing correlation: modulation of "effective connectivity"," Journal of Neurophysiology, vol. 61, pp. 900-917, 1989. [21] G. L. Gerstein and D. H. Perkel, "Simultaneously recorded trains of action potentials: analysis and functional interpretation," Science vol. 164, pp. 828-830, 1969. [22] E. N. Brown, R. Kass, and P. Mitra, "Multiple neural spike train data analysis: State-ofthe-Art and future challenges," Nature Neurosc., vol. 7 pp. 456-461, 2004. 151 [23] G. L. Gerstein, "Cross correlation measures of unresolved multi-neuron recordings," J. Neurosci Methods, vol. 100, pp. 41-51, 2000. [24] G. Buzsaki, "Large-scale recording of neuronal ensembles," Nat Neurosci, vol. 7, pp. 446-451, 2004. [25] O. Sporns, "Graph theory methods for the analysis of neural connectivity patterns," in Neuroscience Databases. A Practical Guide, R. Kotter, Ed. Boston: Kluwer, 2002, pp. 169–83. [26] B. Denise, W. David, N. Richard, A. Amos, and G. Sonja, "Spatially organized spike correlation in cat visual cortex," Neurocomputing, vol. 70, pp. 2112--2116, 2007. [27] S. Eldawlatly, R. Jin, and K. G. Oweiss, "Identifying Functional Connectivity in LargeScale Neural Ensemble Recordings: A Multiscale Data Mining Approach," Neural Computation, vol. 21, pp. pp. 450-477, 2009. [28] S. Eldawlatly, Y. Zhou, R. Jin, and K. G. Oweiss, "On The Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles," Neural Computation, vol. 22, pp. 158-189, 2010. [29] S. Eldawlatly and K. Oweiss, "Graphical Models of Functional and Effective Neuronal Connectivity," in Statistical Signal Processing for Neuroscience and Neurotechnology, K. Oweiss, Ed.: Academic Press, Elsevier, 2010, pp. 129-174. [30] S. Eldawlatly and K. Oweiss, "Millisecond-Timescale Local Network Coding in the Rat Primary Somatosensory Cortex," PLoS ONE, in press. [31] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, Principles of Neuroscience, fourth ed. New York: McGraw-Hill Health Professions Division, 2000. [32] H. Lodish, D. Baltimore, A. Berk, S.L. Zipursky, P. Matsudaira, and J. Darnell, Molecular Cell Biology, 4th ed. New York: W.H. Freeman, 1999. [33] M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon, "Functional connectivity in the resting brain: a network analysis of the default mode hypothesis," PNAS vol. 100, pp. 253–258, 2003. 152 [34] R. Winder, C. R. Cortes, J. A. Reggia, and M.-A. Tagamets, "Functional connectivity in fMRI: A modeling approach for estimation and for relating to local circuits," Neuroimage, vol. 34, pp. 1093–1107, 2007. [35] H. Koshino, P. A. Carpenter, N. J. Minshew, V. L. Cherkassky, T. A. Keller, and M. A. Just, "Functional connectivity in an fMRI working memory task in high-functioning autism," Neuroimage, vol. 24, pp. 810–821, 2005. [36] U. Hasson, E. Yang, I. Vallines, D. J. Heeger, and N. Rubin, "A Hierarchy of Temporal Receptive Windows in Human Cortex," J. Neurosci., vol. 28, pp. 2539 - 2550, 2008. [37] D. Y. Tsao, N. Schweers, S. Moeller, and W. A. Freiwald, "Patches of face-selective cortex in the macaque frontal lobe," Nature Neuroscience, vol. 11, pp. 877-879, 2008. [38] A. H. Bell, F. Hadj-Bouziane, J. B. Frihauf, R. B. H. Tootell, and L. G. Ungerleider, "Object representations in the temporal cortex of monkeys and humans as revealed by functional magnetic resonance imaging," Journal of Neurophysiology, vol. 101, p. 688, 2009. [39] L. Verhagen, H. C. Dijkerman, M. J. Grol, and I. Toni, "Perceptuo-Motor Interactions during Prehension Movements," J. Neurosci., vol. 28, pp. 4726 - 4735, 2008. [40] M. Lotze, P. Montoya, M. Erb, E. Hulsmann, H. Flor, U. Klose, N. Birbaumer, and W. Grodd, "Activation of cortical and cerebellar motor areas during executed and imagined hand movements: an fMRI study," Journal of Cognitive Neuroscience, vol. 11, pp. 491501, 1999. [41] A. Caclin and P. Fonlupt, "Functional and effective connectivity in an fMRI study of an auditory-related task," Eur J Neurosci, vol. 23, pp. 2531-2537, 2006. [42] M. d'Esposito, G. K. Aguirre, E. Zarahn, D. Ballard, R. K. Shin, and J. Lease, "Functional MRI studies of spatial and nonspatial working memory," Cognitive Brain Research, vol. 7, pp. 1-13, 1998. [43] N. K. Logothetis, D. Leopold, M. Wilke, A. Maier, N. K. Logothetis, K. Moutoussis, G. A. Keliris, Z. Kourtzi, N. K. Logothetis, and C. Juchem, "What can and cannot fMRI tell us about neural activity: A primer in combined imaging and electrophysiology"}," Journal of Neuroscience}, vol. 25, p. 2526, 2005. 153 [44] A. B. Schwartz, "Useful signals from motor cortex," J. Physiol., vol. 579, pp. 581–601, 2007. [45] P. J. Rousche and R. A. Normann, "Chronic recording capability of the Utah intracortical electrode array in cat sensory cortex," J. Neurosci. Methods, vol. 82, pp. 1–15, 1998. [46] K. Oweiss, "Introduction," in Statistical Signal Processing for Neuroscience and Neurotechnology, K. Oweiss, Ed.: Academic Press, Elsevier, 2010, pp. 1-13. [47] P. Tiesinga, J. M. Fellous, and T. J. Sejnowski, "Regulation of spike timing in visual cortical circuits," Nat Rev Neurosci vol. 9, pp. 97–107, 2008. [48] S. Panzeri, R. S. Petersen, S. R. Schultz, M. Lebedev, and M. E. Diamond, "The role of spike timing in the coding of stimulus location in rat somatosensory cortex," Neuron, vol. 29, pp. 769-777, 2001. [49] M. DeWeese, M. Wehr, and A. Zador, "Binary spiking in auditory cortex," J Neurosci, vol. 23, pp. 7940–7949, 2003. [50] L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, A. H. Caplan, A. Branner, D. Chen, R. D. Penn, and J. P. Donoghue, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia," Nature, vol. 442, pp. 164–171, 2006. [51] J. Reimer and N. G. Hatsopoulos, "Periodicity and evoked responses in motor cortex," J Neurosci vol. 30, pp. 11506-11515, 2010. [52] M. Velliste, S. Perel, M. C. Spalding, A. S. Whitford, and A. B. Schwartz, "Cortical control of a prosthetic arm for self-feeding," Nature, vol. 453, pp. 1098–1101, 2008. [53] K. D. Harris, J. Csicsvari, H. Hirase, G. Dragoi, and G. Buzsaki, "Organization of cell assemblies in the hippocampus," Nature, vol. 424, pp. 552–556, 2003. [54] Y. Ikegaya, G. Aaron, R. Cossart, D. Aronov, I. Lampl, D. Ferster, and R. Yuste, "Synfire chains and cortical songs: temporal modules of cortical activity," Science, vol. 304, pp. 559–564, 2004. 154 [55] K. D. Harris, D. A. Henze, J. Csicsvari, H. Hirase, and G. Buzsaki, "Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements," J. Neurophysiol. , vol. 84, pp. 401–414, 2000. [56] M. S. Fee, P. P. Mitra, and D. Kleinfeld, "Automatic sorting of multiple unit neuronal signals in the presence of anisotropic and non-Gaussian variability," J Neurosci Methods, vol. 69, pp. 175-88, 1996. [57] K. G. Oweiss and D. J. Anderson, "Spike sorting: a novel shift and amplitude invariant technique," Neurocomputing, vol. 44-46, pp. 1133–1139, 2002. [58] M. Aghagolzadeh and K. Oweiss, "Detection and Classification of Extracellular Action Potential Recordings " in Statistical Signal Processing for Neuroscience and Neurotechnology, K. Oweiss, Ed.: Academic Press, Elsevier, 2010, pp. 15-74. [59] M. Lewicki, "A review of methods for spike sorting: the detection and classification of neural action potentials," Network Comput. Neural Syst. , vol. 9, pp. R53–R78, 1998. [60] Q. R. Quiroga, Z. Nadasdy, and Y. Ben-Shaul, "Unsupervised spike detection and sorting with wavelets and super-paramagnetic clustering," Neural Comput., vol. 16, p. 1661−1687, 2004. [61] G. Zouridakis and D. C. Tam, "Identification of reliable spike templates in multi-unit extracellular recordings using fuzzy clustering," Comput. Methods Prog. Biomed, vol. 61, pp. 91–98, 2000. [62] S. Shoham, M. R. Fellows, and R. A. Normann, "Robust, automatic spike sorting using mixtures of multivariate t-distributions," J. Neurosci. Methods, vol. 127, pp. 111–122, 2003. [63] D. H. Perkel, G. Gerstein, and G. P. Moore, "Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains," J. Biophys vol. 7, pp. 419-40, 1967. [64] M. Frerking, J. Schulte, S. P. Wiebe, and U. Stäubli, "Spike timing in CA3 pyramidal cells during behavior: implications for synaptic transmission”," J. Neurophysiol. , vol. 94, pp. 1528-1540, 2005. [65] W. Maass and T. Natschlager, "A Model for Fast Analog Computation Based on Unreliable Synapses," Neural Computation, vol. 12, pp. 1679-1704, 2000. 155 [66] G. Q. Bi and M. M. Poo, "Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type," Journal of Neuroscience vol. 18, pp. 10464-72, 1998. [67] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, "Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs," Science, vol. 275, pp. 213-215, 1997. [68] M. V. Tsodyks and H. Markram, "The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability " PNAS, vol. 94, pp. 719-723, 1997. [69] P. D. Roberts and C. C. Bell, "Spike-timing dependent synaptic plasticity in biological systems," Biological Cybernetics, vol. 87, pp. 392-403, 2002. [70] A. Destexhe and E. Marder, "Plasticity in single neuron and circuit computations," Nature, vol. 431, pp. 789-795, 2004. [71] R. Dzakpasu and M. Zochowski, "Discriminating different types of synchrony in neural systems," Physica D., vol. 208, pp. 115-122, 2005. [72] W. S. Smith and E. E. Fetz, "Synaptic Interactions Between Forelimb-Related Motor Cortex Neurons in Behaving Primates," Journal of Neurophysiology, vol. 102, pp. 10261039, Aug 2009. [73] T. M. Cover and J. A. Thomas, Elements of Information Theory: Wiley-Interscience New York, 2006. [74] S. Yamada, K. Matsumoto, M. Nakashima, and S. Shiono, "Information theoretic analysis of action potential trains. II. Analysis of correlation among n neurons to deduce connection structure," J. Neurosci. Methods, vol. 66, pp. 35-45, 1996. [75] L. M. A. Bettencourt, G. J. Stephens, M. I. Ham, and G. W. Gross, "Functional structure of cortical neuronal networks grown in vitro," Phys Rev E Stat Nonlin Soft Matter Phys, vol. 75, p. 021915, 2007. [76] P.-O. Amblard and O. J. J. Michel, "On directed information theory and Granger causality graphs," J Comput Neurosci, vol. 30, pp. 7–16, 2011. 156 [77] B. Gourevitch and J. J. Eggermont, "Evaluating Information Transfer Between Auditory Cortical Neurons," J Neurophysiol, vol. 97, pp. 2533 - 2543, 2007. [78] M. Abeles and G. Gerstein, "Detecting Spatiotemporal Firing Patterns Among Simultaneously Recorded Single Neurons," Journal of Neurophysiology, vol. 60, pp. 909924, 1988. [79] S. Grün, M. Diesmann, and A. Aertsen, "Unitary Events in Multiple Single-Neuron Spiking Activity: II. Nonstationary Data," Neural Computation, vol. 14, pp. 81-119, 2002. [80] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek, "Weak pairwise correlations imply strongly correlated network states in a neural population," Nature, vol. 440, pp. 1007– 1012, 2006. [81] B. B. Averbeck, P. Latham, and A. Pouget, "Neural correlations, population coding and computation," Nature Reviews Neuroscience vol. 7, pp. 358-366, 2006. [82] A. Tang, D. Jackson, J. Hobbs, W. Chen, J. L. Smith, H. Patel, A. Prieto, D. Petrusca, M. I. Grivich, A. Sher, P. Hottowy, W. Dabrowski, A. M. Litke, and J. M. Beggs, "A Maximum Entropy Model Applied to Spatial and Temporal Correlations from Cortical Networks In Vitro," J. Neurosci., vol. 28, pp. 505-518, January 9, 2008 2008. [83] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli, "Spatio-temporal correlations and visual signalling in a complete neuronal population " Nature, vol. 454, pp. 995-999 2008. [84] Y. Roudi, S. Nirenberg, and P. E. Latham, "Pairwise maximum entropy models for studying large biological systems: when they can work and when they can't," PLoS Comput Biol, vol. 5, p. e1000380, May 2009. [85] M. Aghagolzadeh, S. Eldawlatly, and K. Oweiss, "Synergistic Coding by Cortical Neural Ensembles," IEEE Transactions on Information Theory, vol. 56, pp. 875-889, 2010. [86] Y. Dan and M. M. Poo, "Spike timing-dependent plasticity of neural circuits," Neuron, vol. 44, pp. 23-30, 2004. [87] C. W. J. Granger, "Investigating causal relations by econometric models and crossspectral methods," Econometrica, vol. 37 pp. 424-438, 1969. 157 [88] M. Kami ski, M. Ding, W. Truccolo, and S. Bressler, "Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance," Biological Cybernetics, vol. 85, pp. 145-157, 2001. [89] K. Sameshima and L. Baccala, "Using partial directed coherence to describe neuronal ensemble interactions," J Neurosci Methods, vol. 94, pp. 93–103, 1999. [90] P. J. Franaszczuk, G. K. Bergey, and M. Kaminski, "Analysis of mesial temporal seizure onset and propagation using the directed transfer function method," Electroenceph. clin. Neurophysiol., vol. 91, pp. 413-427, 1994. [91] L. Lin, R. Osan, and J. Z. Tsien, "Organizing principles of real-time memory encoding: neural clique assemblies and universal neural codes," Trends Neurosci, vol. 29, pp. 48-57, Jan 2006. [92] J. L. Peña, S. Viete, K. Funabiki, K. Saberi, and M. Konishi, "Cochlear and neural delays for coincidence detection in owls," J. Neurosci., vol. 21, pp. 9455–9459, 2001. [93] H. Agmon-Snir, C. E. Carr, and J. Rinzel, "The role of dendrites in auditory coincidence detection," Nature, vol. 393, pp. 268–272, 1998. [94] E. Salinas and P. Their, "Gain modulation: a major computational principle of the central nervous system," Neuron, vol. 27, pp. 15-21, 2000. [95] Y. E. Cohen and R. A. Andersen, "A common reference frame for movement plans in the posterior parietal cortex," Nature Reviews Neuroscience, vol. 3, pp. 553–562, 2002. [96] A. J. Cadotte, T. B. DeMarse, P. He, and M. Ding, "Causal measures of structure and plasticity in simulated and living neural networks," PLoS ONE, vol. 3, e3355, 2008. [97] R. Dahlhaus, M. Eichler, and J. Sandkuhler, "Identification of synaptic connections in neural ensembles by graphical models," J Neurosci Methods, vol. 77, pp. 93–107, 1997. [98] W. Truccolo, U. T. Eden, M. Fellow, J. D. Donoghue, and E. N. Brown, "A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects," J. Neurophysiol., vol. 93, pp. 1074-1089, 2005. 158 [99] M. Okatan, M. A. Wilson, and E. N. Brown, "Analyzing Functional Connectivity Using a Network Likelihood Model of Ensemble Neural Spiking Activity," Neural Computation, vol. 17, pp. 1927-1961, 2005. [100] L. Paninski, J. W. Pillow, and E. P. Simoncelli, "Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Encoding Model," Neural Computation, vol. 16, pp. 2533-2561, 2004. [101] R. Barbieri, M. A. Wilson, L. M. Frank, and E. N. Brown, "An analysis of hippocampal spatio-temporal representations using a Bayesian algorithm for neural spike train decoding," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 13, pp. 131-136, 2005. [102] R. Barbieri, M. Quirk, L. Frank, M. Wilson, and E. Brown, "Construction and analysis of non-Poisson stimulus-response models of neural spiking activity," Journal of Neuroscience Methods, vol. 105, pp. 25-37, 2001. [103] D. A. Dombeck, M. S. Graziano, and D. W. Tank, "Functional clustering of neurons in motor cortex determined by cellular resolution imaging in awake behaving mice," J Neurosci, vol. 29, pp. 13751-60, Nov 4 2009. [104] M. R. Cohen and W. T. Newsome, "Estimates of the contribution of single neurons to perception depend on timescale and noise correlation," J Neurosci, vol. 29, pp. 6635-48, May 20 2009. [105] J. I. Gold and M. N. Shadlen, "The Neural Basis of Decision Making," Annual Review of Neuroscience, vol. 30, pp. 535-574, 06/28/ 2007. [106] M. N. Shadlen and W. T. Newsome, "The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding," J. Neurosci., vol. 18, pp. 3870-3896, 1998. [107] L. Paninski, J. Pillow, and J. Lewi, "Statistical models for neural encoding, decoding, and optimal stimulus design," Progress in brain research, vol. 165, p. 493, 2007. [108] M. Cohen and J. Maunsell, "Attention improves performance primarily by reducing interneuronal correlations," Nature neuroscience, vol. 12, pp. 1594-1600, 2009. 159 [109] E. M. Callaway, "Local circuits in primary visual cortex of the macaque monkey," Annu Rev Neurosci, vol. 21, pp. 47-74, 1998. [110] Y. Yoshimura and E. M. Callaway, "Fine-scale specificity of cortical networks depends on inhibitory cell type and connectivity," Nat Neurosci, vol. 8, pp. 1552-9, Nov 2005. [111] M. M. Churchland and K. V. Shenoy, "Temporal Complexity and Heterogeneity of Single-Neuron Activity in Premotor and Motor Cortex," Journal of Neurophysiology, vol. 97, p. 4235, 2007. [112] J. M. Carmena, M. A. Lebedev, R. E. Crist, J. E. O'Doherty, D. M. Santucci, D. F. Dimitrov, P. G. Patil, C. S. Henriquez, and M. A. L. Nicolelis, "Learning to control a brain-machine interface for reaching and grasping by primates," PLoS Biology, vol. 1, p. e2, 2003. [113] L. Paninski, M. R. Fellows, N. G. Hatsopoulos, and J. P. Donoghue, "Spatiotemporal tuning of motor cortical neurons for hand position and velocity," Journal of neurophysiology, vol. 91, pp. 515-532, 2004. [114] W. Wu, M. J. Black, D. Mumford, Y. Gao, E. Bienenstock, and J. P. Donoghue, "Modeling and decoding motor cortical activity using a switching Kalman filter," Biomedical Engineering, IEEE Transactions on, vol. 51, pp. 933-942, 2004. [115] D. W. Moran and A. B. Schwartz, "Motor cortical representation of speed and direction during reaching," Journal of Neurophysiology, vol. 82, pp. 2676–2692, 1999. [116] M. Histed, V. Bonin, and R. Reid, "Direct activation of sparse, distributed populations of cortical neurons by electrical microstimulation," Neuron, vol. 63, pp. 508–522 2009. [117] S. Song, P. J. Sjostrom, M. Reigl, S. Nelson, and D. B. Chklovskii, "Highly nonrandom features of synaptic connectivity in local cortical circuits," PLoS Biol, vol. 3, p. e68, Mar 2005. [118] M. J. Wainwright and M. Jordan, "Graphical Models, Exponential Families, and Variational Inference," Foundations and Trends in Machine Learning vol. 1, pp. 1-305, 2008 2008. [119] C. J. Stam and J. C. Reijneveld, "Graph theoretical analysis of complex networks in the brain," Nonlin. Biomed. Phys., vol. 1, p. 3, 2007. 160 [120] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, pp. 440–442, 1998. [121] C. C. Hilgetag, G. A. P. C. Burns, M. A. O'Neill, J. W. Scannell, and M. P. Young, "Anatomical connectivity defines the organisation of clusters of cortical areas in macaque monkey and cat," Philos Trans R Soc Lond B Biol Sci, vol. 355, pp. 91–110, 2000. [122] K. E. Stephan, C.-C. Hilgetag, G. A. P. C. Burns, M. A. O'Neill, M. P. Young, and R. Kotter, "Computational analysis of functional connectivity between areas of primate cerebral cortex," Phil Trans R Soc Lond B, vol. 355, pp. 111-126, 2000. [123] G. Buzsaki, C. Geisler, D. A. Henze, and X. J. Wang, "Interneuron diversity series: Circuit complexity and axon wiring economy of cortical interneurons," Trends Neurosci., vol. 27, pp. 186–193, 2004. [124] V. M. Eguíluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, and A. V. Apkarian, "Scale-free brain functional networks," Phys. Rev. Lett., vol. 94, p. 018102, 2005. [125] B. J. He, J. M. Zempel, A. Z. Snyder, and M. E. Raichle, "The temporal structures and functional significance of scale-free brain activity," Neuron, vol. 66, pp. 353–369, 2010. [126] J. Shlens, G. D. Field, J. L. Gauthier, M. I. Grivich, D. Petrusca, A. Sher, A. M. Litke, and E. J. Chichilnisky, "The structure of multi-neuron firing patterns in primate retina," J. Neurosci., vol. 26, pp. 8254-8266, 2006. [127] J. M. Fellous, P. H. E. Tiesinga, P. J. Thomas, and T. J. Sejnowski, "Discovering Spike Patterns in Neuronal Responses," Journal of Neuroscience, vol. 24, p. 2989, 2004. [128] D. Brillinger, "The Identification of Point Process Systems," Annals of Probability, vol. 3, pp. 909-924, 1975. [129] E. N. Brown, "Theory of Point Processes for Neural Systems " in Methods and Models in Neurophysics, C. C. Chow, B. Gutkin, D. Hansel, C. Meunier, and J. Dalibard, Eds. Paris: Elsevier, 2005, pp. 691-726. [130] L. Srinivasan, U. T. Eden, A. S. Willsky, and E. N. Brown, "A State-Space Analysis for Reconstruction of Goal-Directed Movements Using Neural Signals," Neural Comp., vol. 18, pp. 2465-2494, October 1, 2006 2006. 161 [131] E. N. Brown, D. P. Nguyen, L. M. Frank, M. A. Wilson, and V. Solo, "An analysis of neural receptive field plasticity by point process adaptive filtering," PNAS, vol. 98, pp. 12261-12266, October 9, 2001 2001. [132] C. E. Armstrong-Gold and F. Rieke, "Bandpass Filtering at the Rod to Second-Order Cell Synapse in Salamander (Ambystoma tigrinum) Retina," J. Neurosci., vol. 23, pp. 37963806, May 1, 2003 2003. [133] S. Mallat, A Wavelet Tour of Signal Processing, second ed.: Kluwer Acedemic Publishers, 1998. [134] F. R. K. Chung, Spectral Graph Theory vol. 92. Providence, RI: American Mathematical Society, 1994. [135] C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon, "A Min-max Cut Algorithm for Graph Partitioning and Data Clustering " in Proceedings of the 2001 IEEE International Conference on Data Mining 2001, pp. 107 - 114. [136] R. Jin, C. Ding, and F. Kang, "A Probabilistic Approach for Optimizing Spectral Clustering," in Advances in Neural Information Processing Systems, 2005. [137] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," Advances in Neural Information Processing Systems, vol. 14, 2001. [138] C. F. Stevens and A. M. Zador, "Input synchrony and the irregular firing of cortical neurons," Nat. Neurosci., vol. 1, pp. 210-217, 1998. [139] F. Rieke, "Spikes:: Exploring the Neural Code," 1997. [140] P. J. Sjöström, G. G. Turrigiano, and S. B. Nelson, "Rate, timing, and cooperativity jointly determine cortical synaptic plasticity," Neuron, vol. 32, pp. 1149-64, 2001. [141] S. B. Nelson, P. J. Sjöström, and G. G. Turrigiano, "Rate and timing in cortical synaptic plasticity," Philos Trans R Soc Lond B Biol Sci., vol. 357, pp. 1851-7, 2002. [142] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Comp. Surv., vol. 31, pp. 264-323, 1999. 162 [143] I. R. Winship and T. H. Murphy, "In Vivo Calcium Imaging Reveals Functional Rewiring of Single Somatosensory Neurons after Stroke," J. Neurosci., vol. 28, pp. 65926606, June 25, 2008 2008. [144] L. Gross, "A New Window into Structural Plasticity in the Adult Visual Cortex," PLoS Biology, vol. 4, p. e42, 2006. [145] S. Fusi, W. F. Asaad, E. K. Miller, and X. J. Wang, "A Neural Circuit Model of Flexible Sensorimotor Mapping: Learning and Forgetting on Multiple Timescales," Neuron, vol. 54, pp. 319-333, 2007. [146] K. Sakamoto, H. Mushiake, N. Saito, K. Aihara, M. Yano, and J. Tanji, "Discharge Synchrony during the Transition of Behavioral Goal Representations Encoded by Discharge Rates of Prefrontal Neurons," Cereb. Cortex, p. bhm234, February 5, 2008 2008. [147] R. C. Froemke and Y. Dan, "Spike-timing-dependent synaptic modification induced by natural spike trains," Nature, vol. 416, pp. 433-438, 2002. [148] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B, , vol. 28, pp. 301-315, 1998. [149] C. A. Atencio and C. E. Schreiner, "Columnar Connectivity and Laminar Processing in Cat Primary Auditory Cortex," PLoS ONE, vol. 5, p. e9521. doi:10.1371/journal.pone.0009521 2010. [150] A. Nicoll and C. Blakemore, "Patterns of local connectivity in the neocortex," Neural Computation, vol. 5, pp. 665–680, 1993. [151] D. V. Buonomano and M. M. Merzenich, "CORTICAL PLASTICITY: From Synapses to Maps," Annual Review of Neuroscience, vol. 21, pp. 149-186, 1998. [152] R. Tibshirani, G. Walther, and T. Hastie, "Estimating the number of clusters in a data set. via the gap statistic," Journal of the Royal Statistical Society B vol. 63, pp. 411–423, 2001. [153] F. Chen, S. El-Dawlatly, R. Jin, and K. Oweiss, "Identifying and Tracking the number of independent clusters of functionally interdependent neurons from biophysical models of 163 population activity," in Proc. of 3rd Int. IEEE EMBS Conf. on Neural Engineering, Hawaii, 2007, pp. 542-545. [154] D. M. Taylor, S. I. H. Tillery, and A. B. Schwartz, "Direct Cortical Control of 3D Neuroprosthetic Devices," Science, vol. 296, p. 1829, 2002. [155] M. A. Lebedev and M. A. Nicolelis, "Brain-machine interfaces: past, present and future," Trends Neurosci, vol. 29, pp. 536-546, 2006. [156] M. A. Lebedev, J. E. O'Doherty, and M. A. L. Nicolelis, "Decoding of Temporal Intervals From Cortical Ensemble Activity," J Neurophysiol, vol. 99, pp. 166-186, January 1, 2008 2008. [157] D. W. Moran and A. B. Schwartz, "Motor Cortical Activity During Drawing Movements: Population Representation During Spiral Tracing," Journal of Neurophysiology, vol. 82, pp. 2693-2704, 1999. [158] M. D. Serruya, N. G. Hatsopoulos, L. Paninski, M. R. Fellows, and J. P. Donoghue, "Brain-machine interface: Instant neural control of a movement signal," Nature, vol. 416, pp. 141-142, 2002. [159] M. Serruya, N. Hatsopoulos, M. Fellows, L. Paninski, and J. Donoghue, "Robustness of neuroprosthetic decoding algorithms," Biological Cybernetics, vol. 88, pp. 219-228, 2003. [160] K. Murphy and S. Mian, "Modelling gene expression data using dynamic Bayesian networks," Computer Science Division, University of California Berkeley, CA 1999. [161] K. Murphy, "Dynamic Bayesian Networks: Representation, Inference and Learning," PhD thesis, UC Berkeley, Computer Science Division, 2002. [162] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Francisco, CA: Morgan Kaufmann, 1988. [163] D. Heckerman, "A tutorial on learning with bayesian networks," Technical Report MSRTR-95-06, Microsoft Research, 1995. [164] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, pp. 257–286, 1989. 164 [165] N. Friedman, I. Nachaman, and D. Peer, "Learning bayesian network structure from massive datasets: The sparse candidate algorithm," in Proc. of the 15th Conference on Uncertainty in Artificial Intelligence (UAI), 1999, pp. 206-215. [166] A. J. Hartemink, D. K. Gifford, T. S. Jaakkola, and R. A. Young, "Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks," in Pacific Symposium on Biocomputing, 2001. [167] W. Lam and F. Bacchus, "Learning bayesian belief networks: an approach based on mdl principle," Computational Intelligence, vol. 10, pp. 269-293, 1994. [168] D. Heckerman, D. Geiger, and M. Chickering, "Learning bayesian networks: The combination of knowledge and statistical data," Machine Learning, vol. 9, pp. 197–243, 1995. [169] G. Schwarz, "Estimating the dimension of a model," Annals of Statistics, vol. 6, pp. 461– 464, 1978. [170] G. F. Cooper and E. Herskovits, "A Bayesian method for the induction of probabilistic networks from data," Machine Learning, vol. 9, pp. 309-347, 1992. [171] M. Chickering, C. Meek, and D. Heckerman, "Large-sample learning of bayesian networks is np-hard," in 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03), 2003. [172] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, pp. 671-680, 1983. [173] A. Bernard and A. J. Hartemink, "Informative structure priors: Joint learning of dynamic regulatory networks from multiple types of data," Pac Symp Biocomput, vol. 10, pp. 459– 470, 2005. [174] N. Dojer, A. Gambin, A. Mizera, B. Wilczynski, and J. Tiuryn, "Applying dynamic Bayesian networks to perturbed gene expression data," BMC Bioinformatics, vol. 7, 2006. [175] F. Geier, J. Timmer, and C. Fleck, "Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge," BMC Systems Biology, vol. 1, 2007. 165 [176] J. C. Rajapakse and J. Zhou, "Learning effective brain connectivity with dynamic Bayesian networks " NeuroImage, vol. 37, pp. 749-760, 2007. [177] L. Zhang, D. Samaras, N. Alia-Klein, N. Volkow, and R. Goldste, "Modeling neuronal interactivity using dynamic Bayesian networks," in Advances in Neural Information Processing Systems, Cambridge, MA, 2006. [178] V. A. Smith, J. Yu, T. V. Smulders, A. J. Hartemink, and E. D. Jarvis, "Computational inference of neural information flow networks," PLoS Computational Biology, vol. 2, pp. 1436-1449, 2006. [179] D. O. Hebb, The Organization of Behavior; a Neuropsychological Theory: Wiley, New York, 1949. [180] L. Kuhlmann, A. N. Burkitt, A. G. Paolini, and G. M. Clark, "Summation of spatiotemporal input patterns in leaky integrate-and-fire neurons: Application to neurons in the cochlear nucleus receiving converging auditory nerve fiber input," J. Comput. Neurosci., vol. 12, pp. 55-73, 2002. [181] H. Sprekeler, C. Michaelis, and L. Wiskott, "Slowness: An objective for spike-timingdependent plasticity?," PLoS Computational Biology, vol. 3, p. e112. doi:10.1371/journal.pcbi.0030112, 2007. [182] X. Zhang and L. H. Carney, "Response properties of an integrate-and-fire model that receives subthreshold inputs," Neural Computation, vol. 17, pp. 2571-2601, 2005. [183] C. J. v. Rijsbergen, Information Retrieval. London: Buttersworth, 1979. [184] G. Czanner, U. T. Eden, S. Wirth, M. Yanike, W. A. Suzuki, and E. N. Brown, "Analysis of between-trial and within-trial neural spiking dynamics," J Neurophysiol., vol. 99, pp. 2672-2693, 2008. [185] J. de La Rocha, B. Doiron, E. Shea-Brown, Josi, K. cacute, and A. Reyes, "Correlation between neural spike trains increases with firing rate," Nature, vol. 448, pp. 802-806, 2007. [186] S. Amari, "Measure of correlation orthogonal to changing in firing rate," Neural Comput, vol. 21, pp. 960-972, 2009. 166 [187] D. Debanne, "Information processing in the axon," Nat Rev Neurosci, vol. 5, pp. 304–316, 2004. [188] G. Molnar, S. Olah, G. Komlosi, M. Fule, J. Szabadics, C. Varga, P. Barzo, and G. Tamas, "Complex events initiated by individual spikes in the human cerebral cortex," PLoS Biol vol. 6, p. p. e222, 2008. [189] N. Toni, D. A. Laplagne, C. Zhao, G. Lombardi, C. E. Ribak, F. H. Gage, and A. F. Schinder, "Neurons born in the adult dentate gyrus form functional synapses with target cells," Nature Neuroscience, vol. 11, pp. 901-907, 2008. [190] Y. Yoshimura, J. L. Dantzker, and E. M. Callaway, "Excitatory cortical neurons form fine-scale functional networks," Nature, vol. 433, p. 868−873, 2005. [191] D. A. Keen and A. J. Fuglevand, "Common input to motor neurons innervating the same and different compartments of the human extensor digitorum muscle," J Neurophysiol, vol. 91, pp. 57–102, 2004. [192] K. S. Turker and R. K. Powers, "Effects of common excitatory and inhibitory inputs on motoneuron synchronization," Journal of Neurophysiology, vol. 86, pp. 2807-2822, 2001. [193] K. S. Turker and R. K. Powers, "The effects of common input characteristics and discharge rate on synchronization in rat hypoglossal motoneurones," Journal of Physiology, vol. 541, pp. 254-260, 2002. [194] M. Aghagolzadeh, S. Eldawlatly, and K. Oweiss, "Identifying Functional Connectivity of Motor Neuronal Ensembles Improves the Performance of Population Decoders," Proc. of 4th Int. IEEE EMBS Conf. on Neural Engineering, pp. 534-537, 2009. [195] J. Yu, V. A. Smith, P. P. Wang, A. J. Hartemink, and E. D. Jarvis, "Advances to Bayesian network inference for generating causal networks from observational biological data," Bioinformatics, vol. 20, pp. 3594-3603, 2004. [196] S. Panzeri, N. Brunel, N. K. Logothetis, and C. Kayser, "Sensory neural codes using multiplexed temporal scales," Trends Neurosci, vol. 33, pp. 111–120, 2010. [197] C. I. Moore, S. B. Nelson, and M. Sur, "Dynamics of neuronal processing in rat somatosensory cortex," Trends in Neurosciences, vol. 22, p. 513:520, 1999. 167 [198] D. E. Feldman and M. Brecht, "Map plasticity in somatosensory cortex," Science, vol. 310, pp. 810–815, 2005. [199] M. E. Diamond, M. v. Heimendahl, P. M. Knutsen, D. Kleinfeld, and E. Ahissar, "“Where” and “what” in the whisker sensorimotor system," Nat Rev Neurosci vol. 9, pp. 601–612, 2008. [200] C. C. H. Petersen, "The functional organisation of the barrel cortex," Neuron, vol. 56, pp. 339-355, 2007. [201] D. Kleinfeld, E. Ahissar, and M. E. Diamond, "Active sensation: insights from the rodent vibrissa sensorimotor system," Curr. Opin. Neurobiol. , vol. 16, pp. 435–444, 2006. [202] A. A. Ghazanfar and M. A. L. Nicolelis, "The structure and function of dynamic cortical and thalamic receptive fields," Cereb Cortex, vol. 11, pp. 183–193, 2001. [203] M. C. Wiest, N. Bentley, and M. A. L. Nicolelis, "Heterogeneous integration of bilateral whisker signals by neurons in primary somatosensorycortex of awake rats," J Neurophysiol, vol. 93, pp. 2966–2973, 2005. [204] D. Schubert, J. F. Staiger, N. Cho, R. Kötter, K. Zilles, and H. J. Luhmann, "Layerspecific intracolumnar and transcolumnar functional connectivity of layer V pyramidal cells in rat barrel cortex," J Neurosci, vol. 21, pp. 3580-3592, 2001. [205] N. Wright and K. Fox, "Origins of cortical layer V surround receptive fields in the rat barrel cortex," J. Neurophysiol., vol. 103, pp. 709–724, 2010. [206] D. J. Simons, G. E. Carvell, A. E. Hershey, and D. P. Bryant, "Responses of barrel cortex neurons in awake rats and effects of urethane anesthesia," Exp Brain Res, vol. 91, pp. 259–272, 1992. [207] A. A. Ghazanfar and M. A. L. Nicolelis, "Spatiotemporal properties of layer V neurons in the rat primary somatosensory cortex," Cereb. Cortex, vol. 9, pp. 348-361, 1999. [208] B. E. Mercier, C. R. Legg, and M. Glickstein, "Basal ganglia and cerebellum receive different somatosensory information in rats," Proc Natl Acad Sci USA, vol. 87, pp. 4388– 4392, 1990. 168 [209] R. Izraeli and L. L. Porter, "Vibrissal motor cortex in the rat: connections with the barrel field," Exp Brain Res, vol. 104, pp. 41–54, 1995. [210] M. Zhang and K. D. Alloway, "Stimulus-induced intercolumnar synchronization of neuronal activity in rat barrel cortex: a laminar analysis," J. Neurophysiol., vol. 92, pp. 1464–1478, 2004. [211] K. D. Alloway, "Information processing streams in rodent barrel cortex: the differential functions of barrel and septal circuits," Cereb. Cortex, vol. 18, pp. 979–989, 2008. [212] F. Matyas, V. Sreenivasan, F. Marbach, C. Wacongne, B. Barsy, C. Mateo, R. Aronoff, and C. C. H. Petersen, "Motor Control by Sensory Cortex," Science, vol. 330, pp. 12401243, 2010. [213] M. A. Lebedev, G. Mirabella, I. Erchova, and M. E. Diamond, "Experience-dependent Plasticity of Rat Barrel Cortex: Redistribution of Activity across Barrel-columns," Cereb Cortex, vol. 10, pp. 23 - 31, 2000. [214] K. Y. Kwon, S. Eldawlatly, and K. G. Oweiss, "NeuroQuest: A Comprehensive Tool for Large Scale Neural Data Processing and Analysis," in Proc. of 4th Int. IEEE EMBS Conf. on Neural Engineering, 2009, pp. 622-625. [215] K. Y. Kwon, S. Eldawlatly, and K. Oweiss, "NeuroQuest: A Comprehensive Analysis Tool for Extracellular Neural Ensemble," Journal of Neuroscience Methods, in review. [216] S. Temereanca, E. N. Brown, and D. J. Simons, "Rapid changes in thalamic firing synchrony during repetitive whisker stimulation," J Neurosci, vol. 28, pp. 11153-11164, 2008. [217] C. Shannon, "A mathematical theory of communication," Bell. Sys. Tech. J., vol. 27, pp. 379–423, 1948. [218] B. Luo, R. C. Wilson, and E. R. Hancock, "Spectral Embedding of Graphs," Pattern Recognition, vol. 36, pp. 2213-2230, 2003. [219] R. Q. Quiroga and S. Panzeri, "Extracting information from neuronal populations: information theory and decoding approaches," Nature Rev. Neurosci., vol. 10, pp. 173– 185, 2009. 169 [220] M. C. Jones, J. S. Marron, and S. J. Sheather, "A Brief Survey of Bandwidth Selection for Density Estimation," Journal of the American Statistical Association, vol. 91, pp. 401– 407, 1996. [221] W. Truccolo, L. R. Hochberg, and J. P. Donoghue, "Collective dynamics in human and monkey sensorimotor cortex: predicting single neuron spikes," Nat Neurosci, vol. 13, pp. 105-111, 2010. [222] R. S. Petersen, S. Panzeri, and M. E. Diamond, "Population coding of stimulus location in rat somatosensory cortex," Neuron, vol. 32, pp. 503–514, 2001. [223] C. P. de Kock, R. M. Bruno, H. Spors, and B. Sakmann, "Layer- and cell-type-specific suprathreshold stimulus representation in rat primary somatosensory cortex," J. Physiol. (Lond.), vol. 581, pp. 139–154 2007. [224] T. Celikel, V. A. Szostak, and D. E. Feldman, "Modulation of spike timing by sensory deprivation during induction of cortical map plasticity," Nat. Neurosci., vol. 7, pp. 534– 541, 2004. [225] R. S. Petersen, S. Panzeri, and M. E. Diamond, "Population coding in somatosensory cortex," Curr. Opin. Neurobiol., vol. 12, pp. 441–447, 2002. [226] S. P. Jadhav, J. Wolfe, and D. E. Feldman, "Sparse temporal coding of elementary tactile features during active whisker sensation," Nat. Neurosci., vol. 12, pp. 792–800, 2009. [227] G. Foffani, B. Tutunculer, and K. A. Moxon, "Role of spike timing in the forelimb somatosensory cortex of the rat," J. Neurosci., vol. 24, pp. 7266–7271, 2004. [228] K. D. Harris, "Neural signatures of cell assembly organization," Nat. Rev. Neurosci., vol. 6, pp. 399–407, 2005. [229] G. Foffani, M. L. Morales-Botello, and J. Aguilar, "Spike timing, spike count, and temporal information for the discrimination of tactile stimuli in the rat ventrobasal complex," J. Neurosci., vol. 29, pp. 5964–5973, 2009. [230] D. S. Barth, "Submillisecond synchronization of fast electrical oscillations in neocortex," J Neurosci, vol. 23, pp. 2502–2510, 2003. 170 [231] G. Buzsáki, "Neural Syntax: Cell Assemblies, Synapsembles, and Readers " Neuron, vol. 68, pp. 362-385, 2010. [232] B. Haider and D. A. McCormick, "Rapid neocortical dynamics: cellular and network mechanisms," Neuron, vol. 62, pp. 171–189, 2009. [233] B. M. Hooks, S. A. Hires, Y.-X. Zhang, D. Huber, L. Petreanu, K. Svoboda, and G. M. G. Shepherd, "Laminar Analysis of Excitatory Local Circuits in Vibrissal Motor and Sensory Cortical Areas," PLoS Biol, vol. 9, p. e100057, 2011. [234] M. Abeles, Local Cortical Circuits: An Electrophysiological study. Berlin: Springer, 1982. [235] A. K. Engel, P. Fries, and W. Singer, "Dynamic predictions: oscillations and synchrony in top-down processing," Nat. Rev. Neurosci., vol. 2, pp. 704–716, 2001. [236] B. Kolb and I. Q. Whishaw, "Brain plasticity and behavior," Annu Rev Psychol, vol. 49, pp. 43–64, 1998. [237] S. A. Siegelbaum and E. R. Kandel, "Learning-Related Synaptic Plasticity: LTP and LTD," Curr Opin Neurobiol, vol. 1, pp. 113–120, 1991. [238] L. F. Abbott and S. B. Nelson, "Synaptic plasticity: taming the beast," Nature Neurosci., vol. 3, pp. 1178–1183, 2000. [239] K. Fox and R. O. Wong, "A comparison of experience-dependent plasticity in the visual and somatosensory systems," Neuron, vol. 48, pp. 465–477, 2005. [240] E. Gao and N. Suga, "Experience-dependent plasticity in the auditory cortex and the inferior colliculus of bats: role of the corticofugal system," Proc. Natl Acad. Sci. USA, vol. 97, pp. 8081–8086, 2000. [241] T. Weiss, W. H. R. Miltner, R. Huonker, R. Friedel, I. Schmidt, and E. Taub, "Rapid functional plasticity of the somatosensory cortex after finger amputation," Exp. Brain Res., vol. 134, pp. 199–203, 2000. [242] J. H. Kaas, "Plasticity of sensory and motor maps in adult mammals," Annu. Rev. Neurosci., vol. 14, pp. 137–167, 1991. 171 [243] K. Fox, "Anatomical pathways and molecular mechanisms for plasticity in the barrel cortex," Neuroscience, vol. 111, pp. 799-814, 2002. [244] M. Fu and Y. Zuo, "Experience-dependent structural plasticity in the cortex " Trends Neurosci, vol. 34, pp. 177-187, 2011. [245] M. E. Diamond, M. Armstrong-James, and F. F. Ebner, "Experience-dependent plasticity in adult rat barrel cortex," Proc. Natl Acad. Sci. USA, vol. 90, pp. 2082–2086, 1993. [246] M. Armstrong-James, M. E. Diamond, and F. F. Ebner, "An innocuous bias in whisker use in adult rats modifies receptive fields of barrel cortex neurons," J. Neurosci. Methods, vol. 14, pp. 6978–6991, 1994. [247] M. E. Diamond, W. Huang, and F. F. Ebner, "Laminar comparison of somatosensory cortical plasticity," Science, vol. 265, pp. 1885-1888, 1994. [248] H. Sellien and F. F. Ebner, " Rapid plasticity follows whisker pairing in barrel cortex of the awake rat," Exp. Brain Res., vol. 177, pp. 1–14, 2007. [249] S. W. Scheff and D. A. Price, "Synaptic pathology in Alzheimer's disease: a review of ultrastructural studies," Neurobiol Aging, vol. 24, p. 1029−1046, 2003. [250] D. Lee, N. L. Port, W. Kruse, and A. P. Georgopoulos, "Variability and correlated noise in the discharge of neurons in motor and parietal areas of the primate cortex," J Neurosci, vol. 18, pp. 1161–1170, 1998. [251] I. Bar-Gad, Y. Ritov, E. Vaadia, and H. Bergman, "Failure in identification of overlapping spikes from multiple neuron activity causes artificial correlations," J Neurosci Methods, vol. 107, pp. 1-13, 2001. [252] J. W. Pillow, L. Paninski, V. J. Uzzell, E. P. Simoncelli, and E. J. Chichilnisky, "Prediction and Decoding of Retinal Ganglion Cell Responses with a Probabilistic Spiking Model," J. Neurosci., vol. 25, pp. 11003-11013, November 23, 2005 2005. [253] W. Wu, Y. Gao, E. Bienenstock, J. P. Donoghue, and M. J. Black, "Bayesian Population Decoding of Motor Cortical Activity Using a Kalman Filter," Neural Computation, vol. 18, pp. 80-118, 2005. 172 [254] D. D. Lee and S. Seung, "Learning the parts of objects by nonnegative matrix factorization," Nature, vol. 401, pp. 788 - 791, 1999. [255] S. P. Kim, Y. N. Rao, D. Erdogmus, J. C. Sanchez, M. A. L. Nicolelis, and J. C. Principe, "Determining patterns in neural activity for reaching movements using nonnegative matrix factorization," EURASIP J. Appl. Signal Process., vol. 19, pp. 3113-3121, 2005. [256] L. Tao, J. D. Lauderdale, and A. T. Sornborger, "Mapping functional connectivity between neuronal ensembles with larval zebrafish transgenic for a ratiometric calcium indicator," Front. Neural Circuits, vol. 5, 2011. [257] J. W. H. Schnupp and O. Kacelnik, "Cortical Plasticity: Learning from Cortical Reorganisation," Current Biology, vol. 12, pp. R144-R146, 2002. [258] J. R. Wickens, J. N. Reynolds, and B. I. Hyland, "Neural mechanisms of reward-related motor learning," Curr Opin Neurobiol, vol. 13, pp. 685-90, 2003. [259] O. A. Gharbawie and I. Q. Whishaw, "Parallel stages of learning and recovery of skilled reaching after motor cortex stroke:“Oppositions” organize normal and compensatory movements," Behav Brain Res, vol. 175, pp. 249-262, 2006. [260] E. H. Baeg, Y. B. Kim, J. Kim, J. W. Ghim, J. J. Kim, and M. W. Jung, "LearningInduced Enduring Changes in Functional Connectivity among Prefrontal Cortical Neurons," Journal of Neuroscience, vol. 27, p. 909, 2007. [261] D. Dupret, A. Fabre, M. D. Döbrössy, A. Panatier, J. J. Rodríguez, S. Lamarque, V. Lemaire, S. H. R. Oliet, P. V. Piazza, and D. N. Abrous, "Spatial Learning Depends on Both the Addition and Removal of New Hippocampal Neurons," PLoS Biol, vol. 5, p. e214, 2007. [262] D. E. Feldman, "Synaptic Mechanisms for Plasticity in Neocortex," Annual Review of Neuroscience, vol. 32, pp. 33-55, 2009. [263] N. Dancause, S. Barbay, S. B. Frost, E. J. Plautz, D. Chen, E. V. Zoubina, A. M. Stowe, and R. J. Nudo, "Extensive cortical rewiring after brain injury," Journal of Neuroscience, vol. 25, pp. 10167-10179, 2005. 173 [264] J. Girgis, D. Merrett, S. Kirkland, G. A. S. Metz, V. Verge, and K. Fouad, "Reaching training in rats with spinal cord injury promotes plasticity and task specific recovery," Brain, vol. 130, p. 2993, 2007. 174