STATISTICAL INFERENCE WITH HIGH-DIMENSIONAL DEPENDENT DATA

By

Shawn M. Santo

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Statistics – Doctor of Philosophy

2018

ABSTRACT

STATISTICAL INFERENCE WITH HIGH-DIMENSIONAL DEPENDENT DATA

By

Shawn M. Santo

High-dimensional time dependent data appear in practice when a large number of variables are repeatedly measured for a relatively small number of experimental units. The number of repeated measurements can range from two to hundreds depending on the application. Advances in technology have made the process of gathering and storing such data relatively low-cost and efficient. Demand to analyze such complex data arises in genetics, microbiology, neuroscience, finance, and meteorology. In this dissertation, we first introduce and investigate a novel solution to a classical problem that involves high-dimensional time dependent data. In addition, we propose a new approach to analyze high-dimensional dependent genomics data.

First, we consider detecting and identifying change points among covariance matrices of high-dimensional longitudinal data and high-dimensional functional data. The proposed methods are applicable under general temporospatial dependence. A new test statistic is introduced for change point detection, and its asymptotic distribution is established under two different asymptotic settings. If a change point is detected, an estimate for the location is provided. We investigate the rate of convergence for the change point estimator and study how it is impacted by dimensionality and temporospatial dependence in each asymptotic framework. Binary segmentation is applied to estimate the locations of possibly multiple change points, and the corresponding estimator is shown to be consistent under mild conditions for each asymptotic setting. Simulation studies demonstrate the empirical size and power of the proposed test and the accuracy of the change point estimator. We apply our procedures to a time-course microarray data set and a task-based fMRI data set.

In the second part of this dissertation we consider a hierarchical high-dimensional dependent model in the context of genomics. Our model analyzes RNA sequencing data to identify polymorphisms with allele-specific expression that are correlated with phenotypic variation. Through simulation, we demonstrate that our model can consistently select significant predictors among a large number of possible predictors. We apply our model to an RNA sequencing and phenotypic data set derived from a sounder of swine.

ACKNOWLEDGMENTS

I would like to express the utmost gratitude to my advisor, Dr. Ping-Shou Zhong, for his assistance, support, guidance, and encouragement. Dr. Zhong taught me what it means to be a researcher in an academic environment. I would also like to thank my three committee members: Dr. Yuehua Cui, Dr. Hyokyoung Hong, and Dr. Juan Steibel. Their service is greatly appreciated. Furthermore, I would like to thank the Department of Statistics and Probability for the resources and opportunities provided to me during my Ph.D. career. Lastly, I would not have pursued a Ph.D. if it were not for Dr. M. Pádraig M. M. McLoughlin. Dr. McLoughlin challenged and encouraged me as an undergraduate student at Kutztown University of Pennsylvania in a way no professor had before.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES
KEY TO ABBREVIATIONS

CHAPTER 1 INTRODUCTION
  1.1 Technology and the field of statistics
  1.2 Low to high-dimensional data
  1.3 Independent to dependent data
  1.4 Change point detection and identification
  1.5 High-dimensional time dependent data
  1.6 Dissertation outline

CHAPTER 2 HOMOGENEITY TESTS OF COVARIANCE MATRICES WITH HIGH-DIMENSIONAL LONGITUDINAL DATA
  2.1 Introduction
  2.2 Basic setting
  2.3 Homogeneity tests of covariance matrices
    2.3.1 Non-Gaussian random errors
    2.3.2 Power-enhanced test for sparse alternatives
  2.4 Change point identification
  2.5 Simulation studies
    2.5.1 Power-enhanced test statistic
    2.5.2 Non-Gaussian random errors
    2.5.3 Accuracy of correlation matrix estimator of VnD
    2.5.4 Comparison with a pair-wise based method
  2.6 An empirical study
  2.7 Technical details
    2.7.1 Proofs of lemmas
    2.7.2 Proofs of main results

CHAPTER 3 COVARIANCE CHANGE POINT DETECTION AND IDENTIFICATION WITH HIGH-DIMENSIONAL FUNCTIONAL DATA
  3.1 Introduction
  3.2 Model
  3.3 Change point detection
  3.4 Computation of the proposed statistics
  3.5 Change point identification
  3.6 Simulation studies
  3.7 An empirical study
  3.8 Technical details
    3.8.1 Proofs of lemmas
    3.8.2 Proofs of theorems

CHAPTER 4 A HIDDEN MARKOV APPROACH FOR QTL MAPPING USING ALLELE-SPECIFIC EXPRESSION SNPS
  4.1 Introduction
  4.2 A hidden Markov model for SNP genotype calling
  4.3 Phenotypic model specification
    4.3.1 Prediction of ASE ratios
    4.3.2 Identification of quantitative trait loci
  4.4 Simulation studies
  4.5 An empirical study

CHAPTER 5 CONCLUSION
  5.1 Introduction
  5.2 Summary of contributions
  5.3 Future research

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Empirical size and power of the proposed test, percentages of simulation replications that reject the null hypothesis under settings (I) and (II).

Table 2.2: Percentages of correct change point identification among all rejected hypotheses under settings (I) and (II).

Table 2.3: Average true positives and average true negatives for identifying multiple change points using the proposed binary segmentation method. Standard errors are included after each number. For T = 5, the maximum number of true positives and true negatives for each is 2. For T = 8, the maximum number of true positives and true negatives is 2 and 5, respectively.

Table 2.4: Empirical size and power, percentages of simulation replications that reject the null hypothesis for the test statistic Mn and the power-enhanced test statistic M∗n.

Table 2.5: Empirical size and power of the proposed test, percentages of simulation replications that reject the null hypothesis for data generated from a standardized Gamma distribution under the nominal level 5%.

Table 2.6: Percentages of correct change point identification among all rejected hypotheses for data generated from a standardized Gamma distribution.

Table 2.7: Empirical size and power, percentages rejecting the null hypotheses in the simulations, for the pair-wise based test and the power-enhanced test statistic M∗n.

Table 2.8: Significant gene ontology terms, test statistic values, number of genes in each gene ontology term, identified change points, and estimated local false discovery rates.

Table 3.1: Empirical size and power of the proposed test, percentages of simulation replications that reject the null hypothesis.

Table 3.2: Empirical size and power of the proposed test for T = 100, percentages of simulation replications that reject the null hypothesis, quantile computed from a correlation matrix that used linear interpolation. The first 5 off-diagonals were computed exactly as well as the last w components for each row.

Table 3.3: Empirical size and power of the proposed test for T = 100, percentages of simulation replications that reject the null hypothesis, quantile computed from a correlation matrix that used linear interpolation.
The first 10 off-diagonals were computed exactly as well as the last w components for each row.

Table 3.4: Empirical size and power of the proposed test for T = 100, percentages of simulation replications that reject the null hypothesis, quantile computed from a correlation matrix that used linear interpolation. The first 20 off-diagonals were computed exactly as well as the last w components for each row.

Table 3.5: Average true positives and average true negatives for identifying multiple change points using the proposed binary segmentation method. The maximum number of true positives for a given replication is 2. The maximum number of true negatives for a given replication is T − 3.

Table 3.6: Standard errors for the average true positives and average true negatives given in Table 3.5. The maximum number of true positives for a given replication is 2. The maximum number of true negatives for a given replication is T − 3.

Table 3.7: Identified change points in the Sherlock fMRI data set. Range of time points preceding the identified change point where the covariance matrices are temporally homogeneous. An interval ID provides a reference to Figure 3.3.

Table 4.1: Average false positive and average false negative rates for the single test with significance level 0.01. Average false positive rate is the top value.

Table 4.2: Average false positive and average false negative rates for the simultaneous test with nominal level 0.05. Average false positive rate is the top value.

Table 4.3: Alternative method 1, average false positive and average false negative rates for the single test with significance level 0.01. Average false positive rate is the top value.

Table 4.4: Alternative method 2, average false positive and average false negative rates for the single test with significance level 0.01. Average false positive rate is the top value.

LIST OF FIGURES

Figure 1.1: Population covariance heat maps at six time points. Change points exist at time t = 3 and at time t = 4.

Figure 1.2: A small graphical model for the problem considered in Chapter 4. Grey circles represent observed values. White circles represent latent variables.

Figure 2.1: The average component-wise quadratic distance between V̂nD and VnD. The top solid line is for n = 40; the middle dashed line is for n = 50; the bottom dotted line is for n = 60. The scale of the y-axis is 10^−5.

Figure 2.2: Histogram of the number of genes among the 159 gene ontology terms analyzed.

Figure 2.3: Correlation network map for gene ontology term 0030054. Each dot represents a gene within the gene ontology. A link between dots indicates a strong correlation between genes.

Figure 3.1: Accuracy of linear interpolation for R̂n,tq. Black circles represent R̂n,1q for all q ∈ {1, . . . , T − 1}. Red triangles represent the corresponding interpolated values.
Figure 3.2: Shen 268 node parcellation. This image was obtained from Finn et al. (2015).

Figure 3.3: Correlation networks based on an average over a time interval in which the covariance matrices are homogeneous. Each circle comprises 67 Shen nodes. Solid lines represent a positive correlation, and dashed lines represent a negative correlation. The darker the line, the stronger the correlation between nodes. A correlation threshold value of 0.70 in absolute value was used.

Figure 4.1: A graphical model illustrating the hidden Markov model for SNP genotype calling. Grey circles represent observed values. White circles represent latent variables.

Figure 4.2: Grey circles represent observed values. White circles represent latent variables.

Figure 4.3: Average false negative rates and average false positive rates for the proposed method. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

Figure 4.4: Average false negative rates and average false positive rates for alternative procedure 1. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

Figure 4.5: Average false negative rates and average false positive rates for alternative procedure 2. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

Figure 4.6: ASE estimates from the hidden Markov model compared to simulated raw allele count ratios. Hidden Markov model imputed ASE ratios with value less than 0.50 are marked in red, and values above 0.50 are marked in blue.

Figure 4.7: Estimates for SNPs. Significant SNPs are displayed with their respective ID provided in the real data set. IDs correspond to the ordered locations.

Figure 4.8: ASE estimates from the hidden Markov model compared to real raw allele count ratios. Hidden Markov imputed ASE values conditional on Gil = 3 and Gil = 4 are marked in blue and red, respectively.

Figure 4.9: ASE estimates from the hidden Markov model compared to real raw allele count ratios. Hidden Markov imputed ASE values conditional on Gil = 3 and Gil = 4 are marked in blue and red, respectively.

KEY TO ABBREVIATIONS

ASE Allele-specific expression
BOLD Blood-oxygen-level dependent
CUSUM Cumulative sum
DNA Deoxyribonucleic acid
EM Expectation-maximization
FDR False discovery rate
fMRI Functional magnetic resonance imaging
GO Gene ontology
Lasso Least absolute shrinkage and selection operator
QTL Quantitative trait loci
RNA Ribonucleic acid
SNP Single-nucleotide polymorphism

CHAPTER 1

INTRODUCTION

1.1 Technology and the field of statistics

Technology is one of the chief drivers of growth and innovation in society, and its impact on the field of statistics cannot be overstated. For much of the twentieth century, statisticians concentrated on solving problems in a classical setting, where the number of subjects, observations, or experimental units exceeded the number of variables or features measured.
If p is the number of variables or features and n is the number of experimental units, then this classical setting is the so-called 'small p, large n' setting. The demand to develop robust theoretical procedures under the 'small p, large n' setting was due in large part to the data and resources available at the time. Computers were not efficient, data recording was not automated, and the scope of technology was limited; thus, there was little motivation to consider situations in which p far exceeded n. In fact, even as late as 1981, it was considered poor practice to conduct a study in which n/p < 5 (Huber 1981).

The past thirty years have been an era of accelerated technological progress in many fields. Biology, finance, economics, computer science, meteorology, and others all have the available resources to gather massive amounts of information. The need to filter, understand, and analyze this information continues to grow. Data sets in numerous domain-specific fields now often have more variables recorded than experimental units. This 'large p, small n' setting is what is referred to as high-dimensional data. As technology and data recording processes improve, statisticians will play an integral role in developing theoretically robust and computationally efficient statistical methods to analyze such complex data.

1.2 Low to high-dimensional data

Research on high-dimensional data has seen a shift over the past two decades from estimation to more complex forms of inference. Estimation is often an initial step in inference, but it does not allow us to quantify uncertainty. Much of the focus with regard to estimation in a high-dimensional framework has been geared toward parameter estimation in generalized linear models and graphical models (Bühlmann and van de Geer 2011). Donoho and Johnstone (1994) pioneered parameter estimation in a linear model when p = n. To obtain sparse estimation, Tibshirani (1996) proposed an ℓ1-norm penalization procedure known as the least absolute shrinkage and selection operator (Lasso). Under a sparsity assumption and other regularization conditions, the Lasso simultaneously performs parameter estimation and variable selection. Tibshirani's seminal paper resulted in an extensive study of the Lasso's theoretical properties and paved the way for valuable ℓ1-norm and ℓ2-norm penalization extensions. For example, Zou and Hastie (2005) introduced the elastic net to address some shortcomings with regard to the number of covariates selected via the Lasso. Tibshirani et al. (2005) and Yuan and Lin (2006) proposed the fused Lasso and the group Lasso, respectively. Zou (2006) introduced the adaptive Lasso. Fu and Knight (2000) and Zhao and Yu (2006) investigated the asymptotic behavior of Lasso-type estimates and proved under certain conditions that when the true parameter is 0, there exists non-zero probability mass at 0 in the estimator's limiting distribution. From a computational standpoint, Osborne et al. (2000) studied the primal and dual problems of the Lasso and, as a result, developed a fast and efficient algorithm to obtain Lasso estimates. There is a long list of literature on regularized estimation of high-dimensional parameters. Since the main focus of this dissertation is not estimation, we do not enumerate all of it. Some important works include Fan and Li (2001), Candes and Tao (2007), and Zhang (2010).
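As a concrete illustration of the ℓ1-penalized estimation and selection just described, the following minimal sketch (ours, not part of the dissertation) fits the Lasso with the R package glmnet on simulated data; the dimensions and coefficients are hypothetical.

```r
# Illustrative sketch: sparse estimation via the Lasso in a
# 'large p, small n' setting using the R package glmnet.
library(glmnet)

set.seed(1)
n <- 50; p <- 200                        # p >> n, hypothetical sizes
x <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1.5, 1, rep(0, p - 3))     # only 3 nonzero coefficients
y <- x %*% beta + rnorm(n)

# alpha = 1 gives the Lasso penalty; cross-validation selects lambda.
cv <- cv.glmnet(x, y, alpha = 1)
bhat <- coef(cv, s = "lambda.min")       # sparse coefficient estimate
sum(bhat != 0)                           # number of selected terms (incl. intercept)
```

Even though p far exceeds n, the ℓ1 penalty performs estimation and variable selection simultaneously, which is exactly the behavior described above.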
Inference, as it relates to hypothesis testing and confidence intervals, allows researchers to make scientific discoveries and improve decision making. However, statistical inference of these forms with high-dimensional data is not a simple extension of the classical inference procedures, where the number of sample subjects exceeds the number of variables measured. As was noted by Johnstone and Titterington (2009),

    It should not, of course, be imagined that the 'large p' scenarios are mere alternative cases to be explored in the same spirit as their 'small p' forebears. A better analogy would lie in the distinction between linear and nonlinear models and methods — the unbounded variety and complexity of departures from linearity is a metaphor (and in some cases a literal model) for the scope of phenomena that can arise as the number of parameters grows without limit.

In terms of inference for high-dimensional mean vectors, Dempster (1958) first considered a two-sample test in a p > n setting. Bai and Saranadasa (1996), Chen and Qin (2010), and Cai and Xia (2014) proposed test statistics extending Dempster's novel work. Fujikoshi et al. (2010) provide an overview of and details on testing high-dimensional mean vectors. The work on testing high-dimensional covariance matrices can be traced back to Ledoit and Wolf (2002), who assumed p/n converges to some constant and proved under a normality assumption that their test statistics are normal. Methodology building off Ledoit and Wolf includes Chen et al. (2010) and Cai and Ma (2013). Schott (2007), Srivastava and Yanagihara (2010), Li and Chen (2012), and Cai et al. (2013) all investigated the problem of testing the equality of high-dimensional covariance matrices for two or multiple groups. More recently, Ahmad (2017) and Zhang et al. (2018) generalized the work of Li and Chen (2012). Some testing and confidence interval procedures for Lasso estimates and generalized linear models were established by Bach (2008), Meinshausen and Bühlmann (2010), and Zhang and Zhang (2013).

To elucidate one of the challenges brought about by a high-dimensional framework, consider a classical test for covariance matrices under the 'small p, large n' setting. Muirhead (2005) details a few of these tests, along with some tests for mean vectors. Suppose we are interested in testing

H0 : Σ1 = ··· = ΣT versus H1 : not all are equal,   (1.1)

where we assume Xit (i = 1, . . . , n; t = 1, . . . , T) is a p-dimensional random vector from a multivariate normal distribution with mean µt and covariance Σt. Let xit be a realization of Xit from the tth population. Assume that the T populations are independent and that the random samples of n vectors from each of the T populations are independent. The likelihood ratio test can be used to develop an α-level test for (1.1). The likelihood function is given by

L(µt, Σt) = ∏_{t=1}^{T} (2π)^{−pn/2} |Σt|^{−n/2} exp{ −(1/2) ∑_{i=1}^{n} (xit − µt)^T Σt^{−1} (xit − µt) }.

For observed data, L(µt, Σt) is a function of µt and Σt for all t. To obtain the likelihood criterion, we maximize L(µt, Σt) under the restricted parameter space of the null hypothesis and also under the unrestricted parameter space. Let Ω = {(µt, Σt) : t = 1, . . . , T} and Ω0 = {(µt, Σt) : Σ1 = ··· = ΣT} denote the unrestricted and restricted parameter spaces, respectively. Thus, the likelihood criterion is defined as

λn = sup_{Ω0} L(µt, Σt) / sup_{Ω} L(µt, Σt).

Let N = Tn, and let A = ∑_{t=1}^{T} At, where At = ∑_{i=1}^{n} (xit − x̄t)(xit − x̄t)^T.
For the parameter space Ω, the maximum likelihood estimators of µt and Σt are µ̂t,Ω = x̄t and Σ̂t,Ω = At/n, respectively. For the parameter space Ω0, the maximum likelihood estimators are µ̂t,Ω0 = x̄t and Σ̂Ω0 = A/N. Therefore, substituting these values back into λn and taking the logarithm,

Λn = −2 log(λn) = N log(|Σ̂Ω0|) − ∑_{t=1}^{T} n log(|Σ̂t,Ω|).

Hence, an α-level test rejects H0 of (1.1) whenever Λn exceeds the critical value Λα. Under an asymptotic setting where n diverges and p is fixed, the null distribution of Λn can be derived. Furthermore, as n → ∞, At/(n − 1) → Σt in probability. Thus, the standard asymptotic result for likelihood ratios holds: Λn → χ² in distribution with (T − 1)p(p + 1)/2 degrees of freedom under the 'small p, large n' setting. However, breakdowns occur if we consider a 'large p, small n' framework.

Under a 'large p, small n' setting, Λn can no longer be computed, and the asymptotic results are not easily extended. If p > n, then we can no longer compute log(|Σ̂t,Ω|) or, for large enough p, log(|Σ̂Ω0|), due to At and A being singular. Furthermore, the asymptotic distribution under the null hypothesis is not well defined when p diverges. In a high-dimensional framework with p > n, the convergence in probability of At/(n − 1) → Σt no longer holds, as demonstrated through spectral analysis by Bai and Yin (1993), Johnstone (2001), and others. As a result, testing (1.1) is not possible via a likelihood ratio test.

This is just one example in which breakdowns in the classical methods occur due to an increase in data dimension. This phenomenon is known as the "curse of dimensionality". An increase in data dimension can produce extra noise, computational challenges, and a failure of many of the existing classical statistical procedures. However, in certain situations an increase in dimensionality may be a blessing (Donoho 2000). For further challenges associated with high-dimensional data we encourage readers to see Fan and Li (2006) and Fan et al. (2014a).
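The computational side of this breakdown is easy to see. The following minimal sketch (our illustration, with hypothetical sizes) shows that the group-wise covariance estimate is rank deficient once p > n, so the log-determinants entering Λn are not usable:

```r
# Sketch: breakdown of the likelihood ratio statistic when p > n.
set.seed(1)
n <- 20; p <- 50                          # hypothetical 'large p, small n' sizes
x <- matrix(rnorm(n * p), n, p)           # one group, mean zero

A_t <- crossprod(scale(x, scale = FALSE)) # A_t = sum_i (x_i - xbar)(x_i - xbar)^T
Sigma_hat <- A_t / n                      # MLE of Sigma for this group

qr(Sigma_hat)$rank                        # at most n - 1 = 19 < p, so singular
determinant(Sigma_hat, logarithm = TRUE)$modulus
# -Inf up to numerical error: log|Sigma_hat| cannot enter Lambda_n
```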
1.3 Independent to dependent data

The likelihood ratio test for (1.1) described in Section 1.2 breaks down further if the T groups are not independent. In this dissertation, measurements of a sample that are repeatedly recorded will be referred to as longitudinal data when the number of repeated measurements is small. If the number of repeated measurements is large, or dense, we will refer to the data as functional data. Measurements taken over time allow researchers to understand the evolution of the sample subjects, detect and identify changes in certain variables across time, and study sequences of events. In longitudinal or functional data sets, temporal dependence exists among measurements from the same subject, which adds a layer of complexity to the theoretical and computational analysis. Methodology developed under a T-independent sample framework is not applicable for a T-dependent sample. For example, Chen and Qin (2010) and Li and Chen (2012) considered independent two-sample high-dimensional tests for mean vectors and covariance matrices, respectively. However, their methods are not applicable in a temporally dependent setting. There are two types of dependence in the data: temporal and spatial. If these dependencies are ignored, then inference procedures are invalid and misleading. Currently, there is no existing work accounting for the aforementioned dependencies in high-dimensional covariance testing and change point detection and identification. The asymptotic analysis is more complicated when both dependencies are considered. Generalizing to an asymptotic framework for high-dimensional functional data further increases the complexity.

1.4 Change point detection and identification

Given (1.1) for time dependent data, two questions naturally arise. First: can we detect changes among T dependent covariance matrices? Second: can we identify the time points at which those changes occur? The answers to these questions have profound implications for time dependent data and can provide critical information to individuals in the fields of finance, genetics, neuroscience, climatology, and more.

Change point detection is a classical problem in time series analysis. Numerous supervised and unsupervised machine learning algorithms are used in various change point detection applications. Aminikhanghahi and Cook (2016) detail a few multi-class supervised learning algorithms such as Gaussian mixture models, hidden Markov models, and decision trees. Their work also highlights likelihood ratios, probabilistic models, graphs, and clustering as further approaches to the change point detection problem. One of the most common techniques in change point detection is the cumulative sum (CUSUM) method of Page (1954). Measurements in a process are cumulatively summed according to a weighted procedure, and a change point is identified once the cumulative sum exceeds a threshold value. Chernoff and Zacks (1964) laid the groundwork for change point detection with regard to the mean of normal random variables. Accordingly, a series of methodologies were developed in independent univariate and multivariate settings. Some of these works include Kander and Zacks (1966), Yao and Davis (1986), Sen and Srivastava (1973), and Srivastava and Worsley (1986). Chapter 2 in both Csörgő and Horváth (1997) and Brodsky and Darkhovsky (1993) details nonparametric change point detection methods based on Wilcoxon-type statistics, U-type statistics, and M-estimators. Johnson and Bagshaw (1974), Brown et al. (1975), and Horváth and Kokoszka (1997) introduced methods to address the change point problem for dependent data. For further details on classical change point detection and identification procedures, we refer readers to Basseville and Nikiforov (1993) and Brodsky (2017).
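To make the CUSUM idea concrete, the following minimal sketch (ours; the threshold is an illustrative Kolmogorov-Smirnov-type approximation) detects a mean shift in a univariate series:

```r
# Sketch of an (unweighted) CUSUM statistic for a change in mean.
set.seed(2)
y <- c(rnorm(60, mean = 0), rnorm(40, mean = 1))   # change point at t = 60

cusum <- cumsum(y - mean(y))                       # S_k = sum_{j <= k} (y_j - ybar)
k_hat <- which.max(abs(cusum))                     # location of maximal deviation
k_hat                                              # should be near 60

# A simple detection rule: flag a change if the scaled maximum is large.
stat <- max(abs(cusum)) / (sd(y) * sqrt(length(y)))
stat > 1.36   # ~1.36 approximates a 5% Kolmogorov-type critical value
```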
In terms of a classical procedure for testing (1.1) with T dependent groups, there is none. A multivariate procedure to test (1.1) was proposed by Aue et al. (2009). Assume Xt (t = 1, . . . , T) are p-dimensional temporally dependent random vectors from a multivariate distribution with mean µ and covariance Σt. Thus, xt is an observation at the tth time point. To test (1.1), Aue et al. (2009) considered the quantities Sk (k = 1, . . . , T) such that

Sk = (1/√T) { ∑_{j=1}^{k} vech(xj xj^T) − (k/T) ∑_{j=1}^{T} vech(xj xj^T) },

where, for any p × p symmetric matrix M, vech(M) represents the stacked columns of the lower triangular region of M in the form of a p(p + 1)/2 vector. The quantity Sk was motivated by the fact that under H0 of (1.1), E{vech(xj xj^T)} = E{vech(xi xi^T)} for all i, j ∈ {1, . . . , T}. Based on Sk, they introduced the test statistic

ΩT = T^{−1} ∑_{k=1}^{T} Sk^T Σ̂T^{−1} Sk,

where Σ̂T is an estimator such that |Σ̂T − ΣT|E = op(1) as T diverges and, for any matrix A, |A|E = sup_{x≠0} |Ax|/|x|. They derived the test statistic's asymptotic distribution under the null hypothesis with T ≫ p.

However, the method of Aue et al. (2009) fails in a high-dimensional framework since Σ̂T is not invertible if p ≫ T. In addition, Aue et al. did not consider a setting in which n > 1, and thus their methodology does not permit multiple-subject inference. In Section 1.2 we highlighted the fact that recent research has addressed the high-dimensional challenges of testing (1.1) but not dependence. In this section we detailed a procedure that incorporates dependence but not a 'large p, small n' framework. Therefore, a gap exists. How can we test (1.1) for high-dimensional time dependent data?

1.5 High-dimensional time dependent data

High-dimensional longitudinal data appear in practice when a large number of variables, p, are repeatedly measured for a relatively small number of experimental units, n. The number of repeated measurements, T, can range from two to hundreds depending on the application. Throughout this dissertation, longitudinal data will refer to settings in which T is small. High-dimensional functional data will refer to settings in which T is large or dense. For details on functional data analysis we refer readers to Ramsay and Silverman (2005).

Consider an experiment where patients have their gene expressions measured throughout the course of a treatment regimen. Doctors and clinicians may be interested in understanding how these gene expressions are regulated over time. In studies such as this, the number of gene expressions measured, p, is anywhere from a few hundred to a few thousand, and the number of patients, n, along with the number of repeated measurements, T, is small. We will refer to this as high-dimensional longitudinal data. As another example, consider a functional magnetic resonance imaging (fMRI) study where patients have their brain activity measured while performing various tasks. Thousands of blood-oxygen-level dependent (BOLD) responses are recorded, hundreds of times during the duration of a scan, for voxels corresponding to regions of interest in the patient's brain. For this single patient, radiologists may be interested in identifying and understanding significant spatial and temporal changes. The BOLD data from an fMRI experiment are considered high-dimensional functional data.

1.6 Dissertation outline

In Chapters 2 and 3 of this dissertation we develop and evaluate a procedure to test (1.1) for high-dimensional longitudinal and high-dimensional functional data, respectively. To visualize our objective in a high-dimensional longitudinal setting, consider Figure 1.1. Each sub-plot represents the covariance matrix at the respective time point. From Figure 1.1 it is clear that the covariance is homogeneous between time points one through three; there is a different covariance structure at t = 4; and for time points five and six the covariance structure is homogeneous again.

Figure 1.1: Population covariance heat maps at six time points. Change points exist at time t = 3 and at time t = 4.

Our statistical test will first detect the presence of any change points among the T covariance matrices. If we can conclude that change points exist, we further identify the time points at which the changes occur. The procedures we propose are pioneering with regard to (1.1) for high-dimensional longitudinal and high-dimensional functional data. As is discussed in detail in Chapters 2 and 3, some research has provided a solution to test (1.1) in a high-dimensional framework, but no method has been developed for high-dimensional time dependent data.
In addition to the theoretical challenges, we also address the natural computational challenges that arise with such massive time dependent data. We ensure our method is practical and accessible to end users in biology, neuroscience, and other fields via an R package.

In Chapter 4 we consider a different type of high-dimensional dependent data, for which we propose a novel hierarchical model for genomics applications. Our interest is to link a phenotypic response with single nucleotide polymorphisms (SNPs) that have allele-specific expression (ASE). To account for dependence among the latent genotype and ASE status combinations, we consider a hidden Markov model and incorporate regularized regression to address the high dimensionality. Our problem can be depicted with the graphical model in Figure 1.2 for the ith individual with five SNPs. Let Xil, Gil, and δil be the RNA read counts, the genotype and ASE status, and the allele-specific expression ratio, respectively, for the ith individual at the lth SNP. Let Yi be an observed phenotypic response. Given the relationships between X, G, and δ, we first aim to estimate the latent variables Gil and δil given X and an assumed Markov structure for G. For the observed phenotypic response Y, we use regularized regression to select the significant δs.

Figure 1.2: A small graphical model for the problem considered in Chapter 4. Grey circles represent observed values. White circles represent latent variables.

In Chapter 5, we discuss possible theoretical and computational extensions to the results of Chapters 2–4.

All proofs of lemmas and theorems are provided in the sections titled "Technical details" of the respective chapters.

CHAPTER 2

HOMOGENEITY TESTS OF COVARIANCE MATRICES WITH HIGH-DIMENSIONAL LONGITUDINAL DATA

2.1 Introduction

In a typical time-course microarray data set, thousands of gene expression values are measured repeatedly from the same subject at different stages in a developmental process (Tai and Speed, 2006). As a motivating example, Taylor et al. (2007) conducted a longitudinal study on 69 patients infected with hepatitis C virus. Their gene expression values were measured once before treatment and five times during the treatment regimen of pegylated alpha interferon and ribavirin. One purpose of the study was to identify which genes were regulated by treatment. The repeated measurements enable researchers to understand gene regulation over time. An important task in genomic studies is to identify gene sets with significant temporal changes (Storey et al., 2005). Much evidence has shown that gene interaction and co-regulation play a critical role in the etiology of various diseases (Shedden and Taylor, 2005). One application of our methods is to identify gene sets with significant changes in their covariance matrices, because the covariance matrix or its inverse can be used to quantify interaction and co-regulation among genes (Danaher et al., 2015).

Assume that Yit = (Yit1, . . . , Yitp)^T is a p-dimensional random vector with mean µt and covariance Σt. In the aforementioned applications, Yit (i = 1, . . . , n; t = 1, . . . , T) represents gene expressions for p genes in a gene set measured from the ith individual at the tth developmental stage, where n is the sample size and T is the total number of finite stages.
The number of genes, p, in a given gene set ranges from a hundred to a few thousand, as illustrated by the histogram in Figure 2.2 in Section 2.6, but n and T are small in the study. Thus, p can be much larger than n and T. We focus on testing the homogeneity of covariance matrices:

H0 : Σ1 = ··· = ΣT versus H1 : Σk ≠ Σl for some 1 ≤ k ≠ l ≤ T.   (2.1)

The alternative in (2.1) can be written as a change point type alternative:

H1 : Σ1 = ··· = Σk1 ≠ Σk1+1 = ··· = Σkq ≠ Σkq+1 = ··· = ΣT,   (2.2)

where 1 ≤ k1 < ··· < kq < T are the unknown locations of the change points. This alternative is of interest in practice because it specifies the locations of changes. For example, researchers are often interested in understanding dynamic gene regulation. By identifying the change points, we can infer the change pattern of gene regulation, which is important for developing diagnostic and preventive tools for some diseases (Koh et al., 2014).

Testing the homogeneity of covariance matrices is a classical problem in multivariate analysis. Classical methods for testing (2.1) include the likelihood ratio test (Muirhead, 2005) and Box's M test (Box, 1949). Some resampling methods have also been proposed by Zhang and Boos (1992) and Zhu et al. (2002). However, these methods are not valid for the aforementioned applications for the following reasons. First, these methods require n to be much larger than p. Thus, they are not applicable under the large p, small n paradigm. Second, these methods are only valid for independent samples without temporal dependence, but the independence assumption does not hold for high-dimensional longitudinal data because the repeated measurements obtained from the same individual are temporally dependent.

There is some existing research on testing (2.1) in the large p, small n scenario for independent samples. Li and Chen (2012) considered testing the equality of two covariance matrices for two independent samples. Schott (2007) and Srivastava and Yanagihara (2010) proposed test statistics for (2.1) based on estimators of the summation of the weighted pair-wise Frobenius norm distances between any two covariance matrices. Zheng et al. (2015) and Yang and Pan (2017) applied random matrix theory to test the equality of two large-dimensional covariance matrices.
Despite the above advances, no existing multivariate method can be applied directly to test (2.1) for temporal dependent data under the large p, small n and T setup. This chapter proposes a new method for testing the equality of covariance matrices with high-dimensional longitudinal data under the large p, small n and T scenario. The proposed method considers both spatial and temporal dependence. Spatial dependence refers to the dependence among different components of Yit, and temporal dependence refers to the de- pendence between Yit and Yis for any two time points t (cid:54)= s. The asymptotic distribution of the proposed test statistic is derived under mild conditions on dependence without any explicit requirement on the relationships between p, n and T . We also propose a method for estimating the location of change points k1, . . . , kq among covariance matrices. There exists some work on identifying change points in high-dimensional means, but the literature for high-dimensional covariances is very small. Aue et al. (2009) laid groundwork by considering a p-dimensional multivariate, possibly high-dimensional, time series setup where T diverges, n = 1 and p < T . Their test statistic involves the inverse 14 of a p × p sample covariance matrix, which is singular if p > T . Thus, their method is not applicable to high-dimensional longitudinal data. In the case with finite p and n but diverging T , one major concern is that the change point estimator is not consistent (Hinkley, 1970) and only the ratios ki/T (i = 1, . . . , q) are consistent. When p is finite but n → ∞, it has been shown that change points can be estimated consistently. However, it is not clear how the data dimension affects the rate of convergence. We study the rate of convergence of our proposed change point estimator and find that it depends on the data dimension, sample size, noise level and signal strength. Consistency of the change point estimator is possible even in the high-dimensional case. Furthermore, we propose a binary segmentation procedure for identifying the locations of multiple change points, whose consistency is also established. Our work is related to, but different from, that of Li and Chen (2012), who considered a test for the equality of two covariance matrices with two independent samples. First, we consider a general homogeneity test of covariance matrices with more than two populations, while Li and Chen only considered a two-sample case. Second, Li and Chen considered the test for two independent samples, but our proposal can accommodate both temporal and spatial dependence. Moreover, our method is designed to test for the existence of change points among high-dimensional covariance matrices for longitudinal data. Therefore, the test procedure considered in this chapter is different from that in Li and Chen (2012). This chapter makes the following contributions. From a methodology perspective, the proposed test procedure provides a novel solution for change point detection problems in the large p, small n and T scenario. The test statistic combines the strength of maximal and Frobenius norms, and is powerful against the alternative. Second, we propose a method for estimating locations of change points among high-dimensional covariance matrices. The proposed change point detection and identification procedures are widely applicable without any sparsity assumption. We establish the asymptotic distribution of a test statistic for data with general temporal and spatial dependence. 
The identification procedure for multiple 15 change points is shown to be consistent. Our results reveal the impact of data dimension, sample size, and signal-to-noise ratio on the rate of convergence of the change point estimator. The proposed methods formally address two challenges that are unsolved in the existing covariance change point literature: the large p, small n and T issue, and spatial and temporal dependence. The remaining sections of this chapter are organized as follows. Section 2.2 details our basic settings with regards to covariance testing. In Section 2.3 we introduce our testing pro- cedure and test statistics along with their asymptotic distributions. Section 2.4 introduces an estimator for change point identification. Moreover, binary segmentation is proposed to identify multiple change points. Sections 2.5 and 2.6 demonstrate the finite sample perfor- mance of our procedures via simulation and analysis of a time-course microarray data set, respectively. All proofs of theorems and necessary lemmas are available in Section 2.7. 2.2 Basic setting Let Yit = (Yit1, . . . , Yitp)T be the observed p-dimensional random vector for the ith individual at time point t = 1, . . . , T , where T ≥ 2, and i = 1, . . . , n. Assume that Yit follows the model Yit = µt + εit, (2.3) where µt is a p-dimensional unknown mean vector and εit = (εit1, . . . , εitp)T is a multivariate normally distributed random error vector with mean zero and covariance var(εit) = Σt. A generalization to the non-Gaussian setup is given in Section 2.5. In addition, it is assumed that εit = ΓtZi for a p × m matrix Γt, where m ≥ pT , and Zi is an m-dimensional standard multivariate normally distributed random vector so that cov(εis, εjt) = ΓsΓT t = Cst if i = i=1 are independent, but {εit}T j ∈ {1, . . . , n} and is 0 if i (cid:54)= j. The random errors {εit}n depend on each other. Of interest is to test whether any change points among covariances occur at some time points t ∈ {1, . . . , T − 1}. We test the hypothesis H0 versus H1 specified in (2.1) and (2.2). If H0 is rejected, we further estimate the locations of change points. t=1 16 s1=1 2.3 Homogeneity tests of covariance matrices At each t ∈ {1, . . . , T − 1}, we define a measure Dt = w−1(t)(cid:80)t (cid:80)T s2=t+1 tr{(Σs1 − Σs2)2}, where w(t) = t(T − t). Measure Dt characterizes the differences among the covari- ances before t and after t. Clearly, Dt = 0 for all t ∈ {1, . . . , T − 1} under H0, and Dt (cid:54)= 0 for any t under H1. Therefore, max1≤t≤T−1 Dt = 0 under H0, and max1≤t≤T−1 Dt > 0 under H1. Thus, Dt is useful for distinguishing the null and alternative hypotheses. Measure Dt is different from measure S1,T =(cid:80)T−1 (cid:80)T s2=s1+1 tr{(Σs1 − Σs2)2} used in Schott (2007), who applied S1,T in constructing a homogeneity test specified in (2.1) for independent samples. In fact, for any t ∈ {1, . . . , T − 1}, Dt = S1,T − (S1,t + St+1,T ), where S1,t and St+1,T quantify the differences among covariances only before time t and only after s1=1 time t, respectively. These are not useful for measuring the differences among covariances before and after time t. Measure Dt removes both S1,t and St+1,T from S1,T . To construct an unbiased estimator of Dt, we need an unbiased estimator of tr(Σs1Σs2). We make use of U-statistic type estimators because they avoid bias that is not ignorable in a high- dimensional setup (Bai & Saranadasa, 1996; Chen & Qin, 2010). Otherwise, bias that limit the scope of applications. Let dices of sample subjects. 
For example, correction could be a challenge and require conditions on the data dimension and sample size ∼(cid:80) denote summation over mutually different in- ∼(cid:80) i,j,k means summation over {(i, j, k) ∈ {1, . . . , n} : )2 Yjs2 µs2)2 where P k n)(cid:80)n i(cid:54)=j(Y T is1 ∼(cid:80) Σs2µs1 + µT s2 i (cid:54)= j, j (cid:54)= k, k (cid:54)= i}. For any s1, s2 ∈ {1, . . . , T}, define Us1s2,0 = (1/P 2 as an unbiased estimator of tr(Σs1Σs2) + µT Σs1µs2 + (µT n = s1 s1 n!/(n − k)!. To remove the nuisance terms µT Σs2µs1 and (µT µs2)2, we define Us1s2,1 = s1 s1 (1/P 3 as an unbiased estimator of µT Σs2µs1 + (µT µs2)2 and, n) s1 s1 similarly, Us2s1,1 is an unbiased estimator of µT Σs1µs2 + (µT µs2)2. To remove the nui- s2 s1 sance term (µT µs2)2, we define Us1s2,2 = (1/P 4 i,j,k,l Y T n) as an unbiased Yjs2 s1 is1 estimator of (µT µs2)2. A computation efficient formulation of Us1s2,1 and Us1s2,2 is given s1 i,j,k Y T is1 ∼(cid:80) Y T ks1 Yls2 Yjs2 Y T js2 Yks1 17 in the Appendix. Finally, we define an unbiased estimator for tr(Σs1Σs2) as Us1s2 = Us1s2,0 − Us1s2,1 − Us2s1,1 + Us1s2,2. (2.4) The estimator Us1s2 is a generalization of the estimator for the trace of the covariance given by Chen et al. (2010) and Li and Chen (2012). For t = 1, . . . , T − 1, an unbiased estimator of Dt is ˆDnt = 1 w(t) (Us1s1 + Us2s2 − Us1s2 − Us2s1). (2.5) To study the asymptotic variance of ˆDnt for t = 1, . . . , T − 1, define (−1)|u−v|+|k−l|tr2(Csuhk CT svhl ) t(cid:88) T(cid:88) s1=1 s2=t+1 ∗(cid:88) (cid:88) s1,s2, h1,h2 u,v, k,l∈{1,2} V0t = ∗(cid:88) =(cid:80)t s1,s2, h1,h2 (cid:88) (cid:80)T and V1t = u,k∈{1,2} (−1)|u−k|tr{(Σs1 − Σs2)Csuhk (cid:80)t where(cid:80)∗ (cid:80)T = 0 for any su (cid:54)= hk, and V0t = (cid:80)∗ (cid:80)T (cid:80)t h2=t+1 . If no temporal dependence exists, then = s2=t+1. Up to a scale factor, this V0t is the part of the variance of ˆDnt for the case u,v∈{1,2} tr2(ΣsuΣsv ) where (cid:80)∗ (Σh1 − Σh2 )CT suhk }, s1,s2, h1,h2 (cid:80) Csuhk s1=1 s2=t+1 h1=1 s1=1 s1,s2 s1,s2 with independent samples under H0 . The asymptotic setting considered in this chapter is p(n) → ∞ as n → ∞, where p is considered to be a function of n. We do not require a specific relationship between p and n. Instead, for any t ∈ {1, . . . , T − 1}, we have two regularity conditions. For any matrix A, denote A⊗2 = AAT. Then: Condition 1. tr{(ΓT s2 Condition 2. tr(cid:2){(Γs1 + Γs2)T(Σs1 − Σs2)(Γs1 − Γs2)}⊗2(cid:3) = o(nV1t) for s1 ∈ {1, . . . , t} )⊗2} = o(V0t) for any s1, s2, h1, h2 ∈ {1, . . . , T}; Cs1h1 Γh2 and s2 ∈ {t + 1, . . . , T}. Condition 1 generalizes Condition 2 imposed by Li and Chen (2012) to a T -sample test with temporal dependence. If there is no temporal dependence, Condition 1 can be simplified 18 Σh1 Σh2 Σs1) = o(V0t). In general, the left-hand side of the equality in Condition )tr(Σs2Σs1Σs2Σs1)}1/2, which is of order O(p) if all to tr(Σs2Σs1Σh2 1 is bounded by {tr(Σh2 Σh1 the eigenvalues of Σt are bounded. If the temporal dependence is not overwhelming so that V0t (cid:16) pδ for any δ > 1, then Condition 1 holds. To appreciate this point, consider a null hypothesis case with Cst = (1 − rst,n)Σ for s, t ∈ {1, . . . , T}. Here 1 − rst,n measures the temporal correlation. If rst,n is small for all s, t, then the temporal dependence among {Yit}T If rst,n → 0 for all s, t, then V0t (cid:16) rntr2(Σ2) (cid:16) rnp2 provided all the eigenvalues of Σ are bounded. If the temporal dependence is not too strong so that 1/p = o(rn), then Condition 1 holds as p → ∞. 
Intuitively, Condition 1 implies that spatial and temporal dependence (cid:80) u,v,k,l∈{1,2}(−1)|u−v|+|k−l|rsuhk,nrsvhl,n. t=1 is strong. Let rn = (cid:80)∗ s1,s2,h1,h2 cannot be too strong. Condition 2 is automatically true under H0 because its left-hand side equals zero. Hence, it is not needed under H0. If there is no temporal dependence, it can be shown that the left- hand side of Condition 2 is tr(cid:8)(Σ2 − Σ2 s2 s1 )2(cid:9), whose order is not larger than V1t. Therefore, Condition 2 is not needed for data without temporal dependence. This condition implies that the alternatives should not be too far away from the null hypothesis. Otherwise, the alternatives are easy to detect because the test statistics would diverge to infinity. Theorem 1 states the mean and variance of ˆDnt. The proof is given in Section 2.7. Theorem 1. The expectation of ˆDnt is E( ˆDnt) = Dt. Under Condition 1, the leading order variance of ˆDnt is σ2 nt = w−2(t)(cid:0)4V0t/n2 + 8V1t/n(cid:1). Based on Theorem 1, we observe that E( ˆDnt) = Dt = 0 under H0. Under alternative H1 in (2.2), it is clear that E( ˆDnt) > 0 for all t under H1. Therefore, ˆDnt is able to distinguish the null and alternative hypotheses in (2.1) and (2.2). If T = 2 and no temporal dependence exists, V0t and V1t are, respectively, simplified 2) and V11 =(cid:80)∗ s1,s2 u=1 tr(cid:2){Σsu(Σs1 − Σs2)}2(cid:3), (cid:80)2 to V01 = tr2(Σ2 1) + 2tr2(Σ1Σ2) + tr2(Σ2 which are the same as those obtained by Li and Chen (2012). For a general case with temporal dependence, V01 = tr2(Σ2 1) + 2tr2(Σ1Σ2) + tr2(Σ2 2)− 4{tr2(Σ1C21) + tr2(Σ2C12)} + 19 12) + tr2(C12C12)}. The last four terms in V01, due to the temporal dependence, 2{tr2(C12CT are not included in Li and Chen’s test. However, in general, these four terms are not ignorable. Therefore, Li and Chen’s procedure is not suitable for temporal dependent data even in the two-sample case. We now study the asymptotic distribution of ˆDnt. The following theorem establishes the asymptotic normality of ˆDnt. The proof is given in Section 2.7. Theorem 2. Under Conditions 1–2, σ−1 where σ2 nt is defined in Theorem 1. nt ( ˆDnt − Dt) → N (0, 1) in distribution as n → ∞, We do not require explicit conditions on p and n in Theorem 2. The asymptotic normality holds provided Conditions 1–2 hold. In particular, we only need Condition 1 under the null hypothesis. Thus, our test is valid under Condition 1 without Condition 2, which is needed only for studying the power of the test. The normality assumption in model (2.3) is not essential and can be relaxed to a multivariate model as considered in Chen et al. (2010) and Li and Chen (2012). See Subsection 2.3.1 for the generalization to the non-Gaussian case. Under H0, Dt = 0 for all t ∈ {1, . . . , T −1}. Theorem 2 indicates that σ−1 ˆDnt converges nt,0 = 4V0t/{nw(t)}2 is the variance of ˆDnt under H0. An to N (0, 1) in distribution where σ2 asymptotic α-level rejection region is Rt = {σ−1 ˆDnt > zα}, where zα is the upper α quantile of the standard normal distribution. For each t ∈ {1, . . . , T − 1}, one can use Rt to test for the hypothesis in (2.1). Provided that one test based on ˆDnt rejects the null hypothesis, nt,0 nt,0 one may suspect that change points could exist among covariance matrices. Accordingly, t, in ˆDnt, could be considered as a tuning parameter, and it is hard to decide which t should be used for testing in practice. 
To make the proposed method free of any tuning parameter and adaptive to unknown change points, we propose the following statistic for testing the hypothesis in (2.1): Mn = max 1≤t≤T−1 ˆσ−1 nt,0 ˆDnt, (2.6) where ˆσ2 nt,0 = 4 ˆV0t/{nw(t)}2. The estimator ˆV0t can be constructed by replacing 20 CT svhl ). Define ) in V0t with Ususv,hkhl tr(Csuhk Ususv,hkhl (1/P 2 n)(cid:80)n , an unbiased estimator of tr(Csuhk CT svhl = Ususv,hkhl,0 − Ususv,hkhl,1 − Usvsu,hlhk,1 + Ususv,hkhl,2, where Ususv,hkhl,0 = YjsvY T i(cid:54)=j=1 Y T Yjhl µhl isu ihk suµsv µT + µT µhl µhk hk + µT µhl µT suCsvhl estimator of µT sv Csuhk Ususv,hkhl,2 = (1/P 4 n) estimators Ususv,hkhl,q, q = 1, 2, is similar to that for Us1,s2,q defined in (2.4). Yghl and an unbiased estimator of µT , Ususv,hkhl,1 = (1/P 3 n) suµsv µT hk sv Csuhk is an unbiased suµsv µT hk is an unbiased estimator of tr(Csuhk . A computation efficient formulation of the µhl YjsvY T ghk i,j,g,f Y T isu YjsvY T ihk i,j,g Y T isu ∼(cid:80) ∼(cid:80) CT svhl µhl is Yf hl )+µT + Under H0 and Condition 1, similar to the derivation in Lemma 4 in Section 2.7, the s1=1 Qn,tq = t(cid:88) leading order of the cov( ˆDnt, ˆDnq) is Qn,tq, where Vn0(s1, s2, h1, h2)/{w(t)w(q)} T(cid:88) q(cid:88) T(cid:88) and Vn0(s1, s2, h1, h2) = (4/n2)(cid:80) ˆDnq is Qn,ts/(cid:112)(Qn,ttQn,ss), which is the correlation u,v,k,l∈{1,2}(−1)|u−v|+|k−l|tr2(Csuhk Let VnD be a correlation matrix whose (t, s) component is Qn,ts/(cid:112)(Qn,ttQn,ss) for t, s ∈ variance between σ−1 nq,0 between ˆDnt and ˆDnq. ˆDnt and σ−1 ). Then the co- h2=q+1 s2=t+1 h1=1 svhl CT nq,0 {1, . . . , T−1}. Assume that VnD converges to VD as n → ∞. The following theorem provides the asymptotic distribution of Mn. Theorem 3. Under Condition 1, we have that under H0, Mn→W in distribution as n → ∞, where W = max1≤t≤T−1 Zt and Z = (Z1, . . . , ZT−1)T is a multivariate normally distributed random vector with mean 0 and covariance VD. According to Theorem 3, an α-level test for (2.1) rejects the null hypothesis if Mn > Wα, where Wα is the α-quantile of W such that pr(W > Wα) = α. Let Zn be a N (0, ˆVnD) dis- ( ˆQn,tt ˆQn,ss), tributed random vector with the (t, s) component of ˆVnD estimated by ˆQn,ts/ (cid:113) where ˆQn,ts = 4 n2w(t)w(s) t(cid:88) s(cid:88) T(cid:88) T(cid:88) (cid:88) (−1)|u−v|+|k−l|U 2 susv,hkhl , u,v,k,l∈{1,2} s1=1 h1=1 s2=t+1 h2=s+1 21 is defined just below (2.6). Simulations suggest that the plug-in estimates and Ususv,hkhl of the correlation matrix ˆVnD are reliable when the sample size is approximate 40 or above. See Section 2.7 for a detailed comparison between ˆVnD and VnD. The quantile Wα can be approximated by Wn,α obtained from the multivariate normal distribution by finding the quantile wn,α = (Wn,α, . . . , Wn,α)T satisfying pr(Zn < wn,α) = 1 − α. The quantile wn,α can be computed using the R package mvtnorm (Genz et al., 2018), and no simulation is needed to find quantile Wn,α. The lower bound for power based on Mn is pr(Mn > Wα) ≥ max 1≤t≤T−1 pr(ˆσ−1 nt,0 ˆDnt > Wα) = max 1≤t≤T−1 (cid:16) − σnt,0 σnt Φ (cid:17) , (2.7) Wα + Dt σnt where Φ(·) is the standard normal cumulative distribution function. If Dt/σnt dominates Wα, the right-hand side of (2.7) is the maximum power of the test using Rt constructed on a single ˆDnt, so the test based on Mn is more powerful than any test based on a single ˆDnt. 2.3.1 Non-Gaussian random errors To relax the Gaussian assumption, we assume the following data generation model for εi = T )T is a T p × m matrix with m ≥ T p (εT t = Cst. We assume Z1, . 
. . , Zn are independent and identically iT )T and εi = ΓZi where Γ = (ΓT such that Σ = ΓΓT and ΓsΓT 1 , . . . , ΓT i1, . . . , εT distributed m-dimensional random vectors such that E(Z1) = 0 and var(Z1) = Im. Write Z1 = (Z11, ..., Z1m)T. We assume that each Z1l has a uniformly bounded 8th moment. Also, we assume there exists a finite constant such that for l = 1, . . . , m, E(Z4 )··· E(Z for any integers lv ≥ 0 with(cid:80)q 1l) = 3 + ∆ and ), whenever ) = E(Z ··· Z v=1 lv = 8, E(Z lq 1iq l1 1i1 lq 1iq l1 1i1 i1, . . . , iq are distinct indices. Under Condition 1 and the above setup, it can be shown that the leading order of the 22 variance of ˆDnt is var( ˆDnt) = 4 n2w2(t) 8 + nw2(t) (Σh1 − Σh2 )CT suhk } ∗(cid:88) ∗(cid:88) s1,s2, h1,h2 (cid:88) (cid:88) u,v, s1,s2, h1,h2 + ∆tr{ΓT ) CT svhl k,l∈{1,2} (−1)|u−v|+|k−l|tr2(Csuhk (−1)|u−k|(cid:2)tr{(Σs1 − Σs2)Csuhk }(cid:3). u,k∈{1,2} su(Σs1 − Σs2)Γsu ◦ ΓT hk − Σh2 (Σh1 )Γhk Under the null hypothesis, var( ˆDnt) = 4V0t/{n2w2(t)}. The variance V0t can be estimated using the formula given below equation (2.6). The results in Theorems 2 and 3 can be established in a similar way. 2.3.2 Power-enhanced test for sparse alternatives The proposed test statistic, Mn, is powerful for alternatives with small absolute differences in many components of Σt. However, it might not be very powerful for sparse alternatives with the differences among Σt only residing in a few components. To enhance the power of the proposed test for sparse alternatives, we include an additional term with Mn, as an idea in Fan et al. (2015). Let ¯Ys1v =(cid:80)n s1, and define ˆσs1,uv =(cid:80)n between components u, v ∈ {1, . . . , p} at time s1. Define ˆDnt,uv =(cid:80)t ˆσs2,uv)2 as an estimator of Dnt,uv = (cid:80)t i=1 Yis1v/n be the sample mean of the vth component measured at time (cid:80)T i=1(Yis1u − ¯Ys1u)(Yis1v − ¯Ys1v)/(n − 1) as the sample covariance (cid:80)T s2=t+1(ˆσs1,uv− s2=t+1(σs1,uv − σs2,uv)2. The estimator (uv) skht be the (u, v) component of Cskht ˆDnt,uv is a consistent estimator of Dnt,uv. Let C (uv) ht is the (u, v) component of Σht . To define the variance of ˆDnt,uv, define the following s1=1 s1=1 and σ 23 notation: F (uv) skslhths = σ {C (uv) sl σ (uv) ht (uv) sl σ (uv) hs (uv) sk σ (uv) sk σ (uv) ht (uv) hs + σ + σ + σ C (uv) skhs + C (vu) hssk {C {C {C (vu) htsk (vu) hssl (vu) htsl C C C (uv) skht (uv) slhs (uv) slht C (vv) hssk (vv) htsk (vv) hssl (vv) htsl C } (uu) skhs } (uu) skht } (uu) slhs }, (uu) slht C C + C + C + C G (uv) skslhths = {C (vu) skhs C (uv) skhs + C + {C (vu) skht C (uv) skht + C C (vv) skhs (vv) skht }{C }{C (uu) skhs (uu) skht C (vu) slht C (uv) slht + C (vv) slht C (uu) slht } (vu) slhs C (uv) slhs + C (vv) slhs C (uu) slhs }. The leading order term of the variance of ˆDnt,uv is σ2 nt,uv = 1 w2(t) (−1)|k−l|+|s−t|{n−1F (uv) skslhths + n−2G (uv) skslhths }. (2.8) ∗(cid:88) (cid:88) s1,s2, h1,h2 k,l, s,t∈{1,2} ∗(cid:88) (cid:88) s1,s2, h1,h2 k,l, s,t∈{1,2} Under H0, the first term in (2.8) is 0. Namely, (−1)|k−l|+|s−t|F (uv) skslhths = 0. The leading term in the variance of ˆDnt,uv under H0 is σ2 nt,uv0 = (−1)|k−l|+|s−t|G (uv) skslhths /n2. ∗(cid:88) (cid:88) s1,s2, h1,h2 k,l, s,t∈{1,2} Let ˆG (uv) skslhths be a sample plug-in estimate of G (uv) skslhths , and ˆσ2 nt,uv0 be the corresponding sample estimate of σ2 nt,uv0. Then, the power-enhanced test statistic is (cid:110) ˆσ−1 nt,0 M∗ n = max 1≤t≤T−1 (cid:88) u≤v ˆDnt + λn I( ˆDnt,uv > δn,pˆσnt,uv0) , (cid:111) where δn,p and λn are tuning parameters. 
The tuning parameters are chosen such that the second part of M∗ n equals zero with probability tending to one under H0, and it converges to a large number under sparse alternatives. 24 We now discuss the choices for tuning parameters for the above power-enhanced test statistic. Let R = (ρij) be the correlation matrix corresponding to the common covariance Σ1 under H0. Define Nj(α) = card{i : |ρij| > (log p)−1−α} and Λ(r) = {i : |ρij| > r for some j (cid:54)= i}. We assume the following condition used in Cai et al. (2013). Condition 3. Suppose that there exists a α and a set π ⊂ {1, . . . , p} whose size is o(p) such that max1≤j≤p,j(cid:54)∈π Nj(α) = o(pγ) for all γ > 0. In addition, there exists a r < 1 and a sequence of numbers Λp,r = o(p) so that card{Λ(r)} ≤ Λp,r. as M∗ s1=1 n Define ls1s2 = max1≤u≤v≤p(ˆσs1,uv−ˆσs2,uv)2/σns1s2,uv0 where σ2 ns1s2,uv0 = var{(ˆσs1,uv− ˆσs2,uv)2} under H0. Similar to the proof of Theorem 1 in Cai et al. (2013), under Condition 3 and H0, we can show that (2.9) (cid:80)T Define Luv = ˆDnt,uv/ˆσnt,uv0 and Ln = max1≤u≤v≤p Luv. Denote the second term in M∗ s2=t+1 σns1s2,uv0/σnt,uv0 ≤ n1 = λn K, uniformly for all u, v for a constant K > 0, and uniform consistency of ˆσnt,uv0 to σnt,uv0, (cid:17) pr{ls1s2 − 4 log(p) + log log(p) ≤ t} → exp{− exp(−t/2)/(cid:112)(8π)}. (cid:80) u≤v I( ˆDnt,uv > δn,pˆσnt,uv0). Because(cid:80)t (cid:16) we have, under H0, pr(M∗ n1 = 0) ≥ pr (cid:26) (cid:26) (cid:110) ≥ 1 − t(cid:88) (ˆσs1,uv − ˆσs2,uv)2 t(cid:88) (ˆσs1,uv − ˆσs2,uv)2/σns1s2,uv0 ≤ δn,p/K σns1s2,uv0 (ˆσs1,uv − ˆσs2,uv)2 Ln ≤ δn,p) = pr( max ˆDnt,uv/ˆσnt,uv0 ≤ δn,p (cid:27) ≤ δn,p σns1s2,uv0 σnt,uv0 σns1s2,uv0 σnt,uv0 max 1≤s1≤t, t+1≤s2≤T max 1≤s1≤t, t+1≤s2≤T 1≤u≤v≤p T(cid:88) = pr max 1≤u≤v≤p T(cid:88) (cid:16) max 1≤u≤v≤p max 1≤u≤v≤p σns1s2,uv0 s1=1 s2=t+1 (cid:27) ≤ δn,p t(cid:88) s1=1 s2=t+1 (cid:17) pr ls1s2 > δn,p/K . T(cid:88) (cid:111) ≥ pr ≥ pr s1=1 s2=t+1 Applying the result in (2.9), if δn,p/K − 4 log(p) + log log(p) → ∞, then pr(M∗ n1 = 0) → 1. We suggest choose δn,p at the order of log(n) log(p) and λn to be a constant based on our 25 numerical experiments. In summary, the tuning parameters δn,p and λn ensure that, under the null hypothesis, M∗ n1 converges to zero with probability one. 2.4 Change point identification If H0 is rejected, then there exist change points among the covariances Σt. We first consider an alternative with one change point: H∗ 1 : Σ1 = ··· = Σk1 (cid:54)= Σk1+1 = ··· = ΣT , where k1 is the true change point, whose location is estimated by ˆk1 = arg max 1≤t≤T−1 ˆDnt. (2.10) (2.11) Define the weight function r(t; k) =  (T − k)/(T − t), k/t, 1 ≤ t ≤ k, k + 1 ≤ t ≤ T − 1. For any fixed value k ∈ {1, . . . , T − 1}, the function r(t; k) achieves its maximum value at t = k. Let βn = max1≤t≤T−1 max{√ theorem establishes the rate of convergence of the change point estimator ˆk1 obtained by (2.11) under the alternative H∗ 1 . V0t,(cid:112)(nV1t)} and ∆n = tr{(Σ1 − ΣT )2}. The next Theorem 4. Under the alternative H∗ its maximum at t = k1. Moreover, ˆk1 − k1 = Op{βn/(n∆n)}. 1 in (2.10), E( ˆDnt) = Dt = r(t; k1)∆n and Dt attains Since r(t; k1) achieves its maximum at t = k1, the first part of Theorem 4 indicates that t = k1 maximizes E( ˆDnt) as a function of t. This is the rationale for estimating k1 through √ (2.11). When the data dimension is fixed, ˆk1−k1 = Op(1/ n). The effect of data dimension is reflected both in βn and ∆n. Here βn can be considered as noise and ∆n can be viewed as signal. 
If the signal level is larger than the noise level, the rate of convergence of ˆk1 is faster n). On the other hand, if βn is not smaller than n∆n, ˆk1 is not consistent. 26 √ than Op(1/ Next, we consider the alternative, H1, with multiple change points k1 < ··· < kq, as 1 , we have shown in Theorem 4 that the maximum of Dt is specified in (2.2). Under H∗ attained at change point k1. Theorem 5. Under H1 in (2.2), the maximum value of Dt is attained at one of the change points among k1 < ··· < kq. If we estimate the multiple change points by repeatedly applying estimation methods in (2.11) to the population version Dt to all sub-sequences with non-zero Dt, Theorem 5 ensures that we find all the true change points. This property is important for applying the binary segmentation method to identify multiple change points as demonstrated by Venkatraman (1992) in an unpublished technical report. To describe the proposed binary segmentation method, we first define some notation. Let [It] represent the quantities computed based on the data within the time interval It, a subset of [1, T ]. For example, Mn[t1, t2] is the test statistic defined in (2.6) calculated based on Y [t1, t2], the data collected between time t = t1 and t = t2 for t1 < t2. Namely, Mn[t1, t2] = maxt1≤t L, and allows dependence among components within the vector Yit and dependence among {Yit}T at different time points. In the simulation studies, we set n = 40, 50 and 60, p = 500, 750 h=t−s At,hAT t=1 and 1000, T, = 5 and 8, and L = 3. The simulation results reported in Tables 2.1 and 2.2 were based on 500 replications. The results in Table 2.3 were based on 100 simulation replications. Let k1 = [T /2] be the largest integer no greater than T /2. For t ∈ {1, . . . , k1}, we set At,h = A(1) for h ∈ {0, . . . , L}. For t ∈ {k1 + 1, . . . , T} and h ∈ {0, . . . , L}, At,h = A(2). In setting (I), Two simulation settings were used for the generation of the A matrices. we set A(1) = (cid:8)0.6|i−j|I(|i − j| < p/5)(cid:9), and A(2) = (cid:8)(0.6 + δ)|i−j|I(|i − j| < p/5)(cid:9). If δ = 0, A(1) and A(2) are the same and the covariances of Yit are the same for all t. 28 Hence, the null hypothesis, H0, is true. the true change point. In setting (II), we set A(1) = (cid:8)(cid:0)|i − j| + 1(cid:1)−2I(|i − j| < p/5)(cid:9) and A(2) = (cid:8)(cid:0)|i − j| + δ + 1(cid:1)−2I(|i − j| < p/5)(cid:9). Similar to setting (I), a value of δ = 0 If δ (cid:54)= 0, the null hypothesis is false and k1 is corresponds to the null hypothesis being true. If δ (cid:54)= 0, k1 is the underlying true change point for the covariance matrices. Table 2.1 demonstrates the empirical size and power of the proposed test for the homo- geneity of covariance matrices under setting (I) at nominal level 0.05. We observe that the size of the proposed test is reasonably close to the nominal level. The power increases as n increases, as δ increases, and as T increases. Table 2.1 also provides the empirical size and power of the proposed test under simulation setting (II). The phenomena in setting (II) are very similar to those in setting (I). 
Table 2.1: Empirical size and power of the proposed test, percentages of simulation replications that reject the null hypothesis under settings (I) and (II) Setting δ (I) 0(size) 0.05 0.10 (II) 0(size) 0.10 0.20 T = 5 T = 8 n 40 50 60 40 50 60 40 50 60 40 50 60 40 50 60 40 50 60 500 4.6 4.6 6.0 21.4 37.0 45.6 99.6 100 100 4.4 5.6 4.8 33.4 44.2 65.4 99.8 99.8 100 p 750 4.8 5.2 4.4 27.6 36.0 49.2 100 100 100 5.4 4.6 4.6 35.8 48.6 63.6 99.8 100 100 1000 6.4 5.4 4.2 24.8 36.0 46.2 99.8 100 100 5.0 4.8 4.2 38.2 47.0 60.4 99.6 100 100 500 4.8 4.4 5.4 35.6 49.8 59.6 100 100 100 4.4 6.0 3.6 50.2 68.4 87.0 100 100 100 p 750 4.8 5.8 4.2 34.6 48.8 65.6 100 100 100 4.0 5.2 5.6 52.0 70.6 89.6 100 100 100 1000 4.4 4.6 3.6 34.2 52.0 65.0 100 100 100 4.8 5.6 5.0 51.6 74.0 88.0 100 100 100 The percentages of correct identification are summarized in Table 2.2 when the null 29 hypothesis is false under settings (I) and (II). The percentages of correct identification are the percentages of simulation replications that estimate the location of the change point correctly among all those that reject the null hypothesis. When T = 5, the true change point is k1 = 2, and when T = 8, the true change point is k1 = 4. In both settings, for almost all the cases, the percentages increase as n and δ increase. Table 2.2: Percentages of correct change point identification among all rejected hypotheses under settings (I) and (II) Setting δ (I) 0.05 0.10 0.10 0.20 (II) T = 5 p 750 37.96 45.81 53.28 96.60 98.60 99.80 45.51 61.51 69.72 90.98 92.60 96.20 n 40 50 60 40 50 60 40 50 60 40 50 60 500 41.12 51.35 52.63 93.17 98.00 99.20 49.10 65.00 72.78 90.58 93.37 97.00 1000 40.65 43.33 52.17 95.19 98.20 99.40 55.50 55.98 64.57 89.16 93.20 96.40 500 30.18 39.52 49.33 93.79 98.40 99.80 43.12 53.80 67.59 95.80 98.60 99.80 T = 8 p 750 29.88 39.34 49.70 93.80 97.20 98.60 47.15 61.19 72.99 96.00 98.20 99.80 1000 37.58 41.54 55.08 96.40 99.00 99.00 47.10 58.65 75.23 95.60 99.40 99.80 To demonstrate the performance of the proposed binary segmentation procedure for identifying multiple change points, we generated data using simulation setup (II) with two change points, k1 and k2. When T = 5, k1 = 2 and k2 = 4. When T = 8, k1 = 4 and k2 = 6. For t ∈ {kj−1 + 1, . . . , kj}, we set At,h = A(j) for h ∈ {0, . . . , L} and j = 1, 2, 3 with k0 = 0 and k3 = T . Here, A(1) and A(2) were set to be the same as those in setting (II), and we set A(3) = A(1). The values of δ were chosen to be 0.15 and 0.25. The average true positives and the average true negatives are summarized in Table 2.3. The true positives are the correctly-identified change points, and the true negatives are the correctly-identified time points where no covariance change exists. For T = 5, the maximum number of true positives and true negatives for each is 2. For T = 8, the maximum number of true positives and true negatives is 2 and 5, respectively. The results in Table 2.3 show that the proposed binary 30 segmentation procedure performs well as the sample size, n, increases and as the signal, δ, increases. Table 2.3: Average true positives and average true negatives for identifying multiple change points using the proposed binary segmentation method. Standard errors are included after each number. For T = 5, the maximum number of true positives and true negatives for each is 2. 
For T = 8, the maximum number of true positives and true negatives is 2 and 5, respectively δ=0.15 δ=0.25 T p 500 5 750 1000 500 8 750 1000 n ATP SE ATN SE ATP SE ATN SE 40 0.27 0.14 50 0.28 60 40 0.24 0.24 50 0.14 60 0.22 40 50 0.20 0.14 60 0.30 40 50 0.27 0.22 60 0.36 40 50 0.24 0.30 60 0.27 40 0.25 50 60 0.24 1.92 1.98 1.92 1.94 1.96 1.98 1.95 1.96 1.98 4.90 4.92 4.95 4.85 4.94 4.90 4.92 4.96 4.94 1.81 1.94 2.00 1.76 2.00 2.00 1.87 1.96 2.00 1.91 1.97 2.00 1.90 1.97 2.00 1.88 1.99 2.00 0.30 0.37 0.27 0.41 0.27 0.30 0.30 0.20 0.20 0.40 0.36 0.32 0.39 0.38 0.34 0.44 0.40 0.27 1.10 1.36 1.57 1.11 1.38 1.47 1.15 1.22 1.54 1.40 1.62 1.78 1.52 1.67 1.81 1.43 1.68 1.84 0.39 0.24 0.00 0.43 0.00 0.00 0.34 0.20 0.00 0.29 0.17 0.00 0.30 0.17 0.00 0.33 0.10 0.00 0.36 0.48 0.50 0.37 0.49 0.50 0.36 0.42 0.50 0.49 0.49 0.42 0.50 0.47 0.40 0.50 0.47 0.37 1.90 1.87 1.92 1.82 1.92 1.90 1.90 1.96 1.96 4.84 4.85 4.89 4.82 4.83 4.90 4.82 4.80 4.92 2.5.1 Power-enhanced test statistic We conducted a numerical simulation to illustrate the performance of the power-enhanced test statistic under sparse alternatives. The data were generated according to setting (I), except for a sparse alternative design. Specifically, let k1 = [T /2] be the largest integer no greater than T /2. For t ∈ {1, . . . , k1}, we set At,h = A(1) for h ∈ {0, . . . , L}. For t ∈ {k1 + 1, . . . , T}, we set At,h = A(2), where A null hypothesis, A h =(cid:8)0.6|i−j|I(|i − j| < p/5)(cid:9). Under the (2) (1) h was set equal to A h . Under the sparse alternative hypothesis, A h except the components within {|i − j| < 2, i < p/25} were set to 1.4. (1) was the same as A (2) h (1) 31 Table 2.4: Empirical size and power, percentages of simulation replications that reject the null hypothesis for the test statistic Mn and the power-enhanced test statistic M∗ n Mn M∗ n n 40 40 40 50 50 50 60 60 60 80 80 80 p 500 750 1,000 500 750 1,000 500 750 1,000 500 750 1,000 Null Alternative Null Alternative 5.2 3.2 4.6 5.2 6.4 3.4 3.8 4.8 4.2 4.0 3.2 4.6 67.6 62.8 62.4 91.6 94.2 97.0 98.8 99.2 99.8 100 100 100 35.4 34.6 36.4 47.8 47.2 52.6 56.8 65.8 66.4 81.2 86.4 84.8 6.0 3.6 4.6 5.6 6.6 3.4 4.4 5.6 4.2 4.0 3.8 4.6 Table 2.4 reports the empirical size and power of the test based on Mn and M∗ n. In the simulation, the tuning parameter δn,p was set to 0.5log(n) log(p), and λn was set to 0.15. We observe that both tests can control the type I error, and the power-enhanced test does not inflate the type I error. More importantly, the power-enhanced test statistic has greater power under the sparse alternative setting. 2.5.2 Non-Gaussian random errors To illustrate the numerical performance of the proposed method under the non-Gaussian setup, we generated data from the linear process model Yit = µt +(cid:80)L h=0 At,hηi(t−h) for i = 1, . . . , n and t = 1, . . . , T , where At,h is a p × p matrix, µt = 0 and ηit are p-dimensional random vectors with each element independently generated from a standardized Gamma distribution with shape parameter 4 and scale parameter 0.5. Let k1 = [T /2] be the largest integer no greater than T /2. For t ∈ {1, . . . , k1}, we set At,h = A(1) = (cid:8)0.6|i−j|I(|i − j| < p/5)(cid:9). For t ∈ {k1 + 1, . . . , T}, we set At,h = A(2) = (cid:8)(0.6+δ)|i−j|I(|i−j| < p/5)(cid:9). If δ = 0, A(1) and A(2) are the same. Hence, the covariances, Σt, are the same for all t ∈ {1, . . . , T} and H0 is true. If δ (cid:54)= 0, the null hypothesis is not 32 true and k1 is the underlying true covariance change point. 
In the simulation studies, we set n = 40, 50 and 60, with p = 500, 750 and 1000. The number of repeated measurements, T , was set to be 5 and 8 and set L = 3. The simulation results reported in Tables 2.5 and 2.6 were based on 500 simulation replications. Table 2.5: Empirical size and power of the proposed test, percentages of simulation replications that reject the null hypothesis for data generated from a standardized Gamma distribution under the nominal level 5% δ 0(size) 0.05 0.10 n 40 50 60 40 50 60 40 50 60 500 3.6 4.2 4.6 23.4 38.2 46.4 99.8 100 100 T = 5 T = 8 p 750 4.0 5.2 3.8 21.4 36.2 46.8 99.8 100 100 1000 4.4 4.4 4.6 28.2 33.4 46.2 100 100 100 500 4.0 5.2 5.0 35.2 47.8 64.6 100 100 100 p 750 3.6 4.8 5.0 38.6 50.4 67.2 100 100 100 1000 4.6 4.8 5.6 31.2 47.8 66.4 100 100 100 Table 2.5 reports the empirical size and power of the proposed test under the null and alternative hypotheses. We observe that Type I error is well controlled with the empirical sizes close to the nominal level of 5%. The results demonstrate the robustness of the pro- posed method for non-Gaussian distributed random vectors. When the differences between covariance matrices increase, the power of the proposed test increases accordingly. Table 2.6 reports the performance of the proposed change point identification procedure under the non-Gaussian distributed random vectors. We observe that the percentages of correct identification with non-Gaussian random vectors are similar to those under the Gaussian setup. 2.5.3 Accuracy of correlation matrix estimator of VnD This section aims to evaluate the numerical performance of the correlation matrix estimator, ˆVnD, proposed immediately following Theorem 3. To measure the difference between ˆVnD 33 Table 2.6: Percentages of correct change point identification among all rejected hypotheses for data generated from a standardized Gamma distribution δ 0.05 0.10 n 40 50 60 40 50 60 500 32.48 50.26 49.14 93.79 98.80 99.60 T = 5 p 750 42.99 52.49 55.56 96.79 99.40 99.20 T = 8 p 750 30.05 42.06 50.00 94.60 97.00 99.20 1000 30.13 46.86 52.41 95.00 97.20 99.20 1000 35.46 46.71 57.58 95.80 99.00 99.80 500 26.70 40.17 48.30 95.20 98.60 99.60 and VnD, we used the average component-wise quadratic distance, namely, (T − 1)−2(cid:107) ˆVnD − VnD(cid:107)2 F based on 500 simulation replications conducted in setting (I) under the null hypothesis with T = 5. We observe that F . Figure 2.1 illustrates the average of (T −1)−2(cid:107) ˆVnD−VnD(cid:107)2 the correlation matrix estimator, ˆVnD, is reliable when n = 40. The performance further improves as the sample size increases. Figure 2.1: The average component-wise quadratic distance between ˆVnD and VnD. The top solid line is for n = 40; the middle dashed line is for n = 50; the bottom dotted line is for n = 60. The scale of the y-axis is 10−5. 34 2.5.4 Comparison with a pair-wise based method In this section, we compare our proposed method with a pair-wise based method that is similar to the method proposed by Zalesky et al. (2014). In the pair-wise based method, we first obtain a p-value for testing the homogeneity of each component of the covariance matrix for every pair of coordinates (u, v) with u ≤ v and u, v ∈ {1, . . . , p}, and then apply the Bonferroni correction to all the p-values to control the family-wise error rate. In the first step, for each pair (u, v) with u ≤ v and u, v ∈ {1, . . . 
, p}, we test the following hypothesis versus H0,uv : σ1,uv = ··· = σT,uv, n(cid:80)T−1 H1,uv : σ1,uv = ··· = σk1,uv (cid:54)= σk1+1,uv = ··· = σkq,uv (cid:54)= σkq+1,uv = ··· = σT,uv. ˆDnt,uv. Under H0,uv, the asymptotic distribution of ˆDn,uv is (cid:80)∞ To test H0,uv, we apply the statistic ˆDnt,uv defined in Section 4, and define ˆDn,uv = l , where χ2 l are independent chi-square distributions with degree of freedom 1, and λl’s are the eigen- values of the kernel of ˆDn,uv. In practice, one can approximate the weighted chi-square l=1 λlχ2 t=1 distribution using a scaled chi-square distribution. Thus, we approximate the distribution of ˆDn,uv by bχ2 and variance of ˆDn,uv under H0,uv, respectively. The variance of ˆDn,uv under H0,uv is uv/(2µuv) and ν = 2µ2 uv. Here µuv and σ2 ν, where b = σ2 uv/σ2 uv are the mean T−1(cid:88) T−1(cid:88) t(cid:88) q(cid:88) T(cid:88) T(cid:88) (cid:88) t=1 q=1 s1=1 h1=1 s2=t+1 h2=q+1 k,l, s,t∈{1,2} σ2 uv = (−1)|k−l|+|s−t|G (uv) skslhths , where G (uv) skslhths is defined in Section 4. The mean of ˆDn,uv under the null H0,uv is T−1(cid:88) t(cid:88) T(cid:88) (cid:88) t=1 s1=1 s2=t+1 a,b∈{1,2} µuv = (−1)|a−b|{C (uu) sasb C (vv) sasb + C (uv) sasb C (vu) sasb }. We then approximate the distribution of ˆDn,uv by ˆbχ2 2ˆµ2 uv. The p-value for the (u, v) pair is computed as puv = pr(ˆbχ2 ˆν where ˆb = ˆσ2 uv/ˆσ2 ˆν > ˆDn,uv). uv/(2ˆµuv) and ˆν = 35 In the second step, we apply the Bonferroni correction to control the family-wise error rate. Define pmin = minu≤v puv as the minimum of all the pair-wise p-values. If pmin < 2α/{p(p + 1)}, then we reject the null hypothesis on the homogeneity of covariance matrices at the α level. To compare the proposed methods with the pair-wise based method, we conducted a simulation study using the simulation setup given in Subsection 2.5.1. The simulation re- sults are summarized in Table 2.7. We observe that the pair-wise based method has very conservative size under the null hypothesis when sample size is less than 80, but it improves as sample size increases. Under the alternatives, the power of the pair-wise based method is low for the small sample cases, but it increases as sample size increases to 80. However, in all the cases, our proposed power-enhanced method has superior power than the pair-wise based method. Table 2.7: Empirical size and power, percentages rejecting the null hypotheses in the simulations, for the pair-wise based test and the power-enhanced test statistic M∗ n n 40 40 40 50 50 50 60 60 60 80 80 80 p 500 750 1000 500 750 1000 500 750 1000 500 750 1000 M∗ n Pair-wise based test Null Alternative Null Alternative 0.2 0.0 0.0 0.4 0.2 0.2 0.6 0.2 0.6 0.4 2.4 2.0 67.6 62.8 62.4 91.6 94.2 97.0 98.8 99.2 99.8 100 100 100 0.2 0.4 0.0 0.4 0.0 0.2 12.2 4.8 1.0 97.6 98.8 96.8 6.0 3.6 4.6 5.6 6.6 3.4 4.4 5.6 4.2 4.0 3.8 4.6 2.6 An empirical study In this section, we apply our proposed method to a time-course gene expressions data set collected by Taylor et al. (2007). The goal was to identify gene sets with significant changes 36 in covariances over time and estimate their respective change points, should any exist. The data correspond to a study where peripheral blood mononuclear cells were collected from 69 patients with hepatitis C virus. The cells were collected once before treatment, day 0, and five times during treatment: days 1, 2, 7, 14 and 28. The treatment consisted of pegylated alpha interferon and ribavirin. More information about the experiment can be found in Taylor et al. (2007). 
Prior to the application of our methodology, the data were pre-processed. The gene expressions with low quality measurements were removed if the corresponding Microarray Suite 5.0 signal transcript was classified as absent. We only kept individuals with gene expression arrays at all six time points. After pre-processing, our data set consisted of 46 individuals with gene expression arrays at days 0, 1, 2, 7, 14 and 28. The original data set can be obtained at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7123. The genes were grouped into gene sets that were defined by gene ontology, which classifies genes according to attributes of the gene in three biological domains: molecular function, biological process, and cellular component (Ashburner et al., 2000). For instance, the gene ontology term labeled 0006468 is related to introducing a phosphate group onto a protein. Hence, this gene ontology term would consist of all the genes that have a role in the afore- mentioned biological process. A given gene can be a member of multiple gene ontologies. For example, in our processed data set, gene ontology 0006468 consists of 221 genes and gene ontology 0007155 consists of 134 genes, with 64 genes in common. After filtering the data set according to the procedure above, 159 gene ontology terms were analyzed. We applied our method to gene ontology terms with a minimum of 100 genes. Figure 2.2 illustrates the number of genes in the 159 gene ontology terms. Each gene set analyzed had a gene count much larger than the sample size of 46 patients. 37 Figure 2.2: Histogram of the number of genes among the 159 gene ontology terms analyzed. Let Y (g) it (i = 1, . . . , 46; t = 1, . . . , 6) be the gene expression data for the gth gene ontology term of the ith individual at time t, where t = 1 represents day 0, before treatment, and t = 2, 3, 4, 5, 6 represent the times during the treatment of hepatitis C virus with pegylated alpha interferon and ribavirin. Assume model (2.3) for each gene ontology term, Y (g) it = (g) (g) it ) = Σ . t it }T (g) t=1 µ (g) t + ε (g) it for g = 1, . . . , 159, where µ is an unknown mean vector and var(ε The assumptions on ε (g) it in model (2.3) incorporate temporal dependence so that {ε (g) t are dependent over time. For each gene ontology term, we tested whether the covariance matrices, Σ (g) t , are the same across all t. In addition, the change points were identified for those gene ontology terms found to be significant. For the gth gene ontology term, we computed ˆD (g) nt /ˆσ(g),nt,0 for t = 1, . . . , 5 and the covariance matrix estimation ˆV n be the maximum of the standardized test statistics { ˆV n5 }T . For each gene ontology term, the ˆD n }159 local false discovery rate was estimated using { ˜M(g) g=1 based on the method proposed by n,D}−1/2{ˆσ−1 n,D. Let ˜M(g) n1 , . . . , ˆσ−1 ˆD (g),n1,0 (g),n5,0 (g) (g) (g) (g) Efron (2007). As suggested in Efron (2007), a cutoff value of 0.20 was used for the local 38 false discovery rate procedure. There were 10 gene ontology terms that had a local false discovery rate less than or equal to 0.20. These 10 significant gene ontology terms and their corresponding number of genes, test statistic value, estimated change points, and local false discovery rate are listed in Table 2.8. Among those gene ontology terms listed in Table 2.8, term 0008285 is associated with the reduction or stoppage of cell proliferation. This is of interest, as Kannan et al. (2011) had noted that the hepatitis C virus reduces cell proliferation. 
Thus, the results here suggest that treatment using pegylated alpha interferon and ribavirin has some effect on the covariances of those genes that play a role in cellular proliferation. Table 2.8: Significant gene ontology terms, test statistic values, number of genes in each gene ontology term, identified change points and estimated local false discovery rates GO Number of Genes Test Statistic Value Change Points Local FDR 0006511 0030054 0042493 0008219 0006357 0005765 0019904 0008285 0048471 0005739 132 136 128 122 167 116 117 148 263 661 11.10 9.92 9.54 9.34 9.13 8.93 8.87 8.75 8.04 8.04 4, 5 1, 4, 5 5 4, 5 1, 4 4 4, 5 1, 2, 5 1, 4, 5 4, 5 0.012 0.044 0.064 0.076 0.090 0.103 0.106 0.115 0.168 0.168 After identifying ten significant gene ontology terms, we applied binary segmentation to identify all change points. We discovered that eight terms have a change point at t = 5, day 14, eight have a change point at t = 4, day 7, and four terms have a change point at t = 1, day 0. Recall that a change point at time t = 5 implies the covariance matrix at time t = 5 is not equal to that at time t = 6. Hence, most of the identified changes in the covariance matrices occurred by the initial day of treatment or later in the treatment cycle. These findings complement those of Taylor et al. (2007), who observed that the majority of the genes that were altered in expression occurred at the early days of treatment and again, marginally, between treatment days 7 and 28. To illustrate the changes in covariance 39 matrices, Figure 2.3 demonstrates the correlation networks of gene ontology term 0030054 at the six time points. We see that the correlation networks change at time points 1, 4 and 5, which is consistent with the identified change points reported in Table 2.8. Figure 2.3: Correlation network map for gene ontology term 0030054. Each dot represents a gene within the gene ontology. A link between dots indicates a strong correlation between genes. 2.7 Technical details 2.7.1 Proofs of lemmas In this section, we present the proofs to some lemmas used in the proofs of the main theorems. Without loss of generality, assume that µt = 0 in our proofs for each t ∈ {1, . . . , T} because the test statistic, ˆDnt, is invariant with respect to µt. Lemma 1. (i) For any symmetric matrices A and B with appropriate dimensions, we have tr2(AB) ≤ tr(A2)tr(B2); (ii) for any square matrix A, |tr(A2)| ≤ tr(AAT); and (iii) for any square matrix A, (cid:107)A2(cid:107)2 F = tr(BTB) is the Frobenius norm of B. F where (cid:107)B(cid:107)2 F ≤ (cid:107)ATA(cid:107)2 40 Proof. (i) Let A = (aij) and B = (bij). By the Cauchy-Schwarz inequality, (cid:88) (cid:88) aijbij ≤(cid:88) (cid:16)(cid:88) (cid:17)1/2(cid:16)(cid:88) (cid:17)1/2 ≤(cid:16)(cid:88) (cid:88) (cid:17)1/2(cid:16)(cid:88) (cid:88) (cid:17)1/2 . b2 ij a2 ij b2 ij a2 ij tr(AB) = i j i j j i j i j Since A and B are symmetric, the right-hand side of the above inequality is the square root of tr(A2)tr(B2). (ii) Assume that A = (aij) is any p× p matrix. If tr(A2) ≥ 0, because tr{(AT − A)(AT − A)T} ≥ 0 and tr{(AT − A)(AT − A)T} = 2tr(ATA)− 2tr(A2), we have |tr(A2)| ≤ tr(AAT). If tr(A2) < 0, because tr{(AT + A)(AT + A)T} ≥ 0 and tr{(AT + A)(AT + A)T} = 2tr(ATA) + 2tr(A2) = 2tr(ATA) − 2|tr(A2)|, we have |tr(A2)| ≤ tr(AAT). (iii) By definition, (cid:107)A2(cid:107)2 F = tr(ATATAA) = tr(ATAAAT). 
Since ATA and AAT are symmetric matrices, it follows by using part (i) that tr(ATAAAT) ≤ (cid:107)ATA(cid:107)F(cid:107)AAT(cid:107)F = (cid:107)ATA(cid:107)2 F , (cid:3) and this completes the proof. Lemma 2. Define Us1s2,0 = {n(n − 1)}−1(cid:80)n )2 for any s1, s2 ∈ {1, . . . , T}. Under Condition 1, the leading order term of the covariance between Us1s2,0 and Uh1h2,0 is Gn(s1, s2, h1, h2) = cov(Us1s2,0, Uh1h2,0), where i(cid:54)=j=1(Y T is1 Yjs2 Gn(s1, s2, h1, h2) = 2 n(n − 1) 2(n − 2) n(n − 1) + tr2(Cs1h1 CT (cid:88) 2 tr2(Cs1h2 CT s2h1 ) ) + n(n − 1) s2h2 tr(Σsuc Csuhv Σhvc CT ). suhv u,v∈{1,2} Denote uc as the complement set of {u}. That is, uc = {1, 2}/{u}. 41 Proof. Using the notation i(cid:54)=j=1 i(cid:54)=j=1 is1 is1 ih1 Yjs2 Yjs2 ∼(cid:80) defined in Section 2.3, we define n(cid:88) )2 − tr(Σs1Σs2)(cid:9)(cid:8)(Y T E(cid:2)(cid:8)(Y T n(cid:88) E(cid:2){(Y T ∼(cid:88) i,j,l E(cid:2){(Y T ∼(cid:88) i,j,l E(cid:2){(Y T ∼(cid:88) i,j,l E(cid:2){(Y T ∼(cid:88) i,j,l E(cid:2){(Y T ∼(cid:88) i,j,k,l E(cid:2){(Y T )2 − tr(Σs1Σs2)}{(Y T jh1 )2 − tr(Σs1Σs2)}{(Y T ih1 )2 − tr(Σs1Σs2)}{(Y T lh1 )2 − tr(Σs1Σs2)}{(Y T jh1 )2 − tr(Σs1Σs2)}{(Y T lh1 )2 − tr(Σs1Σs2)}{(Y T kh1 Yjs2 Yjs2 Yjs2 Yjs2 Yjs2 is1 is1 is1 is1 is1 L1 = L2 = L3 = L4 = L5 = L6 = L7 = 1 (P 2 n)2 1 (P 2 n)2 1 (P 2 n)2 1 (P 2 n)2 1 (P 2 n)2 1 (P 2 n)2 1 n)2 (P 2 )(cid:9)(cid:3), )}(cid:3), )}(cid:3), )}(cid:3), )}(cid:3), )}(cid:3), )}(cid:3). )2 − tr(Σh1 Σh2 Yjh2 )2 − tr(Σh1 Σh2 Yih2 Ylh2 Yih2 Ylh2 Yjh2 Σh2 Σh2 )2 − tr(Σh1 )2 − tr(Σh1 )2 − tr(Σh1 )2 − tr(Σh1 Σh2 )2 − tr(Σh1 Σh2 Σh2 Ylh2 )2} = tr(Σs1Σs2). Applying Then cov(Us1s2,0, Uh1h2,0) = L1 + ··· + L7 since E{(Y T is1 Yjs2 standard results in multivariate analysis, we obtain Yjs2 E{(Y T )2(Y T is1 ih1 + 2tr2(Cs2h2 Yjh2 Ch1s1 )2} = 2tr(Ch1s1 ) + 2tr(Σs2Cs1h1 Cs2h2 Σh2 Ch1s1 CT s1h1 Cs2h2 ) + 2tr(Σs1CT Σh1 Cs2h2 ) ) + tr(Σs2Σs1)tr(Σh2 s2h2 Σh1 ). This implies that L1 + L2 = (cid:104) tr{(Cs2h2 2 n(n − 1) + tr2(Cs2h1 + tr(Σs1Cs2h1 Ch1s1 )2} + tr{(Cs2h1 CT Σh1 Ch2s1 ) + tr(Σs1Cs2h2 CT s2h1 ) + tr(Σs2Cs1h2 s2h2 Σh1 Σh2 Ch2s1 )2} + tr2(Cs2h2 (cid:105) ) + tr(Σs2Cs1h1 CT Σh2 ) . s1h2 Ch1s1 ) CT s1h1 ) Furthermore, L7 = 0 and 6(cid:88) i=3 2(n − 2) n(n − 1) Li = (cid:88) u,v∈{1,2} tr(Σsuc Csuhv Σhvc CT suhv ), This with Condition 1 implies that Lemma 2 is valid. (cid:3) 42 Lemma 3. Define Us1s2,1 = (1/P 3 n) covariance between Us1s2,1 and Uh1h2,1 is 4 n3 cov(Us1s2,1, Uh1h2,1) = i,j,k Y T is1 Yjs2 Y T js2 Yks1 . The leading term in the ∼(cid:80) (cid:88) (cid:88) u,v∈{1,2} u,v∈{1,2} + 2 n2 tr2(Csuhv CT suchvc ) tr(Σsuc Csuhv Σhvc CT suhv ), where uc is the complement set of {u}. That is, uc = {1, 2}/{u}. In addition, var( ˆDnt,1) = o{var( ˆDnt,0)}. ∼(cid:88) ∼(cid:88) Proof. Because E(Us1s2,1) = 0, cov(Us1s2,1, Uh1h2,1) = E(Us1s2,1Uh1h2,1). By definition, Us1s2,1Uh1h2,1 = 1 (P 3 n)2 i,j,k i1,j1,k1 (Y T is1 Yjs2 Y T js2 Yks1 + Y T is2 Yjs1 Y T js1 Yks2 ) × (Y T i1h1 Yj1h2 Y T j1h2 Yk1h1 + Y T i1h2 Yj1h1 Y T j1h1 Yk1h2 ). According to the number of equivalent indices among two sets {i, j, k} and {i1, j1, k1}, we decompose Us1s2,1Uh1h2,1 into three terms. Let Ic = {i, j, k}∪{i1, j1, k1} where c represents the number of indices that are equivalent to each other in two sets {i, j, k} and {i1, j1, k1}. If there is one index equivalent, I1 = {(i = i1, j, k, j1, k1), (i = j1, j, k, i1, k1), (i, j, k = i1, i1, j1), (i, j = i1, k, j1, k1), (i, j = j1, k, i1, k1), (i, j = k1, k, i1, j1), (i, j, k = i1, j1, k1), (i, j, k = j1, i1, k1), (i, j, k = k1, i1, j1)}. 
For each case within I1, the expectation of corresponding summand in Us1s2,1Uh1h2,1 is 0. If there are two indices equivalent, I2 = {(i = i1, j = j1, k, k1), (i = j1, j = i1, k, k1), (i = i1, k = k1, j, j1), (i = k1, k = i1, j, j1), (j = j1, k = k1, i, i1), (j = k1, k = j1, i, i1), (i = i1, j = k1, k, j1), (i = j1, j = k1, k, i1), (i = i1, k = j1, j, k1), (i = k1, k = j1, j, i1), (j = j1, k = i1, i, k1), (j = k1, k = i1, i, j1)}. 43 Among all the cases in I2, there exist two cases {(i = i1, k = k1, j, j1), (i = k1, k = i1, j, j1)} whose expectations of the summand in Us1s2,1Uh1h2,1 are not zero. Similarly, if there are three indices equivalent, I3 = {(i = i1, j = j1, k = k1), (i = j1, j = i1, k = k1), (i = k1, j = j1, k = i1), (i = k1, j = i1, k = j1), (i = i1, j = k1, k = j1), (i = j1, j = k1, k = i1)}. Among all the cases in I3, there are two cases (i = i1, j = j1, k = k1) and (i = k1, j = j1, k = i1) that have non-zero expectation. In summary, E(Us1s2,1Uh1h2,1) = 2 n)2 E (P 3 × (Y T ih1 (cid:110) ∼(cid:88) (cid:110) ∼(cid:88) (cid:104) 2 n)2 E (P 3 (cid:88) × (Y T ih1 + = u,v∈{1,2} 2 P 3 n + tr{(Csuhv CT i,k,j,j1 (Y T is1 Yjs2 Y T js2 Yks1 + Y T is2 Yjs1 Y T js1 Yks2 ) Y T Yj1h2 j1h2 i,k,j (Y T is1 Ykh1 + Y T ih2 Yj1h1 Y T j1h1 Yjs2 Y T js2 Yks1 + Y T is2 Yjs1 Yks2 ) (cid:111) ) Ykh2 Y T js1 (cid:111) ) Ykh2 ) + tr2(Csuhv CT suchvc ) (cid:105) ) . suhv suhv Yjh1 Ykh1 Y T jh2 + Y T ih2 Y T Yjh2 jh1 (n − 3)tr(Σsuc Csuhv Σhvc CT suchvc )2} + tr(Σsuc Csuhv Σhvc CT ∼(cid:80) )(Y T ks1 Yls2 ). For any fixed u, v, k, l ∈ This completes the proof. (cid:3) Lemma 4. Define Us1s2,2 = (1/P 4 n) Yjs2 {1, 2}, the covariance between Ususv,2 and Uhkhl,2 is i,j,k,l (Y T is1 cov(Ususv,2, Uhkhl,2) = 2 P 4 n {tr2(Csuhk CT + 2tr(Csvhk CT svhl ) + tr(Csuhk CT svhl Csuhk CT svhl ) Csuhl CT suhk ) + 2tr(Csvhk CT svhl Csuhk CT svhl + 3tr(Csvhk CT suhl Csvhl CT suhk ) + 3tr(Csvhk CT suhl Moreover, var( ˆDnt,2) = o{var( ˆDnt,0)}. 44 ) suhl CT )tr(Csvhl suhk )}. Proof. Because E(Ususv,2) = 0, cov(Ususv,2, Uhkhl,2) = E(Ususv,2Uhkhl,2). Therefore, let R = E{(Y T isu Yjsv )(Y T ksu Ylsv )(Y T )}, so )(Y T Yj1hl Yl1hl k1hk cov(Ususv,2, Uhkhl,2) = = R i1,j1,k1,ll ∼(cid:88) i,j,k,l i1hk ∼(cid:88) 1 n)2 (P 4 {tr2(Csuhk 2 P 4 n CT + 2tr(Csvhk CT svhl ) + tr(Csuhk CT svhl Csuhk CT svhl ) Csuhl CT suhk ) + 2tr(Csvhk CT svhl Csuhk CT svhl ) suhl CT )}. + 3tr(Csvhk CT suhl Csvhl CT suhk ) + 3tr(Csvhk CT suhl )tr(Csvhl suhk This completes the proof of the first part. For the second part, write ˆDnt,2 = w−1(t)(cid:80)t (cid:88) ∗(cid:88) It follows by the first part that 1 var( ˆDnt,2) = 2 n4 w2(t) s1,s2,h1,h2 u,v,k,l∈{1,2} (cid:80)T s2=t+1 (cid:80) u,v∈{1,2}(−1)|u−v|Ususv,2. s1=1 (−1)|u−v|+|k−l|{tr2(Csuhk CT svhl ) + tr(Csuhk Csuhk ) + 2tr(Csvhk Csuhl + 2tr(Csvhk Csuhk ) + 3tr(Csvhk Csvhl CT svhl CT suhl CT ) suhk CT suhk ) CT svhl CT svhl CT svhl CT suhl CT + 3tr(Csvhk CT suhl )tr(Csvhl suhk )}. Applying the inequalities given in Lemma 1, we can show that var( ˆDnt,2) = o{var( ˆDnt,0)}. This completes the proof of this Lemma. (cid:3) Lemma 5. Let Z be an m-dimensional multivariate normally distributed random vector with mean 0 and covariance Im. Define M = ZZT − I. Assume A, B, C, D are matrices 45 with appropriate dimensions. 
Then E{tr(AM ATBM BT)} = tr2(ATB) + tr{(ATB)2} and cov{tr(AM ATBM BT), tr(CM CTDM DT)} = 2tr(ATB)tr(CTD)tr{(ATB + BTA)(CTD + DTC)} tr2{(ATB + BTA)(CTD + DTC)} + tr(cid:2){(ATB + BTA)(CTD + DTC)}2(cid:3) 1 2 + + 2tr(ATB)tr{(ATB + BTA)(CTDCTD + DTCDTC)} + 2tr(CTD)tr{(CTD + DTC)(ATBATB + BTABTA)} + 2tr{(ATBATB + BTABTA)(CTDCTD + DTCDTC)}. In particular, var{tr(AM ATBM BT)} = 2tr2(ATB)tr{(ATB + BTA)2} + + 4tr(ATB)tr{(ATB + BTA)(ATBATB + BTABTA)} + 2tr{(ATBATB + BTABTA)2} + tr{(ATB + BTA)4}. tr2{(ATB + BTA)2} 1 2 (cid:104) tr4(ATB) + tr2{(ATB)⊗2}(cid:105) Moreover, var{tr(AM ATBM BT)} ≤ K for a constant K > 0. Proof. We first consider E{tr(AM ATBM BT)}. Because M = ZZT − I, we have tr(AM ATBM BT) = (ZTATBZ)2 − ZTATBBTAZ − ZTBTAATBZ + tr(ATBBTA). (2.12) Taking expectation of the both sides of equation (2.12), we have E{tr(AM ATBM BT)} = tr2(ATB) + tr{(ATB)2} + tr(ATBBTA) − tr(ATBBTA) = tr2(ATB) + tr{(ATB)2}. 46 Next, we consider the covariance part. Using equation (2.12), we have tr(AM ATBM BT)tr(CM CTDM DT) = (ZTATBZ)2(ZTCTDZ)2 − (ZTATBZ)2ZTCTDDTCZ − (ZTATBZ)2ZTDTCCTDZ − (ZTATBZ)2tr(CTDDTC) − ZTBTAATBZ(ZTCTDZ)2 + (ZTBTAATBZ)(ZTCTDDTCZ) + (ZTBTAATBZ)(ZTDTCCTDZ) − (ZTBTAATBZ)tr(CTDDTC) − ZTATBBTAZ(ZTCTDZ)2 + ZTATBBTAZ(ZTCTDDTCZ) + ZTATBBTAZ(ZTDTCCTDZ) − ZTATBBTAZtr(CTDDTC) + tr(ATBBTA)(ZTCTDZ)2 − tr(ATBBTA)(ZTCTDDTCZ) − tr(ATBBTA)(ZTDTCCTDZ) − tr(ATBBTA)tr(CTDDTC). Define the terms in the above expression as J1, . . . , J16. We consider the expectation of each Ji for i = 1, . . . , 16. We have the following: E(J4) = [tr2(ATB) + tr{(ATB)2} + tr(ATBBTA)]tr(CTDDTC), E(J6) = tr(BTAATB)tr(CTDDTC) + 2tr(BTAATBCTDDTC), E(J7) = tr(BTAATB)tr(DTCCTD) + 2tr(BTAATBDTCCTD), E(J8) = E(J12) = E(J14) = −tr(BTAATB)tr(CTDDTC), E(J10) = tr(ATBBTA)tr(CTDDTC) + 2tr(ATBBTACTDDTC), E(J11) = tr(ATBBTA)tr(DTCCTD) + 2tr(ATBBTADTCCTD), E(J13) = tr(ATBBTA)[tr2(CTD) + tr{(CTD)2} + tr(CTDDTC)]. In addition, we can show that, for any matrices A, B, C of appropriate dimensions, E(ZTAZZTBZZTCZ) = tr(A)tr(B)tr(C) + tr(A){tr(BC) + tr(BTC)} + tr(B){tr(AC) + tr(ATC)} + tr(C){tr(AB) + tr(ATB)} + tr{(A + AT)(B + BT)(C + CT)}. 47 Applying the above formula to J2, J3, J5 and J9, we obtain −E(J2) = tr2(ATB)tr(CTDDTC) + tr(ATB){tr(ATBCTDDTC) + tr(BTACTDDTC)} + tr(ATB){tr(ATBCTDDTC) + tr(BTATCTDDTC)} + tr(CTDDTC){tr(ATBATB) + tr(BTAATB)} + 2tr{(ATB + BTA)2CTDDTC}. The expectation of J3 is the same as E(J2) above except for changing CTDDTC to DTCCTD. Similarly, −E(J5) = tr2(CTD)tr(BTAATB) + tr(CTD){tr(CTDBTAATB) + tr(DTCBTAATB)} + tr(CTD){tr(CTDBTAATB) + tr(DTCTBTAATB)} + tr(BTAATB){tr(CTDCTD) + tr(DTCCTD)} + 2tr{(CTD + DTC)2BTAATB}, and E(J9) is the same as E(J5) with replacing BTAATB with ATBBTA. Finally, we can show that E(J1) = tr2(ATB)tr2(CTD) + tr2(ATB)[tr{(CTD)2} + tr(CTDDTC)] + tr2(CTD)[tr{(ATB)2} + tr(ATBBTA)] + 4tr(ATB)tr(CTD){tr(ATBCTD) + tr(BTACTD)} + [tr{(ATB)2} + tr(ATBBTA)][tr{(CTD)2} + tr(CTDDTC)] + 2{tr(ATBCTD) + tr(BTACTD)}2 + 2tr(ATB)tr{(ATB + BTA)(CD + DTCT)2} + 2tr(CTD)tr{(ATB + BTA)2(CD + DTCT)} + tr{(ATB + BTA)2(CD + DTCT)2} + tr[{(ATB + BTA)(CD + DTCT)}2]. Summarizing the above E(Ji)’s, we obtain E{tr(AM ATBM BT)tr(CM CTDM DT)}. From this result and (2.12), we can obtain the covariance between tr(AM ATBM BT) and 48 tr(CM CTDM DT). The variance is a special case of the covariance. This completes the first part of the Lemma. Next, we prove the inequality given in the second part. 
Using the Cauchy-Schwarz inequality and Lemma 1, 2tr2(ATB)tr{(ATB + BTA)2} ≤ Ktr2(ATB)tr{(ATB)⊗2} and 2tr(ATB)tr{(ATB + BTA)(ATBATB + BTABTA)} ≤ 2tr(ATB)tr1/2{(ATB + BTA)2}tr1/2{(ATBATB + BTABTA)2} ≤ Ktr(ATB)tr1/2{(ATB)⊗2}tr1/2{(ATBATB)⊗2} ≤ Ktr(ATB)tr1/2{(ATB)⊗2}tr(cid:2){(ATB)T(ATB)}2(cid:3) Moreover, tr{(ATB + BTA)4} ≤ tr2{(ATB + BTA)2} ≤ Ktr2{(ATB)⊗2}. In summary, ≤ Ktr(ATB)tr3/2{(ATB)⊗2}. (cid:104) (cid:104) var{tr(AM ATBM BT)} ≤ K ≤ K tr(ATB)tr1/2{(ATB)⊗2} + tr{(ATB)⊗2}(cid:105)2 tr4(ATB) + tr2{(ATB)⊗2}(cid:105) . This finishes the proof of this Lemma. (cid:3) Define Vn0(s1, s2, h1, h2) = Vn1(s1, s2, h1, h2) = 4 n(n − 1) 8(n − 2) n(n − 1) (cid:88) (cid:88) u,v,k,l∈{1,2} u,v∈{1,2} (−1)−|u−v|−|k−l|tr2(Csuhk (−1)|u−v|tr{(Σs1 − Σs2)Csuhv (Σh1 svhl CT ), − Σh2 )CT suhv }. Lemma 6. Let Ws1s2 = Us1s1,0 + Us2s2,0 − Us1s2,0 − Us2s1,0. The covariance between Ws1s2 and Wh1h2 +Vn1(s1, s2, h1, h2) and Vn0(s1, s2, h1, h2) is the covariance between Ws1s2 and Wh1h2 H0. is Vu(s1, s2, h1, h2), where Vu(s1, s2, h1, h2) = Vn0(s1, s2, h1, h2) under 49 Proof. Let Gn(·) be the function defined in Lemma 2. It then follows, (−1)−|u−v|−|k−l|Gn(su, sv, hk, hl). Vu(s1, s2, h1, h2) = Applying Lemma 2, we have u,v,k,l∈{1,2} Vu(s1, s2, h1, h2) = + tr2(Csuhl CT svhk (−1)−|u−v|−|k−l|(cid:8)tr2(Csuhk (cid:88) (−1)−|u−v|−|k−l|(cid:8)tr(ΣsuCsvhl svhl CT ) n(n − 1) 2 )(cid:9) + 2(n − 2) n(n − 1) u,v,k,l∈{1,2} u,v,k,l∈{1,2} (cid:88) (cid:88) Σhk CT svhl ) )(cid:9). + tr(Σsv Csuhk Σhl CT suhk ) + tr(ΣsuCsvhk Σhl CT svhk ) + tr(Σsv Csuhl Σhk CT suhl Hence, Vu(s1, s2, h1, h2) = (cid:88) (cid:88) u,v,k,l∈{1,2} u,v,k,l∈{1,2} 4 n(n − 1) 8(n − 2) n(n − 1) + (−1)−|u−v|−|k−l|tr2(Csuhk (−1)−|u−v|−|k−l|tr(ΣsuCsvhl CT ) svhl Σhk CT svhl ). After some algebra, one can show the second term in the above expression is equivalent to Vn1(s, h, h1, h2). Under H0, Vn1(s, h, h1, h2) = 0. Therefore, Vu(s, h, h1, h2) = V0(s, h, h1, h2) is the co- variance under H0. This completes the proof of Lemma 6. (cid:3) 2.7.2 Proofs of main results In this section, we present proofs for the main results of Chapter 2. By definition, ˆDnt can be expressed as ˆDnt = ˆDnt,0 − 2 ˆDnt,1 + ˆDnt,2, where for k = 0, 1 and 2, ˆDnt,k = 1 t(T − t) (Us1s1,k + Us2s2,k − Us1s2,k − Us2s1,k). (2.13) t(cid:88) T(cid:88) s1=1 s2=t+1 Here Us1s2,k was defined in Section 2.3. 50 Proof of Theorem 1. Based on the definition of ˆDnt, the expectation of ˆDnt is t(cid:88) s=1 T(cid:88) h=t+1 t(cid:88) T(cid:88) s=1 h=t+1 E( ˆDnt) = 1 t tr(Σ2 s) + 1 T − t tr(Σ2 h) − 2 t(T − t) tr(ΣsΣh) = Dt. We next calculate the order of the variance of ˆDnt. By using the definition of ˆDnt, write ˆDnt as ˆDnt = ˆDnt,0 − 2 ˆDnt,1 + ˆDnt,2. By Lemmas 3 and 4, it follows that ˆDnt,1 = op( ˆDnt,0) and ˆDnt,2 = op( ˆDnt,0). Therefore, it suffices to compute the variance of ˆDnt,0. Using Lemma 6, nt = w−2(t) σ2 cov(Ws1s2, Wh1h2 ) = w−2(t) Vu(s1, s2, h1, h2). ∗(cid:88) s1,s2, h1,h2 ∗(cid:88) s1,s2, h1,h2 This completes the proof of Theorem 1. (cid:3) Proof of Theorem 2. By Theorem 1, it is sufficient to establish the asymptotic normality of ˆDnt,0. We first write ˆDnt,0 into a martingale. Define Ajsu = YjsuY T jsu − Σsu, and Gnj = 1 t(T − t) Qni = 1 t(T − t) t(cid:88) t(cid:88) u,v∈{1,2} T(cid:88) (cid:88) T(cid:88) (cid:88) ni = 2(cid:80)i−1 u,v∈{1,2} (1) s1=1 s2=t+1 s1=1 s2=t+1 Let Zni = Z ˆDnt,0 − Dt =(cid:80)n (1) ni + Z i=1 Zni, (2) ni , where Z (−1)|u−v|(cid:8)Y T isuAjsv Yisu − tr(ΣsuAjsv )(cid:9), (−1)|u−v|{Y T isuΣsv Yisu − tr(ΣsuΣsv )}. j=1 Gnj/{n(n − 1)} and Z (2) ni = 4Qni/n. 
Then, Let Fk be the σ-algebra generated by σ{Y1, . . . , Yk} where Yi = {Yi1, . . . , YiT} is the collection of Y for the i-th sample. It follows that E(Znk|Fk−1) = 0. Therefore, Znk is a sequence of martingale difference with respect to Fk. Let σ2 ni = E(Z2 ni|Fi−1). To prove the asymptotic normality, we check two following conditions (Hall and Hedye, 1980): Condition (a)(cid:80)n Condition (b)(cid:80)n i=1 σ2 ni/var( ˆDnt) p→ 1; i=1 E(Z4 ni)/var2( ˆDnt) → 0. 51 We first prove Condition (a). Consider E((cid:80)n Furthermore, var( ˆDnt) = (cid:80)n ni) + 2E{(cid:80) Thus, we have E((cid:80)n i=1 E(Z2 i=1 σ2 i=1 E(Z2 ni) =(cid:80)n ni) =(cid:80)n i 0, there exist a constant C such that pr{|ˆk1− k1| > Cβn/(n∆n)} < . This is equivalent to show that pr{ˆk1 ∈ B(C)} < . Since the event {ˆk1 ∈ B(C)} ⊂ {maxt∈B(C) }, ˆDnt > ˆDnk1 }. Thus, it suffices to show, for any  > 0, there pr{ˆk1 ∈ B(C)} ≤ pr{maxt∈B(C) exist a constant C such that ˆDnt > ˆDk1 pr( ˆDnt − Dk1 > ˆDnk1 − Dk1 ) < . (2.17) } ≤ (cid:88) t∈B(C) pr{ max t∈B(C) ˆDnt > ˆDk1 Under H∗ 1 , we have ˆDnt − Dk1 = ˆDnt − Dt + Dt − Dk1 = ˆDnt − Dt − |t − k1|G(t; k1)tr{(Σ1 − ΣT )2}, = ˆDnt − Dt + {r(t; k1) − 1}tr{(Σ1 − ΣT )2} 58 where G(t; k1) = {1/(T − t)}I(1 ≤ t ≤ k1) + (1/t)I(k1 + 1 ≤ t ≤ T − 1). Then, for t ∈ B(C), pr( ˆDnt > ˆDnk1 ) ≤ pr{| ˆDnt − Dt| > |t − k1|G(t; k1)∆n/2} − Dk1 + pr{| ˆDnk1 ≤ pr{|σ−1 + pr{|σ−1 nk1 nt ( ˆDnt − Dt)| > CβnG(t; k1)/(cid:112)(4V0t + 8nV1t)} | > |t − k1|G(t; k1)∆n/2} (cid:113) )| > CβnG(t; k1)/ ( ˆDnk1 − Dk1 (cid:112)(4V0t + 8nV1t). Furthermore, w(t) and G(t; k1) + 8nV1k1 (4V0k1 )}. For any t and some constant C1, βn > C1 are bounded away from zero for t ∈ B(C). Thus, by Chebyshev’s inequality, pr( ˆDnt > ˆDnk1 ) ≤ pr{|σ−1 nt ( ˆDnt − Dt)| > C} + pr{|σ−1 nk1 ( ˆDnk1 − Dk1 )| > C} ≤ 2 C2 < for large enough C. Therefore, (2.17) is true. This finishes the proof of Theorem 4. ,  T (cid:3) Proof of Theorem 5. Let k0 = 0 and kq+1 = T . Denote the common covariances between the change points kj and kj+1 as ˜Σj for j = 0, . . . , q. To show that maxt Dt is at one of the change points, it is enough to show that maxt Dt cannot be attained at any time points except change points k1, . . . , kq. Thus, we need to show that the maximum of Dt is not attainable for t in the following sets: (1) t ∈ {1, . . . , k1 − 1}; (2) t ∈ {kq + 1, . . . , T − 1}; and (3) t ∈ {kl + 1, . . . , kl+1 − 1} for some l ∈ {1, . . . , q − 1}. We do not need to consider case (1) if k1 = 1 or case (3) if kq = T − 1. Without loss of generality, we assume k1 > 1 and kq < T − 1 in the following proof. First, if t ∈ {1, . . . , k1 − 1}, then using the definition of Dt, we have T(cid:88) k1(cid:88) t(cid:88) 1 1 (cid:107)Σs1 − Σs2(cid:107)2 F + t(T − t) Dt = t(T − t) s1=1 s2=k1+1 (cid:107)Σs1 − Σs2(cid:107)2 F t(cid:88) T(cid:88) s1=1 s2=t+1 (cid:107) ˜Σ0 − Σs2(cid:107)2 F = 1 T − t s2=k1+1 which is an increasing function of t in this scenario. Therefore, the maximum value of Dt will not be at any t ∈ {1, . . . , k1 − 1}. 59 Second, if t ∈ {kq + 1, . . . , T − 1}, then kq(cid:88) T(cid:88) s1=1 s2=t+1 (cid:107) ˜Σq − Σs1(cid:107)2 F Dt = 1 t(T − t) kq(cid:88) s1=1 = 1 t (cid:107)Σs1 − Σs2(cid:107)2 F + 1 t(T − t) t(cid:88) T(cid:88) s1=kq+1 s2=t+1 (cid:107)Σs1 − Σs2(cid:107)2 F which is a decreasing function of t. Therefore, the maximum value of Dt will not be at any t ∈ {kq + 1, . . . , T − 1}. At last, let us consider the third case with t ∈ {kl + 1, . . . , kl+1 − 1} for some l ∈ {1, . . . , q − 1}. 
We rewrite Dt as Dt = 1 t(T − t) + (t − kl) j=l+1 (ki+1 − ki)(kj+1 − kj)(cid:107) ˜Σi − ˜Σj(cid:107)2 l−1(cid:88) (ki+1 − ki)(cid:107) ˜Σi − ˜Σl(cid:107)2 F + (kl+1 − t) (kj+1 − kj)(cid:107) ˜Σl − ˜Σj(cid:107)2 F F (cid:111) . j=l+1 i=0 Since (cid:107) ˜Σi − ˜Σj(cid:107)2 Dt as F = (cid:107) ˜Σi − ˜Σl(cid:107)2 F + (cid:107) ˜Σl − ˜Σj(cid:107)2 F + 2tr{( ˜Σi − ˜Σl)( ˜Σl − ˜Σj)}, we further write (cid:8)2∆ + tA + (T − t)B(cid:9), q(cid:88) (cid:110) l−1(cid:88) q(cid:88) i=0 Dt = 1 t(T − t) l−1(cid:88) q(cid:88) where i=0 ∆ = j=l+1 (ki+1 − ki)(kj+1 − kj)tr{( ˜Σi − ˜Σl)( ˜Σl − ˜Σj)}, A =(cid:80)q i=0(ki+1 − ki)(cid:107) ˜Σi − ˜Σl(cid:107)2 the fact that 1/{t(T − t)} = (1/T ){1/t + 1/(T − t)} to further write Dt as F and B =(cid:80)l−1 j=l+1(kj+1 − kj)(cid:107) ˜Σl − ˜Σj(cid:107)2 (cid:16) (cid:16) (cid:17) (cid:17) Dt = A + 1 t 2∆ T + 1 T − t B + 2∆ T . F . Then we can use We will consider four cases, (a)-(d), according to the signs of A + 2∆/T and B + 2∆/T . (a) If A + 2∆/T ≥ 0 and B + 2∆/T ≤ 0, then Dt is a decreasing function of t. In this case, the maximum of Dt will not be at any t for t ∈ {kl + 1, . . . , kl+1 − 1}. 60 (b) If A + 2∆/T ≤ 0 and B + 2∆/T ≥ 0, then Dt is an increasing function of t. In this case, the maximum of Dt will not be at any t for t ∈ {kl + 1, . . . , kl+1 − 1}. (c) If A + 2∆/T > 0 and B + 2∆/T > 0, then the derivative of Dt with respect to t is D(cid:48) t = 1 t2(T − t)2{(B − A)t2 + 2(A + t is always positive for t ∈ {kl + 1, . . . , kl+1 − 1}. Thus, to determine )T t − (A + 2∆ T )T 2}. 2∆ T t, we only need to know the sign of the numerator of D(cid:48) t. The denominator of D(cid:48) the sign of D(cid:48) The numerator of D(cid:48) t is a quadratic form of t. To know the sign of the numerator of D(cid:48) t, we consider two cases: (i) B > A and (ii) B < A. In the case (i) with B > A, one of the solution of t2(T − t)2D(cid:48) t0 ∈ (kl, kl+1), then D(cid:48) that the function Dt decreases for kl < t < t0 and increases for t0 < t < kl+1. Therefore, t is negative for kl < t < t0 and positive for t0 < t < kl+1. This implies t = 0 is less than 0, another solution t0 is greater than 0. If Dt attains its minimum at t0 and the maximum of Dt will not be attained within (kl, kl+1). If t0 (cid:54)∈ (kl, kl+1), then D(cid:48) t is either always negative or always positive for t ∈ (kl, kl+1). In this case, Dt is a monotonic function of t and hence the maximum of Dt will not be attained within (kl, kl+1). In the case (ii) with B < A, it can be shown that t2(T − t)2D(cid:48) t1, t2 = T(cid:2)(A + 2∆/T )/(A − B) ±(cid:112){(A + 2∆/T )/(A − B) − 1/2}2 − 1/4(cid:3). Here, t1, t2 t = 0 has two solutions, corresponds to the positive and negative sign, respectively. Because B + 2∆/T > 0, (A + 2∆/T )/(A − B) > 1. It follows that t2 > T . Similar to the case of B > A, if t1 ∈ (kl, kl+1), the function Dt decreases for kl < t < t1 and increases for t1 < t < kl+1. Therefore, Dt attains its minimum at t0 and the maximum of Dt will not be attained within (kl, kl+1). If t1 (cid:54)∈ (kl, kl+1), Dt is a monotone function of t and hence the maximum of Dt will not be attained within (kl, kl+1). In summary, the maximum of Dt will not be attained within (kl, kl+1) if A + 2∆/T > 0 and B + 2∆/T > 0. (d) If A + 2∆/T < 0 and B + 2∆/T < 0, then 2∆/T < 0 because A > 0 and B > 0. 61 Thus, A − 2|∆|/T < 0. 
Using the Cauchy-Schwarz inequality, we have A < 2|∆|/T ≤ l−1(cid:88) q(cid:88) (ki+1 − ki)(kj+1 − kj)((cid:107) ˜Σi − ˜Σl(cid:107)2 F + (cid:107) ˜Σl − ˜Σj(cid:107)2 F )/T i=0 j=l+1 = {(T − kl+1)A + klB}/T. The above inequality implies that A/B < kl/kl+1 < 1. On the other hand, B < 2|∆|/T ≤ {(T − kl+1)A + klB}/T, which implies that A/B > (T − kl)/(T − kl+1) > 1. This is a contradiction. Therefore, case (d) is not possible. By the results of (a)-(d), the maximum of Dt will not attain within {kl + 1, . . . , kl+1 − 1} for case (3). Thus, the proof is completed. (cid:3) Proof of Theorem 6. At the beginning of the binary segmentation algorithm, we have Mn[1, T ] > Wαn[1, T ] with probability one because, for any t ∈ {1, . . . , T − 1}, pr(Mn[1, T ] > Wαn[1, T ]) ≥ pr(σ−1 = pr(cid:8)σ−1 = 1 − Φ(cid:8)σ−1 nt [1, T ]( ˆDnt[1, T ] − Dt[1, T ]) > σ−1 nt [1, T ](σnt,0[1, T ]Wαn[1, T ] − Dt[1, T ])(cid:9) → 1, nt,0[1, T ] ˆDnt[1, T ] > Wαn[1, T ]) nt [1, T ](σnt,0[1, T ]Wαn[1, T ] − Dt[1, T ])(cid:9) where we used the condition Wαn = o(mSN R) in Theorem 6. Therefore, using Theorems 4 and 5, one change point in {k1, . . . , kq} will be detected and estimated with probability 1 because βn[1, T ] = o(nDks[1, T ]) for some s ∈ {1, . . . , q}. Each subsequence satisfies the condition Wαn = o(mSN R) in Theorem 6 and hence the detection continues. Suppose we have detected less than q change points. By the assumptions in this theorem, there exists a segment, {l1 + 1, . . . , l2}, that contains a change point, ks, such that Wαn = o(mSN R) and βn[(l1 + 1), l2] = o{nDks[(l1 + 1), l2]} hold. Therefore, by similar arguments as above, a change point will be detected and estimated consistently in the segment. Thus, ˆq ≥ q. Once ˆq reaches q, all subsequent segments have end points at the change points and two boundary points 1, k1, . . . , kq, T . Then, by Theorem 3, Mn[l1, l2] < Wαn with probability one as αn → 0. This implies that no additional change point will be detected. The proof is completed. (cid:3) 62 CHAPTER 3 COVARIANCE CHANGE POINT DETECTION AND IDENTIFICATION WITH HIGH-DIMENSIONAL FUNCTIONAL DATA 3.1 Introduction Access to high-dimensional data has exploded in recent years due to technological im- provements and cost reductions. High throughput technology has facilitated the collection of genomics data, with more variables being measured than ever before. In addition, the reductions in cost have allowed measurements to be taken over time, as is the case in time- course microarray studies. Similarly, functional neuroimaging studies repeatedly measure a massive number of variables throughout the duration of a medical experiment. Time-course microarray data and functional neuroimaging data are just two examples of applications that beget high-dimensional longitudinal, or functional, data, where a large number of vari- ables are repeatedly measured on a small number of experimental units. Throughout this chapter we focus on high-dimensional dense functional data, where the number of repeated measurements is large (Ramsay 1982). Functional magnetic resonance image (fMRI) data is an important example of high- dimensional functional data. In a task-based fMRI study, individuals perform various tasks while the fMRI machine records blood-oxygen-level dependent (BOLD) signals throughout their brain. These tasks may be passive or active. For example, subjects may be shown a movie, a sequence of pictures, or asked to respond to questions. 
In contrast, a resting-state fMRI does not involve any subject engagement, but aims to investigate the brain’s functional organization through the BOLD signal measurements. In the course of an fMRI study, the human brain is partitioned into small uniform cubes, also known as voxels, that are about the size of 1–3mm3. For each voxel, a BOLD measurement is recorded at each time point. A cluster of voxels is known as a node or region of interest, where clusters can be defined for 63 anatomical region of interest analysis or spherical region of interest analysis. BOLD signal measurements are repeatedly recorded for each of about 100,000 brain voxels between 100 to 2000 times for a single subject. The number of repeated measurements typically depends on the fMRI scanner and duration of the task-based or resting-state experiment. To enable population inference, multiple subjects are included in an fMRI study. Rather than analyze all 100,000 voxels, doctors may be interested in specific anatomical regions of the brain. However, a region of interest will still have voxel BOLD signal measurements at the order of 100. In addition to the sheer size of the data, fMRI data exhibit complex spatiotemporal dependence. For a given subject, BOLD measurements in neighboring voxels are correlated, as are BOLD measurements for a given voxel but across time points. The high-dimensional and dependent structure make statistical modeling, testing, and analysis a challenge. One major interest in neuroscience is to understand functional connectivity or dynamic functional connectivity at an individual or group level across time points (Kundu et al. 2018). We refer to dynamic functional connectivity as the changing relationships between spatially separated brain regions across experimental time points. In particular, we are inter- ested in studying dynamic functional connectivity across individuals. Traditional functional connectivity assumes stationary relationships between nodes throughout the experiment. To characterize the functional connectivity at a given time point, the covariance matrix, or precision matrix, of BOLD signals serve as a proxy for the within and between brain node neural activity. As a result, dynamic functional connectivity of the brain can be explored via a procedure that assesses covariance matrix stationarity. The purpose of this chapter is to develop a robust statistical procedure to detect and iden- tify change points among covariance matrices in high-dimensional functional data. Assume Yit = (Yit1, . . . , Yitp)T is a p-dimensional random vector with mean vector µt and covariance matrix Σt. In the context of an fMRI study, Yit (i = 1, . . . , n; t = 1, . . . , T ) represents the p BOLD signal measurements for the ith individual at the tth time point, where p, T , and n are typically at the order of 100,000, 100, and 10, respectively. For a specific region 64 of interest in the brain or for region of interest network analysis, p may be at the order of 100. Our proposed procedure aims to answer two questions. First, does a temporal change exist among covariance matrices? This corresponds to a covariance change point detection problem that can be posed in the form of a statistical hypothesis test H0 : Σ1 = ··· = ΣT H1 : Σ1 = ··· = Στ1 (cid:54)= Στ1+1 = ··· = Στq (cid:54)= Στq+1 = ··· = ΣT , versus (3.1) where τk < T (k = 1, . . . , q < ∞) are the unknown change point locations. Second, if a temporal change does exist, can we determine its location and the locations of all possible changes? 
This suggests a change point identification problem that aims to estimate the unknown locations of the τ_k's. Although we consider a high-dimensional setting, we do not require a sparsity assumption for Σ_t, and we allow the complex spatiotemporal dependence present in high-dimensional functional data. In the context of fMRI studies, our proposed procedure will first determine whether functional connectivity is stationary. If not, our change point identification procedure will partition the functional data into sequences that are stationary with regard to the covariance matrices.

Testing covariance matrices is a classical problem in multivariate statistical analysis. Muirhead (2005) and Anderson (2003) detailed multivariate tests for covariance matrices, including tests of the homogeneity of several covariance matrices. However, these tests rely on likelihood ratios, and they require the sample size to exceed the number of variables measured. Recent work by Schott (2007), Srivastava and Yanagihara (2010), and Li and Chen (2012) addressed the lack of an appropriate testing procedure for covariance matrices in a high-dimensional setting. More recently, Ahmad (2017) and Zhang et al. (2018) generalized aspects of the aforementioned works to an independent multi-sample test for high-dimensional covariance matrices. All of the research on testing high-dimensional covariance matrices since Schott's 2007 pioneering procedure has addressed the high-dimensional challenges. However, none of it has focused on how to incorporate temporal dependence in a high-dimensional setting. Therefore, none of the previously mentioned methods are applicable to high-dimensional functional data.

Researchers in neuroscience have developed a few methods to study dynamic functional brain connectivity for single patients and populations. However, in general, their methods are ad hoc and lack the theoretical rigor needed to ensure a robust inference procedure. Some neuroscience approaches were detailed in Chapter 2. Most of the existing work studies dynamic functional connectivity for an individual. For example, Monti et al. (2014) developed a sliding window approach based on pair-wise correlations to study dynamic functional connectivity. Their approach was based on a single subject and is not directly applicable to the study of the common dynamic functional connectivity for a population. Kundu et al. (2018) developed a procedure to test (3.1) with the aim of studying group-level brain dynamic functional connectivity in a task-based fMRI experiment. To detect and identify change points, Kundu et al. (2018) first compute all pair-wise correlations between p nodes at each time point. Thus, at each time point they obtain p(p − 1)/2 sample pair-wise correlations that they stack as a vector. Next, they apply a generalized fused Lasso (Tibshirani et al. 2005) approach to the multivariate time series of sample correlations. The fused Lasso was developed for an ordered set of covariates, and as is the case with the Lasso, it involves a penalty parameter. To tune the penalty parameter they use a lowess fit, which in turn depends on a smoothing parameter. With the fused Lasso, the number of identified change points is a function of the penalty parameter's value. A small value leads to more identified change points, whereas a large value leads to fewer identified change points.
In order to accurately identify all change points, they first fit the model with a small value of the tuning parameter and subsequently apply screening criteria to remove any false positive change points. They did not derive any theoretical results with regard to change point identification consistency, nor did they investigate the size or power of their proposed change point detection procedure. Furthermore, their method is heavily dependent on the choice of parameters. Our proposed procedure is free of tuning parameters and is theoretically rigorous.

While no methods in the existing literature are applicable to test (3.1) for high-dimensional functional data, the methods developed in Chapter 2 are also not applicable, for a few reasons. First, in Chapter 2 it was assumed that the number of repeated measurements is small. Numerical studies considered the finite sample performance when T = 5 and 8. A real data application was conducted where T = 6. Second, the asymptotic distribution of the test statistic and the rate of convergence for the change point estimator were derived under an asymptotic setting in which p and n diverge but T is finite. For a large number of repeated measurements, as is the case with dense functional data, it is more appropriate to consider an asymptotic setting in which p, n, and T diverge. Numerical simulation and real data applications should be based on theoretical results derived under this new asymptotic setting and not the one considered in Chapter 2. Third, the computation complexity of the proposed procedure in Chapter 2 was not a concern for small values of n and T. The overall computation complexity of the change point detection procedure detailed in Chapter 2 is O(pn^4T^6). To directly apply the procedure from Chapter 2 would be computationally impractical, if not impossible. Thus, in this chapter we aim to address these theoretical and computational challenges so that our procedure is applicable to high-dimensional functional data.

In addition to testing the hypotheses of (3.1), we also develop a method to estimate unknown change points. In Chapter 2, the rate of convergence was established under an asymptotic setting where p and n diverge but T is finite. In this chapter we investigate the rate of convergence of the change point estimator when p, n, and T all diverge. Much of the research in change point identification considers the scenario with n = 1. For instance, Aue et al. (2009) considered a p-dimensional multivariate time series where T diverges but under the assumption that p < T. Wang et al. (2017) considered covariance matrix change point identification for T independent p-dimensional sub-Gaussian random vectors. They also require p < T. Dette et al. (2018) proposed a two-stage covariance change point identification procedure based on T independent sub-Gaussian random vectors. Their first step involves dimension reduction governed by a regularization parameter. In the second step, they use a CUSUM-type statistic to estimate the locations of change points. Despite these recent advances, none of the aforementioned methods are applicable to identify change points among covariance matrices in high-dimensional functional data.

This chapter provides both theoretical and computational contributions to the field of statistics. From a theoretical perspective, a new asymptotic setting is considered, a setting suitable for high-dimensional functional data, in which n, p, and T diverge.
For T diverging, the test statistic forms a stochastic process. Convergence of the finite-dimensional distributions is not sufficient for weak convergence of a stochastic process. Thus, we extend the finite-dimensional result to establish weak convergence of our proposed test statistic. Furthermore, the rate of convergence of the change point estimator is now impacted by n, p, and T, as opposed to just n and p in Chapter 2. Our investigation reveals that the rate of convergence depends on the data dimension, sample size, number of repeated measurements, and signal-to-noise ratio. The change point estimator is shown to be consistent, provided the signal strength exceeds the noise. To our knowledge, the asymptotic framework in which n, p, and T all diverge has not previously been investigated with regard to change point identification among high-dimensional covariance matrices. From a computational perspective, we improve the efficiency of the methods developed in Chapter 2. This chapter considers T to be dense, so much of our attention is focused on computational efficiency for those statistics that have high orders of T. We introduce two recursive relationships and computationally efficient formulae to reduce the computation complexity from O(pn^4T^6) to O(pn^2T^4). A quantile approximation technique is shown to further decrease the complexity to the order of pn^2T^3. The approximation accuracy is demonstrated through simulation. These improvements are included in an R package, tecoma, which also affords an option for parallel computing. In the absence of these modifications, it would be impossible to apply our methods to fMRI data, or to any high-dimensional data set with a large number of repeated measurements.

The remaining sections of this chapter are organized as follows. Section 3.2 details the statistical model and our basic setting. Section 3.3 introduces the measure from Chapter 2 along with the unbiased estimator that is a linear combination of U-type statistics. The test statistic's asymptotic distribution is derived under the asymptotic framework in which n, p, and T diverge. Computational considerations with regard to the statistics are provided in Section 3.4. Section 3.5 introduces an estimator to identify the locations of change points should we reject H_0 of (3.1). The estimator's rate of convergence is studied, and two procedures are detailed to estimate the locations of multiple change points. Sections 3.6 and 3.7 demonstrate the finite sample performance via simulation and investigate the brain's functional connectivity through a task-based fMRI data set, respectively. All proofs and technical details are provided in Section 3.8.

3.2 Model

Suppose we have n independent individuals that have p variables recorded at each of T identical time points. Let Y_it = (Y_{it1}, ..., Y_{itp})^T be an observed p-dimensional random vector, where Y_it (i = 1, ..., n; t = 1, ..., T) is independently and identically distributed across the n individuals. Assume Y_it follows a general factor model, where

\[
Y_{it} = \mu_t + \Gamma_t Z_i, \tag{3.2}
\]

and μ_t is a p-dimensional unknown mean vector, Γ_t is an unknown p × m matrix such that m ≥ pT, and the Z_i's are independent m-dimensional multivariate standard normal random vectors. Since var(Z_i) = I_m, it follows that for the ith individual, cov(Γ_s Z_i, Γ_t Z_i) = Γ_s Γ_t^T. We define Γ_s Γ_t^T as C_st for different time points, s and t, and define Γ_t Γ_t^T as Σ_t.
Thus, for the ith individual,

\[
\mathrm{cov}(Y_{is}, Y_{it}) =
\begin{cases}
C_{st}, & s \ne t, \\
\Sigma_t, & s = t,
\end{cases}
\]

for all s, t ∈ {1, ..., T}. For individuals i ≠ j, cov(Y_is, Y_jt) = 0. By definition, C_st and Σ_t are p × p matrices for all s, t ∈ {1, ..., T}. No specific structure is required on the covariance matrices C_st and Σ_t. Their generality allows us to capture the spatiotemporal dependence in and among the random vectors Y_it (i = 1, ..., n; t = 1, ..., T). In the context of fMRI data, spatial dependence is present among neighboring voxels or nodes and is captured in both C_st and Σ_t. Temporal dependence exists for the same voxel or node across time points and is captured in the matrix C_st.

3.3 Change point detection

We consider the measure, D_t (t = 1, ..., T − 1), defined in Chapter 2, where

\[
D_t = \frac{1}{t(T-t)} \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \mathrm{tr}\{(\Sigma_{s_1} - \Sigma_{s_2})^2\}. \tag{3.3}
\]

To simplify notation, let t(T − t) be defined as w(t). The choice of D_t is motivated by the fact that we can distinguish between H_0 and H_1 based on the maximum value of D_t over all t ∈ {1, ..., T − 1}. Let 𝒯 = {1, ..., T − 1}. Under H_0 of (3.1), max_{t∈𝒯} D_t = 0, and under H_1, max_{t∈𝒯} D_t > 0.

Our test statistic is constructed in the same manner as detailed in Chapter 2. We use a linear combination of U-type statistic estimators to create an unbiased estimator of D_t. The quantity D_t can be expressed as D_t = w^{-1}(t) \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \{tr(Σ_{s_1}^2) + tr(Σ_{s_2}^2) − tr(Σ_{s_1}Σ_{s_2}) − tr(Σ_{s_2}Σ_{s_1})\}. An unbiased estimator for tr(Σ_{s_1}Σ_{s_2}) is given by U_{s_1s_2}, where

\[
U_{s_1s_2} = U_{s_1s_2,0} - U_{s_1s_2,1} - U_{s_2s_1,1} + U_{s_1s_2,2}, \tag{3.4}
\]

and

\[
P_n^2 U_{s_1s_2,0} = \sum_{i,j}^{\sim} (Y_{is_1}^T Y_{js_2})^2, \qquad
P_n^3 U_{s_1s_2,1} = \sum_{i,j,k}^{\sim} Y_{is_1}^T Y_{js_2} Y_{js_2}^T Y_{ks_1},
\]
\[
P_n^3 U_{s_2s_1,1} = \sum_{i,j,k}^{\sim} Y_{is_2}^T Y_{js_1} Y_{js_1}^T Y_{ks_2}, \qquad
P_n^4 U_{s_1s_2,2} = \sum_{i,j,k,l}^{\sim} Y_{is_1}^T Y_{js_2} Y_{ks_1}^T Y_{ls_2}.
\]

In the above expressions, the quantity P_n^k = n!/(n − k)!, and the ∼ summation notation represents the summation over mutually different indices. Thus, \sum^{\sim}_{i,j,k} is defined as the summation over i, j, and k such that i ≠ j, j ≠ k, and k ≠ i. Therefore, an unbiased estimator of D_t is

\[
\hat D_{nt} = \frac{1}{w(t)} \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} (U_{s_1s_1} + U_{s_2s_2} - U_{s_1s_2} - U_{s_2s_1})
= \frac{1}{w(t)} \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \sum_{a,b=1}^{2} (-1)^{|a-b|} U_{s_as_b}. \tag{3.5}
\]

In this chapter we consider a different asymptotic framework than that of Chapter 2. Chapter 2 considered p(n) → ∞ as n → ∞, where p is a function of n. We now consider p(n) → ∞ and T(n) → ∞ as n → ∞, where p and T are both functions of n. No specific functional form is required, and we do not require any specific relationships between p, T, and n. Thus, we allow for p > n and p > T. To establish the limiting distribution of D̂_nt, we assume Conditions 1–2 introduced in Section 2.3 along with the following two conditions. The notation \sum^{*}_{s_1,s_2,h_1,h_2} is defined as \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \sum_{h_1=1}^{t} \sum_{h_2=t+1}^{T}, and the quantity V_{0t} is given by (3.6).

Condition 4. \sum^{*}_{s_1,s_2,h_1,h_2} \mathrm{tr}^4(C_{s_uh_k} C_{s_vh_l}^T) = o(V_{0t}^2), for any u, k, v, l ∈ {1, 2}.

Condition 5. There exists a function ψ(k) such that ψ(k) > 0 and \sum_{k=1}^{\infty} ψ(k) < ∞. For any s_1, s_2 ∈ {1, ..., T}, tr^2(C_{s_1s_2}C_{s_2s_1}) ≍ ψ(|s_1 − s_2|) tr^2(Σ_{s_1}Σ_{s_2}).

In Condition 5, ≍ means of the same order. Thus, f(s) ≍ g(s) implies there exists a constant c_1 such that |f(s)| ≤ c_1|g(s)|, and there exists a constant c_2 such that |g(s)| ≤ c_2|f(s)|, for all s. Condition 5 imposes mild requirements on the spatiotemporal structure.
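Before examining these conditions further, a small numerical illustration of the detection measure (3.3) may be useful. The following R sketch (illustrative only, and not the interface of the tecoma package; the function name and toy covariance matrices are our own) evaluates D_t from a list of known covariance matrices.

```r
# Population measure D_t of (3.3) computed from a list of T covariance matrices.
D_measure <- function(Sigma, t) {
  T_len <- length(Sigma)
  total <- 0
  for (s1 in 1:t) {
    for (s2 in (t + 1):T_len) {
      M <- Sigma[[s1]] - Sigma[[s2]]
      total <- total + sum(M * M)  # tr{M^2} equals the sum of squared entries for symmetric M
    }
  }
  total / (t * (T_len - t))
}

# Toy example: a single covariance change at tau = 4 with T = 8.
p <- 5
Sig1 <- diag(p)
Sig2 <- diag(p) + 0.5  # adds 0.5 to every entry; still positive definite
Sigma <- c(replicate(4, Sig1, simplify = FALSE),
           replicate(4, Sig2, simplify = FALSE))
sapply(1:7, function(t) D_measure(Sigma, t))  # identically zero under H0; here peaks at t = 4
```

Under H_0 every difference Σ_{s_1} − Σ_{s_2} vanishes and the profile is identically zero; with the single change above, the profile is maximized at the true change point, which is exactly the property the test statistic and the estimator in Section 3.5 exploit.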
Condition 4 is also a mild condition. If no temporal dependence exists, then V_{0t} = \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \sum_{u,v∈\{1,2\}} tr^2(Σ_{s_u}Σ_{s_v}). Similarly, the left-hand side of Condition 4 is \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \sum_{u,v∈\{1,2\}} tr^4(Σ_{s_u}Σ_{s_v}). Furthermore, if all eigenvalues of Σ_t are bounded for all t ∈ {1, ..., T}, then V_{0t}^2 ≍ {t(T − t)p^2}^2. In comparison, the left-hand side of Condition 4 is of the order t(T − t)p^4. As a result, Condition 4 holds.

In Chapter 2, we derived the leading order variance of D̂_nt, that is, var(D̂_nt) = σ_{nt}^2{1 + o(1)}, where σ_{nt}^2 = w^{-2}(t)(4V_{0t}/n^2 + 8V_{1t}/n), and

\[
V_{0t} = \sum^{*}_{s_1,s_2,h_1,h_2} \sum_{u,v,k,l\in\{1,2\}} (-1)^{|u-v|+|k-l|}\, \mathrm{tr}^2(C_{s_uh_k} C_{s_vh_l}^T), \tag{3.6}
\]
\[
V_{1t} = \sum^{*}_{s_1,s_2,h_1,h_2} \sum_{u,k\in\{1,2\}} (-1)^{|u-k|}\, \mathrm{tr}\{(\Sigma_{s_1} - \Sigma_{s_2}) C_{s_uh_k} (\Sigma_{h_1} - \Sigma_{h_2}) C_{s_uh_k}^T\}. \tag{3.7}
\]

The theorem below establishes the asymptotic distribution of D̂_nt under the asymptotic setting considered in this chapter.

Theorem 7. Under Conditions 1–2 and 4, as n → ∞,

\[
\sigma_{nt}^{-1}\bigl(\hat D_{nt} - D_t\bigr) \xrightarrow{d} N(0, 1),
\]

where σ_{nt}^2 = w^{-2}(t)(4V_{0t}/n^2 + 8V_{1t}/n), and V_{0t} and V_{1t} are given in (3.6) and (3.7), respectively.

Under the null hypothesis, it follows that σ_{nt,0}^{-1} D̂_nt → N(0, 1) in distribution, where σ_{nt,0}^2 = w^{-2}(t)(4V_{0t}/n^2) and only Conditions 1 and 4 are required. To formulate an appropriate test procedure free of tuning parameters, consider the test statistic, M_n, of Chapter 2, where

\[
M_n = \max_{t\in\mathcal{T}} \hat\sigma_{nt,0}^{-1} \hat D_{nt}, \tag{3.8}
\]

and σ̂_{nt,0} is a plug-in estimator for σ_{nt,0}. Methods to construct σ̂_{nt,0} were detailed in Chapter 2. The following theorem establishes the asymptotic distribution of M_n under the setting where n, p, and T all diverge.

Theorem 8. Under Conditions 1, 4, and 5, H_0 of (3.1), and as n → ∞, M_n →_d max_{t∈𝒯} Z_t, where Z_t is a Gaussian process with mean 0 and covariance R_z.

We assume that as n → ∞, R_{n,z} converges to R_z, where R_{n,z} is a correlation matrix with (t, q) component defined as R_{n,tq} = corr(D̂_nt, D̂_nq). The leading order of cov(D̂_nt, D̂_nq) is w^{-1}(t)w^{-1}(q)(4V_{0,tq}/n^2), where

\[
V_{0,tq} = \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \sum_{h_1=1}^{q} \sum_{h_2=q+1}^{T} \sum_{u,v,k,l\in\{1,2\}} (-1)^{|u-v|+|k-l|}\, \mathrm{tr}^2(C_{s_uh_k} C_{s_vh_l}^T). \tag{3.9}
\]

In order to perform an α-level hypothesis test for (3.1), we must approximate R_{n,z} and thus require an estimator for V_{0,tq}. In Chapter 2, an unbiased estimator for tr(C_{s_uh_k} C_{s_vh_l}^T) was given as a linear combination of U-type statistics. Let R̂_{n,tq} be an estimator for the (t, q) component of R_{n,z}. Let W = max_{t∈𝒯} Z_t, where Z_t is a Gaussian process with mean 0 and covariance R_z, and define W_α as the quantity such that pr(W > W_α) = α. By Theorem 8, M_n → W in distribution, and an α-level test rejects the null hypothesis in (3.1) if M_n > W_α.

However, there is no simple and computationally efficient approach to obtain W_α. The random variable W depends on R_z. Chapter 2 proposed a procedure to approximate the quantile W_α on the basis of computing R̂_{n,tq} for each t, q ∈ {1, ..., T − 1}. However, the computation complexity of this approach, in terms of T, is at the order of T^4 for each component. Therefore, the total complexity is at the order of T^6 to compute R̂_{n,z}. As a result, it is not feasible to compute all components when T is large. As an attempt to alleviate this burden, we can further approximate the distribution of W by a Gumbel distribution. Under additional assumptions, and if T diverges, then

\[
\mathrm{pr}\bigl[M_n \le \{2\log(T) - \log\log(T) + x\}^{1/2}\bigr] \to \exp\{-(2\sqrt{\pi})^{-1}\exp(-x/2)\}.
\]
Accordingly, an α-level quantile is defined as {2 log(T) − log log(T) + x_α}^{1/2}, where x_α = −2 log{−2√π log(1 − α)}. However, the rate of convergence is at the order of log(T), which is slow. In addition, our simulation experiments demonstrated that the size of the test was not well controlled at the nominal level. Moreover, using an extreme value-type distribution does not eliminate the need to compute σ̂_{nt,0} for all t ∈ 𝒯; that overall cost, in terms of T, is at the order of T^5. Hence, we carefully consider an approximation procedure in Section 3.4 that improves efficiency and maintains accuracy.

3.4 Computation of the proposed statistics

The computation complexity of the change point detection procedure is at the order of pn^4T^6. To reduce the complexity, we re-formulate some of the statistics introduced in Section 3.3 in a computationally optimal manner. The computation complexity of U_{s_1s_2} is at the order of n^4 due to the term U_{s_1s_2,2}. In addition, the term U_{s_1s_2,1} has computation complexity at the order of n^3. To save computation cost, we can rewrite U_{s_1s_2,1} and U_{s_1s_2,2} defined in (2.4) in a computationally efficient form as follows. First, we consider U_{s_1s_2,1}, which can be rewritten as

\[
P_n^3 U_{s_1s_2,1} = \sum_{j=1}^{n}\Bigl(\sum_{i=1}^{n} Y_{is_1}^T Y_{js_2}\Bigr)^2 - \sum_{i,j=1}^{n} (Y_{is_1}^T Y_{js_2})^2 - 2\sum_{j=1}^{n}\sum_{k\ne j} Y_{js_1}^T Y_{js_2} Y_{js_2}^T Y_{ks_1}. \tag{3.10}
\]

Therefore, the computation complexity of U_{s_1s_2,1} with regard to the sample subjects is at the order of n^2, not n^3. To write U_{s_1s_2,2} in a computationally efficient form, we first define V_{s_1s_2,1} = (1/P_n^3) \sum^{\sim}_{i,j,k} Y_{is_1}^T Y_{js_2} Y_{js_1}^T Y_{ks_2}. Similar to U_{s_1s_2,1}, we can write V_{s_1s_2,1} as

\[
P_n^3 V_{s_1s_2,1} = \sum_{j=1}^{n}\Bigl(\sum_{i=1}^{n} Y_{is_1}^T Y_{js_2}\Bigr)\Bigl(\sum_{i=1}^{n} Y_{is_2}^T Y_{js_1}\Bigr) - \sum_{i,j=1}^{n} Y_{is_1}^T Y_{js_2} Y_{js_1}^T Y_{is_2} - \sum_{j=1}^{n}\sum_{k\ne j} Y_{js_1}^T Y_{js_2} Y_{js_1}^T Y_{ks_2} - \sum_{i\ne j} Y_{is_1}^T Y_{js_2} Y_{js_1}^T Y_{js_2}.
\]

The computation complexity of V_{s_1s_2,1} with regard to the sample subjects is also at the order of n^2. Finally, we can write U_{s_1s_2,2} as

\[
P_n^4 U_{s_1s_2,2} = \Bigl(\sum_{i\ne j} Y_{is_1}^T Y_{js_2}\Bigr)^2 - P_n^3(U_{s_1s_2,1} + U_{s_2s_1,1} + 2V_{s_1s_2,1}) - P_n^2 U_{s_1s_2,0} - \sum_{i\ne j} (Y_{is_1}^T Y_{js_2})(Y_{is_2}^T Y_{js_1}). \tag{3.11}
\]

Based on the above expression for P_n^4 U_{s_1s_2,2}, we can also see that the computation complexity of U_{s_1s_2,2} with regard to the sample subjects is at the order of n^2. In summary, the computation cost of the proposed statistic U_{s_1s_2} with regard to the sample subjects is at the order of n^2. These computationally efficient expressions can be derived in a similar manner for U_{s_us_v,h_kh_l}, the term used as a plug-in estimator primarily to compute R̂_{n,tq}.

The computation complexity of D̂_nt in (3.5) in terms of T is at the order of T^3. To reduce the complexity in terms of T, we write D̂_nt recursively. Let f(s_1, s_2) = U_{s_1s_1} + U_{s_2s_2} − U_{s_1s_2} − U_{s_2s_1} for s_1, s_2 ∈ {1, ..., T}. By definition, it follows that for t ≥ 2,

\[
\hat D_{nt} = \frac{w(t-1)}{w(t)} \hat D_{n(t-1)} - w^{-1}(t)\sum_{k=1}^{t-1} f(k, t) + w^{-1}(t)\sum_{k=t+1}^{T} f(t, k). \tag{3.12}
\]

When t = 1, the computation complexity of D̂_{n1} is at the order of T. Therefore, by (3.12), for each t ∈ {1, ..., T − 1} the computation complexity in terms of T is at the order of T. Since we compute D̂_nt for all t ∈ {1, ..., T − 1}, the total computation complexity in terms of T is at the order of T^2 rather than T^3. As a result, the overall computation complexity to compute D̂_nt for all t ∈ {1, ..., T − 1} is at the order of pn^2T^2.
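The identities (3.10) and (3.11) reduce every term to operations on an n × n matrix of inner products. The R sketch below (again illustrative rather than the tecoma implementation; the function name and argument layout are assumptions) evaluates U_{s_1s_2} in O(n^2) time from the two n × p data matrices.

```r
# O(n^2) evaluation of U_{s1 s2} via (3.10) and (3.11).
# Y1, Y2: n x p data matrices observed at time points s1 and s2.
U_pair <- function(Y1, Y2) {
  n  <- nrow(Y1)
  P2 <- n * (n - 1); P3 <- P2 * (n - 2); P4 <- P3 * (n - 3)
  A  <- Y1 %*% t(Y2)                 # A[i, j] = t(Y_{i,s1}) %*% Y_{j,s2}
  d  <- diag(A)
  cs <- colSums(A); rs <- rowSums(A)
  U0  <- (sum(A^2) - sum(d^2)) / P2                                # sum over i != j
  U1a <- (sum(cs^2) - sum(A^2) - 2 * sum(d * (cs - d))) / P3       # (3.10)
  U1b <- (sum(rs^2) - sum(A^2) - 2 * sum(d * (rs - d))) / P3       # (s2, s1) version
  V1  <- (sum(cs * rs) - sum(A * t(A)) -
          sum(d * (rs - d)) - sum(d * (cs - d))) / P3
  S     <- sum(A) - sum(d)                   # sum over i != j of A[i, j]
  cross <- sum(A * t(A)) - sum(d^2)          # sum over i != j of A[i, j] * A[j, i]
  U2  <- (S^2 - P3 * (U1a + U1b + 2 * V1) - P2 * U0 - cross) / P4  # (3.11)
  U0 - U1a - U1b + U2                        # unbiased estimate of tr(Sigma_s1 Sigma_s2)
}
```

Calling U_pair(Y1, Y1) returns the estimate of tr(Σ_{s_1}^2), so the quantities f(s_1, s_2) needed by the recursion (3.12) can be assembled from one O(n^2 p) matrix product per pair of time points.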
Parallel computing can further decrease the computation time. The greatest cost in terms of computation is due to R̂_{n,tq} for all t, q ∈ 𝒯, where the complexity is at the order of pn^2T^6 provided (3.10) and (3.11) are applied. To reduce the complexity, we express R̂_{n,tq} recursively. Let

\[
g(s_1, h_1, s_2, h_2) = \sum_{u,v,k,l\in\{1,2\}} (-1)^{|u-v|+|k-l|}\, U^2_{s_us_vh_kh_l}, \qquad
h(t, q) = \sum_{s_1=1}^{t} \sum_{s_2=t+1}^{T} \sum_{h_1=1}^{q} \sum_{h_2=q+1}^{T} g(s_1, h_1, s_2, h_2).
\]

Thus, n^2 w(t) w(q) \hat V_{0,tq}/4 = h(t, q). Suppose the quantity h(t, q − 1) is known for t ∈ {1, ..., T − 2} and q ∈ {2, ..., T − 1}. For a fixed t,

\[
h(t, q) = h(t, q - 1) - \sum_{j=t}^{q-1} \sum_{k=t+1}^{T} g(t, j, k, q) + \sum_{j=t+1}^{T} \sum_{k=q+1}^{T} g(t, q, j, k). \tag{3.13}
\]

An analogous recursive formula can be derived to traverse a fixed column, where h(t − 1, q) is known and we want to compute h(t, q). Based on the recursive formula in (3.13), the computation complexity in terms of T is at the order of T^2. By the definition of R̂_{n,tq}, each of R̂_{n,1,1}, R̂_{n,1,T−1}, and R̂_{n,T−1,T−1} can be computed at a computation complexity, in terms of T, at the order of T^2. Therefore, based on (3.13) and the fact that we must compute R̂_{n,tq} for all t < q, the overall computation complexity for R̂_{n,z} is at the order of pn^2T^4.

Despite this reduction, the complexity can be further improved via linear interpolation on a sparse form of R̂_{n,z}. Rather than compute R̂_{n,tq} for all t, q ∈ {1, ..., T − 1}, we can compute h = (b + I) off-diagonals of the matrix and interpolate the remaining values. Let b be the number of consecutive off-diagonals immediately following the main diagonal, and let I be the number of off-diagonals computed at a fixed interval after the b consecutive off-diagonals. Let diag(R̂_{n,1,d+1}) be the dth off-diagonal, where d ∈ {1, ..., T − 2}. For an efficient approximation of R̂_{n,z}, first compute R̂_{n,1,1}. Next, apply (3.13) to compute diag(R̂_{n,1,2}), ..., diag(R̂_{n,1,b}) for the corresponding b off-diagonals. Lastly, apply formula (3.13) to compute diag(R̂_{n,1,I_1}), ..., diag(R̂_{n,1,I_I}), which correspond to the I off-diagonals at a fixed interval. Each of these I off-diagonals has an initial computation, in terms of T, at the order of T^3. Parallel processing can be utilized to start each off-diagonal's computation independently. The overall complexity in terms of T will be at the order of hT^3 to obtain a sparse version of R̂_{n,z}. Linear interpolation is then used to estimate the components not computed. Based on our simulations, linear interpolation results in a negligible loss in power, and the size remains near the nominal level. Full simulation results for the linear interpolation method are available in Section 3.6.

Figure 3.1: Accuracy of linear interpolation for R̂_{n,tq}. Black circles represent R̂_{n,1q} for all q ∈ {1, ..., T − 1}. Red triangles represent the corresponding interpolated values.

Figure 3.1 illustrates R̂_{n,1,q} for all q ∈ {1, ..., T − 1} and the corresponding interpolated values based on the parameters b = 20 and I = 8. The fixed interval between off-diagonals was set to ten. The accuracy of the linear interpolation is evident under both the null and alternative hypotheses. Therefore, the computation complexity in terms of T for R̂_{n,tq} can be reduced from T^4 to hT^3.
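The following R sketch puts the two approximation devices together: filling a sparse set of off-diagonals of R̂_{n,z}, linearly interpolating the rest, and then drawing from the resulting Gaussian process to approximate the quantile W_α by Monte Carlo. It is a minimal illustration under stated assumptions (compute_entry(t, q) stands in for the U-statistic evaluation of R̂_{n,tq}; the interpolated matrix is assumed positive definite), not the tecoma interface.

```r
# Sparse evaluation of R-hat plus row-wise linear interpolation.
# Tm1 = T - 1; compute_entry(t, q) is assumed to return R-hat_{n,tq}.
# Assumes Tm1 - 1 >= b + step, so the thinned off-diagonals exist.
sparse_interp_R <- function(Tm1, compute_entry, b = 20, step = 10) {
  keep <- unique(c(1:b, seq(b + step, Tm1 - 1, by = step), Tm1 - 1))
  R <- matrix(NA_real_, Tm1, Tm1)
  diag(R) <- 1
  for (d in keep)                        # fill the retained off-diagonals
    for (t in 1:(Tm1 - d))
      R[t, t + d] <- R[t + d, t] <- compute_entry(t, t + d)
  for (t in 1:Tm1) {                     # interpolate the skipped entries
    known <- which(!is.na(R[t, ]))
    R[t, ] <- approx(known, R[t, known], xout = 1:Tm1, rule = 2)$y
  }
  (R + t(R)) / 2                         # re-symmetrize after interpolation
}

# Monte Carlo approximation of W_alpha, the (1 - alpha) quantile of max_t Z_t.
W_alpha <- function(R_hat, alpha = 0.05, n_draws = 1e4) {
  L <- chol(R_hat)                       # upper triangular; t(L) %*% L = R_hat
  maxima <- replicate(n_draws, max(crossprod(L, rnorm(nrow(R_hat)))))
  quantile(maxima, 1 - alpha, names = FALSE)
}
```

The α-level test then rejects H_0 of (3.1) when the observed M_n exceeds the simulated W_α.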
In Chapter 2, the overall computation complexity to approximate the quantile was at the order of pn^4T^6. With the recursive formulae and the estimation procedure via linear interpolation, the overall computation complexity to estimate R_{n,z} is reduced to pn^2T^3, thus making our change point detection procedure applicable to high-dimensional functional data.

3.5 Change point identification

If the data lead us to reject H_0 of (3.1), then a second task is to identify the time points where changes exist among the T high-dimensional covariance matrices. First, consider the case with only one change point. Let τ be the time point at which the single change point exists. Define τ̂ as an estimator for the change point's location, where

\[
\hat\tau = \arg\max_{t\in\mathcal{T}} \hat D_{nt}, \tag{3.14}
\]

and 𝒯 = {1, ..., T − 1}. The form of the estimator is motivated by Theorem 4, which states that D_t is maximized at the time point t = τ when a single change point exists at τ. Consider the hypotheses

\[
H_0: \Sigma_1 = \cdots = \Sigma_T \quad \text{versus} \quad
H_1^{*}: \Sigma_1 = \cdots = \Sigma_{\tau} \ne \Sigma_{\tau+1} = \cdots = \Sigma_T. \tag{3.15}
\]

The following theorem establishes the rate of convergence for τ̂.

Theorem 9. Assume that H_1^{*} of (3.15) is true. Also, assume that as T → ∞, τ/T → ω, a constant. Under Conditions 1–2 and 4, it follows that as n → ∞,

\[
\hat\tau - \tau = O_p\Bigl\{\frac{\nu_{\max}\sqrt{\log(T)}}{n\Delta_p}\Bigr\}, \tag{3.16}
\]

where ν_max = max_{t∈𝒯} max(√V_{0t}, √(nV_{1t})) and Δ_p = tr{(Σ_1 − Σ_T)^2}.

Theorem 9 demonstrates that the change point estimator, τ̂, is consistent for high-dimensional functional data, provided that Δ_p/ν_max ≫ √(log(T))/n. The quantity Δ_p can be interpreted as the signal, and the quantity ν_max can be interpreted as the noise. Thus, if ν_max√(log(T))/(nΔ_p) → 0, τ̂ is a consistent estimator for τ.

To investigate the impact of n, p, and T on the rate of convergence of τ̂, we consider each in turn. Assume p and T are fixed as n → ∞. As a result, the rate of convergence for τ̂ − τ is O_p(1/√n), since Δ_p, √(log(T)), √V_{0t}, and √V_{1t} are held constant. If we assume T is fixed as n and p diverge, then the rate of convergence is the same as that proved in Theorem 4. This rate can be faster than 1/√n depending on the contributions of Δ_p and ν_max. Next, if we assume p is fixed as n and T diverge, then τ̂ − τ = O_p{ν_max√(log(T))/(nΔ_p)}. Depending on the relationship between Δ_p and ν_max, the rate of convergence can be much faster than √(log(T))/√n as p, T, and n all diverge. As p increases, Σ_1 − Σ_T can possibly contain more nonzero components, so Δ_p could get larger. However, as p and T increase, ν_max increases. Therefore, if ν_max does not dominate Δ_p, we obtain a faster rate of convergence than √(log(T))/√n. Despite the fact that the estimator in (3.14) is the same as that proposed in Chapter 2, the rate of convergence for the estimator is very different under the asymptotic framework in which n → ∞, p → ∞, and T → ∞.
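In code, (3.14) is a one-line operation. Here D_hat is assumed to be the length T − 1 vector of D̂_nt values, for instance assembled with the U_pair sketch of Section 3.4.

```r
# Single change point estimator (3.14): the maximizer of D-hat_{nt} over t.
tau_hat <- which.max(D_hat)
```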
Assume H_1 of (3.1) is true with multiple change points. First, we introduce two procedures to identify the locations of multiple change points, and then we state a theorem on the consistency of estimating multiple change points. Let Q = {1 ≤ τ_1 < ⋯ < τ_q < T} be the collection of all q true change points, and let Q̂ be the estimated set of change points. We make use of the notation in Chapter 2 and define, for time points t_1 < t_2, S[t_1, t_2] to be the statistic S calculated based on the data in the time interval t_1 through t_2. For example, ν_max[t_1, t_2] is the quantity ν_max based on the data between t_1 and t_2.

To identify multiple change points we apply binary segmentation (Venkatraman 1992). The binary segmentation algorithm is detailed as follows.

Step 1: Compute M_n and compare it with W_α. If M_n > W_α, then κ̂ = arg max_{t∈𝒯} D̂_nt is the estimated change point; set κ̂ = τ̂_1 so that Q̂ = {τ̂_1}. Partition the full data set into two intervals, [1, κ̂] and [κ̂ + 1, T], and proceed to Step 2. However, if M_n ≤ W_α, then no change points exist.

Step 2: Perform the detection procedure to test (3.1) using Y[1, κ̂] and Y[κ̂ + 1, T]. If H_0 is rejected based on Y[1, κ̂], then identify κ̂_1 = arg max_{t∈[1,κ̂]} D̂_nt[1, κ̂] as a change point. Since κ̂_1 < τ̂_1, set τ̂_1 = κ̂_1 and τ̂_2 = κ̂ so that Q̂ = {τ̂_1, τ̂_2}. Partition the data Y[1, κ̂] into two intervals: [1, κ̂_1] and [κ̂_1 + 1, κ̂]. If H_0 is not rejected, then no change points exist in the interval [1, κ̂]. Repeat this procedure for the data based on the interval [κ̂ + 1, T]. The set Q̂ is then updated to contain the ordered change points. If no change points are detected in either interval, then stop, as κ̂ is the only change point that exists.

Step 3: If a change point is identified in at least one interval in Step 2, repeat Step 2 until no further change points are detected. At each step, update and order the set Q̂.

At the conclusion of the binary segmentation procedure we can partition the interval [1, T] so that each sub-interval has end points from the set {1, Q̂, T}. For example, if no change point is identified, then the single interval is [1, T]. If a single change point is identified at τ̂, then the two intervals where no change points exist are [1, τ̂] and [τ̂, T].

The computation time to identify multiple change points exceeds the time to detect the existence of change points. If parallel computing is available, then the computation time required to identify multiple change points can be improved via a more efficient identification procedure than the steps outlined for binary segmentation. The improvement stems from the fact that the time to compute D̂_nt is less than the time required to test (3.1) for a given time interval. An efficient parallel procedure is detailed below.

Step 1: Perform binary segmentation by partitioning at arg max_{t∈I_t} D̂_nt[I_t], where I_t is the considered time interval of data. Change point detection is not performed at this step. Binary segmentation continues until all intervals are either of the form [a, a] or [a, a + 1], where a ∈ 𝒯. Suppose there exist N total intervals at the conclusion of binary segmentation.

Step 2: For all N intervals of length at least one, apply the change point detection procedure to test (3.1) in parallel. If H_0 is rejected for a given interval, then a change point exists and is estimated at the point arg max_{t∈I_t} D̂_nt[I_t] from Step 1. Update Q̂ for each identified change point.

Hence, the computation time required to identify multiple change points will only slightly exceed the time required to perform change point detection on the longest interval, [1, T].
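A compact R sketch of the serial recursion is given below. Here detect(lo, hi) is a hypothetical function, not part of the tecoma interface, that runs the test of (3.1) on the data restricted to the interval [lo, hi] and returns the estimated change point arg max D̂_nt[lo, hi] when H_0 is rejected, and NULL otherwise.

```r
# Binary segmentation: recursively split at detected change points.
binary_segment <- function(lo, hi, detect) {
  if (hi - lo < 1) return(integer(0))       # a single time point holds no change point
  kappa <- detect(lo, hi)
  if (is.null(kappa)) return(integer(0))    # H0 not rejected on [lo, hi]
  sort(c(kappa,
         binary_segment(lo, kappa, detect),
         binary_segment(kappa + 1, hi, detect)))
}

# Q_hat <- binary_segment(1, T_len, detect)  # ordered set of estimated change points
```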
To establish the consistency of Q̂ we first define some notation. Let I_t be a time interval such that I_t = [τ_a + 1, τ_b], where a + 1 < b, a ∈ {0, ..., q − 1}, and b ∈ {2, ..., q + 1}. Define τ_0 = 0 and τ_{q+1} = T. Thus, I_t is an interval with at least one change point. Assume the smallest maximum signal-to-noise ratio among all segments I_t is as defined in Chapter 2, where min_{I_t} max_{τ_s∈I_t} σ_{nτ_s,0}^{-1}[I_t] D_{τ_s}[I_t] is denoted as mSNR.

Theorem 10. Assume that τ_k/T converges to ω_k as T diverges, that W_{α_n} = o(mSNR), and that for any interval I_t, ν_max[I_t]√(log(T))/(nΔ_p[I_t]) → 0. Furthermore, assume α_n → 0. Then, under Conditions 1–2 and 4, as n → ∞, Q̂ → Q in probability.

In the presence of change points, the assumption that W_{α_n} = o(mSNR) ensures the consistency of the proposed test at each phase of binary segmentation. In the absence of change points, the assumption that α_n → 0 ensures that no change points will be detected and binary segmentation will stop on the given interval. The assumption that ν_max[I_t]√(log(T))/(nΔ_p[I_t]) → 0 ensures that, in the presence of change points, the estimator is consistent.

3.6 Simulation studies

In this section, we present multiple simulation studies to demonstrate the performance of the change point detection and identification procedures in a large p, large T, and small n setting. All data were generated from a multivariate linear process,

\[
Y_{it} = \sum_{h=0}^{L} A_{t,h}\, \xi_{i(t-h)} \quad (i = 1, \ldots, n;\ t = 1, \ldots, T), \tag{3.17}
\]

where A_{t,h} is a p × p matrix, and the ξ_{i(t−h)} are p-dimensional multivariate normally distributed random vectors with mean 0 and covariance I_p. The data generation scheme given by (3.17) permits spatial and temporal dependence. Let t ≥ s. By the definition of Y_it in (3.17),

\[
\mathrm{cov}(Y_{it}, Y_{is}) =
\begin{cases}
\sum_{h=t-s}^{L} A_{t,h} A_{s,h-(t-s)}^T, & t - s \le L; \\
0, & t - s > L.
\end{cases}
\]

Spatial dependence occurs within the vector Y_it for a given time point t. Temporal dependence exists among {Y_it}_{t=1}^T at different time points and is governed by the simulation parameter L.

In the simulation studies, we set n = 40, 50, and 60, and p = 500, 750, and 1000. The number of repeated measurements, T, was set to 50 and 100. For change point identification we considered an additional case with T = 150. The simulation parameter L = 3. Simulation results reported in Tables 3.1–3.4 were based on 500 simulation replications, and simulation results in Tables 3.5 and 3.6 were based on 100 simulation replications.

The spatial and temporal dependence incorporated in (3.17) depend on the choice of the matrices A_{t,h}. First, we define the matrices A_{t,h} for the testing simulation to demonstrate the size and power of the proposed test procedure. Later, the matrices A_{t,h} will be defined for the change point identification simulation. Let τ_1 be the true underlying change point among the covariance matrices such that τ_1 = ⌊T/2⌋, where ⌊x⌋ is the floor function. Define two matrices, B_1 and B_2, such that

\[
B_1 = \bigl\{(0.6)^{|i-j|}\, I(|i - j| < p/5)\bigr\}, \qquad
B_2 = \bigl\{(0.6 + \delta)^{|i-j|}\, I(|i - j| < p/5)\bigr\},
\]

where (i, j) represents the ith row and jth column of the p × p matrices B_1 and B_2. Thus, for h ∈ {0, ..., 3},

\[
A_{t,h} =
\begin{cases}
B_1, & t \in \{1, \ldots, \tau_1\}; \\
B_2, & t \in \{\tau_1 + 1, \ldots, T\}.
\end{cases}
\]

The parameter δ in B_2 governs the signal strength in terms of how different the covariance matrices are before and after the change point at time τ_1. When δ = 0, B_1 = B_2, A_{t,h} is the same for all t, and the null hypothesis is true. If δ > 0, then the null hypothesis is false, and τ_1 is the true covariance change point. For the change point detection simulation, δ was set to 0.00, 0.025, 0.05, and 0.10.
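The following R sketch generates data from (3.17) under the size/power design above (banded matrices B_1 and B_2 with a change at τ_1 = ⌊T/2⌋). It is a direct, unoptimized illustration, and the function name is ours rather than part of the package.

```r
# Simulate Y from the multivariate linear process (3.17), with A_{t,h} = B1
# before the change point and B2 after, constant in h = 0, ..., L.
gen_data <- function(n, p, T_len, L = 3, delta = 0.05) {
  band <- abs(outer(1:p, 1:p, "-"))
  B1 <- 0.6^band * (band < p / 5)
  B2 <- (0.6 + delta)^band * (band < p / 5)
  tau <- floor(T_len / 2)
  Y <- array(0, dim = c(n, T_len, p))
  for (i in 1:n) {
    # xi rows hold xi_{i,u} for u = 1 - L, ..., T; the row index is u + L
    xi <- matrix(rnorm((T_len + L) * p), T_len + L, p)
    for (t in 1:T_len) {
      A <- if (t <= tau) B1 else B2
      for (h in 0:L) Y[i, t, ] <- Y[i, t, ] + A %*% xi[t - h + L, ]
    }
  }
  Y
}

# e.g. Y <- gen_data(n = 40, p = 100, T_len = 50)  # a smaller p for a quick check
```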
Table 3.1: Empirical size and power of the proposed test; percentages of simulation replications that reject the null hypothesis.

                        T = 50                    T = 100
  δ        n    p=500   p=750  p=1000     p=500   p=750  p=1000
  0 (size) 40     4.4     4.6     3.8       3.6     5.4     4.4
           50     4.8     4.0     3.6       2.0     4.6     4.0
           60     3.8     4.2     2.8       5.4     3.6     5.6
  0.025    40    13.4    13.4    10.8      18.0    19.0    18.0
           50    17.0    19.2    17.0      30.6    27.2    30.4
           60    26.4    26.0    27.4      47.0    41.6    41.6
  0.05     40    96.0    97.0    98.0       100     100     100
           50     100     100     100       100     100     100
           60     100     100     100       100     100     100
  0.10     40     100     100     100       100     100     100
           50     100     100     100       100     100     100
           60     100     100     100       100     100     100

Table 3.1 demonstrates the empirical size and power of the proposed test procedure. The size is well controlled at the nominal level of 0.05 for all values of n, p, and T. For fixed p and T, the power increases as n increases. Likewise, as δ increases, the power of the change point detection procedure increases. For fixed n and p, the power increases as T increases. These relationships are further elucidated when the simulation results from Table 2.1 under Setting (I) in Section 2.5 are considered. For example, when n = 40, p = 500, and δ = 0.05, we observe that the power of the test is 21.4, 35.6, 96.0, and 100 as T is 5, 8, 50, and 100, respectively.

Table 3.2: Empirical size and power of the proposed test for T = 100; percentages of simulation replications that reject the null hypothesis; quantile computed from a correlation matrix that used linear interpolation. The first 5 off-diagonals were computed exactly, as well as the last w components of each row.

                        w = 5                     w = 10                    w = 20
  δ        n    p=500   p=750  p=1000     p=500   p=750  p=1000     p=500   p=750  p=1000
  0 (size) 40     3.4     4.8     4.2       3.4     4.8     4.2       3.4     5.2     4.2
           50     2.0     4.6     3.8       2.0     4.6     4.0       2.0     4.6     4.0
           60     4.8     3.2     5.0       4.8     3.2     5.0       5.2     3.8     5.6
  0.025    40    17.8    19.0    17.6      17.8    19.0    17.6      17.8    19.0    17.6
           50    30.8    26.2    30.2      30.8    26.6    30.2      30.8    26.6    30.2
           60    46.6    40.8    41.0      46.6    40.8    41.0      46.6    41.2    41.0
  0.05     40     100     100     100       100     100     100       100     100     100
           50     100     100     100       100     100     100       100     100     100
           60     100     100     100       100     100     100       100     100     100
  0.10     40     100     100     100       100     100     100       100     100     100
           50     100     100     100       100     100     100       100     100     100
           60     100     100     100       100     100     100       100     100     100

Table 3.3: Empirical size and power of the proposed test for T = 100; percentages of simulation replications that reject the null hypothesis; quantile computed from a correlation matrix that used linear interpolation. The first 10 off-diagonals were computed exactly, as well as the last w components of each row.

                        w = 5                     w = 10                    w = 20
  δ        n    p=500   p=750  p=1000     p=500   p=750  p=1000     p=500   p=750  p=1000
  0 (size) 40     3.4     5.0     4.2       3.4     5.0     4.2       3.4     5.2     4.2
           50     2.0     4.6     4.0       2.0     4.6     4.0       2.0     4.6     4.0
           60     4.8     3.2     5.0       4.8     3.4     5.0       5.2     3.8     5.6
  0.025    40    18.0    19.0    17.6      18.0    19.0    17.6      18.0    19.0    17.6
           50    30.8    26.6    30.2      30.8    26.6    30.2      30.8    26.8    30.2
           60    46.6    40.8    41.0      46.6    40.8    41.0      46.6    41.2    41.0
  0.05     40     100     100     100       100     100     100       100     100     100
           50     100     100     100       100     100     100       100     100     100
           60     100     100     100       100     100     100       100     100     100
  0.10     40     100     100     100       100     100     100       100     100     100
           50     100     100     100       100     100     100       100     100     100
           60     100     100     100       100     100     100       100     100     100

Table 3.4: Empirical size and power of the proposed test for T = 100; percentages of simulation replications that reject the null hypothesis; quantile computed from a correlation matrix that used linear interpolation.
The first 20 off-diagonals were computed exactly, as well as the last w components of each row.

                        w = 5                     w = 10                    w = 20
  δ        n    p=500   p=750  p=1000     p=500   p=750  p=1000     p=500   p=750  p=1000
  0 (size) 40     3.6     5.2     4.2       3.6     5.2     4.4       3.6     5.2     4.4
           50     2.0     4.6     4.0       2.0     4.6     4.0       2.0     4.6     4.0
           60     5.2     3.4     5.6       5.2     3.4     5.6       5.2     3.8     5.6
  0.025    40    18.0    19.0    17.6      18.0    19.0    17.8      18.0    19.0    18.0
           50    30.6    26.8    30.2      30.6    26.8    30.4      30.6    27.0    30.4
           60    46.8    40.8    41.0      47.0    41.4    41.4      46.6    41.4    41.6
  0.05     40     100     100     100       100     100     100       100     100     100
           50     100     100     100       100     100     100       100     100     100
           60     100     100     100       100     100     100       100     100     100
  0.10     40     100     100     100       100     100     100       100     100     100
           50     100     100     100       100     100     100       100     100     100
           60     100     100     100       100     100     100       100     100     100

Tables 3.2–3.4 demonstrate the empirical size and power of the proposed test procedure using a modification of the quantile approximation procedure introduced in Section 3.4. Rather than compute R̂_{n,tq} for all t, q ∈ {1, ..., T − 1}, we compute the first b off-diagonals and the last w columns of R̂_{n,tq}. The remaining values were imputed via linear interpolation. Figure 3.1 demonstrates the accuracy of this linear interpolation procedure. The simulations considered b = 5, 10, and 20, and w = 5, 10, and 20. Based on our simulation results, there is only a minimal loss in power when compared to computing all components of R̂_{n,tq}. Furthermore, the size of the test is well maintained at the nominal level of 0.05.

To evaluate the performance of the change point identification procedure through binary segmentation, consider two change points, τ_1 and τ_2. Let τ_1 = ⌊T/2⌋, and let τ_2 = τ_1 + 2. Define three matrices, B_1, B_2, and B_3, such that

\[
B_1 = \bigl\{(|i-j| + 1)^{-2}\, I(|i - j| < p/5)\bigr\}, \quad
B_2 = \bigl\{(|i-j| + \delta + 1)^{-2}\, I(|i - j| < p/5)\bigr\}, \quad
B_3 = \bigl\{(|i-j| + 2\delta + 1)^{-2}\, I(|i - j| < p/5)\bigr\},
\]

where (i, j) represents the ith row and jth column of the p × p matrices B_1, B_2, and B_3. Thus, for h ∈ {0, ..., 3},

\[
A_{t,h} =
\begin{cases}
B_1, & t \in \{1, \ldots, \tau_1\}; \\
B_2, & t \in \{\tau_1 + 1, \ldots, \tau_2\}; \\
B_3, & t \in \{\tau_2 + 1, \ldots, T\}.
\end{cases}
\]

When δ = 0, the null hypothesis is true, and A_{t,h} is the same for all t ∈ {1, ..., T}. Since our purpose is to demonstrate the finite sample accuracy of change point identification, we do not consider a null hypothesis setting in which δ = 0. The values of δ were selected to be 0.15, 0.25, and 0.35.

Two measures were considered to evaluate the change point identification procedure's efficacy: average true positives and average true negatives. For each simulation replication there exist two true change points, at time points τ_1 and τ_2, and there exist T − 3 time points where no change point exists. The average true positives (ATP) are defined as the average number of correctly identified change points among the 100 simulation replications. Similarly, the average true negatives (ATN) are defined as the average number of correctly identified time points where no covariance change exists among the 100 simulation replications.

Table 3.5 provides the efficacy of the binary segmentation procedure in the large p, large T, and small n setting. For fixed p, n, and T, the average true positives and average true negatives approach two and T − 3, respectively, as δ increases. As the sample size increases, the average true positives and average true negatives approach their optimal values. Table 3.6 contains the corresponding standard errors for the measures in Table 3.5.
Table 3.5: Average true positives (ATP) and average true negatives (ATN) for identifying multiple change points using the proposed binary segmentation method. The maximum number of true positives for a given replication is 2. The maximum number of true negatives for a given replication is T − 3.

                           δ = 0.15          δ = 0.25          δ = 0.35
  T    p     n        ATP      ATN      ATP      ATN      ATP      ATN
  50   500   40      1.20    46.62     1.68    46.48     1.97    46.76
             50      1.41    46.63     1.91    46.42     2.00    46.68
             60      1.57    46.61     1.98    46.52     2.00    46.58
       750   40      1.30    46.59     1.77    46.51     2.00    46.78
             50      1.33    46.70     1.95    46.53     2.00    46.66
             60      1.57    46.64     1.99    46.53     2.00    46.58
       1000  40      1.27    46.59     1.81    46.61     1.95    46.76
             50      1.48    46.76     1.95    46.58     2.00    46.67
             60      1.65    46.59     1.99    46.69     2.00    46.51
  100  500   40      1.27    96.54     1.74    96.56     1.98    96.75
             50      1.31    96.44     1.92    96.54     2.00    96.67
             60      1.62    96.46     1.99    96.56     2.00    96.70
       750   40      1.22    96.54     1.85    96.59     1.98    96.76
             50      1.33    96.54     1.96    96.51     2.00    96.59
             60      1.60    96.42     1.99    96.55     2.00    96.59
       1000  40      1.20    96.59     1.74    96.52     1.98    96.80
             50      1.34    96.49     1.90    96.50     2.00    96.64
             60      1.59    96.44     2.00    96.58     2.00    96.50
  150  500   40      1.19   146.48     1.73   146.53     1.97   146.76
             50      1.34   146.40     1.95   146.55     2.00   146.68
             60      1.54   146.53     2.00   146.57     2.00   146.51
       750   40      1.16   146.46     1.73   146.58     1.97   146.84
             50      1.42   146.52     1.97   146.55     2.00   146.64
             60      1.56   146.55     1.98   146.42     2.00   146.45
       1000  40      1.20   146.51     1.72   146.49     1.97   146.80
             50      1.46   146.47     1.92   146.50     2.00   146.70
             60      1.53   146.51     1.99   146.56     2.00   146.56

Table 3.6: Standard errors for the average true positives and average true negatives given in Table 3.5. The maximum number of true positives for a given replication is 2. The maximum number of true negatives for a given replication is T − 3.

                           δ = 0.15           δ = 0.25           δ = 0.35
  T    p     n      ATP SE   ATN SE    ATP SE   ATN SE    ATP SE   ATN SE
  50   500   40       0.40     0.62      0.47     0.49      0.17     0.52
             50       0.49     0.53      0.29     0.49      0.00     0.52
             60       0.50     0.55      0.14     0.55      0.00     0.50
       750   40       0.46     0.42      0.42     0.61      0.00     0.50
             50       0.47     0.48      0.22     0.46      0.00     0.50
             60       0.50     0.55      0.10     0.50      0.00     0.56
       1000  40       0.45     0.55      0.39     0.50      0.22     0.51
             50       0.50     0.47      0.22     0.43      0.00     0.50
             60       0.48     0.63      0.10     0.67      0.00     0.47
  100  500   40       0.45     0.50      0.44     0.52      0.14     0.50
             50       0.47     0.47      0.27     0.50      0.00     0.50
             60       0.49     0.48      0.10     0.54      0.00     0.52
       750   40       0.42     0.50      0.36     0.50      0.14     0.51
             50       0.47     0.55      0.20     0.50      0.00     0.50
             60       0.49     0.49      0.10     0.78      0.00     0.50
       1000  40       0.40     0.43      0.44     0.55      0.14     0.50
             50       0.48     0.50      0.30     0.50      0.00     0.52
             60       0.49     0.61      0.00     0.61      0.00     0.50
  150  500   40       0.39     0.43      0.45     0.56      0.17     0.56
             50       0.48     0.47      0.22     0.53      0.00     0.50
             60       0.50     0.52      0.00     0.52      0.00     0.50
       750   40       0.37     0.40      0.45     0.58      0.17     0.50
             50       0.50     0.50      0.17     0.56      0.00     0.52
             60       0.50     0.58      0.14     0.56      0.00     0.52
       1000  40       0.40     0.40      0.45     0.52      0.18     0.50
             50       0.50     0.46      0.27     0.63      0.00     0.67
             60       0.50     0.54      0.10     0.50      0.00     0.52

3.7 An empirical study

Human memory has been studied through fMRI experiments in the context of discrete and continuous activities. One goal of neurologists is to better understand perception and memory processes in humans as they experience continuous real-world events (Baldassano et al. 2017). Event segmentation theory, posited by Zacks et al. (2007), holds that under certain conditions, humans generate event boundaries in memory during continuous perception. Thus, humans may partition a continuous experience into a series of segmented discrete events. Baldassano et al. (2017) investigated event boundary detection and concluded that long-term memory in humans is structured as a series of hierarchical discrete events. Moreover, Schapiro et al. (2013) suggested that event boundaries are formed around changes in functional connectivity.
In this section, we apply our method to the task-based fMRI data set analyzed in Baldassano et al. (2017) and Chen et al. (2017) in order to study the brain's dynamic functional connectivity. In the presence of dynamic functional brain activity, points of change may represent the event boundaries suggested in the aforementioned neuroscience literature.

We apply our proposed method to a task-based fMRI data set collected by Chen et al. (2017), who investigated the effects of memories across different individuals. The experiment involved 17 participants who each watched the same 48-minute segment of the BBC television series Sherlock while undergoing an fMRI scan. The 48-minute segment was the first 48 minutes of the first episode in the television series. None of the participants had watched the series Sherlock prior to the study. Chen et al. partitioned the television episode into a 23-minute segment and a 25-minute segment. Each segment was preceded by a 30-second cartoon to allow the brain time to adjust to new audio and visual stimuli. Including an unrelated cartoon prior to studies such as this is common practice, as it reduces statistical noise. Subjects were instructed to watch the television episode as they would watch a typical television episode in their own home. The fMRI data were gathered from a Siemens Skyra 3T full-body scanner. More details about the experiment and the processes of acquiring functional and anatomical images are provided in Chen et al. (2017).

The 48-minute segment of Sherlock resulted in 1,976 time point measurements of data. For each participant, the fMRI machine acquired an image of the participant's brain every 1.5 seconds. To demonstrate our proposed method, we analyzed the first 100 time points, which equates to the first 150 seconds of the Sherlock episode. Let Y_it be the BOLD random vector for the 268 nodes of the ith individual at time t. Thus, Y_it (i = 1, ..., 17; t = 1, ..., 100) is a 268-dimensional random vector. A node, or region of interest, represents a collection of voxels. The 268-node parcellation was performed according to Shen et al. (2013), where the voxel groupings ensure functional homogeneity within each node, making the parcellation ideal for node network and dynamic functional connectivity analysis. Figure 3.2 illustrates the 268 Shen node parcellation along with the large-scale node groupings. Node-level analysis decreases the data dimension and allows for more interpretable results. For further details on the benefits and processes of Shen node parcellation, we refer readers to Shen et al. (2013).

Figure 3.2: Shen 268 node parcellation. This image was obtained from Finn et al. (2015).

In our analysis n = 17, p = 268, and T = 100. Based on (3.1)–(3.2), we assume that at each time point there exists a common population covariance matrix among all 17 individuals. Our assumption is not unrealistic given this task-based fMRI experiment. Chen et al. (2017) and Baldassano et al. (2017) found that an across-subject design was appropriate due to the consistent stimulus-response across patients for a given brain region. Under model (3.2), we applied our procedure to test (3.1). Based on the test statistic value, M_n = 3.6596, we rejected H_0 of (3.1), as the p-value was less than 0.001. Hence, we rejected the claim that the covariance matrices were stationary across all T = 100 time points. Accordingly, we applied binary segmentation to identify all significant change points among the 99 possible points of change.
Our proposed method identified 17 locations of significance. Change points were located at time points 2, 25, 36, 39, 40, 41, 42, 58, 60, 61, 63, 81, 83, 88, 89, 91, and 92. A change point at time two implies that Σ_1 = Σ_2 ≠ Σ_3.

Table 3.7: Identified change points in the Sherlock fMRI data set. The homogeneous interval is the range of time points, preceding the identified change point, over which the covariance matrices are temporally homogeneous. The interval ID provides a reference to Figure 3.3.

  Change point   Interval   Homogeneous interval
        2            1           [1, 2]
       25            2           [3, 25]
       36            3           [26, 36]
       39            4           [37, 39]
       40            5           [40, 40]
       41            6           [41, 41]
       42            7           [42, 42]
       58            8           [43, 58]
       60            9           [59, 60]
       61           10           [61, 61]
       63           11           [62, 63]
       81           12           [64, 81]
       83           13           [82, 83]
       88           14           [84, 88]
       89           15           [89, 89]
       91           16           [90, 91]
       92           17           [92, 92]

Figure 3.3 illustrates the temporal changes among covariance matrices around the identified change points listed in Table 3.7. Each subplot is the average correlation between nodes across a time interval over which the covariance matrices are homogeneous. Thus, in Figure 3.3, Interval 1 represents the correlation network based on the average correlations between nodes over the time interval [1, 2], and Interval 2 represents the correlation network based on the average correlations between nodes over the time interval [3, 25]. Table 3.7 details the time interval corresponding to the temporally homogeneous covariance matrices preceding each identified change point. Therefore, given that a change point was located at t = 2, the correlation networks of Interval 1 and Interval 2 should be significantly different. The correlation network layouts are structured according to the eight large-scale node groupings illustrated in Figure 3.2. The top-centered circle consists of nodes within the medial frontal group. Moving clockwise on a given sub-plot, the remaining circles represent the frontoparietal, default mode, subcortical-cerebellum, motor, visual I, visual II, and visual association groupings.

The identified change points in Table 3.7 coincide with interesting events in the television episode Sherlock. For example, the first change point, at t = 2, may be a reaction to the initial stimuli of the cartoon; the brain must process this initial video and audio stimuli. At approximately 37 to 38 seconds into the series Sherlock, the cartoon ends and a graphic war scene commences. Guns are fired and casualties are shown, but there is no distinguishable dialogue. The transition point from the cartoon clip to the battle scene coincides with the change point identified at t = 25. After this war scene a period of quiet ensues. The first understandable dialogue from the actors occurs at approximately two minutes and 11 seconds into the episode. At this time, a therapist inquires about a patient's well-being as the viewer learns that the opening war scene was a flashback. The change points identified at 88, 89, 91, and 92 equate to the start of this conversation.

Figure 3.3: Correlation networks based on an average over a time interval in which the covariance matrices are homogeneous. Each circle is comprised of 67 Shen nodes. Solid lines represent a positive correlation, and dashed lines represent a negative correlation. The darker the line, the stronger the correlation between nodes. A correlation threshold value of 0.70 in absolute value was used.

3.8 Technical details

This section contains proofs of the lemmas and the main theorems. Some of the expressions are rather long. Thus, for readability, an equation will not always be aligned with the initial equality sign.
3.8.1 Proofs of lemmas First, we provide proofs for some lemmas that will be used in the proofs corresponding to the main theorems. sakzi, where zi is a standard multivariate normal random vector, Lemma 7. Let Yisak = γT and γT sak is known. Then (cid:16) E YisakYisalYircmYircnYiueoYiuepYixgqYixgw ueΣwq xg + Σlk saΣnm rc Cpq uexg Cwo xgue (cid:17) (cid:16) Σlk rc Σpo saΣnm rc Σpo ueClq sarcCnk uerc + Σlk rcueCpm sarcCnk xgsaCno uerc + Σnm rcsa + Clm rcueCpm + Σlk + Σnm + Clo saΣpo rc Σwq saueCpk rc Clo sarcCno ueCnq xg Clo uesaCnq saueCpq rcueCpq rcxg Cwm saueCpk xgrc + Σlk uesa + Σpo saΣwq ueΣwq xgrc + Clq xgsa + Σpo xgsa + Clm xg Cno xg Clm saxg Cwk ueClm sarcCnq rcxg Cwm uexg Cwk uexg Cwk + Σnm + Clm uexg Cwo xgue rcueCpq uexg Cwm xgrc saxg Cwk xgsa rcsaCpq saCno sarcCno saueCpm sarcCnq rcxg Cwk xgsa + Σwq xg Clm rcxg Cwo xgueCpk uesa + Clo rcueCpk uesa uercCnq rcxg Cwk xgsa, where Σlk sa = γT salγsak and Cpq uexg = γT ueqγxgq. Proof. Let A1, A2, A3, and A4 be any matrices of appropriate dimensions. Assume zi is a 94 3 )tr(A2A4 + A2AT 4 ) 2 )(A3 + AT 3 )(A4 + AT 4 )} + tr(A1)tr(A2)tr(A3A4 + A3AT 4 ) + tr(A1)tr(A3)tr(A2A4 + A2AT 4 ) + tr(A1)tr(A4)tr(A2A3 + A2AT + tr(A2)tr(A4)tr(A1A3 + A1AT 3 ) + tr(A2)tr(A3)tr(A1A4 + A1AT 4 ) 3 ) + tr(A3)tr(A4)tr(A1A2 + A1AT 2 ) (cid:105) + tr(A1A2 + A1AT 2 )tr(A3A4 + A3AT (cid:105) 4 ) + tr(A1A3 + A1AT (cid:104) tr(A1)tr{(A2 + AT + 4 )} 4 )} 4 )tr(A2A3 + A2AT 3 ) 1 )(A3 + AT 1 )(A2 + AT 1 )(A2 + AT 3 )(A4 + AT 2 )(A4 + AT 2 )(A3 + AT + tr(A1A4 + A1AT + tr(A2)tr{(A1 + AT + tr(A3)tr{(A1 + AT (cid:104) + tr(A4)tr{(A1 + AT tr{(A1 + AT + + tr{(A1 + AT + tr{(A1 + AT 1 )(A2 + AT 2 )(A3 + AT 1 )(A2 + AT 1 )(A3 + AT 2 )(A4 + AT 3 )(A2 + AT (cid:110) 3 )}(cid:105) 4 )}(cid:105) 4 )} 3 )(A4 + AT 3 )} 4 )(A3 + AT 2 )(A4 + AT . (cid:104) (cid:104) (cid:16) (cid:110) (cid:16) standard multivariate normal random vector. By the results of multivariate analysis E zT i A1zizT i A2zizT i A3zizT i A4zi = tr(A1)tr(A2)tr(A3)tr(A4) (cid:17) (cid:111) = By the definition of Yi··, E i γrcmγT salzizT E zT i γsakγT i γxgqγT substitutions for A1, A2, A3, and A4, it follows that i γueoγT uepzizT rcnzizT xgwzi YisakYisalYircmYircnYiueoYiuepYixgqYixgw (cid:111) . Thus, making the appropriate E YisakYisalYircmYircnYiueoYiuepYixgqYixgw ueΣwq xg + Σlk saΣnm rc Cpq uexg Cwo xgue (cid:17) (cid:16) Σlk rc Σpo saΣnm ueClq rc Σpo sarcCnk uerc + Σlk rcueCpm sarcCnk xgsaCno uerc + Σnm rcsa + Clm rcueCpm + Σlk + Σnm saΣpo rc Σwq saueCpk rc Clo sarcCno ueCnq xg Clo uesaCnq saueCpq rcueCpq rcxg Cwm saueCpk xgrc + Σlk uesa + Σpo saΣwq ueΣwq xgrc + Clq xgsa + Σpo xgsa + Clm xg Cno xg Clm saxg Cwk ueClm sarcCnq rcxg Cwm uexg Cwk uexg Cwk + Clo + Σnm + Clm uexg Cwo xgue rcueCpq uexg Cwm xgrc saxg Cwk xgsa rcsaCpq saCno sarcCno saueCpm sarcCnq rcxg Cwk xgsa + Σwq xg Clm rcxg Cwo xgueCpk uesa + Clo rcueCpk uesa uercCnq rcxg Cwk xgsa, where Σlk sa = γT salγsak and Cpq uexg = γT ueqγxgq. (cid:3) 95 Lemma 8. Let Vsasb(i, j) = (Y T isa and i (cid:54)= j. Then (cid:110) E Vsasb(i, j)Vrcrd(i, j)Vueuf (i, j)Vxgxh(i, j) )2, where sa, sb, rc, rd, ue, uf , xg, xh ∈ {1 . . . T − 1} Yjsb (cid:111) = H1 + H2 + H3 + H4 + H5 + H6 + H7 + H8 + H9 + H10 + H11 + H12 + H13 + H14 + H15 + H16 + H17 where Hk, k ∈ {1, . . . , 17} is given below. 
Lemma 8. Let $V_{s_as_b}(i,j) = (Y_{is_a}^{T}Y_{js_b})^2$, where $s_a, s_b, r_c, r_d, u_e, u_f, x_g, x_h \in \{1, \dots, T-1\}$ and $i \ne j$. Then
$$E\left\{V_{s_as_b}(i,j)\,V_{r_cr_d}(i,j)\,V_{u_eu_f}(i,j)\,V_{x_gx_h}(i,j)\right\} = H_1 + H_2 + \cdots + H_{17},$$
where $H_k$, $k \in \{1, \dots, 17\}$, is given below:
$$\begin{aligned}
H_1 = \big[\;&\mathrm{tr}^4(\Sigma^2) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{u_fx_h}\Sigma C_{x_hu_f}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{r_dx_h}\Sigma C_{x_hr_d}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{r_du_f}\Sigma C_{u_fr_d})\\
&+ \mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_bx_h}\Sigma C_{x_hs_b}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_bu_f}\Sigma C_{u_fs_b}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_br_d}\Sigma C_{r_ds_b})\\
&+ \mathrm{tr}(\Sigma C_{s_br_d}\Sigma C_{r_ds_b})\mathrm{tr}(\Sigma C_{u_fx_h}\Sigma C_{x_hu_f}) + \mathrm{tr}(\Sigma C_{s_bu_f}\Sigma C_{u_fs_b})\mathrm{tr}(\Sigma C_{r_dx_h}\Sigma C_{x_hr_d})\\
&+ \mathrm{tr}(\Sigma C_{s_bx_h}\Sigma C_{x_hs_b})\mathrm{tr}(\Sigma C_{r_du_f}\Sigma C_{u_fr_d}) + \mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{r_dx_h}\Sigma C_{x_hu_f}\Sigma C_{u_fr_d})\\
&+ \mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{s_bx_h}\Sigma C_{x_hu_f}\Sigma C_{u_fs_b}) + \mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{s_bx_h}\Sigma C_{x_hr_d}\Sigma C_{r_ds_b}) + \mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{s_bu_f}\Sigma C_{u_fr_d}\Sigma C_{r_ds_b})\\
&+ \mathrm{tr}(\Sigma C_{s_bx_h}\Sigma C_{x_hu_f}\Sigma C_{u_fr_d}\Sigma C_{r_ds_b}) + \mathrm{tr}(\Sigma C_{s_bu_f}\Sigma C_{u_fx_h}\Sigma C_{x_hr_d}\Sigma C_{r_ds_b}) + \mathrm{tr}(\Sigma C_{s_bx_h}\Sigma C_{x_hr_d}\Sigma C_{r_du_f}\Sigma C_{u_fs_b})\;\big].
\end{aligned}$$
Each of $H_2, \dots, H_{17}$ is an analogous bracketed sum of trace products generated by Lemma 7; their leading terms are
$$\begin{aligned}
H_2 &= \big[\,\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{u_ex_g}\Sigma C_{x_gu_e}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}^2(C_{u_ex_g}C_{x_hu_f}) + \cdots\,\big],\\
H_3 &= \big[\,\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{r_cx_g}\Sigma C_{x_gr_c}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}^2(C_{r_cx_g}C_{x_hr_d}) + \cdots\,\big],\\
H_4 &= \big[\,\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{r_cu_e}\Sigma C_{u_er_c}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}^2(C_{r_cu_e}C_{u_fr_d}) + \cdots\,\big],\\
H_5 &= \big[\,\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_ax_g}\Sigma C_{x_gs_a}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}^2(C_{s_ax_g}C_{x_hs_b}) + \cdots\,\big],\\
H_6 &= \big[\,\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_au_e}\Sigma C_{u_es_a}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}^2(C_{s_au_e}C_{u_fs_b}) + \cdots\,\big],\\
H_7 &= \big[\,\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_ar_c}\Sigma C_{r_cs_a}) + \mathrm{tr}^2(\Sigma^2)\mathrm{tr}^2(C_{s_ar_c}C_{r_ds_b}) + \cdots\,\big],\\
H_8 &= \big[\,\mathrm{tr}(\Sigma C_{s_ar_c}\Sigma C_{r_cs_a})\mathrm{tr}(\Sigma C_{u_ex_g}\Sigma C_{x_gu_e}) + \mathrm{tr}(\Sigma C_{s_ar_c}\Sigma C_{r_cs_a})\mathrm{tr}^2(C_{u_ex_g}C_{x_hu_f}) + \cdots\,\big],\\
H_9 &= \big[\,\mathrm{tr}(\Sigma C_{s_au_e}\Sigma C_{u_es_a})\mathrm{tr}(\Sigma C_{r_cx_g}\Sigma C_{x_gr_c}) + \mathrm{tr}(\Sigma C_{s_au_e}\Sigma C_{u_es_a})\mathrm{tr}^2(C_{r_cx_g}C_{x_hr_d}) + \cdots\,\big],\\
H_{10} &= \big[\,\mathrm{tr}(\Sigma C_{s_ax_g}\Sigma C_{x_gs_a})\mathrm{tr}(\Sigma C_{r_cu_e}\Sigma C_{u_er_c}) + \mathrm{tr}(\Sigma C_{s_ax_g}\Sigma C_{x_gs_a})\mathrm{tr}^2(C_{r_cu_e}C_{u_fr_d}) + \cdots\,\big],\\
H_{11} &= \big[\,\mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{r_cx_g}\Sigma C_{x_gu_e}\Sigma C_{u_er_c}) + \mathrm{tr}(\Sigma^2)\mathrm{tr}(C_{u_ex_g}C_{x_hu_f})\mathrm{tr}(\Sigma C_{r_cx_g}C_{x_hu_f}C_{u_er_c}) + \cdots\,\big],\\
H_{12} &= \big[\,\mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{s_ax_g}\Sigma C_{x_gu_e}\Sigma C_{u_es_a}) + \mathrm{tr}(\Sigma^2)\mathrm{tr}(C_{u_ex_g}C_{x_hu_f})\mathrm{tr}(\Sigma C_{s_ax_g}C_{x_hu_f}C_{u_es_a}) + \cdots\,\big],\\
H_{13} &= \big[\,\mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{s_ax_g}\Sigma C_{x_gr_c}\Sigma C_{r_cs_a}) + \mathrm{tr}(\Sigma^2)\mathrm{tr}(C_{r_cx_g}C_{x_hr_d})\mathrm{tr}(\Sigma C_{s_ax_g}C_{x_hr_d}C_{r_cs_a}) + \cdots\,\big],\\
H_{14} &= \big[\,\mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{s_au_e}\Sigma C_{u_er_c}\Sigma C_{r_cs_a}) + \mathrm{tr}(\Sigma^2)\mathrm{tr}(C_{r_cu_e}C_{u_fr_d})\mathrm{tr}(\Sigma C_{s_au_e}C_{u_fr_d}C_{r_cs_a}) + \cdots\,\big],\\
H_{15} &= \big[\,\mathrm{tr}(\Sigma C_{s_ax_g}\Sigma C_{x_gu_e}\Sigma C_{u_er_c}\Sigma C_{r_cs_a}) + \mathrm{tr}(C_{u_ex_g}C_{x_hu_f})\mathrm{tr}(\Sigma C_{s_ax_g}C_{x_hu_f}C_{u_er_c}\Sigma C_{r_cs_a}) + \cdots\,\big],\\
H_{16} &= \big[\,\mathrm{tr}(\Sigma C_{s_au_e}\Sigma C_{u_ex_g}\Sigma C_{x_gr_c}\Sigma C_{r_cs_a}) + \mathrm{tr}(C_{x_gu_e}C_{u_fx_h})\mathrm{tr}(\Sigma C_{s_au_e}C_{u_fx_h}C_{x_gr_c}\Sigma C_{r_cs_a}) + \cdots\,\big],\\
H_{17} &= \big[\,\mathrm{tr}(\Sigma C_{s_ax_g}\Sigma C_{x_gr_c}\Sigma C_{r_cu_e}\Sigma C_{u_es_a}) + \mathrm{tr}(C_{r_cx_g}C_{x_hr_d})\mathrm{tr}(\Sigma C_{s_ax_g}C_{x_hr_d}C_{r_cu_e}\Sigma C_{u_es_a}) + \cdots\,\big],
\end{aligned}$$
where the omitted terms of each block are the remaining trace products of the same form produced by the expansion in Lemma 7.

Proof. By the definition of $V_{s_as_b}(i,j)$,
$$\begin{aligned}
E\left\{V_{s_as_b}(i,j)V_{r_cr_d}(i,j)V_{u_eu_f}(i,j)V_{x_gx_h}(i,j)\right\}
&= E\left\{(Y_{is_a}^{T}Y_{js_b})^2 (Y_{ir_c}^{T}Y_{jr_d})^2 (Y_{iu_e}^{T}Y_{ju_f})^2 (Y_{ix_g}^{T}Y_{jx_h})^2\right\}\\
&= E\Big\{\Big(\sum_k Y_{is_ak}Y_{js_bk}\Big)^2\Big(\sum_m Y_{ir_cm}Y_{jr_dm}\Big)^2\Big(\sum_o Y_{iu_eo}Y_{ju_fo}\Big)^2\Big(\sum_q Y_{ix_gq}Y_{jx_hq}\Big)^2\Big\}\\
&= \sum_{\mathcal{C}} E\left\{Y_{is_ak}Y_{is_al}Y_{ir_cm}Y_{ir_cn}Y_{iu_eo}Y_{iu_ep}Y_{ix_gq}Y_{ix_gw}\right\}\\
&\qquad\quad \times E\left\{Y_{js_bk}Y_{js_bl}Y_{jr_dm}Y_{jr_dn}Y_{ju_fo}Y_{ju_fp}Y_{jx_hq}Y_{jx_hw}\right\}, \qquad (3.18)
\end{aligned}$$
where $\sum_{\mathcal{C}}$ denotes summation over the $p$ components of the vectors $Y_{i\cdot}$ for the indices $k, l, m, n, o, p, q$, and $w$. For each of the expectation terms in (3.18) we apply Lemma 7 and sum over the set $\mathcal{C}$. After some tedious algebra it follows that
$$E\left\{V_{s_as_b}(i,j)V_{r_cr_d}(i,j)V_{u_eu_f}(i,j)V_{x_gx_h}(i,j)\right\} = H_1 + H_2 + \cdots + H_{17}. \qquad \Box$$

3.8.2 Proofs of theorems

In this section we provide proofs for the theorems given in Chapter 3. Without loss of generality, assume $\mu_t = 0$ for all $t \in \{1, \dots, T\}$, since the test statistic $\hat D_{nt}$ is invariant with respect to $\mu_t$.

Proof of Theorem 7. With the addition of Condition 4 as an assumption, the proof is similar to the proof of Theorem 2.
Conditions (a) and (b) hold as in the proof of Theorem 2, and the martingale central limit theorem applies because all of the required remainder terms are of smaller order as $T$ diverges. □

Proof of Theorem 8. To establish the asymptotic distribution of $M_n$ under $H_0$, we must show convergence of the finite-dimensional distributions and tightness of the stochastic process $\max_{t\in\mathcal{T}} \sigma_{nt,0}^{-1}\hat D_{nt}$. The joint asymptotic normality of $(\sigma_{nt_1,0}^{-1}\hat D_{nt_1}, \dots, \sigma_{nt_c,0}^{-1}\hat D_{nt_c})^{T}$ for $t_1 < \cdots < t_c$ is nearly identical to the proof in Section 2.7 when $T$ is considered finite. Thus, it remains to show the tightness of $\max_{t\in\mathcal{T}} \sigma_{nt,0}^{-1}\hat D_{nt}$, so as to conclude that $M_n$ converges to $\max_{t\in\mathcal{T}} Z_t$, where $Z_t$ is a Gaussian process with mean 0 and correlation $R_z$.

By definition, $\hat D_{nt} = \hat D_{nt,0} + \hat D_{nt,2} - 2\hat D_{nt,1}$, where
$$\hat D_{nt,k} = \sum_{s_1=1}^{t}\sum_{s_2=t+1}^{T}\left(U_{s_1s_1,k} + U_{s_2s_2,k} - U_{s_1s_2,k} - U_{s_2s_1,k}\right), \qquad k \in \{0,1,2\}.$$
Furthermore, by Lemmas 3 and 4 in Section 2.7.1, $\hat D_{nt,1} = o_p(\hat D_{nt,0})$ and $\hat D_{nt,2} = o_p(\hat D_{nt,0})$. Therefore, to show the tightness of $\max_{1\le t<T}\sigma_{nt,0}^{-1}\hat D_{nt}$ it suffices to consider the process built from $\hat D_{nt,0}$. For $\eta \in (0,1)$, define
$$G_n(\eta) = \frac{\sqrt{n(n-1)}}{\mathrm{tr}(\Sigma^2)\,T^{3/2}}\sum_{s_1=1}^{[T\eta]}\sum_{s_2=[T\eta]+1}^{T} W_{s_1s_2}, \qquad (3.20)$$
where $\eta$ takes values $i/T$ $(i = 1, \dots, T-1)$. By definition, $U_{s_1s_1,0} = \{n(n-1)\}^{-1}\sum_{i\ne j}(Y_{is_1}^{T}Y_{js_1})^2$. Thus, for $\eta, \nu \in (0,1)$ with $\eta > \nu$,
$$G_n(\eta) - G_n(\nu) = \frac{\sqrt{n(n-1)}}{\mathrm{tr}(\Sigma^2)T^{3/2}}\Bigg(\sum_{s_1=1}^{[T\eta]}\sum_{s_2=[T\eta]+1}^{T} W_{s_1s_2} - \sum_{s_1=1}^{[T\nu]}\sum_{s_2=[T\nu]+1}^{T} W_{s_1s_2}\Bigg) = \frac{1}{\sqrt{n(n-1)}\,\mathrm{tr}(\Sigma^2)T^{3/2}}\sum_{i\ne j} f(i,j), \qquad (3.21)$$
where $W_{s_1s_2} = \{n(n-1)\}^{-1}\sum_{i\ne j}\tilde W_{s_1s_2}$ with
$$\tilde W_{s_1s_2} = (Y_{is_1}^{T}Y_{js_1})^2 + (Y_{is_2}^{T}Y_{js_2})^2 - (Y_{is_1}^{T}Y_{js_2})^2 - (Y_{is_2}^{T}Y_{js_1})^2,$$
and
$$f(i,j) = \sum_{s_1=1}^{[T\eta]}\sum_{s_2=[T\eta]+1}^{T}\tilde W_{s_1s_2} - \sum_{s_1=1}^{[T\nu]}\sum_{s_2=[T\nu]+1}^{T}\tilde W_{s_1s_2}.$$
We will bound the fourth moment of (3.21) to ultimately show the tightness of (3.20). First, we compute some moments of $f$ for various index configurations.

Under the null hypothesis and for $i \ne j$,
$$E\{f(i,j)\} = \sum_{s_1=1}^{[T\eta]}\sum_{s_2=[T\eta]+1}^{T}\big[\mathrm{tr}(\Sigma_{s_1}^2) + \mathrm{tr}(\Sigma_{s_2}^2) - 2\mathrm{tr}(\Sigma_{s_1}\Sigma_{s_2})\big] - \sum_{s_1=1}^{[T\nu]}\sum_{s_2=[T\nu]+1}^{T}\big[\mathrm{tr}(\Sigma_{s_1}^2) + \mathrm{tr}(\Sigma_{s_2}^2) - 2\mathrm{tr}(\Sigma_{s_1}\Sigma_{s_2})\big] = 0, \qquad (3.22)$$
since $\mathrm{tr}(\Sigma_{s_1}^2) + \mathrm{tr}(\Sigma_{s_2}^2) - 2\mathrm{tr}(\Sigma_{s_1}\Sigma_{s_2}) = \mathrm{tr}(\Sigma^2) + \mathrm{tr}(\Sigma^2) - 2\mathrm{tr}(\Sigma^2) = 0$. Define the following notation for the double summations: $\sum_{S_1} \equiv \sum_{s_1=1}^{[T\eta]}\sum_{s_2=[T\eta]+1}^{T}$; $\sum_{S_2} \equiv \sum_{s_1=1}^{[T\nu]}\sum_{s_2=[T\nu]+1}^{T}$; $\sum_{R_1} \equiv \sum_{r_1=1}^{[T\eta]}\sum_{r_2=[T\eta]+1}^{T}$; $\sum_{R_2} \equiv \sum_{r_1=1}^{[T\nu]}\sum_{r_2=[T\nu]+1}^{T}$. The second moment under the null hypothesis is given by
$$E\{f(i,j)f(i,j)\} = E\Big\{\Big(\sum_{S_1}\tilde W_{s_1s_2} - \sum_{S_2}\tilde W_{s_1s_2}\Big)\Big(\sum_{R_1}\tilde W_{r_1r_2} - \sum_{R_2}\tilde W_{r_1r_2}\Big)\Big\} = \sum_{x,y=1}^{2}(-1)^{|x-y|}\sum_{S_x}\sum_{R_y}\sum_{a,b,c,d=1}^{2}(-1)^{|a-b|+|c-d|}\,E\big\{V_{s_as_b}(i,j)V_{r_cr_d}(i,j)\big\},$$
where $V_{s_as_b}(i,j) = (Y_{is_a}^{T}Y_{js_b})^2$. Under the null hypothesis,
$$E\big\{V_{s_as_b}(i,j)V_{r_cr_d}(i,j)\big\} = 2\mathrm{tr}^2(C_{s_br_d}C_{r_cs_a}) + 2\mathrm{tr}(C_{r_cs_a}C_{s_br_d}C_{r_cs_a}C_{s_br_d}) + 2\mathrm{tr}(\Sigma C_{r_ds_b}\Sigma C_{s_br_d}) + 2\mathrm{tr}(\Sigma C_{r_cs_a}\Sigma C_{s_ar_c}) + \mathrm{tr}^2(\Sigma^2).$$
Therefore, under Condition 1,
$$E\{f(i,j)f(i,j)\} = C\sum_{x,y=1}^{2}(-1)^{|x-y|}\sum_{S_x}\sum_{R_y}\sum_{a,b,c,d=1}^{2}(-1)^{|a-b|+|c-d|}\,\mathrm{tr}^2(C_{s_br_d}C_{r_cs_a}) \qquad (3.23)$$
for some constant $C$. Next, consider mutually different indices $i, j, k$.
Thus,
$$E\{f(i,j)f(i,k)\} = \sum_{x,y=1}^{2}(-1)^{|x-y|}\sum_{S_x}\sum_{R_y}\sum_{a,b,c,d=1}^{2}(-1)^{|a-b|+|c-d|}\,E\big\{V_{s_as_b}(i,j)V_{r_cr_d}(i,k)\big\}. \qquad (3.24)$$
Under the null hypothesis, $E\{V_{s_as_b}(i,j)V_{r_cr_d}(i,k)\} = \mathrm{tr}^2(\Sigma^2) + 2\mathrm{tr}(\Sigma C_{r_cs_a}\Sigma C_{s_ar_c})$. Hence
$$\sum_{a,b,c,d=1}^{2}(-1)^{|a-b|+|c-d|}\big\{\mathrm{tr}^2(\Sigma^2) + 2\mathrm{tr}(\Sigma C_{r_cs_a}\Sigma C_{s_ar_c})\big\} = 0, \qquad (3.25)$$
because the summand does not depend on $b$ or $d$ and $\sum_{b=1}^{2}(-1)^{|a-b|} = 0$ for each $a$. Therefore $E\{f(i,j)f(i,k)\} = 0$. Lastly, if we consider the mutually different indices $i, j, k, l$, then $E\{f(i,j)f(k,l)\} = 0$ due to independence and the fact that $E\{f(i,j)\} = 0$.

Consider the difference $G_n(\eta) - G_n(\nu)$ squared:
$$\begin{aligned}
\{G_n(\eta) - G_n(\nu)\}^2 &= \{n(n-1)\mathrm{tr}^2(\Sigma^2)T^3\}^{-1}\Big\{\sum_{i\ne j} f(i,j)\Big\}^2\\
&= 2\{n(n-1)\mathrm{tr}^2(\Sigma^2)T^3\}^{-1}\sum_{i\ne j} f(i,j)f(i,j) + 4\{n(n-1)\mathrm{tr}^2(\Sigma^2)T^3\}^{-1}\sum_{i\ne j\ne k} f(i,j)f(i,k)\\
&\quad + \{n(n-1)\mathrm{tr}^2(\Sigma^2)T^3\}^{-1}\sum_{i\ne j\ne k\ne l} f(i,j)f(k,l).
\end{aligned}$$
For any real numbers $a$, $b$, and $c$, $(a+b+c)^2 \le 4a^2 + 4b^2 + 2c^2$. Thus,
$$\{G_n(\eta) - G_n(\nu)\}^4 \le 16\{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}\Big\{\sum_{i\ne j} f(i,j)f(i,j)\Big\}^2 + 64\{\cdot\}^{-1}\Big\{\sum_{i\ne j\ne k} f(i,j)f(i,k)\Big\}^2 + 2\{\cdot\}^{-1}\Big\{\sum_{i\ne j\ne k\ne l} f(i,j)f(k,l)\Big\}^2,$$
writing $\{\cdot\}^{-1} = \{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}$. Taking the expectation of both sides of the above inequality, it follows that
$$E\{G_n(\eta) - G_n(\nu)\}^4 \le I_1 + I_2 + I_3, \qquad (3.26)$$
where $I_1$, $I_2$, and $I_3$ denote the three expected terms on the right-hand side. To bound the expectation in (3.26) we need the order of $I_1$, $I_2$, and $I_3$, which requires expanding multiple summations across non-identical indices.

First, consider the possible indices for expanding the term inside the expectation for $I_1$ in (3.26):
$$\Big\{\sum_{i\ne j} f(i,j)f(i,j)\Big\}^2 = \sum_{i\ne j}\sum_{i_1\ne j_1} f(i,j)f(i,j)f(i_1,j_1)f(i_1,j_1). \qquad (3.27)$$
Let $D_c$ denote the index configurations of $\{i,j\} \cup \{i_1,j_1\}$ in which $c$ indices of the two sets $\{i,j\}$ and $\{i_1,j_1\}$ are equivalent. If there are no equivalent indices, then $D_0 = \{(i,j,i_1,j_1)\}$, and the summation over $D_0$ is given by
$$\sum_{i\ne i_1\ne j\ne j_1} f(i,j)f(i,j)f(i_1,j_1)f(i_1,j_1). \qquad (3.28)$$
If there is one equivalent index, then $D_1 = \{(i=i_1,j,j_1),\,(i=j_1,j,i_1),\,(i,j=i_1,j_1),\,(i,j=j_1,i_1)\}$. Letting $\tilde D_1 = \{(i=i_1,j,j_1)\}$ be the configuration producing a unique combination of $f(i,j)f(i,j)f(i_1,j_1)f(i_1,j_1)$, the summation over $D_1$ is equivalent to
$$\sum_{i\ne j\ne j_1} 4 f(i,j)f(i,j)f(i,j_1)f(i,j_1). \qquad (3.29)$$
If there are two equivalent indices, then $D_2 = \{(i=i_1,j=j_1),\,(i=j_1,j=i_1)\}$, and the summation over $D_2$ is given by
$$\sum_{i\ne j} 2 f(i,j)f(i,j)f(i,j)f(i,j). \qquad (3.30)$$
As a result, from (3.28)–(3.30),
$$E\Big\{\sum_{i\ne j} f(i,j)f(i,j)\Big\}^2 = E\Big\{\sum_{i\ne i_1\ne j\ne j_1} f(i,j)f(i,j)f(i_1,j_1)f(i_1,j_1)\Big\} + 4E\Big\{\sum_{i\ne j\ne j_1} f(i,j)f(i,j)f(i,j_1)f(i,j_1)\Big\} + 2E\Big\{\sum_{i\ne j} f(i,j)f(i,j)f(i,j)f(i,j)\Big\}. \qquad (3.31)$$
Thus,
$$\begin{aligned}
I_1 &= 16\{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}E\Big\{\sum_{i\ne i_1\ne j\ne j_1} f(i,j)f(i,j)f(i_1,j_1)f(i_1,j_1)\Big\}\\
&\quad + 64\{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}E\Big\{\sum_{i\ne j\ne j_1} f(i,j)f(i,j)f(i,j_1)f(i,j_1)\Big\} \qquad (3.32)\\
&\quad + 32\{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}E\Big\{\sum_{i\ne j} f(i,j)f(i,j)f(i,j)f(i,j)\Big\}\\
&\equiv R_1 + R_2 + R_3. \qquad (3.33)
\end{aligned}$$
We now show the order of each of $R_1$, $R_2$, and $R_3$ in terms of $n$, $p$, and $T$. For $R_1$, consider the order of $E\{f(i,j)f(i,j)f(i_1,j_1)f(i_1,j_1)\}$ for mutually different indices:
$$E\{f(i,j)f(i,j)f(i_1,j_1)f(i_1,j_1)\} = E\{f(i,j)f(i,j)\}\,E\{f(i_1,j_1)f(i_1,j_1)\} = C\Big[\sum_{x,y=1}^{2}(-1)^{|x-y|}\sum_{S_x}\sum_{R_y}\sum_{a,b,c,d=1}^{2}(-1)^{|a-b|+|c-d|}\,\mathrm{tr}^2(C_{s_br_d}C_{r_cs_a})\Big]^2 \asymp C\big[\mathrm{tr}^2(\Sigma^2)\,T^2([T\eta]-[T\nu])\big]^2 \qquad (3.34)$$
for some constant $C$. Therefore,
$$R_1 \asymp \frac{C\,n^2(n-1)^2\big[\mathrm{tr}^2(\Sigma^2)T^2([T\eta]-[T\nu])\big]^2}{n^2(n-1)^2\,\mathrm{tr}^4(\Sigma^2)\,T^6} \asymp C\,\frac{([T\eta]-[T\nu])^2}{T^2} \qquad (3.35)$$
for some constant $C$. Next, consider the term $R_3$. From Lemma 7, $E\{V_{s_as_b}(i,j)V_{r_cr_d}(i,j)V_{u_eu_f}(i,j)V_{x_gx_h}(i,j)\}$ was calculated up to a constant. Thus $E\{f(i,j)f(i,j)f(i,j)f(i,j)\}$ is given by
$$\sum_{w,x,y,z=1}^{2}(-1)^{|w-x|+|y-z|}\sum_{S_w}\sum_{R_x}\sum_{U_y}\sum_{X_z}\ \sum_{a,b,c,d,e,f,g,h=1}^{2}(-1)^{|a-b|+|c-d|+|e-f|+|g-h|}\,E\big\{V_{s_as_b}(i,j)V_{r_cr_d}(i,j)V_{u_eu_f}(i,j)V_{x_gx_h}(i,j)\big\}.$$
Under the null hypothesis,
$$E\{f(i,j)f(i,j)f(i,j)f(i,j)\} \asymp C\big[\mathrm{tr}^2(\Sigma^2)\,T^2([T\eta]-[T\nu])\big]^2.$$
Therefore,
$$R_3 \asymp \frac{C\big[\mathrm{tr}^2(\Sigma^2)T^2([T\eta]-[T\nu])\big]^2}{n(n-1)\,\mathrm{tr}^4(\Sigma^2)\,T^6}, \qquad (3.36)$$
and thus $R_3 = o(R_1)$. For the final term in $I_1$, $R_2$, consider the order of $E\{f(i,j)f(i,j)f(i,j_1)f(i,j_1)\}$. By the Cauchy–Schwarz inequality,
$$E\{f(i,j)f(i,j)f(i,j_1)f(i,j_1)\} \le \big[E\{f(i,j)f(i,j)f(i,j)f(i,j)\}\big]^{1/2}\big[E\{f(i,j_1)f(i,j_1)f(i,j_1)f(i,j_1)\}\big]^{1/2} = O\big(E\{f(i,j)f(i,j)f(i,j)f(i,j)\}\big).$$
Therefore, based on the above results for $R_3$, it follows that $R_2 = o(R_1)$. As a result, for some constant $C$,
$$I_1 \le C\,\frac{([T\eta]-[T\nu])^2}{T^2}. \qquad (3.37)$$

Next, we investigate the order of $I_2$. Consider the possible indices for expanding
$$\Big\{\sum_{i\ne j\ne k} f(i,j)f(i,k)\Big\}^2 = \sum_{i\ne j\ne k}\sum_{i_1\ne j_1\ne k_1} f(i,j)f(i,k)f(i_1,j_1)f(i_1,k_1). \qquad (3.38)$$
Let $E_c$ denote the index configurations of $\{i,j,k\} \cup \{i_1,j_1,k_1\}$ in which $c$ indices of the two sets $\{i,j,k\}$ and $\{i_1,j_1,k_1\}$ are equivalent. If there are no equivalent indices, then $E_0 = \{(i,j,k,i_1,j_1,k_1)\}$, and the summation over $E_0$ is given by
$$\sum_{i\ne i_1\ne j\ne j_1\ne k\ne k_1} f(i,j)f(i,k)f(i_1,j_1)f(i_1,k_1). \qquad (3.39)$$
If there is one equivalent index, then $E_1$ consists of the nine configurations pairing one index of $\{i,j,k\}$ with one index of $\{i_1,j_1,k_1\}$. Let $\tilde E_1 = \{(i=i_1,j,k,j_1,k_1),\,(i=j_1,j,k,i_1,k_1),\,(i,j=j_1,k,i_1,k_1)\}$ be the configurations producing unique combinations of $f(i,j)f(i,k)f(i_1,j_1)f(i_1,k_1)$. Hence, the summation over $E_1$ is equivalent to
$$\sum_{i\ne j\ne j_1\ne k\ne k_1} f(i,j)f(i,k)f(i,j_1)f(i,k_1) + \sum_{i\ne i_1\ne j\ne k\ne k_1} 4f(i,j)f(i,k)f(i_1,i)f(i_1,k_1) + \sum_{i\ne i_1\ne j\ne k\ne k_1} 4f(i,j)f(i,k)f(i_1,j)f(i_1,k_1). \qquad (3.40)$$
If there are two equivalent indices, then $E_2$ consists of the eighteen configurations with two such pairings. Let $\tilde E_2 = \{(i=i_1,j=j_1,k,k_1),\,(i=j_1,j=i_1,k,k_1),\,(i=j_1,j=k_1,k,i_1),\,(j=j_1,k=k_1,i,i_1)\}$ be the configurations producing unique combinations. Hence, the summation over $E_2$ is equivalent to
$$\sum_{i\ne j\ne k\ne k_1} 4f(i,j)f(i,k)f(i,j)f(i,k_1) + \sum_{i\ne j\ne k\ne k_1} 4f(i,j)f(i,k)f(j,i)f(j,k_1) + \sum_{i\ne i_1\ne j\ne k} 8f(i,j)f(i,k)f(i_1,i)f(i_1,j) + \sum_{i\ne i_1\ne j\ne k} 2f(i,j)f(i,k)f(i_1,j)f(i_1,k). \qquad (3.41)$$
Lastly, if there are three equivalent indices, then $E_3$ consists of the six bijections between $\{i,j,k\}$ and $\{i_1,j_1,k_1\}$, with $\tilde E_3 = \{(i=i_1,j=j_1,k=k_1),\,(i=j_1,j=i_1,k=k_1)\}$ producing unique combinations. Hence, the summation over $E_3$ is equivalent to
$$\sum_{i\ne j\ne k} 2f(i,j)f(i,k)f(i,j)f(i,k) + \sum_{i\ne j\ne k} 4f(i,j)f(i,k)f(j,i)f(j,k). \qquad (3.42)$$
As a result, from (3.39)–(3.42),
$$\begin{aligned}
E\Big\{\sum_{i\ne j\ne k} f(i,j)f(i,k)\Big\}^2
&= E\Big\{\sum_{i\ne i_1\ne j\ne j_1\ne k\ne k_1} f(i,j)f(i,k)f(i_1,j_1)f(i_1,k_1)\Big\} + E\Big\{\sum_{i\ne j\ne j_1\ne k\ne k_1} f(i,j)f(i,k)f(i,j_1)f(i,k_1)\Big\}\\
&\quad + 4E\Big\{\sum_{i\ne i_1\ne j\ne k\ne k_1} f(i,j)f(i,k)f(i_1,i)f(i_1,k_1)\Big\} + 4E\Big\{\sum_{i\ne i_1\ne j\ne k\ne k_1} f(i,j)f(i,k)f(i_1,j)f(i_1,k_1)\Big\}\\
&\quad + 4E\Big\{\sum_{i\ne j\ne k\ne k_1} f(i,j)f(i,k)f(i,j)f(i,k_1)\Big\} + 4E\Big\{\sum_{i\ne j\ne k\ne k_1} f(i,j)f(i,k)f(j,i)f(j,k_1)\Big\}\\
&\quad + 8E\Big\{\sum_{i\ne i_1\ne j\ne k} f(i,j)f(i,k)f(i_1,i)f(i_1,j)\Big\} + 2E\Big\{\sum_{i\ne i_1\ne j\ne k} f(i,j)f(i,k)f(i_1,j)f(i_1,k)\Big\}\\
&\quad + 2E\Big\{\sum_{i\ne j\ne k} f(i,j)f(i,k)f(i,j)f(i,k)\Big\} + 4E\Big\{\sum_{i\ne j\ne k} f(i,j)f(i,k)f(j,i)f(j,k)\Big\}. \qquad (3.43)
\end{aligned}$$
Thus,
$$\begin{aligned}
I_2 &= 64\{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}E\Big\{\sum_{i\ne i_1\ne j\ne j_1\ne k\ne k_1} f(i,j)f(i,k)f(i_1,j_1)f(i_1,k_1)\Big\}\\
&\quad + 64\{\cdot\}^{-1}E\Big\{\sum_{i\ne j\ne j_1\ne k\ne k_1} f(i,j)f(i,k)f(i,j_1)f(i,k_1)\Big\} + 256\{\cdot\}^{-1}E\Big\{\sum_{i\ne i_1\ne j\ne k\ne k_1} f(i,j)f(i,k)f(i_1,i)f(i_1,k_1)\Big\}\\
&\quad + 256\{\cdot\}^{-1}E\Big\{\sum_{i\ne i_1\ne j\ne k\ne k_1} f(i,j)f(i,k)f(i_1,j)f(i_1,k_1)\Big\} + 256\{\cdot\}^{-1}E\Big\{\sum_{i\ne j\ne k\ne k_1} f(i,j)f(i,k)f(i,j)f(i,k_1)\Big\}\\
&\quad + 256\{\cdot\}^{-1}E\Big\{\sum_{i\ne j\ne k\ne k_1} f(i,j)f(i,k)f(j,i)f(j,k_1)\Big\} + 512\{\cdot\}^{-1}E\Big\{\sum_{i\ne i_1\ne j\ne k} f(i,j)f(i,k)f(i_1,i)f(i_1,j)\Big\}\\
&\quad + 128\{\cdot\}^{-1}E\Big\{\sum_{i\ne i_1\ne j\ne k} f(i,j)f(i,k)f(i_1,j)f(i_1,k)\Big\} + 128\{\cdot\}^{-1}E\Big\{\sum_{i\ne j\ne k} f(i,j)f(i,k)f(i,j)f(i,k)\Big\}\\
&\quad + 256\{\cdot\}^{-1}E\Big\{\sum_{i\ne j\ne k} f(i,j)f(i,k)f(j,i)f(j,k)\Big\}\\
&\equiv S_1 + S_2 + S_3 + S_4 + S_5 + S_6 + S_7 + S_8 + S_9 + S_{10}, \qquad (3.44)
\end{aligned}$$
with $\{\cdot\}^{-1} = \{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}$ throughout. Under the null hypothesis, $S_1$, $S_2$, $S_3$, and $S_4$ are all zero. Terms $S_9$ and $S_{10}$ are of the same order as $R_2$; thus $S_9 = o(I_1)$ and $S_{10} = o(I_1)$. Terms $S_5$, $S_6$, $S_7$, and $S_8$ are of the same order in $n$ as the term $R_1$; additionally, using two iterations of the Cauchy–Schwarz inequality, these terms match $R_3$ in terms of $n$ and $p$. Thus $S_5$, $S_6$, $S_7$, and $S_8$ are all of smaller order than $I_1$. As a result, for some constant $C$,
$$I_2 \le C\,\frac{([T\eta]-[T\nu])^2}{T^2}. \qquad (3.45)$$

Finally, we show the order of $I_3$. Consider the possible indices for expanding
$$\Big\{\sum_{i\ne j\ne k\ne l} f(i,j)f(k,l)\Big\}^2 = \sum_{i\ne j\ne k\ne l}\sum_{i_1\ne j_1\ne k_1\ne l_1} f(i,j)f(k,l)f(i_1,j_1)f(k_1,l_1). \qquad (3.46)$$
Let $F_c$ denote the index configurations of $\{i,j,k,l\} \cup \{i_1,j_1,k_1,l_1\}$ in which $c$ indices of the two sets $\{i,j,k,l\}$ and $\{i_1,j_1,k_1,l_1\}$ are equivalent. If there are no equivalent indices, then $F_0 = \{(i,j,k,l,i_1,j_1,k_1,l_1)\}$.
Hence, the summation over $F_0$ is given by
$$\sum_{i\ne i_1\ne j\ne j_1\ne k\ne k_1\ne l\ne l_1} f(i,j)f(k,l)f(i_1,j_1)f(k_1,l_1). \qquad (3.47)$$
If there is one equivalent index, then $F_1$ consists of the sixteen configurations pairing one index of $\{i,j,k,l\}$ with one index of $\{i_1,j_1,k_1,l_1\}$, and the summation over $F_1$ is equivalent to
$$\sum_{i\ne j\ne j_1\ne k\ne k_1\ne l\ne l_1} 16 f(i,j)f(k,l)f(i,j_1)f(k_1,l_1). \qquad (3.48)$$
If there are two equivalent indices, then $F_2$ consists of the seventy-two configurations with two such pairings, and the summation over $F_2$ is equivalent to
$$\sum_{i\ne j\ne k\ne k_1\ne l\ne l_1} 4 f(i,j)f(k,l)f(i,j)f(k_1,l_1) + \sum_{i\ne j\ne j_1\ne k\ne l\ne l_1} 28 f(i,j)f(k,l)f(i,j_1)f(j,l_1) + \sum_{i\ne j\ne j_1\ne k\ne l\ne l_1} 32 f(i,j)f(k,l)f(i,j_1)f(k,l_1). \qquad (3.49)$$
If there are three equivalent indices, then $F_3$ consists of the ninety-six configurations pairing three indices of $\{i,j,k,l\}$ with three indices of $\{i_1,j_1,k_1,l_1\}$, and the summation over $F_3$ is equivalent to
$$\sum_{i\ne j\ne k\ne l\ne l_1} 32 f(i,j)f(k,l)f(i,j)f(k,l_1) + \sum_{i\ne j\ne k\ne l\ne l_1} 64 f(i,j)f(k,l)f(i,k)f(j,l_1). \qquad (3.50)$$
Lastly, if there are four equivalent indices, then $F_4$ consists of the twenty-four bijections between $\{i,j,k,l\}$ and $\{i_1,j_1,k_1,l_1\}$, and the summation over $F_4$ is equivalent to
$$\sum_{i\ne j\ne k\ne l} 6 f(i,j)f(k,l)f(i,j)f(k,l) + \sum_{i\ne j\ne k\ne l} 18 f(i,j)f(k,l)f(i,k)f(j,l). \qquad (3.51)$$
As a result, from (3.47)–(3.51),
$$\begin{aligned}
E\Big[\Big\{\sum_{i\ne j\ne k\ne l} f(i,j)f(k,l)\Big\}^2\Big]
&= E\Big\{\sum_{i\ne i_1\ne j\ne j_1\ne k\ne k_1\ne l\ne l_1} f(i,j)f(k,l)f(i_1,j_1)f(k_1,l_1)\Big\} + 16E\Big\{\sum f(i,j)f(k,l)f(i,j_1)f(k_1,l_1)\Big\}\\
&\quad + 4E\Big\{\sum f(i,j)f(k,l)f(i,j)f(k_1,l_1)\Big\} + 28E\Big\{\sum f(i,j)f(k,l)f(i,j_1)f(j,l_1)\Big\}\\
&\quad + 32E\Big\{\sum f(i,j)f(k,l)f(i,j_1)f(k,l_1)\Big\} + 32E\Big\{\sum f(i,j)f(k,l)f(i,j)f(k,l_1)\Big\}\\
&\quad + 64E\Big\{\sum f(i,j)f(k,l)f(i,k)f(j,l_1)\Big\} + 6E\Big\{\sum f(i,j)f(k,l)f(i,j)f(k,l)\Big\} + 18E\Big\{\sum f(i,j)f(k,l)f(i,k)f(j,l)\Big\},
\end{aligned}$$
where each summation runs over the index set indicated in (3.47)–(3.51). Thus,
$$I_3 = 2\{\cdot\}^{-1}E\{\cdot\} + 32\{\cdot\}^{-1}E\{\cdot\} + 8\{\cdot\}^{-1}E\{\cdot\} + 56\{\cdot\}^{-1}E\{\cdot\} + 64\{\cdot\}^{-1}E\{\cdot\} + 64\{\cdot\}^{-1}E\{\cdot\} + 128\{\cdot\}^{-1}E\{\cdot\} + 12\{\cdot\}^{-1}E\{\cdot\} + 36\{\cdot\}^{-1}E\{\cdot\} \equiv Q_1 + Q_2 + \cdots + Q_9, \qquad (3.52)$$
with $\{\cdot\}^{-1} = \{n^2(n-1)^2\mathrm{tr}^4(\Sigma^2)T^6\}^{-1}$ and the nine expectations taken, in order, over the nine summations displayed above.
Due to the mutually different indices, $Q_1$, $Q_2$, $Q_3$, and $Q_4$ all equal zero, since $E\{f(i,j)\} = 0$ for $i$ different than $j$. Furthermore, under the null hypothesis, $Q_5$, $Q_6$, and $Q_7$ all equal zero. For $Q_5$, consider $E\{f(i,j)f(k,l)f(i,j_1)f(k,l_1)\}$. Due to the mutually different indices,
$$E\{f(i,j)f(k,l)f(i,j_1)f(k,l_1)\} = E\{f(i,j)f(i,j_1)\}\,E\{f(k,l)f(k,l_1)\}.$$
By (3.25), each of the expectation factors is zero, and thus $Q_5 = 0$. Similarly, for $Q_6$, $E\{f(i,j)f(k,l)f(i,j)f(k,l_1)\} = E\{f(i,j)f(i,j)\}\,E\{f(k,l)f(k,l_1)\}$, and again by (3.25) the factor $E\{f(k,l)f(k,l_1)\}$ is zero; thus $Q_6 = 0$. To see that $Q_7$ is zero, note that $f(i,j)f(k,l)f(i,k)f(j,l_1)$ can be expressed as
$$\Big(\sum_{S_1}\tilde W_{s_1s_2}(i,j) - \sum_{S_2}\tilde W_{s_1s_2}(i,j)\Big)\Big(\sum_{R_1}\tilde W_{r_1r_2}(k,l) - \sum_{R_2}\tilde W_{r_1r_2}(k,l)\Big)\Big(\sum_{U_1}\tilde W_{u_1u_2}(i,k) - \sum_{U_2}\tilde W_{u_1u_2}(i,k)\Big)\Big(\sum_{X_1}\tilde W_{x_1x_2}(j,l_1) - \sum_{X_2}\tilde W_{x_1x_2}(j,l_1)\Big),$$
so that
$$E\{f(i,j)f(k,l)f(i,k)f(j,l_1)\} = \sum_{a,b,c,d=1}^{2}(-1)^{|a-b|+|c-d|}\sum_{S_a}\sum_{R_b}\sum_{U_c}\sum_{X_d} E\big\{\tilde W_{s_1s_2}(i,j)\tilde W_{r_1r_2}(k,l)\tilde W_{u_1u_2}(i,k)\tilde W_{x_1x_2}(j,l_1)\big\}.$$
Accordingly, $E\{\tilde W_{s_1s_2}(i,j)\tilde W_{r_1r_2}(k,l)\tilde W_{u_1u_2}(i,k)\tilde W_{x_1x_2}(j,l_1)\}$ can be expressed as
$$\sum_{a,b,c,d,e,f,g,h=1}^{2}(-1)^{|a-b|+|c-d|+|e-f|+|g-h|}\,E\big\{V_{s_as_b}(i,j)V_{r_cr_d}(k,l)V_{u_eu_f}(i,k)V_{x_gx_h}(j,l_1)\big\}.$$
Under the null hypothesis,
$$\begin{aligned}
E\big\{V_{s_as_b}(i,j)V_{r_cr_d}(k,l)V_{u_eu_f}(i,k)V_{x_gx_h}(j,l_1)\big\}
&= \mathrm{tr}^4(\Sigma^2) + 2\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_bx_g}\Sigma C_{x_gs_b}) + 2\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{s_au_e}\Sigma C_{u_es_a}) + 2\mathrm{tr}^2(\Sigma^2)\mathrm{tr}(\Sigma C_{r_cu_f}\Sigma C_{u_fr_c})\\
&\quad + 4\mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{u_es_a}C_{s_bx_g}\Sigma C_{x_gs_b}C_{s_au_e}) + 4\mathrm{tr}(\Sigma^2)\mathrm{tr}(\Sigma C_{r_cu_f}C_{u_es_a}\Sigma C_{s_au_e}C_{u_fr_c})\\
&\quad + 4\mathrm{tr}(\Sigma C_{r_cu_f}\Sigma C_{u_fr_c})\mathrm{tr}(\Sigma C_{s_bx_g}\Sigma C_{x_gs_b}) + 8\mathrm{tr}(\Sigma C_{r_cu_f}C_{u_es_a}C_{s_bx_g}\Sigma C_{x_gs_b}C_{s_au_e}C_{u_fr_c}).
\end{aligned}$$
The signed summation of this expression over $a, b, c, d, e, f, g, h \in \{1,2\}$ is zero. Hence $Q_7 = 0$.

Terms $Q_8$ and $Q_9$ are at most of the order of $R_1$. Term $Q_8$ has the same order, up to a constant, as $R_1$ due to the four mutually different indices. By the Cauchy–Schwarz inequality, the expectation $E\{f(i,j)f(k,l)f(i,k)f(j,l)\}$ appearing in $Q_9$ satisfies
$$E\{f(i,j)f(k,l)f(i,k)f(j,l)\} \le \big[E\{f(i,j)f(k,l)f(i,j)f(k,l)\}\big]^{1/2}\big[E\{f(i,k)f(j,l)f(i,k)f(j,l)\}\big]^{1/2} = O\big(E\{f(i,j)f(i,j)f(k,l)f(k,l)\}\big).$$
As a result,
$$E\big\{|G_n(i/T) - G_n(j/T)|^4\big\} \le C\Big[\frac{j-i}{T}\Big]^2 = C\Big(\frac{1}{T}\sum_{i<l\le j} u_l\Big)^{2\alpha}, \qquad (3.53)$$
and hence, for any $\lambda > 0$,
$$\mathrm{pr}\big(|G_n(i/T) - G_n(j/T)| \ge \lambda\big) \le \frac{E\{|G_n(i/T) - G_n(j/T)|^4\}}{\lambda^4} \le \frac{C}{\lambda^4}\Big(\frac{1}{T}\sum_{i<l\le j} u_l\Big)^{2\alpha},$$
where we set $\alpha = 1$, $\beta = 1$, and $u_l = l - (l-1) = 1$. By Theorem 10.2 in Billingsley (1999),
$$\mathrm{pr}\Big(\max_{t\in\mathcal{T}} |G_n(t/T)| \ge \lambda\Big) \le K\lambda^{-4}$$
for some constant $K$. For $\lambda$ large, the above probability is less than any $\varepsilon > 0$. As a result, $\max_{t\in\mathcal{T}}|G_n(t/T)|$ is tight, and thus $\max_{t\in\mathcal{T}}\sigma_{nt,0}^{-1}\hat D_{nt}$ is also tight. Therefore, by the tightness of the stochastic process and the convergence of the finite-dimensional distributions, it follows that $M_n$ converges to a Gaussian process with mean zero and correlation $R_z$. □
Proof of Theorem 9. Assume that one change point exists at time $\tau$; that is, assume the alternative $H_1^{*}$ as defined in (3.15). Let $\Delta_p = \mathrm{tr}\{(\Sigma_1 - \Sigma_T)^2\}$ and $\nu_{t,\max} = \max_{t\in\mathcal{T}}\max\big(\sqrt{V_{0t}/w^2(t)},\ \sqrt{nV_{1t}/w^2(t)}\big)$, where $\mathcal{T} = \{1, \dots, T-1\}$.

Define a set $E(C)$ such that $E(C) = \{t \in \{1, \dots, T-1\} : |t - \tau| \ge C\Theta\}$, where $C$ is some constant and $\Theta$ is a function of $p$, $n$, and $T$. The value $\Theta$ is chosen to exhibit the rate of convergence of the change point estimator under the asymptotic setting where $p$, $n$, and $T$ diverge. Thus, to establish this rate of convergence we must show that for some $C$, $\mathrm{pr}(|\hat\tau - \tau| \ge C\Theta) < \varepsilon$. It is sufficient to show that $\mathrm{pr}(\max_{t\in E(C)} \hat D_{nt} > \hat D_{n\tau}) < \varepsilon$, since $\{\hat\tau \in E(C)\} \subset \{\max_{t\in E(C)} \hat D_{nt} > \hat D_{n\tau}\}$ and therefore $\mathrm{pr}(|\hat\tau - \tau| \ge C\Theta) = \mathrm{pr}(\hat\tau \in E(C)) \le \mathrm{pr}(\max_{t\in E(C)} \hat D_{nt} > \hat D_{n\tau})$. Thus,
$$\mathrm{pr}\Big(\max_{t\in E(C)} \hat D_{nt} > \hat D_{n\tau}\Big) \le \sum_{t\in E(C)} \mathrm{pr}\big(\hat D_{nt} > \hat D_{n\tau}\big) = \sum_{t\in E(C)} \mathrm{pr}\big[\{\hat D_{nt} - D_t\} + \{-(\hat D_{n\tau} - D_\tau)\} > -\{D_t - D_\tau\}\big].$$
The term $-(D_t - D_\tau)$ can be expressed as $|t - \tau|\,G(t;\tau)\,\Delta_p$, where
$$G(t;\tau) = \begin{cases} \dfrac{1}{T-t}, & 1 \le t \le \tau,\\[6pt] \dfrac{1}{t}, & \tau + 1 \le t < T. \end{cases}$$
In terms of $T$, the function $G$ is of order $1/T$.

Recall that for two random variables $X$ and $Y$, $\mathrm{pr}(|X+Y| > \varepsilon) \le \mathrm{pr}(|X| > \varepsilon/2) + \mathrm{pr}(|Y| > \varepsilon/2)$. Hence,
$$\mathrm{pr}\Big(\max_{t\in E(C)} \hat D_{nt} > \hat D_{n\tau}\Big) \le \sum_{t\in E(C)}\bigg[\mathrm{pr}\Big\{\frac{|\hat D_{nt} - D_t|}{\sigma_{nt}} > \frac{|t-\tau|\,G(t;\tau)\,\Delta_p}{2\sigma_{nt}}\Big\} + \mathrm{pr}\Big\{\frac{|\hat D_{n\tau} - D_\tau|}{\sigma_{n\tau}} > \frac{|t-\tau|\,G(t;\tau)\,\Delta_p}{2\sigma_{n\tau}}\Big\}\bigg].$$
Since $n\sigma_{nt} \asymp \{4\tilde V_{0t} + 8n\tilde V_{1t}\}^{1/2} \le C_1^{-1}\,2\nu_{t,\max}$ for some constant $C_1$ (and similarly for $\sigma_{n\tau}$), and $|t-\tau| \ge C\Theta$ on $E(C)$, the bound continues as
$$\le \sum_{t\in E(C)}\bigg[\mathrm{pr}\Big\{\frac{|\hat D_{nt} - D_t|}{\sigma_{nt}} > \frac{C\Theta\,G(t;\tau)\,n\Delta_p}{\nu_{t,\max}}\Big\} + \mathrm{pr}\Big\{\frac{|\hat D_{n\tau} - D_\tau|}{\sigma_{n\tau}} > \frac{C\Theta\,G(t;\tau)\,n\Delta_p}{\nu_{t,\max}}\Big\}\bigg]$$
for some constants $C_1$ and $C$. Choose $\Theta = \nu_{t,\max}\,T\sqrt{\log T}/(n\Delta_p)$. By the choice of $\Theta$, the order of $G(t;\tau)$, and the fact that both $(\hat D_{nt} - D_t)/\sigma_{nt}$ and $(\hat D_{n\tau} - D_\tau)/\sigma_{n\tau}$ are asymptotically $N(0,1)$, it follows that
$$\sum_{t\in E(C)} \mathrm{pr}\Big\{\frac{|\hat D_{nt} - D_t|}{\sigma_{nt}} > \frac{C\Theta\,G(t;\tau)\,n\Delta_p}{\nu_{t,\max}}\Big\} \le \sum_{t\in E(C)} \mathrm{pr}\big(|Z| > \sqrt{C\log T}\big), \qquad (3.54)$$
$$\sum_{t\in E(C)} \mathrm{pr}\Big\{\frac{|\hat D_{n\tau} - D_\tau|}{\sigma_{n\tau}} > \frac{C\Theta\,G(t;\tau)\,n\Delta_p}{\nu_{t,\max}}\Big\} \le \sum_{t\in E(C)} \mathrm{pr}\big(|Z| > \sqrt{C\log T}\big), \qquad (3.55)$$
where $Z \sim N(0,1)$ and $C$ is some constant. Recall that for a standard normal random variable $Z$ and any $k > 0$, $\mathrm{pr}(|Z| > k) \le 2\exp(-k^2/2)$. For a large enough $C$, the summation terms in (3.54) and (3.55) satisfy
$$\sum_{t\in E(C)} \mathrm{pr}\big(|Z| > \sqrt{C\log T}\big) \le \sum_{t\in E(C)} 2T^{-C/2} < \varepsilon,$$
and for large $C$ the series is convergent as $T \to \infty$. Therefore, $\mathrm{pr}(\max_{t\in E(C)} \hat D_{nt} > \hat D_{n\tau}) < \varepsilon$, and
$$\hat\tau - \tau = O_p\bigg(\frac{\nu_{t,\max}\,T\sqrt{\log T}}{n\Delta_p}\bigg)$$
for $\Delta_p = \mathrm{tr}\{(\Sigma_1 - \Sigma_T)^2\}$ and $\nu_{t,\max} = \max_{t\in\mathcal{T}}\max\big(\sqrt{V_{0t}/w^2(t)},\ \sqrt{nV_{1t}/w^2(t)}\big)$. The rate of convergence can be simplified further, since the function $w^{-1}(t)$ is minimized at $T/2$.
Therefore,
$$\hat\tau - \tau = O_p\bigg(\frac{\nu_{\max}\sqrt{\log T}}{n\Delta_p}\bigg)$$
for $\Delta_p = \mathrm{tr}\{(\Sigma_1 - \Sigma_T)^2\}$ and $\nu_{\max} = \max_{t\in\mathcal{T}}\max\big(\sqrt{V_{0t}},\ \sqrt{nV_{1t}}\big)$. □

Proof of Theorem 10. Recall Theorem 5: under the alternative $H_1$ of (3.1), the maximum value of $D_t$ is attained at one of the $q$ change points. We will make use of this theorem in the proof that follows.

We first show that, provided change points exist, we can detect their existence with probability one and identify their locations with probability one. Assume at least one change point exists in the interval $I_t$ and that the cardinality of $\hat{\mathcal{Q}}$ is less than the cardinality of $\mathcal{Q}$. To show that we can detect the existence of change points with probability one in the interval $I_t$, we must show that $\mathrm{pr}(M_n[I_t] > W_{\alpha_n}[I_t]) \to 1$:
$$\begin{aligned}
\mathrm{pr}(M_n[I_t] > W_{\alpha_n}[I_t]) &\ge \mathrm{pr}\big(\sigma_{nt,0}^{-1}[I_t]\,\hat D_{nt}[I_t] > W_{\alpha_n}[I_t]\big) = 1 - \mathrm{pr}\big(\sigma_{nt,0}^{-1}[I_t]\,\hat D_{nt}[I_t] \le W_{\alpha_n}[I_t]\big)\\
&= 1 - \mathrm{pr}\bigg(\frac{\hat D_{nt}[I_t] - D_t[I_t]}{\sigma_{nt}[I_t]} \le \frac{\sigma_{nt,0}[I_t]\,W_{\alpha_n}[I_t] - D_t[I_t]}{\sigma_{nt}[I_t]}\bigg)\\
&= 1 - \mathrm{pr}\bigg(Z \le \frac{\sigma_{nt,0}[I_t]\,W_{\alpha_n}[I_t] - D_t[I_t]}{\sigma_{nt}[I_t]}\bigg) \to 1,
\end{aligned}$$
where $Z$ is a standard normal random variable. The probability $\mathrm{pr}\big(Z \le \{\sigma_{nt,0}[I_t]W_{\alpha_n}[I_t] - D_t[I_t]\}/\sigma_{nt}[I_t]\big)$ goes to zero by our premise that $W_{\alpha_n} = o(\mathrm{mSNR})$ for any $I_t$. Therefore, it follows that we can detect the existence of a change point with probability one. Furthermore, by Theorem 5, Theorem 9, and our premise that $\nu_{\max}[I_t]\sqrt{\log T}/(n\Delta_p[I_t]) \to 0$, we can also correctly identify a change point with probability one. The above derivations do not depend on $I_t$, since each subsequence satisfies the premises of this theorem.

We also need to demonstrate that no change points will be identified that are not true change points. Thus, consider the case where $\hat{\mathcal{Q}} = \mathcal{Q}$. It is sufficient to demonstrate that no change point will be detected among the remaining time-interval segments. Under $H_0$ of (3.1), as $n \to \infty$ it follows from Theorem 8 that $\mathrm{pr}(M_n[I_t] > W_{\alpha_n}[I_t]) = \alpha_n \to 0$ for any interval $I_t$ with no change points. Therefore, no change points will be incorrectly identified at any stage of the binary segmentation procedure. As a result, $\hat{\mathcal{Q}} \to \mathcal{Q}$ in probability. □
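The binary segmentation recursion analyzed in this proof can be summarized in a few lines of R. In the sketch below, detect_change and locate_change are hypothetical placeholder functions standing in for the test $M_n[I_t] > W_{\alpha_n}[I_t]$ and the change point estimator $\hat\tau$; only the recursion itself, not the test, is implemented.

```r
## Schematic binary segmentation: test an interval, split at the estimated
## change point, and recurse on the two halves. `detect_change(dat, lo, hi)`
## should return TRUE/FALSE and `locate_change(dat, lo, hi)` an integer in
## (lo, hi); both are placeholders here, not functions from the dissertation.
binary_segmentation <- function(dat, lo, hi, detect_change, locate_change) {
  if (hi - lo < 2L) return(integer(0))            # too short to contain a change
  if (!detect_change(dat, lo, hi)) return(integer(0))
  tau <- locate_change(dat, lo, hi)               # estimated change point
  c(binary_segmentation(dat, lo, tau, detect_change, locate_change),
    tau,
    binary_segmentation(dat, tau + 1L, hi, detect_change, locate_change))
}
## e.g. cps <- binary_segmentation(dat, 1L, T_total, detect_change, locate_change)
```

Because each recursive call operates on a subinterval satisfying the premises above, the returned set of locations converges to the true set of change points, which is the content of Theorem 10.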
CHAPTER 4

A HIDDEN MARKOV APPROACH FOR QTL MAPPING USING ALLELE-SPECIFIC EXPRESSION SNPS

4.1 Introduction

Allele-specific expression (ASE) is part of the foundation for genetic diversity and is paramount to the programming and development of biological cells (Ferguson-Smith 2001). ASE serves as a proxy for differential expression of two alleles at the same location within an organism (Gu & Wang 2015); for example, allele-specific expression can be characterized as the ratio between allele A and allele T. Differential expression is primarily explained by three factors: cis-acting modification, post-transcription modification, and epigenetic modification (Ferguson-Smith 2001). Cis-effects correspond to allele-specific variation, and thus, by quantifying ASE, it is possible to identify cis-acting effects on an inter-individual basis among heterozygous individuals (Buckland 2004). The presence of ASE implies that one or multiple variants have cis-acting effects on gene expression levels that could be directly correlated with phenotypic variation (Skelly et al. 2011). In fact, the phenomenon of ASE has become a focal point in identifying predispositions toward certain diseases (de la Chapelle 2009).

Due to the importance of understanding ASE, two natural questions arise with regard to its influence on phenotypic traits. What is the relationship between single nucleotide polymorphisms (SNPs) with ASE and a phenotypic trait? Which SNPs with ASE have an effect on phenotypic variation? Our focus in Chapter 4 is to develop a procedure, built on a novel hierarchical model, to answer the second question.

Quantitative trait loci (QTL) mapping is the statistical process of identifying locations in the genome that are associated with a complex phenotypic trait. For example, geneticists may be interested in understanding which genes affect cholesterol; understanding this association can provide insight into disease prevention and susceptibility. An effective QTL mapping procedure can also give researchers a better understanding of appropriate breeding techniques and permit altered genetic variation within a population (Cheng et al. 2015). Studying SNPs with ASE and phenotypic variation was shown to be successful by Cheng et al. (2015). By applying multiple Bayesian approaches, Cheng et al. (2015) identified genetic markers in chickens associated with resistance to Marek's disease. This disease is highly contagious and results in paralysis of the animal. The potential to eradicate Marek's disease through superior breeding techniques would be valuable to farmers and individuals within the animal science community. Cheng et al. (2015) discovered that 83% of the genetic variance in Marek's disease resistance was explained by the selected SNPs exhibiting ASE. These results were validated through a progeny study that found a 22% difference in the occurrence of Marek's disease after one generation of bidirectional selection (Cheng et al. 2015). This discovery gives credence to the fact that gene expression explains a large portion of phenotypic variation.

Next-generation RNA sequencing data are now widely used to investigate the presence of ASE. However, inference regarding ASE remains a challenge, as does mapping quantitative trait loci when only RNA sequencing data are available (Skelly et al. 2011). Skelly et al. (2011) proposed a three-stage hierarchical Bayesian model to test ASE gene expression and study cis-regulatory variation; however, their procedure requires genomic DNA data to establish prior probabilities. Similarly, Nariai et al. (2016) established a Bayesian framework with variational inference for estimating allele-specific expression; their technique also relied on diploid DNA data and did not link ASE to any phenotypic response. Hu et al. (2015) proposed a unified maximum likelihood approach combining two models based on ASE and total RNA read counts; their approach involved cis-expression QTL mapping with RNA sequencing data via a beta-binomial distribution.

In this chapter we present a novel two-step approach to perform QTL mapping using SNPs with allele-specific expression. In the first step, we predict the ASE ratios from RNA sequencing data. In step two, we use the predicted ASE ratios to identify SNPs with cis-acting effects on a phenotypic response variable. We elicit a hierarchical model for the analysis of RNA sequence data to discover polymorphisms in expressed sequences whose allele-specific expression is correlated with observed phenotypic variation.
In our hierarchical model, we first implement a hidden Markov approach to impute the underlying genotype and ASE status combinations from the RNA read count data and simultaneously predict ASE ratios at heterozygous SNP locations. Second, we apply regularized regression to identify SNPs whose ASE ratios significantly impact an observed phenotypic response; ordinary least squares is then applied for refinement.

Our proposed hierarchical model and procedure have several advantages over existing methods. First, the hidden Markov model allows us to model dependence among SNPs and affords accurate genotype–ASE status imputation given RNA read counts (Steibel et al. 2015). Second, our procedure obtains an ASE ratio estimate in the absence of genomic DNA data, which many existing techniques require for ASE estimation. Third, our proposed model integrates RNA sequencing data and phenotypic data to make inferences about the ASE status and cis-acting effects on the phenotype. Fourth, our proposed method is easy to implement: parameter estimation for the hidden Markov model is performed using the expectation–maximization (EM) algorithm, and variable selection via cyclic coordinate descent allows us to identify significant SNPs quickly and accurately, given an adequate signal-to-noise ratio. Lastly, our hierarchical model offers flexibility with regard to the phenotypic response model of interest, mapping error, spatial dependency, and individual variation in ASE ratios (Steibel et al. 2015).

Chapter 4 is organized as follows. In Section 4.2 we introduce the first layer of our proposed model: a hidden Markov model for the genotype–ASE status, with ASE prediction based on the results of Steibel et al. (2015). In Section 4.3 we propose our method to identify SNPs with ASE that have cis-acting effects on a phenotypic variable of interest. Simulation results and a comparison with two competing procedures are detailed in Section 4.4. In Section 4.5 our procedure is applied to a real data example that combines RNA sequencing data and phenotypic data from a sounder of swine. The swine data set and a procedure implementing the hidden Markov approach are available in the R package HMMASE at http://www.stt.msu.edu/users/pszhong/HMMASE.html.

4.2 A hidden Markov model for SNP genotype calling

In this section we introduce the basic setting and the model proposed in Steibel et al. (2015). We introduce the salient features of their model, HMM-ASE, before we concentrate on ASE prediction and quantitative trait loci mapping in Section 4.3.

Let $X_{il} = (X_{il1}, X_{il2}, X_{il3}, X_{il4})^{T}$ be a random vector of RNA read counts at the $l$th SNP for the $i$th individual. Denote by $x_{il}$ $(l = 1, \dots, L;\ i = 1, \dots, n)$ the observed RNA read counts, where $x_{il1}, x_{il2}, x_{il3}, x_{il4}$ represent the observed counts for alleles A, C, G, and T, respectively. Define the total RNA read count at SNP $l$ for individual $i$ as $n_{il} = \sum_{j=1}^{4} x_{ilj}$. Below we provide a set-up for a hidden Markov model with only two possible alleles, A or T; the procedure can easily be extended to consider a non-bi-allelic SNP.
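For concreteness, the observed read counts can be held in an $n \times L \times 4$ array. The following R sketch, with simulated counts rather than the swine data, shows the layout and the computation of $n_{il}$:

```r
## Illustrative layout of the observed read counts (simulated placeholders):
## an n x L x 4 array with allele order (A, C, G, T).
n <- 3; L <- 5
x <- array(rpois(n * L * 4, lambda = 8), dim = c(n, L, 4),
           dimnames = list(NULL, NULL, c("A", "C", "G", "T")))
n_il <- apply(x, c(1, 2), sum)   # n_il = sum_j x_ilj, an n x L matrix of coverages
```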
The variable Gil is latent and we assume that Gil(l = 1, . . . , L) follows a Markov process. Let A be the probability transition matrix for the Markov process Gil. Define the transition probabilities as k, k(cid:48) = 1, . . . , 5. pr(Gil = k(cid:48)|Gi(l−1) = k) = akk(cid:48) (4.2) Let πik (i = 1, . . . , n; k = 1, . . . , 5) be the the initial probabilities of Gil1 being a specific state in (4.1) such that pr(Gi1 = k) = πik. 136 We assume that the RNA read counts are generated by a hierarchical model conditional on the underlying state of Gil and ASE ratios. Let δil be a random variable for ASE ratios conditional on Gil. Thus, δil|Gil = k ∼  I{δil=1} I{δil=0.5} Beta(0.5,1](α1, β1) Beta[0,0,5)(α2, β2) I{δil=0} for k = 1, for k = 2, for k = 3, for k = 4, for k = 5. (4.3) If the underlying genotype is homozygous, then the corresponding ASE ratio is either zero or one with probability one. If the underlying genotype is heterozygous, but without ASE, then the corresponding ASE ratio is 0.5 with probability one. For the two remaining heterozygous states, it follows that the ASE ratio is defined as Beta(0.5,1](α1, β1) and Beta[0,0,5)(α2, β2), where each represent scaled beta distributions with scale and shape parameters being α1, α2 and β1, β2, respectively. Conditional on Gil, we assume that δil are independent. The first layer of the hierarchical model is conditional on a latent genotype-ASE status. In the second layer of the hierarchical model we define the probability distribution for RNA read counts conditional on 4.3. gives us the distribution of the RNA read counts. Here we assume that Xil = (Xil1, Xil2, Xil3, Xil4)T conditional on δil follows a multinomial distribution such that (cid:16)(cid:16) where p(δil, e) = (cid:17) Xil|δil ∼ M ultinomial(nil, p(δil, e)), δil + 1 − e (cid:16) 4e (cid:17) 3 − 1 δil + e 3 , e 3 , e 3 , 1 − 4e 3 (cid:17) (4.4) is the probability vector in the multinomial distribution for A, C, G, and T, respectively. We assume that all reads are observable via a mapping error parameter denoted as e. (δil, 0, 0, 1 − δil) represents the probabilities for observing A, C, G, and T, respectively. If e = 0, then p(δil, 0) = Figure 4.1 illustrates our hidden Markov model specification for the ith individual when L = 5. The hidden variables Gil are dependent via a Markov process. The variables δil are 137 conditional on Gil and independent among each other. RNA read counts are conditional on Gil, but through δil. δi1 δi2 δi3 δi4 δi5 Gi1 Gi2 Gi3 Gi4 Gi5 Xi1 Xi2 Xi3 Xi4 Xi5 Figure 4.1: A graphical model for illustrating the hidden Markov model for SNP genotype calling. Grey circles represent observed values. White circles represent latent variables. Given observed RNA read counts, xil, we can predict the underlying genotype-ASE status, Gil, via the expectation-maximization (EM) algorithm and forward-backward proce- dure. In addition, and more importantly, given observed RNA read counts and underlying genotype-ASE statuses, we can derive the distribution for allele-specific expression ratios and use the posterior mode of the distribution as an estimate for the ratio of ASE. 4.3 Phenotypic model specification Our ultimate goal is to identify significant SNPs and understand their affects on pheno- typic variation. Let Yi be a phenotypic response of interest, where Yi∼fYi (yi|τi, φ) = exp + c(yi, φ) , i = 1, . . . , n. (4.5) (cid:20)yiτi − b(τi) a(φ) (cid:21) We assume the distribution of Yi is in the form of a known exponential family. 
4.3 Phenotypic model specification

Our ultimate goal is to identify significant SNPs and understand their effects on phenotypic variation. Let $Y_i$ be a phenotypic response of interest, where
$$Y_i \sim f_{Y_i}(y_i \mid \tau_i, \phi) = \exp\left\{ \frac{y_i \tau_i - b(\tau_i)}{a(\phi)} + c(y_i, \phi) \right\}, \qquad i = 1, \ldots, n. \qquad (4.5)$$
We assume the distribution of $Y_i$ is in the form of a known exponential family. Let $\tau$ be the canonical parameter and let $\phi$ be the dispersion parameter. For example, suppose the phenotypic trait is eye color. If eye color is binary, such as blue eyes versus not blue eyes, then we assume (4.5) follows a Bernoulli distribution. However, if the phenotype is continuous, one may take the distribution in (4.5) to be Gaussian or exponential. Furthermore, let $\eta(\delta_i) = \sum_{l=1}^{L} \delta_{il} \gamma_l$, where $\gamma$ is an $L$-dimensional vector of unknown parameters that represent the effects of gene expression on the phenotypic response $Y_i$, and $\delta_i$ is an $L$-dimensional vector of ASE ratios. In order to relate the parameters of the distribution to the predictors, we denote $E[Y_i] = \mu_i$. Thus, for a canonical link function $h$, it is the case that $\eta(\delta_i) = h(\mu_i) = \tau_i$. In particular, if we assume $Y_i$ follows a normal distribution, then $h$ is the identity link; whereas if $Y_i$ follows a Bernoulli distribution, then $h$ could be the logit link.

4.3.1 Prediction of ASE ratios

ASE ratios are unknown random variables. If we want to use them as predictors when modeling a phenotypic response, then we need an estimation procedure. We consider two posterior probabilities that will be useful in our ultimate goal of identifying significant SNPs. Calculation of these two posterior distributions depends on an unknown parameter vector $\theta = (\alpha_1, \beta_1, \alpha_2, \beta_2, e, A)$. Details to obtain maximum likelihood estimates via the EM algorithm are provided in Steibel et al. (2015).

Our first posterior probability of interest is $\mathrm{pr}(G_{il} = g_{il} \mid X)$, which will be used for predicting the underlying genotype-ASE status of the $l$th SNP in the $i$th individual. Here $X$ represents the RNA read counts for all $n$ individuals at all $L$ SNP positions. Given the states of $G_{il}$ as defined in (4.1), we will also be able to deduce the ASE status for the respective individual and SNP. The posterior probability $\mathrm{pr}(G_{il} \mid X)$ can be computed by Bayes' formula. Let $G_i = (G_{i1}, \cdots, G_{iL})^T$ represent all the possible genotype-ASE status combinations. Then the posterior probability is
$$L_{i,k}(l) := \mathrm{pr}(G_{il} = k \mid X) = \sum_{G_i} \mathrm{pr}(G_i \mid X)\, I(G_{il} = k) = \sum_{G_i} \frac{\mathrm{pr}(X, G_i)}{\mathrm{pr}(X)}\, I(G_{il} = k). \qquad (4.6)$$
In order to estimate the hidden state of $G_{il}$ ($i = 1, \ldots, n$; $l = 1, \ldots, L$), we compute $\max_k L_{i,k}(l)$ for each individual and SNP combination. The quantity $L_{i,k}(l)$ is computed from the EM algorithm. By the definition of the random variable $\delta_{il}$, we aim to use the estimated state of $G_{il}$ to obtain an estimate for the ratio of ASE. This leads to our second posterior probability of interest.

Let $\theta^*$ be the updated parameter vector upon convergence of the EM algorithm. Consider $f(\delta_{il} \mid X_{il}, G_{il} = k; \theta^*)$ such that
$$f(\delta_{il} \mid X_{il}, G_{il} = k; \theta^*) = \frac{f(X_{il} \mid \delta_{il}; \theta^*)\, f(\delta_{il} \mid G_{il} = k; \theta^*)}{\int f(X_{il} \mid \delta_{il}; \theta^*)\, f(\delta_{il} \mid G_{il} = k; \theta^*)\, d\delta_{il}}, \qquad (4.7)$$
where the distributions $f(X_{il} \mid \delta_{il}; \theta^*)$ and $f(\delta_{il} \mid G_{il} = k; \theta^*)$ are defined in (4.4) and (4.3), respectively. It follows that the denominator of (4.7) can be expressed as
$$f(X_{il} \mid G_{il} = k; \theta^*) = \binom{n_{il}}{X_{il}} \times \begin{cases} (1 - e)^{X_{il1}} (e/3)^{n_{il} - X_{il1}} & \text{for } k = 1,\\[2pt] (0.5 - e/3)^{X_{il1} + X_{il4}} (e/3)^{X_{il2} + X_{il3}} & \text{for } k = 2,\\[2pt] (e/3)^{X_{il2} + X_{il3}}\, \dfrac{C_0(\theta^*; X_{il1}, X_{il4})}{0.5^{\alpha_1 + \beta_1 - 1} B(\alpha_1, \beta_1)} & \text{for } k = 3,\\[2pt] (e/3)^{X_{il2} + X_{il3}}\, \dfrac{C_1(\theta^*; X_{il1}, X_{il4})}{0.5^{\alpha_2 + \beta_2 - 1} B(\alpha_2, \beta_2)} & \text{for } k = 4,\\[2pt] (1 - e)^{X_{il4}} (e/3)^{n_{il} - X_{il4}} & \text{for } k = 5, \end{cases} \qquad (4.8)$$
where $\binom{n_{il}}{X_{il}} = n_{il}! / (X_{il1}!\, X_{il2}!\, X_{il3}!\, X_{il4}!)$, and
$$C_0 = \int_{0.5}^{1} \left( \left(1 - \tfrac{4e}{3}\right)\delta_{il} + \tfrac{e}{3} \right)^{X_{il1}} \left( \left(\tfrac{4e}{3} - 1\right)\delta_{il} + 1 - e \right)^{X_{il4}} (\delta_{il} - 0.5)^{\alpha_1 - 1} (1 - \delta_{il})^{\beta_1 - 1}\, d\delta_{il},$$
$$C_1 = \int_{0}^{0.5} \left( \left(1 - \tfrac{4e}{3}\right)\delta_{il} + \tfrac{e}{3} \right)^{X_{il1}} \left( \left(\tfrac{4e}{3} - 1\right)\delta_{il} + 1 - e \right)^{X_{il4}} \delta_{il}^{\alpha_2 - 1} (0.5 - \delta_{il})^{\beta_2 - 1}\, d\delta_{il}.$$
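Since $C_0$ and $C_1$ are one-dimensional integrals, they can be evaluated by standard numerical quadrature. The following is a minimal R sketch under hypothetical values of $\theta^*$; the integrands follow the definitions above.

```r
## Normalizing constants C0 and C1 by numerical quadrature (hypothetical theta*).
C0 <- function(x1, x4, e, a1, b1) {
  f <- function(d) {
    ((1 - 4 * e / 3) * d + e / 3)^x1 *
      ((4 * e / 3 - 1) * d + 1 - e)^x4 *
      (d - 0.5)^(a1 - 1) * (1 - d)^(b1 - 1)
  }
  integrate(f, lower = 0.5, upper = 1)$value
}

C1 <- function(x1, x4, e, a2, b2) {
  f <- function(d) {
    ((1 - 4 * e / 3) * d + e / 3)^x1 *
      ((4 * e / 3 - 1) * d + 1 - e)^x4 *
      d^(a2 - 1) * (0.5 - d)^(b2 - 1)
  }
  integrate(f, lower = 0, upper = 0.5)$value
}

C0(x1 = 35, x4 = 10, e = 0.07, a1 = 3, b1 = 3)
C1(x1 = 10, x4 = 35, e = 0.07, a2 = 3, b2 = 3)
```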
If $G_{il} = 3$ or $G_{il} = 4$, then we know that the heterozygous genotype has ASE, with the quantity determined by a rescaled beta distribution. Thus, we only consider these estimated states when computing an estimate for $\delta_{il}$. Hence, for $G_{il} = 3$ and $G_{il} = 4$ it follows that
$$f(\delta_{il} \mid X_{il}, G_{il} = k; \theta^*) = \begin{cases} M_0(\delta_{il}, X_{il1}, X_{il4}, \theta^*)\, (\delta_{il} - 0.5)^{\alpha_1 - 1} (1 - \delta_{il})^{\beta_1 - 1} & \text{for } k = 3,\\[2pt] M_1(\delta_{il}, X_{il1}, X_{il4}, \theta^*)\, \delta_{il}^{\alpha_2 - 1} (0.5 - \delta_{il})^{\beta_2 - 1} & \text{for } k = 4, \end{cases} \qquad (4.9)$$
where
$$M_0 = \left( \left(1 - \tfrac{4e}{3}\right)\delta_{il} + \tfrac{e}{3} \right)^{X_{il1}} \left( \left(\tfrac{4e}{3} - 1\right)\delta_{il} + 1 - e \right)^{X_{il4}} \Big/\, C_0(\theta^*; X_{il1}, X_{il4}),$$
$$M_1 = \left( \left(1 - \tfrac{4e}{3}\right)\delta_{il} + \tfrac{e}{3} \right)^{X_{il1}} \left( \left(\tfrac{4e}{3} - 1\right)\delta_{il} + 1 - e \right)^{X_{il4}} \Big/\, C_1(\theta^*; X_{il1}, X_{il4}),$$
with $C_0$ and $C_1$ as defined above. We define our ASE ratio estimate as $\hat{\delta}_{il}$ ($i = 1, \ldots, n$; $l = 1, \ldots, L$), where $\hat{\delta}_{il}$ is the mode of the posterior distribution in (4.9).

4.3.2 Identification of quantitative trait loci

In order to quantify the impact of ASE on phenotypic variation, we utilize $\hat{\delta}_{il}$ as estimated in Section 4.3.1. For $L$ large, we aim to find a sparse solution for the $L$-dimensional parameter vector $\gamma$. To accomplish this we apply a Lasso penalty; a sparse solution is computed via cyclic coordinate descent and $k$-fold cross-validation. For $Y_i$ as defined in (4.5), an estimate $\hat{\gamma}$ is given by
$$\hat{\gamma} = \arg\min_{\gamma} \left\{ -\sum_{i=1}^{n} \left[ \frac{y_i \sum_{l=1}^{L} \hat{\delta}_{il} \gamma_l - b\!\left(\sum_{l=1}^{L} \hat{\delta}_{il} \gamma_l\right)}{a(\phi)} + c(y_i, \phi) \right] + \lambda^* \|\gamma\|_1 \right\}, \qquad (4.10)$$
where $\lambda^*$ is a non-negative regularization parameter. If $Y_i$ has a Binomial distribution, then a solution for $\hat{\gamma}$ is given by
$$\hat{\gamma} = \arg\min_{\gamma} \left\{ -\sum_{i=1}^{n} \left( y_i \hat{\delta}_i^T \gamma - \log\!\left(1 + e^{\hat{\delta}_i^T \gamma}\right) \right) + \lambda^* \|\gamma\|_1 \right\}. \qquad (4.11)$$
Similarly, if $Y_i$ has a Gaussian distribution, then a solution for $\hat{\gamma}$ is given by
$$\hat{\gamma} = \arg\min_{\gamma} \left\{ \sum_{i=1}^{n} \left( y_i - \hat{\delta}_i^T \gamma \right)^2 + \lambda^* \|\gamma\|_1 \right\}. \qquad (4.12)$$
A cyclic coordinate descent algorithm can be used to solve (4.10)–(4.12) (Friedman et al. 2010). Solutions are provided across a range of $\lambda^*$ values, so to determine an optimal sparse solution we perform $k$-fold cross-validation to extract the SNPs that have non-zero coefficients at a specific value of $\lambda^*$. Our choice of $\lambda^*$ is based on the "one-standard-error" rule, as it provides the most parsimonious model whose error is no more than one standard error above the error of the best model. Details on cyclic coordinate descent and how it can be applied to specific exponential families are available in Friedman et al. (2010).

After identifying a sparse solution for $\hat{\gamma}$, we apply ordinary least squares using the phenotypic response and the filtered $\hat{\delta}$ to obtain estimates and standard errors for the non-zero $\gamma$s. The ordinary least squares estimates are given by
$$\hat{\gamma}^* = \arg\min_{\gamma^*} \left\{ \sum_{i=1}^{n} \left( y_i - \hat{\delta}_i^{*T} \gamma^* \right)^2 \right\}, \qquad (4.13)$$
where $*$ denotes the filtered set of predictors and parameters after (4.10).
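The following is a minimal R sketch of this selection-then-refit step, using the cyclic coordinate descent implementation of Friedman et al. (2010) in the glmnet package, with the "lambda.1se" choice corresponding to the one-standard-error rule. The matrix `delta_hat`, standing in for the posterior-mode ASE estimates, is simulated placeholder data.

```r
## Lasso selection via cv.glmnet, then an OLS refit on the selected SNPs.
library(glmnet)

set.seed(3)
n <- 50; L <- 15
delta_hat <- matrix(runif(n * L), n, L)                 # placeholder ASE estimates
y <- as.numeric(delta_hat[, 1:4] %*% rep(3, 4)) + rnorm(n)  # sparse signal, as in Section 4.4

cv_fit <- cv.glmnet(delta_hat, y, family = "gaussian", alpha = 1)
gamma_hat <- coef(cv_fit, s = "lambda.1se")             # "one-standard-error" rule

selected <- which(as.numeric(gamma_hat)[-1] != 0)       # drop intercept, keep non-zeros
if (length(selected) > 0) {
  refit <- lm(y ~ delta_hat[, selected, drop = FALSE])  # OLS refinement, as in (4.13)
  summary(refit)$coefficients                           # estimates and standard errors
}
```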
We summarize our procedure as follows. First, obtain ASE ratio estimates given RNA read counts and imputed genotype-ASE statuses. Second, apply variable selection to determine the SNPs with ASE that influence phenotypic variation. Third, model the relationship between SNPs with ASE and the response using ordinary least squares. Figure 4.2 illustrates the relationships between $G_{il}$, $X_{il}$, $\delta_{il}$, and $Y_i$ for the $i$th individual and $L = 5$.

Figure 4.2: Grey circles represent observed values. White circles represent latent variables.

4.4 Simulation studies

In this section we demonstrate the performance of our proposed two-stage model in identifying significant SNPs with ASE ratios as they relate to a phenotype. We consider a simplified version of (4.1) for the simulation by collapsing the extended classification of heterozygous genotype-ASE states. Hence, we assume
$$G_{il} = \begin{cases} 1 & \text{for ``AA''},\\ 2 & \text{for ``AT-ASE''},\\ 3 & \text{for ``TT''}, \end{cases} \qquad (4.14)$$
follows a three-state Markov process. Our data generation process consisted of the following steps. First, two independent haplotypes were generated to form genotypes; the sequences for each haplotype were created using linkage disequilibrium information. For each individual and SNP, total RNA read counts were generated from a negative binomial distribution with size parameter $\lambda$ and probability parameter $p = 0.40$. RNA read counts of A, C, G, and T were then generated according to the total number of RNA read counts and (4.14)–(4.15). From (4.14) it follows that
$$\delta_{il} \mid G_{il} = k \sim \begin{cases} I_{\{\delta_{il} = 1\}} & \text{for } k = 1,\\ \mathrm{Beta}(\alpha, \beta) & \text{for } k = 2,\\ I_{\{\delta_{il} = 0\}} & \text{for } k = 3. \end{cases} \qquad (4.15)$$
Let $\delta$ be an $n \times L$ matrix that represents the true allele-specific expression ratios given the true underlying genotype-ASE statuses. For a given individual and SNP where the underlying genotype is homozygous, the corresponding value in the matrix $\delta$ is set to zero, because our interest is only in exploring the cis-acting genetic effects on phenotypic variation. In the next step we generate the phenotypic response by the linear model
$$y_i = \sum_{l=1}^{L} \delta_{il} \gamma_l + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (4.16)$$
where $\gamma$ is an $L$-dimensional parameter vector. We assume $\gamma$ is sparse and only allow the first four elements to be non-zero. Thus, $\gamma = (\gamma_1, \gamma_2, \gamma_3, \gamma_4, 0, \ldots, 0)^T$ such that $\gamma_1 = \gamma_2 = \gamma_3 = \gamma_4$. Under this set-up, the first four SNPs have cis-acting effects while the remaining $L - 4$ SNPs have no cis-acting effects on $y_i$.

In the simulation studies we set $n = 50$ and $100$, and $L = 8$, $15$, and $50$. The parameter $\lambda$ in the negative binomial distribution used to simulate total RNA read counts was set to 16 and 24. The signal strength used in generating the continuous phenotypic data, $\gamma$, was set to 2, 3, 5, and 7. Lastly, we set $e = 0.07$ for the mapping error parameter and $\alpha = 3$, $\beta = 3$ for the Beta distribution parameters, and the linkage disequilibrium information was set to 0.30. The simulation results presented in the tables and figures below are based on 100 replications.

To evaluate the performance of our proposed method, we considered the false positive rate and the false negative rate. For a given parameter combination, the two rates were averaged over the 100 replications. They are defined as
$$\text{False Positive Rate} = \frac{\#\text{ coefficients falsely identified as non-zero}}{\#\text{ zero coefficients}}, \qquad \text{False Negative Rate} = \frac{\#\text{ coefficients falsely identified as zero}}{\#\text{ non-zero coefficients}}. \qquad (4.17)$$
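A minimal R sketch of these metrics, given the true coefficient vector and an estimate (both hypothetical here); each rate is normalized by the number of truly zero and truly non-zero coefficients, respectively:

```r
## False positive and false negative rates of (4.17) from two coefficient vectors.
error_rates <- function(gamma_true, gamma_hat) {
  nonzero <- gamma_true != 0
  fp <- sum(gamma_hat[!nonzero] != 0) / sum(!nonzero)  # false positive rate
  fn <- sum(gamma_hat[nonzero] == 0) / sum(nonzero)    # false negative rate
  c(FPR = fp, FNR = fn)
}

gamma_true <- c(rep(3, 4), rep(0, 11))                 # first four SNPs carry signal
gamma_est  <- c(3.1, 2.8, 0, 2.9, 0.4, rep(0, 10))     # hypothetical fit
error_rates(gamma_true, gamma_est)                     # FPR = 1/11, FNR = 1/4
```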
Figure 4.3: Average false negative rates and average false positive rates for the proposed method. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

The simulation results for a single test at significance level 0.01 are illustrated in Figure 4.3. As the heritability, or value of γ, increases, the average false negative rate decreases. The same relationship holds for the average false positive rates. As the number of SNPs increases for a given γ, the average false positive rate increases whereas the average false negative rate decreases. As the sample size increases, both average rates decrease; the top row of plots corresponds to the setting in which n = 50, and the bottom row to the setting in which n = 100. Lastly, all else held constant, a larger value of λ generally results in smaller false negative and false positive rates. A larger value of λ means a larger number of RNA read counts, and thus more information. From a practical perspective, a low average false negative rate implies that significant SNPs will rarely fail to be identified. Similar trends exist when we consider a simultaneous test. Average rate values are displayed in Table 4.1 and Table 4.2 for the single and simultaneous tests, respectively.

Table 4.1: Average false positive and average false negative rates for the single test with significance level 0.01. Within each cell, the top value is the average false positive rate and the bottom value is the average false negative rate.

                       n = 50                            n = 100
    L   λ     γ=2     γ=3     γ=5     γ=7     γ=2     γ=3     γ=5     γ=7
    8   16   0.0074  0.0138  0.0120  0.0078  0.0085  0.0060  0.0020  0.0060
             0.2633  0.0828  0.0240  0.0085  0.0678  0.0040  0.0000  0.0000
    8   24   0.0182  0.0040  0.0073  0.0060  0.0093  0.0060  0.0040  0.0133
             0.2105  0.0573  0.0040  0.0020  0.0220  0.0000  0.0000  0.0000
    15  16   0.0357  0.0217  0.0145  0.0290  0.0308  0.0278  0.0060  0.0093
             0.1240  0.0515  0.0099  0.0052  0.0184  0.0034  0.0000  0.0000
    15  24   0.0160  0.0120  0.0185  0.0073  0.0120  0.0020  0.0133  0.0080
             0.1062  0.0328  0.0042  0.0017  0.0117  0.0000  0.0000  0.0000
    50  16   0.0977  0.1021  0.0650  0.0642  0.0676  0.0716  0.0486  0.0330
             0.0345  0.0138  0.0021  0.0021  0.0051  0.0002  0.0000  0.0000
    50  24   0.1062  0.1337  0.0549  0.0562  0.0610  0.0545  0.0463  0.0420
             0.0288  0.0086  0.0011  0.0004  0.0034  0.0002  0.0000  0.0000

Table 4.2: Average false positive and average false negative rates for the simultaneous test with nominal level 0.05. Within each cell, the top value is the average false positive rate and the bottom value is the average false negative rate.

                       n = 50                            n = 100
    L   λ     γ=2     γ=3     γ=5     γ=7     γ=2     γ=3     γ=5     γ=7
    8   16   0.0718  0.0726  0.0702  0.0603  0.0512  0.0433  0.0293  0.0343
             0.0668  0.0213  0.0040  0.0020  0.0120  0.0000  0.0000  0.0000
    8   24   0.0756  0.0854  0.0762  0.0450  0.0548  0.0400  0.0326  0.0363
             0.0440  0.0040  0.0000  0.0000  0.0065  0.0000  0.0000  0.0000
    15  16   0.1233  0.1132  0.1177  0.1146  0.0975  0.0709  0.0378  0.0360
             0.0436  0.0106  0.0008  0.0009  0.0025  0.0000  0.0000  0.0000
    15  24   0.1348  0.1141  0.0861  0.0885  0.0543  0.0273  0.0352  0.0327
             0.0350  0.0077  0.0027  0.0000  0.0025  0.0000  0.0000  0.0000
    50  16   0.2033  0.2445  0.2238  0.2001  0.1544  0.1528  0.0925  0.0905
             0.0178  0.0043  0.0011  0.0013  0.0017  0.0002  0.0000  0.0000
    50  24   0.2433  0.2884  0.1923  0.1848  0.1403  0.1264  0.1080  0.0884
             0.0124  0.0020  0.0004  0.0002  0.0015  0.0000  0.0000  0.0000

We compared the performance of our proposed method with two alternative procedures. Alternative procedure 1 used an exact binomial test based on the simulated RNA read counts in order to estimate the unknown genotype-ASE status. The test was performed under the null hypothesis p = 0.50, with alternatives p > 0.50 and p < 0.50 corresponding to genotypes AA and TT, respectively. Following genotype-ASE state imputation, we performed an ordinary least squares post-Lasso technique using the simulated phenotypic data as the response and the estimated genotype-ASE statuses as predictors. Thus, we did not consider ASE estimation in this alternative procedure.
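A minimal R sketch of this genotype call via stats::binom.test; the decision rule shown is one plausible reading of the description above, with the significance level as an assumed input:

```r
## Exact binomial test of p = 0.50 on the A vs. T read counts (alternative procedure 1).
call_genotype <- function(x_A, x_T, alpha = 0.01) {
  test <- binom.test(x_A, x_A + x_T, p = 0.5)
  if (test$p.value >= alpha) return("AT")   # fail to reject: call heterozygous
  if (x_A > x_T) "AA" else "TT"             # reject: direction gives the homozygote
}

call_genotype(45, 5)    # "AA"
call_genotype(22, 28)   # "AT"
```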
The average false positive and average false negative rates were calculated under the same parameter scenarios as for our proposed method.

Figure 4.4: Average false negative rates and average false positive rates for alternative procedure 1. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

Figure 4.4 depicts the average false positive and average false negative rates. When n increases from 50 to 100, the average false negative rate decreases slightly; however, we do not see the precipitous decline in average false negative rates as heritability increases that we observed for our proposed method. Likewise, the average false positive rate decreases as the sample size increases. As the number of SNPs increases, the average false positive rate increases and is much higher than the average rates in Figure 4.3. Table 4.3 provides the raw values for all parameter combinations.

Table 4.3: Alternative method 1, average false positive and average false negative rates for the single test with significance level 0.01. Within each cell, the top value is the average false positive rate and the bottom value is the average false negative rate.

                       n = 50                            n = 100
    L   λ     γ=2     γ=3     γ=5     γ=7     γ=2     γ=3     γ=5     γ=7
    8   16   0.0312  0.1750  0.0000  0.0500  0.0000  0.0309  0.0714  0.0333
             0.4864  0.4869  0.4867  0.4843  0.4864  0.4726  0.4752  0.4746
    8   24   0.0714  0.1190  0.0476  0.1136  0.0294  0.1250  0.0000  0.0750
             0.4886  0.4884  0.4845  0.4850  0.4876  0.4776  0.4674  0.4578
    15  16   0.2500  0.0000  0.1000  0.2304  0.1190  0.1133  0.0808  0.0469
             0.2592  0.2577  0.2587  0.2579  0.2528  0.2513  0.2434  0.2469
    15  24   0.2564  0.1667  0.1369  0.1591  0.0526  0.0000  0.0778  0.0783
             0.2608  0.2555  0.2519  0.2538  0.2509  0.2493  0.2447  0.2424
    50  16   0.3646  0.5370  0.4674  0.2978  0.2892  0.2179  0.3053  0.2048
             0.0778  0.0782  0.0763  0.0753  0.0767  0.0740  0.0757  0.0719
    50  24   0.4769  0.3833  0.4031  0.3476  0.2325  0.1471  0.2546  0.1333
             0.0782  0.0775  0.0750  0.0760  0.0762  0.0757  0.0714  0.0723

The weak performance of alternative method 1 comes from two sources. First, the binomial test results in less accurate genotype predictions compared to (4.6); assuming the genotypes follow a Markov process and using a hidden Markov model to impute their states provides extra information that results in accurate predictions (Ferguson-Smith 2001). Second, correctly predicted heterozygous states do not account for ASE.

For alternative method 2, we estimated an allele-specific expression ratio directly from the simulated RNA read counts. Let $\widehat{ASE}_{il}$ be an estimated quantity for ASE such that
$$\widehat{ASE}_{il} = \frac{X_{il,\mathrm{ref}}}{X_{il,\mathrm{ref}} + X_{il,\mathrm{alt}}}, \qquad i = 1, \ldots, n;\; l = 1, \ldots, L, \qquad (4.18)$$
where we define A to be the reference allele and T to be the alternative allele.
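For completeness, a one-line R sketch of the naive estimator (4.18); the example counts are hypothetical:

```r
## Raw reference-allele fraction (alternative method 2), per (4.18).
naive_ase <- function(x_ref, x_alt) x_ref / (x_ref + x_alt)

naive_ase(x_ref = 30, x_alt = 20)   # 0.6
```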
Again, we performed an ordinary least squares post-Lasso procedure in conjunction with this estimate. The average false positive and average false negative rates were calculated under the same parameter scenarios as our proposed method and alternative method 1. Figure 4.5 and Table 4.4 illustrate that the performance is similar to alternative method 1 and inferior to our proposed method with regard to the false positive and false negative metrics.

Figure 4.5: Average false negative rates and average false positive rates for alternative procedure 2. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

Table 4.4: Alternative method 2, average false positive and average false negative rates for the single test with significance level 0.01. Within each cell, the top value is the average false positive rate and the bottom value is the average false negative rate.

                       n = 50                            n = 100
    L   λ     γ=2     γ=3     γ=5     γ=7     γ=2     γ=3     γ=5     γ=7
    8   16   0.0938  0.2111  0.1522  0.1391  0.0000  0.0185  0.0455  0.0095
             0.4875  0.4917  0.4867  0.4817  0.4860  0.4738  0.4710  0.4680
    8   24   0.1250  0.0000  0.1190  0.1481  0.1667  0.0962  0.0556  0.0000
             0.4876  0.4850  0.4861  0.4832  0.4924  0.4793  0.4716  0.4623
    15  16   0.3000  0.3148  0.1618  0.1458  0.0702  0.0938  0.0500  0.1000
             0.2628  0.2614  0.2552  0.2552  0.2529  0.2443  0.2490  0.2455
    15  24   0.1154  0.3182  0.1746  0.2283  0.0000  0.0909  0.1496  0.0208
             0.2591  0.2626  0.2521  0.2562  0.2598  0.2517  0.2399  0.2499
    50  16   0.3968  0.4011  0.4141  0.4429  0.2083  0.1607  0.1865  0.2033
             0.0787  0.0756  0.0770  0.0778  0.0773  0.0773  0.0746  0.0745
    50  24   0.4009  0.4190  0.5042  0.5198  0.2633  0.2333  0.2646  0.1470
             0.0771  0.0777  0.0773  0.0758  0.0765  0.0737  0.0732  0.0720

Figure 4.6 characterizes the discrepancy between an ASE estimate from raw RNA read counts, as defined in (4.18), and an ASE estimate using our proposed hierarchical model. For values less than 0.50, the hidden Markov model ASE estimate is greater than the naive estimate in (4.18); above 0.50, the hidden Markov model ASE estimate is less than the naive estimate.

Figure 4.6: ASE estimates from the hidden Markov model compared to simulated raw allele count ratios. Hidden Markov model imputed ASE ratios with value less than 0.50 are marked in red, and values above 0.50 are marked in blue.

Through our simulation analysis, our hierarchical model appears to perform better than the two alternative procedures at identifying SNPs with cis-acting effects on phenotypic variation.

4.5 An empirical study

The following paragraph provides some of the data gathering and processing details as explained in Steibel et al. (2015). RNA sequence data were obtained from 24 female pigs from an F2 cross of Duroc and Pietrain breeds (Choi et al. 2012, Choi et al. 2011, Edwards et al. 2008a, Edwards et al. 2008b, Steibel et al. 2011). Protocols for RNA sequencing and the accuracy of genotype calling using a hidden Markov-ASE model have already been established in Steibel et al. (2015). To summarize the process, RNA from each sample was reverse transcribed, fragmented, barcode-labeled, and sequenced on an Illumina HiSeq 2000 (100 bp, paired-end reads). After quality control filtering, sequence reads were aligned to the reference genome (Sus scrofa 10.2.69, retrieved from the Ensembl database) using TopHat (Trapnell et al. 2009). Coding SNP discovery and genotyping were done with VarScan (Trapnell et al. 2009). We focused on chromosome 13 and extracted counts of reads agreeing with the reference (R) or alternative (A) allele with respect to the reference genome at 5,364 putative cSNPs, and we retained read counts on 65 SNPs that could be independently validated using a SNP chip (Steibel et al. 2015). In addition to the RNA sequence data, 45-minute post-mortem meat pH was recorded in these animals (Edwards et al. 2008b) and served as our phenotypic response variable for the analyses. The RNA sequence data we analyzed are available in the HMMASE R package at http://www.stt.msu.edu/users/pszhong/HMMASE.html.

The data set was partitioned so that the minimum number of SNPs in a segment is 30. Our proposed procedure was applied to each segment of RNA sequence data. Figure 4.7 depicts estimates for significant SNPs along with their SNP ID numbers. For example, the second segmented data set produced four significant SNPs: 12256008, 12400307, 12403644, and 12404379.

Figure 4.7: Estimates for SNPs. Significant SNPs are displayed with their respective IDs provided in the real data set. IDs correspond to the ordered locations.
We investigated the effects of the estimated ASE ratio from the hidden Markov model compared to the naive estimate defined in (4.18). Figure 4.8 depicts the relationship between the two estimates and reveals shrinkage around each Beta distribution's mode of 0.25 and 0.75, respectively. For values below the respective mode, the hidden Markov ASE estimate is less than the raw allele count ratio, and for values above the respective mode, the hidden Markov ASE estimate is greater than the raw allele count ratio.

Figure 4.8: ASE estimates from the hidden Markov model compared to real raw allele count ratios. Hidden Markov imputed ASE values conditional on Gil = 3 and Gil = 4 are marked in blue and red, respectively.

CHAPTER 5

CONCLUSION

5.1 Introduction

In this chapter we summarize the salient contributions to the field of Statistics made in Chapters 2 through 4. We also introduce new and exciting research challenges.

5.2 Summary of contributions

In Chapter 2, we proposed a novel nonparametric test procedure for testing the temporal homogeneity of covariance matrices with high-dimensional longitudinal data. The procedure aims to detect and identify change points among a temporally dependent collection of covariance matrices. In Chapter 2, a new test statistic was introduced, and theoretical results were derived under an asymptotic setting in which n and p diverge and T is finite. The test statistic's asymptotic distribution was derived under mild dependence assumptions, with no assumption of sparsity and no requirement on the relationship between n and p. We also proposed a procedure to identify the locations of change points through binary segmentation. The corresponding change point estimator's rate of convergence was investigated, and the estimator was shown to be consistent provided an adequate signal-to-noise ratio exists. Numerical studies demonstrated the finite sample performance of our procedure. These developments expanded the field of Statistics by pioneering a robust procedure to detect and identify change points among covariance matrices in the presence of high-dimensional longitudinal data.

In Chapter 3, we widened the scope of applicability of the procedure developed in Chapter 2. Theoretical results were derived under an asymptotic framework in which n, p, and T all diverge. We established the test statistic's asymptotic distribution and demonstrated that the change point estimator's rate of convergence depends on n, p, T, and the signal-to-noise ratio. The estimator was thus also shown to be consistent in a diverging T setting, provided an adequate signal-to-noise ratio exists. Numerical studies demonstrated the finite sample performance for a large T setting. Chapter 3 also addressed computational challenges. Recursive formulae were derived, as were computationally efficient forms of U-type statistics. In addition, we proposed an accurate quantile approximation procedure via an estimated correlation matrix. The overall computational complexity was reduced from the order of pn^4 T^6 to the order of pn^2 T^3.
These theoretical and computational developments in Chapter 3 made our procedure applicable to high-dimensional functional data and allowed us to demonstrate our method using a task-based fMRI data set. Thus, the contribution to Statistics in Chapter 3 is an expanded scope for the cutting-edge procedures introduced in Chapter 2.

In Chapter 4, we developed a hierarchical model to understand the relationship between allele-specific expression and phenotypic variation. Our hierarchical model was able to use RNA sequence data to identify SNPs with ASE that have a cis-acting effect on a phenotypic response. The procedure is accurate and can be applied quickly through a combination of the EM algorithm and a Lasso procedure.

5.3 Future research

The procedures established in Chapter 2 and extended in Chapter 3 required mild assumptions but did not allow much flexibility in terms of n and T. For example, in many longitudinal studies patients drop out, measurements are missing at random or non-random time points, and the sample size can be extremely small or even one. The methods developed in Chapters 2 and 3 are not applicable to data under these settings, and more work is necessary to accommodate a wider domain of real-world data and problems. One valuable extension of our work will be to develop a procedure for single-subject inference in high-dimensional longitudinal data and high-dimensional functional data. For a homogeneity test of covariance matrices, this setting will broaden the scope of applications through more realistic assumptions and a sample size requirement of only one. For genetic or fMRI data, an effectively developed procedure will enable a personalized medicine approach and have a greater benefit to the individual patient. Other potential applications of this work include real estate and financial data, and motion sensor data for activities.

From a computational standpoint, accurate and fast approximations could be developed to handle situations where T is of the order of 1000. Even with a high-performance computing cluster, it is not practical to apply our proposed procedure for massive values of T. However, as technology improves and longitudinal studies expand, the demand to address massive high-dimensional longitudinal data will increase. It is paramount that statistical methods produce accurate and fast results for practitioners.

A natural extension of the model proposed in Chapter 4 is to develop a unified likelihood approach in a hierarchical framework. Rather than using only RNA read count data to predict the underlying genotype with ASE status, we could perform this prediction given both phenotypic data and RNA read counts. The additional information should improve prediction accuracy. From a theoretical perspective, a unified likelihood approach would allow for statistical inference, and under certain conditions consistency and asymptotic normality could be proved. From a computational perspective, we could investigate a procedure that performs variable selection and parameter estimation through a penalized EM algorithm or a penalized variational EM algorithm.

BIBLIOGRAPHY

Ahmad, R. M. (2017). Testing homogeneity of several covariance matrices and multi-sample sphericity for high-dimensional data under non-normality. Communications in Statistics - Theory and Methods, 46(8), 3738–3753.

Aminikhanghahi, S. & Cook, D. (2016). A survey of methods for time series change point detection.
Knowledge and Information Systems, 51(2), 339–367.

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, New York: John Wiley.

Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., & Cherry, J. et al. (2000). Gene ontology: tool for the unification of biology. The gene ontology consortium. Nature Genetics, 25, 25–29.

Aue, A., Hormann, S., Horváth, L., & Reimherr, M. (2009). Break detection in the covariance structure of multivariate time series models. Annals of Statistics, 37, 4046–4087.

Bach, F. R. (2008). Bolasso: Model consistent lasso estimation through the bootstrap. In Cohen, W. W., Mccallum, A., & Roweis, S. T., editors, Proceedings of the 25th International Conference on Machine Learning, pages 33–40, Brookline, MA. Microtome Publishing.

Bai, Z. & Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6, 311–329.

Bai, Z. D. & Yin, Y. Q. (1993). Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix. Annals of Probability, 21, 1276–1294.

Baldassano, C., Chen, J., Zadbood, A., Pillow, J., Hasson, U., & Norman, K. (2017). Discovering Event Structure in Continuous Narrative Perception and Memory. Neuron, 95(3), 709–721.e5.

Barnett, I. & Onnela, J.-P. (2016). Change point detection in correlation networks. Scientific Reports, 6, 18893.

Basseville, M. & Nikiforov, I. V. (1993). Detection of Abrupt Changes - Theory and Application. Prentice Hall.

Box, G. E. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317–346.

Brodsky, B. E. (2017). Change-Point Analysis in Nonstationary Stochastic Models, CRC Press.

Brodsky, B. E. & Darkhovsky, B. S. (1993). Nonparametric Methods in Change Point Problems, Boston: Springer.

Brown, R., Durbin, J. & Evans, J. (1975). Techniques for Testing the Constancy of Regression Relationships over Time. Journal of the Royal Statistical Society: Series B, 37(2), 149–192.

Buckland, P. (2004). Allele-specific gene expression differences in humans. Human Molecular Genetics, 13(suppl 2), R255–R260.

Bühlmann, P. & van de Geer, S. (2011). Statistics for High-Dimensional Data, Boston: Springer.

Cai, T., Liu, W., & Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501), 265–277.

Cai, T. & Ma, Z. (2013). Optimal hypothesis testing for high dimensional covariance matrices. Bernoulli, 19(5B), 2359–2388.

Cai, T. & Xia, Y. (2014). High-dimensional sparse MANOVA. Journal of Multivariate Analysis, 131, 174–196.

Candes, E. & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 35(6), 2313–2351.

Chen, J., Leong, Y., Honey, C., Yong, C., Norman, K., & Hasson, U. (2016). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115–125.

Chen, S. X. & Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Annals of Statistics, 38, 808–835.

Chen, S. X., Zhang, L., & Zhong, P.-S. (2010). Testing high dimensional covariance matrices. Journal of the American Statistical Association, 105, 810–819.

Cheng, H., Perumbakkam, S., Pyrkosz, A., Dunn, J., Legarra, A., & Muir, W. (2015).
Fine mapping of QTL and genomic prediction using allele-specific expression SNPs demonstrates that the complex trait of genetic resistance to Marek's disease is predominantly determined by transcriptional regulation. BMC Genomics, 16(1).

Chernoff, H. & Zacks, S. (1964). Estimating the Current Mean of a Normal Distribution which is Subjected to Changes in Time. Annals of Mathematical Statistics, 35(3), 999–1018.

Choi, I., Bates, R., Raney, N., Steibel, J., & Ernst, C. (2012). Evaluation of QTL for carcass merit and meat quality traits in a US commercial Duroc population. Meat Science, 92, 132–138.

Choi, I., Steibel, J., Bates, R., Raney, N., Rumph, J., & Ernst, C. (2011). Identification of Carcass and Meat Quality QTL in an F(2) Duroc x Pietrain pig resource population using different least-squares analysis models. Frontiers in Genetics, 2, 18.

Csörgő, M. & Horváth, L. (1997). Limit Theorems in Change-Point Analysis, New York: John Wiley.

Danaher, P., Paul, D., & Wang, P. (2015). Covariance-based analyses of biological pathways. Biometrika, 102, 533–544.

de la Chapelle, A. (2009). Genetic predisposition to human disease: allele-specific expression and low-penetrance regulatory loci. Oncogene, 28(38), 3345–3348.

Dempster, A. (1958). A High Dimensional Two Sample Significance Test. Annals of Mathematical Statistics, 29(4), 995–1010.

Dette, H., Pan, G., & Yang, Q. (2018). Estimating a change point in a sequence of very high-dimensional covariance matrices. Arxiv.org.

Donoho, D. L. (2000). High-dimensional data analysis: The curses and blessings of dimensionality. In AMS Conference on Mathematical Challenges of the 21st Century.

Donoho, D. L. & Johnstone, I. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3), 425–455.

Edwards, D., Ernst, C., Raney, N., Doumit, M., Hoge, M., & Bates, R. (2008a). Quantitative trait loci mapping in an F2 Duroc x Pietrain resource population: I. Growth traits. Journal of Animal Science, 86, 241–253.

Edwards, D., Ernst, C., Raney, N., Doumit, M., Hoge, M., & Bates, R. (2008b). Quantitative trait locus mapping in an F2 Duroc x Pietrain resource population: II. Carcass and meat quality traits. Journal of Animal Science, 86, 254–266.

Efron, B. (2007). Size, power and false discovery rates. Annals of Statistics, 35, 1351–1377.

Fan, J., Han, F., & Liu, H. (2014a). Challenges of Big Data analysis. National Science Review, 1(2), 293–314.

Fan, J. & Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96(456), 1348–1360.

Fan, J. & Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Sanz-Sole, M., Soria, J., Varona, J. L., & Verdera, J., editors, Proceedings of the International Congress of Mathematicians, volume 3, pages 595–622, Zürich, CH. European Mathematical Society.

Fan, J., Liao, Y., & Yao, J. (2015). Power enhancement in high dimensional cross-sectional tests. Econometrica, 83, 1497–1541.

Ferguson-Smith, A. (2001). Imprinting and the Epigenetic Asymmetry Between Parental Genomes. Science, 293(5532), 1086–1089.

Finn, E., Shen, X., Scheinost, D., Rosenberg, M., Huang, J., & Chun, M. et al. (2015). Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nature Neuroscience, 18(11), 1664–1671.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent.
Journal of Statistical Software, 33(1), 1–22.

Fu, W. & Knight, K. (2000). Asymptotics for lasso-type estimators. Annals of Statistics, 28(5), 1356–1378.

Fujikoshi, Y., Ulyanov, V., & Shimizu, R. (2010). Multivariate Statistics, Wiley Series in Probability and Statistics.

Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., & Hothorn, T. (2018). mvtnorm: Multivariate Normal and t Distributions, R package version 1.0–8.

Gu, F. & Wang, X. (2015). Analysis of allele specific expression - a survey. Tsinghua Science and Technology, 20(5), 513–529.

Hall, P. & Heyde, C. (1980). Martingale Limit Theory and Its Application, New York: Academic Press.

Hinkley, D. V. (1970). Inference about the change-point in a sequence of random variables. Biometrika, 57, 1–17.

Horváth, L. & Kokoszka, P. (1997). The effect of long-range dependence on change-point estimators. Journal of Statistical Planning and Inference, 64(1), 57–81.

Huber, P. J. (1981). Robust Statistics, New York: John Wiley.

Johnson, R. & Bagshaw, M. (1974). The Effect of Serial Correlation on the Performance of CUSUM Tests. Technometrics, 16(1), 103–112.

Johnstone, I. (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29(2), 295–327.

Johnstone, I. & Titterington, D. (2009). Statistical challenges of high-dimensional data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906), 4237–4253.

Kander, Z. & Zacks, S. (1966). Test Procedures for Possible Changes in Parameters of Statistical Distributions Occurring at Unknown Time Points. Annals of Mathematical Statistics, 37(5), 1196–1210.

Kannan, R. P., Hensley, L. L., Evers, L. E., Lemon, S. M., & McGivern, D. R. (2011). Hepatitis C virus infection causes cell cycle arrest at the level of initiation of mitosis. Journal of Virology, 85, 7989–8001.

Koh, W., Pan, W., Gawad, C., Fan, H. C., Kerchner, G. A., Wyss-Coray, T., Blumenfeld, Y. J., El-Sayed, Y. Y., & Quake, S. R. (2014). Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proceedings of the National Academy of Sciences, 111, 7361–7366.

Kundu, S., Ming, J., Pierce, J., McDowell, J., & Guo, Y. (2018). Estimating dynamic brain functional networks using multi-subject fMRI data. Neuroimage, 183, 635–649.

Laumann, T. O., Snyder, A. Z., Mitra, A., Gordon, E. M., Gratton, C., Adeyemo, B., Gilmore, A. W., Nelson, S. M., Berg, J. J., Greene, D. J., McCarthy, J. E., Tagliazucchi, E., Laufs, H., Schlaggar, B. L., Dosenbach, N. U. F., & Peterson, S. E. (2017). On the stability of BOLD fMRI correlations. Cerebral Cortex, 27, 4719–4732.

Ledoit, O. & Wolf, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Annals of Statistics, 30(4), 1081–1102.

Li, J. & Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. Annals of Statistics, 40, 908–940.

Meinshausen, N. & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B, 72(4), 417–473.

Monti, R., Hellyer, P., Sharp, D., Leech, R., Anagnostopoulos, C., & Montana, G. (2014). Estimating time-varying brain connectivity networks from functional MRI time series. Neuroimage, 103, 427–443.

Muirhead, R. J. (2005). Aspects of Multivariate Statistical Theory, New York: John Wiley.

Nariai, N., Kojima, K., Mimori, T., Kawai, Y., & Nagasaki, M. (2016).
A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes. BMC Genomics, 17(S1).

Osborne, M. R., Presnell, B., & Turlach, B. A. (2000). On the LASSO and its dual. Journal of Computational and Graphical Statistics, 9(2), 319–337.

Page, E. S. (1954). Continuous Inspection Schemes. Biometrika, 41(1–2), 100–115.

Ramsay, J. O. (1982). When the data are functions. Psychometrika, 47, 379–396.

Ramsay, J. O. & Silverman, B. W. (2005). Functional Data Analysis, New York: Springer.

Schapiro, A., Rogers, T., Cordova, N., Turk-Browne, N., & Botvinick, M. (2013). Neural representations of events arise from temporal community structure. Nature Neuroscience, 16(4), 486–492.

Schott, J. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample size. Computational Statistics & Data Analysis, 51, 6535–6542.

Sen, A. & Srivastava, M. (1973). On Multivariate Tests for Detecting Change in Mean. Sankhyā: The Indian Journal of Statistics: Series A, 35(2), 173–186.

Shedden, K. & Taylor, J. (2005). Differential correlation detects complex associations between gene expression and clinical outcomes in lung adenocarcinomas. Methods of Microarray Data Analysis, J. S. Shoemaker and S. M. Lin, eds. Boston: Springer, pp. 121–131.

Shen, X., Tokoglu, F., Papademetris, X., & Constable, R. (2013). Groupwise whole-brain parcellation from resting-state fMRI data for network node identification. Neuroimage, 82, 403–415.

Skelly, D., Johansson, M., Madeoy, J., Wakefield, J., & Akey, J. (2011). A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data. Genome Research, 21(10), 1728–1737.

Srivastava, M. S. & Worsley, K. J. (1986). Likelihood Ratio Tests for a Change in the Multivariate Normal Mean. Journal of the American Statistical Association, 81(393), 199–204.

Srivastava, M. S. & Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101, 1319–1329.

Steibel, J., Bates, R., Rosa, G., Tempelman, R., Rilington, V., & Ragavendran, A. et al. (2011). Genome-wide linkage analysis of global gene expression in loin muscle tissue identifies candidate genes in pigs. PLOS One, 6(2), e16766.

Steibel, J., Wang, H., & Zhong, P.-S. (2015). A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium. BMC Bioinformatics, 16(1).

Storey, J., Xiao, W., Leek, J., Tompkins, R., & Davis, R. (2005). Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102, 12837–12842.

Tai, Y. & Speed, T. P. (2006). A multivariate empirical Bayes statistic for replicated microarray time course data. Annals of Statistics, 34, 2387–2412.

Taylor, M., Tsukahara, T., Brodsky, L., Schaley, J., Sanda, C., & Stephens, M. et al. (2007). Changes in gene expression during pegylated interferon and ribavirin therapy of chronic hepatitis C virus distinguish responders from nonresponders to antiviral therapy. Journal of Virology, 81, 3391–3401.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., & Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B, 67(1), 91–108.
Trapnell, C., Pachter, L., & Salzberg, S. L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 25(9), 1105–1111.

Venkatraman, E. S. (1992). Consistency results in multiple change-point situations. Technical report, Department of Statistics, Stanford University.

Wang, D., Yu, Y., & Rinaldo, A. (2017). Optimal Covariance Change Point Localization in High Dimension. Arxiv.org.

Yang, Q. & Pan, G. (2017). Weighted statistic in detecting faint and sparse alternatives for high-dimensional covariance matrices. Journal of the American Statistical Association, 112, 188–200.

Yao, Y. & Davis, R. (1986). The Asymptotic Behavior of the Likelihood Ratio Statistic for Testing a Shift in Mean in a Sequence of Independent Normal Variates. Sankhyā: The Indian Journal of Statistics: Series A, 48(3), 339–353.

Yuan, M. & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1), 49–67.

Zacks, J., Speer, N., Swallow, K., Braver, T., & Reynolds, J. (2007). Event perception: A mind-brain perspective. Psychological Bulletin, 133(2), 273–293.

Zalesky, A., Fornito, A., Cocchi, L., Gollo, L. L., & Breakspear, M. (2014). Time-resolved resting-state brain networks. Proceedings of the National Academy of Sciences, 111, 10341–10346.

Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2), 894–942.

Zhang, C., Bai, Z., Hu, J., & Wang, C. (2018). Multi-sample test for high-dimensional covariance matrices. Communications in Statistics - Theory and Methods, 47(13), 3161–3177.

Zhang, J. & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Zhang, C. & Zhang, S. (2013). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B, 76(1), 217–242.

Zhao, P. & Yu, B. (2006). On Model Selection Consistency of Lasso. Journal of Machine Learning Research, 2541–2563.

Zheng, S., Bai, Z., & Yao, J. (2015). Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. Annals of Statistics, 43, 546–591.

Zhu, L.-X., Ng, K., & Jing, P. (1992). Resampling methods for homogeneity tests of covariance matrices. Statistica Sinica, 12, 769–783.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.

Zou, H. & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320.