STATISTICAL INFERENCE WITH HIGH-DIMENSIONAL DEPENDENT DATA

By

Shawn M. Santo

A DISSERTATION

Submitted to

Michigan State University

in partial fulﬁllment of the requirements

for the degree of

Statistics – Doctor of Philosophy

2018

ABSTRACT

STATISTICAL INFERENCE WITH HIGH-DIMENSIONAL DEPENDENT DATA

By

Shawn M. Santo

High-dimensional time dependent data appear in practice when a large number of variables

are repeatedly measured for a relatively small number of experimental units. The number

of repeated measurements can range from two to hundreds depending on the application.

Advances in technology have made the process of gathering and storing data such as these

relatively low-cost and eﬃcient. Demand to analyze such complex data arises in genetics,

microbiology, neuroscience, ﬁnance, and meteorology.

In this dissertation, we ﬁrst intro-

duce and investigate a novel solution to a classical problem that involves high-dimensional

time dependent data. In addition, we propose a new approach to analyze high-dimensional

dependent genomics data.

First, we consider detecting and identifying change points among covariance matrices

of high-dimensional longitudinal data and high-dimensional functional data. The proposed

methods are applicable under general temporospatial dependence. A new test statistic is

introduced for change point detection, and its asymptotic distribution is established under

two diﬀerent asymptotic settings.

If a change point is detected, an estimate for the lo-

cation is provided. We investigate the rate of convergence for the change point estimator

and study how it is impacted by dimensionality and temporospatial dependence in each

asymptotic framework. Binary segmentation is applied to estimate the locations of possibly

multiple change points, and the corresponding estimator is shown to be consistent under

mild conditions for each asymptotic setting. Simulation studies demonstrate the empirical

size and power of the proposed test and accuracy of the change point estimator. We apply

our procedures on a time-course microarray data set and a task-based fMRI data set.

In the second part of this dissertation we consider a hierarchical high-dimensional de-

pendent model in the context of genomics. Our model analyzes RNA sequencing data to

identify polymorphisms with allele-speciﬁc expression that are correlated with phenotypic

variation. Through simulation, we demonstrate that our model can consistently select sig-

niﬁcant predictors among a large number of possible predictors. We apply our model to an

RNA sequencing and phenotypic data set derived from a sounder of swine.

ACKNOWLEDGMENTS

I would like to express the utmost gratitude to my advisor, Dr. Ping-Shou Zhong, for his

assistance, support, guidance, and encouragement. Dr. Zhong taught me what it means to

be a researcher in an academic environment. I would also like to thank my three committee

members: Dr. Yuehua Cui, Dr. Hyokyoung Hong, and Dr. Juan Steibel. Their service

is greatly appreciated. Furthermore, I would like to thank the Department of Statistics

and Probability for the resources and opportunities provided to me during my Ph.D. career.

Lastly, I would not have pursued a Ph.D. if it were not for Dr. M. P´adraig M. M. McLoughlin.

Dr. McLoughlin challenged and encouraged me as an undergraduate student at Kutztown

University of Pennsylvania in a way no professor had before.

iv

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

KEY TO ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

CHAPTER 1

INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . .
1.1 Technology and the ﬁeld of statistics
1.2 Low to high-dimensional data . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3
Independent to dependent data . . . . . . . . . . . . . . . . . . . . . . . . .
1.4 Change point detection and identiﬁcation . . . . . . . . . . . . . . . . . . . .
1.5 High-dimensional time dependent data . . . . . . . . . . . . . . . . . . . . .
1.6 Dissertation outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

CHAPTER 2 HOMOGENEITY TESTS OF COVARIANCE MATRICES WITH

HIGH-DIMENSIONAL LONGITUDINAL DATA . . . . . . . . . . .
2.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Basic setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Homogeneity tests of covariance matrices . . . . . . . . . . . . . . . . . . . .
2.3.1 Non-Gaussian random errors . . . . . . . . . . . . . . . . . . . . . . .
2.3.2 Power-enhanced test for sparse alternatives . . . . . . . . . . . . . . .
2.4 Change point identiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5 Simulation studies
2.5.1 Power-enhanced test statistic
. . . . . . . . . . . . . . . . . . . . . .
2.5.2 Non-Gaussian random errors . . . . . . . . . . . . . . . . . . . . . . .
2.5.3 Accuracy of correlation matrix estimator of VnD . . . . . . . . . . . .
2.5.4 Comparison with a pair-wise based method . . . . . . . . . . . . . . .
2.6 An empirical study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.7 Technical details
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.7.1 Proofs of lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.7.2 Proofs of main results
. . . . . . . . . . . . . . . . . . . . . . . . . .

CHAPTER 3 COVARIANCE CHANGE POINT DETECTION AND IDENTIFI-

CATION WITH HIGH-DIMENSIONAL FUNCTIONAL DATA . . .
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Model
3.3 Change point detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.4 Computation of the proposed statistics . . . . . . . . . . . . . . . . . . . . .
3.5 Change point identiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.6 Simulation studies
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.7 An empirical study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.8 Technical details
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

ix

xi

1
1
2
5
6
8
9

12
12
16
17
22
23
26
28
31
32
33
35
36
40
40
50

63
63
69
70
74
78
81
88
94

v

3.8.1 Proofs of lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.8.2 Proofs of theorems

94
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

CHAPTER 4 A HIDDEN MARKOV APPROACH FOR QTL MAPPING USING

ALLELE-SPECIFIC EXPRESSION SNPS . . . . . . . . . . . . . . . 133
4.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.2 A hidden Markov model for SNP genotype calling . . . . . . . . . . . . . . . 136
4.3 Phenotypic model speciﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.3.1 Prediction of ASE ratios . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.3.2
. . . . . . . . . . . . . . . . . 141
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.4 Simulation studies
4.5 An empirical study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

Identiﬁcation of quantitative trait loci

CHAPTER 5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.1
5.2 Summary of contributions
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.3 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

vi

LIST OF TABLES

Table 2.1: Empirical size and power of the proposed test, percentages of simulation

replications that reject the null hypothesis under settings (I) and (II) . . .

29

Table 2.2: Percentages of correct change point identiﬁcation among all rejected hy-

potheses under settings (I) and (II)

. . . . . . . . . . . . . . . . . . . . .

30

Table 2.3: Average true positives and average true negatives for identifying multiple
change points using the proposed binary segmentation method. Standard
errors are included after each number. For T = 5, the maximum number
of true positives and true negatives for each is 2. For T = 8, the maximum
number of true positives and true negatives is 2 and 5, respectively

. . .

Table 2.4: Empirical size and power, percentages of simulation replications that re-
ject the null hypothesis for the test statistic Mn and the power-enhanced
test statistic M∗

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

n

Table 2.5: Empirical size and power of the proposed test, percentages of simulation
replications that reject the null hypothesis for data generated from a
standardized Gamma distribution under the nominal level 5% . . . . . .

31

32

33

Table 2.6: Percentages of correct change point identiﬁcation among all rejected hy-
potheses for data generated from a standardized Gamma distribution

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

34

Table 2.7: Empirical size and power, percentages rejecting the null hypotheses in
the simulations, for the pair-wise based test and the power-enhanced test
statistic M∗

n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Table 2.8: Signiﬁcant gene ontology terms, test statistic values, number of genes in
each gene ontology term, identiﬁed change points and estimated local
false discovery rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

36

39

Table 3.1: Empirical size and power of the proposed test, percentages of simulation

replications that reject the null hypothesis . . . . . . . . . . . . . . . . . .

83

Table 3.2: Empirical size and power of the proposed test for T = 100, percentages of
simulation replications that reject the null hypothesis, quantile computed
from a correlation matrix that used linear interpolation. The ﬁrst 5 oﬀ-
diagonals were computed exactly as well as the last w components for
each row . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

84

vii

Table 3.3: Empirical size and power of the proposed test for T = 100, percentages of
simulation replications that reject the null hypothesis, quantile computed
from a correlation matrix that used linear interpolation. The ﬁrst 10 oﬀ-
diagonals were computed exactly as well as the last w components for
each row . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Table 3.4: Empirical size and power of the proposed test for T = 100, percentages of
simulation replications that reject the null hypothesis, quantile computed
from a correlation matrix that used linear interpolation. The ﬁrst 20 oﬀ-
diagonals were computed exactly as well as the last w components for
each row . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Table 3.5: Average true positives and average true negatives for identifying multiple
change points using the proposed binary segmentation method. The
maximum number of true positives for a given replication is 2. The
maximum number of true negatives for a given replication is T − 3 . . . .

Table 3.6: Standard errors for average true positives and average true negatives
given in Table 3.5. The maximum number of true positives for a given
replication is 2. The maximum number of true negatives for a given
replication is T − 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Table 3.7:

Identiﬁed change points in the Sherlock fMRI data set. Range of time
points preceding the identiﬁed change point where the covariance matri-
ces are temporally homogeneous. An interval ID provides a reference to
Figure 3.3

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

84

85

87

88

91

Table 4.1: Average false positive and average false negative rates for the single test

with signiﬁcance level 0.01. Average false positive rate is top value . . . . 146

Table 4.2: Average false positive and average false negative rates for the simulta-
neous test with nominal level 0.05. Average false positive rate is top
value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

Table 4.3: Alternative method 1, average false positive and average false negative
rates for the single test with signiﬁcance level 0.01. Average false positive
rate is top value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

Table 4.4: Alternative method 2, average false positive and average false negative
rates for the single test with signiﬁcance level 0.01. Average false positive
rate is top value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

viii

LIST OF FIGURES

Figure 1.1: Population covariance heat maps at six time points. Change points exist

at time t = 3 and at time t = 4. . . . . . . . . . . . . . . . . . . . . . . .

9

Figure 1.2: A small graphical model for the problem considered in Chapter 4. Grey

circles represent observed values. White circles represent latent variables.

10

Figure 2.1: The average component-wise quadratic distance between ˆVnD and VnD.
The top solid line is for n = 40; the middle dashed line is for n = 50;
the bottom dotted line is for n = 60. The scale of the y-axis is 10−5.

. .

34

Figure 2.2: Histogram of the number of genes among the 159 gene ontology terms

analyzed.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

38

Figure 2.3: Correlation network map for gene ontology term 0030054. Each dot rep-
resents a gene within the gene ontology. A link between dots indicates
a strong correlation between genes.

. . . . . . . . . . . . . . . . . . . . .

Figure 3.1: Accuracy of linear interpolation for ˆRn,tq. Black circles represent ˆRn,1q
for all q ∈ {1, . . . , T − 1}. Red triangles represent the corresponding
interpolated values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

40

77

Figure 3.2: Shen 268 node parcellation. This image was obtained from Finn et al.

(2015). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

90

Figure 3.3: Correlation networks based on an average over a time interval in which
the covariance matrices are homogeneous. Each circle is comprised of
67 Shen nodes. Solid lines represent a positive correlation, and dashed
lines represent a negative correlation. The darker the line the stronger
the correlation between nodes. A correlation threshold value of 0.70 in
absolute values was used. . . . . . . . . . . . . . . . . . . . . . . . . . . .

93

Figure 4.1: A graphical model for illustrating the hidden Markov model for SNP
genotype calling. Grey circles represent observed values. White circles
represent latent variables.

. . . . . . . . . . . . . . . . . . . . . . . . . . 138

Figure 4.2: Grey circles represent observed values. White circles represent latent

variables.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

Figure 4.3: Average false negative rates and average false positive rates for the
proposed method. Facets in row 1 are for n = 50. Facets in row 2 are
for n = 100.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

ix

Figure 4.4: Average false negative rates and average false positive rates for alterna-
tive procedure 1. Facets in row 1 are for n = 50. Facets in row 2 are for
n = 100. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

Figure 4.5: Average false negative rates and average false positive rates for alterna-
tive procedure 2. Facets in row 1 are for n = 50. Facets in row 2 are for
n = 100. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

Figure 4.6: ASE estimates from the hidden Markov model compared to simulated
raw allele count ratios. Hidden Markov model imputed ASE ratios with
value less than 0.50 are marked as red, and values above 0.50 are marked
as blue.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

Figure 4.7: Estimates for SNPs. Signiﬁcant SNPs are displayed with their respective

ID provided in the real data set. IDs correspond to the ordered locations. 152

Figure 4.8: ASE estimates from the hidden Markov model compared to real raw
allele count ratios. Hidden Markov imputed ASE values conditional on
Gil = 3 and Gil = 4 are marked as blue and red, respectively.

. . . . . . 153

Figure 4.9: ASE estimates from the hidden Markov model compared to real raw
allele count ratios. Hidden Markov imputed ASE values conditional on
Gil = 3 and Gil = 4 are marked as blue and red, respectively.

. . . . . . 153

x

KEY TO ABBREVIATIONS

ASE Allele-speciﬁc expression

BOLD Blood-oxygen-level dependent

CUSUM Cumulative sum

DNA Deoxyribonucleic acid

EM Expectation-maximization

FDR False discovery rate

fMRI Functional magnetic resonance imaging

GO Gene ontology

Lasso Least absolute shrinkage and selection operator

QTL Quantitative trait loci mapping

RNA Ribonucleic acid

SNP Single-nucleotide polymorphism

xi

CHAPTER 1

INTRODUCTION

1.1 Technology and the ﬁeld of statistics

Technology is one of the chief drivers of growth and innovation in society, and its impact

on the ﬁeld of statistics cannot be understated. For much of the twentieth century, statisti-

cians concentrated on solving problems in a classical setting, where the number of subjects,

observations, or experimental units, exceeded the number of variables or features measured.

If p is the number of variables or features, and n is the number of experimental units, then

this classical setting is the so-called ‘small p, large n’ setting. The demand to develop robust

theoretical procedures under the ‘small p, large n’ setting was due in large part to the data

and resources available at the time. Computers were not eﬃcient, data recording was not

automated, and the scope of technology was limited; thus, there was little motivation to

consider situations in which p far exceeded n. In fact, even as late as 1981, it was considered

poor practice to have a study in which n/p < 5 (Huber 1981).

The past thirty years have been an era of accelerated technological progress in many

ﬁelds in society. Biology, ﬁnance, economics, computer science, meteorology, and others, all

have the available resources to gather massive amounts of information. The need to ﬁlter,

understand, and analyze this information continues to grow. Data sets in numerous domain

speciﬁc ﬁelds now often have more variables recorded than experimental units. This ‘large

p, small n’ setting is what is referred to as high-dimensional data. As technology and data

recording processes improve, statisticians will play an integral role in developing theoretically

robust and computationally eﬃcient statistical methods to analyze such complex data.

1

1.2 Low to high-dimensional data

The research in high-dimensional data has seen a shift over the past two decades from es-

timation to more complex forms of inference. Estimation is often an initial step in inference,

but it does not allow us to quantify uncertainty. Much of the focus with regards to estimation

in a high-dimensional framework has been geared towards parameter estimation in general-

ized linear models and graphical models (B¨uhlmann and van de Geer 2011). Donoho and

Johnstone (1994) pioneered parameter estimation in a linear model when p = n. To obtain

sparse estimation, Tibshirani (1996) proposed an (cid:96)1-norm penalization procedure known

as least absolute shrinkage and selection operator (Lasso). Under a sparsity assumption

and other regularization conditions, Lasso simultaneously performs parameter estimation

and variable selection. Tibshirani’s seminal paper resulted in an extensive study of Lasso’s

theoretical properties and paved a way for valuable (cid:96)1-norm and (cid:96)2-norm penalization ex-

tensions. For example, Zou and Hastie (2005) introduced the elastic net to address some

short-comings with regards to the number of covariates selected via Lasso. Tibshirani et

al. (2005) and Yuan and Lin (2006) proposed fused Lasso and group Lasso, respectively. In

2006, Zou (2006) introduced adaptive Lasso. Fu and Knight (2000) and Zhao and Yu (2006)

investigated the asymptotic behavior of Lasso-type estimates and proved under certain con-

ditions that when the true parameter is 0, there exists non-zero probability mass at 0 for

the estimator’s limiting distribution. From a computation standpoint, Osborne et al. (2000)

studied the primal and dual problem of Lasso, and as a result, developed a fast and eﬃcient

algorithm to obtain Lasso estimates. There is a long list of literature on regularization es-

timation for high-dimensional parameters. Since the main focus of this dissertation is not

estimation, we do not enumerate all of them. Some important works include: Fan and Li

(2001), Candes and Tao (2007), and Zhang (2010).

Inference as it relates to hypothesis testing or conﬁdence intervals allows researchers to

make scientiﬁc discoveries and improve decision making. However, statistical inference of

these forms in high-dimensional data are not simple extensions of the classical inference

2

procedures, where the number of sample subjects exceeds the number of variables measured.

As was noted by Johnstone and Titterington (2009),

It should not, of course, be imagined that the ‘large p’ scenarios are mere al-

ternative cases to be explored in the same spirit as their ‘small p’ forebears. A

better analogy would lie in the distinction between linear and nonlinear models

and methods — the unbounded variety and complexity of departures from linear-

ity is a metaphor (and in some cases a literal model) for the scope of phenomena

that can arise as the number of parameters grows without limit.

In terms of inference for high-dimensional mean vectors, Dempster (1958) ﬁrst considered

a two-sample test in a p > n setting. Bai and Saranadasa (1996), Chen and Qin (2010),

and Cai and Xia (2014) proposed test statistics to extend the novel work of Dempster in

1958. Fujikoshi et al. (2010) provides an overview and details on testing high-dimensional

mean vectors. The work on testing high-dimensional covariance matrices can be traced

back to Ledoit and Wolf (2002), where they assumed p/n converges to some constant, and

proved under a normality assumption that their test statistics are normal. Methodology

building oﬀ Ledoit and Wolf include: Chen et al. (2010) and Cai and Ma (2013). Schott

(2007), Srivastava and Yanagihara (2010), Li and Chen (2012), and Cai et al. (2013) all

investigated the problem of testing the equality of high-dimensional covariance matrices for

two or multiple groups. More recently, Ahmad (2017) and Zhang et al. (2018) generalized

the work of Li and Chen (2012).

Some testing and conﬁdence interval procedures with regards to Lasso estimates and

generalized linear models were established by Bach (2008), Meinshausen and B¨uhlmann

(2010), and Zhang and Zhang (2013).

To elucidate one of the challenges brought about in a high-dimensional framework, con-

sider a classical test with regards to covariance matrices under the ‘small p, large n’ setting.

Muirhead (2005) details a few of these tests, along with some tests for mean vectors. Suppose

3

we are interested in testing

H0 : Σ1 = ··· = ΣT

versus

H1 : Not all are equal,

(1.1)

where we assume Xit (i = 1, . . . , n; t = 1, . . . , T ) is a p-dimensional random vector from a

multivariate normal distribution with mean µt and covariance Σt. Let, xit be a realization

of Xit from the tth population. Assume that the T populations are independent and the

random sample of n vectors from each of the T populations are independent. The likelihood

ratio test can be used to develop an α-level test for (1.1). The likelihood function is given

by

T(cid:89)

L(µt, Σt) =

(2π)−pn/2|Σt|−n/2 exp

t=1

(cid:26)

n(cid:88)

i=1

− 1
2

(xit − µt)TΣ−1

t

(xit − µt)

(cid:27)

.

For observed data, L(µt, Σt) is a function of µt and Σt for all t. To obtain the likelihood

criterion we maximize L(µt, Σt) under the restricted parameter space of the null hypothesis
and also under the unrestricted parameter space. Let Ω = {(µt, Σt) : t = 1, . . . , T} and
Ω0 = {(µt, Σt) : Σ1 = ··· = ΣT} denote the unrestricted and restricted parameter spaces,
respectively. Thus, the likelihood criterion is deﬁned as

Let N = T n, and let A = (cid:80)T

λn =

L(µt, Σt)
supΩ0
supΩL(µt, Σt)

t=1 At, where At = (cid:80)n

.

i=1(xit − ¯xt)(xit − ¯xt)T. For the
parameter space Ω, the maximum likelihood estimators are ˆµt,Ω = ¯xt and ˆΣt,Ω = At/n, for

µt and Σt, respectively. For the parameter space Ω0, the maximum likelihood estimators

= ¯xt and ˆΣΩ0

are ˆµt,Ω0
values back into λn and taking the logarithm,

= A/N , for µt and Σt, respectively. Therefore, substituting these

Λn = −2 log(λn) = N log

| ˆΣΩ0

(cid:18)

(cid:19)

|

− T(cid:88)

t=1

(cid:18)
(cid:19)
| ˆΣt,Ω|

.

n log

Hence, an α-level test rejects H0 of (1.1) whenever Λn < Λα. Under an asymptotic setting

when n diverges and p is ﬁxed, the null distribution of Λn can be derived. Furthermore, as

4

n → ∞, At/(n − 1) → Σt in probability. Thus, the standard asymptotic results hold for
likelihood ratios, Λn → χ2 in distribution with degrees of freedom (T − 1)p(p + 1)/2, under
the ‘small p, large n’ setting. However, breakdowns occur if we consider a ‘large p, small n’

framework.

results are not easily extended. If p > n, then we can no longer compute log(| ˆΣΩ0

Under a ‘large p, small n’ setting, Λn can no longer be computed and the asymptotic
|) due to At
and A being singular. Furthermore, the asymptotic distribution under the null hypothesis

is not well deﬁned for when p diverges. In a high-dimensional framework with p > n, the
convergence in probability of At/(n − 1) → Σt no longer holds as demonstrated through
spectral analysis by Bai and Yin (1993), Johnstone (2001), and others. As a result, testing

(1.1) is not possible via a likelihood ratio test. This is just one example in which breakdowns

in the classical methods occur due to an increase in data dimension. This phenomena is

known as the “curse of dimensionality”.

An increase in data dimension can produce extra noise, computation challenges, and a

failure in many of the existing classical statistical procedures. However, in certain situations

an increase in dimensionality may be a blessing (Donoho 2000). For further challenges

associated with high-dimensional data we encourage readers to see Fan and Li (2006) and

Fan et al. (2014a).

1.3 Independent to dependent data

The likelihood ratio test for (1.1) as described in Section 1.2 further breaks down if the

T groups are not independent.

In this dissertation, measurements of a sample that are

repeatedly recorded will be referred to as longitudinal data when the number of repeated

measurements is small. If the number of repeated measurements is large, or dense, we will

refer to it as functional data. Measurements taken over time allow researchers to understand

the evolution of the sample subjects, detect and identify changes in certain variables across

time, and study sequences of events. In longitudinal or functional data sets, temporal depen-

5

dence exists among measurements from the same subject, and adds a layer of complexity to

the theoretical and computational analysis. Methodology developed under a T -independent

sample framework is not applicable for a T -dependent sample. For example, Chen and

Qin (2010) and Li and Chen (2012) considered an independent two-sample high-dimensional

test for mean vectors and covariance matrices, respectively. However, their methods are

not applicable in a temporal dependent setting. There are two types of dependence in the

data: temporal and spatial. If these dependencies are ignored, then inference procedures are

invalid and misleading. Currently, there is no existing work accounting for the aforemen-

tioned dependencies in high-dimensional covariance testing and change point detection and

identiﬁcation. The asymptotic analysis is more complicated when both dependencies are

considered. Generalizing to an asymptotic framework for high-dimensional functional data

further increases complexity.

1.4 Change point detection and identiﬁcation

Given (1.1) for time dependent data, two questions naturally arise. First: Can we detect

changes among T dependent covariance matrices? Second: Can we identify the time points

for where those changes occur? These questions have profound eﬀects for time dependent

data. Their answers can provide critical information to individuals in the ﬁelds of ﬁnance,

genetics, neuroscience, climatology, and more.

Change point detection is a classical problem in time series analysis. Numerous supervised

and unsupervised machine learning algorithms are used in various change point detection

applications. Aminikhanghahi and Cook (2016) detail a few multi-class supervised learning

algorithms such as Gaussian mixture models, hidden Markov models, and decision trees.

Their work also highlights likelihood ratios, probabilistic models, graphs, and clustering as

further approaches to the change point detection problem. One of the most common tech-

niques in change point detection is the cumulative sum (CUSUM) method by Page (1954).

Measurements in a process are cumulatively summed according to a weighted procedure. A

6

change point is identiﬁed once the cumulative sum quantity exceeds a threshold value. Cher-

noﬀ and Zacks (1964) laid the groundwork for change point detection with regards to the

mean of normal random variables. Accordingly, a series of methodologies were developed in

independent univariate and multivariate settings. Some of these works include: Kander and

Zacks (1966), Yao and Davis (1986), Sen and Srivastava (1973), and Srivastava and Worsley

(1986). Chapter 2 in both Cs¨org¨o and Horv´ath (1997) and Brodsky and Darkhovsky (1993)

detail nonparametric change point detection methods based on Wilcoxon-type statistics, U-

type statistics, and M-estimators. Johnson and Bagshaw (1974), Brown et al. (1975), and

Horv´ath and Kokoszka (1997) introduced methods to address the change point problem for

dependent data. For further details on classical change point detection and identiﬁcation

procedures, we refer readers to Basseville and Nikiforov (1993) and Brodsky (2017).

In terms of a classical procedure for testing (1.1) with T dependent groups, there is

none. A multivariate procedure to test (1.1) was proposed by Aue et al. (2009). Assume

Xt (t = 1, . . . , T ) are p-dimensional temporal dependent random vectors from a multivariate

distribution with mean µ and covariance Σt. Thus, xt is an observation at the tth time

point. To test (1.1), Aue et al. (2009) considered the quantity, Sk (k = 1, . . . , T ) such that

(cid:26) k(cid:88)

j=1

Sk =

1√
T

vech(xjxT

j ) − k
T

(cid:27)

vech(xjxT
j )

,

T(cid:88)

j=1

i, j ∈ {1, . . . , T}. Based on Sk, they introduced a test statistic ΩT = T−1(cid:80)T

where for any p × p symmetric matrix M , vech(M ) represents the stacked columns of the
lower triangular region of M in the form of a p(p − 1)/2 vector. The quantity Sk was
motivated by the fact that under H0 of (1.1), E{vech(xjxT
i )} for all
ˆΣ−1
T Sk,
where ˆΣT is an estimator such that | ˆΣT − ΣT|E = op(1) as T diverges, and for any matrix
A, |A|E = supx(cid:54)=0|Ax|/|x|. They derived the test statistic’s asymptotic distribution under
the null hypothesis with T (cid:29) p.

j )} = E{vech(xixT
k=1 ST
k

However, Aue et al. (2009)’s method fails in a high-dimensional framework since ˆΣT is
not invertible if p (cid:29) T . In addition, Aue et al. did not consider a setting in which n > 1,

7

and thus, their methodology does not permit multiple-subject inference. In Section 1.2 we

highlighted the fact that recent research has addressed the high-dimensional challenges for

testing (1.1) but not dependence. In this section we detailed a procedure that incorporates

dependence but not a ‘large p, small n’ framework. Therefore, a gap exists. How can we

test (1.1) for high-dimensional time dependent data?

1.5 High-dimensional time dependent data

High-dimensional longitudinal data appear in practice when a large number of variables,

p, are repeatedly measured for a relatively small number of experimental units, n. The

number of repeated measurements, T , can range from two to hundreds depending on the

application. Throughout this dissertation, longitudinal data will refer to settings when T is

small. High-dimensional functional data will refer to settings in which T is large or dense.

For details on functional data analysis we refer readers to Ramsay and Silverman (2005).

Consider an experiment where patients have their gene expressions measured throughout

the course of a treatment regimen. Doctors and clinicians may be interested in understand-

ing how these gene expressions are regulated over time. In studies such as this, the number

of gene expressions, p, measured is anywhere from a few hundred to a few thousand, and

the number of patients, n, along with the number of repeated measurements, T , is small.

We will refer to this as high-dimensional longitudinal data. As another example, consider a

functional magnetic resonance imaging (fMRI) study where patients have their brain activ-

ity measured while performing various tasks. Thousands of blood-oxygen-level dependent

(BOLD) responses are recorded, hundreds of times during the duration of a scan, for voxels

corresponding to regions of interest in the patient’s brain. For this single patient, radiologists

may be interested in identifying and understanding signiﬁcant spatial and temporal changes.

The BOLD data from an fMRI experiment are considered high-dimensional functional data.

8

1.6 Dissertation outline

In Chapters 2 and 3 of this dissertation we develop and evaluate a procedure to test

(1.1) for high-dimensional longitudinal and high-dimensional functional data, respectively.

To visualize our objective in a high-dimensional longitudinal setting, consider Figure 1.1.

Each sub-plot represents the covariance matrix at the respective time point. From Figure

1.1 it is clear that the covariance is homogeneous between time points one through three;

there is a diﬀerent covariance structure at t = 4; for time points ﬁve and six the covariance

structure is homogeneous again.

Figure 1.1: Population covariance heat maps at six time points. Change points exist at
time t = 3 and at time t = 4.

Our statistical test will ﬁrst detect the presence of any change points among the T covariance

matrices. If we can conclude that change points exist, we further identify the time points

at which changes occur. The procedures we propose are pioneering with regards to (1.1) for

high-dimensional longitudinal and high-dimensional functional data. As is discussed in detail

in Chapters 2 and 3, some research has provided a solution to test (1.1) in a high-dimensional

9

framework, but no method has been developed for high-dimensional time dependent data.

In addition to the theoretical challenges, we also address the natural computation challenges

that arise with such massive time dependent data. We ensure our method is practical and

accessible to the end users in biology, neuroscience, and other ﬁelds via an R package.

In Chapter 4 we consider a diﬀerent type of high-dimensional dependent data, where

we propose a novel hierarchical model for genomics applications. Our interest is to link a

phenotypic response with single nucleotide polymorphisms (SNPs) that have allele-speciﬁc

expression (ASE). To account for dependence among the latent genotype and ASE status

combination, we consider a hidden Markov model and incorporate regularized regression to

address the high-dimensionality. Our problem can be depicted with the graphical model

in Figure 1.2 for the ith individual with ﬁve SNPs. Let Xil, Gil, δil be the RNA read

counts, genotype and ASE status, and allele-speciﬁc expression ratio, respectively for the ith

individual at the lth SNP. Let Yi be an observed phenotypic response. Given the relationships

between X, G, and δ we ﬁrst aim to estimate the latent variables Gil and δil given X and an

assumed Markov structure for G. For an observed phenotypic response, Y , we use regularized

regression to select the signiﬁcant δs.

Yi

δi1

δi2

δi3

δi4

δi5

Gi1

Gi2

Gi3

Gi4

Gi5

Xi1

Xi2

Xi3

Xi4

Xi5

Figure 1.2: A small graphical model for the problem considered in Chapter 4. Grey circles
represent observed values. White circles represent latent variables.

In, Chapter 5, we discuss possible theoretical and computational extensions to the results

of Chapters 2 – 4.

10

All proofs to lemmas and theorems are provided in the sections titled “Technical details”

of the respective chapter.

11

CHAPTER 2

HOMOGENEITY TESTS OF COVARIANCE MATRICES WITH

HIGH-DIMENSIONAL LONGITUDINAL DATA

2.1 Introduction

In a typical time-course microarray data set, thousands of gene expression values are

measured repeatedly from the same subject at diﬀerent stages in a developmental process (Tai

and Speed, 2006). As a motivating example, Taylor et al. (2007) conducted a longitudinal

study on 69 patients infected with hepatitis C virus. Their gene expression values were

measured once before treatment and ﬁve times during the treatment regimen of pegylated

alpha interferon and ribavirin. One purpose of the study was to identify which genes were

regulated by treatment. The repeated measurements enable researchers to understand gene

regulation over time. An important task in genomic studies is to identify gene sets with

signiﬁcant temporal changes (Storey et al., 2005). Much evidence has shown that gene

interaction and co-regulation play a critical role in the etiology of various diseases (Shedden

and Taylor, 2005). One application of our methods is to identify gene sets with signiﬁcant

changes in their covariance matrices, because the covariance matrix or its inverse can be

used for quantifying interaction and co-regulation among genes (Danaher et al., 2015).

Assume that Yit = (Yit1, . . . , Yitp)T is a p-dimensional random vector with mean µt and

covariance Σt. In the aforementioned applications, Yit (i = 1, . . . , n; t = 1, . . . , T ) represents

gene expressions for p genes in a gene set measured from the ith individual at the tth

developmental stage, where n is the sample size and T is the total number of ﬁnite stages.

The number of genes, p, in a given gene set ranges from a hundred to a few thousand, as

illustrated by the histogram in Figure 2.2 in Section 2.6, but n and T are small in the study.

Thus, p can be much larger than n and T . We focus on testing the homogeneity of covariance

12

matrices:

(2.1)
for some 1 ≤ k (cid:54)= l ≤ T . The alternative in (2.1) can be written as a change point type

H0 : Σ1 = ··· = ΣT

versus H1 : Σk (cid:54)= Σl

alternative:

H1 : Σ1 = ··· = Σk1

(cid:54)= Σk1+1 = ··· = Σkq (cid:54)= Σkq+1 = ··· = ΣT ,

(2.2)

where 1 ≤ k1 < ··· < kq < T are unknown locations of change points. This alternative is
of interest in practice because it speciﬁes the locations of changes. For example, researchers

are often interested in understanding dynamic gene regulation. By identifying the change

points, we can infer the change pattern of gene regulation, which is important for developing

diagnostic and preventive tools for some diseases (Koh et al., 2014).

Testing the homogeneity of covariance matrices is a classical problem in multivariate

analysis. Classical methods for testing (2.1) include the likelihood ratio test (Muirhead,

2005) and Box’s M test (Box, 1949). Some resampling methods have also been proposed

by Zhang and Boos (1992) and Zhu et al. (2002). However, these methods are not valid for

the aforementioned applications for the following reasons. First, these methods require n to

be much larger than p. Thus, they are not applicable under the large p, small n paradigm.

Second, these methods are only valid for independent samples without temporal dependence,

but the independence assumption is not valid for high-dimensional longitudinal data because

the repeated measurements obtained from the same individual are temporally dependent.

There is some existing research on testing (2.1) in the large p, small n scenario for

independent samples. Li and Chen (2012) considered testing the equality of two covariance

matrices for two independent samples. Schott (2007) and Srivastava & Yanagihara (2010)

proposed test statistics for (2.1) based on estimators of the summation of the weighted pair-

wise Frobenius norm distances between any two covariance matrices. Zheng et al. (2015)

and Yang and Pan (2017) applied random matrix theory to test the equality of two large-

dimensional covariance matrices.

13

Some methods have also been developed in neuroscience literature under the large p and

large T setup with T > p, which is diﬀerent from our large p, small n and T setup. For exam-

ple, Barnett & Onnela (2016) proposed a sieve bootstrap covariance change point detection

method that requires removing both boundaries of a time series with length greater than

p to avoid ill-conditioned covariance matrices. Laumann et al. (2017) discussed a method

for detecting changes in covariances by assessing the stability of multivariate kurtosis us-

ing a simulation approach. Their methods also require T > p to ensure the existence of

an inverse of a sample covariance matrix. In addition to the aforementioned multivariate

detection procedures, a marginal pair-wise testing procedure was developed by Zalesky et

al. (2014). Their approach relies on a sliding window to detect changes in correlation coef-

ﬁcients between a pair of coordinates. The p-value for each pair is obtained by resampling

residuals after ﬁtting vector autoregressive models. It is then applied to test the homogene-

ity of covariance matrices using multiple testing. Despite the above advances, no existing

multivariate method can be applied directly to test (2.1) for temporal dependent data under

the large p, small n and T setup.

This chapter proposes a new method for testing the equality of covariance matrices with

high-dimensional longitudinal data under the large p, small n and T scenario. The proposed

method considers both spatial and temporal dependence. Spatial dependence refers to the

dependence among diﬀerent components of Yit, and temporal dependence refers to the de-
pendence between Yit and Yis for any two time points t (cid:54)= s. The asymptotic distribution
of the proposed test statistic is derived under mild conditions on dependence without any

explicit requirement on the relationships between p, n and T .

We also propose a method for estimating the location of change points k1, . . . , kq among

covariance matrices. There exists some work on identifying change points in high-dimensional

means, but the literature for high-dimensional covariances is very small. Aue et al. (2009)

laid groundwork by considering a p-dimensional multivariate, possibly high-dimensional,

time series setup where T diverges, n = 1 and p < T . Their test statistic involves the inverse

14

of a p × p sample covariance matrix, which is singular if p > T . Thus, their method is

not applicable to high-dimensional longitudinal data. In the case with ﬁnite p and n but

diverging T , one major concern is that the change point estimator is not consistent (Hinkley,
1970) and only the ratios ki/T (i = 1, . . . , q) are consistent. When p is ﬁnite but n → ∞, it
has been shown that change points can be estimated consistently. However, it is not clear

how the data dimension aﬀects the rate of convergence. We study the rate of convergence

of our proposed change point estimator and ﬁnd that it depends on the data dimension,

sample size, noise level and signal strength. Consistency of the change point estimator is

possible even in the high-dimensional case. Furthermore, we propose a binary segmentation

procedure for identifying the locations of multiple change points, whose consistency is also

established.

Our work is related to, but diﬀerent from, that of Li and Chen (2012), who considered

a test for the equality of two covariance matrices with two independent samples. First, we

consider a general homogeneity test of covariance matrices with more than two populations,

while Li and Chen only considered a two-sample case. Second, Li and Chen considered the

test for two independent samples, but our proposal can accommodate both temporal and

spatial dependence. Moreover, our method is designed to test for the existence of change

points among high-dimensional covariance matrices for longitudinal data. Therefore, the

test procedure considered in this chapter is diﬀerent from that in Li and Chen (2012).

This chapter makes the following contributions. From a methodology perspective, the

proposed test procedure provides a novel solution for change point detection problems in

the large p, small n and T scenario. The test statistic combines the strength of maximal

and Frobenius norms, and is powerful against the alternative. Second, we propose a method

for estimating locations of change points among high-dimensional covariance matrices. The

proposed change point detection and identiﬁcation procedures are widely applicable without

any sparsity assumption. We establish the asymptotic distribution of a test statistic for data

with general temporal and spatial dependence. The identiﬁcation procedure for multiple

15

change points is shown to be consistent. Our results reveal the impact of data dimension,

sample size, and signal-to-noise ratio on the rate of convergence of the change point estimator.

The proposed methods formally address two challenges that are unsolved in the existing

covariance change point literature: the large p, small n and T issue, and spatial and temporal

dependence.

The remaining sections of this chapter are organized as follows. Section 2.2 details our

basic settings with regards to covariance testing. In Section 2.3 we introduce our testing pro-

cedure and test statistics along with their asymptotic distributions. Section 2.4 introduces

an estimator for change point identiﬁcation. Moreover, binary segmentation is proposed to

identify multiple change points. Sections 2.5 and 2.6 demonstrate the ﬁnite sample perfor-

mance of our procedures via simulation and analysis of a time-course microarray data set,

respectively. All proofs of theorems and necessary lemmas are available in Section 2.7.

2.2 Basic setting

Let Yit = (Yit1, . . . , Yitp)T be the observed p-dimensional random vector for the ith
individual at time point t = 1, . . . , T , where T ≥ 2, and i = 1, . . . , n. Assume that Yit
follows the model

Yit = µt + εit,

(2.3)

where µt is a p-dimensional unknown mean vector and εit = (εit1, . . . , εitp)T is a multivariate

normally distributed random error vector with mean zero and covariance var(εit) = Σt. A

generalization to the non-Gaussian setup is given in Section 2.5. In addition, it is assumed
that εit = ΓtZi for a p × m matrix Γt, where m ≥ pT , and Zi is an m-dimensional standard
multivariate normally distributed random vector so that cov(εis, εjt) = ΓsΓT
t = Cst if i =
i=1 are independent, but {εit}T
j ∈ {1, . . . , n} and is 0 if i (cid:54)= j. The random errors {εit}n
depend on each other. Of interest is to test whether any change points among covariances
occur at some time points t ∈ {1, . . . , T − 1}. We test the hypothesis H0 versus H1 speciﬁed
in (2.1) and (2.2). If H0 is rejected, we further estimate the locations of change points.

t=1

16

s1=1

2.3 Homogeneity tests of covariance matrices

At each t ∈ {1, . . . , T − 1}, we deﬁne a measure Dt = w−1(t)(cid:80)t

(cid:80)T
s2=t+1 tr{(Σs1 −
Σs2)2}, where w(t) = t(T − t). Measure Dt characterizes the diﬀerences among the covari-
ances before t and after t. Clearly, Dt = 0 for all t ∈ {1, . . . , T − 1} under H0, and Dt (cid:54)= 0
for any t under H1. Therefore, max1≤t≤T−1 Dt = 0 under H0, and max1≤t≤T−1 Dt > 0

under H1. Thus, Dt is useful for distinguishing the null and alternative hypotheses.

Measure Dt is diﬀerent from measure S1,T =(cid:80)T−1

(cid:80)T
s2=s1+1 tr{(Σs1 − Σs2)2} used
in Schott (2007), who applied S1,T in constructing a homogeneity test speciﬁed in (2.1) for
independent samples. In fact, for any t ∈ {1, . . . , T − 1}, Dt = S1,T − (S1,t + St+1,T ), where
S1,t and St+1,T quantify the diﬀerences among covariances only before time t and only after

s1=1

time t, respectively. These are not useful for measuring the diﬀerences among covariances

before and after time t. Measure Dt removes both S1,t and St+1,T from S1,T .

To construct an unbiased estimator of Dt, we need an unbiased estimator of tr(Σs1Σs2).
We make use of U-statistic type estimators because they avoid bias that is not ignorable in

a high- dimensional setup (Bai & Saranadasa, 1996; Chen & Qin, 2010). Otherwise, bias

that limit the scope of applications. Let

dices of sample subjects. For example,

correction could be a challenge and require conditions on the data dimension and sample size

∼(cid:80) denote summation over mutually diﬀerent in-

∼(cid:80)
i,j,k means summation over {(i, j, k) ∈ {1, . . . , n} :
)2
Yjs2
µs2)2 where P k

n)(cid:80)n

i(cid:54)=j(Y T
is1

∼(cid:80)

Σs2µs1 + µT
s2

i (cid:54)= j, j (cid:54)= k, k (cid:54)= i}. For any s1, s2 ∈ {1, . . . , T}, deﬁne Us1s2,0 = (1/P 2
as an unbiased estimator of tr(Σs1Σs2) + µT
Σs1µs2 + (µT
n =
s1
s1
n!/(n − k)!. To remove the nuisance terms µT
Σs2µs1 and (µT
µs2)2, we deﬁne Us1s2,1 =
s1
s1
(1/P 3
as an unbiased estimator of µT
Σs2µs1 + (µT
µs2)2 and,
n)
s1
s1
similarly, Us2s1,1 is an unbiased estimator of µT
Σs1µs2 + (µT
µs2)2. To remove the nui-
s2
s1
sance term (µT
µs2)2, we deﬁne Us1s2,2 = (1/P 4
i,j,k,l Y T
n)
as an unbiased
Yjs2
s1
is1
estimator of (µT
µs2)2. A computation eﬃcient formulation of Us1s2,1 and Us1s2,2 is given
s1

i,j,k Y T
is1

∼(cid:80)

Y T
ks1

Yls2

Yjs2

Y T
js2

Yks1

17

in the Appendix. Finally, we deﬁne an unbiased estimator for tr(Σs1Σs2) as

Us1s2 = Us1s2,0 − Us1s2,1 − Us2s1,1 + Us1s2,2.

(2.4)

The estimator Us1s2 is a generalization of the estimator for the trace of the covariance given
by Chen et al. (2010) and Li and Chen (2012). For t = 1, . . . , T − 1, an unbiased estimator

of Dt is

ˆDnt =

1

w(t)

(Us1s1 + Us2s2 − Us1s2 − Us2s1).

(2.5)

To study the asymptotic variance of ˆDnt for t = 1, . . . , T − 1, deﬁne

(−1)|u−v|+|k−l|tr2(Csuhk

CT

svhl

)

t(cid:88)

T(cid:88)

s1=1

s2=t+1

∗(cid:88)

(cid:88)

s1,s2,
h1,h2

u,v,

k,l∈{1,2}

V0t =

∗(cid:88)
=(cid:80)t

s1,s2,
h1,h2

(cid:88)
(cid:80)T

and

V1t =

u,k∈{1,2}

(−1)|u−k|tr{(Σs1 − Σs2)Csuhk
(cid:80)t

where(cid:80)∗
(cid:80)T
= 0 for any su (cid:54)= hk, and V0t = (cid:80)∗
(cid:80)T
(cid:80)t

h2=t+1 . If no temporal dependence exists, then
=
s2=t+1. Up to a scale factor, this V0t is the part of the variance of ˆDnt for the case

u,v∈{1,2} tr2(ΣsuΣsv ) where (cid:80)∗

(Σh1

− Σh2

)CT

suhk

},

s1,s2,
h1,h2

(cid:80)

Csuhk

s1=1

s2=t+1

h1=1

s1=1

s1,s2

s1,s2

with independent samples under H0 .

The asymptotic setting considered in this chapter is p(n) → ∞ as n → ∞, where p is

considered to be a function of n. We do not require a speciﬁc relationship between p and n.
Instead, for any t ∈ {1, . . . , T − 1}, we have two regularity conditions. For any matrix A,
denote A⊗2 = AAT. Then:

Condition 1. tr{(ΓT
s2

Condition 2. tr(cid:2){(Γs1 + Γs2)T(Σs1 − Σs2)(Γs1 − Γs2)}⊗2(cid:3) = o(nV1t) for s1 ∈ {1, . . . , t}

)⊗2} = o(V0t) for any s1, s2, h1, h2 ∈ {1, . . . , T};

Cs1h1

Γh2

and s2 ∈ {t + 1, . . . , T}.

Condition 1 generalizes Condition 2 imposed by Li and Chen (2012) to a T -sample test

with temporal dependence. If there is no temporal dependence, Condition 1 can be simpliﬁed

18

Σh1

Σh2

Σs1) = o(V0t). In general, the left-hand side of the equality in Condition
)tr(Σs2Σs1Σs2Σs1)}1/2, which is of order O(p) if all

to tr(Σs2Σs1Σh2
1 is bounded by {tr(Σh2
Σh1
the eigenvalues of Σt are bounded.
If the temporal dependence is not overwhelming so
that V0t (cid:16) pδ for any δ > 1, then Condition 1 holds. To appreciate this point, consider a
null hypothesis case with Cst = (1 − rst,n)Σ for s, t ∈ {1, . . . , T}. Here 1 − rst,n measures
the temporal correlation. If rst,n is small for all s, t, then the temporal dependence among
{Yit}T
If
rst,n → 0 for all s, t, then V0t (cid:16) rntr2(Σ2) (cid:16) rnp2 provided all the eigenvalues of Σ are
bounded. If the temporal dependence is not too strong so that 1/p = o(rn), then Condition
1 holds as p → ∞. Intuitively, Condition 1 implies that spatial and temporal dependence

(cid:80)
u,v,k,l∈{1,2}(−1)|u−v|+|k−l|rsuhk,nrsvhl,n.

t=1 is strong. Let rn = (cid:80)∗

s1,s2,h1,h2

cannot be too strong.

Condition 2 is automatically true under H0 because its left-hand side equals zero. Hence,

it is not needed under H0. If there is no temporal dependence, it can be shown that the left-

hand side of Condition 2 is tr(cid:8)(Σ2

− Σ2
s2

s1

)2(cid:9), whose order is not larger than V1t. Therefore,

Condition 2 is not needed for data without temporal dependence. This condition implies

that the alternatives should not be too far away from the null hypothesis. Otherwise, the

alternatives are easy to detect because the test statistics would diverge to inﬁnity.

Theorem 1 states the mean and variance of ˆDnt. The proof is given in Section 2.7.

Theorem 1. The expectation of ˆDnt is E( ˆDnt) = Dt. Under Condition 1, the leading order
variance of ˆDnt is σ2

nt = w−2(t)(cid:0)4V0t/n2 + 8V1t/n(cid:1).

Based on Theorem 1, we observe that E( ˆDnt) = Dt = 0 under H0. Under alternative H1
in (2.2), it is clear that E( ˆDnt) > 0 for all t under H1. Therefore, ˆDnt is able to distinguish

the null and alternative hypotheses in (2.1) and (2.2).

If T = 2 and no temporal dependence exists, V0t and V1t are, respectively, simpliﬁed

2) and V11 =(cid:80)∗

s1,s2

u=1 tr(cid:2){Σsu(Σs1 − Σs2)}2(cid:3),
(cid:80)2

to V01 = tr2(Σ2

1) + 2tr2(Σ1Σ2) + tr2(Σ2

which are the same as those obtained by Li and Chen (2012). For a general case with

temporal dependence, V01 = tr2(Σ2

1) + 2tr2(Σ1Σ2) + tr2(Σ2

2)− 4{tr2(Σ1C21) + tr2(Σ2C12)} +

19

12) + tr2(C12C12)}. The last four terms in V01, due to the temporal dependence,

2{tr2(C12CT
are not included in Li and Chen’s test. However, in general, these four terms are not

ignorable. Therefore, Li and Chen’s procedure is not suitable for temporal dependent data

even in the two-sample case.

We now study the asymptotic distribution of ˆDnt. The following theorem establishes the

asymptotic normality of ˆDnt. The proof is given in Section 2.7.
Theorem 2. Under Conditions 1–2, σ−1
where σ2

nt is deﬁned in Theorem 1.

nt ( ˆDnt − Dt) → N (0, 1) in distribution as n → ∞,

We do not require explicit conditions on p and n in Theorem 2. The asymptotic normality

holds provided Conditions 1–2 hold. In particular, we only need Condition 1 under the null

hypothesis. Thus, our test is valid under Condition 1 without Condition 2, which is needed

only for studying the power of the test. The normality assumption in model (2.3) is not

essential and can be relaxed to a multivariate model as considered in Chen et al. (2010) and

Li and Chen (2012). See Subsection 2.3.1 for the generalization to the non-Gaussian case.

Under H0, Dt = 0 for all t ∈ {1, . . . , T −1}. Theorem 2 indicates that σ−1

ˆDnt converges
nt,0 = 4V0t/{nw(t)}2 is the variance of ˆDnt under H0. An
to N (0, 1) in distribution where σ2
asymptotic α-level rejection region is Rt = {σ−1
ˆDnt > zα}, where zα is the upper α quantile
of the standard normal distribution. For each t ∈ {1, . . . , T − 1}, one can use Rt to test
for the hypothesis in (2.1). Provided that one test based on ˆDnt rejects the null hypothesis,

nt,0

nt,0

one may suspect that change points could exist among covariance matrices. Accordingly, t,

in ˆDnt, could be considered as a tuning parameter, and it is hard to decide which t should

be used for testing in practice. To make the proposed method free of any tuning parameter

and adaptive to unknown change points, we propose the following statistic for testing the

hypothesis in (2.1):

Mn = max

1≤t≤T−1

ˆσ−1

nt,0

ˆDnt,

(2.6)

where ˆσ2

nt,0 = 4 ˆV0t/{nw(t)}2. The estimator ˆV0t can be constructed by replacing

20

CT

svhl

). Deﬁne

) in V0t with Ususv,hkhl

tr(Csuhk
Ususv,hkhl
(1/P 2

n)(cid:80)n

, an unbiased estimator of tr(Csuhk

CT
svhl
= Ususv,hkhl,0 − Ususv,hkhl,1 − Usvsu,hlhk,1 + Ususv,hkhl,2, where Ususv,hkhl,0 =
YjsvY T
i(cid:54)=j=1 Y T
Yjhl
µhl
isu
ihk
suµsv µT
+ µT
µhl
µhk
hk
+ µT
µhl

µT
suCsvhl
estimator of µT
sv Csuhk
Ususv,hkhl,2 = (1/P 4
n)
estimators Ususv,hkhl,q, q = 1, 2, is similar to that for Us1,s2,q deﬁned in (2.4).

Yghl
and an unbiased estimator of µT

, Ususv,hkhl,1 = (1/P 3
n)
suµsv µT
hk

sv Csuhk
is an unbiased
suµsv µT
hk

is an unbiased estimator of tr(Csuhk

. A computation eﬃcient formulation of the

µhl
YjsvY T
ghk

i,j,g,f Y T
isu

YjsvY T
ihk

i,j,g Y T
isu

∼(cid:80)

∼(cid:80)

CT

svhl

µhl

is

Yf hl

)+µT

+

Under H0 and Condition 1, similar to the derivation in Lemma 4 in Section 2.7, the

s1=1

Qn,tq =

t(cid:88)

leading order of the cov( ˆDnt, ˆDnq) is Qn,tq, where

Vn0(s1, s2, h1, h2)/{w(t)w(q)}

T(cid:88)
q(cid:88)
T(cid:88)
and Vn0(s1, s2, h1, h2) = (4/n2)(cid:80)
ˆDnq is Qn,ts/(cid:112)(Qn,ttQn,ss), which is the correlation
u,v,k,l∈{1,2}(−1)|u−v|+|k−l|tr2(Csuhk
Let VnD be a correlation matrix whose (t, s) component is Qn,ts/(cid:112)(Qn,ttQn,ss) for t, s ∈

variance between σ−1
nq,0
between ˆDnt and ˆDnq.

ˆDnt and σ−1

). Then the co-

h2=q+1

s2=t+1

h1=1

svhl

CT

nq,0

{1, . . . , T−1}. Assume that VnD converges to VD as n → ∞. The following theorem provides
the asymptotic distribution of Mn.

Theorem 3. Under Condition 1, we have that under H0, Mn→W in distribution as n → ∞,
where W = max1≤t≤T−1 Zt and Z = (Z1, . . . , ZT−1)T is a multivariate normally distributed

random vector with mean 0 and covariance VD.

According to Theorem 3, an α-level test for (2.1) rejects the null hypothesis if Mn > Wα,
where Wα is the α-quantile of W such that pr(W > Wα) = α. Let Zn be a N (0, ˆVnD) dis-
( ˆQn,tt ˆQn,ss),
tributed random vector with the (t, s) component of ˆVnD estimated by ˆQn,ts/

(cid:113)

where

ˆQn,ts =

4

n2w(t)w(s)

t(cid:88)

s(cid:88)

T(cid:88)

T(cid:88)

(cid:88)

(−1)|u−v|+|k−l|U 2

susv,hkhl

,

u,v,k,l∈{1,2}

s1=1

h1=1

s2=t+1

h2=s+1

21

is deﬁned just below (2.6). Simulations suggest that the plug-in estimates

and Ususv,hkhl
of the correlation matrix ˆVnD are reliable when the sample size is approximate 40 or above.
See Section 2.7 for a detailed comparison between ˆVnD and VnD. The quantile Wα can be

approximated by Wn,α obtained from the multivariate normal distribution by ﬁnding the
quantile wn,α = (Wn,α, . . . , Wn,α)T satisfying pr(Zn < wn,α) = 1 − α. The quantile wn,α
can be computed using the R package mvtnorm (Genz et al., 2018), and no simulation is

needed to ﬁnd quantile Wn,α.

The lower bound for power based on Mn is

pr(Mn > Wα) ≥ max

1≤t≤T−1

pr(ˆσ−1

nt,0

ˆDnt > Wα) = max

1≤t≤T−1

(cid:16) − σnt,0

σnt

Φ

(cid:17)

,

(2.7)

Wα +

Dt
σnt

where Φ(·) is the standard normal cumulative distribution function. If Dt/σnt dominates
Wα, the right-hand side of (2.7) is the maximum power of the test using Rt constructed on
a single ˆDnt, so the test based on Mn is more powerful than any test based on a single ˆDnt.

2.3.1 Non-Gaussian random errors

To relax the Gaussian assumption, we assume the following data generation model for εi =
T )T is a T p × m matrix with m ≥ T p
(εT
t = Cst. We assume Z1, . . . , Zn are independent and identically

iT )T and εi = ΓZi where Γ = (ΓT

such that Σ = ΓΓT and ΓsΓT

1 , . . . , ΓT

i1, . . . , εT

distributed m-dimensional random vectors such that E(Z1) = 0 and var(Z1) = Im. Write
Z1 = (Z11, ..., Z1m)T. We assume that each Z1l has a uniformly bounded 8th moment. Also,
we assume there exists a ﬁnite constant such that for l = 1, . . . , m, E(Z4
)··· E(Z

for any integers lv ≥ 0 with(cid:80)q

1l) = 3 + ∆ and
), whenever

) = E(Z

··· Z

v=1 lv = 8, E(Z

lq
1iq

l1
1i1

lq
1iq

l1
1i1

i1, . . . , iq are distinct indices.

Under Condition 1 and the above setup, it can be shown that the leading order of the

22

variance of ˆDnt is

var( ˆDnt) =

4

n2w2(t)

8

+

nw2(t)

(Σh1

− Σh2

)CT

suhk

}

∗(cid:88)
∗(cid:88)

s1,s2,
h1,h2

(cid:88)
(cid:88)

u,v,

s1,s2,
h1,h2
+ ∆tr{ΓT

)

CT

svhl

k,l∈{1,2}

(−1)|u−v|+|k−l|tr2(Csuhk
(−1)|u−k|(cid:2)tr{(Σs1 − Σs2)Csuhk
}(cid:3).

u,k∈{1,2}
su(Σs1 − Σs2)Γsu ◦ ΓT
hk

− Σh2

(Σh1

)Γhk

Under the null hypothesis, var( ˆDnt) = 4V0t/{n2w2(t)}. The variance V0t can be estimated
using the formula given below equation (2.6). The results in Theorems 2 and 3 can be

established in a similar way.

2.3.2 Power-enhanced test for sparse alternatives
The proposed test statistic, Mn, is powerful for alternatives with small absolute diﬀerences
in many components of Σt. However, it might not be very powerful for sparse alternatives

with the diﬀerences among Σt only residing in a few components. To enhance the power of
the proposed test for sparse alternatives, we include an additional term with Mn, as an idea
in Fan et al. (2015).

Let ¯Ys1v =(cid:80)n
s1, and deﬁne ˆσs1,uv =(cid:80)n
between components u, v ∈ {1, . . . , p} at time s1. Deﬁne ˆDnt,uv =(cid:80)t
ˆσs2,uv)2 as an estimator of Dnt,uv = (cid:80)t

i=1 Yis1v/n be the sample mean of the vth component measured at time
(cid:80)T
i=1(Yis1u − ¯Ys1u)(Yis1v − ¯Ys1v)/(n − 1) as the sample covariance
(cid:80)T
s2=t+1(ˆσs1,uv−
s2=t+1(σs1,uv − σs2,uv)2. The estimator
(uv)
skht

be the (u, v) component of Cskht

ˆDnt,uv is a consistent estimator of Dnt,uv. Let C
(uv)
ht

is the (u, v) component of Σht

. To deﬁne the variance of ˆDnt,uv, deﬁne the following

s1=1

s1=1

and

σ

23

notation:

F

(uv)
skslhths

= σ

{C

(uv)
sl

σ

(uv)
ht
(uv)
sl

σ

(uv)
hs

(uv)
sk

σ

(uv)
sk

σ

(uv)
ht
(uv)
hs

+ σ

+ σ

+ σ

C

(uv)
skhs

+ C

(vu)
hssk
{C
{C
{C

(vu)
htsk
(vu)
hssl
(vu)
htsl

C

C

C

(uv)
skht
(uv)
slhs
(uv)
slht

C

(vv)
hssk
(vv)
htsk
(vv)
hssl
(vv)
htsl

C

}

(uu)
skhs
}
(uu)
skht
}
(uu)
slhs
},
(uu)
slht

C

C

+ C

+ C

+ C

G

(uv)
skslhths

= {C

(vu)
skhs

C

(uv)
skhs

+ C

+ {C

(vu)
skht

C

(uv)
skht

+ C

C

(vv)
skhs
(vv)
skht

}{C
}{C

(uu)
skhs
(uu)
skht

C

(vu)
slht

C

(uv)
slht

+ C

(vv)
slht

C

(uu)
slht

}

(vu)
slhs

C

(uv)
slhs

+ C

(vv)
slhs

C

(uu)
slhs

}.

The leading order term of the variance of ˆDnt,uv is

σ2
nt,uv =

1

w2(t)

(−1)|k−l|+|s−t|{n−1F

(uv)
skslhths

+ n−2G

(uv)
skslhths

}.

(2.8)

∗(cid:88)

(cid:88)

s1,s2,
h1,h2

k,l,

s,t∈{1,2}

∗(cid:88)

(cid:88)

s1,s2,
h1,h2

k,l,

s,t∈{1,2}

Under H0, the ﬁrst term in (2.8) is 0. Namely,

(−1)|k−l|+|s−t|F

(uv)
skslhths

= 0.

The leading term in the variance of ˆDnt,uv under H0 is

σ2
nt,uv0 =

(−1)|k−l|+|s−t|G

(uv)
skslhths

/n2.

∗(cid:88)

(cid:88)

s1,s2,
h1,h2

k,l,

s,t∈{1,2}

Let ˆG

(uv)
skslhths

be a sample plug-in estimate of G

(uv)
skslhths

, and ˆσ2

nt,uv0 be the corresponding

sample estimate of σ2

nt,uv0. Then, the power-enhanced test statistic is

(cid:110)
ˆσ−1

nt,0

M∗

n = max

1≤t≤T−1

(cid:88)

u≤v

ˆDnt + λn

I( ˆDnt,uv > δn,pˆσnt,uv0)

,

(cid:111)

where δn,p and λn are tuning parameters. The tuning parameters are chosen such that the
second part of M∗

n equals zero with probability tending to one under H0, and it converges

to a large number under sparse alternatives.

24

We now discuss the choices for tuning parameters for the above power-enhanced test

statistic. Let R = (ρij) be the correlation matrix corresponding to the common covariance
Σ1 under H0. Deﬁne Nj(α) = card{i : |ρij| > (log p)−1−α} and Λ(r) = {i : |ρij| >
r for some j (cid:54)= i}. We assume the following condition used in Cai et al. (2013).

Condition 3. Suppose that there exists a α and a set π ⊂ {1, . . . , p} whose size is o(p) such
that max1≤j≤p,j(cid:54)∈π Nj(α) = o(pγ) for all γ > 0. In addition, there exists a r < 1 and a
sequence of numbers Λp,r = o(p) so that card{Λ(r)} ≤ Λp,r.

as M∗

s1=1

n

Deﬁne ls1s2 = max1≤u≤v≤p(ˆσs1,uv−ˆσs2,uv)2/σns1s2,uv0 where σ2

ns1s2,uv0 = var{(ˆσs1,uv−
ˆσs2,uv)2} under H0. Similar to the proof of Theorem 1 in Cai et al. (2013), under Condition
3 and H0, we can show that

(2.9)
(cid:80)T
Deﬁne Luv = ˆDnt,uv/ˆσnt,uv0 and Ln = max1≤u≤v≤p Luv. Denote the second term in M∗
s2=t+1 σns1s2,uv0/σnt,uv0 ≤
n1 = λn
K, uniformly for all u, v for a constant K > 0, and uniform consistency of ˆσnt,uv0 to σnt,uv0,
(cid:17)

pr{ls1s2 − 4 log(p) + log log(p) ≤ t} → exp{− exp(−t/2)/(cid:112)(8π)}.
(cid:80)
u≤v I( ˆDnt,uv > δn,pˆσnt,uv0). Because(cid:80)t
(cid:16)
we have, under H0,
pr(M∗
n1 = 0) ≥ pr
(cid:26)
(cid:26)
(cid:110)
≥ 1 − t(cid:88)

(ˆσs1,uv − ˆσs2,uv)2
t(cid:88)

(ˆσs1,uv − ˆσs2,uv)2/σns1s2,uv0 ≤ δn,p/K

σns1s2,uv0
(ˆσs1,uv − ˆσs2,uv)2

Ln ≤ δn,p) = pr( max

ˆDnt,uv/ˆσnt,uv0 ≤ δn,p

(cid:27)

≤ δn,p

σns1s2,uv0
σnt,uv0

σns1s2,uv0
σnt,uv0

max
1≤s1≤t,
t+1≤s2≤T

max
1≤s1≤t,
t+1≤s2≤T

1≤u≤v≤p

T(cid:88)

= pr

max

1≤u≤v≤p

T(cid:88)

(cid:16)

max

1≤u≤v≤p

max

1≤u≤v≤p

σns1s2,uv0

s1=1

s2=t+1

(cid:27)

≤ δn,p

t(cid:88)

s1=1

s2=t+1

(cid:17)

pr

ls1s2 > δn,p/K

.

T(cid:88)

(cid:111)

≥ pr

≥ pr

s1=1

s2=t+1

Applying the result in (2.9), if δn,p/K − 4 log(p) + log log(p) → ∞, then pr(M∗
n1 = 0) → 1.
We suggest choose δn,p at the order of log(n) log(p) and λn to be a constant based on our

25

numerical experiments. In summary, the tuning parameters δn,p and λn ensure that, under
the null hypothesis, M∗

n1 converges to zero with probability one.

2.4 Change point identiﬁcation

If H0 is rejected, then there exist change points among the covariances Σt. We ﬁrst

consider an alternative with one change point:

H∗
1 : Σ1 = ··· = Σk1

(cid:54)= Σk1+1 = ··· = ΣT ,

where k1 is the true change point, whose location is estimated by

ˆk1 = arg max

1≤t≤T−1

ˆDnt.

(2.10)

(2.11)

Deﬁne the weight function

r(t; k) =

 (T − k)/(T − t),

k/t,

1 ≤ t ≤ k,
k + 1 ≤ t ≤ T − 1.

For any ﬁxed value k ∈ {1, . . . , T − 1}, the function r(t; k) achieves its maximum value at
t = k. Let βn = max1≤t≤T−1 max{√
theorem establishes the rate of convergence of the change point estimator ˆk1 obtained by
(2.11) under the alternative H∗
1 .

V0t,(cid:112)(nV1t)} and ∆n = tr{(Σ1 − ΣT )2}. The next

Theorem 4. Under the alternative H∗
its maximum at t = k1. Moreover, ˆk1 − k1 = Op{βn/(n∆n)}.

1 in (2.10), E( ˆDnt) = Dt = r(t; k1)∆n and Dt attains

Since r(t; k1) achieves its maximum at t = k1, the ﬁrst part of Theorem 4 indicates that
t = k1 maximizes E( ˆDnt) as a function of t. This is the rationale for estimating k1 through
√
(2.11). When the data dimension is ﬁxed, ˆk1−k1 = Op(1/
n). The eﬀect of data dimension
is reﬂected both in βn and ∆n. Here βn can be considered as noise and ∆n can be viewed as
signal. If the signal level is larger than the noise level, the rate of convergence of ˆk1 is faster

n). On the other hand, if βn is not smaller than n∆n, ˆk1 is not consistent.

26

√
than Op(1/

Next, we consider the alternative, H1, with multiple change points k1 < ··· < kq, as
1 , we have shown in Theorem 4 that the maximum of Dt is

speciﬁed in (2.2). Under H∗

attained at change point k1.

Theorem 5. Under H1 in (2.2), the maximum value of Dt is attained at one of the change
points among k1 < ··· < kq.

If we estimate the multiple change points by repeatedly applying estimation methods in

(2.11) to the population version Dt to all sub-sequences with non-zero Dt, Theorem 5 ensures

that we ﬁnd all the true change points. This property is important for applying the binary

segmentation method to identify multiple change points as demonstrated by Venkatraman

(1992) in an unpublished technical report.

To describe the proposed binary segmentation method, we ﬁrst deﬁne some notation.

Let [It] represent the quantities computed based on the data within the time interval It,
a subset of [1, T ]. For example, Mn[t1, t2] is the test statistic deﬁned in (2.6) calculated
based on Y [t1, t2], the data collected between time t = t1 and t = t2 for t1 < t2. Namely,
Mn[t1, t2] = maxt1≤t<t2 ˆσ−1

nt,0[t1, t2] ˆDnt[t1, t2].

The binary segmentation method can be summarized as follows. Let αn be a number
speciﬁed in Theorem 6. In the ﬁrst step, compute Mn[1, T ]. If Mn[1, T ] < Wαn[1, T ], where
Wαn[1, T ] is the cut-oﬀ quantile estimated based on Y [1, T ], we accept the null hypothesis
and stop. Otherwise, we identify the change point, say ˆk1, using (2.11). Next, we compute
Mn for both subsequences Y [1, ˆk1] and Y [ˆk1 + 1, T ]. For each subsequence, we repeat the
ﬁrst step until no change points can be identiﬁed or the number of repeated measurements

in the subsequence is less than two.

Let It be any interval of the form [kf + 1, kg] with f + 1 < g where f ∈ {0, . . . , q− 1} and
g ∈ {2, . . . , q + 1} that contains at least one change point kj for j ∈ {1, . . . , q}, where k0 = 0
σ−1
maxks∈It
and kq+1 = T . Denote mSNR = minIt
nks,0[It]Dks[It] as the smallest maximum
signal-to-noise ratio among all segmentations It.

27

Theorem 6. Assume αn → 0 and mSN R diverges so that Wαn = o(mSN R). For all It,
if βn[It] = o(nDks[It]) for some change points ks ∈ It, we have, limn→∞ pr(ˆq = q; ˆkj =
kj, j = 1, . . . , q) = 1.

The ﬁrst assumption, with Wαn = o(mSN R), is a very mild condition, which ensures

the consistency of the proposed test at each step of the binary segmentation. The second

assumption βn[It] = o(nDks[It]) is needed to ensure the consistency of the change point

estimators. Theorem 6 implies that the proposed binary segmentation procedure consistently

estimates the number and locations of the change points.

2.5 Simulation studies

In this section, we present multiple simulation studies to demonstrate the ﬁnite sample

performance of the proposed method. The data were generated from the model

L(cid:88)

h=0

Yit = µt +

At,hηi(t−h)

(i = 1, . . . , n; t = 1, . . . , T ),

cov(Yit, Yis) =(cid:80)L

where At,h is a p × p matrix, µt = 0 and ηit are p-dimensional multivariate normally dis-
tributed random vectors with mean 0 and covariance Ip. Let t ≥ s. This implies that
s,h−(t−s) if t − s ≤ L and cov(Yit, Yis) = 0 if t − s > L, and
allows dependence among components within the vector Yit and dependence among {Yit}T
at diﬀerent time points. In the simulation studies, we set n = 40, 50 and 60, p = 500, 750

h=t−s At,hAT

t=1

and 1000, T, = 5 and 8, and L = 3. The simulation results reported in Tables 2.1 and

2.2 were based on 500 replications. The results in Table 2.3 were based on 100 simulation

replications.

Let k1 = [T /2] be the largest integer no greater than T /2. For t ∈ {1, . . . , k1}, we set
At,h = A(1) for h ∈ {0, . . . , L}. For t ∈ {k1 + 1, . . . , T} and h ∈ {0, . . . , L}, At,h = A(2).
In setting (I),
Two simulation settings were used for the generation of the A matrices.

we set A(1) = (cid:8)0.6|i−j|I(|i − j| < p/5)(cid:9), and A(2) = (cid:8)(0.6 + δ)|i−j|I(|i − j| < p/5)(cid:9).

If δ = 0, A(1) and A(2) are the same and the covariances of Yit are the same for all t.

28

Hence, the null hypothesis, H0, is true.

the true change point.

In setting (II), we set A(1) = (cid:8)(cid:0)|i − j| + 1(cid:1)−2I(|i − j| < p/5)(cid:9)
and A(2) = (cid:8)(cid:0)|i − j| + δ + 1(cid:1)−2I(|i − j| < p/5)(cid:9). Similar to setting (I), a value of δ = 0

If δ (cid:54)= 0, the null hypothesis is false and k1 is

corresponds to the null hypothesis being true. If δ (cid:54)= 0, k1 is the underlying true change
point for the covariance matrices.

Table 2.1 demonstrates the empirical size and power of the proposed test for the homo-

geneity of covariance matrices under setting (I) at nominal level 0.05. We observe that the

size of the proposed test is reasonably close to the nominal level. The power increases as n

increases, as δ increases, and as T increases. Table 2.1 also provides the empirical size and

power of the proposed test under simulation setting (II). The phenomena in setting (II) are

very similar to those in setting (I).

Table 2.1: Empirical size and power of the proposed test, percentages of simulation
replications that reject the null hypothesis under settings (I) and (II)

Setting

δ

(I)

0(size)

0.05

0.10

(II)

0(size)

0.10

0.20

T = 5

T = 8

n
40
50
60
40
50
60
40
50
60
40
50
60
40
50
60
40
50
60

500
4.6
4.6
6.0
21.4
37.0
45.6
99.6
100
100
4.4
5.6
4.8
33.4
44.2
65.4
99.8
99.8
100

p
750
4.8
5.2
4.4
27.6
36.0
49.2
100
100
100
5.4
4.6
4.6
35.8
48.6
63.6
99.8
100
100

1000
6.4
5.4
4.2
24.8
36.0
46.2
99.8
100
100
5.0
4.8
4.2
38.2
47.0
60.4
99.6
100
100

500
4.8
4.4
5.4
35.6
49.8
59.6
100
100
100
4.4
6.0
3.6
50.2
68.4
87.0
100
100
100

p
750
4.8
5.8
4.2
34.6
48.8
65.6
100
100
100
4.0
5.2
5.6
52.0
70.6
89.6
100
100
100

1000
4.4
4.6
3.6
34.2
52.0
65.0
100
100
100
4.8
5.6
5.0
51.6
74.0
88.0
100
100
100

The percentages of correct identiﬁcation are summarized in Table 2.2 when the null

29

hypothesis is false under settings (I) and (II). The percentages of correct identiﬁcation are

the percentages of simulation replications that estimate the location of the change point

correctly among all those that reject the null hypothesis. When T = 5, the true change

point is k1 = 2, and when T = 8, the true change point is k1 = 4. In both settings, for

almost all the cases, the percentages increase as n and δ increase.

Table 2.2: Percentages of correct change point identiﬁcation among all rejected hypotheses
under settings (I) and (II)

Setting

δ

(I)

0.05

0.10

0.10

0.20

(II)

T = 5

p
750
37.96
45.81
53.28
96.60
98.60
99.80
45.51
61.51
69.72
90.98
92.60
96.20

n
40
50
60
40
50
60
40
50
60
40
50
60

500
41.12
51.35
52.63
93.17
98.00
99.20
49.10
65.00
72.78
90.58
93.37
97.00

1000
40.65
43.33
52.17
95.19
98.20
99.40
55.50
55.98
64.57
89.16
93.20
96.40

500
30.18
39.52
49.33
93.79
98.40
99.80
43.12
53.80
67.59
95.80
98.60
99.80

T = 8

p
750
29.88
39.34
49.70
93.80
97.20
98.60
47.15
61.19
72.99
96.00
98.20
99.80

1000
37.58
41.54
55.08
96.40
99.00
99.00
47.10
58.65
75.23
95.60
99.40
99.80

To demonstrate the performance of the proposed binary segmentation procedure for

identifying multiple change points, we generated data using simulation setup (II) with two

change points, k1 and k2. When T = 5, k1 = 2 and k2 = 4. When T = 8, k1 = 4 and
k2 = 6. For t ∈ {kj−1 + 1, . . . , kj}, we set At,h = A(j) for h ∈ {0, . . . , L} and j = 1, 2, 3 with
k0 = 0 and k3 = T . Here, A(1) and A(2) were set to be the same as those in setting (II),
and we set A(3) = A(1). The values of δ were chosen to be 0.15 and 0.25. The average true

positives and the average true negatives are summarized in Table 2.3. The true positives are

the correctly-identiﬁed change points, and the true negatives are the correctly-identiﬁed time

points where no covariance change exists. For T = 5, the maximum number of true positives

and true negatives for each is 2. For T = 8, the maximum number of true positives and true

negatives is 2 and 5, respectively. The results in Table 2.3 show that the proposed binary

30

segmentation procedure performs well as the sample size, n, increases and as the signal, δ,

increases.

Table 2.3: Average true positives and average true negatives for identifying multiple
change points using the proposed binary segmentation method. Standard errors are
included after each number. For T = 5, the maximum number of true positives and true
negatives for each is 2. For T = 8, the maximum number of true positives and true
negatives is 2 and 5, respectively

δ=0.15

δ=0.25

T

p

500

5

750

1000

500

8

750

1000

n ATP SE ATN SE ATP SE ATN SE
40
0.27
0.14
50
0.28
60
40
0.24
0.24
50
0.14
60
0.22
40
50
0.20
0.14
60
0.30
40
50
0.27
0.22
60
0.36
40
50
0.24
0.30
60
0.27
40
0.25
50
60
0.24

1.92
1.98
1.92
1.94
1.96
1.98
1.95
1.96
1.98
4.90
4.92
4.95
4.85
4.94
4.90
4.92
4.96
4.94

1.81
1.94
2.00
1.76
2.00
2.00
1.87
1.96
2.00
1.91
1.97
2.00
1.90
1.97
2.00
1.88
1.99
2.00

0.30
0.37
0.27
0.41
0.27
0.30
0.30
0.20
0.20
0.40
0.36
0.32
0.39
0.38
0.34
0.44
0.40
0.27

1.10
1.36
1.57
1.11
1.38
1.47
1.15
1.22
1.54
1.40
1.62
1.78
1.52
1.67
1.81
1.43
1.68
1.84

0.39
0.24
0.00
0.43
0.00
0.00
0.34
0.20
0.00
0.29
0.17
0.00
0.30
0.17
0.00
0.33
0.10
0.00

0.36
0.48
0.50
0.37
0.49
0.50
0.36
0.42
0.50
0.49
0.49
0.42
0.50
0.47
0.40
0.50
0.47
0.37

1.90
1.87
1.92
1.82
1.92
1.90
1.90
1.96
1.96
4.84
4.85
4.89
4.82
4.83
4.90
4.82
4.80
4.92

2.5.1 Power-enhanced test statistic

We conducted a numerical simulation to illustrate the performance of the power-enhanced

test statistic under sparse alternatives. The data were generated according to setting (I),

except for a sparse alternative design. Speciﬁcally, let k1 = [T /2] be the largest integer
no greater than T /2. For t ∈ {1, . . . , k1}, we set At,h = A(1) for h ∈ {0, . . . , L}. For
t ∈ {k1 + 1, . . . , T}, we set At,h = A(2), where A
null hypothesis, A

h =(cid:8)0.6|i−j|I(|i − j| < p/5)(cid:9). Under the

(2)
(1)
h was set equal to A
h . Under the sparse alternative hypothesis, A
h except the components within {|i − j| < 2, i < p/25} were set to 1.4.
(1)

was the same as A

(2)
h

(1)

31

Table 2.4: Empirical size and power, percentages of simulation replications that reject the
null hypothesis for the test statistic Mn and the power-enhanced test statistic M∗

n

Mn

M∗

n

n
40
40
40
50
50
50
60
60
60
80
80
80

p
500
750
1,000
500
750
1,000
500
750
1,000
500
750
1,000

Null Alternative Null Alternative
5.2
3.2
4.6
5.2
6.4
3.4
3.8
4.8
4.2
4.0
3.2
4.6

67.6
62.8
62.4
91.6
94.2
97.0
98.8
99.2
99.8
100
100
100

35.4
34.6
36.4
47.8
47.2
52.6
56.8
65.8
66.4
81.2
86.4
84.8

6.0
3.6
4.6
5.6
6.6
3.4
4.4
5.6
4.2
4.0
3.8
4.6

Table 2.4 reports the empirical size and power of the test based on Mn and M∗

n. In the

simulation, the tuning parameter δn,p was set to 0.5log(n) log(p), and λn was set to 0.15.

We observe that both tests can control the type I error, and the power-enhanced test does

not inﬂate the type I error. More importantly, the power-enhanced test statistic has greater

power under the sparse alternative setting.

2.5.2 Non-Gaussian random errors

To illustrate the numerical performance of the proposed method under the non-Gaussian

setup, we generated data from the linear process model Yit = µt +(cid:80)L

h=0 At,hηi(t−h) for
i = 1, . . . , n and t = 1, . . . , T , where At,h is a p × p matrix, µt = 0 and ηit are p-dimensional
random vectors with each element independently generated from a standardized Gamma

distribution with shape parameter 4 and scale parameter 0.5.

Let k1 = [T /2] be the largest integer no greater than T /2. For t ∈ {1, . . . , k1}, we set

At,h = A(1) = (cid:8)0.6|i−j|I(|i − j| < p/5)(cid:9). For t ∈ {k1 + 1, . . . , T}, we set At,h = A(2) =
(cid:8)(0.6+δ)|i−j|I(|i−j| < p/5)(cid:9). If δ = 0, A(1) and A(2) are the same. Hence, the covariances,

Σt, are the same for all t ∈ {1, . . . , T} and H0 is true. If δ (cid:54)= 0, the null hypothesis is not

32

true and k1 is the underlying true covariance change point.

In the simulation studies, we set n = 40, 50 and 60, with p = 500, 750 and 1000. The

number of repeated measurements, T , was set to be 5 and 8 and set L = 3. The simulation

results reported in Tables 2.5 and 2.6 were based on 500 simulation replications.

Table 2.5: Empirical size and power of the proposed test, percentages of simulation
replications that reject the null hypothesis for data generated from a standardized Gamma
distribution under the nominal level 5%

δ

0(size)

0.05

0.10

n
40
50
60
40
50
60
40
50
60

500
3.6
4.2
4.6
23.4
38.2
46.4
99.8
100
100

T = 5

T = 8

p
750
4.0
5.2
3.8
21.4
36.2
46.8
99.8
100
100

1000
4.4
4.4
4.6
28.2
33.4
46.2
100
100
100

500
4.0
5.2
5.0
35.2
47.8
64.6
100
100
100

p
750
3.6
4.8
5.0
38.6
50.4
67.2
100
100
100

1000
4.6
4.8
5.6
31.2
47.8
66.4
100
100
100

Table 2.5 reports the empirical size and power of the proposed test under the null and

alternative hypotheses. We observe that Type I error is well controlled with the empirical

sizes close to the nominal level of 5%. The results demonstrate the robustness of the pro-

posed method for non-Gaussian distributed random vectors. When the diﬀerences between

covariance matrices increase, the power of the proposed test increases accordingly. Table

2.6 reports the performance of the proposed change point identiﬁcation procedure under

the non-Gaussian distributed random vectors. We observe that the percentages of correct

identiﬁcation with non-Gaussian random vectors are similar to those under the Gaussian

setup.

2.5.3 Accuracy of correlation matrix estimator of VnD

This section aims to evaluate the numerical performance of the correlation matrix estimator,

ˆVnD, proposed immediately following Theorem 3. To measure the diﬀerence between ˆVnD

33

Table 2.6: Percentages of correct change point identiﬁcation among all rejected hypotheses
for data generated from a standardized Gamma distribution

δ

0.05

0.10

n
40
50
60
40
50
60

500
32.48
50.26
49.14
93.79
98.80
99.60

T = 5

p
750
42.99
52.49
55.56
96.79
99.40
99.20

T = 8

p
750
30.05
42.06
50.00
94.60
97.00
99.20

1000
30.13
46.86
52.41
95.00
97.20
99.20

1000
35.46
46.71
57.58
95.80
99.00
99.80

500
26.70
40.17
48.30
95.20
98.60
99.60

and VnD, we used the average component-wise quadratic distance, namely, (T − 1)−2(cid:107) ˆVnD −
VnD(cid:107)2
F based on 500 simulation
replications conducted in setting (I) under the null hypothesis with T = 5. We observe that

F . Figure 2.1 illustrates the average of (T −1)−2(cid:107) ˆVnD−VnD(cid:107)2

the correlation matrix estimator, ˆVnD, is reliable when n = 40. The performance further

improves as the sample size increases.

Figure 2.1: The average component-wise quadratic distance between ˆVnD and VnD. The
top solid line is for n = 40; the middle dashed line is for n = 50; the bottom dotted line is
for n = 60. The scale of the y-axis is 10−5.

34

2.5.4 Comparison with a pair-wise based method

In this section, we compare our proposed method with a pair-wise based method that is

similar to the method proposed by Zalesky et al. (2014). In the pair-wise based method,

we ﬁrst obtain a p-value for testing the homogeneity of each component of the covariance
matrix for every pair of coordinates (u, v) with u ≤ v and u, v ∈ {1, . . . , p}, and then apply

the Bonferroni correction to all the p-values to control the family-wise error rate.

In the ﬁrst step, for each pair (u, v) with u ≤ v and u, v ∈ {1, . . . , p}, we test the following

hypothesis

versus

H0,uv : σ1,uv = ··· = σT,uv,

n(cid:80)T−1

H1,uv : σ1,uv = ··· = σk1,uv (cid:54)= σk1+1,uv = ··· = σkq,uv (cid:54)= σkq+1,uv = ··· = σT,uv.
ˆDnt,uv. Under H0,uv, the asymptotic distribution of ˆDn,uv is (cid:80)∞

To test H0,uv, we apply the statistic ˆDnt,uv deﬁned in Section 4, and deﬁne ˆDn,uv =

l , where
χ2
l are independent chi-square distributions with degree of freedom 1, and λl’s are the eigen-
values of the kernel of ˆDn,uv.
In practice, one can approximate the weighted chi-square

l=1 λlχ2

t=1

distribution using a scaled chi-square distribution. Thus, we approximate the distribution

of ˆDn,uv by bχ2
and variance of ˆDn,uv under H0,uv, respectively. The variance of ˆDn,uv under H0,uv is

uv/(2µuv) and ν = 2µ2

uv. Here µuv and σ2

ν, where b = σ2

uv/σ2

uv are the mean

T−1(cid:88)

T−1(cid:88)

t(cid:88)

q(cid:88)

T(cid:88)

T(cid:88)

(cid:88)

t=1

q=1

s1=1

h1=1

s2=t+1

h2=q+1

k,l,

s,t∈{1,2}

σ2
uv =

(−1)|k−l|+|s−t|G

(uv)
skslhths

,

where G

(uv)
skslhths

is deﬁned in Section 4. The mean of ˆDn,uv under the null H0,uv is

T−1(cid:88)

t(cid:88)

T(cid:88)

(cid:88)

t=1

s1=1

s2=t+1

a,b∈{1,2}

µuv =

(−1)|a−b|{C

(uu)
sasb

C

(vv)
sasb

+ C

(uv)
sasb

C

(vu)
sasb

}.

We then approximate the distribution of ˆDn,uv by ˆbχ2
2ˆµ2

uv. The p-value for the (u, v) pair is computed as puv = pr(ˆbχ2

ˆν where ˆb = ˆσ2

uv/ˆσ2

ˆν > ˆDn,uv).

uv/(2ˆµuv) and ˆν =

35

In the second step, we apply the Bonferroni correction to control the family-wise error
rate. Deﬁne pmin = minu≤v puv as the minimum of all the pair-wise p-values. If pmin <
2α/{p(p + 1)}, then we reject the null hypothesis on the homogeneity of covariance matrices

at the α level.

To compare the proposed methods with the pair-wise based method, we conducted a

simulation study using the simulation setup given in Subsection 2.5.1. The simulation re-

sults are summarized in Table 2.7. We observe that the pair-wise based method has very

conservative size under the null hypothesis when sample size is less than 80, but it improves

as sample size increases. Under the alternatives, the power of the pair-wise based method is

low for the small sample cases, but it increases as sample size increases to 80. However, in

all the cases, our proposed power-enhanced method has superior power than the pair-wise

based method.

Table 2.7: Empirical size and power, percentages rejecting the null hypotheses in the
simulations, for the pair-wise based test and the power-enhanced test statistic M∗

n

n
40
40
40
50
50
50
60
60
60
80
80
80

p
500
750
1000
500
750
1000
500
750
1000
500
750
1000

M∗

n

Pair-wise based test
Null Alternative Null Alternative
0.2
0.0
0.0
0.4
0.2
0.2
0.6
0.2
0.6
0.4
2.4
2.0

67.6
62.8
62.4
91.6
94.2
97.0
98.8
99.2
99.8
100
100
100

0.2
0.4
0.0
0.4
0.0
0.2
12.2
4.8
1.0
97.6
98.8
96.8

6.0
3.6
4.6
5.6
6.6
3.4
4.4
5.6
4.2
4.0
3.8
4.6

2.6 An empirical study

In this section, we apply our proposed method to a time-course gene expressions data set

collected by Taylor et al. (2007). The goal was to identify gene sets with signiﬁcant changes

36

in covariances over time and estimate their respective change points, should any exist. The

data correspond to a study where peripheral blood mononuclear cells were collected from 69

patients with hepatitis C virus. The cells were collected once before treatment, day 0, and

ﬁve times during treatment: days 1, 2, 7, 14 and 28. The treatment consisted of pegylated

alpha interferon and ribavirin. More information about the experiment can be found in

Taylor et al. (2007).

Prior to the application of our methodology, the data were pre-processed. The gene

expressions with low quality measurements were removed if the corresponding Microarray

Suite 5.0 signal transcript was classiﬁed as absent. We only kept individuals with gene

expression arrays at all six time points. After pre-processing, our data set consisted of 46

individuals with gene expression arrays at days 0, 1, 2, 7, 14 and 28. The original data set

can be obtained at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7123.

The genes were grouped into gene sets that were deﬁned by gene ontology, which classiﬁes

genes according to attributes of the gene in three biological domains: molecular function,

biological process, and cellular component (Ashburner et al., 2000). For instance, the gene

ontology term labeled 0006468 is related to introducing a phosphate group onto a protein.

Hence, this gene ontology term would consist of all the genes that have a role in the afore-

mentioned biological process. A given gene can be a member of multiple gene ontologies.

For example, in our processed data set, gene ontology 0006468 consists of 221 genes and gene

ontology 0007155 consists of 134 genes, with 64 genes in common. After ﬁltering the data

set according to the procedure above, 159 gene ontology terms were analyzed. We applied

our method to gene ontology terms with a minimum of 100 genes. Figure 2.2 illustrates the

number of genes in the 159 gene ontology terms. Each gene set analyzed had a gene count

much larger than the sample size of 46 patients.

37

Figure 2.2: Histogram of the number of genes among the 159 gene ontology terms analyzed.

Let Y

(g)
it

(i = 1, . . . , 46; t = 1, . . . , 6) be the gene expression data for the gth gene ontology

term of the ith individual at time t, where t = 1 represents day 0, before treatment, and

t = 2, 3, 4, 5, 6 represent the times during the treatment of hepatitis C virus with pegylated

alpha interferon and ribavirin. Assume model (2.3) for each gene ontology term, Y

(g)
it =
(g)
(g)
it ) = Σ
.
t
it }T
(g)

t=1

µ

(g)
t + ε

(g)
it

for g = 1, . . . , 159, where µ

is an unknown mean vector and var(ε

The assumptions on ε

(g)
it

in model (2.3) incorporate temporal dependence so that {ε

(g)
t

are dependent over time. For each gene ontology term, we tested whether the covariance

matrices, Σ

(g)
t

, are the same across all t. In addition, the change points were identiﬁed for

those gene ontology terms found to be signiﬁcant.

For the gth gene ontology term, we computed ˆD

(g)
nt /ˆσ(g),nt,0 for t = 1, . . . , 5 and the

covariance matrix estimation ˆV
n be the maximum of the standardized test
statistics { ˆV
n5 }T . For each gene ontology term, the
ˆD
n }159
local false discovery rate was estimated using { ˜M(g)
g=1 based on the method proposed by

n,D}−1/2{ˆσ−1

n,D. Let ˜M(g)
n1 , . . . , ˆσ−1
ˆD

(g),n1,0

(g),n5,0

(g)

(g)

(g)

(g)

Efron (2007). As suggested in Efron (2007), a cutoﬀ value of 0.20 was used for the local

38

false discovery rate procedure. There were 10 gene ontology terms that had a local false

discovery rate less than or equal to 0.20. These 10 signiﬁcant gene ontology terms and their

corresponding number of genes, test statistic value, estimated change points, and local false

discovery rate are listed in Table 2.8. Among those gene ontology terms listed in Table

2.8, term 0008285 is associated with the reduction or stoppage of cell proliferation. This

is of interest, as Kannan et al. (2011) had noted that the hepatitis C virus reduces cell

proliferation. Thus, the results here suggest that treatment using pegylated alpha interferon

and ribavirin has some eﬀect on the covariances of those genes that play a role in cellular

proliferation.

Table 2.8: Signiﬁcant gene ontology terms, test statistic values, number of genes in each
gene ontology term, identiﬁed change points and estimated local false discovery rates

GO

Number of Genes Test Statistic Value Change Points Local FDR

0006511
0030054
0042493
0008219
0006357
0005765
0019904
0008285
0048471
0005739

132
136
128
122
167
116
117
148
263
661

11.10
9.92
9.54
9.34
9.13
8.93
8.87
8.75
8.04
8.04

4, 5

1, 4, 5

5

4, 5
1, 4

4

4, 5

1, 2, 5
1, 4, 5

4, 5

0.012
0.044
0.064
0.076
0.090
0.103
0.106
0.115
0.168
0.168

After identifying ten signiﬁcant gene ontology terms, we applied binary segmentation to

identify all change points. We discovered that eight terms have a change point at t = 5,

day 14, eight have a change point at t = 4, day 7, and four terms have a change point at

t = 1, day 0. Recall that a change point at time t = 5 implies the covariance matrix at

time t = 5 is not equal to that at time t = 6. Hence, most of the identiﬁed changes in the

covariance matrices occurred by the initial day of treatment or later in the treatment cycle.

These ﬁndings complement those of Taylor et al. (2007), who observed that the majority

of the genes that were altered in expression occurred at the early days of treatment and

again, marginally, between treatment days 7 and 28. To illustrate the changes in covariance

39

matrices, Figure 2.3 demonstrates the correlation networks of gene ontology term 0030054

at the six time points. We see that the correlation networks change at time points 1, 4 and

5, which is consistent with the identiﬁed change points reported in Table 2.8.

Figure 2.3: Correlation network map for gene ontology term 0030054. Each dot represents
a gene within the gene ontology. A link between dots indicates a strong correlation
between genes.

2.7 Technical details

2.7.1 Proofs of lemmas

In this section, we present the proofs to some lemmas used in the proofs of the main theorems.
Without loss of generality, assume that µt = 0 in our proofs for each t ∈ {1, . . . , T} because
the test statistic, ˆDnt, is invariant with respect to µt.

Lemma 1. (i) For any symmetric matrices A and B with appropriate dimensions, we have
tr2(AB) ≤ tr(A2)tr(B2); (ii) for any square matrix A, |tr(A2)| ≤ tr(AAT); and (iii) for any
square matrix A, (cid:107)A2(cid:107)2
F = tr(BTB) is the Frobenius norm of B.

F where (cid:107)B(cid:107)2

F ≤ (cid:107)ATA(cid:107)2

40

Proof. (i) Let A = (aij) and B = (bij). By the Cauchy-Schwarz inequality,

(cid:88)

(cid:88)

aijbij ≤(cid:88)

(cid:16)(cid:88)

(cid:17)1/2(cid:16)(cid:88)

(cid:17)1/2 ≤(cid:16)(cid:88)

(cid:88)

(cid:17)1/2(cid:16)(cid:88)

(cid:88)

(cid:17)1/2

.

b2
ij

a2
ij

b2
ij

a2
ij

tr(AB) =

i

j

i

j

j

i

j

i

j

Since A and B are symmetric, the right-hand side of the above inequality is the square root

of tr(A2)tr(B2).

(ii) Assume that A = (aij) is any p× p matrix. If tr(A2) ≥ 0, because tr{(AT − A)(AT −
A)T} ≥ 0 and tr{(AT − A)(AT − A)T} = 2tr(ATA)− 2tr(A2), we have |tr(A2)| ≤ tr(AAT).
If tr(A2) < 0, because tr{(AT + A)(AT + A)T} ≥ 0 and tr{(AT + A)(AT + A)T} =

2tr(ATA) + 2tr(A2) = 2tr(ATA) − 2|tr(A2)|, we have |tr(A2)| ≤ tr(AAT).

(iii) By deﬁnition, (cid:107)A2(cid:107)2

F = tr(ATATAA) = tr(ATAAAT). Since ATA and AAT are

symmetric matrices, it follows by using part (i) that

tr(ATAAAT) ≤ (cid:107)ATA(cid:107)F(cid:107)AAT(cid:107)F = (cid:107)ATA(cid:107)2
F ,

(cid:3)

and this completes the proof.

Lemma 2. Deﬁne Us1s2,0 = {n(n − 1)}−1(cid:80)n

)2 for any s1, s2 ∈ {1, . . . , T}.
Under Condition 1, the leading order term of the covariance between Us1s2,0 and Uh1h2,0 is
Gn(s1, s2, h1, h2) = cov(Us1s2,0, Uh1h2,0), where

i(cid:54)=j=1(Y T
is1

Yjs2

Gn(s1, s2, h1, h2) =

2

n(n − 1)
2(n − 2)
n(n − 1)

+

tr2(Cs1h1

CT

(cid:88)

2

tr2(Cs1h2

CT

s2h1

)

) +

n(n − 1)

s2h2
tr(Σsuc Csuhv Σhvc CT

).

suhv

u,v∈{1,2}

Denote uc as the complement set of {u}. That is, uc = {1, 2}/{u}.

41

Proof. Using the notation

i(cid:54)=j=1

i(cid:54)=j=1

is1

is1

ih1

Yjs2

Yjs2

∼(cid:80) deﬁned in Section 2.3, we deﬁne
n(cid:88)
)2 − tr(Σs1Σs2)(cid:9)(cid:8)(Y T
E(cid:2)(cid:8)(Y T
n(cid:88)
E(cid:2){(Y T
∼(cid:88)
i,j,l E(cid:2){(Y T
∼(cid:88)
i,j,l E(cid:2){(Y T
∼(cid:88)
i,j,l E(cid:2){(Y T
∼(cid:88)
i,j,l E(cid:2){(Y T
∼(cid:88)
i,j,k,l E(cid:2){(Y T

)2 − tr(Σs1Σs2)}{(Y T
jh1
)2 − tr(Σs1Σs2)}{(Y T
ih1
)2 − tr(Σs1Σs2)}{(Y T
lh1
)2 − tr(Σs1Σs2)}{(Y T
jh1
)2 − tr(Σs1Σs2)}{(Y T
lh1
)2 − tr(Σs1Σs2)}{(Y T
kh1

Yjs2

Yjs2

Yjs2

Yjs2

Yjs2

is1

is1

is1

is1

is1

L1 =

L2 =

L3 =

L4 =

L5 =

L6 =

L7 =

1
(P 2
n)2

1
(P 2
n)2
1
(P 2
n)2
1
(P 2
n)2
1
(P 2
n)2
1
(P 2
n)2
1
n)2
(P 2

)(cid:9)(cid:3),
)}(cid:3),
)}(cid:3),
)}(cid:3),
)}(cid:3),
)}(cid:3),
)}(cid:3).

)2 − tr(Σh1

Σh2

Yjh2

)2 − tr(Σh1

Σh2

Yih2

Ylh2

Yih2

Ylh2

Yjh2

Σh2

Σh2

)2 − tr(Σh1
)2 − tr(Σh1
)2 − tr(Σh1
)2 − tr(Σh1
Σh2
)2 − tr(Σh1

Σh2

Σh2

Ylh2
)2} = tr(Σs1Σs2). Applying

Then cov(Us1s2,0, Uh1h2,0) = L1 + ··· + L7 since E{(Y T

is1

Yjs2

standard results in multivariate analysis, we obtain

Yjs2

E{(Y T
)2(Y T
is1
ih1
+ 2tr2(Cs2h2

Yjh2

Ch1s1

)2} = 2tr(Ch1s1
) + 2tr(Σs2Cs1h1

Cs2h2

Σh2

Ch1s1
CT

s1h1

Cs2h2

) + 2tr(Σs1CT

Σh1

Cs2h2

)

) + tr(Σs2Σs1)tr(Σh2

s2h2
Σh1

).

This implies that

L1 + L2 =

(cid:104)
tr{(Cs2h2

2

n(n − 1)
+ tr2(Cs2h1
+ tr(Σs1Cs2h1

Ch1s1

)2} + tr{(Cs2h1
CT

Σh1

Ch2s1

) + tr(Σs1Cs2h2
CT

s2h1

) + tr(Σs2Cs1h2

s2h2
Σh1

Σh2

Ch2s1

)2} + tr2(Cs2h2
(cid:105)

) + tr(Σs2Cs1h1
CT

Σh2

)

.

s1h2

Ch1s1

)

CT

s1h1

)

Furthermore, L7 = 0 and

6(cid:88)

i=3

2(n − 2)
n(n − 1)

Li =

(cid:88)

u,v∈{1,2}

tr(Σsuc Csuhv Σhvc CT

suhv

),

This with Condition 1 implies that Lemma 2 is valid.

(cid:3)

42

Lemma 3. Deﬁne Us1s2,1 = (1/P 3
n)
covariance between Us1s2,1 and Uh1h2,1 is
4
n3

cov(Us1s2,1, Uh1h2,1) =

i,j,k Y T
is1

Yjs2

Y T
js2

Yks1

. The leading term in the

∼(cid:80)

(cid:88)
(cid:88)

u,v∈{1,2}

u,v∈{1,2}

+

2
n2

tr2(Csuhv CT

suchvc )

tr(Σsuc Csuhv Σhvc CT

suhv

),

where uc is the complement set of {u}. That is, uc = {1, 2}/{u}. In addition, var( ˆDnt,1) =
o{var( ˆDnt,0)}.

∼(cid:88)

∼(cid:88)

Proof. Because E(Us1s2,1) = 0, cov(Us1s2,1, Uh1h2,1) = E(Us1s2,1Uh1h2,1). By deﬁnition,

Us1s2,1Uh1h2,1 =

1
(P 3
n)2

i,j,k

i1,j1,k1

(Y T
is1

Yjs2

Y T
js2

Yks1

+ Y T
is2

Yjs1

Y T
js1

Yks2

)

× (Y T
i1h1

Yj1h2

Y T
j1h2

Yk1h1

+ Y T

i1h2

Yj1h1

Y T
j1h1

Yk1h2

).

According to the number of equivalent indices among two sets {i, j, k} and {i1, j1, k1}, we
decompose Us1s2,1Uh1h2,1 into three terms. Let Ic = {i, j, k}∪{i1, j1, k1} where c represents
the number of indices that are equivalent to each other in two sets {i, j, k} and {i1, j1, k1}.
If there is one index equivalent,

I1 = {(i = i1, j, k, j1, k1), (i = j1, j, k, i1, k1), (i, j, k = i1, i1, j1),

(i, j = i1, k, j1, k1), (i, j = j1, k, i1, k1), (i, j = k1, k, i1, j1),
(i, j, k = i1, j1, k1), (i, j, k = j1, i1, k1), (i, j, k = k1, i1, j1)}.

For each case within I1, the expectation of corresponding summand in Us1s2,1Uh1h2,1 is 0.
If there are two indices equivalent,

I2 = {(i = i1, j = j1, k, k1), (i = j1, j = i1, k, k1), (i = i1, k = k1, j, j1),

(i = k1, k = i1, j, j1), (j = j1, k = k1, i, i1), (j = k1, k = j1, i, i1),

(i = i1, j = k1, k, j1), (i = j1, j = k1, k, i1), (i = i1, k = j1, j, k1),
(i = k1, k = j1, j, i1), (j = j1, k = i1, i, k1), (j = k1, k = i1, i, j1)}.

43

Among all the cases in I2, there exist two cases {(i = i1, k = k1, j, j1), (i = k1, k = i1, j, j1)}
whose expectations of the summand in Us1s2,1Uh1h2,1 are not zero. Similarly, if there are
three indices equivalent,

I3 = {(i = i1, j = j1, k = k1), (i = j1, j = i1, k = k1), (i = k1, j = j1, k = i1),
(i = k1, j = i1, k = j1), (i = i1, j = k1, k = j1), (i = j1, j = k1, k = i1)}.

Among all the cases in I3, there are two cases (i = i1, j = j1, k = k1) and (i = k1, j =

j1, k = i1) that have non-zero expectation.

In summary,

E(Us1s2,1Uh1h2,1) =

2
n)2 E
(P 3

× (Y T
ih1

(cid:110) ∼(cid:88)
(cid:110) ∼(cid:88)
(cid:104)

2
n)2 E
(P 3
(cid:88)
× (Y T
ih1

+

=

u,v∈{1,2}

2
P 3
n
+ tr{(Csuhv CT

i,k,j,j1

(Y T
is1

Yjs2

Y T
js2

Yks1

+ Y T
is2

Yjs1

Y T
js1

Yks2

)

Y T
Yj1h2
j1h2
i,k,j (Y T
is1

Ykh1

+ Y T
ih2

Yj1h1

Y T
j1h1

Yjs2

Y T
js2

Yks1

+ Y T
is2

Yjs1

Yks2

)

(cid:111)

)

Ykh2
Y T
js1

(cid:111)

)

Ykh2
) + tr2(Csuhv CT

suchvc )

(cid:105)

)

.

suhv

suhv

Yjh1

Ykh1

Y T
jh2

+ Y T
ih2

Y T
Yjh2
jh1
(n − 3)tr(Σsuc Csuhv Σhvc CT
suchvc )2} + tr(Σsuc Csuhv Σhvc CT
∼(cid:80)

)(Y T
ks1

Yls2

). For any ﬁxed u, v, k, l ∈

This completes the proof.

(cid:3)

Lemma 4. Deﬁne Us1s2,2 = (1/P 4
n)
Yjs2
{1, 2}, the covariance between Ususv,2 and Uhkhl,2 is

i,j,k,l (Y T
is1

cov(Ususv,2, Uhkhl,2) =

2
P 4
n

{tr2(Csuhk
CT

+ 2tr(Csvhk

CT

svhl

) + tr(Csuhk

CT

svhl

Csuhk

CT

svhl

)

Csuhl

CT

suhk

) + 2tr(Csvhk

CT

svhl

Csuhk

CT

svhl

+ 3tr(Csvhk

CT

suhl

Csvhl

CT

suhk

) + 3tr(Csvhk

CT

suhl

Moreover, var( ˆDnt,2) = o{var( ˆDnt,0)}.

44

)

suhl
CT

)tr(Csvhl

suhk

)}.

Proof. Because E(Ususv,2) = 0, cov(Ususv,2, Uhkhl,2) = E(Ususv,2Uhkhl,2). Therefore, let
R = E{(Y T
isu

Yjsv )(Y T
ksu

Ylsv )(Y T

)}, so

)(Y T

Yj1hl

Yl1hl

k1hk

cov(Ususv,2, Uhkhl,2) =

=

R

i1,j1,k1,ll

∼(cid:88)

i,j,k,l

i1hk

∼(cid:88)
1
n)2
(P 4
{tr2(Csuhk
2
P 4
n
CT

+ 2tr(Csvhk

CT

svhl

) + tr(Csuhk

CT

svhl

Csuhk

CT

svhl

)

Csuhl

CT

suhk

) + 2tr(Csvhk

CT

svhl

Csuhk

CT

svhl

)

suhl
CT

)}.

+ 3tr(Csvhk

CT

suhl

Csvhl

CT

suhk

) + 3tr(Csvhk

CT

suhl

)tr(Csvhl

suhk

This completes the proof of the ﬁrst part.

For the second part, write ˆDnt,2 = w−1(t)(cid:80)t
(cid:88)

∗(cid:88)

It follows by the ﬁrst part that

1

var( ˆDnt,2) =

2
n4

w2(t)

s1,s2,h1,h2

u,v,k,l∈{1,2}

(cid:80)T

s2=t+1

(cid:80)

u,v∈{1,2}(−1)|u−v|Ususv,2.

s1=1

(−1)|u−v|+|k−l|{tr2(Csuhk

CT

svhl

)

+ tr(Csuhk

Csuhk

) + 2tr(Csvhk

Csuhl

+ 2tr(Csvhk

Csuhk

) + 3tr(Csvhk

Csvhl

CT

svhl
CT

suhl

CT

)

suhk
CT

suhk

)

CT

svhl
CT

svhl

CT

svhl
CT

suhl
CT

+ 3tr(Csvhk

CT

suhl

)tr(Csvhl

suhk

)}.

Applying the inequalities given in Lemma 1, we can show that var( ˆDnt,2) = o{var( ˆDnt,0)}.
This completes the proof of this Lemma.

(cid:3)

Lemma 5. Let Z be an m-dimensional multivariate normally distributed random vector
with mean 0 and covariance Im. Deﬁne M = ZZT − I. Assume A, B, C, D are matrices

45

with appropriate dimensions. Then E{tr(AM ATBM BT)} = tr2(ATB) + tr{(ATB)2} and

cov{tr(AM ATBM BT), tr(CM CTDM DT)}
= 2tr(ATB)tr(CTD)tr{(ATB + BTA)(CTD + DTC)}

tr2{(ATB + BTA)(CTD + DTC)} + tr(cid:2){(ATB + BTA)(CTD + DTC)}2(cid:3)

1
2

+
+ 2tr(ATB)tr{(ATB + BTA)(CTDCTD + DTCDTC)}
+ 2tr(CTD)tr{(CTD + DTC)(ATBATB + BTABTA)}
+ 2tr{(ATBATB + BTABTA)(CTDCTD + DTCDTC)}.

In particular,

var{tr(AM ATBM BT)} = 2tr2(ATB)tr{(ATB + BTA)2} +
+ 4tr(ATB)tr{(ATB + BTA)(ATBATB + BTABTA)}
+ 2tr{(ATBATB + BTABTA)2} + tr{(ATB + BTA)4}.

tr2{(ATB + BTA)2}

1
2

(cid:104)
tr4(ATB) + tr2{(ATB)⊗2}(cid:105)

Moreover, var{tr(AM ATBM BT)} ≤ K

for a constant K > 0.

Proof. We ﬁrst consider E{tr(AM ATBM BT)}. Because M = ZZT − I, we have

tr(AM ATBM BT) = (ZTATBZ)2 − ZTATBBTAZ − ZTBTAATBZ + tr(ATBBTA).

(2.12)

Taking expectation of the both sides of equation (2.12), we have

E{tr(AM ATBM BT)} = tr2(ATB) + tr{(ATB)2} + tr(ATBBTA) − tr(ATBBTA)

= tr2(ATB) + tr{(ATB)2}.

46

Next, we consider the covariance part. Using equation (2.12), we have

tr(AM ATBM BT)tr(CM CTDM DT) = (ZTATBZ)2(ZTCTDZ)2
− (ZTATBZ)2ZTCTDDTCZ − (ZTATBZ)2ZTDTCCTDZ
− (ZTATBZ)2tr(CTDDTC) − ZTBTAATBZ(ZTCTDZ)2

+ (ZTBTAATBZ)(ZTCTDDTCZ) + (ZTBTAATBZ)(ZTDTCCTDZ)
− (ZTBTAATBZ)tr(CTDDTC) − ZTATBBTAZ(ZTCTDZ)2

+ ZTATBBTAZ(ZTCTDDTCZ) + ZTATBBTAZ(ZTDTCCTDZ)
− ZTATBBTAZtr(CTDDTC) + tr(ATBBTA)(ZTCTDZ)2
− tr(ATBBTA)(ZTCTDDTCZ) − tr(ATBBTA)(ZTDTCCTDZ)
− tr(ATBBTA)tr(CTDDTC).

Deﬁne the terms in the above expression as J1, . . . , J16. We consider the expectation of each

Ji for i = 1, . . . , 16. We have the following:

E(J4) = [tr2(ATB) + tr{(ATB)2} + tr(ATBBTA)]tr(CTDDTC),
E(J6) = tr(BTAATB)tr(CTDDTC) + 2tr(BTAATBCTDDTC),

E(J7) = tr(BTAATB)tr(DTCCTD) + 2tr(BTAATBDTCCTD),
E(J8) = E(J12) = E(J14) = −tr(BTAATB)tr(CTDDTC),
E(J10) = tr(ATBBTA)tr(CTDDTC) + 2tr(ATBBTACTDDTC),

E(J11) = tr(ATBBTA)tr(DTCCTD) + 2tr(ATBBTADTCCTD),
E(J13) = tr(ATBBTA)[tr2(CTD) + tr{(CTD)2} + tr(CTDDTC)].

In addition, we can show that, for any matrices A, B, C of appropriate dimensions,

E(ZTAZZTBZZTCZ) = tr(A)tr(B)tr(C) + tr(A){tr(BC) + tr(BTC)}

+ tr(B){tr(AC) + tr(ATC)} + tr(C){tr(AB) + tr(ATB)}
+ tr{(A + AT)(B + BT)(C + CT)}.

47

Applying the above formula to J2, J3, J5 and J9, we obtain

−E(J2) = tr2(ATB)tr(CTDDTC) + tr(ATB){tr(ATBCTDDTC) + tr(BTACTDDTC)}

+ tr(ATB){tr(ATBCTDDTC) + tr(BTATCTDDTC)}
+ tr(CTDDTC){tr(ATBATB) + tr(BTAATB)}
+ 2tr{(ATB + BTA)2CTDDTC}.

The expectation of J3 is the same as E(J2) above except for changing CTDDTC to DTCCTD.

Similarly,

−E(J5) = tr2(CTD)tr(BTAATB) + tr(CTD){tr(CTDBTAATB) + tr(DTCBTAATB)}

+ tr(CTD){tr(CTDBTAATB) + tr(DTCTBTAATB)}
+ tr(BTAATB){tr(CTDCTD) + tr(DTCCTD)}
+ 2tr{(CTD + DTC)2BTAATB},

and E(J9) is the same as E(J5) with replacing BTAATB with ATBBTA. Finally, we can

show that

E(J1) = tr2(ATB)tr2(CTD) + tr2(ATB)[tr{(CTD)2} + tr(CTDDTC)]

+ tr2(CTD)[tr{(ATB)2} + tr(ATBBTA)]
+ 4tr(ATB)tr(CTD){tr(ATBCTD) + tr(BTACTD)}
+ [tr{(ATB)2} + tr(ATBBTA)][tr{(CTD)2} + tr(CTDDTC)]
+ 2{tr(ATBCTD) + tr(BTACTD)}2
+ 2tr(ATB)tr{(ATB + BTA)(CD + DTCT)2}
+ 2tr(CTD)tr{(ATB + BTA)2(CD + DTCT)}
+ tr{(ATB + BTA)2(CD + DTCT)2} + tr[{(ATB + BTA)(CD + DTCT)}2].

Summarizing the above E(Ji)’s, we obtain E{tr(AM ATBM BT)tr(CM CTDM DT)}. From
this result and (2.12), we can obtain the covariance between tr(AM ATBM BT) and

48

tr(CM CTDM DT). The variance is a special case of the covariance. This completes the ﬁrst

part of the Lemma.

Next, we prove the inequality given in the second part. Using the Cauchy-Schwarz

inequality and Lemma 1,

2tr2(ATB)tr{(ATB + BTA)2} ≤ Ktr2(ATB)tr{(ATB)⊗2}

and

2tr(ATB)tr{(ATB + BTA)(ATBATB + BTABTA)}
≤ 2tr(ATB)tr1/2{(ATB + BTA)2}tr1/2{(ATBATB + BTABTA)2}
≤ Ktr(ATB)tr1/2{(ATB)⊗2}tr1/2{(ATBATB)⊗2}

≤ Ktr(ATB)tr1/2{(ATB)⊗2}tr(cid:2){(ATB)T(ATB)}2(cid:3)

Moreover, tr{(ATB + BTA)4} ≤ tr2{(ATB + BTA)2} ≤ Ktr2{(ATB)⊗2}. In summary,

≤ Ktr(ATB)tr3/2{(ATB)⊗2}.
(cid:104)
(cid:104)

var{tr(AM ATBM BT)} ≤ K
≤ K

tr(ATB)tr1/2{(ATB)⊗2} + tr{(ATB)⊗2}(cid:105)2
tr4(ATB) + tr2{(ATB)⊗2}(cid:105)

.

This ﬁnishes the proof of this Lemma.

(cid:3)

Deﬁne

Vn0(s1, s2, h1, h2) =

Vn1(s1, s2, h1, h2) =

4

n(n − 1)
8(n − 2)
n(n − 1)

(cid:88)
(cid:88)

u,v,k,l∈{1,2}

u,v∈{1,2}

(−1)−|u−v|−|k−l|tr2(Csuhk
(−1)|u−v|tr{(Σs1 − Σs2)Csuhv (Σh1

svhl

CT

),

− Σh2

)CT

suhv

}.

Lemma 6. Let Ws1s2 = Us1s1,0 + Us2s2,0 − Us1s2,0 − Us2s1,0. The covariance between
Ws1s2 and Wh1h2
+Vn1(s1, s2, h1, h2) and Vn0(s1, s2, h1, h2) is the covariance between Ws1s2 and Wh1h2
H0.

is Vu(s1, s2, h1, h2), where Vu(s1, s2, h1, h2) = Vn0(s1, s2, h1, h2)

under

49

Proof. Let Gn(·) be the function deﬁned in Lemma 2. It then follows,

(−1)−|u−v|−|k−l|Gn(su, sv, hk, hl).

Vu(s1, s2, h1, h2) =

Applying Lemma 2, we have

u,v,k,l∈{1,2}

Vu(s1, s2, h1, h2) =

+ tr2(Csuhl

CT

svhk

(−1)−|u−v|−|k−l|(cid:8)tr2(Csuhk
(cid:88)

(−1)−|u−v|−|k−l|(cid:8)tr(ΣsuCsvhl

svhl

CT

)

n(n − 1)

2

)(cid:9) +

2(n − 2)
n(n − 1)

u,v,k,l∈{1,2}

u,v,k,l∈{1,2}

(cid:88)

(cid:88)

Σhk

CT

svhl

)

)(cid:9).

+ tr(Σsv Csuhk

Σhl

CT

suhk

) + tr(ΣsuCsvhk

Σhl

CT

svhk

) + tr(Σsv Csuhl

Σhk

CT

suhl

Hence,

Vu(s1, s2, h1, h2) =

(cid:88)
(cid:88)

u,v,k,l∈{1,2}

u,v,k,l∈{1,2}

4

n(n − 1)
8(n − 2)
n(n − 1)

+

(−1)−|u−v|−|k−l|tr2(Csuhk
(−1)−|u−v|−|k−l|tr(ΣsuCsvhl

CT

)

svhl

Σhk

CT

svhl

).

After some algebra, one can show the second term in the above expression is equivalent to

Vn1(s, h, h1, h2).

Under H0, Vn1(s, h, h1, h2) = 0. Therefore, Vu(s, h, h1, h2) = V0(s, h, h1, h2) is the co-

variance under H0. This completes the proof of Lemma 6.

(cid:3)

2.7.2 Proofs of main results

In this section, we present proofs for the main results of Chapter 2. By deﬁnition, ˆDnt can
be expressed as ˆDnt = ˆDnt,0 − 2 ˆDnt,1 + ˆDnt,2, where for k = 0, 1 and 2,

ˆDnt,k =

1

t(T − t)

(Us1s1,k + Us2s2,k − Us1s2,k − Us2s1,k).

(2.13)

t(cid:88)

T(cid:88)

s1=1

s2=t+1

Here Us1s2,k was deﬁned in Section 2.3.

50

Proof of Theorem 1. Based on the deﬁnition of ˆDnt, the expectation of ˆDnt is

t(cid:88)

s=1

T(cid:88)

h=t+1

t(cid:88)

T(cid:88)

s=1

h=t+1

E( ˆDnt) =

1
t

tr(Σ2

s) +

1

T − t

tr(Σ2

h) −

2

t(T − t)

tr(ΣsΣh) = Dt.

We next calculate the order of the variance of ˆDnt. By using the deﬁnition of ˆDnt, write ˆDnt
as ˆDnt = ˆDnt,0 − 2 ˆDnt,1 + ˆDnt,2. By Lemmas 3 and 4, it follows that ˆDnt,1 = op( ˆDnt,0) and
ˆDnt,2 = op( ˆDnt,0). Therefore, it suﬃces to compute the variance of ˆDnt,0. Using Lemma 6,

nt = w−2(t)
σ2

cov(Ws1s2, Wh1h2

) = w−2(t)

Vu(s1, s2, h1, h2).

∗(cid:88)

s1,s2,
h1,h2

∗(cid:88)

s1,s2,
h1,h2

This completes the proof of Theorem 1.

(cid:3)

Proof of Theorem 2. By Theorem 1, it is suﬃcient to establish the asymptotic normality
of ˆDnt,0. We ﬁrst write ˆDnt,0 into a martingale. Deﬁne Ajsu = YjsuY T
jsu

− Σsu, and

Gnj =

1

t(T − t)

Qni =

1

t(T − t)

t(cid:88)
t(cid:88)

u,v∈{1,2}

T(cid:88)
(cid:88)
T(cid:88)
(cid:88)
ni = 2(cid:80)i−1

u,v∈{1,2}

(1)

s1=1

s2=t+1

s1=1

s2=t+1

Let Zni = Z

ˆDnt,0 − Dt =(cid:80)n

(1)
ni + Z

i=1 Zni,

(2)
ni , where Z

(−1)|u−v|(cid:8)Y T

isuAjsv Yisu − tr(ΣsuAjsv )(cid:9),

(−1)|u−v|{Y T

isuΣsv Yisu − tr(ΣsuΣsv )}.

j=1 Gnj/{n(n − 1)} and Z

(2)
ni = 4Qni/n. Then,

Let Fk be the σ-algebra generated by σ{Y1, . . . , Yk} where Yi = {Yi1, . . . , YiT} is the
collection of Y for the i-th sample. It follows that E(Znk|Fk−1) = 0. Therefore, Znk is a
sequence of martingale diﬀerence with respect to Fk.

Let σ2

ni = E(Z2

ni|Fi−1). To prove the asymptotic normality, we check two following

conditions (Hall and Hedye, 1980):

Condition (a)(cid:80)n
Condition (b)(cid:80)n

i=1 σ2

ni/var( ˆDnt)

p→ 1;

i=1 E(Z4

ni)/var2( ˆDnt) → 0.

51

We ﬁrst prove Condition (a). Consider E((cid:80)n
Furthermore, var( ˆDnt) = (cid:80)n
ni) + 2E{(cid:80)
Thus, we have E((cid:80)n

i=1 E(Z2

i=1 σ2

i=1 E(Z2

ni) =(cid:80)n
ni) =(cid:80)n
i<j ZniE(Znj|Fj−1)} = (cid:80)n
ni) = var( ˆDnt). It suﬃces to show var((cid:80)n
(cid:18)n
(cid:19)−2 i−1(cid:88)
i−1(cid:88)
i−1(cid:88)
(cid:1) E(cid:0)Qni
4(cid:0)n

(cid:1) = Rni,1 + Rni,2 + Rni,3.

4
n2 E(Q2

Gnj|Fi−1

ni|Fi−1)

|Fi−1) +

E(GnjGnj1

i=1 σ2

j1=1

j=1

+

2

i=1 var(Zni).
i=1 E(Z2
ni).
ni) = o{var2( ˆDnt)}.

ni = E(Z2
σ2

ni|Fi−1) =

Now we obtain σ2

i=1 σ2
ni as

1
n

2

j=1

= Yjs1

Recall that Ajs1

no impact on var((cid:80)n
(cid:1)2w2(t)

Rni,1 =

(cid:0)n

2

2

Y T
js1
ni). Moreover,

− Σs1. We can further show that Rni,2 is a constant and has
∗(cid:88)
i−1(cid:88)

(cid:88)

(−1)|u−v|+|k−l|tr(Ajsv Csuhk

Aj1hl

CT

)

suhk

i=1 σ2

i−1(cid:88)

j=1

j1=1

s1,s2,
h1,h2

u,v,k,l∈{1,2}

= R

(0)
ni,1 + R

(1)
ni,1,

(0)
ni,1 corresponds to summation of the terms where j = j1 and R

where R
of the terms where j (cid:54)= j1. To prove Condition (a), it suﬃces to show that

(1)
ni,1 is the summation

i=1 R

(0)
ni,1) = o(σ4

nt),

i=1 R

(1)
ni,1) = o(σ4

nt) and

i=1 Rni,3) = o(σ4

nt).

(a1) var((cid:80)n
(a2) var((cid:80)n
(a3) var((cid:80)n
(cid:16) n(cid:88)

var

(cid:17) ≤

We ﬁrst show (a1). We have

R

(0)
ni,1

i=1

C

w4(t)n5 var

≤ Cw−4(t)n−5

(cid:110) ∗(cid:88)
∗(cid:88)

s1,s2,
h1,h2

(cid:88)
(cid:88)

u,v,

k,l∈{1,2}

(cid:111)

)

CT

suhk

(−1)|u−v|+|k−l|tr(Ajsv Csuhk
(cid:110)

var

tr(Ajsv Csuhk

Ajhl

CT

suhk

)

.

Ajhl

(cid:111)

s1,s2,
h1,h2

u,v,

k,l∈{1,2}

52

Applying Lemma 5 and using the fact

tr(Ajsv Csuhk

Ajhl

CT

suhk

we have

) = tr{ΓT

suΓsv (ZjZT

j − I)ΓT

sv ΓsuΓT
hk

(ZjZT

j − I)ΓT
hl

Γhl

Γhk

},

)} ≤ C(cid:2)tr4(Csuhk

) + tr2{(ΓT

sv Csuhk

)⊗2}(cid:3).

Γhl

CT

CT

suhk

Ajhl

var{tr(Ajsv Csuhk
Under Condition 1, var((cid:80)n
We next show (a2). Because of j (cid:54)= j1, E{tr(Ajsv Csuhk
n(cid:88)

(0)
ni,1) = o(σ4

i=1 R

that

svhl

C

(1)

ni,1) ≤

R

w4(t)n4 var

var(

i=1

(cid:110) ∗(cid:88)
∗(cid:88)

s1,s2,
h1,h2

(cid:88)
(cid:88)

u,v,

k,l∈{1,2}

≤ Cn−4w−4(t)

s1,s2,
h1,h2

u,v,

k,l∈{1,2}

Similar to Lemma 5, we obtain

nt). This completes the proof of (a1).

Aj1hl

CT

suhk

)} = 0. It follows

(cid:111)

CT

suhk

)

(−1)|u−v|+|k−l|tr(Ajsv Csuhk
var(cid:8)tr(Ajsv Csuhk

Aj1hl

CT

suhk

Aj1hl

)(cid:9).

CT

Aj1hl

)} = 2tr2{(ΓT

var{tr(Ajsv Csuhk
where Q⊗2 = QQT and Q⊗4 = QQTQQT. By Condition 1, we have var((cid:80)n
sv Csuhk

Γhl
)⊗2} ≤ Ctr2{(ΓT

)⊗2} + 6tr{(ΓT
)⊗2},

− 4tr{(ΓT

ΓT
sv Csuhk

sv Csuhk

sv Csuhk

sv Csuhk

suhk

Γhl

Γhl

Γhl

)⊗4}

Γhl

i=1 R

(1)
ni,1) =

o(σ4

nt), which proves (a2).

For proving (a3), a direct computation shows that

(cid:88)

n(cid:88)
n(cid:0)n

i=1

16

=

2

Rni,3 =

(cid:1) n(cid:88)

j=1

(cid:1) n(cid:88)

j=1

n(cid:0)n

16

2

(n − j)
w2(t)

(n − j)
w2(t)

∗(cid:88)

s1,s2,
h1,h2

∗(cid:88)
(cid:88)

s1,s2,
h1,h2

u,k∈{1,2}

u,v,

k,l∈{1,2}
(−1)|u−k|tr{(Ajs1

(−1)|u−v|+|k−l|tr(Csuhk

Σhl

CT

suhk

Ajsv )

− Ajs2

)Csuhk

(Σh1

− Σh2

)CT

suhk

}.

53

− Ajs2

)Csuhk

(Σh1

− Σh2

)CT

(−1)vΓT

sv Csuhk

(Σh1

− Σh2

)CT

suhk

suhk

}(cid:3)
Γsv )2(cid:111)

.

i=1 Rni) = o(σ4

nt). Thus, Condition (a) is valid.

(cid:17)4|Fi−1
(cid:111)(cid:105)
(cid:110)(cid:16) i−1(cid:88)
nj|Fi−1) +(cid:80)i−1

Gnj

j=1

j=1 E(G4

+ Cn−4E(Q4

ni) = J1i + J2i,

j(cid:54)=j1

E(G2

njG2

nj1

|Fi−1). Using

C

i=1

var(

Then,

n3w4(t)

n(cid:88)

Rni,3) ≤

∗(cid:88)
(cid:88)
∗(cid:88)
(cid:88)
Therefore, by Condition 2, var((cid:80)n

s1,s2,
h1,h2

s1,s2,
h1,h2

n3w4(t)

=

C

u,k∈{1,2}

u,k∈{1,2}

E(cid:2)tr2{(Ajs1
(cid:110)
2(cid:88)

tr

(
v=1

To check Condition (b), we compute

E(Z4

ni) = E{E(Z4

where E(cid:8)(cid:0)(cid:80)i−1

j=1 Gnj

the deﬁnition of Gnj, one obtain

E

(cid:104)
ni|Fi−1)} ≤ Cn−8E
(cid:1)4|Fi−1
(cid:9) = (cid:80)i−1
(cid:88)
∗(cid:88)

1

w2(t)

u,v,k,l∈{1,2}

s1,s2,
h1,h2

G2

nj =

(−1)|u−v|+|k−l|tr(AisuAjsv )tr(Aihk

Ajhl

).

For any two symmetric matrices A and B, E[{ZTAZ − tr(A)}2{ZTBZ − tr(B)}2] =
4{tr(A2)tr(B2) + 2tr2(AB)} + 16{2tr(A2B2) + tr(ABAB)} and tr[{ZTAZ − tr(A)}4] ≤
Ctr2(A2). Accordingly,

J1i ≤

C

w4(t)n8

(cid:88)

i−1(cid:88)
∗(cid:88)
i−1(cid:88)

j=1

s1,s2

∗(cid:88)

u,v∈{1,2}

E(cid:8)tr4(AisuAjsv )(cid:9)
(cid:88)
E(cid:8)tr2(AisuAjsv )tr2(Aihk

C

+

w4(t)n8

j(cid:54)=j1

s1,s2,
h1,h2

u,v,

k,l∈{1,2}

)(cid:9) ≤ 5(cid:88)

k=1

(k)
1i ,
J

Aj1hl

54

where

(1)
1i =
J

C

w4(t)n8

(2)
1i =
J

C

w4(t)n8

(3)
1i =
J

C

w4(t)n8

(4)
1i =
J

C

w4(t)n8

(5)
1i =
J

C

w4(t)n8

j=1

i−1(cid:88)
i−1(cid:88)
i−1(cid:88)
i−1(cid:88)

j(cid:54)=j1

j(cid:54)=j1

j(cid:54)=j1

i−1(cid:88)

j(cid:54)=j1

s1,s2

u,v∈{1,2}

∗(cid:88)
∗(cid:88)
∗(cid:88)
∗(cid:88)

s1,s2,
h1,h2

(cid:88)
(cid:88)
(cid:88)
(cid:88)

u,v,

u,v,

s1,s2,
h1,h2

k,l∈{1,2}

k,l∈{1,2}

E(cid:8)tr2(ΣsuAjsv ΣsuAjsv )(cid:9),
E(cid:8)tr(Ajsv ΣsuAjsv Σsu)tr(Aj1hl
)(cid:9),
E(cid:8)tr2(Ajsv Csuhk
E(cid:8)tr(ΓT

suAjsv ΓsuΓT

suAjsv×

Aj1hl

suhk

CT

)(cid:9),

Aj1hl

Σhk

Σhk

s1,s2,
h1,h2

u,v,

k,l∈{1,2}

∗(cid:88)

(cid:88)

s1,s2,
h1,h2

u,v,

k,l∈{1,2}

)(cid:9),

Γhk

ΓT
hk

Aj1hl

Γhk

Aj1hl

ΓsuΓT
hk

E(cid:8)tr(Ajsv Csuhk

Aj1hl

CT

suhk

Ajsv Csuhk

Aj1hl

CT

suhk

)(cid:9).

Consider the ﬁrst term, J

(1)
1i , in the above inequality. By Lemma 5,

(cid:104)
var{tr(Ajsv ΣsuAjsv Σsu)} ≤ C
sv ΣsuΓsv )4} + tr{(ΓT

+ tr{(ΓT

tr4(ΣsuΣsv ) + 4tr2(ΣsuΣsv ΣsuΣsv )

sv ΣsuΓsv )⊗2}(cid:105)

,

sv ΣsuΓsv ΓT

(cid:104)

and E

tr(ΣsuAjsv ΣsuAjsv )
E{tr2(Ajsv ΣsuAjsv Σsu)} ≤ C

= tr2(ΣsuΣsv ) + tr(ΣsuΣsv ΣsuΣsv ). Thus,

tr4(ΣsuΣsv ) + tr2{(ΣsuΣsv )2} + tr{(Σsv Σsu)4}(cid:105)

(cid:105)

C

(cid:104)
≤ Ctr4(ΣsuΣsv ).
n(cid:88)
∗(cid:88)

i−1(cid:88)

(cid:88)

Therefore,

n(cid:88)

i=1

(1)

1i ≤

J

w4(t)n8

i=1

j=1

s1,s2

u,v∈{1,2}

tr4(ΣsuΣsv ) = o(σ4

nt).

(2.14)

For the second and third terms, J

tr2(Ajsv Csuhk

Aj1hl

CT

suhk

(3)
1i , consider the following inequality:

(2)
1i and J
) ≤ tr(Ajsv ΣsuAjsv Σsu)tr(Aj1hl

Σhk

Aj1hl

Σhk

).

55

Thus, we have

n(cid:88)

i=1

(J

(2)
1i + J

(3)

1i ) ≤

C

w4(t)n8

≤

C

w4(t)n8

(cid:104) ∗(cid:88)
(cid:110) ∗(cid:88)

s1,s2

n(cid:88)
n(cid:88)

i=1

i=1

i−1(cid:88)
i−1(cid:88)

j(cid:54)=j1

j(cid:54)=j1

(cid:88)
(cid:88)

u,v∈{1,2}

s1,s2

u,v∈{1,2}

{tr2(ΣsuΣsv ) + tr{(ΣsuΣsv )2}(cid:105)2

(cid:111)2

tr2(ΣsuΣsv )

= o(σ4

nt).

(2.15)

Next, we consider the fourth term J

(4)
1i . We ﬁrst note the following results
1 − I)CTD(Z2ZT

2 − I)DTF (Z2Z2 − I)F T}]

1 − I)ATC(Z1ZT

E[tr{A(Z1ZT
= tr(ATC)tr(DTF )tr(DF TACT) + tr(ATC)tr(F DTF DTACT)

+ tr(DTF )tr(CATCATDF T) + tr(F DTF DTCATCAT)

for matrices A, C, D and F with appropriate dimensions. Then we can obtain

)(cid:9)

suAjsv ΓsuΓT
hk

Aj1hl

ΓT
hk

Aj1hl

Γhk

Γhk
CT

)tr(Σsv ΣsuΣsv Csuhk

)tr(Csuhk

Σhl

suhk

Σsv ) + tr(Σhl

Σhk

CT

suhk

Σhl

) + tr(Σhl

Σhk
Σhl
) + tr3/2(Σsv Σsu)tr1/2{(Σhk

Σhl

Σhk

Σhl
CT
suhk
)4}

CT

suhk

Σsv Csuhk

)}

Σsv ΣsuΣsv Csuhk

)

)4}tr1/2{(ΣsuΣsv )4}.

Σhl

(cid:111)2

tr2(ΣsuΣsv )

= o(σ4

nt).

(2.16)

E(cid:8)tr(ΓT
suAjsv ΓsuΓT
= tr(Σsv Σsu){tr(Σhl
+ tr(Σhl
≤ tr3/2(Σsv Σsu)tr3/2(Σhl
+ tr3/2(Σhl

Σhk

Σhk

Σhk

It follows that

n(cid:88)

i=1

(4)

1i ≤

J

)tr1/2{(ΣsuΣsv )4} + tr1/2{(Σhk
(cid:88)

(cid:110) ∗(cid:88)

i−1(cid:88)

n(cid:88)

C

w4(t)n8

i=1

j(cid:54)=j1

s1,s2

u,v∈{1,2}

56

By Lemma 5,

E(cid:2)tr{(CT

suhk

)2}(cid:3) = E(cid:2)tr2(ΓT

sv Csuhk

Aj1hl

Ajsv Csuhk

Aj1hl

CT

suhk

Γsv )

Γsv )2}(cid:3)

Aj1hl

CT

suhk

Σsv Csuhk

Σsv Csuhk

Zj

Γhl
)}2

+ tr{(ΓT

sv Csuhk
CT

suhk
CT

suhk

= {ZT
j ΓT
hl
− tr(Σhl
+ tr(Csuhk
CT

= 3tr{(ΓT
hl

Aj1hl

suhk

CT
Σsv Csuhk
Γsv )⊗4} + tr2{(ΓT
hl

Aj1hl
CT

suhk

CT
Σsv )
suhk
Γsv )⊗2}.

suhk

k=1 J

(5)
1k = o(σ4

nt), and further with equations

(cid:111)

i − tr{(Σs1 − Σs2)(Σs1 + Σs2)}(cid:105)

,

ZT

This along with Condition 1 together implies(cid:80)n
(2.14) and (2.15) implies(cid:80)n
(cid:110) 2(cid:88)

Finally, we consider J2i. We write Qni as

k=1 J1k = o(σ4

∗(cid:88)

nt).

(cid:104)

1

ZT
i

(−1)u−1ΓT

su(Σs1 − Σs2)Γsu

s1,s2

u=1

w(t)

Qni =

where (cid:80)2

Proposition A.1 in Chen et al. (2010),

u=1(−1)u−1ΓT
su(Σs1 − Σs2)Γsu = (Γs1 + Γs2)T(Σs1 − Σs2)(Γs1 − Γs2). Using
t(cid:88)

tr2(cid:104){(Γs1 + Γs2)T(Σs1 − Σs2)(Γs1 − Γs2)}⊗2(cid:105)

T(cid:88)

J2i ≤

C

.

nt). Condition (b) is valid. This completes the proof of the

s1=1

s2=t+1

n4w2(t)

As a result, (cid:80)n

i=1 J2i = o(σ4
asymptotic normality of ˆDnt.

(cid:3)

joint multivariate normality of { ˆDnt}T−1

Proof of Theorem 3. Using the continuous mapping theorem, we only need to prove the
vector of length T − 1. By the Cram´er-Wold device, it suﬃces to show that (cid:80)T−1
t=1 . Let a = (a1, . . . , aT−1)T be any non-zero constant
is asymptotically normal under H0. By Lemma 6, the variance of (cid:80)T−1
t=1 at ˆDnt
var((cid:80)T−1
(cid:80)T−1
(cid:80)T−1
0T =
N (0, 1) in distribution. The asymptotic normality of(cid:80)T−1
q=1 ataqQn,tq. Then we wish to show σ−1
t=1 at ˆDnt →
t=1 at ˆDnt can be shown by using the

t=1 at ˆDnt) = (cid:80)T−1

t=1 at ˆDnt is σ2

t=1

0T

57

martingale central limit theory, which is very similar to the proof of Theorem 2. Therefore,

we omit the details.

(cid:3)

Proof of Theorem 4. Assume that the alternative H∗

E( ˆDnt) = tr(Σ2

−

1

h=t+1

1) +

T − t

k1(cid:88)
(cid:110) k1(cid:88)
t(cid:88)
t(T − t)
− 2t(k1 − t)
k1 − t
tr(cid:8)(Σ1 − ΣT )2(cid:9).
T − t
t(T − t)

(cid:110)
1 +
T − k1
T − t

h=t+1

s=1

2

(cid:111)

=

h=k1+1

tr(Σ2
h)

1 is true. First, for t ∈ {1, . . . k1− 1},
T(cid:88)
T(cid:88)
h=k1+1
T − k1
T − t

(cid:111)
T ) − 2t(T − k1)
t(T − t)

tr(Σ1ΣT )

tr(ΣsΣh)

tr(Σ2

tr(Σ2

h) +

1

T − t

tr(ΣsΣh) +

tr(Σ2

1) +

=

Similarly, if k1 ≤ t, then E( ˆDnt) = k1tr(cid:8)(Σ1 − ΣT )2(cid:9)/t.

Deﬁne B(C) = {t ∈ {1, . . . , T − 1} : |t − k1| ≥ Cβn/(n∆n)} for some constant C. To
establish the rate of convergence of the change point estimator ˆk1, we need to show, for any
 > 0, there exist a constant C such that pr{|ˆk1− k1| > Cβn/(n∆n)} < . This is equivalent
to show that pr{ˆk1 ∈ B(C)} < . Since the event {ˆk1 ∈ B(C)} ⊂ {maxt∈B(C)
},
ˆDnt > ˆDnk1
}. Thus, it suﬃces to show, for any  > 0, there
pr{ˆk1 ∈ B(C)} ≤ pr{maxt∈B(C)
exist a constant C such that

ˆDnt > ˆDk1

pr( ˆDnt − Dk1

> ˆDnk1

− Dk1

) < .

(2.17)

} ≤ (cid:88)

t∈B(C)

pr{ max
t∈B(C)

ˆDnt > ˆDk1

Under H∗

1 , we have

ˆDnt − Dk1

= ˆDnt − Dt + Dt − Dk1
= ˆDnt − Dt − |t − k1|G(t; k1)tr{(Σ1 − ΣT )2},

= ˆDnt − Dt + {r(t; k1) − 1}tr{(Σ1 − ΣT )2}

58

where G(t; k1) = {1/(T − t)}I(1 ≤ t ≤ k1) + (1/t)I(k1 + 1 ≤ t ≤ T − 1). Then, for t ∈ B(C),

pr( ˆDnt > ˆDnk1

) ≤ pr{| ˆDnt − Dt| > |t − k1|G(t; k1)∆n/2}

− Dk1

+ pr{| ˆDnk1
≤ pr{|σ−1
+ pr{|σ−1
nk1

nt ( ˆDnt − Dt)| > CβnG(t; k1)/(cid:112)(4V0t + 8nV1t)}

| > |t − k1|G(t; k1)∆n/2}
(cid:113)
)| > CβnG(t; k1)/

( ˆDnk1

− Dk1
(cid:112)(4V0t + 8nV1t). Furthermore, w(t) and G(t; k1)

+ 8nV1k1

(4V0k1

)}.

For any t and some constant C1, βn > C1
are bounded away from zero for t ∈ B(C). Thus, by Chebyshev’s inequality,

pr( ˆDnt > ˆDnk1

) ≤ pr{|σ−1

nt ( ˆDnt − Dt)| > C} + pr{|σ−1

nk1

( ˆDnk1

− Dk1

)| > C} ≤ 2

C2 <

for large enough C. Therefore, (2.17) is true. This ﬁnishes the proof of Theorem 4.

,


T
(cid:3)

Proof of Theorem 5. Let k0 = 0 and kq+1 = T . Denote the common covariances between
the change points kj and kj+1 as ˜Σj for j = 0, . . . , q. To show that maxt Dt is at one of

the change points, it is enough to show that maxt Dt cannot be attained at any time points

except change points k1, . . . , kq. Thus, we need to show that the maximum of Dt is not
attainable for t in the following sets: (1) t ∈ {1, . . . , k1 − 1}; (2) t ∈ {kq + 1, . . . , T − 1}; and
(3) t ∈ {kl + 1, . . . , kl+1 − 1} for some l ∈ {1, . . . , q − 1}. We do not need to consider case
(1) if k1 = 1 or case (3) if kq = T − 1. Without loss of generality, we assume k1 > 1 and
kq < T − 1 in the following proof.

First, if t ∈ {1, . . . , k1 − 1}, then using the deﬁnition of Dt, we have
T(cid:88)

k1(cid:88)

t(cid:88)

1

1

(cid:107)Σs1 − Σs2(cid:107)2

F +

t(T − t)

Dt =

t(T − t)

s1=1

s2=k1+1

(cid:107)Σs1 − Σs2(cid:107)2

F

t(cid:88)
T(cid:88)

s1=1

s2=t+1
(cid:107) ˜Σ0 − Σs2(cid:107)2

F

=

1

T − t

s2=k1+1

which is an increasing function of t in this scenario. Therefore, the maximum value of Dt
will not be at any t ∈ {1, . . . , k1 − 1}.

59

Second, if t ∈ {kq + 1, . . . , T − 1}, then

kq(cid:88)

T(cid:88)

s1=1
s2=t+1
(cid:107) ˜Σq − Σs1(cid:107)2

F

Dt =

1

t(T − t)

kq(cid:88)

s1=1

=

1
t

(cid:107)Σs1 − Σs2(cid:107)2

F +

1

t(T − t)

t(cid:88)

T(cid:88)

s1=kq+1

s2=t+1

(cid:107)Σs1 − Σs2(cid:107)2

F

which is a decreasing function of t. Therefore, the maximum value of Dt will not be at any
t ∈ {kq + 1, . . . , T − 1}.

At last, let us consider the third case with t ∈ {kl + 1, . . . , kl+1 − 1} for some l ∈

{1, . . . , q − 1}. We rewrite Dt as

Dt =

1

t(T − t)

+ (t − kl)

j=l+1

(ki+1 − ki)(kj+1 − kj)(cid:107) ˜Σi − ˜Σj(cid:107)2
l−1(cid:88)
(ki+1 − ki)(cid:107) ˜Σi − ˜Σl(cid:107)2

F + (kl+1 − t)

(kj+1 − kj)(cid:107) ˜Σl − ˜Σj(cid:107)2

F

F

(cid:111)

.

j=l+1

i=0

Since (cid:107) ˜Σi − ˜Σj(cid:107)2
Dt as

F = (cid:107) ˜Σi − ˜Σl(cid:107)2

F + (cid:107) ˜Σl − ˜Σj(cid:107)2

F + 2tr{( ˜Σi − ˜Σl)( ˜Σl − ˜Σj)}, we further write
(cid:8)2∆ + tA + (T − t)B(cid:9),

q(cid:88)

(cid:110) l−1(cid:88)
q(cid:88)

i=0

Dt =

1

t(T − t)

l−1(cid:88)

q(cid:88)

where

i=0

∆ =

j=l+1

(ki+1 − ki)(kj+1 − kj)tr{( ˜Σi − ˜Σl)( ˜Σl − ˜Σj)},

A =(cid:80)q

i=0(ki+1 − ki)(cid:107) ˜Σi − ˜Σl(cid:107)2
the fact that 1/{t(T − t)} = (1/T ){1/t + 1/(T − t)} to further write Dt as

F and B =(cid:80)l−1
j=l+1(kj+1 − kj)(cid:107) ˜Σl − ˜Σj(cid:107)2
(cid:16)

(cid:16)

(cid:17)

(cid:17)

Dt =

A +

1
t

2∆
T

+

1

T − t

B +

2∆
T

.

F . Then we can use

We will consider four cases, (a)-(d), according to the signs of A + 2∆/T and B + 2∆/T .

(a) If A + 2∆/T ≥ 0 and B + 2∆/T ≤ 0, then Dt is a decreasing function of t. In this

case, the maximum of Dt will not be at any t for t ∈ {kl + 1, . . . , kl+1 − 1}.

60

(b) If A + 2∆/T ≤ 0 and B + 2∆/T ≥ 0, then Dt is an increasing function of t. In this

case, the maximum of Dt will not be at any t for t ∈ {kl + 1, . . . , kl+1 − 1}.

(c) If A + 2∆/T > 0 and B + 2∆/T > 0, then the derivative of Dt with respect to t is

D(cid:48)
t =

1

t2(T − t)2{(B − A)t2 + 2(A +
t is always positive for t ∈ {kl + 1, . . . , kl+1 − 1}. Thus, to determine

)T t − (A +

2∆
T

)T 2}.

2∆
T

t, we only need to know the sign of the numerator of D(cid:48)
t.

The denominator of D(cid:48)
the sign of D(cid:48)

The numerator of D(cid:48)

t is a quadratic form of t. To know the sign of the numerator of

D(cid:48)
t, we consider two cases: (i) B > A and (ii) B < A. In the case (i) with B > A, one
of the solution of t2(T − t)2D(cid:48)
t0 ∈ (kl, kl+1), then D(cid:48)
that the function Dt decreases for kl < t < t0 and increases for t0 < t < kl+1. Therefore,

t is negative for kl < t < t0 and positive for t0 < t < kl+1. This implies

t = 0 is less than 0, another solution t0 is greater than 0. If

Dt attains its minimum at t0 and the maximum of Dt will not be attained within (kl, kl+1).
If t0 (cid:54)∈ (kl, kl+1), then D(cid:48)
t is either always negative or always positive for t ∈ (kl, kl+1). In
this case, Dt is a monotonic function of t and hence the maximum of Dt will not be attained

within (kl, kl+1).

In the case (ii) with B < A, it can be shown that t2(T − t)2D(cid:48)

t1, t2 = T(cid:2)(A + 2∆/T )/(A − B) ±(cid:112){(A + 2∆/T )/(A − B) − 1/2}2 − 1/4(cid:3). Here, t1, t2

t = 0 has two solutions,

corresponds to the positive and negative sign, respectively. Because B + 2∆/T > 0, (A +
2∆/T )/(A − B) > 1. It follows that t2 > T . Similar to the case of B > A, if t1 ∈ (kl, kl+1),
the function Dt decreases for kl < t < t1 and increases for t1 < t < kl+1. Therefore, Dt

attains its minimum at t0 and the maximum of Dt will not be attained within (kl, kl+1).
If t1 (cid:54)∈ (kl, kl+1), Dt is a monotone function of t and hence the maximum of Dt will not
be attained within (kl, kl+1). In summary, the maximum of Dt will not be attained within

(kl, kl+1) if A + 2∆/T > 0 and B + 2∆/T > 0.

(d) If A + 2∆/T < 0 and B + 2∆/T < 0, then 2∆/T < 0 because A > 0 and B > 0.

61

Thus, A − 2|∆|/T < 0. Using the Cauchy-Schwarz inequality, we have

A < 2|∆|/T ≤ l−1(cid:88)

q(cid:88)

(ki+1 − ki)(kj+1 − kj)((cid:107) ˜Σi − ˜Σl(cid:107)2

F + (cid:107) ˜Σl − ˜Σj(cid:107)2

F )/T

i=0

j=l+1

= {(T − kl+1)A + klB}/T.

The above inequality implies that A/B < kl/kl+1 < 1. On the other hand, B < 2|∆|/T ≤
{(T − kl+1)A + klB}/T, which implies that A/B > (T − kl)/(T − kl+1) > 1. This is a
contradiction. Therefore, case (d) is not possible.

By the results of (a)-(d), the maximum of Dt will not attain within {kl + 1, . . . , kl+1 − 1}

for case (3). Thus, the proof is completed.

(cid:3)

Proof of Theorem 6. At the beginning of the binary segmentation algorithm, we have
Mn[1, T ] > Wαn[1, T ] with probability one because, for any t ∈ {1, . . . , T − 1},

pr(Mn[1, T ] > Wαn[1, T ]) ≥ pr(σ−1
= pr(cid:8)σ−1
= 1 − Φ(cid:8)σ−1

nt [1, T ]( ˆDnt[1, T ] − Dt[1, T ]) > σ−1

nt [1, T ](σnt,0[1, T ]Wαn[1, T ] − Dt[1, T ])(cid:9) → 1,

nt,0[1, T ] ˆDnt[1, T ] > Wαn[1, T ])

nt [1, T ](σnt,0[1, T ]Wαn[1, T ] − Dt[1, T ])(cid:9)

where we used the condition Wαn = o(mSN R) in Theorem 6. Therefore, using Theorems
4 and 5, one change point in {k1, . . . , kq} will be detected and estimated with probability
1 because βn[1, T ] = o(nDks[1, T ]) for some s ∈ {1, . . . , q}. Each subsequence satisﬁes the
condition Wαn = o(mSN R) in Theorem 6 and hence the detection continues.

Suppose we have detected less than q change points. By the assumptions in this theorem,
there exists a segment, {l1 + 1, . . . , l2}, that contains a change point, ks, such that Wαn =
o(mSN R) and βn[(l1 + 1), l2] = o{nDks[(l1 + 1), l2]} hold. Therefore, by similar arguments

as above, a change point will be detected and estimated consistently in the segment. Thus,
ˆq ≥ q. Once ˆq reaches q, all subsequent segments have end points at the change points
and two boundary points 1, k1, . . . , kq, T . Then, by Theorem 3, Mn[l1, l2] < Wαn with
probability one as αn → 0. This implies that no additional change point will be detected.
The proof is completed.

(cid:3)

62

CHAPTER 3

COVARIANCE CHANGE POINT DETECTION AND IDENTIFICATION

WITH HIGH-DIMENSIONAL FUNCTIONAL DATA

3.1 Introduction

Access to high-dimensional data has exploded in recent years due to technological im-

provements and cost reductions. High throughput technology has facilitated the collection

of genomics data, with more variables being measured than ever before. In addition, the

reductions in cost have allowed measurements to be taken over time, as is the case in time-

course microarray studies. Similarly, functional neuroimaging studies repeatedly measure a

massive number of variables throughout the duration of a medical experiment. Time-course

microarray data and functional neuroimaging data are just two examples of applications

that beget high-dimensional longitudinal, or functional, data, where a large number of vari-

ables are repeatedly measured on a small number of experimental units. Throughout this

chapter we focus on high-dimensional dense functional data, where the number of repeated

measurements is large (Ramsay 1982).

Functional magnetic resonance image (fMRI) data is an important example of high-

dimensional functional data. In a task-based fMRI study, individuals perform various tasks

while the fMRI machine records blood-oxygen-level dependent (BOLD) signals throughout

their brain. These tasks may be passive or active. For example, subjects may be shown a

movie, a sequence of pictures, or asked to respond to questions. In contrast, a resting-state

fMRI does not involve any subject engagement, but aims to investigate the brain’s functional

organization through the BOLD signal measurements. In the course of an fMRI study, the

human brain is partitioned into small uniform cubes, also known as voxels, that are about

the size of 1–3mm3. For each voxel, a BOLD measurement is recorded at each time point.

A cluster of voxels is known as a node or region of interest, where clusters can be deﬁned for

63

anatomical region of interest analysis or spherical region of interest analysis. BOLD signal

measurements are repeatedly recorded for each of about 100,000 brain voxels between 100

to 2000 times for a single subject. The number of repeated measurements typically depends

on the fMRI scanner and duration of the task-based or resting-state experiment. To enable

population inference, multiple subjects are included in an fMRI study. Rather than analyze

all 100,000 voxels, doctors may be interested in speciﬁc anatomical regions of the brain.

However, a region of interest will still have voxel BOLD signal measurements at the order

of 100. In addition to the sheer size of the data, fMRI data exhibit complex spatiotemporal

dependence. For a given subject, BOLD measurements in neighboring voxels are correlated,

as are BOLD measurements for a given voxel but across time points. The high-dimensional

and dependent structure make statistical modeling, testing, and analysis a challenge.

One major interest in neuroscience is to understand functional connectivity or dynamic

functional connectivity at an individual or group level across time points (Kundu et al.

2018). We refer to dynamic functional connectivity as the changing relationships between

spatially separated brain regions across experimental time points. In particular, we are inter-

ested in studying dynamic functional connectivity across individuals. Traditional functional

connectivity assumes stationary relationships between nodes throughout the experiment. To

characterize the functional connectivity at a given time point, the covariance matrix, or

precision matrix, of BOLD signals serve as a proxy for the within and between brain node

neural activity. As a result, dynamic functional connectivity of the brain can be explored

via a procedure that assesses covariance matrix stationarity.

The purpose of this chapter is to develop a robust statistical procedure to detect and iden-

tify change points among covariance matrices in high-dimensional functional data. Assume

Yit = (Yit1, . . . , Yitp)T is a p-dimensional random vector with mean vector µt and covariance

matrix Σt.

In the context of an fMRI study, Yit (i = 1, . . . , n; t = 1, . . . , T ) represents

the p BOLD signal measurements for the ith individual at the tth time point, where p, T ,

and n are typically at the order of 100,000, 100, and 10, respectively. For a speciﬁc region

64

of interest in the brain or for region of interest network analysis, p may be at the order of

100. Our proposed procedure aims to answer two questions. First, does a temporal change

exist among covariance matrices? This corresponds to a covariance change point detection

problem that can be posed in the form of a statistical hypothesis test

H0 : Σ1 = ··· = ΣT
H1 : Σ1 = ··· = Στ1 (cid:54)= Στ1+1 = ··· = Στq (cid:54)= Στq+1 = ··· = ΣT ,

versus

(3.1)

where τk < T (k = 1, . . . , q < ∞) are the unknown change point locations. Second, if a
temporal change does exist, can we determine its location and the locations of all possible

changes? This suggests a change point identiﬁcation problem that aims to estimate the

unknown locations of τks. Although we consider a high-dimensional setting, we do not

require a sparsity assumption for Σt, and we allow the complex spatiotemporal dependence

present in high-dimensional functional data. In the context of fMRI studies, our proposed

procedure will ﬁrst determine if functional connectivity is stationary.

If not, our change

point identiﬁcation procedure will partition the functional data into stationary sequences

with regards to the covariance matrices.

Testing covariance matrices is a classical problem in multivariate statistical analysis.

Muirhead (2005) and Anderson (2003) detailed multivariate tests for covariance matrices,

including testing the homogeneity of several covariance matrices. However, these tests rely

on likelihood ratios, and they require the sample size to exceed the number of variables

measured. Recent work done by Schott (2007), Srivastava and Yanagihara (2010), and Li

and Chen (2012) addressed the lack of an appropriate testing procedure for covariance ma-

trices in a high-dimensional setting. More recently, Ahmad (2017) and Zhang et al. (2018)

generalized aspects of the aforementioned works to an independent multi-sample test for

high-dimensional covariance matrices. All of the research in testing high-dimensional covari-

ance matrices since Schott’s 2007 pioneering procedure have addressed the high-dimensional

challenges. However, none have focused on how to incorporate temporal dependence in a

65

high-dimensional setting. Therefore, none of the previously mentioned methods are applica-

ble to high-dimensional functional data.

Researchers in neuroscience have developed a few methods to study the dynamic func-

tional brain connectivity for single patients and populations. However, in general, their

methods are ad hoc and lack the theoretical rigor to ensure a robust inference procedure.

Some neuroscience approaches were detailed in Chapter 2. Most of the existing work stud-

ies dynamic functional connectivity for an individual. For example, Monti et al. (2014)

developed a sliding window approach based on pair-wise correlations to study the dynamic

functional connectivity. Their approach was based oﬀ a single subject and is not directly

applicable to study the common dynamic functional connectivity for a population. Kundu

et al. (2018) developed a procedure to test (3.1) with the aim of studying group level brain

dynamic functional connectivity in a task-based fMRI experiment. To detect and identify

change points, Kundu et al. (2018) ﬁrst compute all pair-wise correlations between p nodes
at each time point. Thus, at each time point they obtain p(p − 1)/2 sample pair-wise corre-

lations that they stack as a vector. Next, they apply a generalized fused Lasso (Tibshirani

et al. 2005) approach to the multivariate time series of sample correlations. The fused

Lasso was developed for an ordered set of covariates, and as is the case with Lasso, it also

involves a penalty parameter. To tune the penalty parameter they use a lowess ﬁt, which

also depends on a smoothing parameter. Based on the fused Lasso, the number of change

points is a function of the penalty parameter’s value. A small value leads to more identiﬁed

change points, whereas a large value leads to a fewer number of identiﬁed change points.

In order to accurately identify all change points, they ﬁrst ﬁt the model where the tuning

parameter’s value is small, and subsequently apply screening criteria to remove any false

positive change points. In their approach they did not derive any theoretical results with

regards to change point identiﬁcation consistency. Nor did they investigate the size or power

of their proposed change point detection procedure. Furthermore, their method is heavily

dependent on the choice of parameters. Our proposed procedure is free of tuning parameters

66

and is theoretically rigorous.

While no methods in the existing literature are applicable to test (3.1) for high-dimensional

functional data, it is also the case that the methods developed in Chapter 2 are not appli-

cable for a few reasons. First, in Chapter 2 it was assumed that the number of repeated

measurements is small. Numerical studies considered the ﬁnite sample performance when

T = 5 and 8. A real data application was conducted where T = 6. Second, the asymptotic

distribution of the test statistic and rate of convergence for the change point estimator were

derived under an asymptotic setting in which p and n diverge but with T ﬁnite. For a

large number of repeated measurements, as is the case with dense functional data, it will be

more appropriate to consider an asymptotic setting in which p, n, and T diverge. Numerical

simulation and real data applications should be based on theoretical results derived under

this new asymptotic setting and not that considered in Chapter 2. Third, the computation

complexity of the proposed procedure in Chapter 2 was not a concern for small values of n

and T . The overall computation complexity of the change point detection procedure detailed

in Chapter 2 is O(pn4T 6). To directly apply the procedure from Chapter 2 would be com-

putationally impractical, if not impossible. Thus, in this chapter we aim to address these

theoretical and computational challenges so our procedure is applicable to high-dimensional

functional data.

In addition to testing the hypotheses of (3.1), we also develop a method to estimate

unknown change points. In Chapter 2, the rate of convergence was established under an

asymptotic setting where p and n diverge but with T ﬁnite. In this chapter we investigate

the rate of convergence of the change point estimator where p, n, and T all diverge. Much of

the research in change point identiﬁcation considers the scenario with n = 1. For instance,

Aue et al. (2009) considered a p-dimensional multivariate time series where T diverges but

under the assumption that p < T . Wang et al. (2017) considered covariance matrix change

point identiﬁcation for T independent p-dimensional sub-Gaussian random vectors. They

also require p < T . Dette et al.

(2018) proposed a two-stage covariance change point

67

identiﬁcation procedure based on T independent sub-Gaussian random vectors. Their ﬁrst

step involves dimension reduction governed by a regularization parameter.

In step two,

they use a CUSUM-type statistic to estimate the locations of change points. Despite these

recent advances, none of the aforementioned methods are applicable to identify change points

among covariance matrices in high-dimensional functional data.

This chapter provides both theoretical and computational contributions to the ﬁeld of

statistics. From a theoretical perspective, a new asymptotic setting is considered, a setting

suitable for high-dimensional functional data, in which n, p, and T diverge. For T diverg-

ing, the test statistic forms a stochastic process. The convergence of the ﬁnite-dimensional

distributions is not suﬃcient for weak convergence of a stochastic process. Thus, we extend

the ﬁnite-dimensional result to establish weak convergence of our proposed test statistic.

Furthermore, the rate of convergence with respect to the change point estimator is now im-

pacted by n, p, and T , as opposed to just n and p in Chapter 2. Our investigation reveals

that the rate of convergence depends on the data dimension, sample size, number of repeated

measurements, and signal-to-noise ratio. The change point identiﬁcation estimator is shown

to be consistent, provided the signal strength exceeds the noise. To our knowledge, the

asymptotic framework in which n, p, and T all diverge has not previously been investigated

with regards to change point identiﬁcation among high-dimensional covariance matrices.

From a computation perspective, we improve the eﬃciency of methods developed in Chap-

ter 2. This chapter considers T to be dense, so much of our attention is focused towards

computation eﬃciency for those statistics that have high orders of T . We introduce two

recursive relationships and computation eﬃcient formulae to reduce the computation com-

plexity from O(pn4T 6) to O(pn2T 4). A quantile approximation technique is shown to further

decrease the complexity to the order of pn2T 3. The approximation accuracy is demonstrated

through simulation. These improvements are included in an R package, tecoma, which also

aﬀords an option for parallel computing.

In the absence of these modiﬁcations, it would

be impossible to apply our methods to fMRI data, or any high-dimensional data set with a

68

large number of repeated measurements.

The remaining sections of this chapter are organized as follows. Section 3.2 details the

statistical model and our basic setting. Section 3.3 introduces the measure from Chapter

2 along with the unbiased estimator that is a linear combination of U-type statistics. The

test statistic’s asymptotic distribution is derived under the asymptotic framework in which

n, p, and T diverge. Computation consideration with regards to the statistics is provided in

Section 3.4. Section 3.5 introduces an estimator to identify the locations of change points

should we reject H0 of (3.1). The estimator’s rate of convergence is studied, and two pro-

cedures are detailed to estimate the locations of multiple change points. Sections 3.6 and

3.7 demonstrate the ﬁnite sample performance via simulation and investigate the brain’s

functional connectivity through a task-based fMRI data set, respectively. All proofs and

technical details are provided in Section 3.8.

3.2 Model

Suppose we have n independent individuals that have p variables recorded at each of

T identical time points. Let Yit = (Yit1, . . . , Yitp)T be an observed p-dimensional random

vector, where Yit (i = 1, . . . , n; t = 1, . . . , T ) is independently and identically distributed for

all n individuals. Assume Yit follows a general factor model, where

Yit = µt + ΓtZi,

(3.2)

and µt is a p-dimensional unknown mean vector, Γt is an unknown p × m matrix such that
m ≥ pT , and Zi’s are independent m-dimensional multivariate standard normal random
vectors. Since var(Zi) = Im, it follows that for the ith individual, cov(ΓsZi, ΓtZi) = ΓsΓT
t .
We deﬁne ΓsΓT

t as Cst for diﬀerent time points, s and t, and deﬁne ΓtΓT

t as Σt. Thus, for

the ith individual,

cov(Yis, Yit) =

 Cst,

Σt,

69

s (cid:54)= t,

s = t,

for all s, t ∈ {1, . . . , T}. For individuals i (cid:54)= j, cov(Yis, Yjt) = 0. By deﬁnition, Cst and Σt
are p × p matrices for all s, t ∈ {1, . . . , T}. No speciﬁc structure is required on covariance

matrices Cst and Σt. Their generality allows us to capture the spatiotemporal dependence

in and among the random vectors Yit (i = 1, . . . , n; t = 1, . . . , T ). In the context of fMRI

data, spatial dependence is present among neighboring voxels or nodes and is captured in

both Cst and Σt. Temporal dependence exists for the same voxel or node across time points

and is captured in matrix Cst.

3.3 Change point detection

We consider the measure, Dt (t = 1, . . . , T − 1), deﬁned in Chapter 2, where

t(cid:88)

T(cid:88)

s1=1

s2=t+1

Dt =

1

t(T − t)

tr{(Σs1 − Σs2)2}.

(3.3)

To simplify notation, let t(T − t) be deﬁned as w(t). The choice of Dt is motivated by the
fact that we can distinguish between H0 and H1 based on the maximum value of Dt for all
t ∈ {1, . . . , T − 1}. Let T = {1, . . . , T − 1}. Under H0 of (3.1), maxt∈T Dt = 0, and under
H1, maxt∈T Dt > 0.

Our test statistic is constructed in the same manner as detailed in Chapter 2. We

Dt. Quantity Dt can be expressed as Dt = w−1(t)(cid:80)t

use a linear combination of U-type statistic estimators to create an unbiased estimator of
) −
tr(Σs1Σs2) − tr(Σs2Σs1)}. An unbiased estimator for tr(Σs1Σs2) is given by Us1s2, where

(cid:80)T
s2=t+1{tr(Σ2

) + tr(Σ2
s2

s1

s1=1

Us1s2 = Us1s2,0 − Us1s2,1 − Us2s1,1 + Us1s2,2,

(3.4)

70

and

P 2
nUs1s2,0 =

P 3
nUs1s2,1 =

P 3
nUs2s1,1 =

P 4
nUs1s2,2 =

i,j

∼(cid:88)
∼(cid:88)
∼(cid:88)
∼(cid:88)

i,j,k

i,j,k

i,j,k,l

(Y T
is1

)2,

Yjs2

Y T
is1

Y T
is2

Yjs2

Y T
js2

Yks1

,

Yjs1

Y T
js1

Yks2

,

Y T
is1

Yjs2

Y T
ks1

Yls2

.

In the above expressions, quantity P k

sents the summation over mutually diﬀerent indices. Thus,(cid:80)∼

n = n!/(n − k)!, and the ∼ summation notation repre-
i,j,k is deﬁned as the summa-
tion over i, j, and k, such that i (cid:54)= j, j (cid:54)= k, and k (cid:54)= i. Therefore, an unbiased estimator of

Dt is

ˆDnt =

1

w(t)

=

1

w(t)

t(cid:88)
t(cid:88)

s1=1

T(cid:88)
T(cid:88)

s2=t+1

(Us1s1 + Us2s2 − Us1s2 − Us2s1)
2(cid:88)

(−1)|a−b|Usasb.

s1=1

s2=t+1

a,b=1

(3.5)

In this chapter we consider a diﬀerent asymptotic framework than that of Chapter 2.
Chapter 2 considered p(n) → ∞ as n → ∞, where p is a function of n. We now consider
p(n) → ∞ and T (n) → ∞ as n → ∞, where p and T are both functions of n. No speciﬁc

functional form is required, and we do not require any speciﬁc relationships between p, T ,

and n. Thus, we allow for p > n and p > T . To establish the limiting distribution of ˆDnt, we

assume Conditions 1 – 2 introduced in Section 2.3 along with the following two conditions.

s1=1

s2=t+1

h1=1

s1,s2,
h1,h2

(cid:80)t

(cid:80)T

(cid:80)T

given by (3.6).

is deﬁned as (cid:80)t

The notation (cid:80)∗
Condition 4. (cid:80)∗
Condition 5. There exists a function ψ(k) such that ψ(k) > 0 and (cid:80)∞
any s1, s2 ∈ {1, . . . , T}, tr2(Cs1s2Cs2s1) (cid:16) ψ(|s1 − s2|)tr2(Σs1Σs2).

0t), for any u, k, v, l ∈ {1, 2}.

tr4(Csuhk

) = o(V 2

s1,s2,
h1,h2

CT

svhl

h2=t+1, and quantity V0t is

k=1 ψ(k) < ∞. For

71

In Condition 5, (cid:16) means of the same order. Thus, f (s) (cid:16) g(s) implies there exists a con-
stant c1 such that |f (s)| ≤ c1|g(s)|, and there exists a constant c2 such that |g(s)| ≤ c2|f (s)|
for all s in the real numbers. Condition 5 imposes mild requirements on the spatiotem-

poral structure. Condition 4 is also a mild condition.

If no temporal dependence exists,

then V0t =(cid:80)t
tion 4 is(cid:80)t

(cid:80)T
(cid:80)T

s1=1

s2=t+1

s1=1

s2=t+1

(cid:80)
(cid:80)

u,v∈{1,2} tr2(ΣsuΣsv ). Similarly, the left-hand side of Condi-
u,v∈{1,2} tr4(ΣsuΣsv ). Furthermore, if all eigenvalues of Σt are
0t (cid:16) {t(T − t)p2}2. In comparison, the left-hand side

bounded for all t ∈ {1, . . . , T}, then V 2
of Condition 4 is of the order t(T − t)p4. As a result, Condition 4 holds.

In Chapter 2, we derived the leading order variance of ˆDnt, that is var( ˆDnt) = σ2

nt{1 +

(3.6)

(3.7)

o(1)}, where σ2

nt = w−2(t)(4V0t/n2 + 8V1t/n), and
∗(cid:88)
∗(cid:88)

(cid:88)
(cid:88)

(−1)|u−v|+|k−l|tr2(Csuhk

s1,s2,
h1,h2

u,v,

k,l∈{1,2}

V0t =

V1t =

CT

svhl

),

(−1)|u−k|tr{(Σs1 − Σs2)Csuhk

(Σh1

− Σh2

)CT

suhk

}.

u,k∈{1,2}

s1,s2,
h1,h2

The below theorem establishes the asymptotic distribution of ˆDnt under the asymptotic

setting considered in this chapter.

Theorem 7. Under Conditions 1 – 2, and 4, as n → ∞,

(cid:16) ˆDnt − Dt

(cid:17) d→ N (0, 1),

σ−1

nt

where σ2

nt = w−2(t)(4V0t/n2 + 8V1t/n) and V0t and V1t are given in (3.6) and (3.7), respec-

tively.

Under the null hypothesis, it follows that σ−1
ˆDnt → N (0, 1) in distribution, where
nt,0 = w−2(t)(4V0t/n2) and only Conditions 1 and 4 are required. To formulate an appro-
σ2
priate test procedure free of tuning parameters, consider the test statistic, Mn, of Chapter
2, where

nt,0

Mn = max

t∈T ˆσ−1

nt,0

ˆDnt,

(3.8)

72

and ˆσnt,0 is a plug-in estimator for σnt,0. Methods to construct ˆσnt,0 were detailed in Chapter
2. The following theorem establishes the asymptotic distribution of Mn under the setting
where n, p, and T all diverge.
Theorem 8. Under Conditions 1, 4, and 5, H0 of (3.1), and as n → ∞, Mn
where Zt is a Gaussian process with mean 0 and covariance Rz.

d→ maxt∈T Zt,

We assume that as n → ∞, Rn,z converges to Rz, where Rn,z is a correlation matrix with
(t, q) component deﬁned as Rn,tq = corr( ˆDnt, ˆDnq). The leading order of the cov( ˆDnt, ˆDnq)
is w−1(t)w−1(q)(4V0,tq/n2), where
T(cid:88)
q(cid:88)

(cid:88)

t(cid:88)

T(cid:88)

(−1)|u−v|+|k−l|tr2(Csuhk

V0,tq =

CT

svhl

).

(3.9)

s1=1

s2=t+1

h1=1

h2=q+1

u,v,

k,l∈{1,2}

In order to perform an α-level hypothesis test for (3.1), we must approximate Rn,z and thus

require an estimator for V0,tq. In Chapter 2, an unbiased estimator for tr(Csuhk
given as a linear combination of U-type statistics. Let ˆRn,tq be an estimator for the (t, q)

) was

svhl

CT

component of Rn,z.

Let W = maxt∈T Zt, where Zt is a Gaussian process with mean 0 and covariance Rz,
and deﬁne Wα as the quantity such that pr(W > Wα) = α. By Theorem 8, Mn→W in
distribution, and an α-level test rejects the null hypothesis in (3.1) if Mn > Wα. However,
there is no simple and computation eﬃcient approach to obtain Wα. The random variable

W depends on Rz. Chapter 2 proposed a procedure to approximate quantile Wα on the basis
of computing ˆRn,tq for each t, q ∈ {1, . . . , T − 1}. However, the computation complexity of
this approach, in terms of T , is at the order of T 4 for each component. Therefore, total

complexity is at the order of T 6 to compute ˆRn,z. As a result, it is not feasible to compute

all components when T is large.

As an attempt to alleviate this burden, we can further approximate the distribution of

W by a Gumbel distribution. Under additional assumptions and if T diverges, then

pr(Mn ≤(cid:112)2 log(T ) − log{log(T )} + x) → exp{−(2

√

π)−1 exp(−x/2)}.

73

Accordingly, an α-level quantile is deﬁned as (cid:112)2 log(T ) − log{log(T )} + xα, where xα =

π log(1 − α)}. However, the rate of convergence is at the order of log(T ), which

√
−2 log{−2

is slow. In addition, our simulation experiments demonstrated that the size of the test was

not well controlled at the nominal level. Moreover, using an extreme value-type distribution
does not eliminate the need to compute ˆσnt,0 for all t ∈ T . That overall cost in terms of T
is at the order T 5. Hence, we carefully consider an approximation procedure in Section 3.4

that improves eﬃciency and maintains accuracy.

3.4 Computation of the proposed statistics

The computation complexity for the change point detection procedure is at the order

of pn4T 6. To reduce the complexity, we re-formulate some of the statistics introduced in

Section 3.3 in a computation optimal manner.

The computation complexity of Us1s2 is at the order of n4 due to term Us1s2,2.

In
addition, term Us1s2,1 has computation complexity at the order of n3. To save computation
cost, we can rewrite Us1s2,1 and Us1s2,2 deﬁned in (2.4) in a computation eﬃcient form as
follows. First, we consider Us1s2,1, which can be rewritten as

n(cid:88)

(cid:16) n(cid:88)

j=1

i=1

(cid:17)2 − n(cid:88)

i,j=1

n(cid:88)

k(cid:54)=j=1

P 3
nUs1s2,1 =

Y T
is1

Yjs2

(Y T
is1

)2 − 2

Yjs2

Y T
js1

Yjs2

Y T
js2

Yks1

.

(3.10)

Therefore, the computation complexity of Us1s2,1 regarding the sample subjects is at the
order of n2, not n3. To write Us1s2,2 in a computation eﬃcient form, we ﬁrst deﬁne Vs1s2,1 =
(1/P 3
n)

. Similar to Us1s2,1, we can write Vs1s2,1 as

∼(cid:80)

Yjs2

i,j,k Y T
is1

P 3
nVs1s2,1 =

Y T
is1

Yjs2

Y T
is1

Yjs2

Y T
js1

Yis2

Y T
js1

Yks2

(cid:16) n(cid:88)
n(cid:88)
− n(cid:88)

j=1

i=1

k(cid:54)=j=1

(cid:17)(cid:16) n(cid:88)

i=1

(cid:17) − n(cid:88)

i,j=1

Y T
is2

Yjs1

− n(cid:88)

i(cid:54)=j=1

Y T
js1

Yjs2

Y T
js1

Yks2

Y T
is1

Yjs2

Y T
js1

Yjs2

.

The computation complexity of Vs1s2,1 regarding the sample subjects is also at the order of

74

n2. Finally, we can write Us1s2,2 as

(cid:17)2 − P 3

P 4
nUs1s2,2 =

Y T
is1

Yjs2

n(Us1s2,1 + Us2s1,1 + 2Vs1s2,1) − P 2

nUs1s2,0

(cid:16) n(cid:88)
− n(cid:88)

i(cid:54)=j=1

i(cid:54)=j=1

(Y T
is1

Yjs2

)(Y T
is2

Yjs1

).

(3.11)

Based on the above expression for P 4
plexity of Us1s2,2 regarding the sample subjects is at the order of n2.
computation cost of the proposed statistic Us1s2 with regard to sample subjects is at the
order of n2. These computation eﬃcient expressions can be derived in a similar manner for

nUs1s2,2, we can also see that the computation com-
In summary, the

Ususv,hkhl

, the term used as a plug-in estimator to primarily compute ˆRn,tq.

The computation complexity of ˆDnt in (3.5) in terms of T is at the order T 3. To reduce
the complexity in terms of T , we write ˆDnt recursively. Let f (s1, s2) = (Us1s1 + Us2s2 −
Us1s2 − Us2s1) such that s1, s2 ∈ {1, . . . , T}. By deﬁnition, it follows that for t ≥ 2,

ˆDnt =

w(t − 1)
w(t)

ˆDn(t−1) − w−1(t)

f (k, t) + w−1(t)

f (t, k).

(3.12)

k=1

k=t+1

t−1(cid:88)

T(cid:88)

When t = 1, the computation complexity of ˆDn1 is at the order of T . Therefore by (3.12),
for each t ∈ {1, . . . , T − 1} the computation complexity in terms of T is at the order T .
Since we compute ˆDnt for all t ∈ {1, . . . , T − 1}, the total computation complexity in terms
of T is at the order of T 2 rather than T 3. As a result, the overall computation complexity
to compute ˆDnt for all t ∈ {1, . . . , T − 1} is at the order of pn2T 2. Parallel computing can
further decrease the computation time.

The greatest cost in terms of computation is due to ˆRn,tq for all t, q ∈ T , where the
complexity is at the order of pn2T 6 provided (3.10) and (3.11) are applied. To reduce the

75

complexity, we express ˆRn,tq recursively. Let

(cid:88)
t(cid:88)

u,v,

k,l∈{1,2}

g(s1, h1, s2, h2) =

h(t, q) =

(−1)|u−v|+|k−l|U 2
T(cid:88)
q(cid:88)
T(cid:88)

susvhkhl

,

g(s1, h1, s2, h2).

s1=1

s2=t+1

h1=1

h2=q+1

Thus, n2w(t)w(q) ˆV0,tq/4 = h(t, q). Suppose the quantity h(t, q − 1) is known for t ∈
{1, . . . , T − 2} and q ∈ {2, . . . , 2 − 1}. For a ﬁxed t,

q−1(cid:88)

T(cid:88)

T(cid:88)

T(cid:88)

h(t, q) = h(t, q − 1) −

g(t, j, k, q) +

g(t, q, j, k).

(3.13)

An analogous recursive formula can be derived to traverse a ﬁxed column where h(t − 1, q)

j=t

k=t+1

j=t+1

k=q+1

is known and we want to compute h(t, q). Based on the recursive formula in (3.13), the

computation complexity in terms of T is at T 2. By the deﬁnition of ˆRn,tq, ˆRn,1,1, ˆRn,1,T−1,
ˆRn,T−1,T−1 can each be computed at the computation complexity in terms of T at the order
T 2. Therefore, based on (3.13) and the fact that we must compute ˆRn,tq for all t < q, the
overall computation complexity for ˆRn,z is at the order np2T 4. Despite this reduction, the
complexity can further be improved via linear interpolation on a sparse form of ˆRn,z.

Rather than compute ˆRn,tq for all t, q ∈ {1, . . . , T − 1}, we can compute h = (b + I)
oﬀ-diagonals of the matrix and interpolate the remaining values. Let b be the number of

consecutive oﬀ-diagonals immediately following the main diagonal, and let I be the num-

ber of oﬀ-diagonals computed at a ﬁxed interval after the b consecutive oﬀ-diagonals. Let
diag( ˆRn,1,d+1) be the dth oﬀ-diagonal, where d ∈ {1, . . . , T − 2}. For an eﬃcient approx-
imation of ˆRn,z, ﬁrst compute ˆRn,1,1. Next, apply (3.13) to compute diag( ˆRn,1,2), . . . ,
diag( ˆRn,1,b) for the corresponding b oﬀ-diagonals. Lastly, apply formula (3.13) to compute
diag( ˆRn,1,I1
Each of these I oﬀ-diagonals has an initial computation in terms of T at the order T 3.

) that correspond to the I oﬀ-diagonals at a ﬁxed interval.

), . . . , diag( ˆRn,1,II

Parallel processing can be utilized to start each oﬀ-diagonal’s computation independently.

The overall complexity in terms of T , will be at order hT 3 to obtain a sparse version of

76

ˆRn,z. Linear interpolation is then used to estimate the components not computed. Based

on our simulations, linear interpolation results in a negligible loss in power, and the size

remains near the nominal level. Full simulation results for the linear interpolation method

are available in Section 3.6.

Figure 3.1: Accuracy of linear interpolation for ˆRn,tq. Black circles represent ˆRn,1q for all
q ∈ {1, . . . , T − 1}. Red triangles represent the corresponding interpolated values.

Figure 3.1 illustrates ˆRn,1,q for all q ∈ {1, . . . , T − 1} and the corresponding interpolated
values based on parameters b = 20 and I = 8. The ﬁxed interval for oﬀ-diagonals was set

to ten. The accuracy of the linear interpolation is evident under the null and alternative

hypotheses. Therefore, the computation complexity in terms of T for ˆRn,tq can be reduced
from T 4 to hT 3. In Chapter 2, the overall computation complexity to approximate the quan-

tile was at the order pn4T 6. From the recursive formulae and estimation procedure via linear

interpolation, the overall computation complexity to estimate Rn,z is reduced to pn2T 3, and

thus, making our change point detection procedure applicable to high-dimensional functional

data.

77

3.5 Change point identiﬁcation

If the data leads us to the conclusion to reject H0 of (3.1), then a second task is to identify

the time points where changes exist among the T high-dimensional covariance matrices.

First, consider the case with only one change point. Let τ be the time point where the single

change point exists. Deﬁne ˆτ as an estimator for the change point’s location, where

ˆτ = arg max
t∈T

ˆDnt,

(3.14)

and T = {1, . . . , T − 1}. The form of the estimator is motivated by Theorem 4, which

states that Dt is maximized at the time point t = τ when a single change point exists at τ .

Consider the hypotheses

H0 : Σ1 = ··· = ΣT
H∗
1 : Σ1 = ··· = Στ (cid:54)= Στ +1 = ··· = ΣT .

versus

(3.15)

The following theorem establishes the rate of convergence for ˆτ .

Theorem 9. Assume that H∗
constant. Under Conditions 1 – 2, and 4, it follows that as n → ∞,

1 of (3.15) is true. Also, assume that as T → ∞, τ /T → ω, a

(3.16)

(cid:16)√

where νmax = maxt∈T max

ˆτ − τ = Op

√
nV1t

V0t,

(cid:112)log(T )

(cid:111)

,

n∆p

(cid:110)νmax
(cid:17)

and ∆p = tr{(Σ1 − ΣT )2}.

Theorem 9 demonstrates that the change point estimator, ˆτ , is consistent for high-

dimensional functional data, provided that ∆p/νmax (cid:29) (cid:112)log(T )/n. Quantity ∆p can

be interpreted as the signal, and quantity νmax can be interpreted as the noise. Thus,

(cid:112)log(T )/(n∆p) → 0, ˆτ is a consistent estimator for τ .
n) since ∆p,(cid:112)log(T ),

√
V0t, and

√

if νmax

√
is Op(1/

To investigate the impact of n, p, and T on the rate of convergence of ˆτ we consider each
in turn. Assume p and T are ﬁxed as n → ∞. As a result, the rate of convergence for ˆτ − τ

V1t are held constant. If we assume T is ﬁxed

78

as n and p diverge, then the rate of convergence is the same as that proved in Theorem 4.

n depending on the contributions of ∆p and νmax. Next, if

√
This rate can be faster than 1/
we assume p is ﬁxed as n and T diverge, then ˆτ − τ = Op
on the relationship between ∆p and νmax, the rate of convergence can be much faster than
n as p, T , and n all diverge. As p increases, Σ1 − ΣT can possibly contain
more nonzero components, so ∆p could get larger. However, as p and T increase, νmax

(cid:112)log(T )/

(cid:111)

(cid:110)

increases. Therefore, if νmax does not dominate ∆p, we obtain a faster rate of convergence

√
n. Despite the fact that the estimator in (3.14) is the same as that proposed

√

νmax

n

. Depending

√

(cid:112)log(T )/
than(cid:112)log(T )/

in Chapter 2, the rate of convergence for the estimator is very diﬀerent with regards to the
asymptotic framework in which n → ∞, p → ∞, T → ∞.

Assume H1 of (3.1) is true for multiple change points. First, we introduce two procedures

to identify the locations of multiple change points, and then introduce a theorem with regards

to the consistency of estimating multiple change points.

Let Q = {1 ≤ τ1 < ··· < τq < T} be the collection of all the true q change points,
and let ˆQ be the estimated set of change points. We make use of the notation in Chapter 2

and deﬁne for time points t1 < t2, S[t1, t2] is the statistic S calculated based on the data in

time interval t1 through t2. For example, νmax[t1, t2] is the quantity based on data between

t1 and t2. To identify multiple change points we apply binary segmentation (Venkatraman

1992). The binary segmentation algorithm is detailed follows.

Step 1: Compute Mn and compare it with Wα. If Mn > Wα, then ˆκ = arg maxt∈T ˆDnt
is the estimated change point, and set ˆκ = ˆτ1 so ˆQ = {ˆτ1}. Partition the full data set
into two intervals: [1, ˆκ] and [ˆκ + 1, T ] and proceed to step 2. However, if Mn ≤ Wα,
then no change points exist.

Step 2: Perform the detection procedure to test (3.1) using Y [1, ˆκ] and Y [ˆκ + 1, T ]. If H0 is
ˆDnt[1, ˆκ] as a change point.
rejected based on Y [1, ˆκ], then identify ˆκ1 = arg maxt∈[1,ˆκ]
Since ˆκ1 < ˆτ1, set ˆτ1 = ˆκ1 and ˆτ2 = ˆκ so ˆQ = {ˆτ1, ˆτ2}. Partition the data Y [1, ˆκ] into

79

two intervals: [1, ˆκ1] and [ˆκ1 + 1, ˆκ]. If H0 is not rejected, then no change points exist

in the interval [1, ˆκ]. Repeat this procedure for the data based on interval [ˆκ + 1, T ].
Set ˆQ is then updated to contain the ordered change points. If no change points are

detected in either interval, then stop, as ˆκ is the only change point that exists.

Step 3: If a change point is identiﬁed in at least one interval in step 2, repeat step 2 until

no further change points are detected. At each step update and order set ˆQ.

At the conclusion of the binary segmentation procedure we can partition the interval [1, T ] so
each sub-interval will consist of end points from the set {1, ˆQ, T}. For example, if no change

point is identiﬁed, then the single interval is [1, T ]. If a single change point is identiﬁed at

ˆτ , then two intervals where no change points exist are [1, ˆτ ] and [ˆτ , T ].

The computation time to identify multiple change points exceeds the time to detect

the existence of change points. If parallel computing is available, then the computation time

required to identify multiple change points can be improved via a more eﬃcient identiﬁcation

procedure when compared to the steps outlined for binary segmentation. The improvement

stems from the fact that the time to compute ˆDnt is less than the time required to test (3.1)

for a given time interval. An eﬃcient parallel procedure is detailed below.

Step 1: Perform binary segmentation by partitioning at arg maxt∈It

ˆDnt[It], where It is

the considered time interval of data. Change point detection is not performed at this

step. Binary segmentation continues until all intervals are either of the form [a, a] or
[a, a + 1], where a ∈ T . Suppose there exist N total intervals at the conclusion of

binary segmentation.

Step 2: For all N intervals of length at least one, apply the change point detection procedure

to test (3.1) in parallel. If H0 is rejected for a given interval, then a change point exists
ˆDnt[It] from step 1. Update ˆQ for each
and is estimated at the point arg maxt∈It
identiﬁed change point.

80

Hence, the computation time required to identify multiple change points will only slightly

exceed the time to perform change point detection on the longest interval [1, T ].

To establish the consistency of ˆQ we ﬁrst deﬁne some notation. Let It be a time interval
such that It = [τa + 1, τb], where a + 1 < b such that a ∈ {0, . . . , q− 1} and b ∈ {2, . . . , q + 1}.
Deﬁne τ0 = 0 and τq+t = T . Thus, It is an interval with at least one change point. Assume

the smallest maximum signal-to-noise ratio among all segments It is as deﬁned in Chapter

2, where minIt

maxτs∈It

σ−1
nτs,0[It]Dτs[It] is denoted as mSNR.

Theorem 10. Assume that τk/T converges to ωk as T diverges, Wαn = o(mSNR), and for

any interval It, νmax[It](cid:112)log(T )/(n∆p[It]) → 0. Furthermore, assume αn → 0. Therefore,

under Conditions 1 – 2, and 4, as n → ∞, ˆQ → Q in probability.

In the existence of change points, the assumption that Wαn = o(mSNR) ensures the con-

sistency of the proposed test at each phase of binary segmentation. In the absence of change
points, the assumption that αn → 0 ensures no change points will be detected and binary seg-

mentation will stop on the given interval. The assumption that νmax[It](cid:112)log(T )/(n∆p[It]) →

0 ensures that in the existence of change points, the estimator is consistent.

3.6 Simulation studies

In this section, we present multiple simulation studies to demonstrate the performance

of the change point detection and identiﬁcation procedures in a large p, large T , and small

n setting. All data were generated from a multivariate linear process,

L(cid:88)

Yit =

At,hξi(t−h)

(i = 1, . . . , n; t = 1, . . . , T ),

(3.17)

h=0

where At,h is a p× p matrix, and ξi(t−h) are p-dimensional multivariate normally distributed
random vectors with mean 0 and covariance Ip. The data generation scheme given by (3.17)

81

permits spatial and temporal dependence. Let t ≥ s. By deﬁnition of Yit in (3.17),

L(cid:88)



cov(Yit, Yis) =

At,hAT

s,h−(t−s),

t − s ≤ L;

t − s > L.

h=t−s
0,

Spatial dependence occurs among the vector Yit for a given time point t. Temporal de-
pendence exists among {Yit}T
parameter L.

t=1 at diﬀerent time points and is governed by the simulation

In the simulation studies, we set n = 40, 50 and 60, and p = 500, 750 and 1000. The num-

ber of repeated measurements, T, was set to be 50 and 100. For change point identiﬁcation

we considered an additional case with T = 150. The simulation parameter L = 3.

Simulation results reported in Tables 3.1 – 3.4 were based on 500 simulation replications,

and simulation results in Tables 3.5 and 3.6 were based on 100 simulation replications.

The spatial and temporal dependence incorporated in (3.17) depends on the choice of

matrices At,h. First, we deﬁne matrices At,h for the testing simulation to demonstrate the

size and power of the proposed test procedure. Later, matrices At,h will be deﬁned for the

change point identiﬁcation simulation.

Let τ1 be the true underlying change point among the covariance matrices such that

τ1 = (cid:98)T /2(cid:99), where (cid:98)x(cid:99) is the ﬂoor function. Deﬁne two matrices, B1 and B2, such that

where (i, j) represents the ith row and jth column of the p × p matrices B1 and B2. Thus,
for h ∈ {0, . . . , 3}

(cid:110)
(cid:111)
(cid:110)
(cid:111)
(0.6)|i−j|I(|i − j| < p/5)
,
(0.6 + δ)|i−j|I(|i − j| < p/5)
 B1,

t ∈ {1, . . . , τ1};
t ∈ {τ1 + 1 . . . , T}.

B2,

,

B1 =

B2 =

At,h =

Parameter δ in B2 governs the signal strength in terms of how diﬀerent the covariance

matrices are before and after the change point at time τ1. When δ = 0, B1 = B2 and At,h

82

is the same for all t, and the null hypothesis is true. If δ > 0, then the null hypothesis is

false, and τ1 is the true covariance change point. For the change point detection simulation,

δ was set to 0.00, 0.025, 0.05 and 0.10.

Table 3.1: Empirical size and power of the proposed test, percentages of simulation
replications that reject the null hypothesis

δ

0(size)

0.025

0.05

0.10

n
40
50
60
40
50
60
40
50
60
40
50
60

500
4.4
4.8
3.8
13.4
17.0
26.4
96.0
100
100
100
100
100

T = 50

T = 100

p
750
4.6
4.0
4.2
13.4
19.2
26.0
97.0
100
100
100
100
100

1000
3.8
3.6
2.8
10.8
17.0
27.4
98.0
100
100
100
100
100

500
3.6
2.0
5.4
18.0
30.6
47.0
100
100
100
100
100
100

p
750
5.4
4.6
3.6
19.0
27.2
41.6
100
100
100
100
100
100

1000
4.4
4.0
5.6
18.0
30.4
41.6
100
100
100
100
100
100

Table 3.1 demonstrates the empirical size and power of the proposed test procedure.

The size is well controlled at the nominal level of 0.05 for all values of n, p, and T . For a

ﬁxed p and T , as n increases the power increases. Likewise, as δ increases, the power of the

change point detection procedure increases. For a ﬁxed n and p, the power increases as T

increases. These relationships are further elucidated when simulation results from Table 2.1

under Setting (I) in Section 2.5 are considered. For example, when n = 40, p = 500, and

δ = 0.05, we observe that the power of the test is 21.4, 35.6, 96.0, and 100 as T is 5, 8, 50,

and 100, respectively.

83

Table 3.2: Empirical size and power of the proposed test for T = 100, percentages of
simulation replications that reject the null hypothesis, quantile computed from a
correlation matrix that used linear interpolation. The ﬁrst 5 oﬀ-diagonals were computed
exactly as well as the last w components for each row

δ

0(size)

0.025

0.05

0.10

n
40
50
60
40
50
60
40
50
60
40
50
60

500
3.4
2.0
4.8
17.8
30.8
46.6
100
100
100
100
100
100

w = 5

w = 10

w = 20

p
750
4.8
4.6
3.2
19.0
26.2
40.8
100
100
100
100
100
100

1000
4.2
3.8
5.0
17.6
30.2
41.0
100
100
100
100
100
100

500
3.4
2.0
4.8
17.8
30.8
46.6
100
100
100
100
100
100

p
750
4.8
4.6
3.2
19.0
26.6
40.8
100
100
100
100
100
100

1000
4.2
4.0
5.0
17.6
30.2
41.0
100
100
100
100
100
100

500
3.4
2.0
5.2
17.8
30.8
46.6
100
100
100
100
100
100

p
750
5.2
4.6
3.8
19.0
26.6
41.2
100
100
100
100
100
100

1000
4.2
4.0
5.6
17.6
30.2
41.0
100
100
100
100
100
100

Table 3.3: Empirical size and power of the proposed test for T = 100, percentages of
simulation replications that reject the null hypothesis, quantile computed from a
correlation matrix that used linear interpolation. The ﬁrst 10 oﬀ-diagonals were computed
exactly as well as the last w components for each row

δ

0(size)

0.025

0.05

0.10

n
40
50
60
40
50
60
40
50
60
40
50
60

500
3.4
2.0
4.8
18.0
30.8
46.6
100
100
100
100
100
100

w = 5

w = 10

w = 20

p
750
5.0
4.6
3.2
19.0
26.6
40.8
100
100
100
100
100
100

1000
4.2
4.0
5.0
17.6
30.2
41.0
100
100
100
100
100
100

500
3.4
2.0
4.8
18.0
30.8
46.6
100
100
100
100
100
100

p
750
5.0
4.6
3.4
19.0
26.6
40.8
100
100
100
100
100
100

1000
4.2
4.0
5.0
17.6
30.2
41.0
100
100
100
100
100
100

500
3.4
2.0
5.2
18.0
30.8
46.6
100
100
100
100
100
100

p
750
5.2
4.6
3.8
19.0
26.8
41.2
100
100
100
100
100
100

1000
4.2
4.0
5.6
17.6
30.2
41.0
100
100
100
100
100
100

84

Table 3.4: Empirical size and power of the proposed test for T = 100, percentages of
simulation replications that reject the null hypothesis, quantile computed from a
correlation matrix that used linear interpolation. The ﬁrst 20 oﬀ-diagonals were computed
exactly as well as the last w components for each row

δ

0(size)

0.025

0.05

0.10

n
40
50
60
40
50
60
40
50
60
40
50
60

500
3.6
2.0
5.2
18.0
30.6
46.8
100
100
100
100
100
100

w = 5

w = 10

w = 20

p
750
5.2
4.6
3.4
19.0
26.8
40.8
100
100
100
100
100
100

1000
4.2
4.0
5.6
17.6
30.2
41.0
100
100
100
100
100
100

500
3.6
2.0
5.2
18.0
30.6
47.0
100
100
100
100
100
100

p
750
5.2
4.6
3.4
19.0
26.8
41.4
100
100
100
100
100
100

1000
4.4
4.0
5.6
17.8
30.4
41.4
100
100
100
100
100
100

500
3.6
2.0
5.2
18.0
30.6
46.6
100
100
100
100
100
100

p
750
5.2
4.6
3.8
19.0
27.0
41.4
100
100
100
100
100
100

1000
4.4
4.0
5.6
18.0
30.4
41.6
100
100
100
100
100
100

Tables 3.2 – 3.4 demonstrate the empirical size and power of the proposed test procedure

using a modiﬁcation of the quantile approximation procedure introduced in Section 3.4.
Rather than compute ˆRn,tq for all t, q ∈ {1, . . . , T − 1}, we compute the ﬁrst b oﬀ-diagonals
and the last w columns of ˆRn,tq. The remaining values were imputed via linear interpolation.

Figure 3.1 demonstrates the accuracy of this linear interpolation procedure. Simulations

considered b = 5, 10 and 20, and w = 5, 10 and 20. Based on our simulation results,

there is only a minimal loss in power when compared to computing all components of ˆRn,tq.

Furthermore, the size of the test is well maintained at the nominal level of 0.05.

To evaluate the performance of the change point identiﬁcation procedure through binary
segmentation, consider two change points: τ1 and τ2. Let τ1 = (cid:98)T /2(cid:99), and let τ2 = τ1 + 2.

85

Deﬁne three matrices, B1, B2, and B3, such that

(cid:110)
(cid:111)
(cid:111)
(cid:110)
(|i − j| + 1)−2I(|i − j| < p/5)
,
(cid:110)
(cid:111)
(|i − j| + δ + 1)−2I(|i − j| < p/5)
(|i − j| + 2δ + 1)−2I(|i − j| < p/5)

,

,

B1 =

B2 =

B3 =

where (i, j) represents the ith row and jth column of the p × p matrices B1, B2, and B3.
Thus, for h ∈ {0, . . . , 3}



At,h =

B1,

B2,

B3,

t ∈ {1, . . . , τ1};
t ∈ {τ1 + 1 . . . , τ2};
t ∈ {τ2 + 1 . . . , T}.

When δ = 0, the null hypothesis is true, and At,h is the same for all t ∈ {1, . . . , T}. Since
our purpose is to demonstrate the ﬁnite sample accuracy of change point identiﬁcation, we

do not consider a null hypothesis setting in which δ = 0. The values of δ were selected to be

0.15, 0.25, and 0.35.

Two measures were considered to evaluate the change point identiﬁcation procedure’s

eﬃcacy: average true positives and average true negatives. For each simulation replication
there exists two true change points at time points τ1 and τ2, and there exists T − 3 time
points where no change point exists. The average true positives are deﬁned as the average

number of correctly-identiﬁed change points among 100 simulation replications. Similarly,

the average true negatives are deﬁned as the average number of correctly-identiﬁed time

points where no covariance change exists among 100 simulation replications.

Table 3.5 provides the eﬃcacy of the binary segmentation procedure in the large p, large

T , and small n setting. For ﬁxed p, n, and T , the average true positives and average true
negatives approach two and T − 3, respectively, as δ increases. As the sample size increases,

the average true positives and average true negatives approach their optimal values. Table

3.6 contains the corresponding standard errors for the measures in Table 3.5.

86

Table 3.5: Average true positives and average true negatives for identifying multiple change
points using the proposed binary segmentation method. The maximum number of true
positives for a given replication is 2. The maximum number of true negatives for a given
replication is T − 3

δ=0.15

δ=0.25

δ=0.35

T

p

500

50

750

1000

500

100

750

1000

500

150

750

1000

n ATP ATN ATP ATN ATP ATN
40
46.62
46.63
50
46.61
60
46.59
40
50
46.70
46.64
60
46.59
40
50
46.76
46.59
60
96.54
40
96.44
50
60
96.46
96.54
40
96.54
50
60
96.42
96.59
40
96.49
50
96.44
60
40
146.48
146.40
50
146.53
60
40
146.46
146.52
50
146.55
60
146.51
40
50
146.47
146.51
60

46.48
46.42
46.52
46.51
46.53
46.53
46.61
46.58
46.69
96.56
96.54
96.56
96.59
96.51
96.55
96.52
96.50
96.58
146.53
146.55
146.57
146.58
146.55
146.42
146.49
146.50
146.56

1.97
2.00
2.00
2.00
2.00
2.00
1.95
2.00
2.00
1.98
2.00
2.00
1.98
2.00
2.00
1.98
2.00
2.00
1.97
2.00
2.00
1.97
2.00
2.00
1.97
2.00
2.00

1.20
1.41
1.57
1.30
1.33
1.57
1.27
1.48
1.65
1.27
1.31
1.62
1.22
1.33
1.60
1.20
1.34
1.59
1.19
1.34
1.54
1.16
1.42
1.56
1.20
1.46
1.53

46.76
46.68
46.58
46.78
46.66
46.58
46.76
46.67
46.51
96.75
96.67
96.70
96.76
96.59
96.59
96.80
96.64
96.50
146.76
146.68
146.51
146.84
146.64
146.45
146.80
146.70
146.56

1.68
1.91
1.98
1.77
1.95
1.99
1.81
1.95
1.99
1.74
1.92
1.99
1.85
1.96
1.99
1.74
1.90
2.00
1.73
1.95
2.00
1.73
1.97
1.98
1.72
1.92
1.99

87

Table 3.6: Standard errors for average true positives and average true negatives given in
Table 3.5. The maximum number of true positives for a given replication is 2. The
maximum number of true negatives for a given replication is T − 3

δ=0.15

δ=0.25

δ=0.35

T

p

500

50

750

1000

500

100

750

1000

500

150

750

1000

n ATP SE ATN SE ATP SE ATN SE ATP SE ATN SE
40
50
60
40
50
60
40
50
60
40
50
60
40
50
60
40
50
60
40
50
60
40
50
60
40
50
60

0.62
0.53
0.55
0.42
0.48
0.55
0.55
0.47
0.63
0.50
0.47
0.48
0.50
0.55
0.49
0.43
0.50
0.61
0.43
0.47
0.52
0.40
0.50
0.58
0.40
0.46
0.54

0.49
0.49
0.55
0.61
0.46
0.50
0.50
0.43
0.67
0.52
0.50
0.54
0.50
0.50
0.78
0.55
0.50
0.61
0.56
0.53
0.52
0.58
0.56
0.56
0.52
0.63
0.50

0.40
0.49
0.50
0.46
0.47
0.50
0.45
0.50
0.48
0.45
0.47
0.49
0.42
0.47
0.49
0.40
0.48
0.49
0.39
0.48
0.50
0.37
0.50
0.50
0.40
0.50
0.50

0.47
0.29
0.14
0.42
0.22
0.10
0.39
0.22
0.10
0.44
0.27
0.10
0.36
0.20
0.10
0.44
0.30
0.00
0.45
0.22
0.00
0.45
0.17
0.14
0.45
0.27
0.10

0.52
0.52
0.50
0.50
0.50
0.56
0.51
0.50
0.47
0.50
0.50
0.52
0.51
0.50
0.50
0.50
0.52
0.50
0.56
0.50
0.50
0.50
0.52
0.52
0.50
0.67
0.52

0.17
0.00
0.00
0.00
0.00
0.00
0.22
0.00
0.00
0.14
0.00
0.00
0.14
0.00
0.00
0.14
0.00
0.00
0.17
0.00
0.00
0.17
0.00
0.00
0.18
0.00
0.00

3.7 An empirical study

Human memory has been studied through fMRI experiments in the context of discrete

and continuous activities. One goal of neurologists is to better understand perception and

memory processes in humans as they experience continuous real-world events (Baldassano

et al. 2017). Event segmentation theory, posited by Zacks et al. (2007), poses that under

certain conditions, humans generate event boundaries in memory during continuous percep-

88

tion events. Thus, humans may partition a continuous experience into a series of segmented

discrete events. Baldassano et al. (2017) investigated event boundary detection and con-

cluded that long-term memory in humans is structured as a series of hierarchical discrete

events. Moreover, Schapiro et al. (2013) suggested that event boundaries are formed around

changes in functional connectivity. In this section, we apply our method to the task-based

fMRI data set analyzed in Baldassano et al. (2017) and Chen et al. (2017) in order to

study the brain’s dynamic functional connectivity. In the presence of brain dynamic func-

tional activity, points of change may represent these event boundaries as suggested in the

aforementioned neuroscience literature.

We apply our proposed method to a task-based fMRI data set collected by Chen et al.

(2017), where they investigated the eﬀects of memories across diﬀerent individuals. The

experiment involved 17 participants that each watched the same 48-minute segment of the

BBC television series Sherlock while undergoing an fMRI scan. The 48-minute segment was

the ﬁrst 48-minutes of the ﬁrst episode in the television series. None of the participants

had watched the series Sherlock prior to the study. Chen et al. partitioned the television

episode into a 23-minute segment and a 25-minute segment. Each segment was prepended

by a 30-second cartoon to allow the brain time to adjust to new audio and visual stimuli.

Including an unrelated cartoon prior to studies such as this is common practice as it reduces

statistical noise. Subjects were instructed to watch the television episode as they would

watch a typical television episode in their own home. The fMRI data were gathered from

a Siemens Skyra 3T full-body scanner. More details about the experiment and processes of

acquiring functional and anatomical images are provided in Chen et al. (2017).

The 48-minute segment of Sherlock resulted in 1,976 time point measurements of data.

For each participant, the fMRI machine acquired an image the participant’s brain every 1.5

seconds. To demonstrate our proposed method, we analyzed the ﬁrst 100 time points which

equates to the ﬁrst 150 seconds of the Sherlock episode. Let Yit be the BOLD random vector

for the 268 nodes of the ith individual at time t. Thus, Yit (i = 1, . . . , 17; t = 1, . . . , 100)

89

is a 268-dimensional random vector. A node, or region of interest, represents a collection

of voxels. The 268 node parcellation was performed according to Shen et al. (2013), where

voxels groupings ensure functional homogeneity within each node, making it ideal for node

network and dynamic functional connectivity analysis. Figure 3.2 illustrates the 268 Shen

node parcellation along with large-scale node groupings. Node-level analysis decreases the

data dimensional and allows for more interpretable results. For further details on the beneﬁts

and processes of Shen node parcellation, we refer readers to Shen et al. (2013).

Figure 3.2: Shen 268 node parcellation. This image was obtained from Finn et al. (2015).

In our analysis n = 17, p = 268, and T = 100. Based on (3.1) – (3.2), we assume

that at each time point there exists a common population covariance matrix among all 17

individuals. Our assumption is not unrealistic given this task-based fMRI experiment. In

Chen et al. (2017) and Baldassano et al. (2017), they found that an across-subject design

was appropriate due to consistent stimulus-response across patients for a given brain region.

Under model (3.2), we applied our procedure to test (3.1). Based on the test statistic
value, Mn = 3.6596, we rejected H0 of (3.1) as the p-value was less than 0.001. Hence, we
rejected the claim that the covariance matrices were stationary for all T = 100 time points.

Accordingly, we applied binary segmentation to identify all signiﬁcant change points among

90

99 possible points of change. Our proposed method identiﬁed 17 locations of signiﬁcance.

Change points were located at time points 2, 25, 36, 39, 40, 41, 42, 58, 60, 61, 63, 81, 83,
88, 89, 91, and 92. A change point at time two implies that Σ1 = Σ2 (cid:54)= Σ3.

Table 3.7: Identiﬁed change points in the Sherlock fMRI data set. Range of time points
preceding the identiﬁed change point where the covariance matrices are temporally
homogeneous. An interval ID provides a reference to Figure 3.3

Change point

Interval Homogeneous interval

2
25
36
39
40
41
42
58
60
61
63
81
83
88
89
91
92

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

[1, 2]
[3, 25]
[26, 36]
[37, 39]
[40, 40]
[41, 41]
[42, 42]
[43, 58]
[59, 60]
[61, 61]
[62, 63]
[64, 81]
[82, 83]
[84, 88]
[89, 89]
[90, 91]
[92, 92]

Figures (3.3) illustrates these temporal changes among covariance matrices around the

identiﬁed change points listed in Table 3.7. Each subplot is the average correlation between

nodes across the time interval where the covariance matrices are homogeneous. Thus, in

Figure 3.3, Interval 1 represents the correlation network based on the average correlations

between nodes over time interval [1, 2]. Interval 2 represents the correlation network based

on the average correlations between nodes over time interval [3, 25]. Table 3.7 details the

time interval corresponding to the temporal homogeneous covariance matrices preceding

each identiﬁed change point. Therefore, given that a change point was located at t = 2,

the correlation network of Interval 1 compared to Interval 2 should be signiﬁcantly diﬀerent.

Correlation network layouts are structured according to the eight large-scale node groupings

91

illustrated in Figure 3.2. The top-centered circle consists of nodes within the medial frontal

group. Moving clockwise on a given sub-plot, the remaining circles represent frontoparietal,

default mode, subcortical-cerebellum, motor, visual I, visual II, and visual association.

The identiﬁed change points in Table 3.7 coincide with interesting events in the television

episode Sherlock. For example, the ﬁrst change point at t = 2 may be a reaction to initial

stimuli of the cartoon. The brain must process this initial video and audio stimuli. At

approximately 37 to 38 seconds into the series Sherlock the cartoon ends, and a graphic

war scene commences. Guns are ﬁred, casualties are shown, but there is no distinguishable

dialect. The transition point from cartoon clip to battle scene coincides with the change point

identiﬁed at t = 25. After this war scene a period of quiet ensues. The ﬁrst understandable

dialect from actors occurs at approximately two minutes and 11 seconds into the episode. At

this time, a therapist inquires about a patient’s well-being as the viewer learns the opening

war scene was a ﬂashback. Change points identiﬁed at 88, 89, 91, and 92 equate to the start

of this conversation.

92

Figure 3.3: Correlation networks based on an average over a time interval in which the
covariance matrices are homogeneous. Each circle is comprised of 67 Shen nodes. Solid
lines represent a positive correlation, and dashed lines represent a negative correlation. The
darker the line the stronger the correlation between nodes. A correlation threshold value of
0.70 in absolute values was used.

93

3.8 Technical details

This section contains proofs of lemmas and the main theorems. Some of the expressions

are rather long. Thus, for readability, an equation will not always be aligned with the initial

equality sign.

3.8.1 Proofs of lemmas

First, we provide proofs for some lemmas that will be used in the proofs corresponding to

the main theorems.

sakzi, where zi is a standard multivariate normal random vector,

Lemma 7. Let Yisak = γT
and γT

sak is known. Then

(cid:16)

E

YisakYisalYircmYircnYiueoYiuepYixgqYixgw

ueΣwq

xg + Σlk

saΣnm

rc Cpq

uexg Cwo
xgue

(cid:17) (cid:16) Σlk

rc Σpo
saΣnm
rc Σpo
ueClq
sarcCnk
uerc + Σlk

rcueCpm
sarcCnk
xgsaCno

uerc + Σnm
rcsa + Clm
rcueCpm

+ Σlk

+ Σnm

+ Clo

saΣpo
rc Σwq
saueCpk
rc Clo
sarcCno

ueCnq
xg Clo
uesaCnq
saueCpq
rcueCpq

rcxg Cwm
saueCpk

xgrc + Σlk
uesa + Σpo

saΣwq
ueΣwq
xgrc + Clq
xgsa + Σpo
xgsa + Clm

xg Cno
xg Clm
saxg Cwk
ueClm
sarcCnq

rcxg Cwm
uexg Cwk
uexg Cwk

+ Σnm

+ Clm

uexg Cwo
xgue
rcueCpq
uexg Cwm
xgrc

saxg Cwk
xgsa
rcsaCpq
saCno
sarcCno
saueCpm

sarcCnq

rcxg Cwk

xgsa + Σwq

xg Clm

rcxg Cwo

xgueCpk

uesa + Clo

rcueCpk
uesa
uercCnq
rcxg Cwk

xgsa,

where Σlk

sa = γT

salγsak and Cpq

uexg = γT

ueqγxgq.

Proof. Let A1, A2, A3, and A4 be any matrices of appropriate dimensions. Assume zi is a

94

3 )tr(A2A4 + A2AT
4 )

2 )(A3 + AT

3 )(A4 + AT

4 )}

+

tr(A1)tr(A2)tr(A3A4 + A3AT

4 ) + tr(A1)tr(A3)tr(A2A4 + A2AT
4 )

+ tr(A1)tr(A4)tr(A2A3 + A2AT

+ tr(A2)tr(A4)tr(A1A3 + A1AT

3 ) + tr(A2)tr(A3)tr(A1A4 + A1AT
4 )
3 ) + tr(A3)tr(A4)tr(A1A2 + A1AT
2 )

(cid:105)

+

tr(A1A2 + A1AT

2 )tr(A3A4 + A3AT

(cid:105)

4 ) + tr(A1A3 + A1AT

(cid:104)
tr(A1)tr{(A2 + AT
+
4 )}
4 )}

4 )tr(A2A3 + A2AT
3 )

1 )(A3 + AT
1 )(A2 + AT
1 )(A2 + AT

3 )(A4 + AT
2 )(A4 + AT
2 )(A3 + AT

+ tr(A1A4 + A1AT
+ tr(A2)tr{(A1 + AT
+ tr(A3)tr{(A1 + AT
(cid:104)
+ tr(A4)tr{(A1 + AT
tr{(A1 + AT
+
+ tr{(A1 + AT
+ tr{(A1 + AT

1 )(A2 + AT

2 )(A3 + AT

1 )(A2 + AT
1 )(A3 + AT

2 )(A4 + AT
3 )(A2 + AT

(cid:110)

3 )}(cid:105)
4 )}(cid:105)

4 )}
3 )(A4 + AT
3 )}

4 )(A3 + AT
2 )(A4 + AT

.

(cid:104)

(cid:104)

(cid:16)

(cid:110)
(cid:16)

standard multivariate normal random vector. By the results of multivariate analysis

E

zT
i A1zizT

i A2zizT

i A3zizT

i A4zi

= tr(A1)tr(A2)tr(A3)tr(A4)

(cid:17)

(cid:111)

=

By the deﬁnition of Yi··, E
i γrcmγT

salzizT

E

zT
i γsakγT

i γxgqγT
substitutions for A1, A2, A3, and A4, it follows that

i γueoγT

uepzizT

rcnzizT

xgwzi

YisakYisalYircmYircnYiueoYiuepYixgqYixgw

(cid:111)

. Thus, making the appropriate

E

YisakYisalYircmYircnYiueoYiuepYixgqYixgw

ueΣwq

xg + Σlk

saΣnm

rc Cpq

uexg Cwo
xgue

(cid:17) (cid:16) Σlk

rc Σpo
saΣnm
ueClq
rc Σpo
sarcCnk
uerc + Σlk

rcueCpm
sarcCnk
xgsaCno

uerc + Σnm
rcsa + Clm
rcueCpm

+ Σlk

+ Σnm

saΣpo
rc Σwq
saueCpk
rc Clo
sarcCno

ueCnq
xg Clo
uesaCnq
saueCpq
rcueCpq

rcxg Cwm
saueCpk

xgrc + Σlk
uesa + Σpo

saΣwq
ueΣwq
xgrc + Clq
xgsa + Σpo
xgsa + Clm

xg Cno
xg Clm
saxg Cwk
ueClm
sarcCnq

rcxg Cwm
uexg Cwk
uexg Cwk

+ Clo

+ Σnm

+ Clm

uexg Cwo
xgue
rcueCpq
uexg Cwm
xgrc

saxg Cwk
xgsa
rcsaCpq
saCno
sarcCno
saueCpm

sarcCnq

rcxg Cwk

xgsa + Σwq

xg Clm

rcxg Cwo

xgueCpk

uesa + Clo

rcueCpk
uesa
uercCnq
rcxg Cwk

xgsa,

where Σlk

sa = γT

salγsak and Cpq

uexg = γT

ueqγxgq.

(cid:3)

95

Lemma 8. Let Vsasb(i, j) = (Y T
isa
and i (cid:54)= j. Then

(cid:110)

E

Vsasb(i, j)Vrcrd(i, j)Vueuf (i, j)Vxgxh(i, j)

)2, where sa, sb, rc, rd, ue, uf , xg, xh ∈ {1 . . . T − 1}

Yjsb

(cid:111)

= H1 + H2 + H3 + H4 + H5 + H6

+ H7 + H8 + H9 + H10 + H11 + H12
+ H13 + H14 + H15 + H16 + H17

where Hk, k ∈ {1, . . . , 17} is given below.

(cid:104)

H1 =

(cid:104)

H2 =

tr4(Σ2) + tr2(Σ2)tr(ΣCuf xhΣCxhuf ) + tr2(Σ2)tr(ΣCrdxhΣCxhrd)
+ tr2(Σ2)tr(ΣCrduf ΣCuf rd) + tr2(Σ2)tr(ΣCsbxhΣCxhsb)
+ tr2(Σ2)tr(ΣCsbuf ΣCuf sb) + tr2(Σ2)tr(ΣCsbrdΣCrdsb)

+ tr(ΣCsbrdΣCrdsb)tr(ΣCuf xhΣCxhuf ) + tr(ΣCsbuf ΣCuf sb)tr(ΣCrdxhΣCxhrd)
+ tr(ΣCsbxhΣCxhsb)tr(ΣCrduf ΣCuf rd) + tr(Σ2)tr(ΣCrdxhΣCxhuf ΣCuf rd)
+ tr(Σ2)tr(ΣCsbxhΣCxhuf ΣCuf sb) + tr(Σ2)tr(ΣCsbxhΣCxhrdΣCrdsb)
+ +tr(Σ2)tr(ΣCsbuf ΣCuf rdΣCrdsb) + tr(ΣCsbxhΣCxhuf ΣCuf rdΣCrdsb)

+ tr(ΣCsbuf ΣCuf xhΣCxhrdΣCrdsb) + tr(ΣCsbxhΣCxhrdΣCrduf ΣCuf sb)

(cid:105)

tr2(Σ2)tr(ΣCuexg ΣCxgue) + tr2(Σ2)tr2(Cuexg Cxhuf )
+ tr(Σ2)tr(ΣCrdxhCxgueΣCuexg Cxhrd)
+ tr(Σ2)tr(ΣCrduf Cuexg ΣCxgueCuf rd) + tr(Σ2)tr(ΣCsbxhCxgueΣCuexg Cxhsb)
+ tr(Σ2)tr(ΣCsbuf Cuexg ΣCxgueCuf sb) + tr(ΣCsbrdΣCrdsb)tr(ΣCuexg ΣCxgue)
+ tr(ΣCsbrdΣCrdsb)tr2(Cuexg Cxhuf )

+ tr(ΣCsbuf Cuexg CxhrdΣCrdxhCxgueCuf sb)

+ tr(ΣCsbxhCxgueCuf rdΣCrduf Cuexg Cxhsb)
+ tr(Σ2)tr(ΣCrdxhCxgueCuf rd)tr(Cuexg Cxhuf )

96

+ tr(Σ2)tr(ΣCsbxhCxgueCuf sb)tr(Cuexg Cxhuf )

+ tr(ΣCsbxhCxgueΣCuexg CxhrdΣCrdsb)

+ tr(ΣCsbuf Cuexg ΣCxgueCuf rdΣCrdsb)
+ tr(ΣCsbxhCxgueCrduf ΣCrdsb)tr2(Cuexg Cxhuf )
+ tr(ΣCsbuf Cuexg CxhrdΣCrdsb)tr2(Cuexg Cxhuf )

+ tr(ΣCsbxhCxgueCuf sb)tr(ΣCrduf Cuexg Cxhrd)

(cid:105)

tr2(Σ2)tr(ΣCrcxg ΣCxgrc) + tr2(Σ2)tr2(Crcxg Cxhrd)
+ tr(Σ2)tr(ΣCuf xhCxgrcΣCrcxg Cxhuf )
+ tr(Σ2)tr(ΣCuf rdCrcxg ΣCxgrcCrduf ) + tr(Σ2)tr(ΣCsbxhCxgrcΣCrcxg Cxhsb)
+ tr(Σ2)tr(ΣCsbrdCrcxg ΣCxgrcCrdsb) + tr(ΣCsbuf ΣCuf sb)tr(ΣCrcxg ΣCxgrc)
+ tr(ΣCsbuf ΣCuf sb)tr2(Crcxg Cxhrd)

+ tr(ΣCsbrdCrcxg Cxhuf ΣCuf xhCxgrcCrdsb)

+ tr(ΣCsbxhCxgrcCrduf ΣCuf rdCrcxg Cxhsb)
+ tr(Σ2)tr(ΣCuf xhCxgrcCrduf )tr(Crcxg Cxhrd)
+ tr(Σ2)tr(ΣCsbxhCxgrcCrdsb)tr(Crcxg Cxhrd)
+ tr(ΣCsbxhCxgrcΣCrcxg Cxhuf ΣCuf sb)

+ tr(ΣCsbrdCrcxg ΣCxgrcCrduf ΣCuf sb)
+ tr(ΣCsbxhCxgrcCuf rdΣCuf sb)tr2(Crcxg Cxhrd)
+ tr(ΣCsbrdCrcxg Cxhuf ΣCuf sb)tr2(Crcxg Cxhrd)

+ tr(ΣCsbxhCxgrcCrdsb)tr(ΣCuf rdCrcxg Cxhuf )

(cid:105)

tr2(Σ2)tr(ΣCrcueΣCuerc) + tr2(Σ2)tr2(CrcueCuf rd)

97

(cid:104)

H3 =

(cid:104)

H4 =

+ tr(Σ2)tr(ΣCxhuf CuercΣCrcueCuf xh)
+ tr(Σ2)tr(ΣCxhrdCrcueΣCuercCrdxh) + tr(Σ2)tr(ΣCsbuf CuercΣCrcueCuf sb)
+ tr(Σ2)tr(ΣCsbrdCrcueΣCuercCrdsb) + tr(ΣCsbxhΣCxhsb)tr(ΣCrcueΣCuerc)
+ tr(ΣCsbxhΣCxhsb)tr2(CrcueCuf rd) + tr(ΣCsbrdCrcueCuf xhΣCxhuf CuercCrdsb)

+ tr(ΣCsbuf CuercCrdxhΣCxhrdCrcueCuf sb)
+ tr(Σ2)tr(ΣCxhuf CuercCrdxh)tr(CrcueCuf rd)
+ tr(Σ2)tr(ΣCsbuf CuercCrdsb)tr(CrcueCuf rd)

+ tr(ΣCsbuf CuercΣCrcueCuf xhΣCxhsb)

+ tr(ΣCsbrdCrcueΣCuercCrdxhΣCxhsb)
+ tr(ΣCsbuf CuercCxhrdΣCxhsb)tr2(CrcueCuf rd)
+ tr(ΣCsbrdCrcueCuf xhΣCxhsb)tr2(CrcueCuf rd)

+ tr(ΣCsbuf CuercCrdsb)tr(ΣCxhrdCrcueCuf xh)

(cid:105)

(cid:104)

H5 =

tr2(Σ2)tr(ΣCsaxg ΣCxgsa) + tr2(Σ2)tr2(Csaxg Cxhsb)
+ tr(Σ2)tr(ΣCuf xhCxgsaΣCsaxg Cxhuf )
+ tr(Σ2)tr(ΣCuf sbCsaxg ΣCxgsaCsbuf ) + tr(Σ2)tr(ΣCrdxhCxgsaΣCsaxg Cxhrd)
+ tr(Σ2)tr(ΣCrdsbCsaxg ΣCxgsaCsbrd) + tr(ΣCrduf ΣCuf rd)tr(ΣCsaxg ΣCxgsa)
+ tr(ΣCrduf ΣCuf rd)tr2(Csaxg Cxhsb)

+ tr(ΣCrdsbCsaxg Cxhuf ΣCuf xhCxgsaCsbrd)

+ tr(ΣCrdxhCxgsaCsbuf ΣCuf sbCsaxg Cxhrd)
+ tr(Σ2)tr(ΣCuf xhCxgsaCsbuf )tr(Csaxg Cxhsb)
+ tr(Σ2)tr(ΣCrdxhCxgsaCsbrd)tr(Csaxg Cxhsb)
+ tr(ΣCrdxhCxgsaΣCsaxg Cxhuf ΣCuf rd)

98

+ tr(ΣCrdsbCsaxg ΣCxgsaCsbuf ΣCuf rd)
+ tr(ΣCrdxhCxgsaCuf sbΣCuf rd)tr2(Csaxg Cxhsb)
+ tr(ΣCrdsbCsaxg Cxhuf ΣCuf rd)tr2(Csaxg Cxhsb)

+ tr(ΣCrdxhCxgsaCsbrd)tr(ΣCuf sbCsaxg Cxhuf )

tr2(Σ2)tr(ΣCsaueΣCuesa) + tr2(Σ2)tr2(CsaueCuf sb)
+ tr(Σ2)tr(ΣCxhuf CuesaΣCsaueCuf xh)
+ tr(Σ2)tr(ΣCxhsbCsaueΣCuesaCsbxh) + tr(Σ2)tr(ΣCrduf CuesaΣCsaueCuf rd)
+ tr(Σ2)tr(ΣCrdsbCsaueΣCuesaCsbrd) + tr(ΣCrdxhΣCxhrd)tr(ΣCsaueΣCuesa)
+ tr(ΣCrdxhΣCxhrd)tr2(CsaueCuf sb)

+ tr(ΣCrdsbCsaueCuf xhΣCxhuf CuesaCsbrd)

+ tr(ΣCrduf CuesaCsbxhΣCxhsbCsaueCuf rd)
+ tr(Σ2)tr(ΣCxhuf CuesaCsbxh)tr(CsaueCuf sb)
+ tr(Σ2)tr(ΣCrduf CuesaCsbrd)tr(CsaueCuf sb)

+ tr(ΣCrduf CuesaΣCsaueCuf xhΣCxhrd)

+ tr(ΣCrdsbCsaueΣCuesaCsbxhΣCxhrd)
+ tr(ΣCrduf CuesaCxhsbΣCxhrd)tr2(CsaueCuf sb)
+ tr(ΣCrdsbCsaueCuf xhΣCxhrd)tr2(CsaueCuf sb)

+ tr(ΣCrduf CuesaCsbrd)tr(ΣCxhsbCsaueCuf xh)

(cid:105)

(cid:105)

(cid:104)

H6 =

(cid:104)

H7 =

tr2(Σ2)tr(ΣCsarcΣCrcsa) + tr2(Σ2)tr2(CsarcCrdsb)
+ tr(Σ2)tr(ΣCxhrdCrcsaΣCsarcCrdxh)
+ tr(Σ2)tr(ΣCxhsbCsarcΣCrcsaCsbxh) + tr(Σ2)tr(ΣCuf rdCrcsaΣCsarcCrduf )

99

(cid:104)

H8 =

+ tr(Σ2)tr(ΣCuf sbCsarcΣCrcsaCsbuf ) + tr(ΣCuf xhΣCxhuf )tr(ΣCsarcΣCrcsa)
+ tr(ΣCuf xhΣCxhuf )tr2(CsarcCrdsb) + tr(ΣCuf sbCsarcCrdxhΣCxhrdCrcsaCsbuf )

+ tr(ΣCuf rdCrcsaCsbxhΣCxhsbCsarcCrduf )
+ tr(Σ2)tr(ΣCxhrdCrcsaCsbxh)tr(CsarcCrdsb)
+ tr(Σ2)tr(ΣCuf rdCrcsaCsbuf )tr(CsarcCrdsb)

+ tr(ΣCuf rdCrcsaΣCsarcCrdxhΣCxhuf )

+ tr(ΣCuf sbCsarcΣCrcsaCsbxhΣCxhuf )
+ tr(ΣCuf rdCrcsaCxhsbΣCxhuf )tr2(CsarcCrdsb)
+ tr(ΣCuf sbCsarcCrdxhΣCxhuf )tr2(CsarcCrdsb)

+ tr(ΣCuf rdCrcsaCsbuf )tr(ΣCxhsbCsarcCrdxh)

(cid:105)

tr(ΣCsarcΣCrcsa)tr(ΣCuexg ΣCxgue) + tr(ΣCsarcΣCrcsa)tr2(Cuexg Cxhuf )

+ tr(ΣCsarcCrdxhCxgueΣCuexg CxhrdCrcsa)

+ tr(ΣCsarcCrduf Cuexg ΣCxgueCuf rdCrcsa)

+ tr(ΣCrcsaCsbxhCxgueΣCuexg CxhsbCsarc)

+ tr(ΣCrcsaCsbuf Cuexg ΣCxgueCuf sbCsarc)
+ tr(ΣCuexg ΣCxgue)tr2(CsarcCrdsb) + tr2(CsarcCrdsb)tr2(Cuexg Cxhuf )
+ tr2(CsarcCrdxhCxgueCuf sb) + tr2(CsarcCrduf Cuexg Cxhsb)

+ tr(ΣCsarcCrduf Cuexg CxhrdCrcsa)tr(Cuexg Cxhuf )

+ tr(ΣCrcsaCsbuf Cuexg CxhsbCsarc)tr(Cuexg Cxhuf )

+ tr(ΣCuexg CxhsbCsarcCrdxhCxgue)tr(CsarcCrdsb)

+ tr(ΣCxgueCuf sbCsarcCrduf Cuexg )tr(CsarcCrdsb)

+ tr(CrcsaCsbxhCxgueCuf rd)tr(CsarcCrdsb)tr(Cuexg Cxhuf )

100

+ tr(CrcsaCsbuf Cuexg Cxhrd)tr(CsarcCrdsb)tr(Cuexg Cxhuf )

+ tr(CsarcCrduf Cuexg CxhrdCrcsaCsbxhCxgueCuf sb)

(cid:105)

(cid:105)

tr(ΣCsaueΣCuesa)tr(ΣCrcxg ΣCxgrc) + tr(ΣCsaueΣCuesa)tr2(Crcxg Cxhrd)

+ tr(ΣCsaueCuf xhCxgrcΣCrcxg Cxhuf Cuesa)

+ tr(ΣCsaueCuf rdCrcxg ΣCxgrcCrduf Cuesa)

+ tr(ΣCuesaCsbxhCxgrcΣCrcxg CxhsbCsaue)

+ tr(ΣCuesaCsbrdCrcxg ΣCxgrcCrdsbCsaue)
+ tr(ΣCrcxg ΣCxgrc)tr2(CsaueCuf sb) + tr2(CsaueCuf sb)tr2(Crcxg Cxhrd)
+ tr2(CsaueCuf xhCxgrcCrdsb) + tr2(CsaueCuf rdCrcxg Cxhsb)

+ tr(ΣCsaueCuf rdCrcxg Cxhuf Cuesa)tr(Crcxg Cxhrd)

+ tr(ΣCuesaCsbrdCrcxg CxhsbCsaue)tr(Crcxg Cxhrd)

+ tr(ΣCrcxg CxhsbCsaueCuf xhCxgrc)tr(CsaueCuf sb)

+ tr(ΣCxgrcCrdsbCsaueCuf rdCrcxg )tr(CsaueCuf sb)

+ tr(CuesaCsbxhCxgrcCrduf )tr(CsaueCuf sb)tr(Crcxg Cxhrd)

+ tr(CuesaCsbrdCrcxg Cxhuf )tr(CsaueCuf sb)tr(Crcxg Cxhrd)

+ tr(CsaueCuf rdCrcxg Cxhuf CuesaCsbxhCxgrcCrdsb)

(cid:104)

H9 =

(cid:104)

H10 =

tr(ΣCsaxg ΣCxgsa)tr(ΣCrcueΣCuerc) + tr(ΣCsaxg ΣCxgsa)tr2(CrcueCuf rd)

+ tr(ΣCsaxg Cxhuf CuercΣCrcueCuf xhCxgsa)

+ tr(ΣCsaxg CxhrdCrcueΣCuercCrdxhCxgsa)

+ tr(ΣCxgsaCsbuf CuercΣCrcueCuf sbCsaxg )

+ tr(ΣCxgsaCsbrdCrcueΣCuercCrdsbCsaxg )

101

(cid:104)

H11 =

+ tr(ΣCrcueΣCuerc)tr2(Csaxg Cxhsb)
+ tr2(Csaxg Cxhsb)tr2(CrcueCuf rd)
+ tr2(Csaxg Cxhuf CuercCrdsb) + tr2(Csaxg CxhrdCrcueCuf sb)

+ tr(ΣCsaxg CxhrdCrcueCuf xhCxgsa)tr(CrcueCuf rd)

+ tr(ΣCxgsaCsbrdCrcueCuf sbCsaxg )tr(CrcueCuf rd)

+ tr(ΣCrcueCuf sbCsaxg Cxhuf Cuerc)tr(Csaxg Cxhsb)

+ tr(ΣCuercCrdsbCsaxg CxhrdCrcue)tr(Csaxg Cxhsb)

+ tr(CxgsaCsbuf CuercCrdxh)tr(Csaxg Cxhsb)tr(CrcueCuf rd)

(cid:105)

+ tr(CxgsaCsbrdCrcueCuf xh)tr(Csaxg Cxhsb)tr(CrcueCuf rd)

+ tr(Csaxg CxhrdCrcueCuf xhCxgsaCsbuf CuercCrdsb)

tr(Σ2)tr(ΣCrcxg ΣCxgueΣCuerc) + tr(Σ2)tr(Cuexg Cxhuf )tr(ΣCrcxg Cxhuf Cuerc)
+ tr(Σ2)tr(Crcxg Cxhrd)tr(ΣCuercCrdxhCxgue)
+ tr(Σ2)tr(CrcueCuf rd)tr(ΣCxgueCuf rdCrcxg )

+ tr(ΣCsbxhCxgrcΣCrcueΣCuexg Cxhsb)

+ tr(ΣCsbuf Cuexg ΣCxgrcΣCrcueCuf sb)

+ tr(ΣCsbrdCrcueΣCuexg ΣCxgrcCrdsb)

+ tr(ΣCsbrdCrcueCuf xhCxgrcCrdsb)tr(Cuexg Cxhuf )

+ tr(ΣCsbuf Cuexg CxhrdCrcueCuf sb)tr(Crcxg Cxhrd)

+ tr(ΣCsbxhCxgrcCrduf Cuexg Cxhsb)tr(CrcueCuf rd)
+ tr(Σ2)tr(CrcueCuf rd)tr(Cuexg Cxhuf )tr(Crcxg Cxhrd)

+ tr(ΣCsbxhCxgrcΣCrcueCuf sb)tr(Cuexg Cxhuf )

+ tr(ΣCsbxhCxgrcCrdsb)tr(ΣCuercCrdxhCxgue)

102

+ tr(ΣCsbuf Cuexg ΣCxgrcCrdsb)tr(CrcueCuf rd)

+ tr(ΣCsbxhCxgrcCrdsb)tr(CrcueCuf rd)tr(Cuexg Cxhuf )

+ tr(ΣCsbuf Cuexg CxhrdCrcueCuf xhCxgrcCrdsb)

+ tr(ΣCsbxhCxgrcCrduf Cuexg CxhrdCrcueCuf sb)

(cid:105)

(cid:105)

103

(cid:104)

H12 =

(cid:104)

H13 =

tr(Σ2)tr(ΣCsaxg ΣCxgueΣCuesa) + tr(Σ2)tr(Cuexg Cxhuf )tr(ΣCsaxg Cxhuf Cuesa)
+ tr(Σ2)tr(Csaxg Cxhsb)tr(ΣCuesaCsbxhCxgue)
+ tr(Σ2)tr(CsaueCuf sb)tr(ΣCxgueCuf sbCsaxg )

+ tr(ΣCrdxhCxgsaΣCsaueΣCuexg Cxhrd) + tr(ΣCrduf Cuexg ΣCxgsaΣCsaueCuf rd)

+ tr(ΣCrdsbCsaueΣCuexg ΣCxgsaCsbrd)

+ tr(ΣCrdsbCsaueCuf xhCxgsaCsbrd)tr(Cuexg Cxhuf )

+ tr(ΣCrduf Cuexg CxhsbCsaueCuf rd)tr(Csaxg Cxhsb)

+ tr(ΣCrdxhCxgsaCsbuf Cuexg Cxhrd)tr(CsaueCuf sb)
+ tr(Σ2)tr(CsaueCuf sb)tr(Cuexg Cxhuf )tr(Csaxg Cxhsb)

+ tr(ΣCrdxhCxgsaΣCsaueCuf rd)tr(Cuexg Cxhuf )

+ tr(ΣCrdxhCxgsaCsbrd)tr(ΣCuesaCsbxhCxgue)

+ tr(ΣCrduf Cuexg ΣCxgsaCsbrd)tr(CsaueCuf sb)

+ tr(ΣCrdxhCxgsaCsbrd)tr(CsaueCuf sb)tr(Cuexg Cxhuf )

+ tr(ΣCrduf Cuexg CxhsbCsaueCuf xhCxgsaCsbrd)

+ tr(ΣCrdxhCxgsaCsbuf Cuexg CxhsbCsaueCuf rd)

tr(Σ2)tr(ΣCsaxg ΣCxgrcΣCrcsa) + tr(Σ2)tr(Crcxg Cxhrd)tr(ΣCsaxg CxhrdCrcsa)
+ tr(Σ2)tr(Csaxg Cxhsb)tr(ΣCrcsaCsbxhCxgrc)

+ tr(Σ2)tr(CsarcCrdsb)tr(ΣCxgrcCrdsbCsaxg )
+ tr(ΣCuf xhCxgsaΣCsarcΣCrcxg Cxhuf ) + tr(ΣCuf rdCrcxg ΣCxgsaΣCsarcCrduf )

+ tr(ΣCuf sbCsarcΣCrcxg ΣCxgsaCsbuf )

+ tr(ΣCuf sbCsarcCrdxhCxgsaCsbuf )tr(Crcxg Cxhrd)

+ tr(ΣCuf rdCrcxg CxhsbCsarcCrduf )tr(Csaxg Cxhsb)

+ tr(ΣCuf xhCxgsaCsbrdCrcxg Cxhuf )tr(CsarcCrdsb)
+ tr(Σ2)tr(CsarcCrdsb)tr(Crcxg Cxhrd)tr(Csaxg Cxhsb)
+ tr(ΣCuf xhCxgsaΣCsarcCrduf )tr(Crcxg Cxhrd)

+ tr(ΣCuf xhCxgsaCsbuf )tr(ΣCrcsaCsbxhCxgrc)

+ tr(ΣCuf rdCrcxg ΣCxgsaCsbuf )tr(CsarcCrdsb)

+ tr(ΣCuf xhCxgsaCsbuf )tr(CsarcCrdsb)tr(Crcxg Cxhrd)

+ tr(ΣCuf rdCrcxg CxhsbCsarcCrdxhCxgsaCsbuf )

+ tr(ΣCuf xhCxgsaCsbrdCrcxg CxhsbCsarcCrduf )

(cid:105)

(cid:104)

H14 =

tr(Σ2)tr(ΣCsaueΣCuercΣCrcsa) + tr(Σ2)tr(CrcueCuf rd)tr(ΣCsaueCuf rdCrcsa)
+ tr(Σ2)tr(CsaueCuf sb)tr(ΣCrcsaCsbuf Cuerc)
+ tr(Σ2)tr(CsarcCrdsb)tr(ΣCuercCrdsbCsaue)
+ tr(ΣC∗uf CuesaΣCsarcΣCrcueCuf∗) + tr(ΣC∗rdCrcueΣCuesaΣCsarcCrd∗)
+ tr(ΣC∗sbCsarcΣCrcueΣCuesaCsb∗)
+ tr(ΣC∗sbCsarcCrduf CuesaCsb∗)tr(CrcueCuf rd)
+ tr(ΣC∗rdCrcueCuf sbCsarcCrd∗)tr(CsaueCuf sb)
+ tr(ΣC∗uf CuesaCsbrdCrcueCuf∗)tr(CsarcCrdsb)
+ tr(Σ2)tr(CsarcCrdsb)tr(CrcueCuf rd)tr(CsaueCuf sb)

104

+ tr(ΣC∗uf CuesaΣCsarcCrd∗)tr(CrcueCuf rd)
+ tr(ΣC∗uf CuesaCsb∗)tr(ΣCrcsaCsbuf Cuerc)
+ tr(ΣC∗rdCrcueΣCuesaCsb∗)tr(CsarcCrdsb)
+ tr(ΣC∗uf CuesaCsb∗)tr(CsarcCrdsb)tr(CrcueCuf rd)
+ tr(ΣC∗rdCrcueCuf sbCsarcCrduf CuesaCsb∗)
+ tr(ΣC∗uf CuesaCsbrdCrcueCuf sbCsarcCrd∗)

(cid:105)

tr(ΣCsaxxΣCxgueΣCuercΣCrcsa) + tr(Cuexg Cxhuf )tr(ΣCsaxg Cxhuf CuercΣCrcsa)

(cid:104)

H15 =

+ tr(ΣCsaxg CxhrdCrcsa)tr(ΣCuercCrdxhCxgue)

+ tr(CrcueCuf rd)tr(ΣCsaxg ΣCxgueCuf rdCrcsa)

+ tr(Csaxg Cxhsb)tr(ΣCrcsaCsbxhCxgueΣCuerc)

+ tr(ΣCrcsaCsbuf Cuerc)tr(ΣCxgueCuf sbCsaxg )

+ tr(CsarcCrdsb)tr(ΣCuercCrdsbCsaxg ΣCxgue)

+ tr(CsarcCrdsb)tr(Cuexg Cxhuf )tr(CrcueCuf xhCxgsaCsbrd)

+ tr(CsarcCrdxhCxgsaCsbuf Cuexg CxhrdCrcueCuf sb)

+ tr(CrcueCuf rd)tr(Csaxg Cxhsb)tr(CsarcCrduf Cuexg Cxhsb)

+ tr(CrcueCuf rd)tr(Cuexg Cxhuf )tr(ΣCsaxg CxhrdCrcsa)

+ tr(Csaxg Cxhsb)tr(Cuexg Cxhuf )tr(ΣCrcsaCsbuf Cuerc)

+ tr(CsarcCrdsb)tr(Csaxg Cxhsb)tr(ΣCuercCrdxhCxgue)

+ tr(CsarcCrdsb)tr(CrcueCuf rd)tr(ΣCxgueCuf sbCsaxg )

+ tr(CsarcCrdsb)tr(CrcueCuf rd)tr(Cuexg Cxhuf )tr(CxgsaCsbxh)

+ tr(CsarcCrdsb)tr(CrcueCuf xhCxgsaCsbuf Cuexg Cxhrd)

+ tr(Csaxg Cxhsb)tr(CsarcCrduf Cuexg CxhrdCrcueCuf sb)

105

(cid:105)

(cid:104)

H16 =

tr(ΣCsaxxΣCuexg ΣCxgrcΣCrcsa) + tr(CxgueCuf xh)tr(ΣCsaueCuf xhCxgrcΣCrcsa)

+ tr(ΣCsaueCuf rdCrcsa)tr(ΣCxgrcCrduf Cuexg )

+ tr(Crcxg Cxhrd)tr(ΣCsaueΣCuexg CxhrdCrcsa)

+ tr(CsaueCuf sb)tr(ΣCrcsaCsbuf Cuexg ΣCxgrc)

+ tr(ΣCrcsaCsbxhCxgrc)tr(ΣCuexg CxhsbCsaue)

+ tr(CsarcCrdsb)tr(ΣCxgrcCrdsbCsaueΣCuexg )

+ tr(CsarcCrdsb)tr(CxgueCuf xh)tr(Crcxg Cxhuf CuesaCsbrd)

+ tr(CsarcCrduf CuesaCsbxhCxgueCuf rdCrcxg Cxhsb)

+ tr(Crcxg Cxhrd)tr(CsaueCuf sb)tr(CsarcCrdxhCxgueCuf sb)

+ tr(Crcxg Cxhrd)tr(CxgueCuf xh)tr(ΣCsaueCuf rdCrcsa)

+ tr(CsaueCuf sb)tr(CxgueCuf xh)tr(ΣCrcsaCsbxhCxgrc)

+ tr(CsarcCrdsb)tr(CsaueCuf sb)tr(ΣCxgrcCrduf Cuexg )

+ tr(CsarcCrdsb)tr(Crcxg Cxhrd)tr(ΣCuexg CxhsbCsaue)

+ tr(CsarcCrdsb)tr(Crcxg Cxhrd)tr(CxgueCuf xh)tr(CuesaCsbuf )

+ tr(CsarcCrdsb)tr(Crcxg Cxhuf CuesaCsbxhCxgueCuf rd)

+ tr(CsaueCuf sb)tr(CsarcCrdxhCxgueCuf rdCrcxg Cxhsb)

(cid:105)

(cid:104)

H17 =

tr(ΣCsaxxΣCxgrcΣCrcueΣCuesa) + tr(Crcxg Cxhrd)tr(ΣCsaxg CxhrdCrcueΣCuesa)

+ tr(ΣCsaxg Cxhuf Cuesa)tr(ΣCrcueCuf xhCxgrc)

+ tr(CuercCrduf )tr(ΣCsaxg ΣCxgrcCrduf Cuesa)

+ tr(Csaxg Cxhsb)tr(ΣCuesaCsbxhCxgrcΣCrcue)

+ tr(ΣCuesaCsbrdCrcue)tr(ΣCxgrcCrdsbCsaxg )

106

+ tr(CsaueCuf sb)tr(ΣCrcueCuf sbCsaxg ΣCxgrc)

+ tr(CsaueCuf sb)tr(Crcxg Cxhrd)tr(CuercCrdxhCxgsaCsbuf )

+ tr(CsaueCuf xhCxgsaCsbrdCrcxg Cxhuf CuercCrdsb)

+ tr(CuercCrduf )tr(Csaxg Cxhsb)tr(CsaueCuf rdCrcxg Cxhsb)

+ tr(CuercCrduf )tr(Crcxg Cxhrd)tr(ΣCsaxg Cxhuf Cuesa)

+ tr(Csaxg Cxhsb)tr(Crcxg Cxhrd)tr(ΣCuesaCsbrdCrcue)

+ tr(CsaueCuf sb)tr(Csaxg Cxhsb)tr(ΣCrcueCuf xhCxgrc)

+ tr(CsaueCuf sb)tr(CuercCrduf )tr(ΣCxgrcCrdsbCsaxg )

+ tr(CsaueCuf sb)tr(CuercCrduf )tr(Crcxg Cxhrd)tr(CxgsaCsbxh)

+ tr(CsaueCuf sb)tr(CuercCrdxhCxgsaCsbrdCrcxg Cxhuf )

+ tr(Csaxg Cxhsb)tr(CsaueCuf rdCrcxg Cxhuf CuercCrdsb)

.

(cid:105)

107

Proof. By deﬁnition of Vsasb(i, j),

(cid:110)

E

Vsasb(i, j)Vrcrd(i, j)Vueuf (i, j)Vxgxh(i, j)

= E

(Y T

isaYjsb

)2(Y T

ircYjrd

)2(Y T

iueYjuf

)2(Y T

ixg Yjxh

= E

(

YisakYjsbk)2(

YircmYjrdm)2(

YiueoYjuf o)2(

YixgqYjxhq)2(cid:111)

(cid:88)

q

E

YisakYjsbkYisalYjsblYircmYjrdmYircnYjrdnYiueoYjuf oYiuepYjuf p

(cid:110)
(cid:110)
(cid:88)

(cid:88)
(cid:110)

k

(cid:88)

m

(cid:111)

=

=

C
× YixgqYjxhqYixgwYjxhw
(cid:110)
(cid:88)
(cid:110)

E

C
× E

(cid:111)

(cid:88)

o

)2(cid:111)

(cid:111)

(cid:111)

YisakYisalYircmYircnYiueoYiuepYixgqYixgw

YjsbkYjsblYjrdmYjrdnYjuf oYjuf pYjxhqYjxhw

(3.18)
where C represents the summation over the p components of the vector Y.. for k, l, m, n, o, p, q,
and w. For each of the expectation terms in (3.18) we apply lemma 7 and sum over the set
C. After some tedious algebra it follows that

,

(cid:110)

E

Vsasb(i, j)Vrcrd(i, j)Vueuf (i, j)Vxgxh(i, j)

(cid:111)

= H1 + H2 + H3 + H4 + H5 + H6

+ H7 + H8 + H9 + H10 + H11 + H12
+ H13 + H14 + H15 + H16 + H17.

(cid:3)

3.8.2 Proofs of theorems

In this section we provide proofs for the theorems given in Chapter 3. Without loss of
generality, assume µt = 0 for all t ∈ {1, . . . , T}, since the test statistic, ˆDnt, is invariant

108

with respect to µt.

Proof of Theorem 7. With the addition of Condition 4 as an assumption, the proof is

similar to the proof of Theorem 2. Condition (a) and Condition (b) hold in the proof of

Theorem 2, and the martingale central limit theorem holds as all required terms are smaller

order with T diverging.

(cid:3)

Proof of Theorem 8. To establish the asymptotic distribution of Mn under H0, we must
show convergence of the ﬁnite-dimensional distributions and the tightness of the stochastic
process maxt∈T σ−1
ˆDntc)T
for t1 < ··· < tc is nearly identical to the proof in 2.7 when T is considered ﬁnite. Thus, it
remains for us to show the tightness of maxt∈T σ−1
ˆDnt so as to conclude Mn converges to
maxt∈T Zt, where Zt is a Gaussian process with mean 0 and correlation Rz

ˆDnt. The joint asymptotic normality of (σ−1
nt1,0

ˆDnt1, . . . , σ−1

ntc,0

nt,0

nt,0

By deﬁnition, ˆDnt = ˆDnt,0 + ˆDnt,2 − 2 ˆDnt,1, where

t(cid:88)

T(cid:88)

(cid:16)

s1=1

s2=t+1

ˆDnt,k =

Us1s1,k + Us2s2,k − Us1s2,k − Us2s1,k

for k ∈ {0, 1, 2}. Furthermore, by Lemmas 3 and 4 in 2.7.1, ˆDnt,1 = op( ˆDnt,0) and ˆDnt,2 =
op( ˆDnt,0). Therefore, to show the tightness of max1≤t<T σ−1
ˆDnt we can focus on the term
ˆDnt,0 where

nt,0

(cid:17)

(cid:17)

t(cid:88)

T(cid:88)

(cid:16)

s1=1

s2=t+1

ˆDnt,0 =

Let t = [T ν], where ν = j/T (j = 1, . . . , T − 1). Deﬁne Gn(ν) as follows:

Us1s10 + Us2s20 − Us1s20 − Us2s10
(cid:112)n(n − 1)

ˆDn[T ν],0

Gn(ν) =

tr(Σ2)T

3
2

.

(3.19)

(3.20)

where the term preceding ˆDn[T ν],0 is the order of σ−1
show the tightness of maxt∈T σ−1

nt,0

nt,0 in terms of n, p, and T . Thus, to

ˆDnt it is equivalent to show the tightness of (3.20).

109

Ws1s2 −
T(cid:88)

s2=[T η]+1

T(cid:88)

Ws1s2 −
(cid:110)
n(cid:88)

1

n(n − 1)

(cid:111)(cid:21)

˜Ws1s2

,

i(cid:54)=j

(cid:112)n(n − 1)

(cid:18) [T η](cid:88)

tr(Σ2)T

3
2

s1=[T ν]+1

(cid:112)n(n − 1)
[T ν](cid:88)

tr(Σ2)T

3
2

−

(cid:20)
[T η](cid:88)

[T η](cid:88)
(cid:110)

s1=1

s2=[T ν]+1

s1=[T ν]+1

s2=[T η]+1

1

n(n − 1)

[T ν](cid:88)

[T η](cid:88)

s1=1

s2=[T ν]+1

(cid:19)

Ws1s2

,

(cid:111)

˜Ws1s2

n(cid:88)

i(cid:54)=j

1

(cid:112)n(n − 1)tr(Σ2)T
[T ν](cid:88)

[T η](cid:88)

3
2

˜Ws1s2

n(cid:88)
(cid:19)

i(cid:54)=j

(cid:18) [T η](cid:88)

T(cid:88)

s1=[T ν]+1

s2=[T η]+1

˜Ws1s2−

,

=

=

=

=

Consider the diﬀerence Gn(η) − Gn(ν), such that η > ν. Let η = i/T (i = 1, . . . , T − 1).

By deﬁnition, Us1s10 = {n(n − 1)}−1(cid:80)
(cid:18) [T η](cid:88)

(cid:112)n(n − 1)

Gn(η) − Gn(ν) =

i(cid:54)=j(Y T
is1

Yjs1

T(cid:88)

)2. Thus, for η, ν ∈ (0, 1),

[T ν](cid:88)

T(cid:88)

s1=1

s2=[T ν]+1

(cid:19)

Ws1s2

,

tr(Σ2)T

3
2

s1=1

s2=[T η]+1

s1=1

s2=[T ν]+1

1

(cid:112)n(n − 1)tr(Σ2)T
(cid:80)T

n(cid:88)
˜Ws1s2 −(cid:80)[T ν]

i(cid:54)=j
)2 − (Y T
is1

)2 + (Y T
is2

Yjs2

3
2

f (i, j),

(3.21)

where ˜Ws1s2 = (Y T
is1

f (i, j) = (cid:80)[T η]

Yjs1

Yjs1
˜Ws1s2. We will bound the
fourth moment of (3.21) to ultimately show the tightness of (3.20). First, we compute some

s1=[T ν]+1

s2=[T ν]+1

s2=[T η]+1

Yjs2

s1=1

)2, and

)2 − (Y T
is2

(cid:80)[T η]

moments of f for various indices.

110

Under the null hypothesis and for i (cid:54)= j,

(cid:110)

(cid:111)

(cid:18) [T η](cid:88)

E

f (i, j)

= E

T(cid:88)

[T ν](cid:88)

[T η](cid:88)

(cid:19)

,

˜Ws1s2

˜Ws1s2 −

s1=[T ν]+1

s2=[T η]+1

s1=1

s2=[T ν]+1

=

[T η](cid:88)
[T ν](cid:88)

s1=[T ν]+1
−

T(cid:88)
[T η](cid:88)

(cid:104)
(cid:104)

s2=[T η]+1

s1=1

s2=[T ν]+1

(cid:105)
) − 2tr(Σs1Σs2)
(cid:105)
) − 2tr(Σs1Σs2)

,

tr(Σ2
s1

) + tr(Σ2
s2

tr(Σ2
s1

) + tr(Σ2
s2

= 0,

(3.22)

) − 2tr(Σs1Σs2) = tr(Σ2) + tr(Σ2) − 2tr(Σ2) = 0. Deﬁne the fol-
R1 ≡
r2=[T ν]+1. The

s1=[T ν]+1

s1=1

r1=1

(cid:80)T
s2=[T η]+1; (cid:80)
(cid:80)[T η]
R2 ≡ (cid:80)[T ν]
(cid:17)(cid:27)
˜Ws1s2 −(cid:88)
(cid:105)

˜Ws1s2

R2

,

E

= E

S1

S2

R1

˜Ws1s2

r1=[T ν]+1

f (i, j)f (i, j)

) + tr(Σ2
s2

since tr(Σ2
s1

second moment under the null hypothesis is given by

(cid:80)T
r2=[T η]+1; (cid:80)
(cid:26)(cid:16)(cid:88)
(cid:111)

lowing notation for the double summation: (cid:80)
S1 ≡ (cid:80)[T η]
(cid:80)[T η]
(cid:80)[T η]
s2=[T ν]+1; (cid:80)
(cid:17)(cid:16)(cid:88)
(cid:110)
(cid:104) ˜Ws1s2
2(cid:88)

S2 ≡ (cid:80)[T ν]
˜Ws1s2 −(cid:88)
(cid:88)
(−1)|x−y|(cid:88)
(−1)|x−y|(cid:88)
(cid:105)

(cid:88)

where Vsasb(i, j) = (Y T
isa

)2. Under the null hypothesis,

2(cid:88)

2(cid:88)

˜Wr1r2

Sx

Ry

Sx

Ry

a,b,c,d=1

Yjsb

=

=

x,y=1

E

,

x,y=1

(cid:104)

(−1)|a−b|+|c−d|E

(cid:104)

(cid:105)

,

Vsasb(i, j)Vrcrd(i, j)

E

Vsasb(i, j)Vrcrd(i, j)

= 2tr2(CsbrdCrcsa) + 2tr(CrcsaCsbrdCrcsaCsbrd)

+ 2tr(ΣCrdsbΣCsbrd) + 2tr(ΣCrcsaΣCsarc) + tr2(Σ2).

111

Therefore, under Condition 1,

(cid:110)

(cid:111)

E

f (i, j)f (i, j)

= C

x,y=1

for some constant C.

2(cid:88)

(−1)|x−y|(cid:88)

Sx

(cid:88)

Ry

2(cid:88)

(−1)|a−b|+|c−d|tr2(CsbrdCrcsa)

a,b,c,d=1

(3.23)

(cid:110)

E

f (i, j)f (i, k)

=

Next, consider mutually diﬀerent indices i, j, k. Thus,

(cid:111)

2(cid:88)

(−1)|x−y|(cid:88)

(cid:88)

2(cid:88)

x,y=1

Sx

Ry

(−1)|a−b|+|c−d|E

Vsasb(i, j)Vrcrd(i, k)

.

a,b,c,d=1

(3.24)

(cid:105)

Under the null hypothesis,

(cid:105)

E

Vsasb(i, j)Vrcrd(i, k)

= tr2(Σ2) + 2tr(ΣCrcsaΣCsarc).

Hence,

(−1)|a−b|+|c−d|(cid:110)
(cid:111)

(cid:110)

a,b,c,d=1

(cid:111)

(cid:110)

Therefore, E

f (i, j)f (i, k)

= 0. Lastly, if we consider the mutually diﬀerent indices i, j, k, l,

then E

f (i, j)f (k, l)

= 0 due to independence and the fact that E

f (i, j)

= 0.

Consider the diﬀerence Gn(η) − Gn(ν) squared.

(cid:111)

tr2(Σ2) + 2tr(ΣCrcsaΣCsarc)

= 0.

(3.25)

(cid:104)

2(cid:88)

(cid:104)

(cid:111)

(cid:110)
(cid:27)2

{Gn(η) − Gn(ν)}2 = {n(n − 1)tr2(Σ2)T 3}−1

f (i, j)

,

i(cid:54)=j

(cid:26)(cid:88)
= 2{n(n − 1)tr2(Σ2)T 3}−1(cid:88)
+ 4{n(n − 1)tr2(Σ2)T 3}−1 (cid:88)
+ {n(n − 1)tr2(Σ2)T 3}−1 (cid:88)

i(cid:54)=j

i(cid:54)=j(cid:54)=k

f (i, j)f (i, j)

f (i, j)f (i, k)

f (i, j)f (k, l).

i(cid:54)=j(cid:54)=k(cid:54)=l

112

For any real numbers a, b, and c, (a + b + c)2 ≤ 4a2 + 4b2 + 2c2. Thus,

{Gn(η) − Gn(ν)}4 ≤ 16{n2(n − 1)2tr4(Σ2)T 6}−1

f (i, j)f (i, j)

i(cid:54)=j
+ 64{n2(n − 1)2tr4(Σ2)T 6}−1

f (i, j)f (i, k)

+ 2{n2(n − 1)2tr4(Σ2)T 6}−1

f (i, j)f (k, l)

.

(cid:27)2

(cid:27)2
(cid:27)2

(cid:27)2(cid:21)

(cid:26)(cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)

i(cid:54)=j(cid:54)=k

i(cid:54)=j(cid:54)=k(cid:54)=l

i(cid:54)=j

(cid:20)(cid:26)(cid:88)
(cid:20)(cid:26) (cid:88)
(cid:20)(cid:26) (cid:88)

i(cid:54)=j(cid:54)=k

i(cid:54)=j(cid:54)=k(cid:54)=l

(cid:20)

(cid:21)

Taking the expectation of both sides of the above inequality it follows that

{Gn(η) − Gn(ν)}4

E

≤ 16{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (i, j)

+ 64{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (i, k)

+ 2{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (k, l)

,

(cid:27)2(cid:21)
(cid:27)2(cid:21)

≡ I1 + I2 + I3.

(3.26)

To bound the expectation in (3.26) we need the order of I1, I2, and I3. Thus, we need to

expand multiple summations where the summation is across multiple non-identical indices.

First, consider the possible indices for expanding the term inside the expectation for I1 in

(3.26). Consider(cid:110)(cid:88)

i(cid:54)=j

(cid:111)2

f (i, j)f (i, j)

(cid:88)

i(cid:54)=j

(cid:88)

i1(cid:54)=j1

=

f (i, j)f (i, j)f (i1, j1)f (i1, j1).

(3.27)

Let Dc = {i, j} ∪ {i1, j1} be the set of indices that are not equivalent to each other where
c represents the number of indices that are equivalent to each other in two sets {i, j} and
{i1, j1}. If there are no equivalent indices, then

D0 = {(i, j, i1, j1)}.

113

Hence, the summation over D0 is given by

(cid:88)

i(cid:54)=i1(cid:54)=j(cid:54)=j1

f (i, j)f (i, j)f (i1, j1)f (i1, j1)

(3.28)

If there is one equivalent index, then

D1 = {(i = i1, j, j1), (i = j1, j, i1), (i, j = i1, j1), (i, j = j1, i1)}.

Let ˜D1 be the set with one equivalent index that produces a unique combination of

f (i, j)f (i, j)f (i1, j1)f (i1, j1).

˜D1 = {(i = i1, j, j1)}.

Hence, the summation over D1 is equivalent to

4f (i, j)f (i, j)f (i, j1)f (i, j1)

(3.29)

If there are two equivalent indices, then

D2 = {(i = i1, j = j1), (i = j1, j = i1)}.

Hence, the summation over D2 is given by

(cid:88)

i(cid:54)=j(cid:54)=j1

(cid:88)

i(cid:54)=j

(cid:27)2(cid:21)

As a result, from (3.28) – (3.30),

(cid:20)(cid:26)(cid:88)

i(cid:54)=j

E

f (i, j)f (i, j)

= E

f (i, j)f (i, j)f (i1, j1)f (i1, j1)

(3.31)

2f (i, j)f (i, j)f (i, j)f (i, j).

(3.30)

(cid:27)

(cid:27)

(cid:27)

+ 4E

f (i, j)f (i, j)f (i, j1)f (i, j1)

+ 2E

f (i, j)f (i, j)f (i, j)f (i, j)

.

(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26)(cid:88)

i(cid:54)=i1(cid:54)=j(cid:54)=j1

i(cid:54)=j(cid:54)=j1

i(cid:54)=j

114

Thus,

I1 = 16{n2(n − 1)2tr4(Σ2)T 6}−1E

= 16{n2(n − 1)2tr4(Σ2)T 6}−1E

(cid:27)2(cid:21)

f (i, j)f (i, j)

i(cid:54)=j

(cid:20)(cid:26)(cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26)(cid:88)

i(cid:54)=j

i(cid:54)=j(cid:54)=j1

i(cid:54)=i1(cid:54)=j(cid:54)=j1

f (i, j)f (i, j)f (i1, j1)f (i1, j1)

+ 64{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (i, j)f (i, j1)f (i, j1)

+ 32{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (i, j)f (i, j)f (i, j)

(cid:27)

(cid:27)

(cid:27)

(3.32)

(3.33)

≡ R1 + R2 + R3.

We now show the order for each of R1, R2, and R3 in terms of n, p, and T . For R1, consider
(cid:111)
the order for E{f (i, j)f (i, j)f (i1, j1)f (i1, j1)} for the mutually diﬀerent indices.

(cid:110)

(cid:110)

(cid:111)

E

f (i, j)f (i, j)f (i1, j1)f (i1, j1)

= E

(cid:20)

=

C

f (i, j)f (i, j)

E

f (i1, j1)f (i1, j1)

(cid:110)
(cid:111)
2(cid:88)
(−1)|x−y|(cid:88)
(cid:21)2
2(cid:88)
(−1)|a−b|+|c−d|tr2(CsbrdCrcsa)

(cid:88)

Ry

Sx

x,y=1

×

a,b,c,d=1

(cid:20)

(cid:16) C

for some constant C. Therefore,

R1 (cid:16)

C[n2(n − 1)2]

n2(n − 1)2tr4(Σ2)T 6

(cid:20)

(cid:110)
(cid:110)

tr2(Σ2)

T 2([T η] − [T ν])

tr2(Σ2)

T 2([T η] − [T ν])

(cid:111)(cid:21)2
(cid:111)(cid:21)2

.

,

(3.34)

(3.35)

(cid:16) C

([T η] − [T ν])2

T 2

,

115

for some constant C. Next, consider the term R3. From lemma 7,
(cid:110)
E{Vsasb(i, j)Vrcrd(i, j)Vueuf (i, j)Vxgxh(i, j)} was calculated up to a constant. Thus,

(cid:111)

E

f (i, j)f (i, j)f (i, j)f (i, j)

is given by

2(cid:88)

×

w,x,y,z=1

(−1)|w−x|+|y−z|(cid:88)
2(cid:88)

Sw

(cid:88)

(cid:88)

(cid:88)

Rx

Uy

X z

(−1)|a−b|+|c−d|+|e−f|+|g−h|E

a,b,c,d,e,f,g,h=1

(cid:110)

Vsasb(i, j)Vrcrd(i, j)

× Vueuf (i, j)Vxgxh(i, j)

(cid:111)

.

Under the null hypothesis,

(cid:110)

E

Therefore,

(cid:20)

(cid:111) (cid:16) C
(cid:20)

(cid:110)

(cid:110)

f (i, j)f (i, j)f (i, j)f (i, j)

tr2(Σ2)

T 2([T η] − [T ν])

R3 (cid:16)

C

[n(n − 1)]tr4(Σ2)T 6

tr2(Σ2)

T 2([T η] − [T ν])

and thus R3 = o(R1). For the ﬁnal term in I1, R2, consider the order of
E{f (i, j)f (i, j)f (i, j1)f (i, j1)}. By the Cauchy-Schwarz inequality

(cid:110)

E

f (i, j)f (i, j)f (i, j1)f (i, j1)

E

f (i, j)f (i, j)f (i, j)f (i, j)

(cid:111)(cid:21)1/2

f (i, j1)f (i, j1)f (i, j1)f (i, j1)

(cid:20)

(cid:111) ≤

(cid:110)

(cid:110)
(cid:20)
(cid:110)

E

E

×

(cid:18)

(cid:16) O

f (i, j)f (i, j)f (i, j)f (i, j)

(cid:111)(cid:21)2

.

(cid:111)(cid:21)2

,

(cid:111)(cid:21)1/2

(cid:111)(cid:19)

.

Therefore, based on the above results for R3 it follows that R2 = o(R1). As a result,

for some constant C.

I1 ≤ C

([T η] − [T ν])2

T 2

,

116

(3.36)

(3.37)

Next, we investigate the order of I2. Consider the possible indices for expanding

(cid:110) (cid:88)

i(cid:54)=j(cid:54)=k

(cid:111)2

(cid:88)

(cid:88)

i(cid:54)=j(cid:54)=k

i1(cid:54)=j1(cid:54)=k1

f (i, j)f (i, k)

=

f (i, j)f (i, k)f (i1, j1)f (i1, k1).

(3.38)

Let Ec = {i, j, k} ∪ {i1, j1, k1} be the set of indices that are not equivalent to each other,
where c represents the number of indices that are equivalent to each other in two sets {i, j, k}
and {i1, j1, k1}. If there are no equivalent indices, then

E0 = {(i, j, k, i1, j1, k1)}.

Hence, the summation over E0 is given by

(cid:88)

i(cid:54)=i1(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1

If there is one equivalent index, then

f (i, j)f (i, k)f (i1, j1)f (i1, k1).

(3.39)

E1 = {(i = i1, j, k, j1, k1), (i = j1, j, k, i1, k1), (i = k1, j, k, i1, j1),

(i, j = i1, k, j1, k1), (i, j = j1, k, i1, k1), (i, j = k1, k, i1, j1),
(i, j, k = i1, j1, k1), (i, j, k = j1, i1, k1), (i, j, k = k1, i1, j1)}.

Let ˜E1 be the set with one equivalent index that produces a unique combination of

f (i, j)f (i, k)f (i1, j1)f (i1, k1), such that

˜E1 = {(i = i1, j, k, j1, k1), (i = j1, j, k, i1, k1), (i, j = j1, k, i1, k1)}.

Hence, the summation over E1 is equivalent to

f (i, j)f (i, k)f (i, j1)f (i, k1)

(3.40)

(cid:88)

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1

(cid:88)
(cid:88)

+

+

i(cid:54)=i1(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=i1(cid:54)=j(cid:54)=k(cid:54)=k1

4f (i, j)f (i, k)f (i1, i)f (i1, k1)

4f (i, j)f (i, k)f (i1, j)f (i1, k1)

117

If there are two equivalent indices, then

E2 = {(i = i1, j = j1, k, k1), (i = i1, j = k1, k, j1), (i = j1, j = i1, k, k1),

(i = j1, j = k1, k, i1), (i = k1, j = i1, k, j1), (i = k1, j = j1, k, i1),

(i = i1, k = j1, j, k1), (i = i1, k = k1, j, j1), (i = j1, k = i1, j, k1),

(i = j1, k = k1, k, i1), (i = k1, k = i1, j, j1), (i = k1, k = j1, j, i1),

(j = i1, k = j1, i, k1), (j = i1, k = k1, i, j1), (j = j1, k = i1, i, k1),
(j = j1, k = k1, i, i1), (j = k1, k = i1, i, j1), (j = k1, k = j1, i, i1)}.

Let ˜E2 be the set with two equivalent indices that produces a unique combination of

f (i, j)f (i, k)f (i1, j1)f (i1, k1).
˜E2 = {(i = i1, j = j1, k, k1), (i = j1, j = i1, k, k1), (i = j1, j = k1, k, i1), (j = j1, k = k1, i, i1)}.

Hence, the summation over E2 is equivalent to

4f (i, j)f (i, k)f (i, j)f (i, k1)

(3.41)

(cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=k1

+

+

+

(cid:88)
(cid:88)
(cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=i1(cid:54)=j(cid:54)=k

i(cid:54)=i1(cid:54)=j(cid:54)=k

4f (i, j)f (i, k)f (j, i)f (j, k1)

8f (i, j)f (i, k)f (i1, i)f (i1, j)

2f (i, j)f (i, k)f (i1, j)f (i1, k).

Lastly, if there are three equivalent indices, then

E3 = {(i = i1, j = j1, k = k1), (i = i1, j = k1, k = j1), (i = j1, j = i1, k = k1),
(i = j1, j = k1, k = i1), (i = k1, j = j1, k = i1), (i = k1, j = i1, k = j1)}.

Let ˜E3 be the set with two equivalent indices that produces a unique combination of

f (i, j)f (i, k)f (i1, j1)f (i1, k1) such that

˜E3 = {(i = i1, j = j1, k = k1), (i = j1, j = i1, k = k1)}.

118

Hence, the summation over E3 is equivalent to

2f (i, j)f (i, k)f (i, j)f (i, k)

+

4f (i, j)f (i, k)f (j, i)f (j, k)

(cid:88)

i(cid:54)=j(cid:54)=k

(cid:88)

i(cid:54)=j(cid:54)=k

(cid:27)2(cid:21)

As a result, from (3.39) – (3.42),

(cid:20)(cid:26) (cid:88)

i(cid:54)=j(cid:54)=k

E

f (i, j)f (i, k)

= E

f (i, j)f (i, k)f (i1, j1)f (i1, k1)

+ E

f (i, j)f (i, k)f (i, j1)f (i, k1)

(3.42)

(3.43)

(cid:27)

(cid:27)

(cid:27)
(cid:27)

f (i, j)f (i, k)f (i1, i)f (i1, k1)

f (i, j)f (i, k)f (i1, j)f (i1, k1)

(cid:27)
(cid:27)
(cid:27)
(cid:27)

f (i, j)f (i, k)f (i, j)f (i, k1)

f (i, j)f (i, k)f (j, i)f (j, k1)

f (i, j)f (i, k)f (i1, i)f (i1, j)

f (i, j)f (i, k)f (i1, j)f (i1, k)

+ 2E

f (i, j)f (i, k)f (i, j)f (i, k)

+ 4E

f (i, j)f (i, k)f (j, i)f (j, k)

.

(cid:27)
(cid:27)

i(cid:54)=i1(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1

i(cid:54)=i1(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=i1(cid:54)=j(cid:54)=k(cid:54)=k1

(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)

i(cid:54)=j(cid:54)=k

+ 4E

+ 4E

+ 4E

+ 4E

+ 8E

+ 2E

i(cid:54)=i1(cid:54)=j(cid:54)=k

i(cid:54)=i1(cid:54)=j(cid:54)=k

i(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=j(cid:54)=k

119

Thus,

I2 = 64{n2(n − 1)2tr4(Σ2)T 6}−1E

= 64{n2(n − 1)2tr4(Σ2)T 6}−1E

(cid:27)2(cid:21)

f (i, j)f (i, k)

,

+ 64{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 256{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 256{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 256{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 256{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 512{n2(n − 1)2tr4(Σ2)T 6}−1E

i(cid:54)=i1(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1

i(cid:54)=i1(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=i1(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=j(cid:54)=k

(cid:20)(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)

i(cid:54)=j(cid:54)=k

i(cid:54)=j(cid:54)=k

i(cid:54)=i1(cid:54)=j(cid:54)=k

i(cid:54)=i1(cid:54)=j(cid:54)=k

i(cid:54)=j(cid:54)=k(cid:54)=k1

i(cid:54)=j(cid:54)=k(cid:54)=k1

(cid:27)

(cid:27)

(cid:27)
(cid:27)

f (i, j)f (i, k)f (i1, j1)f (i1, k1)

f (i, j)f (i, k)f (i, j1)f (i, k1)

f (i, j)f (i, k)f (i1, i)f (i1, k1)

f (i, j)f (i, k)f (i1, j)f (i1, k1)

f (i, j)f (i, k)f (i, j)f (i, k1)

f (i, j)f (i, k)f (j, i)f (j, k1)

f (i, j)f (i, k)f (i1, i)f (i1, j)

(cid:27)
(cid:27)
(cid:27)
(cid:27)

(cid:27)
(cid:27)

+ 128{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (i, k)f (i1, j)f (i1, k)

+ 128{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (i, k)f (i, j)f (i, k)

+ 256{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (i, k)f (j, i)f (j, k)

,

≡ S1 + S2 + S3 + S4 + S5 + S6 + S7 + S8 + S9 + S10.

(3.44)

Under the null hypothesis, S1, S2, S3, and S4 are all zero. Terms S9 and S10 are of the same

order as R2. Thus, S9 = o(I1) and S10 = o(I1). Terms S5, S6, S7, and S8 are all of the same

order in terms of n as term R1. Additionally, using two iterations of the Cauchy Schwarz

inequality, these terms will be the same as R3 in terms of n and p. Thus, S5, S6, S7, and S8

120

I2 ≤ C
(cid:88)

(cid:88)

(cid:111)2

(cid:110) (cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=l

are all smaller order terms in comparison to I1. As a result, for some constant C,

([T η] − [T ν])2

T 2

.

(3.45)

Finally, we show the order of I3. Consider the possible indices for expanding

f (i, j)f (k, l)

=

f (i, j)f (k, l)f (i1, j1)f (k1, l1).

(3.46)

i(cid:54)=j(cid:54)=k(cid:54)=l

i1(cid:54)=j1(cid:54)=k1(cid:54)=l1

Let Fc = {i, j, k, l} ∪ {i1, j1, k1, l1} be the set of indices that are not equivalent to each
other, where c represents the number of indices that are equivalent to each other in two sets
{i, j, k, l} and {i1, j1, k1, l1}. If there are no equivalent indices, then

F0 = {(i, j, k, l, i1, j1, k1, l1)}.
(cid:88)

f (i, j)f (k, l)f (i1, j1)f (k1, l1).

Hence, the summation over F0 is given by

i(cid:54)=i1(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

If there is one equivalent index, then

F1 = {(i = i1, j, k, l, j1, k1, l1), (i = j1, j, k, l, i1, k1, l1), (i = k1, j, k, l, i1, j1, l1),

(i = l1, j, k, l, i1, j1, k1), (i, j = i1, k, l, j1, k1, l1), (i, j = j1, k, l, i1, k1, l1),

(i, j = k1, k, l, i1, j1, l1), (i, j = l1, k, l, i1, j1, k1)(i, j, k = i1, l, j1, k1, l1),

(i, j, k = j1, l, i1, k1, l1), (i, j, k = k1, l, i1, j1, l1), (i, j, k = l1, l, i1, j1, k1)

(i, j, k, l = i1, j1, k1, l1), (i, j, k, l = j1, i1, k1, l1), (i, j, k, l = k1, i1, j1, l1),
(i, j, k, l = l1, i1, j1, k1)}.
The summation over F1 is equivalent to(cid:88)

16f (i, j)f (k, l)f (i, j1)f (k1, l1).

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1
If there are two equivalent indices, then

(3.47)

(3.48)

F2 = {(i = i1, j = j1, k, l, k1, l1), (i = i1, j = k1, k, l, j1, l1), (i = i1, j = l1, k, l, j1, k1),

121

(i = j1, j = i1, k, l, k1, l1), (i = j1, j = k1, k, l, i1, l1), (i = j1, j = l1, k, l, i1, k1),

(i = k1, j = i1, k, l, j1, l1), (i = k1, j = j1, k, l, i1, l1), (i = k1, j = l1, k, l, i1, j1),

(i = l1, j = i1, k, l, j1, k1), (i = l1, j = j1, k, l, i1, k1), (i = l1, j = k1, k, l, i1, j1),

(i = i1, k = j1, j, l, k1, l1), (i = i1, k = k1, j, l, j1, l1), (i = i1, k = l1, j, l, j1, k1),

(i = j1, k = i1, j, l, k1, l1), (i = j1, k = k1, j, l, i1, l1), (i = j1, k = l1, j, l, i1, k1),

(i = k1, k = i1, j, l, j1, l1), (i = k1, k = j1, j, l, i1, l1), (i = k1, k = l1, j, l, i1, j1),

(i = l1, k = i1, j, l, j1, k1), (i = l1, k = j1, j, l, i1, k1), (i = l1, k = k1, j, l, i1, j1),

(i = i1, l = j1, j, k, k1, l1), (i = i1, l = k1, j, k, j1, l1), (i = i1, l = l1, j, k, j1, k1),

(i = j1, l = i1, j, k, k1, l1), (i = j1, l = k1, j, k, i1, l1), (i = j1, l = l1, j, k, i1, k1),

(i = k1, l = i1, j, k, j1, l1), (i = k1, l = j1, j, k, i1, l1), (i = k1, l = l1, j, k, i1, j1),

(i = l1, l = i1, j, k, j1, k1), (i = l1, l = j1, j, k, i1, k1), (i = l1, l = k1, j, k, i1, j1),

(j = i1, k = j1, i, l, k1, l1), (j = i1, k = k1, i, l, j1, l1), (j = i1, k = l1, i, l, j1, k1),

(j = j1, k = i1, i, l, k1, l1), (j = j1, k = k1, i, l, i1, l1), (j = j1, k = l1, i, l, i1, k1),

(j = k1, k = i1, i, l, j1, l1), (j = k1, k = j1, i, l, i1, l1), (j = k1, k = l1, i, l, i1, j1),

(j = l1, k = i1, i, l, j1, k1), (j = l1, k = j1, i, l, i1, k1), (j = l1, k = k1, i, l, i1, j1),

(j = i1, l = j1, i, k, k1, l1), (j = i1, l = k1, i, k, j1, l1), (j = i1, l = l1, i, k, j1, k1),

(j = j1, l = i1, i, k, k1, l1), (j = j1, l = k1, i, k, i1, l1), (j = j1, l = l1, i, k, i1, k1),

(j = k1, l = i1, i, k, j1, l1), (j = k1, l = j1, i, k, i1, l1), (j = k1, l = l1, i, k, i1, j1),

(j = l1, l = i1, i, k, j1, k1), (j = l1, l = j1, i, k, i1, k1), (j = l1, l = k1, i, k, i1, j1),

(k = i1, l = j1, i, j, k1, l1), (k = i1, l = k1, i, j, j1, l1), (k = i1, l = l1, i, j, j1, k1),

(k = j1, l = i1, i, j, k1, l1), (k = j1, l = k1, i, j, i1, l1), (k = j1, l = l1, i, j, i1, k1),

(k = k1, l = i1, i, j, j1, l1), (k = k1, l = j1, i, j, i1, l1), (k = k1, l = l1, i, j, i1, j1),
(k = l1, l = i1, i, j, j1, k1), (k = l1, l = j1, i, j, i1, k1), (k = l1, l = k1, i, j, i1, j1)}.

122

The summation over F2 is equivalent to

(cid:88)

+

+

(cid:88)
(cid:88)

4f (i, j)f (k, l)f (i, j)f (k1, l1)

i(cid:54)=j(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

28f (i, j)f (k, l)f (i, j1)f (j, l1)

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=l(cid:54)=l1

32f (i, j)f (k, l)f (i, j1)f (k, l1).

(3.49)

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=l(cid:54)=l1
If there are three equivalent indices, then
F3 = {(i = i1, j = j1, k = k1, l, l1), (i = i1, j = j1, k = l1, l, k1), (i = i1, j = k1, k = j1, l, l1),

(i = i1, j = k1, k = l1, l, j1), (i = i1, j = l1, k = j1, l, k1), (i = i1, j = l1, k = k1, l, j1),

(i = j1, j = i1, k = k1, l, l1), (i = j1, j = i1, k = l1, l, k1), (i = j1, j = k1, k = i1, l, l1),

(i = j1, j = k1, k = l1, l, i1), (i = j1, j = l1, k = i1, l, k1), (i = j1, j = l1, k = k1, l, i1),

(i = k1, j = i1, k = j1, l, l1), (i = k1, j = i1, k = l1, l, j1), (i = k1, j = j1, k = i1, l, l1),

(i = k1, j = j1, k = l1, l, i1), (i = k1, j = l1, k = i1, l, j1), (i = k1, j = l1, k = j1, l, i1),

(i = l1, j = i1, k = j1, l, k1), (i = l1, j = i1, k = k1, l, j1), (i = l1, j = j1, k = i1, l, k1),

(i = l1, j = j1, k = k1, l, i1), (i = l1, j = k1, k = i1, l, j1), (i = l1, j = k1, k = j1, l, i1),

(i = i1, j = j1, l = k1, k, l1), (i = i1, j = j1, l = l1, k, k1), (i = i1, j = k1, l = j1, k, l1),

(i = i1, j = k1, l = l1, k, j1), (i = i1, j = l1, l = j1, k, k1), (i = i1, j = l1, l = k1, k, j1),

(i = j1, j = i1, l = k1, k, l1), (i = j1, j = i1, l = l1, k, k1), (i = j1, j = k1, l = i1, k, l1),

(i = j1, j = k1, l = l1, k, i1), (i = j1, j = l1, l = i1, k, k1), (i = j1, j = l1, l = k1, k, i1),

(i = k1, j = i1, l = j1, k, l1), (i = k1, j = i1, l = l1, k, j1), (i = k1, j = j1, l = i1, k, l1),

(i = k1, j = j1, l = l1, k, i1), (i = k1, j = l1, l = i1, k, j1), (i = k1, j = l1, l = j1, k, i1),

(i = l1, j = i1, l = j1, k, k1), (i = l1, j = i1, l = k1, k, j1), (i = l1, j = j1, l = i1, k, k1),

(i = l1, j = j1, l = k1, k, i1), (i = l1, j = k1, l = i1, k, j1), (i = l1, j = k1, l = j1, k, i1),

(i = i1, k = j1, l = k1, j, l1), (i = i1, k = j1, l = l1, j, k1), (i = i1, k = k1, l = j1, j, l1),

(i = i1, k = k1, l = l1, j, j1), (i = i1, k = l1, l = j1, j, k1), (i = i1, k = l1, l = k1, j, j1),

123

(i = j1, k = i1, l = k1, j, l1), (i = j1, k = i1, l = l1, j, k1), (i = j1, k = k1, l = i1, j, l1),

(i = j1, k = k1, l = l1, j, i1), (i = j1, k = l1, l = i1, j, k1), (i = j1, k = l1, l = k1, j, i1),

(i = k1, k = i1, l = j1, j, l1), (i = k1, k = i1, l = l1, j, j1), (i = k1, k = j1, l = i1, j, l1),

(i = k1, k = j1, l = l1, j, i1), (i = k1, k = l1, l = i1, j, j1), (i = k1, k = l1, l = j1, j, i1),

(i = l1, k = i1, l = j1, j, k1), (i = l1, k = i1, l = l1, j, j1), (i = l1, k = j1, l = i1, j, k1),

(i = l1, k = j1, l = l1, j, i1), (i = l1, k = k1, l = i1, j, j1), (i = l1, k = k1, l = j1, j, i1),

(j = i1, k = j1, l = k1, i, l1), (j = i1, k = j1, l = l1, i, k1), (j = i1, k = k1, l = j1, i, l1),

(j = i1, k = k1, l = l1, i, j1), (j = i1, k = l1, l = j1, i, k1), (j = i1, k = l1, l = k1, i, j1),

(j = j1, k = i1, l = k1, i, l1), (j = j1, k = i1, l = l1, i, k1), (j = j1, k = k1, l = i1, i, l1),

(j = j1, k = k1, l = l1, i, i1), (j = j1, k = l1, l = i1, i, k1), (j = j1, k = l1, l = k1, i, i1),

(j = k1, k = i1, l = j1, i, l1), (j = k1, k = i1, l = l1, i, j1), (j = k1, k = j1, l = i1, i, l1),

(j = k1, k = j1, l = l1, i, i1), (j = k1, k = l1, l = i1, i, j1), (j = k1, k = l1, l = j1, i, i1),

(j = l1, k = i1, l = j1, i, k1), (j = l1, k = i1, l = k1, i, j1), (j = l1, k = j1, l = i1, i, k1),
(j = l1, k = j1, l = k1, i, i1), (j = l1, k = k1, l = i1, i, j1), (j = l1, k = k1, l = j1, i, i1)}.
(cid:88)

(cid:88)

The summation over F3 is equivalent to

32f (i, j)f (k, l)f (i, j)f (k, l1) +

64f (i, j)f (k, l)f (i, k)f (j, l1).

(3.50)

i(cid:54)=j(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=k(cid:54)=l(cid:54)=l1

Lastly, if there are four equivalent indices, then

F4 = {(i = i1, j = j1, k = k1, l = l1), (i = i1, j = j1, k = l1, l = k1),

(i = i1, j = k1, k = j1, l = l1), (i = i1, j = k1, k = l1, l = j1),

(i = i1, j = l1, k = j1, l = k1), (i = i1, j = l1, k = k1, l = j1),

(i = j1, j = i1, k = k1, l = l1), (i = j1, j = i1, k = l1, l = k1),

(i = j1, j = k1, k = i1, l = l1), (i = j1, j = k1, k = l1, l = i1),

(i = j1, j = l1, k = i1, l = k1), (i = j1, j = l1, k = k1, l = i1),

124

(i = k1, j = i1, k = j1, l = l1), (i = k1, j = i1, k = l1, l = j1),

(i = k1, j = j1, k = i1, l = l1), (i = k1, j = j1, k = l1, l = i1),

(i = k1, j = l1, k = i1, l = j1), (i = k1, j = l1, k = j1, l = i1),

(i = l1, j = i1, k = j1, l = k1), (i = l1, j = i1, k = k1, l = j1),

(i = l1, j = j1, k = i1, l = k1), (i = l1, j = j1, k = k1, l = i1),
(i = l1, j = k1, k = i1, l = j1), (i = l1, j = k1, k = j1, l = i1)}.

The summation over F4 is equivalent to

6f (i, j)f (k, l)f (i, j)f (k, l) +

(cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=l

18f (i, j)f (k, l)f (i, k)f (j, l).

(3.51)

(cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=l

(cid:20)(cid:110) (cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=l

As a result from (3.47) – (3.51),

(cid:111)2(cid:21)

(cid:26)

E

f (i, j)f (k, l)

= E

f (i, j)f (k, l)f (i1, j1)f (k1, l1)

(cid:27)
(cid:27)

(cid:27)
(cid:27)
(cid:27)

f (i, j)f (k, l)f (i, j1)f (k1, l1)

f (i, j)f (k, l)f (i, j)f (k1, l1)

f (i, j)f (k, l)f (i, j1)f (j, l1)

f (i, j)f (k, l)f (i, j1)f (k, l1)

f (i, j)f (k, l)f (i, j)f (k, l1)

f (i, j)f (k, l)f (i, k)f (j, l1)

(cid:27)
(cid:27)

(cid:27)
(cid:27)

i(cid:54)=i1(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

+ 16E

+ 4E

+ 28E

+ 32E

+ 32E

+ 64E

i(cid:54)=j(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

(cid:88)
(cid:26)
(cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=k(cid:54)=l

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=k(cid:54)=l

125

+ 6E

f (i, j)f (k, l)f (i, j)f (k, l)

+ 18E

f (i, j)f (k, l)f (i, k)f (j, l)

Thus,

I3 = 2{n2(n − 1)2tr4(Σ2)T 6}−1E

= 2{n2(n − 1)2tr4(Σ2)T 6}−1E

(cid:20)(cid:26) (cid:88)
(cid:26)

i(cid:54)=j(cid:54)=k(cid:54)=l

(cid:27)2(cid:21)

f (i, j)f (k, l)

,

+ 32{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 8{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 56{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 64{n2(n − 1)2tr4(Σ2)T 6}−1E

+ 64{n2(n − 1)2tr4(Σ2)T 6}−1E

(cid:27)
(cid:27)

f (i, j)f (k, l)f (i1, j1)f (k1, l1)

f (i, j)f (k, l)f (i, j1)f (k1, l1)

f (i, j)f (k, l)f (i, j)f (k1, l1)

f (i, j)f (k, l)f (i, j1)f (j, l1)

f (i, j)f (k, l)f (i, j1)f (k, l1)

f (i, j)f (k, l)f (i, j)f (k, l1)

(cid:27)
(cid:27)
(cid:27)

(cid:27)
(cid:27)

i(cid:54)=i1(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=k(cid:54)=k1(cid:54)=l(cid:54)=l1

(cid:88)
(cid:26)
(cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)
(cid:26) (cid:88)

i(cid:54)=j(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=k(cid:54)=l

i(cid:54)=j(cid:54)=k(cid:54)=l

i(cid:54)=j(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=l(cid:54)=l1

i(cid:54)=j(cid:54)=j1(cid:54)=k(cid:54)=l(cid:54)=l1

(cid:27)
(cid:27)

,

(cid:110)

(cid:111)

(3.52)

= 0 for

f (i, j)

+ 128{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (k, l)f (i, k)f (j, l1)

+ 12{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (k, l)f (i, j)f (k, l)

+ 36{n2(n − 1)2tr4(Σ2)T 6}−1E

f (i, j)f (k, l)f (i, k)f (j, l)

≡ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q7 + Q8 + Q9.

Due to the mutually diﬀerent indices, Q1, Q2, Q3, Q4 all equal zero since E

(cid:110)

(cid:111)

(cid:110)

(cid:111)

(cid:111)

(cid:110)

(cid:111)

(cid:110)

i diﬀerent than j. Furthermore, under the null hypothesis, Q5, Q6, and Q7 all equal zero.

For Q5, consider E

f (i, j)f (k, l)f (i, j1)f (k, l1)

. Due to the mutually diﬀerent indices

E

f (i, j)f (k, l)f (i, j1)f (k, l1)

= E

f (i, j)f (i, j1)

E

f (k, l)f (k, l1)

. By (3.25), each of

the expectation terms is zero and thus Q5 = 0. Similarly, for

126

(cid:110)

(cid:110)

(cid:110)

Q6, E

f (i, j)f (k, l)f (i, j)f (k, l1)

the term E

f (k, l)f (k, l1)

E

f (i, j)f (k, l)f (i, k)f (j, l1)

= E

. Again, by (3.25)

S1

S2

(cid:111)

(cid:111)

(cid:111)

(cid:111)

f (k, l)f (k, l1)

˜Ws1s2(i, j)

˜Wr1r2(k, l)

= E

f (i, j)f (i, j)

E

is zero. Thus, Q6 = 0. To see that Q7 is zero, consider

(cid:111)
(cid:17)
(cid:17)
(cid:17)
(cid:17)(cid:27)

(cid:110)
(cid:26)(cid:16)(cid:88)
×(cid:16)(cid:88)
×(cid:16)(cid:88)
×(cid:16)(cid:88)
2(cid:88)

(cid:110)
˜Ws1s2(i, j) −(cid:88)
˜Wr1r2(k, l) −(cid:88)
˜Wu1u2(i, k) −(cid:88)
˜Wx1x2(j, l1) −(cid:88)
(−1)|a−b|+|c−d|(cid:88)
(cid:88)
(cid:110) ˜Ws1s2(i, j) ˜Wr1r2(k, l) ˜Wu1u2(i, k) ˜Wx1x2(j, l1)
(cid:111)
(cid:110) ˜Ws1s2(i, j) ˜Wr1r2(k, l) ˜Wu1u2(i, k) ˜Wx1x2(j, l1)
(cid:111)
(cid:110)

(cid:88)

(cid:88)

˜Wx1x2(j, l1)

can be expressed as

˜Wu1u2(i, k)

× E

Sa

Rb

Uc

X d

a,b,c,d=1

R1

U1

X 1

R2

U2

X 2

(−1)|a−b|+|c−d|+|e−f|+|g−h|E

=

,

Vsasb(i, j)Vrcrd(k, l)Vueuf (i, k)Vxgxh(j, l1)

.

.

(cid:111)

Accordingly, E

2(cid:88)

a,b,c,d,e,f,g,h=1

(cid:110)

E

Under the null hypothesis,

(cid:111)

Vsasb(i, j)Vrcrd(k, l)Vueuf (i, k)Vxgxh(j, l1)
+ 2tr2(Σ2)tr(ΣCsaueΣCuesa) + 2tr2(Σ2)tr(ΣCrcuf ΣCuf rc)
+ 4tr(Σ2)tr(ΣCuesaCsbxg ΣCxgsbCsaue) + 4tr(Σ2)tr(ΣCrcuf CuesaΣCsaueCuf rc)

= tr(Σ4) + 2tr2(Σ2)tr(ΣCsbxg ΣCxgsb)

+ 4tr(ΣCrcuf ΣCuf rc)tr(ΣCsbxg ΣCxgsb)

+ 8tr(ΣCrcuf CuesaCsbxg ΣCxgsbCsaueCuf rc).

The summation of the above expression over a, b, c, d, e, f, g, h ∈ {1, 2} is zero. Hence,

Q7 = 0. Terms Q8 and Q9 are at most the order of R1. Term Q8 has the same order up to a

127

constant as R1 due to the four mutually diﬀerent indices. By the Cauchy-Schwarz inequality,

the E

f (i, j)f (k, l)f (i, k)f (j, l)

with respect to Q9 can be expressed as

f (i, j)f (k, l)f (i, k)f (j, l)

E

f (i, j)f (k, l)f (i, j)f (k, l)

(cid:111)

(cid:20)

(cid:111) ≤

(cid:110)

(cid:110)
(cid:20)
(cid:110)

E

E

×

(cid:18)

(cid:111)(cid:21)1/2
(cid:111)(cid:21)1/2
(cid:111)(cid:19)

f (i, k)f (j, l)f (i, k)f (j, l)

,

(cid:16) O

f (i, j)f (i, j)f (k, l)f (k, l)

.

(cid:110)

(cid:110)

E

Therefore,

As a result,

(cid:110)|Gn(i/T ) − Gn(j/T )|4(cid:111)
(cid:21)2
(cid:20)(j − i)
(cid:18) 1
(cid:88)

(cid:19)2α

λ2T

λ4

ul

,

T

i<l≤j

(cid:18) 1

(cid:88)

T

0<l≤T

(cid:19)2α

ul

,

≤ C

=

C
λ4β

(cid:111) ≤ KC

λ4β

=

KC
λ4

128

Q9 ≤ Q8 (cid:16) R1 (cid:16) C

([T η] − [T ν])2

T 2

.

I3 ≤ C

([T η] − [T ν])2

T 2

,

(3.53)

for some constant C. In summary, E[{Gn(η) − Gn(ν)}4] ≤ C([T η] − [T ν])2/T 2.

Let 0 ≤ i ≤ j ≤ T . By the deﬁnitions of η and ν, (3.37), (3.45), (3.53), and Markov’s

inequality, it follows that for any λ > 0,

(cid:16)|Gn(i/T ) − Gn(j/T )| ≥ λ

(cid:17) ≤ E

pr

where we set α = 1, β = 1, and ul = l − (l − 1) = 1. By Theorem 10.2 in Billingsley (1999)

(cid:110)

pr

max

t∈T |Gn(i/T )| ≥ λ

For λ large, above probability is less than any ε > 0. As a result, maxt∈T |Gn(i/T )| is tight,
and thus maxt∈T σ−1
ˆDnt is also tight. Therefore, by the tightness of the stochastic process
and convergence of the ﬁnite dimensional distributions, it follows that Mn converges to a
Gaussian process with mean zero and correlation Rz.

nt,0

(cid:3)

Proof of Theorem 9. Assume that one change point exists at time τ . Thus, we assume
alternative H∗
νt,max = maxt∈T max

(cid:16)(cid:112)V0t/w2(t),(cid:112)nV1t/w2(t)

1 as deﬁned in (3.15). Let ∆p = tr{(Σ1 − ΣT )2} and

such that T = {1, . . . , T − 1}.

(cid:17)

Deﬁne a set, E(C), such that E(C) = {t ∈ {1, . . . , T − 1} : |t − τ| ≥ CΘ}, where C

is some constant and Θ is a function of p, n, and T . The value Θ is chosen to show the

rate of convergence of the change point estimator under the asymptotic setting where p,

(cid:16)

pr

max
t∈E(C)

n, and T diverge. Thus, to establish this rate of convergence we must show that for some
C, pr(|ˆτ − τ| ≥ CΘ) < ε.
since pr(|ˆτ − τ| ≥ CΘ) = pr(ˆτ ∈ E(C)) ≤ pr(maxt∈E(C)
{maxt∈E(C)

ˆDnt > ˆDnτ ) < ε
ˆDnt > ˆDnτ ) for {ˆτ ∈ E(C)} ⊂

It is suﬃcient to show that pr(maxt∈E(C)

t∈E(C)

ˆDnt > ˆDnτ

ˆDnt > ˆDnτ}. Thus,
(cid:17) ≤ (cid:88)
(cid:88)
(cid:88)
≤ (cid:88)

=

=

t∈E(C)

t∈E(C)

pr

pr

pr

(cid:17)

(cid:16) ˆDnt > ˆDnτ
(cid:16) ˆDnt − Dt + Dt − Dτ > ˆDnτ − Dτ
(cid:104){ ˆDnt − Dt} + {−( ˆDnτ − Dτ )} > −{Dt − Dτ}(cid:105)
(cid:104)|{ ˆDnt − Dt} + {−( ˆDnτ − Dτ )}| > −{Dt − Dτ}(cid:105)

(cid:17)

.

The term −(Dt − Dτ ) can be expressed as |t − τ|G(t; τ )∆p, where

pr

t∈E(C)

G(t; τ ) =



1

T − t

, 1 ≤ t ≤ τ,

1
t

,

τ + 1 ≤ t < T.

In terms of T , the function G is of the order 1/T .

129

Recall that for two random variables, X and Y ,

pr(|X + Y | > ε) ≤ pr(|X| > ε/2) + pr(|Y | > ε/2).

Hence,

(cid:16)

pr

max
t∈E(C)

ˆDnt > ˆDnτ

(cid:17) ≤ (cid:88)
≤ (cid:88)

t∈E(C)

t∈E(C)

for some constants C1 and C. Choose Θ = νt,maxT

log T /n∆p. By the choice of Θ, order

pr

t∈E(C)

2

2

σnt

σnτ

>

>

(cid:27)

(cid:27)

(cid:27)

(cid:27)

pr

pr

| ˆDnτ − Dτ| >

|t − τ|G(t; τ )∆p

|t − τ|G(t; τ )∆p

|t − τ|G(t; τ )∆p

2σnt
|t − τ|G(t; τ )∆p

(cid:104)|{ ˆDnt − Dt} + {−( ˆDnτ − Dτ )}| > −{Dt − Dτ}(cid:105)
(cid:26)
| ˆDnt − Dt| >
(cid:26)
(cid:88)
(cid:26)| ˆDnt − Dt|
(cid:26)| ˆDnτ − Dτ|
(cid:88)
(cid:26)| ˆDnt − Dt|
(cid:26)| ˆDnτ − Dτ|
(cid:88)
(cid:26)| ˆDnt − Dt|
(cid:26)| ˆDnτ − Dτ|
(cid:88)
(cid:26)| ˆDnt − Dt|
(cid:26)| ˆDnτ − Dτ|
(cid:88)

2σnτ
|t − τ|G(t; τ )n∆p
4 ˜V0t + 8n ˜V1t
2
|t − τ|G(t; τ )n∆p
4 ˜V0τ + 8n ˜V1τ
2
C1|t − τ|G(t; τ )n∆p

2νt,max
C1|t − τ|G(t; τ )n∆p

CΘG(t; τ )n∆p

CΘG(t; τ )n∆p

>

(cid:112)

(cid:112)

2νt,max

νt,max

(cid:27)

(cid:27)

(cid:27)

(cid:27)

(cid:27)

σnτ

σnτ

>

>

σnt

σnt

σnt

>

>

>

(cid:27)

,

νt,max

σnτ
√

pr

t∈E(C)

pr

t∈E(C)

+

(cid:88)

+

(cid:88)

=

=

pr

t∈E(C)

pr

t∈E(C)

+

≤ (cid:88)

pr

t∈E(C)

pr

t∈E(C)

+

≤ (cid:88)

pr

t∈E(C)

+

pr

t∈E(C)

130

of G(t; τ ), and the fact that both ( ˆDnt − Dt)/σnt, ( ˆDnτ − Dτ )/σnτ ∼ N (0, 1), it follows that

(cid:88)
(cid:88)

t∈E(C)

(cid:26)| ˆDnt − Dt|
(cid:26)| ˆDnτ − Dτ|

σnt

pr

pr

t∈E(C)

σnτ

(cid:27)
(cid:27)

≤ (cid:88)
≤ (cid:88)

t∈E(C)

t∈E(C)

(cid:18)
|Z| >(cid:112)C log T
(cid:18)
|Z| >(cid:112)C log T

(cid:19)
(cid:19)

,

pr

pr

(3.54)

,

(3.55)

>

>

CΘG(t; τ )n∆p

νt,max

CΘG(t; τ )n∆p

νt,max

where Z ∼ N (0, 1) and C is some constant.

Recall that for a standard normal random variable, Z, and for any k > 0, pr(|Z| > k) ≤
2 exp{−x2/2}. For a large enough C, the summation terms in (3.54) and (3.55) can be

expressed as

(cid:18)
|Z| >(cid:112)C log T

pr

(cid:19)

(cid:88)

t∈E(C)

≤ (cid:88)

t∈E(C)

− C

2 < ε.

2T

For large C, the series is convergent as T → ∞. Therefore, pr(maxt∈E(C)
and

ˆDnt > ˆDnτ ) < ε,

(cid:18) νt,maxT

√

log T

ˆτ − τ = Op

n∆p
for ∆p = tr{(Σ1 − ΣT )2} and νt,max = maxt∈T max
. The
rate of convergence can be simpliﬁed further. The function w−1(t) is minimized at T /2.

(cid:19)
(cid:16)(cid:112)V0t/w2(t),(cid:112)nV1t/w2(t)
(cid:19)

(cid:17)

(cid:17)

√

nV1t

V0t,

.

(cid:3)

Therefore,

ˆτ − τ = Op

for ∆p = tr{(Σ1 − ΣT )2} and νmax = maxt∈T max

(cid:18)νmax

√

log T

n∆p

(cid:16)√

Proof of Theorem 10. Recall Theorem 5: Under the alternative H1 of (3.1), the maximum

value of Dt is attained at one of the q change points. We will make use of this theorem in

the proof that follows.

We will ﬁrst show that provided change points exist, we can detect their existence with

probability one, and we can the locations of the change points with probability one. Assume

131

(cid:16)
pr(Mn[It] > Wαn[It]) ≥ pr

(cid:17)

at least one change point exists in the interval It and that the cardinality of ˆQ is less than
the cardinality of Q. To show we can detect the existence of change points with probability
one in the interval It we must show that pr(Mn[It] > Wαn[It]) = 1.

= 1 − pr

= 1 − pr

(cid:17)

σ−1
nt,0[It] ˆDnt[It] ≤ Wαn[It]

σ−1
nt,0[It] ˆDnt[It] > Wαn[It]
(cid:16)
(cid:18) ˆDnt[It] − Dt[It]
(cid:18)

σ−1
nt [It]

= 1 − pr

Z ≤ σnt,0[It]
σ−1
nt [It]

Wαn[It] − Dt[It]
σ−1
nt [It]

≤ σnt,0[It]Wαn[It] − Dt[It]

(cid:19)

σ−1
nt [It]
(cid:19)

→ 1,

(cid:18)

where Z is a standard normal random variable. The pr

(cid:19)

Z ≤ σnt,0[It]
σ−1
nt [It]

Wαn[It] − Dt[It]
σ−1
nt [It]

goes to zero by our premise that Wαn = o(mSNR) for any It. Therefore, it follows that we

can detect the existence of a change point with probability one. Furthermore, by Theorem

5, Theorem 9 and our premise that νmax[It](cid:112)log(T )/n∆p[It] → 0, we can also correctly

identify a change point with probability one. The above derivations do not depend on It

since each subsequence satisﬁes the premises of this Theorem.

We also need to demonstrate that no change points will be identiﬁed that are not true
change points. Thus, consider the case where ˆQ = Q. It is suﬃcient to demonstrate that

no change point will be detected among the remaining time interval segments. Under H0 of
(3.1), as αn → ∞, it follows that by Theorem 8 pr(Mn[It] > Wαn[It]) = αn → 0 for some
interval It with no change points. Therefore, no change points will be incorrectly-identiﬁed
at any stage of the binary segmentation procedure. As a result ˆQ → Q in probability.
(cid:3)

132

CHAPTER 4

A HIDDEN MARKOV APPROACH FOR QTL MAPPING USING

ALLELE-SPECIFIC EXPRESSION SNPS

4.1 Introduction

Allele-speciﬁc expression (ASE) is part of the foundation for genetic diversity and is

paramount to programming and development of biological cells (Ferguson-Smith 2001). ASE

serves as a proxy for diﬀerential expression of two alleles at the same location within an

organism (Gu & Wang 2015). For example, allele-speciﬁc expression can be characterized as

the ratio between allele A and allele T. Diﬀerential expression is primarily explained by three

factors: cis-acting modiﬁcation, post transcription modiﬁcation, and epigenetic modiﬁcation

(Ferguson-Smith 2001). Cis-eﬀects correspond to the allele-speciﬁc variation, and thus, by

quantifying ASE it is possible to identify cis-acting eﬀects on an inter-individual basis among

heterozygous individuals (Buckland 2004). The presence of ASE implies one or multiple

variants have cis-acting eﬀects on gene expression levels that could be directly correlated

to phenotypic variation (Skelly et al. 2011). In fact, the phenomena of ASE has become

a focal point in identifying predispositions towards certain diseases (de la Chapelle 2009).

Due to the importance of understanding ASE, two natural questions arise with regards

to its inﬂuence on phenotypic traits. What is the relationship between single nucleotide

polymorphisms (SNPs) with ASE and a phenotypic trait? Which SNPs with ASE are have

an eﬀect on phenotypic variation? Our focus in Chapter 4 is to develop a procedure using a

novel hierarchical model to answer the second question.

Quantitative trait loci (QTL) mapping is the statistical process of identifying locations in

the genome that have an association with a complex phenotypic trait. For example, geneti-

cists may be interested in understanding which genes aﬀect cholesterol. Their understanding

of this association can provide insight towards disease prevention and susceptibility. An eﬀec-

133

tive QTL mapping procedure can also provide researchers a better understanding of breeding

and appropriate techniques, and permit altered genetic variation within a population (Cheng

et al. 2015). Studying SNPs with ASE and phenotypic variation was shown to be successful

by Cheng et al. (2015). By applying multiple Bayesian approaches, Cheng et al. (2015)

identiﬁed genetic markers in chickens associated with a resistance to Marek’s disease. This

disease is highly contagious and results in paralysis of the animal. The potential to eradi-

cate Marek’s disease through superior breeding techniques would be valuable to farmers and

individuals within the animal science community. Cheng et al. (2015) discovered that 83%

of the genetic variance in Marek’s disease resistance was explained by the selected SNPs

exhibiting ASE. These results were validated through a progeny study that found a 22%

diﬀerence in the occurrence of Marek’s disease after one generation of bidirectional selection

(Cheng et al. 2015). The profound discovery by Cheng et al. (2015) gives credence to the

fact that gene expression explains a large portion of phenotypic variation.

Next generation RNA sequencing data is now being widely used to investigate the pres-

ence of ASE. However, inference with regards to ASE remains a challenge along with map-

ping quantitative trait loci in the presence of only RNA sequencing data (Skelly et al. 2011).

Skelly et al. (2011) proposed a three-stage hierarchical Bayesian model to test ASE gene ex-

pression and study cis-regulatory variation. However, their procedure requires genomic DNA

data to establish prior probabilities. Similarly, Nariai et al. (2016) established a Bayesian

framework with variational inference for estimating allele-speciﬁc expression. Their tech-

nique also relied on diploid DNA data and did not link any phenotypic response. Hu et

al. (2015) proposed a uniﬁed maximum likelihood approach combining two models based

on ASE and total RNA read counts. Their approach involved cis-expression QTL mapping

with RNA sequencing data via a beta-binomial distribution.

In this chapter we present a novel two-step approach to perform QTL mapping using

SNPs with allele-speciﬁc expression. In the ﬁrst step, we predict the ASE ratios from RNA

sequencing data. In step two, we use the predicted ASE ratios to identify SNPs with cis-

134

acting eﬀects as it relates to a phenotypic response variable. We elicit a hierarchical model

for analysis of RNA sequence data to discover polymorphisms in expressed sequences whose

allele-speciﬁc expression is correlated with observed phenotypic variation. In our hierarchical

model, we ﬁrst implement a hidden Markov approach to impute the underlying genotype and

ASE status combinations based on the RNA read count data and simultaneously predict ASE

ratios for heterozygous SNP locations. Second, we apply regularized regression to identify

SNPs with ASE ratios that signiﬁcantly impact an observed phenotypic response. Ordinary

least squares is then applied for reﬁnement. Our proposed hierarchical model and procedure

has several advantages over existing methods. First, the hidden Markov model allows us to

model dependence among SNPs and aﬀords accurate genotype-ASE status imputation given

RNA read counts (Steibel et al. 2015). Second, our procedure obtains an ASE ratio estimate

in the absence of genetic DNA data. Many of the existing techniques required genetic DNA

data for ASE estimation. Third, our proposed model integrates RNA sequencing data and

phenotypic data to make inferences about the ASE status and cis-acting aﬀects on phenotypic

data. Fourth, our proposed method is easy to implement, where parameter estimation for

the hidden Markov model is performed using the expectation-maximization (EM) algorithm.

Variable selection via cyclic coordinate descent allows us to identify signiﬁcant SNPs quickly

and accurately, given an adequate signal-to-noise ratio. Lastly, our hierarchical model oﬀers

ﬂexibility with regards to the phenotypic response model of interest, mapping error, spatial

dependency, and individual variation in ASE ratios (Steibel et al. 2015).

Chapter 4 is organized as follows.

In Section 4.2 we introduce the ﬁrst layer of our

proposed model. A hidden Markov model and genotype-ASE status with ASE prediction

is recited based on the results of Steibel et al.

(2015).

In Section 4.3 we propose our

method to identify SNPs with ASE that have cis-acting eﬀects on a phenotypic variable of

interest. Simulation results and a comparison with two competing procedures are detailed in

Section 4.4. Our procedure is applied to a real data example that combines RNA sequencing

data and phenotypic data from a sounder of swine in Section 4.5. The swim data set and a

135

procedure to implement the hidden Markov approach is available in the R package HMMASE

at http://www.stt.msu.edu/users/pszhong/HMMASE.html.

4.2 A hidden Markov model for SNP genotype calling

In this Section we introduce the basic setting and proposed model in Steibel et al. (2015).

We introduce the salient features of their model, HMM-ASE, before we concentrate on ASE

prediction and quantitative trait loci mapping in Section 4.3.

Let Xil = (Xil1, Xil2, Xil3, Xil4)T be a random vector of RNA read counts at the lth

SNP for the ith individual. Denote xil (l = 1, . . . , L; i = 1, . . . , n) as the observed RNA

read counts, where xil1, xil2, xil3, xil4 represent observed counts for alleles A, C, G, and T,

respectively. Deﬁne the total RNA read counts at SNP l for individual i as nil =(cid:80)4

j=1 xilj.

Below we provide a set-up for a hidden Markov model with only two possible alleles: A

or T. The procedure can easily be extended to consider a non-bi-allelic SNP. Let Gil be a

latent variable that describes the genotype-ASE status with ﬁve possible hidden states. For

each individual i at SNP l let

Gil =



1 for “AA”,

2 for “AT-NASE”,

3 for “AT-ASE-HIGH”,

4 for “AT-ASE-LOW”,

5 for “TT”.

(4.1)

Two homozygous hidden states are represented by AA and TT, and three heterozygous

states are classiﬁed according to a relative ASE level. The variable Gil is latent and we

assume that Gil(l = 1, . . . , L) follows a Markov process. Let A be the probability transition

matrix for the Markov process Gil. Deﬁne the transition probabilities as
k, k(cid:48) = 1, . . . , 5.

pr(Gil = k(cid:48)|Gi(l−1) = k) = akk(cid:48)

(4.2)

Let πik (i = 1, . . . , n; k = 1, . . . , 5) be the the initial probabilities of Gil1 being a speciﬁc

state in (4.1) such that pr(Gi1 = k) = πik.

136

We assume that the RNA read counts are generated by a hierarchical model conditional

on the underlying state of Gil and ASE ratios. Let δil be a random variable for ASE ratios

conditional on Gil. Thus,

δil|Gil = k ∼



I{δil=1}
I{δil=0.5}
Beta(0.5,1](α1, β1)

Beta[0,0,5)(α2, β2)
I{δil=0}

for k = 1,

for k = 2,

for k = 3,

for k = 4,

for k = 5.

(4.3)

If the underlying genotype is homozygous, then the corresponding ASE ratio is either zero or

one with probability one. If the underlying genotype is heterozygous, but without ASE, then

the corresponding ASE ratio is 0.5 with probability one. For the two remaining heterozygous

states, it follows that the ASE ratio is deﬁned as Beta(0.5,1](α1, β1) and Beta[0,0,5)(α2, β2),

where each represent scaled beta distributions with scale and shape parameters being α1,

α2 and β1, β2, respectively. Conditional on Gil, we assume that δil are independent.

The ﬁrst layer of the hierarchical model is conditional on a latent genotype-ASE status. In

the second layer of the hierarchical model we deﬁne the probability distribution for RNA read

counts conditional on 4.3. gives us the distribution of the RNA read counts. Here we assume

that Xil = (Xil1, Xil2, Xil3, Xil4)T conditional on δil follows a multinomial distribution such

that

(cid:16)(cid:16)

where p(δil, e) =

(cid:17)
Xil|δil ∼ M ultinomial(nil, p(δil, e)),
δil + 1 − e

(cid:16) 4e
(cid:17)
3 − 1

δil + e

3 , e

3 , e
3 ,

1 − 4e

3

(cid:17)

(4.4)

is the probability vector in

the multinomial distribution for A, C, G, and T, respectively. We assume that all reads

are observable via a mapping error parameter denoted as e.
(δil, 0, 0, 1 − δil) represents the probabilities for observing A, C, G, and T, respectively.

If e = 0, then p(δil, 0) =

Figure 4.1 illustrates our hidden Markov model speciﬁcation for the ith individual when

L = 5. The hidden variables Gil are dependent via a Markov process. The variables δil are

137

conditional on Gil and independent among each other. RNA read counts are conditional on

Gil, but through δil.

δi1

δi2

δi3

δi4

δi5

Gi1

Gi2

Gi3

Gi4

Gi5

Xi1

Xi2

Xi3

Xi4

Xi5

Figure 4.1: A graphical model for illustrating the hidden Markov model for SNP genotype
calling. Grey circles represent observed values. White circles represent latent variables.

Given observed RNA read counts, xil, we can predict the underlying genotype-ASE

status, Gil, via the expectation-maximization (EM) algorithm and forward-backward proce-

dure. In addition, and more importantly, given observed RNA read counts and underlying

genotype-ASE statuses, we can derive the distribution for allele-speciﬁc expression ratios

and use the posterior mode of the distribution as an estimate for the ratio of ASE.

4.3 Phenotypic model speciﬁcation

Our ultimate goal is to identify signiﬁcant SNPs and understand their aﬀects on pheno-

typic variation. Let Yi be a phenotypic response of interest, where

Yi∼fYi

(yi|τi, φ) = exp

+ c(yi, φ)

,

i = 1, . . . , n.

(4.5)

(cid:20)yiτi − b(τi)

a(φ)

(cid:21)

We assume the distribution of Yi is in the form of a known exponential family. Let τ be

the canonical parameter and let φ be the dispersion parameter. For example, suppose the

phenotypic trait is eye color. If eye color is binary, such as the case for blue eyes or not blue

eyes, then we assume (4.5) follows a Bernoulli distribution. However, if the phenotype is

continuous one may consider the distribution of (4.5) to be Gaussian or Exponential. Fur-

l=1 δilγl, where γ is an L-dimensional vector of unknown parameters

that represent the eﬀects of gene expression to the phenotypic response Yi and δi is an L-

138

thermore, let η(δi) =(cid:80)L

dimensional vector of ASE ratios. In order to relate the parameters of the distribution to

the various predictors we denote E[Yi] = µi. Thus, for a canonical link function, h, it is the

case that η(δi) = h(µi) = τi. In particular, if we assume Yi follows a normal distribution,

then h is the identity link; whereas if Yi follows a Bernoulli distribution, then h could be the

logit link.

4.3.1 Prediction of ASE ratios

ASE ratios are unknown random variables. If we want to use them as predictors with regards

to modeling a phenotypic response, then we need an estimation procedure. We consider two

posterior probabilities that will be useful in our ultimate goal of identifying signiﬁcant SNPs.

Calculation of these two posterior distributions is dependent on an unknown parameter

vector θ = (α1, β1, α2, β2, e, A). Details to obtain maximum likelihood estimates via the EM

algorithm are provided in Steibel et al. (2015). Our ﬁrst posterior probability of interest is
pr(Gil = gil|X), which will be used for predicting the underlying genotype-ASE status of the
lth SNP in the ith individual. Here X represents the RNA read counts for all n individuals

at all L SNP positions. Given the states of Gil as deﬁned in (4.1), we will also be able

to deduce the ASE status for the respective individual and SNP. The posterior probability
pr(Gil|X) can be computed by Bayes’ formula. Let Gi = (Gi1,··· , GiL)T represent all the
possible genotype-ASE status combinations. Then the posterior probability is

(cid:88)

(cid:88)

Li,k(l) := pr(Gil = k|X) =

pr(Gi|X)I(Gil = k) =

pr(X, Gi)

pr(X)

I(Gil = k).

(4.6)

Gi

Gi

In order to estimate the hidden state of Gil (i = 1, . . . , n;

l = 1, . . . , L) we compute

maxk Li,k(l) for each individual and SNP combination. The quantity Li,k(l) is computed

from the EM algorithm. By the deﬁnition of the random variable δil, we aim to use the

estimated state of Gil to obtain an estimate for the ratio of ASE. This leads to our second

posterior probability of interest.

139

Let θ∗ be the updated parameter vector upon convergence of the EM algorithm. Consider

f (δil|Xil, Gil = k; θ∗) such that

f (δil|Xil, Gil = k; θ∗) =

f (Xil|δil; θ∗)f (δil|Gil = k; θ∗)

(cid:82) f (Xil|δil; θ∗)f (δil|Gil = k; θ∗)dδil

,

(4.7)

where the distributions of f (Xil|δil; θ∗) and f (δil|Gil = k; θ∗) are deﬁned in (4.4) and

(4.3), respectively. It follows that the denominator of (4.7) can be expressed as



(cid:1)(1 − e)Xil1( e
(cid:1)(0.5 − e
(cid:1)( e
(cid:1)( e
(cid:1)(1 − e)Xil4( e

3)Xil2+Xil3

3)Xil2+Xil3

Xil

Xil

(cid:0) nl
(cid:0) nl
(cid:0) nl
(cid:0) nl
(cid:0) nl

Xil

Xil

Xil

3)nil−Xil1

3)Xil1+Xil4( e

3)Xil2+Xil3

C0(θ∗;Xil1,Xil4)
0.5α1+β1−1B(α1,β1)

C1(θ∗;Xil1,Xil4)
0.5α2+β2−1B(α2,β2)

3)nil−Xil4

for k = 1,

for k = 2,

for k = 3,

(4.8)

for k = 4,

for k = 5

f (Xil|Gil = k; θ∗) =

Xil

where(cid:0) nl
(cid:1) =
(cid:90) 1
(cid:90) 0.5

C0 =

0.5

C1 =

0

nl!

Xil1!Xil2!Xil3!Xil4!, and

((1 − 4e
3

)δil +

)Xil1((

e
3

4e
3

− 1)δil + 1 − e)Xil4(1 − δil)α1−1(δil − 0.5)β1−1dδil,

((1 − 4e
3

)δil +

)Xil1((

e
3

4e
3

− 1)δil + 1 − e)Xil4δ

α2−1

il

(0.5 − δil)β2−1dδil.

If Gil = 3 or Gil = 4, then we know that the heterozygous genotype has ASE with the

quantity determined by a rescaled beta distribution. Thus, we only consider these estimated

140

states when computing an estimate for δil. Hence, for Gil = 3 and Gil = 4 it follows that

M0(δil, Xil1, Xil4, θ∗)(δil − 0.5)α1−1(1 − δil)β1−1

for k = 3,

M1(δil, Xil1, Xil4, θ∗)(δil)α2−1(0.5 − δil)β2−1

for k = 4,

(4.9)



f (δil|Xil, Gil = k; θ∗) =

where

M0 = ((1 − 4e
3

)δil +

M1 = ((1 − 4e
3

)δil +

)Xil1((

)Xil1((

e
3

e
3

4e
3

4e
3

− 1)δil + 1 − e)Xil4C0(θ∗; Xil1, Xil4)

− 1)δil + 1 − e)Xil4C1(θ∗; Xil1, Xil4),

with C0 and C1 as deﬁned above.

We deﬁne our ASE ratio estimate as ˆδil (i = 1, . . . , n; l = 1, . . . , L), where ˆδil is the

mode of the posterior distribution in (4.9).

4.3.2

Identiﬁcation of quantitative trait loci

In order to quantify the impact of ASE on phenotypic variation we utilize ˆδil as estimated in

Section 4.3.1. For L large, we aim to ﬁnd a sparse solution for the L-dimensional parameter

vector γ. In order to accomplish this task we apply a Lasso penalty. A sparse solution is

computed via cyclical coordinate descent and k-fold cross validation. For Yi as deﬁned in

(4.5) an estimated vector ˆγ is given by

(cid:34)

(cid:40)
− n(cid:88)

i=1

(cid:80)L

yi

l=1

ˆδilγl − b((cid:80)L

a(φ)

ˆγ = arg min

γ

(cid:35)

(cid:41)

ˆδilγl)

l=1

+ c(yi, φ)

+ λ∗||γ||1

,

(4.10)

where λ∗ is a non-negative regularization parameter. If Yi has a Binomial distribution, then

T

ˆδi

˜γ − log(1 + e
ˆδi

T

γ)) + λ∗||γ||1

(yi

.

(4.11)

(cid:41)

a solution for ˆγ is given by

ˆγ = arg min

ˆγ

(cid:40)
− n(cid:88)

i=1

141

Similarly, if Yi has as Gaussian distribution, then a solution for ˆγ is given by

(cid:40) n(cid:88)

i=1

(cid:41)

ˆγ = arg min

ˆγ

(yi − ˆδi

T

γ)2 + λ∗||γ||1

.

(4.12)

A cyclic coordinate descent algorithm can be used to solve (4.10) – (4.12) (Friedman et al.
2010). Solutions are provided across a range of λ∗ values, so to determine an optimal sparse

solution we perform k-fold cross validation as a way to extract SNPs that have non-zero
coeﬃcients for a speciﬁc value of λ∗. Our choice of λ∗ is based on the “one-standard error”

rule as it provides the most parsimonious model whose error is no more than one standard

error above the error of the best model. Details on cyclic coordinate descent and how it can

be applied to speciﬁc exponential families is available in Friedman et al. (2010).

After identifying a sparse solution for ˜γ we apply ordinary least squares using the phe-

notypic response and ﬁltered ˆδ to obtain estimates and standard errors for the non-zero γs.

Ordinary least squares estimates are given by

(cid:16)

∗T

yi − ˆδi

γ∗(cid:17)2(cid:41)

(cid:40) n(cid:88)

i=1

ˆγ∗ = arg min
γ∗

,

(4.13)

where ∗ denotes a ﬁltered set of predictors and parameters after (4.10).

We summarize our procedure as follows. First, obtain ASE ratio estimates given RNA

read counts and imputed genotype-ASE statuses. Second, apply variable selection to deter-

mine the SNPs with ASE that inﬂuence phenotypic variation. Third, model the relationship

between SNPs with ASE and the response using ordinary least squares. Figure 4.2 illustrates

the relationships between Gil, Xil, δil, and Yi for the ith individual and L = 5.

142

Yi

δi1

δi2

δi3

δi4

δi5

Gi1

Gi2

Gi3

Gi4

Gi5

Xi1

Xi2

Xi3

Xi4

Xi5

Figure 4.2: Grey circles represent observed values. White circles represent latent variables.

4.4 Simulation studies

In this section we demonstrate the performance of our proposed two-stage model in

identifying signiﬁcant SNPs with ASE ratios as they relate to a phenotype. We consider

a simpliﬁed version of (4.1) for the simulation by ignoring an extended classiﬁcation of

heterozygous genotype-ASE states. Hence, we assume



Gil =

1 for “AA”,

2 for “AT-ASE”,

3 for “TT”,

(4.14)

follows a three-state Markov process. Our data generation process consisted of the following

steps. First, two independent haplotypes were generated to form genotypes. The sequences

for each of the haplotypes were created using linkage disequilibrium information. For each

individual and SNP, total RNA read counts were generated from a negative binomial distri-

bution with parameter λ and probability parameter p = .40. RNA read counts of A, C, G,

and T were then generated according to the total number of RNA read counts and (4.14) –

143

(4.15). From (4.14) it follows that

δil|Gil = k ∼



I{δil=1}
Beta(α, β)
I{δil=0}

for k = 1,

for k = 2,

for k = 3.

(4.15)

Let δ be an n × L matrix that represents the true allele-speciﬁc expression ratios given

the true underlying genotype-ASE statuses. For a given individual and SNP, where the

underlying genotype is homozygous, the corresponding value in the matrix δ is represented

by zero. We set these values to zero because our interest is only in exploring the cis-acting

genetic eﬀects on phenotypic variation.

In the next step we generate the phenotypic response by the following linear model, where

L(cid:88)

yi =

δT
il γl + εi,

i = 1, . . . , n,

(4.16)

l=1

and γ is an L-dimensional parameter vector. We assume γ is sparse and only allow the ﬁrst

four elements to be non-zero. Thus, γ = (γ1, γ2, γ3, γ4, 0, . . . , 0)T such that γ1 = γ2 = γ3 =
γ4. Under this set-up, the ﬁrst four SNPs have cis-acting eﬀects while the remaining L − 4
SNPs have no cis-acting eﬀects on yi.

In the simulation studies we set n = 50 and 100, and L = 8, 15, and 50. The parameter

λ deﬁned in the negative binomial distribution to simulate total RNA read counts was set

to 16 and 24. The signal strength used in generation of the continuous phenotypic data,γ,

was set to 2, 3, 5, and 7. Lastly, we set e = 0.07, α = 3, β = 3 for the mapping parameter

and Beta distribution parameters, respectively; and the linkage disequilibrium information

was set to 0.30. The simulation results presented in the Tables are ﬁgures were based oﬀ 100

replications.

To evaluate the performance of our proposed method considered the false positive rate

and false negative rate. For a given parameter combination, the false positive and false

negative rates were averaged over the 100 replications. The false positive rate and false

144

negative rate are deﬁned as follows:

False Positive Rate =

# coeﬃcents falsely identiﬁed as non-zero

# non-zero coeﬃcients

False Negative Rate =

# coeﬃcents falsely identiﬁed as zero

# zero coeﬃcients

(4.17)

Figure 4.3: Average false negative rates and average false positive rates for the proposed
method. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

The simulation results for a single test at signiﬁcance level 0.01 are illustrated in Figure

4.3. As the heritability, or value of γ increases, the average false negative rate decrease. The

same relationship holds for the average false positive rates. As the number of SNPs increases

for a given γ, the average false positive rate increases whereas the average false negative rate

decreases. As the sample size increases both average rates decrease. The top row of plots

corresponds to the setting in which n = 50, and the bottom row of plots corresponds to the

setting in which n = 100. Lastly, all else held constant, a larger value of λ generally results in

smaller false negative and false positive rates. A larger value of λ means a larger number of

RNA read counts, and thus, more information. From a practical perspective, a low average

145

false negative rate implies signiﬁcant SNPs will not fail to be identiﬁed. Similar trends exist

when we consider a simultaneous test. Average rate values are displayed in Table 4.1 and

Table 4.2 for the single and simultaneous test, respectively.

Table 4.1: Average false positive and average false negative rates for the single test with
signiﬁcance level 0.01. Average false positive rate is top value

L
8

8

λ
16

24

15

16

15

24

50

16

50

24

γ = 2
0.0074
0.2633
0.0182
0.2105
0.0357
0.1240
0.0160
0.1062
0.0977
0.0345
0.1062
0.0288

n = 50

n = 100

γ = 3
0.0138
0.0828
0.0040
0.0573
0.0217
0.0515
0.0120
0.0328
0.1021
0.0138
0.1337
0.0086

γ = 5
0.0120
0.0240
0.0073
0.0040
0.0145
0.0099
0.0185
0.0042
0.0650
0.0021
0.0549
0.0011

γ = 7
0.0078
0.0085
0.0060
0.0020
0.0290
0.0052
0.0073
0.0017
0.0642
0.0021
0.0562
0.0004

γ = 2
0.0085
0.0678
0.0093
0.0220
0.0308
0.0184
0.0120
0.0117
0.0676
0.0051
0.0610
0.0034

γ = 3
0.0060
0.0040
0.0060
0.0000
0.0278
0.0034
0.0020
0.0000
0.0716
0.0002
0.0545
0.0002

γ = 5
0.0020
0.0000
0.0040
0.0000
0.0060
0.0000
0.0133
0.0000
0.0486
0.0000
0.0463
0.0000

γ = 7
0.0060
0.0000
0.0133
0.0000
0.0093
0.0000
0.0080
0.0000
0.0330
0.0000
0.0420
0.0000

Table 4.2: Average false positive and average false negative rates for the simultaneous test
with nominal level 0.05. Average false positive rate is top value

L
8

8

λ
16

24

15

16

15

24

50

16

50

24

γ = 2
0.0718
0.0668
0.0756
0.0440
0.1233
0.0436
0.1348
0.0350
0.2033
0.0178
0.2433
0.0124

n = 50

n = 100

γ = 3
0.0726
0.0213
0.0854
0.0040
0.1132
0.0106
0.1141
0.0077
0.2445
0.0043
0.2884
0.0020

γ = 5
0.0702
0.0040
0.0762
0.0000
0.1177
0.0008
0.0861
0.0027
0.2238
0.0011
0.1923
0.0004

γ = 7
0.0603
0.0020
0.0450
0.0000
0.1146
0.0009
0.0885
0.0000
0.2001
0.0013
0.1848
0.0002

γ = 2
0.0512
0.0120
0.0548
0.0065
0.0975
0.0025
0.0543
0.0025
0.1544
0.0017
0.1403
0.0015

γ = 3
0.0433
0.0000
0.0400
0.0000
0.0709
0.0000
0.0273
0.0000
0.1528
0.0002
0.1264
0.0000

γ = 5
0.0293
0.0000
0.0326
0.0000
0.0378
0.0000
0.0352
0.0000
0.0925
0.0000
0.1080
0.0000

γ = 7
0.0343
0.0000
0.0363
0.0000
0.0360
0.0000
0.0327
0.0000
0.0905
0.0000
0.0884
0.0000

We evaluated the performance of our proposed method with two alternative procedures.

146

Alternative procedure 1 used an exact binomial test based on the simulated RNA read counts

in order to estimate the unknown genotype-ASE status. Our test was performed under the

null hypothesis p = 0.50 with alternatives p > 0.50 and p < 0.50 corresponding to geno-

types AA and TT, respectively. Following genotype-ASE state imputation, we performed

an ordinary least squares post Lasso technique using the simulated phenotypic data as the

response and the estimated genotype-ASE statuses as predictors. Thus, we did not consider

ASE estimation in this alternative procedure. The average false positive and average false

negative rates were calculated under the same parameter scenarios as our proposed method.

Figure 4.4: Average false negative rates and average false positive rates for alternative
procedure 1. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

Figure 4.4 depicts the average false positive and average false negative rates. When n

increases from 50 to 100, the average false negative rate decreases slightly. However, we

do not see the precipitous decline in average false negative rates as heritability increases,

compared to our proposed method. Likewise, the average false positive rate decreases as

the sample size increases. As the number of SNPs increases, the average false positive rate

147

increases and is much higher than the average rates in Figure 4.3. Table 4.3 provides the

raw values for all parameter combinations.

Table 4.3: Alternative method 1, average false positive and average false negative rates for
the single test with signiﬁcance level 0.01. Average false positive rate is top value

L
8

8

λ
16

24

15

16

15

24

50

16

50

24

γ = 2
0.0312
0.4864
0.0714
0.4886
0.2500
0.2592
0.2564
0.2608
0.3646
0.0778
0.4769
0.0782

n = 50

n = 100

γ = 3
0.1750
0.4869
0.1190
0.4884
0.0000
0.2577
0.1667
0.2555
0.5370
0.0782
0.3833
0.0775

γ = 5
0.0000
0.4867
0.0476
0.4845
0.1000
0.2587
0.1369
0.2519
0.4674
0.0763
0.4031
0.0750

γ = 7
0.0500
0.4843
0.1136
0.4850
0.2304
0.2579
0.1591
0.2538
0.2978
0.0753
0.3476
0.0760

γ = 2
0.0000
0.4864
0.0294
0.4876
0.1190
0.2528
0.0526
0.2509
0.2892
0.0767
0.2325
0.0762

γ = 3
0.0309
0.4726
0.1250
0.4776
0.1133
0.2513
0.0000
0.2493
0.2179
0.0740
0.1471
0.0757

γ = 5
0.0714
0.4752
0.0000
0.4674
0.0808
0.2434
0.0778
0.2447
0.3053
0.0757
0.2546
0.0714

γ = 7
0.0333
0.4746
0.0750
0.4578
0.0469
0.2469
0.0783
0.2424
0.2048
0.0719
0.1333
0.0723

The weak performance of alternative method 1 comes from two sources. First, the bino-

mial test results in less accurate genotype predictions compared to (4.6). Assuming genotypes

follow a Markov process and using a hidden Markov model to impute their state provides ex-

tra information that results in accurate predictions (Ferguson-Smith 2001). Second, correctly

predicted heterozygous states do not consider ASE.

For alternative method 2 we estimated an allele-speciﬁc expression ratio directly from

the simulated RNA read counts. Let

ˆASEil be an estimated quantity for ASE such that

ˆASEil =

Xilref

Xilref

+ Xilalt

i = 1, . . . , n; l = 1, . . . , L,

(4.18)

where we deﬁne A to be the reference allele and T to be the alternative allele. Again, we

performed an ordinary least squares post-Lasso procedure in conjunction. The average false

positive and average false negative rates were calculated under the same parameter scenarios

as our proposed method and alternative method 1. Figure 4.5 and Table 4.4 illustrate that

148

the performance is similar to alternative method 1 and inferior to our proposed method with

regards to false positive and false negative metrics.

Figure 4.5: Average false negative rates and average false positive rates for alternative
procedure 2. Facets in row 1 are for n = 50. Facets in row 2 are for n = 100.

149

Table 4.4: Alternative method 2, average false positive and average false negative rates for
the single test with signiﬁcance level 0.01. Average false positive rate is top value

L
8

8

λ
16

24

15

16

15

24

50

16

50

24

γ = 2
0.0938
0.4875
0.1250
0.4876
0.3000
0.2628
0.1154
0.2591
0.3968
0.0787
0.4009
0.0771

n = 50

n = 100

γ = 3
0.2111
0.4917
0.0000
0.4850
0.3148
0.2614
0.3182
0.2626
0.4011
0.0756
0.4190
0.0777

γ = 5
0.1522
0.4867
0.1190
0.4861
0.1618
0.2552
0.1746
0.2521
0.4141
0.0770
0.5042
0.0773

γ = 7
0.1391
0.4817
0.1481
0.4832
0.1458
0.2552
0.2283
0.2562
0.4429
0.0778
0.5198
0.0758

γ = 2
0.0000
0.4860
0.1667
0.4924
0.0702
0.2529
0.0000
0.2598
0.2083
0.0773
0.2633
0.0765

γ = 3
0.0185
0.4738
0.0962
0.4793
0.0938
0.2443
0.0909
0.2517
0.1607
0.0773
0.2333
0.0737

γ = 5
0.0455
0.4710
0.0556
0.4716
0.0500
0.2490
0.1496
0.2399
0.1865
0.0746
0.2646
0.0732

γ = 7
0.0095
0.4680
0.0000
0.4623
0.1000
0.2455
0.0208
0.2499
0.2033
0.0745
0.1470
0.0720

Figure 4.6 characterizes the discrepancy between an ASE estimate from RNA read counts

deﬁned in (4.18) and an ASE estimate using our proposed hierarchical model. For values

less than 0.50, the hidden Markov model ASE estimate is greater than our naive estimate in

(4.18). Above 0.50 the hidden Markov model ASE estimate is less than our naive estimate.

Figure 4.6: ASE estimates from the hidden Markov model compared to simulated raw
allele count ratios. Hidden Markov model imputed ASE ratios with value less than 0.50 are
marked as red, and values above 0.50 are marked as blue.

150

Through our simulation analysis, our hierarchical model appears to perform better than

two alternative procedures at identifying SNPs with cis-acting eﬀects on phenotypic varia-

tion.

4.5 An empirical study

The following paragraph provides some of the data gathering and processing details as

explained in Steibel et al. (2015). RNA sequence data was obtained from 24 female pigs

from an F2 cross of Duroc and Pietrain breeds (Choi et al. 2012, Choi et al. 2011, Edwards

et al. 2008a, Edwards et al. 2008b, Steibel et al. 2011). Protocols for RNA sequencing

and the accuracy of genotype calling using a hidden Markov-ASE model have already been

established in Steibel et al. (2015). To summarize the process, RNA from each sample was

reverse transcribed, fragmented, barcode-labeled and sequenced on an Illumina HiSeq 2000

(100 bp, paired-end reads). After quality control ﬁltering, sequence reads were aligned to

reference genome (Sus scrofa 10.2.69 retrieved from the Ensembl database) using Tophat

(Trapnell et al. 2009). Coding SNP discovery and genotyping were done with VarScan

(Trapnell et al. 2009). We focused on chromosome 13 and extracted counts of reads agreeing

with reference (R) or alternative (A) allele with respect to the reference genome at putative

5364 cSNP and we retained read counts on 65 SNPs that could be independently validated

using a SNP chip (Steibel et al. 2015). In addition to the RNA sequence data, 45 minute

post-mortem meat pH was recorded in these animals (Edwards et al. 2008b) and served as

our phenotypic response variable for analyses.

The RNA sequence data we analyzed is available in the HMMASE R package which is

available at http://www.stt.\msu.edu/users/pszhong/HMMASE.html. The data set was

partitioned so that the minimum number of SNPs in a segment is 30. Our proposed proce-

dure was applied to each segment of RNA sequence data. Figure 4.7 depicts estimates for

signiﬁcant SNPs along with their SNP ID number. For example, the second segmented data

set produced four signiﬁcant SNPs: 12256008, 12400307, 12403644, and 12404379.

151

Figure 4.7: Estimates for SNPs. Signiﬁcant SNPs are displayed with their respective ID
provided in the real data set. IDs correspond to the ordered locations.

We investigated the eﬀects of an estimated ASE ratio from the hidden Markov model

compared to the naive estimate deﬁned in (4.18). Figure 4.8 depicts the relationship between

the two estimates, and reveals a shrinkage around each Beta distribution’s mode of 0.25 and

0.75, respectively. For values below the respective mode, the hidden Markov ASE estimate

is less than the raw allele count ratio, and for values above the respective mode, the hidden

Markov ASE estimate is greater than the raw allele count ratio.

152

Figure 4.8: ASE estimates from the hidden Markov model compared to real raw allele
count ratios. Hidden Markov imputed ASE values conditional on Gil = 3 and Gil = 4 are
marked as blue and red, respectively.

Figure 4.9: ASE estimates from the hidden Markov model compared to real raw allele
count ratios. Hidden Markov imputed ASE values conditional on Gil = 3 and Gil = 4 are
marked as blue and red, respectively.

153

CHAPTER 5

CONCLUSION

5.1 Introduction

In this chapter we summarize the salient contributions to the ﬁeld of Statistics based on

the content of Chapters 2 through 4. We also introduce details to new and exciting research

challenges.

5.2 Summary of contributions

In Chapter 2, we proposed a novel nonparametric test procedure for testing the temporal

homogeneity of covariance matrices with high-dimensional longitudinal data. The proce-

dure aims to detect and identify change points among a temporally dependent collection

of covariance matrices. In Chapter 2, a new test statistic was introduced, and theoretical

results were derived under an asymptotic setting in which n and p diverge and T is ﬁnite.

The test statistic’s asymptotic distribution was derived under mild dependence assumptions

but with no assumption of sparsity and no requirement on the relationship between n and

p. We also proposed a procedure to identify the locations of change points through binary

segmentation. The corresponding change point identiﬁcation estimator’s rate of convergence

was investigated and shown to be consistent provided an adequate signal-to-noise ratio ex-

ists. Numerical studies demonstrated the ﬁnite sample performance of our procedure. These

developments expanded the ﬁeld of Statistics by pioneering a robust procedure to detect

and identify change points among covariance matrices in the presence of high-dimensional

longitudinal data.

In Chapter 3, we widened the scope of applicability with regards to the procedure de-

veloped in Chapter 2. Theoretical results were derived under an asymptotic framework in

which n, p, and T all diverge. We established the test statistic’s asymptotic distribution and

154

demonstrated that the change point identiﬁcation estimator’s rate of convergence depends

on n, p, T , and the signal-to-noise ratio. Therefore, the estimator was also shown to be con-

sistent in a diverging T setting, provided an adequate signal-to-noise ratio exists. Numerical

studies demonstrated the ﬁnite sample performance for a large T setting. Chapter 3 also

addressed computation challenges. Recursive formulae were derived as were computation eﬃ-

cient forms of U-type statistics. In addition, we proposed an accurate quantile approximation

procedure to via an estimated correlation matrix. The overall computation complexity was

reduced from the order pn4T 6 to the order of pn2T 3. These theoretical and computational

developments of Chapter 3 made our procedure applicable to high-dimensional functional

data and allowed us to demonstrate our method using a task-based fMRI data set. Thus, the

contribution to Statistics in Chapter 3 is an expanded scope of the cutting-edge procedures

introduced in Chapter 2.

In Chapter 4, we developed a hierarchical model to understand the relationship between

allele-speciﬁc expression and phenotypic variation. Our hierarchical model was able to use

RNA sequence data and identify SNPs with ASE that have a cis-acting eﬀect on a phenotypic

response. The performance is accurate and can quickly be applied through a combination of

the EM algorithm and Lasso procedure.

5.3 Future research

The procedures established in Chapter 2 and extended in Chapter 3 required mild as-

sumptions but did not allow much ﬂexibility in terms of n and T . For example, in many

longitudinal studies patients drop out, measurements are missing at random or non-random

time points, and the sample size can be extremely small or even one. The methods de-

veloped in Chapters 2 and 3 will not be applicable to data under these settings. More

work is necessary to accommodate a wider domain of real-world data and problems. One

valuable extension of our work will be to develop a procedure for single-subject inference

in high-dimensional longitudinal data and high-dimensional functional data.

In terms of

155

a homogeneity test for covariance matrices, this setting will allow for an increase in scope

of applications due to more realistic assumptions, and a sample size requirement of only

one. For genetic or fMRI data, an eﬀectively developed procedure will invoke a personalized

medicine approach and have a greater beneﬁt to the individual patient. Other potential

applications of this work would include real estate and ﬁnancial data, and motion sensor

data for activities.

From a computation standpoint, accurate and fast approximations could be developed

to handle situations where T is of the order 1000. Even with a high-performance computing

cluster, it is not practical to apply our proposed procedure for massive values of T . However,

as technology improves and longitudinal studies expand, the demand to address massive

high-dimensional longitudinal data will increase. It is paramount that statistical methods

can produce accurate and fast results for practitioners.

A natural extension to the model proposed in Chapter 4 is to develop a uniﬁed likeli-

hood approach in a hierarchical framework. Rather than only use RNA read count data to

predict the underlying genotype with ASE status, we could perform this prediction given

phenotypic data and RNA read counts. The additional information should improve predic-

tion accuracy. From a theoretical perspective, a uniﬁed likelihood approach would allow

for statistical inference, and under certain conditions consistency and asymptotic normality

could be proved. From a computation perspective, we could investigate a procedure that

performs variable selection and parameter estimation through a penalized EM algorithm or

penalized variational EM algorithm.

156

BIBLIOGRAPHY

157

BIBLIOGRAPHY

Ahmad, R. M. (2017). Testing homogeneity of several covariance matrices and multi-sample
sphericity for high-dimensional data under non-normality. Communications in Statistics -
Theory and Methods, 46:8, 3738–3753.

Aminikhanghahi, S. & Cook, D. (2016). A survey of methods for time series change

point detection. Knowledge And Information Systems, 51(2), 339–367.

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, New York:

John Wiley.

Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., & Cherry, J.
et al. (2000). Gene ontology: tool for the uniﬁcation of biology. The gene ontology
consortium. Nature Genetics, 25, 25–29.

Aue, A., Hormann, S., Horv´ath, L., & Reimherr, M. (2009). Break detection in the
covariance structure of multivariate time series models. Annals of Statistics, 37, 4046–4087.

Bach, F. R. (2008). Bolasso: Model consistent lasso estimation through the bootstrap.
In Cohen, W. W., Mccallum, A., & Roweis, S. T., editors, Proceedings of the 25th In-
ternational Conference on Machine Learning, pages 33–40, Brookline, MA. Microtome
Publishing.

Bai, Z. & Saranadasa, H. (1996). Eﬀect of high dimension: by an example of a two

sample problem. Statistica Sinica, 6, 311–329.

Bai, Z. D. & Yin, Y. Q. (1993). Limit of the Smallest Eigenvalue of a Large Dimensional

Sample Covariance Matrix. Annals of Probability, 21, 1276–1294.

Baldassano, C., Chen, J., Zadbood, A., Pillow, J., Hasson, U., & Norman,
K. (2017). Discovering Event Structure in Continuous Narrative Perception and Memory.
Neuron, 95(3), 709–721.e5.

Barnett, I. & Onnela, J.-P. (2016). Change point detection in correlation networks.

Scientiﬁc Reports, 6, 18893.

Basseville M. & Nikiforov, I. V. (1993). Detection of Abrupt Changes - theory and

application. Prentice Hall.

Box, G. E. (1949). A general distribution theory for a class of

likelihood criteria.

Biometrika, 36, 317–346.

Brodsky, B. E. (2017). Change-Point Analysis in Nonstationary Stochastic Models, CRC

Press.

Brodsky, B. E. & Darkhovsky, B.S. (1993). Nonparametric Methods in Change Point

Problems, Boston: Springer.

158

Brown, R., Durbin, J. & Evans, J. (1975). Techniques for Testing the Constancy of
Regression Relationships over Time. Journal of the Royal Statistical Society: Series B,
37(2), 149–192.

Buckland, P. (2004). Allele-speciﬁc gene expression diﬀerences in humans. Human Molec-

ular Genetics, 13 (suppl 2), R255–R260.

B¨uhlmann, P. & van de Geer, S. (2011). Statistics for High-Dimensional Data, Boston:

Springer.

Cai, T., Liu, W., & Xia, Y. (2013). Two-sample covariance matrix testing and sup-
port recovery in high-dimensional and sparse settings. Journal of the American Statistical
Association, 108(501), 265–277.

Cai, T. & Ma, Z. (2013). Optimal hypothesis testing for high dimensional covariance

matrices. Bernoulli, 19(5B), 2359–2388.

Cai, T. & Xia, Y. (2014). High-dimensional sparse MANOVA. Journal of Multivariate

Analysis, 131, 174–196.

Candes, E. & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much

larger than n. Annals of Statistics, 35(6), 2313–2351.

Chen, J., Leong, Y., Honey, C., Yong, C., Norman, K., & Hasson, U. (2016).
Shared memories reveal shared structure in neural activity across individuals. Nature Neu-
roscience, 20(1), 115–125.

Chen, S. X. & Qin, Y.-L. (2010). A two-sample test for high-dimensional data with

applications to gene-set testing. Annals of Statistics, 38, 808–835.

Chen, S. X., Zhang, L., & Zhong, P.-S. (2010). Testing high dimensional covariance

matrices. Journal of the American Statistical Association, 105, 810–819.

Cheng, H., Perumbakkam, S., Pyrkosz, A., Dunn, J., Legarra, A., & Muir,
W. (2015). Fine mapping of QTL and genomic prediction using allele-speciﬁc expression
SNPs demonstrates that the complex trait of genetic resistance to Marek’s disease is
predominantly determined by transcriptional regulation. BMC Genomics, 16(1).

Chernoff, H. & Zacks, S. (1964). Estimating the Current Mean of a Normal Distribution
which is Subjected to Changes in Time. Annals of Mathematical Statistics, 35(3), 999–
1018.

Choi, I., Bates, R., Raney, N., Steibel, J., & Ernst, C. (2012). Evaluation of QTL
for carcass merit and meat quality traits in a US commercial Duroc population. Meat
Science, 92:132–8.

Choi, I., Steibel, J., Bates, R., Raney, N., Rumph, J., & Ernst, C. (2011).
Identiﬁcation of Carcass and Meat Quality QTL in an F(2) Duroc x Pietrain pig resource
population using diﬀerent least-squares analysis models. Frontiers in Genetics, 2:18.

159

Cs¨org¨o, M. & Horv´ath, L. (1997). Limit Theorems in Change-Point Analysis, New

York: John Wiley.

Danaher, P., Paul, D., & Wang, P. (2015). Covariance-based analyses of biological

pathways. Biometrika, 102, 533–544.

de la Chapelle, A. (2009). Genetic predisposition to human disease: allele-speciﬁc ex-

pression and low-penetrance regulatory loci. Oncogene, 28(38), 3345–3348.

Dempster, A. (1958). A High Dimensional Two Sample Signiﬁcance Test. Annals of Math-

ematical Statistics, 29(4), 995–1010.

Dette, H., Pan, G., & Yang, Q. (2018). Estimating a change point in a sequence of

very high-dimensional covariance matrices. Arxiv.org.

Donoho, D. L. (2000). High-dimensional data analysis: The curses and blessings of dimen-

sionality. In AMS Conference on Mathematical Challenges of the 21st Century.

Donoho, D. L. & Johnstone, I. (1994). Ideal spatial adaptation by wavelet shrinkage.

Biometrika, 81(3), 425–455.

Edwards, D., Ernst, C., Raney, N., Doumit, M., Hoge, M., & Bates, R. (2008a).
Quantitative trait loci mapping in an F2 Duroc x Pietrain resource population: I. Growth
traits. Journal of Animal Science, 86:241–53.

Edwards, D., Ernst, C., Raney, N., Doumit, M., Hoge, M., & Bates, R. (2008b).
II.

Quantitative trait locus mapping in an F2 Duroc x Pietrain resource population:
Carcass and meat quality traits. Journal of Animal Science, 86:254–66.

Efron, B. (2007). Size, power and false discovery rates. Annals of Statistics, 35, 1351–1377.

Fan, J., Han, F., & Liu, H. (2014a). Challenges of Big Data analysis. National Science

Review, 1(2):293–314.

Fan, J. & Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and its

Oracle Properties. Journal of the American Statistical Association, 96:456, 1348–1360

Fan, J. & Li, R. (2006). Statistical challenges with high dimensionality: Feature selection
in knowledge discovery. In Sanz-Sole, M., Soria, J., Varona, J. L., & Verdera, J., editors,
Proceedings of the International Congress of Mathematicians, volume 3, pages 595–622,
Z¨urich, CH. European Mathematical Society.

Fan, J., Liao, Y., & Yao, J. (2015). Power enhancement in high dimensional cross-

sectional tests. Econometrica, 83, 1497–1541.

Ferguson-Smith, A. (2001). Imprinting and the Epigenetic Asymmetry Between Parental

Genomes. Science, 293(5532), 1086–1089.

Finn, E., Shen, X., Scheinost, D., Rosenberg, M., Huang, J., & Chun, M. et
al. (2015). Functional connectome ﬁngerprinting: identifying individuals using patterns
of brain connectivity. Nature Neuroscience, 18(11), 1664–1671.

160

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for General-
ized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.

Fu, W. & Knight, K. (2000). Asymptotics for lasso-type estimators. Annals of Statistics,

28(5):1356–1378.

Fujikoshi, Y., Ulyanov, V., & Shimizu, R. (2010). Multivariate Statistics, Wiley Series

In Probability And Statistics.

Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., & Hothorn, T.

(2018). mvtnorm: Multivariate Normal and t Distributions, R package version 1.0–8.

Gu, F. & Wang, X. (2015). Analysis of allele speciﬁc expression - a survey. Tsinghua

Science And Technology, 20(5), 513–529.

Hall, P. & Heyde, C. (1980). Martingale Limit Theory and Its Application, New York:

Academic Press.

Hinkley, D. V. (1970). Inference about the change-point in a sequence of random variables.

Biometrika, 57, 1–17.

Horv´ath, L. & Kokoszka, P. (1997). The eﬀect of long-range dependence on change-

point estimators. Journal of Statistical Planning And Inference, 64(1), 57–81.

Huber, P. J. (1981). Robust statistics, New York: John Wiley.

Johnson, R. & Bagshaw, M. (1974). The Eﬀect of Serial Correlation on the Performance

of CUSUM Tests. Technometrics, 16(1), 103–112.

Johnstone, I. (2001). On the distribution of the largest eigenvalue in principal components

analysis.Annals of Statistics, 29 (2001), no. 2, 295–327.

Johnstone, I. & Titterington, D. (2009). Statistical challenges of high-dimensional
data. Philosophical Transactions of The Royal Society A: Mathematical, Physical And
Engineering Sciences, 367(1906), 4237–4253.

Kander, Z. & Zacks, S. (1966). Test Procedures for Possible Changes in Parameters
of Statistical Distributions Occurring at Unknown Time Points. Annals of Mathematical
Statistics, 37(5), 1196–1210.

Kannan, R. P., Hensley, L. L., Evers, L. E., Lemon, S. M., & McGivern, D.
R. (2011). Hepatitis C virus infection causes cell cycle arrest at the level of initiation of
mitosis. Journal of Virology, 85, 7989–8001.

Koh, W., Pan, W., Gawad, C., Fan, H. C., Kerchner, G. A., Wyss-Coray, T.,
Blumenfeld, Y. J., El-Sayed, Y. Y., & Quake, S. R. (2014). Noninvasive in vivo
monitoring of tissue-speciﬁc global gene expression in humans. Proceedings of the National
Academy of Sciences, 111, 7361–7366.

161

Kundu, S., Ming, J., Pierce, J., McDowell, J., & Guo, Y. (2018). Estimating
dynamic brain functional networks using multi-subject fMRI data. Neuroimage, 183, 635–
649.

Laumann, T. O., Snyder, A. Z., Mitra, A., Gordon, E. M., Gratton, C.,
Adeyemo, B., Gilmore, A. W., Nelson, S. M., Berg, J. J., Greene, D. J.,
McCarthy, J. E., Tagliazucchi, E., Laufs, H., Schlaggar, B. L., Dosenbach,
N. U. F., & Peterson, S. E. (2017). On the stability of BOLD fMRI correlations.
Cerebral Cortex, 27, 4719–4732 .

Ledoit, O. & Wolf, M. (2002). Some hypothesis tests for the covariance matrix when
the dimension is large compared to the sample size. Annals of Statistics, 30(4):1081–1102.

Li, J. & Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices.

Annals of Statistics, 40, 908–940.

Meinshausen, N. & B¨uhlmann, P. (2010). Stability selection. Journal of the Royal

Statistical Society: Series B, 72(4):417–473.

Monti, R., Hellyer, P., Sharp, D., Leech, R., Anagnostopoulos, C., & Mon-
tana, G. (2014). Estimating time-varying brain connectivity networks from functional
MRI time series. Neuroimage, 103, 427–443.

Muirhead, R. J. (2005). Aspects of Multivariate Statistical Theory, New York: John Wiley.

Nariai, N., Kojima, K., Mimori, T., Kawai, Y., & Nagasaki, M. (2016). A
Bayesian approach for estimating allele-speciﬁc expression from RNA-Seq data with
diploid genomes. BMC Genomics, 17(S1).

Osborne, M. R., Presnell, B., & Turlach, B. A. (2000). On the LASSO and its

dual. Journal of Computational and Graphical Statistics, 9(2):319–337.

Page, E. S. (1954). Continuous Inspection Schemes. Biometrika, 41(1–2), 100–115.

Ramsay, J. O. (1982). When the data are functions. Psychometrika, 47, 379–396.

Ramsay, J. O. & Silverman, B. W. (2005). Functional Data Analysis, New York:

Springer.

Schapiro, A., Rogers, T., Cordova, N., Turk-Browne, N., & Botvinick, M.
(2013). Neural representations of events arise from temporal community structure. Nature
Neuroscience, 16(4), 486–492.

Schott, J. (2007). A test for the equality of covariance matrices when the dimension is large

relative to the sample size. Computational Statistics & Data Analysis, 51, 6535–6542.

Sen, A. & Srivastava, M. (1973). On Multivariate Tests for Detecting Change in Mean.

Sankhy¯a: The Indian Journal of Statistics: Series A, 35(2), 173–186.

162

Shedden, K. & Taylor, J. (2005). Diﬀerential correlation detects complex associations
between gene expression and clinical outcomes in lung adenocarcinomas. Methods of Mi-
croarray Data Analysis, J. S. Shoemaker and S. M. Lin, eds. Boston: Springer, pp 121–131.

Shen, X., Tokoglu, F., Papademetris, X., & Constable, R. (2013). Groupwise
whole-brain parcellation from resting-state fMRI data for network node identiﬁcation.
Neuroimage, 82, 403–415.

Skelly, D., Johansson, M., Madeoy, J., Wakefield, J., & Akey, J. (2011). A
powerful and ﬂexible statistical framework for testing hypotheses of allele-speciﬁc gene
expression from RNA-seq data. Genome Research, 21(10), 1728–1737.

Srivastava, M. S. & Worsley, K. J. (1986). Likelihood Ratio Tests for a Change in
the Multivariate Normal Mean. Journal of the American Statistical Association, 81:393,
199–204.

Srivastava, M. S. & Yanagihara, H. (2010). Testing the equality of several covariance
matrices with fewer observations than the dimension. Journal of Multivariate Analysis,
101, 1319–1329.

Steibel, J., Bates, R., Rosa, G., Tempelman, R., Rilington, V., & Ragaven-
dran, A. et al. (2011). Genome-wide linkage analysis of global gene expression in loin
muscle tissue identiﬁes candidate genes in pigs. PLOS One, 6(2), e16766.

Steibel, J., Wang, H., & Zhong, P.-S. (2015). A hidden Markov approach for ascer-
taining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by
exploiting linkage disequilibrium. BMC Bioinformatics, 16(1).

Storey, J., Xiao, W., Leek, J., Tompkins, R., & Davis, R. (2005). Signiﬁcance
analysis of time course microarray experiments. Proceedings of the National Academy of
Sciences, 102, 12837–12842.

Tai, Y. & Speed, T. P. (2006). A multivariate empirical Bayes statistic for replicated

microarray time course data. Annals of Statistics, 34, 2387–2412.

Taylor, M., Tsukahara, T., Brodsky, L., Schaley, J., Sanda, C., & Stephens,
M. et al. (2007). Changes in gene expression during pegylated interferon and ribavirin
therapy of chronic hepatitis C virus distinguish responders from nonresponders to antiviral
therapy. Journal of Virology, 81, 3391–3401.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the

Royal Statistical Society: Series B, 58(1):267–288.

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., & Knight, K. (2005). Sparsity
and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B,
67(1):91–108.

Trapnell, C., Pachter, L., & Salzberg, S. L. (2009). TopHat: discovering splice

junctions with RNA-Seq. Bioinformatics, 25(9):1105–11.

163

Venkatraman, E. S. (1992). Consistency results in multiple change-point situations. Tech-

nical report, Department of Statistics. Stanford University.

Wang, D., Yu, Y., & Rinaldo, A. (2017). Optimal Covariance Change Point Localiza-

tion in High Dimension. Arxiv.org.

Yang, Q. & Pan, G. (2017). Weighted statistic in detecting faint and sparse alternatives
for high-dimensional covariance matrices. Journal of the American Statistical Association,
112, 188–200.

Yao, Y. & Davis, R. (1986). The Asymptotic Behavior of the Likelihood Ratio Statistic
for Testing a Shift in Mean in a Sequence of Independent Normal Variates. Sankhy¯a: The
Indian Journal of Statistics: Series A, 48(3), 339–353.

Yuan, M. & Lin, Y. (2006). Model selection and estimation in regression with grouped

variables. Journal of the Royal Statistical Society: Series B, 68(1):49–67.

Zacks, J., Speer, N., Swallow, K., Braver, T., & Reynolds, J. (2007). Event

perception: A mind-brain perspective. Psychological Bulletin, 133(2), 273–293.

Zalesky, A., Fornito, A., Cocchi, L., Gollo, L. L., & Breakspear, M. (2014).
Time-resolved resting-state brain networks. Proceedings of the National Academy of Sci-
ences, 111, 10341–10346.

Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. An-

nals of Statistics, 38(2), 894–942.

Zhang, C., Bai, Z., Hu, J., & Wang, C. (2018). Multi-sample test for high-dimensional
covariance matrices. Communications in Statistics - Theory and Methods, 47:13, 3161–
3177.

Zhang, J. & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of

covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Zhang, C. & Zhang, S. (2013). Conﬁdence intervals for low dimensional parameters in
high dimensional linear models. Journal of The Royal Statistical Society: Series B, 76(1),
217–242.

Zhao, P. & Yu, B. (2006). On Model Selection Consistency of Lasso. Journal of Machine

Learning Research, 2541–2563.

Zheng, S., Bai, Z., & Yao, J. (2015). Substitution principle for CLT of linear spectral
statistics of high-dimensional sample covariance matrices with applications to hypothesis
testing. Annals of Statistics, 43, 546–591.

Zhu, L.-X., Ng, K., & Jing, P. (1992). Resampling methods for homogeneity tests of

covariance matrices. Statistica Sinica, 12, 769–783.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American

Statistical Association, 101(476):1418–1429.

164

Zou, H. & Hastie, T. (2005). Regularization and variable selection via the elastic net.

Journal of the Royal Statistical Society: Series B, 67(2):301–320.

165