BOOTSTRAP BASED HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL DATA

By Nilanjan Chakraborty

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics — Doctor of Philosophy 2022

ABSTRACT

BOOTSTRAP BASED HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL DATA

By Nilanjan Chakraborty

Over the last two decades, inference problems for high-dimensional data arising in finance, genetics and information technology have gained huge momentum. In this work, the main focus is on developing bootstrap testing procedures in the high-dimensional setup for the following two hypothesis testing problems: i) high-dimensional multivariate analysis of variance (MANOVA); ii) testing the equality of two covariance matrices in the two-sample setup. The statistics considered for testing are infinity-norm based statistics over either weighted sums or differences across the various samples. We provide Gaussian approximation results for normalized sums of high-dimensional random vectors and U-statistics under some weak conditions on the moments and tails of their marginal distributions. The obtained results are free from assumptions of sparsity and correlation structure among the components of the random vectors. For the implementation of these tests, we develop multiplier bootstrap and jackknifed multiplier bootstrap procedures. These newly developed bootstrap techniques ensure first-order accuracy of the asymptotic level and power of the formulated tests, enhancing their applicability. We also establish consistency of the proposed tests against both fixed and local alternatives.

This thesis is dedicated to my family and to all my teachers for their endless love, support and encouragement.

ACKNOWLEDGMENTS

I am indeed grateful to many people who helped me accomplish this doctoral degree in Statistics. First of all, I would like to thank my parents, Mr. Nandadulal Chakraborty and Mrs. Krishna Chakraborty. From educating me in the earliest stages of life to constantly supporting me, even to this day so far away from home, my parents have indeed been my biggest support system. Without them, this day would not have been possible. Even though my late grandmother, Smt. Pratima Chowdhury, is not with us, her love and care always provided me with a safety net to aspire for a better future. I could not help but remember her today. Also, a very special thanks to my sister, Miss Manjira Chakraborty, for not allowing me to give up when the chips were down.

I consider myself fortunate to have been advised by Prof. Hira Lal Koul and Prof. Lyudmila Sakhanenko during my graduate study at Michigan State University. Prof. Koul's endless support and encouragement have always been a source of inspiration for me. I must mention that throughout the difficult times of the pandemic, whenever I needed help, he would have in-person meetings with me. Prof. Sakhanenko was instrumental in shaping this thesis into its present form. Her mathematical acumen and her way of thinking were quite enticing, which helped me a lot to think independently and grow as a scientific researcher. I would also like to thank Dr. Tapabrata Maiti (Taps) and Dr. Yuehua Cui for serving on my committee and for all their support and discussions. I am grateful to Dr. Maiti for making me look into the application side of a problem in order to make it more appealing to practitioners. I would also like to thank Prof. Koul's family and Dr. Maiti's family for their amazing hospitality.
Apart from my committee, I am thankful to Prof. Yimin Xiao, Prof. Haolei Wang, Prof. R.V. Ramamoorthi and Prof. P.K. Pathak for their helpful discussions and encouragement, and to Prof. Camille Fairbourn, Dr. Harish Sankaranarayanan and Prof. Elijah Dikong for mentoring me as a teaching assistant at the department. A big thank you to all the staff members, Andy Hufford, Tami Hankey, Megan Spaulding and Teresa Vollmer, who ensured a nice and smooth working environment at the department. I would also like to thank Prof. Ashoke Kumar Sinha and his wife Kalyani Ghosh for their unconditional support.

I am indeed grateful to my professors at the Department of Statistics, University of Calcutta, the late Prof. Uttam Bandyopadhyay and Prof. Gaurangadeb Chattopadhyay, whose immense love, support and encouragement motivated me to pursue the doctoral program at Michigan State University. The training I received from them and other faculty members, on statistical inference and other topics, paved the way for my dissertation in this direction. I am indeed grateful to Prof. Monika Bhattacharjee (Monika di) for being my philosopher and guide. The discussions I had with her helped me enhance my mathematical rigor and improve my chain of thought.

My deepest gratitude to Abhijnan Chattopadhyay, Alex Pijyan, Satabdi Saha, Cheuk Yin Lee, Chitrak Banerjee, Runze Su, Yeusong Wu, Pratim Guha Niyogi, Ashish Banik, Rejaul Karim, Raka Mandal, Tathagata Dutta, Hema Kollipara, Sumegha Premchandar, Nian Liu, Kaixu Yang, Metin Eroglu and others for all the great memories of the last five years, which I will remember for the rest of my life. And last but not least, I would like to express sincere gratitude to my friends Ambarish Chattopadhyay, Aniket Biswas, Anish Ganguli, Aditi Sen, Sneha Chakraborty, Sagnik Mukherjee, Anirban Ghosh, Srijan Bhattacharya, Sagnik Halder, Dhrubojyoti Ghosh, Ronita Bose, Hiya Banerjee and Joydeep Basu, for their support and encouragement in many moments of crisis. I thank you all from the bottom of my heart.

TABLE OF CONTENTS

Chapter 1 Introduction . . . . . . . . . . 1
1.1 High dimensional MANOVA . . . . . . . . . . 2
1.2 Testing equality of Two Population Covariance Matrices . . . . . . . . . . 3
1.3 Notation . . . . . . . . . . 6
Chapter 2 Multiplier bootstrap tests for high-dimensional data with applications to MANOVA . . . . . . . . . . 7
2.1 Introduction . . . . . . . . . . 7
2.2 Overview . . . . . . . . . . 8
2.3 Gaussian approximation results over the class of convex sets C_{L,s} . . . . . . . . . . 9
2.4 Multiplier bootstrap results over the class of convex sets C_{L,s} . . . . . . . . . . 22
2.5 Motivating application to high-dimensional testing problems . . . . . . . . . . 27
2.5.1 MANOVA (balanced case) . . . . . . . . . . 28
2.5.2 Unbalanced MANOVA . . . . . . . . . . 34
2.5.3 Two-Way MANOVA with unequal observations . . . . . . . . . . 36
2.5.4 Linear hypothesis testing . . . . . . . . . . 38
2.6 Connection with other tests . . . . . . . . . . 40
2.7 Discussion . . . . . . . . . . 42
Chapter 3 Bootstrap based testing for equality of covariance matrices . . . . . . . . . . 45
3.1 Introduction . . . . . . . . . . 45
3.2 Overview . . . . . . . . . . 46
3.3 Gaussian approximation result for U-statistics . . . . . . . . . . 47
3.4 Jackknifed Multiplier Bootstrap Approximation for U-statistics . . . . . . . . . . 56
3.5 Two sample test for covariance matrices . . . . . . . . . . 75
3.6 Analysis of Power . . . . . . . . . . 81
APPENDIX . . . . . . . . . . 90
BIBLIOGRAPHY . . . . . . . . . . 103

Chapter 1

Introduction

Over the last few decades, modern data collection techniques have enabled scientists to gather data sets with a huge number of variables. Such data sets are typically referred to as high-dimensional data, and they arise frequently in biology, genomics, artificial intelligence and the financial sector. A unique feature of high-dimensional data is that even for relatively small sample sizes, the observed number of variables is very large. For example, in a typical gene-expression data set, one has thousands of gene expressions for a few hundred human subjects. More insights about these data sets can be found in Efron (2012) [23], Giraud (2021) [27], Shedden and Taylor (2004) [44], Buhlmann and Van de Geer (2011) [8], Wainwright (2019) [49] and others.

From the discussion in the above paragraph, it is clear that analyzing these sorts of data sets requires multivariate methods of estimation and inference. Several traditional multivariate methods of statistical inference can be found in Anderson (2003) [3], Muirhead (1982) [36], Khatri and Srivastava (1979) [47], Roy (1957) [41], among many others. Although there is a vast literature available for the analysis of multivariate data sets, these methods are of little use when the number of variables becomes comparable to, or larger than, the sample size. Traditional multivariate results are not directly applicable when the dimension of these data sets cannot be treated as a fixed constant. For the past three decades, statisticians have been developing new methods to tackle both estimation and inference problems arising in high-dimensional data sets. In this thesis we consider inferential problems related to high-dimensional data sets and a few possible solutions.

1.1 High dimensional MANOVA

One high-dimensional problem is to test the equality of means among K groups of populations. When the sample observations are univariate, this problem is known as classical analysis of variance, or ANOVA. The first test statistic for ANOVA was proposed by Ronald Fisher in 1918 [25]. Under the traditional multivariate setup, where the dimension of the mean vectors is held fixed relative to the sample size, Wilks (1932) [53] provided a statistic for testing the equality of several population means. Since then, several other test statistics have been proposed by Lawley (1938) [31], Nanda (1951) [37], Pillai (1955) [40], Roy (1957) [41] and others. These tests rely heavily on likelihood ratios built from the between-sample and within-sample covariance matrices.
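To make this reliance concrete, here is a minimal numerical sketch (our illustration, not from the dissertation) of Wilks' lambda, the prototypical likelihood-ratio MANOVA statistic built from the within-group and total sum-of-squares matrices; the group sizes and dimensions are hypothetical choices, and the second case previews the breakdown discussed next.

```python
import numpy as np

def wilks_lambda(samples):
    """samples: list of (n_k x p) arrays, one per group.
    Wilks' lambda = det(W) / det(W + B), with W the within-group and
    B the between-group sum-of-squares matrices."""
    grand_mean = np.vstack(samples).mean(axis=0)
    W = sum((X - X.mean(axis=0)).T @ (X - X.mean(axis=0)) for X in samples)
    B = sum(len(X) * np.outer(X.mean(axis=0) - grand_mean,
                              X.mean(axis=0) - grand_mean) for X in samples)
    return np.linalg.det(W) / np.linalg.det(W + B)

rng = np.random.default_rng(0)
for p in (5, 80):  # second case: p exceeds the total sample size 3 * 20 = 60
    groups = [rng.standard_normal((20, p)) for _ in range(3)]
    # For p = 80, W has rank at most 57, so both determinants collapse to
    # (numerical) zero and the likelihood-ratio statistic degenerates.
    print(p, wilks_lambda(groups))
```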
As soon as the dimension exceeds the minimum sample size, at least one of the sample covariance matrices becomes singular, and these likelihood-ratio based test statistics tend to become zero, which leads to unsatisfactory performance. Under the setup of increasing dimension, the first proposed test was based on the ratio of the traces of the sample covariance matrices, due to Dempster (1958, 1960) [21] [22]. Bai and Saranadasa (1996) [4] established asymptotic normality of Dempster's test statistic based on a corrected likelihood ratio when K = 2. Later, Fujikoshi, Himeno and Wakaki (2004) [26], Schott (2007) [42], Srivastava (2007) [45], and Srivastava and Yamada (2012) [55] proposed several tests for MANOVA in high dimensions. All of these tests were formulated under the assumption of either multivariate normality or equal covariance matrices among the K groups. Cai and Xia (2014) [10] proposed a test statistic based on the infinity norm and proved its asymptotic convergence to a Gumbel distribution under the assumption of homogeneity of the population covariance matrices. Under the same assumption as Cai and Xia (2014) [10], Zhang, Guo and Zhou (2017) [56] proposed a test based on the Frobenius norm and proved that its asymptotic null distribution is a chi-square distribution. Recently, Chen, Li and Zhong (2019) [15] proposed a test statistic based on the L2-norm and proved its asymptotic normality without assuming normality of the underlying populations or homogeneity of the population covariance matrices.

We propose a statistical test for the equality of the means of K populations. We establish a Gaussian approximation result for a class of sparse convex sets. This Gaussian approximation result generalizes the Gaussian approximation results over the class of hyperrectangles by Chernozhukov, Chetverikov and Kato (2017) [19]. We also develop multiplier bootstrap results over the class of convex sets. These results find an extremely useful application in the context of the high-dimensional MANOVA problem, based on supremum-type test statistics for the differences in means among the K groups. Here we allow the number of groups K to diverge to infinity. The test procedure considered here is free from distributional and correlational assumptions, which broadens its scope for practical applications. The problem of linear hypothesis testing in the high-dimensional setup is also tackled using the previously mentioned results. The asymptotic analysis of these tests, in terms of controlling size and power, has been theoretically validated, and the connection with various other tests is established.

1.2 Testing equality of Two Population Covariance Matrices

Along with detecting differences among population means, another important problem is to study the dependence among the components of the observed sample vectors under various stages of treatment, e.g., in cancer and Alzheimer's disease data sets. Moreover, increasing dimension typically gives rise to complex dependence structures, so the test for equality of two covariance matrices becomes quite challenging. In genomic studies, the genetic networks of living cells determine the internal structures of microarray gene expressions or single nucleotide polymorphism (SNP) counts. Huge variation and dependence among the measurements of various genes are observed subject to different biological conditions and treatments.
For example, some genes may be tightly correlated in the controlled or early stage of a disease, but their dependence can wither away during the later, more serious stages of the disease. More references on these data sets and their various impacts on the dependence structure of the vectors can be found in Shedden and Taylor (2004) [44].

In the case of fixed dimension, several tests based on likelihood ratios, inspired by Bartlett (1937) [7], have been proposed; see Chapter 10 of Anderson (2003) [3] for more details and references. Marchenko and Pastur (1967) [35], Capitaine and Casalis (2004) [12], and Wang and Paul (2014) [50] have made immense contributions in the field of random matrix theory to the study of the limiting spectral distribution of the sample covariance matrix. Several other advances were made by Johansson (2000) [28], Johnstone (2001) [29], and Lee and Schnelli (2016) [32] on the frontier of extreme eigenvalues of sample covariance matrices. In the case of a spiked covariance structure, the asymptotic distribution of the extreme eigenvalues was studied by Johnstone (2001) [29], Baik and Silverstein (2006) [6], Paul (2007) [39], Bai and Yao (2008) [5], Johnstone and Lu (2009) [30], Wang and Fan (2017) [51], and Cai, Han and Pan (2020) [9].

From another point of view, without using random matrix theory (RMT), several other tests based on different norms have been proposed. The test proposed by Schott (2007) [43] is based on a metric that measures the difference between the two sample covariance matrices. With Σ1 and Σ2 denoting the two population covariance matrices, Srivastava and Yanagihara (2010) [46] proposed a test based on the difference between tr(Σ1²)/(tr(Σ1))² and tr(Σ2²)/(tr(Σ2))², where tr stands for the trace. Both of these tests were formulated under the assumption of multivariate normality of the underlying populations, or were tailored to moderately high dimensionality. A U-statistic test based on an unbiased estimator of the Frobenius norm of the difference of the two population covariance matrices was proposed by Li and Chen (2012) [33]. Assuming the difference of the two covariance matrices to be sparse, Cai, Liu and Xia (2013) [11] developed a test based on the maximum of the standardized differences between the entries of the sample covariance matrices. These two tests work outside the regime of multivariate normal populations as well. Recently, Chang, Zhou, Zhou and Wang (2017) [14] proposed a test based on a bootstrap version of the test statistic used in Cai et al. (2013) [11], under assumptions of sparsity and correlational structure among the components of the random vectors.

We propose a statistical test for the equality of two high-dimensional covariance matrices that requires no distributional assumptions, except for some weak conditions on the moments of the random vectors and the tails of the marginal distributions. Our derivation is based on an extension of the one-sample central limit theorem for non-degenerate U-statistics in Chen (2018) [16]. This provides a practically useful procedure with rigorous theoretical guarantees for the asymptotic level and power. In particular, the proposed test is easy to implement, as we can allow arbitrary correlation structures among the components of the random vectors.
Other salient features include weaker moment and tail conditions than the existing methods, allowance for highly unequal sample sizes, consistent power behavior against fairly general alternatives, and a data dimension that is allowed to be exponentially high under the umbrella of such general conditions.

1.3 Notation

In this section we describe the notation and conventions used throughout this thesis. Let d ≡ dm or dm,n be a sequence of positive integers, depending on the context. Let Rd denote the d-dimensional Euclidean space and, for x ∈ Rd, let x^T denote its transpose. For any two column vectors x = (x1, ..., xd)^T and y = (y1, ..., yd)^T in Rd, write x ≤ y whenever xj ≤ yj for all j = 1, ..., d. For any x ∈ Rd and a, b ∈ R, x + a := (x1 + a, x2 + a, ..., xd + a), a ∨ b = max{a, b} and a ∧ b = min{a, b}. For any two sequences of positive numbers an and bn, write an ≲ bn if, for some positive and finite constant C, an ≤ C bn for all large n. We write an ≈ bn if an ≲ bn and bn ≲ an. For any matrix A = ((aij)) of real numbers, ∥A∥∞ := max_{i,j} |aij|. For any function f : R → R, ∥f∥∞ := sup_{z∈R} |f(z)|. For a smooth function g : Rd → R, we use indices to represent partial derivatives for brevity; for example, ∂j ∂k ∂l g = g_{jkl}. The notation ξ ∼D G means that the random vector ξ ∈ Rd has distribution function (d.f.) G. Let ψα(x) = exp(x^α) − 1, x ≥ 0, α > 0. For any random variable (r.v.) X, the quantity

∥X∥_{ψα} := inf{λ > 0 : E[ψα(|X|/λ)] ≤ 1}

is known as the Orlicz norm when α ∈ [1, ∞) and a quasi-norm for α ∈ (0, 1); see, e.g., p. 95 of van der Vaart and Wellner (1996). We define ARe as the class of all hyperrectangles A in Rd of the form A = {x ∈ Rd : aj ≤ xj ≤ bj for all j = 1, ..., d}, where −∞ ≤ aj ≤ bj ≤ ∞ for j = 1, ..., d.

Chapter 2

Multiplier bootstrap tests for high-dimensional data with applications to MANOVA

2.1 Introduction

The statistical motivation for this work comes from the high-dimensional MANOVA problem of testing H0 : μ1 = ... = μK for K > 2. This problem has been the focus of many recent works due to its growing importance in genomics and neuroimaging, among many other fields of science. For example, Fujikoshi et al. (2004) [26] considered the ratio of the traces of the between-sample covariance and the within-sample covariance, while Schott (2007) [42] proposed a test based on the difference of those two traces. Srivastava (2007) [45] used the Moore-Penrose inverse of the within-sample covariance matrix to construct a test. Cai and Xia (2014) [10] proposed a test based on the maximum norm of the squared differences between the K groups. All the mentioned tests were formulated either under the assumption that the data are generated from a multivariate normal population or under some stringent distributional or sparsity assumptions. Moreover, all these tests assume equal covariance structure among the groups. Recently, Chen, Li and Zhong (2019) [15] proposed a thresholded L2-norm-type test statistic assuming sparsity in the population means, mixing, and multivariate sub-Gaussianity among the components of the random vectors. They allow different sparse covariance matrices across groups; the sparsity assumptions on the means and covariances are crucial in their work. In this work, we eliminate the need for these assumptions.

From a different point of view, this work also extends the recent work of Xue and Yao (2020) for K = 2 to the case K > 2.
This extension is elegant and less technical than directly re-proving the results in Xue and Yao (2020), where one would have to tackle intricate block-type dependency structures and work with U-statistics similar to what was done in Chen (2018) [16]. The class of our tests enjoys all the good properties of the tests in Xue and Yao (2020) [54]. In particular, our tests are computationally fast and simple, since they do not require estimation of covariance and/or precision matrices. Our tests have the advantage of being versatile, so they can be just as easily adapted to solve MANOVA or to test for a linear structure of the population means, such as contrasts.

2.2 Overview

This chapter establishes the rates of approximation of the distributions of normalized sums Sn of independent random vectors by those of the corresponding sums of independent Gaussian random vectors, over a new class of convex sets C_{L,s}, in a high-dimensional setup. We also establish the rate of approximation of the distributions of Sn by the multiplier bootstrap distributions. The class of convex sets C_{L,s} allows us to quantify explicitly the effect of sparsity on the convergence rate of these approximations. We first prove the Gaussian approximation results over this class of convex sets, and then prove the multiplier bootstrap approximation results over the same class. To appreciate the usefulness of these results, we illustrate them by developing tests for high-dimensional MANOVA (Multivariate Analysis of Variance) under various setups and conditions. In particular, these results find an extremely useful application in the context of the high-dimensional MANOVA problem, via supremum-type test statistics for the differences in means among the K groups. Throughout this chapter, we allow the number of groups K to diverge to infinity. The considered test procedure is free from distributional and correlational assumptions, which broadens its scope for practical applications. The problem of linear hypothesis testing for several means is also tackled using the previously mentioned results. The asymptotic analysis of these tests, in terms of controlling size and power, has been theoretically validated. We have also established the connection between our proposed test and some other popular tests in the existing literature. An empirical comparison between our proposed test and the existing popular competitors is beyond the scope of this thesis; however, this comparison, as well as a real data application, has recently been carried out in Chakraborty and Sakhanenko (2022) [13].

2.3 Gaussian approximation results over the class of convex sets C_{L,s}

Recall that all vectors are treated as columns. Let X1, ..., Xn be independent random vectors in Rd. The components of Xi are denoted by Xij, j = 1, ..., d. Assume EXij = 0 and EXij² = σij² < ∞, for i = 1, ..., n and j = 1, ..., d. The normalized sum

S_n^X = n^{−1/2} Σ_{i=1}^n Xi

is approximated by its Gaussian analogue

S_n^Y = n^{−1/2} Σ_{i=1}^n Yi,

where Y1, ..., Yn are independent centered Gaussian random vectors in Rd such that EYij² = σij², and all Yi are independent of all Xi. The quality of the approximation of a normalized sum by its Gaussian analogue is assessed via

ρn(B) = sup_{B∈B} |P(S_n^X ∈ B) − P(S_n^Y ∈ B)|,

where B is a subclass of all Borel sets in Rd.
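Before the formal development, here is a small Monte Carlo sketch (our illustration, not part of the dissertation) of the quantity ρn just defined, estimated over the simple one-sided rectangles {w ∈ Rd : max_j wj ≤ u}. The sample size, dimension, skewed design and grid are arbitrary choices; with these choices the estimate should be small, in line with the Gaussian approximation phenomenon quantified below.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, reps = 200, 50, 2000

def max_of_normalized_sum(gaussian):
    if gaussian:
        X = rng.standard_normal((n, d))          # Y_i: centered Gaussian, variance 1
    else:
        X = rng.exponential(1.0, (n, d)) - 1.0   # X_i: centered, variance 1, skewed
    return (X.sum(axis=0) / np.sqrt(n)).max()

sx = np.array([max_of_normalized_sum(False) for _ in range(reps)])
sy = np.array([max_of_normalized_sum(True) for _ in range(reps)])

# Kolmogorov distance between the two empirical laws of the max coordinate,
# i.e. rho_n restricted to the sets {w : max_j w_j <= u} over a grid of u.
grid = np.linspace(0, 4, 200)
rho_hat = np.max(np.abs((sx[:, None] <= grid).mean(axis=0)
                        - (sy[:, None] <= grid).mean(axis=0)))
print(f"estimated rho_n over max-type rectangles: {rho_hat:.3f}")
```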
For an integer L > 0 and a real number s > 0, we introduce a natural class C_{L,s} of sparse convex sets which are intersections of a finite number Ld of half-spaces. Let A^(1), ..., A^(L) be fixed d × d matrices and let u1, ..., uL be vectors in Rd. Define

C_{L,s} = { {w ∈ Rd : A^(1) w ≤ u1, ..., A^(L) w ≤ uL} : A^(1), ..., A^(L) ∈ A, u1, ..., uL ∈ Rd },

where A ⊂ R^{d×d} contains all matrices satisfying the following sparsity condition, referred to as condition (L): for some s > 0 (potentially depending on L and d),

Σ_{m=1}^d |A^(l)_{jm}| ≤ s, j = 1, ..., d, l = 1, ..., L.    (2.3.1)

In a series of seminal works on Gaussian approximation for high-dimensional data, Chernozhukov et al. (2017) [19] considered the class of hyperrectangles ARe, which consists of all sets A of the type A = {w ∈ Rd : aj ≤ wj ≤ bj, for all j = 1, ..., d} for some −∞ ≤ aj ≤ bj ≤ ∞, j = 1, ..., d. Here we make the important observation that the class ARe is a special case of the class C_{L,s}, where L = 2 and A = {Id, −Id}, with Id the d × d identity matrix.

In order to formulate the results, we need some more notation. Similarly to Chernozhukov et al. (2017) [19], we introduce the following quantities. For ϕ ≥ 1 and any random vector X we define

M̃_{n,d,L,X}(ϕ) = n^{−1} Σ_{i=1}^n E[ (max_{1≤j≤d} |Xij|)³ I( max_{1≤j≤d} |Xij| > √n / (4ϕ log(Ld)) ) ],    (2.3.2)

M̃_{n,d,L,X,Y}(ϕ) = M̃_{n,d,L,X}(ϕ) + M̃_{n,d,L,Y}(ϕ).

For the sake of brevity, from now on we denote M̃_{n,d,L,X}(ϕ) = M̃_{n,X}(ϕ). This functional is slightly different from the corresponding one in Chernozhukov et al. (2017) [19], since the log term contains Ld as opposed to d. Also recall from Chernozhukov et al. (2017) [19] the notation

Ln = max_{1≤j≤d} n^{−1} Σ_{i=1}^n E[|Xij|³].

Moreover, we introduce the average covariance matrix

Σ = (1/n) Σ_{i=1}^n Σ^(i), where Σ^(i)_{km} = E(Xik Xim).    (2.3.3)

We assume the following condition.

(N1) There exists a constant b > 0 such that for all j = 1, ..., d and all l = 1, ..., L, [A^(l) Σ (A^(l))^T]_{jj} ≥ b.

This condition gives a structural interplay between the matrices A^(l), l = 1, ..., L, and the covariance matrices Σ^(i), i = 1, ..., n. If all the covariance matrices are the same and A is an identity matrix, we recover condition (M.1) in Chernozhukov et al. (2017) [19]. Thus, (N1) replaces (M.1) for more intricate matrices. Condition (N1) plays a key role in achieving the Gaussian approximation result, as it is critical for the Nazarov inequality to hold; see Lemma .0.4 for more details.

Before stating the following lemma, we define a few more quantities. For any random vector X̃ ∈ R^{Ld}, we define

L_{n,X̃} = max_{1≤m≤Ld} (1/n) Σ_{i=1}^n E[|X̃im|³],

M_{n,X̃}(ϕ) = (1/n) Σ_{i=1}^n E[ (max_{1≤m≤Ld} |X̃im|)³ I( max_{1≤m≤Ld} |X̃im| > √n / (4ϕ log(Ld)) ) ],

M_{n,X̃,Ỹ}(ϕ) = M_{n,X̃}(ϕ) + M_{n,Ỹ}(ϕ).

Lemma 2.3.1. Suppose that (N1) and (L) hold. Then for some constant K′ depending only on b and all ϕ ≥ 1 we have

ρ′n := sup_{C∈C_{L,s}, v∈[0,1]} |P(√v S_n^X + √(1−v) S_n^Y ∈ C) − P(S_n^Y ∈ C)|
≤ K′ (s³/√n) ϕ² log²(Ld) [ ϕ Ln ρ′n + Ln √(log(Ld)) + ϕ M_{n,X,Y}(sϕ) ] + K′ √(log(Ld)) / ϕ.

Lemma 2.3.1 can be seen as an extension of Lemma .0.5.

Proof. The trick is to stack (A^(1) Xi, ..., A^(L) Xi) into Ld-dimensional vectors X̃i for i = 1, ..., n. Then

P(S_n^X ∈ C) = P(S_n^{X̃} ≤ u), u = (u1, ..., uL) ∈ R^{Ld}.

The following condition can be seen as an analogue of the condition n^{−1} Σ_{i=1}^n E[(Xij)²] ≥ b, j = 1, ..., d, of Lemma .0.5:

(1/n) Σ_{i=1}^n E[(A^(l) Xi)_j²] ≥ b, j = 1, ..., d, l = 1, ..., L.    (2.3.4)

Recalling (2.3.3), we see that the left-hand side of (2.3.4) can be rewritten as

(1/n) Σ_{i=1}^n E[ ( Σ_{k=1}^d A^(l)_{jk} Xik ) ( Σ_{m=1}^d A^(l)_{jm} Xim ) ] = (1/n) Σ_{i=1}^n Σ_{k=1}^d Σ_{m=1}^d A^(l)_{jk} A^(l)_{jm} E(Xik Xim) = [A^(l) Σ (A^(l))^T]_{jj}.    (2.3.5)

Therefore, by (2.3.5), the condition with b becomes [A^(l) Σ (A^(l))^T]_{jj} ≥ b for j = 1, ..., d and l = 1, ..., L, which is condition (N1) defined above. We now apply Lemma .0.5 to the X̃i. For the functional L_{n,X̃} we have

L_{n,X̃} = max_{1≤m≤Ld} (1/n) Σ_{i=1}^n E[|X̃im|³] = max_{1≤j≤d, 1≤l≤L} (1/n) Σ_{i=1}^n E[|(A^(l) Xi)_j|³] = max_{1≤j≤d, 1≤l≤L} (1/n) Σ_{i=1}^n E| Σ_{k=1}^d A^(l)_{jk} Xik |³.

Using condition (L) and the convexity of the ℓ3-norm, we have

E| Σ_{k=1}^d A^(l)_{jk} Xik |³ ≤ s³ Σ_{k=1}^d ( |A^(l)_{jk}| / Σ_{m=1}^d |A^(l)_{jm}| ) E[|Xik|³] ≤ s³ max_{1≤j≤d} E[|Xij|³].    (2.3.6)

Using (2.3.6) we can bound L_{n,X̃} as

L_{n,X̃} ≤ s³ max_{1≤k≤d} (1/n) Σ_{i=1}^n E|Xik|³ = s³ Ln.

Applying Lemma .0.5 to the X̃i, we obtain the functional M_{n,X̃,Ỹ}(ϕ) = M_{n,X̃}(ϕ) + M_{n,Ỹ}(ϕ), with M_{n,X̃}(ϕ) as defined above. Note that by condition (2.3.1),

max_{1≤m≤Ld} |X̃im| = max_{1≤j≤d, 1≤l≤L} | Σ_{k=1}^d A^(l)_{jk} Xik | ≤ s max_{1≤k≤d} |Xik|.

Then M_{n,X̃}(ϕ) is bounded from above by

(s³/n) Σ_{i=1}^n E[ (max_{1≤k≤d} |Xik|)³ I( max_{1≤k≤d} |Xik| > √n / (4sϕ log(Ld)) ) ] = s³ M̃_{n,X}(sϕ).

Using the bounds for M_{n,X̃,Ỹ} and L_{n,X̃} together with (N1) in Lemma .0.5, we obtain the conclusion of Lemma 2.3.1.

Lemma 2.3.2. Suppose that (N1) and (L) hold. Then there exist constants K1, K2 > 0 depending only on b such that for every sequence ℓn of real numbers with ℓn ≥ Ln, we have

(1 − Ln/ℓn) ρn(C_{L,s}) ≤ K1 [ s ( ℓn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/ℓn ],

with

sϕn = K2 ( ℓn² log⁴(Ld) / n )^{−1/6}, M_{n,X,Y}(ϕn) = M̃_{n,X}(ϕn) + M̃_{n,Y}(ϕn).

Note that the technique of proof of Lemma 2.3.2 is analogous to that of Theorem 2.1 in Chernozhukov et al. (2017) [19], adapted to the class of sparse convex sets.

Proof: For

c0(ϕ) = K′ (s³/√n) ϕ³ log²(Ld) Ln,
c1(ϕ) = K′ (s³/√n) ϕ² log^{5/2}(Ld) Ln + K′ (s³/√n) ϕ³ log²(Ld) M_{n,X,Y}(sϕ) + K′ √(log(Ld)) / ϕ,

Lemma 2.3.1 gives ρ′n ≤ c0(ϕ) ρ′n + c1(ϕ). For c0(ϕ) > 1 this inequality is trivial, so we only consider the case c0(ϕ) < 1. We solve the two inequalities 1 − c0(ϕ) = δ > 0 and ρ′n ≤ c1(ϕ)/δ and try to choose ϕ. The first inequality gives

K′ (s³/√n) ϕ³ log²(Ld) Ln < 1 − δ,

while the second yields

δ ρ′n ≤ K′ (s³/√n) ϕ² log^{5/2}(Ld) Ln + K′ (s³/√n) ϕ³ log²(Ld) M_{n,X,Y}(sϕ) + K′ √(log(Ld)) / ϕ.

In other words, in the first inequality we need

sϕ < ( √n (1 − δ) / (K′ log²(Ld) Ln) )^{1/3} = (K′)^{−1/3} ( log⁴(Ld) (Ln/(1 − δ))² / n )^{−1/6}.

Set ℓn = Ln/(1 − δ) > Ln for 0 < δ < 1 and set K2 = (K′)^{−1/3}/2. Meanwhile, the second inequality for ϕ = ϕn = (K2/s) ( ℓn² log⁴(Ld) / n )^{−1/6} gives

δ ρ′n ≤ K′ K2² s n^{−1/6} ℓn^{−2/3} log^{7/6}(Ld) Ln + K′ K2³ ℓn^{−1} M_{n,X,Y}(sϕn) + (K′/K2) s n^{−1/6} ℓn^{1/3} log^{7/6}(Ld)
≤ K′ [K2² + 1/K2] s n^{−1/6} ℓn^{1/3} log^{7/6}(Ld) + K′ K2³ ℓn^{−1} M_{n,X,Y}(sϕn).

Now choose K1 = max(K′[K2² + 1/K2], K′ K2³), which completes the proof.

Before stating the next theorem, we introduce some more conditions.

(N2) Let Bn ≥ 1, n ∈ N, be a sequence of numbers such that Ln ≤ Bn.

(N3a) Let Bn ≥ 1, n ∈ N, be a sequence of numbers such that ∥Xij∥_{ψ1} ≤ Bn for all i = 1, ..., n and j = 1, ..., d.

(N3b) Let Bn ≥ 1, n ∈ N, be a sequence of numbers such that

E[ max_{1≤j≤d} (|Xij|/Bn)³ ] ≤ 2 for all i = 1, ..., n.

(N3c) There exists a universal constant c > 0 such that

s⁶ Bn² log⁷(Ldn) / n ≤ c, or s ( Bn² log⁷(nLd) / n )^{1/6} + ( Bn² log³(nLd) / n^{1/3} )^{1/3} ≤ c.

In comparison with Chernozhukov et al. (2017), note that (N2) is their condition (M.2), while condition (N3a) replaces their condition (E.1), and (N3b) is their condition (E.2) with q = 3.

Theorem 2.3.1. Suppose conditions (L), (N1), (N2) are satisfied. Then under condition (N3a),

ρn(C_{L,s}) ≤ C s ( Bn² log⁷(nLd) / n )^{1/6},    (2.3.7)

while under condition (N3b),

ρn(C_{L,s}) ≤ C s ( Bn² log⁷(nLd) / n )^{1/6} + C ( Bn² log³(nLd) / n^{1/3} )^{1/3},    (2.3.8)

where the constant C depends on b only.

Remark 2.3.1. Note that under (N3c), the conclusion of this theorem is non-trivial; (N3c) specifies the growth rates of Bn, L, s, d with n.

The proof of Theorem 2.3.1 is an adaptation of the technique used in Proposition 2.1 of Chernozhukov et al. (2017) [19]. We provide the details for the sake of completeness.

Proof: From Lemma 2.3.2, we have

ρn(C_{L,s}) ≤ K1 [ s ( ℓn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/ℓn ],    (2.3.9)

where ϕn and ℓn are defined as in Lemma 2.3.2. To obtain the bounds on the right-hand sides of (2.3.7) and (2.3.8), one needs to bound ℓn, M̃_{n,X}(sϕn) and M̃_{n,Y}(sϕn) under (N3a) and (N3b), respectively.

Condition (N2) implies that Ln ≤ Bn; we set ℓn = Bn. Condition (N3a) entails that max_{1≤i≤n} max_{1≤j≤d} ∥Xij∥_{ψ1} ≤ Bn. Since Yij ∼ N(0, σij²), i = 1, ..., n, j = 1, ..., d, we have for any t ≥ 0

P(|Yij| > t) ≤ 2 exp( −t² / (2σij²) ),

so that ∥Yij∥_{ψ2} ≤ [(1 + 2)(2σij²)]^{1/2} for all i and j. Noting that for p < q, ∥X∥_{ψp} ≤ (log 2)^{p/q} ∥X∥_{ψq}, we obtain for all i and j that ∥Yij∥_{ψ1} ≤ (log 2)^{1/2} ∥Yij∥_{ψ2}. From the properties of the Orlicz norm, it can be argued that for all i and j,

(σij²)^{1/2} = (E(Xij)²)^{1/2} ≤ 2 ∥Xij∥_{ψ1}.

Therefore, combining the above, we conclude that for some universal positive constant c,

max_{1≤i≤n} max_{1≤j≤d} ∥Yij∥_{ψ1} ≤ c Bn.

Now applying Lemma .0.9 (the maximal inequality for Orlicz norms), we obtain that for all i = 1, ..., n and some universal positive constants c1, c2,

∥ max_{1≤j≤d} |Xij| ∥_{ψ1} ≤ c1 Bn log(d), ∥ max_{1≤j≤d} |Yij| ∥_{ψ1} ≤ c2 Bn log(d).

Then, from Markov's inequality, it follows that for all i = 1, ..., n and all t ≥ 0,

P( max_{1≤j≤d} |Xij| > t ) ≤ 2 exp( −t / (c1 Bn log(d)) ), P( max_{1≤j≤d} |Yij| > t ) ≤ 2 exp( −t / (c2 Bn log(d)) ).

Now, applying Lemma .0.6, we get

M̃_{n,X}(sϕn) ≲ [ √n / (sϕn log(Ld)) + Bn log(Ld) ]³ exp( −√n / (4c sϕn Bn log²(Ld)) ).    (2.3.10)

To bound the exponent in (2.3.10), note that for some universal constants c, c*,

√n / (4c sϕn Bn log²(Ld)) ≳ ( Bn² log⁷(Ldn) / n )^{−1/3} log(Ldn) ≳ c* log(Ldn).

Because of (N3c) and sϕn ≥ 2, we have √n / (sϕn log(Ld)) ≲ √n / log(Ld) ≲ √n, and

Bn log(Ld) = ( Bn² log²(Ldn) / n )^{1/2} √n ≲ √n.

Combining the above, (2.3.10) reduces to

M̃_{n,X}(sϕn) ≲ n^{3/2} (nd)^{−c*} ≲ n^{−1/2}.    (2.3.11)

Using similar arguments, it can be concluded that

M̃_{n,Y}(sϕn) ≲ n^{−1/2}.    (2.3.12)

Combining (2.3.11) and (2.3.12), we get M_{n,X,Y}(sϕn) ≲ n^{−1/2}. Finally, from (2.3.9), we obtain

ρn(C_{L,s}) ≲ s ( Bn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/Bn ≲ s ( Bn² log⁷(Ld) / n )^{1/6} + s / (√n Bn) ≲ s ( Bn² log⁷(Lnd) / n )^{1/6},

since s / (√n Bn) = s ( Bn² log⁷(Lnd) / n )^{1/6} Bn^{−4/3} (log(Lnd))^{−7/6} n^{−1/3}.

Under (N3b), along with assuming s⁶ Bn² log⁷(Ldn) / n ≤ c, we also assume that Bn^{3/2} / (n^{1/2−1/3} log d) ≤ c. Setting

ℓn = Bn + Bn² / (n^{1/2−2/3} log^{1/2} d),

we obtain Ln ≤ Bn ≤ ℓn. Further note that, by condition (N3c), s ( Bn² log⁷(nLd) / n )^{1/6} + ( Bn² log³(nLd) / n^{1/3} )^{1/3} ≤ c and sϕn ≥ 2. Noting that

E[ (max_{1≤j≤d} |Xij|)³ I( max_{1≤j≤d} |Xij| > √n / (4sϕ log(Ld)) ) ] ≤ E[ (max_{1≤j≤d} |Xij|)³ ],

by (N3b) we obtain M̃_{n,X}(sϕn) ≲ Bn³. Also, ℓn^{−1} ≤ Bn^{−2} n^{1/2−2/3} log^{1/2} d, and therefore

M̃_{n,X}(sϕn)/ℓn ≲ Bn³ n^{1/2−2/3} log^{1/2} d / Bn² = (1/log d) ( Bn² log³ d / n^{1−2/3} )^{1/2} ≲ ( Bn² log³(nLd) / n^{1/3} )^{1/3}.    (2.3.13)

Similarly, as in (2.3.12),

M̃_{n,Y}(sϕn) ≲ n^{−1/2}.    (2.3.14)

Combining (2.3.13) and (2.3.14), we bound M_{n,X,Y}(sϕn)/ℓn, and we conclude the proof by noting that, from (2.3.9),

ρn(C_{L,s}) ≤ s ( Bn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/ℓn ≲ s ( Bn² log⁷(Lnd) / n )^{1/6} + ( Bn² log³(nLd) / n^{1/3} )^{1/3}.

2.4 Multiplier bootstrap results over the class of convex sets C_{L,s}

Theorem 2.3.1 presents an abstract Gaussian approximation, since the covariance matrix Σ is unknown but is needed to identify the distribution of the Y's. To remove this abstractness we use the multiplier bootstrap technique, introduced by Chernozhukov et al. (2017) [19] for the class of hyperrectangles. For a vector (e1, ..., en) of iid N(0,1) random variables, independent of all X and Y, we define

X̄ = (1/n) Σ_{i=1}^n Xi, S_n^{eX} = (1/√n) Σ_{i=1}^n ei (Xi − X̄).

Recall (2.3.3) and denote Σ̂ = (1/n) Σ_{i=1}^n Σ̂^(i), where Σ̂^(i)_{km} = (Xik − X̄k)(Xim − X̄m), 1 ≤ i ≤ n and 1 ≤ k, m ≤ d.

Lemma 2.4.1. Suppose conditions (N1) and (L) hold. Then, for every constant Δ̄n > 0,

ρ_n^{MB}(C_{L,s}) = sup_{B∈C_{L,s}} |P(S_n^{eX} ∈ B | X1, ..., Xn) − P(S_n^Y ∈ B)| ≤ C Δ̄n^{1/3} log^{2/3}(Ld)

on the event

max_{1≤l1,l2≤L} max_{1≤j,k≤d} | [A^(l1) (Σ̂ − Σ) (A^(l2))′]_{jk} | ≤ Δ̄n,    (2.4.1)

where the constant C depends on b only.

Proof of Lemma 2.4.1. We start by stacking (A^(1) Xi, ..., A^(L) Xi) into Ld-dimensional vectors X̃i for i = 1, ..., n. Then for any B ∈ C_{L,s}, there exists a vector u = (u1, ..., uL) ∈ R^{Ld} such that

P(S_n^{eX} ∈ B | X1, ..., Xn) = P(S_n^{eX̃} ≤ u | X1, ..., Xn) and P(S_n^Y ∈ B) = P(S_n^{Ỹ} ≤ u).

Let X̃̄ = (1/n) Σ_{i=1}^n X̃i and

Δ = max_{1≤k,m≤Ld} | [ (1/n) Σ_{i=1}^n (X̃i − X̃̄)(X̃i − X̃̄)′ ]_{km} − [ (1/n) Σ_{i=1}^n E(X̃i X̃i′) ]_{km} |.

First we compute the second moments of the stacked, centered vectors and show that Δ reduces to the quantity on the left-hand side of (2.4.1). Note that X̃̄ is the stacking of X̄ and that X̃i − X̃̄ is the stacking of Xi − X̄. Then we have

Δ = max_{1≤l1,l2≤L} max_{1≤j,k≤d} | (1/n) Σ_{i=1}^n [ A^(l1) (Xi − X̄)(Xi − X̄)′ (A^(l2))′ ]_{jk} − (1/n) Σ_{i=1}^n [ A^(l1) E(Xi Xi′) (A^(l2))′ ]_{jk} | = max_{1≤l1,l2≤L} max_{1≤j,k≤d} | [A^(l1) (Σ̂ − Σ) (A^(l2))′]_{jk} |,

which is the quantity appearing on the left side of (2.4.1). The statement of Lemma 2.4.1 is now obtained by applying Lemma .0.7, which concludes the proof.

Before stating the next theorem, we state one more condition.

(N3c′) There exists a sequence αn ∈ (0, e^{−1}) such that

s ( Bn² log⁵(nLd) log²(1/αn) / n )^{1/9} ≤ 1, or s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} αn^{−2/9} ≤ 1.

Theorem 2.4.1. Suppose conditions (L), (N1) and (N2) hold. Additionally, assume

E(Xij Xik)² ≤ Bn², i = 1, ..., n, j, k = 1, ..., d.

Then under condition (N3a), with probability at least 1 − αn, we have

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log(nLd) / n )^{1/6} log^{2/3}(Ld) log^{1/3}(1/αn),

while under condition (N3b), with probability at least 1 − αn, we have

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log⁵(Ld) / n )^{1/6} log^{1/3}(1/αn) + C s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} αn^{−2/9},

where the constant C depends on b only.

Remark 2.4.1. Note that under (N3c′), the conclusion of this theorem is non-trivial; (N3c′) specifies the growth rates of Bn, L, s, d with n. Theorem 2.4.1 can be viewed as an analogue of Corollary 4.2 in Chernozhukov et al. (2017) [19], adapted to the class of sparse convex sets.

Proof: The proof is similar to that of Proposition 4.1 in Chernozhukov et al. (2017) [19]. First, condition (L) yields

Δ̃ := max_{1≤l1,l2≤L} max_{1≤j,k≤d} | [A^(l1) (Σ̂ − Σ) (A^(l2))′]_{jk} | ≤ s² max_{1≤j,k≤d} | (1/n) Σ_{i=1}^n [Xij Xik − E(Xij Xik)] − X̄j X̄k |.    (2.4.2)

Note that (2.4.2) can be bounded by

s² max_{1≤j,k≤d} | (1/n) Σ_{i=1}^n [Xij Xik − E(Xij Xik)] | + s² max_{1≤j,k≤d} |X̄j X̄k|.    (2.4.3)

To bound the first term of (2.4.3), we denote

σ_{n,1}² = s² max_{1≤j,k≤d} Σ_{i=1}^n E[Xij Xik − E(Xij Xik)]².

Applying the Cauchy-Schwarz inequality and the condition E(Xij Xik)² ≤ Bn², i = 1, ..., n, j, k = 1, ..., d, we bound σ_{n,1}² ≤ s² n Bn². Also denote

M^X = max_{1≤i≤n} max_{1≤j,k≤d} |Xij Xik − E(Xij Xik)|.

To bound ∥M^X∥_{ψ_{1/2}}, we note that for some positive universal constants c1, c2,

∥M^X∥_{ψ_{1/2}} ≤ c1 ∥ max_{1≤i≤n} max_{1≤j,k≤d} |Xij Xik| ∥_{ψ_{1/2}} + c2 max_{1≤i≤n} max_{1≤j,k≤d} E|Xij Xik| = c1 ∥ max_{1≤i≤n} max_{1≤j≤d} |Xij| ∥²_{ψ1} + c2 max_{1≤i≤n} max_{1≤j,k≤d} E|Xij Xik|.

Under (N3a), applying Lemma .0.1 to ∥max_{1≤i≤n} max_{1≤j≤d} |Xij|∥_{ψ1}, we obtain

∥ max_{1≤i≤n} max_{1≤j≤d} |Xij| ∥²_{ψ1} ≤ c3 Bn² log²(dn) ≤ c3 Bn² log²(Ldn),

and note that c2 max_{1≤i≤n} max_{1≤j,k≤d} E|Xij Xik| ≤ Bn². Therefore, under (N3a), applying Lemma .0.1 and Lemma .0.2 with ϵ = 1 and δ = 1/2, we obtain the following inequalities:

E Δ̃ ≤ C s² ( n^{−1} Bn² log(Ldn) )^{1/2},
P( Δ̃ > C s² ( n^{−1} Bn² log(Ldn) )^{1/2} + t ) ≤ exp( −n t² / (3 s⁴ Bn²) ) + 3 exp( −c √(n t) / (s Bn log(Ldn)) ).

Choosing t = C s² ( n^{−1} Bn² log(Ldn) )^{1/2} log(1/α), which is proportional to Δ̄n, yields the result under (N3a).

Under (N3b), we see that σ_{n,1}² ≤ s² n Bn². Now, to bound M^X, we note that for any q ≥ 2,

E[(M^X)^{q/2}] ≤ c4 ( E[ max_{i,j,k} |Xij Xik|^{q/2} ] + max_{i,j,k} ( E|Xij Xik| )^{q/2} ) ≤ c4 E[ max_{i,j,k} |Xij Xik|^{q/2} ] = c4 E[ max_{i,j} |Xij|^q ] ≤ c4 n Bn^q.

From the above inequalities, it follows that ( E[(M^X)^{q/2}] )^{2/q} ≤ c4^{2/q} n^{2/q} Bn². Therefore, under (N3b), for q = 3, applying Lemma .0.1 and Lemma .0.3 with ϵ = 1 and u = 3/2, we obtain the following inequalities:

E Δ̃ ≤ C s² ( n^{−1} Bn² log(Ldn) )^{1/2} + C s² n^{−1/3} Bn² log(Ld),
P( Δ̃ > E Δ̃ + t ) ≤ exp( −n t² / (3 s⁴ Bn²) ) + c t^{−3/2} n^{−1/2} s³ Bn³.

Then, choosing t = C s² [ ( n^{−1} Bn² log(Ldn) )^{1/2} log(1/α) + n^{−1/3} α^{−2/3} Bn² ], we obtain the desired result.

Corollary 2.4.1. Suppose the conditions of Theorem 2.4.1 are satisfied. With probability 1, there exists an integer n0 > 0 such that for all n ≥ n0,

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log(nLd) / n )^{1/6} log^{2/3}(Ld) log^{1/3}(1/αn)

under condition (N3a), and for all n ≥ n0,

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log⁵(Ld) / n )^{1/6} log^{1/3}(1/αn) + C s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} αn^{−2/9}

under condition (N3b).
Proof: Corollary 2.4.1 follows from Theorem 2.4.1 and the Borel-Cantelli lemma: one can choose a sequence αn ↓ 0 such that Σ_{n=1}^∞ αn < ∞ and consider the events in Theorem 2.4.1 to get the desired result.

Remark: This result establishes the convergence of the multiplier bootstrap in the almost sure sense.

2.5 Motivating application to high-dimensional testing problems

Armed with the convergence results for the multiplier bootstrap, we consider test statistics for the MANOVA problem that are functionals of the normalized sum S_n^X and the class C_{L,s}. We show that bootstrap tests based on these statistics are consistent for MANOVA and for general linear hypothesis testing in the high-dimensional setup. Our approach produces a whole class of tests that are flexible and can be tuned to many different scenarios; the class of matrices A^(l), l = 1, ..., L, serves as the tuning mechanism. Our tests do not need sparsity assumptions on the means and/or covariances, and we require moment and tail assumptions on the distributions that are less stringent than those in the existing literature. Our approach allows for multivariate sub-Gaussian distributions and some heavy-tailed distributions, and thus broadens the regime of its practical use.

Suppose there are K independent groups of random vectors Vk,i ∈ Rp, i = 1, ..., n, k = 1, ..., K, drawn from K populations with means μ1, ..., μK ∈ Rp. The k-th group after centering consists of independent vectors Zk,1 = Vk,1 − μk, ..., Zk,n = Vk,n − μk in Rp. Then stack the vectors Z1,1, ..., ZK,1 into X1, stack the vectors Z1,2, ..., ZK,2 into X2, and so on, stacking Z1,n, ..., ZK,n into Xn. We thus obtain centered long vectors Xi ∈ Rd with d = Kp. We remark that we only need independence and common group means for the Vk,i, i = 1, ..., n; these vectors do not have to come from the same distribution. Also, we can let d grow with n, by letting the dimension p grow with n, by letting the number of groups K grow with n, or both. This is quite rare in the literature but a very useful setup in practice.

2.5.1 MANOVA (balanced case)

We are interested in the hypotheses H0 : μ1 = ... = μK versus HA : not H0. For the i-th vector in the k-th group, Vk,i, denote its q-th component by [Vk,i]_q, q = 1, ..., p. We propose the test statistic

Tn = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q,

where each matrix A^(l) satisfies the condition

(H) Σ_{k=1}^K A^(l)_{j(q+(k−1)p)} = 0, j = 1, ..., d, q = 1, ..., p, l = 1, ..., L.

Note that

Tn = max_{l=1,...,L} max_{j=1,...,d} ( [A^(l) S_n^X]_j + n^{1/2} [A^(l) μ]_j ),

where μ ∈ Rd consists of the vectors μk ∈ Rp, k = 1, ..., K, stacked into one long vector. Under H0 and condition (H), the test statistic becomes

Tn = max_{l=1,...,L} max_{j=1,...,d} [A^(l) S_n^X]_j.

Since we apply the multiplier bootstrap, recall that

S_n^{eX} = (1/√n) Σ_{i=1}^n ei (Xi − X̄), X̄ = (1/n) Σ_{i=1}^n Xi,

where the vector (e1, ..., en) of iid N(0,1) random variables is independent of all Xi, Yi, i = 1, ..., n. Note that

S_n^{eX} = (1/√n) Σ_{i=1}^n ei (V1,i − V̄1, ..., VK,i − V̄K)^T,

where V̄k = (1/n) Σ_{i=1}^n Vk,i, k = 1, ..., K, are the groupwise averages. Define

Tn^e = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} ei [Vk,i − V̄k]_q = max_{l=1,...,L} max_{j=1,...,d} [A^(l) S_n^{eX}]_j.

Denote the Kolmogorov distance

KD = sup_{u∈R} | P( max_{l=1,...,L} max_{j=1,...,d} { n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q − n^{1/2} [A^(l) μ]_j } ≤ u ) − Pe(Tn^e ≤ u) |,

where Pe stands for the probability with respect to the Gaussian vector e only.

Theorem 2.5.1. Suppose conditions (L), (N1) and (N2) are satisfied. Additionally, assume E(Xij Xik)² ≤ Bn², i = 1, ..., n, j, k = 1, ..., d. Let α ∈ (0, 1/e) be arbitrary. Then, under condition (N3a), with probability at least 1 − α, for all n large enough,

KD ≤ C s ( Bn² log⁷(nLd) / n )^{1/6},

and, under condition (N3b), with probability at least 1 − α, for all n large enough,

KD ≤ C s ( Bn² log⁷(nLd) / n )^{1/6} + C ( Bn² log³(nLd) / n^{1/3} )^{1/3},

where the constant C depends on b only.

We remark that KD tends to 0 as n grows as long as s⁶ Bn² log⁷(nLKp) n^{−1} = o(1) under (N3a). Then we can allow all the quantities s = sn, Bn, L = Ln, K = Kn, p = pn to grow with n. We can consider extreme particular cases: if only s is allowed to grow, then it can be as large as s = o(n^{1/6}); if only p is allowed to grow, it can be as large as log p = o(n^{1/7}); the same holds for K. One can rebalance the growth assumptions placed on s, K, and p.

Proof: We start by observing that

KD = sup_{u∈R} | P( max_l max_j [A^(l) S_n^X]_j ≤ u ) − Pe( max_l max_j [A^(l) S_n^{eX}]_j ≤ u ) |
≤ sup_{u∈R} | P( A^(l) S_n^X ≤ u 1d, l = 1, ..., L ) − Pe( A^(l) S_n^{eX} ≤ u 1d, l = 1, ..., L ) |
≤ ρ_n^{MB}(C_{L,s}) + ρn(C_{L,s}),

where 1d is the vector of ones in Rd and the class C_{L,s} is restricted to ul = u 1d, l = 1, ..., L. Here we apply Theorems 2.3.1 and 2.4.1, noting that for all n large enough

s ( Bn² log⁷(nLd) / n )^{1/6} > s^{2/3} ( Bn² log(nLd) / n )^{1/6} log^{2/3}(Ld) log^{1/3}(1/α)

and

( Bn² log³(nLd) / n^{1/3} )^{1/3} > s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} α^{−2/9}.

Consider testing the hypothesis H0 : μ1 = ... = μK vs HA : not H0 at significance level α. We reject H0 if

Tn ≥ Qα,

where the quantile Qα of the bootstrapped distribution is defined as

Qα = inf{ u ∈ R : Pe(Tn^e ≤ u) ≥ 1 − α }, α ∈ (0, 1).

Theorem 2.5.1 implies that under condition (H), when s ( Bn² log⁷(nLd) / n )^{1/6} → 0, or ( Bn² log³(nLd) / n^{1/3} )^{1/3} → 0, we have

P(Tn ≥ Qα | H0) → α, P(Tn ≥ Qα | HA) → 1, provided ∃ j and l such that [A^(l) μ]_j ≠ 0.

Thus, these bootstrap tests are consistent against any fixed alternative as long as [A^(l) μ]_j ≠ 0 for some l = 1, ..., L and some j = 1, ..., d. (A schematic implementation of this bootstrap test is sketched below.)

We also remark that the Vk,i, i = 1, ..., n, can come from different distributions with the same mean vector μk. In this case Bn is not a constant; then Bn, p, K can all grow with n. In particular, if Bn ∼ n^β, log p ∼ n^ρ, log K ∼ n^κ for some positive β, ρ, κ, then under (N2) and (N3a), as long as 2β + 7 max(ρ, κ) < 1, KD converges to 0 and the tests remain consistent against any fixed alternative. Under (N2) and (N3b), one additionally needs 6β + 9 max(ρ, κ) < 1. This is a more interesting setup than the case of identically distributed Vk,i, i = 1, ..., n, since Bn turns out to be a constant in that case (β = 0). In the iid case one can allow max(ρ, κ) < 1/7 under (N2) and (N3a), or max(ρ, κ) < 1/9 under (N2) and (N3b). Currently, the proposed test is the only test that can accommodate the non-identically-distributed scenario.

Next, we consider a class of local alternatives that converge to the null hypothesis as n → ∞. Let

H_A^{(n)} : μ1, ..., μK ∈ Rp such that min_{l=1,...,L} min_{j=1,...,d} [A^(l) μ]_j ≥ cn n^{−1/2},

where cn → ∞ slowly as n → ∞.
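As referenced above, the following is a minimal Python sketch of the balanced-case bootstrap test; it is our illustration, not code from the dissertation. It uses the difference-to-group-one contrasts as the rows of a single matrix A (so condition (H) holds by construction) together with −A, so that the maximum runs over absolute differences. The sizes n, p, K, the bootstrap size B, and the alternative are all hypothetical choices.

```python
import numpy as np

def manova_multiplier_test(V, B=1000, alpha=0.05, rng=None):
    """V: array of shape (K, n, p); returns (T_n, Q_alpha, reject)."""
    rng = rng if rng is not None else np.random.default_rng()
    K, n, p = V.shape
    # Differences to group 1 play the role of A X_i; each row of the implied
    # matrix A has coefficients +1 and -1 that sum to zero over groups, so (H) holds.
    D = V[0][None, :, :] - V[1:]                         # shape (K-1, n, p)
    Tn = np.abs(D.sum(axis=1) / np.sqrt(n)).max()
    # Multiplier bootstrap: iid N(0,1) weights e_i on the centered observations.
    Dc = D - D.mean(axis=1, keepdims=True)
    Te = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)
        Te[b] = np.abs((Dc * e[None, :, None]).sum(axis=1) / np.sqrt(n)).max()
    Q = np.quantile(Te, 1 - alpha)                       # bootstrap quantile Q_alpha
    return Tn, Q, Tn >= Q

rng = np.random.default_rng(2)
V = rng.standard_normal((4, 60, 100))                    # K=4 groups, n=60, p=100
V[3] += 0.8                                              # shift one group's mean
print(manova_multiplier_test(V, rng=rng))                # expect rejection here
```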
Note that cn n^{−1/2} → 0 as n → ∞, even though cn → ∞. Recall that d = Kp and that it is allowed to grow with the sample size n.

Corollary 2.5.1. Suppose the conditions of Theorem 2.5.1 are satisfied. With probability tending to 1, we have

P(Tn ≥ Qα | H_A^{(n)}) → 1 as n → ∞.

Proof: Note that

P(Tn ≥ Qα | H_A^{(n)})
= P( max_{l=1,...,L} max_{j=1,...,d} { n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q − √n [A^(l) μ_A^{(n)}]_j + √n [A^(l) μ_A^{(n)}]_j } ≥ Qα )
≥ P( max_{l=1,...,L} max_{j=1,...,d} { n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q − √n [A^(l) μ_A^{(n)}]_j } ≥ Qα − cn, min_{l} min_{j} [A^(l) μ_A^{(n)}]_j ≥ cn n^{−1/2} )
− Pe(Tn^e ≥ Qα − cn | H_A^{(n)}) + Pe(Tn^e ≥ Qα − cn | H_A^{(n)})
≥ Pe(Tn^e ≥ Qα − cn | H_A^{(n)}) − KD → 1 as n → ∞.

2.5.2 Unbalanced MANOVA

Suppose now that the samples have different sample sizes nk, k = 1, ..., K. Introduce the modified test statistic

T̃n = max_{l=1,...,L} max_{j=1,...,d} Σ_{t=1}^K n_t^{−1/2} Σ_{i=1}^{n_t} Σ_{q=1}^p A^(l)_{j(q+(t−1)p)} [Vt,i]_q.

Then introduce its bootstrapped version and quantile:

T̃n^e = max_{l=1,...,L} max_{j=1,...,d} Σ_{t=1}^K n_t^{−1/2} Σ_{i=1}^{n_t} Σ_{q=1}^p A^(l)_{j(q+(t−1)p)} [ei Vt,i]_q,

Q̃α = inf{ u ∈ R : Pe(T̃n^e ≤ u) ≥ 1 − α }, α ∈ (0, 1).

Let n = min_{1≤k≤K} nk. Renumber the groups so that the first group has the smallest sample size. Assume

(D) nk/n = λk,n ∈ [1, ∞), k = 2, ..., K,

where the λk,n do not have to converge to a constant as n → ∞ but should remain bounded.

We decompose T̃n into a version of Tn:

T̃n = max_{l} max_{j} Σ_{t=1}^K Σ_{q=1}^p [ n^{−1/2} λ_{t,n}^{−1/2} Σ_{i=1}^n A^(l)_{j(q+(t−1)p)} [Vt,i]_q + n^{−1/2} λ_{t,n}^{−1/2} Σ_{i=n+1}^{n λ_{t,n}} A^(l)_{j(q+(t−1)p)} [Vt,i]_q ]
≈ max_{l} max_{j} Σ_{t=1}^K Σ_{q=1}^p [ n^{−1/2} λ_{t,n}^{−1/2} Σ_{i=1}^n A^(l)_{j(q+(t−1)p)} [Vt,i]_q + n^{−1/2} [λ_{t,n}]^{−1/2} Σ_{m=1}^n Σ_{r=1}^{[λ_{t,n}]−1} A^(l)_{j(q+(t−1)p)} [V_{t, n+(m−1)([λ_{t,n}]−1)+r}]_q ]
≈ max_{l} max_{j} Σ_{t=1}^K Σ_{q=1}^p A^(l)_{j(q+(t−1)p)} λ_{t,n}^{1/2} n^{−1/2} Σ_{i=1}^n [ [λ_{t,n}]^{−1} ( Vt,i + Σ_{r=1}^{[λ_{t,n}]−1} V_{t, n+(i−1)([λ_{t,n}]−1)+r} ) ]_q,

where we used a type of blocking, and γn ≈ δn means lim_{n→∞} γn/δn = 1 with probability 1. The approximation arises from taking integer parts [λ_{t,n}] of the λ_{t,n}. Define

Ã^(l)_{j(q+(t−1)p)} = A^(l)_{j(q+(t−1)p)} λ_{t,n}^{1/2}.

Finally, define the new variables

Ṽt,i = [λ_{t,n}]^{−1} ( Vt,i + Σ_{r=1}^{[λ_{t,n}]−1} V_{t, n+(i−1)([λ_{t,n}]−1)+r} ), t = 1, ..., K, i = 1, ..., n.

Note that E Ṽt,i = μt and the samples of these new variables are independent. Also note that T̃n is an analogue of Tn with the V's replaced by the Ṽ's. The variables Ṽt,i, t = 1, ..., K, are now stacked into new variables X̃i. We remark that X̃i, i = 1, ..., n, are independent but not identically distributed, even if the original Xi's were.

Corollary 2.5.2. Suppose the conditions of Lemma 2.3.1 are satisfied for X̃ and Ã^(l), l = 1, ..., L. Moreover, assume conditions (D) and (H) hold for nk, k = 2, ..., K, and A^(l), l = 1, ..., L. When

s ( max_{k=1,...,K} λ_{k,n} )^{1/2} ( Bn² log⁷(nLd) / ( n min_{k=2,...,K} λ²_{k,n} ) )^{1/6} → 0

or

( Bn² log³(nLd) / ( n^{1/3} min_{k=2,...,K} λ²_{k,n} ) )^{1/3} → 0,

we have P(T̃n ≥ Q̃α | H0) → α, and P(T̃n ≥ Q̃α | HA) → 1 provided ∃ j and l such that [A^(l) μ]_j ≠ 0.

Corollary 2.5.2 is proved similarly to the consistency of the test above.

2.5.3 Two-Way MANOVA with unequal observations

Consider the setup

Yi,j,k = μ0 + αi + βj + γij + ϵi,j,k,

where k ∈ {1, 2, ..., Ni,j}, i = 1, 2, ..., I and j = 1, 2, ..., J. Here μ0, αi, βj, γij are unknown p × 1 vectors of parameters, while the ϵi,j,k are mean-zero p × 1 random vectors with unknown covariance Σi,j.
To make the model identifiable, we are given a sequence of positive weights wij, i = 1, 2, ..., I, j = 1, 2, ..., J, so that Σ_{i=1}^I wi· αi = 0, Σ_{j=1}^J w·j βj = 0, Σ_{i=1}^I wij γij = 0, Σ_{j=1}^J wij γij = 0, and Σ_{i=1}^I Σ_{j=1}^J wij γij = 0, where wi· = Σ_{j=1}^J wij and w·j = Σ_{i=1}^I wij. Consider the null hypothesis

H0 : α1 = α2 = ... = αI = 0.    (2.5.1)

Compare (2.5.1) with the setup of unbalanced MANOVA with K = I groups and testing the equality of the means μk = μ0 + αk, where the k-th group consists of nk = Σ_{j=1}^J Nkj non-identically distributed observations

(Vk,1, Vk,2, ..., Vk,nk) = (Yk11, Yk12, ..., Yk1N_{k1}, Yk21, Yk22, ..., Yk2N_{k2}, ..., YkJ1, YkJ2, ..., YkJN_{kJ}).

Then the test statistic T̃n is well defined, and one rejects H0 if T̃n > Q̃α. Analogously, for the null hypothesis H0 : β1 = β2 = ... = βJ = 0, we consider K = J groups and the testing problem μk = μ0 + βk, where the k-th group consists of nk = Σ_{i=1}^I Nik non-identically distributed observations

(Vk,1, Vk,2, ..., Vk,nk) = (Y1k1, Y1k2, ..., Y1kN_{1k}, Y2k1, Y2k2, ..., Y2kN_{2k}, ..., YIk1, YIk2, ..., YIkN_{Ik}).

Then the test statistic T̃n is well defined, and one rejects H0 if T̃n > Q̃α. Finally, consider the null hypothesis H0 : γ11 = γ12 = ... = γIJ. This time we need to look at K = IJ groups, and the testing problem reduces to μk = μ0 + αi + βj + γij, with nij observations Vk,m = Yijm, k = i + (j − 1)I, i = 1, 2, ..., I, j = 1, 2, ..., J. Then the test statistic T̃n is again well defined, and we reject H0 if T̃n > Q̃α.

Unlike tests based on the L2 norm, such as the one proposed by Watanabe, Hyodo and Nakagawa (2020) [52], our tests do not need to estimate the unknown, unequal covariance matrices Σij; as a result, our tests avoid the estimation of the standard deviation of the test statistic, which is computationally expensive.

Remark: It is easy to extend our framework to a multi-level MANOVA setup, and we can allow the number of groups K to grow exponentially with n.

2.5.4 Linear hypothesis testing

Similar to Zhang, Guo and Zhou (2017) [56], we are now interested in the hypotheses H0 : Gμ = 0 vs HA : Gμ ≠ 0, where G is a known q × d matrix of rank q < d. This setup includes contrast tests and MANOVA. Note that (GG^T)^{−1} exists. Consider the d × d matrices

A^(l) = M^(l) [G^T (GG^T)^{−1} G]^T,

for some non-singular d × d matrices M^(l), l = 1, ..., L. The expression in square brackets is related to the Moore-Penrose inverse of G. This structural condition on A^(l) replaces condition (H). Under H0 we have A^(l) μ = 0, while under any alternative A^(l) μ ≠ 0. Then the test described above works for this setup, and Theorem 2.5.1 holds with (H) replaced by this new structure of the matrices A^(l).

Let (c1, ..., cK) be a non-zero vector in R^K. As an example, consider the hypotheses H0 : c1 μ1 + ... + cK μK = 0 versus HA : c1 μ1 + ... + cK μK ≠ 0. In this case G = (c1 Ip, ..., cK Ip), with Ip the p × p identity matrix, and

A^(l) = (M^(l) / ∥c∥2²) [ c1² Ip    c1 c2 Ip  ···  c1 cK Ip
                          c1 c2 Ip  c2² Ip    ···  c2 cK Ip
                          ···       ···       ···  ···
                          c1 cK Ip  c2 cK Ip  ···  cK² Ip ],

where ∥c∥2² = Σ_{k=1}^K ck² and M^(l) is an arbitrary non-degenerate d × d matrix. Recall that d = Kp. As in MANOVA, define the test statistic

Tn = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q.

Then the bootstrapped test statistic is

Tn^e = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} ei [Vk,i − V̄k]_q.

Define the quantile of the bootstrapped distribution as

Qα = inf{ u ∈ R : Pe(Tn^e ≤ u) ≥ 1 − α }, α ∈ (0, 1).

We reject H0 if Tn ≥ Qα. Since A^(l) μ = 0 holds if and only if H0 holds, we have P(Tn ≥ Qα | H0) → α and P(Tn ≥ Qα | HA) → 1, provided the conditions of Theorem 2.5.1 are satisfied. This test allows us to treat high-dimensional MANOVA and contrast tests in a unified framework, unlike other tests.

2.6 Connection with other tests

In this section we show that the tests introduced in Xue and Yao (2020) [54] and Lin, Lopes and Muller (2021) [34] fit into our framework.

Two-sample test by Xue and Yao (2020): Consider the setup in Xue and Yao (2020). There are K = 2 independent groups of random vectors Vi1, Vi2, i = 1, ..., n, drawn from two populations in Rp with means μ1, μ2 ∈ Rp. Their test statistics are the maxima of A S_n^X, where A ∈ R^{2p×2p}, with elements denoted by ai,j for 1 ≤ i, j ≤ 2p, has 1 on the main diagonal and −1 on the two diagonals that start at a_{1,(p+1)} and a_{(p+1),1}. Indeed, Xi = (Vi1, Vi2)^T ∈ R^{2p} and

A Xi = ([Vi1 − Vi2]_1, ..., [Vi1 − Vi2]_p, [Vi2 − Vi1]_1, ..., [Vi2 − Vi1]_p)^T.

Then A S_n^X = (S_n^{V1} − S_n^{V2}, S_n^{V2} − S_n^{V1})^T. Therefore, their test statistic is

∥S_n^{V1} − S_n^{V2}∥_∞ = max_{q=1,...,p} |[S_n^{V1} − S_n^{V2}]_q| = max_{j=1,...,2p} [A S_n^X]_j.

Note that condition (H) holds for this A, with L = 1, A = {A} and s = 2. Their results are a special case of ours. Note that their proofs are specially designed for K = 2, and a direct generalization would be difficult; it would require intricate work with U-statistics in the same spirit as in Chen (2018).

MANOVA for functional data by Lin et al. (2021): Consider the setup in Lin et al. (2021). There are K independent groups of random vectors drawn from K populations with means μ1, ..., μK ∈ Rp. The k-th group after centering consists of independent observations Zk,1, ..., Zk,n in Rp. Now stack the vectors Z1,1, ..., ZK,1 into X1, and so on, Z1,n, ..., ZK,n into Xn; then Xi ∈ Rd with d = Kp. For τ ∈ [0, 1), the test statistics in Lin et al. (2021) are the maxima of A S_n^X, where the rows of the matrix A are of the form

(0, ..., 0, 1/(√2 σ^τ_{k,l,j}), 0, ..., 0, −1/(√2 σ^τ_{k,l,j}), 0, ..., 0),

with the non-zero entries in positions (k−1)p + j and (l−1)p + j for 1 ≤ j ≤ p. Here

σ²_{k,l,j} = 0.5 Var(X_{1,(k−1)p+j}) + 0.5 Var(X_{1,(l−1)p+j}).

Thus, A satisfies (H) with L = 2 and s = √2 / min_{k,l,j} σ^τ_{k,l,j}. From the setup in Lin et al. (2021), the indices (k, l) belong to a set P ⊂ {(i1, i2) : 1 ≤ i1 < i2 ≤ K}. Their test statistic is of the form of our test statistic with A = {A, −A}.

Note that our theorems provide at best rates for the Kolmogorov distance of order n^{−1/6} with respect to n and of order (log d)^{7/6} with respect to d, while Lin et al. (2021) get rates of order n^{−1/2+δ} for an arbitrary δ > 0, with d = Kp ≤ p e^{√(log n)}. Lin et al. (2021) attain this rate under a stringent requirement of a special structure of the matrices Σ^(i), with many restrictions that are difficult to check in practice. Their conditions essentially reduce the high-dimensional problem to one where p ≈ n^a/(log n) ∨ (log n)³ with a ∈ (0, 0.5).

2.7 Discussion

In this chapter we have introduced a framework of bootstrap tests that can address several different testing problems in the high-dimensional setup (n ≪ p) in a unified fashion.
This is done by considering tests statistics that are supremums of sums over sparse classes of convex sets of a novel type. These classes serve the role of a tuning mechanism, which can be chosen based on the particular problem. Basically for a hypothesis about means, one needs to select a finite number of sparse matrices A(l) , l = 1, . . . , L, such that A(l) µ = 0 under the null hypothesis. To get a test with high power under a specific alternative, one needs to select these matrices so that A(l) µ is maximized. For instance, in case of very sparse alternative, one needs several more dense matrices, however, that is controled by condition (L) and one can only ask for s = O(log(Ld)). On the other hand, for dense means one can use a single matrix (so L = 1) with just a few non-zero diagonals in which case s is a finite number. The resulting bootstrap tests have many advantages. In particular, they are consistent against any fixed alternative; they attain good level and power for large p - small n. They are distribution and correlation free. They are computationally fast, in particular they are faster by as much as p times in comparison to methods that require precision matrix estimation. Even for sparse alternatives, our tests have comparable performance to that of the existing specialized tests. We only require mild moment and tail assumptions on the distributions. We do not require that the ratios of sample sizes converge to a specific limit. Unlike current tests in literature, we do not require the samples to come from the same distribution, the tail and moment conditions have Bn that can grow with n in the non-identical case. We provide proofs for the bounds of ρn (CL,s ) with explicit dependency on the sparsity parameter s together with dependency on n and d. These bounds are similar 42 to the bounds in Chernozhukov et al. (2017) [19]. However, they are non-trivial extension of such results to a more complicated class of convex sets for which the dependency on sparsity of convex sets is tractable. Our proofs do not rely on delicate and complex geometric results employed in Chernozhukov et al. (2017) [19] for their s-sparse convex sets. The only drawback in our test is that our methods have the rate of (log(LKp))a n−1/6 which is quite far from a parametric rate. Even though our framework can accommodate tests of the type as in Lin et al. (2021) [34], exploiting the covariance structure, in particular the decay of covariance components with the dimension p leads to specialized tests with better near-parametric rates as in Lin et al. (2021) [34]. However, this covariance structure has to be verified in practice, which could be difficult in general apart from functional and sparse count data. Intuitively, such covariance structures give a dimension reduction mechanism and the effective dimension becomes of the same order as n. However this drawback has been dealt with by imposing stricter conditions and has been presented in [13]. Our work is different from Zhang, Zhou, He and Liu (2018) [57] who proposed MANOVA test that adapts to the sparsity of the alternative based on the data. Our work comes with exact rates for the tests, see Corollary 2.5.2, unlike their tests. Moreover, our methods allow for growing K = Kn as long as log(LKp) = o(nδ ) for δ ∈ (0, 1/7) under (N3a) condition. The computation is also much easier as we do not require estimation of individual elements of the covariances. More precisely, our tests require s6 Bn2 log7 (nLKp)n−1 = o(1) under (N3a). 
Then we can allow all the quantities s = sn , Bn , L = Ln , K = Kn , p = pn to grow with n. We can consider extreme particular cases. For example, if only s is allowed to grow then it can be as large as s = o(n1/6 ). If only p is allowed to grow, it can be as large as log p = o(n1/7 ). The same argument holds for K. One can rebalance different growth assumptions put on s, K, and p. These growth assumptions are better than those in Zhang 43 et al. (2018) [57]. From a probabilistic point of view, this work also introduces a novel class of sparse convex sets, for which the Berry-Esseen type results are obtained. This new class of convex sets generalizes the classes of half-spaces and hyper-rectangles to classes of “hyper-polygons“, which are linear transformations of intersections of a finite number of hyper-spaces. Such sets are sparse in the sense of Chernozhukov et al. (2017) [19]. We manage to explicitly track the effect of the sparsity parameter s on the rates. Under certain tail and moment assumptions on the distribution, this effect is linear. Chernozhukov et al. (2017) [19] did not establish this dependency explicitly, since it was buried in their complicated and intricate implicit geometric structures. 44 Chapter 3 Bootstrap based testing for equality of covariance matrices 3.1 Introduction The problem of testing the equality of the two covariance matrices in the two sample multi- variate set up is a classical problem. It has been thoroughly studied in the low-dimensional setting where the multivariate dimension p is fixed and smaller than the sample sizes, see Chapter 10, Anderson (2003) [3] and the references therein. In the context of high dimensional data where the number of components p either grow polynomially or even exponentially with increasing sample sizes, this problem has been addressed only in the last decade or so. The tests proposed by Schott (2007) [43] and Srivastava and Yanighara (2010) [46] are applicable only for multivariate normal populations. A U-statistic test based on an unbiased estimator of the Frobenius norm of the difference of the two population covariance matrices was proposed by Li and Chen (2012) [33]. Cai, Liu and Xia (2013) [11] proposed a test based on the maximum of the standardized differences between the entries of the two estimates of the two population covariance matrices. The tests of Li and Chen (2012) [33] and Cai et al. (2013) [11] work outside the regime of multivariate Gaussian populations as well. Further investigation by .Cai et al. (2013) [11] revealed that the statistic proposed by Li and Chen (2012) [33] fails to distinguish between the null and the alternative hypotheses when the difference between the two population covariance matrices fall under “dense" regime, i.e., when the number of non-zero elements in the difference matrix is not too high. On the other hand the test described in .Cai et al. (2013) [11] works well 45 when the difference matrix is sparse, i.e., when the two population covariance matrices can differ only in a very small number of entries. Although Cai et al. (2013) [11] showed that under certain regularity conditions their test enjoys some optimality in terms of the asymptotic power, it has been pointed out by Fan, Liao and Yao (2015) [24] that the convergence of the limiting distribution to Gumbel re- quires large sample sizes and some power-enhancement techniques. Chang, Zhou, Zhou and Wang (2017) [14] investigate the finite sample performance of a bootstrap version of the Cai et al. (2013) [11] test. 
Their technique involves using multiplier bootstrap approximation result for random vectors after vectorizing the covariance matrices. Their bootstrap method fails when the two populations means are unknown and unequal because then sample covari- ance matrices can no longer be expressed as sums of independent vectors. Moreover, they established consistency of their test under some restrictive conditions like sparsity and other correlational structures. The need for U-statistics based testing approach for covariance matrices can be moti- vated by noting the fact that the high dimensional central limit theorem fails because the sample covariance matrix can no longer be written as a vectorized sum of independent high dimensional vectors. 3.2 Overview In this chapter we propose a test for testing the equality of the two population covariance matrices in the high dimensional set up when the two populations means are unknown with minimal distributional and correlational assumptions. The proposed test is based on the maximum of the absolute differences between the entries of the Jacknifed estimators of the two population covariance matrices. We actually use a multiplier bootstrap version of this 46 test statistic. The proposed multiplier bootstrap procedure makes the size and power com- putation a lot faster. Moreover, the absence of distributional and correlational assumptions makes it applicable much more broadly, compared to the above mentioned tests. The pro- posed test is shown to be consistent against a large class of alternatives and is argued to be rate-optimal against the class of sparse alternatives. These results are obtained using the seminal works of Chernozukov, Chetverikov and Kato (CCK) (2013, 2015, 2017) [17], [18], [19] and Chen (2018) [16]. The chapter is organised as follows. Section 3.3 describes the testing problem along with the definition of the relevant quantities of interest. Section 3 has been split into two sections where Section 3.3 provides the bounds between the test statistic and its corresponding Gaus- sian counterpart. Section 3.4 provides a comprehensive discussion of the proposed jackknifed multiplier bootstrap method. Section 3.5 describes the testing procedure along with the def- inition of the relevant quantities of interest.This section also deals with the theoretical result which facilitates us to conduct the proposed test at level α. Section 3.6 provides a statisti- cal guarantee for the approximation between the true power of the test to its bootstrapped version. Section 3.6 allows us to prove the consistency of the proposed test. 3.3 Gaussian approximation result for U statistics In this section we shall prove the Gaussian approximation result for a large class of U statistics. Let Fj , j = 1, 2 be possibly two different d.f.’s on Rp . Let µj and Σj , j = 1, 2 denote their mean vectors and covariance matrices, respectively. Let X m represent the random sample X1 , · · · , Xm from F1 and Y n denote the random sample Y1 , · · · , Yn from F2 . We wish to test H0 : Σ1 = Σ2 versus the alternatives Ha : Σ1 ̸= Σ2 . The proposed test for this 47 problem is based on the difference of the estimates of the covariance matrices Σ1 , Σ2 , which in turn are U statistics. For that reason we shall first analyse some asymptotic properties of a general class of U statistics. Let h be a kernel function from Rp ×Rp 7→ Rp ×Rp that is symmetric under permutations, i.e., h(x1 , x2 ) = h(x2 , x1 ), for all x1 , x2 ∈ Rp and d = p × p. Thus h is a p × p matrix. 
Let vec(h) denote vector representation of h, i.e., vec(h) is the d-dimensional vector consisting of all entries of h. Assume E vec(h(X1 , X2 )) + E vec(h(Y1 , Y2 )) < ∞. Let Um X and U Y be the U-statistics defined for X m and Y n , respectively, as n    X 1 X Um := vec h(Xi , Xj ) , m(m − 1) 1≤i̸=j≤m   1  UnY X := vec h(Yi , Yj ) . n(n − 1) 1≤i̸=j≤n   The expected values of Um X is denoted by ΣX := E vec h(X1 , X2 ) and that of UmY is   denoted by ΣY := E vec h(Y1 , Y2 ) . We define the quantities √ X − ΣX ) √ m(Um n(UnY − ΣY ) Wm X = , WnY = , 2 2 which will play a key-role in the hypothesis testing as seen in later sections. We further   define the linear projection terms of Um X as g(x) := E vec h(X , X ) X = x − ΣX , and  1 2 1   that of UnY as g(y) := E vec h(Y1 , Y2 ) Y1 = y − ΣY , x, y ∈ Rp . The d × d covariance  matrices associated with g(X) and g(Y ) are defined accordingly as     ΓX = E g(X)g(X) , T ΓY = E g(Y )g(Y ) .T 48 For any measurable function f from Rp ×Rp → Rd that is symmetric under permutation, let fa denote the ath coordinate of f , a = 1(1)d. Further define for xj , yj ∈ Rp , j = 1, 2,   f (x1 , x2 ) := vec h(x1 , x2 ) − g(x1 ) − g(x2 ), f (y1 , y2 ) := vec h(y1 , y2 ) − g(y1 ) − g(y2 ). A kernel h : Rp × Rp 7→ Rp × Rp is said to be non-degenerate if Var(ga (X) > 0, ∀a = 1, 2, · · · , d. It is said to be completely degenerate if P g(X) = 0 = 1 or equivalently,  ∀x1 , x2 ∈ Rp .       E h(x1 , X2 ) = E h(X1 , x2 ) = E h(X1 , X2 ) = 0, For a sequence of real numbers δm,n , let Wm,n := Wm X +δ Y m,n Wn . Decompose Wm,n as Wm,n = Lm,n + Rm,n , where n  δ n  1 X X m,n X  Lm,n := √ g(Xi ) − Σ + √ g(Yj ) − ΣY , m n i=1 j=1 1 X δm,n X Rm,n := √ f (Xi , Xj ) + √ f (Yi , Yj ). 2 m(m − 1) 2 n(n − 1) 1≤i̸=j≤m 1≤i̸=j≤n Note that Rm,n is a degenerate U statistic while Lm,n is non-degenerate. It is reasonable to 49 expect that Lm,n would give a good approximation of Tm,n . Let ρ∗∗m,n √ (W X − ΣX ) √ (W Y − ΣY )  := sup P m m + δm,n ( n n )∈A A∈ARe 2 2 G1 G2   − P Tm + δm,n Tn ∈ A , Y ∈ A − P T G1 + δ G2     = sup P Um X +δ U T ∈ A , m,n n m m,n n A∈ARe G G where Tm 1 ∼D Nd (0, ΓX ), and Tn 2 ∼D Nd (0, ΓY ). To state the result about the Gaussian approximation of the U statistics we need the following assumptions. (a) There exists constants 0 < b < ∞ and δ2 > δ1 > 0 such that δ1 < |δm,n | < δ2 and h i 2 g 2 (Y ) > b, ∀ m ∧ n ≥ 1. inf 1≤a≤d E ga2 (X) + δm,n a (b) There exists a sequence of positive constants Bm,nl , l = 1, 2 such that the following holds with with ξ = X, ξ ′ = X ′ and ξ = Y, ξ ′ = Y ′ . h i ′ 2+l l , max E vec(h(ξ, ξ )) a ≤ Bm,n l = 1, 2, ∀ m ∧ n ≥ 1. 1≤a≤d (c) There exists a sequence of positive constants Bm,n such that the following holds with ξ = X, ξ ′ = X ′ and ξ = Y, ξ ′ = Y ′ , max ∥ vec(h(ξ, ξ ′ )) a ∥ψ1 ≤ Bm,n ,  ∀ m ∧ n ≥ 1. 1≤a≤d (d) log(d) ≤ b(m ∨ n), for some constants K, b > 0. 50 We are now ready to present the following theorem which provides an approximation of the error bound estimate between the probability of interest and its Gaussian counterpart. Theorem 3.3.1. Under the above set up and assumptions (a)–(d), the following holds. Y ∈ A − P T G1 + δ G2     ρ∗∗ m,n = sup P Wm X +δ W m,n n m T m,n m ∈ A ≲ ϖmX + ϖY , n A∈ARe Bm,n log7 (md) 1/6 Bm,n log7 (nd) 1/6  2   2  where ϖmX = Y and ϖn = . m n Remark 3.3.1. Condition (a) specifies the restriction on the sequence of constants δm,n to be bounded and ensures the non-degeneracy of the samples X and Y . Condition (c) imposes a condition on the third and fourth order moments. 
In the existing literature Chen and Li (2012) [33], Cai et al. (2013) [11] and Chang et al. (2017) [14] assumed the third order moments to be bounded whereas we allow them to diverge to infinty at a rate specified in condition (d). A similar assumption of uniform boundedness on the tails of X and Y were made by Chang et al. (2017) [14] and .Cai et al. (2013) [11]. In our case Condition (c) implies the tails of the distribution of X and Y can diverge to infinity in accordance with (d). All these condition are either the same or weaker than those appearing in the above references. Proof. This theorem can be seen as a two sample version of the Theorem 2.1 of Chen (2018). Some detailed calculations are still needed so we provide the proof for the sake of completeness. The proof uses the bounds obtained from Lemma (.0.12) and Lemma (.0.13). 51 Let  2 X := max E g (X) , 3 D2X := max E vec(h(X1 , X2 )) a , Dg,3 a 1≤a≤d 1≤a≤d  4 D4X := max E vec(h(X1 , X2 )) a . 1≤a≤d The Jensen’s inequality and assumption (b) yield that  2  3 D2X = max E vec(h(X1 , X2 )) a ≤ max E vec(h(X1 , X2 )) a . a a For bounding the term Dg,3X , note that    ga (x) := E vec(h(X, X2 )) a X = x , x ∈ R, 3   3 E ga (X) = E E vec(h(X1 , X2 )) a X1    3  ≤ E E vec(h(X1 , X2 )) a X1   3 X . = E vec(h(X1 , X2 )) a ≤ Bm,n = D̄g,3 By the condition (b), we readily obtain D4X ≤ Bm,n 2 . For the term Mh,4 X (τ ) at τ = 0, by the Cauchy Schwarz inequality, we obtain that for some 52 constant C1 > 0, X (0) = E h  4  i Mh,4 max max vec(h(X1 , X2 )) a I max vec(h(X1 , X2 )) a > 0 1≤i̸=j≤n 1≤a≤d 1≤a≤d h  i1 h i1 max | vec(h(X1 , X2 )) a |8 2 P( max | vec(h(X1 , X2 )) a | > 0) 2  ≤E max 1≤i̸=j≤n 1≤a≤d 1≤a≤d  8 ≤ C1 max max vec(h(X1 , X2 )) a ψ2 1≤i̸=j≤n 1≤a≤d 1  4 ≤ C1 max max | vec(h(X1 , X2 )) a ψ . 1≤i̸=j≤n 1≤a≤d 1 Use this bound and Lemma 2.2.2 of Van der Wart and Wellner (1996), pg.96, to obtain that h i4 X (0) Mh,4 ≤ log4 (md) max max ∥ vec(h(X1 , X2 )) a ∥ψ1 ≤ log4 (md)Bm,n  4 . (3.3.1) 1≤i̸=j≤n 1≤a≤d 53 By (3.3.1) and the definitions of the entities involved, 3 1 X )4 X )2 log4 d −1/6  ϕm (log d) 2 (Mh,4 (D̄g,3 log d  32  ≤ C2 X ) 14 (Mh,4 m m m 2 m1/6  log d  3 5/6 3 (log d)(log(md))(Bm,n ) = C2 2 X ) 41 (Mh,4 ≤ C2 1 2 m 5 X ) 3 log 3 d (D̄g,3 m6 11 2 (log(md)) 6 Bm,n 3 = C2 5 m6  2 1  4 Bm,n (log(md))7 6 log(md) 6 X, ≤ C2 ≤ C2 ϖm m m 2 4 Dg,3 log d −1/6 log(d) 1/3 log(d) 1/2   ϕm √ D2 ≤ √ Bm,n m m m  2 Bm,n (log(nd))7 1/6   1/6 1 X, ≤ ≤ C3 ϖm n 2 log5 (dm) Bm,n log5/4 d 1/4  D2 log4 d −1/6 log5/4 d g,3 1/2 ϕm D4 ≤ C4 Bm,n n 3/4 m m 3/4  2 7 1/6  1/12 Bm,n (log(nd)) log(d) 7/12   1 X. ≤ 2 ≤ C4 ϖm n m Bm,n By Lemma C.1 of CCK (2017), applied with Bm,n = D̄g,3 X , we obtain that for some universal constant c∗ > 0,  √ 3 m M3X (ϕm ) ≲ + Bm,n log(d) ϕm log(d)  √  m + exp −   , 4c1 ϕm Bm,n (log(d))2 √ !−1/3 (Bm,n )2 (log7 (dm)   m ≳ log(dm) 4c1 ϕm Bm,n (log(d))2 m ≳ c∗ log(dm). 54 √ √ m m √ Because ϕm ≥ 2, ≲ log(d) ≲ m, and ϕm log(d) !1/2 (Bm,n )2 (log2 (dm) √ √ Bm,n log(d) = m≲ m. m Combining these bounds we obtain ∗ X (ϕ ) ≲ m3/2 (md)−c ≲ m−1/2 . Mg,3 m G Using similar arguments it can be concluded that M3 1 (ϕm ) ≲ m−1/2 . The last two facts in turn yield that M3X (ϕm ) = Mg,3 X (ϕ ) + M G1 (ϕ ) ≲ m−1/2 . m 3 m Moreover, X )2 log7 d 1/6 (D̄g,3 M3X (ϕm )  + X m D̄g,3 Bm,n log7 d 1/6  2  1 ≤ +√ m mBm,n Bm,n log7 d 1/6 2 log7 (md) 1/6  2 Bm,n   = + (Bm,n )−4/3 (log(md))−7/6 m−1/3 m m Bm,n log7 (md) 1/6  2  ≲ . m Using all the above facts we finally conclude that, Bm,n log7 (md) 1/6 Bm,n log7 (nd) 1/6  2   2  ρ∗∗ m,n ≲ + . 
m n 55 This completes the proof of the Theorem. 2 The conditions required to prove the above theorem are weaker than those in the existing literature. Condition (a) states that the second moments of vec(h(X1 , X2 )), vec(h(Y1 , Y2 )) be bounded away from zero. It is worth noting that unlike Cai et al. (2013) [11] and Li and Chen (2012) [33], the condition (b) does not require a common fixed bounds on the moments. Here the tail can grow in an uniform manner as Bm,n grows to infinity. Li and Chen (2012) [33] considered some structural assumption on the traces of the two covariance matrices and Cai et al. (2013) [11] imposed some structural assumptions like correlation and sparsity among the components of X m and Y n . Schott(2007) [43], Srivastava and Yanighara (2010) [46] had the strict conditions of normality on X m and Y n . As stated before the conditions assumed here are much weaker in the sense that no specific distributional assumption or additional correlational assumption nor any uniformity of the moment conditions are required. Although Theorem 3.3.1 acts as a foundational stone towards the Gaussian approximation of the distribution of Wm m,n Wn , but because the limiting distribution is unknown, this X +δ Y theorem is of little use in implementing any test based on Wm m,n Wn for the large sample X +δ Y sizes. To circumvent this problem we are proposing bootstrap approximation in Theorem 4.1 in the next section, which acts a crucial step towards bridging this gap. 3.4 Jackknifed Multiplier Bootstrap Approximation for U statistics In this section instead of applying re-weighted multiplier bootstrap to estimate the unknown covariance matrix we employ the jackknifed version of multiplier bootstrap approximation with jackknifed estimator of the covariance matrix. One reason behind choosing this strategy is that the i.i.d re-weighted bootstrap or naive multiplier bootstrap techniques are proven to be slower than the jackknifed counterpart, see, e.g., Section 3 in Chen (2018) [16]. 56 G Let e1 , e2 , · · · , em+n be i.i.d standard normal r.v.’s that are independent of X m , Y n , Tm 1 G and Tn 2 . Define the Jack-knife versions of Tm X and T Y as follows. n m h m eX := √1 X 1 X  i X e. Tm vec h(Xi , Xj ) − Um i m m−1 i=1 j̸=i=1 n h n 1 1 i TneY := √ vec h(Yi , Yj ) − UnY ei+m . X X  n n−1 i=1 j̸=i=1 Define the Jackknife estimators of the corresponding covariance matrices of Tm X and T Y as n m XXn 1 X T, on o Γ̂JK X X   m := vec h(X i , Xj ) − U m vec h(X i , X j ) − U m (m − 1)(m − 2)2 i=1 j̸=i k̸=i n oT 1 XXXn on Γ̂JK n := vec h(Yi , Yj )) − Um Y vec h(Yi , Yj )) − Um . Y (n − 1)(n − 2)2 i=1 j̸=i k̸=i Let (m − 2)2 JK (n − 2)2 JK Γ̃JK m := Γ̂ , Γ̃JK n := Γ̂ , (3.4.1) m(m − 1) m n(n − 1) n ∆m,n = (Γ̃JK X 2 JK m − Γ ) + δm,n (Γ̃n − Γ ) . Y ∞ For any two random vectors ξ, ζ, the notation ξ|ζ denotes the conditional distribution of ξ, given ζ. Note that Tm eX |X m is N 0, Γ̃JK and T eY |Y n is N 0, Γ̃JK . We are ready   1 d m n 1 d n to state the following lemma which plays a crucial role towards obtaining the bootstrap approximation result. Lemma 3.4.1. Let Z1X , Z2Y be two independent random vectors such that Z1X |X m ∼ Nd (0, Γ̃JK Y n JK m ) and Z2 |Y ∼ Nd (0, Γ̃n ). Then, for some constant 0 < C < ∞ 57 and every sequence of real numbers ∆ ¯ m,n > 0, on the event {∆m,n ≤ ∆ ¯ m,n }, G G     sup P Z1X + δm,n Z2Y ∈ A|X m , Y n − P Tm 1 + δm,n Tn 2 ∈ A ≤ C(∆ ¯ m,n )1/2 log d. A∈ARe Proof. 
The proof is an immediate consequence of Theorem 5.1, CCKK (2022) with Z = G G 2 ΓY ) and Z X + δ Tm 1 + δm,n Tn 2 ∼ Nd (0, ΓX + δm,n Y X m , Y n ∼ Nd (0, Γ̃JK   1 m,n Z2 m + 2 Γ̃JK ). δm,n n Remark 3.4.1. Lemma 3.4.1 is instrumental in deriving the rates of Jackknife version of the U-statistics. It also is an improvement over similar results of Chen (2018) [16], Proposition 5.4. The impact of this result can be appreciated by noting the fact that the rate of bootstrap log (nd) 1/6 log (nd) 1/4  5   5  approximation has improved from n to n . This implies that the boostrapped version of the statistic converges to a Gaussian distribution at a rate of n−1/4 . For the next theorem we need the following condition. (e) log(1/γm,n ) ≤ K log(d(m ∨ n)), for some positive constants γm,n , K, b. Theorem 3.4.1. Under the above set up and assumptions (a)–(c)and (e), the following holds. For a γm,n < 1/56, with probability at least 1 − 56γm,n , G G ρJK eX eY 1 n,m = sup |Pe (Tm + δm,n Tn ∈ A) − P(Tm + δm,n Tm ∈ A)|, 2 A∈ARe BX (γ BY ≲ ϖm m,n ) + ϖn (γm,n ), 58 where Bm,n log5 (md) log2 (1/γm,n ) 1/4  2  BX (γ ϖm m,n ) = , m Bm,n (log5 nd) log2 (1/γm,n ) 1/4  2  BY ϖn (γm,n ) = . n This theorem provides a theoretical guarantee towards the Gaussian approximation term and its jackknife covariance multiplier bootstrap counterpart. Since the multiplier bootstrap term can be estimated it would facilitate its use to quantify the error bound of approxima- tion ρJK n,m . The entity ρn,m provides an upper bound to the error of approximation of the JK bootstrap distribution of the test statistic, viz., Tm eX +δ m,n Tn by the Gaussian counterpart. eY Proof. For any sequence of constants ∆ ¯ m,n > 0, on the event {∆ ˆ m,n ≤ ∆¯ m,n }, we have ρJK ¯ m,n ≲ (∆m,n ) 1/2 log d. Now we shall first bound the quantity ˆ m,n = (Γ̂JK − ΓX ) + δ 2 (Γ̂JK − ΓY ) . ∆ m m,n n ∞ We would be using Ki > 0, i = 1, 2, · · · to denote the absolute constants. The main goal is to use the previous lemma to find a real sequence ∆ ¯ m,n such that P(∆ ˆ m,n ≥ ∆¯ m,n ) ≤ γm,n . 59 Finally we would bound (∆ ¯ m,n )1/2 log d. Now, to bound ∆ ¯ m,n , we rewrite Γ̂JK as, m m XXn 1 X T, on o Γ̂JK X X   m = vec h(X i , X j ) − Um vec h(X i , Xj ) − U m (m − 1)(m − 2)2 i=1 j̸=i k̸=i   X 1 1  T = 2 1 − vec(h(Xi , Xj ))}{vec(h(Xj , Xj )) (m − 1)(m − 2) m 1≤i̸=j≤m  T X   + vec(h(Xi , Xj )) vec(h(Xi , Xk )) 1≤i̸=j̸=k≤m  1 1 X   T − vec(h(Xi , Xj )) vec(h(Xl , Xj )) (m − 1)(m − 2)2 m 1≤i̸=j̸=l≤m T X   + vec(h(Xi , Xj )) vec(h(Xi , Xj )) 1≤i̸=j≤m T X   + vec(h(Xi , Xj )) vec(h(Xj , Xk )) 1≤i̸=j̸=k≤m T X   + vec(h(Xi , Xj )) vec(h(Xi , Xl )) 1≤i̸=j̸=l≤m  T X   + vec(h(Xi , Xj )) vec(h(Xl , Xk )) , 1≤i̸=j̸=l̸=k≤m = Γ̂JK JK m1 − Γ̂m2 , (say). Let h i T ΓX  1 = E vec(h(X1 , X2 )) vec(h(X1 , X3 )) , h i h iT ΓX2 = E vec(h(X 1 , X2 )) E vec(h(X 1 , X 2 )) . Then, ΓX = ΓX 1 − Γ2 . We assume, without loss of generality, that X Bm,n log5 (md) log2 (1/γm,n ) Bm,n log5 (nd) log2 (1/γm,n )  2   2  ≤ 1, ≤1 m n 60 since, otherwise Theorem 3.4.1 holds trivially. We shall deal with several cases as follows. We shall first obtain a rate bound for Γ̂JK m1 − ΓX1 . We rewrite this difference as the sum of two U-statistics. In other words we write Γ̂m1 = JK (n − 3)! P m,1,1 + Γ̂m,1,2 , where Γ̂m,1,1 = Γ̂JK JK JK vec(h(x1 , x2 )) vec(h( x1 , x3 )) and   n! 1≤i̸ =j̸ = k≤m (n−2)! T Γ̂JK P   m,1,2 = n! 1≤i̸=j≤n vec(h(x1 , x2 )) vec(h(x2 , x2 )) . 
We shall first handle the leading term 1 ≤ i ̸= j = ̸ k ≤ m for which the kernel is Ha (x1 , x2 , x3 ) = vec(h(x1 , x2 )) a vec(h(x1 , x3 )) a , i = 1, 2, 3, xi ∈ Rp . and a :=   1 2 (a1 , a2 ), with 1 ≤ aj ≤ d, j = 1, 2. The notation 1 ≤ a ≤ d means a = (a1 , a2 ), 1 ≤ a1 , a2 ≤ d. Let r = [n/3], for any Xi ∈ Rp and define (n − 3)! Γ̂JK Zm,1,1 := r|Γ̂JK X X m,1,1 := n! H(Xi , Xj , Xk ), m,1,1 − Γ1 |∞ , 1≤i̸=j̸=k≤m X 3i+3  Mm,1,1 := max max Ha X 3i+1 . 0≤i≤r−1 1≤a≤d  3i+3  where, Ha X 3i+1 = Ha (X3i+1 , X3i+2 , X3i+3 ). Note that Γ̂JK m,1,1 is a U statistic of order 3 and E[Γ̂JK m,1,1 ] = Γ1 . X By applying Lemma .0.11 with α = 12 , η = 1 and δ = 12 , we obtain that, t2   1/2   X X    t P Zm,1,1 ≥ E[Z̄m,1,1 ]+t ≤ exp − X )2 +3 exp − X , ∀ t > 0, 3(¯ ς1,1,1 K1 ∥Mm,1,1 ∥ψ 1 2 61 where r−1    r−1 Xh i 3i+3  X )2 2 X H̄a (X)3i+3 X  (¯ ςm,1,1 := max E Ha X 3i+1 , Z̄m,1,1 := max 3i+1 − EH̄a , a a i=0 i=0  3   3    H̄a x 1 := Ha x 1 I max Ha (x)31 ) ≤ τ , a = (a1 , a2 ), 1 ≤ a1 , a2 ≤ d. a Note that   r−1 Xh  3i+3  i 21 E[Z̄m,1,1 ] ≤ K2 (log(d)1/2 max E(H̄a X 3i+1 − EH̄a )2 a i=0  h    3i+3  2 i1/2 + (log(d)) E max H̄a X 3i+1 − E[H̄a ] , i,a r−1 3i+3  1/2   X   ≤ K2 (log(d)) 1/2 max 2 E H̄a X 3i+1 a i=0 2 1/2       3i+3 + (log(d)) E max H̄a X 3i+1 − EH̄a , i,a n o ≤ K2 (log(d))1/2 ς¯1,1,1 X + (log(d))∥M X ∥ m,1,1 ψ . 1/2 By applying Cauchy-Schwarz and Lyapounov’s inequalities along with the condition (b),     1 2 3i+3    4 2 E Ha X 3i+1 ≤ E vec h (X)3i+1 , (X)3i+2 a 1  1   4 2 × E vec h (X)3i+1 , (X)3i+3 a 2 2 . ≤ Bm,n Therefore, X √ ς¯m,1,1 ≤ rBm,n ≤ mBm,n . 62 By the condition (b) and Pisier’s inequality (.0.8) we can conclude that X ∥Mm,1,1 ∥ψ ≤ K3 log2 (rd) max ∥{vec(h((X)3i+2 2 3i+1 ))}a ∥ψ ≤ K3 Bm,n 2 log(md)2 . 1/2 i,a 1/2 Coupled with condition (e) we get,   2 EZ̄m,1,1 ≤ K4 (mBm,n log(d)) 1/2 2 + Bn (log(d)) log(md) , 2 ≤ 2K4 (mBm,n2 log(md))1/2 .   P |Γ̂JK − Γ X | ≥ K (m−1 B 2 log(md))1/2 + t) m,1,1 1 ∞ 5 m,n (mt)2     1/2  mt ≤ exp − 2 + 3 exp − , K6 3mBm,n K7 Bm,n2 log2 (md) √ mt2      mt = exp − 2 + 3 exp − . K6 3Bm,n K8 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Now choose t = t∗ = K8 , where the constant K8 > 0 is large m enough. Then, −1 ) log(md) log2 ( 1 )     P |Γ̂JK m,1,1 − ΓX1 |∞ ≥ 2t∗ ≤ exp − (K82 K10 γm,n  1 −1 )m1/4 (log 12 (1/γ − 34 −1  + 3 exp − (K82 K12 m,n )) log (md)B 2 m,n Now for d ≥ 3 and for some γm,n small enough, so we obtain that log(md) ≥ 1 and 1 ) ≥ 1. Therefore for K large enough, log( γm,n 8   X | ≥ 2t∗ ≤ γ K 2 /K K 1/2 /K P |Γ̂JK m,1,1 − Γ1 ∞ m,n 8 10 + 3 γ m,n 8 12 ≤ 4γ m,n . 63 By similar arguments as above, we conclude that,  2 Bm,n log(1/γm,n ) 1/4  1/2   |Γ̂JK − ΓX 2 ˙ P m,1,1 1 |∞ log (md) ≥ K13 m log(md) ≤ 4γm,n ,  1/2  JK X P |Γ̂m,1,1 − Γ1 |∞ log (md) 2 BX ≥ K13 ϖ1 (γm,n ) ≤ 4γm,n . Next, to analyse the second term in the expression of Γ̂JK m1 , let P 1≤i̸=j≤m T (n − 2)! Γ̂JK   X H(x1 , x2 ) := vec(h(x1 , x2 )) vec(h(x2 , x2 )) , m,1,2 = H(Xi , Xj ), n! 1≤i̸=j≤n h i X  T Γ1,2 := E vec(h(X1 , X2 )) vec(h(X2 , X2 )) . Then Γ̂m,1,2 is a U-statistic of order 2. With r = [m/2], define  2i+2  X r|Γ̂JK − ΓX X Zm,1,2 = m,1,2 1,2 |∞ , Mm,1,2 = max max Ha X 2i+1 , 0≤i≤r−1 1≤a≤d r−1    r−1 2 2i+2  Xh i X 2 X H̄a ((X)2i+2 X ς¯m,1,2 = max E Ha X 2i+1 , Z̄m,1,2 = max 2i+1 ) − E H̄a , a a i=0 i=0     2 2  2   H̄a X 1 = Ha X 1 1 max Ha X)1 ≤ τ , τ > 0. a Let τ = 8E[Mm,1,2 X ]. By Lemma .0.11, applied with α = 21 , η = 1 and δ = 12 , we obtain that ∀ t > 0, " 1/2 # t2     X X  t P Zm,1,2 ≥ E[Z̄m,1,2 ] + t ≤ exp − X + 3 exp − X . 
3(¯ςm,1,2 )2 K1 ∥Mm,1,2 ∥ψ 1/2 64 But  h r−1 1i 1/2 E(H̄a ((X)2i+2 2 2 X E[Z̄m,1,2 ] ≤ K14 (log(d) max 2i+1 ) − E H̄ a ) a i=0 h i1/2  2i+2 + (log(d)) E[max |H̄a ((X)2i+1 ) − E[H̄a ]| 2 , i,a    X p ≤ K14 log(d)(¯ςm,1,2 ) + (log(d)) ∥Mm,1,2 ∥ψ . 1/2 Apply the Cauchy-Schwarz inequality and condition (c) to obtain  4        2 2i+2   2 . E Ha X 2i+1 ≤ E vec h X 2i+1 , (X)2i+2 ≤ Bm,n a so that, X √ ς¯m,1,2 ≤ rBm,n ≤ mBm,n . By a property of Orlicz norm, ∥X 2 ∥ψ = ∥X∥2ψ . By condition (d) and Pisier’s inequality 1/2 1 (.0.8), we obtain     2 X 2i+2  Mm,1,2 ≤ K15 log2 (rd) max vec h X 2i+1 ≤ K15 Bm,n 2 log2 (md). ψ1/2 i,a ψ1/2 65 Use condition (e) to derive   2 log(md) 1/2 ,   2 EZ̄m,1,2 ≤ K16 (mBm,n log(d)) 1/2 2 2 + Bn (log(d)) log(md) ≤ 2K16 mBm,n   P |Γ̂JK m,1,2 − ΓX1,2 |∞ ≥ K16 (m−1 Bm,n2 log(md))1/2 + t) (mt)2  1/2     mt ≤ exp − 2 + 3 exp − , K17 3mBm,n 2 log2 (md) K18 Bm,n √ (mt)2      mt = exp − 2 + 3 exp − . K17 3mBm,n K19 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Choose, t = t∗ = K20 , for some constant 0 < K20 < ∞(large m enough). Then, we have      |Γ̂JK − ΓX 2t∗ 2 K −1 ) log(md) log2 P m,1,2 1,2 |∞ ≥ ≤ exp − (K20 17 1/γm,n  1 2 −1 1/4 1/2 − 43 −12  + 3 exp − (K20 K19 )n log (1/γm,n ) log (md)Bm,n . Now we see that 2 1/2 K20 K20 K17 K   P |Γ̂JK X ∗ ≤ + 3γm,n19 m,1,2 − Γ1,2 |∞ ≥ 2t γm,n ≤ 4γm,n ,   B 2 log(md) log2 (1/γm,n ) 1/4  JK X 1/2 m,n P |Γ̂m,1,2 − Γ1,2 |∞ log(md) ≥ K21 log(md) ≤ 4γm,n , m   P |Γ̂JK − ΓX |1/2 log(md) ≥ K ϖ BX (γ ) ≤ 4γm,n . m,1,2 1,2 ∞ 21 1 m,n Note that by the Cauchy-Schwarz and Lyapounov’s inequalities and condition (b), we can 66 2/3 see that, |Γ1,2 |∞ ≤ Bm,n , from which we get m−1 |Γ1,2 |∞ ≤ t∗ /2. Therefore, |Γ̂JK JK m,1,2 |∞ ≤ |Γ̂m,1,2 − Γ1,2 |∞ + |Γ1,2 |∞ . Again, n JK 3t∗ o n JK 3t∗ o |Γ̂m,1,2 |∞ ≥ ⊆ |Γ̂m,1,2 − Γ1,2 |∞ + |Γ1,2 |∞ ≥ 2 2 n 3t ∗ nt ∗o ⊆ |Γ̂JK m,1,2 − Γ1,2 |∞ ≥ 2 + 2 . Therefore,  3t∗   JK − Γ | ≥ (3 + n)t ∗ P |Γ̂JK m,1,2 ∞| ≥ ≤ P |Γ̂m,1,2 1,2 ∞ ≤ 4γm,n . 2 2 Next, consider Γ̂JK m2 . It can be decomposed as a sum of 5 U-Statistics. To deal with T the leading term 1≤i̸=j̸=k̸=l≤m , let H(x1 , x2 , x3 , x4 ) = vec(h(x1 , x2 )) vec(h(x3 , x4 )) P   and define (m − 4)! Γ̂JK X m,1,4 = m! H(Xi , Xj , Xk , Xl ). 1≤i̸=j̸=k̸=l≤m Note that Γ̂JK m,1,4 is a U statistics of order 4 and E[Γ̂m,1,4 ] = Γ2 . Let r = [m/4] and define JK X  4i+4  X r|Γ̂JK − ΓX X Zm,1,4 = m,1,4 2 |∞ , Mm,1,4 = max max Ha X 4i+1 , 0≤i≤r−1 1≤a≤d r−1    r−1 2 4i+4  Xh i X 2 X H̄a ((X)4i+4 X ς¯m,1,4 = max E Ha X 4i+1 , Z̄m,1,4 = max 4i+1 ) − EH̄a , a a i=0 i=0     4 4  X 1 1 max Ha X)41 ≤ τ   H̄a X 1 = Ha , τ > 0, a = (a1 , a2 ), a ∀a1 , a2 = 1, 2, · · · , d. 67 By Lemma .0.11 with α = 21 , η = 1 and δ = 12 . We have ∀t > 0 that t2      1/2  X X  t P Z1,1,4 ≥ E[Z̄1,1,4 ] + t ≤ exp − X )2 + 3 exp − X ∥ . 3(¯ς1,1,4 K1 ∥M1,1,4 ψ1 2 Moreover,   r−1 X  4i+4  2  21 E[Z̄1,1,4 ] ≤ K22 (log(d)1/2 max E H̄a X 4i+1 − EH̄a a i=0  4i+4 2 1/2  + (log(d))[E max H̄a ( X 4i+1 ) − E[H̄a ] , i,a r−1 4i+4 1/2   X   ≤ K22 (log(d)) 1/2 max 2 E H̄a X 4i+1 , a i=0  1/2  4i+4 2 + (log(d)) E max H̄a ((X)4i+1 ) − EH̄a , i,a ≤ K22 {(log(d))1/2 ς¯1,1,4 X + (log(d))∥M X ∥ 1,1,4 ψ . 1/2 By the Cauchy-Schwarz inequality inequality and Condition(b), " # "  #1  4 2 E Ha2 ((X)4i+4 4i+1 ) ≤ E vec h((X)4i+1 , (X)4i+2 ) a1 "  #1   4 2 × E vec h((X)4i+3 , (X)4i+4 ) 2 . ≤ Bm,n a2 Therefore, X √ ς¯1,1,4 ≤ rBm,n ≤ mBm,n . 68 By Condition (b) and the Pisier’s inequality (.0.8),     2 X ∥ ∥M1,1,4 ≤ K23 log2 (rd) max vec h (X)4i+14i+2 2 log(md)2 . 
≤ K23 Bm,n ψ1/2 i,a m ψ1/2 This bound together with condition (e) yield that n o EZ̄1,1,4 ≤ K24 (mBm,n 2 log(d))1/2 + B 2 (log(d)) log(md)2 n 2 log(md) 1/2 ≤ 2K24 mBm,n ,   P |Γ̂JK − ΓX | ≥ K (m−1 B 2 log(md))1/2 + t) m,1,4 2 ∞ 25 m,n (mt)2 1/2      mt ≤ exp − 2 + 3 exp − , K26 3mBm,n 2 log2 (md) K27 Bm,n √ (mt)2      mt ≤ exp − 2 + 3 exp − . K26 3mBm,n K28 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Apply the above bound with t = K29 , for constant K29 > 0 m (large enough). Then, the above bound becomes 2 K −1 ) log(md) log2 ( 1 ))   P |Γ̂JK m,1,4 − ΓX2 |∞ ≥ 2t ≤ exp(−(K29 26 γm,n 1 1 − 21 2 −1 1/4 − 34 + 3 exp(−(K29 K28 )n (log (1/γm,n )) log (md)Bm,n ). 2 69 Now we see that, 2 1/2 K29 K29 K26 K   P |Γ̂JK X ∗ ≤ + 3γm,n28 m,1,4 − Γ2 |∞ ≥ 2t γm,n ≤ 4γm,n ,   B 2 log(md) log2 (1/γm,n ) 1/4  JK X 1/2 m,n P |Γ̂m,1,4 − Γ2 |∞ log(md) ≥ K30 log(md) ≤ 4γm,n , m   P |Γ̂JK − ΓX |1/2 log(md) ≥ K ϖ BX (γ ) ≤ 4γm,n . m,1,4 2 ∞ 30 1 m,n Next, to analyse the second term m2 , let r = [m/2] in the expression of Γ̂JK P 1≤i̸=j≤m and T (m − 2)! Γ̂JK   X  H(x1 , x2 ) = vec(h(x1 , x2 )) vec(h(x1 , x2 )) , m,1,3 = H Xi , Xj , m! 1≤i̸=j≤m h i T ΓX , Zm,1,3 = r Γ̂JK X  1,3 := E vec(h(X 1 , X 2 ) vec(h(X 1 , X 2 )) m,1,3 − Γ1,3 ∞ ,    X 2i+2 Mm,1,3 = max max Ha X 2i+1 , ∀ a = (a1 , a2 ), a1 , a2 = 1, 2, · · · , d. 0≤i≤r−1 1≤a≤d Then Γ̂JK m,1,3 is a U-statistic of order 2. Let τ = 8E[Mm,1,3X ]. By Lemma .0.11, applied with α = 21 , η = 1 and δ = 12 , we obtain that ∀ t > 0, t2 1/2      X X   t P Zm,1,3 ≥ E[Z̄m,1,3 ]+t ≤ exp − X + 3 exp − X , 3(¯ςm,1,3 )2 K1 ∥Mm,1,3 ∥ψ 1 2 70 where r−1    r−1 Xh i 2 2i+2  X 2 X H̄a ((X)2i+2 X ς¯m,1,3 = max E Ha X 2i+1 , Z̄m,1,3 = max 2i+1 ) − EH̄a , a a i=0 i=0     2 2  2  H̄a X 1 = Ha X 1 1 max Ha X)1 ≤τ , a = (a1 , a2 ), 1 ≤ a1 , a2 ≤ d. a Moreover,  h r−1 1i 1/2 E(H̄a ((X)2i+2 2 2 X E[Z̄m,1,3 ] ≤ K31 (log(d) max 2i+1 ) − E H̄ a ) a i=0 h i1/2  2i+2 + (log(d)) E[max |H̄a ((X)2i+1 ) − E[H̄a ]| 2 , i,a    X X ∥ p ≤ K31 log(d)(¯ ςm,1,3 ) + (log(d)) ∥M1,1,3 ψ1/2 . Now again by applying the Cauchy-Schwarz inequality and condition (c), we obtain that   2i+2      4  2 E Ha X 2i+1 ≤ E vec ha X 2i+1 , (X)2i+2  ≤ Bm,n 2 . Therefore, X √ ς¯m,1,3 ≤ rBm,n ≤ mBm,n . By a property of Orlicz norm, we have ∥X 2 ∥ψ = ∥X∥2ψ . The condition (d) and Pisier’s 1/2 1 inequality (.0.8) yield that    2 X 2i+2  Mm,1,3 ≤ K32 log2 (rd) max vec ha X 2i+1 ≤ K32 Bm,n 2 log(md)2 . ψ1/2 i,a ψ1/2 71 Use condition (e) and this bound to obtain that   2 log(md) 1/2 ,   2 EZ̄m,1,3 ≤ K33 (mBm,n log(d)) 1/2 2 2 + Bn (log(d)) log(md) ≤ 2K33 mBm,n   P |Γ̂JK m,1,3 − ΓX1,3 |∞ ≥ K34 (m−1 Bm,n2 log(md))1/2 + t) , (mt)2    1/2   mt ≤ exp − 2 + 3 exp − , K35 mBm,n 2 log2 (md) K36 Bm,n √ (mt)2      mt = exp − 2 + 3 exp − . K35 3mBm,n K37 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Apply this bound with t = t∗ = K38 , for constant K38 > 0 (large m enough), to obtain that      |Γ̂X − ΓX 2t∗ 2 K −1 ) log(md) log2 P m,1,3 1,3 |∞ ≥ ≤ exp − (K38 35 1/γm,n  1 2 −1 1/4 1/2 − 43 −12  + 3 exp − (K38 K37 )n log (1/γm,n ) log (md)Bm,n . Consequently, 2 1/2 K10 K12 K11 K   P |Γ̂JK X ∗ ≤ + 3γm,n13 m,1,3 − Γ1,3 |∞ ≥ 2t γm,n ≤ 4γm,n ,   B 2 log(md) log2 (1/γm,n ) 1/4  JK X 1/2 m,n P |Γ̂m,1,3 − Γ1,3 |∞ log(md) ≥ K39 log(md) ≤ 4γm,n , m   JK X 1/2 BX P |Γ̂m,1,3 − Γ1,3 |∞ log(md) ≥ K39 ϖ1 (γm,n ) ≤ 4γm,n . By the Cauchy-Schwarz and Lyapounov’s inequalities and condition (b), we can see that, 72 2/3 −2 X ∗ 1,3 |∞ ≤ Bm,n , from which we obtain that n |Γ1,3 |∞ ≤ t /2. 
Therefore, |ΓX |Γ̂JK X X X m,1,3 |∞ ≤ |Γ̂m,1,3 − Γ1,3 |∞ + |Γ1,3 |∞ , n 3t∗ o n JK X | + |ΓX | ≥ 3t , ∗o |Γ̂JK | m,1,3 ∞ ≥ ⊆ |Γ̂m,1,3 − Γ1,3 ∞ 1,3 ∞ 2 2 n 3t∗ 2 n t ∗ o ⊆ |Γ̂JK X m,1,3 − Γ1,3 |∞ ≥ 2 + 2 ,  JK 3t∗   JK X (3 + n2 )t∗  P |Γ̂m,1,3 |∞ ≥ ≤ P |Γ̂m,1,3 − Γ2 |∞ ≥ ≤ 4γm,n . 2 2 For the remaining terms in Γ̂JK m2 , note that they are either a U statistic of degree three or a U statistics of degree two, which are analyzed as above. Therefore,   P |Γ̂JK − ΓX |1/2 log(md) ≥ K ϖ BX (γ ) m ∞ 40 1 m,n X |1/2 log(md) ≥ K40 ϖ BX (γ   ≤ P |Γ̂JK m1 − Γ 1 ∞ 1 m,n ) 2  JK X 1/2 K40 BX  + P |Γ̂m2 − Γ2 |∞ log(md) ≥ ϖ1 (γm,n ) , 2 ≤ 28γm,n , where K40 = max{K13 , K21 , K30 , K39 , · · · } denote a generic universal constant. Similarly for Y1 n we would have Γ̂JK JK JK n = Γ̂n1 − Γ̂n2 . 73 Recall that ΓY = ΓY1 − ΓY2 , where h i T ΓY1 = E vec(h(Y1 , Y2 )) vec(h(Y1 , Y3 ))  , h i h iT Y Γ2 = E vec(h(Y1 , Y2 )) E vec(h(Y1 , Y2 )) . By using the similar arguments as above, we obtain that   1/2 P |Γ̂JK n − ΓY |∞ log(nd) ≥ K42 ϖ1BY (γm,n ) Y |1/2 log(nd) ≥ K42 ϖ BY (γ   ≤ P |Γ̂JK n1 − Γ 1 ∞ 1 m,n ) 2 Y |1/2 log(nd) ≥ K42 ϖ BY (γ   + P |Γ̂Jk n2 − Γ 2 ∞ 1 m,n ) . 2 s s 2 log(md) log2 ( 1 ) Bm,n 2 log(nd) log2 ( 1 ) Bm,n γm,n γm,n Now choose ∆ ¯ m,n = K40 + K42 . m n Combining all the previous inequalities with this choice of ∆ ¯ m,n , it readily follows that,   P |Γ̂JKm − ΓX + δ 2 (Γ̂JK − ΓY )| ≤ ∆ m,n n ∞ ¯ m,n   JK X 2 JK = 1 − P |Γ̂m − Γ + δm,n (Γ̂n − Γ |∞ > ∆m,n Y ¯   X |1/2 log(md) ≥ K40 ϖ BX (γ  ≥ 1 − P |Γ̂JK m − Γ ∞ 1 m,n ) 2  K42 BY  JK Y 1/2 + P |Γ̂n − Γ |∞ log(nd) ≥ ϖ1 (γm,n ) 2   ≥ 1 − 28γm,n + 28γm,n = 1 − 56γm,n . Recall (3.4.1). The same conclusion holds for |Γ̃X m − Γ |∞ and |Γ̃n − Γ |∞ from the fact X Y Y that Γ̃X m ≲ Γ̂m and Γ̃n ≲ Γ̂n . And thus the conclusion follows from Lemma 4.1, by JK Y JK setting Tm eX |X m = Z X |X m and T eY |Y n = Z Y |Y n . 1 1 n 1 2 74 Remark 3.4.2. One can choose a sequence γm,n such that < ∞. Then by P m,n γm,n applying Borel-Cantelli lemma, we thus obtain the bootstrap convergence result in almost sure sense. For example if one chooses γm,n = exp(− log(dm)) for m = n, then γm,n < 1/56 and log(1/γm,n ) ≤ K log(d(m∨n)) for some large m. To apply Borel-Cantelli lemma, we note that P∞ P∞ P∞ −1 mc and c < 1/7 m=4 γm,n = m=4 exp(− log(dm)) = m=4 (dm) . For d < exp c we obtain that P∞ P∞ mc m)−1 ≤ ∞ exp−u du < C (c 4c )−1 exp(−4c ) R m=4 γm,n = m=4 (exp u=4 u 1 < ∞ for some positive constant C1 . Now using Theorems 3.3.1 and 3.4.1 we are in a position to state the next important result. Let Pe denote the probability distribution with respect to em+n only or it can be thought of as being the conditional distribution of em+n , given all the other r.v.’s. 3.5 Two sample test for covariance matrices Based on the results derived in the previous sections, we formulate a testing procedure for testing the equality of two population covariance matrices under l∞ norm Consider the problem of testing H0 : Σ1 = Σ2 versus Ha : Σ1 ̸= Σ2 , where Σ1 , Σ2 ∈ Rp×p , representing the covariance matrices of X and Y respectively. Recall √ X − UY ) m(Um n that Tmn = is the original test statistic, where 2 m X m vec (Xi − Xj )(Xi − Xj )T  X 1 X Um = and m(m − 1) 2 i=1 j̸=i=1 n X n T  Y = 1 X vec (Yi − Yj )(Y i − Y j ) Um . n(n − 1) 2 i=1 j̸=i=1 75 are the sample covariance matrices for the sample X m and Y n samples respectively. Note that both Um X , U Y are d = p2 dimensional U statistics. The proposed test rejects H n 0 whenever ∥Tmn ∥ is large. 
To implement the test we propose to use multiplier bootstrap r m eY version of the Tm,n given by Tm,n = Tm − JK eX T , where n m m  m  ! eX := √m 1 X 1 X vec((Xi − Xj )(Xi − Xj )T )  X e , Tm − Um i m m−1 2 i=1 j̸=i=1 n  n  ! √ vec((Y − Y )(Y − Y ) T)  1 1 i j i j TneY := n Y X X − Um ei+m . n n−1 2 i=1 j̸=i=1 Let, r n eX m eY o cB (α) = inf t ∈ R : Pe (∥Tm − Tn ∥∞ ≤ t) ≥ 1 − α , 0 < α < 1. n Corollary 3.5.1 below show that the test rejects H0 whenever ∥Tmn ∥ > cB (α) is of the asymptotic size α. From now for the sake of brevity, for any ξ1 , ξ2 ∈ Rd , let vec((ξi − ξj )(ξi − ξj ))T h(ξi , ξj ) = 2 denote the covariance kernel. The following theorem along with the corollary provides redthe guarantee of the asymptotic level of the above mentioned test. Before stating the next theorem we need some more assumptions as stated below. (a′ ) For some universal constants 0 < c1 < c2 < 1, m+n m ∈ (c , c ). 1 2 (b′ ) There exists a constant b > 0 such that E[ga2 (X) + δm,n 2 g 2 (Y )] ≥ b, for all 1 ≤ a ≤ d. a 76 (c′ ) There exists a sequence of constants Bm,n ≥ 1 such that    2+l l , l = 1, 2, max E | vec(h(X1 , X2 ) )a ≤ Bm,n 1≤a≤d    2+l l , l = 1, 2. max E vec(h(Y1 , Y2 ) )a ≤ Bm,n 1≤a≤d (d′ ) The constants Bm,n defined in (c) also satisfy   max vec(h(X1 , X2 ) )a ψ ≤ Bm,n , max vec(h(Y1 , Y2 ) )a ψ ≤ Bm,n . 1≤a≤d 1 1≤a≤d 1 Bm,n log7 (pm) Bm,n log7 (pn) (e′ ) The constants Bm,n defined in (c) also satisfy ∼ → 0, m n as m ∧ n → ∞. For brevity, let Ω = Σ1 − Σ2 . The Kolomogorov distance between the two distributions of suitably centered Tmn and Tmn JK is defined to be √    JK KD Tmn − m vec(Σ1 − Σ2 )/2 , Tmn  √m(U X − U Y ) − √m vec(Σ − Σ )   r m eY  m n 1 2 eX = sup P ≤ t − Pe ∥Tm − T ∥∞ ≤ t . t≥0 2 ∞ n n We are ready to state the following theorem. Theorem 3.5.1. Suppose the above conditions (a′ )–(e′ ) hold. Then for any non-negative definite matrices ΣX and ΣY , of real numbers, with probability tending to one, 7 JK  ≲ Bm,n log (pm) 1/6 . n o KD Tmn , Tmn m 77 Remark 3.5.1. Condition (a′ ) specifies that the ratio of the sample sizes can reside in any open interval. Condition (b′ ) ensures the non-degenracy of the sample observations. This condition is less restrictive than the minimum eigen-value condition considered in several existing literatures viz. Chen and Li(2012) [33], Cai et al.(2013) [11], Chang et al.(2017) [14]. Condition (c′ ) allows the bound on the third and fourth order moments to grow with the sample sizes m, n, unlike as in Cai et.al(2013) [11], Li and Chen (2012) [33] and Chang et al. (2017) [14]. In these papers the moments appearning in (c′ ) are assumed to be bounded from above by a fixed constant, for all sample sizes. Condition (d′ ) allows the sub-exponential tails to grow freely with the sample sizes, which is also advantageous than conditions in Cai et al. (2013) [11] and Chang et al. (2017) [14]. On a similar note in .Cai et al. (2013) [11], for the convergence of their distribution of test statistic under H0 to extreme Type-I distribution or to a normal distribution as in Li and Chen (2012) [33], one needs to assume sparsity or weak correlation structure among the individual components of their test statistics over which either l∞ or l2 -norm would be calculated. The above multiplier bootstrap method helps us to formulate a similar testing procedure without imposing any such correlational assumptions. Proof. The proof uses the results of the previous section with δm,n = −m1/2 n−1/2 . 
From condition (a′ ), we can verify that  1/2  1/2 c1 c2 = δ1 < |δm,n | < δ2 = , (3.5.1) (1 − c1 ) (1 − c2 ) It follows from assumption (b′ ) that min E[ga2 (X) + δm,n 2 g 2 (Y c )] ≥ min{1, δ 2 }b. a 1 (3.5.2) 1≤a≤d 78 G G ′ ′ Recall that, Tmn G = T 1 +δ m m,n Tm . By combining (3.5.1), (3.5.2), (a ), (e ) along with 2 Theorem 3.3.1 with ΣX = Σ1 and ΣY = Σ2 we obtain that Bm,n (log7 (pm)) 1/6  2  KD(Tmn , Tmn G ) ≤ ρ∗∗ m,n ≲ (3.5.3) m Choose γm,n in condition (e) of Theorem 3.4.1 to be γm,n = 1 . m2 (log m)4 From (a′ ) and (e′ ) it can be easily verified that Bm,n log5 (pm) log2 (1/γm,n ) Bm,n log5 (pn) log2 (1/γm,n ) ∼ → 0, as m, n → ∞. (3.5.4) m n Combining (3.5.1), (3.5.2), (3.5.4) and Theorem 3.4.1 we conclude that, ( )1/4 JK , T G ) Bm,n log5 (pm) log2 (1/γm,n ) KD(Tmn mn ≤ ρJK m,n ≲ . (3.5.5) m The claim of the theorem follows from the triangle inequality by verifying the fact that, Bm,n (log7 (pm)) 1/6  2  KD(Tmn , Tmn JK ) ≤ KD(Tmn , Tmn G ) + KD(T JK , T G ) ≲ . (3.5.6) mn mn m This concludes the proof. The proposed test procedure rejects H0 : Σ1 = Σ2 versus Ha : Σ1 ̸= Σ2 , at the signifi- cance level α ∈ (0, 1), whenever Φα = 1, where Φα =I(∥Tmn ∥∞ > cB (α)). Corollary 3.5.1. Under the conditions of Theorem 3.5.1 and under H0 , the following holds.  √ X − UY )  n B 2 log7 (pm) o1/6 m(Um n m,n sup P ≥ cB (α) − α ≲ max . α∈(0,1) 2 ∞ m 79 Proof. The proof is an immediate consequence of Theorem 3.5.1 from the definition of cB (α). As an immediate consequence of this corollary is the formulation of the following 100(1 − α)% confidence region for (Σ1 − Σ2 ). n o CR1−α := Σ1 − Σ2 : Tmn ∞ ≤ cB (α) . A computing procedure of cB (α). The multiplier bootstrapped version of the critical value cB (α) is quite advantageous in terms of faster computation. A procedure of computing the multiplier bootstrap critical value is described below for reader’s convenience. Step 1: From the sample of size m + n, generate N sets of standard normal random variables. Denote them by em+n 1 , · · · , em+n N . Treat them as random copies of em+n = {e1 , e2 , · · · , em+n }. Step 2: Keeping X m and Y n fixed, compute the bootstrapped version of the test statis- tic ∥Tmn ∥∞ N times, viz., calculate ∥Tmn ∞ N times. Compute N values of JK ∥ JK , T JK , · · · , T JK }. {Tmn1 mn2 mnN Step 3: The 100(1 − α) quantile of {Tmn1 JK , T JK , · · · , T JK } would be treated as an mn2 mnN approximate value for cB (α). It is to be noted that normally general resampling method demands a computational cost of the order O(N n2 p2 ) whereas this multiplier bootstrap technique reduces the compu- tational cost to O(n(p2 + n)N ) providing a massive advantage. A general criticism received by the use of this maximum norm based statistic in Cai et al. (2013) [11] is that the con- vergence of the asymptotic null distribution to Gumbel requires relatively large sample size, which in turn possess some computational challenges in terms of size and power when the 80 sample size is small. Fan et al. (2015) [24] suggested power enhancement techniques in this context. However, in our case the proposed multiplier bootstrap method makes the power computation a lot faster even without this power enhancement technique. The proposed method can be much more appreciated in the asymptotic analysis of the power provided in the following section. 
3.6 Analysis of Power Analysis of power : The goal of this section is to show that the difference between the two power functions obtained from the original test statistic and its Jackknifed bootstrap counterpart is asymptotically negligible. Further, we show that the proposed jackknifed bootstrap based test is consistent under the class of general alternatives. The power function of the test is  √m(U X − U Y )  m n PHa {Φα = 1} = P ∥ ∥∞ ≥ cB (α) Ha . 2 This power function is an abstract quantity because the respective covariance matrices ΓX and ΓY of Tm X and T Y are unknown in practice. To circumvent this problem we define n jackknifed multiplier bootstrap based power function as P∗Ha {Φα = 1}, which is expressed as r √  e∗ X m e∗ Y m(vec(Σ1 − Σ2 ))  Pe∗ ∥Tm − T + ∥∞ ≥ cB (α)|Ha , n n 2 where Pe∗ (.) denotes probability with respect to e∗m+n only. Before exploring the asymptotic theoretical aspects of the power function we shall describe the multiplier bootstrap procedure in the context of approximating the true power function as follows. Step 1: Generate {e∗1 , e∗2 , · · · , e∗m+n } independent of em+n which has been used previously 81 to calculate cB (α). Step 2: Now compute the bootstrap power function for the proposed test which is denoted by r √  e∗ X m e∗ Y m(vec(Σ1 − Σ2 ))  Pe∗ ∥Tm − T + ∥∞ ≥ cB (α) . n n 2 The following theorem establishes a theoretical guarantee in an asymptotic sense for ap- proximating the true power function PHa {Φα = 1}, by its Multiplier Bootstrap counterpart PH∗ {Φ = 1}. For the sake of brevity, we write Ω := Σ − Σ . a α 1 2 Theorem 3.6.1. Assuming the conditions for Theorem 3.5.1 holds, then for any Ω ∈ Rp × Rp , we have with probability one, r √  e∗ X m e∗ Y mvec(Ω)  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 √  m(U X − U Y )  m n −P < cB (α) H1 2 ∞ Bm,n log7 (pm) 1/6  2  ≲ . m 82 Proof. Under Ha , r √  e ∗X m e∗ Y mvec(Ω)  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 r √  e ∗X m e∗ Y mvec(Ω)  = 1 − Pe∗ ∥Tm − T + ∥∞ < cB (α) , n n 2  √m(vec(Ω)) ∗X r m e∗ Y  √ m(vec(Ω)a )  a e = 1 − Pe ∗ − − cB (α) < Tm − Tn a < − + c B (α) 2 n 2 √ √ √ X − U Y ) − m(vec(Ω))  m(vec(Ω))a m(Um n a a +P − − cB (α) < 2 √ 2 m(vec(Ω))a  <− + cB (α) 2  √m(vec(Ω)) √ m(Um X − U Y ) − √m(vec(Ω)) a n a a −P − − cB (α) < 2 √ 2 m(vec(Ω))a  <− + cB (α) , 2  m(U X − U Y ) − √mΩ √  m n ≥ 1 − sup P ∥ ∥∞ ∈ A A∈ARe 2  ∗X r m e∗ Y   √m(U X − U Y )  e m n − Pe∗ ∥Tm − Tn ∥∞ ∈ A − P ∥ ∥∞ < cB (α) , n 2  √m(U X − U Y )   r m Y  m n X =P ∥ ∥∞ ≥ cB (α) − sup P ∥Tm − T ∥∞ ∈ A 2 A∈ARe n n r  e ∗X m e∗ Y  − Pe∗ ∥Tm − Tn ∥∞ ∈ A . n Similarly under Ha ,  √m(U X − U Y )  m n P 2 ∞ > cB (α) √ X − UY )  m(Um n a  = 1 − P − cB (α) < < cB (α) , 2 83 √ r √  m(vec(Ω))a e ∗X m e∗ Y m(vec(Ω))a  = 1 + Pe∗ − − cB (α) < Tm − Tn < − + cB (α) 2 n 2  √m(vec(Ω)) √ √ X − U Y ) − m(vec(Ω)) a m(Um n a a −P − − cB (α) < √2 2 m(vec(Ω))a  <− + cB (α) 2 √  √m(vec(Ω)) ∗X r m e∗ Y m(vec(Ω))a  a e − Pe ∗ − − cB (α) < Tm − Tn < − + cB (α) , 2 n 2  √m(U X − U Y ) − √mΩ r m e∗ Y e∗ X −    m n ≥ 1 − sup P ∥ ∥∞ ∈ A − Pe∗ ∥Tm Tn ∥∞ ∈ A A∈ARe 2 n r √  e∗ X − m e∗ Y mΩ  − Pe∗ ∥Tm Tn + ∥∞ < cB (α) , n 2 r √  e ∗X m e∗ Y mΩ  = Pe∗ ∥Tm − T + ∥∞ > cB (α) n n 2  √m(U X − U Y ) − √mΩ   ∗X r m e∗ Y  m n e − sup P ∞ ∈ A − P e ∗ ∥T m − T n ∥∞ ∈ A . A∈A Re 2 n Combining the above results we get,  r √   √m(U X − U Y ) e∗ X m e∗ Y mΩ m n  Pe ∗ Tm − T + > cB (α) − P > cB (α)|H a n n 2 ∞ 2 ∞  √ X − U Y ) − √mΩ   r  m(Um n e ∗X m e∗ Y ≤ sup P ∈ A − Pe∗ Tm − Tn ∈A . 
A∈A Re 2 ∞ n ∞ By arguing as in Theorem 3.5.1, we obtain that, with probability to one,  √ X − U Y ) − √mΩ  r m(Um n  e ∗X m e∗ Y  sup P ∈ A − Pe∗ ∥Tm − Tn ∥∞ ∈ A A∈ARe 2 ∞ n 2 log7 (dm)/m}1/6 . ≲ {Bm,n This completes the proof. We construct a class of general alternatives denoted as Mm,n,d given below. For a large 84 constant K > 0, define n o (f ′ ) Mm,n,d = Ω ∈ Rp × Rp : ∥Ω/2∥∞ ≥ K{Bm,n log(md)/m}1/2 Theorem 3.6.2. Assuming the conditions for Theorem 3.5.1 and (f′ ) hold. Then, for all Ω ∈ Mm,n,d , r √  e∗ X m e∗ Y mΩ  Pe∗ ∥Tm − T + ∥∞ ≥ cB (α) → 1, as n, m, d → ∞ n n 2 Remark 3.6.1. Cai et al. (2013) [11] and Chang et al. (2017) [14], derived similar consis- tency results for their test statistics under a class of sparse alternatives. The above Theorem generalizes their result in the sense that it is valid for general alternatives where Bm,n possi- bly diverge to infinity. This can be understood by noting that the class Mm,n,d is constructed q  Bm,n log(d) in a manner such that Σ and Σ are separated by a lower bound K X Y m . The- orem 4 in Cai et al. (2013) [11] derived a similar bound treating Bm,n to be constant and q  log d their bound was of the order of O m under the class of alternatives. Proof. In the proof below, K ∗ and c∗ are positive and finite universal constants, not depending on m, n, d, whose values keep changing depending on the context. We begin the proof by noting that by the triangle inequality, r √  e∗ X m e∗ Y mΩ  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 r √  e ∗X m e∗ Y mΩ  ≥ Pe∗ ∥Tm − T ∥∞ ≤ ∥ ∥∞ − cB (α) . n n 2 Define the basis vectors ηa ’s to be natural basis vectors in Rd ∀a = 1, 2, · · · , d. Then, for 85 any t > 0, we have r d r  e∗ X m e∗ Y  X  ∗ e X m e∗ Y  Pe∗ ∥Tm − Tn ∥∞ ≥ t ≤ Pe∗ |Tma − Tna | ≥ t , n n a=1 t2   ≤ 2d exp − . 2 max1≤a≤d {ηaT (Γ̂X + m X n Γ̂ )ηa } The last bound follows from Lemma (.0.10) for Gaussian variables. Now setting the above bound equal to α by plugging in t = cB (α), for some large enough m we obtain that, m 1/2 cB (α) = 2 log(2d/α) max {ηaT (Γ̂X + Γ̂X )ηa }  , 1≤a≤d n m = 4 log(dn) max {ηaT (Γ̂X + Γ̂X )ηa } .   1≤a≤d n Note that, m X m JK max {ηaT (Γ̂X + Γ̂ )ηa } = ∥Γ̂JK m + Γ̂ ∥∞ , 1≤a≤d n n n m JK m JK ≤ ∥Γ̂JK m −Γ + X (Γ̂n − ΓY )∥∞ + ∥ΓJK m + Γ ∥∞ . n n n pm 1 , it follows From the bounds of ∆ ˆ m,n in Theorem 3.4.1 with δm,n = n and γm,n = dm that s 2 log3 (md) Bm,n m JK ∥Γ̂JK m −Γ + X (Γ̂n − ΓY )∥∞ ≲ . n m 86 For the term, ∥ΓJK m JK ∗ ′ m + n Γn ∥∞ , we note that m/n = c by the condition(a ) and m JK m JK ∗ ∥ΓX ∥ + m ∥ΓY ∥   ∥ΓJK m + Γn ∥∞ ≤ ∥ΓJK m ∥∞ + ∥Γ ∥ ∞ ≤ K ∞ ∞ , n n n n ∗  X X ∗ Y Y  ≤ K ∥Γ1 ∥∞ + ∥Γ2 ∥∞ + c ∥Γ1 ∥∞ + ∥Γ2 ∥∞ , "    ∗   T   ≤K max E vec h(X1 , X2 ) (vec(h(X1 , X3 )) a 1≤(a1 ,a2 )≤d a1 2    ∗   T   +c max E vec h(Y1 , Y2 ) vec(h(Y1 , Y3 ) a 1≤(a1 ,a2 )≤d a1 2 # + ∥ΓX ∗ Y 2 ∥∞ + c ∥Γ2 ∥∞ . From the definition of ∥ΓX 1 ∥ and ∥Γ1 ∥, it follows that, Y m JK ∥ΓJK m + Γn ∥∞ (3.6.1) " n n o1/2 n o1/2 ≤ K ∗ max E(vec(h(X1 , X2 ))a1 )2 max E(vec(h(X1 , X3 )a2 )2 + ∥ΓX 2 ∥∞ 1≤a1 ≤d 1≤a2 ≤d n o1/2 n o1/2 + c∗ max E(vec(h(Y1 , Y2 )a1 )2 max E(vec(h(Y1 , Y3 )a2 )2 1≤a1 ≤d 1≤a2 ≤d # + c∗ ∥ΓY2 ∥∞ . 87 Note that,   ∥ΓX2 ∥∞ = max (E(vec(h(X1 , X2 )))a1 )(E(vec(h(X1 , X2 )))a2 ) , (3.6.2) 1≤a1 ,a2 ≤d      ≤ max E|(vec(h(X1 , X2 )))a1 | max E|(vec(h(X1 , X2 )))a2 | , 1≤a1 ≤d 1≤a2 ≤d h i2 = max (E(vech(X1 , X2 ))a1 ) , 1≤a1 ≤d n o ≤ max E(vec(h(X1 , X2 ))a1 )2 , 1≤a1 ≤d 2/3 ≤ Bm,n . Similar conclusion holds for ∥ΓY2 ∥∞ . 
Hence we obtain that m JK h 2/3 2/3 i 2/3 ∥ΓJK m + Γn ∥∞ ≤ K ∗ 2Bm,n + 2c∗ Bm,n ≤ 2K ∗ (1 + c∗ )Bm,n , n 2/3 ≤ 2K ∗ Bm,n ≤ 2K ∗ Bm,n , where the second last inequality follows from the Holder’s inequaliy and condition (b). Therefore with probability tending to one, we get that m cB (α) ≤ max {ηaT (Γ̂X + Γ̂X )ηa } ≤ (8K ∗ Bm,n log(dn))1/2 . 1≤a≤d n √ Upon choosing the constant K in (f′ ) to be K = 8K ∗ , we obtain that √ 1/2 mΩ/2 ∞ − cB (α) ≥ 8K ∗ Bm,n log(dm)  . 88 Therefore, we conclude that as m ∨ n → ∞ and d → ∞, r √  e∗ X m e∗ Y mΩ  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 r  e∗X m e∗ Y ∗ 1/2  ≥ Pe∗ ∥Tm − T ∥∞ ≤ {8K Bm,n log(dm)} n n r  ∗X m e∗ Y  = 1 − Pe∗ ∥Tm −e Tn ∥∞ ≥ {8K ∗ Bm,n log(dm)}1/2 n   2 ≥ 1 − 2d exp − 8K ∗ Bm,n log(dm)/2Bm,n ≥ 1 − 3 4 → 1. d m 89 APPENDIX 90 In this Chapter we present some auxiliary results which are crucial in proving the results Chapter 2 and Chapter 3. Some of them are presented without proofs as they are the results from other research articles. Some lemmas are original to our work and we provide their proofs. Lemma .0.1. Let X1 , X2 , · · · , Xn be independent centered random vectors in Rd with d ≥ 2. Define Z X := maxi≤j≤d | n P i=1 Xi,j |, M X := max1≤i≤n max1≤j≤d |Xij | and σ 2 := max1≤j≤d n 2 P i=1 E[Xij ].Then  q    X p ≤ C σ log d + E (M X )2 log d  E Z where C is a universal constant. Proof: See Lemma 8 in Chernozhukov et al. (2015) [18]. Lemma .0.2. Under the setting of Lemma .0.1, for every ϵ ≥ 0, δ ∈ (0, 1] and t > 0,      X X  2 2 X δ P Z ≥ (1 + ϵ)E[Z ] + t ≤ exp − t /(3σ ) + 3 exp − t/ C1 ∥M ∥ψ , δ where C1 = C1 (ϵ, δ) is a constant depending only on ϵ, δ. Proof: See Theorem 4 in Adamczak (2008) [1]. Lemma .0.3. Under the setting of Lemma .0.1, for every ϵ ≥ 0, s > 0 and t > 0,   P Z ≥ (1 + ϵ)E[Z ] + t ≤ exp{−t2 /(3σ 2 )} + C2 E M X )u /tu , X X   where C2 = C2 (ϵ, u) is a constant depending only on ϵ, u. 91 Proof: See Theorem 2 in Adamczak (2010) [2]. Lemma .0.4. (Nazarov’s inequality) Let X ∈ Rd denote a centered Gaussian random vector in Rd such that for some constant b > 0, and E[Xj2 ] ≥ b, j = 1, 2, · · · , d. Then for every x ∈ Rd and a > 0, p P(X ≤ x + a) − P(X ≤ x) ≤ Ca log p, where C is a constant depending only on b. Proof: See Nazarov [38] or [20] . The following lemma is Lemma 5.1 in Chernozhukov et al. (2017) [19], we state it here for the sake of completeness. For a collection of independent random vectors X = X1 , X2 , · · · , Xn ∈ Rd with mean- vector µ and covariance matrix Σ and a collection of independent Gaussian random vectors Y = Y1 , Y2 , · · · , Yn ∈ Rd with the same mean and covariance matrix as that of X, we define √ X √ vSn + 1 − vSnY ≤ y − P SnX ≤ y .   ρn := sup P y∈Rd ,v∈[0,1] We further define Mn,X (ϕ), Mn,Y (ϕ) and Mn (ϕ) as n h i n−1 3 X Ln = max E |Xij | , 1≤j≤d i=1 n "  √ # 1 n E max |Xik |3 I max |Xik | > X Mn,X (ϕ) = , n 1≤k≤d 1≤k≤d 4ϕ log(d) i=1 n "  √ # 1 n E max |Yik |3 I max |Yik | > X Mn,Y (ϕ) = , n 1≤k≤d 1≤k≤d 4ϕ log(d) i=1 Mn (ϕ) = Mn,X (ϕ) + Mn,Y (ϕ) 92 Now, we are in a position to state the lemma. Pn Lemma .0.5. Suppose that there exists some constant b > 0 such that n− 1  2 i=1 E Xij ≥ b for all j = 1, 2, · · · , d. Then ρn satisfies the following inequality for all ϕ ≥ 1 : ϕ2 log2 p n p o log p ρn ≲ √ ϕLn ρn + Ln log p + ϕMn (ϕ) + , n ϕ up to a constant K that depends only on b. Proof: See proof of Lemma 5.1 in Chernozhukov et al. (2017) [19]. The following Lemma appears in Chernozhukov et al. (2017) [19]. We quote the lemma here as it been used in the proof of Theorem 2.3.1. Lemma .0.6. 
Let ξ be a non-negative random variable such that P (ξ > x) ≤ Ae−x/B for all x ≥ 0 and for some constants A, B > 0. Then for every t ≥ 0, E ξ 3 I{ξ > t} ≤   6A(t + B)3 e−t/B . Proof: See proof of Lemma C.1 in Chernozhukov et al. (2017) [19]. The following lemma provides a bound of approximation between multiplier bootstrap and Gaussian of normalized sums. Lemma .0.7. Under the setup of Lemma .0.5 we define ∆n,r = max 1 ≤ j, k ≤ d |Σ̂j,k −Σj,k | where Σ̂j,k denotes the (j, k)th element of Σ̂ = (n−1)−1 (Xi − X̄)(Xi − X̄)T and Σj,k denotes the (j, k)th element of Σ. Now define, n n 1 X 1 X ρMn B = sup P( √ ei Xi ≤ y|X) − P( √ Yi ≤ y) . y∈Rd n n i=1 i=1 93 Under these conditions we have, 1/3 2/3 d. ρM n B ≤ C∆ n,r log Proof: See proof of Theorem 4.1 in Chernozhukov et al. (2017) [19]. The following lemmas can be found as Lemma 2.2.2 and Lemma 2.2.1 on page-96 in Van- der-Waart and Wellner (1996) [48]. We quote the lemma here as it has been used in the proof of Theorem 2.3.1. Lemma .0.8. Let ψ be a convex, non-decreasing, non-zero function with ψ(0) = 0 and ψ(x)ψ(y) lim supx,y→∞ ψ(cxy) < ∞ for some constant c. Then, for any random variable X1 , X2 , · · · , Xn , we have ∥ max Xi ∥ψ ≤ Kψ −1 (n) max ∥Xi ∥ψ , 1≤i≤n i for a constant K depending only on ψ. p Lemma .0.9. Let X be any random variable with P (|X| ≥ x) < Ke−Cx for every x, for  1/p constants K and C and for p ≥ 1. Then its Orlicz norm satisfies ∥X∥ψp ≤ 1+K . C Lemma .0.10. Let X ∼ N (0, ν) where ν > 0 is the variance of X. Then,   β2  P |X| ≥ β ≤ 2 exp − . 2ν Now, we shall state a lemma which appears as Lemma E.1 in Chen (2018) [16]. Before stating the lemma we need to define a few quantities. For m = [n/r], we define Z X = m {n(n − 1)}−1 P 1≤i̸=j≤n h(Xi , Xj ) − E(h(X1 , X2 ) ∞ , 94 ir+r ) = h (X hj (Xir+1 j ir+1 , Xir+2 , · · · , Xir+r ), ir+r ) = h (X ir+r )I(max ir+r h̄(Xir+1 j ir+1 1≤j≤d hj (Xir+1 ) ≤ τ ), Pm−1  ir+r ir+r  Z1X = max1≤j≤d i=0 h̄j (Xir+1 ) − Eh̄j (Xir+1 ) , ir+r )|, ϑ2 = max Pm−1 2 ir+r M X = max1≤j≤d max0≤i≤m−1 |hj (Xir+1 1≤j≤d i=0 Ehj (Xir+1 ). Lemma .0.11. Let X1 , X2 , · · · , Xn ∈ Rp and α ∈ (0, 1]. Suppose that hj (X1 , X2 , · · · , Xr ) ψ < ∞ for all j = 1, 2, · · · , d and r < n. α Let τ = 8E[M X ]. Then , for any 0 < η ≤ 1 and δ > 0, there exists a constant C(α, η, δ) > 0 such that we have, ∀t > 0 " α # t2    t P(Z X ≥ (1 + η)EZ1 + t) ≤ exp − + 3 exp − . 2(1 + δ)ϑ2 C(α, η, δ)∥M X ∥ψα Proof. See proof of Lemma E.1 in Chen (2018) [16]. Before stating the next lemma we need to define a few quantities: for any function f from Rp × Rp to Rp × Rp , define 1 1 X = VnY = X X Vm f (Xi , Xj ), f (Yi , Yj ), m(m − 1) n(n − 1) 1≤i̸=j≤m 1≤i̸=j≤n MX = max max fa (Xi , Xj ) , MY = max max fa (Yi , Yj ) , 1≤i̸=j≤m 1≤a≤d 1≤i̸=j≤n 1≤a≤d  1  1 q q DqX = max E|fa (X1 , X2 )| q , DqY = max E|fa (Y1 , Y2 )|q , q > 0. 1≤a≤d 1≤a≤d The following lemma will provide a bound for Rm,n . The claim (.0.3) of this lemma is Theorem 5.1 of Chen (2018) [16] while (.0.4) follows from (.0.3) by applying it to each of the two samples. Lemma .0.12. Let X m = (X1 , · · · , Xm ) and Y n = (Y1 , · · · , Yn ) be two independent random 95 samples from F1 , F2 , respectively. Let f : Rp × Rp 7→ Rd be a measurable function such that f (x, z) = f (z, x), ∀ x, z ∈ Rp and E fa (X1 , X2 ) + E fa (Y1 , Y2 ) < ∞. If 2 ≤ d ≤ exp b(m ∨ n) , for some constant b > 0, then ∃ a constant 0 < K X < ∞ such that  X √  log(d)  3 X log(d) X log(d)  5 X  E Vm ∞ ≤ K X (1 + b) 2 ∥M ∥4 + D2 + 4D 4 . 
The following lemmas can be found as Lemma 2.2.2 and Lemma 2.2.1 on page 96 in van der Vaart and Wellner (1996) [48]. We quote them here as they have been used in the proof of Theorem 2.3.1.

Lemma .0.8. Let $\psi$ be a convex, non-decreasing, non-zero function with $\psi(0) = 0$ and $\limsup_{x,y\to\infty}\psi(x)\psi(y)/\psi(cxy) < \infty$ for some constant $c$. Then, for any random variables $X_1, X_2, \cdots, X_n$, we have
$$\Big\|\max_{1\le i\le n}X_i\Big\|_\psi \le K\,\psi^{-1}(n)\,\max_i\|X_i\|_\psi,$$
for a constant $K$ depending only on $\psi$.

Lemma .0.9. Let $X$ be any random variable with $P(|X| \ge x) \le Ke^{-Cx^p}$ for every $x$, for constants $K$ and $C$ and for $p \ge 1$. Then its Orlicz norm satisfies $\|X\|_{\psi_p} \le \big(\frac{1+K}{C}\big)^{1/p}$.

Lemma .0.10. Let $X \sim N(0,\nu)$, where $\nu > 0$ is the variance of $X$. Then
$$P\big(|X| \ge \beta\big) \le 2\exp\Big(-\frac{\beta^2}{2\nu}\Big).$$

Now we shall state a lemma which appears as Lemma E.1 in Chen (2018) [16]. Before stating the lemma we need to define a few quantities. For $m = [n/r]$, we define
$$Z^X = \Big\|\{n(n-1)\}^{-1}\sum_{1\le i\ne j\le n}h(X_i,X_j) - E\,h(X_1,X_2)\Big\|_\infty,$$
$$h_j(X_{ir+1}^{ir+r}) = h_j(X_{ir+1}, X_{ir+2}, \cdots, X_{ir+r}), \qquad \bar h(X_{ir+1}^{ir+r}) = h(X_{ir+1}^{ir+r})\,I\Big(\max_{1\le j\le d}h_j(X_{ir+1}^{ir+r}) \le \tau\Big),$$
$$Z_1^X = \max_{1\le j\le d}\Big|\sum_{i=0}^{m-1}\big\{\bar h_j(X_{ir+1}^{ir+r}) - E\bar h_j(X_{ir+1}^{ir+r})\big\}\Big|,$$
$$M^X = \max_{1\le j\le d}\max_{0\le i\le m-1}\big|h_j(X_{ir+1}^{ir+r})\big|, \qquad \vartheta^2 = \max_{1\le j\le d}\sum_{i=0}^{m-1}E\,h_j^2(X_{ir+1}^{ir+r}).$$

Lemma .0.11. Let $X_1, X_2, \cdots, X_n \in \mathbb{R}^p$ and $\alpha \in (0,1]$. Suppose that $\big\|h_j(X_1, X_2, \cdots, X_r)\big\|_{\psi_\alpha} < \infty$ for all $j = 1, 2, \cdots, d$ and $r < n$. Let $\tau = 8E[M^X]$. Then, for any $0 < \eta \le 1$ and $\delta > 0$, there exists a constant $C(\alpha, \eta, \delta) > 0$ such that for all $t > 0$,
$$P\big(Z^X \ge (1+\eta)EZ_1^X + t\big) \le \exp\Big(-\frac{t^2}{2(1+\delta)\vartheta^2}\Big) + 3\exp\Big(-\Big(\frac{t}{C(\alpha,\eta,\delta)\|M^X\|_{\psi_\alpha}}\Big)^{\alpha}\Big).$$

Proof. See the proof of Lemma E.1 in Chen (2018) [16].

Before stating the next lemma we need to define a few quantities: for any function $f$ from $\mathbb{R}^p\times\mathbb{R}^p$ to $\mathbb{R}^d$, define
$$V_m^X = \frac{1}{m(m-1)}\sum_{1\le i\ne j\le m}f(X_i,X_j), \qquad V_n^Y = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}f(Y_i,Y_j),$$
$$M_X = \max_{1\le i\ne j\le m}\max_{1\le a\le d}\big|f_a(X_i,X_j)\big|, \qquad M_Y = \max_{1\le i\ne j\le n}\max_{1\le a\le d}\big|f_a(Y_i,Y_j)\big|,$$
$$D_q^X = \max_{1\le a\le d}\big(E|f_a(X_1,X_2)|^q\big)^{1/q}, \qquad D_q^Y = \max_{1\le a\le d}\big(E|f_a(Y_1,Y_2)|^q\big)^{1/q}, \qquad q > 0.$$
The following lemma will provide a bound for $R_{m,n}$. The claim (.0.3) of this lemma is Theorem 5.1 of Chen (2018) [16], while (.0.4) follows from (.0.3) by applying it to each of the two samples.

Lemma .0.12. Let $X^m = (X_1, \cdots, X_m)$ and $Y^n = (Y_1, \cdots, Y_n)$ be two independent random samples from $F_1$, $F_2$, respectively. Let $f: \mathbb{R}^p\times\mathbb{R}^p \mapsto \mathbb{R}^d$ be a measurable function such that $f(x,z) = f(z,x)$ for all $x, z \in \mathbb{R}^p$ and $E|f_a(X_1,X_2)| + E|f_a(Y_1,Y_2)| < \infty$. If $2 \le d \le \exp\big(b(m\vee n)\big)$ for some constant $b > 0$, then there exists a constant $0 < K^X < \infty$ such that
$$E\big\|V_m^X\big\|_\infty \le K^X(1+b)\Big\{\frac{\log^{3/2}(d)}{m}\,\|M_X\|_4 + \Big(\frac{\log(d)}{m}\Big)^{1/2}D_2^X + \frac{\log^{5/4}(d)}{m^{3/4}}\,D_4^X\Big\}. \qquad (.0.3)$$
Consequently, with $K = \max\{K^X, K^Y\} > 0$, we obtain
$$E\big\|V_m^X - \delta_{m,n}V_n^Y\big\|_\infty \le K(1+b)\Big[\Big\{\frac{\log^{3/2}(d)}{m}\,\|M_X\|_4 + \Big(\frac{\log(d)}{m}\Big)^{1/2}D_2^X + \frac{\log^{5/4}(d)}{m^{3/4}}\,D_4^X\Big\}$$
$$\qquad\qquad + \delta_{m,n}\Big\{\frac{\log^{3/2}(d)}{n}\,\|M_Y\|_4 + \Big(\frac{\log(d)}{n}\Big)^{1/2}D_2^Y + \frac{\log^{5/4}(d)}{n^{3/4}}\,D_4^Y\Big\}\Big]. \qquad (.0.4)$$
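To fix ideas about the quantities in Lemma .0.12, the following sketch computes $V_m^X$ and $M_X$ for the covariance-type kernel $f(x,z) = \mathrm{vec}\big((x-z)(x-z)^T/2\big)$, whose expectation is $\mathrm{vec}(\Sigma)$. This symmetric kernel is one natural choice when comparing covariance matrices; the kernel, the dimensions, and all names here are illustrative, not necessarily those used in the main text.

```python
import numpy as np
from itertools import combinations

def cov_kernel(x, z):
    """Symmetric kernel f(x, z) = vec((x - z)(x - z)^T / 2); E f = vec(Sigma)."""
    u = x - z
    return np.outer(u, u).ravel() / 2.0

def u_stat_quantities(X):
    """Compute V_m (the U-statistic average of f over pairs) and
    M_X = max_{i != j} max_a |f_a(X_i, X_j)| from Lemma .0.12's notation."""
    m = X.shape[0]
    # f is symmetric, so averaging over unordered pairs equals the average
    # over all ordered pairs (i, j), i != j, in the definition of V_m
    vals = np.array([cov_kernel(X[i], X[j]) for i, j in combinations(range(m), 2)])
    V_m = vals.mean(axis=0)
    M_X = np.abs(vals).max()
    return V_m, M_X

rng = np.random.default_rng(3)
m, p = 60, 5
X = rng.standard_normal((m, p))
V_m, M_X = u_stat_quantities(X)
# Here Sigma = I_p, so the diagonal of the reshaped V_m should be near 1
print("diagonal of reshaped V_m:", np.round(np.diag(V_m.reshape(p, p)), 2))
print("M_X:", round(M_X, 2))
```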
To proceed further we need notation. For $q > 0$ and any sequences $\phi_m, \phi_n \ge 1$, define
$$D_{g,q}^X = \max_{1\le a\le d}E|g_a(X-\mu^X)|^q, \qquad D_{g,q}^Y = \max_{1\le a\le d}E|g_a(Y-\mu^Y)|^q,$$
$$M_{g,q}^X(\phi_m) = E\Big[\max_{1\le a\le d}\big|g_a(X-\mu^X)\big|^q\, I\Big(\max_{1\le a\le d}\big|g_a(X-\mu^X)\big| > \frac{\sqrt{n}}{4\phi_m\log d}\Big)\Big],$$
$$M_{g,q}^Y(\phi_n) = E\Big[\max_{1\le a\le d}\big|g_a(Y-\mu^Y)\big|^q\, I\Big(\max_{1\le a\le d}\big|g_a(Y-\mu^Y)\big| > \frac{\sqrt{n}}{4\phi_n\log d}\Big)\Big],$$
$$M_q^{G_1}(\phi_m) = E\Big[\max_{1\le a\le d}\big|T_{ma}^{G_1}\big|^q\, I\Big(\max_{1\le a\le d}\big|T_{ma}^{G_1}\big| > \frac{\sqrt{n}}{4\phi_m\log d}\Big)\Big],$$
$$M_q^{G_2}(\phi_n) = E\Big[\max_{1\le a\le d}\big|T_{na}^{G_2}\big|^q\, I\Big(\max_{1\le a\le d}\big|T_{na}^{G_2}\big| > \frac{\sqrt{n}}{4\phi_n\log d}\Big)\Big],$$
$$M_q^X(\phi_m) = M_{g,q}^X(\phi_m) + M_q^{G_1}(\phi_m), \qquad M_q^Y(\phi_n) = M_{g,q}^Y(\phi_n) + M_q^{G_2}(\phi_n).$$
Also denote, for $\tau > 0$,
$$M_{h,q}^X(\tau) = E\Big[\max_{1\le i\ne j\le m}\max_{1\le a\le d}\big|\{\mathrm{vec}(h(X_i,X_j))\}_a\big|^q\, I\Big(\max_{1\le a\le d}\big|\{\mathrm{vec}(h(X_i,X_j))\}_a\big| > \tau\Big)\Big],$$
$$M_{h,q}^Y(\tau) = E\Big[\max_{1\le i\ne j\le n}\max_{1\le a\le d}\big|\{\mathrm{vec}(h(Y_i,Y_j))\}_a\big|^q\, I\Big(\max_{1\le a\le d}\big|\{\mathrm{vec}(h(Y_i,Y_j))\}_a\big| > \tau\Big)\Big].$$
We are ready to state the following lemma.

Lemma .0.13. Suppose that condition (a) holds and $\log(d) \le \bar b(m\vee n)$ for some constant $\bar b > 0$. Then there are constants $C_i := C_i(b,\bar b) > 0$, $i = 1, 2$, such that for any real sequences $\bar D_{g,3}^X$ and $\bar D_{g,3}^Y$ satisfying $D_{g,3}^X \le \bar D_{g,3}^X$ and $D_{g,3}^Y \le \bar D_{g,3}^Y$, and for all $\tau > 0$, we obtain
$$\rho^{**}_{m,n} \le C_3\Bigg[\Big(\frac{(\bar D_{g,3}^X)^2\log^7 d}{m}\Big)^{1/6} + \frac{M_3^X(\phi_m)}{\bar D_{g,3}^X} + \Big(\frac{(\bar D_{g,3}^Y)^2|\delta_{m,n}|^6\log^7 d}{n}\Big)^{1/6} + \frac{M_3^Y(\phi_n)}{\bar D_{g,3}^Y}$$
$$\qquad + \phi^*\Big\{\frac{\log^{3/2}d}{m}\big(M_{h,4}^X(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{m^{1/2}}(D_2^X)^{1/2} + \frac{\log^{5/4}d}{m^{3/4}}(D_4^X)^{1/4}$$
$$\qquad\qquad + \frac{\log^{3/2}d}{n}\big(M_{h,4}^Y(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{n^{1/2}}(D_2^Y)^{1/2} + \frac{\log^{5/4}d}{n^{3/4}}(D_4^Y)^{1/4}\Big\}\Bigg],$$
where $C_3 = \max\{C_1, C_2\}$, $\phi^* := \max\{\phi_m, \phi_n\}$, with
$$\phi_m = C_1\Big(\frac{(\bar D_{g,3}^X)^2\log^4 d}{m}\Big)^{-1/6}, \qquad \phi_n = C_2\Big(\frac{(\bar D_{g,3}^Y)^2\log^4 d}{n}\Big)^{-1/6}. \qquad (.0.5)$$

Proof. This lemma is analogous to Proposition 5.3 of Chen (2018) [16]. We provide details to clearly address the additional changes needed in the proof of Proposition 5.3 to prove the stated lemma. Fix $y \in \mathbb{R}^d$ and define
$$F_\beta(w) = \frac{1}{\beta}\log\Big(\sum_{j=1}^d \exp\big(\beta(w_j - y_j)\big)\Big), \qquad \beta \in \mathbb{R},\ w \in \mathbb{R}^d.$$
We shall often use this function with $\beta = \phi\log(d)$, where $\phi \ge 1$. In this case,
$$0 \le F_\beta(w) - \max_{1\le j\le d}(w_j - y_j) \le \frac{\log(d)}{\beta} = \phi^{-1}, \qquad \forall\, w \in \mathbb{R}^d,\ \phi \ge 1.$$
Next, let $u_0: \mathbb{R} \to [0,1]$ be a function such that $u_0(t) = 1$ if $t < 0$, $u_0(t) = 0$ if $t > 1$, and $u_0(t)$, $t \in [0,1]$, is five times continuously differentiable with bounded derivatives. Let
$$u(t) := u_0(\phi t), \qquad \Psi(w) := u\big(F_\beta(w)\big), \qquad t \in \mathbb{R},\ \phi \ge 1,\ w \in \mathbb{R}^d.$$
Note that $\Psi: \mathbb{R}^d \to [0,1]$. For later use, we note that when $\beta = \phi\log(d)$,
$$I(t \le 0) \le u(t) \le I(t \le \phi^{-1}), \qquad t \in \mathbb{R}.$$
Let $G_{1i}, H_{1i}$, $1 \le i \le m$, be i.i.d. $N_d(0, \Gamma^X)$ and $G_{2j}, H_{2j}$, $1 \le j \le n$, be i.i.d. $N_d(0, \Gamma^Y)$, independent of $G_{1i}, H_{1i}$, $1 \le i \le m$, where $\Gamma^X := \mathrm{Cov}(g(X-\mu^X))$, $\Gamma^Y := \mathrm{Cov}(g(Y-\mu^Y))$. Let
$$Z_i^*(t) := \frac{1}{\sqrt{m}}\Big[\sqrt{t}\,\big\{\sqrt{v}\,g(X_i) + \sqrt{1-v}\,G_{1i}\big\} + \sqrt{1-t}\,H_{1i}\Big], \qquad 1\le i\le m,$$
$$Z_j^{**}(t) := \frac{\delta_{m,n}}{\sqrt{n}}\Big[\sqrt{t}\,\big\{\sqrt{v}\,g(Y_j-\mu^Y) + \sqrt{1-v}\,G_{2j}\big\} + \sqrt{1-t}\,H_{2j}\Big], \qquad 1\le j\le n,$$
$$Z^*(t) := \sum_{i=1}^m Z_i^*(t), \qquad Z^{**}(t) := \sum_{j=1}^n Z_j^{**}(t), \qquad Z(t) = Z^*(t) + Z^{**}(t), \qquad v, t \in [0,1].$$
Let
$$I_{m,n} := \Psi\Big(\sqrt{v}\,\frac{1}{\sqrt{m}}\sum_{i=1}^m g(X_i) + \sqrt{1-v}\,\frac{1}{\sqrt{m}}\sum_{i=1}^m G_{1i} + \sqrt{v}\,\delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n g(Y_j) + \sqrt{1-v}\,\delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n G_{2j}\Big)$$
$$\qquad - \Psi\Big(\frac{1}{\sqrt{m}}\sum_{i=1}^m H_{1i} + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n H_{2j}\Big) = \Psi\big(Z(1)\big) - \Psi\big(Z(0)\big).$$
Recall (.0.5). By Xue and Yao (2020) [54], we obtain
$$E[I_{m,n}(v)] \lesssim C_1(b)\,\frac{\phi_m^2\log^2 d}{\sqrt{m}}\Big\{\phi_m D_{g,3}^X\rho^1_{m,n} + D_{g,3}^X\sqrt{\log(d)} + \phi_m M_3^X(\phi_m)\Big\} + C_1(b)\,\frac{\phi_n^2\log^2 d}{\sqrt{n}}\,|\delta_{m,n}|^3\Big\{\phi_n D_{g,3}^Y\rho^1_{m,n} + D_{g,3}^Y\sqrt{\log(d)} + \phi_n M_3^Y(\phi_n)\Big\}.$$
To proceed further, define
$$\rho^1_{m,n} := \sup_{v\in[0,1]}\sup_{y\in\mathbb{R}^d}\Big|P\Big(\sqrt{v}\Big\{\frac{1}{\sqrt{m}}\sum_{i=1}^m g(X_i) + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n g(Y_j)\Big\} + \sqrt{1-v}\Big\{\frac{1}{\sqrt{m}}\sum_{i=1}^m G_{1i} + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n G_{2j}\Big\} \le y\Big)$$
$$\qquad\qquad - P\Big(\frac{1}{\sqrt{m}}\sum_{i=1}^m G_{1i} + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n G_{2j} \le y\Big)\Big|.$$
Note that
$$\rho^1_{m,n} = \sup_{v\in[0,1]}\sup_{y\in\mathbb{R}^d}\Big|P\big(Z(1)\le y\big) - P\big(Z(0)\le y\big)\Big|.$$
By the Mean Value Theorem,
$$\Psi(W_{m,n}) - \Psi(L_{m,n}) = \sum_{a=1}^d \partial_a\Psi(\xi)\,R_{m,n,a} = \sum_{a=1}^d u'\big(F_\beta(\xi)\big)\,\eta_a(\xi)\,R_{m,n,a},$$
where $\eta_a(w) = \partial F_\beta(w)/\partial w_a$ is the first-order partial derivative of $F_\beta(w)$ with respect to $w_a$, $\eta := (\eta_1, \cdots, \eta_d)^T$ is a $d\times 1$ random vector, and $\xi$ lies on the line segment joining $L_{m,n}$ and $W_{m,n}$. Following the arguments in Xue and Yao (2020) [54], we can verify that $\eta_a(w) \ge 0$ and $\sum_{a=1}^d\eta_a(w) = 1$ for any $w \in \mathbb{R}^d$, and that there is a constant $K_1(\phi^*)$ such that $\sup_{t\in\mathbb{R}}|u'(t)| \le K_1(\phi^*)$, where $\phi^* = \max\{\phi_m, \phi_n\}$. Therefore,
$$E\big[\Psi(W_{m,n}) - \Psi(L_{m,n})\big] \le K_1\phi^*\,E\|R_{m,n}\|_\infty.$$
Proceeding as in Xue and Yao (2020) [54] (Eqn (99)) with $\phi = \min\{\phi_m, \phi_n\}$, we conclude that
$$P\big(Z(1) \le y - \phi^{-1}\big) \le P\big(Z(0) \le y - \phi^{-1}\big) + C_2(b)\phi^{-1}\sqrt{\log(d)} + |E[I_{m,n}]| + K_1\phi^*E\|R_{m,n}\|_\infty,$$
$$P\big(Z(0) \le y + \phi^{-1}\big) \ge P\big(Z(1) \le y + \phi^{-1}\big) - \Big[C_2(b)\phi^{-1}\sqrt{\log(d)} + |E[I_{m,n}]| + K_1\phi^*E\|R_{m,n}\|_\infty\Big].$$
Combining these bounds with the previous equations, we conclude that
$$\rho^1_{m,n} \le K_1\phi^*E\|R_{m,n}\|_\infty + C_2(b)\phi^{-1}\log^{1/2}d + C_1(b)\,\frac{\phi_m^2\log^2 d}{\sqrt{m}}\Big\{\phi_m D_{g,3}^X\rho^1_{m,n} + D_{g,3}^X\sqrt{\log(d)} + \phi_m M_3^X(\phi_m)\Big\}$$
$$\qquad + C_1(b)\,\frac{|\delta_{m,n}|^3\phi_n^2\log^2 d}{\sqrt{n}}\Big\{\phi_n D_{g,3}^Y\rho^1_{m,n} + D_{g,3}^Y\sqrt{\log(d)} + \phi_n M_3^Y(\phi_n)\Big\}.$$
By similar arguments as used in Lemma 4 of Xue and Yao (2020) [54], and choosing $\phi_m, \phi_n \ge 1$ as in (.0.5), we conclude that for any real sequences with $(\bar D_{g,3}^X)^2 \ge D_{g,3}^X$ and $(\bar D_{g,3}^Y)^2 \ge D_{g,3}^Y$, $\rho^1_{m,n}$ is bounded from above by $C_3(b)$ multiplied by
$$\phi^*E\|R_{m,n}\|_\infty + \Big(\frac{(\bar D_{g,3}^X)^2\log^7 d}{m}\Big)^{1/6} + \frac{M_3^X(\phi_m)}{\bar D_{g,3}^X} + \Big(\frac{(\bar D_{g,3}^Y)^2|\delta_{m,n}|^6\log^7 d}{n}\Big)^{1/6} + \frac{M_3^Y(\phi_n)}{\bar D_{g,3}^Y}.$$
By similar arguments as used in Lemma A.1 of Chen (2018) [16] and Jensen's inequality, there exist universal positive constants $K_2, K_3$ such that the following inequalities hold:
$$E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le m}f_a^4(X_i,X_j)\Big] \le K_2\,E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le m}\big|\{\mathrm{vec}(h(X_i,X_j))\}_a\big|^4\Big],$$
$$E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le n}f_a^4(Y_i,Y_j)\Big] \le K_3\,E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le n}\big|\{\mathrm{vec}(h(Y_i,Y_j))\}_a\big|^4\Big].$$
By Lemma .0.12, we obtain that
$$E\|R_{m,n}\|_\infty \le K_3(\bar b^{1/2}+1)\Big[\frac{\log^{3/2}d}{m}\big(M_{h,4}^X(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{m^{1/2}}(D_2^X)^{1/2} + \frac{\log^{5/4}d}{m^{3/4}}(D_4^X)^{1/4}$$
$$\qquad + |\delta_{m,n}|\Big\{\frac{\log^{3/2}d}{n}\big(M_{h,4}^Y(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{n^{1/2}}(D_2^Y)^{1/2} + \frac{\log^{5/4}d}{n^{3/4}}(D_4^Y)^{1/4}\Big\}\Big].$$
Finally, by using Lemma 3 and Lemma 4 of Xue and Yao (2020) [54], we conclude the proof of this lemma, since
$$\rho^{**}_{m,n} \le C\Bigg[\Big(\frac{(\bar D_{g,3}^X)^2\log^7 d}{m}\Big)^{1/6} + \frac{M_3^X(\phi_m)}{\bar D_{g,3}^X} + \Big(\frac{(\bar D_{g,3}^Y)^2|\delta_{m,n}|^6\log^7 d}{n}\Big)^{1/6} + \frac{M_3^Y(\phi_n)}{\bar D_{g,3}^Y}$$
$$\qquad + \phi^*\Big\{\frac{\log^{3/2}d}{m}\big(M_{h,4}^X(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{m^{1/2}}(D_2^X)^{1/2} + \frac{\log^{5/4}d}{m^{3/4}}(D_4^X)^{1/4}$$
$$\qquad\qquad + \frac{\log^{3/2}d}{n}\big(M_{h,4}^Y(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{n^{1/2}}(D_2^Y)^{1/2} + \frac{\log^{5/4}d}{n^{3/4}}(D_4^Y)^{1/4}\Big\}\Bigg].$$
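The proof above rests on the smooth maximum $F_\beta$, which dominates $\max_{1\le j\le d}(w_j - y_j)$ by at most $\log(d)/\beta = \phi^{-1}$. The following minimal numerical sanity check of this two-sided bound uses arbitrary data and the choice $\beta = \phi\log d$ from the proof; it is an illustration only.

```python
import numpy as np

def F_beta(w, y, beta):
    """Smooth maximum F_beta(w) = beta^{-1} log(sum_j exp(beta (w_j - y_j))).
    np.logaddexp.reduce keeps the computation stable for large beta."""
    return np.logaddexp.reduce(beta * (w - y)) / beta

rng = np.random.default_rng(2)
d = 500
w, y = rng.standard_normal(d), rng.standard_normal(d)
for phi in [1, 5, 25]:
    beta = phi * np.log(d)  # the choice beta = phi * log(d) used in the proof
    gap = F_beta(w, y, beta) - np.max(w - y)
    # Property checked: 0 <= F_beta(w) - max_j (w_j - y_j) <= log(d)/beta = 1/phi
    print(f"phi={phi:3d}: gap={gap:.5f}, bound 1/phi={1/phi:.5f}")
```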
BIBLIOGRAPHY

[1] Radoslaw Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability, 13:1000–1034, 2008.

[2] Radosław Adamczak. A few remarks on the operator norm of random Toeplitz matrices. Journal of Theoretical Probability, 23(1):85–108, 2010.

[3] T. W. Anderson. An introduction to multivariate statistical analysis. John Wiley & Sons, 2003.

[4] Zhidong Bai and Hewa Saranadasa. Effect of high dimension: by an example of a two sample problem. Statistica Sinica, pages 311–329, 1996.

[5] Zhidong Bai and Jian-feng Yao. Central limit theorems for eigenvalues in a spiked population model. Annales de l'IHP Probabilités et Statistiques, 44:447–474, 2008.

[6] Jinho Baik and Jack W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408, 2006.

[7] Maurice Stevenson Bartlett. Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A: Mathematical and Physical Sciences, 160(901):268–282, 1937.

[8] Peter Bühlmann and Sara van de Geer. Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media, 2011.

[9] T. Tony Cai, Xiao Han, and Guangming Pan. Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices. The Annals of Statistics, 48(3):1255–1280, 2020.

[10] T. Tony Cai and Yin Xia. High-dimensional sparse MANOVA. Journal of Multivariate Analysis, 131:174–196, 2014.

[11] Tony Cai, Weidong Liu, and Yin Xia. Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501):265–277, 2013.

[12] Mireille Capitaine and Muriel Casalis. Asymptotic freeness by generalized moments for Gaussian and Wishart matrices: application to beta random matrices. Indiana University Mathematics Journal, pages 397–431, 2004.

[13] Nilanjan Chakraborty and Lyudmila Sakhanenko. Novel multiplier bootstrap tests for high-dimensional data with applications to MANOVA. Manuscript, 2022.

[14] Jinyuan Chang, Wen Zhou, Wen-Xin Zhou, and Lan Wang. Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Biometrics, 73(1):31–41, 2017.

[15] Song Xi Chen, Jun Li, and Ping-Shou Zhong. Two-sample and ANOVA tests for high dimensional means. The Annals of Statistics, 47(3):1443–1474, 2019.

[16] Xiaohui Chen. Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. The Annals of Statistics, 46(2):642–678, 2018.

[17] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819, 2013.

[18] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162(1):47–70, 2015.

[19] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Central limit theorems and bootstrap in high dimensions. The Annals of Probability, 45(4):2309–2352, 2017.

[20] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Detailed proof of Nazarov's inequality. arXiv preprint arXiv:1711.10696, 2017.

[21] Arthur P. Dempster. A high dimensional two sample significance test. The Annals of Mathematical Statistics, pages 995–1010, 1958.

[22] Arthur P. Dempster. A significance test for the separation of two highly multivariate small samples. Biometrics, 16(1):41–50, 1960.

[23] Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, 2012.

[24] Jianqing Fan, Yuan Liao, and Jiawei Yao. Power enhancement in high-dimensional cross-sectional tests. Econometrica, 83(4):1497–1541, 2015.

[25] Ronald A. Fisher. XV.—The correlation between relatives on the supposition of Mendelian inheritance. Earth and Environmental Science Transactions of the Royal Society of Edinburgh, 52(2):399–433, 1919.
[26] Yasunori Fujikoshi, Tetsuto Himeno, and Hirofumi Wakaki. Asymptotic results of a high dimensional MANOVA test and power comparison when the dimension is large compared to the sample size. Journal of the Japan Statistical Society, 34(1):19–26, 2004.

[27] Christophe Giraud. Introduction to high-dimensional statistics. Chapman and Hall/CRC, 2021.

[28] Kurt Johansson. Shape fluctuations and random matrices. Communications in Mathematical Physics, 209(2):437–476, 2000.

[29] Iain M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29(2):295–327, 2001.

[30] Iain M. Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486):682–693, 2009.

[31] D. N. Lawley. A generalization of Fisher's z test. Biometrika, 30(1/2):180–187, 1938.

[32] Ji Oon Lee and Kevin Schnelli. Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. The Annals of Applied Probability, 26(6):3786–3839, 2016.

[33] Jun Li and Song Xi Chen. Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940, 2012.

[34] Zhenhua Lin, Miles E. Lopes, and Hans-Georg Müller. High-dimensional MANOVA via bootstrapping and its application to functional and sparse count data. Journal of the American Statistical Association, pages 1–15, 2021.

[35] Vladimir Alexandrovich Marchenko and Leonid Andreevich Pastur. Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 114(4):507–536, 1967.

[36] Robb J. Muirhead. Aspects of multivariate statistical theory. John Wiley & Sons, 2009.

[37] D. N. Nanda. Probability distribution tables of the largest root of a determinantal equation with two roots. Journal of the Indian Society of Agricultural Statistics, 3:175–177, 1951.

[38] Fedor Nazarov. On the maximal perimeter of a convex set in $\mathbb{R}^n$ with respect to a Gaussian measure. In Geometric Aspects of Functional Analysis, pages 169–187. Springer, 2003.

[39] Debashis Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, pages 1617–1642, 2007.

[40] K. C. Sreedharan Pillai. Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics, pages 117–121, 1955.

[41] Samarendra Nath Roy and J. Roy. A note on a class of problems in "normal" multivariate analysis of variance. United States Air Force, Office of Scientific Research, 1957.

[42] James R. Schott. Some high-dimensional tests for a one-way MANOVA. Journal of Multivariate Analysis, 98(9):1825–1839, 2007.

[43] James R. Schott. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis, 51(12):6535–6542, 2007.

[44] Kerby Shedden and Jeremy Taylor. Lung adenocarcinomas. Methods of Microarray Data Analysis IV, page 121, 2004.

[45] Muni S. Srivastava. Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society, 37(1):53–86, 2007.

[46] Muni S. Srivastava and Hirokazu Yanagihara. Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101(6):1319–1329, 2010.

[47] Muni Shanker Srivastava and C. G. Khatri. An introduction to multivariate statistics. North-Holland/New York, 1979.

[48] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer, 1996.
[49] Martin J. Wainwright. High-dimensional statistics: a non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.

[50] Lili Wang and Debashis Paul. Limiting spectral distribution of renormalized separable sample covariance matrices when p/n → 0. Journal of Multivariate Analysis, 126:25–52, 2014.

[51] Weichen Wang and Jianqing Fan. Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics, 45(3):1342, 2017.

[52] Hiroki Watanabe, Masashi Hyodo, and Shigekazu Nakagawa. Two-way MANOVA with unequal cell sizes and unequal cell covariance matrices in high-dimensional settings. Journal of Multivariate Analysis, 179:104625, 2020.

[53] Samuel S. Wilks. Certain generalizations in the analysis of variance. Biometrika, pages 471–494, 1932.

[54] Kaijie Xue and Fang Yao. Distribution and correlation-free two-sample test of high-dimensional means. The Annals of Statistics, 48(3):1304–1328, 2020.

[55] Takayuki Yamada and Muni S. Srivastava. A test for multivariate analysis of variance in high dimension. Communications in Statistics - Theory and Methods, 41(13-14):2602–2615, 2012.

[56] Jin-Ting Zhang, Jia Guo, and Bu Zhou. Linear hypothesis testing in high-dimensional one-way MANOVA. Journal of Multivariate Analysis, 155:200–216, 2017.

[57] Mingjuan Zhang, Cheng Zhou, Yong He, and Bin Liu. Data-adaptive test for high-dimensional multivariate analysis of variance problem. Australian & New Zealand Journal of Statistics, 60(4):447–470, 2018.