BOOTSTRAP BASED HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL DATA

By Nilanjan Chakraborty

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics — Doctor of Philosophy 2022

ABSTRACT

BOOTSTRAP BASED HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL DATA

By Nilanjan Chakraborty

Over the last two decades, inference problems for high-dimensional data arising in finance, genetics and information technology have gained huge momentum. In this work, the main focus is on developing bootstrap testing procedures in the high-dimensional setup for the following two hypothesis testing problems: i) high-dimensional multivariate analysis of variance (MANOVA); ii) testing the equality of two covariance matrices in the two-sample setup. The statistics considered for testing are infinity-norm based statistics over either weighted sums or differences across the various samples. We provide Gaussian approximation results for normalized sums of high-dimensional random vectors and U-statistics under some weak conditions on the moments and tails of their marginal distributions. The obtained results are free from assumptions of sparsity and correlation structure among the components of the random vectors. For the implementation of these tests, we develop multiplier bootstrap and jackknifed multiplier bootstrap procedures. These newly developed bootstrap techniques ensure first-order accuracy of the asymptotic level and power of the formulated tests, enhancing their applicability. We also establish consistency of the proposed tests against both fixed and local alternatives.

This thesis is dedicated to my family and to all my teachers for their endless love, support and encouragement.

ACKNOWLEDGMENTS

I am indeed grateful to many people who helped me accomplish this doctoral degree in Statistics. First of all, I would like to thank my parents, Mr. Nandadulal Chakraborty and Mrs. Krishna Chakraborty. From educating me in the earliest stages of life to constantly supporting me, even to this day so far away from home, my parents have indeed been my biggest support system. Without them, this day would not have been possible. Even though my late grandmother, Smt. Pratima Chowdhury, is not with us, her love and care always provided me with a safety net to aspire for a better future. I could not help but remember her today. Also, a very special thanks to my sister, Miss Manjira Chakraborty, for not allowing me to give up when the chips were down.

I consider myself fortunate to have been advised by Prof. Hira Lal Koul and Prof. Lyudmila Sakhanenko during my graduate study at Michigan State University. Prof. Koul's endless support and encouragement have always been a source of inspiration for me. I must mention that throughout the difficult times of the pandemic, whenever I needed help, he would have in-person meetings with me. Prof. Sakhanenko was instrumental in shaping this thesis into its present form. Her mathematical acumen and her way of thinking were quite enticing, which helped me a lot to think independently and grow as a scientific researcher. I would also like to thank Dr. Tapabrata Maiti (Taps) and Dr. Yuehua Cui for serving on my committee and for all their support and discussions. I am grateful to Dr. Maiti for making me look into the application side of a problem in order to make it more appealing to practitioners. I would also like to thank Prof. Koul's family and Dr. Maiti's family for their amazing hospitality.
Apart from my committee, I am thankful to Prof. Yimin Xiao, Prof. Haolei Wang, Prof. R.V. Ramamoorthi and Prof. P.K. Pathak for their helpful discussions and encouragement, and to Prof. Camille Fairbourn, Dr. Harish Sankaranarayanan and Prof. Elijah Dikong for mentoring me as a teaching assistant at the department. A big thank you to all the staff members, Andy Hufford, Tami Hankey, Megan Spaulding and Teresa Vollmer, who ensured a nice and smooth working environment at the department. I would also like to thank Prof. Ashoke Kumar Sinha and his wife Kalyani Ghosh for their unconditional support.

I am indeed grateful to my professors at the Department of Statistics, University of Calcutta, the late Prof. Uttam Bandyopadhyay and Prof. Gaurangadeb Chattopadhyay, whose immense love, support and encouragement motivated me to pursue the doctoral program at Michigan State University. The training I received from them and other faculty members, on statistical inference and other topics, paved the way for my dissertation in this direction. I am indeed grateful to Prof. Monika Bhattacharjee (Monika di) for being my philosopher and guide. The discussions I had with her helped me enhance my mathematical rigor and improve my chain of thought.

My deepest gratitude to Abhijnan Chattopadhyay, Alex Pijyan, Satabdi Saha, Cheuk Yin Lee, Chitrak Banerjee, Runze Su, Yeusong Wu, Pratim Guha Niyogi, Ashish Banik, Rejaul Karim, Raka Mandal, Tathagata Dutta, Hema Kollipara, Sumegha Premchandar, Nian Liu, Kaixu Yang, Metin Eroglu and others for all the great memories of the last five years, which I will remember for the rest of my life. And last but not least, I would like to express sincere gratitude to my friends Ambarish Chattopadhyay, Aniket Biswas, Anish Ganguli, Aditi Sen, Sneha Chakraborty, Sagnik Mukherjee, Anirban Ghosh, Srijan Bhattacharya, Sagnik Halder, Dhrubojyoti Ghosh, Ronita Bose, Hiya Banerjee and Joydeep Basu, for their support and encouragement in many moments of crisis. I thank you all from the bottom of my heart.

TABLE OF CONTENTS

Chapter 1 Introduction . . . . . . . . . . 1
1.1 High dimensional MANOVA . . . . . . . . . . 2
1.2 Testing equality of Two Population Covariance Matrices . . . . . . . . . . 3
1.3 Notation . . . . . . . . . . 6
Chapter 2 Multiplier bootstrap tests for high-dimensional data with applications to MANOVA . . . . . . . . . . 7
2.1 Introduction . . . . . . . . . . 7
2.2 Overview . . . . . . . . . . 8
2.3 Gaussian approximation results over the class of convex sets C_{L,s} . . . . . . . . . . 9
2.4 Multiplier bootstrap results over the class of convex sets C_{L,s} . . . . . . . . . . 22
2.5 Motivating application to high-dimensional testing problems . . . . . . . . . . 27
2.5.1 MANOVA (balanced case) . . . . . . . . . . 28
2.5.2 Unbalanced MANOVA . . . . . . . . . . 34
2.5.3 Two-Way MANOVA with unequal observations . . . . . . . . . . 36
2.5.4 Linear hypothesis testing . . . . . . . . . . 38
2.6 Connection with other tests . . . . . . . . . . 40
2.7 Discussion . . . . . . . . . . 42
Chapter 3 Bootstrap based testing for equality of covariance matrices . . . . . . . . . . 45
3.1 Introduction . . . . . . . . . . 45
3.2 Overview . . . . . . . . . . 46
3.3 Gaussian approximation result for U-statistics . . . . . . . . . . 47
3.4 Jackknifed Multiplier Bootstrap Approximation for U-statistics . . . . . . . . . . 56
3.5 Two sample test for covariance matrices . . . . . . . . . . 75
3.6 Analysis of Power . . . . . . . . . . 81
APPENDIX . . . . . . . . . . 90
BIBLIOGRAPHY . . . . . . . . . . 103

Chapter 1

Introduction

Over the last few decades, modern data collection techniques have enabled scientists to gather data sets with a huge number of variables. Such data sets are typically referred to as high-dimensional data, and they arise frequently in biology, genomics, artificial intelligence and the financial sector. A unique feature of high-dimensional data is that even for relatively small sample sizes, the observed number of variables is very large. For example, in a typical gene-expression data set, one has thousands of gene expressions for a few hundred human subjects. More insights about these data sets can be found in Efron (2012) [23], Giraud (2021) [27], Shedden and Taylor (2004) [44], Buhlmann and Van de Geer (2011) [8], Wainwright (2019) [49] and others.

From the discussion in the above paragraph, it is clear that analyzing these sorts of data sets requires multivariate methods of estimation and inference. Several traditional multivariate methods of statistical inference can be found in Anderson (2003) [3], Muirhead (1982) [36], Khatri and Srivastava (1979) [47], Roy (1957) [41], among many others. Although there is a vast literature available for the analysis of multivariate data sets, these methods are of little use when the number of variables becomes comparable to, or larger than, the sample size. Traditional multivariate results are not directly applicable when the dimension of these data sets cannot be treated as a fixed constant. For the past three decades, statisticians have been developing new methods to tackle both estimation and inference problems arising in high-dimensional data sets. In this thesis we consider inferential problems related to high-dimensional data sets and a few possible solutions.

1.1 High dimensional MANOVA

One high-dimensional problem is to test the equality of means among K groups of populations. When the sample observations are univariate, this problem is known as classical analysis of variance, or ANOVA. The first test statistic for ANOVA was proposed by Ronald Fisher in 1918 [25]. Under the traditional multivariate setup, where the dimension of the mean vectors is held fixed relative to the sample size, Wilks (1932) [53] provided a statistic for testing the equality of several population means. Since then, several other test statistics have been proposed by Lawley (1938) [31], Nanda (1951) [37], Pillai (1955) [40], Roy (1957) [41] and others. These tests rely heavily on likelihood ratios built from the between-sample and within-sample covariance matrices.
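To make this reliance concrete, here is a minimal numerical sketch (our illustration, not from the dissertation) of Wilks' lambda, the prototypical likelihood-ratio MANOVA statistic built from the within-group and total sum-of-squares matrices; the group sizes and dimensions are hypothetical choices, and the second case previews the breakdown discussed next.

```python
import numpy as np

def wilks_lambda(samples):
    """samples: list of (n_k x p) arrays, one per group.
    Wilks' lambda = det(W) / det(W + B), with W the within-group and
    B the between-group sum-of-squares matrices."""
    grand_mean = np.vstack(samples).mean(axis=0)
    W = sum((X - X.mean(axis=0)).T @ (X - X.mean(axis=0)) for X in samples)
    B = sum(len(X) * np.outer(X.mean(axis=0) - grand_mean,
                              X.mean(axis=0) - grand_mean) for X in samples)
    return np.linalg.det(W) / np.linalg.det(W + B)

rng = np.random.default_rng(0)
for p in (5, 80):  # second case: p exceeds the total sample size 3 * 20 = 60
    groups = [rng.standard_normal((20, p)) for _ in range(3)]
    # For p = 80, W has rank at most 57, so both determinants collapse to
    # (numerical) zero and the likelihood-ratio statistic degenerates.
    print(p, wilks_lambda(groups))
```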
As soon as the dimension exceeds the minimum sample size, at least one of the sample covariance matrices becomes singular, and these likelihood-ratio based test statistics tend to become zero, which leads to unsatisfactory performance. Under the setup of increasing dimension, the first proposed test was based on the ratio of the traces of the sample covariance matrices, due to Dempster (1958, 1960) [21] [22]. Bai and Saranadasa (1996) [4] established asymptotic normality of Dempster's test statistic based on a corrected likelihood ratio when K = 2. Later, Fujikoshi, Himeno and Wakaki (2004) [26], Schott (2007) [42], Srivastava (2007) [45], and Srivastava and Yamada (2012) [55] proposed several tests for MANOVA in high dimensions. All of these tests were formulated under the assumption of either multivariate normality or equal covariance matrices among the K groups. Cai and Xia (2014) [10] proposed a test statistic based on the infinity norm and proved its asymptotic convergence to a Gumbel distribution under the assumption of homogeneity of the population covariance matrices. Under the same assumption as Cai and Xia (2014) [10], Zhang, Guo and Zhou (2017) [56] proposed a test based on the Frobenius norm and proved that its asymptotic null distribution is a chi-square distribution. Recently, Chen, Li and Zhong (2019) [15] proposed a test statistic based on the L2-norm and proved its asymptotic normality without assuming normality of the underlying populations or homogeneity of the population covariance matrices.

We propose a statistical test for the equality of the means of K populations. We establish a Gaussian approximation result for a class of sparse convex sets. This Gaussian approximation result generalizes the Gaussian approximation results over the class of hyperrectangles by Chernozhukov, Chetverikov and Kato (2017) [19]. We also develop multiplier bootstrap results over the class of convex sets. These results find an extremely useful application in the context of the high-dimensional MANOVA problem, based on supremum-type test statistics for the differences in means among the K groups. Here we allow the number of groups K to diverge to infinity. The test procedure considered here is free from distributional and correlational assumptions, which broadens its scope for practical applications. The problem of linear hypothesis testing in the high-dimensional setup is also tackled using the previously mentioned results. The asymptotic analysis of these tests, in terms of controlling size and power, has been theoretically validated, and the connection with various other tests is established.

1.2 Testing equality of Two Population Covariance Matrices

Along with detecting differences among population means, another important problem is to study the dependence among the components of the observed sample vectors under various stages of treatment, e.g., in cancer and Alzheimer's disease data sets. Moreover, increasing dimension typically gives rise to complex dependence structures, so the test for equality of two covariance matrices becomes quite challenging. In genomic studies, the genetic networks of living cells determine the internal structures of microarray gene expressions or single nucleotide polymorphism (SNP) counts. Huge variation and dependence among the measurements of various genes are observed subject to different biological conditions and treatments.
For example, some genes may be tightly correlated in the controlled or early stage of a disease, but their dependence can wither away during the later, more serious stages of the disease. More references on these data sets and their various impacts on the dependence structure of the vectors can be found in Shedden and Taylor (2004) [44].

In the case of fixed dimension, several tests based on likelihood ratios, inspired by Bartlett (1937) [7], have been proposed; see Chapter 10 of Anderson (2003) [3] for more details and references. Marchenko and Pastur (1967) [35], Capitaine and Casalis (2004) [12], and Wang and Paul (2014) [50] have made immense contributions in the field of random matrix theory to the study of the limiting spectral distribution of the sample covariance matrix. Several other advances were made by Johansson (2000) [28], Johnstone (2001) [29], and Lee and Schnelli (2016) [32] on the frontier of extreme eigenvalues of sample covariance matrices. In the case of a spiked covariance structure, the asymptotic distribution of the extreme eigenvalues was studied by Johnstone (2001) [29], Baik and Silverstein (2006) [6], Paul (2007) [39], Bai and Yao (2008) [5], Johnstone and Lu (2009) [30], Wang and Fan (2017) [51], and Cai, Han and Pan (2020) [9].

From another point of view, without using random matrix theory (RMT), several other tests based on different norms have been proposed. The test proposed by Schott (2007) [43] is based on a metric that measures the difference between the two sample covariance matrices. With Σ1 and Σ2 denoting the two population covariance matrices, Srivastava and Yanagihara (2010) [46] proposed a test based on the difference between tr(Σ1²)/(tr(Σ1))² and tr(Σ2²)/(tr(Σ2))², where tr stands for the trace. Both of these tests were formulated under the assumption of multivariate normality of the underlying populations, or were tailored to moderately high dimensionality. A U-statistic test based on an unbiased estimator of the Frobenius norm of the difference of the two population covariance matrices was proposed by Li and Chen (2012) [33]. Assuming the difference of the two covariance matrices to be sparse, Cai, Liu and Xia (2013) [11] developed a test based on the maximum of the standardized differences between the entries of the sample covariance matrices. These two tests work outside the regime of multivariate normal populations as well. Recently, Chang, Zhou, Zhou and Wang (2017) [14] proposed a test based on a bootstrap version of the test statistic used in Cai et al. (2013) [11], under assumptions of sparsity and correlational structure among the components of the random vectors.

We propose a statistical test for the equality of two high-dimensional covariance matrices that requires no distributional assumptions, except for some weak conditions on the moments of the random vectors and the tails of the marginal distributions. Our derivation is based on an extension of the one-sample central limit theorem for non-degenerate U-statistics in Chen (2018) [16]. This provides a practically useful procedure with rigorous theoretical guarantees for the asymptotic level and power. In particular, the proposed test is easy to implement, as we can allow arbitrary correlation structures among the components of the random vectors.
Other salient features include weaker moment and tail conditions than the existing methods, allowance for highly unequal sample sizes, consistent power behavior against fairly general alternatives, and a data dimension that is allowed to be exponentially high under the umbrella of such general conditions.

1.3 Notation

In this section we describe the notation and conventions used throughout this thesis. Let d ≡ dm or dm,n be a sequence of positive integers, depending on the context. Let Rd denote the d-dimensional Euclidean space and, for x ∈ Rd, let x^T denote its transpose. For any two column vectors x = (x1, ..., xd)^T and y = (y1, ..., yd)^T in Rd, write x ≤ y whenever xj ≤ yj for all j = 1, ..., d. For any x ∈ Rd and a, b ∈ R, x + a := (x1 + a, x2 + a, ..., xd + a), a ∨ b = max{a, b} and a ∧ b = min{a, b}. For any two sequences of positive numbers an and bn, write an ≲ bn if, for some positive and finite constant C, an ≤ C bn for all large n. We write an ≈ bn if an ≲ bn and bn ≲ an. For any matrix A = ((aij)) of real numbers, ∥A∥∞ := max_{i,j} |aij|. For any function f : R → R, ∥f∥∞ := sup_{z∈R} |f(z)|. For a smooth function g : Rd → R, we use indices to represent partial derivatives for brevity; for example, ∂j ∂k ∂l g = g_{jkl}. The notation ξ ∼D G means that the random vector ξ ∈ Rd has distribution function (d.f.) G. Let ψα(x) = exp(x^α) − 1, x ≥ 0, α > 0. For any random variable (r.v.) X, the quantity

∥X∥_{ψα} := inf{λ > 0 : E[ψα(|X|/λ)] ≤ 1}

is known as the Orlicz norm when α ∈ [1, ∞) and a quasi-norm for α ∈ (0, 1); see, e.g., p. 95 of van der Vaart and Wellner (1996). We define ARe as the class of all hyperrectangles A in Rd of the form A = {x ∈ Rd : aj ≤ xj ≤ bj for all j = 1, ..., d}, where −∞ ≤ aj ≤ bj ≤ ∞ for j = 1, ..., d.

Chapter 2

Multiplier bootstrap tests for high-dimensional data with applications to MANOVA

2.1 Introduction

The statistical motivation for this work comes from the high-dimensional MANOVA problem of testing H0 : μ1 = ... = μK for K > 2. This problem has been the focus of many recent works due to its growing importance in genomics and neuroimaging, among many other fields of science. For example, Fujikoshi et al. (2004) [26] considered the ratio of the traces of the between-sample covariance and the within-sample covariance, while Schott (2007) [42] proposed a test based on the difference of those two traces. Srivastava (2007) [45] used the Moore-Penrose inverse of the within-sample covariance matrix to construct a test. Cai and Xia (2014) [10] proposed a test based on the maximum norm of the squared differences between the K groups. All the mentioned tests were formulated either under the assumption that the data are generated from a multivariate normal population or under some stringent distributional or sparsity assumptions. Moreover, all these tests assume equal covariance structure among the groups. Recently, Chen, Li and Zhong (2019) [15] proposed a thresholded L2-norm-type test statistic assuming sparsity in the population means, mixing, and multivariate sub-Gaussianity among the components of the random vectors. They allow different sparse covariance matrices across groups; the sparsity assumptions on the means and covariances are crucial in their work. In this work, we eliminate the need for these assumptions.

From a different point of view, this work also extends the recent work of Xue and Yao (2020) for K = 2 to the case K > 2.
This extension is elegant and less technical than directly re-proving the results in Xue and Yao (2020), where one would have to tackle intricate block-type dependency structures and work with U-statistics similar to what was done in Chen (2018) [16]. The class of our tests enjoys all the good properties of the tests in Xue and Yao (2020) [54]. In particular, our tests are computationally fast and simple, since they do not require estimation of covariance and/or precision matrices. Our tests have the advantage of being versatile, so they can be just as easily adapted to solve MANOVA or to test for a linear structure of the population means, such as contrasts.

2.2 Overview

This chapter establishes the rates of approximation of the distributions of normalized sums Sn of independent random vectors by those of the corresponding sums of independent Gaussian random vectors, over a new class of convex sets C_{L,s}, in a high-dimensional setup. We also establish the rate of approximation of the distributions of Sn by the multiplier bootstrap distributions. The class of convex sets C_{L,s} allows us to quantify explicitly the effect of sparsity on the convergence rate of these approximations. We first prove the Gaussian approximation results over this class of convex sets, and then prove the multiplier bootstrap approximation results over the same class. To appreciate the usefulness of these results, we illustrate them by developing tests for high-dimensional MANOVA (Multivariate Analysis of Variance) under various setups and conditions. In particular, these results find an extremely useful application in the context of the high-dimensional MANOVA problem, via supremum-type test statistics for the differences in means among the K groups. Throughout this chapter, we allow the number of groups K to diverge to infinity. The considered test procedure is free from distributional and correlational assumptions, which broadens its scope for practical applications. The problem of linear hypothesis testing for several means is also tackled using the previously mentioned results. The asymptotic analysis of these tests, in terms of controlling size and power, has been theoretically validated. We have also established the connection between our proposed test and some other popular tests in the existing literature. An empirical comparison between our proposed test and the existing popular competitors is beyond the scope of this thesis; however, this comparison, as well as a real data application, has recently been carried out in Chakraborty and Sakhanenko (2022) [13].

2.3 Gaussian approximation results over the class of convex sets C_{L,s}

Recall that all vectors are treated as columns. Let X1, ..., Xn be independent random vectors in Rd. The components of Xi are denoted by Xij, j = 1, ..., d. Assume EXij = 0 and EXij² = σij² < ∞, for i = 1, ..., n and j = 1, ..., d. The normalized sum

S_n^X = n^{−1/2} Σ_{i=1}^n Xi

is approximated by its Gaussian analogue

S_n^Y = n^{−1/2} Σ_{i=1}^n Yi,

where Y1, ..., Yn are independent centered Gaussian random vectors in Rd such that EYij² = σij², and all Yi are independent of all Xi. The quality of the approximation of a normalized sum by its Gaussian analogue is assessed via

ρn(B) = sup_{B∈B} |P(S_n^X ∈ B) − P(S_n^Y ∈ B)|,

where B is a subclass of all Borel sets in Rd.
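Before the formal development, here is a small Monte Carlo sketch (our illustration, not part of the dissertation) of the quantity ρn just defined, estimated over the simple one-sided rectangles {w ∈ Rd : max_j wj ≤ u}. The sample size, dimension, skewed design and grid are arbitrary choices; with these choices the estimate should be small, in line with the Gaussian approximation phenomenon quantified below.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, reps = 200, 50, 2000

def max_of_normalized_sum(gaussian):
    if gaussian:
        X = rng.standard_normal((n, d))          # Y_i: centered Gaussian, variance 1
    else:
        X = rng.exponential(1.0, (n, d)) - 1.0   # X_i: centered, variance 1, skewed
    return (X.sum(axis=0) / np.sqrt(n)).max()

sx = np.array([max_of_normalized_sum(False) for _ in range(reps)])
sy = np.array([max_of_normalized_sum(True) for _ in range(reps)])

# Kolmogorov distance between the two empirical laws of the max coordinate,
# i.e. rho_n restricted to the sets {w : max_j w_j <= u} over a grid of u.
grid = np.linspace(0, 4, 200)
rho_hat = np.max(np.abs((sx[:, None] <= grid).mean(axis=0)
                        - (sy[:, None] <= grid).mean(axis=0)))
print(f"estimated rho_n over max-type rectangles: {rho_hat:.3f}")
```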
For an integer L > 0 and a real number s > 0, we introduce a natural class C_{L,s} of sparse convex sets which are intersections of a finite number Ld of half-spaces. Let A^(1), ..., A^(L) be fixed d × d matrices and let u1, ..., uL be vectors in Rd. Define

C_{L,s} = { {w ∈ Rd : A^(1) w ≤ u1, ..., A^(L) w ≤ uL} : A^(1), ..., A^(L) ∈ A, u1, ..., uL ∈ Rd },

where A ⊂ R^{d×d} contains all matrices satisfying the following sparsity condition, referred to as condition (L): for some s > 0 (potentially depending on L and d),

Σ_{m=1}^d |A^(l)_{jm}| ≤ s, j = 1, ..., d, l = 1, ..., L.    (2.3.1)

In a series of seminal works on Gaussian approximation for high-dimensional data, Chernozhukov et al. (2017) [19] considered the class of hyperrectangles ARe, which consists of all sets A of the type A = {w ∈ Rd : aj ≤ wj ≤ bj, for all j = 1, ..., d} for some −∞ ≤ aj ≤ bj ≤ ∞, j = 1, ..., d. Here we make the important observation that the class ARe is a special case of the class C_{L,s}, where L = 2 and A = {Id, −Id}, with Id the d × d identity matrix.

In order to formulate the results, we need some more notation. Similarly to Chernozhukov et al. (2017) [19], we introduce the following quantities. For ϕ ≥ 1 and any random vector X we define

M̃_{n,d,L,X}(ϕ) = n^{−1} Σ_{i=1}^n E[ (max_{1≤j≤d} |Xij|)³ I( max_{1≤j≤d} |Xij| > √n / (4ϕ log(Ld)) ) ],    (2.3.2)

M̃_{n,d,L,X,Y}(ϕ) = M̃_{n,d,L,X}(ϕ) + M̃_{n,d,L,Y}(ϕ).

For the sake of brevity, from now on we denote M̃_{n,d,L,X}(ϕ) = M̃_{n,X}(ϕ). This functional is slightly different from the corresponding one in Chernozhukov et al. (2017) [19], since the log term contains Ld as opposed to d. Also recall from Chernozhukov et al. (2017) [19] the notation

Ln = max_{1≤j≤d} n^{−1} Σ_{i=1}^n E[|Xij|³].

Moreover, we introduce the average covariance matrix

Σ = (1/n) Σ_{i=1}^n Σ^(i), where Σ^(i)_{km} = E(Xik Xim).    (2.3.3)

We assume the following condition.

(N1) There exists a constant b > 0 such that for all j = 1, ..., d and all l = 1, ..., L, [A^(l) Σ (A^(l))^T]_{jj} ≥ b.

This condition gives a structural interplay between the matrices A^(l), l = 1, ..., L, and the covariance matrices Σ^(i), i = 1, ..., n. If all the covariance matrices are the same and A is an identity matrix, we recover condition (M.1) in Chernozhukov et al. (2017) [19]. Thus, (N1) replaces (M.1) for more intricate matrices. Condition (N1) plays a key role in achieving the Gaussian approximation result, as it is critical for the Nazarov inequality to hold; see Lemma .0.4 for more details.

Before stating the following lemma, we define a few more quantities. For any random vector X̃ ∈ R^{Ld}, we define

L_{n,X̃} = max_{1≤m≤Ld} (1/n) Σ_{i=1}^n E[|X̃im|³],

M_{n,X̃}(ϕ) = (1/n) Σ_{i=1}^n E[ (max_{1≤m≤Ld} |X̃im|)³ I( max_{1≤m≤Ld} |X̃im| > √n / (4ϕ log(Ld)) ) ],

M_{n,X̃,Ỹ}(ϕ) = M_{n,X̃}(ϕ) + M_{n,Ỹ}(ϕ).

Lemma 2.3.1. Suppose that (N1) and (L) hold. Then for some constant K′ depending only on b and all ϕ ≥ 1 we have

ρ′n := sup_{C∈C_{L,s}, v∈[0,1]} |P(√v S_n^X + √(1−v) S_n^Y ∈ C) − P(S_n^Y ∈ C)|
≤ K′ (s³/√n) ϕ² log²(Ld) [ ϕ Ln ρ′n + Ln √(log(Ld)) + ϕ M_{n,X,Y}(sϕ) ] + K′ √(log(Ld)) / ϕ.

Lemma 2.3.1 can be seen as an extension of Lemma .0.5.

Proof. The trick is to stack (A^(1) Xi, ..., A^(L) Xi) into Ld-dimensional vectors X̃i for i = 1, ..., n. Then

P(S_n^X ∈ C) = P(S_n^{X̃} ≤ u), u = (u1, ..., uL) ∈ R^{Ld}.

The following condition can be seen as an analogue of the condition n^{−1} Σ_{i=1}^n E[(Xij)²] ≥ b, j = 1, ..., d, of Lemma .0.5:

(1/n) Σ_{i=1}^n E[(A^(l) Xi)_j²] ≥ b, j = 1, ..., d, l = 1, ..., L.    (2.3.4)

Recalling (2.3.3), we see that the left-hand side of (2.3.4) can be rewritten as

(1/n) Σ_{i=1}^n E[ ( Σ_{k=1}^d A^(l)_{jk} Xik ) ( Σ_{m=1}^d A^(l)_{jm} Xim ) ] = (1/n) Σ_{i=1}^n Σ_{k=1}^d Σ_{m=1}^d A^(l)_{jk} A^(l)_{jm} E(Xik Xim) = [A^(l) Σ (A^(l))^T]_{jj}.    (2.3.5)

Therefore, by (2.3.5), the condition with b becomes [A^(l) Σ (A^(l))^T]_{jj} ≥ b for j = 1, ..., d and l = 1, ..., L, which is condition (N1) defined above. We now apply Lemma .0.5 to the X̃i. For the functional L_{n,X̃} we have

L_{n,X̃} = max_{1≤m≤Ld} (1/n) Σ_{i=1}^n E[|X̃im|³] = max_{1≤j≤d, 1≤l≤L} (1/n) Σ_{i=1}^n E[|(A^(l) Xi)_j|³] = max_{1≤j≤d, 1≤l≤L} (1/n) Σ_{i=1}^n E| Σ_{k=1}^d A^(l)_{jk} Xik |³.

Using condition (L) and the convexity of the ℓ3-norm, we have

E| Σ_{k=1}^d A^(l)_{jk} Xik |³ ≤ s³ Σ_{k=1}^d ( |A^(l)_{jk}| / Σ_{m=1}^d |A^(l)_{jm}| ) E[|Xik|³] ≤ s³ max_{1≤j≤d} E[|Xij|³].    (2.3.6)

Using (2.3.6) we can bound L_{n,X̃} as

L_{n,X̃} ≤ s³ max_{1≤k≤d} (1/n) Σ_{i=1}^n E|Xik|³ = s³ Ln.

Applying Lemma .0.5 to the X̃i, we obtain the functional M_{n,X̃,Ỹ}(ϕ) = M_{n,X̃}(ϕ) + M_{n,Ỹ}(ϕ), with M_{n,X̃}(ϕ) as defined above. Note that by condition (2.3.1),

max_{1≤m≤Ld} |X̃im| = max_{1≤j≤d, 1≤l≤L} | Σ_{k=1}^d A^(l)_{jk} Xik | ≤ s max_{1≤k≤d} |Xik|.

Then M_{n,X̃}(ϕ) is bounded from above by

(s³/n) Σ_{i=1}^n E[ (max_{1≤k≤d} |Xik|)³ I( max_{1≤k≤d} |Xik| > √n / (4sϕ log(Ld)) ) ] = s³ M̃_{n,X}(sϕ).

Using the bounds for M_{n,X̃,Ỹ} and L_{n,X̃} together with (N1) in Lemma .0.5, we obtain the conclusion of Lemma 2.3.1.

Lemma 2.3.2. Suppose that (N1) and (L) hold. Then there exist constants K1, K2 > 0 depending only on b such that for every sequence ℓn of real numbers with ℓn ≥ Ln, we have

(1 − Ln/ℓn) ρn(C_{L,s}) ≤ K1 [ s ( ℓn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/ℓn ],

with

sϕn = K2 ( ℓn² log⁴(Ld) / n )^{−1/6}, M_{n,X,Y}(ϕn) = M̃_{n,X}(ϕn) + M̃_{n,Y}(ϕn).

Note that the technique of proof of Lemma 2.3.2 is analogous to that of Theorem 2.1 in Chernozhukov et al. (2017) [19], adapted to the class of sparse convex sets.

Proof: For

c0(ϕ) = K′ (s³/√n) ϕ³ log²(Ld) Ln,
c1(ϕ) = K′ (s³/√n) ϕ² log^{5/2}(Ld) Ln + K′ (s³/√n) ϕ³ log²(Ld) M_{n,X,Y}(sϕ) + K′ √(log(Ld)) / ϕ,

Lemma 2.3.1 gives ρ′n ≤ c0(ϕ) ρ′n + c1(ϕ). For c0(ϕ) > 1 this inequality is trivial, so we only consider the case c0(ϕ) < 1. We solve the two inequalities 1 − c0(ϕ) = δ > 0 and ρ′n ≤ c1(ϕ)/δ and try to choose ϕ. The first inequality gives

K′ (s³/√n) ϕ³ log²(Ld) Ln < 1 − δ,

while the second yields

δ ρ′n ≤ K′ (s³/√n) ϕ² log^{5/2}(Ld) Ln + K′ (s³/√n) ϕ³ log²(Ld) M_{n,X,Y}(sϕ) + K′ √(log(Ld)) / ϕ.

In other words, in the first inequality we need

sϕ < ( √n (1 − δ) / (K′ log²(Ld) Ln) )^{1/3} = (K′)^{−1/3} ( log⁴(Ld) (Ln/(1 − δ))² / n )^{−1/6}.

Set ℓn = Ln/(1 − δ) > Ln for 0 < δ < 1 and set K2 = (K′)^{−1/3}/2. Meanwhile, the second inequality for ϕ = ϕn = (K2/s) ( ℓn² log⁴(Ld) / n )^{−1/6} gives

δ ρ′n ≤ K′ K2² s n^{−1/6} ℓn^{−2/3} log^{7/6}(Ld) Ln + K′ K2³ ℓn^{−1} M_{n,X,Y}(sϕn) + (K′/K2) s n^{−1/6} ℓn^{1/3} log^{7/6}(Ld)
≤ K′ [K2² + 1/K2] s n^{−1/6} ℓn^{1/3} log^{7/6}(Ld) + K′ K2³ ℓn^{−1} M_{n,X,Y}(sϕn).

Now choose K1 = max(K′[K2² + 1/K2], K′ K2³), which completes the proof.

Before stating the next theorem, we introduce some more conditions.

(N2) Let Bn ≥ 1, n ∈ N, be a sequence of numbers such that Ln ≤ Bn.

(N3a) Let Bn ≥ 1, n ∈ N, be a sequence of numbers such that ∥Xij∥_{ψ1} ≤ Bn for all i = 1, ..., n and j = 1, ..., d.

(N3b) Let Bn ≥ 1, n ∈ N, be a sequence of numbers such that

E[ max_{1≤j≤d} (|Xij|/Bn)³ ] ≤ 2 for all i = 1, ..., n.

(N3c) There exists a universal constant c > 0 such that

s⁶ Bn² log⁷(Ldn) / n ≤ c, or s ( Bn² log⁷(nLd) / n )^{1/6} + ( Bn² log³(nLd) / n^{1/3} )^{1/3} ≤ c.

In comparison with Chernozhukov et al. (2017), note that (N2) is their condition (M.2), while condition (N3a) replaces their condition (E.1), and (N3b) is their condition (E.2) with q = 3.

Theorem 2.3.1. Suppose conditions (L), (N1), (N2) are satisfied. Then under condition (N3a),

ρn(C_{L,s}) ≤ C s ( Bn² log⁷(nLd) / n )^{1/6},    (2.3.7)

while under condition (N3b),

ρn(C_{L,s}) ≤ C s ( Bn² log⁷(nLd) / n )^{1/6} + C ( Bn² log³(nLd) / n^{1/3} )^{1/3},    (2.3.8)

where the constant C depends on b only.

Remark 2.3.1. Note that under (N3c), the conclusion of this theorem is non-trivial; (N3c) specifies the growth rates of Bn, L, s, d with n.

The proof of Theorem 2.3.1 is an adaptation of the technique used in Proposition 2.1 of Chernozhukov et al. (2017) [19]. We provide the details for the sake of completeness.

Proof: From Lemma 2.3.2, we have

ρn(C_{L,s}) ≤ K1 [ s ( ℓn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/ℓn ],    (2.3.9)

where ϕn and ℓn are defined as in Lemma 2.3.2. To obtain the bounds on the right-hand sides of (2.3.7) and (2.3.8), one needs to bound ℓn, M̃_{n,X}(sϕn) and M̃_{n,Y}(sϕn) under (N3a) and (N3b), respectively.

Condition (N2) implies that Ln ≤ Bn; we set ℓn = Bn. Condition (N3a) entails that max_{1≤i≤n} max_{1≤j≤d} ∥Xij∥_{ψ1} ≤ Bn. Since Yij ∼ N(0, σij²), i = 1, ..., n, j = 1, ..., d, we have for any t ≥ 0

P(|Yij| > t) ≤ 2 exp( −t² / (2σij²) ),

so that ∥Yij∥_{ψ2} ≤ [(1 + 2)(2σij²)]^{1/2} for all i and j. Noting that for p < q, ∥X∥_{ψp} ≤ (log 2)^{p/q} ∥X∥_{ψq}, we obtain for all i and j that ∥Yij∥_{ψ1} ≤ (log 2)^{1/2} ∥Yij∥_{ψ2}. From the properties of the Orlicz norm, it can be argued that for all i and j,

(σij²)^{1/2} = (E(Xij)²)^{1/2} ≤ 2 ∥Xij∥_{ψ1}.

Therefore, combining the above, we conclude that for some universal positive constant c,

max_{1≤i≤n} max_{1≤j≤d} ∥Yij∥_{ψ1} ≤ c Bn.

Now applying Lemma .0.9 (the maximal inequality for Orlicz norms), we obtain that for all i = 1, ..., n and some universal positive constants c1, c2,

∥ max_{1≤j≤d} |Xij| ∥_{ψ1} ≤ c1 Bn log(d), ∥ max_{1≤j≤d} |Yij| ∥_{ψ1} ≤ c2 Bn log(d).

Then, from Markov's inequality, it follows that for all i = 1, ..., n and all t ≥ 0,

P( max_{1≤j≤d} |Xij| > t ) ≤ 2 exp( −t / (c1 Bn log(d)) ), P( max_{1≤j≤d} |Yij| > t ) ≤ 2 exp( −t / (c2 Bn log(d)) ).

Now, applying Lemma .0.6, we get

M̃_{n,X}(sϕn) ≲ [ √n / (sϕn log(Ld)) + Bn log(Ld) ]³ exp( −√n / (4c sϕn Bn log²(Ld)) ).    (2.3.10)

To bound the exponent in (2.3.10), note that for some universal constants c, c*,

√n / (4c sϕn Bn log²(Ld)) ≳ ( Bn² log⁷(Ldn) / n )^{−1/3} log(Ldn) ≳ c* log(Ldn).

Because of (N3c) and sϕn ≥ 2, we have √n / (sϕn log(Ld)) ≲ √n / log(Ld) ≲ √n, and

Bn log(Ld) = ( Bn² log²(Ldn) / n )^{1/2} √n ≲ √n.

Combining the above, (2.3.10) reduces to

M̃_{n,X}(sϕn) ≲ n^{3/2} (nd)^{−c*} ≲ n^{−1/2}.    (2.3.11)

Using similar arguments, it can be concluded that

M̃_{n,Y}(sϕn) ≲ n^{−1/2}.    (2.3.12)

Combining (2.3.11) and (2.3.12), we get M_{n,X,Y}(sϕn) ≲ n^{−1/2}. Finally, from (2.3.9), we obtain

ρn(C_{L,s}) ≲ s ( Bn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/Bn ≲ s ( Bn² log⁷(Ld) / n )^{1/6} + s / (√n Bn) ≲ s ( Bn² log⁷(Lnd) / n )^{1/6},

since s / (√n Bn) = s ( Bn² log⁷(Lnd) / n )^{1/6} Bn^{−4/3} (log(Lnd))^{−7/6} n^{−1/3}.

Under (N3b), along with assuming s⁶ Bn² log⁷(Ldn) / n ≤ c, we also assume that Bn^{3/2} / (n^{1/2−1/3} log d) ≤ c. Setting

ℓn = Bn + Bn² / (n^{1/2−2/3} log^{1/2} d),

we obtain Ln ≤ Bn ≤ ℓn. Further note that, by condition (N3c), s ( Bn² log⁷(nLd) / n )^{1/6} + ( Bn² log³(nLd) / n^{1/3} )^{1/3} ≤ c and sϕn ≥ 2. Noting that

E[ (max_{1≤j≤d} |Xij|)³ I( max_{1≤j≤d} |Xij| > √n / (4sϕ log(Ld)) ) ] ≤ E[ (max_{1≤j≤d} |Xij|)³ ],

by (N3b) we obtain M̃_{n,X}(sϕn) ≲ Bn³. Also, ℓn^{−1} ≤ Bn^{−2} n^{1/2−2/3} log^{1/2} d, and therefore

M̃_{n,X}(sϕn)/ℓn ≲ Bn³ n^{1/2−2/3} log^{1/2} d / Bn² = (1/log d) ( Bn² log³ d / n^{1−2/3} )^{1/2} ≲ ( Bn² log³(nLd) / n^{1/3} )^{1/3}.    (2.3.13)

Similarly, as in (2.3.12),

M̃_{n,Y}(sϕn) ≲ n^{−1/2}.    (2.3.14)

Combining (2.3.13) and (2.3.14), we bound M_{n,X,Y}(sϕn)/ℓn, and we conclude the proof by noting that, from (2.3.9),

ρn(C_{L,s}) ≤ s ( Bn² log⁷(Ld) / n )^{1/6} + M_{n,X,Y}(sϕn)/ℓn ≲ s ( Bn² log⁷(Lnd) / n )^{1/6} + ( Bn² log³(nLd) / n^{1/3} )^{1/3}.

2.4 Multiplier bootstrap results over the class of convex sets C_{L,s}

Theorem 2.3.1 presents an abstract Gaussian approximation, since the covariance matrix Σ is unknown but is needed to identify the distribution of the Y's. To remove this abstractness we use the multiplier bootstrap technique, introduced by Chernozhukov et al. (2017) [19] for the class of hyperrectangles. For a vector (e1, ..., en) of iid N(0,1) random variables, independent of all X and Y, we define

X̄ = (1/n) Σ_{i=1}^n Xi, S_n^{eX} = (1/√n) Σ_{i=1}^n ei (Xi − X̄).

Recall (2.3.3) and denote Σ̂ = (1/n) Σ_{i=1}^n Σ̂^(i), where Σ̂^(i)_{km} = (Xik − X̄k)(Xim − X̄m), 1 ≤ i ≤ n and 1 ≤ k, m ≤ d.

Lemma 2.4.1. Suppose conditions (N1) and (L) hold. Then, for every constant Δ̄n > 0,

ρ_n^{MB}(C_{L,s}) = sup_{B∈C_{L,s}} |P(S_n^{eX} ∈ B | X1, ..., Xn) − P(S_n^Y ∈ B)| ≤ C Δ̄n^{1/3} log^{2/3}(Ld)

on the event

max_{1≤l1,l2≤L} max_{1≤j,k≤d} | [A^(l1) (Σ̂ − Σ) (A^(l2))′]_{jk} | ≤ Δ̄n,    (2.4.1)

where the constant C depends on b only.

Proof of Lemma 2.4.1. We start by stacking (A^(1) Xi, ..., A^(L) Xi) into Ld-dimensional vectors X̃i for i = 1, ..., n. Then for any B ∈ C_{L,s}, there exists a vector u = (u1, ..., uL) ∈ R^{Ld} such that

P(S_n^{eX} ∈ B | X1, ..., Xn) = P(S_n^{eX̃} ≤ u | X1, ..., Xn) and P(S_n^Y ∈ B) = P(S_n^{Ỹ} ≤ u).

Let X̃̄ = (1/n) Σ_{i=1}^n X̃i and

Δ = max_{1≤k,m≤Ld} | [ (1/n) Σ_{i=1}^n (X̃i − X̃̄)(X̃i − X̃̄)′ ]_{km} − [ (1/n) Σ_{i=1}^n E(X̃i X̃i′) ]_{km} |.

First we compute the second moments of the stacked, centered vectors and show that Δ reduces to the quantity on the left-hand side of (2.4.1). Note that X̃̄ is the stacking of X̄ and that X̃i − X̃̄ is the stacking of Xi − X̄. Then we have

Δ = max_{1≤l1,l2≤L} max_{1≤j,k≤d} | (1/n) Σ_{i=1}^n [ A^(l1) (Xi − X̄)(Xi − X̄)′ (A^(l2))′ ]_{jk} − (1/n) Σ_{i=1}^n [ A^(l1) E(Xi Xi′) (A^(l2))′ ]_{jk} | = max_{1≤l1,l2≤L} max_{1≤j,k≤d} | [A^(l1) (Σ̂ − Σ) (A^(l2))′]_{jk} |,

which is the quantity appearing on the left side of (2.4.1). The statement of Lemma 2.4.1 is now obtained by applying Lemma .0.7, which concludes the proof.

Before stating the next theorem, we state one more condition.

(N3c′) There exists a sequence αn ∈ (0, e^{−1}) such that

s ( Bn² log⁵(nLd) log²(1/αn) / n )^{1/9} ≤ 1, or s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} αn^{−2/9} ≤ 1.

Theorem 2.4.1. Suppose conditions (L), (N1) and (N2) hold. Additionally, assume

E(Xij Xik)² ≤ Bn², i = 1, ..., n, j, k = 1, ..., d.

Then under condition (N3a), with probability at least 1 − αn, we have

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log(nLd) / n )^{1/6} log^{2/3}(Ld) log^{1/3}(1/αn),

while under condition (N3b), with probability at least 1 − αn, we have

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log⁵(Ld) / n )^{1/6} log^{1/3}(1/αn) + C s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} αn^{−2/9},

where the constant C depends on b only.

Remark 2.4.1. Note that under (N3c′), the conclusion of this theorem is non-trivial; (N3c′) specifies the growth rates of Bn, L, s, d with n. Theorem 2.4.1 can be viewed as an analogue of Corollary 4.2 in Chernozhukov et al. (2017) [19], adapted to the class of sparse convex sets.

Proof: The proof is similar to that of Proposition 4.1 in Chernozhukov et al. (2017) [19]. First, condition (L) yields

Δ̃ := max_{1≤l1,l2≤L} max_{1≤j,k≤d} | [A^(l1) (Σ̂ − Σ) (A^(l2))′]_{jk} | ≤ s² max_{1≤j,k≤d} | (1/n) Σ_{i=1}^n [Xij Xik − E(Xij Xik)] − X̄j X̄k |.    (2.4.2)

Note that (2.4.2) can be bounded by

s² max_{1≤j,k≤d} | (1/n) Σ_{i=1}^n [Xij Xik − E(Xij Xik)] | + s² max_{1≤j,k≤d} |X̄j X̄k|.    (2.4.3)

To bound the first term of (2.4.3), we denote

σ_{n,1}² = s² max_{1≤j,k≤d} Σ_{i=1}^n E[Xij Xik − E(Xij Xik)]².

Applying the Cauchy-Schwarz inequality and the condition E(Xij Xik)² ≤ Bn², i = 1, ..., n, j, k = 1, ..., d, we bound σ_{n,1}² ≤ s² n Bn². Also denote

M^X = max_{1≤i≤n} max_{1≤j,k≤d} |Xij Xik − E(Xij Xik)|.

To bound ∥M^X∥_{ψ_{1/2}}, we note that for some positive universal constants c1, c2,

∥M^X∥_{ψ_{1/2}} ≤ c1 ∥ max_{1≤i≤n} max_{1≤j,k≤d} |Xij Xik| ∥_{ψ_{1/2}} + c2 max_{1≤i≤n} max_{1≤j,k≤d} E|Xij Xik| = c1 ∥ max_{1≤i≤n} max_{1≤j≤d} |Xij| ∥²_{ψ1} + c2 max_{1≤i≤n} max_{1≤j,k≤d} E|Xij Xik|.

Under (N3a), applying Lemma .0.1 to ∥max_{1≤i≤n} max_{1≤j≤d} |Xij|∥_{ψ1}, we obtain

∥ max_{1≤i≤n} max_{1≤j≤d} |Xij| ∥²_{ψ1} ≤ c3 Bn² log²(dn) ≤ c3 Bn² log²(Ldn),

and note that c2 max_{1≤i≤n} max_{1≤j,k≤d} E|Xij Xik| ≤ Bn². Therefore, under (N3a), applying Lemma .0.1 and Lemma .0.2 with ϵ = 1 and δ = 1/2, we obtain the following inequalities:

E Δ̃ ≤ C s² ( n^{−1} Bn² log(Ldn) )^{1/2},
P( Δ̃ > C s² ( n^{−1} Bn² log(Ldn) )^{1/2} + t ) ≤ exp( −n t² / (3 s⁴ Bn²) ) + 3 exp( −c √(n t) / (s Bn log(Ldn)) ).

Choosing t = C s² ( n^{−1} Bn² log(Ldn) )^{1/2} log(1/α), which is proportional to Δ̄n, yields the result under (N3a).

Under (N3b), we see that σ_{n,1}² ≤ s² n Bn². Now, to bound M^X, we note that for any q ≥ 2,

E[(M^X)^{q/2}] ≤ c4 ( E[ max_{i,j,k} |Xij Xik|^{q/2} ] + max_{i,j,k} ( E|Xij Xik| )^{q/2} ) ≤ c4 E[ max_{i,j,k} |Xij Xik|^{q/2} ] = c4 E[ max_{i,j} |Xij|^q ] ≤ c4 n Bn^q.

From the above inequalities, it follows that ( E[(M^X)^{q/2}] )^{2/q} ≤ c4^{2/q} n^{2/q} Bn². Therefore, under (N3b), for q = 3, applying Lemma .0.1 and Lemma .0.3 with ϵ = 1 and u = 3/2, we obtain the following inequalities:

E Δ̃ ≤ C s² ( n^{−1} Bn² log(Ldn) )^{1/2} + C s² n^{−1/3} Bn² log(Ld),
P( Δ̃ > E Δ̃ + t ) ≤ exp( −n t² / (3 s⁴ Bn²) ) + c t^{−3/2} n^{−1/2} s³ Bn³.

Then, choosing t = C s² [ ( n^{−1} Bn² log(Ldn) )^{1/2} log(1/α) + n^{−1/3} α^{−2/3} Bn² ], we obtain the desired result.

Corollary 2.4.1. Suppose the conditions of Theorem 2.4.1 are satisfied. With probability 1, there exists an integer n0 > 0 such that for all n ≥ n0,

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log(nLd) / n )^{1/6} log^{2/3}(Ld) log^{1/3}(1/αn)

under condition (N3a), and for all n ≥ n0,

ρ_n^{MB}(C_{L,s}) ≤ C s^{2/3} ( Bn² log⁵(Ld) / n )^{1/6} log^{1/3}(1/αn) + C s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} αn^{−2/9}

under condition (N3b).
Proof: Corollary 2.4.1 follows from Theorem 2.4.1 and the Borel-Cantelli lemma: one can choose a sequence αn ↓ 0 such that Σ_{n=1}^∞ αn < ∞ and consider the events in Theorem 2.4.1 to get the desired result.

Remark: This result establishes the convergence of the multiplier bootstrap in the almost sure sense.

2.5 Motivating application to high-dimensional testing problems

Armed with the convergence results for the multiplier bootstrap, we consider test statistics for the MANOVA problem that are functionals of the normalized sum S_n^X and the class C_{L,s}. We show that bootstrap tests based on these statistics are consistent for MANOVA and for general linear hypothesis testing in the high-dimensional setup. Our approach produces a whole class of tests that are flexible and can be tuned to many different scenarios; the class of matrices A^(l), l = 1, ..., L, serves as the tuning mechanism. Our tests do not need sparsity assumptions on the means and/or covariances, and we require moment and tail assumptions on the distributions that are less stringent than those in the existing literature. Our approach allows for multivariate sub-Gaussian distributions and some heavy-tailed distributions, and thus broadens the regime of its practical use.

Suppose there are K independent groups of random vectors Vk,i ∈ Rp, i = 1, ..., n, k = 1, ..., K, drawn from K populations with means μ1, ..., μK ∈ Rp. The k-th group after centering consists of independent vectors Zk,1 = Vk,1 − μk, ..., Zk,n = Vk,n − μk in Rp. Then stack the vectors Z1,1, ..., ZK,1 into X1, stack the vectors Z1,2, ..., ZK,2 into X2, and so on, stacking Z1,n, ..., ZK,n into Xn. We thus obtain centered long vectors Xi ∈ Rd with d = Kp. We remark that we only need independence and common group means for the Vk,i, i = 1, ..., n; these vectors do not have to come from the same distribution. Also, we can let d grow with n, by letting the dimension p grow with n, by letting the number of groups K grow with n, or both. This is quite rare in the literature but a very useful setup in practice.

2.5.1 MANOVA (balanced case)

We are interested in the hypotheses H0 : μ1 = ... = μK versus HA : not H0. For the i-th vector in the k-th group, Vk,i, denote its q-th component by [Vk,i]_q, q = 1, ..., p. We propose the test statistic

Tn = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q,

where each matrix A^(l) satisfies the condition

(H) Σ_{k=1}^K A^(l)_{j(q+(k−1)p)} = 0, j = 1, ..., d, q = 1, ..., p, l = 1, ..., L.

Note that

Tn = max_{l=1,...,L} max_{j=1,...,d} ( [A^(l) S_n^X]_j + n^{1/2} [A^(l) μ]_j ),

where μ ∈ Rd consists of the vectors μk ∈ Rp, k = 1, ..., K, stacked into one long vector. Under H0 and condition (H), the test statistic becomes

Tn = max_{l=1,...,L} max_{j=1,...,d} [A^(l) S_n^X]_j.

Since we apply the multiplier bootstrap, recall that

S_n^{eX} = (1/√n) Σ_{i=1}^n ei (Xi − X̄), X̄ = (1/n) Σ_{i=1}^n Xi,

where the vector (e1, ..., en) of iid N(0,1) random variables is independent of all Xi, Yi, i = 1, ..., n. Note that

S_n^{eX} = (1/√n) Σ_{i=1}^n ei (V1,i − V̄1, ..., VK,i − V̄K)^T,

where V̄k = (1/n) Σ_{i=1}^n Vk,i, k = 1, ..., K, are the groupwise averages. Define

Tn^e = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} ei [Vk,i − V̄k]_q = max_{l=1,...,L} max_{j=1,...,d} [A^(l) S_n^{eX}]_j.

Denote the Kolmogorov distance

KD = sup_{u∈R} | P( max_{l=1,...,L} max_{j=1,...,d} { n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q − n^{1/2} [A^(l) μ]_j } ≤ u ) − Pe(Tn^e ≤ u) |,

where Pe stands for the probability with respect to the Gaussian vector e only.

Theorem 2.5.1. Suppose conditions (L), (N1) and (N2) are satisfied. Additionally, assume E(Xij Xik)² ≤ Bn², i = 1, ..., n, j, k = 1, ..., d. Let α ∈ (0, 1/e) be arbitrary. Then, under condition (N3a), with probability at least 1 − α, for all n large enough,

KD ≤ C s ( Bn² log⁷(nLd) / n )^{1/6},

and, under condition (N3b), with probability at least 1 − α, for all n large enough,

KD ≤ C s ( Bn² log⁷(nLd) / n )^{1/6} + C ( Bn² log³(nLd) / n^{1/3} )^{1/3},

where the constant C depends on b only.

We remark that KD tends to 0 as n grows as long as s⁶ Bn² log⁷(nLKp) n^{−1} = o(1) under (N3a). Then we can allow all the quantities s = sn, Bn, L = Ln, K = Kn, p = pn to grow with n. We can consider extreme particular cases: if only s is allowed to grow, then it can be as large as s = o(n^{1/6}); if only p is allowed to grow, it can be as large as log p = o(n^{1/7}); the same holds for K. One can rebalance the growth assumptions placed on s, K, and p.

Proof: We start by observing that

KD = sup_{u∈R} | P( max_l max_j [A^(l) S_n^X]_j ≤ u ) − Pe( max_l max_j [A^(l) S_n^{eX}]_j ≤ u ) |
≤ sup_{u∈R} | P( A^(l) S_n^X ≤ u 1d, l = 1, ..., L ) − Pe( A^(l) S_n^{eX} ≤ u 1d, l = 1, ..., L ) |
≤ ρ_n^{MB}(C_{L,s}) + ρn(C_{L,s}),

where 1d is the vector of ones in Rd and the class C_{L,s} is restricted to ul = u 1d, l = 1, ..., L. Here we apply Theorems 2.3.1 and 2.4.1, noting that for all n large enough

s ( Bn² log⁷(nLd) / n )^{1/6} > s^{2/3} ( Bn² log(nLd) / n )^{1/6} log^{2/3}(Ld) log^{1/3}(1/α)

and

( Bn² log³(nLd) / n^{1/3} )^{1/3} > s^{2/3} ( Bn² log³(Ld) / n^{1/3} )^{1/3} α^{−2/9}.

Consider testing the hypothesis H0 : μ1 = ... = μK vs HA : not H0 at significance level α. We reject H0 if

Tn ≥ Qα,

where the quantile Qα of the bootstrapped distribution is defined as

Qα = inf{ u ∈ R : Pe(Tn^e ≤ u) ≥ 1 − α }, α ∈ (0, 1).

Theorem 2.5.1 implies that under condition (H), when s ( Bn² log⁷(nLd) / n )^{1/6} → 0, or ( Bn² log³(nLd) / n^{1/3} )^{1/3} → 0, we have

P(Tn ≥ Qα | H0) → α, P(Tn ≥ Qα | HA) → 1, provided ∃ j and l such that [A^(l) μ]_j ≠ 0.

Thus, these bootstrap tests are consistent against any fixed alternative as long as [A^(l) μ]_j ≠ 0 for some l = 1, ..., L and some j = 1, ..., d. (A schematic implementation of this bootstrap test is sketched below.)

We also remark that the Vk,i, i = 1, ..., n, can come from different distributions with the same mean vector μk. In this case Bn is not a constant; then Bn, p, K can all grow with n. In particular, if Bn ∼ n^β, log p ∼ n^ρ, log K ∼ n^κ for some positive β, ρ, κ, then under (N2) and (N3a), as long as 2β + 7 max(ρ, κ) < 1, KD converges to 0 and the tests remain consistent against any fixed alternative. Under (N2) and (N3b), one additionally needs 6β + 9 max(ρ, κ) < 1. This is a more interesting setup than the case of identically distributed Vk,i, i = 1, ..., n, since Bn turns out to be a constant in that case (β = 0). In the iid case one can allow max(ρ, κ) < 1/7 under (N2) and (N3a), or max(ρ, κ) < 1/9 under (N2) and (N3b). Currently, the proposed test is the only test that can accommodate the non-identically-distributed scenario.

Next, we consider a class of local alternatives that converge to the null hypothesis as n → ∞. Let

H_A^{(n)} : μ1, ..., μK ∈ Rp such that min_{l=1,...,L} min_{j=1,...,d} [A^(l) μ]_j ≥ cn n^{−1/2},

where cn → ∞ slowly as n → ∞.
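As referenced above, the following is a minimal Python sketch of the balanced-case bootstrap test; it is our illustration, not code from the dissertation. It uses the difference-to-group-one contrasts as the rows of a single matrix A (so condition (H) holds by construction) together with −A, so that the maximum runs over absolute differences. The sizes n, p, K, the bootstrap size B, and the alternative are all hypothetical choices.

```python
import numpy as np

def manova_multiplier_test(V, B=1000, alpha=0.05, rng=None):
    """V: array of shape (K, n, p); returns (T_n, Q_alpha, reject)."""
    rng = rng if rng is not None else np.random.default_rng()
    K, n, p = V.shape
    # Differences to group 1 play the role of A X_i; each row of the implied
    # matrix A has coefficients +1 and -1 that sum to zero over groups, so (H) holds.
    D = V[0][None, :, :] - V[1:]                         # shape (K-1, n, p)
    Tn = np.abs(D.sum(axis=1) / np.sqrt(n)).max()
    # Multiplier bootstrap: iid N(0,1) weights e_i on the centered observations.
    Dc = D - D.mean(axis=1, keepdims=True)
    Te = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)
        Te[b] = np.abs((Dc * e[None, :, None]).sum(axis=1) / np.sqrt(n)).max()
    Q = np.quantile(Te, 1 - alpha)                       # bootstrap quantile Q_alpha
    return Tn, Q, Tn >= Q

rng = np.random.default_rng(2)
V = rng.standard_normal((4, 60, 100))                    # K=4 groups, n=60, p=100
V[3] += 0.8                                              # shift one group's mean
print(manova_multiplier_test(V, rng=rng))                # expect rejection here
```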
Note that cn n^{−1/2} → 0 as n → ∞, even though cn → ∞. Recall that d = Kp and that it is allowed to grow with the sample size n.

Corollary 2.5.1. Suppose the conditions of Theorem 2.5.1 are satisfied. With probability tending to 1, we have

P(Tn ≥ Qα | H_A^{(n)}) → 1 as n → ∞.

Proof: Note that

P(Tn ≥ Qα | H_A^{(n)})
= P( max_{l=1,...,L} max_{j=1,...,d} { n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q − √n [A^(l) μ_A^{(n)}]_j + √n [A^(l) μ_A^{(n)}]_j } ≥ Qα )
≥ P( max_{l=1,...,L} max_{j=1,...,d} { n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q − √n [A^(l) μ_A^{(n)}]_j } ≥ Qα − cn, min_{l} min_{j} [A^(l) μ_A^{(n)}]_j ≥ cn n^{−1/2} )
− Pe(Tn^e ≥ Qα − cn | H_A^{(n)}) + Pe(Tn^e ≥ Qα − cn | H_A^{(n)})
≥ Pe(Tn^e ≥ Qα − cn | H_A^{(n)}) − KD → 1 as n → ∞.

2.5.2 Unbalanced MANOVA

Suppose now that the samples have different sample sizes nk, k = 1, ..., K. Introduce the modified test statistic

T̃n = max_{l=1,...,L} max_{j=1,...,d} Σ_{t=1}^K n_t^{−1/2} Σ_{i=1}^{n_t} Σ_{q=1}^p A^(l)_{j(q+(t−1)p)} [Vt,i]_q.

Then introduce its bootstrapped version and quantile:

T̃n^e = max_{l=1,...,L} max_{j=1,...,d} Σ_{t=1}^K n_t^{−1/2} Σ_{i=1}^{n_t} Σ_{q=1}^p A^(l)_{j(q+(t−1)p)} [ei Vt,i]_q,

Q̃α = inf{ u ∈ R : Pe(T̃n^e ≤ u) ≥ 1 − α }, α ∈ (0, 1).

Let n = min_{1≤k≤K} nk. Renumber the groups so that the first group has the smallest sample size. Assume

(D) nk/n = λk,n ∈ [1, ∞), k = 2, ..., K,

where the λk,n do not have to converge to a constant as n → ∞ but should remain bounded.

We decompose T̃n into a version of Tn:

T̃n = max_{l} max_{j} Σ_{t=1}^K Σ_{q=1}^p [ n^{−1/2} λ_{t,n}^{−1/2} Σ_{i=1}^n A^(l)_{j(q+(t−1)p)} [Vt,i]_q + n^{−1/2} λ_{t,n}^{−1/2} Σ_{i=n+1}^{n λ_{t,n}} A^(l)_{j(q+(t−1)p)} [Vt,i]_q ]
≈ max_{l} max_{j} Σ_{t=1}^K Σ_{q=1}^p [ n^{−1/2} λ_{t,n}^{−1/2} Σ_{i=1}^n A^(l)_{j(q+(t−1)p)} [Vt,i]_q + n^{−1/2} [λ_{t,n}]^{−1/2} Σ_{m=1}^n Σ_{r=1}^{[λ_{t,n}]−1} A^(l)_{j(q+(t−1)p)} [V_{t, n+(m−1)([λ_{t,n}]−1)+r}]_q ]
≈ max_{l} max_{j} Σ_{t=1}^K Σ_{q=1}^p A^(l)_{j(q+(t−1)p)} λ_{t,n}^{1/2} n^{−1/2} Σ_{i=1}^n [ [λ_{t,n}]^{−1} ( Vt,i + Σ_{r=1}^{[λ_{t,n}]−1} V_{t, n+(i−1)([λ_{t,n}]−1)+r} ) ]_q,

where we used a type of blocking, and γn ≈ δn means lim_{n→∞} γn/δn = 1 with probability 1. The approximation arises from taking integer parts [λ_{t,n}] of the λ_{t,n}. Define

Ã^(l)_{j(q+(t−1)p)} = A^(l)_{j(q+(t−1)p)} λ_{t,n}^{1/2}.

Finally, define the new variables

Ṽt,i = [λ_{t,n}]^{−1} ( Vt,i + Σ_{r=1}^{[λ_{t,n}]−1} V_{t, n+(i−1)([λ_{t,n}]−1)+r} ), t = 1, ..., K, i = 1, ..., n.

Note that E Ṽt,i = μt and the samples of these new variables are independent. Also note that T̃n is an analogue of Tn with the V's replaced by the Ṽ's. The variables Ṽt,i, t = 1, ..., K, are now stacked into new variables X̃i. We remark that X̃i, i = 1, ..., n, are independent but not identically distributed, even if the original Xi's were.

Corollary 2.5.2. Suppose the conditions of Lemma 2.3.1 are satisfied for X̃ and Ã^(l), l = 1, ..., L. Moreover, assume conditions (D) and (H) hold for nk, k = 2, ..., K, and A^(l), l = 1, ..., L. When

s ( max_{k=1,...,K} λ_{k,n} )^{1/2} ( Bn² log⁷(nLd) / ( n min_{k=2,...,K} λ²_{k,n} ) )^{1/6} → 0

or

( Bn² log³(nLd) / ( n^{1/3} min_{k=2,...,K} λ²_{k,n} ) )^{1/3} → 0,

we have P(T̃n ≥ Q̃α | H0) → α, and P(T̃n ≥ Q̃α | HA) → 1 provided ∃ j and l such that [A^(l) μ]_j ≠ 0.

Corollary 2.5.2 is proved similarly to the consistency of the test above.

2.5.3 Two-Way MANOVA with unequal observations

Consider the setup

Yi,j,k = μ0 + αi + βj + γij + ϵi,j,k,

where k ∈ {1, 2, ..., Ni,j}, i = 1, 2, ..., I and j = 1, 2, ..., J. Here μ0, αi, βj, γij are unknown p × 1 vectors of parameters, while the ϵi,j,k are mean-zero p × 1 random vectors with unknown covariance Σi,j.
To make the model identifiable, we are given a sequence of positive weights wij, i = 1, 2, ..., I, j = 1, 2, ..., J, so that Σ_{i=1}^I wi· αi = 0, Σ_{j=1}^J w·j βj = 0, Σ_{i=1}^I wij γij = 0, Σ_{j=1}^J wij γij = 0, and Σ_{i=1}^I Σ_{j=1}^J wij γij = 0, where wi· = Σ_{j=1}^J wij and w·j = Σ_{i=1}^I wij. Consider the null hypothesis

H0 : α1 = α2 = ... = αI = 0.    (2.5.1)

Compare (2.5.1) with the setup of unbalanced MANOVA with K = I groups and testing the equality of the means μk = μ0 + αk, where the k-th group consists of nk = Σ_{j=1}^J Nkj non-identically distributed observations

(Vk,1, Vk,2, ..., Vk,nk) = (Yk11, Yk12, ..., Yk1N_{k1}, Yk21, Yk22, ..., Yk2N_{k2}, ..., YkJ1, YkJ2, ..., YkJN_{kJ}).

Then the test statistic T̃n is well defined, and one rejects H0 if T̃n > Q̃α. Analogously, for the null hypothesis H0 : β1 = β2 = ... = βJ = 0, we consider K = J groups and the testing problem μk = μ0 + βk, where the k-th group consists of nk = Σ_{i=1}^I Nik non-identically distributed observations

(Vk,1, Vk,2, ..., Vk,nk) = (Y1k1, Y1k2, ..., Y1kN_{1k}, Y2k1, Y2k2, ..., Y2kN_{2k}, ..., YIk1, YIk2, ..., YIkN_{Ik}).

Then the test statistic T̃n is well defined, and one rejects H0 if T̃n > Q̃α. Finally, consider the null hypothesis H0 : γ11 = γ12 = ... = γIJ. This time we need to look at K = IJ groups, and the testing problem reduces to μk = μ0 + αi + βj + γij, with nij observations Vk,m = Yijm, k = i + (j − 1)I, i = 1, 2, ..., I, j = 1, 2, ..., J. Then the test statistic T̃n is again well defined, and we reject H0 if T̃n > Q̃α.

Unlike tests based on the L2 norm, such as the one proposed by Watanabe, Hyodo and Nakagawa (2020) [52], our tests do not need to estimate the unknown, unequal covariance matrices Σij; as a result, our tests avoid the estimation of the standard deviation of the test statistic, which is computationally expensive.

Remark: It is easy to extend our framework to a multi-level MANOVA setup, and we can allow the number of groups K to grow exponentially with n.

2.5.4 Linear hypothesis testing

Similar to Zhang, Guo and Zhou (2017) [56], we are now interested in the hypotheses H0 : Gμ = 0 vs HA : Gμ ≠ 0, where G is a known q × d matrix of rank q < d. This setup includes contrast tests and MANOVA. Note that (GG^T)^{−1} exists. Consider the d × d matrices

A^(l) = M^(l) [G^T (GG^T)^{−1} G]^T,

for some non-singular d × d matrices M^(l), l = 1, ..., L. The expression in square brackets is related to the Moore-Penrose inverse of G. This structural condition on A^(l) replaces condition (H). Under H0 we have A^(l) μ = 0, while under any alternative A^(l) μ ≠ 0. Then the test described above works for this setup, and Theorem 2.5.1 holds with (H) replaced by this new structure of the matrices A^(l).

Let (c1, ..., cK) be a non-zero vector in R^K. As an example, consider the hypotheses H0 : c1 μ1 + ... + cK μK = 0 versus HA : c1 μ1 + ... + cK μK ≠ 0. In this case G = (c1 Ip, ..., cK Ip), with Ip the p × p identity matrix, and

A^(l) = (M^(l) / ∥c∥2²) [ c1² Ip    c1 c2 Ip  ···  c1 cK Ip
                          c1 c2 Ip  c2² Ip    ···  c2 cK Ip
                          ···       ···       ···  ···
                          c1 cK Ip  c2 cK Ip  ···  cK² Ip ],

where ∥c∥2² = Σ_{k=1}^K ck² and M^(l) is an arbitrary non-degenerate d × d matrix. Recall that d = Kp. As in MANOVA, define the test statistic

Tn = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} [Vk,i]_q.

Then the bootstrapped test statistic is

Tn^e = max_{l=1,...,L} max_{j=1,...,d} n^{−1/2} Σ_{i=1}^n Σ_{k=1}^K Σ_{q=1}^p A^(l)_{j(q+(k−1)p)} ei [Vk,i − V̄k]_q.

Define the quantile of the bootstrapped distribution as

Qα = inf{ u ∈ R : Pe(Tn^e ≤ u) ≥ 1 − α }, α ∈ (0, 1).

We reject H0 if Tn ≥ Qα. Since A^(l) μ = 0 holds if and only if H0 holds, we have P(Tn ≥ Qα | H0) → α and P(Tn ≥ Qα | HA) → 1, provided the conditions of Theorem 2.5.1 are satisfied. This test allows us to treat high-dimensional MANOVA and contrast tests in a unified framework, unlike other tests.

2.6 Connection with other tests

In this section we show that the tests introduced in Xue and Yao (2020) [54] and Lin, Lopes and Muller (2021) [34] fit into our framework.

Two-sample test by Xue and Yao (2020): Consider the setup in Xue and Yao (2020). There are K = 2 independent groups of random vectors Vi1, Vi2, i = 1, ..., n, drawn from two populations in Rp with means μ1, μ2 ∈ Rp. Their test statistics are the maxima of A S_n^X, where A ∈ R^{2p×2p}, with elements denoted by ai,j for 1 ≤ i, j ≤ 2p, has 1 on the main diagonal and −1 on the two diagonals that start at a_{1,(p+1)} and a_{(p+1),1}. Indeed, Xi = (Vi1, Vi2)^T ∈ R^{2p} and

A Xi = ([Vi1 − Vi2]_1, ..., [Vi1 − Vi2]_p, [Vi2 − Vi1]_1, ..., [Vi2 − Vi1]_p)^T.

Then A S_n^X = (S_n^{V1} − S_n^{V2}, S_n^{V2} − S_n^{V1})^T. Therefore, their test statistic is

∥S_n^{V1} − S_n^{V2}∥_∞ = max_{q=1,...,p} |[S_n^{V1} − S_n^{V2}]_q| = max_{j=1,...,2p} [A S_n^X]_j.

Note that condition (H) holds for this A, with L = 1, A = {A} and s = 2. Their results are a special case of ours. Note that their proofs are specially designed for K = 2, and a direct generalization would be difficult; it would require intricate work with U-statistics in the same spirit as in Chen (2018).

MANOVA for functional data by Lin et al. (2021): Consider the setup in Lin et al. (2021). There are K independent groups of random vectors drawn from K populations with means μ1, ..., μK ∈ Rp. The k-th group after centering consists of independent observations Zk,1, ..., Zk,n in Rp. Now stack the vectors Z1,1, ..., ZK,1 into X1, and so on, Z1,n, ..., ZK,n into Xn; then Xi ∈ Rd with d = Kp. For τ ∈ [0, 1), the test statistics in Lin et al. (2021) are the maxima of A S_n^X, where the rows of the matrix A are of the form

(0, ..., 0, 1/(√2 σ^τ_{k,l,j}), 0, ..., 0, −1/(√2 σ^τ_{k,l,j}), 0, ..., 0),

with the non-zero entries in positions (k−1)p + j and (l−1)p + j for 1 ≤ j ≤ p. Here

σ²_{k,l,j} = 0.5 Var(X_{1,(k−1)p+j}) + 0.5 Var(X_{1,(l−1)p+j}).

Thus, A satisfies (H) with L = 2 and s = √2 / min_{k,l,j} σ^τ_{k,l,j}. From the setup in Lin et al. (2021), the indices (k, l) belong to a set P ⊂ {(i1, i2) : 1 ≤ i1 < i2 ≤ K}. Their test statistic is of the form of our test statistic with A = {A, −A}.

Note that our theorems provide at best rates for the Kolmogorov distance of order n^{−1/6} with respect to n and of order (log d)^{7/6} with respect to d, while Lin et al. (2021) get rates of order n^{−1/2+δ} for an arbitrary δ > 0, with d = Kp ≤ p e^{√(log n)}. Lin et al. (2021) attain this rate under a stringent requirement of a special structure of the matrices Σ^(i), with many restrictions that are difficult to check in practice. Their conditions essentially reduce the high-dimensional problem to one where p ≈ n^a/(log n) ∨ (log n)³ with a ∈ (0, 0.5).

2.7 Discussion

In this chapter we have introduced a framework of bootstrap tests that can address several different testing problems in the high-dimensional setup (n ≪ p) in a unified fashion.
This is done by considering tests statistics that are supremums of sums over sparse classes of convex sets of a novel type. These classes serve the role of a tuning mechanism, which can be chosen based on the particular problem. Basically for a hypothesis about means, one needs to select a finite number of sparse matrices A(l) , l = 1, . . . , L, such that A(l) µ = 0 under the null hypothesis. To get a test with high power under a specific alternative, one needs to select these matrices so that A(l) µ is maximized. For instance, in case of very sparse alternative, one needs several more dense matrices, however, that is controled by condition (L) and one can only ask for s = O(log(Ld)). On the other hand, for dense means one can use a single matrix (so L = 1) with just a few non-zero diagonals in which case s is a finite number. The resulting bootstrap tests have many advantages. In particular, they are consistent against any fixed alternative; they attain good level and power for large p - small n. They are distribution and correlation free. They are computationally fast, in particular they are faster by as much as p times in comparison to methods that require precision matrix estimation. Even for sparse alternatives, our tests have comparable performance to that of the existing specialized tests. We only require mild moment and tail assumptions on the distributions. We do not require that the ratios of sample sizes converge to a specific limit. Unlike current tests in literature, we do not require the samples to come from the same distribution, the tail and moment conditions have Bn that can grow with n in the non-identical case. We provide proofs for the bounds of ρn (CL,s ) with explicit dependency on the sparsity parameter s together with dependency on n and d. These bounds are similar 42 to the bounds in Chernozhukov et al. (2017) [19]. However, they are non-trivial extension of such results to a more complicated class of convex sets for which the dependency on sparsity of convex sets is tractable. Our proofs do not rely on delicate and complex geometric results employed in Chernozhukov et al. (2017) [19] for their s-sparse convex sets. The only drawback in our test is that our methods have the rate of (log(LKp))a n−1/6 which is quite far from a parametric rate. Even though our framework can accommodate tests of the type as in Lin et al. (2021) [34], exploiting the covariance structure, in particular the decay of covariance components with the dimension p leads to specialized tests with better near-parametric rates as in Lin et al. (2021) [34]. However, this covariance structure has to be verified in practice, which could be difficult in general apart from functional and sparse count data. Intuitively, such covariance structures give a dimension reduction mechanism and the effective dimension becomes of the same order as n. However this drawback has been dealt with by imposing stricter conditions and has been presented in [13]. Our work is different from Zhang, Zhou, He and Liu (2018) [57] who proposed MANOVA test that adapts to the sparsity of the alternative based on the data. Our work comes with exact rates for the tests, see Corollary 2.5.2, unlike their tests. Moreover, our methods allow for growing K = Kn as long as log(LKp) = o(nδ ) for δ ∈ (0, 1/7) under (N3a) condition. The computation is also much easier as we do not require estimation of individual elements of the covariances. More precisely, our tests require s6 Bn2 log7 (nLKp)n−1 = o(1) under (N3a). 
Then we can allow all the quantities s = sn , Bn , L = Ln , K = Kn , p = pn to grow with n. We can consider extreme particular cases. For example, if only s is allowed to grow then it can be as large as s = o(n1/6 ). If only p is allowed to grow, it can be as large as log p = o(n1/7 ). The same argument holds for K. One can rebalance different growth assumptions put on s, K, and p. These growth assumptions are better than those in Zhang 43 et al. (2018) [57]. From a probabilistic point of view, this work also introduces a novel class of sparse convex sets, for which the Berry-Esseen type results are obtained. This new class of convex sets generalizes the classes of half-spaces and hyper-rectangles to classes of “hyper-polygons“, which are linear transformations of intersections of a finite number of hyper-spaces. Such sets are sparse in the sense of Chernozhukov et al. (2017) [19]. We manage to explicitly track the effect of the sparsity parameter s on the rates. Under certain tail and moment assumptions on the distribution, this effect is linear. Chernozhukov et al. (2017) [19] did not establish this dependency explicitly, since it was buried in their complicated and intricate implicit geometric structures. 44 Chapter 3 Bootstrap based testing for equality of covariance matrices 3.1 Introduction The problem of testing the equality of the two covariance matrices in the two sample multi- variate set up is a classical problem. It has been thoroughly studied in the low-dimensional setting where the multivariate dimension p is fixed and smaller than the sample sizes, see Chapter 10, Anderson (2003) [3] and the references therein. In the context of high dimensional data where the number of components p either grow polynomially or even exponentially with increasing sample sizes, this problem has been addressed only in the last decade or so. The tests proposed by Schott (2007) [43] and Srivastava and Yanighara (2010) [46] are applicable only for multivariate normal populations. A U-statistic test based on an unbiased estimator of the Frobenius norm of the difference of the two population covariance matrices was proposed by Li and Chen (2012) [33]. Cai, Liu and Xia (2013) [11] proposed a test based on the maximum of the standardized differences between the entries of the two estimates of the two population covariance matrices. The tests of Li and Chen (2012) [33] and Cai et al. (2013) [11] work outside the regime of multivariate Gaussian populations as well. Further investigation by .Cai et al. (2013) [11] revealed that the statistic proposed by Li and Chen (2012) [33] fails to distinguish between the null and the alternative hypotheses when the difference between the two population covariance matrices fall under “dense" regime, i.e., when the number of non-zero elements in the difference matrix is not too high. On the other hand the test described in .Cai et al. (2013) [11] works well 45 when the difference matrix is sparse, i.e., when the two population covariance matrices can differ only in a very small number of entries. Although Cai et al. (2013) [11] showed that under certain regularity conditions their test enjoys some optimality in terms of the asymptotic power, it has been pointed out by Fan, Liao and Yao (2015) [24] that the convergence of the limiting distribution to Gumbel re- quires large sample sizes and some power-enhancement techniques. Chang, Zhou, Zhou and Wang (2017) [14] investigate the finite sample performance of a bootstrap version of the Cai et al. (2013) [11] test. 
Their technique involves using multiplier bootstrap approximation result for random vectors after vectorizing the covariance matrices. Their bootstrap method fails when the two populations means are unknown and unequal because then sample covari- ance matrices can no longer be expressed as sums of independent vectors. Moreover, they established consistency of their test under some restrictive conditions like sparsity and other correlational structures. The need for U-statistics based testing approach for covariance matrices can be moti- vated by noting the fact that the high dimensional central limit theorem fails because the sample covariance matrix can no longer be written as a vectorized sum of independent high dimensional vectors. 3.2 Overview In this chapter we propose a test for testing the equality of the two population covariance matrices in the high dimensional set up when the two populations means are unknown with minimal distributional and correlational assumptions. The proposed test is based on the maximum of the absolute differences between the entries of the Jacknifed estimators of the two population covariance matrices. We actually use a multiplier bootstrap version of this 46 test statistic. The proposed multiplier bootstrap procedure makes the size and power com- putation a lot faster. Moreover, the absence of distributional and correlational assumptions makes it applicable much more broadly, compared to the above mentioned tests. The pro- posed test is shown to be consistent against a large class of alternatives and is argued to be rate-optimal against the class of sparse alternatives. These results are obtained using the seminal works of Chernozukov, Chetverikov and Kato (CCK) (2013, 2015, 2017) [17], [18], [19] and Chen (2018) [16]. The chapter is organised as follows. Section 3.3 describes the testing problem along with the definition of the relevant quantities of interest. Section 3 has been split into two sections where Section 3.3 provides the bounds between the test statistic and its corresponding Gaus- sian counterpart. Section 3.4 provides a comprehensive discussion of the proposed jackknifed multiplier bootstrap method. Section 3.5 describes the testing procedure along with the def- inition of the relevant quantities of interest.This section also deals with the theoretical result which facilitates us to conduct the proposed test at level α. Section 3.6 provides a statisti- cal guarantee for the approximation between the true power of the test to its bootstrapped version. Section 3.6 allows us to prove the consistency of the proposed test. 3.3 Gaussian approximation result for U statistics In this section we shall prove the Gaussian approximation result for a large class of U statistics. Let Fj , j = 1, 2 be possibly two different d.f.’s on Rp . Let µj and Σj , j = 1, 2 denote their mean vectors and covariance matrices, respectively. Let X m represent the random sample X1 , · · · , Xm from F1 and Y n denote the random sample Y1 , · · · , Yn from F2 . We wish to test H0 : Σ1 = Σ2 versus the alternatives Ha : Σ1 ̸= Σ2 . The proposed test for this 47 problem is based on the difference of the estimates of the covariance matrices Σ1 , Σ2 , which in turn are U statistics. For that reason we shall first analyse some asymptotic properties of a general class of U statistics. Let h be a kernel function from Rp ×Rp 7→ Rp ×Rp that is symmetric under permutations, i.e., h(x1 , x2 ) = h(x2 , x1 ), for all x1 , x2 ∈ Rp and d = p × p. Thus h is a p × p matrix. 
Let vec(h) denote vector representation of h, i.e., vec(h) is the d-dimensional vector consisting of all entries of h. Assume E vec(h(X1 , X2 )) + E vec(h(Y1 , Y2 )) < ∞. Let Um X and U Y be the U-statistics defined for X m and Y n , respectively, as n    X 1 X Um := vec h(Xi , Xj ) , m(m − 1) 1≤i̸=j≤m   1  UnY X := vec h(Yi , Yj ) . n(n − 1) 1≤i̸=j≤n   The expected values of Um X is denoted by ΣX := E vec h(X1 , X2 ) and that of UmY is   denoted by ΣY := E vec h(Y1 , Y2 ) . We define the quantities √ X − ΣX ) √ m(Um n(UnY − ΣY ) Wm X = , WnY = , 2 2 which will play a key-role in the hypothesis testing as seen in later sections. We further   define the linear projection terms of Um X as g(x) := E vec h(X , X ) X = x − ΣX , and  1 2 1   that of UnY as g(y) := E vec h(Y1 , Y2 ) Y1 = y − ΣY , x, y ∈ Rp . The d × d covariance  matrices associated with g(X) and g(Y ) are defined accordingly as     ΓX = E g(X)g(X) , T ΓY = E g(Y )g(Y ) .T 48 For any measurable function f from Rp ×Rp → Rd that is symmetric under permutation, let fa denote the ath coordinate of f , a = 1(1)d. Further define for xj , yj ∈ Rp , j = 1, 2,   f (x1 , x2 ) := vec h(x1 , x2 ) − g(x1 ) − g(x2 ), f (y1 , y2 ) := vec h(y1 , y2 ) − g(y1 ) − g(y2 ). A kernel h : Rp × Rp 7→ Rp × Rp is said to be non-degenerate if Var(ga (X) > 0, ∀a = 1, 2, · · · , d. It is said to be completely degenerate if P g(X) = 0 = 1 or equivalently,  ∀x1 , x2 ∈ Rp .       E h(x1 , X2 ) = E h(X1 , x2 ) = E h(X1 , X2 ) = 0, For a sequence of real numbers δm,n , let Wm,n := Wm X +δ Y m,n Wn . Decompose Wm,n as Wm,n = Lm,n + Rm,n , where n  δ n  1 X X m,n X  Lm,n := √ g(Xi ) − Σ + √ g(Yj ) − ΣY , m n i=1 j=1 1 X δm,n X Rm,n := √ f (Xi , Xj ) + √ f (Yi , Yj ). 2 m(m − 1) 2 n(n − 1) 1≤i̸=j≤m 1≤i̸=j≤n Note that Rm,n is a degenerate U statistic while Lm,n is non-degenerate. It is reasonable to 49 expect that Lm,n would give a good approximation of Tm,n . Let ρ∗∗m,n √ (W X − ΣX ) √ (W Y − ΣY )  := sup P m m + δm,n ( n n )∈A A∈ARe 2 2 G1 G2   − P Tm + δm,n Tn ∈ A , Y ∈ A − P T G1 + δ G2     = sup P Um X +δ U T ∈ A , m,n n m m,n n A∈ARe G G where Tm 1 ∼D Nd (0, ΓX ), and Tn 2 ∼D Nd (0, ΓY ). To state the result about the Gaussian approximation of the U statistics we need the following assumptions. (a) There exists constants 0 < b < ∞ and δ2 > δ1 > 0 such that δ1 < |δm,n | < δ2 and h i 2 g 2 (Y ) > b, ∀ m ∧ n ≥ 1. inf 1≤a≤d E ga2 (X) + δm,n a (b) There exists a sequence of positive constants Bm,nl , l = 1, 2 such that the following holds with with ξ = X, ξ ′ = X ′ and ξ = Y, ξ ′ = Y ′ . h i ′ 2+l l , max E vec(h(ξ, ξ )) a ≤ Bm,n l = 1, 2, ∀ m ∧ n ≥ 1. 1≤a≤d (c) There exists a sequence of positive constants Bm,n such that the following holds with ξ = X, ξ ′ = X ′ and ξ = Y, ξ ′ = Y ′ , max ∥ vec(h(ξ, ξ ′ )) a ∥ψ1 ≤ Bm,n ,  ∀ m ∧ n ≥ 1. 1≤a≤d (d) log(d) ≤ b(m ∨ n), for some constants K, b > 0. 50 We are now ready to present the following theorem which provides an approximation of the error bound estimate between the probability of interest and its Gaussian counterpart. Theorem 3.3.1. Under the above set up and assumptions (a)–(d), the following holds. Y ∈ A − P T G1 + δ G2     ρ∗∗ m,n = sup P Wm X +δ W m,n n m T m,n m ∈ A ≲ ϖmX + ϖY , n A∈ARe Bm,n log7 (md) 1/6 Bm,n log7 (nd) 1/6  2   2  where ϖmX = Y and ϖn = . m n Remark 3.3.1. Condition (a) specifies the restriction on the sequence of constants δm,n to be bounded and ensures the non-degeneracy of the samples X and Y . Condition (c) imposes a condition on the third and fourth order moments. 
In the existing literature Chen and Li (2012) [33], Cai et al. (2013) [11] and Chang et al. (2017) [14] assumed the third order moments to be bounded whereas we allow them to diverge to infinty at a rate specified in condition (d). A similar assumption of uniform boundedness on the tails of X and Y were made by Chang et al. (2017) [14] and .Cai et al. (2013) [11]. In our case Condition (c) implies the tails of the distribution of X and Y can diverge to infinity in accordance with (d). All these condition are either the same or weaker than those appearing in the above references. Proof. This theorem can be seen as a two sample version of the Theorem 2.1 of Chen (2018). Some detailed calculations are still needed so we provide the proof for the sake of completeness. The proof uses the bounds obtained from Lemma (.0.12) and Lemma (.0.13). 51 Let  2 X := max E g (X) , 3 D2X := max E vec(h(X1 , X2 )) a , Dg,3 a 1≤a≤d 1≤a≤d  4 D4X := max E vec(h(X1 , X2 )) a . 1≤a≤d The Jensen’s inequality and assumption (b) yield that  2  3 D2X = max E vec(h(X1 , X2 )) a ≤ max E vec(h(X1 , X2 )) a . a a For bounding the term Dg,3X , note that    ga (x) := E vec(h(X, X2 )) a X = x , x ∈ R, 3   3 E ga (X) = E E vec(h(X1 , X2 )) a X1    3  ≤ E E vec(h(X1 , X2 )) a X1   3 X . = E vec(h(X1 , X2 )) a ≤ Bm,n = D̄g,3 By the condition (b), we readily obtain D4X ≤ Bm,n 2 . For the term Mh,4 X (τ ) at τ = 0, by the Cauchy Schwarz inequality, we obtain that for some 52 constant C1 > 0, X (0) = E h  4  i Mh,4 max max vec(h(X1 , X2 )) a I max vec(h(X1 , X2 )) a > 0 1≤i̸=j≤n 1≤a≤d 1≤a≤d h  i1 h i1 max | vec(h(X1 , X2 )) a |8 2 P( max | vec(h(X1 , X2 )) a | > 0) 2  ≤E max 1≤i̸=j≤n 1≤a≤d 1≤a≤d  8 ≤ C1 max max vec(h(X1 , X2 )) a ψ2 1≤i̸=j≤n 1≤a≤d 1  4 ≤ C1 max max | vec(h(X1 , X2 )) a ψ . 1≤i̸=j≤n 1≤a≤d 1 Use this bound and Lemma 2.2.2 of Van der Wart and Wellner (1996), pg.96, to obtain that h i4 X (0) Mh,4 ≤ log4 (md) max max ∥ vec(h(X1 , X2 )) a ∥ψ1 ≤ log4 (md)Bm,n  4 . (3.3.1) 1≤i̸=j≤n 1≤a≤d 53 By (3.3.1) and the definitions of the entities involved, 3 1 X )4 X )2 log4 d −1/6  ϕm (log d) 2 (Mh,4 (D̄g,3 log d  32  ≤ C2 X ) 14 (Mh,4 m m m 2 m1/6  log d  3 5/6 3 (log d)(log(md))(Bm,n ) = C2 2 X ) 41 (Mh,4 ≤ C2 1 2 m 5 X ) 3 log 3 d (D̄g,3 m6 11 2 (log(md)) 6 Bm,n 3 = C2 5 m6  2 1  4 Bm,n (log(md))7 6 log(md) 6 X, ≤ C2 ≤ C2 ϖm m m 2 4 Dg,3 log d −1/6 log(d) 1/3 log(d) 1/2   ϕm √ D2 ≤ √ Bm,n m m m  2 Bm,n (log(nd))7 1/6   1/6 1 X, ≤ ≤ C3 ϖm n 2 log5 (dm) Bm,n log5/4 d 1/4  D2 log4 d −1/6 log5/4 d g,3 1/2 ϕm D4 ≤ C4 Bm,n n 3/4 m m 3/4  2 7 1/6  1/12 Bm,n (log(nd)) log(d) 7/12   1 X. ≤ 2 ≤ C4 ϖm n m Bm,n By Lemma C.1 of CCK (2017), applied with Bm,n = D̄g,3 X , we obtain that for some universal constant c∗ > 0,  √ 3 m M3X (ϕm ) ≲ + Bm,n log(d) ϕm log(d)  √  m + exp −   , 4c1 ϕm Bm,n (log(d))2 √ !−1/3 (Bm,n )2 (log7 (dm)   m ≳ log(dm) 4c1 ϕm Bm,n (log(d))2 m ≳ c∗ log(dm). 54 √ √ m m √ Because ϕm ≥ 2, ≲ log(d) ≲ m, and ϕm log(d) !1/2 (Bm,n )2 (log2 (dm) √ √ Bm,n log(d) = m≲ m. m Combining these bounds we obtain ∗ X (ϕ ) ≲ m3/2 (md)−c ≲ m−1/2 . Mg,3 m G Using similar arguments it can be concluded that M3 1 (ϕm ) ≲ m−1/2 . The last two facts in turn yield that M3X (ϕm ) = Mg,3 X (ϕ ) + M G1 (ϕ ) ≲ m−1/2 . m 3 m Moreover, X )2 log7 d 1/6 (D̄g,3 M3X (ϕm )  + X m D̄g,3 Bm,n log7 d 1/6  2  1 ≤ +√ m mBm,n Bm,n log7 d 1/6 2 log7 (md) 1/6  2 Bm,n   = + (Bm,n )−4/3 (log(md))−7/6 m−1/3 m m Bm,n log7 (md) 1/6  2  ≲ . m Using all the above facts we finally conclude that, Bm,n log7 (md) 1/6 Bm,n log7 (nd) 1/6  2   2  ρ∗∗ m,n ≲ + . 
m n 55 This completes the proof of the Theorem. 2 The conditions required to prove the above theorem are weaker than those in the existing literature. Condition (a) states that the second moments of vec(h(X1 , X2 )), vec(h(Y1 , Y2 )) be bounded away from zero. It is worth noting that unlike Cai et al. (2013) [11] and Li and Chen (2012) [33], the condition (b) does not require a common fixed bounds on the moments. Here the tail can grow in an uniform manner as Bm,n grows to infinity. Li and Chen (2012) [33] considered some structural assumption on the traces of the two covariance matrices and Cai et al. (2013) [11] imposed some structural assumptions like correlation and sparsity among the components of X m and Y n . Schott(2007) [43], Srivastava and Yanighara (2010) [46] had the strict conditions of normality on X m and Y n . As stated before the conditions assumed here are much weaker in the sense that no specific distributional assumption or additional correlational assumption nor any uniformity of the moment conditions are required. Although Theorem 3.3.1 acts as a foundational stone towards the Gaussian approximation of the distribution of Wm m,n Wn , but because the limiting distribution is unknown, this X +δ Y theorem is of little use in implementing any test based on Wm m,n Wn for the large sample X +δ Y sizes. To circumvent this problem we are proposing bootstrap approximation in Theorem 4.1 in the next section, which acts a crucial step towards bridging this gap. 3.4 Jackknifed Multiplier Bootstrap Approximation for U statistics In this section instead of applying re-weighted multiplier bootstrap to estimate the unknown covariance matrix we employ the jackknifed version of multiplier bootstrap approximation with jackknifed estimator of the covariance matrix. One reason behind choosing this strategy is that the i.i.d re-weighted bootstrap or naive multiplier bootstrap techniques are proven to be slower than the jackknifed counterpart, see, e.g., Section 3 in Chen (2018) [16]. 56 G Let e1 , e2 , · · · , em+n be i.i.d standard normal r.v.’s that are independent of X m , Y n , Tm 1 G and Tn 2 . Define the Jack-knife versions of Tm X and T Y as follows. n m h m eX := √1 X 1 X  i X e. Tm vec h(Xi , Xj ) − Um i m m−1 i=1 j̸=i=1 n h n 1 1 i TneY := √ vec h(Yi , Yj ) − UnY ei+m . X X  n n−1 i=1 j̸=i=1 Define the Jackknife estimators of the corresponding covariance matrices of Tm X and T Y as n m XXn 1 X T, on o Γ̂JK X X   m := vec h(X i , Xj ) − U m vec h(X i , X j ) − U m (m − 1)(m − 2)2 i=1 j̸=i k̸=i n oT 1 XXXn on Γ̂JK n := vec h(Yi , Yj )) − Um Y vec h(Yi , Yj )) − Um . Y (n − 1)(n − 2)2 i=1 j̸=i k̸=i Let (m − 2)2 JK (n − 2)2 JK Γ̃JK m := Γ̂ , Γ̃JK n := Γ̂ , (3.4.1) m(m − 1) m n(n − 1) n ∆m,n = (Γ̃JK X 2 JK m − Γ ) + δm,n (Γ̃n − Γ ) . Y ∞ For any two random vectors ξ, ζ, the notation ξ|ζ denotes the conditional distribution of ξ, given ζ. Note that Tm eX |X m is N 0, Γ̃JK and T eY |Y n is N 0, Γ̃JK . We are ready   1 d m n 1 d n to state the following lemma which plays a crucial role towards obtaining the bootstrap approximation result. Lemma 3.4.1. Let Z1X , Z2Y be two independent random vectors such that Z1X |X m ∼ Nd (0, Γ̃JK Y n JK m ) and Z2 |Y ∼ Nd (0, Γ̃n ). Then, for some constant 0 < C < ∞ 57 and every sequence of real numbers ∆ ¯ m,n > 0, on the event {∆m,n ≤ ∆ ¯ m,n }, G G     sup P Z1X + δm,n Z2Y ∈ A|X m , Y n − P Tm 1 + δm,n Tn 2 ∈ A ≤ C(∆ ¯ m,n )1/2 log d. A∈ARe Proof. 
The proof is an immediate consequence of Theorem 5.1, CCKK (2022) with Z = G G 2 ΓY ) and Z X + δ Tm 1 + δm,n Tn 2 ∼ Nd (0, ΓX + δm,n Y X m , Y n ∼ Nd (0, Γ̃JK   1 m,n Z2 m + 2 Γ̃JK ). δm,n n Remark 3.4.1. Lemma 3.4.1 is instrumental in deriving the rates of Jackknife version of the U-statistics. It also is an improvement over similar results of Chen (2018) [16], Proposition 5.4. The impact of this result can be appreciated by noting the fact that the rate of bootstrap log (nd) 1/6 log (nd) 1/4  5   5  approximation has improved from n to n . This implies that the boostrapped version of the statistic converges to a Gaussian distribution at a rate of n−1/4 . For the next theorem we need the following condition. (e) log(1/γm,n ) ≤ K log(d(m ∨ n)), for some positive constants γm,n , K, b. Theorem 3.4.1. Under the above set up and assumptions (a)–(c)and (e), the following holds. For a γm,n < 1/56, with probability at least 1 − 56γm,n , G G ρJK eX eY 1 n,m = sup |Pe (Tm + δm,n Tn ∈ A) − P(Tm + δm,n Tm ∈ A)|, 2 A∈ARe BX (γ BY ≲ ϖm m,n ) + ϖn (γm,n ), 58 where Bm,n log5 (md) log2 (1/γm,n ) 1/4  2  BX (γ ϖm m,n ) = , m Bm,n (log5 nd) log2 (1/γm,n ) 1/4  2  BY ϖn (γm,n ) = . n This theorem provides a theoretical guarantee towards the Gaussian approximation term and its jackknife covariance multiplier bootstrap counterpart. Since the multiplier bootstrap term can be estimated it would facilitate its use to quantify the error bound of approxima- tion ρJK n,m . The entity ρn,m provides an upper bound to the error of approximation of the JK bootstrap distribution of the test statistic, viz., Tm eX +δ m,n Tn by the Gaussian counterpart. eY Proof. For any sequence of constants ∆ ¯ m,n > 0, on the event {∆ ˆ m,n ≤ ∆¯ m,n }, we have ρJK ¯ m,n ≲ (∆m,n ) 1/2 log d. Now we shall first bound the quantity ˆ m,n = (Γ̂JK − ΓX ) + δ 2 (Γ̂JK − ΓY ) . ∆ m m,n n ∞ We would be using Ki > 0, i = 1, 2, · · · to denote the absolute constants. The main goal is to use the previous lemma to find a real sequence ∆ ¯ m,n such that P(∆ ˆ m,n ≥ ∆¯ m,n ) ≤ γm,n . 59 Finally we would bound (∆ ¯ m,n )1/2 log d. Now, to bound ∆ ¯ m,n , we rewrite Γ̂JK as, m m XXn 1 X T, on o Γ̂JK X X   m = vec h(X i , X j ) − Um vec h(X i , Xj ) − U m (m − 1)(m − 2)2 i=1 j̸=i k̸=i   X 1 1  T = 2 1 − vec(h(Xi , Xj ))}{vec(h(Xj , Xj )) (m − 1)(m − 2) m 1≤i̸=j≤m  T X   + vec(h(Xi , Xj )) vec(h(Xi , Xk )) 1≤i̸=j̸=k≤m  1 1 X   T − vec(h(Xi , Xj )) vec(h(Xl , Xj )) (m − 1)(m − 2)2 m 1≤i̸=j̸=l≤m T X   + vec(h(Xi , Xj )) vec(h(Xi , Xj )) 1≤i̸=j≤m T X   + vec(h(Xi , Xj )) vec(h(Xj , Xk )) 1≤i̸=j̸=k≤m T X   + vec(h(Xi , Xj )) vec(h(Xi , Xl )) 1≤i̸=j̸=l≤m  T X   + vec(h(Xi , Xj )) vec(h(Xl , Xk )) , 1≤i̸=j̸=l̸=k≤m = Γ̂JK JK m1 − Γ̂m2 , (say). Let h i T ΓX  1 = E vec(h(X1 , X2 )) vec(h(X1 , X3 )) , h i h iT ΓX2 = E vec(h(X 1 , X2 )) E vec(h(X 1 , X 2 )) . Then, ΓX = ΓX 1 − Γ2 . We assume, without loss of generality, that X Bm,n log5 (md) log2 (1/γm,n ) Bm,n log5 (nd) log2 (1/γm,n )  2   2  ≤ 1, ≤1 m n 60 since, otherwise Theorem 3.4.1 holds trivially. We shall deal with several cases as follows. We shall first obtain a rate bound for Γ̂JK m1 − ΓX1 . We rewrite this difference as the sum of two U-statistics. In other words we write Γ̂m1 = JK (n − 3)! P m,1,1 + Γ̂m,1,2 , where Γ̂m,1,1 = Γ̂JK JK JK vec(h(x1 , x2 )) vec(h( x1 , x3 )) and   n! 1≤i̸ =j̸ = k≤m (n−2)! T Γ̂JK P   m,1,2 = n! 1≤i̸=j≤n vec(h(x1 , x2 )) vec(h(x2 , x2 )) . 
We shall first handle the leading term 1 ≤ i ̸= j = ̸ k ≤ m for which the kernel is Ha (x1 , x2 , x3 ) = vec(h(x1 , x2 )) a vec(h(x1 , x3 )) a , i = 1, 2, 3, xi ∈ Rp . and a :=   1 2 (a1 , a2 ), with 1 ≤ aj ≤ d, j = 1, 2. The notation 1 ≤ a ≤ d means a = (a1 , a2 ), 1 ≤ a1 , a2 ≤ d. Let r = [n/3], for any Xi ∈ Rp and define (n − 3)! Γ̂JK Zm,1,1 := r|Γ̂JK X X m,1,1 := n! H(Xi , Xj , Xk ), m,1,1 − Γ1 |∞ , 1≤i̸=j̸=k≤m X 3i+3  Mm,1,1 := max max Ha X 3i+1 . 0≤i≤r−1 1≤a≤d  3i+3  where, Ha X 3i+1 = Ha (X3i+1 , X3i+2 , X3i+3 ). Note that Γ̂JK m,1,1 is a U statistic of order 3 and E[Γ̂JK m,1,1 ] = Γ1 . X By applying Lemma .0.11 with α = 12 , η = 1 and δ = 12 , we obtain that, t2   1/2   X X    t P Zm,1,1 ≥ E[Z̄m,1,1 ]+t ≤ exp − X )2 +3 exp − X , ∀ t > 0, 3(¯ ς1,1,1 K1 ∥Mm,1,1 ∥ψ 1 2 61 where r−1    r−1 Xh i 3i+3  X )2 2 X H̄a (X)3i+3 X  (¯ ςm,1,1 := max E Ha X 3i+1 , Z̄m,1,1 := max 3i+1 − EH̄a , a a i=0 i=0  3   3    H̄a x 1 := Ha x 1 I max Ha (x)31 ) ≤ τ , a = (a1 , a2 ), 1 ≤ a1 , a2 ≤ d. a Note that   r−1 Xh  3i+3  i 21 E[Z̄m,1,1 ] ≤ K2 (log(d)1/2 max E(H̄a X 3i+1 − EH̄a )2 a i=0  h    3i+3  2 i1/2 + (log(d)) E max H̄a X 3i+1 − E[H̄a ] , i,a r−1 3i+3  1/2   X   ≤ K2 (log(d)) 1/2 max 2 E H̄a X 3i+1 a i=0 2 1/2       3i+3 + (log(d)) E max H̄a X 3i+1 − EH̄a , i,a n o ≤ K2 (log(d))1/2 ς¯1,1,1 X + (log(d))∥M X ∥ m,1,1 ψ . 1/2 By applying Cauchy-Schwarz and Lyapounov’s inequalities along with the condition (b),     1 2 3i+3    4 2 E Ha X 3i+1 ≤ E vec h (X)3i+1 , (X)3i+2 a 1  1   4 2 × E vec h (X)3i+1 , (X)3i+3 a 2 2 . ≤ Bm,n Therefore, X √ ς¯m,1,1 ≤ rBm,n ≤ mBm,n . 62 By the condition (b) and Pisier’s inequality (.0.8) we can conclude that X ∥Mm,1,1 ∥ψ ≤ K3 log2 (rd) max ∥{vec(h((X)3i+2 2 3i+1 ))}a ∥ψ ≤ K3 Bm,n 2 log(md)2 . 1/2 i,a 1/2 Coupled with condition (e) we get,   2 EZ̄m,1,1 ≤ K4 (mBm,n log(d)) 1/2 2 + Bn (log(d)) log(md) , 2 ≤ 2K4 (mBm,n2 log(md))1/2 .   P |Γ̂JK − Γ X | ≥ K (m−1 B 2 log(md))1/2 + t) m,1,1 1 ∞ 5 m,n (mt)2     1/2  mt ≤ exp − 2 + 3 exp − , K6 3mBm,n K7 Bm,n2 log2 (md) √ mt2      mt = exp − 2 + 3 exp − . K6 3Bm,n K8 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Now choose t = t∗ = K8 , where the constant K8 > 0 is large m enough. Then, −1 ) log(md) log2 ( 1 )     P |Γ̂JK m,1,1 − ΓX1 |∞ ≥ 2t∗ ≤ exp − (K82 K10 γm,n  1 −1 )m1/4 (log 12 (1/γ − 34 −1  + 3 exp − (K82 K12 m,n )) log (md)B 2 m,n Now for d ≥ 3 and for some γm,n small enough, so we obtain that log(md) ≥ 1 and 1 ) ≥ 1. Therefore for K large enough, log( γm,n 8   X | ≥ 2t∗ ≤ γ K 2 /K K 1/2 /K P |Γ̂JK m,1,1 − Γ1 ∞ m,n 8 10 + 3 γ m,n 8 12 ≤ 4γ m,n . 63 By similar arguments as above, we conclude that,  2 Bm,n log(1/γm,n ) 1/4  1/2   |Γ̂JK − ΓX 2 ˙ P m,1,1 1 |∞ log (md) ≥ K13 m log(md) ≤ 4γm,n ,  1/2  JK X P |Γ̂m,1,1 − Γ1 |∞ log (md) 2 BX ≥ K13 ϖ1 (γm,n ) ≤ 4γm,n . Next, to analyse the second term in the expression of Γ̂JK m1 , let P 1≤i̸=j≤m T (n − 2)! Γ̂JK   X H(x1 , x2 ) := vec(h(x1 , x2 )) vec(h(x2 , x2 )) , m,1,2 = H(Xi , Xj ), n! 1≤i̸=j≤n h i X  T Γ1,2 := E vec(h(X1 , X2 )) vec(h(X2 , X2 )) . Then Γ̂m,1,2 is a U-statistic of order 2. With r = [m/2], define  2i+2  X r|Γ̂JK − ΓX X Zm,1,2 = m,1,2 1,2 |∞ , Mm,1,2 = max max Ha X 2i+1 , 0≤i≤r−1 1≤a≤d r−1    r−1 2 2i+2  Xh i X 2 X H̄a ((X)2i+2 X ς¯m,1,2 = max E Ha X 2i+1 , Z̄m,1,2 = max 2i+1 ) − E H̄a , a a i=0 i=0     2 2  2   H̄a X 1 = Ha X 1 1 max Ha X)1 ≤ τ , τ > 0. a Let τ = 8E[Mm,1,2 X ]. By Lemma .0.11, applied with α = 21 , η = 1 and δ = 12 , we obtain that ∀ t > 0, " 1/2 # t2     X X  t P Zm,1,2 ≥ E[Z̄m,1,2 ] + t ≤ exp − X + 3 exp − X . 
3(¯ςm,1,2 )2 K1 ∥Mm,1,2 ∥ψ 1/2 64 But  h r−1 1i 1/2 E(H̄a ((X)2i+2 2 2 X E[Z̄m,1,2 ] ≤ K14 (log(d) max 2i+1 ) − E H̄ a ) a i=0 h i1/2  2i+2 + (log(d)) E[max |H̄a ((X)2i+1 ) − E[H̄a ]| 2 , i,a    X p ≤ K14 log(d)(¯ςm,1,2 ) + (log(d)) ∥Mm,1,2 ∥ψ . 1/2 Apply the Cauchy-Schwarz inequality and condition (c) to obtain  4        2 2i+2   2 . E Ha X 2i+1 ≤ E vec h X 2i+1 , (X)2i+2 ≤ Bm,n a so that, X √ ς¯m,1,2 ≤ rBm,n ≤ mBm,n . By a property of Orlicz norm, ∥X 2 ∥ψ = ∥X∥2ψ . By condition (d) and Pisier’s inequality 1/2 1 (.0.8), we obtain     2 X 2i+2  Mm,1,2 ≤ K15 log2 (rd) max vec h X 2i+1 ≤ K15 Bm,n 2 log2 (md). ψ1/2 i,a ψ1/2 65 Use condition (e) to derive   2 log(md) 1/2 ,   2 EZ̄m,1,2 ≤ K16 (mBm,n log(d)) 1/2 2 2 + Bn (log(d)) log(md) ≤ 2K16 mBm,n   P |Γ̂JK m,1,2 − ΓX1,2 |∞ ≥ K16 (m−1 Bm,n2 log(md))1/2 + t) (mt)2  1/2     mt ≤ exp − 2 + 3 exp − , K17 3mBm,n 2 log2 (md) K18 Bm,n √ (mt)2      mt = exp − 2 + 3 exp − . K17 3mBm,n K19 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Choose, t = t∗ = K20 , for some constant 0 < K20 < ∞(large m enough). Then, we have      |Γ̂JK − ΓX 2t∗ 2 K −1 ) log(md) log2 P m,1,2 1,2 |∞ ≥ ≤ exp − (K20 17 1/γm,n  1 2 −1 1/4 1/2 − 43 −12  + 3 exp − (K20 K19 )n log (1/γm,n ) log (md)Bm,n . Now we see that 2 1/2 K20 K20 K17 K   P |Γ̂JK X ∗ ≤ + 3γm,n19 m,1,2 − Γ1,2 |∞ ≥ 2t γm,n ≤ 4γm,n ,   B 2 log(md) log2 (1/γm,n ) 1/4  JK X 1/2 m,n P |Γ̂m,1,2 − Γ1,2 |∞ log(md) ≥ K21 log(md) ≤ 4γm,n , m   P |Γ̂JK − ΓX |1/2 log(md) ≥ K ϖ BX (γ ) ≤ 4γm,n . m,1,2 1,2 ∞ 21 1 m,n Note that by the Cauchy-Schwarz and Lyapounov’s inequalities and condition (b), we can 66 2/3 see that, |Γ1,2 |∞ ≤ Bm,n , from which we get m−1 |Γ1,2 |∞ ≤ t∗ /2. Therefore, |Γ̂JK JK m,1,2 |∞ ≤ |Γ̂m,1,2 − Γ1,2 |∞ + |Γ1,2 |∞ . Again, n JK 3t∗ o n JK 3t∗ o |Γ̂m,1,2 |∞ ≥ ⊆ |Γ̂m,1,2 − Γ1,2 |∞ + |Γ1,2 |∞ ≥ 2 2 n 3t ∗ nt ∗o ⊆ |Γ̂JK m,1,2 − Γ1,2 |∞ ≥ 2 + 2 . Therefore,  3t∗   JK − Γ | ≥ (3 + n)t ∗ P |Γ̂JK m,1,2 ∞| ≥ ≤ P |Γ̂m,1,2 1,2 ∞ ≤ 4γm,n . 2 2 Next, consider Γ̂JK m2 . It can be decomposed as a sum of 5 U-Statistics. To deal with T the leading term 1≤i̸=j̸=k̸=l≤m , let H(x1 , x2 , x3 , x4 ) = vec(h(x1 , x2 )) vec(h(x3 , x4 )) P   and define (m − 4)! Γ̂JK X m,1,4 = m! H(Xi , Xj , Xk , Xl ). 1≤i̸=j̸=k̸=l≤m Note that Γ̂JK m,1,4 is a U statistics of order 4 and E[Γ̂m,1,4 ] = Γ2 . Let r = [m/4] and define JK X  4i+4  X r|Γ̂JK − ΓX X Zm,1,4 = m,1,4 2 |∞ , Mm,1,4 = max max Ha X 4i+1 , 0≤i≤r−1 1≤a≤d r−1    r−1 2 4i+4  Xh i X 2 X H̄a ((X)4i+4 X ς¯m,1,4 = max E Ha X 4i+1 , Z̄m,1,4 = max 4i+1 ) − EH̄a , a a i=0 i=0     4 4  X 1 1 max Ha X)41 ≤ τ   H̄a X 1 = Ha , τ > 0, a = (a1 , a2 ), a ∀a1 , a2 = 1, 2, · · · , d. 67 By Lemma .0.11 with α = 21 , η = 1 and δ = 12 . We have ∀t > 0 that t2      1/2  X X  t P Z1,1,4 ≥ E[Z̄1,1,4 ] + t ≤ exp − X )2 + 3 exp − X ∥ . 3(¯ς1,1,4 K1 ∥M1,1,4 ψ1 2 Moreover,   r−1 X  4i+4  2  21 E[Z̄1,1,4 ] ≤ K22 (log(d)1/2 max E H̄a X 4i+1 − EH̄a a i=0  4i+4 2 1/2  + (log(d))[E max H̄a ( X 4i+1 ) − E[H̄a ] , i,a r−1 4i+4 1/2   X   ≤ K22 (log(d)) 1/2 max 2 E H̄a X 4i+1 , a i=0  1/2  4i+4 2 + (log(d)) E max H̄a ((X)4i+1 ) − EH̄a , i,a ≤ K22 {(log(d))1/2 ς¯1,1,4 X + (log(d))∥M X ∥ 1,1,4 ψ . 1/2 By the Cauchy-Schwarz inequality inequality and Condition(b), " # "  #1  4 2 E Ha2 ((X)4i+4 4i+1 ) ≤ E vec h((X)4i+1 , (X)4i+2 ) a1 "  #1   4 2 × E vec h((X)4i+3 , (X)4i+4 ) 2 . ≤ Bm,n a2 Therefore, X √ ς¯1,1,4 ≤ rBm,n ≤ mBm,n . 68 By Condition (b) and the Pisier’s inequality (.0.8),     2 X ∥ ∥M1,1,4 ≤ K23 log2 (rd) max vec h (X)4i+14i+2 2 log(md)2 . 
≤ K23 Bm,n ψ1/2 i,a m ψ1/2 This bound together with condition (e) yield that n o EZ̄1,1,4 ≤ K24 (mBm,n 2 log(d))1/2 + B 2 (log(d)) log(md)2 n 2 log(md) 1/2 ≤ 2K24 mBm,n ,   P |Γ̂JK − ΓX | ≥ K (m−1 B 2 log(md))1/2 + t) m,1,4 2 ∞ 25 m,n (mt)2 1/2      mt ≤ exp − 2 + 3 exp − , K26 3mBm,n 2 log2 (md) K27 Bm,n √ (mt)2      mt ≤ exp − 2 + 3 exp − . K26 3mBm,n K28 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Apply the above bound with t = K29 , for constant K29 > 0 m (large enough). Then, the above bound becomes 2 K −1 ) log(md) log2 ( 1 ))   P |Γ̂JK m,1,4 − ΓX2 |∞ ≥ 2t ≤ exp(−(K29 26 γm,n 1 1 − 21 2 −1 1/4 − 34 + 3 exp(−(K29 K28 )n (log (1/γm,n )) log (md)Bm,n ). 2 69 Now we see that, 2 1/2 K29 K29 K26 K   P |Γ̂JK X ∗ ≤ + 3γm,n28 m,1,4 − Γ2 |∞ ≥ 2t γm,n ≤ 4γm,n ,   B 2 log(md) log2 (1/γm,n ) 1/4  JK X 1/2 m,n P |Γ̂m,1,4 − Γ2 |∞ log(md) ≥ K30 log(md) ≤ 4γm,n , m   P |Γ̂JK − ΓX |1/2 log(md) ≥ K ϖ BX (γ ) ≤ 4γm,n . m,1,4 2 ∞ 30 1 m,n Next, to analyse the second term m2 , let r = [m/2] in the expression of Γ̂JK P 1≤i̸=j≤m and T (m − 2)! Γ̂JK   X  H(x1 , x2 ) = vec(h(x1 , x2 )) vec(h(x1 , x2 )) , m,1,3 = H Xi , Xj , m! 1≤i̸=j≤m h i T ΓX , Zm,1,3 = r Γ̂JK X  1,3 := E vec(h(X 1 , X 2 ) vec(h(X 1 , X 2 )) m,1,3 − Γ1,3 ∞ ,    X 2i+2 Mm,1,3 = max max Ha X 2i+1 , ∀ a = (a1 , a2 ), a1 , a2 = 1, 2, · · · , d. 0≤i≤r−1 1≤a≤d Then Γ̂JK m,1,3 is a U-statistic of order 2. Let τ = 8E[Mm,1,3X ]. By Lemma .0.11, applied with α = 21 , η = 1 and δ = 12 , we obtain that ∀ t > 0, t2 1/2      X X   t P Zm,1,3 ≥ E[Z̄m,1,3 ]+t ≤ exp − X + 3 exp − X , 3(¯ςm,1,3 )2 K1 ∥Mm,1,3 ∥ψ 1 2 70 where r−1    r−1 Xh i 2 2i+2  X 2 X H̄a ((X)2i+2 X ς¯m,1,3 = max E Ha X 2i+1 , Z̄m,1,3 = max 2i+1 ) − EH̄a , a a i=0 i=0     2 2  2  H̄a X 1 = Ha X 1 1 max Ha X)1 ≤τ , a = (a1 , a2 ), 1 ≤ a1 , a2 ≤ d. a Moreover,  h r−1 1i 1/2 E(H̄a ((X)2i+2 2 2 X E[Z̄m,1,3 ] ≤ K31 (log(d) max 2i+1 ) − E H̄ a ) a i=0 h i1/2  2i+2 + (log(d)) E[max |H̄a ((X)2i+1 ) − E[H̄a ]| 2 , i,a    X X ∥ p ≤ K31 log(d)(¯ ςm,1,3 ) + (log(d)) ∥M1,1,3 ψ1/2 . Now again by applying the Cauchy-Schwarz inequality and condition (c), we obtain that   2i+2      4  2 E Ha X 2i+1 ≤ E vec ha X 2i+1 , (X)2i+2  ≤ Bm,n 2 . Therefore, X √ ς¯m,1,3 ≤ rBm,n ≤ mBm,n . By a property of Orlicz norm, we have ∥X 2 ∥ψ = ∥X∥2ψ . The condition (d) and Pisier’s 1/2 1 inequality (.0.8) yield that    2 X 2i+2  Mm,1,3 ≤ K32 log2 (rd) max vec ha X 2i+1 ≤ K32 Bm,n 2 log(md)2 . ψ1/2 i,a ψ1/2 71 Use condition (e) and this bound to obtain that   2 log(md) 1/2 ,   2 EZ̄m,1,3 ≤ K33 (mBm,n log(d)) 1/2 2 2 + Bn (log(d)) log(md) ≤ 2K33 mBm,n   P |Γ̂JK m,1,3 − ΓX1,3 |∞ ≥ K34 (m−1 Bm,n2 log(md))1/2 + t) , (mt)2    1/2   mt ≤ exp − 2 + 3 exp − , K35 mBm,n 2 log2 (md) K36 Bm,n √ (mt)2      mt = exp − 2 + 3 exp − . K35 3mBm,n K37 Bm,n log(md) s 2 log(md) log2 ( 1 ) Bm,n γm,n Apply this bound with t = t∗ = K38 , for constant K38 > 0 (large m enough), to obtain that      |Γ̂X − ΓX 2t∗ 2 K −1 ) log(md) log2 P m,1,3 1,3 |∞ ≥ ≤ exp − (K38 35 1/γm,n  1 2 −1 1/4 1/2 − 43 −12  + 3 exp − (K38 K37 )n log (1/γm,n ) log (md)Bm,n . Consequently, 2 1/2 K10 K12 K11 K   P |Γ̂JK X ∗ ≤ + 3γm,n13 m,1,3 − Γ1,3 |∞ ≥ 2t γm,n ≤ 4γm,n ,   B 2 log(md) log2 (1/γm,n ) 1/4  JK X 1/2 m,n P |Γ̂m,1,3 − Γ1,3 |∞ log(md) ≥ K39 log(md) ≤ 4γm,n , m   JK X 1/2 BX P |Γ̂m,1,3 − Γ1,3 |∞ log(md) ≥ K39 ϖ1 (γm,n ) ≤ 4γm,n . By the Cauchy-Schwarz and Lyapounov’s inequalities and condition (b), we can see that, 72 2/3 −2 X ∗ 1,3 |∞ ≤ Bm,n , from which we obtain that n |Γ1,3 |∞ ≤ t /2. 
Therefore, |ΓX |Γ̂JK X X X m,1,3 |∞ ≤ |Γ̂m,1,3 − Γ1,3 |∞ + |Γ1,3 |∞ , n 3t∗ o n JK X | + |ΓX | ≥ 3t , ∗o |Γ̂JK | m,1,3 ∞ ≥ ⊆ |Γ̂m,1,3 − Γ1,3 ∞ 1,3 ∞ 2 2 n 3t∗ 2 n t ∗ o ⊆ |Γ̂JK X m,1,3 − Γ1,3 |∞ ≥ 2 + 2 ,  JK 3t∗   JK X (3 + n2 )t∗  P |Γ̂m,1,3 |∞ ≥ ≤ P |Γ̂m,1,3 − Γ2 |∞ ≥ ≤ 4γm,n . 2 2 For the remaining terms in Γ̂JK m2 , note that they are either a U statistic of degree three or a U statistics of degree two, which are analyzed as above. Therefore,   P |Γ̂JK − ΓX |1/2 log(md) ≥ K ϖ BX (γ ) m ∞ 40 1 m,n X |1/2 log(md) ≥ K40 ϖ BX (γ   ≤ P |Γ̂JK m1 − Γ 1 ∞ 1 m,n ) 2  JK X 1/2 K40 BX  + P |Γ̂m2 − Γ2 |∞ log(md) ≥ ϖ1 (γm,n ) , 2 ≤ 28γm,n , where K40 = max{K13 , K21 , K30 , K39 , · · · } denote a generic universal constant. Similarly for Y1 n we would have Γ̂JK JK JK n = Γ̂n1 − Γ̂n2 . 73 Recall that ΓY = ΓY1 − ΓY2 , where h i T ΓY1 = E vec(h(Y1 , Y2 )) vec(h(Y1 , Y3 ))  , h i h iT Y Γ2 = E vec(h(Y1 , Y2 )) E vec(h(Y1 , Y2 )) . By using the similar arguments as above, we obtain that   1/2 P |Γ̂JK n − ΓY |∞ log(nd) ≥ K42 ϖ1BY (γm,n ) Y |1/2 log(nd) ≥ K42 ϖ BY (γ   ≤ P |Γ̂JK n1 − Γ 1 ∞ 1 m,n ) 2 Y |1/2 log(nd) ≥ K42 ϖ BY (γ   + P |Γ̂Jk n2 − Γ 2 ∞ 1 m,n ) . 2 s s 2 log(md) log2 ( 1 ) Bm,n 2 log(nd) log2 ( 1 ) Bm,n γm,n γm,n Now choose ∆ ¯ m,n = K40 + K42 . m n Combining all the previous inequalities with this choice of ∆ ¯ m,n , it readily follows that,   P |Γ̂JKm − ΓX + δ 2 (Γ̂JK − ΓY )| ≤ ∆ m,n n ∞ ¯ m,n   JK X 2 JK = 1 − P |Γ̂m − Γ + δm,n (Γ̂n − Γ |∞ > ∆m,n Y ¯   X |1/2 log(md) ≥ K40 ϖ BX (γ  ≥ 1 − P |Γ̂JK m − Γ ∞ 1 m,n ) 2  K42 BY  JK Y 1/2 + P |Γ̂n − Γ |∞ log(nd) ≥ ϖ1 (γm,n ) 2   ≥ 1 − 28γm,n + 28γm,n = 1 − 56γm,n . Recall (3.4.1). The same conclusion holds for |Γ̃X m − Γ |∞ and |Γ̃n − Γ |∞ from the fact X Y Y that Γ̃X m ≲ Γ̂m and Γ̃n ≲ Γ̂n . And thus the conclusion follows from Lemma 4.1, by JK Y JK setting Tm eX |X m = Z X |X m and T eY |Y n = Z Y |Y n . 1 1 n 1 2 74 Remark 3.4.2. One can choose a sequence γm,n such that < ∞. Then by P m,n γm,n applying Borel-Cantelli lemma, we thus obtain the bootstrap convergence result in almost sure sense. For example if one chooses γm,n = exp(− log(dm)) for m = n, then γm,n < 1/56 and log(1/γm,n ) ≤ K log(d(m∨n)) for some large m. To apply Borel-Cantelli lemma, we note that P∞ P∞ P∞ −1 mc and c < 1/7 m=4 γm,n = m=4 exp(− log(dm)) = m=4 (dm) . For d < exp c we obtain that P∞ P∞ mc m)−1 ≤ ∞ exp−u du < C (c 4c )−1 exp(−4c ) R m=4 γm,n = m=4 (exp u=4 u 1 < ∞ for some positive constant C1 . Now using Theorems 3.3.1 and 3.4.1 we are in a position to state the next important result. Let Pe denote the probability distribution with respect to em+n only or it can be thought of as being the conditional distribution of em+n , given all the other r.v.’s. 3.5 Two sample test for covariance matrices Based on the results derived in the previous sections, we formulate a testing procedure for testing the equality of two population covariance matrices under l∞ norm Consider the problem of testing H0 : Σ1 = Σ2 versus Ha : Σ1 ̸= Σ2 , where Σ1 , Σ2 ∈ Rp×p , representing the covariance matrices of X and Y respectively. Recall √ X − UY ) m(Um n that Tmn = is the original test statistic, where 2 m X m vec (Xi − Xj )(Xi − Xj )T  X 1 X Um = and m(m − 1) 2 i=1 j̸=i=1 n X n T  Y = 1 X vec (Yi − Yj )(Y i − Y j ) Um . n(n − 1) 2 i=1 j̸=i=1 75 are the sample covariance matrices for the sample X m and Y n samples respectively. Note that both Um X , U Y are d = p2 dimensional U statistics. The proposed test rejects H n 0 whenever ∥Tmn ∥ is large. 
To implement the test we propose to use multiplier bootstrap r m eY version of the Tm,n given by Tm,n = Tm − JK eX T , where n m m  m  ! eX := √m 1 X 1 X vec((Xi − Xj )(Xi − Xj )T )  X e , Tm − Um i m m−1 2 i=1 j̸=i=1 n  n  ! √ vec((Y − Y )(Y − Y ) T)  1 1 i j i j TneY := n Y X X − Um ei+m . n n−1 2 i=1 j̸=i=1 Let, r n eX m eY o cB (α) = inf t ∈ R : Pe (∥Tm − Tn ∥∞ ≤ t) ≥ 1 − α , 0 < α < 1. n Corollary 3.5.1 below show that the test rejects H0 whenever ∥Tmn ∥ > cB (α) is of the asymptotic size α. From now for the sake of brevity, for any ξ1 , ξ2 ∈ Rd , let vec((ξi − ξj )(ξi − ξj ))T h(ξi , ξj ) = 2 denote the covariance kernel. The following theorem along with the corollary provides redthe guarantee of the asymptotic level of the above mentioned test. Before stating the next theorem we need some more assumptions as stated below. (a′ ) For some universal constants 0 < c1 < c2 < 1, m+n m ∈ (c , c ). 1 2 (b′ ) There exists a constant b > 0 such that E[ga2 (X) + δm,n 2 g 2 (Y )] ≥ b, for all 1 ≤ a ≤ d. a 76 (c′ ) There exists a sequence of constants Bm,n ≥ 1 such that    2+l l , l = 1, 2, max E | vec(h(X1 , X2 ) )a ≤ Bm,n 1≤a≤d    2+l l , l = 1, 2. max E vec(h(Y1 , Y2 ) )a ≤ Bm,n 1≤a≤d (d′ ) The constants Bm,n defined in (c) also satisfy   max vec(h(X1 , X2 ) )a ψ ≤ Bm,n , max vec(h(Y1 , Y2 ) )a ψ ≤ Bm,n . 1≤a≤d 1 1≤a≤d 1 Bm,n log7 (pm) Bm,n log7 (pn) (e′ ) The constants Bm,n defined in (c) also satisfy ∼ → 0, m n as m ∧ n → ∞. For brevity, let Ω = Σ1 − Σ2 . The Kolomogorov distance between the two distributions of suitably centered Tmn and Tmn JK is defined to be √    JK KD Tmn − m vec(Σ1 − Σ2 )/2 , Tmn  √m(U X − U Y ) − √m vec(Σ − Σ )   r m eY  m n 1 2 eX = sup P ≤ t − Pe ∥Tm − T ∥∞ ≤ t . t≥0 2 ∞ n n We are ready to state the following theorem. Theorem 3.5.1. Suppose the above conditions (a′ )–(e′ ) hold. Then for any non-negative definite matrices ΣX and ΣY , of real numbers, with probability tending to one, 7 JK  ≲ Bm,n log (pm) 1/6 . n o KD Tmn , Tmn m 77 Remark 3.5.1. Condition (a′ ) specifies that the ratio of the sample sizes can reside in any open interval. Condition (b′ ) ensures the non-degenracy of the sample observations. This condition is less restrictive than the minimum eigen-value condition considered in several existing literatures viz. Chen and Li(2012) [33], Cai et al.(2013) [11], Chang et al.(2017) [14]. Condition (c′ ) allows the bound on the third and fourth order moments to grow with the sample sizes m, n, unlike as in Cai et.al(2013) [11], Li and Chen (2012) [33] and Chang et al. (2017) [14]. In these papers the moments appearning in (c′ ) are assumed to be bounded from above by a fixed constant, for all sample sizes. Condition (d′ ) allows the sub-exponential tails to grow freely with the sample sizes, which is also advantageous than conditions in Cai et al. (2013) [11] and Chang et al. (2017) [14]. On a similar note in .Cai et al. (2013) [11], for the convergence of their distribution of test statistic under H0 to extreme Type-I distribution or to a normal distribution as in Li and Chen (2012) [33], one needs to assume sparsity or weak correlation structure among the individual components of their test statistics over which either l∞ or l2 -norm would be calculated. The above multiplier bootstrap method helps us to formulate a similar testing procedure without imposing any such correlational assumptions. Proof. The proof uses the results of the previous section with δm,n = −m1/2 n−1/2 . 
From condition (a′ ), we can verify that  1/2  1/2 c1 c2 = δ1 < |δm,n | < δ2 = , (3.5.1) (1 − c1 ) (1 − c2 ) It follows from assumption (b′ ) that min E[ga2 (X) + δm,n 2 g 2 (Y c )] ≥ min{1, δ 2 }b. a 1 (3.5.2) 1≤a≤d 78 G G ′ ′ Recall that, Tmn G = T 1 +δ m m,n Tm . By combining (3.5.1), (3.5.2), (a ), (e ) along with 2 Theorem 3.3.1 with ΣX = Σ1 and ΣY = Σ2 we obtain that Bm,n (log7 (pm)) 1/6  2  KD(Tmn , Tmn G ) ≤ ρ∗∗ m,n ≲ (3.5.3) m Choose γm,n in condition (e) of Theorem 3.4.1 to be γm,n = 1 . m2 (log m)4 From (a′ ) and (e′ ) it can be easily verified that Bm,n log5 (pm) log2 (1/γm,n ) Bm,n log5 (pn) log2 (1/γm,n ) ∼ → 0, as m, n → ∞. (3.5.4) m n Combining (3.5.1), (3.5.2), (3.5.4) and Theorem 3.4.1 we conclude that, ( )1/4 JK , T G ) Bm,n log5 (pm) log2 (1/γm,n ) KD(Tmn mn ≤ ρJK m,n ≲ . (3.5.5) m The claim of the theorem follows from the triangle inequality by verifying the fact that, Bm,n (log7 (pm)) 1/6  2  KD(Tmn , Tmn JK ) ≤ KD(Tmn , Tmn G ) + KD(T JK , T G ) ≲ . (3.5.6) mn mn m This concludes the proof. The proposed test procedure rejects H0 : Σ1 = Σ2 versus Ha : Σ1 ̸= Σ2 , at the signifi- cance level α ∈ (0, 1), whenever Φα = 1, where Φα =I(∥Tmn ∥∞ > cB (α)). Corollary 3.5.1. Under the conditions of Theorem 3.5.1 and under H0 , the following holds.  √ X − UY )  n B 2 log7 (pm) o1/6 m(Um n m,n sup P ≥ cB (α) − α ≲ max . α∈(0,1) 2 ∞ m 79 Proof. The proof is an immediate consequence of Theorem 3.5.1 from the definition of cB (α). As an immediate consequence of this corollary is the formulation of the following 100(1 − α)% confidence region for (Σ1 − Σ2 ). n o CR1−α := Σ1 − Σ2 : Tmn ∞ ≤ cB (α) . A computing procedure of cB (α). The multiplier bootstrapped version of the critical value cB (α) is quite advantageous in terms of faster computation. A procedure of computing the multiplier bootstrap critical value is described below for reader’s convenience. Step 1: From the sample of size m + n, generate N sets of standard normal random variables. Denote them by em+n 1 , · · · , em+n N . Treat them as random copies of em+n = {e1 , e2 , · · · , em+n }. Step 2: Keeping X m and Y n fixed, compute the bootstrapped version of the test statis- tic ∥Tmn ∥∞ N times, viz., calculate ∥Tmn ∞ N times. Compute N values of JK ∥ JK , T JK , · · · , T JK }. {Tmn1 mn2 mnN Step 3: The 100(1 − α) quantile of {Tmn1 JK , T JK , · · · , T JK } would be treated as an mn2 mnN approximate value for cB (α). It is to be noted that normally general resampling method demands a computational cost of the order O(N n2 p2 ) whereas this multiplier bootstrap technique reduces the compu- tational cost to O(n(p2 + n)N ) providing a massive advantage. A general criticism received by the use of this maximum norm based statistic in Cai et al. (2013) [11] is that the con- vergence of the asymptotic null distribution to Gumbel requires relatively large sample size, which in turn possess some computational challenges in terms of size and power when the 80 sample size is small. Fan et al. (2015) [24] suggested power enhancement techniques in this context. However, in our case the proposed multiplier bootstrap method makes the power computation a lot faster even without this power enhancement technique. The proposed method can be much more appreciated in the asymptotic analysis of the power provided in the following section. 
3.6 Analysis of Power Analysis of power : The goal of this section is to show that the difference between the two power functions obtained from the original test statistic and its Jackknifed bootstrap counterpart is asymptotically negligible. Further, we show that the proposed jackknifed bootstrap based test is consistent under the class of general alternatives. The power function of the test is  √m(U X − U Y )  m n PHa {Φα = 1} = P ∥ ∥∞ ≥ cB (α) Ha . 2 This power function is an abstract quantity because the respective covariance matrices ΓX and ΓY of Tm X and T Y are unknown in practice. To circumvent this problem we define n jackknifed multiplier bootstrap based power function as P∗Ha {Φα = 1}, which is expressed as r √  e∗ X m e∗ Y m(vec(Σ1 − Σ2 ))  Pe∗ ∥Tm − T + ∥∞ ≥ cB (α)|Ha , n n 2 where Pe∗ (.) denotes probability with respect to e∗m+n only. Before exploring the asymptotic theoretical aspects of the power function we shall describe the multiplier bootstrap procedure in the context of approximating the true power function as follows. Step 1: Generate {e∗1 , e∗2 , · · · , e∗m+n } independent of em+n which has been used previously 81 to calculate cB (α). Step 2: Now compute the bootstrap power function for the proposed test which is denoted by r √  e∗ X m e∗ Y m(vec(Σ1 − Σ2 ))  Pe∗ ∥Tm − T + ∥∞ ≥ cB (α) . n n 2 The following theorem establishes a theoretical guarantee in an asymptotic sense for ap- proximating the true power function PHa {Φα = 1}, by its Multiplier Bootstrap counterpart PH∗ {Φ = 1}. For the sake of brevity, we write Ω := Σ − Σ . a α 1 2 Theorem 3.6.1. Assuming the conditions for Theorem 3.5.1 holds, then for any Ω ∈ Rp × Rp , we have with probability one, r √  e∗ X m e∗ Y mvec(Ω)  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 √  m(U X − U Y )  m n −P < cB (α) H1 2 ∞ Bm,n log7 (pm) 1/6  2  ≲ . m 82 Proof. Under Ha , r √  e ∗X m e∗ Y mvec(Ω)  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 r √  e ∗X m e∗ Y mvec(Ω)  = 1 − Pe∗ ∥Tm − T + ∥∞ < cB (α) , n n 2  √m(vec(Ω)) ∗X r m e∗ Y  √ m(vec(Ω)a )  a e = 1 − Pe ∗ − − cB (α) < Tm − Tn a < − + c B (α) 2 n 2 √ √ √ X − U Y ) − m(vec(Ω))  m(vec(Ω))a m(Um n a a +P − − cB (α) < 2 √ 2 m(vec(Ω))a  <− + cB (α) 2  √m(vec(Ω)) √ m(Um X − U Y ) − √m(vec(Ω)) a n a a −P − − cB (α) < 2 √ 2 m(vec(Ω))a  <− + cB (α) , 2  m(U X − U Y ) − √mΩ √  m n ≥ 1 − sup P ∥ ∥∞ ∈ A A∈ARe 2  ∗X r m e∗ Y   √m(U X − U Y )  e m n − Pe∗ ∥Tm − Tn ∥∞ ∈ A − P ∥ ∥∞ < cB (α) , n 2  √m(U X − U Y )   r m Y  m n X =P ∥ ∥∞ ≥ cB (α) − sup P ∥Tm − T ∥∞ ∈ A 2 A∈ARe n n r  e ∗X m e∗ Y  − Pe∗ ∥Tm − Tn ∥∞ ∈ A . n Similarly under Ha ,  √m(U X − U Y )  m n P 2 ∞ > cB (α) √ X − UY )  m(Um n a  = 1 − P − cB (α) < < cB (α) , 2 83 √ r √  m(vec(Ω))a e ∗X m e∗ Y m(vec(Ω))a  = 1 + Pe∗ − − cB (α) < Tm − Tn < − + cB (α) 2 n 2  √m(vec(Ω)) √ √ X − U Y ) − m(vec(Ω)) a m(Um n a a −P − − cB (α) < √2 2 m(vec(Ω))a  <− + cB (α) 2 √  √m(vec(Ω)) ∗X r m e∗ Y m(vec(Ω))a  a e − Pe ∗ − − cB (α) < Tm − Tn < − + cB (α) , 2 n 2  √m(U X − U Y ) − √mΩ r m e∗ Y e∗ X −    m n ≥ 1 − sup P ∥ ∥∞ ∈ A − Pe∗ ∥Tm Tn ∥∞ ∈ A A∈ARe 2 n r √  e∗ X − m e∗ Y mΩ  − Pe∗ ∥Tm Tn + ∥∞ < cB (α) , n 2 r √  e ∗X m e∗ Y mΩ  = Pe∗ ∥Tm − T + ∥∞ > cB (α) n n 2  √m(U X − U Y ) − √mΩ   ∗X r m e∗ Y  m n e − sup P ∞ ∈ A − P e ∗ ∥T m − T n ∥∞ ∈ A . A∈A Re 2 n Combining the above results we get,  r √   √m(U X − U Y ) e∗ X m e∗ Y mΩ m n  Pe ∗ Tm − T + > cB (α) − P > cB (α)|H a n n 2 ∞ 2 ∞  √ X − U Y ) − √mΩ   r  m(Um n e ∗X m e∗ Y ≤ sup P ∈ A − Pe∗ Tm − Tn ∈A . 
A∈A Re 2 ∞ n ∞ By arguing as in Theorem 3.5.1, we obtain that, with probability to one,  √ X − U Y ) − √mΩ  r m(Um n  e ∗X m e∗ Y  sup P ∈ A − Pe∗ ∥Tm − Tn ∥∞ ∈ A A∈ARe 2 ∞ n 2 log7 (dm)/m}1/6 . ≲ {Bm,n This completes the proof. We construct a class of general alternatives denoted as Mm,n,d given below. For a large 84 constant K > 0, define n o (f ′ ) Mm,n,d = Ω ∈ Rp × Rp : ∥Ω/2∥∞ ≥ K{Bm,n log(md)/m}1/2 Theorem 3.6.2. Assuming the conditions for Theorem 3.5.1 and (f′ ) hold. Then, for all Ω ∈ Mm,n,d , r √  e∗ X m e∗ Y mΩ  Pe∗ ∥Tm − T + ∥∞ ≥ cB (α) → 1, as n, m, d → ∞ n n 2 Remark 3.6.1. Cai et al. (2013) [11] and Chang et al. (2017) [14], derived similar consis- tency results for their test statistics under a class of sparse alternatives. The above Theorem generalizes their result in the sense that it is valid for general alternatives where Bm,n possi- bly diverge to infinity. This can be understood by noting that the class Mm,n,d is constructed q  Bm,n log(d) in a manner such that Σ and Σ are separated by a lower bound K X Y m . The- orem 4 in Cai et al. (2013) [11] derived a similar bound treating Bm,n to be constant and q  log d their bound was of the order of O m under the class of alternatives. Proof. In the proof below, K ∗ and c∗ are positive and finite universal constants, not depending on m, n, d, whose values keep changing depending on the context. We begin the proof by noting that by the triangle inequality, r √  e∗ X m e∗ Y mΩ  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 r √  e ∗X m e∗ Y mΩ  ≥ Pe∗ ∥Tm − T ∥∞ ≤ ∥ ∥∞ − cB (α) . n n 2 Define the basis vectors ηa ’s to be natural basis vectors in Rd ∀a = 1, 2, · · · , d. Then, for 85 any t > 0, we have r d r  e∗ X m e∗ Y  X  ∗ e X m e∗ Y  Pe∗ ∥Tm − Tn ∥∞ ≥ t ≤ Pe∗ |Tma − Tna | ≥ t , n n a=1 t2   ≤ 2d exp − . 2 max1≤a≤d {ηaT (Γ̂X + m X n Γ̂ )ηa } The last bound follows from Lemma (.0.10) for Gaussian variables. Now setting the above bound equal to α by plugging in t = cB (α), for some large enough m we obtain that, m 1/2 cB (α) = 2 log(2d/α) max {ηaT (Γ̂X + Γ̂X )ηa }  , 1≤a≤d n m = 4 log(dn) max {ηaT (Γ̂X + Γ̂X )ηa } .   1≤a≤d n Note that, m X m JK max {ηaT (Γ̂X + Γ̂ )ηa } = ∥Γ̂JK m + Γ̂ ∥∞ , 1≤a≤d n n n m JK m JK ≤ ∥Γ̂JK m −Γ + X (Γ̂n − ΓY )∥∞ + ∥ΓJK m + Γ ∥∞ . n n n pm 1 , it follows From the bounds of ∆ ˆ m,n in Theorem 3.4.1 with δm,n = n and γm,n = dm that s 2 log3 (md) Bm,n m JK ∥Γ̂JK m −Γ + X (Γ̂n − ΓY )∥∞ ≲ . n m 86 For the term, ∥ΓJK m JK ∗ ′ m + n Γn ∥∞ , we note that m/n = c by the condition(a ) and m JK m JK ∗ ∥ΓX ∥ + m ∥ΓY ∥   ∥ΓJK m + Γn ∥∞ ≤ ∥ΓJK m ∥∞ + ∥Γ ∥ ∞ ≤ K ∞ ∞ , n n n n ∗  X X ∗ Y Y  ≤ K ∥Γ1 ∥∞ + ∥Γ2 ∥∞ + c ∥Γ1 ∥∞ + ∥Γ2 ∥∞ , "    ∗   T   ≤K max E vec h(X1 , X2 ) (vec(h(X1 , X3 )) a 1≤(a1 ,a2 )≤d a1 2    ∗   T   +c max E vec h(Y1 , Y2 ) vec(h(Y1 , Y3 ) a 1≤(a1 ,a2 )≤d a1 2 # + ∥ΓX ∗ Y 2 ∥∞ + c ∥Γ2 ∥∞ . From the definition of ∥ΓX 1 ∥ and ∥Γ1 ∥, it follows that, Y m JK ∥ΓJK m + Γn ∥∞ (3.6.1) " n n o1/2 n o1/2 ≤ K ∗ max E(vec(h(X1 , X2 ))a1 )2 max E(vec(h(X1 , X3 )a2 )2 + ∥ΓX 2 ∥∞ 1≤a1 ≤d 1≤a2 ≤d n o1/2 n o1/2 + c∗ max E(vec(h(Y1 , Y2 )a1 )2 max E(vec(h(Y1 , Y3 )a2 )2 1≤a1 ≤d 1≤a2 ≤d # + c∗ ∥ΓY2 ∥∞ . 87 Note that,   ∥ΓX2 ∥∞ = max (E(vec(h(X1 , X2 )))a1 )(E(vec(h(X1 , X2 )))a2 ) , (3.6.2) 1≤a1 ,a2 ≤d      ≤ max E|(vec(h(X1 , X2 )))a1 | max E|(vec(h(X1 , X2 )))a2 | , 1≤a1 ≤d 1≤a2 ≤d h i2 = max (E(vech(X1 , X2 ))a1 ) , 1≤a1 ≤d n o ≤ max E(vec(h(X1 , X2 ))a1 )2 , 1≤a1 ≤d 2/3 ≤ Bm,n . Similar conclusion holds for ∥ΓY2 ∥∞ . 
Hence we obtain that m JK h 2/3 2/3 i 2/3 ∥ΓJK m + Γn ∥∞ ≤ K ∗ 2Bm,n + 2c∗ Bm,n ≤ 2K ∗ (1 + c∗ )Bm,n , n 2/3 ≤ 2K ∗ Bm,n ≤ 2K ∗ Bm,n , where the second last inequality follows from the Holder’s inequaliy and condition (b). Therefore with probability tending to one, we get that m cB (α) ≤ max {ηaT (Γ̂X + Γ̂X )ηa } ≤ (8K ∗ Bm,n log(dn))1/2 . 1≤a≤d n √ Upon choosing the constant K in (f′ ) to be K = 8K ∗ , we obtain that √ 1/2 mΩ/2 ∞ − cB (α) ≥ 8K ∗ Bm,n log(dm)  . 88 Therefore, we conclude that as m ∨ n → ∞ and d → ∞, r √  e∗ X m e∗ Y mΩ  Pe∗ ∥Tm − Tn + ∥∞ ≥ cB (α) n 2 r  e∗X m e∗ Y ∗ 1/2  ≥ Pe∗ ∥Tm − T ∥∞ ≤ {8K Bm,n log(dm)} n n r  ∗X m e∗ Y  = 1 − Pe∗ ∥Tm −e Tn ∥∞ ≥ {8K ∗ Bm,n log(dm)}1/2 n   2 ≥ 1 − 2d exp − 8K ∗ Bm,n log(dm)/2Bm,n ≥ 1 − 3 4 → 1. d m 89 APPENDIX 90 In this Chapter we present some auxiliary results which are crucial in proving the results Chapter 2 and Chapter 3. Some of them are presented without proofs as they are the results from other research articles. Some lemmas are original to our work and we provide their proofs. Lemma .0.1. Let X1 , X2 , · · · , Xn be independent centered random vectors in Rd with d ≥ 2. Define Z X := maxi≤j≤d | n P i=1 Xi,j |, M X := max1≤i≤n max1≤j≤d |Xij | and σ 2 := max1≤j≤d n 2 P i=1 E[Xij ].Then  q    X p ≤ C σ log d + E (M X )2 log d  E Z where C is a universal constant. Proof: See Lemma 8 in Chernozhukov et al. (2015) [18]. Lemma .0.2. Under the setting of Lemma .0.1, for every ϵ ≥ 0, δ ∈ (0, 1] and t > 0,      X X  2 2 X δ P Z ≥ (1 + ϵ)E[Z ] + t ≤ exp − t /(3σ ) + 3 exp − t/ C1 ∥M ∥ψ , δ where C1 = C1 (ϵ, δ) is a constant depending only on ϵ, δ. Proof: See Theorem 4 in Adamczak (2008) [1]. Lemma .0.3. Under the setting of Lemma .0.1, for every ϵ ≥ 0, s > 0 and t > 0,   P Z ≥ (1 + ϵ)E[Z ] + t ≤ exp{−t2 /(3σ 2 )} + C2 E M X )u /tu , X X   where C2 = C2 (ϵ, u) is a constant depending only on ϵ, u. 91 Proof: See Theorem 2 in Adamczak (2010) [2]. Lemma .0.4. (Nazarov’s inequality) Let X ∈ Rd denote a centered Gaussian random vector in Rd such that for some constant b > 0, and E[Xj2 ] ≥ b, j = 1, 2, · · · , d. Then for every x ∈ Rd and a > 0, p P(X ≤ x + a) − P(X ≤ x) ≤ Ca log p, where C is a constant depending only on b. Proof: See Nazarov [38] or [20] . The following lemma is Lemma 5.1 in Chernozhukov et al. (2017) [19], we state it here for the sake of completeness. For a collection of independent random vectors X = X1 , X2 , · · · , Xn ∈ Rd with mean- vector µ and covariance matrix Σ and a collection of independent Gaussian random vectors Y = Y1 , Y2 , · · · , Yn ∈ Rd with the same mean and covariance matrix as that of X, we define √ X √ vSn + 1 − vSnY ≤ y − P SnX ≤ y .   ρn := sup P y∈Rd ,v∈[0,1] We further define Mn,X (ϕ), Mn,Y (ϕ) and Mn (ϕ) as n h i n−1 3 X Ln = max E |Xij | , 1≤j≤d i=1 n "  √ # 1 n E max |Xik |3 I max |Xik | > X Mn,X (ϕ) = , n 1≤k≤d 1≤k≤d 4ϕ log(d) i=1 n "  √ # 1 n E max |Yik |3 I max |Yik | > X Mn,Y (ϕ) = , n 1≤k≤d 1≤k≤d 4ϕ log(d) i=1 Mn (ϕ) = Mn,X (ϕ) + Mn,Y (ϕ) 92 Now, we are in a position to state the lemma. Pn Lemma .0.5. Suppose that there exists some constant b > 0 such that n− 1  2 i=1 E Xij ≥ b for all j = 1, 2, · · · , d. Then ρn satisfies the following inequality for all ϕ ≥ 1 : ϕ2 log2 p n p o log p ρn ≲ √ ϕLn ρn + Ln log p + ϕMn (ϕ) + , n ϕ up to a constant K that depends only on b. Proof: See proof of Lemma 5.1 in Chernozhukov et al. (2017) [19]. The following Lemma appears in Chernozhukov et al. (2017) [19]. We quote the lemma here as it been used in the proof of Theorem 2.3.1. Lemma .0.6. 
Let ξ be a non-negative random variable such that P (ξ > x) ≤ Ae−x/B for all x ≥ 0 and for some constants A, B > 0. Then for every t ≥ 0, E ξ 3 I{ξ > t} ≤   6A(t + B)3 e−t/B . Proof: See proof of Lemma C.1 in Chernozhukov et al. (2017) [19]. The following lemma provides a bound of approximation between multiplier bootstrap and Gaussian of normalized sums. Lemma .0.7. Under the setup of Lemma .0.5 we define ∆n,r = max 1 ≤ j, k ≤ d |Σ̂j,k −Σj,k | where Σ̂j,k denotes the (j, k)th element of Σ̂ = (n−1)−1 (Xi − X̄)(Xi − X̄)T and Σj,k denotes the (j, k)th element of Σ. Now define, n n 1 X 1 X ρMn B = sup P( √ ei Xi ≤ y|X) − P( √ Yi ≤ y) . y∈Rd n n i=1 i=1 93 Under these conditions we have, 1/3 2/3 d. ρM n B ≤ C∆ n,r log Proof: See proof of Theorem 4.1 in Chernozhukov et al. (2017) [19]. The following lemmas can be found as Lemma 2.2.2 and Lemma 2.2.1 on page-96 in Van- der-Waart and Wellner (1996) [48]. We quote the lemma here as it has been used in the proof of Theorem 2.3.1. Lemma .0.8. Let ψ be a convex, non-decreasing, non-zero function with ψ(0) = 0 and ψ(x)ψ(y) lim supx,y→∞ ψ(cxy) < ∞ for some constant c. Then, for any random variable X1 , X2 , · · · , Xn , we have ∥ max Xi ∥ψ ≤ Kψ −1 (n) max ∥Xi ∥ψ , 1≤i≤n i for a constant K depending only on ψ. p Lemma .0.9. Let X be any random variable with P (|X| ≥ x) < Ke−Cx for every x, for  1/p constants K and C and for p ≥ 1. Then its Orlicz norm satisfies ∥X∥ψp ≤ 1+K . C Lemma .0.10. Let X ∼ N (0, ν) where ν > 0 is the variance of X. Then,   β2  P |X| ≥ β ≤ 2 exp − . 2ν Now, we shall state a lemma which appears as Lemma E.1 in Chen (2018) [16]. Before stating the lemma we need to define a few quantities. For m = [n/r], we define Z X = m {n(n − 1)}−1 P 1≤i̸=j≤n h(Xi , Xj ) − E(h(X1 , X2 ) ∞ , 94 ir+r ) = h (X hj (Xir+1 j ir+1 , Xir+2 , · · · , Xir+r ), ir+r ) = h (X ir+r )I(max ir+r h̄(Xir+1 j ir+1 1≤j≤d hj (Xir+1 ) ≤ τ ), Pm−1  ir+r ir+r  Z1X = max1≤j≤d i=0 h̄j (Xir+1 ) − Eh̄j (Xir+1 ) , ir+r )|, ϑ2 = max Pm−1 2 ir+r M X = max1≤j≤d max0≤i≤m−1 |hj (Xir+1 1≤j≤d i=0 Ehj (Xir+1 ). Lemma .0.11. Let X1 , X2 , · · · , Xn ∈ Rp and α ∈ (0, 1]. Suppose that hj (X1 , X2 , · · · , Xr ) ψ < ∞ for all j = 1, 2, · · · , d and r < n. α Let τ = 8E[M X ]. Then , for any 0 < η ≤ 1 and δ > 0, there exists a constant C(α, η, δ) > 0 such that we have, ∀t > 0 " α # t2    t P(Z X ≥ (1 + η)EZ1 + t) ≤ exp − + 3 exp − . 2(1 + δ)ϑ2 C(α, η, δ)∥M X ∥ψα Proof. See proof of Lemma E.1 in Chen (2018) [16]. Before stating the next lemma we need to define a few quantities: for any function f from Rp × Rp to Rp × Rp , define 1 1 X = VnY = X X Vm f (Xi , Xj ), f (Yi , Yj ), m(m − 1) n(n − 1) 1≤i̸=j≤m 1≤i̸=j≤n MX = max max fa (Xi , Xj ) , MY = max max fa (Yi , Yj ) , 1≤i̸=j≤m 1≤a≤d 1≤i̸=j≤n 1≤a≤d  1  1 q q DqX = max E|fa (X1 , X2 )| q , DqY = max E|fa (Y1 , Y2 )|q , q > 0. 1≤a≤d 1≤a≤d The following lemma will provide a bound for Rm,n . The claim (.0.3) of this lemma is Theorem 5.1 of Chen (2018) [16] while (.0.4) follows from (.0.3) by applying it to each of the two samples. Lemma .0.12. Let X m = (X1 , · · · , Xm ) and Y n = (Y1 , · · · , Yn ) be two independent random 95 samples from F1 , F2 , respectively. Let f : Rp × Rp 7→ Rd be a measurable function such that f (x, z) = f (z, x), ∀ x, z ∈ Rp and E fa (X1 , X2 ) + E fa (Y1 , Y2 ) < ∞. If 2 ≤ d ≤ exp b(m ∨ n) , for some constant b > 0, then ∃ a constant 0 < K X < ∞ such that  X √  log(d)  3 X log(d) X log(d)  5 X  E Vm ∞ ≤ K X (1 + b) 2 ∥M ∥4 + D2 + 4D 4 . 
The following lemmas can be found as Lemma 2.2.2 and Lemma 2.2.1 on page 96 in van der Vaart and Wellner (1996) [48]. We quote them here as they have been used in the proof of Theorem 2.3.1.

Lemma .0.8. Let $\psi$ be a convex, non-decreasing, non-zero function with $\psi(0) = 0$ and $\limsup_{x,y\to\infty}\psi(x)\psi(y)/\psi(cxy) < \infty$ for some constant $c$. Then, for any random variables $X_1, X_2, \cdots, X_n$, we have
$$\Big\|\max_{1\le i\le n}X_i\Big\|_\psi \le K\,\psi^{-1}(n)\,\max_i\|X_i\|_\psi,$$
for a constant $K$ depending only on $\psi$.

Lemma .0.9. Let $X$ be any random variable with $P(|X| \ge x) \le Ke^{-Cx^p}$ for every $x$, for constants $K$ and $C$ and for $p \ge 1$. Then its Orlicz norm satisfies $\|X\|_{\psi_p} \le \big(\frac{1+K}{C}\big)^{1/p}$.

Lemma .0.10. Let $X \sim N(0,\nu)$, where $\nu > 0$ is the variance of $X$. Then
$$P\big(|X| \ge \beta\big) \le 2\exp\Big(-\frac{\beta^2}{2\nu}\Big).$$

Now we shall state a lemma which appears as Lemma E.1 in Chen (2018) [16]. Before stating the lemma we need to define a few quantities. For $m = [n/r]$, we define
$$Z^X = \Big\|\{n(n-1)\}^{-1}\sum_{1\le i\ne j\le n}h(X_i,X_j) - E\,h(X_1,X_2)\Big\|_\infty,$$
$$h_j(X_{ir+1}^{ir+r}) = h_j(X_{ir+1}, X_{ir+2}, \cdots, X_{ir+r}), \qquad \bar h(X_{ir+1}^{ir+r}) = h(X_{ir+1}^{ir+r})\,I\Big(\max_{1\le j\le d}h_j(X_{ir+1}^{ir+r}) \le \tau\Big),$$
$$Z_1^X = \max_{1\le j\le d}\Big|\sum_{i=0}^{m-1}\big\{\bar h_j(X_{ir+1}^{ir+r}) - E\bar h_j(X_{ir+1}^{ir+r})\big\}\Big|,$$
$$M^X = \max_{1\le j\le d}\max_{0\le i\le m-1}\big|h_j(X_{ir+1}^{ir+r})\big|, \qquad \vartheta^2 = \max_{1\le j\le d}\sum_{i=0}^{m-1}E\,h_j^2(X_{ir+1}^{ir+r}).$$

Lemma .0.11. Let $X_1, X_2, \cdots, X_n \in \mathbb{R}^p$ and $\alpha \in (0,1]$. Suppose that $\big\|h_j(X_1, X_2, \cdots, X_r)\big\|_{\psi_\alpha} < \infty$ for all $j = 1, 2, \cdots, d$ and $r < n$. Let $\tau = 8E[M^X]$. Then, for any $0 < \eta \le 1$ and $\delta > 0$, there exists a constant $C(\alpha, \eta, \delta) > 0$ such that for all $t > 0$,
$$P\big(Z^X \ge (1+\eta)EZ_1^X + t\big) \le \exp\Big(-\frac{t^2}{2(1+\delta)\vartheta^2}\Big) + 3\exp\Big(-\Big(\frac{t}{C(\alpha,\eta,\delta)\|M^X\|_{\psi_\alpha}}\Big)^{\alpha}\Big).$$

Proof. See the proof of Lemma E.1 in Chen (2018) [16].

Before stating the next lemma we need to define a few quantities: for any function $f$ from $\mathbb{R}^p\times\mathbb{R}^p$ to $\mathbb{R}^d$, define
$$V_m^X = \frac{1}{m(m-1)}\sum_{1\le i\ne j\le m}f(X_i,X_j), \qquad V_n^Y = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}f(Y_i,Y_j),$$
$$M_X = \max_{1\le i\ne j\le m}\max_{1\le a\le d}\big|f_a(X_i,X_j)\big|, \qquad M_Y = \max_{1\le i\ne j\le n}\max_{1\le a\le d}\big|f_a(Y_i,Y_j)\big|,$$
$$D_q^X = \max_{1\le a\le d}\big(E|f_a(X_1,X_2)|^q\big)^{1/q}, \qquad D_q^Y = \max_{1\le a\le d}\big(E|f_a(Y_1,Y_2)|^q\big)^{1/q}, \qquad q > 0.$$
The following lemma will provide a bound for $R_{m,n}$. The claim (.0.3) of this lemma is Theorem 5.1 of Chen (2018) [16], while (.0.4) follows from (.0.3) by applying it to each of the two samples.

Lemma .0.12. Let $X^m = (X_1, \cdots, X_m)$ and $Y^n = (Y_1, \cdots, Y_n)$ be two independent random samples from $F_1$, $F_2$, respectively. Let $f: \mathbb{R}^p\times\mathbb{R}^p \mapsto \mathbb{R}^d$ be a measurable function such that $f(x,z) = f(z,x)$ for all $x, z \in \mathbb{R}^p$ and $E|f_a(X_1,X_2)| + E|f_a(Y_1,Y_2)| < \infty$. If $2 \le d \le \exp\big(b(m\vee n)\big)$ for some constant $b > 0$, then there exists a constant $0 < K^X < \infty$ such that
$$E\big\|V_m^X\big\|_\infty \le K^X(1+b)\Big\{\frac{\log^{3/2}(d)}{m}\,\|M_X\|_4 + \Big(\frac{\log(d)}{m}\Big)^{1/2}D_2^X + \frac{\log^{5/4}(d)}{m^{3/4}}\,D_4^X\Big\}. \qquad (.0.3)$$
Consequently, with $K = \max\{K^X, K^Y\} > 0$, we obtain
$$E\big\|V_m^X - \delta_{m,n}V_n^Y\big\|_\infty \le K(1+b)\Big[\Big\{\frac{\log^{3/2}(d)}{m}\,\|M_X\|_4 + \Big(\frac{\log(d)}{m}\Big)^{1/2}D_2^X + \frac{\log^{5/4}(d)}{m^{3/4}}\,D_4^X\Big\}$$
$$\qquad\qquad + \delta_{m,n}\Big\{\frac{\log^{3/2}(d)}{n}\,\|M_Y\|_4 + \Big(\frac{\log(d)}{n}\Big)^{1/2}D_2^Y + \frac{\log^{5/4}(d)}{n^{3/4}}\,D_4^Y\Big\}\Big]. \qquad (.0.4)$$
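To fix ideas about the quantities in Lemma .0.12, the following sketch computes $V_m^X$ and $M_X$ for the covariance-type kernel $f(x,z) = \mathrm{vec}\big((x-z)(x-z)^T/2\big)$, whose expectation is $\mathrm{vec}(\Sigma)$. This symmetric kernel is one natural choice when comparing covariance matrices; the kernel, the dimensions, and all names here are illustrative, not necessarily those used in the main text.

```python
import numpy as np
from itertools import combinations

def cov_kernel(x, z):
    """Symmetric kernel f(x, z) = vec((x - z)(x - z)^T / 2); E f = vec(Sigma)."""
    u = x - z
    return np.outer(u, u).ravel() / 2.0

def u_stat_quantities(X):
    """Compute V_m (the U-statistic average of f over pairs) and
    M_X = max_{i != j} max_a |f_a(X_i, X_j)| from Lemma .0.12's notation."""
    m = X.shape[0]
    # f is symmetric, so averaging over unordered pairs equals the average
    # over all ordered pairs (i, j), i != j, in the definition of V_m
    vals = np.array([cov_kernel(X[i], X[j]) for i, j in combinations(range(m), 2)])
    V_m = vals.mean(axis=0)
    M_X = np.abs(vals).max()
    return V_m, M_X

rng = np.random.default_rng(3)
m, p = 60, 5
X = rng.standard_normal((m, p))
V_m, M_X = u_stat_quantities(X)
# Here Sigma = I_p, so the diagonal of the reshaped V_m should be near 1
print("diagonal of reshaped V_m:", np.round(np.diag(V_m.reshape(p, p)), 2))
print("M_X:", round(M_X, 2))
```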
To proceed further we need notation. For $q > 0$ and any sequences $\phi_m, \phi_n \ge 1$, define
$$D_{g,q}^X = \max_{1\le a\le d}E|g_a(X-\mu^X)|^q, \qquad D_{g,q}^Y = \max_{1\le a\le d}E|g_a(Y-\mu^Y)|^q,$$
$$M_{g,q}^X(\phi_m) = E\Big[\max_{1\le a\le d}\big|g_a(X-\mu^X)\big|^q\, I\Big(\max_{1\le a\le d}\big|g_a(X-\mu^X)\big| > \frac{\sqrt{n}}{4\phi_m\log d}\Big)\Big],$$
$$M_{g,q}^Y(\phi_n) = E\Big[\max_{1\le a\le d}\big|g_a(Y-\mu^Y)\big|^q\, I\Big(\max_{1\le a\le d}\big|g_a(Y-\mu^Y)\big| > \frac{\sqrt{n}}{4\phi_n\log d}\Big)\Big],$$
$$M_q^{G_1}(\phi_m) = E\Big[\max_{1\le a\le d}\big|T_{ma}^{G_1}\big|^q\, I\Big(\max_{1\le a\le d}\big|T_{ma}^{G_1}\big| > \frac{\sqrt{n}}{4\phi_m\log d}\Big)\Big],$$
$$M_q^{G_2}(\phi_n) = E\Big[\max_{1\le a\le d}\big|T_{na}^{G_2}\big|^q\, I\Big(\max_{1\le a\le d}\big|T_{na}^{G_2}\big| > \frac{\sqrt{n}}{4\phi_n\log d}\Big)\Big],$$
$$M_q^X(\phi_m) = M_{g,q}^X(\phi_m) + M_q^{G_1}(\phi_m), \qquad M_q^Y(\phi_n) = M_{g,q}^Y(\phi_n) + M_q^{G_2}(\phi_n).$$
Also denote, for $\tau > 0$,
$$M_{h,q}^X(\tau) = E\Big[\max_{1\le i\ne j\le m}\max_{1\le a\le d}\big|\{\mathrm{vec}(h(X_i,X_j))\}_a\big|^q\, I\Big(\max_{1\le a\le d}\big|\{\mathrm{vec}(h(X_i,X_j))\}_a\big| > \tau\Big)\Big],$$
$$M_{h,q}^Y(\tau) = E\Big[\max_{1\le i\ne j\le n}\max_{1\le a\le d}\big|\{\mathrm{vec}(h(Y_i,Y_j))\}_a\big|^q\, I\Big(\max_{1\le a\le d}\big|\{\mathrm{vec}(h(Y_i,Y_j))\}_a\big| > \tau\Big)\Big].$$
We are ready to state the following lemma.

Lemma .0.13. Suppose that condition (a) holds and $\log(d) \le \bar b(m\vee n)$ for some constant $\bar b > 0$. Then there are constants $C_i := C_i(b,\bar b) > 0$, $i = 1, 2$, such that for any real sequences $\bar D_{g,3}^X$ and $\bar D_{g,3}^Y$ satisfying $D_{g,3}^X \le \bar D_{g,3}^X$ and $D_{g,3}^Y \le \bar D_{g,3}^Y$, and for all $\tau > 0$, we obtain
$$\rho^{**}_{m,n} \le C_3\Bigg[\Big(\frac{(\bar D_{g,3}^X)^2\log^7 d}{m}\Big)^{1/6} + \frac{M_3^X(\phi_m)}{\bar D_{g,3}^X} + \Big(\frac{(\bar D_{g,3}^Y)^2|\delta_{m,n}|^6\log^7 d}{n}\Big)^{1/6} + \frac{M_3^Y(\phi_n)}{\bar D_{g,3}^Y}$$
$$\qquad + \phi^*\Big\{\frac{\log^{3/2}d}{m}\big(M_{h,4}^X(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{m^{1/2}}(D_2^X)^{1/2} + \frac{\log^{5/4}d}{m^{3/4}}(D_4^X)^{1/4}$$
$$\qquad\qquad + \frac{\log^{3/2}d}{n}\big(M_{h,4}^Y(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{n^{1/2}}(D_2^Y)^{1/2} + \frac{\log^{5/4}d}{n^{3/4}}(D_4^Y)^{1/4}\Big\}\Bigg],$$
where $C_3 = \max\{C_1, C_2\}$, $\phi^* := \max\{\phi_m, \phi_n\}$, with
$$\phi_m = C_1\Big(\frac{(\bar D_{g,3}^X)^2\log^4 d}{m}\Big)^{-1/6}, \qquad \phi_n = C_2\Big(\frac{(\bar D_{g,3}^Y)^2\log^4 d}{n}\Big)^{-1/6}. \qquad (.0.5)$$

Proof. This lemma is analogous to Proposition 5.3 of Chen (2018) [16]. We provide details to clearly address the additional changes needed in the proof of Proposition 5.3 to prove the stated lemma. Fix $y \in \mathbb{R}^d$ and define
$$F_\beta(w) = \frac{1}{\beta}\log\Big(\sum_{j=1}^d \exp\big(\beta(w_j - y_j)\big)\Big), \qquad \beta \in \mathbb{R},\ w \in \mathbb{R}^d.$$
We shall often use this function with $\beta = \phi\log(d)$, where $\phi \ge 1$. In this case,
$$0 \le F_\beta(w) - \max_{1\le j\le d}(w_j - y_j) \le \frac{\log(d)}{\beta} = \phi^{-1}, \qquad \forall\, w \in \mathbb{R}^d,\ \phi \ge 1.$$
Next, let $u_0: \mathbb{R} \to [0,1]$ be a function such that $u_0(t) = 1$ if $t < 0$, $u_0(t) = 0$ if $t > 1$, and $u_0(t)$, $t \in [0,1]$, is five times continuously differentiable with bounded derivatives. Let
$$u(t) := u_0(\phi t), \qquad \Psi(w) := u\big(F_\beta(w)\big), \qquad t \in \mathbb{R},\ \phi \ge 1,\ w \in \mathbb{R}^d.$$
Note that $\Psi: \mathbb{R}^d \to [0,1]$. For later use, we note that when $\beta = \phi\log(d)$,
$$I(t \le 0) \le u(t) \le I(t \le \phi^{-1}), \qquad t \in \mathbb{R}.$$
Let $G_{1i}, H_{1i}$, $1 \le i \le m$, be i.i.d. $N_d(0, \Gamma^X)$ and $G_{2j}, H_{2j}$, $1 \le j \le n$, be i.i.d. $N_d(0, \Gamma^Y)$, independent of $G_{1i}, H_{1i}$, $1 \le i \le m$, where $\Gamma^X := \mathrm{Cov}(g(X-\mu^X))$, $\Gamma^Y := \mathrm{Cov}(g(Y-\mu^Y))$. Let
$$Z_i^*(t) := \frac{1}{\sqrt{m}}\Big[\sqrt{t}\,\big\{\sqrt{v}\,g(X_i) + \sqrt{1-v}\,G_{1i}\big\} + \sqrt{1-t}\,H_{1i}\Big], \qquad 1\le i\le m,$$
$$Z_j^{**}(t) := \frac{\delta_{m,n}}{\sqrt{n}}\Big[\sqrt{t}\,\big\{\sqrt{v}\,g(Y_j-\mu^Y) + \sqrt{1-v}\,G_{2j}\big\} + \sqrt{1-t}\,H_{2j}\Big], \qquad 1\le j\le n,$$
$$Z^*(t) := \sum_{i=1}^m Z_i^*(t), \qquad Z^{**}(t) := \sum_{j=1}^n Z_j^{**}(t), \qquad Z(t) = Z^*(t) + Z^{**}(t), \qquad v, t \in [0,1].$$
Let
$$I_{m,n} := \Psi\Big(\sqrt{v}\,\frac{1}{\sqrt{m}}\sum_{i=1}^m g(X_i) + \sqrt{1-v}\,\frac{1}{\sqrt{m}}\sum_{i=1}^m G_{1i} + \sqrt{v}\,\delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n g(Y_j) + \sqrt{1-v}\,\delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n G_{2j}\Big)$$
$$\qquad - \Psi\Big(\frac{1}{\sqrt{m}}\sum_{i=1}^m H_{1i} + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n H_{2j}\Big) = \Psi\big(Z(1)\big) - \Psi\big(Z(0)\big).$$
Recall (.0.5). By Xue and Yao (2020) [54], we obtain
$$E[I_{m,n}(v)] \lesssim C_1(b)\,\frac{\phi_m^2\log^2 d}{\sqrt{m}}\Big\{\phi_m D_{g,3}^X\rho^1_{m,n} + D_{g,3}^X\sqrt{\log(d)} + \phi_m M_3^X(\phi_m)\Big\} + C_1(b)\,\frac{\phi_n^2\log^2 d}{\sqrt{n}}\,|\delta_{m,n}|^3\Big\{\phi_n D_{g,3}^Y\rho^1_{m,n} + D_{g,3}^Y\sqrt{\log(d)} + \phi_n M_3^Y(\phi_n)\Big\}.$$
To proceed further, define
$$\rho^1_{m,n} := \sup_{v\in[0,1]}\sup_{y\in\mathbb{R}^d}\Big|P\Big(\sqrt{v}\Big\{\frac{1}{\sqrt{m}}\sum_{i=1}^m g(X_i) + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n g(Y_j)\Big\} + \sqrt{1-v}\Big\{\frac{1}{\sqrt{m}}\sum_{i=1}^m G_{1i} + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n G_{2j}\Big\} \le y\Big)$$
$$\qquad\qquad - P\Big(\frac{1}{\sqrt{m}}\sum_{i=1}^m G_{1i} + \delta_{m,n}\frac{1}{\sqrt{n}}\sum_{j=1}^n G_{2j} \le y\Big)\Big|.$$
Note that
$$\rho^1_{m,n} = \sup_{v\in[0,1]}\sup_{y\in\mathbb{R}^d}\Big|P\big(Z(1)\le y\big) - P\big(Z(0)\le y\big)\Big|.$$
By the Mean Value Theorem,
$$\Psi(W_{m,n}) - \Psi(L_{m,n}) = \sum_{a=1}^d \partial_a\Psi(\xi)\,R_{m,n,a} = \sum_{a=1}^d u'\big(F_\beta(\xi)\big)\,\eta_a(\xi)\,R_{m,n,a},$$
where $\eta_a(w) = \partial F_\beta(w)/\partial w_a$ is the first-order partial derivative of $F_\beta(w)$ with respect to $w_a$, $\eta := (\eta_1, \cdots, \eta_d)^T$ is a $d\times 1$ random vector, and $\xi$ lies on the line segment joining $L_{m,n}$ and $W_{m,n}$. Following the arguments in Xue and Yao (2020) [54], we can verify that $\eta_a(w) \ge 0$ and $\sum_{a=1}^d\eta_a(w) = 1$ for any $w \in \mathbb{R}^d$, and that there is a constant $K_1(\phi^*)$ such that $\sup_{t\in\mathbb{R}}|u'(t)| \le K_1(\phi^*)$, where $\phi^* = \max\{\phi_m, \phi_n\}$. Therefore,
$$E\big[\Psi(W_{m,n}) - \Psi(L_{m,n})\big] \le K_1\phi^*\,E\|R_{m,n}\|_\infty.$$
Proceeding as in Xue and Yao (2020) [54] (Eqn (99)) with $\phi = \min\{\phi_m, \phi_n\}$, we conclude that
$$P\big(Z(1) \le y - \phi^{-1}\big) \le P\big(Z(0) \le y - \phi^{-1}\big) + C_2(b)\phi^{-1}\sqrt{\log(d)} + |E[I_{m,n}]| + K_1\phi^*E\|R_{m,n}\|_\infty,$$
$$P\big(Z(0) \le y + \phi^{-1}\big) \ge P\big(Z(1) \le y + \phi^{-1}\big) - \Big[C_2(b)\phi^{-1}\sqrt{\log(d)} + |E[I_{m,n}]| + K_1\phi^*E\|R_{m,n}\|_\infty\Big].$$
Combining these bounds with the previous equations, we conclude that
$$\rho^1_{m,n} \le K_1\phi^*E\|R_{m,n}\|_\infty + C_2(b)\phi^{-1}\log^{1/2}d + C_1(b)\,\frac{\phi_m^2\log^2 d}{\sqrt{m}}\Big\{\phi_m D_{g,3}^X\rho^1_{m,n} + D_{g,3}^X\sqrt{\log(d)} + \phi_m M_3^X(\phi_m)\Big\}$$
$$\qquad + C_1(b)\,\frac{|\delta_{m,n}|^3\phi_n^2\log^2 d}{\sqrt{n}}\Big\{\phi_n D_{g,3}^Y\rho^1_{m,n} + D_{g,3}^Y\sqrt{\log(d)} + \phi_n M_3^Y(\phi_n)\Big\}.$$
By similar arguments as used in Lemma 4 of Xue and Yao (2020) [54], and choosing $\phi_m, \phi_n \ge 1$ as in (.0.5), we conclude that for any real sequences with $(\bar D_{g,3}^X)^2 \ge D_{g,3}^X$ and $(\bar D_{g,3}^Y)^2 \ge D_{g,3}^Y$, $\rho^1_{m,n}$ is bounded from above by $C_3(b)$ multiplied by
$$\phi^*E\|R_{m,n}\|_\infty + \Big(\frac{(\bar D_{g,3}^X)^2\log^7 d}{m}\Big)^{1/6} + \frac{M_3^X(\phi_m)}{\bar D_{g,3}^X} + \Big(\frac{(\bar D_{g,3}^Y)^2|\delta_{m,n}|^6\log^7 d}{n}\Big)^{1/6} + \frac{M_3^Y(\phi_n)}{\bar D_{g,3}^Y}.$$
By similar arguments as used in Lemma A.1 of Chen (2018) [16] and Jensen's inequality, there exist universal positive constants $K_2, K_3$ such that the following inequalities hold:
$$E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le m}f_a^4(X_i,X_j)\Big] \le K_2\,E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le m}\big|\{\mathrm{vec}(h(X_i,X_j))\}_a\big|^4\Big],$$
$$E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le n}f_a^4(Y_i,Y_j)\Big] \le K_3\,E\Big[\max_{1\le a\le d}\max_{1\le i\ne j\le n}\big|\{\mathrm{vec}(h(Y_i,Y_j))\}_a\big|^4\Big].$$
By Lemma .0.12, we obtain that
$$E\|R_{m,n}\|_\infty \le K_3(\bar b^{1/2}+1)\Big[\frac{\log^{3/2}d}{m}\big(M_{h,4}^X(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{m^{1/2}}(D_2^X)^{1/2} + \frac{\log^{5/4}d}{m^{3/4}}(D_4^X)^{1/4}$$
$$\qquad + |\delta_{m,n}|\Big\{\frac{\log^{3/2}d}{n}\big(M_{h,4}^Y(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{n^{1/2}}(D_2^Y)^{1/2} + \frac{\log^{5/4}d}{n^{3/4}}(D_4^Y)^{1/4}\Big\}\Big].$$
Finally, by using Lemma 3 and Lemma 4 of Xue and Yao (2020) [54], we conclude the proof of this lemma, since
$$\rho^{**}_{m,n} \le C\Bigg[\Big(\frac{(\bar D_{g,3}^X)^2\log^7 d}{m}\Big)^{1/6} + \frac{M_3^X(\phi_m)}{\bar D_{g,3}^X} + \Big(\frac{(\bar D_{g,3}^Y)^2|\delta_{m,n}|^6\log^7 d}{n}\Big)^{1/6} + \frac{M_3^Y(\phi_n)}{\bar D_{g,3}^Y}$$
$$\qquad + \phi^*\Big\{\frac{\log^{3/2}d}{m}\big(M_{h,4}^X(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{m^{1/2}}(D_2^X)^{1/2} + \frac{\log^{5/4}d}{m^{3/4}}(D_4^X)^{1/4}$$
$$\qquad\qquad + \frac{\log^{3/2}d}{n}\big(M_{h,4}^Y(\tau)^{1/4}+\tau\big) + \frac{\log(d)}{n^{1/2}}(D_2^Y)^{1/2} + \frac{\log^{5/4}d}{n^{3/4}}(D_4^Y)^{1/4}\Big\}\Bigg].$$
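The proof above rests on the smooth maximum $F_\beta$, which dominates $\max_{1\le j\le d}(w_j - y_j)$ by at most $\log(d)/\beta = \phi^{-1}$. The following minimal numerical sanity check of this two-sided bound uses arbitrary data and the choice $\beta = \phi\log d$ from the proof; it is an illustration only.

```python
import numpy as np

def F_beta(w, y, beta):
    """Smooth maximum F_beta(w) = beta^{-1} log(sum_j exp(beta (w_j - y_j))).
    np.logaddexp.reduce keeps the computation stable for large beta."""
    return np.logaddexp.reduce(beta * (w - y)) / beta

rng = np.random.default_rng(2)
d = 500
w, y = rng.standard_normal(d), rng.standard_normal(d)
for phi in [1, 5, 25]:
    beta = phi * np.log(d)  # the choice beta = phi * log(d) used in the proof
    gap = F_beta(w, y, beta) - np.max(w - y)
    # Property checked: 0 <= F_beta(w) - max_j (w_j - y_j) <= log(d)/beta = 1/phi
    print(f"phi={phi:3d}: gap={gap:.5f}, bound 1/phi={1/phi:.5f}")
```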
BIBLIOGRAPHY

[1] Radoslaw Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability, 13:1000–1034, 2008.

[2] Radosław Adamczak. A few remarks on the operator norm of random Toeplitz matrices. Journal of Theoretical Probability, 23(1):85–108, 2010.

[3] T. W. Anderson. An introduction to multivariate statistical analysis. John Wiley & Sons, 2003.

[4] Zhidong Bai and Hewa Saranadasa. Effect of high dimension: by an example of a two sample problem. Statistica Sinica, pages 311–329, 1996.

[5] Zhidong Bai and Jian-feng Yao. Central limit theorems for eigenvalues in a spiked population model. Annales de l'IHP Probabilités et Statistiques, 44:447–474, 2008.

[6] Jinho Baik and Jack W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408, 2006.

[7] Maurice Stevenson Bartlett. Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A: Mathematical and Physical Sciences, 160(901):268–282, 1937.

[8] Peter Bühlmann and Sara van de Geer. Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media, 2011.

[9] T. Tony Cai, Xiao Han, and Guangming Pan. Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices. The Annals of Statistics, 48(3):1255–1280, 2020.

[10] T. Tony Cai and Yin Xia. High-dimensional sparse MANOVA. Journal of Multivariate Analysis, 131:174–196, 2014.

[11] Tony Cai, Weidong Liu, and Yin Xia. Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501):265–277, 2013.

[12] Mireille Capitaine and Muriel Casalis. Asymptotic freeness by generalized moments for Gaussian and Wishart matrices: application to beta random matrices. Indiana University Mathematics Journal, pages 397–431, 2004.

[13] Nilanjan Chakraborty and Lyudmila Sakhanenko. Novel multiplier bootstrap tests for high-dimensional data with applications to MANOVA. Manuscript, 2022.

[14] Jinyuan Chang, Wen Zhou, Wen-Xin Zhou, and Lan Wang. Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Biometrics, 73(1):31–41, 2017.

[15] Song Xi Chen, Jun Li, and Ping-Shou Zhong. Two-sample and ANOVA tests for high dimensional means. The Annals of Statistics, 47(3):1443–1474, 2019.

[16] Xiaohui Chen. Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. The Annals of Statistics, 46(2):642–678, 2018.

[17] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819, 2013.

[18] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162(1):47–70, 2015.

[19] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Central limit theorems and bootstrap in high dimensions. The Annals of Probability, 45(4):2309–2352, 2017.

[20] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Detailed proof of Nazarov's inequality. arXiv preprint arXiv:1711.10696, 2017.

[21] Arthur P. Dempster. A high dimensional two sample significance test. The Annals of Mathematical Statistics, pages 995–1010, 1958.

[22] Arthur P. Dempster. A significance test for the separation of two highly multivariate small samples. Biometrics, 16(1):41–50, 1960.

[23] Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, 2012.

[24] Jianqing Fan, Yuan Liao, and Jiawei Yao. Power enhancement in high-dimensional cross-sectional tests. Econometrica, 83(4):1497–1541, 2015.

[25] Ronald A. Fisher. XV.—The correlation between relatives on the supposition of Mendelian inheritance. Earth and Environmental Science Transactions of the Royal Society of Edinburgh, 52(2):399–433, 1919.
[26] Yasunori Fujikoshi, Tetsuto Himeno, and Hirofumi Wakaki. Asymptotic results of a high dimensional MANOVA test and power comparison when the dimension is large compared to the sample size. Journal of the Japan Statistical Society, 34(1):19–26, 2004.

[27] Christophe Giraud. Introduction to high-dimensional statistics. Chapman and Hall/CRC, 2021.

[28] Kurt Johansson. Shape fluctuations and random matrices. Communications in Mathematical Physics, 209(2):437–476, 2000.

[29] Iain M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29(2):295–327, 2001.

[30] Iain M. Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486):682–693, 2009.

[31] D. N. Lawley. A generalization of Fisher's z test. Biometrika, 30(1/2):180–187, 1938.

[32] Ji Oon Lee and Kevin Schnelli. Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. The Annals of Applied Probability, 26(6):3786–3839, 2016.

[33] Jun Li and Song Xi Chen. Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940, 2012.

[34] Zhenhua Lin, Miles E. Lopes, and Hans-Georg Müller. High-dimensional MANOVA via bootstrapping and its application to functional and sparse count data. Journal of the American Statistical Association, pages 1–15, 2021.

[35] Vladimir Alexandrovich Marchenko and Leonid Andreevich Pastur. Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 114(4):507–536, 1967.

[36] Robb J. Muirhead. Aspects of multivariate statistical theory. John Wiley & Sons, 2009.

[37] D. N. Nanda. Probability distribution tables of the largest root of a determinantal equation with two roots. Journal of the Indian Society of Agricultural Statistics, 3:175–177, 1951.

[38] Fedor Nazarov. On the maximal perimeter of a convex set in $\mathbb{R}^n$ with respect to a Gaussian measure. In Geometric Aspects of Functional Analysis, pages 169–187. Springer, 2003.

[39] Debashis Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, pages 1617–1642, 2007.

[40] K. C. Sreedharan Pillai. Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics, pages 117–121, 1955.

[41] Samarendra Nath Roy and J. Roy. A note on a class of problems in "normal" multivariate analysis of variance. United States Air Force, Office of Scientific Research, 1957.

[42] James R. Schott. Some high-dimensional tests for a one-way MANOVA. Journal of Multivariate Analysis, 98(9):1825–1839, 2007.

[43] James R. Schott. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis, 51(12):6535–6542, 2007.

[44] Kerby Shedden and Jeremy Taylor. Lung adenocarcinomas. Methods of Microarray Data Analysis IV, page 121, 2004.

[45] Muni S. Srivastava. Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society, 37(1):53–86, 2007.

[46] Muni S. Srivastava and Hirokazu Yanagihara. Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101(6):1319–1329, 2010.

[47] Muni Shanker Srivastava and C. G. Khatri. An introduction to multivariate statistics. North-Holland/New York, 1979.

[48] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer, 1996.
[49] Martin J. Wainwright. High-dimensional statistics: a non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.

[50] Lili Wang and Debashis Paul. Limiting spectral distribution of renormalized separable sample covariance matrices when p/n → 0. Journal of Multivariate Analysis, 126:25–52, 2014.

[51] Weichen Wang and Jianqing Fan. Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics, 45(3):1342, 2017.

[52] Hiroki Watanabe, Masashi Hyodo, and Shigekazu Nakagawa. Two-way MANOVA with unequal cell sizes and unequal cell covariance matrices in high-dimensional settings. Journal of Multivariate Analysis, 179:104625, 2020.

[53] Samuel S. Wilks. Certain generalizations in the analysis of variance. Biometrika, pages 471–494, 1932.

[54] Kaijie Xue and Fang Yao. Distribution and correlation-free two-sample test of high-dimensional means. The Annals of Statistics, 48(3):1304–1328, 2020.

[55] Takayuki Yamada and Muni S. Srivastava. A test for multivariate analysis of variance in high dimension. Communications in Statistics - Theory and Methods, 41(13-14):2602–2615, 2012.

[56] Jin-Ting Zhang, Jia Guo, and Bu Zhou. Linear hypothesis testing in high-dimensional one-way MANOVA. Journal of Multivariate Analysis, 155:200–216, 2017.

[57] Mingjuan Zhang, Cheng Zhou, Yong He, and Bin Liu. Data-adaptive test for high-dimensional multivariate analysis of variance problem. Australian & New Zealand Journal of Statistics, 60(4):447–470, 2018.