MULTIVARIATE GENERALIZED FUNCTIONAL LINEAR MODELS WITH APPLICATIONS TO GENOMICS

By Sneha Jadhav

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy

2017

ABSTRACT

MULTIVARIATE GENERALIZED FUNCTIONAL LINEAR MODELS WITH APPLICATIONS TO GENOMICS

By Sneha Jadhav

This thesis is focused on developing functional data methodology aimed at problems that arise in genetic sequencing data. While significant progress has been made in identifying common genetic variants associated with diseases, these variants explain only a small proportion of heritability. Recent studies suggest that rare variants could account for this missing variability. With advancements in sequencing technology, large-scale sequencing studies are now being conducted to comprehensively investigate the contribution of rare variants to the genetic etiology of various diseases. Although these studies hold great potential for uncovering new disease-associated variants, the massive amount and complex structure of sequencing data pose great analytical challenges for association analysis. Advanced methods are needed to address these challenges and to facilitate the discovery of new variants predisposing to various diseases. We use functional data analysis methods to capture the complexities of sequencing data.

In the first chapter we investigate the importance of considering the genetic structure of sequencing data. The effect of appropriately modeling the genetic structure of sequencing data on association analysis has not been well studied. We compare three statistical approaches that use different strategies to model this structure: a burden test, a burden test that considers pairwise correlation, and a functional analysis of variance (FANOVA) test that models the gene by fitting continuous curves to an individual's genotype profile. We find some evidence in favor of treating sequencing data as a function.

In the second chapter we present the definitions of some fundamental concepts in functional data analysis, such as the mean element, the covariance operator and its eigen decomposition, and the Karhunen-Loève expansion. Basis expansions, and in particular the Karhunen-Loève expansion, play an important role in this thesis. We briefly discuss the estimators of the mean function and the covariance operator and their consistency. Results on the consistency of the eigenvalues and eigenfunctions of the sample covariance operator are also stated.

Genetic data are often collected on families, where the responses, or traits, of family members can be dependent on one another. Additionally, the trait of interest can be discrete or continuous. There is thus a need for a functional model that can handle dependent data that may be continuous or discrete. The model proposed by Müller and Stadtmüller (2005) uses the generalized estimating equations approach, which can handle both continuous and discrete data; however, they assume the response variable to be univariate and the sample to be independent. There are no existing functional methods that we know of that can be directly applied to family data. In the third chapter we develop a framework for dependent generalized functional linear models with a multivariate response, which can be used to test for a certain type of association between the genetic data and the trait of interest for family data.
In the fourth chapter we develop a regression framework in which the response variable has a normal distribution and there is measurement error in the regressor function. In this set-up the true regressor function is not observable; instead, we observe a surrogate variable and its replicates. The relation between the true function and the surrogate one is assumed to follow the additive classical measurement error model. We use the approach developed by Stefanski and Carroll (1987) to propose an estimating equation for the parameters and show asymptotic existence and consistency of the estimate obtained from this equation.

Copyright by SNEHA JADHAV 2017

ACKNOWLEDGMENTS

I wish to express sincere gratitude to my advisor Prof. Hira L. Koul. I am thankful for his immeasurable support, guidance, encouragement and patience. I would also like to thank Prof. Qing Lu for giving me the opportunity to work with him on a variety of projects and for broadening the scope of my work. He has been a very supportive mentor. I thank Prof. Sakhanenko for lending her ear to my ideas and offering her advice on several occasions, and Prof. Cui for serving on my dissertation committee. A loving thanks to all my friends for making this a fun-filled journey. I would like to express my appreciation to the graduate secretary Sue Watson for looking out for us students. I am grateful to my husband Venkat Ramakrishnan for his unyielding patience, support and love. This thesis might not have been written without the support of my mum Hemlata Jadhav, my father Umakant Jadhav and my sister Shalmali Jadhav, who have shaped me into who I am.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1  Modeling Sequencing Data
  1.1 Burden Test
  1.2 Burden Test that Considers the Pairwise Correlation
  1.3 Functional Analysis of Variance Test
  1.4 Simulation
  1.5 Results
  1.6 Discussion
  1.7 Conclusion
Chapter 2  Preliminaries on Functional Data Analysis
  2.1 Definitions
  2.2 Estimation
Chapter 3  Multivariate Generalized Functional Linear Models
  3.1 Introduction
  3.2 Model
  3.3 Estimation
  3.4 Asymptotic Theory
    3.4.1 Existence and Consistency
    3.4.2 Asymptotic Normality
  3.5 Simulation
  3.6 Application to the Sequencing Data from the Minnesota Twin Study
  3.7 Discussion
  3.8 Results
Chapter 4  Measurement Error Model
  4.1 Model
  4.2 Simulation
BIBLIOGRAPHY
LIST OF TABLES

Table 1.1: Power for unidirectional effect
Table 1.2: Power for bidirectional effect
Table 1.3: Summary of top 10 genes with the smallest p-values from the association analysis
Table 3.1: Empirical power as a function of γ
Table 3.2: Empirical level as a function of γ
Table 3.3: Empirical power comparison with the gSKAT
Table 3.4: Empirical power as a function of sample size
Table 3.5: Empirical level as a function of cluster size
Table 3.6: Effect of increasing p_n
Table 3.7: p-values for Minnesota Twin Study
Table 4.1: Error in the estimator as a function of σ²

LIST OF FIGURES

Figure 1.1: LD plots and plots of the fitted smooth functions for THRA
Figure 3.1: Normal Q-Q plots

Chapter 1
Modeling Sequencing Data

Genome-wide association studies have made substantial progress in identifying common variants associated with human diseases. Despite this, a large portion of heritability remains unexplained. Evolutionary theory and empirical human genetic studies suggest that rare mutations could play an important role in human diseases, which motivates comprehensive investigation of rare variants in sequencing studies. Advances in sequencing technology have enabled researchers to sequence exome regions or even the whole genome at affordable costs (Cirulli and Goldstein, 2010). The emerging sequencing data facilitate the study of massive numbers of single nucleotide variants, both rare and common, for their potential role in complex human diseases. Although these studies hold great promise for the identification of new disease-susceptibility variants, the extremely large number of single nucleotide variants (SNVs) poses a significant challenge for association analysis. Each of these variants has a known position on the chromosome. Given linkage disequilibrium (Laird and Lange (2010a)) and the fact that loci that are closer together are more likely to be inherited together during meiosis, we should consider the positions of the variants in our analysis. This suggests that we can consider the genetic data as a function of the positions of the variants.
Moreover, sequencing data is usually high dimensional, and we shall see in subsequent chapters that treating it as a function helps address the problems posed by its high dimensional nature. Treating the data as a function requires neither independence of the variants nor a particular assumed correlation structure, and the functional approach can accommodate rare variants. To explore the association of rare variants with human diseases, many statistical approaches have been developed, with different ways of modelling the genome so as to capture its properties as fully as possible. Conventional single-locus analysis suffers from low power because of the low frequency of SNVs and multiple testing issues. Grouping SNVs in a genetic region (e.g., a gene) can aggregate the association signal and alleviate the multiple testing issues, and has therefore been widely used in statistical analysis of sequencing data (Lee et al. (2014)). Various statistical methods have been proposed to group SNVs with or without considering the underlying genetic structure. However, the impact of different strategies for modelling sequencing structure on the association results has rarely been investigated. If empirical evidence suggests no use in considering the sequencing structure in the association analysis, this gives us a basis for excluding this factor from statistical modelling. On the other hand, if it is important to consider the relationship among SNVs, then we need to investigate appropriate strategies for characterizing the underlying sequencing structure. As an initial step in investigating this issue, we choose three tests with different ways of modelling the correlation between SNVs: 1) a weighted burden test (BT) (Madsen and Browning, 2009); 2) a weighted burden test considering pairwise correlation (BTCOV) (Schaid et al., 2013); and 3) a functional analysis of variance test (FANOVA) (Vsevolozhskaya et al., 2014) that considers the relation among nearby loci and models the genotype profile of an individual as a continuous function. We briefly present the three methods below.

1.1 Burden Test

We consider the burden test developed by Madsen and Browning (2009). The test summarizes the genetic score of multiple SNVs as

γ_j = Σ_{i=1}^L g_ij / w_i,

where g_ij is the number of low frequency alleles of SNV i for individual j. The weight is defined to emphasize the effect of rare variants, i.e. w_i = √(n_i q_i (1 − q_i)), where q_i is the minor allele frequency (MAF) of the SNV, calculated from controls. Because the test simply adds the genotype of each SNV weighted by its MAF, it does not consider linkage disequilibrium (LD), i.e. the correlation between SNVs.

1.2 Burden test that considers the pairwise Correlation

In addition to the above burden test, we also consider another type of burden test, suggested by Schaid et al. (2013), which considers pairwise covariance. We consider the following summary of genetic scores, S_j = Σ_{i=1}^L g_ij / w_i, where w_i and g_ij are defined in the same manner as in BT. However, unlike the conventional burden test, the test statistic of BTCOV is given by

T = {(Y − Ȳ)ᵀ S}² / {(Y − Ȳ)ᵀ V_s (Y − Ȳ)},

where V_s = Σ_{k=1}^L Σ_{l=1}^L w_k w_l R_kl √(p_k(1 − p_k)) √(p_l(1 − p_l)) and R_kl is the correlation between SNVs k and l.
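To make the two burden statistics concrete, the following R sketch computes the weighted scores and a BTCOV-type statistic from a genotype matrix. This is our own illustration, not code from Madsen and Browning (2009) or Schaid et al. (2013): the function and variable names are ours, and the form of V_s follows our reading of the display above.

```r
# Illustrative sketch: weighted burden scores and a BTCOV-type statistic.
# G: genotype matrix (rows = individuals, columns = SNVs, coded 0/1/2),
# y: binary phenotype, q: control-based MAF estimates, n_ctrl: number of controls.
weighted_burden <- function(G, q, n_ctrl) {
  w <- sqrt(n_ctrl * q * (1 - q))        # w_i = sqrt(n_i q_i (1 - q_i))
  as.vector(G %*% (1 / w))               # score_j = sum_i g_ij / w_i
}

btcov_stat <- function(G, y, q, n_ctrl) {
  S  <- weighted_burden(G, q, n_ctrl)
  w  <- sqrt(n_ctrl * q * (1 - q))
  R  <- cor(G)                           # pairwise correlation R_kl between SNVs
  Vs <- sum(outer(1 / w, 1 / w) * R *
            outer(sqrt(q * (1 - q)), sqrt(q * (1 - q))))
  yc <- y - mean(y)
  (sum(yc * S))^2 / (Vs * sum(yc^2))     # T of Section 1.2
}
```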
1.3 Functional Analysis of Variance test

FANOVA fits a continuous function (curve) to the genotype data of an individual. ANOVA can then be used to test for a difference between the curves of cases and controls. While various smoothing methods can be used to fit smooth functions to genotype data, we use cubic B-splines. After continuous functions g_kj(t) are obtained, functional analysis of variance can be used for association testing. The FANOVA model is written as

g_kj(t) = μ_k(t) + ε_kj(t),  ε_kj(t) ~ GP(0, γ),  k = 1, 2,  j = 1, 2, ..., n_k,

where t denotes the genomic position of the genetic variant, k denotes the case or control group, j denotes the individual within a group, GP(0, γ) denotes a Gaussian process with covariance function γ, ε_kj is the error term and μ_k is the mean function of group k. To evaluate the association, we test the hypotheses

H0: μ1(t) = μ2(t) for all t  vs.  H1: μ1(t) ≠ μ2(t) for some t.

Similar to ANOVA, the test statistic for these hypotheses is

F = [Σ_{k=1}^2 n_k ∫(μ̂_k(t) − μ̂(t))² dt / (2 − 1)] / [Σ_{k=1}^2 Σ_{j=1}^{n_k} ∫(g_kj(t) − μ̂_k(t))² dt / (n − 2)],

where μ̂_k(t) = n_k⁻¹ Σ_{j=1}^{n_k} g_kj(t) and μ̂(t) = n⁻¹ Σ_{k=1}^2 n_k μ̂_k(t). The numerator and the denominator of F each follow a mixture of chi-squared distributions; the Satterthwaite approximation is used to approximate the distribution of F by an F-distribution. The details can be found in Vsevolozhskaya et al. (2014).
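The statistic F can be computed directly from the fitted curves by numerical integration. The sketch below is a minimal illustration under the assumption that the smoothed genotype curves of the two groups are stored as matrices of evaluations on a common uniform grid; the null calibration via the Satterthwaite approximation (Vsevolozhskaya et al., 2014) is not reproduced here.

```r
# Sketch of the FANOVA statistic: curves1, curves2 hold the fitted genotype
# curves of cases and controls (rows = individuals, columns = grid points).
fanova_F <- function(curves1, curves2, t_grid) {
  n1 <- nrow(curves1); n2 <- nrow(curves2); n <- n1 + n2
  mu1 <- colMeans(curves1)                    # group mean curve mu_hat_1(t)
  mu2 <- colMeans(curves2)                    # group mean curve mu_hat_2(t)
  mu  <- (n1 * mu1 + n2 * mu2) / n            # overall mean curve mu_hat(t)
  int <- function(f) sum(f) * mean(diff(t_grid))   # crude Riemann integral
  num <- (n1 * int((mu1 - mu)^2) + n2 * int((mu2 - mu)^2)) / (2 - 1)
  den <- (sum(apply(sweep(curves1, 2, mu1), 1, function(r) int(r^2))) +
          sum(apply(sweep(curves2, 2, mu2), 1, function(r) int(r^2)))) / (n - 2)
  num / den
}
```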
1.4 Simulation

We now report the findings of the simulation study that compares the performance of the three methods. We selected a 1-Mb region from chromosome 3 of the unrelated real sequencing data provided by GAW19, which comprises 8,575 SNVs. For each replicate, we randomly selected a 30-kb segment from the 1-Mb region. From this segment, we randomly selected a specified proportion of SNVs for generating the phenotype, using a logistic model. Two types of effects were considered in the simulation; these effects refer to the coefficients of the SNVs in the logistic set-up. We randomly generated the regression coefficients from N(0, 1) for bidirectional effects and from N(2, 1) for unidirectional effects. For each type of effect, we varied the proportion of disease-associated variants from 0.01 to 0.5. One thousand replicates were simulated for each scenario for power and type-I error calculation. For comparison, we adopted the same weights for both burden tests. For FANOVA, we used penalized cubic B-splines to determine the smoothness of the fitted functions.

These simulations evaluated only one genetic region. To investigate the performance of the three methods on regions with different genetic structures, we applied them to the unrelated simulated GAW data in order to evaluate the association of the 294 reported disease-associated genes with the simulated hypertension phenotype. For the association analysis, hypertension status (HTN1) from the first of the 200 simulations was used. This data set has a sample size of 142, of which 24 were cases.

1.5 Results

Type-I error rates of the three tests were well controlled at the 0.05 level (0.046 for BT, 0.044 for BTCOV, and 0.047 for FANOVA). As we observe from Table 1.1, the power of all three tests increases as the proportion of disease-associated variants increases. Overall, FANOVA has better or comparable performance to BT and BTCOV, while BTCOV attains power similar to BT. The same conclusion also holds when the effects are bidirectional (see Table 1.2). We also observe that the power of the three tests was slightly lower for bidirectional effects than for unidirectional effects. Table 1.3 summarizes the top ten genes with the smallest p-values from the association analysis. Consistent with the simulation results, we find that FANOVA generally attains smaller p-values, while the p-values of BT and BTCOV are very similar. Here smaller p-values indicate better performance, as these genes are among the 294 reported disease-associated genes.

Table 1.1: Power for unidirectional effect

%CV      .01    .05    .1     .15    .2     .25    .3     .5
BT       0.343  0.617  0.714  0.766  0.759  0.776  0.793  0.781
BTCOV    0.339  0.615  0.712  0.767  0.755  0.780  0.792  0.794
FANOVA   0.398  0.700  0.764  0.808  0.807  0.814  0.814  0.744

Table 1.2: Power for bidirectional effect

%CV      .01    .05    .1     .15    .2     .25    .3     .5
BT       0.208  0.508  0.585  0.621  0.665  0.678  0.703  0.683
BTCOV    0.200  0.509  0.579  0.618  0.663  0.668  0.698  0.680
FANOVA   0.217  0.597  0.683  0.732  0.765  0.799  0.809  0.815

Table 1.3: Summary of top 10 genes with the smallest p-values from the association analysis

Gene     BT        BTCOV     FANOVA
SUMF1    8.75E-05  8.43E-05  1.31E-05
RELB     6.57E-02  7.08E-02  4.35E-04
HIF3A    2.12E-02  2.19E-02  4.34E-03
THRA     1.85E-02  9.68E-03  3.62E-02
TFDP1    1.50E-02  1.13E-02  1.95E-02
PROK2    1.23E-02  1.32E-02  1.27E-01
POLR2A   2.10E-02  2.25E-02  1.31E-02
CD1C     2.76E-02  1.42E-02  5.27E-02
CCL24    2.24E-02  2.71E-02  1.92E-02
MAP3K6   8.72E-02  8.45E-02  3.27E-02

1.6 Discussion

Through this study, we observe that overall BT and BTCOV have comparable performance. However, for one gene, THRA, BTCOV performed better than the other two tests. In a follow-up analysis, we observe a small LD block in this gene (see Figure 1.1). The plot of the fitted genotype curves reveals that the association happens to lie in that LD block. Therefore BTCOV, which models the LD pattern, outperforms the other two tests. Also, the effects in the LD block were largely unidirectional, which favors burden tests. The genetic structure of sequencing variants is usually more complex than pairwise LD. Functional methods provide tools to visualize this structure and explore the associated regions. Figure 1.1 also reveals the interesting structure of RELB: the associated region lies in an LD block, but the correlation is not as strong as that for THRA, and FANOVA performs much better for this gene.

1.7 Conclusion

Our observations indicate that the performance of the tests depends on the underlying genetic structure, and hence ignoring it in the association analysis may not be ideal. It is advisable to use function-based approaches to explore and model the sequencing structure. As illustrated by Figure 1.1 (fitted genotype function versus variant position), the plot of fitted functional curves provides a reasonable way to explore the genetic structure; the disease-associated regions can also be visualized in the plot. If the underlying genetic structure tends to be complex, it is also advisable to use function-based approaches, such as FANOVA, to adequately model the sequencing data.

[Figure 1.1: LD plots and plots of the fitted smooth functions for THRA]

Chapter 2
Preliminaries on Functional Data Analysis

Functional data can be viewed as realizations of a random variable that takes values in a Hilbert space. In this light, we briefly introduce the concepts of the mean element and the covariance operator in a Hilbert space and give some necessary definitions. We can also think of functional data as the sample paths of a stochastic process with smooth mean and covariance functions. Both these perspectives are discussed in greater detail in Hsing and Eubank (2015).
We briefly discuss the estimation of the mean function and the covariance operator in the space L²[0, 1] of all square integrable functions on the domain [0, 1]. We also discuss the eigenvalues and eigenfunctions of the covariance operator and their estimation. Further details of these concepts can be found in Horváth and Kokoszka (2012).

2.1 Definitions

Let f be a function on a measure space (E, B, μ) that takes values in a separable Hilbert space H.

Definition 2.1.1 A function f is called simple if it can be represented as f(ω) = Σ_{i=1}^k I_{E_i}(ω) g_i, for some finite k, E_i ∈ B and g_i ∈ H.

Definition 2.1.2 Any simple function f(ω) = Σ_{i=1}^k I_{E_i}(ω) g_i with μ(E_i) < ∞ for all i is said to be integrable, and its Bochner integral is defined as ∫f dμ = Σ_{i=1}^k μ(E_i) g_i.

Definition 2.1.3 A measurable function f is said to be Bochner integrable if there exists a sequence {f_n}_{n=1}^∞ of integrable simple functions such that lim_{n→∞} ∫||f_n − f|| dμ = 0. The Bochner integral of f is then defined as ∫f dμ = lim_{n→∞} ∫f_n dμ. The existence of such a sequence f_n is guaranteed by the condition ∫||f|| dμ < ∞.

Definition 2.1.4 Let {e_i} be an orthonormal basis for a Hilbert space H₁ and X be a bounded linear operator from H₁ to a Hilbert space H₂. If X satisfies Σ_{i=1}^∞ ||X e_i||₂² < ∞, where || ||₂ is the norm on H₂, then X is a Hilbert-Schmidt operator. The collection of all such Hilbert-Schmidt operators is denoted by B_HS(H₁, H₂); it is itself a Hilbert space, with inner product ⟨X₁, X₂⟩_HS = Σ_{i=1}^∞ ⟨X₁ e_i, X₂ e_i⟩₂ for X₁, X₂ ∈ B_HS(H₁, H₂).

Definition 2.1.5 Let X₁ ∈ H₁ and X₂ ∈ H₂. The tensor product X₁ ⊗ X₂ is the operator from H₁ to H₂ defined by (X₁ ⊗ X₂)(Y) = ⟨X₁, Y⟩ X₂, where Y ∈ H₁.

Let X be a random element of a separable Hilbert space H defined on a probability space (Ω, F, P). The norm on this space is denoted by || || and the inner product by ⟨·, ·⟩.

Definition 2.1.6 If E||X|| < ∞, the mean of X is defined as the Bochner integral μ = E(X) = ∫X dP.

Definition 2.1.7 Assume that E||X||² < ∞. Then the covariance operator of X is the element of B_HS(H) given by K = E[(X − μ) ⊗ (X − μ)] = ∫(X − μ) ⊗ (X − μ) dP.

The Hilbert space of particular interest to us is L² = L²([0, 1]). We say that a function X belongs to L² if X is defined on [0, 1] and satisfies ∫₀¹ X²(t) dt < ∞. The space L² is a separable Hilbert space with inner product ⟨X, Y⟩ = ∫₀¹ X(t) Y(t) dt.

We consider a random curve X(t), t ∈ [0, 1], to be a random element of L². The mean function is μ(t) = E(X(t)) and the covariance function is K(s, t) = E(X(s)X(t)). We can see that

K(Y)(t) = E(⟨X, Y⟩ X(t)) = E( (∫X(s)Y(s) ds) X(t) ) = ∫E(X(t)X(s)) Y(s) ds = ∫K(t, s) Y(s) ds.

We can show that the operator K is symmetric and positive definite.

Definition 2.1.8 Suppose that there exists a λ such that Ke = λe; then λ is an eigenvalue and e an eigenfunction of K.

Note that we can show that the eigenfunctions of K are linearly independent. We can orthonormalize them using Gram-Schmidt orthonormalization and use them as basis functions; we use this fact extensively in our simulations to generate basis functions.

2.2 Estimation

Now that we have presented some basic definitions, we turn to the estimation of the defined quantities. Assume that we have a sample of curves X₁, ..., X_n from L² that are independent and have the same distribution as X, and that X is integrable. The mean function is estimated by the sample mean

μ̂(t) = n⁻¹ Σ_{i=1}^n X_i(t).

The covariance function is estimated by its sample counterpart

K̂(t, s) = n⁻¹ Σ_{i=1}^n (X_i(s) − μ̂(s))(X_i(t) − μ̂(t)).

Similarly, the covariance operator is estimated by

K̂(x) = n⁻¹ Σ_{i=1}^n ⟨X_i − μ̂, x⟩ (X_i − μ̂),  x ∈ L².

We state some results that establish the properties of these estimates.

Result 2.2.1 Under the assumption that the sample of curves is i.i.d., integrable and has the same distribution as X, E(μ̂) = μ and E||μ̂ − μ||² = O(n⁻¹). Thus the sample mean is an unbiased and consistent estimator, and from now on we may assume that the mean function μ = 0.

The estimate of the covariance is biased, just as in the multivariate case, but the bias is asymptotically negligible.

Result 2.2.2 If E||X||⁴ < ∞ and EX = 0, then E||K̂ − K||²_HS ≤ n⁻¹ E||X||⁴.

We often need to estimate the eigenvalues and eigenfunctions of the covariance operator. The estimates are given by K̂v̂ = λ̂v̂. Note that if v is an eigenfunction then av is also an eigenfunction; the eigenfunctions are usually normalized so that ||v|| = 1. This still does not determine the sign of v: if k̂_j = sign(⟨v̂_j, v_j⟩), then k̂_j cannot be determined from the data.

Result 2.2.3 Suppose E||X||⁴ < ∞, EX = 0 and λ₁ > λ₂ > ... > 0. Then, for each j ≥ 1, lim sup_{n→∞} n E(||k̂_j v̂_j − v_j||²) < ∞ and lim sup_{n→∞} n E(|λ̂_j − λ_j|²) < ∞. If we assume that only the top p + 1 eigenvalues are non-zero, then the above result holds for 1 ≤ j ≤ p.
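As a concrete illustration of these estimators, the following R sketch computes the sample mean function, the sample covariance, and the estimated eigenvalue-eigenfunction pairs for curves recorded on a uniform grid on [0, 1]. The toy data and the grid-based discretization of the operator are our own choices, not prescribed by the text.

```r
# Sketch of the estimates of Section 2.2 for curves stored as an n x m matrix
# of evaluations on a uniform grid on [0, 1].
n <- 100; m <- 101
t_grid <- seq(0, 1, length.out = m)
X <- t(replicate(n, cumsum(rnorm(m)) / sqrt(m)))   # toy sample of curves

mu_hat <- colMeans(X)                    # sample mean function mu_hat(t)
Xc     <- sweep(X, 2, mu_hat)            # centered curves
K_hat  <- crossprod(Xc) / n              # sample covariance K_hat(t, s) on the grid

# Eigen decomposition of the discretized operator: scaling by the grid spacing h
# makes the eigenfunctions orthonormal in L2[0, 1] rather than in R^m.
h  <- 1 / (m - 1)
eg <- eigen(K_hat * h, symmetric = TRUE)
lambda_hat <- eg$values                  # estimated eigenvalues
e_hat      <- eg$vectors / sqrt(h)       # estimated eigenfunctions on the grid
```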
We now state Mercer's theorem and the Karhunen-Loève expansion. Let (E, B, μ) be a measure space. Suppose that K is a measurable continuous function on E × E such that ∫∫K²(s, t) dμ(s) dμ(t) < ∞. Define the operator K by Kf(·) = ∫K(s, ·) f(s) dμ(s); K is the integral operator of K, and K(·, ·) is the kernel function of K.

Definition 2.2.1 The kernel function K is symmetric if K(s, t) = K(t, s).

Definition 2.2.2 The kernel function is non-negative definite if Σ_{i,j=1}^n c_i c_j K(x_i, x_j) ≥ 0 holds for all n ∈ N, x₁, ..., x_n ∈ E and c₁, ..., c_n ∈ R. It is positive definite if the inequality is strict.

The following result is Mercer's theorem. Notice that the covariance function has this representation.

Result 2.2.4 Let K be a symmetric, non-negative definite kernel function and K be its integral operator. If (λ_i, e_i) are the eigenvalue-eigenfunction pairs of K, then K has the representation K(s, t) = Σ_{i=1}^∞ λ_i e_i(s) e_i(t).

The following result is the Karhunen-Loève expansion.

Result 2.2.5 Let X be a zero-mean, square-integrable random function defined over a probability space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance function K(s, t). Let (λ_i, e_i) be the eigenvalue-eigenfunction pairs of the integral operator of the covariance function. Then X(t) admits the decomposition X(t) = Σ_{i=1}^∞ Z_i e_i(t), where Z_k = ∫X(t) e_k(t) dt. Furthermore, the random variables Z_k have zero mean, are uncorrelated, and have variance λ_k.
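In our simulations we repeatedly generate curves in exactly this form, by truncating the expansion. A minimal sketch follows, with the Fourier-type basis and the eigenvalue sequence λ_k = k⁻² chosen purely for illustration.

```r
# Sketch: sample n curves from a truncated Karhunen-Loeve expansion
# X_i(t) = sum_{k=1}^p Z_ik e_k(t), with var(Z_ik) = lambda_k (Result 2.2.5).
kl_sample <- function(n, p = 10, m = 101) {
  t_grid <- seq(0, 1, length.out = m)
  E <- sapply(1:p, function(k) sqrt(2) * sin(k * pi * t_grid))  # orthonormal basis
  lambda <- (1:p)^(-2)                                          # illustrative eigenvalues
  Z <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(lambda))        # uncorrelated scores
  Z %*% t(E)                     # n x m matrix of curve evaluations
}
X <- kl_sample(200)
```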
Chapter 3
Multivariate Generalized Functional Linear Models

3.1 Introduction

The focus of this chapter is to provide a large sample test for assessing whether a functional covariate has a regression effect on real valued responses when there is some dependence among the responses. This kind of data typically arises in family based genetic studies in which gene expression data or even DNA sequencing data is collected; the aim of these studies is often to test for a regression relation between the gene region and the phenotype of interest.

Sequencing data for a gene or a gene region consists of observations on a large number of single nucleotide variants. In the light of linkage disequilibrium (Laird and Lange, 2010b), we know that variants that are closer to each other may have greater association than those that are farther apart. This motivates us to treat the high dimensional sequencing data as a function of the single nucleotide variant positions and necessitates the use of a functional data based inference method for correlated data.

There is abundant literature on functional linear models (Cardot et al., 1999, 2003; Cardot and Sarda, 2005; Cardot and Johannes, 2010; Ramsay, 2006); recent reviews of functional regression can be found in Morris (2015) and Wang et al. (2016). These linear models assume that the response is a univariate continuous variable and the regressor variable is a function. In genetic studies, however, the phenotype or response variable is often binary, and few papers discuss the generalized functional linear models (Müller and Stadtmüller, 2005; Li et al., 2010; Gertheiss et al., 2013) needed for handling binary response variables. Existing methodology cannot be directly applied to genetic family data, where the response variable is a vector of dependent traits and the regressor is a vector of functions. We address this shortcoming using generalized estimating equations. The estimators thus obtained are shown to be consistent and asymptotically normal under suitable conditions on the underlying entities. The latter result is used to propose an asymptotic test for a regression relation between the functional covariates and the responses. The chapter also includes a finite sample simulation study. Through this study, we first demonstrate the importance of addressing correlation structures in the data and then compare the performance of our proposed method with another method for family studies proposed by Wang et al. (2013), which is based on a generalized estimating equations approach suitable for accommodating a large number of variables. We also present additional results demonstrating the effect of the sample size, the dimension of the parameter to be estimated, and the family (cluster) size on the power of the proposed test.

This chapter is organized as follows. Section 3.2 contains a formal description of the original infinite dimensional model and a working truncated model based on the strategy proposed by Müller and Stadtmüller (2005). Section 3.3 describes the estimators based on generalized estimating equations, while Section 3.4 presents the asymptotic results for these estimators. Section 3.5 describes the findings of a simulation study, and Section 3.6 presents a real data application.

3.2 Model

Let m, n be positive integers. We observe n clusters (X_i(t), t ∈ T = [0, 1], Y_i), i = 1, ..., n, each of size m, where for each i, X_i = (X_i1, X_i2, ..., X_im)ᵀ is an m-dimensional predicting process and Y_i = (Y_i1, Y_i2, ..., Y_im)ᵀ is a vector of m responses. We assume that these n clusters are independent and identically distributed and that all clusters have the same correlation structure. For the jth subject in the ith cluster, the predicting process X_ij(t), t ∈ T, is assumed to be a square integrable random process on T. The corresponding response Y_ij can be a continuous or discrete real valued variable, related to X_ij(t), t ∈ T, via the following generalized regression model; here ω is a positive measure on T.
For a constant β̃₀ and a real valued function β̃(t), t ∈ T, let

ζ̃_ij = β̃₀ + ∫β̃(t) X_ij(t) dω(t),  i = 1, ..., n, j = 1, ..., m.

All integrals in this chapter are taken over the interval T unless specified otherwise, and in the rest of this chapter i = 1, ..., n and j = 1, ..., m. We model the regression of Y_ij on X_ij as

Y_ij = g(ζ̃_ij) + e_ij,
μ_ij = E(Y_ij | X_ij(t), t ∈ T) = g(ζ̃_ij),
σ²(μ_ij) = var(Y_ij | X_ij(t), t ∈ T) = σ²(ζ̃_ij),        (3.2.1)
e_i = (e_i1, e_i2, ..., e_im)ᵀ,  R₀ = cor(e_i),

for a known real valued link function g and a positive function σ, where the e_ij have zero mean and R₀ is the m × m true correlation matrix.

We now give another representation of model (3.2.1). Let ρ_k, k = 1, 2, ..., be an orthonormal basis of the functional space L²(T, ω) = L²(ω). The predictor process and the parameter function can be written as

X_ij(t) = Σ_{k=1}^∞ ε_ij^(k) ρ_k(t),  β̃(t) = Σ_{k=1}^∞ β̃_k ρ_k(t),        (3.2.2)

with random variables ε_ij^(k) = ∫X_ij(t) ρ_k(t) dω(t) and coefficients β̃_k = ∫β̃(t) ρ_k(t) dω(t). By Assumption 3.4.1 below, E(ε_ij^(k)) = 0 for all i, j and k. Note that the random variables ε_ij^(k) and ε_ij^(l) are uncorrelated for k ≠ l. By the orthonormality of the basis functions,

Σ_{k=1}^∞ E(ε_ij^(k)²) = ∫E(X_ij²(t)) dω(t) < ∞,  ∫β̃(t) X_ij(t) dω(t) = Σ_{k=1}^∞ β̃_k ε_ij^(k).

Using the above representation we now address the infinite dimensionality issue, or equivalently the issue of an infinite number of predicting variables. Based on the truncation strategy proposed by Müller and Stadtmüller (2005), we replace model (3.2.1) with the following approximate sequence of finite dimensional models. Let p = p_n be a sequence of positive integers tending to infinity and define the new approximate model as

Y_ij = g(β̃₀ + Σ_{k=1}^{p_n} β̃_k ε_ij^(k)) + e_ij,
η̃_ij^(p_n) = β̃₀ + Σ_{k=1}^{p_n} β̃_k ε_ij^(k),  U_ij^(p_n)(t) = Σ_{k=1}^{p_n} ε_ij^(k) ρ_k(t),        (3.2.3)
σ²(η_ij^(p_n)) = σ²(μ_ij^(p_n)) = var(e_ij | U_ij^(p_n)),
e_i = (e_i1, e_i2, ..., e_im)ᵀ,  R₀ = cor(e_i).

Let β̃ = (β̃₀, ..., β̃_{p_n})ᵀ. The superscript p_n indicates the number of parameters; we exhibit this superscript and subscript when necessary. We assume that the standardized error e_ij/σ(μ_ij) is independent of ε_ij^(k) for all i, j, k. In the sequel, N_m(μ, Σ) stands for the m-dimensional normal distribution with mean vector μ and covariance matrix Σ, m ≥ 1, and all limits are taken as n → ∞ unless specified otherwise. For any vector or finite dimensional matrix A, ||A|| denotes the Frobenius norm of A.

3.3 Estimation

We use the generalized estimating equations set-up for estimating the parameter β̃. In most applications we do not know the true correlation matrix R₀, so we use a working correlation matrix R(γ) to specify the working covariance; the parameter γ can be estimated using the residuals. We use R̂ to denote an estimated working correlation matrix. The estimate R̂ that we use below was suggested by Balan et al. (2005). Denote the derivative of the function g by g′. The estimator, denoted by β̂, is the solution to the following equation:

U*(β) = Σ_{i=1}^n F_iᵀ G_i(β) A_i(β)^{-1/2} R̂⁻¹ A_i(β)^{-1/2} e_i(β) = Σ_{i=1}^n F_iᵀ G_i(β) V̂_i(β)⁻¹ e_i(β) = 0,        (3.3.1)

where

G_i(β) = diag{g′_i1(β), ..., g′_im(β)},
F_iᵀ = (ε_i1, ..., ε_im),  with ε_ij = (ε_ij^(0), ε_ij^(1), ..., ε_ij^(p_n))ᵀ and ε_ij^(0) = 1,
e_i(β) = (y_i1 − g_i1(β), ..., y_im − g_im(β))ᵀ,
A_i(β) = diag{σ²_i1(β), ..., σ²_im(β)},
R̂ = n⁻¹ Σ_{i=1}^n A_i(β̌)^{-1/2} (Y_i − g_i(β̌))(Y_i − g_i(β̌))ᵀ A_i(β̌)^{-1/2},
V̂_i(β) = A_i(β)^{1/2} R̂ A_i(β)^{1/2}.

Here g_ij(β) = g(ε_ijᵀ β), g′_ij(β) = g′(ε_ijᵀ β), g_i(β) = (g_i1(β), ..., g_im(β))ᵀ, σ_ij(β) = σ(ε_ijᵀ β), and β̌ is a preliminary √(n/p_n)-consistent estimate of β. It can be obtained using the independence estimating equations. Further details of this estimator can be found in Wang et al. (2011).
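Equation (3.3.1) has no closed form solution in general and is solved iteratively. The sketch below is our minimal Fisher-scoring illustration for the logistic link used in our binary simulations, with a fixed working correlation R; it is meant only to show how the quantities F_i, G_i, A_i and R̂ enter the computation, not to serve as a production implementation.

```r
# Sketch: Fisher scoring for the estimating equation (3.3.1), logistic link.
# Y:   n x m matrix of binary responses,
# eps: n x m x (p+1) array of truncated scores, eps[i, j, ] = eps_ij (first entry 1),
# R:   m x m working correlation matrix (e.g. the estimate R_hat).
gee_fit <- function(Y, eps, R, n_iter = 25) {
  n <- dim(eps)[1]; m <- dim(eps)[2]; p1 <- dim(eps)[3]
  beta <- rep(0, p1)
  Rinv <- solve(R)
  for (it in 1:n_iter) {
    U <- rep(0, p1); H <- matrix(0, p1, p1)
    for (i in 1:n) {
      Fi  <- matrix(eps[i, , ], m, p1)          # F_i: rows are eps_ij'
      mu  <- plogis(as.vector(Fi %*% beta))     # g(eps_ij' beta)
      Gi  <- diag(mu * (1 - mu), m)             # G_i: g'(eta) on the diagonal
      Ai2 <- diag(1 / sqrt(mu * (1 - mu)), m)   # A_i^{-1/2}
      W   <- t(Fi) %*% Gi %*% Ai2 %*% Rinv %*% Ai2
      U   <- U + W %*% (Y[i, ] - mu)            # contribution to U*(beta)
      H   <- H + W %*% Gi %*% Fi                # minus the leading Jacobian term
    }
    beta <- as.vector(beta + solve(H, U))       # scoring update
  }
  beta
}
```

In practice the preliminary estimate β̌ from the independence working structure is used to form R̂, which then stays fixed in the scoring iterations above.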
   yim − gim (β) (p) εim n ˇ i − gi (β))(Y ˇ ˇ T ˇ ˆ= 1 R Ai (β)(Y i − gi (β)) Ai (β) n i=1 ˆ i (β)−1/2 . Vˆi (β) = Ai (β)−1/2 RA (0) (p ) Let εij = 1, εij = (εij , ..., εij n )T , gij (β) = g(εTij β), gi (β) = (gij (β), ..., gij (β)), σi1 (β) = σ(εTij β) and βˇ be a preliminary n/pn consistent estimate of β. It can be obtained using the 18 independence estimating equations. Further details of this estimator can be found in Wang et al. (2011). 3.4 3.4.1 Asymptotic Theory Existence and Consistency We first state the assumptions needed for existence of the solution of (3.3.1) and its consistency. Note that λmax (A), λmin (A) denote the maximum and minimum eigenvalues of a matrix A respectively. Also →D (→P ) denote convergence in distribution (probability). Assumption 3.4.1 Assume that E{X(t)} = 0 for all t, E X 4 < ∞ and β˜ 2 < ∞. Assumption 3.4.2 Function g is monotone, invertible, and has two continuous bounded derivatives. The function σ 2 has a continuous bounded derivative and is bounded from below by δ > 0. Assumption 3.4.3 Assume that pn n−1/2 → 0. Assumption 3.4.4 The true correlation matrix R0 has positive eigenvalues. The estimated workˆ satisfies R ˆ −1 − R ¯ −1 ing correlation matrix R ¯ is a constant positive = Op {(pn /n)1/2 }. R definite matrix with positive eigenvalues. Assumption 3.4.5 There exist two positive constants b1 , b2 such that b1 ≤ λmin {E(F1T F1 )} ≤ λmax {E(FiT Fi )} ≤ b2 . We assume that the mean function of the regressor functions is zero to ease some of the calculations. Note that since we can obtain a consistent estimator of the mean function this assumptions 19 is easily satisfied. We assume that E X 4 < ∞ mainly to obtain consistent estimates of the eigenvalues (Horv´ath and Kokoszka, 2012). Assumption 4, 5 and 6 can also be found in Wang et al. ˆ used in the score function satisfies the assumption 4. (2011). We prove that R Theorem 3.4.1 Under the assumptions 3.4.1–3.4.5, the solution βˆ to (3.3.1) exists and satisfies the following ˜ = Op ( pn /n) ||βˆ − β|| The proof is along the same lines as Wang et al. (2011) and can be found in the appendix. We approximate U ∗ (β) by U¯ (β) = n T −1/2 R ¯ −1 Ai (β)−1/2 ei (β). i=1 Fi Gi (β)Ai (β) Thus, we can consistently estimate the parameters even if the correlation structure is misspecified. 3.4.2 Asymptotic Normality To show the asymptotic normality of the estimator we will approximate U ∗ (β) by n U (β) = i=1 FiT Gi (β)Ai (β)−1/2 R0−1 Ai (β)−1/2 ei (β). We will now rewrite the U (β) as U (β) = DT (β)V (β)−1/2 (Y − µ(β)), where, Vi (β) = Ai (β)R0−1 Ai (β),  0 V1 (β)    0 V2 (β)  V (β) =   .. ..  . .   0 0 i = 1, ..., n,  ... 0    ... 0   ,  . .. . . .    . . . Vn (β) (3.4.1) 20  −1/2 (β) 0 V1   −1/2  (β) 0 V2  D(β) =   .. ..  . .   0 0 ... ... .. . ...  0 0  G1 (β)    0 0 G2 (β)     .. .. ..  . . .   −1/2 Vn (β) 0 0 Γ(β) = 1 E(D(β)T D(β)) = (γkl (β))0≤k,l≤p , n J(β) = dU (β) , dβ  ... 0 ... 0 .. .. . . . . . Gn (β)     F1       ..   . ,       Fn Ξ(β) = Γ(β)−1 = (ξkl (β))0≤k,l≤p ,    l11 . . . l1m     ..  . −1 T . . . Y = (Y11 , Y12 , ..., Y1m , ..., Ynm ) , R0 =  . . . .     lm1 . . . lmm ˆ Next, we shall state the needed assumptions for establishing the asymptotic normality of β. Assumption 3.4.6 Assume that pn n−1/8 → 0. ˜ = n−1 E{D(β) ˜ T D(β)} ˜ and n−1 D(β) ˜ T D(β) ˜ are Assumption 3.4.7 The matrices Γ = Γ(β) non-singular for all n. 
Assumption 3.4.8 The eigenvalues of Γ are bounded and ||{n⁻¹ D(β̃)ᵀ D(β̃)}⁻¹|| = O_p(p_n^{1/2}).

We are now ready to state the theorem on the normality of the estimate. Let Γ̃ = n⁻¹ D(β̃)ᵀ D(β̃) and Γ̂ = n⁻¹ D(β̂)ᵀ D(β̂). Note that Γ̃ depends on the unknown parameter, whereas Γ̂ is a plug-in estimator of Γ̃.

Theorem 3.4.2 Under Assumptions 3.4.1-3.4.8, the following statements hold:

[n(β̂ − β̃)ᵀ Γ (β̂ − β̃) − (p_n + 1)] / √(2(p_n + 1)) →_D N(0, 1),        (3.4.2)
[n(β̂ − β̃)ᵀ Γ̃ (β̂ − β̃) − (p_n + 1)] / √(2(p_n + 1)) →_D N(0, 1),        (3.4.3)
[n(β̂ − β̃)ᵀ Γ̂ (β̂ − β̃) − (p_n + 1)] / √(2(p_n + 1)) →_D N(0, 1).        (3.4.4)

Note that this theorem can be used to construct 1 − α confidence bands; refer to Corollary 4.3 in Müller and Stadtmüller (2005). The proofs of both theorems are given at the end of this chapter.

We are now ready to describe a test for the problem of testing for no regression relation between a real valued response and a functional predicting variable. Referring to model (3.2.1), testing for no association between the response and the predicting process is equivalent to testing H₀: β̃ = 0. Since we use the sequence of approximate models proposed in (3.2.3), we test the following hypothesis instead:

H₀: (β̃₁, ..., β̃_{p_n}) = (0, ..., 0),

versus the alternative that H₀ is not true, for a given appropriate value of p_n. The proposed test rejects H₀ for large absolute values of the statistic

D_n = [n β̂ᵀ Γ̂ β̂ − (p_n + 1)] / √(2(p_n + 1)).

From Theorem 3.4.2 it follows that the test that rejects H₀ whenever |D_n| > z_{α/2} is of asymptotic size α, where z_α is the 100(1 − α)th percentile of the standard normal distribution.
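Given the fitted β̂ and the plug-in matrix Γ̂, the test is a one-line computation. The sketch below is our illustration, in which DtD stands for D(β̂)ᵀD(β̂) (for example, the matrix H accumulated in the fitting sketch of Section 3.3):

```r
# Sketch of the test based on D_n: reject H0 when |D_n| exceeds z_{alpha/2}.
Dn_test <- function(beta_hat, DtD, n, alpha = 0.05) {
  p1        <- length(beta_hat)                     # p_n + 1
  Gamma_hat <- DtD / n
  Dn <- (n * drop(t(beta_hat) %*% Gamma_hat %*% beta_hat) - p1) / sqrt(2 * p1)
  list(Dn      = Dn,
       reject  = abs(Dn) > qnorm(1 - alpha / 2),
       p_value = 2 * pnorm(-abs(Dn)))
}
```

For small p_n the quadratic form n β̂ᵀΓ̂β̂ itself can instead be referred to a chi-square distribution, as we do in the data application of Section 3.6.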
3.5 Simulation

In this section we report the findings of several simulation studies that investigate different aspects of the proposed testing procedure. In the first study we demonstrate the importance of considering the correlation structure in the data. We then explore the applicability of our method to sequencing data: in this study we use real sequencing data in the simulation to demonstrate the applicability of our method to genetic studies, and we compare the empirical power of our test to that of gSKAT, proposed by Wang et al. (2013). We then study several other aspects of the proposed test by varying the sample and cluster sizes and the dimension p_n.

For the first simulation we generated pseudo-random regression functions using Fourier basis functions ρ_k, k ≥ 1, and the model X(t) = Σ_{k=1}^{p_n} ε^(k) ρ_k(t), where the ε^(k) are independent and identically distributed N(0, 1) variables. We then evaluated each function on a finite grid of points to obtain functional data that are realizations of an underlying function; we take this approach because in applications only a finite discretization of functional data is available. We then applied a smoothing procedure (described later in this section) to the discrete data to obtain the smooth functions used in our proposed test. We simulated the effect function β(t) as β(t) = Σ_{k=1}^{p_n} β̃_k ρ_k(t), with β̃_k = k⁻¹ δ, k ≥ 1. Here δ determines the magnitude of the effect of the regression function on the response. The number of basis functions p_n used to generate each of these functions is set to 10 n^{1/7}. We studied the performance of our method for both continuous and binary responses.

For cluster size m = 3 and a continuous response, we generated the response variables from the model Y_i = ∫₀¹ X_i(t) β(t) dt + e_i, with e_i ~ N₃(0̄, R), where 0̄ is a vector of zeroes of length 3 and R is a 3 × 3 correlation matrix with all off-diagonal elements equal to γ, |γ| < 1. For a binary response, we generated the correlated responses using a function from the bindata package in R, with marginal probabilities given by g(∫₀¹ X(t) β(t) dt), where g is the logistic link function; the correlation matrix is the same as in the continuous case.

We then studied the empirical power of the proposed test of no regression relation both when the underlying correlation structure is taken into account and when the observations are treated as independent; we label these two approaches the F test and the Ind test, respectively. In these power studies we chose n = 500, used 1000 replicates, and set the significance level to α = 0.05. The value of δ was fixed at 0.08 for continuous traits and at 0.1 for binary ones. The number of parameters p_n was determined using a five fold cross validation procedure (see the end of this section for how we chose the value of p_n). We can see from Table 3.1 that as the correlation between individuals increases, the empirical power of the F test increases, whereas that of the Ind test remains more or less the same, as expected. This demonstrates the need to factor in the correlation structure. The empirical levels for this study can be found in Table 3.2.

Table 3.1: Empirical power as a function of γ

             Continuous Trait        Binary Trait
Corr(γ)     F Test   Ind Test       F Test   Ind Test
0           0.421    0.418          0.137    0.127
0.05        0.456    0.450          0.130    0.124
0.30        0.567    0.457          0.165    0.124
0.50        0.692    0.443          0.195    0.129
0.80        0.993    0.455          0.308    0.134

Table 3.2: Empirical level as a function of γ

             Continuous Trait        Binary Trait
Corr(γ)     F Test   Ind Test       F Test   Ind Test
0           0.030    0.051          0.039    0.032
0.05        0.048    0.049          0.036    0.035
0.30        0.045    0.051          0.032    0.029
0.50        0.059    0.043          0.043    0.037
0.80        0.068    0.047          0.041    0.037

For the second study, we compared the power of our test with that of gSKAT for testing no regression relation. Note that gSKAT is a sequence kernel association test based on generalized estimating equations; it is not meant for functional variables, though it can handle cases where the number of variables is larger than the sample size. To make the comparison fair, and to investigate the treatment of sequencing data as a function, we used real genetic data from the 1000 Genome Project and simulated response values using this data. In particular, we used a region of the genome (chromosome 17) from this dataset for the following simulation. Note that this is not a family data set, so all individuals are assumed to be independent. The primary focus of this simulation is to investigate the treatment of sequencing data as a function.

In real applications we do not observe the regressor function; rather, we assume that the densely observed data is a realization of a function. So the first step in any functional data study is a smoothing step, which involves constructing regressor functions from observed data. Smoothing methods given in Ramsay (2006) typically use the following model to fit a single curve. The curve X_ij is observed at l discrete points (t₁, ..., t_l); let x_ijk, k = 1, ..., l, denote these observed values. We recover the function X_ij from the observed values by fitting the linear model x_ijk = Σ_{q=1}^r c_q φ_q(t_k), k = 1, ..., l, where the φ_q are basis functions. We choose a large value of r and use penalization techniques to ensure that the fitted function is not too rough. We penalize the integral of the squared second derivative, i.e., we choose c_q, 1 ≤ q ≤ r, to minimize

Σ_{k=1}^l (x_ijk − Σ_{q=1}^r c_q φ_q(t_k))² + λ PEN₂(X_ij),

where PEN₂(X_ij) = ∫{D² X_ij(s)}² ds and D² denotes the second derivative. The underlying function X_ij is then approximated by X̂_ij(t) = Σ_{q=1}^r ĉ_q φ_q(t). We used cubic B-spline basis functions and the smooth.spline function available in R to carry out the smoothing procedure, specifying the knots at (t₁, ..., t_l), which leads to r = l + 4 − 2.
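A minimal version of this smoothing step, using the smooth.spline function mentioned above, is sketched below; the toy curve is ours, and all.knots = TRUE places knots at all observation points, matching the choice of knots at (t₁, ..., t_l).

```r
# Sketch of the smoothing step: recover a smooth curve from noisy evaluations.
t_obs <- seq(0, 1, length.out = 50)
x_obs <- sin(2 * pi * t_obs) + rnorm(50, sd = 0.2)      # toy discretized curve

fit   <- smooth.spline(t_obs, x_obs, all.knots = TRUE)  # penalized cubic spline
X_fun <- function(t) predict(fit, t)$y                  # fitted smooth X_ij(t)

# Evaluate the recovered curve on a fine grid for later numerical integration.
t_grid   <- seq(0, 1, length.out = 200)
X_smooth <- X_fun(t_grid)
```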
After obtaining the underlying functions, we simulated discrete and continuous responses using the sequencing data. Let X denote the sequencing data matrix of dimension (Σ_{i=1}^n m_i) × l, where n, m_i and l represent the number of clusters, the cluster size, and the number of variants, respectively. For the selected region from the 1000 Genome Project, l = 800 and n = 1092, while the family size m_i is one for all families. The responses were generated using the relations

Y_i = Σ_{k=1}^l β_k X_ik + e_i,  e_i ~ N_{m_i}(0̄, R_i),  β_k = k⁻¹ δ,  k = 1, ..., l,

where R_i is the m_i × m_i correlation matrix with all off-diagonal terms equal to γ = 0. For the binary case we proceeded as before, using the rmvbin function in R. The following empirical powers are based on 1000 replicates at significance level 0.05, with p_n chosen by the five fold cross validation procedure (see the end of this section). Table 3.3 shows that both tests have comparable power, but the Type-I error of gSKAT in the binary case is inflated.

Table 3.3: Empirical power comparison with the gSKAT

             Continuous Trait        Binary Trait
Effect(δ)   F Test   gSKAT          F Test   gSKAT
0.00        0.041    0.048          0.047    0.061
0.50        0.102    0.108          0.072    0.063
1.00        0.314    0.374          0.114    0.133
3.00        0.996    0.994          0.555    0.608
5.00        1.00     1.00           0.951    0.962

In the simulation study investigating the effect of sample size and cluster size on the power of our test, the regression function was generated using the Fourier basis functions and the response variable was generated using the linear and logistic models of the first simulation study. The number of basis functions used to generate the regression functions is again set to 10 n^{1/7}. The effect size δ is set to 0.1 for the binary response and 0.05 for the continuous response, and the correlation for both the sample size and the cluster size studies is set to 0.8. From Table 3.4 we see that the power increases with the sample size for both types of response.

Table 3.4: Empirical power as a function of sample size

Sample Size        500      1000     2000
Continuous Trait   0.718    0.970    1
Binary Trait       0.295    0.65     0.946

To demonstrate the effect of using a large cluster size m, we used a sample size of 100 in this study. The results can be seen in Table 3.5. We observe that the Type-I error is inflated as the cluster size increases. For a cluster of size m, the number of parameters in the correlation matrix is of order m²; a large m will affect the consistency of the correlation estimate R̂ used in the score equation (3.3.1), rendering our asymptotic results invalid.

Table 3.5: Empirical level as a function of cluster size

Cluster Size       3        10       20
Continuous Trait   0.203    0.659    0.867
Binary Trait       0.05     0.14     0.4

Table 3.6 below demonstrates the effect of increasing the number of parameters p_n in the model.
The regressor function and the continuous response variable were simulated as before. The number of basis functions used to generate the regressor was set to p_n = 30, 50, 90, the sample size was 500, the effect size δ was 0.05, and the correlation parameter γ was taken to be 0.8. The number of parameters in the model was fixed at p_n = 30, 50, 90 rather than chosen by cross validation. We can see that as the number of parameters increases, the Type-I error also increases.

Table 3.6: Effect of increasing p_n

p_n               30       50       90
Empirical level   0.068    0.065    0.107

Figure 3.1 displays normal Q-Q plots to check the normality of the test statistic. For small dimensions the test statistic follows a chi-square distribution with p degrees of freedom; as p increases, the normal approximation becomes more appropriate.

[Figure 3.1: Normal Q-Q plots of the sample quantiles of the test statistic against theoretical normal quantiles, for p = 2, 10, 30 and 80.]

An important problem that we need to address when applying our method is the determination of the number of parameters p_n. In all of the simulations reported here, we used the functional principal component analysis method for dimension reduction: we projected the infinite dimensional function onto the finite p_n dimensional subspace spanned by the first p_n eigenfunctions of the covariance operator of the regressor functions. This covariance operator is estimated from the regressor functions obtained by smoothing the observed data. We first selected as candidate values for p_n those values of p̃ for which the proportion of variance explained by the first p̃ principal components, Σ_{k=1}^{p̃} λ_k / Σ_{k=1}^∞ λ_k, varied from 80% to 99%. We then used five fold cross validation to select the final value of p_n.
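A sketch of this two-stage choice of p_n, given the estimated eigenvalues of the covariance operator, follows; cv_error is a placeholder for a user-supplied routine returning the five fold cross validation error of the model fitted with p leading principal components.

```r
# Sketch: candidate p_n values are those explaining 80-99% of the variance;
# the final value minimizes the cross-validated error.
choose_pn <- function(lambda_hat, cv_error, folds = 5) {
  pve        <- cumsum(lambda_hat) / sum(lambda_hat)   # proportion of variance explained
  candidates <- which(pve >= 0.80 & pve <= 0.99)
  errs       <- sapply(candidates, function(p) cv_error(p, folds))
  candidates[which.min(errs)]
}
```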
3.6 Application to the sequencing data from the Minnesota Twin Study

Substance use disorders and related health conditions pose a remarkable burden on global health. Twin and family-based studies have suggested a substantial genetic contribution to substance dependence (e.g., nicotine and alcohol dependence), and studies have been successful in identifying genetic variants contributing to nicotine dependence. In this application, we applied the proposed method to assess the dependence between 15 neuronal nicotinic acetylcholine receptor (nAChR) subunit genes and nicotine dependence, using the sequencing data from the Minnesota Twin Study (Vrieze et al., 2014). Genetic associations between nAChR subunit genes and nicotine dependence have already been established: a comprehensive study of these genes (Saccone et al., 2009) found associations with nicotine dependence for loci in CHRNA5, CHRNA3, CHRNA4, CHRNB4, CHRNB3, CHRNB1, CHRNA6, CHRND and CHRNG. Our aim is to confirm whether our test can replicate these associations.

The sequencing data set we used for this analysis has 662 families and 1445 individuals. Nicotine dependence is a continuous variable measured according to the protocols of the Substance Abuse Module of the Composite International Diagnostic Interview (Hicks et al., 2011); it considers the frequency and quantity of nicotine use, including cigarettes, cigars, pipes and chewing tobacco. The covariates age and sex were also considered: we first fit a regression model with nicotine dependence as the response and age and sex as predictors, and then used the residuals in the analysis.

We apply our proposed test to each of the 15 genes individually. As the response and regressors are centered, we omit the intercept from this analysis. The smooth functions obtained by the smoothing method discussed in the simulation section serve as regressors in the test. Table 3.7 reports the number of parameters p_n used for each gene. Note that as these values are small, the normal approximation does not work well, as can be seen from Figure 3.1. We instead use the statistic D_n* = n β̂ᵀ Γ̂ β̂, which asymptotically follows a χ²_{p_n} distribution (the intercept being omitted), and use it to report the unadjusted p-values for the F test. From this table, we find that the p-values of the F test are smaller than those of gSKAT for many genes. Our method is also able to detect several associations that are undetected by gSKAT, which might suggest that our method has better performance. Note that even after adjustment for multiple testing, our method is able to find associations for genes such as CHRNB2, CHRNA6 and CHRND.

Table 3.7: p-values for Minnesota Twin Study

Gene      F Test   gSKAT    p_n   D_n*
CHRNA1    0.929    0.621    2     0.147
CHRNA2    0.740    0.595    2     0.600
CHRNA3    0.057    0.611    2     5.713
CHRNA4    0.011    0.125    2     8.900
CHRNA5    0.148    0.415    2     3.810
CHRNA6    4E-05    0.216    6     29.765
CHRNA7    0.625    0.736    2     0.937
CHRNA9    0.400    0.524    2     1.831
CHRNB1    0.017    0.870    2     8.047
CHRNB2    2E-09    0.1351   7     53.838
CHRNB3    0.003    0.429    5     17.524
CHRNB4    0.382    0.561    2     1.922
CHRND     8E-04    0.175    2     14.230
CHRNE     0.207    0.675    2     3.142
CHRNG     0.044    0.270    2     6.219

3.7 Discussion

The proposed method can easily be extended to include additional finite dimensional covariates Z = (Z₁, ..., Z_k) as well as finitely many functional regressors X₁, ..., X_l. The main question is how to choose the dimensions p_{1n}, ..., p_{ln} for the functional regressors. We can use cross validation techniques to determine p_{1n}, ..., p_{ln}, although this is more computationally challenging than the case of a single functional regressor. The dimension of the truncated model (3.2.3) is then p_n = Σ_{j=1}^l p_{jn} + k + 1.

3.8 Results

This section contains the existence, consistency and asymptotic normality results. Throughout, C denotes a generic positive constant.

Proposition 3.8.1 The inverse of the estimated correlation matrix converges in probability to the inverse of the true correlation matrix: ||R̂⁻¹ − R₀⁻¹|| = O_p{(p_n/n)^{1/2}}.

Proof Let R* = n⁻¹ Σ_{i=1}^n A_i^{-1/2}(β̃){Y_i − g_i(β̃)}{Y_i − g_i(β̃)}ᵀ A_i^{-1/2}(β̃). The central limit theorem gives ||R* − R₀|| = O_p(n^{-1/2}). We shall show that ||R̂ − R*|| = O_p{(p_n/n)^{1/2}}; combining these two results yields the proposition. Recall that e_ij(β) = y_ij − g(ε_ijᵀβ) and σ_ij(β) = σ(ε_ijᵀβ). We have

||R̂ − R*||² = Σ_{j₁,j₂=1}^m [ n⁻¹ Σ_{i=1}^n { e_ij₁(β̌)e_ij₂(β̌) / (σ_ij₁(β̌)σ_ij₂(β̌)) − e_ij₁(β̃)e_ij₂(β̃) / (σ_ij₁(β̃)σ_ij₂(β̃)) } ]²
≤ 2 Σ_{j₁,j₂=1}^m I²_{j₁j₂,1} + 2 Σ_{j₁,j₂=1}^m I²_{j₁j₂,2},

where

I_{j₁j₂,1} = n⁻¹ Σ_{i=1}^n {e_ij₁(β̌)e_ij₂(β̌) − e_ij₁(β̃)e_ij₂(β̃)} / {σ_ij₁(β̌)σ_ij₂(β̌)},
I_{j₁j₂,2} = n⁻¹ Σ_{i=1}^n e_ij₁(β̃)e_ij₂(β̃) [ 1/{σ_ij₁(β̌)σ_ij₂(β̌)} − 1/{σ_ij₁(β̃)σ_ij₂(β̃)} ].

We first show that Σ_{j₁,j₂} I²_{j₁j₂,1} = O_p(p_n/n). Since e_ij(β̌) − e_ij(β̃) = g_ij(β̃) − g_ij(β̌), using Assumption 3.4.2,

I²_{j₁j₂,1} ≤ C [n⁻¹ Σ_{i=1}^n e_ij₁(β̃){g_ij₂(β̃) − g_ij₂(β̌)}]² + C [n⁻¹ Σ_{i=1}^n e_ij₂(β̃){g_ij₁(β̃) − g_ij₁(β̌)}]²
+ C [n⁻¹ Σ_{i=1}^n {g_ij₁(β̃) − g_ij₁(β̌)}{g_ij₂(β̃) − g_ij₂(β̌)}]².

Using the Cauchy-Schwarz inequality,

Σ_{j₁,j₂=1}^m I²_{j₁j₂,1} ≤ C Σ_{j₁,j₂} [n⁻¹ Σ_i e²_ij₁(β̃)] [n⁻¹ Σ_i {g_ij₂(β̃) − g_ij₂(β̌)}²]
+ C Σ_{j₁,j₂} [n⁻¹ Σ_i e²_ij₂(β̃)] [n⁻¹ Σ_i {g_ij₁(β̃) − g_ij₁(β̌)}²]
+ C Σ_{j₁,j₂} [n⁻¹ Σ_i {g_ij₁(β̃) − g_ij₁(β̌)}²] [n⁻¹ Σ_i {g_ij₂(β̃) − g_ij₂(β̌)}²]
= I_{j₁j₂,11} + I_{j₁j₂,12} + I_{j₁j₂,13}.

A Taylor expansion gives {g_ij(β̃) − g_ij(β̌)}² ≤ C (β̌ − β̃)ᵀ ε_ij ε_ijᵀ (β̌ − β̃), yielding

Σ_{j₂=1}^m n⁻¹ Σ_{i=1}^n {g_ij₂(β̃) − g_ij₂(β̌)}² ≤ C (β̌ − β̃)ᵀ [n⁻¹ Σ_{i=1}^n F_iᵀ F_i] (β̌ − β̃)
≤ C (β̌ − β̃)ᵀ [n⁻¹ Σ_{i=1}^n {F_iᵀ F_i − E(F_iᵀ F_i)} + E(F₁ᵀ F₁)] (β̌ − β̃)
≤ C ||β̌ − β̃||² λ_max{E(F₁ᵀ F₁)} {1 + o_p(1)} = O_p(p_n/n),

using Assumptions 3.4.4 and 3.4.5 and the √(n/p_n)-consistency of β̌. Since n⁻¹ Σ_i e²_ij(β̃) = O_p(1), it follows by similar techniques for the remaining terms that Σ_{j₁,j₂} I_{j₁j₂,11}, Σ_{j₁,j₂} I_{j₁j₂,12} and Σ_{j₁,j₂} I_{j₁j₂,13}, and hence Σ_{j₁,j₂} I²_{j₁j₂,1}, are all of order O_p(p_n/n). The same holds for Σ_{j₁,j₂} I²_{j₁j₂,2}, completing the proof.

Define

Ū(β) = Σ_{i=1}^n F_iᵀ G_i(β) A_i(β)^{-1/2} R̄⁻¹ A_i(β)^{-1/2} e_i(β),
J(β) = ∂U(β)/∂β,  J*(β) = ∂U*(β)/∂β,  J̄(β) = ∂Ū(β)/∂β.
3.8 Results

This section contains the existence and consistency results. Throughout, $C$ denotes a generic constant.

Proposition 3.8.1 The inverse of the estimated correlation matrix $\hat R$ converges in probability to the inverse of the true correlation matrix; more precisely, $\hat R^{-1} - R_0^{-1} = O_p\{(p_n/n)^{1/2}\}$.

Proof Let $R^* = n^{-1}\sum_{i=1}^n A_i^{-1/2}(\tilde\beta)\{Y_i - g_i(\tilde\beta)\}\{Y_i - g_i(\tilde\beta)\}^T A_i^{-1/2}(\tilde\beta)$. The central limit theorem gives $R^* - R_0 = O_p(n^{-1/2})$. We shall show that $\hat R - R^* = O_p\{(p_n/n)^{1/2}\}$; combining these two results yields the proposition.

Recall that $e_{ij}(\beta) = y_{ij} - g(\varepsilon_{ij}^T\beta)$ and $\sigma_{ij}(\beta) = \sigma(\varepsilon_{ij}^T\beta)$. We have
$$\|\hat R - R^*\|^2 = \sum_{j_1,j_2=1}^m \Bigg[\frac1n\sum_{i=1}^n\Bigg\{\frac{e_{ij_1}(\check\beta)e_{ij_2}(\check\beta)}{\sigma_{ij_1}(\check\beta)\sigma_{ij_2}(\check\beta)} - \frac{e_{ij_1}(\tilde\beta)e_{ij_2}(\tilde\beta)}{\sigma_{ij_1}(\tilde\beta)\sigma_{ij_2}(\tilde\beta)}\Bigg\}\Bigg]^2 \le 2\sum_{j_1,j_2=1}^m I_{j_1j_2,1}^2 + 2\sum_{j_1,j_2=1}^m I_{j_1j_2,2}^2,$$
where
$$I_{j_1j_2,1} = \frac1n\sum_{i=1}^n\frac{e_{ij_1}(\check\beta)e_{ij_2}(\check\beta) - e_{ij_1}(\tilde\beta)e_{ij_2}(\tilde\beta)}{\sigma_{ij_1}(\check\beta)\sigma_{ij_2}(\check\beta)},\qquad I_{j_1j_2,2} = \frac1n\sum_{i=1}^n e_{ij_1}(\tilde\beta)e_{ij_2}(\tilde\beta)\Bigg\{\frac1{\sigma_{ij_1}(\check\beta)\sigma_{ij_2}(\check\beta)} - \frac1{\sigma_{ij_1}(\tilde\beta)\sigma_{ij_2}(\tilde\beta)}\Bigg\}.$$
We first show that $\sum_{j_1,j_2=1}^m I_{j_1j_2,1}^2 = O_p(p_n/n)$. Since $e_{ij}(\check\beta) - e_{ij}(\tilde\beta) = g_{ij}(\tilde\beta) - g_{ij}(\check\beta)$, Assumption 3.4.2 gives
$$I_{j_1j_2,1}^2 \le C\Bigg[\frac1n\sum_{i=1}^n e_{ij_1}(\tilde\beta)\{g_{ij_2}(\tilde\beta)-g_{ij_2}(\check\beta)\}\Bigg]^2 + C\Bigg[\frac1n\sum_{i=1}^n\{g_{ij_1}(\tilde\beta)-g_{ij_1}(\check\beta)\}\{g_{ij_2}(\tilde\beta)-g_{ij_2}(\check\beta)\}\Bigg]^2 + C\Bigg[\frac1n\sum_{i=1}^n e_{ij_2}(\tilde\beta)\{g_{ij_1}(\tilde\beta)-g_{ij_1}(\check\beta)\}\Bigg]^2.$$
Using the Cauchy-Schwarz inequality,
$$\sum_{j_1,j_2=1}^m I_{j_1j_2,1}^2 \le C\sum_{j_1,j_2=1}^m\Bigg\{\frac1n\sum_{i=1}^n e_{ij_1}^2(\tilde\beta)\Bigg\}\Bigg\{\frac1n\sum_{i=1}^n\{g_{ij_2}(\tilde\beta)-g_{ij_2}(\check\beta)\}^2\Bigg\} + C\sum_{j_1,j_2=1}^m\Bigg\{\frac1n\sum_{i=1}^n\{g_{ij_1}(\tilde\beta)-g_{ij_1}(\check\beta)\}^2\Bigg\}\Bigg\{\frac1n\sum_{i=1}^n\{g_{ij_2}(\tilde\beta)-g_{ij_2}(\check\beta)\}^2\Bigg\}$$
$$\quad + C\sum_{j_1,j_2=1}^m\Bigg\{\frac1n\sum_{i=1}^n e_{ij_2}^2(\tilde\beta)\Bigg\}\Bigg\{\frac1n\sum_{i=1}^n\{g_{ij_1}(\tilde\beta)-g_{ij_1}(\check\beta)\}^2\Bigg\} =: I_{11} + I_{12} + I_{13}.$$
Now,
$$I_{11} = C\Bigg\{\sum_{j_1=1}^m\frac1n\sum_{i=1}^n e_{ij_1}^2(\tilde\beta)\Bigg\}\Bigg\{\sum_{j_2=1}^m\frac1n\sum_{i=1}^n\{g_{ij_2}(\tilde\beta)-g_{ij_2}(\check\beta)\}^2\Bigg\}.$$
A Taylor expansion gives $\{g_{ij}(\check\beta)-g_{ij}(\tilde\beta)\}^2 \le C(\check\beta-\tilde\beta)^T\varepsilon_{ij}\varepsilon_{ij}^T(\check\beta-\tilde\beta)$, yielding
$$\sum_{j_2=1}^m\frac1n\sum_{i=1}^n\{g_{ij_2}(\tilde\beta)-g_{ij_2}(\check\beta)\}^2 \le C(\check\beta-\tilde\beta)^T\frac1n\sum_{i=1}^nF_i^TF_i\,(\check\beta-\tilde\beta) \le C(\check\beta-\tilde\beta)^T\Bigg[\frac1n\sum_{i=1}^n\{F_i^TF_i - E(F_i^TF_i)\} + E(F_i^TF_i)\Bigg](\check\beta-\tilde\beta)$$
$$\le C\|\check\beta-\tilde\beta\|^2\,\lambda_{\max}\{E(F_i^TF_i)\} = O_p(p_n/n),$$
using Assumptions 3.4.4 and 3.4.5. Since $n^{-1}\sum_{i=1}^n e_{ij_1}^2(\tilde\beta) = O_p(1)$, similar arguments show that $I_{11}$, $I_{12}$ and $I_{13}$, and thereby $\sum_{j_1,j_2}I_{j_1j_2,1}^2$, are all of order $O_p(p_n/n)$. The same holds for $\sum_{j_1,j_2}I_{j_1j_2,2}^2$, which completes the proof. □

Define
$$\bar U(\beta) = \sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1/2}(\beta)\bar R^{-1}A_i^{-1/2}(\beta)e_i(\beta),\qquad J(\beta) = \frac{\partial U(\beta)}{\partial\beta},\qquad J^*(\beta) = \frac{\partial U^*(\beta)}{\partial\beta},\qquad \bar J(\beta) = \frac{\partial\bar U(\beta)}{\partial\beta}.$$

Lemma 3.8.1 The Jacobian $J(\beta)$ can be decomposed as $J(\beta) = -H(\beta) + R_1(\beta) - R_2(\beta) - R_3(\beta)$, where
$$H(\beta) = -\sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1/2}(\beta)R_0^{-1}A_i^{-1/2}(\beta)G_i(\beta)F_i,$$
$$R_1(\beta) = \sum_{i=1}^nF_i^T\ddot G_i(\beta)A_i^{-1/2}(\beta)F_0(\beta)F_i,\qquad R_2(\beta) = \sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1/2}(\beta)R_0^{-1}\dot A_i^{1/2}(\beta)A_i^{-1}(\beta)\,\mathrm{diag}\{e_i(\beta)\}F_i,$$
$$R_3(\beta) = \sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1}(\beta)\dot A_i^{1/2}(\beta)F_0(\beta)F_i,$$
with $\ddot G_i(\beta) = \mathrm{diag}\{\ddot g_{i1}(\beta),\ldots,\ddot g_{im}(\beta)\}$, $\dot A_i^{1/2}(\beta) = \mathrm{diag}\{\dot\sigma_{i1}(\beta),\ldots,\dot\sigma_{im}(\beta)\}$ and $F_0(\beta) = \mathrm{diag}\{R_0^{-1}A_i^{-1/2}(\beta)e_i(\beta)\}$. Similar results hold for $\bar J(\beta)$ and $J^*(\beta)$, i.e.
$$\bar J(\beta) = -\bar H(\beta) + \bar R_1(\beta) - \bar R_2(\beta) - \bar R_3(\beta),\qquad J^*(\beta) = -H^*(\beta) + R_1^*(\beta) - R_2^*(\beta) - R_3^*(\beta),$$
where
$$\bar H(\beta) = -\sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1/2}(\beta)\bar R^{-1}A_i^{-1/2}(\beta)G_i(\beta)F_i,\qquad H^*(\beta) = -\sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1/2}(\beta)\hat R^{-1}A_i^{-1/2}(\beta)G_i(\beta)F_i,$$
and the remaining terms $\bar R_i(\beta)$ and $R_i^*(\beta)$, $i = 1,2,3$, are defined analogously using $\bar R$ and $\hat R$, respectively.

Proof As the proof is simple but tedious we omit the details. □

Lemma 3.8.2 For all $\Delta > 0$ and $b_n\in\mathbb R^{p_n}$,
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\{J^*(\beta)-\bar J(\beta)\}b_n\big| = O_p\{(np_n)^{1/2}\}.$$

Proof By Lemma 3.8.1 it is sufficient to prove
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\{H^*(\beta)-\bar H(\beta)\}b_n\big| = O_p\{(np_n)^{1/2}\},\quad(3.8.1)$$
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\{R_i^*(\beta)-\bar R_i(\beta)\}b_n\big| = O_p\{(np_n)^{1/2}\},\quad(i = 1,2,3).\quad(3.8.2)$$
We have
$$\big|b_n^T\{H^*(\beta)-\bar H(\beta)\}b_n\big| \le \sum_{i=1}^n\big|b_n^TF_i^TG_i(\beta)A_i^{-1/2}(\beta)(\hat R^{-1}-\bar R^{-1})A_i^{-1/2}(\beta)G_i(\beta)F_ib_n\big| \le C\|\hat R^{-1}-\bar R^{-1}\|\sum_{i=1}^nb_n^TF_i^TF_ib_n$$
$$\le CO_p\{(p_n/n)^{1/2}\}\,n\Bigg|\frac1n\sum_{i=1}^nb_n^TF_i^TF_ib_n - b_n^TE(F_1^TF_1)b_n\Bigg| + CO_p\{(p_n/n)^{1/2}\}\,n\,b_n^TE(F_1^TF_1)b_n \le CO_p\{(p_n/n)^{1/2}\}\,n\,\lambda_{\max}\{E(F_1^TF_1)\} = O_p\{(np_n)^{1/2}\}.$$
Similarly we can prove (3.8.2). □

Lemma 3.8.3 We have
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\{\bar J(\beta)+\bar H(\beta)\}b_n\big| = O_p(n^{1/2}p_n).$$

Proof From Lemma 3.8.1 it is sufficient to show that
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\bar R_i(\beta)b_n\big| = O_p(n^{1/2}p_n),\quad(i = 1,2,3).$$
We first consider $\bar R_2(\beta)$. Telescoping, replacing $\beta$ by $\tilde\beta$ one factor at a time,
$$\bar R_2(\beta) = \sum_{i=1}^nF_i^T\{G_i(\beta)-G_i(\tilde\beta)\}A_i^{-1/2}(\beta)\bar R^{-1}\dot A_i^{1/2}(\beta)A_i^{-1}(\beta)\,\mathrm{diag}\{e_i(\beta)\}F_i$$
$$\quad + \sum_{i=1}^nF_i^TG_i(\tilde\beta)\{A_i^{-1/2}(\beta)-A_i^{-1/2}(\tilde\beta)\}\bar R^{-1}\dot A_i^{1/2}(\beta)A_i^{-1}(\beta)\,\mathrm{diag}\{e_i(\beta)\}F_i$$
$$\quad + \sum_{i=1}^nF_i^TG_i(\tilde\beta)A_i^{-1/2}(\tilde\beta)\bar R^{-1}\{\dot A_i^{1/2}(\beta)-\dot A_i^{1/2}(\tilde\beta)\}A_i^{-1}(\beta)\,\mathrm{diag}\{e_i(\beta)\}F_i$$
$$\quad + \sum_{i=1}^nF_i^TG_i(\tilde\beta)A_i^{-1/2}(\tilde\beta)\bar R^{-1}\dot A_i^{1/2}(\tilde\beta)\{A_i^{-1}(\beta)-A_i^{-1}(\tilde\beta)\}\,\mathrm{diag}\{e_i(\beta)\}F_i$$
$$\quad + \sum_{i=1}^nF_i^TG_i(\tilde\beta)A_i^{-1/2}(\tilde\beta)\bar R^{-1}\dot A_i^{1/2}(\tilde\beta)A_i^{-1}(\tilde\beta)\,[\mathrm{diag}\{e_i(\beta)\}-\mathrm{diag}\{e_i(\tilde\beta)\}]F_i$$
$$\quad + \sum_{i=1}^nF_i^TG_i(\tilde\beta)A_i^{-1/2}(\tilde\beta)\bar R^{-1}\dot A_i^{1/2}(\tilde\beta)A_i^{-1}(\tilde\beta)\,\mathrm{diag}\{e_i(\tilde\beta)\}F_i =: \sum_{i'=1}^6\bar R_{2,i'}.$$
We investigate the first and last terms and leave the rest to the reader. For the first term, Assumptions 3.4.5 and 3.4.2 give
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\bar R_{2,1}b_n\big| \le C\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\sup_{i,j}\big|g_{ij}(\beta)-g_{ij}(\tilde\beta)\big|\ \sup_{\|b_n\|=1}\sum_{i=1}^nb_n^TF_i^TF_ib_n \le C\Delta(p_n/n)^{1/2}\,O_p(n) = O_p\{(np_n)^{1/2}\}.$$
In a similar fashion we can show that
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\bar R_{2,i'}b_n\big| = O_p\{(np_n)^{1/2}\},\quad(i' = 2,\ldots,5).$$
We now investigate the last term $\bar R_{2,6}$. Using Assumption 3.4.2,
$$\|\bar R_{2,6}\|^2 \le C\sum_{k,l=0}^{p_n}\sum_{i_1,i_2=1}^n\sum_{j_1,j_2,j_1',j_2'=1}^m\varepsilon_{i_1j_1}^{(k)}\varepsilon_{i_1j_2}^{(l)}\varepsilon_{i_2j_1'}^{(k)}\varepsilon_{i_2j_2'}^{(l)}\,e_{i_1j_1}(\tilde\beta)e_{i_2j_1'}(\tilde\beta).$$
Taking expectations and using the independence of errors between clusters we get
$$\|\bar R_{2,6}\|^2 = O_p(np_n^2).\quad(3.8.3)$$
Thus $\sup\sup|b_n^T\bar R_2(\beta)b_n| = O_p(n^{1/2}p_n)$. The results for $\bar R_1$ and $\bar R_3$ are obtained in a similar manner, completing the proof. □

Lemma 3.8.4 We have
$$\sup_{\|\beta-\tilde\beta\|\le\Delta(p_n/n)^{1/2}}\ \sup_{\|b_n\|=1}\big|b_n^T\{\bar H(\beta)-\bar H(\tilde\beta)\}b_n\big| = O_p(n^{1/2}p_n).$$
The proof of this lemma is similar to the proofs of Lemmas 3.8.2 and 3.8.3, so we omit the details.

Lemma 3.8.5 We have $(\beta-\tilde\beta)^T\bar H(\tilde\beta)(\beta-\tilde\beta) \le -Cn\|\beta-\tilde\beta\|^2$.

Proof Using Assumptions 3.4.2, 3.4.4 and 3.4.5,
$$(\beta-\tilde\beta)^T\bar H(\tilde\beta)(\beta-\tilde\beta) = -\sum_{i=1}^n(\beta-\tilde\beta)^TF_i^TG_iA_i^{-1/2}\bar R^{-1}A_i^{-1/2}G_iF_i(\beta-\tilde\beta) \le -\lambda_{\min}(\bar R^{-1})\min_i\{\lambda_{\min}(A_i^{-1})\lambda_{\min}(G_i^2)\}\,\lambda_{\min}\Bigg(\sum_{i=1}^nF_i^TF_i\Bigg)\|\beta-\tilde\beta\|^2 \le -Cn\|\beta-\tilde\beta\|^2.\ \square$$

Lemma 3.8.6 Let $\bar U(\tilde\beta) = \sum_{i=1}^nF_i^TG_i(\tilde\beta)A_i^{-1/2}(\tilde\beta)\bar R^{-1}A_i^{-1/2}(\tilde\beta)e_i(\tilde\beta)$. Then $\|\bar U(\tilde\beta)\| = O_p\{(np_n)^{1/2}\}$.

Proof Let $\bar R^{-1} = (\bar l_{j_1j_2})_{1\le j_1,j_2\le m}$. Using Assumption 3.4.2,
$$\|\bar U(\tilde\beta)\|^2 = \sum_{k=0}^{p_n}\Bigg[\sum_{i=1}^n\sum_{j_1,j_2=1}^m\frac{g'_{ij_1}(\tilde\beta)\,\bar l_{j_1j_2}\,e_{ij_2}(\tilde\beta)}{\sigma_{ij_1}(\tilde\beta)\sigma_{ij_2}(\tilde\beta)}\,\varepsilon_{ij_1}^{(k)}\Bigg]^2,\qquad E\|\bar U(\tilde\beta)\|^2 \le C\,np_n.$$
Thus the result is proved. □

Lemma 3.8.7 We have $U^*(\tilde\beta) - \bar U(\tilde\beta) = O_p(p_n)$.

Proof Let $Q = \{q_{j_1j_2}\}_{1\le j_1,j_2\le m} = \hat R^{-1} - \bar R^{-1}$. We have
$$U^*(\tilde\beta) - \bar U(\tilde\beta) = \sum_{i=1}^n\sum_{j_1,j_2=1}^m q_{j_1j_2}\,\frac{g'(\tilde\eta_{ij_2})\,e(\tilde\eta_{ij_2})}{\sigma(\tilde\eta_{ij_2})\sigma(\tilde\eta_{ij_1})}\,\varepsilon_{ij_1}.$$
Note that
$$E\Bigg\|\sum_{i=1}^n\frac{g'(\tilde\eta_{ij_2})\,e(\tilde\eta_{ij_2})}{\sigma(\tilde\eta_{ij_2})\sigma(\tilde\eta_{ij_1})}\,\varepsilon_{ij_1}\Bigg\|^2 = O(np_n),$$
which implies that each such sum is $O_p\{(np_n)^{1/2}\}$, $1\le j_1,j_2\le m$. Combining this with the rate $O_p\{(p_n/n)^{1/2}\}$ for the entries of $Q$ from Proposition 3.8.1, Assumption 3.4.4 yields the result, since $(p_n/n)^{1/2}(np_n)^{1/2} = p_n$. □

We are now ready to prove the consistency result.

Proof of Theorem 1 According to Theorem 6.3.4 of Ortega and Rheinboldt (2000), to prove the existence and consistency of the estimator $\hat\beta$ it is enough to verify the following condition: for all $\epsilon > 0$ there exists a constant $\Delta > 0$ such that, for all $n$ sufficiently large,
$$P\Bigg\{\sup_{\|\beta-\tilde\beta\|=\Delta(p_n/n)^{1/2}}(\beta-\tilde\beta)^TU^*(\beta) < 0\Bigg\} \ge 1-\epsilon.$$
The same technique has been used previously by Wang et al. (2013).
Using a Taylor expansion we get
$$(\beta-\tilde\beta)^TU^*(\beta) = (\beta-\tilde\beta)^TU^*(\tilde\beta) - (\beta-\tilde\beta)^T\{-J^*(\beta^*)\}(\beta-\tilde\beta) = A_1 + A_2,\quad(3.8.4)$$
where $\beta^*$ lies between $\beta$ and $\tilde\beta$, i.e. $\max\{\|\beta^*-\beta\|,\|\beta^*-\tilde\beta\|\}\le\|\beta-\tilde\beta\|$, and $\|\beta-\tilde\beta\| = \Delta(p_n/n)^{1/2}$. By Lemmas 3.8.6 and 3.8.7,
$$A_1 = (\beta-\tilde\beta)^T\bar U(\tilde\beta) + (\beta-\tilde\beta)^T\{U^*(\tilde\beta)-\bar U(\tilde\beta)\},\qquad |A_1| \le \Delta(p_n/n)^{1/2}(np_n)^{1/2} + \Delta(p_n/n)^{1/2}p_n = \Delta O_p(p_n) + \Delta o_p(p_n).$$
Next,
$$A_2 = -(\beta-\tilde\beta)^TJ^*(\beta^*)(\beta-\tilde\beta) = -(\beta-\tilde\beta)^T\bar J(\beta^*)(\beta-\tilde\beta) + (\beta-\tilde\beta)^T\{-J^*(\beta^*)+\bar J(\beta^*)\}(\beta-\tilde\beta) = A_{21} + A_{22}.\quad(3.8.5)$$
From Lemma 3.8.2 we obtain $A_{22} = \Delta^2o_p(p_n)$. Also,
$$A_{21} = -(\beta-\tilde\beta)^T\{\bar J(\beta^*)+\bar H(\beta^*)\}(\beta-\tilde\beta) + (\beta-\tilde\beta)^T\{\bar H(\beta^*)-\bar H(\tilde\beta)\}(\beta-\tilde\beta) + (\beta-\tilde\beta)^T\bar H(\tilde\beta)(\beta-\tilde\beta) \le -C\Delta^2p_n + o_p(p_n) + o_p(p_n),\quad(3.8.6)$$
by Lemmas 3.8.3, 3.8.4 and 3.8.5. From (3.8.4), (3.8.5) and (3.8.6) we see that $(\beta-\tilde\beta)^TU^*(\beta) = \Delta O_p(p_n) - C\Delta^2p_n$. Thus for a suitably large $\Delta$ this term is negative with probability at least $1-\epsilon$, and we obtain the existence and consistency of $\hat\beta$. □

We now turn to the asymptotic normality result. Note that
$$D(\beta)^TD(\beta) = -H(\beta) = \sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1/2}(\beta)R_0^{-1}A_i^{-1/2}(\beta)G_i(\beta)F_i.$$
Similarly define
$$D^*(\beta)^TD^*(\beta) = -H^*(\beta) = \sum_{i=1}^nF_i^TG_i(\beta)A_i^{-1/2}(\beta)\hat R^{-1}A_i^{-1/2}(\beta)G_i(\beta)F_i.$$

Proposition 3.8.2 Let $\beta^*$ be a fixed value of $\beta$ between $\hat\beta$ and $\tilde\beta$. Then
$$J^*(\beta) - J(\beta) = O_p(n^{1/2}p_n^{1.5}) = o_p(n).$$

Proof Using Lemma 3.8.1 we obtain
$$\|J^*(\beta)-J(\beta)\| \le \|D^*(\beta)^TD^*(\beta) - D(\beta)^TD(\beta)\| + \sum_{i=1}^3\|R_i^*(\beta)-R_i(\beta)\| = I_1 + I_2.\quad(3.8.7)$$
We now look at each of these terms. First,
$$I_1 = \Bigg\|\sum_{i=1}^n\sum_{j_1,j_2=1}^m q_{j_1j_2}\,\frac{g'_{ij_1}(\beta)g'_{ij_2}(\beta)}{\sigma_{ij_1}(\beta)\sigma_{ij_2}(\beta)}\,\varepsilon_{ij_1}\varepsilon_{ij_2}^T\Bigg\|,$$
where the $q_{j_1j_2}$ are the entries of $R_0^{-1}-\hat R^{-1}$. Lemma 3.8.1 together with Assumptions 3.4.2 and 3.4.6 gives $(I_1/n)^2 = O_p(p_n^3/n) = o_p(1)$. Similarly we can prove that $I_2 = O_p(n^{1/2}p_n^{1.5}) = o_p(n)$. □

Proposition 3.8.3 Let $\beta^*$ be a fixed value of $\beta$ between $\hat\beta$ and $\tilde\beta$. Then
$$J(\beta^*) - J(\tilde\beta) = O_p(np_n^{1.5}/n^{1/4}) = o_p(n).$$

Proof We have
$$\|J(\beta^*)-J(\tilde\beta)\| \le \|D(\beta^*)^TD(\beta^*) - D(\tilde\beta)^TD(\tilde\beta)\| + \sum_{i=1}^3\|R_i(\beta^*)-R_i(\tilde\beta)\| = I_1 + I_2,$$
$$I_1 = \Bigg\|\sum_{i=1}^n\sum_{j_1,j_2=1}^m\Bigg\{\frac{g'_{ij_1}(\beta^*)g'_{ij_2}(\beta^*)}{\sigma_{ij_1}(\beta^*)\sigma_{ij_2}(\beta^*)} - \frac{g'_{ij_1}(\tilde\beta)g'_{ij_2}(\tilde\beta)}{\sigma_{ij_1}(\tilde\beta)\sigma_{ij_2}(\tilde\beta)}\Bigg\}\,l_{j_1j_2}\,\varepsilon_{ij_1}\varepsilon_{ij_2}^T\Bigg\|.$$
Recall that $\tilde\eta_{ij} = \varepsilon_{ij}^T\tilde\beta$ and $\eta_{ij}^* = \varepsilon_{ij}^T\beta^*$. Using the integral form of the remainder in Taylor's theorem, Theorem 1 and Assumption 3.4.2,
$$g_{ij}(\beta^*) - g_{ij}(\tilde\beta) = \int_0^1g'\{\lambda\tilde\eta_{ij}+(1-\lambda)\eta_{ij}^*\}\,(\eta_{ij}^*-\tilde\eta_{ij})\,d\lambda,\qquad |g_{ij}(\beta^*)-g_{ij}(\tilde\beta)| \le c\,|\tilde\eta_{ij}-\eta_{ij}^*| \le c\,\|\tilde\beta-\beta^*\|\,\sup_{i,j}\|\varepsilon_{ij}\|.\quad(3.8.8)$$
Consider
$$P\Big(\sup_{i,j}\|\varepsilon_{ij}\|\ge n^{1/4}\Big) \le nm\,P(\|\varepsilon_1\|>n^{1/4}) \le nm\,\frac{E\|\varepsilon_1\|^4}{n},\qquad E\|\varepsilon_1\|^4 = E\big(\|\varepsilon_1\|^2\big)^2 = E\Bigg(\sum_{k=1}^{p_n}\varepsilon_{1k}^2\Bigg)^2 \le E\Bigg(\sum_{k=1}^\infty\varepsilon_{1k}^2\Bigg)^2 = E\|X_1\|^4 < \infty.$$
Combining the above with (3.8.8) we get $g_{ij}(\beta^*)-g_{ij}(\tilde\beta) = O_p(p_n^{1/2}n^{-1/4})$. This can be used to show that $I_1 = O_p(n^{3/4}p_n^{1.5}) = o_p(n)$. Similarly, we can prove the same result for $I_2$. □

Proposition 3.8.4 We have
$$n^{1/2}(\hat\beta-\tilde\beta) \approx \Bigg\{\frac{D(\tilde\beta)^TD(\tilde\beta)}{n}\Bigg\}^{-1}\frac{U(\tilde\beta)}{n^{1/2}}.$$

Proof Using a Taylor expansion and the fact that $U^*(\hat\beta) = 0$,
$$U^*(\hat\beta) = U^*(\tilde\beta) + J^*(\beta^*)(\hat\beta-\tilde\beta),\qquad U^*(\tilde\beta) = -J^*(\beta^*)(\hat\beta-\tilde\beta),$$
so that
$$\frac{U^*(\tilde\beta)-U(\tilde\beta)}{n^{1/2}} + \frac{U(\tilde\beta)}{n^{1/2}} = -\frac{J^*(\beta^*)}{n}(\hat\beta-\tilde\beta)n^{1/2}.$$
From Lemma 3.8.7 we obtain
$$\frac{U(\tilde\beta)}{n^{1/2}} = -\frac{J^*(\beta^*)}{n}(\hat\beta-\tilde\beta)n^{1/2} + o_p(1).$$
The right-hand side can be written as
$$-\frac{J^*(\beta^*)}{n}(\hat\beta-\tilde\beta)n^{1/2} = \frac{-J^*(\beta^*)+J(\beta^*)}{n}(\hat\beta-\tilde\beta)n^{1/2} + \frac{-J(\beta^*)+J(\tilde\beta)}{n}(\hat\beta-\tilde\beta)n^{1/2} - \frac{J(\tilde\beta)+D(\tilde\beta)^TD(\tilde\beta)}{n}(\hat\beta-\tilde\beta)n^{1/2} + \frac{D(\tilde\beta)^TD(\tilde\beta)}{n}(\hat\beta-\tilde\beta)n^{1/2}.$$
We can show that $R_i(\tilde\beta) = O_p(n^{1/2}p_n)$, $i = 1,2,3$, in the same manner as (3.8.3).
Using Assumptions 3.4.2 and 3.4.6, Propositions 3.8.2 and 3.8.3, and Theorem 1, this yields
$$-\frac{J^*(\beta^*)}{n}(\hat\beta-\tilde\beta)n^{1/2} = \frac{D(\tilde\beta)^TD(\tilde\beta)}{n}(\hat\beta-\tilde\beta)n^{1/2} + o_p(1).$$
Thus the proposition is proved. □

Let
$$e'(\beta) = V(\beta)^{-1/2}e(\beta),\qquad Z_n(\beta) = \Bigg\{\frac{D(\beta)^TD(\beta)}{n}\Bigg\}^{-1}\frac{D(\beta)^Te'(\beta)}{n^{1/2}},\qquad \chi_n(\beta) = \frac{\Xi_n(\beta)^{1/2}D(\beta)^Te'(\beta)}{n^{1/2}},\qquad \Psi_n(\beta) = \Gamma(\beta)^{1/2}\Bigg\{\frac{D(\beta)^TD(\beta)}{n}\Bigg\}^{-1}\Gamma(\beta)^{1/2}.\quad(3.8.9)$$
As the functions in Proposition 3.8.4 are evaluated at $\tilde\beta$, for ease of notation we write $Z_n = Z_n(\tilde\beta)$, $\chi_n = \chi_n(\tilde\beta)$, $\Psi_n = \Psi_n(\tilde\beta)$ and $e' = e'(\tilde\beta)$. From the statement of Theorem 2 in Section 3.4.2, we are interested in $Z_n^T\Gamma Z_n$. We decompose it into three terms as
$$Z_n^T\Gamma Z_n = \chi_n^T\Psi_n^2\chi_n = \chi_n^T\chi_n + 2\chi_n^T(\Psi_n-I_{p_n+1})\chi_n + \chi_n^T(\Psi_n-I_{p_n+1})(\Psi_n-I_{p_n+1})\chi_n = F_n + G_n + H_n,$$
say. We shall show that $G_n/p_n^{1/2}$ and $H_n/p_n^{1/2}$ are asymptotically negligible. This goal is in part facilitated by the following proposition.

Proposition 3.8.5 We have
$$\|\Psi_n - I_{p_n+1}\|^2 = o_p(1/p_n),\quad(3.8.10)$$
$$\{\chi_n^T\chi_n - (p_n+1)\}/(2p_n)^{1/2}\ \xrightarrow{d}\ N(0,1).\quad(3.8.11)$$
We shall prove this proposition shortly. As a consequence of (3.8.10), we obtain
$$|\chi_n^T(\Psi_n-I_{p_n+1})\chi_n| \le \chi_n^T\chi_n\,\|\Psi_n-I_{p_n+1}\| = O_p(p_n)\,o_p(1/p_n^{1/2}) = o_p(p_n^{1/2}).$$
This implies that $G_n/p_n^{1/2}\xrightarrow{p}0$; we can similarly show that $H_n/p_n^{1/2}\xrightarrow{p}0$. Hence
$$\frac{Z_n^T\Gamma Z_n}{(2p_n)^{1/2}} = \frac{F_n}{(2p_n)^{1/2}} + o_p(1).\quad(3.8.12)$$
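The normalization in (3.8.11) is the familiar central limit behaviour of a chi-square variable with a growing number of degrees of freedom: if $F_n$ were exactly $\chi^2_{p_n+1}$, then $\{F_n-(p_n+1)\}/(2p_n)^{1/2}$ would be asymptotically standard normal. The following R sketch is purely illustrative (it simulates an exact chi-square variable rather than the statistic $\chi_n^T\chi_n$ itself) and shows how slowly this approximation takes hold:

```r
# Illustration: (chisq_k - k)/sqrt(2*pn), with k = pn + 1, approaches N(0,1)
# as pn grows; compare the simulated upper tail with the nominal 0.05.
set.seed(1)
for (pn in c(2, 7, 50)) {
  k <- pn + 1                                    # degrees of freedom
  z <- (rchisq(1e5, df = k) - k) / sqrt(2 * pn)  # normalization of (3.8.11)
  cat(sprintf("pn = %2d: mean = %6.3f, sd = %5.3f, P(Z > 1.645) = %5.3f\n",
              pn, mean(z), sd(z), mean(z > 1.645)))
}
```

For $p_n = 2$ the simulated tail probability differs markedly from 0.05, consistent with the earlier observation in the Minnesota Twin Study application that the normal approximation is unreliable for small $p_n$, and with the decision there to refer $D_n^*$ directly to a $\chi^2_{p_n}$ distribution.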
Proof We shall first prove (3.8.10). Observe that
$$\|\Psi_n - I_{p_n+1}\| \le \|\Psi_n\|\,\|\Psi_n^{-1}-I_{p_n+1}\|,\qquad \|\Psi_n\| \le \frac{\|I_{p_n+1}\|}{1-\|\Psi_n^{-1}-I_{p_n+1}\|}.\quad(3.8.13)$$
We shall show that
$$\|\Psi_n^{-1} - I_{p_n+1}\| = o_p(1/p_n) = o_p(1),\quad(3.8.14)$$
implying $\|\Psi_n\| \le C\|I_{p_n+1}\| = C(p_n+1)^{1/2}$. This will prove (3.8.10), because then
$$\|\Psi_n - I_{p_n+1}\| = O_p(p_n^{1/2})\,o_p(1/p_n) = o_p(1/p_n^{1/2}).$$
We now turn to the proof of (3.8.14). Write $\Xi = \Xi^{1/2}\Xi^{1/2}$ with $\Xi^{1/2} = (\xi^{(1/2)}_{ks})_{0\le k,s\le p_n}$. We can write
$$\Psi_n^{-1} = \Xi^{1/2}\,\frac{D(\tilde\beta)^TD(\tilde\beta)}{n}\,\Xi^{1/2} = \Bigg(\frac1n\sum_{i=1}^n\sum_{j_1,j_2=1}^m\sum_{s,t=0}^{p_n}\xi^{(1/2)}_{ks}\,\frac{l_{j_1j_2}\,g'_{ij_2}(\tilde\beta)g'_{ij_1}(\tilde\beta)}{\sigma_{ij_2}(\tilde\beta)\sigma_{ij_1}(\tilde\beta)}\,\varepsilon^{(s)}_{ij_1}\varepsilon^{(t)}_{ij_2}\,\xi^{(1/2)}_{tl}\Bigg)_{0\le k,l\le p_n} =: (a_{kl})_{0\le k,l\le p_n}.$$
Thus
$$E\|\Psi_n^{-1}-I_{p_n+1}\|^2 = E\sum_{k,l=0}^{p_n}(a_{kl}-\delta_{kl})^2 = E\sum_{k,l=0}^{p_n}\big(a_{kl}^2 - 2a_{kl}\delta_{kl} + \delta_{kl}^2\big).$$
We investigate each term separately. Write $\sum_{k,l=0}^{p_n}a_{kl}^2 = B_1 + B_2$, where $B_1$ collects the summands with equal cluster indices ($i_1 = i_2$) and $B_2$ those with $i_1\neq i_2$. Using $\Xi^{1/2}\Xi^{1/2} = \Xi$ and the moment bounds in Assumption 3.4.2, $E(B_1) = O(p_n^2/n) = o(1/p_n^2)$. For $B_2$, the independence of errors between clusters gives
$$E(B_2) = \frac1{n^2}\sum_{i_1\neq i_2}\sum_{s_1,t_1,s_2,t_2=0}^{p_n}\gamma_{s_1t_1}\gamma_{s_2t_2}\,\xi_{t_1t_2}\xi_{s_1s_2}.$$
We observe that $\Gamma\Xi = I$ and that both matrices are symmetric, implying
$$\sum_{t_1=0}^{p_n}\gamma_{s_1t_1}\xi_{t_1s_2} = 1,\quad s_1 = s_2,\quad(3.8.15)$$
$$\sum_{t_1=0}^{p_n}\gamma_{s_1t_1}\xi_{t_1s_2} = 0,\quad s_1\neq s_2.\quad(3.8.16)$$
Thus
$$E(B_2) = \frac1{n^2}\sum_{i_1\neq i_2}\sum_{s_1,t_2,s_2=0}^{p_n}\Bigg(\sum_{t_1=0}^{p_n}\gamma_{s_1t_1}\xi_{t_1t_2}\Bigg)\gamma_{s_2t_2}\,\xi_{s_1s_2} = \frac1{n^2}\sum_{i_1\neq i_2}\sum_{s_2=0}^{p_n}\sum_{s_1=0}^{p_n}\gamma_{s_2s_1}\xi_{s_1s_2} = \frac1{n^2}\sum_{i_1\neq i_2}\sum_{s_2=0}^{p_n}1 = \frac{(n-1)(p_n+1)}{n}.\quad(3.8.17)$$
In a similar fashion we can show that
$$-2\sum_{k,l=0}^{p_n}E(a_{kl}\delta_{kl}) = -\frac2n\sum_{i=1}^n\sum_{s,t=0}^{p_n}\gamma_{st}\sum_{k=0}^{p_n}\xi^{(1/2)}_{sk}\xi^{(1/2)}_{kt} = -\frac2n\sum_{i=1}^n\sum_{s,t=0}^{p_n}\gamma_{ts}\xi_{st} = -2(p_n+1),\qquad \sum_{k,l=0}^{p_n}\delta_{kl}^2 = p_n+1.$$
These results together imply
$$E\|\Psi_n^{-1}-I_{p_n+1}\|^2 = o\Big(\frac1{p_n^2}\Big) + \frac{(n-1)(p_n+1)}{n} - 2(p_n+1) + (p_n+1) = o\Big(\frac1{p_n^2}\Big) + O\Big(\frac{p_n+1}{n}\Big) = o\Big(\frac1{p_n^2}\Big).\quad(3.8.18)$$
Thus we obtain $\|\Psi_n^{-1}-I_{p_n+1}\| = o_p(1/p_n)$ and conclude the proof of (3.8.10) in Proposition 3.8.5.

Next we prove (3.8.11). Referring to (3.8.9),
$$\chi_n = \frac{\Xi^{1/2}D^TV^{-1/2}e}{n^{1/2}} = \frac1{n^{1/2}}\sum_{i=1}^n\Xi^{1/2}F_i^TG_iA_i^{-1/2}R^{-1}\tilde e_i,\qquad \tilde e_i = A_i^{-1/2}(\tilde\beta)e_i(\tilde\beta),$$
with $k$-th coordinate
$$\chi_{n,k} = \frac1{n^{1/2}}\sum_{i=1}^n\sum_{j_1,j_2=1}^m\sum_{s=0}^{p_n}\frac{\xi^{(1/2)}_{ks}\,\varepsilon^{(s)}_{ij_1}\,g'_{ij_1}(\tilde\beta)\,l_{j_1j_2}\,\tilde e_{ij_2}}{\sigma_{ij_1}(\tilde\beta)},\qquad k = 0,\ldots,p_n.$$
Hence
$$\chi_n^T\chi_n = \sum_{t=0}^{p_n}\Bigg\{\frac1{n^{1/2}}\sum_{i=1}^n\sum_{j_1,j_2=1}^m\sum_{s=0}^{p_n}\frac{\xi^{(1/2)}_{ts}\,\varepsilon^{(s)}_{ij_1}\,g'_{ij_1}(\tilde\beta)\,l_{j_1j_2}\,\tilde e_{ij_2}}{\sigma_{ij_1}(\tilde\beta)}\Bigg\}^2 = A_n + B_n,\quad(3.8.19)$$
where $A_n$ collects the within-cluster terms and $B_n$ the cross terms:
$$A_n = \frac1n\sum_{t=0}^{p_n}\sum_{i=1}^n\sum_{j_1,j_2,j_3,j_4=1}^m\sum_{s_1,s_2=0}^{p_n}\frac{\xi^{(1/2)}_{ts_1}\xi^{(1/2)}_{ts_2}\,\varepsilon^{(s_1)}_{ij_1}\varepsilon^{(s_2)}_{ij_3}\,g'_{ij_1}(\tilde\beta)g'_{ij_3}(\tilde\beta)\,l_{j_1j_2}l_{j_3j_4}\,\tilde e_{ij_2}\tilde e_{ij_4}}{\sigma_{ij_1}(\tilde\beta)\sigma_{ij_3}(\tilde\beta)},$$
$$B_n = \frac1n\sum_{t=0}^{p_n}\sum_{i_1\neq i_2}\sum_{j_1,j_2,j_3,j_4=1}^m\sum_{s_1,s_2=0}^{p_n}\frac{\xi^{(1/2)}_{ts_1}\xi^{(1/2)}_{ts_2}\,\varepsilon^{(s_1)}_{i_1j_1}\varepsilon^{(s_2)}_{i_2j_3}\,g'_{i_1j_1}(\tilde\beta)g'_{i_2j_3}(\tilde\beta)\,l_{j_1j_2}l_{j_3j_4}\,\tilde e_{i_1j_2}\tilde e_{i_2j_4}}{\sigma_{i_1j_1}(\tilde\beta)\sigma_{i_2j_3}(\tilde\beta)}.$$
We shall show that $\{A_n-(p_n+1)\}/p_n^{1/2}$ is asymptotically negligible in Lemma 3.8.8, and that $B_n$ has the desired distribution in Lemma 3.8.9.

Lemma 3.8.8 We have
$$\frac{A_n-(p_n+1)}{p_n^{1/2}}\ \xrightarrow{p}\ 0.$$

Proof Using the independence of $\tilde e$ and the $\varepsilon$'s,
$$E(A_n) = \frac1n\sum_{i=1}^n\sum_{s_1,s_2=0}^{p_n}\xi_{s_1s_2}\,E\Bigg[\sum_{j_1,j_2,j_3=1}^m\frac{\varepsilon^{(s_1)}_{ij_1}\varepsilon^{(s_2)}_{ij_3}\,g'_{ij_1}(\tilde\beta)g'_{ij_3}(\tilde\beta)\,l_{j_1j_2}}{\sigma_{ij_1}(\tilde\beta)\sigma_{ij_3}(\tilde\beta)}\sum_{j_4=1}^ml_{j_3j_4}\,\mathrm{cov}(\tilde e_{ij_2},\tilde e_{ij_4})\Bigg].$$
We note that the $l_{j_3j_4}$ are elements of the inverse of the correlation matrix $R$ and that the covariance matrix of $\tilde e_i = A_i^{-1/2}e_i$ is $R$; also, for any invertible matrix $M$, $M^{-1}M = I$. Using (3.8.15) we obtain
$$E(A_n) = \frac1n\sum_{i=1}^n\sum_{s_1,s_2=0}^{p_n}\xi_{s_1s_2}\,E\Bigg[\sum_{j_1,j_2=1}^m\frac{\varepsilon^{(s_1)}_{ij_1}\varepsilon^{(s_2)}_{ij_2}\,g'_{ij_1}(\tilde\beta)g'_{ij_2}(\tilde\beta)\,l_{j_1j_2}}{\sigma_{ij_1}(\tilde\beta)\sigma_{ij_2}(\tilde\beta)}\Bigg] = \sum_{s_1,s_2=0}^{p_n}\xi_{s_1s_2}\gamma_{s_1s_2} = p_n+1.\quad(3.8.20)$$
We now evaluate
$$E(A_n^2) = A_n^a + A_n^b,\quad\text{say},\quad(3.8.21)$$
where $A_n^a$ is the part of the expanded square with equal cluster indices and $A_n^b$ the part with distinct ones. Now consider
$$A_n^a = \frac1{n^2}\sum_{i=1}^n\sum_{j_1,\ldots,j_4,j_1',\ldots,j_4'=1}^m\sum_{s_1,s_2,s_1',s_2'=0}^{p_n}\xi_{s_1s_2}\xi_{s_1's_2'}\,E\Bigg[\frac{\varepsilon^{(s_1)}_{ij_1}\varepsilon^{(s_1')}_{ij_1'}\varepsilon^{(s_2)}_{ij_3}\varepsilon^{(s_2')}_{ij_3'}\,g'_{ij_1}g'_{ij_1'}g'_{ij_3}g'_{ij_3'}\,l_{j_1j_2}l_{j_1'j_2'}l_{j_3j_4}l_{j_3'j_4'}\,\tilde e_{ij_2}\tilde e_{ij_2'}\tilde e_{ij_4}\tilde e_{ij_4'}}{\sigma_{ij_1}\sigma_{ij_1'}\sigma_{ij_3}\sigma_{ij_3'}}\Bigg] = O\Big(\frac1n\Big) = o\Big(\frac1{p_n^2}\Big).\quad(3.8.22)$$
Next, we have
$$A_n^b = \frac1{n^2}\sum_{i_1\neq i_2}\sum_{j_1,\ldots,j_4,j_1',\ldots,j_4'=1}^m\sum_{s_1,s_2,s_1',s_2'=0}^{p_n}\xi_{s_1s_2}\xi_{s_1's_2'}\,l_{j_1j_2}l_{j_1'j_2'}l_{j_3j_4}l_{j_3'j_4'}\,E\Bigg[\frac{\varepsilon^{(s_1)}_{i_1j_1}\varepsilon^{(s_1')}_{i_2j_1'}\varepsilon^{(s_2)}_{i_1j_3}\varepsilon^{(s_2')}_{i_2j_3'}\,g'_{i_1j_1}(\tilde\beta)g'_{i_2j_1'}(\tilde\beta)g'_{i_1j_3}(\tilde\beta)g'_{i_2j_3'}(\tilde\beta)\,\tilde e_{i_1j_2}\tilde e_{i_2j_2'}\tilde e_{i_1j_4}\tilde e_{i_2j_4'}}{\sigma_{i_1j_1}(\tilde\beta)\sigma_{i_2j_1'}(\tilde\beta)\sigma_{i_1j_3}(\tilde\beta)\sigma_{i_2j_3'}(\tilde\beta)}\Bigg]$$
$$= \frac1{n^2}\sum_{i_1\neq i_2}\sum_{s_1,s_2,s_1',s_2'=0}^{p_n}\xi_{s_1s_2}\gamma_{s_1s_2}\,\xi_{s_1's_2'}\gamma_{s_1's_2'} = \frac{(n-1)(p_n+1)^2}{n}.\quad(3.8.23)$$
Upon combining this fact with (3.8.20), (3.8.21) and (3.8.22), we obtain
$$\mathrm{var}(A_n) = E(A_n^2) - E^2(A_n) = o\Big(\frac1{p_n^2}\Big) + O\Bigg\{\frac{(p_n+1)^2}{n}\Bigg\},$$
so that $\mathrm{var}(A_n)/p_n\to0$ and thus
$$\frac{A_n-(p_n+1)}{p_n^{1/2}}\ \xrightarrow{p}\ 0.\ \square$$
From (3.8.19) and Lemma 3.8.8 we obtain
$$\frac{\chi_n^T\chi_n-(p_n+1)}{(2p_n)^{1/2}} = \frac{B_n}{(2p_n)^{1/2}} + o_p(1),$$
so the next lemma will establish (3.8.11).

Lemma 3.8.9 The asymptotic distribution of $B_n$ is given by
$$\frac{B_n}{(2p_n)^{1/2}}\ \xrightarrow{d}\ N(0,1).$$

Definition 3.8.1 Let
$$W'_{ni} = \sum_{i_1=1}^{i-1}\sum_{j_1,j_2,j_3,j_4=1}^m\sum_{s_1,s_2=0}^{p_n}\frac{\xi_{s_1s_2}\,\varepsilon^{(s_1)}_{i_1j_1}\varepsilon^{(s_2)}_{ij_3}\,g'_{i_1j_1}(\tilde\beta)g'_{ij_3}(\tilde\beta)\,l_{j_1j_2}l_{j_3j_4}\,\tilde e_{i_1j_2}\tilde e_{ij_4}}{\sigma_{i_1j_1}(\tilde\beta)\sigma_{ij_3}(\tilde\beta)},\qquad W_{ni} = \frac{2\,W'_{ni}}{n(2p_n)^{1/2}}.$$
Note that $B_n/(2p_n)^{1/2} = \sum_{i=1}^nW_{ni}$. The random variables $W'_{ni}$ form a triangular array of martingale differences with respect to the filtration $\mathcal F_{n,i} = \sigma\{\varepsilon^{(k)}_{i_1j},\, e_{i_1j}:\ 1\le i_1\le i,\ 1\le j\le m,\ 0\le k\le p_n\}$, $1\le i\le n$, $n\in\mathbb N$, where $\mathbb N$ denotes the set of all natural numbers. This implies that $\{W_{ni}\}$ is also a martingale difference array.

Proof To prove this lemma it is enough to prove
$$\sum_{i=1}^nW_{ni}\ \xrightarrow{d}\ N(0,1).\quad(3.8.24)$$
According to the CLT for sums of martingale difference arrays, see Corollary 3.1 in Hall and Heyde (1980), it suffices to verify the two conditions
$$\sum_{i=1}^nE(W_{ni}^2\mid\mathcal F_{n,i-1})\ \xrightarrow{p}\ 1,\quad(3.8.25)$$
$$\sum_{i=1}^nE(W_{ni}^4\mid\mathcal F_{n,i-1})\ \xrightarrow{p}\ 0,\quad(3.8.26)$$
the latter implying the conditional Lindeberg condition for every $\epsilon > 0$. Consider
$$E(W_{ni}'^2\mid\mathcal F_{n,i-1}) = \sum_{i_1,i_2=1}^{i-1}\sum_{j_1,j_2,j_1',j_2'=1}^m\sum_{j_3,j_4,j_3',j_4'=1}^m\sum_{s_1,s_2,s_1',s_2'=0}^{p_n}\xi_{s_1s_2}\xi_{s_1's_2'}\,\frac{\varepsilon^{(s_1)}_{i_1j_1}\varepsilon^{(s_1')}_{i_2j_1'}\,g'_{i_1j_1}(\tilde\beta)g'_{i_2j_1'}(\tilde\beta)\,l_{j_1j_2}l_{j_1'j_2'}\,\tilde e_{i_1j_2}\tilde e_{i_2j_2'}}{\sigma_{i_1j_1}(\tilde\beta)\sigma_{i_2j_1'}(\tilde\beta)}\times E\Bigg[\frac{\varepsilon^{(s_2)}_{ij_3}\varepsilon^{(s_2')}_{ij_3'}\,g'_{ij_3}(\tilde\beta)g'_{ij_3'}(\tilde\beta)\,l_{j_3j_4}l_{j_3'j_4'}\,\tilde e_{ij_4}\tilde e_{ij_4'}}{\sigma_{ij_3}(\tilde\beta)\sigma_{ij_3'}(\tilde\beta)}\ \Bigg|\ \mathcal F_{n,i-1}\Bigg].$$
Taking the conditional expectation first over $\tilde e_{ij_4}\tilde e_{ij_4'}$, using $\sum_{j_4'}l_{j_3'j_4'}\mathrm{cov}(\tilde e_{ij_4},\tilde e_{ij_4'}) = \mathbb I(j_3' = j_4)$, and then over $\varepsilon^{(s_2)}_{ij_3}\varepsilon^{(s_2')}_{ij_3'}$, which yields the factor $\gamma_{s_2s_2'}$, we obtain, using $\sum_{s_2,s_2'}\xi_{s_1s_2}\gamma_{s_2s_2'}\xi_{s_2's_1'} = \xi_{s_1s_1'}$,
$$E(W_{ni}'^2\mid\mathcal F_{n,i-1}) = \sum_{i_1,i_2=1}^{i-1}\sum_{j_1,j_2,j_1',j_2'=1}^m\sum_{s_1,s_1'=0}^{p_n}\frac{\xi_{s_1s_1'}\,\varepsilon^{(s_1)}_{i_1j_1}\varepsilon^{(s_1')}_{i_2j_1'}\,g'_{i_1j_1}(\tilde\beta)g'_{i_2j_1'}(\tilde\beta)\,l_{j_1j_2}l_{j_1'j_2'}\,\tilde e_{i_1j_2}\tilde e_{i_2j_2'}}{\sigma_{i_1j_1}(\tilde\beta)\sigma_{i_2j_1'}(\tilde\beta)} = A_n^{(1)} + A_n^{(2)},$$
where $A_n^{(1)}$ collects the terms with $i_1 = i_2$ and $A_n^{(2)}$ those with $i_1\neq i_2$.
Note that
$$E(A_n^{(1)}) = \sum_{i_1=1}^{i-1}\sum_{s_1,s_1'=0}^{p_n}\xi_{s_1s_1'}\,E\Bigg[\sum_{j_1,j_2=1}^m\frac{\varepsilon^{(s_1)}_{i_1j_1}\varepsilon^{(s_1')}_{i_1j_2}\,g'_{i_1j_1}(\tilde\beta)g'_{i_1j_2}(\tilde\beta)\,l_{j_1j_2}}{\sigma_{i_1j_1}(\tilde\beta)\sigma_{i_1j_2}(\tilde\beta)}\Bigg] = \sum_{i_1=1}^{i-1}\sum_{s_1,s_1'=0}^{p_n}\xi_{s_1s_1'}\gamma_{s_1s_1'} = (i-1)(p_n+1),$$
and $E(A_n^{(2)}) = 0$. These facts together imply $E\{E(W_{ni}'^2\mid\mathcal F_{n,i-1})\} = (i-1)(p_n+1)$. Thus
$$E\Bigg\{\sum_{i=1}^nE(W_{ni}^2\mid\mathcal F_{n,i-1})\Bigg\} = \frac{4}{2n^2p_n}\sum_{i=1}^n(i-1)(p_n+1) = \frac{(n-1)(p_n+1)}{np_n}\ \to\ 1.\quad(3.8.27)$$
Next, we shall show that
$$\mathrm{var}\Bigg\{\sum_{i=1}^nE(W_{ni}^2\mid\mathcal F_{n,i-1})\Bigg\}\ \to\ 0.\quad(3.8.28)$$
To prove this claim, first consider
$$E\Bigg\{\sum_{i=1}^nE(W_{ni}^2\mid\mathcal F_{n,i-1})\Bigg\}^2 = \sum_{i=1}^nE\big\{E(W_{ni}^2\mid\mathcal F_{n,i-1})\big\}^2 + 2\sum_{1\le i_1<i_2\le n}E\big\{E(W_{ni_1}^2\mid\mathcal F_{n,i_1-1})\,E(W_{ni_2}^2\mid\mathcal F_{n,i_2-1})\big\}.\quad(3.8.29)$$

The following condition, analogous to the one verified in the proof of Theorem 1, is to be established for the estimating equation (4.1.10): for any $\epsilon > 0$ there exists a constant $\zeta > 0$ such that, for sufficiently large $n$,
$$P\Bigg\{\sup_{\|\beta-\tilde\beta\|=\zeta\sqrt{p_n/n}}(\beta-\tilde\beta)^TU(\beta) < 0\Bigg\}\ \ge\ 1-\epsilon.$$

Proof Let $J(\beta) = \partial U(\beta)/\partial\beta$ and let $\beta^*$ be such that $\|\beta^*-\tilde\beta\| < \|\beta-\tilde\beta\|$. Using a Taylor expansion we obtain
$$(\beta-\tilde\beta)^TU(\beta) = (\beta-\tilde\beta)^TU(\tilde\beta) + (\beta-\tilde\beta)^TJ(\beta^*)(\beta-\tilde\beta) = (\beta-\tilde\beta)^TU(\tilde\beta) + (\beta-\tilde\beta)^T\{J(\beta^*)-J(\tilde\beta)\}(\beta-\tilde\beta) + (\beta-\tilde\beta)^TJ(\tilde\beta)(\beta-\tilde\beta).$$
Using representation (4.1.10) of $U(\beta)$ we get
$$J(\beta) = -\sum_{i=1}^n\big\{\Omega(W_i^{*T}\beta)Y_i^* - \Omega\beta W_i^{*T}Y_i^* + \Omega Y_i^{*2} - W_i^*W_i^{*T}\big\}.$$
Consider
$$(\beta-\tilde\beta)^T\{J(\beta^*)-J(\tilde\beta)\}(\beta-\tilde\beta) = (\beta-\tilde\beta)^T\sum_{i=1}^n\Omega Y_i^*(W_i^{*T}\tilde\beta - W_i^{*T}\beta^*)(\beta-\tilde\beta) + (\beta-\tilde\beta)^T\sum_{i=1}^n\Omega(\tilde\beta-\beta^*)W_i^{*T}Y_i^*(\beta-\tilde\beta) = A_1 + A_2.$$
Consider
$$|A_1| \le \sum_{i=1}^n\big|(\beta-\tilde\beta)^T\Omega Y_i^*(W_i^{*T}\tilde\beta - W_i^{*T}\beta^*)(\beta-\tilde\beta)\big| \le \|\Omega\|\sum_{i=1}^n|Y_i^*|\,\|W_i^*\|\,\|\beta-\tilde\beta\|^3 \le \|\Omega\|\sum_{i=1}^n|Y_i^*|\Big(\sup_i\|X_i^*\| + \sup_i\|U_i^*\|\Big)\|\beta-\tilde\beta\|^3,$$
since $W_i^* = X_i^* + U_i^*$. Taking norms and using Assumptions 1, 2 and 3 we get
$$\sup_{\|\beta-\tilde\beta\|=\zeta\sqrt{p_n/n}}\big\|(\beta-\tilde\beta)^T\{J(\beta^*)-J(\tilde\beta)\}(\beta-\tilde\beta)\big\| = O_p\Bigg(\frac{\zeta^3\,\sqrt{p_n}\,p_n^{1.5}\,n\,n^{1/4}}{n^{1.5}}\Bigg) = o_p(1).$$
We can prove the same for $A_2$. Now consider
$$n^{-1}(\beta-\tilde\beta)^TJ(\tilde\beta)(\beta-\tilde\beta) = n^{-1}(\beta-\tilde\beta)^T\sum_{i=1}^n\big\{-2\Omega(W_i^{*T}\tilde\beta)Y_i^* + \Omega Y_i^{*2} - W_i^*W_i^{*T}\big\}(\beta-\tilde\beta).$$
We have
$$n^{-1}\sum_{i=1}^n-2\Omega(W_i^{*T}\tilde\beta)Y_i^* \approx -2\Omega n^{-1}\sum_{i=1}^nE\{(W_i^{*T}\tilde\beta)Y_i^*\} = -2\Omega\Bigg\{\sum_{i=1}^n\frac{(X_i^T\tilde\beta)^2}{n} + \sum_{i_1\neq i_2}\frac{(X_{i_1}^T\tilde\beta)(X_{i_2}^T\tilde\beta)}{n^2}\Bigg\},$$
$$n^{-1}\sum_{i=1}^n\Omega Y_i^{*2} \approx n^{-1}\sum_{i=1}^n\Omega E(Y_i^{*2}) = \Omega + \Omega\sum_{i=1}^n\frac{(X_i^T\tilde\beta)^2}{n} + 2\Omega\sum_{i_1\neq i_2}\frac{(X_{i_1}^T\tilde\beta)(X_{i_2}^T\tilde\beta)}{n^2},$$
$$n^{-1}\sum_{i=1}^nW_i^*W_i^{*T} \approx n^{-1}\sum_{i=1}^nE(W_i^*W_i^{*T}) = \sum_{i=1}^n\frac{X_iX_i^T}{n} - \Omega + \sum_{i_1\neq i_2}\frac{X_{i_1}X_{i_2}^T}{n^2}.$$
Collecting these approximations and writing
$$Q = \sum_{i=1}^n\frac{X_iX_i^T}{n},\qquad R = \sum_{i_1\neq i_2}\frac{X_{i_1}X_{i_2}^T}{n^2},\qquad c = \max\{|\lambda_{\max}(R)|,\,|\lambda_{\min}(R)|\},$$
we obtain
$$n^{-1}(\beta-\tilde\beta)^TJ(\tilde\beta)(\beta-\tilde\beta) \approx (\beta-\tilde\beta)^T\Bigg\{-\frac{\Omega_1}{\sigma^2}\,\tilde\beta^TQ\tilde\beta + 2\,\frac{\Omega_1}{\sigma^2}\,\tilde\beta^TR\tilde\beta - Q + R\Bigg\}(\beta-\tilde\beta).$$
Taking the supremum over $\|\beta-\tilde\beta\| = \zeta\sqrt{p_n/n}$, so that $\|\beta-\tilde\beta\|^2 = \zeta^2p_n/n$,
$$\sup_{\|\beta-\tilde\beta\|=\zeta\sqrt{p_n/n}}(\beta-\tilde\beta)^TJ(\tilde\beta)(\beta-\tilde\beta) \le \zeta^2p_n\Bigg\{-\frac{\lambda_{\min}(\Omega_1)}{\sigma^2}\,\tilde\beta^TQ\tilde\beta + \frac{\lambda_{\max}(\Omega_1)}{\sigma^2}\,\tilde\beta^TR\tilde\beta - \lambda_{\min}(Q) + \lambda_{\max}(R)\Bigg\}$$
$$\le \zeta^2p_n\Bigg\{-\frac{\lambda_{\min}(\Omega_1)}{\sigma^2}\,\lambda_{\min}(Q)\|\tilde\beta\|^2 + c\,\frac{\lambda_{\max}(\Omega_1)}{\sigma^2}\,\|\tilde\beta\|^2 - \lambda_{\min}(Q) + \lambda_{\max}(R)\Bigg\} = \zeta^2p_nA.$$
Assumptions 4, 5 and 6 ensure that $A \le 0$. Thus the theorem is proved. □

4.2 Simulation

We report the details of a limited simulation study that investigates the accuracy of our proposed estimator in the presence of measurement error. While our method does not put restrictions on the covariance function of the error process, given the challenges of generating a process with a specific covariance structure we considered the identity covariance function, i.e. $K(s,t) = \sigma^2I\{s=t\}$. This makes it easier to control the error in the model. Following Xiaochen Cai (2014), we generated the functional covariate as
$$X_i(t) = \sum_{k=1}^{p_n}\varepsilon_{ik}\rho_k(t),$$
where the $\varepsilon_{ik}\sim N(0,1)$ are independent for all $i = 1,\ldots,n$ and $k = 1,\ldots,p_n$. We set $p_n = 8$. We generated the basis functions using the Canadian weather data set (Ramsay, 2006), which is available in the fda package in R. We first smoothed the data using the B-spline smoothing described in the simulation section of Chapter 3. We generated the response using
$$Y_i = \int X_i(t)\beta(t)\,dt + \epsilon_i,$$
where the $\epsilon_i\sim N(0,1)$ are independent for all $i$. We then generated the surrogate variable $W_i(t)$ as $W_i(t) = X_i(t) + U_i(t)$, where $U_i(t)$ is a Gaussian process with zero mean function and covariance function $K(s,t) = \sigma^2I\{s=t\}$. We vary the value of $\sigma^2$ to study the effect of measurement error on the estimation procedure.

We use 100 replicates to estimate the covariance function. Let $W_{i1},\ldots,W_{i100}$ denote the replicates of $W_i$ and $W_{i\cdot}$ their mean. The estimate of the covariance function $K(\cdot,\cdot)$ of the measurement error is
$$\hat K = \frac{\sum_{i=1}^n\sum_{j=1}^{100}(W_{ij}-W_{i\cdot})(W_{ij}-W_{i\cdot})^T}{n(100-1)}.$$
Let $\hat\lambda_k$ denote the $k$-th eigenvalue of $\hat K(\cdot,\cdot)$; then $\hat\Omega_1 = \mathrm{diag}(\hat\lambda_1,\ldots,\hat\lambda_{p_n})$. The dimension $p_n$ is chosen by 5-fold cross-validation. We solve (4.1.10) iteratively with the starting value obtained from the naive estimator. Table 4.1 reports the results of the simulation; the error is calculated as $(\hat\beta-\tilde\beta)^T(\hat\beta-\tilde\beta)/\tilde\beta^T\tilde\beta$. The naive estimator is the one that ignores the measurement error, i.e. treats $W(t)$ as the true covariate; the corrected estimator is the one obtained by solving (4.1.10).

Table 4.1: Error in the estimator as a function of $\sigma^2$

$\sigma^2$ | Naive | Corrected
0.1        | 0.009 | 0.001
0.3        | 0.056 | 0.006
0.4        | 0.080 | 0.008
0.5        | 0.115 | 0.020
0.8        | 0.202 | 0.080
1.0        | 0.256 | 0.855

We can see that as the measurement error increases, the estimation error increases for both estimators. We can also see that the corrected estimator improves the accuracy of the estimate. Recall that the estimating equation yields multiple solutions and we do not have a way of choosing the correct one. To avoid this problem, we solve the equation iteratively starting from the naive estimator. However, as the measurement error in the data increases, the starting values are farther from the true value, and hence the iterative algorithm does not always converge to the true value. We observe that the corrected estimator performed better than the naive one only for a limited range of $\sigma^2$.
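To make the simulation design concrete, the following R sketch reproduces the data-generating mechanism described above: Gaussian scores times fixed basis functions, white-noise measurement error with 100 replicates per subject, and the replicate-based estimate of $K$ evaluated on a grid. The basis matrix `rho` (a Fourier-type stand-in for the Canadian-weather-derived basis) and the coefficient function `beta_t` are hypothetical placeholders; this is an illustrative sketch, not the exact code behind Table 4.1.

```r
set.seed(42)
n <- 200; pn <- 8; nrep <- 100
tgrid <- seq(0, 1, length.out = 101)

# Placeholder basis: the thesis derives rho_k by smoothing the Canadian
# weather data (fda package); here a Fourier-type stand-in is used instead.
rho    <- sapply(1:pn, function(k) sqrt(2) * sin(k * pi * tgrid))
beta_t <- rowSums(rho[, 1:3])                 # hypothetical beta(t)

eps <- matrix(rnorm(n * pn), n, pn)           # scores eps_ik ~ N(0,1)
X   <- eps %*% t(rho)                         # X_i(t) on the grid (n x 101)
Y   <- X %*% beta_t * diff(tgrid)[1] + rnorm(n)  # Y_i = int X beta dt + e_i

# Surrogate replicates W_ij(t) = X_i(t) + U_ij(t), white noise with var sigma2
sigma2 <- 0.5
W <- array(rep(X, nrep), dim = c(n, length(tgrid), nrep)) +
     array(rnorm(n * length(tgrid) * nrep, sd = sqrt(sigma2)),
           dim = c(n, length(tgrid), nrep))

# Replicate-based estimate of K(s,t): pooled within-subject covariance
Wbar <- apply(W, c(1, 2), mean)
Khat <- matrix(0, length(tgrid), length(tgrid))
for (i in 1:n) {
  D    <- t(W[i, , ]) - matrix(Wbar[i, ], nrep, length(tgrid), byrow = TRUE)
  Khat <- Khat + crossprod(D)                 # sum_j (W_ij - W_i.)(W_ij - W_i.)^T
}
Khat <- Khat / (n * (nrep - 1))
lambda_hat <- eigen(Khat, symmetric = TRUE)$values[1:pn]  # for Omega_1 hat
```

Solving (4.1.10) iteratively from the naive start and tabulating the relative error $(\hat\beta-\tilde\beta)^T(\hat\beta-\tilde\beta)/\tilde\beta^T\tilde\beta$ against $\sigma^2$ would then reproduce the layout of Table 4.1.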
BIBLIOGRAPHY

Balan, R. M. and Schiopu-Kratina, I. (2005). Asymptotic results with generalized estimating equations for longitudinal data. The Annals of Statistics, 33(2):522-541.

Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics, 12(4):503-538.

Cardot, H., Crambes, C., Kneip, A., and Sarda, P. (2007). Smoothing splines estimators in functional linear regression with errors-in-variables. Computational Statistics & Data Analysis, 51(10):4832-4848.

Cardot, H., Ferraty, F., and Sarda, P. (1999). Functional linear model. Statistics & Probability Letters, 45(1):11-22.

Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, pages 571-591.

Cardot, H. and Johannes, J. (2010). Thresholding projection estimators in functional linear models. Journal of Multivariate Analysis, 101(2):395-408.

Cardot, H. and Sarda, P. (2005). Estimation in generalized linear models for functional data via penalized likelihood. Journal of Multivariate Analysis, 92(1):24-41.

Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press.

Cirulli, E. T. and Goldstein, D. B. (2010). Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nature Reviews Genetics, 11(6):415-425.

Gertheiss, J., Maity, A., and Staicu, A.-M. (2013). Variable selection in generalized functional linear models. Stat, 2(1):86-101.

Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. Probability and Mathematical Statistics. Academic Press.

Hicks, B. M., Schalet, B. D., Malone, S. M., Iacono, W. G., and McGue, M. (2011). Psychometric and genetic architecture of substance use disorder and behavioral disinhibition measures for gene association studies. Behavior Genetics, 41(4):459-475.

Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications, volume 200. Springer Science & Business Media.

Hsing, T. and Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. John Wiley & Sons.

Laird, N. M. and Lange, C. (2010a). The Fundamentals of Modern Statistical Genetics. Springer Science & Business Media.

Laird, N. M. and Lange, C. (2010b). The Fundamentals of Modern Statistical Genetics. Springer Publishing Company, Incorporated, 1st edition.

Lee, S., Abecasis, G. R., Boehnke, M., and Lin, X. (2014). Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics, 95(1):5-23.

Li, Y., Wang, N., and Carroll, R. J. (2010). Generalized functional linear models with semiparametric single-index interactions. Journal of the American Statistical Association, 105(490):621-633.

Madsen, B. E. and Browning, S. R. (2009). A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genetics, 5(2):e1000384.

Morris, J. S. (2015). Functional regression. Annual Review of Statistics and Its Application, 2:321-359.

Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics, 33(2):774-805.

Ortega, J. M. and Rheinboldt, W. C. (2000). Iterative Solution of Nonlinear Equations in Several Variables. Society for Industrial and Applied Mathematics, Philadelphia, PA.

Ramsay, J. O. (2006). Functional Data Analysis. Wiley Online Library.

Saccone, N. L., Saccone, S. F., Hinrichs, A. L., Stitzel, J. A., Duan, W., Pergadia, M. L., Agrawal, A., Breslau, N., Grucza, R. A., Hatsukami, D., et al. (2009). Multiple distinct risk loci for nicotine dependence identified by dense coverage of the complete family of nicotinic receptor subunit (CHRN) genes. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 150(4):453-466.
Schaid, D. J., McDonnell, S. K., Sinnwell, J. P., and Thibodeau, S. N. (2013). Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genetic Epidemiology, 37(5):409-418.

Stefanski, L. A. and Carroll, R. J. (1987). Conditional scores and optimal scores for generalized linear measurement-error models. Biometrika, pages 703-716.

Vrieze, S. I., Malone, S. M., Vaidyanathan, U., Kwong, A., Kang, H. M., Zhan, X., Flickinger, M., Irons, D., Jun, G., Locke, A. E., et al. (2014). In search of rare variants: preliminary results from whole genome sequencing of 1,325 individuals with psychophysiological endophenotypes. Psychophysiology, 51(12):1309-1320.

Vsevolozhskaya, O. A., Zaykin, D. V., Greenwood, M. C., Wei, C., and Lu, Q. (2014). Functional analysis of variance for association studies. PLoS ONE, 9(9):e105074.

Wang, J.-L., Chiou, J.-M., and Müller, H.-G. (2016). Functional data analysis. Annual Review of Statistics and Its Application, 3:257-295.

Wang, L. (2011). GEE analysis of clustered binary data with diverging number of covariates. The Annals of Statistics, 39(1):389-417.

Wang, X., Lee, S., Zhu, X., Redline, S., and Lin, X. (2013). GEE-based SNP set association test for continuous and discrete traits in family-based association studies. Genetic Epidemiology, 37(8):778-786.