MULTIVARIATE GENERALIZED FUNCTIONAL LINEAR MODELS WITH
APPLICATIONS TO GENOMICS
By
Sneha Jadhav

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Statistics – Doctor of Philosophy
2017

ABSTRACT
MULTIVARIATE GENERALIZED FUNCTIONAL LINEAR MODELS WITH
APPLICATIONS TO GENOMICS
By
Sneha Jadhav
This thesis is focused on developing functional data methodology with the aim of addressing problems that arise in genetic sequencing data. While significant progress has been made in
identifying common genetic variants associated with diseases, these variants only explain a small
proportion of heritability. Recent studies suggest that rare variants could account for this variability. With advancements in sequencing technology, large-scale sequencing studies are now being
conducted to comprehensively investigate the contribution of rare variants to the genetic etiology of
various diseases. Although these studies hold great potential for uncovering new disease-associated
variants, the massive amount of data and complex structure of sequencing data poses great analytical challenges on association analysis. Advanced methods are needed to address these challenges
and to facilitate the discovery process of new variants predisposing to various diseases. We use
functional data analysis methods to capture the complexities of sequencing data.
In the first chapter we investigate the importance of considering the genetic structure of sequencing data. In association studies the effect of appropriately modeling genetic structure of
sequencing data on association analysis have not been well studied. We compare three statistical
approaches which use different strategies to model the genetic structure. They are a burden test,
a burden test that considers pairwise correlation, and a functional analysis of variance (FANOVA)
test that models the gene through fitting continuous curves on an individuals genotype profile. We
find some evidence in favor of treating sequencing data as a function.
In the second chapter we present the definitions of some fundamental concepts in Functional

Data Analysis like the mean element, covariance operator and its eigen decomposition, and KarhunenLo`eve expansion. Basis expansion and in particular Karhunen-Lo`eve expansion play an important
role in this thesis. We briefly discuss the estimators for the mean function, the covariance operator and their consistency. Results on the consistency of the eigenvalues and eigenfunctions of the
sample covariance operator are also stated.
Several times genetic data is collected on families, where the response variable or the trait of
the family members can be dependent on each other. Additionally, this trait of interest can be
discrete or continuous. Thus there is a need for a functional model that can handle dependent data
that may be continuous or discrete. The model proposed by M¨uller and Stadtm¨uller (2005) uses
the generalized estimating equations approach that can handle both continuous and discrete data.
However, they assume the response variable to be univariate and the sample to be independent.
There are no existing functional methods that we know of that can be directly applied to the family
data. In the third chapter we develop a framework for dependent generalized functional linear
models where the response is multivariate, that can be used to test for a certain type of association
between the genetic data and the trait of interest for family data.
In the fourth chapter we develop regression framework where the response variable has a normal distribution and there is measurement error in the regressor function. In this set-up, the true
regressor function is not observable. Instead, we observe a surrogate variable and its replicates.
The relation between the true function and the surrogate one is assumed to follow the additive classical measurement error model. We use the approach developed by Stefanski and Carroll (1987) to
propose an estimating equation for the parameters and show asymptotic existence and consistency
of the estimate obtained from this equation.

Copyright by
SNEHA JADHAV
2017

ACKNOWLEDGMENTS

I wish to express sincere gratitude to my advisor Prof. Hira L. Koul. I am thankful for his immeasurable support, guidance, encouragement and patience. I would also like to thank Prof. Qing Lu
for giving me the opportunity to work with him on a variety of projects and broadening the scope
of my work. He has been a very supportive mentor. I thank Prof. Sakhanenko for lending her ear to
my ideas and offering her advice on several occasions and Prof Cui for serving on my dissertation
committee.
A loving thanks to all my friends for making this a fun-filled journey. I would like to express
my appreciation to the graduate secretary Sue Watson for looking out for us students. I am grateful
to my husband Venkat Ramakrishnan for his unyielding patience, support and love. This thesis
might not have been written without the support of my mum Hemlata Jadhav, my father Umakant
Jadhav and my sister Shalmali Jadhav who have shaped me in to who I am.

v

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Chapter 1
Modeling Sequencing Data . . . . . . . .
1.1 Burden Test . . . . . . . . . . . . . . . . . . . . .
1.2 Burden test that considers the pairwise Correlation .
1.3 Functional Analysis of Variance test . . . . . . . .
1.4 Simulation . . . . . . . . . . . . . . . . . . . . . .
1.5 Results . . . . . . . . . . . . . . . . . . . . . . . .
1.6 Discussion . . . . . . . . . . . . . . . . . . . . . .
1.7 Conclusion . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

1
2
3
3
4
5
6
7

Chapter 2
Preliminaries on Functional Data Analysis . . . . . . . . . . . . . . . . . 9
2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 3
Multivariate Generalized Functional Linear Models . .
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.4 Asymptotic Theory . . . . . . . . . . . . . . . . . . . . . . . . .
3.4.1 Existence and Consistency . . . . . . . . . . . . . . . . .
3.4.2 Asymptotic Normality . . . . . . . . . . . . . . . . . . .
3.5 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.6 Application to the sequencing data from the Minnesota Twin Study
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.8 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

14
14
16
18
19
19
20
23
29
31
31

Chapter 4
Measurement Error Model . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

vi

LIST OF TABLES

Table 1.1:

Power for unidirectional effect . . . . . . . . . . . . . . . . . . . . . . . . . .

6

Table 1.2:

Power for bidirectional effect . . . . . . . . . . . . . . . . . . . . . . . . . . .

6

Table 1.3:

Summary of top 10 genes with the smallest p-values from the association analysis

6

Table 3.1:

Empirical power as a function of γ . . . . . . . . . . . . . . . . . . . . . . . . 24

Table 3.2:

Empirical level as a function of γ . . . . . . . . . . . . . . . . . . . . . . . . . 24

Table 3.3:

Empirical power comparison with the gSKAT . . . . . . . . . . . . . . . . . . 26

Table 3.4:

Empirical power as a function of sample size . . . . . . . . . . . . . . . . . . . 27

Table 3.5:

Empirical level as a function of cluster size . . . . . . . . . . . . . . . . . . . . 27

Table 3.6:

Effect of increasing pn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Table 3.7:

p-values for Minnesota Twin Study . . . . . . . . . . . . . . . . . . . . . . . . 30

Table 4.1:

Error in the estimator as a function of σ2 . . . . . . . . . . . . . . . . . . . . . 75

vii

LIST OF FIGURES

Figure 1.1:

LD plots and plots of the fitted smooth functions for THRA . . . . . . . . . .

Figure 3.1:

Normal Q-Q plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

viii

8

Chapter 1
Modeling Sequencing Data
Genome-wide association studies have made substantial progress in identifying common variants
associated with human diseases. Despite this, a large portion of heritability remains unexplained.
Evolution theory and empirical human genetic studies suggest that rare mutations could play an
important role in human diseases, which motivates comprehensive investigation of rare variants
in sequencing studies. Advances in sequencing technology have enabled researchers to sequence
exome regions or even the whole genome at affordable costs (Cirulli and Goldstein, 2010). The
emerging sequencing data facilitates the study of massive amounts of single nucleotide variants,
including both rare and commons variants for their potential role in complex human diseases. Although these studies hold great promise for identification of new disease-susceptibility variants,
the extremely large number of single nucleotide variants (SNVs) brings significant challenge for
association analysis. Each of these variants have a certain position on the chromosome that is also
available to us. Given the linkage disequilibrium (Laird and Lange (2010a)) and that the genes that
are closer together are more likely to be inherited together during meiosis, we should consider the
positions of the variants in our analysis. This suggests that we can consider the genetic data as a
function of the positions of the variants. Moreover, the sequencing data is usually high dimensional
and we shall see in subsequent chapters that treating it as a function will help address the problems
posed by its high dimensional nature. Treating this data a function does not require the data to be
independent or assume a certain correlation structure. The functional approach can accommodate

1

rare variants. To explore the association of rare variants with human diseases, many statistical
approaches have been developed with different ways of modelling the genome so as to capture its
properties to the fullest. Conventional single-locus analysis suffers from low power because of low
frequency of SNVs and multiple testing issues. Grouping SNVs in a genetic region (e.g., gene)
could aggregate the association signal and alleviate the multiple testing issues, and therefore has
been widely used in statistical analysis of sequencing data (Lee et al. (2014)). Various statistical
methods have been proposed to group SNVs with or without considering the underlying genetic
structure. However, the impact of different strategies of modelling sequencing structure on the
association results has rarely been investigated. If empirical evidence suggests no use of considering the sequencing structure in the association analysis, it gives us a basis for excluding this
factor from statistical modelling. On the other hand, if it is important to consider the relationship
among SNVs, then we need to investigate appropriate strategies for characterizing the underlying
sequencing structure. As an initial step to investigate this issue, we choose three tests with different
ways of modelling the correlation between SNVs: 1) A weighted burden test (BT) (Madsen and
Browning, 2009); 2) A weighted burden test considering pairwise correlation (BTCOV) (Schaid
et al., 2013); and 3) A functional analysis of variance test (FANOVA) (Vsevolozhskaya et al., 2014)
that considers the relation among nearby loci and models the genotype profile of an individual as
a continuous function.
We briefly present the three methods below.

1.1

Burden Test

We consider a burden test developed by Madsen and Browning (2009). The test summarizes the
L

genetic score of multiple SNVs as γj =

gij
, where gij is the number of low frequency allele
i=1 wi
2

of the SNV, i, for individual j. The weight is defined to emphasize the effect of rare variants i.e.
wi =

ni qi (1 − qi ), where qi is the minor allele frequency (MAF) of the SNV, calculated from

controls. Because the test simply adds the genotype of each SNV weighted by its MAF, it does not
consider Linkage Disequilibrium (LD) or the correlation between SNVs.

1.2

Burden test that considers the pairwise Correlation

In addition to the above burden test, we also consider another type of burden test suggested by
Schaid et.al Schaid et al. (2013), which considers pairwise covariance. We consider the folL

lowing summary of genetic scores, Sj =

gij
, wl and gij are defined in the same manner
i=1 wi

in BT. However, unlike the conventional burden test, the test statistic of BTCOV is given by
L L
((Y − Y¯ ) S)2
T =
,
where
V
=
wk wl Rkl pk (1 − pk )pl (1 − pl ) and Rkl is the
s
(Y − Y¯ ) Vs (Y − Y¯ )
k=1 l=1

correlation between the SNPs.

1.3

Functional Analysis of Variance test

FANOVA fits a continuous function (curve) on the genotype data of an individual. ANOVA can
then be used to test the curve difference in cases and controls. While various smoothing methods
can be used to fit smooth functions on genotype data, we use the cubic B-splines to fit the smooth
functions. After continuous functions g(t), are obtained, the functional analysis of variance can be
used for association testing. FANOVA model is written as:

gij (t) = µi (t) + ij (t)

ij (t)

→ G.P (0, γ)

3

i = 1, 2, ..., nj j = 1, 2,

where, t denotes the genomic position of the genetic variant, k denotes the case or control group
and j denotes the individual, G.P (0, γ) denotes Gaussian process with γ as the covariance function,
kj

is the error term and µk is the mean function of group k.
To evaluate the association, we test the hypotheses: H0 : µ1 (t) = µ2 (t)

∀t vs. H1 : µ1 (t) =

µ2 (t) for some t. Similar to ANOVA, the test statistic for the hypothesis is,
2

F =

nk (ˆ
µk (t) − µ
ˆ(t))2 dt/(2 − 1)

k=1
2 nk

,
(gkj (t) − µ
ˆk

(t))2 dt/(n − 2)

k=1 j=1

where µ
ˆk (t) =

n−1
k

nk
j=1

2

gkj (t) and µ
ˆ(t) = n−1

nk µ
ˆk (t). The numerator and the denominak=1

tor in the F follow a mixture of chi-squared distributions. Satterwaite approximation is used to
approximate the distribution of F as F-distribution. The details can be found in Vsevolozhskaya
et al. (2014).

1.4

Simulation

We now report the finding of the simulation study that compares the performance of the three
methods. We selected a one mb region from chromosome three of the unrelated real sequencing
data provided by GAW19, which comprises of 8575 SNVs. For each replicate, we randomly
selected a 30 kb segment from the one mb region. From this segment, we randomly selected a
specified proportion of SNVs for generating phenotype. We used the logistic model to generate
phenotype from these selected SNVs. The two types of effects were considered in the simulation.
These effects refer to the coefficients of the SNVs in the logistic set up. We randomly generated the
regression coefficients from N (0, 1) for bidirectional effects and N (2, 1) for unidirectional effects.

4

For each type of effect, we varied the proportion of disease-associated variants from 0.01 to 0.5.
One thousand replicates were simulated for each scenario for power and type-I error calculation.
For comparison, we adopted the same weight for both the burden tests. For FANOVA, we used the
penalized cubic B-splines to determine the smoothness of functions.
These simulations only evaluated one genetic region. To investigate the performance of the
three methods on regions with different genetic structures, we applied them to the unrelated simulated GAW data in order to evaluate the association of the 294 reported disease-associated genes
with the simulated hypertension phenotype. For the association analysis, hypertension (HTN1)
from the first simulation out of the 200 simulations was used. This data has a sample size of 142,
out of which there were 24 cases.

1.5

Results

Type-I error rates of the three tests were well controlled at 0.05 level (0.046 for BT, 0.044 for
BTCOV, and 0.047 for FANOVA). As we observe from Table 1, power of the three tests increases
as we increase the proportion of disease-associated variants. Overall, FANOVA has better or comparable performance to BT and BTCOV, while BTCOV obtains similar power to BT. The same
conclusion also holds when the effects are bidirectional (see Table 2). We also observe that the
power of the three tests was slightly lower in the case of bidirectional effects than in the case of
unidirectional effects. Table 3 summarizes the top ten genes with the smallest p-values from the
association analysis. Consistent with the result from simulations, we find that in general FANOVA
attains smaller p-values, while the p-values of BT and BTCOV were very similar. Here smaller
p-values indicate better performance as these genes were from the 294 reported disease-associated
genes.

5

Table 1.1: Power for unidirectional effect
%CV
.01
BT
0.343
BTCOV 0.339
FANOVA 0.398

.05
.1
.15
.2
0.617 0.714 0.766 0.759
0.615 0.712 0.767 0.755
0.700 0.764 0.808 0.807

.25
.3
.5
0.776 0.793 0.781
0.780 0.792 0.794
0.814 0.814 0.744

Table 1.2: Power for bidirectional effect
%CV
.01
BT
0.208
BTCOV 0.200
FANOVA 0.217

.05
.1
.15
.2
0.508 0.585 0.621 0.665
0.509 0.579 0.618 0.663
0.597 0.683 0.732 0.765

.25
.3
.5
0.678 0.703 0.683
0.668 0.698 0.680
0.799 0.809 0.815

Table 3 summarizes top 10 genes with the smallest p-values from the association analysis.

1.6

Discussion

Through this study, we observe that overall BT and BTCOV have a comparable performance.
However, for one gene, THRA, BTCOV attained better performance than the other two tests. In
the follow-up analysis, we observe a small LD block in this gene (see Figure 1). The plot of the
fitted genotype curves reveal the association happens to lie in that LD block. Therefore, BTCOV,
which models the LD pattern, outperforms the other two tests. Also, the effects in the LD block
Table 1.3: Summary of top 10 genes with the smallest p-values from the association analysis
Gene
SUMF1
RELB
HIF3A
THRA
TFDP1
PROK2
POLR2A
CD1C
CCL24
MAP3K6

BT
8.75E-05
6.57E-02
2.12E-02
1.85E-02
1.50E-02
1.23E-02
2.10E-02
2.76E-02
2.24E-02
8.72E-02

FANOVA
1.31E-05
4.35E-04
4.34E-03
3.62E-02
1.95E-02
1.27E-01
1.31E-02
5.27E-02
1.92E-02
3.27E-02

6

BTCOV
8.43E-05
7.08E-02
2.19E-02
9.68E-03
1.13E-02
1.32E-02
2.25E-02
1.42E-02
2.71E-02
8.45E-02

were largely unidirectional, which is in favor of burden tests. The genetic structure of sequencing
variants is usually more complex than pairwise LD. Functional methods provide tool to visualize
this structure and explore the associated regions. Figure 1 reveals the interesting structure of RELB.
We can see that the associated region lies in LD block but the correlation is not as strong as that
for THRA. FANOVA performs much better for this gene.

1.7

Conclusion

Our observations indicate that the performance of tests depends on the underlying genetic structure
and hence ignoring it in the association analysis may not be ideal. It is advisable to use function
based approaches to explore and model the sequencing structure. As illustrated by Figure 1 (fitted
genotype function vs variant position), the plot of fitted functional curves provides a reasonable
way to explore the genetic structure. The disease-associated regions can also be visualized in the
plot. If the underlying genetic structure tends to be complex, it is also advisable to use function
based approaches, such as FANOVA, to adequately model the sequencing data.

7

Figure 1.1: LD plots and plots of the fitted smooth functions for THRA

8

Chapter 2
Preliminaries on Functional Data Analysis
Functional data can be viewed as realizations of a random variable that takes values in a Hilbert
space. In this light, we briefly introduce the concept of mean element and covariance operator
in Hilbert space and give some necessary definitions. We can also think of functional data as
the sample paths of a stochastic process with smooth mean and covariance functions. Both these
perspectives are discussed in a greater detail in Hsing and Eubank (2015). We briefly discuss the
estimation of the mean function and the covariance operator in the space L2 [0, 1] of all square
integrable functions on domain [0, 1]. We also discuss the concept as well as the estimation of
eigenvalues and eigenfunctions of the covariance operator. Further details of these concepts can be
found in Horv´ath and Kokoszka (2012).

2.1

Definitions

Let f be a function on a measure space (E, B, µ) that takes values in a separable Hilbert space H.
k

Definition 2.1.1 A function f is called simple if it can be represented as: f (ω) =
i=1

IEi (ω)gi ,

for some finite k, Ei ∈ B and gi ∈ H.
k

Definition 2.1.2 Any simple function f (ω) =
i=1

IEi (ω)gi with µ(Ei ) < ∞, ∀i, is said to be
k

integrable and it’s Bochner integral is defined as:

f dµ =

µ(Ei )gi .
i=1

9

Definition 2.1.3 A measurable function f is said to be Bochner integrable if there exists a sequence {fn }∞
lim
n=1 of integrable simple functions such that n→∞
integral of f is defined as

f dµ = lim

n→∞

||fn − f || = 0. The Bochner

fn dµ.

The existence of the sequence of functions fn in the above definition is guaranteed by the following
condition:

||f ||dµ < ∞.

Definition 2.1.4 Let {ei } be an orthonormal basis for Hilbert space H1 and X be a bounded
∞

linear function from H1 → H2 where H2 is also a Hilbert space. If X satisfies
i=1

where ||

||Xei ||22 < ∞,

||2 is the norm on H2 then X is a Hilbert Schmidt operator. The collection of all such

Hilbert Schmidt operators is denoted by BHS (H1 , H2 ). This is a Hilbert space. If X1 , X2 ∈
∞

BHS (H1 , H2 ) then X1 , X2 HS =

X1 ei , X2 ei 2 .
i=1

Definition 2.1.5 Let X1 ∈ H1 and X2 ∈ H2 . The tensor product X1 ⊗ X2 is an operator defined
from H1 → H2 in the following way: X1 ⊗ X2 (Y ) = X1 , Y X2 , where Y ∈ H1 .
Let X be a random element of a separable Hilbert space H defined on a probability space (Ω, F, P).
The norm on this space is denoted by ||

|| and the inner product by ·, · .

Definition 2.1.6 If E||X|| < ∞, the mean of X is defined as the Bochner integral µ = E(X) =
XdP.
Definition 2.1.7 Assume that E||X||2 < ∞. Then, the covariance operator for X is the element
of BHS (H) given by K = E[(X − µ) ⊗ (X − µ)] = (X − µ) ⊗ (X − µ)dP.
The Hilbert space of particular interest to us is the L2 ([0, 1]) space. We say that a function X
belongs to the space L2 = L2 ([0, 1]) if X is defined on [0, 1] and satisfies 01 X 2 (t)d(t) < ∞.
L2 space is a separable Hilbert space with the following inner product: X, Y = 01 X(t)Y (t)dt.
10

We consider a random curve X(t), t ∈ [0, 1] to be a random element of L2 . The mean function is µ(t) = E(X(t)) and the covariance function is K(s, t) = E(X(s)X(t)). We can see that
K(Y )(t) = E( X, Y X(t)) = E

X(s)Y (s)ds)X(t) =

E(X(t)X(s))Y (s)ds =

K(t, s)Y (s)ds.

We can show that the operator K is symmetric and positive definite.
Definition 2.1.8 Suppose that there exists a λ such that Ke = λe, then λ is the eigenvalue and e
is the eigenfunction of K.
Note that we can show that the eigenfunctions of K are linearly independent. We can orthonormalize them using Gram Schmidt orthonormalization and use them as basis functions. We use this
fact in our simulations extensively to generate basis functions.

2.2

Estimation

Now that we have presented some basic definitions, we turn to estimation of the defined constructs.
Assume that we have a sample of curves X1 , ...Xn from L2 , that are independent and have the same
distribution as X. We assume that X is integrable.
The mean function estimate is given by the sample mean:
n

µ
ˆ(t) =

n−1

Xi (t).
i=1

The covariance function estimate is given by its sample counterpart:
n

ˆ s) = n−1
K(t,

(Xi (s) − µ
ˆ(s))(Xi (t) − µ
ˆ(t)).
i=1

11

Similarly the covariance operator estimate is given by
n

ˆ
K(x)
= n−1

Xi − µ
ˆ, x (Xi − µ
ˆ),

x ∈ L2 .

i=1

We state some results that establish the properties of these estimates.
Result 2.2.1 Under the assumption that sample of curves is i.i.d, integrable and has the same
distribution as X, E(ˆ
µ) = µ and E||ˆ
µ − µ||2 = O(n−1 ).
Thus, sample mean is an unbiased and consistent. Thus, from now on we can assume that the mean
function µ = 0. The estimate for covariance is biased just like in the multivariate case. This bias
is asymptotically negligible.
ˆ − K||2 = n−1 E||X||4 .
Result 2.2.2 If E||X||4 < ∞, EX = 0 then E||K
HS
We often need to estimate the eigenvalues and eigenfunctions of the the covariance operator. Thus,
ˆ v . Note that if v is an eigenfunction then
ˆ v = λˆ
the estimates of these eigenvalues are given by Kˆ
av is also an eigenfunction. The eigenfunctions are usually normalized so that ||v|| = 1. This does
not determine the sign of v. If kˆj = sign( vˆ, v ), then kˆj cannot be determined from the data.
Result 2.2.3 Suppose E||Xi ||4 < ∞, EX = 0 and λ1 >, ..., > 0. Then, for each j ≥ 1,
lim sup
n→∞

nE(||kˆj vˆj − vj ||2 ) < ∞, lim sup
n→∞

ˆ j |2 ) < ∞. If we assume that only the
nE(|λ − λ

top p + 1 eigenvalues are non zero then the above result holds for 1 ≤ j ≤ p.
We now state the Mercers theorem and Karh¨unen Lo´eve expansion. Let (E, B, µ) be a measure
space. Suppose that K is a measurable continuous function on E×E such that
∞. Define, operator K by Kf (·) =

K(s, t)dµ(s)dµ(t) <

K(s, ·)f (s)dµ(s). K is the integral operator of K and K is

the kernel function of K.
12

Definition 2.2.1 The kernel function K is symmetric if K(s, t) = K(t, s).
n

ci cj K(xi , xj ) ≥ 0 holds for

Definition 2.2.2 The kernel function is non negative definite if
i,j=1

all n ∈ N, x1 , ...xn ∈ E, c1 , ...cn ∈ R. It’s positive definite if the relation is strictly greater than
zero.

The following result is the Mercers Theorem. Notice that the covariance function has this representation.
Result 2.2.4 Let K be a symmetric, non negative definite kernel function and K be its integral
operator. If (λi , ei ) are the eigenvalue and eigenfunction pairs of K, then K has the representation
∞

K(s, t) =

λi ei (s)ei (t).
i=1

The following result is the Karh¨unen Lo´eve expansion.
Result 2.2.5 Let X be a zero-mean, square-integrable random function defined over a probability
space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance
function K(s, t). Let (λi , ei ) be the eigenvalue and eigenfunction pairs of the integral operator of
∞

the covariance function. Then, X(t) admits the following decomposition: X(t) =

Zi ei (t),
i=1

where Zk =

X(t)ek (t)dt. Furthermore, the random variables Zk have zero mean, are uncorre-

lated and have variance λk .

13

Chapter 3
Multivariate Generalized Functional Linear
Models

3.1

Introduction

The focus of this chapter is to provide a large sample test for assessing if a functional covariate
has a regression effect on real valued responses, when there is some dependence among responses.
This kind of data typically arises in family based genetic studies when gene expression data or
even DNA sequencing data is collected. The aim of these studies is often to test for regression
relation between the gene region and the phenotype of interest. Sequencing data for a gene or a
gene region consists of observations on a large number of single nucleotide variants. In the light of
linkage disequilibrium (Laird and Lange, 2010b), we know that the variants that are closer to each
other may have greater association than those that are farther from each other. This motivates us to
treat the high dimensional sequencing data as a function of the single nucleotide variant positions
and necessitates the use of a functional data based inference method for correlated data.
There is abundant literature on functional linear models (Cardot et al., 1999, 2003; Cardot and
Sarda, 2005; Cardot and Johannes, 2010; Ramsay, 2006). Recent reviews of functional regression
can be found in Morris (2015) and Wang et al. (2016). These linear models assume that the response is a univariate continuous variable and the regressor variable is a function. However, in

14

genetic studies the phenotype or the response variable is often binary and few papers discuss generalized functional linear models (M¨uller and Stadtm¨uller, 2005; Li et al., 2010; Gertheiss et al.,
2013) as needed for handling binary response variables. Existing methodology cannot be directly
applied to genetic family data, where the response variable is a vector of dependent traits and the
regressor is a vector of functions. We address this shortcoming using generalized estimating equations. The estimators thus obtained are shown to be consistent and asymptotically normal, under
suitable conditions on the underlying entities. The latter result is used to propose an asymptotic
test for regression relation between the functional covariates and the responses.
The chapter also includes a finite sample simulation study. Through this study, we first demonstrate the importance of addressing correlation structures in the data and then compare the performance of our proposed method with another method for family studies proposed by Wang et al.
(2013). This method is based on a generalized estimating equations approach suitable for accommodating a large number of variables. We also present additional results demonstrating the effect
of sample size, the dimension of parameter to be estimated and family (cluster) size on the power
of the proposed test.
This chapter is organized as follows. Section 2 contains formal description of the original
infinite dimensional model and a working truncated model based on the strategy proposed by
M¨uller and Stadtm¨uller (2005). Section 3 contains the description of the estimators based on
generalized estimating equations while section 4 describes the asymptotic normality results for
these estimators. Section 5 describes findings of a simulation study while section 6 presents a real
data application.

15

3.2

Model

Let m, n be positive integers. We observe n clusters (Xi (t), t ∈ T = [0, 1], Yi ), i = 1, ..., n, each of
size m, where, for each i, Xi = (Xi1 , Xi2 , · · · , Xim )T is a an m-dimensional predicting process
and Yi = (Yi1 , Yi2 , · · · , Yim )T is a vector of m responses. We assume that these n clusters are
independent and identically distributed and that all clusters have the same correlation structure. For
jth subject in ith cluster the predicting process Xij (t), t ∈ T, is assumed to be a square integrable
random process on T . The corresponding response Yij can be continuous or discrete, real valued
variable which is related to Xij (t), t ∈ T via the following generalized regression model, where ω
˜
is a positive measure on T . For a constant β˜0 , and a real valued function β(t),
t ∈ T , let

ζ˜ij = β˜0 +

˜
β(t)X
ij (t)dω(t),

i = 1, ..., n,

j = 1, ..., m.

All integrals in this paper are taken over the interval T , unless specified otherwise. In the rest of
this paper, i = 1, ..., n and j = 1, ..., m. We model the regression of Yij on Xij as

Yij = g(ζ˜ij ) + eij ,

µij = E(Yij | Xij (t), t ∈ T ) = g(ζ˜ij ),

σ 2 (µij ) = var(Yij − µij | Xij (T ), t ∈ T ) = σ 2 (ζ˜ij ),

(3.2.1)

ei = (ei1 , ei2 , ..., eim )T ,

R0 = cor(ei ),

for a known real valued link function g and a positive function σ, where eij have zero mean. R0 is
m × m true correlation matrix.
We now give another representation of the model (3.2.1). Let ρj , j = 1, 2, ..., be an orthonormal
basis of the functional space L2 (T, ω) = L2 (ω). The predictor process and parameter function can

16

be written as
∞

Xij (t) =

∞

(k)

εij ρk (t),

˜ =
β(t)

k=1

β˜k ρk (t),

(3.2.2)

k=1

(k)
with random variables εij = Xij (t)ρk (t)dω(t), and coefficients β˜k =

˜
β(t)ρ
k (t)dω(t). By

(k)

assumption 3.4.1 below, we obtain E(εij ) = 0, for all i, j, and k. Note that the random variables
(k)

(l)

εij and εij are uncorrelated for k = l. By the orthonormality of the basis functions,
∞

(k)2
σij

∞

=

2 (t))dω(t)
E(Xij

˜
β(t)X
ij (t)dω(t) =

< ∞,

k=1

(k)
β˜k εij .

k=1

Using the above representation we now address the infinite dimensionality issue or equivalently
the issue of infinite number of predicting variables. Based on the truncation strategy proposed by
M¨uller and Stadtm¨uller (2005) we replace model (3.2.1) with the following approximate sequence
of finite dimensional models. Let p = pn be a sequence of positive integers tending to infinity and
define our new approximate model as
pn

Yij = g β˜0 +

(k)
β˜k εij + eij ,

p
η˜ijn

pn

= β˜0 +

(p )

Uij n =

(k)

εij ρk (t),

(3.2.3)

k=1

k=1
pn

(k)
β˜k εij ,

(p )

(p )

(p )

σ 2 (ηij n ) = σ 2 (µij n ) = var(eij |Uij n )

k=1

ei = (ei1 , ei2 , ..., eim )T ,

R0 = cor(ei ).

Let β˜ = (β˜0 , ..., β˜pn )T . The superscript pn indicates the number of parameters. We exhibit this
superscript and subscript when necessary. We assume that the standardized error eij σ
˜ (µij ) is
(k)

independent of εij for all i, j, k.
In the sequel, Nm (µ, Σ) stands for m-dimensional normal distribution with the mean vector µ
and covariance matrix Σ, m ≥ 1, and all limits are taken as n → ∞, unless specified otherwise.

17

For any vector or a finite dimensional matrix A, A denotes the Frobenius norm of A.

3.3

Estimation

˜ In most appliWe use the generalized estimating equation set-up for estimating the parameter β.
cations, we do not know the true correlation matrix R0 , so we use a working correlation matrix
R(γ) to specify the working covariance. The parameter γ can be estimated using the residuals. We
ˆ to denote an estimated working correlation matrix. The estimate R
ˆ that we use below was
use R
suggested by Balan et al. (2005). Denote the derivative of function g as g . The estimator denoted
by βˆ is the solution to the following equation.
n

n

FiT Gi (β)Vˆi (β)−1 ei (β) = 0,

ˆ −1 Ai (β)−1/2 ei (β) =
FiT Gi (β)Ai (β)−1/2 R

U ∗ (β) =

i=1

i=1

(3.3.1)
where,


gi1 (β) . . .


..
Gi (β) =  ...
.


0
...

(0)
εi1

(0)
εi2

...

0 
 (2) (2)


..  , F T = εi1 εi2 . . .

. 
 i
 ..
..
...
 .

.


gim (β)
(p)
(p)
εi1 εi2 . . .


2 (β), ..., σ 2 (β) ,
Ai (β) = diag σi1
im



(0)
εim 




 yi1 − gi1 (β) 
(2) 



εim 


..
,
e
(β)
=


,
.


..  i


. 


yim − gim (β)
(p)
εim

n
ˇ i − gi (β))(Y
ˇ
ˇ T
ˇ
ˆ= 1
R
Ai (β)(Y
i − gi (β)) Ai (β)
n i=1

ˆ i (β)−1/2 .
Vˆi (β) = Ai (β)−1/2 RA

(0)
(p )
Let εij = 1, εij = (εij , ..., εij n )T , gij (β) = g(εTij β), gi (β) = (gij (β), ..., gij (β)), σi1 (β) =

σ(εTij β) and βˇ be a preliminary

n/pn consistent estimate of β. It can be obtained using the
18

independence estimating equations. Further details of this estimator can be found in Wang et al.
(2011).

3.4
3.4.1

Asymptotic Theory
Existence and Consistency

We first state the assumptions needed for existence of the solution of (3.3.1) and its consistency.
Note that λmax (A), λmin (A) denote the maximum and minimum eigenvalues of a matrix A respectively. Also →D (→P ) denote convergence in distribution (probability).
Assumption 3.4.1 Assume that E{X(t)} = 0 for all t, E X 4 < ∞ and β˜ 2 < ∞.
Assumption 3.4.2 Function g is monotone, invertible, and has two continuous bounded derivatives. The function σ 2 has a continuous bounded derivative and is bounded from below by δ > 0.
Assumption 3.4.3 Assume that pn n−1/2 → 0.
Assumption 3.4.4 The true correlation matrix R0 has positive eigenvalues. The estimated workˆ satisfies R
ˆ −1 − R
¯ −1
ing correlation matrix R

¯ is a constant positive
= Op {(pn /n)1/2 }. R

definite matrix with positive eigenvalues.
Assumption 3.4.5 There exist two positive constants b1 , b2 such that b1 ≤ λmin {E(F1T F1 )} ≤
λmax {E(FiT Fi )} ≤ b2 .
We assume that the mean function of the regressor functions is zero to ease some of the calculations. Note that since we can obtain a consistent estimator of the mean function this assumptions

19

is easily satisfied. We assume that E X 4 < ∞ mainly to obtain consistent estimates of the eigenvalues (Horv´ath and Kokoszka, 2012). Assumption 4, 5 and 6 can also be found in Wang et al.
ˆ used in the score function satisfies the assumption 4.
(2011). We prove that R
Theorem 3.4.1 Under the assumptions 3.4.1–3.4.5, the solution βˆ to (3.3.1) exists and satisfies
the following
˜ = Op ( pn /n)
||βˆ − β||
The proof is along the same lines as Wang et al. (2011) and can be found in the appendix. We
approximate U ∗ (β) by U¯ (β) =

n
T
−1/2 R
¯ −1 Ai (β)−1/2 ei (β).
i=1 Fi Gi (β)Ai (β)

Thus, we can

consistently estimate the parameters even if the correlation structure is misspecified.

3.4.2

Asymptotic Normality

To show the asymptotic normality of the estimator we will approximate U ∗ (β) by
n

U (β) =
i=1

FiT Gi (β)Ai (β)−1/2 R0−1 Ai (β)−1/2 ei (β). We will now rewrite the U (β) as

U (β) = DT (β)V (β)−1/2 (Y − µ(β)),

where,

Vi (β) = Ai (β)R0−1 Ai (β),


0
V1 (β)


 0
V2 (β)

V (β) = 
 ..
..
 .
.


0
0

i = 1, ..., n,


...

0 


...
0 

,

.
..
.
.
. 


. . . Vn (β)
(3.4.1)

20


−1/2
(β)
0
V1


−1/2

(β)
0
V2

D(β) = 

..
..

.
.


0
0

...
...
..

.

...


0
0
 G1 (β)


 0
0
G2 (β)


  ..
..
..
 .
.
.


−1/2
Vn
(β)
0
0

Γ(β) =

1
E(D(β)T D(β)) = (γkl (β))0≤k,l≤p ,
n

J(β) =

dU (β)
,
dβ


...

0

...

0

..

..
.

.

. . . Gn (β)

 

 F1
 
 
  .. 
 . ,
 
 

 Fn

Ξ(β) = Γ(β)−1 = (ξkl (β))0≤k,l≤p ,


 l11 . . . l1m 


 ..

.
−1
T
.
.
.
Y = (Y11 , Y12 , ..., Y1m , ..., Ynm ) , R0 =  .
. .
.




lm1 . . . lmm

ˆ
Next, we shall state the needed assumptions for establishing the asymptotic normality of β.
Assumption 3.4.6 Assume that pn n−1/8 → 0.
˜ = n−1 E{D(β)
˜ T D(β)}
˜ and n−1 D(β)
˜ T D(β)
˜ are
Assumption 3.4.7 The matrices Γ = Γ(β)
non-singular for all n.
Assumption 3.4.8 The eigenvalues of Γ are bounded and

˜ T D(β)
˜
D(β)
n

−1
1/2

= Op (pn ).

˜ =
We are now ready to state the theorem regarding the normality of the estimate. Let Γ
˜ T D(β)
˜ and Γ
ˆ T D(β).
ˆ Note that Γ
ˆ = n−1 D(β)
˜ depends on the unknown parameter
n−1 D(β)
ˆ is a plug in estimator of Γ.
˜
where as Γ

21

Theorem 3.4.2 Under the assumptions 1–9, the following statements hold.
˜ T Γ(βˆ − β)
˜ − (pn + 1)
n(βˆ − β)
2(pn + 1)
˜ T Γ(
˜ − (pn + 1)
˜ βˆ − β)
n(βˆ − β)
2(pn + 1)
T
˜ Γ(
˜ − (pn + 1)
ˆ βˆ − β)
n(βˆ − β)
2(pn + 1)

→D N (0, 1).

(3.4.2)

→D N (0, 1).

(3.4.3)

→D N (0, 1).

(3.4.4)

Note that this theorem can be used to construct 1 − α confidence bands. Refer to corollary
4.3 in M¨uller and Stadtm¨uller (2005). The proofs of both the theorems are given at the nd of this
chapter.
We are now ready to describe a test for the problem of testing for no regression relation between
a real valued response and a functional predicting variable. Referring to the model (3.2.1), testing
for no association between the response and the predicting process is equivalent to testing for
H0 : β˜ = 0. Since we use the sequence of approximate models proposed in (3.2.3), we test the
following hypothesis instead: H0 : (β˜1 , ..., β˜pn ) = (0, ..., 0), versus the alternative that H0 is not
true, for a given appropriate value of pn . The proposed test rejects H0 for the large absolute values
of the statistic
Dn =

ˆ βˆ − (pn + 1)
nβˆT Γ
2(pn + 1)

.

From Theorem 2 it follows that the test that rejects H0 whenever |Dn | > zα/2 is of the asymptotic
size α, where zα is the 100(1 − α)th percentile of standard normal distribution.

22

3.5

Simulation

In this section we report the finding of several simulation studies, which investigate different aspects of the proposed testing procedure. In the first study we demonstrate the importance of considering the correlation structure in the data. We then explore the applicability of our method to
sequencing data. In this study, we use sequencing data in the simulation to demonstrate the applicability of our method to genetic studies. We also compare the empirical power of our test to that
of gSKAT proposed by Wang et al. (2013). We then study several other aspects of the proposed
test by varying the sample and cluster sizes and the dimensions pn .
For the first simulation we generated pseudo-random regression functions using Fourier basis
functions ρj , j ≥ 1, and the following model: X(t) =

pn
k k
j
k=1 ε ρ (t), ε

are independent and

identically distributed with N (0, 1) distribution. We then evaluated each function on a grid of
finite points to obtain functional data that are realizations of an underlying function. We take
this approach as in applications only finite discretization of functional data is available. We then
applied a smoothing procedure to this discrete data to obtain smooth functions that are used in our
proposed test. This smoothing procedure is described later in the section. We simulated the effect
function β(t) in the following way: β(t) =

pn ˜ k
˜
k=1 βk ρ (t), βk

= k −1 δ, k ≥ 1. Here, δ determines

the magnitude of the effect of the regression function on the response. The choice of the number
of basis functions pn selected to generate each of these functions is set to 10n1/7 .
We studied the performance of our method when the response is continuous and binary. For
cluster size m = 3 and the case of continuous response we used the following model to generate
the response variables: Yi = 01 Xi (t)β(t)dt + ei , ei ∼ N3 (¯0, R) where, ¯0 is a vector of zeroes of
length 3 and R is a 3 × 3 correlation matrix with all off diagonal elements equal to γ, |γ| < 1.
For binary response, we generated the correlated responses using the function from the bindata

23

package in R software with marginal probabilities given by g

1
0 X(t)β(t)dt

, where g is the

logistic link function. The correlation matrix is the same as in the previous case.
We then studied the empirical power of the proposed test of no regression relation by considering the underlying correlation structure and treating the observations as being independent. We
label these two approaches as F test and Ind test, respectively. In these power studies, we chose
n = 500, replicated 1000 times and the significance level α = 0.05. The value of δ is fixed at
0.08 for continuous traits and at 0.1 for binary ones. The number of parameters pn was determined
using five fold cross validation procedure (see the end of this section about how we chose the value
of pn ). We can see from Table 3.1 that as the correlation increases between the individuals the
empirical power of the F test increases whereas that of the Ind test remains more or less the same
as expected. This demonstrates the need to factor in the correlation structure. The empirical level
for this study can be found in Table 3.2. For the second study, we compared the performance in
Table 3.1: Empirical power as a function of γ
Corr(γ) Continuous Trait
0
0.05
0.30
0.50
0.80

F Test
0.421
0.456
0.567
.692
0.993

Ind Test
0.418
0.45
0.457
0.443
0.455

Binary Trait
F Test
0.137
0.130
0.165
0.195
0.308

Ind Test
0.127
0.124
0.124
0.129
0.134

Table 3.2: Empirical level as a function of γ
Corr(γ) Continuous Trait
0
0.05
0.30
0.50
0.80

F Test
0.03
0.048
0.045
0.059
0.068

Ind Test
0.051
0.049
0.051
0.043
0.047

Binary Trait
F Test
0.039
0.036
0.032
0.043
0.041

Ind Test
0.032
0.035
0.029
0.037
0.037

terms of power of our test and gSKAT for testing of no regression relation. Note that gSKAT is a
24

sequence kernel association tests that is based on generalized estimating equations. It is not meant
for functional variables though it can handle cases where the number of variables is larger than the
sample size. To make the comparison fair and to investigate the treatment of sequencing data as a
function we used real genetic data from the 1000 Genome Project (?) and then simulated response
values using this data. In particular, we used a region of the genome (Chromosome 17) from this
dataset for following simulation. Note that this a not a family data set and thus, all the individuals
are assumed to be independent. The primary focus of this simulation is to investigate the treatment
of sequencing data as a function.
In real applications, we do not observe the regressor function, but assume that the densely
observed data is a realization of a function. So the first step in any functional data study is a
smoothing step which involves constructing regressor functions from observed data.
Smoothing methods given in Ramsay (2006) typically use the following model to fit a single
curve: The given curve Xij is observed at l discrete points (t1 , ..., tl ). Let xijk , k = 1, ..., l, denote
these observed values. We then recover the function Xij from these observed values by fitting the
linear model xijk =

r
q=1 cq φq (tk ),

k = 1, ..., l, where φq ’s are basis functions. We choose a

large value for r and use penalization techniques to ensure that the fitted function is not very rough.
We penalize the integral of the squared second derivative, i.e., we choose cq , 1 ≤ q ≤ r to minimize
l
k=1 (xijk

r
2
q=1 cq φq (tk ))

−

− λP EN2 (Xij ). We take P EN2 (Xij ) =

D2 Xij (s)ds, where

D2 denotes the second derivative. The underlying function Xij is approximated by Xij (t) =
r
q=1 cˆq φq (t).

We used cubic B-spline basis functions and the smooth.spline function available in

R software to carry out the smoothing procedure. We specified the knots to be at (t1 , ..., tl ) leading
to r = l + 4 − 2.
After obtaining the underlying function we simulated a discrete and continuous response using
the sequencing data. Let X denote the sequencing data matrix of dimension
25

n
i=1 mi

× l, where

n, mi , and l represent the number of clusters, cluster size, and the number of variants, respectively.
For the selected region from the 1000 Genome Project, l = 800, n = 1092 while the family
size mi is one for all the families. Then the responses are generated using the relations Yi =
l
k=1 βk Xik

+ ei , ei ∼ Nm (¯0, Ri ), βk = k −1 δ, k = 1, · · · , l, where now Ri is the mi × mi

correlation matrix with all off diagonal terms equal to γ = 0. For the binary case we proceeded as
before using the rmvbin function in R. The following empirical powers are based on 1000 replicates
and level of significance of 0.05. We chose pn by using the five fold cross validation procedure
(see the end of this section). We can see in Table 3.3 shows that both the tests have comparable
power but Type-I error for gSKAT for the binary case is inflated.
Table 3.3: Empirical power comparison with the gSKAT
Effect(δ) Continuous Trait
0.00
0.50
1.00
3.00
5.00

F Test
0.041
0.102
0.314
0.996
1.00

gSKAT
0.048
0.108
0.374
0.994
1.00

Binary Trait
F Test
0.047
0.072
0.114
0.555
0.951

gSKAT
0.061
0.063
0.133
0.608
0.962

In the simulation study to investigate the effect of sample size and cluster size on the power of
our test, the regression function was generated using the Fourier basis functions and the response
variable was generated using the linear and logistic model as in the first simulation study. The
choice of the number of basis functions pn selected to generate the regression functions is again
set to 10n1/7 . Effect size δ used in this study is set equal to 0.1 and 0.05 for binary response and
continuous response, respectively. The correlation for the sample size and the cluster study are set
to 0.8. From Table 3.4 we can see that the power increases with the sample size for both types of
the response variables.

26

Table 3.4: Empirical power as a function of sample size
Sample Size
500
1000
2000

Continuous Trait
0.718
0.970
1

Binary Trait
0.295
0.65
0.946

To demonstrate the effect of using large cluster size m, we used a sample size of 100 in the
study. Results of this simulation can be seen in Table 3.5. We observe that the Type I error is
inflated with the increase in the cluster size. For a cluster of size m the number of parameters in
the correlation matrix is of order m2 . Large m will affect the consistency of correlation estimate
ˆ used in the score equation (3.3.1) rendering our asymptotic results invalid .
(R)
Table 3.5: Empirical level as a function of cluster size
Cluster Size
3
10
20

Continuous Trait
0.203
0.659
0.867

Binary Trait
0.05
0.14
0.4

Table 3.6 below demonstrates the effect of increasing the number of parameters pn in the
model. The regressor function and the continuous response variable were simulated as before. The
number of basis functions used to generate the regressor was set to pn = 30, 50, 90. Sample size
was 500, the effect size δ was 0.05, correlation parameter γ was taken to be 0.8. The number of
parameters in the model pn was chosen to be 30, 50, 90 instead of using cross validation. We can
see that as the number of parameters increases the Type-I error also increases.
Table 3.6: Effect of increasing pn
pn
30
50
90

Empirical level
0.068
0.065
0.107

The following figure displays Q-Q plots to check the normality of the test statistic. For small
27

dimensions the test statistic follows a chi-square distribution with p degrees of freedom. As we
increase p the normal approximation becomes more appropriate.
Figure 3.1: Normal Q-Q plots
p=10

20
15
0

0

5

10

Sample Quantiles

10
5

Sample Quantiles

15

25

p=2

−2

−1

0

1

2

3

−3

−2

−1

0

1

2

Theoretical Quantiles

Theoretical Quantiles

p=30

p=80

3

100
60

80

Sample Quantiles

50
40
30
20

Sample Quantiles

60

120

70

−3

−3

−2

−1

0

1

2

3

−3

Theoretical Quantiles

−2

−1

0

1

2

3

Theoretical Quantiles

An important problem that we need to address while applying our method is the determination
of the number of parameters pn . In all of the simulations reported here, we used the functional principal component analysis method for dimension reduction. We projected the infinite dimensional
function on to a finite pn dimensional subspace spanned by the first pn eigenfunctions of the covariance operator of the regressor functions. This covariance operator is estimated from the regressor
functions obtained by smoothing of the observed data. We first selected those values of p˜ for which
28

the proportion of variance explained by the first p˜ principal components

p˜
k=1 λk

∞
k=1 λk

var-

ied from 80% to 99% as the potential values for the number of parameters pn . We then used 5 fold
cross validation to select a final value for pn .

3.6

Application to the sequencing data from the Minnesota Twin
Study

Substance use disorders and related health conditions pose a remarkable burden on global health.
Twin and family-based studies have suggested a substantial genetic contribution to substance dependence (e.g., nicotine and alcohol dependence). Studies have been successful in identifying
genetic variants contributing to nicotine dependence. In this application, we applied the proposed method to assess the dependence between 15 neuronal nicotinic acetylcholine receptors
(nAChRs) subunit genes and nicotine dependence using the sequencing data from Minnesota Twin
Study (Vrieze et al., 2014). Genetic associations between nAChRs subunit genes and nicotine
dependence has already been established. A comprehensive study on the these genes (Saccone
et al., 2009) found associations for loci in the CHRNA5, CHRNA3,CHRNA4, CHRNB4,CHRNB3,
CHRNB1, CHRNA6, CHRND, CHRNG, CHRNB4 with nicotine dependence. Our aim is to confirm whether our test can replicate these associations. The sequencing data set we used for this
analysis has 662 families and 1445 individuals.
Nicotine dependence is a continuous variable measured based on the protocols of the Substance
Abuse Module of Composite International Diagnostic Interview (Hicks et al., 2011). It considers
the frequency and quantity of nicotine use including cigarettes, cigars, pipes and chewing tobacco.
Covariates of age and sex were also considered. We first fit a regression model with nicotine
dependence as response and age and sex as predictors, and then used the residuals in the analysis.
29

We apply our proposed test to each of the 15 genes individually. As the response and regressors are
centered, we omit the intercept from this analysis. The smooth functions obtained by the smoothing
method discussed earlier in the simulation section serve as regressors in the test. Table 3 report
the number of parameters i.e., pn used for each gene. Note that as these values are small, normal
approximation does not work as can be seem from Figure 3.1 . We instead use Dn∗ as the statistic
that follows χpn and use this to report the un-adjusted p-values for the F-test. From this table, we
find that the p-values for F-test are smaller than those of the gSKAT for many genes. Also, our
method is able to detect several associations that are undetected by gSKAT, which might suggest
that our method has better performance. Note that even after adjustment for multiple testing our
method is able to find associations for genes, such as CHRNB2, CHRNA6 and CHRND.
Table 3.7: p-values for Minnesota Twin Study
Gene
CHRNA1
CHRNA2
CHRNA3
CHRNA4
CHRNA5
CHRNA6
CHRNA7
CHRNA9
CHRNB1
CHRNB2
CHRNB3
CHRNB4
CHRND
CHRNE
CHRNG

F Test
0.929
0.740
0.057
0.011
0.148
4E-05
0.625
0.400
0.017
2E-09
0.003
0.382
8E-04
0.207
0.044

gSKAT pn
0.621
2
0.595
2
0.611
2
0.125
2
0.415
2
0.216
6
0.736
2
0.524
2
0.870
2
0.1351 7
0.429
5
0.561
2
0.175
2
0.675
2
0.270
2

30

Dn∗
0.147
0.600
5.713
8.900
3.810
29.765
0.937
1.831
8.047
53.838
17.524
1.922
14.230
3.142
6.219

3.7

Discussion

The proposed method can be easily extended to include additional finite dimensional covariates
Z = (Z1 , ..., Zk ) as well as finitely many functional regressors X1 , ..., Xl . The main question
is to choose the dimension p1n , ..., pln for each of the functional regresors. We can use cross
validation techniques to determine p1n , ..., pln . Note that this is more computationally challenging
than the case of a single functional regressor. The dimension of the truncated model (3.2.3) is
l

pn =

pjn + k + 1.
j=1

3.8

Results

This section contains the consistency and existence results. Note that C denotes a constant in the
details below.
ˆ −1 converges to the inverse of the
Proposition 3.8.1 The inverse of estimated correlation matrix R
ˆ −1 − R−1 = Op {(pn /n)1/2 }.
true correlation in probability i.e. R
0
Proof Let R∗ = n−1

n
i=1

gives us

R∗

−1/2

Ai

˜ i −gi (β)}{Y
˜
˜ T −1/2 (β).
˜ Central limit theorem
(β){Y
i −gi (β)} Ai

ˆ − R∗ = Op {(pn /n)1/2 }. Combining
− R0 = Op (n−1/2 ). We will show that R

these two results will yield the proposition. Recall that eij (β) = yij − g(εTij β), σij (β) = σ(εTij β).
We have
m

ˆ − R∗ 2
R

=
ji ,j2 =1
m

≤2
ji ,j2 =1

1
n

ˇ ij (β)
ˇ
˜ ij (β)
˜ 2
eij1 (β)e
eij1 (β)e
2
2
−
ˇ ij (β)
ˇ
˜ ij (β)
˜
σ
(
β)σ
σij1 (β)σ
ij
1
2
i=1
2
n

m

Ij2 j ,1 + 2
1 2

ji ,j2 =1

31

Ij2 j ,2 ,
1 2

where

1
Ij1 j2 ,1 =
n
Ij1 j2 ,2 =

1
n

n
i=1
n
i=1

m

≤C

1
n

2

1
1
−
˜ ij (β)
˜
˜ ij (β)
˜
σij1 (β)σ
σij1 (β)σ
2
2

.

Ij2 j ,1 = Op (pn /n). Using Assumption 3.4.2
1 2

ji ,j2 =1

1
≤C
n

1

˜ ij (β)
˜
eij1 (β)e
2

We first show that

Ij2 j ,1
1 2

ˇ ij (β)
ˇ − eij (β)e
˜ ij (β)
˜
eij1 (β)e
2
1
2
,
ˇ ij (β)
ˇ
σij (β)σ

2

n

˜
ˇ − eij (β)}
ˇ ij (β)
eij1 (β){e
2
2
i=1
n

1
n

+C

2

n

˜
ˇ − eij (β)}
˜ ij (β)
eij2 (β){e
1
1
i=1

2

˜ ij (β)
˜ − gij (β)}
ˇ
eij1 (β){g
2
2
i=1

1
+C
n
1
+C
n

2

n

ˇ
˜
ˇ
˜ − gij (β)}{g
{gij1 (β)
ij2 (β) − gij2 (β)}
1
i=1
n

2

˜ ij (β)
˜ − gij (β)}
ˇ
eij2 (β){g
1
1

.

i=1

Using Cauchy Schwarz inequality,
m
ji ,j2 =1

m

Ij2 j ,1
1 2

≤C
ji ,j2 =1
m

+C
ji ,j2 =1
m

+C
ji ,j2 =1

1
n

n

n

˜
e2ij (β)
1

i=1
n

1
n
1
n

i=1

1
˜ − gij (β)}
ˇ 2
{g (β)
2
n ij2
n

˜ − gij (β)}
ˇ 2
{gij1 (β)
1
i=1
n

i=1
n

˜ − gij (β)}
ˇ 2
{gij1 (β)
1
i=1

i=1

= Ij1 j2 ,11 + Ij1 j2 ,12 + Ij1 j2 ,13 .

32

1
˜ − gij (β)}
ˆ 2
{g (β)
2
n ij2
1 2 ˜
e (β)
n ij2

Now,
m

1
n

Ij1 j2 ,11 = C

=C

ji ,j2 =1

 m



ji =1

n

n

i=1

˜
e2ij (β)
1

i=1


n
 m
1
2
˜
eij (β) 
1

n
i=1

1
˜ − gij (β)}
ˇ 2
{g (β)
2
n ij2
n

j2 =1 i=1


1
˜ − gij (β)}
ˇ 2
{g (β)
2
n ij2

ˇ − gij (β)}
˜ 2 ≤ C(βˇ − β)
˜ T εij εT (βˇ − β),
˜ yielding
Taylors expansion gives us {gij (β)
ij
m

n

j2 =1 i=1

1
ˇ 2 ≤C1
˜ − gij (β)}
{gij2 (β)
2
n
n

n

m

˜
˜ T εij εT (βˇ − β)
(βˇ − β)
2 ij
2

i=1 j2 =1

˜T1
≤ C(βˇ − β)
n
˜T
≤ C(βˇ − β)

1
n

n

˜
FiT Fi (βˇ − β)
i=1
n

˜
FiT Fi − E(FiT Fi ) + E(FiT Fi ) (βˇ − β)
i=1

˜ T E(F T Fi )(βˇ − β)
˜
≤ C(βˇ − β)
i
≤ βˇ − β˜ 2 λmax E(FiT Fi ).
= Op (pn /n)

We get the above using Assumption 3.4.4, 3.4.5. We have n−1

n
i=1

˜ = Op (1). Using similar
e2ij (β)
1

m

techniques for the remaining terms we can show Ij1 j2 ,11 , Ij1 j2 ,12 , Ij1 j2 ,13 and thereby

ji ,j2 =1

m

are all of order Op (pn /n). The same holds for

Ij2 j ,2 to complete the proof.
ji ,j2 =1 1 2

Define,
n

−1/2

FiT Gi (β)Ai

U¯ (β) =

−1/2

¯ −1 A
(β)R
i

(β)ei (β),

i=1

J(β) =

∂U (β)
,
∂β β

J ∗ (β) =

∂U ∗ (β)
,
∂β
β
33

∂ U¯ (β)
¯
J(β)
=
.
∂β β

Ij2 j ,1
1 2

Lemma 3.8.1 The jacobian J(β) can be decomposed as

J(β) = −H(β) + R1 (β) − R2 (β) − R3 (β),

where
n

−1/2

FiT Gi (β)Ai

H(β) = −

−1/2

(β)R0−1 Ai

(β)Gi (β)Fi ,

i=1
n

−1/2

FiT G¨i (β)Ai

R1 (β) =
i=1
n

−1/2

FiT Gi (β)Ai

R2 (β) =
i=1
n

R3 (β) =

(β)F0 (β)Fi ,
(β)R0−1 A˙ i

1/2

(β)A−1
i (β)diag{ei (β)}Fi ,

˙ 1/2 (β)F0 (β)Fi .
FiT Gi (β)A−1
i (β)Ai

i=1
−1/2
G¨i (β) = diag{gi1 (β), ..., gim (β)}, A˙ i = diag{σi1 (β), ..., σim (β)}, F0 (β) = diag(R0−1 Ai
ei ).

¯
Similar result holds for J(β)
and J ∗ (β) i.e.

¯
¯
¯ 1 (β) − R
¯ 2 (β) − R
¯ 3 (β),
J(β)
= −H(β)
+R
J ∗ (β) = −H ∗ (β) + R1∗ (β) − R2∗ (β) − R3∗ (β),
where
n

¯
H(β)
=−

i=1
n

−1/2

FiT Gi (β)Ai

H ∗ (β) = −

i=1

−1/2

FiT Gi (β)Ai

−1/2

¯ −1 A
(β)R
i

(β)Gi (β)Fi and

−1/2

ˆ −1 A
(β)R
i

¯ i (β), R∗ (β), (i =
(β)Gi (β)Fi . Remaining terms R
i

¯ and R∗ respectively.
1, 2, 3) are defined in a similar fashion by using R
Proof As the proof is simple but tedious we omit the details.

34

Lemma 3.8.2 For all ∆ > 0, bn ∈ Rpn ,

1/2 }
¯
sup |bTn (J ∗ (β) − J(β))b
n | = Op {(npn )

sup

β−β˜ ≤∆(pn /n)1/2 bn =1

Proof Due to Lemma 3.8.1 it is sufficient to prove the following results:

sup

1/2 }
¯
sup |bTn (H ∗ (β) − H(β))b
n | = Op {(npn )

(3.8.1)

β−β˜ ≤∆(pn /n)1/2 bn =1

sup

¯ i (β))bn | = Op {(npn )1/2 },
sup |bTn (Ri∗ (β) − R

(i = 1, 2, 3).

(3.8.2)

β−β˜ ≤∆(pn /n)1/2 bn =1

We have
n

¯
|bTn {H ∗ (β) − H(β)}b
n|

−1/2

|bTn FiT Gi (β)Ai

≤

ˆ −1 − R
¯ −1 )A−1/2 (β)Gi (β)Fi bn |
(β)(R
i

i=1
n

ˆ −1 − R
¯ −1
≤C R

bTn FiT 2
i=1
n

ˆ −1 − R
¯ −1
=C R

bTn FiT Fi bn
i=1
n

≤ COp {(pn /n)1/2 }n
i=1

bTn FiT Fi bn
− bTn E(F1T F1 )bn
n

+ COp {(pn /n)1/2 }nbTn E(F1T F1 )bn
≤ COp {(pn /n)1/2 }nλmax {E(F1T F1 )}
≤ COp {(pn /n)1/2 }.

Similarly we can prove (3.8.2).

35

Lemma 3.8.3 We have

1/2 p ).
¯
¯
+ H(β)}b
sup |bTn {J(β)
n | = Op (n
n

sup

β−β˜ ≤∆(pn /n)1/2 bn =1

Proof From Lemma 3.8.1 it is sufficient to show that

sup

sup |bTn R¯i (β)bn | = Op (n1/2 pn ),

(i = 1, 2, 3).

β−β˜ ≤∆(pn /n)1/2 bn =1

We first consider
n

R¯2 (β) =

−1/2
˜
¯ −1 A˙ i 1/2 (β)A−1 (β)diag{ei (β)}Fi
FiT {Gi (β) − Gi (β)}A
(β)R
i
i

i=1
n

−1/2

˜
FiT Gi (β){A
i

+
i=1
n

+
i=1
n

+
i=1
n

+
i=1
n

+

−1/2

(β) − Ai

˜ R
¯ −1 A˙ i
(β)}

1/2

(β)A−1
i (β)diag{ei (β)}Fi

−1 (β)diag{e (β)}F
˜ −1/2 (β)
˜R
˜
¯ −1 {A˙ i 1/2 (β) − A˙ i 1/2 (β)}A
FiT Gi (β)A
i
i
i
i
−1 (β) − A−1 (β)}diag{e
˜ −1/2 (β)
˜R
˜
˜
¯ −1 A˙ i 1/2 (β){A
FiT Gi (β)A
i (β)}Fi
i
i
i

˜ −1/2 (β)
˜R
˜ −1 (β)[diag{e
˜
˜
¯ −1 A˙ i 1/2 (β)A
FiT Gi (β)A
i (β)} − diag{ei (β)}]Fi
i
i
˜ −1/2 (β)
˜R
˜ −1 (β)diag{e
˜
˜
¯ −1 A˙ i 1/2 (β)A
FiT Gi (β)A
i (β)}Fi
i
i

i=1
6

¯ .
R
2,i

=
i =1

We investigate the first and last terms of the above equation and leave the rest to the reader. For
the first term using Assumption 3.4.5, 3.4.2 we get

36

n

sup

sup

β−β˜ ≤∆(pn /n)1/2 bn =1 i=1

¯ 2,1 bn |
|bTn R
n

≤

sup

sup

β−β˜ ≤∆(pn /n)1/2 bn =1 i=1

−1/2
˜
¯ −1 A˙ i 1/2 (β)A−1 (β)
|bTn FiT {Gi (β) − Gi (β)}A
(β)R
i
i

× diag{ei (β)}Fi bn |
n

≤

bTn Fi 2

ˆ
sup C sup{gij (β) − gij (β)}

sup

β−β˜ ≤∆(pn /n)1/2 bn =1

i,j

i=1
n

≤

bTn FiT Fi bn

sup C β − βˆ

sup

β−β˜ ≤∆(pn /n)1/2 bn =1

i=1

≤ Op ((npn )1/2 ).

In a similar fashion we can show that
n

sup

sup

β−β˜ ≤∆(pn /n)1/2 bn =1 i=1

¯ bn | = Op ((npn )1/2 ),
|bTn R
2,i

(i = 2, ..., 5).

¯ i,6 . Using Assumption 3.4.2 we have
We now investigate the last term R
pn

n

m

¯ 2,6 2 ≤ C
R
k,l=0 i ,i =1 j ,j ,j j =1
1 1
1 2 1 2

(k) (l) (k) (l)
˜
˜
εi j εi j ε
ε
e
(β)e
(β).
1 2 1 1 i j i j i 1 j1
i1 j1
1 2 1 1

Taking expectation and using the independence of errors between clusters we get

¯ 2,6 2 = Op (np2n ).
R

37

(3.8.3)

Thus we get,
sup |bTn (R¯2 (β))bn | = Op (n1/2 pn ).

sup

β−β˜ ≤∆(pn /n)1/2 bn =1

¯ 1 and R
¯ 3 in a similar manner to complete the proof.
We can obtain the result for R
Lemma 3.8.4 We have

˜ n = Op (n1/2 pn ).
¯
¯ β)|b
sup |bTn (H(β)
− H(

sup

β−β˜ ≤∆(pn /n)1/2 bn =1

The proof of this lemma is similar to the proofs of Lemma 3.8.2, 3.8.3 so we omit the details.
Lemma 3.8.5 We have

˜ T H(
˜
˜ ≤ −Cn β − β˜ 2 .
¯ β)(β
(β − β)
− β)

Proof Using Assumptions 3.4.2, 3.4.4, 3.4.5 ,
n

˜ T H(
˜
˜ =−
¯ β)(β
(β − β)
− β)

˜
˜ T F T Gi A−1/2 R
¯ −1 A−1/2 Gi Fi (β − β)
(β − β)
i
i
i

i=1
n

¯ min λmin (A−1 )λmin (G2 )
≤ −λmin (R)
i
i
i

FiT Fi β − β˜ 2
i=1

≤ −Cn β − β˜ 2 .

Lemma 3.8.6 Let U¯ (β) =

n
i=1

−1/2 ¯ −1 −1/2
(β)−1 ei (β).
R Ai

FiT Gi (β)Ai

˜ = Op {(npn )1/2 }.
U¯ (β)

38

Then we can show that

¯ −1 = (¯lij )1≤i,j≤m . Using Assumption 3.4.2,
Proof Let R
pn

˜ 2=
U¯ (β)



n

m

˜ ¯lj j ε(k) eij (β)
˜
gij (β)
2 1 ij
1

2

˜ ij (β)
˜
σij1 (β)σ
2



2


k=0

i=1 j1 ,j2 =1

2

˜ 2 ) ≤ CO(npn ).
E( U¯ (β)

Thus, the result is proved.
˜ − U¯ (β)
˜ = Op (pn ).
Lemma 3.8.7 We have U ∗ (β)
ˆ −1 − R
¯ −1 . We have,
Proof Let Q = {qj1 j2 }1≤j1 ,j2 ,m = R
n

m

˜ − U (β)
˜ =
U ∗ (β)

qj 1 j 2
i=1

j1 j2 =1

ηij2 )
g (˜
ηij2 )e(˜
ε .
σ(˜
ηij2 )σ(˜
ηij1 ) ij1

Note that,
n

E
i=1

This implies,

n
i=1

ηij2 )
g (˜
ηij2 )e(˜
2
εij1 = O(npn ).
σ(˜
ηij2 )σ(˜
ηij1 )

g (˜
ηij2 )e(˜
ηij2 )
εij1 = Op {(npn )1/2 }, (1 ≤ j1 , j2 ≤ m). Assumption 3.4.4
ηij1 )
σ(˜
ηij2 )σ(˜

yields the result.
We are now ready to prove consistency result.
Proof of Theorem 1
According to Theorem 6.3.4 from Ortega and Rheinboldt (2000), to prove the existence and conˆ it is enough to verify the following condition: for all > 0, there exists
sistency of the estimator β,
a constant ∆ > 0 such that, for all n sufficiently large,

P


sup

˜ T U ∗ (β) < 0 ≥ 1 − .
(β − β)

β−β˜ =∆(pn /n)1/2

39

This same technique has been used previously by Wang et al. (2013). Using Taylors expansion we
get,

˜ T U ∗ (β) = (β − β)
˜ T U ∗ (β)
˜ − (β − β)
˜ T {−J ∗ (β ∗ )}(β − β)
˜
(β − β)

(3.8.4)

= A1 + A2 ,

where, β ∗ lies between β and β˜ i.e. max{ β ∗ − β , β ∗ − β˜ } ≤ β − β˜ . Also, β − β˜ =
∆(pn /n)1/2 . By Lemma 3.8.6, 3.8.7,

˜ + (β − β)
˜ T (U ∗ (β)
˜ − U¯ (β))
˜
˜ T U¯ (β)
A1 = (β − β)
|A1 | ≤ ∆(pn /n)1/2 (npn )1/2 + ∆(pn /n)1/2 pn
≤ ∆Op (pn ) + ∆op (pn ).

Next,

˜ T J ∗ (β ∗ )(β − β)
˜
A2 = −(β − β)
˜ T J(β
˜ + (β − β)
˜ T {−J ∗ (β ∗ ) + J(β
˜
¯ ∗ )(β − β)
¯ ∗ )}(β − β)
= −(β − β)
= A21 + A22 .

40

(3.8.5)

From Lemma 3.8.2 we obtain A22 = ∆2 op (pn ). Also

˜ T {J(β
˜
¯ ∗ ) + H(β
¯ ∗ )}(β − β)
A21 = −(β − β)
˜ T {H(β
˜
˜
¯ ∗ ) − H(
¯ β)}(β
(β − β)
− β)
˜ T H(
˜
˜
¯ β)(β
(β − β)
− β)
≤ −C∆2 pn + op (pn ) + op (pn ).

(3.8.6)

˜ T U ∗ (β) = ∆Op (pn ) − C∆2 pn . Thus for
From (3.8.4), (3.8.5), (3.8.6) we can see that (β − β)
ˆ
a suitable ∆ this term is negative in probability and we get the existence and consistency of β.

We now turn to prove the asymptotic normality result.
Note that,
n

−1/2

D(β)T D(β) = −H(β) =

FiT Gi (β)Ai

−1/2

(β)R0−1 Ai

(β)Gi (β)Fi .

i=1

Similarly define,
n

D∗ (β)T D∗ (β) = −H ∗ (β) =

−1/2

FiT Gi (β)Ai

−1/2

ˆ −1 A
(β)R
i

i=1

˜
Proposition 3.8.2 Let β ∗ be a fixed value of β between βˆ and β,

1.5 ) = o (n).
J ∗ (β) − J(β) = Op (n1/2 pn
p

41

(β)Gi (β)Fi .

Proof Using Lemma 3.8.1 we obtain,
3

J ∗ (β) − J(β)

≤

D∗ (β)T D∗ (β) − D(β)T D(β)

Ri∗ (β) − Ri (β) ,

+
i=1

= I1 + I2 .

(3.8.7)

n

m

−

We now look at each of these terms, I1 =
i=1 j1 j2 =1

qj1 j2 gij (β)gij (β)
1

2

σij1 (β)σij2 (β)

εij1 εTij

2

, where

ˆ −1 . We use Lemma 3.8.1, Assumptions 3.4.2 and 3.4.6 to get
qj1 j2 are the terms from R0−1 − R
(I1 /n)2 = Op (p3n /n) = op (1). Similarly we can prove that I2 = Op (n1/2 p1.5
n ) = op (n).
˜
Proposition 3.8.3 Let β ∗ be a fixed value of β between βˆ and β,

˜ = Op (np1.5 /n1/4 ) = op (n).
J(β ∗ ) − J(β)
n

Proof
n

˜
J(β ∗ ) − J(β)

≤

˜
D(β ∗ )T D(β ∗ ) − D(β)T D(β)

˜
R∗ (β ∗ ) − R(β)

+
i=1

= I1 + I2

∗ )g (β ∗ )
˜
˜ 
(β
g
(
β)g
(
β)
ij1
ij2
ij1
ij2
−
l
ε εT
∗
∗
˜ ij (β)
˜  j1 j2 ij1 ij2
 σij1 (β )σij2 (β ) σij (β)σ
i=1 j1 j2 =1
1
2
n

I1 =

m


g

∗ = εT β ∗ . Using integral form of the remainder in Taylor’s theorem,
Recall that η˜ij = εTij β˜ and ηij
ij

42

Theorem 1 and Assumption 3.4.2,

˜ =
gij (β ∗ ) − gij (β)

∗ }(η ∗ − η˜ )dλ
g {λ˜
ηij − (1 − λ)ηij
ij
ij

∗ | ≤ c × β˜T − β ∗T sup ε
≤ c × |˜
ηij − ηij
ij

(3.8.8)

i,j

Consider,

P (sup εij ≥ n1/4 ) ≤ nmP ( ε1 > n1/4 )
i,j

2

pn

≤ mE ε1

4

= mE( ε1

2 )2

= mE
k=1

ε21
k

2

∞

ε21

≤ mE

k

k=1

= mE X1 4 ≤ ∞

−1/4 ) This can be
˜ = Op (p1/2
Combining the above result with (3.8.8) we get gij (β ∗ ) − gij (β)
n n

used to show that I1 = Op (n3/4 p1.5
n ) = op (n). Similarly, we can prove the same result for I2 .
Proposition 3.8.4 This proposition states

˜
that n1/2 (βˆ − β)

≈

˜ T D(β)
˜ −1 U (β)
˜
D(β)
n
n1/2

Proof Using Taylors expansion,

ˆ = U ∗ (β)
˜ + J ∗ (β ∗ )(βˆ − β)
˜
U ∗ (β)
˜ = −J ∗ (β ∗ )(βˆ − β)
˜
U ∗ (β)
˜ − U (β)
˜
U ∗ (β)
n1/2

+

˜
U (β)
n1/2

=−

43

J ∗ (β ∗ ) ˆ ˜ 1/2
(β − β)n
n

From Lemma 3.8.7 we obtain,
˜
U (β)
n1/2

=−

J ∗ (β ∗ ) ˆ ˜ 1/2
(β − β)n
n

The right hand side can be written as

−

˜
J ∗ (β ∗ ) ˆ ˜ 1/2 −J ∗ (β ∗ ) + J(β ∗ ) ˆ ˜ 1/2 −J(β ∗ ) + J(β)
˜ 1/2
(β − β)n
=
(β − β)n
+
(βˆ − β)n
n
n
n
˜ + D(β)
˜ T D(β)
˜
˜T
˜
J(β)
˜ 1/2 + D(β) D(β) (βˆ − β)n
˜ 1/2
−
(βˆ − β)n
n
n

˜ = Op (n1/2 pn ), i = 1, 2, 3 in the same manner as (3.8.3). Using AssumpWe can show that Ri (β)
tion 3.4.2, 3.4.6, Proposition 3.8.2, 3.8.3, Theorem 1 yields

−

˜ T D(β)
˜
J ∗ (β ∗ ) ˆ ˜ 1/2 D(β)
˜ 1/2
(β − β)n
=
(βˆ − β)n
n
n

Thus, the Proposition is proved.
Let,

e (β) =

V (β)−1/2 e(β),

Zn (β) =

D(β)T D(β)
n

1/2

χn (β) =

Ξn (β)D(β)T e (β)
n1/2

,

Ψn (β) =

Γ(β)1/2

−1

D(β)T e(β)
n1/2

D(β)T D(β)
n

,

(3.8.9)

−1

Γ(β)1/2 .

˜ for ease of notation we denote Zn =
As the functions in Proposition 3.8.4 are evaluated at β,
˜ , χn = χn (β),
˜ Ψn = Ψn (β),
˜ e = e (β).
˜ From the statement of Theorem 2, 3.4.2, we are
Zn (β)

44

interested in ZnT ΓZn . We decompose it into the three terms as

ZnT ΓZn = χTn Ψ2n χn
= χTn χn + 2χTn (Ψn − Ipn +1 )χn + χTn (Ψn − Ipn +1 )(Ψn − Ipn +1 )χn
= Fn + Gn + Hn ,

1/2

We shall show that Hn /pn

say.

1/2

and Gn /pn

are asymptotically negligible. This goal is in part

facilitated by the following proposition.
Proposition 3.8.5 We have

Ψn − Ipn +1 2 = op (1/pn ),

(3.8.10)
d

(χTn χn − (pn + 1))/(2pn )1/2 −
→ N (0, 1).

(3.8.11)

We shall prove this lemma shortly. As a consequence of this lemma, we obtain

|χTn (Ψn − Ipn +1 )χn | ≤ |χn χTn | Ψn − Ipn +1 ,
1/2

1/2

= Op (pn )op (1/pn ) = op (pn ).

1/2 p

This implies that Gn /pn

1/2 p

−
→ 0. We can similarly show that Hn /pn
ZnT ΓZn
(2pn

)1/2

=

Fn
(2pn )1/2

45

.

−
→ 0. Hence

(3.8.12)

Proof We shall first prove (3.8.10). Observe that

Ψn − Ipn +1 ≤ Ψn
Ψn ≤ Ipn +1 +

Ψ−1
n − Ipn +1

Ψ−1
n − Ipn +1
1 − Ψ−1
n − Ipn +1

(3.8.13)
.

We shall show that

−1
Ψ−1
n − Ipn +1 = op (pn ) = op (1),

(3.8.14)

implying Ψn ≤ Ipn +1 = (pn + 1)1/2 . This will prove (3.8.10), because

1/2

Ψn − Ipn +1 = Op (pn )op

1
pn

= op

1
1/2

.

pn

(1/2)

We now turn to the proof of (3.8.14). Let, Ξ = Ξ1/2 Ξ1/2 , Ξ1/2 = (ξks

)0≤k,s≤pn . We can write

1/2 1 D(β)
˜ T D(β)Ξ
˜ 1/2
Ψ−1
n =Ξ
n
pn l
˜
˜
m
n
j1 j2 gij2 (β)gij1 (β)
1
(1/2) (s) (t) (1/2)
.
=
ξks εij εij ξtl
˜
˜
n
(
β)
σ
(
β)σ
1 2
0≤k,l≤pn
ij2
ij1
i=1 j j =1 ts=1
1 2

Thus,
2
˜
˜

lj1 j2 gij (β)gij (β) (1/2) (s) (t) (1/2)
2
−1
2
1
E Ψn − Ipn +1 = E
ξks εij εij ξtl
− δkl
˜ ij (β)
˜
1 2
n

σij2 (β)σ
i=1 j1 j2 =1 st=0
1
kl=0

1 n

pn

m

pn

pn

=E

(akl − δkl )2
kl=0
pn

=E

2 ),
(a2kl − 2akl δkl + δkl
kl=0

46

say.

pn
2
kl=0 akl

We investigate each term separately. Write

1
B1 = 2
n

= B1 + B2 , where

˜
˜
lj1 j2 gij (β)g
ij1 (β) (1/2) (s1 ) (t1 ) (1/2)
2
ξks εij εij ξt l
˜ ij (β)
˜
1
2 1
1
σ
(
β)σ
ij
2
1
kl=0 i=1 j j j j =1 s1 t1 s2 t2 =0
1 2 1 2
˜
˜
g (β)g
(β)
l
pn

n

pn

m

×

j1 j2 ij
2

ij1

(1/2) (s2 ) (t2 ) (1/2)
ε
ε ξ
,
2 ij1 ij2 t2 l

ξks

˜
˜
σij (β)σ
ij (β)
2

1

˜
˜
lj1 j2 gi j (β)g
1
i1 j1 (β) (1/2) (s1 ) (t1 ) (1/2)
1 2
B2 = 2
ξks εi j εi j ξt l
˜ i j (β)
˜
1 1 1 2 1
1
n
σ
(
β)σ
i
j
1 2
1 1
kl=0 i1 =i2 =1 j j j j =1 s1 t1 s2 t2 =0
1 2 1 2
˜
˜
(β)
(β)g
g
l
pn

n

pn

m

×

j1 j2 i2 j
2

i2 j1

˜
˜
σi j (β)σ
i2 j1 (β)
2 2

(1/2) (s2 ) (t2 ) (1/2)
ε
ε
ξ
.
2 i 2 j1 i 2 j2 t2 l

ξks

Using the fact that Ξ1/2 Ξ1/2 = Ξ,

1
B1 = 2
n

n

pn

m

i=1 j j j j =1 s1 t1 s2 t2 =0
1 2 1 2

˜
˜
lj1 j2 gij (β)g
ij1 (β) (s1 ) (t1 )
2
εij εij
˜ ij (β)
˜
1
2
σij (β)σ
2

1

˜
˜
lj j g (β)g
(β)
ij1
(s ) (t )
1 2 ij2
×
ε 2 ε 2 ξt1 t2 ξs1 s2
˜
˜
σij (β)σij (β) ij1 ij2
2

E(B1 ) =

1
o
n

1
E(B2 ) = 2
n

n
p2n
n

1

→0
pn

m

E
i1 =i2 =1 j j j j =1 s1 t1 s2 t2 =0
1 2 1 2

˜
lj1 j2 g (˜
ηi1 j2 )gi j (β)
(s ) (t )
1 1
εi 1j εi 1j
˜
˜
1 1 1 2
σi1 j2 (β)σi1 j1 (β)



˜
ηi j )g
(β)
 lj j g (˜

2 2 i 2 j1
(s ) (t )
1 2
×E
ε 2 ε 2
ξ
ξ
˜
˜
i2 j1 i2 j2  t1 t2 s1 s2
 σ
(
β)σ
(
β)
i j
i j
2 2

1
= 2
n

n

2 1

pn

γs1 t1 γs2 t2 ξt1 t2 ξs1 s2 .
i1 =i2 =1 s1 t1 s2 t2 =0

47

We observe that ΓΞ = I and that each of the matrices are symmetric giving, implying
pn

γs1 t1 ξt1 s2 = 1,

s1 = s2

(3.8.15)

s1 = s2 .

(3.8.16)

t1 =0

= 0,

Thus

=

pn

pn

n

1
E(B2 ) =
n2

γs2 t2 ξs1 s2

1
n2

γs1 t1 ξt1 t2

(3.8.17)

t1 =0

i1 =i2 =1 s1 t2 s2 =0
pn pn
n

γs2 s1 ξs1 s2
i1 =i2 =1 s2 =0 s1 =0

1
=
n2

n

pn

1=
i1 =i2 =1 s2 =0

pn + 1 n
2
2
n

(n − 1)(pn + 1)
.
n

=

In a similar fashion, we can show that,
pn

−2
kl=0

and

−2
E(akl δkl ) =
n

pn
2
kl=0 δkl

n

pn

pn

γst
i=1 st=0

(1/2) (1/2)
ξsk ξkt

k=0

−2
=
n

n

pn pn

γts ξst = −2(pn + 1)
i=1 t=0 s=0

= pn + 1.

These results together imply that

2 = o
E Ψ−1
n − Ipn +1

= o

1
p2n
1
p2n

(n − 1)(pn + 1)
n
pn + 1
+O
.
n
+

48

− 2(pn + 1) + (pn + 1),(3.8.18)

1
pn

Thus, we obtain Ψ−1
n − Ipn +1 = op

and conclude the proof of (3.8.10) in Proposition

3.8.5 .
Next we prove (3.8.11). Referring to (3.8.9),

χn =

Ξ1/2 DT V −1/2 e
n1/2

n

=
i=1

n

=

−1/2 −1 −1/2
R Ai
ei
n1/2

Ξ1/2 FiT Gi Ai

i=1
−1/2
Ξ1/2 FiT Gi Ai
R−1 e˜i
,
n1/2

−1/2

e˜i = Ai

˜ i (β)
˜
(β)e

 1/2 (s)

˜ j j e˜ij
ξ0s εij gij (β)l
1 2 2
1 1



˜
σij1 (β)


pn 
m
n

1


..
=

,
.
1/2


n

i=1 j1 j2 =1 s=0  1/2 (s)
˜ j j e˜ij 
 ξp s ε g (β)l
n
ij
ij

1 2 2
1 1
˜
σij1 (β)
2

pn  n
pn ξ 1/2 ε(s) g (β)l
˜ j j e˜ij 
m
ts ij1 ij1
1
1 2 2
χTn χn =
˜


n
σij1 (β)
t=0 i=1 j j =1 s=0
1 2

= An + Bn ,
1
An =
n

pn

(3.8.19)

n

pn

m

(1/2) (1/2) (s1 ) (s2 )
˜
˜
˜ij2 e˜ij4
ξts εij εij gij (β)g
ij3 (β)lj1 j2 lj3 j4 e
1
2
1
1
3

ξts

˜ ij (β)
˜
σij1 (β)σ
3

t=0 i=1 j1 j2 j3 j4 =1 s1 s2 =0

pn

n

m

pn

Bn =

,

(1/2) (1/2) (s1 ) (s2 )
˜
˜
ξts εi j εi j gi j (β)g
˜i1 j2 e˜i2 j4
i2 j3 (β)lj1 j2 lj3 j4 e
1
2
1 1 2 3 1 1

ξts

˜ i j (β)
˜
nσi1 j1 (β)σ
2 3

t=0 i1 =i2 =1 j1 j2 j3 j4 =1 s1 s2 =0

An − (pn + 1)

is asymptotically negligible in Lemma 3.8.8 and that Bn has
1/2
pn
desired distribution in Lemma 3.8.9.
We shall show that

Lemma 3.8.8 We have

An − (pn + 1) p
−
→ 0.
1/2
pn
49

.

Proof Using independence of e˜ with all ε,
pn

n

pn

m

E(An ) =

E

(1/2) (1/2) (s1 ) (s2 )
˜
˜
ξts εij εij gij (β)g
eij2 e˜ij4 )
ij3 (β)lj1 j2 lj3 j4 cov(˜
1
2
1
1
3

ξts

˜ ij (β)
˜
nσij1 (β)σ
3

t=0 i=1 j1 j2 j3 j4 =1 s1 s2 =0

=

1
n

pn

n

 (s ) (s )
˜
˜
m
 ε 1 ε 2 gij (β)g
ij (β)lj1 j2
ij
ij

m

ξs1 s2
i=1 s1 s2 =0

1

E
j1 j2 j3 =1

1

3

3

˜ ij (β)
˜
σij1 (β)σ
3



j4 =1



lj3 j4 cov(˜
eij2 e˜ij4 ).


We note that lj3 j4 are elements from the inverse of the correlation matrix (R) and that covariance
−1/2

matrix of e˜i = Ai

ei is R . Also for any invertible matrix M , M −1 M = I. Using (3.8.16), we

obtain

E(An ) =

1
n

n

pn

m

ξs1 s2
i=1 s1 s2 =0
pn

=

E
j1 j2 =j3 =1
pn

ξs1 s2 γs1 s2 =
s1 s2 =0


 (s ) (s )
˜ j j 
˜
 ε 1 ε 2 gij (β)g
(
β)l
ij
ij
ij
1 2
1



2

1

2

˜
˜ ij (β)
σij1 (β)σ
2

(3.8.20)



1 = pn + 1.
s1 =0

We now evaluate
2
(s ) (s )
˜ j j lj j e˜ij e˜ij 
˜
(
β)l
ξs1 s2 εij 1 εij 2 gij (β)g
ij2
1 2 3 4 2 4
1
1
3
E(A2n ) = E
˜
˜

n
σij1 (β)σij3 (β)
i=1 j1 j2 j3 j4 =1 s1 s2 =0
= Aan + Abn , say.
(3.8.21)

1 n

m

pn

Now consider,

50

Aan

1
= 2
n

n

pn

m

i=1 j j ...j j =1 s s s s =0
1 1 4 4
1 2 1 2

ξs1 s2 ξs s
1 2



s1 (s1 ) (s2 ) (s2 )


˜
˜
˜
˜


 εij1 εij εij3 εij gij1 (β)gij (β)gij3 (β)gij (β)lj1 j2 lj j lj3 j4 lj j e˜ij2 e˜ij e˜ij4 e˜ij 
4
2
3 4
1 2
3
1
3
1
×E
˜
˜
˜
˜


σij1 (β)σ


ij1 (β)σij3 (β)σij3 (β)


=O

1
n

1
p2n

=o

.

(3.8.22)

Next, we have

Abn

n

1
= 2
n

pn

m

ξs1 s2 ξs s lj1 j2 lj j lj3 j4 lj j
3 4
1 2
1 2
i1 =i2 =1 j j ...j j =1 s s s s =0
1 1 4 4
1 2 1 2


(s1 ) (s1 ) (s2 ) (s2 )

˜
˜
˜
˜


ei1 j2 e˜i j e˜i1 j4 e˜i j 
 εi1 j1 εi j εi1 j3 εi j gi1 j1 (β)gi j (β)gi1 j3 (β)gi j (β)˜

2 4
2 2
2 1
2 3
2 1
2 3
×E
˜
˜
˜
˜
σi1 j1 (β)σ
i2 j1 (β)σi1 j3 (β)σi2 j3 (β)




1
= 2
n
=





n

ξs1 s2 γs1 s2
i1 =i2 =1 s1 s2

s1 s2

ξs s γs s
1 2 1 2

(n − 1)(pn + 1)2
.
n

(3.8.23)

Upon combining this fact with (3.8.20), (3.8.21) and (3.8.22), we obtain

var(An ) = E(A2n ) − E 2 (An ) = o

Thus, we have

An − (pn + 1) p
−
→ 0.
1/2
pn

51

1
p2n

+

(pn + 1)2
.
n

From, (3.8.19) and Lemma 3.8.8 we obtain
χT χ − (pn + 1)
(2pn )1/2

=

Bn
(2pn )1/2

.

So the next lemma will establish (3.8.11).
Lemma 3.8.9 Asymptotic distribution of Bn is given as

Bn

→ N (0, 1).

(2pn )1/2
Definition 3.8.1

(s ) (s )
˜
˜
ξs1 s2 εi 1j εij 2 gi j (β)g
˜i1 j2 e˜ij4
ij3 (β)lj1 j2 lj3 j4 e
1
1
1
1
3
,
Wni =
˜ ij (β)
˜
σi1 j1 (β)σ
i1 =1 j1 j2 j3 j4 =1 s1 s2 =0
3
2
Wni =
Wni .
n(2pn )1/2
i−1

m

pn

Note that,
2
Bn =
n

n

Wni .
i=1

The random variables Wni form a triangular array of martingale differences w.r.t. the filtration
(k)

Fn,i = σ(εi j , eij , 1 ≤ i1 ≤ i, 1 ≤ j ≤ m, 0 ≤ k ≤ pn ), (1 ≤ i ≤ n, n ∈ N). N denotes the set
1
of all natural numbers. This implies that Wni is also a martingale difference array.
Proof
To prove this lemma it is enough to prove the following:
n

d

Wni −
→ N (0, 1).
i=1

52

(3.8.24)

According to the CLT for the sums of martingale difference arrays, see Corollary 3.1 in Hall
and Heyde (1980), it suffices to verify the following two conditions
n

p

2 |F
→ 1,
E(Wni
n,i−1 ) −
i=1
n

(3.8.25)

p

4 |F
→ 0,
E{Wni
n,i−1 } −

for all > 0.

(3.8.26)

i=1

Consider,
i−1
2
E(Wni

m

pn

m

| Fn,i−1 ) =
i1 ,i2 =1 j1 j2 j3 j4 =1 j j j j =1 s s s s =0
1 2 1 2
1 2 3 4
(s ) (s )

˜ j j l
˜
e˜
l
e˜
l
ξs1 s2 ξs s εi 1j ε 1 gi j (β)g
(β)l
1 2 j1 j2 j3 j4 j3 j4 i1 j2 i2 j2
i 2 j1
1 2 1 1 i2 j1 1 1
×
˜
˜
σi j (β)σ
(β)
1 1

×E


(s2 ) (s2 )

˜

gij (β)g
 εij ε
3

ij3

ij3

3

i2 j1

˜ eij e˜
(β)˜
4 ij

4

˜
˜
σij3 (β)σ
ij (β)





| Fn,i−1





3

i−1

m

m






pn

=
i1 ,i2 =1 j1 j2 j3 j4 =1 j j j =1 s s s s =0
1 2 1 2
1 2 3
(s ) (s )
˜ j j l
˜
e˜
e˜
l
(β)l
ξs1 s2 ξs s εi 1j ε 1 gi j (β)g
1 2 j1 j2 j3 j4 i1 j2 i2 j2
i2 j1
1 2 1 1 i 2 j1 1 1
×
˜
˜
σi j (β)σ
(β)
i2 j1

1 1

×E


(s2 ) (s2 )

˜

gij (β)g
 εij ε






˜

(β) m

ij3
3
3 ij3
lj j cov(˜
eij4 e˜ij )
˜
˜
3 4
4 
σij3 (β)σ

ij3 (β)

j =1
4

53

i−1

m

pn

m

=
i1 ,i2 =1 j1 j2 =1 j j =1 s s s s =0
1 2 1 2
1 2
(s ) (s )
˜ j j l
˜
e˜
e˜
(β)l
ξs1 s2 ξs s εi 1j ε 1 gi j (β)g
1 2 j1 j2 i1 j2 i2 j2
i2 j1
1 2 1 1 i2 j1 1 1
×
˜
˜
σi j (β)σ
(β)
m

×
j3 j4 =1
i−1
2
E(Wni

m

1 1

i2 j

3

4

1


(s )


˜
˜
 lj j ε(s2 ) ε 2 gij (β)g

ij4 (β)
3 4 ij3 ij4
3
E
,
˜ ij (β)
˜


σij (β)σ


pn

m

| Fn,i−1 ) =
i1 ,i2 =1 j1 j2 =1 j j =1 s s s s =0
1 2 1 2
1 2
s
s
˜
˜ j j l
ξs1 s2 ξs s εi 1j ε 1 gi j (β)g
(β)l
e˜
e˜
1 2 j1 j2 i1 j2 i2 j2
i2 j1
1 2 1 1 i2 j1 1 1
×
× γs s
˜
˜
2 2
(
β)
σi1 j1 (β)σ
i j
2 1

i−1

m

pn

=
i1 ,i2 =1 j j j j =1 s s s s =0
1 2 1 2
1 2 1 2
(s ) (s )
˜
˜ j j l
e˜
e˜
(β)l
ξs s εi 1j ε 1 gi j (β)g
1 2 j1 j2 i1 j2 i2 j2
i 2 j1
1 2 1 1 i2 j1 1 1
×
˜
˜
(β)
σi j (β)σ
1 1

i 2 j1

pn

×
i−1

ξs1 s2 γs s
2 2

s2 =0
m

pn

=
i1 ,i2 =1 j j j j =1 s s =0
1 2 1 2
1 1
(s ) (s )
˜
˜ j j l
ξs s εi 1j ε 1 gi j (β)g
(β)l
e˜
e˜
1 2 j1 j2 i1 j2 i2 j2
i 2 j1
(1)
(2)
1 1 1 1 i2 j1 1 1
×
= An + An ,
˜
˜
σi j (β)σ
(β)
1 1

54

i 2 j1

where

(1)

An

i−1

pn

m

=
i1 =i2 =1 j j j j =1 s s =0
1 2 1 2
1 1

(2)

An

i−1

pn

m

=
i1 =i2 =1 j j j j =1 s s =0
1 2 1 2
1 1

s
s
˜
˜ j j l
ξs s εi 1j ε 1 gi j (β)g
(β)l
e˜
e˜
1 2 j1 j2 i1 j2 i1 j2
i1 j1
1 1 1 1 i1 j1 1 1
,
˜
˜
σi j (β)σ
(β)
1 1

i1 j1

s
s
˜ j j l
˜
ξs s εi 1j ε 1 gi j (β)g
e˜
e˜
(β)l
1 2 j1 j2 i1 j2 i2 j2
i2 j1
1 1 1 1 i2 j1 1 1
.
˜
˜
σi j (β)σ
(β)
1 1

i2 j1

Note that

(1)
E(An )

i−1

pn

m

=
i1 =1 j j j =1 s s =0
1 2 1
1 1




ξ




m

×E
lj j cov(˜
ei1 j2 e˜i1 j2 )
˜ β˜


1 2
σi1 j1 (β)σ(


i1 j1 )


j2 =1


s
s




˜
˜
pn
m
i−1
 ξs s εi 1j εi 1j gi j (β)g
i1 j2 (β)lj1 j2 
1
1
1
1
1
2
1
1
E
=
˜ i j (β)
˜


σi1 j1 (β)σ



i1 =1 j1 j2 =1 s s =0 
1 2
1
1


s


pn
˜
˜
m
i−1
 εs1 ε 1 gi j (β)g

i1 j2 (β)lj1 j2
i1 j1 i1 j2 1 1
E
ξs s
=
˜
˜ i j (β)


1 2
σi1 j1 (β)σ

j j =1 
i =1
1 2
s1 s1
˜
˜
s1 s1 εi1 j1 εi1 j gi1 j1 (β)gi1 j (β)lj1 j2
1
1

s1 s1 =0
pn
i−1

1 2

1

=
i1 =1 s s =0
1 1

ξs s γs s = (i − 1)(pn + 1),
1 1 1 1

(2)

E(An ) = 0.

The above facts together imply:

2 |F
E{E(Wni
n,i−1 )} = (i − 1)(pn + 1).

55

Thus
n
2
E(Wni

E{
i=1

2
| Fn,i−1 )} =
E{
2
n pn

2 |F
E(Wni
n,i−1 )}

(3.8.27)

i=1

n

2

=

n

n2 p

n i=1

(i − 1)(pn + 1) → 1.

Next, we shall show that
n

var

2 |F
E(Wni
n,i−1 ) → 0.

(3.8.28)

2 |F
2
E{E(Wni
n,i−1 )}

(3.8.29)

i=1

To prove this claim, first consider
n

n
2 |F
2 =
E(Wni
n,i−1 )}

E{
i=1

i=1
2 |F
2
E{E(Wni
n,i1 −1 )E(Wni | Fn,i2 −1 )}.

+2

1

1≤i1 <i2 ≤n

2

The second term satisfies

2 |F
2
E(Wni
n,i1 −1 )E(Wni2 | Fn,i2 −1 )
1
i1 −1

i2 −1

m

m

pn

=
a1 ,a2 =1 a3 ,a4 =1 j j j j =1 j j j j =1 s s s s =0
1 2 1 2
3 4 3 4
1 1 2 2
s

s

˜
˜ j j l
ξs s εa1 j ε 1 ga j (β)g
(β)l
e˜
e˜
1 2 j1 j2 a1 j2 a2 j2
a2 j1
1 1 1 1 a2 j1 1 1
×
˜
˜
σa j (β)σ
(β)
a2 j1

1 1

s
s
ξs s εa2 j ε 2 g (β˜a3 j3 )g (β˜a j )lj3 j4 lj j e˜a31j4 e˜a j
2 2 3 3 a4 j3
4 3
4 4
3 4
×
˜
˜
σa j (β)σ
(β)
(1)

(1)

(1)

3 3

a4 j3

(1)

(1)

= An + Bn + Cn + Dn + En ,

56

where,

(1)
An

i1 −1

m

pn

m

=
a=a1 =...=a4 =1 j j j j =1 j j j j =1 s s s s =0
1 2 1 2
3 4 3 4
1 1 2 2
s
s
˜ j j l
˜
ξs s εaj2 ε 2 gaj (β)g
e˜ e˜
(β)l
1 2 j1 j2 aj2 aj2
aj1
1
1 aj1
1 1
×
˜
˜
σaj (β)σ
(β)
aj1

1

s

s

˜ j j l
˜
ξs s εaj2 ε 2 gaj (β)g
e˜ e˜
(β)l
3 4 j3 j4 aj4 aj4
aj3
3
3 aj3
2 2
,
×
˜
˜
σaj (β)σ
(β)
aj3

3

(1)
Bn

i1 −1

i2 −1

m

m

pn

=
a1 =a2 =1 a3 =a4 =1 j j j j =1 j j j j =1 s s s s =0
3 4 3 4
1 2 1 2
1 1 2 2
s
s
˜
˜ j j l
e˜
e˜
(β)l
ξs s εa1 j ε 1 ga j (β)g
1 2 j1 j2 a1 j2 a1 j2
a1 j1
1 1 1 1 a1 j1 1 1
×
˜
˜
(β)
σa j (β)σ
1 1

a1 j1

s
s
˜
˜ j j l
e˜
e˜
(β)l
ξs s εa2 j ε 2 ga j (β)g
3 4 j3 j4 a31j4 a3 j4
a3 j3
2 2 3 3 a3 j3 3 3
,
×
˜
˜
(β)
σa j (β)σ
a3 j3

3 3

(1)
Cn

i1 −1

m

m

pn

=
a1 =a3 =a2 =a4 =1 j j j j =1 j j j j =1 s s s s =0
3 4 3 4
1 2 1 2
1 1 2 2
s
s
˜ j j l
˜
e˜
e˜
(β)l
ξs s εa1 j ε 1 ga j (β)g
1 2 j1 j2 a1 j2 a2 j2
a2 j1
1 1 1 1 a2 j1 1 1
×
˜
˜
σa j (β)σ
(β)
1 1

s

a2 j1

s

˜
˜ j j l
ξs s εa2 j ε 2 ga j (β)g
(β)l
e˜
e˜
3 4 j3 j4 a1 j4 a2 j4
a2 j3
2 2 1 3 a2 j3 1 3
×
˜
˜
σa j (β)σ
(β)
1 3

57

a2 j3

i1 −1

=

pn

m

ξs s ξs s
1 1 2 2
a1 =1 s s s s =0 j1 j2 j3 j4 =1
1 1 2 2
s
s
˜ j j lj j e˜a j e˜a j
εa1 j εa2 j ga j (β)l
1 2 3 4 1 4 1 2
1 1 1 3 1 1
˜ a j (β)
˜
σa1 j1 (β)σ
1 3
i1 −1

s
s
ε1 ε2 g

m

a2 j1 a2 j3 a2 j1

×

˜
˜a j e˜a j
(β)l
j1 j2 lj3 j4 e
2 2 2 4
.
˜
˜
(β)σ
(β)

σa j
2 1

a2 =1 j j j j =1
1 2 3 4

a 2 j3

(1)

(1)

(1)

(1)

From (3.8.29) we can see that our interest is only in E(An + Bn + Cn + Dn ). So it is
(1)

(1)

(1)

(1)

(1)

(1)

(1)

(1)

sufficient talk about just the expectation of these terms i.e An , Bn , Cn , Dn and En . Dn

has terms with a1 = a4 , a2 = a3 and has same expectation as Cn . En has all the terms like
a1 = a2 = a3 = a4 and so on with all of them the expectation is 0 and so E(En ) = 0. We now
(1)

(1)

(1)

examine the expectations of the remaining terms i.e An , Bn and Cn i.e.
n
p2n

(1)

E(An ) = (i1 − 1)o
(1)
E(Bn )

i1 −1

,
pn

m

=
a1 =1 j j j j =1 s s =0
1 2 1 2
1 1

×E




ξ



s1
s1
˜
˜

˜a1 j2 e˜a j 

s1 s1 εa1 j1 εa1 j ga1 j1 (β)ga1 j (β)lj1 j2 lj1 j2 e
1 2
1
1

˜
˜
σa1 j1 (β)σ
a1 j1 (β)




i2 −1

m





pn

×
a3 =1 j j j j =1 s s =0
3 4 3 4
2 2

×E




ξ



s2
s2
˜
˜

˜a3 j4 e˜a j 

s2 s2 εa3 j3 εa3 j ga3 j3 (β)ga3 j (β)lj3 j4 lj3 j4 e
3 4
3
3





˜
˜
σa3 j3 (β)σ
a3 j3 (β)

= (i1 − 1)(pn + 1)(i2 − 1)(pn + 1),

58





(1)
E(Cn )

i1 −1

i1 −1

pn

=
a1 =1 s s s s =0
1 1 2 2

γs1 s2 γs s
1 2

a2 =1

ξs s ξs s
1 1 2 2

= (i1 − 1)2 (pn + 1).

Thus, we get

(1)

(1)

(1)

(1)

(1)

2 |F
2
E{E(Wni
n,i1 −1 )E(Wni2 | Fn,i2 −1 )} = E(An + Bn + Cn + Dn + En )
1
n
= (i1 − 1)o 2 + (i1 − 1)(i2 − 1)(pn + 1)2
pn

+ 2(i2 − 1)2 (pn + 1).

(3.8.30)

On the same lines we can show that

n
p2n

2 |F
2
E{E(Wni
n,i−1 )} = (i − 1)o

+ (i − 1)2 (pn + 1)2 + 2(i − 1)2 (pn + 1).

(3.8.31)

Combining results (3.8.29), (3.8.30), (3.8.31) we obtain,
n

n
2
E(Wni

E{
i=1

| Fn,i−1

)}2

(i − 1)o

=
i=1

n

n
p2n

+ (i − 1)2 (pn + 1)2 + 2(i − 1)2 (pn + 1)

i−1

(i1 − 1)o

+2
i=1 i1 =1
n

n
p2n

+ (i − 1)(i1 − 1)(pn + 1)2

i−1

2(i1 − 1)2 (pn + 1)

+2
i=1 i1 =1

= O(n3 )o

n
p2n

+ n4 (pn + 1)2 {1 + o(1)} + n4 (pn + 1){1 + o(1)},

59

n

n
2
E(Wni

E{

| Fn,i−1

)}2

= E{
i=1

i=1

2
2
2 |F
E( 2 Wni
n,i−1 )}
n pn

= 1 + o(1).

Thus, we have proved (3.8.28). This fact together with (3.8.27) completes the proof of (3.8.25).
Next, we shall sketch a proof of the fact
n

P

4 |F
E(Wni
→ 0.
n,i−1 ) −

(3.8.32)

i=1

This fact readily implies (3.8.26).
Note that
i−1
2
Wni

m

pn

m

=
i1 ,i2 =1 j1 j2 j3 j4 =1 j5 j6 j7 j8 =1 s1 s2 s3 s4 =0
s
s
˜
˜
˜i1 j2 e˜i2 j6
ξs1 s2 ξs3 s4 εi 1j εi 3j gi j (β)g
i2 j5 (β)lj1 j2 lj5 j6 lj3 j4 lj7 j8 e
1
1
1
1
2
5
×

˜ i j (β)
˜
σi1 j1 a(β)σ
2 5

s s
˜
˜ eij e˜ij
εij2 εij4 gij (β)g
ij7 (β)˜
4 8
3
3
7
×
.
˜
˜
σij3 (β)σij7 (β)

4 in to 4 terms
Using techniques similar to the ones in the previous result, we can write terms in Wni

of type i1 = i2 = i3 = i4 that are of order (i − 1)o(n2 p2n ) and remaining terms that are of the
order i(i − 1)p4n o n/p2n , to obtain

60

n
4 ) = O(n2 )o(n2 p2 ) + O(n3 )p4 o
E(Wni
n
n
i=1
n
4)=O
E(Wni
i=1

1
n4 p2n

n
p2n

,

O(n2 )o(n2 p2n ) + O(n3 )p4n o

n
p2n

= o(1).

(3.8.33)

Thus, from (3.8.33) we prove (3.8.26) and (3.8.24) and thereby Proposition 3.8.9. From Lemma
3.8.5 , we also prove 3.4.2 of Theorem 2. Next, we prove the remaining parts of Theorem 2.
We now prove 3.4.3 of Theorem2
˜ = D, Ψ−1 = nΓ1/2 (DT D)−1 Γ1/2 and Γ
˜ = DT D/n.
Proof Recall, D(β)
n
From Lemma 3.8.4 and (3.8.9) we have,

n(βˆ − β)T Γ(βˆ − β) = ZnT ΓZn .

Hence consider,

1/2 Z
˜ n = ZnT Γ1/2 (I − Ψ−1
ZnT ΓZn − ZnT ΓZ
n
n )Γ

= χTn Ψn (I − Ψ−1
n )Ψn χn .

Taking norm we get,

˜ n = χT Ψn (I − Ψ−1 )Ψn χn
ZnT ΓZn − ZnT ΓZ
n
n
≤ |χn χTn | Ψn (I − Ψ−1
n )Ψn = Op (pn )Op

61

1
pn

.

We use (3.8.11), (3.8.14) to get the above result. Thus, we obtain,
˜ n
ZnT ΓZn − ZnT ΓZ
(2pn )1/2

= op (1).

This concludes the proof of 3.4.3 from Theorem 2.
We next prove 3.4.4 Theorem 2
Proof Consider,
DTˆ Dβˆ
DTˆ Dβˆ
TD
TD
2
D
D
β
β
˜ − n(βˆ − β)
˜T
˜ ≤ n(βˆ − β)
˜
˜T
(βˆ − β)
(βˆ − β)
−
.
n(βˆ − β)
n
n
n
n
T
DT D Dβˆ Dβˆ
−
= op (1) and from Theorem 1 we have n βˆ − β˜ = Op (1). Thus,
We have
n
n

we get
DTˆ Dβˆ
TD
D
˜T
˜ − n(βˆ − β)
˜T β
˜ = op (1).
n(βˆ − β)
(βˆ − β)
(βˆ − β)
n
n
Thus, the result is proved.

62

Chapter 4
Measurement Error Model
In this chapter, we consider the problem of estimating the effect function in a functional regression
set-up where the response is a scalar and the regressor is a function, in the presence of measurement
error. This is as an extension of the functional model proposed in the previous chapter albeit with
stronger assumptions. In several studies the covariates are contaminated with errors. Sometimes,
this error can be easily determined and eliminated from the data. However, if it is random then we
need to adjust our analysis to account for this measurement error as ignoring it can lead to bias in
estimating parameters. We adapt the framework provided by Stefanski and Carroll (1987) to accommodate functional covariate. We then use the Karhunen-Lo´eve expansion to obtain estimating
equations similar to those in Stefanski and Carroll (1987). The main difference between the two
frameworks is that the number of parameters to be estimated in the functional case is diverging
while in the non functional case is fixed. We prove the consistency of our estimates in the case that
the response has normal distribution conditional on the functional regressor. We perform a limited
simulation study to investigate the accuracy of our estimator.
There are two important features of measurement error models: the first one is the relationship
between the original unobserved process X(t) and the observed process W (t) with error, and the
type of data available to assess some characteristics of the measurement error. For the relationship
between X(t) and W (t) we assume the classical additive error model i.e. W (t) = X(t) + U (t),
E(U (t)|X(t), t ∈ T ) = 0, T = [0, 1]. We assume that replicates of W (t) are available. Further

63

details of this model will be given subsequently.
There is little literature available on measurement errors in functional model. Cardot et al.
(2007) consider the linear functional model: Yi =

β(t)Xi (t) + ei , i = 1..., n and assume the

following measurement error structure: Wi (tj ) = Xi (tj ) + Uij , i = 1, ..., n, j = 1, ..., p and
t1 < t2 <, ... < tp are the discrete points at which the curves are observed. Noise components Uij
are assumed to be independent of each other and error ei for all i, j. The noise components are not
considered to be discrete realizations of a continuous time stochastic process and are interpreted
as random measurement errors at finite discretization points. This particular measurement error
framework is used because in practice we only observe curves at finite discrete points instead
of the entire curve. Similar approach can be found in Cardot (2000). In her dissertation, Cai
(2014) extends the simulation extrapolation (SIMEX) method developed by Carroll et al. (2006)
to accommodate the measurement error at finite discrete points in linear as well as non-linear
functional models. The measurement errors Uij are allowed to have a sparse correlation structure
like the autocorrelation structure and are assumed to be normally distributed. The main advantage
of the method that we propose in this chapter is that it does not impose any restriction on the
covariance function of the measurement error process U (t), t ∈ T .

4.1

Model

We observe the following independent sample (Wi (t), t ∈ T = [0, 1], Yi ), i = 1, ..., n, where
Wi (t) is the surrogate of the true covariate Xi (t) which is unobservable. We assume that Wi (·), 1 ≤
i ≤ n are random elements of L2 = L2 [0, 1] and that Yi given the function Xi (·), has the following

64

distribution with respect to a dominating measure m :

fYi (yi ; θ1 , xi (·)) = exp

1 ˜
˜
yi (β0 + 01 β(t)x
i (t)d(t)) − b(β0 + 0 β(t)xi (t)dt)
+ c(yi , φ) .
a(φ)

(4.1.1)
˜ ∈ L2 , θ1 = (β0 , β(·),
˜ φ).
All integrals hereafter are taken over [0, 1] and Xi (·), ∀i and β(·)
2
Let (ρk )∞
k=1 be an orthonormal basis functions in L space. Using basis expansion we obtain
∞

E(Yi ; θ1 , Xi (·)) =

Xik β˜k ,

˜
β(t)X
i (t)dt =
k=1

where Xik =

X(t)ρk (t)dt, β˜k =

˜
β(t)ρ
k (t)dt. Model (4.1.1) has infinitely many parameters.

We address this issue of infinite dimensions with a truncation strategy. Let β˜ = (β1 , ..., βpn )T and
˜ φ). Instead of the model (4.1.1), we work with the following sequence of models with
θ˜ = (β0 , β,
increasing dimension pn :

 yi

˜ xi ) = exp 
fYi (yi ; θ,



pn

β0 +

xik β˜k

− b β0 +

k=1

pn


xik β˜k

k=1

a(φ)



+ c(yi , φ)
.


(4.1.2)

We assume the following relation between the true covariate Xi (·) and the observed surrogate
Wi (·) :

Wi (t) = Xi (t) + Ui (t), i = 1, ..., n, ∀t ∈ [0, 1],

(4.1.3)

where Ui is a copy of Gaussian process U having mean function 0 and covariance function K(·, ·), ∀i.
Take (ρi )∞
i=1 to be the basis formed from the eigenfunctions of the integral operator K associated
65

with the covariance function K(·, ·). Let Wik =

Wi (t)ρk (t)dt and Uik =

Ui (t)ρk (t)dt. This

yields the following measurement error set-up for (4.1.2):

Wik = Xik + Uik , i = 1, .., n, k ≥ 1.

Let, λk be the eigenvalue associated with the k th eigenfunction of K. From the Karh¨unen-Lo`eve
expansion we obtain that Uik are independent and Uik ∼ N (0, λk ), i = 1, .., n, k ≥ 1. Thus, we
have a additive measurement error set up where errors are independent with normal distribution.
This set-up is similar to that of Stefanski and Carroll (1987) where the authors proposed the
sufficiency estimator for generalized linear models. We use their approach to propose estimators
(l)
for the parameters. Denote Xi = (Xi1 , ..., Xipn )T . Let Wi denote the lth replicate of Wi =
(1)

(m)

(Wi1 , ..., Wipn )T . The likelihood of the replicates Wi = (Wi , ..., Wi
m

˜ xi ) =
fW (wi , θ,
i

j=1

(2π)−pn /2
exp
|Ω1 |

) is

−1 (j)
(j)
(wi − xi )T (wi − xi ) ,
2

(4.1.4)

where, Ω1 = diag(λ1 , ..., λpn ) is the unknown covariance matrix of the measurement error vector
and m denotes the number of replicates. We assume that m = 1 and Ω = Ω1 /a(φ) is known. The
estimate of Ω1 and a(φ) are given subsequently. Let Y, W denote random variables with the same
distribution as Yi , Wi , i = 1, ..., n respectively. For now we drop the subscript i and use it when
necessary. We assume that W has no information on Y other than what is contained in x i.e

fY,W (y, w; θ, x) = fY (y; θ, x)fW (w, θ, x).

66

(4.1.5)

˜ = (∆1 , ..., ∆p )T = ∆ = W + Y Ωβ.
˜ The distribution of Y |∆ is given by
Let ∆(β)
n
1
˜
˜ φ)
+ c(y, φ) − log S(η, β,
fY |∆ (y|∆ = δ; θ) = exp yη − y 2 β˜T Ωβ/a(φ)
2

,

(4.1.6)

1
˜
˜
exp yη − y 2 β˜T Ωβ/a(φ)
+ c(y, φ) dm(y), η = (β0 + δ T β)/a(φ).
2

˜ φ) =
where S(η, β,

This distribution belongs to exponential family. This leads to the following estimating equations:
Ψs (y, w, θ) = (∂/∂θ)log fY /∆ (y|δ; θ) evaluated at δ(β) = w + yΩβ.




(y − E(Y |∆(β) = δ(β))) /a (φ)






Ψ(y, w, θ) = (y − E(Y |∆(β) = δ(β))) δ(β)/a(φ) − y 2 − E(Y 2 |∆(β) = δ(β)) Ωβ/a(φ) .




r(y, w, θ) − E(r(y, w, θ)|∆(β) = δ(β))
(4.1.7)
∂c(Y, θ)
β T Ωβ
β + δ(β)T β
−Y 0 2
a (φ) + Y 2 2 a (φ). Note that Ψ is unbiased
∂φ
a (φ)
2a (φ)
n
ˆ = 0 is called as a sufficiency estimator. This sufficiency
for θ. Any θˆ satisfying
Ψ(wi , yi , θ)
where, r(Y, w, θ) =

i=1

estimator does not maximize the conditional likelihood.
We present the case when Y has normal distribution. In this case, (4.1.7) can be written as


1
(y − µ)
σ2




Ωβ
1
2
Ψ(y, w, θ) = 
 1 + β T Ωβ − σ 2 (y − µ) Ωβ − (y − µ)(δ(β) − 2µΩβ)


−1
(y − µ)2 (1 + β T Ωβ)
+
2σ 2
2σ 4





.




(4.1.8)

Let ∆∗ (β) = (I + Ωββ T )−1 (∆(β) − β0 Ωβ) and µ = (β0 + β T δ(β))/(1 + β T Ωβ). Consider the

67

following equations:
n

(∆∗i (β)Yi − ∆∗i (β)β0 − ∆∗i (β)β T ∆∗i (β)) = 0,

U (β) =

(4.1.9)

i=1
n

(Yi − β0 − β T ∆∗i (β)) = 0,
i=1

σ2 =

1 + β T Ωβ
n

n

(Yi − µi )2 .
i=1

These equations resemble the usual normal equations. The solution to (4.1.9) is also a solution to
(4.1.8). Note that the above equations are non linear in the parameters. Let W∗ = W − W, Y ∗ =
Y − Y , where Y and W denotes the average of Y1 , ..., Yn and W1 , ..., Wn respectively. The
equations (4.1.9) can be re-written as

β0 = Y − β T W,

(4.1.10)

n

n

Wi∗ Yi∗ β T Ωβ

U (β) = −
i=1

σ2 =

n

Yi∗2 Ωβ

+
i=1

n
1 + β T Ωβ

n

n

Wi∗ Wi∗T β

−
i=1

Wi∗ Yi∗ = 0,

+
i=1

(Yi − µi )2 .

i=1

The equation U (β) = 0 is a quadratic equation and thus has two roots. There is no way of
determining which root is the correct. One way to handle this problem is to solve the equation
iteratively starting from the naive estimator. Naive estimator can be obtained by treating W as the
true covariate i.e. ignoring the measurement error. We state the assumptions needed to show that
the equations (4.1.10) has a solution βˆ and that this solution is consistent.
Assumption 1 p = pn → ∞,

pn n−1/8 → 0.

Assumption 2 sup Xi = O(n1/4 ).
i

68

√
Assumption 3 For all n we assume that Ω1 = O( pn ).
Assumption 4 There exist positive constants b1 , b2 such that for all n,
n

0 < b1 ≤ λmin
i=1

n

Xi XiT
n

≤ λmax
i=1

Xi XiT
n

≤ b2 < ∞

Assumption 5 There exists a constant c such that for all n,

max







λ
 min





2

, λmax 

Xi1 XiT

n

i1 =i2

n2



2
≤c

n2

Xi1 XiT

n
i1 =i2

Assumption 6 For all n,

λmax (Ω1 )
≤ λmin
λmin (Ω1 )

n
i=1

Xi XiT
n

n

c−1 ,

c ≤ λmin
i=1

Xi XiT
n

.
Lemma 4.1.1
˜ = √npn .
U (β)
Proof
We can show that the distribution of Yi conditional on ∆∗ is normal with mean β0 + β˜T ∆∗ .
pn

˜
U (β)

2

2

n

(Yi − E(Yi |∆∗i ))∆∗ik

=
k=1 i=1
pn n

(Yi − E(Yi |∆∗i ))∆∗ik

=
k=1 i=1
pn

2

n

+
k=1 i1 =i2 =1

(Yi1 − E(Yi1 |∆∗i1 ))∆∗i k
1

69

(Yi2 − E(Yi2 |∆∗i ))∆∗i k
2
2

Taking expectation and using the fact that that the samples are independent we get

˜ 2 ) = O(npn ).
E(||U (β)||

As mentioned in chapter 3 the next theorem is sufficient to prove weak consistency of our
estimator.
Theorem 4.1.1 For all > 0, there exists a constant ζ > 0 such that for sufficiently large n,



P

sup√
˜

||β−β||=ζ

˜ T U (β) < 0 ≥ 1 − .
(β − β)

pn /n

Proof
Let J(β) =

∂U (β)
∂β

and β ∗ be such that β ∗ − β˜ < β − β˜ . Using Taylors expansion we

obtain,

˜ T U (β) = (β − β)
˜ T U (β)
˜ + (β − β)
˜ T J(β ∗ )(β − β)
˜
(β − β)
˜ T U (β)
˜ + (β − β)
˜ T (J(β ∗ ) − J(β))(β
˜
˜ + (β − β)
˜ T J(β)(β
˜
˜
= (β − β)
− β)
− β).

Using representation (4.1.10) of U (β) we get
n

Ω(W∗ Ti β)Yi∗ − ΩβWi∗T Yi∗ + ΩYi∗2 − Wi∗ Wi∗T .

J(β) = −
i=1

70

Consider,

˜ T (J(β ∗ ) − J(β))(β
˜
˜
(β − β)
− β)
n

−Ω(Wi∗T β ∗ )Yi∗ − Ωβ ∗ Wi∗T Yi∗ + ΩYi∗2 − Wi∗ Wi∗T

˜T
= (β − β)

˜
(β − β)

i=1
n

˜ ∗T Y ∗ − ΩY ∗2 + W∗ W∗T
˜ ∗ + ΩβW
Ω(Wi∗T β)Y
i
i
i
i
i
i

˜T
+ (β − β)

˜
(β − β)

i=1
n

n

ΩYi∗ (Wi∗T β˜ − Wi∗T β ∗ )(β

˜T
= (β − β)

˜
ΩWi∗T Yi∗ (β˜ − β ∗ )(β − β)

˜ + (β − β)
˜T
− β)
i=1

i=1

= A1 + A2

Consider,
n

˜
ΩYi∗ (Wi∗T β˜ − Wi∗T β ∗ )(β − β)|

˜T
|A1 | = |(β − β)
i=1
n

˜ T ΩY ∗ (W∗T β˜ − W∗T β ∗ )(β − β)|
˜
|(β − β)
i
i
i

≤
i=1

n

≤ Ω

Yi∗ Wi∗T (β˜ − β ∗ )

˜ 2
(β − β)
i=1
n

≤ Ω

Yi∗ Wi∗T

˜ 3
(β − β)
i=1
n

≤ Ω

∗T
Yi∗ (X∗T
i + Ui )

˜ 3
(β − β)
i=1
n

≤ Ω

Yi∗ X∗T
+ Yi∗ U∗T
i
i

˜ 3
(β − β)
i=1
n

≤ Ω

Yi∗ X∗T
+ Yi∗ U∗T
i
i

˜ 3
(β − β)
i=1
n

≤ Ω

Yi∗

˜ 3
(β − β)
i=1

71

sup Xi∗ + sup Ui∗
i

i

Taking norm and using Assumption 1, 2, 3 we get,

sup√

˜
||β−β||=ζ

˜ T (J(β ∗ ) − J(β))(β
˜
˜ = Op
||(β − β)
− β)||
pn /n

√
1/4
ζ˜3 pn p1.5
n nn
n1.5

= op (1).

We can prove similarly for A2 . Now consider,
n

n−1 (β

˜ ∗ + ΩY ∗2 − W∗ W∗T
−2Ω(Wi∗T β)Y
i
i
i
i

˜ T J(β)(β
˜
˜ = n−1 (β − β)
˜T
− β)
− β)
i=1

We have,
n

n

n−1

˜ ∗
−2Ω(Wi∗T β)Y
i

≈

−2Ωn−1

i=1

˜ ∗)
E((Wi∗T β)Y
i
i=1

n

= −2Ω
i=1
n

n

n−1

ΩY ∗2 ≈ n−1
i=1

=Ω+
i=1
n

T ˜
˜
(XiT β)(X
i β)

n2

i1 =i2

˜2
(X T β)
Ω i
−Ω
n

n

T ˜
˜
(XiT β)(X
i β)

n

=
i=1

Xi XiT
−Ω+
n

Thus,

72

n
i1 =i2

2

1

n2

i1 =i2

i=1

i=1

2

1

EWi∗ Wi∗T

Wi∗ Wi∗T ≈ n−1

n−1

n

,

ΩEY ∗2
i=1
n

n

˜2
(XiT β)
+ 2Ω
n

Xi1 XiT
n2

2.

,

˜
(β − β)

˜ T J(β)(β
˜
˜
n−1 (β − β)
− β)


n
n X XT
T
i1 i2
Xi Xi ˜ Ω1 ˜T
˜ T − Ω1 β˜T
˜
= (β − β)
β + 2β
β˜ (β − β)
2
n
σ
σ
n2
i=1
i =i

1 2
n
n X XT
i1 i2
Xi XiT
T
˜
˜
 (β − β).
+
− (β − β) 
n
n2
i=1

n
i=1

Let

Xi XiT
= Q,
n

n
i1 =i2

i1 =i2

Xi1 XiT

2

n2

= R, max{|λmax (R)|, |λmin (R)|} = c. Taking supre-

mum

sup√
˜

||β−β||=ζ

≤ ζ 2 pn

˜ T J(β)(β
˜
˜
(β − β)
− β)
pn /n

−λmin (Ω1 ) ˜T ˜ λmax (Ω1 ) ˜T ˜
β Qβ +
β Rβ − λmin (Q)
σ2
σ2

+ ζ 2 pn λmax (R)
≤ ζ 2 pn

−λmin (Ω1 )
λmax (Ω1 ) ˜ 2
λmin (Q) β˜ 2 +
c β − λmin (Q) + λmax (R)
2
σ
σ2

= ζ 2 pn A

Assumptions 4, 5, 6 ensure that A ≤ 0. Thus, the theorem is proved.

4.2

Simulation

We report the details of a limited simulation study that investigates the accuracy of our proposed
estimator in presence of measurement error. While our method does not put restrictions on the
covariance function of the error process, given the challenges of generating a process with the
specific covariance structure we considered the identity covariance function i.e. σ2 I{s=t} . This

73

makes it easier to control the error in the model.
Following Xiaochen Cai (2014) we generated the functional covariate using the following:
pn

Xi (t) =

εik ρk (t),
k=1

where, εik ∼ N (0, 1) and are independent for all i = 1, ..., n and k = 1, ...pn . We set pn = 8.
We generated the basis function using the Canadian weather data set Ramsay (2006). This data
set is available in the fda package in R software. We first smoothed the data using B-splines
smoothing that is described in the simulation section in Chapter 3. We generated the response
using Yi =

Xi (t)β(t)dt + i , where i ∼ N (0, 1) and are independent for all i. We then

generated the surrogate variable Wi (t) using Wi (t) = Xi (t) + Ui (t) where, Ui (t) is a Gaussian
process with 0 mean function and covariance function given as K(s, t) = σ2 I{s=t} . We vary the
value of σ2 to study the effect of measurement error on the estimation procedure. We use 100
replicates to estimate the covariance function. Let Wi1 , ..., Wi100 denote the replicates of Wi and
Wi. denote their mean. The estimate of the covariance function K(·, ·) of the measurement error
is

n 100

(Wij − Wi. )(Wij − Wi. )T

i=1 j=1

n(100 − 1)

.

ˆ k denote the k th eigenvalue of the K(·,
ˆ 1 , ..., λ
ˆ p ). The dimension
ˆ ·). Then, Ω
ˆ 1 = diag(λ
Let λ
n
pn is chosen by using 5-fold cross validation. We solve the (4.1.10) iteratively with start value
obtained from the naive estimator. The following table reports the result of the simulation. Error
˜ T (βˆ − β)/
˜ β˜T β.
˜ In Table 4.1, the naive estimator is the one where we
is calculated as (βˆ − β)
ignore the measurement error i.e treat W (t) as the true covariate. The corrected estimator is the
one obtained by solving (4.1.10). We can see that as the measurement error increases, the error

74

Table 4.1: Error in the estimator as a function of σ2
Error(σ2 ) Naive
0.1
0.009
0.3
0.056
0.4
0.08
0.5
0.115
0.8
0.202
1
0.256

Corrected
0.001
0.006
0.008
0.02
0.08
0.855

in the estimation increases for both the estimators. We can also see that the corrected estimator
improves the accuracy of the estimate. Recall that the estimating equation yields multiple solutions
and we do not have a way of choosing the correct one. To avoid this problem, we solve the equation
iteratively starting from the naive estimator. However, as the measurement error increases in the
data, the start values are farther from the true value and hence, the iterative algorithm does not
always converge to the true value. We observe that the corrected estimator performed better than
the naive one only for a limited range of σ2 .

75

BIBLIOGRAPHY

76

BIBLIOGRAPHY

Balan, R. M., Schiopu-Kratina, I., et al. (2005). Asymptotic results with generalized estimating
equations for longitudinal data. The Annals of Statistics, 33(2):522–541.
Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of
sampled noisy functions. Journal of Nonparametric Statistics, 12(4):503–538.
Cardot, H., Crambes, C., Kneip, A., and Sarda, P. (2007). Smoothing splines estimators in
functional linear regression with errors-in-variables. Computational statistics & data analysis,
51(10):4832–4848.
Cardot, H., Ferraty, F., and Sarda, P. (1999). Functional linear model. Statistics & Probability
Letters, 45(1):11–22.
Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model.
Statistica Sinica, pages 571–591.
Cardot, H. and Johannes, J. (2010). Thresholding projection estimators in functional linear models. Journal of Multivariate Analysis, 101(2):395–408.
Cardot, H. and Sarda, P. (2005). Estimation in generalized linear models for functional data via
penalized likelihood. Journal of Multivariate Analysis, 92(1):24 – 41.
Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement error
in nonlinear models: a modern perspective. CRC press.
Cirulli, E. T. and Goldstein, D. B. (2010). Uncovering the roles of rare variants in common
disease through whole-genome sequencing. Nature Reviews Genetics, 11(6):415–425.
Gertheiss, J., Maity, A., and Staicu, A.-M. (2013). Variable selection in generalized functional
linear models. Stat, 2(1):86–101.
Hall, P. and Heyde, C. (1980). Martingale limit theory and its application. Probability and
mathematical statistics. Academic Press.
Hicks, B. M., Schalet, B. D., Malone, S. M., Iacono, W. G., and McGue, M. (2011). Psychometric
and genetic architecture of substance use disorder and behavioral disinhibition measures for gene
association studies. Behavior genetics, 41(4):459–475.
Horv´ath, L. and Kokoszka, P. (2012). Inference for functional data with applications, volume
200. Springer Science & Business Media.
Hsing, T. and Eubank, R. (2015). Theoretical foundations of functional data analysis, with an
introduction to linear operators. John Wiley & Sons.
77

Laird, N. M. and Lange, C. (2010a). The fundamentals of modern statistical genetics. Springer
Science & Business Media.
Laird, N. M. and Lange, C. (2010b). The Fundamentals of Modern Statistical Genetics. Springer
Publishing Company, Incorporated, 1st edition.
Lee, S., Abecasis, G. R., Boehnke, M., and Lin, X. (2014). Rare-variant association analysis:
study designs and statistical tests. The American Journal of Human Genetics, 95(1):5–23.
Li, Y., Wang, N., and Carroll, R. J. (2010). Generalized functional linear models with semiparametric single-index interactions. Journal of the American Statistical Association, 105(490):621–
633.
Madsen, B. E. and Browning, S. R. (2009). A groupwise association test for rare mutations using
a weighted sum statistic. PLoS Genet, 5(2):e1000384.
Morris, J. S. (2015). Functional regression. Annual Review of Statistics and Its Application,
2:321–359.
M¨uller, H.-G. and Stadtm¨uller, U. (2005). Generalized functional linear models. Ann. Statist.,
33(2):774–805.
Ortega, J. M. and Rheinboldt, W. C. (2000). Iterative Solution of Nonlinear Equations in Several
Variables. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
Ramsay, J. O. (2006). Functional data analysis. Wiley Online Library.
Saccone, N. L., Saccone, S. F., Hinrichs, A. L., Stitzel, J. A., Duan, W., Pergadia, M. L., Agrawal,
A., Breslau, N., Grucza, R. A., Hatsukami, D., et al. (2009). Multiple distinct risk loci for
nicotine dependence identified by dense coverage of the complete family of nicotinic receptor
subunit (chrn) genes. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics,
150(4):453–466.
Schaid, D. J., McDonnell, S. K., Sinnwell, J. P., and Thibodeau, S. N. (2013). Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population
structured data. Genetic epidemiology, 37(5):409–418.
Stefanski, L. A. and Carroll, R. J. (1987). Conditional scores and optimal scores for generalized
linear measurement-error models. Biometrika, pages 703–716.
Vrieze, S. I., Malone, S. M., Vaidyanathan, U., Kwong, A., Kang, H. M., Zhan, X., Flickinger,
M., Irons, D., Jun, G., Locke, A. E., et al. (2014). In search of rare variants: preliminary results
from whole genome sequencing of 1,325 individuals with psychophysiological endophenotypes.
Psychophysiology, 51(12):1309–1320.

78

Vsevolozhskaya, O. A., Zaykin, D. V., Greenwood, M. C., Wei, C., and Lu, Q. (2014). Functional
analysis of variance for association studies. PloS one, 9(9):e105074.
Wang, J.-L., Chiou, J.-M., and M¨uller, H.-G. (2016). Functional data analysis. Annual Review of
Statistics and Its Application, 3:257–295.
Wang, L. et al. (2011). Gee analysis of clustered binary data with diverging number of covariates.
The Annals of Statistics, 39(1):389–417.
Wang, X., Lee, S., Zhu, X., Redline, S., and Lin, X. (2013). Gee-based snp set association
test for continuous and discrete traits in family-based association studies. Genetic epidemiology,
37(8):778–786.

79