STATISTICAL INFERENCE FOR FUNCTIONAL AND LONGITUDINAL DATA
By
Guanqun Cao

A DISSERTATION
Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Statistics
2012

ABSTRACT
STATISTICAL INFERENCE FOR FUNCTIONAL AND LONGITUDINAL
DATA
By
Guanqun Cao

Advances in modern technology have facilitated the collection of high-dimensional functional and low dimensional longitudinal data. For these data, it is often of interest to describe
the key signals of the data (mean functions, covariance functions, derivative functions, etc.).
Functional data analysis (FDA) and longitudinal data analysis (LDA) techniques have played
a central role in the analysis of these data. The primary goal of this dissertation is to provide
some novel statistical inference methods for FDA and LDA.
In Chapter 1, we describe the structure (design, notations, etc.) of functional data and
describe the spline smoothing technique as a tool to analysis these data. Longitudinal data
analysis with missing not at random response is also discussed.
In Chapter 2, a polynomial spline estimator is proposed for the mean function of dense
functional data together with a simultaneous conﬁdence band which is asymptotically correct. In addition, the spline estimator and its accompanying conﬁdence band enjoy semiparametric eﬃciency in the sense that they are asymptotically the same as if all random
trajectories are observed entirely and without errors. The conﬁdence band is also extended
to the diﬀerence of mean functions of two populations of functional data. Simulation experiments provide strong evidence that corroborates the asymptotic theory while computing
is eﬃcient. The conﬁdence band procedure is illustrated by analyzing the near infrared
spectroscopy data.

A nonparametric estimation of the covariance function for dense functional data using
tensor product B-splines is considered in Chapter 3. We develop both local and global
asymptotic distributions for the proposed estimator, and show that our estimator is as
eﬃcient as an “oracle” estimator. Monte Carlo simulation experiments and two real data
examples are also provided to illustrate the proposed method in this chapter.
In Chapter 4, we develop a new procedure to construct simultaneous conﬁdence bands for
derivatives of mean curves in FDA. The technique involves polynomial splines that provide
an approximation to the derivatives of the mean functions, the covariance functions and the
associated eigenfunctions. The conﬁdence band procedure is illustrated through numerical
simulation studies and a real life example.
In Chapter 5, we consider data generated from a longitudinal study with potentially
non random missing data. For these data, a joint model for the missing data process and
the outcome process, is found to be at best weakly identiﬁable. Due to this identiﬁability
concerns, tests concerning the parameters of interest may not be able to use conventional
theories and it may not be clear how to assess statistical signiﬁcance. We extend the literature by developing a testing procedure that can be used to evaluate hypotheses under
non and weakly identiﬁable semiparametric models. We derive the limiting distribution of
this statistic and propose theoretically justiﬁed resampling approaches to approximate its
asymptotic distribution. The methodology’s practical utility is illustrated in simulations and
an analysis of quality-of-life outcomes from a longitudinal study on breast cancer.

To my beloved parents

iv

ACKNOWLEDGMENTS

Foremost, I would like to express my sincere gratitude to my advisor Professor Lijian Yang
for the continuous support of my Ph.D study and research, for his patience, motivation,
enthusiasm, and immense knowledge. His guidance helped me in all the time of research and
writing of this thesis. I could not have imagined having a better mentor for my Ph.D study.
I would like to express my gratitude to my co-advisor Professor David Todem, for oﬀering
invaluable assistance, support and guidance. He provided me a fabulous opportunity to study
biostatistics topics. Moreover, I could not ﬁnish my job application promptly without his
generous help.
Besides my advisors, I would like to thank the rest of my thesis committee: Professors
Lyudmila Sakhanenko and Lifeng Wang for their hard questions, encourage and insightful
comments. My sincere thanks also goes to Professor Jiaguo Qi in the Center for Global
Change and Earth Observations at MSU, for oﬀering me the collaboration opportunities in
his group and leading me working on diverse exciting projects.
Thanks also goes to the entire faculty and staﬀ members in the Department of Statistics
and Probability who have taught me and help me during my study at MSU. My special
thanks goes to Professor James Stapleton, for listening to my presentation rehearsal many
times and helpful suggestions which leads to my successful presentations.
Thanks to the graduate school and the Department of Statistics and Probability who
provided me with the Dissertation Continuation Fellowship and Dissertation Completion
Fellowship for working on the dissertation. This dissertation ia also supported in part by
NSF award DMS 0706518, 1007594 and NCI/NIH K-award, 1K01 CA131259.
I also would like to thank my academic family members: Dr. Lan Xue, Dr. Jing Wang,
v

Dr. Li Wang, Dr. Rong Liu, Dr. Qiongxia Song and Shuzhuan Zheng for the stimulating
discussions, for the sleepless nights we were working together before submitting papers, and
for all the fun we have had in the last ﬁve years.
Last but not the least, I would like to thank my beloved parents Jinxiang Cao and Lazhen
Sun, for giving birth to me at the ﬁrst place and supporting me spiritually throughout my
life.

vi

TABLE OF CONTENTS

List of Tables . . . . . . . . . . . . . . .
List of Figures

. . . . . . . . . . . . . . . . . . .

ix

. . . . . . . . . . . . . . . . . .

xi

. . . . . . . . . . . . . .

Chapter 1 Introduction . . . . . . . . . . . . .
1.1 Functional data analysis . . . . . . . . . . .
1.1.1 The basics . . . . . . . . . . . . . . .
1.1.2 Polynomial spline . . . . . . . . . . .
1.1.3 The near infrared spectroscopy data .
1.1.4 Speech recognition data . . . . . . .
1.2 Longitudinal data analysis . . . . . . . . . .
1.2.1 Problems . . . . . . . . . . . . . . .
1.2.2 The breast cancer data . . . . . . . .

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

1
1
1
2
4
4
5
5
6

Functions
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

7
7
11
17
20
21
31

Chapter 3 Conﬁdence Envelopes for Covariance Functions
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Spline covariance estimation . . . . . . . . . . . . . . . . . .
3.2.1 Data structure and model assumptions . . . . . . . .
3.2.2 Spline covariance estimator . . . . . . . . . . . . . .
3.3 Asymptotic theory and simultaneous conﬁdence envelopes .
3.3.1 Assumptions and the oracle property . . . . . . . . .
3.3.2 Asymptotic conﬁdence envelopes . . . . . . . . . . .
3.3.3 Extension to two-sample test problems . . . . . . . .
3.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . .
3.5 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

47
47
50
50
51
54
54
56
58
59
61

Chapter 2 Conﬁdence Bands for
2.1 Introduction . . . . . . . . . .
2.2 Main results . . . . . . . . . .
2.3 Error decomposition . . . . .
2.4 Implementation . . . . . . . .
2.5 Simulation . . . . . . . . . . .
2.6 Empirical example . . . . . .

Mean
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .

vii

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

3.6

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

65
65
66
71

Chapter 4 Spline Conﬁdence Bands for Functional Derivatives .
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2 Models and spline estimators . . . . . . . . . . . . . . . . . . . . .
4.2.1 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.2 Spline estimators . . . . . . . . . . . . . . . . . . . . . . .
4.2.3 Convergence rate . . . . . . . . . . . . . . . . . . . . . . .
4.3 Conﬁdence bands . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.1 Asymptotic conﬁdence bands . . . . . . . . . . . . . . . .
4.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . .
4.4 Numerical studies . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.4.1 Simulated examples . . . . . . . . . . . . . . . . . . . . . .
4.4.2 Tecator data . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

96
96
98
98
99
101
103
103
104
107
107
117

Chapter 5 Testing Hypotheses Under Weakly Identiﬁability
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2 The method . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.1 The general framework . . . . . . . . . . . . . . . . . .
5.2.2 Large sample properties of ˆ
θ(β) . . . . . . . . . . . . .
5.2.3 Global sensitivity testing . . . . . . . . . . . . . . . .
5.3 Numerical studies . . . . . . . . . . . . . . . . . . . . . . . .
5.3.1 Pseudo-likelihood models with missing data . . . . . .
5.3.2 Real data analysis . . . . . . . . . . . . . . . . . . . .
5.3.3 Simulation study . . . . . . . . . . . . . . . . . . . . .
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

137
137
143
143
144
147
153
153
156
160
165

3.7

Empirical examples . . . . . . . . . . . . .
3.6.1 Tecator near infrared spectra data .
3.6.2 Speech recognition data . . . . . .
Summary . . . . . . . . . . . . . . . . . .

Bibliography . . . . . . . . . . . . . . .

viii

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

. . . . . . . . . . . . . . . . . . . 173

LIST OF TABLES

Table 2.1

Coverage frequencies from 500 replications using linear spline (??)
with p = 2, Nm = [cn1/(2p) log(n)] and σ = 0.5. . . . . . . . . . . .

25

Coverage frequencies from 500 replications using linear spline (??)
with p = 2, Nm = [cn1/(2p) log(n)] and σ = 0.3. . . . . . . . . . . .

25

Coverage frequencies from 500 replications using cubic spline (??)
with p = 4, Nm = [cn1/(2p) log(n)] and σ = 0.5. . . . . . . . . . . .

26

Coverage frequencies from 500 replications using cubic spline (??)
with p = 4, Nm = [cn1/(2p) log(n)] and σ = 0.3. . . . . . . . . . . .

27

Coverage frequencies from 500 replications using least squares Bonferroni band and least squares Bootstrap band. . . . . . . . . . . . .

28

Table 2.6

Five number summary of ratios of conﬁdence band widths. . . . . .

29

Table 2.7

Empirical power and type I error of two-sample test using cubic spline. 30

Table 3.1

Simulation results: uniform coverage rates from 500 replications. . .

Table 4.1

Uniform coverage rates using cubic spline in Model I . . . . . . . . . 108

Table 4.2

Uniform coverage rates using cubic spline in Model II . . . . . . . . 117

Table 5.1

P-values for evaluating the Treatment and Time eﬀects using data
from Study VI of the IBCSG trials . . . . . . . . . . . . . . . . . . . 159

Table 5.2

Empirical type I error and power of the inﬁmum test and Wald test

Table 2.2

Table 2.3

Table 2.4

Table 2.5

ix

95

164

Table 5.3

Empirical power of the inﬁmum test to detect the interaction eﬀect
α3 = 1 for two ranges Ξ of β with true value being β 0 = −1 . . . . 166
˜

Table 5.4

Empirical power of the inﬁmum test to detect the interaction eﬀect
α3 = 0.7 for two sets Ξ of β with true value being β 0 = −1 . . . . . 166
˜

x

LIST OF FIGURES

Figure 2.1

Plots of the linear spline estimator for simulated data and 95% conﬁdence bands for m(x) . . . . . . . . . . . . . . . . . . . . . . . . .

23

Plots of the cubic spline estimator for simulated data and 95% conﬁdence bands for m(x) . . . . . . . . . . . . . . . . . . . . . . . . .

24

Left: Plot of Tecator data. Right: Sample curves for the Tecator
data. Each class has 10 sample curves. . . . . . . . . . . . . . . . .

32

Plots of the ﬁtted linear and cubic spline regressions of m1 (x)−m2 (x)
and conﬁdence bands . . . . . . . . . . . . . . . . . . . . . . . . . .

34

Plots of the true covariance functions of the simulated data and their
95% conﬁdence envelopes: n=200, N=100, σ=0.5. p1 = p2 = 2 . . .

63

Plots of the true covariance functions of the simulated data and their
95% conﬁdence envelopes: n=200, N=100, σ=0.5. p1 = p2 = 4 . . .

64

Figure 3.3

Plot of the Tecator data . . . . . . . . . . . . . . . . . . . . . . . . .

67

Figure 3.4

Plots of the cubic tensor spline estimator for the Tecator data . . . .

68

Figure 3.5

Plots of hypothesis test (3.12) results . . . . . . . . . . . . . . . . .

69

Figure 3.6

Plots of the speech recognition data . . . . . . . . . . . . . . . . . .

71

Figure 3.7

Plots of tensor spline estimators for speech recognition data sets . .

72

Figure 3.8

Plots of hypothesis test (3.13) results . . . . . . . . . . . . . . . . .

73

Figure 2.2

Figure 2.3

Figure 2.4

Figure 3.1

Figure 3.2

xi

Figure 4.1

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model I. n = 30, N = 30. . . . . . . . . . . . . . . . . . . 109

Figure 4.2

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model I. n = 30, N = 60. . . . . . . . . . . . . . . . . . . 110

Figure 4.3

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model I. n = 50, N = 50. . . . . . . . . . . . . . . . . . . 111

Figure 4.4

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model I. n = 50, N = 100. . . . . . . . . . . . . . . . . . 112

Figure 4.5

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model II. n = 30, N = 30. . . . . . . . . . . . . . . . . . 113

Figure 4.6

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model II. n = 30, N = 60. . . . . . . . . . . . . . . . . . 114

Figure 4.7

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model II. n = 50, N = 50. . . . . . . . . . . . . . . . . . 115

Figure 4.8

Plots of the cubic spline estimators and 99% conﬁdence bands of
m(1) (t) in Model II. n = 50, N = 100. . . . . . . . . . . . . . . . . . 116

Figure 4.9

Plots of the cubic spline estimators and 99% conﬁdence bands . . . 119

Figure 5.1

Supremum of the pseudo-likelihood function proﬁled across β . . . . 140

Figure 5.2

Plot of the exact and the resampled CDF . . . . . . . . . . . . . . . 152

Figure 5.3

Simultaneous conﬁdence bands for time eﬀects on PACIS and Mood

xii

161

Chapter 1
Introduction

1.1
1.1.1

Functional data analysis
The basics

Functional data analysis (FDA) has recently become a focal area in statistical research, as
recent technological progress in measuring devices now allow one to observe spatiotemporal
phenomena on arbitrarily ﬁne grids, that is, almost in a continuous manner. This area
remains distinct due to its usefulness to climatology, medicine, meteorology, etc.
The functional data that we consider are a collection of trajectories η i (x) n which
i=1
are i.i.d. realizations of a smooth random function η(x), deﬁned on a continuous interval X .
Assume that {η(x), x ∈ X } is an L2 process, i.e. E X η 2 (x)dx < +∞, and deﬁne the mean
and covariance functions m(x) = E{η(x)} and G x, x

= cov η(x), η(x ) . The covari-

ance function is a symmetric nonnegative-deﬁnite function with a spectral decomposition,
G x, x

=

κ λ ψ (x)ψ x , where λ ≥ λ ≥ · · · ≥ 0,
1
2
k
k=1 k k

κ λ < +∞, are the
k=1 k

eigenvalues, κ is either a positive integer or inﬁnity and ψ k (x) κ
k=1 are the corresponding
1

eigenfunctions and form a set of orthonormal functions in L2 (X ). By the Karhunen-Lo`ve
e
representation, η i (x) = m(x) +

κ ξ φ (x), where the random coeﬃcients ξ are unik
k=1 ik k

correlated with mean 0 and variance 1, and the functions φk =

λk ψ k .

Characterizing nonlinear variation in FDA is a challenging problem. In particular, when
random curves are observed on regular dense grids, the existing literature on FDA focuses on
pointwise estimation and inference. This, however, is not suﬃcient to provide understanding
of the variability of the estimator of the whole regression curve, its derivatives and covariance
function, nor can it be used to correctly answer questions about the curve’s or surface’s shape.
In Chapters 2, 3 and 4, we propose oracle estimators and establish asymptotic correctness of the proposed simultaneous conﬁdence bands/envelopes for mean, covariance and its
derivative functions while the number of observations for each subject tends to inﬁnite as
sample size goes to inﬁnite, using various properties of spline smoothing.

1.1.2

Polynomial spline

To describe the spline functions, we ﬁrst introduce a sequence of equally-spaced points
tJ Nm , called interior knots which divide the interval [0, 1] into (Nm + 1) equal subinJ=1
tervals IJ = tJ , tJ+1 , J = 0, ...., Nm − 1, INm = tNm , 1 . For any positive integer p,
introduce left boundary knots t1−p , ..., t0 , and right boundary knots tNm +1 , ..., tNm +p ,

t1−p = · · · = t0 = 0 < t1 < · · · < tNm < 1 = tNm +1 = · · · = tNm +p ,
tJ = Jhm , 0 ≤ J ≤ Nm + 1, hm = 1/ (Nm + 1) ,

in which hm is the distance between neighboring knots. Denote by H(p−2) the space of
p-th order spline space, i.e., p − 2 times continuously diﬀerentiable functions on [0, 1] that
2

are polynomials of degree p − 1 on [tJ , tJ+1 ], J = 0, . . . , Nm .

The J-th B-spline of order p is

BJ,p (x) =

x − tJ+p BJ+1,p−1 (x)
x − tJ BJ,p−1 (x)
−
,
tJ+p−1 − tJ
tJ+p − tJ+1

1 − p ≤ J ≤ Nm ,

for p > 1, with





 1 t ≤x<t


J
J+1

BJ,1 (x) = IJ (x) =




 0



otherwise.

Hence, the B-spline basis of H(−1) , the space of constant splines, are indicator functions
of interval IJ , BJ,1 (x) = IJ (x), J = 0, ...., Nm , while the B-spline basis of H(0) , the space
of linear splines, are BJ,2 (x) = K

x − tJ+1 h−1 , J = −1, ...., Nm , where K (u) =
m

(1 − |u|)+ .

In this work, X = [0, 1] or [0, 1]2 . For any function φ on a domain X , let φ ∞ =
supx∈X |φ (x)|. For any Lebesgue measurable L2 -integrable functions φ and ϕ on [0, 1],
deﬁne their theoretical and empirical inner products as φ, ϕ

=

X φ (x) ϕ (x) dx and

φ, ϕ 2,N = N −1

N φ (j/N ) ϕ (j/N ), with the corresponding theoretical and empirical
j=1
1
−1 N φ2 (j/N ),
inner product norms deﬁned as φ 2 = 0 φ2 (x) dx and φ 2
2
j=1
2,N = N
Nm
respectively. The theoretical and empirical inner product matrices of BJ,p (x)
are
J=1−p
denoted as

Vp =

BJ,p , BJ ,p

Nm
ˆ
, Vp =
J,J =1−p
3

Nm
BJ,p , BJ ,p
.
2,N J,J =1−p

1.1.3

The near infrared spectroscopy data

The near infrared spectroscopy data data are recorded on a Tecator Infrared Food and Feed
Analyzer working in the wavelength range 850 - 1050 nm by the Near Infrared Transmission
(NIT) principle. Each sample contains ﬁnely chopped pure meat with diﬀerent moisture, fat
and protein contents. The task is to predict the fat content of a meat sample on the basis
of its near infrared absorbance spectrum.
The spectral data can be naturally considered as functional data as the number of observation points for each trajectory is relatively large compared with the sample size. In
Chapter 2, we will perform a two-sample test to evaluate whether absorbance from the spectrum diﬀers signiﬁcantly due to diﬀerence in fat content. In Chapter 4, we will study the
behavior of the ﬁrst order derivative of mean absorbance measurements for this data set.

1.1.4

Speech recognition data

Speech recognition data were extracted from the TIMIT database (TIMIT Acoustic-Phonetic
Continuous Speech Corpus, NTIS, US Dept of Commerce) which is a widely used resource for
research in speech recognition. The data set we use was formed by selecting ﬁve phonemes
for classiﬁcation based on digitized speech from this database. From continuous speech
of 50 male speakers, 4509 speech frames of 32 msec duration were selected. From each
speech frame, a log-periodogram was used as transformation for casting speech data in a
form suitable for speech recognition. The ﬁve phonemes in this data set are transcribed as
follows: “sh” as in “she”, “dcl” as in “dark”, “iy” as the vowel in “she”, “aa” as the vowel
in “dark”, and “ao” as the ﬁrst vowel in “water”. For illustration purpose, we focus on
the “sh” and “ao” phoneme classes as representatives of consonants and vowels. There are
4

n1 = 872 log-periodograms in the “sh” class, and n2 = 1022 log-periodograms in the “ao”
group. Each log-periodogram consists N = 256 equally spaced points.
This data set was ﬁrst analyzed by Hastie et al. (1995) using penalized linear discriminant
analysis. One of the basic assumptions is that the covariance functions are the same for
diﬀerent classes. In Chapter 3, we would like to test the equal covariance assumption formally
by the proposed tensor product spline conﬁdence envelopes.

1.2
1.2.1

Longitudinal data analysis
Problems

Identiﬁability issues commonly arise with non-random missing data, where the parameters
in the model for the missingness may not be jointly identiﬁable with those in the model for
the outcomes of interest using only the observed data. Analyses which assume identiﬁability
may be unreliable, with the joint selection and outcome model yielding ﬂat “estimation”
surfaces potentially having multiple modes.
In Chapter 5, we consider these missing data issues when analyzing longitudinal data
with informative dropout employing the model of Troxel, Lipsitz and Harrington (1998b).
The model is semiparametric, with the parameter being estimated denoted by (θ, β). The
parameter β is the selection parameter that measures the extent of non-randomness of the
missing data mechanism and θ consists of the remaining ﬁnite dimensional parameters of
the selection and outcome models. The hypotheses of interest concern covariate eﬀects on
the outcome, which are contained in θ. We extend the proﬁling idea to arbitrary estimating
functions involving θ and β but which do not require a complete parametric model speciﬁca5

tion. The generalization of the inﬁmum test and conﬁdence bands to non-likelihood settings
is nontrivial. We present generic conditions which establish the large sample properties of
the estimating function for θ proﬁled on β, including the uniform consistency and weak convergence of the θ estimator as a function of β. Owing to the complexity of the asymptotic
distributions of the inﬁmum test and conﬁdence bands, resampling is needed. A theoretically
justiﬁed procedure is discussed for approximating such distributions.

1.2.2

The breast cancer data

These breast cancer data are from the International Breast Cancer Study Group-IBCSG,
previously reported by H¨rny et al. (1992); and Troxel et al. (1998b). This is a group of
u
randomized breast cancer studies with primary endpoints being survival and relapse; and
quality of life being a secondary endpoint. One study, Study VI, is a randomized trial of
adjuvant chemotherapy following surgery for the treatment of breast cancer. In this study, 4
treatments (A, B, C and D) were randomly assigned to 431 pre-menopausal cancer patients
and several domains of quality of life were assessed. In this thesis, we focus on three qualityof-life domains; 1) PACIS (perceived adjustment to chronic illness scale), 2) Mood and 3)
Appetite. A full description of Study VI and other IBCSG trials may be found elsewhere
(H¨rny et al., 1992; Troxel et al., 1998a). In Chapter 5, we assess the treatment and time
u
eﬀects on the mean quality of life by apply the proposed test and resampling procedure.

6

Chapter 2
Conﬁdence Bands for Mean Functions

2.1

Introduction

In functional data analysis problems, estimation of mean function is the fundamental ﬁrst
step; see Cardot (2000); Rice and Wu (2001); Cuevas, Febrero and Frainman (2006); Ferraty
and Vieu (2006); Degras (2011) and Ma, Yang and Carroll (2012) for example. According
to Ramsay and Silverman (2005), functional data consist of a collection of i.i.d. realizations
η i (x) n of a smooth random function η(x), with unknown mean function Eη(x) = m(x)
i=1
and covariance function G x, x

= cov η(x), η(x ) . Although the domain of η(·) is an

entire interval X , the recording of each random curve η i (x) is only over a ﬁnite number Ni
of points in X , and contaminated with measurement errors. Without loss of generality, we
take X = [0, 1].
Denote by Yij the j-th observation of the random curve η i (·) at time point Xij , 1 ≤ i ≤
n, 1 ≤ j ≤ Ni . Although we refer to variable Xij as time, it could also be other numerical
measures, such as wavelength in Section 3.6. In this work, we examine the equally spaced
7

dense design, in other words, Xij = j/N, 1 ≤ i ≤ n, 1 ≤ j ≤ N with N going to inﬁnity.
For the i-th subject, i = 1, 2, ..., n, its sample path

j/N, Yij

is the noisy realization of

the continuous time stochastic process η i (x) in the sense that Yij = η i (j/N ) + σ (j/N ) εij ,
= 0, E(ε2 ) = 1, and η i (x), x ∈ [0, 1] are i.i.d. copies
ij
of the process {η(x), x ∈ [0, 1]} satisfying E [0,1] η 2 (x)dx < +∞.

with errors εij satisfying E εij

For the standard process {η(x), x ∈ [0, 1]}, let sequences λk ∞ , ψ k (x) ∞ be the
k=1
k=1
eigenvalues and eigenfunctions of G x, x
∞ λ < ∞,
k=1 k
tion G x, x

=

respectively, in which λ1 ≥ λ2 ≥ · · · ≥ 0,

ψ k ∞ form an orthonormal basis of L2 ([0, 1]) and covariance funck=1
∞ λ ψ (x)ψ x , which implies that
k
k=1 k k

G x, x

ψk x

dx =

λk ψ k (x). The process η i (x), x ∈ [0, 1] has the Karhunen-Lo`ve L2 representation η i (x) =
e
m(x) +

∞ ξ φ (x), where the random coeﬃcients ξ are uncorrelated with mean 0
ik
k=1 ik k

and variance 1, and φk =

λk ψ k . In what follows, we assume that λk = 0, for k > κ,

where κ is a positive integer or ∞, thus G(x, x ) =

κ φ (x)φ x
k
k=1 k

and the model that

we consider is
Yij = m (j/N ) +

κ
ξ φ (j/N ) + σ (j/N ) εij .
k=1 ik k

(2.1)

Although the sequences λk κ , φk (·) κ
k=1
k=1 and the random coeﬃcients ξ ik exist mathematically, they are unknown or unobservable.
The existing literature focuses on two data types. Yao, M¨ller and Wang (2005) studied
u
sparse longitudinal data for which Ni , i.e. the number of observations for the i-th curve, is
bounded and follows a given distribution, in which case Ma, Yang and Carroll (2012) obtained
asymptotically simultaneous conﬁdence band for the mean function of the functional data,
using piecewise constant spline estimation. Li and Hsing (2010a) established uniform convergence rate for local linear estimators of mean and covariance function of dense functional
8

data, where min1≤i≤n Ni

(n/logn)1/4 , as n → ∞, similar to our Assumption (A3), but

did not provide asymptotic distribution of maximal deviation or simultaneous conﬁdence
bands. Degras (2011) built asymptotically correct simultaneous conﬁdence bands for dense
functional data using local linear estimators. Bunea, Ivanescu and Wegkamp (2011) proposed asymptotically conservative rather than correct conﬁdence set for the mean function
of Gaussian functional data.

In this chapter, we propose polynomial spline conﬁdence bands for the mean function
based on dense functional data. In function estimation problems, simultaneous conﬁdence
bands are an important tool to address the variability in the mean curve, see Zhao and
Wu (2008); Zhou, Shen and Wolfe (1998) and Zhou and Wu (2010) for related theory and
applications. The fact that simultaneous conﬁdence bands have not been widely used for
functional data analysis is certainly not due to lack of interesting applications, but due to
the greater technical diﬃculty to formulate such bands for functional data and establish
their theoretical properties. In this work, we have established asymptotic correctness of
the proposed conﬁdence bands using various properties of spline smoothing. The spline
estimators and the accompanying conﬁdence bands are asymptotically the same as if all
the n random curves are recorded over the entire interval, without measurement errors.
They are oracally eﬃcient despite the use of spline smoothing, see Remark 2.2.1. This
provides partial theoretical justiﬁcation for treating functional data as perfectly recorded
random curves over the entire data range, as in Ferraty and Vieu (2006). Theorem 3 of
Hall, M¨ller and Wang (2006) stated mean square (rather than the stronger uniform) oracle
u
eﬃciency for local linear estimation of eigenfunctions and eigenvalues (rather than the mean
function), under assumptions similar to ours, but provided only an outline of proof. Among
9

the existing works on functional data analysis, Ma, Yang and Carroll (2012) proposed the
simultaneous conﬁdence bands for sparse functional data. However, their result does not
enjoy the oracle eﬃciency stated in Theorem 2.2.1, since there are not enough observations
for each subject to obtain a good estimate of the individual trajectories. As a result, it has
the slow nonparametric convergence rate of n−1/3 logn, instead of the parametric rate of
n−1/2 as in the present work. This essential diﬀerence completely separates dense functional
data from sparse ones.

The aforementioned conﬁdence bands are also extended to the diﬀerence of two regression
functions. This is motivated by Li and Yu (2008), who applied functional segment discriminant analysis to a Tecator data set, see Figure 2.3. In this data set, each observation (meat)
consists of a 100-channel absorbance spectrum in the wavelength with diﬀerent fat, water
and protein percent. Li and Yu (2008) used the spectra to predict whether the fat percentage
is greater than 20%. On the ﬂip side, we are interested in building a 100 (1 − α) % conﬁdence
band for the diﬀerence between regression functions from the spectra of the less than 20%
fat group and the higher than 20% fat group. If this 100 (1 − α) % conﬁdence band covers
the zero line, one accepts the null hypothesis of no diﬀerence between the two groups, with
p-value no greater than α. Test for equality between two groups of curves based on the
adaptive Neyman test and wavelet thresholding techniques were proposed in Fan and Lin
(1998), who did not provide an estimator of the diﬀerence of the two mean functions nor a
simultaneous conﬁdence band for such estimator. As a result, their test did not extend to
testing other important hypotheses on the diﬀerence of the two mean functions while our
Theorem 2.2.3 provides a benchmark for all such testing. More recently, Benko, H¨rdle and
a
Kneip (2009) developed two-sample bootstrap tests for the equality of eigenfunctions, eigen10

values and mean functions by using common functional principal components and bootstrap
tests.
This chapter is organized as follows. Section 2.2 states main theoretical results on conﬁdence bands constructed from polynomial splines. Section 2.3 provides further insights into
the error structure of spline estimators. The actual steps to implement the conﬁdence bands
are provided in Section 2.4. A simulation study is presented in Section 2.5, and an empirical
illustration on how to use the proposed spline conﬁdence band for inference is reported in
Section 2.6. Technical proofs are collected in the Appendix.

2.2

Main results

To describe the spline functions, we ﬁrst deﬁne a sequence of equally-spaced points tJ Nm ,
J=1
called interior knots, which have been introduced in Chapter 1. Denote by H(p−2) the
space of p-th order spline space, i.e., p − 2 times continuously diﬀerentiable functions on
[0, 1] that are polynomials of degree p − 1 on [tJ , tJ+1 ], J = 0, . . . , Nm . Then H(p−2) =
Nm
J=1−p bJ,p BJ,p (x) , bJ,p ∈ R, x ∈ [0, 1] , where BJ,p is the J-th B-spline basis of order
p as deﬁned in de Boor (2001).
We propose to estimate the mean function m(x) by

mp (x) =
ˆ

argmin
g(·)∈H(p−2)

n
i=1

2
N
Yij − g (j/N ) .
j=1

The technical assumptions we need are as follows:
(A1) The regression function m ∈ C p−1,1 [0, 1], i.e., m(p−1) ∈ C 0,1 [0, 1].
(A2) The standard deviation function σ (x) ∈ C 0,µ [0, 1] for some µ ∈ (0, 1].
11

(2.2)

(A3) As n → ∞, N −1 n1/(2p) → 0 and N = O nθ

for some θ > 1/ (2p); the number of

1/2
−p
−1
interior knots Nm satisﬁes N Nm → ∞, Nm n1/2 → 0, N −1/2 Nm logn → 0 or
−1/2
p
equivalently N hm → ∞, hm n1/2 → 0, N −1/2 hm logn → 0.
(A4) There exists CG > 0 such that G (x, x) ≥ CG , x ∈ [0, 1];
µ
C 0,µ [0, 1] , κ
k=1 φk ∞ < ∞ and as n → ∞, hm

for k ∈ {1, . . . , κ}, φk (x) ∈
κn φ
= o (1) for a
k=1 k 0,µ

sequence {κn }∞ of increasing integers, with limn→∞ κn = κ and the constant
n=1
µ ∈ (0, 1] as in Assumption (A2). In particular,

κ
k=κn +1 φk ∞ = o (1).

(A5) There are constants C1 , C2 ∈ (0, +∞), γ 1 , γ 2 ∈ (1, +∞), β ∈ (0, 1/2) and i.i.d.
N (0, 1) variables

n,κ
n,N
Zik,ξ
, Zij,ε
such that
i=1,k=1
i=1,j=1

max P max
1≤t≤n
1≤k≤κ

t
ξ −
i=1 ik

t
Z
> C1 nβ < C2 n−γ 1 ,
i=1 ik,ξ

(2.3)

P

t
ε −
i=1 ij

t
Z
> C1 nβ
i=1 ij,ε

(2.4)

max
max
1≤j≤N 1≤t≤n

< C2 n−γ 2 .

Assumptions (A1)-(A2) are typical for spline smoothing, see Huang and Yang (2004),
Xue and Yang (2006), Wang and Yang (2009a), Liu and Yang (2010) and Ma and Yang
(2011). Assumption (A3) concerns the number of observations for each subject, and the
number of knots of B-splines. Assumption (A4) ensures that the principal components have
collectively bounded smoothness. Assumption (A5) provides Gaussian approximation of
estimation error process, and is ensured by the following elementary assumption:
τ
(A5 ) There exist τ 1 > 4, τ 2 > 4+2θ such that E ξ ik τ 1 +E εij 2 < +∞, for 1 ≤ i < ∞,
1 ≤ k ≤ κ, 1 ≤ j < ∞. The number κ of nonzero eigenvalues is ﬁnite or κ is inﬁnite
12

while the variables

ξ ik 1≤i<∞,1≤k<∞ are i.i.d..

Degras (2011) makes a restrictive assumption (A.2) on the H¨lder continuity of the
o
stochastic process η(x) = m(x) +

∞ ξ φ (x). It is elementary to construct examples
k=1 k k

where our Assumptions (A4) and (A5) are satisﬁed while assumption (A.2) of Degras (2011)
is not.
The part of Assumption (A4) on φk ’s holds trivially if κ is ﬁnite and all φk (x) ∈
C 0,µ [0, 1]. Note also that by deﬁnition, φk =

λk ψ k , φk ∞ =

λk ψ k ∞ , φk 0,µ =

λk ψ k 0,µ , in which ψ k ∞ form an orthonormal basis of L2 ([0, 1]), hence, Assumpk=1
tion (A4) is fulﬁlled for κ = ∞ as long as λk decreases to zero suﬃciently fast. Following one
Referee’s suggestion, we provide the following example. One takes λk = ρ2[k/2] , k = 1, 2, ...
for any ρ ∈ (0, 1), with ψ k ∞ the canonical orthonormal Fourier basis of L2 ([0, 1])
k=1
√
ψ 1 (x) ≡ 1, ψ 2k+1 (x) ≡ 2 cos (kπx)
√
ψ 2k (x) ≡
2 sin (kπx) , k = 1, 2, ..., x ∈ [0, 1] .

In this case,

∞
k=1 φk ∞ = 1 +

∞ ρk √2 + √2 = 1 + 2√2ρ (1 − ρ)−1 < ∞, while
k=1

for any {κn }∞ with κn increasing, odd and κn → ∞, and Lipschitz order µ = 1
n=1

hm

√
κn
(κn −1)/2 k √
2kπ + 2kπ
φk 0,1 = hm
ρ
k=1
k=1
√
√
∞
≤ 2 2πhm ρ
ρk−1 k = 2 2πhm (1 − ρ)−2
k=1
= O (hm ) = o (1) .

Denote by ζ (x), x ∈ [0, 1] a standardized Gaussian process such that Eζ (x) ≡ 0,
13

Eζ 2 (x) ≡ 1, x ∈ [0, 1] with covariance function

Eζ (x) ζ x

= G x, x

G (x, x) G x , x

−1/2

, x, x ∈ [0, 1]

and deﬁne the 100 × (1 − α)-th percentile of the absolute maxima distribution of ζ (x),
∀x ∈ [0, 1], i.e., P supx∈[0,1] |ζ (x)| ≤ Q1−α = 1 − α, ∀α ∈ (0, 1). Denote by z1−α/2 the
100 (1 − α/2)-th percentile of the standard normal distribution. Deﬁne also the following
“infeasible estimator” of function m

n
η (x), x ∈ [0, 1] .
i=1 i

m(x) = η (x) = n−1
¯
¯

(2.5)

The term “infeasible” refers to the fact that m(x) is computed from unknown quantity η i (x),
¯
x ∈ [0, 1], and it would be the natural estimator of m(x) if all the i.i.d. random curves η i (x),
x ∈ [0, 1] were observed, a view taken in Ferraty and Vieu (2006).
We now state our main results in the following theorem.

Theorem 2.2.1. Under Assumptions (A1)-(A5), for ∀α ∈ (0, 1), as n → ∞, the “infeasible
estimator” m(x) converges at the
¯

√
n rate

P supx∈[0,1] n1/2 |m(x) − m(x)| G (x, x)−1/2 ≤ Q1−α → 1 − α,
¯
P n1/2 |m(x) − m(x)| G (x, x)−1/2 ≤ z1−α/2 → 1 − α, ∀x ∈ [0, 1],
¯

while the spline estimator mp is asymptotically equivalent to m up to order n1/2 , i.e.
ˆ
¯

supx∈[0,1] n1/2 m(x) − mp (x) = oP (1) .
¯
ˆ
14

Remark 2.2.1. The signiﬁcance of Theorem 2.2.1 lies in the fact that one does not need to
distinguish between the spline estimator mp and the “infeasible estimator” m in (2.5), which
ˆ
¯
converges with

√
n rate like a parametric estimator. We therefore have established oracle

eﬃciency of the nonparametric estimator mp .
ˆ

Corollary 2.2.2. Under Assumptions (A1)-(A5), as n → ∞, an asymptotic 100 (1 − α) %
correct conﬁdence band for m(x), x ∈ [0, 1] is

mp (x) ± G (x, x)1/2 Q1−α n−1/2 , ∀α ∈ (0, 1)
ˆ

while an asymptotic 100 (1 − α) % pointwise conﬁdence interval for m(x), x ∈ [0, 1], is
mp (x) ± G (x, x)1/2 z1−α/2 n−1/2 .
ˆ
We next describe a two-sample extension of Theorem 2.2.1. Denote two samples indicated
by d = 1, 2, which satisfy

Ydij = md (j/N ) +

κd
ξ φ (j/N ) + σ d (j/N ) εdij ,
k=1 dik dk

with covariance functions Gd (x, x ) =

κd
φ (x)φdk x
k=1 dk

1 ≤ i ≤ nd ,

1≤j≤N

respectively. We denote the

ratio of two sample sizes as r = n1 /n2 and assume that limn1 →∞ r = r > 0.
ˆ
ˆ
For both groups, let m1p (x) and m2p (x) be the order p spline estimates of mean functions
ˆ
ˆ
m1 (x) and m2 (x) by (2.2). Also denote by ζ 12 (x), x ∈ [0, 1] a standardized Gaussian process
such that Eζ 12 (x) ≡ 0, Eζ 2 (x) ≡ 1, x ∈ [0, 1] with covariance function
12

Eζ 12 (x) ζ 12 x

=

G1 x, x + rG2 x, x
{G1 (x, x) + rG2 (x, x)}1/2 G1 x, x + rG2 x, x
15

1/2

, x, x ∈ [0, 1].

Denote by Q12,1−α the (1 − α)-th quantile of the absolute maxima deviation of ζ 12 (x),
x ∈ [0, 1] as above. We mimic the two sample t-test and state the following theorem whose
proof is analogous to that of Theorem 2.2.1.

Theorem 2.2.3. If Assumptions (A1)-(A5) are modiﬁed for each group accordingly, then
for any α ∈ (0, 1), as n1 → ∞, r → r > 0,
ˆ

P





sup
 x∈[0,1]


1/2
n1

m1p − m2p − m1 + m2 (x)
ˆ
ˆ
{(G1 + rG2 ) (x, x)}1/2

≤ Q12,1−α





→ 1 − α.




Theorems 2.2.3 yields uniform asymptotic conﬁdence band for m1 (x) − m2 (x), x ∈ [0, 1].

Corollary 2.2.4. If Assumptions (A1)-(A5) are modiﬁed for each group accordingly, as n1
→ ∞, r → r > 0, a 100 × (1 − α) % asymptotically correct conﬁdence band for m1 (x) −
ˆ
−1/2
Q12,1−α {(G1 + rG2 ) (x, x)}1/2 , ∀α ∈
m2 (x), x ∈ [0, 1] is m1p − m2p (x) ± n1
ˆ
ˆ
(0, 1).

If the conﬁdence band in Corollary 2.2.2 is used to test hypothesis

H0 : m(x) = m0 (x), ∀x ∈ [0, 1] ←→ Ha : m(x) = m0 (x), for some x ∈ [0, 1],

for some given function m0 (x), as one referee pointed out, the asymptotic power of the test is
α under H0 , 1 under H1 due to Theorem 2.2.1. The same can be said for testing hypothesis
about m1 (x) − m2 (x) using the conﬁdence band in Corollary 2.2.4.
16

2.3

Error decomposition

In this section, we break the estimation error mp (x) − m(x) into three terms. We begin by
ˆ
discussing the representation of the spline estimator mp (x) in (2.2).
ˆ
The deﬁnition of mp (x) in (2.2) means that
ˆ
Nm
ˆ
β B (x),
J=1−p J,p J,p

mp (x) ≡
ˆ

ˆ
ˆ
with coeﬃcients β 1−p,p , ..., β Nm ,p

T

solving the following least squares problem

T
ˆ
ˆ
β 1−p,p , ..., β Nm ,p
n
=

argmin


N 

β 1−p,p ,...,β Nm ,p ∈RNm +p i=1 j=1

Nm

Y −
 ij

β J,p BJ,p (j/N )

2


.(2.6)



J=1−p

Applying elementary algebra, one obtains

mp (x) = B1−p,p (x) , . . . , BNm ,p (x)
ˆ
¯
¯
¯
where Y = Y.1 , . . . , Y.N T , Y.j = n−1

−1 T
T
X Y,
X X

(2.7)

n Y , 1 ≤ j ≤ N , and the design matrix X is
i=1 ij






 B1−p,p (1/N ) · · · BN ,p (1/N )
m




X=

···
···
···





B1−p,p (N/N ) · · · BNm ,p (N/N )
17















.

N ×(Nm +p)

Projecting via (2.7) the relationship in model (2.1) onto the linear subspace of RNm +p
spanned by BJ,p (j/N )
, we obtain the following crucial decomposi1≤j≤N,1−p≤J≤Nm
tion in the space H(p−2) of spline functions:

mp (x) = mp (x) + ep (x) + ˜p (x),
ˆ
˜
˜
ξ

(2.8)

where

mp (x) =
˜
˜ (x) =
ξp

Nm
Nm
˜
β J,p BJ,p (x), ˜p (x) =
ε
a B (x),
˜
J=1−p
J=1−p J,p J,p
κ ˜
Nm
ξ (x), ˜k,p (x) =
ξ
τ
˜
B (x).
k=1 k,p
J=1−p k,J,p J,p

(2.9)

T
T
T
˜
˜
The vectors {β 1−p , ..., β Nm } , a1−p , ..., aNm
˜
˜
and τ k,1−p , ..., τ k,N
˜
˜
in (2.9)
m
are solutions to (2.6) with Yij replaced by m (j/N ), σ (j/N ) εij and ξ ik φk (j/N ) respectively.
Alternatively,

mp (x) =
˜

B1−p,p (x) , . . . , BNm ,p (x)

−1 T
T
X m
X X

ep (x) =
˜

B1−p,p (x) , . . . , BNm ,p (x)

−1 T
T
X e
X X

˜
ξ k,p (x) = ¯.k B1−p,p (x) , . . . , BNm ,p (x)
ξ

−1 T
T
X X
X φk , 1 ≤ k ≤ κ

T
in which m = (m (1/N ) , . . . m (N/N )) is the signal vector,
T
e = (σ (1/N ) ¯.1 , . . . , σ (N/N ) ¯.N ) , ¯.j = n−1 n εij , 1 ≤ j ≤ N is the noise
ε
ε
ε
i=1
T
vector and φk = (φk (1/N ) , . . . , φk (N/N )) are the eigenfunction vectors, and ¯.k =
ξ
n−1

n ξ , 1 ≤ k ≤ κ.
i=1 ik

We cite next an important result from de Boor (2001), p. 149.
18

Theorem 2.3.1. There is an absolute constant Cp−1,µ > 0 such that for every φ ∈
C p−1,µ [0, 1] for some µ ∈ (0, 1], there exists a function g ∈ H(p−1) [0, 1] for which

µ+p−1
hm
.
g − φ ∞ ≤ Cp−1,µ φ(p−1)
0,µ
The next three propositions concern mp (x), ep (x) and ˜p (x) given in (2.8).
˜
˜
ξ
Proposition 2.3.1. Under Assumptions (A1) and (A3), as n → ∞

supx∈[0,1] n1/2 mp (x) − m(x) = o (1) .
˜

(2.10)

Proposition 2.3.2. Under Assumptions (A2)-(A4), as n → ∞

supx∈[0,1] n1/2 ep (x) = oP (1) .
˜

(2.11)

Proposition 2.3.3. Under Assumptions (A2)-(A4), as n → ∞

supx∈[0,1] n1/2 ˜p (x) − (m(x) − m (x)) = oP (1)
ξ
¯

(2.12)

also for any α ∈ (0, 1)

P supx∈[0,1] n1/2 ˜p (x) G (x, x)−1/2 ≤ Q1−α
ξ

→ 1 − α.

(2.13)

Equations (2.10), (2.11) and (2.12) yield the asymptotic eﬃciency of the spline estimator
mp , i.e. supx∈[0,1] n1/2 m(x) − mp (x) = oP (1). The Appendix contains the proofs of
ˆ
¯
ˆ
the above three propositions, which together with (2.8), imply Theorem 2.2.1.
19

2.4

Implementation

This section describes procedures to implement the conﬁdence band of Corollary 2.2.2.
N,n
Given any data set j/N, Yij
from model (2.1), the spline estimator mp (x) is
ˆ
j=1,i=1
obtained from (2.7), the number of interior knots in estimating m(x) is taken to be Nm =
[cn1/(2p) log (n)], in which [a] denotes the integer part of a. Our experience shows that the
choice of constant c = 0.2, 0.3, 0.5, 1, 2 seems quite adequate, and that is what we recommend.
When constructing the conﬁdence bands, one needs to estimate the unknown functions G (·, ·)
and the quantile Q1−α and then plug in these estimators: the same approach is taken in
Ma, Yang and Carroll (2012) and Wang and Yang (2009a).
ˆ
The pilot estimator Gp x, x

ˆ
Gp =

of covariance function G x, x

argmin
g(·,·)∈H(p−2),2

N
j=j

is

C.jj − g j/N, j /N

2

,

with C.jj = n−1

n
ˆ
ˆ
i=1 Yij − mp (j/N ) Yij − mp j /N , 1 ≤ j = j ≤ N and
NG
the tensor product spline space H(p−2),2 = {
b
B (t) B
(s) , b
∈
J ,p
JJ
J,J =1−p JJ J,p
R, t, s ∈ [0, 1]} in which NG = [n1/(2p) log(log(n))]. A detailed discussion of the consistent

property of this plug-in estimator can be found in Chapter 3.
ˆ
In order to estimate Q1−α , one ﬁrst does the eigenfunction decomposition of Gp x, x ,
i.e. N −1

N G (j/N, j /N )ψ (j/N ) = λ ψ j /N , to obtain the estimated eigenvalˆ
ˆ ˆ
ˆ
k
k k
j=1 p

ˆ
ˆ
ues λk and eigenfunctions ψ k . Next, one chooses the number κ of eigenfunctions by using
the following standard and eﬃcient criterion, i.e.
T λ > 0.95 , where {λ }T are the ﬁrst T estimated
ˆ
k k=1
k=1 k
ˆ
ˆ
ˆ
positive eigenvalues. Finally, one simulates ζ b (x) = Gp (x, x)−1/2 κ Zk,b φk (x), where
k=1
κ = argmin1≤l≤T

l
ˆ
k=1 λk /

20

ˆ
φk =

ˆ ˆ
λk ψ k , Zk,b are i.i.d standard normal variables with 1 ≤ k ≤ κ and b = 1, . . . , bM ,

and bM is a preset large integer, the default of which is 1000. We take the maximal absolute
ˆ
ˆ
value for each copy of ζ b (x) and estimates Q1−α by the empirical quantile Q1−α of these
maximum values. We then use the following conﬁdence band

ˆ
ˆ
mp (x) ± n−1/2 Gp (x, x)1/2 Q1−α ,
ˆ

x ∈ [0, 1],

(2.14)

ˆ
for the mean function. We estimate Q12,1−α in a similar way as Q1−α and compute
−1/2 ˆ
Q12,1−α
m1p − m2p (x) ± n1
ˆ
ˆ

ˆ
G1p + rG2p (x, x)
ˆˆ

1/2

,

(2.15)

as conﬁdence band for m1 (x) − m2 (x). Although beyond the scope of this work, as one
referee pointed out, the conﬁdence band in (2.14) is expected to enjoy the same asymptotic
coverage as if true values of Q1−α and G (x, x) were used instead, due to the consistency of
ˆ
Gp (x, x) estimating G (x, x). The same holds for the conﬁdence band in (2.15).

2.5

Simulation

To demonstrate the practical performance of our theoretical results, we perform a set of
simulation studies. Data are generated from model

Yij = m (j/N ) +

2
ξ φ (j/N ) + σεij , 1 ≤ j ≤ N, 1 ≤ i ≤ n,
k=1 ik k

(2.16)

where ξ ik ∼ N (0, 1), k = 1, 2, εij ∼ N (0, 1), for 1 ≤ i ≤ n, 1 ≤ j ≤ N , m(x) = 10 +
sin {2π (x − 1/2)}, φ1 (x) = −2 cos {π (x − 1/2)} and φ2 (x) = sin {π (x − 1/2)}. This setting
21

implies λ1 = 2 and λ2 = 0.5. The noise levels are set to be σ = 0.5 and 0.3. The number of
subjects n is taken to be 60, 100, 200, 300 and 500, and under each sample size the number
of observations per curve is assumed to be N = [n0.25 log2 (n)]. This simulated process has
a similar design as one of the simulation models in Yao, M¨ller and Wang (2005), except
u
that each subject is densely observed. We consider both linear and cubic spline estimators,
and use conﬁdence levels 1 − α = 0.95 and 0.99 for our simultaneous conﬁdence bands. The
constant c in the deﬁnition of Nm in Section 2.4 is taken to be 0.2, 0.3, 0.5, 1 and 2. Each
simulation is repeated 500 times.
Figures 2.1 and 2.2 show the estimated mean functions and their 95% conﬁdence bands
for the true curve m(·) in model (2.16) with σ = 0.3 and n = 100, 200, 300, 500, respectively.
As expected when n increases, the conﬁdence band becomes narrower and the linear and
cubic spline estimators are closer to the true curve.
Tables 2.2 to 2.4 show the empirical frequency that the true curve m (·) is covered by
the linear and cubic spline conﬁdence bands (2.14) at 100 points {1/100, . . . , 99/100, 1}
respectively. At all noise levels, the coverage percentages for the conﬁdence bands are close
to the nominal conﬁdence levels 0.95 and 0.99 for linear splines with c = 0.5, 1 (Tables 2.1
and 2.2), and cubic splines with c = 0.3, 0.5 (Tables 2.3 and 2.4) but decline slightly for
c = 2 and markedly for c = 0.2. The coverage percentages thus depend on the choice of
Nm , and the dependency becomes stronger when sample sizes decrease. For large sample
sizes n = 300, 500, the eﬀect of the choice of Nm on the coverage percentages is negligible.
Although our theory indicates no optimal choice of c, we recommend using c = 0.5 for data
analysis as its performance in simulation for both linear and cubic splines is either optimal
or near optimal.
22

10.5

11.5

n=200

8.5

9.0

9.5

m
8.5

9.0

9.5

m

10.5

11.5

n=100

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.4

0.6

0.8

1.0

0.8

1.0

10.5

11.5

n=500

8.5

9.0

9.5

m
8.5

9.0

9.5

m

10.5

11.5

n=300

0.2

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

Figure 2.1: For interpretation of the references to color in this and all other ﬁgures, the
reader is referred to the electronic version of this dissertation. Plots of the linear spline
estimator (2.2) for simulated data (dashed-dotted line) and 95% conﬁdence bands (2.14)
(upper and lower dashed lines) (2.14) for m(x) (solid lines). In all panels, σ = 0.3.
23

10.5

11.5

n=200

8.5

9.0

9.5

m
8.5

9.0

9.5

m

10.5

11.5

n=100

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.4

0.6

0.8

1.0

0.8

1.0

10.5

11.5

n=500

8.5

9.0

9.5

m
8.5

9.0

9.5

m

10.5

11.5

n=300

0.2

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

Figure 2.2: Plots of the cubic spline estimator (2.2) for simulated data (dashed-dotted line)
and 95% conﬁdence bands (2.14) (upper and lower dashed lines) (2.14) for m(x) (solid lines).
In all panels, σ = 0.3.
24

Table 2.1: Coverage frequencies from 500 replications using linear spline (2.14) with p = 2,
Nm = [cn1/(2p) log(n)] and σ = 0.5.
n

1−α

60

0.950

0.384

0.790

0.990

0.692

0.950

0.894

0.852

0.938

0.970

0.976

0.942

0.184

0.826

0.886

0.884

0.838

0.476

0.936

0.964

0.966

0.944

0.950

0.418

0.856

0.914

0.922

0.862

0.712

0.966

0.976

0.990

0.972

0.950

0.600

0.888

0.920

0.932

0.874

0.834

0.978

0.976

0.980

0.972

0.950

0.772

0.880

0.922

0.886

0.894

0.990

500

0.876

0.990

300

c=2

0.990

200

c=1

0.990

100

c = 0.2 c = 0.3 c = 0.5

0.902

0.964

0.984

0.976

0.976

Table 2.2: Coverage frequencies from 500 replications using linear spline (2.14) with p = 2,
Nm = [cn1/(2p) log(n)] and σ = 0.3.
n

1−α

60

0.950

0.410

0.786

0.990

0.702

0.950

500

0.930

0.914

0.884

0.950

0.972

0.966

0.954

0.198

0.822

0.916

0.916

0.896

0.496

0.940

0.974

0.974

0.968

0.950

0.414

0.862

0.946

0.942

0.926

0.720

0.966

0.984

0.984

0.980

0.950

0.602

0.896

0.940

0.934

0.926

0.990

300

c=2

0.990

200

c=1

0.990

100

c = 0.2 c = 0.3 c = 0.5

0.840

0.982

0.984

0.986

0.980

0.950

0.768

0.888

0.954

0.950

0.942

0.990

0.906

0.968

0.992

0.994

0.988

25

Table 2.3: Coverage frequencies from 500 replications using cubic spline (2.14) with p = 4,
Nm = [cn1/(2p) log(n)] and σ = 0.5.
n

1−α

60

0.950

0.644

0.916

0.990

0.866

0.950

500

0.902

0.890

0.738

0.980

0.958

0.964

0.888

0.596

0.902

0.904

0.876

0.846

0.786

0.970

0.968

0.956

0.952

0.950

0.928

0.942

0.932

0.936

0.904

0.978

0.992

0.982

0.992

0.978

0.950

0.920

0.948

0.926

0.948

0.898

0.990

300

c=2

0.990

200

c=1

0.990

100

c = 0.2 c = 0.3 c = 0.5

0.976

0.986

0.986

0.988

0.980

0.950

0.928

0.922

0.954

0.902

0.898

0.990

0.980

0.982

0.990

0.976

0.978

26

Table 2.4: Coverage frequencies from 500 replications using cubic spline (2.14) with p = 4,
Nm = [cn1/(2p) log(n)] and σ = 0.3.
n

1−α

60

0.950

0.672

0.922

0.990

0.884

0.950

0.940

0.916

0.986

0.986

0.984

0.982

0.610

0.916

0.914

0.914

0.896

0.798

0.980

0.974

0.970

0.964

0.950

0.938

0.952

0.950

0.948

0.934

0.982

0.984

0.992

0.982

0.984

0.950

0.922

0.956

0.948

0.942

0.938

0.982

0.984

0.988

0.984

0.982

0.950

0.928

0.928

0.936

0.932

0.916

0.990

500

0.940

0.990

300

c=2

0.990

200

c=1

0.990

100

c = 0.2 c = 0.3 c = 0.5

0.980

0.982

0.990

0.990

0.992

We compare by simulation the proposed spline conﬁdence band to the least squares
Bonferroni and least squares bootstrap bands in Bunea, Ivanescu and Wegkamp (2011)
(BIW). Table 2.5 presents the empirical frequency that the true curve m (·) for model (2.16)
is covered by these bands at {1/100, . . . , 99/100, 1} respectively as Tables 2.1 and 2.2. The
coverage frequency of the BIW Bonferroni band is much higher than the nominal level making
it too conservative. The coverage frequency of the BIW bootstrap band is consistently lower
than the nominal level by at least 10%, thus not recommended for practical use.
We also compare the widths of the three bands. For each replication, we calculate the
ratios of widths of the two BIW bands against the spline band at {1/100, . . . , 99/100, 1}
and then average these 100 ratios. Table 2.6 shows the ﬁve number summary of these 500
averaged ratios for σ = 0.3 and p = 4. The BIW Bonferroni band is much wider than
27

Table 2.5: Coverage frequencies from 500 replications using least squares Bonferroni band
and least squares Bootstrap band.
Coverage frequency

500

σ = 0.3

0.950

0.990

0.988

0.742

0.744

0.994

0.994

0.856

0.864

0.950

0.996

0.998

0.678

0.712

0.998

1.000

0.860

0.870

0.950

0.988

0.992

0.710

0.734

1.000

1.000

0.856

0.888

0.950

0.988

0.998

0.704

0.720

0.990

300

σ = 0.5

0.990

200

σ = 0.3

0.990

100

least squares bootstrap

0.990

60

least squares Bonferroni
σ = 0.5

n

Coverage frequency

1.000

1.000

0.868

0.870

0.950

0.996

0.998

0.718

0.732

0.990

1.000

1.000

0.856

0.860

1−α

28

Table 2.6: Five number summary of ratios of conﬁdence band widths.
least squares Bonferroni/cubic spline

least squares bootstrap/cubic spline

n

1−α

Min.

Q1

Med.

Q3

Max.

Min.

Q1

Med.

Q3

Max.

60

0.950

0.964

1.219

1.299

1.397

1.845

0.522

0.667

0.716

0.770

0.967

0.990

0.907

1.114

1.188

1.285

1.730

0.527

0.662

0.715

0.770

1.048

0.950

0.995

1.263

1.331

1.415

1.684

0.565

0.675

0.714

0.754

0.888

0.990

0.910

1.148

1.219

1.295

1.603

0.536

0.665

0.708

0.752

0.925

0.950

1.169

1.326

1.383

1.433

1.653

0.600

0.683

0.715

0.743

0.855

0.990

1.045

1.197

1.250

1.300

1.507

0.557

0.668

0.702

0.740

0.888

0.950

1.169

1.363

1.412

1.462

1.663

0.574

0.690

0.717

0.742

0.838

0.990

1.067

1.228

1.277

1.322

1.509

0.587

0.676

0.707

0.739

0.850

0.950

1.273

1.395

1.432

1.476

1.601

0.620

0.691

0.714

0.737

0.818

0.990

1.132

1.243

1.288

1.334

1.465

0.607

0.674

0.707

0.734

0.839

100

200

300

500

cubic spline band, making it undesirable. While the BIW bootstrap band is narrower, we
have mentioned previously that its coverage frequency is too low to be useful in practice.
Simulation for other cases (e.g. p = 2, σ = 0.5) leads to the same conclusion.

To examine the performance of the two-sample test based on spline conﬁdence band,
Table 2.7 reports the empirical power and type I error for the proposed two-sample test. The
data were generated from (2.16) with σ = 0.5 and m1 (x) = 10 + sin {2π (x − 1/2)} + δ (x),
n = n1 for the ﬁrst group, and m2 (x) = 10 + sin {2π (x − 1/2)}, n = n2 for the another
group. The remaining parameters, ξ ik , εij , φ1 (x) and φ2 (x) were set to the same values
for each group as in (2.16). In order to mimic the real data in Section 2.6, we set N = 50,
100 and 200 when n1 = 160, 80 and 40 and n2 = 320, 160 and 80 accordingly. The studied
29

Table 2.7: Empirical power and type I error of two-sample test using cubic spline.

δ (x)

n1 = 160, n2 = 320 n1 = 80, n2 = 160

n1 = 40, n2 = 80

Nominal test level

Nominal test level

Nominal test level

0.05

0.01

0.05

0.01

0.05

0.01

0.6t

1.000

1.000

0.980

0.918

0.794

0.574

0.7sin(x)

1.000

1.000

0.978

0.910

0.788

0.566

0

0.058

0.010

0.068

0.010

0.096

0.028

Monte Carlo SE

0.001

0.004

0.001

0.004

0.001

0.004

hypotheses are:

H0 : m1 (x) = m2 (x), ∀x ∈ [0, 1] ←→ Ha : m1 (x) = m2 (x), for some x ∈ [0, 1].

Table 2.7 shows the empirical frequencies of rejecting H0 in this simulation study with
nominal test level equal to 0.05 and 0.01. If δ(x) = 0, these empirical powers should be
close to 1, and for δ(x) ≡ 0, the nominal levels. Each set of simulations consists of 500
Monte Carlo runs. Asymptotic standard errors (as the number of Monte Carlo iterations
tends to inﬁnity) are reported in the last row of the table. Results are listed only for cubic
spline conﬁdence bands, as those of the linear spline are similar. Overall, the two-sample test
performs well, even with a rather small diﬀerence (δ (x) = 0.7 sin(x)), providing a reasonable
empirical power. Moreover, the diﬀerences between nominal levels and empirical type I error
do diminish as the sample size increases.
30

2.6

Empirical example

In this section, we revisit the Tecator data mentioned in Section 1, which can be downloaded
at http://lib.stat.cmu.edu/datasets/tecator. In this data set, there are measurements on n =
240 meat samples, where for each sample a N = 100 channel near-infrared spectrum of
absorbance measurements was recorded, and contents of moisture (water), fat and protein
were also obtained. The Feed Analyzer worked in the wavelength range from 850 nm to
1050 nm. Figure 2.3 shows the scatter plot of this data set. The spectral data can be
naturally considered as functional data, and we will perform a two-sample test to see whether
absorbance from the spectrum diﬀers signiﬁcantly due to diﬀerence in fat content.
This data set has been used for comparing four classiﬁcation methods (Li and Yu, 2008),
building a regression model to predict the fat content from the spectrum (Li and Hsing,
2010b). Following Li and Yu (2008), we separate samples according to their fat contents
being less than 20% or not. The right panel of Figure 2.3 shows 10 samples from each group.
Here, hypothesis of interest is:

H0 : m1 (x) = m2 (x), ∀x ∈ [850, 1050] ←→ Ha : m1 (x) = m2 (x), for some x ∈ [850, 1050],

where m1 (x) and m2 (x) are the regression functions of absorbance on spectrum, for samples
with fat content less than 20% and greater than or equal to 20%, respectively. Among 240
samples, there are n1 = 155 with fat content less than 20%, the rest n2 = 85 no less than
20%. The numbers of interior knots in (2.2) are computed as in Section 2.4 with c = 0.5
and are N1m = 4 and N2m = 3 for cubic spline ﬁt and N1m = 8 and N2m = 6 for
linear spline ﬁt. Figure 2.4 depicts the linear and cubic spline conﬁdence bands according
31

5.5
4.0
2.0

2.5

3.0

3.5

absorbance

4.5

5.0

5.5
5.0
4.5
4.0
3.5
2.0

2.5

3.0

absorbance

850

900

950

1000

1050

850

wavelength

900

950

1000

1050

wavelength

Figure 2.3: Left: Plot of Tecator data. Right: Sample curves for the Tecator data. Each
class has 10 sample curves. Dashed lines represent spectra with fact > 20% and solid lines
represent spectra with fact < 20%.

32

to (2.15) at conﬁdence levels 0.99 (upper and lower dashed lines) and 0.999995 (upper and
lower dotted lines), with the center dashed-dotted line representing the spline estimator
m1 (x) − m2 (x) and a solid line representing zero. Since even the 99.9995% conﬁdence band
ˆ
ˆ
does not contain the zero line entirely, the diﬀerence of low fat and high fat populations’
absorbance was extremely signiﬁcant. In fact, Figure 2.4 clearly indicates that the less the
fat contained, the higher the absorbance is.

33

0.6
0.0

0.2

0.4

absorbance

0.4
0.0

0.2

absorbance

0.6

0.8

Cubic spline

0.8

Linear spline

850

900

950

1000

1050

850

wavelength

900

950

1000

1050

wavelength

Figure 2.4: Plots of the ﬁtted linear and cubic spline regressions of m1 (x) − m2 (x) for the
Tecator data (dashed-dotted line), 99% conﬁdence bands (2.15) (upper and lower dashed
lines), 99.9995% conﬁdence bands (2.15) (upper and lower dotted lines) and the zero line
(solid line).

34

APPENDIX

35

In this appendix, we use C to denote a generic positive constant unless stated otherwise.

Preliminaries
For any vector ζ = (ζ 1 , ..., ζ s ) ∈ Rs , denote the norm ζ r = (|ζ 1 |r + · · · + |ζ s |r )1/r ,
1 ≤ r < +∞, ζ ∞ = max (|ζ 1 | , ..., |ζ s |). For any s × s symmetric matrix A, we deﬁne λmin (A) and λmax (A) as its smallest and largest eigenvalues, and its Lr norm as
A r = maxζ∈Rs ,ζ=0 ζ −1 Aζ r . In particular, A 2 = λmax (A), and if A is also
r
nonsingular, A−1 = λ−1 (A).
min
2
For any functions φ, ϕ ∈ L2 [0, 1], deﬁne the theoretical and empirical inner products as
1
φ, ϕ = 0 φ (x) ϕ (x) dx and φ, ϕ 2,N = N −1

N φ (j/N ) ϕ (j/N ). The corresponding
j=1

norms are φ 2 = φ, φ , φ 2 = φ, φ 2,N .
2
2,N
We state a strong approximation result, which is used in the proof of Lemma 2.6.6.
Lemma 2.6.1. [Theorem 2.6.7 of Cs˝rg˝ and R´v´sz (1981)] Suppose that ξ i , 1 ≤ i < ∞ are
o o
e e
i.i.d. with E(ξ 1 ) = 0, E(ξ 2 ) = 1 and H(x) > 0 (x ≥ 0) is an increasing continuous function
1
such that x−2−γ H(x) is increasing in x for some γ > 0 and x−1 logH(x) is decreasing in
x with EH (|ξ 1 |) < ∞. Then there exist a sequence of Brownian motions {Wn (l)}∞ and
n=1
constants C1 , C2 , a > 0, depending only on the distribution of ξ 1 and such that for any
{xn }∞ satisfying H −1 (n) < xn < C1 (nlogn)1/2 and Sl =
n=1

P

max Sl − Wn (l) > xn
1≤l≤n

l
i=1 ξ i ,

≤ C2 n {H (axn )}−1 .

The next lemma is a special case of Theorem 13.4.3, Page 404 of DeVore and Lorentz
(1993). Let p be a positive integer. A matrix A =

aij

is said to have bandwidth p

if aij = 0 when |i − j| ≥ p, and p is the smallest integer with this property. Denote by
36

d = A 2 A−1 and d is the condition number of A.
2
Lemma 2.6.2. If a matrix A with bandwidth p has an inverse A−1 , then A−1
1/(4p)
2c0 (1 − τ )−1 , with c0 = τ −2p A−1 , τ = (d2 − 1)/(d2 + 1)
.
2

∞

≤

N B (j/N ) Y Nm
¯
.j J=1−p , where the theoretj=1 J,p
Nm
ical and empirical inner product matrices of BJ,p (x)
are denoted as
J=1−p
ˆ
One writes XT X = N Vp , XT Y =

Vp =

BJ,p , BJ ,p

Nm
ˆ
, Vp =
J,J =1−p

Nm
BJ,p , B
.
J ,p 2,N
J,J =1−p

(2.17)

We establish next that the theoretical inner product matrix Vp deﬁned in (2.17) has an
inverse with bounded L∞ norm.

Lemma 2.6.3. For any positive integer p, there exists a constant Mp > 0 depending only
−1
on p, such that Vp
≤ Mp h−1 , where hm = (Nm + 1)−1 .
m
∞
Proof. According to Lemma A.1 in Wang and Yang (2009b), Vp is invertible since
−1
it is a symmetric matrix with all eigenvalues positive, i.e. 0 < cp Nm ≤ λmin Vp ≤
−1
λmax Vp ≤ Cp Nm < ∞, where cp and Cp are positive real numbers. The compact
support of B-spline basis makes Vp of bandwidth p, hence one can apply Lemma 2.6.2.
Since dp = λmax Vp /λmin Vp ≤ Cp /cp , hence

τ p = d2 − 1
p

1/4p

d2 + 1
p

−1/4p

−1/4p
1/4p
2
2
Cp c−2 + 1
< 1.
≤ Cp c−2 − 1
p
p

−1
If p = 1, then Vp = h−1 INm +p , the lemma holds with Mp = 1. If p > 1, let u1−p =
m
T
T
1, 0T +p−1 , u0 = 0T , 1, 0T
p−1
Nm
Nm , then u1−p 2 = u0 2 = 1. Also lemma A.1
37

in Wang and Yang (2009b) implies that

2
2
= λmin Vp u1−p ≤ uT Vp u1−p = B1−p,p ,
1−p
2
2
2
uT Vp u0 = B0,p ≤ λmax Vp u0 2 = λmax Vp ,
2
0
2

λmin Vp

hence dp = λmax Vp /λmin Vp ≥

2
−2
= rp > 1, where rp is an
B1−p,p
B0,p
2
2

absolute constant depending only on p. Thus
1
1
−1
−1
2
2
τ p = d2 − 1 4p d2 + 1 4p ≥ rp − 1 4p rp + 1 4p > 0. Applying Lemma 2.6.2
p
p
and putting the above bounds together, one obtains

−1
Vp

−2p
−1
hm ≤ 2τ p
Vp
1 − τ p −1 hm
∞
2


1/4p −1
1/2

2
2


rp + 1
Cp c−2 − 1
p
hm
≤ 2
λ−1 Vp × 1 −
min
2
2


rp − 1


Cp c−2 + 1
p

−1

2 c−2 − 1 1/4p 
2 + 1 1/2


rp
−1 1 − Cp p
cp
≡ Mp .
≤ 2
2
2


rp − 1


Cp c−2 + 1
p

The lemma is proved.
T
For any function φ ∈ C [0, 1], denote the vector φ = (φ (1/N ) , . . . , φ (N/N )) and
function
˜
φ (x) ≡ B1−p,p (x) , . . . , BNm ,p (x)

−1 T
T
X X
X φ.

ˆ
ˆ
Lemma 2.6.4. Under Assumption (A3), for Vp and Vp deﬁned in (2.17), Vp − Vp

∞

=

ˆ −1
and Vp
≤ 2Mp h−1 , for large enough n. There exists cφ,p ∈ (0, ∞) such
m
∞
˜
that when n is large enough, φ
≤ cφ,p φ ∞ for any φ ∈ C [0, 1]. Furthermore, if
∞

O N −1

38

˜
φ ∈ C p−1,µ [0, 1] for some µ ∈ (0, 1], then for Cp−1,µ = cφ,p + 1 Cp−1,µ

˜
φ−φ

∞

˜
≤ Cp−1,µ φ(p−1)

ˆ
Proof. We ﬁrst show that Vp − Vp

∞

µ+p−1
hm
.
0,µ

(2.18)

= O N −1 . Suppose p = 1, deﬁne for any

0 ≤ J ≤ Nm , the number of design points j/N in the J-th interval IJ as NJ , then
NJ = # {j : j ∈ [N J/ (Nm + 1) , N (J + 1) / (Nm + 1))}, for 0 ≤ J < Nm , and NJ =
# {j : j ∈ [N J/ (Nm + 1) , N (J + 1) / (Nm + 1)]}, for J = Nm .
Clearly max0≤J≤Nm NJ − N hm ≤ 1, and hence

ˆ
V 1 − V1

∞

=

max
0≤J≤Nm

2
2
− BJ,1
BJ,1
2
2,N

N
N −1
max
B 2 (j/N ) − hm
j=1 J,1
0≤J≤Nm
NJ − N hm ≤ N −1 .
N −1 NJ − hm = N −1 max
=
max
0≤J≤Nm
0≤J≤Nm
=

For p > 1, de Boor (2001), Page 96, B-spline property ensures that there exists a constant
C1,p > 0 such that

max
max
sup
BJ,p (j/N ) B
(j/N ) − BJ,p (x) B
(x)
J ,p
J ,p
1−p≤J,J ≤Nm 1≤j≤N x∈[(j−1)/N,j/N ]
≤ C1,p N −1 h−1 ,
m
while there exists a constant C2,p > 0 such that max1−p≤J,J ≤N NJ,J ≤ C2,p N hm
m
39

where N
= # j : 1 ≤ j ≤ N, BJ,p (j/N ) B
(j/N ) > 0 . Hence
J,J
J ,p

ˆ
V p − Vp

∞

1
N
BJ,p (x) BJ ,p (x) dx
BJ,p (j/N ) BJ ,p (j/N ) −
j=1
0
j/N
N
≤
max
B (j/N ) BJ ,p (j/N ) − BJ,p (x) BJ ,p (x) dx
j=1 (j−1)/N J,p
1−p≤J,J ≤Nm
≤ C2,p N hm × N −1 × C1,p N −1 h−1 ≤ CN −1 .
m
=

max
N −1
1−p≤J,J ≤Nm

−1
≤ h−1 γ ∞ . Hence,
According to Lemma 2.6.3, for any (Nm + p) vector γ, Vp γ
m
∞
−1
Vp γ ∞ ≥ Mp hm γ ∞ . By Assumption (A3), N −1 = o (hm ) so if N is large enough,
for any γ, one has

−1 γ
−1
ˆ
≥ hm Mp
≥ Vp γ ∞ − Vp γ − Vp γ
∞−O N
∞
∞
hm −1
Mp
γ ∞.
=
2
ˆ
Vp γ

γ ∞

ˆ −1
Hence Vp
≤ 2Mp h−1 .
m
∞
To prove the last statement of the lemma, note that for any x ∈ [0, 1] at most (p + 1) of
the numbers B1−p,p (x) , . . . , BNm ,p (x) are between 0 and 1, others being 0, so

˜
φ (x)

≤ (p + 1)

−1 T
T
T
ˆ −1 X φN −1
X X
X φ = (p + 1) Vp

T
T
ˆ −1
≤ (p + 1) Vp
X φN −1 ≤ 2 (p + 1) Mp h−1 X IN N −1
m
∞

φ ∞

T
T
˜
in which IN = (1, ..., 1) . Clearly X IN N −1 ≤ Chm for some C > 0, hence φ (x) ≤
2Mp (p + 1) C φ ∞ = cφ,p φ ∞ . Now if φ ∈ C p−1,µ [0, 1] for some µ ∈ (0, 1], let
40

µ+p−1
hm
according to
g ∈ H(p−1) [0, 1] be such that g − φ ∞ ≤ Cp−1,µ φ(p−1)
0,µ
Theorem 2.3.1, then g ≡ g as g ∈ H(p−1) [0, 1] hence
˜

∞

=

˜ ˜
φ − g − (φ − g)

≤

˜
φ−φ

cφ,p + 1

∞

˜ ˜
≤ φ−g

∞

+ φ−g ∞

µ+p−1
hm
φ − g ∞ ≤ cφ,p + 1 Cp−1,µ φ(p−1)
0,µ

proving (2.18).

Lemma 2.6.5. Under Assumption (A5), for C0 = C1 1 + βC2

∞ sβ−1−γ 1
s=1

and

n≥1

¯
ξ
max E ¯.,k − Z.k,ξ
1≤k≤κ
¯
max ¯.,j − Z.j,ε
ε
1≤j≤N
¯
where Z.k,ξ = n−1

n Z
−1
¯
i=1 ik,ξ , Z.j,ε = n

≤ C0 nβ−1 ,

(2.19)

= Oa.s. nβ−1

(2.20)

n Z
i=1 ij,ε , 1 ≤ j ≤ N , 1 ≤ k ≤ κ. Also

max E ¯.,k ≤ n−1/2 (2/π)1/2 + C0 nβ−1 .
ξ
1≤k≤κ

Proof. The proof of (2.20) is trivial. Assumption (A5) entails that
¯
Fn+t,k < C2 (n + t)−γ 1 , k = 1, ..., κ, t = 0, 1, ..., ∞, in which
41

(2.21)

n ξ −
i=1 ik

¯
Fn+t,k = P

E

n Z
β
i=1 ik,ξ > C1 (n + t) . Taking expectation, one has

n
ξ −
i=1 ik

≤ C1 (n + 0)β +

n
Z
i=1 ik,ξ
∞
¯
¯
C (n + t)β Fn+t−1,k − Fn+t,k
t=1 1

∞
C C (n + t)−γ 1 β (n + t)β−1
t=0 1 2
∞
≤ C1 nβ + βC2
(n + t)β−1−γ 1
t=0
∞
sn−1
≤ nβ C1 1 + βC2 n−1−γ 1
(1 + t/n)β−1−γ 1
s=1
t=sn−n
∞ β−1−γ
1 ≤ C 0 nβ ,
≤ nβ C1 1 + βC2 n−1−γ 1 × n
t
t=1
≤ C 1 nβ +

¯
which proves (2.19) if one divides the above inequalities by n. The fact that Z.k,ξ ∼
¯
N (0, 1/n) entails that E Z.k,ξ = n−1/2 (2/π)1/2 and thus
ξ
max1≤k≤κ E ¯.,k ≤ n−1/2 (2/π)1/2 + C0 nβ−1 .
Lemma 2.6.6. Assumption (A5) holds under Assumption (A5’).
τ
Proof. Under Assumption (A5 ), E ξ ik τ 1 < +∞, τ 1 > 4, E εij 2 < +∞, τ 2 >
4 + 2θ, so there exists some β ∈ (0, 1/2) such that τ 1 > 2/β, τ 2 > (2 + θ) /β.
Now let H(x) = xτ 1 , then Lemma 2.6.1 entails that there exists constants C1k , C2k , ak
which depend on the distribution of ξ ik , such that for
−τ −τ
n
xn = C1k nβ ,
= a 1 C 1 n1−τ 1 β and i.i.d. N (0, 1) variables Zik,ξ such that
1k
k
H ak xn

P

max
1≤t≤n

t
ξ −
i=1 ik

t
−τ −τ
Zik,ξ > C1k nβ < C2k a 1 C 1 n1−τ 1 β .
k
1k
i=1

Since τ 1 > 2/β, γ 1 = τ 1 β − 1 > 1. If the number κ of k is ﬁnite, so there are common con42

t ξ −
i=1 ik

stants C1 , C2 > 0 such that P max1≤t≤n

−γ 1
t Z
β
i=1 ik,ξ > C1 n < C2 n

which entails (2.3) since κ is ﬁnite. If κ is inﬁnite but all the ξ ik ’s are i.i.d., then C1k , C2k , ak
are the same for all k, so the above is again true.
Likewise, under Assumption (A5 ), Lemma 2.6.1 applied to H(x) = xτ 2 implies that
there exists constants C1 , C2 and a which depend on the distribution of εij and i.i.d. N (0, 1)
−τ
n
variables Zij,ε , such that with xn = C1 nβ ,
= a−τ 2 C1 2 n1−τ 2 β and
H ak x n

max P
1≤j≤N

t
ε −
i=1 ij

max
1≤t≤n

t
Z
> C1 nβ
i=1 ij,ε

−τ
≤ C2 a−τ 2 C1 2 n1−τ 2 β ,

Hence, τ 2 β > 2 + θ which implies that there is γ 2 > 1 such that τ 2 β − 1 > γ 2 + θ and (2.4)
follows.

Proof of Proposition 2.3.1.

Applying (2.18),

p
mp − m ∞ ≤ Cp−1,1 hm .
˜

Since Assumption (A3) implies that

p
O hm n1/2 = o (1), equation (2.10) is proved.

Proof of Proposition 2.3.2.
−1 T
˜
Denote by Zp,ε (x) = B1−p,p (x) , . . . , BNm ,p (x) XT X
X Z, where
T
¯
¯
Z = σ (1/N ) Z.1,ε , . . . , σ (N/N ) Z.N,ε . By (2.20), one has Z − e ∞ = Oa.s. nβ−1 ,
while

N −1 XT (Z − e)
≤ C Z−e ∞

∞

≤ Z−e ∞

max
1−p≤J≤Nm

max
BJ,p , 1
2,N
1−p≤J≤Nm
# j : BJ,p (j/N ) > 0 N −1 ≤ C Z − e ∞ hm .
43

Also for any ﬁxed x ∈ [0, 1], one has

˜
Zp,ε (x) −˜p (x)
e

∞

ˆ −1
B1−p,p (x) , . . . , BNm ,p (x) Vp N −1 XT (Z − e)
∞
ˆ −1
Z − e ∞ hm = Oa.s. nβ−1 .
≤ C Vp
∞
=

ˆ −1
Note next that the random vector Vp N −1 XT Z is (Nm + p)-dimensional normal with
ˆ −1
ˆ −1
covariance matrix N −2 Vp XT var (Z) XVp , bounded above by

ˆ −1
ˆ −1 ˆ ˆ −1
≤ CN −1 n−1 Vp
≤ CN −1 n−1 h−1 ,
max σ 2 (x) n−1 N −1 Vp Vp Vp
m
∞
∞
x∈[0,1]
ˆ −1
bounding the tail probabilities of entries of Vp N −1 XT Z and applying Borel-Cantelli
Lemma leads to

ˆ −1
Vp N −1 XT Z

∞

−1/2
= Oa.s. N −1/2 n−1/2 hm log1/2 (Nm + p)
−1/2
= Oa.s. N −1/2 n−1/2 hm log1/2 n .

−1/2
˜
Hence, supx∈[0,1] n1/2 Zp,ε (x) = Oa.s. N −1/2 hm log1/2 n

and

−1/2
sup n1/2 ep (x) = Oa.s. nβ−1/2 + N −1/2 hm log1/2 n
˜
x∈[0,1]
Thus (2.11) holds according to Assumption (A3).

Proof of Proposition 2.3.3.
44

= oa.s. (1) .

˜
¯
We denote ζ k (x) = Z·k,ξ φk (x), k = 1, . . . , κ and deﬁne

˜
ζ (x) = n1/2

−1/2
κ
φk (x) 2
k=1

κ ˜
ζ (x) = n1/2 G (x, x)−1/2
k=1 k

κ ˜
ζ (x).
k=1 k

˜
˜
˜
It is clear that ζ (x) is a Gaussian process with mean 0, variance 1 and covariance E ζ (x) ζ x
= G (x, x)−1/2 G x, x

−1/2

˜
G x, x , for any x, x ∈ [0, 1]. Thus ζ (x), x ∈ [0, 1] has the

same distribution as ζ (x), x ∈ [0, 1].
Using Lemma 2.6.4, one obtains that

µ
˜
˜
˜
φk
≤ cφ,p φk ∞ , φk − φk
≤ C0,µ φk 0,µ hm , 1 ≤ k ≤ κ.
∞
∞

(2.22)

Applying the above (2.22), (2.21) and Assumptions (A3), (A4), one has

κ ¯
˜
ξ
φ (x) − φk (x)
k=1 ·k k
κn
κ
µ
≤ Cn1/2
E ¯·k φk 0,µ hm +
ξ
E ¯·k φk ∞
ξ
k=1
k=κn +1
κn
κ
µ
≤ C
φ
h +
φ
= o (1) ,
k=1 k 0,µ m
k=κn +1 k ∞
En1/2 supx∈[0,1] G (x, x)−1/2

hence

n1/2 supx∈[0,1] G (x, x)−1/2

κ ¯
˜
ξ
φ (x) − φk (x)
k=1 ·k k

= oP (1) .

In addition, (2.19) and Assumptions (A3) and (A4) entail

En1/2 supx∈[0,1] G (x, x)−1/2
≤ Cnβ−1/2

κ
φ
= o (1) .
k=1 k ∞
45

κ
¯
Z
− ¯·k φk (x)
ξ
k=1 ·k,ξ

(2.23)

Hence

n1/2 supx∈[0,1] G (x, x)−1/2

κ
¯
Z
− ¯·k φk (x) = oP (1) .
ξ
k=1 ·k,ξ

Note that

m(x) − m (x) − ˜p (x) =
¯
ξ
˜
¯
n−1/2 G (x, x)1/2 ζ (x) − {m(x) − m (x)} =

κ ¯
˜
φ (x) − φk (x) ,
ξ
k=1 ·k k
κ
¯
Z
− ¯·k φk (x)
ξ
k=1 ·k,ξ

hence

n1/2 sup G (x, x)−1/2 m(x) − m (x) − ˜p (x)
¯
ξ
x∈[0,1]
˜
supx∈[0,1] ζ (x) − n1/2 G (x, x)−1/2 {m(x) − m (x)}
¯

= oP (1) ,
= oP (1) .

according to (2.23) and (2.24), which leads to both (2.12) and (2.13).

46

(2.24)

Chapter 3
Conﬁdence Envelopes for Covariance
Functions

3.1

Introduction

Covariance estimation is crucial in both functional and longitudinal data analysis. For
longitudinal data, a good estimation of the covariance function improves the estimation
eﬃciency of the mean parameters (Wang, Carroll and Lin, 2005; Fan, Huang and Li, 2007).
In functional data analysis (Ramsay and Silverman, 2005), covariance estimation plays a
critical role in functional principal component analysis (James, Hastie and Sugar, 2000;
Zhao, Marron and Wells, 2004; Yao, M¨ller and Wang, 2005a; Hall, M¨ller and Wang,
u
u
2006; Yao and Lee, 2006; Zhou, Huang and Carroll, 2008; Li and Hsing, 2010a), functional
generalized linear models (Cai and Hall, 2005; Yao, M¨ller and Wang, 2005b; Li, Wang and
u
Carroll, 2010), and other functional nonlinear models (James and Silvermen, 2005; Li and
Hsing, 2010b). Other related work on functional data analysis includes Ferraty and Vieu
(2006) and Morris and Carroll (2006).
47

There are some important recent works on nonparametric covariance estimation in functional data, which are mostly based on kernel smoothing, for example Yao et al. (2005a),
Hall et al. (2006) and Li and Hsing (2010a). More recently, Cai and Yuan (2010) also
proposed a smoothing spline covariance estimator. So far, all existing work concentrated
on estimation and the corresponding asymptotic convergence rate. There is no theoretical
or methodological development for inference procedures on the covariance functions, such
as simultaneous or uniform conﬁdence envelopes. Nonparametric simultaneous conﬁdence
regions are powerful tools for making global inference on functions; see H¨rdle and Marron
a
(1991), Claeskens and Van Keilegom (2003) and Zhao and Wu (2008) for related theory and
applications.
In this chapter, we consider a typical functional data setting where the functions are
recorded on a dense regular grid in an interval X and the measurements are contaminated
with measurement errors. Some recent applications of this type of functional data include
near infrared spectra (Li and Hsing, 2010a), recorded speeches for voice recognition (Hastie,
Tibshirani and Buja, 1995), electroencephalogram (EEG) data (Crainiceanu, Stacu and Di,
2009). We propose to estimate the covariance function by tensor product B-splines. In
contrast with the kernel methods (Yao et al., 2005a; Hall et al., 2006; Li and Hsing, 2010b),
our proposed spline estimator is much more eﬃcient in terms of computation. The reason
is that the kernel smoothers are evaluated pointwisely, while for the spline estimator, we
only need to solve for a small number of spline coeﬃcients to have an explicit expression for
the whole function. For smoothing a two-dimensional covariance surface with a moderate
sample size, the kernel smoother might take up to half an hour, while our spline estimator
only takes a few seconds. Computation eﬃciency is a huge advantage for the spline methods
in analyzing large data sets and in performing simulation studies. The reader is referred to
48

Huang and Yang (2004) for more discussions on the computational merits of spline methods.
Compared with the smoothing spline approach of Cai and Yuan (2010), our method uses
reduced rank tensor product B-splines, and is potentially faster while analyzing large data
sets.
We show that the estimation error in the mean function is asymptotically negligible
in estimating the covariance function, and our covariance estimator is as eﬃcient as an
“oracle” estimator where the true mean function is known. We derive both local and global
asymptotic distribution for the proposed spline covariance estimator. Especially, based on
the asymptotic distribution of the maximum deviation of the estimator, we propose a new
simultaneous conﬁdence envelope for the covariance function, which can be used to visualize
the variability of the covariance estimator and to make global inferences on the shape of the
true covariance.
We apply the proposed conﬁdence envelope method to a Tecator near infrared spectra
data set to test the hypothesis that the covariance is stationary. In a speech recognition
application, the classic functional linear discriminant analysis (Hastie et al., 1995; James
and Hastie, 2001) assumes that the random curves from diﬀerent classes share a common
covariance function. We further extend our conﬁdence envelope method to a two-sample
problem, where one can test whether the covariance functions from two groups are diﬀerent.
We organize this chapter as follows. In Section 3.2 we describe the data structure and the
proposed spline covariance estimator. In Section 3.3, we study the local and global asymptotic properties of the proposed estimator. Based on the theory, we propose a new conﬁdence
envelope approach and extend the method to two-sample hypothesis testing problems. More
implementation details of the proposed conﬁdence envelopes are provided in Section 3.4.
We present simulation studies in Section 3.5 and applications to the Tecator infrared spec49

troscopy and the speech recognition data set in Sections 3.6. Some concluding remarks are
provided in Section 3.7. All proofs of the theorems and technical lemmas are provided in
the appendix and the supplementary material.

3.2
3.2.1

Spline covariance estimation
Data structure and model assumptions

Following Ramsay and Silverman (2005), the data that we consider are a collection of trajectories η i (x) n which are i.i.d. realizations of a smooth random function η(x), deﬁned on
i=1
a continuous interval X . Assume that {η(x), x ∈ X } is a L2 process, i.e. E X η 2 (x)dx <
+∞, and deﬁne the mean and covariance functions as m(x) = E{η(x)} and G x, x

=

cov η(x), η(x ) . The covariance function is a symmetric nonnegative-deﬁnite function with
κ λ ψ (x)ψ x , where λ ≥ λ ≥ · · · ≥ 0,
1
2
k
k=1 k k
κ λ < +∞, are the eigenvalues, and ψ (x) κ are the corresponding eigenfunctions
k
k=1
k=1 k
and are a set of orthonormal functions in L2 (X ). By the Karhunen-Lo`ve representation,
e
a spectral decomposition, G x, x

η i (x) = m(x) +

=

κ ξ φ (x), where the random coeﬃcients ξ are uncorrelated with
ik
k=1 ik k

mean 0 and variance 1, and the functions φk =

λk ψ k . In the standard Karhunen-Lo`ve
e

expansion, κ can diverge to ∞. For practical consideration, κ is always truncated at a ﬁnite
number in real data analysis. Our main theoretical results are developed under the assumption that κ is a ﬁnite positive integer, but some of our theoretical results can be further
generalized to the inﬁnite dimension case, which will be discussed in Section 3.3.
Without loss of generality, we take X = [0, 1]. Then the observed data are Yij =
η i Xij

+ σ Xij εij , for 1 ≤ i ≤ n, 1 ≤ j ≤ N , where Xij = j/N , εij are i.i.d.
50

random errors with E (ε11 ) = 0 and E(ε2 ) = 1, and σ 2 (x) is the variance function of
11
the measurement errors. By the Karhunen-Lo`ve representation, the observed data can be
e
written as
Yij = m (j/N ) +

κ
ξ φ (j/N ) + σ (j/N ) εij .
k=1 ik k

We model m(·) and G(·, ·) as nonparametric functions, and hence {λk }κ , {φk (·)}κ
k=1
k=1
and {ξ ik }κ
k=1 are unknown and need to be estimated.

3.2.2

Spline covariance estimator

To describe the tensor product spline estimator of the covariance functions, we ﬁrst introduce
some notation. Denote a sequence of equally-spaced points tJ Ns , called interior knots
J=1
which have been deﬁned in Chapter 1. Let hs = 1/ (Ns + 1) be the distance between
neighboring knots. Let H(p−2) = H(p−2) [0, 1] be the polynomial spline space of order p.
The J th B-spline of order p is denoted by BJ,p as in de Boor (2001). Thus we deﬁne the
tensor product spline space as

H(p−2),2 [0, 1]2 ≡ H(p−2),2 = H(p−2) ⊗ H(p−2)


Ns

=
b
B (x) B
x
JJ p J,p
J ,p


J,J =1−p




,b
∈ R, x, x ∈ [0, 1] .
JJ p



If the mean function m(x) was known, one could compute the errors
κ
Uij ≡ Yij − m(j/N ) =

ξ ik φk (j/N ) + σ (j/N ) εij , 1 ≤ i ≤ n, 1 ≤ j ≤ N.
k=1

¯
Denote U·jj = n−1

n U U , 1 ≤ j = j ≤ N , one can then deﬁne the “oracle”
i=1 ij ij
51

estimator of the covariance function

˜
Gp2 (·, ·) =

¯
argmin
U·jj − g j/N, j /N
g(·,·)∈H(p2 −2),2 1≤j=j ≤N

2

,

(3.1)

using tensor product splines of order p2 ≥ 2. Since the mean function m(x) is unavailable
when one analyzes data, one can use instead the spline smoother of m(x), i.e.,
n
mp1 (·) =
ˆ

N

argmin
Yij − g (j/N )
g(·)∈H(p1 −2) i=1 j=1

2

, p1 ≥ 1.

To mimic the above “oracle” smoother, we deﬁne

ˆ
Gp1 ,p2 (·, ·) =

argmin
g(·,·)∈H(p2 −2),2 1≤j=j ≤N

ˆ
¯
where U·jj ,p = n−1
1

ˆ
¯
U
− g j/N, j /N
·jj ,p1

2
,

(3.2)

n U
ˆ
ˆ
ˆ
ˆ
i=1 ijp1 Uij p1 with Uijp1 = Yij − mp1 (j/N ). Let Ns1 be the

number of interior knots for mean estimation, and Ns2 be the number of interior knots for
2
ˆ
Gp1 ,p2 (x, x ) in each coordinate. In other words, we have Ns interior knots for the tensor
2
product spline space H(p2 −2),2 .

We now provide detailed algorithm for the spline covariance estimator. For simplicity,
denote B
x, x
JJ ,p2

B p2 x, x

X=

B p2

=

= BJ,p (x) B
x
J ,p2
2

B1−p ,1−p ,p x, x
2
2 2

and

, . . . , BNs ,1−p ,p x, x
2 2
2

,

T
. . . , B1−p ,Ns ,p x, x , . . . , BNs ,Ns ,p x, x
,
2 2 2
2
2 2
1
1
2 1
1
,
, . . . , B p2 1,
, . . . , B p2
, 1 , . . . , B p2 1 − , 1
N N
N
N
N
52

T

.

ˆ
Then Gp1 ,p2 (x, x ) deﬁned in (3.2) can be rewritten as
ˆT
ˆ
Gp1 ,p2 (x, x ) ≡ β p ,p B p2 x, x
1 2

,

(3.3)

ˆ
where β p ,p is the collector of the estimated spline coeﬃcients by solving the following
1 2
least squares

ˆ
β p ,p =
1 2

argmin
(N +p )2
bp2 ∈R s2 2 1≤j=j ≤N

ˆ
¯
U
− bT B p2 (j/N, j /N )
p2
·jj ,p1

2
.

By elementary algebra, one obtains

ˆ
Gp1 ,p2 (x, x ) = B T (x, x ) XT X
p2

−1

ˆ
¯
XT U p 1 ,

˜
Gp2 (x, x ) = B T (x, x ) XT X
p2

−1

¯
XT U,

(3.4)

where

ˆ
¯
Up1 =

ˆ
ˆ
ˆ
ˆ
¯
¯
¯
¯
U·21,p , . . . , U·N 1,p , . . . , U·1N,p , . . . , U·(N −1)N,p
1
1
1
1

¯
¯
¯
¯
¯
U = (U·21 , . . . , U·N 1 , . . . , U·1N , . . . , U·(N −1)N )T .

53

T

,

3.3

Asymptotic theory and simultaneous conﬁdence envelopes

3.3.1

Assumptions and the oracle property

For any ν ∈ (0, 1], we denote C q,ν [0, 1] as the space of ν-H¨lder continuous functions on
o
[0, 1],
C q,ν [0, 1] =





φ:

sup
x=x ,x,x ∈[0,1]

φ(q) (x) − φ(q) x
x−x ν



< +∞ .


We need the following technical assumptions.
(B1) The regression function m ∈ C p1 −1,1 [0, 1].
(B2) The standard deviation function σ (x) ∈ C 0,ν [0, 1]. For any k = 1, 2, . . . κ, φk (x) ∈
C p2 −1,ν [0, 1]. Also sup

G x, x
x,x ∈[0,1]2

< C, for some positive constant C

and minx∈[0,1] G (x, x) > 0.
(B3) The number of knots Ns1 and Ns2 satisfy n1/(4p1 )
p
Ns 1 .
min N 1/2 , n1/3 and Ns2
1

Ns1

N , n1/(2p2 )

Ns2

∞,κ
(B4) The number κ of nonzero eigenvalues is ﬁnite. The variables ξ ik
and
i=1,k=1
∞,∞
εij
are independent. In addition, Eε11 = 0, Eε2 = 1, Eξ 1k = 0, Eξ 2 =
11
1k
i=1,j=1
1 and max1≤k≤κ E ξ 1k δ 1 < +∞, E |ε11 |δ 2 < +∞, for some δ 1 , δ 2 > 4.

Assumptions (B1)-(B4) are standard in the spline smoothing literature; see Huang (2003),
for instance. In particular, (B1) and (B2) guarantee the orders of the bias terms of the spline
smoothers for m(x) and φk (x). Assumption (B3) is a weak assumption to ensure the order
54

of the bias and noise terms in Propositions 3.3.1 and 3.3.2. Assumption (B4) is necessary
for strong approximation. More discussion about the assumptions is in Section 3.4.
ˆ
To gain a deeper understanding on the behavior of the spline covariance estimator Gp2
˜
in (3.2), we ﬁrst study the asymptotic property of Gp2 in (3.1).
Denote by
κ
∆ x, x

=

φk (x) φ
k

x

¯
ξ
−δ
·kk
kk

,

(3.5)

k,k =1
where ¯·kk = n−1
ξ

n ξ ξ
i=1 ik ik and δ kk = 1 for k = k and 0 otherwise.

Proposition 3.3.1. Under Assumptions (B2)-(B4), one has

˜
sup
Gp2 (x, x ) − G(x, x ) − ∆ x, x
(x,x )∈[0,1]2

= op n−1/2 .

(3.6)

The proof of Proposition 3.3.1 is provided in the supplementary material. The next
ˆ
proposition provides that the tensor product spline estimator Gp1, p2 is uniformly close to
the “oracle” smoother at the rate of op n−1/2 .

Proposition 3.3.2. Under Assumptions (B1)-(B4), one has

ˆ
˜
Gp1, p2 (x, x ) − Gp2 (x, x ) = op n−1/2 .
sup
(x,x )∈[0,1]2
The proof of Proposition 3.3.2 is provided in the supplementary material. As a result of
Propositions 3.3.1 and 3.3.2,

ˆ
sup
Gp1 ,p2 (x, x ) − G(x, x ) − ∆ x, x
(x,x )∈[0,1]2
55

= op n−1/2 .

3.3.2

Asymptotic conﬁdence envelopes

ˆ
The next theorem provides a pointwise approximation to the mean squared error of Gp1, p2 (x, x ).
Theorem 3.3.1. Under Assumptions (B1)-(B4),

ˆ
nE[Gp1, p2 (x, x ) − G(x, x )]2 = V x, x

where V x, x

= G x, x

2

+ G (x, x) G x , x

+

+ o(1),

κ φ2 (x) φ2 x
k=1 k
k

Eξ 4 − 3 .
1k

Remark 3.3.1. Although the convergence result in Theorem 3.3.1 is derived under the assumption that κ is a ﬁnite positive number, it continues to hold when κ → ∞ as long as the
sum in the deﬁnition of V (x, x ) still converges. The existence of V (x, x ) in the inﬁnite dimension case is guaranteed by imposing an additional assumption that E[{supx∈X η(x)}4+δ ] <
∞, for some δ > 0, which is commonly assumed in functional data analysis (see Li and Hsing,
2010a). The assumption κ being ﬁnite is merely a practical consideration for the development of inference procedures. Further discussion on how to choose κ in real data will be
provided in Section 3.4.
ˆ
To obtain the quantile of the distribution of n1/2 Gp1, p2 (x, x ) − G(x, x ) V −1/2 x, x ,
one deﬁnes

ζ Z x, x

 κ

=
Z φk (x) φ
kk
k


k=k

(3.7)
κ
x

+

φk (x) φk x
k=1

Zk Eξ 4 − 1
1k



1/2 

V −1/2 x, x




where Z
=Z
and Zk are i.i.d. standard gaussian random variables. Hence, for any
kk
kk
x, x ∈ [0, 1]2 , ζ Z x, x is a standardized gaussian ﬁeld such that Eζ Z x, x = 0,
56

,

Eζ 2 x, x
Z

= 1. Deﬁne Q1−α as the 100 (1 − α)th percentile of the absolute maxima

distribution of ζ Z x, x , ∀ x, x

P





∈ [0, 1]2 , i.e.

sup
ζ Z x, x
2
x,x ∈[0,1]

≤ Q1−α




= 1 − α,

∀α ∈ (0, 1) .



The following theorem and corollary address the simultaneous envelopes for G(x, x ).
Theorem 3.3.2. Under Assumptions (B1)-(B4), for any α ∈ (0, 1),



lim P
n→∞ 

ˆ
sup
n1/2 Gp1, p2 (x, x ) − G(x, x ) V −1/2 x, x
(x,x )∈[0,1]2

ˆ
lim P n1/2 Gp1, p2 (x, x ) − G(x, x ) V −1/2 x, x
n→∞

≤ Q1−α




= 1 − α,



≤ Z1−α/2 = 1−α, ∀(x, x ) ∈ [0, 1]2 ,

where Z1−α/2 is the 100 (1 − α/2)th percentile of the standard normal distribution.
Remark 3.3.2. Although this covariance function estimator cannot be guaranteed to be positive deﬁnite, it tends to the true positive deﬁnite covariance function in probability.
The next result follows directly from Theorem 3.3.2.
Corollary 3.3.3. Under Assumptions (B1)-(B4), as n → ∞, an asymptotic 100 (1 − α) %
conﬁdence envelope for G(x, x ), ∀(x, x ) ∈ [0, 1]2 is

ˆ
Gp1, p2 (x, x ) ± n−1/2 Q1−α V 1/2 x, x

, ∀α ∈ (0, 1) ,

(3.8)

while an asymptotic 100 (1 − α) % pointwise conﬁdence envelope for G(x, x ), ∀(x, x ) ∈
[0, 1]2 is
ˆ
Gp1, p2 (x, x ) ± n−1/2 Z1−α/2 V 1/2 x, x
57

, ∀α ∈ (0, 1) .

3.3.3

Extension to two-sample test problems

In functional analysis of variance and linear discriminant analysis, it is commonly assumed
that the covariance functions are the same across diﬀerent treatment groups. It is natural to
extend our method to the two-sample problems, where we can construct conﬁdence envelopes
for the diﬀerence between the covariances functions from two independent groups. This
procedure is equivalent to two-sample t-test.
Suppose we have two independent groups of curves with sample sizes n1 and n2 , respectively. We denote the ratio of two sample sizes as r = n1 /n2 and assume that limn1 →∞ r =
ˆ
ˆ
ˆ (1)
ˆ (2)
r > 0. Let Gp p (x, x ) and Gp p (x, x ) be the spline estimates of covariance functions
1, 2
1, 2
G(1) (x, x ) and G(2) (x, x ) by (3.2). Also denote by ζ 12 x, x , ∀ x, x ∈ [0, 1]2 a standardized Gaussian process such that Eζ 12 x, x

≡ 0, Eζ 2 x, x
12

≡ 1, ∀ x, x

∈ [0, 1]2

with covariance function, for x, x ∈ [0, 1],

Eζ 12 x, x

ζ 12 x, x

=

V1 x, x + rV2 x, x
{V1 (x, x) + rV2 (x, x)}1/2 V1 x , x + rV2 x , x

1/2

.

Denote by Q12,1−α the (1 − α)-th quantile of the absolute maxima deviation of ζ 12 x, x ,
∀ x, x

∈ [0, 1]2 as above.

We have the following theorem, the proof of which is analogous to that of Theorem 3.3.2
and therefore omitted.

Theorem 3.3.4. Under Assumptions (B1)-(B4) modiﬁed for each group accordingly, for
58

any α ∈ (0, 1), as n1 → ∞, r → r > 0,
ˆ







ˆ (1) p (x,x )−G(2) p (x,x )−G(1) (x,x )+G(2) (x,x )
ˆp

Gp

1, 2
1, 2
1/2
sup
n
P
≤ Q12,1−α


(x,x )∈[0,1]2 1

V1 x,x +rV2 x,x 1/2


= 1 − α.

Remark 3.3.3. Under Assumptions (B1)-(B4), an asymptotic 100(1 − α)% conﬁdence envelope for G(1) (x, x ) − G(2) (x, x ), ∀ x, x

∈ [0, 1]2 is:

−1/2
ˆ (2)
ˆ (1)
Q12,1−α V1 x, x
Gp p (x, x )− Gp p (x, x )±n1
1, 2
1, 2

+ rV2 x, x

1/2

, ∀α ∈ (0, 1) .
(3.9)

one can use this conﬁdence envelope to test any hypothesis on G(1) x, x

3.4

− G(2) x, x .

Implementation

In this section, we describe the procedure to implement the conﬁdence envelopes. Given
N,n
the data set j/N, Yij
, the number of interior knots Ns1 for mp1 (x) is taken to
ˆ
j=1,i=1
be [n1/(4p1 ) logn], where [a] denotes the integer part of a. Meanwhile, the spline estimator
ˆ
Gp1 ,p2 (x, x ) is obtained by (3.3) with the number of interior knots Ns2 = [n1/(2p2 ) loglogn].
These choices of knots satisfy condition (B3) in our theory.
To construct the conﬁdence envelopes, one needs to evaluate the percentile Q1−α and
59

ˆ
estimate the variance function V x, x . An estimator V x, x

of V x, x

is

ˆ
V x, x
ˆ
ˆ
ˆ
= Gp1 ,p2 (x, x )2 + Gp1 ,p2 (x, x)Gp1 ,p2 (x , x )
n ˆ4
κ ˆ2
ˆ2
+
n−1
ξ −3 ,
φ (x) φk x
i=1 ik
k=1 k
ˆ
where φk and ˆik are the estimators of φk and ξ ik , respectively. According to Yao et al.
ξ
ˆ
(2005b), the estimates of eigenfunctions and eigenvalues correspond to the solutions φk and
ˆ
λk of the eigen-equations,
1
0

ˆ
ˆ ˆ
ˆ
Gp1 ,p2 (x, x )φk (x) dx = λk φk x

,

(3.10)

1ˆ
1 ˆ2
ˆ
ˆ
ˆ
where the φk are subject to 0 φk (t) dt = λk and 0 φk (t) φ (t) dt = 0 for k < k. Since
k
N is suﬃciently large, (3.10) can be approximated by

N −1

N ˆ
ˆ
ˆ ˆ
G
(j/N, j /N )φk (j/N ) = λk φk j /N .
j=1 p1 ,p2

For the same reason, the estimation of ξ ik has the form of
ˆ = N −1
ξ ik

N ˆ −1
ˆ
λ
Yij − mp1 (j/N ) φk (j/N ) .
ˆ
j=1 k

To choose the number of principal components, κ, M¨ller (2009) described two methods.
u
The ﬁrst method is the “pseudo-AIC” criterion proposed in Yao et al. (2005a). The second
is a simple “fraction of variation explained” method, i.e. select the number of eigenvalues
that can explain, say, 95% of the variation in the data. From our experience in the numerical
60

studies, the simple “fraction of variation explained” method often works well.
Finally, to evaluate Q1−α , we need to simulate the Gaussian random ﬁeld ζ Z (x, x ) in
(3.7). The deﬁnition of ζ Z (x, x ) involves φk (x) and V (x, x ), which are replaced by their
estimators described above. The fourth moment of ξ 1k is replaced by the empirical moments
of ˆik . We simulate a large number of independent realizations of ζ Z (x, x ), and take the
ξ
maximal absolute deviation for each copy of ζ Z (x, x ). Then Q1−α is estimated by the
empirical percentiles of these maximum values.
For the two-sample hypothesis testing problem, we will center the two groups of curves
separately by their own mean functions, since we do allow each group to have a diﬀerent
ˆ
mean function. Analogous to Q1−α , we estimate Q12,1−α and further
−1/2 ˆ
ˆ
ˆ (2)
ˆ (1)
Q12,1−α V1 x, x
Gp p (x, x )− Gp p (x, x )±n1
1, 2
1, 2

+ rV2 x, x
ˆˆ

1/2

, ∀α ∈ (0, 1) .
(3.11)

is applied for two samples covariance functions in the practice. The rest of the procedure
follows as described in Section 3.3.3.

3.5

Simulation

To illustrate the ﬁnite-sample performance of the spline approach, we generate data from
the model

Yij = m (j/N ) +

2
ξ φ (j/N ) + σεij , 1 ≤ j ≤ N, 1 ≤ i ≤ n,
k=1 ik k
61

where ξ ik ∼ N (0, 1), k = 1, 2, εij ∼ N (0, 1), for 1 ≤ i ≤ n, 1 ≤ j ≤ N , m(x) = 10 +
sin {2π (x − 1/2)}, φ1 (x) = −2 cos {π (x − 1/2)} and φ2 (x) = sin {π (x − 1/2)}. This setting
implies λ1 = 2 and λ2 = 0.5. The noise levels are set to be σ = 0.5 and 1.0. The number of
subjects n is taken to be 50, 100, 200, 300 and 500, and under each sample size the number
of observations per curve is assumed to be N = 4[n0.3 log(n)]. This simulated process has a
similar design to one of the simulation models in Yao et al. (2005a), except that each subject
is densely observed. We consider both linear and cubic spline estimators, and use conﬁdence
levels 1 − α = 0.95 and 0.99 for our simultaneous conﬁdence envelops. Each simulation is
repeated 500 times.
Table 3.1 shows the empirical frequency that the true surface G(x, x ) is entirely covered
by the conﬁdence envelopes. At both noise levels, one observes that, as sample size increases,
the true coverage probability of the conﬁdence envelopes becomes closer to the nominal
conﬁdence level, which shows a positive conﬁrmation of Theorem 3.3.2.
We present two estimation schemes: a) both mean and covariance functions are estimated
by linear splines, i.e., p1 = p2 = 2; b) both are estimated by cubic splines, i.e. p1 = p2 = 4.
Since the true covariance function is smooth in our simulation, the cubic spline estimator
provides better estimate of the covariance function. However, as can been seen from Table
3.1, the two spline estimators behave rather similarly in terms of coverage probability. We
also did simulation studies for the cases p1 = 4, p2 = 2 and p1 = 2, p2 = 4, the coverage
rates are not shown here because they are similar to the cases presented in Table 3.1.
We show in Figures 3.1 and 3.2 the spline covariance estimator and the 95% conﬁdence
envelops for n = 200 and σ = 0.5. Figures 3.1 and 3.2 correspond to linear (p1 = p2 = 2)
and cubic (p1 = p2 = 4) spline estimators respectively. In each plot, the true covariance
function is overlayed by the two conﬁdence envelopes.
62

4

G

2

0
1
−2
1

0.5
0.8

0.6

0.4

0.2

x

0

0

x

Figure 3.1: Plots of the true covariance functions (middle surfaces) of the simulated data and
their 95% conﬁdence envelopes (3.11) (upper and lower surfaces): n=200, N=100, σ=0.5.
p1 = p2 = 2.
63

4
3

G

2
1
0
−1
−2
1

1
0.8

0.5
0.6

0.4

0.2

x

0

0

x

Figure 3.2: Plots of the true covariance functions (middle surfaces) of the simulated data and
their 95% conﬁdence envelopes (3.11) (upper and lower surfaces): n=200, N=100, σ=0.5.
p1 = p2 = 4.

64

3.6
3.6.1

Empirical examples
Tecator near infrared spectra data

We ﬁrst apply our methodology to the Tecator data mentioned in Section 1 and chapter
2.6.Figure 3.3 shows the scatter plot of the spectra. As we can see, the spectra can be
naturally considered as functional data, since they are recorded on a dense grid of points
with little measurement error. On the other hand, there is a lot of variation among diﬀerent
curves. We show the estimated covariance function and the 95% conﬁdence envelope in
Figures 3.4. These results are obtained by applying cubic spline smoothing to both the
mean and covariance functions, with the number of knots Ns1 = 10, Ns2 = 6, respectively.
We also tried other combinations of knots numbers and linear spline estimators. The results
are very similar, and hence are not shown here. From Figure 3.4, we can see that the within
curve covariance is positive and quite signiﬁcant, since the zero hyperplane is far below the
lower bound of the conﬁdence envelope.
Using the simultaneous conﬁdence envelopes, one can test other interesting hypotheses
on the true covariance function, such as the true covariance being stationary. Speciﬁcally,
we are interested in the following hypothesis,

H0 : G(x, x ) ≡ g(|x − x |), ∀ x, x
v.s. Ha : G(x, x ) = g(|x − x |), ∃ x, x

∈ [a, b]2
∈ [a, b]2 ,

(3.12)

where g(·) is a stationary covariance function, and [a, b] is the range of wavelength.
To test the hypothesis in (3.12), we need to generate a new estimator under the stationarity assumption and check if this estimator can be covered by the simultaneous conﬁdence
65

ˆ
envelope. Letting G(x, x ) be the tensor product B-spline covariance estimator, we deﬁne
b−u G(x, x + u)dx for 0 ≤ u ≤ b − a and G (u) = G (−u)
ˆ
ˆ
ˆ
ˆ
GS (u) = (b − a − u)−1 a
S
S
ˆ
ˆ
for a − b < u < 0. Similar to G, the new estimator GS is not guaranteed to be positive
ˆ
semi-deﬁnite, but it is suﬃcient for our purpose. Under the stationarity assumption, GS is
ˆ
a better estimator of the true covariance. We will pretend that GS is the true covariance
and reject the null hypothesis if this function is not covered by the conﬁdence envelope.
Figure 3.5 shows cubic tensor spline envelopes with 0.9995 conﬁdence level, and the center
ˆ
surface is GS (x − x ) as a two-dimensional function. As we can see, even for such a high
conﬁdence level, the estimator under the stationarity assumption is still not fully covered
in the envelopes. We conclude that the covariance structure in these Tecator spectra is
non-stationary. The same conclusion can be drawn using linear tensor spline method.

3.6.2

Speech recognition data

The data were extracted from the TIMIT database (TIMIT Acoustic-Phonetic Continuous
Speech Corpus, NTIS, US Dept of Commerce) which is a widely used resource for research
in speech recognition. The data set we use was formed by selecting ﬁve phonemes for
classiﬁcation based on digitized speech from this database. From continuous speech of 50
male speakers, 4509 speech frames of 32 msec duration were selected. From each speech
frame, a log-periodogram was used as transformation for casting speech data in a form
suitable for speech recognition. The ﬁve phonemes in this data set are transcribed as follows:
“sh” as in “she”, “dcl” as in “dark”, “iy” as the vowel in “she”, “aa” as the vowel in “dark”,
and “ao” as the ﬁrst vowel in “water”. For illustration purpose, we focus on the “sh” and
“ao” phoneme classes as representatives of consonants and vowels. There are n1 = 872
66

5.5
5.0
4.5
4.0
3.5
3.0
2.5
2.0

absorbance

qq

qqqqqq
qqqq
qqq

qqq
qq
qqq
qqqqqqqq
qqqq
qq q
qq
qqqqqqqqqq qq
qqqqq
qq qq
qq
qq qq
q
qq q
q
qq qq
qq
qqq
q
q
qq
qqqq
q
qq
qqq
qq
q
q
q
qq
q q
q
q
q
qqqqq
qq
q q
q
q
q
q
q
qq
q
qqqqqq
qq
q
qq
q
qq qqqqqqqqqqq
q
qq
qq
qq
q
q
qqq
q
qq
qq
q
q
qqqqq
q
q
qq
q
q
q
qqq
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
q qq
q
q qqq
q
q
q
q
q
qq
q
q
q
q
q
q
q
q
qqqq
qq
q
q
q
q
q
qqq
q
qqqqqqqqq
q
q
q
q
qqq
qqqqqqqqqqqqqqqq
q
qq
q
q
q
qq
q qqqqqqq
qqq
qq
qq
q
q
q
q
qq
qqq
qqq qq q
q
q
q
q
qq
qq qqq
q
qqqqqqqqqqqqqqqqqqqqq qqqqq
qqq
q
qq
q
q
qqqq qqqq
q q
qqqqq
q
q
qq
q
qq
q
q
qq
q
qq
qqqq
q qq
q
q
qqq
qqqqqq
qqq qqqqqqqqqqq
q
q
q
q
q
q
qqqqqqqqqqq
q
q
q
qqqqq
q
q
q
q
qqq
q q qq
q q
qq
q
qq qq
q qq
q
qqqqqqqqqqqq q qqq qq
q q qq
qq
q
q
qq
qq qq q qqq
q
q
qq
q
q q q
qqqqqqqqqqqqqqqqq
qq qq
q
qq
q
q qqq
qq
qqqqqqq
qqqqqqqq q qq
q
qqq
qqq
qq q
qq
q
q
qq
qqqqq
qq
q
qq
q
qq qq qq
qqq
q
qq qqq q q
q qq
qqqq
qqqqqqqqq
q
qqqq
qq
q
q
q qqqq qqq
qq
qqq
q
q
q
q
qqq
qq qqqqqqq q
qqqq
qqqq
q
q
q
qq
q
q q q
qqq
qqqqqqqqqqqqqq
q q
q
q
qq
qq qqqqqqqq q
qqq
q
qq
qqqqqq
q
q q
q
qqq
q
qqqqq q
qq
qqqqqq qqq
qqqq
q qqq
qq q
qqqqq qqqqq
qqqq
qqqq
q
qqqqqqqqq
q
q
q q
q
qq
q
qqqq
q
qqq
qq q
q
q
qq
q
qqqqqqqq
qq qq
qqqq
qqqqqqqqqqqq qqq qqqqqqqqqqqq
q
qqqqqqqqqq
qqq q
qq
q
q
qq q
qq q
qqqqq
qqq
q
q
qq q
q
qqqq
q qqqq
qq
qq
qq qqqqqqqqqqqqqqqqq qqqqqqqqq qq qq
q
qq
qqqq
qqq qq
qqq
qq
qqqq q
qqq
qqqq
q
qq
q qqqqqqqqqqqqqqqqqqqq
qq
qqqq
qq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qq
q
q
q qq
qqqqqqqqqqqq
qqqq
qq
q q
qq
qq
qqq
qq
qqqqq
qq
q
q
qqqqqqq
qq
qq q
qqqqqqqqqqqqqqqqq qq
qq
q
q q
qqqqqq
q
qqq
qq
qqqqqqqqqqqqqq
qq
qq
qq
q
qq
qq
qq
qqqqqqqq qqqqqqq
qq
qq qq
q
q qqq
qqqqq qqq qqqqq q
q
q
q q
qq
q
qqqqqqqqq
qqq
q
q
qqqqqqqqqq qqq q qq
q
qq qq
qqqqqqqq qqqqqqqq qqqqqqqqqqqqqqq
q q
qqqqqq
qq
qq
qqqqqqqq
qqq
q
q qq qqq
qqq
qqqqqqqqqqqqqqq qqqq qqq
qqqqq
qqqqqqq qqqqqqqqqqqqqqqqqq
q qq q
q
qq
qq qqq
qqq
qq
qqqq
qqqqqqqqqqqqqqq
qqqqqqqqqqq qqqqq qqqqq
qqq qqqq
q
qqqq
qq
qqq
q qqqq
q
qq
qqqqqqqqqqqqqq qqqqq
q
qqq qqqqqqqqqqqqqqqqqq
qqqqqqqqqq
qq q
q
qq
qqq
qqqqqqqqq
qqqqqqqqqq
q
qqqq
q
qq
q
qq q
qq qqq qqq q
q
qqqqq qqqqqqqq qqqqqqqq qqqqq qqqqqqqqqqqqqqqqqqqq
q
q
qqqq
q qq q
qq
qqq
qq q
q
q
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqq qqqqqqqqqqq
qq q qqqqq qqqqqqq
qqqqqqqqqq qq
qqq
qqqqqq
q
q
q qq
qqqqq qqqq
qq
q qq
q
qq qq
q
q q qqqqqqqqqqq qqq
qqq
qqq
q
q
q qqq
qq
qqqqqqqqqqq qqq qqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
qqq
qqqqq qqqq qq qq qq qqqqqqq q qqqq
q
qqqq
q
q
qq
q qq
q
qqq
qqq
qqq q qq q q q qq qqqqqqq
q
qqqqqq qqqqq qqq qqqq
qqq qq
q
q q q qq
qqqqq q q
q
q qq
q
qqqq
qqq
q
q
q q q
qqq
q
qq
q
qq qqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqq qqqq qqq qq
qq
qq qqqqqqq
q qqqqq
q
q
qqqq
qq
q
q
q
qqqqq
q q
qq qq q qq
qqq
qqqq q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qq qqq
q
qq
qqq qqqq qqq qq
qq qq
q
qqqq q qq
q
qqqqqqqqq q q
qqq
qqq
qq
qqqqqq q
q
qqq qq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqq qqq
q
qqqqqq
qq qqq
qqq qqqqqq
qq q
q
q
qq qq
q qq qqqq qqqqqqqqqqqqqqqqqqqq
q qqqqqq qq
qq qq qqqq
qqq
qqqq qqqq q
qq q
qqq
q
qq
qqqqq
q
qq qqq q
qqqqq
q
qq
qqq qqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
qq q qqq qqqqqqqqqqqqqqqqqqqqqqqq qqqqqq
q
q
q q
q
q
q
q
q qqq qqqqqqqq
q
qqqqq
qq
qqqqqqq
q
q
q
q
q qqqqqq qq qqqqq qqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqq qqqqq qqq
q
qqqqqqqqqq
qqq qqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqq
qqqqqq qqqq qq qqqqqq qqq
q qq qqqqq
q
q qq
q qqqq
qqq
q
qq q
q q qq
qqqqqqq qqqq
q
qqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqq qqq qqq
qqqqqq
q
q
qqqq
qq
q
q
qq
qq
q
q
qqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqq
qqqq q
qqqq qq
qqq qqq qq
q q
qqq
qqq qq q
q qq q q
qq
q q
q
qqqqq qq qqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqq qqq
q
q
q
q
q
qqqqqqqqqqqqqq qqq
q
q
q qqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqq
q
qq
q q qqq
q
qq
q q
qq qqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qq
qqqqq
qq
q
q q qq q
qq
qq
qqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q qqqqqqqqqqqqqqqq
q
q q
qq qqqqqqqqq
q qqqq qqqq qq q
q
qq qq
q
q
q
q
qqq
q
q
q
qqq qqqqq qqqqqqqqq qqqqqqqqqqqqqq qqqqqqqqqqqqqq qqqqqqqqq
qq qqq q
q q qqqqq q
q
qqqqqqqqqq qqqqq qqqqqqqqqq qqqqq
qq
qqq q
q
qqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqq qq
qqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqq
q
q
q qq q
q
q
q
q q
qq q q q
q
qq q qqqqq qqqqqqq
qq q
qqqq
q
qq
qq q
qqqqqq
qq
qqqqq
qqqq q qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q qqqq
qq
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqq
q qqqqq
q
qqqqq
q
q
q
qqq qqqq qq qqq
q
qq qqqqqqqqqqqqqqqqqqqqqq
q
qq qq
qq qq
q qqq
q
qqq qq
q
q
q
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqq
q
qqqqqqqqqqqqq
qqqqqq qqqqqq q qqq
qqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q qqqqq qqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q qqqqqqqqq
qqqqqqqqq
qqqq q q
qqqqq qqqqqqqqqqqqqq
q
q
qqq
qq
q q
q
qqqq
qqqqqqqqqq
q
qqq
q
qqq qqqqqqqqqqqq
qqqqq
qq q q
q
qqqqq qqqqqqq
q
q q
q
q q qqq
q q q
q
q
qq qqq q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqq q
qq
q
q
q
qqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqq qqqqqqqqqqq
q
q q
q
qqqq qqq qqqq q q
q
q
qq
q
q
q qqqqq
q q
qqqqqqqq
qqq
q q qqqqq
qqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqq
q qq
q
q qqqq qq
q
qq q q
qqqqqqqq qqq qq
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
qq
q
qq qq
qqqqqqqqq qqqqqqq qqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
qq
q
qq qqqqqq qq
q
q
qqq
q
q qq
qqqq qq qqqqqqqqqqqqqqqq qqq qqqq qqqqqqqqqq qqq qqqqq qq
qqq
q
qq
qq
qqqqq
qqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qq q
q
q
q
q
q
q
qq qqqqqqq qq
qq
qq q
qqqqqqqq q qqq
q qqqqq qq qqqqqqq q
q qqq qqqqq qq qqqqqqqqq qqqqqqq
q
q q q
q
q
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qq qqq q qq
q
q
qqq
qqqqq qqqqqqqqqqqqqqqqqqqqq q qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqq
q qq qqqq q qqqqq qq qqq qqqq qqq qqqqqqqq qq qqq
q
q qqqqqq
qqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqq
qqqq
q
q
q
q
q
qqqqqqqqqqq
qq q q q
qq
qqqq q
q qq q
qqqqqqqq q
q q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqq
q
qqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
q
q qq
q
qqq
qqq
q
qqqq q
qqqqq
qq
qq qqqqqqq qqqqqqq qq
qqqq
q q qq
q qqq qqqqqqqqqqq qqq
q
q
qqqq qqqqqqqqqqqqqq
qq
qqqq qq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qq q
q
qq q
q
q qq
q
qqq q q qqqqq qqq q qqqqqq
q
q
qqq qqqqqqqq qqqqqqqqqqqqqqqq qqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q qqqqqq
q
q qqq qqqqqqq
qq
q
q
q
qq
qqqqqqqqqqqqqqqqq
qqq qqq q
qqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqq
qqqqqqqqqqqqqq qqqqqqqqq qqq
qqqqqqqqqqqqqq
qqqqqqqqqqqqqqqq
qqq qq qqqq
q
qqqqqqqqq q
q
q
q q
q
q
qq qq qq
q qqq qqqqq
q
q
qq q qqqq
qqq qqqq
qqqqqqqqqqq
qqq q qqq qqqqqq qqqq
qqq q qq qq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqq
qqq
qqq
q
q
q q
q q
qqqq qq
q q q q
q
qqqqqq
q
q
qqq qqqqq q
q
q qqqqqqqqqqqqqqq qqqqqqqqqqqqqqq qqq qqqq qq
qqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqq qqqqqq qqqqqqqqq
q
q
q qqqqqq
qqq qqqq
q qq
q
q
q
q
qq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
qqq
q
q
qqqq
qqq q qqqqq q
qqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqq q qqqqqqqqqqq qq qqqq qqqqqqqqqqqq qqqqq qqqqqqqqqqqqqqqqqqqq
q
q q q
qqqq
qq qqqqqq q qq qqqqq
qq
q
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqq qqqqqqqq qq qqq
qqq q qqqqqqqqq qqqqqqqqqqqqqq
qqq
q
q
qqq
q
qqqq
q
qq qq
qqq
qq
q q qq
q
qqqqqqqq qq
qq
q
qqqqqqqq qqqqqqqqqqqqq
qq q
qq qqqqqqqqq qqq qqqqqqqqq qq
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
qq
q
q qqqqq
qqqqqqqqqqqqqqqq
qqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqq
q
qqqqqqqq qqq q
q qqqq qqq qqqqqqqqq q
q q
qqqq q qqqqqqqqqqqqqqqqqqqqqqqqqqq
qq
qq
q q
q
qqqq qq
q
qq q q q
qqqq
qqqq qqqqqq q qqqqqqqq qqqqqqqqqq
qqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqq
q
qq
qqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qq qqqqqqqqqqq q qqqqqqqqqq qq q
q q qqqqqqqqq qqqqqqqqqqqqqqqq
qqq
qqqqq q
qqq
qqqqqqq
q q
q
q
q
q qq
q
qq q qqq qq qq
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qq qqq qqqqqqqqqqqqqqqqqqqqqqqqq
q
qqqqqqq qqqqqqqqqqqqqqqq
qqqq
qq
qq
q q
q
qqqq q
qqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
qq q
qqqqq qqq qqqq q
q qqqq
qq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqq qqqqq
q qq q
qq qqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qq
qq qq qqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q q
qq
q
qq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqq
qqqqqqq qqqqqqqqqqqqqq
qq
qqq
qqqqqqqqqq q
qq
q
qqqqqqq qq
q
q
q
qqqqqqqqqqqqqqqqqqqqqqqqqqq
q
q
q qq q q
qq qqqq qqqqqqq
qqqq qqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqq
qqqqqqqq
q
qqqqqqqqqqqqqqq qq
q
qq
qqq
q qqqqqqq
qqq qqqq
qqqq qqqqqqqqqqqqqqqqqqq
q
q
q
qqqqqqqqqq q
qq q
q
q
qqqqqqqq q q
qqqqqqqqqqqqqqqq qqqq
qqqqqqq qqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q
qqqq qqqqqqqqq qqqqqqq
q
qq q qq
q
qq
qqqqqqqq
qq
qqqqqqq
q qqq
q
q
qq
qqqqqqqqqqqqqqqqqqqqqqqq q
qqqqqqqq q qqqq
qq q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q qqqqqqqqqqqqq
qqq qqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q
q q
qqq
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqq
q
q
qqqqqqq
qq q
qqq qqqqqqqqqqqqqq
q
qqqqqqqqqqqqqqqqqq qqqqq
qqqqqqqqqqq q
q
qqqqq qqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q
qqq qq
qqqqq q
q
q
qq
q
qqqqqq
q
qqqq qq qqqqqqqqqq qq qqqqq
q
qq
qqqqqq
q
qq
qqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqq
q
q
qqq
qqqq
qqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q
qq
qqq
qqq
q
qqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqq
qq
qq qqq q
qqqqqqqq
qqqqqqqq
qqqqqqqqqqqqqqqqqq
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q
q
qqqqqqqq
qqqqqq
q
qqqqqqqqqq
qqq
qqqqqqqq
qqq qq
q
q
qq
qqqqqqqqqq
qqqqqq
q
qqqqqqqqqqqqqq
q
q
qqqqqqqq q qqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qq
qq qqqqqq
qqqqq
qqqqqqqqqqqqqqqqqqqqq qqq qqq q
q
qqqqq q
q
qqqq qq
qq
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqq
qq qqqq
q
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qq
q
qqqq
qqqqqqqqqqqqq
q
qq
q
qqqqqqqqqqqqqqqqqqqqqqqq q qqqq qqq qq
qq q
qqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqq
qqqqqqqqqqqqqqqqq
qqq
qq
qqq
qq
q
qqq
qqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqq
q
qqqqqqqqqqqqqq
qq
q
qqqqqqqqqqqqqqqqqqqqqqqqq
qq
qqq
qqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqq
q
q

850

900

q
q

q

950

wavelength

Figure 3.3: Plot of the Tecator data.

67

1000

1050

0.4
0.35

G

0.3
0.25
0.2
0.15
0.1
1050

1200
1000

950

1000
900

850

wavelength

800

wavelength

Figure 3.4: Plots of the cubic tensor product spline covariance estimator (3.3) for the Tecator
data (middle surface) and the 95% simultaneous conﬁdence envelope (3.11) (upper and lower
surfaces).

68

0.5
0.4

G

0.3
0.2
0.1
0
1050

1100
1000

1000
950

900

900

850 800

wavelength

wavelength

Figure 3.5: Plot for testing hypothesis (3.12) for the Tecator data. The upper and lower
surfaces are the 99.95% conﬁdence envelopes for the covariance function, and the middle
ˆ
surface is the covariance estimator under stationarity assumption, GS (x − x ).

69

log-periodograms in the “sh” class, and n2 = 1022 log-periodograms in the “ao” group.
Each log-periodogram consists N = 256 equally spaced points. Figure 3.6 shows a sample
10 log-periodograms from each of the two phoneme classes.
This data set was ﬁrst analyzed by Hastie et al. (1995) using penalized linear discriminant
analysis. One of the basic assumptions is that the covariance functions are the same for
diﬀerent classes. Judging from the scatter plot of the data in Figure 3.6, despite the clear
diﬀerence between the mean functions of the two groups, there is no obvious indication of
diﬀerence in covariance structures.
We ﬁrst obtain the cubic tensor product spline covariance estimators for the two phoneme
classes separately, which are shown in Figure 3.7. These results are obtained by using
Ns1 = 10, Ns2 = 4 number of knots for the “sh” class, and Ns1 = 11 and Ns2 = 5 for the
“ao”class. Diﬀerent number of knots between the two groups reﬂects that the sample sizes
are diﬀerent.
By comparing the covariance estimators in Figure 3.7, there seems to be a visible diﬀerence between the two classes. We now would like to test the equal covariance assumption
formally. The hypotheses of interest are

H0 : G(1) (x, x ) ≡ G(2) (x, x ), ∀ x, x
v.s.

∈ [0, 1]2

Ha : G(1) (x, x ) = G(2) (x, x ), ∃ x, x

∈ [0, 1]2 .

(3.13)

The 99.95% conﬁdence envelopes for the diﬀerence of the two covariance functions are provided in Figure 3.8, and the zero hyperplane is used as a reference. Since the zero hyperplane
is not covered by the envelopes, the equal covariance hypothesis is rejected with p-value
< 0.0005. We also tried diﬀerent numbers of knots and the test result is not sensitive to this
70

15
5
0

Log−Periodogram

sh

0

50

100

150

200

250

200

250

5

15

ao

0

Log−Periodogram

Frequency

0

50

100

150

Frequency
Figure 3.6: Plots of the speech recognition data.

choice. Our results suggest that a quadratic discriminant analysis, that takes into account
the diﬀerence in the within-group covariance functions, might yield a better classiﬁcation
error rate than a linear discriminant analysis in this data set.
71

Figure 3.7: Plots of tensor spline estimators for “sh” and “ao” data sets. Right: “sh”; Left:
“ao”.

72

Figure 3.8: Plots of hypothesis test (3.13) results with the cubic tensor product spline 99.95%
conﬁdence envelopes (3.11) (upper and lower surfaces) for the speech recognition data and
the zero surfaces (middle ﬂat surface).

73

3.7

Summary

In this chapter, we consider covariance estimation in functional data and propose a new
computationally eﬃcient tensor-product B-spline estimator. The proposed estimator can be
used as a building block for further data analysis, such as principal component analysis, linear
discriminant analysis and analysis of variance. We study both local and global asymptotic
properties of our estimator and proposed a simultaneous envelope approach to make inference
on the true covariance function. The method is applied to a Tecator near-infrared spectra
data to test the stationarity assumption on the covariance. In a classiﬁcation problem, we
further extend our method to a two-sample problem to test the equal covariance assumption
between diﬀerent treatment groups.

74

APPENDIX

75

Throughout this section, C means some positive constant in this whole section.
The detailed proofs of the following lemmas, Proposition 3.3.1 and Proposition 3.3.2 can
be found in the supplement.

Preliminaries
For any positive integer p, denote the theoretical and empirical inner product matrices of
Ns
BJ,p (x)
as
J=1−p

Vp =

BJ,p , B
J ,p

Ns
ˆ
, Vp =
J,J =1−p

Ns
BJ,p , B
.
J ,p 2,N
J,J =1−p

−1
The following lemma is from Chapter 1, which established the upper bound of Vp

∞

.

Lemma 3.7.1. For any positive integer p, there exists a constant Mp > 0 depending only
−1
≤ Mp h−1 , for a large enough n, where hs = (Ns + 1)−1 .
on p, such that Vp
s
∞
Denote by “⊗” the Kronecker product of two matrices. Note that (A ⊗ A)−1
A−1 ⊗ A−1

∞

≤ A−1

∞

=

2
, for any invertible matrix A, which, together with Lemma
∞

3.7.1, leads to the following result.

Lemma 3.7.2. For any positive integer p, there exists a constant Mp > 0 depending only
on p, such that

Vp ⊗ Vp −1

∞

2
≤ Mp h−2 .
s

Next we deﬁne the theoretical and empirical inner product matrices of tensor product
76

spline basis

x, x
B
JJ ,p2

Ns2

as

J,J =1−p2
Ns2

Vp ,2 =
2

BJJ ,p , BJ J ,p
2
2

ˆ
Vp ,2 =
2

.
BJJ ,p , BJ J ,p
2
2 2,N J,J ,J ,J =1−p
2

J,J ,J ,J =1−p2
Ns2
(3.14)

ˆ
The following results show that the diﬀerence of Vp ,2 and Vp ,2 is negligible. Using the
2
2
ˆ −1
results from Lemma 3.7.2, we can obtain the upper bound of the norm of Vp ,2 .
2
ˆ
Lemma 3.7.3. Under Assumption (A3), for Vp ,2 and Vp ,2 deﬁned in (3.14),
2
2
ˆ −1
ˆ
= O h−2 .
Vp ,2 − Vp ,2
= O N −1 and Vp ,2
s2
2
2 ∞
2 ∞
ˆ
Lemma 3.7.4. For Vp ,2 deﬁned in (3.14) and any N (N − 1) vector ρ = ρ
, there exjj
2
ˆ −1
N −2 B T (x, x )Vp ,2 XT ρ
ists a constant C > 0, such that
sup
≤ C ρ ∞.
p2
2
∞
2
x,x ∈[0,1]
Denote φ
kk

x, x

= B T (x, x ) XT X
p2

−1

XT φ , and
kk

φ
=
kk
φk (2/N ) φ (1/N ) , . . . , φk (1) φ (1/N ) , . . . , φk (1/N ) φ (1) , . . . , φk (1 − 1/N ) φ (1)
k
k
k
k

The following lemma is a direct result from de Boor (2001), p. 149 and Theorem 5.1 of
Huang (2003), thus the proof is omitted.
Lemma 3.7.5. There is an absolute constant Cg > 0 such that for every g ∈ C p−1,µ [0, 1],
there exists a function g ∗ ∈ H(p−1) [0, 1] and some µ ∈ (0, 1] such that supx∈[0,1] g(x) − g ∗ (x)
p−1+µ
≤ Cg hs
. If Assumption (B2) holds, sup
φ (x, x ) − φkk (x, x )
=
(x,x )∈[0,1]2 kk
∞
77

T

.

p
O hs 2 .
2

Proofs of theorems 3.3.1 and 3.3.2
Proof of Theorem 3.3.1. By Propositions 3.3.1,

˜
E[Gp1, p2 (x, x ) − G(x, x )]2 = E∆2 x, x

where ∆ x, x

is deﬁned in (3.5). Let ¯
ξ
= n−1
·kk

+ o(1),

n ξ ξ , 1 ≤ k, k ≤ κ. According
i=1 ik ik

to (3.16) and (3.17), one has
κ

κ
∆ x, x

¯
ξ
φ (x) φ
·kk k
k

=

x

¯
ξ ·kk − 1 φk (x) φk x

+

.

k=1

k=k
Since

nE ∆ x, x
κ
=

2

φ2 (x) φ2
k
k

κ
x

+

φk x

φk (x) φk (x) φk

k,k =1
k,k =1
κ
+
φ2 (x) φ2 x
Eξ 4 − 3
k
k
1k
k=1
κ
2
= G x, x
+ G (x, x) G x , x +
φ2 (x) φ2 x
k
k
k=1

x

Eξ 4 − 3 ≡ V x, x
1k

,

and the desired result follows from Proposition 3.3.2.
Next deﬁne ζ x, x

= n1/2 V −1/2 x, x

∆ x, x .

Lemma 3.7.6. Under Assumptions (B2)-(B4), sup
78

ζ
x, x
x,x ∈[0,1]2 Z

− ζ x, x

=

oa.s. (1), where ζ Z x, x

is given in (3.7).

Proof of Theorem 3.3.2. According to Lemma 3.7.6, Propositions 3.3.1 and 3.3.2
and Theorem 3.3.1, for ∀α ∈ (0, 1), as n → ∞,








−1/2
ˆ
lim P
sup
n1/2 Gp1 ,p2 x, x − G(x, x ) V x, x
≤ Q1−α
n→∞ 

 x,x ∈[0,1]2







−1/2
1/2 G
˜ p ,p x, x − G(x, x ) V x, x
= lim P
sup
n
≤ Q1−α
1 2
n→∞ 


 x,x ∈[0,1]2






= lim P
sup
ζ x, x
≤ Q1−α
n→∞ 


 x,x ∈[0,1]2




= lim P
sup
ζ Z x, x
≤ Q1−α .
n→∞ 

x,x ∈[0,1]2

Supplement

This supplement contains proofs for Lemmas 3.7.3, 3.7.4 and 3.7.6, Propositions 3.3.1 and
3.3.2.
79

Proof of Lemma 3.7.3. Note that

ˆ
Vp ,2
2



=
N −2



N
 s2


j/N, j /N B
j/N, j /N
B
J J ,p2
JJ ,p2



1≤j=j ≤N
J,J ,J ,J =1−p2



Ns
N
N

 2
=
N −2 
BJJ ,p (j/N, j/N ) 
BJ J ,p (j/N, j/N )


2
2
j=1
j=1
J,J ,J ,J =1−p2
N

N
 s2

−2
B
(j/N, j/N ) B
(j/N, j/N )
− N
JJ ,p2
J J ,p2


j=1
J,J ,J ,J =1−p2

N
s2
N

j j
j j 
−2
ˆ
ˆ
= Vp 2 ⊗ V p 2 − N
B
,
,
.
B
JJ ,p2 N N
J J ,p2 N N 

j=1
J,J ,J ,J =1−p2

ˆ
ˆ
ˆ
Note that the entries in the matrix Vp ,2 − Vp2 ⊗ Vp2 are zero when the maximum absolute
2
diﬀerence between any two of the indices J, J

N −2

, J ,J

is greater than p; otherwise

N
BJJ ,p (j/N, j/N ) BJ J ,p (j/N, j/N )
2
2

j=1
1

B
(x, x) B
(x, x) dx + O(N −1 h−1 )
s2
JJ ,p2
J J ,p2
0
= O N −1 hs2 + N −2 h−1 .
s2

= N −1

80

ˆ
ˆ
ˆ
= O N −1 hs2 + N −2 h−1 . Since
Hence, Vp ,2 − Vp2 ⊗ Vp2
s2
2
∞
ˆ
ˆ
Vp 2 ⊗ V p 2 − Vp 2 ⊗ V p 2
Ns2
=

N −2

N

BJJ ,p
max
2
1−p2 ≤J ,J ≤Ns2
j,j =1
J,J =1−p2
1 1
x, x B
x, x dxdx
B
−
J J ,p2
0 0 JJ ,p2
Ns2

≤

∞

N

max
1−p2 ≤J ,J ≤Ns2
J,J =1−p2 j,j =1

j /N

j j
,
N N

BJ J ,p
2

j j
,
N N

j/N

j −1 /N (j−1)/N

j/N, j /N
B
JJ ,p2

×B
j/N, j /N − B
x, x B
x, x
J J ,p2
JJ ,p2
J J ,p2
2
≤ Ch−2 N hs2 × N −2 × N −2 h−2 = CN −2 h−2 ,
s2
s2
s2

dxdx

ˆ
applying Assumption (B3) one has Vp2 ⊗ Vp2 − Vp ,2
= o N −1 .
2 ∞

2
≤
According to Lemma 3.7.2, for any Ns2 + p2 vector τ , one has (Vp2 ⊗ Vp2 )−1 τ
∞
h−2 τ ∞ . Hence, (Vp2 ⊗ Vp2 )τ
≥ h2 τ ∞ . Note that
s2
s2
∞
ˆ
ˆ
Vp ,2 τ
≥ (Vp2 ⊗ Vp2 )τ
− (Vp2 ⊗ Vp2 )τ − Vp ,2 τ
= O h2
s2
2
2
∞
∞
∞
ˆ −1
ˆ −1
= Vp ,2 τ
≤ O h−2
If τ satisﬁes that Vp ,2
s2
2 ∞
2
∞
proved.

τ ∞.

τ ∞ = O h−2 , the lemma is
s2

m,n
Proof of Lemma 3.7.4. Note that for any matrix A = aij
and any n by
i=1,j=1
81

1 vector α = (α1 , . . . , αn )T , one has Aα ∞ ≤ A ∞ α ∞ . It is clear that

N −2 XT ρ

∞

=

N −2

N
 s2







1≤j=j ≤N

≤

ρ ∞

j/N, j /N ρ
B
jj 
JJ ,p2

J,J =1−p2 ∞

max
N −2
1−p2 ≤J,J ≤Ns2

1≤j=j ≤N

B
j/N, j /N
JJ ,p2

≤ h2 ρ ∞ .
s2

One also observe that

ˆ −1
ˆ −1
N −2 B T (x, x )Vp ,2 XT ρ
≤ B T (x, x )
N −2 XT ρ
Vp ,2
,
p2
p2
∞
∞
2
2 ∞
∞
which, together with the boundedness of spline functions and Lemma 3.7.3, leads to the
desired result.
Proof of Lemma 3.7.6. According to Assumption (B4) and multivariate Central Limit
Theorem
√

ξ
ξ ·k
n ¯·kk , ¯2 − 1

Eξ 4 − 1
1k

−1/2
1≤k<k ≤κ

→d N 0κ(κ+1)/2 , Iκ(κ+1)/2 .

Applying Skorohod’s Theorem, there exist i.i.d. variables Zkk = Zk k ∼ N (0, 1), Zk ∼
N (0, 1), 1 ≤ k < k ≤ κ, such that as n → ∞,

max
1≤k<k ≤κ

√
√
1/2
n¯
ξ
−Z
, n ¯2 − 1 − Zk Eξ 4 − 1
ξ ·k
1k
·kk
kk

= oa.s. (1) .
(3.15)

The desired result follows from (3.15).
82

Proof of Proposition 3.3.1

Recall that the error terms deﬁned in Section 3.2 are Uij = Yij − m(j/N ), i = 1, ..., n,
j = 1, ..., N . Note that

¯
U
·jj

= n−1

n

Uij U
ij
i=1 


κ
κ
n

ξ ik φk (j/N ) + σ (j/N ) εij  
ξ ik φk j /N + σ j /N εij 
= n−1
i=1 k=1
k=1
¯
¯
¯
¯
= U1jj + U2jj + U
+U
,
3jj
4jj

where
κ
¯
U

1jj

¯
ξ
φ (j/N ) φ
·kk k
k

=

j /N ,

k=k
κ
¯
U2jj

¯ φ (j/N ) φ j /N ,
ξ ·kk k
k

=
k=1

¯
U3jj
¯
U
4jj

= n−1

n

σ (j/N ) σ j /N εij εij ,
i=1 

n  κ
κ

−1
= n
ξ ik φk (j/N ) σ j /N ε +
ξ ik φk j /N σ (j/N ) εij .
ij


i=1 k=1
k=1

˜
Let Uip x, x
2

¯
¯
= B T (x, x ) (XT X)−1 XT Ui , where Ui =
p2

¯
U

ijj

1≤j=j ≤N

˜
1, 2, 3, 4. Then we can have the following decomposition for Gp2 (x, x )

˜
˜
Gp2 (x, x ) = U1p x, x
2

˜
+ U2p

2

x, x
83

˜
+ U3p

2

x, x

˜
+ U4p

2

x, x

.

, for i =

Deﬁne
κ
¯
ξ
φ (x) φ
·kk k
k

U1 (x, x ) =

x

,

(3.16)

k=k
κ
U2 (x, x ) = G x, x

φk (x) φk x

+

¯
ξ ·kk − 1

.

(3.17)

k=1
˜
Next we illustrate the facts that U1p

2

˜
and U2p

x, x

2

x, x

are the dominating terms in

the above decomposition, which converge uniformly to U1 (x, x ) and U2 (x, x ) respectively,
˜
while U3p x, x
2

˜
and U4p x, x
2

are negligible noise terms.

¯
By the deﬁnition of U1 , one has that

˜
U1p x, x
2


 κ


ˆ −1
= N −2 B T (x, x )Vp ,2 XT
p2

2


¯
ξ
φ (j/N ) φ
·kk k
k

j /N







k=k

κ
¯
ξ ·kk φkk

=

x, x

.

k=k
Lemma 3.7.5 and Assumption (B3) imply that




κ





¯
˜
˜
U
−
ξ
φ
x, x
sup
(U1p − U1 ) x, x
=
sup
·kk kk 
2
 1p2

2 
2
x,x ∈[0,1]
x,x ∈[0,1]
k=k




2
¯
≤ κ
max
ξ
sup
(φ
− φ ) x, x
·kk
kk
kk

2
1≤k=k ≤κ 
x,x ∈[0,1]
p
= Op hs 2 n−1/2 = op n−1/2 .
2
84

Similarly,

˜
sup
(U2p − U2 ) x, x
2
x,x ∈[0,1]2


¯
ξ
sup
≤ κ max
(φkk − φkk ) x, x
 ·kk
1≤k≤κ
2
x,x ∈[0,1]
p
= Op hs 2 = op n−1/2 .
2





Therefore, one has

˜
˜
sup
(U1p + U2p − U1 − U2 ) x, x
2
2
x,x ∈[0,1]2
Denote that
¯
N −2 XT U3 =





n−1

n
A
i=1

= op n−1/2 .

N
 s2
iJJ 

,

J,J =1−p2

where

A

iJJ

= N −2
1≤j=j ≤N

BJ,p (j/N ) σ (j/N ) B
j /N σ j /N εij ε .
J ,p2
ij
2

It is easy to see that EA
= 0 and EA2
= O h2 N −2 . Using standard arguments
s2
iJJ
iJJ
in Wang and Yang (2009), one has

¯
N −2 XT U3

∞

= oa.s. N −1 n−1/2 hs2 log1/2 (n) .

85

˜
¯
Therefore, according to the deﬁnition of U3p x, x , one has
2
ˆ −1
¯
sup
B T (x, x )Vp ,2 N −2 XT U3
p2
2
∞
2
x,x ∈[0,1]
ˆ −1
¯
sup
Vp ,2 N −2 XT U3
Bp2 x, x
∞
2
∞
x,x ∈[0,1]2
= oa.s. N −1 n−1/2 h−1 log1/2 n = oa.s. n−1/2 .
s2

≤ C p2

˜
Likewise, in order to get the upper bound of U4p x, x
2

, one has

¯
X T U4
=

2
n

n
i=1

1≤j=j ≤N

κ
ξ
k=1 ik
φk (j/N ) σ j /N B
JJ ,p2

j j
,
N N

ε

N
 s2

ij 


.

J,J =1−p2

Let

D
= N −2
iJJ

κ






ξ ik
k=1


φk (j/N ) σ j /N B
j/N, j /N ε  ,
JJ ,p
ij
1≤j=j ≤N

86

then ED
= 0,
iJJ
ED2
iJJ

= N −4

κ

φ2 (j/N ) σ 2 j /N B 2
j/N, j /N Eξ 2 Eε2
k
ik ij
JJ ,p2

k=1 1≤j=j ≤N
κ
2
−4
φ2 (j/N ) BJ,p (j/N ) σ 2 j /N B 2
j /N
≤ CN
k
J ,p2
2
k=1 1≤j=j ≤N
= O h2 N −2 .
s2

Similar arguments in Wang and Yang (2009) leads to
¯
XT U4
N2

= oa.s. N −1 n−1/2 hs2 log1/2 (n) .

∞

Thus,
T¯
ˆ −1 X U4
B T (x, x )Vp ,2
p2
2
N2

∞

= oa.s. N −1 n−1/2 h−1 log1/2 (n) = oa.s. n−1/2 .
s2

Proof of Proposition 3.3.2
For simplicity, denote


 B1−p ,p (1/N ) · · ·

1 1



X1 = 

···
···





B1−p ,p (N/N ) · · ·
1 1



BNs ,p (1/N ) 

1 1





···





BNs ,p (N/N )
1 1

N × Ns1 +p1

for the positive integer p1 . We decompose mp1 (j/N ) into three terms mp1 (j/N ), ˜p (j/N )
ˆ
˜
ξ
1
87

and ˜p1 (j/N ) in the space H(p1 −2) of spline functions:
ε

mp1 (x) = mp1 (x) + ˜p1 (x) + ˜p (x),
ˆ
˜
ε
ξ
1

where
mp (x) = B1−p ,p (x) , . . . , BNs ,p (x)
˜
1 1
1

XT X1
1

−1

˜p1 (x) = B1−p ,p (x) , . . . , BNs ,p (x)
ε
1 1
1

X T X1
1

−1

˜ (x) = B
ξp
1−p1 ,p1 (x) , . . . , BNs ,p1 (x)
1

X T X1
1

−1

XT
1

XT m,
1
XT e,
1
κ
ξ ·k φk ,
k=1

T
where m = (m (1/N ) , . . . , m (N/N )) is the signal vector,
T
e = σ (1/N ) ε·1 , . . . , σ (N/N ) ε·N , ε·j = n−1 n εij , 1 ≤ j ≤ N , is the noise
i=1
T
are the eigenfunction vectors, and ξ ·k =
vector and φk = φk (1/N ) , . . . , φk (N/N )
n−1

n ξ , 1 ≤ k ≤ κ.
i=1 ik

ˆ
Thus, one can write the residuals Uij,p = Yij − mp1 (j/N ) as
ˆ
1
ˆ
Uij,p = m (j/N ) − mp1 (j/N ) − ˜p (j/N ) − ˜p1 (j/N ) + Uij .
˜
ξ
ε
1
1
ˆ
¯
ˆ
ˆ
ˆ
Let U·jj ,p = n−1 n Uij,p Uij, p . We calculate the diﬀerence of Gp1 ,p2 (x, x ) −
i=1
1
1
1
ˆ
˜
¯
¯
Gp2 (x, x ) by checking the diﬀerence U·jj ,p − U·jj ﬁrst. For any 1 ≤ j = j ≤ N , one
1
88

has

= n−1

ˆ
¯
¯
U·jj ,p − U·jj
1

n
i=1

Uij (mp1 − m) j /N + Uij (mp1 − m) (j/N )
˜
˜

+ Uij ˜p j /N + U ˜p (j/N ) + Uij ˜p1 j /N +U ˜p1 (j/N )
ξ
ξ
ε
ε
ij 1
ij
1
+

mp1 − m
ˆ

j /N

mp1 − m (j/N ) .
ˆ

ˆ
˜
Next, we calculate the super norm of each part of Gp1, p2 (x, x )−Gp2 (x, x ) respectively. One
can write ˜p1 j /N =
ε

Ns1
Ns1
−1 ˆ −1 T
J=1−p1 BJ,p1 j /N wJ , where wJ J=1−p1 = N Vp1 X1 e

and

XT


 n


=

i=1

Uij ˜p1 j /N
ε


1≤j=j ≤N
N
 s2



 n


i=1




1≤j=j ≤N

Uij ˜p1 j /N B
ε
j/N, j /N
JJ ,p2



89

J,J =1−p2

.

Thus,

XT




n−1



2




n
i=1

Uij ˜p1 j /N
ε


1≤j=j ≤N 2

Ns2
=
J,J =1−p2


Ns1
n

 −1
j /N w
Uij 
B
n
J
J ,p1
i=1 1≤j=j ≤N
J =1−p1

Ns1
Ns2
n
 −1
Uij B
j
≤
n
J ,p1
i=1 1≤j=j ≤N
J,J =1−p2 J =1−p1
Ns1
×

w2
J



 BJJ ,p
2

2

j/N, j /N 

/N B
JJ ,p2

2

j/N, j /N 

= I × II,

J =1−p1
where
Ns2

Ns1

I=
J,J =1−p2 J =1−p1



n

−1
n


i=1

2



1≤j=j ≤N

2
ˆ −1
and II = N −1 Vp XT e .
1 1 2

90

Uij BJJ ,p j/N, j /N BJ ,p j /N

2
1


Let h∗ = min hs1 , hs2 . The deﬁnition of spline function implies that
Ns2

E[I] = n−1

Ns1

J,J =1−p2 J =1−p1
N





N




j,j =1

E U1j U
1j

BJ,p (j/N ) BJ,p j /N
2
2




N

×

BJ ,p j /N BJ ,p j /N BJ ,p j /N BJ ,p j /N

1
2
1
2


j =j j =j
≤ C(G, σ 2 )n−1 N 4 h2 h2 Ns2 max Ns1 , Ns2
s2 ∗

≤ C(G, σ 2 )n−1 N 4 hs2 h∗ .

Hence,

N −2 XT





n−1

n




1/2 1/2
Uij ˜p1 j /N
ε
= O n−1/2 hs h∗
.
2

i=1
1≤j=j ≤N 2

Meanwhile, one has that

2
II ≤ Cp1 N −1 XT e h−2 = Oa.s. (N nh2 )−1 .
s1
1 2 s1

By Lemma 3.7.4, one has



n




ˆ −1
Uij ˜p1 j /N
ε
sup
B T (x, x )Vp ,2 N −2 XT n−1
p2


2
2
x,x ∈[0,1]
i=1
1≤j=j ≤N ∞
1/2 1/2
−1/2 −3/2
= O n−1/2 N −1/2 h−1 hs h∗ h−2 = O n−1/2 N −1/2 hs
hs
= o (1) .
s1 2
s2
1
2
n1/2

91

Similarly, ˜p j /N =
ξ
1

Ns1
J=1−p1 BJ,p1 j /N sJ , where
κ
Ns1
ˆ −1
ξ ·k φk .
sJ J=1−p = N −1 Vp XT
1 1
1
k=1

Assumption (B3) ensures that

XT




n−1

2




n

Uij ˜p j /N
ξ
1

i=1
1≤j=j ≤N 2

2
Ns2
Ns1
n
 −1

≤
Uij BJ ,p j /N BJJ ,p j/N, j /N 
n
1
2
i=1 1≤j=j ≤N
J,J =1−p2 J =1−p1


Ns1

s2
J

×

= I × III,

J =1−p1
ˆ −1
where III = N −1 Vp XT
1 1
κ

N −1 XT
1

ξ ·k φk =
k=1

κ ξ φ 2 . Note that
k=1 ·k k 2

 κ


k=1

and


κ

ξ ·k N −1

N

ξ ·k N −1
E
j=1
k=1

N
 s1

N

BJ,p (j/N ) φk (j/N )
1

j=1
J=1−p1

2
BJ,p (j/N ) φk (j/N ) = O n−1 h2 ,
s1
1

hence
III ≤ Cp1 N −1 XT
1

2

κ

h−2 = Op (nhs1 )−1
s1

ξ ·k φk
k=1

2

92

and






n

ˆ −1
sup
Uij ˜p j /N
ξ
B T (x, x )Vp ,2 N −2 XT n−1
p2
1


2
2
x,x ∈[0,1]
i=1
1≤j=j ≤N ∞
−3/2
−1/2 1/2 1/2 −2
= O n−1/2 hs
hs h∗ hs
= O(n−1/2 hs
) = o (1) .
1
2
2
2
n1/2

Next one obtains that


n

−1 N −2
E n


i=1
= n−1 N −4

Uij BJ,p (j/N ) B
j /N
2
J ,p2
1≤j=j ≤N

m − mp1
˜

jN

2





N
E U1j U1j

BJ,p (j/N ) BJ,p j /N
2
2

j,j =1
N
N
×
B
j /N B
j /N m − mp1 j /N
˜
J ,p2
J ,p1
j =j j =j
4
2p
2p
≤ C(G, σ 2 )hs 1 n−1 N −4 N hs2 = C(G, σ 2 )hs 1 n−1 h4 .
s2
1
1

Therefore,

n1/2

ˆ −1
sup
B T (x, x )Vp ,2 N −2 XT
p2
2
x,x ∈[0,1]2



n


−1
n
Uij m j /N − mp1 j /N
˜


i=1
j=j ∞
−2 hp1 h ) = O(h−1 hp1 ) = o (1) .
= O(hs s s2
s2 s1
2 1

Finally, we derive the upper bound of

93

m − m p1
˜

j /N

⊗2
ˆ −1
ˆ
, where
sup
B T (x, x )Vp ,2 N −2 XT m − mp1
p2
x,x ∈[0,1]2
2
∞
ˆ
m − mp1

⊗2

=

m − mp1 (j/N ) m − mp1
ˆ
ˆ

j /N

1≤j=j ≤N

.

ˆ
m − mp1

⊗2

.
∞
Using the similar proof as Lemma A.8 in Wang and Yang (2009) and Assumption (B3), one
In order to apply Lemma 3.7.4, one needs to ﬁnd the upper bound of

has supx∈[0,1] ˜p1 (x) + supx∈[0,1] ˜p (x) = o n−1/2 . Therefore,
ε
ξ
1

m (x) − mp1 (x)
ˆ
x,x ∈[0,1]2
2
ˆ
supx∈[0,1] m (x) − mp1 (x)

sup
≤
≤
≤

Hence

m x

− mp1 x
ˆ

ξ
ε
supx∈[0,1] m (x) − mp1 (x) + supx∈[0,1] ˜p1 (x) + supx∈[0,1] ˜p (x)
˜
1
2
2p
p
p
= O hs 1 + n−1 + hs 1 n−1/2 = o n−1/2 .
O hs 1 + n−1/2
1
1
1

ˆ
m − mp1

⊗2

= o n−1/2 . Hence, the proposition has been proved.
∞

94

2

Table 3.1: Simulation results: uniform coverage rates from 500 replications.
σ

n

1−α

Coverage proportion Coverage proportion
(p1 = p2 = 4)

1.0

300

500

0.946

0.930

0.950

0.912

0.898

0.962

0.956

0.950

0.890

0.884

0.960

0.958

0.950

0.908

0.894

0.976

0.964

0.950

0.626

0.690

0.720

0.796

0.950

0.752

0.796

0.874

0.904

0.950

0.798

0.852

0.912

0.944

0.950

0.822

0.828

0.990

200

0.834

0.990

100

0.858

0.990

50

0.950

0.990

500

0.828

0.990

300

0.824

0.990

200

0.710

0.990

0.5

0.720

0.990

100

0.950
0.990

50

(p1 = p2 = 2)

0.922

0.936

0.950

0.864

0.858

0.990

0.946

0.946

95

Chapter 4
Spline Conﬁdence Bands for
Functional Derivatives

4.1

Introduction

In exploratory FDA, it is often of interest to estimate the mean functions; see for example,
Ramsay and Silverman (2005), Yao, M¨ller and Wang (2005a,b), Ferraty and Vieu (2006),
u
Li and Hsing (2010) and Cao, Yang and Todem (2012). In some settings, however, estimation and inference of derivatives of the mean functions in FDA are of equal importance.
For example, in economics, consistent and direct estimation of derivatives are essential for
estimating elasticities, returns to scale, substitution rates and average derivatives. Often,
these index (derivative) functions are as interesting as the mean functions themselves. Another example is in the ﬁelds of engineering and biomedical sciences, where the estimation
of velocity and acceleration are of great importance in addition to obtaining a smooth curve
of the measurements.
The problem of estimation and inference of derivatives for functional data is very chal96

lenging; see Ramsay and Silverman (2005), Liu and M¨ller (2009), and Hall, M¨ller and
u
u
Yao (2009) for some discussions. Existing methodologies for derivatives of the regression
function in FDA often rely on a pointwise analysis. For example, in Liu and M¨ller (2009)
u
the theoretical focus was primarily on obtaining consistency and asymptotic normality of
the proposed estimators, thereby providing the necessary ingredients to construct pointwise
conﬁdence intervals. This approach is important but its usefulness in conducting global inferences is limited. To our knowledge, we are not aware of any methodology that provides
simultaneous conﬁdence bands for functional derivatives in FDA. In this chapter, we develop
such methodology with the primary aim to better understand the variability and shape of
the mean curve.
Nonparametric simultaneous conﬁdence bands are powerful tools for global inference of
functions. Some work has been conducted to study the simultaneous conﬁdence band of the
mean curves for FDA; see Degras (2011), Ma, Yang and Carroll (2012) and Cao, Yang and
Todem (2012). The research work on conﬁdence bands for functional derivatives is actually
sparse. This is partially due to the technical diﬃculty to formulate such bands for FDA and
establish the associated theoretical properties.
Some smoothing tools are necessary to construct the conﬁdence bands. Popular smoothing methods include kernels (Gasser and M¨ller, 1984; H¨rdle, 1989; Xia, 1998; Claeskens
u
a
and Van Keilegom, 2003), local polynomials (Fan and Gijbels, 1996), splines (Wahba, 1990;
Stone, 1994) and series expansion methods (Morris and Carroll, 2006). In this chapter, we
use B-splines, which can be readily implemented due to their explicit expression, to construct the bands. B-spline approximation has also been employed to estimate the functional
mixed-eﬀect models in Shi, Weiss and Taylor (1996) and Rice and Wu (2001), and to study
functional data via principle components in Yao and Lee (2006) and Zhou, Huang and Car97

roll (2008). Other works include Zhou, Shen and Wolfe (1998) and Wang and Yang (2009a)
who have proposed B-spline conﬁdence bands for regression functions.
The proposed conﬁdence bands are asymptotically the same as if all the random trajectories are correctly recorded over the entire interval. As discussed in Section 4.3, the
estimators are semiparametrically eﬃcient thereby providing partial theoretical justiﬁcation
for treating functional data as perfectly recorded random curves over the entire data range,
as in Cao, Yang and Todem (2012).
The rest of the chapter is organized as follows. In Section 4.2, we introduce the model
and the spline estimators for the mean curves and their derivatives. Section 4.3 presents
the simultaneous conﬁdence bands for the derivatives of the mean curves. Speciﬁcally, in
Section 4.3.1, we show that the bands have asymptotically correct coverage probabilities;
and in Section 4.3.2, we discuss how to estimate the unknown components involved in the
band construction and other issues of the implementation. Section 4.4 reports ﬁndings from
a simulation study and a real data set. Proofs of technical results are relegated to the
Appendix.

4.2
4.2.1

Models and spline estimators
Models

We consider a collection of trajectories Xi (t) n which are i.i.d. realizations of a smooth
i=1
random function X(t), deﬁned on a continuous interval T . Assume that {X(t), t ∈ T } is
a L2 (T ) process, i.e. E T X 2 (t)dt < +∞, and deﬁne the mean and covariance functions as m(t) = E{X(t)} and G (t, s) = cov {X(t), X(s)}, t, s ∈ T . The covariance func98

tion is a symmetric nonnegative-deﬁnite function with a spectral decomposition, G (t, s) =
∞ λ φ (t)φ (s), where λ ≥ λ ≥ · · · ≥ 0, are the eigenvalues satisfying ∞ λ <
1
2
k
k=1 k k
k=1 k
∞, and φk (t) ∞ are the corresponding eigenfunctions that form an orthonormal bak=1
sis.

By the standard Karhunen-Lo`ve representation (Hall and Hosseini-Nasab, 2006),
e

Xi (t) = m(t) +

∞ ξ φ (t), where the random coeﬃcients ξ are uncorrelated with
ik
k=1 ik k

mean 0 and variance λk . In what follows, we assume that λk = 0, for k > κ, where κ is a
positive integer or ∞.
We consider a typical functional data setting where Xi (·) is recorded on a regular grid
in T , and assumed to be contaminated with measurement errors. Without loss of generality,
we take T = [0, 1]. Then the observed data are Yij = Xi Tij + σ Tij εij , for 1 ≤ i ≤ n,
1 ≤ j ≤ N , where Tij = j/N , εij are independent random errors with E εij

= 0 and

E(ε2 ) = 1, and σ(·) is the standard deviation of the measurement errors. By the Karhunenij
Lo`ve representation, the observed data can be written as
e

Yij = m (j/N ) +

κ
ξ φ (j/N ) + σ (j/N ) εij ,
k=1 ik k

(4.1)

where m(·), σ(·) and {φk (·)}κ
k=1 are smooth but unknown functions of t. In addition,
1 2
1
{φk (·)}κ
k=1 are further subject to constraints 0 φk (t) dt = 1, and 0 φk (t) φk (t) dt = 0,
for k = k.

4.2.2

Spline estimators

We ﬁrst introduce some notation of the B-spline space. Divide the interval T = [0, 1] into
(Nm + 1) subintervals IJ = ω J , ω J+1 , J = 0, ..., Nm − 1, INm = ω Nm , 1 , where
Nm
(p−2)
m := ω J J=1 is a sequence of equally-spaced points, called interior knots. Let H
99

be the polynomial spline space of order p on [0, 1]. The J-th B-spline of order p is denoted
by BJ,p . We augment the boundary and the number of interior knots as ω 1−p = ... =
ω −1 = ω 0 = 0 < ω 1 < ... < ω Nm < 1 = ω Nm +1 = ... = ω Nm +p , in which ω J = Jhm ,
J = 0, 1, ..., Nm + 1 and hm = 1/ (Nm + 1) is the distance between neighboring knots.

Following Cao, Yang and Todem (2012), we estimate the mean function m(·) in (4.1) by

m(·) =
ˆ

arg min
g(·)∈H(p−2)

n
i=1

2
N
Yij − g (j/N ) =
j=1

Nm
ˆ B (·),
b
J=1−p J,p J,p

where the coeﬃcients

N 

n

Nm

T
ˆ
Yij −
bp = ˆ1−p,p , ..., ˆNm ,p
b
b
= arg min
Nm +p i=1 j=1 
R
J=1−p
¯
¯
¯
Let Y = Y·1 , . . . , Y·N T and Y·j = n−1

2

bJ,p BJ,p (j/N )
.


n Y , 1 ≤ j ≤ N . Applying elementary
i=1 ij

algebra, one obtains
m (t) = Bp (t) BT B
ˆ

−1

BT Y

(4.2)

T
is
in which Bp (t) = B1−p,p (t) , . . . , BNm ,p (t) and B = BT (1/N ) , . . . , BT (N/N )
p
p
the design matrix.
We denote by m(ν) (t) the ν-th order derivative of m (t) with respect to t. Since m (t) is
ˆ
an estimator of m (t), it is natural to consider m(ν) (t) as the estimator of m(ν) (t), for any
ˆ
ν = 1, ..., p − 2, i.e.
−1 T
(ν)
m(ν) (t) = Bp (t) BT B
ˆ
B Y,
(ν)
where Bp (t) =

(4.3)

(ν)
(ν)
B1−p,p (t), . . . , BN ,p (t) . According to B-spline property in de Boor
m
100

(2001), for p > 2 and 2 − p ≤ J ≤ Nm − 1,
BJ,p−1 (t)
BJ+1,p−1 (t)
−
ω J+p−1 − ω J
ω J+p − ω J+1

d
B (t) = (p − 1)
dt J,p

.

(ν)
Therefore, Bp (t) = Bp−ν (t) DT , in which D(ν) = D1 · · · Dν−1 Dν , with matrix
(ν)




−1
 ω −ω
 1 1−p+l




 ω −ω1
 1 1−p+l



Dl = (p − l) 

0





.
.

.





0

0

0

···

0

0

−1
ω 2 −ω 2−p+l

0

···

0

0

−1
ω 3 −ω 3−p+l · · ·

0

0

.
.
.

.
.
.

1
ω 2 −ω 2−p+l
.
.
.

.
.
.

..

0

0

···

.

1
0 ω
Nm +p−l −ω Nm



























for 1 ≤ l ≤ ν ≤ p − 2, which is the same as equation (6) in Zhou and Wolfe (2000).

4.2.3

Convergence rate

Deﬁne the following “infeasible estimator” of function m(ν)

¯
m(ν) (t) = X (ν) (t) = n−1
¯

n
(ν)
X (t),
i=1 i

t ∈ [0, 1] .

(4.4)

The term “infeasible”, borrowed from Cao, Yang and Todem (2012), refers to the fact that
(ν)
m(ν) (·) would be a natural estimator of m(ν) (·) if all random curves Xi (·) were observed.
¯
In the following, we want to show that the spline estimator m(ν) (·) in (4.3) is asymptotically
ˆ
101

equivalent to m(ν) (·).
¯
We break the error m(ν) (·) − m(ν) (·) into three terms. Let ¯·j = n−1
ˆ
ε

n ε ,
i=1 ij

1 ≤ j ≤ N . Denote the signal vector, the noise vector and the eigenfunction vectors by
T
N T , e= σ 1 ¯ , . . . , σ N ¯
1
and
m= m N , . . . , m N
N ε·1
N ε·N
T
1
. Projecting the relationship in model (4.2) onto the linφk = φk N , . . . , φk N
N
ear subspace of RNm +p spanned by BJ,p (j/N )
, we obtain the
1≤j≤N,1−p≤J≤Nm
following crucial decomposition:

(ν)
m(ν) (t) = m(ν) (t) + e(ν) (t) + ˜ (t),
ˆ
˜
˜
ξ

(4.5)

(ν)
where m(ν) (t) = Γ(ν) (t)m, e(ν) (t) = Γ(ν) (t)e and ˜
˜
˜
ξ
(t) =

κ ¯ Γ(ν) (t)φ with
k
k=1 ξ ·k

−1 T
(ν)
B and ¯·k = n−1
ξ
Γ(ν) (t) = Bp (t) BT B

n ξ , 1 ≤ k ≤ κ.
i=1 ik

The following proposition provides asymptotic properties of the three terms.

Proposition 4.2.1. Under Assumptions (C1)-(C6) in Appendix, one has

supt∈[0,1] m(ν) (t) − m(ν) (t) = o n−1/2 ,
˜

(4.6)

(ν)
supt∈[0,1] ˜ (t) − (m(ν) (t) − m(ν) (t)) = oP n−1/2 ,
ξ
¯

(4.7)

supt∈[0,1] e(ν) (t) = oP n−1/2 .
˜

(4.8)

Appendix A.2 contains proofs for the above proposition, which together with (4.5), leads
to the following semiparametric eﬃciency result.

Theorem 4.2.1. Under Assumptions (C1)-(C6) in Appendix, the B-spline estimator m(ν)
ˆ
102

√
is asymptotically equivalent to m(ν) with the n approximation power, i.e.
¯

supt∈[0,1] m(ν) (t) − m(ν) (t) = oP n−1/2 .
ˆ
¯

Remark 4.2.1. Since the “infeasible estimator” m(ν) (t) is the sample average of i.i.d. tra¯
jectories Xi (t) n , an application of the central limit theorem gives
i=1
supt∈[0,1] m(ν) (t) − m(ν) (t) = OP n−1/2 . Thus combining with the results in Theo¯
rem 4.2.1, one has

ˆ
supt∈[0,1] m(ν) (t) − m(ν) (t) = OP n−1/2 .

4.3

Conﬁdence bands

In this section, we develop the simultaneous conﬁdence bands for the derivative function
m(ν) (·).

4.3.1

Asymptotic conﬁdence bands

Let Σ(·, ·) be a positive deﬁnite function, and deﬁned as Σ(t, s) =

κ λ φ(ν)(t)φ(ν) (s),
k=1 k k
k

t, s ∈ [0, 1]. Denote by ζ (t), t ∈ [0, 1] a standardized Gaussian process such that Eζ (t) ≡ 0,
Eζ 2 (t) ≡ 1 with covariance function Eζ (t) ζ (s) = Σ (t, s) {Σ (t, t) Σ (s, s)}−1/2 , t, s ∈ [0, 1].
Denote by q1−α the 100 (1 − α)th percentile of the absolute maxima distribution of ζ (t),
t ∈ [0, 1], i.e. P supt∈[0,1] |ζ (t)| ≤ q1−α = 1 − α, ∀α ∈ (0, 1) .

103

Theorem 4.3.1. Under Assumptions (C1)-(C6) in Appendix, ∀α ∈ (0, 1), as n → ∞,

P supt∈[0,1] n1/2 m(ν) (t) − m(ν) (t) Σ (t, t)−1/2 ≤ q1−α → 1 − α.
¯

Applying Theorems 4.2.1 and 4.3.1 gives asymptotic conﬁdence bands for m(ν) (t), t ∈
[0, 1].
Corollary 4.3.2. Under Assumptions (C1)-(C6) in Appendix, ∀α ∈ (0, 1), as n → ∞, an
asymptotic 100 × (1 − α) % exact conﬁdence band for m(ν) (t) is

P m(ν) (t) ∈ m(ν) (t) ± n−1/2 q1−α Σ (t, t)1/2 ,
ˆ

4.3.2

t ∈ [0, 1] → 1 − α.

Implementation

When constructing the conﬁdence bands, one needs to estimate the unknown function Σ(t, s).
(ν)
Note that Σ(t, s) = G(t, s), when ν = 0. Following Liu and M¨ller (2009), we estimate φ
u
k
through the derivatives of G(t, s). According to Cao, Wang, Li and Yang (2012), G (t, s) is
estimated by
ˆ
G (t, s) =
ˆ
where R
= n−1
·jj

NG
ˆ
b
B (t) B
(s)
J ,p
J,J =1−p JJ J,p

n
ˆ
i=1 Yij − m (j/N )

Y

ij

− m j /N
ˆ

(4.9)

, 1 ≤ j = j ≤ N , NG is

the number of interior knots for B-spline, and the coeﬃcients

ˆ
bJJ
=

NG
J,J =1−p
2




N 


ˆ
arg min
R
−
b
B (j/N ) B
j /N
JJ J,p
J ,p
 ·jj



NG +p ⊗RNG +p
R
j=j
1−p≤J,J ≤NG
104

ˆ
with “⊗” being the tensor product of two spaces. They showed that G converges to G as n
ˆ
goes to ∞. In this section, we further show that G and G are asymptomatically equivalent
up to the ν-th partial derivative. We deﬁne the ν-th derivative with respect to s for G(t, s)
ˆ
and G(t, s) as
∂ν ˆ
∂ν
ˆ
G(0,ν) (t, s) = ν G(t, s), G(0,ν) (t, s) = ν G(t, s) =
∂s
∂s

(ν)
ˆ
bJJ BJ,p (t) B
(s) . (4.10)
J ,p
J,J

Theorem 4.3.3. Under Assumptions (C1)-(C6), one has

ˆ
sup
G(0,ν) (t, s) − G(0,ν) (t, s) = oP (1) ,
(t,s)∈[0,1]2

1 ≤ ν ≤ p − 2.

The proof of Theorem 4.3.3 is given in Appendix A.3.
According to Liu and M¨ller (2009), we estimate the ν-th derivative of eigenfunctions
u
ˆ (ν) using the following eigenequations,
φ
k
1 ∂ν
1
dν
ˆ
ˆ
ˆ ˆ (ν)
ˆ
ˆ
G(t, s)φk (t) dt =
ν
ν G(t, s)φk (t) dt = λk φk (s) ,
ds 0
0 ∂s

(4.11)

1 ˆ2
1ˆ
ˆ
ˆ
where φk are subject to 0 φk (t) dt = 1 and 0 φk (t) φk (t) dt = 0 for k < k. If N is
suﬃciently large, the left hand side of (4.11) can be approximated by
j
N G(0,ν) ( j , j )φ
ˆ
ˆ
j=1
N N k N . Then we estimate Σ(t, s) by
ˆ ˆ (ν) ˆ (ν) (s).
ˆ
Σ(t, s) = κ λk φ (t)φ
k=1
k
k
1
N

ˆ
The following theorem shows that Σ(·, ·) and Σ(·, ·) are asymptomatically equivalent.

105

Theorem 4.3.4. Under Assumptions (C1)-(C6), one has

ˆ
sup
Σ(t, s) − Σ(t, s) = oP (1) .
(s,t)∈[0,1]2

The proof of Theorem 4.3.4 is given in Appendix A.4.
ˆ
ˆ
In practice, we choose the ﬁrst L positive eigenvalues λ1 ≥. . . ≥ λL > 0 by eigenvalue
ˆ
decomposition of G(t, s). Then we apply a standard criterion in M¨ller (2009), to choose the
u
number of eigenfunctions, i.e. κ = arg min1≤l≤L

l
ˆ
k=1 λk /

L λ > 0.95 . M¨ller
ˆ
u
k=1 k

(2009) suggests the “pseudo-AIC” and this simple method of counting the percentage of
variation explained can be used to choose the number of principal components. The simple
method performed well in our simulations and is used for our numerical studies.
To construct the conﬁdence bands, we use cubic splines to estimate the mean and covariance functions and their ﬁrst order derivatives. Generalized cross-validation is used to
choose the number of knots Nm (from 2 to 20), to smooth out the mean function. According
to Assumption (C3), the number of knots for smoothing the covariance function is taken to
be NG = [n1/(2p) log(n)], where [a] denotes the integer part of a.
Finally, in order to estimate q1−α , we generate i.i.d standard normal variables Zk,b ,
ˆ
ˆ
ˆ (ν)
ˆ
1 ≤ k ≤ κ, b = 1, . . . , 5000. Let ζ b (t) = Σ (t, t)−1/2 κ
k=1 λk Zk,b φk (t), t ∈ [0, 1],
ˆ
q1−α can be estimated by 100(1 − α)-th percentile of {supt∈[0,1] |ζ b (t)|}5000 . Therefore,
b=1
in application we recommend the following band

ˆ
m(ν) (t) ± n−1/2 Σ (t, t)1/2 q1−α ,
ˆ
ˆ

106

t ∈ [0, 1].

(4.12)

4.4

Numerical studies

4.4.1

Simulated examples

To illustrate the ﬁnite-sample performance of the conﬁdence band in (4.12), we generate
data from the following:

Yij = m (j/N ) +

κ
ξ φ (j/N ) + εij ,
k=1 ik k

i.i.d.
εij ∼ N (0, 0.12 ),

for 1 ≤ i ≤ n, 1 ≤ j ≤ N . We consider two scenarios.
√
√
Model I: m(t) = 5t+4 sin(2π(t−0.5)), φ1 (t) = − 2 cos(2π(t−0.5)), φ2 (t) = 2 sin(4π(t−
0.5)), ξ ik ∼ N (0, λk ), λ1 = 2, λ2 = 1, κ = 2;
√
(t−0.5)2
Model II: m(t) = 4t + √ 1
exp −
, φk (t) = 2 sin (πkt), ξ ik ∼ N (0, λk ),
2π0.1
2(0.1)2
λk = 2−(k−1) , k = 1, 2, . . . , κ = 8.
The second case has similar design as in Simulation C of Liu and M¨ller (2009). We use
u
the proposed method in (4.12) and its “oracle” version with true Σ (t, t) to construct the
conﬁdence bands for m(1) (·) respectively in both studies. We consider two conﬁdence levels:
1 − α = 0.95, 0.99. The number of trajectories n is taken to be 30, 50, 100, 200, and for
each n, we try diﬀerent numbers of observations on the trajectory. Each simulation consists
of 1000 Monte Carlo samples.
We evaluate the coverage of the bands over 200 equally spaced points on [0, 1] and test
whether the true functions are covered by the conﬁdence bands at these points. Tables
4.1 and 4.2 show the empirical coverage probabilities out of 1000 replications for Models
I and II, respectively. From Tables 4.1 and 4.2, we observe that coverage probabilities for
107

Table 4.1: Coverage rates of the spline conﬁdence bands in Model I.
95%

99%

n

N

Est.

Oracle

Est.

Oracle

30

30

0.756

0.831

0.901

0.936

60

0.833

0.900

0.926

0.969

50

0.824

0.863

0.932

0.933

100 0.856

0.904

0.947

0.979

100 100 0.851

0.897

0.943

0.971

200 0.856

0.907

0.949

0.971

200 200 0.866

0.910

0.950

0.972

400 0.869

0.944

0.959

0.987

50

both estimated bands and “oracle” bands approach the nominal levels, which show positive
conﬁrmation of Theorem 4.3.1. In most of the scenarios the “oracle” conﬁdence bands
outperform the estimated bands, and the “oracle” bands arrive at about the nominal coverage
for large n and N . The convergence rates of estimated bands are slower than those “oracle”
bands, but the convergence trend to nominal level is clearly.

Figure 4.1 to Figure 4.8 show the estimated functions and their 99% conﬁdence bands
for the ﬁrst order derivative curve m(1) (·) for Models I and II, respectively. As expected
when n and N increase, the conﬁdence band is narrower and the cubic spline estimator is
closer to the true derivative curve. For Model I, the boundary eﬀects in all four panels are
almost unnoticeable. For Model II, there seems to be some boundary eﬀects for small n and
N , which are attenuated as n and N increase.
108

−20

0

m ( 1)

20

40

n=30, N=30

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.1: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model I. n = 30, N = 30.

109

−20

0

m ( 1)

20

40

n=30, N=60

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.2: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model I. n = 30, N = 60.

110

−20

0

m ( 1)

20

40

n=50, N=100

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.3: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model I. n = 50, N = 50.

111

−20

0

m(1)

20

40

n=50, N=100

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.4: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model I. n = 50, N = 100.

112

−20

0

m ( 1)

20

40

n=30, N=30

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.5: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model II. n = 30, N = 30.

113

−20

0

m ( 1)

20

40

n=30, N=60

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.6: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model II. n = 30, N = 60.

114

−20

0

m ( 1)

20

40

n=50, N=50

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.7: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model II. n = 50, N = 50.

115

−20

0

m(1)

20

40

n=50, N=100

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4.8: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of m(1) (t) (solid line) in Model II. n = 50, N = 100.

116

Table 4.2: Coverage rates of the spline conﬁdence bands in Model II.
95%

99%

n

N

Est.

Oracle

Est.

Oracle

30

30

0.650

0.730

0.830

0.854

60

0.695

0.839

0.860

0.941

50

0.817

0.911

0.930

0.982

100 0.830

0.929

0.933

0.986

100 100 0.858

0.940

0.948

0.986

200 0.876

0.939

0.960

0.986

200 200 0.874

0.939

0.949

0.980

400 0.889

0.946

0.963

0.991

50

4.4.2

Tecator data

Here we apply the proposed method to the Tecator dataset, which can be introduced in
chapter 2.6. Figure 4.9 shows the estimated mean absorbance measurements m(·) and its
estimated ﬁrst order derivative m(1) (·) in the lower panel. Their 99% conﬁdence bands
(dashed lines) are also included in the ﬁgure, both bands have similar band width around
0.1 to 0.2 even though the bands for m(1) (·) looks much narrower in the ﬁgure.
As shown in Figure 4.9, in the region of 850 − 950 nm the derivative estimate of mean
absorbance is increasing gradually above 0, which corroborates with the convex behavior
of the corresponding estimated mean function. For wavelength between 950 and 970 nm,
the big bump in the derivative graph is consistent with the changing pattern of the mean
estimate before it reaches the turning point at around 970 nm. When wavelength is larger
than 970 nm, its estimated derivative turns negative and is relatively ﬂat after 1000 nm, and
117

it is in accordance with the quick dip of the mean absorbance and a linear decreasing trend
for wavelength larger than 1000 nm.

118

3.6
3.4
3.2
2.8

3.0

absorbance

850

900

950

1000

1050

1
−2

−1

0

µ(1)

2

3

4

wavelength

0.0

0.2

0.4

0.6

0.8

1.0

Wavelength

Figure 4.9: Plots of the cubic spline estimators (dotted-dashed line) and 99% conﬁdence
bands (upper and lower dashed lines) of the mean function and its ﬁrst order derivative.

119

APPENDIX

120

For any r ∈ (0, 1], we denote C q,r [0, 1] as the space of H¨lder continuous functions on
o
[0, 1], C q,r [0, 1] = φ : φ q,r < +∞ . The technical assumptions we need are as follows:
(C1) The regression function m ∈ C p−1,1 [0, 1];
(C2) The standard deviation function σ (t) ∈ C 0,δ [0, 1] for some δ ∈ (0, 1];
nθ for some θ > 1+2ν ; the
2(p−ν)
1
1
Nm
(N/log(n)) 1+2ν , n 2p

(C3) The number of observations for each trajectory N
1
2(p−ν)
number of interior knots satisﬁes n
1
NG
n 2+2ν ;

(C4) There exists a constant C > 0 such that Σ (t, t) > C, for any t ∈ [0, 1];
(ι)
(C5) For k ∈ {1, . . . , κ}, ι = 0, 1, . . . , p − 2, φ (t) ∈ C 0,δ [0, 1], for some δ ∈ (0, 1],
k
(ι)
κ
< ∞; and for a sequence {κn }∞ of increasing integers with
n=1
k=1 λk φk
∞
(ι)
−δ κn
= o (1);
λk φ
limn→∞ κn = κ, Nm
k 0,δ
k=1
4+δ 2
(C6) There exist δ 1 > 0, δ 2 > 0 such that E ξ ik 4+δ 1 +E εij
< +∞, for 1 ≤ i < ∞,
1 ≤ k ≤ κ, 1 ≤ j < ∞. The number κ of nonzero eigenvalues is ﬁnite or κis inﬁnite
while the variables

ξ ik 1≤i<∞,1≤k<∞ are i.i.d..

Assumptions (C1)-(C6) are standard in the spline smoothing literature; see Huang (2003)
and Cao, Yang and Todem (2012), for instance. In particular, (C1) and (C2) guarantee the
convergence rates of m(t) and its derivatives. Assumption (C3) states the requirement of the
ˆ
number of observations within each curve to the sample size, and the order of the number
of knots of splines. Assumptions (C4) and (C5) concern that the derivatives of principal
components have collectively bounded smoothness. When ι = 0, Assumption (C5) is the
121

same as (C4) in Cao, Yang and Todem (2012). Assumption (C6) is necessary when using
strong approximation result in Lemma 4.4.4.
(ν)
(ν)
If κ is ﬁnite and all φ
(t) ∈ C 0,δ [0, 1], then Assumption (C5) on φ ’s holds trivially.
k
k
For κ = ∞, Assumption (C5) is fulﬁlled as long as λk decreases to zero suﬃciently fast. For
example, considering the following canonical orthonormal Fourier basis of L2 ([0, 1]):
√
φ1 (t) ≡ 1, φ2k+1 (t) ≡ 2 cos (kπt)
√
2 sin (kπt) , k = 1, 2, ..., t ∈ [0, 1] ,
φ2k (t) ≡

we can take λ1 = 1 and λk = ([k/2] π)−2ν ρ2[k/2] , k = 2, ..., for any ρ ∈ (0, 1), then
√
√
√
(ν)
∞
= 1 + ∞ ρk
λk φ
2 + 2 = 1 + 2 2ρ (1 − ρ)−1 < ∞. While for
k=1
k=1
k ∞
any δ = 1 and {κn }∞ with κn increasing, odd and κn → ∞, one has
n=1
−1
Nm

κn
k=1

√
(κn −1)/2 k √
(ν)
−1
λk φ
= Nm
ρ
2kπ + 2kπ
k 0,1
k=1
√
√
∞
−1
−1
≤ 2 2πNm ρ
ρk−1 k = 2 2πNm (1 − ρ)−2
k=1
−1
= O Nm = o (1) .

In the following, deﬁne the theoretical and empirical inner product matrices of
Nm
BJ,p (t)
as
J=1−p

Vp =
ˆ
Vp =

Nm
Nm
= v
,
JJ ,p J,J =1−p
J,J =1−p
Nm
Nm
BJ,p , B
= v
ˆ
.
J ,p 2,N J,J =1−p
JJ ,p JJ =1−p

BJ,p , B
J ,p

ˆ
We establish next that Vp has an inverse with bounded L∞ norm.
122

(4.13)

ˆ
Lemma 4.4.1. [Cao, Yang and Todem (2012)] Under Assumption (C3), for Vp and Vp
ˆ
deﬁned in (4.13), Vp − Vp

∞

ˆ −1
= O (Nm ).
= O N −1 and Vp
∞

Proof of Proposition 4.2.1
Following Wang and Yang (2009b), we introduce the p-th order quasi-interpolant of m corresponding to the knots

, denoted by Q

(m); see equation (4.12) on page 146 of DeVore

and Lorentz (1993) for details. According to Theorem 7.7.4, DeVore and Lorentz (1993), the
following lemma holds.

Lemma 4.4.2. There exists a constant C > 0, such that for 0 ≤ ν ≤ p − 2 and m ∈
C p,1 [0, 1],
(m − Q

(m))(ν)

∞

≤ C m(p)

p−ν
h
.
∞ m

Lemma 4.4.3. Under Assumptions (C2), (C3) and (C6), one has

1
N



N

BJ,p (j/N ) σ (j/N ) ¯·j = OP 
ε
j=1


log(n) 
.
nN Nm

Proof. We ﬁrst truncate the random error εij by un = (nN )γ (2/9 < γ < 1/3) and
write εij = εij,1 +εij,2 +aij , where εij,1 = εij I

εij > un , εij,2 = εij I

aij = E εij I εij ≤ un . It is easy to see that aij = −E εij I εij > un
4+δ 2
−(3+δ 2 )
εij
un
. It is straightforward from the boundedness of spline basis that

aij and
≤E

εij ≤ un −

1
nN

N

n
BJ,p (j/N ) σ (j/N )

j=1

i=1
123

−(3+δ 2 )
aij = O(un
).

Next we show that the tail part vanishes almost surely. Note that
∞
P
n=1

4+δ 2
∞
∞ E ε
ij
−(4+δ 2 )
un
< ∞.
≤ Mδ
εij > un ≤
4+δ 2
n=1
n=1 un

By the Borel-Cantelli Lemma, P ω|∃N (ω) , εij (ω) ≤ un f or n > N (ω) = 1. Let vε =
max

εij , 1 ≤ i, j ≤ N (ω)

and there exists N1 (ω) > N (ω), uN (ω) > vε . Since un =
1

(nN )γ is an increasing function, un > uN (ω) > vε and n > N1 (ω).
1

Therefore P {ω|∃N (ω) , εij (ω) ≤ un , 1 ≤ i ≤ n, 1 ≤ j ≤ N , for min (n, N ) >
N (ω)} = 1, which implies P {ω|∃N (ω) , εij,1 = 0, 1 ≤ i ≤ n, 1 ≤ j ≤ N for min (n, N ) >
N (ω)} = 1. Thus

1
nN

n

N
BJ,p (j/N ) σ (j/N )

εij,1 = Oa.s. (nN )−k , for any k > 0.

i=1

j=1

Next denote Dj = (nN )−1 BJ,p (j/N ) σ (j/N )

Var εij,2 = 1 − E ε2 I
ij

2
one has Vn = Var

N D
j=1 j

εij > un

n ε
i=1 ij,2 . Since

− a2 = 1 + O P
ij

−δ
−2(1+δ 2 )
un 2 + un
,

= c (nN Nm )−1 for a constant c > 0. Now Minkowski’s

inequality implies that

E εij,2

Thus E Dj

k

≤ 2k−1 E εk I
ij

k

k−2
2
≤ 2n−1 N −1 un
k!E(Dj ) < ∞ with the Cramer constant c∗ = 2un .
nN

εij ≤ un + ak
ij

124

2
k−2
≤ 2k−2 un E εij,2 k!,

k ≥ 2.

log(n)
nN Nm . By the Bernstein inequality, for any large enough δ > 0,

For any δ, let δ n = δ

P


 N


Hence

j=1




Dj ≥ δ n ≤ 2 exp


∞ P
n=1

1
nN






2
−δ n
= 2 exp
≤ 2n−3 .
 4c + 4 un δ n 
nN Nm
nN

−δ 2
n
2
4Vn + 2c∗ δ n

N B (j/N ) σ(j/N )
j=1 J,p

n ε
i=1 ij,2 ≥ δ n

≤2

∞ n−3 < ∞,
n=1

for such δ > 0. Thus Borel-Cantelli’s lemma implies the desired result.

Lemma 4.4.4. [Theorem 2.6.7 of Cs˝rg˝ and R´v´sz (1981)] Suppose that ξ i , 1 ≤ i ≤ n
o o
e e
are i.i.d. with E(ξ 1 ) = 0, E(ξ 2 ) = 1 and H(x) > 0 (x ≥ 0) is an increasing continuous
1
function such that x−2−γ H(x) is increasing for some γ > 0 and x−1 logH(x) is decreasing with EH (|ξ 1 |) < ∞. Then there exist constants C1 , C2 , a > 0 which depend only
on the distribution of ξ 1 and a sequence of Brownian motions {Wn (m)}∞ , such that
n=1
for any {xn }∞ satisfying H −1 (n) < xn < C1 (nlog(n))1/2 and Sm =
n=1

m ξ , then
i=1 i

P max1≤m≤n |Sm − Wn (m)| > xn ≤ C2 n {H (axn )}−1 .
Proof of Proposition 4.2.1. We ﬁrst show (4.6). According to Theorem A.1 of
Huang (2003), there exists an absolute constant C > 0, such that

m−m ∞ ≤C
˜

p
inf
g − m ∞ ≤ C m(p)
hm .
∞
g∈C p,1

(4.14)

Applying Lemma 4.4.2, for 0 ≤ ν ≤ p − 2,

{Q

(m) − m}(ν)

∞

≤ C m(p)

As a consequence of (4.14) and (4.15) if ν = 0, one has Q
125

p−ν
hm .
∞

(4.15)

p
(m) − m ∞ ≤ C m(p)
˜
hm ,
∞

which, according to the diﬀerentiation of B-spline given in de Boor (2001), entails that

{Q

(m) − m}(ν)
˜

∞

≤ C m(p)

p−ν
h
,
∞ m

(4.16)

for 0 ≤ ν ≤ p − 2. Combining (4.15) and (4.16) proves (4.6) for ν = 1, ..., p − 2.

(ν)
Next we prove (4.7). Similar to the deﬁnition of m(ν) (t) and ˜ (t), in the following
˜
ξ
k
(ν)
˜
we denote φ
(t) = Γ(ν) (t) φk , for any k ≥ 1. Using the similar arguments as in the proof
k
of (4.6), we can show that, for any k ≥ 1,

(ν) ˜ (ν)
p−ν
≤ Cφ hm .
φ
−φ
k
k ∞

(4.17)

(ν)
˜ (ν)
Also, according to triangle inequality one has that φ
≤ cφ φ
= O(1).
k ∞
k ∞
According to Assumption (C6), E ξ ik 4+δ 1 < +∞, δ 1 > 0, so there exists some
n
β ∈ (0, 1/2), such that 4 + δ 1 > 2/β. Let H(x) = x4+δ 1 , xn = nβ , then
=
H(axn )
a−4−δ 1 n1−(4+δ 1 )β = O n−γ 1 for some γ 1 > 1. Applying Lemma 4.4.4 and BorelCantelli Lemma, one ﬁnds i.i.d. variables Zik,ξ ∼ N (0, 1) such that

¯
max ¯·k − Z·k,ξ
ξ
k≥1
¯
where Z·k,ξ = n−1

λk = Oa.s. nβ−1 ,

(4.18)

n Z
i=1 ik,ξ , k ≥ 1.

¯
If κ is ﬁnite, according to (4.18) note that ¯·k ≤ Z·k,ξ
ξ

¯
λk + ¯·k − Z·k,ξ
ξ

λk ,

1 ≤ k ≤ κ, so max1≤k≤κ ¯·k = OP (n−1/2 + nβ−1 ). Then the deﬁnition of m(t) in (4.4)
ξ
¯
126

and (4.17) entail that

(ν)
m(ν) − m(ν) − ˜
¯
ξ

(ν) ˜ (ν)
= oP n−1/2 .
ξ
−φ
≤ κ max ¯·k max φ
k
k ∞
1≤k≤κ
1≤k≤κ
∞

Thus (4.8) holds according to Assumption (C3).

If κ = ∞, using similar arguments in Cao, Yang and Todem (2012), by (4.18) one can
−1/2
−1/2 ¯
−1/2
¯
ξ
=
ξ
− Z·k,ξ , for any k ≥ 1, so ¯·k λ
ξ
≤ Z·k,ξ + ¯·k λ
show that ¯·k λ
k
k
k
OP (n−1/2 + nβ−1 ). Also following Assumption (C5) one has

≤

≤

+

≤

Hence,

(ν)
E sup m(ν) (t) − m(ν) (t) − ˜
¯
ξ
(t)
t∈[0,1]
κ
κ
(ν)
(ν)
˜ (ν) (t) ≤ C
¯
(t) − φ
E ¯·k sup φ
ξ
(t)
E ξ ·k sup φ
k
k
k
t∈[0,1]
t∈[0,1]
k=1
k=1

 κn
−1/2
1/2 (ν)
φ
E ¯·k λ
ξ
λ
hδ
C
k
k
k 0,δ m

k=1

κ

−1/2
1/2 (ν)
φ
E ¯·k λ
ξ
λ
k
k
k ∞
k=κn +1


κ
 κn

1/2 (ν)
1/2 (ν)
Cn−1/2
λ
φ
hδ +
λ
φ
= o n−1/2 .
k
k 0,δ m
k
k ∞

k=1
k=κn +1
(ν)
m(ν) − m(ν) − ˜
¯
ξ
∞

= oP n−1/2 . According to Lemmas 4.4.1 and 4.4.3,

ﬁnally we have

e(ν)
˜

∞

(ν)
ˆ −1
Bp (t) Vp N −1 BT e

ˆ −1
≤ Ch−ν Vp
N −1 BT e
m
∞
∞
∞
−1/2−ν
= Oa.s. n−1/2 N −1/2 hm
log(n) .

=

127

Thus (4.8) holds according to Assumption (C3).

Proof of Theorem 4.3.1

˜
We denote ζ k (t) =

˜
ζ (t) = n1/2

(ν)
¯
λk Z·k,ξ φ
(t) , k = 1, . . . , κ and deﬁne
k

2 −1/2
κ
(ν)
λ
φ (t)
k
k=1 k

κ ˜
ζ (t) = n1/2 Σ (t, t)−1/2
k=1 k

κ ˜
ζ (t).
k=1 k

˜
It is clear that, for any t ∈ [0, 1], ζ (t) is Gaussian with mean 0 and variance 1, and the
˜ ˜
covariance E ζ (t) ζ (s) = Σ (t, t)−1/2 Σ (s, s)−1/2 Σ (t, s), for any t, s ∈ [0, 1]. That is, the
˜
distribution of ζ (t), t ∈ [0, 1] and the distribution of ζ (t), t ∈ [0, 1] in Section 3.1 are
identical. Similar to the proof of (4.7), Note that

(ν)
˜
E supt∈[0,1] ζ (t) − n1/2 Σ (t, t)−1/2 ˜ (t)
ξ
= n1/2 E supt∈[0,1] Σ (t, t)−1/2

κ ˜
(ν)
ζ (t) − ˜ (t)
ξ
k=1 k

κ
1/2 E sup Σ (t, t)−1/2
≤ n
t∈[0,1]
k=1

¯
Z·k,ξ

(ν)
(ν)
˜ (ν)
λk − ¯·k φ (t) + ¯·k φ (t) − φ (t)
ξ
ξ
k
k
k

128

(ν)
p−ν
˜
ξ
If κ is ﬁnite, then E supt∈[0,1] ζ (t) − n1/2 Σ (t, t)−1/2 ˜ (t) = O nβ−1/2 + hm

=

o(1). If κ = ∞, by Assumption (C5) one has
κ
(ν)
(ν)
1/2 E sup Σ (t, t)−1/2
˜ (ν)
¯
n
ξ
λk Z·k,ξ − ¯·k φ (t) + ¯·k φ (t) − φ (t)
ξ
k
k
k
t∈[0,1]
k=1
≤ n1/2 sup Σ (t, t)−1/2
t∈[0,1]


κ
κ


(ν)
(ν)
β−1
−1/2
˜ (ν) (t)
n
λk φ (t) + n
λk φ (t) − φ
k
k
k


k=1
k=1
= o(1).

Theorem 4.3.1 follows directly.

Proof of Theorem 4.3.3

Following Cao, Wang, Li and Yang (2012), we deﬁne the tensor product spline space as

H(p−2),2 [0, 1]2 ≡ H(p−2),2 = H(p−2) ⊗ H(p−2)


 NG



b
B (t) B
(s) , t, s ∈ [0, 1] .
=
JJ J,p
J ,p




J,J =1−p
¯
Let Rij ≡ Yij − m(j/N ), 1 ≤ i ≤ n, 1 ≤ j ≤ N and R·jj = n−1

n R R ,
i=1 ij ij

1 ≤ j, j ≤ N . Note that the spline estimator in (4.9) is

ˆ
G(·, ·) =

arg min
g(·,·)∈H(p−2),2

N
j=j
129

ˆ
R
− g j/N, j /N
·jj

2

,

so we deﬁne the “infeasible estimator” of the covariance function

˜
G (·, ·) = argmin

g(·,·)∈H(p−2),2

¯
R·jj − g j/N, j /N

2

.

(4.19)

1≤j=j ≤N

(ν)
(0,ν)
Denote X = B ⊗ B, Bp (t, s) = Bp (t) ⊗ Bp (s) and Bp
(t, s) = Bp (t) ⊗ Bp (s). Let
¯
¯
vector R = R
·jj

1≤j,j ≤N

, then we have

ν
−1 T
ˆ
˜ (0,ν) (t, s) ≡ ∂ G(t, s) = B(0,ν) (t, s) XT X
¯
X R.
G
p
∂sν

(4.20)

(0,ν)
(ν)
In the following we write φ
(t, s) = φk (t) φ (s), φ
(t, s) = φk (t) φ
(s). Let
kk
k
kk
k
φ
= φk ⊗ φ , where φk is a N -dimensional vector deﬁned in Section 2.3,
kk
k
−1 T
φ
(t, s) = Bp (t, s) XT X
X φ . Further we denote
kk
kk
−1 T
(0,ν)
(0,ν)
φ
(t, s) = Bp
(t, s) XT X
X φ .
kk
kk

Lemma 4.4.5. Under Assumptions (C5), for any 1 ≤ ν ≤ p − 2, and κ = ∞, one has

λk λ
k
k,k ≥1

(0,ν)
(0,ν)
φ
−φ
= o (1) .
kk
kk
∞

Proof. When κ = ∞, one has λk > 0 for any k ≥ 1. Note that

λk λ
k
k,k ≥1

(0,ν)
φ
≤
kk
∞ k≥1

λk φk ∞

λ
k
k ≥1

130

(ν)
φ
= o (1) .
k ∞

Also similarly,

λk λk
k,k ≥1

(0,ν)
≤
φ
kk
∞
k≥1
≤ C

λk φk
∞

λk
k ≥1

λk φk ∞
k≥1

λ
k
k ≥1

(ν)
φ
k ∞
(ν)
= o (1) .
φ
k ∞

Lemma 4.4.6. Under Assumption (C5), for any 0 ≤ ν ≤ p − 2 and k, k ≥ 1, one has
(0,ν)
(0,ν)
p−ν
.
= O hG
φ
−φ
kk
kk
∞
Proof. According to Theorem 12.8 of Schumaker (2007), there exists an absolute constant C > 0, such that

(p)
(p)
p
φ
−φ
≤ C φ φ + φk φ
hG ,
kk
kk ∞
k k
k ∞
which proves (4.6) for the case υ = 0. Let Q (f ) the p-th order quasi-interpolant of a function
f ; see the deﬁnition in (12.29) of Schumaker (2007), for 0 ≤ ν ≤ p − 2,

Q φ
kk

−φ
kk

(0,ν)

≤C φ
∞

(p)
(p)
p−ν
φ + φk φ
hG .
k k
k ∞

For the case ν = 0, one has

Q φkk

− φkk

∞

Q φkk − Q(φkk )
≤ C φkk − φkk
∞
∞
(p)
(p)
p
≤ C φ φk + φk φ
hG ,
k
k ∞
=

which, according to the diﬀerentiation of B-spline given in de Boor (2001), entails that
Q φ
kk

−φ
kk

(0,ν)

(p)
(p)
p−ν
≤ C φ φ + φk φ
h
, for 0 ≤ ν ≤ p − 2.
k k
k ∞ G
∞
131

Lemma 4.4.7. Under Assumptions (C1)-(C6), for any 1 ≤ ν ≤ p − 2

˜
G(0,ν) − G(0,ν)

∞

p−ν
= OP n−1/2 + hG
+ oP N −1 n−1/2 h−1−ν log1/2 n , (4.21)
G

ˆ
˜
G(0,ν) − G(0,ν)

= OP

∞

−3/2−ν
p
n−1 hG
+ n−1/2 h−1−ν hm ,
G

(4.22)

ˆ
ˆ
where G(0,ν) and G(0,ν) are given in (4.10) and (4.20).

Proof. Let ¯
ξ
= n−1
·kk

n ξ ξ
−1
ε
i=1 ik ik and ¯·jj = n

n ε ε . To show (4.21),
i=1 ij ij

¯
we decompose R
in (4.19) into
·jj
κ
¯
R

1jj

=
k=k

¯
R3jj = σ

¯
R
= n−1
4jj

j
N


n  κ

σ

j
N

ξ ik φk


i=1 k=1

κ

j j
,
N N

¯
ξ
φ
·kk kk

¯ φ
ξ ·kk kk

¯
, R2jj =
k=1

j j
,
N N

,

¯·jj ,
ε

j
N

κ

j
N

σ

ε

ij

ξ ik φk

+
k=1

j
N

σ

j
N

εij




.



−1 T
¯
¯
˜ (0,ν) (t, s) = B(0,ν) (t, s) XT X
¯
Denote Ri = R
X Ri , i = 1, 2, 3, 4.
, Ri
p
ijj 1≤j,j ≤N
˜
˜ (0,ν) (t, s) + R(0,ν) (t, s) + R(0,ν) (t, s) + R(0,ν) (t, s). Next we deﬁne
˜
˜
˜
Then G(0,ν) (t, s) = R1
2
3
4
(0,ν)
R1
(t, s) =

κ

(0,ν)
¯
ξ ·kk φ
(t, s) ,
kk

k=k
(0,ν)
R2
(t, s) = G(0,ν) (t, s) +

κ

(0,ν)
φ
(t, s) ¯·kk − λk
ξ
kk

.

k=1
˜ (0,ν) (t, s) =
Note that R1

(0,ν)
κ
¯
ξ
φ
(t, s), then Lemma 4.4.5 and Assumption (C5)
·kk kk
k=k
132

imply that if κ is ∞, λk λ > 0 and one has
k
˜ (0,ν) (t, s) − R(0,ν) (t, s)
sup
E R1
1
(t,s)∈[0,1]2
κ
−1/2
(0,ν)
(0,ν)
ξ
λk λ
= o (1) .
≤
E ¯
λk λ
φ
−φ
·kk
k
k
kk
kk
∞
k=k
Similarly, one has

˜ (0,ν) (t, s) − R(0,ν) (t, s)
sup
E R2
2
(t,s)∈[0,1]2
κ
(0,ν)
(0,ν)
E ¯·kk λ−1 − 1 λk φ
ξ
≤
−φ
= o (1) .
kk
k
kk
∞
k=1

If κ is ﬁnite, then Lemma A.6 and Assumption (C3) imply that

˜ (0,ν) − R(0,ν)
R1
1

≤ κ2
∞

(0,ν)
(0,ν)
p−ν
φ
−φ
= OP hG n−1/2 .
kk
kk
∞

¯
max
ξ
·kk
1≤k=k ≤κ

Similarly,

˜ (0,ν) − R(0,ν)
R2
2

≤ κ max ¯·kk − λk
ξ
1≤k≤κ
∞

˜ (0,ν) − R(0,ν)
Hence, R2
2

˜ (0,ν) − R(0,ν)
+ R1
1

∞
Wang, Li and Yang (2012), one has

¯
N −2 XT R3

∞

(0,ν)
(0,ν)
p−ν
φ
−φ
= OP hG
.
kk
kk
∞

¯
= N −2 XT R4

∞

= o (1). By Proposition 3.1 in Cao,
∞

= oP N −1 n−1/2 hG log1/2 (n) .

133

˜ (0,ν)
Hence, R3

∞

˜ (0,ν)
= R4

˜
G(0,ν) − G(0,ν)

∞

= oP N −1 n−1/2 h−1−ν log1/2 n . Therefore,
G
κ

∞

¯
ξ
·kk

≤

˜ (0,ν) − R(0,ν)
+ R1
1

k=k

κ
(0,ν)
¯
+
ξ ·kk − λk
φ
kk
∞ k=1

˜ (0,ν) − R(0,ν)
+ R2
2

˜ (0,ν)
+ R3

∞
∞
∞
−1/2 + hp−ν + o
−1 n−1/2 h−1−ν log1/2 n .
= OP n
P N
G
G

(0,ν)
φ
kk
∞

˜ (0,ν)
+ R4

∞

The proof of (4.22) is similar to Proposition 2.1 in Cao, Wang, Li and Yang (2012), thus
omitted.
Proof of Theorem 4.3.3: According to Lemma 4.4.7, one has

ˆ
G(0,ν) − G(0,ν)

∞

˜
≤ G(0,ν) − G(0,ν)

∞

ˆ
˜
+ G(0,ν) − G(0,ν)

∞

= oP (1).

Proof of Theorem 4.3.4
ˆ
ˆ
We ﬁrst show asymptotic consistency of λk and φk (·), for k ≥ 1, in the following lemma.

Lemma 4.4.8. Under Assumptions (C1)-(C6), one has

ˆ
λk − λk = oP (1) ,

ˆ
φk − φk
= oP (1) ,
∞

k ≥ 1.

Proof. We ﬁrst want to show that for any k ≥ 1, ∆φk ∞ = oP (1), in which ∆ is the
ˆ
integral operator with kernel G − G. Note that ∆φk (t) =
ˆ
Theorem 4.3.3, when ν = 0, G − G

∞

ˆ
(G − G) (s, t) φk (s) ds. By

= oP (1). Thus, for any k ≥ 1, ∆φk ∞ = oP (1).
134

Hall and Hosseini-Nasab (2006) gives the L2 expansion

ˆ
φk − φk =

λk − λ

−1

∆φk , φ
k

k

φ +O
k

∆ 2 ,
2

k =k

ˆ
G (s, t) − G (s, t)

where ∆ 2 =

2

1/2
dsdt

. By Bessel’s inequality, one has

u
∆φk 2 + ∆ 2 = oP (1). By (4.9) in Hall, M¨ller and Wang (2006)
∞
2

ˆ
φk − φk ≤ C
2
and Theorem 4.3.3

ˆ
λk − λk =

ˆ
(G − G) (s, t) φk (s) φk (t) dsdt + O

∆φk 2 = oP (1) .
2

ˆ
Thus λk − λk = oP (1) for any k ≥ 1. Next note that

ˆ ˆ
λk φk (t) − λk φk (t) =

ˆ
ˆ
G (s, t) φk (s) ds −

ˆ
ˆ
(G − G) (s, t) φk (s) − φk (s) ds +

=
+

G (s, t) φk (s) ds

ˆ
(G − G) (s, t) φk (s) ds

ˆ
G (s, t) φk (s) − φk (s) ds.

By the Cauchy-Schwarz inequality, uniformly for all t ∈ [0, 1]

ˆ
G (s, t) φk (s) − φk (s) ds ≤

G2 (s, t) ds

Similar arguments and Theorem 4.3.3 imply that

1/2

ˆ
φk − φk = oP (1) .
2

ˆ
ˆ
(G − G) (s, t) φk (s) − φk (s) ds =

ˆ ˆ
ˆ
oP (1) and (G−G) (s, t) φk (s) ds = oP (1). All the above together yield λk φk − λk φk
=
∞

135

oP (1). By the triangle inequality and

ˆ
ˆ ˆ
ˆ
+ λk − λk
≤ λk φk − λk φk
λk φk − φk
∞
∞

ˆ
= oP (1) ,
φk
∞

the second result in Lemma 4.4.8 follows directly.
Proof of Theorem 4.3.4. According to Lemma 4.4.8 and Theorem 4.3.3, one has
1
1
ˆ
ˆ
ˆ (ν) (s) − φ(ν) (s) = λ−1
ˆ
G(0,ν) (t, s)φk (t) dt
G(0,ν) (t, s)φk (t) dt − λ−1
φ
k
k
k
k
0
0
1
ˆ −1
ˆ
ˆ
G(0,ν) (t, s)φk (t) dt
≤ λk − λ−1 ×
k
0
1
ˆ
ˆ
G(0,ν) (t, s)φk (t) − G(0,ν) (t, s)φk (t) dt.
+λ−1
k
0
Hence, sups∈[0,1]

1
0

ˆ
ˆ
ˆ
G(0,ν) − G(0,ν) (t, s)φk (t) + G(t, s) φk − φk (t) dt = oP (1)

ˆ (ν) − φ(ν)
= oP (1). Theorem 4.3.4 follows from the deﬁnition of Σ(t, s).
and φ
k
k ∞

136

Chapter 5
Testing Hypotheses Under Weakly
Identiﬁability

5.1

Introduction

Data in statistical research are often well described by models, in which the scientiﬁc questions of interest are described by an unknown, ﬁnite-dimensional parameter vector. Such
models may be either fully parametric or semiparametric, where other aspects of the model
may be described by inﬁnite dimensional parameters which are completely unspeciﬁed. In
such settings, it is often of interest to use the observed data in order to draw inferences about
the parameters of interest. Standard inferential techniques may be applied if the parameters
of interest can be well estimated by minimizing a parametric loss function or more generally by solving a parametric estimating function which does not involve inﬁnite dimensional
nuisance parameters. In many situations, however, these parameters may be nonidentiﬁable
or at best weakly identiﬁable from the estimating function so that the standard inferential
theories may not be valid. The goal of this paper is to develop hypothesis tests for scenarios
137

in which the model parameters are weakly identiﬁable. Conceptually, the term weak identiﬁability refers to the situations where data contain some information about model parameters
but not enough to identify them uniquely.
To illustrate the problem quite sharply, we consider a simple theoretical example where
a fully parametric model is indexed by an unknown parameter vector (θ, β) for an observable random quantity Y . We assume that realizations {Yi }n of Y are independent and
i=1
identically distributed (i.i.d) normal N (θ + β, 1) variates. The objective is to evaluate the
hypothesis H0 : θ0 = 0, where θ0 is the true value of θ. Using only observed data and
assuming that β 0 , the true value of β is unknown, inferences for θ0 may not be conducted
using standard techniques due to identiﬁability problems arising from the mean model being
overparameterized.
Another interesting, more practical illustration of this problem comes from the missing
data literature where weakly identiﬁable models are frequently encountered. Speciﬁc examples include the study of publication bias in meta-analysis (Chambers and Welsh, 1993;
Copas, 1999; Copas and Li, 1997) and the analysis of longitudinal data subject to nonrandom nonresponses (Scharfstein et al., 1999; Kenward et al., 2001; Rotnitzky et al., 2001;
Little and Rubin, 2002). Identiﬁability issues commonly arise with non-random missing data,
where the parameters in the model for the missingness may not be jointly identiﬁable with
those in the model for the outcomes of interest using only the observed data, particularly
with semiparametric models, where some of the nuisance parameters may be inﬁnite dimensional. Analyses which assume identiﬁability may be unreliable, with the joint selection
and outcome model yielding ﬂat “estimation” surfaces potentially having multiple modes.
These phenomena have previously been reported by several authors in modeling potentially
non-ignorable missing data models (Scharfstein et al., 1999; Todem et al., 2010) .
138

In section 3, we consider these missing data issues when analyzing longitudinal data with
informative dropout employing the model of Troxel et al.(1998b). The model is semiparametric, with the parameter being estimated denoted by (θ, β), where β is the selection parameter
that measures the extent of non-randomness of the missing data mechanism and θ consists
of the remaining ﬁnite dimensional parameters of the selection and outcome models. The
hypotheses of interest concern covariate eﬀects on the outcome, which are contained in θ. In
Troxel et al. (1998b), a so-called pseudo-likelihood analysis, described in detail in Section 3,
was carried out under the assumption of parameter identiﬁability. The resulting estimating
function only involves (θ, β), with the longitudinal dependence in the outcomes completely
unspeciﬁed and not estimated. We investigated the parameter identiﬁability assumption in
a reanalysis of the cancer data from Troxel et al. (1998b) by proﬁling the pseudo-likelihood
analysis in β (Figure 5.1). The proﬁle pseudolikelihood is ﬂat in β, suggesting a model that
is at best weakly identiﬁable. These results draw into question inferences which assume
identiﬁability of θ and β.
Due to identiﬁability concerns, tests concerning the model parameters cannot use conventional theory to assess statistical signiﬁcance. Essentially, the standard estimation and
inference techniques may fail due to the models being overparameterized. A natural remedy is to partition the parameter indexing the estimating function into certain parameters
of interest and other parameters which may be viewed as secondary parameters. For the
theoretical example discussed earlier, where Y is normally distributed, the parameter of interest in light of the hypothesis under study is θ, while β is the secondary parameter. In
the missing data application (Troxel et al., 1998b), the parameter β which describes the
informativeness may be viewed as the secondary parameter, while the covariate eﬀects in
θ may be of primary interest in hypothesis testing. In practice, the choice of θ and β will
139

−1680
−1720
−1760
−1800

Sup(LogL(θ, β))

−2

0

2

4

6

8

10

β

Figure 5.1: Supremum of the pseudo-likelihood function proﬁled across β, the parameter
measuring the extent of non-randomness of the missing data mechanism in the study.

140

depend on the application.
Various approaches to the problem of non identiﬁable parameters that have appeared
in the literature focused primarily on maximum likelihood based procedures. Almost all
previous works in hypothesis testing deal with the case where non identiﬁability only occurs
under the null hypothesis. Examples include Davies (1977, 1987), Hansen (1996), Ritz
and Skovgaard (2005) and Song et al. (2009). Generally, this requires that the model is
identiﬁable under the alternative hypothesis. In sensitivity analysis, the testing problem
has a diﬀerent formulation. The model may not be identiﬁable under either the null or the
alternative hypothesis. Moreover, even after ﬁxing a set of parameters, it may not be clear
whether the parameters of interest can be consistently estimated under the null hypothesis.
To be concrete, in the normal example, for each value of β, the maximum likelihood estimator
of θ consistently estimates θ0 + β − β 0 , where θ0 and β 0 are the true values of θ and β.
This only equals θ0 when β = β 0 . Our approach to inference about the parameters of
interest is to adapt the proﬁling strategy from the earlier works described above. Because
the testing problem is fundamentally diﬀerent, the resulting developments are nonstandard,
with relatively little work in the literature on this problem. Since the model may not be
identiﬁable even after proﬁling, we need to consider the behavior of the proﬁle estimator
under model misspeciﬁcation under the null.
This inferential strategy poses substantial technical challenges beyond those encountered
with supremum tests which assume identiﬁability under the alternative. In missing data
applications used to motivate the sensitivity analysis, rigorous results for full likelihood
analyses have been established (Lu and Copas, 2004), essentially requiring model identiﬁability. More recently, Todem et al. (2010) demonstrated how to conduct likelihood inference
via inﬁmum tests, including a precise analysis of the behavior of the proﬁle estimators under
141

model misspeciﬁcation and the distribution of the corresponding inﬁmum test. Such tests
are particularly important when the quantity being tested does not increase or decrease
monotonically as the nonidentiﬁed parameters are increased or decreased. Under monotonicity, it is only necessary to perform the tests at the limits of the nonidentiﬁed parameter
space. They developed simultaneous conﬁdence bands which enable identiﬁcation of those
values of the sensitivity parameter for which signiﬁcant results are obtained. Although these
likelihood-based methods are useful, they require a full distribution speciﬁcation for the data.
This can be a diﬃcult task in practice, especially when observed data do not have enough
information to fully identify the parameter of interest.
In this paper, we extend the proﬁling idea to arbitrary estimating functions involving
θ and β but which do not require a complete parametric model speciﬁcation. Our set-up
includes the likelihood score functions as a special case. The generalization of the inﬁmum
test and conﬁdence bands to non-likelihood settings is nontrivial. The inﬁmum test has
the advantage that it is simply deﬁned directly in terms of contrasts whereas the supremum
tests are obtained through nontrivial derivations using the log-likelihood functions (DacunhaCastelle and Gassiat, 1999). We present generic conditions which establish the large sample
properties of the estimating function for θ proﬁled on β, including the uniform consistency
and weak convergence of the θ estimator as a function of β. To our knowledge, these
theoretical results are novel, with issues related to nonidentiﬁable estimating functions not
having been studied rigorously, previously. We accommodate misspeciﬁcation and uniformity
in β in a general paradigm which permits the proﬁling to be carried out with respect to any
suitable estimating function. Owing to the complexity of the asymptotic distributions of the
inﬁmum test and conﬁdence bands, resampling is needed. A theoretically justiﬁed procedure
is discussed for approximating such distributions.
142

The rest of this chapter is organized as follows. In Section 5.2, we present the general
framework of the problem, the proposed test and the resampling procedure, along with a
proof of the key asymptotic properties. In Section 5.3, the methodology is exhibited using
the cancer dataset in Troxel et al. (1998b) and in simulations, where the naive Wald test
may have either inﬂated type I error rate or reduced power. Some remaining issues are
discussed in Section 5.4.

5.2
5.2.1

The method
The general framework

We consider a model involving a ﬁnite dimensional parameter
random quantity Y . The parameter

∈ Ω for an observable

may not completely determine the distribution of

Y , that is, there may be other aspects of the model which are unspeciﬁed. The interest is
drawing inferences about
function SY ( ). Denote by
ˆ of

with i.i.d realizations {Yi }n of Y and a general estimating
i=1
0 the true value of

. If E{SY ( 0 )} = 0, then an estimator

0 usually can be obtained by solving the estimating equation, SY ( ) = 0; see

Chapter 5 of van der Vaart (2000b) for an overview of Z-estimators. If SY identiﬁes

0,

then under other mild regularity conditions, this estimating equation yields a consistent and
asymptotically normal parameter estimator. Under such regularity conditions, inferences
about

0 can be conducted using the large sample properties of ˆ . Problems may occur if

the model as a function of

is “overparameterized”, with multiple values of

satisfying

E{SY ( )} = 0. In this case, the estimator may not have the usual asymptotic properties.
Nonidentiﬁability can be addressed by ﬁxing some components of
143

, conditional upon

which the remaining parameters are uniquely deﬁned by SY . One may partition

= (θ, β),

where θ, a p-dimensional vector, is assumed to be “identiﬁable” for a ﬁxed q-dimensional
vector β, as deﬁned in Section 2.2 below. If the true value β 0 of the nonidentiﬁed parameter
β is known, the estimator ˆ0 at β = β 0 can be used to conduct reliable inferences about θ0 ,
θ
the true value of θ. This estimator is readily available by solving the estimating equation
SY (θ, β 0 ) = 0, for ﬁxed and known β 0 . The approach is unfeasible, as the true value β 0 is
usually unknown to the analyst in practice. A common strategy is to ﬁx β and study the
estimator of θ at various values of β ∈ Ξ. To highlight the dependence on β, we denote by
ˆ
θ(β), the estimator of θ for a ﬁxed β. The estimator of θ when β = β 0 is ˆ0 = ˆ 0 ).
θ
θ(β
¯
¯
¯
For the simple normal example, ˆ
θ(β) = Y − β and ˆ 0 ) = Y − β 0 , where Y is the
θ(β
sample mean. This estimator is normally distributed with mean θ0 + β 0 − β and variance
n−1 , uniformly in β, for each ﬁxed n. Of course, in general, it is not possible to obtain clean
ﬁnite sample results and large sample approximations are needed. In the subsection below,
we study the uniform asymptotic properties of ˆ
θ(β) for β ∈ Ξ.

5.2.2

Large sample properties of ˆ
θ(β)

When β is ﬁxed at its true value β 0 , it is well established that for an estimating function
SY (θ, β 0 ) which is smooth in θ, the estimator ˆ is consistent and approximately normal
θ
under mild regularity conditions (see, for example, van der Vaart and Wellner, 2000a). That
1
is, n 2 {ˆ 0 )−θ0 } →d N (0, Σ0 ), where Σ0 = (D(θ0 ))−1 var(SY (θ0 , β 0 ))(D−1 (θ0 ))T , with
θ(β
D(θ0 ) being the expected value of the ﬁrst order derivative of SY (θ, β 0 ) with respect to θ.
These properties of ˆ 0 ) can be used to conduct large-sample inferences about θ0 .
θ(β
For a given β, the estimator ˆ
θ(β) will converge to a quantity θ∗ (β), which is generally
144

diﬀerent from θ0 if β = β 0 . For the simple normal example, θ∗ (β) = θ0 + β 0 − β. This
contrasts with set-ups on testing with nonidentiﬁability under the null (Davies, 1977, 1987),
where it is generally assumed that θ∗ (β) = θ0 for all β. Moreover, appropriately standardized, ˆ
θ(β) will be asymptotically normal, with variance which may be estimated using
a sandwich variance approach. This is an extension of standard pointwise asymptotic theory for maximum likelihood estimation with misspeciﬁed models, originating in the seminal
work of Huber (1967) and White (1982). We study below the uniform convergence of this
estimator across all values of β ∈ Ξ.
Suppose the data consist of iid realizations {Yi }n of Y . Let sY (θ, β) be the contribui=1
i
tion of subject i to the estimating function SY (θ, β). Deﬁne Sn (θ, β) = n−1 n sY (θ, β)
i=1 i
˜
and S(θ, β) = E{sY (θ, β)}. Let gY (θ, β) = ∂sY (θ, β)/∂θ, WY (θ, β) = n−1 n gY (θ, β)
i=1 i
1
i
i
˜
and W (θ, β) = E{gY (θ, β)}. For any given β ∈ Ξ, let ˆ
θ(β) denote the solution to
1
SY (θ, β) = 0, that is SY (ˆ
θ(β), β) = 0. The “least false” (White, 1982) parameter θ∗ (β),
˜
satisﬁes S(θ∗ (β), β) = 0. Deﬁne G1 = {sY (θ, β) : i = 1, . . . , n, θ ∈ Θ, β ∈ Ξ} and
i
G2 = {gY (θ, β) : i = 1, . . . , n, θ ∈ Θ, β ∈ Ξ}.
i
We assume the following regularity conditions:

C1. The sets Θ ⊂ Rp and Ξ ⊂ Rq are compact and θ∗ (β) is an interior point of Θ for any
β ∈ Ξ.

C2. The function classes, G1 and G2 , are pointwise measurable and satisfy the uniform
entropy condition (van der Vaart and Wellner, 2000a).
˜
C3. inf θ∈Θ,β∈Ξ λmin {−W (θ, β)} > 0, where λmin (·) denotes the minimum eigenvalue of
a matrix.
145

C4. The estimating function SY (θ, β) has continuous ﬁrst order derivatives with respect
to θ for any given β ∈ Ξ.

Condition C1 deﬁnes the parameter space for the implied parameter θ∗ (β) for a given
β. Because θ∗ (β) may be nonconstant in β, the parameter space for θ∗ (β) across β is
contained in a suitably deﬁned functional space. Conditions C2 and C3 give conditions
under which uniform asymptotic results for θ∗ (β) may be obtained. The entropy condition
C2 ensures that the estimating function is well behaved across all β. The condition is satisﬁed
by functions which are uniformly bounded and uniformly Lipschitz of order > {dim(θ) +
dim(β)}/2, where dim(·) denotes the dimension of a vector. Condition C3 guarantees the
identiﬁability of θ∗ (β) for all β. The longitudinal data model presented in Section 3 meets
these requirements. Note that the smoothness speciﬁed in condition C4 only applies to θ.
Diﬀerentiability in β is not assumed. Non-smoothness in θ could be accommodated under
stronger assumptions.
The proof of the following theorem is provided in the appendix.

Theorem 5.2.1. Under Conditions C1-C4, supβ∈Ξ ˆ
θ(β) − θ∗ (β) →p 0, where ·
1
represents the Euclidean norm. Furthermore, n 2 ˆ
θ(β) − θ∗ (β) converge weakly to a tight
Gaussian process with positive deﬁnite covariance function

Σ∗ (β 1 , β 2 ) =
=

1
1
lim cov n 2 ˆ 1 ) − θ∗ (β 1 ) , n 2 ˆ 2 ) − θ∗ (β 2 )
θ(β
θ(β
n→∞
˜
W (θ∗ (β 1 ), β 1 )

−1 T

E sY (θ, β 1 )sT (θ, β 2 )
Y1
1
146

˜
W (θ∗ (β 2 ), β 2 )

−1

.

For ﬁxed β,

Σ∗ (β, β) =
=

1
θ(β) − θ∗ (β)
lim var n 2 ˆ
n→∞
˜
W (θ∗ (β), β)

−1 T

E sY (θ, β)sT (θ, β)
Y1
1

˜
W (θ∗ (β), β)

−1

.

The covariance function may be easily estimated using a robust sandwich variance estimator
along the lines of White (1982), which is valid under model misspeciﬁcation. This estimator
may be used to construct pointwise conﬁdence intervals for θ∗ (β) at ﬁxed β using the pointwise asymptotic normality of ˆ
θ(β). However, for the testing and conﬁdence band procedures
described below, the complexity of the limiting distribution across β is prohibitive for conducting inference, even with variance estimation. For such scenarios, we suggest resampling
to approximate the distribution of the estimator.
It can easily be shown that the regularity conditions are satisﬁed for the simple normal
¯
example. Interestingly, ˆ
θ(β) − θ∗ (β) = Y − θ0 − β 0 , which does not depend on β. This
greatly simpliﬁes the results of Theorem 1, since the standardized estimators are identical
for all β, which is not generally true. One should note that the form of the mean model
is critical. If we assumed that E(Y ) = θβ, then the eigenvalue condition, C3, would be
violated at β = 0 and the uniform convergence in Theorem 1 would fail to hold on intervals
containing zero.

5.2.3

Global sensitivity testing

Suppose we are interested in evaluating the null hypothesis: H0 : Cθ0 = c, where θ0 is
the true value of θ and C an r × dim(θ0 ) contrast matrix for assessing single and multiple
147

linear combinations of model parameters. For example, when testing the jth component of
θ, one takes C to be 1 × dim(θ) vector with a one at the jth position and zeros elsewhere.
Under nonidentiﬁability, the above hypothesis cannot be tested without imposing unveriﬁable
restrictions. If the true sensitivity parameter β 0 is known, then H0 : Cθ∗ (β 0 ) = c, where
θ∗ (β 0 ) = θ0 .

In practice, where β 0 is unknown, one may consider the process θ∗ (β), observing that
the trivial inequality,

0 ≤ inf Cθ∗ (β) − c ≤ Cθ∗ (β 0 ) − c ≤ sup Cθ∗ (β) − c ,
β∈Ξ
β∈Ξ
permits a conservative assessment of H0 . To do so, we formulate the inﬁmum hypothesis:
Hinf : inf β∈Ξ Cθ∗ (β) − c = 0.

The inﬁmum statistic Tinf = inf β∈Ξ C ˆ
θ(β)−c can be used to evaluate this hypothesis.
The distribution of this statistic can be derived analytically in some simple situations. As an
example, we revisit the normal scenario discussed earlier where the interest is in evaluating
¯
the hypothesis, H0 : θ0 = 0, using the processes ˆ
θ(β) = Y − β. For ease of illustration,
¯
assume Ξ = [0, 1], such that the inﬁmum statistic becomes Tinf = inf β∈[0,1] |Y − β|. This
¯
is a mixture of a point mass at 0 with probability Pr(Y ∈ [0, 1]) and two truncated normal
148

distributions. Speciﬁcally,





 ¯
 −Y










¯
inf |Y − β| =

β∈[0,1]















0

¯
if Y < 0,

¯
if Y ∈ [0, 1],

¯
¯
Y − 1 if Y > 1.

¯
The corresponding cumulative distribution function-CDF Finf (x) = Pr(inf β∈[0,1] |Y − β| ≤
x) is
¯
¯
Finf (x) = Pr(Y ≤ x + 1) − Pr(Y ≤ −x), for x ≥ 0.

(5.1)

¯
¯
¯
In particular, we have Finf (0) = Pr(Y ≤ 1) − Pr(Y ≤ 0) = Pr(Y ∈ [0, 1]), reﬂecting the
point mass at 0 for Tinf .
In general, because of the complexity of the limiting distribution of the inﬁmum of the test
process, simple general analytic results do not appear tractable. Instead, resampling may
be utilized. A simple nonparametric bootstrap (Efron and Tibshirani, 1993) may be used
to compute variance estimators, and to carry out the simultaneous inferences necessary for
the inﬁmum tests and the conﬁdence bands, described below. The validity of the bootstrap
follows automatically from empirical process theory under the regularity conditions given in
van der Vaart and Wellner (2000b) even under model misspeciﬁcation. This requires the
boundedness of the estimating function for ﬁxed β ∈ Ξ. A diﬃculty with the nonparametric
bootstrap is that it requires solving the estimating function for all β in each bootstrap sample, which may be computationally demanding. An alternative resampling technique which
does not require repeatedly solving the estimating function may be constructed. The basic
149

idea is to generate realizations directly from the limiting distribution of ˆ
θ(β) and to use these
realizations to approximate the distribution of the inﬁmum test and conﬁdence bands. This
resampling technique has been extensively used in the literature when the true asymptotic
distribution is hard if not impossible to derive analytically (see for example, Parzen et al.,
1994 and Zhu and Zhang, 2006). To do this, one ﬁxes the estimator based on the observed
data and then “perturbs” this estimator using a disturbance which conditionally on data
has mean zero and variance-covariance in β equalling that of ˆ
θ(β) in Theorem 5.2.1. The
procedure is given by the following steps:
Step 1.

Generate n i.i.d random variables from a standard normal model ζ, denoted

(b)
(b)
{ζ 1 , . . . , ζ n }, where superscript (b) represents replications.
(b)
Step 2. Given the realizations of the data, {Yi }n , and values of β ∈ Ξ, calculate ˜ (β)
θ
i=1
(b)
(b)
using the simulated {ζ 1 , . . . , ζ n } and the equation,

˜(b) (β) = ˆ
θ
θ(β) + n−1

(b)
−1 θ(β), β),
n s (ˆ
WY (ˆ
i=1 Yi θ(β), β)ζ i

(5.2)

o
where the statistic ˆ
θ(β) takes value ˆ (β) for observed data {Yi }n .
θ
i=1
Step 3. Calculate T

(b)
(b)
(b)
= inf β∈Ξ C ˜ (β) − c using ˜ (β), β ∈ Ξ.
θ
θ
inf

By repeatedly generating the normal variates {ζ j }n , B times, and repeating steps 2 and
j=1
(b)
3 for each generated sample, we obtain the empirical distribution of T
given observed
inf
data. Theorem 2 below establishes that this empirical distribution converges to the marginal
asymptotic distribution of Tinf as n → ∞. Let 1(E) be the indicator function for event
(b)
o
E. The p-value of the test is then B−1 B 1(T
≥ Tinf ), the proportion of these
b=1
inf
o
bootstrap observations which exceed Tinf the observed value of the statistic.
150

For the simple normal example, we compare the resampling null distribution of T

(b)
inf

to the analytical distribution Finf (.) in (5.1) for a ﬁnite sample size. Setting θ0 = 0 under the null and β 0 = 0.5, we generate {Yi }n from a normal distribution N (0.5, 102 ).
i=1
(b)
Furthermore, we take Ξ = [0, 1] and for each resample b = 1, · · · , B, we compute T
=
inf
(b)
(b)
¯ (b)
¯
inf β∈[0,1] |˜ (β)|, where ˜ (β) = Y − β − n−1 n (Yi − Y )ζ i . Results with n = 100
θ
θ
i=1
and B = 10000 resamples are plotted in Figure 5.2. The resampling distribution provides a
good approximation to the analytical distribution for this simple hypothetical example.
If the inﬁmum (null) hypothesis cannot be rejected, then a supremum test or equivalently
a simultaneous conﬁdence region may be used to check whether Cθ∗ (β) − c > 0 in some
o
regions of Ξ. The supremum hypothesis Hsup may be tested with the statistic Tsup =
(b)
supβ∈Ξ C ˆ
θ(β) − c using the bootstrap realizations of ˜ (β), β ∈ Ξ. The p-value of
θ
(b)
(b)
o
the supremum test is then B−1 B
b=1 1(Tsup ≥ Tsup ), where Tsup are the bootstrap
realizations of the statistic. Alternatively, a simultaneous conﬁdence region for Cθ∗ (β) − c
across all values of β may be constructed. Let 0 < ϕ < 1. A simultaneous conﬁdence region
for Cθ∗ (β) − c, β ∈ Ξ is given by {ϑ(β) : Ξ → Rr ;

ϑ(β) − C ˆ
θ(β) + c < ρϕ }, where ρϕ is

o
o
(b)
θ
the (1 − ϕ)th empirical percentile of {supβ∈Ξ C ˜ (β) − C ˆ (β) }B , with ˆ (β) being
θ
θ
b=1
the value of the statistic ˆ
θ(β) for observed data {Yi }n .
i=1
The following result supports the validity of the resampling based inﬁmum test and
conﬁdence bands.
Theorem 5.2.2. Under Conditions C1-C4, the conditional distribution of the process
o
n1/2 {˜
θ(β) − ˆ (β)} given realizations {Yi }n
θ
i=1 of Y , is asymptotically equivalent to the
unconditional distribution of the process n1/2 {ˆ
θ(β) − θ∗ (β)}, β ∈ Ξ.
Theorem 5.2.2 (proof provided in the appendix) coupled with a continuous mapping
151

1.0
0.8
0.6
0.4
0.0

0.2

CDF(x)

0

1

2

3

4

5

x
Figure 5.2: Plot of the exact (solid line) and the resampled (dashed line) CDF (CDF(x) =
¯
Pr(inf β∈[0,1] |Y − β| ≤ x)) of the inﬁmum test statistic under the null θ0 = 0 for the simple
normal example, assuming the true parameter β 0 = 0.5, sample size n = 100 and B = 10000
resamples.

152

theorem gives that the inﬁmum and supremum tests can be carried out using this resampling
¯
procedure. For the simple normal example, n1/2 {ˆ
θ(β) − θ∗ (β)} = n1/2 (Y − θ0 − β 0 ) and
o
¯
n1/2 {˜
θ(β) − ˆ (β)} = −n−1/2 n (Yi − Y )ζ i , which do not depend on β. The random
θ
i=1
¯
quantity n1/2 (Y −θ0 −β 0 ) is normally distributed with mean 0 and variance 1, uniformly in
β, for each ﬁxed n. Given observed data {Yi }n , −n−1/2
i=1
distributed with mean 0 and variance n−1

n (Y − Y )ζ is also normally
¯
i
i=1 i

n (Y − Y )2 which converges almost surely
¯
i=1 i

to 1 as n → ∞.
The choice of the support Ξ of β is critically important in performing the test in practice.
If values of β are selected in some data-driven fashion, the limiting distribution in Theorem
1 will be invalid. This is similar to Hansen (1996) for the case where the model is identiﬁable
under the null after proﬁling on β, that is, when θ∗ (β) = θ0 , ∀β ∈ Ξ. On the other hand,
an approach which ignores sample information about Ξ may be unnecessarily conservative
and potentially sacriﬁces power. One possible solution is to consult with subject-matter
experts on the choice of Ξ. This choice ideally should be based on prior studies, as in the
breast cancer analysis in Section 3, where closely related datasets were used to select the
range for the sensitivity parameter. From a technical standpoint, this choice should also be
computationally feasible.

5.3

Numerical studies

5.3.1

Pseudo-likelihood models with missing data

We consider the data set-up and model described in Troxel et al. (1998b) for potentially nonrandom missing data in longitudinal studies. The model will be referred to as the TLH model.
153

The data arise from a longitudinal study where each subject i (i = 1, . . . , n), is to be observed
∗
∗
at K occasions. For subject i, we have a K ×1 response vector, Yi∗ = (Yi1 , . . . , YiK )T which
may not be fully observed. To accommodate missingness, subject i has a vector of missing
∗
data indicators Ri = (Ri1 , . . . , RiK )T , where Rit = 1 if Yit is observed and 0 otherwise.
∗
∗
Let Yi,obs and Yi,miss denote the observed and missing components of Yi∗ , respectively.
Each individual also has a K × J covariate matrix Xi , which is assumed fully observed. The
∗
response Yi in our general formulation is {Yi,obs , Ri , Xi }.
∗
The key idea of the TLH methodology is to model the time point pair (Yit , Rit ), without accounting for the dependence on other time points. Let f (u | w) denote the density
function of random quantity u conditional on possibly non random quantity w. We assume
∗
a simple selection model given by, f (Yit , Rit | Xit ,
where

∗
) = f (Yit | Xit ,

∗
)f (Rit | Yit , Xit ,

),

is a ﬁnite but unknown parameter and Xit may contain both time dependent and

independent covariates.
∗
The TLH model assumes that density f (Yit | Xit ,
µit and σ t (t = 1, · · · , K) are elements of

) is that of normal N (µit , σ t ), where

. The missing data process is assumed to satisfy

∗
Rit ∼ Bernoulli(1 − π it ) where the failure probability π it = Pr(Rit = 0 | Yit , Xit ,

).

We assume a logistic regression model relating the missing data probability to potentially
unobserved responses, that is,

∗
logit(π it ) = γ 0t + γ 1t Xit + β t Yit ,

where γ jt and β t (j = 0, 1; t = 1, · · · , K) are unknown parameters and elements of

(5.3)

. The

parameter β t measures the extent of non-randomness of the missing data mechanism in the
study at time t. Speciﬁcally, exp{β t } represents the odds ratio for missing response at time t
154

∗
for each additional unit increase of the hypothetical response Yit . Here, π it in (5.3) depends
∗
on Yit and not on previous elements of Yi∗ . Following warnings by Troxel et al. (1998b),
we emphasize that this model could suﬀer from misspeciﬁcation if the approximation of the
logistic link function to the true link function fails.

The TLH model lends itself to a pseudo-likelihood analysis (Gong and Samaniego, 1981),
where the longitudinal association is naively ignored in the likelihood construction. Speciﬁ∗
cally, the independence pseudo-likelihood function based on observed data {Yi,obs , Ri , Xi }n
i=1
is
n
ind ( ) =

∗
f (Yi,obs , Ri |

n
···

)=

∗
{f (Yit , Rit | )}Rit
=
i=1 t=1
n K
∗
∗
{f (Yit | )f (Rit | Yit ,
=
i=1 t=1

n

K

∗
)dYi,miss

i=1

i=1
n K

∗
f (Yit |

∗
∗
f (Yi,obs , Yi,miss , Ri |

∗
)f (Rit | Yit ,

∗
=
{f (Yit |
i=1 t=1

∗
f (Yit , Rit |

1−Rit

)}Rit

∗
)dYit

)(1 − π it )}Rit

∗
)dYit

1−Rit

∗
f (Yit |

∗
)π it dYit

1−Rit

.

We have suppressed the dependence on covariates in ind ( ). As a pseudo-likelihood model,
conditions C1-C4 are easily veriﬁed and the asymptotic results hold. The densities in the
TLH model are normal and Bernoulli, which are smooth functions of the unknown parameters.
155

5.3.2

Real data analysis

To illustrate our methodology, we consider data from the International Breast Cancer Study
Group-IBCSG, previously reported by H¨rny et al. (1992); and Troxel et al. (1998b). This
u
is a group of randomized breast cancer studies with primary endpoints being survival and
relapse; and quality of life being a secondary endpoint. One study, Study VI, is a randomized
trial of adjuvant chemotherapy following surgery for the treatment of breast cancer. In this
study, 4 treatments (A, B, C and D) were randomly assigned to 431 pre-menopausal cancer
patients and several domains of quality of life were assessed. In this paper, we focus on
three quality-of-life domains; 1) PACIS (perceived adjustment to chronic illness scale), 2)
Mood and 3) Appetite. These variables were originally measured on a 0 − 100 scale but are
normalized using a square-root transformation as recommended by Troxel et al. (1998b).
Questionnaires for the quality of life assessment were administered to study patients at
baseline and every three months for two years. Our analysis employs the ﬁrst three time
points, with rates of missing data equalling 16%, 33% and 37% for PACIS, 16%, 33% and
38% for Mood, and 15%, 33% and 38% for Appetite. A full description of Study VI and
other IBCSG trials may be found elsewhere (H¨rny et al., 1992; Troxel et al., 1998a).
u
As in earlier analyses of Study VI, we consider the following model for the measurement
outcome,
µit = µ0t + α1 X1i + α2 X2i + α3 X3i ,

(t = 1, 2, 3),

where µ0t is a time-dependent intercept and αj is a slope associated with Xji , j = 1, 2, 3.





 (1, 0, 0) if treatment A,

(0, 0, 1)
if treatment C

Here (X1i , X2i , X3i ) =




 (0, 1, 0)

if treatment B,
(0, 0, 0) if treatment D.

156

The missing data model is

∗
logit(π it ) = γ 0t + βYit

(i = 1, . . . , 431; t = 1, 2, 3),

∗
where γ 0t is a time-dependent intercept and β is a slope associated with Yit . As discussed
previously, β quantiﬁes the nonrandomness of the missing data process. A constant σ t is
assumed across time.
Our objective is to assess the treatment and time eﬀects on the mean quality of life. Under
the assumed model, the hypotheses of interest are α1 = α2 = α3 = 0 and µ01 = µ02 = µ03
for the treatment and time eﬀects, respectively. As a preliminary analysis, we ﬁrst evaluated
these hypotheses under identiﬁability assumptions. Speciﬁcally, we ﬁt the TLH model by
simultaneously estimating both β and θ = (α1 , α2 , α3 , µ01 , µ02 , µ03 , γ 01 , γ 02 , γ 03 ) via the
independence pseudolikelihood estimating function. A Wald test based on the sandwich
estimator of the covariance matrix of the regression parameter estimates was performed to
evaluate the hypotheses of interest. P-values of these Wald tests for the three responses are
given in Table 1. In brief, these inferences suggest that there is no treatment eﬀect on PACIS
and Appetite and no time eﬀect on PACIS and Mood. The treatment eﬀect on Mood and
the time eﬀect on Appetite are signiﬁcant at 1% level. In addition to these analyses, we also
conducted two crude analyses that do not explicitly model the missing data mechanism. The
ﬁrst analysis used only subjects with complete data sequences, therefore removing subjects
with incomplete data proﬁles. The second analysis ignored the missing data and conduct the
so-called ignorable (missing at random) inferences by forcing β, the non-randomness missing
data parameter, to 0. Results of these analyses are also summarized in Table 1. From these
additional exploratory analyses, the treatment and time eﬀects are found to be statistically
157

signiﬁcant for Mood at 5% level. The ignorable analysis also appears to yield a statistically
signiﬁcant time eﬀects on Appetite. Of course, these crude analyses may not be reliable as
they rely on assumptions that are not veriﬁable using observed data at hand.
The Wald tests conducted under the assumption of identiﬁability may not have desirable
properties if identiﬁability is violated. As illustrated in Figure 5.1, the model was at best
weakly identiﬁable for the outcome PACIS. Model identiﬁability was also a concern for β
for the other two responses. We performed the inﬁmum test to conservatively evaluate the
treatment and time eﬀects on the three quality of life domains. To conduct these tests, the
set Ξ for the range of β was obtained from an independent source. We considered data
on post-menopausal cancer patients from Study VII of the IBCSG trials. Objectives of
this study were similar to those of Study VI, except that the menopausal status of study
participants diﬀered. The joint model appeared to be identiﬁable when applied to Study
VII data. Based on these results, we derived 99% conﬁdence intervals to use as ranges
for β in the inﬁmum tests for Study VI. The ranges for PACIS, Mood and Appetite were
[−4, 0], [−3, 0] and [−5.6, −1.6], respectively. Recall that in the missing data model, exp{−β}
represents the odds ratio of being observed at any time point for each additional unit increase
∗
∗
of the hypothetical response Yit . Since Yit takes values in the range 0 − 10 on a squareroot scale, for the selected ranges, the odds ratio may be as high as; exp{4} = 54.60 for
PACIS, exp{3} = 20.09 for Mood, and exp{5.6} = 270.43 for Appetite. One might criticize
these upper bounds as being scientiﬁcally unreasonable. However, permitting such extreme
scenarios provides for a conservative test, which is in the spirit of sensitivity analysis. For
computational feasibility, the ranges were approximated on ﬁne grids with equally spaced
points of 0.02. P-values of the inﬁmum tests are given in Table 1.
The inﬁmum hypothesis for the treatment eﬀect was rejected for Mood at the 5% level (p158

Table 5.1: P-values for evaluating the Treatment and Time eﬀects using data from Study
VI of the IBCSG trials
Reponeses
PACIS

0.008

0.229

COM†

0.170

0.041

0.370

0.303

0.011

0.376

0.231(0.522*)

0.025

0.281(0.369*)

TLH

0.216

0.552

<0.001

COM†

0.302

0.030

0.163

IGN‡

eﬀect

0.281

IGN‡

Wald test

Appetite

TLH
Treatment

Mood

0.142

0.006

0.023

0.134(<0.001*)

0.063(<0.001*)

<0.001

Inﬁmum test

Time

Wald test

eﬀect
Inﬁmum test
* P-value of supremum test
†

Complete cases;

‡

Ignorable cases

159

value= 0.025), but not for PACIS (p-value= 0.281) and Appetite (p-value= 0.231). For the
time variable, a strongly signiﬁcant eﬀect was detected only for Appetite (p-value< 0.001).
For nonsigniﬁcant inﬁmum test results, a supremum test was conducted to see if one could
not reject the null hypothesis for all values β ∈ Ξ. The supremum test for the treatment
eﬀect was not rejected on PACIS (p-value= 0.522) and Appetite (p-value= 0.369), but was
strongly rejected for the time eﬀect on PACIS (p-value< 0.001) and Mood (p-value< 0.001).
When the supremum test was rejected, a sensitivity analysis was conducted using a
simultaneous 95% conﬁdence band approach to identify regions of β for which the pointwise null hypotheses are rejected. Plots of these analyses for contrasts µ∗ (β) − µ∗ (β)
01
02
and µ∗ (β) − µ∗ (β) for PACIS and Mood are given in Figure 3. For PACIS, the 95%
02
03
simultaneous conﬁdence band for µ∗ (β) − µ∗ (β), −4 ≤ β < −0.4; and µ∗ (β) − µ∗ (β),
02
01
03
02
−4 ≤ β ≤ −3, did not contain 0. Similar analyses for Mood revealed that a 95% simultaneous conﬁdence band for µ∗ (β) − µ∗ (β), −3 ≤ β < −0.7, did not contain 0. The conﬁdence
02
01
band for µ∗ (β) − µ∗ (β) did not exclude 0 over the selected range of β (−3 ≤ β < 0). The
03
02
Wald tests which assume identiﬁability were nonsigniﬁcant for all pairwise comparisons at
the 5% level.

5.3.3

Simulation study

In this section, we report results of a simulation study comparing the performance of the
inﬁmum test to that of the naive Wald test derived under identiﬁability assumptions. The
simulations were conducted under a TLH model speciﬁed so as to roughly approximate data
from Study VI of the IBCSG trials. For simplicity, only two treatments (A and B) and two
∗ ∗
time points (K = 2) were considered. The outcome vector (Yi1 , Yi2 ), assuming dependence
on subject i, was generated from a two-dimensional normal distribution with univariate mean
160

−2

0.0
−2.0 −1.0

Time effect 3 vs 2

0.0
−2.0 −1.0

Time effect 2 vs 1

−4

0

−4

−1.5

0.0

0.0
−2.0 −1.0

Time effect 3 vs 2

0.0
−3.0

0

β

−2.0 −1.0

Time effect 2 vs 1

β

−2

−3.0

β

−1.5

0.0

β

Figure 5.3: The top panel corresponds to PACIS; and the bottom panel to Mood. In each
panel, the solid lines represent µ02 (β) − µ01 (β) (on the left) and µ03 (β) − µ02 (β) (on the
ˆ
ˆ
ˆ
ˆ
right) for ﬁxed values of the parameter β. The dashed lines are the corresponding 95%
simultaneous conﬁdence bands and the dotted lines are the null values.

161

models,
µit = µ0t + αt X1i , t = 1, 2,
and time-point variances σ t , t = 1, 2, and correlation coeﬃcient ρ. The parameters µ0t and
αt are time-dependent intercepts and slopes associated with covariate X1i , which equals 1 if
treatment B and 0 otherwise. We reparameterized µ0t and αt as, µ0t = α0 + α1 I(t = 2) and
˜
˜
αt = α2 + α3 I(t = 2), where I(t = 2) is an indicator variable taking value 1 at the second
˜
˜
time point. Throughout our simulations, we ﬁxed the variances σ t , t = 1, 2, to 1 and the
correlation coeﬃcient ρ to 0.4. Missing observations were generated using a logistic model
∗
relating the dropout probability π it to the response Yit as,
∗
logit(π it ) = γ 0t + γ 1t X1i + βYit ,
∗
where γ 0t , γ 1t and β are respectively the intercept and slopes associated with X1i and Yit .
Time-dependent parameters γ 0t and γ 1t were reparameterized as, γ 0t = γ 0 + γ 1 I(t = 2)
˜
˜
and γ 1t = γ 2 + γ 3 I(t = 2), t = 1, 2.
˜
˜
We study the size and power of the inﬁmum and Wald tests for α3 , the parameter that
˜
captures the interaction eﬀect of time and treatment on the mean response. We set α3 = 0
˜
and α3 = 1 for the size and power of the test respectively. Additionally, (˜ 1 , α2 ) = (0, 0) and
˜
α ˜
(˜ 1 , γ 2 , γ 3 ) = (0.5, −2, 0.2) when evaluating size, and (˜ 1 , α2 ) = (0.1, 1) and (˜ 1 , γ 2 , γ 3 ) =
γ ˜ ˜
α ˜
γ ˜ ˜
(1, −3, 1) when evaluating power.
The parameter γ 0 was varied throughout our simulations to produce diﬀerent missing
˜
data rates. Speciﬁcally, to study the size of the test γ 0 was ﬁxed to 0.5, to produce about
˜
15% and 22% missing observations at the ﬁrst and second time point, respectively and at
162

1.8 to produce about 33% and 43% missing observations at the ﬁrst and second time point,
respectively. For the power, γ 0 was ﬁxed to 0.5, producing rates of missing observations
˜
roughly 14% and 26% at the ﬁrst and second time point, respectively and to 2, producing
rates of missing observations roughly 32% and 46% at the ﬁrst and second time point,
respectively. Finally, throughout our simulations, we set the true β to −1.
One thousand datasets were generated with sample sizes 100 and 300. Equal proportions
of subjects were assigned to treatment A and B. The inﬁmum tests were performed on the
interval Ξ = [−2, 0]. To ensure computational feasibility, a ﬁne grid of equally spaced points
of 0.02, was considered. We used 1000 resamples from the alternative resampling scheme
discussed in section 2.3 to approximate the null distribution of the inﬁmum test.
The inﬁmum and Wald tests were performed using working regression models having the
same form as those used to generate data. These models saturate the number of parameters,
leading to potential nonidentiﬁability as a result of overparameterization. Table 2 shows the
rejection rates for nominal test levels 0.01, 0.05, and 0.1. Asymptotic standard errors (as
the number of Monte Carlo iterations tends to inﬁnity) are reported in the last row of the
table. Overall, the inﬁmum tests perform well, with the resampling distribution of the test
providing a reasonable approximation to the nominal level. The Wald test appears to be
very liberal when compared to the inﬁmum test. The anti-conservativeness of the Wald test
does not diminish as the sample size increases. Based on these results, our recommendation
is to avoid the Wald test when identiﬁability is of concern. Because the empirical type I error
rate of the inﬁmum test and that of the Wald test are diﬀerent, comparing their empirical
powers is not appropriate. Nevertheless for both methods, a larger sample size improves
the power of detecting the alternatives under consideration, a ﬁnding consistent with the
literature. Moreover, the power decreases with increasing missing data rates.
163

Table 5.2: Empirical type I error and power of the inﬁmum test and Wald test (in parenthesis) for evaluating the interaction eﬀect represented by the parameter α3
˜
n=100

n=300

Nominal test level

Nominal test level

True

Missing+

value

data rate

0.1

0.05

0.01

0.1

0.05

0.01

α3 = 0
˜

15%, 22%

0.099

0.049

0.011

0.103

0.057

0.010

(˜ 0 = 0.5)
γ

(0.147)

(0.092)

(0.034)

(0.158)

(0.117)

(0.059)

33%, 43%

0.110

0.044

0.010

0.122

0.062

0.012

(˜ 0 = 1.8)
γ

(0.149)

(0.097)

(0.051)

(0.165)

(0.113)

(0.073)

14%, 26%

0.982

0.959

0.870

(˜ 0 = 0.5)
γ

(0.961)

(0.944)

(0.848)

(0.948)

(0.942)

(0.934)

32%, 46%

0.905

0.846

0.675

0.999

0.999

0.993

(˜ 0 = 2)
γ

(0.889)

(0.827)

(0.659)

(0.961)

(0.949)

(0.926)

0.003

0.007

0.009

0.003

0.007

0.009

α3 = 1
˜

Monte carlo SE

Test performed using Ξ = [−2, 0]
+ First and second time point missing data rate

164

>0.999 >0.999 >0.999

While the ability to choose an appropriate support set Ξ of β to perform the inﬁmum
tests is highly desirable in practice, our simulations (results not shown) indicate that only
a minimal inﬂation of type I error rate is observed under a modest misspeciﬁcation of the
set Ξ. For example, when Ξ does not contain the true β but β 0 is not far away from the
boundaries of the set, close to the nominal level is still achieved under the null hypothesis.
As an example, we performed the inﬁmum test on the interval [0, 2], which does not contain
β 0 = −1. For this range of β, the inﬁmum tests nearly maintain their sizes at all signiﬁcance
levels. However, when [10, 12] was selected for the range of β, the inﬁmum tests were overly
anti-conservative.
Another simulation study was conducted to evaluate the eﬀects of the choice of the set Ξ
on the power of the inﬁmum tests. Speciﬁcally, we generated data as before, but performed
the inﬁmum tests on wider intervals, namely [−3, 3] and [−5, 5]. Results of this simulation
study are given in Table 5.2. As expected, the power decreases as the interval widens, which
occurs regardless of the missing data rate. Following a referee’s recommendation, further
simulations were conducted to evaluate the loss of power when the inﬁmum test is performed
on a given support set of β compared to the ideal set Ξ = {β 0 }. For this, we generated the
data as before with the only diﬀerence that α3 = 0.7. We then performed the inﬁmum test
˜
using Ξ = [−2, 0] and Ξ = {−1}. Results revealed a minor loss of power of the inﬁmum test
on Ξ = [−2, 0] compared to the ideal set Ξ = {−1} (see Table 5.4).

5.4

Discussion

While hypothesis testing under nonidentiﬁability has been previously considered, the framework is often too restrictive for sensitivity analyses. In a sensitivity analysis, the model may
165

Table 5.3: Empirical power of the inﬁmum test to detect the interaction eﬀect α3 = 1 for
˜
two ranges Ξ of β with true value being β 0 = −1
n=100
Missing+

0.10

0.05

0.01

0.10

0.05

0.01

0.912

0.865

0.720

0.999

0.997

0.987

32%, 46%

0.783

0.706

0.529

0.981

0.960

0.905

14%, 26%

0.899

0.836

0.655

0.997

0.990

0.970

32%, 46%

[−5, 5]

Nominal test level

14%, 26%

[−3, 3]

Nominal test level

data rate

Ξ

n=300

0.767

0.695

0.492

0.953

0.925

0.858

0.003

0.007

0.009

0.003

0.007

0.009

Monte carlo SE
+

First and second time point missing data rate

Table 5.4: Empirical power of the inﬁmum test to detect the interaction eﬀect α3 = 0.7 for
˜
two sets Ξ of β with true value being β 0 = −1
n=100

n=300

Missing+

Nominal test level

Nominal test level

data rate

0.10

0.05

0.01

0.10

0.05

0.01

[−2, 0]

13%, 23%

0.806

0.699

0.470

0.996

0.989

0.949

{−1}

13%, 23%

0.829

0.725

0.469

0.998

0.998

0.982

[−2, 0]

31%, 44%

0.677

0.570

0.342

0.978

0.957

0.834

{−1}

31%, 44%

0.707

0.589

0.348

0.992

0.974

0.915

Monte carlo SE

0.003

0.007

0.009

0.003

0.007

0.009

Ξ

+

First and second time point missing data rate

166

not be identiﬁable under either the null or alternative hypothesis, and proﬁling may not
lead to consistent estimation of the parameter of interest under the null. As a result, the
supremum test may not be appropriate. As discussed in this paper, a theoretically rigorous
approach to this testing problem may be based on inﬁmum statistics, whose distribution
must be carefully considered under model misspeciﬁcation under the null hypothesis.
The inﬁmum testing approach was previously studied for likelihood analyses of parametric
models (Todem et al., 2010). In this paper, we have extended these results to general
estimating functions for parametric models. This includes limiting results for the proﬁle
estimators and the inﬁmum test and conﬁdence bands, as well as the validity of the bootstrap
procedure. Such results are critically important in sensitivity analyses of complex data arising
in longitudinal studies, where full model speciﬁcation may be diﬃcult and partially speciﬁed
models may be more easily analyzed using non-likelihood based approaches.

167

APPENDIX

168

Proof of Theorem 5.2.1
i) We show that supβ∈Ξ

ˆ
θ(β) − θ∗ (β) →p 0.

Condition C2 implies that G1 and G2 are Donsker and hence Glivenko-Cantelli (van der
Vaart andWellner, 2000a, 2000b). Therefore,

˜
˜
sup
SY (θ, β) − S(θ, β) →p 0 and
sup
WY (θ, β) − W (θ, β) →p 0. (5.4)
θ∈Θ,β∈Ξ
θ∈Θ,β∈Ξ
The deﬁnitions of ˆ
θ(β) and θ∗ (β) and Condition C4 imply that

˜
0 = SY (ˆ
θ(β), β) − S(θ∗ (β), β)
=

˜
SY (ˆ
θ(β), β) − SY (θ∗ (β), β) + SY (θ∗ (β), β) − S(θ∗ (β), β)

˜ θ(β), β) ˆ
θ(β) − θ∗ (β) + ν 1n (β),
θ(β) − θ∗ (β) + v2n (β) ˆ
= W (ˇ

(5.5)

where ˇ
θ(β) is on the line segment between ˆ
θ(β) and θ∗ (β). Also, supβ∈Ξ |v2n (β)|
˜
→p 0 and ν 1n (β) = SY (θ∗ (β), β) − S(θ∗ (β), β). From (5.5), we have,

sup ˆ
θ(β) − θ∗ (β)
β∈Ξ

˜
sup − W −1 (ˇ
θ(β), β) + v2n (β) ν 1n (β)
β∈Ξ
˜
≤ sup −W −1 (ˇ
θ(β), β) + v2n (β) sup ν 1n (β) .
β∈Ξ
β∈Ξ
=

Because of Condition C3, for any θ ∈ Θ, for any β ∈ Ξ, there exists a positive number
λ1 , such that λmin (β) > λ1 > 0. For any s × s symmetric matrix A, denote its Euclidean norm as A = λmax (A), where λmax (A) is the largest eigenvalue of A and if
˜
= λ−1 (A). Therefore, −W −1 (ˇ
θ(β), β) = λ−1 (β) ≤
min
min
λ−1 , and supβ∈Ξ ˆ
θ(β) − θ∗ (β) ≤ λ−1 supβ∈Ξ ν n (β) . The uniform consistency of
1
1
A is also nonsingular, A−1

169

ˆ
θ(β) to θ∗ (β) follows from supβ∈Ξ ν n (β)

˜
= supβ∈Ξ SY (θ∗ (β), β) − S(θ∗ (β), β)

≤

˜
supθ∈Θ,β∈Ξ SY (θ, β) − S(θ, β) →p 0, according to (5.4).
ii) We show that n1/2 (ˆ
θ(β) − θ∗ (β)) converge weakly to a tight Gaussian process.
Based on the uniform consistency of ˆ
θ(β), and (5.4) and (5.5), applying the Taylor
expansion to SY (ˆ
θ(β), β) around SY (θ∗ (β), β) gives

˜
n1/2 ˆ
θ(β) − θ∗ (β) ≈ −n−1/2 W −1 (θ∗ (β), β)ν 1n (β)
= −n−1/2
= −n−1/2

n ˜ −1 ∗
W (θ (β), β) sY (θ∗ (β), β) − EsY (θ∗ (β), β)
i=1
1
i
n ˜ −1 ∗
n
W (θ (β), β)sY (θ∗ (β), β) ≡ −n−1/2
η (β),
i=1
i=1 i
i

where ≈ denotes asymptotic equivalence uniformly in β ∈ Ξ. Because Condition C2 implies
˜
that G1 is Donsker and using previous results that W −1 (θ∗ (β), β) is uniformly bounded
˜
for β ∈ Ξ, the function class {W −1 (θ∗ (β), β)sY (θ∗ (β), β), β ∈ Ξ, i = 1, . . . , n} is Donsker.
i
This permits the application of a functional central limit theory to establish the weak convergence of ˆ
θ(β). Therefore, limn→∞ cov

θ(β
θ(β
n1/2 ˆ 1 ) − θ∗ (β 1 ) , n1/2 ˆ 2 ) − θ∗ (β 2 )

= E η 1 (β 1 )η T (β 2 )
1
= Σ∗ (β 1 , β 2 ). For a given β, var n1/2 ˆ
θ(β) − θ∗ (β)

= E η 1 (β)η T (β) .
1

Proof of Theorem 5.2.2
o
Applying the Taylor expansion to sY (θ∗ (β), β) around sY (ˆ (β), β) gives
θ
i
i
n−1/2
= n−1/2

n
s (θ∗ (β), β)ζ i
i=1 Yi
n
o
o
sY (ˆ (β), β)ζ i + n−1/2 θ∗ (β) − ˆ (β)
θ
θ
i=1 i
170

n
g (¯
θ(β), β)ζ i ,
i=1 Yi

o
where ¯
θ(β) is on the line segment between ˆ (β) and θ∗ (β). Given observations {Yi }n ,
θ
i=1
o
Condition C4 and supβ∈Ξ θ∗ (β) − ˆ (β) = op (1), one has
θ
n−1/2

n
s (θ∗ (β), β)ζ i ≈ n−1/2
i=1 Yi

n
o
sY (ˆ (β), β)ζ i .
θ
i=1 i

Based on the deﬁnition of ˜
θ(β) in (5.2) and (5.4), one has

o
−1 θo
n1/2 ˜
θ(β) − ˆ (β) = −n−1/2 WY (ˆ (β), β)
θ
−1 θo
≈ −n−1/2 WY (ˆ (β), β)
˜
≈ −n−1/2 W −1 (θ∗ (β), β)
= −n−1/2

n
o
sY (ˆ (β), β)ζ i
θ
i=1 i

n
s (θ∗ (β), β)ζ i
i=1 Yi
n
s (θ∗ (β), β)ζ i
i=1 Yi

n
η (β)ζ i .
i=1 i

o
θ(β) − ˆ (β)
θ
Hence, conditional on observations {Yi }n , n1/2 ˜
i=1

converges weakly to a

Gaussian process with mean 0 and covariance function
Σo (β 1 , β 2 ) = limn→∞ n−1

n E(η (β )ζ ζ T η T (β ) | Y ). We also have
i 1 i i i 2
i=1

Σo (β 1 , β 2 ) − Σ∗ (β 1 , β 2 )
=

lim n−1
n→∞

=

lim n−1
n→∞

n
E(η i (β 1 )ζ i ζ T η T (β 2 ) | Y ) − E η 1 (β 1 )η T (β 2 )
1
i i
i=1
n
η (β )η T (β ) − E η 1 (β 1 )η T (β 2 ) = 0.
1
i=1 i 1 i 2

o
o
Hence, limn→∞ cov n1/2 ˜ 1 ) − ˆ (β 1 ) , n1/2 ˜ 2 ) − ˆ (β 2 )
θ(β
θ
θ(β
θ
= Σ∗ (β 1 , β 2 ), the
o
conditional distribution of n1/2 ˜
θ(β) − ˆ (β) is asymptotically equivalent to the unconθ
ditional distribution of n1/2 ˆ
θ(β) − θ∗ (β) .

171

BIBLIOGRAPHY

172

BIBLIOGRAPHY
[1] Cai, T. and Hall, P. (2005). Prediction in functional linear regression. Annals of Statistics,
34, 2159-2179.
[2] Cai, T. and Yuan, M. (2010). Nonparametric covariance function estimation for functional
and longitudinal data. Technical report.
[3] Cao, G., Wang, L., Li, Y. and Yang, L. (2012). Spline conﬁdence envelopes for covariance
function in dense functional/longitudinal data. Manuscript.
[4] Cao, G., Yang, L. and Todem, D. (2012). Simultaneous inference for the mean function
of dense functional data. Journal of Nonparametric Statistics, 24, 359-377
[5] Cao, G., Wang, J., Wang, L. and Todem, D. (2012). Spline conﬁdence bands for functional
derivatives. Journal of Statistical Planning and Inference, 104, 1557-1570.
[6] Cao, G., Todem, D., Yang, L. and Fine, J. (2012). Evaluating statistical hypotheses using
weakly identiﬁable estimating functions. Scandinavian Journal of Statistics, accepted.
[7] Chambers, R. L. and Welsh, A. H. (1993). Log-linear models for survey data with
non-ignorable non-response. Journal of the Royal Statistical Society, Series B: Statistical
Methodology, 55, 157-170.
[8] Claeskens, G. and Van Keilegom, I. (2003). Bootstrap conﬁdence bands for regression
curves and their derivatives. Annals of Statistics, 31, 1852-1884.
[9] Crainiceanu, C. M., Staicu, A. and Di, C. (2009). Generalized multilevel functional regression, Journal of the American Statistical Association, 104, 1550-1561
[10] Copas, J. (1999). What works?: Selectivity models and meta-analysis. Journal of the
Royal Statistical Society, Series A: Statistics in Society, 162, 95-109.
173

[11] Copas, J. B. and Li, H. G. (1997). Inference for non-random samples. Journal of the
Royal Statistical Society, Series B: Methodological, 59, 55-77.
[12] Cs˝rg˝, M. and R´v´sz, P. (1981). Strong Approximations in Probability and Statistics.
o o
e e
Academic Press, New York-London.
[13] Dacunha-Castelle, D. and Gassiat, E. (1999). Testing the order of a model using locally
conic parametrization: population mixtures and stationary ARMA processes. Annals of
Statistics, 27, 1178-1209.
[14] Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only
under the alternative. Biometrika, 64, 247-254.
[15] Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only
under the alternative. Biometrika, 74, 33-43.
[16] de Boor, C. (2001) A Practical Guide to Splines. Springer-Verlag, New York.
[17] Degras, D. A. (2011). Simultaneous conﬁdence bands for nonparametric regression with
functional data. Statistica Sinica, 21, 1735-1765.
[18] DeVore, R. and Lorentz, G. (1993). Constructive Approximation: Polynomials and
Splines Approximation. Springer-Verlag, Berlin.
[19] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman &
Hall Ltd.
[20] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and its Applications. Chapman & Hall, London.
[21] Ferraty, F. and Vieu, P. (1996). Nonparametric Functional Data Analysis: Theory and
Practice. Springer, New York.
[22] Gasser, T. and M¨ller, H. G. (1984). Estimating regression functions and their derivau
tives by the kernel method. Scandinavian Journal of Statistics, 11, 171-185.
[23] Gong, G. and Samaniego, F. J. (1981). Pseudo maximum likelihood estimation: Theory
and applications. The Annals of Statistics, 9, 861-869.
174

[24] Hall, P. and Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B, 68, 109-126.
[25] Hall, P., M¨ller, H. G. and Wang, J. L. (2006). Properties of principal component
u
methods for functional and longitudinal data analysis. Annals of Statistics, 34, 1493-1517.
[26] Hall, P., M¨ller, H. G. and Yao, F. (2009). Estimation of functional derivatives. Annals
u
of Statistics, 37, 3307-3329.
[27] Hansen, B. E. (1996). Inference when a nuisance parameter is not identiﬁed under the
null hypothesis. Econometrica, 61, 413-430.
[28] H¨rdle, W. (1989). Asymptotic maximal deviation of M-smoothers. Journal of Multia
variate Analysis, 29, 163-179.
[29] Huang, J. (2003). Local asymptotics for polynomial spline regression. Annals of Statistics, 31, 1600-1635.
[30] Huber, P. J. (1967). The behavior of the maximum likelihood estimates under nonstandard conditions (in Proceedings of the ﬁfth Berkeley Symposium in mathematical
Statistics and Probability, ed.). Berkeley: University of California Press.
[31] H¨rny, C., Bernhard, J., Gelber, R., Coates, A., Castiglione, M., Isley, M., Dreher, D.,
u
Peterson, H., Goldhirsch, A. and Senn, H. (1992). Quality of life measures for patients
receiving adjuvant therapy for breast cancer: an international trial. Eur.J. Cancer, 28,
118-124.
[32] James, G. M., Hastie, T. and Sugar, C. (2000). Principal component models for sparse
functional data. Biometrika, 87, 587–602.
[33] James, G. and Hastie, T. (2001). Functional linear discriminant analysis for irregularly
sampled curves. Journal of the Royal Statistical Society Series B, 63, 533–550.
[34] James, G. M. and Silverman, B. W. (2005). Functional adaptive model estimation.
Journal of the American Statistical Association, 100, 565-576.
[35] Kenward, M. G., Goetghebeur, E. J. T. and Molenberghs, G. (2001). Sensitivity for
incomplete categorical data. Statistical Modelling, 1, 31-48.
175

[36] Li, Y. and Hsing, T. (2010a). Uniform convergence rates for nonparametric regression
and principal component analysis in functional/longitudinal data. Annals of Statistics, 38,
3321-3351.
[37] Li, Y. and Hsing, T. (2010b). Deciding the dimension of eﬀective dimension reduction
space for functional and high-dimensional data. Annals of Statistics, 38, 3028-3062.
[38] Li, Y., Wang, N. and Carroll, R. J. (2010). Generalized functional linear models with
semiparametric single-index interactions, Journal of the American Statistical Association,
105, 621-633.
[39] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. John
Wiley & Sons.
[40] Liu, B. and M¨ller, H. G. (2009). Estimating derivatives for samples of sparsely observed
u
functions, with application to online auction dynamics. Journal of the American Statistical
Association, 104, 704-717.
[41] Lu, G. and Copas, J. B. (2004). Missing at random, likelihood ignorability and model
completeness. Annals of Statistics, 32, 754-765.
[42] Ma, S., Yang, L. and Carroll, R. J. (2012). A simultaneous conﬁdence band for sparse
longitudinal regression. Statistica Sinica, 22, 95-122.
[43] Morris, J. S. and Carroll, R. J. (2006). Wavelet-based functional mixed models. Journal
of the Royal Statistical Society, Series B, 68, 179-199.
[44] M¨ller, H. G. (2009). Functional modeling of longitudinal data. In: Longitudinal Data
u
Analysis (Handbooks of Modern Statistical Methods), Ed. Fitzmaurice, G., Davidian, M.,
Verbeke, G., Molenberghs, G., Wiley, New York, 223-252.
[45] Parzen, M. I., Wei, L. J. and Ying, Z. (1994). A resampling method based on pivotal
estimating functions. Biometrika, 81, 341-350.
[46] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer, New
York.
[47] Rice, J. A. and Wu, C. O. (2001). Nonparametric mixed eﬀects models for unequally
sampled noisy curves. Biometrics, 57, 253-259.
176

[48] Ritz, C. and Skovgaard, I. M. (2005). Likelihood ratio tests in curved exponential
families with nuisance parameters present only under the alternative. Biometrika, 92,
507–517.
[49] Rotnitzky, A., Scharfstein, D., Su, T.-L. and Robins, J. (2001). Methods for conducting
sensitivity analysis of trials with potentially nonignorable competing causes of censoring.
Biometrics, 57, 103-113.
[50] Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999). Adjusting for nonignorable
drop-out using semiparametric nonresponse models (C/R: P1121-1146). Journal of the
American Statistical Association, 94, 1096-1120.
[51] Schumaker, L. (2007). Spline Functions: Basic Theory. Third Edition. Cambridge University Press, Cambridge.
[52] Shi, M., Weiss, R. E. and Taylor, J. M. G. (1996). An analysis of paediatric CD4 counts
for acquired immune deﬁciency syndrome using ﬂexible random curves. Journal of the
Royal Statistical Society, Series B, 45, 151-163.
[53] Song, R., Kosorok, R. and Fine, J. (2009). On asymptotically optimal tests under loss
of identiﬁability in semiparametric models. Annals of Statistics, 37, 2409-2444.
[54] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Annals of Statistics, 22, 118-184.
[55] Todem, D., Fine, J. and Peng, L. (2010). A global sensitivity test for evaluating statistical hypotheses with nonidentiﬁable models. Biometrics, 66, 558-566.
[56] Troxel, A. B., Harrington, D. P. and Lipsitz, S. R. (1998a). Analysis of longitudinal
data with non-ignorable non-monotone missing values. Journal of the Royal Statistical
Society, Series C: Applied Statistics, 47, 425-438.
[57] Troxel, A. B., Lipsitz, S. R. and Harrington, D. P. (1998b). Marginal models for
the analysis of longitudinal measurements with nonignorable non-monotone missing data.
Biometrika, 85, 661-672.
[58] van der Vaart, A. W. and Wellner, J. A. (2000a). Preservation theorems for
Glivenko-Cantelli and uniform Glivenko-Cantelli theorems. High Dimensional Probability II. Birkh¨user, Boston.: eds: E Gin´, DM Mason, JA Wellner.
a
e
177

[59] van der Vaart, A. W. and Wellner, J. A. (2000b). Weak convergence and empirical
processes. New York: Springer.
[60] Wahba, G. (1990). Spline Models for Observational Data., 59. SIAM, Philadelphia, PA.
[61] Wang, J. and Yang, L. (2009a). Polynomial spline conﬁdence bands for regression curves.
Statistica Sinica, 19, 325-342.
[62] Wang, L. and Yang, L. (2009b). Spline estimation of single index model. Statistica
Sinica, 19, 765-783.
[63] Wang, N., Carroll, R. J. and Lin, X. (2005). Eﬃcient semiparametric marginal estimation for longitudinal/clustered data. Journal of the American Statistical Association, 100,
147-157.
[64] White, H. (1982). Maximum likelihood estimation of misspeciﬁed models. Econometrica,
50, 141-161.
[65] Xia, Y. (1998). Bias-corrected conﬁdence bands in nonparametric regression. Journal of
the Royal Statistical Society, Series B, 60, 797-811.
[66] Yao, F. and Lee, T. C. M. (2006). Penalized spline models for functional principal
component analysis. Journal of the Royal Statistical Society, Series B, 88, 3-25.
[67] Yao, F., M¨ller, H. G. and Wang, J. L. (2005a). Functional linear regression analysis
u
for longitudinal data. Annals of Statistics, 33, 2873-2903.
[68] Yao, F., M¨ller, H. G. and Wang, J. L. (2005b). Functional data analysis for sparse
u
longitudinal data. Journal of the American Statistical Association, 100, 577-590.
[69] Zhao, X., Marron, J. S. and Wells, M. T. (2004). The functional data analysis view of
longitudinal data. Statistica Sinica, 14, 789-808.
[70] Zhao, Z. and Wu, W. (2008). Conﬁdence bands in nonparametric time series regression.
Annals of Statistics, 36, 1854-1878.
[71] Zhou, L., Huang, J. and Carroll, R. J. (2008). Joint modelling of paired sparse functional
data using principal components. Biometrika, 95, 601-619.
178

[72] Zhou, S., Shen, X. and Wolfe, D. A. (1998). Local asymptotics of regression splines and
conﬁdence regions. Annals of Statistics, 26, 1760-1782.
[73] Zhou, S. and Wolfe, D. A. (2000). On derivative estimation in spline regression. Statistica
Sinica, 10, 93-108.
[74] Zhu, H., Zhang, H. (2006). Generalized score test of homogeneity for mixed eﬀects
models. Annals of Statistics, 34, 1545–1569.

179