SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS

By

Wenmei Huang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2011

ABSTRACT

SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS

By Wenmei Huang

Copulas are a flexible way of modeling dependence in a set of variables where the association between variables can be elicited separately from the marginal distributions. A semi-parametric approach for estimating the dependence structure of bivariate distributions derived from copulas is investigated when the associated marginals are mixed, that is, consisting of both discrete and continuous components. A semi-parametric likelihood approach is proposed for obtaining the estimator of the dependence parameter under unknown marginals. The consistency and asymptotic normality of the estimator are established as the sample size tends to infinity. For constructing confidence intervals in practice, an estimator of the asymptotic variance is proposed and its properties are investigated via simulation. Extensions to higher dimensions are discussed. Several simulation studies and real data examples are provided to investigate the application of the developed methodology of inference. This work generalizes the results of Genest et al. [11] on the estimation of dependence when the marginals are continuous.

To my professors and my loving family.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my dissertation advisor, Prof. Sarat C. Dass, for his patient guidance, encouragement and support during my graduate study at Michigan State University, and for his precious advice for my professional career. He helped me to learn not only how to do research in statistics, but also how to write statistical papers in proper English. The education I received under his guidance will be the most valuable in my future career.

I am grateful to Prof. Tapabrata Maiti and Prof. Chae Young Lim of the Department of Statistics and Probability, and Prof. Zhengfang Zhou of the Department of Mathematics, for serving on my thesis committee.

I accumulated practical experience from the consulting work at the Center for Statistical Training and Consulting. Prof. Connie Page, Prof. Dennis Gilliland, and Prof. Sandra Herman shared with me their experience and knowledge in dealing with practical problems, and guided me through my projects.

I would like to thank Prof. James Stapleton for his help and encouragement with respect to my teaching and study. He has been very supportive of me, as well as of all his students. I thank him for offering to review my thesis.

I would like to thank Prof. Anil K. Jain of the Department of Computer Science and Engineering, Michigan State University, for letting me use the biometrics data of the PRIP lab in my research. I would like to thank Dr. Yongfang Zhu for discussions on course work and research. I also would like to thank all the professors of the Department of Statistics and Probability and all my great friends at Michigan State University for making my life there so unique and wonderful.

Finally, I especially thank my loving family for their encouragement and support.

TABLE OF CONTENTS
List of Tables
List of Figures

1 INTRODUCTION

2 A MODEL FOR BIVARIATE DISTRIBUTIONS
  2.1 Joint Distributions with Mixed Marginals
  2.2 Semi-Parametric Estimation of $\theta$

3 ASYMPTOTIC PROPERTIES OF SEMI-PARAMETRIC ESTIMATOR IN BIVARIATE DISTRIBUTIONS
  3.1 Statement of the Main Theorem
  3.2 Consistency of $\hat{\theta}_n$
  3.3 Asymptotic Behavior of $B_n$
  3.4 Asymptotic Behavior of $A_n$
    3.4.1 Asymptotic normality of $R_n^{cc}$
    3.4.2 Asymptotic normality of $R_n^{cd}$ and $R_n^{dc}$
    3.4.3 Asymptotic normality of $R_n^{dd}$
    3.4.4 Asymptotic normality of $R_n$
  3.5 Proof of the Main Result

4 A VARIANCE ESTIMATOR OF BIVARIATE DISTRIBUTIONS

5 EXTENSIONS TO HIGHER DIMENSIONS

6 JOINT DISTRIBUTIONS USING t-COPULA
  6.1 Joint Distributions Using t-copula
  6.2 Estimation of R and $\nu$
    6.2.1 Estimation of R for fixed $\nu$
    6.2.2 Selection of the Degrees of Freedom, $\nu$
  6.3 Summary

7 SIMULATION AND REAL DATA RESULTS
  7.1 Simulation Results
  7.2 Application to Multimodal Fusion in Biometric Recognition
    7.2.1 Introduction
    7.2.2 Application in Biometric Fusion

8 SUMMARY AND CONCLUSION

Bibliography

LIST OF TABLES

4.1 Relationship among $(X_j, Y_j)$, $R_{X(j)}$, and $R_{Y(j)}$.
7.1 Simulation results for Experiment 1 with $\nu_0 = \infty$ and $\rho = 0.75$. The absolute bias and relative absolute bias of the estimator $\hat{\rho}_n$ are provided, together with the empirical coverage of the approximate 95% confidence interval for $\rho$ based on the asymptotic normality of $\hat{\rho}_n$.
7.2 $L_1$ distances for paired values of $\nu_0$ corresponding to two values of $\rho_0$, 0.20 and 0.75.
7.3 Simulation results for Experiment 2 with $\nu_0 = \infty$ and $\rho_0 = 0.75$.
7.4 Simulation results for Experiment 3 with $\nu_0 = 10$. The two correlation values considered are $\rho_0 = 0.2$ and $\rho_0 = 0.75$.
7.5 Simulation results for Experiment 4.
7.6 Results of the estimation procedure for R and $\nu$ based on the NIST database.

LIST OF FIGURES

7.1 Density curves for the t distribution with degrees of freedom $\nu = 3, 5, 10, 15, 20, 25$ and the normal distribution. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.
7.2 Some examples of biometric traits: (a) fingerprint, (b) iris scan, (c) face scan, (d) signature, (e) voice, (f) hand geometry, (g) retina, and (h) ear.
7.3 Histograms of matching scores corresponding to genuine scores for Matcher 1. Continuous (respectively, generalized) density estimators are given by the dashed (respectively, solid) lines.
7.4 Histograms of matching scores corresponding to genuine scores for Matcher 2. Continuous (respectively, generalized) density estimators are given by the dashed (respectively, solid) lines.
7.5 Histograms of matching scores corresponding to impostor scores for Matcher 1. Continuous (respectively, generalized) density estimators are given by the dashed (respectively, solid) lines. The spike corresponds to discrete components. Note how the generalized density estimator performs better compared to the continuous estimator (which assumes no discrete components).
7.6 Histograms of matching scores corresponding to impostor scores for Matcher 2. Continuous (respectively, generalized) density estimators are given by the dashed (respectively, solid) lines.
7.7 Performance of copula fusion on the MSU-Multimodal database.
7.8 Performance of copula fusion on the NIST database.

CHAPTER 1

INTRODUCTION

Copulas are functions that join, or "couple", multivariate distribution functions to their one-dimensional marginal distribution functions. Essentially, a copula is a function $C$ from $[0,1]^D$ ($D \ge 2$) to $[0,1]$ with the following properties:

1. For every $(u_1, \ldots, u_D) \in [0,1]^D$,
$$C(u_1, \ldots, u_D) = 0 \quad \text{if at least one } u_k = 0, \ k = 1, \ldots, D, \qquad (1.1)$$
and
$$C(u_1, \ldots, u_D) = u_k \quad \text{if } u_j = 1 \text{ for all } j \ne k, \ k = 1, \ldots, D. \qquad (1.2)$$

2. For every $(u_1, \ldots, u_D)$ and $(v_1, \ldots, v_D) \in [0,1]^D$ such that $u_k \le v_k$ for $k = 1, \ldots, D$, which define the $D$-box $V = [u_1, v_1] \times [u_2, v_2] \times \cdots \times [u_D, v_D] \subset [0,1]^D$,
$$\sum_{c} \mathrm{sgn}(c)\, C(c) \ge 0, \qquad (1.3)$$
where the sum is taken over all vertices $c$ of $V$ and $\mathrm{sgn}(c)$ is given by
$$\mathrm{sgn}(c) = \begin{cases} 1, & \text{if } c_k = u_k \text{ for an even number of } k\text{'s}, \\ -1, & \text{if } c_k = u_k \text{ for an odd number of } k\text{'s}. \end{cases}$$

Alternatively, copulas are multivariate distribution functions whose one-dimensional marginals are uniform on the interval $[0,1]$.

Since their introduction by Sklar [46], copulas have proved to be a useful tool for analyzing multivariate dependence structures due to many unique and interesting features. Let $F_k(x_k) = P(X_k \le x_k)$ for $k = 1, 2, \ldots, D$ denote $D$ continuous distribution functions on the real line, and let $F$ denote a joint distribution function on $\mathbb{R}^D$ whose $k$-th marginal distribution corresponds to $F_k$. According to Sklar's Theorem [46], there exists a unique function $C(u_1, \ldots, u_D)$ from $[0,1]^D$ to $[0,1]$ satisfying
$$F(x_1, \ldots, x_D) = C(F_1(x_1), \ldots, F_D(x_D)), \qquad (1.4)$$
where $x_1, \ldots, x_D$ are $D$ real numbers. The function $C$ is known as a $D$-copula function that couples the one-dimensional distribution functions $F_k$, $k = 1, 2, \ldots, D$ to obtain $F$. If not all marginals are continuous, $C$ is uniquely determined only on $\mathrm{Ran}F_1 \times \cdots \times \mathrm{Ran}F_D$, where $\mathrm{Ran}F_k$ is the range of $F_k$.
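To make the construction in (1.4) concrete, the following minimal sketch (in Python, assuming scipy is available; the helper names are ours, not part of the dissertation) builds a bivariate distribution function from a normal and an exponential marginal coupled by a Gaussian copula, and checks numerically that the resulting function is grounded and recovers its marginals:

```python
import numpy as np
from scipy.stats import norm, expon, multivariate_normal

def joint_cdf(x1, x2, rho, F1=norm.cdf, F2=expon.cdf):
    """F(x1, x2) = C_rho(F1(x1), F2(x2)), with C_rho the Gaussian copula."""
    u, v = F1(x1), F2(x2)
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(
        [norm.ppf(u), norm.ppf(v)])

# Marginals are recovered by sending the other argument toward +infinity:
print(joint_cdf(10.0, 1.0, rho=0.5), expon.cdf(1.0))   # nearly equal
print(joint_cdf(-10.0, 1.0, rho=0.5))                   # nearly 0 (grounded)
```

Any copula family could be substituted for the Gaussian one here; only the mapping $(u, v) \mapsto C_\rho(u, v)$ changes.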
Equation (1.4) can also be used to construct $D$-dimensional distribution functions $F$ whose marginals are the pre-specified distributions $F_k$, $k = 1, \ldots, D$: choose a copula function $C$ and define the distribution function $F$ as in (1.4). It follows that $F$ is a $D$-dimensional distribution function with marginals $F_k$, $k = 1, \ldots, D$.

Investigating dependence structures of multivariate distributions has always been an important area for researchers; see, for example, [26], [27], and [35]. For example, one of the central problems in statistics concerns testing the hypothesis that random variables are independent. Prior to the very recent explosion of copula theory and application, the only models available in many fields to represent the dependence structure were the classical multivariate models, such as the multivariate Gaussian model. These models entail rigid assumptions on the marginal and joint behaviors of the variables, and are therefore of limited usefulness.

Though the term "copula" was first used by Sklar [46] in 1959 and traces of copula theory can be found in Hoeffding's work during the 1940s, the study of copulas and their application in statistics is a rather modern phenomenon. Earlier efforts have addressed different statistical aspects of specialized as well as general copula-based joint distributions. Inference procedures for bivariate Archimedean copulas have been developed and discussed by Genest et al. [12]. Demarta et al. [8] studied properties related to t-copulas, whereas Genest et al. [11] and [13] gave goodness-of-fit procedures for general copulas. Estimation techniques and properties of the estimators have been well studied, with applications ranging from statistics to mathematical finance and financial risk management; see, for example, Shih et al. [44], Embrechts et al. [9], Cherubini et al. [5], Frey et al. [10] and Chen et al. [4].

Many multivariate models for dependence can be generated by parametric families of copulas, $\{C_\theta : \theta \in \Theta\}$, typically indexed by a real or vector-valued parameter $\theta$. Examples of such systems are given in [23], [30], [22], and others. The recent monograph by Hutchinson and Lai [17], which includes an extensive bibliography, constitutes a handy reference to this expanding literature.

Copula-based models are natural in situations where learning about the association between the variables is important, since the effect of the dependence structure is easily separated from that of the marginals. In such situations, there is typically enough data to obtain nonparametric estimators of the marginal distributions, but insufficient information to afford nonparametric estimation of the structure of the association. In such cases, it is convenient to adopt a parametric form for the dependence function while keeping the marginals unspecified. To estimate the dependence parameter $\theta$, two strategies could be adopted depending on the circumstances: (i) if valid parametric models are already available for the marginals, then it is straightforward in principle to write down a likelihood function for the data; this makes the estimation of $\theta$ margin-dependent, because the estimators of the parameters involved in the marginals would be indirectly affected by the copula. This is the parametric approach. (ii) When nonparametric estimators are contemplated for the marginals, however, inference about the dependence parameter $\theta$ will be margin-free. This is the semi-parametric approach.
Clayton [6], Hougaard et al. [15] and Oakes [36] have pointed out that the margin-free requirement is sensible in applications where the focus of the analysis is on the dependence structure.

Given a sample of $n$ observations $\mathbf{X}_j = (X_{1j}, X_{2j}, \ldots, X_{Dj})$, $j = 1, 2, \ldots, n$ from the joint distribution $F(x_1, x_2, \ldots, x_D) = C_\theta(F_1(x_1), \ldots, F_D(x_D))$, the estimation procedure involves selecting $\hat{\theta}_n$ to maximize the semi-parametric likelihood
$$\ell(\theta) = \sum_{j=1}^{n} \log\left[ c_\theta(F_{1n}(X_{1j}), \ldots, F_{Dn}(X_{Dj})) \right], \qquad (1.5)$$
where $F_{kn}$ denotes $n/(n+1)$ times the empirical distribution function of the $k$-th component observations $X_{kj}$, $j = 1, 2, \ldots, n$, and $c_\theta$ is the density of $C_\theta$ with respect to Lebesgue measure on $[0,1]^D$. The use of $F_{kn}$ instead of the empirical distribution itself avoids difficulties arising from the potential unboundedness of $\log c_\theta(u_1, \ldots, u_D)$ as some of the $u_k$'s tend to 1.

A central assumption made in the earlier studies is that the marginals associated with the joint distribution $F$ are continuous. In reality, there are many situations where this assumption is not satisfied and the marginals can be mixed, that is, they contain both discrete and continuous components (see, for example, Kohn et al. [37]). Based on the continuity assumption and regularity conditions, Genest et al. [11] have shown that $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normal, where $\hat{\theta}_n$ is the semi-parametric estimator of $\theta$ in (1.5). Our work extends the previous methodology by accounting for mixed marginals for $D = 2$ and $\Theta \subset \mathbb{R}$.

Uniqueness is an important consideration when estimating the unknown parameters corresponding to a copula. Since our mixed marginals have discrete components, one way to achieve uniqueness is to restrict $C$ to belong to a particular parametric family. Under this assumption, we develop an estimation technique for finding the semi-parametric maximum likelihood estimator, $\hat{\theta}_n$, for the parametric family $\{C_\theta : \theta \in \Theta\}$ in the bivariate situation in Chapter 2. The estimation methodology involves integrals corresponding to the discrete components and is therefore non-standard. Consistency and asymptotic normality of the semi-parametric maximum likelihood estimator $\hat{\theta}_n$ are developed in Chapter 3. A variance estimator of $\hat{\theta}_n$ is developed in Chapter 4. Note that for multivariate observations, copulas allow flexible modeling of the joint distribution via its marginal distributions as well as the correlation between pairwise components of the vector of observations. Therefore, the theory presented in Chapters 2 and 3 can be extended to higher dimensions; this is given in Chapter 5. In Chapter 6, $C$ is restricted to a particular parametric family of copulas, the t-copula, and findings related to the t-copula are provided. Numerical simulations and application of the methodology to real biometric data are presented in Chapter 7. We summarize our work and discuss future research directions in Chapter 8.

CHAPTER 2

A MODEL FOR BIVARIATE DISTRIBUTIONS

2.1 Joint Distributions with Mixed Marginals

Let $F$ and $G$ be two distributions on the real line that are mixed, that is, both $F$ and $G$ consist of a mixture of discrete as well as continuous components. The general form for $F$ is given by
$$F(x) = \sum_{h=1}^{d_F} p_{Fh}\, I_{\{D_{Fh} \le x\}} + \Big(1 - \sum_{h=1}^{d_F} p_{Fh}\Big) \int_{-\infty}^{x} f(w)\, dw, \qquad (2.1)$$
where $I_{\{A\}}$ is the indicator function of the set $A$ (that is, $I_{\{A\}} = 1$ if $A$ is true, and 0 otherwise); in (2.1), the distribution function $F$ consists of $d_F$ discrete components denoted by $D_{Fh}$ with $F(D_{Fh}) - F(D_{Fh}^-) = p_{Fh}$ for $h = 1, 2, \ldots, d_F$, where $x^-$ denotes the left limit at $x$, and $f$ is the density of the continuous component of $F$.
The discrete components $D_{Fh}$, $h = 1, 2, \ldots, d_F$ correspond to jump points in the distribution function $F$, and are called the jump points of $F$. All other points on $\mathbb{R}$ correspond to points of continuity of $F$, that is, $F(x) - F(x^-) = 0$ if $x \ne D_{Fh}$. The sets of jump and continuity points of $F$ are denoted by $\mathcal{J}(F)$ and $\mathcal{C}(F)$, respectively. In the same way, writing
$$G(y) = \sum_{l=1}^{d_G} p_{Gl}\, I_{\{D_{Gl} \le y\}} + \Big(1 - \sum_{l=1}^{d_G} p_{Gl}\Big) \int_{-\infty}^{y} g(w)\, dw, \qquad (2.2)$$
similar definitions can be given for the quantities $d_G$, $p_{Gl}$, $D_{Gl}$ and $g$. We denote the sets of jump and continuity points of $G$ by $\mathcal{J}(G)$ and $\mathcal{C}(G)$, respectively.

Let $H_\theta$ be the bivariate function defined by
$$H_\theta(x, y) = C_\theta(F(x), G(y)) \qquad (2.3)$$
for $\theta \in \Theta$, with $F$ and $G$ as in (2.1) and (2.2), respectively. It follows from the properties of a copula function that

Theorem 2.1.1. The function $H_\theta$, as defined in (2.3), is a valid bivariate distribution function on $\mathbb{R}^2$ with marginals $F$ and $G$.

Proof: For $H_\theta(x,y)$ to be a valid distribution function on $\mathbb{R}^2$, the following four conditions should be satisfied: (1) $0 \le H_\theta(x,y) \le 1$ for all $(x,y)$, (2) $H_\theta(x,y) \to 0$ as $\max(x,y) \to -\infty$, (3) $H_\theta(x,y) \to 1$ as $\min(x,y) \to +\infty$, and (4) for every $x_1 \le x_2$ and $y_1 \le y_2$,
$$\Delta H \equiv H_\theta(x_2,y_2) - H_\theta(x_1,y_2) - H_\theta(x_2,y_1) + H_\theta(x_1,y_1) \ge 0.$$
Note that since $H_\theta(x,y) = C_\theta(F(x), G(y)) = P(U \le F(x), V \le G(y))$ for $(U,V)$ distributed according to $C_\theta$, (1) follows. To prove (2), note that $F(x) \to 0$ and $G(y) \to 0$ when $\max(x,y) \to -\infty$. Hence $H_\theta(x,y) \to 0$ from the property of a cumulative distribution function. (3) follows similarly. To prove (4), we use the facts that $F(x_1) \le F(x_2)$ for $x_1 \le x_2$ and $G(y_1) \le G(y_2)$ for $y_1 \le y_2$, and
$$\Delta H = \int_{[F(x_1), F(x_2)] \times [G(y_1), G(y_2)]} c_\theta(u,v)\, du\, dv. \qquad (2.4)$$
Since $c_\theta(u,v) \ge 0$, $\Delta H \ge 0$ follows. Proof completed.

We turn now to a characterization of the density, $h_\theta(x,y)$, of $H_\theta(x,y)$. For a fixed $(x,y) \in \mathbb{R}^2$, the two components of $(x,y)$ correspond to either a point of continuity or a jump point of the corresponding marginal distribution function. For $(X,Y) \sim H_\theta$, $h_\theta(x,y)$ has four different expressions, namely,
$$h_\theta(x,y) = \begin{cases} \lim_{a,b \to 0} \dfrac{P\{X \in (x, x+a],\ Y \in (y, y+b]\}}{ab} & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\[1ex] \lim_{b \to 0} \dfrac{P\{X = x,\ Y \in (y, y+b]\}}{b} & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\[1ex] \lim_{a \to 0} \dfrac{P\{X \in (x, x+a],\ Y = y\}}{a} & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\[1ex] P\{X = x,\ Y = y\} & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.5)$$

The following theorem gives workable expressions for $h_\theta(x,y)$ in terms of the copula:

Theorem 2.1.2. For each $(x,y) \in \mathbb{R}^2$,
$$h_\theta(x,y) = \begin{cases} f(x)\, g(y)\, c_\theta(F(x), G(y)) & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\[1ex] g(y) \displaystyle\int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\[1ex] f(x) \displaystyle\int_{[G(y^-), G(y)]} c_\theta(F(x), v)\, dv & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\[1ex] \displaystyle\int_{[F(x^-), F(x)] \times [G(y^-), G(y)]} c_\theta(u, v)\, du\, dv & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.6)$$

Proof: When $x \in \mathcal{C}(F)$ and $y \in \mathcal{C}(G)$,
$$P\{X \in (x, x+a],\ Y \in (y, y+b]\} = \int_{[F(x), F(x+a)] \times [G(y), G(y+b)]} c_\theta(u,v)\, du\, dv$$
from (2.4). This is approximately $a\, b\, f(x)\, g(y)\, c_\theta(F(x), G(y))$ for small $a$ and $b$. When $x \in \mathcal{J}(F)$ and $y \in \mathcal{C}(G)$, note that
$$P\{X = x,\ Y \in (y, y+b]\} = \int_{[F(x^-), F(x)] \times [G(y), G(y+b)]} c_\theta(u,v)\, du\, dv,$$
again from (2.4), which is approximately $b\, g(y) \int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du$ for small $b$. The third case follows similarly and the fourth expression is straightforward. Proof completed.
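Theorem 2.1.2 is easy to make operational for a specific family. The sketch below (a hypothetical illustration assuming a Gaussian copula with correlation $\rho$ as the parametric family; it is not the dissertation's code) evaluates the copula factor of $h_\theta$ in all four cases: the two single-jump cases reduce to differences of conditional normal cdfs, and the double-jump case is a rectangle probability of $C_\theta$.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gauss_copula_cdf(u, v, rho):
    """C_theta(u, v) = Phi_2(Phi^{-1}(u), Phi^{-1}(v); rho)."""
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(
        [norm.ppf(u), norm.ppf(v)])

def gauss_copula_density(u, v, rho):
    """c_theta(u, v) for the Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    det = 1.0 - rho**2
    return np.exp(-(rho**2 * (x**2 + y**2) - 2*rho*x*y) / (2*det)) / np.sqrt(det)

def c_star(u1, u2, v1, v2, rho):
    """Copula factor of (2.6): u1 = F(x-), u2 = F(x), v1 = G(y-), v2 = G(y)."""
    x_jump, y_jump = u1 < u2, v1 < v2
    s = np.sqrt(1.0 - rho**2)
    if not x_jump and not y_jump:              # both continuity points
        return gauss_copula_density(u2, v2, rho)
    if x_jump and not y_jump:                  # integral over [u1,u2] at v = v2
        z = norm.ppf(v2)                       # = conditional copula difference
        return norm.cdf((norm.ppf(u2) - rho*z)/s) - norm.cdf((norm.ppf(u1) - rho*z)/s)
    if not x_jump and y_jump:                  # symmetric single-jump case
        z = norm.ppf(u2)
        return norm.cdf((norm.ppf(v2) - rho*z)/s) - norm.cdf((norm.ppf(v1) - rho*z)/s)
    # both jump points: rectangle mass of C_theta over [u1,u2] x [v1,v2]
    return (gauss_copula_cdf(u2, v2, rho) - gauss_copula_cdf(u1, v2, rho)
            - gauss_copula_cdf(u2, v1, rho) + gauss_copula_cdf(u1, v1, rho))
```

For copula families without closed-form conditional distributions, the single-jump integrals would instead be computed by numerical quadrature.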
The terms in $h_\theta(x,y)$ that involve the densities $f$ and $g$ do not depend on $\theta$ and can, hence, be ignored for the estimation of $\theta$. Subsequently, we focus on the function $c^*_\theta \equiv c^*_\theta(F(x^-), F(x), G(y^-), G(y))$ defined by
$$c^*_\theta = \begin{cases} c_\theta(F(x), G(y)) & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\[1ex] \displaystyle\int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\[1ex] \displaystyle\int_{[G(y^-), G(y)]} c_\theta(F(x), v)\, dv & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\[1ex] \displaystyle\int_{[F(x^-), F(x)] \times [G(y^-), G(y)]} c_\theta(u, v)\, du\, dv & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.7)$$

2.2 Semi-Parametric Estimation of θ

If not all marginals are continuous, $C$ in (1.4) is no longer unique. Uniqueness is an important consideration when estimating the unknown parameters corresponding to the copula function. Since our marginals have discrete components, one way to achieve uniqueness is to restrict $C$ to belong to a particular parametric family. From now on, let $\{C_\theta : \theta \in \Theta\}$ be a restricted parametric family of copulas.

Suppose $\{(X_j, Y_j)^T,\ j = 1, 2, \ldots, n\}$ is a set of $n$ independent and identically distributed bivariate random vectors arising from the joint distribution $H_\theta(x,y)$ in (2.3). The parameter of interest is the bivariate dependence parameter $\theta$, and the log-likelihood function corresponding to the $n$ observations is $\sum_{j=1}^n \log h_\theta(x_j, y_j)$. The estimation of $\theta$ is complicated by the presence of nuisance parameters consisting of the unknown distributions $F$ and $G$ (only the number $d_F$ and jump points $D_{Fh}$, $h = 1, 2, \ldots, d_F$ of $F$, and correspondingly $d_G$ and the points $D_{Gl}$ of $G$, are known). An objective function that has $\theta$ as the only unknown parameter can be obtained by replacing $F$ and $G$ by $F_n$ and $G_n$ in the log-likelihood function, where $F_n$ (respectively, $G_n$) is $n/(n+1)$ times the empirical cdf of $F$ (respectively, $G$). The rescaling by $n/(n+1)$ avoids difficulties arising from the unboundedness of the log-likelihood function as the empirical cdfs tend to 0 or 1, and has been employed by Genest et al. in [11]. Thus, the unique semi-parametric maximum likelihood estimator of $\theta$, $\hat{\theta}_n$, is the maximizer of
$$L(\theta) = \sum_{j=1}^{n} \log\{c^*_{\theta,j}\} \qquad (2.8)$$
(that is, $\hat{\theta}_n = \arg\max_{\theta \in \Theta} L(\theta)$), where
$$c^*_{\theta,j} \equiv c^*_\theta(F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)),$$
ignoring terms that do not depend on $\theta$ in $\sum_{j=1}^n \log h_\theta(x_j, y_j)$. In the special case when no discrete components are present, Genest et al. [11] developed a semi-parametric approach to estimate the unknown parameters based on the semi-parametric likelihood. Subsequently, it was shown that the resulting estimators were consistent and asymptotically normally distributed; see [11] for details. Expressions (2.7) and (2.8), respectively, are generalizations of the methodology of Genest et al. [11] to the case where $F$ and $G$ contain discrete components. Note that the challenge in maximizing the semi-parametric log-likelihood in (2.8) is that it involves several integrals corresponding to discrete components in $(x_j, y_j)$, $j = 1, 2, \ldots, n$.
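As a hedged illustration of maximizing (2.8), the following sketch reuses the hypothetical c_star helper above (Gaussian-copula family assumed). The pseudo-margins are $n/(n+1)$ times the empirical cdfs, with left limits computed only at the known jump points (the x_atoms and y_atoms arguments), so the appropriate case of $c^*_\theta$ is selected automatically.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def pseudo_margins(z, atoms=()):
    """(F_n(z_j^-), F_n(z_j)) for each z_j, with F_n = n/(n+1) times the ECDF.
    Left limits differ from right limits only at the known jump points."""
    z = np.asarray(z)
    n = len(z)
    atoms = set(atoms)
    right = np.array([(z <= zj).sum() for zj in z]) / (n + 1.0)
    left = np.array([(z < zj).sum() if zj in atoms else (z <= zj).sum()
                     for zj in z]) / (n + 1.0)
    return left, right

def neg_log_pseudo_likelihood(rho, x, y, x_atoms=(), y_atoms=()):
    """Minus the semi-parametric log-likelihood L(theta) of (2.8)."""
    u1, u2 = pseudo_margins(x, x_atoms)
    v1, v2 = pseudo_margins(y, y_atoms)
    return -sum(np.log(c_star(a, b, c, d, rho))
                for a, b, c, d in zip(u1, u2, v1, v2))

def fit_rho(x, y, x_atoms=(), y_atoms=()):
    """theta_hat_n = argmax_theta L(theta) over the Gaussian-copula family."""
    res = minimize_scalar(neg_log_pseudo_likelihood,
                          args=(x, y, x_atoms, y_atoms),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x
```

Passing the known jump points explicitly mirrors the model assumption above that $d_F$, $D_{Fh}$, $d_G$ and $D_{Gl}$ are known while $F$ and $G$ are otherwise unspecified.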
CHAPTER 3

ASYMPTOTIC PROPERTIES OF SEMI-PARAMETRIC ESTIMATOR IN BIVARIATE DISTRIBUTIONS

3.1 Statement of the Main Theorem

In view of its similarity with the semi-parametric maximum likelihood estimator for continuous marginals, we expect $\hat{\theta}_n$ in the case of mixed marginals to be consistent and asymptotically normal. To prove this, we denote $l(\theta, u_1, u_2, v_1, v_2) = \log\{c^*_\theta(u_1, u_2, v_1, v_2)\}$ and use the notation $l_\theta$ and $l_{\theta,\theta}$ for the first and second derivatives of $l$ with respect to $\theta$. The estimator $\hat{\theta}_n$ satisfies
$$\frac{1}{n} \frac{\partial L(\theta)}{\partial \theta} = \frac{1}{n} \sum_{j=1}^{n} l_\theta(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)) = 0. \qquad (3.1)$$

Here are some heuristics for the proof of asymptotic normality. Expanding in a Taylor series, one obtains
$$\frac{1}{n} \left.\frac{\partial L(\theta)}{\partial \theta}\right|_{\theta = \hat{\theta}_n} = 0 \approx A_n - (\hat{\theta}_n - \theta) B_n, \qquad (3.2)$$
where
$$A_n = \frac{1}{n} \sum_{j=1}^{n} l_\theta(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j))$$
and
$$B_n = -\frac{1}{n} \sum_{j=1}^{n} l_{\theta,\theta}(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)).$$
It follows from equation (3.2) that $n^{1/2}(\hat{\theta}_n - \theta) \approx n^{1/2} A_n / B_n$. From the theory of multivariate rank statistics, one can obtain the following limiting behaviors:
$$B_n \to \beta = -E\big(l_{\theta,\theta}\{\theta, F(X^-), F(X), G(Y^-), G(Y)\}\big) \qquad (3.3)$$
almost surely, and $n^{1/2} A_n$ is asymptotically normal with zero mean and variance $\sigma^2$, whose explicit form will be provided in Section 3.4. Thus, we have

Theorem 3.1.1. Under suitable regularity conditions, the semi-parametric estimator $\hat{\theta}_n$ is consistent, and $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normal with mean 0 and variance $\Sigma = \sigma^2/\beta^2$.

We start with the proof of consistency in Section 3.2, develop the asymptotic properties of $B_n$ and $A_n$, on which Theorem 3.1.1 is based, in Sections 3.3 and 3.4, respectively, and complete this chapter by proving Theorem 3.1.1 in Section 3.5.

Here is some notation that will be used later in this chapter. Let $S_{Fh} = [F(D_{Fh}^-), F(D_{Fh})]$ for $h = 1, 2, \ldots, d_F$, and $S_{Gl} = [G(D_{Gl}^-), G(D_{Gl})]$ for $l = 1, 2, \ldots, d_G$. In the situation where both $F$ and $G$ have only one jump point, these notations simplify to $D_F$, $D_G$, $S_F = [F(D_F^-), F(D_F)]$, and $S_G = [G(D_G^-), G(D_G)]$, respectively.

3.2 Consistency of $\hat{\theta}_n$

Noting that $\hat{\theta}_n$ satisfies (3.1), by letting $J = l_\theta$, which is a function on $(0,1)^2$, we only need to show that
$$R_n = \frac{1}{n} \sum_{j=1}^{n} J(F_n(X_j), G_n(Y_j)) \to 0 \quad \text{a.s.} \qquad (3.4)$$
Since $H_\theta$ involves one or more jump points, $R_n$ can be written as
$$R_n = \frac{1}{n} \sum_{j=1}^{n} J(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)). \qquad (3.5)$$

We introduce some notation for the subsequent presentation. Let $\{cc\}$, $\{cd\}$, $\{dc\}$ and $\{dd\}$ denote the events $\{(X,Y) : X \in \mathcal{C}(F), Y \in \mathcal{C}(G)\}$, $\{(X,Y) : X \in \mathcal{C}(F), Y \in \mathcal{J}(G)\}$, $\{(X,Y) : X \in \mathcal{J}(F), Y \in \mathcal{C}(G)\}$ and $\{(X,Y) : X \in \mathcal{J}(F), Y \in \mathcal{J}(G)\}$, respectively. Further, let $A_{cc} = \{j : (X_j, Y_j) \in \{cc\}\}$; similarly, define $A_{cd}$, $A_{dc}$ and $A_{dd}$. Note that the sets $A_S$ for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$ form a partition of the set of integers $\{1, 2, \ldots, n\}$. We consider the decomposition of $R_n$ based on these four partition sets, namely, $R_n = R_n^{cc} + R_n^{cd} + R_n^{dc} + R_n^{dd}$, where
$$R_n^S = \frac{1}{n} \sum_{j \in A_S} J^S(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j))$$
for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$, and $J^S$ is given by
$$J^S(u_1, u_2, v_1, v_2) = \begin{cases} J^{cc}(u_2, v_2) & \text{if } S = \{cc\}, \\ J^{cd}(u_2, v_1, v_2) & \text{if } S = \{cd\}, \\ J^{dc}(u_1, u_2, v_2) & \text{if } S = \{dc\}, \\ J^{dd}(u_1, u_2, v_1, v_2) & \text{if } S = \{dd\}. \end{cases} \qquad (3.6)$$

The functions $J^S$, $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$, are assumed to be continuous on their respective domains. For example, $J^{cc}(u_2, v_2)$ is assumed to be continuous on $[0,1]^2$; this is a reasonable assumption to make since candidates for $J^{cc}$ in this chapter will be either $\frac{\partial}{\partial\theta} \log c_\theta(u_2, v_2)$ or $\frac{\partial^2}{\partial\theta^2} \log c_\theta(u_2, v_2)$, which are continuous functions of $u_2$ and $v_2$. $J^{cd}(u_2, v_1, v_2)$ is assumed to be continuous on $[0,1]^3$; a particular candidate for $J^{cd}$ is
$$J^{cd}(u_2, v_1, v_2) = \int_{[v_1, v_2]} \frac{\partial}{\partial\theta} \log c_\theta(u_2, v)\, dv,$$
which is continuous in $u_2$, $v_1$ and $v_2$. Similarly for $J^{dc}$ and $J^{dd}$. Almost sure convergence of $R_n$ is established in the following theorem:

Theorem 3.2.1.
Let $J = l_\theta$, $r(u) = u(1-u)$, $\delta > 0$, and let $p$ and $q$ be positive numbers satisfying $1/p + 1/q = 1$. Let $a$ and $b$ be given by $a = (-1+\delta)/p$ and $b = (-1+\delta)/q$. Consider the conditions:

(C1) $|J^{cc}(u_2, v_2)| \le M_1\, r(u_2)^a\, r(v_2)^b$;
(C2) $|J^{cd}(u_2, v_1, v_2)| \le M_2\, r(u_2)^a$ (independent of $v_1$ and $v_2$);
(C3) $|J^{dc}(u_1, u_2, v_2)| \le M_3\, r(v_2)^b$ (independent of $u_1$ and $u_2$);
(C4) in a small neighborhood of each $(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})) \in [0,1]^4$, for $h = 1, \ldots, d_F$ and $l = 1, \ldots, d_G$, $J^{dd}(u_1, u_2, v_1, v_2)$ is finite.

Under conditions (C1)-(C4), $R_n \to \kappa\ (= 0)$ almost surely, where
$$\kappa = E[N^{cc}(X,Y) + N^{cd}(X,Y) + N^{dc}(X,Y) + N^{dd}(X,Y)], \qquad (3.7)$$
with $E[N^{cc}(X,Y)]$, $E[N^{cd}(X,Y)]$, $E[N^{dc}(X,Y)]$ and $E[N^{dd}(X,Y)]$ given, respectively, by
$$E[N^{cc}(X,Y)] = \int_{\{(0,1) \cap (\cup_h S_{Fh})^c\} \times \{(0,1) \cap (\cup_l S_{Gl})^c\}} J^{cc}(u,v)\, c_\theta(u,v)\, du\, dv,$$
$$E[N^{cd}(X,Y)] = \sum_{l=1}^{d_G} \int_{(0,1) \cap (\cup_h S_{Fh})^c} J^{cd}(u, G(D_{Gl}^-), G(D_{Gl})) \Big[\int_{S_{Gl}} c_\theta(u,v)\, dv\Big]\, du,$$
$$E[N^{dc}(X,Y)] = \sum_{h=1}^{d_F} \int_{(0,1) \cap (\cup_l S_{Gl})^c} J^{dc}(F(D_{Fh}^-), F(D_{Fh}), v) \Big[\int_{S_{Fh}} c_\theta(u,v)\, du\Big]\, dv,$$
$$E[N^{dd}(X,Y)] = \sum_{h,l} J^{dd}(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})) \int_{S_{Fh} \times S_{Gl}} c_\theta(u,v)\, du\, dv.$$

Remark: In the case of t-copulas (as defined in (6.1)), $a$ and $b$ can be chosen such that $a \le -2/\nu$ and $b \le -2/\nu$, with $a = (-1+\delta)/p$ and $b = (-1+\delta)/q$ for some $p$, $q$ and $\delta$.

Proof: Without loss of generality, we consider the case where there is only one point of discontinuity in both $F$ and $G$, i.e., $D_F$ and $D_G$. Let $C_n(u,v)$ be the empirical copula defined by the sample,
$$C_n(u,v) = \frac{1}{n} \sum_{j=1}^{n} I_{\{F_n(X_j) \le u,\ G_n(Y_j) \le v\}}.$$
We consider the decomposition of the empirical copula measure into four sub-measures $C_n^{cc}$, $C_n^{cd}$, $C_n^{dc}$ and $C_n^{dd}$, where
$$C_n^S(u,v) = \frac{1}{n} \sum_{j \in A_S} I_{\{F_n(X_j) \le u,\ G_n(Y_j) \le v\}}$$
for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$, and $C_n(u,v) = \sum_{S \in \{cc,cd,dc,dd\}} C_n^S(u,v)$.

Then, by the Glivenko-Cantelli Theorem, $C_n^{cc}(u,v)$, $C_n^{cd}(u,v)$, $C_n^{dc}(u,v)$ and $C_n^{dd}(u,v)$ converge uniformly to $C^1(u,v)$, $C^2(u,v)$, $C^3(u,v)$ and $C^4(u,v)$, respectively, where the $C^i(u,v)$, $i = 1, \ldots, 4$, are given by
$$C^1(u,v) = \begin{cases} P(F(X) \le u \text{ and } G(Y) \le v) & \text{if } (u,v) \in (0, F(D_F^-)) \times (0, G(D_G^-)), \\ P(F(X) \in (0,u) \cap S_F^c \text{ and } G(Y) \le v) & \text{if } (u,v) \in [F(D_F^-), 1) \times (0, G(D_G^-)), \\ P(F(X) \le u \text{ and } G(Y) \in (0,v) \cap S_G^c) & \text{if } (u,v) \in (0, F(D_F^-)) \times [G(D_G^-), 1), \\ P(F(X) \in (0,u) \cap S_F^c \text{ and } G(Y) \in (0,v) \cap S_G^c) & \text{if } (u,v) \in [F(D_F^-), 1) \times [G(D_G^-), 1), \end{cases}$$
$$C^2(u,v) = \begin{cases} 0 & \text{if } v \in (0, G(D_G^-)), \\ P(F(X) \le u \text{ and } Y = D_G) & \text{if } (u,v) \in (0, F(D_F^-)) \times [G(D_G^-), 1), \\ P(F(X) \in (0,u) \cap S_F^c \text{ and } Y = D_G) & \text{if } (u,v) \in [F(D_F^-), 1) \times [G(D_G^-), 1), \end{cases}$$
$$C^3(u,v) = \begin{cases} 0 & \text{if } u \in (0, F(D_F^-)), \\ P(X = D_F \text{ and } G(Y) \le v) & \text{if } (u,v) \in [F(D_F^-), 1) \times (0, G(D_G^-)), \\ P(X = D_F \text{ and } G(Y) \in (0,v) \cap S_G^c) & \text{if } (u,v) \in [F(D_F^-), 1) \times [G(D_G^-), 1), \end{cases}$$
and $C^4(u,v)$ puts mass $P(X = D_F \text{ and } Y = D_G)$ on the single point $\{F(D_F)\} \times \{G(D_G)\}$. Note that the $C^i(u,v)$, $i = 1, \ldots, 4$, are not probability measures, and
$$C_\theta(u,v) = \sum_{i=1}^{4} C^i(u,v).$$

The almost sure convergence is obtained in four steps.

Step 1. Let
$$R_n^{cc} = \int_{(0,1)^2} J^{cc}(u,v)\, dC_n^{cc}(u,v).$$
We prove that $R_n^{cc} \to \mu^{cc} \equiv E[N^{cc}(X,Y)]$ almost surely. We show that $J^{cc}$ is uniformly integrable with respect to the measures $dC_n^{cc}(u,v)$ by showing that $\int_{(0,1)^2} |J^{cc}(u,v)|^{1+\epsilon}\, dC_n^{cc}(u,v)$ is bounded for some $\epsilon > 0$.
Using Hölder's inequality and assumption (C1), and noting that $dC_n^{cc}(u,v)$ puts mass $1/n$ on each continuity point, one can derive the following chain of inequalities:
$$\begin{aligned} \int_{(0,1)^2} |J^{cc}(u,v)|^{1+\epsilon}\, dC_n^{cc}(u,v) &\le M_1 \int_{(0,1)^2} r(u)^{a(1+\epsilon)}\, r(v)^{b(1+\epsilon)}\, dC_n^{cc}(u,v) \\ &\le M_1 \Big[\int_{(0,1)^2} r(u)^{a(1+\epsilon)p}\, dC_n^{cc}\Big]^{1/p} \Big[\int_{(0,1)^2} r(v)^{b(1+\epsilon)q}\, dC_n^{cc}\Big]^{1/q} \\ &\le M_1 \Big[\frac{1}{n} \sum_{j \in A_{cc}} r\Big(\frac{j}{n+1}\Big)^{a(1+\epsilon)p}\Big]^{1/p} \Big[\frac{1}{n} \sum_{j \in A_{cc}} r\Big(\frac{j}{n+1}\Big)^{b(1+\epsilon)q}\Big]^{1/q} \\ &= M_1\, \frac{1}{n} \sum_{j \in A_{cc}} r\Big(\frac{j}{n+1}\Big)^{(-1+\delta)(1+\epsilon)} \\ &\le M_1 \int_0^1 \frac{du}{\{u(1-u)\}^{(1-\delta)(1+\epsilon)}} < \infty \end{aligned}$$
for $\epsilon < \delta$. Combining this with the fact that $C_n^{cc}(u,v)$ converges uniformly to $C^1(u,v)$, we have
$$R_n^{cc} \to \int_{(0,1)^2} J^{cc}(u,v)\, dC^1(u,v) = \int_{((0,1) \cap S_F^c) \times ((0,1) \cap S_G^c)} J^{cc}(u,v)\, c_\theta(u,v)\, du\, dv$$
almost surely, which completes Step 1.

Step 2. Let
$$R_n^{cd} = \int_{(0,1)^2} J^{cd}(u, v_1, v_2)\, dC_n^{cd}(u,v).$$
We prove that $R_n^{cd} \to \mu^{cd} \equiv E[N^{cd}(X,Y)]$ almost surely, by showing that $J^{cd}$ is uniformly integrable with respect to $dC_n^{cd}(u,v)$, i.e., that $\int_{(0,1)^2} |J^{cd}(u,v_1,v_2)|^{1+\epsilon}\, dC_n^{cd}(u,v)$ is bounded for some $\epsilon > 0$. Using Hölder's inequality and assumption (C2), and noting that $dC_n^{cd}(u,v)$ puts mass $1/n$ on each point of $((0,1) \cap S_F^c) \times G_n(D_G)$:
$$\begin{aligned} \int_{(0,1)^2} |J^{cd}(u,v_1,v_2)|^{1+\epsilon}\, dC_n^{cd}(u,v) &\le M_2 \int_{(0,1)^2} r(u)^{a(1+\epsilon)}\, dC_n^{cd}(u,v) \\ &\le M_2 \Big[\int_{(0,1)^2} r(u)^{a(1+\epsilon)p}\, dC_n^{cd}\Big]^{1/p} \Big[\int_{(0,1)^2} 1^{(1+\epsilon)q}\, dC_n^{cd}\Big]^{1/q} \\ &\le M_2 \Big[\frac{1}{n} \sum_{j \in A_{cd}} r\Big(\frac{j}{n+1}\Big)^{(-1+\delta)(1+\epsilon)}\Big]^{1/p} \\ &\le M_2 \Big[\int_0^1 \frac{du}{\{u(1-u)\}^{(1-\delta)(1+\epsilon)}}\Big]^{1/p} < \infty \end{aligned}$$
if $\epsilon < \delta$. Combining this with the fact that $C_n^{cd}(u,v)$ converges uniformly to $C^2(u,v)$, we have
$$R_n^{cd} \to \int_{(0,1)^2} J^{cd}(u, G(D_G^-), G(D_G))\, dC^2(u,v) = \int_{(0,1) \cap S_F^c} J^{cd}(u, G(D_G^-), G(D_G)) \Big[\int_{S_G} c_\theta(u,v)\, dv\Big]\, du$$
almost surely, which completes Step 2.

Step 3. Under assumption (C3), the proof of $R_n^{dc} \to \mu^{dc} \equiv E[N^{dc}(X,Y)]$ is similar to that of Step 2, so it is omitted.

Step 4. The convergence of $R_n^{dd}$ can be established using the SLLN. Let
$$R_n^{dd} = \int_{(0,1)^2} J^{dd}(u_1, u_2, v_1, v_2)\, dC_n^{dd}(u,v).$$
We prove that $R_n^{dd} \to \mu^{dd} \equiv E[N^{dd}(X,Y)]$ almost surely. We show that $J^{dd}$ is uniformly integrable with respect to $dC_n^{dd}(u,v)$ by showing that $\int_{(0,1)^2} |J^{dd}|^{1+\epsilon}\, dC_n^{dd}(u,v)$ is bounded for some $\epsilon > 0$. Noting that $dC_n^{dd}(u,v)$ puts mass $n^*_{dd}/n$ on the single point $(F_n(D_F), G_n(D_G))$, where $n^*_{dd}$ is the number of observations $(X_j, Y_j)$ such that $X_j = D_F$ and $Y_j = D_G$:
$$\int_{(0,1)^2} |J^{dd}(u_1,u_2,v_1,v_2)|^{1+\epsilon}\, dC_n^{dd}(u,v) = \frac{n^*_{dd}}{n}\, |J^{dd}(u_1,u_2,v_1,v_2)|^{1+\epsilon} < \infty,$$
as long as $J^{dd}(u_1,u_2,v_1,v_2)$ is finite, which is assumption (C4). Combining this with the fact that $C_n^{dd}(u,v)$ converges uniformly to $C^4(u,v)$, we have
$$R_n^{dd} \to \int_{(0,1)^2} J^{dd}(u_1,u_2,v_1,v_2)\, dC^4(u,v) = J^{dd}(F(D_F^-), F(D_F), G(D_G^-), G(D_G)) \int_{S_F \times S_G} c_\theta(u,v)\, du\, dv$$
almost surely, which completes Step 4.

In summary, noting that $R_n = R_n^{cc} + R_n^{cd} + R_n^{dc} + R_n^{dd}$, Theorem 3.2.1 holds.
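The empirical copula and its four sub-measures used in the proof are straightforward to compute from the ranks. The following small sketch (hypothetical helper, not the dissertation's code; the jump points are passed in, as the theory assumes them known) evaluates $C_n(u,v)$ together with $C_n^{cc}$, $C_n^{cd}$, $C_n^{dc}$ and $C_n^{dd}$ at a point:

```python
import numpy as np

def empirical_copula(x, y, u, v, x_atoms=(), y_atoms=()):
    """Evaluate C_n(u, v) and its four sub-measures at a single point (u, v)."""
    n = len(x)
    x, y = np.asarray(x), np.asarray(y)
    fn = np.array([(x <= xi).sum() for xi in x]) / (n + 1.0)   # F_n(X_j)
    gn = np.array([(y <= yi).sum() for yi in y]) / (n + 1.0)   # G_n(Y_j)
    inside = (fn <= u) & (gn <= v)
    x_disc = np.isin(x, np.asarray(x_atoms))   # is X_j a jump point of F?
    y_disc = np.isin(y, np.asarray(y_atoms))
    parts = {
        "cc": np.mean(inside & ~x_disc & ~y_disc),
        "cd": np.mean(inside & ~x_disc &  y_disc),
        "dc": np.mean(inside &  x_disc & ~y_disc),
        "dd": np.mean(inside &  x_disc &  y_disc),
    }
    parts["C_n"] = sum(parts.values())   # C_n = C_n^cc + C_n^cd + C_n^dc + C_n^dd
    return parts
```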
3.3 Asymptotic Behavior of $B_n$

In the case when $H_\theta$ and $J$ are both continuous, by letting $J = l_{\theta,\theta}$ in [11], the quantity of interest is the asymptotic behavior of the statistic $R_n$ defined by
$$R_n = \frac{1}{n} \sum_{j=1}^{n} J(F_n(X_j), G_n(Y_j)), \qquad (3.8)$$
where $J(\cdot,\cdot)$ is a function on $(0,1)^2$. Genest et al. [11] have shown that $B_n \to -E(l_{\theta,\theta}\{\theta, F(X), G(Y)\})$ as $n \to \infty$ almost surely. In the present case, the appropriate statistic is similar in form to $R_n$ but with a significant difference: $H_\theta$ involves one or more jump points. Therefore, $R_n$ can be written in the form (3.5), with $J = l_{\theta,\theta}$. Almost sure convergence of $R_n$ is established in the following theorem:

Theorem 3.3.1. Let $J = l_{\theta,\theta}$, $r(u) = u(1-u)$, $\delta > 0$, and let $p$ and $q$ be positive numbers satisfying $1/p + 1/q = 1$. Let $a = (-1+\delta)/p$ and $b = (-1+\delta)/q$. Consider the conditions:

(C1) $|J^{cc}(u_2, v_2)| \le M_1\, r(u_2)^a\, r(v_2)^b$;
(C2) $|J^{cd}(u_2, v_1, v_2)| \le M_2\, r(u_2)^a$ (independent of $v_1$ and $v_2$);
(C3) $|J^{dc}(u_1, u_2, v_2)| \le M_3\, r(v_2)^b$ (independent of $u_1$ and $u_2$);
(C4) in a small neighborhood of each $(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})) \in [0,1]^4$, for $h = 1, \ldots, d_F$ and $l = 1, \ldots, d_G$, $J^{dd}(u_1, u_2, v_1, v_2)$ is finite.

Under conditions (C1)-(C4), $R_n \to \beta$ almost surely, where
$$\beta = E[N^{cc}(X,Y) + N^{cd}(X,Y) + N^{dc}(X,Y) + N^{dd}(X,Y)], \qquad (3.9)$$
with $E[N^S(X,Y)]$ defined as in (3.7) for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$. Since the proof is similar to that of Theorem 3.2.1, it is omitted here.

3.4 Asymptotic Behavior of $A_n$

By taking $J = l_\theta$ in (3.5), we have the following theorem:

Theorem 3.4.1. Let $r(u) = u(1-u)$, $\delta > 0$, and let $p$ and $q$ be as in Theorem 3.3.1. Let $J^S_u$ and $J^S_v$ denote the partial derivatives of $J^S$ with respect to $u$ and $v$, respectively, for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$. Also, let $a = (-0.5+\delta)/p$ and $b = (-0.5+\delta)/q$. Consider the conditions:

(D1) $|J^{cc}(u_2, v_2)| \le M_1\, r(u_2)^a\, r(v_2)^b$, with partial derivatives satisfying $|J^{cc}_{u_2}(u_2, v_2)| \le M_2\, r(u_2)^{a-1}\, r(v_2)^b$ and $|J^{cc}_{v_2}(u_2, v_2)| \le M_3\, r(u_2)^a\, r(v_2)^{b-1}$;

(D2) $|J^{cd}(u_2, v_1, v_2)| \le M_4\, r(u_2)^a$ with $|J^{cd}_{u_2}(u_2, v_1, v_2)| \le M_5\, r(u_2)^{a-1}$;
$$\int_{v_1}^{v_2} c_\theta(u_2, v)\, dv \le M_6\, r(u_2)^a;$$
$$\sum_l \int_{(0,1) \cap \{\cup_h S_{Fh}\}^c} |J^{cd}_\eta(u_2, G(D_{Gl}^-), G(D_{Gl}))| \Big[\int_{S_{Gl}} c_\theta(u_2, v)\, dv\Big]\, du_2 < \infty$$
for $\eta = v_1$ or $v_2$; $J^{cd}_{u_2}(u_2, v_1, v_2)$ is continuous with respect to $u_2$ on $\mathcal{C}(F)$ almost surely; and $J^{cd}_{v_1}(u_2, v_1, v_2)$ and $J^{cd}_{v_2}(u_2, v_1, v_2)$ are continuous with respect to $v_1$ and $v_2$, respectively, in small neighborhoods of the rectangles $\mathcal{C}(F) \times (G(D_{Gl}^-), G(D_{Gl})]$ almost surely, for $l = 1, \ldots, d_G$;

(D3) $|J^{dc}(u_1, u_2, v_2)| \le M_7\, r(v_2)^b$ with $|J^{dc}_{v_2}(u_1, u_2, v_2)| \le M_8\, r(v_2)^{b-1}$;
$$\int_{u_1}^{u_2} c_\theta(u, v_2)\, du \le M_9\, r(v_2)^b;$$
$$\sum_h \int_{(0,1) \cap \{\cup_l S_{Gl}\}^c} |J^{dc}_\eta(F(D_{Fh}^-), F(D_{Fh}), v_2)| \Big[\int_{S_{Fh}} c_\theta(u, v_2)\, du\Big]\, dv_2 < \infty$$
for $\eta = u_1$ or $u_2$; $J^{dc}_{v_2}(u_1, u_2, v_2)$ is continuous with respect to $v_2$ on $\mathcal{C}(G)$ almost surely; and $J^{dc}_{u_1}(u_1, u_2, v_2)$ and $J^{dc}_{u_2}(u_1, u_2, v_2)$ are continuous with respect to $u_1$ and $u_2$, respectively, in small neighborhoods of the rectangles $(F(D_{Fh}^-), F(D_{Fh})] \times \mathcal{C}(G)$ almost surely, for $h = 1, \ldots, d_F$;

(D4) $J^{dd}_{u_1}(u_1, \cdot, \cdot, \cdot)$, $J^{dd}_{u_2}(\cdot, u_2, \cdot, \cdot)$, $J^{dd}_{v_1}(\cdot, \cdot, v_1, \cdot)$, and $J^{dd}_{v_2}(\cdot, \cdot, \cdot, v_2)$ are continuous with respect to $u_1$, $u_2$, $v_1$ and $v_2$, respectively, in a small neighborhood of $(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl}))$ for any $h = 1, \ldots, d_F$ and $l = 1, \ldots, d_G$.
Under conditions (D1)-(D4), $n^{1/2}(R_n - \kappa) \to N(0, \sigma^2)$ in distribution as $n \to \infty$, where $\kappa$ is defined in (3.7) and
$$\sigma^2 = \mathrm{var}\big[M^{cc}(X,Y) + M^{cd}(X,Y) + M^{dc}(X,Y) + M^{dd}(X,Y)\big], \qquad (3.10)$$
with $M^{cc}(x,y)$ and $M^{cd}(x,y)$ given, respectively, by
$$M^{cc}(x,y) = J^{cc}(F(x), G(y))\, I_{\{x \in \mathcal{C}(F),\, y \in \mathcal{C}(G)\}} + \int_{\{\mathbb{R} \setminus \{\cup_h D_{Fh}\}\} \times \{\mathbb{R} \setminus \{\cup_l D_{Gl}\}\}} I_{\{x \le x'\}}\, J^{cc}_{u_2}(F(x'), G(y'))\, dH_\theta(x', y') + \int_{\{\mathbb{R} \setminus \{\cup_h D_{Fh}\}\} \times \{\mathbb{R} \setminus \{\cup_l D_{Gl}\}\}} I_{\{y \le y'\}}\, J^{cc}_{v_2}(F(x'), G(y'))\, dH_\theta(x', y'), \qquad (3.11)$$
$$\begin{aligned} M^{cd}(x,y) = {} & \sum_{l=1}^{d_G} J^{cd}(F(x), G(D_{Gl}^-), G(D_{Gl}))\, I_{\{x \in \mathcal{C}(F),\, y = D_{Gl}\}} \\ & + \sum_{l=1}^{d_G} \int_{\mathbb{R} \setminus \{\cup_h D_{Fh}\}} I_{\{x \le x'\}}\, J^{cd}_{u_2}(F(x'), G(D_{Gl}^-), G(D_{Gl}))\, dH_\theta(x', D_{Gl}) \\ & + \sum_{l=1}^{d_G} \int_{\mathbb{R} \setminus \{\cup_h D_{Fh}\}} I_{\{y < D_{Gl}\}}\, J^{cd}_{v_1}(F(x'), G(D_{Gl}^-), G(D_{Gl}))\, dH_\theta(x', D_{Gl}) \\ & + \sum_{l=1}^{d_G} \int_{\mathbb{R} \setminus \{\cup_h D_{Fh}\}} I_{\{y \le D_{Gl}\}}\, J^{cd}_{v_2}(F(x'), G(D_{Gl}^-), G(D_{Gl}))\, dH_\theta(x', D_{Gl}); \end{aligned}$$
$M^{dc}(x,y)$ is defined analogously to $M^{cd}(x,y)$ with the roles of the two coordinates interchanged, and $M^{dd}(x,y)$ collects the corresponding terms attached to the joint jump points $(D_{Fh}, D_{Gl})$.

The proof of Theorem 3.4.1 treats $R_n^{cc}$, $R_n^{cd}$, $R_n^{dc}$ and $R_n^{dd}$ separately in Sections 3.4.1-3.4.4. For $R_n^{cc}$, the quantity $n^{1/2}(R_n^{cc} - \mu^{cc})$ is decomposed into leading terms $A_{1n}$, $A_{2n}$, $A_{3n}$ and remainder terms $B_{\gamma in}$ and $C_{\gamma in}$ obtained from a Taylor expansion of $J^{cc}$ about $(F(x), G(y))$; the $A$-terms are shown to be asymptotically normal and the remainder terms negligible. The following lemmas are needed.

Lemma. For any $\epsilon > 0$, there exists a constant $\beta = \beta_\epsilon \in (0,1)$ such that
$$P(\Omega_n) = P(\{\omega : \beta F \le F_n \le 1 - \beta(1-F) \text{ for all } x(\omega) \text{ on } \Delta_{n1}\}) > 1 - \epsilon \qquad (3.14)$$
for all $n$ and $F$.

Proof: Let $A = \{\omega : \beta F \le F_n \le 1 - \beta(1-F) \text{ for all } x(\omega) \in \Delta_{n1} \cap \mathcal{C}(F)\}$ and $B = \{\omega : \beta F \le F_n \le 1 - \beta(1-F) \text{ for all } x(\omega) \in \Delta_{n1} \cap \mathcal{J}(F)\}$. Note that in our case,
$$P(\{\omega : \beta F \le F_n \le 1 - \beta(1-F) \text{ for all } x(\omega) \text{ on } \Delta_{n1}\}) = P(A \cup B)$$
$$= P(\{x(\omega) \in \Delta_{n1} \cap \mathcal{C}(F)\}) \cdot P(\{\beta F \le F_n \le 1 - \beta(1-F) \mid x(\omega) \in \Delta_{n1} \cap \mathcal{C}(F)\}) + P(\{x(\omega) \in \Delta_{n1} \cap \mathcal{J}(F)\}) \cdot P(\{\beta F \le F_n \le 1 - \beta(1-F) \mid x(\omega) \in \Delta_{n1} \cap \mathcal{J}(F)\}).$$
In Lemma 6.1 of [41], (3.14) has been verified for all continuous $F$ for a constant $\beta = \beta'_\epsilon$, which gives
$$P(\{\beta F \le F_n \le 1 - \beta(1-F) \mid x(\omega) \in \mathcal{C}(F) \cap \Delta_{n1}\}) > 1 - \epsilon.$$
Noting that $P(\{x(\omega) \in \Delta_{n1} \cap \mathcal{C}(F)\}) + P(\{x(\omega) \in \Delta_{n1} \cap \mathcal{J}(F)\}) = 1$, to prove (3.14) we only need to show that
$$P(\{\beta F \le F_n \le 1 - \beta(1-F) \mid x(\omega) \in \Delta_{n1} \cap \mathcal{J}(F)\}) > 1 - \epsilon$$
holds for some constant $\beta = \beta''_\epsilon$. This is a direct result of the Glivenko-Cantelli Theorem, which completes the proof.

Lemma 3.3.3. Uniformly in all $F$, we have
(i) $\sup_{\Delta_{n1}} |U^*_n(F) - U_n(F)|\, r(F)^{\rho - 1/2} \to_p 0$ as $n \to \infty$, for each $\rho \ge 0$;
(ii) $\sup_{(-\infty,\infty) \setminus \{\cup_h D_{Fh}\}} |U_n(F)|\, r(F)^{\rho - 1/2} = O_p(1)$ as $n \to \infty$, for each $\rho \ge 0$;
where $U^*_n(F(x)) = n^{1/2}(\hat{F}_n(x) - F(x))$ and $\hat{F}_n$ is the empirical distribution function. Recall that $F_n$ was defined to be $\frac{n}{n+1} \hat{F}_n$.

Proof: (i) Note that
$$|U^*_n(F) - U_n(F)|\, r(F)^{\rho-1/2} = n^{1/2}\, \frac{\hat{F}_n}{n+1}\, r(F)^{\rho-1/2},$$
and that for any fixed $\beta \in (0,1)$ we have
$$r\Big(\frac{\beta}{n}\Big)^{\rho-1/2} = r\Big(1 - \frac{\beta}{n}\Big)^{\rho-1/2} = O(n^{-\rho+1/2}).$$
Since the $F(X_i)$ are i.i.d. uniform random variables, given any arbitrary $\epsilon > 0$ we can choose $\beta = \beta_\epsilon \in (0,1)$ such that
$$P\Big(\frac{\beta}{n} \le F(X_{1n}) \le F(X_{nn}) \le 1 - \frac{\beta}{n}\Big) > 1 - \epsilon$$
for all $n$ and all $F$, which is obvious since
$$P\Big(\frac{\beta}{n} \le F(X_{1n}) \le F(X_{nn}) \le 1 - \frac{\beta}{n}\Big) = \Big(1 - \frac{2\beta}{n}\Big)^n \to e^{-2\beta} > 1 - \epsilon$$
for sufficiently small $\beta$. Combining the above results proves (i). (ii) follows from (i) and (iii) of Lemma 4.2 in [41]. Proof completed.

Proof of asymptotic normality of $R_n^{cc}$.

(1) Note that $A_{1n}$ can be written as $A_{1n} = n^{-1/2} \sum_{j=1}^n A_{1jn}$, where $A_{1jn} = J^{cc}(F(X_j), G(Y_j)) \cdot I_{\{j \in A_{cc}\}} - \mu^{cc}$. The $A_{1jn}$ are i.i.d. with mean zero. By assumption (D1) and Hölder's inequality, we have
$$\begin{aligned} \int_{\{\mathbb{R} \setminus D_F\} \times \{\mathbb{R} \setminus D_G\}} |J^{cc}(F(x), G(y))|^{2+\delta_0}\, dH_\theta(x,y) &\le M_1^{2+\delta_0} \int r(F(x))^{a(2+\delta_0)}\, r(G(y))^{b(2+\delta_0)}\, dH_\theta(x,y) \\ &\le M_1^{2+\delta_0} \Big[\int_{(0,1)} r(u)^{a(2+\delta_0)p_0}\, du\Big]^{1/p_0} \Big[\int_{(0,1)} r(v)^{b(2+\delta_0)q_0}\, dv\Big]^{1/q_0} \\ &= M_1^{2+\delta_0} \Big[\int_{(0,1)} r(u)^{(-\frac12+\delta)(2+\delta_0)}\, du\Big]^{1/p_0} \Big[\int_{(0,1)} r(v)^{(-\frac12+\delta)(2+\delta_0)}\, dv\Big]^{1/q_0} < \infty \end{aligned}$$
for some selected $p_0$, $q_0$ and $\delta$ satisfying $(\tfrac12 - \delta)(2+\delta_0) < 1$, which means that $A_{1jn}$ has a finite absolute moment of order $2+\delta_0$ for some $\delta_0 > 0$. Moreover, the term on the left-hand side is uniformly bounded. Using the CLT, we get the asymptotic normality of $A_{1n}$.
(2) Note that $A_{2n}$ can be written as
$$A_{2n} = n^{1/2} \int_{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}} (\hat{F}_n(x) - F(x))\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y) + n^{1/2} \int_{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}} (F_n(x) - \hat{F}_n(x))\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y) = A^*_{2n1} + A^*_{2n2},$$
where $\hat{F}_n(x)$ is the empirical distribution function of $X$. Let
$$\phi_{X_j}(x) = \begin{cases} 0 & \text{if } x < X_j, \\ 1 & \text{if } x \ge X_j. \end{cases}$$
Then $A^*_{2n1}$ can be written as $n^{-1/2} \sum_{j=1}^n A_{2jn}$, where
$$A_{2jn} = \int_{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}} (\phi_{X_j}(x) - F(x))\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y)$$
are i.i.d. with mean zero. Note that $|\phi_{X_j}(x) - F(x)| \le 1$. Under assumption (D1), we have
$$|A_{2jn}| \le M_2 \int_{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}} r^{a-1}(F(x))\, r^b(G(y))\, dH_\theta(x,y),$$
and the integral on the right-hand side is uniformly finite as long as we can find $p_1$ and $q_1$ satisfying $1/p_1 + 1/q_1 = 1$, $(a-1)p_1 > -1$ and $b q_1 > -1$. Thus we have shown that $A^*_{2n1}$ has an absolute moment of order $2+\delta_1$, which leads to the asymptotic normality of $A^*_{2n1}$. Note also that, with the same $p_1$ and $q_1$,
$$|A^*_{2n2}| = \frac{n^{1/2}}{n+1} \Big| \int_{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}} \hat{F}_n(x)\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y) \Big| \le \frac{n^{1/2}}{n+1} \sup_x |\hat{F}_n(x)| \int_{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}} |J^{cc}_{u_2}(F(x), G(y))|\, dH_\theta(x,y) = o_p(1).$$
Therefore, we have the asymptotic normality of $A_{2n}$.

(3) Similarly, note that $A_{3n}$ can be written as
$$A_{3n} = n^{1/2} \int (\hat{G}_n(y) - G(y))\, J^{cc}_{v_2}(F(x), G(y))\, dH_\theta(x,y) + n^{1/2} \int (G_n(y) - \hat{G}_n(y))\, J^{cc}_{v_2}(F(x), G(y))\, dH_\theta(x,y) = A^*_{3n1} + A^*_{3n2},$$
where $\hat{G}_n(y)$ is the empirical distribution function of $Y$. Similarly, $A^*_{3n1}$ can be written as $n^{-1/2} \sum_{j=1}^n A_{3jn}$, where $A_{3jn} = \int (\phi_{Y_j}(y) - G(y))\, J^{cc}_{v_2}(F(x), G(y))\, dH_\theta(x,y)$ are i.i.d. with mean zero. The same asymptotic conclusion can be drawn for $A_{3n}$.

(4) Note that $A_{1n} + A_{2n} + A_{3n}$ can be written as $n^{-1/2} \sum_{j=1}^n (A_{1jn} + A_{2jn} + A_{3jn}) + A^*_{2n2} + A^*_{3n2} \equiv n^{-1/2} \sum_{j=1}^n A^{**}_{1jn} + A^*_{2n2} + A^*_{3n2}$, where $A^{**}_{1jn} = A_{1jn} + A_{2jn} + A_{3jn}$ depends on $(X_j, Y_j)$ only, hence the $A^{**}_{1jn}$ are i.i.d. with mean zero, and $A^*_{2n2} \to_p 0$ and $A^*_{3n2} \to_p 0$ as $n \to \infty$. Using the CLT, we have that $A_{1n} + A_{2n} + A_{3n}$ is asymptotically normally distributed with mean 0.

The asymptotic negligibility of the $B$- and $C$-terms is stated in the following corollaries.

Corollary 1. For fixed $\gamma$, $B_{\gamma 1n} \to_p 0$ and $C_{\gamma 1n} \to_p 0$ as $n \to \infty$.

Proof: Note that $P(\Omega^c_{\gamma n}) \to 0$ in (3.13) for any $H_\theta$ by the Glivenko-Cantelli Theorem, because the distribution of $\sup_x |F_n(x) - F(x)|$ does not depend on $H_\theta$. This completes the proof.

Corollary 2. For fixed $\gamma$, $B_{\gamma 2n} \to_p 0$ and $C_{\gamma 2n} \to_p 0$ as $n \to \infty$.

Proof: According to Lemma 3.3.3 (ii) with $\rho = \tfrac12$, for given $\epsilon > 0$ there exists a constant $M$ such that
$$P(\Omega_{1n}) = P(\omega : \{\sup_x |U_n(F)| \le M \text{ and } x \in \mathcal{C}(F)\}) > 1 - \epsilon \qquad (3.15)$$
for all $n$ and $H_\theta$, and a constant $M'$ such that
$$P(\Omega_{2n}) = P(\omega : \{\sup_{\Delta_{n1}} r^{-\xi}(\Phi_n)\, r^{\xi}(F) \le M'\}) > 1 - \epsilon,$$
which gives $P(\Omega_{1n} \cap \Omega_{2n}) > 1 - 2\epsilon$. Also,
$$\chi(\Omega_{1n} \cap \Omega_{2n}) |B_{\gamma 2n}| \le M \sup_{\Delta_F \times \Delta_G} \big| J^{cc}_{u_2}(\Phi_n(x), G(y)) - J^{cc}_{u_2}(F(x), G(y)) \big| + \chi(\Omega_{1n} \cap \Omega_{2n}) \Big| \int_{\{(-\infty, X_{1n}) \cup (X_{nn}, \infty)\} \times \{(-\infty, Y_{1n}) \cup (Y_{nn}, \infty)\}} U_n(F(x)) \big[ J^{cc}_{u_2}(\Phi_n(x), G(y)) - J^{cc}_{u_2}(F(x), G(y)) \big]\, dH_n(x,y) \Big|.$$
By the Glivenko-Cantelli Theorem, $P(\{(-\infty, X_{1n}) \cup (X_{nn}, \infty)\} \times \{(-\infty, Y_{1n}) \cup (Y_{nn}, \infty)\}) \to 0$ for any $H_\theta$, so the second term on the right side of the inequality above converges to 0 as $n \to \infty$. The function $J^{cc}_{u_2}(u_2, v_2)$ is uniformly continuous on $(0,1)^2$. Since $|\Phi_n - F| \le |F_n - F|$, where $\Phi_n$ is as defined, the Glivenko-Cantelli Theorem yields $\sup_{\Delta_F \times \Delta_G} |J^{cc}_{u_2}(\Phi_n(x), G(y)) - J^{cc}_{u_2}(F(x), G(y))| \to_p 0$ uniformly for $H_\theta$. Therefore, we have shown that $B_{\gamma 2n} \to_p 0$ as $n \to \infty$. A similar argument may be used for $|C_{\gamma 2n}|$. Proof completed.
Corollary 3. For fixed $\gamma$, $B_{\gamma 3n} \to_p 0$ as $n \to \infty$.

Proof: Noting that $\{\Delta_F \times \Delta_G\} \cap \{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}\} = \Delta_F \times \Delta_G$,
$$B_{\gamma 3n} = \chi(\Omega_{\gamma n}) \int_{\Delta_F \times \Delta_G} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, d[H_n(x,y) - H_\theta(x,y)]$$
$$= \chi(\Omega_{\gamma n} \cap \Omega_{1n}) \int_{\Delta_F \times \Delta_G} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, d[H_n - H_\theta] + \chi(\Omega_{\gamma n} \cap \Omega^c_{1n}) \int_{\Delta_F \times \Delta_G} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, d[H_n - H_\theta], \qquad (3.16)$$
where $\Omega_{1n}$ is as defined in (3.15). Note that the second term in (3.16) converges to 0 in probability, and
$$\chi(\Omega_{\gamma n} \cap \Omega_{1n}) \Big| \int_{\Delta_F \times \Delta_G} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, d[H_n - H_\theta] \Big| \le \chi(\Omega_{\gamma n} \cap \Omega_{1n})\, M \Big| \int_{\Delta_F \times \Delta_G} J^{cc}_{u_2}(F(x), G(y))\, d[H_n - H_\theta] \Big| \le \chi(\Omega_{\gamma n} \cap \Omega_{1n})\, M M'' \Big| \int_{\Delta_F \times \Delta_G} d[H_n(x,y) - H_\theta(x,y)] \Big|,$$
where $M'' = \sup_{\Delta_F \times \Delta_G} |J^{cc}_{u_2}(F(x), G(y))|$. From Theorem 1.9 (i) of [43], the integral above converges to 0 in probability as $n \to \infty$. Proof completed.

Corollary 4. For fixed $\gamma$, $B_{\gamma 4n} \to_p 0$ as $n \to \infty$.

Proof: Note that
$$B_{\gamma 4n} = \chi(\Omega_{\gamma n} \cap \Omega_{1n}) \Big[\int_{\Delta_F \times \Delta_G} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y) - A_{2n}\Big] + \chi(\Omega_{\gamma n} \cap \Omega^c_{1n}) \Big[\int_{\Delta_F \times \Delta_G} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y) - A_{2n}\Big]$$
$$= -\chi(\Omega_{\gamma n} \cap \Omega_{1n}) \int_{\{(-\infty, X_{1n}) \cup (X_{nn}, \infty)\} \times \{(-\infty, Y_{1n}) \cup (Y_{nn}, \infty)\}} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y) + \chi(\Omega_{\gamma n} \cap \Omega^c_{1n}) \Big[\int_{\Delta_F \times \Delta_G} U_n(F(x))\, J^{cc}_{u_2}(F(x), G(y))\, dH_\theta(x,y) - A_{2n}\Big],$$
where $\Omega_{1n}$ is as defined in (3.15). By the Glivenko-Cantelli Theorem, $P(\{(-\infty, X_{1n}) \cup (X_{nn}, \infty)\} \times \{(-\infty, Y_{1n}) \cup (Y_{nn}, \infty)\}) \to 0$ for any $H_\theta$. Combining this with the fact that $J^{cc}_{u_2}$ is uniformly continuous on $(0,1)^2$, the right-hand side converges to 0 as $n \to \infty$.

To see that $B_{5n}$ converges to 0 in probability as $n \to \infty$, note that
$$\sum_{i=1}^{4} B_{\gamma in} = n^{1/2} \int_{\{\mathbb{R}\setminus D_F\} \times \{\mathbb{R}\setminus D_G\}} \big[J^{cc}(F(x), G_n(y)) - J^{cc}(F(x), G(y))\big]\, dH_n(x,y) - A_{2n}.$$
In summary, we have shown that all the $B$-terms and the $C$-term are negligible. Combining this with the results for the $A$-terms, we have established the asymptotic normality of $R_n^{cc}$.

3.4.2 Asymptotic normality of $R_n^{cd}$ and $R_n^{dc}$

It suffices to show the asymptotic normality of $R_n^{cd}$. Note that
$$n^{1/2}(R_n^{cd} - \mu^{cd}) = \sum_{i=1}^{4} A_{in} + \sum_{i=1}^{6} B_{in},$$
where $\mu^{cd} = E[N^{cd}(X,Y)]$ and
$$A_{1n} = n^{1/2} \int_{\mathbb{R}\setminus D_F} J^{cd}(F(x), G(D_G^-), G(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)];$$
$$A_{2n} = \int_{\mathbb{R}\setminus D_F} U_n(F(x))\, J^{cd}_{u_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G);$$
$$A_{3n} = \int_{\mathbb{R}\setminus D_F} V_n(G(D_G^-))\, J^{cd}_{v_1}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G);$$
$$A_{4n} = \int_{\mathbb{R}\setminus D_F} V_n(G(D_G))\, J^{cd}_{v_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G);$$
$$B_{1n} = \int_{\mathbb{R}\setminus D_F} U_n(F(x))\, J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)];$$
$$B_{2n} = \int_{\mathbb{R}\setminus D_F} V_n(G(D_G^-))\, J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)];$$
$$B_{3n} = \int_{\mathbb{R}\setminus D_F} V_n(G(D_G))\, J^{cd}_{v_2}(F(x), G(D_G^-), \Psi_n(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)];$$
$$B_{4n} = \int_{\mathbb{R}\setminus D_F} U_n(F(x)) \big[J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G(D_G^-), G(D_G))\big]\, dH_\theta(x, D_G);$$
$$B_{5n} = \int_{\mathbb{R}\setminus D_F} V_n(G(D_G^-)) \big[J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G)) - J^{cd}_{v_1}(F(x), G(D_G^-), G(D_G))\big]\, dH_\theta(x, D_G);$$
$$B_{6n} = \int_{\mathbb{R}\setminus D_F} V_n(G(D_G)) \big[J^{cd}_{v_2}(F(x), G(D_G^-), \Psi_n(D_G)) - J^{cd}_{v_2}(F(x), G(D_G^-), G(D_G))\big]\, dH_\theta(x, D_G);$$
where $\Phi_n(\cdot)$, $\Theta_n(\cdot)$, and $\Psi_n(\cdot)$ are defined by the Mean Value Theorem. Next, we show that the $A$-terms are asymptotically normal and the $B$-terms converge to 0 in probability.

(1) Note that $A_{1n}$ can be written as $A_{1n} = n^{-1/2} \sum_{j=1}^n A_{1jn}$, where $A_{1jn} = J^{cd}(F(X_j), G(D_G^-), G(D_G)) \cdot I_{\{j \in A_{cd}\}} - \mu^{cd}$. Note that the $A_{1jn}$ are i.i.d. with mean zero.
Since
$$J^{cd}(u, v_1, v_2) = \int_{v_1}^{v_2} \frac{\partial}{\partial\theta} \log c_\theta(u, v)\, dv,$$
using assumption (D2) we have
$$\int_{\mathbb{R}\setminus D_F} |J^{cd}(F(x), G(D_G^-), G(D_G))|^{2+\delta_0}\, dH_\theta(x, D_G) \le \Big[\int_{\mathbb{R}\setminus D_F} |J^{cd}(F(x), G(D_G^-), G(D_G))|^{(2+\delta_0)p_0}\, dH_\theta(x, D_G)\Big]^{1/p_0} \Big[\int_{\mathbb{R}\setminus D_F} 1^{(2+\delta_0)q_0}\, dH_\theta(x, D_G)\Big]^{1/q_0}$$
$$\le \Big[\int_{(0,1) \cap S_F^c} |J^{cd}(u, G(D_G^-), G(D_G))|^{(2+\delta_0)p_0} \Big(\int_{S_G} c_\theta(u,v)\, dv\Big)\, du\Big]^{1/p_0}.$$
Under assumption (D2),
$$\int_{(0,1) \cap S_F^c} |J^{cd}(u, v_1, v_2)|^{(2+\delta_0)p_0} \Big(\int_{S_G} c_\theta(u,v)\, dv\Big)\, du \le M_4 \int_{(0,1) \cap S_F^c} r(u)^{a(2+\delta_0)p_0} \Big(\int_{S_G} c_\theta(u,v)\, dv\Big)\, du \le M_4 M_6 \int_{(0,1) \cap S_F^c} r(u)^{a(2+\delta_0)p_0 + a}\, du < \infty,$$
if $a(2+\delta_0)p_0 + a > -1$. By the CLT, we have the asymptotic normality of $A_{1n}$.

(2) Note that $A_{2n}$ can be written as
$$A_{2n} = n^{1/2} \int_{\mathbb{R}\setminus D_F} (\hat{F}_n(x) - F(x))\, J^{cd}_{u_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G) + n^{1/2} \int_{\mathbb{R}\setminus D_F} (F_n(x) - \hat{F}_n(x))\, J^{cd}_{u_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G) = A^*_{2n1} + A^*_{2n2}.$$
Furthermore, $A^*_{2n1}$ can be written as $n^{-1/2} \sum_{j=1}^n A_{2jn}$, where $A_{2jn} = \int_{\mathbb{R}\setminus D_F} (\phi_{X_j}(x) - F(x))\, J^{cd}_{u_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G)$ are i.i.d. with mean 0. Using assumption (D2), we have
$$|A_{2jn}| \le M_5 \int_{(0,1) \cap S_F^c} r^{a-1}(u) \Big(\int_{S_G} c_\theta(u,v)\, dv\Big)\, du \le M_5 M_6 \int_{(0,1) \cap S_F^c} r^{2a-1}(u)\, du < \infty$$
as long as $2a - 1 > -1$. Thus we have the asymptotic normality of $A^*_{2n1}$. Using assumption (D2) again,
$$|A^*_{2n2}| = \frac{\sqrt{n}}{n+1} \Big|\int_{\mathbb{R}\setminus D_F} \hat{F}_n(x)\, J^{cd}_{u_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G)\Big| \le \frac{\sqrt{n}}{n+1} \sup_x |\hat{F}_n(x)| \int_{\mathbb{R}\setminus D_F} |J^{cd}_{u_2}(F(x), G(D_G^-), G(D_G))|\, dH_\theta(x, D_G) = o_p(1).$$
Therefore, the asymptotic normality of $A_{2n}$ has been established.

(3) Note that $A_{3n}$ can be written as
$$A_{3n} = n^{1/2} \int_{\mathbb{R}\setminus D_F} [\hat{G}_n(D_G^-) - G(D_G^-)]\, J^{cd}_{v_1}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G) + n^{1/2} \int_{\mathbb{R}\setminus D_F} [G_n(D_G^-) - \hat{G}_n(D_G^-)]\, J^{cd}_{v_1}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G) = A^*_{3n1} + A^*_{3n2}.$$
Similarly, $A^*_{3n1}$ can be written as $n^{-1/2} \sum_{j=1}^n A_{3jn}$, where $A_{3jn} = \int_{\mathbb{R}\setminus D_F} (\psi_{Y_j}(D_G) - G(D_G^-))\, J^{cd}_{v_1}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G)$ are i.i.d. with mean zero, and
$$\psi_{Y_j}(y) = \begin{cases} 1 & \text{if } Y_j < y, \\ 0 & \text{if } Y_j \ge y. \end{cases}$$
Noting that the absolute value of the random part $|\psi_{Y_j}(D_G) - G(D_G^-)|$ in $A_{3jn}$ is bounded above by 1, we have
$$|A_{3jn}| \le \int_{(0,1) \cap S_F^c} |J^{cd}_{v_1}(u, G(D_G^-), G(D_G))| \Big(\int_{S_G} c_\theta(u,v)\, dv\Big)\, du.$$
By assumption (D2), the integral above is finite. Therefore $A_{3jn}$ has an absolute moment of order $2+\delta_1$ for some $\delta_1 > 0$, which leads to the asymptotic normality of $A^*_{3n1}$. Using an argument similar to that used for $A^*_{2n2}$ in (2), one can show that $A^*_{3n2} = o_p(1)$. In summary, we have the asymptotic normality of $A_{3n}$.

(4) The result for $A_{4n}$ can be obtained in a similar way by noting that $A_{4n}$ can be written as
$$A_{4n} = n^{1/2} \int_{\mathbb{R}\setminus D_F} [\hat{G}_n(D_G) - G(D_G)]\, J^{cd}_{v_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G) + n^{1/2} \int_{\mathbb{R}\setminus D_F} [G_n(D_G) - \hat{G}_n(D_G)]\, J^{cd}_{v_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G) = A^*_{4n1} + A^*_{4n2},$$
and $A^*_{4n1}$ can be written as $n^{-1/2} \sum_{j=1}^n A_{4jn}$, where $A_{4jn} = \int_{\mathbb{R}\setminus D_F} (\phi_{Y_j}(D_G) - G(D_G))\, J^{cd}_{v_2}(F(x), G(D_G^-), G(D_G))\, dH_\theta(x, D_G)$ are i.i.d. with mean zero. Using arguments similar to those in (3), we get that $A_{4n}$ is asymptotically normally distributed with mean 0.

(5) Finally, we show that the sum of the $A_{in}$, $i = 1, \ldots, 4$, converges to a normal random variable with mean 0. Note that the sum can be written as $n^{-1/2} \sum_{j=1}^n (A_{1jn} + A_{2jn} + A_{3jn} + A_{4jn}) + A^*_{2n2} + A^*_{3n2} + A^*_{4n2} \equiv n^{-1/2} \sum_{j=1}^n A^{**}_{2jn} + A^*_{2n2} + A^*_{3n2} + A^*_{4n2}$, where $A^{**}_{2jn} = A_{1jn} + A_{2jn} + A_{3jn} + A_{4jn}$ depends on $(X_j, Y_j)$ only, hence the $A^{**}_{2jn}$ are i.i.d. with mean 0 as shown before (similarly, we can define $A^{**}_{3jn}$ for $R_n^{dc}$), and $A^*_{2n2}$, $A^*_{3n2}$ and $A^*_{4n2}$ are negligible.
Using the CLT, we have that the sum of the $A_{in}$, $i = 1, \ldots, 4$, is asymptotically normally distributed with mean 0. We finish the proof for $R_n^{cd}$ with a few corollaries.

Corollary 5. $B_{1n} \to_p 0$ as $n \to \infty$.

Proof: Note that
$$B_{1n} = \chi(\Omega_{1n}) \int_{\mathbb{R}\setminus D_F} U_n(F(x))\, J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)] + \chi(\Omega^c_{1n}) \int_{\mathbb{R}\setminus D_F} U_n(F(x))\, J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)],$$
where $\Omega_{1n}$ is defined in (3.15), which tells us that the second term on the right-hand side of the equality above goes to 0 in probability as $n \to \infty$. Also,
$$\chi(\Omega_{1n}) \Big| \int_{\mathbb{R}\setminus D_F} U_n(F(x))\, J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))\, d[H_n - H_\theta] \Big| \le \chi(\Omega_{1n})\, M \Big| \int_{\mathbb{R}\setminus D_F} J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))\, d[H_n - H_\theta] \Big|$$
$$= \chi(\Omega_{1n})\, M \Big| \int_{\Delta_F} J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))\, d[H_n - H_\theta] \Big| + \chi(\Omega_{1n})\, M \Big| \int_{(-\infty, X_{1n}) \cup (X_{nn}, \infty)} J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))\, d[H_n - H_\theta] \Big|;$$
the second term on the right-hand side of the inequality goes to 0 in probability as $n \to \infty$ by the Glivenko-Cantelli Theorem and assumption (D2). Since $J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G))$ is bounded above almost surely for $x \in \Delta_F$, and $H_n(x,y)$ converges to $H_\theta(x,y)$ in distribution as $n \to \infty$, using Theorem 1.9 (i) of [43] once again, the first term on the right-hand side converges to 0. In summary, we have shown that $B_{1n} \to_p 0$ as $n \to \infty$.

Corollary 6. $B_{2n} \to_p 0$ and $B_{3n} \to_p 0$ as $n \to \infty$.

Proof: It suffices to show the result for $B_{2n}$. By the CLT, $V_n(G(D_G^-))$ converges to a normal random variable as $n \to \infty$, so we can define a set $\Omega_{2n}$ such that
$$P(\Omega_{2n}) = P(\{\omega : \sup_\omega |V_n(G(D_G^-))| \le M\}) > 1 - \epsilon.$$
Therefore,
$$B_{2n} = \chi(\Omega_{2n}) \int_{\mathbb{R}\setminus D_F} V_n(G(D_G^-))\, J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)] + \chi(\Omega^c_{2n}) \int_{\mathbb{R}\setminus D_F} V_n(G(D_G^-))\, J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G))\, d[H_n(x, D_G) - H_\theta(x, D_G)],$$
and the second term on the right-hand side of the equality above goes to 0 in probability as $n \to \infty$. Also,
$$\chi(\Omega_{2n}) \Big| \int_{\mathbb{R}\setminus D_F} V_n(G(D_G^-))\, J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G))\, d[H_n - H_\theta] \Big| \le \chi(\Omega_{2n})\, M \Big| \int_{\Delta_F} J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G))\, d[H_n - H_\theta] \Big| + \chi(\Omega_{2n})\, M \Big| \int_{(-\infty, X_{1n}) \cup (X_{nn}, \infty)} J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G))\, d[H_n - H_\theta] \Big|;$$
the second term on the right-hand side of the inequality goes to 0 in probability as $n \to \infty$ by the Glivenko-Cantelli Theorem and assumption (D2). From assumption (D2), combined with the definition of $\Theta_n(\cdot)$ and the fact that $J^{cd}_{v_1}(u_2, v_1, v_2)$ is continuous with respect to $u_2$, $v_1$, and $v_2$ on $(0,1)^3$, we know that $J^{cd}_{v_1}(F(x), \Theta_n(D_G), G_n(D_G))$ is bounded above almost surely for $x \in \Delta_F$. Noting that $H_n(x,y)$ converges to $H_\theta(x,y)$ in distribution as $n \to \infty$ and using Theorem 1.9 (i) of [43] once again, the first term on the right-hand side converges to 0. In summary, we have shown that $B_{2n} \to_p 0$ as $n \to \infty$.

Corollary 7. $B_{4n} \to_p 0$ as $n \to \infty$.

Proof: Note that
$$B_{4n} = \chi(\Omega_{1n}) \int_{\mathbb{R}\setminus D_F} U_n(F(x)) \big[J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G_n(D_G^-), G_n(D_G))\big]\, dH_\theta(x, D_G) + \chi(\Omega^c_{1n}) \int_{\mathbb{R}\setminus D_F} U_n(F(x)) \big[J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G_n(D_G^-), G_n(D_G))\big]\, dH_\theta(x, D_G),$$
where $\Omega_{1n}$ is defined in (3.15). Therefore, the second term on the right-hand side of the equality above goes to 0 in probability as $n \to \infty$.
Also,
$$\chi(\Omega_{1n}) \Big| \int_{\mathbb{R}\setminus D_F} U_n(F(x)) \big[J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G_n(D_G^-), G_n(D_G))\big]\, dH_\theta(x, D_G) \Big|$$
$$\le \chi(\Omega_{1n})\, M \int_{\mathbb{R}\setminus D_F} \big|J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G_n(D_G^-), G_n(D_G))\big|\, dH_\theta(x, D_G)$$
$$= \chi(\Omega_{1n})\, M \int_{\Delta_F} \big|J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G_n(D_G^-), G_n(D_G))\big|\, dH_\theta(x, D_G) + \chi(\Omega_{1n})\, M \int_{(-\infty, X_{1n}) \cup (X_{nn}, \infty)} \big|J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G_n(D_G^-), G_n(D_G))\big|\, dH_\theta(x, D_G);$$
the second term on the right-hand side of the inequality goes to 0 in probability as $n \to \infty$ by the Glivenko-Cantelli Theorem and assumption (D2). Since $|J^{cd}_{u_2}(\Phi_n(x), G_n(D_G^-), G_n(D_G)) - J^{cd}_{u_2}(F(x), G_n(D_G^-), G_n(D_G))|$ is bounded above almost surely for $x \in \Delta_F$, and $H_n(x,y)$ converges to $H_\theta(x,y)$ in distribution as $n \to \infty$, using Theorem 1.9 (i) of [43] once again, the first term on the right-hand side converges to 0. In summary, we have shown that $B_{4n} \to_p 0$ as $n \to \infty$.

Corollary 8. $B_{5n} \to_p 0$ and $B_{6n} \to_p 0$ as $n \to \infty$.

Proof: The result follows from arguments similar to those used in the proofs of Corollaries 6 and 7, so the proof is omitted.

In summary, we have the asymptotic normality of $R_n^{cd}$.

3.4.3 Asymptotic normality of $R_n^{dd}$

Note that
$$n^{1/2}(R_n^{dd} - \mu^{dd}) = \sum_{k=1}^{5} A_{kn} + \sum_{k=1}^{4} B_k,$$
where $\mu^{dd} = E[N^{dd}(X,Y)]$,
$$A_{1n} = n^{1/2}\, J^{dd}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, [dH_n^{dd} - dH^{dd}],$$
$$A_{2n} = V_n(G(D_G))\, J^{dd}_{v_2}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd},$$
$$A_{3n} = U_n(F(D_F))\, J^{dd}_{u_2}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd},$$
$$A_{4n} = V_n(G(D_G^-))\, J^{dd}_{v_1}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd},$$
$$A_{5n} = U_n(F(D_F^-))\, J^{dd}_{u_1}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd},$$
$$B_1 = V_n(G(D_G)) \big[J^{dd}_{v_2}(F(D_F^-), F(D_F), G(D_G^-), \Psi_n)\, dH_n^{dd} - J^{dd}_{v_2}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd}\big],$$
$$B_2 = U_n(F(D_F)) \big[J^{dd}_{u_2}(F(D_F^-), \Phi_n, G(D_G^-), G(D_G))\, dH_n^{dd} - J^{dd}_{u_2}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd}\big],$$
$$B_3 = V_n(G(D_G^-)) \big[J^{dd}_{v_1}(F(D_F^-), F(D_F), \Theta_n, G(D_G))\, dH_n^{dd} - J^{dd}_{v_1}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd}\big],$$
$$B_4 = U_n(F(D_F^-)) \big[J^{dd}_{u_1}(\Gamma_n, F(D_F), G(D_G^-), G(D_G))\, dH_n^{dd} - J^{dd}_{u_1}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))\, dH^{dd}\big],$$
with $n^*_{dd}$ = the number of observations such that $X_j = D_F$ and $Y_j = D_G$, $dH_n^{dd} = n^*_{dd}/n$, and $dH^{dd} = P(X = D_F \text{ and } Y = D_G)$.

We need to show that the $A$-terms are asymptotically normal and that the $B$-terms converge to 0 in probability as $n \to \infty$. By the SLLN, it is obvious that $dH_n^{dd} \to dH^{dd}$ almost surely as $n \to \infty$, and by the CLT, in distribution,
$$U_n(F(D_F^-)) \to N(0,\, F(D_F^-)(1 - F(D_F^-))), \qquad U_n(F(D_F)) \to N(0,\, F(D_F)(1 - F(D_F))),$$
$$V_n(G(D_G^-)) \to N(0,\, G(D_G^-)(1 - G(D_G^-))), \qquad V_n(G(D_G)) \to N(0,\, G(D_G)(1 - G(D_G))),$$
$$n^{1/2}[dH_n^{dd} - dH^{dd}] \to N(0,\, dH^{dd}(1 - dH^{dd}))$$
as $n \to \infty$. Note that the quantities $J^{dd}(F(D_F^-), F(D_F), G(D_G^-), G(D_G))$, $J^{dd}_{v_2}(\cdots)$, $J^{dd}_{u_2}(\cdots)$, $J^{dd}_{v_1}(\cdots)$ and $J^{dd}_{u_1}(\cdots)$ appearing in $A_{1n}$, $A_{2n}$, $A_{3n}$, $A_{4n}$ and $A_{5n}$, respectively, are fixed numbers. Therefore, the $A$-terms are asymptotically normally distributed with mean 0.
To show that the sum of the A-terms is still normal, one only needs to note that
$$U_n(F(D_F^-)) = n^{1/2}(F_n(D_F^-) - F(D_F^-)) + n^{1/2}(\hat F_n(D_F^-) - F_n(D_F^-)) = n^{-1/2}\sum_{j=1}^n \bigl(\psi_{X_j}(D_F^-) - F(D_F^-)\bigr) + o_p(1) \equiv n^{-1/2}\sum_{j=1}^n H^*_{1jn} + o_p(1),$$
$$U_n(F(D_F)) = n^{1/2}(F_n(D_F) - F(D_F)) + n^{1/2}(\hat F_n(D_F) - F_n(D_F)) = n^{-1/2}\sum_{j=1}^n \bigl(\phi_{X_j}(D_F) - F(D_F)\bigr) + o_p(1) \equiv n^{-1/2}\sum_{j=1}^n H^*_{2jn} + o_p(1),$$
$$V_n(G(D_G^-)) = n^{1/2}(G_n(D_G^-) - G(D_G^-)) + n^{1/2}(\hat G_n(D_G^-) - G_n(D_G^-)) = n^{-1/2}\sum_{j=1}^n \bigl(\psi_{Y_j}(D_G^-) - G(D_G^-)\bigr) + o_p(1) \equiv n^{-1/2}\sum_{j=1}^n H^*_{3jn} + o_p(1),$$
$$V_n(G(D_G)) = n^{1/2}(G_n(D_G) - G(D_G)) + n^{1/2}(\hat G_n(D_G) - G_n(D_G)) = n^{-1/2}\sum_{j=1}^n \bigl(\phi_{Y_j}(D_G) - G(D_G)\bigr) + o_p(1) \equiv n^{-1/2}\sum_{j=1}^n H^*_{4jn} + o_p(1),$$
and
$$n^{1/2}[dH_n^{dd} - dH^{dd}] = n^{-1/2}\sum_{j=1}^n \bigl(I_{\{j \in A_{dd}\}} - dH^{dd}\bigr) \equiv n^{-1/2}\sum_{j=1}^n H^*_{5jn},$$
where the $H^*_{ijn}$ are i.i.d. across $j$ and depend on $(X_j, Y_j)$ only, for $i = 1, \ldots, 5$. The sum $A_{1n} + A_{2n} + A_{3n} + A_{4n} + A_{5n}$ can therefore be written as $n^{-1/2}\sum_{j=1}^n A^{**}_{4jn}$ plus four negligible terms, where the $A^{**}_{4jn}$ are i.i.d. sums of products of the $H^*_{ijn}$ and certain fixed numbers. Using the CLT one more time, we have that the sum of the A-terms is asymptotically normally distributed with mean 0.

Under assumption (D4), $B_{in}$ converges to 0 in probability as $n \to \infty$, for $i = 1, \ldots, 4$. We have thus shown that $n^{1/2}(R_n^{dd} - \mu^{dd})$ is asymptotically normally distributed with mean 0.

3.4.4 Asymptotic normality of $R_n$

To show that $n^{1/2}R_n = n^{1/2}(R_n^{cc} + R_n^{cd} + R_n^{dc} + R_n^{dd})$ is asymptotically normal, we use the following notation:
$$n^{1/2}(R_n^{cc} - \mu^{cc}) \Rightarrow R^{cc} \sim N(0, \mathrm{var}(M^{cc}(X, Y))), \qquad n^{1/2}(R_n^{cd} - \mu^{cd}) \Rightarrow R^{cd} \sim N(0, \mathrm{var}(M^{cd}(X, Y))),$$
$$n^{1/2}(R_n^{dc} - \mu^{dc}) \Rightarrow R^{dc} \sim N(0, \mathrm{var}(M^{dc}(X, Y))), \qquad n^{1/2}(R_n^{dd} - \mu^{dd}) \Rightarrow R^{dd} \sim N(0, \mathrm{var}(M^{dd}(X, Y))).$$
Noting that
$$n^{1/2}(R_n^{cc} - \mu^{cc}) = n^{-1/2}\sum_{j=1}^n A^{**}_{1jn} + \text{a few negligible terms}, \qquad n^{1/2}(R_n^{cd} - \mu^{cd}) = n^{-1/2}\sum_{j=1}^n A^{**}_{2jn} + \text{a few negligible terms},$$
$$n^{1/2}(R_n^{dc} - \mu^{dc}) = n^{-1/2}\sum_{j=1}^n A^{**}_{3jn} + \text{a few negligible terms}, \qquad n^{1/2}(R_n^{dd} - \mu^{dd}) = n^{-1/2}\sum_{j=1}^n A^{**}_{4jn} + \text{a few negligible terms},$$
by the CLT we have
$$n^{1/2}(R_n - \mu) = n^{-1/2}\sum_{j=1}^n \bigl(A^{**}_{1jn} + A^{**}_{2jn} + A^{**}_{3jn} + A^{**}_{4jn}\bigr) + \text{a few negligible terms} \;\Rightarrow\; N(0, \sigma^2),$$
where $\mu = \mu^{cc} + \mu^{cd} + \mu^{dc} + \mu^{dd}$ and $\sigma^2 = \mathrm{var}\bigl[M^{cc}(X, Y) + M^{cd}(X, Y) + M^{dc}(X, Y) + M^{dd}(X, Y)\bigr]$.

3.5 Proof of the Main Result

We finish this chapter by providing the proof of Theorem 3.1.1.

Proof of Theorem 3.1.1. As shown in Section 3.3, $B_n \to \beta$ almost surely, and as shown in Section 3.4, $A_n$ is asymptotically normal with mean 0 and variance $\sigma^2$ as $n \to \infty$. Using Slutsky's theorem, $n^{1/2}(\hat\theta_n - \theta) = n^{1/2}A_n/B_n$ is asymptotically normal with mean 0 and variance $v^2 = \sigma^2/\beta^2$. In summary, we have shown the consistency and asymptotic normality of the semi-parametric estimator $\hat\theta_n$.

CHAPTER 4

A VARIANCE ESTIMATOR OF BIVARIATE DISTRIBUTIONS

In Chapter 3, we provided an explicit formula for the asymptotic variance $v^2 = \sigma^2/\beta^2$. Suppose estimators $\hat\sigma^2$ and $\hat\beta^2$ could be found for $\sigma^2$ and $\beta^2$, respectively. A rough-and-ready estimator of the asymptotic variance would then be given by $\hat v^2 = \hat\sigma^2/\hat\beta^2$. If the variables given in (3.9) and (3.10) could be observed, one could simply estimate $\sigma^2$ and $\beta$ by the respective sample variance and sample mean. As this is not possible, the corresponding pseudo-observations can be used instead; these are defined in terms of $C_n$, the re-scaled empirical copula function of the bivariate sample, namely
$$C_n(u, v) = \frac{1}{n}\sum_{j=1}^n I_{\{F_n(X_j) \le u,\; G_n(Y_j) \le v\}}.$$
Let $l^{cc}_{\theta,\theta}$, $l^{cd}_{\theta,\theta}$, $l^{dc}_{\theta,\theta}$, and $l^{dd}_{\theta,\theta}$ be the decomposed components of $l_{\theta,\theta}$ with respect to $c^*_\theta$ in (2.7) under the different situations.
Similarly, one can define $l^{cc}_\theta$, $l^{cd}_\theta$, $l^{dc}_\theta$, and $l^{dd}_\theta$ for $l_\theta$. From (3.9), we have the following:
$$\hat\beta = \frac{1}{n}\sum_{j=1}^n \bigl[\hat N_j^{cc} + \hat N_j^{cd} + \hat N_j^{dc} + \hat N_j^{dd}\bigr], \qquad (4.1)$$
where
$$\hat N_j^{cc} = I_{\{j \in A_{cc}\}}\; l^{cc}_{\theta,\theta}(\hat\theta_n, F_n(X_j), G_n(Y_j)),$$
$$\hat N_j^{cd} = I_{\{j \in A_{cd}\}}\; l^{cd}_{\theta,\theta}(\hat\theta_n, F_n(X_j), G_n(Y_j^-), G_n(Y_j)),$$
$$\hat N_j^{dc} = I_{\{j \in A_{dc}\}}\; l^{dc}_{\theta,\theta}(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j)),$$
$$\hat N_j^{dd} = I_{\{j \in A_{dd}\}}\; l^{dd}_{\theta,\theta}(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)).$$

To get $\hat\sigma^2$ for $\sigma^2$ in (3.11), we need more notation. Note that rearranging $\{(X_j, Y_j)\}$, $j = 1, \ldots, n$, does not change the value of $\hat\sigma^2$; we may therefore assume the sample is ordered so that the $X_j$'s are non-decreasing. Let $R_{X(j)} = \sum_i I_{\{X_i \le X_j\}}$ be the rank of $X_j$ in the sequence of $X$'s, and let $R^-_{X(j)} = \sum_i I_{\{X_i < X_j\}}$; the quantities $R_{Y(j)}$ and $R^-_{Y(j)}$ are defined similarly for the $Y$'s. Table 4.1 illustrates these quantities on a small sample.

 j   (X_j, Y_j)    R_X(j)   R_Y(j)
 1   (2.1, 1.0)       1        3
 2   (2.5, 0.5)       2        1
 3   (3.2, 1.0)       5        3
 4   (3.2, 1.5)       5        4
 5   (3.2, 2.0)       5        5
 6   (3.5, 2.5)       6        7
 7   (3.7, 2.5)       7        7

Table 4.1. Relationship among (X_j, Y_j), R_X(j), and R_Y(j).

The required pseudo-observations are $\hat M_j = \hat M_j^{cc} + \hat M_j^{cd} + \hat M_j^{dc} + \hat M_j^{dd}$, where, for instance,
$$\hat M_j^{dc} = \sum_{h=1}^{d_F} l^{dc}_\theta(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j))\, I_{\{j \in A_{dc},\, X_j = D_{Fh}\}}$$
$$\quad + \frac{1}{n}\sum_{h=1}^{d_F}\Biggl[\sum_{k:\, R_{Y(k)} \ge R_{Y(j)}} l^{dc}_{\theta,v_2}\Bigl(\hat\theta_n, \frac{R^-_{X(k)}}{n+1}, \frac{R_{X(k)}}{n+1}, \frac{R_{Y(k)}}{n+1}\Bigr) I_{\{k \in A_{dc},\, X_k = D_{Fh}\}}$$
$$\quad + \sum_{k:\, R_{X(k)} > R_{X(j)}} l^{dc}_{\theta,u_1}\Bigl(\hat\theta_n, \frac{R^-_{X(k)}}{n+1}, \frac{R_{X(k)}}{n+1}, \frac{R_{Y(k)}}{n+1}\Bigr) I_{\{k \in A_{dc},\, X_k = D_{Fh}\}} + \sum_{k:\, R_{X(k)} \ge R_{X(j)}} l^{dc}_{\theta,u_2}\Bigl(\hat\theta_n, \frac{R^-_{X(k)}}{n+1}, \frac{R_{X(k)}}{n+1}, \frac{R_{Y(k)}}{n+1}\Bigr) I_{\{k \in A_{dc},\, X_k = D_{Fh}\}}\Biggr],$$
and
$$\hat M_j^{dd} = \sum_{h=1}^{d_F}\sum_{l=1}^{d_G} l^{dd}_\theta(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j))\, I_{\{X_j = D_{Fh},\, Y_j = D_{Gl}\}}$$
$$\quad + \frac{1}{n}\sum_{h=1}^{d_F}\sum_{l=1}^{d_G}\Biggl[\sum_{k:\, R_{Y(k)} > R_{Y(j)}} l^{dd}_{\theta,v_1}\Bigl(\hat\theta_n, \frac{R^-_{X(k)}}{n+1}, \frac{R_{X(k)}}{n+1}, \frac{R^-_{Y(k)}}{n+1}, \frac{R_{Y(k)}}{n+1}\Bigr) I_{\{X_k = D_{Fh},\, Y_k = D_{Gl}\}}$$
$$\quad + \sum_{k:\, R_{Y(k)} \ge R_{Y(j)}} l^{dd}_{\theta,v_2}(\cdots)\, I_{\{X_k = D_{Fh},\, Y_k = D_{Gl}\}} + \sum_{k:\, R_{X(k)} > R_{X(j)}} l^{dd}_{\theta,u_1}(\cdots)\, I_{\{X_k = D_{Fh},\, Y_k = D_{Gl}\}} + \sum_{k \ne j} l^{dd}_{\theta,u_2}(\cdots)\, I_{\{X_k = D_{Fh},\, Y_k = D_{Gl}\}}\cdot\frac{n^*_{hl}}{n}\Biggr],$$
where each $l^{dd}_{\theta,\cdot}(\cdots)$ is evaluated at the same rank-based pseudo-observations $\bigl(R^-_{X(k)}/(n+1),\, R_{X(k)}/(n+1),\, R^-_{Y(k)}/(n+1),\, R_{Y(k)}/(n+1)\bigr)$, and $n^*_{hl}$ is the number of observations $(X_j, Y_j)$ with $X_j = D_{Fh}$ and $Y_j = D_{Gl}$. The pseudo-observations $\hat M_j^{cc}$ and $\hat M_j^{cd}$ are defined analogously, with the roles of the discrete and continuous components interchanged as appropriate; for instance, the correction terms of $\hat M_j^{cd}$ involve $l^{cd}_{\theta,v_2}\bigl(\hat\theta_n, R_{X(k)}/(n+1), R^-_{Y(k)}/(n+1), R_{Y(k)}/(n+1)\bigr)$ summed over $\{k \in A_{cd} : Y_k = D_{Gl},\, R_{Y(k)} \ge R_{Y(j)}\}$. The estimator $\hat\sigma^2$ is then the sample variance of $\hat M_1, \ldots, \hat M_n$.

Next, to establish the consistency of $\hat v_n^2$, it suffices to show that $\hat\sigma_n^2$ and $\hat\beta_n$ are themselves consistent. To prove that $\hat\beta_n - \beta$ converges to zero almost surely, for example, express the difference as
$$\frac{1}{n}\sum_{j=1}^n J_{\hat\theta_n}(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)) - E\bigl[J_\theta(F(X^-), F(X), G(Y^-), G(Y))\bigr]$$
in terms of $J_\theta = l_{\theta,\theta}$. By the triangle inequality, this quantity is no greater than
$$\frac{1}{n}\sum_{j=1}^n \Bigl| J_{\hat\theta_n}(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)) - J_\theta(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)) \Bigr|$$
$$+ \Biggl| \frac{1}{n}\sum_{j=1}^n J_\theta(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)) - E\bigl[J_\theta(F(X^-), F(X), G(Y^-), G(Y))\bigr] \Biggr|.$$
Assuming that $J_\theta$ is bounded by an integrable function in a neighborhood of the true value of $\theta$, by the convergence of maximum likelihood estimators, the first summand converges to zero by the Dominated Convergence Theorem. Since the second term vanishes asymptotically by an argument similar to that used in Chapter 3, it follows that $\hat\beta_n \to \beta$ almost surely. The argument for $\hat\sigma_n$ is similar, though somewhat more involved; it is not presented here.

In summary, we provided a variance estimator $\hat v^2$ of $v^2$ and illustrated its consistency.
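To make the rank-based quantities above concrete, here is a minimal Python sketch (our own illustration, not part of the thesis; all helper names are hypothetical) that reproduces the ranks of Table 4.1 and evaluates the re-scaled empirical copula $C_n$:

import numpy as np

xy = np.array([(2.1, 1.0), (2.5, 0.5), (3.2, 1.0), (3.2, 1.5),
               (3.2, 2.0), (3.5, 2.5), (3.7, 2.5)])
x, y = xy[:, 0], xy[:, 1]
n = len(x)

# R_X(j) = #{i : X_i <= X_j}; tied values share the largest index, as in Table 4.1.
R_x = np.array([np.sum(x <= xj) for xj in x])
R_x_minus = np.array([np.sum(x < xj) for xj in x])   # R^-_X(j) = #{i : X_i < X_j}
R_y = np.array([np.sum(y <= yj) for yj in y])

# Re-scaled pseudo-observations R/(n+1) used in the variance estimator.
u = R_x / (n + 1.0)
v = R_y / (n + 1.0)

def C_n(u0, v0):
    """Re-scaled empirical copula C_n(u, v) of the bivariate sample."""
    return np.mean((u <= u0) & (v <= v0))

print(R_x)          # [1 2 5 5 5 6 7], matching Table 4.1
print(R_x_minus)    # [0 1 2 2 2 5 6]
print(C_n(0.5, 0.5))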
CHAPTER 5

EXTENSIONS TO HIGHER DIMENSIONS

The previous developments extend more or less automatically to situations where one wishes to estimate a multidimensional dependence structure semi-parametrically. Let a boldfaced letter such as $\mathbf{x}$ denote a $D$-tuple ($D \ge 2$) of real numbers, that is, $\mathbf{x} \equiv (x_1, \ldots, x_D)^T$ where $x_k \in \mathbb{R}$ for $k = 1, \ldots, D$, and let $F_1, \ldots, F_D$ denote $D$ univariate mixed marginals on the real line given by
$$F_k(x_k) = \sum_{h=1}^{d_k} p_{kh}\, I_{\{D_{kh} \le x_k\}} + \Bigl(1 - \sum_{h=1}^{d_k} p_{kh}\Bigr)\int_{w \le x_k} f_k(w)\,dw,$$
where $D_{kh}$ is the $h$-th jump point of $F_k$ with $P(X_k = D_{kh}) = p_{kh}$, for $h = 1, \ldots, d_k$, and $f_k(\cdot)$ is a continuous density function with support on the real line. When restricted to a particular copula family, say $\{C_\theta : \theta = (\theta_1, \ldots, \theta_q)^T \in \mathcal{A} \subseteq \mathbb{R}^q\}$, a multivariate joint distribution function for $(x_1, \ldots, x_D)^T$ can be defined as follows:
$$F^D(x_1, \ldots, x_D) = C_\theta(F_1(x_1), \ldots, F_D(x_D)). \qquad (5.1)$$

Result 1: $F^D(x_1, \ldots, x_D)$ defined in (5.1) is a valid joint distribution function on $\mathbb{R}^D$.

Proof: For $F^D(x_1, \ldots, x_D)$ to be a valid distribution function on $\mathbb{R}^D$, the following four conditions should be satisfied: (1) $0 \le F^D(x_1, \ldots, x_D) \le 1$ for all $(x_1, \ldots, x_D)$; (2) $F^D(x_1, \ldots, x_D) \to 0$ as $\max(x_1, \ldots, x_D) \to -\infty$; (3) $F^D(x_1, \ldots, x_D) \to 1$ as $\min(x_1, \ldots, x_D) \to +\infty$; and (4) every pair $(x_1, \ldots, x_D)$ and $(y_1, \ldots, y_D) \in \mathbb{R}^D$ with $x_k \le y_k$ for $k = 1, \ldots, D$ defines a $D$-box $V = [x_1, y_1]\times[x_2, y_2]\times\cdots\times[x_D, y_D]$, and we require $\sum_c \mathrm{sgn}(c)\,F^D(c) \ge 0$, where the sum is taken over all vertices $c$ of $V$ and $\mathrm{sgn}(\cdot)$ is defined as in (1.3).

Since $F^D(x_1, \ldots, x_D) = C_\theta(F_1(x_1), \ldots, F_D(x_D)) = P(U_1 \le F_1(x_1), \ldots, U_D \le F_D(x_D))$, (1) follows. To prove (2), note that $F_k(x_k) \to 0$ for all $k = 1, \ldots, D$ when $\max(x_1, \ldots, x_D) \to -\infty$; hence $F^D(x_1, \ldots, x_D) \to 0$ by the properties of a cumulative distribution function. (3) follows similarly. (4) is a direct consequence of (1.3) and (5.1). This completes the proof.

Define $\mathcal{C}(F_k)$ to be the collection of all points of continuity of $F_k$, and $\mathcal{J}(F_k) = \{D_{k1}, \ldots, D_{kd_k}\}$ to be the collection of all jump points of $F_k$. For a fixed $\mathbf{x} \in \mathbb{R}^D$, every component of $\mathbf{x}$ corresponds to either a point of continuity or a point of discontinuity of the corresponding marginal distribution. Suppose the indexes $c_1, c_2, \ldots, c_H \in \{1, 2, \ldots, D\}$ correspond to $x_{c_h} \in \mathcal{C}(F_{c_h})$ (that is, $x_{c_h}$ belongs to the set of continuity points of $F_{c_h}$), and the remaining indexes, say $d_1, d_2, \ldots, d_L$, with $H + L = D$, are the indexes of the marginals such that $x_{d_l} = D_{d_l, i_{d_l}}$, $l = 1, 2, \ldots, L$.

When $\{c_1, c_2, \ldots, c_H\} = \{1, 2, \ldots, D\}$, the density of $F^D$ is given by
$$dF^D = c_\theta(F_1(x_1), \ldots, F_D(x_D)) \prod_{k=1}^D f_k(x_k),$$
and if $\{c_1, c_2, \ldots, c_H\} \subset \{1, 2, \ldots, D\}$, the density of $F^D$ is given by
$$dF^D = \int\cdots\int_{\prod_{l=1}^L [F_{d_l}(D^-_{d_l, i_{d_l}}),\, F_{d_l}(D_{d_l, i_{d_l}})]} c_\theta(u_1, \ldots, u_D)\, du_{d_1}\cdots du_{d_L} \cdot \prod_{h=1}^H f_{c_h}(x_{c_h}),$$
where $u_{c_h} = F_{c_h}(x_{c_h})$. Note that the above can be written as $c^*_\theta(F_1, \ldots, F_D)(\mathbf{x}) \cdot \prod_{h=1}^H f_{c_h}(x_{c_h})$.

Letting $\mathbf{X}_j = (X_{1j}, X_{2j}, \ldots, X_{Dj})^T$, $j = 1, \ldots, n$, represent a random sample from $F^D$, the semi-parametric estimator $\hat\theta_n$ of $\theta$ would then be obtained as a solution of the system
$$\frac{1}{n}\sum_{j=1}^n \frac{\partial}{\partial\theta_i}\log\bigl[dF^D(X_{1j}, \ldots, X_{Dj})\bigr] = 0, \quad (1 \le i \le q). \qquad (5.2)$$
Since $\theta$ is the only vector of parameters of interest, the component of the likelihood that matters is
$$\frac{1}{n}\sum_{j=1}^n \frac{\partial}{\partial\theta_i}\log\bigl[c^*_\theta(F_1, \ldots, F_D)(X_{1j}, \ldots, X_{Dj})\bigr] = 0, \quad (1 \le i \le q). \qquad (5.3)$$
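As an illustration of the mixed-marginal model at the start of this chapter, the following Python sketch (an illustration under made-up values, not thesis code; the jump locations, masses, and the standard normal continuous part are our assumptions) evaluates one marginal $F_k$ and its left limit $F_k(x^-)$:

import numpy as np
from scipy.stats import norm

# One mixed marginal F_k: jump points D_kh with masses p_kh plus a
# standard normal continuous part (illustrative values only).
jumps = np.array([0.2, 0.7])     # D_k1, D_k2
masses = np.array([0.3, 0.1])    # p_k1, p_k2
cont_w = 1.0 - masses.sum()      # weight of the continuous component

def F(x):
    """Mixed CDF: F_k(x) = sum_h p_kh 1{D_kh <= x} + (1 - sum_h p_kh) Phi(x)."""
    return masses[jumps <= x].sum() + cont_w * norm.cdf(x)

def F_left(x):
    """Left limit F_k(x-), needed wherever x is a jump point."""
    return masses[jumps < x].sum() + cont_w * norm.cdf(x)

# At a jump point, the interval [F(x-), F(x)] has positive length p_kh;
# these intervals are exactly the integration ranges in the density dF^D.
print(F_left(0.2), F(0.2))   # the two values differ by 0.3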
Using the same techniques as in Chapter 3, Theorem 3.1.1 can be extended to the $D$-dimensional case. Furthermore, the limiting variance-covariance matrix of $n^{1/2}(\hat\theta_n - \theta)$ is then $B^{-1}\Sigma B^{-1}$, where $B$ is the information matrix associated with $C_\theta$. To give the explicit form of $\Sigma$, we need some notation. Let
$$F^p_k(x_k) = \begin{cases} F_k(x_k) \equiv u_{k2}, & \text{if } x_k \in \mathcal{C}(F_k), \\ (F_k(x_k^-), F_k(x_k)) \equiv (u_{k1}, u_{k2}), & \text{if } x_k \in \mathcal{J}(F_k). \end{cases}$$
Note that in the present case, the relevant statistic in (5.3) can be written as, for $1 \le i \le q$,
$$R_{i,n} = \frac{1}{n}\sum_{j=1}^n J_i\bigl(F_1(X_{1j}^-), F_1(X_{1j}), \ldots, F_D(X_{Dj}^-), F_D(X_{Dj})\bigr) = \frac{1}{n}\sum_{j=1}^n \sum_Q J^Q_i\bigl(F^p_1(X_{1j}), \ldots, F^p_D(X_{Dj})\bigr),$$
where $Q = \{Q_1, \ldots, Q_D\}$ is a $D$-sequence consisting of the letters 'c' and 'd', with the following properties: (i) when $Q$ is combined with a point $\mathbf{x}$ and the marginals $F_1, \ldots, F_D$, say $\mathbf{x} \in Q(F_1, \ldots, F_D)$, a 'c' at the $k$-th component of $Q$ refers to $x_k$ belonging to $\mathcal{C}(F_k)$, and a 'd' at the $k$-th component refers to $x_k$ belonging to $\mathcal{J}(F_k)$, for $k = 1, \ldots, D$; (ii) when $Q$ is combined with $J_i$ or a derivative of $J_i$ with respect to $u_{k1}$ or $u_{k2}$, a 'c' at the $k$-th component refers to $F^p_k(X_{kj}) = F_k(X_{kj})$, and a 'd' at the $k$-th component refers to $F^p_k(X_{kj}) = (F_k(X_{kj}^-), F_k(X_{kj}))$, for $k = 1, \ldots, D$.

Using the same techniques as those used in Chapter 3, one can see that $\Sigma$ is the variance-covariance matrix of the $q$-dimensional random vector whose $i$-th component is $\sum_Q M^Q_i(X_1, \ldots, X_D)$. The following is a general form of $M^Q_i(X_1, \ldots, X_D)$. Assume $Q$ corresponds to the index partition $\{c_1, c_2, \ldots, c_H\} \cup \{d_1, d_2, \ldots, d_L\}$, where $L$ and $H$ can take any values in $\{0, \ldots, D\}$ such that $H + L = D$. Then
$$M^Q_i(x_1, \ldots, x_D) = \sum_{i_{d_1}}\cdots\sum_{i_{d_L}} \Biggl[ J^Q_i\bigl(F^p_1(x_1), \ldots, F^p_D(x_D)\bigr)\prod_{h=1}^H I_{\{x_{c_h} \in \mathcal{C}(F_{c_h})\}}\prod_{l=1}^L I_{\{x_{d_l} = D_{d_l i_{d_l}}\}} + \sum_{k=1}^D L_{ik} \Biggr],$$
where, in each summand, $x_{d_l} = D_{d_l i_{d_l}}$ and the set of jump points of $F_{d_l}$ is $\{D_{d_l 1}, \ldots, D_{d_l d_{d_l}}\}$. The term $L_{ik}$ is given as follows:

(i) if the $k$-th component of $Q$ is 'c',
$$L_{ik} = \int_{\mathbb{R}\setminus\{\cup_m D_{c_1 m}\}}\cdots\int_{\mathbb{R}\setminus\{\cup_m D_{c_H m}\}} I(x_k \le x'_k)\, J^Q_{i,u_{k2}}\bigl(F^p_1(x'_1), \ldots, F^p_D(x'_D)\bigr)\,dF(x'_1, \ldots, x'_D),$$
with $x'_{d_l} = D_{d_l i_{d_l}}$;

(ii) if the $k$-th component of $Q$ is 'd',
$$L_{ik} = \int_{\mathbb{R}\setminus\{\cup_m D_{c_1 m}\}}\cdots\int_{\mathbb{R}\setminus\{\cup_m D_{c_H m}\}} I(x_k < x'_k)\, J^Q_{i,u_{k1}}\bigl(F^p_1(x'_1), \ldots, F^p_D(x'_D)\bigr)\,dF(x'_1, \ldots, x'_D)$$
$$\qquad + \int_{\mathbb{R}\setminus\{\cup_m D_{c_1 m}\}}\cdots\int_{\mathbb{R}\setminus\{\cup_m D_{c_H m}\}} I(x_k \le x'_k)\, J^Q_{i,u_{k2}}\bigl(F^p_1(x'_1), \ldots, F^p_D(x'_D)\bigr)\,dF(x'_1, \ldots, x'_D),$$
with $x'_{d_l} = D_{d_l i_{d_l}}$.

Paralleling the case $q = 1$ and $D = 2$ described earlier in Chapters 2 and 3, an estimator of the variance-covariance matrix of $\hat\theta_n$ can be obtained by repeating the procedure described in Chapter 4.

CHAPTER 6

JOINT DISTRIBUTIONS USING t-COPULA

6.1 Joint Distributions Using t-copula

In this chapter, we develop joint distributions based on the family of t-copulas; see [34] and [8]. We choose t-copulas not only because they generalize the Gaussian copulas, but also because they possess tail dependence. This property allows us to study the limiting association between random variables X and Y as both x and y approach their boundaries. Our finding is that the limiting association is governed by the correlation coefficient $\rho$ together with the degrees of freedom $\nu$, as given in Lemma 6.1.3. In practice, there are situations where random variables remain associated to some degree even in the tails.
For instance, in biometric recognition, the genuine and impostor distributions generated from the same biometric trait, or from different biometric traits in a multimodal biometric system, tend to have similar (though not identical) tail behavior, as shown in Chapter 7, Section 7.2.2. The t-copula models fit such cases more appropriately than a copula family without tail dependence.

Before defining the t-copula, we develop some notation. Let $\Sigma$ denote a positive definite matrix of dimension $D \times D$. For such a $\Sigma$, the $D$-dimensional t density with $\nu$ degrees of freedom will be denoted by
$$f^D_{\nu,\Sigma}(\mathbf{x}) \equiv \frac{\Gamma\bigl(\frac{\nu+D}{2}\bigr)}{(\pi\nu)^{D/2}\,\Gamma\bigl(\frac{\nu}{2}\bigr)\,|\Sigma|^{1/2}}\left(1 + \frac{\mathbf{x}^T\Sigma^{-1}\mathbf{x}}{\nu}\right)^{-\frac{\nu+D}{2}},$$
with corresponding cumulative distribution function
$$t^D_{\nu,\Sigma}(\mathbf{x}) \equiv \int_{\mathbf{w} \le \mathbf{x}} f^D_{\nu,\Sigma}(\mathbf{w})\,d\mathbf{w}.$$
A matrix $\Sigma$ with unit diagonal entries corresponds to a correlation matrix and will be denoted by $R$. The $D$-dimensional t-copula function is given by
$$C_{\nu,R}(\mathbf{u}) \equiv \int_{\mathbf{w} \le t_\nu^{-1}(\mathbf{u})} f^D_{\nu,R}(\mathbf{w})\,d\mathbf{w}, \qquad (6.1)$$
where $t_\nu^{-1}(\mathbf{u}) \equiv (t_\nu^{-1}(u_1), \ldots, t_\nu^{-1}(u_D))^T$, $t_\nu^{-1}$ is the inverse of the cumulative distribution function of the univariate t with $\nu$ degrees of freedom, and $R$ is a $D \times D$ correlation matrix. Note that $C_{\nu,R}(\mathbf{u}) = P\{\mathbf{X} \le t_\nu^{-1}(\mathbf{u})\}$ for $\mathbf{X} = (X_1, \ldots, X_D)^T$ distributed as $t^D_{\nu,R}$, demonstrating that $C_{\nu,R}(\mathbf{u})$ is a distribution function on $(0,1)^D$ with uniform marginals. The density corresponding to $C_{\nu,R}(\mathbf{u})$ is given by
$$c_{\nu,R}(\mathbf{u}) = \frac{\partial^D C_{\nu,R}(\mathbf{u})}{\partial u_1\,\partial u_2\cdots\partial u_D} = \frac{f^D_{\nu,R}\{t_\nu^{-1}(\mathbf{u})\}}{\prod_{k=1}^D f_\nu\{t_\nu^{-1}(u_k)\}}, \qquad (6.2)$$
where $f_\nu$ in (6.2) is the density of the univariate t distribution with $\nu$ degrees of freedom. We consider joint distributions of the form
$$F^D_{\nu,R}(\mathbf{x}) = C_{\nu,R}\{F_1(x_1), \ldots, F_D(x_D)\} \qquad (6.3)$$
with $C_{\nu,R}$ defined as in (6.1). It follows from the properties of a copula function that $F^D_{\nu,R}$ is a valid multivariate distribution function on $\mathbb{R}^D$. The identifiability of the marginal distributions $F_k$, $k = 1, \ldots, D$, the correlation matrix $R$, and the degrees of freedom parameter $\nu$ is established in the following theorem.

Theorem 6.1.1. Let $F^D_{\nu_1,R_1}$ and $G^D_{\nu_2,R_2}$ denote two distribution functions on $\mathbb{R}^D$ obtained from equation (6.3) with marginal distributions $F_k$, $k = 1, \ldots, D$, and $G_k$, $k = 1, \ldots, D$, respectively. Suppose we have $F^D_{\nu_1,R_1}(\mathbf{x}) = G^D_{\nu_2,R_2}(\mathbf{x})$ for all $\mathbf{x}$. Then $F_k(x) = G_k(x)$ for all $k$, $\nu_1 = \nu_2$, and $R_1 = R_2$.

In order to prove the identifiability of $(\nu, R)$ in Theorem 6.1.1, we first state and prove several lemmas.

Lemma 6.1.1. Fix $\nu$. Let $F^D_{\nu,R_1}$ and $G^D_{\nu,R_2}$ denote two cumulative distribution functions on $\mathbb{R}^D$ obtained from equation (6.3) with marginal distributions $F_k$, $k = 1, \ldots, D$, and $G_k$, $k = 1, \ldots, D$, respectively. Suppose we have
$$F^D_{\nu,R_1}(\mathbf{x}) = G^D_{\nu,R_2}(\mathbf{x}) \qquad (6.4)$$
for all $\mathbf{x}$. Then $F_k(x) = G_k(x)$ for all $k$ and $R_1 = R_2$.

Proof: By taking $x_i$, $i \ne k$, tending to $\infty$, we get that $F^D_{\nu,R_1}(\mathbf{x}) \to F_k(x_k)$ and $G^D_{\nu,R_2}(\mathbf{x}) \to G_k(x_k)$. It follows that $F_k(x) = G_k(x)$ for all $k$. Next we show that the correlation matrices $R_1$ and $R_2$ are equal. We first prove this result for $D = 2$. When $D = 2$, $R_1$ and $R_2$ are each determined by a single correlation parameter, namely $\rho_1$ and $\rho_2$, respectively, so we have
$$F^2_{\nu,\rho_1}(\mathbf{x}) = C_{\nu,\rho_1}\{F_1(x_1), F_2(x_2)\} \equiv \int_{\mathbf{w} \le t_\nu^{-1}(\mathbf{v})} f^2_{\nu,\rho_1}(\mathbf{w})\,d\mathbf{w} \qquad (6.5)$$
and
$$G^2_{\nu,\rho_2}(\mathbf{x}) = C_{\nu,\rho_2}\{F_1(x_1), F_2(x_2)\} \equiv \int_{\mathbf{w} \le t_\nu^{-1}(\mathbf{v})} f^2_{\nu,\rho_2}(\mathbf{w})\,d\mathbf{w}, \qquad (6.6)$$
where $\mathbf{v} = (F_1(x_1), F_2(x_2))^T$; so one only needs to prove $\rho_1 = \rho_2$.
Now, for any real numbers $a$ and $b$, the bivariate t cumulative distribution function $t^2_{\nu,\rho}(a,b)$, with $\nu$ degrees of freedom and correlation parameter $\rho$, is a strictly increasing function of $\rho$. Note that
$$t^2_{\nu,\rho}(a,b) = \int_{-\infty}^a\int_{-\infty}^b f^2_{\nu,\rho}(m,n)\,dm\,dn = \int_{-\infty}^a\int_{-\infty}^b\int_0^\infty \phi(m,n,\rho,w)\,g_\nu(w)\,dw\,dm\,dn,$$
where
$$\phi(m,n,\rho,w) = \frac{1}{2\pi w(1-\rho^2)^{1/2}}\exp\left\{-\frac{m^2 - 2\rho mn + n^2}{2w(1-\rho^2)}\right\}$$
and $g_\nu(w)$ is the probability density function of the inverse Gamma mixing distribution defined by
$$g_\nu(w^2) \sim IG\left(\frac{\nu}{2}, \frac{\nu}{2}\right). \qquad (6.7)$$
Differentiating $t^2_{\nu,\rho}(a,b)$ with respect to $\rho$, we get
$$\frac{\partial t^2_{\nu,\rho}(a,b)}{\partial\rho} = \int_{-\infty}^a\int_{-\infty}^b\int_0^\infty \frac{\partial\phi(m,n,\rho,w)}{\partial\rho}\,g_\nu(w)\,dw\,dm\,dn = \int_{-\infty}^a\int_{-\infty}^b\int_0^\infty w\,\frac{\partial^2\phi(m,n,\rho,w)}{\partial m\,\partial n}\,g_\nu(w)\,dw\,dm\,dn = \int_0^\infty w\,\phi(a,b,\rho,w)\,g_\nu(w)\,dw > 0,$$
using the fact that $\partial\phi(m,n,\rho,w)/\partial\rho = w\,\partial^2\phi(m,n,\rho,w)/\partial m\,\partial n$.

Note that (6.5) and (6.6) can be rewritten as $t^2_{\nu,\rho_j}(a,b)$ for $j = 1, 2$, with $a = t_\nu^{-1}(F_1(x_1))$ and $b = t_\nu^{-1}(F_2(x_2))$. So $F^2_{\nu,\rho_1}(\mathbf{x}) = F^2_{\nu,\rho_2}(\mathbf{x})$ gives
$$t^2_{\nu,\rho_1}(a,b) = t^2_{\nu,\rho_2}(a,b).$$
Since, for fixed $\nu$ and any $a$ and $b$, $t^2_{\nu,\rho}(a,b)$ is a strictly increasing function of $\rho$, the equality above implies that $\rho_1 = \rho_2$ must hold. This proves the case $D = 2$.

Now we prove Lemma 6.1.1 for general $D$. Let $R_1 = ((\rho_{kk',1}))$ and $R_2 = ((\rho_{kk',2}))$ be the representations of the correlation matrices $R_1$ and $R_2$ in terms of their entries. Note that condition (6.4) can be reduced to the case $D = 2$ by taking $x_i$, $i \ne k, k'$, tending to infinity. Thus, we have $F^2_{\nu,\rho_{kk',1}}(x_k, x_{k'}) = G^2_{\nu,\rho_{kk',2}}(x_k, x_{k'})$, which implies $\rho_{kk',1} = \rho_{kk',2}$ from the case $D = 2$.

Next, we state and prove two more relevant lemmas.

Lemma 6.1.2. For $a > 0$, $t_\nu\{-(a\nu)^{1/2}\}$ is a decreasing function of $\nu$.

Proof: Let $Y \sim t_\nu(\cdot)$. Then $Z = Y/\nu^{1/2}$ has pdf $f_\nu(z) = [\Gamma\{(\nu+1)/2\}/\Gamma(\nu/2)]\,\pi^{-1/2}\,(1+z^2)^{-(\nu+1)/2}$. One can see that if $\nu_1 \le \nu_2$, then $f_{\nu_2}(|z|)/f_{\nu_1}(|z|)$ is a strictly decreasing function of $|z|$. Since $I_{\{|z| \ge a^{1/2}\}}$ is a nondecreasing function of $|z|$, mimicking the proof of Lemma 2 in [28], one can prove that $E_\nu(I_{\{|Z| \ge a^{1/2}\}})$ is a decreasing function of $\nu$. It follows that
$$t_\nu\{-(a\nu)^{1/2}\} = P\{Y \le -(a\nu)^{1/2}\} = P\{Z \le -a^{1/2}\} = \tfrac{1}{2}\,E_\nu\bigl(I_{\{|Z| \ge a^{1/2}\}}\bigr)$$
is decreasing in $\nu$.

Lemma 6.1.3. For the bivariate t-copula $C_{\nu,\rho}(u,v)$, let
$$\lambda^* = \lim_{v\to 0^+}\left.\frac{\partial C_{\nu,\rho}(u,v)}{\partial v}\right|_{u=v} \qquad\text{and}\qquad \mu^* = \lim_{v\to 0^+}\left.\frac{\partial C_{\nu,\rho}(u,v)}{\partial v}\right|_{u=1-v}.$$
Then it follows that
$$\lambda^* = t_{\nu+1}\left\{-\left((\nu+1)\,\frac{1-\rho}{1+\rho}\right)^{1/2}\right\} \qquad\text{and}\qquad \mu^* = t_{\nu+1}\left\{-\left((\nu+1)\,\frac{1+\rho}{1-\rho}\right)^{1/2}\right\}.$$

Proof: It is not hard to see that if $(X, Y)^T \sim t^2_{\nu,\rho}$, then given $Y = y$,
$$\left(\frac{\nu+1}{\nu+y^2}\right)^{1/2}\frac{X - \rho y}{(1-\rho^2)^{1/2}} \sim t_{\nu+1}. \qquad (6.8)$$
Therefore,
$$C_{\nu,\rho}(u,v) = P\bigl(X \le t_\nu^{-1}(u),\, Y \le t_\nu^{-1}(v)\bigr) = \int_{-\infty}^{t_\nu^{-1}(v)} t_{\nu+1}\left\{\left(\frac{\nu+1}{\nu+y^2}\right)^{1/2}\frac{t_\nu^{-1}(u) - \rho y}{(1-\rho^2)^{1/2}}\right\} f_\nu(y)\,dy.$$
Taking the derivative with respect to $v$, we get
$$\frac{\partial C_{\nu,\rho}(u,v)}{\partial v} = t_{\nu+1}\left\{\left(\frac{\nu+1}{\nu + t_\nu^{-1}(v)^2}\right)^{1/2}\frac{t_\nu^{-1}(u) - \rho\, t_\nu^{-1}(v)}{(1-\rho^2)^{1/2}}\right\} = t_{\nu+1}\left\{\left(\frac{\nu+1}{\nu + t_\nu^{-1}(v)^2}\right)^{1/2}\frac{t_\nu^{-1}(v)\,(1-\rho)}{(1-\rho^2)^{1/2}}\right\} \quad\text{(putting } u = v\text{)}$$
$$\to t_{\nu+1}\left\{-\left((\nu+1)\,\frac{1-\rho}{1+\rho}\right)^{1/2}\right\}$$
as $v \to 0^+$. The other expression is derived similarly.

Remark: Lemma 6.1.3 describes the tail dependence property of t-copulas.

Proof of Theorem 6.1.1. Using similar arguments as before, we only need to prove Theorem 6.1.1 for $D = 2$. It easily follows, by taking $x_i \to \infty$ for $i \ne k$, that $F_k(x_k) = G_k(x_k)$ for $k = 1, 2$. It then follows that
$$C_{\nu_1,\rho_1}(u,v) = C_{\nu_2,\rho_2}(u,v) \qquad (6.9)$$
for all pairs $(u,v)$ of the form $(F_1(x_1), F_2(x_2))$. Thus, (6.9) need not hold for all values of $(u,v)$ in $(0,1)^2$, as it would in the case when both marginals are continuous.
Nevertheless, since both marginals have only a finite number of discontinuities, there are infinitely many values of $(u,v)$ at which the limits and derivatives in Lemma 6.1.3 can be applied to obtain $\lambda^*$ and $\mu^*$. Since (6.9) holds, we must have $\lambda^*_1 = \lambda^*_2$ and $\mu^*_1 = \mu^*_2$. Now, without loss of generality, assume $\rho_1 \ge \rho_2$. From the equality $\lambda^*_1 = \lambda^*_2$ and Lemma 6.1.2, we must have $\nu_1 \ge \nu_2$; on the other hand, the equality $\mu^*_1 = \mu^*_2$ gives $\nu_1 \le \nu_2$. Hence, $\rho_1 \ge \rho_2$ implies that $\nu_1 = \nu_2 = \nu$, say. Now, using Lemma 6.1.1, we get $\rho_1 = \rho_2$. This completes the proof of the theorem.

Remark: Note that as the degrees of freedom $\nu \to \infty$, the t distribution converges to a Gaussian distribution, and so does the t-copula. For the Gaussian copulas, the counterpart of Theorem 6.1.1 also holds. Therefore, we have the following lemma.

Lemma 6.1.4. Let $F^D_{R_1}$ and $G^D_{R_2}$ denote two distribution functions on $\mathbb{R}^D$ in the Gaussian copula family with marginal distributions $F_k$, $k = 1, \ldots, D$, and $G_k$, $k = 1, \ldots, D$, respectively. Suppose we have $F^D_{R_1}(\mathbf{x}) = G^D_{R_2}(\mathbf{x})$ for all $\mathbf{x}$. Then $F_k(x) = G_k(x)$ for all $k$, and $R_1 = R_2$.

Proof: Without loss of generality, we prove the lemma for the case $D = 2$; that is, we show that $\rho_1 = \rho_2$. Notice that it suffices to prove that for any fixed $(a,b) \in \mathbb{R}^2$, $P_\rho(Z_1 \le a, Z_2 \le b) = \Phi_{2,\rho}(a,b)$ is increasing in $\rho$, where $Z_i \sim N(0,1)$, $i = 1, 2$, with correlation coefficient $\rho$. Note that
$$\Phi_{2,\rho}(a,b) = \int_{-\infty}^a\int_{-\infty}^b \phi_\rho(z_1, z_2)\,dz_1\,dz_2,$$
which gives us
$$\frac{\partial\Phi_{2,\rho}}{\partial\rho}(a,b) = \int_{-\infty}^a\int_{-\infty}^b \frac{\partial\phi_\rho(z_1,z_2)}{\partial\rho}\,dz_1\,dz_2 = \int_{-\infty}^a\int_{-\infty}^b \left[\frac{(z_1\rho - z_2)(z_2\rho - z_1)}{(1-\rho^2)^2} + \frac{\rho}{1-\rho^2}\right]\frac{1}{2\pi(1-\rho^2)^{1/2}}\exp\left\{-\frac{z_1^2 - 2\rho z_1 z_2 + z_2^2}{2(1-\rho^2)}\right\}dz_1\,dz_2$$
$$= \int_{-\infty}^a\int_{-\infty}^b \frac{\partial^2\phi_\rho(z_1,z_2)}{\partial z_1\,\partial z_2}\,dz_1\,dz_2 = \phi_\rho(a,b),$$
which is always positive. Therefore, we have proved the uniqueness of $\rho$, and the proof of the lemma is complete.

Remark: It is not hard to see that for the Gaussian copulas, tail dependence between the random variables no longer exists.

Using the notation defined in Chapter 5, the density of $F^D_{\nu,R}$ at a fixed point $\mathbf{x} \in \mathbb{R}^D$, $dF^D_{\nu,R}(\mathbf{x})$, is a function on $\mathbb{R}^D$ that satisfies
$$F^D_{\nu,R}(\mathbf{x}) = \int_{\mathbf{w} \le \mathbf{x}} dF^D_{\nu,R}(\mathbf{w}), \qquad (6.10)$$
where $dF^D_{\nu,R}(\mathbf{x})$ is given by
$$\int\cdots\int_{\prod_{l=1}^L [F_{d_l}(D^-_{d_l,i_{d_l}}),\, F_{d_l}(D_{d_l,i_{d_l}})]} c_{\nu,R}(u_1, \ldots, u_D)\,du_{d_1}\cdots du_{d_L} \cdot \prod_{h=1}^H f_{c_h}(x_{c_h}),$$
where $u_{c_h} = F_{c_h}(x_{c_h})$, and $H$ and $L$ can take any values in $\{0, 1, \ldots, D\}$ such that $H + L = D$. Note that the above density can be written generally as
$$c^*(\nu, R, F_1, \ldots, F_D)(\mathbf{x}) \cdot \prod_{h=1}^H f_{c_h}(x_{c_h}). \qquad (6.11)$$
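As a concrete check on the copula density (6.2) and the tail quantity of Lemma 6.1.3, the following Python sketch evaluates both for the bivariate case; it is an illustration only (assuming SciPy 1.6 or later for multivariate_t), not part of the estimation procedure:

import numpy as np
from scipy.stats import t as t_dist, multivariate_t

def t_copula_density(u, rho, nu):
    """Bivariate t-copula density (6.2):
    f^2_{nu,R}(t_nu^{-1}(u)) / prod_k f_nu(t_nu^{-1}(u_k))."""
    z = t_dist.ppf(u, df=nu)                       # componentwise t_nu^{-1}(u_k)
    R = np.array([[1.0, rho], [rho, 1.0]])
    num = multivariate_t(loc=[0.0, 0.0], shape=R, df=nu).pdf(z)
    den = np.prod(t_dist.pdf(z, df=nu))
    return num / den

def lambda_star(rho, nu):
    """lambda* from Lemma 6.1.3: t_{nu+1}{ -sqrt((nu+1)(1-rho)/(1+rho)) }."""
    return t_dist.cdf(-np.sqrt((nu + 1) * (1 - rho) / (1 + rho)), df=nu + 1)

print(t_copula_density(np.array([0.3, 0.7]), rho=0.75, nu=10))
print(lambda_star(0.75, 10))     # strictly positive for finite nu
print(lambda_star(0.75, 1e6))    # essentially 0 in the Gaussian limit

The last two lines illustrate the remark above: tail dependence is present for every finite $\nu$ but vanishes in the Gaussian ($\nu \to \infty$) limit.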
6.2 Estimation of R and ν

6.2.1 Estimation of R for fixed ν

Let $\mathbf{X}_1, \ldots, \mathbf{X}_n$ be $n$ independent and identically distributed $D$-dimensional random vectors arising from the joint distribution $F^D_{\nu,R}$ in (6.3), let $\hat F_{kn}$ denote the empirical distribution function of $F_k$, and let $F_{kn} = n\hat F_{kn}/(n+1)$. From (6.11), the log-likelihood function corresponding to the $n$ i.i.d. observations $\mathbf{X}_j$, $j = 1, \ldots, n$, is given by
$$\hat\tau(\nu, R) = \sum_{j=1}^n \log dF^D_{\nu,R}(\mathbf{X}_j) = \sum_{j=1}^n \log c^*(\nu, R, F_{1n}, \ldots, F_{Dn})(\mathbf{X}_j). \qquad (6.12)$$
For fixed $\nu$, our estimator of $R$ is taken to be the maximizer of $\hat\tau(\nu, R)$, that is,
$$\hat R(\nu) = \arg\max_R \hat\tau(\nu, R). \qquad (6.13)$$
Note that there are two main challenges in maximizing the likelihood $\hat\tau(\nu, R)$. First, $\hat\tau(\nu, R)$ involves several integrals corresponding to the discrete components in $\mathbf{X}_j$, $j = 1, \ldots, n$; $c^*$ is not available in closed form. Second, $\hat\tau(\nu, R)$ needs to be maximized over all $D \times D$ correlation matrices. The space of all correlation matrices is a compact subspace of $[-1,1]^{D^2}$, since all entries of a correlation matrix lie between $-1$ and $1$; however, this space is not easy to maximize over, due to the constraint of positive definiteness placed on correlation matrices.

The EM Algorithm

The difficulties mentioned can be overcome with the use of the EM algorithm; see, for example, [31]. The EM algorithm is a well-known algorithm for finding the maximum likelihood estimator (MLE) of a parameter $\theta$ based on data $y$ distributed according to the likelihood $\ell_{obs}(y;\theta)$. In many situations, obtaining the MLE,
$$\hat\theta_{MLE} = \arg\max_\theta \ell_{obs}(y;\theta),$$
via direct maximization of $\ell_{obs}(y;\theta)$ over $\theta$ turns out to be difficult. In such cases, the observed likelihood can usually be expressed as an integral over missing components of a complete likelihood; in other words, if $z$ and $\ell_{com}(y, z;\theta)$, respectively, denote the missing observations and the complete likelihood corresponding to $(y, z)$, it follows that
$$\ell_{obs}(y;\theta) = \int_{Z(y)} \ell_{com}(y, z;\theta)\,dz,$$
where $Z(y)$ is the range of integration of $z$ subject to the observed data being $y$. The EM algorithm is an iterative procedure formulated in two steps. First, (i) the E-step, in which the quantity
$$Q(\theta^{(N)}, \theta) = E_{\pi(z\,|\,\theta^{(N)},\,y)}\bigl\{\log \ell_{com}(y, z;\theta)\bigr\} \qquad (6.14)$$
is formed; the expectation in (6.14) is taken with respect to the conditional distribution of $z$ given $y$, evaluated at a particular value $\theta^{(N)}$, where this conditional distribution is given by
$$\pi(z\,|\,\theta, y) = \frac{\ell_{com}(y, z;\theta)}{\ell_{obs}(y;\theta)}. \qquad (6.15)$$
Second, (ii) the M-step, in which $Q(\theta^{(N)}, \theta)$ is maximized with respect to $\theta$ to obtain $\theta^{(N+1)}$, that is,
$$\theta^{(N+1)} = \arg\max_\theta Q(\theta^{(N)}, \theta).$$
Starting from an initial value $\theta^{(0)}$, the sequence $\theta^{(N)}$, $N \ge 0$, converges to $\hat\theta_{MLE}$ under suitable regularity conditions.

The integrals in $\hat\tau(\nu, R)$ corresponding to the discrete components can be formulated as missing components of a complete likelihood. Recall that the notations $c_h$ and $d_l$ were used to denote the continuous and discrete components of the vector $\mathbf{x}$. We now extend this notation to represent the discrete and continuous components of the $j$-th observation vector $\mathbf{X}_j$, $j = 1, \ldots, n$. For $j = 1, \ldots, n$, let $d_{lj}$, $l = 1, \ldots, L_j$, and $c_{hj}$, $h = 1, \ldots, H_j$, respectively, denote the discrete and continuous components of $\mathbf{X}_j$ with respect to the corresponding marginal distributions. Next, define the vector $\mathbf{u}_j$ in the following way: the $d_{lj}$-th component of $\mathbf{u}_j$, $u_{d_{lj}}$, is a number allowed to vary in the interval $[F_{d_{lj}}(x_{d_{lj}}^-),\, F_{d_{lj}}(x_{d_{lj}})] \equiv S_{lj}$, say, for $l = 1, \ldots, L_j$; the $c_{hj}$-th component of $\mathbf{u}_j$, $u_{c_{hj}}$, is taken to be $u_{c_{hj}} = F_{c_{hj}}(x_{c_{hj}})$ for $h = 1, \ldots, H_j$. In that case, we have
$$c^*(\nu, R, F_1, \ldots, F_D)(\mathbf{X}_j) = \int_{S_{1j}}\cdots\int_{S_{L_j j}} c^D_{\nu,R}(\mathbf{u}_j)\,du_{d_{1j}}\cdots du_{d_{L_j j}}.$$
Making the transformation $\mathbf{z}_j = t_\nu^{-1}(\mathbf{u}_j)$, the likelihood corresponding to the $j$-th observation in (6.12) can be written as
$$c^*(\nu, R, F_1, \ldots, F_D)(\mathbf{X}_j) = NUM_j / DENOM_j,$$
where
$$NUM_j = \int_{t_\nu^{-1}(S_{1j})}\cdots\int_{t_\nu^{-1}(S_{L_j j})} f^D_{\nu,R}(\mathbf{z}_j)\,dz_{d_{1j}}\cdots dz_{d_{L_j j}} \qquad\text{and}\qquad DENOM_j = \prod_{h=1}^{H_j} f_\nu(z_{c_{hj}}),$$
where $z_{c_{hj}} = t_\nu^{-1}\{F_{c_{hj}}(x_{c_{hj}})\}$ and $t_\nu^{-1}(S_{lj}) = [\,t_\nu^{-1}\{F_{d_{lj}}(x_{d_{lj}}^-)\},\; t_\nu^{-1}\{F_{d_{lj}}(x_{d_{lj}})\}\,]$. We make several important observations. First, where the maximization over $R$ is concerned for fixed $\nu$, it is enough to consider the $NUM_j$ terms. Thus, in the EM framework, we define $NUM_j$ to be the "observed likelihood" corresponding to $\mathbf{x}_j$:
$$\ell_{obs}(\mathbf{x}_j; \nu, R) = \int_{t_\nu^{-1}(S_{1j})}\cdots\int_{t_\nu^{-1}(S_{L_j j})} f^D_{\nu,R}(\mathbf{z}_j)\,dz_{d_{1j}}\cdots dz_{d_{L_j j}}. \qquad (6.16)$$
If the variables $z_{d_{lj}}$, $l = 1, \ldots, L_j$, in (6.16) are treated as missing, the "complete likelihood" corresponding to the $j$-th observation becomes
$$\ell_{com}(\mathbf{x}_j, \mathbf{Z}_j; \nu, R) = f^D_{\nu,R}(\mathbf{z}_j)\prod_{l=1}^{L_j} I_{\{z_{d_{lj}} \in t_\nu^{-1}(S_{lj})\}},$$
where $\mathbf{Z}_j = \{z_{d_{lj}},\, l = 1, \ldots, L_j\}$. Next, we note that the t density $f^D_{\nu,R}$ is an infinite scale mixture of Gaussian densities, namely,
$$f^D_{\nu,R}(\mathbf{z}_j) = \int_{\sigma_j^2 = 0}^\infty \phi^D_{R,\sigma_j}(\mathbf{z}_j)\,g_\nu(\sigma_j^2)\,d\sigma_j^2,$$
where
$$\phi^D_{R,\sigma_j}(\mathbf{z}) = \frac{1}{(2\pi)^{D/2}\,\sigma_j^D\,|R|^{1/2}}\exp\left\{-\frac{\mathbf{z}^T R^{-1}\mathbf{z}}{2\sigma_j^2}\right\}$$
and the mixing distribution on $\sigma_j^2$ is the inverse Gamma distribution defined in (6.7). In other words, an extra missing component can be added into the E-step, namely the mixing parameter $\sigma_j$. The complete likelihood specification for the $j$-th observation now becomes
$$\ell_{com}(\mathbf{x}_j, \mathbf{Z}_j, \sigma_j; \nu, R) = \phi^D_{R,\sigma_j}(\mathbf{z}_j)\cdot g_\nu(\sigma_j^2)\prod_{l=1}^{L_j} I_{\{z_{d_{lj}} \in t_\nu^{-1}(S_{lj})\}}. \qquad (6.17)$$
The conditional distribution $\pi$ required to obtain $Q$ in (6.14) is defined as
$$\pi(\mathbf{Z}_j, \sigma_j,\, j = 1, \ldots, n;\, \nu, R) = \prod_{j=1}^n \pi_j(\mathbf{Z}_j, \sigma_j; \nu, R),$$
where
$$\pi_j(\mathbf{Z}_j, \sigma_j; \nu, R) = \frac{\ell_{com}(\mathbf{x}_j, \mathbf{Z}_j, \sigma_j; \nu, R)}{\ell_{obs}(\mathbf{x}_j; \nu, R)}; \qquad (6.18)$$
see (6.15). Two distributions derived from (6.18) will be used subsequently: (i) the conditional distribution of $\sigma_j$ given $\mathbf{Z}_j$, given by
$$\pi_j(\sigma_j^2\,|\,\mathbf{Z}_j, \nu, R) = IG\left(\frac{\nu + D}{2},\; \left\{\frac{\nu + \mathbf{z}_j^T R^{-1}\mathbf{z}_j}{2}\right\}^{-1}\right),$$
and (ii) the marginal of $\mathbf{Z}_j$ (after integrating out $\sigma_j$), given by
$$\pi_j(\mathbf{Z}_j; \nu, R) = \frac{f^D_{\nu,R}(\mathbf{z}_j)\prod_{l=1}^{L_j} I_{\{z_{d_{lj}} \in t_\nu^{-1}(S_{lj})\}}}{\int_{t_\nu^{-1}(S_{1j})}\cdots\int_{t_\nu^{-1}(S_{L_j j})} f^D_{\nu,R}(\mathbf{z}_j)\,dz_{d_{1j}}\cdots dz_{d_{L_j j}}}.$$
The E-step entails taking the expected value of the logarithm of the complete likelihood (6.17) with respect to the missing components, $\mathbf{Z}_j$ and $\sigma_j$, at the $N$-th iterate of $R$, $R^{(N)}$:
$$E_{\pi(\mathbf{Z}_j, \sigma_j;\, \nu, R^{(N)})}\bigl\{\log \ell_{com}(\mathbf{x}_j, \mathbf{Z}_j, \sigma_j; \nu, R)\bigr\} \equiv E_j\bigl\{\log \ell_{com}(\mathbf{x}_j, \mathbf{Z}_j, \sigma_j; \nu, R)\bigr\},$$
where $\pi_j(\mathbf{Z}_j, \sigma_j; \nu, R)$ is as defined in (6.18). The expected value can be simplified to
$$E_j\left\{-\frac{\mathbf{z}_j^T R^{-1}\mathbf{z}_j}{2\sigma_j^2}\right\} - \frac{1}{2}\log|R| \qquad (6.19)$$
plus other terms that do not involve the correlation matrix $R$, and hence are irrelevant for the subsequent M-step. The expectation in (6.19) is taken in two steps, namely $E_j = E_{j1}E_{j2}$, where $E_{j2}$ is the conditional expectation of $\sigma_j$ given $\mathbf{Z}_j$, and $E_{j1}$ is the expectation with respect to the marginal of $\mathbf{Z}_j$. On taking $E_{j2}$, the expression in (6.19) simplifies to
$$-\frac{\nu+D}{2}\,E_{j1}\left\{\frac{\mathbf{z}_j^T R^{-1}\mathbf{z}_j}{\nu + \mathbf{z}_j^T (R^{(N)})^{-1}\mathbf{z}_j}\right\} = -\frac{\nu+D}{2}\,\mathrm{tr}\bigl(R^{-1}W^{(N)}\bigr),$$
where $W^{(N)} = ((w^{(N)}_{kk'}))$ is a $D \times D$ matrix with entries
$$w^{(N)}_{kk'} = \int \frac{z_{kj}\,z_{k'j}}{\nu + \mathbf{z}_j^T (R^{(N)})^{-1}\mathbf{z}_j}\;\pi_j(\mathbf{Z}_j; \nu, R^{(N)})\,d\mathbf{Z}_j$$
for $\mathbf{z}_j = (z_{1j}, \ldots, z_{Dj})^T$. The last integral is approximated by numerical integration on a grid: we partition each interval $t_\nu^{-1}(S_{lj})$, $l = 1, \ldots, L_j$, into a large number of subintervals, evaluate the integrand at the partition points, and obtain a Riemann sum as an approximation to the integral.

The M-step consists of maximizing the expected complete log-likelihood (6.19),
$$E_j\left\{-\frac{\mathbf{z}_j^T R^{-1}\mathbf{z}_j}{2\sigma_j^2}\right\} - \frac{1}{2}\log|R| = -\frac{\nu+D}{2}\,\mathrm{tr}\bigl(R^{-1}W^{(N)}\bigr) - \frac{1}{2}\log|R|, \qquad (6.20)$$
with respect to the correlation matrix $R$. Since we have to maximize the objective function over the space of all correlation matrices, this is a somewhat difficult task. We adopt the methodology presented by Barnard et al. [1], in which an iterative procedure maximizes (6.20) over one element of $R$, say $\rho$, at a time. In order to preserve the positive definiteness of $R$, one can show (see Barnard et al. [1] for details) that $\rho$ should lie in an interval $[\rho_l, \rho_u]$. The lower and upper limits of this interval derive from the fact that $R$ is positive definite if and only if all principal submatrices of $R$ are positive definite; this is equivalent to a non-negativity condition on the determinant of each principal submatrix, which translates into an interval of admissible values for $\rho$. A numerical sketch of this interval computation is given below.
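The following Python sketch is our own illustration of the idea in Barnard et al. [1], not their code: since $\det R(\rho)$ is quadratic in a single off-diagonal entry $\rho$, the admissible interval $[\rho_l, \rho_u]$ is bracketed by its two real roots, provided the remaining entries already form a valid correlation matrix.

import numpy as np

def rho_interval(R, i, j, eps=1e-9):
    """Admissible interval [rho_l, rho_u] for entry (i, j) of R.
    det(R(rho)) is a quadratic in rho; its two real roots bracket the
    region where R stays positive definite (assuming the other entries
    already form a valid correlation matrix)."""
    def det_at(r):
        Rr = R.copy()
        Rr[i, j] = Rr[j, i] = r
        return np.linalg.det(Rr)
    # Recover the quadratic a*rho^2 + b*rho + c from three evaluations.
    d0, dp, dm = det_at(0.0), det_at(1.0), det_at(-1.0)
    a = (dp + dm) / 2.0 - d0
    b = (dp - dm) / 2.0
    c = d0
    roots = np.sort(np.roots([a, b, c]).real)
    return roots[0] + eps, roots[1] - eps

R = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
lo, hi = rho_interval(R, 0, 1)
print(lo, hi)   # the current value 0.2 lies strictly inside (rho_l, rho_u)

In the M-step, the objective (6.20) is then maximized over a grid of values of $\rho$ inside this interval.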
This procedure is repeated for the other elements of $R$ and cycled until convergence before moving to the $(N+1)$-st iteration of the EM algorithm.

6.2.2 Selection of the Degrees of Freedom, ν

The above procedure is carried out for a collection of degrees of freedom $\nu \in \mathcal{A}$, where $\mathcal{A}$ is finite. For each fixed $\nu$, we obtain the estimator $\hat R(\nu)$ of the correlation matrix based on the EM algorithm above. We then select the degrees of freedom $\hat\nu$ such that
$$\hat\nu = \arg\max_{\nu \in \mathcal{A}} \frac{1}{n}\,\hat\tau\{\nu, \hat R(\nu)\}. \qquad (6.21)$$

6.3 Summary

In this chapter, we introduced the t-copula family in $\mathbb{R}^D$ and showed that, within this family, for any joint distribution $F^D$ on $\mathbb{R}^D$ there exists a unique pair $(\nu, R)$ such that $F^D_{\nu,R}(\mathbf{x}) = C_{\nu,R}(F_1(x_1), \ldots, F_D(x_D))$. Furthermore, we developed a semi-parametric technique to estimate the unknown pair $(\nu, R)$ using the EM algorithm. The application of this approach is presented in the next chapter.

CHAPTER 7

SIMULATION AND REAL DATA RESULTS

7.1 Simulation Results

The results in this section are based on simulated data from four experiments. The first three concern the bivariate case, whereas the fourth concerns a trivariate distribution. The true distributions for the observations $\mathbf{X}_j$, $j = 1, \ldots, n$, in all experiments are of the type defined in (6.3).

In Experiment 1, $\nu_0 = \infty$ in (6.3), corresponding to the Gaussian copula function, and we choose $\rho = 0.75$. The two mixed marginals, $F_1$ and $F_2$, have the following cumulative distribution functions:
$$F_1(x_1) = 0.3\,I_{\{x_1 \ge 0.2\}} + 0.7\,\Phi(x_1) \qquad\text{and}\qquad F_2(x_2) = 0.2\,I_{\{x_2 \ge 0.1\}} + 0.8\,\Phi(x_2),$$
where $\Phi(\cdot)$ is the standard normal distribution function. Thus, we have $d_1 = d_2 = 1$, with $D_{11} = 0.2$ and $D_{21} = 0.1$ with probabilities $p_{11} = 0.3$ and $p_{21} = 0.2$, respectively. The sets of jump points are $\mathcal{J}(F_1) = \{0.2\}$ and $\mathcal{J}(F_2) = \{0.1\}$, with the continuous components of $F_1$ and $F_2$ corresponding to $f_1(x) = f_2(x) = \phi(x)$, the standard normal density. The sample size $n$ is taken from $n = 500$ to $n = 2500$ in increments of 500. In each trial, we simulate $n$ observations from $F^2_{\infty,0.75}(x_1, x_2)$ and estimate $\rho$ within the Gaussian copulas using the methodology presented in Section 6.2. Table 7.1 contains the simulation results from Experiment 1. For each sample size, the experiment was repeated 1,000 times. The average absolute bias and relative average absolute bias of the estimator $\hat\rho_n$ are reported, together with the empirical coverage of the approximate 95% confidence interval for $\rho$ based on the asymptotic normality of $\hat\rho_n$.

 Sample size, n                   500      1000     1500     2000     2500
 Average Absolute Bias            0.0168   0.0135   0.0116   0.0113   0.0087
 Relative Average Absolute Bias   2.2%     1.8%     1.5%     1.5%     1.2%
 Coverage                         92.4%    91.6%    92.3%    93.2%    94.7%

Table 7.1. Simulation results for Experiment 1 with ν0 = ∞ and ρ = 0.75. The average absolute bias and relative average absolute bias of the estimator ρ̂n are provided, together with the empirical coverage of the approximate 95% confidence interval for ρ based on the asymptotic normality of ρ̂n.

Experiment 2 consists of the following choices: $\nu_0 = \infty$ in (6.3), corresponding to the Gaussian copula function, and $\rho_0 = 0.75$.
The two mixed marginals, $F_1$ and $F_2$, have the following cumulative distribution functions:
$$F_1(x_1) = 0.25\,I_{\{x_1 \ge 0.2\}} + 0.6\,T_{10}(x_1) + 0.15\,I_{\{x_1 \ge 0.7\}}$$
and
$$F_2(x_2) = 0.2\,I_{\{x_2 \ge 0.3\}} + 0.5\,T_{10}(x_2) + 0.3\,I_{\{x_2 \ge 0.6\}},$$
where $T_{10}$ denotes the t distribution function with 10 degrees of freedom. Thus, we have $d_1 = d_2 = 2$, with $D_{11} = 0.2$, $D_{12} = 0.7$, $D_{21} = 0.3$, and $D_{22} = 0.6$, with probabilities $p_{11} = 0.25$, $p_{12} = 0.15$, $p_{21} = 0.2$, and $p_{22} = 0.3$, respectively. The sets of jump points are $\mathcal{J}(F_1) = \{0.2, 0.7\}$ and $\mathcal{J}(F_2) = \{0.3, 0.6\}$, with the continuous components of $F_1$ and $F_2$ corresponding to $f_1(x) = f_2(x) = t_{10}(x)$.

Figure 7.1. Density curves for the t distribution with degrees of freedom ν = 3, 5, 10, 15, 20, 25 and the normal distribution. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

For now, we select the set of degrees of freedom of the t-copula to be $\mathcal{A} = \{3, 5, 10, \infty\}$ for illustrative purposes. Note that we choose $\nu_0$ values in $\mathcal{A}$ so that the corresponding t distributions are significantly different from one another. Figure 7.1 gives the t densities for several $\nu_0$ values, including the values in $\mathcal{A}$. There are significant gaps between the t-density curves corresponding to $\nu_0 \in \{3, 5, 10\}$, but relatively smaller gaps for $\nu_0 \in \{15, 20, 25, \infty\}$. Table 7.2 provides the $L_1$ distances between some of the t distributions in Figure 7.1.

             ρ0 = 0.20                          ρ0 = 0.75
 ν0     3       5       10      ∞          3       5       10      ∞
 3      0       0.0874  0.1705  0.2690     0       0.0896  0.1735  0.2721
 5      0.0874  0       0.0837  0.1835     0.0896  0       0.0845  0.1844
 10     0.1705  0.0837  0       0.1005     0.1735  0.0845  0       0.1006
 ∞      0.2690  0.1835  0.1005  0          0.2721  0.1844  0.1006  0

Table 7.2. L1 distances for paired values of ν0 corresponding to two values of ρ0, 0.20 and 0.75.

The sample size $n$ is taken from $n = 500$ to $n = 2500$ in increments of 500. In each trial, we simulate $n$ observations from $F^2_{\infty,0.75}(x_1, x_2)$ and estimate $\rho_0$ and $\nu_0$ using the methodology presented in Section 6.2. The trial is repeated 50 times. The simulation results, including the percentage of times (out of 50) that the true value of $\nu_0$ is chosen, the mean of $\hat\rho(\hat\nu)$, and the MSE of $\hat\rho(\hat\nu)$, are presented in Table 7.3.

 Sample size, n   Percentage of times ν̂ = ν0   Mean ρ̂(ν̂)   MSE(ρ̂(ν̂))
 500               86%                           0.7361       0.6312 × 10^-3
 1000              94%                           0.7392       0.3869 × 10^-3
 1500             100%                           0.7413       0.3808 × 10^-3
 2000             100%                           0.7429       0.3035 × 10^-3
 2500             100%                           0.7450       0.2861 × 10^-3

Table 7.3. Simulation results for Experiment 2 with ν0 = ∞ and ρ0 = 0.75.

In Experiment 3, we took $\nu_0 = 10$. The two generalized marginal distributions are the same as in Experiment 2. The correlation parameter $\rho_0$ was selected to be 0.20 and 0.75, respectively. The results are presented in Table 7.4.

 ρ0 = 0.20
 Sample size, n                500      1000     1500     2000     2500
 Percentage of times ν̂ = ν0    84%      88%      92%      94%      100%
 Mean ρ̂(ν̂)                    0.1861   0.1877   0.1889   0.1923   0.1944
 MSE(ρ̂(ν̂)) × 10^-3            0.7532   0.6871   0.6095   0.4173   0.3664

 ρ0 = 0.75
 Sample size, n                500      1000     1500     2000     2500
 Percentage of times ν̂ = ν0    82%      88%      96%      98%      100%
 Mean ρ̂(ν̂)                    0.7398   0.7401   0.7429   0.7523   0.7481
 MSE(ρ̂(ν̂)) × 10^-3            0.7235   0.6789   0.6565   0.4546   0.3648

Table 7.4. Simulation results for Experiment 3 with ν0 = 10. The two correlation values considered are ρ0 = 0.2 and ρ0 = 0.75.

From the entries of Table 7.3 and Table 7.4, we see that the estimation procedure is more effective at selecting the true degrees of freedom when $\nu_0 = \infty$ than when $\nu_0 = 10$; the reason is that the distribution corresponding to $\nu_0 = \infty$ is farther away from all the other candidate distributions in $\mathcal{A}$. Also, the estimation procedure is relatively unaffected by the value of $\rho_0$, as illustrated by the "percentage of times $\hat\nu = \nu_0$" column in Table 7.4.
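For reference, the following sketch shows one way observations of this type can be simulated for the Experiment 1 setup (Gaussian copula with $\rho = 0.75$ and one jump per marginal); the helper names are ours, and the generalized inverse of the mixed CDF handles the jump interval explicitly:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def mixed_ppf(u, D, p):
    """Generalized inverse of F(x) = p*1{x >= D} + (1 - p)*Phi(x):
    copula uniforms falling in [F(D-), F(D)] map to the jump point D."""
    lo = (1 - p) * norm.cdf(D)           # F(D-)
    hi = lo + p                          # F(D)
    below = norm.ppf(np.clip(u / (1 - p), 1e-12, 1 - 1e-12))
    above = norm.ppf(np.clip((u - p) / (1 - p), 1e-12, 1 - 1e-12))
    return np.where(u < lo, below, np.where(u > hi, above, D))

def simulate(n, rho=0.75):
    """Draw n observations from the Experiment-1-type model (Gaussian copula)."""
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    z = rng.standard_normal((n, 2)) @ L.T
    u = norm.cdf(z)                      # copula uniforms with correlation rho
    x1 = mixed_ppf(u[:, 0], D=0.2, p=0.3)
    x2 = mixed_ppf(u[:, 1], D=0.1, p=0.2)
    return np.column_stack([x1, x2])

xy = simulate(2000)
print(np.mean(xy[:, 0] == 0.2), np.mean(xy[:, 1] == 0.1))  # approx. 0.3 and 0.2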
In Experiment 4, we took $D = 3$ and $\nu_0 = 10$. The first two marginal distributions are the same as before; the third marginal distribution is taken to be the t distribution with 10 degrees of freedom (thus having no points of discontinuity). We took the correlation matrix $R_0$ to be
$$R_0 = \begin{pmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.2 \\ 0.3 & 0.2 & 1 \end{pmatrix}_{3\times 3} = \begin{pmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_3 \\ \rho_2 & \rho_3 & 1 \end{pmatrix}_{3\times 3}, \text{ say.} \qquad (7.1)$$
For the different sample sizes, the experiment was repeated 20 times (instead of 50) to reduce computational time. The estimators $\hat\rho_i$ of $\rho_i$, $i = 1, 2, 3$, were obtained from the iterative procedure outlined in Section 6.2. Since the maximization step involves another loop within the M-step, the objective function was maximized over the $\rho$-intervals in steps of 0.01 to reduce computational time. (The iterative procedure within the M-step is not required when $D = 2$, which enabled us to maximize the objective function over a finer grid, in steps of 0.0001.) The results are given in Table 7.5; note that (i) the estimators converge and (ii) the MSE decreases as $n$ increases.

 Sample size, n                 500      1000     1500     2000     2500
 Percentage of getting true ν   85%      90%      95%      100%     100%
 Mean ρ̂1(ν̂)                    0.1990   0.1830   0.1795   0.1820   0.1955
 Mean ρ̂2(ν̂)                    0.2760   0.2805   0.2845   0.2880   0.2920
 Mean ρ̂3(ν̂)                    0.1885   0.2005   0.1970   0.1880   0.1930
 Total MSE × 10^-3              0.7143   0.6732   0.6714   0.6126   0.1335

Table 7.5. Simulation results for Experiment 4.

7.2 Application to Multimodal Fusion in Biometric Recognition

7.2.1 Introduction

Biometric recognition refers to the automatic identification of individuals based on their biological or behavioral characteristics [20]. In recent years, recognition of an individual based on his/her biometric traits has become an increasingly important method of testing "you are who you claim you are"; see, for example, [18] and [29]. Biometric recognition is more reliable than traditional approaches, such as password-based or token-based approaches, because biometric traits cannot be easily stolen or forgotten. Some examples of biometric traits include fingerprint, face, signature, voice, and hand geometry (see Figure 7.2).

Figure 7.2. Some examples of biometric traits: (a) fingerprint, (b) iris scan, (c) face scan, (d) signature, (e) voice, (f) hand geometry, (g) retina, and (h) ear.

A number of commercial recognition systems based on these traits have been deployed and are currently in use. Biometric technology has now become a viable alternative in traditional government applications (e.g., the US-VISIT program [48] and the proposed biometric passport, which is capable of storing the owner's biometric information in a chip inside the passport). With increasing applications involving human-computer interaction, there is a growing need for recognition techniques that are reliable and secure. Recognition of an individual can be viewed as a test of statistical hypotheses.
Based on the biometric input $Q$ and a claimed identity $I_c$, we would like to test
$$H_0: I_t = I_c \quad\text{vs.}\quad H_1: I_t \ne I_c, \qquad (7.2)$$
where $I_t$ is the true identity of the user. The testing in (7.2) is performed by a matcher, which computes a similarity measure $S(Q, T)$ based on $Q$ and the stored template $T$; large (respectively, small) values of $S$ indicate that $T$ and $Q$ are close to (far from) each other. (A matcher can also compute a distance measure between $Q$ and $T$, in which case similar $Q$ and $T$ produce distance values close to zero, and vice versa.) The distribution of $S(Q, T)$ is called genuine (respectively, impostor) when $I_t = I_c$ ($I_t \ne I_c$), that is, under $H_0$ ($H_1$). We denote the genuine (impostor) matching score distribution function by $F_{gen}$ ($F_{imp}$), and assume that $F_{gen}(x)$ and $F_{imp}(x)$ have densities $f_{gen}(x)$ and $f_{imp}(x)$, respectively. The Neyman-Pearson theorem states that the optimal ROC curve is the one corresponding to the likelihood ratio statistic $NP(x) = f_{gen}(x)/f_{imp}(x)$ [14]. The ROC curve corresponding to $NP(x)$ has the highest genuine accept rate (GAR) at every given value of the false accept rate (FAR) compared to any other statistic $U(x) \ne NP(x)$. However, both $f_{gen}(x)$ and $f_{imp}(x)$ are unknown and must be estimated from the observed matching scores. The ROC corresponding to an estimate of $NP(x)$ may turn out to be suboptimal, mainly due to large errors in the estimation of $f_{gen}(x)$ and $f_{imp}(x)$. Thus, for a set of genuine and impostor matching scores, it is important to be able to estimate $f_{gen}(x)$ and $f_{imp}(x)$ reliably and accurately.

The articles [14] and [38] assume that the distribution function $F$ has a continuous density with no discrete components. In reality, most matching algorithms apply thresholds at various stages of the matching process; when the required threshold conditions are not met, specific matching scores are output by the matcher. For example, some fingerprint matchers produce a score of zero if the number of extracted minutiae is below a threshold. This leads to discrete components in the matching score distributions that cannot be modelled accurately using a continuous density function (see Figures 7.3, 7.4, 7.5, and 7.6). Thus, discrete components need to be detected and modelled separately to avoid large errors in estimating $f_{gen}(x)$ and $f_{imp}(x)$.

Another issue is that biometric systems based on a single source of information suffer from limitations such as lack of uniqueness, non-universality, and noisy data [21], and hence may not be able to achieve the desired performance requirements of real-world applications.

Figure 7.3. Histograms of matching scores corresponding to genuine scores for Matcher 1. The continuous (respectively, generalized) density estimator is given by the dashed lines (solid lines).

Figure 7.4. Histograms of matching scores corresponding to genuine scores for Matcher 2. The continuous (respectively, generalized) density estimator is given by the dashed lines (solid lines).

Figure 7.5. Histograms of matching scores corresponding to impostor scores for Matcher 1. The continuous (respectively, generalized) density estimator is given by the dashed lines (solid lines). The spike corresponds to discrete components. Note how the generalized density estimator performs better compared to the continuous estimator (which assumes no discrete components).
Figure 7.6. Histograms of matching scores corresponding to impostor scores for Matcher 2. The continuous (respectively, generalized) density estimator is given by the dashed lines (solid lines).

In contrast, some of the limitations imposed by unimodal biometric systems (that is, biometric systems that rely on the evidence of a single biometric trait) can be overcome by using multiple biometric modalities [2], [24], [3], [25], [49], [47]. Such systems, known as multibiometric systems, are expected to be more reliable due to the presence of multiple pieces of evidence. In a multimodal biometric system, fusion can be done at (i) the feature level, (ii) the matching score level, or (iii) the decision level. Matching score level fusion is commonly preferred because matching scores are easily available and contain sufficient information to distinguish between genuine and impostor cases. Dass et al. [7] proposed biometric fusion using generalized densities, in which a Gaussian copula model is chosen to estimate the correlation structure. In reality, the joint distribution can sometimes be fitted better by a t-copula model than by a Gaussian copula model, owing to the nature of the data set; we therefore consider the Gaussian copula and t-copula models together and choose the more appropriate model by model selection based on the BIC criterion (the publication for this research is [16]).

7.2.2 Application in Biometric Fusion

In biometric fusion, a natural first question is how to obtain the joint distribution of the multiple modalities. Previously, independence between the individual modalities was simply assumed, but this assumption is not always true, especially when fusion is performed on the same biometric trait with different matchers. Here we model the correlation structure via semi-parametric copula models.

Based on fingerprint images in the MSU-Multimodal database (see [19]) corresponding to 100 users, genuine and impostor similarity scores were obtained for two matchers: a correlation matcher, $S_1$ (see [32]), and a minutiae-based matcher, $S_2$ (see [39]). A total of 2,800 and 4,950 vectors of similarity scores, $\mathbf{X}_j \equiv (S_1(Q,T), S_2(Q,T))^T$, were obtained for the genuine and impostor cases, respectively. The histogram of each marginal (both genuine and impostor) gives a strong indication of non-Gaussianity, justifying the need for the methodology developed in this thesis. The match scores are highly correlated, since $S_1$ and $S_2$ are applied to the same fingerprint images. Further, both $S_1$ and $S_2$ output the discrete score '0' if certain "initial conditions" are not met, resulting in a spike at 0 in the corresponding marginal distributions. For both the genuine and impostor distributions, the set of degrees of freedom $\nu$ considered is $\mathcal{A} = \{1, 2, 3, \ldots, 25, \infty\}$. We obtained $\hat\nu = 3$ and $\hat\rho(\hat\nu) = 0.4178$ for the genuine scores, and $\hat\nu = 3$ and $\hat\rho(\hat\nu) = 0.1563$ for the impostor scores. For the reasons mentioned above, joint distribution functions of the form (6.3) with $D = 2$ are appropriate for such strongly correlated biometric data.
The second experiment was carried out on the first partition of the Biometric Scores Set Release I (BSSR1) released by NIST [33], consisting of face and fingerprint images from 517 users. As in the previous case, the marginal distributions have discontinuities: the first matcher for the face biometric, $S_1$, outputs the value -1 if certain alignment conditions do not hold for the query and template face pair; the second face matcher, $S_2$, outputs continuous values and therefore has no discrete components; and the second fingerprint matcher of the MSU-Multimodal database, renamed $S_3$ here, is used for the fingerprint images, resulting in a discrete score of '0'. We obtained 517 and 266,772 vectors of similarity scores, $\mathbf{X}_j \equiv (S_1(Q,T), S_2(Q,T), S_3(Q,T))^T$, corresponding to the genuine and impostor scores, respectively. In this case, $D = 3$ and the correlation matrix $R_0$ can be written as in (7.1); $\mathcal{A}$ is taken as before. Table 7.6 gives the results of the estimation procedure.

 Matching score type    ν̂     (ρ̂1, ρ̂2, ρ̂3)
 Genuine                14    (0.76, -0.11, -0.14)
 Impostor               25    (0.30, 0.04, 0.02)

Table 7.6. Results of the estimation procedure for R and ν based on the NIST database.

We investigate the performance of fusion obtained by combining the $D$ similarity scores from the $D$ different modalities. Since we assume that the genuine and impostor distributions are of the form (6.3), the test of hypotheses (7.2) can be re-stated in terms of the score distributions under $H_0$ and $H_1$, namely,
$$H_0: F_Q(\mathbf{x}) = F^D_{\nu_0,R_0}(\mathbf{x}) \quad\text{vs.}\quad H_1: F_Q(\mathbf{x}) = F^D_{\nu_1,R_1}(\mathbf{x})$$
for some $(\nu_0, R_0)$ and $(\nu_1, R_1)$. The optimal decision rule (fusion rule) then turns out to be the likelihood ratio $LR(\mathbf{x}) = dF^D_{\nu_0,R_0}(\mathbf{x})/dF^D_{\nu_1,R_1}(\mathbf{x})$ from the Neyman-Pearson lemma, in a similar fashion to the single-modality case explained earlier. However, the LR rule cannot be used in this form, since the parameters $\nu_0$, $R_0$, $\nu_1$, and $R_1$ are unknown. The methodology developed here can be used to obtain estimators of all of these parameters, thus yielding the estimated likelihood ratio statistic
$$\widehat{LR}(\mathbf{x}) = dF^D_{\hat\nu_0,\hat R_0}(\mathbf{x})\,/\,dF^D_{\hat\nu_1,\hat R_1}(\mathbf{x}).$$
The effectiveness of the (estimated) LR fusion rule can be evaluated based on a $K$-fold cross-validation procedure. In the $k$-th iteration, $k = 1, \ldots, K$, a random subset, say $S_0$, of $n_0$ genuine scores is selected from the total of $n_{gen}$ scores for estimating the parameters $\nu_0$ and $R_0$. Similarly, for estimating $\nu_1$ and $R_1$, a random subset of $n_1$ impostor scores, say $S_1$, is selected from a total of $n_{imp}$ scores. The remaining genuine and impostor scores are used to obtain estimators of the false accept and genuine accept rates (FAR and GAR, respectively) for each threshold $\lambda$. The relevant formulas are
$$\widehat{FAR}(\lambda) = \frac{\sum_{j \in S_1^c} I_{\{\widehat{LR}(\mathbf{x}_j) > \lambda\}}}{n_{imp} - n_1} \qquad\text{and}\qquad \widehat{GAR}(\lambda) = \frac{\sum_{j \in S_0^c} I_{\{\widehat{LR}(\mathbf{x}_j) > \lambda\}}}{n_{gen} - n_0},$$
where $S_0^c$ and $S_1^c$ are the complements of $S_0$ and $S_1$, respectively. The ROC (Receiver Operating Characteristic) curve is the plot of $\widehat{FAR}(\lambda)$ versus $\widehat{GAR}(\lambda)$, with higher ROC values indicating better recognition performance. Our experiments on the MSU-Multimodal and NIST databases were based on the following choices: $K = 10$ and $n_0/n_{gen} = n_1/n_{imp} \approx 0.8$.
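Given held-out estimated likelihood-ratio values, $\widehat{FAR}$ and $\widehat{GAR}$ above reduce to threshold exceedance proportions. The sketch below illustrates the computation with synthetic stand-in scores (in practice, the LR values come from the fitted copula models; the names and the stand-in distributions are ours):

import numpy as np

def roc_points(lr_gen, lr_imp, thresholds):
    """Empirical FAR(lambda) and GAR(lambda) from estimated likelihood-ratio
    values of held-out genuine (lr_gen) and impostor (lr_imp) scores."""
    far = np.array([np.mean(lr_imp > lam) for lam in thresholds])
    gar = np.array([np.mean(lr_gen > lam) for lam in thresholds])
    return far, gar

# Illustrative stand-in data: genuine LR values tend to exceed impostor ones.
rng = np.random.default_rng(1)
lr_gen = np.exp(rng.normal(1.0, 1.0, 2000))    # hypothetical LR-hat values
lr_imp = np.exp(rng.normal(-1.0, 1.0, 5000))
thresholds = np.quantile(np.concatenate([lr_gen, lr_imp]),
                         np.linspace(0.01, 0.99, 99))
far, gar = roc_points(lr_gen, lr_imp, thresholds)
print(far[:3], gar[:3])   # the ROC curve is the plot of gar against far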
The fusion results are presented in Figure 7.7 and Figure 7.8. Note that there is a dramatic overall improvement in performance, indicating that the elicited joint distributions are good models for the distribution of similarity scores.

Figure 7.7. Performance of copula fusion on the MSU-Multimodal database. (Curves shown: copula fusion with generalized pdf, copula fusion with continuous pdf, finger matcher 1, and finger matcher 2; Genuine Accept Rate (%) versus False Accept Rate (%).)

Figure 7.8. Performance of copula fusion on the NIST database. (Curves shown: copula fusion with generalized pdf, copula fusion with continuous pdf, face matcher 1, and face matcher 2; Genuine Accept Rate (%) versus False Accept Rate (%).)

CHAPTER 8

SUMMARY AND CONCLUSION

Investigating the dependence structure of multivariate distributions has always been an interesting area for researchers. Copulas have proved to be a useful tool for analyzing multivariate dependence structures, providing more flexibility than the classical parametric approach because they can easily separate the effect of the dependence structure from that of the marginals, especially in situations where the marginals contain discrete points. This thesis developed a semi-parametric approach to estimating the dependence structure of bivariate distributions with mixed marginals. The semi-parametric estimator established in this thesis has been shown to be consistent and asymptotically normal, and a variance estimator has been provided as well. The estimation methodology involves integrals corresponding to the discrete components and is, therefore, non-standard. Furthermore, our approach was generalized to the higher-dimensional case under similar assumptions and using the same arguments; the higher-dimensional case requires more computing time.

The semi-parametric approach developed here was applied to the t-copula family and to the Gaussian copula family, which is the limiting case of the t-copula family. Estimation of the correlation matrix as well as the degrees of freedom of the t-copula was carried out using the EM algorithm; the same approach was used to estimate the correlation matrix of the Gaussian copula. We have shown large-sample consistency of our estimates and demonstrated it in several simulation examples. Finally, the methodology was applied to real data consisting of matching scores from various biometric sources; fusion based on the generalized distributions gave improved performance compared to the individual systems. As future work, we will consider extensions to copula functions derived from general elliptical distributions.

BIBLIOGRAPHY

[1] Barnard, J., McCulloch, R., and Meng, X. L., Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage, Statistica Sinica, 10 (2000), 1281–1311.

[2] Bigun, E. S., Bigun, J., Duc, B., and Fischer, S., Expert Conciliation for Multimodal Person Authentication Systems Using Bayesian Statistics, in: Proceedings of the First International Conference on AVBPA, Crans-Montana, Switzerland (1997), 291–300.

[3] Brunelli, R., and Falavigna, D., Person Identification Using Multiple Cues, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(10) (1995), 955–966.

[4] Chen, X., and Fan, Y., Estimation of Copula-Based Semiparametric Time Series Models, Journal of Econometrics, 130 (2006), 307–335.

[5] Cherubini, U., Luciano, E., and Vecchiato, W., Copula Methods in Finance, Wiley, 2004.

[6] Clayton, D. G., A Model for Association in Bivariate Life Tables and Its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence, Biometrika, 65 (1978), 141–151.

[7] Dass, S., Nandakumar, K., and Jain, A. K., A Principled Approach to Score Level Fusion in Multimodal Biometric Systems, in: Proceedings of AVBPA (2005).

[8] Demarta, S., and McNeil, A. J., The t-Copula and Related Copulas, International Statistical Review, 73 (2005), no. 1, 111–129.
[9] Embrechts, P., Lindskog, F., and McNeil, A., Modelling Dependence with Copulas and Applications to Risk Management, in: Handbook of Heavy Tailed Distributions in Finance, Elsevier (2003), 329–384.

[10] Frey, R., and McNeil, A. J., Copulas and Credit Models, RISK (2001), 111–114.

[11] Genest, C., Ghoudi, K., and Rivest, L.-P., A Semiparametric Estimation Procedure of Dependence Parameters in Multivariate Families of Distributions, Biometrika, 82 (1995), no. 3, 543–552.

[12] Genest, C., and Rivest, L.-P., Statistical Inference Procedures for Bivariate Archimedean Copulas, Journal of the American Statistical Association, 88 (1993), 1034–1043.

[13] Genest, C., Quessy, J. F., and Remillard, B., Goodness-of-Fit Procedures for Copula Models Based on the Probability Integral Transform, Scandinavian Journal of Statistics (2006), 337–366.

[14] Griffin, P., Optimal Biometric Fusion for Identity Verification, Identix Corporate Research Center Preprint RDNJ-03-0064 (2004).

[15] Hougaard, P., Harvald, B., and Holm, N. V., Measuring the Similarities Between the Lifetimes of Adult Danish Twins Born Between 1881 and 1930, Journal of the American Statistical Association, 87 (1992), 17–24.

[16] Huang, W., and Dass, S. C., Generalized t-Copula and Its Application on Biometrics, JSM Proceedings (2007).

[17] Hutchinson, T. P., and Lai, C. D., Continuous Bivariate Distributions, Emphasising Applications, Adelaide: Rumsby Scientific.

[18] Jain, A. K., Bolle, R., and Pankanti, S., BIOMETRICS: Personal Identification in Networked Society, Kluwer Academic Publishers, Boston, 1999.

[19] Jain, A. K., Prabhakar, S., and Ross, A., Fingerprint Matching: Data Acquisition and Performance Evaluation, MSU Technical Report TR99-14 (1999).

[20] Jain, A. K., Ross, A., and Prabhakar, S., An Introduction to Biometric Recognition, IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Image- and Video-Based Biometrics, 14 (2004), 4–20.

[21] Jain, A. K., and Ross, A., Multibiometric Systems, Communications of the ACM, Special Issue on Multimodal Interfaces, 47 (2004), 34–40.

[22] Joe, H., Parametric Families of Multivariate Distributions with Given Marginals, Journal of Multivariate Analysis, 46, 262–282.

[23] Kimeldorf, G., and Sampson, A. R., One-Parameter Families of Bivariate Distributions with Fixed Marginals, Communications in Statistics, 4, 293–301.

[24] Kittler, J., Hatef, M., Duin, R. P., and Matas, J. G., On Combining Classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3) (1998), 226–239.

[25] Lam, L., and Suen, C. Y., Optimal Combination of Pattern Classifiers, Pattern Recognition Letters, 16 (1995), 945–954.

[26] Lancaster, H., The Structure of Bivariate Distributions, Annals of Mathematical Statistics, 29 (1958), 719–736.

[27] Lehmann, E., Some Concepts of Dependence, Annals of Mathematical Statistics, 37 (1966), 1137–1153.

[28] Lehmann, E. L., Testing Statistical Hypotheses, Chapman & Hall, NY, 1994.

[29] Maltoni, D., Maio, D., Jain, A. K., and Prabhakar, S., Handbook of Fingerprint Recognition, Springer-Verlag, 2003.

[30] Marshall, A. W., and Olkin, I., Families of Multivariate Distributions, Journal of the American Statistical Association, 83 (1988), 834–841.

[31] McLachlan, G. J., and Krishnan, T., The EM Algorithm and Extensions, John Wiley & Sons, 1997.

[32] Nandakumar, K., and Jain, A. K., Local Correlation-Based Fingerprint Matching, in: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, Kolkata (2004), 503–508.

[33] National Institute of Standards and Technology: NIST Biometric Scores Set.
Available at http://www.itl.nist.gov/iad/894.03/biometricscores (2004).

[34] Nelsen, R., An Introduction to Copulas, Springer, 1998.

[35] Oakes, D., Semiparametric Inference in a Model for Association in Bivariate Survival Data, Biometrika, 73 (1986), 353–361.

[36] Oakes, D., Multivariate Survival Distributions, Journal of Nonparametric Statistics, 3 (1994), 343–354.

[37] Pitt, M., Chan, D., and Kohn, R., Efficient Bayesian Inference for Gaussian Copula Regression Models, Biometrika, 93 (2006), 537–554.

[38] Prabhakar, S., and Jain, A. K., Decision-Level Fusion in Fingerprint Verification, Pattern Recognition, 35 (2002), 861–874.

[39] Ratha, N. K., Chen, S., Karu, K., and Jain, A. K., A Real-Time Matching System for Large Fingerprint Databases, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), 799–813.

[40] Rüschendorf, L., Asymptotic Distribution of Multivariate Rank Order Statistics, The Annals of Statistics, 4 (1976), 912–923.

[41] Ruymgaart, F. H., Shorack, G. R., and van Zwet, W. R., Asymptotic Normality of Nonparametric Tests for Independence, Annals of Mathematical Statistics, 43 (1972), 1122–1135.

[42] Ruymgaart, F. H., Asymptotic Normality of Nonparametric Tests for Independence, Annals of Statistics, 2 (1974), 892–910.

[43] Shao, J., Mathematical Statistics, Springer, 2003.

[44] Shih, J., and Louis, T., Inferences on the Association Parameter in Copula Models for Bivariate Survival Data, Biometrics, 51 (1995), 1384–1399.

[45] Shorack, G. R., Functions of Order Statistics, Annals of Mathematical Statistics, 43 (1972), 412–427.

[46] Sklar, A., Fonctions de répartition à n dimensions et leurs marges, Publications de l'Institut de Statistique de l'Université de Paris, 8, 229–231.

[47] Toh, K. A., Jiang, X., and Yau, W. Y., Exploiting Global and Local Decisions for Multimodal Biometrics Verification, IEEE Transactions on Signal Processing, 52 (2004), 3059–3072.

[48] US-VISIT, U.S. Department of Homeland Security. Online: http://www.dhs.gov/dhspublic/display?theme=91.

[49] Wang, Y., Tan, T., and Jain, A. K., Combining Face and Iris Biometrics for Identity Verification, in: Proceedings of the Fourth International Conference on AVBPA, Guildford, U.K. (2003), 805–813.