A DEPENDENCE MODEL FOR UNBALANCED CROP INSURANCE INDEMNITY AMOUNTS By Yi-Hsuen Lin A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics—Master of Science 2023 ABSTRACT In this thesis, a copula model is constructed to estimate dependency and calculate the Value at Risk for insurance coverage under the dependence of the covered losses. The dependence model is illustrated in a three-dimensional setting to simplify the complex theoretical functions and provide an accessible introduction to the copula model. The modeling uses the U.S. crop insurance dataset aggregated by each county level and commodity type. The composite likelihood approach helps to simplify the computation in high-dimensional problems by approximating the negative log-likelihood using bivariate components. In this study, the majorization-minimization principle is employed to estimate the parameters of the normal copula by minimizing the composite likelihood iteratively. To avoid overfitting and result in a valid correlation matrix, the L1 penalty is applied to induce sparsity and shrink irrelevant parameters toward zero. The optimal tuning parameter is selected based on the BIC score to generate a positive semi-definite correlation matrix for the result. In the Appendix of the thesis, the dependence model is extended to a high dimension. The Value at Risk computed for the fictional insurance contract in the data analysis results in a higher value when considering dependence between variables. ACKNOWLEDGEMENTS I would like to express my sincere gratitude to my advisor, Dr. Gee Lee, for his guidance and offering me the opportunity to participate in this research. This thesis would not have reached its completion without his constant support throughout the research process. His belief in my capabilities has always motivated me, driving me to achieve more than I thought possible. Words cannot adequately express how lucky I am to have learned and grown under his mentorship. I extend my deep appreciation to my thesis committee members, Dr. Haiyan Liu and Dr. Haolei Weng, who provide their invaluable time and constructive suggestions that have enriched the content and quality of my work. Most importantly, none of this would have been possible without my family and friends. I am profoundly grateful to my parents, San-Ching Lin and Mei-Ling Chen, for their unconditional love and support. Being your daughter is an immeasurable blessing I deeply treasure. Your constant encouragement and understanding have given me the freedom to explore my passions, pursue my dreams without hesitation, and evolve into the person I am today. I am also thankful to my lovely furry friend, Bella, who accompanied me during late-night paper writing and through frustrated moments, bringing joy even on the most challenging days. This thesis is a culmination of the support and encouragement I’ve received from each one of you. Thank you all for being an integral part of this remarkable journey! iii TABLE OF CONTENTS LIST OF FIGURES . LIST OF TABLES . INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v vi 1 CHAPTER 1: COPULA MODEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4 1.1 Assumptions . 1.2 Estimation of Copula Parameters . 
. . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 The Distribution of Correlation Matrices . . . . . . . . . . . . . . . . . . . . . . . 13 . . . . . . . . . . . . . . . . . . . . CHAPTER 2: SIMULATION . 2.1 . 2.2 C and IC Analysis . 2.3 Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 . 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Influence of Spurious Correlations . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 3: RESULTS . . . 3.1 Data Description . . 3.2 Marginal Models . 3.3 Dependence Model . 3.4 Value at Risk (VaR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 CHAPTER 4: CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 BIBLIOGRAPHY . APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 iv LIST OF FIGURES Figure 1.1: Graph of ρjk as a function of ψjk . . . . . . . . . . . . . . . . . . . . . . . . . 6 Figure 2.1: Influence of spurious correlations . . . . . . . . . . . . . . . . . . . . . . . . . 16 Figure 3.1: Scatter plot of liabilities and indemnities at the county level . . . . . . . . . . . 23 Figure 3.2: Solution path and optimal tuning parameter for selected commodities . . . . . . 24 Figure 3.3: Simulated density of indemnities for a fictional insurance coverage . . . . . . . 26 v LIST OF TABLES Table 2.1: C and IC analysis result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Table 3.1: Dataset of liabilities and indemnities . . . . . . . . . . . . . . . . . . . . . . . . 22 Table 3.2: Coefficient estimates for the marginal gamma model . . . . . . . . . . . . . . . 23 Table 3.3: Correlation of selected commodities . . . . . . . . . . . . . . . . . . . . . . . . 23 vi INTRODUCTION Dependency refers to the relationship between variables, and the analysis of dependency is crucial since it helps building accurate models and managing risk. Suppose we are given n sample points for a p-dimensional multivariate response, where each of the p marginal response has a different distribution. In this case, the copula approach can be employed to model the dependence between random variables. The classic textbook Nelsen (1999) provides an introduction to the copula model. The literature on copula modeling has developed significantly in recent years and provides numerous extended introduction to the copula models, such as Hofert et al. (2018) and Czado (2019) including examples in the R programming language, or Frees and Valdez (1998) exploring the modeling in the actuarial science context. For the use of the copula approach for count data, the reader may see Nikoloulopoulos and Karlis (2010), or Shi and Emiliano (2014). Sklar’s Theorem allows us to derive the multivariate distribution function of a random vector through the use of copula functions, with its corresponding marginal distributions. 
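For reference, Sklar's theorem states that any joint distribution function F with marginals F1, . . . , Fp can be written as

F(y1, . . . , yp) = C(F1(y1), . . . , Fp(yp)),

for some copula C, and the copula is unique when the marginals are continuous. Modeling the marginals and the copula separately therefore recovers the full joint distribution.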
If we attempt to estimate the normal copula parameters with a p-dimensional correlation matrix, where the number of dependence parameters exceeds the sample size n (that is, $\binom{p}{2} > n$), the estimation becomes a high-dimensional problem. In this situation, using maximum likelihood to estimate the parameters may give rise to several problems. One problem is potential overfitting through spurious correlations: when all off-diagonal entries of the correlation matrix are non-zero, the model may be fitting noise that is not truly meaningful. Another problem is the difficulty of ensuring that the estimated correlation matrix satisfies the positive semi-definite property. Due to these problems, we hope to automatically set some correlation entries to zero during the estimation process, so that we can introduce sparsity that simplifies the model, avoid overfitting, and obtain a valid correlation matrix. There are various ways to achieve this, such as the lasso and ridge regression approaches. In this thesis, the lasso penalty (L1 penalty) is used to select and shrink irrelevant correlation parameters toward zero. Nevertheless, while reducing the number of non-zero entries, the procedure cannot guarantee that the resulting matrix is positive semi-definite; it can only increase the likelihood of obtaining a valid correlation matrix. More relevant details will be discussed in the following chapter. The lasso is a variable selection method that solves an optimization problem with a constraint, and it was initially introduced by Tibshirani (1996). In this thesis, the lasso approach is used to add a penalty term to the negative log-likelihood function, with a tuning parameter that controls the severity of the penalty. Due to the availability of efficient algorithms for solving the estimation problem, the lasso method has become increasingly popular and has been extended in several directions, such as shrinking grouped variables in Yuan and Lin (2006), handling non-normal responses in Yang and Zou (2015), and the Tweedie compound distribution in Qian et al. (2016). The Tweedie compound distribution is commonly used in actuarial science, insurance, and finance to model data with both continuous and discrete characteristics. The distribution can be applied in frequency-severity regression as a distribution for modeling insurance losses, the severity component of the regression. Frequency-severity regression is commonly used in actuarial science to analyze insurance claim occurrence (frequency) and claim amounts (severity). One can refer to Frees et al. (2011) and Frees (2014) for an introduction to frequency-severity regression and its applications. In this thesis, the approach used for indemnity modeling has a close connection with the frequency-severity regression approach, as both model responses consisting of continuous and discrete components. Moreover, the gamma distribution is mainly used to model the crop insurance indemnities in this research. This research demonstrates the use of the lasso technique on the multivariate normal copula, which helps to simplify the copula parameters by inducing sparsity. The approach takes advantage of the composite likelihood approach for estimation. The composite likelihood approach, explained in Zhao and Joe (2005), Song et al. (2005), and Joe (2014), constructs an approximation to the log-likelihood function using bivariate components.
It is used for estimating the dependence parameters in the copula model since the log-likelihood may be hard to compute. It is a straightforward method that may be adaptable to other copula families. The main intent of this research is to explore the dependence model and underlying theoretical 2 functions that enable the selection and estimation of the normal copula parameters in a three-dimensional setting. By concentrating on three dimensions, we can simplify the explanation of the core concepts and provide a more approachable introduction to the copula model. Although we restrict our focus to three dimensions, the approach explained in this thesis can be extended to higher dimensions. And extensions to other copula families might be possible using the methodologies and insights gained from the research. The rest of the thesis is organized into four parts. Chapter 1 illustrates the assumptions and approach utilized in the dependence model in three-dimensional setting. The section also introduces the distribution of correlation matrices that is useful while generating valid correlation matrices and may be used in future work. Chapter 2 conducts some simulation studies using the copula estimation routine to evaluate the performance and properties of the model. Chapter 3 shows the results of the estimation using the copula model and the aggregated U.S. crop insurance dataset and calculates Value at Risk for the insurance portfolio losses. Moreover, the Appendix shows the extensions of the normal copula model in Chapter 1 to high dimensions. 3 CHAPTER 1: COPULA MODEL 1.1 Assumptions Suppose we are given 3-dimensional multivariate responses of n samples, denoted by (yi1, yi2, yi3) for i = 1, ..., n, and the marginal distributions Fi1(yi1), Fi2(yi2), Fi3(yi3) are known. Then, the task is to use these to derive the joint density among the variables, f (yi1, yi2, yi3). Various copula models could fulfill the task, and an introduction to the copulas is explained in Nelsen (1999). Here, we assume using a multivariate normal copula C that links the univariate marginals to their joint distribution and can be written as C(u1, u2, u3; Σ) = P r(U1 < u1, U2 < u2, U3 < u3) = Φ3(Φ−1(u1), Φ−1(u2), Φ−1(u3); Σ), (1.1) where Ui is uniformly distributed on the interval [0, 1], Φ3 is the cumulative distribution function of a 3-dimensional multivariate normal distribution with zero means and correlation matrix Σ, and Φ−1 is the inverse cumulative distribution function of standard normal distribution. In this case, the density of the copula C is c(u1, u2, u3; Σ) = ∂3 ∂u1∂u2∂u3 C(u1, u2, u3; Σ). (1.2) Using the probability integral transform (u1, u2, u3) = (F1(y1), F2(y2), F3(y3)), the normal copula C can also be written as C(Fi1(yi1), Fi2(yi2), Fi3(yi3); Σ) = F (yi1, yi2, yi3). (1.3) By the chain rule, the joint density of the multivariate response is f (yi1, yi2, yi3) = ∂3 ∂yi1∂yi2∂yi3 C(Fi1(yi1), Fi2(yi2), Fi3(yi3)) (1.4) = c(Fi1(yi1), Fi2(yi2), Fi3(yi3); Σ) · fi1(yi1) · fi2(yi2) · fi3(yi3). 4 1.2 Estimation of Copula Parameters The correlation matrix Σ is symmetric and displays the dependence parameter between the i-th and j-th variables in (i, j) entry. In the three-dimensional setting,       1 ρ12 ρ13 Σ = ρ21 1 ρ23 ρ31 ρ32 1       . (1.5) In the research, the pairwise composite likelihood method is used to estimate the dependence parameters ρij. 
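Before describing the composite likelihood estimation in detail, the following sketch makes Equations (1.1)–(1.4) concrete: it simulates from a three-dimensional normal copula and evaluates the joint density as the copula density multiplied by the marginal densities. The correlation matrix and the gamma marginals below are placeholder choices for illustration, not quantities estimated in the thesis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder correlation matrix Sigma for the 3-dimensional normal copula.
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# Placeholder gamma marginals F_1, F_2, F_3.
marginals = [stats.gamma(a=2, scale=100),
             stats.gamma(a=2, scale=150),
             stats.gamma(a=3, scale=80)]

# Simulate (u1, u2, u3) from the normal copula in Equation (1.1):
# draw a multivariate normal vector, then apply the standard normal cdf.
z = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=5)
u = stats.norm.cdf(z)

# Transform the uniforms to the marginal scale via the inverse cdf.
y = np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])

def normal_copula_density(u_row, Sigma):
    """Density c(u1, u2, u3; Sigma) of the normal copula, Equation (1.2)."""
    q = stats.norm.ppf(u_row)  # (Phi^{-1}(u1), Phi^{-1}(u2), Phi^{-1}(u3))
    mvn = stats.multivariate_normal(mean=np.zeros(len(q)), cov=Sigma)
    return mvn.pdf(q) / np.prod(stats.norm.pdf(q))

# Joint density f(y1, y2, y3) = c(F1(y1), F2(y2), F3(y3); Sigma) f1(y1) f2(y2) f3(y3),
# as in Equation (1.4), evaluated at the first simulated observation.
u_row = np.array([m.cdf(y[0, j]) for j, m in enumerate(marginals)])
joint_density = normal_copula_density(u_row, Sigma) * np.prod(
    [m.pdf(y[0, j]) for j, m in enumerate(marginals)])
print(joint_density)
```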
The method is well explained in a multivariate setting by Zhao and Joe (2005), and it is helpful to lower the computational complexity by reducing the dimensionality of the data, especially in high-dimensional problems with complex dependencies among the variables. Assuming all of the three-dimensional multivariate responses for each sample are fully defined, the composite negative log-likelihood function will be ℓ(Σ) = − n (cid:88) i=1 (cid:20) 1 3 − 1 log c (Fi1(yi1), Fi2(yi2); ρ12) + log f1(yi1) + log f2(yi2) + log c (Fi1(yi1), Fi3(yi3); ρ13) + log f1(yi1) + log f3(yi3) (cid:21) + log c (Fi2(yi2), Fi2(yi3); ρ23) + log f2(yi2) + log f3(yi3) , (1.6) where 1/(3 − 1) is the weight to match the log-likelihood when independent case and c is a density of bivariate normal copula. However, in general, one or more responses for some of the samples may be undefined, then the weight will become 1/(mi − 1) where mi is the number of defined responses in observation i. Note that we know the marginal distributions Fi1(yi1), Fi2(yi2), Fi3(yi3), as well as the densities fi1(yi1), fi2(yi2), fi3(yi3) for i = 1, 2, ..., n, the negative log-likelihood ℓ(Σ) is a function in the dependence parameters only. The optimal values of the parameters ρjk that minimize the negative log-likelihood function ℓ(Σ) would fall between −∞ 5 and ∞. Nonetheless, in order to ensure the dependence parameters satisfy −1 < ρjk < 1, we set ρjk = tan−1(ψjk) π/2 , −∞ < ψjk < ∞, (1.7) Figure 1.1 shows that the function is a one-to-one mapping, and for any ψjk, the corresponding dependence parameters ρjk will fall within the range of -1 and 1. Figure 1.1: Graph of ρjk as a function of ψjk Then, we use the L1 penalty method to reduce the number of non-zero parameters and avoid overfitting, which is helpful in high-dimensional problems which have a large number of parameters to estimate. Now, the penalized negative log-likelihood is ℓP (Ψ) = ℓ(Σ) + λ(|ψ12| + |ψ13| + |ψ23|) (1.8) where λ is tuning parameter that control the severity of the penalty and       1 ψ12 ψ13 Ψ = ψ21 1 ψ23 ψ31 ψ32 1       . (1.9) Then, the problem now is to estimate the alternative parameters ψij by minimizing the penalized 6 −10−50510−1.0−0.50.00.51.0Graph of r as a function of yyr negative log-likelihood function ℓP (Ψ). Some of the estimates may end up shrunk to zero due to the lasso penalty. The tuning parameter λ controls the number of zero entries in the estimated matrix Ψ. When λ increases, more and more coefficients are set to zero and eliminated. There are different ways to determine λ, such as cross-validation and minimizing AIC (or BIC). In the thesis, the tuning parameter is optimized by minimizing the BIC score. Since we assume that c is the density of a bivariate normal copula, the negative log-likelihood function can be written as ℓ(Σ) = − n (cid:88) i=1 (cid:20) 1 3 − 1 (cid:32) log 1 (cid:112)1 − ρ2 12 (cid:33) (cid:32) + log 1 (cid:112)1 − ρ2 13 (cid:33) (cid:32) + log (cid:33) 1 (cid:112)1 − ρ2 23 − (a2 i + b2 12 − 2aibiρ12 i )ρ2 2(1 − ρ12)2 − (a2 i + c2 13 − 2aiciρ13 i )ρ2 2(1 − ρ13)2 − (b2 i + c2 23 − 2biciρ23 i )ρ2 2(1 − ρ23)2 + log f1(yi1) + log f2(yi2) + log f1(yi1) + log f3(yi3) + log f2(yi2) + log f3(yi3) (cid:21) , where with √ 2 erf−1(2ui1 − 1), ui1 = Fi1(yi1) √ 2 erf−1(2ui2 − 1), ui2 = Fi2(yi2) √ 2 erf−1(2ui3 − 1), ui3 = Fi3(yi3), ai = bi = ci = erf(z) = 2 √ π (cid:90) z 0 exp(−t2)dt. 
(1.10) (1.11) (1.12) 7 In order to estimate the dependent parameters, let Φi12 = − (cid:34) (cid:32) log 1 3 − 1 1 (cid:112)1 − ρ2 12 (cid:33) − Φi13 = − (cid:34) (cid:32) log 1 3 − 1 1 (cid:112)1 − ρ2 13 (cid:33) − Φi23 = − (cid:34) (cid:32) log 1 3 − 1 1 (cid:112)1 − ρ2 23 (cid:33) − (a2 i + b2 12 − 2aibiρ12 i )ρ2 2(1 − ρ12)2 (a2 i + c2 13 − 2aiciρ13 i )ρ2 2(1 − ρ13)2 (b2 i + c2 23 − 2biciρ23 i )ρ2 2(1 − ρ23)2 so that ℓ(Σ) = n (cid:88) i=1 Φi12 + Φi13 + Φi23. + log fi1(yi1) + log fi2(yi2) (cid:35) (cid:35) + log fi1(yi1) + log fi3(yi3) + log fi2(yi2) + log fi3(yi3) (cid:35) (1.13) (1.14) Remember that we map ρjk to ψjk by ρjk = 2 tan−1(ψjk)/π for ensuring the dependence parameters satisfy the condition of between -1 and 1. To obtain the Hessian and gradient of log-likelihood ℓ(Σ), the log-likelihood ℓ(Σ) should be differentiated in terms of the parameters ψjk using the chain rule. Then consider ∂Φi12 ∂ψ12 = ∂Φi12 ∂ρ12 ∂ρ12 ∂ψ12 , ∂Φi13 ∂ψ13 = ∂Φi13 ∂ρ13 ∂ρ13 ∂ψ13 , ∂Φi23 ∂ψ23 = ∂Φi23 ∂ρ23 ∂ρ23 ∂ψ23 , (1.15) where ∂Φi12 ∂ρ12 ∂Φi13 ∂ρ13 ∂Φi23 ∂ρ23 = − = − = − 1 3 − 1 1 3 − 1 1 3 − 1 (cid:20) ρ12 1 − ρ2 12 (cid:20) ρ13 1 − ρ2 13 (cid:20) ρ23 1 − ρ2 23 + + + aibiρ2 12 − (a2 aiciρ2 13 − (a2 biciρ2 23 − (b2 (cid:21) (cid:21) i + b2 i )ρ12 + aibi (1 − ρ2 12)2 i + c2 i )ρ13 + aici (1 − ρ2 13)2 i + c2 i )ρ23 + bici (1 − ρ2 23)2 (cid:21) , (1.16) 8 and ∂ρ12 ∂ψ12 = 2 π(1 + ψ2 12) , ∂ρ13 ∂ψ13 = 2 π(1 + ψ2 13) , ∂ρ23 ∂ψ23 = 2 π(1 + ψ2 23) . (1.17) For the Hessian, we have ∂2Φi12 ∂ψ2 12 ∂2Φi13 ∂ψ2 13 ∂2Φi23 ∂ψ2 23 = = = ∂2Φi12 ∂ρ2 12 ∂2Φi13 ∂ρ2 13 ∂2Φi23 ∂ρ2 23 (cid:18) ∂ρ12 ∂ψ12 (cid:18) ∂ρ13 ∂ψ13 (cid:18) ∂ρ23 ∂ψ23 (cid:19)2 (cid:19)2 (cid:19)2 + + + ∂Φi12 ∂ρ12 ∂Φi13 ∂ρ13 ∂Φi23 ∂ρ23 (cid:18) ∂2ρ12 ∂ψ2 12 (cid:18) ∂2ρ13 ∂ψ2 13 (cid:18) ∂2ρ23 ∂ψ2 23 (cid:19) (cid:19) (cid:19) , (1.18) where ∂2Φi12 ∂ρ2 12 ∂2Φi13 ∂ρ2 13 ∂2Φi23 ∂ρ2 23 = − = − = − (cid:34) (cid:34) (cid:34) 1 2 1 2 1 2 1 + ρ2 12 (1 − ρ2 12)2 + 1 + ρ2 13 (1 − ρ2 13)2 + 1 + ρ2 23 (1 − ρ2 23)2 + and 2aibiρ12 − (a2 12)2 (1 − ρ2 i + b2 i ) 2aiciρ13 − (a2 13)2 (1 − ρ2 i + c2 i ) 2biciρ23 − (b2 23)2 (1 − ρ2 i + c2 i ) + + + 4 {aibiρ2 12 − (a2 i + b2 i )ρ12 + aibi} (1 − ρ2 12)ρ12 (1 − ρ2 12)4 4 {aiciρ2 13 − (a2 i + c2 i )ρ13 + aici} (1 − ρ2 13)ρ13 (cid:35) (cid:35) (1 − ρ2 13)4 4 {biciρ2 23 − (b2 i + c2 i )ρ23 + bici} (1 − ρ2 23)ρ23 (1 − ρ2 23)4 (1.19) (cid:35) , ∂2ρ12 ∂ψ2 12 = − 4ψ12 π(1 + ψ2 12)2 , ∂2ρ13 ∂ψ2 13 = − 4ψ13 π(1 + ψ2 13)2 , ∂2ρ23 ∂ψ2 23 = − 4ψ23 π(1 + ψ2 23)2 . (1.20) The (cid:0)3 2 (cid:1) dimensional vector of parameters is ψ = (ψ12, ψ13, ψ23)T . (1.21) 9 The (cid:0)3 2 (cid:1) × (cid:0)3 2 (cid:1) Hessian matrix Hψ is diagonal and the off-diagonal elements are zero Hψ = Hψ(1, 2)       0 0 0 Hψ(1, 3) 0 0 0 Hψ(2, 3)       . (1.22) where and similarly Hψ(1, 2) = ∂2 ∂ψ2 12 ℓ(Σ) = n (cid:88) i=1 ∂2 ∂ψ2 12 [Φi12 + Φi13 + Φi23] = n (cid:88) i=1 ∂2Φi12 ∂ψ2 12 (1.23) Hψ(1, 3) = n (cid:88) i=1 ∂2Φi13 ∂ψ2 13 , Hψ(2, 3) = n (cid:88) i=1 ∂2Φi23 ∂ψ2 23 . And the (cid:0)3 (cid:1) gradient vector Uψ is 2 Uψ = (Uψ(1, 2), Uψ(1, 3), Uψ(2, 3))T n (cid:88) = ( i=1 ∂Φi12 ∂ψ12 , n (cid:88) i=1 ∂Φi13 ∂ψ13 , n (cid:88) i=1 ∂Φi23 ∂ψ23 )T (1.24) (1.25) Note ˜ψ denotes the parameters updated in the outer loop of Algorithm 1, and ˘ψ for the inner loop. 
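To make the chain rule above concrete, the short sketch below evaluates the derivative of the copula part of one pairwise term with respect to ψjk using Equations (1.16) and (1.17), and compares it with a finite-difference approximation. The values of ai, bi, and ψjk are placeholders, and the marginal log-density terms are omitted since they do not depend on the correlation parameter.

```python
import numpy as np

def neg_log_pair_copula(rho, a, b, m=3):
    """Copula part of one pairwise term: -1/(m-1) * log c(u_ij, u_ik; rho)."""
    quad = (rho**2 * (a**2 + b**2) - 2.0 * a * b * rho) / (2.0 * (1.0 - rho**2))
    return -(1.0 / (m - 1)) * (-0.5 * np.log(1.0 - rho**2) - quad)

def d_pair_d_psi(psi, a, b, m=3):
    """Chain rule dPhi/dpsi = (dPhi/drho) * (drho/dpsi), Equations (1.16)-(1.17)."""
    rho = np.arctan(psi) / (np.pi / 2.0)
    dphi_drho = -(1.0 / (m - 1)) * (
        rho / (1.0 - rho**2)
        + (a * b * rho**2 - (a**2 + b**2) * rho + a * b) / (1.0 - rho**2) ** 2
    )
    drho_dpsi = 2.0 / (np.pi * (1.0 + psi**2))
    return dphi_drho * drho_dpsi

# Placeholder values for one observation and one pair (j, k).
a, b, psi = 0.7, -0.4, 0.9

# Analytic derivative versus a central finite difference in psi.
eps = 1e-6
rho_plus = np.arctan(psi + eps) / (np.pi / 2.0)
rho_minus = np.arctan(psi - eps) / (np.pi / 2.0)
numeric = (neg_log_pair_copula(rho_plus, a, b)
           - neg_log_pair_copula(rho_minus, a, b)) / (2.0 * eps)
print(d_pair_d_psi(psi, a, b), numeric)  # the two values should agree closely
```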
Then, the second-order (quadratic) Taylor series approximation of the unpenalized negative log-likelihood function ℓ(Σ) around a given point ˜ψ is ℓQ(ψ) = ℓ( ˜ψ) + ˜U T ψ (ψ − ˜ψ) + 1 2 (ψ − ˜ψ)T ˜Hψ(ψ − ˜ψ), (1.26) where ˜Uψ is the gradient vector and ˜Hψ is the Hessian matrix obtained at the given point ˜ψ = ( ˜ψ12, ˜ψ13, ˜ψ23)T . Now, the problem is to minimize the following penalized objective function: PQ(ψ) = ℓQ(ψ) + λ (|ψ12| + |ψ13| + |ψ23|) . (1.27) 10 However, the log-likelihood ℓQ(ψ) is complicated and the penalized objective function PQ(Ψ) may be non-smooth, so the problem cannot be directly handled by first and second-order conditions for optimization. Thus, the majorization-minimization algorithm is used to accomplish the task, and it is well explained in Yang and Zou (2015) and Qian et al. (2016). The key step of applying the majorization-minimization is constructing a surrogate function that majorizes the objective function. The algorithm estimates the parameters by iteratively minimizing a sequence of surrogate functions. In this paper, we construct the surrogate function by second-order Taylor expansion. First, define UQ = ∇ℓQ(ψ) HQ = ∇2ℓQ(ψ), (1.28) (1.29) where the entries of UQ are given by UQ(1, 2) = ˜Uψ(1, 2) + ˜Hψ(1, 2)(ψ12 − ˜ψ12) (cid:33) ∂ ˜Φi12 ∂ψ12 ∂2 ˜Φi12 ∂ψ2 12 (cid:32) n (cid:32) n (cid:88) (cid:88) (cid:33) = + i=1 i=1 (ψ12 − ˜ψ12), UQ(1, 3) = ˜Uψ(1, 3) + ˜Hψ(1, 3)(ψ13 − ˜ψ13) (cid:33) ∂ ˜Φi13 ∂ψ13 ∂2 ˜Φi13 ∂ψ2 13 (cid:32) n (cid:32) n (cid:88) (cid:88) (cid:33) = + i=1 i=1 (ψ13 − ˜ψ13), UQ(2, 3) = (cid:32) n (cid:88) i=1 ∂ ˜Φi23 ∂ψ23 (cid:33) + (cid:32) n (cid:88) i=1 (cid:33) ∂2 ˜Φi23 ∂ψ2 23 (ψ23 − ˜ψ23), (1.30) 11 and the diagonal entries of HQ are given by HQ(1, 2) = ˜Hψ(1, 2) = HQ(1, 3) = ˜Hψ(1, 3) = HQ(2, 3) = ˜Hψ(2, 3) = n (cid:88) i=1 n (cid:88) i=1 n (cid:88) i=1 ∂2 ˜Φi12 ∂ψ2 12 ∂2 ˜Φi13 ∂ψ2 13 ∂2 ˜Φi23 ∂ψ2 23 , , , (1.31) where HQ is also diagonal. Then, to ensure the surrogate function majorizes the objective function PQ(ψ), we substitute the Hessian matrix HQ with its largest eigenvalue in the second-order Taylor expansion of the unpenalized negative log-likelihood function ℓQ(ψ). Given ℓQ(ψ), consider updating current estimates ˘ψ. The new parameters ˘ψ(new) are given by solving ˘ψ(new) = arg min ψ ℓQ( ˘ψ)+ ˜U T Q (ψ − ˘ψ)+(ψ − ˘ψ)T ˜γψ(ψ − ˘ψ)+λ(|ψ12|+|ψ13|+|ψ23|). (1.32) The solution for this problem is given by ˘ψjk(new) = (cid:16) ˜γψ ˘ψjk − ˜UQ(j, k) (cid:17) (cid:16) 1 − ˜γψ λ ˘ψjk− ˜UQ(j,k)| |˜γψ (cid:17) + , (1.33) where ˜γψ is the largest eigenvalue of ˜HQ. The entire procedure is summarized in Algorithm 1. 12 Algorithm 1: Algorithm for penalized estimation of the copula parameters. 1. Initialize ˜ψ. 2. (Outer Loop) Update the penalized objective function in Equation (1.27). (a) Compute ˜UQ, and ˜HQ. (b) Compute ˜γψ, the largest eigenvalue of ˜HQ. (c) (Inner Loop) Obtain the minimizer of the objective function in (1.27). • Initialize ˘ψ = ˜ψ. • Repeat the following until ˘ψ converges. – Compute ˘ψ(new) by Equation (4.38). – Set ˘ψ = ˘ψ(new). • Set ˜ψ = ˘ψ. 3. Repeat until ˜ψ converges. 1.3 The Distribution of Correlation Matrices The authors Pourahmadi and Wang (2015) have shown any p dimensional correlation matrix Σ can be Cholesky decomposed into Σ = BBT , where B is a lower triangular matrix with entries bjk, and bjk =    1 for j = k = 1, cos θj1 k−1 (cid:89) sin θjh h=1 cos θjk · k−1 (cid:89) h=1 for j = 2, · · · , p, for j = k > 1, (1.34) sin θjh for 2 ≤ k ≤ j − 1, 13 where θjk, j > k are some angles. 
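A minimal sketch of this construction for a general dimension p (written with zero-based indices) is given below. Because Σ = BB^T with unit-norm rows of B, the result has unit diagonal and is positive semi-definite by construction; the three-dimensional example that follows is a special case.

```python
import numpy as np

def corr_from_angles(theta):
    """Build the Cholesky factor B from a lower-triangular matrix of angles,
    following Equation (1.34), and return the correlation matrix Sigma = B B^T."""
    p = theta.shape[0]
    B = np.zeros((p, p))
    B[0, 0] = 1.0
    for j in range(1, p):
        B[j, 0] = np.cos(theta[j, 0])
        for k in range(1, j):
            B[j, k] = np.cos(theta[j, k]) * np.prod(np.sin(theta[j, :k]))
        B[j, j] = np.prod(np.sin(theta[j, :j]))
    return B @ B.T

# Placeholder angles in (0, pi); only the strictly lower-triangular entries are used.
rng = np.random.default_rng(1)
p = 4
theta = np.zeros((p, p))
low = np.tril_indices(p, k=-1)
theta[low] = rng.uniform(0.0, np.pi, size=len(low[0]))

Sigma = corr_from_angles(theta)
print(np.allclose(np.diag(Sigma), 1.0))   # unit diagonal
print(np.linalg.eigvalsh(Sigma).min())    # smallest eigenvalue is non-negative (up to rounding)
```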
For example, in a three dimensional setting, 1 0 B = cos θ21 sin θ21       0 0       . (1.35) cos θ31 cos θ32 sin θ31 sin θ32 sin θ31 The matrix B is unique if its diagonal entries are positive, or equivalently if the angles are restricted (cid:1) matrix Θ = (θjk) is to the range (0, π). Furthermore, the transformation from Σ to a (cid:0)p (cid:1) × (cid:0)p 2 2 one-to-one, where θjk = 0 for j ≤ k and θjk are angles for j > k. Thus, this helps us to generate valid high-dimensional correlation matrices using (cid:0)p (cid:1) parameters in the matrix Θ. 2 Now, we may wonder if the resulting matrix from the estimating procedure shown in Section 1.2 would be a valid correlation matrix that is positive semi-definite. However, it would not be an issue for practical application since the estimated correlations will be close to the true underlying correlations of the data when the sample size is large enough. Practically, the resulting correlation matrix in Section 3 is positive semi-definite since the smallest eigenvalue is positive. Figure 3.2 shows that the smallest eigenvalue decreases when the tuning parameter gets smaller, which means too many non-zero parameters may result in an invalid correlation matrix. Future work may use the (cid:0)p (cid:1) parameters in matrix Θ as alternatives to correlations ρjk in the copula model, to avoid any 2 possible theoretical problems in the algorithm. For this, the following compact expression from Pourahmadi and Wang (2015) will be useful to transform correlations to thetas: ρjk = cj1ck1 + j−1 (cid:88) h=2 cjhckh h−1 (cid:89) l=1 sjlskl + ckj j−1 (cid:89) l=1 sjlskl, 1 ≤ j < k ≤ p, (1.36) where cjk = cos θjk and sjk = sin θjk. However, applying the expression to the copula model may increase the computational complexity, especially in the differentiation parts. In the thesis, we rely on the ρij = tan−1(ψij)/(π/2) mapping instead of the alternative parameterization using Θ for its simplicity. 14 CHAPTER 2: SIMULATION 2.1 Influence of Spurious Correlations The purpose of this simulation is to analyze the influence of spurious correlations and further demonstrate the advantages of setting certain correlations to zero. Assume we have the following correlation matrices with their corresponding situation: 1 0.9 0.1 0.9 0 1 0.9 −0.1       A = 0.9 1 0.1 0.1 0.1 1       , B =       1 0.9 0       , C =  0      1 1 0 −0.1 −0.1 1       0.9 1 −0.1 , (2.1) • A: Positive spurious correlations in the other off-diagonal entries • B: True underlying correlation matrix without spurious correlations • C: Negative spurious correlations in the other off-diagonal entries The task is to analyze the variance and the 95th percentile of insurance portfolio losses with each dependence matrix from the simulation. To do so, we simulate B samples from copulas with each correlation matrix and denote the randomly generated uniform variables by uA,ij, uB,ij, uC,ij, for i = 1, · · · , B and j = 1, 2, 3. Assume the distribution of each response is known, which is a gamma distribution with shape 2 and scale 100. The realized losses can be simulated using inverse transform sampling: yA,ij = F −1(uA,ij), yB,ij = F −1(uB,ij), yC,ij = F −1(uC,ij). 
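The sampling step just described can be sketched as follows, shown for the matrix A of Equation (2.1) and the gamma marginal with shape 2 and scale 100 assumed above; the seed and the number of samples are arbitrary choices, and the same steps with B and C give the other two scenarios.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Correlation matrix A from Equation (2.1): positive spurious correlations.
A = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

n_sim = 100_000

# Draw from the normal copula: multivariate normal, then component-wise standard normal cdf.
z = rng.multivariate_normal(np.zeros(3), A, size=n_sim)
u_A = stats.norm.cdf(z)

# Inverse transform sampling to the gamma(shape=2, scale=100) marginal.
y_A = stats.gamma(a=2, scale=100).ppf(u_A)
print(y_A.shape)  # (100000, 3); repeating with B and C gives y_B and y_C
```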
Since we are interested in the simulated insurance portfolio losses with the dependence structure A, B, and C, the losses should be aggregated for analysis TA,i = (yA,i,1 + yA,i,2 + yA,i,3), i = 1, · · · , B TB,i = (yB,i,1 + yB,i,2 + yB,i,3), i = 1, · · · , B (2.2) TC,i = (yC,i,1 + yC,i,2 + yC,i,3), i = 1, · · · , B, 15 The variance and the 95th percentiles are plotted with increasing sample sizes B and shown in Figure 2.1. Figure 2.1: Influence of spurious correlations In Figure 2.1, it is obvious that spurious correlations may affect the estimations of portfolio risk. Positive spurious correlations may overestimate the portfolio variance and 95th percentile, while negative spurious correlations resulted in underestimation. This indicates the usefulness of inducing sparsity into the resulting correlation matrix that may reduce the spurious correlations. 16 0e+002e+054e+056e+058e+051e+068500095000105000VarianceBVariancePositive spurious correlationsWithout spurious correlationsNegative spurious correlations0e+002e+054e+056e+058e+051e+06115011701190121095th PercentileB95th PercentilePositive spurious correlationsWithout spurious correlationsNegative spurious correlations 2.2 C and IC Analysis The goal of this simulation is to know how the sample size n and the dimension p impact the algorithm on identifying zero and non-zero correlation coefficients. In order to reach the goal, we plan to randomly generate a p-dimensional correlation matrix with some zero entries and simulate n observations from a copula using this matrix. Note that the distribution of correlation matrices in Section 1.3 is useful when generating a correlation matrix with a particular dependence structure, (cid:1) parameters in the lower triangular matrix and the inputs of the generation approach are the (cid:0)p (cid:1) parameters in Θ to π/2 and the rest 10% to π/4. It results in a Θ. Here, we set 90% of the (cid:0)p 2 2 correlation matrix with 90% of the off-diagonal entries being zero, since bjk = 0 for k < j when cos(θjk) = cos(π/2) = 0. Comparing the generated matrix, which is the true correlation matrix that produces the data, and the estimated correlation matrix from the algorithm, we classify the entries of the correlation matrix into the following cases: • CZ (Correctly identified zero): This occurs when the true coefficient is zero, and the algorithm correctly identifies it as zero. • ICZ (Incorrectly identified zero): This occurs when the true underlying coefficient is zero, but the algorithm thinks it is non-zero. • CNZ (Correctly identified non-zero): This occurs when the true coefficient is non-zero, and the algorithm thinks it is non-zero. • ICNZ (Incorrectly identified non-zero): This occurs when the true coefficient is non-zero, but the algorithm identifies it as zero. We duplicate the simulation 100 times, and the average number of CZ, ICZ, CNZ, and ICNZ cases with different sample size (n) and dimension (p) are listed in Table 2.1. Note that the Oracle is the idealized situation, the HDcop is the high-dimensional algorithm of our research, and the MLE is the maximum likelihood estimation approach. In the first panel of Table 2.1, the number of responses is fixed to p = 10, the algorithm has a high CZ value for all sample sizes, indicating that it was successful at identifying the zero 17 coefficients. The ICZ value is higher when the sample size is small, but it decreases quickly as the sample size increases. On the contrary, the CNZ value increases as the sample size increases. 
On the other hand, MLE seems to be unsuccessful at identifying the true zero coefficients, but it identifies the true non-zero well. In the second panel of Table 2.1, we fix the sample size to n = 400, and the number of correctly identified coefficient (CZ and CNZ) increases when dimension p increases. However, the MLE seems poor at identifying any true zero and non-zero coefficients. Oracle Sample size (n) 100 CZ 40 ICZ 0 CNZ 5 ICNZ 0 CZ 39.90 ICZ 3.49 CNZ 1.51 ICNZ 0.10 CZ 0.00 ICZ 0.00 5.00 CNZ ICNZ 40.00 HDcop MLE Oracle Dimension (p) CZ ICZ CNZ ICNZ CZ ICZ CNZ ICNZ CZ ICZ CNZ ICNZ MLE HDcop 3 2.68 0.00 0.32 0.00 2.61 0.00 0.32 0.07 0.00 0.00 0.00 2.68 Table 2.1: C and IC analysis result 200 40 0 5 0 39.54 0.00 5.00 0.46 0.00 0.00 5.00 40.00 4 5.26 0.00 0.74 0.00 5.25 0.00 0.59 0.16 0.00 0.00 0.00 5.41 300 40 0 5 0 39.48 0.00 5.00 0.52 0.00 0.00 5.00 40.00 5 8.95 0.00 1.05 0.00 8.82 0.00 1.05 0.13 0.00 0.00 0.00 8.95 400 40 0 5 0 39.49 0.00 5.00 0.51 0.00 0.00 5.00 40.00 6 13.53 0.00 1.47 0.00 13.05 0.00 1.86 0.09 0.00 0.00 0.00 13.14 500 40 0 5 0 39.44 0.00 5.00 0.56 0.00 0.00 5.00 40.00 7 18.38 0.00 2.62 0.00 18.41 0.00 2.32 0.27 0.00 0.00 0.00 16.68 600 40 0 5 0 39.49 0.00 5.00 0.51 0.00 0.00 5.00 40.00 8 24.51 0.00 3.49 0.00 24.29 0.00 3.42 0.29 0.00 0.00 0.00 24.58 700 40 0 5 0 39.37 0.00 5.00 0.63 0.00 0.00 5.00 40.00 9 31.41 0.00 4.59 0.00 31.27 0.00 4.43 0.30 0.00 0.00 0.00 31.57 800 40 0 5 0 39.46 0.00 5.00 0.54 0.00 0.00 5.00 40.00 10 39.61 0.00 5.39 0.00 38.70 0.00 5.75 0.56 0.00 0.00 0.00 39.26 900 40 0 5 0 39.38 0.00 5.00 0.62 0.00 0.00 5.00 40.00 11 47.85 0.00 7.15 0.00 47.61 0.00 6.87 0.52 0.00 0.00 0.00 48.13 1,000 40 0 5 0 39.40 0.00 5.00 0.60 0.00 0.00 5.00 40.00 12 57.73 0.00 8.27 0.00 56.68 0.02 8.55 0.75 0.00 0.00 0.00 57.43 2.3 Model Performance The goal of this simulation is to test the performance and precision of the dependence model described in Section 1.2. In other words, we want to compare the underlying correlation matrix, which produced the data, and the resulting matrix from the estimating procedure. First, according to the generation approach in Section 1.3, we randomly generate a lower triangular matrix Θ with 18 (cid:0)5 2 (cid:1) angles in range (0,π) as input. For example, (θ21, θ31, θ41, θ51, θ32, θ42, θ52, θ43, θ53, θ54) = (1.570, 1.568, 0.964, 1.213, 1.561, 2.108, 1.743, 2.415, 1.690, 1.201) Then, by Equation 1.34, the lower triangular matrix B is as follow 0 1 0 0 0.0098 0.9999 0 0 0  1 0.0008            B = 0.0028 0.5702 −0.4204 −0.5275 0.4689 0.3502 −0.1605 −0.1097 0.3312 0.8543 0 0 0 0             So that, our underlying correlation matrix Σ will be  1 0.001 0.003 0.57 0.35             0.001 1 0.01 −0.42 −0.16 Σ = BBT = 0.003 0.01 1 −0.53 −0.11 0.57 −0.42 −0.53 1 0.48 0.35 −0.16 −0.11 0.48 1 After simulating 2000 observations from copula with dependent matrix Σ and applying the copula estimation routine shown in Section 1.2, we obtain the estimated dependence parameters and show them as the following correlation matrix A. 
            A = 1 0 0 0 1 0 0 0 1 0.5475 0.2977  −0.3848 −0.1583 −0.4822 −0.0705 0.5475 −0.3848 −0.4822 1 0.4540 0.2977 −0.1583 −0.0705 0.4540 1 19 (2.3) (2.4) (2.5) (2.6)                       Comparing the resulting matrix A with the underlying correlation matrix Σ, all of the estimates seem close to the true value, and the underlying coefficients that are close to zero, such as 0.001, 0.003, and 0.01 in Σ, end up being shrunk to exactly zero in the resulting matrix A. It indicates that the dependent model estimates the parameters well and successfully eliminates the irrelevant ones. 20 CHAPTER 3: RESULTS In this section, the method in Chapter 1 is applied to estimate the dependence of indemnity amounts among the selected commodities in the U.S. federal crop insurance data. Here, the selected commodities are the top 3 prevalent crops: corn, soybean and wheat. 3.1 Data Description The raw dataset is the U.S. crop insurance dataset on the Risk Management Agency website (RMA) 1which is publicly available and includes the liability amounts and indemnity amounts for each farm. The samples are composed of 368,720 farms, located in 2356 different counties in the United States. Each farm grows only one of the crop commodities and the commodity types are 188 in total. Crops grown in similar regions shared environmental conditions and may be exposed to common hazards, leading to the correlation in the indemnity amounts. The dependency among commodity types in the same region may be important for the insurance/reinsurance company interested in the Value at Risk (VaR) of specific insurance contracts covering any subset of the commodities. For the analysis in this thesis, the crop indemnity amounts and liability amounts for each farm are further aggregated by county level, resulting in 2356 observations. Each observation (county) has the total indemnity and liability for each commodity type. In the aggregated dataset, we are given the observed indemnity Zij and liability Lij amount. Table 3.1 shows the partial dataset of the aggregated liabilities amount and indemnities amount by the top 3 prevalent crops (corn, soybean, and wheat). According to the dataset, we can see that observed indemnity Zij must be zero when liability Lij is zero, and Zij may be zero or positive number when Lij is positive. 3.2 Marginal Models Suppose the random variable Yij represents the realization of the indemnity amount for the j-th commodity type in county i. As the assumption of Yij is gamma distributed, we select the positive observed indemnity amounts Zij as Yij (response) with their corresponding liability amounts Lij 1 https://www.rma.usda.gov 21 Table 3.1: Dataset of liabilities and indemnities County (i) 1 2 3 4 5 6 7 8 9 10 Liab. Liab. Liab. Indem. Indem. Indem. Corn(Li1) Corn(Zi1) Soy Bean(Li2) Soy Bean(Zi2) Wheat(Li3) Wheat(Zi3) 15482 9605 33660 11957 0 0 96086 0 451107 0 119156 1564468 118968 552515 97843 43851 556543 0 1563561 72413 109918 1432266 329826 188870 351199 320717 297396 59827 647271 85054 10957 297804 0 9069 51486 0 66829 0 153198 31204 60485 126290 37598 25364 0 0 239153 0 682013 5738 0 79946 3909 0 0 4261 1084 0 17755 7163 (exposure) from the aggregate dataset. Then, Yij > 0 for each i and j allow us to fully define the density and distribution of Yij for the copula model. 
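A minimal sketch of this kind of marginal fit is shown below, using the gamma GLM with log link that is specified in Equation (3.1) later in this section. The data frame, column names, and simulated values are hypothetical (the coefficients used to generate them are loosely based on the corn column of Table 3.2), and statsmodels is just one possible way to fit such a model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical county-level data for one commodity.
rng = np.random.default_rng(3)
liability = rng.uniform(1e4, 1e6, size=200)
mean_indemnity = np.exp(0.8 + 0.77 * np.log(liability))       # log link structure
indemnity = rng.gamma(shape=2.0, scale=mean_indemnity / 2.0)
df = pd.DataFrame({"indemnity": indemnity, "liability": liability})

# Keep the positive indemnities and use log liability as the exposure covariate.
pos = df[df["indemnity"] > 0].assign(log_liab=lambda d: np.log(d["liability"]))

# Gamma GLM with log link: log E(Y) = beta0 + beta1 * log(L).
fit = smf.glm("indemnity ~ log_liab", data=pos,
              family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(fit.params)  # intercept and slope, analogous to the estimates in Table 3.2
```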
The gamma distribution is commonly used in actuarial science since it is only defined for positive values, and this is appropriate for modeling the variables that represent quantities in the insurance field that cannot be negative, such as claim sizes and loss amounts. Although heavy-tail distributions are recommended for individual loss severity modeling in the recent literature, we still use the gamma distribution instead of heavy-tailed loss distributions in this research for simplicity and practical purpose. Moreover, according to the central limit theorem, the skewness in the distribution of aggregated indemnity amount will reduce while the loss amounts for more and more farms are aggregated, thus gamma distribution may be a more suitable choice for the model. Since the indemnity amount Yij follows gamma distribution and the expected indemnity amount can be exposed using the corresponding liability amount Lij, the marginal model is defined by GLM with gamma distributed dependent variable. The right panel in Figure 3.1 shows that the liability amounts and the indemnity amounts have a positive relationship at the county level after log transformation. Therefore, we assume a log link, and the marginal model is defined as log (E(Yij)) = β0 + β1 log(Lij). (3.1) 22 Figure 3.1: Scatter plot of liabilities and indemnities at the county level And the estimated coefficients of the model for each commodity type are shown in Table 3.2. Table 3.2: Coefficient estimates for the marginal gamma model (Intercept) LogExposurePos Corn Soybean Estimate Std. Error Estimate Std. Error Estimate Std. Error 0.8038 0.7738 0.2531 0.0161 0.2860 0.0213 0.1038 0.8445 2.1346 0.6753 0.3750 0.0235 Wheat 3.3 Dependence Model Using the marginals and the dependence model described in Chapter 1, we obtain estimates for the dependence parameters of the selected commodities shown in Table 3.3. These estimates are obtained using the tuning parameter that minimizes the BIC score, as shown in Figure 3.2. Table 3.3: Correlation of selected commodities Corn Soybean Wheat Corn 1 0.4660 0.1821 Soybean Wheat 1 0.1923 1 In Figure 3.2, the correlation estimates are shown for different log lambda in the left penal. Although there are only 3 correlations, we can still observe that more coefficients are shrunk to zero when lambda increases. The BIC scores corresponding to different log lambda values are 23 0.0e+005.0e+081.0e+091.5e+090.0e+005.0e+081.0e+091.5e+09LiabilitiesIndemnities0510152005101520Log LiabilitiesLog Indemnities displayed in the upper right panel, and the vertical line indicates the value of log lambda that minimized the BIC score. Here, log(λ) = −9.21034 leads to the minimum BIC and will be set as the optimal tuning parameter to estimate coefficients. The smallest eigenvalue of the correlation matrix Σ with the corresponding log lambda is shown in the bottom right panel, where the panel indicates that the correlation matrix for the optimal tuning parameter is positive semi-definite since all of its eigenvalues are non-negative. Figure 3.2: Solution path and optimal tuning parameter for selected commodities In the 3-dimensional copula case, with only three dependent parameters, there is sufficient data to estimate the parameters accurately. Thus, the lasso penalty might not be as crucial, and the optimal tuning parameter may not shrink any coefficient to exactly zero, as depicted in Figure 3.2. 
However, in higher-dimensional copula cases, the model may heavily rely on regularization (L1 penalty) to deal with sparse data and prevent overfitting, leading to more coefficients being shrunk toward zero, ultimately resulting in a positive semi-definite correlation matrix for the optimal tuning parameter. 3.4 Value at Risk (VaR) In order to demonstrate the practical application of the estimated model, let’s consider a fictional insurance contract that covers a selection of commodities, including corn, soybean, and wheat. 24 Suppose the correlations among the commodity type are shown in Table 3.3, and the marginal models are defined as equation (3.1) with coefficients in Table 3.2. Furthermore, the log liability amounts for each commodity are 10. Using the marginal models and the given log liability amounts, the distribution of the indemnity amount for each commodity may be known. And further, the indemnity amount can be randomly sampled using the simulation from copula with correlations shown in Table 3.3 and inverse cdf transformation. In other words, random indemnity amounts can be simulated using the copula and marginal models based on the dependence structure for the selected commodities. The simulation is repeated B = 100,000 times, and the indemnity amounts are aggregated each run to calculate the Value at Risk for the insurance portfolio losses. To compare the Value at Risk for the insurance coverage under the dependence and independence model, this simulation process may be applied to an independence model using an identity matrix as the correlation matrix. Once the simulations are complete, the resulting aggregate indemnity amounts from the two models can be compared using histograms. Figure 3.3 displays the histograms for the dependence model and the independence model, and the vertical line indicates the 95th percentile for each case. It can be observed that the dependence model results in a more skewed distribution of aggregate indemnity amounts and a higher 95th percentile, indicating that the dependence among commodity types can impact the Value at Risk for insurance coverage. The 95th percentile is 42,255 for the dependent case, which corresponds to the Value at Risk at a 95% confidence level, indicating there is a 5% chance that the losses will exceed 42,255. In contrast, the 95th percentile is 37,683 under independent circumstances. To gain a more comprehensive understanding of the insurance portfolio’s risk, we calculate the Conditional Value at Risk (CVaR), also known as Tail Value at Risk. CVaR is the expected loss given that the loss exceeds the VaR level, it offers more information about the potential severity of extreme events. The CVaR at a 95% confidence level is 53,220 for the dependent case, and 46,086 for the independent case. For this fictional insurance contract, both the Conditional Value at Risk (CVaR) and Value at Risk (VaR) turn out to be larger under dependence. This observation 25 reminds us that when variables are dependent, we might be especially cautious about the potential for substantial losses. Figure 3.3: Simulated density of indemnities for a fictional insurance coverage 26 CHAPTER 4: CONCLUSION In this thesis, the algorithm estimates the normal copula parameters of the composite pairwise likelihood with the L1 penalty, which is used to induce sparsity into the resulting correlation matrix and avoid overfitting. 
Inducing sparsity allows the dependence parameters to be estimated even in high-dimensional problems, thus it is useful when the number of dependence parameters among the p commodity types is larger than the sample size n. In the estimation approach, the majorization-minimization is used to minimize the composite likelihood function iteratively since the function with penalty has complex constraints and is difficult to optimize directly. In practical application, the resulting dependence model can be used to calculate Value at Risk for an insurance contract covering any subset of commodities. The Value at Risk is a crucial measure for insurance companies to facilitate risk assessment and ensure sufficient capital reserves for potential losses. With the estimated dependence model, the distribution of aggregated indemnity amount for the subset of commodities can be simulated using marginal and copula models, and Value at Risk of indemnity amount covered by contract can be further computed. According to result in Section 3.4, insurance contract with positive dependencies among commodities may have higher Value at Risk than one with independence model. The thesis focuses on using 3 dimensions as an example to illustrate the dependence model and the extension to the high-dimensional model is shown in the Appendix. With the three-dimensional setting, the complex theoretical functions and core concepts may be simplified and easily understood. There are some limitations that may be addressed in future work. One is the approach does not use the true distribution of correlation matrices in Section 1.3, thus it is possible to result in an invalid correlation matrix with non-positive semi-definiteness. Another is the approach is restricted in normal copula cases. For future research, apply the alternative parameterization using Θ to the algorithm. Moreover, the approach explained in this thesis may be extended to other copula families. Last, we only set liability amounts as exposure for modeling the expected indemnity amount in the thesis, however, if there are more explanatory variables available, including them as exposures may help improve the marginal model. 27 BIBLIOGRAPHY Nelsen, R.B. An Introduction to Copulas; Vol. 139, Springer Science & Business Media, 1999. Hofert, M.; Kojadinovic, I.; M¨achler, M.; Yan, J. Elements of Copula Modeling with R; Springer, 2018. Czado, C. Analyzing Dependent Data with Vine Copulas: A Practical Guide With R; Springer, 2019. Frees, E.W.; Valdez, E.A. Understanding relationships using copulas. North American Actuarial Journal 1998, 2, 1–25. Nikoloulopoulos, A.K.; Karlis, D. Regression in a copula model for bivariate count data. Journal of Applied Statistics 2010, 37, 1555–1568. Shi, P.; Emiliano, V. Longitudinal modeling of insurance claim counts using jitters. Scandinavian Actuarial Journal 2014, 2014, 159–179. Tibshirani, R. Regression shrinkage and selection via the lasso. Statistics and Computing 1996, 58, 267–288. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society. Series B (Methodological) 2006, 68, 49–67. Yang, Y.; Zou, H. A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing 2015, 25, 1129–1141. Qian, W.; Yang, Y.; Zou, H. Tweedie’s compound poisson model with grouped elastic net. Journal of Computational and Graphical Statistics 2016, 25, 606–625. Frees, E.W.; Gao, J.; Rosenberg, M.A. 
Predicting the frequency and amount of health care expenditures. North American Actuarial Journal 2011, 15, 377–392. Frees, E.W. Frequency and severity models. In Predictive Modeling Applications in Actuarial Science; Frees, E.W.; Meyers, G.; Derrig, R.A., Eds.; Cambridge University Press: Cambridge, 2014. Zhao, Y.; Joe, H. Composite likelihood estimation in multivariate data analysis. The Canadian Journal of Statistics 2005, 33, 335–356. Song, P.X.K.; Fan, Y.; Kalbfleisch, J.D. Maximization by parts in likelihood inference. Journal of the American Statistical Association 2005, 100, 1145–1158. Joe, H. Dependence Modeling with Copulas; CRC Press, 2014. Pourahmadi, M.; Wang, X. Distribution of random correlation matrices: Hyperspherical parameterization of the Cholesky factor. Statistics and Probability Letters 2015, 106, 5–12. Frees, E.W.; Lee, G.Y.; Yang, L. Multivariate frequency-severity regression models in insurance. Risks 2016, 2016. 28 Assumptions of General Case (p dimensions) APPENDIX Consider the general problem, where we are given observations of a continuous multivariate response Zij, and liability (exposure) amounts Lij. The observations are given in two matrices: z11 z12 · · · z21 ... z22 ... · · · . . . z1p z2p ...                   and          L11 L12 · · · L1p L21 L22 ... ... · · · L2p ... . . . zn1 zn2 · · · znp Ln1 Ln2 · · · Lnp          . (4.1) In practice, we may observe a matrix of explanatory variables in addition to the liabilities. Yet, for simplicity of the illustration, we assume in this paper that there are no additional explanatory variables observed. The principle may be easily extended to the case with more covariates (explanatory variables). We assume that Zij holds the value of zero when Lij is zero. Yet, this is not the only case when Zij is zero. Typically, the value Zij may be either zero or a positive number with some probability, say P r(Rij = 1|Lij > 0) = E[Rij|Lij > 0], (4.2) where Rij may be thought of as an indicator of the response being positive, given a postive liability. This is a typical situation arising in insurance claims modeling, where the data are unbalanced due to the response being undefined when the liability amount Lij is zero, and given a positive Lij the response is observed. In practice, the frequency of insurance claims may also be observed, resulting in the modeling of average severities, as opposed to the total severity. An exampe of where the frequencies are observed is the article by Frees et al. (2016). The method proposed in this paper can be extended to this situation as well. Yet, for now let’s assume we are interested in modeling the total severity Zij directly. We hope to understand the dependence structure of the observations of the Zij response. For this, we decompose the response, so that Zij = I(Lij > 0) · Rij · Yij. (4.3) 29 Here, we assume that the indemnity amount Yij satisfies Yij > 0 for each i and j. This allows the density of Yij to be well defined. We imagine that there is a binary random variable Rij for each indemnity amount Yij, that holds the value 1 when the observed indemnity amount is positive. We also assume that Rij and Yij are independent. We may define the observed indemnity amount as the product of Yij and Rij, so that Rij · Yij are the observed indemnity amounts. Notice that the following relationship holds: Zij =   Yij if Rij = 1 and Lij > 0  0 otherwise. (4.4) The liability amount Lij may also be zero or a positive number. 
For this reason, the observed response Zij is positive only if the liability is positive, and zero (in fact undefined) otherwise. We consider the general situation, where the marginal models are either given, or may be estimated from data separately using the Inference for Margins (IFM) approach, described in books and articles on copula modeling, such as Joe (2014). Thus, we assume that we know the parametric form and parameters of the distributions Fi1(yi1), Fi2(yi2), · · · , Fip(yip), as well as the densities fi1(yi1), fi2(yi2), · · · , fip(yip). In other words, we are given marginal models dependent on the liability amounts Lij so that Fij(y) = F (y; Lij), i = 1, · · · , n, j = 1, · · · , p. (4.5) Using these, we hope to find a joint distribution among p variables, f (yi1, yi2, · · · , yip). This task may be accomplished using a copula. Here, we make an assumption that there exists a multivariate normal copula C of p-dimension, linking the marginal distributions, so that C(u1, · · · , up) = Φp(Φ−1(u1), · · · , Φ−1(up); Σ), (4.6) where Φp is the cumulative distribution function for a p-dimensional multivariate normal distribution 30 with zero means and correlation matrix Σ, and Φ−1 are the inverse cumulative distribution functions for standard normal distributions. The density for the copula is c(u1, · · · , up; Σ) = ∂p ∂u1 · · · ∂up C(u1, · · · , up; Σ). (4.7) The joint density is then given by f (yi1, yi2, · · · , yip) = c (Fi1(yi1), Fi2(yi2), · · · , Fip(yip); Σ) · fi1(yi1) · fi2(yi2) · · · fip(yip). (4.8) Note that by separating Yij from Zij by factoring out Rij and Lij, we may assume for every observation i, there exists such a density. Because we are assuming the marginal models are given, we may assume the input for the copula estimation routine is a matrix of uniformly distributed values uij, and indicators r∗ ij = rij · I(Lij > 0) where and uij =   Fi(zij) Rij = 1, and Lij > 0  0 Rij = 0, or Lij = 0, r∗ ij =    1 Rij = 1, and Lij > 0 0 Rij = 0, or Lij = 0. (4.9) (4.10) Shrinkage Estimation of Copula Parameters In order to estimate the dependence parameters in the matrix Σ, we use the composite likelihood approach. Specifically, we minimize the composite negative log-likelihood function ℓ(Σ) = − 1 mi − 1 n (cid:88) (cid:88) i=1 j>k r∗ ij =1 r∗ ik=1 (cid:20) (cid:21) log c (Fij(yij), Fik(yik); ρjk) + log fj(yij) + log fk(yik) , (4.11) 31 where c is a bivariate normal copula, and the weight for observation i, is mi and is obtained by mi = p (cid:88) j=1 r∗ ij, (4.12) or in other words, the number of non-zero responses for observation i. Note that given the marginal distributions Fij for each i = 1, · · · , n, j = 1, · · · , p, the likelihood in Equation (4.11) is a function in the dependence parameters only. Note that Σ is symmetric. Σ =          1 ρ12 · · · ρ1p ρ21 ... 1 ... ρp1 ρp2 · · · ρ2p ... . . . · · · 1          . (4.13) If p is large, then the problem is high-dimensional, since the number of dependence parameters in the matrix Σ is exactly (cid:0)p (cid:1). In order to automatically reduce the number of non-zero dependence 2 parameters, we use a shrinkage approach. For this, let ρjk = tan−1(ψjk) π/2 , −∞ < ψjk < ∞, (4.14) which ensures that for any ψjk, the resulting dependence parameter satisfies −1 < ρjk < 1. We then attempt to minimize the penalized negative log likelihood ℓP (Ψ) = ℓ(Σ) + λ |ψjk|, (cid:88) j>k r∗ ij =1 r∗ ik=1 (4.15) 32 where  Ψ = 1 ψ12 · · · ψ1p         ψ21 ... 1 ... ψp1 ψp2 · · · ψ2p ... . . . 
· · · 1          . (4.16) The problem boils down to estimating the parameters in the Ψ matrix, some of which may end up being zero due to the L1 penalty on the likelihood. This may be done using coordinate descent. In this thesis, the tuning parameter λ is to be determined by minimizing the BIC score after solving the problem for a set of λ values. In order to estimate the dependence parameters, let Φijk = − 1 mi − 1 (cid:20) (cid:21) log c (Fij(yij), Fik(yik); ρjk) + log fij(yij) + log fik(yik) , (4.17) so that ℓ(Σ) = Φijk. n (cid:88) (cid:88) i=1 j>k r∗ ij =1 r∗ ik=1 (4.18) Here, we assume that c is the density of a bivariate normal copula. This assumption allows us to write the following: Φijk = −   log  (cid:113) 1 mi − 1   − 1 1 − ρ2 jk where (a2 i + b2 jk − 2aibiρ i )ρ2 2(1 − ρjk)2 + log fij(yij) + log fik(yik)  ,  (4.19) (4.20) √ ai = 2 erf−1(2uij − 1), uij = Fij(yij) √ 2 erf−1(2uik − 1), bi = uik = Fik(yik), 33 with erf(z) = 2 √ π (cid:90) z 0 exp(−t2)dt. (4.21) Furthermore, remember that we have parametrized the dependence parameters so that we have ρjk = 2 tan−1(ψjk)/π. In order to obtain the gradient and hessian of the likelihood in terms of the parameters ψjk, consider ∂Φijk ∂ψjk = ∂Φijk ∂ρjk ∂ρjk ∂ψjk . The first and second terms in the product may be obtained by differentiation. ∂Φijk ∂ρjk = − 1 mi − 1 (cid:34) ρjk 1 − ρ2 jk + aibiρ2 jk − (a2 i + b2 i )ρjk + aibi (1 − ρ2 jk)2 ∂ρjk ∂ψjk = 2 π(1 + ψ2 jk) . Furthermore, we have ∂2Φijk ∂ψ2 jk = ∂2Φijk ∂ρ2 jk (cid:18) ∂ρjk ∂ψjk (cid:19)2 + ∂Φijk ∂ρjk (cid:32) (cid:33) , ∂2ρjk ∂ψ2 jk (cid:35) (4.22) (4.23) (4.24) (4.25) where and (cid:34) ∂2Φijk ∂ρ2 jk = − 1 mi − 1 1 + ρ2 jk (1 − ρ2 4 (cid:8)aibiρ2 jk)2 + + {2aibiρjk − (a2 jk)2 i )} (1 − ρ2 i + b2 (1 − ρ2 jk)4 (cid:9) (1 − ρ2 jk − (a2 i + b2 i )ρjk + aibi jk)4 (1 − ρ2 jk)ρjk (cid:35) , (4.26) ∂2ρjk ∂ψ2 jk = − 4ψjk π(1 + ψ2 jk)2 . (4.27) 34 Denote the flattened vector of parameters by a (cid:0)p 2 (cid:1) dimensional vector ψ = (ψ21, ψ31, · · · , ψp1, ψ32, ψ42, · · · ψp2, · · · , ψp,p−1)T . (4.28) The Hessian matrix has dimension (cid:0)p (cid:1) × (cid:0)p (cid:1), and is diagonal. For simplicity, we denote the entries 2 2 of the Hessian matrix using j and k, the Hessian matrix Hψ has entries Hψ(j, k) = n (cid:88) i=1 ∂2Φijk ∂ψ2 jk , j > k, (4.29) where in the three dimensional example, there are only three possible values for the j and k indices: (2, 1), (3, 1), and (3, 2). Denote the gradient vector by Uψ(j, k) = n (cid:88) i=1 ∂Φijk ∂ψjk , j > k. (4.30) Remember that j and k are indices for the dependence parameter matrix, not the Hessian matrix itself. Also remember that the off diagonal elements of Hψ are zero. We take a second order Taylor series approximation to the unpenalized negative log-likelihood function in Equation (4.18) around a given point ˜ψ. ℓQ(ψ) = ℓ( ˜ψ) + ˜U T ψ (ψ − ˜ψ) + 1 2 (ψ − ˜ψ)T ˜Hψ(ψ − ˜ψ), (4.31) Then, we would like to minimize the following penalized objective function: PQ(ψ) = ℓQ(ψ) + λ |ψjk|. (cid:88) j>k Rij =1 Rik=1 (4.32) 35 For this, we use the majorization minimization principle. First, define UQ = ∇ℓQ(ψ) HQ = ∇2ℓQ(ψ), where the entries of UQ are given by UQ(j, k) = ˜Uψ(j, k) + ˜Hψ(j, k)(ψjk − ˜ψjk) (cid:33) ∂ ˜Φijk ∂ψjk ∂2 ˜Φijk ∂ψ2 jk (cid:32) m (cid:88) (cid:32) m (cid:88) (cid:33) = + i=1 i=1 (ψjk − ˜ψjk), and HQ(j, k) = ˜Hψ(j, k) = n (cid:88) i=1 ∂2 ˜Φijk ∂ψ2 jk , (4.33) (4.34) (4.35) (4.36) where the simplicity is due to the fact that Hψ is diagonal. Notice that because of the simplicity, HQ is also diagonal. 
Given ℓQ(ψ), consider updating the current estimates ˘ψ. The new parameters ˘ψ(new) are given by solving

$$\breve{\psi}^{(\mathrm{new})} = \arg\min_{\psi}\; \ell_Q(\breve{\psi}) + \tilde{U}_Q^{T}(\psi - \breve{\psi}) + (\psi - \breve{\psi})^{T}\tilde{H}_Q(\psi - \breve{\psi}) + \lambda \sum_{\substack{j>k \\ r^{*}_{ij}=1,\; r^{*}_{ik}=1}} |\psi_{jk}|. \tag{4.37}$$

The solution to this problem is given by

$$\breve{\psi}_{jk}^{(\mathrm{new})} = \frac{\left(\tilde{\gamma}_{\psi}\breve{\psi}_{jk} - \tilde{U}_Q(j,k)\right)\left(1 - \dfrac{\lambda}{\left|\tilde{\gamma}_{\psi}\breve{\psi}_{jk} - \tilde{U}_Q(j,k)\right|}\right)_{+}}{\tilde{\gamma}_{\psi}}, \tag{4.38}$$

where $\tilde{\gamma}_{\psi}$ is the largest eigenvalue of $\tilde{H}_Q$.
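As a compact illustration of the update in Equation (4.38), the sketch below applies the coordinate-wise soft-thresholding step to a flattened vector of parameters, given a gradient vector, the curvature bound γ̃ψ, and a tuning parameter; the numerical inputs are placeholders.

```python
import numpy as np

def mm_update(psi, U_Q, gamma, lam):
    """One inner-loop update of all psi_jk, Equation (4.38):
    soft-threshold gamma * psi - U_Q at level lam, then rescale by gamma."""
    z = gamma * psi - U_Q
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0) / gamma

# Placeholder current estimates (flattened as in Equation (4.28)), gradient,
# largest Hessian eigenvalue, and tuning parameter.
psi = np.array([0.30, -0.10, 0.05])
U_Q = np.array([-0.40, 0.02, 0.20])
gamma = 2.5
lam = 0.15

print(mm_update(psi, U_Q, gamma, lam))  # entries with |gamma*psi - U_Q| <= lam shrink to zero
```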