fifth, LIBRARY Michigan State University This is to certify that the thesis entitled ASYMPTOTIC AND COMPUTATIONAL METHODS IN SPATIAL STATISTICS presented by Juan Du has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics Maw 0.0m Ma'jor'F‘r'ofessor’s Signature 5/29/07 Date MSU is an Affin'native Action/Equal Opportunity Employer _.-.—.-.-.----o—o—n-u-------n-a---n---n-a—n-.-.----.—v--n-.—.—--.-.-.-—--.-.—-—-—--.- PLACE IN RETURN Box to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 5I08 K:IPrq'/Acc&Pres/CIRCIDateDuo. indd ASYMPTOTIC AND COMPUTATIONAL METHODS IN SPATIAL STATISTICS By Juan Du A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Statistics 2009 ABSTRACT ASYMPTOTIC AND COMPUTATIONAL METHODS IN SPATIAL STATISTICS By Juan Du The dissertation consists of two parts. The first part studies connection between fixed- domain asymptotics and the equivalence of Gaussian measures. It is stressed that one of the most important probabilistic tools to establish fixed-domain asymptotics is to use the equivalence of Gaussian measures and related criteria. Two alternative proofs are attempted to show some results about asymptoic optimality of prediction and the equivalence of Gaussian measures based on reproducing kernel Hilbert space, which may have potential power to give preferable conditions for fixed-domain asymptotics in spatial domain without constrains like stationarity of underlying processes. The second part of the dissertation pertains to the application of covariance ta- pering to deal with large spatial data sets. When the spatial sample size is extremely large, as for many environmental and ecological studies, operations on the large co- variance matrix are numerically challenging. Covariance tapering is a technique to alleviate these numerical challenges. We investigate how tapering affects the asymp— totic efficiency of the maximum likelihood estimator (MLE) and establish asymptotic properties, particularly asymptotic distributions of the exact MLEs and tapered MLEs under the fixed-domain asymptotic framework for the Matérn model. We show that under some conditions on the tapering function, the tapered MLE is asymptotically as efficient as the true MLE for the microergodic parameter in the Matérn model. For the general setting, we compare the exact and tapered likelihood and their derivatives in seeking conditions on tapering which may yield no loss of efficiency. The convergence rate of effect of tapering on prediction is also studied. Finally, The computational gain and comparable estimation are illustrated by simulation studies and an application to the US precipitation data for April 1948. Copyright by Juan Du 2009 DEDICATION To my parents - Zihu Du and Wangxian Weng iv ACKNOWLEDGMENT I want to thank all the people who have helped and inspired me during my doctoral study. Working with my advisors, Professor V. S. Mandrekar and Professor Hao Zhang, was the turning point of my graduate study life. I have heart-felt gratitude for Prof. Mandrekar’s guidance of my study at Michigan State University. He accepted me as one of his students when I felt lost and hopeless three years ago. He helped me get through the most difficult time in my life by his integrity, his constant encouragement, his enthusiasm, his great efforts to explain things clearly and simply. What’s more, he respected my research interest and made my work with his former student, Prof. Zhang, possible. That has been the greatest help that only an open— minded advisor can offer. He sets an example of a world-class researcher and teacher for me to learn in my entire life. I want to express my deep and sincere appreciation to Prof. Zhang for his guidance in my research. I was delighted to interact with Prof. Zhang via email or telephone and to have had the chance to follow his direction of research. His insightful ideas and great expertise in both applied and theoretical statistics have inspired me and brought out my genuine interest in spatial statistics. In addition, he has been extremely patient and time-devoted in teaching and training me to be a researcher. Influenced by his righteousness, wise advice, kindness, humility, rigor and passion on research, my research life has become much smoother and rewarding for me. I will continue to pursue my academic goal following his example. My special gratitude goes to Professor James Stapleton, who has been my mentor in my entire graduate study. With his kind assistance with writing, speaking, teaching and so on, I have become more positive about my career and stepped out of fear. He is always available to help me and cheer me up. Professor Yimin Xiao, Professor Sarat Dass, and Professor Clifford Weil deserve enormous thanks as my thesis committee members and advisors. This thesis was com- piled using the template provided by Prof. Weil, who has kindly made this heavy work much easier. I am also grateful to Professor Connie Page, Professor Dennis Gilliland, Professor Elijah DiKong, Professor Paul Anderson, Professor Ashton Shortridge, Pro- fessor Parthanil Roy and Professor Jennifer Kaplan. I really appreciate their help in all different ways and having their valuable suggestions and comments. In particu- lar, I would like to thank Prof. Page, Prof. Gilliland for hiring me as a consultant at CSTAT. The associated experience broadened my perspective on the practical aspects of statistics. I am indebted to Professor Mark Meerschaert, Professor Lijian Yang, Professor Vincent Melfi, Ms. Cathy Sparks, Ms. Suzanne Watson and Mr. Eric Segur for their tremendous help and constant support. Furthermore, Mr and Mrs. Crickmore have always been a constant source of encouragement during the past three years. They are like my spiritual father and mother who drew me close to God. I also wish to thank my sister Zuoya Du and my friend who is from my home town, Shuzhuan Zheng, for their trust and support. Lastly, and most importantly, I wish to thank my parents, Zihu Du and Wangxian Weng. They bore me, raised me, supported me, taught me, and loved me. To them I dedicate this thesis. vi TABLE OF CONTENTS List of Figures ................................. viii List of Tables .................................. ix Introduction and Motivation 1 1.1 Fixed-domain asymptotics and tapering .................. 1 Fixed-domain asymptotics and the equivalence of two gaussian mea- sures 7 2.1 Introduction ................................. 7 2.2 Equivalence and singularity of Gaussian measures ............ 9 2.3 Equivalence of Gaussian measures and kriging .............. 22 2.4 Fixed-domain asymptotics of maximum likelihood estimators ...... 27 Covariance tapering and fixed-domain properties of tapered maxi- mum likelihood estimators 30 3.1 Introduction ................................. 30 3.2 Tail properties of tapering spectral density ................ 32 3.3 F ixed-domain asymptotic properties of tapered maximum likelihood estimators .................................. 38 3.3.1 Strong consistency of tapered MLE for Matérn model ...... 38 3.3.2 The fixed-domain asymptotics of tapered MLE for Exponential Model ................................ 41 3.3.3 Fixed-domain asymptotic distribution of MLE and tapered MLE for general Matérn model ..................... 44 3.3.4 Discussion .............................. 48 3.3.5 Appendix 1. Proofs for Section 3.3.2 ............... 49 3.3.6 Appendix 2. Proofs for Section 3.3.3 ............... 64 3.4 Comparison between tapered and exact likelihood functions ...... 78 3.4.1 Main results of comparison ..................... 78 3.4.2 Discussion ............................. 83 3.4.3 Appendix 3: Proof of Lemma in subsection 3.4.1 ......... 84 3.5 Simulation and model fitting of climate data ............... 91 3.5.1 Simulation study .......................... 91 3.5.2 Fitting a Matérn model to climate data .............. 106 3.6 Future Work ................................. 107 Bibliography ..................................................... 109 vii 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 LIST OF FIGURES Matérn and tapered covariograms ...................... 92 Sample data set locations of size 196. ................... 93 The boxplot of MLE and tapered MLE (where A: 9‘, B: (32, C: 6%”) with sample size 196. ............................ 96 The histogram of MLE and tapered MLE (where A: 6, B: {72, C: 6262”) with sample size 196 ......................... 97 The boxplot of MLE and tapered MLE (where A: d, B: 62, C: c3262”) with sample size 298 ............................. 98 The histogram of MLE and tapered MLE (where A: 0, B: 62, C: (3262”) with sample size 298 ......................... 99 The boxplot of MLE and tapered MLE (where A: 6, B: (32, C: 6262”) with sample size 385. ............................ 100 The histogram of MLE and tapered MLE (where A: 6, B: 52, C: 57292”) with sample size 385 ......................... 101 The boxplots of tapered MLE with different tapering range and increas- ing sample size. ............................... 102 3.10 The boxplots of tapered MLE with severe degree of tapering range. . . 104 3.11 Timing and estimation for tapered MLE for Matérn model ........ 105 3.12 The Precipitation Anomaly Field of April 1948. ............. 107 viii 3.1 3.2 3.3 LIST OF TABLES Average standard deviation(asd), relative coverage frequency(rcf), average length of intervals(al) of 95% CI of 0202” for Matérn model with u = 0.8. . . 103 Timing and estimation for tapered MLE with tapering range 7 = 0.1 using computer with CPU 2.8 Ghz and 8.00 GB RAM ......... 103 Fitting Matérn model with V = 0.3 and estimate consistently estimable parameter 0262”, tapering with 'y = 50 miles. .............. 107 ix Chapter 1 Introduction and Motivation 1.1 Fixed-domain asymptotics and tapering With the advancement of technology, large amounts of data are routinely collected over space and/or time in many studies in environmental monitoring, climatology, hydrology and other fields. The large amounts of correlated data present a great chal- lenge to the statistical analysis and may render some traditional statistical approaches impractical. For example, in maximum likelihood or Bayesian inference, the inverse of an n x n covariance matrix is involved, where the sample size n may be in hundreds of thousands or even larger. Inverting the large covariance matrix repeatedly is a great computational burden if not impractical, and some approximation to the likelihood is necessary. Covariance tapering is one of the approaches to approximating the covariance matrix and, therefore, the likelihood. Let the second order stationary Gaussian process X (t), t E Rd have mean 0 and an isotropic covariance function K (h; 0 , 02), where 02 is the variance of the process and 0 is the parameter that controls how fast the covariance function decays. Given n observations Xn = (X (t1) X (t,,))’ , the log-likelihood ..... I I 1 is n [11(6102) = 2 log 27r — glog[detVn(6,o2)] — gmwnwezn—lxn, (1.1.1) where Vn(0, 02) denotes the covariance matrix of Xn. The idea of tapering is to keep the covariances approximately unchanged at small distance lags and to reduce the covariances to zero at large distances. To implement the idea, let K tap be an isotropic correlation function of compact support; that is, K tap(h) = 0 if h 2 7 for some 7 > 0. Then, the tapered covariance function If is the product of K and K tap, R01; 9,02) = K(h; 6,02)Ktap(h), (1.1.2) and the tapered covariance matrix is a Hadamard product ‘77, = Vn(6,02) o Tn, where Tn has the (2', j )th element as K tap(||t,- — th). The tapered covariance matrix has a high proportion of zero elements and is, therefore, a sparse matrix. Inverting a sparse matrix is much more efficient computationally than inverting a regular matrix of the same dimension [see, e.g., Pissanetzky (1984), Gilbert, et a1. (1992) and Davis (2006)]. One would use the tapered covariance function R for spatial interpolation and estimation as if it was the correct covariance function. For example, the tapered maximum likelihood estimator maximizes the corresponding log—likelihood n 1 ~ 1 ~ _ en,tap(9,a2) = 7- log 27r — 5 log[detVn] — -2—gi,, 1X,,. (1.1.3) Intuitively, if the taper is sufficiently close to 1 in the neighborhood of the origin, the tapering would not change the behavior of the covariance function near the origin. It has been seen that the behavior of the covariance function near the origin is most important to fixed-domain asymptotics. Stein (1988, 1990a, 1990b, 1999a and 1999b) has established rigorous fixed-domain asymptotic theory for spatial interpolation. Ap- plying the general fixed-domain asymptotic theory, Furrer, Genton and Nychka (2006) 2 showed that appropriate tapering does not affect the fixed—domain asymptotic mean square error (mse) of prediction for Matérn model. In this direction, we studied further the convergence rate of comparison of mse’s of prediction with and without tapering in section 2.3 and 3.2. Kaufman, et a1. (2008) showed that the parameter in the Matérn covariance func- tion, which is consistently estimable under the fixed-domain asymptotic framework, can be estimated consistently by the tapered MLE with 9 fixed. However, it has been unknown whether the covariance tapering results in any loss of asymptotic efficiency. One of the main objectives of this thesis is to establish the asymptotic proper- ties, and particularly the asymptotic distribution of exact and tapered MLES under the fixed-domain asymptotic framework. We now make a few remarks about why we adopt the fixed-domain asymptotic framework and how to deal with fixed-domain asymptotic problems. When the spatial domain is fixed and bounded, more sample data can be obtained by sampling the domain increasingly densely. This results in the fixed-domain asymptotic framework. It is known that not all parameters in the covari- ance function are consistently estimable [e.g. Zhang (2004)] under the fixed-domain asymptotic framework.However, there is another asymptotic framework, where more data are sampled by increasing the spatial domain. This is the increasing domain asymptotic framework. Under mild regularity conditions, MLES for all parameters are consistent and asymptotically normal [see Mardia and Marshall (1984)]. There- fore, asymptotic results are quite different under the two asymptotic frameworks. In any real application, we work with a finite sample and there is a need to know which asymptotic framework is more appropriate to be applied to the finite sample. Zhang and Zimmerman (2005) showed through both theoretical and simulation studies that, for the exponential covariance function, the fixed-domain asymptotic approximation performs at least as well as increasing domain one. Actually, it has been extensively accepted that the fix-domain asymptoics is more relevant to actual spatial data col- lection. In light of these results, we adopt the fixed-domain asymptotic framework in 3 this work. Fixed-domain asymptotic results for estimation are difficult to derive in general because of the increasingly correlated observations and there are only few results in literature [see Stein (1990c), Ying (1991 and 1993), Chen, Simpson and Ying (2000), Zhang (2004), Loh (2005) and Kaufman, et a1. (2008)]. Existing asymptotic distribu- tions have been established only for specific models such as the exponential model for covariance functions [see Ying (1991 and 1993) and Chen, Simpson and Ying (2000)] and a particular Matérn model with the smoothness parameter V = 1.5 [see Loh (2005)]. For the general Matérn model, the fixed-domain asymptotic distribution is not available even when data are observed along a line. In order to evaluate the effi- ciency of the tapered MLE, we establish the fixed-domain asymptotic distribution of the MLE for the microergodic parameter (Stein (1999b), p.163), which is more impor- tant to spatial interpolation, in the general Matérn model [Theorem 2.4.3] under the assumption that data are collected along a line. This result is of interest in its own right, outside the context of tapering. It is even more difficult to study asymptotic properties of tapered MLE. Indeed, we are not aware of any fixed-domain asymptotic distribution established for tapered MLE. For this reason, we will start with a simple model, the Ornstein-Uhlenbeck process along a line, which is a stationary Gaussian process with zero mean and an exponential covariance function, and has Markovian properties. Due to the Markovian properties, the inverse of the covariance matrix can be given in closed form and is a band matrix. Therefore, for this model, it is not necessary to approximate the likelihood function. However, this simple model serves as a starting point in the study of covariance tapering and provides insight into the more general settings, which we will study subsequently. Although spatial data are usually collected over a spatial region, there are sit- uations when data are collected along lines. One example is the International H2O Project, where measurements of meteorological data were collected by surface stations 4 and aircraft along three flight paths that are along straight lines and transect under the varied environmental conditions of the southern Great Plains [see Weckworth et a1. (2004), LeMone et a1. (2007) and Stassberg et a1. (2008)]. Ecological data are sometimes collected along line transects as well. One of the most important probabilistic tools to establish fixed-domain asymptot- ics is the equivalence and singularity of Gaussian measures and the related criteria. In the next chapter, we therefore summarize those conditions for the equivalence of Gaussian measures that will be used to study fixed-domain asymptoics later and iden- tify the concordance between the equivalence of Gaussian measures and fixed-domain asymptoics. Even though most of the analytic approaches used so far for fixed-domain asymptotics are through spectral conditions for the equivalence of Gaussian measures, it should be preferable from a practice perspective to characterize the required condi— tions, such as the taper condition in spatial domain directly. With this in mind, we attempt to employ the conditions for the equivalence of Gaussian measures using re- producing kernel Hilbert space which has no constrains to stationarity of process. Two alternative proofs based on this idea are provided for some results about fixed-domain asymptoic prediction and sufficient conditions for the equivalence of two Gaussian measures. Chapter 3 focuses on tapering techniques and fixed-domain asymptotic properties of tapered maximum likelihood estimators. We note that the condition on the tail behavior of tapering plays the essential role of maintaining the asymptotic optimality of prediction and consistency as well as efficiency of estimation using tapering. This will be specifically addressed in section 3.2. In Section 3.3 the strong consistency and asymptotic normality of tapered MLE with both parameters jointly maximized for the Ornstein-Uhlenbeck process are presented first. Then, for the microergodic pa- rameter in a Gaussian stationary process having a Matérn covariogram, we establish the asymptotic distribution of both exact and tapered MLES with one of the para- meter chosen fixed. Finally for the general case, in view of the lack of fixed domain- 5 asymptotic results for MLE of joint maximization under the general setting, we will just study limiting differences between the log-likelihood and tapered log—likelihood and the difference between their derivatives. The results in section 3.4 indicate that tapered MLE and the exact MLE may have the same asymptotic distribution and therefore tapering may yield no loss of efficiency. Finally those theoretical results are supported by simulations study in two dimensional case and this suggests the possible generalization to high dimensional case of the derived asymptotic theorems in previ- ous sections. The computational gain and accuracy of results are also demonstrated by an application of climate data. We put all proofs in the three appendices. Chapter 2 Fixed-domain asymptotics and the equivalence of two gaussian measures 2. 1 Introduction There are two types of asymptotical framework in spatial statistics. One is the fixed- domain asymptotic framework (Stein(1999b), p.11) or infill asymptotic framework [Cressie(1993), p.350], where more data are taken ever densely in a fixed and bounded domain. The other one is increasing domain asymptotic framework, where more data are sampled by increasing the spatial domain and the minimum distance of nearby points is bounded below. This is the spatial analogue of the asymptotics usually observed in time series. Now, we explain more specifically about why we choose fixed- domain aymptotic framework over increasing domain one in our work. First, fixed domain asymptotics seem to be more relevant to the process of most of the spatial data collections. It is not hard to imagine intuitively that the more and more accurate local weather interpolation or kriging (best linear prediction for spatial process) can often be obtained based on more and more information available from increased number of weather stations in the same region. Moreover, Stein (1999b) mentioned in his book that, fixed domain framework is the more appropriate asymptotic framework for spatial problems related to local behavior of process like interpolation. Actually, asymptotic results are quite different under the two asymptotic frame- works, It has been mentioned earlier that not all parameters in the covariance func- tion are consistently estimable [e.g. Zhang (2004)] under the fixed-domain asymptotic framework. Zhang and Zimmerman (2005) argued that MLES of the microergodic parameters are generally consistent but those of the non-microergodic parameters in general converge in distribution to a non-degenerate distribution. Following Stein (1999, p. 163), we say that a function h(q§) is microergodic if, for all gb and (Z) in the parameter space, h(¢) # h( 0 /\—+00 a suflicient and necessary condition for the equivalence of the Gaussian measures P0 and P1 on the o-algebra .7: is that the difference between their covariance functions b(s, t), s,t E D {D is any finite interval), be ertendable to a square-integrable function 13 b(s, t), —00 < s,t < 00, whose Fourier transform gc()\,,a) satisfies |R ./lpl>R——(_— f0(/\) W (V) < 00 ( ) for any R. In particular, if the spectral density [0 is such that fee) >< W)? (22.13) for sufficiently large [A], where ;p(/\) is the Fourier transform of some square integrable finite function, then there is a sufficient condition which is relatively easy to verify in practice. Theorem 2.2.5. For spectral densities of the type given by (2.2.13), a sufficient and necessary condition for the equivalence of the Gaussian measures P0 and P1 on the pathes of {X(t),t E D}, where D is any finite interval, is that the function h : f0(/\) - f1(/\) f0(/\) satisfies the condition / |h()\)|2 dA < 00, (2.2.14) |A|>R for any R < 00. Another method to find out the sufficient and necessary conditions for the equiva- lence of Gaussian measures is to use reproducing kernel Hilbert space. This approach has no constrains like stationarity or isotropy on the underlying process. Therefore it could be a potential tool for analyzing nonstationary processes. We will review some important results given by Chatterji and Mandrekar (1978), it is worth noting that those results apply to any dimensional case: 14 Definition 2.2.6. Let D be any set and K be a real-valued covariance on D X D. Then K iscalled a covariance on D if (a) K(s,t) = K(t,s) for any s,t E D and (b) Ensel asatK(s, t) _>_ 0 for all finite subsets I of D and {as, s E I} oflR. Definition 2.2.7. Let D be any set. A Class [C(K) of functions on D forming a Hilbert space is called the reproducing kernel Hilbert space (for short, rkhs) for a covariance K . The following theorem gives existence and uniqueness of rkhs. Theorem 2.2.8. [Aronszajn(1950)/ Let D be any set and K be a real-valued function on D x D. Then there exists a unique Hilbert space lC( K) of functions on D, satisfying K(-, t) E [C(K) for each t E D; (f,K(-,t)) = f(t) for each f E IC(K) and t E D. To give the sufficient and necessary condition for the equivalence of two Gaussian measures, we need introduce a “product” covariance function which is defined as K0 ® K1{[(t1, t2), (31, 32)]} = K0(t1, s1)K1(t2, 32), where K0 is a covariance function on D1 and K1 is a covariance function on D2. We will present the following two theorems in a more general setting. Let D be an index set. Let {X (t),t E D} be a family of real random variables on a measurable space (Q, A) and f = 0{X (t), t E D}. Suppose P0, P1 are two measures on f such that {X (t),t E D} is a Gaussian process on (Q,.F,P,-),i = 0,1. Denote by K0(t,s) = E0(X(t) — m.0(t))(X(s) — m0(s)) and K10, 8) = E1(X(t) - m1(t))(X(S) 4711(8)), where E0(X(t)) = m0(t) and E1(X(t)) = 7711(1)- Theorem 2.2.9. The following are equivalent. (a) P0 E P1 on f. (b) K0 — K] E K(K1®K0) and m1— m0 E [C(Ko). 15 (c) (i) There exists 71,72 (0 < 71 S 72 < 00) such that 71K0 << K1 << 72K0. (ii) K0 — K1 6 [C(K0® K0); and (iii) m0 - m1 6 [C(Ko). Where 71K0 << K1 means that K1 — 71K0 is a covariance. By virtue of rkhs, we can obtain the analogue of Theorem 2.2.1 in spatial domain. Let ’Hp(m,-, K,) be as defined before, which is the completion of linear manifold of {X (t), t E D} with respect to the inner product given by the corresponding second- order structure (mi, K,), i = 0,1 here. Let A be the bounded linear operator on ’HD(m0, K0) into HD(m1, K1) such that Ah; = h, for any ht as linear combination of {X(t),t E D}. Theorem 2.2.10. P0 E P1 on .7: if and only if that (a) A is one-one bounded, with bounded inverse on HD(m0, K0) into HD(m1, K1) (b) (I _— A*A) is Hilbert-Schmidt, where I is identity on ’HD(m0, K0). (C) m1 — m0 E [C(KO) Since those conditions for the equivalence of Gaussian measures based rkhs has no constrains on the set D, we can extend Theorem 2.2.3 and Theorem 2.2.5 to high di- mensional case. We note that some slightly weaker results presented in Yadrenko(1983, p.154 and p.156), but we believe our proofs are more straightforward and clearer in the context of this thesis. Let {Xt,t E D} be centered Gaussian random field, with covariance function K j and spectral measure F j with spectral density fj under corresponding measures Pj,j = 0,1, where D C Rd is bounded. Let b(s,t) = K0(s,t) — K1(s,t), s,t E D, as is well known that Lj(s, t) = f ei’\,(s_t)fj(/\) dA. Suppose there exist constants 71,72 such that ’71 K0 << K1 << 72K0, Theorem 2.2.11. A necessary and sufficient condition for the equivalence of the Gaussian measures P0 and P1 is that the covariance diflerence b(s,t), s,t E D can be 16 extended to a square integrable function b(s, t) on Rd x Rd and the Fourier transform (,9 of which satisfies hflku)| [Rd [Rd— foo.) )fo (u) dAdp < 00- (2.2.15) Proof. If follows from the Theorem 2.2.9(c) [Chatterji and Mandrekar(1978), p.183]. that P0 :— P1 is equivalent to b(s, t) E [C(Ko «8) K0), where the reproducing kernel Hilbert space (C(Ko ® Kol— — {95.33 =//6 -z( (SIAM; V) 990‘, V) (“’09) 011’700u ) 99 E L2(F0 X 170)}, which follows from Theorem 2.2.8. Because memes). (st. t1» =K0(S—SI)K0(t-t1)=//6i(t—t1)l)‘+i(s_sl)’” are) due). and for any cf) 6 [C(KO <8) K0), there exists 30 E L2(F0 x F0) such that (AKbeKmat)» 2% flay/“81” )ew t1” “S‘sllmmxmdflwdF(u)ds1dt1 =rMst D We will employ the following well—known properties of Fourier transform. For any square integrable functions (with respect to Lebesgue measure) gay-(A), A E Rd, there are square integrable functions aj(t), t 6 Rd such that tpj(/\) =/ exp(—iA't)aJ-(t)dt, j: 1,2. Rd 17 Furthermore, memo) = [Rd expi—z‘xnial * aext) dt (2.2.16) fad eXp(”“29910090200dA = (2r)d(a1 * a2)(t), (2.2.17) where all the equalities are in the L2( dA) sense, and a1 * a2 is the convolution, i.e., a1* aga) = [R .1 a1(s)a2(t — 5) ds. Theorem 2.2.12. Suppose o < fo(>\) x lc(>‘)l2, IAI —» 00. (2218) where to is the Fourier transform of some square integrable finite function identically zero outside bounded set T. Let h A ( ) [0(3) satisfy the condition, for some [II > 0 / |h(A)|2 dA < 00. (2.2.19) |A|>M Then P0 E P1. Proof. When the function h(A) is square integrable and f0(A) x |4p(A)]2, where 92 = f eitlAc(A) dA and C(A) = 0 when A lies out of bounded set T. Then Plancherel’s Theorem (see,e.g. Yosida (1968), p.153) implies there exists square integrable function 18 a(t) such that h(A) = fe‘i’Vta(t) dt. Furthermore, for s,t E D x D, b(s,t) = feixls")(f1(z\)- foo» an = / ems—em) we)? dA By (3.3.83), not? = f expen’t) (/ c2 f0(A). Let 19 fO(A) = |‘.p(A)|2, then f1(A) = f0(A) + (f1(A) — f0(A)). From the result in first case, we know there exists extension of the difference b(s, t) such that corresponding Fourier transform (b(A, p) satisfies ff—W 1700);; :)dA dp < 00. (2.2.21) Note that b(et) = fei*’lf1—foo)| dz = / ems-t) mo) — fo(>\)| dA For large enough M1, there exists 73 such that f0(A) < 73f0(A), |A| > M1, this together with (2.2.21) gives / / _|__(>‘u)l2 dAduMl |p|>ul f0()\) fem ) Hence P0 E P1 in this case from Theorem 1. Finally, for any f(A) of the type (2.2.18), let f2(A) = f0(A) + max{0,f1(A) — f0(A)}. It is observed that f2(A) 2 f0(A),- f2(A) Z h(A) and fi) H) for some Mg > 0. Let P2 be the measure induced by Gaussian homogenous random field on D with the corresponding spectral density [2. Based on the results for the second case, we have P0 E P2 and P0 E P2. This implies P0 E P1. The proof is completed. C] In the last part of this section, I will particularly address the the equivalence of Gaussian measures for Matérn model. It means that the underlying process {X (t)} 20 possesses the following isotropic Matérn covariance function 2 V 0 (6h) liq/(6h), h > 0, (2.2.22) K h- 2 6 = _— ( 70) 3V) F(V)2V_ with parameters 02,6 and V, where [CV is the modified Bessel function of order V [see Abramowitz and Stegun (1967) p.375-376], 02 is the covariance parameter, 6 is the scale parameter and V is the smoothness parameter. This covariance struc- ture has been widely used in practice for modeling spatial data. Due to parameter V, Matérn covariagram has great flexibility to model a large variety of Gaussian processes with different degree of smoothness. In particular, it includes exponential covariance K (s, t) = o2 exp{—6 It — 3]} as a special case with V equal 1 / 2 and Gaussian covari- ance function K (s, t) = 02 exp{—6 |t - s|2} when V —> 00. In addition, Matérn model is mathematically amendable largely because it possesses a rational spectral density as in the following F(V + d/2) 0292” Few/2 (62 + IIAII2)"+d/2' f(A; 02, 0) = (2.2.23) Zhang(2004) showed Theorem 2.2.13. Let P,- be probability measure under which X (t), t 6 Rd is station- ary Gaussian with mean 0 and an isotropic Matérn covariagram ( it is a term used in geostatistics for covariance) with a variance 02-2, a scale parameter a,,i = 1,2, and the same smoothness parameter V, where d = 1, 2, or 3. For any bounded infinite set D c Rd, P1 2 P2 0n the paths of X(t),t e D, if and only if 0393" = 0%". This theorem implies that for Matérn model, neither 02 nor 0 is consistently es- timable, but the quantity 0202” is (Zhang (2004)). We will apply the results in this section to spatial interpolation in next section and parameter estimation later. 21 2.3 Equivalence of Gaussian measures and kriging- The best linear unbiased prediction for spatial data is often called kriging, or spa- tial interpolation. If two covariograms define two equivalent Gaussian measures, the interpolation under the two covariograms will be asymptotically equal. This can be seen from some well established probability results. Blackwell and Dubins (1962) de- rived the following results. Let two probability measures P0 and P1 be equivalent on X(t),t 6 D and tk, k = 1,... ,n be the sampling sites in D, and tk, k = n+1, n+2, . . ., the sites in D where spatial interpolation is needed. Write X k = X (tk). sup I P1(A]X1, . . . ,Xn) — P0(A|X1, . . . ,Xn)| —> 0, as n —> 00 where the supremum is taken over all A E a(Xk, k > n). In particular, sup IP1(Xk E B]X1,...,Xn) — P()(.X}C E BIX1,...,X")] k>n,B (2.3.1) —>0,asn—>oo. Therefore, the predictive distribution of X k for any h > n given X1, . . . ,Xn would be nearly the same for sufficiently large n. This asymptotic optimality of prediction based on misspecified covariance structure is also studied systematically by Stein (e. g., 1990a,1999b). In the following, we will briefly review some of the results and give an alternative proof for the first basic theorem (Theorem 8 in Stein(1999b))using Hilbert- Schmidt operator. Let the underlying process X(t) be stationary with mean zero and be observed at locations tk, k = 1, 2 . . . , in a bounded region D C Rd. Assume the set of sampling sites {tk, k = 1, 2, . . . } is dense in D. For any sample size n and any site t at tk for any k, let X,(o2,n) be the best linear prediction based on X (t1), . . . , X (tn) under 22 the covariance function K,(h), i = 0, 1 and write the corresponding prediction error A e,(t, n) = X(t) - X,(t,n). (2.3.2) In general, for any h E H(K0), let e,(h, n) = h — E(h|X1, . . . ,Xn) be prediction error for h. If we regard P0 as the measure corresponding to the true covariance function and P1 the measure corresponding to misspecified covariance function. The following theorem [Theorem8 in Stein(1999b)] states that as long as these two covariance struc- tures specify equivalent Gaussian measures, the misspecified covariance K1 will still retain the asymptotical optimality of prediction in terms of quality of point prediction and accuracy of assessments of mean square errors. Suppose (0, K0) and (m1, K1) are two possible second-order structures on ”H,, the linear manifold generated by {X(t),t E D}. Define H(0, K), ’H(m, K) as in section 2.2. Take $1,1/12,... to be the Gram-Schmidt orthogonalization of til, ’12, . . . under (0, K0) so that K0(wj, wk) 2 (SJ-k. Define operator A as in Theorem 2.2.10. Based on this theorem, we will give an al- ternative proof here. Theorem 2.3.1. If P0 E P1,,let H-” be made up of elements h of’H(0, K0) for which E0 e0(i/),n)2 > 0.Then E160(¢.n)2 — E0 800% n)2 lim sup 2 0 (2.3.3) "“00 term, BO 6001),")2 E 2 _ 2 E 2 _ , 1,. 2 lim sup 0 81W), n) E0 ZOW' n) = 0 (2.3.5) "TOO $671...” E0 600/), Tl) E 2 _ .. 2 lim sup 180W, Tl) E1621(7Ln) : 0 (2.36) "“00 veitn E1 610/): 71) Proof. It follows from Theorem 2.2.10 that P0 E P1 implies (I — A“ A) is Hilbert- 23 Schmidt and m1 6 [C(KO). Therefore a) a) Z “(I — NAlelfi < 00 and 2m; < 00. (2.3.7) i=1 j=1 Where mlj = E1(r,l1j). For if) E H(0, K0),we can write it = 23:1 cjwj,where Z c? < 00. Then the error of the best linear prediction for if) given HMO, K0) is 00 800.0,”) = 2 0311132 j=n+1 If E0 e()(ib,n)2 > 0,then as n —+ 00 E1 80(1/J,n)2 — E0 800b, n)2 E0 60(4). n)2 = E1(4600/1: n)2) - E0 60(111. 702 E060012.71)2 ](A60(¢r ”)7A60(¢2n))H(m1,K1)+(E1 60W, ””2 — (80(Ip, n)? 60010, n))H(O,K0)] — Eo 60W. ”)2 ](A*A€0(t/J.n). 800/), n))H(0,K0) — (80W).n).60(¢,n))n(o,xo) + (Zgin-fl ijjl2] Ejfin+19§ * a) 2 (n 2 [((I — A Meow. n). eoii. ammo) + z c,- :3 (m1),- j=n+1 j=n+1 5 oo 2 (2.3.8) ijn+1 C) Cauchy-Schwarz inequality indicates that the left hand side of (2.3.8) * ' ‘ 00 S I|(I — A A)60(C:I:7n)ll02ll80(¢7n))ll0 + 2 m1? 23f=n+lc' j=n+1 ( ) 2.3.9 >332...“ Ile II(I - A"‘Alz/lelo (£31m Ci 0" .2 S 00 2 + 2 m1] §:f=n+1§j j=n+1 24 and this is . , 2 Mme... Wm... H(1 — A at)”, oo 2 S a) 2 .+ "Hj V ijrH—l c]. j=n+1 (2.3.10) 00 = Z Ilt-II3+ 2 m1? j=n+1 j=n+l The right side of (2.3.9) does not depend on w and tends to 0 as n -—> 00 by the (2.3.7), so (2.3.3) follows.Switching the roles of K0 and K1 yields (2.3.4). Next, since E16%SEE168 E081(1/ML)2 < E0 6100.702 . E161(¢.n)2 Bo 60012,")2 — 13181011.”)2 E0 600%”)2 So (2.3.5) follows from (2.3.3)(2.3.4). Again switching the roles of K0 and K1 yields (2.3.6). 1:1 Another sensible measure for how well predictions based on K1 do when K 0 is the correct covariance function is Eo((*1(tan) — eo(t,n))2 E0 60(0)")2 ’ i.e., how large the mean squared difference of predictions is relative to correct mean squared error. Because the mean squared error is often calculated in practice, it is also of interest to compare the presumed MSE with the actual MSE by evaluating the ratio E1e1(t,n)2/ E0 e1(t, n)2. The following theorem is a special version of Stein (1999b, p.135). Theorem 2.3.2. If the two covariance functions K0 and K1 define two equivalent Gaussian probability measures, and the set of sampling sites {tk, k = 1, 2, . . . } is dense 25 in D, where D C Rd is bounded, then uniformly in t E D such that E0 e0(t, n)2 > 0, E0(€1(02an) - 60(t.n))2 l' = 0 2.3.11 "530 E0 60(t, n)2 ( ) "limoo E1e1(t,n)2/E0 el(t, n)2 = 1. (2.3.12) Sufficient conditions for equivalent probability measures exist in terms of spectral density. Let fi(A) for A 6 Rd be the spectral density corresponding to the covariance function K,(h) for i = 0,1. If f,(A)’s are isotropic, i.e., depending only on “A”, it follows from Theorem 2.2.5 that the corresponding Gaussian measures are equivalent if for some e > 0, f1(>\)/fo(A)-1 = Oahu—(”2+”) as Inn _. 00. (2.3.13) Condition (2.2.19) implies that f1(A)/f0(A) ——> 1 as ”All —+ co and imposes some rate on the convergence. Condition (2.2.19) is stronger than necessary for (2.3.11) and (2.3.12) to hold, as seen from the next theorem (see Stein 1999b, p.136). Theorem 2.3.3. Let the underlying process X (t) be Gaussian under probability P,- with mean 0 and spectral density f,, i = 0, 1. Iffor some (.9 > 1, f0(A) IIAII‘p is bounded away from 0 and 00 and f 1(A) f0()\) then (2.3.11) and (2.312) hold. —>1 as ”All -—> 00, If observations are taken on infinite lattice (SZ while 6 —> 0, the rate imposed on the tendency of f1/f0 ——> 1 will yield the rate of convergence to optimality of predictions in (2.3.11) and (2.3.12) (see Stein 1999a, p.252). As (2.3.2) defined in finite sample case, we let e,-(t, 6) be the prediction error of X (t) under measure P,- when the process is observed at an infinite lattice 6%. Theorem 2.3.4. If f,(A)(1 + HAN)“9 is bounded away from 0 and 00 (i = 0,1) and 26 lf1(’\)/f0()‘)— 1] S A(1+ ||A||)_7 for some positive numbers 90,7, A, then as 6 ——> 0 E0(60(02.5) — 81012.5»2 530810127032 E161((7215)2 S11" 2 2" 02¢52d E061(0 a6) sup 0 (6min(§9,27)(log 6-1)1{,c=2c,}) (2.3.14) 02¢6Zd I] 0 (6mi“(W>(Ieg 5—1)1{~9=r}) (2.3.15) Under an additional condition on the spectral densities, Stein(1990a, p.259) also obtained such convergence rates when the observations are unequally spaced on an interval. 2.4 Fixed-domain asymptotics of maximum likeli- hood estimators Comparing with increasing domain asymptotics, fixed—domain asymptotic results for estimation are considerably fewer due to lack of analytic tools dealing with increas- ingly stronger correlations between nearby observations. More specifically, taking more and more data in a fixed domain will give you more and more correlated ob- servations, as opposed to increasing domain asymptotics, where taking samples in an increasingly large domain gives roughly independent observations if correlations decay with distance fast enough. For the simplest model, exponential model, the fixed do- main asymptotics of exact MLE has been thoroughly studied by Ying(1991 and 1993), Chen, Simpson, and Ying (2000). For the general Matérn model, Zhang(2004) gives the strong consistency of MLE with 6 fixed. However, the fixed-domain asymptotic distribution is not available even when data are observed along a line, therefore we study and establish first the fixed-domain asymptotic distribution of MLE for the mi- croergodic parameter in the general Matérn model. In the following, we will present some of these results. Let the second order stationary Gaussian process X (t),t 6 Rd have mean 0 and 27 an isotropic covariance function K (h; 6, 02), where a2 is the variance of the process and 0 is the parameter that controls how fast the covariance function decays. Given n observations Xn = (X (t1), . . . ,X (tn))’, the log—likelihood is 1 1 en(9,02) = ‘3 log 27r — Eleg[detv,,(19,02)] — Ex;,[v,,(0,02)]"1x,,, (2.4.1) where Vn(0, o2) denotes the covariance matrix of X". The maximum likelihood es- timator (MLE) of (6,02) maximizes the likelihood function €n(0,a2), i.e. (8,62) = ArgMax €n(0, 02). In this work, we will focus on Matérn model, that is, the covari- ance function considered in (2.4.1) is Matérn defined as in (3.3.14). As we mentioned 202V earlier, the product a is shown to be consistently estimable, although both pa- rameters 02 and 0 are not if spatial sampling domain is bounded [see e.g. Zhang (2004)]. It is actually more important to estimate 0262” well for spatial interpolation [see, e.g. Zhang (2004), Stein(1999b)]. For the exponential model which is the special case with V = 1 / 2, the underlying process is known as Ornstein-Ulenbeck process. Because this process possesses some nice properties such as the Markovian properties, the fix-domain asymptotic analysis is relatively more tractable [Ying (1991), Chen et al. (2000)]. Ying (1991) gives the following fixed-domain strong consistency and asymptotic normality for the MLE of the consistently estimable parameter 002 for exponential model as following: Theorem 2.4.1. Let the underlying process {X (t),t E [0,1]} be Gaussian with mean 0 and an exponential covariance function K (h) = o2 exp(—6h). The domain of the maximization in (6,02) is J = [a,oo] x [w,v] or [a, b] X [10,00], 0 < a < b < 00 and 0 < w < v < 00. Then, with probability one the maximum likelihood estimator A (dump, damp) exists for all large n and as n —+ 00 02,33, ——> 6003 as, (2.4.2) . . d mane, — 9003) —. N(0, 2(6003)2). (2.4.3) 28 For general Matérn model, there is no such properties like Markovian property strictly speaking for the underlying process. So the fixed-domain asymptotic proper- ties for (8,62) by joint maximizing both parameters are not available yet. However, Zhang (2004) proved that when V is known and 6 is fixed at any value 01 > 0, the max- imizer of likelihood (2.4.1) 6% ensures the following strong consistency of an estimator of the consistently estimable parameter 602. Theorem 2.4.2. Let the underlying process {X(t),t E Rd}, d = 1, 2,or 3, be second order stationary Gaussian with mean 0 and possess an isotropic Matérn covariogram 3.3.14 with the unknown parameter values 08, 00 and a known V. Let D”, n = 1, 2, . . . be an increasing sequence of finite subsets of Rd such that U311 Dn is bounded and infinite, and Ln(02, 0) be likelihood function when the process is observed at locations in Dn. For any fixed 01 > 0, let 6?, maximize Ln(02,61), Then 626%” —> 0303” with P0 probability 1, where P0 is the Gaussian measure defined by the Matérn covariogram corresponding to parameter values 03, 0 and V. We showed the asymptotic normality of this estimator. Theorem 2.4.3. [Du, Zhang and Mandrekar(2009)] With the same notation and assumptions as in previous Theorem 2.4 .2, for any fixed 01, flags?” — 0303") i. N(0, 2(0363”)2). (2.4.4) This theorem is proved based on those theoretical results related to the equivalence of Gaussian measures. The detailed proof will be provided in next chapter. 29 Chapter 3 Covariance tapering and fixed-domain properties of tapered maximum likelihood estimators 3.1 Introduction As introduced in the very beginning, applying some traditional statistical approachs, such as best linear unbiased prediction or kriging, the maximum likelihood estima- tion or the Bayesian inferences, to large spatial data sets are often computationally infeasible because of cubic order matrix algorithms on matrix inverse or determinant involved. To obviate these computational hurdles, a natural idea is to make the co- variances exactly zero after certain distance so that the resulting matrix has a high proportion of zero entries and is therefore a sparse matrix. Operations on sparse ma- trices take up less computer memories and run faster. However, this has to be done in a way such that the resulting matrix is still positive definite. Covariance tapering assures that the tapered covariance matrix is positive definite while retaining most of the information. Technically this method is to taper the covariance function to zero beyond a certain range by directly multiplying a positive definite but compactly 30 supported function, that results in the so called tapered covariance matrix which can be efficiently handled by well-established sparse matrix algorithms. The tapered covariogram is of the form ~ K(h;6’) = K(h;9)Ktap(h). (31-1) where K (h, 0) is the covariance function of the underlying process that depends on a vector of parameter 0 and K tap(h) is the taper, a known correlation function that is 0 after a threshold distance. Some examples of taper are spherical, Wendland tapers (Wendland, 1995, 1998). See also Wu (1995), Gneiting (2002) and Mitra et a1. (2003). One would then use K(h, 0) in estimation and interpolation as if it was the cor- rect covariance function. For example, we would maximize the following tapered log likelihood function for Gaussian observations to obtain an estimate for 6, 2 n 1 ~ 1 I “' _1 [n,tap(09(7 ) 2 —5 log 27r — §10g[detVn] — EXnVn Xn. (3.1.2) where the tapered covariance matrix is a Hadamard product Vn = Vn(6, 02) o Tn, and Tn has the (i, j)th element as Ktap(||t, — tj“). Xn is the vector of observations. Maximizing (3.3.4) is computationally feasible for extremely large datasets but results in a pseudo-likelihood estimator whose properties need to be studied. Kaufman (2008) established consistency of the pseudo-likelihood estimator for Matérn class. Very recently, Du, Zhang and Mandrekar (2009) gave some general conditions to ensure that tapering does not affect the efficiency of the maximum likelihood estimator. For spatial interpolation, Furrer Genton and Nychika (2006) showed that under some regularity conditions, tapering produces asymptotically optimal prediction. All the conditions that result in no effect on consistency, asymptotic efficiency of estimation or asymptotic Optimality of prediction put constrains on the tail behavior of spectral density of tapering function. Roughly speaking, this can be accounted for 31 by noting that spectral density is the Fourier transform of covariance function. So the faster the spectral density of taper decays, the smoother the tapering function is at origan and it hardly changes the degree of differentiability of the underlying process, which actually plays a central role in any fixed-domain asymptotic results. Therefore, section 3.2 is devoted to address the tail properties of tapering spectral density and effect of tapering on spatial interpolation. The effect on estimation is in section 3.3, we will focus on studying the asymptotic distribution of tapered MLE since it is unknown if the covariance tapering causes any loss of asymptotic efficiency before. The content of these two sections is from the joint papers [Du, Zhang and Mandrekar (2009)], which has been accepted by Annals of Statistics, and [Zhang and Du(2008)] with my advisors. In the absence of asymptotic distributions for the true MLE in the general case, we compare the log likelihood function and the tapered log likelihood function, and their derivatives in section 3.4. The simulation study and an application to the climate data to show accuracy and computational are presented in the last section. 3.2 Tail properties of tapering spectral density If K (h) is the covariance function of X (t), and R (h) = K (h)Ktap(h) the tapered covariance function, denote the spectral density corresponding to K (h), R (h) and Ktap(h) by f0(A), f1(A) and ftap(/\) for A 6 Rd, respectively. We have the following relationship from a well-known fact about the Fourier transformation, MA) = / foo—xvtapwx. (32.1) Based on an equivalent equation as (3.2.1), Furrer Genton and Nychika (2006) have shown the following result. Theorem 3.2.1. Let f0(A) 0c (MW/(e? + ||>.||2)V+d/2 for A 6 Rd be the spectral density function of a Matérn covariance function. Let the spectral density fmpO‘) 0f 32 the taper satisfies the following taper condition: 0 < Imp.) < 1t1(1+||/\||2)_"_d/2—C (3.2.2) for some 6 2 0 and M > 0. Then f1(A)/f0(z\) has a finite limit as ”All —-> 00 and this limit equals 1 if e > 0. This theorem and Theorem 2.3.3 imply that if ftap(A) has a lower tail than f0(/\), then predictions under both models will be nearly the same when a large sample is obtained from a bounded region. We can give a stronger result which combined with Theorem 2.3.4 gives the conver- gence rate of mse’s based on a tapered covariance structure, this indicates the degree of maintaining the Optimality of prediction using appropriate tapering. It also pro- vides the condition for the two measures corresponding to the two spectral densities, i.e. tapered and untapered, to be equivalent in one dimension case. Theorem 3.2.2. If condition (3.2.15) holds for some 5 > d/2+'7/2, where 0 < 'y < 1, the following holds. 0) |f1(/\)/f0(>\) -1| S A(1+ ||/\||)"7 (32-3) 01 < MAXI + llA||)2“+d < 02 (2' = 0.1) (3.2.4) for some constants A, Cl, Cg > 0. (ii) The two Gaussian measures corresponding to the two spectral densities are equiv- alentford=1and1>7>1/2. Proof. We can assume 0 = 1 without loss of any generality because the results will be the same except for the magnitude of the constants A and (7,, i = 1, 2. Then in view of isotropy, f0(A) can be written as f6‘(||A||) = Ml/(l + ||A||2)V+d/2, where M1 is a positive constant [see (33.14)]. By continuity and f0(A) > 0, to show (3.2.3) it 33 suffices to show that there exists A > 0 such that when ”A” large enough, |f1(/\)— fo(>\)l(1 + HAM)” < A. MA) _ (3.2.5) Indeed, since tapered covariance function C () is also isotropic, from (3.2.1) it follows that MA) = from”) = [Rd/a(u — wl|)f€‘ap(llwll)dftap .(.~).~d-1d.~dU(u). 68d 0 and the LHS of (3.3.86) is therefore bounded by 0° l.f6(|l7"u—wvl|)—f6(llwv||)] * d_1 ‘ A12(1+w)7 [98.1 /0 f6(w) ftap(r)r (mum). (3.2.8) Furthermore, for w sufficiently large, we will bound the inner integral over intervals [0,w — A], [w — A,w + A], [w + A, 00) by some quantities independent of u, where A = 0(w6) for 6 = (d+ 21/ + 7)/(d+ 21/ +1), clearly 0 < 6 < 1. Then the assessment of the whole iterated integral in (3.2.8) becomes straightforward because the outer integral is one. First, note that the Mean Value Theorem implies that there exits 5 between w and 34 ||ru — wvll such that f(’)"(||rul - w'vll) - f5(||wv||) = f6'(€)(|lru - wvll - NW“) and |f6’(a:)] = [2Ml(z/ + d/2)x]/[(1 + $2)"+d/2+1] which is decreasing in a: > 0. Therefore, for r E [0,w — A] or [w + A, 00), we have Ilru — wvll 2 la) — r] 2 A. This together with w > A entails 6 > A, thus *I _ M3AT f0 (A)l ||ru|| _ (1+ A2)u+d/2+1’ (3'2'9) lf6<|lru - wvll) - f6(l|wv||)l S where M3 denotes 2Ml(u+d/2). For r E [w— A,w+A], we have ||ru — wvll Z |w—r| and w > A > |w — r|, which indicates { > Iw — r], so ,.. ,.. * A13 to — r|r lf0(ll7"u — vaI)—fo(llwv||)ls fo'(|w — r1) llrull=(1+ |,,_',.|2)..+d/2+1 (32-10) From (3.2.9) and taper condition (3.2.15), it follows that If5‘(||7‘u—W||)—f6lllwvllll * (1-1 * f r r dr (WM f0(w) ta“ ) M3A(1+ w2)"+d/2 / * d—l ' < d - M1(1+ Mad/2+1 (MM T9 W T (3'2“) M3MA(1 + w2)u+d/2 w-A rd 00 rd 5 , / dr+/ dr (”1(1 + A2)V+d/2+l 0 (1+T2)'U+(l/2+f w+A(1+T2)v+(l/2~R As 6 > d/ 2 2 1/2 — V, the first integral in (3.2.11) is finite and the second integral tends to 0 as w —> 00. To control the last inner integral part, we need to use the monotonicity of rd/ [(1 + r2)”+d/2+‘] which is decreasing in r for e > d/ 2. In view of this, (3.2.10) and taper condition (3.2.15), we have ft*ap(7')7'd_1 d'r /w+A ”a(u“; — w’vll) - f5‘(||w’v||)| —A f6(w) < M31W(1+ w2)1/+d/2/w+A '7. _ w] 7.d — Ml w—A (1 + |.,. _ wl2)u+d/2+l(1 + .,.2)v+d/2+e < M3M(w _ A)d(1+ W2)u+d/2 /w+A IT _ 9’] M1(1+ (w — A)2)”+d/2+€ WA (1+ Ir — wl2)”+d/2+1 dr, (3.2.12) where the finiteness of the last integral is easy to be verified via changing of variables. Since the total mass of outer integral in (3.2.8) is one, combining (3.2.8) with (3.2.11) and (3.2.12) gives oo * , limsung(1+w)/ f0 If0(llru wfv”) f°(”“m”)lfgap(r)rd—1drdU(u).)(1+ ||A||) < 02. (3.2.13) In addition, (3.2.3) entails f1(«\)s fo(A)(1+ A(1+ MAID”) s fo(A)(1+ A). Therefore f1(A)(1+ ||.\||)2V+d < 05(1 + A). (3.2.14) On the other hand, (3.2.6) and (3.2.13) imply <1+IIAII)2”+df1(A) = <1+IIAII>W A, h(A—mtapa) dz > Cf (drape) da: = Ci". which together with (3.2.14) completes the proof of (3.2.4) with C1 = Cf, Cg = C§(l + A), and (i) is proved. Finally (ii) follows readily from combining (3.2.3) with (2.3.13). E] In practice, the compactly supported radial basis functions constructed by Wend- 36 land (1995; 1998) are often used as the tapering function after parameterization. These Wendland tapers are of the form llhll Kw,d,k(h; 9) = A'ld,kpd,k(7)1{ogxga}. where Pd,k is a polynomial of degree [d/2]+3k+1. Specially Kw,2,1(h; 6) = (l—%)i(1+ 4%) and Kw,2,g(h;0) = (1— %)§_(l+ 6% + 33%;), 0 > 0,h > 0 (17+ = max{0,:z:}) are the two wendland tapers discussed in Furrer Genton and Nychika. (2006), their one dimensional spectral densities denoted by 91, gg have the following tail properties: A4g1()\) —+ 120/ (n73) and A6gg()\) —) 17920/(7r'y5), as A —> 00. Therefore, according to (ii) in Theorem 2.3.4, tapering with KW,2,1 or ngg will yield the equivalent Gaussian measure if the exponential model (V = 1/2) is studied, for instance. It was shown (Wendland, 1998, p.8) that 9d,k()\) S M(1 + ]|z\||2)”d/2’k_1/2 with 9d,k denoting the spectral density of K W,d,k in Rd, so we can see K W,d,k satisfies taper condition (3.2.15) with e > (d + 7)/2 whenever k > (d + 7)/2 + V — 1/2. By assuming faster decay of spectral density of tapering function than (3.2.15), Kaufman(2009) showed the following equivalence of two Gaussian measures denoted by P0, P1 corresponding to exact and tapered Matérn covariance functions. Theorem 3.2.3. Let K0 be the Matérn covariance function on Rd, d S 3, with parameters 02, theta, V, and let ftap be the spectral density of tapering function K tap- Suppose there exist 6 > max{d/4,1 — V} and Me < 00 such that o < ftapo) < M(1+||/\||2)""‘d/2‘C (3.2.15) Then P0 :—: P1 on the paths of {X(t),t E D}, for any bounded subset D C Rd. Actually tapering condition (3.2.15) with e > max{d/4,1 — V} implies that the difference of the spectral density function of exact and tapered covariance functions 37 is of the following order, f0()\) — h(A) MA) = 0(IIAII") with r > 1/2. The condition (2.2.14) in Theorem 2.2.5 is satisfied and therefore the Gaussian measure specified by tapered covariance function is equivalent to the original one. This theorem will lead to the strong consistency of tapered MLE under the tapering condition above. Now if we strengthen the tapering condition (3.2.15) for d = 1 by letting e > max{1/2,1— V}, we get f0“) — h(A) h(A) = 0(||/\||_T) with r > 1. This is stronger result than equivalence of Gaussian measures and it will enable tapering to retain the asymptotic efficiency. The detailed proof of this result will be given in the next section. 3.3 Fixed-domain asymptotic properties of tapered maximum likelihood estimators 3.3.1 Strong consistency of tapered MLE for Matérn model Assume that the underlying process X (t), t 6 Rd is stationary Gaussian with mean 0 and a Matérn covariogram as defined in section 2.2. Namely, 2 V , ' 2 _ 0’ (01') , [((I, (7 ,0, V) — WTKV(0I), I 2 0, (3.31) Let the process be observed at n locations tk, k = 1, . . . , n and V is known. Write X n = (X (t1), . . . ,X (tn))' . Then the maximum likelihood estimator following log 38 likelihood function 1 1 €n(0,a2) = 125 log 27r — 5 log[detVn(6,02)] — Ex;,[v,,(0,02)]—1x,,, (3.3.2) where Vn(0, 02) is the covariance matrix whose (i, j )th element is K ( ”t,- — tj H ,02, 0, V). The maximization has to be carried out using some iterative procedure such as the Newton-Raphson method or Fisher’s scoring method. Then the covariance matrix Vn has to be inverted repeatedly. When n is large, the inversion can be quite slow if not impractical. To overcome this, we replace the covariance function with a tapered covariance function frat; 9,02) = K(h; 6,02)Ktap(h), (3.3.3) for some correlation function having a finite support, i.e., Ktap(a:) = 0 if ”in” Z *7 for some '7 > 0. The covariance matrix corresponding to the tapered covariance function K is the Hadamard product Vn = Vn o Tn where Tn = (Ktap(||t,- — tj||)):j=1. We would then maximize the following function, which will henceforth be call the tapered log likelihood function, 2 Tl 1 “' l I " _1 ln,tap(6,a ) = —§ log 27r — 2 log[det Vn] — §XnVn Xn. (3.3.4) How does this tapering affect the estimation? We will resort to the fixed-domain asymptotic theory to find an answer. As we mentioned in first chapter, It is known that not all parameters in the covariance function are consistently estimable (Ying, 1991; Zhang, 2004, e.g.) under the fixed-domain asymptotic framework. First we note that Zhang (2004) showed that two Matérn covariance functions with the same smoothness parameter define two equivalent Gaussian measures if they have the same product 0262” on the paths of {X (t),t e D} where D is bounded (Zhang, 2004, Theorem 2). Consequently, the parameters 02 and 0 cannot be estimated con- sistently if the spatial sampling domain is bounded, regardless of estimation method. 39 However, Zhang (2004) showed that 0202” is consistently estimable and constructed a consistent estimator. Following Zhang’s approach, we will construct an consistent estimator of the prod- uct 0202” using the tapering covariance function. We assume V is known for technical reason. Then the tapered covariance matrix Vn = 02Rfl(9) o Tn, where Rn(6) is the correlation matrix and depends only on 6. Fix 0 at a known value 01 and maximize €n,tap(02a 01) which equals n 1 ~ 1 ~ —1 log(2rr) — 5 log |an - -2-X;,Vn Xn 2 n 1 n 1 _ = —§ log(27r)—-2- log IR.” 0 Tn] — §log(02)—WX;,(R,1(61) o Tn) 1X71 (3.3.5) Hence . 1 _ atapvi) = ArgMaxen,tap(a2. 61) = ;X£.(R..(61)o T.) 1X... Theorem 3.3.1. [Zhang, et al.(2008)/ Assume the underlying process on Rd, d S 3 is Gaussian with mean 0 and a Matérn covariance function (3.3.1). If the taper is such that the tapered covariance function K(-, 02, 0, V) and the untapered covariance function K (., a2, 0, V) define two equivalent measures for X(t),t E D, then as n —> oo 57%;;le)ng -—> 0868” as. where 08 and 90 denote the true value of the parameters. Combing this theorem and the equivalence result (3.2.3) under tapering given by Kaufman yields the following strong consistency theorem. To simplify the notation, - ~2 2 write amp for Utap(61)' Theorem 3.3.2 (Kaufman, et al.(2008)). Let the underlying process {X (t), t 6 Rd}, d = 1,2,or 3, be second order stationary Gaussian with mean 0 and possess an 40 isotropic Matérn covariagram 3.3.14 with the unknown parameter values 08,60 and a known V. Let Dn, n = 1, 2, . . . be an increasing sequence of finite subsets of Rd such that U321 0,, is bounded and infinite, and €n,tap(02:0) be likelihood function when the process is observed at locations in Dn. With K tap satisfying the taper condition (3.2.15) with e > max{d/4,1—V} For any fixed 01 > 0, let 6% maximize En,tap(o2,61), Then 6%.,tapefu —2 0868” with P0 probability 1, where P0 is the Gaussian measure de- fined by the Mate’rn covariagram corresponding to parameter values 08, 6 and V. 2 MPG?” is a strongly consistent estimator for The previous theorem says that 6 the microergodic parameter 0368” for any fixed 01. Does the choice of 01 affect the asymptotic distribution and therefore the asymptotic efficiency of the estimator? This is a hard question to answer because the fixed-domain asymptotic distribution of the true MLE has only been established for some simple models and has not been explicitly given for general cases. It would be a harder problem to find the explicit infill asymptotic distribution for the tapered MLE. Du, Zhang and Mandrekar (2009) considered this issue which will be presented in the rest of this section. The main results for the Ornstein-Uhlenbeck process are presented in subsection 3.3.2. For the microergodic parameter in the Ornstein-Uhlenbeck process, we establish the asymptotic distribution of tapered MLE. In subsection 3.3.3, we present the main results for a Gaussian stationary process having a Matérn covariogram. We put all proofs in the two appendices. 3.3.2 The fixed-domain asymptotics of tapered MLE for Ex- ponential Model We assume the underlying process X (t),t 6 [0,1] is Gaussian that has a mean 0 and an isotropic exponential covariogram K (h) = 02 exp(—9h). Such a process is known as the Ornstein-Uhlenbeck process, which has a Markovian property that will be exploited in our proof. 41 The exponential isotropic covariance function is one of the most commonly used models for spatial data analysis. It follows from Ying ( 1991) and Zhang (2004) that both 02 and 0 are not consistently estimable under the fixed-domain asymptotic frame- work, but the product 026 is. Applying the fixed-domain asymptotic theory for spatial interpolation, Zhang (2004) showed that it is only this product, and not the individual parameters 02 and 6, that asymptotically affects the interpolation. Therefore, it is important to estimate this product well. In this section, we establish the asymptotic properties of the tapered MLE of this product. For simplicity of argument, we will maximize the likelihood function over (6,02) E J = [a, b] x [w,v] for some constants 0 < a _<_ b and 0 < w _<_ v and do not require that J contains the true parameter value (60,08). However, we do assume that 6003 E {602, (6, 02) E J}; that is, there exists a pair (6, 02) in J such that 002 = 0003. The following two assumptions are made throughout this section: (A1) The process is observed at points t)”, E [0,1],k = 1,... ,n, with 0 S th < tgm < - -- < tn,” 3 1, and suppose that nAkfl, is bounded away from 0 and 00, where A)”, = tk,n — tk_1,,,,k = 2,... ,n. We also assume that tn,n —> 1 and t1," —>0 asn—r 00. (A2) K tap(h; 'y) is an isotropic correlation function such that Ktap(h; 'y) = 0 if h _>_ ”y, where 7 6 (0,1) is a constant. Moreover, Ktap(h;'y) has a bounded second derivative in he (0,1) and K(ap(h; '7) =ch + 0(h) as h—>0+ for some constant c. A taper can be any correlation function with compact support, and such correla- tion functions have been studied in literature [see Wu (1995), Wendland (1995 and 1998) and Gneiting (1999 and 2002)]. We believe that a large number of compactly supported correlation functions satisfy assumption (A2). Particularly, a Wendland taper is a truncated polynomial and, therefore, satisfies (A2) if the degree of the polynomial is greater than 3. We also note that the assumption in (A2) that K tap has a bounded second deriv- 42 ative in h E (0,1) can be weakened, so that dQKtap/th exists at any h E (0,7) as long as the first derivative exists everywhere in (0,1). The weakened condition will necessarily make the proof longer and, therefore, is not considered in this paper. Before we state the main results of this section, we need to introduce some nota- tions that will be used throughout this paper. For sequences of real positive numbers an and a sequence of real or random numbers bn that may depend on model parame- ters, bn = Du(an) if, for any n, P(|bn| g Man) = 1, for some 0 < M < 00, which does not depend on parameters but could be random. That is, bn/an is bounded uniformly in the parameters. Similarly, we write bn = on (an) to mean that, with a probability 1, bn/an converges to 0 uniformly in parameters. The following theorem compares the tapered log-likelihood function with the untapered one, and their derivatives. This theorem is essential to the establishment of the asymptotic properties of the tapered MLE to be given in the subsequent theorem. Theorem 3.3.3. Under the assumptions (A1) and (A2), uniformly in (6, 02) E J and with I’D-probability 1, twp“), o2) =rn(o, 02) + 0,,(n1/2), (3.3.6) 8 2 a 2 1 2 8—6€R,tap(630 ) 28—6671(910 )+ 0U(n / )3 (3‘37) where P0 is the probability measure corresponding to the true parameter values 03, 60. The next theorem establishes the strong consistency and the asymptotic distrib- ution of the tapered MLE. Comparing the asymptotic distribution of MLE of 0860 in Ying (1993) and that in the following theorem, we see that the tapered MLE is asymptotically equally efficient. A Theorem 3.3.4. Assume (A1) and (A2) hold, and let (6n,mp,62 n9 mp) maximize the 43 tapered likelihood function over (6, 02) E J. Then, as n -—> oo, P0(n1_i__,mooén,tap5%,tap = 9003) = 1, (3.3.8) . . d fi<6n,tap0121,tap — 0003) _) N(01 2(6008)2)7 (3'3'9) where P0 is the probability measure corresponding to the true parameter values 03, 60. If one of the parameters is fixed at any chosen value, we can easily get the following corollary. This will be the type of the estimator which we are able to deal with for general case. Corollary 3.3.5. In particular, let 0% > 0 and 61 > 0 be predetermined constants and define 9n,tap,17 62 n.tap 2 as solutions of the maximization problems €n,tap(én,tap,2aag) = SUP gn,tap(630'§)a (3-3-10) 6E[a,b] and €n,tap(0115721,tap,1) = SUP €n,tap(61102)- (33'11) 02E[u,v] Then én,tap,2 —> 6g 2 6008/03 as. and 6121,tap,1 —+ of g 6003/61 as. Moreover, as n —+ 00 A d dawnyapg — 92) ——-> N(0, 20%), (3.3.12) . d «mam, — 01?) —+ N(0, 2031). (3.3.13) 3.3.3 Fixed-domain asymptotic distribution of MLE and ta- pered MLE for general Matérn model In this section, we will focus on studying the asymptotics of tapered MLE for a general Matérn model. We assume the underlying process is stationary with mean 0 and the 44 following isotropic Matérn covariogram 02m)” Kh- 20 =———- ( ’0 ’ "’l rip/)2!!—1 ICV(6h), h > 0, (3.3.14) with unknown 02, 6 and known V, where [CV is the modified Bessel function of order V [see Abramowitz and Stegun (1967) p.375—376], 02 is the covariance parameter, 6 is the scale parameter and V is the smoothness parameter. Further, assume that the process is observed at n sites t1,tg, . . . ,tn in a bounded interval D C IR, and write Xn = (X (t1), . . . , X (tn))' . Zhang (2004) noted that neither 02 nor 6 is consistently estimable under the fixed-domain asymptotic framework, but the quantity 0262” is consistently estimable. Furthermore, this consistently estimable quantity is more important to prediction than the parameters 02 and 6. The primary focus of this section is to establish the asymptotic distribution of the estimators for 0262”. This is a more difficult problem than in the exponential case, and we cope with it by considering an easy version of the problem. Following Zhang (2004), we fix 6 at an arbitrarily chosen value 61 and consider the following estimators: 3,2, = ArgMax 3,,(01, 02), (3.3.15) 0' Ar2t,tap = ArgMax €7l,t(lp(61302)a (3.3.16) where ln(61,o2) and l,,,tap(61,o2) are the log—likelihood function and the tapered log-likelihood function, respectively. We make the following assumption on the spectral density of the taper Ktap(h). Similar conditions were used in Furrer et al. (2006) and Kaufman et al.(2008). Our condition here is stronger, and it is necessary for our approach to deriving the asymp— totic distribution of tapered MLE: (A3) The spectral density of the taper, denoted by ftap()\), satisfies for some constant 45 e >max{1/2,1—V} and0< M < 00 M (3.3.17) We note that taper condition (3.3.17) is satisfied by some well-known tapers. For ex- ample, Wendland tapers (1995, 1998) have isotropic spectral densities that are contin- uous and satisfy 9d,lc(/\) g M (1 + A2)—d/2‘k“1/2 for some constant M, where d is the dimension of the domain (d = 1 in this work). Therefore, condition (3.3.17) is satisfied if k > max{1/2, V}. Furrer, Genton and Nychka(2006) gave explicit tail limits for two Wendland tapers K1(h;7) = (1 — $010+ 4%), 7 > 0 and Kg(h; 7) = (1 — %)§_(1+ 6% + 33%;), 7 > 0 (2+ = max{0,:r}), and showed that A4g1(/\) —> 120/(7r73) and AsggO‘) —> 17920/(7r75), as A —> 00, where g,- is the spectral density of K, (i = 1,2). Therefore, condition (3.3.17) holds if V < 1 for taper K1 and V < 2 for taper Kg. One important probabilistic tool we will extensively use is the equivalence of prob- ability measures. The assumption (A3) implies that the tapered covariance function specifies a Gaussian measure that is equivalent to the Gaussian measure specified by the true covariance function [Kaufman et al. (2008)]. It readily follows that 6727.,tap6TV is a strongly consistent estimator of 0363” [e.g., Kaufman, et al. (2008)]. The main results in this section are the following three theorems. The next theorem is a general result about two equivalent Gaussian measures and is not restricted to the case of Matérn model or covariance tapering. It will be used to prove the other two theorems. Theorem 3.3.6. Let X (t),t E R be a stationary Gaussian process having mean zero and an isotropic covariagram K J- and a spectral density fj under measure Pj, j = 0, 1. Assume the process is observed at t1, tg, - -- in a bounded interval D, and let Xn = (X(ti)....X(tn))'. If 0 < f0(/\) x [AI—Tl , A —> 00 for some r1 > 1, (3.3.18) 46 and f1(/\) f0(/\) h(A) = — 1 = OHM—"2), /\ ——> 00 for some rg >1, (3.3.19) then E0(X:. 1/ 2. However, equivalence alone can not imply (3.3.20), and we need stronger conditions than the equivalence of two measures. We will show later that condition (A3) implies that (3.3.19) holds, for some rg > 1, if f0 and f1 represent the spectral densities of the true and tapered covariograms, respectively. Theorem 3.3.7. Suppose condition (A3) is satisfied, and the underlying process is stationary Gaussian having a mean 0 and a Matérn covariance function, and the sampling locations {t1, tg, . . . } are from a bounded interval. Then, for any fixed 61 > 0, with [DO-probability 1, uniformly in 02 E [w, v], tn,ta,,(61,02) = t’n(61,02) + 0.,(1) (3.3.21) 0 a mgnitazxgl, 02) 2' 67€n(61302) + Du(l), (3.322) where P0 is the probability measure corresponding to the true parameter values 03, 60, V. Next, we give the asymptotic distributions for both exact MLE and tapered MLE of the consistently estimable quantity 0262”. 47 Theorem 3.3.8. Assume that the underlying process is stationary Gaussian having a mean 0 and a Matérn covariance function with a known smoothness parameter u, and the sampling locations {t1, t2, . . .} are from a bounded interval: (i) For any fixed 01, flags?" — 0303”) 1) N(O, 2(0303V)2). (3.3.23) (ii) In addition, if the taper satisfies condition (A3), . 1 mains?" — 0868") 4-» M0, 2336832). (3324) Theorem 3.3.8 implies that the covariance tapering does not reduce the asymptotic efficiency for the Matérn model under condition (3.3.17). In this paper, we are not able to show that (3.3.24) remains true if 01 is replaced by the MLE of 6’. Therefore, the results in this theorem are not as strong as those in Theorem 3.3.4. More work will need to be done to extend Theorem 3.3.4 to the general Matérn case. We note that asymptotic distributional results about the microergodic parameter 0262” in the general Matérn class have not appeared in literature. Theorem 3.3.8 (i) is the first of such results, and its proof requires a novel approach. 3.3.4 Discussion There are some open problems for future research. First, for the Matérn model, the estimator of 0292” is constructed by fixing 6 at an arbitrary value. For a finite sample, common practice is to also estimate 9. It is an interesting question to see if Theorem 3.3.8 still holds for the MLE 6%” and the tapered MLE [72 92" Our conjecture n,tap n,tap° is that Theorem 5 can be extended to this case. 48 The main results in Sections 3.3.2 and 3.3.3 are for the processes with one di- mensional index. It is a more interesting problem to study the high dimensional case. However, our techniques in Section 3.3.3 cannot be directly extended to obtain analogous asymptotic distribution in the high dimensional case. For example, for a d—dimensional process, we would need (3.3.19) to hold for some r2 > d in order for the proof to carry through. Unfortunately, for the Matérn model, (3.3.19) dose not hold for any r2 > 2. The high dimensional case calls for new techniques for establishing asymptotic distributions. A referee suggested letting the bandwidth 7 vary and go to O as n increases to 00. This is a natural scheme in the fixed-domain asymptotic framework. We believe that everything in Section 3.3.2 carries through if the band- width goes to 0 not too fast. When the bandwidth of the taper depends on n, it is not obvious if our techniques in Section 3.3.3 still apply, because the properties of equivalence of probability measures are no longer directly applicable. 3.3.5 Appendix 1. Proofs for Section 3.3.2 In the sequel, we often suppress n in the subscripts. For example, write tk = tk,m Ak = Akfl. We will need three lemmas for the proofs of the theorems in Section 3.3.2. Lemma 3.3.9. Let X (t) be the Gaussian Ornstein-Uhlenbeck process, and assume (A1) holds. Denote E(X(ti)|X(tJ-),j aé i) = —Zj75ib,-j,n(6)X(tJ-),l g i g n, Var(X(t,-)|X(tj),j # i) = d.,;‘,—,(6’,02), which is written as d, for short. Then, for 1=—e“’4‘2,bm._1,n(9)=—e*mn,b.j,n(6)=0 for li—J'|>1- (3.3.26) 49 In addition, uniformly in (6,02) E J, 1 < i < n, 1 g j g n, 1 1 2020A-A.- 1 _ 2 _ 2 __ 2 5+1 1 1 1 a _ b;,,n(6)=0u(;), b’lj,..(0)=0u(;), b;,,..(6)=0u(;), 5543.20.00. (3.3.28) PI'OOf. Note that E(X(t2')|X(tj),j 75 ’l) = _Zjaéi bij,n(0)X(tj),l S i S n if and only if CW(X(ti) + Zbik,n(6)X(tklax(tj)) = 0, for anyj#i.j=1.-~.n- kaéi We therefore prove (3.3.25) and (3.3.26) by verifying that Gov (2((tz') + bi,i_1X(ti_1) + bi,i_+_1X(ti+1),X(tj)) = 0, for any ] 75 ”l, (3.3.29) where we let bIO = bum,“ = O. For i = 1 or n, (3.3.29) readily follows the stationarity and the Markovian property of the Ornstein-Uhlenbeck process. For 1 < i < n, (3.3.29) holds, because, ifj 2 i+ 1, the LHS of (3.3.29) equals _gA. -20A- —0A° —29A- 026—6(tj.ti+l) e_0Ai+1_ 6 z(1‘6 z+l)e—0(Ai+Ai+1)_ 8 1+1(1 — 6 z) , 1_e-20(Ai+Ai+1) 1 _ e-20(A,~+A,-+1) which is zero. We can get the similar expression when 3' _<_ i — 1. Therefore, (3.3.29) is proved. Since d,- = E(X(t,~) + b,,.i_1X(t,-_1) + bi,,-+1X(t,-+1))2, straightforward calculation yields (24—20Ai)(1 __ (3-20A’l+1) _ 2 _ —26A2 _ 2 —20A ._ 2(1— d1—0 (1 e ), dn—O (1‘6 n), ‘12-" 1_e—26(A¢+Ai+1) ,1 1, we only need to consider j = i — 1 or i + 1. Write the derivative b’-- i i—l ,n (6) = A/(l — e_20(Ai+Ai+1))2, where _ (_e—QAi + e—OAi—26Ai+1)2(Ai + Ai+1)6_20(Ai+Ai+1) =Ai8—6Ai _ (A; + 2Ai+1)(e-0Ai—29Ai+l _ e—36Ai—26Ai+1) — A,e—39Ai—4"Ai+1. (3.3.30) Note that 1 1 1 _ 8‘20(Ai+Ai+1) 26m,- + 1),-+1) is uniformly bounded and n(A,- + AMI) is bounded away from 0 and 00 by Assump- tion (A1). Hence, 1/(1 — e—26(Ai+Ai+1)) = 0(1/n), and it suffices to show that A is Du(l/n4). Using, again, the fact that A,- = 011(1/n) and applying the Taylor expansion, we get mfg—"Ar = A,- — M)? + (1 /2) 62A,3 + 0.0 /n4), _ (A2 + 2Ai+1)(6_BAi-26Ai+l _ 6—30Ai—29Ai+1) = —2 623,2 + 492A,3 +12 62A,2A,+1 — 46A,-A,-+1+ 8 (FAA-+12 + Gnu/n“). _ A_e—36A,-—46Ai+1 = 2 9 — A,- + 3 as} + 46A,A,+1— 56233 — 1262A,2A,+1 — 802A,A,+12 + 0.,(1/n4). All the terms except Ou(1/n4) are canceled out. Therefore, A = Ou(1/n4) and b’. (a) = Ou(1/n2). [j 2i—1 ”(9) = Ou(1/n2). Similarly, we can show b'-. “+1.71 We now introduce the following notations. Let On denote a matrix of which the elements are Ou(1/n) except those in the first and last rows, which are uniformly 51 bounded; that is, Ou(1). Denote, by (an, the matrix whose (i, j )th element is Ou(1) if i = 1 or n or i = j, and is Ou(1/n) otherwise. Therefore, (0.31) 0.0) ) 0,,(1/n) Ou(1/n) 0.,(1) on = 1 6n = én‘l' Ou(1/n) Ou(1/n) Ou(1) (0.,(1) 0,,(1) ) Lemma 3.3.10. Under assumptions (A1) and (A2), uniformly in 0 E [a, b]: (i) V;1(Vn O Tn) = In + 6n, V_1 8;” = 6n: .. a _ "' a _laV'n _ V (11) 55m, (v,, o T n))— on, 37W 39 )_ on, (iii) 1 < det(v,;1(vn o Tn)) = 0.,(1), (V;1(Vn o T..))‘1 = In + (3n, where In is the n x n identity matrix. From the definitions of On and 6n, we have ~ énén : 6n, 6110" = 6n, 671,671 = 6n. (3.3.31) Then, Lemma 2 (i) and (ii) and the following well known fact [see, e.g., Graybill (1983) pp.357—358]: 8 _ _18Vn_ (9—9an —_-,,—V —V,,186 . (3.3.32) imply a _ _ .. 56(vn1(vnorn)) 1:0,, (3.3.33) Proof. We can assume 02 = 1 without loss of any generality, because all quantities in the lemma do not depend on 02. We will repeatedly use Lemma 1 and particularly 52 the fact that V;1 is a band matrix. The proof involves tedious computation, and we will keep a balance between brevity and clarity. Several quantities in the Lemma are of the form V;1(Vn 0 Q), where Q is an n x n matrix whose (i. j)th element is g(t,- — tj) for some even function g(t) that has a bounded second derivative on [—1,0) U (0,1]. If the limits of the derivative g'(0+) = limt_,+ g’ (t) and g’(O——) = limt_.0_ g’ (t) exist and are finite, we show now V;1(Vn 0 Q) = 9(0)In + (Q’(0+) — Q’(0-))diag{0u(1),- -- ,0u(1)} +612, (3334) where diag(0u(1), . . . ,Ou(1)) denotes a diagonal n x n matrix with bounded elements. There are immediate corollaries from (3.3.34). First, it implies V;1(Vn 0 Q) = 6". Second, by taking g(t) = —|t|, we get V;1(8Vn/86) = (3". Lastly, if 9(t) = Ktap(|t|), then g’(0+) = g'(0—) and V;1(Vn o T") = In + (3n because Ktap(0) = 1. To prove (3.3.34) we need the following well known result [e.g., Ripley (1981) p.89]: vglw, 02) = D;1(6,a2)Bn<6), (3335) where 3,,(0) .—. (b,,,n(9))lg,,jg,, and Dn(0,02) -_— diag{d,-(6,o2),i = 1,...,n}, in which bij,n(6), di(0, 02) are defined as in Lemma 3.3.9 and bii,n(6) = 1. let Wij denote the (i, j)th element of V171(Vn 0 Q). Hereafter, for the ease of notation, we will suppress the dependence of any quantity on n and parameters [e.g. bij,n(6) = bij, d,,n(6, 02) = d,], wherever confusion does not arise, throughout the rest of the paper. Write bij = 0 ifj < 1 orj > n and to =t1,tn+1=tn. Since bij = 0 if |i —j| > 1, i+1 wij : (Ii—1 2: bikK(|tk — tjllém- — tj). (3.3.36) k=i—I For any i > j, and k = i— 1 or i+ 1, we have tk - ti 2 0. Hence, the Taylor Theorem 53 implies 9(tk - tj) = 0(ti - tj) + Q,(ti - thk - ti) + Q”(tz' - tj + {(ik - tj))(tk - t02/2. for some é E (0,1). Since 9 has a bounded second derivative on (0,1), and tk — t,- = 0(1/n), we have 00% — tj) = 902' - tj) + Q,(ti - tj)(tk - ti) + 0(1/n2l- (3-3-37) Here then, 2+1 wij = d;1e(t.-—tj) Z bikKUk — tj) kzi—l H1 (3.3.38) + (1,7130.- — t.) 2 Km. 4,.)(1, 431).]. + 0.11/1»). k=i~1 where we have used di—l = 0u(n). Note that di-1 22:24 bikK(tk — t3“) is the (iii) element of DngnVn = In. Hence, the first summand in (3.3.38) equals 9(0)1{i=j}- Similar to the establishment of (3.3.37), we can show K(tk — tj) = K(t, — t,) + K’(t,- — tj)(tk — t,) + 0u(1/n2). It follows that, for i > j, i+1 wz'j = di—le’Ui — tj)K(t'i — tj) Z (tic. — tilbik + 0210/71), (3-3-39) k=i—1 By utilizing the explicit expressions of bij given in Lemma 1, we can show “”1 Du(l/n2) if1 j Ou(1) ifi = 1 or n wz'j = (3.3.41) Ou(1/n) if 1 < z‘ < n. In view of the fact that g is an even function, we can show, similarly, that (3.3.41) holds for 2' < 3'. Now, let us consider wii. First, note that 9(ti—1 '41) = 9(0) + Q'(0-)(ti—1 - ti) + 0(1/712): (3342) 9(t2'+1 — ti) = 9(0) + Q,(O+)(ti+1 — ti) + 00/712)- (3343) Since di—1 :32:24 bikK(tk — ti) = 1, 2+1 wiz‘ = (1,71 2 bikK(tk — ti)9(tk — ti) k=i—1 = 9(0) + di—1{bi,i—1K(ti—l - ti)9’(0—)(ti—1 - ti) + bi,i+1K(ti+1 — ti)0’(0+)(ti+1 - 152)} + 0210/73)- Since K(h) = K(O) + K'(O)h + 0u(h) as h -—> 0, W = 9(0) + K(Oldi—lwm—10'(0—)(lz’—1 - ii) + bi,i+10'(0+)(ti+1 - t2)} + 011(1/71’2): which can be rewritten as i+1 wz‘z‘ =Q(0) + Q'(0—)K(0)d{1 Z (tk — ti)bik kzi-l (3.3.44) + [e’(0+) — e’(0—)1K(0)d;1b.-,.+lA.-+1 + Ono/n2). Then, (3.3.34) follows from (3.3.40), (3.3.41) and (3.3.44). (i) is therefore proved. 55 To prove (ii), note that (3.3.60) and Lemma 3.3.10(i) imply,, 3 _ _ @(Vn 1(Vn 0 T7») = _V12. 86 86 = —V;16—0-V"(In + On)+V;1(8——;;" 0Tn> 1’ V51 (8V7; 0 (Tn — J11)) + 6n: V V which is clearly ()n from (3.3.34) by taking g(t) = — |t| (Ktap(|t|) — 1) that is differ- entiable at 0, where Jn is a matrix of all 1’s. 6 _1 8Y7; 8—6(Vn ———)=O n similarly. Write Next, we will show 5‘6— (Vn av avn 32v 2 _ —1 Tl —1 —1 n 86 ) V” 86 V" 89 +V" 302 ' (3.3.45) By (i), the first term on the RHS of (3.3.45) is énén = 6n, and the second term ~ 2 is On, because V;,'1 88;" second derivative so that (3.3.34) applies. This completes the proof of Lemma 3.3.10 (ii). = V;1(Vn 0 Q) with g(t) = t2, which has a continuous Let An = V;1(Vn oTn) and (12-3- denote the (i, j)th element of An. We now apply a series of column operations, so that An becomes In + 9n and each of the operations retains the determinant of An, where an is a matrix whose elements are bounded by M /n for some constant M not depending on 6; that is, Qn(z', j) S M /n We have shown that An = In + 67,, where elements of (3,, are Du(l/n) except those that are on the first and last rows that are bounded. We can subtract from the jth column the first column multiplied by the (1, j)th element of An, 2 < j < n. Then, all elements in the first row are Ou(1/n), except the (1,1)th element, which is 1 + Ou(1/n) and remains unchanged throughout the operations. Similarly, we can reduce the elements in the last row to Ou(1/n) except the last (71, n)th element. Applying the Hadamard inequality [Bellman (1970) p.130], we can show there exists some constant M such 56 that det(An) = det(In + on) g ((1 + M/n)2 + (n — 1)(M/n)2)n/2, which is bounded. To show A;1 = In + 6n, we first note that by Oppenheim’s inequality [Mirsky (1955) p.421], which yields the inequality for the determinant of Hadamard product of positive definite matrices, det(VnoTn) > det(Vn) 1119-9, t,,- where ti,- is the diagonal element of Tn and it is one. Therefore, det(An) > 1. We only need to show that the (i, j )th cofactor 4420' = d€t(An)1{1:j} + Ou(l/'n)-+-1{j:10r j=n}0u(1)° Similar to proving det(An) = Ou(1), we can show all the (n — 1) by (n — 1) cofactors are also Ou(1). In addition, Aij = Ou(1/n) for 1 < j < n,i 74 3' since it has one row of elements Ou(1/n) and replacing that row with Ou(1) would yield a bounded determinant. To complete proof of the Lemma, it remains to show Aii = det(An) + 0u(1/n), 1 < i < n, which is true by Laplace expansion det(An) = (1+Ou(1/n))A,-i+ Lemma 3.3.11. For any 0 6 [a,b], let 371(6), n = 1,2,..., be a sequence of ran- dom variables such that E(Sn(0)) = Ou((log 71)"), E[Sn(6) — E.S'n(0)]6 = Ou((log 71)") uniformly in 0 for some constant 7" > 0. Assume that, with probability one, Sn(0) is differentiable with respect to 0 and S,’,(0) = Ou(n2(logn)r) uniformly in 0. Then, sup ISn(0)|=o(n1/2) a.s. 0€[a,b] Proof. Let a = 60 < 61 < - - - < 6M" 2 b partition [a, b] into intervals of equal length, where Mn is the integer part of n3/2+0‘ for some 0 < a < 1/ 14. Then, SH 5- 0 < max 8 0 + ma su 5'9 —S,,0 . 3.3.46 9€lalibll ”( )l _ 1995M" H( k“ 1_<_k_<_)]f.[n0€[gk_11),gk]l M k) 1( )l ( ) 57 Because there exists constant C > 0 such that the sixth central moment of 571(6) is uniformly bounded by C (log 71V. Ah; P(1>l>n2 )< ZP (IS we. manhunt“) lC(log 71)" _ C(logn)’ < — —. — Mn 713-60 n3/2—7a Since 3/2 - 70: > 1 with a < 1/ 14, it follows from Borel—Cantelli lemma that S 6’ —ES 0. =0 1/2—a 131,132?»nt n( kl ( n( klll (Tl ) a 9 Since E(Sn(bl)) = 0((log n)T) uniformly in 6, and I512(9k)|< Infill/171' STL(6k)— M(NWklll'l' max lE(Sn(0kllla 1’")a.s. 06kt] Therefore, max sup S 6 — on ”2 a.s. 3.3.48 131.311.. 0616.44,.“ ..< 5— 5.60 )l— (n ) < > The proof is completed by combining (3.3.46), (3.3.47) and (3.3.48). [:1 Proof of Theorem 3.3.3. Recall that the tapered and untapered log likelihoods are given by (3.3.4) and (3.3.2), respectively. The proof of (3.3.6) consists of direct com- parisons of the log determinants and the two quadratic forms. First, Lemma 3.3.10 58 (iii) implies that log[det(Vn o Tn)] = log[det(Vn)] + Ou(1). (3.3.49) Define Hn(6) = (v;1(vn 0 Ta)“-1 — I... Then, Hn = 6,, by Lemma 3.3.10 (iii). Because (VT, o T.)1 = V;1 + HnV;1, (3.3.50) x;,(vn o Tn)—1Xn = xgvfixn + Xanvflxn. (3.3.51) Proof of (3.3.6) would be completed if, uniformly in ((9, 02), with probability 1, x;,H,,V;1xn = nail/2). (3.3.52) We will apply Lemma 3 to prove (3.3.52). Define 53(6) = aannvglxn, and note that Sn(9) depends on 6 but not on 02. In view of symmetry of HnV;1 by (3.3.50), we can write E03740) = 02 trace{HnV;1Vn,0}, (3.3.53) where hereafter in the proof the expectation is evaluated under the true parameter 08 and 00, and V50 = Vn(60, 53). The rth cumulant of x;,H,,V;1xn 1.. = 2T—1(r — 1)! trace{HnV;1vn,0}T, (3.3.54) r = 1, 2, . . . [see Searle (1971), Theorem 1, p.55]. 59 Next, we show that v;1(0, 02)Vn(60, 03) = 0u(1)In + 5... (3.3.55) Then, it follows from (3.3.53)-(3.3.55) and (3.3.31) that the first moment and the sixth central moment of Sn(0) are uniformly bounded, because the sixth central moment of Sn(0) is 155 + 15154152 + 103% + 15133, which is uniformly bounded because all of the four cumulants involved are uniformly bounded. By using notations introduced in proof of Lemma 3.3.9, we now give explicit ex— pression for the elements of V; 1(61, 02) based on Lemma 3.3.9 and (3.3.35). For brevity, we drop the parameters in the matrices and write Bn,0 = Bn(60) and Dn,0 = Dn(00, 0%). Decompose V; 1 into v;1= D513”), + D,;1(B,,,0 — B”) = A1+ A2. Then, AN...) = Dngn,oB;},Dn,o = Balm...) and the diagonals of D; 1Dn,0 converge uniformly to (0860/ (026) by (3.3.27). There- fore, Alvw = diag(0u(1), . . . ,ou(1)). (3.3.56) In addition, ~ A2Vn,0 = On uniformly in 6 E [a, b]. (3.3.57) Indeed, the absolute value of the (i, j ) th element of AgVnfl, 1 < i < n is n “ —6 t —t,- Edi1(bik,n(0)—bik,n(60))age 0] k J k=1 (3.3.58) _ 1 _<_ d.) 108 E : lbik,n(6) _ bik,n(60)] = 011(5)) lk-iISI 60 where the last equality follows from (3.3.27), (3.3.28) and the Taylor Theorem. Simi- larly, we can show the elements on the first and last rows are Ou(1). Hence, (3.3.55) 8 follows from (3.3.56), (3.3.57) immediately. Lastly, note that 5557“” = Du(n2) b Lemma 3.3.10. The conditions of Lemma 3.3.11 are satisfied. Therefore, sup Sn(6) = 0(n1/2), (3.3.59) 0€[a,b] which implies (3.3.52). We have now proved (3.3.6). (3.3.7) can be proved similarly, and the remaining proof will be brief. By using the following facts in matrix algebra ‘9 __ —18Vn filog[det(Vn) —trace{V; 80 }, %V;1_ — — v;la;;"V;1, (3.3.60) 8 8Vn 55(Vn0Tn) '— 89 OTn, (3.3.61) The derivatives of the log likelihood functions can be written as 6 2 _ _18Vn_18Vn _1 (9—96” ”(6, o ) — trace{Vn “V 86 ; Xn, (3.3.62) 8 _ 16V Eégn,tap(97 02) = _ traC€{(Vn O Tn) 1( +8671 0 Tn)} I —-1 8V7). _1 + X,,(v,, o Tn) ( 66 o Tn)(v,, o Tn) xn. (3.3.63) We first show that the two traces differ by Ou(1). Write An = V; 1(Vn, o Tn). It is straightforward to verify 86 18Vn ———An WA ILA" 1 _ OT”): A" V" 66 66' (VnOTn)_1( 61 Then, 5V1. 06) 8V1; . _ 3A; 80 }+trace(A,,l 7' 06 )’ trace{(Vn o Tn)_1( o Tn)} = trace{V;1 (3.3.64) where the second trace in the RHS is clearly uniformly bounded by Lemma 3.3.10. Similarly, we can write BVn ('96 (VnoTn)_1( (90 6'1‘n)(vno'1‘,,)—1 =V; V;1+WnV;1 for some matrix Wn, which is On. Again note that WnV;1 is symmetric, using the exact same same technique for deriving (3.3.52), we can show xgwnvflxn = ou(n1/ 2). (3.3.65) The proof is complete. Cl Proof of Theorem 3.3.4. First, for (3.3.8), it suffices to show that, for any 6 > 0, P0( inf ~ {gmtapwa J2) _ €n,tap(6a02l}—_’OO) : 1’ (3'3'66) {(0,02)€J,]002—6~02]26} where (6,03) 6 J can be any fixed vector such that blo~2 = 0003. Ying (1991) has shown (3.3.66) for the log likelihood function €n(0,02). More specifically, Ying (1991) showed that, uniformly in (0,02) 6 J and ]002 — blah] 2 6, with probability 1, (”(6162) — elln(0,a2) 2 1771. + Ou(n1/2+a) for any a > 0, [see the proof of Theorem 1 in Ying (1991) p. 289]. Then, (3.3.66) follows because of (3.3.6) in Theorem 1. 62 Similarly, we can show (3.3.9) by using (3.3.7) and some asymptotic results in Ying (1991). We can write [see (3.10) and (3.11) in Ying (1991) p. 291] E) o 26 86M” (’06 2) 20363 2 Wk n 5+ 014(1), where X(tk) — 8—00AkXUk—1) 00 \/1 - e_200Ak Note that Wk,” depends only on the true parameters and are i.i.d. N(0, 1) for k = Wk,n : 1,...,n. Then, for any (6,62) 6 J, we have 0 626266611 tap(6, 0’ 2)=606OZ(Wkn — 1) —n(626—6060) +0u(n1/2) k: 2 by Theorem 1. In particular, for (6, 02) = (67,, who 612,501,), the left hand side is zero. Therefore, we obtain 7?. " A2 2 1 2 W62; (Wk n ”(gnfiapanmap — 6000) + 0u(" / )- Since WE”, k = 1,... ,n, are i.i.d. xi we have n A ,. __ 2 fi(6n,tapo,2,,tap — 6008) = 60087). ”2 E (WOW, -— 1) + ou(1) k=2 d —> N(o,2<6008)2). The proof is complete. [:1 63 3.3.6 Appendix 2. Proofs for Section 3.3.3 We will employ some known properties of equivalent Gaussian measures reviewed in Chapter 2 [e.g. (2.2.2)~ (22.5)] and will refer to Ibragimov and Rozanov (1978) frequently. Before we proceed with the proof of the main results in Section 3, we will establish the following lemmas. Lemma 3.3.12. Let f1(A) be the spectral density corresponding to isotropic Matérn covariagram K (h; 0%, 61) and f1 (A) be the spectral density corresponding to the tapered covariance function R(h;612,61) = K(h;o¥,61)Ktap(h). Under condition (A3), there exists r > 1 such that fl()‘) — f1(’\) = 0(IAI—1) h(A) as IA] —+ 00. (3.3.67) Proof. Using the fact that Fourier transform of product of two functions is the con- volution of their Fourier transforms, we have f1(A)= [112136636 — .33., (3368) where ftap is the spectral density corresponding to K tap- It is seen that f1(A)/f1(A) does not depend on 0% so that we can assume without loss of generality that a? = 1. It suffices to consider the case that A > 0, because f1(A) is symmetric about A = 0. Using IR ftap(A — 3:)dx = 1 and breaking down these integrals over intervals (—00, A — Ak] U [A + Ak, +00) and (A — Ak, A + Ak) for any 16 E (0,1), we have fl _ __ flA—xIZAk f1($)ftap()\ — x)dx _ _ $ h(A) 1 ‘ m) [WM 6.33 and f.._,|<,k(h(x) — f1(A))f...p(A — x) 6.: f1()\) =T1+T2+T3. 64 By condition (A3), we have 1W 1 :6 dr. (1+A2k)u+%+.f1(A)/f1( ) |T1| S The Matérn spectral density has a closed form 2 21/ co 6 F(V +1/2) A = 1 1 or c = —. 3.3.69 f 1( ) (9i+|/\I2)"+1/2 row/2 ( ) Here, we set 0‘1? = 1. In addition, f f1(A)dA = of = 1 is the variance. Then, 1 2 2 11+? |T1| g M (61 H l 1 (3.3.70) C9212“ +/\2k)V+2+C Similarly, M ngl 3 (3.3.71) 6 4.3121651 Since 6 > 1/2 and u + 6 > 1 by Condition (A3), we can choose 16 to be sufficiently close to 1, so that both T1 and T2 are 0(A—T) for some r > 1. To bound T3, write, for some 5 between A and :6, (I - A)? 11(1) — 11(3) = Jinx): — A) + f{'(€) 2 Then, 1 , , T3 77,7010) (Hwy — A)fmp(/\ — x) 3.. II (33 "' M2 + [Ail-[0k f1 (él—2—_ftap()‘ — 3?) dil- The first term is 0 because the integrand is odd. For the second term, note 0 0, there exists {7-(A) such that {,(A) = /cr(t)exp(—iAt) dt, 0 < |§T(A)|2 x |A|_T, A ——> 00, (3.3.72) where or(t) is square integrable and has a compact support. Proof. We only need to show the case 0 < r S 1, because the product of any functions of the type given by (3.3.72) belongs to this type, due to the fact that the Fourier transform of convolution coincides with the product of Fourier transforms. Let 1 _ 1 3(1): / e—Wlt|’”/2—1dt=2/ 655(1):)6/2-161. _1 0 We will show that {T(A) satisfies (3.3.72). We only need to prove it for A 2 0, because {,(A) is symmetric about A = 0 and {7(0) > 0. Let u 2 At. We can write A €r(/\) = ”7772/ cos(u)uT/2"1du. 0 Then, {7—(A)2 x W” as A —> +00, because cos(u)u."/2_1 is integrable for 0 < r S 1. 66 Next, we will show {,(A) > 0 for any A > 0. It suffices to show y(A) = /0/\ cos(u)u"‘S du > 0, (3.3.73) where 6 = 1—r/2 E [1/2,1). Note that y’(A) = cos(A)A“S and y”(A) = — sin(A)A‘6 — 6cos(A)A'6_1. Therefore, the minimum points are {21:77 + 377/2,k = 0,1,.. . }. So, we only need to show y(2k7r + 377/2) > O, k = 0,1,... by induction. First, using monotonicity of cos(u), we have 377 77/4 77/2 77 377/2 y(—) 2/ cos(u)u_6du +/ cos(u)u—6du +/ cos(u)u-6du +/ cos(u)u-6du. 2 0 77/4 77/2 77 (77/4)1”5 1 \/§ 1 1 > ‘ 4 m"?- + — I. - -—-- — —-—-—- — —— ./§ 2 \/§ \/§ 1 >— — — —— —— — — — . . . _2fi+7r(1 2) 7r fi>0 (3374) Next, suppose y(2(k — 1)77 + 377/2) > 0, for k 2 1, then y(2k7r + 377/2) 2k77+77/2 2k7r+37r/2 _—_ y(2(k — I)77 + 377/2) +/ cos(u)u_6du +/ cos(u)u_6du 2k77—7r/2 2k77+77/2 2k77+77/2 2k77+7r/2 = y(2(k — 1)77 + 377/2) +/ cos(u)u‘6du -/ cos(u)(u + 7r)"6du 2k7r—7r/2 215777-77/2 2k77+77/2 = y(2(k —1)77 + 377/2) +/ cos(u) (u‘d — (u + 70—6) du. 2k77—77/2 The integral is positive because the integrand is positive. This completes the proof of Lemma 3.3.13. [:1 Proof of Theorem 3.3.6. Write the Cholesky decomposition of V0,” 2 LL’ for some lower triangular matrix L. Let Q be an orthogonal matrix such that QL—1v1,nL’—1Q’ = diag{6¥1;,. . . ,afim}. 67 Then, QL’Vi‘JllLQ’ = diag{1/6in, . . . , 1 /o.,%,;}. Taking the trace of both sides, we have trace( (VOnV i711)=Zl/6kn. Hence, Tl, E0(X;,(Vin —VO n)Xn)— — trace(VO ”Vl.) n — = — — 1). k:1(0k,n Let en 2 QL—IXn. Obviously, E0 eneg, = In, E1 ene; = diag{oin, . . . ,ogan}. (3.3.75) (3.3.20) follows if, for any orthogonal sequence {7})“ k x 1,2, . . .} in the Hilbert space L%(d P0) spanned by X (t),t E D under the covariance inner product corre— sponding to P0, there exists a constant M > 0 independent of The: k = 1, 2, . . . , such that 00 Z — —1 < M. (3.3.76) 2 One important technique to prove (3.3.76) is to write, for any s,t E D, E1X(t)X(s—) onu) s)=/ /e( (.1. -#1)<1> (,/\p)dAdu, (3.3.77) where (A, u) is square integrable with respect to Lebesgue measure on R2. For any bounded region D, the existence of such a function (P and, therefore, the equivalence of P0 and P1, are shown in Ibragimov and Rozanov (1978) p.104, Theorem 17, under the assumption that the function h(A) in (3.3.19) is square integrable. However, we will show, under the assumption of this lemma (3.3.19), which requires integrability 68 of h(A) being, that (I) takes a particular form (A,p) = ¢1(A)2(,u) [Te—“ATM” d0) (3.3.78) for some functions (DJ-(A), A E R such that f |j(A )|2 )A/f0( )dA < 00, j = 1,2, and a compact interval T that is solely determined by rl and 72. This particular form is central to the proof, and we will establish it at the end of this proof. We now proceed by assuming it is true. Let dZ0(A) denote the stochastic orthogonal measure so that X (t) has the spectral representation under measure P0; that is, X (t) = f exp(—iAt)dZ0(A). Then, for any 77 E L2 D,(dP0) there is a function 6(A) such that 77: f( 6(A )dZO A() and E0772 f |6( ((A)]2f0 (A) dA. We first show E1772 =/I¢(A)1271\)|2(r\f1 d)A— /|¢(/\) A)Ifo(A)(1+h(A))dA—»o, because h = (fl — f0)/f0 is bounded. It follows that 77m converges in L2(dP1) norm to some variable 77 because E1(77€—77m)2 = f [673(A) —6m(A)|2f1(A) dA —> 0 as €, 771 -—> 00. Then, E1772= "331100 61723131100] I6...(A)I271(A)dA= / I6I271(A)dA Since L2 convergence implies convergence in probability, we have 77m —> 77 in probabil- ity P1. On the other hand, 77m ——> 77 in probability P0 and, consequently, in probability P1, due to the equivalence of the two probabilities. Then, we must have P1 (77 = 77) = 1 and E1772 = E1772. We have proved (3.3.79). To show (3.3.80), note that ]/ m¢m(#)q)(A,/l) dA d/J —/ W¢(M)¢(A,,u) dA d,” A 6) )l d/\ (1)“ (3.3.82) + // ](¢m(/‘l2 — (Willlml |‘1>(A,)u.)l dA dp, where the first term tends to zero, because Cauchy-Schwarz inequality implies that its square is bounded by ITI2 / ——>O, |1(A)|2 (@207)? MA) ‘2 f0(l1) dfl 6.. A) 6(A)(|270A) dA / I6... )|2f f()7()du by (3.3.81) and square integrability of 1(A)/ \/ f0(A), where and hereafter |T| stands for the length of finite interval T. Similarly, we can show the second term in (3382) also tends to zero. Therefore, (3.3.80) is now proved by taking the limit of E1772; — E0773». and ff ¢m( _<6m(u)‘1’(/\ 71) dA duo 70 Applying (3.3.80) to the orthonormal sequence 71k 2 f¢k(A)dZO(A), k = 1, 2, . . ., we have Em}: — 1= // ‘¢.(A)‘¢k(u) [T WW)“ dw‘<1>‘i(A)2(u) dA d1) = [Tm/(mow, where Aj,k(w) = f¢k(A) exp(2'Aw)J-(A) dA, j=1,2. Because (3.3.19) implies P0 E P1, there exists constant C > 0 such that E177)?C > C [see Ibragimov and Rozanov (1978) Page 104, Theorem 17 and page 76, (2.8)], and, therefore, 2 11/121713 — 1| s (Elm? — 11/0 s (1/20) 2 [T (Aj,k(w)l2dw. In view that A J k(u. ) IS the inner product of the two integrable functions ¢k(A)f0(A A)”2 and exp(iAw) jA( )/f0(A)1/2 in L2(dA), and that ¢k(A)fo(A)1/2, k— — 1, 2,. ,is an orthonormal sequence in L2(dA) [because Eongnk— — f ¢g(A )Aqbk(A)f0( )dA], we have, by Bessel’s inequality, 2 (fly-1(a))? s / Ij(A)l2/fo(A)d/\ < cc. 1621 It follows that ~— 2 oo 2—1)<()1/2C Z/T ZIAMM j=1Tk=1 S (ITI/2C)Z/Ij(A)|2/f0(A)dA < oo. j=1 We just need to show (3.3.77) and (3.3.78) to complete the proof. We will employ the following well-known properties of Fourier transform. For any square integrable func- tions (with respect to Lebesgue measure) (pj (A), A 6 Rd, there are square integrable 71 functions aj(t), t 6 Rd such that (pj(A) = / exp(—z'/\’t)aj(t) dt, j = 1,2. Rd Furthermore, 90100792”) = /1Rdexp(_iXt)(a1 *02)(t)dt (3-3-83) [Rd eXp(iA’t)901(/\) 00, (3.3.85) for some square integrable function cj(t) that has a compact support [i.e., cj(t) is 0 outside a compact set]. Let {(A) = (f0(A) — f1(A))/|{1(A)|2. Then, {(A) is square integrable by the as- sumption of the theorem and the properties of 61(A). Therefore, we can write, for some square integrable function c(t), 5 (A) = f exp(—2'At)c(t)dt. Furthermore, for all s, t, E0X(s)X(t) — E1X(s)X(t) = /e77<-*—‘>(10(A) — h(A)) dA = / ei)‘(‘s_t)€(A) |g1(A)|2 dA, (3.3.86) 72 which we will denote by b(s, t). By (3.3.83), [51(A)|2 = /eXp(—2'At) (/ 01(z)cl(z — t)dz) dt. Applying (3.3.84) to {(A) and |£1(A)|2, we get b(s, t) =27r/ a(w)] c1(z)cl(-—(s — t — w — z))dzdw (3.3.87) lR IR =27r/l2 c(u — v)cl(s — u)cl(t — v)dudv, R which holds for all s,t 6 IR. If we restrict s,t to the compact set D, the integral (3.3.87) is an integral over a compact set, say, A x A. This is because c1 is 0 outside a compact interval. Next, we write c(t) as a convolution of two functions. For this purpose, we write 5 (A) = €2(A)£3(A). Then, {3(A) so defined is square integrable from assumptions and (3.3.85) and, therefore, can be written as §3(/\) = fexp(—7'At)63(t)dt. Then, 0 = c2 * C3 and, consequently, c(u — v) = [02(13)C3(u — v — x)dx = /02(u — w)C3(w — 72) do). (3.3.88) Since we are only interested in b(s, t) for s,t E D and, consequently, only interested in c(u —- v) for u, v E A, we will restrict both 77, v to the interval A, so that the second interval in (3.3.88) is an integral on a finite interval, say, T, because 02 has a compact support. Define the bivariate function a(u,v) = / (22(71 — w)c:3(w — v) dw, 11,7) 6 R, T 73 which is square integrable because (a(u. v):2 s ITI jT |62(u — w)|2lcs(w — 2))? d... and both 02 and C3 are square integrable. In addition, for 71,1) 6 A, we have, from (3.3.88), a(u, v) = c(u — 7}). We therefore have shown that, for s,t E D, b(s, t) = 27r/2 a(u, v)cl(s — u)cl(t — v)dudv. IR Note that the integral is a convolution of functions of (71,12). Applying (3.3.84), we get 2wb(s. t) = / exp(z’(As +ut))m(A,u)m(A,u) «(A (1)., where 9919,71)=f2a(U.v)€_i(“A+”")dudv, 7920,71) =/ 01(”)61(v)€_i(ux+v")dudv- IR R2 Clearly, $20M!) = 61(A)€1(-u)- Now, W) = f ()() R2 = f / c2(u — w)C3(w — v)e-i(u)‘+v“)dudv do) T R2 =/ / C2(-T)C3(—3})C—i((1+w)’\+(y+w)#)drdydw T R2 = (/ 02(I)6—i1‘A63(_y)e—iypd$dy)e-i(A+u)wdw T 1R2 =€2(/\)€3(—H)/ fi—i(’\+“)“’dw. ,1. 74 Hence, 1 . _ . ((3,7) = Z; [1.2 6‘(AS+“‘)€1(/\)€2(A)€1(-u)€3(-u) /T e-“dedA dn 1 = 2_7; 2 €i(As—pt)$1—(;\‘)—q)2(p)/ e—i(A—/l.)w duld/A dp lR T for MA) = {1(A):2(A), and @201) = Wager). Clearly, I Ij(A)I2/fo(A)dA < oo by the assumptions of this theorem, (3.3.85) and the square-integrability of £2 and {3. The proof is complete. D Proof of Theorem 3.3.7. Let 0% be such that 0%0?” = 0803”, and let Pj be the probability measure under which the process has a Matérn covariogram with parameters (OJ-,0?) for j = O, 1. Then, P0 E P1 by Theorem 2 in Zhang (2004). Consequently, we only need to show that (3.3.21) and (3.3.22) hold, almost surely, with respect to P1. Let f1(A) = f1(A; 01, a?) be the spectral density under measure P1 and f2(A) the corresponding tapered spectral density as as f1(A) defined in Lemma 3.3.12, from which we see that, for some constant c > 0, /|A|>c which is a sufficient condition for the equivalence of the measures P1 and P2 where 2 dA N(0, 2), (3.3.93) 0 1 52t d «a “2“" — 1) —+ N(0,2). (3.3.94) 0 1 Let Vj,n be the covariance matrix of Xn corresponding to parameter values (02-, 022), j l“ O, 1. Write V1,n = UgRLn, where R1," is the related correlation matrix. First, 76 _...-—fi 4.! ‘ we note that [72, has a closed form express 1 &,.2, = axgnrlxn (3.3.95) 1,11 that can be derived straightforwardly from the maximization. Then, . I —1 «(z—é- ,:,(__..,._, aln 'v-lx {iii—n — 1) . (3.3.96) = (1/fi)(X§z(Vi}, - Vibxn) + fi( Since V6, 1/ 2X" consists of i.i.d. N(0, 1) variables, XQVggxn is the sum of i.i.d Tl variables having a x? distribution. The central limit theorem implies that the second term in (3.3.96) converges in distribution to N (O, 2). (3.3.23) in Theorem 3.3.8 follows if the first term is shown to be bounded almost surely with respect to P0. In view of (2.2.5), if suffices to show that Eo(x:.(v1j.1.— v8;)xn) = 0(1). To this end, we only need to verify that conditions of Theorem 3.3.6 are satisfied. The Matérn spectral density (3.3.69) satisfies, as A —> oo, 0 < f(A;0,,a,-2) ~ |A|_(2”+1). (3.3.97) Moreover, in view of 0808” = 0%”, 12 12 h(A)—h(A) 1— 63+” WM 1— 1+6(2)_6¥ VH 1 (3398) _f0(A) — 9f+,\2 _ 9§+A2 ’ " where f,(A) stands for f (A; 6,, 02.2), 7' = O, 1. Using the Taylor expansion, we can get h(A) ~ |A|‘2. 77 Hence, (3.3.23) is proved. Next, we derive the asymptotic distribution of the tapered MLE 0?, tap' Similar to 6,2,, the tapered MLE 02 takes the closed form n ,tap 1 0721,tap=%XnR1ana where film is the tapered correlation matrix corresponding to R1,". It follows from (2.2.5) 3.3.90 with of : 1 that X,,R1'71,Xn—X;,R1’},Xn : 0(1). a.s. Then, with probability 1 . 1 _ . agmp: —x;,R,},xn + 0(1/71) : 02, + 0(1/71). It follows immediately that 02 and (72, have the same asymptotic distribution. The n ,tap proof is complete. CI 3.4 Comparison between tapered and exact likeli- hood functions 3.4.1 Main results of comparison In previous section, the derived asymptotic distribution of tapered MLE for general Matérn model is only a partial extension of the results in Exponential case, because one of the parameters 9 is chosen to be fixed. However in practice, if you have no prior knowledge about either of the parameters, what is commonly done is to jointly maximize both parameters like we did in Exponential case. Actually the exponen- tial model has suggested some insight into the generalization to the case when the 78 covariance function is not exponential. However, explicit fixed-domain asymptotic distributions of exact MLE with all parameters maximized simultaneously or with high dimensional index in the general case are not yet available. It would be a harder problem to establish the asymptotic distribution of the tapered MLE. In light of this, we will focus on the comparisons between the log likelihood function and the tapered log likelihood function, and between their derivatives. We will establish results sim- ilar to Theorem 3.3.3. Consequently, the tapered MLE and MLE share the same asymptotic distribution under some regularity conditions. Therefore tapering would not reduce the asymptotic efficiency. Consider the stationary Gaussian process X (t),t E [0, 1] with covariance func- tion K(ls — tl) = 02p9(|s — tl), s,t E [0, 1] where 02 is the variance, and pg is the correlation function that depends on a parameter 0. For simplicity of argument, an equally spaced sampling scheme is used throughout the rest of this section, i.e. X (t) is observed at n points t,- = 7' / 71,7 = 1, . . . ,n. As in the previous section, write E(X(t.-)IX(t,-),j ¢ 7) = — 2b.),n(0)X(tj), 7742' (1,461,112) : Var(X(t,-)]X(tj),j : 1), 1: 1, . . . ,n, where 577,71 = 1. The notations Dn, Bn, Tn are also used for the general case, these quantities have the same meaning as before. For any sequence of integers an < 71/4, let Cn(an) denote a matrix of which the elements are uniformly Ou(1/n) in the middle 71 — 2an rows and the elements in the first and last an rows are uniformly bounded. Denote by Cn(an) the matrix whose (71, j)th element is uniformly Ou(1) if 1 g 7' 3 an or n — an < 7' _<_ n or Ii — j l < an, and is uniformly Ou(1/n) otherwise. We make the following assumptions on the coefficients b,- 3172(9) and on the prediction variances di,n(6)02)- (Cl) As function of 0 E [a, b], bijm,(6) is uniformly bounded in 6’ and in (i,j,n). For 79 some sequence of positive integers kn such that kn = 0((log log 71.)” 8), k?) . ijn( (9)=l) 01487), for any i, and (3.4.1) 2 (W IV’ j3lj-il>kn Z bane) )\/ b.J,6n( )|= (3.4.2) JIHISkn Cu (16}, / n) otherwise , where xV y = max{rr, y}. (C?) di,n(6102)—1=0u(n)7 di,n(9010(2))/di,n(9)02)=0u(1)and 3611:7169 0 2)/5’¢9’=0u(n )- . . . . . . . 8 2 (C3) p9(h) lS tw1ce differentiable function in (6,h) w1th p9(h), 55,0901), 5852,0901) having bounded second derivatives in h and these bounds are independent of 0. Moreover, limh__,0+ p’9(h) > 0 for any 6. Lemma 3.4.1. Under conditions (A2) and (C1)—(C3), the following holds for some constant 7' 2 0. 1 (9V7; (1) v;1(vno'r,,):1,,+(logn)ron, v; 89 =(Iogn)’"(")n. a _ - a _ av" . (11) 5—(;(v,,1(v,,o'rn)):(1ogn)"on, 6—0 v,,1 89"): (logny‘on. (111) det(V,‘,‘1(Vn o Tn)) =Ou((log n)"), (v.;1(v.,, o Tn))—1:In+(logn)ron. where ()7, denotes 6,,(an) and ()n denotes (37,,(an) for some (Ln = 009,). This lemma is analogous to Lemma 3.3.10 and conditions (C1)-(C3) are satisfied in the exponential case. Theorem 3.4.2. Under the conditions in Lemma 3.4.1, (3.3. 6) and (3.3.7) hold uni- forme in (6,02) 6 J. Proof. The development of this theorem follows closely the approach used for Theo- rem 3.3.3. As in proof of Theorem 3.3.3, to show (3.3.6) we compare the correspond- 8O ing log determinants and quadratic forms of €n¢ap(0, 02) and €n(6, 02). Since the first equality in Lemma 3.4.1 (III) gives log[det(Vn o T,,,)] = log[det (Vn)] + Ou(log log 71), along the same line comparing the quadratic forms in proof of Theorem 3.3.3, we only need to show (3.3.52), i.e. KanVhJXn = ou(n1/2) a.s. to obtain (3.3.6). Where recall that Hn = (V;1(Vn o T,,))‘1 — In, then Hn = (log n)r(~)n by (III) in Lemma 3.4.1. Throughout the rest of the paper, r represents some positive constant that is inde— pendent of parameters and may differ from time to time, but all the values it stands for are bounded above. First, by definition it is straightforward to verify the analogues to (3.3.31) and (3.3.33) with Cmén replaced by (log n)"'(~)n(an), (log n)rCn(an) re- spectively, still hold in this general case. This together with (3.4.1), (3.4.2) and the general expression (3.3.35) entails $87109) = Du(n2(log n)") almost surely, where Sn(0) = 02XQHnV; 1Xn, and it is noted that at most 2kn + 1 elements on each column of V; l are On (71.) and all others are 01,093 / n) Consequently, (3.3.52) follows from Lemma 3 if we can show E(Sn) : Du((log n)") and E(s,, — 12:(8,,))6 : Du((log n)"). (3.4.3) Indeed, thanks to (3.3.53) and (3.3.54), (3.4.3) directly results from the analogue of (3.3.55), namely v;1(9,02)v.,(90, 03) : ou(1)1,, + (log nyon. (3.4.4) which follows if (3.3.56) and A2Vn,0 = (log n)T()n uniformly in 6. (3.4.5) 81 hold based on the same proof to show (3.3.55) in previous subsection 3.3.5. Since (3.3.56) is a easy consequence of condition b), we only need to show (3.4.5) henceforth. Consider the absolute values of the (i, j) th element of A2Vn,0, kn < i _<_ n — kn, which under condition a) equals lk-jl T). Z d;1(bik,n(6) - bik,n(90))08P90( ) k=1 1 _ _ log log n)3 5d.- ‘08 Z lb.k,n(6)41.k,n(eo)|+d.108 Z Ib.k,n(6)—b.-).,n(eo)|=0u((—n—). k:|k—i|§kn k:|k—i]>kn i Because (1,—1 : 0,,(n) and 1),-mm) —b,-,,,,,(90) : Ou((log log n)1/8/n2) for kn <1: 3 n—kn by (3.4.2) and Mean Value Theorem. Similarly, it also follows from (3.4.2) that the elements on the first and last kn rows are Du((log log n)1/4), thus (3.4.5) and therefore (3.3.6) follows. By the same reasoning as before, to show (3.3.7), we compare the corresponding traces and quadratic forms in (3.3.62) and (3.3.63) in the same way as in proof of Theorem 3.3.3. Imitating the proof of (3.364) and (3.3.65) in section 3.3.5, one can show Lemma 3.4.1 along with (3.4.4) implies BVn. at) trace{(Vn o Tn)-1( o Tn)} = trace{V;1%g} + 011((108 ”)7), and (3.3.65) is still true in this general case. Furthermore, those two results give (3.3.7) immediately and the proof of Theorem 3.4.2 is completed. [:1 Next we can give a condition in terms of spectral density function for comparison between exact and tapered likelihood. Even though for two Matérn spectral densities, (3.3.19) cannot hold for any r2 > 2. However it might be hold for one Matérn spectral density and another spectral density of some covariance structure. Suggested by this idea, we have the following theorem of closeness of tapered and untapered likelihood. Theorem 3.4.3. If the underlying process is stationary Gaussian X (t), t 6 Rd, d S 3 having a mean 0 and a Matérn covariance function. Let f1 (A) be the spectral density 82 corresponding to isotropic Matérn covariagram K(h;o¥,61) and f1(A) be the spectral density corresponding to the tapered covariance function R (h; of, 01) = K (h; 0512, 61) K tap(h)- Suppose there exists r > d such that f1()\)—f1()\)_ h(A) _ O (]A|_r) as IA] ——> 00. (3.4.6) And the sampling locations {t1,t2, . . .} are from a bounded set D C N. Then, for any fixed 01 > O, with Po—probability 1, uniformly in 02 6 [w, v], €n,tap(01102)=€n(61302)+ 011(1), 8 €71,tap(61302) = 535 (9 80-2 671(61, 02) + 011(1), where P0 is the probability measure corresponding to the true parameter values 03, 00, V. This theorem can be proved by following the similar proof to show Thoerem 3.3.7 and Theorem 3.3.6, if we note that (3.4.6) implies (3.3.20). 3.4.2 Discussion In previous section, we investigated how the covariance tapering affects the asymp- totic efficiency of maximum likelihood estimators. We started out with the Ornstein- Uhlenbeck process that has an exponential covariance function. For this particular model, the inverse of the covariance matrix is a banded matrix. We would expect that, for this model, it would be easy to establish the asymptotic properties of ta— pered MLE. It turns out that even in this simple case, it is quite involved to derive the asymptotic distribution of the tapered MLE. We also considered the general case when the stochastic process on the real line has a covariance function that may not be exponential. We gave conditions on the coeffi— cients of drop-one prediction under which the tapered MLE and true MLE will have the same asymptotic distribution. One of the condition requires that the coefficient 83 decays rapidly as the sampling site moves away from the prediction site. Similar con- ditions can be given for a spatial process on le for d > 1 and the results in Subsection 3.4.1 can all be extended to the high dimensional case. We did not put more conditions on the taper in the general case than that in the exponential model. It is an interesting problem to establish Theorem 3.4.2 by weakening the conditions (C1) and (C2) but restricting the taper to a class that satisfies certain properties. Thoerem 3.4.3 suggests that when the spectral density of the taper has a lower tail than the spectral density of the covariance function of the underlying process, it is then possible to establish Theorem 3.4.2. 3.4.3 Appendix 3: Proof of Lemma in subsection 3.4.1 Proof of Lemma 3.4.1. Let n be large enough such that [n7] > 2kn. As in the proof of Lemma 3.3.10, we assume 02 = 1 without loss of generality. In addition, we assume all quantities with indices out of range [1,n] are 0, say bij = 0 if j < 1 or 3' > n. Let g(t) and g(t) be bounded even function on IR that may depend on 0 and have bounded second derivative on [—1,0)U(O, 1]. Define Q and G to be the matrices whose (i, j)th element is g(t, — tj) and g(ti — tj) respectively. We will show v;1(G 0 Q) : L1 + L2 + (log log n)3/86n(0) (3.4.7) where Lk, k = 1, 2 stands for a matrix whose (i,j)th element is denoted by Til-1k and . . i-Hc-n 2 _ J 5-7 Z'-J'| 1 n ) EIth n )) Tij,2:1{|7i—jI2i n n n n ). (3.4.11) Let 5,]— denote the (i,j)th element of V;1(G0Q). Since Zézll—ipkn IbigI = Du(kg/n2) and g(Il—jl /n) = p((l—j)/n), it follows from (3.3.35) and (3.4.11) that i+kn . - 3 . 3/8 _ i — _7 l — ] k ~ (log logn) €13: d,- 1: 5719(7)“ HUM—7721‘): Tij,1+Tz'j+0u( n )- (3-4-12) [=1—kn Where 12:11 M" (7—11 — 12—11 6 J t) — dgle'u ) 2: b. ( ) (3.4.13) Consider the case that Ii —jI 2 kn, then when i — kn g 6 S i + kn, 13 —j and i —j have the same sign. So by Taylor Theorem, i n )= g(2 f) + g’(—;,i)(—;—) + ou(——n2——). Plugging this into (3.4.13) gives 1 Ii—jl i-j ”kn 5 ~ - I Tij=d,- 9+(-—n )9(—n ) EIbif [23—1617, i + Ou((log log n)%/n2). (3-4-14) n We will show the following: . 3 (+17 7-) ou((loeloen)8/n2), Wage—kn; 2 b,)—: 1 (3.4.15) 3:149, n 071((log log TL); /n), otherwise. Before proving (3.4.15), we note that (3.4.15), (3.4.14) and d271 = Ou(n) yield, if Ii_‘jl2:kn 3 Du((loglogn)8/n), if kn < i _<_ n — kn; a,- : (3.4.16) O,,((loglogn)1/4), ifi S kn or i > n — kn. i+kn . . . . l— — — £7 — On the other hand, by condition (C1), we always have 2 bigl ‘7' It Jl g( J) [=3—kn n n = Ou((loglogn)1/4/n), observing that Ilt—jl — Ii —jII S Il—iI. So in view of (3.4.13), if It —jI < kn, .. i— ' W = e’+(| njll0u((108108n)1/4)- This together with (3.4.12) and (3.4.16) completes the proof of (3.4.7) if we can show (3.4.15) now. We now set to prove (3.4.15). Let us only consider the case when kn < i g n — kn 86 because the other case is obvious under condition (C1). Note that when kn < i g n '— kn, ’l'l'kn . kn )k Z 17%;: 2:10) i,i+k— bi ,i—k)’ 6: 2— —kn kl:- It suffices to show kn. . k 3 2 2(52'2442—5232—12); = 022((108108 "lg/n ) (3-4-17) Because D; anVn = In and 02 is set to be 1, n Zbith(It€ — th) = 0 if i7é j. (3.4.18) Then we have 221:1 bz-jpg(Itg — tiikn I) = O, which implies 19:12 ( (ha—12109“ 2—k‘ti—kn)+bi,t+k200(tt+k"ti—kn»=‘PO(ti‘ti—kn)+0u(kfi/ 712% kn M21112- ,2—12109(t2+12n—t2'—k)+b2 MPeUHkn—takll=—Pe(t2442n-t2)+0u(l€?2/n2)- (3.4.19) where the remainders are Ou(k,3,/n2) because they are bounded by 2 j: |,_ j|> kn Ibijl' Since t,- = i/n, tiik — ti—kn = ti+kn — tirk = (kn :l: k)/n,k = 1,. ..,kn. Taking the difference of the equalities in (3.4.19), we get 2 (p2( "+k)— p2(’“""°))(b2,22—b2,2_2)=02(k2/n2). (3.4.20) 13kgkn " By condition (C3) and Taylor Theorem, kn—k , kn—k 2k 122 2k 72,2, “jib—pa 2 )=22( 2 )72'+07(72‘)=P'+(0)7; +024”: p2)( ), (3.4.21) 87 where p;(0) = limt._,0+ pI9(t) and we suppressed 6. The last equality follows from Mean Value Theorem for p’6(h) at 0. Combining (3.4.21) with (3.4.20) gives 2 10+(0 l—(bi,i+k_ bi,i—k): 014(kfi/n2la (34-22) l 0 for every 6 E Ia, b] and therefore has a uniform positive bound from below. Hence, (3.4.7) is established and proof (I) is completed. In what follows we will consider —(Vn1(Vn o T71.» and 2% (v-13Vn ——(V,,1 (Vn oTn)), from 0) inthe similar way as in the proof of Lemma 3.3. 10. First, consider—— 86 (3.3.60) it can be written as 8Vn 86 _V-1 V; 1(Vn o Tn) + V;1 (% o Tn) . (3.4.23) This together with (3.4.8) imply that $(V;1(Vn o T,,)) equals 1 8V", 86 86 8Vn 86 :V; — V; —(log log n)3/80n(kn) + V,,1( o Tn) , (3.4.24) where the second summand is (log log n)7/ 8On(2kn) because of (3.4.10) and the fact that On(kn)On(kn) = (log log n)1/ 8(u)n(2lcn). The last summand can be rewritten as V22 16;)” + V;1 (6W 0 (Tn —— Jn)) . (3.4.25) From (3.4.24) and (3.4.25), we see that 8 _ .. 5—2 1 still holds in this general case by the same reason as in the proof of Lemma 3.3.10, therefore the proof of (III) with r = 27" is completed by using Laplacian expansion to calculate inverse of matrix in the conventional way and the whole lemma is proved by taking an = kn in (I), (III) and an = 2kn in (II). U 90 3.5 Simulation and model fitting of climate data 3.5. 1 Simulation study We are aware that most of the theoretical results in this chapter is about process with one-dimensional index. So we conducted the simulation to assess the validity of the asymptotic results obtained for multidimensional case and study the finite sample performance of tapered MLE. Because both Zhang(2004) and Kaufman(2008) have pointed out in their work that even the general results apply only to estimators with fixed 6, but joint estimation of o2 and 6 is what we usually do in practice. Kaufman(2005) did simulation to show that the joint maximizer 8%,tap6gi’tap performs better than 8§,tap6%” unless the chosen 61 happens to be the true value or very close by. Therefore, we adopted joint maximization using 1000 independent realizations from Gaussian process with mean zero and Matérn covariogram (3.3.14) with parameter values 1! = 0.8, o = 1 and = 5 at each of increasing sets of locations within unit square. Wendland 1 tapering function K1(h; 7)) = (1 — $10M?) 7 > 0 will be used according to condition (A3). The Figure 3.1 shows all the original, tapering and tapered covariance functions. For the first part of the simulation, we calculated MLE, tapered MLE, and the approximated 95% confidence interval of r] = 0262” ( = 13.13), which is (8262” :l: 1.96 =1: V28262V/fi) based on asymptotic results (3.3.24) with '7 = 0.6,0.3,0.2,0.1 respectively. To obtain those irregularly spaced points, we first generated grid points with increment 0.1 on each side in unit square, then add some more closely spaced points within [0, 0.2]2 with increment 0.03, then some more with increment 0.01, 0.02 and so on. Because it has been demonstrated in Stein [(1999b), p.223] that having some more closely spaced sample points will dramatically improve the estimation of variogram. Following this idea, we generated five set of sample points with increasing sample sizes 169, 298,385, 751, 1087. Figure 3.2 shows the sample location with sam- ple size 169. 91 f I. Tapering Covariance Function o- : " . — Matern Covariance(v=0.8) i". ‘2 - - - Wendland1 Taper I‘.’ ‘\ Tapered Cov(y=0.6) g _ '1"; 2--- Tapered Cov(y=0.3) ‘. '1 ‘ Tapered Cov(y=0.2) I ‘. ‘ -—-- Tapered Cov(y=0.1) l (D 2 8 o' " I C I .Q 1 I6 . > 8 w. _ O N on —I q _ . o I 1 I I 1 1 O 0 0 2 0.4 0 6 0 8 1 0 distance Figure 3.1: Matérn and tapered covariograms. Boxplot of MLE, tapered MLE with 7 = 0.6, 0.3 for 6, 02 and 0262” are shown in Figures 3.3, 3.5 and 3.7 with increasing numbers of sample size. The corresponding histograms are depicted in Figures 3.4, 3.6 and 3.8. In each figure, the plot of exact MLE serves as baseline. From these boxplots and histograms we can see the estimates of 6 and 02, untapered or tapered, are skewed and quite biased no matter how large the sample size is, but the estimates of (7262" are more and more centered at true value and have less and less variability with increasing sample size. It again suggests 2 that neither 0 nor 6 is consistently estimable, but the product 0262” is. Moreover, the distributions for both MLE and tapered MLE are more and more normal with 92 1.0 0.8 0.6 0.4 0.2 0.0 Sample Size 169 o o C) o o c) o o (3 o o o o o c> o o 0 (3 o o o t) o o o C) o o C) o o o o o C) o o C) o o C) o o o o c) o o C) o o C) o o o o C) o o C) o o C) o o o o C) o 0 <3 0 0 <> 0 o o o (3 o o C) o o C) o o 8008bod’ o o (3 o o C) o 0 0000000 888%8833 o o (3 o 0 <3 0 0 0000000 0000000 ooomooa> o o C) o o C) o o I I I I I I 0 0 0.2 0.4 0.6 0.8 1.0 x Figure 3.2: Sample data set locations of size 196. 93 decreasing variance, this agrees with the asymptotic normality result given by The- orem 3.3.8. The distribution for tapered MLE with 7 = 0.6 looks more similar to the one given by exact MLE than tapered MLE with 7 = 0.3 but it takes more time to compute though. It is not surprising, since larger tapering range will keep more information and therefore gives more accurate results, however the tapered covariance matrix will be less sparse and therefore more computational time is needed. Actually, our simulation study shows that as long as the number of observations keeps increasing, the distribution of tapered MLE of 0262” will be more and more symmetric around true value and normal with decreasing variance even when the degree of tapering is more severe with 7 = 0.2. As showed in Figure 3.9. When n=751, the distributions for 8262” with different tapering ranges 0.3 or 0.2 are roughly the same. This table 3.1 lists average standard deviation, the relative coverage frequency (rcf) and the average length of the intervals (al) of 95% confidence interval constructed using the result in Theorem 3.3.8. Even though rcf’s by using tapering technique are lower than the nominal level 95% for sample size 169 and confidence intervals are inflated more than exact MLE, they are dramatically improved and tend to perform similar to the MLE as sample size increases. When sample size reaches 385, all the relative coverage frequencies are quite close to the nominal level 95% and interval lengths by tapering are close to the MLE based one. These findings support our conjecture that the convergence results in Theorem 3.3.4 and Theorem 3.3.8 should still hold for multidimensional case under certain conditions. We have seen that the distribution of tapered MLE is more and more comparable with that of exact MLE with increasing sample size albeit the tapering rage is mod- erate small. This comparable finite sample behavior of tapered MLE as opposed to that of exact MLE is also shown by using even more severe degree of tapering with taper range 0.1 given that the sample size is large enough, see the case when n = 1087 as suggested in the Figure 3.10. This justifies our theoretical result in Theorem 3.3.8, 94 which gives asymptotic normality of the tapered estimate under the condition about the tail behavior of tapering function, but has no constrains on the tapering range. That is, theoretically speaking, the choice of tapering range has no impact on the asymptotic efficiency. For the second part of simulation, we recorded the estimates and timing for samples with sample size ranging from 1000 to 8000 to illustrate the comparable estimation as well as computational gain. In this part, the way to locate the sample points is to generate grid points in unit square with increment 0.005 and those with increment 0.0025 within [0, 0.2]2, then randomly choose ranging from 1000 to 8000 points out of them as sample locations. Aside from hardly reducing the efficiency of the estimation the computational efficiency is also achieved. See Figure 3.11, the red one depicts the time needed to derive the traditional MLE and blue one for tapered MLE, the red one goes up much faster than the blue one. So when the sample size gets larger, the time saving is more and more appealing. But the estimates almost stay the same when sample size is more than 2000. We can see from Table 3.2 when the sample size is 7000, it takes almost 9 times longer to get the exact MLE. (7 111 vs. 1 h 6 m using department server with CPU 2.8 Ghz and 8.00 GB RAM), but the difference of the estimates is only 0.02. All the computations in this section were conducted using open-source software R, with special courtesy to “Spam” created by Furrer. 95 Distribution of MLE and Tapered MLE for Matern model (v = 0.8, n=169 ) U I MLE E g D Tapered _y=0.6 T 9 [j Tapered _y=0.3 : 3* ".2 : 1 9 _ : o I 3 1r 0 O .. - » a 3 o I I I A B C Figure 3.3: The boxplot of MLE and tapered MLE (where A: 6, B: 82, C: (7262") with sample size 196. 96 Histogram of MLE (n = 169, Matern v = 0.8) I ~ ‘I 200 1 250 l l 0 50 100 m1 0 100 141 0 100 11111111 IIIII IIIIII IIIIIII 2 4 6 8 10 0.5 1.0 1.5 2.0 25 3.0 8 101214161820 Tapered MLE (y: 0.6) _ l 0 T1 _ 1W 0 1 I 9‘ gr 8] 8 - o . .8-1 0 ‘I 7 I 7 I I - I I I I I I o ‘1 I I I I I 2 4 6 8 0.5 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 Tapered MLE (2:03) _ I"‘_ 200 1 250 O 50 1 1 1 0 100 1 O 100 1 174. 1 1 1 1 1 Figure 3.4: The histogram of MLE and tapered MLE (where A: 6, B: 82, C: 8262”) with sample size 196. 97 Distn'bution of MLE and Tapered MLE for Matem model (v = 0.8, n=298 ) I MLE [j Tapered _y= 0.6 [:1 Tapered _y= 0.3 15 1 -—--— n—o-q -c-c- pun--- i--- --I ban 0 _DO Figure 3.5: The boxplot of MLE and tapered MLE (where A: 9, B: 62, C: {1282” ) with sample size 298. 98 Histogram of MLE (n = 298, Matern v = 0.8) O | a a o O 8 8 g 0 O 0 2 4 6 8 10 0 1 2 3 4 Tapered MLE (y: 0.6) o 0 ‘ o a a l 2 a o I ’ 9 I|I|I o o o _..I I.-__ |_l_fi—I F—T—T—I 4 6 8 10 0.5 1.0 1.5 2.0 2.5 10 12 14 16 Tapered MLE (y: 0.3) I l 8 o ' e ” a i I 8 3- "II o o o _..I II-__ 1_l_lfi_l'_l_l_l T—T—lfifi—fi i——T—T_| 2 3 4 5 6 7 8 9 0 1 2 3 4 5 10 12 14 16 A B C Figure 3.6: The histogram of MLE and tapered MLE (where A: (9, B: (P, C: (3262") with sample size 298. 99 Distribution of MLE and Tapered MLE for Matern model (v = 0.8, n=385 ) o o o I MLE 1:] Tapered_y=0.6 _8_ _a. 3- m _ Cl Tapered_y=0.3 : : : l ..._.... ....__ 9 - 0 E o g- 0 8 m-E . . 2 _L i 9 t ‘0' m j l i - =5- a: o I I I A B C Figure 3.7: The boxplot of MLE and tapered MLE (where A: d, B: {72, C: d2é2”) with sample size 385. 100 Histogram of MLE ( n = 385, Matern v = 0.8) 400 250 200 0 0 0 100 100 200 300 Tapered MLE (y: 0.6) l l I I 8 .II III- F‘l—I—Ffi—i—lfi 1—jfl—l—l 12 14 16 18 345678910 0.51.0 1.52.0 10 50 150 100 200 300 O O O Tapered MLE (y: 0.3) i o ‘0- I -. I 1.- 12 14 16 4 6 8 10 0.5 1.0 1.5 2.0 2.5 10 100 200 300 2 50 100 0 O O A B C Figure 3.8: The histogram of MLE and tapered MLE (where A: d, B: {72, C: 6262”) with sample size 385. 101 Tapered MLE of 0292” for Matem Model with v = 0.8 99 ._ 8 o o o (D ._ E E. _8_ 8 E I ; fl 8 £- $— 5— s: — I ' l 7' Fri 1' - _' __ _L ; 8 : o 3— g z I '6' 0 I o I O _L : . 9 _b. _L co — 0 8 I Tapered_y=0.3 0 _ E Tapered __y=0.2 I I I I I I I I 169 298 385 751 169 298 385 751 SAMPLE SIZE Figure 3.9: The boxplots of tapered MLE with different tapering range and increasing sample size. 102 Table 3.1: Average standard deviation(asd), relative coverage frequency(rcf), average length of intervals(al) of 95% CI of 0202" for Matérn model with u = 0.8. n 169 298 385 751 MLE rcf 0.93 0.92 0.93 0.94 3.1 5.67 4.26 3.73 2.65 asd 1.45 1.09 0.95 0.68 7 = 0.6 rcf 0.91 0.92 0.94 0.94 a1 5.59 4.27 3.77 2.69 asd 1.43 1.09 0.96 0.69 7 = 0.3 rcf 0.85 0.91 0.94 0.93 a1 5.28 4.15 3.76 2.71 asd 1.35 1.06 0.96 0.69 7 = 0.2 rcf 0.79 0.85 0.93 0.93 al 5.08 4.00 3.67 2.71 asd 1.30 1.02 0.94 0.69 Table 3.2: Timing and estimation for tapered MLE with tapering range 7 = 0.1 using computer with CPU 2.8 Ghz and 8.00 GB RAM 11 1000 2000 3000 4000 5000 6000 7000 8000 MLE 14.25 12.46 13.26 13.06 13.01 13.66 13.17 13.47 3 41.77 208.92 507.83 966.31 1499.67 2590.83 3954.36 5584.57 TMLE 13.76 12.40 13.18 12.97 12.99 13.72 13.19 13.52 s 8.37 33.61 79.83 115.29 189.07 295.11 442.87 624.83 103 Tapered MLE of (5262v <0 _ ‘- o 0 o O O 9 — .5. .8. e 3 — : : fl : _ . i I ..__... . 17° — i ‘— I—> § -.l—— m; 1_' O I ' n- : s: — E : i -- —o— (rcf= 0.943) t . (ref: 0.911) 0 1: —I i .1. I ~ 8 . (ref: 0.868) 2 — "‘5— Matern(v=0.8) 0 7:0.1 1 l T 385 751 1087 SAMPLE SIZE Figure 3.10: The boxplots of tapered MLE with severe degree of tapering range. 104 seconds 3000 5000 1 000 0 Timing of Estimating 0292“ A —e— _ "9" WEEl‘HH) / - A ./ / _ /A /A o..--0 0 «3,. MN... I l I l 1 l _l 1000 3000 5000 7000 samplesize estimate Estimates of (5202y for Matern Model with II: 0.8 16 15 14 13 12 =\ I.\_ I/ _3_ 111150141) tire /\ l 1000 I i | l l 3000 5000 sample size I l 7000 Figure 3.11: Timing and estimation for tapered MLE for Matérn model. 105 3.5.2 Fitting a Matérn model to climate data One of the applications of tapering is to fit a model to a large climate data set, which has been common based on satellite observing or remote sensing systems. The tapering technique is often important to fill in the missing value or refine the irregular data observations to standard gridded version using kriging, because the kriging or estimation involving repeatedly inversion of large matrix can be compu- tationally too heavy to be feasible. To demonstrate how tapering technique obvi- ate these hurdles without sacrificing the accuracy in estimation, we chose to use the climate data set analyzed in Furrer, Genton and Nychka(2006). Actually the whole data base consists of monthly average temperature and total precipitation from 1895 to 1997 at 5,900 weather stations (Johns, et al.(2003)). It is accessible via http://www.image.ucar.edu/GSP/Data/US.monthly.met by the University Corpora— tion for Atmospheric Research (UCAR). In order to have exact MLE to compare against and make the data set closer to Gaussian and stationary, we studied the precipitation anomaly field of April 1948 recorded at 5909 stations (See coverage map Figure 3.12) as Furrer, Genton and Ny- chka(2006) did for kriging, where however they assume the parameters are all known. It is noted that the “anomaly” field here means the the raw monthly total precipitation recorded in the conterminous US for April 1948 were taken square root and standard- ized based on long run mean and standardization to meet the model assumptions (Fuentes et al., 1998, 2005). We fit a Matérn with V = 0.3 and calculate MLE and ta- pered MLE of consistently estimable parameter 0292” , tapering still with Wandland 1 taper with 7 = 50 miles. From Table 3.3, we note that the tapered estimate 0.0158487 is very close to MLE 0.0158899, so is the standard error and 95% confidence inter- val (0.0153168, 0.0164630) vs. (0.0152771, 0.0164204) based on asymptotical result Theorem 3.3.8. However, the time to estimate exact MLE is almost 8 times longer (3.5 m vs. 28.3 m). Hence the accuracy and computational gain of the tapered MLE 106 Table 3.3: Fitting Matérn model with l/ = 0.3 and estimate consistently estimable parameter U202", tapering with 7 = 50 miles. estimate SD 95% CI length time (s) MLE 0.0158487 0.0002917 (0.0152771, 0.0164204) 0.0011432 1698.99 TMLE 0.0158899 0.0002924 (0.0153168 , 0.0164630) 0.0011462 212.95 are obtained by applying the tapering method to large datasets. The more sizable computational gain will be archived if we work on even larger datasets. Figure 3.12: The Precipitation Anomaly Field of April 1948. 3.6 Future Work There are some open problems for future research. First, for the Matérn model, the estimator of 0202” is constructed by fixing 0 at an arbitrary value. For a finite sample, common practice is to also estimate 0. It is an interesting question to see if Theorem 3.3.1 and Theorem 3.3.8 still hold for the MLE and tapered MLE by jointly maximizing 107 these two parameters. Our conjecture is that Theorem 3.3.8 can be extended to this case. Second, the theoretical results in this work are for the processes with one dimen- sional index. It is a more practical problem to study the high dimensional case. This is our on-going research. Third, in our study, we fixed the tapering range. We can consider another kind of tapering regime, which is to let the tapering range depend on the sample size and tend to zero with increasing sample. This will give a lot faster calculation, but we still need to find out theoretical justification of this type of tapering. Finally, huge datasets arise most frequently when spatial locations are observed repeatedly over time. So we believe covariance tapering is then potentially more powerful to deal with the large spatio-temporal data. This is another direction of my current research. 108 Bibliography Abramowitz, M. and Stegun, I. (1967). Handbook of Mathematical Functions. US. Government Printing Office, Washington, DC. (eds). Aronszajn(1950). Theory of reproducing kernels. Transactions of the American Math- ematical Society, 68:337—404, 1950. Blackwell, D. and Dubins, L. (1962). Merging of opinions with increasing information. Annals of Mathematical Statistics, 33, 882—886. Chatterji, S.D., Mandrekar, V.S., (1978). Equivalence and singularity of Gaussian measures and applications, Probabilistic Analysis and Related Topics (Ed. A.T. Barucha Ried), 1, AP, 169—197, Chen, H.-S., Simpson, D.G., Ying, Z., (2000). Infill asymptotics for a stochastic process model with measurement error. Statist. Sinica 10, 141—156. Davis, T. A. (2006). Direct Methods for Sparse Linear Systems. Philadelphia: Society for Industrial and Applied Mathematics. Du, J ., Zhang, H., and Mandrekar, V. (2007). F ixed-domain asymptotic properties of tapered maximum likelihood estimators. Accepted by the Annals of Statistics . Furrer, R., Genton, M. G., and Nychika, D. (2006). Covariance tapering for interpo- lation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3), 502—523. Gilbert, J. R., Moler, C. and Schreiber, R. (1992). Sparse matrices in MATLAB: Design and implementation. SIAM J. Matrix Anal. Appl, 13, 333-356. Gneiting, T. (1999). Radial positive definite functions generated by Euclids hat. J. Multivar. Anal. 69, 88-119. Gneiting, T. (2002). Compactly supported correlation functions. Journal of Multi- variate Analysis, 83(2), 493-508. 109 Graybill, Franklin A. (1983). Matrices with Applications in Statistics (Second Edition). Wadsworth, Belmont, California. IR(1978)) Ibragimov, I. A. and Rozanov, Yu. A. (1978). Gaussian Random Processes. Springer-Verlag, New York. Kaufman, C. (2005). Covariance Tapering for Likelihood-based Estimation in Large Spatial Datasets. Ph.D Thesis Proposal, 1—33. Kaufman, C., Schervish, M. and Nychka, D. (2008). Covariance tapering for likelihood- based estimation in large spatial datasets. Journal of the American Statistical As- sociation. (In press). LeMone, M.A., Chen, R, Alfieri, J.G., Cuenca, R., Hagimoto, Y., Blanken, P.D., Niyogi, D., Kang, S., Davis, K. and Grossman, R. (2007). NCAR/CU surface vegetation observation network during the international H20 Project 2002 field campaign. Bulletin of the American Meteorological Society, 88, 65-81. Loh, W.-L. (2005). Fixed-domain asymptotics for a subclass of Matérn-type Gaussian random fields. The Annals of Statistics, 33, 2344—2394. Mardia,K. V. and Marshall,R. J. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71, 135—46. Mirsky, L. (1955). An Introduction to Linear Algebra. Oxford: Oxford University Press. Mitra, S., Gneiting, T., and Sasvari, Z. (2003). Polynomial covariance functions on intervals. Bernoulli, 9(2), 229-241. Pissanetzky, S. (1984). Sparse Matrix Technology. Academic Press. Ripley, B. D. (1981). Spatial Statistics. Wiley, New York. Rudin, W, (1966). Real and Complex Analysis, McGraw Hill. Stassberg, D., LeMone, M.A., T. Warner, T., and Alfieri, J. G. (2008). Comparison of observed 10 m wind speeds to those based on Monin-Obukhov similarity theory using aircraft and surface data from the International H2O Project. Mon. Wea. Rev. 136, 964—972. Searle, SR. (1971). Linear Models. Wiley, New York. Stein, M. L. (1988). Asymptotically efficient prediction of a random field with a misspecified covariance function. The Annals of Statistics 16, 55—63. Stein, M. L. (1990a). Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure. Annals of Statistics, 18, 850—872. 110 Stein, M. L. (1990b). Bounds on the efficiency of linear predictions using an incorrect covariance function. The Annals of Statistics 18, 1116—1138. Stein, M. L. (1990c) A comparison of generalized cross validation and modified max- imum likelihood for estimating the parameters of a stochastic process. The Annals of Statistics 18, 1139—1157. Stein, M. L. (1999). Predicting random frields with increasing dense observations. Annals of Statistics, 9, 242—273. Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York. Weckworth, T.M., Parsons, D.B., Koch, S.E., J.A. Moore, M.A. LeMone, B.B. Demoz, C. Flamant, B. Geerts, J. Wang, W.F. Feltz (2004). An overview of the international H2O project (IHOP_2002) and some preliminary highlights. Bull. Amer. Meteor. Soc. 85, 253—277. Wendland, H. (1995). Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Advances in Computational Mathematics, 4, 289—396. Wendland, H. (1998). Error estimates for interpolation by compactly supported radial basis functions of minimal degree. Advances in Computational Mathematics, 93, 258—272. Wu, Z. M. (1995). Compactly supported positive definite radial functions. Advances in Computational Mathematics, 4, 283-292. Ying, Z. (1991). Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process. Journal of Multivariate Analysis, 36, 280—296. Ying, Z. (1993). Maximum likelihood estimation of parameters under a spatial sam- pling scheme. Ann. Statist. 21, 1567—1590. Zhang, H. (2004). Inconsistent estimation and asymptotically equivalent interpolations in model-based geostatistics. Journal of the American Statistical Association, 99, 250—261. Zhang, H and Du, J (2008). Covariance Tapering in Spatial Statistics. in Positive definite functions: From Schoenberg to space-time challenges, (eds) J. Mateu and E. Porcu. Graficas Castaii, Spain, 181—198. (ISBN: 978-84-612—8282-1) Zhang, H. and Zimmerman, D. L. (2005). Towards reconciling two asymptotic frame- works in spatial statistics. Biometrika, 92, 921—936. 111 3 1293 (13062 9194