METAMODEL-BASED OPTIMIZATION APPLIED TO CRASHWORTHINESS AND SHEET METAL HYDROFORMING

By

Aravindhan Ravisekar

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Mechanical Engineering

2003

ABSTRACT

METAMODEL-BASED OPTIMIZATION APPLIED TO CRASHWORTHINESS AND SHEET METAL HYDROFORMING

By

Aravindhan Ravisekar

A metamodel-based optimization approach is presented that can efficiently replace complex, computationally expensive computer analysis and simulation codes in optimization. Kriging is used as the statistical interpolation technique. Optimization is performed using a gradient-free algorithm based on a Bayesian technique. The proposed approach is applied to problems of crashworthiness optimization and sheet metal hydroforming. The objective in the crashworthiness optimization problem is to minimize the sum of the maximum section force and the maximum displacement in the rails of a bumper-rail assembly of an automobile. The objective in sheet metal hydroforming is to arrive at an optimal fluid pressure-punch stroke path that reduces wrinkling failures and allows for greater draw depths before rupturing. The proposed approach is efficient in terms of the computational time involved in optimization.

To My Family

ACKNOWLEDGMENTS

I would like to thank, first and foremost, my advisor, Dr. Alejandro Diaz, for his guidance and support throughout my graduate career and during the completion of this thesis. His leadership, support, attention to detail and hard work have set an example that I hope to match in my career. I am also grateful to the members of my committee, Dr. Andre Benard and Dr. Farhang Pourboghrat, for their interest in this research. I also owe a great debt of gratitude to Ms. Cori Ignatovitch for her continued support throughout this thesis work. Thanks also to my family for supporting me in my educational pursuits and all my friends for their encouragement.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1
INTRODUCTION
1.1 Motivation
1.2 Background
1.3 Problem Statement
1.4 Selection of Optimization Approach
1.5 Thesis Outline

CHAPTER 2
LITERATURE REVIEW
2.1 Design of Experiments
2.1.1 Factorial Designs
2.1.2 Latin Hypercubes
2.2 Metamodel Selection
2.2.1 Least Squares
2.2.2 Interpolating and Smoothing Splines
2.2.3 Fuzzy Logic and Neural Networks
2.2.4 Radial Basis Functions and Wavelets
2.2.5 Response Surface Methodology
2.2.6 Kriging

CHAPTER 3
KRIGING AND EFFICIENT GLOBAL OPTIMIZATION
3.1 Kriging
3.1.1 History and Terminology
3.1.2 General Kriging Approach
3.1.3 Derivation of the Prediction Formula
3.1.4 Evaluation of Kriging
3.2 Efficient Global Optimization
3.2.1 Expected Improvement Criterion
3.2.2 Evaluation of Efficient Global Optimization

CHAPTER 4
CRASHWORTHINESS OPTIMIZATION OF A BUMPER-RAIL ASSEMBLY
4.1 Background on Crashworthiness Analysis
4.2 Finite Element Model
4.3 Problem Statement
4.4 Optimization
4.4.1 Case 1
4.4.1.1 Discussion of Results
4.4.2 Case 2
4.4.2.1 Discussion of Results
4.5 Conclusion

CHAPTER 5
FINDING OPTIMAL FLUID PRESSURE-PUNCH STROKE PATH IN STAMP HYDROFORMING
5.1 Background on Stamp Hydroforming
5.2 Summary of Previous Works
5.3 The Finite Element Model
5.4 Problem Formulation for Surrogate-Based Optimization
5.5 Problem Statement
5.6 Optimization
5.6.1 Step 1
5.6.1.1 Discussion of Results
5.6.2 Step 2
5.6.2.1 Discussion of Results
5.6.3 Step 3
5.6.4 Step 4
5.6.4.1 Discussion of Results

BIBLIOGRAPHY

LIST OF TABLES

Table 1. Initial designs for case 1
Table 2. Search points generated by EGO for case 1
Table 3. Initial designs for case 2
Table 4. Search points generated by EGO for case 2
Table 5. Initial designs for step 1
Table 6. Initial designs for step 2
Table 7. Search points generated by EGO for step 2
Table 8. Initial designs for step 3
Table 9. Initial designs for step 4
Table 10. Search points generated by EGO for step 4

LIST OF FIGURES

"Images in this thesis are presented in color"

Figure 1. Full factorial designs
Figure 2. 3^1 fractional factorial design
Figure 3. Latin hypercube sampling (LHS)
Figure 4. Latin hypercube sampling of 21 points
Figure 5. True function and the kriging model
Figure 6. True vs. predicted responses
Figure 7. Cross-validation predictions from the kriging model
Figure 8. Surface plot of the Branin function
Figure 9. Contour plot of the Branin function
Figure 10. Initial LHS points for the Branin function
Figure 11. Initial points and points generated by EGO
Figure 12. Top view of the bumper-rail assembly before crash
Figure 13. Symmetric model of the bumper-rail assembly
Figure 14. Latin hypercube sampling for case 1
Figure 15. Scatter plot of true responses before optimization for case 1
Figure 16. Scatter plot of true responses after optimization for case 1
Figure 17. Resultant sectional force vs. time at the optimal solution for case 1
Figure 18. X-displacement vs. time at the optimal solution for case 1
Figure 19. Latin hypercube sampling for case 2
Figure 20. Scatter plot of true responses before optimization for case 2
Figure 21. Scatter plot of true responses after optimization for case 2
Figure 22. Resultant sectional force vs. time at the optimal solution for case 2
Figure 23. X-displacement vs. time at the optimal solution for case 2
Figure 24. Optimum fluid pressure-punch stroke path from previous work
Figure 25. FLD failure and distribution of strains with the initial pressure profile
Figure 26. Full model created for the stamp hydroforming process with a hemispherical punch, using a square blank
Figure 27. Quarter-model for the hydroforming process using a hemispherical punch with a round blank
Figure 28. FLD failure and distribution of strains for high values of pressure
Figure 29. Fluid pressure-punch stroke path: upper limit
Figure 30. FLD failure and distribution of strains: upper limit
Figure 31. Latin hypercube sampling for step 1
Figure 32. Scatter plot of true responses before optimization for step 1
Figure 33. Scatter plot of true responses after optimization for step 1
Figure 34. FLD failure and distribution of strains after step 1
Figure 35. Latin hypercube sampling for step 2
Figure 36. Scatter plot of true responses before optimization for step 2
Figure 37. FLD failure and distribution of strains after step 2
Figure 38. FLD failure and distribution of strains for the optimized pressure profile

CHAPTER 1
INTRODUCTION

1.1 Motivation

Computer-based simulation and analysis is used extensively in engineering for a variety of tasks. Practical engineering analysis often requires running complex, computationally expensive computer analysis and simulation codes such as finite element analysis and computational fluid dynamics models. Despite the steady increase in computing power, these analyses remain costly in terms of the time they consume. As a result, many optimization problems in industry become expensive when solved by conventional optimization techniques. In engineering, the performance of a product is governed by its design, and the computational expense of analysis can negate many good features of a product design. Hence, in recent years industry has focused increasingly on alternative techniques for solving computationally expensive optimization problems. A widely accepted practice is to build an inexpensive approximation to the expensive problem and to solve or optimize that approximation instead of the original problem. These approximations are normally referred to as coarse models and the original problem as the fine model. There are several ways to build a coarse model. In finite element analysis, for example, a coarse model would be one with a coarser mesh than the fine model.
This immediately implies a considerable loss of accuracy in the coarse model. Hence, there is a need to develop a coarse model that closely approximates the true problem.

1.2 Background

Initially, function approximations used Taylor series expansions, which require the gradient of the function. Most of the complex problems that need approximate models are multi-modal and highly nonlinear, which makes it difficult to extract derivative information from these functions. As a result, optimization methods that require derivative information are not suitable for these problems. Researchers in the field ([28], [29]) have concluded that gradient-based approaches are not well suited to the complex problems of the modern day. Hence, derivative-free optimization methods are gaining popularity among researchers. Curve-fitting techniques such as least squares and response surfaces were initially used to build the approximations. Another solution to the problem of fitting highly nonlinear data involves the use of splines, where the polynomials are defined in a piecewise manner rather than as a single expression. Common to all the modeling techniques mentioned above is the assumption that the underlying data are polynomial in nature. A method that provides a suitable alternative to these techniques is kriging [9]. The origin of kriging is in mining and geostatistics. A detailed description of these methods is given in chapter two.

1.3 Problem Statement

The objective of design optimization is to select the values of design variables x_i (i = 1, ..., n), subject to various constraints, in such a way that an objective function f = f(x) attains an extreme value. This can be expressed in the abbreviated form

$$\min_{x \in \mathbb{R}^n} \; \{\, f(x) : h(x) = 0,\ g(x) \le 0 \,\} \qquad (1.1)$$

with $\mathbb{R}$ the set of real numbers, f the objective function, $x \in \mathbb{R}^n$ a vector of n design variables, g(x) a vector of m inequality constraints, h(x) a vector of p equality constraints, and $X = \{x \in \mathbb{R}^n : h(x) = 0,\ g(x) \le 0\}$ the feasible domain.

The field of interest in this thesis is simulation-based optimization. Consider, for example, a crashworthiness test performed on a vehicle model. There may be hundreds of design parameters and millions of degrees of freedom, and the huge computational time involved could hinder finding an optimum solution to the test problem. Replacing such an expensive test by computer simulation may reduce the cost of the experiment, but these computer simulations may themselves prove expensive in terms of computational time. Hence, there is a need to find an inexpensive alternative. The present work focuses on building statistical approximations from data taken from such expensive computer simulations. The statistical models are referred to as "metamodels" [35], meaning a model of a model. These metamodels are considerably less expensive because they require fewer evaluations of complex finite element models. The functions approximated by metamodels are typically (a) noisy or discontinuous (i.e., non-smooth) and/or (b) time-consuming to compute. The terms approximation, surrogate model, and metamodel are used synonymously throughout this thesis. The major disadvantage of replacing the original computer-based simulation model by a surrogate model is that the solution of the metamodel optimization problem may not be the solution to the true problem (i.e., we are actually solving a different problem).
Moreover, optimizing a surrogate model with a gradient-free optimization method requires more function evaluations than gradient-based techniques. This thesis focuses on finding an optimization approach that can address problems of this kind.

1.4 Selection of Optimization Approach

The basic requirement for the optimization approach is the selection of the design points for building the surrogate. The design of experiments plays a major part in surrogate-based optimization, since the performance of the surrogate depends on where the points are selected in the design domain. The widely accepted method of Latin hypercube sampling is used for designing the experiments in this thesis. The next step is to build the surrogate based on true function evaluations at the selected design sites. The metamodel in this thesis is built using a statistical interpolation technique called kriging. A derivative-free optimization technique called Efficient Global Optimization (EGO) is used to optimize the surrogate. The compatibility of EGO and kriging is discussed later in this thesis.

1.5 Thesis Outline

The rest of the chapters are arranged as follows. In chapter two, the available methods for designing the experiments and modeling the surrogates are described and their relative advantages are discussed. Chapter three discusses the kriging metamodel technique in detail, evaluates it on a test function, and describes the Efficient Global Optimization technique. In chapter four an example problem on design for crashworthiness is solved using the suggested optimization approach. In chapter five an example problem on sheet metal hydroforming is solved using the suggested optimization approach.

CHAPTER 2
LITERATURE REVIEW

As discussed in the introduction, one can use approximations to enhance design optimization. The surrogate-based optimization approach is normally divided into three stages. The first stage is the selection of initial design sites, or design of experiments; some common methods for designing the experiments are discussed in this chapter. The design of experiments stage is followed by the metamodeling stage, where a surrogate function is built. This thesis uses kriging for metamodeling; other widely used metamodeling techniques are also discussed. The third stage, optimization of the surrogate, is the subject of chapter three.

2.1 Design of Experiments (DOE)

Building an approximation for computer simulations involves (a) choosing a strategy to sample the region of interest and (b) constructing an approximation model to the sampled data. The region of interest is referred to as the "design space". It is bounded by the upper and lower bounds of each design (input) variable being studied. Design of experiments techniques allow a designer to select the points intelligently, in such a way as to produce an accurate and statistically meaningful approximation. This chapter compares several experimental design types that are widely used in the engineering design community.

2.1.1 Factorial Designs

There are basically two types of factorial designs: full factorial designs and fractional factorial designs. Both types are characterized by the terms factor and level. A factor is a dimension of the design space; a level is a sub-division of a factor. Each factor is fixed at a certain number of levels. Some examples of full factorial designs are 2^k and 3^k designs. The power k stands for the number of factors or dimensions, and the numbers (2 and 3) represent the number of levels. A short sketch that generates such designs is given below.
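As an illustration of how a full factorial design enumerates every combination of levels, the following minimal Python sketch (not part of the original thesis; the unit design space and the helper name full_factorial are assumptions for illustration) generates the 2^2 and 3^2 designs discussed next.

```python
from itertools import product

def full_factorial(levels_per_factor, k):
    """All level combinations of a k-factor full factorial design on the unit cube [0, 1]^k."""
    levels = [i / (levels_per_factor - 1) for i in range(levels_per_factor)]
    return list(product(levels, repeat=k))

print(full_factorial(2, 2))  # 2^2 design: the 4 corner points of a square
print(full_factorial(3, 2))  # 3^2 design: a 3 x 3 grid of 9 points
```

The number of points, levels^k, grows exponentially with the number of factors k, which motivates the fractional factorial and Latin hypercube designs discussed next.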
Consider a 2^2 design, i.e., a two-dimensional space with two levels for each dimension; the points lie at the corners of a square. In a 3^2 design, each factor also has a level at its center in addition to the levels at the ends, giving a 3 x 3 grid of points. These two designs are shown in figure 1.

Figure 1. Full factorial designs (2^2 full factorial design and 3^2 full factorial design)

The size of a full factorial experiment increases exponentially with the number of factors, which leads to an unmanageable number of experiments. Fractional factorial designs are used when experiments are costly and many factors are required. A fractional factorial design is a fraction of a full factorial design: a 3^(k-p) design is a fraction of a 3^k design. In the case of a 3^2 design, one fractional design of it is 3^1. A 3^1 fractional factorial design is shown in figure 2.

Figure 2. 3^1 fractional factorial design

2.1.2 Latin Hypercubes

Latin hypercubes were the first type of design proposed specifically for computer experiments [36]. A Latin hypercube is a matrix of n rows and k columns, where n is the number of levels being examined and k is the number of design variables (dimensions). A level refers to the total number of points to be sampled. Each of the k columns contains the levels 1, 2, ..., n, randomly permuted, each permutation being equally likely. These k columns are then matched at random to form the Latin hypercube. Latin hypercubes offer flexible sample sizes while ensuring stratified sampling. A 10 x 2 Latin hypercube sample is shown in figure 3.

Figure 3. Latin hypercube sampling (LHS)

As can be seen from figure 3, there is no definite pattern to the selection of points, which eliminates any bias in the selection of design sites. As noted above, LHS is well suited to computer experiments because the selection of random numbers and the random matching of columns are easily accomplished with computers. Because of these features, Latin hypercube sampling is used for designing the experiments in this thesis.

2.2 Metamodel Selection

In this section, a brief overview of different surrogate modeling techniques is given. The purpose is not to explain each in detail; rather, the aim is to illustrate the wide variety of approximations available. It should be made clear that the most important technique for this thesis is kriging (Section 2.2.6). While kriging is presented in this overview, the details of the technique are left for the next chapter.

2.2.1 Least Squares

This was the first kind of metamodel brought into use. Least squares models use polynomials. For given data, one can determine the regression coefficients $\beta_i$ in, say, a quadratic model of the form ([19])

$$\hat{y}(x) = \beta_0 + \beta_1 x + \beta_2 x^2, \qquad (2.1)$$

where x is the input variable and $\hat{y}(x)$ is the predicted value of the response. The regression coefficients are found by minimizing the mean of the sum of squared errors (MSSE) of the predicted output values at the sampled points,

$$\mathrm{MSSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}(x_i)\bigr)^2. \qquad (2.2)$$

Equation 2.1 can be extended to any k-th order polynomial as

$$\hat{y}(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k. \qquad (2.3)$$

Equation 2.3 in matrix form is written as

$$X\beta = y. \qquad (2.4)$$

Solving for the vector β gives the least squares estimate

$$\hat{\beta} = (X^T X)^{-1} X^T y. \qquad (2.5)$$

With the optimal values of the coefficients chosen, a simple polynomial equation relating the inputs and outputs is obtained. This type of metamodel can be extended to any higher order polynomial by changing the number of regression coefficients.
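To make the least squares construction concrete, the following minimal Python/NumPy sketch (not part of the original thesis; the sample data are hypothetical) builds the design matrix for the quadratic model of Equation (2.1) and computes the estimate of Equation (2.5).

```python
import numpy as np

# Hypothetical sampled data (x_i, y_i); any one-dimensional data set could be used here.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([1.0, 0.8, 0.9, 1.4, 2.1])

# Design matrix X for the quadratic model y_hat(x) = b0 + b1*x + b2*x^2  (Eq. 2.1)
X = np.column_stack([np.ones_like(x), x, x**2])

# Least squares estimate beta_hat = (X^T X)^(-1) X^T y  (Eq. 2.5);
# lstsq is used instead of forming the inverse explicitly, for numerical stability.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def y_hat(x_new):
    """Predicted response of the fitted quadratic metamodel."""
    return beta[0] + beta[1] * x_new + beta[2] * x_new**2

# Mean of the sum of squared errors at the sampled points  (Eq. 2.2)
msse = np.mean((y - y_hat(x))**2)
print("coefficients:", beta, "MSSE:", msse)
```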
2.2.2 Interpolating and Smoothing Splines

Splines, where the polynomials are defined in a piecewise manner rather than as a single expression for the entire data set, provide an alternative solution to the problem of fitting highly nonlinear data. Simply put, several low-order polynomials are fit to the data, each over a separate range defined by the knots or break points. Boundary conditions are then placed on the polynomials to ensure that the pieces match up with a prescribed order of continuity. Most often, cubic polynomial pieces with C^2 continuity are used; with C^2 continuity, the pieces have identical value, slope, and curvature at the knot locations. In the case of interpolating splines, the knots are taken at the data points so that the spline passes exactly through each data point. Therefore, while interpolating splines may be more true to the sampled data at the data points than least squares polynomials (i.e., a single polynomial), they can exhibit more waviness between the data points [28]. There has been considerable work done on another regression model known as the smoothing spline. By adjusting a weight factor known as the smoothing parameter, a compromise can be reached between the goodness of fit seen in interpolating splines and the smoothness of the approximating functions seen in least squares models.

2.2.3 Fuzzy Logic and Neural Networks

Even though splines do not require the fit-and-analyze procedure that least squares models use, they still assume that the data can be locally fit by a cubic polynomial. There are many other methods that avoid such assumptions by letting the data be fit in a more free form. Two popular methods in use are fuzzy logic and neural networks. In the former, variables are classified as belonging to sets that define an input as, say, high, medium or low. Rules are then defined to describe how the dependent variable responds to the various input sets. By describing how closely an input belongs to one set or another, an approximate response can be defined. For neural networks, only the set of input and corresponding output values is required. As the name suggests, the technique is analogous to the way the human brain learns. The input values are passed to a layer of nodes, which transform the data using functions such as the sigmoid or tansig functions. The outputs of the hidden layer nodes are then passed to the output layer, where they are recombined and scaled. Nonlinear optimization techniques are used to "train" the model parameters of the nodes so that the network can accurately predict the output, or response, for a given input. Having multiple nodes and layers allows for highly nonlinear models. Because no prior knowledge of the behavior of the response is necessary, these methods have gained popularity throughout various fields.

2.2.4 Radial Basis Functions and Wavelets

Two other metamodels that have found wide use in image filtering and reconstruction are radial basis functions and wavelets. The radial basis function metamodel was originally developed by Hardy in 1971 as an interpolating scheme. Fifteen years later, the work of Dyn made radial basis functions more practical by enabling them to smooth data. As the name suggests, the form of these metamodels is a basis function dependent on the Euclidean distance between the sampled data point and the point to be predicted; a brief sketch of such an interpolator is given below.
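The following minimal Python sketch (not part of the original thesis) illustrates the distance-based form just described; the Gaussian basis function and the width parameter c are illustrative assumptions rather than the specific choices discussed in the thesis.

```python
import numpy as np

# Hypothetical one-dimensional sample to interpolate.
x_data = np.array([0.0, 0.3, 0.6, 1.0])
y_data = np.array([0.0, 0.9, 0.2, 1.0])
c = 0.5  # assumed basis-function width

def phi(r):
    """Gaussian radial basis function of the Euclidean distance r."""
    return np.exp(-(r / c) ** 2)

# Interpolation conditions: sum_j w_j * phi(|x_i - x_j|) = y_i at every sample point.
A = phi(np.abs(x_data[:, None] - x_data[None, :]))
w = np.linalg.solve(A, y_data)

def rbf_predict(x_new):
    """Prediction: a weighted sum of basis functions centred at the sampled points."""
    return phi(np.abs(x_new - x_data)) @ w

print(rbf_predict(0.45))
```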
Radial basis functions cover a wide range of metamodels, and one particular choice of basis function leads to the well-known thin plate splines. Wavelet metamodels are also relatively recent in their development. They are similar to Fourier transforms, but wavelet transforms have the ability to retain space or time information. They are also much faster and require fewer coefficients to represent a regular signal.

2.2.5 Response Surface Methodology

A response surface model may be written in general as

$$y = f(x_1, x_2, \ldots, x_k) + \varepsilon \qquad (2.6)$$

where ε represents the total error (the difference between the actual and predicted values) and k is the number of predictor variables in the model. The function f is normally chosen to be a low-order polynomial ([42]), typically second order. Such a second-order model with k variables may be represented mathematically as

$$\hat{f} = \hat{\lambda}_0 + \sum_{i=1}^{k}\hat{\lambda}_i x_i + \sum_{i=1}^{k}\sum_{j=i}^{k}\hat{\lambda}_{ij}\, x_i x_j \qquad (2.7)$$

For k variables, the second-order polynomial of equation (2.7) has p = (k + 1)(k + 2)/2 parameters. The intercept, or constant term λ₀, is always included in the parameters of a model. The model may be expanded to include higher-order terms. Once a model is chosen as the response model, the next step is to obtain estimates of the unknown parameters. This step is accomplished using multiple linear regression analysis. In general the error terms are unknown, and the regression method is used to obtain estimates of the parameters: the estimates $\hat{\lambda}_i$ are chosen such that the sum L of the squares of the error terms,

$$L = \sum_i \varepsilon_i^2, \qquad (2.8)$$

is minimized. The parameters λ_i can be obtained by solving the equation

$$\hat{\lambda} = (X^T X)^{-1} X^T y. \qquad (2.9)$$

The polynomial model determined in this way will usually include redundant parameters, which reduce the accuracy of the model. Several methods exist for selecting the best subset of parameters to be used in the final response model, referred to as the reduced model. Stepwise regression is a procedure employed to obtain the final response model; interested readers can refer to ([23] and [42]).

2.2.6 Kriging

Another type of metamodel that is currently gaining popularity is kriging. In all of the above metamodels, a fundamental assumption is that y(x) = f(x) + ε, where the residuals ε are independent, identically distributed normal random variables. The main idea behind kriging is that the errors in the predicted values, ε_i, are not independent. Rather, the error is taken to be a systematic function of x. The kriging metamodel, y(x) = f(x) + Z(x), is comprised of two parts: a polynomial f(x) and a functional departure from that polynomial, Z(x). While the idea behind kriging is only sketched here, the details are left for the next chapter.

CHAPTER 3
KRIGING AND EFFICIENT GLOBAL OPTIMIZATION

3.1 Kriging

This section describes kriging in detail. The model fitting procedures are explained and the prediction equation of the method is derived. The final part of the chapter evaluates kriging.

3.1.1 History and Terminology

There has been much interest of late in a form of metamodeling known as kriging. The name refers to a South African geologist, D. G. Krige, who first used the statistics-based technique in analyzing mining data in the 1950s and 1960s. His work was furthered in the early 1970s by authors such as Matheron, and formed the foundation for an entire field of study now known as geostatistics [6].
At the end of the 1980s, kriging was taken in a new direction as a group of statisticians applied the techniques of kriging to deterministic computer experiments [25], [46]. This new research avenue was quickly explored by a variety of authors as the potential for more general engineering work became apparent. Their process of experimental design and use of kriging models was named Design and Analysis of Computer Experiments (DACE), much as RSM refers to a set of tools for modeling and optimizing a system.

3.1.2 General Kriging Approach

This section gives a general overview of kriging. The majority of metamodeling techniques rely on the decomposition y(x) = ŷ(x) + ε, where the residuals ε are independent, identically distributed normal random variables. The main idea behind kriging is that the errors in the predicted values, ε_i, are not independent; rather, kriging takes the view that the errors are a systematic function of x. To understand this statement better, consider the following illustration, in which a quadratic equation is fit via least squares to a given data set. For a narrow region around two sample points, let the true function be y(x) and its associated regression model be f(x). At the sample point x_i we know the true function value y(x_i), and therefore the error is ε(x_i) = y(x_i) − f(x_i). At a location x_i + δ, for some small δ, we only know the value of the regression function f(x_i + δ). Kriging theory posits that if the error at x_i is large, then there is good reason to believe that the error at x_i + δ is also large. This is because there is a systematic error associated with the functional form f(x): the closer a point x is to x_i, the more likely f(x) is to deviate from y(x) by the same amount as ε(x_i).

A kriging metamodel in multiple dimensions takes the form

$$\hat{y}(x) = f(x) + Z(x) \qquad (3.1)$$

and is comprised of two parts: a polynomial f(x) and a functional departure from that polynomial, Z(x). This can be written as [25]

$$\hat{y}(x) = \sum_{j=1}^{d} \beta_j f_j(x) + Z(x) \qquad (3.2)$$

where the f_j(x) are the basis functions (i.e., the polynomial terms), the β_j are the corresponding coefficients, and Z(x) is a Gaussian process. More precisely, Z(x) represents uncertainty in the mean of y(x), with E(Z(x)) = 0. The f(x) term is similar to the polynomial model in a response surface, providing a "global" model of the design space; in many cases f(x) is simply taken to be a constant term β [25]. While f(x) "globally" approximates the design space, Z(x) creates "localized" deviations so that the kriging model interpolates the sampled data points. The covariance of Z(x) is given by

$$\mathrm{Cov}(Z(w), Z(x)) = \sigma^2 R(w, x) \qquad (3.3)$$

Here σ² is a scale factor called the process variance that can be tuned to the data, and R(w, x) is known as the spatial correlation function (SCF). The choice of SCF determines how the metamodel fits the data. There are many choices of R(w, x) to quantify how smoothly the function moves from point x to point w. One of the most common SCFs [21] used in DACE (shown here for one-dimensional points w and x) is

$$R(w - x) = e^{-\theta |w - x|^{p}}, \qquad (3.4)$$

where θ > 0 and 0 ≤ p ≤ 2. This choice of SCF is motivated by the fact that the sample path produced is extremely smooth and the function is infinitely differentiable. As with all choices of SCF, the function goes to zero as |w − x| increases, which shows that the influence of a sampled data point on the point to be predicted becomes weaker as their separation distance increases. A small numerical sketch of this correlation function is given below.
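The following minimal Python sketch (not part of the original thesis; the distances and parameter values are hypothetical) evaluates the correlation function of Equation (3.4) and shows how θ controls how quickly the influence of a sampled point dies off with distance.

```python
import numpy as np

def scf(w, x, theta, p):
    """Spatial correlation function R(w - x) = exp(-theta * |w - x|^p)  (Eq. 3.4)."""
    return np.exp(-theta * np.abs(w - x) ** p)

# Correlation between a sampled point at x = 0 and points at increasing distances,
# for a small and a large value of theta (p = 2 gives the smooth Gaussian case).
d = np.linspace(0.0, 2.0, 5)
print(scf(d, 0.0, theta=0.5, p=2.0))   # small theta: distant points remain well correlated
print(scf(d, 0.0, theta=10.0, p=2.0))  # large theta: correlation decays quickly
```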
The magnitude of θ dictates how quickly that influence deteriorates. For large values of θ, only data points very near each other are well correlated; for small values of θ, points farther away still influence the point to be predicted because they are still well correlated. The parameter p is a measure of smoothness: the response becomes increasingly smooth with increasing values of p. One should also note the interpolating behavior of the SCF. Since R(x, x) = 1, the predictor will pass exactly through any measured data point.

To extend the SCF to several dimensions, common practice in DACE is to multiply the correlation functions. The so-called product correlation rule applied to Equation (3.4) results in

$$R(w, x) = \prod_{d=1}^{D} e^{-\theta_d |w_d - x_d|^{p_d}} \qquad (3.5)$$

where the subscript d = 1 to D refers to the dimension. The metamodels are flexible enough to allow a different choice of SCF for each dimension. In equation (3.5) the values of θ and p are different in each dimension, although some authors [25] suggest using the same correlation parameters for all dimensions. In the work at hand, θ and p are fit for each dimension.

3.1.3 Derivation of the Prediction Formula

Several quantities and formulae have to be defined before deriving the prediction formula. The polynomial term of the model in equation (3.2) comprises the d × 1 vector of regression functions

$$f(x) = [f_1(x), f_2(x), \ldots, f_d(x)]^t$$

and the associated vector of coefficients

$$\beta = [\beta_1, \beta_2, \ldots, \beta_d]^t.$$

We next define the n × d expanded design matrix

$$F = \begin{bmatrix} f^t(x_1) \\ f^t(x_2) \\ \vdots \\ f^t(x_n) \end{bmatrix}$$

Notice that each row is one f(x) vector (transposed) and that there are n rows for the n sampled points. If we denote the stochastic process at the sampled points by z = [Z(x_1), Z(x_2), ..., Z(x_n)]^t, then for the outputs of the sampled data, y = [y_1, y_2, ..., y_n]^t, one can rewrite equation (3.2) as

$$y = F\beta + z \qquad (3.6)$$

Let R be the n × n correlation matrix with entry i, j defined by R(x_i, x_j) as in equation (3.5), and let

$$r_x = [R(x_1, x), R(x_2, x), \ldots, R(x_n, x)]^t$$

be the n × 1 vector of correlations between the point at which to predict, x, and all the previously sampled points x_i.

The goal of this derivation is to determine what is called the Best Linear Unbiased Predictor (BLUP). To do so, we need a measure of the prediction error to be minimized and a constraint to ensure unbiasedness; this can be written as a minimization problem using Lagrange parameters for the constraints. The term unbiasedness refers to the fact that the expected value of the linear predictor must equal the expected value of Equation (3.6). A linear predictor is one of the form $c_x^t y$; that is, it uses a linear combination of the sampled data points y with the linear weighting vector $c_x$. Thus, an unbiased linear predictor is one for which $E(c_x^t y) = c_x^t F\beta$, since E(Z(x)) = 0 by definition. By similar reasoning, $E(Y(x)) = f_x^t \beta$, where $f_x \equiv f(x)$. Equating these two for all β, we obtain the set of d unbiasedness constraints:

$$F^t c_x = f_x \qquad (3.7)$$

We now show that for any linear predictor $c_x^t y$ of Y(x), the mean squared error of prediction is

$$E\bigl(c_x^t y - Y(x)\bigr)^2 = E\left[c_x^t y\, y^t c_x + Y^2(x) - 2 c_x^t y\, Y(x)\right] = E\left[c_x^t (F\beta + z)(F\beta + z)^t c_x + \bigl(f_x^t \beta + Z(x)\bigr)^2 - 2 c_x^t (F\beta + z)\bigl(f_x^t \beta + Z(x)\bigr)\right]$$

The second expression is obtained by substituting for y and Y(x) in the first. Expanding the above expression, we get:
+ fx’flfl’fx + Z(x)2 — 2c.’1~m‘fl — 2c.’22 true res Figure 7 Cross validation predictions from the kriging model The line is extremely well fit proving that the cross-validated responses are close to the actual responses. 3.2 Efficient Global Optimization Once the metamodel is constructed, then an optimization algorithm can be employed to optimize the surrogate. This section describes the Efficient Global Optimization (EGO) technique, an optimization algorithm, used in this thesis. Efficient Global Optimization uses a surrogate of the true function to generate the search points that leads the algorithm to the solution of the problem. Since it uses a stochastic model (kriging) for the surrogate, it falls in to the general class of optimization 3O algorithms referred as Bayesian analysis algorithms. These global searching algorithms use statistical models from an initial data sample to determine where to evaluate the functions. The simplest way to use surrogates for optimization is to create the metamodel and then find the minimum (local search) of the metamodel. But the problem with this method is that, it totally depends on the accuracy of the surrogate and can easily lead to a local optimum. By simply finding the minimtun of the surrogate, the method does not acknowledge the uncertainty associated with the metamodel. To eliminate this problem, the method must put some emphasis on sampling (global search) where the uncertainty associated with the metamodel is high. The uncertainty of the metamodel is measured by the root mean squared error of the predictor. EGO uses Expected Improvement criterion to strike a balance between the global and local search. The EGO algorithm proceeds as follows: 1. Choose a small initial experimental design (set of points, say It) spread over the entire input space and get the true function responses at these points. 2. Model (kriging) the true function using all the previous function evaluations. 3. Find the maximum of the “expected improvement” criterion. The location of maximum is the new design point. 4. Evaluate the true function at the new design point. 5. Go to step two and continue till the algorithm converges to a solution. 31 The most important part of the algorithm is step three. The next design point found by the algorithm is one with largest “expected improvement” in the current minimum. In simple terms the next point for optimization [30] is one which has a response less than the current minimum or is the point where the uncertainty is maximum. 3.2.1 Expected Improvement Criterion Let us now explain the expected improvement criterion in detail. Let f {gm be the current minimum function value, after the initial sample (n) evaluations. If the function is sampled at x to determine y(x) then the improvement I over the current minimum is defined as I = {friiin — y(x) If y(x) < frflim (3.22) 0 otherwise The expected improvement is then defined as [32] fn’iin n E(1)= I (fmin - y(X))¢(y(X))dy (323) where ¢(y(x)) is the probability density function representing uncertainty about y. To A predict y at an untried location the kriging predictor y(x) (3.17) is used. in) = 3+ r’xR“(y — 1,3) (3.24) the mean squared error of the predictor is given by (3.14) 32 52(x)=0’22[1—[1 rx’] [(1) 11:] [H] (3.25) More details on the kriging predictor and its derivation are available in section 3.1 of this thesis. 
It is assumed [30] that y(x) has a Gaussian distribution with mean given by the predictor ŷ (omitting the dependence on x for notational simplicity) and variance given by s²(x). As a result, (y(x) − ŷ)/s is standard normal. By applying some tedious integration by parts to the right-hand side of equation (3.23), one can obtain an expression for the expected improvement in closed form, as shown below [11]:

$$E(I) = \begin{cases} \bigl(f^n_{\min} - \hat{y}\bigr)\,\Phi\!\left(\dfrac{f^n_{\min} - \hat{y}}{s}\right) + s\,\phi\!\left(\dfrac{f^n_{\min} - \hat{y}}{s}\right) & \text{if } s > 0 \\[6pt] 0 & \text{if } s = 0 \end{cases} \qquad (3.26)$$

In the above equation, Φ(·) is the cumulative distribution function of the standard normal distribution and φ(·) its probability density function. The first term in equation (3.26) is the difference between the current minimum and the predicted value, multiplied by the probability that the predicted value is smaller than the current minimum. The second term is the product of the root mean squared error and the value of the probability density of y at $f^n_{\min}$. The second term is large when the predicted value is close to $f^n_{\min}$ and when s is large, i.e., when there is much uncertainty about the predicted value. Thus the expected improvement will tend to be large at a point with a predicted value smaller than $f^n_{\min}$ and/or where there is much uncertainty associated with the prediction.

The expected improvement criterion is then generalized [32] by introducing a parameter g as

$$I^g = \begin{cases} \bigl(f^n_{\min} - Y(x)\bigr)^g & \text{if } y(x) < f^n_{\min} \\ 0 & \text{otherwise} \end{cases}$$
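As an illustration of Equation (3.26), the following minimal Python sketch (not part of the original thesis; the candidate values are hypothetical) computes the closed-form expected improvement using the standard normal distribution from SciPy.

```python
from scipy.stats import norm

def expected_improvement(y_hat, s, f_min):
    """Closed-form expected improvement of Eq. (3.26).

    y_hat : kriging prediction at the candidate point
    s     : root mean squared error of the prediction
    f_min : current minimum sampled function value
    """
    if s <= 0.0:
        return 0.0
    u = (f_min - y_hat) / s
    return (f_min - y_hat) * norm.cdf(u) + s * norm.pdf(u)

# Two hypothetical candidate points: one with a promising prediction and little
# uncertainty, and one with a poor prediction but large uncertainty.
print(expected_improvement(y_hat=0.8, s=0.05, f_min=1.0))
print(expected_improvement(y_hat=1.5, s=1.00, f_min=1.0))
```

Within the EGO loop described above, this quantity would be maximized over the design space to select the next point at which the expensive simulation is run.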