This is to certify that the dissertation entitled

INTEGRATING MULTISPECTRAL REFLECTANCE AND FLUORESCENCE IMAGING FOR APPLE DISORDER CLASSIFICATION

presented by

Diwan Prima Ariana

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Biosystems Engineering

Major Professor's Signature
May 14, 2004
Date

MSU is an Affirmative Action/Equal Opportunity Institution

INTEGRATING MULTISPECTRAL REFLECTANCE AND FLUORESCENCE IMAGING FOR APPLE DISORDER CLASSIFICATION

By
Diwan Prima Ariana

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Agricultural Engineering
2004

ABSTRACT

INTEGRATING MULTISPECTRAL REFLECTANCE AND FLUORESCENCE IMAGING FOR APPLE DISORDER CLASSIFICATION

By
Diwan Prima Ariana

Multispectral imaging in reflectance and fluorescence modes was used to classify various types of disorder on three apple varieties (Honeycrisp, Redcort, and Red Delicious). Eighteen images from a combination of filter sets ranging from the visible region through the NIR region and three different imaging modes (reflectance, visible-light-induced fluorescence, and UV-induced fluorescence) were acquired for each apple as a basis for pixel-level classification into normal or disorder tissue.
Two classification schemes, a 2-class and a multiple-class scheme, were developed and tested in this study, each combined with four different classifiers: nearest neighbor, neural network, linear discriminant function, and quadratic discriminant function. In the 2-class scheme, pixels were categorized into normal or disorder tissue, whereas in the multiple-class scheme, pixels were categorized into normal, bitter pit, black rot, decay, soft scald, and superficial scald tissues. Total classification accuracy of the nearest neighbor classifier under the 2-class scheme for the full model, using all eighteen images, was 99.1, 96.8, 95.9, and 99.2% for Honeycrisp, Redcort, Red Delicious, and combined variety, respectively. Furthermore, in the multiple-class scheme, the classification accuracy on Honeycrisp apples for normal, bitter pit, black rot, decay, and soft scald tissue was 98.7, 99.3, 98.9, 98.5, and 100%, respectively. These results indicate the potential of this technique to accurately recognize different types of disorder.

Comparison of the four classifiers showed that, for Honeycrisp and combined variety, the nearest neighbor classifier yielded the highest accuracy, followed by the neural network, linear discriminant, and quadratic discriminant classifiers. However, there were no significant differences among the classifiers on Redcort and Red Delicious.

Feature selection analysis to develop reduced-feature models was carried out through three different approaches: imaging mode combinations, filter combinations, and feature combinations. The imaging mode combination analysis indicated the potential of integrating the UV-induced fluorescence and reflectance modes. Furthermore, UV-induced fluorescence alone showed potential to detect superficial scald on Red Delicious, and was able to classify black rot and soft scald on Honeycrisp with high accuracy, 100 and 99.4%, respectively.
Several important wavelengths were identified from the filter combination analysis, i.e. 680, 740, and 905 nm. Reflectance at 680 nm relates to red color, and fluorescence response at 680 and 740 nm relates to the peaks of chlorophyll fluorescence emission, whereas the NIR response at 905 nm may relate to tissue physical characteristics. Feature combination analysis found that the best 4-feature model resulted in total accuracy of up to 96.6%, 98.8%, and 99.4% for Honeycrisp, Redcort, and Red Delicious, respectively.

Dedication

To my mother Tien Kartinasarie and my father Undi Syamsuddin, who both passed away during my graduate study at Michigan State University.

ACKNOWLEDGMENTS

I would like to express my deepest appreciation and gratitude to my major professor, Dr. Daniel E. Guyer, for the time and effort he devoted to me. His valuable guidance and assistance were essential to completing my Ph.D. degree. I appreciate the research assistantship given by Michigan State University through Dr. Guyer. I would also like to thank the members of my committee, Dr. Renfu Lu, USDA; Dr. Randolph Beaudry, Department of Horticulture; and Dr. Clark Radcliffe, Department of Mechanical Engineering, for their support and timely advice. Gracious appreciation is extended to Dr. Roger Brook for his guidance and for serving as my major professor during the first four years of my study at Michigan State University. I would like to thank the Department of Animal Science, through Dr. Mike VandeHaar and Mr. Robert Kriegel, for giving me research assistantships. I extend my great appreciation to the Indonesian Oil Palm Research Institute (IOPRI), which allowed me to continue my graduate study in the US and initially supported it. I would like to thank Dr. Bim Shrestha, member of our research group, for his suggestions and help throughout my research project. I also wish to acknowledge Dr.
Sastry Jayanty and Melissa from the Postharvest Technology and Physiology Laboratory, Department of Horticulture, who helped me identify disorders on apples. Finally, deepest appreciation to my beloved wife Endang Susilawati, my beloved sons Ega Prima, Fakhri Pranayanda, and Kevin Triananda, and my parents-in-law for their help, sacrifices, understanding, and patience during my graduate study.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
  1.1. Background
  1.2. Objectives and Hypothesis
2. LITERATURE AND TECHNICAL REVIEW
  2.1. Interaction of Light and Matter
  2.2. Spectral Imaging
  2.3. Fluorescence
  2.4. Apple Disorders
    2.4.1. Bitter Pit
    2.4.2. Soft Scald
    2.4.3. Superficial Scald
    2.4.4. Decay
    2.4.5. Black Rot
  2.5. Classification Techniques
    2.5.1. Artificial Neural Network Classifier
    2.5.2. Discriminant Functions
    2.5.3. Nearest Neighbor Classifier
  2.6. Feature Selections
    2.6.1. Branch-and-Bound
    2.6.2. Sequential Forward and Sequential Backward Selection
    2.6.3. Principal Component Analysis
    2.6.4. Neural Network Weights
  2.7. Empirical Studies
    2.7.1. Spectral Reflectance
    2.7.2. Fluorescence
3. MATERIALS AND METHODS
  3.1. Multispectral Imaging System
  3.2. Apples
  3.3. Image Acquisition
  3.4. Image Processing
  3.5. Pixel Sampling
  3.6. Classification
    3.6.1. Classification Using Artificial Neural Network
    3.6.2. Classification Using Discriminant Functions
    3.6.3. Classification Using K-Nearest Neighbor
  3.7. Feature Selection
    3.7.1. Imaging Mode Combinations
    3.7.2. Filter Combinations
    3.7.3. Feature Combinations
4. RESULTS AND DISCUSSION
  4.1. Captured Images
  4.2. Spectral Responses
    4.2.1. Reflectance
    4.2.2. Fluorescence
  4.3. Full Model Classification
    4.3.1. Two-class scheme
    4.3.2. Multiple-class
  4.4. Feature Selection
    4.4.1. Imaging Mode Combinations
    4.4.2. Filter Combinations
    4.4.3. Feature Combinations
5. CONCLUSIONS
  5.1. Spectral Responses
  5.2. Full Model
  5.3. Reduced-feature Models
  5.4. General
6. APPLICATIONS AND PERSPECTIVE
  6.1. Image-based classification
  6.2. Generality of the classification model
  6.3. Recommendations
APPENDIX A. Classification accuracy of imaging mode combination models
APPENDIX B. Classification accuracy of filter combination models using available images
APPENDIX C. Classification accuracy of filter combination models using UV-induced fluorescence (FUV) images
APPENDIX D. Classification accuracy of filter combination models using reflectance (R) images
APPENDIX E. Classification accuracy of feature combination models
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1. Acquired image for each imaging mode and filter combinations (indicated by ✓)
Table 3.2. Lighting level and exposure time for each imaging mode and filter combinations
Table 3.3. Total number of pixels selected for each tissue type from images of Honeycrisp, Redcort, and Red Delicious
Table 3.4. Features included in imaging mode combinations (indicated by ✓)
Table 3.5. Feature included in 1-filter model (indicated by ✓)
Table 4.1. Means of normalized sampled pixel values on Honeycrisp for each type of tissue
Table 4.2. Means of normalized sampled pixel values on Redcort and Red Delicious for each type of tissue
Table 4.3. Classification accuracy of full model for 2-class scheme
Table 4.4. Classification accuracy of full model for multiple-class scheme
Table 4.5. Optimum number of filters for each variety following filter combination approach
Table 4.6. Optimum number of filters for each variety based on PCA method
Table 4.7. Optimum number of filters for each variety based on neural network weight method
Table 6.1. Classification accuracy of the best 4-feature model (FVIS710, R740, R905, R710) of combined variety applied to each variety (based on nearest neighbor classifier)
Table A.1. Classification accuracy of imaging mode combination models on Honeycrisp (2-class scheme)
Table A.2. Classification accuracy of imaging mode combination models on Redcort (2-class scheme)
Table A.3. Classification accuracy of imaging mode combination models on Red Delicious (2-class scheme)
Table A.4. Classification accuracy of imaging mode combination models on combined variety (2-class scheme)
Table A.5. Classification accuracy of imaging mode combination models on Honeycrisp (multiple-class scheme)
Table A.6. Classification accuracy of imaging mode combination models on combined variety (multiple-class scheme)
Table B.1. Classification accuracy of filter combination models using available images on Honeycrisp (2-class scheme)
Table B.2. Classification accuracy of filter combination models using available images on Redcort (2-class scheme)
Table B.3. Classification accuracy of filter combination models using available images on Red Delicious (2-class scheme)
Table B.4. Classification accuracy of filter combination models using available images on combined variety (2-class scheme)
Table B.5. Classification accuracy of filter combination models using available images on Honeycrisp (multiple-class scheme)
Table B.6. Classification accuracy of filter combination models using available images on combined variety (multiple-class scheme)
Table C.1. Classification accuracy of filter combination models using UV-induced fluorescence (FUV) images on Honeycrisp (2-class scheme)
Table C.2. Classification accuracy of filter combination models using UV-induced fluorescence (FUV) images on Redcort (2-class scheme)
Table C.3. Classification accuracy of filter combination models using UV-induced fluorescence (FUV) images on Red Delicious (2-class scheme)
Table C.4. Classification accuracy of filter combination models using UV-induced fluorescence (FUV) images on combined variety (2-class scheme)
Table C.5. Classification accuracy of filter combination models using UV-induced fluorescence (FUV) images on Honeycrisp (multiple-class scheme)
Table C.6. Classification accuracy of filter combination models using UV-induced fluorescence (FUV) images on combined variety (multiple-class scheme)
Table D.1. Classification accuracy of filter combination models using reflectance (R) images on Honeycrisp (2-class scheme)
Table D.2. Classification accuracy of filter combination models using reflectance (R) images on Redcort (2-class scheme)
Table D.3. Classification accuracy of filter combination models using reflectance (R) images on Red Delicious (2-class scheme)
Table D.4. Classification accuracy of filter combination models using reflectance (R) images on combined variety (2-class scheme)
Table D.5. Classification accuracy of filter combination models using reflectance (R) images on Honeycrisp (multiple-class scheme)
Table D.6. Classification accuracy of filter combination models using reflectance (R) images on combined variety (multiple-class scheme)
Table E.1. Classification accuracy of feature combination models on Honeycrisp (2-class scheme)
Table E.2. Classification accuracy of feature combination models on Redcort (2-class scheme)
Table E.3. Classification accuracy of feature combination models on Red Delicious (2-class scheme)
Table E.4. Classification accuracy of feature combination models on combined variety (2-class scheme)
Table E.5. Classification accuracy of feature combination models on Honeycrisp (multiple-class scheme)
Table E.6. Classification accuracy of feature combination models on combined variety (multiple-class scheme)

LIST OF FIGURES

Figure 2.1. Conceptual representation of a volume of hyperspectral image data. Dark arrows indicate directions for sequential acquisitions to complete the volume of spatial and spectral data (Kim et al., 2001a); a) wavelength scanning, b) spatial scanning (pushbroom).
Figure 2.2. Light absorption and emission by chlorophyll (Taiz and Zeiger, 1998); (a) energy level diagram, (b) the spectra of absorption and fluorescence.
Figure 2.3. Examples of some disorders on apples. (a) bitter pit, (b) soft scald, (c) superficial scald, (d) black rot, (e) decay.
Figure 2.4. Nonlinear model of a neuron (Haykin, 1999)
Figure 2.5. Sigmoid function with varying slope parameter a (Haykin, 1999)
Figure 2.6. 5-Nearest Neighbor classifier in the case of 3 classes
Figure 2.7. Tree representation of a branch-and-bound algorithm (Kittler, 1986)
Figure 3.1. Schematic diagram of the multispectral imaging system.
Figure 3.2. Spectral characteristics of the camera, filters, and light sources.
Figure 3.3. Relation between digital light level value and light intensity of tungsten halogen light model A-240P, Dolan-Jenner Industries, Inc. (measured at 160 mm distance)
Figure 3.4. Pixel classification from multispectral images.
Figure 4.1. Examples of single apple image sets with various defect types. (a) bitter pit on Honeycrisp, (b) soft scald, decay, and black rot on Honeycrisp, (c) superficial scald on Red Delicious. (Each individual image was linearly stretched to achieve optimal visual contrast.) Imaging modes: reflectance (R); visible-light-induced fluorescence (FVIS); and UV-induced fluorescence (FUV). Filters: numbers indicate peak wavelength (nm) of bandpass filters, except 710, which is the cut-off wavelength (nm) of the highpass filter, and NF=no filter.
Figure 4.2. Response of VNIR reflectance of sampled pixels for various disorders of a) Honeycrisp, b) Redcort, and c) Red Delicious apple. Values are the average of the selected pixels (Table 3.3). Filters: R indicates reflectance mode, numbers indicate peak wavelength (nm) of bandpass filters, except 710, which is the cut-off wavelength (nm) of the highpass filter, and NF=no filter; * indicates significantly distinguished all tissue types.
Figure 4.3. Response of visible induced fluorescence of sampled pixels for various disorders of a) Honeycrisp, b) Redcort, and c) Red Delicious apple varieties. Values are the average of selected pixels (Table 3.3). Filters: FVIS indicates visible induced fluorescence mode, numbers indicate peak wavelength (nm) of bandpass filters, except 710, which is the cut-off wavelength (nm) of the highpass filter, and NF=no filter; * indicates significantly distinguished all tissue types.
Figure 4.4. Response of UV induced fluorescence of sampled pixels for various disorders of a) Honeycrisp, b) Redcort, and c) Red Delicious apple varieties. Values are the average of selected pixels (Table 3.3). Filters: FUV indicates UV induced fluorescence mode, numbers indicate peak wavelength (nm) of bandpass filters, except 710, which is the cut-off wavelength (nm) of the highpass filter, and NF=no filter; * indicates significantly distinguished all tissue types.
Figure 4.5. Total classification accuracy of imaging mode combinations in 2-class scheme (based on nearest neighbor classifier). Imaging mode combinations with the same letter for a variety are not significantly different at α = 0.05.
Figure 4.6. Total classification accuracy of imaging mode combinations in multiple-class scheme (based on nearest neighbor classifier). Imaging mode combinations with the same letter for a variety are not significantly different at α = 0.05.
Figure 4.7. Maximum total classification accuracy from each number of combined filters using all available images in each filter.
Figure 4.8. Maximum total classification accuracy from each number of combined filters within UV-induced fluorescence mode.
Figure 4.9. Maximum total classification accuracy from each number of combined filters within reflectance mode.
Figure 4.10.
Total classification accuracy based on nearest neighbor using single image: a) 2-class scheme, b) multiple-class scheme on Honeycrisp. Prefixes FUV=UV-induced fluorescence, FVIS=visible-light-induced fluorescence, R=reflectance; numbers following the prefix indicate peak wavelength (nm) of bandpass filters, except 710, which is the cut-off wavelength (nm) of the highpass filter, and NF=no filter.
Figure 4.11. Total classification accuracy based on nearest neighbor on each step of backward elimination process
Figure 4.12. Absolute value of eigenvector coefficient of: a) first principal component, b) second principal component
Figure 6.1. Images built from pixel-based classification. (a) Original FUV550 image, (b) Full model, (c) 4-feature model.

1. INTRODUCTION

1.1. Background

Internal and external quality are important factors in the highly competitive market of stored apples. Important quality criteria for consumers are: appearance, including size, color, and shape; texture; flavor; nutritional value; and presence of defects or disorders. Many factors influence quality, but they can be generally categorized into preharvest, harvest, and postharvest factors.

Quality classification of fruits is an important procedure in marketing and processing. In the past, segregation of high- and low-quality fruit was performed manually, but in modern packinghouses it is performed automatically, although it is still mainly limited to sorting fruit by color and size. Since manual fruit grading has drawbacks such as subjectivity, inconsistency, tediousness, limited labor availability, and cost, efforts to develop efficient and accurate automated fruit classification systems continue to be industry priorities.
Automated sorting technology can sort fruit and vegetables rapidly and consistently. Electronic sorting technology is in place, or is available, for sorting many commodity quality characteristics. The most sophisticated optical or electronic sorting equipment available today can sort with “good” accuracy. However, the ability to detect surface and sub-surface defects, disorders, and diseases is limited. This limitation results particularly from a lack of data on the spectral range, or set of ranges, needed to adequately detect as well as classify surface and sub-surface disorders.

Considerable work has been conducted in the area of noninvasive/nondestructive techniques to inspect fruits and vegetables. The techniques include surface reflectance and transmittance of various forms of light energy (ultraviolet, visible, NIR, MIR), acoustic response, mechanical deformation, x-ray, computed tomography (CT), fluorescence, and magnetic resonance imaging (MRI).

Diffuse reflectance in the visible (VIS) and NIR regions provides useful information to detect bruises, chilling injury, scald, decay lesions, and numerous other defects (Abbott, 1999). Diffuse reflectance measurement using spectrometers has been widely implemented in a variety of applications; however, spectroscopic assessment with relatively small point-source measurements has disadvantages compared to an imaging approach that characterizes the spatial variability of a sample material (Kim et al., 2001b). In particular, imaging techniques are better suited for the detection of localized effects in a sample material.

Imaging techniques have been successfully used for classification or sorting of agricultural products. One widely used imaging technique is multispectral and hyperspectral imaging, which captures a set of images at different wavelengths.
Multispectral and hyperspectral imaging techniques have been adopted in many disciplines, such as airborne remote sensing, environmental monitoring, medicine, military operations, factory automation, and manufacturing (Gat et al., 1997; Shaw and Manolakis, 2002). In agricultural product quality assessment, the techniques have been studied for inspection of poultry carcasses (Park et al., 1998), chicken skin tumor detection (Chao et al., 2002), and defect detection on cherries (Guyer and Yang, 2000), apples (Kavdir and Guyer, 2002; Lu, 2003; Mehl et al., 2002), citrus (Aleixos et al., 2002), and tomatoes (Polder et al., 2002).

Hyperspectral imaging techniques currently cannot be directly implemented in an online system for agricultural product sorting because the time required for image acquisition and analysis is too long. Multispectral imaging is a faster technique based on discrete spectral analysis at a few wavelengths, as opposed to the continuous spectral analysis used in hyperspectral imaging (Mehl et al., 2002).

Most of the studies in hyperspectral and multispectral imaging for agricultural product inspection involve reflectance imaging. An alternative, or additional, inspection technique is fluorescence imaging. Chlorophyll fluorescence in plant/leaf tissue has been studied extensively, but only recently has it been applied to fruit post-harvest physiology and, to an even lesser degree, to physical surface defects arising from handling or disorders. A fluorometer, which does not provide spatial information as fluorescence imaging does, was used in the majority of the studies of chlorophyll fluorescence. DeEll et al. (1999) summarized much of the past work related to fluorescence studies. The majority of the work focused on stress or disorders involving whole plant or whole commodity response, such as chilling injury, heat stress, environmental stress, and maturity, with limited study of disorders that involve localized or smaller areas of the surface tissue.
Several chlorophyll fluorescence studies on apples have been reported, such as in relation to superficial scald development (Mir et al., 1998), heat injury (Song et al., 2001), freezing injury (Forney et al., 2000), controlled-atmosphere disorders (DeEll et al., 1995), and maturation (Song et al., 1997).

Most plant leaves, when illuminated with UV radiation, exhibit a broad fluorescence emission with maxima at 440, 525, 685, and 740 nm (Chappelle et al., 1985). These fluorescence emissions are indicative of the complex interactions of both physiological and biochemical processes in plants. Changes in fluorescence emission in response to environmental perturbations can be wavelength dependent and are usually species dependent as well. A multispectral fluorescence imaging system with UV excitation was developed by Kim et al. (2001b) to capture fluorescence images of leaves in the blue, green, red, and far-red regions of the spectrum, using bandpass filters centered at 450, 550, 680, and 740 nm, respectively.

1.2. Objectives and Hypothesis

Although multispectral reflectance and fluorescence imaging have individually been studied widely in a variety of applications, most of the studies related to object classification deal with only one of the two imaging modes at a time. Integrating reflectance and fluorescence information in the classification model may improve the classification accuracy, considering that reflectance and fluorescence images carry different information as a result of the interaction of light energy and matter. Therefore, the main objective of this study was to develop a detection technique for defects on apples based on integrated multispectral reflectance and fluorescence imaging. To accomplish this overall objective, the following sub-objectives were established:

(1) Design and build a multispectral imaging system to capture images of apples under reflectance and fluorescence imaging modes.
(2) Determine if imaging of light energy reflectance in the visible and near-infrared regions, as well as imaging of chlorophyll fluorescence under both visible and UV excitation, can successfully be used to detect different types of defects or disorders on apples.
(3) Optimize the combination of filters and lighting mode(s) for best classification success.

Integrating reflectance and fluorescence information in a single classification model is what distinguishes this study, and it leads to the hypothesis that integrated multispectral imaging in reflectance and fluorescence modes can be used to enhance detection of different types of defects or disorders on apples.

2. LITERATURE AND TECHNICAL REVIEW

2.1. Interaction of Light and Matter

The interaction of light and matter is a highly complex phenomenon. The absorbing molecules of matter are excited to specific vibrational states or energy levels dependent on the energy of the incoming radiation. For example, long-wavelength radiations (low energy) such as radio waves or microwaves can excite gases; short-wavelength radiations (high energy) such as x-rays affect liquids and solids. According to quantum theory, molecules absorb light in the visible and ultraviolet regions because their electrons can move to higher energy states. Infrared light does not have enough energy to excite electrons in molecules. Instead, excitations resulting in molecular absorption come from vibration and rotation of molecules. Rotational absorption bands are predominantly in the far infrared. Vibrational absorption bands involve the near infrared, which has been applied extensively to component analysis of food and agricultural materials (Muir et al., 1989).

When a light beam falls on an object, part of the incident beam is reflected by the surface and the rest is transmitted into the object, where it is either absorbed, reflected back to the surface (body reflectance), or transmitted through the object.
Part of the absorbed radiation may be transformed into another form of radiation, such as fluorescence and delayed-light emission (light emitted from the object after the source has been removed). The amounts of radiant energy reflected, transmitted, absorbed, or emitted depend on the properties of the object and the incident radiation (Chen, 1978).

When a fruit or vegetable is exposed to light, about 4% of the incident light is reflected at the outer surface, causing specular reflectance or gloss, and the remaining 96% of the incident energy is transmitted through the surface into the cellular structure of the product, where it is scattered by the small interfaces within the tissue or absorbed by cellular constituents (Birth, 1976). Plant tissues are optically dense: they are difficult to penetrate and alter the path length traveled by the light, so the amount of tissue interrogated is not known with certainty. Most light energy penetrates only a very short distance and exits near the point of entry; this is the basis of color. Some light, however, penetrates deeper into the tissues (usually a few millimeters, depending on optical density) and is altered by differential absorbance of various wavelengths before exiting; it therefore contains useful chemometric information. Such light may be called diffuse reflectance or body reflectance (Abbott, 1999).

2.2. Spectral Imaging

Machine vision provides automated production processes with vision capabilities. It can be described as the integration of imaging devices, computers, algorithms, and robotics for automated inspection, characterization, and control. It has been applied widely in many industrial sectors, especially electronics and automotive, and in recent years has been increasingly applied in agriculture. The most common industrial applications of machine vision are inspection and quality control.
The majority of inspection tasks are highly repetitive and extremely boring, and their effectiveness depends on the efficiency of the human inspector. Since inspection or classification of agricultural products is tedious and repetitive, machine vision and image processing techniques are useful for agricultural and food industry applications, particularly in grading and inspection (Park et al., 1998).

An important part of machine vision is the imaging device, together with the algorithms that accomplish the purpose of the application, for example classifying the inspected objects. Two main types of imaging system are currently used: the first captures spatial information only; the second captures both spatial and spectral information. While spatial imaging resolves objects into their morphological dimensions, spectral imaging resolves the interaction of light with the objects to be inspected (Park et al., 1998).

Spectral imaging involves measuring the intensity of diffusely reflected light from a surface. The reflected light contains information about the absorbers near the surface of the material that modify the reflection. By using different wavelengths across a waveband, it is possible to construct a characteristic set of spectral features for the material (Muir, 1993). These spectral images are multi-dimensional, and the process of distinguishing between them is known as spectral pattern recognition. Spectral imaging, also known as imaging spectroscopy, is the application of reflectance/emittance spectroscopy to every pixel in a spatial image.

Multispectral and hyperspectral imaging systems permit acquisition of images at many wavelengths. A multispectral imaging system collects images at a few discrete, noncontiguous wavelengths. Hyperspectral images, on the other hand, are acquired at hundreds of narrow, contiguous wavelengths.
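As a small illustration of spectroscopy applied to every pixel, the sketch below holds a multispectral acquisition as one image per filter band and reads out the spectrum of a single pixel. The band centers, image size, and intensity values are made up for the example; they are not the filters or data used in this study, and the function name is hypothetical.

```python
# Hypothetical band centers (nm) and tiny 2x3 images; illustrative values only.
band_centers_nm = [450, 550, 680, 740]

# One grayscale image (rows x cols) per band: a few discrete, noncontiguous wavelengths.
band_images = [
    [[10, 12, 11], [9, 40, 10]],   # 450 nm
    [[20, 22, 21], [19, 35, 20]],  # 550 nm
    [[60, 61, 59], [58, 15, 60]],  # 680 nm
    [[30, 31, 29], [28, 12, 30]],  # 740 nm
]


def pixel_spectrum(images, row, col):
    """Return the intensity of one pixel in every band, in band order."""
    return [image[row][col] for image in images]


spectrum = pixel_spectrum(band_images, row=1, col=1)
for wavelength, intensity in zip(band_centers_nm, spectrum):
    print(f"{wavelength} nm: {intensity}")
```

A pixel-level classifier would take such a per-pixel list of band intensities as its feature vector.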
The spectral image dataset can be visualized as a cube, with the X and Y dimensions being the length and width of the image, or spatial information (in pixels), and the Z dimension being spectral wavelength; each data point is an intensity value. Alternatively, the dataset can be envisioned as a stack of single-wavelength pictures of the object, with as many pictures as the number of wavelengths used. Since chemical bonds absorb light energy at specific wavelengths, some compositional information can be determined from spectral data; multispectral or hyperspectral imaging thus provides information about the spatial distribution of constituents (pigments, sugars, moisture, etc.) near the product’s surface (Abbott, 1999). Figure 2.1 shows the conceptual representation of spectral imaging.

Figure 2.1. Conceptual representation of a volume of hyperspectral image data. Dark arrows indicate directions for sequential acquisitions to complete the volume of spatial and spectral data (Kim et al., 2001a); (a) wavelength scanning, (b) spatial scanning (pushbroom).

There are two approaches to acquiring a cube of spatial and spectral data in spectral imaging. One approach, illustrated in Figure 2.1a, sequentially captures a full spatial scene at each spectral band to form a three-dimensional image cube. Multiple band-pass filters, a liquid-crystal tunable filter, or an acousto-optic tunable filter can be used for this approach. The other approach (Figure 2.1b) is a pushbroom method, in which a line of spatial information with a full spectral range per spatial pixel is captured sequentially to complete a volume of spatial-spectral data (Kim et al., 2001a).

2.3. Fluorescence

Fluorescence is the property of some atoms and molecules to absorb light of particular wavelengths and, after a brief interval termed the fluorescence lifetime, to re-emit light at longer wavelengths.
Fluorescence requires an outside source of energy, is the result of the absorption of light, and involves the emission of electromagnetic radiation (light). This process is different from chemiluminescence, where the excited state is created via a chemical reaction (Herman, 1998). Many agricultural materials fluoresce, and nearly all horticultural applications of fluorescence refer specifically to chlorophyll fluorescence (Abbott, 1999).

Chlorophyll appears green to our eyes because it absorbs light in the red and blue parts of the spectrum, so only some of the light enriched in green wavelengths (about 550 nm) is reflected into our eyes. Equation 2.1 represents the absorption of light, in which chlorophyll (Chl) in its lowest-energy, or ground, state absorbs a photon (represented by $h\nu$) and makes a transition to a higher-energy, or excited, state (Chl*).

$$\mathrm{Chl} + h\nu \rightarrow \mathrm{Chl}^{*} \qquad (2.1)$$

The distribution of electrons in the excited molecule is somewhat different from the distribution in the ground-state molecule. Figure 2.2 illustrates the absorption and emission of light by chlorophyll molecules. Absorption of blue light (about 430 nm) excites the chlorophyll to a higher energy state than absorption of red light (about 660 nm), because the energy of photons is higher when their wavelength is shorter. In the higher excited state, chlorophyll is extremely unstable, very rapidly gives up some of its energy to the surroundings as heat, and enters the lowest excited state, where it can be stable for a maximum of several nanoseconds ($10^{-9}$ s) (Taiz and Zeiger, 1998).

Figure 2.2. Light absorption and emission by chlorophyll (Taiz and Zeiger, 1998); (a) energy level diagram, (b) spectra of absorption and fluorescence.

In the lowest excited state, the excited chlorophyll has several possible pathways for disposing of its available energy (Taiz and Zeiger, 1998):

(1) Re-emit a photon and thereby return to its ground state, a process known as fluorescence.
(2) Return to its ground state by directly converting its excitation energy into heat, with no emission of a photon.
(3) Transfer its energy to another molecule, a process known as energy transfer.
(4) Cause a chemical reaction to occur, known as photochemistry.

When the excited chlorophylls fluoresce, the wavelength of fluorescence is almost always slightly longer than the wavelength of absorption of the same electron state, because a portion of the excitation energy is converted into heat before the fluorescence photon is emitted. Conservation of energy therefore requires that the energy of the fluorescent photon be lower than that of the excitation photon, hence the shift to longer wavelength, known as the Stokes shift (Herman, 1998). Chlorophylls fluoresce in the red region of the spectrum.

Chlorophylls can be found in organized pigment/protein complexes in the chloroplast membrane. These protein/pigment complexes are referred to as photosystems I and II, each of which has a ‘reaction center’ wherein the light energy is converted and utilized. A portion of the absorbed energy is transferred to electrons (from water) in photosystem II (PSII). The electrons are, in turn, used to fuel the reduction of CO2 to sugar and carbon skeletons in the process of photosynthesis. A small portion of the energy is not used and is reradiated as fluorescence (Mir et al., 1998).
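The energy argument behind the shift to longer wavelength can be checked numerically with the Planck relation E = hc/λ, which is standard physics rather than a formula stated in this text. The sketch below compares photon energies at the blue and red absorption wavelengths quoted above (about 430 and 660 nm) with a red fluorescence emission. The 685 nm emission wavelength is one of the leaf emission maxima cited in the introduction and is used here only as an assumed, illustrative value; the function name is hypothetical.

```python
PLANCK_J_S = 6.62607015e-34       # Planck constant (J s)
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light in vacuum (m/s)
JOULES_PER_EV = 1.602176634e-19   # joules per electronvolt


def photon_energy_ev(wavelength_nm):
    """Photon energy in electronvolts from E = h * c / wavelength."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULES_PER_EV


blue_ev = photon_energy_ev(430)   # blue absorption
red_ev = photon_energy_ev(660)    # red absorption
emit_ev = photon_energy_ev(685)   # assumed red fluorescence emission

print(f"430 nm: {blue_ev:.2f} eV")
print(f"660 nm: {red_ev:.2f} eV")
print(f"685 nm: {emit_ev:.2f} eV")
print(f"energy not re-emitted (660 -> 685 nm): {red_ev - emit_ev:.3f} eV")
```

The blue photon carries the most energy and the emitted photon the least, which is consistent with part of the excitation energy being lost as heat before emission.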
When the intensity of illuminating light is well below the capacity of the tissue to process the energy, PSII is able to pass on nearly all the electrons excited by the light to photosynthetic processes, such that its reaction center is essentially always ‘open’ for additional energy influx. Under these conditions, the fluorescence intensity is at a minimum, referred to as dark, background, or initial fluorescence (F0). Conversely, when the intensity of the illuminating light is well above the capacity of the tissue to process the energy, PSII is able to pass on only a fraction of the electrons excited by the light. The reaction center is essentially ‘closed’ to energy influx and the excited electrons have a tendency to lose their energy as fluorescence. Under these conditions, the fluorescence intensity is at a maximum, referred to as maximum fluorescence (Fm). The relationship between these two responses is more commonly expressed as the ratio between the increase in fluorescence from minimal to maximal (Fm - F0) and the maximal (Fm). The quantity Fm - F0 is often referred to as variable fluorescence (Fv). As long as PSII is functioning normally, the ratio of Fv to Fm is usually about 0.8. When PSII is functioning poorly, fluorescence characteristics are altered (Beaudry et al., 1998).

2.4. Apple Disorders

The identification of fruit disorders at harvest, during storage, or after shipping is of utmost importance to producers, shippers, and consumers. Accurate recognition of disorders is needed before problems associated with orchard nutrition, cultural practices, or postharvest treatment can be corrected. In this section, detailed information is given on several apple disorders found in the samples used in this research, including bitter pit, soft scald, superficial scald, black rot, and decay. It should be noted that some disorders are difficult for the human eye to distinguish, especially at an early stage of disorder development.

2.4.1. Bitter Pit

Bitter pit is a disorder in which small, brown, somewhat dry, slightly bitter-tasting lesions 3-5 mm in cross section develop in the flesh of the apple (Figure 2.3). The first symptoms of bitter pit may be small, darkened, slightly depressed spots under the skin, usually in the calyx end of the fruit. The disorder does not affect the skin directly. It may appear before harvest or develop during storage. Internal lesions are often associated with the vascular elements. In severe cases, several lesions may become confluent to form larger necrotic areas. With time, the lesions at the skin darken, sometimes becoming reddish and more sunken, especially in Newton and Golden Delicious (Meheriuk et al., 1994).

Initiation of symptoms may begin four to six weeks after petal fall, when affected tissues have a higher rate of respiration and ethylene production. This is a period of greater protein and pectin synthesis, with greater migration of organic ions into the affected areas. Affected areas retain starch grains not seen in healthy tissue. A mineral imbalance in the apple flesh develops, with low levels of calcium and relatively high concentrations of potassium and magnesium. Low levels of calcium impair the selective permeability of cell membranes, leading to cell injury and necrosis (Meheriuk et al., 1994).

Honeycrisp is one of the cultivars that are susceptible to bitter pit. This trait is most pronounced on young, vigorous trees with a small crop load and large fruit. The occurrence of bitter pit is greatly reduced as the trees mature and the crop load increases. Foliar applications of calcium have also proven very effective in preventing bitter pit on Honeycrisp. Avoiding excessive amounts of nitrogen may also help prevent its occurrence (Bedford, 2001).

2.4.2. Soft Scald

Soft scald is easily identified by the sharply defined, irregularly shaped, smooth brown areas in the skin of the apple (Figure 2.3).
There may be one or more small lesions, or the disorder may affect most of the apple, irrespective of skin color, but usually not at the stem or calyx end. In its various stages, soft scald affects the skin only, but it may damage hypodermal tissue as the lesion continues to develop (Meheriuk et al., 1994).

Soft scald is a low-temperature-induced disorder of apples. The disorder is likely to occur when highly respiring susceptible cultivars are cooled rapidly. Delayed cooling can advance the onset of the climacteric and thus render the fruit more prone to soft scald upon subsequent rapid cooling. The disorder is prevented if the apples are subjected to 20-30% CO2 for 2 days during the cooling period (Meheriuk et al., 1994). Dipping the fruit in an aqueous solution containing antioxidants such as diphenylamine (DPA) and edible oil markedly reduces or prevents the disorder (Wills and Scott, 1982).

2.4.3. Superficial Scald

Superficial scald is a postharvest disorder of apples characterized by diffuse browning of the skin, somewhat roughened in severe cases, which becomes more extensive after a few days at room temperature (Figure 2.3). On red cultivars, the scald lesion is often confined to the unblushed area of the skin (Meheriuk et al., 1994).

A naturally occurring terpene, α-farnesene, has been found in the skin of apples. Its oxidation products are suggested as the cause of superficial scald. Lipoxygenase, in addition to α-farnesene, may be involved in the induction of scald and may be responsible for the browning (Ingle and D'Souza, 1989). Factors that increase the severity of the disorder include immaturity, high fruit nitrogen, low fruit calcium, warm preharvest weather, delayed cold storage, high storage temperature, high relative humidity in storage, restricted ventilation, extended storage periods, and (in controlled-atmosphere storage) slow oxygen reduction and high oxygen concentration.
Effective treatments to prevent the scald are DPA dips, hot water dips, ethrel sprays, calcium sprays, and fruit coatings such as lecithin (Meheriuk et al., 1994).

2.4.4. Decay

Postharvest diseases of fruit crops are caused mostly by fungal infection. The infected tissue, also known as decay, typically differs from the surrounding healthy tissue in color and/or texture. In most cases, infected tissue forms a discrete zone, known as a lesion, which extends radially from an infection point in a characteristic pattern determined by the interaction between the host fruit and the pathogen (Figure 2.3). In some postharvest diseases, the border between infected and apparently healthy tissue is sharply defined; in others it is more diffuse (Sugar, 2002).

Some of the diseases common on apples, along with the causal fungi, are: bitter rot (Glomerella cingulata), black rot (Physalospora obtusa), gray mold (Botrytis cinerea), blue mold (Penicillium expansum), bull’s-eye rot (Pezicula malicorticis (Jacks.)), white rot (Botryosphaeria ribis), flyspeck (Microthyriella rubi), and side rot (Phialophora malorum) (Pierson et al., 1971).

Organisms rot fruit and vegetables while they are still immature and attached to the plant, or during harvesting and the subsequent handling and marketing operations. The infection process, particularly postharvest, is greatly aided by mechanical injuries to the skin of the produce, such as fingernail scratches and abrasions, rough handling, insect punctures, and stem cuts. Furthermore, the physiological condition of the produce, the temperature, and the formation of the periderm significantly affect the infection process and the development of the infection (Wills et al., 1998).

2.4.5. Black Rot

Black rot is identified as a firm brown spot on any part of the apple (Figure 2.3). The affected surface may be marked with concentric zones of different shades of brown, especially if the fruit rotted on the tree.
In advanced rots, which can involve the whole fruit, the skin is dark brown or even black and sometimes dotted with numerous small black fungal fruiting bodies called pycnidia. The presence of pycnidia and their random distribution help to distinguish black rot from most other apple rots (Pierson et al., 1971).

The black rot fungus, Physalospora obtusa, attacks the leaves, wood, and fruits of apple. While immature fruits may be attacked, the disease is primarily a rot of ripe fruits. Infections may occur at insect injuries and wound sites. Calyx end infections may follow spray and frost injury. Core and calyx end rots may result from fungal invasion of the open calyx tubes in varieties such as Delicious. The disease develops very slowly in green or immature fruits. Black rot ordinarily does not spread from one fruit to another. Black rot should be controlled in the orchard (Pierson et al., 1971).

Figure 2.3. Examples of some disorders on apples: (a) bitter pit, (b) soft scald, (c) superficial scald, (d) black rot, (e) decay.

2.5. Classification Techniques

Classification, the assignment of an object to one of a number of predetermined groups, is of fundamental importance in many areas of science and technology. For the most part, unless the classification is obvious and trivial, we still depend on human expertise to classify on the basis of observation. As the computer has become more and more accessible, it has become attractive to use it either to replace the experts or, at the very least, to guide and help them. Classification is an important component of pattern recognition systems, which usually consist of sensing, segmentation, feature extraction, classification, and post-processing components (Duda et al., 2001). This section presents the theoretical background of the three classification techniques used in the research, i.e. artificial neural network, discriminant analysis, and k-nearest neighbor.

2.5.1. Artificial Neural Network Classifier

Artificial neural networks (ANN) provide an emerging paradigm for pattern recognition implementation that involves large interconnected networks of relatively simple and typically nonlinear units. A neural network is designed to model the way in which the brain performs a particular task or function of interest. Basically, three entities characterize an ANN (Schalkoff, 1992):

(1) The network topology, or interconnection of neural units,
(2) The characteristics of individual units or artificial neurons, and
(3) The strategy for pattern learning or training.

A neuron is an information-processing unit that is fundamental to the operation of a neural network. The block diagram of Figure 2.4 shows the model of a neuron, which forms the basis for designing ANNs. There are three basic elements of the neuronal model:

(1) A set of synapses or connecting links, each of which is characterized by a weight or strength of its own, denoted by $w_{kj}$. A signal $x_j$ at the input of synapse j connected to neuron k is multiplied by the synaptic weight $w_{kj}$.
(2) An adder for summing the input signals, weighted by the respective synapses of the neuron; the operation described here constitutes a linear combiner.
(3) An activation function for limiting the amplitude of the output of the neuron.

Figure 2.4. Nonlinear model of a neuron (Haykin, 1999)

The neuronal model of Figure 2.4 also includes an externally applied bias, denoted by $b_k$, which has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively.

In mathematical terms, we may describe a neuron k by writing the following pair of equations:

$$v_k = \sum_{j=1}^{m} w_{kj}\, x_j \qquad (2.2)$$

and

$$y_k = \varphi(v_k + b_k) \qquad (2.3)$$

where $x_1, x_2, \ldots, x_m$ are the input signals; $w_{k1}, w_{k2}, \ldots, w_{km}$ are the synaptic weights of neuron k; $v_k$ is the linear combiner output due to the input signals; $b_k$ is the bias; $\varphi(\cdot)$ is the activation function; and $y_k$ is the output signal of the neuron.

The most common form of activation function used in the construction of ANNs is the sigmoid function, whose graph is s-shaped. It is defined as a strictly increasing function that exhibits a graceful balance between linear and nonlinear behavior (Haykin, 1999). An example of the sigmoid function is the logistic function, defined by:

$$\varphi(v) = \frac{1}{1 + \exp(-a v)} \qquad (2.4)$$

where a is the slope parameter of the sigmoid function. By varying the parameter a, we obtain sigmoid functions of different slopes, as illustrated in Figure 2.5.

Figure 2.5. Sigmoid function with varying slope parameter a (Haykin, 1999)

Network Architectures

In general, there are three fundamentally different classes of network architectures (Haykin, 1999): (1) single-layer feedforward networks, (2) multilayer feedforward networks, and (3) recurrent networks. A multilayer feedforward network is distinguished from a single-layer feedforward network by the presence of one or more hidden layers, whose computational nodes are correspondingly called hidden neurons or hidden units. The function of hidden neurons is to intervene between the external input and the network output in some useful manner. By adding one or more hidden layers, the network is enabled to extract higher-order statistics, which is valuable when the size of the input layer is large (Haykin, 1999). A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop. The presence of feedback loops has a profound impact on the learning capability of the network and on its performance. Feedforward networks with a back-propagation learning algorithm are commonly used for classification purposes.
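Returning to the neuron model, Equations 2.2 to 2.4 can be sketched for a single neuron as follows. The weights, bias, and inputs are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import math


def logistic(v, a=1.0):
    """Logistic activation (Eq. 2.4) with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * v))


def neuron_output(inputs, weights, bias, a=1.0):
    """Output y_k of one neuron: linear combiner (Eq. 2.2), then activation (Eq. 2.3)."""
    v = sum(w * x for w, x in zip(weights, inputs))  # v_k = sum_j w_kj * x_j
    return logistic(v + bias, a)                     # y_k = phi(v_k + b_k)


x = [0.5, -1.0, 2.0]   # input signals (illustrative)
w = [0.4, 0.3, -0.2]   # synaptic weights (illustrative)
b = 0.1                # bias (illustrative)

for slope in (0.5, 1.0, 2.0):
    print(f"a = {slope}: y = {neuron_output(x, w, b, a=slope):.4f}")
```

Because the net input here is negative, a larger slope parameter pushes the output further below 0.5, which is the steepening effect shown in Figure 2.5.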
Learning Process

A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias level. Ideally, the network becomes more knowledgeable about its environment after each iteration of the learning process. There are two learning paradigms: learning with a teacher, or supervised learning, and learning without a teacher, or unsupervised learning. In supervised learning, there is a target output that the neural network is trained to approach. The objective of the learning process is then to minimize the difference between the target output (correct class) and the neural network output by adjusting the network parameters. The adjustment is carried out iteratively in a step-by-step fashion with the aim of eventually making the neural network emulate the teacher. In unsupervised learning there is no external teacher to oversee the learning process. Rather, provision is made for a task-independent measure of the quality of representation that the network is required to learn, and the parameters of the network are optimized with respect to that measure (Haykin, 1999).

Pattern recognition or classification tasks fall in the category of supervised learning. A neural network performs pattern recognition by first undergoing a training session, during which the network is repeatedly presented with a set of input patterns along with the category to which each particular pattern belongs. Later, the network is presented with a new pattern that it has not seen before, but which belongs to the same population of patterns used to train the network.

Multilayer feedforward networks, also known as multilayer perceptrons, have been applied successfully to solve some difficult and diverse problems by training them in a supervised manner with a highly popular algorithm known as the error back-propagation algorithm. This algorithm is based on the error-correction learning rule.
Basically, error back-propagation learning consists of two passes through the different layers of the network: a forward pass and a backward pass. In the forward pass, an input vector is applied to the sensory nodes of the network, and its effect propagates through the network layer by layer. Finally, a set of outputs is produced as the actual response of the network. During the forward pass the synaptic weights of the network are all fixed. During the backward pass, on the other hand, the synaptic weights are all adjusted in accordance with an error-correction rule. Specifically, the actual response of the network is subtracted from a desired (target) response to produce an error signal. This error signal is then propagated backward through the network, against the direction of synaptic connections - hence the name “error back-propagation”. The synaptic weights are adjusted to make the actual response of the network move closer to the desired response in a statistical sense. The error back-propagation algorithm is also referred to as the back-propagation algorithm.

The error signal at the output of neuron j at iteration n is defined by:

$$e_j(n) = d_j(n) - y_j(n) \qquad (2.5)$$

where $d_j(n)$ is the target response for neuron j and $y_j(n)$ is the neural network output of neuron j at iteration n.

The back-propagation algorithm applies a correction $\Delta w_{ji}(n)$ to the synaptic weight $w_{ji}(n)$ connecting neuron i to neuron j, which is defined by the delta rule:

$$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n) \qquad (2.6)$$

where $\eta$ is the learning-rate parameter, $\delta_j(n)$ is the local gradient, and $y_i(n)$ is the input signal of neuron j. The local gradient $\delta_j(n)$ depends on whether neuron j is an output node or a hidden node. If neuron j is an output node,

$$\delta_j(n) = e_j(n)\, \varphi'_j(v_j(n)) \qquad (2.7)$$

If neuron j is a hidden node,

$$\delta_j(n) = \varphi'_j(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n) \qquad (2.8)$$

where k is the index for neurons in the next hidden or output layer that are connected to neuron j.
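The two passes and Equations 2.5 to 2.8 can be sketched for a very small network with two inputs, two hidden neurons, and one output neuron. This is a minimal illustration under stated assumptions, not the network used in this study: biases are omitted, the logistic function with slope 1 is assumed as the activation, the starting weights and the single training pattern are arbitrary, and all function names are hypothetical.

```python
import math


def phi(v):
    """Logistic activation with slope parameter a = 1 (an assumption)."""
    return 1.0 / (1.0 + math.exp(-v))


def phi_prime(v):
    """Derivative of the logistic function."""
    s = phi(v)
    return s * (1.0 - s)


def train_step(x, d, w_hidden, w_out, eta):
    """One forward and one backward pass; weights are updated in place.

    Returns the error signal e = d - y (Eq. 2.5) computed before the update.
    """
    # Forward pass: the weights are held fixed.
    v_hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w_hidden]
    y_hidden = [phi(v) for v in v_hidden]
    v_out = sum(w * yh for w, yh in zip(w_out, y_hidden))
    y_out = phi(v_out)

    # Backward pass: error signal and local gradients.
    e = d - y_out                                   # Eq. 2.5
    delta_out = e * phi_prime(v_out)                # Eq. 2.7 (output node)
    delta_hidden = [phi_prime(v) * delta_out * w    # Eq. 2.8 (hidden nodes)
                    for v, w in zip(v_hidden, w_out)]

    # Delta rule (Eq. 2.6): each weight changes by eta * delta_j * y_i.
    for j, yh in enumerate(y_hidden):
        w_out[j] += eta * delta_out * yh
    for j, row in enumerate(w_hidden):
        for i, xi in enumerate(x):
            row[i] += eta * delta_hidden[j] * xi
    return e


w_hidden = [[0.2, -0.4], [0.7, 0.1]]   # illustrative starting weights
w_out = [0.5, -0.3]
errors = [train_step([1.0, 0.5], 1.0, w_hidden, w_out, eta=0.5) for _ in range(50)]
print(f"first error: {errors[0]:.4f}, last error: {errors[-1]:.4f}")
```

Repeating the step on the same pattern shrinks the error signal, which is the sense in which the actual response moves closer to the desired response.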
The back-propagation algorithm provides an “approximation” to the trajectory in weight space computed by the method of steepest descent. The smaller we make the learning-rate parameter $\eta$, the smaller the changes to the synaptic weights in the network will be from one iteration to the next, and the smoother the trajectory in weight space will be. This improvement, however, is attained at the cost of a slower rate of learning. If, on the other hand, we make the learning-rate parameter $\eta$ large in order to speed up the rate of learning, the resulting large changes in the synaptic weights may cause the network to become unstable. A simple method of increasing the rate of learning while avoiding the danger of instability is to modify the delta rule of Eq. 2.6 by including a momentum term, as shown by Rumelhart et al. (1986):

$$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n) \qquad (2.9)$$

where $\alpha$ is usually a positive number called the momentum constant.

2.5.2. Discriminant Functions

Bayes decision theory is a fundamental statistical approach to the problem of pattern classification (Duda et al., 2001). This approach is based on the assumption that the decision problem is posed in probabilistic terms, and that all of the relevant probability values are known. Bayes decision theory is a basis for developing a discriminant function.

A decision rule partitions a space into regions $\Omega_i$, $i = 1, \ldots, N$, where N is the number of classes. An object is classified as coming from class $\omega_i$ if its corresponding vector representation, x, lies in region $\Omega_i$. Bayes rule can be expressed as (Hand, 1981):

$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)} \qquad (2.10)$$

where $P(\omega_i \mid x)$ is the posterior probability of $\omega_i$ given x; $P(\omega_i)$ is the prior probability for class $\omega_i$; $p(x)$ is the probability that x occurs; and $p(x \mid \omega_i)$ is the class-conditional probability density function.
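Equation 2.10 can be illustrated with two classes and made-up numbers. The priors and class-conditional density values below are hypothetical, as is the function name. The evidence p(x) is obtained by summing p(x | ω_i) P(ω_i) over the classes, a standard step that the text does not spell out.

```python
def posteriors(likelihoods, priors):
    """Posterior P(w_i | x) for each class from Eq. 2.10.

    likelihoods[i] is p(x | w_i) evaluated at the observed x and
    priors[i] is P(w_i); p(x) is the sum of their products.
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x)
    return [j / evidence for j in joint]


# Hypothetical values: class 0 = normal tissue, class 1 = disorder tissue.
priors = [0.9, 0.1]
likelihoods = [0.02, 0.30]   # p(x | normal), p(x | disorder) at one pixel

post = posteriors(likelihoods, priors)
decision = max(range(len(post)), key=lambda i: post[i])
print(f"posteriors: {[round(p, 3) for p in post]}, decided class: {decision}")
```

Since p(x) is the same for every class, the decision depends only on the products p(x | ω_i) P(ω_i).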
If the $p(x \mid \omega_i)$ are known, the problem is solved: we simply substitute the x vector of the object to be classified into Equation 2.10 and find the class with the largest value of $p(x \mid \omega_i) P(\omega_i)$. But the $p(x \mid \omega_i)$ are usually unknown and are estimated from the set of classified samples. If the class-conditional probability density function is assumed to be Gaussian, then:

$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right] \qquad (2.11)$$

where $\mu_i$ and $\Sigma_i$ are the mean vector and covariance matrix of class $\omega_i$, and d is the dimension of x. The parameters ($\mu$, $\Sigma$) are sufficient to uniquely characterize the normal (Gaussian) distribution. They are estimated from the training samples using Maximum Likelihood Estimation (MLE) as follows:

$$\mu = \frac{1}{n} \sum_{k=1}^{n} x_k \qquad (2.12)$$

$$\Sigma = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu)(x_k - \mu)^T \qquad (2.13)$$

where the sums run over the n training samples $x_k$ of the class.

A discriminant function for the i-th class is defined as:

$$g_i(x) = P(\omega_i \mid x) \qquad (2.14)$$

Given a feature vector x, the classification rule is based on finding the largest discriminant function. Assuming equal a priori probabilities, this means choosing the class for which $p(x \mid \omega_i)$ is largest. Any monotonically increasing function of $g_i(x)$ is also a valid discriminant function. The log function meets this requirement; that is, an alternative discriminant function is:

$$g_i(x) = \log\{P(\omega_i \mid x)\} \qquad (2.15)$$

Substituting Equations 2.10 and 2.11 and dropping the terms that are the same for all classes gives:

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2}\log(|\Sigma_i|) + \log(P(\omega_i)) \qquad (2.16)$$

Equation 2.16 is known as a Quadratic Discriminant Function. If we further assume that the population covariance matrices $\Sigma_i$ are all the same, $\Sigma_i = \Sigma$, we can simplify the quadratic discriminant score in Equation 2.16 into the linear discriminant score:

$$g_i(x) = \mu_i^T \Sigma^{-1} x - \frac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \log(P(\omega_i)) \qquad (2.17)$$

2.5.3. Nearest Neighbor Classifier

The Nearest Neighbor (NN) classifier is an example of a nonparametric classifier. Using the label information of the training sample, an unknown observation x is compared with all the cases in the training sample.
N distances between the unknown pattern vector $x$ (written $x_0$ in the equations below) and all the training patterns are calculated, where N here denotes the number of training patterns, and the label of the training pattern that gives the minimum distance is assigned to the pattern. That is, the NN rule allocates $x_0$ to class $\omega_k$ if the closest sample $x_c$ has the label $k = L(x_c)$ (Micheli-Tzanakou, 2000):

$$x_c = \arg\min_{x_i} \{ d(x_0, x_i) \}, \quad i = 1, 2, \ldots, N \qquad (2.18)$$

$$x_0 \in \omega_k, \quad k = L(x_c) \qquad (2.19)$$

The distance measure between the unknown and a training sample $x_i$ has a general quadratic form:

$$d(x_0, x_i) = (x_0 - x_i)^T M (x_0 - x_i) \qquad (2.20)$$

If the Mahalanobis distance is used, $M$ is equal to $\Sigma^{-1}$, the inverse of the covariance matrix of the sample. If the Euclidean distance is used, $M$ is equal to $I$, the identity matrix.

The K-Nearest Neighbor (KNN) rule is the same as the NN rule except that the algorithm finds the K points in the training set that are nearest to the unknown observation and assigns the unknown observation to the majority class among those K points. In the example in Figure 2.6, there are three classes, and the value of K is 5. Of the 5 closest neighbors, 4 belong to $\omega_1$ and 1 belongs to $\omega_3$, so $x_u$ is assigned to $\omega_1$, the predominant class. The only parameter that must be determined is K, the number of nearest neighbors to consider. The value of K depends on the number of training data: with a larger number of samples, a larger value of K can be chosen.

Figure 2.6. 5-Nearest Neighbor classifier in the case of 3 classes (axes X1 and X2)

KNN is considered a lazy learning algorithm. Lazy learning algorithms exhibit three characteristics that distinguish them from other learning algorithms: (1) they defer processing of their input until they receive requests for information, simply storing their inputs for future use; (2) they reply to information requests by combining their stored (training) data; and (3) they discard the constructed answer and any intermediate results.
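The NN and KNN rules with the quadratic-form distance of Eq. 2.20 can be sketched as follows. The function names, the tie handling, and the sample points are assumptions made for illustration; the points are arranged to mimic the situation described for Figure 2.6 (four of the five nearest neighbors in $\omega_1$, one in $\omega_3$) and are not data from this study.

```python
from collections import Counter


def quad_distance(x0, xi, M):
    """Quadratic-form distance of Eq. 2.20: (x0 - xi)^T M (x0 - xi)."""
    diff = [a - b for a, b in zip(x0, xi)]
    d = len(diff)
    return sum(diff[r] * M[r][c] * diff[c] for r in range(d) for c in range(d))


def knn_classify(x0, samples, labels, k=1, M=None):
    """Assign x0 to the majority class among its k nearest training samples.

    With k=1 this reduces to the NN rule (Eqs. 2.18, 2.19). M defaults to
    the identity matrix (Euclidean case); passing an inverse covariance
    matrix gives the Mahalanobis case. Vote ties are resolved in favor of
    the class encountered first among the sorted neighbors, which is a
    choice of this sketch.
    """
    if M is None:
        d = len(x0)
        M = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    # Lazy learning: all the work happens here, at query time.
    ranked = sorted(range(len(samples)),
                    key=lambda i: quad_distance(x0, samples[i], M))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set with three classes.
samples = [(1, 0), (0, 1), (-1, 0), (0, -1),  # class w1, close to the origin
           (5, 5), (6, 5),                    # class w2, far away
           (1, 1), (-5, 5)]                   # class w3, one point nearby
labels = ["w1", "w1", "w1", "w1", "w2", "w2", "w3", "w3"]

x_u = (0, 0)
label_5nn = knn_classify(x_u, samples, labels, k=5)
```

Here the five nearest neighbors of `x_u` are the four $\omega_1$ points and one $\omega_3$ point, so `label_5nn` is `"w1"`. Note that the function keeps the full training set and recomputes every distance for each query, which is the storage and recall cost associated with lazy learning.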
In contrast, eager learning algorithms have three characteristics: (1) they compile their input data into a compressed description or model (for example, density parameters in statistical pattern recognition and associated weights in neural network pattern recognition); (2) they discard the training data after compilation of the model; and (3) they classify incoming patterns using the induced model, which is retained for future requests. There is a tradeoff between lazy and eager algorithms: lazy algorithms have lower computational costs than eager algorithms during training, but they typically have greater storage requirements and higher computational costs on recall (Aha, 1997).

2.6. Feature Selections

For any given classification problem there is an unlimited number of measurements that could be made on the objects to be classified. It is therefore necessary to choose a finite subset of these that leads to good classification results. The most straightforward reason to use smaller subsets is cost. If it is excessively expensive or time-consuming to gather measurements, then the fewer, the better. If an adequate subset of the original measurements can be found, then only this subset need be measured on all future objects to be classified. Another reason for reducing the dimensionality of the space in which classifications are made is simply to eliminate redundancy. There is no point in measuring a feature that does not add to the accuracy of the classification achieved without this feature. Furthermore, a lower misclassification rate can sometimes be achieved by using fewer features (Hand, 1981). Basically, the feature selection problem is to find the best set of d