This is to certify that the dissertation entitled MODEL BASED IMAGE FUSION presented by MRITYUNJAY KUMAR has been accepted towards fulfillment of the requirements for the Ph.D. degree in Electrical Engineering.

Major Professor's Signature

Date

MODEL BASED IMAGE FUSION

By

Mrityunjay Kumar

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Electrical Engineering

2008

ABSTRACT

MODEL BASED IMAGE FUSION

By

Mrityunjay Kumar

Data fusion integrates redundant as well as complementary information present in input signals in such a manner that the fused signal describes the true source better than any of the individual signals. The exploitation of redundant information improves accuracy and reliability, whereas the integration of complementary information improves the interpretability of the signal. Data fusion plays a crucial role in image-based applications such as computer vision, robotics, and remote sensing. The goal of such fusion is to extract information from input images such that the fused image provides better information for human or machine perception than any of the input images.

The goal of the research presented in this dissertation is to develop model based image fusion algorithms. The primary advantage of the model based fusion approach is that it allows identification of the complementary and redundant information present in the input images in a systematic manner. In this thesis, we present three approaches to pixel level image fusion, based on multichannel deconvolution methods, weighted average methods, and local affine models. The proposed multichannel fusion algorithm fuses blurred images obtained from different sensors such that the fused image does not look blurred. The algorithm assumes a linear space-invariant model for each of the measurements. However, it does not require knowledge of the point spread function (PSF) of the imaging sensors and can be used in several areas of image processing such as digital camera imaging, nondestructive evaluation, etc. This framework has been extended to perform superresolution, where multiple low resolution images of the same scene are used to generate a higher resolution image of the scene. In a more general setting, the degradation in an image may be due to a nonlinear or a local process. In such cases, weighted average or local affine models are more appropriate. A Rayleigh quotient approach for fusing multisensor images using the weighted average model is proposed and contrasted with other existing techniques for pixel level fusion. Finally, the total variation (TV) norm is used to solve the fusion problem when a local affine model is used.
Results on a range of datasets demonstrate the feasibility and performance of the proposed algorithms.

To my family

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Pradeep Ramuhalli, for his direction and encouragement. I am greatly indebted to him for his advice and support. I wish to thank the other members of my thesis committee, Dr. Satish Udpa, Dr. Lalita Udpa and Dr. Sarat Dass. During these years, I have known them as very inspiring and enthusiastic people who have made a deep impression on me. I have been fortunate to have them on my committee. I would particularly like to express my gratefulness to Dr. Satish Udpa and Dr. Lalita Udpa for showing faith in me and supporting me when I really needed it. I cannot thank them enough. I am also indebted to Dr. Dass for his encouragement and kindness in answering my questions. The numerous fruitful discussions we had helped me sort out the technical details of Chapter 5 of this dissertation.

My internship at Eastman Kodak during the summer of 2007 was a unique experience. It inspired the series of mathematical models and experiments described in this dissertation. I am grateful to my mentor at Eastman Kodak, Mr. Rodney Miller, for his encouragement and advice during my internship and thereafter.

I thank my family members for their love and support. Last, but not least, I would like to express my heartfelt gratitude to my wife Sonali, who has been a constant source of love, support, patience, encouragement and strength all these years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
1.1 Overview
1.2 Image Fusion Issues
1.3 Research Contribution
1.4 Organization of the Report

2 Literature Review
2.1 Data Fusion Models
2.2 Evaluation of Fused Image
2.3 Prior Work
2.3.1 Image Fusion using Multichannel Deconvolution
2.3.2 An Optimal Weighted Average Pixel-Level Fusion
2.3.3 Local Affine Models for Pixel-Level Fusion
2.3.4 Super-resolution

3 Image Fusion Using Multichannel Deconvolution
3.1 Richardson-Lucy Deconvolution Algorithm
3.2 Proposed Approach
3.3 Results
3.4 Discussion
3.5 Superresolution
3.6 Superresolution Results

4 An Optimal Weighted Average Pixel-Level Fusion
4.1 Proposed Approach
4.2 Overall Approach
4.3 Noise Analysis
4.4 Results
4.5 Discussion

5 A Total Variation Based Algorithm for Pixel Level Image Fusion
5.1 The Image Acquisition Model
5.2 Image Fusion
5.2.1 Total Variation Norm for Image Fusion
5.2.2 Estimation of β
5.3 Overall Approach
5.4 Results
5.5 Discussion

6 Conclusions And Future Research
6.1 Conclusions
6.2 Future Research
BIBLIOGRAPHY

LIST OF TABLES

3.1 Performance summary of the multichannel deconvolution algorithm on benchmark images
4.1 Performance summary of the weighted average pixel level fusion algorithm
5.1 Performance summary of the total variation based pixel level fusion algorithm

LIST OF FIGURES

1.1 Image fusion example (a) LLTV Image (b) FLIR Image (c) Fused Image
1.2 Single Input Multiple Output (SIMO) Model
1.3 Super-resolution Example: (a) LR Image 1 (256x256) (b) LR Image 2 (256x256) (c) LR Image 3 (256x256) (d) SR Image (512x512)
2.1 Single Sensor Image Acquisition Process
2.2 Multiple Sensors Image Acquisition Process
2.3 Observation Model for Super-resolution
3.1 Multichannel Image Acquisition Model
3.2 Serial multistage model for blind deconvolution
3.3 Flowchart of the multichannel blind deconvolution algorithm
3.4 Cameraman Image: (a) Original (b) Channel 1: Blurred by motion blur of length 5 and angle 13 (c) Channel 2: Blurred by motion blur of length 6 and angle 120 (d) Channel 3: Blurred by motion blur of length 6 in vertical direction
3.5 Deconvolution of Cameraman Image: (a) Channel 1 only (b) Channel 2 only (c) Channel 3 only (d) Proposed approach
3.6 Barbara Image: (a) Original (b) Channel 1: Blurred by motion blur of length 9 and angle 30 (c) Channel 2: Blurred by motion blur of length 9 and angle 60 (d) Channel 3: Blurred by motion blur of length 11 in horizontal direction
3.7 Deconvolution of Barbara Image: (a) Channel 1 only (b) Channel 2 only (c) Channel 3 only (d) Proposed approach
3.8 Noise Analysis (Additive white Gaussian Noise, mean = 0, SNR = 19 dB): Cameraman Image (a) Channel 1 (b) Channel 2 (c) Channel 3 (d) Fused Image: Using RL without damping (e) Fused Image: Using damped RL
3.9 Plot of SNR vs. UIQI
3.10 Convergence Analysis (a) Fused Cameraman Image (b) Fused Barbara Image
3.11 Overall Approach for Super-resolution
3.12 LR frames for Cameraman image: (a) Original Image (256x256) (b) LR Image 1 (128x128): Blurred by motion blur of length 7 and angle 60 (c) LR Image 2 (128x128): Blurred by motion blur of length 7 and angle 90 (d) LR Image 3 (128x128): Blurred by motion blur of length 9 and angle 0 (e) LR Image 4 (128x128): Blurred by motion blur of length 9 and angle 135
3.13 SR results for Cameraman image (256x256) (a) Using one input image shown in Fig. 3.12b (b) Using two input images shown in Fig. 3.12b and 3.12c (c) Using three input images shown in Fig. 3.12b-3.12d (d) Using all four input images
3.14 LR frames for Barbara image: (a) Original Image (512x512) (b) LR Image 1 (256x256): Blurred by motion blur of length 9 and angle 60 (c) LR Image 2 (256x256): Blurred by motion blur of length 9 and angle 90 (d) LR Image 3 (256x256): Blurred by motion blur of length 11 and angle 0 (e) LR Image 4 (256x256): Blurred by motion blur of length 11 and angle 135 (f) SR Image (512x512)
3.15 Performance in presence of noise, SNR = 18 dB: (a) Input 1 (b) Input 2 (c) Input 3 (d) Input 4 (e) SR Image
3.16 Plot of SNR vs. UIQI for "barbara" image
4.1 Flowchart of the Proposed Weighted Average Pixel Level Image Fusion
4.2 Medical Images (a) MRI Image (b) CT Image (c) Fused Image: Proposed Approach (d) Fused Image: "Max" rule
4.3 Aircraft Navigation Images (a) LLTV Image (b) FLIR Image (c) Fused Image: Proposed Approach (d) Fused Image: "Max" rule
4.4 Multifocus Images (a) Focus on the front part (b) Focus on the back part (c) Fused Image: Proposed Approach (d) Fused Image: "Max" rule
4.5 Medical Images (SNR = 15 dB) (a) MRI Image (b) CT Image (c) Fused Image: Proposed Approach (d) Fused Image: "Max" rule
4.6 Aircraft Navigation Images (SNR = 11 dB) (a) LLTV Image (b) FLIR Image (c) Fused Image: Proposed Approach (d) Fused Image: "Max" rule
4.7 Multifocus Images (SNR = 16 dB) (a) Focus on the front part (b) Focus on the back part (c) Fused Image: Proposed Approach (d) Fused Image: "Max" rule
4.8 Plot of SNR vs. Similarity Quality Index
5.1 Comparison of least square and TV norm based fusion (a) LLTV Image (b) FLIR Image (c) Fused Image: Least Square (d) Fused Image: Proposed Approach (TV Norm)
5.2 Flowchart of the Proposed TV Norm Based Pixel Level Image Fusion
5.3 Medical Images: (a) CT Image (b) MRI Image
5.4 Medical Images, SNR = 23 dB: (a) CT Image (b) MRI Image (c) Fused Image: Least Square (d) Fused Image: Proposed Approach
5.5 Medical Images, SNR = 12 dB: (a) CT Image (b) MRI Image (c) Fused Image: Least Square (d) Fused Image: Proposed Approach
5.6 Medical Images, SNR = 0 dB: (a) CT Image (b) MRI Image (c) Fused Image: Least Square (d) Fused Image: Proposed Approach
5.7 Aircraft Navigation Images: (a) LLTV Image (b) FLIR Image
5.8 Aircraft Navigation Images, SNR = 23 dB: (a) LLTV Image (b) FLIR Image (c) Fused Image: Least Square (d) Fused Image: Proposed Approach
5.9 Aircraft Navigation Images, SNR = 8 dB: (a) LLTV Image (b) FLIR Image (c) Fused Image: Least Square (d) Fused Image: Proposed Approach
5.10 Aircraft Navigation Images, SNR = 0 dB: (a) LLTV Image (b) FLIR Image (c) Fused Image: Least Square (d) Fused Image: Proposed Approach
5.11 Plot of SNR vs. Similarity Quality Index (a) CT, MRI dataset (b) LLTV, FLIR dataset

CHAPTER 1

Introduction

1.1 Overview

Data fusion integrates redundant as well as complementary information present in input signals in such a manner that the fused signal describes the true source better than any of the individual signals. The exploitation of redundant information improves accuracy and reliability, whereas integration of complementary information improves the interpretability of the signal [1-5]. In a typical data fusion application, multiple sensors observe a common source and a decision is taken based on the collective information. For example, radar provides the ability to accurately determine an aircraft's range, but has a limited ability to determine the angular extent of the aircraft. By contrast, an infrared imaging sensor can accurately determine the angular extent of an aircraft, but is imprecise in measuring range.
If these two observations are correctly associated, then the combination of data from the two sensors provides a better determination of location than could be obtained by either of the two independent sensors [2]. Another application, reconstruction of a high-resolution image, attempts to obtain a high-resolution image from multiple low-resolution images [6].

The concept of data fusion is not new. It is naturally performed by living beings to achieve a more accurate assessment of the surrounding environment and identification of threats, thereby improving their chances of survival. For example, the quality of an edible object can be perceived more accurately by using a combination of sight, touch, smell and taste. The advent of new sensors, advanced processing techniques, and improved processing hardware has provided the ability to emulate, in hardware and software, the natural data fusion capabilities of humans and animals [2].

Data fusion plays a crucial role in image-based applications such as computer vision, robotics, and remote sensing. The goal of image fusion is to extract information from input images such that the fused image provides better information for human or machine perception as compared to any of the input images [7-10]. An example of image fusion is demonstrated in Fig. 1.1. Fig. 1.1a and 1.1b show images of the same scene captured using a low-light-television (LLTV) sensor and a forward-looking-infrared (FLIR) sensor, respectively. The LLTV sensor provides the surface information of the ground as well as building and vegetation details, whereas the FLIR image describes the road network details accurately. The fused image shown in Fig. 1.1c contains all the salient features of both sensors.

Figure 1.1: Image fusion example (a) LLTV Image (b) FLIR Image (c) Fused Image

Image fusion has been used extensively in various areas of image processing such as remote sensing, biomedical imaging, nondestructive evaluation, etc. [11-13]. In remote sensing, the objective is the acquisition and interpretation of spectral measurements made at a distant location to obtain information about the Earth's surface. In order to produce a high-accuracy map, the classification process assigns each pixel of the image to a particular class of interest (such as water, dry soil, wet soil, etc.). The image resulting from the labeling of individual pixels is known as the thematic (or class) map [3]. A typical remote sensing system uses multispectral sensors, and the thematic map is generated using information present across the observed images [5]. As another example, recent developments in medical imaging have resulted in many imaging techniques to capture various aspects of the patient's anatomy and metabolism [14,15]. However, these techniques are complementary in nature. For example, magnetic resonance imaging (MRI) is very useful for defining anatomical structure, whereas metabolic activity can be captured very reliably using Positron Emission Tomography (PET). Hence, by using fusion, it is possible to obtain a single image that describes anatomical as well as metabolic activity of the patient effectively [16]. Similarly, in nondestructive evaluation of materials, a single sensor does not capture all the details of the flaw [13]. For example, high frequency eddy current sensors are capable of capturing the surface footprint of the flaw, but they cannot provide much information about its depth.
Similarly, low frequency eddy current sensors characterize the depth of the flaw but are poor at estimating the surface details. It has been reported in the literature that fusion of images obtained from high and low frequency eddy current sensors provides accurate and robust estimation of flaws as compared to results obtained from the input images [17,18].

1.2 Image Fusion Issues

The central objective of image fusion is to combine complementary and redundant information from multiple images such that the fused image represents the original scene better than any of the input images. However, development of fusion algorithms is a challenging task. Image fusion algorithms are always application dependent and require good knowledge of the image acquisition process. Typically, in image fusion based applications a suite of sensors is used to acquire images. Different sensors have different sensing characteristics; as a result, the input images may be significantly different from each other and cannot be compared directly. For example, at the pixel level, averaging works well if the input images are similar. But averaging cannot be used to fuse visible-band and infrared images as it will reduce the contrast [19]. Therefore, for fusion purposes, these images must be transformed into representations in which they can be compared and fused [20]. However, the transformation is application dependent. Some of the challenges that are commonly faced during the development of fusion algorithms are given below:

1. Feature Discrepancy: The designs of the sensors are based on different physical phenomena; therefore, their image acquisition processes are very different. Consequently, the features (edges, local contrast, etc.) of the images obtained using these sensors are very different. Feature mismatch makes it difficult to compare the input images and fuse them. For example, the local contrast of visible-band and infrared images has opposite polarity [21]. This is also referred to as local polarity reversal. Hence, to fuse these images, the fusion algorithm must be able to handle the local polarity reversal. In some cases, image features present in one image are missing in others. Such features are called complementary features [22,23]. The fusion algorithm should retain the complementary features as they improve the interpretability of the original scene.

2. Common Geometric Representation: Images obtained from different sensors may represent different geometries of the scene. For example, in visible-band imaging, images are formed by a 2-D projection of the 3-D scene on the image plane. However, millimeter-wave radar (MMWR) sensors have a range geometry as they measure the intensity of the reflected radio wave. To fuse images obtained from these sensors, the input images must be transformed to a common geometric representation [20].

3. Registration: Registration is the process of aligning images such that there is a one-to-one correspondence between the pixels of these images [24]. Registration is a key requirement for fusion, especially for pixel-level fusion. Images from different sensors may be misaligned because of several factors, including different spatial positions of the sensors, different temporal capture rates, etc. The misaligned input images must be co-registered before fusion.

1.3 Research Contribution

The main contribution of this thesis is the development of algorithms based on a model based image fusion framework.
The central objective of any fusion algorithm is to identify the redundant and complementary information and integrate them suitably. The relationship between the input and fused images may be represented using mathematical models. The primary advantage of the model based approach is that it allows incorporation of complementary as well as redundant information present in the input images in a mathematically convenient and analytically tractable manner. If the framework that relates the input images is sufficiently generic, then a fusion algorithm based on this framework can be used in more than one area of image processing. For example, in several areas of image processing such as NDE, digital camera applications, medical imaging, etc., the image acquisition process is modeled as a convolution of the true image and the point spread function (PSF) of the sensor. Therefore, a fusion algorithm based on a convolution model can be used in all these application areas.

The single input multiple output (SIMO) multichannel model that has been used extensively for multichannel image restoration [25,26] can also be used as an image fusion model. Fig. 1.2 shows the schematic of a SIMO model. In this model, it is assumed that the measured or observed images (I_1, ..., I_n) are obtained by imaging the same scene (I_0) using different sensors that have complementary information. Therefore, the original image can be estimated by using all of the measured images simultaneously. This model can also be perceived as an image fusion model where I_1, ..., I_n are the observed images and I_0 is the fused image. If the sensors are linear and space invariant, then the measured images can be modeled as the outputs of linear shift invariant systems.

Figure 1.2: Single Input Multiple Output (SIMO) Model

In this case, a multichannel deconvolution approach can be used to estimate I_0 from I_1, ..., I_n [25]. Currently existing algorithms typically solve this problem using a linear inverse problem framework [26]. But these approaches usually involve storage of large matrices, which makes it difficult to use these algorithms for large images. This thesis presents an alternative approach for multichannel blind deconvolution based on the Richardson-Lucy deconvolution algorithm for image deblurring [27,28]. The proposed algorithm is iterative and does not require knowledge of the sensor PSFs. The algorithm is applied to motion blur removal, which is a common problem in digital camera applications.

A multichannel deconvolution based image fusion approach is of use only if all observed images are blurred. Blurring is a global operation, i.e., the entire image appears blurry. However, sometimes observed images have blurs at the local level, i.e., only some regions of the images are blurred. The multifocus problem is a classic example of this case. The multifocus problem is caused by the limited depth of field of optical lenses. Lenses of long focal length cannot generate very clear images of all the objects in a scene that are located at different distances from the lens. As a result, the object which is in focus appears visually appealing whereas the rest of the objects appear blurry [29]. Clearly, the multichannel deconvolution model is not appropriate to solve this problem.
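As a concrete illustration of the SIMO forward model of Fig. 1.2 discussed above, the short Python/NumPy sketch below simulates a single scene observed through several linear shift-invariant channels, each with its own PSF. This is only an illustrative simulation: the helper motion_psf and the specific blur parameters (borrowed from the cameraman experiment of Chapter 3) are assumptions made for the example, not part of the dissertation's implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def motion_psf(length, angle_deg, size=15):
    """Illustrative linear motion-blur PSF (length in pixels, angle in degrees)."""
    psf = np.zeros((size, size))
    c = size // 2
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-length / 2.0, length / 2.0, 4 * length):
        r = int(round(c - t * np.sin(theta)))   # image rows grow downward
        s = int(round(c + t * np.cos(theta)))
        if 0 <= r < size and 0 <= s < size:
            psf[r, s] = 1.0
    return psf / psf.sum()

# SIMO forward model: each channel observes the same scene I_0 through its own LSI sensor.
scene = np.random.rand(256, 256)                      # stand-in for the true image I_0(x, y)
psfs = [motion_psf(5, 13), motion_psf(6, 120), motion_psf(6, 90)]
observations = [fftconvolve(scene, h, mode="same") for h in psfs]   # I_i = h_i * I_0
```

Note that this forward model assumes a global, shift-invariant blur in every channel; it does not describe local degradations such as the multifocus case just mentioned.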
Similarly, in many applications such as radar imaging and visual and IR imaging, the complementary and redundant information is available at the local level in the measured images [21,30]. These types of images are not blurred at all but contain complementary information, and convolution based models cannot be used to fuse such images. To fuse such images, two models have been investigated, namely (i) the weighted average model, and (ii) the local affine model. In the weighted average approach, the fusion process is modeled as a weighted average of input pixels, where the weights of the fusion process are estimated by maximizing the average power of the corresponding fused pixel. This approach does not require any user specified threshold, unlike other similar algorithms, and is computationally efficient. However, this model is susceptible to sensor noise. In a local affine model, the relationship between fused and input pixels is expressed using an affine model [21,30,31]. Existing approaches estimate the fused image from this model using parametric techniques. In the proposed research, a total variation (TV) norm [32] based approach has been used to estimate the fused image. The total variation norm has been used in several image processing applications [32-35] and is based on a nonlinear partial differential equation (PDE). It is robust to noise [36,37], and is a data driven approach that does not require knowledge of the probability density function of the fused pixels.

An interesting application of image fusion is super-resolution (SR). In many areas of image processing, high resolution (HR) images are desired and often required. The range of resolution of imaging devices is limited. The images captured using such devices may not capture all the details of the scene if the scene has many levels of detail. The resolution of an image can be increased either at the hardware or the software level. Hardware based approaches are sensitive to noise and the cost involved is inappropriate for general purpose commercial applications. In software based approaches, multiple low resolution images of the scene are captured and signal processing techniques are used to fuse these low resolution images to obtain a higher resolution image. This approach is called super-resolution [6,38-41]. An example of super-resolution is shown in Fig. 1.3. Fig. 1.3a - 1.3c are low resolution images and Fig. 1.3d shows the corresponding high resolution image.

Figure 1.3: Super-resolution Example: (a) LR Image 1 (256x256) (b) LR Image 2 (256x256) (c) LR Image 3 (256x256) (d) SR Image (512x512)

The super-resolution technique is much more cost effective than a hardware solution. Super-resolution is an inverse problem [6,38] and the forward model consists of three stages, namely (i) warping, (ii) blurring, and (iii) downsampling. Classic super-resolution approaches assume knowledge of all three stages [6]. In addition, these approaches are computationally expensive. A computationally efficient approach that does not require knowledge of the blurring transfer function and uses the fusion algorithm proposed here has been investigated for super-resolution.

1.4 Organization of the Report

The rest of the report is organized as follows. Chapter 2 provides a literature survey of image fusion that includes quality estimation of fused images, inverse problems and local affine based frameworks for super-resolution and image fusion.
Chapter 3 describes a multichannel deconvolution algorithm and its extension to super-resolution for digital images. Chapter 4 discusses a weighted average approach for image fusion, whereas Chapter 5 provides a local affine transform based approach for fusion. Finally, Chapter 6 provides a summary of the proposed research work.

CHAPTER 2

Literature Review

2.1 Data Fusion Models

Due to the large number of applications as well as the diversity of fusion techniques, considerable effort has been put into developing standards for data fusion. Several models for data fusion have been proposed in the recent past [42,43]. One of the models commonly used in signal processing applications is the three-level fusion model that is based on the levels at which information is represented [44]. This model classifies data fusion into three levels depending on the way information present in the data/image is represented and combined. If the raw images are directly used for fusion, it is called low level fusion. When features of the raw images such as edges, texture, etc. are used for fusion, it is called feature or intermediate level fusion. The third level of fusion is known as decision or high level fusion, in which decisions made by several experts are combined. A detailed description of these three levels of fusion is given below:

• Low Level Fusion: At this level, several raw images are combined to produce a new "raw" image that is expected to be more informative than the inputs [45-49]. The main advantage of low level fusion is that the original measured quantities are directly involved in the fusion process. Furthermore, the algorithms are computationally efficient and easy to implement. Low level fusion requires a precise registration of the available images.

• Feature or Intermediate Level Fusion: Feature level fusion combines various features such as edges, corners, lines, texture parameters, etc. [50-52]. In this model, several feature extraction methods are used to extract the features of the input images and a set of relevant features is selected from the available features. Methods of feature fusion include Principal Component Analysis (PCA), Multi-layer Perceptron (MLP), etc. [53]. The fused features are typically used for computer vision applications such as segmentation, object detection, etc. [53]. In feature level fusion, not all the pixels of the input images contribute to the fusion. Only the salient features of the images are extracted and fused.

• Decision or High Level Fusion: This stage combines decisions from several experts [54-57]. Methods of decision fusion include voting methods [55], statistical methods [54], fuzzy logic based methods [56], Dempster-Shafer's method [57], etc.

2.2 Evaluation of Fused Image

Image quality measurement is crucial for most image processing applications. Generally speaking, an image quality metric has three kinds of applications [58]:

1. Control the Quality of the Image Acquisition Process: The image quality metric can be used to optimize the performance of an image acquisition system. For example, an image and video acquisition system can use the quality metric to monitor and automatically adjust itself to obtain the best quality image and video data.

2. Benchmarking of Image Processing Systems and Algorithms: It can be employed to benchmark image processing systems and algorithms.
Suppose we need to select one from multiple image processing systems for a specific task; a quality metric can help evaluate which algorithm provides the best result.

3. Design of Image Processing Systems: An image quality metric can be used to optimize the algorithms and the parameters of image processing algorithms. For instance, in a visual communication system, a quality metric can help in the optimal design of the pre-filtering and bit assignment algorithms at the encoder and the post-processing algorithms at the decoder [58].

Usually, an image quality metric is derived by comparing the processed image with the ground truth image [59]. However, in the case of image fusion, ground truth data is not always available. Therefore, the quality must be measured by using only the input and the fused images. For this reason, the objective assessment of the quality of the fused image is a challenging problem in image processing. Some of the quality metrics that have been commonly used in the area of image fusion are summarized below.

• Similarity Index Based Quality Metric: A similarity index based quality metric is based on the universal image quality index (UIQI) proposed by Wang et al. [59]. It is used to compare a processed image with the corresponding ground truth image. For the processed image y = {y_i | i = 1, 2, ..., N} and the corresponding ground truth image x = {x_i | i = 1, 2, ..., N}, the quality metric Q is defined as

Q = \frac{4 σ_{xy} x_{av} y_{av}}{(σ_x^2 + σ_y^2)(x_{av}^2 + y_{av}^2)},   (2.1)

where

x_{av} = \frac{1}{N} \sum_{i=1}^{N} x_i,   y_{av} = \frac{1}{N} \sum_{i=1}^{N} y_i,

σ_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - x_{av})^2,   σ_y^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - y_{av})^2,

σ_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - x_{av})(y_i - y_{av}).

The dynamic range of Q is [-1, 1]. Due to the nonstationary nature of the image signal, (2.1) is applied to local regions of the image using a sliding window and the final index is computed by taking the average of all the local quality indices.

Cvejic et al. extended UIQI for measuring the quality of fused images by using only the input and the fused images [60]. This quality index is based on a weighted combination of the universal image quality index between the fused and the input images. A brief description of the algorithm is given below. Let x = {x_i | i = 1, 2, ..., N} and y = {y_i | i = 1, 2, ..., N} be two input images for fusion and f = {f_i | i = 1, 2, ..., N} the corresponding fused image. The fusion quality index Q_F is defined as

Q_F = sim(x, y, f) Q(x, f) + (1 - sim(x, y, f)) Q(y, f).   (2.2)

The similarity factor sim(x, y, f) is expressed as

sim(x, y, f) = 0   if σ_{xf}/(σ_{xf} + σ_{yf}) < 0,
sim(x, y, f) = σ_{xf}/(σ_{xf} + σ_{yf})   if 0 ≤ σ_{xf}/(σ_{xf} + σ_{yf}) ≤ 1,   (2.3)
sim(x, y, f) = 1   if σ_{xf}/(σ_{xf} + σ_{yf}) > 1,

where

σ_{xf} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - x_{av})(f_i - f_{av}),   σ_{yf} = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - y_{av})(f_i - f_{av}).

The similarity index lies between -1 and +1. Index values close to ±1 indicate the robustness of the fusion algorithm.

• Entropy Based Quality Metric: The entropy [61] of an image is defined as [62,63]

E = -\sum_{i=0}^{G} p(i) \log_2 p(i),   (2.4)

where G is the number of gray levels in the image and p(i) is the normalized frequency of occurrence of the i-th gray level. The entropy is used to measure the overall information in the fused image. The larger the value of the entropy, the better the fusion result.

• Cross Entropy Based Quality Metric: The overall cross entropy [61] of input images X, Y and fused image F is defined as [63]

CE = \frac{CE(X, F) + CE(Y, F)}{2},   (2.5)

where

CE(X, F) = \sum_{i=0}^{G} p_X(i) \log_2 \frac{p_X(i)}{p_F(i)},   (2.6)

CE(Y, F) = \sum_{i=0}^{G} p_Y(i) \log_2 \frac{p_Y(i)}{p_F(i)},   (2.7)
and p_X(i), p_Y(i) and p_F(i) are the normalized frequencies of occurrence of the i-th gray level of images X, Y and F, respectively. G is the number of gray levels in the input and fused images. The overall cross entropy is used to measure the difference between the input images and the fused image. In general, the smaller the value, the better the fusion result.

The entropy and the cross entropy based quality metrics are susceptible to noise and are not used to evaluate the fused result in the presence of noise. The quality metrics for fusion discussed above are based on the first and second order statistics of the image data, which are not sufficient to capture all the features of the fused image that are relevant to the human visual system. Hence, a quality metric based on the human visual system would be more appropriate to evaluate the fused image quantitatively. The quality metrics discussed above are useful if the fused image is the final output of the system. However, in some cases, image fusion is just an intermediate step and the fused image is further used by other image processing applications. In such cases, the performance of the application that exploits the fused image can be utilized to judge the quality of the fusion.

2.3 Prior Work

In this section, a survey of prior work related to the research foci in this dissertation is provided.

2.3.1 Image Fusion using Multichannel Deconvolution

Under the assumption that the sensor used in the image acquisition process is linear and space invariant, the observed image can be modeled as the output of a linear, space invariant system [64] as shown in Fig. 2.1. I_0(x, y) and I(x, y) are the true and observed images, respectively, h(x, y) is the sensor PSF and η(x, y) is noise added during image acquisition. In this model, the noise has been assumed to be additive.

Figure 2.1: Single Sensor Image Acquisition Process

The relationship between I_0(x, y) and I(x, y) can be expressed as a convolution operation [64] as shown below:

I(x, y) = h(x, y) * I_0(x, y) + η(x, y).   (2.8)

In general, I(x, y) is a smeared representation of I_0(x, y) since the sensor PSF is not a delta function. Typically, inverse techniques are used to restore I_0(x, y) from I(x, y), and the most commonly used technique is deconvolution [64]. In cases where the sensor PSF h(x, y) is known, a Wiener filter based deconvolution approach [65] can be used. In most practical applications, the only known quantity is I(x, y). In such cases, a blind deconvolution technique is necessary to estimate both I_0(x, y) and h(x, y) [66]. However, perfect reconstruction of I_0(x, y) from I(x, y) is difficult if zeros in the frequency domain of h(x, y) coincide with non-zero frequency components of I_0(x, y).

Estimation of I_0(x, y) can be improved significantly by using a multichannel setup [25,26] as shown in Fig. 2.2. In the multichannel setup, the source is inspected using multiple sensors. If the sensors are complementary, i.e., if the sensor PSFs do not have common zeros, then it is possible to reconstruct I_0(x, y) perfectly by using all the observed images simultaneously [5]. In Fig. 2.2, h_1(x, y), h_2(x, y), ..., h_n(x, y) are the PSFs of the n sensors used to inspect I_0(x, y). I_i(x, y) (1 ≤ i ≤ n) is the output of the i-th sensor and η_i(x, y) is the corresponding sensor noise. The relationship between the original or source image I_0(x, y) and the output images I_1(x, y), ..., I_n(x, y) is expressed as given below.
I_i(x, y) = h_i(x, y) * I_0(x, y) + η_i(x, y);  1 ≤ i ≤ n.   (2.9)

Figure 2.2: Multiple Sensors Image Acquisition Process

Typically, I_0(x, y) is estimated by deconvolving all the observed images simultaneously. This approach is known as multichannel deconvolution [25,26]. Similar to single channel deconvolution, if the PSFs h_1(x, y), h_2(x, y), ..., h_n(x, y) are unknown, the approach is called blind multichannel deconvolution. Note that the multichannel deconvolution schematic can be viewed as an image fusion model in which I_0(x, y) is the fused image that needs to be estimated by using all the observed images I_1(x, y), I_2(x, y), ..., I_n(x, y) simultaneously.

In [25], a Fourier transform based approach has been proposed to estimate I_0(x, y) from multiple digital camera images. Let Î_i(f_1, f_2) (1 ≤ i ≤ n) be the Fourier transform of the i-th observed image I_i(x, y). Similarly, H_1(f_1, f_2), H_2(f_1, f_2), ..., H_n(f_1, f_2) are the Fourier transforms of h_1(x, y), h_2(x, y), ..., h_n(x, y), respectively. Let Î_0^{(Est)}(f_1, f_2) be the Fourier transform of the estimated I_0(x, y). Then,

Î_0^{(Est)}(f_1, f_2) = \frac{\sum_{i=1}^{n} H_i^*(f_1, f_2) Î_i(f_1, f_2)}{\sum_{i=1}^{n} |H_i(f_1, f_2)|^2 + K},   (2.10)

where H_i^*(f_1, f_2) is the conjugate of H_i(f_1, f_2) and K is a regularization constant. However, this approach assumes that the PSFs are estimated from the camera exposure time, which limits its application. Furthermore, the regularization constant K is selected heuristically.

Reference [26] estimates I_0(x, y) by solving a set of linear equations. Let I_0^l, I_i^l and η_i^l be the lexicographical representations of the input image I_0(x, y), the i-th observed image I_i(x, y) and the noise η_i(x, y), respectively. Therefore, if 1 ≤ x ≤ M and 1 ≤ y ≤ N, then each of I_0^l, I_i^l and η_i^l is a vector of length MN. Let H_i be the matrix representation of the PSF h_i(x, y). Clearly, the size of H_i is MN × MN. From eq. (2.9),

I_i^l = H_i I_0^l + η_i^l;  1 ≤ i ≤ n   ⇒   I = H I_0^l + n,   (2.11)

where I = [I_1^l, ..., I_n^l]^T, n = [η_1^l, ..., η_n^l]^T, H = [H_1, ..., H_n]^T, and [·]^T represents the matrix transpose operation. Estimation of I_0^l from eq. (2.11) is a classical restoration problem and there are several methods available to solve this equation [67,68]. However, H_i is very large even for moderately sized images, and it is therefore difficult to use this approach for large images.

2.3.2 An Optimal Weighted Average Pixel-Level Fusion

A simple and intuitive approach for pixel level fusion consists of estimating the fused pixel by taking a weighted combination of the corresponding pixels in the input images [47,69]. Let X and Y be two input images that are to be fused and Z the corresponding fused image. For any given spatial location (m, n), the fused pixel and the input pixels can be related as

Z(m, n) = w_x(m, n) X(m, n) + w_y(m, n) Y(m, n).   (2.12)

To compute the weights w_x and w_y, a match measure M_{XY}(m, n) is computed [47,70]. The expression for M_{XY}(m, n) is given below:

M_{XY}(m, n) = \frac{2 \sum_{s \in S, t \in T} w(s, t) X(m+s, n+t) Y(m+s, n+t)}{A_X^2(m, n) + A_Y^2(m, n)},   (2.13)

where w(s, t) is a weight window with \sum_{s \in S, t \in T} w(s, t) = 1. S and T describe the size of the window, and the weights w(s, t) are defined by the user. A_X(m, n) and A_Y(m, n) are activity levels, defined below:

A_X(m, n) = \sum_{s \in S, t \in T} w(s, t) |X(m+s, n+t)|,   (2.14)

A_Y(m, n) = \sum_{s \in S, t \in T} w(s, t) |Y(m+s, n+t)|.   (2.15)

The following rule is used to estimate the weights w_x and w_y:

w_x(m, n) = 0   if M_{XY}(m, n) < α,
w_x(m, n) = \frac{1}{2} - \frac{1}{2}\left(\frac{1 - M_{XY}(m, n)}{1 - α}\right)   if M_{XY}(m, n) ≥ α,   (2.16)

w_y(m, n) = 1 - w_x(m, n),   (2.17)

where α is a user defined threshold.

A local variance based method has also been proposed to estimate the weights w_x and w_y [47]. In this approach, the weight for each input image is determined by computing the variance in an (N × M) neighborhood centered at (m, n):

w_x(m, n) = \left[\frac{1}{MN} \sum_{s \in S, t \in T} \left(X(m+s, n+t) - X_{av}(m, n)\right)^2\right]^{a},   (2.18)

where X_{av}(m, n) is the mean value computed over the (N × M) neighborhood centered at (m, n) and the exponent a allows the weight distribution to be modified. The weight w_y is estimated in a similar manner.

Petrovic et al. proposed a background elimination method for fusion [71]. In this approach, the fused pixel Z(m, n) is estimated as

Z(m, n) = X(m, n) - μ_X(m, n) + Y(m, n) - μ_Y(m, n) + \frac{μ_X(m, n) + μ_Y(m, n)}{2},   (2.19)

where μ_X(m, n) and μ_Y(m, n) are the mean values in a given window. In [72], the following approach is proposed:

Z(m, n) = \sqrt{X(m, n) Y(m, n)} + b,   (2.20)

where the constant b is a bias used to modify the final fused image.

The approaches for weighted pixel level fusion discussed above are heuristic in nature. Furthermore, most of them depend on a user specified threshold. In this dissertation we propose a "Rayleigh Quotient" [73] based approach for weighted pixel level fusion. This algorithm is mathematically robust, computationally efficient and does not require any heuristic threshold.

2.3.3 Local Affine Models for Pixel-Level Fusion

The local affine transform based model for image fusion has gained significant attention recently [21,30,31] because it can efficiently relate the complementary as well as redundant information present in the input images at the local level. Furthermore, it allows incorporation of the effect of sensor noise. Let I_0(x, y) be the true image, which is inspected by n different sensors, and let I_1(x, y), I_2(x, y), ..., I_n(x, y) be the corresponding n measurements. The local affine transform that relates the input pixel and the corresponding pixels in the measured images is given by

I_i(x, y) = β_i(x, y) I_0(x, y) + α_i(x, y) + η_i(x, y);  1 ≤ i ≤ n,   (2.21)

where β_i(x, y), α_i(x, y) and η_i(x, y) are the gain, offset and sensor noise, respectively, of the i-th sensor at location (x, y). The goal of fusion is to estimate I_0(x, y).

In many applications such as radar imaging and visual and IR imaging, complementary, redundant and polarity reversed information is available at the local level in the measured images [21,30]. The main advantage of the local affine transform model is that it can relate these local information contents efficiently, which is not possible in the multichannel deconvolution setup. Note that two sensors i and j (i ≠ j; 1 ≤ i, j ≤ n) have complementary information at location (x, y) if β_i(x, y) ≠ β_j(x, y) and β_i(x, y), β_j(x, y) ∈ {0, 1}. Similarly, these two sensors have redundant information if β_i(x, y) = 1 and β_j(x, y) = 1. Furthermore, β_i(x, y) ≠ β_j(x, y) with β_i(x, y), β_j(x, y) ∈ {-1, 1} corresponds to polarity reversal. Eq. (2.21) can be re-written as

[I_1, ..., I_n]^T = [β_1, ..., β_n]^T I_0 + [α_1, ..., α_n]^T + [η_1, ..., η_n]^T   (2.22)

⇒ I = β I_0 + α + n,   (2.23)

where I = [I_1, ..., I_n]^T, β = [β_1, ..., β_n]^T, α = [α_1, ..., α_n]^T, and n = [η_1, ..., η_n]^T. The reference to the pixel location (x, y) has been dropped for simplicity.

In [21], eq. (2.23) is solved using a Bayesian approach. The noise vector n is assumed to be Gaussian with zero mean and diagonal covariance Σ_η = diag[σ_{η_1}^2, ..., σ_{η_n}^2], where σ_{η_i}^2
is the variance of the noise of the i-th sensor at location (x, y). Furthermore, the a priori probability density function of I_0 is assumed to be Gaussian with mean m_0 and variance σ_0^2. The input pixel I_0 is estimated using the maximum a posteriori (MAP) technique. Let I_0^{MAP} be the MAP estimate of I_0. Then,

I_0^{MAP} = \left[β^T Σ_η^{-1} β + \frac{1}{σ_0^2}\right]^{-1} \left(β^T Σ_η^{-1} (I - α) + \frac{m_0}{σ_0^2}\right).   (2.24)

The sensor gains β are estimated using factor analysis. However, the performance of this approach is compromised if the sensor noise is not i.i.d. Gaussian or the a priori density is not Gaussian. Reference [30] estimates I_0(x, y) using an Expectation-Maximization approach. However, it is assumed that β_i(x, y) ∈ {-1, 0, 1}. Furthermore, the sensor noise η_i(x, y) is modeled using a K-term mixture of Gaussian probability density functions.

2.3.4 Super-resolution

The range of resolution of imaging devices is limited. The images captured using such devices may not capture all the details of the scene if the scene has many levels of detail. In many areas of image processing, high resolution (HR) images are desired and often required. The pixel density within an HR image is high; consequently, the HR image offers more detail that may be critical in various applications such as digital imaging, medical imaging, satellite imagery, etc.

The resolution of images can be increased either at the hardware or the software level. Hardware based approaches typically include increasing the number of pixels per unit area or increasing the size of the sensor chip. Increasing the number of pixels per unit area increases shot noise, which in turn makes the HR image noisy [40]. Increasing the chip size increases the cost of the imaging device, which may be inappropriate for general purpose applications. In addition, increasing the chip size also increases the capacitance, which reduces the image pixel transfer rate. In the software based approach, multiple low resolution (LR) images of the scene are captured and signal processing techniques are used to fuse these low resolution images to obtain a high resolution image. This approach is called super-resolution [6,38-41]. The super-resolution technique is much more cost effective than a hardware solution.

Conceptually, each observed LR image is a downsampled or aliased version of the HR image. The aliasing effect cannot be removed by processing any one of the observed LR images alone. However, if these LR images are shifted with respect to each other with subpixel precision, then the shift information can be exploited to undo the aliasing effect and obtain HR images [40]. Note that if the LR images are shifted by integer units, then each image contains the same information, and thus there is no new information that can be used to reconstruct the HR image. For static scenes, the shift is introduced by the image acquisition process. For example, if multiple imaging sensors are used to image the scene, then these sensors must be located at different positions. Similarly, in the single imaging sensor case, the sensor is moved appropriately to introduce subpixel level shifts [40]. For dynamic scenes, the necessary shift is introduced by the relative motion between the camera and the scene [40].

Super-resolution has been used in several image processing applications such as remote sensing, medical imaging, biometrics, and image and video surveillance. Super-resolution techniques have been used extensively to increase the spatial resolution of remote sensing data.
The multispectral images obtained using remote sensing technologies have subpixel level shifts that are used for super-resolution purposes [74,75]. Super-resolution has also recently been applied to magnetic resonance imaging (MRI) images [76,77]. In this case, super-resolution has been used to reduce the thickness of 2D slices [77] and to obtain 3D images from 2D slices [76]. Biometrics is also an important area of application of super-resolution techniques. Applications of super-resolution to iris and face data to improve the accuracy of target identification and authentication can be found in [78] and [78], respectively. In surveillance based applications, the location of interest is monitored using digital cameras, webcams, camcorders, etc., and these images/videos are analyzed for detecting abnormalities. Due to bandwidth constraints, these data are often captured at low resolution and super-resolution is used to obtain HR images/videos [79].

Fig. 2.3 shows an observation model that relates the observed LR images and the corresponding desired HR image. Let I_1, I_2, ..., I_n be the lexicographical representations of n low resolution images obtained using n different sensors. Note that these sensors could represent n different physical devices or the same device with n different settings. The i-th low resolution image I_i is obtained by warping, blurring and downsampling the HR image I_HR (lexicographically arranged).

Figure 2.3: Observation Model for Super-resolution

Warping captures the global or local translation and rotation of the imaging sensors. Imaging sensors also introduce blurring due to out-of-focus, the diffraction limit, relative motion between the imaging sensor and the original scene, etc. The spatial resolution of the HR image is higher than that of the low resolution observed images. In the observation model, the spatial resolution reduction is represented by the downsampling stage. Furthermore, the low resolution images are corrupted by sensor noise. η_i is the noise of the i-th sensor, which has been assumed to be additive.

Representing the warping, blurring and downsampling operations of the i-th sensor by matrices W_i, H_i and D_i, the relationship between the LR images and the corresponding HR image can be expressed as

I_i = D_i H_i W_i I_{HR} + η_i;  1 ≤ i ≤ n   (2.25)

⇒ I = C I_{HR} + n,   (2.26)

where I = [I_1, ..., I_n]^T, C = [D_1 H_1 W_1, ..., D_n H_n W_n]^T, and n = [η_1, ..., η_n]^T. Estimation of I_{HR} from eq. (2.26) is a classic restoration problem [67,68]. Typically, the matrices D_i, H_i, and W_i are estimated from the observed images and then deterministic or probabilistic approaches are used to solve eq. (2.25) to estimate the high resolution image I_{HR}. The downsampling matrix D_i can be estimated if the decimation ratio between the HR image I_{HR} and the k-th LR image I_k is known [26]. The number of available LR images usually determines the upper limit on the size of the HR image [26]. To estimate the warping matrix W_i, the LR images I_1, ..., I_n are upsampled and then motion vector estimation techniques are used on these upsampled LR images to estimate W_i [80,81]. The blurring matrix H_i is typically assumed to be known a priori [82]. A brief literature survey of super-resolution methods is presented next.
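Before turning to that survey, the following minimal Python sketch makes the observation model of eqs. (2.25)-(2.26) concrete by simulating one LR frame as warp, blur, downsample, and add noise. The specific choices here (a global subpixel translation for W_i, a Gaussian PSF for H_i, plain decimation for D_i) and the helper name lr_observation are illustrative assumptions only; the dissertation's own experiments use motion-blur PSFs.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def lr_observation(hr, dx, dy, blur_sigma, factor, noise_sigma, rng):
    """Simulate one LR frame of eq. (2.25): warp (W_i), blur (H_i),
    downsample (D_i), and add sensor noise (eta_i)."""
    warped = shift(hr, (dy, dx), order=3, mode="nearest")   # W_i: global subpixel translation
    blurred = gaussian_filter(warped, blur_sigma)           # H_i: illustrative Gaussian PSF
    lr = blurred[::factor, ::factor]                        # D_i: decimation by `factor`
    return lr + rng.normal(0.0, noise_sigma, lr.shape)      # + eta_i: additive noise

# Four LR frames of the same (stand-in) HR scene with different subpixel shifts.
rng = np.random.default_rng(0)
hr_scene = rng.random((512, 512))
offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
lr_frames = [lr_observation(hr_scene, dx, dy, 1.0, 2, 0.01, rng) for dx, dy in offsets]
```

Super-resolution then amounts to inverting this pipeline: estimating (or assuming) D_i, H_i and W_i and solving eq. (2.26) for I_HR, which is what the CLS, POCS, IBP and MAP approaches surveyed below do in different ways.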
The HR image I_HR can be estimated from eq. (2.25) using a constrained least squares (CLS) framework [40]. In this approach, I_HR is estimated to minimize

\sum_{i=1}^{n} || I_i - D_i H_i W_i I_{HR} ||^2 + α || C I_{HR} ||^2,   (2.27)

where α is a Lagrange multiplier and C is a weighting matrix used to impose smoothness. However, this approach is computationally expensive and the matrices D_i, H_i and W_i must be known.

A projection onto convex sets (POCS) based approach for super-resolution has been proposed in [6,83]. In this approach, each piece of a priori information about the HR image, such as positivity, smoothness, etc., is modeled as a convex set in the vector space that contains the HR image. The desired HR image is the intersection of all the convex sets. However, this approach usually takes a large number of iterations to converge and the solution also depends on the initial guess.

The iterative back projection (IBP) method for super-resolution has been discussed in [84,85]. IBP is an iterative algorithm in which the observed LR images are simulated by projecting the HR image onto the LR image space, and the error between the actual and the simulated LR images is used to update the HR image. This approach is prone to noise, and the selection of an error metric such that the solution converges is not trivial.

Several probabilistic approaches have also been proposed to solve eq. (2.25). Eq. (2.25) can be re-arranged as eq. (2.26). The HR image I_HR can be estimated by maximizing the a posteriori probability density function P(I_HR/I). This approach is known as maximum a posteriori (MAP) estimation. Using eq. (2.26), P(I_HR/I) can be expressed as:

P(I_HR/I) = \frac{P(I/I_HR) P(I_HR)}{P(I)}.   (2.28)

Note that the matrix C has been assumed to be known. If it is unknown, then it must be estimated from the observed images as explained above. Since P(I) is independent of P(I_HR), maximizing eq. (2.28) is equivalent to maximizing its numerator, i.e., P(I/I_HR) P(I_HR). P(I/I_HR) can be estimated from eq. (2.26) if the probability density function of the noise n is known. The a priori density function of I_HR, P(I_HR), is usually determined based on the prior knowledge available about the HR image. Schultz and Stevenson used a Huber Markov Random Field (HMRF) model for P(I_HR) and assumed the noise to be i.i.d. random variables to estimate the HR image using the MAP model [86]. Hardie et al. used a MAP model for super-resolution under the assumption of Gaussian noise and a Gaussian MRF prior model [87]. Sementilli et al. presented a MAP estimator for super-resolution where the a posteriori density was derived from a Poisson observation model and a Poisson prior for the HR image [88].

CHAPTER 3

Image Fusion Using Multichannel Deconvolution

An iterative approach for image fusion using blind multichannel deconvolution is discussed in this chapter. The proposed algorithm is based on the Richardson-Lucy blind deconvolution algorithm [27,28,89,90].

3.1 Richardson-Lucy Deconvolution Algorithm

Richardson-Lucy (RL) deconvolution is an iterative algorithm to deblur a blurred image [27,28]. In this approach, the observed image I(x, y) and the original image I_0(x, y) are related using a convolution relationship as given below:

I(x, y) = h(x, y) * I_0(x, y),   (3.1)

where h(x, y) is the blurring point spread function (PSF) and is assumed to be known. The RL algorithm uses a Bayesian approach to estimate I_0(x, y) iteratively. Let I_0^i(x, y) be the estimate of I_0(x, y) at the i-th iteration. Then the estimate at the (i + 1)-th iteration is
I_0^{i+1}(x, y) = \left\{\left[\frac{I(x, y)}{I_0^i(x, y) * h(x, y)}\right] * h(-x, -y)\right\} I_0^i(x, y).   (3.2)

The RL algorithm has also been extended to blind deconvolution, where both h(x, y) and I_0(x, y) are estimated iteratively [90] using eqs. (3.2) and (3.3):

h^{i+1}(x, y) = \left\{\left[\frac{I(x, y)}{I_0(x, y) * h^i(x, y)}\right] * I_0(-x, -y)\right\} h^i(x, y).   (3.3)

The RL algorithm is susceptible to noise. It amplifies the noise if the observed data is blurry and noisy [91]. A modification known as the "damped" RL algorithm has been proposed to avoid the amplification of noise that occurs in the RL iteration. In this approach, the iteration is stopped in regions of the estimated image that fit the observed image well, while the estimated image is allowed to continue to improve in regions where it fits the observed image less well [91].

3.2 Proposed Approach

Let I_1(x, y), ..., I_n(x, y) be n images acquired using n linear space-invariant sensors whose PSFs are denoted as h_1(x, y), ..., h_n(x, y). The schematic of the image acquisition process is shown in Fig. 3.1, where I_0(x, y) is the source or original image.

Figure 3.1: Multichannel Image Acquisition Model

Because the sensors are assumed to be linear and shift-invariant, the observed images can be modeled as [92,93]:

I_i(x, y) = h_i(x, y) * I_0(x, y);  1 ≤ i ≤ n.   (3.4)

The goal is to estimate I_0(x, y), h_1(x, y), ..., h_n(x, y) from eq. (3.4). Note that I_0(x, y) is common to all the measurements. Motivated by this fact, we propose a serial multistage model for blind deconvolution as shown in Fig. 3.2. S_i is the i-th stage of the model. I_0^i(x, y) and I_0^{i+1}(x, y) are the input and output variables of the i-th stage, respectively.

Figure 3.2: Serial multistage model for blind deconvolution

The PSF estimation is done in a serial manner and only one PSF is estimated at each stage, i.e., PSF h_i(x, y) is estimated at stage S_i (1 ≤ i ≤ n). Let h_i^{est}(x, y) be an estimate of h_i(x, y). Then h_i^{est}(x, y) is estimated such that the following cost function is minimized:

h_i^{est}(x, y) = arg min_{h_i(x, y)} || I_i(x, y) - h_i(x, y) * I_0^i(x, y) ||^2,   (3.5)

subject to the constraints

\sum_{x, y} h_i(x, y) = 1, and h_i(x, y) ≥ 0 ∀ i,   (3.6)

and

I_0^{i+1}(x, y) = f(I_0^i(x, y), I_i(x, y), h_i^{est}(x, y)),   (3.7)

where f(·) denotes the Richardson-Lucy image update of eq. (3.2) applied with the estimated PSF h_i^{est}(x, y). The constraint given in eq. (3.6) ensures that the PSFs are low pass in nature and do not alter the average value of the source image. Eq. (3.7) is used to combine stages S_i and S_{i+1}. The output variable of the n-th stage, I_0^{n+1}(x, y), is the estimated value of I_0(x, y).

However, knowledge of I_0^1(x, y) is required to execute the proposed serial model. Since I_0^1(x, y) is unknown, an initial guess of I_0^1(x, y) is used and the model is implemented in an iterative manner. After the first iteration, I_0^{n+1}(x, y), computed at the previous iteration, is used as I_0^1(x, y) in the next iteration. The complete flowchart of this approach is shown in Fig. 3.3.

Figure 3.3: Flowchart of the multichannel blind deconvolution algorithm

At the k-th iteration, I_{0,k-1}^{n+1}(x, y) is applied to stage 1, which in conjunction with the measurement image I_1(x, y) is used to estimate h_{1,k}(x, y). Then, h_{1,k}(x, y) and I_1(x, y) are used to estimate I_{0,k}^2(x, y), which is passed to the next stage as an input variable. This process is repeated for all the subsequent stages. At each stage, the Richardson-Lucy blind deconvolution algorithm [90] is used to estimate h_{i,k}(x, y) and I_{0,k}^{i+1}(x, y). The output variable of the n-th stage is the estimated value of the source image I_0(x, y). The algorithm terminates when the convergence criterion in eq. (3.8) is satisfied:

|| I_{0,k}^{n+1}(x, y) - I_{0,k-1}^{n+1}(x, y) || ≤ η.   (3.8)

η is a pre-defined tolerance value.
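A minimal Python/NumPy sketch of the serial multistage scheme described above is given below. It follows the structure of Fig. 3.3 (an outer iteration over stages S_1, ..., S_n, each stage refining one PSF and the common image estimate with Richardson-Lucy updates), but it is a simplified illustration rather than the dissertation's implementation: the PSFs are represented on the full image grid, the initial guesses and iteration counts are arbitrary choices, and no damping is used.

```python
import numpy as np
from scipy.signal import fftconvolve

def rl_blind_stage(obs, f_est, psf, n_inner=10, eps=1e-12):
    """One stage S_i: alternate the RL PSF update (eq. 3.3) and the RL image
    update (eq. 3.2) for a single observed channel, holding the other factor fixed."""
    for _ in range(n_inner):
        ratio = obs / (fftconvolve(f_est, psf, mode="same") + eps)
        psf = psf * fftconvolve(ratio, f_est[::-1, ::-1], mode="same")
        psf = np.clip(psf, 0.0, None)
        psf /= psf.sum() + eps                            # constraints of eq. (3.6)
        ratio = obs / (fftconvolve(f_est, psf, mode="same") + eps)
        f_est = f_est * fftconvolve(ratio, psf[::-1, ::-1], mode="same")
    return f_est, psf

def fuse_multichannel(images, support=7, n_outer=20, tol=1e-3):
    """Serial multistage blind deconvolution (Fig. 3.3): the output of stage i is the
    input of stage i+1; outer iterations repeat until the change in the fused
    estimate falls below a tolerance, as in eq. (3.8)."""
    f_est = np.mean(images, axis=0)                       # initial guess for I_0
    psfs = []
    for _ in images:                                      # uniform PSFs on an assumed support
        p = np.zeros_like(f_est)
        cy, cx = p.shape[0] // 2, p.shape[1] // 2
        h = support // 2
        p[cy - h:cy + h + 1, cx - h:cx + h + 1] = 1.0
        psfs.append(p / p.sum())
    for _ in range(n_outer):
        f_prev = f_est.copy()
        for i, obs in enumerate(images):                  # stages S_1, ..., S_n
            f_est, psfs[i] = rl_blind_stage(obs, f_est, psfs[i])
        if np.linalg.norm(f_est - f_prev) <= tol * np.linalg.norm(f_prev):
            break
    return f_est
```

In practice the inputs would be non-negative intensity images, and a damped RL update or pre-denoising (see Section 3.4) would be substituted when the observations are noisy.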
3.3 Results

The proposed algorithm was tested using benchmark images that were artificially blurred using three known motion blur filters (three-channel image restoration). In all cases, the support of the PSFs was assumed to be known. An example is the "cameraman" image in Fig. 3.4a, which was blurred using three different types of filters: (i) motion filter of length 5 and angle 13 (Channel 1), (ii) motion filter of length 6 and angle 120 (Channel 2), and (iii) motion filter of length 6 and angle 90 (Channel 3).

Figure 3.3: Flowchart of the multichannel blind deconvolution algorithm

The blurred images are shown in Fig. 3.4b-3.4d. The results of the single channel deconvolution as well as the proposed algorithm are shown in Fig. 3.5. Clearly, the proposed multichannel algorithm performs better than single channel deconvolution. Similar results for the "barbara" image are shown in Fig. 3.6 and 3.7. While the results of the proposed algorithm are visually pleasing, a quality metric is necessary to objectively compare the results of single and multichannel deconvolution. The universal image quality index (UIQI) proposed by Wang and Bovik [59] is used to compare the deconvolved image with the corresponding ground truth image. UIQI is similar to the correlation coefficient and lies between -1 and +1. Index values close to ±1 indicate the robustness of the image processing algorithm. However, this index can only be computed when the original image is available. The UIQI values for the "cameraman" and "barbara" images are summarized in Table 3.1. Clearly, the high values of UIQI indicate the feasibility of the proposed algorithm. However, note that for the "barbara" image the UIQI values of single as well as multichannel deconvolution are the same, but the visual quality of multichannel deconvolution is better than that of single channel deconvolution.

The Richardson-Lucy (RL) deconvolution algorithm amplifies noise [91]. Since the proposed algorithm is based on the RL algorithm, it is expected that the fused image will be noisy if the input images are noisy. To validate this property empirically, white Gaussian noise of mean 0 and signal-to-noise ratio (SNR) 19 dB was added to the blurred cameraman images shown in Fig. 3.4b-3.4d. The SNR was computed using the following expression:

$$\mathrm{SNR} = 10 \log_{10} \left( \frac{\text{Signal Variance}}{\text{Noise Variance}} \right) \; \mathrm{dB}. \qquad (3.9)$$

The corresponding noisy and blurred images are shown in Fig. 3.8a-3.8c. Fig. 3.8d shows the corresponding fused image. It is clear that the proposed algorithm amplifies the noise. The UIQI of the fused image, obtained by comparing it with the original cameraman image, was 0.37. A modification to RL deconvolution, known as "damped" Richardson-Lucy, has been proposed in the literature and is more robust to noise than the RL algorithm [91]. Fig. 3.8e shows the fused image obtained by using the damped RL algorithm in the proposed fusion framework. The UIQI in this case was 0.39. Note that the fused image obtained using damped RL is less noisy than the one obtained using the RL algorithm. However, damping also introduces extra smoothing. Fig. 3.9 shows a plot of the SNR of the input images vs. the UIQI of the corresponding deblurred cameraman image. It is clear that damped RL performs better than RL without damping.
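As an aside, the noisy test inputs described above can be generated by scaling white Gaussian noise to a target SNR as defined in eq. (3.9). A minimal sketch is given below; it is illustrative only, and the helper name add_noise_snr is an assumption of this sketch rather than code from the dissertation.

function [noisy, sigma] = add_noise_snr(img, snr_db)
% Add zero-mean white Gaussian noise so that eq. (3.9) yields snr_db.
signal_var = var(img(:));
noise_var  = signal_var / 10^(snr_db / 10);   % invert SNR = 10*log10(Ps/Pn)
sigma      = sqrt(noise_var);
noisy      = img + sigma * randn(size(img));
end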
The proposed fusion framework is heuristic in nature. Hence, a mathematical framework for convergence analysis is not available. However, for an empirical understanding of the convergence of the algorithm, the blurred cameraman and barbara images shown in Fig. 3.4 and 3.6 were fused using a random image, with pixel values lying between 0 and 255, as the initial guess for the fused image. The fused images, shown in Fig. 3.10, are clearly different from those shown in Fig. 3.4 and 3.6. This suggests that the proposed fusion algorithm is sensitive to the initial guess.

Figure 3.4: Cameraman image: (a) Original (b) Channel 1: blurred by motion blur of length 5 and angle 13 (c) Channel 2: blurred by motion blur of length 6 and angle 120 (d) Channel 3: blurred by motion blur of length 6 in the vertical direction

Figure 3.5: Deconvolution of cameraman image: (a) Channel 1 only (b) Channel 2 only (c) Channel 3 only (d) Proposed approach

Figure 3.6: Barbara image: (a) Original (b) Channel 1: blurred by motion blur of length 9 and angle 30 (c) Channel 2: blurred by motion blur of length 9 and angle 60 (d) Channel 3: blurred by motion blur of length 11 in the horizontal direction

Figure 3.7: Deconvolution of barbara image: (a) Channel 1 only (b) Channel 2 only (c) Channel 3 only (d) Proposed approach

Table 3.1: Performance summary of the multichannel deconvolution algorithm on benchmark images

Sl. No. | Image     | UIQI: Channel 1 | UIQI: Channel 2 | UIQI: Channel 3 | UIQI: Proposed approach
1.      | Cameraman | 0.45            | 0.41            | 0.63            | 0.75
2.      | Barbara   | 0.99            | 0.99            | 0.99            | 0.99

Figure 3.8: Noise analysis (additive white Gaussian noise, mean = 0, SNR = 19 dB): cameraman image (a) Channel 1 (b) Channel 2 (c) Channel 3 (d) Fused image: using RL without damping (e) Fused image: using damped RL

Figure 3.9: Plot of SNR vs. UIQI

Figure 3.10: Convergence analysis (a) Fused cameraman image (b) Fused barbara image

3.4 Discussion

An iterative algorithm based on Richardson-Lucy blind deconvolution is proposed in this chapter. Initial results indicate that the proposed algorithm provides superior performance when compared to single channel blind deconvolution. However, this algorithm is based on the Richardson-Lucy deconvolution algorithm, which tends to amplify noise [91]. Therefore, in the presence of noise, either the damped Richardson-Lucy algorithm [91] should be used or the input images should be denoised prior to deconvolution using denoising algorithms [94,95]. Furthermore, the proposed algorithm assumes knowledge of the true size, or support, of the PSFs.

3.5 Superresolution

Recently, a deconvolution based approach has also been proposed for super-resolution [96,97]. If the warping is a global translation and the blur is shift invariant, then the blur matrix $H_i$ and the warping matrix $W_i$ in eq. (2.25) correspond to convolution matrices and they commute. In this case eq. (2.25) can be re-written as

$$I_i = D_i H_i^N I_{HR} + \eta_i; \quad 1 \le i \le n, \qquad (3.10)$$

where $H_i^N = H_i W_i$. Since $H_i$ and $W_i$ are convolution matrices, $H_i^N$ is also a convolution matrix, and the HR image $I_{HR}$ can be estimated from eq. (3.10) using blind deconvolution. Sroubek et al. proposed a MAP estimator based blind deconvolution to estimate $I_{HR}$ [96]. The noise was assumed to be additive, white Gaussian.
Nguyen et al. also proposed a multi-channel framework, but a parametric model was assumed for the convolution matrix $H_i^N$ [97]. This is a very limiting assumption for many applications.

The multi-channel deconvolution framework has also been investigated for superresolution, in which the LR images are first upsampled and a multichannel deconvolution algorithm is applied to the upsampled LR images to estimate the HR image [82]. In this section we explore this approach for super-resolution. Representing the upsampled version of $I_i$ as $I_i^{\uparrow}$, it is clear from eq. (3.10) that

$$I_i^{\uparrow} \approx H_i^N I_{HR}; \quad 1 \le i \le n. \qquad (3.11)$$

$I_{HR}$ can be estimated from eq. (3.11) using a multichannel blind deconvolution approach. We propose to use the multi-channel blind deconvolution algorithm proposed earlier to estimate $I_{HR}$ from eq. (3.11). The overall approach for the proposed super-resolution algorithm is shown in Fig. 3.11. The observed LR images $I_1, \dots, I_n$ are assumed to be noise free. These LR images are first upsampled, and then the upsampled LR images are deconvolved using the multichannel blind deconvolution algorithm proposed above.

Figure 3.11: Overall approach for super-resolution: each LR image is upsampled and the upsampled images are passed to the multi-channel blind deconvolution stage.
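Continuing the earlier sketch, the pipeline of Fig. 3.11 can be illustrated as below. This is again only a sketch: the upsampling here uses simple nearest-neighbour replication via kron (the dissertation does not state which interpolator was used), and multichannel_blind_rl refers to the illustrative function sketched after Section 3.2, not to the author's code.

function I_hr = superresolve_mc(lr, scale, psf_support, n_outer, n_rl, eta)
% lr    : cell array of low-resolution observations (assumed noise free)
% scale : integer upsampling factor (e.g., 2)
n  = numel(lr);
up = cell(n, 1);
for i = 1:n
    up{i} = kron(lr{i}, ones(scale));      % nearest-neighbour upsampling
end
% deconvolve the upsampled images jointly, as in eq. (3.11)
I_hr = multichannel_blind_rl(up, psf_support, n_outer, n_rl, eta);
end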
3.6 Superresolution Results

The proposed super-resolution algorithm was tested using the "cameraman" and "barbara" images. In all cases, the support of the PSFs was assumed to be known. The first set of results is shown for the cameraman image. Fig. 3.12a shows the original image. Low resolution (LR) images were generated by blurring the original image using four different types of motion filter: (i) motion filter of length 7 and angle 60, (ii) motion filter of length 7 and angle 90, (iii) motion filter of length 9 and angle 0, and (iv) motion filter of length 9 and angle 135. These blurred images were further downsampled by a factor of 0.5. The LR images are shown in Fig. 3.12b-3.12e. For comparison purposes, the LR images have been shown at the same scale as the original image. The super-resolution algorithm explained in Section 3.5 was applied to these LR images. In order to investigate the effect of the number of frames used to construct the SR image, the proposed SR algorithm was applied to 1 input image, 2 input images with minimum blur, 3 input images with minimum amount of blur, and all 4 input images. Fig. 3.13 shows all four SR images. The UIQI was computed for each of these SR images by comparing them with the corresponding ground truth. The UIQI for the SR images shown in Fig. 3.13a, 3.13b, 3.13c and 3.13d were 0.43, 0.53, 0.62 and 0.61, respectively. Visual inspection also suggests that the quality of the SR images obtained using 3 and 4 input images is much better than that obtained using only 1 or 2 inputs. Furthermore, the performance with 3 and 4 inputs appears to be the same.

Similar results for the "barbara" image are shown in Fig. 3.14. The motion blurs used were: (i) motion filter of length 9 and angle 60, (ii) motion filter of length 9 and angle 90, (iii) motion filter of length 11 and angle 0, and (iv) motion filter of length 11 and angle 135. In this case the high resolution image was obtained using all four inputs. The UIQI for the "barbara" image was 0.66.

To study the performance of the proposed algorithm in the presence of noise, zero mean white Gaussian noise was added to the downsampled, blurred LR "barbara" images and the proposed SR algorithm was applied to these noisy LR frames. Fig. 3.15 shows one set of noisy images and the corresponding SR image. These noisy LR images were obtained by adding zero mean white Gaussian noise of SNR 18 dB to the LR images shown in Fig. 3.14. The SNR was computed using eq. (3.9). Clearly, the SR image appears to be noisy. The UIQI of this SR image was 0.43. Fig. 3.16 shows a plot of the UIQI of the SR image against the corresponding SNR of the input images. It is apparent that noise affects the performance of the proposed algorithm. Note that the damped RL algorithm was used in the multichannel blind deconvolution process.

Figure 3.12: LR frames for cameraman image: (a) Original image (256x256) (b) LR image 1 (128x128): blurred by motion blur of length 7 and angle 60 (c) LR image 2 (128x128): blurred by motion blur of length 7 and angle 90 (d) LR image 3 (128x128): blurred by motion blur of length 9 and angle 0 (e) LR image 4 (128x128): blurred by motion blur of length 9 and angle 135

Figure 3.13: SR results for cameraman image (256x256): (a) using one input image, shown in Fig. 3.12b (b) using two input images, shown in Fig. 3.12b and 3.12c (c) using three input images, shown in Fig. 3.12b-3.12d (d) using all four input images

Figure 3.14: LR frames for barbara image: (a) Original image (512x512) (b) LR image 1 (256x256): blurred by motion blur of length 9 and angle 60 (c) LR image 2 (256x256): blurred by motion blur of length 9 and angle 90 (d) LR image 3 (256x256): blurred by motion blur of length 11 and angle 0 (e) LR image 4 (256x256): blurred by motion blur of length 11 and angle 135 (f) SR image (512x512)

Figure 3.15: Performance in the presence of noise, SNR = 18 dB: (a) Input 1 (b) Input 2 (c) Input 3 (d) Input 4 (e) SR image

Figure 3.16: Plot of SNR vs. UIQI for "barbara" image

CHAPTER 4

An Optimal Weighted Average Pixel-Level Fusion

In the previous chapter a multichannel deconvolution algorithm was presented. Convolution is a global model in which the PSF of the sensor affects the entire image. Often, however, the complementary information is present at the local level. Furthermore, the input images that are to be fused may not be blurred at all and still contain complementary information. For example, CT and MRI sensors provide complementary information even if the images do not look blurred. Clearly, in such cases convolution is not an appropriate model for fusion. A simple and intuitive approach to extract local complementary information is pixel level fusion. In this chapter a computationally efficient approach is discussed for pixel level fusion. The proposed approach models the fusion process as a weighted average of the input data, where the weights of the fusion process are estimated by maximizing the average power of the corresponding fused pixel. The proposed approach does not require any user specified threshold and is computationally efficient.

4.1 Proposed Approach

Let $f_1(x,y), f_2(x,y), \dots, f_n(x,y)$ be $n$ observed images that are to be fused, for $(x,y) \in \Omega$. We assume the pixels of the observed images to be random variables. Furthermore, we also assume $f_i(x,y) \ge 0$ ($1 \le i \le n$). This assumption is valid for many imaging devices such as digital cameras, IR cameras, etc.,
and does not limit the proposed algorithm in any way, since data not satisfying this requirement (i.e., with negative pixel values) can always be transformed using a simple linear transformation to make the pixel values positive. Denoting the fused image as $f_0(x,y)$, the pixel of the fused image and the corresponding pixels in the observed images are related as:

$$f_0(x,y) = a_1(x,y) f_1(x,y) + a_2(x,y) f_2(x,y) + \cdots + a_n(x,y) f_n(x,y). \qquad (4.1)$$

The following constraints are imposed on the weights $a_i(x,y)$ ($1 \le i \le n$):

$$a_i(x,y) \in [0,1], \quad \sum_{i=1}^{n} a_i^2(x,y) = 1; \quad 1 \le i \le n, \;\; (x,y) \in \Omega. \qquad (4.2)$$

The weights $a_i(x,y)$ are usually unknown and must be estimated from the data. In the absence of noise, the average power of a pixel can be viewed as an estimate of the contrast of the image [59]. Therefore, we propose to estimate the weights $a_1(x,y), a_2(x,y), \dots, a_n(x,y)$ such that the average power of the fused pixel $f_0(x,y)$ is maximized. Eq. (4.1) can be re-written as

$$f_0 = \mathbf{a}^T \mathbf{f}, \qquad (4.3)$$

where $\mathbf{a} = [a_1, \dots, a_n]^T$, $\mathbf{f} = [f_1, \dots, f_n]^T$, and the reference to the pixel location $(x,y)$ has been dropped for notational simplicity. The average power of the fused pixel $f_0$, denoted $P_{f_0}$, can be estimated as shown below:

$$P_{f_0} = E[f_0 f_0^T] = E[\mathbf{a}^T \mathbf{f} \mathbf{f}^T \mathbf{a}] = \mathbf{a}^T E[\mathbf{f} \mathbf{f}^T] \mathbf{a}. \qquad (4.4)$$

Let $\Sigma_f$ be the correlation matrix of the random vector $\mathbf{f}$, i.e., $\Sigma_f = E[\mathbf{f} \mathbf{f}^T]$. Also, from eq. (4.2), it is clear that $\mathbf{a}^T \mathbf{a} = 1$. Therefore,

$$P_{f_0} = \frac{\mathbf{a}^T \Sigma_f \mathbf{a}}{\mathbf{a}^T \mathbf{a}}. \qquad (4.5)$$

We propose to estimate the weight vector $\mathbf{a}$ such that $P_{f_0}$ in eq. (4.5) is maximized. Note that $\Sigma_f$ is a real and symmetric matrix, and for any real and symmetric matrix $\Sigma_f$, $P_{f_0}$ is the Rayleigh quotient [73]. Therefore, $P_{f_0}$ is maximized if the weight vector $\mathbf{a}$ is equal to the principal eigenvector of $\Sigma_f$.

It is evident that the correlation matrix $\Sigma_f$ is unknown and must be estimated from the input images in order to estimate the weight vector $\mathbf{a}$. We use a region-based approach to estimate the correlation matrix for all the pixels in a region, as explained below. Let us assume that the input images $f_1, \dots, f_n$ are divided into blocks of size $p \times q$. For one such block, denoted $R$, assume that $(f_1^1, f_1^2, \dots, f_1^{p \times q})$, $(f_2^1, f_2^2, \dots, f_2^{p \times q})$, $\dots$, $(f_n^1, f_n^2, \dots, f_n^{p \times q})$ are the lexicographical representations of the pixel values of the $n$ input images. These pixels can be viewed as $n$-variate random variables $(f_1^1, f_2^1, \dots, f_n^1)^T$, $(f_1^2, f_2^2, \dots, f_n^2)^T$, $\dots$, $(f_1^{p \times q}, f_2^{p \times q}, \dots, f_n^{p \times q})^T$. Furthermore, we assume these $n$-variate random variables are i.i.d. Therefore, the correlation matrix of these random variables, represented as $\Sigma_\nu$, can be computed as shown below:

$$\Sigma_\nu = \frac{1}{p \times q} \sum_{k=1}^{p \times q} \nu_k \nu_k^T, \qquad (4.6)$$

where $\nu_k = (f_1^k, f_2^k, \dots, f_n^k)^T$. $\Sigma_\nu$ is used in eq. (4.5) to estimate the optimal weights to fuse all the input pixels inside region $R$. Therefore, we propose the following rule to fuse the pixels inside region $R$:

1. Compute $\Sigma_\nu$ for block $R$.
2. Let $\mu$ be the principal eigenvector of $\Sigma_\nu$, normalized such that $\mu^T \mu = 1$.
3. Set the weight vector $\mathbf{a} = \mu$ for all $(x,y) \in R$.

This process may be repeated for each block; a sketch of the resulting procedure is given below. Note that the weights computed this way are constant over the region.
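The following MATLAB-style sketch of the block-wise procedure of Sections 4.1-4.2 is illustrative rather than the author's implementation; the function name fuse_rayleigh and the handling of partial border blocks are assumptions made here for brevity.

function fused = fuse_rayleigh(imgs, p, q)
% imgs : cell array of n co-registered, nonnegative input images (same size)
% p, q : block size used to estimate the local correlation matrix
n = numel(imgs);
[rows, cols] = size(imgs{1});
fused = zeros(rows, cols);
for r = 1:p:rows
    for c = 1:q:cols
        rr = r:min(r+p-1, rows);            % clip the last (partial) block
        cc = c:min(c+q-1, cols);
        % stack the block pixels of all inputs as columns of V (n x p*q)
        V = zeros(n, numel(rr)*numel(cc));
        for i = 1:n
            blk = imgs{i}(rr, cc);
            V(i, :) = blk(:)';
        end
        S = (V * V') / size(V, 2);          % correlation matrix, eq. (4.6)
        [U, D] = eig(S);
        [dmax, idx] = max(diag(D));
        a = U(:, idx);                      % principal eigenvector (unit norm)
        a = a * sign(sum(a) + eps);         % fix the sign so the weights are nonnegative
        % weighted average of the block, eq. (4.1)
        blk_fused = zeros(numel(rr), numel(cc));
        for i = 1:n
            blk_fused = blk_fused + a(i) * imgs{i}(rr, cc);
        end
        fused(rr, cc) = blk_fused;
    end
end
end

For example, fused = fuse_rayleigh({mri, ct}, 8, 8) would correspond to the 8 x 8 block setting used in Section 4.4; the variable names mri and ct are placeholders for the two co-registered inputs.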
4.2 Overall Approach

The flowchart of the proposed pixel level fusion method is shown in Fig. 4.1. We first split the input images into several blocks and arrange the pixels inside each block lexicographically. The lexicographically arranged pixels of any given block from all the input images are used to compute the correlation matrix $\Sigma_\nu$, and the principal eigenvector of this correlation matrix is used to determine the optimal weights to fuse that block. This process is repeated for all the blocks. After the optimal weights are estimated, eq. (4.1) is used to estimate the fused pixels $f_0(x,y)$ for all locations $(x,y) \in \Omega$.

Figure 4.1: Flowchart of the proposed weighted average pixel level image fusion:
- Input: $n$ co-registered sensor images
- Output: Fused image
- Weight estimation:
  - Split each input image into $B$ blocks, each of size $p \times q$
  - For each block $b_i$, $i = 1, \dots, B$:
    - Compute the correlation matrix of the block.
    - Estimate the optimal weights for the block as the principal eigenvector of the correlation matrix, as presented in Section 4.1.
- Fusion process:
  - Use eq. (4.1) to fuse the input pixels.

4.3 Noise Analysis

To study the performance of the proposed algorithm in the presence of sensor noise, we assume that the input images $f_1(x,y), f_2(x,y), \dots, f_n(x,y)$ are corrupted by additive noise. Therefore, the fused pixel in eq. (4.1) can be written as:

$$f_0 = a_1(f_1 + \eta_1) + a_2(f_2 + \eta_2) + \cdots + a_n(f_n + \eta_n) \qquad (4.7)$$

$$\Rightarrow \; f_0 = \mathbf{a}^T \mathbf{f} + \mathbf{a}^T \boldsymbol{\eta}, \qquad (4.8)$$

where $\mathbf{a} = [a_1, \dots, a_n]^T$, $\mathbf{f} = [f_1, \dots, f_n]^T$ and $\boldsymbol{\eta} = [\eta_1, \dots, \eta_n]^T$. Assuming $\mathbf{f}$ and $\boldsymbol{\eta}$ to be uncorrelated, the average power of the fused pixel $f_0$, denoted $P_{f_0}$, can be expressed as:

$$P_{f_0} = E[f_0 f_0^T] = \mathbf{a}^T \left( \Sigma_f + \Sigma_\eta \right) \mathbf{a}, \qquad (4.9)$$

where $\Sigma_f = E[\mathbf{f} \mathbf{f}^T]$ and $\Sigma_\eta = E[\boldsymbol{\eta} \boldsymbol{\eta}^T]$. Hence, it is clear that in the presence of noise the estimation of $\mathbf{a}$ using the rule proposed in Section 4.1 will be affected by the noise, and the fused image will be noisy. Therefore, appropriate pre-filtering is necessary to mitigate the effect of noise in this algorithm.

4.4 Results

The results of the proposed fusion algorithm are presented in this section. The proposed fusion algorithm was applied to three different datasets: (i) medical imaging, (ii) aircraft navigation, and (iii) multifocus images. For each dataset, only two input images were considered for the fusion process, and these two inputs were co-registered. To compare the performance of the proposed algorithm with approaches using heuristic rules, the proposed fusion approach is also compared to the "Max" rule for fusion. In the "Max" rule, the input pixel having the maximum absolute value is selected as the fused pixel [47].

Fig. 4.2 shows the medical images that were used to validate the proposed algorithm. Fig. 4.2a and 4.2b are images of a human brain obtained using magnetic resonance imaging (MRI) and computed tomography (CT), respectively. These images are available online at [98]. CT and MRI sensors provide complementary information, with CT providing a good visual description of bone tissue, whereas soft tissues are better visualized by MRI. To apply the proposed approach, the CT and MRI images were divided into non-overlapping regions of size 8 x 8, and for each region the optimal weights for fusion were computed using the approach explained in Section 4.2. The fused image is shown in Fig. 4.2c. The fused image obtained using the "Max" rule is shown in Fig. 4.2d. In this case the performance of the "Max" rule and the proposed approach visually appears to be similar.

The proposed algorithm was also applied to the aircraft navigation images [98] shown in Fig. 4.3. Fig. 4.3a was captured using a low-light-television (LLTV) sensor and Fig. 4.3b was obtained using a forward-looking-infrared (FLIR) sensor. Note that the LLTV sensor provides the surface information of the ground as well as building and vegetation details, whereas the FLIR image describes the road network details accurately.
The goal of fusion is to have all the salient features of both sensors in the fused image. Fig. 4.3c shows the corresponding fused image. The fusion process was exactly the same as that of the medical imaging dataset. Fig. 4.3d shows the output of the "Max" rule. Visually, the proposed approach appears to be better than the heuristic approach.

The third set of results is shown for the multifocus images [99] shown in Fig. 4.4. Fig. 4.4a corresponds to the case when the camera was focused on the front part of the scene, while Fig. 4.4b shows the image when the camera was focused on the background part of the scene. Fig. 4.4c shows the corresponding fused image. The fusion process was exactly the same as those of the previous datasets. Fig. 4.4d shows the fused image obtained using the "Max" rule. Again, it is apparent that the proposed algorithm provides a better fused image.

To validate the noise analysis presented in Section 4.3, zero mean white Gaussian noise was added to the input images of all three datasets shown in Fig. 4.2-4.4. Fig. 4.5a and 4.5b show the noisy input images obtained by adding zero mean white Gaussian noise of SNR = 15 dB. The SNR was computed using eq. (3.9). Fig. 4.5c and 4.5d are the fused images obtained using the proposed approach and the "Max" rule, respectively. The results for the "aircraft navigation" and the "multifocus" images are shown in Fig. 4.6 and 4.7, respectively. The SNRs for these images were 11 dB and 16 dB, respectively. Visual inspection clearly shows that in all cases both types of fused images are noisy; however, except for the medical images, the proposed approach provides more details about the scene even in the presence of noise.

The similarity index based quality metric was used to estimate the quality of the fused image quantitatively [60]. In the similarity based index, a similarity metric is computed between the measurement images and the fused image. This index lies between -1 and +1. Index values close to ±1 indicate the robustness of the fusion algorithm. Table 4.1 summarizes this quality index for the cases depicted in Figs. 4.2-4.4 and Figs. 4.5-4.7. Note that the maximum value of the similarity index based metric is 1. Clearly, the high values of the similarity index indicate the feasibility of the proposed algorithm. However, the similarity index and visual inspection of the fusion results lead to different conclusions in some cases. For example, in the case of the "aircraft navigation" and "multifocus" images, the similarity index values for the "Max" rule are almost the same as, or slightly higher than, the corresponding values for the proposed approach. But visual inspection suggests that for these images the proposed approach provides better visual quality than the "Max" rule, even in the presence of noise.
A plot of the similarity index of the proposed fused image as a function of SNR is presented in Fig. 4.8. The similarity index for the "Max" rule has not been reported in this plot, as the quality metric provides ambiguous information in this case. It is clear from the plot that the quality index decreases as the SNR decreases. Therefore, the performance of the proposed fusion algorithm is prone to sensor noise. Hence, in the presence of noise, noise removal should be used as a pre-processing stage in the proposed fusion framework.

Table 4.1: Performance summary of the weighted average pixel level fusion algorithm

Image Dataset       | SNR (dB): Input 1 | SNR (dB): Input 2 | Similarity Index: Proposed Approach | Similarity Index: Max rule
Medical Image       | ∞ (No noise)      | ∞ (No noise)      | 0.93 | 0.94
Medical Image       | 15                | 15                | 0.39 | 0.39
Aircraft Navigation | ∞ (No noise)      | ∞ (No noise)      | 0.79 | 0.84
Aircraft Navigation | 11                | 11                | 0.28 | 0.27
Multifocus          | ∞ (No noise)      | ∞ (No noise)      | 0.84 | 0.86
Multifocus          | 16                | 16                | 0.35 | 0.34

The proposed algorithm was also found to be computationally efficient. The fusion of the images shown in Fig. 4.2, which are of size 256 x 256, took 0.25 sec. Similarly, for the images shown in Fig. 4.3 (size 512 x 512) and Fig. 4.4 (size 512 x 512), the fusion took 0.55 and 0.59 sec., respectively. All the simulations were done using Matlab 7.1 running on a 1.86 GHz Intel machine with 1.0 GB RAM.

4.5 Discussion

An optimal weighted average approach for pixel level fusion is proposed in this chapter. The average power of the fused pixel is modeled as a Rayleigh quotient, and the weights are selected such that the Rayleigh quotient is maximized. The initial results indicate the feasibility of the proposed approach. However, this algorithm is prone to sensor noise. Therefore, in the presence of noise a de-noising filter should be applied to the inputs prior to fusion.

Figure 4.2: Medical images (a) MRI image (b) CT image (c) Fused image: proposed approach (d) Fused image: "Max" rule

Figure 4.3: Aircraft navigation images (a) LLTV image (b) FLIR image (c) Fused image: proposed approach (d) Fused image: "Max" rule

Figure 4.4: Multifocus images (a) Focus on the front part (b) Focus on the back part (c) Fused image: proposed approach (d) Fused image: "Max" rule

Figure 4.5: Medical images (SNR = 15 dB) (a) MRI image (b) CT image (c) Fused image: proposed approach (d) Fused image: "Max" rule

Figure 4.6: Aircraft navigation images (SNR = 11 dB) (a) LLTV image (b) FLIR image (c) Fused image: proposed approach (d) Fused image: "Max" rule

Figure 4.7: Multifocus images (SNR = 16 dB) (a) Focus on the front part (b) Focus on the back part (c) Fused image: proposed approach (d) Fused image: "Max" rule

Figure 4.8: Plot of SNR vs. similarity quality index

CHAPTER 5

A Total Variation Based Algorithm for Pixel Level Image Fusion

The pixel level fusion discussed in the previous chapter is susceptible to sensor noise. In this chapter a total variation (TV) based approach is discussed for pixel level fusion, to fuse images acquired using multiple sensors in the presence of sensor noise. In this approach, fusion is posed as an inverse problem and a locally affine model is used as the forward model. A total variation norm based approach, in conjunction with principal component analysis, is used iteratively to estimate the fused image.

5.1 The Image Acquisition Model

Let $f_0(x,y)$ be the true image, which is inspected by $n$ different sensors, and let $f_1(x,y), f_2(x,y), \dots, f_n(x,y)$ be the corresponding $n$ measurements for $(x,y) \in \Omega$. The local affine transform that relates the input pixel and the corresponding pixel in the measured images is given by [21]

$$f_i(x,y) = \beta_i(x,y) f_0(x,y) + \eta_i(x,y); \quad 1 \le i \le n. \qquad (5.1)$$
Here, $\beta_i(x,y)$ and $\eta_i(x,y)$ are the gain and the sensor noise, respectively, of the $i$-th sensor at location $(x,y)$. The goal of fusion is to estimate $f_0(x,y)$ from $f_i(x,y)$, $1 \le i \le n$. In many applications, such as radar imaging and visual and IR imaging, the complementary as well as redundant information is available at the local level in the measured images [21,30]. The main advantage of the local affine transform model is that it can relate this local information content in a mathematically convenient manner. For example, as an extreme case, two sensors $i$ and $j$ ($i \ne j$; $1 \le i, j \le n$) have complementary information at location $(x,y)$ if $\beta_i(x,y) \ne \beta_j(x,y)$ and $\beta_i(x,y), \beta_j(x,y) \in \{0,1\}$. Similarly, these two sensors have redundant information if $\beta_i(x,y) = \beta_j(x,y)$.

5.2 Image Fusion

5.2.1 Total Variation Norm for Image Fusion

In order to estimate $f_0(x,y)$ from eq. (5.1), we assume that $f_0(x,y), f_i(x,y) \ge 0$ ($1 \le i \le n$). This assumption is valid for many imaging devices such as digital cameras, IR cameras, etc., and does not limit the proposed algorithm in any way, since data not satisfying this requirement (i.e., with negative pixel values) can always be transformed using a simple linear transformation to make the pixel values positive. Furthermore, we also assume that the sensor noises $\eta_1(x,y), \eta_2(x,y), \dots, \eta_n(x,y)$ are zero mean random variables and are independent of each other. The standard deviation of $\eta_i(x,y)$ is denoted $\sigma_i$; $\sigma_i$ is assumed to be known a priori and independent of the spatial location $(x,y)$.

From eq. (5.1),

$$\begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_n \end{bmatrix} f_0 + \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_n \end{bmatrix} \qquad (5.2)$$

$$\Rightarrow \; \mathbf{f} = \boldsymbol{\beta} f_0 + \boldsymbol{\eta}, \qquad (5.3)$$

where $\mathbf{f} = [f_1, \dots, f_n]^T$, $\boldsymbol{\beta} = [\beta_1, \dots, \beta_n]^T$, $\boldsymbol{\eta} = [\eta_1, \dots, \eta_n]^T$, and the reference to the pixel location $(x,y)$ has been dropped for notational simplicity. Eq. (5.3) can be re-arranged as

$$\beta^N \mathbf{f} = f_0 + \beta^N \boldsymbol{\eta}, \qquad (5.4)$$

where $\beta^N = (\boldsymbol{\beta}^T \boldsymbol{\beta})^{-1} \boldsymbol{\beta}^T$, so that $\beta^N \boldsymbol{\beta} = 1$. The goal of fusion is to estimate $f_0$ for all $(x,y) \in \Omega$ from eq. (5.4). Note that $\beta^N$ is unknown and must be estimated from the measurement images $f_i(x,y)$ ($1 \le i \le n$). The method for estimating $\beta^N$ is discussed in Section 5.2.2.

If $\beta^N$ is known, then a common approach to estimating $f_0$ from the measurements $\mathbf{f}$ minimizes the cost function $\| \beta^N \mathbf{f} - f_0 \|$. The solution to this least squares estimate is $\hat{f}_0 = \beta^N \mathbf{f}$. However, this approach is not robust to noise. A total variation norm [32] based approach is therefore proposed to estimate $f_0$ from eq. (5.4). The total variation norm has been used in several image processing applications [32-35]. It is based on a nonlinear partial differential equation (PDE) and is more robust to noise than $L_2$ norm based estimators such as least squares [36,37]. A comparison of least squares and TV norm based fusion is shown in Fig. 5.1. Fig. 5.1a and 5.1b show the input noisy images that are to be fused. Fig. 5.1c and 5.1d are the fused images obtained using the least squares and the TV norm based approach, respectively. A detailed comparison of the fusion results is presented in Section 5.4. However, a visual comparison of Fig. 5.1c and 5.1d clearly shows that the fused image obtained using the TV norm based approach is less noisy and provides more pronounced edges than the fused image obtained using the least squares approach.

Therefore, we propose to estimate $f_0(x,y)$ by minimizing its TV norm under suitable constraints, as given below:

$$\text{minimize} \;\; \int_\Omega |\nabla f_0(x,y)| \, dx \, dy, \qquad |\nabla f_0(x,y)| = \left( f_{0x}^2 + f_{0y}^2 \right)^{1/2}, \qquad (5.5)$$

subject to the following constraints:

$$\int_\Omega f_0 \, dx \, dy = \int_\Omega \beta^N \mathbf{f} \, dx \, dy, \qquad (5.6)$$

$$\int_\Omega \left( \beta^N \mathbf{f} - f_0 \right)^2 dx \, dy = \int_\Omega \sigma^2 \, dx \, dy, \qquad (5.7)$$
where $f_{0x}$ and $f_{0y}$ are the derivatives of $f_0$ in the $x$ and $y$ directions, respectively, and $\sigma^2$ is the variance of the transformed noise $\beta^N \boldsymbol{\eta}$. Eq. (5.5) represents the TV norm of $f_0$.

Figure 5.1: Comparison of least squares and TV norm based fusion: (a) LLTV image (b) FLIR image (c) Fused image: least squares (d) Fused image: proposed approach (TV norm)

The constraints given by eq. (5.6) and (5.7) involve the mean and the variance of the noise $\beta^N \boldsymbol{\eta}$. Eq. (5.6) ensures that the noise $\beta^N \boldsymbol{\eta}$ has zero mean, whereas eq. (5.7) is based on the a priori information about the variance of the noise. Note that the sensor gain $\beta^N$ depends on the spatial location, which in turn implies that $\sigma^2$ is also a function of $(x,y)$. Assuming the sensor gain vector $\boldsymbol{\beta}$ to be independent of $f_0$ and its derivatives, the approach proposed by Rudin et al. in [32] may be used to derive the following iterative solution of eq. (5.5):

$$f_0^{k+1} = f_0^k - \tau_k \left[ \nabla \cdot \left( \frac{\nabla f_0^k}{|\nabla f_0^k|} \right) + \lambda_k \left( f_0^k - \beta^N \mathbf{f} \right) \right], \qquad (5.8)$$

where $k$ is the iteration number, $\tau_k$ is the time step, "$\nabla \cdot$" denotes the divergence operator, and $\lambda_k$ is a Lagrange multiplier that is re-computed at each iteration from $f_0^k$, $\beta^N \mathbf{f}$ and the noise variance $\sigma^2$, so that the constraint of eq. (5.7) is enforced, following the procedure of Rudin et al. [32]. The boundary condition $\partial f_0 / \partial \xi = 0$ must be satisfied at each iteration of eq. (5.8), where $\partial \Omega$ represents the boundary of $\Omega$ and $\xi$ is the outward normal along $\partial \Omega$. A simple discretized sketch of this iteration is given below.
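The following MATLAB-style sketch makes the update of eq. (5.8) concrete. It is only illustrative: for simplicity it treats the fidelity weight lambda as a fixed, user-chosen constant rather than re-computing it from the noise constraint (5.7) at every iteration as described above, it uses simple forward/backward differences with replicated borders (which enforces the zero normal-derivative boundary condition), and the function name tv_fuse is an assumption of this sketch.

function f0 = tv_fuse(g, lambda, tau, n_iter)
% g      : data term beta^N * f (the least squares fused image of eq. (5.4))
% lambda : fixed fidelity weight (stand-in for the adaptive lambda_k)
% tau    : time step
% n_iter : number of gradient-descent iterations of eq. (5.8)
f0   = g;                                  % initialize with the LS estimate
eps2 = 1e-6;                               % avoids division by zero in |grad f0|
for k = 1:n_iter
    % forward differences (replicated border => zero normal derivative)
    fx = [diff(f0, 1, 2), zeros(size(f0, 1), 1)];
    fy = [diff(f0, 1, 1); zeros(1, size(f0, 2))];
    mag = sqrt(fx.^2 + fy.^2 + eps2);
    px = fx ./ mag;                        % normalized gradient field
    py = fy ./ mag;
    % divergence via backward differences of (px, py)
    div_p = [px(:, 1), diff(px, 1, 2)] + [py(1, :); diff(py, 1, 1)];
    % gradient-descent step on the TV + fidelity energy (cf. eq. (5.8))
    f0 = f0 + tau * (div_p - lambda * (f0 - g));
    f0 = max(f0, 0);                       % keep the fused image nonnegative
end
end

Here g would be formed pixel by pixel from the block-wise gains of Section 5.2.2, and a call such as f0 = tv_fuse(g, 0.05, 0.2, 200) produces a TV-regularized fused image; the numerical values are arbitrary illustration values, not settings taken from the dissertation.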
However, the solution of eq. (5.8) requires knowledge of $\boldsymbol{\beta}$ and $\sigma^2$. Since $\sigma^2$ is the variance of $\beta^N \boldsymbol{\eta}$ and the variance of the noise vector $\boldsymbol{\eta}$ is assumed to be known, $\sigma^2$ can easily be estimated if $\boldsymbol{\beta}$ is known. In general, $\boldsymbol{\beta}$ is unknown and must be estimated prior to solving eq. (5.8). This is addressed next.

5.2.2 Estimation of $\boldsymbol{\beta}$

To estimate $\beta_i$ ($1 \le i \le n$), we assume that the gain of each sensor varies slowly from one spatial location to another. Hence, $\beta_i$ can be assumed to be constant over a small region of the sensor image. Thus, the input images are split into small regions of size $p \times q$ and the sensor gains are computed for each region. Furthermore, we also assume that $\beta_i \in [0,1]$ for all $1 \le i \le n$.

From eq. (5.3), it is clear that in the absence of sensor noise the variances of the observed images $f_1, f_2, \dots, f_n$ are related to each other through the sensor gains $\beta_1, \beta_2, \dots, \beta_n$. Therefore, the variances of the observed images may be used to estimate $\boldsymbol{\beta}$. Also, it is well known that, for a given dataset, the principal eigenvector of the correlation matrix points in the direction of maximum variance [100]. Motivated by this fact, we propose to use a principal eigenvector based approach to estimate the sensor gains. The pixels of the observed images are used to construct the correlation matrix, and its principal eigenvector is used to estimate the sensor gains. This approach is explained below.

Let us assume that the input images are divided into blocks of size $p \times q$. For one such block, denoted $R$, assume that $(f_1^1, f_1^2, \dots, f_1^{p \times q})$, $(f_2^1, f_2^2, \dots, f_2^{p \times q})$, $\dots$, $(f_n^1, f_n^2, \dots, f_n^{p \times q})$ are the lexicographical representations of the pixel values of the $n$ sensor images. These pixels can be viewed as $n$-variate random variables $(f_1^1, f_2^1, \dots, f_n^1)^T$, $(f_1^2, f_2^2, \dots, f_n^2)^T$, $\dots$, $(f_1^{p \times q}, f_2^{p \times q}, \dots, f_n^{p \times q})^T$. Let $\mu = [\mu_1, \dots, \mu_n]^T$ be the principal eigenvector of the correlation matrix $\Sigma_\nu$, where

$$\Sigma_\nu = \frac{1}{(p \times q) - 1} \sum_{k=1}^{p \times q} \nu_k \nu_k^T, \qquad (5.9)$$

and $\nu_k = (f_1^k, f_2^k, \dots, f_n^k)^T$. Assuming the noise in eq. (5.3) to be zero, and using the assumption that the sensor gains are constant over block $R$, it can be shown that

$$\Sigma_\nu = \frac{1}{(p \times q) - 1} \, (\beta_1, \dots, \beta_n)^T (\beta_1, \dots, \beta_n) \sum_{k=1}^{p \times q} \left( f_0^k \right)^2$$

$$\Rightarrow \; \Sigma_\nu = C \, (\beta_1, \dots, \beta_n)^T (\beta_1, \dots, \beta_n), \qquad (5.10)$$

where $C = \dfrac{\sum_{k=1}^{p \times q} \left( f_0^k \right)^2}{(p \times q) - 1}$.

Let $\rho$ and $\mu = (\mu_1, \dots, \mu_n)^T$ be the principal eigenvalue and the principal eigenvector, respectively, of $\Sigma_\nu$. From eq. (5.10) it is clear that the rank of the matrix $\Sigma_\nu$ is one. Therefore, $\rho$ and $\mu$ can be related to the sensor gains as shown below:

$$\rho = C \left( \beta_1^2 + \cdots + \beta_n^2 \right), \qquad (5.11)$$

$$\mu = (\mu_1, \dots, \mu_n)^T \propto (\beta_1, \dots, \beta_n)^T, \qquad (5.12)$$

where "$\propto$" indicates proportionality. Therefore, we propose the following rule to estimate $\boldsymbol{\beta} = [\beta_1, \dots, \beta_n]^T$ for region $R$:

1. Compute $\Sigma_\nu$ for block $R$, and estimate $\mu$ such that $\mu^T \mu = 1$.
2. Set $\beta_1 = \beta_2 = \cdots = \beta_n = 1$ if $\mu_1 = \mu_2 = \cdots = \mu_n$; otherwise set $\boldsymbol{\beta} = \mu$.

This process may be repeated for each block. Note that the sensor gains computed this way are assumed to be constant over the region, i.e., $\beta_i(x,y) = \beta_i(r,s)$ for $(x,y) \ne (r,s)$, $(x,y), (r,s) \in R$ and $1 \le i \le n$.

Next, we show that the proposed rule for estimating the sensor gains can discriminate between common and complementary features. For the sake of simplicity, assume $n = 2$, i.e., there are only two sensors. Then,

$$\Sigma_\nu = C \begin{bmatrix} \beta_1^2 & \beta_1 \beta_2 \\ \beta_1 \beta_2 & \beta_2^2 \end{bmatrix}, \qquad (5.13)$$

$$\rho = C \left( \beta_1^2 + \beta_2^2 \right), \qquad (5.14)$$

and $\mu$ can be estimated by solving the following equation:

$$\Sigma_\nu \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = \rho \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}; \qquad \mu_1^2 + \mu_2^2 = 1. \qquad (5.15)$$

If the two input images have completely redundant information, i.e., $\beta_1 = \beta_2$, then $\rho = 2C\beta_1^2 = 2C\beta_2^2$ and $\mu_1 = \mu_2 = \tfrac{1}{\sqrt{2}}$. In this case, we select $\beta_1 = \beta_2 = 1$, i.e., $\boldsymbol{\beta} = [1,1]^T$. Similarly, if $\beta_1 > \beta_2$ (corresponding to some redundant information between the two measurements), then, from eq. (5.15):

$$\frac{\mu_1^2 - \mu_2^2}{\mu_1 \mu_2} = \frac{\beta_1^2 - \beta_2^2}{\beta_1 \beta_2}. \qquad (5.16)$$

It is clear that in this case $\mu_1 > \mu_2$. Therefore, according to the proposed rule, $\beta_1 = \mu_1$ and $\beta_2 = \mu_2$. In the extreme case when $\beta_1 \ne 0$ and $\beta_2 = 0$, it can be shown using a similar procedure that $\boldsymbol{\beta} = \mu = [1,0]^T$. Similarly, it can be shown that $\boldsymbol{\beta} = \mu = [0,1]^T$ if $\beta_1 = 0$ and $\beta_2 \ne 0$. Therefore, it is clear that in the case of complementary information the proposed approach selects the region that has the relevant information. Because the pixel values of the measurement images are assumed to be positive, the sensor gains estimated using the proposed approach will always be positive. Note that in the above analysis we assumed the sensor noise to be zero. If noise is present, then the principal eigenvector will provide a noisy estimate of the sensor gains. However, these gains are further used in the TV based framework of eq. (5.8), which is robust to noise and compensates for the noisy estimate of the sensor gains.

5.3 Overall Approach

The flowchart of the proposed pixel level fusion method is shown in Fig. 5.2. We first split the input images into several blocks and arrange the pixels inside each block lexicographically. The lexicographically arranged pixels of any given block from all the input images are used to compute the correlation matrix $\Sigma_\nu$, and the principal eigenvector of this correlation matrix is used to determine the gain $\boldsymbol{\beta}$ of that block. This process is repeated for all the blocks. After the sensor gains are estimated, eq. (5.8) is solved iteratively to estimate the fused pixels $f_0(x,y)$ for all locations $(x,y) \in \Omega$.

- Input: $n$ co-registered sensor images
- Output: Fused image
- Sensor gain estimation:
  - Split each input image into $B$ blocks, each of size $p \times q$
  - For each block $b_i$, $i = 1, \dots, B$: estimate the sensor gain for the block using the algorithm presented in Section 5.2.2
- Fusion process:
  - Solve eq. (5.8) iteratively for each pixel.
Figure 5.2: Flowchart of the proposed TV norm based pixel level image fusion

5.4 Results

The results of the proposed fusion algorithm are presented in this section. The proposed fusion algorithm was applied to the "medical imaging" and "aircraft navigation" datasets described in Section 4.4. The sensor noise was simulated by adding zero mean white Gaussian noise to the input images. The signal-to-noise ratio (SNR) was computed using eq. (3.9).

Fig. 5.3 shows the medical images that were used to validate the proposed algorithm. Fig. 5.3a and 5.3b are images of a human brain obtained using computed tomography (CT) and magnetic resonance imaging (MRI), respectively. To simulate the sensor noise, zero mean white Gaussian noise was added to these input images. The noisy input images were fused using the least squares as well as the proposed approach. For both fusion approaches, the CT and MRI images were divided into non-overlapping regions of size 8 x 8, and for each region the sensor gain was computed using the principal eigenvector approach explained in Section 5.2.2. For least squares based fusion, the estimated sensor gain was used in eq. (5.3) to estimate the fused image. To fuse these images using the proposed approach, the estimated sensor gains were used in the total variation algorithm described in Section 5.2.1. Several different levels of SNR were used to validate the robustness of the algorithm. Three sets of fusion results, for 23 dB, 12 dB and 0 dB signal-to-noise ratio (SNR), are presented in Fig. 5.4-5.6. It is clear that for a high SNR (Fig. 5.4) the performance of the least squares and the proposed approach is similar. However, as the SNR decreases, the proposed approach performs better than the least squares approach, as shown in Fig. 5.5 and 5.6. For very high levels of noise, i.e., low SNR, the proposed algorithm oversmoothes the edges of the fused image, as shown in Fig. 5.6.

The proposed algorithm was also applied to the aircraft navigation images [98] shown in Fig. 5.7. Fig. 5.7a was captured using a low-light-television (LLTV) sensor and Fig. 5.7b was obtained using a forward-looking-infrared (FLIR) sensor. The sensor noise simulation and the fusion process were exactly the same as for the previous dataset, and Fig. 5.8-5.10 show the fusion results for 23 dB, 8 dB and 0 dB signal-to-noise ratio (SNR). It is clear from the fused images that the proposed approach is more robust to noise than the least squares approach. Again, for low SNR the proposed fusion algorithm tends to oversmooth the edges of the fused images.

To estimate the quality of the fused image quantitatively, the similarity measure based quality index proposed in [60] was used. The goal of the quality assessment is to compare the information content in the fused image and the corresponding input images. Therefore, the similarity index was computed by comparing the fused images with the noiseless versions of the corresponding input images. The similarity indices for the medical and navigation datasets are summarized in Table 5.1, and plots of the similarity quality index for LSE and the proposed approach as a function of SNR are presented in Fig. 5.11. It is evident from both Table 5.1 and Fig. 5.11 that for very high SNR the performance of the LSE and the proposed TV norm based approaches is comparable. However, as the SNR is reduced, the proposed approach performs better than the LSE.
It is apparent that for both approaches the similarity index approaches zero as the SNR drops.

Table 5.1: Performance summary of the total variation based pixel level fusion algorithm

Image Dataset       | SNR (dB): Sensor 1 | SNR (dB): Sensor 2 | Similarity Index: LSE | Similarity Index: Proposed Approach
Medical Image       | 23 | 23 | 0.58 | 0.62
Medical Image       | 12 | 12 | 0.34 | 0.44
Medical Image       | 0  | 0  | 0.11 | 0.22
Aircraft Navigation | 23 | 23 | 0.66 | 0.67
Aircraft Navigation | 8  | 8  | 0.20 | 0.32
Aircraft Navigation | 0  | 0  | 0.07 | 0.16

5.5 Discussion

A total variation framework for pixel level fusion is proposed in this chapter. The proposed fusion approach was applied to several different types of datasets. The results on these data indicate the feasibility of the proposed approach.

Figure 5.3: Medical images: (a) CT image (b) MRI image

Figure 5.4: Medical images, SNR = 23 dB: (a) CT image (b) MRI image (c) Fused image: least squares (d) Fused image: proposed approach

Figure 5.5: Medical images, SNR = 12 dB: (a) CT image (b) MRI image (c) Fused image: least squares (d) Fused image: proposed approach

Figure 5.6: Medical images, SNR = 0 dB: (a) CT image (b) MRI image (c) Fused image: least squares (d) Fused image: proposed approach

Figure 5.7: Aircraft navigation images: (a) LLTV image (b) FLIR image

Figure 5.8: Aircraft navigation images, SNR = 23 dB: (a) LLTV image (b) FLIR image (c) Fused image: least squares (d) Fused image: proposed approach

Figure 5.9: Aircraft navigation images, SNR = 8 dB: (a) LLTV image (b) FLIR image (c) Fused image: least squares (d) Fused image: proposed approach

Figure 5.10: Aircraft navigation images, SNR = 0 dB: (a) LLTV image (b) FLIR image (c) Fused image: least squares (d) Fused image: proposed approach

Figure 5.11: Plot of SNR vs. similarity quality index: (a) CT, MRI dataset (b) LLTV, FLIR dataset

CHAPTER 6

Conclusions And Future Research

6.1 Conclusions

In this dissertation, model based image fusion algorithms have been explored. The primary advantage of the model based fusion approach is that it allows identification of complementary and redundant information present in the input images in a systematic manner. First, a multichannel blind deconvolution based approach for blur removal was considered. The approach is based on the Richardson-Lucy algorithm and is iterative in nature. Furthermore, this approach was also extended to obtain high-resolution images from several low-resolution images. However, this approach was seen to be susceptible to high levels of noise.

In several situations, the input images for fusion contain complementary information at the local level. For such cases, pixel level fusion, which derives a fusion rule based on small regions in the image, is more appropriate. A pixel level fusion method that maximizes the power of the fused pixel was presented in Chapter 4. The proposed approach does not depend on any heuristic threshold, but is susceptible to noise.

A total variation based approach for pixel level fusion is discussed in Chapter 5.
This approach was compared to the "Least Squares" approach and was seen to be more robust to noise than the "Least Squares" technique.

6.2 Future Research

Future work includes the exploration of the following issues:

- Experimental results indicate that the multichannel deconvolution algorithm presented in Chapter 3 converges. Future work will include a mathematical analysis of the convergence properties of this algorithm.
- The literature review suggests that the upsampling size for super-resolution applications is assumed to be known a priori. A mathematical framework to estimate the optimal upsampling size based on the image content of the low-resolution images will be explored.
- The sensor gain parameters for the total variation fusion algorithm presented in Chapter 5 are estimated using principal eigenvector analysis. Alternative approaches to estimating sensor gains that are more robust to sensor noise will be investigated as part of the future work.
- More robust algorithms to estimate the quality of the fused image will also be investigated. In particular, quality metrics that attempt to incorporate information about the image scene will be investigated, to provide a better estimate of image fusion quality.
- Finally, the fusion algorithms discussed in the dissertation will be validated using additional datasets. For instance, deconvolution models have been used in several different areas of image processing such as medical imaging, nondestructive evaluation, remote sensing, etc. The application of the multi-channel algorithm to these areas will be examined as part of the future work. Some initial studies on using the proposed multichannel algorithm to estimate flaw sizes from multi-frequency eddy current coils have already been conducted [17]. The initial results indicate that the proposed approach performs better than the single channel analysis. However, this study needs more investigation.

BIBLIOGRAPHY

[1] M. Kokar and K. Kim, "Review of multisensor data fusion architectures," in Proceedings of the IEEE International Symposium on Intelligent Control, August 1993, pp. 261-266.

[2] D. L. Hall and J. Llinas, "An introduction to multisensor data fusion," Proc. IEEE, vol. 85, no. 1, pp. 6-23, 1997.

[3] R. Mahler, "A unified foundation for data fusion," in Proc. 1994 Data Fusion Syst. Conf., Johns Hopkins University, June 1987.

[4] M. A. Abidi and R. C. Gonzalez, Data Fusion in Robotics and Machine Intelligence. San Diego, CA, USA: Academic Press Inc., 1992.

[5] J. Llinas and E. Waltz, Multisensor Data Fusion. Boston, MA, USA: Artech House, 1990.

[6] M. Elad and A. Feuer, "Restoration of a single super resolution image from several blurred, noisy, and undersampled measured images," IEEE Trans. Image Processing, vol. 6, pp. 1646-1658, 1997.

[7] E. Waltz, "The principles and practice of image and spatial data fusion," in Proc. 8th Natl. Data Fusion Conf., Dallas, TX, March 1995.

[8] R. Blum, Z. Xue, and Z. Zhang, "An overview of image fusion," in R. Blum and Z. Liu (Eds.), Multi-Sensor Image Fusion and Its Applications. Signal and Image Processing Series of Marcel Dekker/CRC Press, 2005.

[9] Z. Zhang and R. Blum, "A categorization and study of multiscale-decomposition-based image fusion schemes," Proceedings of the IEEE, vol. 87, no. 8, pp. 1315-1326, August 1999.

[10] R. S. Blum, "Robust image fusion using a statistical signal processing approach," Information Fusion, vol. 6, no. 2, pp. 119-128, June 2005.

[11] Y.
Chibani, “Multisource image fusion by using the redundant wavelet decom- position,” IEEE Geoscience and Remote Sensing Symposium, vol. 2, pp. 1383* 1385, July 2003. 91 [12] [13] [14] [15] [15] [17] [18] [19] [20] [21] [22] [23] M. Hurn, K. Mardia, T. Hainsworth, J. Kirkbride, and E. Berry, “Bayesian fused classification of medical images,” IEEE Trans. Med. Imag., vol. 15, no. 6, pp. 850—858, December 1996. X. Gros, Z. Liu, K. Tsukada, and K. Hanasaki, “Experimenting with pixel-level ndt data fusion techniques,” IEEE Trans. Instrum. Meas., vol. 49, no. 5, pp. 1083—1090, October 2000. F. Nebeker, “Golden accomplishments in biomedical engineering,” IEEE Engi- neering in Medicine and Biology Magazine, vol. 21, no. 3, pp. 17—47, May/June 2002. D. Barnes, G. Egan, G. OKeefe, and D. Abbott, “Characterization of dy- namic 3-d pet imaging for functional brain mapping,” IEEE Trans. Med. Imag., vol. 16, no. 3, pp. 261—269, June 1997. S. Wong, R. Knowlton, R. Hawkins, and K. Laxer, “Multimodal image fusion for noninvasive epilepsy surgery planning,” International Journal of Remote Sensing, vol. 16, no. 1, pp. 30—38, January 1996. M. Kumar and P. Ramuhalli, “Dynamic programming based multichannel im- age restoration,” in IEEE Conf. on Acoustics, Speech, and Signal Processing, Philadelphia, PA, March 2005, pp. 609—612. I. Edwards, X. Gros, D. Lowden, and P. Strachan, “Fusion of ndt data,” British Journal of Non Destructive Testing, vol. 35, no. 12, pp. 710-713, December 1993. M. Pavel, J. Larimer, and A. Ahumada, “Sensor fusion for synthetic vision,” in Society for Information Display, 1992, pp. 475—478. M. A. Burgess, T. Chang, D. E. Dunford, R. H. Hoh, W. F. Home, R. F. TuCker, and J. A. Zak, “Synthetic vision technology demonstration: Flight tests,” in Technical Report DOT/FAA/RD-93/40, III, Research and Development Ser- vice, Washington DC, USA, December 1993. R. K. Sharma, T. K. Leen, and M. Pavel, “Probabilistic image sensor fusion,” in Advances in neural information processing systems 11, 1999, pp. 824—830. P. J. Burt and R. J. Kolczynski, “Enhanced image capture through fusion,” in Fourth International Conference on Computer Vision, 1993, pp. 173—182. H. Li and Y. Zhou, “Automatic visual/IR. image registration,” in Optical En- gineering, vol. 35(2), 1996, pp. 391—400. 92 [24] L. G. Brown, “A survey of image registration techniques,” in Computing Sur- veys, vol. 24(4), 1992, pp. 325—376. [25] D. C. Ghiglia, “Space-invariant deblurring given n independently blurred images of a common object,” J. Opt. Soc. Am. A, vol. 1, no. 4, pp. 398—402, April 1984. [26] M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. Image Processing,'vol. 6, no. 12, pp. 1646—1658, December 1997. [27] W. H. Richardson, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Amer. A, vol. 62, no. 1, pp. 55—59, 1972. [28] L. B. Lucy, “An iterative technique for the rectification of observed distribu- tions,” The Astronomical J., vol. 79, no. 6, pp. 745—754, 1974. [29] W. Seales and S. Dutta, “Everywhere-in-focus image fusion using controllable cameras,” in Proceedings of SPIE 2905, vol. 11, 1996, pp. 227—234. [30] J. Yang and R. S. Blum, “A statistical signal processing approach to image fusion for concealed weapon detection,” in Proc. of International Conference on Image Processing, vol. 1, 2002, pp. I—513—I—516. [31] A. Mohammad-Djafari, “Bayesian approach for data and image fusion,” in Int. 
Workshop on Bayesian and Maximum Entropy Methods, Moscow, Idaho, USA, August 2002. [32] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, pp. 259—268, 1992. [33] P. L. Combettes and J.-C. Pesquet, “Image restoration subject to a total varia- tion constraint,” IEEE Trans. Image Processing, vol. 13, no. 9, pp. 1213—1222, 2004. [34] T. Chen, Y. Wotao, S. Z. Xiang, S. D. Comaniciu, and T. S. Huang, “Total variation models for variable lighting face recognition,” IEEE Trans. Pattern Anal. Machine Intell, vol. 28, no. 9, pp. 1519-1524, 2006. [35] G. A. Hewer, C. Kenney, and B. S. Manjunath, “Variational image segmentation using boundary functions,” IEEE Trans. Image Processing, vol. 7, no. 9, pp. 1269-1282, 1998. [36] L. Alvarez, F. Guichard, P.-L. Lions, and J.-M. Morel, “Axioms and funda- mental equations of image processing,” Archive for Rational Mechanics and Analysis, vol. 123, pp. 199—257, 1993. 93 [37] L. Alvarez and J .-M. Morel, “Formalization and computational aspects of image analysis,” Archive for Rational Mechanics and Analysis, pp. 1—59, 1994. [38] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, “Advances and challenges in super-resolution,” International Journal of Imaging Systems and Technology, vol. 14, no. 2, pp. 47—57, August 2004. [39] —, “Fast and robust multi-frame super-resolution,” IEEE Trans. Image Pro- cessing, vol. 13, no. 10, pp. 1327—1344, October 2004. [40] S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruc- tion: a technical overview,” IEEE Signal Processing Mag., vol. 20, no. 3, pp. 21-36, May 2003. [41] M. K. Ng and N. K. Bose, “Mathematical analysis of super-resolution method- ology,” IEEE Signal Processing Mag, vol. 20, no. 3, pp. 62—74, May 2003. [42] D. Hal, Mathematical Techniques in Multisensor Data Fusion, 2nd ed. Boston, USA: Artech House, March 2004. [43] J. Llinas and D. Hall, “A challenge for the data fusion community ii: Research imperatives for improved processing,” in Proc. 7th Natl. Symp. On Sensor Fu- sion, Albuquerque, New Mexico, March 1994. [44] C. Pohl and J. L. van Genderen, “Multisensor image fusion in remote sensing: concepts, methods and applications,” International Journal of Remote Sensing, vol. 19, no. 5, pp. 823—854, 1998. [45] M. Daniel and A. Willsky, “A multiresolution methodology for signal-level fu- sion and data assimilation with applications to remote sensing,” Proc. IEEE, vol. 85, no. 1, pp. 164—180, January 1997. [46] E. Lallier and M. Farooq, “A real time pixel-level based image fusion via adap- tive weight averaging,” in Proceedings of the Third International Conference on Information Fusion, vol. 2, no. 10, Paris, France, July 2000, pp. WEC3/3 —— WEC313. [47] G. Pajares and J. D. L. Cruz, “A wavelet-based image fusion tutorial,” Pattern Recognition, vol. 37, no. 9, pp. 1855—1872, September 2004. [48] J. Richards, “Thematic mapping from multitemporal image data using the prin- cipal components transformation,” Remote Sensing of Environment, vol. 16, pp. 36—46, 1984. 94 [49] I. Bloch, “Information combination operators for data fusion: a comparative review with classification,” IEEE Trans. Syst, Man, Cybern. C, vol. 26, no. 1, pp. 52—67, January 1996. [50] Y. Gao and M. Maggs, “Feature-level fusion in personal identification,” in Pro- ceedings of the IEEE Computer Society on Computer Vision and Pattern Recog- nition, vol. 1, San Diego, CA, USA, June 2005, pp. 468-473. [51] V. Sharma and J. W. 
Davis, “Feature-level fusion for object segmentation using mutual information,” in Computer Vision and Pattern Recognition Workshop, June 2006, pp. 139-146. [52] A. Kumar and D. Zhang, “Personal recognition using hand shape and texture,” IEEE Trans. Image Processing, vol. 15, no. 8, pp. 2454—2461, August 2006. [53] K. Fukanaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York, USA: Academic Press, September 1990. [54] Y. Liao, L. W. Nolte, and L. M. Collins, “Decision fusion of ground—penetrating radar and metal detector algorithmsa robust approach,” IEEE Trans. Ceosci. Remote Sensing, vol. 45, no. 2, pp. 398—409, February 2007. [55] L. O. Jimenez, A. Morales-Morell, and A. Creus, “Classification of hyperdimen- sional data based on feature and decision fusion approaches using projection pursuit, majority voting, and neural networks,” IEEE Trans. Geosci. Remote Sensing, vol. 37, no. 3, pp. 1360—1366, May 1999. [56] M. Fauvel, J. Chanussot, and J. A. Benediktsson, “Decision fusion for the classi- fication of urban remote sensing images,” IEEE Trans. Ceosci. Remote Sensing, vol. 44, no. 10, pp. 2828—2838, October 2006. [57] S. Foucher, M. Germain, J .-M. Boucher, and G. B. Benie, “Multisource classi- fication using icm and dempster—shafer theory,” IEEE Trans. Instrum. M eas., vol. 51, no. 2, pp. 277—281, April 2002. [58] Z. Wang, A. C. Bovik, and L. Lu, “Why is image quality assessment so dif- ficult?” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, USA, May 2002. [59] Z. Wang and A. Bovik, “A universal image quality index,” IEEE Signal Pro- cessing Lett., vol. 9, no. 3, pp. 81—84, March 2002. [60] N. Cvejic, A. Loza, D. Bull, and N. Canagarajah, “A similarity metric for assess- ment of image fusion algorithms,” International Journal of Signal Processing, vol. 2, no. 3, pp. 178-182, 2005. 95 [61] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. Wiley- Interscience, July 2006. [62] L. Leung, B. King, and V. Vohora, “Comparison of image data fusion techniques using entropy and ini,” in Proc. ACRS, Singapore, November 2001, pp. 152—157. [63] Z.Li, Z.Jing, G.Liu, S.Sun, and H.Leung, “A region-based image fusion algo- rithm using multiresolution segmentation,” in IEEE International Conference on Intelligent Transportation Systems, Shanghai, China, October 2003, pp. 96— 101. [64] H. C. Andrews and B. R. Hunt, Digital Image Restoration. Englewood Cliffs, NJ, USA: Prentice Hall, June 1977. [65] A. Jain and S. Ranganath, “Application of two dimensional spectral estimation in image restoration,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, April 1981, pp. 1113—1116. [66] D. Kundur and D. Hatzinakos, “Blind image deconvolution,” IEEE Signal Pro- cessing Mag, vol. 13, no. 3, pp. 43—64, May 1996. [67] A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, lst ed. Springer, September 1996. [68] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Esti- mation. SIAM, December 2004. [69] J. Liu, Q. Wang, and Y. Shen, “Comparisons of several pixel-level image fusion schemes for infrared and visible light images,” in Proceedings of the IEEE Instru- mentation and Measurement Technology Conference, vol. 3, Ottawa, Canada, May 2005, pp. 2024— 2027. [70] E. H. Adelson, C. H. Anderson, J. Bergen, P. Burt, and J. Ogden, “Pyramid methods in image processing,” RCA Eng, vol. 29, no. 6, pp. 33—41, 1984. [71] V. Petrovic and C. 
[72] R. Welch and M. Ehlers, "Merging multiresolution SPOT HRV and Landsat TM data," Photogramm. Eng. Remote Sensing, vol. 53, no. 3, pp. 301-303, 1987.
[73] R. Bellman, Introduction to Matrix Analysis, 2nd ed. Society for Industrial Mathematics, January 1987.
[74] T. Akgun, Y. Altunbasak, and R. M. Mersereau, "Super-resolution reconstruction of hyperspectral images," IEEE Trans. Image Processing, vol. 14, pp. 1860-1875, 2005.
[75] R. Molina, J. Mateos, and A. K. Katsaggelos, "Super resolution reconstruction of multispectral images," in Virtual Observatory: Plate Content Digitization, Archive Mining and Image Sequence Processing, Heron Press, Sofia, Bulgaria, 2006, pp. 211-220.
[76] H. Greenspan, G. Oz, N. Kiryati, and S. Peled, "MRI inter-slice reconstruction using super resolution," Magn. Reson. Imaging, vol. 20, pp. 437-446, 2002.
[77] R. R. Peeters, P. Kornprobst, M. Nikolova, S. Sunaert, T. Vieville, G. Malandain, R. Deriche, O. Faugeras, M. Ng, and P. V. Hecke, "The use of super-resolution techniques to reduce slice thickness in functional MRI," Int. J. Imaging Syst. and Technol., vol. 14, pp. 131-138, 2004.
[78] J. Cui, Y. Wang, J. Huang, T. Tan, and Z. Sun, "An iris image synthesis method based on PCA and super-resolution," in Proc. 17th Int. Conf. Pattern Recognit., vol. IV, Cambridge, UK, 2004, pp. 471-474.
[79] S. Chaudhuri and D. R. Taur, "High-resolution slow-motion sequencing: how to generate a slow-motion sequence from a bit stream," IEEE Signal Processing Mag., vol. 22, pp. 16-24, 2005.
[80] R. Y. Tsai and T. S. Huang, "Multiframe image restoration and registration," Adv. Comput. Vis. Image Process., vol. 1, pp. 317-339, 1984.
[81] P. Vandewalle, S. Süsstrunk, and M. Vetterli, "A frequency domain approach to registration of aliased images with application to super-resolution," EURASIP J. Appl. Signal Process., Article ID 71459, 14 pages, 2006.
[82] A. K. Katsaggelos, R. Molina, and J. Mateos, Super Resolution of Images and Video, 1st ed. Morgan and Claypool Publishers, May 2007.
[83] H. Stark and P. Oskoui, "High-resolution image recovery from image-plane arrays, using convex projections," Journal of the Optical Society of America A, vol. 6, no. 11, pp. 1715-1726, 1989.
[84] S. Peleg, D. Keren, and L. Schweitzer, "Improving image resolution using subpixel motion," Pattern Recognit. Lett., vol. 5, no. 3, pp. 223-226, 1987.
[85] D. Keren, S. Peleg, and R. Brada, "Image sequence enhancement using subpixel displacement," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 1988, pp. 742-746.
[86] R. Schultz and R. Stevenson, "Extraction of high-resolution frames from video sequences," IEEE Trans. Image Processing, vol. 5, no. 6, pp. 996-1011, 1996.
[87] R. Hardie, T. Tuinstra, J. Bognar, K. Barnard, and E. Armstrong, "High resolution image reconstruction from digital video with global and non-global scene motion," in Proceedings of the IEEE International Conference on Image Processing, vol. I, Santa Barbara, CA, 1997, pp. 153-156.
[88] P. Sementilli, M. Nadar, and B. Hunt, "Poisson MAP super-resolution estimator with smoothness constraint," in Neural and Stochastic Methods in Image and Signal Processing II, vol. 2032 of Proc. SPIE, 1993, pp. 2-13.
[89] G. R. Ayers and J. C. Dainty, "Iterative blind deconvolution method and its applications," Opt. Lett., vol. 13, pp. 547-549, 1988.
[90] D. A. Fish, A. M. Brinicombe, and E. R. Pike, "Blind deconvolution by means of the Richardson-Lucy algorithm," J. Opt. Soc. Am. A, vol. 12, no. 1, pp. 58-65, 1995.
[91] R. L. White, "Image restoration using the damped Richardson-Lucy method," in ASP Conference Series, vol. 61, 1994, pp. 292-295.
[92] A. K. Jain, "Advances in mathematical models for image processing," Proc. IEEE, vol. 69, no. 5, pp. 502-528, 1981.
[93] C. A. Berenstein and E. V. Patrick, "Exact deconvolution for multiple convolution operators - an overview plus performance characterizations for imaging sensors," Proc. IEEE, vol. 78, pp. 723-734, 1990.
[94] A. Chambolle, R. DeVore, N. Lee, and B. Lucier, "Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage," IEEE Trans. Image Processing, vol. 7, no. 3, pp. 319-333, 1998.
[95] G. Chang, B. Yu, and M. Vetterli, "Adaptive wavelet thresholding for image denoising and compression," IEEE Trans. Image Processing, vol. 9, pp. 1522-1531, 2000.
[96] F. Sroubek and J. Flusser, "Resolution enhancement via probabilistic deconvolution of multiple degraded images," Pattern Recognit. Lett., vol. 27, pp. 287-293, 2006.
[97] N. Nguyen, P. Milanfar, and G. H. Golub, "Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement," IEEE Trans. Image Processing, vol. 10, pp. 1299-1308, 2001.
[98] [Online]. Available: http://www.imagefusion.org/
[99] (2006) Signal Processing & Communication Research Laboratory, Lehigh University, Bethlehem, PA, USA. [Online]. Available: http://www.ece.lehigh.edu/SPCRL/IF/image_fusion.htm
[100] A. C. Rencher, Methods of Multivariate Analysis, 2nd ed. Wiley-Interscience, March 2002.