(Ir ;.l. ..r... » r. fig...“ 1 i 1 . Iv . on, I“ v.13» ... ......I.......:.. .1 4.1.} l :1, L. . :53... t "H.093 I In 7 . ~ 1 : 9:13.}. . ... a. 3?.» .. {.5 . :3 5......» ? v . .n r . . ...?! . r! .‘ ...: xii. «pram. . 5 .3 4} ..Idi) c t ?:.b 1". .J” 3. .. 9:3. . l . I. ...u I. a! l... ...". 5:3,! ...... 5.5.5.. ...]...23: .533 .... . :2... .8». RN“. lot iv.” 1 i ...... an}... “anew 2.; {£35 A ... c3...“ 53... . I. a. 1.1.? ... 1.153.513... Erna“? an..." . .9! awn. . .. fiur by“! .1: 3 313.1%.» ...} . l "a; . ...: z £$.:£§vtcl...kl1 a 1 D5. . {IV 3 ...!r: u l .3:- c1 it a ain't»! ”It”; 3. . .ft :15. :7... 3:35.}! 3:1 .... mu. urn-«:45— 1...!!! . 55.3.1: I: 5.1»... : .... us i=3... 5 ...s . 1....1. .. a)! a: ~ IRSF! ...... z... u: .... . k at {51‘ '0 t...‘ . Iv . ...,lllli... 1.. « . .... .... Jul l. ... in); 1.7 .3. [II ... 2.3.. I 1.... :3 2t. . . ..ult Ln..- .: a1. .. a... i. f . 3...}! p. .3:- y . . 5b.»; S 5.. 7 OI I". f YA .l’rl‘ . f 0?... .Y ..u.l|¢:1t ... . vs “.11”; 1 fl. .ni..7....‘. \ I ... ...... : I...) 9 (Ir 3...... .... l .... 2 x3” . 1.9. 7 1 13!... 1 . n a 1 I V .. .p 4-. . .1 A. 55 h :V .I . c .5 . . :1 .I I4 ... [I s! 1‘ .rLPKHAJ-i I .2... .l‘?!.. iliul! ..f 0‘ I!!! I315 t‘ '4...) l . THESlS low This is to certify that the dissertation entitled NOVEL DETECTION METHODS FOR FUNCTIONAL MAGNETIC RESONANCE IMAGING presented by Fangyuan Nan has been accepted towards fulfillment of the requirements for Ph . D degree in Electrical Eng Maior professor Date 5’1/5/1/1000 MS U is an Affirmative Action/Equal Opportunity Institution 0—12771 LIBRARY Michigan State University PLACE IN RETURN BOX to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE woo cameos-p.14 Novel Detection Methods for Functional Magnetic Resonance Imaging By Fangyuan Nan A Dissertation Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Electrical and Computer Engineering 2000 ABSTRACT Novel Detection Methods for Functional Magnetic Resonance Imaging By F angyuan Nan This dissertation considers the detection problem in functional magnetic resonance imaging (fMRI), i.e., to determine which parts of the brain show active response to some stimulus. Some basic magnetic resonance imaging and fMRI background is introduced in the beginning. The underlying principle is that subtle variations in the image intensity over time can be detected to reveal brain activity. The detection problem is first attacked on a pixel by pixel basis. A new nonlinear detector for MRI based on Generalized Likelihood Ratio Test (GLRT) is systemat- ically studied. Theoretical analysis and Monte Carlo simulation are used to explore the performance of the new detector. At relatively low baseline signal intensities, I the GLRT detector outperforms both the conventionally used magnitude correlation (MC) detector and the newly proposed complex correlation (CC) test. At high base- line signal intensities, the nonlinear GLRT performs as well as the standard MC test and significantly better than the CC test. fMRI signals are actually both temporally and spatially dependent. Pixel-wise detection, however, considers only temporal correlation information and ignores. spa- tial correlation information. In order to remedy this deficiency, the dissertation then uses a multi-scale image segmentation algorithm to first segment an MRI correla- tion image into several regions, each with homogeneous statistical behavior. A single pixel detection algorithm is then applied to each homogeneous region. Extensive simulations demonstrate the efficacy of the new method. ACKNOWLEDGMENTS I’m most grateful to God, Jesus and Li Hongzhi for giving me a new birth and teaching me a lot of unearthly wisdom. Everyone the Lord has rescued from trouble should praise him [Bible - Psalm 107:1-2]. I’m very grateful to my grandmother and my parents, Mr. Shu-Guang Nan and Mrs. Benzhen Li, for nurturing me. The unfailing love from them as well as from my sisters, Mrs. Xiaoping Nan and Mrs. Xueqing Nan, is always a strong encouragment and support for my life. Pay attention to your father, and don’t negelect your mother when she grows old. Invest in truth and wisdom, discipline and good sense, and don’t part with them. Make your father truly happy by living right and showing sound judgment. Make your parents proud, especially your mother [Bible — Proverbs 23:22- 25]. I’m grateful to my advisor Dr. Nowak, a brilliant young scholar and a very nice gentleman, who helped me greatly not only in my research, but also in my life. Wthout his help, this dissertation would have been impossible. I’d also like to express gratitude to Dr. Deller, Dr. Pierre and Dr. LePage for their willingness to serve on my committee and for their helpful suggestions for my research and this dissertation. I’m very thankful to many friends’ help at a very difficult time during my research. I am obligated to mention the following names: Dr. Robert Barger and Dr. Paul Liu, Tom and Ruth Shillair, Yi Wan, Dr. Yingcheng Dai and his wife Dr. Qing Li, Dr. Dimitri Mihalas, Dr. William Schmidt, Dr. Jen-Je Su and his wife Dr. Shu-Shyan Wong, Ms. Smith, Dr. Yingnian Wu and his wife Ms. Liping Li. I’m thankful to many teachers. Among them, of particular importance and help to me are my previous advisors: Prof. Siyong Zhou at Beijing Institute of Technology, Prof. E. C. Young, Prof. Christopher Hunter at Florida State University (FSU), and Prof. Kun-Mu Chen and Prof. Edward Rothwell at Michigan State University (MSU). I am also indebted to the Dept. of Mathematics at FSU and the Dept. of Electrical and Computer Engineering at MSU. Without the teaching assistantship provided by them, my graduate studies in the States would be impossible. iv TABLE OF CONTENTS LIST OF TABLES vii LIST OF FIGURES viii 1 Introduction 1 1.1 How does fMRl work? .......................... 4 1.2 Characteristics of the fMRI Signal .................... 5 1.3 Outline for Detection Methods for MRI ................ 5 1.4 Organization of This Dissertation .................... 6 2 Pixel-wise Detection for MRI 9 2.1 Overview .................................. 9 2.2 fMRI Signal Model ............................ 12 2.3 Generalized Likelihood Ratio Tests ................... 17 2.4 Method 1: Magnitude Correlation Detection .............. 19 2.5 Method 2: Complex Correlation Detection ............... 21 2.6 Method 3: Nonlinear GLR Detection .................. 23 2.6.1 Derivation of the GLRT Test Statistic ............. 24 2.6.2 Invariance of GLRT Test Statistics ............... 28 2.6.3 Upperbound of GLRT Test Statistics .............. 32 2.6.4 Asymptotic of GLRT Test Statistics ............... 33 2.6.5 Threshold Selection Based on Numerical Simulation ...... 34 2.6.6 Distribution of GLRT Test Statistics .............. 34 2.7 Comparisons of the Three Detectors ................... 35 2.8 A Simulated fMRl Study ......................... 38 3 Multi—scale Detection for MM 42 3.1 Overview .................................. 42 3.1.1 Spatial Modeling and Outline for the Method ......... 43 3.1.2 General Setting of Bayesian Image Segmentation ....... 44 3.1.3 Multi-scale Image Segmentation Methods and Advantages . . 45 V 3.2 3.3 3.4 3.5 Multi-scale Analysis ........................... 3.2.1 l-D Multi-resolution Analysis (MRA) .............. 3.2.2 Properties of the Discrete Wavelet Transform ......... Image Segmentation by Multi-scale Markov Modeling Discrete Field . 3.3.1 Main Points of Algorithm I ................... 3.3.2 Simulation Result ......................... Image Segmentation by Multi-scale Hidden Markov Model (MHMM) of Wavelet Coefficients .......................... 3.4.1 Likelihood Function for {MRI Data ............... 3.4.2 Key Point for Edge Detection and a Prior Distribution for the WaveletCoefiicients 3.4.3 A Failure Modeling ........................ 3.4.4 Solution for Joint a Posteriori State Probability ........ 3.4.5 Marginal a Posteriori State Probability Calculation ...... 3.4.6 Image Label Estimation ..................... 3.4.7 Extension to 2 Dimensions .................... 3.4.8 A Simulation of the New Segmentation Method ........ Processing Results for MRI Data by Two-Step Approach ....... 4 Discussions and Conclusions 4.1 4.2 4.3 For Pixelwise Detection .......................... Consideration on Spatial Information .................. Epilogue .................................. 5 Appendices 5.1 Appendix I: Notations and Some Results on Projection Matrix . . . . 5.2 Appendix II: Some Preliminary Results on x2 and F Distribution . . 5.3 Appendix III: Definition of chian, Rayleigh and t Distribution . . . . 5.4 Appendix IV: Analysis of t Test Used in MRI Detection ....... 5.5 Appendix V: Principle of Invariant Test ................. 5.6 Appendix VI: SMAP Algorithm for Multi-scale Image Segmentation . 5.6.1 A Prior Consideration ...................... 5.6.2 Likelihood Function ........................ 5.6.3 Criterion and Solution ...................... BIBLIOGRAPHY vi 47 54 55 55 57 60 67 69 70 72 73 74 76 81 81 83 84 85 85 86 88 89 91 94 95 96 97 101 LIST OF TABLES 2.1 P, with P, = 0.01, N = 120 ....................... 36 2.2 P, with P, = 0.025, N = 120 ....................... 36 2.3 P, with P, = 0.05, N = 120 ....................... 37 vii 1..1 1.2 1.3 2.1 2.2 2.3 2.4 2.5 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 LIST OF FIGURES A block diagram of an MRI system .................... A series of images are acquired in fMRl to detect neuronal activity. Illustration of fMRI stimulus and responses. a) Controlled stimulus; b) Response from an activated area; c) Response from an inactivated area. .................................... (a) Activation-baseline pattern for an MRI experiment; (b) Modeling the activated pixel as a system. ..................... One time series from a real fMRI experiment. (a) Real part; (b) Imag- inary part; (c) Phase ............................ Structure of matched subspace detector. ................ Three curves comparing the performance of three detectors. (a) a/o = 1; (b) a/o = 3.162; (c) a/o = 10. .................... A simulated fMRI experiment illustrating pixel-wise detection by three detectors. (a) Brain image with simulated activation region high- lighted; (b) MC test results: Pd = 0.77; (c) CC test results: Pd = 0.70; (d) GLRT results: Pd 2 0.79. ...................... Illustration of image segmentation. ................... Nested scale spaces and wavelet spaces ................. Computation of DWT by filter bank .................. Synthesis by filter bank .......................... Pyramid structure of the MSRF. (a) Continuous image Y at different scales. (b) The random field X j at each scale is causally dependent on the coarser scale field X1"1 above it. .................. Neighborhood structure used in Algorithm 1. (a) Quad-tree structure used for 2-D case; (b) Binary tree structure used for 1-D case. One simulated image segmentation using Algorithm 1. (a) Original noisy image; (b) Segmentation result ................... Binary data tree structure for Harr wavelet analysis ........... Propagation of wavelet coefficients (three scales) ........... viii 14 16 23 39 41 46 52 53 53 56 58 59 61 64 3.10 Wavelet-based HMM ............................ 66 3.11 Conversion of a 2-D image to 1-D sequence ............... 74 3.12 One simulated image segmentation by Algorithm II. a) Noisy image; b) Detected edges; 0) Segmented image .................. 75 3.13 Processing results for fMRI data by two-step approach. a) One fMRI correlation image; b) Segmented image of (a), also final detection re- sults by combinational use of image segmentation and single pixel de- tection; (c) Another correlation image; (d) Segmentation and detection result of (c) ................................. 80 5.1 Experiment setup and two hypotheses for t-test used in MRI detection. 90 5.2 Invariant Test ................................ 93 5.3 Segmentation Result Using SMAP Algorithm. ............. 100 ix CHAPTER 1 Introduction No matter how advanced the computer is, it is still no match for the human brain, which to this day remains an unfathomable enigma. Li Hongzhi, Zhuan Falun Magnetic resonance imaging (MRI) is a powerful diagnostic imaging technique based on the principle of nuclear magnetic resonance (N MR), describing the interac- tion of nuclei and magnetic fields [31]. A block diagram of an MRI system is shown in Figure (1.1), adapted from [31] and [39]. It is a very complicated system, embracing many aspects of electrical engineering. The patient serves both as a transmitter and as a receiver. There are three types of magnetic fields in an MRI system [39, 67]. A static and very strong magnetic field is generated by huge superconducting magnets. A much weaker pulsing radio frequency field is employed to generate MR signals. Three sets of orthogonal gradient fields are used for imaging purposes, i.e., to spatially resolve the patient’s small structures to form an image. Although other imaging methods such as projection methods do exist, the current trend for MR imaging is to use Fourier inversion. The first set of gradient pulses uses linear changes in field strength to localize a region of interest in the subject’s body to be imaged — “slice selection”. -..” The second set of gradient pulses employs linear changes in frequency to distinguish columns in an image —- “frequency encoding”. The third set of gradient pulses utilizes linear changes in phase to distinguish the rows 7“ “phase encoding” [39, 12]. While traditional MRI provides only static images to analyze anatomical structure, functional MRI (fMRI), a newer imaging modality which is based on MRI and just comes to the stage during the past decade, acquires a series of images to detect neural activity, that is, to locate where— and how — the brain responds to certain stimuli [12, 37]. In other words, the central task for fMRI is to obtain maps of active and non- active regions of the brain corresponding to specific stimuli. Figure (1.2) illustrates a series of images acquired in an fMRI experiment. In this figure, the white small square indicates an activated region; the black small square indicates an inactive region. Compared with other imaging techniques, such as X-ray computerized tomography (CT), MRI is considered safer since it does not require the subject to be exposed to ionizing radiation [31]. MR images are also of high contrast and resolution. MRI provides more information since MR signals depend on several tissue parameters [31, 37, 68]. In addition to spin density p(r), the number of NMR visible spins in a given region, there are two principal relaxation times, each of which, in principle, can be used individually or combined [12]. MRI does not use exogenous agents, an advantage over another popular brain mapping technique known as positron ‘emission tomography (PET). Research on MRI involves substantial knowledge in physics, physiology, neurol- ogy, and psychology. This dissertation, however, concentrates on signal processing aspect. Note that the materials in this chapter are just for illustrative purposes; they are not meant to be comprehensive. For details, the reader is encouraged to refer to [12, 31, 37, 39, 68], which are the main sources for materials in this chapter. fin elementary particle with the mass of an electron and a charge of the same amount as the electron’s but positive. Magnet Gradient Coils RF Coil Diagram of Receiver Low . . Main RF C011 __ Gradient Coils ””33 AmP Magnet AmP Phase Shi 90 de 1 Odegree Phase Phase - Detector 22:1:2; PowerArnp Detector l Filter Filter Gradient RfPulse Di . . Pulse . 3“” gate V l___ I ]_ Cottinuous RFWave Computer Frequency Synthesizer Figure 1.1. A block diagram of an MRI system. / / / Figure 1.2. A series of images are acquired in MRI to detect neuronal activity. 1.1 How does flVIRI work? In short, there are basically two foundations [12, 37]. The first is due to physiological reasons. The other foundation comes from physics, that is, magnetic susceptibility l Blood contains iron (hemoglobin). Neural activity is linked to blood oxygenation levels in blood vessels close to active neurons. This relationship is called the Blood Oxygenation Level Dependent (BOLD) effect [66]. Specifically, neuronal activity causes an excess of oxygenation level in the blood nearby active brain tissue. De- oxygenated blood is more paramagnetic than oxygenated blood. Changes in the local concentration of de—oxyhemoglobin within the brain lead to alterations in the MR signal. Subtle variations in the magnetic susceptibility of oxygenated and de- oxygenated blood that are then detected in the MR signal indicate neural activity. For more details, refer to [12, 37]. Any object placed in a magnetic field will magnetize to a degree slightly more than (paramag- netic) or less than (diamagnetic) the applied field. The relationship between the field experienced within a sample and the applied field is known as the magnetic susceptibility calculated as the ratio of the internal field to the applied field. 1.2 Characteristics of the fMRI Signal Currently, the procedure to do fMRI experiments is to instruct the subject to perform experimental (E) and control (C) tasks in an alternating sequence of some design [12]. Refer to Figure (1.3a). Specifically, the subject is instructed to remain relaxed in the controlled (resting) state, and to perform some kind of consecutive and repetitive task (for example, finger tapping) in the activation (experiment) state [36]. In addition to controlled stimulus, two responses are also made up in Figure (1.3), one of which may be from the white spot (active pixel) in Figure (1.2) and the other of which may be from the black spot (inactive pixel) in Figure (1.2). It takes some time for MRI signal to reach its peak after the onset of the stimulus, which is called Response Latency [12]. An fMRl time series has a complicated structure. It contains noise varying with anatomical location. There is random noise as well as structured noise due to in- strumental (MR system characteristics), physiological (cardiac and respiratory pul- sations), and experimental (such as patient motion) factors [ 2, 12, 29, 38]. In addition to this complicated structure, the temporal and spatial characteristics of the time series are also unknown [29]. All these render the analysis of fMRI time series very difficult. The hemodynamic signal changes in MRI during brain activation are extremely small, from 2 to 5% at moderate magnetic field strengths (1.5T) [12]. 1.3 Outline for Detection Methods for MRI So far most fMRI detection methods are only pixel-wise. Generally, analysis of changes in neural activity is explored using statistical parametric map (SPM) [69], which is a two dimensional (2-D) image of a test statistic determined at each pixel by some operation between signal and reference. Originally, t statistics were usually 5 used [14]. In repetitive experiments involving a dynamic time sequence of images, a correlation method is now common, in which the correlation between each time series and a reference signal is used to decide whether or not activity is present [3]. It reduces to an F statistic test. Many generalizations and extensions of this simple idea have been proposed [24, 25, 36, 55, 60, 64]. Chapter 2 is also on pixel-wise detection, but contrary to most methods, complex-valued data are considered. Detection methods exploiting spatial information (correlation) of fMRI signal have also been recently proposed for MM [21, 35, 58]. Most of them use Bayesian strate- gies. For example in [35, 17], Bayesian principles and Markov Random Field (MRF) models are employed to facilitate joint spatio-temporal analysis of fMRI data. Chap- ter 3 develops a new multi-scale framework for similar purposes. 1.4 Organization of This Dissertation Chapter 2 stresses pixel-wise testing, which is most common in practice. First a model for the complex fMRI time series is proposed . The most distinct feature of the model is that the baseline nuisance component and reference signal component share a common phase. A nonlinear detection problem ensues. Based on the classical generalized likelihood ratio test (GLRT), three methods are investigated to attack that detection problem: the conventionally used magnitude correlation (MC) detec- tor, the complex correlation (CC) detector newly proposed by Lai and Glover [36], and a new nonlinear GLRT detector, with emphasis on the last one. The properties of the nonlinear GLR detector are carefully studied and a method for threshold se- lection from numerical simulations is presented. The nonlinear GLR detector has the best performance among the three [48]. An multi-scale detection method exploiting spatial information is presented in Chapter 3. Specifically, the idea of multi-scale image segmentation is used to improve the performance of fMRI detection. An multi-scale Bayesian framework for image segmentation is introduced in the beginning. Two image segmentation algorithms are then investigated, with emphasis on the second one whose application to fMRI detection is the second part of this research. Some discussions and conclusions regarding work of this dissertation are gathered in Chapter 4. Chapter 5 collects Appendices including some notations used throughout this dissertation, some results on x2, F and t distributions useful for the development in Chapter 2, the t-test used in MRI detection [14], principle of and some definitions and theorems on invariant tests [7, 56] used in Chapter 2, and the sequential maximal a posterior (SMAP) algorithm for multi-scale image segmentation [10]. 1* E E E E E E c C c c C y t (a) \/ V f (b) MAW/[W (C) "V Figure 1.3. Illustration of fMRI stimulus and responses. a) Controlled stimulus; b) Response from an activated area; c) Response from an inactivated area. CHAPTER 2 Pixel-wise Detection for flVIRI 1 A tree broader than a man can embrace is born of a tiny shoot; A dam greater than a river can overflow starts with a clod of earth; A journey of a thousand miles begins with a single step. Lao Tzu (500 BC, China), Tao Te Ching 2. 1 Overview In fMRI a series of MR images of the brain are acquired over time to detect neural activity. As explained in Section (1.1), the BOLD effect can be used to obtain maps of active and non-active regions of the brain. In order to achieve high signal noise ratio, the spatial and temporal imaging resolution must be limited [34]. Unfortunately, low resolution imaging may lead to a loss in signal information originating from microvasculature [60]. Hence, there is a fundamental trade-off between resolution and SNR in MRI. It is therefore of great interest to develop reliable detection methods for MRI in the presence of noise. Almost all fMRI tests are based on the image magnitude data. In standard prac- tice, the raw MRI data is reconstructed and the magnitude is taken to eliminate the (unknown) phase. I focus in this chapter on the repetitive experiments involving 9 a dynamic time sequence of images. Under various assumptions and experimental setups the fMRI detection problem reduces to well-known statistical tests including t-tests and F-tests [69]. When comparing two groups of images — “rest state” and “active state” images — a t-test is usually used [14], the derivation of which is at Appendix IV of Chapter 5. Another common approach to fMRI detection, called magnitude correlation (MC) detection in this dissertation, is based on a test statistic obtained at each pixel by correlating the magnitude time-series with a reference signal, which is assumed to be known and representative of the BOLD response [3]. Many generalizations and extensions of this simple idea have been proposed [24, 25, 36, 55, 58, 60, 64]. For example, recently Lai and Glover proposed a complex correlation (CC) test based on the complex data (i.e., image data before taking the magnitude at each pixel), in order to take advantage of phase information in the data and improve the detectability of fMRI responses [36]. The CC test statistic is F -distributed and the detector has a constant false-alarm rate (CFAR) property, which means that a specified false-alarm rate, i.e., the proba- bility of deciding a pixel is active when in fact it is not, can be achieved irrespective of the unknown parameters. Throughout this dissertation, the false-alarm rate is denoted by P, and the probability of detection is denoted by P4. Despite the CFAR property, the CC test focuses only on the response component (called signal component) of the data and ignores the baseline component (nuisance component in a general setting) of the data. The baseline component does not contain information relevant to the re- sponse itself (and is hence called nuisance component), but it does contain important information about the phase. In this chapter, a new test is proposed based on the GLRT principle that allows us to incorporate the phase information contained in the baseline component. In this chapter simple pixel-wise testing is stressed based on a Gaussian white 10 noise observation model. Pixel-wise testing ignores spatial relationships in MRI data, which is to be considered in Chapter 3. However, since the focus of this chapter is to assess the potential benefits of fMRI detection using complex data, a simple data model and testing procedure is employed to explore this basic issue. The assumptions are perhaps too simplistic in many practical cases. However, it is possible to extend the results and conclusions to more elaborate approaches based on more realistic data and / or correlated noise models, potentially accounting for uncertainties in the BOLD response and/or different nuisance component. Such extensions are briefly discussed in the conclusions of Chapter 4. This chapter is organized as follows. In Section (2.2), a basic model for MRI data is reviewed. Three tests will be studied, all of which may be interpreted as GLRTs under different data model assumptions. Therefore, before looking at each method, the GLRT principle is briefly reviewed in Section (2.3). In Section (2.4) and Section (2.5), the standard MC and recently-proposed CC tests are examined and the statistical properties of each are studied. The new GLRT for MRI is derived in Section (2.6). Its properties are explored in detail. In Section (2.7), the performance of the MC test, CC test, and GLRT in various regimes of baseline signal intensity to noise ratio are compared. Extensive Monte Carlo simulation is used to assess the performance of the detectors. The results show that the GLRT does have a CFAR property and a simple rule for choosing the threshold to achieve a desired P, is observed. The distribution of the GLRT statistic at high ratio of baseline signal intensity to noise is approximated from observation on threshold selection. Numerical studies show that the GLRT outperforms the CC test. Specifically, for a fixed false-alarm rate Pf, the GLRT’s detection rate Pd is higher than that of the CC test. Furthermore, the GLRT also performs significantly better than the MC test at low baseline signal intensity. The performances of the GLRT and MC detectors are roughly the same at high baseline signal intensity, and in such situations both 11 perform better than the CC detector. In Section (2.8), the performance of all three detectors is demonstrated in a simulated fMRI experiment. Section (2.9) contains the three detectors for the same model but with known noise varianceone, which is a by-product of my research. 2.2 fMRI Signal Model Due to phase errors which are difficult to control, the signal component of the mea- surements occurs in both real and imaginary channels [41, 6]. This suggests the following simple model *for an fMRI pixel time-series. Suppose there are N images acquired in the experiment. Let x denote an N x 1 vector containing the time-series data from one pixel: x = (a1 + br)e“9 + one. (2.1) The data vector x consists of three complex-valued components. The first com- ponent a1 is a constant (DC) baseline component, where 1 denotes an N x 1 vector of ones and a > O is the amplitude of the DC component. This vector represents the average value of the time-series. The baseline component model proposed here is the simplest version of the nuisance component. The second component br is the oscilla- tory response (signal) component. The vector r is a reference function that models the expected response characteristic. The amplitude b characterizes the strength of the response. In the absence of activity, b = 0. Typically, in MRI studies, while the subject is under some baseline condition (for example, at rest), a number of frames Nb is acquired; then the subject is asked to perform some task (for example, finger tapping) and a number of activation frames N, is acquired, or vice versa. These constitute one cycle. During each cycle, the total This model is attributed to Dr. Nowak. 12 number of frames is thus N = N, + Na. This pattern is repeated for a number of cycles. In Figure (2.1) adapted from [2], the activation-baseline pattern is represented by a periodic rectangular waveform of period N, with 1 and 0 representing activation and baseline, respectively [2]. One can think of the signal component as the response of a system whose input is the activation-baseline pattern. See Figure (2.1b). In real problems, the response signal r (the output of the system in Figure (2.1)b) is unknown. Friston et al. modeled the system as a linear time invariant (LTI) system [22], which is questioned by [2, 3]. Several possible estimates for the reference signal are suggested in [2, 3]. The first method is to use a delayed version of the activation-baseline pattern. It is most easily implemented. The disadvantage of this method is that the delay is not known a priori, and it may vary from pixel to pixel. The second suggestion is to select the response of one or more activated pixels as the reference signal. The third Option is to average the response of some activated pixels across cycles. The reference signal would then be formed by periodically replicating this time-averaged cycle throughout the time course. None of these approaches is perfect. Throughout this dissertation, the reference signal r is assumed to be known. The baseline component and signal component share a common phase 19. Hence, we model this phase-coupling by multiplying both components by the complex number e‘”, where i = \/—_1. In addition to these two components, an additive complex Gaussian white noise component onc models errors primarily due to thermal noises in the patient [13, 18, 19, 44]. The term 116 denotes a standard (zero mean, unit variance) complex Gaussian vector of length N. The factor a scales the noise resulting in a variance of 02. In general, the parameters of this model a, b, 19, and o are unknown and are different for each pixel time-series. This model was compared to actual fMRl time-series and these assumptions are in good agreement with actual data. Figure (2.2) shows the real part, imaginary part and phase of one time series from real fMRl 13 A Activation-Baseline Pattern 1 o - Na N=Na+Nb 2N Scan # (a) Stimulus Pattern Response Signal System 3 (Unknown Response > of Activated Pixel) (b) Figure 2.1. (a) Activation-baseline pattern for an fMRl experiment; (b) Modeling the activated pixel as a system. 14 experiment and it is illustrative of the constant phase idea in the model. The modeling of fMRI signals is a very complicated process [24, 29, 38]. Our model is still rough and not complete. The noise structure in fMR.I is very complicated. Our model does not try to capture more complicated disturbances present in MRI data such as other nuisance components, for example, due to physiologic motions. For the sake of simplicity and demonstrating our method and ideas, the noise is assumed to be white Gaussian. The whiteness assumption does not change the problem essentially, since we can always use Cholesky factorization [56] to whiten the signal and leave the detection problem unchanged, although, of course, estimation of the covariance of the noise is a challenging problem in itself. Usually parametric tests, to which discussions in this chapter are confined, assume a Gaussian model for the underlying time series. Several researchers have challenged this assumption and have used nonparametric tests, including the Kolmogorov-Smirnov test, Kruskal-Wallis test, and Wilcoxon signed ranks test, for a summary, see [69]. I succeed in deriving the GLRT directly from this complex model in Equation (2.1): _ xHx—N[af+a§] xHx - its + a3 + a2 + a? + t/(2ata2 + 25152)? + (at + fl? — a3 — airl’ L0 where H denotes complex conjugate transpose and xgl 01 = —i x/N xTr )61 : _R—'i \/N x?1 02 = —? \/l—V_ le flz = -I—- \/N 15 x 10 Real Part 1 I f I I I I I I I _1 1 l l l l I l l 1 0 1O 20 30 40 50 60 70 80 90 100 x 10‘ Imaginary Part "1 I F I I r fl I I I _,,2_........; ......... f. ......... f. ......... g ......... s _1.4 t— ....................................................................................... ., -1.6 .. ..... , ................................................................................... _ -1.8 l l l l l l l l 10 20 30 4O 50 60 70 80 90 100 Phase 1 r r r I I I g I § e ‘ g : e a a . f ; 1 l l l l l l l l 10 20 30 40 50 60 70 80 90 100 Figure 2.2. One time series from a real fMRI experiment. (a) Real part; (b) Imaginary part; (0) Phase. 16 This form, however, is not well suited for mathematical analysis. I thus turned to analysis in real domain. Because complex numbers can be interpreted as pairs of real numbers, I therefore re-express the complex model (2.1) as a 2N x 1 dimensional real-valued model: y = 805 + quS + on, (2.2) where u “=— b/a and x 1 0 r 0 acosr9 n y = R , S = , H = , z ’ n = CR x1 0 1 0 r asini9 nd The subscripts R and I denote real and imaginary parts, respectively. The phase- coupling in the complex model is manifest in this real model as a nonlinear coupling between parameter p and parameter 05. The reader is reminded here that u is the ratio of reference signal intensity to baseline signal intensity. Note here that this nonlinear model stands in marked contrast to the classical linear regression model: y = S¢1 + uH¢2 + on, (2.3) where 421 and 4», are independent. In the following it is shown that the CC test of Lai and Glover [36] can be derived from the linear model above. It is our contention that the nonlinear model is a more accurate representation of the physical fMRl problem, and, indeed, the new GLRT based on the nonlinear model outperforms the CC test. 2.3 Generalized Likelihood Ratio Tests The likelihood ratio test (LRT) [33] is an optimal method for deciding which of two hypotheses (competing data models) best describes a set of observed data. The data model corresponding to each hypothesis is given by a probability density function 17 (pdf). Unfortunately, however, to implement the LRT, the pdf’s under each hypoth- esis must be completely specified. The corresponding test is called simple hypothesis test problem. This is not the case in MRI. In the fMRI case, we have two hypotheses: Ho, BOLD response absent (u '= O), and H1, BOLD response present (a at 0). Under hypothesis H0, the vector 05 and the noise power 02 are unknown. Under hypothesis H1, 05, o2 and u are unknown. Due to the unknowns, in MRI we have what is called a composite hypothesis test. There are two standard approaches to composite hypothesis testing. The Bayesian approach prescribes a prior pdf’s for the unknown parameters themselves; and the likelihoods are integrated against these pdf’s to eliminate the dependence of the LRT on the unknown parameters. Another approach, the generalized likelihood ratio test (GLRT) is often preferable to Bayesian approaches due to its ease of implementation and less restrictive assumptions. Specifically, the GLRT does not require the specifi— cations of a priori probability distributions for the unknown parameters [33, 56]. For these reasons, this chapter focuses on the GLRT. The idea of GLRT comes from the LRT used in simple hypothesis testing which means the pdf for each assumed hypothesis is completely known. Let pH,(x; 8,), i = 0,1, denote the pdf’s corresponding to the two hypotheses. Recall that x denotes the data. The argument 6,- denotes known parameters that specify the precise form of the pdf. For example, 9, may represent the mean vector and covariance matrix of a multivariate Gaussian density. The LRT decides H1 if L(x) _ PH1(X; e1) _ > , pHo (X; 90) n where n is the threshold, which can be chosen to achieve a desired P,. The likelihood ratio (LR) L(x), a function of the data x, is called the test statistic. The GLRT is also based on the LR, but in the composite case the parameters are 18 unknown. The key idea in the GLRT is to replace the unknown parameters by their maximum likelihood estimates (MLE’s). In general, the GLRT decides H1 if pHo (x; 60) i where O1 is the MLE of 81 assuming H1 is true, and O0 is the MLE of Go assuming H0 is true. The MLE of a parameter is simply the value of the parameter that maximizes the corresponding pdf (i.e., the value that makes the observed data most likely). The GLRT has no optimality property, in general, but it asymptotically ap- proaches the uniformly most powerful (UMP) test among invariant tests [7]. For more details on maximum likelihood estimation and the GLRT, see [33]. In the fol- lowing sections we review the basic MC and CC tests and introduce the new nonlinear test for MRI detection using the GLRT. 2.4 Method 1: Magnitude Correlation Detection The magnitude of MRI data is known to be Rician distributed [41]. To see this, note that 1:,- in Equation (2.1), the jth observation in the time series can be written as x,- = (a + brj) c039 + (mg,- + i[(a + brj) sin 9 + on;,-]. The two terms being independent, 2, E [le is Rician distributed (see Definition (3) and Equation (5.1) in Appendix III). However, for large values of ratio a/o (ratio of baseline component intensity to noise standard deviation) the Rician density can be well approximated as a Gaussian distribution because ZjEIIEj] = x/[(a+brj)cosl9+onR,—]2+[(a+br,-)sin0+on1,-]2 : \/(a + brj)2 + 02073.2]- + 7731-) + 2(0 + ij)0(nRJ €080 + nlj Sin 0) 19 20(nR-cosd+n1-sin0) 02 :: (a+bTJ)¢1-[- .7 a+brj J +——-——(a+brj)2(n§j+n%j). Note that n3, c050 + n],- sind is nothing but another Gaussian random variable. We denote it as n,. Also note that n,- and nk are independent for j 79 k. n32]- + nij is a x3 random variable. Under the assumptions that a >> a and u = b/a is very small, the third term under the square root sign is much smaller than the second one and therefore can be neglected. In this case, application of the binomial expansion Wz1+éx |:r|<<1, to the above equation leads to z,- za+brj+onj. Hence, the following Gaussian approximationa is commonly assumed for MRI detection [3]: z z a1 + br + on, (2.4) where here 2 = Ix], n is (real) Gaussian distributed, and with b = 0 under Ho and b 75 0 under H1. Hence, in this case 60 = [a 02] and 91 = [a b 02]. Bear in mind that this approximation does not accurately model the data when a/o is small, as we shall see later in some examples. The GLRT for the above detection problem results in the following test statistic [56, 57]: II can II2 IIP.%P1*z II” t1(Z) = (N - 1)[L1(Z) - 1] = (N - 1) (2-5) 20 where 2 ll Prz H L z = . “’ “Pszu’ If t1(z) > m, then we decide H1; otherwise choose Ho. On the assumption that z is truly Gaussian, t1(z) is distributed as F1,(N_1)(SNR) [57], where SNR E uzaz/oz. Refer to Appendix II in Chapter 5. The detector has CFAR property l Therefore, we can choose a threshold in to achieve a desired P, irrespective of the signal to noise ratio, which is generally unknown a priori. This detector is called the magni- tude correlation (MC) detector because the test statistic t1(z) is proportional to the . correlation between the magnitude data 2 and the reference signal r. Unfortunately, the Gaussian approximation in Equation (2.4) is unreasonable when a/o S 3. In fact, when a/o is small, the distribution of test statistic t1(z) is not known, nor whether the MC detector has CFAR property. So determination of a proper threshold to obtain a desired P, is theoretically very difficult. How to solve this problem will be explained together with numerical results. Moreover, in this case, one expects the performance of the MC test to suffer. This is indeed the case as shown later by numerical results. 2.5 Method 2: Complex Correlation Detection Recently, Lai and Glover proposed a complex correlation (CC) test based on the complex data, in order to take advantage of phase information in the data and improve the detectability of fMRI responses [36]. Here, the CC test statistic is shown to be also F—distributed and also has a CFAR Note that the MC test statistic has a central F1, N-) distribution under Ho ([1 = 0). 21 prOperty. Recall the linear model y = 54>, + pH (#2 + on. (2-5) The unknown parameters in this case are 90 = [of d); 02] and 91 = [05] g u oz]. The GLRT based on this model in fact coincides with the CC test [36] and is given by .L 2 P PL 2 t2()’) = (N-1)[L2(y) - 11=(N—1)“priy”2 = (IV-1)” Z 1"”2 (2.7) llPSHPSyll “PHPSY” where 2 || PsLy ll _ yTPsfy 1420’) = — —- H y H2 " ll Ps)’ ”2 — H PH)’ “2 YTPSLHY (2.8) If t2(y) > 172, then we decide H1; otherwise, choose Ho. This test is called the complex correlation (CC) test because it is based on the correlation between reference signal and real and imaginary components of the complex data. The pdf of t2(y) is non-central F2,2(N_1)(SNR), where again SNR = u2a2/02, and thus the detector has the CFAR property i Despite this desirable property, the drawback to this test is that it is based on a less accurate model. The CC test focuses only on the response component of the data and ignores the baseline component. The DC component does not contain information relevant to the response itself, but it does contain important information about the phase. Although the phase is a nuisance parameter in the testing problem, more accurate knowledge of the phase can improve the detectability of the fMRI response. As noted previously, the phase-coupling between the nuisance and signal response components of the data dictates the nonlinear model in Equation (2.2). Therefore, I next derive a new GLRT based on this more accurate model. Note that the CC test statistic has a central F2300-” distribution under Ho (p = O). 22 Figure 2.3. Structure of matched subspace detector. Before going to next section, I’d like to summarize a little bit. The MC detector and CC detector actually have the same structure as shown in Figure (2.3) adapted from [57], which is known in the signal processing literature as a matched subspace detector [57]. First the data are projected onto a low-rank subspace by removing interference. The projector is also termed an interference rejecting or null steering filter [57]. Then the resulting data are further projected onto another low-rank sub- space that is matched to the signal component, and energy is taken. This projector is usually called a matched subspace filter. Since the noise level is unknown, this energy is then compared with the energy in the component orthogonal to signal subspace. The ratio is computed and compared with a threshold for a decision [57]. 2.6 Method 3: Nonlinear GLR Detection In this section, a new nonlinear test based on the GLRT principle is found that incorporates the phase information contained in the baseline component [48]. The unknown parameters in model (2.2) are 90 = [¢T 02] and 81 = [¢T u 02], under Ho 23 and H1, respectively. Recall that the phase coupling introduces a nonlinear coupling between the parameters u and cf) under H1. This nonlinearity makes the MLE’s very difficult to compute, but, remarkably, a closed-form solution for the GLRT statistic which is derived in the following subsection does exist: t3(¥) = [L300 - llUV - 1), (29) where 2” PsLy II2 (2.10) H Pity II2 + II PsLy |l2 - \/l| Pay II4 + II Psy II4 + 2II Pay l|2|| Psy “2008 2P L3(Y) = with T (61:02) yT-}%y coscp y = = (2.11 ‘ ’ II a HH 02 n n Pay nu Psy ll ’ and 01 and 02 are two sufficient statistics 1 T 1 T 01(y) = NH y, and 02(y) = N5 y. (2.12) As usual, given a specified threshold 173, we decide H1 if t3(y) > 173, and Ho otherwise. 2.6.1 Derivation of the GLRT Test Statistic In my derivations, I assume lTr == 0 and rTr = N without loss of generality, so STH = HTS = 02x2,STS = HTH = N12“. The second condition can always be satisfied since we can always normalize the reference signal without changing the problem essentially. The first condition is more difficult to meet. But the same ideas of decomposing one orthogonal projection operator into two oblique projection operators as in [4, 57] can be utilized to achieve the result, even under more compli- 24 cated nuisance and signal (reference) structures, i.e., the nuisance component is not just a constant baseline 1, nor is the signal component just one reference signal r. Possibly, both of them may be adaptively selected from real data to contain several components (see discussions in Chapter 4). However, the analysis of the test statis- tic’s properties (such as invariance), and hence the determination of the threshold are much more complicated than presented in this dissertation. Even so, the properties of the corresponding linear detector may still be used as a guide. Let 30 and 31 denote the MLEs of noise variance under hypotheses Ho and H1, H (Xiél) respectively. Recall that the GLRT is based on m. It is easy to show that this 0 2 statistic reduces to L3(y) = min(3§)/min(3f). (2.13) Recall that under Gaussian distribution the maximum likelihood estimate is the same as the least square estimate. Therefore, the calculation of min 33 is straightfor- ward: . A 2 "111103 = II PsLy ll (2-14) However, determining min(3f) is much more difficult due to the nonlinear coupling between the two unknowns u and ob. To circumvent this difficulty, I first decompose y and y — Sd) — uH¢ into three orthogonal components, i.e., y = PsLHy + Psy + PH)’, and then 8f = IIy-——S<2b-MH<1>II2 2 = II PsLHy + (Psy - S¢>) + (Pay - uH¢l ll 2 = II Pay II + II Far — Set I!” + II PHY - uH¢ H2 25 = n Pay “2 + n 862 -- s¢ u? + In Hot — uH¢ “2 "' l] PsLHy ”2 + N[01T01 + 0392 +(1+ #2)¢T¢ — 2(92 + HallT‘l’l where 01 and 02 are given by Equation (2.12). Now min 3;" is equivalent to min J = (1 + u“)¢T¢ - 2(92 + #01)T¢. Setting partial derivatives of J with respect to u and (b to zero results in: ii- = 2u¢T¢ - 20w = 0, 3;; = 2(1+ #2)d> - 2(92 + not) = 0 We then get A 9T“ A 92 + I201 = —, 2.1 ¢ 1 + 1,, ( 6) So now mine? = u Pay ”2 + metal + 91592) — N<1+ now. Furthermore, note that N(9f91+ 9592) = II Pay II2 + II Fey “2, and II PsLHy II2 + II Psy II2 + II Pay II2 = ll 3’ “2, which gives minaf = u y ll’ — N0 + fi2)$T¢— — II y u— 3% ————[6T92 + more + T120911 Eliminating d; from Equations (2.15) and (2.16) (or setting the derivative of the 26 above equation with respect to II to zero) shows that II must satisfy: 0302 + 222:9?)2 + Wafer _ 9% + new, 2.17 1+? p ( ) Using this equation, min 8? can be further simplified: A 0% min 0i = H 3' ||2 - N(9f91+ 1_fi2) (2-18) This equation is important in our derivation of an asymptotic expression for L3 (y) in the following subsection. From Equation (2.17), II satisfies quadratic equation u2 + on — 1 = 0 with zfln—flm 93‘s, Since a] [1.2 = —1, there are two solutions of Opposite signs: c c 2 A = —— :l: 1 — u 2 + (2) However, from Equation (2.18), to make sure 3? is minimal, fl must have the same sign as 9:02, and so the unique solution for 22 is: x c 2 p:—§+ l+(-). Substitution of ii into Equation (2.18) yields the right solution for min 3?. Finally, from Equations (2.13), (2.14) and (2.18) I get the closed form expression for L3(y) as given by Equation (2.10). Instead of using L3(y) directly, I use Equation (2.9) as the test statistic, which is suggested by the form of test statistic t2(y) for the CC detector. The main reasons are 27 to facilitate the determination of the proper threshold and to get a good comparison between the three different detectors. It will become much clearer when we study the asymptotic property of t3 (y). Having established the form of the test statistic, the next question is naturally how to choose the threshold, which is a very difficult problem. In order to answer this question we need know the pdf of this statistic. Unfortunately, unlike the CC test, a closed form for the pdf of t3 (y) is not known to me at this stage. I can, however, show that the pdf of t3(y) is a function of u and a2/o2 alone. 2.6.2 Invariance of GLRT Test Statistics In this subsectiOn, I employ theories on invariance [46, 56] to prove that the pdf of t3(y) is a function of only a and a/o. In order not to interrupt the continuity of description, the ideas and principles on invariant test are put in Appendix V of Chapter 5. The difficult part in using invariant theory is to find an appropriate set of trans- formations which fully exploit the structure of the signal to be detected. Since our problem is nonlinear, finding this set of transformation is not an easy matter. Actu- ally discovery of this set of transformations comes simultaneously with its proof. The following theorem may be regarded as an extension of the results in [56, 57], which only deal with the linear model of Equation (2.3). Theorem: The family of distributions of y defined by y = quz + Sci) + an (2.19) where n is Gaussian distributed as N (0, I), is invariant to the group of transformations defined by: G = {9(y) = 9(y) = CQSQHy} (2-20) 28 where the two orthogonal matrices Q5 = UsQU§+PsL = (Sngv't'I—‘Sfi—T, Q” = Utter/twat = ffio”f;+I-”THT, c is any constant and Q is any 2 x 2 orthogonal matrix, Us and U H are defined in an obvious way. Under the above transformation, g(y) is explicitly expressed as 9(3’) = HIH¢1+ S¢1+ Uln (2°21) with the induced transformation G given by: #1 = H ¢1 — CQ¢ 01 = 60’ Note that Q5 and Q H are two orthogonal matrices (rotational matrices [56]). The geometrical meaning of this transformation is thus consecutive rotations of the original signal within the 5 plane (defined as the subspace spanned by the columns of S) and then within the H plane followed by sealing that introduces unknown variance. The following is presented to make the proof as applicable as possible to a general case. Proof: QHY = Q3595 + #QHHd’ + UQHD = 545 + HHW + UQHD, 29 since (235' = (UHQU; + Pfi)S = PfiS = S, . T because HTS = 0, i.e., U55 = g—fiS = 0. Similarly, Q5 H = H: while H¢' = QHHd) = (UHQUE; + P§)H¢ = UHQU§H¢ = UH(U,§UH)-1U;UHQU;H¢ = H(HTH)-1HTQHH¢ (2.22) due to the fact that UEUH = I. Therefore, QSQHY = QS(S¢ + #HW + UQHD) = 545" + #HW + OQSQHD where, similarly to Equation (2.22), S¢fl = stcp = S(STS)"STQSS¢. (2.23) It turns out that 03’ in Equation (2.22) and 03” in Equation (2.23) are coincident, which is the fascinating and devious part for finding the above set of transformations: T ”T“ —H—H¢=Q¢. (l), = (HTHl-IHTQHH¢=NWQW 30 due to the fact that HTH = N12x2. Similarly, ¢" = (STS')"’STQSS 0. By the asymptotic property of maximum likelihood estimates, as N —+ 00, fl —) p. In order to get a more accurate approximation of II, we use Equation (2.16) combined with II —> ,u —+ 0 (as N —-) 00) and get $4 92, so from Equation (2.15) (as N —> 00), A ~ 9T92 ...—— 2.24 It 0302, ( ) which is the maximum likelihood estimate for the corresponding linear model in Equation (2.3). Substituting Equation (2.24) into Equation (2.18) we have min??? = H y H2 — N(61T0.+ 0362) = H y H2 — II Psy II" — ll PHY Il2- (2.25) Noting that Psi}! = PéPsLHPsL, (2-26) P; —— Psi” = PgLPHPgL = PH = PHPgL, (2.27) PSJ-HP,L = P,.]P,L = P3,, = I — PS — PH. (2.28) we get, from Equations (2.13), (2.14), (2.25), as N —> 00, L3()’) 3 1420’)- This implies that t3 (y) asymptotically has the same distribution as t, (y), i.e., non- central F2,2(N_1)(SNR), and thus asymptotically has the CFAR property with small u. 33 2.6.5 Threshold Selection Based on Numerical Simulation Another method is to try to determine the exact thresholds via Monte Carlo simula- tion. Some of the results of our simulations are given in the next subsection. Here, we summarize the conclusions. Extensive Monte Carlo simulation reveals that the GLRT test is also CFAR when a/o 2 1, which is the case for most, if not all, fMRI experiments. More importantly and more interestingly, to achieve the desired P], the proper threshold of our GLRT detector is almost exactly one half that of the corre- sponding threshold required for an Fl,(N-1) distributed test statistic. This is confirmed by extensive Monte Carlo simulation. 2.6.6 Distribution of GLRT Test Statistics The observation in above subsection may lead to Nan’s Conjecture formulated as follows. In mathematical terms, we have PI=/ pt3|H0(t3)dt3z/ fo(t)dt, (2-29) 77 2n where f0 denotes the density of an FM 10-1) distributed statistic. Differentiating the above equation with respect to 1) leads us to an exciting result that the density of the test statistic t3 under H0 is related to the F1,N_1 density by the approximation p¢3][10(t3) z 2f0(2t3). (2.30) This implies that a very accurate threshold can be selected using standard Fl,(N-1) distribution tables [1]. Actually, on careful observation of the following Tables (2.1-2.3), we find that Pd’s 34 for GLRT and for MC detectors are the same, when a/o is large, i.e., Pd = / [3,3]”, (t3)dt3 "~"’ / f1(t)dt, (2.31) n 2n where f1 denotes the density of an F1,(N-1)(SNR) distributed statistic, since for large a/o, the approximation for the MC model in Equation (2.4) is quite valid and hence the Pd for MC detector is given by the right integral of above equation. Differentiating the above equation with respect to 77 leads us to another result that the density of the test statistic t3 under H1 is related to the F1,N_1(SNR) density by the approximation Pt3|H1(t3) z 2f1(2t3). (2.32) When u = 0, fl is coincident with f0. Summarizing two cases, I suspect that Pt3(t3) z 2f1(2t3)a (2-33) for a/o > 1. Although theoretical proof for this conjecture is difficult to achieve, it is still worth investigating. Because the pdf of this statistic does not depend on the angle 19, the expression of y simplifies greatly and may direct us to its final solution. I caution here that numerical simulations reveal that the conclusions in Subsec- tions (2.6.5) and (2.6.6) all break down for a/o < 1. 2.7 Comparisons of the Three Detectors In order to compare the performances of the three detectors, I run extensive Monte Carlo experiments. Because originally we only know that the performance of GLRT depends on two parameters a/o and u E b/a, it is necessary to study the performance for different values of a/o. However, as mentioned above, for a/o Z 1, the Monte 35 Threshold 4.70 3.43 6.85 a/o )2 CC GLRT MC 1 .3162 .72 .80 .44 3.162 1 .72 .80 .78 10 .03162 .72 .80 .80 Table 2.1. Pd with P, = 0.01, N = 120 Threshold 3.75 2.58 5.15 a/o [1 CC GLRT MC 1 .3162 .82 .88 .58 3.162 1 .82 .88 .87 10 .03162 .82 .88 .88 Table 2.2. Pd with P, = 0.025, N = 120 Carlo analysis suggests that the GLRT is essentially CFAR. In Tables (2.1-2.3), we compare the detection rates Pd of the three tests under three P, specifications. The Pf’s are selected to be representative of those commonly used in fMRI. In these tables, the first row contains the thresholds corresponding to the preselected Pf. In order to see the functional dependence of Pd on SN RE ”202/02, I deliberately select [2 so that SN R is the same for three different a/o cases. In order to get as accurate results as possible, for each value of Pd (and P,), 105 simulations are run and the average is taken as true result. The most difficult element of the Monte Carlo analysis, except in the CC test case, is the determination of proper thresholds to achieve a desired false-alarm rate with each detectors. The CC test is F2200-“ distributed under Ho, and therefore the proper threshold is very easily determined from standard tables [1]. Because the GLRT is not known to possess the CFAR property, the proper thresh- old will, in general, depend on a/o. For a given value of a/o, the threshold needed to achieve a desired false-alarm rate can be determined via Monte Carlo analysis and trial-and-error over a range of thresholds under the guidance of the rough method 36 Threshold 3.03 1.96 3.92 a/o u CC GLRT MC 1 .3162 .88 .93 .69 3.162 1 .88 .93 .92 10 .03162 .88 .93 .93 Table 2.3. Pd with P, = 0.05, N = 120 described in Subsection (2.6.3). This is precisely how the thresholds were determined for the results given in Tables (2.1-2.3). Remarkably, however, the Monte Carlo anal- ysis revealed that both the MC test and the GLRT are essentially CFAR so long as a/o > 1, which is almost always true in MRI. Moreover, the Monte Carlo analysis supports the use of some very simple rules for threshold selection. First, in the case of the MC test, for very large values of a/o the magnitude data are very well approximated as Gaussian. Therefore, in such situations, the MC test is (approximately) F1.( N-” distributed under Ho and the prOper threshold can be determined again from standard tables [1]. Because the Monte Carlo simulations show that for a/o _>_ 1 the MC test is essentially CFAR, the proper threshold may be determined from F1, N-” distribution for all cases in which a/o _>_ 1. The derivation of approximation model (2.4) for MC detector also supports this, although not strictly. Second, the similar performances of the GLRT and MC test for large a/o suggest the possibility of a relationship between the GLRT statistic and the F1,(N_1). This intuition led to the discovery that the proper threshold for the nonlinear GLRT can be selected as one half the threshold required to achieve the desired P, for 3 F1, 10-1) distributed statistic. The results in the three tables show clearly that our GLRT detector performs best for all three (low, medium, high) a/o cases. The CC detector performs better than the MC detector in the low a/o case. However, as the DC component becomes more and more dominant over the noise, the GLRT and MC test significantly outperform 37 the CC test. Finally, note that the detection rate of the CC test is constant for fixed SN R = u2(a/o)2. This is expected because the .CC test statistic is non-central F2,2(N_1)(SNR) under H1. Remarkably, note that the dependence of detection rate of our GLRT detector also depends only on SNR. The same is not true of the MC test, whose performance drops severely as a/o decreases. I also illustrate in Figure (2.4) the results using three performance curves (PD versus response strength u = £- curves) with N = 120, P, = .01. Therefore, the thresholds are chosen as in Table (2.1). Solid (—) line for GLRT; dash-doth.) line for CC; dashed (— —) line for MC. Figure (2.4a) shows the case for a/o = 1.0, which is low, so we expect MC detector to suffer. It is indeed the case: the GLRT curve is at the top, CC curve is in the middle and the bottom one is for MC. Figure (2.4b) shows the case for a/o = 3.162, which is large. In this case, the top one is for GLRT, the middle one for MC, the bottom one for CC. It shows the MC detector begins to surpass the CC detector but is still inferior to the GLRT. Figure (2.4c) indicates the case for a/o = 10, which is quite large, and so the MC and GLRT detectors have almost the same performance as shown in the figure: the GLRT and MC curves coalesce to one in the top while the CC detector remains at the bottom. All three curves clearly demonstrate that our GLRT detector is always the best. 2.8 A Simulated flVIRI Study One fMRI experiment is simulated to illustrate pixel-wise detection efficiency by the above three detectors. The results are shown in Figure (2.5). Figure (2.5a) shows one slice image of the brain (64 x 64 pixels) with simulated activation region highlighted. A 9 x 9 voxel region in the lower right corner of the brain (indicated in white) is selected to be active in this simulation. For the simulation, a time series with length 38 0.9 - 0.8 ] iii) _ 30.5- 3,-0.3 ~ 0.9 0.8 .. 0.7 ..1 .. 50.5 0",... 0.3T ‘ 0.2r- -- - ' 0.1,-.. ~’ 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 his 00 0C) Figure 2.4. Three curves comparing the performance of three detectors. (a) a/o = 1; (b) a/o = 3.162; (c) a/o = 10. 39 N = 120 is simulated for each voxel. The reference signal r is a square wave with period 10. The fluctuation of reference signal about constant level is i10%, i.e., u = 0.1. The noise variance in each time series is set so that a/o = 3.162. For each time series, the phase is a constant. Spatially, the phase has a fluctuation (Gaussian noise with zero mean 0.1 variance) about a constant phase of 1r/3. The reader is referred to Figure (3.13a) in Chapter 3 for the correlation image for this simulated experiment. The desired false-alarm rate in this example is chosen to be P, = 0.01, and thus the three thresholds for CC, GLRT and MC detectors are 4.70, 3.43, and 6.85, respectively. The MC test, CC test, and GLRT test are compared in Figures (2.5b—d). The actual detection rates observed in this simulation, given in the caption of Figure (2.5), are in excellent agreement with the tabulated Monte Carlo results. However, we note that in the activation maps shown in Figure (2.5), there is activation ”detected” outside the brain. This is a serious problem which results from the fact that spatial information is completely ignored. This is the main defect of pixel-wise detection. Therefore next chapter is devoted to dealing with the problem of how to utilize spatial information to further enhance detection efficiency. 40 Figure 2.5. A simulated fMRI experiment illustrating pixel-wise detection by three detectors. (3.) Brain image with simulated activation region highlighted; (b) MC test results: Pd = 0.77; (c) CC test results: Pd = 0.70; (d) GLRT results: Pd = 0.79. 41 CHAPTER 3 Multi—scale Detection for MRI 3.1 Overview The last chapter focused on pixel-wise detection for MRI, which is most common in practice [3, 24, 25, 36, 55, 60, 64]. However, these techniques do not take advantage of mutual information among neighboring pixels. Ignoring such spatial information can reduce detection accuracy. Utilizing spatial information may enhance our detec- tion accuracy. For example, in Figure (2.5) activity is “detected” in areas outside the brain — an erroneous decision that could be avoided by incorporating ananatomical information in the decision rule. Furthermore, it may be quite possible that con- nected region of activation is larger than individual pixel dimensions. In other words, activated areas in reality tend to occur in clusters of neighboring pixels. Thus, lim- iting testing to individual pixels imposes artificial boundaries in the analysis process that may weaken the detection performance. On the other hand, if there is strong indication that a large group of pixels, which may be thought of as one large pixel at a very coarse (spatial) scale, is active, then the individual pixels inside this group may be more likely to be active themselves. Hence comes the idea of (spatial) scale and incorporating spatial correlation into the fMRI detection process. 42 3.1.1 Spatial Modeling and Outline for the Method In light of the ideas above, the pixel-wise detection is oversimplistic. Therefore, it is necessary to develop detection methods that take advantage of spatial correlation. There are many approaches to attacking the problem, for example, cluster analysis [20, 26, 27] and independent component analysis (ICA) [43]. Detection methods using Bayesian strategies have also been recently proposed for MRI [21, 35, 17]. Just as in the pixel-wise detection strategies, we need to model each time series; when we incorporate spatial correlation, we also need develop spatial models of the fMRI data. This is by no means easy. The recent works of [35, 17] use Markov random field (MRF) models to model the spatial relationships in MRI data. These methods all have their shortcomings. The clustering and independent com- ponent analysis techniques are somewhat ad hoc and do not enable explicit modeling assumptions about spatial correlation. The existing Bayesian methods mentioned above are all restricted to modeling only the finest scale (highest resolution). Such methods tend to be very computationally demanding, and are often difficult to ana- lyze and interpret. Therefore, we will put forward a novel multi-scale modeling and detection framework that incorporates spatial correlation information and is much more amenable to analysis and optimization. More specifically, this chapter will present a two-step approach for fMRI detection. First, a new multi-scale image segmentation algorithm is pr0posed to decompose the correlation image into several different regions, each of which is of homogeneous statistical behavior. Second, each homogeneous region will be classified independently as active or inactive using detection methods analogous to the pixel-wise test described in Chapter 2. 43 3.1.2 General Setting of Bayesian Image Segmentation In a general setting, the fMRI activation mapping may be viewed as a particular image segmentation problem. We are given a random continuous noisy image Y which must be segmented into a discrete image X consisting of regions of distinct statistical behavior. For example, in MRI, image Y may be composed of the correlation values (specifically, the correlation between amplitude time series and the reference signal) at each pixel location; the image X may be just a binary detection map, as in Chapter 2. To simplify the presentation, I will modify the notation slightly from that used in previous chapters. From now on, I will adopt some notation from [10]. Symbols without subscript refer to the whole image field. Individual pixels in the image X are denoted by X k where k is a point of a one-dimensional (l-D) or two-dimensional (2-D) lattice, depending on the context. The collection of lattice points at scale j is denoted as Sj. Random quantities are usually denoted by upper—case letters. For notational ease, however, lower-case letters may denote the stochastic quantities or corresponding deterministic realizations, which should be distinguishable also from contexts. We assume that each observed pixel in image Y is dependent on a corresponding unobserved label in X. Each label specifies one of M possible states, each with its own statistical behavior. In our case, M = 2 indicating “active” and “inactive”. However, in general, we may need to segment the correlation image into regions with homogeneous statistical behavior, so M 2 2. The dependence of observed pixels on their labels is specified through the condi- tional distribution of Y given X, i.e., py|,(y|x). The function py|,(y|x) is called the likelihood function. In fMRI, we are essentially interested in inverting this relation- ship; that is, we would like to determine px]y(x|y), the probabilistic description of the 44 unknown activation map given the observed data. This calculation is facilitated by introducing a priori knowledge about the size and shapes of regions, modeled by a a prior distribution p,‘ (x) By Bayes formula, we will estimate X given observed image Y = y as: x = arg max p(X = x|Y = y) I = p(X=g)p p(Y=y) = p(X=x)p