This is to certify that the

dissertation entitled

NOVEL DETECTION METHODS FOR
FUNCTIONAL MAGNETIC RESONANCE IMAGING

presented by
Fangyuan Nan

has been accepted towards fulﬁllment
of the requirements for

 

 

Ph.D. degree in Electrical Eng
Major professor

 

Date 5’1/5/1/1000

MSU is an Affirmative Action/Equal Opportunity Institution 0-12771

 


 

Novel Detection Methods for Functional Magnetic

Resonance Imaging

By

Fangyuan Nan

A Dissertation

Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical and Computer Engineering

2000

ABSTRACT

Novel Detection Methods for Functional Magnetic

Resonance Imaging
By

Fangyuan Nan

This dissertation considers the detection problem in functional magnetic resonance
imaging (fMRI), i.e., to determine which parts of the brain show active response to
some stimulus. Some basic magnetic resonance imaging and fMRI background is
introduced in the beginning. The underlying principle is that subtle variations in the
image intensity over time can be detected to reveal brain activity.

The detection problem is first attacked on a pixel-by-pixel basis. A new nonlinear
detector for fMRI based on the Generalized Likelihood Ratio Test (GLRT) is
systematically studied. Theoretical analysis and Monte Carlo simulation are used to explore
the performance of the new detector. At relatively low baseline signal intensities,
the GLRT detector outperforms both the conventionally used magnitude correlation
(MC) detector and the newly proposed complex correlation (CC) test. At high
baseline signal intensities, the nonlinear GLRT performs as well as the standard MC test
and significantly better than the CC test.

fMRI signals are both temporally and spatially dependent. Pixel-wise
detection, however, considers only temporal correlation information and ignores
spatial correlation information. To remedy this deficiency, the dissertation then
uses a multi-scale image segmentation algorithm to first segment an fMRI
correlation image into several regions, each with homogeneous statistical behavior. A single
pixel detection algorithm is then applied to each homogeneous region. Extensive
simulations demonstrate the efficacy of the new method.

ACKNOWLEDGMENTS

I’m most grateful to God, Jesus and Li Hongzhi for giving me a new birth and
teaching me a lot of unearthly wisdom. Everyone the Lord has rescued from trouble
should praise him [Bible - Psalm 107:1-2].

I’m very grateful to my grandmother and my parents, Mr. Shu-Guang Nan and
Mrs. Benzhen Li, for nurturing me. The unfailing love from them as well as from my
sisters, Mrs. Xiaoping Nan and Mrs. Xueqing Nan, is always a strong encouragement
and support for my life. Pay attention to your father, and don’t neglect your mother
when she grows old. Invest in truth and wisdom, discipline and good sense, and don’t
part with them. Make your father truly happy by living right and showing sound
judgment. Make your parents proud, especially your mother [Bible - Proverbs 23:22-25].

I’m grateful to my advisor Dr. Nowak, a brilliant young scholar and a very
nice gentleman, who helped me greatly not only in my research, but also in my life.
Without his help, this dissertation would have been impossible. I’d also like to express
gratitude to Dr. Deller, Dr. Pierre and Dr. LePage for their willingness to serve on
my committee and for their helpful suggestions for my research and this dissertation.

I’m very thankful for the help of many friends at a very difficult time during my research.
I am obligated to mention the following names: Dr. Robert Barger and Dr. Paul Liu,
Tom and Ruth Shillair, Yi Wan, Dr. Yingcheng Dai and his wife Dr. Qing Li, Dr.
Dimitri Mihalas, Dr. William Schmidt, Dr. Jen-Je Su and his wife Dr. Shu-Shyan
Wong, Ms. Smith, Dr. Yingnian Wu and his wife Ms. Liping Li.

I’m thankful to many teachers. Among them, of particular importance and help to
me are my previous advisors: Prof. Siyong Zhou at Beijing Institute of Technology,
Prof. E. C. Young, Prof. Christopher Hunter at Florida State University (FSU),
and Prof. Kun-Mu Chen and Prof. Edward Rothwell at Michigan State University
(MSU).

I am also indebted to the Dept. of Mathematics at FSU and the Dept. of Electrical
and Computer Engineering at MSU. Without the teaching assistantships they provided,
my graduate studies in the States would have been impossible.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
  1.1 How does fMRI work?
  1.2 Characteristics of the fMRI Signal
  1.3 Outline for Detection Methods for fMRI
  1.4 Organization of This Dissertation

2 Pixel-wise Detection for fMRI
  2.1 Overview
  2.2 fMRI Signal Model
  2.3 Generalized Likelihood Ratio Tests
  2.4 Method 1: Magnitude Correlation Detection
  2.5 Method 2: Complex Correlation Detection
  2.6 Method 3: Nonlinear GLR Detection
    2.6.1 Derivation of the GLRT Test Statistic
    2.6.2 Invariance of GLRT Test Statistics
    2.6.3 Upperbound of GLRT Test Statistics
    2.6.4 Asymptotic of GLRT Test Statistics
    2.6.5 Threshold Selection Based on Numerical Simulation
    2.6.6 Distribution of GLRT Test Statistics
  2.7 Comparisons of the Three Detectors
  2.8 A Simulated fMRI Study

3 Multi-scale Detection for fMRI
  3.1 Overview
    3.1.1 Spatial Modeling and Outline for the Method
    3.1.2 General Setting of Bayesian Image Segmentation
    3.1.3 Multi-scale Image Segmentation Methods and Advantages
  3.2 Multi-scale Analysis
    3.2.1 1-D Multi-resolution Analysis (MRA)
    3.2.2 Properties of the Discrete Wavelet Transform
  3.3 Image Segmentation by Multi-scale Markov Modeling Discrete Field
    3.3.1 Main Points of Algorithm I
    3.3.2 Simulation Result
  3.4 Image Segmentation by Multi-scale Hidden Markov Model (MHMM) of Wavelet Coefficients
    3.4.1 Likelihood Function for fMRI Data
    3.4.2 Key Point for Edge Detection and a Prior Distribution for the Wavelet Coefficients
    3.4.3 A Failure Modeling
    3.4.4 Solution for Joint a Posteriori State Probability
    3.4.5 Marginal a Posteriori State Probability Calculation
    3.4.6 Image Label Estimation
    3.4.7 Extension to 2 Dimensions
    3.4.8 A Simulation of the New Segmentation Method
  3.5 Processing Results for fMRI Data by Two-Step Approach

4 Discussions and Conclusions
  4.1 For Pixelwise Detection
  4.2 Consideration on Spatial Information
  4.3 Epilogue

5 Appendices
  5.1 Appendix I: Notations and Some Results on Projection Matrix
  5.2 Appendix II: Some Preliminary Results on $\chi^2$ and F Distribution
  5.3 Appendix III: Definition of Rician, Rayleigh and t Distribution
  5.4 Appendix IV: Analysis of t Test Used in fMRI Detection
  5.5 Appendix V: Principle of Invariant Test
  5.6 Appendix VI: SMAP Algorithm for Multi-scale Image Segmentation
    5.6.1 A Prior Consideration
    5.6.2 Likelihood Function
    5.6.3 Criterion and Solution

BIBLIOGRAPHY

LIST OF TABLES

2.1 $P_d$ with $P_f$ = 0.01, N = 120
2.2 $P_d$ with $P_f$ = 0.025, N = 120
2.3 $P_d$ with $P_f$ = 0.05, N = 120

LIST OF FIGURES

1.1 A block diagram of an MRI system.
1.2 A series of images are acquired in fMRI to detect neuronal activity.
1.3 Illustration of fMRI stimulus and responses. a) Controlled stimulus; b) Response from an activated area; c) Response from an inactivated area.
2.1 (a) Activation-baseline pattern for an fMRI experiment; (b) Modeling the activated pixel as a system.
2.2 One time series from a real fMRI experiment. (a) Real part; (b) Imaginary part; (c) Phase.
2.3 Structure of matched subspace detector.
2.4 Three curves comparing the performance of three detectors. (a) $a/\sigma$ = 1; (b) $a/\sigma$ = 3.162; (c) $a/\sigma$ = 10.
2.5 A simulated fMRI experiment illustrating pixel-wise detection by three detectors. (a) Brain image with simulated activation region highlighted; (b) MC test results: $P_d$ = 0.77; (c) CC test results: $P_d$ = 0.70; (d) GLRT results: $P_d$ = 0.79.
3.1 Illustration of image segmentation.
3.2 Nested scale spaces and wavelet spaces.
3.3 Computation of DWT by filter bank.
3.4 Synthesis by filter bank.
3.5 Pyramid structure of the MSRF. (a) Continuous image Y at different scales. (b) The random field at each scale is causally dependent on the coarser scale field above it.
3.6 Neighborhood structure used in Algorithm I. (a) Quad-tree structure used for 2-D case; (b) Binary tree structure used for 1-D case.
3.7 One simulated image segmentation using Algorithm I. (a) Original noisy image; (b) Segmentation result.
3.8 Binary data tree structure for Haar wavelet analysis.
3.9 Propagation of wavelet coefficients (three scales).
3.10 Wavelet-based HMM.
3.11 Conversion of a 2-D image to 1-D sequence.
3.12 One simulated image segmentation by Algorithm II. a) Noisy image; b) Detected edges; c) Segmented image.
3.13 Processing results for fMRI data by two-step approach. a) One fMRI correlation image; b) Segmented image of (a), also final detection results by combinational use of image segmentation and single pixel detection; (c) Another correlation image; (d) Segmentation and detection result of (c).
5.1 Experiment setup and two hypotheses for t-test used in fMRI detection.
5.2 Invariant Test.
5.3 Segmentation Result Using SMAP Algorithm.

CHAPTER 1

Introduction

No matter how advanced the computer is, it is still no match for the
human brain, which to this day remains an unfathomable enigma.

Li Hongzhi, Zhuan Falun

Magnetic resonance imaging (MRI) is a powerful diagnostic imaging technique
based on the principle of nuclear magnetic resonance (NMR), describing the
interaction of nuclei and magnetic fields [31]. A block diagram of an MRI system is shown in
Figure (1.1), adapted from [31] and [39]. It is a very complicated system, embracing
many aspects of electrical engineering. The patient serves both as a transmitter and
as a receiver.

There are three types of magnetic fields in an MRI system [39, 67]. A static and
very strong magnetic field is generated by huge superconducting magnets. A much
weaker pulsed radio frequency (RF) field is employed to generate MR signals. Three sets of
orthogonal gradient fields are used for imaging purposes, i.e., to spatially resolve the
patient’s small structures to form an image. Although other imaging methods such
as projection methods do exist, the current trend for MR imaging is to use Fourier
inversion. The first set of gradient pulses uses linear changes in field strength to
localize a region of interest in the subject’s body to be imaged (“slice selection”).
The second set of gradient pulses employs linear changes in frequency to distinguish
columns in an image (“frequency encoding”). The third set of gradient pulses
utilizes linear changes in phase to distinguish the rows (“phase encoding”) [39, 12].

While traditional MRI provides only static images for analyzing anatomical structure,
functional MRI (fMRI), a newer imaging modality that is based on MRI and
emerged during the past decade, acquires a series of images to detect neural
activity, that is, to locate where, and how, the brain responds to certain stimuli
[12, 37]. In other words, the central task for fMRI is to obtain maps of active and
non-active regions of the brain corresponding to specific stimuli. Figure (1.2) illustrates a
series of images acquired in an fMRI experiment. In this figure, the small white square
indicates an activated region; the small black square indicates an inactive region.

Compared with other imaging techniques, such as X-ray computerized tomography
(CT), MRI is considered safer since it does not require the subject to be exposed to
ionizing radiation [31]. MR images are also of high contrast and resolution. MRI
provides more information since MR signals depend on several tissue parameters [31,
37, 68]. In addition to the spin density $\rho(\mathbf{r})$, the number of NMR-visible spins in a
given region, there are two principal relaxation times, each of which, in principle,
can be used individually or combined [12]. MRI does not use exogenous agents, an
advantage over another popular brain mapping technique known as positron emission
tomography (PET).

Research on fMRI involves substantial knowledge in physics, physiology,
neurology, and psychology. This dissertation, however, concentrates on the signal processing
aspects. Note that the materials in this chapter are for illustrative purposes only; they
are not meant to be comprehensive. For details, the reader is encouraged to refer to
[12, 31, 37, 39, 68], which are the main sources for materials in this chapter.

 

Footnote (positron): An elementary particle with the mass of an electron and a charge of the same
magnitude as the electron’s but positive.

 

Figure 1.1. A block diagram of an MRI system.
[Block labels recoverable from the scan: magnet, gradient coils, RF coil, receiver, amplifiers, 90-degree phase shifter, phase detectors, filters, power amplifier, gradient pulse, RF pulse, gate, continuous RF wave, frequency synthesizer, computer.]

 

 

 

 

 

 

Figure 1.2. A series of images are acquired in fMRI to detect neuronal activity.

1.1 How does fMRI work?

In short, there are two foundations [12, 37]. The first is physiological; the second
comes from physics, namely magnetic susceptibility.
Blood contains iron (in hemoglobin). Neural activity is linked to blood oxygenation
levels in blood vessels close to active neurons. This relationship is called the Blood
Oxygenation Level Dependent (BOLD) effect [66]. Specifically, neuronal activity
causes an excess of oxygenation in the blood near active brain tissue.
De-oxygenated blood is more paramagnetic than oxygenated blood. Changes in the
local concentration of de-oxyhemoglobin within the brain lead to alterations in the
MR signal. Subtle variations in the magnetic susceptibility of oxygenated and
de-oxygenated blood that are detected in the MR signal therefore indicate neural activity.

For more details, refer to [12, 37].

 

Footnote (magnetic susceptibility): Any object placed in a magnetic field will magnetize to a degree
slightly more than (paramagnetic) or less than (diamagnetic) the applied field. The relationship
between the field experienced within a sample and the applied field is known as the magnetic
susceptibility, calculated as the ratio of the internal field to the applied field.

1.2 Characteristics of the fMRI Signal

Currently, the procedure for fMRI experiments is to instruct the subject to perform
experimental (E) and control (C) tasks in an alternating sequence of some design [12].
Refer to Figure (1.3a). Specifically, the subject is instructed to remain relaxed in the
control (resting) state, and to perform some kind of consecutive and repetitive task
(for example, finger tapping) in the activation (experiment) state [36]. In addition
to the controlled stimulus, Figure (1.3) also sketches two hypothetical responses, one of which
may be from the white spot (active pixel) in Figure (1.2) and the other of which may
be from the black spot (inactive pixel) in Figure (1.2).

It takes some time for the fMRI signal to reach its peak after the onset of the stimulus;
this delay is called the response latency [12].

An fMRI time series has a complicated structure. It contains noise varying with
anatomical location. There is random noise as well as structured noise due to
instrumental (MR system characteristics), physiological (cardiac and respiratory
pulsations), and experimental (such as patient motion) factors [2, 12, 29, 38].

In addition to this complicated structure, the temporal and spatial characteristics
of the time series are also unknown [29]. All these render the analysis of fMRI time
series very difficult.

The hemodynamic signal changes in fMRI during brain activation are extremely
small, from 2 to 5% at moderate magnetic field strengths (1.5 T) [12].

1.3 Outline for Detection Methods for fMRI

So far, most fMRI detection methods are only pixel-wise. Generally, changes in
neural activity are analyzed using a statistical parametric map (SPM) [69],
which is a two-dimensional (2-D) image of a test statistic determined at each pixel
by some operation between signal and reference. Originally, t statistics were usually
used [14]. In repetitive experiments involving a dynamic time sequence of images,
a correlation method is now common, in which the correlation between each time
series and a reference signal is used to decide whether or not activity is present [3]. This
reduces to an F statistic test. Many generalizations and extensions of this simple idea
have been proposed [24, 25, 36, 55, 60, 64]. Chapter 2 also addresses pixel-wise detection
but, contrary to most methods, considers complex-valued data.

Detection methods exploiting the spatial information (correlation) of the fMRI signal have
also recently been proposed [21, 35, 58]. Most of them use Bayesian
strategies. For example, in [35, 17], Bayesian principles and Markov Random Field (MRF)
models are employed to facilitate joint spatio-temporal analysis of fMRI data.
Chapter 3 develops a new multi-scale framework for similar purposes.

1.4 Organization of This Dissertation

Chapter 2 stresses pixel-wise testing, which is most common in practice. First, a
model for the complex fMRI time series is proposed. The most distinctive feature of
the model is that the baseline nuisance component and reference signal component
share a common phase. A nonlinear detection problem ensues. Based on the classical
generalized likelihood ratio test (GLRT), three methods are investigated to attack
that detection problem: the conventionally used magnitude correlation (MC)
detector, the complex correlation (CC) detector newly proposed by Lai and Glover [36],
and a new nonlinear GLRT detector, with emphasis on the last one. The properties
of the nonlinear GLR detector are carefully studied and a method for threshold
selection from numerical simulations is presented. The nonlinear GLR detector has the
best performance among the three [48].

A multi-scale detection method exploiting spatial information is presented in
Chapter 3. Specifically, the idea of multi-scale image segmentation is used to improve
the performance of fMRI detection. A multi-scale Bayesian framework for image
segmentation is introduced at the beginning. Two image segmentation algorithms
are then investigated, with emphasis on the second one, whose application to fMRI
detection is the second part of this research.

Discussions and conclusions regarding the work of this dissertation are gathered
in Chapter 4.

Chapter 5 collects the Appendices, including some notations used throughout this
dissertation, some results on $\chi^2$, F and t distributions useful for the development in
Chapter 2, the t-test used in fMRI detection [14], the principle of, and some definitions
and theorems on, invariant tests [7, 56] used in Chapter 2, and the sequential maximum
a posteriori (SMAP) algorithm for multi-scale image segmentation [10].

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 1.3. Illustration of fMRI stimulus and responses. a) Controlled stimulus; b)
Response from an activated area; c) Response from an inactivated area.
[Panel (a) alternates between periods labeled E and C along the time axis t.]

CHAPTER 2

Pixel-wise Detection for fMRI

A tree broader than a man can embrace is born of a tiny shoot;
A dam greater than a river can overﬂow starts with a clod of earth;
A journey of a thousand miles begins with a single step.

Lao Tzu (500 BC, China), Tao Te Ching

2.1 Overview

In fMRI a series of MR images of the brain are acquired over time to detect neural
activity. As explained in Section (1.1), the BOLD effect can be used to obtain maps of
active and non-active regions of the brain. In order to achieve a high signal-to-noise ratio (SNR),
the spatial and temporal imaging resolution must be limited [34]. Unfortunately,
low resolution imaging may lead to a loss in signal information originating from
microvasculature [60]. Hence, there is a fundamental trade-off between resolution and
SNR in fMRI. It is therefore of great interest to develop reliable detection methods
for fMRI in the presence of noise.

Almost all fMRI tests are based on the image magnitude data. In standard
practice, the raw MRI data is reconstructed and the magnitude is taken to eliminate the
(unknown) phase. I focus in this chapter on repetitive experiments involving
a dynamic time sequence of images. Under various assumptions and experimental
setups, the fMRI detection problem reduces to well-known statistical tests including
t-tests and F-tests [69].

When comparing two groups of images (“rest state” and “active state” images),
a t-test is usually used [14]; it is derived in Appendix IV of Chapter
5. Another common approach to fMRI detection, called magnitude correlation (MC)
detection in this dissertation, is based on a test statistic obtained at each pixel by
correlating the magnitude time-series with a reference signal, which is assumed to
be known and representative of the BOLD response [3]. Many generalizations and
extensions of this simple idea have been proposed [24, 25, 36, 55, 58, 60, 64]. For
example, Lai and Glover recently proposed a complex correlation (CC) test based
on the complex data (i.e., image data before taking the magnitude at each pixel), in
order to take advantage of phase information in the data and improve the detectability
of fMRI responses [36].
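The idea behind MC detection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the ordinary sample correlation coefficient between the magnitude series and the reference, with a one-sided threshold. The exact MC statistic of this dissertation is developed in Section (2.4), and the function names and the threshold value below are hypothetical.

```python
import cmath
import math


def correlation_coefficient(y, r):
    """Sample correlation coefficient between a real series y and a reference r."""
    if len(y) != len(r):
        raise ValueError("series and reference must have the same length")
    n = len(y)
    y_mean = sum(y) / n
    r_mean = sum(r) / n
    num = sum((yi - y_mean) * (ri - r_mean) for yi, ri in zip(y, r))
    den = math.sqrt(sum((yi - y_mean) ** 2 for yi in y)
                    * sum((ri - r_mean) ** 2 for ri in r))
    # A constant series carries no evidence of activation (assumed convention).
    return num / den if den > 0 else 0.0


def magnitude_correlation_detect(x, r, threshold):
    """Correlate |x| with r and compare against a threshold (hypothetical value)."""
    rho = correlation_coefficient([abs(v) for v in x], r)
    return rho, rho > threshold


# Noise-free toy pixel: baseline 10, response 0.5, arbitrary phase 0.7 rad.
reference = [0, 0, 1, 1] * 3
pixel = [(10 + 0.5 * rk) * cmath.exp(0.7j) for rk in reference]
rho, active = magnitude_correlation_detect(pixel, reference, threshold=0.5)
```

Taking the magnitude removes the unknown phase before correlating, which is why the arbitrary phase in the toy pixel has no effect on `rho`.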

The CC test statistic is F-distributed and the detector has a constant false-alarm
rate (CFAR) property, which means that a specified false-alarm rate, i.e., the
probability of deciding a pixel is active when in fact it is not, can be achieved irrespective of
the unknown parameters. Throughout this dissertation, the false-alarm rate is denoted
by $P_f$ and the probability of detection is denoted by $P_d$. Despite the CFAR property,
the CC test focuses only on the response component (called the signal component) of the
data and ignores the baseline component (the nuisance component in a general setting)
of the data. The baseline component does not contain information relevant to the
response itself (and is hence called the nuisance component), but it does contain important
information about the phase. In this chapter, a new test is proposed based on the
GLRT principle that allows us to incorporate the phase information contained in the
baseline component.
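The false-alarm rate of a detector can be estimated empirically by simulating data without activation and counting how often the statistic exceeds a threshold. The sketch below is a minimal illustration, not the procedure of this dissertation: it assumes a magnitude-correlation statistic, complex white Gaussian noise, and an empirical-quantile rule for the threshold. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def magnitude_correlation(x, r):
    """Correlation coefficient between |x| and the reference r (assumed statistic)."""
    y = [abs(v) for v in x]
    n = len(y)
    y_mean, r_mean = sum(y) / n, sum(r) / n
    num = sum((yi - y_mean) * (ri - r_mean) for yi, ri in zip(y, r))
    den = math.sqrt(sum((yi - y_mean) ** 2 for yi in y)
                    * sum((ri - r_mean) ** 2 for ri in r))
    return num / den if den > 0 else 0.0


def simulate_null_statistics(r, a, theta, sigma, trials, seed=0):
    """Statistics for inactive pixels: baseline a*exp(i*theta) plus complex noise."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # real and imaginary parts each carry half the unit variance
    phase = cmath.exp(1j * theta)
    stats = []
    for _ in range(trials):
        x = [a * phase + sigma * complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
             for _ in r]
        stats.append(magnitude_correlation(x, r))
    return stats


def empirical_threshold(null_stats, pf):
    """Smallest observed value exceeded by at most a fraction pf of null statistics."""
    ordered = sorted(null_stats)
    allowed = int(pf * len(ordered) + 1e-9)  # number of tolerated false alarms
    return ordered[max(len(ordered) - allowed - 1, 0)]


def false_alarm_rate(null_stats, threshold):
    """Fraction of null statistics strictly above the threshold."""
    return sum(1 for s in null_stats if s > threshold) / len(null_stats)


reference = ([0] * 5 + [1] * 5) * 4
null_stats = simulate_null_statistics(reference, a=10.0, theta=0.7,
                                      sigma=1.0, trials=500)
threshold = empirical_threshold(null_stats, pf=0.05)
estimated_pf = false_alarm_rate(null_stats, threshold)
```

Repeating the simulation for several values of `a` and `sigma` and comparing the resulting thresholds is one simple way to check numerically whether a statistic behaves as CFAR.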

In this chapter simple pixel-wise testing is stressed, based on a Gaussian white
noise observation model. Pixel-wise testing ignores spatial relationships in fMRI
data, which are considered in Chapter 3. However, since the focus of this chapter
is to assess the potential benefits of fMRI detection using complex data, a simple data
model and testing procedure is employed to explore this basic issue. The assumptions
are perhaps too simplistic in many practical cases. However, it is possible to extend
the results and conclusions to more elaborate approaches based on more realistic data
and/or correlated noise models, potentially accounting for uncertainties in the BOLD
response and/or different nuisance components. Such extensions are briefly discussed
in the conclusions of Chapter 4.

This chapter is organized as follows. In Section (2.2), a basic model for fMRI
data is reviewed. Three tests will be studied, all of which may be interpreted as
GLRTs under different data model assumptions. Therefore, before looking at each
method, the GLRT principle is briefly reviewed in Section (2.3). In Section (2.4) and
Section (2.5), the standard MC and recently proposed CC tests are examined and
the statistical properties of each are studied. The new GLRT for fMRI is derived in
Section (2.6), where its properties are explored in detail. In Section (2.7), the performance
of the MC test, CC test, and GLRT is compared in various regimes of the ratio of baseline
signal intensity to noise. Extensive Monte Carlo simulation is used to assess
the performance of the detectors. The results show that the GLRT does have a
CFAR property, and a simple rule for choosing the threshold to achieve a desired
$P_f$ is observed. The distribution of the GLRT statistic at a high ratio of baseline
signal intensity to noise is approximated from observations on threshold selection.
Numerical studies show that the GLRT outperforms the CC test. Specifically, for a
fixed false-alarm rate $P_f$, the GLRT’s detection rate $P_d$ is higher than that of the
CC test. Furthermore, the GLRT also performs significantly better than the MC test
at low baseline signal intensity. The performances of the GLRT and MC detectors
are roughly the same at high baseline signal intensity, and in such situations both
perform better than the CC detector. In Section (2.8), the performance of all three
detectors is demonstrated in a simulated fMRI experiment.

Section (2.9) contains the three detectors for the same model but with known
noise variance, which is a by-product of my research.

2.2 fMRI Signal Model

Due to phase errors which are difficult to control, the signal component of the
measurements occurs in both real and imaginary channels [41, 6]. This suggests the
following simple model for an fMRI pixel time-series. Suppose there are $N$ images
acquired in the experiment. Let $\mathbf{x}$ denote an $N \times 1$ vector containing the time-series
data from one pixel:

$$\mathbf{x} = (a\mathbf{1} + b\mathbf{r})e^{i\theta} + \sigma\mathbf{n}_c. \tag{2.1}$$

The data vector $\mathbf{x}$ consists of three complex-valued components. The first
component $a\mathbf{1}$ is a constant (DC) baseline component, where $\mathbf{1}$ denotes an $N \times 1$ vector
of ones and $a > 0$ is the amplitude of the DC component. This vector represents the
average value of the time-series. The baseline component model proposed here is the
simplest version of the nuisance component. The second component $b\mathbf{r}$ is the
oscillatory response (signal) component. The vector $\mathbf{r}$ is a reference function that models
the expected response characteristic. The amplitude $b$ characterizes the strength of
the response. In the absence of activity, $b = 0$.
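To make the model concrete, the following sketch draws one synthetic pixel time-series with a baseline, a response, a common phase, and additive complex white Gaussian noise of unit variance scaled by a noise level. The function name, the toy reference vector, and the parameter values are assumptions chosen only for illustration.

```python
import cmath
import math
import random


def simulate_pixel(a, b, theta, sigma, r, rng=None):
    """Draw one time-series x = (a*1 + b*r) * exp(i*theta) + sigma * n_c.

    n_c is standard complex Gaussian noise (zero mean, unit variance), so its
    real and imaginary parts each have variance 1/2.
    """
    rng = rng or random.Random(0)
    phase = cmath.exp(1j * theta)
    s = math.sqrt(0.5)
    return [(a + b * rk) * phase
            + sigma * complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            for rk in r]


# Toy reference: 4 baseline frames followed by 4 activation frames, 3 cycles.
reference = ([0] * 4 + [1] * 4) * 3
# Active pixel with a 3% signal change relative to the baseline (illustrative).
active_pixel = simulate_pixel(a=10.0, b=0.3, theta=0.7, sigma=0.2, r=reference)
# Inactive pixel: same model with b = 0.
inactive_pixel = simulate_pixel(a=10.0, b=0.0, theta=0.7, sigma=0.2, r=reference)
# Noise-free series, useful for checking the phase coupling.
clean_pixel = simulate_pixel(a=10.0, b=0.5, theta=0.7, sigma=0.0, r=[0, 1, 0, 1])
```

With the noise level set to zero, every sample has the same phase and its magnitude equals the baseline plus the scaled reference, which is the phase coupling the model expresses.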

Typically, in fMRI studies, while the subject is under some baseline condition (for
example, at rest), a number of frames $N_b$ is acquired; then the subject is asked to
perform some task (for example, finger tapping) and a number of activation frames
$N_a$ is acquired, or vice versa. These constitute one cycle. During each cycle, the total
number of frames is thus $N = N_b + N_a$. This pattern is repeated for a number of
cycles. In Figure (2.1), adapted from [2], the activation-baseline pattern is represented
by a periodic rectangular waveform of period $N$, with 1 and 0 representing activation
and baseline, respectively [2].

Footnote: The signal model in Equation (2.1) is attributed to Dr. Nowak.
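The periodic rectangular waveform can be generated directly from the frame counts. The helper below is a small sketch; the function name and the `start_active` flag (which covers the "or vice versa" ordering) are assumptions made for illustration.

```python
def activation_baseline_pattern(n_active, n_baseline, n_cycles, start_active=True):
    """Rectangular waveform with 1 for activation frames and 0 for baseline frames."""
    if n_active < 0 or n_baseline < 0 or n_cycles < 0:
        raise ValueError("frame and cycle counts must be non-negative")
    active = [1] * n_active
    baseline = [0] * n_baseline
    cycle = active + baseline if start_active else baseline + active
    return cycle * n_cycles


# Two cycles of 2 activation frames followed by 3 baseline frames (period 5).
pattern = activation_baseline_pattern(n_active=2, n_baseline=3, n_cycles=2)
```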

One can think of the signal component as the response of a system whose input
is the activation-baseline pattern. See Figure (2.1b). In real problems, the response
signal $\mathbf{r}$ (the output of the system in Figure (2.1b)) is unknown. Friston et al. modeled
the system as a linear time invariant (LTI) system [22], which is questioned by [2,
3]. Several possible estimates for the reference signal are suggested in [2, 3]. The
first method is to use a delayed version of the activation-baseline pattern. It is the most
easily implemented. The disadvantage of this method is that the delay is not known
a priori, and it may vary from pixel to pixel. The second suggestion is to select the
response of one or more activated pixels as the reference signal. The third option is
to average the response of some activated pixels across cycles. The reference signal
would then be formed by periodically replicating this time-averaged cycle throughout
the time course. None of these approaches is perfect. Throughout this dissertation,
the reference signal $\mathbf{r}$ is assumed to be known.
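Two of the reference-signal estimates described above can be sketched as follows: a delayed copy of the activation-baseline pattern, and a cycle-averaged response replicated over the time course. These helpers are illustrative only. The function names are hypothetical, and padding the delayed pattern with the baseline value 0 is an assumption rather than a choice stated in [2, 3].

```python
def delayed_reference(pattern, delay):
    """Shift the activation-baseline pattern right by `delay` frames.

    The first `delay` frames are filled with the baseline value 0 (an assumption).
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    delay = min(delay, len(pattern))
    return [0] * delay + list(pattern[:len(pattern) - delay])


def cycle_averaged_reference(series, period):
    """Average a response over its cycles, then replicate the averaged cycle."""
    if period <= 0 or len(series) % period != 0:
        raise ValueError("series length must be a positive multiple of the period")
    n_cycles = len(series) // period
    cycle = [sum(series[c * period + k] for c in range(n_cycles)) / n_cycles
             for k in range(period)]
    return cycle * n_cycles


shifted = delayed_reference([1, 1, 0, 0, 1, 1, 0, 0], delay=1)
averaged = cycle_averaged_reference([1.0, 2.0, 3.0, 3.0, 4.0, 5.0], period=3)
```

For the third option in the text, `series` would itself be the average over a set of activated pixels, computed before calling `cycle_averaged_reference`.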

The baseline component and the signal component share a common phase $\vartheta$. Hence, we model this phase coupling by multiplying both components by the complex number $e^{i\vartheta}$, where $i = \sqrt{-1}$. In addition to these two components, an additive complex Gaussian white noise component $\sigma\mathbf{n}_c$ models errors primarily due to thermal noise in the patient [13, 18, 19, 44]. The term $\mathbf{n}_c$ denotes a standard (zero mean, unit variance) complex Gaussian vector of length N. The factor $\sigma$ scales the noise, resulting in a variance of $\sigma^2$. In general, the parameters of this model, $a$, $b$, $\vartheta$, and $\sigma$, are unknown and are different for each pixel time series. This model was compared with actual fMRI time series, and its assumptions are in good agreement with the data. Figure (2.2) shows the real part, imaginary part and phase of one time series from a real fMRI experiment; it illustrates the constant-phase idea in the model.

[Figure 2.1: panel (a) plots the activation-baseline pattern (levels 1 and 0) against scan number, with marks at Na, N = Na + Nb and 2N; panel (b) is a block diagram: Stimulus Pattern -> System (unknown response of activated pixel) -> Response Signal.]

Figure 2.1. (a) Activation-baseline pattern for an fMRI experiment; (b) Modeling the activated pixel as a system.
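The complex model described above, baseline plus response sharing one phase, plus scaled complex white noise, can be simulated as follows. This is a minimal sketch, not the simulation code used in this dissertation: the function name is hypothetical, and each of the real and imaginary noise parts is given unit variance before scaling by `sigma`, which matches the real-valued form n ~ N(0, I) used later in the chapter (halve that variance if a unit total variance is intended).

```python
import cmath
import random


def simulate_pixel(r, a, b, theta, sigma, rng):
    """Return x_j = (a + b*r_j) * exp(i*theta) + sigma * n_j for each scan j."""
    phasor = cmath.exp(1j * theta)  # common phase of baseline and response
    series = []
    for rj in r:
        # Assumption: real and imaginary noise parts are independent N(0, 1).
        noise = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        series.append((a + b * rj) * phasor + sigma * noise)
    return series


# Example: 4 scans, 10% response (mu = b/a = 0.1), phase pi/3, a/sigma = 10.
x = simulate_pixel([0.0, 0.0, 1.0, 1.0], 10.0, 1.0, cmath.pi / 3, 1.0, random.Random(0))
```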

The modeling of fMRI signals is a very complicated process [24, 29, 38]. Our model is still rough and incomplete. The noise structure in fMRI is very complicated, and our model does not try to capture the more complicated disturbances present in MRI data, such as nuisance components due to physiologic motion. For the sake of simplicity, and to demonstrate our method and ideas, the noise is assumed to be white Gaussian. The whiteness assumption does not change the problem essentially, since we can always use Cholesky factorization [56] to whiten the signal and leave the detection problem unchanged, although, of course, estimating the covariance of the noise is a challenging problem in itself. Parametric tests, to which the discussion in this chapter is confined, usually assume a Gaussian model for the underlying time series. Several researchers have challenged this assumption and have used nonparametric tests, including the Kolmogorov-Smirnov test, the Kruskal-Wallis test, and the Wilcoxon signed-rank test; for a summary, see [69].

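I succeeded in deriving the GLRT directly from the complex model in Equation (2.1):

$$L_c(\mathbf{x}) = \frac{\mathbf{x}^H\mathbf{x} - [\alpha_1^2 + \alpha_2^2]}{\mathbf{x}^H\mathbf{x} - \frac{1}{2}\left[\alpha_1^2 + \alpha_2^2 + \beta_1^2 + \beta_2^2 + \sqrt{(2\alpha_1\beta_1 + 2\alpha_2\beta_2)^2 + (\alpha_1^2 + \alpha_2^2 - \beta_1^2 - \beta_2^2)^2}\right]},$$

where H denotes complex conjugate transpose and

$$\alpha_1 = \frac{\mathbf{x}_R^T\mathbf{1}}{\sqrt{N}}, \qquad \beta_1 = \frac{\mathbf{x}_R^T\mathbf{r}}{\sqrt{N}}, \qquad \alpha_2 = \frac{\mathbf{x}_I^T\mathbf{1}}{\sqrt{N}}, \qquad \beta_2 = \frac{\mathbf{x}_I^T\mathbf{r}}{\sqrt{N}}.$$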

[Figure 2.2: three stacked panels titled Real Part, Imaginary Part and Phase, each plotted against scan number from 0 to 100; the plotted data are not recoverable.]

Figure 2.2. One time series from a real fMRI experiment. (a) Real part; (b) Imaginary part; (c) Phase.

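This form, however, is not well suited for mathematical analysis. I thus turned to analysis in the real domain. Because complex numbers can be interpreted as pairs of real numbers, I re-express the complex model (2.1) as a $2N \times 1$ dimensional real-valued model:

$$\mathbf{y} = \mathbf{S}\boldsymbol{\phi} + \mu\mathbf{H}\boldsymbol{\phi} + \sigma\mathbf{n}, \qquad (2.2)$$

where $\mu \triangleq b/a$ and

$$\mathbf{y} = \begin{bmatrix}\mathbf{x}_R\\ \mathbf{x}_I\end{bmatrix}, \quad \mathbf{S} = \begin{bmatrix}\mathbf{1} & \mathbf{0}\\ \mathbf{0} & \mathbf{1}\end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix}\mathbf{r} & \mathbf{0}\\ \mathbf{0} & \mathbf{r}\end{bmatrix}, \quad \boldsymbol{\phi} = \begin{bmatrix}a\cos\vartheta\\ a\sin\vartheta\end{bmatrix}, \quad \mathbf{n} = \begin{bmatrix}\mathbf{n}_{cR}\\ \mathbf{n}_{cI}\end{bmatrix}.$$

The subscripts R and I denote real and imaginary parts, respectively. The phase coupling in the complex model is manifest in this real model as a nonlinear coupling between the parameter $\mu$ and the parameter $\boldsymbol{\phi}$. The reader is reminded here that $\mu$ is the ratio of reference signal intensity to baseline signal intensity.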

Note here that this nonlinear model stands in marked contrast to the classical linear regression model:

$$\mathbf{y} = \mathbf{S}\boldsymbol{\phi}_1 + \mu\mathbf{H}\boldsymbol{\phi}_2 + \sigma\mathbf{n}, \qquad (2.3)$$

where $\boldsymbol{\phi}_1$ and $\boldsymbol{\phi}_2$ are independent. In the following it is shown that the CC test of Lai and Glover [36] can be derived from the linear model above. It is our contention that the nonlinear model is a more accurate representation of the physical fMRI problem, and, indeed, the new GLRT based on the nonlinear model outperforms the CC test.

2.3 Generalized Likelihood Ratio Tests

The likelihood ratio test (LRT) [33] is an optimal method for deciding which of two hypotheses (competing data models) best describes a set of observed data. The data model corresponding to each hypothesis is given by a probability density function (pdf). Unfortunately, to implement the LRT, the pdf under each hypothesis must be completely specified. The corresponding problem is called a simple hypothesis test. This is not the case in MRI. In the fMRI case, we have two hypotheses: $H_0$, BOLD response absent ($\mu = 0$), and $H_1$, BOLD response present ($\mu \neq 0$). Under hypothesis $H_0$, the vector $\boldsymbol{\phi}$ and the noise power $\sigma^2$ are unknown. Under hypothesis $H_1$, $\boldsymbol{\phi}$, $\sigma^2$ and $\mu$ are unknown. Because of these unknowns, in MRI we have what is called a composite hypothesis test.

There are two standard approaches to composite hypothesis testing. The Bayesian approach prescribes prior pdfs for the unknown parameters themselves, and the likelihoods are integrated against these pdfs to eliminate the dependence of the LRT on the unknown parameters. The other approach, the generalized likelihood ratio test (GLRT), is often preferable to Bayesian approaches because of its ease of implementation and less restrictive assumptions. Specifically, the GLRT does not require the specification of a priori probability distributions for the unknown parameters [33, 56]. For these reasons, this chapter focuses on the GLRT.

The idea of the GLRT comes from the LRT used in simple hypothesis testing, in which the pdf for each assumed hypothesis is completely known. Let $p_{H_i}(\mathbf{x}; \boldsymbol{\Theta}_i)$, $i = 0, 1$, denote the pdfs corresponding to the two hypotheses. Recall that $\mathbf{x}$ denotes the data. The argument $\boldsymbol{\Theta}_i$ denotes known parameters that specify the precise form of the pdf. For example, $\boldsymbol{\Theta}_i$ may represent the mean vector and covariance matrix of a multivariate Gaussian density. The LRT decides $H_1$ if

$$L(\mathbf{x}) = \frac{p_{H_1}(\mathbf{x}; \boldsymbol{\Theta}_1)}{p_{H_0}(\mathbf{x}; \boldsymbol{\Theta}_0)} > \eta,$$

where $\eta$ is the threshold, which can be chosen to achieve a desired false-alarm probability $P_f$. The likelihood ratio (LR) $L(\mathbf{x})$, a function of the data $\mathbf{x}$, is called the test statistic.

The GLRT is also based on the LR, but in the composite case the parameters are unknown. The key idea in the GLRT is to replace the unknown parameters by their maximum likelihood estimates (MLEs). In general, the GLRT decides $H_1$ if

$$L_G(\mathbf{x}) = \frac{p_{H_1}(\mathbf{x}; \hat{\boldsymbol{\Theta}}_1)}{p_{H_0}(\mathbf{x}; \hat{\boldsymbol{\Theta}}_0)} > \eta,$$

where $\hat{\boldsymbol{\Theta}}_1$ is the MLE of $\boldsymbol{\Theta}_1$ assuming $H_1$ is true, and $\hat{\boldsymbol{\Theta}}_0$ is the MLE of $\boldsymbol{\Theta}_0$ assuming $H_0$ is true. The MLE of a parameter is simply the value of the parameter that maximizes the corresponding pdf (i.e., the value that makes the observed data most likely).

The GLRT has no optimality property in general, but it asymptotically approaches the uniformly most powerful (UMP) test among invariant tests [7]. For more details on maximum likelihood estimation and the GLRT, see [33]. In the following sections we review the basic MC and CC tests and introduce the new nonlinear test for MRI detection using the GLRT.

2.4 Method 1: Magnitude Correlation Detection

The magnitude of MRI data is known to be Rician distributed [41]. To see this, note that $x_j$ in Equation (2.1), the jth observation in the time series, can be written as

$$x_j = (a + b r_j)\cos\vartheta + \sigma n_{Rj} + i\,[(a + b r_j)\sin\vartheta + \sigma n_{Ij}].$$

The two terms being independent, $z_j \equiv |x_j|$ is Rician distributed (see Definition (3) and Equation (5.1) in Appendix III). However, for large values of the ratio $a/\sigma$ (the ratio of baseline component intensity to noise standard deviation), the Rician density is well approximated by a Gaussian density because

$$\begin{aligned}
z_j \equiv |x_j| &= \sqrt{[(a + b r_j)\cos\vartheta + \sigma n_{Rj}]^2 + [(a + b r_j)\sin\vartheta + \sigma n_{Ij}]^2} \\
&= \sqrt{(a + b r_j)^2 + \sigma^2(n_{Rj}^2 + n_{Ij}^2) + 2(a + b r_j)\sigma(n_{Rj}\cos\vartheta + n_{Ij}\sin\vartheta)} \\
&= (a + b r_j)\sqrt{1 + \frac{2\sigma(n_{Rj}\cos\vartheta + n_{Ij}\sin\vartheta)}{a + b r_j} + \frac{\sigma^2}{(a + b r_j)^2}(n_{Rj}^2 + n_{Ij}^2)}.
\end{aligned}$$

Note that $n_{Rj}\cos\vartheta + n_{Ij}\sin\vartheta$ is nothing but another Gaussian random variable; we denote it by $n_j$. Also note that $n_j$ and $n_k$ are independent for $j \neq k$, and that $n_{Rj}^2 + n_{Ij}^2$ is a $\chi_2^2$ random variable. Under the assumptions that $a \gg \sigma$ and that $\mu = b/a$ is very small, the third term under the square root sign is much smaller than the second one and can therefore be neglected. In this case, applying the binomial expansion

$$\sqrt{1 + x} \approx 1 + \tfrac{1}{2}x, \qquad |x| \ll 1,$$

to the above equation leads to

$$z_j \approx a + b r_j + \sigma n_j.$$

Hence, the following Gaussian approximation is commonly assumed for MRI detection [3]:

$$\mathbf{z} \approx a\mathbf{1} + b\mathbf{r} + \sigma\mathbf{n}, \qquad (2.4)$$

where $\mathbf{z} = |\mathbf{x}|$, $\mathbf{n}$ is (real) Gaussian distributed, and $b = 0$ under $H_0$ and $b \neq 0$ under $H_1$. Hence, in this case $\boldsymbol{\Theta}_0 = [a\ \ \sigma^2]$ and $\boldsymbol{\Theta}_1 = [a\ \ b\ \ \sigma^2]$. Bear in mind that this approximation does not accurately model the data when $a/\sigma$ is small, as we shall see later in some examples.
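The quality of this Gaussian approximation can be checked numerically. The sketch below is illustrative only (the function name, sample size and seed are arbitrary choices): it draws magnitudes of a constant phasor plus complex noise and compares their sample mean and standard deviation with $a$ and $\sigma$. For large $a/\sigma$ the two agree closely; for small $a/\sigma$ the Rician mean is visibly biased above $a$.

```python
import math
import random


def magnitude_mean_std(a, sigma, theta, n_samples, rng):
    """Sample |a*exp(i*theta) + sigma*(n_R + i*n_I)| and return (mean, std)."""
    mags = []
    for _ in range(n_samples):
        re = a * math.cos(theta) + sigma * rng.gauss(0.0, 1.0)
        im = a * math.sin(theta) + sigma * rng.gauss(0.0, 1.0)
        mags.append(math.hypot(re, im))
    mean = sum(mags) / n_samples
    var = sum((m - mean) ** 2 for m in mags) / (n_samples - 1)
    return mean, math.sqrt(var)


rng = random.Random(0)
high = magnitude_mean_std(10.0, 1.0, math.pi / 3, 20000, rng)  # a/sigma = 10
low = magnitude_mean_std(0.5, 1.0, math.pi / 3, 20000, rng)    # a/sigma = 0.5
print("a/sigma = 10 :", high)
print("a/sigma = 0.5:", low)
```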

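The GLRT for the above detection problem results in the following test statistic [56, 57]:

$$t_1(\mathbf{z}) = (N - 1)[L_1(\mathbf{z}) - 1] = (N - 1)\frac{\|P_{\mathbf{r}}P_{\mathbf{1}}^{\perp}\mathbf{z}\|^2}{\|P_{\mathbf{r}}^{\perp}P_{\mathbf{1}}^{\perp}\mathbf{z}\|^2}, \qquad (2.5)$$

where

$$L_1(\mathbf{z}) = \frac{\|P_{\mathbf{1}}^{\perp}\mathbf{z}\|^2}{\|P_{\mathbf{1r}}^{\perp}\mathbf{z}\|^2}.$$

Here $P_A$ denotes the orthogonal projection onto the span of $A$, $P_A^{\perp} = I - P_A$, and $P_{\mathbf{1r}}$ projects onto the span of $\mathbf{1}$ and $\mathbf{r}$.

If $t_1(\mathbf{z}) > \eta_1$, then we decide $H_1$; otherwise we choose $H_0$. On the assumption that $\mathbf{z}$ is truly Gaussian, $t_1(\mathbf{z})$ is distributed as $F_{1,(N-1)}(\mathrm{SNR})$ [57], where $\mathrm{SNR} \equiv \mu^2 a^2/\sigma^2$. Refer to Appendix II in Chapter 5. The detector has the CFAR property.¹ Therefore, we can choose a threshold $\eta_1$ to achieve a desired $P_f$ irrespective of the signal-to-noise ratio, which is generally unknown a priori. This detector is called the magnitude correlation (MC) detector because the test statistic $t_1(\mathbf{z})$ is a monotonically increasing function of the squared correlation between the magnitude data $\mathbf{z}$ and the reference signal $\mathbf{r}$.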
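A minimal sketch of the MC statistic in Equation (2.5) follows. It is not the code used for the results in this chapter: the function name is hypothetical, and the reference is centered inside the function so that the condition $\mathbf{1}^T\mathbf{r} = 0$ holds for any input.

```python
def mc_statistic(z, r):
    """t1(z) = (N-1) * ||P_r P_1^perp z||^2 / ||P_r^perp P_1^perp z||^2."""
    n = len(z)
    z_mean = sum(z) / n
    r_mean = sum(r) / n
    zc = [v - z_mean for v in z]          # P_1^perp z: remove the baseline
    rc = [v - r_mean for v in r]          # centered reference (1^T rc = 0)
    szz = sum(v * v for v in zc)
    srr = sum(v * v for v in rc)
    szr = sum(p * q for p, q in zip(zc, rc))
    matched = szr * szr / srr             # energy along the reference
    return (n - 1) * matched / (szz - matched)


# Equivalent form: (N-1) * rho^2 / (1 - rho^2), rho = sample correlation.
print(mc_statistic([1.0, 2.0, 3.0, 5.0], [0.0, 0.0, 1.0, 1.0]))
```

Because only the centered data enter, the statistic is unchanged when the magnitude data are scaled or offset by constants, which is the practical content of the CFAR property.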

Unfortunately, the Gaussian approximation in Equation (2.4) is unreasonable when $a/\sigma \leq 3$. In fact, when $a/\sigma$ is small, the distribution of the test statistic $t_1(\mathbf{z})$ is not known, nor is it known whether the MC detector has the CFAR property. Determining a proper threshold to obtain a desired $P_f$ is therefore theoretically very difficult. How to solve this problem is explained together with the numerical results. Moreover, in this case one expects the performance of the MC test to suffer. This is indeed the case, as shown later by the numerical results.

¹ Note that the MC test statistic has a central $F_{1,(N-1)}$ distribution under $H_0$ ($\mu = 0$).

2.5 Method 2: Complex Correlation Detection

Recently, Lai and Glover proposed a complex correlation (CC) test based on the complex data, in order to take advantage of phase information in the data and improve the detectability of fMRI responses [36].

Here, the CC test statistic is shown to be F-distributed as well and to have the CFAR property. Recall the linear model

$$\mathbf{y} = \mathbf{S}\boldsymbol{\phi}_1 + \mu\mathbf{H}\boldsymbol{\phi}_2 + \sigma\mathbf{n}. \qquad (2.6)$$

The unknown parameters in this case are $\boldsymbol{\Theta}_0 = [\boldsymbol{\phi}_1^T\ \ \boldsymbol{\phi}_2^T\ \ \sigma^2]$ and $\boldsymbol{\Theta}_1 = [\boldsymbol{\phi}_1^T\ \ \boldsymbol{\phi}_2^T\ \ \mu\ \ \sigma^2]$.

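The GLRT based on this model in fact coincides with the CC test [36] and is given by

$$t_2(\mathbf{y}) = (N - 1)[L_2(\mathbf{y}) - 1] = (N - 1)\frac{\|P_H\mathbf{y}\|^2}{\|P_{SH}^{\perp}\mathbf{y}\|^2} = (N - 1)\frac{\|P_H P_S^{\perp}\mathbf{y}\|^2}{\|P_H^{\perp}P_S^{\perp}\mathbf{y}\|^2}, \qquad (2.7)$$

where

$$L_2(\mathbf{y}) = \frac{\|P_S^{\perp}\mathbf{y}\|^2}{\|\mathbf{y}\|^2 - \|P_S\mathbf{y}\|^2 - \|P_H\mathbf{y}\|^2} = \frac{\mathbf{y}^T P_S^{\perp}\mathbf{y}}{\mathbf{y}^T P_{SH}^{\perp}\mathbf{y}}. \qquad (2.8)$$

If $t_2(\mathbf{y}) > \eta_2$, then we decide $H_1$; otherwise, we choose $H_0$. This test is called the complex correlation (CC) test because it is based on the correlation between the reference signal and the real and imaginary components of the complex data. The pdf of $t_2(\mathbf{y})$ is non-central $F_{2,2(N-1)}(\mathrm{SNR})$, where again $\mathrm{SNR} = \mu^2 a^2/\sigma^2$, and thus the detector has the CFAR property.²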
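The CC statistic of Equations (2.7) and (2.8) can be computed directly from the complex samples, since $\|\mathbf{y}\|^2 = \sum_j |x_j|^2$ and the projections onto the columns of $\mathbf{S}$ and $\mathbf{H}$ reduce to a complex mean and a complex correlation with the reference. The sketch below is illustrative (hypothetical function name) and centers the reference so that $\mathbf{1}^T\mathbf{r} = 0$.

```python
def cc_statistic(x, r):
    """t2(y) = (N-1) * ||P_H y||^2 / ||P_SH^perp y||^2 for complex data x."""
    n = len(x)
    r_mean = sum(r) / n
    rc = [v - r_mean for v in r]                    # centered reference
    srr = sum(v * v for v in rc)
    energy = sum(abs(v) ** 2 for v in x)            # ||y||^2
    e_s = abs(sum(x)) ** 2 / n                      # ||P_S y||^2 (baseline energy)
    corr = sum(v * w for v, w in zip(x, rc))        # x_R^T rc + i * x_I^T rc
    e_h = abs(corr) ** 2 / srr                      # ||P_H y||^2 (response energy)
    return (n - 1) * e_h / (energy - e_s - e_h)


x = [complex(1, 2), complex(2, 1), complex(4, 0), complex(3, 3)]
r = [0.0, 0.0, 1.0, 1.0]
print(cc_statistic(x, r))
```

The value does not change if the data are multiplied by any nonzero complex constant or shifted by a complex offset, so the unknown phase, scale and baseline drop out.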

Despite this desirable property, the drawback to this test is that it is based on a less
accurate model. The CC test focuses only on the response component of the data and
ignores the baseline component. The DC component does not contain information
relevant to the response itself, but it does contain important information about the
phase. Although the phase is a nuisance parameter in the testing problem, more
accurate knowledge of the phase can improve the detectability of the fMRI response.
As noted previously, the phase-coupling between the nuisance and signal response

components of the data dictates the nonlinear model in Equation (2.2). Therefore, I

next derive a new GLRT based on this more accurate model.

 

² Note that the CC test statistic has a central $F_{2,2(N-1)}$ distribution under $H_0$ ($\mu = 0$).

 

 

 

Figure 2.3. Structure of matched subspace detector.

Before moving to the next section, I summarize briefly. The MC detector and the CC detector have the same structure, shown in Figure (2.3) (adapted from [57]), which is known in the signal processing literature as a matched subspace detector [57]. First, the interference is removed by projecting the data onto the subspace orthogonal to it. This projector is also termed an interference-rejecting or null-steering filter [57]. Then the resulting data are further projected onto another low-rank subspace that is matched to the signal component, and the energy is taken. This projector is usually called a matched subspace filter. Since the noise level is unknown, this energy is then compared with the energy in the component orthogonal to the signal subspace. The ratio is computed and compared with a threshold for a decision [57].

2.6 Method 3: Nonlinear GLR Detection

In this section, a new nonlinear test based on the GLRT principle is derived that incorporates the phase information contained in the baseline component [48]. The unknown parameters in model (2.2) are $\boldsymbol{\Theta}_0 = [\boldsymbol{\phi}^T\ \ \sigma^2]$ and $\boldsymbol{\Theta}_1 = [\boldsymbol{\phi}^T\ \ \mu\ \ \sigma^2]$, under $H_0$ and $H_1$, respectively. Recall that the phase coupling introduces a nonlinear coupling between the parameters $\mu$ and $\boldsymbol{\phi}$ under $H_1$. This nonlinearity makes the MLEs very difficult to compute but, remarkably, a closed-form solution for the GLRT statistic does exist (it is derived in the following subsection):

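$$t_3(\mathbf{y}) = [L_3(\mathbf{y}) - 1](N - 1), \qquad (2.9)$$

where

$$L_3(\mathbf{y}) = \frac{2\|P_S^{\perp}\mathbf{y}\|^2}{\|P_H^{\perp}\mathbf{y}\|^2 + \|P_S^{\perp}\mathbf{y}\|^2 - \sqrt{\|P_H\mathbf{y}\|^4 + \|P_S\mathbf{y}\|^4 + 2\|P_H\mathbf{y}\|^2\|P_S\mathbf{y}\|^2\cos 2\varphi}}, \qquad (2.10)$$

with

$$\cos\varphi(\mathbf{y}) = \frac{\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2}{\|\boldsymbol{\theta}_1\|\,\|\boldsymbol{\theta}_2\|} = \frac{\mathbf{y}^T\frac{\mathbf{H}\mathbf{S}^T}{N}\mathbf{y}}{\|P_H\mathbf{y}\|\,\|P_S\mathbf{y}\|}, \qquad (2.11)$$

and $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ are two sufficient statistics:

$$\boldsymbol{\theta}_1(\mathbf{y}) = \frac{1}{N}\mathbf{H}^T\mathbf{y}, \quad \text{and} \quad \boldsymbol{\theta}_2(\mathbf{y}) = \frac{1}{N}\mathbf{S}^T\mathbf{y}. \qquad (2.12)$$

As usual, given a specified threshold $\eta_3$, we decide $H_1$ if $t_3(\mathbf{y}) > \eta_3$, and $H_0$ otherwise.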
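Equations (2.9) to (2.12) translate into a short routine. The following is a sketch under stated assumptions rather than the original implementation: the function name is hypothetical, the reference is centered internally so that $\mathbf{1}^T\mathbf{r} = 0$, and the complex samples are used in place of the stacked real vector $\mathbf{y}$ (the two sufficient statistics then become two complex numbers, and $\varphi$ is the angle between them).

```python
import math


def glrt_statistic(x, r):
    """t3(y) = (N-1) * (L3(y) - 1), with L3 from Equation (2.10)."""
    n = len(x)
    r_mean = sum(r) / n
    rc = [v - r_mean for v in r]                      # centered reference
    srr = sum(v * v for v in rc)
    energy = sum(abs(v) ** 2 for v in x)              # ||y||^2
    s = sum(x)                                        # S^T y as a complex number
    h = sum(v * w for v, w in zip(x, rc))             # H^T y as a complex number
    e_s = abs(s) ** 2 / n                             # ||P_S y||^2
    e_h = abs(h) ** 2 / srr                           # ||P_H y||^2
    if e_s > 0.0 and e_h > 0.0:
        cos_phi = (h.real * s.real + h.imag * s.imag) / (abs(h) * abs(s))
        cos_2phi = 2.0 * cos_phi ** 2 - 1.0
    else:
        cos_2phi = 1.0                                # angle undefined; term vanishes
    root = math.sqrt(max(0.0, e_h ** 2 + e_s ** 2 + 2.0 * e_h * e_s * cos_2phi))
    l3 = 2.0 * (energy - e_s) / ((energy - e_h) + (energy - e_s) - root)
    return (n - 1) * (l3 - 1.0)


x = [complex(1, 2), complex(2, 1), complex(4, 0), complex(3, 3)]
r = [0.0, 0.0, 1.0, 1.0]
print(glrt_statistic(x, r))
```

When the baseline and response estimates are exactly in phase ($\varphi = 0$), the routine returns the same value as the CC statistic of Equation (2.7).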

2.6.1 Derivation of the GLRT Test Statistic

In my derivations, I assume $\mathbf{1}^T\mathbf{r} = 0$ and $\mathbf{r}^T\mathbf{r} = N$ without loss of generality, so $\mathbf{S}^T\mathbf{H} = \mathbf{H}^T\mathbf{S} = \mathbf{0}_{2\times 2}$ and $\mathbf{S}^T\mathbf{S} = \mathbf{H}^T\mathbf{H} = N\mathbf{I}_{2\times 2}$. The second condition can always be satisfied, since we can always normalize the reference signal without changing the problem essentially. The first condition is more difficult to meet. But the same idea of decomposing one orthogonal projection operator into two oblique projection operators, as in [4, 57], can be used to achieve the result, even under more complicated nuisance and signal (reference) structures, i.e., when the nuisance component is not just a constant baseline $\mathbf{1}$ and the signal component is not just one reference signal $\mathbf{r}$. Possibly, both of them may be adaptively selected from real data to contain several components (see the discussion in Chapter 4). However, the analysis of the test statistic's properties (such as invariance), and hence the determination of the threshold, is then much more complicated than presented in this dissertation. Even so, the properties of the corresponding linear detector may still be used as a guide.

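Let $\hat{\sigma}_0^2$ and $\hat{\sigma}_1^2$ denote the MLEs of the noise variance under hypotheses $H_0$ and $H_1$, respectively. Recall that the GLRT is based on $p_{H_1}(\mathbf{x}; \hat{\boldsymbol{\Theta}}_1)/p_{H_0}(\mathbf{x}; \hat{\boldsymbol{\Theta}}_0)$. It is easy to show that this statistic reduces (up to a monotone transformation) to

$$L_3(\mathbf{y}) = \min(\hat{\sigma}_0^2)/\min(\hat{\sigma}_1^2). \qquad (2.13)$$

Recall that under a Gaussian distribution the maximum likelihood estimate is the same as the least-squares estimate. Therefore, the calculation of $\min\hat{\sigma}_0^2$ is straightforward:

$$\min\hat{\sigma}_0^2 = \|P_S^{\perp}\mathbf{y}\|^2. \qquad (2.14)$$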

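However, determining $\min(\hat{\sigma}_1^2)$ is much more difficult because of the nonlinear coupling between the two unknowns $\mu$ and $\boldsymbol{\phi}$. To circumvent this difficulty, I first decompose $\mathbf{y}$ and $\mathbf{y} - \mathbf{S}\boldsymbol{\phi} - \mu\mathbf{H}\boldsymbol{\phi}$ into three orthogonal components, i.e.,

$$\mathbf{y} = P_{SH}^{\perp}\mathbf{y} + P_S\mathbf{y} + P_H\mathbf{y},$$

and then

$$\begin{aligned}
\hat{\sigma}_1^2 &= \|\mathbf{y} - \mathbf{S}\boldsymbol{\phi} - \mu\mathbf{H}\boldsymbol{\phi}\|^2 \\
&= \|P_{SH}^{\perp}\mathbf{y} + (P_S\mathbf{y} - \mathbf{S}\boldsymbol{\phi}) + (P_H\mathbf{y} - \mu\mathbf{H}\boldsymbol{\phi})\|^2 \\
&= \|P_{SH}^{\perp}\mathbf{y}\|^2 + \|P_S\mathbf{y} - \mathbf{S}\boldsymbol{\phi}\|^2 + \|P_H\mathbf{y} - \mu\mathbf{H}\boldsymbol{\phi}\|^2 \\
&= \|P_{SH}^{\perp}\mathbf{y}\|^2 + \|\mathbf{S}\boldsymbol{\theta}_2 - \mathbf{S}\boldsymbol{\phi}\|^2 + \|\mathbf{H}\boldsymbol{\theta}_1 - \mu\mathbf{H}\boldsymbol{\phi}\|^2 \\
&= \|P_{SH}^{\perp}\mathbf{y}\|^2 + N\left[\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1 + \boldsymbol{\theta}_2^T\boldsymbol{\theta}_2 + (1 + \mu^2)\boldsymbol{\phi}^T\boldsymbol{\phi} - 2(\boldsymbol{\theta}_2 + \mu\boldsymbol{\theta}_1)^T\boldsymbol{\phi}\right],
\end{aligned}$$

where $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ are given by Equation (2.12).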
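Now $\min\hat{\sigma}_1^2$ is equivalent to $\min J$, where $J = (1 + \mu^2)\boldsymbol{\phi}^T\boldsymbol{\phi} - 2(\boldsymbol{\theta}_2 + \mu\boldsymbol{\theta}_1)^T\boldsymbol{\phi}$. Setting the partial derivatives of $J$ with respect to $\mu$ and $\boldsymbol{\phi}$ to zero results in:

$$\frac{\partial J}{\partial\mu} = 2\mu\boldsymbol{\phi}^T\boldsymbol{\phi} - 2\boldsymbol{\theta}_1^T\boldsymbol{\phi} = 0,$$

$$\frac{\partial J}{\partial\boldsymbol{\phi}} = 2(1 + \mu^2)\boldsymbol{\phi} - 2(\boldsymbol{\theta}_2 + \mu\boldsymbol{\theta}_1) = 0.$$

We then get

$$\hat{\mu} = \frac{\boldsymbol{\theta}_1^T\hat{\boldsymbol{\phi}}}{\hat{\boldsymbol{\phi}}^T\hat{\boldsymbol{\phi}}}, \qquad (2.15)$$

$$\hat{\boldsymbol{\phi}} = \frac{\boldsymbol{\theta}_2 + \hat{\mu}\boldsymbol{\theta}_1}{1 + \hat{\mu}^2}. \qquad (2.16)$$

So now

$$\min\hat{\sigma}_1^2 = \|P_{SH}^{\perp}\mathbf{y}\|^2 + N(\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1 + \boldsymbol{\theta}_2^T\boldsymbol{\theta}_2) - N(1 + \hat{\mu}^2)\hat{\boldsymbol{\phi}}^T\hat{\boldsymbol{\phi}}.$$

Furthermore, note that

$$N(\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1 + \boldsymbol{\theta}_2^T\boldsymbol{\theta}_2) = \|P_H\mathbf{y}\|^2 + \|P_S\mathbf{y}\|^2, \quad \text{and} \quad \|P_{SH}^{\perp}\mathbf{y}\|^2 + \|P_S\mathbf{y}\|^2 + \|P_H\mathbf{y}\|^2 = \|\mathbf{y}\|^2,$$

which gives

$$\min\hat{\sigma}_1^2 = \|\mathbf{y}\|^2 - N(1 + \hat{\mu}^2)\hat{\boldsymbol{\phi}}^T\hat{\boldsymbol{\phi}} = \|\mathbf{y}\|^2 - \frac{N}{1 + \hat{\mu}^2}\left[\boldsymbol{\theta}_2^T\boldsymbol{\theta}_2 + 2\hat{\mu}\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2 + \hat{\mu}^2\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1\right].$$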
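Eliminating $\hat{\boldsymbol{\phi}}$ from Equations (2.15) and (2.16) (or setting the derivative of the above equation with respect to $\hat{\mu}$ to zero) shows that $\hat{\mu}$ must satisfy:

$$\frac{\boldsymbol{\theta}_2^T\boldsymbol{\theta}_2 + 2\hat{\mu}\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2 + \hat{\mu}^2\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1}{1 + \hat{\mu}^2} = \frac{\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2 + \hat{\mu}\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1}{\hat{\mu}}. \qquad (2.17)$$

Using this equation, $\min\hat{\sigma}_1^2$ can be further simplified:

$$\min\hat{\sigma}_1^2 = \|\mathbf{y}\|^2 - N\left(\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1 + \frac{\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2}{\hat{\mu}}\right). \qquad (2.18)$$

This equation is important in our derivation of an asymptotic expression for $L_3(\mathbf{y})$ in Subsection (2.6.4).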

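From Equation (2.17), $\hat{\mu}$ satisfies the quadratic equation $\hat{\mu}^2 + c\hat{\mu} - 1 = 0$ with

$$c = \frac{\boldsymbol{\theta}_2^T\boldsymbol{\theta}_2 - \boldsymbol{\theta}_1^T\boldsymbol{\theta}_1}{\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2}.$$

Since $\hat{\mu}_1\hat{\mu}_2 = -1$, there are two solutions of opposite signs:

$$\hat{\mu} = -\frac{c}{2} \pm \sqrt{1 + \left(\frac{c}{2}\right)^2}.$$

However, from Equation (2.18), to make sure $\hat{\sigma}_1^2$ is minimal, $\hat{\mu}$ must have the same sign as $\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2$, and so, for $\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2 > 0$, the unique solution for $\hat{\mu}$ is:

$$\hat{\mu} = -\frac{c}{2} + \sqrt{1 + \left(\frac{c}{2}\right)^2}.$$

(When $\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2 < 0$, the same-sign requirement selects the root with the minus sign instead.) Substituting $\hat{\mu}$ into Equation (2.18) yields the right solution for $\min\hat{\sigma}_1^2$. Finally, from Equations (2.13), (2.14) and (2.18), I get the closed-form expression for $L_3(\mathbf{y})$ given by Equation (2.10).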
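The root selection above can be checked numerically. The sketch below is illustrative (all names are hypothetical): it computes $\hat{\mu}$ from the quadratic, choosing the root whose sign matches $\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2$, and compares it with a brute-force minimization of $J$ after $\boldsymbol{\phi}$ has been eliminated via Equation (2.16), which leaves $-\|\boldsymbol{\theta}_2 + \mu\boldsymbol{\theta}_1\|^2/(1 + \mu^2)$.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def mu_hat(theta1, theta2):
    """Root of mu^2 + c*mu - 1 = 0 with the same sign as theta1^T theta2."""
    a11, a22, a12 = dot(theta1, theta1), dot(theta2, theta2), dot(theta1, theta2)
    if a12 == 0.0:
        raise ValueError("theta1^T theta2 = 0: c is undefined")
    c = (a22 - a11) / a12
    sign = 1.0 if a12 > 0 else -1.0
    return -c / 2.0 + sign * math.sqrt(1.0 + (c / 2.0) ** 2)


def profiled_cost(mu, theta1, theta2):
    """J minimised over phi for fixed mu: -||theta2 + mu*theta1||^2 / (1 + mu^2)."""
    v = [q + mu * p for p, q in zip(theta1, theta2)]
    return -dot(v, v) / (1.0 + mu * mu)


theta1, theta2 = (0.2, 0.1), (1.0, 0.8)
m = mu_hat(theta1, theta2)
grid_best = min(profiled_cost(k / 1000.0, theta1, theta2) for k in range(-5000, 5001))
print(m, profiled_cost(m, theta1, theta2), grid_best)
```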

Instead of using $L_3(\mathbf{y})$ directly, I use Equation (2.9) as the test statistic, which is suggested by the form of the test statistic $t_2(\mathbf{y})$ for the CC detector. The main reasons are to facilitate the determination of the proper threshold and to allow a good comparison between the three different detectors. This will become much clearer when we study the asymptotic property of $t_3(\mathbf{y})$.

Having established the form of the test statistic, the next question is naturally how to choose the threshold, which is a very difficult problem. To answer this question we need to know the pdf of this statistic. Unfortunately, unlike the CC test, a closed form for the pdf of $t_3(\mathbf{y})$ is not known to me at this stage. I can, however, show that the pdf of $t_3(\mathbf{y})$ is a function of $\mu$ and $a^2/\sigma^2$ alone.

2.6.2 Invariance of GLRT Test Statistics

In this subsection, I employ the theory of invariance [46, 56] to prove that the pdf of $t_3(\mathbf{y})$ is a function of only $\mu$ and $a/\sigma$. In order not to interrupt the continuity of the description, the ideas and principles of invariant tests are put in Appendix V of Chapter 5.

The difficult part in using invariance theory is to find an appropriate set of transformations that fully exploits the structure of the signal to be detected. Since our problem is nonlinear, finding this set of transformations is not an easy matter. Actually, the discovery of this set of transformations comes simultaneously with its proof. The following theorem may be regarded as an extension of the results in [56, 57], which only deal with the linear model of Equation (2.3).

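Theorem: The family of distributions of $\mathbf{y}$ defined by

$$\mathbf{y} = \mu\mathbf{H}\boldsymbol{\phi} + \mathbf{S}\boldsymbol{\phi} + \sigma\mathbf{n}, \qquad (2.19)$$

where $\mathbf{n}$ is Gaussian distributed as $N(\mathbf{0}, \mathbf{I})$, is invariant to the group of transformations defined by:

$$G = \{g : g(\mathbf{y}) = c\,Q_S Q_H\mathbf{y}\}, \qquad (2.20)$$

where the two orthogonal matrices are

$$Q_S = U_S Q U_S^T + P_S^{\perp} = \frac{\mathbf{S}}{\sqrt{N}}Q\frac{\mathbf{S}^T}{\sqrt{N}} + \mathbf{I} - \frac{\mathbf{S}\mathbf{S}^T}{N},$$

$$Q_H = U_H Q U_H^T + P_H^{\perp} = \frac{\mathbf{H}}{\sqrt{N}}Q\frac{\mathbf{H}^T}{\sqrt{N}} + \mathbf{I} - \frac{\mathbf{H}\mathbf{H}^T}{N},$$

$c$ is any constant, $Q$ is any $2 \times 2$ orthogonal matrix, and $U_S = \mathbf{S}/\sqrt{N}$ and $U_H = \mathbf{H}/\sqrt{N}$ are defined in the obvious way.

Under the above transformation, $g(\mathbf{y})$ is explicitly expressed as

$$g(\mathbf{y}) = \mu_1\mathbf{H}\boldsymbol{\phi}_1 + \mathbf{S}\boldsymbol{\phi}_1 + \sigma_1\mathbf{n} \qquad (2.21)$$

with the induced transformation $\bar{G}$ given by:

$$\mu_1 = \mu, \qquad \boldsymbol{\phi}_1 = cQ\boldsymbol{\phi}, \qquad \sigma_1 = c\sigma.$$

Note that $Q_S$ and $Q_H$ are two orthogonal matrices (rotation matrices [56]). The geometrical meaning of this transformation is thus consecutive rotations of the original signal within the S plane (defined as the subspace spanned by the columns of $\mathbf{S}$) and then within the H plane, followed by a scaling that introduces the unknown variance.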
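The following is presented so as to make the proof as applicable as possible to a general case.

Proof:

$$Q_H\mathbf{y} = Q_H\mathbf{S}\boldsymbol{\phi} + \mu Q_H\mathbf{H}\boldsymbol{\phi} + \sigma Q_H\mathbf{n} = \mathbf{S}\boldsymbol{\phi} + \mu\mathbf{H}\boldsymbol{\phi}' + \sigma Q_H\mathbf{n},$$

since

$$Q_H\mathbf{S} = (U_H Q U_H^T + P_H^{\perp})\mathbf{S} = P_H^{\perp}\mathbf{S} = \mathbf{S},$$

because $\mathbf{H}^T\mathbf{S} = \mathbf{0}$, i.e., $U_H^T\mathbf{S} = \frac{\mathbf{H}^T}{\sqrt{N}}\mathbf{S} = \mathbf{0}$.

Similarly,

$$Q_S\mathbf{H} = \mathbf{H},$$

while

$$\begin{aligned}
\mathbf{H}\boldsymbol{\phi}' = Q_H\mathbf{H}\boldsymbol{\phi} &= (U_H Q U_H^T + P_H^{\perp})\mathbf{H}\boldsymbol{\phi} = U_H Q U_H^T\mathbf{H}\boldsymbol{\phi} \\
&= U_H(U_H^T U_H)^{-1}U_H^T U_H Q U_H^T\mathbf{H}\boldsymbol{\phi} \\
&= \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T Q_H\mathbf{H}\boldsymbol{\phi} \qquad (2.22)
\end{aligned}$$

due to the fact that $U_H^T U_H = \mathbf{I}$.

Therefore,

$$Q_S Q_H\mathbf{y} = Q_S(\mathbf{S}\boldsymbol{\phi} + \mu\mathbf{H}\boldsymbol{\phi}' + \sigma Q_H\mathbf{n}) = \mathbf{S}\boldsymbol{\phi}'' + \mu\mathbf{H}\boldsymbol{\phi}' + \sigma Q_S Q_H\mathbf{n},$$

where, similarly to Equation (2.22),

$$\mathbf{S}\boldsymbol{\phi}'' = Q_S\mathbf{S}\boldsymbol{\phi} = \mathbf{S}(\mathbf{S}^T\mathbf{S})^{-1}\mathbf{S}^T Q_S\mathbf{S}\boldsymbol{\phi}. \qquad (2.23)$$

It turns out that $\boldsymbol{\phi}'$ in Equation (2.22) and $\boldsymbol{\phi}''$ in Equation (2.23) coincide, which is the fascinating and devious part of finding the above set of transformations:

$$\boldsymbol{\phi}' = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T Q_H\mathbf{H}\boldsymbol{\phi} = \frac{\mathbf{H}^T}{N}\frac{\mathbf{H}}{\sqrt{N}}Q\frac{\mathbf{H}^T}{\sqrt{N}}\mathbf{H}\boldsymbol{\phi} = Q\boldsymbol{\phi},$$

due to the fact that $\mathbf{H}^T\mathbf{H} = N\mathbf{I}_{2\times 2}$. Similarly,

$$\boldsymbol{\phi}'' = (\mathbf{S}^T\mathbf{S})^{-1}\mathbf{S}^T Q_S\mathbf{S}\boldsymbol{\phi} = Q\boldsymbol{\phi} = \boldsymbol{\phi}'.$$

Regarding the transformed noise $\mathbf{n}$: since $Q_S$ and $Q_H$ are both orthogonal matrices, $Q_S Q_H\mathbf{n}$ has exactly the same form of covariance matrix as $\mathbf{n}$. So $g(\mathbf{y}) = c\,Q_S Q_H\mathbf{y}$ assumes the form expressed in Equation (2.21). QED.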

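It follows that $(\mu, \|\boldsymbol{\phi}\|^2/\sigma^2)$, or more simply $(\mu, a^2/\sigma^2)$, is a set of maximal invariant parameters under $\bar{G}$ (see [7, 46, 56] for details on invariance principles).

Furthermore, it is easy to verify that $t_3(\mathbf{y})$ is invariant to the transformation group $G$. To see this, first of all, we observe that $t_3(\mathbf{y})$ or $L_3(\mathbf{y})$ is a function of only $\|\mathbf{y}\|$, $\|\boldsymbol{\theta}_1(\mathbf{y})\|$, $\|\boldsymbol{\theta}_2(\mathbf{y})\|$ and $\cos\varphi$ (equivalently $\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2$). We already know that $Q_S$ and $Q_H$ are both orthogonal matrices. Moreover, noting that

$$\mathbf{H}^T Q_S = \mathbf{H}^T, \quad \mathbf{H}^T Q_H = Q\mathbf{H}^T, \quad \mathbf{S}^T Q_S = Q\mathbf{S}^T, \quad \mathbf{S}^T Q_S Q_H = Q\mathbf{S}^T Q_H = Q\mathbf{S}^T,$$

we have invariance for the norms

$$\|\boldsymbol{\theta}_1(\mathbf{y})\| = \|\boldsymbol{\theta}_1(Q_S Q_H\mathbf{y})\|, \quad \|\boldsymbol{\theta}_2(\mathbf{y})\| = \|\boldsymbol{\theta}_2(Q_S Q_H\mathbf{y})\|, \quad \|\mathbf{y}\| = \|Q_S Q_H\mathbf{y}\|,$$

and

$$\boldsymbol{\theta}_1^T(\mathbf{y})\boldsymbol{\theta}_2(\mathbf{y}) = \boldsymbol{\theta}_1^T(Q_S Q_H\mathbf{y})\boldsymbol{\theta}_2(Q_S Q_H\mathbf{y}).$$

A constant $c$ in both the numerator and the denominator of $L_3(\mathbf{y})$ and $\cos\varphi$ does not change the original quantities. Therefore, $t_3(\mathbf{y})$ is invariant to the transformation group $G$. Hence, from Theorem 3 in Appendix V of Chapter 5, the pdf of $t_3(\mathbf{y})$ is a function of $\mu$ and $a/\sigma$ alone (instead of all four model parameters $a$, $\sigma$, $\vartheta$, and $\mu$).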
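The invariance claim can be spot-checked numerically. The sketch below is illustrative and makes several assumptions: the function names are hypothetical, $Q$ is taken to be a rotation by an angle `alpha` (reflections are not exercised), the reference is centered internally, and complex arithmetic stands in for the stacked real vectors, so that applying $Q$ within the S plane and the H plane becomes multiplication of the fitted baseline and response parts by $e^{i\alpha}$ while the residual is left untouched.

```python
import cmath
import math


def glrt_statistic(x, r):
    """t3(y) = (N-1) * (L3(y) - 1), L3 as in Equation (2.10), complex-data form."""
    n = len(x)
    r_mean = sum(r) / n
    rc = [v - r_mean for v in r]
    srr = sum(v * v for v in rc)
    energy = sum(abs(v) ** 2 for v in x)
    s = sum(x)
    h = sum(v * w for v, w in zip(x, rc))
    e_s, e_h = abs(s) ** 2 / n, abs(h) ** 2 / srr
    cos_phi = (h.real * s.real + h.imag * s.imag) / (abs(h) * abs(s))
    root = math.sqrt(e_h ** 2 + e_s ** 2 + 2.0 * e_h * e_s * (2.0 * cos_phi ** 2 - 1.0))
    l3 = 2.0 * (energy - e_s) / ((energy - e_h) + (energy - e_s) - root)
    return (n - 1) * (l3 - 1.0)


def group_transform(x, r, c, alpha):
    """g(y) = c * Q_S * Q_H * y with Q a rotation by alpha, written for complex x."""
    n = len(x)
    r_mean = sum(r) / n
    rc = [v - r_mean for v in r]
    srr = sum(v * v for v in rc)
    mean = sum(x) / n                                   # S-plane coefficient
    gain = sum(v * w for v, w in zip(x, rc)) / srr      # H-plane coefficient
    rot = cmath.exp(1j * alpha)
    out = []
    for xj, rj in zip(x, rc):
        in_planes = mean + gain * rj                    # P_S y + P_H y at scan j
        residual = xj - in_planes                       # P_SH^perp y, left unrotated
        out.append(c * (residual + rot * in_planes))
    return out


x = [complex(1, 2), complex(2, 1), complex(4, 0), complex(3, 3)]
r = [0.0, 0.0, 1.0, 1.0]
print(glrt_statistic(x, r), glrt_statistic(group_transform(x, r, 2.5, 0.7), r))
```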

This is a desirable and very useful feature, since this result shows that the test depends on only two key variables, whereas in general a test could depend on all four of the unknown parameters ($a$, $\mu$, $\vartheta$, $\sigma^2$). In particular, the GLRT is invariant to the unknown phase $\vartheta$. The invariance property saved me a lot of work when I performed numerical simulations in the next stage.

Unfortunately, though, the dependence of the test on $a^2/\sigma^2$ in addition to the significant parameter $\mu$ implies that, in general, this test does not have the CFAR property. Hence, selection of a threshold $\eta_3$ to achieve a desired $P_f$ is still very difficult.

2.6.3 Upperbound of GLRT Test Statistics

However, some interesting relationships exist between the nonlinear GLRT and the CC test, which suggest some possibilities for threshold selection. Let us compare Equation (2.8) with Equation (2.10). Note that if $\varphi = 0$ in Equation (2.10), then $L_3$ and $L_2$ coincide. It is precisely through the term $\cos 2\varphi$ that the effect of phase coupling comes into play. Actually, from Equation (2.10), the upper bound of $L_3(\mathbf{y})$ is easily seen to coincide with $L_2(\mathbf{y})$.
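The bound can be written out in two steps. The following is a short worked derivation from Equations (2.8) and (2.10), using only $\cos 2\varphi \le 1$ and the orthogonality of the S and H subspaces:

```latex
\begin{align*}
\sqrt{\|P_H\mathbf{y}\|^4 + \|P_S\mathbf{y}\|^4
      + 2\|P_H\mathbf{y}\|^2\|P_S\mathbf{y}\|^2\cos 2\varphi}
  &\le \|P_H\mathbf{y}\|^2 + \|P_S\mathbf{y}\|^2, \\
\|P_H^{\perp}\mathbf{y}\|^2 + \|P_S^{\perp}\mathbf{y}\|^2
  - \left(\|P_H\mathbf{y}\|^2 + \|P_S\mathbf{y}\|^2\right)
  &= 2\left(\|\mathbf{y}\|^2 - \|P_S\mathbf{y}\|^2 - \|P_H\mathbf{y}\|^2\right)
   = 2\|P_{SH}^{\perp}\mathbf{y}\|^2, \\
\text{so}\qquad
L_3(\mathbf{y})
  \le \frac{2\|P_S^{\perp}\mathbf{y}\|^2}{2\|P_{SH}^{\perp}\mathbf{y}\|^2}
  &= L_2(\mathbf{y}),
\end{align*}
```

with equality exactly when $\cos 2\varphi = 1$, i.e., when the two sufficient statistics $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ are collinear.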

Therefore, one method of threshold selection is to choose the threshold slightly smaller than that determined for the CC test statistic, which has the $F_{2,2(N-1)}$ distribution. Furthermore, if the true parameter $\mu$ under $H_1$ is small, which is the case for most fMRI detection problems, then we show in the following subsection that $L_3(\mathbf{y})$ asymptotically (as $N$, the length of the time series, increases) has the same distribution as $L_2(\mathbf{y})$. This provides a more solid foundation for this threshold choice. Under the guidance of this rough method, extensive numerical simulations lead to the more accurate threshold selection method of Subsection (2.6.5).

2.6.4 Asymptotic of GLRT Test Statistics

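In detection problems, our concern is the low-SNR case. In our situation, we assume that the true parameter $\mu$ under $H_1$ is small, i.e., $\mu \to 0$. By the asymptotic property of maximum likelihood estimates, as $N \to \infty$, $\hat{\mu} \to \mu$. In order to get a more accurate approximation of $\hat{\mu}$, we use Equation (2.16) combined with $\hat{\mu} \to \mu \to 0$ (as $N \to \infty$) and get $\hat{\boldsymbol{\phi}} \to \boldsymbol{\theta}_2$, so from Equation (2.15) (as $N \to \infty$),

$$\hat{\mu} \approx \frac{\boldsymbol{\theta}_1^T\boldsymbol{\theta}_2}{\boldsymbol{\theta}_2^T\boldsymbol{\theta}_2}, \qquad (2.24)$$

which is the maximum likelihood estimate for the corresponding linear model in Equation (2.3).

Substituting Equation (2.24) into Equation (2.18), we have

$$\min\hat{\sigma}_1^2 = \|\mathbf{y}\|^2 - N(\boldsymbol{\theta}_1^T\boldsymbol{\theta}_1 + \boldsymbol{\theta}_2^T\boldsymbol{\theta}_2) = \|\mathbf{y}\|^2 - \|P_S\mathbf{y}\|^2 - \|P_H\mathbf{y}\|^2. \qquad (2.25)$$

Noting that

$$P_{SH}^{\perp} = P_S^{\perp}P_{SH}^{\perp}P_S^{\perp}, \qquad (2.26)$$

$$P_S^{\perp} - P_{SH}^{\perp} = P_S^{\perp}P_H P_S^{\perp} = P_H = P_H P_S^{\perp}, \qquad (2.27)$$

$$P_{SH}^{\perp}P_S^{\perp} = P_H^{\perp}P_S^{\perp} = P_{SH}^{\perp} = \mathbf{I} - P_S - P_H, \qquad (2.28)$$

we get, from Equations (2.13), (2.14) and (2.25), as $N \to \infty$,

$$L_3(\mathbf{y}) \approx L_2(\mathbf{y}).$$

This implies that $t_3(\mathbf{y})$ asymptotically has the same distribution as $t_2(\mathbf{y})$, i.e., non-central $F_{2,2(N-1)}(\mathrm{SNR})$, and thus asymptotically has the CFAR property for small $\mu$.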

2.6.5 Threshold Selection Based on Numerical Simulation

Another method is to determine the exact thresholds via Monte Carlo simulation. Some of the results of our simulations are given in Section 2.7; here, we summarize the conclusions. Extensive Monte Carlo simulation reveals that the GLRT is also CFAR when $a/\sigma \geq 1$, which is the case for most, if not all, fMRI experiments. More importantly, and more interestingly, to achieve the desired $P_f$, the proper threshold of our GLRT detector is almost exactly one half of the corresponding threshold required for an $F_{1,(N-1)}$ distributed test statistic.
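The halved-threshold rule can be probed with a small Monte Carlo experiment under $H_0$. The sketch below is not the simulation code behind the results reported in this chapter; it is a minimal illustration with assumptions stated in the comments: hypothetical function names, a square-wave reference of period 10 with five active and five baseline scans, unit-variance noise in each of the real and imaginary parts, and zero phase (which loses no generality given the invariance shown in Subsection 2.6.2). The threshold 3.43 is the GLRT value listed for $P_f = 0.01$, $N = 120$.

```python
import math
import random


def glrt_statistic(x, r):
    """t3(y) = (N-1) * (L3(y) - 1), L3 as in Equation (2.10), complex-data form."""
    n = len(x)
    r_mean = sum(r) / n
    rc = [v - r_mean for v in r]                       # enforce 1^T r = 0
    srr = sum(v * v for v in rc)
    energy = sum(abs(v) ** 2 for v in x)
    s = sum(x)
    h = sum(v * w for v, w in zip(x, rc))
    e_s, e_h = abs(s) ** 2 / n, abs(h) ** 2 / srr
    if e_s > 0.0 and e_h > 0.0:
        cos_phi = (h.real * s.real + h.imag * s.imag) / (abs(h) * abs(s))
        cos_2phi = 2.0 * cos_phi ** 2 - 1.0
    else:
        cos_2phi = 1.0
    root = math.sqrt(max(0.0, e_h ** 2 + e_s ** 2 + 2.0 * e_h * e_s * cos_2phi))
    l3 = 2.0 * (energy - e_s) / ((energy - e_h) + (energy - e_s) - root)
    return (n - 1) * (l3 - 1.0)


def false_alarm_rate(threshold, n, a_over_sigma, trials, rng):
    """Fraction of H0 trials (mu = 0, sigma = 1, phase 0) with t3 above threshold."""
    # Assumed reference: square wave, period 10, 5 scans active then 5 baseline.
    r = [1.0 if (j % 10) < 5 else 0.0 for j in range(n)]
    hits = 0
    for _ in range(trials):
        x = [complex(a_over_sigma + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
             for _ in range(n)]
        if glrt_statistic(x, r) > threshold:
            hits += 1
    return hits / trials
```

Calling `false_alarm_rate(3.43, 120, 3.162, 3000, random.Random(0))` gives an empirical false-alarm rate that can be compared with the nominal 0.01; more trials tighten the estimate.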

2.6.6 Distribution of GLRT Test Statistics

The observation in the above subsection leads to Nan's Conjecture, formulated as follows.

In mathematical terms, we have

$$P_f = \int_{\eta}^{\infty} p_{t_3|H_0}(t_3)\,dt_3 \approx \int_{2\eta}^{\infty} f_0(t)\,dt, \qquad (2.29)$$

where $f_0$ denotes the density of an $F_{1,(N-1)}$ distributed statistic. Differentiating the above equation with respect to $\eta$ leads to an exciting result: the density of the test statistic $t_3$ under $H_0$ is related to the $F_{1,N-1}$ density by the approximation

$$p_{t_3|H_0}(t_3) \approx 2f_0(2t_3). \qquad (2.30)$$

This implies that a very accurate threshold can be selected using standard $F_{1,(N-1)}$ distribution tables [1].

Actually, on careful observation of Tables (2.1-2.3), we find that the $P_d$'s for the GLRT and MC detectors are the same when $a/\sigma$ is large, i.e.,

$$P_d = \int_{\eta}^{\infty} p_{t_3|H_1}(t_3)\,dt_3 \approx \int_{2\eta}^{\infty} f_1(t)\,dt, \qquad (2.31)$$

where $f_1$ denotes the density of an $F_{1,(N-1)}(\mathrm{SNR})$ distributed statistic, since for large $a/\sigma$ the approximation for the MC model in Equation (2.4) is quite valid, and hence the $P_d$ for the MC detector is given by the right-hand integral of the above equation. Differentiating the above equation with respect to $\eta$ leads to another result: the density of the test statistic $t_3$ under $H_1$ is related to the $F_{1,N-1}(\mathrm{SNR})$ density by the approximation

$$p_{t_3|H_1}(t_3) \approx 2f_1(2t_3). \qquad (2.32)$$

When $\mu = 0$, $f_1$ coincides with $f_0$. Summarizing the two cases, I suspect that

$$p_{t_3}(t_3) \approx 2f_1(2t_3), \qquad (2.33)$$

for $a/\sigma > 1$. Although a theoretical proof of this conjecture is difficult to achieve, it is still worth investigating. Because the pdf of this statistic does not depend on the angle $\vartheta$, the expression for $\mathbf{y}$ simplifies greatly, which may direct us to the final solution.

I caution here that numerical simulations reveal that the conclusions in Subsections (2.6.5) and (2.6.6) all break down for $a/\sigma < 1$.

2.7 Comparisons of the Three Detectors

In order to compare the performance of the three detectors, I ran extensive Monte Carlo experiments. Because originally we only know that the performance of the GLRT depends on the two parameters $a/\sigma$ and $\mu \equiv b/a$, it is necessary to study the performance for different values of $a/\sigma$. However, as mentioned above, for $a/\sigma \geq 1$, the Monte Carlo analysis suggests that the GLRT is essentially CFAR.

Table 2.1. Pd with Pf = 0.01, N = 120

    Threshold               4.70    3.43    6.85
    a/sigma    mu           CC      GLRT    MC
    1          0.3162       0.72    0.80    0.44
    3.162      0.1          0.72    0.80    0.78
    10         0.03162      0.72    0.80    0.80

Table 2.2. Pd with Pf = 0.025, N = 120

    Threshold               3.75    2.58    5.15
    a/sigma    mu           CC      GLRT    MC
    1          0.3162       0.82    0.88    0.58
    3.162      0.1          0.82    0.88    0.87
    10         0.03162      0.82    0.88    0.88

In Tables (2.1-2.3), we compare the detection rates $P_d$ of the three tests under three $P_f$ specifications. The $P_f$'s are selected to be representative of those commonly used in fMRI. In these tables, the first row contains the thresholds corresponding to the preselected $P_f$. In order to see the functional dependence of $P_d$ on $\mathrm{SNR} \equiv \mu^2 a^2/\sigma^2$, I deliberately select $\mu$ so that the SNR is the same for the three different $a/\sigma$ cases. In order to get results that are as accurate as possible, for each value of $P_d$ (and $P_f$), $10^5$ simulations are run and the average is taken as the true result.

The most difficult element of the Monte Carlo analysis, except in the CC test case, is the determination of proper thresholds to achieve a desired false-alarm rate with each detector. The CC test is $F_{2,2(N-1)}$ distributed under $H_0$, and therefore the proper threshold is very easily determined from standard tables [1].

Because the GLRT is not known to possess the CFAR property, the proper threshold will, in general, depend on $a/\sigma$. For a given value of $a/\sigma$, the threshold needed to achieve a desired false-alarm rate can be determined via Monte Carlo analysis and trial and error over a range of thresholds, under the guidance of the rough method described in Subsection (2.6.3). This is precisely how the thresholds were determined for the results given in Tables (2.1-2.3). Remarkably, however, the Monte Carlo analysis revealed that both the MC test and the GLRT are essentially CFAR so long as $a/\sigma > 1$, which is almost always true in MRI. Moreover, the Monte Carlo analysis supports the use of some very simple rules for threshold selection.

Table 2.3. Pd with Pf = 0.05, N = 120

    Threshold               3.03    1.96    3.92
    a/sigma    mu           CC      GLRT    MC
    1          0.3162       0.88    0.93    0.69
    3.162      0.1          0.88    0.93    0.92
    10         0.03162      0.88    0.93    0.93

First, in the case of the MC test, for very large values of $a/\sigma$ the magnitude data are very well approximated as Gaussian. Therefore, in such situations, the MC test is (approximately) $F_{1,(N-1)}$ distributed under $H_0$, and the proper threshold can again be determined from standard tables [1]. Because the Monte Carlo simulations show that for $a/\sigma \geq 1$ the MC test is essentially CFAR, the proper threshold may be determined from the $F_{1,(N-1)}$ distribution for all cases in which $a/\sigma \geq 1$. The derivation of the approximate model (2.4) for the MC detector also supports this, although not strictly.

Second, the similar performance of the GLRT and the MC test for large $a/\sigma$ suggests the possibility of a relationship between the GLRT statistic and the $F_{1,(N-1)}$ distribution. This intuition led to the discovery that the proper threshold for the nonlinear GLRT can be selected as one half of the threshold required to achieve the desired $P_f$ for an $F_{1,(N-1)}$ distributed statistic.

The results in the three tables show clearly that our GLRT detector performs best in all three (low, medium, high) $a/\sigma$ cases. The CC detector performs better than the MC detector in the low $a/\sigma$ case. However, as the DC component becomes more and more dominant over the noise, the GLRT and the MC test significantly outperform the CC test.

Finally, note that the detection rate of the CC test is constant for fixed $\mathrm{SNR} = \mu^2(a/\sigma)^2$. This is expected because the CC test statistic is non-central $F_{2,2(N-1)}(\mathrm{SNR})$ under $H_1$. Remarkably, the detection rate of our GLRT detector also depends only on the SNR. The same is not true of the MC test, whose performance drops severely as $a/\sigma$ decreases.

I also illustrate the results in Figure (2.4) using three sets of performance curves ($P_D$ versus response strength $\mu = b/a$) with $N = 120$ and $P_f = 0.01$; the thresholds are therefore chosen as in Table (2.1). The solid line is for the GLRT, the dash-dot line for CC, and the dashed line for MC. Figure (2.4a) shows the case $a/\sigma = 1.0$, which is low, so we expect the MC detector to suffer. This is indeed the case: the GLRT curve is at the top, the CC curve is in the middle and the bottom one is for MC. Figure (2.4b) shows the case $a/\sigma = 3.162$, which is large. In this case, the top curve is for the GLRT, the middle one for MC, and the bottom one for CC. It shows that the MC detector begins to surpass the CC detector but is still inferior to the GLRT. Figure (2.4c) shows the case $a/\sigma = 10$, which is quite large, so the MC and GLRT detectors have almost the same performance: the GLRT and MC curves coalesce into one at the top, while the CC detector remains at the bottom. All three panels clearly demonstrate that our GLRT detector is always the best.

2.8 A Simulated fMRI Study

One fMRI experiment is simulated to illustrate pixel-wise detection efficiency by the
above three detectors. The results are shown in Figure (2.5). Figure (2.5a) shows one
slice image of the brain (64 × 64 pixels) with the simulated activation region highlighted.
A 9 × 9 voxel region in the lower right corner of the brain (indicated in white) is
selected to be active in this simulation. For the simulation, a time series with length
$N = 120$ is simulated for each voxel. The reference signal $r$ is a square wave with
period 10. The fluctuation of the reference signal about the constant level is ±10%, i.e.,
$u = 0.1$. The noise variance in each time series is set so that $a/\sigma = 3.162$. For each
time series, the phase is a constant. Spatially, the phase fluctuates (Gaussian
noise with zero mean and variance 0.1) about a constant phase of $\pi/3$. The reader is
referred to Figure (3.13a) in Chapter 3 for the correlation image for this simulated
experiment.

Figure 2.4. Three curves comparing the performance of three detectors. (a) $a/\sigma = 1$;
(b) $a/\sigma = 3.162$; (c) $a/\sigma = 10$.

The desired false-alarm rate in this example is chosen to be $P_f = 0.01$, and
thus the three thresholds for the CC, GLRT and MC detectors are 4.70, 3.43, and 6.85,
respectively. The MC test, CC test, and GLRT are compared in Figures (2.5b-d).
The actual detection rates observed in this simulation, given in the caption of Figure
(2.5), are in excellent agreement with the tabulated Monte Carlo results.

However, we note that in the activation maps shown in Figure (2.5), there is
activation “detected” outside the brain. This is a serious problem which results from
the fact that spatial information is completely ignored. This is the main defect of
pixel-wise detection. Therefore, the next chapter is devoted to the problem
of how to utilize spatial information to further enhance detection efficiency.

Figure 2.5. A simulated fMRI experiment illustrating pixel-wise detection by three
detectors. (a) Brain image with simulated activation region highlighted; (b) MC test
results: Pd = 0.77; (c) CC test results: Pd = 0.70; (d) GLRT results: Pd = 0.79.

CHAPTER 3

Multi-scale Detection for fMRI

3.1 Overview

The last chapter focused on pixel-wise detection for fMRI, which is most common in
practice [3, 24, 25, 36, 55, 60, 64]. However, these techniques do not take advantage
of mutual information among neighboring pixels. Ignoring such spatial information
can reduce detection accuracy, whereas utilizing it may enhance detection
accuracy. For example, in Figure (2.5) activity is “detected” in areas outside the
brain, an erroneous decision that could be avoided by incorporating anatomical
information in the decision rule. Furthermore, it is quite possible that a connected
region of activation is larger than the individual pixel dimensions. In other words,
activated areas in reality tend to occur in clusters of neighboring pixels. Thus, limiting
testing to individual pixels imposes artificial boundaries in the analysis process
that may weaken the detection performance. On the other hand, if there is strong
indication that a large group of pixels, which may be thought of as one large pixel
at a very coarse (spatial) scale, is active, then the individual pixels inside this group
may be more likely to be active themselves. Hence comes the idea of (spatial) scale
and of incorporating spatial correlation into the fMRI detection process.

3.1.1 Spatial Modeling and Outline for the Method

In light of the ideas above, pixel-wise detection is oversimplistic. Therefore, it
is necessary to develop detection methods that take advantage of spatial correlation.
There are many approaches to attacking the problem, for example, cluster analysis
[20, 26, 27] and independent component analysis (ICA) [43]. Detection methods
using Bayesian strategies have also been recently proposed for fMRI [21, 35, 17]. Just
as in the pixel-wise detection strategies, we need to model each time series; when
we incorporate spatial correlation, we also need to develop spatial models of the fMRI
data. This is by no means easy. The recent works of [35, 17] use Markov random
field (MRF) models to model the spatial relationships in fMRI data.

These methods all have their shortcomings. The clustering and independent com-
ponent analysis techniques are somewhat ad hoc and do not enable explicit modeling
assumptions about spatial correlation. The existing Bayesian methods mentioned
above are all restricted to modeling only the ﬁnest scale (highest resolution). Such
methods tend to be very computationally demanding, and are often difﬁcult to ana-
lyze and interpret. Therefore, we will put forward a novel multi-scale modeling and
detection framework that incorporates spatial correlation information and is much
more amenable to analysis and optimization.

More specifically, this chapter will present a two-step approach for fMRI detection.
First, a new multi-scale image segmentation algorithm is proposed to decompose
the correlation image into several different regions, each of which has homogeneous
statistical behavior. Second, each homogeneous region will be classified independently
as active or inactive using detection methods analogous to the pixel-wise tests described
in Chapter 2.

3.1.2 General Setting of Bayesian Image Segmentation

In a general setting, fMRI activation mapping may be viewed as a particular
image segmentation problem. We are given a random continuous noisy image $Y$ which
must be segmented into a discrete image $X$ consisting of regions of distinct statistical
behavior. For example, in fMRI, the image $Y$ may be composed of the correlation values
(specifically, the correlation between the amplitude time series and the reference signal)
at each pixel location; the image $X$ may be just a binary detection map, as in Chapter 2.

To simplify the presentation, I will modify the notation slightly from that used
in previous chapters. From now on, I will adopt some notation from [10]. Symbols
without subscript refer to the whole image field. Individual pixels in the image $X$
are denoted by $X_k$, where $k$ is a point of a one-dimensional (1-D) or two-dimensional
(2-D) lattice, depending on the context. The collection of lattice points at scale $j$
is denoted as $S^j$. Random quantities are usually denoted by upper-case letters. For
notational ease, however, lower-case letters may denote either the stochastic quantities or the
corresponding deterministic realizations, which should be distinguishable from
context.

We assume that each observed pixel in image $Y$ is dependent on a corresponding
unobserved label in $X$. Each label specifies one of $M$ possible states, each with its
own statistical behavior. In our case, $M = 2$, indicating “active” and “inactive”.
However, in general, we may need to segment the correlation image into regions with
homogeneous statistical behavior, so $M \ge 2$.

The dependence of observed pixels on their labels is specified through the conditional
distribution of $Y$ given $X$, i.e., $p_{y|x}(y|x)$. The function $p_{y|x}(y|x)$ is called the
likelihood function. In fMRI, we are essentially interested in inverting this relationship;
that is, we would like to determine $p_{x|y}(x|y)$, the probabilistic description of the
unknown activation map given the observed data. This calculation is facilitated by
introducing a priori knowledge about the size and shapes of regions, modeled by an a
priori distribution $p_x(x)$.

By Bayes' formula, we will estimate $X$ given the observed image $Y = y$ as:

$$\hat{x} = \arg\max_x \, p(X = x \mid Y = y),$$

$$p(X = x \mid Y = y) = \frac{p(X = x)\,p(Y = y \mid X = x)}{p(Y = y)}
= \frac{p(X = x)\,p(Y = y \mid X = x)}{\sum_{\text{all } x'} p(X = x')\,p(Y = y \mid X = x')}
\propto p_x(x)\,p_{y|x}(y|x),$$

where upper case letters denote random quantities and lower case letters denote the
deterministic realizations.

This is the so-called maximum a posteriori (MAP) estimator. The general frame-
work for Bayesian image segmentation problem is shown in Figure (3.1) (adapted
from [10]).
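
As a minimal illustration of the MAP rule above, the following Python sketch labels a single pixel by maximizing the product of prior and likelihood. It assumes a Gaussian likelihood for each label; the function names and all numerical values (priors, means, variances) are hypothetical and chosen only for illustration. Real images require the joint optimization over the whole field discussed in the text.

```python
import math


def gaussian_pdf(y, mean, var):
    """Gaussian density N(y | mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def map_label(y, priors, means, variances):
    """Return (MAP label, posterior probabilities) for one observed value y.

    The score of label m is p_x(m) * p_{y|x}(y | m); dividing by the sum of
    the scores (the evidence) gives the posterior but does not change the argmax.
    """
    scores = [p * gaussian_pdf(y, mu, v) for p, mu, v in zip(priors, means, variances)]
    evidence = sum(scores)
    posteriors = [s / evidence for s in scores]
    best = max(range(len(scores)), key=lambda m: scores[m])
    return best, posteriors


# Hypothetical two-state example: label 0 = "inactive", label 1 = "active".
priors = [0.9, 0.1]
means = [0.0, 1.0]
variances = [0.25, 0.25]

label_low, post_low = map_label(0.2, priors, means, variances)
label_high, post_high = map_label(1.1, priors, means, variances)
print(label_low, label_high)
```

The strong prior toward the inactive label means that an observation must lie clearly on the active side before the decision switches, which is the role the a priori distribution plays in the estimator.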

Despite the apparent simplicity of this estimator, remember that $X$ is a 2-D image
of integer values, making optimization prohibitively difficult in general. Further,
specification of the a priori distribution $p_x(x)$ is not straightforward, either.

3.1.3 Multi-scale Image Segmentation Methods and Advantages

Traditionally, statistical image segmentation has been accomplished using MRF models.
The global statistical models in the MRF theory lead to substantially better
segmentation results than those of simpler, local methods [23, 50]. The theory of MRF
models provides a powerful framework for studying nonlinear interactions among
different features [50]. Under the MAP criterion, it leads to the minimization of a global
energy function which is very computationally expensive to carry out [28, 23, 45].

[Figure 3.1 diagram: Given Y (continuous noisy image) -> Segmentation -> Seek X (discrete image)]

Figure 3.1. Illustration of image segmentation.

Recently, an alternative to the classical MRF approach to image segmentation
was proposed in [9, 10]. This approach is based on modeling the discrete field $X$ as
a multi-scale Markov chain. In the sequel, it is called Algorithm I.

The multi-scale hidden Markov model (MHMM) proposed in [15] and its extension
[49] may also be used to deal with the segmentation problem. Instead of the discrete
field $X$, the states of the wavelet coefficients at different scales are modeled as a
Markov chain. In the sequel, this approach is called Algorithm II.

According to the results and conclusions of previous work [10, 11, 40, 49], this
kind of multi-scale modeling not only captures the key inter-scale physical dependency
present in natural signals and images, it also leads to computationally efficient (usually
scale-recursive) algorithms. A more precise explanation of physical dependency will
be given later in the review of Algorithm I.

However, before proceeding to the detailed description of the algorithms, I will first
give a brief introduction to multi-scale analysis.

3.2 Multi-scale Analysis

In general, multi-scale analysis refers to the study of the behavior of signals or images at
various spatial and/or temporal resolutions [8, 32, 53, 54, 63, 65].

3.2.1 1-D Multi-resolution Analysis (MRA)

Let us begin by considering the multi-scale or multi-resolution analysis of a finite
energy 1-D signal. A 2-D MRA also exists, but it is not used in this dissertation
and is therefore omitted. Finite energy signals are those signals belonging to the space
of square-integrable functions on the real line, $L^2(\mathbb{R})$; see Appendix I in Chapter 5.

An MRA of $L^2(\mathbb{R})$ is defined to be a sequence $\{V_j \mid j \in \mathbb{Z}\}$ of closed subspaces of
$L^2(\mathbb{R})$ satisfying the following properties [32, 53, 65]:

1)
$$V_j \subset V_{j+1}, \tag{3.1}$$
2)
$$f(x) \in V_j \Leftrightarrow f(2x) \in V_{j+1}, \tag{3.2}$$
3) for all $f \in L^2(\mathbb{R})$,
$$\lim_{j \to -\infty} \| P_j f \| = 0, \quad \text{or} \quad \lim_{j \to -\infty} V_j = \{0\}, \tag{3.3}$$
$$\lim_{j \to \infty} \| P_j f - f \| = 0, \quad \text{or} \quad \lim_{j \to \infty} V_j = L^2(\mathbb{R}), \tag{3.4}$$

where $P_j$ is the orthogonal projection onto the space $V_j$, and $P_j f$ can be considered as
an approximation of $f$ at the scale of $2^{-j}$. In the above definition, $j$ indexes the
scale or resolution of analysis: a smaller $j$ corresponds to a lower resolution of
analysis. A resolution of $2^j$ corresponds to a scale of $2^{-j}$. The first limit implies
that as the resolution gets smaller and smaller ($2^j \to 0$), the approximation is just 0; all
information (particularly its details) about $f$ is lost. The second limit implies that
the approximation $P_j f$ converges to the true signal $f$ as the resolution gets higher and
higher ($2^j \to \infty$).

4) There exists a function $\phi(t) \in L^2(\mathbb{R})$ such that $\{\phi(t - k) \mid k \in \mathbb{Z}\}$ forms an
orthonormal basis for $V_0$. Here $k$ indicates the time (or spatial) location of analysis.
$\phi(t)$ is called the scaling function of the associated MRA, and it plays an essential
role in MRA. It then follows that the family $\{\phi^j_k(t) = 2^{j/2}\phi(2^j t - k) \mid k \in \mathbb{Z}\}$ is an
orthonormal basis of $V_j$ for all $j \in \mathbb{Z}$. The factor $2^{j/2}$ is introduced for normalization:
$\| \phi^j_k(t) \| = \| \phi(t) \| = 1$.

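The nesting $V_j \subset V_{j+1}$ implies that $\phi \in V_0$ can be expressed as a linear
combination of $\{\sqrt{2}\,\phi(2t - k) \mid k \in \mathbb{Z}\}$:
$$\phi(t) = \sum_k h_k \sqrt{2}\,\phi(2t - k), \tag{3.5}$$
where the coefficients $\{h_k = \langle \phi(t), \sqrt{2}\,\phi(2t - k) \rangle \mid k \in \mathbb{Z}\}$ constitute the scaling filter.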

In addition to $V_j$, $W_j$ is defined as the orthogonal complement of $V_j$ in $V_{j+1}$,
i.e., $V_{j+1} = V_j \oplus W_j$ for any $j \in \mathbb{Z}$. $W_j$ represents the additional information that is
necessary to pass from an approximation at resolution $2^j$ to an approximation at the
higher resolution $2^{j+1}$. A direct consequence is $W_i \perp W_j$, $i \ne j$. For example, since
$W_{j+1} \perp V_{j+1} = W_j \oplus V_j$, we have $W_{j+1} \perp W_j$.

Footnote (on the definition of the scaling function $\phi$): For the sake of generality, some
of the literature first introduces a function $b(t) \in V_0$ such that $\{b(t - k), k \in \mathbb{Z}\}$ is a
Riesz basis for $V_0$. Then another function $\phi(t) \in L^2(\mathbb{R})$ can be constructed from $b(t)$ such
that $\{\phi(t - k) \mid k \in \mathbb{Z}\}$ forms an orthonormal basis for $V_0$. This dissertation is not
concerned with these issues of function construction; therefore I adopt a more direct and
simpler definition.

As in the case of $V_0$, there exists another function $\psi \in W_0$, which is called the
wavelet function and can be constructed from $\phi$, such that the family $\{\psi(t - k) \mid k \in \mathbb{Z}\}$
forms an orthonormal basis for the space $W_0$. To understand how one can generate
$\psi(t)$ from $\phi(t)$, consider the following. Because $W_j \subset V_{j+1}$, the wavelet function $\psi \in W_0$
can also be expressed as a linear combination of $\{\sqrt{2}\,\phi(2t - k) \mid k \in \mathbb{Z}\}$:
$$\psi(t) = \sum_k g_k \sqrt{2}\,\phi(2t - k), \tag{3.6}$$
where $g_k$, $k \in \mathbb{Z}$, is chosen to be $(-1)^k h_{1-k}$ and constitutes the wavelet filter. It
follows that the family $\{\psi^j_k(t) = 2^{j/2}\psi(2^j t - k) \mid k \in \mathbb{Z}\}$ is an orthonormal basis of $W_j$
for all $j \in \mathbb{Z}$.

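Iteratively applying the relationship $V_{j+1} = V_j \oplus W_j$, we have
$$V_J = V_{J-1} \oplus W_{J-1} = V_{J-2} \oplus W_{J-2} \oplus W_{J-1} = \cdots = V_{J_0} \oplus W_{J_0} \oplus \cdots \oplus W_{J-1}. \tag{3.7}$$
As $J \to \infty$, we get
$$L^2(\mathbb{R}) = \bigoplus_{j \ge J_0} W_j \oplus V_{J_0}. \tag{3.8}$$
Further, let $J_0 \to -\infty$; then we get
$$L^2(\mathbb{R}) = \bigoplus_{j \in \mathbb{Z}} W_j. \tag{3.9}$$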
Equation (3.8) means that there exist $c^{J_0}_k$, called scaling coefficients, and $\theta^j_k$, called
wavelet coefficients, such that any one-dimensional signal $f \in L^2(\mathbb{R})$ can be
represented as:
$$f(t) = \sum_k c^{J_0}_k \phi^{J_0}_k(t) + \sum_{j = J_0}^{\infty} \sum_k \theta^j_k \psi^j_k(t), \tag{3.10}$$
where the coefficients $c^{J_0}_k = \langle f, \phi^{J_0}_k \rangle$ and $\theta^j_k = \langle f, \psi^j_k \rangle$ are called the discrete
wavelet transform (DWT) of $f$. The reader is reminded that $\theta$ in this chapter denotes
wavelet coefficients. By invoking Equations (3.5) and (3.6), there exist recursive
relations:
$$c^j_k = \langle f, \phi^j_k \rangle = \sum_n h_{n-2k}\, c^{j+1}_n, \tag{3.11}$$
$$\theta^j_k = \langle f, \psi^j_k \rangle = \sum_n g_{n-2k}\, c^{j+1}_n. \tag{3.12}$$

Footnote (on the choice of $g_k$ in Equation (3.6)): It turns out that there exist other
recipes for $g_k$. For example, $g_k = (-1)^k h_{1-k+2n}$, $n \in \mathbb{Z}$, or
$g_k = (-1)^{k-1} h_{-k-1}$ also work.
In practice, there is a fundamental limit on the meaningful resolution when we sample
a continuous (infinitely high resolution) temporal signal or spatial image. Therefore,
we usually start with a scale subspace $V_J$, with $J$ chosen to be large enough to represent
the finest details of interest in a signal, since $P_J f \approx f$ for large $J$ (see Equation
(3.4)). So we replace the semi-infinite sum in Equation (3.10) with a sum over a finite
number of scales $J_0 \le j \le J$, $\{J, J_0\} \subset \mathbb{Z}$, where $J_0$ and $J$ indicate the coarsest scale
(or lowest resolution) and finest scale (or highest resolution), respectively.

This point can be paraphrased from another perspective as well. At high
resolutions, the scaling functions are similar to Dirac delta functions (assuming $\phi$ is
localized and well-behaved, i.e., $\lim_{t \to \infty} \phi(t) = 0$), since the time scale is compressed
while the magnitude scale is enlarged: as $j \to \infty$,
$$2^{j/2}\phi(2^j t - k) \to 2^{-j/2}\delta(t - k 2^{-j}),$$
which can be verified by checking the integration results of both sides. Therefore, for
$j$ sufficiently large, for example, $j = J$, we have
$$\begin{aligned}
c^J_k &= \langle f, \phi^J_k \rangle \\
&= \int_{-\infty}^{\infty} f(t)\, 2^{J/2} \phi(2^J t - k)\, dt \\
&\approx \int_{-\infty}^{\infty} f(t)\, 2^{-J/2} \delta(t - k 2^{-J})\, dt \\
&= 2^{-J/2} f(k 2^{-J}) = 2^{-J/2} f(k T_s).
\end{aligned}$$

In other words, the scaling coefficients are approximately proportional to signal samples
taken with a sampling period of $T_s = 2^{-J}$. So in the practical computation of the DWT, we start
with an initial set of scaling coefficients $c^J_k$, which are assumed to represent an
approximation to the signal $f$ at a certain scale $2^{-J}$ (corresponding to the sampling period
$T_s = 2^{-J}$).

The wavelet and scaling coefficients at coarser scales $j < J$ can be computed
recursively using the lowpass scaling filter $\{h_{-n}\}$ and highpass wavelet filter $\{g_{-n}\}$,
but only even-indexed samples at the filter outputs are retained (downsampling) according
to Equations (3.11) and (3.12). This is called the pyramidal algorithm [65], the
realization structure for which is depicted in Figure (3.3).

Similarly, for signal synthesis (inverse DWT), we have
$$c^{j+1}_k = \sum_m c^j_m h_{k-2m} + \sum_m \theta^j_m g_{k-2m}. \tag{3.13}$$

The synthesis operations can be implemented using filter banks as well, involving
interpolation (upsampling) and the two filters $\{h_n\}$ and $\{g_n\}$, as shown in Figure
(3.4).

Figure 3.2. Nested scale spaces and wavelet spaces

Under some conditions [32, 65] a filter $\{h_n \mid n \in \mathbb{Z}\}$ corresponds to a valid MRA
satisfying the aforementioned conditions. Determination of the set of coefficients
$\{h_n \mid n \in \mathbb{Z}\}$ (or the corresponding scaling function $\phi(t)$) is beyond the scope of this
dissertation. Here, we are only interested in the simplest DWT, i.e., the Haar wavelet transform,
in which case,
$$\phi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise,} \end{cases}
\qquad
\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise,} \end{cases}$$
and the analysis algorithm degenerates to a much simpler form. See Section (3.4).
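
For the Haar case, the analysis and synthesis recursions of Equations (3.11) to (3.13) reduce to scaled pairwise sums and differences. The following Python sketch is a minimal illustration, assuming the Haar scaling filter $h_0 = h_1 = 1/\sqrt{2}$ and the wavelet filter $g_k = (-1)^k h_{1-k}$, i.e., $g_0 = 1/\sqrt{2}$, $g_1 = -1/\sqrt{2}$. The function names and the test signal are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis_step(c_fine):
    """One fine-to-coarse step: scaling and wavelet coefficients at the next coarser scale."""
    half = len(c_fine) // 2
    c_coarse = [(c_fine[2 * k] + c_fine[2 * k + 1]) / SQRT2 for k in range(half)]
    theta = [(c_fine[2 * k] - c_fine[2 * k + 1]) / SQRT2 for k in range(half)]
    return c_coarse, theta


def haar_synthesis_step(c_coarse, theta):
    """One coarse-to-fine step: rebuild the finer scaling coefficients."""
    c_fine = []
    for c, t in zip(c_coarse, theta):
        c_fine.append((c + t) / SQRT2)  # even-indexed sample
        c_fine.append((c - t) / SQRT2)  # odd-indexed sample
    return c_fine


def haar_dwt(signal):
    """Full pyramid. Returns (coarsest scaling coefficients, details ordered coarse to fine)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of 2")
    c = list(signal)
    details = []
    while len(c) > 1:
        c, theta = haar_analysis_step(c)
        details.insert(0, theta)
    return c, details


def haar_idwt(c, details):
    """Inverse of haar_dwt."""
    c = list(c)
    for theta in details:
        c = haar_synthesis_step(c, theta)
    return c


# Hypothetical piecewise-constant signal with a single edge in the middle.
signal = [4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0]
coarsest, details = haar_dwt(signal)
reconstructed = haar_idwt(coarsest, details)
print(coarsest, details)
```

For this signal only the coarsest wavelet coefficient is non-zero, because the single edge falls exactly on the boundary between the two halves; all finer-scale differences vanish.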

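[Figure 3.3 diagram: from the fine scale (high resolution) to the coarse scale (low resolution), the scaling coefficients at scale j+1 are filtered by g_{-n} and h_{-n} and downsampled by 2 to give the wavelet and scaling coefficients at scale j; the scaling coefficients at scale j are processed in the same way to reach scale j-1.]

Figure 3.3. Computation of DWT by filter bank

[Figure 3.4 diagram: from the coarse scale (low resolution) to the fine scale (high resolution), the wavelet and scaling coefficients are upsampled by 2, filtered by g_n and h_n, and combined to give the scaling coefficients at the next finer scale.]

Figure 3.4. Synthesis by filter bank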

3.2.2 Properties of the Discrete Wavelet Transform

Several attractive properties make the wavelet transform ideal for many applications
in signal and image processing [15, 42]. Two key properties are multi-resolution
and locality: each wavelet $\psi^j_k(t)$ is only a dilated and translated version of the original
mother wavelet $\psi(t)$ and is localized simultaneously in time and frequency. A
third property is the compressive property: the wavelet coefficients of real-world
signals/images tend to be sparse. Two final properties are clustering and persistence: if
one wavelet coefficient is large/small, then its adjacent coefficients within the same
scale are very likely to also be large/small; and large/small values of wavelet
coefficients tend to propagate across scales. The clustering property suggests that
coefficients may have strong dependencies within scale, while persistence leads to
dependencies across scale. The hidden Markov model used in Section (3.4) utilizes
these properties.

It is very important to note that the above description is just the mathematical
multi-scale analysis of a signal/image. In many applications of multi-scale analysis
[10, 11, 40], it is the physical nature of the real signal (image) that is directly
modeled via a multi-scale representation. In those cases, no wavelet transform is
performed on the image at all. The motivation is to utilize the nice multi-scale
structure, whose two-fold benefit is summarized in Subsection (3.1.3) and is
well embodied in the review of Algorithm I in Section (3.3) and Appendix V. This
is a fundamental point to be kept in mind. Otherwise, when multi-scale
analysis is mentioned, one may be misled into jumping straight to a specific wavelet
transform and gain nothing.

In other words, multi-scale analysis and wavelet analysis are not synonymous:
wavelet analysis is just one form of multi-scale analysis. Of course, wavelet analysis
has some benefits as well, one of which, as opposed to other multi-scale analysis
methods, is that it provides an orthogonal multi-scale decomposition that facilitates
modeling and computation, as we shall see in Section (3.4).

3.3 Image Segmentation by Multi-scale Markov Modeling Discrete Field

In order to better understand the basic ideas of multi-scale modeling and to see clearly
the distinction between Algorithm I and Algorithm II, I briefly outline the main points
of Algorithm I [10] in this section. A more detailed mathematical derivation appears
in Appendix VI of Chapter 5.

3.3.1 Main Points of Algorithm I

First of all, let us refer to Figure (3.5) and make clear what the field $X^j$ physically
means. Each resolution is a level in a quad-tree in the 2-D case, so a lattice point at
one resolution corresponds to four points at the next finer resolution. This group of
four pixels in the continuous image $Y$ is considered as a block and $X^j$ denotes the
field containing the labeling of each of the blocks at resolution $j$. We may assume
that the finer segmentation $X^{j+1}$ of $X$ is an interpolated version of $X^j$ [9]. Note that in
Figure (3.5), $Y^J \equiv Y$ and $X^J \equiv X$.

The fundamental assumption in Algorithm I is that the sequence of $X^j$ forms a
first-order Markov chain [62], i.e., the distribution of $X^j$ given all coarser scale fields
is only dependent on $X^{j-1}$:
$$p(x^j \mid x^l, l \le j - 1) = p_{x^j | x^{j-1}}(x^j \mid x^{j-1}). \tag{3.14}$$

This pyramid structure of the multi-scale random fields (MSRF) is depicted in Figure
(3.5b).
Figure 3.5. Pyramid structure of the MSRF. (a) Continuous image $Y$ at different
scales. (b) The random field $X^j$ at each scale is causally dependent on the coarser
scale field $X^{j-1}$ above it.

The second important assumption is that each pixel $X^j_k$ is only dependent on
a local neighborhood of pixels at the next coarser scale. $\partial k$ is used to denote the
neighborhood of point $k$. The first choice of the neighborhood $\partial k$ in [10] is a quad-tree
structure in the 2-D case or a binary-tree structure in the 1-D case, as depicted in Figure
(3.6). Specifically, in the quad-tree structure each point is only dependent on a single
point at the next coarser scale, its father $\partial k$. In words, if, by some means, we
know that the father pixel is active ($X^{j-1}_{\partial k} = 1$), then with a high probability, we may say that its
four children are active as well. This probability is just the transition probability
between individual pixels from a coarser scale to a finer scale. The transition
probability that $X^j_k$ has state $m$ given that its father is in state $m'$ is
$$p_{x^j_k | x^{j-1}_{\partial k}}(m \mid m') = q^j \delta_{m,m'} + \frac{1 - q^j}{M}. \tag{3.15}$$

This equation tells us two facts: 1) the probability that the labeling will remain the
same from scale $j - 1$ to $j$ is $q^j + \frac{1 - q^j}{M}$, where $q^j \in [0, 1]$; and 2) the probabilities that
the child has any one of the labels different from its parent's are equal,
i.e., $\frac{1 - q^j}{M}$. The so-called “physical dependency” previously mentioned in Subsection
(3.1.3) is embodied in the transition probability of the Markov chain.
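
The following Python sketch illustrates the transition model described above for the 1-D binary-tree case. It assumes the transition probability has the form $q\,\delta_{m,m'} + (1 - q)/M$ stated in the text (probability $q + (1-q)/M$ of keeping the parent's label, $(1-q)/M$ for each other label) and, for simplicity, uses the same $q$ at every scale. The function names and parameter values are hypothetical, and the sketch only draws labels from the prior; it is not the SMAP segmentation algorithm of [10].

```python
import random


def transition_prob(m, m_parent, q, num_states):
    """Probability that a child has label m given its parent has label m_parent."""
    same = 1.0 if m == m_parent else 0.0
    return q * same + (1.0 - q) / num_states


def sample_child(m_parent, q, num_states, rng):
    """Draw one child label from the transition distribution."""
    weights = [transition_prob(m, m_parent, q, num_states) for m in range(num_states)]
    return rng.choices(range(num_states), weights=weights, k=1)[0]


def sample_label_pyramid(root_label, num_scales, q, num_states, rng, children_per_node=2):
    """Sample labels coarse to fine; level j holds children_per_node**j labels.

    children_per_node=2 corresponds to the binary tree (1-D case); 4 would
    correspond to the quad-tree (2-D case) with the blocks flattened into a list.
    """
    levels = [[root_label]]
    for _scale in range(num_scales - 1):
        parents = levels[-1]
        children = []
        for parent in parents:
            for _child in range(children_per_node):
                children.append(sample_child(parent, q, num_states, rng))
        levels.append(children)
    return levels


# Hypothetical example: M = 2 labels, q = 0.8, four scales, fixed seed.
example_levels = sample_label_pyramid(1, 4, 0.8, 2, random.Random(0))
for level in example_levels:
    print(level)
```

With $q$ close to 1, labels tend to persist from parent to child, so an active block at a coarse scale is likely to have active descendants; with $q = 0$ the child labels are uniformly distributed regardless of the parent.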

This choice of neighborhood structure leads to a simple and efficient algorithm,
called the sequential MAP (SMAP) algorithm, for image/signal segmentation
[10]. For more details, refer to Appendix VI of Chapter 5.

3.3.2 Simulation Result

Figure (3.7) shows the segmentation result for one 2-D image. In Figure (3.7) there
are $M = 2$ states. Each state corresponds to a Gaussian distribution whose
mean and variance are dictated by its label.

Figure 3.6. Neighborhood structure used in Algorithm I. (a) Quad-tree structure used
for the 2-D case; (b) Binary tree structure used for the 1-D case.

3.4 Image Segmentation by Multi-scale Hidden Markov Model (MHMM) of Wavelet Coefficients

An alternative to the multi-scale method of [10], based on wavelet analysis, can also be
used for image segmentation; it is a major contribution of this chapter. The main
idea is to take advantage of the properties of the DWT, as explained in Section (3.2.2).
Its origin traces back to [15, 49]. The new method consists of two procedures:
the first one is edge detection and the second is label estimation. In the following
subsections, I will address the first step of edge detection in detail, since it is the
core of our algorithm. I will then briefly explain the second step, which is much more
straightforward, and provide some example applications.

The idea of applying wavelet analysis to edge detection is quite simple. Roughly
speaking, wavelet coefficients represent the differences between signal/image
approximations at different scales (or resolutions). Hence, they are actually a kind of
differentiation and, intuitively, are well suited for edge detection.

Figure 3.7. One simulated image segmentation using Algorithm I. (a) Original noisy
image; (b) Segmentation result.

3.4.1 Likelihood Function for fMRI Data

In this chapter, I deal exclusively with fMRI magnitude (rather than complex) time
series, which are approximately Gaussian distributed at reasonable SNR levels, as
discussed in Chapter 2. Referring back to Equation (2.4), the correlation between the
reference signal $r$ and the magnitude time series $z$:
$$y = r^T z = \sum_{i=1}^{N} r_i z_i, \tag{3.16}$$
is Gaussian distributed (provided the signal-to-noise ratio is not very small). Once again,
the reader is reminded that $y$ in this chapter denotes the correlation value.

Hence, the fMRI correlation image may be modeled as a 2-D Gaussian process.
For simplicity, let us first consider the 1-D case. The extension to the 2-D case is given
in Subsection (3.4.7). Following convention, we assume that the length of the
correlation sequence is a power of 2. The observation model is:
$$y^J_k = \rho^J_k + w^J_k, \quad k = 0, \ldots, 2^J - 1, \tag{3.17}$$
where $y^J \equiv \{y^J_k\}$ are the observations, $\rho^J \equiv \{\rho^J_k\}$ are the “true” correlation values, and
$\{w^J_k\}$ are noise.

Now we are going to employ the Haar multi-scale analysis:
$$y^j_k = \frac{y^{j+1}_{2k} + y^{j+1}_{2k+1}}{\sqrt{2}}, \quad k = 0, \ldots, 2^j - 1, \quad J_0 \le j \le J - 1.$$

The multi-scale analyses of $\rho$ and $w$ are defined in an analogous way. The binary
tree structure of this multi-scale data analysis from scale $j+1$ to scale $j$ and then to
scale $j-1$ (fine-to-coarse) is shown in Figure (3.8) (adapted from [49]).

[Figure 3.8 diagram: binary tree with one node at the coarse scale j-1, two nodes at scale j, and four nodes at the fine scale j+1.]

Figure 3.8. Binary data tree structure for Haar wavelet analysis.

It is straightforward to see that
$$y^j_k = \rho^j_k + w^j_k. \tag{3.18}$$

The noises $\{w^J_k, k = 0, \ldots, 2^J - 1\}$ are assumed to be independent, identically
distributed (i.i.d.) Gaussian random variables with zero mean and variance $\sigma^2$.
Because the Haar wavelet transform is an orthonormal transform, it then follows that the
preceding sentence is also true for any $j$, resulting in the likelihood function (refer to
the above Equation (3.18)):
$$p(y^j \mid \rho^j) = \prod_{k=0}^{2^j - 1} \mathcal{N}(y^j_k \mid \rho^j_k, \sigma^2), \quad J_0 \le j \le J, \tag{3.19}$$
where $y^j = \{y^j_k\}_{k=0}^{2^j - 1}$ and similarly for $\rho^j$, and $\mathcal{N}(x \mid \mu, \sigma^2)$ denotes a Gaussian density
with mean $\mu$ and variance $\sigma^2$ evaluated at the point $x$.
The relationship between a “parent” (e.g., $y^j_k$) and a “child” (e.g., $y^{j+1}_{2k}$) is very
important in multi-scale data analysis. The parent-child conditional likelihood in our
case turns out to be:
$$p(y^{j+1}_{2k} \mid y^j_k, \rho) = \mathcal{N}\!\left(y^{j+1}_{2k} \,\Big|\, \frac{y^j_k}{\sqrt{2}} + \frac{\theta^j_k}{\sqrt{2}}, \frac{\sigma^2}{2}\right), \tag{3.20}$$
where
$$\theta^j_k = \frac{\rho^{j+1}_{2k} - \rho^{j+1}_{2k+1}}{\sqrt{2}} \tag{3.21}$$
is simply the Haar wavelet coefficient of the true correlation $\rho$ at scale $j$ and location $k$.
This nice form of the likelihood suggests the use of a special kind of a priori model
for the wavelet coefficients in Subsection (3.4.2), which complements the observation
model and leads to a closed-form expression for the a posteriori distribution of the states
in Subsection (3.4.4).
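The conditional density in Equation (3.20) is derived as follows.
$$y^j_k = \frac{y^{j+1}_{2k} + y^{j+1}_{2k+1}}{\sqrt{2}},$$
$$\theta y^j_k = \frac{y^{j+1}_{2k} - y^{j+1}_{2k+1}}{\sqrt{2}}
= \frac{\rho^{j+1}_{2k} + w^{j+1}_{2k} - \rho^{j+1}_{2k+1} - w^{j+1}_{2k+1}}{\sqrt{2}}
= \theta^j_k + \theta w^j_k,$$
where $\theta y^j_k$ and $\theta w^j_k$ (read $\theta y$ and $\theta w$ as one symbol, not the multiplication of two
symbols) are the Haar wavelet coefficients for the observed correlation data $y$ and the noise
$w$, respectively.

Summing up,
$$y^{j+1}_{2k} = \frac{y^j_k}{\sqrt{2}} + \frac{\theta y^j_k}{\sqrt{2}}. \tag{3.22}$$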

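And since $\rho$ and $w$ are independent, so are $\theta^j_k$ and $\theta w^j_k$. It can also be shown that
$y^j_k$ and $\theta w^j_k$ are independent, since they are jointly Gaussian and uncorrelated:
$$E\left(y^j_k\, \theta w^j_k\right) = E\left(w^j_k\, \theta w^j_k\right)
= \frac{1}{2}\left[E\left((w^{j+1}_{2k})^2\right) - E\left((w^{j+1}_{2k+1})^2\right)\right]
= \frac{1}{2}(\sigma^2 - \sigma^2) = 0.$$

Since both $y^j_k$ and $\theta^j_k$ are independent of $\theta w^j_k$, and
$$\theta w^j_k \sim \mathcal{N}(0, \sigma^2),$$
by the property of conditional likelihood [62, 61], Equation (3.20) ensues, which
completes the derivation.

Further, the likelihood function in Equation (3.19) with $j = J$ can be factorized
as follows:
$$p(y \mid \rho) = p(y^{J_0} \mid \rho^{J_0}) \prod_{j=J_0}^{J-1} \prod_{k=0}^{2^j - 1} p(y^{j+1}_{2k} \mid y^j_k, \theta^j_k), \tag{3.23}$$
where $J_0$ is the coarsest scale for the analysis (usually we use $J_0 = 0$), $p(y^{j+1}_{2k} \mid y^j_k, \theta^j_k)$
is given by Equation (3.20) and $p(y^{J_0} \mid \rho^{J_0})$ is given by Equation (3.19) with $j = J_0$.
Note that $\rho^0_0$ is the global average of the correlation data.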

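Figure 3.9. Propagation of wavelet coefficients (three scales)

The factorization follows from the following line of reasoning. Let us refer to
Figure (3.9), which is an example with three scales. The key point is that the information
contained in the data at the finest scale $\{y^2_0, y^2_1, y^2_2, y^2_3\}$ is completely the same as that in
$\{y^0_0, y^1_0, y^2_0, y^2_2\}$ (corresponding to the white dots in Figure (3.9)); therefore,
$$\begin{aligned}
p(y^2 \mid \rho) &= p(y^0_0, y^1_0, y^2_0, y^2_2 \mid \rho) \\
&= p(y^0_0 \mid \rho)\, p(y^1_0, y^2_0, y^2_2 \mid y^0_0, \rho) \\
&= p(y^0_0 \mid \rho)\, p(y^1_0 \mid y^0_0, \rho)\, p(y^2_0, y^2_2 \mid y^0_0, y^1_0, \rho) \\
&= p(y^0_0 \mid \rho)\, p(y^1_0 \mid y^0_0, \rho)\, p(y^2_0 \mid y^1_0, y^0_0, \rho)\, p(y^2_2 \mid y^2_0, y^1_0, y^0_0, \rho) \\
&= p(y^0_0 \mid \rho^0_0)\, p(y^1_0 \mid y^0_0, \rho)\, p(y^2_0 \mid y^1_0, \rho)\, p(y^2_2 \mid y^1_1, \rho)
\end{aligned} \tag{3.24}$$
due to Equation (3.20), for example:
$$p(y^1_0 \mid y^0_0, \rho) = \mathcal{N}\!\left(y^1_0 \,\Big|\, \frac{y^0_0}{\sqrt{2}} + \frac{\theta^0_0}{\sqrt{2}}, \frac{\sigma^2}{2}\right).$$

Generalization of Equation (3.24) leads to Equation (3.23).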
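
The factorization can be checked numerically. The Python sketch below is a minimal illustration under the Gaussian model of this section: it evaluates the log-likelihood once directly at the finest scale and once through the coarse-to-fine product of parent-child terms with mean $(y^j_k + \theta^j_k)/\sqrt{2}$ and variance $\sigma^2/2$. The function names and all data values are hypothetical. When both sides are evaluated as normalized densities, the two log-likelihoods agree up to an additive constant that does not depend on $\rho$ (it arises from the change of variables between the two sets of coordinates), so the factorized form carries the same information about $\rho$.

```python
import math

SQRT2 = math.sqrt(2.0)


def log_normal(x, mean, var):
    """Log of the Gaussian density N(x | mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def haar_pyramid(fine):
    """Return (scaling, details): scaling[j] holds scale-j sums, details[j] scale-j differences."""
    scaling = [list(fine)]
    details = []
    while len(scaling[0]) > 1:
        cur = scaling[0]
        half = len(cur) // 2
        scaling.insert(0, [(cur[2 * k] + cur[2 * k + 1]) / SQRT2 for k in range(half)])
        details.insert(0, [(cur[2 * k] - cur[2 * k + 1]) / SQRT2 for k in range(half)])
    return scaling, details


def direct_loglik(y, rho, var):
    """Finest-scale log-likelihood: independent Gaussians with means rho."""
    return sum(log_normal(a, b, var) for a, b in zip(y, rho))


def factorized_loglik(y, rho, var):
    """Coarsest-scale term plus one parent-child term per tree node."""
    ys, _unused = haar_pyramid(y)
    rs, thetas = haar_pyramid(rho)
    total = log_normal(ys[0][0], rs[0][0], var)
    for j, theta_j in enumerate(thetas):
        for k, theta in enumerate(theta_j):
            mean = (ys[j][k] + theta) / SQRT2
            total += log_normal(ys[j + 1][2 * k], mean, var / 2.0)
    return total


# Hypothetical data (length 4, i.e., three scales) and two candidate rho vectors.
y_obs = [1.0, 0.5, -0.2, 2.0]
rho_a = [0.8, 0.8, 0.0, 1.5]
rho_b = [0.0, 0.0, 0.0, 0.0]
noise_var = 0.7

offset_a = factorized_loglik(y_obs, rho_a, noise_var) - direct_loglik(y_obs, rho_a, noise_var)
offset_b = factorized_loglik(y_obs, rho_b, noise_var) - direct_loglik(y_obs, rho_b, noise_var)
print(offset_a, offset_b)
```

The two printed offsets coincide, which is what makes the factorized likelihood usable in place of the direct one for inference about $\rho$.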

3.4.2 Key Point for Edge Detection and A Priori Distribution for the Wavelet Coefficients

Now let us consider the (joint) a priori probability for the (unknown) wavelet coefficients
$\theta$. A simple approach is to model them as independent Gaussian mixture random
variables. We move beyond this simple prior by specifying probabilistic dependencies
between the states underlying the mixtures of parent and child wavelet coefficients.
To deduce discrete state estimates from continuous data, the key point of our
algorithm is to associate the continuous wavelet coefficients with a 2-state (discrete)
Markov chain as depicted in Figure (3.10), i.e., each wavelet coefficient is described
by a Gaussian Mixture Model (GMM). In Figure (3.10), each black node represents
a continuous-valued wavelet coefficient $\theta^j_k$; each white node represents the discrete
hidden state variable $s^j_k$ for the corresponding wavelet coefficient $\theta^j_k$ (connected by
a solid line to the state variable $s^j_k$). To match the inter-scale coefficient dependencies,
the hidden states are vertically linked across scale by dashed lines. Connections
across scale capture the “parent-child” dependency inherent in the DWT of natural
signals/images [49].

For our real problem of edge detection, the states of Markov chain are unknown
(“hidden”) and represent the presence or absence of edges: state 0 indicates a ho-
mogeneous region; state 1 represents the existence of an edge. If we believe that
the underlying signal is generally smooth with a few large edges, then the follow-
ing modeling is intuitively reasonable. Speciﬁcally, consider two-state mixture model
where state ‘0’ is a highly probable low-variance Gaussian density, indicative of a
homogeneous region, while state ‘1’, corresponding to another less likely Gaussian
density with a larger variance, indicates the presence of an edge (non-smooth area).
Using this interpretation, we may test for the presence of an edge simply by checking

whether or not the following condition holds:

p(si = lly) > p(si = on). (3.25)

If it holds, then we conclude there is an edge at scale j and location k.
Keeping these basic ideas in mind, let us now turn to a formal mathematical
description. The MHMM is based on the modeling assumption that the value of each
state $s_k^j$ is caused by the value of its parent state $s_{\lfloor k/2 \rfloor}^{j-1}$. This leads to the factorization

Figure 3.10. Wavelet-based HMM. (Tree diagram across scales j-1, j and j+1; white
nodes: hidden 'state' variables; black nodes: wavelet coefficients.)

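of the joint state probability function:

\[ p(s) = \prod_{j=J_0}^{J-1} \prod_{k=0}^{2^j-1} p\left(s_k^j \mid s_{\lfloor k/2 \rfloor}^{j-1}\right), \tag{3.26} \]

where $s = \{s_k^j\}$, $j = J_0, \ldots, J-1$, $k = 0, \ldots, 2^j-1$, and $p(s_0^0 \mid s_0^{-1}) \equiv p(s_0^0)$. At the
coarsest scale $j = 0$, there is no parent wavelet coefficient and so a prior is introduced
for the state $s_0^0$ of the wavelet coefficient $\theta_0^0$: $g_0^0(m) \equiv p(s_0^0 = m)$.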

Another property of HMMs [16, 52] in general is that, given their respective state
values, all parameters $\theta$ are conditionally independent, which is also implied by the
assumption of the model. That is,

\[ p(\theta \mid s) = \prod_{j=J_0}^{J-1} \prod_{k=0}^{2^j-1} p(\theta_k^j \mid s_k^j), \tag{3.27} \]

where $p(\theta_k^j \mid s_k^j)$ is assumed to be Gaussian as explained previously:

\[ p(\theta_k^j \mid s_k^j = m) = N\left(\mu_m^j, (\tau_m^j)^2\right). \tag{3.28} \]

We regard the signal and its wavelet coefficients as realizations of a zero-mean random
signal. Therefore, we assume $\mu_m^j = 0$ for all $m$ and $j$. In general, the variances $(\tau_m^j)^2$ are
scale (j)-dependent, but in our experiments we set them equal across scales.
In this case, there are only two parameters, $\tau_0^2$ and $\tau_1^2$.
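
The two-state zero-mean mixture can be illustrated numerically. The following is a minimal Python sketch, not the algorithm of this chapter: it treats a single wavelet coefficient in isolation (no parent-child links, no observation noise) and applies Bayes' rule to the mixture. The function names and the prior edge probability `p_edge = 0.05` are illustrative assumptions; the variances 1 and 100 are the values used later in Subsection (3.4.8).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def edge_posterior(theta, p_edge=0.05, var0=1.0, var1=100.0):
    """Posterior probability of state 1 (edge) for one isolated coefficient.

    Assumption: state 0 ~ N(0, var0) with prior 1 - p_edge,
                state 1 ~ N(0, var1) with prior p_edge.
    """
    w0 = (1.0 - p_edge) * gaussian_pdf(theta, 0.0, var0)
    w1 = p_edge * gaussian_pdf(theta, 0.0, var1)
    return w1 / (w0 + w1)
```

A small coefficient is attributed to the low-variance state and a large one to the high-variance state, which is the behavior the test in Equation (3.25) relies on.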

3.4.3 A Failed Modeling Attempt

An acute reader might raise the question: why abandon Algorithm I and proceed
to Algorithm II? The first reason is that I needed to develop something
new for my dissertation. The second reason is that we actually made a big mistake
before arriving at this two-step approach to image segmentation (edge detection
followed by state estimation) in its present form.

At this point, I would like to further emphasize the meaning of $X_k^j$ and its modeling
in Equation (3.15) in Algorithm I. Algorithm I places the label $X_k^j$ directly on an
abstraction of the scaling coefficient $Y_k^j$ (refer back to Figure (3.5)). This is a crucial virtue
and stands in sharp contrast with our Algorithm II, which places states on wavelet
coefficients. Since the states in Algorithm I directly reflect the image's classification
labels, it is easy to understand physically the transition of states between upper
and lower scales (see Equation (3.14) and Equation (3.15)), and thus to solve the image
segmentation problem directly. Another feature of Algorithm I, as already stated in
Subsection (3.2.2), is that it does not resort to any DWT at all, which is another
contrast with our Algorithm II.

During the second stage of my research, we at first also hoped to segment the
correlation image directly, in one step, into active and nonactive regions (instead of
the presently used edge detection and label estimation). The fundamental point was to
deduce the labels of image pixels from the states of wavelet coefficients. The modeling
was to use the state of a wavelet coefficient at the next coarser scale to represent the
labels of the two image pixels at the next finer scale which correspond to that wavelet
coefficient; mathematically, to associate $s_k^j$, the state of wavelet coefficient $\theta_k^j$ at scale
$j$, with one of four values, depending on the two corresponding image pixels at the
next finer scale $j + 1$:

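\[
s_k^j =
\begin{cases}
(0,0) & \mu_{2k}^{j+1},\ \mu_{2k+1}^{j+1} \text{ inactive (small)} \\
(0,1) & \mu_{2k}^{j+1} \text{ inactive},\ \mu_{2k+1}^{j+1} \text{ active} \\
(1,0) & \mu_{2k}^{j+1} \text{ active},\ \mu_{2k+1}^{j+1} \text{ inactive} \\
(1,1) & \mu_{2k}^{j+1},\ \mu_{2k+1}^{j+1} \text{ active (large)}
\end{cases}
\tag{3.29}
\]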

We also suspected that this approach is equivalent to Algorithm I. But after several
months of trials, it turned out that our initial conjecture was wrong. The underlying
reason is that the states of wavelet coefficients DO NOT have the same physical
interpretation as the labels of image pixels in Algorithm I. Further, according
to the compressive property of the DWT, the states of wavelet coefficients should be just
a few. However, consider the general situation of image segmentation in which
there are M > 2 possible labels for each image pixel. According to the modeling
in Equation (3.29), there would be $M^2$ labels for each wavelet coefficient, which
contradicts the compressive property.

That conjecture cost me several months of analytical formulation and numerical
simulation. However, our efforts were in vain. "Failure is the mother of
success." So later I changed the meaning of the states of wavelet coefficients to that
described in the preceding Subsection (3.4.2), and accordingly changed Algorithm
II to a two-step approach: edge detection followed by label estimation. The successful
results are shown in Subsection (3.4.8) and Section (3.5).

3.4.4 Solution for Joint a Posteriori State Probability

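Let us continue our discussion of Algorithm II. Having set up the formulations for the
likelihood and the a priori probability, we are now ready to determine the joint a
posteriori density of the states $s$ given observations of correlation data $y$. Note that:

\[
\begin{aligned}
p(s = m \mid y) &= \int p(s = m, \theta \mid y)\, d\theta \\
&\propto \int p(y \mid s = m, \theta)\, p(\theta \mid s = m)\, p(s = m)\, d\theta \\
&= \prod_{j=J_0}^{J-1} \prod_{k=0}^{2^j-1} \int p(y_{2k}^{j+1} \mid y_k^j, \theta_k^j, s_k^j = m_k^j)\, p(\theta_k^j \mid s_k^j = m_k^j)\,
   p(s_k^j = m_k^j \mid s_{\lfloor k/2 \rfloor}^{j-1} = m_{\lfloor k/2 \rfloor}^{j-1})\, d\theta_k^j \\
&= \prod_{j=J_0}^{J-1} \prod_{k=0}^{2^j-1} p(s_k^j = m_k^j \mid s_{\lfloor k/2 \rfloor}^{j-1} = m_{\lfloor k/2 \rfloor}^{j-1})\, L_k^j(s_k^j = m_k^j),
\end{aligned}
\tag{3.30}
\]

where $m_k^j$ is one particular (deterministic) value assumed by the random state variable $s_k^j$,
and $L_k^j(s_k^j = m) \propto p(y_{2k}^{j+1} \mid y_k^j, s_k^j = m)$, the essential ingredients for our estimation of
the a posteriori states, are actually marginal likelihoods. From the likelihood function
in Equation (3.20) and the prior in Equation (3.28), we derive them to be:

\[
\begin{aligned}
L_k^j(m) &= \int p(y_{2k}^{j+1} \mid y_k^j, \theta_k^j, s_k^j = m)\, p(\theta_k^j \mid s_k^j = m)\, d\theta_k^j \\
&\propto p(y_{2k}^{j+1} \mid y_k^j, s_k^j = m) \\
&= N\left( \frac{\mu_m^j}{\sqrt{2}} + \frac{y_k^j}{\sqrt{2}},\ \frac{(\tau_m^j)^2 + \sigma^2}{2} \right),
\end{aligned}
\tag{3.31}
\]

for $J_0 \le j \le J-1$, $0 \le k \le 2^j - 1$, and $m = 0$ or $1$.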

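Proof: First recall Equation (3.22):

\[ y_{2k}^{j+1} = \frac{y_k^j}{\sqrt{2}} + \frac{\theta_k^j}{\sqrt{2}} + \frac{\sigma w_k^j}{\sqrt{2}}, \]

where the first two terms are independent of the third one.

Further, recall Equation (3.28), i.e.,

\[ \theta_k^j \mid (s_k^j = m) \sim N\left(\mu_m^j, (\tau_m^j)^2\right). \]

Therefore,

\[
\begin{aligned}
E(y_{2k}^{j+1} \mid y_k^j, s_k^j = m) &= \frac{\mu_m^j}{\sqrt{2}} + \frac{y_k^j}{\sqrt{2}}, \\
Var(y_{2k}^{j+1} \mid y_k^j, s_k^j = m) &= \frac{(\tau_m^j)^2 + \sigma^2}{2}.
\end{aligned}
\]

So $L_k^j(m) \propto p(y_{2k}^{j+1} \mid y_k^j, s_k^j = m)$ (where $y_k^j$ is regarded as constant), one marginal
density function, has the closed-form representation in Equation (3.31). QED.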
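
The closed form can be evaluated directly. The following is a minimal Python sketch under the assumption that the marginal likelihood is a Gaussian in the child scaling coefficient with mean (mu_m + y_parent)/sqrt(2) and variance (tau_m^2 + sigma^2)/2, which is how the result above reads; the function and argument names are hypothetical.

```python
import math


def marginal_likelihood(y_child, y_parent, mu_m, tau2_m, sigma2):
    """Gaussian marginal likelihood of state m for one parent/child pair.

    Assumed form: N(mean, var) evaluated at y_child, with
      mean = (mu_m + y_parent) / sqrt(2)
      var  = (tau2_m + sigma2) / 2
    """
    mean = (mu_m + y_parent) / math.sqrt(2.0)
    var = (tau2_m + sigma2) / 2.0
    return math.exp(-(y_child - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Two states with zero means and variances 1 and 100, noise variance 1.
L0 = marginal_likelihood(5.0, 0.0, 0.0, 1.0, 1.0)
L1 = marginal_likelihood(5.0, 0.0, 0.0, 100.0, 1.0)
```

With these illustrative numbers the large deviation of the child from its predicted value favors the high-variance (edge) state, so `L1` exceeds `L0`.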

3.4.5 Marginal a Posteriori State Probability Calculation

After determining the joint a posteriori state probability, we can use an
upward-downward probability propagation algorithm [15] to determine the marginal a
posteriori probability of the state $s_k^j$ for each wavelet coefficient $\theta_k^j$, and then use
Equation (3.25) to test for the presence of an edge.

In the upward-downward algorithm, the Up Step marginalizes the a posteriori
state probability recursively from the finest scale $j = J - 1$ to the coarsest scale
$j = 0$. At the end, the a posteriori state probabilities $\{p(s_0^0 = m \mid y)\}_{m=0}^{M-1}$ are provided,
and partial marginalizations are also stored for use in the Down Step. The Down
Step computes the marginal a posteriori state probabilities for each $s_k^j$ recursively.

Specifically, the upward-downward algorithm goes as follows [15, 49]. In the following,
$g_k^j(m \mid n) \equiv p(s_k^j = m \mid s_{\lfloor k/2 \rfloor}^{j-1} = n)$ and $g_0^0(m) \equiv p(s_0^0 = m)$.

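Upward-Downward Propagation Algorithm

Up Step
Beginning at $j = J - 1$, compute
\[ q_k^j(n) = \sum_{m=0}^{M-1} g_k^j(m \mid n)\, L_k^j(m). \tag{3.32} \]
Then for $j = J-2, \ldots, 1$
\[ q_k^j(n) = \sum_{m=0}^{M-1} q_{2k}^{j+1}(m)\, q_{2k+1}^{j+1}(m)\, g_k^j(m \mid n)\, L_k^j(m) \tag{3.33} \]
and for $j = 0$
\[ q_0^0(n) = q_0^1(n)\, q_1^1(n)\, g_0^0(n)\, L_0^0(n). \tag{3.34} \]

The final quantities $\{q_0^0(m)\}$ are the (unnormalized) posterior state probabilities
$\{p(s_0^0 = m \mid y)\}_{m=0}^{M-1}$.

Down Step
Beginning with the posterior state probabilities at scale 0, set $p_0^0(m) = q_0^0(m)$. Then
for $j = 1, \ldots, J-2$
\[ p_k^j(m) = \sum_{n=0}^{M-1} p_{\lfloor k/2 \rfloor}^{j-1}(n)\, \frac{g_k^j(m \mid n)\, q_{2k}^{j+1}(m)\, q_{2k+1}^{j+1}(m)\, L_k^j(m)}{q_k^j(n)} \tag{3.35} \]
and for $j = J - 1$
\[ p_k^j(m) = \sum_{n=0}^{M-1} p_{\lfloor k/2 \rfloor}^{j-1}(n)\, \frac{g_k^j(m \mid n)\, L_k^j(m)}{q_k^j(n)}. \tag{3.36} \]

The final quantities $\{p_k^j(m)\}_{m=0}^{M-1}$ are the desired marginal a posteriori state
probabilities $\{p(s_k^j = m \mid y)\}_{m=0}^{M-1}$.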

This determines all the marginal a posteriori state probabilities. Then, according to
the criterion in Equation (3.25), if the a posteriori probability that state $s_k^j$ is 1 is
greater than the a posteriori probability that state $s_k^j$ is 0, we decide that there
is an edge between $\mu_{2k}^{j+1}$ and $\mu_{2k+1}^{j+1}$. Note that in this formulation we actually adopt the
Maximum Marginal a Posteriori criterion.
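
The two sweeps can be sketched in code. The following is a minimal NumPy sketch, assuming a complete binary tree with 2^j nodes at scale j and a single transition matrix shared by all nodes; the names `upward_downward`, `likelihoods`, `trans` and `prior` are hypothetical. Unlike the Down Step as stated above, the sketch normalizes the root posterior so that every returned marginal sums to one.

```python
import numpy as np


def upward_downward(likelihoods, trans, prior):
    """Marginal posterior state probabilities on a binary tree.

    likelihoods[j]: array of shape (2**j, M), marginal likelihood of each
                    state at each node of scale j (j = 0 is the root).
    trans[m, n]:    p(child state = m | parent state = n) (assumed shared).
    prior[m]:       prior probability of the root state.
    Returns a list p with p[j][k, m] = p(state of node (j, k) = m | data).
    """
    trans = np.asarray(trans, dtype=float)
    prior = np.asarray(prior, dtype=float)
    J = len(likelihoods)
    b = [None] * J  # b[j][k, m]: evidence at and below node (j, k) given its state m
    q = [None] * J  # q[j][k, n]: message from node (j, k) to its parent in state n

    # Up step: finest scale to coarsest scale.
    for j in range(J - 1, -1, -1):
        b[j] = np.asarray(likelihoods[j], dtype=float).copy()
        if j < J - 1:
            b[j] *= q[j + 1][0::2] * q[j + 1][1::2]  # children 2k and 2k+1
        if j > 0:
            q[j] = b[j] @ trans  # sum over own state m

    # Down step: coarsest scale to finest scale.
    p = [None] * J
    root = prior * b[0]
    p[0] = root / root.sum()
    for j in range(1, J):
        parent = np.repeat(p[j - 1], 2, axis=0)  # parent of node k is node k // 2
        # joint[k, m, n] = p_parent(n) * trans[m, n] * b(m) / q(n)
        joint = (parent / q[j])[:, None, :] * trans[None, :, :] * b[j][:, :, None]
        p[j] = joint.sum(axis=2)
    return p
```

An edge would then be declared at node (j, k) whenever `p[j][k, 1] > p[j][k, 0]`, following Equation (3.25).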

3.4.6 Image Label Estimation

After the edges are determined, it is straightforward to formulate a likelihood ratio test
to estimate the label of each homogeneous region.

Consider the following multi-hypothesis problem. The observation data
$y = [y_1\ y_2\ \cdots\ y_n]^T$ within each homogeneous region is a Gaussian random vector of
dimension $n$. The $M$ hypotheses are

\[ H_i: y \sim N(m_i, C_i), \quad i = 1, 2, \ldots, M, \tag{3.37} \]

where $m_i$ and $C_i$, which are assumed known, are the mean vector and covariance
matrix of the observation under the $i$th hypothesis ($i = 1, 2, \ldots, M$). If each
hypothesis is equally likely and the minimum error criterion is adopted [61], the decision
rule boils down to

\[ \text{choosing } H_j \text{ where } j = \arg\min_i \left\{ \|y - m_i\|^2 + \ln|C_i| \right\}, \tag{3.38} \]

where $\|y - m_i\|^2 \equiv (y - m_i)^T C_i^{-1} (y - m_i)$ here and $|C_i|$ is the determinant of $C_i$.
In the following numerical simulation, the observed data within each homogeneous
region are assumed to be i.i.d.; that is, in each region it is assumed
that the mean vector is $m_i \mathbf{1}$ (a constant vector with scalar mean $m_i$) and $C_i = \sigma_i^2 I$.

Some important practical problems are the assignment of the a priori probability
$g_0^0(m)$ for $s_0^0$ and the state transition probabilities
$g_k^j(m \mid n) = p(s_k^j = m \mid s_{\lfloor k/2 \rfloor}^{j-1} = n)$, as well as the variances $\tau_0^2$ and $\tau_1^2$ characterizing
the Gaussian mixture density in Equation (3.28). Selection of the parameters for the
Gaussian distribution characterizing each homogeneous region is also crucial. These
parameters may be estimated by a complicated EM algorithm [61]. For the initial
investigation, I set them empirically (by observation). It turns out that our
experimental results are insensitive to the a priori probability $g_0^0(m)$ and the transition
probability $g_k^j(m \mid n)$. This apparent robustness is a nice feature.
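
For the i.i.d. case, the decision rule in Equation (3.38) reduces to a few lines. The following is a minimal Python sketch, assuming known scalar means and variances per label and equally likely hypotheses; the function name is hypothetical and labels are indexed from 0 here rather than from 1.

```python
import numpy as np


def classify_region(y, means, variances):
    """Pick the label minimizing (y - m)^T C^{-1} (y - m) + ln|C|.

    Assumes C_i = variances[i] * I and mean vector means[i] * 1, so that
    ln|C_i| = n * ln(variances[i]) for a region with n samples.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    costs = [np.sum((y - m) ** 2) / v + n * np.log(v) for m, v in zip(means, variances)]
    return int(np.argmin(costs))
```

When all variances are equal, the log-determinant term is the same for every label and the rule simply picks the mean closest to the data.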

3.4.7 Extension to 2 Dimensions

The preceding descriptions in this section are all confined to the 1-D case. Directly
applying the above 1-D procedures (edge detection followed by image segmentation)
to 2-D is not easy. Recall that the core idea of our Algorithm II is to use the states
of wavelet coefficients as indicators of edges. This is not easily extendable to 2-D
images, since the 2-D DWT has three sets of wavelet coefficients at each scale,
reflecting signal intensity changes in three orientations (horizontal, vertical and
diagonal) [53, 54]. How to use these wavelet coefficients to represent edges of 2-D images
in our general framework is not straightforward and needs future consideration.

We can, however, extend the multi-scale analysis and MHMMs from 1-D sequences
to 2-D images by the following method. Instead of taking the usual 2-D wavelet
transform of the original image, we first convert the original 2-D image into a 1-D
sequence, and then apply the previous 1-D wavelet analysis to the resulting sequence.
The conversion works as follows: first split the image vertically into two halves, then
split each half horizontally into two quarters, and reiterate until each piece is a
1 x 1 pixel. Refer to Figure (3.11) for details. The merit of this conversion is that it
retains the original spatial configuration. Moreover, with this conversion method the
essential computations are performed on a 1-D sequence and are thus quite affordable.

     1   2   3   4
     5   6   7   8
     9  10  11  12
    13  14  15  16

    [1|5|2|6|9|13|10|14|3|7|4|8|11|15|12|16]

Figure 3.11. Conversion of a 2-D image to 1-D sequence
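
The splitting order can be sketched as a short recursive function. This is a minimal Python sketch, assuming a square image whose side is a power of two and assuming a left-before-right, top-before-bottom visiting order, which reproduces the sequence in Figure (3.11); the function name is hypothetical.

```python
import numpy as np


def image_to_sequence(img):
    """Flatten a 2-D image by recursive vertical/horizontal halving."""
    img = np.asarray(img)
    rows, cols = img.shape
    if rows == 1 and cols == 1:
        return [img[0, 0].item()]
    seq = []
    # Split vertically into left and right halves (if more than one column).
    halves = [img[:, :cols // 2], img[:, cols // 2:]] if cols > 1 else [img]
    for half in halves:
        r = half.shape[0]
        # Split each half horizontally into top and bottom quarters.
        quarters = [half[:r // 2, :], half[r // 2:, :]] if r > 1 else [half]
        for quarter in quarters:
            seq.extend(image_to_sequence(quarter))
    return seq


sequence = image_to_sequence(np.arange(1, 17).reshape(4, 4))
```

Neighboring entries of the resulting sequence come from the same small block of the image, which is the sense in which the spatial configuration is retained.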

3.4.8 A Simulation of the New Segmentation Method

Figure (3.12a-c) shows a simulated noisy image, the detected edges and the gray level
of the segmented image by Algorithm II, respectively. A two-state MHMM is specified
for this problem with the following parameter settings:

$\tau_0^2 = 1$,
$\tau_1^2 = 100$,
£Wl= &
dmm =.ak=Q~3W-Lj=LngJ—L

$g_k^j(0 \mid 1) = .25$, $k = 0, \ldots, 2^j - 1$, $j = 1, \ldots, J-1$.

Figure (3.12c) demonstrates that the overall result is excellent.

In Figure (3.12b), there are some artificial edges (boundaries) because, for the
sake of numerical stability, the whole image is divided into 16 sub-images and our
algorithm is actually applied to each sub-image. This partitioning also brings another
advantage: it greatly reduces false edges in the final segmented image. For example,
the region outside the brain usually has different statistical behavior from that inside
the brain. With this partitioning, these two regions are treated almost separately, thereby
greatly reducing boundaries between brain and air, which are most likely to be false
edges (i.e., edges irrelevant to activation). See the following fMRI data
processing example in Figure (3.13).

Figure 3.12. One simulated image segmentation by Algorithm II. a) Noisy image; b)
Detected edges; c) Segmented image.

3.5 Processing Results for fMRI Data by Two-Step Approach

As stated in the beginning of this chapter, the method for fMRI detection in this
chapter involves a two-step procedure: multi-scale image segmentation is first
used to break the correlation image into different regions of homogeneous statistical
behavior; each region is then tested independently as active or inactive by the single
pixel detection method.

In order to see the potential of this method for fMRI detection, the following
experiment compares results from the combined effects of single pixel
detection and image segmentation with the results obtained in the last chapter based
solely on pixel-wise detection.

Using the model introduced earlier in Equation (2.1), a simulated fMRI complex
time series is generated at each pixel. In order to simulate the profile of the brain, the
magnitudes of the baseline signal (the $a$'s in Equation (2.1)) in the complex time series
roughly follow the magnitude data from a static brain image. In fact, the original
complex data used in this example are the same as those used in Figure (2.5). Next,
the correlation value at each pixel is computed by correlating the magnitude time
series with the reference (see Equation (3.16)) to produce Figure (3.13a).

Figure (3.13b) is the segmented result of the correlation image in Figure (3.13a)
based on our Algorithm II in this chapter. There are M = 2 labels: each pixel is
assigned to either 0 or 1 according to its label. The parameters in this example are
set to be:

$\sigma^2 = 1$;
$\tau_0^2 = 1$;
$\tau_1^2 = 100$;
$g_0^0(0) = .95$;
$g_k^j(0 \mid 0) = .95$, $k = 0, \ldots, 2^j - 1$, $j = 1, \ldots, J-1$;
yuan) = .05,k=0,---,2J‘-1,j=1,---,J—1;
$m_0 = 0$;
$m_1 = 2$.

In this example, I set $\sigma_1$ and $\sigma_0$ (variances for the Gaussian distributions
characterizing the two homogeneous regions) to be equal. The test criterion in Equation (3.38)
reduces to a simpler form in this case. The original simulated active region in Figure
(2.5a) is a 9*9 SQUARE (the coordinates are: y = 40, 41, ..., 48; x = 34, 35, ..., 42);
in Figure (3.13b) the white region is a 10*8 RECTANGLE (the coordinates are:
y = 41, ..., 48; x = 33, 34, ..., 42). They are in good agreement but do not match
perfectly. This is not surprising, since, in general, we cannot guarantee that the segmentation
step produces exactly the same GEOMETRY as the original simulated regions.

Next I consider applying the single pixel detection technique of Chapter 2 to each
of the above statistically homogeneous regions. The idea is to regard
each homogeneous region as one large macro-pixel: the average of all time series
inside each macro-pixel is taken to be the new time series characterizing this
macro-pixel; the single pixel detection (MC detection) method is then applied to each new time
series individually to determine which of these macro-pixels are active and which are
inactive. By this approach, the micro-pixels (original pixels in Figure (3.13a), in
contrast to macro-pixels) corresponding to the black region in Figure (3.13b) are all
inactive, which is expected since this region contains a large area outside the brain.
The micro-pixels in Figure (3.13a) corresponding to the white region in Figure (3.13b)
turn out to be all active. In other words, by this approach, only 9 pixels inside the
square are missed while 8 pixels outside the square are false alarms.
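
The macro-pixel construction amounts to averaging the time series over each segmented region. The following is a minimal NumPy sketch, assuming the data are stored as an array of shape (T, H, W) and the segmentation as an integer label image of shape (H, W); the function name and array layout are assumptions, and the subsequent detection step from Chapter 2 is not reproduced here.

```python
import numpy as np


def macro_pixel_series(data, labels):
    """Average the time series of all pixels sharing a region label.

    data:   array (T, H, W), one time series per pixel.
    labels: array (H, W) of integer region labels from the segmentation.
    Returns {label: averaged time series of length T}.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    return {int(lab): data[:, labels == lab].mean(axis=1) for lab in np.unique(labels)}
```

Each averaged series would then be passed to the single pixel detector, and the resulting decision assigned to every micro-pixel of that region.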

Now let us compare Figure (2.5) in Chapter 2 with Figure (3.13b)
in this chapter. Recalling the results in the last chapter, we see spurious activation regions
outside the brain. However, the falsely alarmed regions disappear in Figure (3.13b)
(except for 8 pixels outside the square) after combining image segmentation with
single pixel detection. Pure single pixel detection methods failed to detect some
active pixels inside the small square in Figure (2.5a). However, these pixels (except
for 9 pixels) are now correctly detected by combining image segmentation with single
pixel detection.
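
The counts of 9 missed pixels and 8 false alarms follow directly from the two coordinate ranges quoted above. A short Python check, using only those ranges (the variable names are illustrative):

```python
# Simulated active region: 9*9 square, y = 40..48, x = 34..42.
true_region = {(y, x) for y in range(40, 49) for x in range(34, 43)}
# Detected white region: y = 41..48, x = 33..42.
detected_region = {(y, x) for y in range(41, 49) for x in range(33, 43)}

missed = true_region - detected_region        # active pixels not detected
false_alarms = detected_region - true_region  # detected pixels outside the square
print(len(missed), len(false_alarms))         # prints "9 8"
```

The missed pixels form the row y = 40 of the square, and the false alarms form the column x = 33 of the detected rectangle.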

The enhancement of detection efﬁciency is clearly visible and also easily under-
standable. Actually we are given spatial-temporal series. However, the pixel-wise
detection method used in the last chapter only takes temporal information into ac-
count: spatial information is completely ignored. The image segmentation algorithm
in this chapter exactly complements the pixel-wise detection and remedies its short-
coming: it utilizes the spatial correlation information inherent in the data. So it is
no wonder that the detection performance improves after image segmentation.

To see the influence of different parameter settings on the detection results, I produce
another correlation image and the corresponding detection map in Figure (3.13c) and
Figure (3.13d). Figure (3.13c) is the correlation image produced by exactly the
same procedure but with $a/\sigma = 10$ ($a$ and $\sigma$ here are two parameters in our model
(2.1)). If we use the previous set of parameters, the detection result is the same as in
Figure (3.13b). However, Figure (3.13d) is the corresponding detection map achieved
under a different set of parameters: $\sigma^2 = 4$, $\tau_0^2 = 2$, $\tau_1^2 = 200$ (other parameters are
the same as those used for (3.13b)). The detected region in this case is an 8*8 square
(the coordinates are: y = 41, ..., 48; x = 33, 34, ..., 40). The performance from this
set of parameters is inferior to that from the previous set of parameters.

To my knowledge, the idea of multi-scale detection has not yet been applied to fMRI
data processing; the method in this chapter is therefore original and
promising for future processing of real fMRI data.

One last point I'd like to make is that in Figures (3.13b) and (3.13d) the brain
profiles are artificially overlaid, as is the case in Figure (2.5).

Figure 3.13. Processing results for MRI data by two-step approach. a) One fMRI
correlation image; b) Segmented image of (a), also final detection results by combined
use of image segmentation and single pixel detection; c) Another correlation
image; d) Segmentation and detection result of (c).

CHAPTER 4

Discussions and Conclusions

4.1 For Pixelwise Detection

In Chapter 2, a novel nonlinear GLRT detector for MRI using complex data is developed
and compared to the commonly used MC test and the recently proposed CC
test. The test statistic for the nonlinear GLRT detector has a closed-form expression.
All three tests are roughly equal in terms of computational complexity. Theoretical
analysis establishes an invariance property for the test statistic, and it is shown that
the GLRT and the CC test are asymptotically equivalent (as the length of the fMRI time
series increases). Monte Carlo analysis is used to demonstrate that the GLRT performs
better than the MC test or the CC test overall. Furthermore, the analysis reveals
that a desired $P_f$ can be achieved with the GLRT using thresholds selected from
well-known distribution tables. The distributions of the GLRT statistic at high baseline
signal intensity under the two hypotheses are approximated as well.

There are several avenues for future work within the GLRT framework. First,
the noise structure in MRI is very complicated. For simplicity, and to
demonstrate our method and ideas, we assume the noise is white and Gaussian.
The whiteness assumption does not change the problem essentially, since given a
known time-correlation structure we can always use the Cholesky factorization of the
noise covariance to whiten the data [56], producing a model with the same form as
that used in Chapter 2. Hence, many of our conclusions are easily extended to more
realistic noise models that incorporate random fluctuations due to respiration,
the cardiac cycle and patient motion [5, 29, 38], provided that these components
are known.
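
The whitening step can be sketched in a few lines. This is a minimal NumPy sketch, assuming the noise covariance is known and positive definite; the function name is hypothetical. It uses the lower-triangular Cholesky factor returned by `numpy.linalg.cholesky` and solves a linear system rather than forming an explicit inverse.

```python
import numpy as np


def whiten(y, noise_cov):
    """Whiten data y (length-N vector or N x K matrix) given the noise covariance."""
    chol = np.linalg.cholesky(noise_cov)  # lower-triangular, noise_cov = chol @ chol.T
    return np.linalg.solve(chol, y)       # chol^{-1} y has identity noise covariance
```

Applying the same transformation to the signal model yields a problem with white noise and the same structure as before.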

Second, more realistic (and necessarily more complicated) signal models can be
used in the GLRT framework. For example, multi-parameter models of the reference
signal r could account for uncertainties in the BOLD response. Multi-parameter
linear regression models of the response could be used within the GLRT framework
to make the test more robust to such uncertainties.

Some difficulties that we face, however, are: 1) the distribution of the noise n is usually
unknown a priori and need not be as convenient as the Gaussian model adopted in this
dissertation, and estimation of the noise covariance is a challenging issue even if the
noise is Gaussian distributed; 2) the structures of the nuisance components vary
considerably over space, and it is hard to distinguish the signal components from the
nuisance components. How to determine the signal and nuisance components adaptively
from real data is an important issue.

When I was studying our nonlinear model, I of course also pondered
the linear model in Equation (2.3). Specifically, the problem under my consideration
was how to determine the best representations for the signal subspace H and the nuisance
subspace S for (magnitude) time series directly from actual data. Ardekani et al.
went one step ahead of me. In [2], they partially solved this problem
(they also dealt with magnitude time series); that is, they devised a
method for determining the best representation of the nuisance component assuming the
signal component has a known form. Completely solving the problem remains an
important issue.

Finally, we close with a summary of our conclusions regarding complex domain
fMRI. First, at relatively high baseline signal intensity ($a/\sigma > 3$), the simple MC test,
which is very common in practice, performs quite well. Hence, in such regimes there
is no compelling reason for testing based on the complex data. This is expected since
the magnitude data is approximately Gaussian at high signal intensity, in which case
the MC test is nearly optimal. In fact, in most typical fMRI experiments $a/\sigma > 3$ and
the MC test is adequate. However, at lower baseline signal intensity the performance
of the MC test drops off dramatically, and in such situations complex data tests
such as the new GLRT and the CC test offer superior performance. Low signal intensity
does occur as the spatial and/or temporal resolution of the fMRI study is increased.
Most fMRI experiments work with limited resolution in order to avoid the low signal
intensity problem. However, high resolution, low signal intensity fMRI may be useful
in certain research or clinical paradigms, and in such cases we advocate the GLRT.

4.2 Consideration on Spatial Information

In the detection method of Chapter 3, I first segment the image composed of correlation
data, which are assumed to be Gaussian. I then apply the single pixel detection method
to each homogeneous region. One disadvantage in real data processing is that we
do not know a priori M, the number of homogeneous regions into which the
correlation image is to be classified. We may reverse the order: first apply pixel-wise
detection to get the values of the test statistic at all pixels, and then apply an image
segmentation algorithm to decide active and non-active regions. One advantage of
this direction is that we have a definite number of labels (M = 2) when we perform
image segmentation. However, in this case, the data (test statistics) are most likely
to be F distributed, and the nice factorization of the likelihood function and the parent-child
transition in Chapter 3 break down. Hence applying the ideas of our Algorithm II is
much more difficult and entails further consideration in this case.

To utilize spatial correlation information, this dissertation uses a
Bayesian image segmentation method. One practical difficulty in real data processing
is that it is hard to determine and incorporate the a priori distribution. Other
approaches involving spatial considerations can be used as well. For example, clustering
analysis is gaining more recognition in this field [20, 26, 27]. Borrowing some ideas
from array signal processing [47, 58, 70] may also be beneficial to fMRI detection.
Formulation as a decentralized detection problem is also a promising candidate [30, 51].

4.3 Epilogue

As stated in the very beginning of this dissertation, fMRI involves a lot of background
in physiology and neurology. This dissertation deals with pure signal and image
processing, because I lack a priori knowledge about the spatially varying nature of fMRI time
series, the features of physiological respiration, machine and/or head motion artifacts,
etc. To fully validate and further refine the methodologies developed in this
dissertation, comprehensive testing and evaluation with real fMRI data is necessary.

In view of the great complexity of fMRI data processing, the insight and expertise
of experts from other fields are very valuable, even indispensable, for successful,
practical research on real data processing. However, the biggest disadvantage
during my research was that I did not have enough communication and collaboration
with specialists from other fields, let alone control over specific experiments. This is
exactly what hindered me from processing real data in this dissertation. I hope these
drawbacks can be remedied in the future.

CHAPTER 5

Appendices

5.1 Appendix I: Notations and Some Results on Projection Matrix

Some mathematical conventions and notations used throughout this dissertation are
established here.

All norms are the standard (Euclidean) 2-norm. Given two real one-dimensional
(1-D) functions $f(t)$, $t \in R$, and $g(t)$, $t \in R$, their inner product is defined as

\[ \langle f, g \rangle = \int_R f(t)\, g(t)\, dt. \]

The norm of $f$ is defined as

\[ \|f\| = \sqrt{\langle f, f \rangle}. \]

Given a sequence $h_n$, $-\infty < n < \infty$, its norm is defined as

\[ \|h\| = \sqrt{\sum_n h_n^2}. \]

$L^2(R)$ is defined to be the set $\{f \mid \|f\| < \infty\}$. $l^2(Z)$ is defined to be the set
$\{h_n \mid \|h\| < \infty\}$.

Given a real column vector $\mathbf{x}$ of dimension $N$, its norm is defined as

\[ \|\mathbf{x}\| = \sqrt{\mathbf{x}^T \mathbf{x}} = \sqrt{\sum_{i=1}^{N} x_i^2}. \]

 

Let $M$ denote a $p \times q$ matrix. Let $P_M$ denote the matrix that projects a vector
onto the subspace spanned by the columns of $M$, i.e., $P_M = M (M^T M)^{-1} M^T$, where
the superscript $T$ denotes matrix transposition. Let $P_M^{\perp}$ denote the matrix projecting
a vector onto the complementary subspace that is perpendicular to the subspace
spanned by the columns of $M$, i.e., $P_M^{\perp} = I - P_M$, where $I$ denotes the $p \times p$ identity
matrix. There are several properties of a projection matrix [4]:

1) Idempotent: $P^2 = P$. The eigenvalues of a projection matrix are either 0 or 1.

2) Symmetry for orthogonal projection: $P^T = P$.
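
These properties are easy to check numerically. The following is a minimal NumPy sketch, assuming $M$ has full column rank; the function name and the example matrix are illustrative.

```python
import numpy as np


def projection_matrix(M):
    """Orthogonal projector onto the column space of M (M must have full column rank)."""
    M = np.asarray(M, dtype=float)
    return M @ np.linalg.solve(M.T @ M, M.T)


M = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])  # 3 x 2 example, rank 2
P = projection_matrix(M)
P_perp = np.eye(3) - P

idempotent = np.allclose(P @ P, P)
symmetric = np.allclose(P.T, P)
eigenvalues = np.linalg.eigvalsh(P)   # numerically close to 0, 1, 1
annihilates = np.allclose(P_perp @ M, 0.0)
```

The number of unit eigenvalues equals the rank of the projector, which is the quantity that sets the degrees of freedom in the lemmas of Appendix II.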

5.2 Appendix II: Some Preliminary Results on $\chi^2$ and F Distribution

Def 1 [33]: Suppose $\mathbf{x}$ is an N-dimensional column random vector, $\mathbf{x} \sim N(\mathbf{m}, I)$, i.e., the
$x_i$'s are independent and $x_i \sim N(m_i, 1)$. If $x = \|\mathbf{x}\|^2 = \sum_{i=1}^{N} x_i^2$, then $x \sim \chi_N^2(\lambda)$,
where the noncentrality parameter $\lambda = \|\mathbf{m}\|^2 = \sum_{i=1}^{N} m_i^2$. When $\lambda = 0$, it is called the
central $\chi^2$ distribution, otherwise noncentral.

Lemma 1: Suppose $\mathbf{x}$ is an N-dimensional column random vector, $\mathbf{x} \sim N(\mathbf{m}, \sigma^2 I)$, i.e.,
the $x_i$'s are independent and $x_i \sim N(m_i, \sigma^2)$, and $P$ is a projection matrix of rank $r$;
then

\[ \frac{\|P\mathbf{x}\|^2}{\sigma^2} \sim \chi_r^2(\lambda), \]

where the noncentrality parameter $\lambda = \|P\mathbf{m}\|^2 / \sigma^2$.

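Proof: Since $P$ is a projection matrix with rank $r$, it can be factorized as
$P = U_{N \times r} U_{N \times r}^T$, where $U$ is a full-rank (= $r$) matrix with orthonormal columns, i.e.,
$U^T U = I_{r \times r}$. Then $\|P\mathbf{x}\|^2 = \mathbf{x}^T P \mathbf{x} = \mathbf{x}^T U U^T \mathbf{x} = \|U^T \mathbf{x}\|^2$. $U^T \mathbf{x}$ is now an
$r \times 1$ Gaussian random vector whose components are uncorrelated due to the orthogonality of $U$:

\[
\begin{aligned}
E(U^T \mathbf{x}) &= U^T \mathbf{m}, \\
Var(U^T \mathbf{x}) &= E(U^T \mathbf{x}\mathbf{x}^T U) - E(U^T \mathbf{x})\, E(\mathbf{x}^T U) \\
&= U^T E(\mathbf{x}\mathbf{x}^T)\, U - E(U^T \mathbf{x})\, E(\mathbf{x}^T U) \\
&= U^T (\mathbf{m}\mathbf{m}^T + \sigma^2 I_{N \times N})\, U - U^T \mathbf{m}\mathbf{m}^T U = \sigma^2 I_{r \times r}.
\end{aligned}
\]

In other words, $U^T \mathbf{x} / \sigma \sim N(U^T \mathbf{m} / \sigma, I_{r \times r})$. Therefore the conclusion in the lemma
follows from Def 1: $\|P\mathbf{x}\|^2 / \sigma^2 = \|U^T \mathbf{x} / \sigma\|^2$, with the noncentrality parameter
$\lambda = \|U^T \mathbf{m}\|^2 / \sigma^2 = \|P\mathbf{m}\|^2 / \sigma^2$.

A direct consequence is that $\lambda = 0$ when $P\mathbf{m} = 0$. In this case, since
$P\mathbf{m} = U(U^T \mathbf{m}) = 0$ and the $r$ column vectors in $U_{N \times r}$ are independent, we must have
$U^T \mathbf{m} = 0$.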

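Def 2 [33]: If $x = \dfrac{x_1 / N_1}{x_2 / N_2}$, where $x_1 \sim \chi_{N_1}^2(\lambda)$ and $x_2 \sim \chi_{N_2}^2(0)$ and $x_1$, $x_2$ are
independent, then $x$ has the F distribution, denoted as $x \sim F_{N_1, N_2}(\lambda)$. When $\lambda = 0$, it is called
the central F distribution, otherwise noncentral.

Lemma 2: Suppose $P$ and $Q$ are two projection matrices with rank $r_1$ and $r_2$
respectively, and $PQ = 0$. $\mathbf{x} \sim N(\mathbf{m}, \sigma^2 I)$. $Q\mathbf{m} = 0$. Then

\[ \frac{\|P\mathbf{x}\|^2 / r_1}{\|Q\mathbf{x}\|^2 / r_2} \sim F_{r_1, r_2}(\lambda), \]

where $\lambda = \|P\mathbf{m}\|^2 / \sigma^2$.

Proof: Similar to the proof for Lemma 1, we can decompose $P$ and $Q$ as
$P = U_1 U_1^T$, $Q = U_2 U_2^T$. $U_1^T \mathbf{x}$ and $U_2^T \mathbf{x}$ are $r_1 \times 1$ and $r_2 \times 1$ Gaussian vectors,
and the latter vector has zero mean. In light of Lemma 1, $\|P\mathbf{x}\|^2 / \sigma^2 \sim \chi_{r_1}^2(\lambda)$ and
$\|Q\mathbf{x}\|^2 / \sigma^2 \sim \chi_{r_2}^2(0)$. Further, their correlation is

\[
\begin{aligned}
E[U_1^T \mathbf{x}\mathbf{x}^T U_2] &= U_1^T E(\mathbf{x}\mathbf{x}^T)\, U_2 \\
&= U_1^T (\mathbf{m}\mathbf{m}^T + \sigma^2 I)\, U_2 \\
&= \sigma^2 U_1^T U_2 \\
&= \sigma^2 (U_1^T U_1)\, U_1^T U_2\, (U_2^T U_2) \\
&= \sigma^2 U_1^T P Q U_2 \\
&= 0.
\end{aligned}
\]

Therefore, the numerator and denominator are independent and the conclusion in
Lemma 2 ensues.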

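5.3 Appendix III: Definition of Rician, Rayleigh and t Distribution

Def 3 [33]: The pdf of $X = \sqrt{X_1^2 + X_2^2}$, where $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$
are independent, is called the Rician pdf. It is explicitly expressed as:

\[
p_X(x) =
\begin{cases}
\dfrac{x}{\sigma^2} \exp\left[ -\dfrac{1}{2\sigma^2} (x^2 + a^2) \right] I_0\left( \dfrac{a x}{\sigma^2} \right) & x > 0 \\
0 & x < 0,
\end{cases}
\tag{5.1}
\]

where $a^2 = \mu_1^2 + \mu_2^2$ and $I_0(u)$ is the modified Bessel function of the first kind and
order 0:

\[ I_0(u) = \frac{1}{2\pi} \int_0^{2\pi} \exp(u \cos\theta)\, d\theta. \tag{5.2} \]

When $a^2 = 0$ it reduces to the Rayleigh pdf.

Def 4 [59]: Suppose $X \sim N(0, 1)$ and $Y \sim \chi_n^2(0)$ are independent. Then the pdf of
$T = \dfrac{X}{\sqrt{Y/n}}$ is called the t distribution, $T \sim t_n$. Its pdf is explicitly expressed as

\[ p_T(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\, \Gamma\left(\frac{n}{2}\right)} \left( 1 + \frac{t^2}{n} \right)^{-\frac{n+1}{2}}, \quad -\infty < t < \infty. \tag{5.3} \]

$\lim_{n \to \infty} p_T(t) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right)$, i.e., as $n$ becomes very large, $t_n \to N(0, 1)$.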
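
Equation (5.1) can be evaluated with NumPy's built-in `numpy.i0`. The following is a minimal sketch with a hypothetical function name; it is numerically naive (the Bessel factor can overflow for very large arguments), so it is only meant for moderate values of $a x / \sigma^2$.

```python
import numpy as np


def rician_pdf(x, a, sigma):
    """Rician pdf of Equation (5.1); a = 0 gives the Rayleigh pdf."""
    x = np.asarray(x, dtype=float)
    s2 = sigma ** 2
    pdf = (x / s2) * np.exp(-(x ** 2 + a ** 2) / (2.0 * s2)) * np.i0(a * x / s2)
    return np.where(x > 0, pdf, 0.0)


# Crude Riemann-sum check that the density integrates to about one.
grid = np.arange(0.0, 20.0, 0.001)
total_mass = np.sum(rician_pdf(grid, 2.0, 1.0)) * 0.001
```

For `a = 0` the Bessel factor equals one and the expression reduces to the Rayleigh density, as stated above.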

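5.4 Appendix IV: Analysis of t Test Used in MRI Detection

Refer to Figure (5.1). The $x_i$'s and $y_i$'s are all independent. The two hypotheses are:

\[ H_0: x_i \sim N(\mu_y, \sigma^2),\ y_i \sim N(\mu_y, \sigma^2) \quad \text{versus} \quad H_1: x_i \sim N(\mu_x, \sigma^2),\ y_i \sim N(\mu_y, \sigma^2), \]
\[ i = 1, 2, \ldots, N, \]

where the parameters $\mu_x$, $\mu_y$, $\sigma^2$ are all unknown. Also by the GLRT principle, the test
statistic in this case turns out to be [14]:

\[ t(\mathbf{x}, \mathbf{y}) = \frac{(\bar{x} - \bar{y}) / \sqrt{2/N}}{\sqrt{\dfrac{1}{2N-2} \left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 + \sum_{i=1}^{N} (y_i - \bar{y})^2 \right]}}, \tag{5.4} \]

where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$'s and $y_i$'s ($i = 1, 2, \ldots, N$).

Under $H_0$: $\bar{x} \sim N(\mu_y, \sigma^2/N)$ and $\bar{y} \sim N(\mu_y, \sigma^2/N)$, so $\bar{x} - \bar{y} \sim N(0, 2\sigma^2/N)$, i.e.,

\[ \frac{\bar{x} - \bar{y}}{\sigma \sqrt{2/N}} \sim N(0, 1). \tag{5.5} \]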

Figure 5.1. Experiment setup and two hypotheses for t-test used in MRI detection.
Panels: (a) Experiment-Rest States; (c) Activation Present During Experiment Period.

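On the other hand,

\[ \sum_{i=1}^{N} \frac{(x_i - \bar{x})^2}{\sigma^2} = \frac{\|\mathbf{x} - \bar{x}\mathbf{1}\|^2}{\sigma^2} = \frac{\|\mathbf{x} - \frac{1}{N}\mathbf{1}\mathbf{1}^T \mathbf{x}\|^2}{\sigma^2} = \frac{\|P_{\mathbf{1}}^{\perp} \mathbf{x}\|^2}{\sigma^2} \sim \chi_{N-1}^2(0). \]

Similarly,

\[ \sum_{i=1}^{N} \frac{(y_i - \bar{y})^2}{\sigma^2} = \frac{\|P_{\mathbf{1}}^{\perp} \mathbf{y}\|^2}{\sigma^2} \sim \chi_{N-1}^2(0). \]

Hence,

\[ \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2 + \sum_{i=1}^{N} (y_i - \bar{y})^2}{\sigma^2} \sim \chi_{2N-2}^2(0). \tag{5.6} \]

Further, we observe that $P_{\mathbf{1}}^{\perp} \mathbf{x} \perp P_{\mathbf{1}} \mathbf{x} = \bar{x}\mathbf{1}$ and $P_{\mathbf{1}}^{\perp} \mathbf{y} \perp P_{\mathbf{1}} \mathbf{y} = \bar{y}\mathbf{1}$. Therefore, the
numerator and the denominator in Equation (5.4) are independent due to the Gaussian
assumption, and finally $t(\mathbf{x}, \mathbf{y}) \mid H_0 \sim t_{2N-2}$ following Definition 4.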
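
The statistic in Equation (5.4) is the equal-size, pooled-variance two-sample t statistic. The following is a minimal NumPy sketch with a hypothetical function name, assuming both samples have the same length N as in the hypotheses above.

```python
import numpy as np


def two_sample_t(x, y):
    """Pooled two-sample t statistic for equal sample sizes N (2N - 2 degrees of freedom)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if y.size != n or n < 2:
        raise ValueError("x and y must have the same length N >= 2")
    # Pooled variance estimate: sum of squared deviations divided by 2N - 2.
    pooled = (np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)) / (2 * n - 2)
    return (x.mean() - y.mean()) / np.sqrt(2.0 / n) / np.sqrt(pooled)


t_value = two_sample_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

For this small example the means differ by one, the pooled variance is one, and the statistic equals minus the square root of 1.5; under the null hypothesis it would be compared against a threshold from the t distribution with 2N - 2 = 4 degrees of freedom.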

5.5 Appendix V: Principle of Invariant Test

In this Appendix I explain the idea behind the invariant test [7, 56] and introduce some
basic definitions and theorems [46].

In the hypothesis testing problems in Chapter 2, in addition to the significant
parameter b or u on which we are testing, there are other unknown parameters called
nuisance parameters, such as the amplitude of the DC level $a$ and the variance of the noise $\sigma^2$.
The nuisance parameters do not affect our decision, but their presence complicates
the distribution of a given test statistic [7] and makes it harder to determine an
appropriate threshold for the detector.

Therefore, we seek a test that is unaffected by such nuisance parameters. We
can do this by deliberately selecting a special class of transformations on the data so
that the distribution of the transformed data still belongs to the same family of
distributions as the original, just with another set of parameters. After transformation,
however, the significant parameters on which we are testing still correspond to the
same hypotheses as the original, while the nuisance parameters are left free to change.
Since the transformed data also support the original hypothesis, with the significant
parameters staying within the same region of the parameter space, it is natural to
require that if a test gives some decision on the original data, it must output the same
decision on the transformed data. This is what an invariant test means. The basic idea
is depicted in Figure (5.2), adapted from [7]. It turns out that these transformations
possess a group structure [46].

This idea can be formulated more precisely in combination with our detection
problem in Section (2.6). The observed data $y$ is regarded as a point in the sample
space $\Psi$ of random vectors with the same dimension as $y$. It has the probability
distribution $P_\theta$, $\theta \in \Omega$, where $\theta$ is the parameter (vector-valued in our case)
describing the distribution and lying in the parameter space $\Omega$. Under hypothesis $H_1$,
$\Omega$ becomes $\Omega_1$; under hypothesis $H_0$, $\Omega$ becomes $\Omega_0$. Thus if $\Omega_0$ and $\Omega_1$ form a
partitioning of $\Omega$, the goal of the detection problem is just to locate which partition
of $\Omega$ the parameter $\theta$ lies in, i.e., choose between the hypotheses as follows:

$$H_0: \theta \in \Omega_0 \quad \text{versus} \quad H_1: \theta \in \Omega_1.$$

From the above description, let $g$ be a one-to-one transformation on the sample
space $\Psi$ such that $gy$ has the same distribution form as $y$ but is characterized by a
different parameter $\theta'$, that is, it is distributed as $P_{\theta'}$, $\theta' \in \Omega$. This transformation
thereby induces another transformation $\bar{g}$ on the parameter space defined by $\bar{g}\theta = \theta'$.
It is easy to see that this decision problem is invariant to the transformation $g$ if the
corresponding induced transformation $\bar{g}$ maps each of the partitions of $\Omega$ to itself:

$$\bar{g}\Omega_0 = \Omega_0, \qquad \bar{g}\Omega_1 = \Omega_1.$$

Figure 5.2. Invariant Test. (Diagram: the data $Y$ and the transformed data $GY$ are each passed to the test statistic, $t(Y)$ and $t(GY)$, compared against the threshold, and lead to the same decision.)

In fact, the invariance of the test may be achieved by requiring that the test
statistic be invariant to these transformation groups [7]; what this precisely means
is explained in the following definitions and is clarified by the proof of our
theorem in Subsection (2.6.2).

We will know from the theory of invariance that all invariant tests can be
characterized in this way, and it is possible to answer such questions as whether they
are CFAR, or whether an optimum test exists among them. Further, restricting attention
to such tests may bring us other advantages. For example, in many cases, the
transformations $g$ turn out to have such natural physical interpretations that, in
practice, any test without the corresponding invariance property would not be
acceptable, thereby substantially reducing the class of test statistics that need to be
considered [57]. Also, as can be noted from our own problem in Section (2.6), there is a
great reduction in the dimensionality of the parameters describing the performance of
such tests.

Having understood the idea of invariant tests, we now give exact definitions for
relevant concepts and some theorems [46].

Def 5: A function $t(y)$ is said to be invariant under a group of transformations $G$ if
$t(gy) = t(y)$ for all $y \in \Psi$ and all $g \in G$.

Def 6: The family of distributions $\{P_\theta : \theta \in \Omega\}$ is said to be invariant under $G$ if
every $g \in G$, $\theta \in \Omega$ determine a unique element in $\Omega$, denoted by $\bar{g}\theta$, such that when
$y$ has distribution $P_\theta$, $gy$ has distribution $P_{\bar{g}\theta}$. The ensemble of $\bar{g}$ constitutes the
induced group of transformations $\bar{G}$.

Def 7: A function $\phi(\theta)$ is said to be a maximal invariant under a group of
transformations $G$ if it is invariant under $G$ and if $\phi(\theta_1) = \phi(\theta_2)$ implies there exists some
$g \in G$ such that $\theta_2 = g\theta_1$.

Def 8: Let the family of distributions $\{P_\theta : \theta \in \Omega\}$ be invariant under $G$. The
problem of testing $H_0 : \theta \in \Omega_0$ against $H_1 : \theta \in \Omega - \Omega_0$ is said to be invariant under
$G$ if $\bar{g}\Omega_0 = \Omega_0$ for all $g \in G$.

Thm 1: A function is invariant under $G$ if and only if it is a function of a maximal
invariant under $G$.

Thm 2: If the family of distributions $\{P_\theta : \theta \in \Omega\}$ is invariant under the group $G$,
then $\bar{G} = \{\bar{g} : g \in G\}$ is a group of transformations from $\Omega$ to itself.

Thm 3: Suppose that the family of distributions $\{P_\theta : \theta \in \Omega\}$ is invariant under
the group $G$. If $t(y)$ is invariant under $G$ and $\phi(\theta)$ is a maximal invariant under the
induced group $\bar{G}$, then the distribution of $t(y)$ depends only on $\phi(\theta)$.
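Def 5 can be made concrete with the t statistic of Equation (5.4). The sketch below assumes, purely for illustration, a group whose elements apply a common positive scaling and a common shift to both data vectors; this group is a hypothetical example and is not necessarily the group used for the detection problem of Section (2.6). The names `t_stat` and `transform` are likewise introduced only here.

```python
import numpy as np


def t_stat(x, y):
    """Pooled two-sample t statistic (same form as Equation (5.4))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    pooled_ss = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
    return (x.mean() - y.mean()) / np.sqrt(2.0 / n) / np.sqrt(pooled_ss / (2 * n - 2))


def transform(x, y, scale, shift):
    """Hypothetical group element g: common positive scaling and common shift."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return scale * x + shift, scale * y + shift


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=16)
y = rng.normal(0.5, 1.0, size=16)

gx, gy = transform(x, y, scale=3.7, shift=-12.0)
print(t_stat(x, y), t_stat(gx, gy))  # the two values agree up to rounding
```

The transformation changes the common mean and the noise variance (the nuisance parameters) but leaves the statistic unchanged, so a threshold test on $t$ gives the same decision on the original and on the transformed data. A negative scaling would flip the sign of $t$, which is why the sketch restricts the scale to positive values.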

5.6 Appendix VI: SMAP Algorithm for Multi-scale Image Segmentation

In Section (3.3), I only briefly outline the main points of Algorithm I [10]. More
mathematical descriptions of this algorithm are given here. For a comprehensive
derivation, please refer to the original paper [10]. For the sake of completeness, some
descriptions and equations in Section (3.3) are rewritten below.

5.6.1 A Priori Consideration

First of all, let us make clear what the field $X^j$ physically means. Each resolution is
a level in a quad-tree in the 2-D case, so a lattice point at one resolution corresponds
to four points at the next finer resolution, as shown in Figure (3.5). This group of
four pixels in the continuous image $Y$ is considered as a block, and $X^j$ denotes the
field containing the labeling of each of the blocks at resolution $j$. We may assume
that the finer segmentation $X^{j+1}$ of $X$ is an interpolated version of $X^j$ [9].

The fundamental assumption in Algorithm I is that the sequence of $X^j$ forms a
Markov chain [62], i.e., the distribution of $X^j$ given all coarser scale fields is only
dependent on $X^{j-1}$:

$$P(x^j \mid x^l,\ l \le j-1) = p_{x^j|x^{j-1}}(x^j \mid x^{j-1}). \qquad (5.7)$$

This pyramid structure of the multi-scale random field (MSRF) is depicted in Figure
(3.5). Since $Y$ (in Figure (3.5a), $Y^J \equiv Y$) is exclusively dependent on $X^J$, where $J$ is
the finest scale index, it follows that the likelihood function is given by

$$P(y \mid x^j,\ j \le J) = P(y \mid x^J) = p_{y|x^J}(y \mid x^J), \qquad (5.8)$$

and then the joint distribution of $X$ and $Y$ may be expressed as a product of recursive
transition probabilities:

$$p(y, x) = p_{y|x^J}(y \mid x^J) \left[\prod_{j=J_0+1}^{J} p_{x^j|x^{j-1}}(x^j \mid x^{j-1})\right] p_{x^{J_0}}(x^{J_0}), \qquad (5.9)$$

where $J_0$ is the coarsest scale of analysis.

5.6.2 Likelihood Function

Under the assumption that observed pixels are conditionally independent given their
labels, the conditional density function for the image has the form

$$p_{y|x^J}(y \mid x^J) = \prod_{k \in S^J} p_{y_k|x_k^J}(y_k \mid x_k^J). \qquad (5.10)$$

In [10], the authors confine themselves to models with two properties. First, the
pixels in $X^j$ are conditionally independent given the pixels in $X^{j-1}$. Second, each
pixel $X_k^j$ is only dependent on a local neighborhood of pixels at the next coarser scale.
Using $\partial k$ to denote the set of neighboring locations to $k$, the transition distribution
from coarse to fine scale assumes the form

$$p_{x^j|x^{j-1}}(x^j \mid x^{j-1}) = \prod_{k \in S^j} p_{x_k^j|x_{\partial k}^{j-1}}(x_k^j \mid x_{\partial k}^{j-1}), \qquad (5.11)$$

where $p_{x_k^j|x_{\partial k}^{j-1}}$ is the probability density for $x_k^j$ given its neighbors at the coarser scale,
$x_{\partial k}^{j-1}$.

The second important assumption is that each pixel $X_k^j$ is only dependent on a
local neighborhood of pixels at the next coarser scale. The authors' first choice of the
neighborhood $\partial k$ is a quad-tree structure in the 2-D case or a binary-tree structure in
the 1-D case, as depicted in Figure (3.6). Specifically, in the quad-tree structure each
point is only dependent on a single point at the coarser scale, its father $d(k)$. In words,
if, by some means, we know that $x_k^j$ is active ($X_k^j = 1$), then with a high probability, we
may say that its four children are active as well. This probability is characterized by
the transitional probability density between individual pixels (from a coarser scale to
the next finer scale). The transition probability that $X_k^j$ has state $m$ given that its
father is in state $m'$ is

$$p_{x_k^j|x_{d(k)}^{j-1}}(m \mid m') = q_j \delta_{m,m'} + \frac{1 - q_j}{M}. \qquad (5.12)$$

This equation tells us two facts: 1) the probability that the labeling will remain the
same from scale $j-1$ to $j$ is $q_j + \frac{1-q_j}{M}$, where $q_j \in [0, 1]$ and $M$ is the number of
possible states; and 2) the child is equally likely to take any one of the states different
from its parent's, each with probability $\frac{1-q_j}{M}$. The so-called "physical dependency"
mentioned in Subsection (3.1.3) is embodied in the transition probability of the Markov chain.
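The two facts above can be verified by writing Equation (5.12) out as a matrix. The following is a minimal sketch; the function name `transition_matrix` and the example values `q = 0.9` and `M = 3` are hypothetical and chosen only for illustration.

```python
import numpy as np


def transition_matrix(q, num_states):
    """Matrix form of Equation (5.12): entry [m', m] is p(child = m | father = m')."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if num_states < 1:
        raise ValueError("num_states must be a positive integer")
    # q on the diagonal (Kronecker delta term) plus a uniform (1 - q)/M everywhere.
    return q * np.eye(num_states) + (1.0 - q) / num_states


P = transition_matrix(q=0.9, num_states=3)
print(P)
print(P.sum(axis=1))  # every row sums to one
```

Each diagonal entry equals $q_j + (1-q_j)/M$ and each off-diagonal entry equals $(1-q_j)/M$, so every row is a valid probability distribution over the child's state.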

An important property of the quad-tree structure is that the conditional distribution
of $Y$ given $X^j$ has a product form that can be computed recursively:

$$p_{y|x^j}(y \mid x^j) = \prod_{k \in S^j} p_{y_k^j|x_k^j}(y_k^j \mid x_k^j), \qquad (5.13)$$

where $p_{y_k^j|x_k^j}$ is defined and computed recursively according to:

$$p_{y_k^{j-1}|x_k^{j-1}}(y_k^{j-1} \mid x_k^{j-1} = m) = \prod_{r \in d^{-1}(k)} \sum_{m'=1}^{M} p_{y_r^j|x_r^j}(y_r^j \mid m')\, p_{x_r^j|x_{d(r)}^{j-1}}(m' \mid m), \quad k \in S^{j-1}, \qquad (5.14)$$

where $d^{-1}(k)$ denotes the four (two) children of $k$ in the 2-D (1-D) case. Thus this
choice of neighborhood structure leads to a simple and efficient algorithm for
image/signal segmentation [10].

5.6.3 Criterion and Solution

A Bayesian estimator minimizes the average cost of an erroneous segmentation:

$$\hat{x} = \arg\min_{x} E[C(X, x) \mid Y = y] \qquad (5.15)$$

where $C(X, x)$ is the cost of estimating the true segmentation, $X$, by the approximate
segmentation $x$. Notice that $X$ is random whereas $x$ is deterministic. The expectation $E$
is with respect to $X$.

The cost function used by the authors in [10] is deliberately selected to be

$$C(X, x) = 2^j,$$

where $j$ is the unique scale such that $X^j \ne x^j$, but $X^i = x^i$ for all $i < j$.

According to their cost function, $\hat{x}$ turns out to be:

$$\hat{x} = \arg\min_{x} \sum_{j=J_0}^{J} 2^j \left\{1 - P(X^i = x^i,\ i \le j \mid Y = y)\right\} = \arg\max_{x} \sum_{j=J_0}^{J} 2^j P(X^i = x^i,\ i \le j \mid Y = y).$$

Since the random fields $X^j$ form a Markov chain, this estimate is computed
recursively. Assuming that $\hat{x}^i$ has been computed for $i < j$, we use this result to
compute $\hat{x}^j$:

$$\hat{x}^{J_0} = \arg\max_{x^{J_0}} \log p_{x^{J_0}|y}(x^{J_0} \mid y), \qquad (5.16)$$

$$\hat{x}^j = \arg\max_{x^j} \log p_{x^j|x^{j-1},y}(x^j \mid \hat{x}^{j-1}, y). \qquad (5.17)$$

The recursion is started by determining the MAP estimate of the coarsest scale
field given the observed data $Y$. The segmentation at each finer scale is computed as
the MAP estimate of $X^j$ given $X^{j-1}$ and the image $Y$. It is therefore referred to as
a sequential MAP (SMAP) estimator.

Assuming that $X^{J_0}$ is uniformly distributed, they use Bayes rule and the Markov
properties of $X$ to change the above form into another more easily computed form:

$$\hat{x}^{J_0} = \arg\max_{x^{J_0}} \log p_{y|x^{J_0}}(y \mid x^{J_0}), \qquad (5.18)$$

$$\hat{x}^j = \arg\max_{x^j} \left\{\log p_{y|x^j}(y \mid x^j) + \log p_{x^j|x^{j-1}}(x^j \mid \hat{x}^{j-1})\right\}. \qquad (5.19)$$

The first term in Equation (5.19) is the log likelihood of the observed data $y$ given
the labeling at scale $j$. The second term carries the a priori information about the
behavior of $X$.

In order to satisfy the dynamic range requirement, a log likelihood function is defined
at each point at each scale:

$$l_k^j(m) \triangleq \log p_{y_k^j|x_k^j}(y_k^j \mid m). \qquad (5.20)$$

A new recursion ensues on substituting the transition distribution of Equation (5.12)
into Equation (5.14) and converting (5.14) to log likelihood functions:

$$l_k^J(m) = \log p_{y_k|x_k^J}(y_k \mid m), \qquad (5.21)$$

$$l_k^{j-1}(m) = \sum_{r \in d^{-1}(k)} \log\left\{q_j \exp[l_r^j(m)] + \frac{1 - q_j}{M} \sum_{m'=1}^{M} \exp[l_r^j(m')]\right\}, \quad J_0 + 1 \le j \le J. \qquad (5.22)$$
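The fine-to-coarse recursion of Equation (5.22) can be sketched for the 1-D (binary-tree) case as follows. This is an illustrative sketch, not the implementation of [10]: the function names, the array layout (one row per node, one column per state) and the convention that the children of node $k$ sit at indices $2k$ and $2k+1$ are all assumptions made here.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`."""
    peak = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(peak, axis=axis) + np.log(np.sum(np.exp(a - peak), axis=axis))


def coarser_loglik(fine, q):
    """One step of Equation (5.22) on a binary tree (1-D case).

    fine : array of shape (2K, M), fine[r, m] = l_r^j(m)
    q    : transition parameter q_j, assumed here to satisfy 0 < q < 1
    Returns an array of shape (K, M) holding l_k^{j-1}(m).
    """
    fine = np.asarray(fine, dtype=float)
    num_nodes, num_states = fine.shape
    if num_nodes % 2 != 0:
        raise ValueError("a binary tree needs an even number of child nodes")
    if not 0.0 < q < 1.0:
        raise ValueError("this sketch assumes 0 < q < 1")
    # log of (1 - q)/M * sum_{m'} exp[l_r(m')], one value per child r.
    spread = np.log((1.0 - q) / num_states) + logsumexp(fine, axis=1)
    # log{ q exp[l_r(m)] + (1 - q)/M sum_{m'} exp[l_r(m')] } for each child r and state m.
    per_child = np.logaddexp(np.log(q) + fine, spread[:, None])
    # Sum over the two children of each coarser node (indices 2k and 2k + 1).
    return per_child[0::2] + per_child[1::2]


def loglik_pyramid(finest, qs):
    """Apply the recursion once per entry of `qs` (finest transition first).

    Returns the list of log likelihood arrays ordered from coarsest to finest.
    """
    levels = [np.asarray(finest, dtype=float)]
    for q in qs:
        levels.append(coarser_loglik(levels[-1], q))
    return levels[::-1]
```

Working in the log domain with `np.logaddexp` avoids the underflow that the product in Equation (5.14) would cause for long sequences, which is the dynamic range issue mentioned above.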

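Finally, the SMAP segmentation may be efficiently computed by using the log
likelihood functions:

$$\hat{x}^{J_0} = \arg\max_{x^{J_0}} \sum_{k \in S^{J_0}} l_k^{J_0}(x_k^{J_0}), \qquad (5.23)$$

$$\hat{x}^j = \arg\max_{x^j} \sum_{k \in S^j} \left\{l_k^j(x_k^j) + \log p_{x_k^j|x_{d(k)}^{j-1}}(x_k^j \mid \hat{x}_{d(k)}^{j-1})\right\}, \quad J_0 + 1 \le j \le J. \qquad (5.24)$$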

Figure 5.3. Segmentation Result Using SMAP Algorithm. Panels: Noisy Image to be Segmented; Segmentation Result.

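The estimate of individual pixel label is then easily evaluated by:

$$\hat{x}_k^{J_0} = \arg\max_{1 \le m \le M} l_k^{J_0}(m), \qquad (5.25)$$

$$\hat{x}_k^j = \arg\max_{1 \le m \le M} \left\{l_k^j(m) + \log p_{x_k^j|x_{d(k)}^{j-1}}(m \mid \hat{x}_{d(k)}^{j-1})\right\}, \quad J_0 + 1 \le j \le J. \qquad (5.26)$$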
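The coarse-to-fine labeling of Equations (5.25) and (5.26) can be sketched for the 1-D (binary-tree) case as below. The sketch assumes that the log likelihoods $l_k^j(m)$ of every scale are already available as arrays; the function name `smap_labels`, the zero-based state numbering and the convention that the father of node $k$ is node $k // 2$ are assumptions of this illustration, not details taken from [10].

```python
import numpy as np


def smap_labels(levels, qs):
    """Top-down pass of Equations (5.25) and (5.26) on a binary tree (1-D case).

    levels : list of arrays, coarsest scale first; levels[i][k, m] = l_k^j(m),
             and each scale has twice as many nodes as the previous one
    qs     : q_j for every scale after the coarsest, in the same order as levels[1:]
    Returns a list of integer label arrays (states numbered 0 .. M-1).
    """
    if len(qs) != len(levels) - 1:
        raise ValueError("need one q per scale below the coarsest")
    coarsest = np.asarray(levels[0], dtype=float)
    num_states = coarsest.shape[1]
    labels = [np.argmax(coarsest, axis=1)]  # Equation (5.25)
    for loglik, q in zip(levels[1:], qs):
        loglik = np.asarray(loglik, dtype=float)
        # Equation (5.12) as a matrix: row = father state m', column = child state m.
        trans = q * np.eye(num_states) + (1.0 - q) / num_states
        with np.errstate(divide="ignore"):  # q = 1 gives log(0) = -inf off the diagonal
            log_trans = np.log(trans)
        # Father of node k is assumed to be node k // 2 at the coarser scale.
        father = labels[-1][np.arange(loglik.shape[0]) // 2]
        labels.append(np.argmax(loglik + log_trans[father], axis=1))  # Equation (5.26)
    return labels


# Tiny example with M = 2 states: one coarse node and its two children.
levels = [np.array([[0.0, -5.0]]), np.array([[-1.2, -1.0], [-3.0, 0.0]])]
print(smap_labels(levels, qs=[0.9]))
```

In this small example the first child would be labeled 1 by its likelihood alone, but the transition term pulls it to the father's label 0, while the second child keeps label 1 because its likelihood evidence is strong enough to outweigh the prior.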

One simulation result for a 2-D image is already given in Section (3.3). Here I give
another example of segmenting a 1-D sequence in Figure (5.3).

BIBLIOGRAPHY


[1] Standard Mathematical Tables and Formulae. 29th Edition, CRC Press, 1991.

[2] Babak A. Ardekani, Jeff Kershaw, Kenichi Kashikura, and Iwao Kanno. Ac-
tivation detection in functional MRI using subspace modeling and maximum
likelihood estimation. IEEE Trans. Medical Imaging, 18(2):101—114, 2 1999.

[3] P. A. Bandettini, A. Jesmanowicz, E. C. Wong, and J. S. Hyde. Processing
strategies for time-course data sets in functional MRI of the human brain. Magn.
Reson. Med, 30:161-173, 1993.

[4] R. T. Behrens and L. L. Scharf. Signal processing applications of oblique projection
operators. IEEE Trans. Signal Processing, 42(6):1413-1424, 1994.

[5] J. W. Belliveau, D. N. Kennedy, B. R. Buchbinder, R. C. McKinstry, R. M. Weisskoff,
M. S. Cohen, J. M. Vevea, T. J. Brady, and B. R. Rosen. Functional
mapping of the human visual cortex by magnetic resonance imaging. Science,
254(5032):716-719, 11 1991.

[6] M. A. Bernstein, D. M. Thomasson, and W. H. Perman. Improved detectability
in low signal-to-noise ratio magnetic resonance images by means of a
phase-corrected real reconstruction. Med. Phys., 16(5):813-817, 1989.

[7] S. Bose and A. O. Steinhardt. A maximal invariant framework for adaptive
detection with structured and unstructured covariance matrices. IEEE Trans.
Signal Processing, 43(9):2164-2175, 1995.

[8] A. Bruce, D. Donoho, and H-Y Gao. Wavelet analysis. IEEE Spectrum, (4):26-35,
10 1996.

[9] C. Bouman and B. Liu. Multiple resolution segmentation of textured images.
IEEE Trans. Patt. Anal. Mach. Intell., 13(2):99-113, 1991.

[10] C. Bouman and M. Shapiro. A multiscale random field model for Bayesian image
segmentation. IEEE Transactions on Image Processing, 3(2):162-177, 1994.

[11] C. H. Fosgate, H. Krim, W. W. Irving, W. C. Karl, and A. S. Willsky. Multiscale
segmentation and anomaly enhancement of SAR imagery. IEEE Trans. Image
Processing, 6(1):7-20, 1997.

[12] Mark S. Cohen. Rapid MRI and Functional Applications, In Brain Mapping:
The Methods, A. W. Toga and J. C. Mazziotta (Editors). Academic Press, 1996.

[13] R. T. Constable and R. M. Henkelman. Why MEM does not work in MR image
reconstruction. Magn. Reson. Med, 14:12—25, 1990.

[14] R. Todd Constable, P. Skudlarski, and John C. Gore. An ROC approach for
evaluating functional brain MR imaging and postprocessing protocols. Magn.
Reson. Med, 34:57-64, 1995.

[15] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk. Wavelet-based statistical signal
processing using hidden Markov models. IEEE Trans. Signal Processing,
46(4):886-902, April 1998.

[16] J. Deller, J. Proakis, and J. Hanson. Discrete- Time Processing of Speech Signals.
Prentice Hall, Englewood Cliffs, NJ, 1993.

[17] Xavier Descombes, Frithjof Kruggel, and D. Yves von Cramon. Spatio-temporal
fmri analysis using markov random ﬁelds. IEEE Trans. Medical Imaging,
17(6):1028—1039, 1998.

[18] W. A. Edelstein, P. A. Bottomley, and L. M. Pfeifer. A signal-to-noise calibration
procedure for NMR imaging systems. Med. Phys, 11:180—185, 1984.

[19] W. A. Edelstein, G. Glover, C. Hardy, and R. Redington. The intrinsic signal-
to-noise ratio in NMR imaging. Magn. Reson. Med, 3:604—618, 1986.

[20] S. Forman, J. C. Cohen, M. Fitzgerald, W. F. Eddy, M. A. Mintun, and D. C.
Noll. Improved assessment of signiﬁcant change in functional magnetic resonance
fMRI: Use of cluster size threshold. Magn. Reson. Med, 33:636—647, 1995.

[21] L. R. Frank, R. B. Buxton, and E. C. Wong. Probabilistic analysis and functional
magnetic resonance imaging data. Magn. Reson. Med, 39:132-148, 1998.

[22] K. J. Friston, P. Jezzard, and R. Turner. Analysis of functional mri time-series.
Human Brain Mapping, 1:153-171, 1994.


[23] S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and bayesian
restoration of images. IEEE Trans. Patt. Anal. Mach. Intell., 6(11):721—741,
1984.

[24] C. R. Genovese. A time-course model for MRI data. In Proc. Intl. Symp. Magn.
Reson. Med. Meeting, page 1669, 1997.

[25] C. R. Genovese, D. C. Noll, and W. F. Eddy. Estimating test-retest reliability of
functional MR imaging I: statistical methodology. Magn. Reson. Med., 38:497-507,
1997.

[26] Xavier Golay, Spyros Kollias, Gautier Stoll, Dieter Meier, Anton Valavanis, and
Peter Boesiger. A new correlation - based fuzzy logic clustering algorithm for
MRI. Magn. Reson. Med, 40:249—260, 1998.

[27] Cyril Goutte, Peter Toft, Egil Rostrup, Finn A. Nielsen, and Lars Kai Hansen.
On clustering fMRI time series. NeuroImage, 9:298—310, 1999.

[28] H.Derin and H.Elliott. Modeling and segmentation of noisy and textured images
using Gibbs random ﬁelds. IEEE Trans. Patt. Anal. Mach. Intell., 9(1):39—55,
1987.

[29] Xiaoping Hu, Tuong Huu Le, Todd Parrish, and Peter Erhard. Retrospective
estimation and correlation of physiological ﬂuctuation in functional MRI. Magn.
Reson. Med, 34:201—212, 1995.

[30] W. W. Irving and J. N. Tsitsiklis. Some properties of optimal thresholds in
decentralized detection. IEEE Trans. Automatic Control, 39(4):835—838, 1994.

[31] Jian-Ming Jin. Electromagnetics in magnetic resonance imaging. IEEE Antennas
and Propagation Magazine, 40(6):7-22, 12 1998.

[32] Gerald Kaiser. A Friendly Guide to Wavelets. Birkhauser, Boston, Basel, Berlin,
1994.

[33] S. M. Kay. Fundamentals of Statistical Signal Processing. Detection Theory.
Prentice-Hall, New Jersey, 1998.

[34] S. Kim, W. Richter, and K. Ugurbil. Limitations of temporal resolution in
functional MRI. Magn. Reson. Med, 37:631-636, 1997.

[35] T. Kim, L. Al-Dayeh, P. Patel, and M. Singh. Bayesian processing for MRI. In
Proc. Intl. Soc. Magn. Reson. Med, Sidney, Australia, 1998.


[36] S. Lai and G. H. Glover. Detection of BOLD fMRI signals using complex data.
In Proc. Intl. Soc. Magn. Reson. Med. Meeting, page 1671, 1997.

[37] Nicholas Lange. Tutorial in biostatistics: Statistical approaches to human map-
ping by fMRI. Statistics in Medicine, 15:389—428, 1996.

[38] Tuong Huu Le and Xiaoping Hu. Retrospective estimation and correlation of
physiological artifacts in fMRI by direct extraction of physiological activity from
MR data. Magn. Reson. Med., 35:290-298, 1996.

[39] R. A. Lerski. Physical Principles and Clinical Applications of Nuclear Magnetic
Resonance. Paradigm Print, Gateshead, London, Great Britain, 1985.

[40] M. R. Luettgen, W. C. Karl, A. S. Willsky, and R. R. Tenney. Multiscale
representations of Markov random fields. IEEE Trans. Signal Processing, 41(12):3377-3396,
1993.

[41] A. Macovski. Noise in MRI. Magn. Reson. Med, 36(3):494—497, 1996.

[42] S. Mallat and S. Zhong. Characterization of signals from multiscale edges. IEEE
Trans. Patt. Anal. Mach. Intell., 14(7):710—732, 1992.

[43] M. J. McKeown, S. Makeig, G. G. Brown, T-P Jung, S. S. Kindermann, A. J.
Bell, and T. J. Sejnowski. Analysis of fMRI data by blind separation into inde-
pendent spatial components. Human Brain Mapping, 6:160-188, 1998.

[44] E. R. McVeigh, R. M. Henkelman, and M. J. Bronskill. Noise and ﬁltration in
magnetic resonance imaging. Med. Phys, 12:586—591, 1985.

[45] M. Malfait and Dirk Roose. Wavelet-based image denoising using a Markov random
field a priori model. IEEE Trans. Image Processing, 6(4):549-565, 1997.

[46] J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, New York, 1982.

[47] Fangyuan Nan. SVD reconstruction algorithm and determination of signal num-
ber by frequency domain information. Submitted to IEEE Trans. on Antenna
and Propagation, 9 1999.

[48] Fangyuan Nan and Robert D. Nowak. Generalized likelihood ratio detection
of functional MRI signal using complex data. IEEE Trans. Medical Imaging,
18(4):320-329, 4 1999.

[49] R. Nowak. Multiscale Hidden Markov Models for Bayesian Image Analysis,
In Bayesian Inference in Wavelet Based Models, B. Vidakovic and
P. Muller (Editors). Springer Verlag, New York, 1999.

[50] Patrick Perez and Fabrice Heitz. Restriction of a markov random ﬁeld on a graph
and multiresolution statistical image modeling. IEEE Trans. Inform. Theory,
42(1):180—190, 1996.

[51] G. Polychronopoulos and J. N. Tsitsiklis. Explicit solutions for some sim-
ple decentralized problems. IEEE Trans. Aerospace and Electronic Systems,
26(2):282—292, 1990.

[52] L. Rabiner. A tutorial on hidden Markov models and selected applications in
speech recognition. Proc. IEEE, 77:257—285, 1989.

[53] Raghuveer M. Rao and Ajit S. Bopardikar. Wavelet Transforms—Introduction
to Theory and Applications. Addison-Wesley Longman, Inc., Reading, Mas-
sachusetts, 1998.

[54] O. Rioul and M. Vetterli. Wavelets and signal processing. IEEE Signal Processing
Mag., pages 14—38, 10 1991.

[55] U. E. Ruttimann, N. F. Ramsey, D. W. Hommer, P. Thevanaz, Ch. Lee, and
M. Unser. Analysis of functional magnetic resonance images by wavelet decom-
position. In Proc. Intl. Conf. Image Proc., volume 1, pages 633-636, 1995.

[56] L. L. Scharf. Statistical Signal Processing. Detection, Estimation, and Time Series
Analysis. Addison-Wesley, Reading, MA, 1991.

[57] L. L. Scharf and B. Friedlander. Matched subspace detectors. IEEE Trans.
Signal Processing, 42(8):2146-2157, 1994.

[58] K. Sekihara and H. Koizumi. Detecting cortical activities from fMRI time-course
data using MUSIC algorithm with forward and backward covariance averaging.
Magn. Reson. Med., 35:907-813, 1996.

[59] K. S. Shanmugan and A. M. Breipohl. Random Signals: Detection, Estimation
and Data Analysis. John Wiley and Sons, New York, 1988.

[60] B. Siewert, B. M. Bly, G. Schlaug, D. G. Darby, V. Thangaraj, S. Warach, and
R. Edelman. Comparison of the bold- and epistar-technique for functional brain
imaging using signal detection theory. Magn. Reson. Med, 36:249—255, 1996.


[61] M. D. Srinath, P. K. Rajasekaran, and R. Viswanathan. Introduction to Statistical
Signal Processing with Applications. Prentice Hall, Englewood Cliffs, New
Jersey, 1996.

[62] H. Stark and J. W. Woods. Probability, Random Processes and Estimation The-
ory for Engineers (2nd Edition). Prentice Hall, Englewood Cliffs, New Jersey,
1994.

[63] Gilbert Strang. Wavelets. American Scientist, 82:250—255, May-June 1994.

[64] S. C. Strother, I. Kanno, and D. A. Rottenberg. Principal component analysis,
variance partitioning, and functional connectivity. Journal of Cerebral Blood Flow
and Metabolism, 15:353-360, 1995.

[65] M. Vetterli and Jelena Kovacevic. Wavelets and Subband Coding. Prentice Hall
PTR, Englewood Cliffs, New Jersey, 07632, 1995.

[66] A. Villringer and U. Dirnagl. Coupling of brain activity and cerebral blood
ﬂow: Basis of functional neuroimaging. Cerebrovascular and Brain Metabolism
Reviews, 7:240—276, 1995.

[67] Marinus T. Vlaardingerbroek and Jacques A. Den Boer. Magnetic Resonance
Imaging. Springer, Verlag Berlin Heidelberg, 1996.

[68] G. A. Wright. Magnetic resonance imaging. IEEE Signal Processing Mag.,
40(1):56—66, 1 1997.

[69] Jinhu Xiong, Jia-Hong Gao, Jack L. Lancaster, and Peter T. Fox. Assessment
and optimization of functional MRI analyses. Human Brain Mapping, 4:153-167,
1996.

[70] Ilan Ziskind and Mati Wax. Maximum likelihood localization of multiple sources
by alternating projection. IEEE Trans. Acoust., Speech, Signal Processing,
36(10):1553—1560, 10 1988.
