DEVELOPMENTAL STEREO: EMERGENCE OF DISPARITY PREFERENCE IN COMPUTATIONAL MODELS OF VISUAL CORTEX

By

Mojtaba Solgi

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Computer Science

2009

ABSTRACT

DEVELOPMENTAL STEREO: EMERGENCE OF DISPARITY PREFERENCE IN COMPUTATIONAL MODELS OF VISUAL CORTEX

By Mojtaba Solgi

One of the major tasks carried out by our visual system is to create a three-dimensional representation of the visual world from the two-dimensional images projected onto the retinas. How do we estimate the relative depth of the objects in the visual field? It is known that stereoscopic cues arising from binocular disparity are one of the major ways for the brain to perceive three-dimensional objects. However, much remains unknown about how this complicated process takes place in the brain. This thesis proposes computational models to study the role of the 6-layer architecture of the laminar cortex in detecting the slight differences between the images on the left and right retinas, known as disparity. Assuming the spatial continuity of visual stimuli, we investigate how top-down signals can be used as temporal context information to guide recognition during the testing phase. The experimental results indicate that the use of top-down efferent signals, in the form of supervision or temporal context signals, not only greatly improves the performance of the networks, but also results in biologically compatible cortical maps: the representation of disparity selectivity is grouped, and changes gradually along the cortex. To our knowledge, this work is the first neuromorphic, end-to-end model of laminar cortex that integrates temporal context to develop internal representation and generates accurate motor actions in the challenging problem of detecting disparity in natural images. The network reaches sub-pixel error in the case of regression, and a 0.90 recognition rate in the case of classification, given limited resources.

DEDICATION

I dedicate this thesis to my parents, who have offered me unconditional love and care throughout the course of my life.

ACKNOWLEDGEMENTS

I would like to thank my adviser, Professor John Weng, for his insightful guidance throughout this thesis. My special gratitude goes to Dr. Danil Prokhorov from Toyota Research Institute for research assistantship support during the past two semesters. I thank my labmates, Matthew Luciw for his helpful discussions and Paul Cornwell for his generosity whenever I needed help. Last but not least, I would like to sincerely thank my roommate and dear friend, Rouhollah Jafari, for his true friendship and care over the past two years.

Keep away from people who try to belittle your ambitions.
Small people always do that, but the really great make you feel that you, too, can become great. Mark Twain TABLE OF CONTENTS LIST OF TABLES ..................................... viii LIST OF FIGURES .................................... ix 1 Introduction ....................................... 1 1.1 Motivation ...................................... 1 1.2 Task Decomposition ................................ 2 1.3 Thesis Outline .................................... 3 2 Background ....................................... 4 2.1 Basics of Human Visual System .......................... 4 2.1.1 Eye ..................................... 5 2.1.2 Visual Pathway ............................... 5 2.1.3 Retina .................................... 6 2.1.4 LGN .................................... 7 2.1.5 Primary Visual Cortex ........................... 8 2.1.6 Disparity .................................. 8 2.1.7 Geometry of Binocular Vision ....................... 9 2.1.8 Encoding of Binocular Disparity ...................... 11 2.2 Existing Work in Computational Modeling of Binocular Vision .......... 12 2.2.1 Energy Model ................................ 13 2.2.2 Memer et. a1. 2000 ............................. 14 2.2.3 Works based on LLISOM ......................... 14 3 Overview of the Project ................................ 22 4 Network Architecture and Operations . -. ...................... 27 4.1 Single-layer architecture .............................. 27 4.2 6-1ayer architecture ................................. 29 5 Analysis ......................................... 36 5.1 Elongated Input Fields Using Top-down ...................... 36 5.2 Top-down Connections Help Recruit Neurons More Efficiently .......... 39 5.3 Why use 6-layer Architecture? ........................... 42 5.4 Recovery from Hallucination ............................ 44 vi 6 Experiments and Results ................................ 47 6.1 Classification .................................... 47 6.1.1 The Effect of Top-Down Projection .................... 48 6.1.2 Topographic Class Maps .......................... 48 6.2 Regression ...................................... 49 6.2.1 The Advantage of Spatio-temporal 6-layer Architecture .......... 49 6.2.2 Smoothly Changing Receptive Fields ................... 50 7 Conclusion ....................................... 58 APPENDICES ....................................... 62 A Neuronal Weight Updating ............................... 62 BIBLIOGRAPHY ..................................... 63 vii 2.1 LIST OF TABLES Four basic types of disparity selective neurons. viii LIST OF FIGURES 2.1 Anatomy of the human eye (reprinted from [33]) .................... 5 2.2 Visual pathway in human (reprinted from [31]) ..................... 6 2.3 Samples of the receptive fields shapes in human V1 (reprinted from [31]) ........ 7 2.4 The geometry of stereospsis (reprinted from [40]) ................... 9 2.5 Horizontal Disparity and the Vieth-Muller circle(reprinted from [11]) .......... 10 2.6 Vertical Disparity (reprinted from [3]) ......................... 11 2.7 'I\vo models of disparity encoding (reprinted from [1]) ................. 16 2.8 An example of random dot stereogram (reprinted from [40]) .............. 17 2.9 Disparity tuning curves for the 6 categories of disparity selective neurons. TN: tuned near, TE: tuned excitatory, TF: tuned far, NE: near, TI: tuned inhibitory, FA: far (reprinted from [18]) ...................................... 17 2.10 Energy Model by thawa et. a1. 
[34] (reprinted from [34]) ............... 18 2.11 Modified Energy Model by Read et. a1. [42] (reprinted from [42]) ........... 18 2.12 Pre—processing to create a pool of stimuli by Wimer et. a]. [21] (reprinted from [21]) . . 19 2.13 Self-organized maps of left and right eye receptive fields (reprinted from [21]) ..... 19 2.14 Schematic of the architecture for basic LLISOM (reprinted from [31]) ......... 20 2.15 Self-organized orientation map in LLISOM (reprinted from [31]) ............ 20 2.16 TWO eye model for self organization of disparity maps in LLISOM (reprinted from [39]) . 21 2.17 Topographic disparity maps generated by LLISOM (reprinted from [39]) ........ 21 ix 4.1 4.2 4.3 5.1 5.2 (a). The binocular network single-layer architecture for classification. (b). The binocular network 6—layer architecture for regression. Two image patches are extracted from the same image position in the left and right image planes. Feature-detection cortex neurons self-organize from bottom-up and top-down signals. Each motor neuron is marked by the disparity it is representative for (ranging from -8 to +8). Each circle is a neuron. Activation level of the neurons is shown by the darkness of the circles: the higher the activation, the darker the neurons are depicted. The diagram shows an instance of the network during training phase when the disparity of the presented input is —4. In (a) the stereo feature—detection cortex is a single layer of LCA neurons. A rectangular kernel sets the activation of only Disparity -—4 neuron to 1 and all the others to 0. In (b), the stereo feature-detection cortex has a 6-layer laminar architecture (see Fig. 4.3). A triangular kernel, centered at the neuron of Disparity —4, imposes the activation level of Disparity —4 neuron and four of its neighbors to positive values and all the others to 0. ...... Examples of input, which consists of two rows of 20 pixels each. The top row is from the left view and the bottom row is from the right view. The numbers on the left side of the bars exhibit the amount of shift/disparity. ....................... Architecture diagram of the 6-layer laminar cortex studied in this paper, which also intro- duces some notation. The numbers in circles are the steps of the algorithm described in Section 4. See the text for notations. ......................... Each circle represents a neuron, and the shade of circlesrepresents the degree of dispar- ity the neuron is tuned to. The areas shown around neurons are the the input fields of neurons. (a) The quantization of input space by neurons without top-down input. The input fields of neurons has the same amount of variation in either of directions relevant and irrelevant input (shown as a square for the sake of visualization simplicity, should be Voronoi diagrams). (b) The quantization of input space by neurons with top-down input. For simplicity we assume the there is a linear relation between relevant part of bottom-up input, X R, and the top—down input, Z. The input fields of the neurons are still isomorphic (shown as squares) on the input manifold. However, the projection of the input fields on the bottom-up space is no longer isomorphic, but elongated along the irrelevant axis. Top-down connections enable neurons to pick up relevant receptive fields. If a neuron is supervised by the top-down connections to detect a particular disparity d, the irrelevant subspace includes those areas where object images do not overlap, i.e. 55,-; and fir. 
The first subindex indicates whether it is the irrelevant or relevant part of the input space (i and 1' respectively), and the second subindex shows whether it is from the left view or right view (I and 1' respectively). .............................. 33 37 5.3 5.4 5.5 6.1 The deviation of samples along any direction in the input space recruits neurons along this direction. (a) The subspace of relevant information has smaller variance than the ir- relevant information. Neurons spread more along the direction of irrelevant subspace. In other words, more neurons characterize the values in the irrelevant space (e.g., 5 neurons per unit distance versus 2 per unit distance). (h) Scale the relevant input by a factor of 2, increasing the standard deviation by a factor of two. Then, neurons spread in both direc- tion with similar densities. (c) Further scale down the irrelevant input, enabling neurons to spread exclusively along the relevant direction (i.e., invariant to irrelevant direction). The mechanisms of neuron winner selection (via lateral inhibition) in single-layer and 6—layer architectures. The maps are taken from a snap-shot of the 20 x 20 neurons in the networks performing on real data. Each small square projects the value for a neuron in that particular position (black(white): minimum(maximum) values). The top row shows the steps in the single-layer architecture, and the bottom row shows the steps for the 6-layer architecture (which shares some steps with the single-layer architecture). 69 represents the operation of taking weighted average of two vectors (similar to Eq. 4.6). ...... Schematic illustration of how 6-layer architecture, as opposed to single-layer architecture, makes recovery possible. A sample from class A is given to the network during testing (after the network is developed) while the context top~down signals are related to class B (wrong tOp—down signals depicted in red(darker) in the figure) . This causes the input to the neurons to be considered as a malicious (wrong) input (denoted by red(darker) stars) and lie out of the input distributions. This figure illustrates the state of the networks after receiving such an input. (a) Single-layer architecture. At time t, two closest neurons to the input have the highest pre-responses (k = 2). They win and fire. The winner neu- rons cause the top-down context input to slightly change/adapt to their top-down values. However, this change is not beneficial as the top-down component is still wrong. There- fore, at time t + 1 the input will still be classified as class B, which is wrong. (b) In a 6—layer architecture, neurons in L4 compete for bottom-up energy and two vertically closest neurons to the input have the highest pre-response and win. In the same manner, two horizontally closest neurons to the input in L2 / 3 have the highest pre-response and win. Then when the pre-response of neurons in L2 / 3 is computed it is very probable that some neurons from the correct class A have high preresponses and win in the next step (lst row of (b) far right graph). As a result, top-down input will have a right component as well. Because of this right component of the top-down signal, at the next time step t + 1, the network receives a right input (shown by light star in the 2nd row of (b) far left graph) besides the wrong input. Therefore, we see that one of the final winner neurons is in the correct class A. At the next time step t + 1 the network recovers to the state where the top-down signals are right again. 
........................... Bottom-up weights of 40 x 40 neurons in feature-detection cortex using top-down con- nections. Connections of each neurons are depicted in 2 rows of each 20 pixels wide. The top row shows the weight of connections to the left image, and the bottom row shows the weight of connections to the right image. ....................... xi 41 6.2 6.3 6.4 6.5 6.6 6.7 7.1 The recognition rate versus the number of training samples. The performance of the network was tested with 1000 testing inputs after each block of 1000 training samples. The class probability of the 40 x 40 neurons of the feature-detection cortex. (a) Top-down connections are active (0: = 0.5) during development. (b) Top-down connections are not active (or = 0) during development. .......................... The effect of top-down projection on the purity of the neurons and the performance of the network. Increasing a in Eq. 4.1 results in purer neurons and better performance. How temporal context signals and 6-layer architecture improve the performance. The effect of relative top—down coefficient, a, on performance in disjoint recognition test on randomly selected training data. .......................... (a) Map of neurons in V2 of macaque monkeys evoked by stimuli with 7 different dis- parities. Adapted from Chen et. a1. 2008 [7] (b) Disparity-probability vectors of L2 / 3 neurons for different disparities when n = 5. (c,e). Disparity-probability maps in L2 / 3 where K. = 5 in (c) and K = 1 (e). (d,f). Cross—correlation of disparity-probability where 'n=5in(d)andn=1in(f). ............................. Comparison of our novel model of L2 / 3 where it performs both sparse coding and inte- gration of top-down and bottom-up signals, with traditional models in which it only does integration. ...................................... xii 53 54 55 59 Chapter 1 Introduction 1.1 Motivation Humans and many other animals posses two eyes via which they perceive the visual world. Because the two eyes are placed in horizontally different positions in the skull, they receive two slightly different images of the visual scenes. This difference is referred to as disparity. Psychophysical studies indicate that disparity is one of the main cues for the emergence of three-dimensional representation of the world from two-dimensional retinal images [35]. Intensive amount of studies in computer vision community during the past few decades has proved that the challenges of stereo vision cannot be addressed without a thorough understanding of the biological visual systems. Recent studies of binoc- ular depth perception in the physiological level has shed light onto many aspects of the role of stereoscopic cues in the perception of depth. However, there are much more unknown for a unified theory of how the actual mechanism takes place. The important role of computational models toward such theory should never be under- estimated. It is via computational models that researchers can verify their theories based on experimental observations, and also predict the details of some mechanisms before any data is available for it. Depending on the matter of study, the proposed models must be as biologically plausible as computaional tools allow, otherwise one cannot imply the biological analogy of the results. One such computational model is MILN (Multilayer In-place Learning Networks), a cortex inspired learning network architecture that operates using LCA (Lobe Componenet Analysis). 
By in-place learning, we mean each neuron in the network learns on its own and by interacting with other neurons, without the need of any external controller. Lobe Component Analysis is a dual-optimal learning algorithm that atonomously derives representaion from input samples. As an extention to the original MILN networks, we implemented a model of the 6—layer architecture of the laminar cortex within the same architecture. The main goal of the project was to investigate the mechanisms of top-down connections as supervision or context signals in the cortical architectures to the emergence of disparity preference in the modeled neurons. 1.2 Task Decomposition The project was carried out in two main phases. In the first phase we utilized the default version of MILN, and investigated its abilities to detect disparities in a challenging setting of natural images. A new implementation of MILN was developed in C++ from scratch. The necessary modules were added to the basic MILN to handle binocular disparity data. After the preliminary study in the first phase, it was evident that MILN has the capability to operate on binocular data. The second phase involved designing a novel architecture of the newtworks to handle top—down context signals during testing. A graphical user interface was developed to visualize the internal states and operations of the network. Final results were convincing that the new architecture successfully utilizes context information to elevate the recognition abilities of the network, and demonstrates biologically plausible cortical maps. 1.3 Thesis Outline The outline of the remainder of this thesis is as follows: 0 Chapter 2 introduces the fundamentals of biological visual systems required for understanding the biological terminology used in the later chapters. It also briefly presents an overview of the previous computational models of binocular disparity encoding and disparity detection. 0 Chapter 3 provides an overview of the specific problems addressed in this thesis. 0 Chapter 4 presents the structure of the different types of the network used in the thesis, along with the learning algorithms. 0 Chapter 5 analytically explains the mechanisms of the methods used, and provides reasons as to why we should expect such beahavior and outputs from the networks. 0 Chapter 6 presents the experiments done in this thesis along with the results obtained. 0 Chapter 7 concludes the thesis, and provides some predictions about the func- tionality of cortical regions. Chapter 2 Background This chapter presents the fundamentals of neurological knowledge required for un- derstanding the biological binocular vision systems regarding disparity encoding and detection. Furthermore, the details of LCA (Lobe Component Analysis) and MILN (Multilayer In-place Learning Networks) are discussed and compared with other models of visual neural networks. At the end of the chapter, related works on dis- parity models are presented. Most material on biological visual systems is adapted from Kandel 2000 [24] and Ramtohul 2006 [39], and those about LCA and MILN are largely adapted from Weng & Luciw 2009 [49]. 2.1 Basics of Human Visual System The human visual system is one of the most remarkable biological systems in nature, formed and improved by millions of years of evolution. About the half of the human cerebral cortex is involved with vision, which indicates the computational complexity of the task. 
Neural pathways starting from the retina and continuing to V1 and the higher cortical areas form a complicated system that interprets the visible light projected on the retina to build a three dimensional representation of the world. In this chapter we provide background information about the human visual system Vitreous gel ‘3‘ ,, ' '- .. . Iris Optic nerve "f " . : g Cornea Macula .. _ .1 Pupil Figure 2.1: Anatomy of the human eye (reprinted from [33]) and the neural mechanisms involved during the development and operation of visual capabilities. 2.1.1 Eye When visible light reaches the eye, it first gets refracted by the cornea. After passing through the cornea, it reaches the pupil. To control the amount of light entering the eye, the pupils size is regulated by the dilation and constriction of the iris muscles. Then the light goes through the lens, which focuses it onto the retina by proper adjustment of its shape. 2. 1.2 Visual Pathway The early visual processing involves the retina, the lateral geniculate nucleus of thalamus (LGN), and the primary visual cortex (V1). The visual signals then go through the higher visual areas, which include V2, V3, V4 and V5/ MT. After initial Right eye . Primary Visual field ‘ . ' ‘ visual cortex (V1) Figure 2.2: Visual pathway in human (reprinted from [31]) processing in the retina, output from each eye goes to LGN, at the base of the same Side of the brain. LGN in turn does some processing on the signals and projects to the V1 of the opposite side of the brain. The optic nerves, going to opposite sides of the brain, cross at a region called the optic chiasm. V1 then feeds its output to higher visual cortices where further processing takes place. Fig. 2.2 presents a schematic overview of the visual pathway. 2.1.3 Retina The retina is placed on the back surface of the eye ball. There is an array of special purpose cells on the retina, such as photoreceptors, that are responsible for converting the incident light into neural signals. There are two types of light receptors on the retina: 1) rods that are responsible for vision in dim light 2) cones that are responsible for vision in bright light. The total number of rods is more than cones, however there are no rod cells in the center of retina. The central part of the retina is called the fovea which is the center of fixation. The density of the cone cells is high in the fovea, which enables this area to detect the fine details of retinal images. For the first time, Stephen Kuffler recorded the responses of retinal ganglion cells IDII (a) ON cell in (b) OFF cell in (c) 2-lobe V1 (d) 3-lobe V1 retina or LGN retina or LGN simple cell simple cell Figure 2.3: Samples of the receptive fields shapes in human V1 (reprinted from [31]) to rays of light in a cat in 1953(Hubel, 1995). He discovered that it is possible to influence the firing rate of a retinal ganglion cell by projecting a ray of light to a specific spot on retina. This spot is called the receptive field (RF) of the cell. Below is a definition of receptive field from Livine & Shefner 1991: ”Area in which stimulation leads to a response of a particular sensory neuron” In other words, for any neuron involved in the visual pathway, the receptive field is a part of the visual stimuli that influences the firing rate of the specific neuron. Fig. 2.3 shows a few examples of the shape of receptive fields in the visual pathway. 2.1.4 LGN The LGN acts like a relay that gets signals from the retina and projects to the primary visual cortex (V1). 
It consists of neurons similar to retinal ganglion cells, however the role of these cells is not clear yet. The arrangement of the LGN neurons is retinotopic, meaning that the adjacent neurons have gradually changing, overlap- ping receptive fields. This phenomena is also called topographic representation. It is believed that the LGN cells perform edge detection on the input signals they receive from the retina. 2.1.5 Primary Visual Cortex Located at the back side of the brain, the primary visual cortex is the first cortical area in the visual pathways. Similar to LGN, V1 neurons are reinotopic too. V1 is the lowest level of the visual system hierarchy in which there are binocular neu- rons. These neurons are identified by their ability to respond strongly to stimuli from either eye. These neurons also exhibit preference to specific features of the visual stimuli such as spatial frequency, orientation and direction of motion. It has been observed that some neurons in V1 Show preference for particular disparities in binocular stimuli - stimuli with a certain disparity causes potential discharge in the neuron. V1 surface consists of columnar architecture where neurons in each column have more or less similar feature preference. In the columnar structure, fea- ture preference changes smoothly across the cortex, meaning that nearby columns exhibit similar and overlapping feature preference while columns far from each other respond differently to the same stimuli. Overall, there is a smoothly varying map for each feature in which preferences repeat at regular intervals in any direction. Examples of such topographic maps include orientation maps, and disparity maps which are the subject of study in this thesis. 2.1.6 Disparity It is known that the perception of depth arises from many different visual cues (Qian 1997 [37]) such as occlusion, relative size, motion parallax, perspective, shading, blur, and relative motion (DeAngelis 2000 [11], Gonzalez & Perez 1998 [18]). The cues mentioned were monocular. There are also binocular cues because of the stereo property of the human vision. Binocular disparity is one of the strongest binocular cues for the perception of depth. The existence of disparity is because the two eyes are laterally separated. The terms stereo vision, binocular vision and stereospsis are interchangeably used for the three-dimensional vision based on binocular disparity. IQ O N. N Ca 4 D! Figure 2.4: The geometry Of stereospsis (reprinted from [40]) 2.1.7 Geometry of Binocular Vision Fig. 2.4 illustrates the geometry of the stereo vision. Suppose that the eyes are focused(fixated) at the point Q. The images of the fixation point falls on the fovea, Q L and Q R on the left and right eyes, respectively. These two points are called corresponding points on the retina, since they both get the reflection of the same area of the visual field (fixation point in this example). The filled circle S 'is closer to the eyes and its image reflects on different Spots on the two retinas, which are called non-corresponding points. This lack of correspondence is referred to as disparity. The relative depth of the point S, distance 2 from the fixation point, can be easily calculated given the retinal disparity 6 = r — l, and the interocular distance (the distance between the two eyes), I. Since this kind of disparity is caused by the location of the objects on the horizontal plane, it is known as horizontal disparity. 
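To make the geometry concrete, here is a small numeric sketch using the standard small-angle approximation for horizontal disparity, in which the angular disparity satisfies delta ~ I * z / D^2, so z ~ delta * D^2 / I, where D is the fixation distance. The function name, the 6.5 cm interocular distance, and the example values are illustrative assumptions, not values taken from this thesis.

```python
def depth_from_disparity(delta_rad, fixation_dist_m, interocular_m=0.065):
    """Approximate depth offset z of a point relative to the fixation plane.

    Uses the small-angle relation  delta ~ I * z / D**2,  hence
    z ~ delta * D**2 / I.  Only valid when z is much smaller than D.
    """
    return delta_rad * fixation_dist_m ** 2 / interocular_m

# A retinal disparity of 0.001 rad (~3.4 arcmin) while fixating at 1 m
# corresponds to a point roughly 1.5 cm nearer or farther than fixation.
print(depth_from_disparity(0.001, 1.0))  # ~0.0154 m
```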
It can be proven that all the points that are at the same disparity as the fixation point lie on a semi-sphere in three-dimensional space. This semi-sphere is referred to as the horopter. Points on the horopter, inside the horopter, and outside the horopter have zero, negative, and positive disparities, respectively. The projection of the horopter on the horizontal plane crossing the eyes (at eye level) is the Vieth-Muller circle.

Figure 2.5: Horizontal Disparity and the Vieth-Muller circle (reprinted from [11])

It is known that another type of disparity, called vertical disparity, plays some role in the perception of depth; however, it has not been studied as intensively as horizontal disparity. Vertical disparity occurs when an object is considerably closer to one eye than the other. According to Bishop 1989 [3], such vertical disparities occur when objects are located relatively close to the eyes and are above or below the horizontal visual plane, but do not reside on the median plane, the vertical plane that divides the human body into left and right halves. Fig. 2.6 simply illustrates vertical disparity. Point P is above the visual plane and to the right of the median plane, which makes it closer to the right eye. It can be seen that the relation $\beta_2 > \beta_1$ holds between the two angles $\beta_1$ and $\beta_2$. The vertical disparity, denoted by $v$, is the difference between these two angles, $v = \beta_2 - \beta_1$ [3].

Figure 2.6: Vertical Disparity (reprinted from [3])

2.1.8 Encoding of Binocular Disparity

There are several ways that binocular disparities can be described. One can encode disparity as the retinal positions of visual features (such as edges) corresponding to the same spots in the visual field, or formulate the images as a set of sine waves using Fourier analysis and encode disparity as the phase difference between the sine waves at the same retinal position. The former is referred to as position disparity and the latter as phase disparity. There is evidence supporting the existence of both kinds of disparity in biological visual systems [9]. These two possibilities are illustrated in Fig. 2.7.

Table 2.1: Four basic types of disparity selective neurons.

  Disparity selective cell type | Placement of stimuli
  ------------------------------|--------------------------------------------------------------
  Tuned-excitatory              | Stimuli at zero disparity
  Tuned-inhibitory              | Stimuli at all disparities except those near zero disparity
  Near                          | Stimuli at negative disparity
  Far                           | Stimuli at positive disparity

2.2 Existing Work in Computational Modeling of Binocular Vision

Perhaps the first remarkable study of the neural mechanisms underlying binocular vision dates back to the 1960's, by Barlow et al. [2]. They discovered that neurons in the cat striate cortex respond selectively to objects at different binocular depths. In 1997, Poggio and Fischer [16] did a similar experiment with an awake macaque monkey that confirmed the previous evidence by Barlow et al. [2]. Since the visual system of these animals to a great extent resembles that of humans, researchers believe that there are disparity-selective neurons in the human visual cortex as well. Poggio & Fischer [16] used solid bars as visual stimuli to identify and categorize the disparity selective neurons. Table 2.1 contains the naming they used to categorize the cell types.

Julesz 1971 [22] invented the random dot stereogram (RDS), which was a great contribution to the field.
A random dot stereogram consists of two images filled with dots randomly black or white, where the two images are identical except a patch of one image that is horizontally shifted in the other (Fig. 2.8). When a human subject fixates eyes on a plane farther or closer to the plane on which RDS lies, due to the binocular fusion in the cortex, the shifted region jumps out (seems to be at a different depth from the rest of the image). Experiments based on RDS contributed to strengthen the theory of 4 categories of disparity selective 12 neurons [18]. Later experiments revealed the existence of two additional categories, named tuned near and tuned for [36]. Fig. 2.9 depicts the 6 categories identified by Poggio et. al. 1988 [36]. Despite neurophysiological data and thrilling discoveries in binocular vision, a computational model was missing until 1990 when thawa et. al. [34] published their outstanding article in Science journal. They introduced a model called the disparity energy model. Later some results from physiological studies did not match the predictions made by energy model. Read et. al. 2002 [42] proposed a modi- fied version of the original energy model. In the following sections, we present an overview of the two different versions of the important work of the energy model. 2.2. 1 Energy Model thawa-DeAngelis-Freeman (ODF) 1990 [34] studied the details of binocular dispar- ity encoding and detection in the brain, and tried to devise a computational model compatible with the biological studies of binocular vision. They argued that at least two more points need to be taken into account before one can devise a plausible model of the binocular vision. 1. Complex cells must have much finer receptive fields compared to what was reported by Nikara et. al. [32] 2. Disparity sensitivity must be irrelevant to the position of the stimulus within the receptive field. Considering the limitations of the previous works and inspired by their own predictions, thawa et. al. presented the Energy Model for disparity selective neurons. Fig. 2.10 schematically shows their model. There are 4 binocular Simple Cells (denoted by S) each receiving input from both eyes. The receptive field profile of the Simple cells is depicted in small boxes. The output of the simple cells then 13 goes through a half-wave rectification followed by a squaring function. A complex cell (denoted by Ca: in Fig. 2.10) then adds up the output of the 4 subunits S 1, S2, S3 and S4 to generate the final output of the network. Read et. al. [42] completed the previous energy model by thawa et. al. [34]. They added monocular simple cells to the model that performs a half-wave recti- fication on the inputs from each eye before feeding them to the binocular Simple cells. The authors claimed that the modification in the Energy Model results in the neurons exhibiting behavior close to real neuronal behavior when the input is anti-correlated binocular stimuli. Fig. 2.11 Shows the modified Energy Model. 2.2.2 Wiemer et. al. 2000 Wiemer et. al. [21] used SOM as their model to exhibit self-organization for disparity preference. Their work was intriguing as for the first time it demonstrated the development of modeled binocular neurons. They took stereo images form three- dimensional scenes, and then built a binocular representation of each pair of stereo images by attaching corresponding stripes from the left and right images. They then selectively chose patches from the binocular representation to create their input to the network. 
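The following is a minimal sketch of this kind of pre-processing: interleaving vertical stripes of a stereo pair into a single binocular array and then sampling patches from it as network input. The stripe width, patch size, and function names are illustrative assumptions, not the settings used by Wiemer et al. [21].

```python
import numpy as np

def binocular_representation(left, right, stripe_width=4):
    """Interleave vertical stripes of a stereo pair into one array, roughly in
    the spirit of the pre-processing of Wiemer et al. [21]. The stripe width
    and layout are illustrative choices, not the values of the original work."""
    h, w = left.shape
    stripes = []
    for start in range(0, w, stripe_width):
        stripes.append(left[:, start:start + stripe_width])
        stripes.append(right[:, start:start + stripe_width])
    return np.hstack(stripes)

def sample_patches(binoc, patch_size=8, n_patches=100, seed=0):
    """Cut random square patches from the binocular representation to serve
    as inputs to a self-organizing map."""
    rng = np.random.default_rng(seed)
    h, w = binoc.shape
    ys = rng.integers(0, h - patch_size + 1, n_patches)
    xs = rng.integers(0, w - patch_size + 1, n_patches)
    return np.stack([binoc[y:y + patch_size, x:x + patch_size]
                     for y, x in zip(ys, xs)])
```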
An example of this pre-processing is shown in Fig. 2.12. After self-organization they obtained disparity maps that exhibited some of the characteristics observed in the visual cortex. Fig. 2.13 shows one exmaple of the maps they reported. 2.2.3 Works based on LLISOM Laterally Interconnected Synergetically Self-Organizing Maps by Mikkulainen et. al. [31] is a computational model of the self-organizing visual cortex that has been extensively studied over the past years. It emphasized the role of the lateral connec- tions in such self-organization. Mikkulainen et. a1. [31] point out three important 14 findings based on their models: 1. Self-organization is driven by bottom-up input to shape the cortical structure 2. Internally generated input (caused by genetic characteristics of the organism) also plays an important role in Self-organization of the visual cortex. 3. Perceptual grouping is accomplished by interaction between bottom-up and lateral connections. Although LLISOM was an important work that shed some light on the self- organization in the visual cortex, they failed to model an important part of the signals received at the visual cortex, namely top-down connections, and the role of this top-down connections in perception and recognition. Fig. 2.14 Shows an overall structure of the LLISOM. It consists of retina, LGN- ON and LGN-OFF sheets, and V1 sheet. Unlike SOM, in LLISOM each neuron is locally connected to a number of neurons in its lower-level sheet. Also, neurons are laterally connected to their neighbors. The strength of the connection between neurons is adapted during learning based on Hebbian learning rule. The process of learning connection weights is called self-organization. Thanks to lateral connec- tions, LLISOM gains finer self-organized maps than SOM. Fig. 2.15 presents an example of the self-organizing maps using LLISOM. Ramtohul 2006 [39] studied the self-organization of disparity using LLISOM. He extended the basic architecture of LLISOM to handle two eyes, and the new architecture two eye model for disparity selectivity. Fig. 2.16 shows a schematic diagram of his model. He then provided the network with patches of natural images as input to investigate the emergence of disparity maps. The network successfully developed topographic disparity maps as a result of input-driven self-organization using LLISOM. However, this work did not provide any performance measurement 15 report, since the motor/ action layer was absent in the model. Fig. 2.17 shows an example of the topographic disparity maps reported by Ramtohul 2006 [39]. 16 Fixation plane RF phase disparity (a) Position Difference Model W M Far Fixation plane RF phase disparity (b) Phase Difference Model Figure 2.7: Two models of disparity encoding (reprinted from [1]) 17 Figure 2.8: An example of random dot stereogram (reprinted from [40]) ioo TN 100 TE ioo TF 80 80 80 50 60 60 4o 40 4o 3 20 20 20 i: 8. o 7 , 0 _ . o if) -1 o 1 -i o 1 -1 o i 8 100 NE 100 T1 100 FA so so so so so so 40 4o 40 20 20 20 o 0, . o -1 o 1 -l o r -1 o 1 Horizontal Disparity (deg) Figure 2.9: Disparity tuning curves for the 6 categories of disparity selective neurons. TN: tuned near, TE: tuned excitatory, TF: tuned far, NE: near, TI: tuned inhibitory, FA: far (reprinted from [18]) 18 =71 ' N In pha.se- III III . ’ ' 4., . 4 , I” III IIII9n =-A . II III Quadr.atu-re Figure 2.10: Energy Model by thawa et. al. 
[34] (reprinted from [34]) Left Right Simple Complex ' Monocular Binocular iiI MS db MS MS Cx Figure 2.11: Modified Energy Model by Read et. al. [42] (reprinted from [42]) 19 Pool of 'Left-eyed’ View _ _ Stimuli Binocular Representation EDI—:Drml—ZUl—Nl—ZDI— m {I Figure 2.12: Pre-processing to create a pool of stimuli by Wimer et. al. [21] (reprinted from [21]) B Fields Figure 2.13: Self-organized maps of left and right eye receptive fields (reprinted from [21]) 20 Figure 2.14: Schematic of the architecture for basic LLISOM (reprinted from [31]) i‘- Iteration O Iteration 10,000 Figure 2.15: Self-organized orientation map in LLISOM (reprinted from [31]) 21 Figure 2.16: Two eye model for self organization of disparity maps in LLISOM (reprinted from [39]) fl- BIIIIBDGIIDI EIIDEIHWDE LIIWE EEDDEEIIII BEIIEHIDBE EEDIBEIJBBE LGNOnch‘tAfferent Weights after] 0,000 iterations. Plotting Density 10.0 LGNOnRightAfferent Weights after] 0,000 iterations, Plotting Density 10.0 Figure 2.17: Topographic disparity maps generated by LLISOM (reprinted from [39]) 22 Chapter 3 Overview of the Project The past few decades of engineering efforts to solve the problem of stereo vision proves that the computational challenges of binocular disparity are far from trivial. In particular, the correspondence problem is extremely challenging considering dif- ficulties such as featureless areas, occlusion, etc. Further, the existing engineering methods for binocular matching are not only computationally expensive, but are also hard to integrate with other visual cues to help the perception of depth. It is important to look at the problem from a different angle — How the brain solves the problem of binocular vision? In particular, what are the computational mech- anisms that regulate the development of the visual nervous system, and what are the roles of gene-regulated cortical architecture and spatiotemporal aspects of such mechanisms? In the real world, objects do not come into and disappear from the field of view randomly, but rather, they typically move continuously across the field of view, given their motion is not too fast for the brain to respond. At the pixel level, however, values are very discontinuous as image patches sweep across the field of view. Our model assumes that visual stimuli are largely spatially continuous. Motivated by the cerebral cortex, it utilizes the temporal context in the later cortical areas, including the intermediate areas and motor output area, to guide the development of earlier 23 areas. These later areas are more “abstract “ than the pixel level, and thus pro- vide needed information as temporal context. However, how to use such emergent information is a great challenge. Existing methods for stereo disparity detection fall into three categories: 1. Explicit matching: Approaches in this category detect discrete features and explicitly match them across two views. Well-known work in this category include [19], [12] and [55]. 2. Hand-designed filters: Filters are designed to compute profile-sensitive val- ues (e.g. Gabor filters [53], [41], and phase information [14], [47]) from images and then utilize these continuous values for feature matching. 3. Network learning models: These models develop disparity-selective filters (i.e. neurons) from experience, without doing explicit matching, and map the responses to disparity outputs (e.g. [26], [27], [21], [15]). Categories (1) and (2) employ explicit left and right match through either an explicit search or implicit gradient-based search. 
They are generally called explicit matching approaches. Among the different stages of the explicit matching approaches, the correspon- dence problem is believed to be the most challenging step; i.e. the problem of match- ing each pixel of one image to a pixel in the other [30]. Solutions to the correspon- dence problem have been explored using arear, feature, pixel- and phase-based, as well as Bayesian approaches [12]. While those approaches have obtained limited suc- cess in special problems, it is becoming increasingly clear that they are not robust against wide variations in object surface properties and lighting conditions [14]. The network learning approaches in category (3) do not require a match between the left and right elements. Instead, the binocular stimuli with a specific disparity are matched with binocular neurons in the form of neuronal responses. Different neurons 24 have developed different preferred patterns of weights, each pattern indicating the spatial pattern of the left and right receptive fields. Thus, the response of a neuron indicates a degree of match of two receptive fields, left and right. In other words, both texture and binocular disparity are measured by a neuronal response - a great advantage for integration of binocular disparity and spatial pattern recognition. However, existing networks that have been applied to binocular stimuli are either bottom-up SOM type or error-back propagation type. There has been no biolog- ical evidence to support error back-propagation, but the Hebbian type of learning has been supported by the Spike-Time Dependent Plasticity (STDP) [10]. SOM type of networks that use both top-down and bottom-up inputs has not be stud- ied until recently [44,45,48, 50]. In this paper we show that top-down connections that carry supervisory disparity information (e.g. when a monkey reaches an apple) enable neurons to self-organize according to not only bottom-up input, but also su- pervised disparity information. Consequently, the neurons that are tuned to similar disparities are grouped in nearby areas in the neural plane, forming what is called topographic class maps, a concept first discovered in 2007 [29]. Further, we exper- imentally show that such a disparity based internal topographic grouping leads to improved disparity classification. Neurophysiological studies (e.g. [17] and [6] ) have shown that the primary visual cortex in macaque monkeys and cats has a laminar structure with a local circuitry similar to our model in Fig. 4.3. However, a computational model that explains how this laminar architecture contributes to classification and regression was unknown. LAMINART [38] presented a schematic model of the 6—layer circuitry, accompanied with simulation results that explained how top-down attentional enhancement in V1 can laterally propagate along a traced curve, and also how contrast-sensitive perceptual grouping is carried out in V1. Weng et. al. 2007 [20] reported per- formance of the laminar cortical architecture for classification and recognition, and 25 Weng et. al. 2008 [50] reported the performance advantages of the laminar architec- ture (paired layers) over a uniform neural area. Franz & Triesch 2007 [15] studied the development of disparity tuning in toy objects data using an artificial neural network based on back-propagation and reinforcement learning. They reported a 90% correct recognition rate for 11 classes of disparity. 
In Solgi & Weng 2008 [46], a multilayer in-place learning network was used to detect binocular disparities that were discretized into classes of 4 pixels intervals from image rows of 20 pixels wide. This classification scheme does not fit well for higher accuracy needs, as a misclassi- fication between disparity class -1 and class 0 is very different from that between a class —1 and class 4. The work presented here also investigates the more challenging problem of regression with sub-pixel precision, in contrast with the prior scheme of classification in Solgi & Weng 2008 [46]. For the first time, we present a spatio-temporal regression model of the laminar architecture of the cortex for stereo that is able to perform competitively on the difficult task of stereo disparity detection in natural images with sub-pixel precision. The model of the inter-cortical connections we present here was informed by the work of Felleman 8; Van Essen [13] and that for the intra-cortical connections was informed by the work of Callaway [5] and Wiser & Callaway [54] as well as others. Luciw & Weng 2008 [28] presented a model for top-down context signals in spatio-temporal object recognition problems. Similar to their work, in this paper the emergent recursive top—down context is provided from the response pattern of the motor cortex at the previous time to the feature detection cortex at the current time. Biologically plausible networks (using Hebbian learning instead of error back- propagation) that use both bottom-up and top-down inputs with engineering-grade performance evaluation have not been studied until recently [20,46,50]. It has been known that orientation preference usually changes smoothly along the cortex [4]. Chen et. al. [7] has recently discovered that the same pattern applies 26 to the disparity selectivity maps in monkey V2. Our model shows that defining disparity detection as a regression problem (as opposed to classification) helps to form similar patterns in topographic maps; disparity selectivity of neurons changes smoothly along the neural plane. In summary, the work here is novel in the following aspects: (1) The first laminar model (paired layers in each area) for stereo. (2) The first utilization of temporal signals in a laminar model for stereo (3) The first sub-pixel precision among the network learning models for stereo. Applying the novelties mentioned in (1) and (2) showed surprisingly drastic accuracy differences in performance. (4) The first study of smoothly-changing disparity sensitivity maps (5) Theoretical analysis that supports and provides insights into such performance differences. 27 Chapter 4 Network Architecture and Operations The networks applied in this paper are extentions of the previous models of Mul- tilayer In—place Learning Network (MILN) [50]. To comply with the principles of Autonomous Mental Development (AMD) [51], these networks autonomously de- velop features of the presented input, and no hand-designed feature detection is needed. To investigate the effects of supervisory top-down projections, temporal context, and laminar architecture, we study two types of networks: 1) Single-layer architec- ture for classification and 2) 6-layer architecture for regression. An overall sketch of the networks is illustrated in Fig. 4.1. In this particular study, we deal with networks consisting of a sensory array (marked as Input in Fig. 
4.1), a stereo feature-detection cortex, which may be a single layer of neurons or have a 6-layer architecture inspired by the laminar architecture of the human cortex, and a motor cortex that functions as a regressor or a classifier.

4.1 Single-layer architecture

In the single-layer architecture, the feature-detection cortex simply consists of a grid of neurons that is globally connected to both the motor cortex and the input. It performs the following 5 steps to develop binocular receptive fields:

1. Fetching input in L1 and imposing supervision signals (if any) in motor cortex — When the network is being trained, $\bar{z}^{(M)}$ is imposed from outside (e.g., by a teacher). In a classification problem, there are $c$ motor cortex neurons and $c$ possible disparity classes. The true class being viewed is known by the teacher, who communicates this to the system. Through an internal process, the firing rate of the neuron corresponding to the true class is set to one, and all others are set to zero.

2. Pre-response — Neuron $n_i$ in the feature-detection cortex computes its pre-competitive response $\hat{z}_i^{(L1)}$, called the pre-response, linearly from the bottom-up part and the top-down part:
$$
\hat{z}_i^{(L1)}(t) = (1-\alpha)\,
\frac{\bar{x}^{(L1)}(t) \cdot \bar{w}_{b,i}^{(L1)}(t)}
     {\|\bar{x}^{(L1)}(t)\|\,\|\bar{w}_{b,i}^{(L1)}(t)\|}
 \;+\; \alpha\,
\frac{\bar{z}^{(M)}(t) \cdot \bar{w}_{t,i}^{(L1)}(t)}
     {\|\bar{z}^{(M)}(t)\|\,\|\bar{w}_{t,i}^{(L1)}(t)\|}
\quad (4.1)
$$
where $\bar{w}_{b,i}^{(L1)}(t)$ and $\bar{w}_{t,i}^{(L1)}(t)$ are this neuron's bottom-up and top-down weight vectors, respectively, $\bar{x}^{(L1)}(t)$ is the bottom-up input, and $\bar{z}^{(M)}(t)$ is the vector of firing rates of the motor cortex neurons (supervised during training, and not active during testing). The relative top-down coefficient $\alpha$ is discussed in detail later. We do not apply a linear or non-linear function $g$, such as a sigmoid, to the firing rates in this paper.

3. Competition via Lateral Inhibition — A neuron's pre-response is used for intra-level competition. The $k$ neurons with the highest pre-response win, and the others are inhibited. If $r_i = \mathrm{rank}(\hat{z}_i^{(L1)}(t))$ is the ranking of the pre-response of the $i$'th neuron (with the most active neuron ranked as 0), we have $z_i^{(L1)}(t) = s(r_i)\,\hat{z}_i^{(L1)}(t)$, where
$$
s(r_i) =
\begin{cases}
\dfrac{k - r_i}{k} & \text{if } 0 \le r_i < k \\[4pt]
0 & \text{if } r_i \ge k
\end{cases}
\quad (4.2)
$$

4. Smoothing via Lateral Excitation — Lateral excitation means that when a neuron fires, the nearby neurons in its local area are more likely to fire. This leads to a smoother representational map. The topographic map can be realized by considering not only a nonzero-responding neuron $i$ as a winner, but also its 3 x 3 neighbors, which are the neurons with the shortest distances from $i$ (less than two).

5. Hebbian Updating with LCA — After inhibition, the top-winner neuron and its 3 x 3 neighbors are allowed to fire and update their synapses. We use an updating technique called lobe component analysis [52]. See Appendix A for details.

The motor cortex neurons develop using the same five steps as above, but there is no top-down input, so Eq. 4.1 does not have a top-down part. The response $\bar{z}^{(M)}$ is computed in the same way otherwise, with its own parameter $k$ controlling the number of non-inhibited neurons.

4.2 6-layer architecture

The architecture of the feature-detection cortex of the 6-layer architecture is sketched in Fig. 4.3. Layer L1 is connected to the sensory input in a one-to-one fashion; there is one neuron matched with each pixel, and the activation level of each neuron is equal to the intensity of the corresponding pixel (i.e., $\bar{z}^{(L1)}(t) = \bar{x}(t)$). We use no hand-designed feature detector (e.g. Laplacian of Gaussian, Gabor filters, etc.), as it
The other four layers1 are matched in functional-assistant pairs (referred as feedforward-feedback pairs in [6]). L6 assists L4 (called assistant layer for L4) and L5 assists L2/ 3. Layer L4 is globally connected to L1, meaning that each neuron in L4 has a connection to every neuron in L1. All the two-way connections between L4 and L6, and between L2/ 3 and L5, and also all the one-way connections from L4 to L2/ 3 are one-to-one and consant. In other words, each neuron in one layer is connected to only one neuron in the other layer at the same position in neural plane coordinates, and the weight of the connections is fixed to 1. Finally, neurons in the motor cortex are globally and bidirectionally connected to those in L4. There are no connections from L2/3 to L4. The stereo feature-detection cortex takes a pair of stereo rows from the sensory input array. Then it runs the following developmental algorithm. 1. Fetching input in L1 and imposing supervision signals (if any) in mo— tor cortex — L1 is a retina—like grid of neurons which captures the input and sends signals to L4 proportional to pixel intensities, without any further processing. Dur- ing developmental training phase, an external teacher mechanism sets the activation levels of the motor cortex according to the input. If nz- is the neuron representative for the disparity of the currently presented input, then the activation level of 11,- and its neighbors are set according to a triangular kernel centered on ”i- The activation level of all the other neurons is set to zero: — M i 2' ' K. z;- (t) = 1 "‘ .f (1(3) < (4.3) 0 1f d(z,j) 2 K. where d(z', j) is the distance between neuron ”2' and neuron nj in the neural plane, and h: is the radius of the triangular kernel. 1L2 and L3 are counted as one layer (L2/ 3) 31 (M Then the activation level of motor neurons from the previous time step, 2]. )(t - 1), is projected onto L2/3 neurons via top-down connections. 2. Pre-response in L4 and L2/3 — Neurons in L4(L2/3) compute their pre- response (response prior to competition) solely based on their bottom-up(top—down) input. They use the same equation as in Eq. 4.1, except L4 only has bottom-up and L2 / 3 only has top—down. SWIM) . 23$“) (t) .(L4) _ z. (t) L4) (4-4) ,i _ “Mammal-2‘] (an and é(L2/3)(,.) _ @2122/3) (L2/3) z Ham/3N0“Hafiz/”(t)” .22. t) (4.5) 3. L6 and L5 provide modulatory signals to L4 and L2 / 3 — L6 and L5 receive the firing pattern of L4 and L2/3, respectively, via their one-to-one connections. Then they send modulatory signals back to their paired layers, which will enable the functional layers to do long-range lateral inhibition in the next step. 4. Response in L4 and second pre-response in L2/ 3 — Provided by feedback signals from L6, the neurons in L4 internally compete via lateral inhibition. The mechanism for inhibition is the same as described in Step 4 of single-layer architec- ture. The same mechanism concurrently happens in L2/ 3 assisted by L5, except the output of L2/ 3 is called the second pre—response (denoted by §§L2/3) (t)). 5. Response in L2/3 — Each neuron, 11,; in L2/3 receives its bottom-up input (L2/3) (t) z' from one—to—one connection with the corresponding neuron in L4 (i.e. b 22(L4) (t)). Then it applies the following formula to merge bottom-up and top-down 32 information and compute its response. (1.2/3) z, (t) = (1 — a) . bzle/ 3) (t) + a . 2?” 3) (t) (4.6) where a is the relative top-down coefficient. We will discuss the effect of this pa- rameter in detail in Section 6.2.1. 6a. 
Response of Motor Neurons in Testing — The activation levels of the motor neurons are not imposed during testing; rather, they are computed utilizing the output of the feature-detection cortex, and used as context information in the next time step. The motor neurons take their input from L2/3 (i.e., $\bar{b}^{(M)}(t) = \bar{z}^{(L2/3)}(t)$). Then, they compute their response using the same equation as in Eq. 4.4, and laterally compete. The response of the winner neurons is scaled using the same algorithm as in Eq. 4.2 (with a different $k$ for the motor layer), and the response of the rest of the neurons is suppressed to zero. The output of the motor layer is the response-weighted average of the disparities of the winner neurons:
$$
\text{disparity} = \frac{\displaystyle\sum_{n_i \text{ is winner}} z_i^{(M)}(t)\, d_i}{\displaystyle\sum_{n_i \text{ is winner}} z_i^{(M)}(t)}
$$
where $d_i$ is the disparity level that the winner neuron $n_i$ is representative for.

6b. Hebbian Updating with LCA in Training — The top winner neurons in L4 and the motor cortex, and also their neighbors in the neural plane (excited by 3 x 3 short-range lateral excitatory connections), update their bottom-up connection weights. Lobe component analysis (LCA) [52] is used as the updating rule. See Appendix A for details. Afterwards, the motor cortex bottom-up weights are directly copied to the L4 top-down weights.

Figure 4.1: (a) The binocular network single-layer architecture for classification. (b) The binocular network 6-layer architecture for regression. Two image patches are extracted from the same image position in the left and right image planes. Feature-detection cortex neurons self-organize from bottom-up and top-down signals. Each motor neuron is marked by the disparity it represents (ranging from -8 to +8). Each circle is a neuron. The activation level of the neurons is shown by the darkness of the circles: the higher the activation, the darker the neuron is depicted. The diagram shows an instance of the network during the training phase when the disparity of the presented input is -4. In (a) the stereo feature-detection cortex is a single layer of LCA neurons. A rectangular kernel sets the activation of only the Disparity -4 neuron to 1 and all the others to 0. In (b), the stereo feature-detection cortex has a 6-layer laminar architecture (see Fig. 4.3). A triangular kernel, centered at the neuron of Disparity -4, sets the activation levels of the Disparity -4 neuron and four of its neighbors to positive values and all the others to 0.

Figure 4.2: Examples of input, which consists of two rows of 20 pixels each. The top row is from the left view and the bottom row is from the right view. The numbers on the left side of the bars exhibit the amount of shift/disparity.

Figure 4.3: Architecture diagram of the 6-layer laminar cortex studied in this paper, which also introduces some notation. The numbers in circles are the steps of the algorithm described in Section 4. See the text for notations.
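Before moving to the analysis, the following compact sketch shows how the pieces of this chapter fit together: the blended pre-response of Eq. 4.1, the k-winner lateral inhibition of Eq. 4.2, and the response-weighted disparity read-out of the motor layer. This is an illustrative Python sketch, not the author's C++ implementation; all array sizes, parameter values, and function names are assumptions made for the example.

```python
import numpy as np

def pre_response(x, z, W_b, W_t, alpha=0.5):
    """Pre-responses of all neurons (Eq. 4.1): a (1-alpha)/alpha blend of
    normalized bottom-up and top-down inner products."""
    eps = 1e-12
    bottom_up = (W_b @ x) / (np.linalg.norm(W_b, axis=1) * np.linalg.norm(x) + eps)
    top_down = (W_t @ z) / (np.linalg.norm(W_t, axis=1) * np.linalg.norm(z) + eps)
    return (1 - alpha) * bottom_up + alpha * top_down

def k_winner(pre, k):
    """Lateral inhibition (Eq. 4.2): keep the top-k pre-responses, scale them
    linearly by rank, and suppress the rest to zero."""
    ranks = np.argsort(np.argsort(-pre))        # rank 0 = strongest neuron
    scale = np.where(ranks < k, (k - ranks) / k, 0.0)
    return scale * pre

def read_out_disparity(motor_response, disparity_levels):
    """Regression output: response-weighted average disparity of the winners."""
    total = motor_response.sum()
    return float((motor_response * disparity_levels).sum() / total) if total > 0 else 0.0

# Toy usage with made-up sizes: 40 input pixels, 17 motor neurons (-8..+8),
# 100 feature-detection neurons.
rng = np.random.default_rng(0)
x, z = rng.random(40), rng.random(17)
W_b, W_t = rng.random((100, 40)), rng.random((100, 17))
feature_response = k_winner(pre_response(x, z, W_b, W_t), k=5)
motor_response = k_winner(rng.random(17), k=3)   # stand-in for the motor layer
print(read_out_disparity(motor_response, np.arange(-8, 9)))
```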
Chapter 5

Analysis

5.1 Elongated Input Fields Using Top-down

The neighborhood of the input space to which a neuron $n_i$ is tuned (the neuron wins given input from that neighborhood) is called the spatial input field¹ of that neuron, denoted by $\Omega_i \subset \mathbb{R}^n$. We assume that for each neuron $n_i$ the subspace $\Omega_i$ has a uniform distribution² along any direction (axis) $d$, with mean value $\mu_{i,d}$ and standard deviation $\sigma_{i,d}$. The $d$-th element of the input vector $\vec{x}$ is denoted by $x_d$.

¹ A plot of the relationship between position in the input field and neural response [9]. It is also referred to as the input field profile.
² A reasonable assumption, given that the data are patches from natural images.

Proposition 1: The higher the variation of the data along a direction in the input field of a neuron, the less that direction of the input contributes to the neuron's chance of winning the lateral competition.

According to the principles of LCA learning [49], after development each neuron $n_i$ is tuned to the mean value of its input field³, $\mu_{i,d}$, along any direction $d$. Therefore, the average deviation of the input from the neuron's tuned weight is $\sigma_{i,d}$ along any direction $d$. The larger this deviation $\sigma_{i,d}$ is, the less statistically probable it is that the input matches the neuron's tuned weight along that direction, which in turn implies that $x_d$ contributes less to the neuron's final chance of winning the lateral competition with the other neurons in the same layer.

³ From now on, wherever we refer to the "input field" we mean the "input field profile", or equivalently the "spatial input field".

Proposition 2: Top-down connections help neurons develop input fields with higher variation along the irrelevant dimensions of the input (elongated input fields).

Given a uniform distribution of the input data, the neurons always develop in such a way that the input space is divided equally among their input fields, in a manner similar to Voronoi diagrams. In other words, they develop semi-isomorphic input fields. Therefore, we expect that

$$\sigma_{i,d_1} = \sigma_{i,d_2} \qquad (5.1)$$

for any neuron $n_i$ and any directions $d_1$ and $d_2$ along the uniform distribution manifold. However, when the neurons develop using top-down input, the projection of their input fields onto the bottom-up input space is not isomorphic anymore. Instead, the bottom-up input field of each neuron is elongated along the direction of the irrelevant input (see Fig. 5.1). Assuming a linear dependence of $Z$ on $X_R$ (in Fig. 5.1), we have

$$\sigma_{i,d_{irr}} = \beta A\, \sigma_{i,d_{rel}} \qquad (5.2)$$

where $d_{irr}$ and $d_{rel}$ respectively denote any irrelevant and any relevant dimension of the bottom-up input, and $\beta$ and $A$ are constants. According to triangle similarity (see Fig. 5.1), when we project the input space onto the bottom-up space, the constant $A$ is a function of the ratio of the range of the top-down input, $z_m$, to that of the bottom-up input, $x_m$:

$$A = \frac{\sqrt{x_m^2 + z_m^2}}{x_m} = \sqrt{1 + \left(\frac{z_m}{x_m}\right)^2} \qquad (5.3)$$

where $x_d \in (0, x_m)$ and $z_d \in (0, z_m)$ for any direction $d$. Hence⁴

$$A > 1 \qquad (5.4)$$

⁴ $A = \sqrt{2}$ given $z_m = x_m$.

The value of $\beta$ is a function of the relative top-down coefficient $\alpha$ in Eq. 4.1, and also of the ratio of the numbers of relevant and irrelevant dimensions of the input. For the settings used in this paper, an estimate of $\beta$ is

$$\beta \approx \alpha\,\frac{\dim(\vec{x})}{\dim(\vec{z})} = 0.4 \times \frac{32}{8} = 1.6 \qquad (5.5)$$

where $\dim(\vec{x})$ and $\dim(\vec{z})$ are the average⁵ numbers of dimensions (numbers of elements) in the bottom-up and top-down input vectors. Therefore, the following inequality always holds:

$$\beta > 1 \qquad (5.6)$$

⁵ The dimensions change according to the degree of disparity (see Fig. 5.2).

Equations 5.2, 5.4, and 5.6 together imply that

$$\sigma_{i,d_{irr}} > \sigma_{i,d_{rel}} \qquad (5.7)$$

that is, the variation of the input fields of the neurons is higher along the irrelevant dimensions, and the reasoning is complete. Combining Proposition 1 and Proposition 2, we conclude:

Theorem 1: As a result of top-down connections, neurons autonomously develop input fields in which they are relatively less sensitive to irrelevant parts of the input.

Figure 5.1: Each circle represents a neuron, and the shade of the circle represents the degree of disparity the neuron is tuned to (from near to far disparity). The areas shown around the neurons are their input fields. (a) The quantization of the input space by neurons without top-down input. The input fields of the neurons have the same amount of variation along the relevant and the irrelevant input directions (shown as squares for the sake of visual simplicity; they should be Voronoi regions). (b) The quantization of the input space by neurons with top-down input. For simplicity we assume that there is a linear relation between the relevant part of the bottom-up input, $X_R$, and the top-down input, $Z$. The input fields of the neurons are still isomorphic (shown as squares) on the input manifold; however, the projection of the input fields onto the bottom-up space is no longer isomorphic, but elongated along the irrelevant axis.

Figure 5.2: Top-down connections enable neurons to pick up relevant receptive fields. If a neuron is supervised by the top-down connections to detect a particular disparity $d$, the irrelevant subspace includes those areas where the object images do not overlap, i.e., $\vec{x}_{il}$ and $\vec{x}_{ir}$. The first subindex indicates whether it is the irrelevant or the relevant part of the input space ($i$ and $r$ respectively), and the second subindex shows whether it is from the left view or the right view ($l$ and $r$ respectively).
5.2 Top-down Connections Help Recruit Neurons More Efficiently

According to the rules of in-place learning [48], neurons do not know whether their inputs come from bottom-up or top-down sources, nor do they know where they sit in the cortical architecture. Each neuron can be thought of as an autonomous agent that learns on its own, without the help of any controlling mechanism from outside. Adding top-down connections to a neuron increases its input dimensionality from $X$ to $X \times Z$, where

$$U = X \times Z = \{(x, z) \mid x \in X,\; z \in Z\} \qquad (5.8)$$

and $\times$ is the Cartesian product operator, meaning that the new space $X \times Z$ includes inputs from both the bottom-up and the top-down input spaces. $X$ and $Z$ are respectively the bottom-up and top-down input spaces, defined as follows:

$$X = \{\,x = \vec{b}_i \mid \vec{b}_i \text{ is the bottom-up weight vector of any neuron } n_i\,\} \qquad (5.9)$$

$$Z = \{\,z = \vec{e}_i \mid \vec{e}_i \text{ is the top-down weight vector of any neuron } n_i\,\} \qquad (5.10)$$

In general, the bottom-up input space $X$ of each neuron in a cortical area is composed of the relevant subspace $R$, the part of the input that is related to the motor output, and the irrelevant subspace $I$, the part of the input that is not related to the output:

$$X = R \times I \qquad (5.11)$$

It is evident that the top-down input from the space $Z$ is relevant to the output. Thus, we write:

$$U = X \times Z = R \times I \times Z \qquad (5.12)$$
We now compare how the single-layer and the 6-layer architectures combine these two sources of energy. Because of the spatial continuity of the visual stimuli, the bottom-up stimuli presented to the network change very little from one time step to the next, whereas the top-down context/supervision vector is relatively more variant. As a result we have

$$\mathrm{var}(E_b) \ll \mathrm{var}(E_e) \qquad (5.15)$$

where $E_b$ and $E_e$ are two random variables that can take any of the values $E_{b,i}$ and $E_{e,i}$ (the bottom-up and top-down energies of neuron $n_i$), respectively. Here we show that, as a result of this lack of variation in the bottom-up stimuli, in such a single-layer architecture the activation levels of the feature-detection neurons are determined almost entirely by the top-down energy, and the bottom-up energy is almost discarded. Obviously, this greatly reduces the performance of the network, since the top-down context signals are misleading whenever the input to the network at time $t$ is considerably different from the input at time $t-1$. We call this effect "hallucination".

Let us define $\tilde{E}_b = E_b - \bar{E}_b I$, where $\bar{E}_b$ is the mean value of the elements of $E_b$ (a scalar) and $I$ is the unit matrix of the same size as $E_b$. Define $\tilde{E}_e = E_e - \bar{E}_e I$ in the same manner, and let $\tilde{z} = (1-\alpha)\,\tilde{E}_b + \alpha\,\tilde{E}_e$. Since $\tilde{z}$ differs from $z$ only by a constant term, we have

$$\mathrm{rank}(z_i) = \mathrm{rank}(\tilde{z}_i) \qquad (5.16)$$

that is, the rank of each element $z_i$ in $z$ is the same as the rank of the corresponding element $\tilde{z}_i$ in $\tilde{z}$. In addition, the rank of each element $\tilde{z}_i = (1-\alpha)\tilde{E}_{b,i} + \alpha\tilde{E}_{e,i}$ is determined mostly by its top-down component $\tilde{E}_{e,i}$, because Eq. 5.15 implies that for most of the neurons the absolute value of the top-down component is much greater than the absolute value of the bottom-up component, i.e., $|\tilde{E}_{e,i}| \gg |\tilde{E}_{b,i}|$. Hence, the ranking of the neurons' activations is affected largely by their top-down components alone, and the reasoning is complete.

On the other hand, in the case of the 6-layer architecture (the bottom row in Fig. 5.4), the bottom-up and top-down energies are ranked separately in L4 and L2/3, respectively, before they are mixed and compete again to decide the winner neurons in L2/3. Therefore, as a result of the separation of bottom-up and top-down energies in different laminar layers, the 6-layer architecture manages to outperform the single-layer architecture, especially when the imperfect top-down context signals are active (as opposed to supervision top-down signals, which are always correct).

Figure 5.4: The mechanisms of neuron winner selection (via lateral inhibition) in the single-layer and 6-layer architectures. The maps are snapshots of the 20 x 20 neurons of the networks operating on real data. Each small square shows the value for the neuron at that position (black: minimum; white: maximum). The top row shows the steps of the single-layer architecture, and the bottom row shows the steps of the 6-layer architecture (which shares some steps with the single-layer architecture). The symbol $\oplus$ represents the operation of taking the weighted average of two vectors (similar to Eq. 4.6).
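The following small numerical sketch (Python/NumPy; the numbers are made up purely for illustration) contrasts the two orderings. When the bottom-up energies barely vary across neurons (Eq. 5.15), mixing before ranking lets the top-down term dictate the winners almost entirely, whereas ranking and sparsifying the bottom-up and top-down energies separately, as L4 and L2/3 do, preserves the bottom-up evidence in the final ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.5
n = 8  # neurons

# Illustrative energies: bottom-up nearly flat, top-down highly variant.
E_b = 0.50 + 0.01 * rng.standard_normal(n)   # var(E_b) << var(E_e)
E_e = rng.uniform(0.0, 1.0, n)               # possibly misleading context

def rank_desc(v):
    """Rank of each element, 0 = strongest."""
    order = np.argsort(-v)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(v))
    return ranks

# Single-layer: mix first, then rank -- the ordering follows E_e almost exactly.
mixed = (1 - alpha) * E_b + alpha * E_e
print(rank_desc(mixed))
print(rank_desc(E_e))        # nearly identical ordering

# 6-layer: rank (and sparsify) each energy separately, then mix.
def top_k(v, k=3):
    out = np.zeros_like(v)
    idx = np.argsort(-v)[:k]
    out[idx] = v[idx]
    return out

merged = (1 - alpha) * top_k(E_b) + alpha * top_k(E_e)
print(rank_desc(merged))     # bottom-up winners still influence the outcome
```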
5.4 Recovery from Hallucination

Fig. 5.5 gives an intuitive illustration of how ranking top-down and bottom-up energy separately, as done in the 6-layer laminar architecture, leads to recovery from a hallucination state, while the single-layer architecture cannot recover. This analysis is consistent with the results presented in Fig. 6.5. In Fig. 5.5, the input space of the neurons is shown on the two axes: top-down input is represented by the horizontal axis and bottom-up input by the vertical axis. The input signals to the networks are depicted as filled curves along the axes. The distributions of the two classes A and B are shown as rounded rectangles, which are wider along the direction of the top-down input since, as discussed in Section 5.3, top-down input is more variant than bottom-up input, which results in the recruitment of neurons more along the top-down direction (Proposition 2).

The two classes are shown as linearly separable⁸ along the direction of the top-down input, but not along the bottom-up input, because top-down signals are always relevant during training. We assume that only the top 2 neurons fire (i.e., $k = 2$). In the single-layer architecture (Fig. 5.5a), given an input whose top-down component wrongly indicates class B while the input actually belongs to class A (e.g., when the context is unrelated to the bottom-up input), the network is trapped in a hallucination state, because the high variation of the top-down signal leaves very little chance for the input to lie close to the neurons of class A. Fig. 5.5a illustrates that receiving a similar bottom-up input at time $t+1$ (according to the spatial continuity of the input) does not change the situation. In the 6-layer architecture, on the other hand, the neurons compete for top-down energy (in L2/3) and bottom-up energy (in L4) separately. In the far-left plot of the first row of Fig. 5.5b, two neurons of class B have high pre-responses because of the wrong (misleading) top-down input, and two neurons of class A have high pre-responses because of the right (correct) bottom-up input. As a result, there is a high chance that some winners come from the class A neurons. As the new sample arrives at time $t+1$ (with the same or a very similar bottom-up component, due to the spatial continuity of the input), it is expected that only neurons of the correct class A win, as both their bottom-up and top-down components are closer to the input. Finally, the network recovers in the far-right plot of Fig. 5.5b: both winner neurons belong to the correct class A, and the top-down input is right from then on.

⁸ Shown as linearly separable only for the sake of illustration simplicity in the figures.

Figure 5.5: Recovery from hallucination in (a) the single-layer and (b) the 6-layer architecture. (Legend: highest pre-response neurons; winner neurons; non-winner neurons; input with wrong top-down component; input with right top-down component.)

Experiment A ($\kappa = 5$): Several neighboring motor neurons are activated by the triangular kernel ($\kappa = 5$ in Eq. 4.3) during training. Fig. 6.7b shows the effect on the topographic maps: with $\kappa = 5$, regions sensitive to nearby disparities quite often reside next to each other and change gradually in the neural plane (in many areas of Fig. 6.7b the colors change smoothly from dark blue to red).

Experiment B ($\kappa = 1$): However, if we define disparity detection as a classification problem and set $\kappa = 1$ in Eq. 4.3 (only one neuron active in the motor layer), then there is no smoothness in the change of the disparity sensitivity of the neurons in the neural plane.

These observations are consistent with recent physiological findings about the smooth change of stimulus preference in topographic maps in the brain [8], and in disparity maps in particular [7, 43].
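As a concrete illustration of the difference between the two experiments, the short sketch below (Python; purely illustrative) prints the motor supervision vectors produced by Eq. 4.3 for $\kappa = 5$ (regression: a graded, multi-neuron target) and $\kappa = 1$ (classification: a one-hot target). The 17-neuron motor layout and the neuron index used as the center are our own assumptions.

```python
import numpy as np

def supervision(center, n_motor, kappa):
    """Triangular supervision of Eq. 4.3, with distance measured in neural-plane positions."""
    d = np.abs(np.arange(n_motor) - center)
    return np.where(d < kappa, 1.0 - d / kappa, 0.0)

# Hypothetical 17-neuron motor layer covering disparities -8 .. +8.
print(supervision(center=4, n_motor=17, kappa=5))  # Experiment A: graded neighborhood target
print(supervision(center=4, n_motor=17, kappa=1))  # Experiment B: single active neuron
```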
Figure 6.1: Bottom-up weights of the 40 x 40 neurons in the feature-detection cortex developed using top-down connections. The connections of each neuron are depicted as 2 rows, each 20 pixels wide: the top row shows the weights of the connections to the left image, and the bottom row shows the weights of the connections to the right image.

Figure 6.2: The recognition rate versus the number of training samples. The performance of the network was tested with 1000 testing inputs after each block of 1000 training samples. (Curves: with top-down projection, Top-50; with top-down projection, Top-1; without top-down projection, Top-1.)

Figure 6.3: The class probability of the 40 x 40 neurons of the feature-detection cortex. (a) Top-down connections are active ($\alpha = 0.5$) during development. (b) Top-down connections are not active ($\alpha = 0$) during development.

Figure 6.4: The effect of top-down projection on the purity of the neurons and the performance of the network. Increasing $\alpha$ in Eq. 4.1 results in purer neurons and better performance. (Entropy versus the relative top-down coefficient; curves: training entropy, testing entropy.)

Figure 6.5: How temporal context signals and the 6-layer architecture improve the performance. (Root mean square error in pixels versus epochs of training; curves: single-layer architecture with context enabled, single-layer architecture with context disabled, 6-layer architecture with context disabled, 6-layer architecture with context enabled, Euclidean SOM updating rule, dot-product SOM updating rule.)
Figure 6.6: The effect of the relative top-down coefficient, $\alpha$, on performance in a disjoint recognition test on randomly selected training data. (Mean root square error in pixels versus $\alpha$; curves: single-layer with context enabled, single-layer with context disabled, 6-layer with context disabled, 6-layer with context enabled.)

Figure 6.7: (a) Map of neurons in V2 of macaque monkeys evoked by stimuli with 7 different disparities; adapted from Chen et al. 2008 [7]. (b) Disparity-probability vectors of L2/3 neurons for different disparities when $\kappa = 5$. (c, e) Disparity-probability maps in L2/3, where $\kappa = 5$ in (c) and $\kappa = 1$ in (e). (d, f) Cross-correlation of disparity probability, where $\kappa = 5$ in (d) and $\kappa = 1$ in (f).

Chapter 7

Conclusion

The lack of computational experiments on real-world data in previous works has led to the oversight of the role of sparse coding in neural representation in models of the laminar cortex. Sparse coding of the input is computationally advantageous for both bottom-up and top-down input, especially when the input is complex. Therefore, we hypothesize that cortical circuits probably have a mechanism to sparsely represent top-down and bottom-up input. Our model suggests that the brain computes a sparse representation of bottom-up and top-down input independently, before it integrates them to decide the output of the cortical region. Thus, we predict the following.

Prediction 1: What is known as Layer 2/3 in the cortical laminar architecture¹ has two functional roles:

1. Rank and scale the top-down energy received at the cortex (modulated by signals from L5).

2. Integrate the modulated bottom-up energy received from L4 with the modulated top-down energy received from higher cortical areas to determine the output signals of the cortex.

¹ Marked as Level 2, layers 2 through 4B, in [5], Figure 2.

Neuroscientists have known for a long time that there are sublayers in the laminar cortex [23]. However, the functionality of these sublayers has not been modeled before, and this work is a step towards understanding the sublayer architecture of the laminar cortex. Our prediction breaks the functionality of L2/3 down into two separate tasks. This differs from previous models (e.g., [5]), which consider L2/3 to be one functional layer. Fig. 7.1 illustrates the result of an experiment in which we compared two models of L2/3.
In the traditional model of L2/3, it is modeled as one functional layer that integrates the sparse-coded signals received from L4 with the top-down energy, while in the novel model used in this thesis, L2/3 acts as two functional layers (see Prediction 1).

Figure 7.1: Comparison of our novel model of L2/3, in which it performs both sparse coding and integration of top-down and bottom-up signals, with traditional models in which it only performs integration. (Performance versus epochs of training; curves: L2/3 as one functional layer, integration only; L2/3 as two functional layers, sparse coding and integration.)

We presented the first spatio-temporal model of the 6-layer architecture of the cortex that incorporates temporal aspects of the stimuli in the form of top-down context signals. It outperformed simpler single-layer models of the cortex by a significant amount. Furthermore, defining the problem of binocular disparity detection as a regression problem, by training a few nearby neurons to respond to the presented stimuli (as opposed to only one neuron in the case of classification), resulted in the biologically observed smoothly changing disparity sensitivity along the neural layers. Since the brain generates actions through numerical signals (spikes) that drive muscles and other internal body effectors (e.g., glands), regression (output signals) seems closer to what the brain does than many classification models published in the literature. The regression extension of MILN [50] has a potentially wide scope of application, from autonomous robots to machines that can learn to talk. A major open challenge is the complexity of the motor actions to be learned and autonomously generated.

As presented here, an emergent-representation-based binocular system has shown disparity detection abilities with sub-pixel accuracy. In contrast with engineering methods that use explicit matching between left and right search windows, a remarkable computational advantage of our work is the potential for integrated use of a variety of image information in tasks that require disparity as well as other visual cues.

Our model suggests a computational reason why there is no top-down connection from L2/3 to L4 in the laminar cortex: to prevent the top-down and bottom-up energies received at the cortex from mixing before they internally compete to sort out winners. Hence, we predict that the thick layer L2/3 in the laminar cortex carries out more functionality than has been proposed in previous models: it provides a sparse representation for top-down stimuli, combines the top-down and bottom-up sparse representations, and projects the output of the cortical region to higher cortices. Utilization of more complex temporal aspects of the stimuli, and the use of real-time stereo movies, will be part of our future work.

Appendix A

Neuronal Weight Updating

For a winner cell $i$, the weights are updated using the lobe component updating principle [52]; that reference also provides a theoretical perspective on the following. Each winner neuron updates using its own internal, temporally scheduled plasticity as

$$w_{b,i}(t) = \beta_1\, w_{b,i}(t-1) + \beta_2\, z_i(t)\, b_i(t)$$

where the scheduled plasticity is determined by two age-dependent weights:

$$\beta_1 = \frac{m_i - 1 - \mu(m_i)}{m_i}, \qquad \beta_2 = \frac{1 + \mu(m_i)}{m_i} \qquad (A.1)$$

with $\beta_1 + \beta_2 \equiv 1$. Finally, the cell age (maturity) $m_i$ of each winner neuron is incremented: $m_i \leftarrow m_i + 1$. All non-winner neurons keep their ages and weights unchanged. In Eq. (A.1), $\mu(m_i)$ is the plasticity function, which depends on the maturity $m_i$ of neuron $i$. The maturity is incremented every time the neuron updates its weights, starting from zero. The plasticity function prevents the learning rate from converging to zero. Details are presented in [52].
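To make the update rule concrete, here is a minimal Python sketch of the amnesic LCA update of Eq. (A.1). The plasticity function mu() is only a placeholder of the kind described in [52]; its exact form and parameters, as well as the convention of counting the age from 1 at the first update, are our assumptions rather than the thesis' settings.

```python
import numpy as np

def mu(m, t1=10.0, t2=1000.0, c=2.0, r=10000.0):
    """Placeholder plasticity (amnesic) function; form and parameters are assumptions."""
    if m < t1:
        return 0.0
    if m < t2:
        return c * (m - t1) / (t2 - t1)
    return c + (m - t2) / r

def lca_update(w, m, z, x):
    """One amnesic LCA update of a winner neuron's bottom-up weight vector (Eq. A.1).

    w : current weight vector, m : neuron age, z : neuron response, x : bottom-up input.
    Returns the updated weight vector and the incremented age.
    """
    m = m + 1                        # age counted from 1 at the first update (our convention)
    beta1 = (m - 1.0 - mu(m)) / m
    beta2 = (1.0 + mu(m)) / m        # beta1 + beta2 == 1
    return beta1 * w + beta2 * z * x, m

# Example: first update fully adopts the response-weighted input.
w, age = lca_update(np.zeros(4), 0, 0.8, np.array([1.0, 0.5, 0.0, -0.5]))
```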
BIBLIOGRAPHY

[1] A. Anzai, I. Ohzawa, and R. D. Freeman. Neural mechanisms underlying binocular fusion and stereopsis: position vs. phase. In Proc. Natl. Acad. Sci., pages 5438-5443, 1997.
[2] H. B. Barlow, C. Blakemore, and J. D. Pettigrew. The neural mechanism of binocular depth discrimination. J. Physiol., 193:327-342, 1967.
[3] P. O. Bishop. Vertical disparity, egocentric distance and stereoscopic depth constancy: a new interpretation. In Proc. R. Soc. London Ser., pages 445-469, 1989.
[4] W. H. Bosking, Y. Zhang, B. Schofield, and D. Fitzpatrick. Orientation selectivity and arrangement of horizontal connections in tree shrew striate cortex. Journal of Neuroscience, 17:2112-2127, 1997.
[5] E. M. Callaway. Local circuits in primary visual cortex of the macaque monkey. Annual Review of Neuroscience, 21:47-74, 1998.
[6] E. M. Callaway. Feedforward, feedback and inhibitory connections in primate visual cortex. Neural Networks, 17(5-6):625-632, 2004.
[7] G. Chen, H. D. Lu, and A. W. Roe. A map for horizontal disparity in monkey V2. Neuron, 58(3):442-450, May 2008.
[8] D. B. Chklovskii and A. A. Koulakov. Maps in the brain: What can we learn from them? Annual Review of Neuroscience, 27:369-392, 2004.
[9] B. Cumming. Stereopsis: how the brain sees depth. Current Biology, 7(10):645-647, 1997.
[10] Y. Dan and M. Poo. Spike timing-dependent plasticity: From synapses to perception. Physiological Review, 86:1033-1048, 2006.
[11] G. C. DeAngelis. Seeing in three dimensions: the neurophysiology of stereopsis. Trends in Cognitive Sciences, 4(3), 2000.
[12] U. R. Dhond and J. K. Aggarwal. Structure from stereo: a review. IEEE Transactions on Systems, Man and Cybernetics, 19(6):1489-1510, Nov./Dec. 1989.
[13] D. J. Felleman and D. C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1:1-47, 1991.
[14] D. J. Fleet, A. D. Jepson, and M. R. M. Jenkin. Phase-based disparity measurement. CVGIP: Image Understanding, 53:198-210, 1991.
[15] A. Franz and J. Triesch. Emergence of disparity tuning during the development of vergence eye movements. In International Conference on Development and Learning, pages 31-36, 2007.
[16] G. F. Poggio and B. Fischer. Binocular interaction and depth sensitivity of striate and prestriate cortex of behaving rhesus monkey. J. Neurophysiol., 40:1392-1405, 1977.
[17] C. D. Gilbert and T. N. Wiesel. Microcircuitry of the visual cortex. Annu. Rev. Neurosci., 6:217-247, 1983.
[18] F. Gonzalez and R. Perez. Neural mechanisms underlying stereoscopic vision. Progress in Neurobiology, 55:191-224, 1998.
[19] W. E. L. Grimson. From Images to Surfaces: A Computational Study of the Human Early Visual System. MIT Press, 1981.
[20] J. Weng, T. Luwang, H. Lu, and X. Xue. A multilayer in-place learning network for development of general invariances. International Journal of Humanoid Robotics, 4(2), 2007.
[21] J. Wiemer, T. Burwick, and W. Seelen. Self-organizing maps for visual feature representation based on natural binocular stimuli. Biological Cybernetics, 82(2):97-110, 2000.
[22] B. Julesz. Foundations of Cyclopean Perception. University of Chicago Press, Chicago, 1971.
[23] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, editors. Principles of Neural Science. Appleton & Lange, Norwalk, Connecticut, 3rd edition, 1991.
[24] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, editors. Principles of Neural Science. McGraw-Hill, New York, 4th edition, 2000.
[25] T. Kohonen. Self-Organizing Maps. Springer, Berlin, 1997.
[26] S. R. Lehky and T. J. Sejnowski. Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. The Journal of Neuroscience, 10(7):2281-2299, July 1990.
[27] J. Lippert, D. J. Fleet, and H. Wagner. Disparity tuning as simulated by a neural net. Journal of Biocybernetics and Biomedical Engineering, 83:61-72, 2000.
[28] M. D. Luciw and J. Weng. Motor initiated expectation through top-down connections as abstract context in a physical world. In Proc. 7th International Conference on Development and Learning (ICDL'08), 2008.
[29] M. D. Luciw and J. Weng. Topographic class grouping with applications to 3D object recognition. In Proc. International Joint Conf. on Neural Networks, Hong Kong, June 2008.
[30] D. Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, New York, 1982.
[31] R. Miikkulainen, J. A. Bednar, Y. Choe, and J. Sirosh. Computational Maps in the Visual Cortex. Springer, Berlin, 2005.
[32] T. Nikara, P. O. Bishop, and J. D. Pettigrew. Analysis of retinal correspondence by studying receptive fields of binocular single units in cat striate cortex. Exp. Brain Res., 6:353-372, 1968.
[33] National Eye Institute (NEI) of the U.S. National Institutes of Health. http://www.nei.nih.gov/photo (first visited 04/24/09).
[34] I. Ohzawa, G. C. DeAngelis, and R. D. Freeman. Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science, 249:1037-1041, 1990.
[35] A. J. Parker. Binocular depth perception and the cerebral cortex. Nature Reviews Neuroscience, 8(5):379-391, 2007.
[36] G. F. Poggio, F. Gonzalez, and F. Krause. Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. The Journal of Neuroscience, 8:4531-4550, December 1988.
[37] N. Qian. Binocular disparity and the perception of depth. Neuron, 18:359-368, 1997.
[38] R. D. Raizada and S. Grossberg. Towards a theory of the laminar architecture of cerebral cortex: computational clues from the visual system. Cerebral Cortex, 13(1):100-113, January 2003.
[39] T. Ramtohul. A self-organizing model of disparity maps in the primary visual cortex. Master's thesis, School of Informatics, University of Edinburgh, 2006.
[40] J. C. A. Read. Early computational processing in binocular vision and depth perception. Progress in Biophysics and Molecular Biology, 87:77-108, 2005.
[41] J. C. A. Read and B. G. Cumming. Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, September 2007.
[42] J. C. A. Read, A. J. Parker, and B. G. Cumming. A simple model accounts for the reduced response of disparity-tuned V1 neurons to anti-correlated images. Visual Neuroscience, 19:735-753, 2002.
[43] A. W. Roe, A. J. Parker, R. T. Born, and G. C. DeAngelis. Disparity channels in early vision. The Journal of Neuroscience, 27(44):11820-11831, October 2007.
[44] P. R. Roelfsema and A. van Ooyen. Attention-gated reinforcement learning of internal representations for classification. Neural Computation, 17:2176-2214, 2005.
[45] Y. F. Sit and R. Miikkulainen. Self-organization of hierarchical visual maps with feedback connections. Neurocomputing, 69:1309-1312, 2006.
[46] M. Solgi and J. Weng. Developmental stereo: Topographic iconic-abstract map from top-down connection. In Proc. the First of the Symposia Series New Developments in Neural Networks (NNN'08), 2008.
[47] J. Weng. Image matching using the windowed Fourier phase. International Journal of Computer Vision, 11(3):211-236, 1993.
[48] J. Weng and M. D. Luciw. Optimal in-place self-organization for cortical development: Limited cells, sparse coding and cortical topography. In Proc. 5th International Conference on Development and Learning (ICDL'06), Bloomington, IN, May 31 - June 3, 2006.
[49] J. Weng and M. D. Luciw. Dually optimal neural layers: Lobe component analysis. IEEE Transactions on Autonomous Mental Development, 1, 2009.
[50] J. Weng, T. Luwang, H. Lu, and X. Xue. Multilayer in-place learning networks for modeling functional layers in the laminar cortex. Neural Networks, 21:150-159, 2008.
[51] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and E. Thelen. Autonomous mental development by robots and animals. Science, 291(5504):599-600, 2001.
[52] J. Weng and N. Zhang. Optimal in-place learning and the lobe component analysis. In Proc. World Congress on Computational Intelligence, Vancouver, Canada, July 16-21, 2006.
[53] P. Werth, S. Scherer, and A. Pinz. Subpixel stereo matching by robust estimation of local distortion using Gabor filters. In CAIP'99: Proceedings of the 8th International Conference on Computer Analysis of Images and Patterns, pages 641-648, London, UK, 1999. Springer-Verlag.
[54] A. K. Wiser and E. M. Callaway. Contributions of individual layer 6 pyramidal neurons to local circuitry in macaque primary visual cortex. Journal of Neuroscience, 16:2724-2739, 1996.
[55] C. L. Zitnick and T. Kanade. A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):675-684, July 2000.