THE EFFECT OF PERCEPTUAL INFORMATION ON THE ACTIVATION OF SCENE GIST: THE INFLUENCE OF COLOR AND STRUCTURE

By

Monica Sofia Castelhano

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

2005

ABSTRACT

THE EFFECT OF PERCEPTUAL INFORMATION ON THE ACTIVATION OF SCENE GIST: THE INFLUENCE OF COLOR AND STRUCTURE

By Monica Sofia Castelhano

Previous studies have shown that scene gist perception occurs extremely rapidly; however, many of these studies rely on participants’ explicit reports, making it unclear how soon after onset scene gist is able to influence subsequent behavioral responses. This dissertation examined the onset of scene gist activation and investigated two possible sources of information using a new methodology: the Contextual Bias paradigm. The Contextual Bias paradigm relies on the tendency of participants to affirm having perceived a target object that is consistent with the scene gist, and to disconfirm having perceived an object that is inconsistent. In this paradigm, participants judged whether an object (either consistent or inconsistent with the scene’s gist) was present in a scene that was briefly shown previously. If the scene presented is processed to the level of gist, then participants should be more likely to respond “yes” to consistent and “no” to inconsistent targets. If, however, the scene gist has not been processed, then participants should respond “yes” to both target types equally. Seven experiments were conducted to explore scene gist activation and how color and scene structure contribute to this activation. Experiments 1-3 demonstrate that scene gist is activated after 42 ms of scene presentation, and that the strength of the response bias (the response difference between consistent and inconsistent objects) increases with longer scene presentation durations. Experiments 4-6 examined the contribution of color and structure by manipulating the color (color vs. monochrome) and sharpness (sharp vs. blurred) of the scenes. Results showed that color influences gist activation later (80 ms), and only when structure was degraded (blurred). Thus, color plays a role in rapid gist activation, but only when the scene’s structural information is relatively more difficult to extract. Experiment 7 examined whether color acts as a boundary segmenter or is directly associated with gist information. Abnormally colored scenes were used to provide segmentation for equiluminant regions, but without any association with the scene’s gist. Results support the gist information hypothesis.
A weighted input framework of the interaction between structure and color is proposed to explain the results of the current and previous studies.

Copyright by
Monica Sofia Castelhano
2005

To my mother and father, who taught me to dream big and that being somebody means being true to yourself.

ACKNOWLEDGMENTS

As with any accomplishment (big or small), it couldn’t have been done without the help, love and support of a whole army of people. I have many people to thank. First, I’d like to thank my advisor John Henderson for all his guidance and support over the past five years. I also wish to thank Tom Carr, Erik Altmann, and Fred Dyer for serving as my guidance committee and for their words of advice over these past years. Also, to Aude Oliva for her advice and coffee in times of need.

I would’ve never made it through without my family. I thank my mother, Gloria Castelhano, who encouraged me to follow my dreams even when it meant making a hard situation even harder. I thank my sister, Marta Castelhano, who has always been there to cheer me up and listen to my gripes for hours on end. George Frade, whose kind words pushed me onto this path. To the rest of my family, who have always given me love and support despite my absence. I also thank my friends, Teresa Silva, Ana Cavacas, Patricia Ponte, Nancy Pinto, and Rui Vicente, who despite everything have kept me sane and kept believing in me.

Of course, in the absence of my blood relations, I have formed a great dysfunctional family of friends at graduate school. First, Mareike Wieth. I do not have the ability to properly thank you, because your kindness, love and support in my times of anxiety are more than I will ever be able to repay. Tom Wagner, although I’ve only recently been a plague on your life, you have already helped me in more ways than I can count. Sian Beilock, Matt Husband, Chrissy Velarde, Lisa Helder, Chris Chan, Laurie Carr and Christy Miscisin, I thank you for putting up with me and for all your encouragement and support. Then there were P. Barrel and T. Twist for the sudsy libations and sweet treats that helped me get through the tough parts. I also wish to thank my fellow lab mates, Dan Gajewski and Aaron Pearson, for their help and advice over the last few years. And I also thank Gary Schrock and Dave McFarlane for their help with all the technical stuff associated with doing research. In addition, there are a number of graduate students who, although they have moved on in recent years, nevertheless offered me great advice and encouragement. I want to thank Kiel Christianson, Carrick Williams, Catherine Arrington, and Karl Bailey.

Finally, I’d like to thank Zed the cat, for it would not do for me to forget to mention my furry companion for the past four years. And despite her not knowing many tricks or even being an avid hunter, she has kept my feet warm during late night sessions and on many occasions has allowed herself to be subjected to lazy petting. Also, the recent additions to our pet family, Flitzer the hamster, Beta the beta fish, and the late Long John Silver, the long-living tetra, for being the subjects of endless hours of mindless staring when work became too much to bear.

Thanks to you all. I would have never been able to do this without you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
    Rapid Scene Perception
    Perception of Scenes and Structure
    Perception of Scenes and Color
    Studying Scene Gist Onset: The Contextual Bias Paradigm
    Overview of the Current Research
THE ONSET OF SCENE GIST PERCEPTION
    Experiment I
        Method
        Results
        Discussion
    Experiment II
        Method
        Results
        Discussion
    Experiment III
        Method
        Results
        Discussion
AN EFFECT OF COLOUR ON SCENE GIST PERCEPTION
    Experiment IV
        Method
        Results
        Discussion
    Experiment V
        Method
        Results
        Discussion
THE INTERACTION BETWEEN COLOUR AND STRUCTURE ON SCENE GIST PERCEPTION
    Experiment VI
        Method
        Results
        Discussion
THE ROLE OF COLOR IN RAPID SCENE GIST PERCEPTION
    Experiment VII
        Method
        Results
        Discussion
GENERAL DISCUSSION
    Implications for Scene Perception
    The Response Bias and Long-term Memory for Scenes
    Conceptual vs. Visual Representations of Scenes
    Conclusion
APPENDIX
REFERENCES

LIST OF TABLES

Table 1. Mean (Standard Deviation) of Proportion of “yes” responses for Experiment 6

Table 2. Mean (Standard Deviation) of Proportion of “yes” responses for Experiment 7

LIST OF FIGURES

Figure 1. Trial Sequence. Consistent object was “Spatula” and inconsistent object “Wrench”.

Figure 2. Proportion “yes” responses to consistent target objects (blue bars) and to inconsistent target objects (red bars) for each duration condition in Experiment 1. Error bars represent Standard Error of the Mean and asterisks indicate a significant difference between target conditions (p < 0.05).

Figure 3. Proportion “yes” responses to consistent target objects (blue bars) and to inconsistent target objects (red bars) for each duration condition in Experiment 2. Error bars represent Standard Error of the Mean and asterisks indicate a significant difference between target conditions (p < 0.05).

Figure 4. Proportion “yes” responses to consistent target objects (blue bars) and to inconsistent target objects (red bars) for each duration condition in Experiment 3. Error bars represent Standard Error of the Mean.

Figure 5. Depicts the Color (A) and Monochrome (B) conditions for Experiment 4.

Figure 6. Proportion “yes” responses to consistent target objects (darker bars) and to inconsistent target objects (lighter bars) for each duration condition in Experiment 4: (A) Colored scene condition, (B) Monochrome scene condition. Error bars represent Standard Error of the Mean.

Figure 7. Difference scores (Inconsistent − Consistent) for the Colored (blue) and Monochrome (red) conditions. Error bars represent Standard Error of the Mean.

Figure 8. Depicts the Color (A) and Monochrome (B) conditions for Experiment 5. All scenes were filtered; spatial frequencies higher than 1 deg/image (or 17 cycles/image) were removed, leaving only medium and low spatial frequency information.

Figure 9. Proportion “yes” responses to consistent target objects (darker bars) and to inconsistent target objects (lighter bars) for each duration condition in Experiment 5: (A) Colored scene condition, (B) Monochrome scene condition. Error bars represent Standard Error of the Mean.

Figure 10. Difference scores (Inconsistent − Consistent) for the Colored (blue) and Monochrome (red) conditions. Error bars represent Standard Error of the Mean.

Figure 11. Example of stimulus color conditions used in Experiment 6: (A) Sharp-Colored condition, (B) Sharp-Monochrome condition, (C) Blurred-Colored condition, (D) Blurred-Monochrome condition.
Figure 12. Proportion “yes” responses to consistent target objects (darker bars) and to inconsistent target objects (lighter bars) for each duration condition in Experiment 6: (A) Sharp-Colored scene condition, (B) Sharp-Monochrome scene condition, (C) Blurred-Colored scene condition, and (D) Blurred-Monochrome scene condition. The Colored conditions are represented with blue bars and the Monochrome conditions with red bars; the Sharp conditions are represented with solid bars and the Blurred conditions with hatched bars. Error bars represent Standard Error of the Mean.

Figure 13. Difference scores (inconsistent mean − consistent mean) for each duration condition in Experiment 6. The Colored conditions are represented with blue lines and the Monochrome conditions with red lines; the Sharp conditions are represented with solid lines and the Blurred conditions with dashed lines. Error bars represent Standard Error of the Mean.

Figure 14. Example of stimulus color conditions used in Experiment 7: (A) Normal Color condition, (B) Monochrome Color condition, and (C) Abnormal Color condition.

Figure 15. Results of Experiment 7: response proportions for the (A) Colored scenes, (B) Monochrome scenes, and (C) Abnormal scenes.

Figure 16. Difference scores (inconsistent mean − consistent mean) for each duration condition in Experiment 7. The Colored condition is represented with the blue line, the Monochrome condition with the red line, and the Abnormal condition with the green line. Error bars represent Standard Error of the Mean.

Figure 17. Example of low abnormal and high abnormal stimuli used in Experiment 7: (A) Low Abnormal Color (average rating for image: 3.60; average rating for group: 3.77), (B) High Abnormal Color (average rating for image: 5.8; average rating for group: 5.84).

Figure 18. Difference scores (inconsistent mean − consistent mean) for each duration condition in Experiment 7 for the Colored (blue), Abnormal-Low (dashed green), and Abnormal-High (solid green) color conditions. Error bars represent Standard Error of the Mean.

INTRODUCTION

Despite the complex nature of the visual information that surrounds us, we seem to have an incredible ability to quickly determine what scene we are currently viewing, even when time is too limited to view specific details. This becomes most apparent when we are flipping through channels on a television set or quickly perusing through a slide show of vacation photos. The study of how our visual system processes the panorama of visual information from our environment is referred to as scene perception.

Rapid Scene Perception

What defines a scene? People have an intuitive sense of what qualifies as a scene, but few researchers have explicitly specified the qualifications. Most often, a scene is referred to as a view of a natural environment, but this definition makes it difficult to specify whether something is not a “scene”: essentially any view of the world qualifies as a scene. According to Henderson and Hollingworth (1999), a scene is defined as
“. . . a semantically coherent view of a real-world environment comprising of background elements and multiple discrete objects arranged in a spatially licensed manner” (p. 244). They further specify background elements as immovable surfaces and structure, while discrete objects are small-scale, discrete entities that are movable within the scene. Although not immediately obvious, one important aspect of scenes captured by this definition is that a scene (and its elements) is dependent on spatial scale. As a result, what qualifies as a background element and what qualifies as an object within the scene can change according to how much the view is zoomed in. For instance, a coffee table can be considered an object within a living room, but a closer view of the table may shift the coffee table to a background element, while a mug, remote control, and set of keys become the objects within that scene (for more examples, see Henderson and Hollingworth, 1999). In the present study, all scenes were views of natural environments that fit with the definition proposed by Henderson and Hollingworth. Furthermore, all scenes were scaled to a human size, because this corresponds most closely to the type of views of the environment that people experience in their everyday lives. Thus, examples such as close-ups of a desktop or satellite images of a city were not included as real-world scenes. The current work explores whether certain perceptual properties (i.e., color and structure) are important in the rapid processing of scenes.

Although many researchers have focused on the perception of objects, there has been a recent increase in the number of investigators studying real-world scenes. The study of real-world scenes was at first thought to be a simple extension of the object recognition literature, but on a quantitatively larger scale. However, the mappings between objects and scenes are not as simple as first assumed. It has become obvious from recent findings that in order to understand scene processing, real-world scenes must be the objects of study. The study of real-world scenes has also called into question many general assumptions about how the visual system processes information that originally emerged from the study of objects and other simplified stimuli (e.g., basic geometrical shapes, letters, symbols, etc.). These assumptions are: (1) that the more information that is available, the more time processing that information will take; (2) that in order to extrapolate global properties, details must first be processed and represented; and (3) that in order for a scene representation to be functional (e.g., able to produce the phenomenology of viewing the world as richly detailed), all of the rich detail must be somehow represented. In the following section, each of these assumptions will be reviewed, as well as how rapid scene perception has challenged these assumptions.

Many researchers assumed that the more information there is in the environment to process, the longer processing should take, and for many simple stimuli, this assumption holds true (e.g., parallel vs. serial visual search). For example, when searching through a random arrangement of objects on a screen, the more objects there are, the longer it takes to find the target object (Williams, Zacks, & Henderson, in press).
Although real-world scenes seem to have both a high quantity of information and a high level of complexity, the assumption that more information leads to longer processing does not seem to apply: scenes are processed very rapidly despite the amount of information available to the system. In a seminal study on scene perception, Potter (1976) demonstrated that scene gist could be derived within the first 100 ms of viewing. When participants were given a categorical label for a scene and then viewed a rapid sequence of scenes displayed for 125 ms each, detection rates were extremely high (80%). These findings are in direct opposition to this assumption because it is clear that not all the details can be processed in this short amount of time; thus, rapid scene perception must be based on other properties of the scene.

The ability to rapidly perceive a scene within the first 100 ms despite its level of complexity and familiarity has challenged the assumption that detailed information is processed before global information is perceived (Palmer, 1977). David Marr (Marr, 1982; Marr & Nishihara, 1978) proposed that visual representations are constructed from simple local properties of the scene (e.g., changes in luminance). From these local properties, surfaces, edges, and other visual properties are integrated into a progressively more complex representation (e.g., component objects). Scene representations are then the result of the integration of these increasingly complex component representations. As a result, the implication for scenes is that they cannot be categorized until their components are represented. Interpretation of a scene, therefore, would have to occur after the visual details are analyzed and grouped. However, many studies have since found that visual properties can lead directly to the recognition of scenes (Oliva & Schyns, 1997, 2000; Schyns & Oliva, 1994).

Taking Marr’s theory one step further, scenes can be conceptualized as the representation resulting from certain combinations of objects. Friedman (1979) proposed that the categorization of scenes occurred in a piecemeal process that categorized the scene based on the individual objects or a set of diagnostic objects in the image. For instance, a kitchen is categorized as such because it is a room that contains a refrigerator, stove, and toaster. This assumption of scene perception was derived from the mechanisms underlying how semantic schemas are likely to be activated, that is, through the activation of several key components (Friedman, 1979). Therefore, scene perception would have to occur after the perception of one or more objects. However, studies on the rapid perception of scenes have shown that categorizing a scene takes just as long as categorizing a single object (Biederman, 1988). Based on this finding, it is unlikely that the scene’s category is derived from recognizing a set of objects. In order for scenes to be perceived so rapidly, scene gist must be derived directly from the visual properties of the scene.

A third assumption is that the visual system must be able to process most or all of the information available in the visual field in order to produce a functional representation. This assumption is predominantly seen in theories concerning the nature of the on-line representation. Our visual system is physiologically and mechanistically constructed to acquire highly detailed visual information only from the center of the visual field.
This physical reality is in direct opposition to the subjective feeling of being able to “see” all the visual details in a scene simultaneously. To explain how the visual system produces the phenomenology of a richly detailed environment, Rayner and McConkie (1976) proposed that the visual system relies on an integrative buffer (although versions of this theory have been proposed by a number of other researchers, see also Breitmeyer, Kropfl, & Julesz, 1982; Davidson, Fox, & Dick, 1973; Duhamel, Colby, & Goldberg, 1992; Feldman, 1985; Jonides, Irwin, & Yantis, 1982; Pouget, Fisher, & Sejnowski, 1993). Visual details are stored in the buffer, and new visual information from the current fixation is aligned and stored with previous fixations on the scene. With each fixation, the visual system builds up a representation of the scene. The buffer eventually contains an analogical, veridical representation of the scene that is responsible for the subjective feeling of being able to “see” all the details in a scene. The integrative buffer has found numerous challenges (see reviews by Irwin, 1992, 1996; O’Regan, 1992; Pollatsek & Rayner, 1992), not the least being the speed of perceiving complex scenes. In addition, researchers have found that the visual system can use the initial representation to locate informative or interesting regions (Loftus & Mackworth, 1978; Mackworth & Morandi, 1979), help or hinder perception of objects within the scene (Biederman, Mezzanotte, & Rabinowitz, 1982; Davenport & Potter, 2004; De Graef, Christiaens, & d’Ydewalle, 1990; Hollingworth & Henderson, 1998, 1999), guide eye movements (Castelhano & Henderson, under review), and assist with the retrieval of associated semantic information such as schemas and event scripts (Metzger & Antes, 1983; Friedman, 1979). Therefore, the scene representation resulting from the initial visual processing is functional, despite the fact that most of its visual details and component objects are not included in the representation.

Studies on rapid scene perception have shown that despite the complexity of the environment, scene perception is not based on the processing and representation of all available visual details. The implication for scene representations is that perceiving global information about the scene is independent of gathering details, and foveating details is not necessary to form a functional representation of the scene. So what information is thought to be represented in rapid scene perception? The representation of a scene is thought to include low-level properties (e.g., color and luminance), high-level properties (e.g., semantic category), and various intermediate-level properties, such as spatial layout and some level of object content (De Graef et al., 1990; Sanocki & Epstein, 1997). Consequently, the initial representation of a scene is composed of visual information specific to the particular scene, as well as more conceptual or general semantic information.

In the current dissertation, discussions of scene gist will specifically refer to the conceptual representation that is activated when a scene is viewed. In the scene literature, scene gist is often defined as the basic-level semantic category of the scene, but in the current work scene gist is more broadly defined.
In addition to the basic category label, scene gist also includes the inferences this categorization affords, such as expected component objects, expected layout, schemas, scripts, and functions associated with a scene (Friedman, 1979; Oliva, in press; Potter, 1976, 1999). This conceptual representation of a scene is related to the semantic label of the scene category, but is thought of as the pre-stored concept that is activated before the activation of the semantic label. It is this conceptual representation that can be interpreted in multiple ways and allows for a single scene to have many labels, depending on the context of the task. The flexibility of viewing the concept of the scene as separate from a single categorical label allows the participant to mix pre-stored categories when encountering an unusual or new type of scene. The paradigm works under the assumption that as long as the objects inquired about are within this conceptual representation of the scene (or outside it, if inconsistent), then the onset of its activation can be measured. This definition acknowledges that once a conceptual representation for a scene has been activated, there is a certain set of expectations that affect how that scene is further processed. Although I acknowledge that there is more to the initial representation of a scene than simply its conceptual representation, the focus of the present dissertation is the perceptual information processed by the visual system that results in the activation of this representation.

What does scene perception say about visual processing in general? Understanding the mechanisms underlying rapid scene perception has implications for theories of scene perception and for theories on how the visual system first processes incoming information. Ultimately, theories of scene perception may lead to further insights into how visual information is translated into a conceptual representation at the interface with higher cognitive processes. Representations resulting from rapid scene perception constrain on-line interactions (such as directing attention to a target object or navigating through a room) and the information that is then stored in memory (both in level of detail and type of information). Because of the broad implications for how scenes are represented and how the visual system processes information and interacts with other cognitive systems (i.e., semantic memory), rapid scene perception has received increasing amounts of attention in the visual literature in recent years.

Perception of Scenes and Structure

Sometimes referred to as the scene’s shape, scene structure at its most basic level is the luminance information within a scene that gives rise to edges and large boundary regions. At a higher level, scene structure refers to the different surfaces and their spatial layout within the scene. Luminance information is known to be processed rapidly within the early visual system. Recent studies have investigated how luminance patterns across the whole scene can contribute directly to scene perception (Oliva & Schyns, 1997; Parraga, Brelstaff, & Troscianko, 1998; Renninger & Malik, 2004; Schyns & Oliva, 1994; Torralba & Oliva, 2003).

To examine the contribution of scene structure, Schyns and Oliva (1994) constructed hybrid scenes by combining low spatial frequency information from one scene with high spatial frequency information from another scene.
Each hybrid stimulus therefore conveyed two possible scenes, one in the low frequency channels and the other in the high frequency channels. The hybrid scenes were presented for either 30 ms or 150 ms, and participants were asked to name the scene just shown. Under extremely brief presentations (30 ms), the low spatial frequencies in the hybrids mediated scene recognition, while at longer presentation times (150 ms) high spatial frequency information was used for recognition. The low spatial frequency scenes were sufficiently blurred so that no objects in the scenes were identifiable, yet participants were able to correctly identify these scenes at brief display durations. Schyns and Oliva concluded that scene recognition is very rapid, and that coarse scene information (low spatial frequency) is extracted during the first 50 ms of scene viewing, while detailed information (high spatial frequency) is acquired in the next few tens of milliseconds. Furthermore, because objects were not identifiable in scenes composed of only low spatial frequencies, they posited that this initial rapid perception is mediated by information other than the combination of objects present in the scene.

In a follow-up study, Oliva and Schyns (1997) found that the coarse-to-fine information extraction sequence could be modified. Participants were “trained” to attend either to the high or low spatial frequency channels by having them identify hybrid scenes that were a combination of scene information and noise. That is, for each participant, scene information was consistently displayed in one channel (high or low), while the contrary channel displayed a meaningless pattern. In this way, participants implicitly learned to pay attention to the only frequency channels carrying useful information. When tested later with real hybrid scenes that included a different scene in each of the high and low spatial frequency channels, the previous training mediated which scene the participant would “see”. Interestingly, the type of training (high or low spatial frequencies) did not affect accuracy at very brief presentation rates (30 ms); in both conditions, accuracy was very high. Based on these results, Schyns and Oliva concluded that even if the system typically uses coarse-to-fine analysis, it is not a fixed processing sequence. Furthermore, it is not the case that only coarse, low spatial frequency information is available early on; instead, it seems that all spatial frequency channels are available and are selected according to task demands.

In a recent study, Torralba and Oliva (2003) demonstrated that there are statistical regularities in the images, and that the visual system can base the categorization of a scene on the differences in these regularities (see also Torralba & Oliva, 2002). Torralba and Oliva (2003) also showed that these statistical regularities exist for other higher-level categorical distinctions, such as natural vs. man-made environments, as well as whether the view is of a close or far range. The results of these studies suggest that global properties such as edge and region boundaries convey a great deal of information about a particular scene’s semantic category, as well as other properties of the scene, such as distance. These studies raise the interesting question of what other global information may be available for the identification of the scene and its properties.
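The hybrid-scene manipulation can be illustrated with a short filtering sketch. This is a minimal reconstruction offered only for concreteness, not Schyns and Oliva’s stimulus-generation code: the Gaussian cutoff, the file names, and the assumption of equal-sized RGB inputs are all placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from imageio.v2 import imread, imwrite

# Minimal sketch of a hybrid scene: low spatial frequencies from one image
# combined with high spatial frequencies from another. The blur width (sigma)
# and file names are placeholders, not the cutoffs used by Schyns and Oliva
# (1994); both inputs are assumed to be equal-sized 8-bit RGB images.
def make_hybrid(low_source, high_source, sigma=8.0):
    a = imread(low_source).astype(float)   # scene supplying the coarse structure
    b = imread(high_source).astype(float)  # scene supplying the fine detail
    blur = (sigma, sigma, 0)               # blur the two spatial axes, not the color channels
    low = gaussian_filter(a, blur)         # low-pass: keep coarse information from A
    high = b - gaussian_filter(b, blur)    # high-pass: keep fine detail from B
    return np.clip(low + high, 0, 255).astype(np.uint8)

imwrite("hybrid.png", make_hybrid("highway.png", "city.png"))
```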
Perception of Scenes and Color

Another source of visual information that has recently received some attention in scene perception is color. Psychophysical and neuroimaging studies suggest that color information is rapidly extracted (Edwards, Xiao, Keysers, Földiák, & Perrett, 2001; Livingstone, 1988; Livingstone & Hubel, 1984, 1988). However, studies examining the contribution of color to the perception of both objects and scenes have shown mixed results. Because more research has been done in the area of color effects in object perception than in scene perception, the object perception literature can illuminate what role (if any) color may have in scene perception. I will briefly review the studies investigating color and object perception before reviewing research on the effect of color on the perception of scenes.

Studies investigating the role that color may play in the initial visual processing of objects have produced highly inconsistent results. This in turn has led researchers to argue for widely different assumptions on what information is exploited by the visual system in its early stages of processing. There are two main camps concerning what information is used in the initial perception of objects: edge-based information and surface-based information (including some combination of both edge and surface). Researchers supporting an edge-based view have argued that color (as well as other surface cues) has no additive value in the initial perceptual analysis of objects (Biederman, 1972, 1987, 1988; Biederman & Ju, 1988; Davidoff, 1991; Davidoff & Ostergaard, 1988; Ostergaard & Davidoff, 1985; Ryan & Schwartz, 1956). Davidoff and Ostergaard (1988; Ostergaard & Davidoff, 1985) had participants either name or recognize a set of colored and monochrome objects. For the naming task, participants had to respond with a label for each object presented on a computer screen. For the recognition task, a series of objects were presented in which they had to detect the presence of a target object. Based on past findings, naming was thought to involve slower processes than the recognition test. Results for the naming task showed that colored objects were named slightly faster, but color had no effect on the recognition rates of the target object. From these results, Davidoff and Ostergaard argued that the only reason a color effect was found in the naming task was due to the slower processes involved in executing a response for that task, and concluded that color was not used in the initial processing of the object representation because there was no effect of color in the recognition task. Biederman and Ju (1988) compared the recognition rates of full-color photographs to line-drawing depictions of objects and found no difference in either the reaction times or accuracy performance. They concluded that structural information is sufficient for the classification of objects, and color (and other surface cues, such as texture and light gradients) is not necessary. Biederman and Ju argued that color was not a necessary cue for the initial recognition of the object, unless edge information was poor or degraded in some way (e.g., objects are occluded or filtered). In either case, the role of color is limited and irrelevant in the initial perceptual processing of objects.

In contrast, others have argued that surface cues, such as color and textures, are an important factor in the perception of objects. Furthermore, there is disagreement as to how and when surface cues matter.
More specifically, some researchers argue that color sometimes provides unique information about an object’s identity (Gegenfurtner & Rieger, 2000; Joseph, 1997; Joseph & Proffitt, 1996; Price & Humphreys, 1989; Tanaka & Presnell, 1999; Tanaka, Weiskopf, & Pepper, 2001), whereas others argue that it merely acts as additional information for boundary segmentation (Rossion & Pourtois, 2004; Wurm, Legge, Isenberg, & Luebker, 1993). The former argument is based on the assumption that color effects are seen only when color is predictive or diagnostic of the object. In other words, an object that is associated with a particular color is said to be color diagnostic because color can help by restricting the possible objects that may be associated with the structure (Joseph, 1997; Joseph & Proffitt, 1996; Price & Humphreys, 1989; Tanaka & Presnell, 1999; Tanaka et al., 2001). Price and Humphreys (1989) asked participants to classify whether a stimulus presented was a fruit or a vegetable (Experiment 2). Results showed that food items were more easily classified when presented in color, and that this effect of color was more pronounced in objects that were structurally similar (Experiment 3). Price and Humphreys concluded that color does have an effect on object perception, but that it would only be seen when color was diagnostic for a particular object and when that object’s structure did not suffice (e.g., a within-category distinction). They argued that previous studies (Biederman, 1988; Davidoff & Ostergaard, 1985; Ostergaard & Davidoff, 1988) found no effect of color because they used everyday objects that all had distinctive shapes. In the case of visually distinct objects, edge information provides a unique cue for identity, and as a result color effects are not detected.

Recently, Tanaka and Presnell (1999) investigated the effect of color on the perception of objects ranked high in color diagnosticity only. In addition to a typicality rating that asked participants to report how often a particular object is seen in a particular color, Tanaka and Presnell used a feature list ranking of the objects to determine color diagnosticity. For the feature lists, each participant rank ordered typical colors for each given object. The number of times a particular color was listed for a particular object determined whether that object was designated color diagnostic. Tanaka and Presnell found larger effects of color for diagnostic objects than non-diagnostic objects in both a naming task and a classification task. In addition, they found that when objects were matched for shape (rates of recognition for the monochrome versions of both diagnostic and non-diagnostic objects were equivalent), an effect of color for diagnostic objects was still seen, suggesting that color may contribute to the recognition of diagnostic objects independently of their shape information. Based on these results, Tanaka and Presnell concluded that if objects have a strong association with a color, then color could act as a direct indicator of object identity independently of the shape information conveyed by edges.

On the other hand, Wurm et al. (1993) found an effect of color on the recognition of food items, but did not find that color diagnosticity ratings were related to the color advantage. Wurm et al. determined the diagnosticity of food items based on how often participants associated a particular color with a particular food object.
They found that high color diagnostic foods were not recognized faster or more accurately than food items that were not typically associated with a particular color. Results also showed the color advantage was reduced for more distinctive or prototypical depictions of the food items. They concluded that the degree to which color improved reaction time and accuracy is due to the additional segmentation and contour extraction information that color provides. Wurm et al. argued that color only becomes helpful when shape or structural information is absent or degraded, and that rather than providing unique identity information, color provides auxiliary boundary segmentation when edge information alone is not sufficient.

Taken together, the findings reviewed above suggest that the role of color depends on whether structure (edge) information is sufficient. It seems possible that whether color effects are seen depends in part on the structural similarity of the stimulus set, the quality of the stimulus (whether the information is degraded due to occlusion or blurring of the image), and the diagnosticity of color for a particular object. However, how diagnosticity is defined varies from study to study, as does the definition of the structural similarity of the stimulus set. Furthermore, what role color plays in the perception of objects is not clear. It could be that color contributes unique gist information and acts to simply narrow down possible alternatives (derived from structure), or is simply acting as a fine-tuning of boundary segmentation. Overall, it seems that there is little agreement about what role, if any, color plays in object recognition.

The scene perception literature reveals a similar pattern of mixed results over a much smaller number of studies. In a recent study, Delorme, Richard, and Fabre-Thorpe (2000) investigated the contribution of color to the rapid categorization of scenes containing either animals or food. Participants were asked to detect the presence of food or an animal in rapidly presented scenes (20-40 ms). There was no effect of color on animal classifications and a small effect (10-15 ms) of color on food categorization. Delorme et al. found that the effect of color (for food items) coincided with reaction times longer than 250 ms and, therefore, concluded that the initial processing of the stimuli did not involve color information. Instead, Delorme et al. argue, the results suggest that the vital information lies in determining global regions of the scene that lead to the activation of gist information (i.e., scene structure). However, when looking at the contribution of color to the classification (naming and verification) of natural scenes and man-made scenes, Oliva and Schyns (2000) found that natural scenes (such as scenes of forests, deserts, and beaches) have a unique combination of colors in specific configurations that are associated with that particular scene category (e.g., beaches are associated with a band of blue along the upper region of the scene and a band of light brown along the lower region). In this way, natural scenes are color diagnostic because their color composition is consistent across exemplars and indicative of the scene category (i.e., unique to each category). In this study, color diagnostic scenes were determined by plotting average hue in color space. Scenes belonging to non-overlapping groups within the color space were selected as being color diagnostic.
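The diagnosticity criterion just described can be illustrated with a small sketch that computes each exemplar’s average hue. This is an illustrative reconstruction, not Oliva and Schyns’s (2000) actual analysis pipeline; the image files and category names are hypothetical, and because hue is a circular quantity the sketch averages it as a vector.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from imageio.v2 import imread

# Illustrative sketch of the color-diagnosticity criterion: compute the mean
# hue of each exemplar and ask whether categories occupy non-overlapping
# regions of color space. File and category names are placeholders, and the
# inputs are assumed to be 8-bit RGB images.
def mean_hue_deg(path):
    rgb = imread(path).astype(float) / 255.0
    hue = rgb_to_hsv(rgb)[..., 0] * 2 * np.pi          # hue as an angle in radians
    # hue wraps around, so average the unit vectors rather than the raw values
    angle = np.arctan2(np.sin(hue).mean(), np.cos(hue).mean())
    return np.degrees(angle) % 360

categories = {"beach": ["beach1.png", "beach2.png"],
              "forest": ["forest1.png", "forest2.png"]}
for name, files in categories.items():
    print(name, sorted(round(mean_hue_deg(f), 1) for f in files))
```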
Oliva and Schyns found that categorization of natural scenes was slower and less accurate when color information was removed, compared to man-made scenes, which showed no difference between the colored and monochrome versions. They concluded that the early processing of the scene must use color information when natural scenes are viewed, and therefore the early processing must include color information regardless of scene type. These results have been replicated in a recent study by Goffaux, Jacques, Mouraux, Oliva, Rossion, and Schyns (in press) using event-related potentials (ERPs). Goffaux et al. showed that for color diagnostic scenes, there was an earlier activation in the frontal and parietal sites when scenes were presented in normal color versus in abnormally colored or monochrome conditions, by 200 ms and 351 ms, respectively. These results support the conclusion that it is not merely the presence of chromatic cues that leads to fast processing of scene information (i.e., color merely helps with scene segmentation), but that the normal colors are cues for scene categorization. Therefore, despite the mixed results in the literature as to the nature and extent of the contribution of color to the initial processing of scenes and objects, recent studies seem to suggest that color may play a role in the initial formation of the scene representation.

Studying Scene Gist Onset: The Contextual Bias Paradigm

Previous studies have shown that when a novel scene is presented, information necessary for processing scene gist is acquired within the first fixation, and no longer than 100 ms (Intraub, 1981; Metzger & Antes, 1983; Oliva & Schyns, 2000; Potter, 1975, 1976; Potter & Levy, 1969). Although these experiments demonstrate that scene gist is processed very quickly, they do not reveal exactly how soon after onset the information necessary to activate the scene gist is acquired. The activation of scene gist means that the scene information is available to other cognitive systems and, therefore, able to influence subsequent interactions with and judgments about the scene (e.g., possible component objects). Knowing the onset of the activation of scene gist would also allow us to further investigate which scene properties are being exploited by the visual system to identify a scene (e.g., color).

To date, studies of rapid scene perception have used various methodologies to assess how quickly a scene is detected or categorized. Many of these past studies relied on naming or judgment tasks that require participants to make explicit what they thought they saw. Judgment tasks that have open-ended responses often result in high variability in responses, which then requires multiple judges to assess the accuracy of the responses. This presents a problem because it is unclear whether naming the scene is interpreted the same way for each participant. For instance, when a scene of a forest is named as trees, does that mean that the participant understands the gist, or is the participant simply naming prominent component objects? Also, some scenes are difficult to name because there is no particular category that they fit, so it is not clear how the variability in naming should be scored or whether these types of scenes should be included at all. To counteract the variability in naming tasks, some researchers prompt participants with category names before the scenes are viewed and instruct the participants to use only those names.

Verification tasks have also often been used to study rapid scene perception.
Potter and colleagues (1975, 1976; Potter & Levy, 1969) have used a Rapid Serial Visual Presentation (RSVP) paradigm, in which the task was to detect a pre-specified scene. The requirement that participants had to be told the target before seeing the series of presented scenes could have led participants to engage in a feature detection strategy or some other guessing strategy that would not necessitate the identification of the scene gist. Oliva and Schyns (1997, 2000; Schyns & Oliva, 1994) used a verification task requiring participants to indicate whether the scene matched the label shown just before its presentation. These verification tasks face the same uncertainty of labeling found with the judgment tasks. They require that the experimenter assign a label to the scene that will correspond to what the participant would label the scene; otherwise, this leads to a higher rate of errors. Another problem related to providing labels that participants must judge as appropriate arises with scenes that have a combination of features belonging to multiple categories. How close a scene is to the typical scene in any category will vary, as will the interpretation of a correct and incorrect label. This variability makes the selection of distractors difficult, and leads to an underestimation of how quickly information for the formation of the conceptual representation of a scene is gathered. Additionally, there are researchers who argue that judgment and verification tasks tap into late, rather than early, initial visual processing (Biederman & Ju, 1988; Davidoff, 1990).

To circumvent the problems associated with these tasks, Fabre-Thorpe, Thorpe, and colleagues (Delorme, Richard, & Fabre-Thorpe, 2000; Fabre-Thorpe, Delorme, Marlot, & Thorpe, 2001; Thorpe, Marlot, & Fize, 1996; Van Rullen & Thorpe, 2001) use a go/no-go categorization task in which participants are asked to detect the presence of a target object (e.g., animal, human, or vehicle). Scenes are flashed for an extremely brief duration (20-40 ms) and participants are asked to make a decision. For each trial, a scene is presented individually (unmasked) or a series of scenes is presented in an RSVP format, and one of the scenes may contain a target object. These detection tasks are thought to be better able to capture early processes, but it is unclear exactly what type of processing is necessary to complete these tasks. For instance, is it necessary to comprehend the scene gist in order to detect the presence of an animal? Recent studies have shown that this task may be performed by using a feature detection strategy and is highly dependent on the contents of the distractor scenes viewed within a trial sequence (Evans & Treisman, 2004).

All these methodologies are limited in their ability to assess how quickly the necessary categorical information can be extracted to influence subsequent behavioral responses. The present study investigates the rapid perception of scene gist by introducing a new paradigm that avoids some of the problems associated with judgment, verification, and detection tasks. The new paradigm examines scene gist perception by measuring the speed at which relevant information is extracted to initially activate the scene gist. Previous studies have found that when participants are asked about the presence of a target object in a rapidly presented scene, their responses are highly biased by the scene gist.
Participants show a tendency to affirm the presence of a target object if it is consistent with the scene gist, and to reject its presence if it is inconsistent with the scene gist (Hollingworth & Henderson, 1998, 1999). From these results, it is clear that the activation of a scene’s gist precedes the examination of specific objects within that scene, but biases participants to assume the presence of certain objects. The current study examines how quickly information necessary for the activation of scene gist is acquired by using the Contextual Bias paradigm, which capitalizes on this bias. The Contextual Bias paradigm uses response bias to examine how quickly sufficient information is extracted to activate a scene’s gist after its onset. After presenting a scene, an object name that could be consistent or inconsistent with the scene’s gist is displayed on the screen. A consistent object is associated with the scene’s gist and therefore has a high likelihood of appearing in the scene. An inconsistent object is not typically associated with that scene’s gist and has a lower likelihood of appearing in the scene. The participants are asked to indicate whether the object was present in the scene by responding “yes” or “no” to the object name via a response button. The logic of the paradigm is simply that if the perceptual information acquired from the scene within the presentation time is sufficient for the scene gist to be activated, then the responses should reveal a bias. If a scene presented for a given duration is processed to the level of gist, then participants should be more likely to respond “yes” to consistent and “no” to inconsistent targets. If, however, the scene is not presented long enough to acquire information to process it to the level of gist, then participants should respond “yes” to both target types in equal proportions. Therefore, it is the presence of the bias that indicates that the scene gist was activated and available to influence object judgments. By looking at the responses to semantic information early on, the current study hopes to ascertain how soon after onset the necessary information is acquired to form an initial conceptual representation that is functional (i.e., able to be used to make inferences). Further, the present study hopes to decipher which scene properties (i.e., the scene information that is initially extracted) most directly affect rapid scene gist activation. By manipulating scene properties (such as the quality of the scene’s structure or the presence/absence of color), activation of the scene gist can be measured in the strength of the response bias. It is assumed that if the information removed or changed in the scene affects the response bias, then that information must be used by the system to initially activate a conceptual representation of the scene.
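Throughout the experiments, this bias is summarized as the difference between the proportion of “yes” responses to consistent and to inconsistent targets at each presentation duration. The sketch below shows one way such a score could be tabulated; it is a minimal illustration in which the file name and column names are hypothetical, not the actual data files or analysis code from these experiments.

```python
import pandas as pd

# Minimal sketch of the Contextual Bias measure: proportion of "yes" responses
# to consistent vs. inconsistent target objects at each scene duration, and
# their difference. The file and column names are assumptions about a
# hypothetical trial-level data table, not the dissertation's actual data.
trials = pd.read_csv("trial_responses.csv")  # columns: subject, duration_ms, consistency, said_yes (0/1)

p_yes = (trials
         .groupby(["duration_ms", "consistency"])["said_yes"]
         .mean()
         .unstack("consistency"))

# A positive difference (more "yes" to consistent than to inconsistent targets)
# indicates that the scene was processed to the level of gist at that duration.
p_yes["bias"] = p_yes["consistent"] - p_yes["inconsistent"]
print(p_yes)
```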
Overview of the Current Research

The aim of the current work was to investigate the speed of the acquisition of scene gist information, to examine two possible sources of information in the scene likely to lead to this rapid activation (e.g., structure and color information), to investigate the nature of the interaction between these sources of information, and to introduce a novel way of investigating the onset of scene gist information. As reviewed above, scene gist information is thought to influence a number of later cognitive processes involved in interactions with that scene (e.g., navigation, memory, and visual search); however, very few studies have looked at the timing of the onset of the activation of this information or have investigated what information in a scene the system is capitalizing on in order to attain rapid activation. Unlike the methodologies used in past studies, the Contextual Bias paradigm measures the onset of scene gist by looking at the degree to which the activation of the conceptual representation influences judgments about the content of the scene. The participants’ task was to indicate whether the named target object was present in the scene (Experiments 1-3) or was likely to be present in the scene (Experiments 4-6). The degree to which participants responded differently to consistent and inconsistent target objects indicated not only whether scene gist was initially activated, but also demonstrated how this activation changed with increasing exposure durations.

Experiments 1-3 demonstrated that the Contextual Bias paradigm is a feasible method for the study of rapid scene gist perception. Results revealed that the information for the activation of scene gist is sufficiently acquired within the first 42 ms of onset, and so the extraction of the necessary information is almost instantaneous. In addition, the pattern of results showed that the influence of the scene gist activation on the response bias increases as the duration of scene presentation increases, suggesting a number of implications for the formation of scene representations.

The next set of experiments was designed to examine what perceptual factors may play a role in the rapid onset of scene gist activation. Whether color information is important during initial processing has been a topic of much debate over the past few decades. Experiments 4-6 examined the contribution of color to the onset of scene gist. These experiments demonstrated that there is an interaction between a scene’s available structure information and its color information. Results reveal that if structural information is degraded, then color information seems to contribute to the early activation of scene gist. However, when structure information is fully available (i.e., the scenes are normal), color effects are not present. Based on this pattern of results, an interaction between color and structure is proposed that emphasizes the weighted input of information from these two sources into the decision process required for the response. When structure is informative, having color information is inconsequential. However, when the availability of structure information is lessened, then color contributes to the rapid onset of scene gist activation.

There are two possible roles for color, based on the results of Experiment 6. First, color could simply be providing additional boundary segmentation information. This would indicate that color is not directly functional in the activation of scene gist, and as long as regions are distinguishable, color information is not needed. The second possibility is that color information is functional and provides a direct route to gist information. Its contribution is usually masked under normal circumstances because color is redundant with structure information and the system is extremely efficient at extrapolating the necessary information.
Experiment 7 was designed to investigate the role that color plays by presenting scenes with abnormal colors. The abnormal hues provide the segmentation information but not any direct cue to scene gist. If color acts as a segmenter only, then the initial processing of visual information is based on structure information alone, and the addition of color information (be it normal or abnormal) should not affect the onset of scene gist when structure information is fully available. If color does provide a route to gist information independent of the structure information, and presenting abnormally colored scenes interferes with the initial onset of scene gist, then it may be that the visual system processes both types of information during the initial activation of scene gist. In this case, the reason that color effects were not found in past studies and in the studies reported above (Experiments 4 and 6) is either that color provides no unique information beyond that provided by the structure information, or that the system automatically relies on structure information unless that information proves to be less useful than color. The results from Experiment 7 support the role of color as a direct contributor to the onset of scene gist activation. The interaction between color and structure information has implications for what perceptual properties are necessary for the rapid activation of scene gist, for how scene representations are constructed and stored, and, more generally, for how this information may then influence other cognitive processes.

THE ONSET OF SCENE GIST PERCEPTION

In this chapter, three experiments were conducted to investigate how quickly scene gist is activated from scene onset. Using the Contextual Bias paradigm, Experiments 1-3 investigated the nature of the response bias when scenes were presented for different durations. In every experiment, the participants' task was to indicate whether the object named after the scene had been displayed was present in that scene. Based on associated schemas, the activation of scene gist should lead them to respond "yes" more often to consistent than to inconsistent target objects. The response bias is measured as the difference in "yes" responses to consistent and inconsistent objects, and the degree of this difference indicates the strength of the related semantic activation.

Experiment 1 investigated a broad range of durations from 20 to 250 ms. The results demonstrated that the response bias has an extremely early onset and increases in strength as the duration increases. Experiment 2 was designed to investigate how early the response bias can be measured by displaying scenes in a narrower, more fine-grained range of durations than those in Experiment 1, ranging from 20 ms to 50 ms. Results reveal an earlier onset of the response bias than in Experiment 1 and again show a continued increase in the response bias as the presentation durations increase. Experiment 3 investigated the nature of this increase in response bias by examining more fine-grained durations ranging from 50 ms to 100 ms.

Experiment I

The first experiment investigated how soon after onset a scene's semantic information is available. In the Contextual Bias paradigm, participants are asked to make a judgment based on whatever information they have gathered from a brief display of a photographed scene. The span of durations varied from 20 ms to 250 ms.
The predictions are as follows: If related semantic information is available earlier than previously reported, then the effect of object consistency should be seen before 100 ms. If related semantic information is not available until later in the processing of visual information, then the object consistency effect should not appear until 100 ms after onset or later.

Methods

Participants. Twenty-four Michigan State University undergraduates participated in this experiment. All participants received credit towards an introductory psychology course.

Apparatus & Stimuli. The stimuli were full-color photographs taken from a number of sources (books, calendars, the web, and personal photos). There were a total of 80 scenes presented (10 scenes/condition). The scenes were presented on a Dell P78 Trinitron 16-in. (41.1 cm) monitor driven by an NVIDIA GeForce3 Pro super video graphics adapter card. The refresh rate was set at 100 Hz. The scenes had a resolution of 800 x 600 pixels and subtended 30° x 22.5° of visual angle viewed from a comfortable seated position 61.5 cm away. Head and body position were not restricted, and so the calculation of visual angle is based on the average distance at which participants were seated from the monitor.

Design. The experiment had a basic two-factor design (2 x 4). There were two target object conditions (consistent and inconsistent) and four duration conditions (20, 50, 100, and 250 ms).

Procedure. After the participant had signed the consent form, the experimenter explained the sequence of events for each trial and that the object of the task was to try to understand the "gist," or "what the scene was about," for each picture presented. Figure 1 depicts the events shown for any given trial. All images in this dissertation are presented in color.

Figure 1: Trial sequence. The participant initiates the trial; the scene duration varies with condition; the target word (here the consistent object "Spatula" or the inconsistent object "Wrench") remains on screen until the subject makes a "yes" or "no" response.

At the beginning of a trial, participants fixated a screen with a central fixation cross displayed for 2000 ms. The participants then viewed a photograph of a scene. The scene's presentation duration varied by condition. Each participant took part in all conditions and viewed each scene once. Scenes were counterbalanced across conditions over all participants and were presented in a random order (determined by the program for each individual). The presentation of the scene was followed by a visual mask for 50 ms. The mask was composed of a jumble of scene sections taken from the collection of scenes shown in the experiment (see Figure 1). Next, a word was displayed at the center of the screen until the participant responded. The word could name an object that was either semantically consistent or inconsistent with the scene. Object names were chosen to produce a high percentage of "yes" responses for consistent objects and a low percentage of "yes" responses for inconsistent objects. An initial norming study showed this pattern for all scenes when presented for 250 ms, which is ample time for the gist of the scene to be acquired. The named object was never present in the scene, regardless of whether it was consistent or inconsistent. Therefore, the participants' responses were always based on bias. That is, responses were based on whether the named object fit or belonged in that picture, never on having viewed that object in the scene.
The participants made their judgments by pressing a response button labeled "1" for yes or "2" for no (these labels were always presented as a reminder on the response screen below the word). The experiment took approximately 10-20 minutes to complete.

Results

Response bias for each scene duration condition was calculated and analyzed using the same method in all experiments reported in this dissertation. For each duration condition, the proportion of "yes" responses was calculated for both the consistent and inconsistent target conditions. Planned comparisons between the target conditions were carried out for each duration condition. The bias effect was defined as a statistically significant higher proportion of "yes" responses in the consistent target condition than in the inconsistent target condition for a given duration, and the onset of the response bias was the indicator that the scene's gist was activated to some threshold level that was able to affect the judgments on the target objects. The results for Experiment 1 are depicted in Figure 2.

Figure 2: Proportion of "yes" responses to consistent target objects (blue bars) and to inconsistent target objects (red bars) for each duration condition in Experiment 1. Error bars represent the standard error of the mean.

Figure 2 shows the proportion of "yes" responses by duration according to target condition for Experiment 1. An omnibus ANOVA revealed a main effect of target condition (F(1,23) = 129.21, p < 0.01, MSE = 0.0392), in which mean "yes" responses were significantly higher for consistent targets than inconsistent targets; a main effect of duration condition (F(3,69) = 129.21, p < 0.01, MSE = 0.0308), reflecting an overall increase in "yes" responses as duration increased; and a significant interaction between target condition and duration (F(3,69) = 55.37, p < 0.01, MSE = 0.0175), in which the relative difference between targets increased as a function of duration. Planned paired-sample t-tests revealed a significant difference between the consistent and inconsistent targets at each of the following duration conditions: 250 ms [Consistent: M = 0.67, SD = 0.28; Inconsistent: M = 0.07, SD = 0.09; t(23) = 10.27, p < 0.01], 100 ms [Consistent: M = 0.66, SD = 0.25; Inconsistent: M = 0.11, SD = 0.12; t(23) = 12.15, p < 0.01], and 50 ms [Consistent: M = 0.44, SD = 0.25; Inconsistent: M = 0.26, SD = 0.21; t(23) = 5.69, p < 0.01]. Scenes presented at 20 ms showed no significant difference [Consistent: M = 0.26, SD = 0.24; Inconsistent: M = 0.26, SD = 0.24; t(23) = 0.12, n.s.].

Discussion

The first experiment was designed, first, to demonstrate the usefulness of the Contextual Bias paradigm and, second, to investigate the speed at which related semantic information about a scene is retrieved when that scene is presented rapidly. The onset of semantic activation for scene gist was measured by the presence of a response bias. The response bias was calculated as the difference between the proportion of "yes" responses to consistent target objects and the proportion of "yes" responses to inconsistent target objects. Experiment 1 clearly establishes the efficacy of the Contextual Bias paradigm in determining how soon after onset the scene gist information is sufficiently extracted to influence judgments on component objects.
Results also showed that related semantic information is available by at least the first 50 ms, and they revealed an increase in the size of the bias as the scene duration was lengthened. Therefore, semantic information related to a scene becomes available prior to the 100 ms estimate proposed in previous studies (Intraub, 1981; Metzger & Antes, 1983; Oliva & Schyns, 1997; Potter, 1975, 1976; Potter & Levy, 1969; Schyns & Oliva, 1994). Further, as the presentation duration for a scene is lengthened, the response bias grows larger, suggesting that the influence of the activation on behavioral responses increases in strength. Interestingly, an informal survey of the participants suggested that the scenes viewed at such short durations did not reach conscious awareness. Most claimed that they felt they were guessing; nevertheless, the bias was present and its strength increased before participants could report conscious recognition of the scenes.

Experiment II

The second experiment investigated how soon after onset a scene's semantic information is available by examining a more fine-grained range of durations than those used in the previous experiment. In Experiment 1, the response bias was present for durations of 50 ms and higher. However, that does not indicate that it takes 50 ms for the activation to occur. Experiment 2 investigated whether the response bias was present for shorter durations than those found in Experiment 1. The span of durations varied from 20 ms to 50 ms, with an additional duration of 250 ms to ensure that participants were performing the task. At 250 ms, the scene is easily visible, so any participants who merely responded without paying attention to the screen could be identified because they would fail to show a response bias in this condition. The predictions are similar to those of Experiment 1: If related semantic information is activated earlier, then a response bias should emerge at a duration condition shorter than 50 ms. If related semantic information is not activated until after a duration of at least 50 ms, then the response bias should not appear until 50 ms.

Methods

Participants. Thirty Michigan State University undergraduates participated in this experiment. All participants received credit towards an introductory psychology course.

Apparatus & Stimuli. The stimuli and apparatus were identical to those used in Experiment 1, with the following exception. The addition of a duration condition reduced the number of images per condition to 8 scenes/condition. In addition, in order to achieve a finer gradation of times between duration conditions, the screen refresh rate was increased to 120 Hz.

Design. The design was identical to the first experiment, with the following exceptions. In Experiment 2 the duration conditions were: 25, 33, 42, 50, and 250 ms. The 250 ms condition was included in order to make certain that the participants were actually performing the task. Participants who were not responding according to the task instructions would not produce an effect at this duration.

Procedure. The procedure for Experiment 2 was identical to Experiment 1.

Results

As in Experiment 1, the proportion of "yes" responses was calculated for both the consistent and inconsistent target conditions for each duration condition. Planned comparisons between the target conditions were carried out for each duration condition.
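To illustrate how these per-condition comparisons can be computed, the following minimal Python sketch derives the proportion of "yes" responses per participant and runs a paired t-test for one duration condition. The data layout, column names, and values are hypothetical, and pandas/scipy are assumed; this is not the dissertation's actual analysis code.

    import pandas as pd
    from scipy import stats

    # Hypothetical trial-level data: one row per trial, with participant id,
    # scene duration (ms), target type, and whether the response was "yes".
    trials = pd.DataFrame({
        "subject":  [1, 1, 1, 1, 2, 2, 2, 2],
        "duration": [50, 50, 50, 50, 50, 50, 50, 50],
        "target":   ["consistent", "inconsistent"] * 4,
        "yes":      [1, 0, 1, 1, 1, 0, 1, 0],
    })

    # Proportion of "yes" responses per subject, duration, and target condition.
    props = (trials.groupby(["subject", "duration", "target"])["yes"]
                   .mean()
                   .unstack("target"))

    # Planned comparison for one duration condition: paired t-test between the
    # consistent and inconsistent proportions across subjects.
    d50 = props.xs(50, level="duration")
    t, p = stats.ttest_rel(d50["consistent"], d50["inconsistent"])
    print(t, p)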
Figure 3: Proportion of "yes" responses to consistent target objects (blue bars) and to inconsistent target objects (red bars) for each duration condition in Experiment 2. Error bars represent the standard error of the mean.

Figure 3 shows the proportion of "yes" responses by duration according to target condition for Experiment 2. An omnibus ANOVA revealed that the pattern of results mimicked that found in Experiment 1. Specifically, there was an overall effect of target condition (F(1,29) = 67.1, p < 0.01, MSE = 0.0523), a main effect of duration condition (F(4,116) = 2.6, p < 0.05, MSE = 0.0411), and a significant interaction between target condition and duration (F(4,116) = 33.21, p < 0.01, MSE = 0.0315). Planned paired-sample t-tests revealed a significant difference at durations of 250 ms [Consistent: M = 0.73, SD = 0.23; Inconsistent: M = 0.09, SD = 0.10; t(29) = 16.14, p < 0.01], 50 ms [Consistent: M = 0.56, SD = 0.23; Inconsistent: M = 0.30, SD = 0.22; t(29) = 5.89, p < 0.01], and 42 ms [Consistent: M = 0.44, SD = 0.27; Inconsistent: M = 0.31, SD = 0.22; t(29) = 2.5, p < 0.01]. Scenes presented at 33 ms [Consistent: M = 0.41, SD = 0.32; Inconsistent: M = 0.33, SD = 0.27; t(29) = 1.25, n.s.] and 25 ms [Consistent: M = 0.31, SD = 0.26; Inconsistent: M = 0.34, SD = 0.23; t(29) = -0.67, n.s.] showed no significant difference.

Discussion

Experiment 2 investigated the onset of the bias more closely by using smaller increments in the duration condition. Results showed that relevant semantic information was extracted as soon as 42 ms after onset. In comparison with the findings of previous studies (Intraub, 1981; Metzger & Antes, 1983; Oliva & Schyns, 1997; Potter, 1975, 1976; Potter & Levy, 1969; Schyns & Oliva, 1994), the activation of scene gist can be detected very early in processing. In addition, the results replicate Experiment 1 in that the response bias in the 250 ms condition was much greater than in the 42 and 50 ms conditions, indicating that the activation of scene gist is much stronger at longer durations, resulting in a stronger influence on response patterns.

Experiment III

Experiment 3 investigated the nature of the response bias. In Experiments 1 and 2, the response bias seemed to increase in strength from 50 ms to 100 ms, and then to asymptote for longer durations (there is no difference between the bias in the 100 ms and 250 ms conditions). The increase in response bias suggests that there is an accompanying increase in the activation of scene gist. So, between durations of 50 ms and 100 ms, it is possible that a maximum amount of activation is reached, leading to an asymptote in the size of the response bias. The main purpose of Experiment 3 was to investigate whether the increase in response bias reaches a maximum activation by examining durations spanning from 50 ms to 100 ms. If the increase in response bias does reach a maximum value, then there should be a point of inflection for the response biases as the duration conditions increase. However, if the increase in activation does not reach a maximum within the first 100 ms, then there should be a gradual increase in the strength of the response bias, with no obvious point of inflection at any given presentation duration.

Methods

Participants. Thirty-six Michigan State University undergraduates participated in this experiment.
All participants received credit towards an introductory psychology course.

Apparatus & Stimuli. The apparatus and stimuli were identical to those used in Experiment 2; however, another 16 photos were added to the stimulus set. These scenes were added to maintain the same image-to-condition ratio as Experiment 2 (8 scenes/condition), due to the addition of a sixth duration condition in Experiment 3.

Design. The target object conditions for Experiment 3 were identical to Experiments 1 and 2; however, an additional duration condition was included, resulting in a 2 x 6 factorial design. In this experiment, participants viewed the scenes for 50, 58, 75, 83, 92, or 100 ms.

Procedure. The procedure for Experiment 3 was identical to Experiment 2.

Results

For each duration condition, the proportion of "yes" responses was calculated for both the consistent and inconsistent target conditions. As in Experiments 1 and 2, planned comparisons between the target conditions were carried out for each duration condition.

Figure 4: Proportion of "yes" responses to consistent target objects (blue bars) and to inconsistent target objects (red bars) for each duration condition in Experiment 3. Error bars represent the standard error of the mean.

Results for Experiment 3 are shown in Figure 4. An omnibus ANOVA revealed a similar pattern of effects as the previous two experiments. There was an overall effect of target condition (F(1,35) = 476.19, p < 0.01, MSE = 0.0598), no main effect of duration condition (F(5,175) = 0.46, n.s., MSE = 0.0258), and a significant interaction between target condition and duration (F(5,175) = 17.6, p < 0.01, MSE = 0.0217). Planned paired-sample t-tests revealed a significant difference at all duration conditions: 100 ms [Consistent: M = 0.78, SD = 0.19; Inconsistent: M = 0.14, SD = 0.15; t(35) = 16.32, p < 0.01], 92 ms [Consistent: M = 0.75, SD = 0.21; Inconsistent: M = 0.13, SD = 0.09; t(35) = 18.92, p < 0.01], 83 ms [Consistent: M = 0.75, SD = 0.14; Inconsistent: M = 0.15, SD = 0.14; t(35) = 18.19, p < 0.01], 75 ms [Consistent: M = 0.72, SD = 0.16; Inconsistent: M = 0.19, SD = 0.18; t(35) = 12.73, p < 0.01], 58 ms [Consistent: M = 0.63, SD = 0.22; Inconsistent: M = 0.23, SD = 0.16; t(35) = 9.74, p < 0.01], and 50 ms [Consistent: M = 0.57, SD = 0.22; Inconsistent: M = 0.29, SD = 0.22; t(35) = 5.97, p < 0.01].

Discussion

In Experiment 3, the increase in the bias with increasing duration was investigated further, and the results revealed that the response bias increased monotonically in strength up to and including 100 ms. These results suggest that activation was increasing with longer stimulus presentations; however, no maximum in the amount of activation was uncovered. Generally speaking, there are two possible reasons for the increase in the response bias over time in this and the previous experiments (Experiments 1 and 2). The first assumes that the recognition of a scene occurs as an all-or-none or binary process, in which a scene is recognized when information supporting that semantic scene category reaches a certain threshold. The other possible explanation is that scene recognition is continuous. In this case, a scene is recognized incrementally with increased presentation durations because more supporting visual information is available with increased display times. Across the three experiments reported so far, there was a gradual increase in response bias across the range of durations.
An assumption that all scenes are processed at equal rates and reach similar levels of activation at the same time would lead to the conclusion that the increase in the response bias is due to a gradual accumulation of activation strength in all scenes simultaneously. However, given the variety in complexity and type of information available from one scene to the next, it is unlikely that this assumption is true. It is more likely that scenes are processed at different rates and reach activation of their gist information at different times. For instance, the response bias results of Experiment 2 show that the fastest a scene's gist can be retrieved is 42 ms, but this may not be the case for all scenes. Taking this assumption into account, the gradual increase in the response bias could be due both to an increase in the number of scenes that reached some level of scene gist activation and to an increase in the activation level as more visual information becomes available with increasing durations. The actual mechanism responsible for the response bias (binary or continuous) is of little consequence to the predictions of this dissertation. In either case, the increase in bias is driven by an increasing availability of information about the scene. Deciding which of the two models is correct is not within the scope of this dissertation and does not affect the predictions. Therefore, for the purposes of the present paper, we will arbitrarily adopt the view that scene recognition occurs as a continuous activation of related information that can increase over time. Adopting this view does not mean that we support it exclusively; rather, we have sided with one view in order to outline the predictions of the current investigation more clearly. For the remainder of the paper, all predictions will be outlined with the continuous activation mechanism in mind; however, that is not to say that alternative mechanisms may not be equally probable.

THE INFLUENCE OF COLOR ON SCENE GIST PERCEPTION

As reviewed in the introduction, studies of the effect of color on the perception of objects and scenes have shown mixed results. For scenes, some studies have shown no effect of color (Delorme et al., 2000), while others have shown that color can affect scene perception, but only for natural, not man-made, scenes (Goffaux et al., in press; Oliva & Schyns, 2000). We know from other studies on rapid scene perception that the structural information available globally plays an important role in the initial activation of gist (Oliva & Schyns, 1997; Schyns & Oliva, 1994; Torralba, 2003; Torralba & Oliva, 2003). If color and structural information are processed immediately and simultaneously (Edwards et al., 2003; Livingstone & Hubel, 1984a, 1984b, 1988), it is possible that the initial processing of information for a scene includes color information, but that its influence can only be seen at the very early stages of processing. The mixed results of past studies, therefore, may be due to differences in tasks (naming vs. verification of stimuli) and timing (very early vs. later processing). In the present study, the contribution of color is investigated by examining the role color plays in the initial activation of scene gist.

Experiment IV

Experiment 4 investigated the effect of color on the activation of scene gist.
Again, the Contextual Bias paradigm was used, and participants were asked to make a judgment based on whatever information they had extracted from a briefly displayed photograph of a scene. In Experiment 4, the scenes were presented either as colored or as monochrome photographs. In order to look at the early effects of color processing, the span of durations used was identical to that used in Experiment 2 (from 20 to 50 ms). As reviewed above, there are two views on what role color can play in the perception of objects and scenes. On the one hand, researchers argue that color helps to boost the perception of certain objects (either by further assisting the segmentation of the shape or form, or through a semantic association between the color and the object name). On the other hand, some researchers posit that the effect of color occurs much later, only after initial recognition, and so the initial representation of visual information is colorblind. In this case, the initial representation would contain edge information only, and perception of the scene category would be based on its structure. Therefore, if color does contribute to the activation of related semantic information during early stages of visual processing, then the bias effect should be larger for colored scenes than monochrome scenes. However, if color is not used in the early stages of processing leading to the activation of relevant semantic information, then there should be no difference between the colored and monochrome scenes.

Methods

Participants. Sixty Michigan State University undergraduates participated in this experiment for credit in an introductory psychology course.

Apparatus & Stimuli. The apparatus was identical to Experiment 1, and 64 scenes were added to the collection used in Experiment 3 (for a total of 160 scenes). For each of the 160 colored scenes, a monochrome counterpart was generated. Monochrome versions of the photographs were created by transforming the photograph from RGB to L*a*b* color mode and then discarding the chromatic components a* and b* of the colored scenes, leaving only L* (the gray levels).¹ Colored scenes were simply the original photographs. Figure 5 shows an example of the scenes used in this experiment.

¹ Thanks to Aude Oliva for the Matlab code that performs these transformations on the photographs.

Figure 5: Example of the Color (A) and Monochrome (B) conditions for Experiment 4.

Design. Experiment 4 had three factors: color, target, and duration (2 x 2 x 5). The target and duration conditions were identical to Experiment 2. The only factor that was added was the color condition (colored scenes and monochrome scenes).

Procedure. The procedure was similar to the one used in Experiment 1, with the following exceptions. In Experiment 4, participants viewed 160 scenes, half of which were full-color and the other half monochrome. The experiment lasted approximately 20-30 minutes.

Results

Two types of analyses were carried out for this experiment. The first set of analyses consisted of the planned comparisons between the target conditions for each presentation duration. This was the same analysis carried out in Experiments 1 to 3 and examines how soon after onset the response bias was present. The second analysis looked at the contribution of color. Difference scores were calculated by subtracting the target conditions (consistent and inconsistent) from each other for each duration and color condition separately.
The difference scores for each color condition were then compared at each duration condition. In this way, the differences between the biases for colored and monochrome scenes are more transparent, making the data more interpretable.

Figures 6a and 6b show the proportion of "yes" responses by duration according to target condition for the colored and monochrome conditions, respectively. An omnibus ANOVA revealed that there was no main effect of color (F(1,59) = 0.34, n.s., MSE = 0.0162), a main effect of target condition (F(1,59) = 241.3, p < 0.01, MSE = 0.0487), and a main effect of duration condition (F(4,236) = 16.56, p < 0.01, MSE = 0.0481). There was a significant interaction between target and duration conditions (F(4,236) = 109.38, p < 0.01, MSE = 0.033). Neither the interaction between color and duration (F(4,236) = 0.83, n.s., MSE = 0.0279), the interaction between target and color (F(1,59) = 3.1, p = 0.083, MSE = 0.0344), nor the three-way interaction between color, target, and duration (F(4,236) = 0.79, n.s., MSE = 0.0216) was significant.

Figure 6: Proportion of "yes" responses to consistent target objects (darker bars) and to inconsistent target objects (lighter bars) for each duration condition in Experiment 4. (A) Colored scene condition. (B) Monochrome scene condition. Error bars represent the standard error of the mean.

For the colored scenes, planned paired-sample t-tests revealed a significant difference at durations of 250 ms [Consistent: M = 0.72, SD = 0.24; Inconsistent: M = 0.07, SD = 0.13; t(59) = 18.87, p < 0.01], 50 ms [Consistent: M = 0.50, SD = 0.23; Inconsistent: M = 0.26, SD = 0.21; t(59) = 7.3, p < 0.01], and 42 ms [Consistent: M = 0.40, SD = 0.23; Inconsistent: M = 0.30, SD = 0.20; t(59) = 3.16, p < 0.01]. Scenes presented at 33 ms [Consistent: M = 0.31, SD = 0.20; Inconsistent: M = 0.26, SD = 0.23; t(59) = 1.55, n.s.] and 25 ms [Consistent: M = 0.27, SD = 0.23; Inconsistent: M = 0.22, SD = 0.19; t(59) = 1.91, n.s.] showed no significant difference. The same pattern was seen for the monochrome scenes. Planned paired-sample t-tests revealed a significant difference at durations of 250 ms [Consistent: M = 0.65, SD = 0.26; Inconsistent: M = 0.07, SD = 0.10; t(59) = 16.18, p < 0.01], 50 ms [Consistent: M = 0.47, SD = 0.28; Inconsistent: M = 0.27, SD = 0.22; t(59) = 5.5, p < 0.01], and 42 ms [Consistent: M = 0.39, SD = 0.25; Inconsistent: M = 0.28, SD = 0.21; t(59) = 3.67, p < 0.01], and no significant difference for scenes presented at 33 ms [Consistent: M = 0.31, SD = 0.22; Inconsistent: M = 0.31, SD = 0.22; t(59) = -0.01, n.s.] or 25 ms [Consistent: M = 0.24, SD = 0.22; Inconsistent: M = 0.25, SD = 0.23; t(59) = -0.29, n.s.]. Figure 7 shows the difference scores for each color condition at each duration.

Figure 7: Difference scores (consistent - inconsistent) for the Colored (blue) and Monochrome (red) conditions. Error bars represent the standard error of the mean.
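As an illustration of how the color comparison can be carried out on such difference scores, here is a minimal Python sketch building on the hypothetical data layout assumed earlier (pandas/scipy assumed; column names and values are invented, and this is not the dissertation's actual analysis code):

    import pandas as pd
    from scipy import stats

    # Hypothetical per-subject proportions of "yes" responses for one duration,
    # with one row per subject and color condition and one column per target type.
    props = pd.DataFrame({
        "subject":      [1, 1, 2, 2],
        "duration":     [80, 80, 80, 80],
        "color":        ["color", "mono", "color", "mono"],
        "consistent":   [0.70, 0.60, 0.75, 0.55],
        "inconsistent": [0.30, 0.35, 0.25, 0.30],
    })

    # Difference score = P("yes" | consistent) - P("yes" | inconsistent),
    # computed separately for each duration and color condition.
    props["bias"] = props["consistent"] - props["inconsistent"]

    # Compare the bias for colored vs. monochrome scenes at this duration with a
    # paired t-test across subjects; a color advantage shows up as a larger bias.
    wide = props.pivot(index="subject", columns="color", values="bias")
    t, p = stats.ttest_rel(wide["color"], wide["mono"])
    print(t, p)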
The premise of the current experimental design was that, when presented with the two target choices, consistent targets would elicit more "yes" responses than inconsistent targets. It is this difference between responses to the targets that is expected to change as a function of duration, depending on whether the scene was presented for long enough for its gist to be processed. By looking at the difference scores, we can see whether this change in responses over time varies as a function of the other factor of interest, namely color. The analysis of the difference scores compared the colored vs. monochrome conditions for each duration condition. If there were an advantage of color, then there should be a bigger effect for the colored scenes than the monochrome scenes at that duration. The analysis showed that at no point did the colored scenes show an advantage over the monochrome scenes (as measured by the responses to the consistent and inconsistent targets). The results for each duration condition were as follows: 250 ms (t(59) = 1.64, n.s.), 50 ms (t(59) = 0.98, n.s.), 42 ms (t(59) = -0.59, n.s.), 33 ms (t(59) = 1.13, n.s.), and 25 ms (t(59) = 1.4, n.s.).

Discussion

Experiment 4 investigated whether color contributes to the initial retrieval of related semantic information, or whether the initial representation is colorblind. In the results section, two analyses were completed: (1) an investigation of the onset of the response bias (to replicate the previous experiments), and (2) an exploration of the initial contribution of color. Each of these will be discussed in turn.

The results of the first analysis replicated the results from Experiment 2, in that the response bias appeared 42 ms after onset, although a trend at 33 ms was also present in two of the experiments (Experiments 2 and 4). To investigate whether a bias effect could be detected as early as 33 ms, a between-experiment analysis was conducted. The sharp-color condition in Experiment 4 was analyzed together with the data from Experiment 2. The ANOVA showed no main effect of experiment (F(1,107) = 1.7, n.s.), a main effect of target (F(1,107) = 226.13, p < 0.001), a main effect of duration (F(4,107) = 12.59, p < 0.001), and an interaction between target and duration (F(4,428) = 100.36, p < 0.001). No other interactions were significant. Collapsing across experiments, a planned paired-sample t-test of the 33 ms duration condition revealed that the bias effect was significant (t(108) = 2.137, p < 0.05). These post-hoc analyses indicate that sufficient information about the scene is acquired almost immediately after onset, as the effects of the activation of scene gist on the judgment of component objects can be seen with presentation durations as short as 33 ms.

The second analysis examined the contribution of color to the bias effect, and demonstrated that for full-colored scenes color had no effect on the onset of the bias. In other words, monochrome scenes produced a bias of the same magnitude as the colored scenes. The results from this experiment support the notion from previous studies that the initial construction of a visual representation is based only on edge or luminance information, which is still preserved in the monochrome photographs (Biederman & Ju, 1988; Davidoff, 1991; Davidoff & Ostergaard, 1988; Ostergaard & Davidoff, 1985).
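For reference, the monochrome stimuli described above preserve exactly this luminance information. A minimal sketch of the conversion, assuming scikit-image in place of the original Matlab code and a hypothetical file name, might look as follows:

    import numpy as np
    from skimage import io, color

    # Load an RGB photograph (hypothetical path), convert it to L*a*b* space,
    # and discard the chromatic channels a* and b*, keeping only L* (luminance).
    rgb = io.imread("scene.jpg")                 # shape (H, W, 3)
    lab = color.rgb2lab(rgb)                     # L* in [0, 100]; a*/b* chromatic
    lab[..., 1:] = 0.0                           # zero out a* and b*

    # Convert back to RGB for display; the result is the monochrome counterpart
    # of the scene, with its gray levels given by L* alone.
    mono = color.lab2rgb(lab)                    # float image in [0, 1]
    io.imsave("scene_mono.png", (mono * 255).astype(np.uint8))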
An alternative explanation for these results is that the initial representation does contain color information, but that when the semantic information can be derived from structural information, the contribution of color is masked (Price & Humphreys, 1989). On this view, the visual system is so efficient at extracting the necessary information from the structural information of the scene that performance is essentially at a ceiling that the addition of color cannot improve. This would mean that when the structural information is not as efficiently extracted (so that the visual system is no longer at its ceiling of efficiency), the contribution of color might be seen. Evidence for this alternative explanation comes from the study by Oliva and Schyns (2000) demonstrating that the effect of color is exacerbated by degrading the edge information of scenes; the same effect has been found in the object recognition literature (Wurm et al., 1993). The following experiments were designed to investigate this alternative explanation regarding the existence of color in the initial representation.

Experiment V

As reviewed in the introduction of this chapter, the relative contribution of color to the categorization of scenes may be masked by the speed of processing of structural information. Experiment 5 investigated whether the quicker processing of structural information masks the effect of color on the activation of semantic scene information. In order to investigate the contribution of color independent of that derived from the scene structure, the scenes used in Experiment 4 were filtered to remove high-spatial-frequency information (thus keeping most mid- and low-spatial-frequency information). By removing some of the structural information, it is possible that colored scenes would show an increased rate of recognition at shorter presentation durations relative to their blurred monochrome counterparts. If color is part of the information quickly extracted for the activation of scene gist, then the response bias should be larger for the blurred colored scenes than the blurred monochrome scenes. If color is not extracted and processed in the early stages of perception leading to the activation of relevant semantic information (as suggested by the results of Experiment 4), then there should be no difference between the color conditions.

Methods

Participants. Sixty Michigan State University undergraduates participated in this experiment for credit in an introductory psychology course.

Apparatus & Stimuli. For Experiment 5, an additional set of scenes was created in which each scene was low-pass filtered at 1 cycle/degree of visual angle (corresponding to 17 cycles/image). In total, the experiment comprised 160 low-pass filtered colored images and their 160 monochrome counterparts. Figure 8 shows an example of the blurred scenes used in this experiment.

Figure 8: Example of the Color (A) and Monochrome (B) conditions for Experiment 5. Spatial frequencies higher than 1 cycle/degree (17 cycles/image) were removed, leaving only mid- and low-spatial-frequency information.

Design. The design for Experiment 5 was the same as in Experiment 4, with the following exception. The duration condition included the following levels: 20, 50, 80, 100, and 250 ms. The reason for this change in the duration conditions (from Experiment 4) was that, with less information available in the blurred scenes, activation would most likely take longer than in Experiments 2 and 4.
Therefore, the duration conditions were the same as those used in Experiment 1.

Procedure. The procedure was identical to the one used in Experiment 4, with the following exception. The participants were instructed to indicate whether the target object was likely to occur in the scene just displayed (as opposed to being asked to indicate whether the object was present). The instructions were modified because the scenes were now blurred, and even if the scene was perceived, the fact that objects were harder to make out inclined a number of participants to press the "no" button.

Results

The same sets of analyses described in Experiment 4 are reported for Experiment 5. First, the relative effect of target in each duration and color condition is considered. Second, difference scores (calculated using the same methods reported in Experiment 4) are analyzed. The question of whether colored scenes have any advantage over monochrome scenes when the structural information is degraded is investigated by comparing the difference scores between color conditions. Figures 9a and 9b show the proportion of "yes" responses by duration according to target condition for the colored and monochrome conditions, respectively.

Figure 9: Proportion of "yes" responses to consistent target objects (darker bars) and to inconsistent target objects (lighter bars) for each duration condition in Experiment 5. (A) Colored scene condition. (B) Monochrome scene condition. Error bars represent the standard error of the mean.

To investigate the duration at which the scene gist was perceived, the target effect was examined for each color condition. An omnibus ANOVA revealed that there was no main effect of color (F(1,59) < 1, n.s., MSE = 0.0277), a main effect of target condition (F(1,59) = 392.97, p < 0.01, MSE = 0.0695), and a main effect of duration condition (F(4,236) = 27.72, p < 0.01, MSE = 0.0499). There was a significant interaction between target and duration conditions (F(4,236) = 135.42, p < 0.01, MSE = 0.0295). None of the interactions between color and duration (F(4,236) = 1.73, n.s., MSE = 0.0301), target and color (F(1,59) = 9.56, n.s., MSE = 0.0251), or color, target, and duration (F(4,236) = 2.26, n.s., MSE = 0.06) were significant.

As in the previous experiment, the same pattern of results was seen in both the colored and monochrome scenes. For the colored scenes, planned paired-sample t-tests revealed a significant difference at durations of 250 ms [Consistent: M = 0.86, SD = 0.14; Inconsistent: M = 0.18, SD = 0.18; t(59) = 22.13, p < 0.01], 100 ms [Consistent: M = 0.78, SD = 0.19; Inconsistent: M = 0.27, SD = 0.19; t(59) = 13.94, p < 0.01], 80 ms [Consistent: M = 0.70, SD = 0.28; Inconsistent: M = 0.33, SD = 0.21; t(59) = 11.32, p < 0.01], and 50 ms [Consistent: M = 0.45, SD = 0.24; Inconsistent: M = 0.31, SD = 0.19; t(59) = 4.74, p < 0.01], but not at 20 ms [Consistent: M = 0.34, SD = 0.24; Inconsistent: M = 0.36, SD = 0.23; t(59) = -0.82, n.s.].
Planned paired-sample t-tests for the monochrome scenes revealed a significant difference at durations of 250 ms [Consistent: M = 0.81, SD = 0.22; Inconsistent: M = 0.20, SD = 0.16; t(59) = 19.21, p < 0.01], 100 ms [Consistent: M = 0.70, SD = 0.20; Inconsistent: M = 0.31, SD = 0.21; t(59) = 11.76, p < 0.01], 80 ms [Consistent: M = 0.61, SD = 0.24; Inconsistent: M = 0.35, SD = 0.21; t(59) = 7.3, p < 0.01], and 50 ms [Consistent: M = 0.46, SD = 0.22; Inconsistent: M = 0.38, SD = 0.22; t(59) = 2.5, p < 0.01], and no significant difference for scenes presented at 20 ms [Consistent: M = 0.36, SD = 0.3; Inconsistent: M = 0.33, SD = 0.23; t(59) = -1.03, n.s.]. Figure 10 shows the difference scores for each color condition at each duration condition.

Figure 10: Difference scores (consistent - inconsistent) for the Colored (blue) and Monochrome (red) conditions. Error bars represent the standard error of the mean.

The analysis of the difference scores was carried out in the same way as in Experiment 4; difference scores for colored scenes were compared to difference scores for monochrome scenes for each duration condition. If there were an advantage of color, then there should be a bigger effect for the colored scenes than the monochrome scenes at that duration. The analysis revealed that colored scenes showed an advantage over monochrome scenes, but only for scenes shown for durations of 100 ms (t(59) = 2.69, p < 0.01) and 80 ms (t(59) = 2.67, p < 0.01). There was no difference for scenes presented at 250 ms (t(59) = 1.25, n.s.), 50 ms (t(59) = 1.55, n.s.), and 20 ms (t(59) = 4.03, n.s.).

Discussion

Prior studies have shown that structural information is important in identifying scene gist (Oliva & Schyns, 1997; Schyns & Oliva, 1994), and the speed at which gist information is extracted may be responsible for the mixed results regarding the effect of color (Davidoff, 1991). In Experiment 5, the structural information of each scene was degraded by filtering the high spatial frequency information from the scene photographs. Results showed that when filtered scenes were used, color did have an effect on the activation of scene gist: monochrome scenes showed less of a bias than colored scenes. Furthermore, the results suggest that the benefit of color occurred later, in that the effect of color started at 80 ms, which is well after the 42 ms onset of the bias effect seen in Experiments 2 and 4. Taken together, the results of Experiments 4 and 5 suggest that there is an interaction between available structure information and color, and that color may be used to categorize scenes and activate relevant semantic information when structure information is degraded and the system cannot extract structure information as easily. However, the two experiments examine different ranges of duration, and a comparison across experiments makes it difficult to draw any firm conclusions about the degree to which color contributes and the onset of its contribution. The next chapter further investigates the existence and nature of the interaction between a scene's structural information and its color information.
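Before turning to that interaction, the blurring manipulation used to create the degraded stimuli in Experiments 5 and 6 (removing spatial frequencies above 1 cycle/degree, about 17 cycles/image for these displays) can be made concrete with a rough Python sketch. A Gaussian blur from scipy is used here as a stand-in for the original Matlab filtering, so the exact kernel and the sigma-to-cutoff correspondence are assumptions rather than the dissertation's implementation:

    import numpy as np
    from scipy import ndimage
    from skimage import io

    # Stimulus parameters taken from the Methods: 800-pixel-wide scenes
    # subtending 30 degrees of visual angle, low-pass filtered at 1 cycle/degree.
    image = io.imread("scene.jpg").astype(float)   # shape (H, W, 3); hypothetical path
    pixels_per_degree = 800 / 30.0                 # ~26.7 pixels per degree
    cutoff_cycles_per_degree = 1.0

    # Approximate the cutoff with a Gaussian whose sigma (in pixels) is tied to
    # the cutoff frequency; this correspondence is an assumption for illustration.
    sigma = pixels_per_degree / (2.0 * np.pi * cutoff_cycles_per_degree)

    # Filter each color channel, leaving only low spatial frequencies.
    blurred = np.stack(
        [ndimage.gaussian_filter(image[..., c], sigma) for c in range(3)],
        axis=-1,
    )
    io.imsave("scene_blurred.png", np.clip(blurred, 0, 255).astype(np.uint8))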
THE INTERACTION BETWEEN COLOR AND STRUCTURE ON SCENE GIST PERCEPTION

Experiment VI

In Experiment 6, the effect of color on the activation of semantic scene information was assessed with a within-subject manipulation of the availability of both color and structure information. The results of Experiments 4 and 5 suggest that the contribution of color to the activation of scene information depends on the relative contribution of structure. If color contributes to the activation of scene gist only when structural information is degraded, then the bias effect should be larger for the blurred colored scenes than the blurred monochrome scenes, and no color effect should be seen between the sharp colored and sharp monochrome scenes. On the other hand, if color is not extracted and processed in the initial activation of scene gist, then there should be no difference between the color conditions for either the sharp or blurred scenes.

Methods

Participants. Eighty Michigan State University undergraduates participated in this experiment for credit in an introductory psychology course.

Apparatus & Stimuli. Experiment 6 was essentially the combination of Experiments 4 and 5 into a single, within-subject experiment. With the addition of a condition (sharpness), another 140 scenes were added to the experiment (for a total of 400 scenes) in order to keep the same number of scenes per condition. Each photograph had four versions: sharp colored, sharp monochrome, blurred colored, and blurred monochrome. Figure 11 shows an example scene in each of the sharpness and color conditions. Each version was created using the same procedures described in Experiments 4 and 5.

Figure 11: Example of the stimulus conditions used in Experiment 6: (A) Sharp-Colored condition, (B) Sharp-Monochrome condition, (C) Blurred-Colored condition, (D) Blurred-Monochrome condition.

Design. In Experiment 6, the color, target, and duration conditions were the same as those used in Experiment 5. In addition, a sharpness condition was included (scenes were either sharp or blurred), resulting in a four-factor, within-subject design: sharpness, color, target, and duration (2 x 2 x 2 x 5).

Procedure. The experimental procedures were identical to those in Experiment 5, with the following exceptions. The experiment took approximately 25 to 40 minutes to complete. Participants were encouraged to take breaks while performing the experiment, and all participants took at least two breaks.

Results

The analyses are presented in the same order as in the previous experiments. Figure 12 shows the proportion of "yes" responses for each target condition for sharp-colored scenes (Figure 12a), sharp-monochrome scenes (Figure 12b), blurred-colored scenes (Figure 12c), and blurred-monochrome scenes (Figure 12d). An omnibus ANOVA revealed significant main effects of sharpness (F(1,79) = 91.64, p < 0.01, MSE = 0.0378), color (F(1,79) = 4.99, p < 0.05, MSE = 0.0925), target (F(1,79) = 1372.33, p < 0.01, MSE = 0.093), and duration (F(4,316) = 103.122, p < 0.01, MSE = 0.044). There were several significant two-way interactions, including target and sharpness (F(1,79) = 404.93, p < 0.01, MSE = 0.0318), sharpness and duration (F(4,316) = 15.22, p < 0.01, MSE = 0.0224), target and color (F(1,79) = 13.65, p < 0.01, MSE = 0.0293), and target and duration (F(4,316) = 374.8, p < 0.01, MSE = 0.024).
There was also a significant three-way interaction between sharpness, target, and duration (F(4,316) = 24.18, p < 0.01, MSE = 0.0248). Finally, the analysis revealed a significant four-way interaction between sharpness, color, target, and duration (F(4,316) = 3.52, p < 0.01, MSE = 0.0202). No other interactions were significant.

Figure 12: Proportion of "yes" responses to consistent target objects (darker bars) and to inconsistent target objects (lighter bars) for each duration condition in Experiment 6. (A) Sharp-Colored scene condition, (B) Sharp-Monochrome scene condition, (C) Blurred-Colored scene condition, and (D) Blurred-Monochrome scene condition. The Colored condition is represented with blue bars and the Monochrome condition with red bars; the Sharp condition is represented with solid bars and the Blurred condition with hatched bars. Error bars represent the standard error of the mean.

Of theoretical interest are the effects of color as a function of sharpness. In order to simplify this analysis, difference scores were analyzed. Figure 13 shows the difference scores plotted as a function of sharpness and color by duration condition. Planned comparisons were conducted between the colored and monochrome scenes at each duration condition. Table 1 shows the means and standard deviations for all conditions.

Duration (ms)  Target        Sharp Colored  Sharp Monochrome  Blurred Colored  Blurred Monochrome
250            Consistent    0.88 (0.12)    0.85 (0.15)       0.81 (0.19)      0.74 (0.24)
250            Inconsistent  0.12 (0.13)    0.13 (0.16)       0.19 (0.16)      0.19 (0.17)
100            Consistent    0.85 (0.17)    0.81 (0.15)       0.68 (0.21)      0.61 (0.21)
100            Inconsistent  0.16 (0.14)    0.13 (0.16)       0.26 (0.17)      0.28 (0.18)
80             Consistent    0.83 (0.17)    0.82 (0.17)       0.59 (0.22)      0.53 (0.20)
80             Inconsistent  0.19 (0.16)    0.16 (0.15)       0.26 (0.17)      0.30 (0.19)
50             Consistent    0.75 (0.19)    0.69 (0.19)       0.40 (0.21)      0.41 (0.22)
50             Inconsistent  0.24 (0.27)    0.25 (0.17)       0.26 (0.16)      0.30 (0.22)
20             Consistent    0.37 (0.23)    0.34 (0.23)       0.25 (0.20)      0.27 (0.22)
20             Inconsistent  0.25 (0.22)    0.29 (0.24)       0.26 (0.21)      0.26 (0.21)

Table 1. Mean (standard deviation) of the proportion of "yes" responses for Experiment 6.

For scenes in the Sharp condition, there was no difference between colored and monochrome scenes at most durations [250 ms: t(79) = 1.34, n.s.; 100 ms: t(79) = 1.41, n.s.; 80 ms: t(79) = 1.41, n.s.; 20 ms: t(79) = 1.43, n.s.], with the exception of the 50 ms duration (t(79) = 2.12, p < 0.05), at which the colored condition had a greater bias than the monochrome condition. For the blurred scenes, however, there was a significant effect of color at durations of 250 ms (t(79) = 2.4, p < 0.05), 100 ms (t(79) = 2.93, p < 0.01), and 80 ms (t(79) = 2.94, p < 0.01). In all cases, there was an advantage for the colored scenes over the monochrome scenes. There was no difference between the colored blurred scenes and monochrome blurred scenes at durations of either 50 ms (t(79) = 0.66, n.s.) or 20 ms (t(79) = -0.72, n.s.).

Figure 13:
Difference scores (consistent - inconsistent) for each duration condition in Experiment 6. The Colored condition is represented with blue lines and the Monochrome condition with red lines; the Sharp condition is represented with solid lines and the Blurred condition with dashed lines. Error bars represent the standard error of the mean.

Discussion

Experiment 6 was designed to examine the effects of color on normal and structurally degraded scenes. These effects were examined for presentation durations ranging from 20 ms to 250 ms. Results revealed that responses to sharp photographs did not change according to whether the scenes were colored. However, blurred photographs were at an advantage when presented in color rather than monochrome. Thus, the findings of Experiment 6 replicated the results of Experiments 4 and 5. More interestingly, the contribution of color seems to occur later than the effects of structure. At a presentation duration of 50 ms, there was a definite (and not surprising) advantage for sharp scenes over blurred scenes, but there was no effect of color. The contribution of color information begins at durations of 80 ms and continues at longer durations for the blurred scenes only. Thus, it seems that structure information was available earlier than color information. Although the data do not speak to the question of why the color effects have a later onset, there remain two possibilities. On the one hand, the effect of color could onset later than structure because the system uses only luminance information in the initial stages of processing visual information (Biederman & Ju, 1988; Davidoff, 1991). In this case, the effects of color are seen only once color becomes available to the decision-making system, which occurs at most 80 ms after onset. On the other hand, it could be that both types of information are available soon after onset, but the system is biased to use one type of information over the other. Because most objects and scenes have highly variable shapes, it would be natural for the system to favor luminance information over color information as a default. In this case, the later onset of color simply reflects a change in the system's strategy: the color information is eventually weighted more heavily, but this change in the weighting of incoming information takes time and is stimulus dependent. Therefore, the shift in weighting is seen only at later onsets, namely ~80 ms.

THE ROLE OF COLOR IN SCENE GIST PERCEPTION

The experiments reported thus far have examined the relationship between color and structure in terms of the effect that color has on scene gist perception when the structure information is degraded (i.e., the scenes are blurred). Experiments 4, 5, and 6 demonstrated that when the structure was degraded, color contributed to the activation of scene gist; however, when structure was normal, there was no effect of having the scene presented in color. Yet none of these experiments addressed the question of how color contributes to gist activation, and whether it is being processed at all when color effects are not seen. The role of color in the object recognition literature is thought to be based on one of two functions: color could help with the segmentation of the shape (thus acting as an auxiliary segmenter in scenes), or color could directly act as a cue for object identity due to its association with the object in memory.
In object recognition, the latter function of color is thought to be limited to objects that are consistently perceived with a specific color (i.e., that are color diagnostic). However, when objects and distractors have similar shape or structure (Biederman & Ju, 1988; Price & Humphreys, 1989; Tanaka & Presnell, 1999; Wurm et al., 1993), color becomes an important identifying factor. In fact, many objects that are thought to be color diagnostic also share similar structure (e.g., red fruit). Therefore, when the structure information across a stimulus set is highly overlapping, the unique contribution of color seems to increase relative to the contribution of structure information.

The role of color as diagnostic of identity in object perception has recently been extended to scene perception (Goffaux et al., in press; Oliva & Schyns, 2000). In these studies, when natural scenes were presented in normal color, participants' performance in a verification task (indicating whether a scene matched a given label) was higher than when the scenes were presented in grayscale. Furthermore, natural scenes displayed with abnormal colors were at a greater disadvantage than those presented without color. Both of these studies' findings support the notion that color provides more than segmentation information. These results suggest that although color may sometimes act as an auxiliary boundary segmenter for equiluminant regions, color also directly provides scene gist information.

In light of the findings reviewed above, it is not clear whether the effect of color found for blurred scenes in Experiments 5 and 6 was due to segmentation, or whether color also provides gist information. It is possible that during the initial processing of visual information, scene gist activation is based on the scene's structure alone. In this case, color contributed only for blurred scenes because the ability of the visual system to segment region boundaries had been hampered by the filtering of high spatial frequencies. Therefore, even if the hues of the scenes are replaced with opposite hues, the activation of gist should be the same as when the normal hues are present, and should be greater than when no color is present (i.e., when the scene is presented monochromatically), because color is available to aid segmentation. On the other hand, color information could contribute to gist activation directly, independently of structure. In this case, color effects were seen for the blurred scenes of Experiment 5 and the blurred condition of Experiment 6 because the quality of structure information was lessened, and so color was able to contribute more information relative to structure information. If color supplies gist information independently of structure information, then swapping the hues of the scenes should adversely affect the activation of scene gist.

Experiment VII

Experiment 7 addressed the question of whether color directly contributes to the activation of scene gist when color effects are seen (i.e., for blurred scenes). The design was similar to Experiment 6; however, an abnormally colored scene condition was added, to be compared with the colored and monochrome scene conditions. The abnormally colored scenes are able to reveal how the system processes color information because the misplaced color hues have no link to the relevant gist information.
If the effect of color found in the previous experiments (Experiments 5 and 6) was due only to color acting as a segmenter of equiluminant regions, then abnormally colored scenes should affect the onset of scene gist activation to the same degree as their normally colored counterparts. Therefore, the response biases for abnormally colored scenes should be greater than those for monochrome scenes and just as strong as those for normally colored scenes (i.e., not statistically different from normally colored scenes). On the other hand, if color contributes directly to the activation of scene gist, then the color information in abnormally colored scenes should interfere with gist activation. In this case, abnormally colored scenes should produce less of a response bias than scenes presented in normal color. Whether there is a significant difference from monochrome scenes will depend on how much interference the abnormal color causes compared to the cost incurred by the absence of color (i.e., monochrome scenes).

Methods

Apparatus & Stimuli. A subset of 300 scenes was randomly selected from the set used in Experiment 6. Each scene had three versions: normal color, monochrome, and abnormal color. Figure 14 shows an example of a scene in each of the three color conditions. The methods used to produce the abnormal color versions are identical to those used by Oliva & Schyns (2000): the photographs are transformed into L*a*b* color space and the a* and b* channels are swapped and inverted, thereby producing hues that are opposite of the originals within the L*a*b* color space. The scenes were also blurred using the same methods described in Experiment 5.

Figure 14. Example of the stimulus color conditions used in Experiment 7: (A) Normal Color condition, (B) Monochrome condition, and (C) Abnormal Color condition.

Design. Three factors were varied for each photograph: color, target, and duration (3 x 2 x 5).

Procedure. The procedure was similar to Experiment 6, but participants were shown only blurred scenes in normal color, monochrome, or abnormal color.

Results

The analyses are organized as follows. First, the response bias patterns are analyzed for all color conditions. Then, the color effects are analyzed by comparing the difference scores across color conditions, as described in the previous experiments. Figure 15 shows the proportion of "yes" responses for each target condition for normal color (Figure 15a), monochrome (Figure 15b), and abnormal color scenes (Figure 15c). An omnibus ANOVA treating color, target, and duration (3 x 2 x 5) as within-subject factors revealed significant main effects of color (F(2,238) = 12.52, p < 0.01, MSE = 0.023), target (F(1,119) = 1265.66, p < 0.01, MSE = 0.066), and duration (F(4,476) = 83.92, p < 0.01, MSE = 0.054). In addition, the interactions between color and target (F(2,238) = 15.63, p < 0.01, MSE = 0.027) and between target and duration (F(4,476) = 332.02, p < 0.01, MSE = 0.030) were reliable. No other interactions were significant.
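For concreteness, the omnibus within-subjects analysis above could be set up as follows. This is only a schematic: it assumes the responses have already been aggregated into one proportion-"yes" score per participant and cell, and the file and column names are hypothetical.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# long format: one row per participant x color x target x duration cell
cells = pd.read_csv("exp7_cell_means.csv")  # subject, color, target, duration, prop_yes

aov = AnovaRM(
    data=cells,
    depvar="prop_yes",
    subject="subject",
    within=["color", "target", "duration"],
).fit()
print(aov)  # F, degrees of freedom, and p for each main effect and interaction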
Figure 15. Proportion of "yes" responses to consistent target objects (darker bars) and to inconsistent target objects (lighter bars) for each duration condition in Experiment 7: (A) Colored scene condition, (B) Monochrome scene condition, and (C) Abnormal scene condition. Error bars represent Standard Error of the Mean.

Of specific theoretical interest was the effect of color between the abnormal condition and the other two color conditions. The pattern of differences relative to the normal color and monochrome conditions distinguishes between the aforementioned hypotheses. The means for all conditions are presented in Table 2.

Duration (ms)   Target         Colored        Abnormal       Monochrome
250             Consistent     0.86 (0.13)    0.78 (0.17)    0.82 (0.15)
                Inconsistent   0.18 (0.14)    0.22 (0.17)    0.20 (0.16)
100             Consistent     0.78 (0.15)    0.67 (0.21)    0.71 (0.17)
                Inconsistent   0.30 (0.17)    0.31 (0.18)    0.30 (0.17)
80              Consistent     0.69 (0.18)    0.60 (0.19)    0.64 (0.19)
                Inconsistent   0.33 (0.18)    0.32 (0.18)    0.16 (0.17)
50              Consistent     0.49 (0.22)    0.45 (0.21)    0.48 (0.21)
                Inconsistent   0.34 (0.19)    0.34 (0.19)    0.34 (0.20)
20              Consistent     0.35 (0.25)    0.33 (0.25)    0.35 (0.25)
                Inconsistent   0.32 (0.24)    0.31 (0.22)    0.31 (0.24)

Table 2. Mean (Standard Deviation) proportion of "yes" responses for Experiment 7.

In order to further simplify these analyses, difference scores were calculated by subtracting the Inconsistent Target means from the Consistent Target means for each combination of color and duration conditions. Figure 16 shows the difference scores plotted as a function of color and duration conditions. As explained above, the Segmentation Hypothesis predicted that color information, regardless of hue, should aid in defining region boundaries that may have been lost with the removal of high spatial frequency information. Therefore, the abnormal color scenes should produce the same advantage over monochrome scenes as the normal color scenes, and there should be no difference between the normal color and abnormal color scenes. The Gist Information Hypothesis predicts that the abnormal color scenes should interfere with gist processing because the hue information provided is not associated with the correct scene gist, and may activate an unrelated scene gist. Therefore, the normal color scenes should produce an advantage over the abnormal color scenes. Whether the abnormal color scenes produce a cost that is equivalent to, less than, or greater than the cost produced by the monochrome scenes is unknown, because the proposed mechanisms of interference for these two scene types are different (i.e., no hues vs. misleading hues). To test the predictions of these competing hypotheses, two simplified ANOVAs were conducted, comparing normal color to abnormal color scenes and monochrome to abnormal color scenes.

Figure 16. Difference scores (Consistent - Inconsistent) for the Colored (blue), Monochrome (red), and Abnormal (green) scene conditions. Error bars represent Standard Error of the Mean.
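A minimal sketch of this difference-score computation, and of the condition means of the kind plotted in Figure 16, reusing the hypothetical cell-level data frame from the ANOVA sketch above:

import pandas as pd

cells = pd.read_csv("exp7_cell_means.csv")
wide = cells.pivot_table(
    index=["subject", "color", "duration"],
    columns="target",
    values="prop_yes",
)
# consistent minus inconsistent, per participant, color, and duration
wide["diff_score"] = wide["consistent"] - wide["inconsistent"]

# condition means (and standard errors) across participants
print(wide.groupby(["color", "duration"])["diff_score"].agg(["mean", "sem"]))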
First, the abnormal color scenes were compared to the monochrome scenes. A simplified ANOVA showed a main effect of color (F(1,119) = 8.46, p < 0.05) and a main effect of duration (F(4,476) = 220.25, p < 0.01), but no interaction (F(1,59) < 0.1, n.s.). Planned comparison t-tests showed no effect of color at any duration condition [250 ms: t(119) = 1.48, n.s.; 100 ms: t(119) = 1.95, n.s.; 80 ms: t(119) = 1.55, n.s.; 50 ms: t(119) = 1.51, n.s.; 20 ms: t(119) = 0.32, n.s.].

Second, a simplified ANOVA was conducted to compare the normal color and abnormal color scenes. The analysis revealed significant main effects of color (F(1,119) = 32.64, p < 0.01) and duration (F(4,476) = 258.45, p < 0.01), and a significant interaction (F(4,476) = 2.71, p < 0.05). Further planned comparisons using t-tests revealed an advantage of normal color over abnormal color for certain duration conditions. There was a higher response bias for normal color than abnormal color at 250 ms (t(119) = 4.04, p < 0.001), 100 ms (t(119) = 4.39, p < 0.001), and 80 ms (t(119) = 3.14, p < 0.01). There was no difference between the normal color and abnormal color conditions at durations of either 50 ms (t(119) = 1.69, n.s.) or 20 ms (t(119) = -0.30, n.s.).²

² A more conservative set of post-hoc analyses using Tukey's LSD (p < 0.05 for all comparisons collectively) was also conducted to check against possible Type I errors, but t-tests are reported in the Results for consistency with the previously reported experiments. Comparisons of the colored and abnormal scenes at each duration condition revealed a pattern identical to that reported above with the t-tests: the normal color condition had a significantly higher response bias than the abnormal color condition at durations of 80 ms, 100 ms, and 250 ms, but not at either 50 ms or 20 ms.

One interesting aspect of creating abnormal color scenes by swapping and inverting the color hues is that some of the scenes produced for the abnormal color condition could still fall within, or close to, their normal range of possible color hues. For example, many man-made scenes have a greater variety of possible colors than natural scenes (Oliva & Schyns, 2000). Because the normal hues are associated with the correct scene, and because some of the abnormal color scenes may be close to the normal range, the amount of interference produced may differ depending on how different the abnormal colors are from a scene's normal colors. To further explore the degree to which abnormal colors can differentially interfere with the activation of scene gist, a secondary analysis was conducted in which the abnormal scenes were divided into high and low abnormal groups. The abnormal color scenes were rated by a separate group of participants (n = 20) on a 7-point Likert scale indicating the strangeness of the colors for each particular scene (1 = normal, 7 = extremely strange). Participants were shown the 300 images from this experiment; half were shown in normal color and half in abnormal color. Each participant saw each scene once, and the color condition for each scene was counterbalanced across participants. Abnormal scenes with an average rating between 1.1 and 5 were designated as low abnormal, and scenes with an average rating higher than 5 were designated as high abnormal. There were 165 low abnormal scenes (average rating: 3.77) and 135 high abnormal scenes (average rating: 5.84). Figure 17 shows an example of a low and a high abnormal scene representing the average rating for each range.
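A small sketch of this post-hoc grouping, averaging the strangeness ratings per scene and splitting at the cutoff of 5 (the ratings file and column names are hypothetical):

import pandas as pd

ratings = pd.read_csv("abnormal_color_ratings.csv")  # rater, scene, rating (1-7)
scene_means = ratings.groupby("scene")["rating"].mean()

low_abnormal = scene_means[(scene_means >= 1.1) & (scene_means <= 5.0)].index
high_abnormal = scene_means[scene_means > 5.0].index
print(len(low_abnormal), len(high_abnormal))  # 165 and 135 in the reported data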
Figure 17. Example of low abnormal and high abnormal stimuli used in Experiment 7. Panels (A) and (B) show a scene in its Normal and Low Abnormal Color versions, respectively; the average rating for this image was 3.60 (group average: 3.77). Panels (C) and (D) show a scene in its Normal and High Abnormal Color versions, respectively; the average rating for this image was 5.8 (group average: 5.84).

Response bias difference scores were calculated for each abnormal group for all duration conditions. Figure 18 shows the difference scores for normally colored, low abnormal, and high abnormal scenes as a function of duration.

Figure 18. Difference scores (Consistent - Inconsistent) for the Colored (blue), Low Abnormal (solid green), and High Abnormal (dashed green) scene conditions. Error bars represent Standard Error of the Mean.

A within-subjects ANOVA was conducted for the normal color and abnormal color conditions with color and duration (3 x 5) as factors, and revealed significant main effects of color (F(2,212) = 19, p < 0.01) and duration (F(4,424) = 171.64, p < 0.01), and a significant interaction (F(4,424) = 2.51, p < 0.01). Post-hoc comparisons of the color conditions (using Tukey's LSD, p < 0.05 collectively) revealed that normally colored scenes produced a larger response bias than low and high abnormal scenes. High abnormal scenes were also significantly lower overall than low abnormal scenes. In order to look more closely at the onset of the abnormal-color differences, post-hoc comparisons between the normally colored condition and each of the abnormal conditions were conducted at each duration condition. Tukey's LSD comparisons revealed that the response bias was greater for normally colored scenes than for low abnormal scenes at durations of 100 ms and longer, and greater than for high abnormal scenes at durations of 80 ms and longer. Taken together, the results indicate that the greater the abnormal coloration of the scenes, the greater the interference with the activation of scene gist, and that the closer the abnormal colors are to normal, the later the onset of the interference.

Discussion

Experiment 7 investigated the role of color when structural information is degraded. Two hypotheses were proposed: the Segmentation Hypothesis and the Gist Information Hypothesis. The Segmentation Hypothesis holds that color aids in the extraction of scene regions, and predicts that colored scenes (both normal and abnormal) would show a greater bias than monochrome scenes and would not differ significantly from each other. Predicting that color can only help in the extraction of gist because it aids in the segmentation of scene regions assumes that the structural information of a scene is responsible for the activation of the scene gist. This hypothesis is in line with the theory that luminance information alone is important for object identification and scene gist, and that color can play only an indirect role (Biederman & Ju, 1988; Davidoff, 1991; Delorme et al., 2000). On the other hand, the Gist Information Hypothesis presupposes that color hue is associated with scene gist information and plays a direct role in the activation of scene gist.
Therefore, when the hues are swapped (as in the abnormal color scenes), the correct scene gist cannot be activated through color, and the hues that are present could activate an incorrect scene gist. The Gist Information Hypothesis predicts that normal color scenes would produce a greater bias than abnormal scenes because the hues in the abnormal scenes would be misleading and would interfere with the activation of the correct scene gist. This hypothesis assumes that color is a direct cue for the correct scene gist and carries some identity information; it is therefore in line with the color diagnosticity arguments that color can be used for identification because certain colors are diagnostic of certain stimuli (Joseph, 1997; Joseph & Proffitt, 1996; Price & Humphreys, 1989; Tanaka & Presnell, 1999; Tanaka et al., 2001).

The pattern of results for both the comparison of abnormal scenes to monochrome scenes and the comparison of abnormal color scenes to normal color scenes supported the Gist Information Hypothesis. The bias effect was greater for normal scenes than abnormal scenes at durations of 80 ms and longer, and there was no interaction between the abnormal color and monochrome scenes, although the significant main effect of color suggests that these two color conditions interfere with the activation of scene gist differently, with abnormal colors producing slightly greater interference. It is clear from this pattern that hue information contributes directly to the activation of scene gist. Although these results support the Gist Information Hypothesis, they do not necessarily rule out the Segmentation Hypothesis. It is not possible to rule out the Segmentation Hypothesis with the current data, because it could be that color is doing both, and that the observed pattern of results stems from both the cost of interference from the abnormal hues and the benefit of improved segmentation, with the overall cost being larger. Future investigation would be required to rule out the role of color as a segmenter of scene regions.

The finding that abnormal colors interfere with scene gist activation is similar to previous findings on color diagnosticity. Color diagnosticity is defined as the strength of the association between a color and an object or scene. Many researchers measure this strength by having participants name the color of a set of particular objects or rate the frequency with which an object is usually seen in a particular color. Color diagnostic stimuli are then defined as the items whose particular color is intrinsically linked to their identity and which therefore produce the most consistent responses (either frequency ratings or naming frequencies) across all participants (Price & Humphreys, 1989; Tanaka & Presnell, 1999; Wurm et al., 1993). Another method is to plot the average or most frequent hue for each scene in a color space; color diagnostic scenes are defined as those that form a tight cluster in one area of the color space with very little overlap with other types of scenes (Oliva & Schyns, 2000). Either way, the proposed role of color is the same: it contributes to identification of the stimuli because it is associated with their identity. Inherent in the concept of color diagnosticity is that there is variation in the contribution of color for any given object or scene. Some objects have a much stronger association between a color and identity than others. For example, ripe bananas are associated with yellow, while ripe apples can be red, green, or yellow.
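As a rough illustration of the hue-clustering approach (this is not a procedure used in the dissertation; the use of HSV hue, the circular statistics, and the file names are all assumptions), the tightness of a category's hue cluster could be quantified as follows:

import numpy as np
from skimage import io, color

def mean_hue(path):
    # circular mean of HSV hue across all pixels of one scene, in radians
    hsv = color.rgb2hsv(io.imread(path) / 255.0)
    angles = hsv[..., 0].ravel() * 2 * np.pi
    return np.angle(np.mean(np.exp(1j * angles)))

def hue_spread(paths):
    # 1 minus the resultant vector length of the per-scene mean hues:
    # values near 0 indicate a tight cluster (a more "color diagnostic" category)
    hues = np.array([mean_hue(p) for p in paths])
    return 1 - np.abs(np.mean(np.exp(1j * hues)))

print(hue_spread(["forest_01.bmp", "forest_02.bmp", "forest_03.bmp"]))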
The scenes in the current study were not measured according to their diagnosticity because it was assumed that for any particular scene, even one with high variation (and therefore considered to have low color diagnosticity), the variation of acceptable hues is still limited. For instance, kitchens are often colored in a wide range of hues; however, it is still possible to depict a kitchen in a hue that falls outside its normal range, such as fluorescent yellow. Although previous studies have deemed only certain objects or scenes to be color diagnostic, the assumption in the current study is that all scenes are potentially "color diagnostic" as long as the hue change falls outside the normal variation. If all hues are swapped and the resulting hues fall outside the normal variation for a particular scene, then the resulting hues interfere with processing and a cost in performance is observed.

In addition, there is the possibility that how far the abnormal scenes fall outside the normal range could affect performance. It was possible with the current set of stimuli to investigate whether abnormal hues considered to be highly abnormal produced a greater cost than those considered to be close to normal. A post-hoc analysis, in which the abnormal scenes were rated as high or low abnormal by a separate group of participants, suggested that the amount of interference produced by abnormal scenes depends on how much the abnormal hues depart from the hues considered normal for a particular scene. The bias effect was greater for the low abnormal scenes than for the high abnormal scenes. Therefore, it seems that if the hues are closer to the normal variation of hues for a given scene, the correct scene gist is activated more readily than if those hues are considered extremely abnormal. Furthermore, the analysis by duration condition shows that as the presentation duration of the abnormal scenes increases, the interference seems to increase. The pattern seen in Figure 18 suggests that the increasing difference over time is due to the growth of the bias effect for the normal color scenes over time, with no equivalent increase for the abnormal scenes. However, these results are based on only two points and are therefore interpreted only at a speculative level. It is not possible to draw any firm conclusions about the nature of the interaction or how it influences performance over time. However, the results do suggest an interesting interaction that can be investigated further in future studies.

GENERAL DISCUSSION

The seven experiments reported in the current study investigated how quickly information is extracted for the activation of scene gist after onset, whether color plays a role in its rapid activation, and what type of role color plays under the circumstances in which its effects are seen. The Contextual Bias paradigm was introduced, in which scene gist onset is measured as the bias to affirm objects that are consistent with the scene gist and to disconfirm objects that are inconsistent. This paradigm allows the onset of scene gist to be measured at the conceptual level by probing that concept through another component (likely objects), rather than by asking the participant to agree with the label chosen for that concept.
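To make the logic of this measure concrete, the following toy simulation (purely illustrative; the base rate and the size of the gist effect are invented numbers) shows how a response bias appears only to the extent that gist has been activated:

import numpy as np

rng = np.random.default_rng(0)

def prop_yes(gist_activation, consistent, n_trials=200,
             base_rate=0.35, gist_pull=0.30):
    # the "yes" rate is pulled toward consistent targets and away from
    # inconsistent ones in proportion to how strongly gist is activated
    p = base_rate + (gist_pull if consistent else -gist_pull) * gist_activation
    return rng.binomial(1, float(np.clip(p, 0, 1)), n_trials).mean()

for activation in (0.0, 1.0):
    bias = (prop_yes(activation, consistent=True)
            - prop_yes(activation, consistent=False))
    print(f"gist activation = {activation:.0f} -> simulated bias = {bias:.2f}")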
The scenes were presented for various durations, and the onset of the bias was used as an indication of when scene gist was activated. Across Experiments 1-3 (and all subsequent experiments), the magnitude of the bias effect increased with the duration of scene presentation. These results suggest that the activation of scene gist builds up over time as more visual information is acquired. The implications for scene representations and suggestions for future investigations are discussed in detail below.

Experiments 4-6 showed that removing color information from normal scenes had no effect on the activation of scene gist. However, when structural information was degraded (i.e., the scene was blurred by removing high-frequency information, thus slowing the efficiency with which scene gist information can be extracted), color did affect the activation of scene gist. Blurred scenes presented in color produced a more pronounced bias effect than those presented in monochrome. These results suggest that color plays a role in the activation of scene gist, but only when structural information is lacking. Moreover, these results suggest two possible roles for color: color may act either as a segmenter of scene regions or as a cue for scene gist. These hypotheses suggest two different architectures for how color information may contribute to the activation of scene gist. The Segmentation Hypothesis assumes that color contributes to the activation of scene gist only when the structure information is degraded, because it can help recover some of the boundary edges that were lost when the high spatial frequency information was removed. The architecture would involve an indirect route from color to scene gist activation through structure information, and thus would mean that only structure can directly activate scene gist. Alternatively, the Gist Information Hypothesis assumes that color is associated with scene gist and can therefore act as a direct cue in its activation. The architecture implied in this case would be that both structure and color information are associated with scene gist and each can contribute to its activation. To further explore the role that color plays in the activation of scene gist, abnormally colored scenes were used in Experiment 7. Abnormally colored scenes were selected because the altered hue information can differentiate between equiluminant regions and therefore provide segmentation, but the hues have no association with the correct scene gist. The results strongly supported the Gist Information Hypothesis. Response biases for the abnormal scenes were significantly lower than for the normal color scenes and did not differ significantly from monochrome scenes. Therefore, providing segmentation information alone did not contribute to the activation of the scene gist. Additionally, because the wrong hues were displayed, they could potentially have cued other scene gists and thus interfered with the activation of the correct scene gist.

Implications for Scene Perception

The interaction of structure and color, and the finding that color does contribute directly to the activation of scene gist, suggest that scene gist may be activated by a combination of weighted input from these two sources. It is possible that color information is available, but that as a default the visual system weighs structure information more because it is usually sufficient for the activation of scene gist.
Therefore, the use of color would depend on the informativeness of the scene structure. If the stimulus set has indistinct structures (i.e., the same shape), or if some structure information is irretrievable (e.g., due to occlusion or blurring), then color contributes relatively more to the activation. This type of interaction suggests that color plays an important role in the activation of scene gist, but only under specific circumstances. For instance, when stimulus sets are structurally distinct (such as most collections of man-made objects), color is less of a necessity for identification (Biederman & Ju, 1988; Davidoff, 1991). Therefore, not only do the results from the current study suggest how color and structure interact, they also strongly suggest that color has a role in the activation of scene gist.

Additionally, the interaction outlined above can highlight why one would or would not expect differences between color and monochrome stimuli. For example, as reviewed in the discussion of Experiment 7, in some cases the role that color plays in the identification of certain objects depends on the strength of the association between a particular hue and a particular item. However, previous studies have shown that the association of color with these identities alone cannot be used to predict the usefulness of color in the initial identification of an item (Wurm et al., 1993). Rather, it is the combination of structure and color that matters: given a certain structure (with many possible identity candidates), color can further narrow the possible alternatives. Thus, even though color diagnosticity alone cannot be used to predict identity, given a certain structure, color is a unique identifying feature.

Narrowing possible candidate items given a certain structure can also explain the differences found in the contribution of color for natural vs. man-made scenes (Oliva & Schyns, 2000). These two types of scenes are categorically different, but that alone cannot explain why the system would weigh color more for one particular type of scene, while weighing structure more for another, before the category is processed. From the perspective of the interaction, however, the importance of color would have to depend on the structure information. One property that natural scenes share is the inclusion of mass objects (e.g., water, sand, snow). Mass objects do not have any defining structure, but instead have defining textures and colors. Based on the findings of the current study, one may speculate that because scenes containing mass objects have similar structures, color is more informative for them. Therefore, scenes such as deserts, beaches, fields, and forests seen from a bird's-eye perspective have similar structures that arise from various types of mass objects. The interaction framework would then predict that because natural scenes have more overlapping structures, color is weighed more heavily. Although natural scenes may have stronger associations with color cues than man-made scenes, deciding to use color according to the structure would not require that the scene's categorical membership be known ahead of time. Another interesting prediction would be that man-made scenes containing mass objects should also show a benefit from color (e.g., outdoor pools, fountains, etc.). The default of weighting structure more heavily at the outset could provide the information that the system needs for adjusting the importance it assigns to the incoming perceptual information.
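A toy sketch of this weighted-input idea (all numbers are invented for illustration; this is not a fitted or formally proposed computational model): gist activation is a weighted combination of structure and color evidence, with the weight shifting toward color only as structure quality degrades.

def gist_activation(structure_evidence, color_evidence, structure_quality):
    # structure_quality in [0, 1]: 1 = sharp, distinctive structure; 0 = none.
    # By default the system leans on structure; color is weighted more
    # only as structure becomes less informative.
    w_structure = structure_quality
    w_color = 1.0 - structure_quality
    return w_structure * structure_evidence + w_color * color_evidence

# sharp scene: removing color barely changes the outcome
print(gist_activation(0.9, 0.7, structure_quality=0.9))  # ~0.88
print(gist_activation(0.9, 0.0, structure_quality=0.9))  # ~0.81
# blurred scene: removing (or mis-assigning) color now matters
print(gist_activation(0.4, 0.7, structure_quality=0.4))  # ~0.58
print(gist_activation(0.4, 0.0, structure_quality=0.4))  # ~0.16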
When structure information is poor, indistinct, or unavailable, the system could essentially tune itself toward color cues. This proposed "structure first" approach to deciding which perceptual cues to consider is consistent with the later onset of color effects seen in the current study. However, further investigation is necessary to determine the reason for the later onset of color across these experiments.

In addition to the properties of the stimulus set, the color-structure interaction framework also predicts that the default strategy of the system can be changed according to the task. If the task involves a decision that relies more on color information than on structural information (for instance, estimating the temperature of the environment or deciding indoor vs. outdoor), then removing color information should produce a cost in performance even when all structural information is available (i.e., normal, not blurred). In addition, changing the contribution of color relative to structure by changing the task could answer questions about the availability of color early on. In all experiments that showed a color effect (Experiments 5-7), the contribution of color had a later onset (~80 ms) than the onset of the response bias overall. The later onset could result from the fact that color is simply not available before 80 ms (due to a longer processing time than luminance information), or it could be that changing from the default strategy to an alternative source (i.e., from structure to color) takes time in a situation in which structure is relied upon three quarters of the time (sharp-color, sharp-monochrome, and blur-monochrome). Future studies could examine whether color effects could onset earlier with a task in which color provides the more useful information in all conditions. By changing the task, the nature of the interaction between color and structure could be further explored.

The Response Bias and Long-Term Memory for Scenes

From the current study, it is clear that certain biases are present in the initial representation of scenes. However, it is not clear how this bias changes over time and whether it is incorporated into the long-term representation of a particular scene. Furthermore, exploring how the response bias changes with increased exposure to the scene may help explain a discrepancy in the literature on memory for scenes: memory for the global properties of a scene seems to be better than memory for specific details within a particular scene. Past studies on memory for complex scenes have shown that people have an enormous capacity for remembering previously viewed scenes, even when the distractors were mirror images of previously viewed scenes (Nickerson, 1965; Shepard, 1967; Standing, 1973). Based on these studies, many researchers assumed that representations of scenes included many details, such as their component objects. The problem with this assumption is that memory for the details of a briefly presented scene is often based on the scene's associated semantic category and is affected by its associated schemas, not the actual details present in the scene (Hollingworth & Henderson, 1999). Studies have also shown that this influence of schemas is seen even when participants are given extended viewing time (Brewer & Treyens, 1981; Intraub, 1981).
For example, when a scene is viewed for either 500 ms or 5 s, participants tend to make an equal number and the same types of errors (boundary extension) when asked to reconstruct the scene or recall its details (Intraub, 1981). In another study, participants were escorted into what they thought was a graduate student's office to wait for the start of the experiment (Brewer & Treyens, 1981). In this case, participants were given several minutes to study the room, although they were not made explicitly aware that they would be tested later. Memory for the office was then tested with various measures. The most notable finding was that participants recalled non-existent objects in the room (e.g., books); these objects were inferred from the office schema rather than recalled from the room itself. Furthermore, studies examining memory for objects within scenes have led many researchers to argue that there is little or no memory for visual information. Findings such as change blindness, incremental change, and inattentional blindness have been used to argue for poor memory of scenes overall. These arguments capitalize on the fact that memory for details may be poorer than memory for distinguishing different scenes. From these studies, it seems that the biases that come with the initial perception of a scene have a strong influence on how information about a specific scene is retrieved. Despite the impressive capacity for remembering scenes, it is clear from previous studies that scene details are either poorly encoded to begin with or simply more difficult to retrieve.

From the current study, with the initial activation of the scene gist, expectations about the scene affect how its contents are later recalled. The question now becomes whether the bias seen during the initial activation of the scene is responsible for the errors made later during recall. For instance, it could be that the bias becomes part of the memory for that scene, and that further exploration adds encountered visual details to the initial representation while the initial biases persist. On the other hand, it could be that with further exploration of the scene, the biases diminish and are replaced with the information that is actually encountered. Investigating how the response bias is affected by longer viewing durations could help address this question. When asked to recall details within a scene versus recognize a previously viewed scene, participants may rely on feelings of familiarity rather than recall of actual object details. Future investigations could examine these possible differences using a technique often used in the memory literature. In this technique, often referred to as "remember/know judgments," participants are asked to make a judgment about the memories they retrieve by indicating whether they are recalling the actual instance (remember) or basing their answer on a general feeling of familiarity (know) (Inoue & Bellezza, 1998; Rajaram, 1993; Rajaram & Roediger, 1997). These types of judgments could easily be used to measure the retrieval of scene knowledge versus actual memory for the objects in a particular scene. Remember/know judgments could be an especially intriguing method for examining how an increase in encoding time affects the bias.
For instance, it could be that even when participants are given ample time to examine the scene exhaustively, the response bias will still be present and more "know" than "remember" judgments will be made. Or it could be that when enough time is allotted to sufficiently build up the representation of a particular scene, the response bias will decrease and more "remember" judgments will be made. Investigations into the variability in recall between the global properties and the details of a scene could shed some light on the reasons for the differences seen throughout the literature.

Conceptual vs. Visual Representations of Scenes

Recent controversies in the scene perception literature have led some researchers to highlight a distinction between two types of information relating to scene representations: a visual/perceptual representation and a conceptual representation of the scene (Hollingworth & Henderson, 2003; Oliva, in press; Potter, 1993; Potter et al., 2004). The perceptual representation refers to the visual information gathered and analyzed by the visual system. For instance, global visual information could include spatial layout, textures, and the most dominant hue, while specific visual details could include the orientation, shape, and color of objects. The conceptual representation, however, refers to a more abstract coding of the scene that may include related schemas and scripts, and often leads to expectations about the scene (e.g., what its component objects are and where they are likely to be found within the scene). Some researchers concentrate on the visual details of the scene: how visual information leads to the interpretation or categorization of the scene (Biederman, 1988; Homa & Viera, 1988; Marr, 1981; Oliva & Schyns, 1997, 2000; Schyns & Oliva, 1994; Torralba & Oliva, 2003), or what type of visual information is represented and stored in memory (Castelhano & Henderson, in press; Hollingworth & Henderson, 2002; Sanocki, 2003; Sanocki & Epstein, 1997). Others concentrate on the conceptual representation of the scene and how it can affect further evaluations and memory for a particular scene exemplar (Friedman, 1979; Intraub, 1981, 1988, 1999; Palmer, 1975; Potter, 1999; Potter et al., 2002, 2004). Regardless of how the representation of the scene is evaluated, studies have shown that each of these representations exists in some form within the first few hundred milliseconds of viewing. Within the first 100 ms, a representation of the scene is formed, but the representation is highly unstable (Potter, 1978; Potter et al., 2002, 2004). Potter (1999) proposed that even if scenes can be perceived within that short amount of time, one second of further processing by the system (even of a blank screen) is necessary to store a functional representation of the scene. In a recent study, Potter et al. (2002) asked participants to discriminate between scenes they had been briefly shown and scenes they had not been shown but that shared a similar conceptual representation. Results showed that participants had better memory for a conceptualization of the scene than for the specific visual details of that scene. Intraub (1981) has also shown that after a brief viewing of a scene, participants have specific biases in how the visual details of a scene are reconstructed, and have a tendency during recall to extend beyond the boundaries of a particular view of a scene.
Intraub explains these biases as a result of the related semantic information, which expands the current view because of expectations drawn from the conceptual representation of the scene. Taken together, these results have been used, erroneously, to argue that visual details of a scene are reconstructed based on the conceptual representation and that very little visual information is stored.

Other research has shown that a scene's visual information is stored and that some of this information, such as spatial layout and color, can be used to improve detection, matching, recall, and recognition performance for previously viewed scenes (Amano, Uchikawa, & Kuriki, 2002; Gegenfurtner & Rieger, 2000; Hanna & Remington, 1996; Sanocki & Epstein, 1997; Wichmann, Sharpe, & Gegenfurtner, 2002). For instance, in one study Wichmann et al. (2002) had participants recognize previously viewed scenes that were originally shown either in color or in monochrome. In one experiment, in which images were shown at test in their original state (color or monochrome), color images showed an advantage in recognition accuracy. Importantly, the images were selected from four categories (forests, flowers, rock formations, and man-made scenes), and a color advantage was found across all categories. These results indicate that the associated semantic information (e.g., forests tend to be green) may not be as important for recognition as the remembered hues in episodic memory. In another experiment, the color state of the images was swapped between study and test. Results showed an overall performance cost for showing a different state at test than was seen at study, which is consistent with the encoding specificity principle. However, the cost of removing color was much greater than that of adding it, indicating that the color in scenes is processed and stored regardless of whether it is used in the initial conceptual identification of a particular scene.

Other studies investigating the encoding and storage of visual information suggest that this information is not iconic in nature, but rather abstracted away from the percept. In one study, Amano et al. (2002) had participants view images for a memory test that had been modified by changing the color hues in a specific manner. The hues could either change but remain in the same category (e.g., one shade of red for another) or change categories completely (e.g., from red to orange). For the within-category changes, the hue could be replaced with a border hue (e.g., orangish-red) identified within the RGB color space, or with the ideal hue, defined as the center of the color category within the color space (e.g., pure red). Memory for images was enhanced when colors were replaced with the ideal hue, slightly impaired when replaced with the border colors, and greatly impaired when replaced with a hue from another category. Amano et al. concluded that memory for the color in images is not absolute, but rather more abstract. Furthermore, when participants were asked to compare these modified images to the originals, they were able to make the distinction; it is therefore not the case that these modifications produced the same subjective percept. Taken together, the results from many of these memory studies show that visual features are stored to some degree during encoding. However, the system does not encode these features as exact visual copies, but rather as abstract visual information.
Change blindness was once thought to demonstrate that the visual features of an image currently being viewed are not encoded or permanently stored (for reviews, see Hollingworth & Henderson, 2003; Rensink, 2000; Simons, 2000). However, research scrutinizing how visual information is encoded and retrieved revealed that visual information for scenes does exist in some form (Hollingworth & Henderson, 2002; Simons, Chabris, Schnur, & Levin, 2002). Further studies by Hollingworth (Hollingworth, 2003, in press; Hollingworth & Henderson, 2002) have shown that, when directly tested, participants were able to distinguish between a previously viewed object and one sharing many similar visual features. Furthermore, this ability to recall or recognize previously viewed objects remains intact even when the stimuli have been absent from view for some time (Castelhano & Henderson, in press; Hollingworth & Henderson, 2002). More recently, studies have shown that this information is stored incidentally (Castelhano & Henderson, in press; Williams et al., in press). The phenomenon of change blindness is explained as the failure either to encode the original visual feature or to retrieve the previously encoded representation and compare it to the current view (Hollingworth, 2003, in press). Therefore, not only are visual features stored during the on-line exploration of a scene, but this visual information is also stored in a more permanent representation that can potentially be accessed when the scene is encountered again.

Many of the studies reported above investigated the nature of scene representations by examining memory for the two types of information (i.e., perceptual and conceptual). The current study was designed to look specifically at the initial perception of the conceptual representation and at what perceptual factors may affect how quickly the conceptual representation is activated. Although the current study examined only the initial conceptual representation of scenes, this is not to say that some visual properties of the scene are not encoded into memory as well. Given that certain visual features act as important memory cues (e.g., color), it is intuitive to think that the information used to activate the conceptual representation of the scene is also consolidated and stored as a visual representation. However, the relation between the initial processing of visual information and the encoding of visual information into a more permanent store is still unclear. There may be differences in how the conceptual and visual information of scenes is perceived, encoded (e.g., consolidation times may vary), and retrieved. Future investigations will be needed to examine whether the initially processed visual information is in fact stored and consolidated into a long-term representation, and what the nature of that visual information is when only a brief amount of viewing time is allowed.

Conclusions

The present work examines the contribution of two perceptual factors (color and structure) to the rapid activation of scene gist. Previous research showed mixed results for the contribution of color to the initial identification of objects and scenes. As a result, researchers have been divided on whether color plays an important role in the initial processing of visual stimuli. The current study sought to explain these differences as the result of an interaction between color and structure.
Rather than an all-or-none role for color, the results from the current study suggest that the role of color may be explained in terms of the properties of the stimulus set and the task selected. One way to conceptualize this is in a dynamic visual system that shifts its use of information received from early visual processes according to the task goals. In addition, the Contextual Bias paradigm was introduced for the study of scene gist. Instead of providing a label for each scene (e.g., kitchen, bedroom, park, etc.), participants were asked to make judgments on objects that are related or unrelated to the scene. In this way, the Contextual Bias paradigm avoids the problem of subjectivity in labeling images and is still able to access the activation of the conceptual representation of the scene. Exploring the interaction between structure and color may help to guide fiiture investigations into how the visual system takes incoming visual information and activates the appropriate conceptual representation. In this way, research into the junction between early perceptual processes and later cognitive processes can help to illuminate possible strategies (either explicit or implicit in the visual system dynamics) that are used for 94 finding and using the most useful incoming information fi'om all that is made available in order to complete the current task or set of tasks most efficiently. 95 APPENDIX 96 APPENDIX A list the scenes and the accompanying consistent and inconsistent objects used in all experiments. There are 400 total scenes listed here, and all experiments with fewer stimuli were a subset taken fiom this list. These photographs were taken fiprn magazines, books, calendars, and the Internet and were formatted as 800 x 600 bmp files. Item Scene Description Consistent Inconsistent Number Object Object l mountain road truck coat rack 2 white building with courtyard water fountain couch 3 restaurant patio waiter fishing net 4 bedroom clock stove 5 city line with bridge statue barn 6 amusement park ferris wheel train station 7 shipping yard fishing net couch 8 living room bookshelf car 9 kid's bedroom books lawn mower 10 alloy street lamp water fountain 1 1 garden wind chime motorcycle 12 garden with fountain sun dial office building 13 kitchen microwave couch 14 living room lamp doll house 15 bathroom bath towel flag 16 city bridge traffic sign washing machine 1 7 cemetery gate bathtub 18 fishing boats fish stop sign 19 store fire hydrant bridge 20 building in forest flag area rug 21 bikes bike helmet cows 22 garden birdfeeder streetlamp 23 venice street balcony television 24 harbor lifejacket painting 25 front of house mailbox stop sign 26 living room TV remote bridge 27 park frisbee television 28 street street sign wardrobe 29 living room book cashier 30 bathroom toilet paper calculator 3 1 bathroom lotion skateboard 32 bathroom toothpaste dictionary 33 bathroom mouthwash grill 34 greenroom rug stove 35 city bridge helicopter ice machine 36 city billboard ocean liner 97 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 garden bar outdoor cafe patio living room bedroom backyard with pool mountainous coast cliff on ocean city street castle cemetery cemetery dining room street corner kitchen park statues skyline cliff construction site living room field cemetery courtyard field with cows courtyard street dam dining room dining room dining room dining room dining room dining room dining room dining 
room dining room dining room dining room dining room dining room dining room dining room dining room dining room dining room dining room 98 fountain martini glass bicycle couch bedside table swimsuit ocean liner sailboat motorcycle fountain flowers flowers wine glass mailbox refrigerator park bench flag ship crane lamp cows vase birdfeeder barn sun dial street lamp boat N9 china cabinet water pitcher tea cup wine bottle area rug painting fruit bowl teapot candelabra place mat wine bottle candle salt and pepper shakers coffee pot bread basket curtains chandelier portrait painting monkey bars bed microwave dresser balcony fishing boat skyscraper street lamp sandbox ocean liner street sign beach ball mouthwash hammock motorcycle tractor painting oven mug dumpster water fountain doll house dumpster ice machine parking meter couch sun dial toaster computer grill toaster bird's nest crib baseball mitt skateboard phonebook printer toy chest scale mouthwash rose bush lawn mower oven stereo toilet paper life jacket 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 dining room dining room dining room/kitchen dining room [living room dining room dining room kitchen kitchen outdoors dining room dining room outdoors dining room outdoors dining room dining room dining room dining room dining room/kitchen dining room dining room dining room dining room dining room dining room dining room dining room desk bedroom living room living room artsy un-contemporary dining room living room living room dining room bedroom bedroom living room living room kitchen street forest highway houses living room city classroom living room kitchen 99 oranges blinds garbage can area rug bookshelf place mat refrigerator stove pitcher statue bird feeder tea pot tea cup silverware salt and pepper shakers coffee pot candles napkins portrait painting flower bouquet painting clock blinds water pitcher phone waste basket painting recliner lamp television magazine rack curtains alarm clock dresser bookshelf couch stove fire hydrant deer speed limit sign mailbox bookcase harbor globe fireplace fruit bowl flag street sign skateboard toothbrush buoy bike rack toilet dresser mirror dishwasher microphone sandbox moss rake couch lawn mower chalkboard bicycle overhead projector recliner swing set pinball machine phone book printer rocking horse blender overhead projector playpen speed limit sign hammock blender toothpaste mailbox fire truck bed toaster sleeping bag harbor recycling bin television coffee maker towel chalkboard microwave sun dial globe 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 house ruins fountain in city plaza fountain in city plaza fountain in city plaza kitchen kitchen garden ancient roman\greek ruins bedroom golden gate bridge living room den rafts in canyon cemetery greenroom docked sail boat harbor/outdoor cafe underwater reef dining room bedroom pond mountain tunnel forest home office office office bedroom art studio office office office home office office backyard kitchen store kitchen living room living room classroom classroom classroom living room bedroom living room 100 flowerbed stairs hedges bench garbage can - refrigerator phone bench bird's nest toy chest sailboat coffee table lamp life jacket wreath magazines life jacket buoy sting ray china cabinet mirror 
Stimulus items 175-400 (appendix, continued). Each numbered item pairs a scene category with a target object that is consistent with the scene's gist and a target object that is inconsistent with it. Scene categories in this portion of the list include kitchen, dining room, living room, bedroom, bathroom, office, closet, attic, garden shed, temple, kiosk, restaurant, store, street, alley, city square, harbor, bridge, backyard, porch, deck, field, forest, pond, and construction site. Target objects drawn from this set include oven, cutting board, toothbrush, coffee table, television, parking meter, fire hydrant, street sign, and lawn chair.
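Although no computer code appears in the original materials, the row structure just described (item number, scene category, consistent target, inconsistent target) can be made concrete with a brief sketch. The snippet below is purely illustrative: the class name StimulusItem, the condition labels, and the example pairings are hypothetical and are not taken from the dissertation's actual stimulus assignments.

from dataclasses import dataclass

@dataclass
class StimulusItem:
    # One row of the stimulus table (illustrative only; names and pairings are hypothetical).
    item: int          # item number, e.g., 175
    scene: str         # scene category, e.g., "kitchen"
    consistent: str    # target object consistent with the scene gist
    inconsistent: str  # target object inconsistent with the scene gist

    def target(self, condition: str) -> str:
        # Return the probe object for a trial in the given condition ("consistent" or "inconsistent").
        return self.consistent if condition == "consistent" else self.inconsistent

# Hypothetical example rows, not the dissertation's actual item-object assignments.
stimuli = [
    StimulusItem(175, "bathroom", "scale", "parking meter"),
    StimulusItem(176, "kitchen", "oven", "fire hydrant"),
]

for s in stimuli:
    print(s.item, s.scene, s.target("consistent"), s.target("inconsistent"))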