THE INTEGRATION OF INFORMATION ABOUT OBJECTS ACROSS EYE MOVEMENTS

By

Daniel A. Gajewski

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

2006

ABSTRACT

THE INTEGRATION OF INFORMATION ABOUT OBJECTS ACROSS EYE MOVEMENTS

By Daniel A. Gajewski

This work investigated the nature of the information about objects that is maintained and integrated across eye movements. Given that eye movements are needed to bring objects from the periphery to the fovea so that visual details can be resolved, what information acquired about an object in the periphery before the saccade plays a functional role in the identification of that object when it is fixated? In the extrafoveal preview paradigm, participants direct their eyes to a peripherally presented preview object that is replaced during the saccade with a to-be-named target object. Inferences about the nature of the information integrated are made on the basis of preview benefits, which are the differences in naming latencies when the preview and target objects are similar versus dissimilar. The current study determined the role for visual integration in the generation of preview benefits using full-color pictures of real-world objects and a non-repeating stimulus set. Preview and target objects were from the same basic-level category but varied in terms of visual similarity. Preview benefits were observed for identical and a range of visually dissimilar previews compared to meaningless-object and different-object controls. These effects were observed despite the fact that items were not repeated.
The magnitudes of the preview benefits were largely undiminished by surface-level differences between the preview and target, such as color, texture, and the shape of the parts, suggesting that the representations involved are abstracted away from these properties. Preview benefits were reduced, however, by differences in viewpoint generated by rotation, mirror-reversal, or by taking an entirely different perspective of the object. To determine the extent to which facilitation depends on identification of the preview object before the saccade, a regression analysis was employed examining preview benefits as a function of the proportion of participants who correctly identified each object on the basis of a peripheral preview alone. Preview benefits increased as objects were more readily identified from the periphery, and the better part of the effect depended on identification accuracy; however, preview benefits were also observed for objects that never or rarely were identified from the periphery. A second regression analysis examined the combined roles for visual similarity and preview identification. Here, the importance of maintaining an object's viewpoint across saccades was confirmed, but the effect of viewpoint-similarity did not depend on the identifiability of the preview object. The pattern of results suggests that preview benefits are enhanced by but not dependent on identification before the saccade, and that integration at the level of object identity combines additively with that provided by the integration of visual information.

ACKNOWLEDGEMENTS

I would like to thank John Henderson, Rose Zacks, Erik Altman, and Fred Dyer for their helpful comments on the dissertation. I also wish to thank Gary Schrock and Christy Miscisin for their technical assistance as well as a number of research assistants who have assisted with various aspects of this project, ranging from stimulus generation to the collection of data: Jennifer Gorman, Lyaz Marshall, Paula Ogston, Tonisha Banks, Matt Piszczek, and Twila Starosciak. Portions of this research were supported by NSF-IGERT Grant ECS-9874541. This material is based upon work supported by, or in part by, the U.S. Army Research Office under grant number W911NF-04-1-0078 awarded to John Henderson.

TABLE OF CONTENTS

List of Tables .... vi
List of Figures .... vii
Introduction .... 1
    The Spatiotopic Fusion Hypothesis .... 1
    The Continuation of Processing Across Saccades .... 3
    Transsaccadic Object Identification .... 7
    The Role of Spatial Location .... 9
    The Two Representational Systems Theory .... 14
    Overview of Current Research .... 16
Experiment 1 .... 20
    Methods .... 23
    Results and Discussion .... 27
Experiment 2 .... 30
    Methods .... 33
    Results and Discussion .... 34
Experiment 3 .... 37
    Methods .... 42
    Results and Discussion .... 45
General Discussion .... 55
Appendix A .... 66
Appendix B .... 69
Appendix C .... 71
Appendix D .... 72
Appendix E .... 74
Appendix F .... 77
References .... 79

LIST OF TABLES

Table 1. Mean Naming Latencies and Standard Errors (in milliseconds) for Experiments 1 and 2 by Preview Condition .... 28
Table 2. Correlations for the Measures Employed in Experiment 3 (* p < .01) .... 49
Table 3. Beta Weights From Regression Analysis I (* p < .05) .... 50
Table 4. Correlations of Preview Benefits with Visual Similarity Ratings (* p < .01) .... 52
Table 5. Beta Weights From Regression Analysis II (* p < .05) .... 54

LIST OF FIGURES

Figure 1. Schematic illustration of the displays presented in Experiment 1 (Top). During the first display, participants fixated a small plus sign on the left-hand side of the screen. A preview object was then presented in the second display and participants initiated a saccade toward the object. While the eyes were moving, the display was changed to present the target object. Participants named the target object as quickly as possible after the saccade. Example items for the meaningless-object, different-object, and identical preview conditions are also shown (Bottom). Full-color images were used in the actual experiment. The trial illustration is not shown to scale .... 21

Figure 2. Example stimuli from Experiment 2. The columns from left to right correspond to the identical, visually-dissimilar, and maximally-dissimilar preview conditions. The objects in the identical and visually-dissimilar conditions were the same as those employed in Experiment 1.
Most objects in the maximally-dissimilar condition were different exemplars from an entirely different perspective (the camera), some were different exemplars mirror-reversed (the pen), and a few were substantially different in another way (the apple). Full-color images were used in the actual experiment .... 32

Figure 3. Frequency distribution for the proportions of participants who correctly identified a given object from the periphery in the extrafoveal identification task .... 47

Figure 4. Preview benefits (in milliseconds) as a function of extrafoveal identification .... 48

INTRODUCTION

Human vision is dynamic: one's perception of a scene is a product of a sequential sampling process. Because the foveal region of high visual acuity covers an area corresponding to only about two degrees of visual angle, the eyes are directed from one point to another at a rate of nearly three times per second to resolve and encode the details. In addition, information is extracted from the environment primarily during fixation, when the point of regard is stable (Matin, 1974; Rayner, 1998). As a result, the perception of a scene can be characterized as a series of relatively discrete glimpses or snapshots of the world, with high-resolution information available only at the center of vision. The dynamic properties of visual information acquisition, coupled with the variable resolution of the input, lead to a number of empirical questions. The current research is concerned with the nature of the representations that are integrated from one fixation to the next. In what sense does information acquired from one fixation carry over and combine with information acquired during the subsequent fixation? Must visual processing begin anew with each fixation?

The Spatiotopic Fusion Hypothesis

The initial hypothesis about the combining of information across saccades was most literal. The idea, labeled the spatiotopic fusion hypothesis by Irwin (1993), was that the contents of individual fixations could be melded together within a spatiotopic buffer to form a stable, coherent percept. The fusion hypothesis was stated formally by a number of theorists (e.g., Feldman, 1985; McConkie & Rayner, 1976; Trehub, 1977), but the idea is actually quite old. In cognitive psychology's original text, Neisser (1967) suggested that it was common to assume that successive snapshots are projected "onto the right place in a higher-level 'map' of phenomenal space" (p. 140), and by most accounts the idea can be traced back to Helmholtz ([1867] 1925). The idea of successive glimpses translated and fused together within a spatiotopic reference frame is appealing because it could simultaneously account for a number of puzzles of human visual perception. By maintaining information from previous glimpses in a spatially organized buffer, the spatiotopic fusion hypothesis gives memory a prominent role in one's immediate perception of the world, potentially explaining why disruptions associated with eye movements go unnoticed, why the percept seems to provide more detail than is available in the retinal input at any given point in time, and how the positions of objects in the world with respect to the viewer are perceived as stable despite the retinal flux, a phenomenon called visual direction constancy.
However, while there was some initial support for the idea (e.g., Breitmeyer, Kropfl, & Julesz, 1982; Davidson, Fox, & Dick, 1973; Jonides, Irwin, & Yantis, 1982; Ritter, 1976; Wolf, Hauske, & Lupp, 1978, 1980), the early findings have been countered by a number of negative results. One informative task involves the presentation of two arrays of dots, one before and one after a saccade. The arrays are created by randomly placing 24 dots in a 5 by 5 matrix so that one set of 12 appears in a presaccadic array and a different set of 12 appears in a postsaccadic array. Participants are required to report the single location in the matrix that is left unfilled. Because the arrays appear at the same location in space but at different retinal locations, successful report of the missing dot should occur only if the two arrays are fused spatiotopically. Early successes in tasks of this nature (e.g., Breitmeyer et al., 1982; Jonides et al., 1982) could not be replicated and were attributed to phosphor persistence (Irwin, Yantis, & Jonides, 1983; Rayner & Pollatsek, 1983). Although the decay rate of the screen phosphor in the original experiments was thought to be sufficiently fast, careful examination of the issue suggested that there was enough residual illumination from the initial array of dots that the two arrays could to some degree be seen at the same time.
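The construction of these dot-array stimuli is simple to make concrete. The following sketch (illustrative Python, not code from any of the original studies; all names are invented) builds a presaccadic and a postsaccadic array that jointly fill 24 of the 25 cells:

    import random

    def make_dot_arrays(rows=5, cols=5, dots_per_array=12):
        # All 25 cells of the matrix, in random order.
        cells = [(r, c) for r in range(rows) for c in range(cols)]
        random.shuffle(cells)
        missing = cells.pop()                  # the one location left unfilled
        presaccadic = cells[:dots_per_array]   # 12 dots shown before the saccade
        postsaccadic = cells[dots_per_array:]  # the other 12 shown after it
        return presaccadic, postsaccadic, missing

Only an observer who retains the presaccadic dots in spatial register with the postsaccadic dots could report the missing cell reliably, which is why failure in this task weighs against spatiotopic fusion.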
Performance failures in the dot matrix task were echoed in tasks that required the combining of line segments to form three-letter words (O'Regan & Lévy-Schoen, 1983), the summation of sine wave gratings to enhance spatial frequency judgments (Irwin, Zacks, & Brown, 1990), and the spatiotopic combining of a mask or bar probe with the location of a letter within an array (Irwin, Brown, & Sun, 1988).

The results of an early study employing an alternating letter case paradigm were also inconsistent with the spatiotopic fusion hypothesis. McConkie and Zola (1979) had participants read passages of text with words composed of alternating upper and lower case letters (e.g., ThE fLoRiDa EvErGlAdEs). During the reading of the text a number of saccades were selected on the basis of an eye velocity criterion, and during these saccades the text was either changed so that every letter switched case (e.g., tHe FlOrIdA eVeRgLaDeS) or the text remained unchanged. Fixation durations, saccade lengths, and regressive saccades were not affected by this manipulation; participants did not even notice that these changes were taking place while they were reading. If pattern information were spatiotopically aligned and combined across saccades, these changes should have been salient and reading should have been disrupted.

The Continuation of Processing Across Saccades

While the spatiotopic fusion hypothesis does not appear to be a valid conceptualization for transsaccadic integration, investigations of the perceptual span in reading supported the idea that information is integrated across eye movements in one way or another. For example, in an early study using a saccade-contingent boundary technique, Rayner (1975) had participants read short passages that contained a number of critical words that were sometimes altered until the gaze position crossed a software-defined boundary. Fixation durations after the boundary crossing were shorter when the word before the crossing shared properties of the word after the crossing, such as beginning letters or word shape, suggesting that information acquired during the pre-crossing fixation contributes in some way to processing during the post-crossing fixation.

To investigate this process more directly, Rayner (1978; Rayner, McConkie, & Ehrlich, 1978) developed an extrafoveal preview paradigm. In this paradigm, participants initiate a saccade toward a location in the periphery where a preview word is presented. While the eyes travel toward the preview item (a word or nonword), a saccade-contingent display change is executed and the preview item is replaced with a to-be-named target word. The underlying assumption is that identification and naming will be facilitated to the extent that information provided by the preview is retained and integrated with information acquired when the target is fixated. The kind of information that is integrated can be explored within this paradigm by manipulating the similarity between the preview and target items. Preview benefits, the difference in naming latencies when the preview and target items are similar versus dissimilar, should be observed when the additional information provided by the similar preview is maintained and integrated.
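In other words, the preview benefit is a simple difference score computed over condition mean latencies. A minimal worked example (the latencies below are invented for illustration only) shows the arithmetic:

    from statistics import mean

    # Hypothetical naming latencies in milliseconds, by preview condition.
    latencies = {
        "identical":  [620, 655, 640],
        "dissimilar": [700, 690, 710],
        "control":    [760, 745, 775],
    }

    def preview_benefit(condition, baseline="control"):
        # Benefit = mean baseline latency minus mean condition latency.
        return mean(latencies[baseline]) - mean(latencies[condition])

    print(round(preview_benefit("identical")))   # -> 122 ms
    print(round(preview_benefit("dissimilar")))  # -> 60 ms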
In their studies, the orthographic similarity of preview and target items was manipulated. Target words (e.g., phone) were named fastest when the preview was identical. However, naming was faster when the preview was a word (plane) or a nonword (ptcne) that shared the target's initial and final letters and maintained the target's shape than when only the terminal letters (psfne) or only the shape (qtcuc) were maintained. These data suggested that information about terminal letters and word shape is maintained and integrated across eye movements but not lexical or semantic information, which bolstered the argument that extrafoveal information acquired during one fixation could be maintained and integrated across an eye movement, and that this process supports the identification of words when they are directly fixated.

The properties of transsaccadic integration in the context of reading and word recognition have subsequently been elaborated by Rayner and colleagues (Balota & Rayner, 1983; Pollatsek, Lesch, Morris, & Rayner, 1992; Rayner, McConkie, & Ehrlich, 1978; Rayner, McConkie, & Zola, 1980), primarily using the extrafoveal preview paradigm. Rayner et al. (1978), for example, demonstrated that the benefit associated with the extrafoveal preview is not tied to the execution of the saccade, but when an eye movement is executed, facilitation occurs only for the region toward which the eyes are moving. This study also provided evidence that the effect was one of facilitation as opposed to interference: all alternate word preview conditions showed a benefit relative to a no-extrafoveal-stimulus condition (a single asterisk). Rayner et al. (1980) further investigated the kind of overlap between the preview and target words that was necessary to produce a benefit. Preview benefits were observed when the preview and target shared two or three initial letters (green-grave or grain-grave), but these benefits were not as large as when the preview and target were identical. There was no benefit associated with previews that shared only the first letter (write-walks), that shared all four ending letters (write-trite), that were semantically related (write-print), or that shared the first phoneme (write-rough). Pollatsek et al. (1992), however, did show a phonological effect in this task: preview benefits for homophones were somewhat greater than for pairs matched in terms of visual similarity (cite-site versus cake-sake).

The view of transsaccadic integration that has emerged from these studies is one of continued processing of text based on letter codes abstracted away from precise visual form (e.g., type style and letter case). Words are processed in the periphery, but this processing does not proceed to the point of complete identification. Rayner et al. (1978) argued that if the preview were fully identified before the saccade, naming would suffer from interference when the preview was a word other than the target. However, their data showed a facilitative effect even when this was the case. In addition, fully identified preview words that differ from the target might be expected to intrude during naming, which did not occur. The integration process, then, according to Pollatsek et al. (1992), can be characterized in one of two ways, depending on one's "modeling taste" (p. 159). First, integration could be explained in terms of the activation of abstract letter codes (graphemes) as well as phonemes. These orthographic and phonologic units remain active across the saccade, which shortens the time needed for identification once the word is fixated. Second, integration could be explained in terms of activation of a neighborhood of entries in the lexicon. Because neighborhood activation is thought to be influenced by its similarity to the information coded from the preview, the latter of these two might be better suited to deal with the fact that identical words provide more facilitation than do homophones or words that share the first three letters, which suggests that factors like word shape also come into play.

The continued-processing view of transsaccadic integration has done equally well in the context of the viewing of pictorial stimuli. In an initial study of transsaccadic integration for real-world objects, Pollatsek, Rayner, and Collins (1984) manipulated the visual, conceptual, and name similarity of preview and target objects using line drawings in the extrafoveal preview paradigm. In their study, target objects were named more quickly when the preview and target objects were identical. However, the amount of facilitation was not reliably diminished when the size of the object changed from one fixation to the next. In addition, preview benefits were observed when the preview and target objects were different exemplars from the same basic-level category. Although the results suggested that veridical representations are not integrated across eye movements (see also Henderson, 1997), additional experiments supported the idea that there is a visual component to the effect over and above that derived from the objects belonging to the same category or having the same name: target objects (e.g., a ball) were named faster after previews that were visually similar (e.g., a tomato) than after those that were semantically similar (e.g., a bat), and greater facilitation was observed when the preview was a mirror image of the target object than when it was a different object with the same name (e.g., a baseball bat and a flying-mammal bat).
On the basis of this study, Pollatsek et al. concluded that integration occurs at the level of the visual features of the object as well as its name.

Transsaccadic Object Identification

The continued-processing framework applied to pictorial stimuli returns transsaccadic integration to the domain of scene perception, but now information integration is posited to play a role in the identification of objects as opposed to the compiling of a highly-detailed internal picture of the world. That is, given that the perception of a scene largely entails the sequential fixation of objects, and that eye movements serve to bring objects from the peripheral to the central region of the visual field so that the details can be resolved, what information acquired about an object from beyond fixation has a functional role in the identification of that object when it is fixated?

The problem is best understood when the relationship between eye movements and attention is considered. The most prevalent view of the saccade-attention dynamic is one where a shift of attention to the location of the upcoming saccade target precedes the change in gaze direction (Henderson, 1992b; Henderson, Pollatsek, & Rayner, 1989; Hoffman & Subramaniam, 1995; Kowler, Anderson, Dosher, & Blaser, 1995; Shepherd, Findlay, & Hockey, 1986). Interestingly, preview benefits were used as a tool to investigate the allocation of attention in the context of eye movements in much the same way that they were used to investigate the perceptual span in reading. Henderson et al. (1989) had participants sequentially view an array of four objects arranged in a square in preparation for a memory test. The availability of the objects during the course of viewing was manipulated using a moving window technique. Fixation durations on the objects were shorter when the full display was available during the entire viewing sequence than when the objects were presented one at a time as they were foveated. Objects were also fixated more briefly in a condition where two objects were presented at a time, the foveated object and the next object in the viewing sequence. Importantly, the full-display condition did not provide an additional advantage over the foveated-plus-next-object condition, suggesting that extrafoveal information acquisition was limited to the object that was about to be fixated.

Sequential attention models such as the one proposed by Henderson (1992b) suggest that attention is initially allocated to the foveated stimulus. When processing at the center of fixation is complete or nearly complete, attention is disengaged and reallocated to a more peripheral location. This reallocation of attention coincides with the programming of an eye movement that brings the center of vision to the newly attended region of the visual field. Importantly, the lag between the shift of attention and the execution of the eye movement affords the visual system a blurry glimpse of the object that is the target of the impending saccade. Transsaccadic integration can thus be thought of as a combining of processing initiated on the object at the extrafoveal region of the retina with that initiated when the object is foveated.

The Role of Spatial Location

A question that has been raised about the integration of information about objects is whether facilitation of processing depends on the object occurring in the same location before and after the saccade.
The question is important because the location dependence or independence of the effect provides information regarding the kind of representational systems that play a role in transsaccadic integration. The candidates will be referred to here as the object type and object token representational systems. The object type system is responsible for the identification of objects. Location-independent benefits are generally taken to suggest the priming of long-term memory representations stored within this system. These would correspond to the visual descriptions that support object identification as well as conceptual identity and name codes. This source of facilitation would not be expected to depend on the location of the object in the preview display because the system that supports object identification is generally thought to be independent of the system that supports object localization, as suggested by dissociable effects of damage to what has become known as the what and where pathways (Ungerleider & Mishkin, 1982).

The object token system, on the other hand, is responsible for maintaining information about objects as they move or change (Kanwisher & Driver, 1992; Kahneman, Treisman, & Gibbs, 1992). An influential theoretical instantiation of a token system involves a construct termed the object file (Kahneman & Treisman, 1984). Object files are short-term, episodic representations. Importantly for the present discussion, they are thought to be addressed by spatial and temporal coordinates rather than by form or identity (Kahneman et al., 1992; Kanwisher & Driver, 1992; Treisman, 1993). The construct is founded on the idea that an object is an object by way of its continuity in space and time, a concept that is often illustrated with an example of apparent motion. Consider a movie of a simple object, such as a square, translating across a computer display so that when it reaches mid-screen it is replaced with a triangle. The perception of motion, of course, is created by controlling the displacement of the object from one frame to another in the movie. By manipulating the displacement parameters, however, the display can be made either to appear as a square being transformed into a triangle or as a disappearing square and an appearing triangle. Of more practical import, the primacy of spatiotemporal continuity accounts for the fact that one's perception of an object can evolve over time. For example, a vehicle viewed in one's rearview mirror may appear as a police car when distant and as a civilian automobile with a roof rack when near. While the identity ascribed to the object changes, its continuity as a single object remains stable. Indeed, an important aspect of the file metaphor is that information can be added as it becomes available during the course of a perceptual episode. Importantly, because the information is indexed by location, preview benefits arising from the object token representational system should be location-dependent.

To determine whether the preview benefit depends on the continuity of object location, Pollatsek, Rayner, and Henderson (1990) used a modified version of the extrafoveal preview paradigm. The primary difference was that the new version had two objects in the preview display, side-by-side in the periphery. When the participants initiated a saccade, one of the preview objects was replaced with a target object and the other was replaced with a checkerboard mask so that only one nameable object remained.
The location-dependency of the preview benefit was examined by manipulating the target object such that it occurred in the same or in a switched position relative to its position in the preview display. The greatest portion of the benefit observed in this version of the task was location-independent. That is, there was an advantage associated with having the target object in the preview display, but the additional benefit of having it remain in the same location was small. As a result, they suggested that the identification of objects from one fixation to the next was primarily facilitated by the activation of object representations stored in long-term memory.

While the Pollatsek et al. (1990) study implicated representations that are not referenced by location, Kahneman et al. (1992) found evidence for the primary involvement of spatially-indexed representations in the integration of information across disruptions of another kind. Their study was aimed not at the integration of information across eye movements but at the maintenance of an object's identity through change and motion. The logic and technique, however, were nearly identical. Whereas the Pollatsek et al. (1990) experiments provided an index of the benefit of having the preview in the same versus a different location, Kahneman et al. (1992) established a benefit that is tied to having identity information associated with the same versus a different object. In their experiments, objects were defined as entities independent of their identities. This was accomplished by having square frames appear in one display with a letter in each. A linking display containing empty frames appeared in such a way as to produce the perception that the frames moved from one location to another. When the frames arrived at their final location, one of the preview letters was displayed in either the same or a different frame. Compared to a control condition, the letter was named faster if it appeared in the same object frame. There was little or no benefit derived from the mere presence of the letter in the other object frame, supporting the idea that the maintenance of object identity is accomplished through the reviewing of object files. Specifically, Kahneman et al. proposed that an object file is created during the initial view of an object and information is collected within the file as it becomes available, including a visual description of the object as it develops and the identity of the object once recognized. During subsequent views, the object file can be retrieved on the basis of its spatial and temporal position and target identification can then be speeded by reviewing the contents of the file.

The fact that performance in these two similar tasks favored two different representational systems warranted further investigation. It was possible that the discrepancy between the two studies could be accounted for by the fact that viewing in one was transsaccadic and the other was within-fixation; however, there were a number of methodological differences that might also have contributed to the differences in results. To address these issues, Henderson and Anes (1994) put together a study that captured elements of the two previous approaches. Like the Pollatsek et al. (1990) study, they measured transsaccadic effects. However, like the Kahneman et al. (1992) study, they used letters in frames and a smaller stimulus set. In addition, the items were aligned vertically in the preview and target displays and the mask was eliminated.
Kahneman et al. argued that the appearance of the mask in the switch condition of the Pollatsek et al. study might have generated the perception of motion. If this were the case, the observed effects would have to be considered object-specific. Finally, Henderson and Anes manipulated the number of task-relevant items in the preview display. While the target display always had a single to-be-named letter flanked by a plus sign, the preview display could have either two letters (a target letter and a flanking letter) or a letter and a plus sign. This manipulation was expected to provide converging evidence for the involvement of object files under the assumption that only the construction or reviewing of object files would be capacity-limited. Thus, if there was a location-dependent component to the effect, only it would be reduced by the additional item in the preview display. With these modifications in place, Henderson and Anes found both location-dependent and location-independent preview benefits: targets were named more quickly in the same versus switch conditions as well as in the switch versus control conditions. In addition, only the location-dependent benefit was reduced by having a task-relevant flanker object in the preview. When the preview was two letters as opposed to a letter and a plus sign, the object-specific benefit was reduced but the nonspecific benefit remained unchanged.

The Two Representational Systems Theory

On the basis of the findings discussed above, Henderson (1994) proposed a two-representational-systems theory to explain how information from one fixation might facilitate the identification of an object viewed in a subsequent fixation. On this account, two sources contribute independently to the integration process. First, the initial view of an object generates activity at a number of levels of representation in the system that supports object identification. As a result, the identification of target objects can be facilitated by the priming of visual descriptions stored in long-term memory, basic-level semantic categories, and/or the object's name. A second source of facilitation is derived from the construction and review of object files (Kahneman et al., 1992). However, because only the location-dependent component of the effect was modulated by the task relevance of the flanking object, either the construction or the reviewing of object files is resource-limited.

The generality of the two-representational-systems framework has subsequently been tested using pictorial stimuli. The pattern of results found using letters in frames was replicated by Henderson and Siefert (2001) using line drawings of objects: both location-dependent and location-independent benefits were found and only the location-dependent benefit was reduced by the presence of a meaningful flanker in the preview display. More telling, however, was the pattern of results observed in a second experiment that included a mirror reversal condition, a condition that manipulates visual but not semantic or name content. Here, the location-dependent benefit was reduced when the preview and target were mirror images, but the location-independent benefit was undiminished by this manipulation. This has been taken as rather strong evidence in favor of two independently contributing representational systems.
While the type representation would be abstracted to the identity or concept level, only the token representation would be expected to preserve detailed information about the form of the specific object.

The two-representational-systems theory did, however, change in terms of its alliance with the object file theory. Henderson and Siefert (2001) opted to use the term token to refer to the episodic representation of their theory because the pattern of results observed in the transsaccadic studies was not entirely consistent with the object file theory. In particular, the object file theory suggests that the object file is the representation that gets matched to the long-term representations during identification, and as a result, the priming of object types is mediated by the object file. In the two-representational-systems theory, however, a resource limitation is associated with the episodic representation but not with the priming of object types. To accommodate the transsaccadic data, the object file theory would have to be modified to include a limitation on the review of object files that is not imposed on their construction. An additional finding that may be difficult to reconcile with the object file theory is the fact that the pattern of results found in the transsaccadic studies holds when the retinal events are simulated within a steady fixation (Henderson, 1994; Henderson & Anes, 1994). Because the objects are moved from the periphery to the center of vision while fixation is maintained, the spatial coordinates of the objects change whether the target object is in the same or a switched location. This finding suggests that a configural spatial code plays a role in the integration process, and it is unclear how such a coding scheme would play out in the object file theory.

Overview of Current Research

While the pattern of results found in the location-dependency experiments suggests that integration can occur within a system that codes object types as well as within a system that codes object tokens, integration could be taking place at a number of levels of representation within each of these systems. Indeed, the study by Pollatsek et al. (1984) suggested that visual information and the object's name are each maintained and integrated across eye movements. According to the two-representational-systems theory advanced by Henderson (1994; Henderson & Siefert, 2001), the presaccadic allocation of attention towards the objects in the periphery can result in activation at the level of stored visual descriptions, semantic categories, and/or object names. Residual activity in the object recognition system can then produce location-independent priming at each of these levels when the target is processed after the saccade. In addition, a small number of object tokens will be constructed before the eye movement. When spatial continuity is maintained across the saccade, the token is retrieved and the properties of the object are reactivated within the object recognition system, leading to a location-dependent source of facilitation. In sum, the research and theory to date suggest that the integration of information about real-world objects can range from fairly detailed visual descriptions of objects to their conceptual identities and names. The overarching goal of the present study was to determine the relative contribution of the varied levels of representation using full-color pictures of real-world objects in the extrafoveal preview paradigm.
One objective was to determine the kind of visual information that plays a role in transsaccadic object identification. While studies in the context of reading suggest that only abstract letter codes are integrated (e.g., Rayner et al., 1980), previous research using pictures of objects suggests that visual features contribute to the integration process. Evidence for a visual component is given by an advantage for identical preview and target objects over that derived from objects that visually differ, such as mirror reversals and token substitutions (e.g., Henderson & Siefert, 2001; Pollatsek et al., 1984). The additional benefit for the identical preview is diagnostic because the two kinds of previews differ only in terms of their visual properties. Preview benefits could arise from higher levels of representation in both of these cases because the preview and target objects always share a conceptual identity and name; however, the advantage for the identical preview is thought to reflect a visual component because the contribution of the higher levels of representation is assumed to be equated. In an effort to systematically investigate the kinds of visual properties that are pertinent, Experiments 1 and 2 manipulated the visual similarity between preview and target objects. Of particular interest was the nature of the difference that was required to produce an identical preview advantage.

A second objective was to examine the effects of extrafoveal previews using a non-repeating stimulus set. To date, the studies that have investigated the transsaccadic integration of pictorial information have all employed relatively small sets of items that repeat within a session (e.g., Gajewski & Henderson, 2005; Henderson, 1992a; Henderson, Pollatsek, & Rayner, 1987; Pollatsek et al., 1984, 1990). That is, naming latencies for a given target object were measured for each participant as well as for all preview conditions employed. Initial studies examining transsaccadic integration in reading were met with criticism due to the fact that items came from a limited stimulus set (McClelland & O'Regan, 1981; Paap & Newsom, 1981). The argument was that preview benefits could be driven by expectations derived from the repetition of stimulus items within an experimental session. Preview benefits were found to survive when repetition was controlled in the context of reading (Balota & Rayner, 1983), but this issue has not yet been addressed using pictorial stimuli.

Testing the generality of the preview benefits with pictorial stimuli in the non-repeated context is important for at least two reasons. First, repetition effects on preview benefits might be stronger with objects compared to words because object shapes are likely to be visible in the periphery due to the usefulness of lower spatial frequency information. As a result, familiarity with the objects could allow participants to bypass the normal identification process by simply associating a constellation of (context-specific) visual features with an object's name. If facilitation is limited to contexts where familiarity with the objects is high, it would constrain what could be said about the continuation of identification processing across saccades. Second, because identification of the preview object itself has been shown to occur quite readily in the repeated-item context (Pollatsek et al., 1984), investigations using item repetitions may not be best suited for testing integration at visual levels of representation.
As mentioned above, evidence for a visual component to transsaccadic integration is given by the identical preview advantage. While testing the generality of the preview benefits is interesting in its own right, examining the impact of visual differences in the non-repeated context should be most informative. That is, assuming the identical preview advantage reflects the occurrence of integration at a level of representation prior to identification, the paradigm should be most sensitive to these differences when the probability of reaching identification before the saccade is minimized, as would be the case when objects are not repeated.

A third objective was to determine the relative contribution of visual versus identity and name levels of representation to the generation of preview benefits in the non-repeated context. Preview benefits observed in the extrafoveal preview paradigm suggest that information acquired from the periphery during one fixation can facilitate the identification of objects during a subsequent fixation; however, this facilitative effect could be driven almost entirely by the attainment of identification before the saccade. Experiment 3 addressed this issue in two regression analyses. In the first analysis, preview benefits were examined as a function of accuracy in an extrafoveal identification task, which required participants to identify objects based solely on a brief peripheral glimpse. If preview benefits are driven primarily by identification of the preview object before the saccade, extrafoveal identification should strongly predict the magnitude of the preview benefits observed, and there should be little or no benefit for objects that are rarely or never identified in the periphery. On the other hand, if preview benefits do not depend on identification, there should be facilitation even for objects that are rarely or never identified peripherally.

In the second analysis, the contributions of visual and higher levels of representation were quantified by a multiple regression model that simultaneously accounted for preview identification and visual similarity. Of particular interest was whether the effect of visual similarity on preview benefits would vary as a function of extrafoveal identification. One hypothesis was that the visual contribution to the effect should be greatest for objects that are not readily identified from the periphery. When objects are readily identified from the periphery, the identity of the object should frequently be activated or retrievable upon completion of the saccade, potentially obscuring integration at visual levels of representation. If this is the case, the benefit of visual similarity should be greatest for objects that are at the low end of the extrafoveal identification scale. On the other hand, objects that are more readily identified from the periphery may better activate the stored descriptions that support identification. If this is the case, the benefit of visual similarity should be greatest for objects that are at the high end of the extrafoveal identification scale.
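At its core, the first of these analyses is an ordinary least squares regression of item-level preview benefits on extrafoveal identification proportions. The sketch below (with invented numbers; not the dissertation's data or analysis code) illustrates the logic: a reliably positive intercept would indicate facilitation even for objects never identified in the periphery, while the slope captures the identification effect.

    def ols(x, y):
        # Ordinary least squares slope and intercept for y regressed on x.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
                 / sum((a - mx) ** 2 for a in x))
        return slope, my - slope * mx

    ident = [0.0, 0.1, 0.3, 0.6, 0.9]          # proportion identified extrafoveally
    benefit = [40.0, 55.0, 70.0, 95.0, 120.0]  # preview benefit (ms), illustrative
    slope, intercept = ols(ident, benefit)
    print(round(slope), round(intercept))      # -> 86 43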
EXPERIMENT 1

Experiment 1 employed the extrafoveal preview paradigm of Pollatsek et al. (1984), but with photo-realistic pictures of objects instead of line drawings. Each trial consisted of three displays, as depicted in Figure 1. First, a fixation display was presented, consisting of a fixation cross on the left-hand side of the screen and a square frame on the right-hand side of the screen. The participant began each trial with their gaze directed at the fixation cross. Second, a preview display was presented. The preview display was exactly the same as the fixation display except that an object (meaningful or meaningless) appeared within the frame on the right. Participants were instructed to shift their gaze toward the object in the frame as quickly as possible once it appeared. Third, a target display was presented once the eyes crossed a software-defined boundary. Target displays were configured the same as the preview displays, except that only meaningful objects would appear within the frame on the right. Participants named the target object as quickly as possible and their vocal response terminated the target display.

[Figure 1 appears here; its example-item panels are labeled Control, Dissimilar, Similar, and Identical/Target.]

Figure 1. Schematic illustration of the displays presented in Experiment 1 (Top). During the first display, participants fixated a small plus sign on the left-hand side of the screen. A preview object was then presented in the second display and participants initiated a saccade toward the object. While the eyes were moving, the display was changed to present the target object. Participants named the target object as quickly as possible after the saccade. Example items for the meaningless-object, different-object, and identical preview conditions are also shown (Bottom). Full-color images were used in the actual experiment. The trial illustration is not shown to scale.
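The boundary-contingent display change at the heart of each trial amounts to polling gaze position and swapping displays the moment the boundary is crossed. The toy simulation below (pure Python with simulated gaze samples; all names are invented and no real eyetracking or display API is implied) sketches that control flow:

    import random

    BOUNDARY_X = 3.3   # boundary location in degrees right of fixation

    def sample_gaze():
        # Stand-in for one eyetracker sample of horizontal gaze position;
        # here a simulated rightward saccade, not real tracker output.
        sample_gaze.x += random.uniform(0.2, 1.2)
        return sample_gaze.x

    sample_gaze.x = 0.0

    def run_trial():
        display = "preview"                  # preview object visible in the frame
        while sample_gaze() < BOUNDARY_X:    # keep polling until the crossing
            pass
        display = "target"                   # swap completed mid-saccade
        return display

    print(run_trial())                       # -> "target"

In the actual experiment the swap had to finish within the saccade itself, while vision was suppressed; display changes took at most 20 ms (see the Apparatus section below).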
Experiment 1 was designed to satisfy three goals. The first goal was to test the generality of the preview benefit by using a non-repeating stimulus set. Each object was presented only once to each participant. Thus, there was no opportunity for participants to generate expectations concerning the stimulus set or to learn to associate particular visual features in the periphery with specific target objects.

The second goal was to examine the kind of visual information that is integrated across eye movements. Pollatsek et al. (1984) showed that a portion of the benefit is derived from the activation of visual features; however, the amount of visual feature overlap that is needed to produce facilitation remains an open question. In addition, the studies manipulating location suggest that two kinds of visual representations contribute independently to the integration process: spatiotemporally addressed episodic representations that are thought to include visual details associated with particular instantiations, and stored object descriptions that are considered more abstract. To examine the contribution of detailed versus abstract forms of visual representation, the visual similarity of preview and target objects was manipulated in four preview conditions (identical, visually-similar, visually-dissimilar, and control). Preview and target objects in the experimental conditions could differ visually but were always from the same basic-level category, and the objects were selected so as to be from the same approximate viewpoint (see Figure 1). As a result, they differed primarily in terms of surface-level features, such as color, texture, and the shapes of the parts. If preview benefits are driven by representations that preserve these properties, there should be a reduction in the magnitude of the preview benefit that corresponds to the reduction in the visual similarity between the preview and target objects. On the other hand, if preview benefits are driven primarily by representations that are abstracted over these properties, the manipulation of visual similarity should have little or no effect.

The final goal was to determine the extent to which name priming contributes to performance in the non-repeated context. Pollatsek et al. (1984) found an inhibitory effect when the preview and target objects were from a different basic-level category: target objects were named more quickly in a control condition without a preview object than in the different-object preview condition. The inhibitory component was determined to reflect the availability of the preview object's name. When the preview object was closer to the point of initial fixation, the preview was more readily identified and name inhibition was elevated. In the present study, identification of the preview objects was assumed to be less frequent because each object appeared only once per session. Nevertheless, the contribution of name priming to the preview benefit is to some degree indicated by the amount of interference generated by the different-object preview.

Experiment 1 was divided into two subexperiments. In Experiment 1A, the control object for each target was from a different basic-level category than the target. In Experiment 1B, a meaningless non-object was used as a control. If name activation is a dominant component of the preview benefit in the non-repeated context, naming should be faster in the non-object control condition than in the different-object control condition, because only meaningful preview objects are expected to be associated with a name that would interfere with the naming of the target object.

Method

Participants. Thirty-two Michigan State University undergraduate students participated in the experiment for course credit, 16 each in Experiments 1A and 1B. All participants had normal or corrected-to-normal vision and were naive with respect to the hypotheses under investigation.

Stimuli. The stimuli consisted of full-color pictures of real-world objects and a meaningless object. The meaningless object was created by taking a mottled color pattern and shading it to give it dimension. The shading was based on overlapping simple geometrical figures. Three exemplars of 60 object types were selected from the Hemera Photo-objects 50,000 Premium Image Collection. Objects were selected so that all the tokens within a category were from the same approximate point of view. The selection of real-world objects was based on two norming studies: the first rated pairs of object tokens for visual similarity, and the second rated the target objects for naming consistency (see Appendix A). Target, similar, and dissimilar objects were selected on the basis of the mean similarity ratings for each pair of object tokens so as to minimize the visual difference between the target and similar objects, and to maximize the visual difference between the target and dissimilar objects. The mean visual similarity score was reliably greater in the visually-similar condition (3.39) compared to the visually-dissimilar condition (2.19), t(59) = 19.49, p < .001, and all target objects were given the same name at least 75% of the time. An additional 15 objects were selected for the different-object preview condition. Examples for two object types are shown in Figure 1, and the visual similarity and naming consistency scores for the objects employed are presented in Appendix B.

Preview and target displays comprised an object centered within a square frame on the right-hand side of the screen and a fixation cross on the left-hand side of the screen. A total of 196 displays were generated using the three exemplars of 60 objects, the 15 objects in the different-object preview condition, and the meaningless object. The object pictures were 5.5° in height and 5.1° in width on average at a viewing distance of 58 cm. The meaningless object was 5.8° in height and 6.4° in width. The frame subtended 8.8° vertically and horizontally. The fixation marker was vertically centered and 4.0° from the left-hand side of the screen. There was a 23.8° separation between the fixation marker and the center of the object frame on the right-hand side of the screen.
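These sizes, in degrees of visual angle, follow directly from physical stimulus size and the 58 cm viewing distance. The short function below (a worked example, not part of the original experimental materials; the 5.6 cm object height is an assumed value) reproduces the mean object height reported above:

    import math

    def visual_angle_deg(size_cm, distance_cm=58.0):
        # Visual angle subtended by a stimulus of size_cm at distance_cm.
        return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

    # An object about 5.6 cm tall at 58 cm subtends roughly 5.5 degrees.
    print(round(visual_angle_deg(5.6), 1))   # -> 5.5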
Apparatus. Stimuli were displayed at a resolution of 800 by 600 pixels in 24-bit color on a 19-inch Dell P991 monitor driven by an NVIDIA GeForce3 video graphics card with a screen refresh rate of 100 Hz. The room was illuminated by fluorescent overhead lighting. Eye movements were monitored using an ISCAN ETL-400 pupil and corneal reflection tracking system sampling at 240 Hz. The position of the right eye was tracked, though viewing was binocular. The eyetracker is accurate to within 0.25° of visual angle both horizontally and vertically. The computer changed the display contingent on detecting an eye movement that crossed an invisible boundary positioned 3.3° to the right of the fixation marker and 20.6° to the left of the center of the target objects. Display changes required a maximum of 20 ms and were accomplished during the saccade when vision was suppressed. Stimulus presentation and response collection were controlled by E-Prime experimental software. Naming latencies were collected with a voice key provided by E-Prime. The eyetracker and display monitor were interfaced with a 2 GHz Pentium 4 microcomputer. The computer controlled the experiment and maintained a complete record of the position and time values for the point of regard, as well as time values for voice key events over the course of each trial.

Procedure. Upon arriving for the experimental session, each participant was seated comfortably. A forehead rest minimized head movements and maintained viewing distance. The session began with a generic object naming task to provide the experimenter an opportunity to adjust the sensitivity of the microphone. None of these objects were used in the actual experiment. The eyetracker was calibrated at the beginning of the session and then checked between trials using the fixation display. Participants were asked to direct their eyes to the fixation marker and to the center of the object frame. If the calibration was satisfactory (plus or minus 0.5° from each of the positions), the participant was asked to direct their gaze toward the fixation marker to indicate readiness to begin. The experimenter then initiated each trial by pressing a silent button. The fixation display was replaced by the preview display and the participant immediately initiated a rightward saccade to the object centered within the frame. During the saccade, the preview display was replaced by the target display. The target display remained in view until the participant responded by naming the object as quickly as possible.

In both Experiments 1A and 1B, each participant named 60 objects. Trials were produced by the within-participant combination of four preview conditions: identical, visually-similar, visually-dissimilar, and control. For Experiment 1A, the controls were objects from a different basic-level category. For Experiment 1B, the control was always the non-object. Within each subexperiment, items were assigned to preview conditions via a Latin square design so that each object appeared in each condition an equal number of times across participants. The order of object presentation (and hence the order of condition presentation) was determined randomly for each participant within each subexperiment. Participants were assigned to subexperiment using a pseudorandom procedure; each participant took part in only one experiment. The entire session lasted approximately 30 minutes.
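The Latin square assignment amounts to rotating the 60 items through the four conditions across four counterbalancing groups of participants. A minimal sketch (illustrative only, not the original assignment code):

    CONDITIONS = ("identical", "similar", "dissimilar", "control")

    def latin_square_assignment(n_items=60):
        # One item-to-condition map per counterbalancing group; across the
        # four groups, every item serves in every condition exactly once.
        k = len(CONDITIONS)
        return [{item: CONDITIONS[(item + group) % k]
                 for item in range(n_items)}
                for group in range(k)]

    # Within each group, 15 of the 60 items land in each condition (60 / 4).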
Results and Discussion

The mean naming latencies for this analysis appear in Table 1. Naming latencies were defined as the elapsed time between the crossing of the display-changing boundary and the onset of the vocal response. These means exclude trials on which the target object was named incorrectly, an anticipatory eye movement occurred (saccade latencies of less than 100 ms), or the naming latency was less than 200 ms or more than 3 standard deviations greater than the mean naming latency for that subject. Eliminated trials accounted for 10% of the data in Experiment 1A and 11% of the data in Experiment 1B.

            Identical    Visually     Visually     Maximally    Different-   Non-object
                         Similar      Dissimilar   Dissimilar   object       Control
                                                                Control
Exp 1A      796 (46)     802 (29)     812 (31)                  933 (29)
Exp 1B      774 (24)     806 (28)     810 (27)                               908 (31)
M           785 (25)     804 (20)     811 (20)
Exp 2       677 (20)                  690 (22)     721 (17)                  785 (19)

Table 1. Mean Naming Latencies and Standard Errors (in milliseconds) for Experiments 1 and 2 by Preview Condition.

Saccade latencies were marginally slower in Experiment 1A (mean = 323 ms) than in Experiment 1B (mean = 269 ms), F(1,30) = 3.00, MSE = 31,811, p = .09, but did not differ across conditions in either experiment, F < 1, and F(3,45) = 1.725, MSE = 11,235, p = .18, respectively. The source of the between-experiment difference in saccade latencies is unknown; however, the analyses that follow do not indicate a differential impact on the measures of interest.

The first question addressed in Experiment 1 was whether item familiarity through repetition is a prerequisite for the observation of extrafoveal preview benefits on the identification and naming of real-world objects. Analyses of variance (ANOVAs) were performed on each subexperiment by participants (F1) and by items (F2), with preview condition as the within-participants and within-items factor, respectively. There was an effect of preview condition in Experiment 1A, F1(3,45) = 8.53, MSE = 7,995, p < .001, F2(3,177) = 14.04, MSE = 18,491, p < .001, and in Experiment 1B, F1(3,45) = 12.07, MSE = 4,437, p < .001, F2(3,177) = 12.14, MSE = 19,328, p < .001. In both subexperiments, naming latencies were faster in all three experimental conditions than in the control conditions (all ps < .01).
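For readers unfamiliar with the F1/F2 convention, the sketch below illustrates on simulated data how the by-participants and by-items analyses might be run. It assumes a fully crossed layout for simplicity, whereas the actual experiment assigned items to conditions by Latin square; the data frame and effect sizes are invented.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated long-format data: 16 participants x 60 items x 4 conditions,
# with a 120 ms slowdown in the control condition (values are invented).
rng = np.random.default_rng(0)
conditions = ["identical", "similar", "dissimilar", "control"]
rows = [{"participant": p, "item": it, "condition": c,
         "rt": 800 + (120 if c == "control" else 0) + rng.normal(0, 40)}
        for p in range(16) for it in range(60) for c in conditions]
df = pd.DataFrame(rows)

# F1: condition as a within-participants factor; items are averaged away.
f1 = AnovaRM(df, depvar="rt", subject="participant",
             within=["condition"], aggregate_func="mean").fit()
print(f1.anova_table)

# F2: the same model with items treated as the random factor instead.
f2 = AnovaRM(df, depvar="rt", subject="item",
             within=["condition"], aggregate_func="mean").fit()
print(f2.anova_table)
```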
Naming latencies were slower overall than those typically observed using repeated items, but the magnitude of the preview benefits in the present study was as great as or greater than that observed in previous studies (e.g., Henderson, 1992a; Henderson et al., 1987; Pollatsek et al., 1984, 1990). For example, across several experiments, naming latencies for the control conditions in the Pollatsek et al. (1984) study ranged from 681-787 ms (compared to 908 ms in the present study), and the identical versus control preview benefits in their experiments ranged from 85-135 ms (compared to 136 ms in the present study). Thus, the results of Experiments 1A and 1B showed robust preview benefits, despite the fact that each participant saw each object only once.

The second objective was to determine the role of visual similarity in the generation of preview benefits. In Experiment 1A, there were no differences between experimental conditions when the control conditions were eliminated from the analyses (ps > .82). In Experiment 1B, the effect of preview condition was marginal by items, F2(2,118) = 2.88, MSE = 17,682, p = .06, when the control conditions were eliminated, but it was not reliable by participants, F1(2,30) = 1.64, MSE = 3,663, p = .21. To determine whether the effect of visual similarity could be examined using the full power of both subexperiments, a mixed ANOVA was performed on the entire experiment with version included as a between-participants factor in the analysis by participants and as a within-items factor in the analysis by items. There was no effect of version, F1(1,30) = .03, MSE = 34,038, p = .87, F2(1,59) = 2.17, MSE = 11,807, p = .15, and the effect of preview condition did not differ between subexperiments, F1(2,60) = .27, MSE = 5,408, p = .77, F2(3,177) = 1.20, MSE = 18,966, p = .31. As within each subexperiment, the effect of preview condition was eliminated when the control condition was removed from the analysis, F1(2,60) = 1.036, MSE = 5,409, p > .35, F2(2,118) = 1.19, MSE = 8,937, p = .31. However, because previous research has shown an advantage for identical previews, planned comparisons were conducted between the identical preview condition and the similar and dissimilar preview conditions. The contrast between the identical and dissimilar conditions was of particular interest because the items were selected so as to minimize the difference between the identical and similar conditions. Consistent with previous research, there was some indication that detailed visual representations contributed to performance: naming latencies were marginally faster in the identical preview condition relative to the dissimilar preview condition, t1(31) = -1.75, p = .09, t2(59) = -1.68, p = .10. The better part of the preview benefit, however, appears to be driven by a more abstract level of representation. The advantage of the dissimilar over the control condition (110 ms) was four times greater than the advantage of the identical over the dissimilar condition (26 ms). The fact that the visual effect was tenuous here suggests that the visual differences between preview conditions were too small, and/or that the stimulus properties manipulated were largely inconsequential to the integration process.

The final objective of this experiment was to examine the role of name activation in the generation of preview benefits in the context of a non-repeating stimulus set. This issue was addressed by comparing naming latencies in the two control conditions. If name activation were a significant component of the effect, there should be interference when there is a mismatch between the name of the preview and the name of the target. While the mean naming latency was numerically greater in the different-object control condition than in the non-object control condition, this difference was not reliable, t1(30) = 0.605, p = .549, t2(59) = 1.03, p = .31.
The result of this comparison suggests that name activation did not play as significant a role in the present study as it did in the Pollatsek et al. (1984) study. Assuming that name inhibition depends on preview identification, there are at least two differences between studies that could account for this difference. First, while full-color pictures of objects were employed here (as opposed to line drawings), the objects in the present study were displayed at a greater distance from the initial fixation. Second, the use of a non-repeating stimulus set was intended to reduce familiarity with the items. Each of these differences would be expected to reduce the frequency of preview identification, leading to a decreased contribution of name activation.

EXPERIMENT 2

While the identical preview condition showed an advantage relative to the dissimilar condition, the difference in naming latencies between these two conditions was small and statistically marginal. There are at least two potential reasons for the tenuousness of the visual effect. First, visual differences between the preview and target object may have a lesser impact in the context of a non-repeating stimulus set. That is, while preview benefits measured as the difference between the identical and control conditions generalize to the non-repeated context, item familiarity may be required to observe evidence of visual integration. This would be surprising, however, given that the lack of repetition was assumed to reduce the probability that the objects would be fully identified before the saccade, an assumption supported by the absence of name interference. If the contributions of the identity and name levels of representation are indeed reduced, one might expect a greater role for visual information in the integration process. Nevertheless, the overall slowing of naming in the non-repeating (infinite-set) paradigm could decrease its sensitivity to more subtle effects. A second possibility, however, is that the visual differences between the preview and target objects in Experiment 1 were simply too small.

The purpose of Experiment 2 was to extend Experiment 1 by employing a preview condition that introduced greater visual differences between the preview and target objects. The task and conditions employed were identical except that the visually-similar condition was dropped in favor of a very visually-dissimilar condition (which will be termed the maximally-dissimilar condition), and only the non-object control was employed. The objects in the maximally-dissimilar condition were selected so as to be visually different and from a different viewpoint, visually different and mirror-reversed, visually different and rotated, or substantially different in another way (e.g., an apple chewed to the core as a preview for an unblemished apple; see Figure 2). If the limited effect of visual similarity in Experiment 1 should be attributed to the kind of visual differences employed, there should be a more robust identical preview advantage relative to the maximally-dissimilar condition.

Figure 2. Example stimuli from Experiment 2. The columns from left to right correspond to the identical, visually-dissimilar, and maximally-dissimilar preview conditions. The objects in the identical and visually-dissimilar conditions were the same as those employed in Experiment 1. Most objects in the maximally-dissimilar condition were different exemplars from an entirely different perspective (the camera), some were different exemplars mirror-reversed (the pen), and a few were substantially different in another way (the apple). Full-color images were used in the actual experiment.
Method

Participants. Thirty-two Michigan State University undergraduate students participated in exchange for course credit or were paid. All participants had normal or corrected-to-normal vision, were naive with respect to the hypotheses under investigation, and had not participated in Experiment 1.

Stimuli. The stimuli were largely the same as those used in Experiment 1, except that the visually-similar condition was dropped and additional exemplars were selected for the maximally-dissimilar condition. The objects were selected so as to maximize their visual differences with respect to the targets without becoming obscure. For most items the selected object was visually dissimilar and from a different viewpoint; however, some of the objects were visually dissimilar objects mirror-reversed or altered in some other way. Of the original 60 items, there were 8 for which a suitable object could not be found. To maintain the size of the original stimulus set, these were replaced with alternates, which also required selecting targets and exemplars for the visually-dissimilar condition.

Apparatus and Procedure. The apparatus and procedure were identical to Experiment 1, except that a display containing the correct name of the object was presented on the computer screen after each object was named to facilitate on-line scoring by the experimenter, a set of 15 practice trials was administered immediately before each experiment began, and all participants were presented with the same non-object control condition. The entire session lasted approximately 30 minutes.

Results and Discussion

The mean naming latencies for this analysis appear in Table 1. As in Experiment 1, these means exclude trials on which the target object was named incorrectly, an anticipatory eye movement occurred (saccade latencies of less than 100 ms), or the naming latency was less than 200 ms or more than 3 standard deviations greater than the mean naming latency for that subject. Eliminated trials accounted for 16% of the data. Saccade latencies did not differ across the three experimental conditions (mean = 299 ms), F(2,62) = 1.058, MSE = 1,568, p = .35, but were slower in the control condition (mean = 332 ms), F(1,31) = 8.954, MSE = 2,034, p < .01.

Naming latencies were subjected to within-participant and within-item ANOVAs, which revealed reliable differences across the preview conditions, F1(3,93) = 18.20, MSE = 4,053, p < .001, F2(3,177) = 13.00, MSE = 9,869, p < .001. Naming latencies were faster in all three experimental conditions than in the control condition (all ps < .01), and there was an effect of preview condition when the control condition was removed from the analysis, F1(2,62) = 4.167, MSE = 3,953, p < .05, F2(2,118) = 3.10, MSE = 7,684, p < .05. Interestingly, although naming latencies in Experiment 2 were more than 100 ms faster than in Experiment 1, the magnitude of the preview benefits was comparable: the differences between naming latencies in the identical and control conditions were 123 ms and 108 ms in Experiments 1 and 2, respectively. Of particular interest was whether there would be an identical preview advantage over one or both of the visually different preview conditions.
Planned comparisons showed a 44 ms advantage for the identical preview over the maximally-dissimilar condition, F1(1,31) = 9.198, MSE = 3,410, p < .001, F2(1,59) = 5.65, MSE = 8,135, p < .05, but the 14 ms advantage over the visually-dissimilar condition was not reliable, Fs < 1. The fact that naming was faster in the identical preview condition than in at least one of the visually different preview conditions dispels the idea that visual effects in the preview paradigm are strictly tied to repeated contexts. The clear failure to find an identical preview advantage over the visually-dissimilar condition, however, which was the same in both experiments, suggests that the marginal effect in Experiment 1 was spurious, and that the visual differences employed there were too subtle or the stimulus properties manipulated were inconsequential to the integration process.

Indeed, the overall pattern of results is indicative of the kind of visual information that is relevant to the integration process. Surface-level features, such as the object's color, texture, and the shape of its parts, appear not to factor in prominently, if at all: preview benefits in the visually-dissimilar preview condition were as great as those observed in the identical preview condition. It is important to note that the failure to find an effect of integration does not mean the preview and target objects in these conditions are indistinguishable across saccades. In a change detection task using the stimuli and display parameters of Experiment 1, differences between the visually-similar preview and the target could be detected 59% of the time, and differences between the visually-dissimilar preview and the target could be detected 87% of the time, both well above the 6% false alarm rate (for details, see Appendix C). Thus, it is not the case that the fleeting and poorly resolved extrafoveal retinal image renders the difference between these preview and target objects imperceptible.

While the failure to find an identical preview advantage over the visually-dissimilar condition suggests that the representations involved are abstracted away from the surface-level properties, the advantage for the identical preview over the maximally-dissimilar preview suggests that the properties manipulated in that condition are important. Although the nature of the difference between these two conditions was somewhat varied and could be considered a matter of degree, the majority of objects represented a difference in viewpoint of one kind or another, whether by rotation, mirror-reversal, or by taking an entirely different perspective. The importance of maintaining an object's viewpoint across saccades should not be surprising given the empirical support for the view that object recognition is viewpoint-dependent (e.g., Tarr, Williams, Hayward, & Gauthier, 1998; see Tarr, 2003, for a review). Transsaccadic integration cast in terms of object identification would be expected to reflect image properties that are captured by the object descriptions stored in long-term memory. If facilitation in the extrafoveal preview paradigm is in part determined by the priming of the object descriptions that support identification, it is the similarity or visual overlap between the preview object and the description that ultimately gets matched to the target object that should determine the magnitude of the benefit.
The present findings could therefore be interpreted as consonant with image-based models of recognition (e.g., Poggio & Edelman, 1990; Riesenhuber & Poggio, 2000; Tarr & Bülthoff, 1998; Ullman, 1998), which suggest that the descriptions that support identification are two-dimensional images corresponding to a small number of familiar views of a given object.

The overall pattern of results is also indicative of the contribution of detailed versus abstract visual representations. Recall that the two-representational-systems theory (Henderson, 1994; Henderson & Siefert, 2001) suggests a contribution of object types and object tokens, with detailed visual information provided by the retrieval of tokens and abstract visual information provided by the priming of types. Given that objects always appeared in the same location in this study (i.e., spatiotemporal continuity was maintained), preview benefits were expected to have a contribution from type priming as well as from token retrieval. In the present experiment, preview benefits were undiminished by the visual differences between preview and target objects in the visually-dissimilar condition. If it were the case that these differences could not be discerned from the periphery, then the integration of visual information would simply reflect the precision of the episodic representation. Instead, these differences were readily noticed in the change detection task, which suggests that there is information captured by the episodic representation that has no bearing on the integration process.

EXPERIMENT 3

In contrast to the goals of Experiments 1 and 2, which were primarily to determine the kind of visual information that is relevant to the integration process, Experiment 3 was designed to determine the relative contribution of visual versus higher levels of representation, such as an object's conceptual identity and name. In the present study, the use of a non-repeating stimulus set was assumed to reduce the probability that the objects would be fully identified before the saccade. The assumption that identification of the preview object factors in less prominently in the non-repeated context was supported by the lack of name interference. However, picture naming is generally held to comprise three relatively discrete stages: object identification, name activation, and response generation (Johnson, Paivio, & Clark, 1996). Thus, it is possible that attention to the preview object before the saccade led to identification but not name activation. Indeed, the results of Experiment 2 suggest a substantial contribution of non-visual levels of representation. While there was a 44 ms identical preview advantage, the maximally-dissimilar preview condition generated a reliable 64 ms preview benefit measured relative to the non-object control, presumably driven by activation of the preview object's conceptual identity, and perhaps its name, though to a lesser extent.

A question that arises, then, is whether partitioning the preview benefit into components that do and do not depend on the visual form of the preview object is equivalent to partitioning it into components that do and do not depend on preview identification. Theoretically, the identical preview advantage can be said to reflect only the contribution of visual information, because the preview objects in the identical and maximally-dissimilar conditions differed only visually.
Similarly, the component that is unaffected by visual differences between preview conditions can be said to reflect the contribution of conceptual identity and name, because these were held constant across preview conditions. Assuming that only the contributions of conceptual identity and name depend on preview identification, attributing the identical preview advantage to the component of the preview benefit that does not depend on identification is perfectly reasonable. However, visual differences between the preview conditions may be accompanied by systematic differences in how readily the preview object can be identified. If the preview object tended to be more identifiable in the identical preview condition, the identical preview advantage would overestimate the visual component of the effect. An alternative approach is therefore needed to fully tease apart the visual and higher-level components, an approach that accounts for the identifiability of the preview object.

The approach employed in Experiment 3 was to determine the role of preview identification in the generation of preview benefits using a series of items-based regression analyses. The general idea was to examine preview benefits for objects in the extrafoveal preview paradigm as a function of their proportions correct in an extrafoveal identification task. The extrafoveal identification task was a modified version of the extrafoveal preview paradigm. Each trial began with a fixation display and was followed by a preview display that contained an object in a frame on the right-hand side of the screen. However, during the saccade the preview object was replaced with a question mark to cue participants to report the name of the object that had appeared in the preview display. Thus, performance in this task was based solely on the information that could be acquired from a brief peripheral glimpse. The underlying assumption was that the proportion of participants who correctly identify a given object in the identification task reflects the probability that the object will be identified before the saccade in the preview paradigm.

In Experiment 3, there were two sets of analyses. The primary goal of the first analysis was to determine the predictive value of extrafoveal identification in the generation of preview benefits using a simple linear regression model. To accomplish this objective, naming latencies for 120 objects were collected in the identical preview and non-object control conditions of the extrafoveal preview paradigm, and identification accuracy was measured for each object in the extrafoveal identification task. Of particular interest were the slope and intercept terms given by the regression. Because the possible values on the extrafoveal identification scale extend from 0 to 1, the slope and intercept terms were considered indicative of the components that do and do not depend on identification, respectively. That is, the magnitude of the preview benefit for objects that were never identified was given by the intercept with the Y axis, and the additional benefit for objects that were always identified was given by the slope.

An additional goal of the first analysis was to determine whether the extrafoveal identification task would predict the magnitude of the preview benefit beyond its relationship to foveal identification time. The extrafoveal identification measure reflects the probability that an object can be identified based on a brief peripheral glimpse, and objects that are more readily identified from the periphery would be expected also to be more quickly identified at fixation. Because the predictive value of extrafoveal identification could be completely tied to its relationship to foveal identification time, it is important to determine whether extrafoveal identification predicts the magnitude of preview benefits when foveal identification time is controlled for statistically. To accomplish this objective, naming latencies were collected for objects presented at the center of the screen with no eye movement needed, and the two predictors were examined in a hierarchical regression analysis with foveal naming time entered in the first step and extrafoveal identification accuracy entered in the second. If extrafoveal identification has unique predictive value, it should account for variance unaccounted for by the foveal identification time measure.
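A sketch of this two-step analysis, run on simulated item-level data, may help fix ideas. The real values came from the naming and identification tasks; the variable names, distributions, and generating coefficients below are assumptions.

```python
import numpy as np
import statsmodels.api as sm

# Simulated item-level data (assumed values, chosen only for illustration).
rng = np.random.default_rng(1)
n_items = 117
foveal_rt = rng.normal(900, 80, n_items)     # foveal naming latency (ms)
extra_id = rng.uniform(0, 1, n_items)        # proportion identified extrafoveally
benefit = 33 + 85 * extra_id + rng.normal(0, 60, n_items)

# Step 1: foveal naming time alone.
step1 = sm.OLS(benefit, sm.add_constant(foveal_rt)).fit()

# Step 2: add extrafoveal identification as a second predictor.
X2 = sm.add_constant(np.column_stack([foveal_rt, extra_id]))
step2 = sm.OLS(benefit, X2).fit()

# The increment in R-squared is the unique variance explained by ExtraID.
delta_r2 = step2.rsquared - step1.rsquared
print(f"R2 step1 = {step1.rsquared:.3f}, R2 step2 = {step2.rsquared:.3f}, "
      f"delta R2 = {delta_r2:.3f}")
```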
The second analysis also had two goals. The first goal was to determine, using the regression approach, whether viewpoint differences between the preview and target object are the primary visual determinants of the magnitude of the preview benefit. The objects in the maximally-dissimilar condition of Experiment 2 were primarily from different viewpoints, but this manipulation was not as systematic as would be ideal because the viewpoint differences were created by rotation, mirror-reversal, and by taking an entirely different perspective. The most powerful manipulation would be to cross viewpoint and surface-level differences in an experimental design, which would allow one to directly compare preview benefits for identical objects from different views with preview benefits for different objects from the same viewpoint; however, a sufficient number of object pairs from matching views could not be found in the database of photo-realistic objects employed in this study. The advantage of the regression approach is that the contributions of viewpoint and surface-level properties can be teased apart with fewer constraints on stimulus selection. To accomplish this objective, naming latencies were collected in a different-token version of the extrafoveal preview paradigm with the visual similarity of the preview and target items ranging from nearly identical and from the same viewpoint to appreciably different and from a different viewpoint. The object pairs were normed for object similarity, which was the visual similarity of the objects disregarding differences in viewpoint, as well as for viewpoint similarity, which was the similarity of the view of the objects disregarding differences in the objects themselves. If visual integration is based on representations that are abstracted over surface-level properties but not differences in viewpoint, the magnitude of the preview benefit should be predicted by the viewpoint-similarity rating but not by the object-similarity rating.

The second goal was to determine the contributions of visual and identity levels of representation to the integration process by examining the combined effects of visual similarity and extrafoveal identification in a multiple regression model. By including both predictors, the preview benefit can be partitioned into components that do and do not depend on the visual form of the preview object as well as into components that do and do not depend on preview identification.
In other words, because preview objects vary in terms of their similarity to the target object as well as their identifiability, the relative contributions of visual and higher levels of representation can be teased apart most effectively by accounting for each of these variables simultaneously.

An additional benefit of the multiple regression approach is that the effect of visual similarity can be examined as a function of extrafoveal identification by including a visual similarity x extrafoveal identification interaction term. Throughout the paper it has been assumed that the attainment of identification before the saccade would obscure the contribution of visual effects. In fact, part of the motivation for the use of a non-repeated stimulus set hinged on this idea. An alternate hypothesis, however, is that the objects that are more readily identified also better activate the stored visual descriptions. There is actually some indication that this might be the case. The Pollatsek et al. (1984) study included a retinal eccentricity manipulation, which had been shown to affect the identifiability of the preview object. Interestingly, there was a trend toward a greater identical preview advantage in the condition where the preview object had the higher probability of identification (see their Experiment 5). If this is the case here, the benefit of visual similarity should be greater for objects at the high end of the extrafoveal identification scale.

Method

Participants. One hundred sixty-five Michigan State University undergraduate students participated in the experiment for course credit (25 in the extrafoveal identification task, 60 in the identical/non-object control version of the extrafoveal preview paradigm, 30 in the different-token version of the extrafoveal preview paradigm, and 25 in each of 2 versions of the foveal naming task). All participants had normal or corrected-to-normal vision and were naive with respect to the hypotheses under investigation.

Stimuli. A set of 120 object pairs was selected for the study on the basis of a preliminary object-naming task. Target objects were selected so that the same name was generated by at least 85% of these participants. Each item was paired with another object from the same basic-level conceptual category. The pairs were selected by the experimenter with the goal of creating a stimulus set with a wide range of visual differences between the object pairs. The pairs of objects were then normed for visual similarity (see Appendix D), with ratings obtained on three scales: 1) object similarity, where participants were instructed to rate pairs based on the similarity of the objects themselves while disregarding differences in viewpoint; 2) viewpoint similarity, where participants were instructed to rate pairs based on the similarity of the viewpoint of the objects while disregarding differences in the appearance of the objects; and 3) image similarity, where participants were simply asked to indicate the visual similarity of the two objects without further instruction. (The image similarity scale was included so that the relative importance of object- and viewpoint-similarity to judgments of visual similarity could be determined, but it was not a variable of primary interest.) The pairs of objects were divided into two sets prior to the collection of data. Objects in the first set were used as previews and targets in the identical condition and as previews in the different-token condition.
Objects in the second set were used as targets in the different-token condition. The assignment of objects to sets was arbitrary but with a bias toward putting the more canonical picture in the second set; however, none of the objects were obscure.

Procedure. Three tasks with slightly different procedures were employed in Experiment 3. The procedure for the extrafoveal preview paradigm was the same as in Experiment 2, except that the number of trials doubled. The assignment of participants to conditions was based on the two analyses of interest: one group of participants was presented with the identical and non-object control conditions, and another group of participants was presented only with the different-token condition. Participants in the identical/non-object control group saw all 120 items, with half of the items assigned to the identical condition and half assigned to the control condition. Objects were assigned to preview conditions via a Latin square design so that each object appeared in each condition an equal number of times across participants. Participants in the different-token group saw all 120 preview and target pairs.

The procedure for the extrafoveal identification task was largely the same as that used for the extrafoveal preview paradigm, except that participants were instructed to name the object that appeared in the preview display instead of the target display. During the saccade, the preview display was replaced by a display that contained a question mark centered within the frame. The question mark remained in view until the participant responded by naming the object, but speed of response was not emphasized. Separate groups of participants were used for each set of objects so that each exemplar appeared only once in a session.

The foveal naming task comprised three display events: a fixation display that was presented until the participant pressed the mouse button, an object display that was presented until the onset of the voice response, and a scoring display that contained the correct name of the object. As with the extrafoveal identification task, separate groups of participants were used for each set of objects. The order of object presentation was determined randomly for all tasks and participants. The sessions lasted approximately 30 minutes.

Results and Discussion

Analysis I: Preview Benefits by Extrafoveal Identification and Foveal Naming Speed

The first goal of Experiment 3 was to assess the relationship between preview benefits and extrafoveal identification using an items-based regression approach. To accomplish this goal, performance measures were acquired from three tasks: 1) naming latencies in the identical and non-object control conditions of the extrafoveal preview paradigm; 2) naming latencies in the foveal naming task; and 3) proportions correct in the extrafoveal identification task. Naming latencies in the extrafoveal preview paradigm were defined as the elapsed time between the crossing of the display-changing boundary and the onset of the vocal response, and preview benefits were the differences in mean naming latencies between the identical and non-object control conditions by item. Consistent with Experiments 1 and 2, these means excluded trials on which the target object was named incorrectly, an anticipatory eye movement occurred (saccade latencies of less than 100 ms), or the naming latency was less than 200 ms or more than 3 standard deviations greater than the mean naming latency for that subject.
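The screening rules applied to every latency analysis in this study reduce to a short filter. The sketch below is an illustration with hypothetical column names; the thresholds are those stated above, and the demo data frame is invented.

```python
import pandas as pd

# Sketch of the trial-screening rules used throughout; column names are
# hypothetical, and the cutoffs are the ones stated in the text.
def screen_trials(trials: pd.DataFrame) -> pd.DataFrame:
    kept = trials[
        trials["named_correctly"]
        & (trials["saccade_latency_ms"] >= 100)   # drop anticipatory saccades
        & (trials["naming_latency_ms"] >= 200)    # drop implausibly fast responses
    ]
    # Per-subject cutoff: mean + 3 SD of that subject's naming latencies.
    cutoff = kept.groupby("subject")["naming_latency_ms"] \
                 .transform(lambda s: s.mean() + 3 * s.std())
    return kept[kept["naming_latency_ms"] <= cutoff]

demo = pd.DataFrame({"subject": [1, 1, 1, 1, 1],
                     "named_correctly": [True, True, False, True, True],
                     "saccade_latency_ms": [250, 90, 300, 280, 260],
                     "naming_latency_ms": [850, 700, 900, 950, 820]})
print(screen_trials(demo))
```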
Naming latencies in the foveal naming task were defined as the elapsed time between the onset of the picture and the onset of the vocal response. As with the extrafoveal preview paradigm, these means excluded trials in which the target object was named incorrectly, as well as trials on which the naming latency was less than 200 ms or more than 3 standard deviations greater than the mean naming latency for that subject. Performance in the extrafoveal identification task was measured as the proportion of participants who correctly identified each object on the basis of the preview alone. These proportions were based only on trials with saccade latencies of at least 100 ms. After computing means for the items, 3 were eliminated from the analysis because their mean naming latencies were more than 3 standard deviations greater than the mean of all the items. For the 117 remaining items, eliminated trials accounted for 13% of the data in the extrafoveal preview paradigm, 11% of the data in the foveal naming task, and 5% of the data from the extrafoveal identification task.

Figure 3 shows a frequency distribution for the extrafoveal identification task. The X axis represents the proportions of participants who correctly identified a given object, and the Y axis represents the number of objects that were correctly identified at a given rate. The mean proportion correct was 0.54. As can be seen in the figure, the entire range of identification scores was represented in the data: accuracy ranged from 0 to 100 percent.

Figure 3. Frequency distribution for the proportions of participants who correctly identified a given object from the periphery in the extrafoveal identification task.

It is important to note that the saccade latencies in the extrafoveal identification task were comparable to those observed in the extrafoveal preview paradigm. Because the preview object disappeared saccade-contingently in the identification task, participants might have been led to a strategy of delaying their saccades so as to allow attention to covertly dwell on the object. While saccade latencies were somewhat slower in the identification task (mean = 294 ms) relative to the identical preview condition of the preview paradigm (mean = 279 ms), F(1,116) = 21.11, MSE = 601.07, p < .001, the difference was small, and latencies were slower still in the non-object control condition (mean = 323 ms), F(1,116) = 33.94, MSE = 1476.31, p < .001. The similarity of the saccade latencies suggests that participants were not adopting different eye movement strategies in the two tasks. Indeed, participants were often surprised to learn that the timing of the display changes was under their behavioral control.

In the analysis of primary interest, preview benefits in the preview paradigm were examined as a function of the proportions correct in the identification task using a simple linear regression model. Figure 4 shows a scatter plot of the data with the regression line.

Figure 4. Preview benefits (in milliseconds) as a function of extrafoveal identification.
The X axis represents the proportions correct in the identification task, and the Y axis represents the corresponding mean preview benefits. As can be seen in the figure, preview benefits rise with increasing identification accuracy; extrafoveal identification explained a significant amount of variance in preview benefits, R2 = .14, F(1,115) = 18.409, p < .001. The fact that such a relationship is observed is not surprising; larger preview benefits would be expected for objects that are more readily identified from the periphery because more information would be available to contribute to target identification after the saccade. What is of interest is the degree to which preview benefits depend on extrafoveal identification. The most extreme possibility would be the case where facilitation occurs only when the preview object itself is identified. The present analysis does not favor such a conclusion, however, because the intercept term differed reliably from 0, b0 = 32.70, t(115) = 2.59, p < .05, suggesting a 33 ms benefit for objects that are never identified from the periphery. Although the data do not support complete dependence on extrafoveal identification, preview identification appears to play a prominent role in the generation of preview benefits. The slope term suggests an additional 85 ms benefit for objects that are always identified, b1 = 85.28, t(115) = 4.29, p < .001. Given the equation that emerged from the present analysis, Y' = 85(ExtraID) + 33, the component that does not depend on identification is a little more than a third the size of the component that does depend on identification. In contrast, the identical preview advantage of Experiment 2 was about two-thirds the size of the component that did not depend on visual form. The difference between experiments suggests that preview identification may have played a more prominent role in the generation of preview benefits than indicated by the identical preview advantage of Experiment 2.
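To make the partition explicit, the fitted equation can be read off directly: the intercept estimates the benefit for an object that is never identified from the periphery, and the slope estimates the additional benefit for an object that is always identified. A worked check of the values reported above:

```python
# Worked check of the fitted model Y' = 85(ExtraID) + 33 reported above.
slope, intercept = 85.0, 33.0

def predicted_benefit(prop_identified: float) -> float:
    """Predicted preview benefit (ms) given the extrafoveal identification rate."""
    return slope * prop_identified + intercept

print(predicted_benefit(0.0))   # 33.0 ms: the identification-independent component
print(predicted_benefit(1.0))   # 118.0 ms: 33 ms plus the 85 ms identification component
print(intercept / slope)        # ~0.39, i.e., "a little more than a third"
```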
The second goal of the present analysis was to determine whether the predictive value of extrafoveal identification extends beyond its relationship to foveal identification. Table 2 shows the correlations for the measures from the three tasks employed.

                                            1.      2.      3.      4.      5.
1. Extrafoveal Identification               --
2. Foveal Naming                           -.38*    --
3. Naming Latencies (Non-object Control)   -.38*    .86*    --
4. Naming Latencies (Identical Preview)    -.60*    .80*    .78*    --
5. Preview Benefits                         .37*    .03     .27*   -.40*    --

Table 2. Correlations for the Measures Employed in Experiment 3 (* p < .01).

As would be expected, extrafoveal identification and naming latencies in the foveal naming task were reliably correlated, r = -.38, p < .001. It is not surprising that objects more readily identified from the periphery can be more quickly identified when foveated. The question is whether extrafoveal identification accounts for variance in preview benefits over and above the variance it shares with foveal naming speed. Naming latencies in the foveal naming task were highly correlated with those in the identical and non-object control conditions (rs ≥ .80, ps < .01), which would be expected given that naming the objects at fixation is a component of performance in each of these conditions. What is interesting, however, is the fact that the correlation between foveal naming and preview benefits was not reliable, r = .03, p = .76. The failure to find an effect here suggests that preview benefits are independent of foveal naming speed, and as a result, there is no reason to expect the predictive value of the extrafoveal identification task to be mediated by foveal naming. Nevertheless, preview benefits were examined as a function of extrafoveal identification and foveal naming in a hierarchical regression analysis with foveal naming entered in the first step (see Table 3).

                            B        β       t
Step 1   Foveal Naming     .019     .029    0.31
Step 2   Foveal Naming     .130     .197    2.14*
         ExtraID         102.26     .446    4.84*

Table 3. Beta Weights From Regression Analysis I (* p < .05).

When foveal naming and extrafoveal identification were both included in the model, both terms were reliable (ps < .05). Moreover, extrafoveal identification explained 17% of the variance in preview benefits unaccounted for by foveal naming, ΔR2 = .17, F(1,114) = 23.43, p < .001. Thus, extrafoveal identification has unique predictive value; the probability that an object can be identified on the basis of a brief peripheral glimpse alone is not tied to the object's speed of foveal identification.

Analysis II: Preview Benefits by Extrafoveal Identification and Visual Similarity

The goal of this second analysis was to assess the integration of visual information across saccades by examining the effect of visual similarity on naming latencies in the preview paradigm. Of particular interest was whether preview benefits would depend more on viewpoint- than object-similarity, and whether the effect of visual similarity would depend on extrafoveal identification. To accomplish this goal, performance measures were acquired from three tasks: 1) naming latencies in the different-token condition of the extrafoveal preview paradigm; 2) naming latencies in the foveal naming task; and 3) proportions correct in the extrafoveal identification task. In the different-token condition, preview objects were taken from the first set of items and target objects were taken from the second. As a result, the second set of objects was employed in the foveal naming task and the first set was employed in the extrafoveal identification task (as it was in the first analysis).

Preview benefits for this analysis were the differences in mean naming latencies between the different-token condition and the foveal naming task by item. As a result, naming latencies in the different-token condition were measured from the onset of the first fixation after the display-changing saccade. Consistent with all other analyses, these means excluded trials on which the target object was named incorrectly, an anticipatory eye movement occurred (saccade latencies of less than 100 ms), or the naming response (measured from the boundary crossing) was less than 200 ms or more than 3 standard deviations greater than the mean for that subject. To account for bad eye-tracking samples, trials were also eliminated if the difference between the saccade latency and the onset of the post-saccadic fixation was more than 2 standard deviations above or below the overall mean. Naming latencies in the foveal naming task were defined as the elapsed time between the onset of the picture and the onset of the vocal response.
As with the extrafoveal preview paradigm, these means excluded trials in which the target object was named incorrectly, as well as trials on which the naming latency was less than 200 ms or more than 3 standard deviations greater than the mean naming latency for that subject. Saccade latencies averaged 289 ms in the different-token condition, which was not reliably different from the saccade latencies in the extrafoveal identification task (mean = 293 ms), F(1,119) = 1.426, MSE = 626.17, p = .23. The analysis was based on all 120 items. Eliminated trials accounted for 18% of the data in the extrafoveal preview paradigm, 10% of the data in the foveal naming task, and 5% of the data from the extrafoveal identification task.

The first goal of the analysis was to evaluate the three visual similarity scales by testing the correlation of each with preview benefits in the different-token condition (see Table 4).

                            1.      2.      3.      4.
1. Object-similarity        --
2. Viewpoint-similarity     .17     --
3. Image-similarity         .87*    .53*    --
4. Preview Benefits         .10     .25*    .14     --

Table 4. Correlations of Preview Benefits with Visual Similarity Ratings (* p < .01).

To begin, object-similarity and viewpoint-similarity were only marginally correlated (r = .17, p = .07), which shows that the object pairs varied largely independently on these dimensions. Interestingly, image-similarity was more strongly correlated with object-similarity (r = .87) than with viewpoint-similarity (r = .53), t(117) = 9.55, p < .01, suggesting that differences between the objects themselves factor more strongly in the psychological assessment of visual similarity. Nevertheless, viewpoint-similarity was the only similarity scale that correlated with preview benefits (r = .25, p < .01). This finding provides converging evidence for the relative importance of maintaining viewpoint versus surface-level properties in the transsaccadic integration of visual information.

The second goal of the present analysis was to evaluate the combined roles of extrafoveal identification and visual similarity in the generation of preview benefits. Of particular interest were a) determining the relative contribution of visual and higher levels of representation by accounting for visual similarity and extrafoveal identification simultaneously, and b) determining whether the contribution of visual information changes over the levels of extrafoveal identification. As with the analysis above, the predictive values of the variables were evaluated using the hierarchical regression approach. Viewpoint similarity was the visual similarity scale employed in this analysis because it was the only scale that varied with the magnitude of the preview benefits. Extrafoveal identification was entered in the first step so that the additional contribution of viewpoint similarity could be evaluated in the second step (see Table 5).

                                      B        β       t
Step 1   ExtraID                    120.60    .545    7.06*
Step 2   ExtraID                    116.14    .525    6.92*
         VP-similarity               14.75    .193    2.54*
Step 3   ExtraID                     33.24    .150    0.50
         VP-similarity                3.27    .043    0.31
         ExtraID x VP-similarity     22.13    .429    1.28

Table 5. Beta Weights From Regression Analysis II (* p < .05).

Extrafoveal identification again explained a significant amount of variance in preview benefits, R2 = .30, F(1,118) = 49.85, p < .001, and the addition of viewpoint similarity in the second step improved the fit of the model, ΔR2 = .04, F(1,117) = 6.46, p < .05. However, the addition of the viewpoint-similarity x extrafoveal identification interaction term in the third step had no effect, ΔR2 = .01, F(1,116) = 1.65, p = .20, suggesting that the contribution of visual information was the same regardless of the identifiability of the preview object.
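The structure of this three-step model can be sketched as follows, again on simulated items. The real predictors came from the identification task and the viewpoint-similarity norms; the generating coefficients below are assumptions chosen only to echo the shape of the reported fit.

```python
import numpy as np
import statsmodels.api as sm

# Simulated items (assumed values); the real data came from the norming tasks.
rng = np.random.default_rng(2)
n = 120
extra_id = rng.uniform(0, 1, n)          # proportion correct, 0-1
vp_sim = rng.uniform(2.0, 4.9, n)        # viewpoint-similarity rating
benefit = 115 * extra_id + 13 * vp_sim + rng.normal(0, 60, n)

def fit(predictors):
    """OLS fit of preview benefits on the given predictor columns."""
    X = sm.add_constant(np.column_stack(list(predictors.values())))
    return sm.OLS(benefit, X).fit()

step2 = fit({"extra_id": extra_id, "vp_sim": vp_sim})
step3 = fit({"extra_id": extra_id, "vp_sim": vp_sim,
             "interaction": extra_id * vp_sim})

# A negligible gain in step 3 mirrors the reported null interaction.
print(f"delta R2 for the interaction term = {step3.rsquared - step2.rsquared:.3f}")
```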
The results of the present analysis also provide a sharper characterization of the contributions of visual and higher levels of representation to the integration process. The model that best fits the data is given by the following equation: Y' = 115(ExtraID) + 13(VP-similarity), which was derived by including extrafoveal identification and viewpoint similarity as predictors. This model was run without a constant, however, because the constant term did not reliably differ from 0 in step 2 of the above regression, b0 = -8.01, t(117) = -.34, p = .73. As a result, the coefficients differ slightly from those derived earlier. To quantify the contributions of preview identification and viewpoint similarity, consider the minimum and maximum values for each variable. The extrafoveal identification values ranged from 0 to 1, and the viewpoint similarity values ranged from 2.0 to 4.9. Plugging these values into the equation suggests a 0-115 ms component that depends on identification and a 26-64 ms component that depends on viewpoint similarity.

The present model is actually quite consistent with that generated in the first analysis. The difference between the maximum and minimum values for viewpoint similarity suggests a 38 ms advantage when preview and target objects are from the same viewpoint. Thus, the component that depended on visual properties of the object was one-third the size of the component that depended on identification. In the earlier analysis, the component that did not depend on identification was a little over one-third the size of the component that did depend on identification. Finally, the fact that there was no interaction is consistent with the idea that the visual and identity components contribute additively to the overall effect.

In sum, the results of Experiment 3 support the idea that the preview benefit can be partitioned into a component that depends on preview identification as well as a component that depends on the visual similarity of the preview to the target. While the better portion of the preview benefit reflects integration at the level of the object's conceptual identity, at least part of the effect is independent of identification, and the part that is independent of identification appears to depend on the maintenance of viewpoint across saccades but not on the visual properties of the objects themselves.

General Discussion

The present study had three primary objectives. One objective was to determine whether preview benefits in the extrafoveal preview paradigm would generalize to the case where items are not repeated within a session. The facilitative effect's potential dependence on stimulus familiarity had already been ruled out in the context of reading (Balota & Rayner, 1983) but not in studies using pictorial stimuli. With pictures of objects, however, more useful information can likely be acquired from the periphery, and as a result, the efficacy of repetition is likely elevated. Indeed, the fact that identification of the preview objects themselves occurs readily in the repeated-item context (Pollatsek et al., 1984) is consistent with the possibility that participants are able to bypass normal identification processes by associating context-specific visual features with an object's name. While the present study does not speak to the veracity of this hypothesis, it does indicate that preview benefits can be observed when context-specific associations are not given the opportunity to come into play.
In Experiments 1 and 2, preview benefits for identical previews measured relative to the control conditions were all in excess of 100 ms. The fact that preview benefits were of comparable magnitude to those observed using repeated stimulus sets, despite considerable between-group differences in the overall speed of naming, indicates that the facilitative effect of the extrafoveal preview is quite robust. This finding is important for the continued-processing framework for transsaccadic integration, particularly the idea that transsaccadic integration can be thought of as object identification in the context of eye movements. If facilitative effects were observed only for familiar stimuli, it would suggest that integration is not a component process of normal identification, at least during one's initial encounter with an object.

A second objective was to determine the kind of visual information that is integrated across saccades, particularly in the context of a non-repeated stimulus set, where the contribution of the higher levels of representation was expected to be minimized. The benchmark for the integration of visual information was the advantage for the identical preview. If a preview object visually differs from the target object but produces the same facilitation as that produced by the identical preview, then the properties that were not held constant cannot be said to play a role in the integration process. In contrast, if the difference between the preview and target diminishes the preview benefit, then the properties that were not held constant must be relevant. In Experiment 1, preview benefits were largely unaffected by visual differences between the preview and target objects, but the objects were always from the same approximate viewpoint. In Experiment 2, an identical preview advantage was observed relative only to the maximally-dissimilar condition. While the differences between previews and targets were somewhat varied in this condition, the majority reflected a change in viewpoint of one kind or another in addition to the more surface-level differences manipulated in the other dissimilar condition, such as color, texture, and the shapes of the parts. The importance of viewpoint was reinforced in the second regression analysis of Experiment 3, which showed a relationship between preview benefits and viewpoint-similarity but not object-similarity. Thus, the results of this study indicate that surface-level properties of the objects play at best a minor role in the integration process. The visual properties that are integrated, however, are the properties that are common to the identical, visually-similar, and visually-dissimilar conditions, and that differ between these and the maximally-dissimilar condition, such as the object's outline shape and/or its overall volumetric shape abstracted away from surface-level details.

An alternate way to frame the role of visual differences in the magnitude of the preview benefit is more quantitative than is suggested above. That is, perhaps preview benefits simply diminish when the visual differences between the preview and target are big. In the maximally-dissimilar condition, previews and targets differed in terms of viewpoint in addition to the surface-level differences that were manipulated in the other conditions, and the viewpoint changes alone would be expected to alter the image greatly.
However, at least two arguments can be made against the idea that the amount of change is the determining factor. To begin, the visual differences in the visually-dissimilar condition were readily noticed in a transsaccadic change detection task. If the magnitude of the preview benefit were in some way tied to the salience of the change, then the identical preview advantage would be about as robust in the visually-dissimilar condition as it was in the maximally-dissimilar condition. A stronger case is perhaps provided by the visual similarity analysis in Experiment 3. If preview benefits were affected only by the amount of change in the image, the image-similarity scale would be expected to correlate with preview benefits more strongly than either of the other two similarity scales. However, this outcome was not observed: viewpoint-similarity was the only similarity scale that correlated reliably with preview benefits. Ruling out the amount-of-change hypothesis is important because that hypothesis would suggest that the visual effects are in some way artifactual. The visual source of facilitation has been cast as residual activation in the object recognition system (Henderson & Siefert, 2001). As a result, the kind of visual differences between preview and target that matter are expected to correspond to the kind of information that is captured by the representations that support identification. In contrast, it is unclear how the amount of change by itself would fit into an object recognition framework. Instead, it would best be explained as a general disruptive effect that occurs when a change is noticed.

A third objective was to determine the relative contribution of visual versus identity and name levels of representation to the generation of preview benefits. In Experiment 1, different-object and non-object control conditions were employed to assess the contribution of name activation in the context of a non-repeated stimulus set. Previous research using a repeated stimulus set showed an effect of interference when the preview and target objects had different names (Pollatsek et al., 1984). This deficit suggested that the name of the preview object was activated and carried over into the processing of the target. In the present study, the naming of target objects was not statistically slower when the preview was a different object with a different name than when it was a meaningless non-object without an associated name. This finding suggests a reduced contribution of name activation when familiarity with the items is not developed through repetition, presumably due to a corresponding reduction in the frequency of preview identification.

While the lack of name interference indicated a reduced role for name activation, performance in the preview paradigm and the identification task suggested a contribution from the identity level of representation. Although the rate of extrafoveal identification was not nearly as great as that observed in the repeated context (Pollatsek et al., 1984), participants were able to identify the objects better than 50% of the time on average. In addition, a sizeable preview benefit was observed in Experiment 2 even when the visual differences between previews and targets were greatest. Thus, a primary issue in the present study was the extent to which preview benefits depend on the identification of the preview object itself.
In Experiment 3, the role for preview identification in the generation of preview benefits was examined in a series of items-based regression analyses. Preview benefits increased as a function of extrafoveal identification, but the facilitative effect was present for objects that were rarely or never identified on the basis of the peripheral glimpse alone. Thus, it cannot be said that preview benefits depend entirely on identification before the saccade. The analyses do, however, suggest a prominent role for preview identification. While the non-zero intercept provided support for the idea that there is a component that does not depend on identification, this component was relatively small, amounting to about a 33 ms effect. Partitioning the preview benefit into components that do and do not depend on extrafoveal identification indicates a much larger (85 ms) component that does depend on identification. A similar decomposition was provided by accounting for visual similarity and extrafoveal identification simultaneously. It is important to note that partitioning the effect in this manner does not mean that identity priming is always the greater source of facilitation. If the preview object is not identified on a given trial, facilitation would be provided exclusively by priming at the visual level of representation. Indeed, the fact that the viewpoint-similarity x extrafoveal identification interaction did not approach statistical significance is consistent with the idea that priming at the identity level of representation contributes independently of priming at the visual level of representation.
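For readers who want the mechanics, the partition just described can be sketched in a few lines of Python. This is a minimal illustration of the regression logic only: the arrays are simulated stand-ins, not the items-based data from Experiment 3, and the names (p_ident, benefit) are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    n_items = 60

    # Proportion of participants who identified each item from the periphery
    p_ident = rng.uniform(0.0, 1.0, n_items)
    # Per-item preview benefit in ms (simulated; slope and noise are arbitrary)
    benefit = 33 + 160 * p_ident + rng.normal(0.0, 25.0, n_items)

    # Items-based regression: benefit_i = intercept + slope * p_ident_i
    slope, intercept = np.polyfit(p_ident, benefit, 1)

    # Component that does not depend on identification: the intercept,
    # i.e., the predicted benefit for an item that is never identified.
    independent_ms = intercept
    # Component that does depend on identification: the slope term
    # evaluated at the average identification rate.
    dependent_ms = slope * p_ident.mean()

    print(f"independent: {independent_ms:.0f} ms; dependent: {dependent_ms:.0f} ms")

On this reading, the 33 ms and 85 ms components reported above would correspond to the intercept and to the slope term evaluated at the mean identification rate, respectively.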
In sum, previous research has suggested that there is a visual component to the integration process when pictures of objects are used as stimuli (Henderson & Siefert, 2001; Pollatsek et al., 1984; Pollatsek et al., 1990); the present study not only provides additional support for this conclusion but also indicates that integration is not dependent on the attainment of identification before the saccade. These findings support the idea that transsaccadic integration can be conceptualized as the continuation of processing across saccades, and suggest that integration is a component process of object identification in the context of eye movements. The term transsaccadic integration is more intimately tied to the classic notion that snapshots are merged together than to the idea that the results of processing are combined across saccades. As a result, the integration problem has only loosely been tied to the problem of object recognition. From the continued-processing perspective, however, generating hypotheses about the kind of information integrated requires knowledge of the nature of the representations and processes involved in object identification. Since the problem of object recognition is not one that can be considered solved, the best one can hope for is that theory and research from the two domains will be mutually informative. From this perspective one might ask: How does the present study contribute to an understanding of the problem of object recognition? Aside from the suggestion that object identification can be thought of as a process that bridges discrete visual samples of the world, the most obvious contribution derives from the nature of the visual mismatch between the preview and target that is required to observe an identical preview advantage.

If the visual source of facilitation can indeed be cast as residual activation in the object recognition system (Henderson & Siefert, 2001), then the kind of visual differences that matter should correspond to the kind of information that is captured by the representations that support identification. In other words, the preview benefit should depend on how well the preview primes the representation that ultimately gets matched to the target after the saccade. As discussed above, preview benefits were robust to surface-level differences but not to differences in viewpoint. This finding is most consonant with the view-based approach, which suggests that the object recognition system encodes information about objects as viewed from particular vantage points (e.g., Poggio & Edelman, 1990; Riesenhuber & Poggio, 2000; Tarr & Bülthoff, 1998; Ullman, 1998). An object-centered approach, such as that employed in Biederman's (1987) Recognition-By-Components model, which suggests that the object recognition system encodes objects in terms of their volumetric parts and the relations between the parts, could account for the viewpoint effect if the preview objects were systematically more difficult to decompose into parts when the viewpoint of preview and target differed maximally. This possibility seems remote, however, given that the objects employed in the present study were not depicted from obscure perspectives.

Another aspect of the current study that would seem particularly important for researchers studying object recognition is the fact that the objects could be so readily identified from the periphery. Although the lack of repetition decreased the frequency of extrafoveal identification overall, 29 objects were correctly identified 90% of the time or better. This occurred despite the fact that the objects subtended around 5° of visual angle and were presented more than 20° into the periphery. This finding would seem to place constraints on the kind of information that is needed to identify an object. Interestingly, the idea that identification can be based on relatively coarse visual input is reflected in a recent proposal suggesting that entry-level object recognition could be supported by low-spatial-scale images, something like blurry silhouettes of objects (Tarr, 2003). The idea is based largely on the possibility that fully-detailed images might not be best suited for perceptual categorization. In particular, the details associated with complete images are posited to reduce the similarity between exemplars of a category, thereby making it difficult to map multiple exemplars to a particular class. The proposal is supported by the fact that recognition performance based on silhouettes can be as good as or better than performance based on shaded renderings of objects (Hayward, 1998). In addition, computational and behavioral studies indicate that silhouettes provide enough information to discriminate between classes of very similar objects, such as between dogs and cats (Cutzu & Tarr, 1997; Eimas & Quinn, 1994; Quinn, Eimas, & Tarr, 2001). Support from the present study comes from a subsidiary experiment using the extrafoveal preview paradigm and silhouettes generated from the objects of Experiment 1. Here, a reliable 43 ms preview benefit was observed relative to the non-object control condition (see Appendix F), suggesting that outline shape is an important property, a property that changes with changes in viewpoint.
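For concreteness, silhouettes of this kind are easy to produce. The sketch below shows one possible way to do it, not the procedure actually used to create the Appendix F stimuli (which blackened the interior by reducing contrast); it assumes RGBA cut-out images with transparent backgrounds, and blackens every opaque pixel so that only the outline shape survives.

    from PIL import Image

    def make_silhouette(in_path, out_path):
        """Blacken all opaque pixels, leaving only the outline shape."""
        img = Image.open(in_path).convert("RGBA")
        black = Image.new("RGBA", img.size, (0, 0, 0, 255))
        clear = Image.new("RGBA", img.size, (0, 0, 0, 0))
        # The alpha channel marks where the object is; use it as a mask.
        silhouette = Image.composite(black, clear, img.getchannel("A"))
        silhouette.save(out_path)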
Importantly, the silhouette preview benefit suggests that information about outline shape can be readily acquired from the periphery and used to support target identification. A question that remains is whether blurred extrafoveal retinal images can support the decomposition of objects into volumetric parts, as suggested by Biederman's (1987) model.

Having considered the implications for the problem of object recognition, the discussion turns now to the present study's contribution to current theory of transsaccadic integration. Previous research has suggested that the integration of information about objects takes place at various levels of representation, ranging from detailed visual representations to the representation of the object's name (Henderson, 1994; Pollatsek et al., 1984). The overarching goal of this study was to sharpen theory not only by identifying the visual properties that are integral to the integration process but also by determining the relative contributions of the varied levels of representation.

To begin, the two-representational-systems theory proposed by Henderson (1994; Henderson & Siefert, 2001) suggests that the type and token systems contribute independently to the integration process. Prior to the execution of a saccade, attention is directed toward the upcoming saccade target, and with that shift of attention, processing of the soon-to-be-fixated object begins. Facilitation of identification from within the type system is provided by activation at visual, identity, and name levels of representation, depending on the level of processing achieved before the saccade lands. The present study indicates that objects differ widely in terms of how readily they can be identified from the periphery. These differences are likely tied to a number of factors, such as the object's divergence from the canonical view (Palmer, Rosch, & Chase, 1981) as well as the individual's personal history with objects of that kind. When the saccade target is fixated, identification is speeded by the combining of new and residual activation within the object type system. If the object cannot be identified from the periphery, priming will occur only at the level of the object's stored description. However, if the object can be identified from the periphery, priming will occur additionally at the level of the object's identity and perhaps its name.

If spatiotemporal continuity is maintained across the saccade, facilitation of identification can also be provided by the token system. This source of facilitation is generated by the retrieval of the spatially-indexed episodic representation of the object created before the saccade. According to the theory (see Henderson & Siefert, 2001), retrieval of the token reactivates the properties of the object within the type system, and this reactivation combines with new activation in exactly the same way that residual activation combines with new activation during the priming of types. That is, although the token is considered a more visually detailed representation, its influence on identification is mediated by the type system. The present data lend support to this idea. Preview benefits were largely undiminished by surface-level differences, suggesting a contribution from representations that are abstracted away from these properties. However, these very same differences could be noticed in a change detection task, suggesting that the episodic representation provides details that are not relevant to the integration process.
Conclusion

To summarize, a number of conclusions can be reached on the basis of the present study. First, preview benefits have been shown to generalize to the case where items are not repeated. Second, integration is largely but not completely driven by identification of the object before the saccade. Third, the visual representations that are involved in the integration process are abstracted away from surface-level properties but not viewpoint. Fourth, the visual component of the preview benefit is independent of identification, suggesting that priming at the identity level of representation combines additively with priming generated at the visual level of representation. Finally, the study validates the idea that transsaccadic integration should be thought of as a component of object identification, and that the extrafoveal preview paradigm can be used as a tool to leverage theory in that domain.

APPENDIX A

Method for Visual Similarity Norm of Experiment 1

Participants. One hundred and nine Michigan State University undergraduate students participated in the experiment for course credit. All participants had normal or corrected-to-normal vision and were naive with respect to the hypotheses under investigation.

Stimuli. The stimuli consisted of full-color pictures of real-world objects selected from the Hemera Photo-Objects 50,000 Premium Image Collection. The picture set comprised 140 object types: 4 exemplars for each of 135 types and 3 exemplars for each of the remaining 5, for a total of 555 images. Objects were selected so that all the tokens within a category were from the same approximate point of view. A display was generated for all possible within-category object pairings: 6 for each of the 135 types with 4 exemplars and 3 for each of the 5 types with 3 exemplars, for a total of 825 new images. Each display comprised a neutral gray background, the trial list number, and two objects positioned side-by-side around the center of the display. The objects were of the same pixel dimensions as those employed in Experiment 1, and the displays were projected so as to subtend about the same number of degrees of visual angle at the average viewing distance.

Apparatus and Procedure. The visual similarity norm was conducted in a classroom by projecting the images on a screen via LCD projector. Each object pair was presented in a random order (determined in advance of the session) for a duration of 5 seconds. A warning tone sounded one second before each display was terminated. All pairs of objects were rated by each participant on a 5-point scale, with 5 indicating the highest degree of visual similarity and 1 indicating the lowest. Responses were indicated by bubbling in Scantron sheets, and the raw data were compiled by the optical scanning service offered by the Scoring Office at Michigan State University. The study was run in 11 sessions, and each session lasted approximately 90 minutes.

Method for Name Consistency Norm

Participants. Seventy-one Michigan State University undergraduate students participated in the experiment for course credit. All participants had normal or corrected-to-normal vision and were naive with respect to the hypotheses under investigation.

Stimuli. The stimuli were 226 objects selected from the set employed in the visual similarity norm. The objects selected were candidate target objects for the extrafoveal preview paradigm (i.e., the objects that would ultimately be named).
Because two similarity schemes were considered (one where the identical and similar previews were as similar as possible, and one where the differences between similarity conditions were about equal), 86 of the 140 object types required 2 exemplars in this study. Each display comprised a neutral gray background, the trial list number, and one object at the center of the display. The objects were of the same pixel dimensions as those employed in Experiment 1, and the displays were projected so as to subtend about the same number of degrees of visual angle at the average viewing distance.

Apparatus and Procedure. The name consistency norm was conducted in a lecture hall by projecting the images on a screen via LCD projector. Each object was presented in a random order (determined in advance of the session) for a duration of 6 seconds. A warning tone sounded one second before each display was terminated. Participants were instructed to generate a name for each object and write it down on numbered score sheets that were provided by the experimenter. The study was run in 1 session that lasted approximately 35 minutes.

Results

Naming consistency was defined as the frequency of the most frequent response divided by the total number of responses. Of the 140 object types employed in the visual similarity norm, 60 were selected based on a name consistency criterion of 0.75. Target, similar, and dissimilar objects were then selected on the basis of the mean similarity ratings for each pair of object tokens so as to minimize the visual difference between the target and similar objects, and to maximize the visual difference between the target and dissimilar objects. The mean visual similarity score for the sixty selected objects was reliably greater in the similar condition (3.39) than in the dissimilar condition (2.19), t(59) = 19.49, p < .001. The object names, the visual similarity values for the similar and dissimilar previews, and the naming consistency ratings for the targets are all presented in Appendix B.

APPENDIX B

Name  Similarity (similar preview)  Similarity (dissimilar preview)  Name consistency
Accordion  3.8  1.9  0.89
Apple  4.0  2.4  0.99
Ball  3.7  2.0  0.76
Basket  3.0  1.9  0.99
Bell  3.5  2.4  1.00
Binoculars  3.6  2.1  0.99
Boots  3.2  2.3  0.92
Bowl  3.4  1.6  0.89
Bullet  3.5  1.7  0.97
Butterfly  4.0  1.8  1.00
Cake  3.0  1.5  0.96
Calculator  3.1  2.5  1.00
Camera  3.0  2.4  0.99
Cane  3.6  2.5  0.90
Chair (1)  3.0  2.0  0.97
Chair (2)  3.5  2.3  1.00
Doll  2.8  2.0  0.92
Donut  3.9  2.0  0.97
Earphones  3.2  3.0  0.96
Earrings  2.4  1.8  0.93
Fan  3.7  2.2  0.86
Feather  3.2  1.9  1.00
Fire hydrant  3.7  1.9  0.91
Fireplace  2.9  2.0  0.92
Fish  3.7  2.3  0.87
Fork  3.4  2.1  1.00
Frog  3.3  1.8  0.89
Globe  3.3  2.3  0.97
Grapes  4.2  2.9  0.92
Guitar  4.0  2.6  0.94
Hammer  4.0  2.8  0.96
Hanger  3.6  2.7  0.92
Key  3.3  1.6  0.93
Lamp  3.9  2.1  0.90
Leaf  4.1  2.0  0.90
Light bulb  3.5  2.2  0.92
Lock  3.1  2.3  0.89
Medal  4.1  3.4  0.90
Microphone  3.2  2.2  0.79
Microscope  3.9  2.6  0.94
Mushroom  2.6  2.2  1.00
Pear  3.9  2.7  0.99
Pen  3.9  2.1  0.97
Pipe  3.9  3.0  0.99
Printer  3.2  2.6  0.94
Purse  2.9  1.4  0.96
Roller blade  3.0  2.7  0.93
Screw  3.1  2.0  0.90
Skateboard  3.5  2.6  1.00
Spoon  3.5  1.9  0.86
Stool  3.1  2.0  0.97
Sunglasses  3.0  1.9  0.94
Sword  3.6  1.6  0.92
Telescope  2.6  2.1  0.97
Tire  3.2  1.8  0.97
Typewriter  3.0  2.0  0.94
Vase  2.7  2.4  0.99
Walkie-talkie  3.7  2.5  0.90
Wheelchair  3.6  2.2  0.97
Wreath  2.4  1.8  0.82
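The naming consistency measure behind the rightmost column is simple to compute; the following is a minimal sketch (the response list shown is hypothetical, and names are normalized only by case and whitespace):

    from collections import Counter

    def name_consistency(responses):
        """Frequency of the most frequent response / total responses."""
        counts = Counter(r.strip().lower() for r in responses)
        return counts.most_common(1)[0][1] / len(responses)

    # Three of four hypothetical participants wrote "apple":
    print(name_consistency(["apple", "Apple", "apple", "fruit"]))  # 0.75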
APPENDIX C

Method for Change Detection Task

Participants. Nine Michigan State University undergraduate students participated in the experiment for course credit. All participants had normal or corrected-to-normal vision and were naive with respect to the hypotheses under investigation.

Stimuli. The stimuli employed were taken from the identical, visually-similar, and visually-dissimilar preview conditions of Experiment 1.

Apparatus and Procedure. The apparatus and procedure were the same as in Experiment 1, except that the naming response was replaced with a same-different judgment and the control conditions were not used. Participants began each trial by fixating a cross on the left-hand side of the screen, and directed their gaze toward the object when it appeared in the peripheral frame. A saccade-contingent display change was then initiated so that upon completion of the saccade the object was identical, visually-similar, or visually-dissimilar. Participants were instructed to indicate with a button press whether the preview and target objects were the same or different. Objects were assigned to conditions via a Latin square design so that each object appeared in each condition an equal number of times across participants. The order of object presentation (and hence the order of condition presentation) was determined randomly for each participant. The entire session lasted approximately 20 minutes.

Results

Accuracy was higher in the visually-dissimilar condition (mean = .87) than in the visually-similar condition (mean = .59), F(1,8) = 19.52, MSE = .02, p < .01, but both were well above the false alarm rate (mean = .06), ps < .001.
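The Latin square assignment mentioned in the Procedure can be illustrated with a short sketch. This is not the lab's actual trial-generation code; the condition labels and the rotation scheme are assumptions chosen only to show the counterbalancing idea.

    CONDITIONS = ["identical", "visually-similar", "visually-dissimilar"]

    def assign_conditions(object_ids, participant_index):
        """Rotate the object-to-condition mapping by participant so that,
        across participants, each object serves in each condition equally
        often."""
        k = len(CONDITIONS)
        return {
            obj: CONDITIONS[(i + participant_index) % k]
            for i, obj in enumerate(object_ids)
        }

    # Object 0 is 'identical' for participant 0, 'visually-similar' for
    # participant 1, and 'visually-dissimilar' for participant 2.
    for p in range(3):
        print(p, assign_conditions(range(3), p))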
APPENDIX D

Method for the Visual Similarity Norm of Experiment 3

Participants. Seventy-nine Michigan State University undergraduate students participated in the experiment for course credit. All participants had normal or corrected-to-normal vision and were naive with respect to the hypotheses under investigation.

Stimuli. A set of 120 object pairs was selected from the Hemera Photo-Objects 50,000 Premium Image Collection on the basis of a preliminary object-naming task. Target objects were selected so that the same name was generated by at least 85% of these participants. Each item was paired with another object from the same basic-level conceptual category. The pairs were selected by the experimenter with the goal of creating a stimulus set with visual differences ranging from nearly identical and from the same viewpoint to visually different and from a different viewpoint. The displays for the 120 pairings each comprised a neutral gray background, the trial list number, and two objects positioned side-by-side around the center of the display. The objects were of the same pixel dimensions as those employed in Experiment 3, and the displays were projected so as to subtend about the same number of degrees of visual angle at the average viewing distance.

Apparatus and Procedure. The visual similarity norm was conducted in a classroom by projecting the images on a screen via LCD projector. Each object pair was presented in a random order (determined in advance of the session) for a duration of 10 seconds. A warning tone sounded one second before each display was terminated. All pairs of objects were rated for visual similarity by each participant on three 5-point scales, with 1 being least similar and 5 being most similar. The three scales were: (1) object similarity, where participants were instructed to rate pairs based on the similarity of the objects themselves while disregarding differences in viewpoint; (2) viewpoint similarity, where participants were instructed to rate pairs based on the similarity of the viewpoint of the objects while disregarding differences in the appearance of the objects; and (3) image similarity, where participants were simply asked to indicate the visual similarity of the two objects without further instruction. (The image similarity scale was included so that the relative importance of object- and viewpoint-similarity to judgments of visual similarity could be determined, but it was not a variable of primary interest.) Responses were indicated by circling numbers on score sheets that were provided by the experimenter. The study was run in 13 sessions, and each session lasted approximately 30 minutes. The mean similarity ratings for all 120 object pairings are presented in Appendix E.

APPENDIX E

Object Name  Object Similarity  Viewpoint Similarity  Image Similarity
Apple  4.2  4.5  4.1
Backpack  3.4  3.4  3.2
Bagel  3.5  4.7  3.6
Balloons  3.3  4.4  3.3
Banana  3.3  2.3  2.9
Basket  3.7  4.8  3.9
Battery  3.5  4.8  3.5
Bear  3.1  2.0  2.9
Bed  2.8  2.7  2.8
Bell  3.6  4.8  3.6
Belt  2.9  3.5  3.1
Bench  2.8  2.4  2.6
Bib  3.0  4.7  3.1
Binoculars  3.8  4.7  3.8
Blender  3.2  3.1  3.1
Boat  4.1  2.8  3.9
Books  3.0  2.2  2.8
Bowl  3.5  4.9  3.7
Brush  2.8  4.8  3.0
Bus  3.1  4.3  3.4
Butterfly  3.9  4.8  4.1
Button  3.8  4.4  3.9
Cake  3.6  4.6  3.7
Calculator  2.8  2.6  2.9
Camera  3.6  3.0  3.4
Cane  4.1  4.9  4.0
Cannon  3.3  2.2  3.1
Car  2.9  2.4  2.8
Carrot  3.6  4.9  3.8
Cat  2.8  3.3  2.9
Chair  4.1  4.6  4.1
Cheese  3.9  2.7  3.5
Clock  2.8  3.1  2.7
Comb  3.7  2.5  3.3
Corn  3.6  3.5  3.3
Couch  3.2  2.4  2.8
Dice  3.3  3.5  3.4
Doll  3.5  4.8  3.6
Earrings  2.7  3.7  3.0
Eggs  2.8  2.7  3.0
Elephant  3.9  4.7  3.8
Fan  3.9  4.8  4.1
Feather  3.9  4.7  4.0
Fireplace  3.6  4.5  3.6
Fish  3.0  4.8  3.1
Flashlight  3.0  2.4  3.1
Flower  2.8  4.4  2.9
Football  3.4  4.7  3.6
Fork  3.8  4.8  3.8
Frog  3.7  4.8  3.8
Glasses  4.0  4.7  4.0
Globe  3.7  4.4  3.9
Glove  3.6  4.7  3.7
Guitar  3.1  3.7  3.1
Hammer  3.1  2.3  2.9
Helicopter  3.4  2.1  2.9
Horse  3.4  3.2  3.2
Horse shoe  2.6  4.3  2.9
Iron  3.3  2.2  3.0
Key  2.8  4.9  3.0
Ladder  3.5  2.6  3.6
Lamp  2.7  4.6  3.1
Leaf  2.4  4.3  2.8
Lighter  2.7  4.8  3.1
Lion  4.3  2.3  3.7
Lipstick  3.2  4.5  3.3
Lock  2.6  3.7  2.7
Mailbox  2.8  4.1  3.0
Medal  4.1  4.9  4.0
Motorcycle  3.5  2.4  3.2
Mouse  3.0  2.5  3.0
Muffin  3.5  4.6  3.4
Mushroom  3.1  4.7  3.3
Notebook  3.6  3.1  3.4
Owl  2.6  4.0  2.8
Pacifier  3.6  2.3  3.3
Pear  3.8  4.7  3.9
Pen  3.2  4.8  3.4
Penguin  3.5  2.9  3.4
Piano  3.6  4.7  3.7
Pie  3.0  3.6  3.3
Pill  2.8  3.6  2.7
Pineapple  3.7  4.7  3.6
Pipe  3.7  4.9  3.7
Potato  3.7  4.7  3.6
Pretzels  4.5  3.1  4.1
Purse  2.9  2.7  2.9
Roller blade  3.3  4.7  3.6
Ruler  3.7  4.7  3.7
Saxophone  3.3  2.7  3.2
Scissors  3.1  3.8  3.2
Shark  3.6  2.3  3.3
Shovel  3.2  2.9  3.3
Skateboard  2.9  4.3  3.2
Sponge  2.5  3.9  2.8
Spoon  3.3  4.9  3.5
Stapler  2.5  2.4  2.9
Swing set  3.4  4.2  3.4
Tent  3.1  4.3  3.3
Tie  2.9  4.0  3.0
Tire  3.1  4.8  3.2
Toaster  2.9  3.1  2.9
Toilet  3.3  2.3  3.1
Tomato  4.0  4.5  4.0
Tractor  2.6  2.4  2.7
Trophy  2.7  4.6  2.8
Turtle  3.1  4.5  3.4
Tweezers  2.8  2.4  2.6
Typewriter  2.8  2.9  2.8
Umbrella  3.2  3.6  3.1
Van  3.1  4.3  3.5
Violin  3.5  2.4  3.2
Watch  3.4  2.3  3.1
Watermelon  3.3  2.9  2.8
Wheelchair  3.4  2.4  3.3
Whistle  3.0  4.3  3.3
Wreath  3.5  4.8  3.5
Yarn  4.0  4.7  3.9
Yoyo  3.7  2.5  3.4
Zebra  4.3  2.2  3.7

APPENDIX F

Method for Silhouette Study

Participants. Twenty-one Michigan State University undergraduate students participated in exchange for course credit.
All participants had normal or corrected-to-normal vision and were naive with respect to the hypotheses under investigation.

Stimuli. The stimuli were the target objects used in Experiment 1, silhouettes of the targets created by reducing the contrast in the original objects until the interior was blackened, and the non-object control.

Apparatus and Procedure. The apparatus and procedure were identical to Experiment 2, except that there were only three preview conditions (identical, silhouette, and control) and only 12 practice trials.

Results

As in Experiments 1 and 2, mean naming latencies excluded trials in which the target object was named incorrectly, trials in which an anticipatory eye movement occurred (saccade latencies of less than 100 ms), and trials on which the naming latency was less than 200 ms or more than 3 standard deviations greater than the mean naming latency for that subject. Eliminated trials accounted for 8% of the data. Naming latencies were subjected to a within-participants ANOVA, which revealed reliable differences across the preview conditions, F(2,40) = 14.306, MSE = 3867, p < .001. Naming latencies were 102 ms faster in the identical preview condition (mean = 796 ms) than in the control condition (mean = 898 ms), F(1,20) = 28.574, MSE = 3844, p < .001, and 43 ms faster in the silhouette condition (mean = 855 ms) than in the control condition, F(1,20) = 4.382, MSE = 4559, p < .05. The 59 ms advantage for the identical preview over the silhouette was also reliable, F(1,20) = 11.302, MSE = 3197, p < .01.
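The exclusion rule described in the Results can be expressed compactly. The sketch below covers only the latency-trimming step (the accuracy and anticipatory-saccade filters are assumed to have been applied upstream), and the data layout, a list of (subject, latency) pairs, is a hypothetical convenience, not the lab's actual pipeline.

    import statistics

    def trim_latencies(trials, floor_ms=200, sd_cutoff=3.0):
        """Keep trials with latency >= floor_ms and no more than
        sd_cutoff standard deviations above that subject's mean."""
        by_subject = {}
        for subject, latency in trials:
            by_subject.setdefault(subject, []).append(latency)
        kept = []
        for subject, latencies in by_subject.items():
            mean = statistics.mean(latencies)
            sd = statistics.stdev(latencies) if len(latencies) > 1 else 0.0
            kept.extend(
                (subject, rt)
                for rt in latencies
                if floor_ms <= rt <= mean + sd_cutoff * sd
            )
        return kept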
References

Balota, D. A., & Rayner, K. (1983). Parafoveal visual information and semantic contextual constraints. Journal of Experimental Psychology: Human Perception and Performance, 9, 726-738.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.

Breitmeyer, B. G., Kropfl, W., & Julesz, B. (1982). The existence and role of retinotopic and spatiotopic forms of visual persistence. Acta Psychologica, 52, 175-196.

Cutzu, F., & Tarr, M. J. (1997). The representation of three-dimensional object similarity in human vision. In SPIE proceedings from electronic imaging: Human vision and electronic imaging II (Vol. 3016, pp. 460-471). San Jose, CA: SPIE.

Davidson, M. L., Fox, M. J., & Dick, A. O. (1973). Effect of eye movements on backward masking and perceived location. Perception & Psychophysics, 14, 110-116.

Eimas, P. D., & Quinn, P. C. (1994). Studies on the formation of perceptually based basic-level categories in young infants. Child Development, 65, 903-917.

Feldman, J. A. (1985). Four frames suffice: A provisional model of vision and space. Behavioral and Brain Sciences, 8, 265-289.

Gajewski, D. A., & Henderson, J. M. (2005). The role of saccade targeting in the transsaccadic integration of types and tokens. Journal of Experimental Psychology: Human Perception and Performance, 31, 820-830.

Hayward, W. G. (1998). Effects of outline shape in object recognition. Journal of Experimental Psychology: Human Perception and Performance, 24, 427-440.

Helmholtz, H. von (1867/1925). Treatise on physiological optics (J. P. C. Southall, Ed. & Trans.). New York: Optical Society of America.

Henderson, J. M. (1992a). Identifying objects across eye fixations: Effects of extrafoveal preview and flanker object context. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 521-530.

Henderson, J. M. (1992b). Visual attention and eye movement control during reading and picture viewing. In K. Rayner (Ed.), Eye movements and visual cognition: Scene perception and reading (pp. 260-283). New York: Springer-Verlag.

Henderson, J. M. (1994). Two representational systems in dynamic visual identification. Journal of Experimental Psychology: General, 123, 410-426.

Henderson, J. M. (1997). Transsaccadic memory and integration during real-world object perception. Psychological Science, 8, 51-55.

Henderson, J. M., & Anes, M. D. (1994). Roles of object-file review and type priming in visual identification within and across eye fixations. Journal of Experimental Psychology: Human Perception and Performance, 20, 826-839.

Henderson, J. M., Pollatsek, A., & Rayner, K. (1987). The effects of foveal priming and extrafoveal preview on object identification. Journal of Experimental Psychology: Human Perception and Performance, 13, 449-463.

Henderson, J. M., Pollatsek, A., & Rayner, K. (1989). Covert visual attention and extrafoveal information use during object identification. Perception & Psychophysics, 45, 196-208.

Henderson, J. M., & Siefert, A. B. C. (2001). Types and tokens in transsaccadic object identification: Effects of spatial position and left-right orientation. Psychonomic Bulletin & Review, 8, 753-760.

Hoffman, J. E., & Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57, 787-795.

Irwin, D. E. (1993). Perceiving an integrated visual world. In D. E. Meyer & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience: A silver jubilee (pp. 121-142). Cambridge, MA: MIT Press.

Irwin, D. E., Brown, J. S., & Sun, J. S. (1988). Visual masking and visual integration across saccadic eye movements. Journal of Experimental Psychology: General, 117, 276-287.

Irwin, D. E., Yantis, S., & Jonides, J. (1983). Evidence against visual integration across saccadic eye movements. Perception & Psychophysics, 34, 49-57.

Irwin, D. E., Zacks, J. L., & Brown, J. S. (1990). Visual memory and the perception of a stable visual environment. Perception & Psychophysics, 47, 35-46.

Johnson, C. J., Paivio, A., & Clark, J. M. (1996). Cognitive components of picture naming. Psychological Bulletin, 120, 113-139.

Jonides, J., Irwin, D. E., & Yantis, S. (1982). Integrating visual information from successive fixations. Science, 215, 192-194.

Kahneman, D., & Treisman, A. (1984). Changing views of attention and automaticity. In R. Parasuraman & R. Davies (Eds.), Varieties of attention (pp. 29-61). Cambridge, MA: MIT Press.

Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object-specific integration of information. Cognitive Psychology, 24, 175-219.

Kanwisher, N., & Driver, J. (1992). Objects, attributes, and visual attention: Which, what, and where. Current Directions in Psychological Science, 1, 26-31.

Kowler, E., Anderson, E., Dosher, B., & Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35, 1897-1916.

Matin, E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin, 81, 899-917.

McClelland, J. L., & O'Regan, J. K. (1981). Expectations increase the benefit derived from parafoveal visual information in reading words aloud. Journal of Experimental Psychology: Human Perception and Performance, 7, 634-644.

McConkie, G. W., & Rayner, K. (1976). Identifying the span of the effective stimulus in reading: Literature review and theories of reading. In H. Singer & R. B. Ruddell (Eds.), Theoretical models and processes of reading (pp. 137-162). Newark, DE: International Reading Association.
McConkie, G. W., & Zola, D. (1979). Is visual information integrated across successive fixations in reading? Perception & Psychophysics, 25, 221-224.

Neisser, U. (1967). Cognitive psychology. Englewood Cliffs, NJ: Prentice-Hall.

O'Regan, J. K., & Lévy-Schoen, A. (1983). Integrating visual information from successive fixations: Does trans-saccadic fusion exist? Vision Research, 23, 765-768.

Paap, K. R., & Newsome, S. L. (1981). Parafoveal information is not sufficient to produce semantic or visual priming. Perception & Psychophysics, 29, 457-466.

Palmer, S. E., Rosch, E., & Chase, P. (1981). Canonical perspective and the perception of objects. In J. Long & A. Baddeley (Eds.), Attention and performance (Vol. 9, pp. 135-151). Hillsdale, NJ: Erlbaum.

Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263-266.

Pollatsek, A., Lesch, M., Morris, R. K., & Rayner, K. (1992). Phonological codes are used in integrating information across saccades in word identification and reading. Journal of Experimental Psychology: Human Perception and Performance, 18, 148-162.

Pollatsek, A., Rayner, K., & Collins, W. E. (1984). Integrating pictorial information across eye movements. Journal of Experimental Psychology: General, 113, 426-442.

Pollatsek, A., Rayner, K., & Henderson, J. M. (1990). Role of spatial location in integration of pictorial information across saccades. Journal of Experimental Psychology: Human Perception and Performance, 16, 199-210.

Quinn, P. C., Eimas, P. D., & Tarr, M. J. (2001). Perceptual categorization of cat and dog silhouettes by 3- to 4-month-old infants. Journal of Experimental Child Psychology, 79, 78-94.

Rayner, K. (1975). The perceptual span and peripheral cues in reading. Cognitive Psychology, 7, 65-81.

Rayner, K. (1978). Foveal and parafoveal cues in reading. In J. Requin (Ed.), Attention and performance (Vol. 7, pp. 149-162). Hillsdale, NJ: Erlbaum.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.

Rayner, K., McConkie, G. W., & Ehrlich, S. (1978). Eye movements and integrating information across fixations. Journal of Experimental Psychology: Human Perception and Performance, 4, 529-544.

Rayner, K., McConkie, G. W., & Zola, D. (1980). Integrating information across eye movements. Cognitive Psychology, 12, 202-226.

Rayner, K., & Pollatsek, A. (1983). Is visual information integrated across saccades? Perception & Psychophysics, 34, 39-48.

Riesenhuber, M., & Poggio, T. (2000). Models of object recognition. Nature Neuroscience, 3, 1199-1204.

Ritter, M. (1976). Evidence for visual persistence during saccadic eye movements. Psychological Research, 39, 67-85.

Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and spatial attention. The Quarterly Journal of Experimental Psychology, 38A, 475-491.

Tarr, M. J., & Bülthoff, H. H. (1998). Image-based object recognition in man, monkey, and machine. Cognition, 67, 1-20.

Tarr, M. J. (2003). Visual object recognition: Can a single mechanism suffice? In M. A. Peterson & G. Rhodes (Eds.), Perception of faces, objects, and scenes: Analytic and holistic processes (pp. 177-211). Oxford, UK: Oxford University Press.

Tarr, M. J., Williams, P., Hayward, W. G., & Gauthier, I. (1998). Three-dimensional object recognition is viewpoint-dependent. Nature Neuroscience, 1, 275-277.
Trehub, A. (1977). Neuronal models for cognitive processes: Networks for learning, perception, and imagination. Journal of Theoretical Biology, 65, 141-169.

Treisman, A. (1993). Representing visual objects. In D. E. Meyer & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience (pp. 163-175). Cambridge, MA: MIT Press.

Ullman, S. (1998). Three-dimensional object recognition based on the combination of views. Cognition, 67, 21-44.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549-586). Cambridge, MA: MIT Press.

Wolf, W., Hauske, G., & Lupp, U. (1978). How pre-saccadic gratings modify post-saccadic modulation transfer functions. Vision Research, 18, 1173-1179.

Wolf, W., Hauske, G., & Lupp, U. (1980). Interactions of pre- and post-saccadic patterns having the same coordinates in space. Vision Research, 20, 117-125.