This is to certify that the dissertation entitled “WHERE WAS I?” A PSYCHOLINGUISTIC INVESTIGATION OF CONVERSATIONAL INTERRUPTIONS presented by BENJAMIN SWETS has been accepted towards fulfillment of the requirements for the degree in Psychology. MSU is an Affirmative Action/Equal Opportunity Institution.
“WHERE WAS I?”: A PSYCHOLINGUISTIC INVESTIGATION OF CONVERSATIONAL INTERRUPTIONS

By Benjamin Swets

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Psychology

2006

ABSTRACT

“WHERE WAS I?”: A PSYCHOLINGUISTIC INVESTIGATION OF CONVERSATIONAL INTERRUPTIONS

By Benjamin Swets

In three experiments, speakers were interrupted in the middle of language production in order to determine 1) whether bookkeeping processes and representations for message-level planning in language production can be demonstrated empirically, 2) how such representations are architecturally tied in to higher- and lower-level cognitive processes such as executive planning and memory, and 3) whether, as an applied matter, particular interruption types are more or less disruptive to speakers. Experiment 1 implemented a semi-natural dialog in which a participant was interrupted at predetermined narrative points by a confederate participant. Interruption types were manipulated, and resumption difficulties were measured in order to address the above research questions. Experiments 2 and 3 were more tightly controlled experiments that investigated whether similar bookkeeping processes exist for syntactic planning during language production. The study begins the investigation of two corners of a potentially large research space on the topic of interruptions in conversation.

For Molly.

ACKNOWLEDGMENTS

The kernel of the idea behind the work presented here began as a comprehensive examinations topic that Fernanda Ferreira, my graduate advisor, and Erik Altmann, who served on the committee, had suggested. Together, we developed the idea to study interruptions as Dr. Altmann had, but from the perspective of language. I thank both of them for their insightfulness, openness, and helpfulness during this process. Dr.
Ferreira has been an inspirational advisor throughout graduate school. I hope the present work reflects the kind of ingenuity that she has displayed throughout her career. I would also like to thank the other members of my dissertation committee, John Henderson and Joyce Chai, for their valuable input to this work.

A number of undergraduate research assistants aided some part of this project. Emily France (pilot interrupter) and Ritik Tawari (data coding) were notably helpful. But most of the thanks go to Twila Starosciak, the affable interrupter in Experiment 1. I am grateful not only for her interruptions, but also for the hours and hours of work she put into transcribing and coding those interruptions.

Finally, I would like to acknowledge the support of my family and friends. My parents, Patti and Tom, provided the supportive environment necessary to reach this point. My friends, especially Mike, Dave, Steve, and Todd, have always provided compelling interruptions to work. But no one deserves more thanks than Molly, who is the primary inspiration for all non-psycholinguistic aspects of my life.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
INTRODUCTION
  Interruptions in a Fictional Dialog
  Defining Interruptions
  Interruptions and the Question of Disruptiveness
  Non-conversational Interruptions
  Are Conversational Interruptions Disruptive?
  Bookkeeping Processes in Language Production
  Overview of Experiments
EXPERIMENT 1
  Research Questions
  Pilot Studies
  Main Study
  Method
  Data Analysis
  Hypotheses and Predictions
  Results and Discussion
EXPERIMENT 2
  Method
  Data Analysis
  Hypotheses and Predictions
  Results and Discussion
EXPERIMENT 3
  Norming Study A
  Method
  Results and Discussion
  Norming Study B
  Method
  Results and Discussion
  Interruption Study
  Method
  Data Analysis
  Hypotheses and Predictions
  Results and Discussion
GENERAL DISCUSSION
  Implications for Research on Interruptions
  Temporal Issues
  A Role for Interference?
  Utterance Planning and Task Goals: A Useful Comparison?
  Implications for Theories of Language Production
  Implications for Bookkeeping
  Issues of Incrementality
  Dialog: Is Alignment Real?
  Future Directions
  Conclusions
APPENDICES
REFERENCES

LIST OF TABLES

Table 1. Means and standard deviations of the four measures of resumption difficulty, displayed by interruption type
Table 2. Latency and accuracy data, Experiment 3, Norming Study A
Table 3. Latency and accuracy data, Experiment 3, Norming Study B

LIST OF FIGURES

Figure 1. The interruption and resumption process, involving a primary (interrupted) and secondary (interrupting) task. Adapted from Trafton et al, 2003
Figure 2. Item examples, reading span and spatial span tasks, Experiment 1
Figure 3. Primary (RL1) and secondary (RL2) resumption lags as a function of interruption type. Error bars are standard errors
Figure 4. Average number of repeated words as a function of interruption type
Figure 5. Average proportion of resumptions containing disfluencies (uh or um)
Figure 6. Average interruption length by type
Figure 7. Number of repeated words as a function of clip number and interruption type
Figure 8. Example Object Array, Experiment 1
Figure 9. Proportion of correctly answered interruption tasks
Figure 10. Interruption task judgment latencies as a function of interruption modality and locus
Figure 11. Resumption lag as a function of interruption modality and locus, correct trials only
Figure 12. Average proportion of resumptions containing repeats as a function of interruption modality and locus
Figure 13. Average proportion of resumptions containing skips as a function of interruption modality and locus
Figure 14. Average number of words in error (repeated and skipped) upon resumption as a function of interruption modality and locus
Figure 15. Proportion of correctly answered interruption tasks
Figure 16. Interruption task judgment latency as a function of interruption modality and locus
Figure 17. Resumption lag as a function of interruption modality and locus, correct trials only
Figure 18. Average proportion of resumptions containing repeats as a function of interruption modality and locus
Figure 19. Average proportion of resumptions containing skips as a function of interruption modality and locus
Figure 20. Average number of words in error (repeated and skipped) upon resumption as a function of interruption modality and locus
Figure 21. Average resumption lag as a function of interruption type, Experiment 1, Pilot Study 1
Figure 22. Average proportion of disfluent resumptions as a function of interruption type, Experiment 1, Pilot Study 1
Figure 23. Average resumption lag as a function of interruption type, Pilot Study 2
Figure 24. Average number of disfluencies as a function of interruption type, Pilot Study 2
LIST OF ABBREVIATIONS

ANOVA: Analysis of Variance
CI: Conversational Interruption
DP: Discourse Purpose
DSP: Discourse Segment Purpose
HCI: Human-Computer Interaction
MSE: Mean Square Error
RL: Resumption Lag
RL1: Primary Resumption Lag
RL2: Secondary Resumption Lag
TI: Task Interruption
TOT: Time on Task

INTRODUCTION

Interruptions in a Fictional Dialog

When people speak to each other in dialog, the speech stream of one speaker is sometimes broken off, or interrupted, by the speech stream of another speaker. When the result of the overlap between speech streams is that the interrupting participant temporarily gains control of the conversation, or “takes the floor”, it is often difficult for the interrupted participant to return to the utterance, or topic, from before the interruption. In order to facilitate understanding of some parameters of interruptions, I present below an excerpt from Reservoir Dogs, a screenplay by Quentin Tarantino (1990), in which men sitting around a breakfast table are having a conversation. According to the screenplay, “They are MR. WHITE, MR. PINK, MR. BLUE, MR. BLONDE, MR. ORANGE, MR. BROWN, NICE GUY EDDIE CABOT, and the big boss, JOE CABOT. Most are finished eating and are enjoying coffee and conversation. Joe flips through a small address book. Mr. Pink is telling a long and involved story about Madonna.”

(Potentially offensive and/or explicit material from the original screenplay was deleted.)

MR. PINK: “Like a Virgin” is all about a girl who digs a guy with [deleted]. The whole song is a metaphor for [deleted].
MR. BLUE: No it's not. It's about a girl who is very vulnerable and she's been [deleted] over a few times. Then she meets some guy who's really sensitive-
MR. PINK: -Whoa...whoa...time out Greenbay. Tell that [deleted] to the tourists.
JOE: (looking through his address book) Toby...who the [deleted] is Toby? Toby... Toby... think... think... think...
MR. PINK: It’s not about a nice girl who meets a sensitive boy.
Now granted that’s what “True Blue” is about, no argument about that.
MR. ORANGE: Which one is “True Blue?”
NICE GUY EDDIE: You don’t remember “True Blue?” That was a big ass hit for Madonna. [Deleted], I don’t even follow this Tops In Pops [deleted], and I’ve at least heard of “True Blue.”
MR. ORANGE: Look, [deleted], I didn’t say I ain’t heard of it. All I asked was how does it go? Excuse me for not being the world’s biggest Madonna fan.
MR. BROWN: I hate Madonna.
MR. BLUE: I like her early stuff. You know, “Lucky Star,” “Borderline” - but once she got into her “Papa Don’t Preach” phase, I don’t know, I tuned out.
MR. PINK: Hey, [deleted] all that, I’m making a point here. You’re gonna make me lose my train of thought.
JOE: Oh [deleted], Toby’s that little [deleted] girl.
MR. WHITE: What’s that?
JOE: I found this old address book in a jacket I ain’t worn in a coon’s age. Toby what? What the [deleted] was her last name?
MR. PINK: Where was I?
MR. ORANGE: You said “True Blue” was about a nice girl who finds a sensitive fella. But “Like a Virgin” was a metaphor for [deleted].
MR. PINK: Let me tell ya what “Like a Virgin”’s about. It’s about some [deleted] who’s a regular [deleted]. I mean all the time, morning, day, night, afternoon, [deleted] [deleted] [deleted] [deleted] [deleted] [deleted] [deleted] [deleted] [deleted] [deleted] [deleted].
MR. BLUE: How many [deleted] was that?
MR. WHITE: A lot.
MR. PINK: Then one day... [proceeds to finish story while handling several non-disruptive interjections].

In this dialog, Mr. Pink is interrupted several times, and the results of each interruption are quite varied. Mr. Pink is first interrupted by Mr. Blue, who disagrees with Mr. Pink's initial thesis. Mr. Pink makes a brief return to his story, but is then interrupted twice: first by Joe, who seems to be ignoring Mr. Pink entirely while consulting and muttering about his address book, and then by Mr.
Orange, who seeks clarification about the identity of the song under discussion. Mr. Pink then explicitly asks people to stop interrupting him, warning that he will “lose his train of thought”. Undaunted, Joe continues to talk about his address book, which finally draws a response from Mr. White, leading to a brief exchange in which Joe explains the subject of his muttering. Pink is then faced with a dilemma: he wishes to continue his story, and is unwilling to yield the floor to anyone else; and yet, his warning seems to have been valid, as he finds himself unable to recall where he was in his story. His solution is to ask the group for help. His plea of “Where was I?” is graciously answered by Mr. Orange, who happens to remember exactly where Mr. Pink had left off. Orange provides Pink with verbal cues so that Pink can continue his profane narrative.

Importantly, not all of the interruptions in the example were disruptive. After Mr. Blue and Mr. White interject jocular remarks about Mr. Pink’s storytelling style, Mr. Pink is able to continue relaying his racy metaphor, and he is eventually able to finish his story despite several more interjections from his shady breakfast companions.

The goal of the present work is to examine the disruptive nature of interruptions in the context of language production processes. First, I will explore whether conversational interruptions are associated with observable “disruptions” in production performance. This would seem to be an obvious point, but it has not been empirically verified in any literature. Assuming that such effects will be observed, I will seek to discover under what conversational circumstances interruptions cause a “disruptive” effect on a speaker.
In the Reservoir Dogs example, several differences between the “disruptive” and “non-disruptive” interruptions are evident: the number of speakers, utterances, and words between interruption and resumption; the amount of time elapsed between interruption and attempted resumption; the contextual and meaningfulness relationships (similar, different, higher-level, lower-level, meta-level, etc.) between the interrupted message and the message of the interruption; and the place in the narrative where the speaker gets interrupted (toward the beginning or toward the end of the narrative). Some of these factors may have more significance than others, and although sorting all of these variables out will be an important pursuit for future work, the present study addresses only some of them. Finally, assuming I answer the “whether” question affirmatively, and can answer some of the “when” question experimentally, I can then ask why conversational interruptions are disruptive (or not). Answers to these questions will offer clues about architectural issues in language production such as bookkeeping (Levelt, 1989), message formulation, syntactic planning, and the relationship between language production and executive memory processes.

The rest of this document is organized as follows. First, I will operationally define “interruption” for the purpose of the present study, borrowing from previous definitions from linguistics and combining them with current work on non-conversational interruptions from the human-computer interaction (HCI) literature. Next, I will summarize the available literature on the topic of conversational interruptions, focusing especially on those works from computational linguistics, linguistics, and HCI that carry implications for the questions raised above. A discussion of the HCI work covering non-conversational interruptions will follow.
HCI investigations of interruptions lay the groundwork for a theoretical and empirical approach to examining conversational interruptions, and such research will be shown to link quite well in both respects to the literature on “bookkeeping” in language production. This review of bookkeeping representation issues in language production regarding message planning and syntactic planning will set up the theoretical and experimental issues that the current project seeks to address.

Defining Interruptions

Interruptions have been studied empirically largely within two different fields: linguistics and human-computer interaction (HCI). However, the two fields study different sorts of interruptions. Therefore, the first step in defining “interruptions” is to divide them into two categories: Conversational Interruptions (henceforth CIs) and non-conversational interruptions, or Task Interruptions (TIs). The focus of this section is on defining what a conversational interruption is; a task interruption is a more general usage of the term interruption, but in the present discussion it is meant to cover the types of interruptions that are studied in the HCI interruptions literature (e.g., McFarlane, 2002). It is important to distinguish the two types of interruptions at the outset because one of the subordinate goals of the present proposal is to empirically evaluate the degree to which CIs and TIs share cognitively relevant properties.

The categorization and definition of CIs has come down to several key issues, including the physical overlap of speech streams between two speakers (Sacks, Schegloff, & Jefferson, 1974); the syntactic intrusion of such physical overlap and the resulting potential to disrupt a speaker (West & Zimmerman, 1983); and the cooperative versus competitive nature of the interruption (Kennedy & Camden, 1982; Yang, 2001). I now turn to describing these issues more fully.
CIs were first characterized by Sacks et al (1974) as “overlap” between speakers in a dialog that violated the “turn” of a speaker. Social conventions hold that during a speaker’s turn-at-talk, as Sacks et al had conceived of it, each participant in a conversation should speak only when others are not speaking, or, if overlap is to take place, that overlap should occur quite near the anticipated end of the current speaker’s turn, when overlap is more expected and common (Sacks et al, 1974). These sites at which overlap is permitted are called transition-relevant places (TRPs).

Later work by researchers interested in the socio-political relations between men and women in society (West & Zimmerman, 1983) used CIs as measures of attempted dominance in cross-gender interactions. West and Zimmerman (1983) viewed interruptions as a telling symbol of domination in human interaction, and used interruption tendencies as a dependent measure in cross-gender studies to further their sociopolitical view that men express their power over women through subtle, implicit means. Although the aims of their research are rather unrelated to the present ones, one aspect of their work is immediately useful: an attempt to define “interruption” in observable, empirical terms:

In contrast to overlaps, interruptions do not appear to have a systemic basis in the provisions of the turn-taking model. An interruption involves a “deeper intrusion into the internal structure of a speaker's utterance” than an overlap, and penetrates well within the syntactic boundaries of a current speaker’s utterance (West & Zimmerman, 1977: 523).
Defined operationally, candidate interruptions are incursions initiated more than two syllables away from the initial or terminal boundary of a unit-type... We intend the term interruption to refer only to those deep incursions that have the potential to disrupt a speaker’s turn, although actual disruption (e.g., diversion of activity within a turnspace to address the intrusion, including yielding the turnspace to the interrupter) is a product of further interaction between the parties. (West & Zimmerman, 1983)

Interruptions, therefore, potentially take the floor away from a current speaker (the speaker can usually raise her own speech amplitude and disallow disruption) and signify a place where the current speaker could surrender the floor. Confirmatory “uh-huh”-type overlaps, therefore, and other such non-disruptive overlaps, are not interruptions. West and Zimmerman conceive of a disruption as the successful ability of an interruption to force the speaker to address the interruption in a cooperative manner. I will discuss shortly how this usage of the term “disruptive” refers to something quite different from my usage of the term.

Kohonen (2004) performed an analysis that synthesized operational definitions of interruptions from Drummond (1989), Lerner (1989), and West and Zimmerman (1983). After excluding many “interruptions” that she found to fit into a set of predefined exclusionary principles, she found that actual interruptions (defined solely in terms of what they were not) constituted less than 1% of cases of speech overlap. However, Kohonen’s definition is too restrictive and fails to provide an actual description of an interruption.

Dissatisfied with the tendency of researchers like West and Zimmerman (1977) to define interruptions purely in negative, dysfunctional terms, Kennedy and Camden (1982) investigated the various functions of interruptions in group-work settings.
Analyses of videotaped interruptions showed that up to half of all interruptions served a positive purpose: to strengthen the message of the speaker who is interrupted, whether through supporting, clarifying, or repeating the message. This finding motivates a categorical division between cooperative and competitive interruptions, a distinction that Yang (2001) recently upheld. Although Yang partially defines the two types of interruptions according to pseudo-empirical concepts related to the mindset of the interrupter, including emotions and underlying intentions, a great contribution of his work is his finding that cooperative interruptions generally are associated with low-pitch and low-amplitude prosodic contours, but competitive interruptions are associated with high-pitch and high-amplitude prosodic contours.

To sum up, CIs have been defined and categorized on several levels. In order to be an interruption, an utterance must at least overlap with another speaker’s utterance. From there, the categorization becomes difficult. As Kohonen demonstrated, it is possible to consider a number of points about defining interruptions in order to integrate them into a generalized definition that fits well with research in linguistics and communication, but the resulting “definition” could be muddled and fit poorly with the current research purpose. A recent movement to categorize interruptions by their competitive versus cooperative natures, however, seems more useful in the current context.

Task interruptions. Task interruptions are generally defined quite differently than are conversational interruptions.
The human-computer interaction literature usually considers an interruption episode to be composed of three parts (following terminology of Trafton et al, 2003): 1) a primary task that a person must carry out, such as running a computer program or editing a manuscript; 2) an interruption component called the “secondary task”, in which a person must stop performing the primary task and perform the secondary task until they finish or are instructed to stop; and 3) resumption of the primary task. This line of research approaches interruptions primarily from an applied perspective: How do interruptions influence performance on the primary task? Work from this perspective will be described in more detail below. For now, it is important to note that TIs are defined very differently from CIs, and a primary reason for this is that CIs have typically been studied and defined as a means to another end (a dependent measure of social dominance or power), and TIs have been studied to understand their effect on primary task performance.

My definition of interruptions adopts a task interruptions approach to conversational interruptions. Experimentally manipulating interruptions in dialog and psycholinguistic tasks in the proposed line of research provides the ability to define interruptions empirically. Although many other sorts of CIs could potentially be examined, I will restrict interruptions in the present study to the sort that stop a speaker from completing an utterance, then allow the speaker to resume the same utterance after the interruption has been completed. In this sense, the interruptions under investigation here all fit with the West and Zimmerman (1983) definition of “interruption”. This is because interruptions in the present study cause what West and Zimmerman define as “disruption”.
However, just as I am leaving the general definition of “interruption” open in favor of a specific operational definition, I am also going to operationally define disruption in terms of this study’s particular dependent variables that measure difficulty in resuming speech production following an interruption. Three primary measures will be used to estimate the disruptive effects of CIs. One measure is resumption lag (RL), or the time it takes to resume the primary task (i.e., to get back to where one was in a dialog before being interrupted) following the completion of the secondary task (the interruption) (Altmann & Trafton, 2002; Trafton et al, 2003). Another measure will be filler-type disfluencies, otherwise known as uh and um. These disfluencies can be used as indirect measures of difficulty in preparing speech. Also, verbal cues that a message has been forgotten, such as when Mr. Pink asks, “Where was I?”, hint at another type of measure of resumption difficulty: resumption error.

Interruptions and the Question of Disruptiveness

Children are taught from an early age not to interrupt another speaker. Parents and educators generally cite rudeness to bolster their admonishments of those who interrupt. However, research in psycholinguistics, which has overlooked interruptions in general, has ignored a very important potential aspect of interruptions during conversation: although interrupting a speaker may be rude, evidence from non-conversational interruptions (see below) and current theories of dialog (Pickering & Garrod, 2004) suggest that there may exist circumstances under which interruptions are helpful for the cognitive processing of sentences in a dialog. I will presently review evidence from TI research that has shown both “positive” and “negative” ramifications of interruptions on task performance.
I will then discuss what CIs may have in common with TIs, and discuss psycholinguistic theories that suggest that interruptions may be, in some circumstances, helpful, or at least harmless, to sentence processing during conversation (or dialog), despite what authority figures have always taught us.

Non-conversational Interruptions

Although we have a subjective experience that TIs are disruptive to cognitive tasks, evidence from the TI literature does not support this notion wholesale. In fact, the TI literature has produced a wide range of answers to the question of whether TIs actually are disruptive. This state of affairs may be disconcerting to a business manager who would like to know whether the email interruptions that come to employees every couple of minutes hurt overall job performance, help overall job performance, or have no effect on job performance except to annoy their employees. One of the virtues of the TI literature is that it has established a set of objective measures to determine what a “disruptive” effect is, and has evaluated these measures against each other. Unfortunately, the evidence these measures have produced is a rather mixed bag. On one hand, some research has shown interruptions to be subjectively annoying; another hand has shown them to be objectively harmless or helpful to overall performance; a third hand has shown that interruptions are objectively disruptive to performance. Here, I present a brief analysis of factors that can influence the degree to which a TI causes disruption in performance, and offer the view that TIs are disruptive insofar as they can cause a delay in resuming a task because old goals must be reactivated past the activation level of the most recent goal (Altmann & Trafton, 2002).
There are three general measures for the possible disruptive, or helpful, effects on task execution, but in order to define these, I should first introduce the set of terms to be used throughout (following terminology from Trafton, Altmann, Brock, & Mintz, 2003; Figure 1, below, is also adopted from Trafton et al, 2003). As shown in Figure 1, the primary task is the task that gets interrupted. The task that interrupts the primary task is the secondary task. The period of time between the offset of the primary task and the onset of the secondary task is called the interruption lag. The period of time between the offset of the secondary task and the resumption of the primary task is called the resumption lag (or RL). The total amount of time spent on the primary task, subtracting the time spent during the secondary task, is called time on task or TOT. Although most experiments reported here used TOT as the dependent measure of disruption, Trafton et al (2003) raise a convincing point: that the measurement of time on task is not a precise measure of disruption because, at any point in the course of the primary task, factors such as motivation or strategy could decrease the overall time spent on a task, but leave resumption lag times quite high. So, if researchers wish to study the effects of interruptions on overall task performance, then TOT is a good measure. However, if one wishes to understand the cognitive implications of an interruption apart from motivational or strategic factors, then the time preceding the resumption of a task (resumption lag) is likely a more valid measure (Trafton et al, 2003; Altmann & Trafton, 2002). Finally, overall accuracy on a task is also a possible measure for the disruptive effects of interruptions on performance.

[Timeline figure. Events, in order: Begin Primary Task, Alert for Secondary Task, Begin Secondary Task, End Secondary Task, Resume Primary Task, End Primary Task. Intervals along the time axis: Time on Task (1), Interruption Lag, Interruption, Resumption Lag, Time on Task (2).]

Figure 1.
The interruption and resumption process, involving a primary (interrupted) and secondary (interrupting) task. Adapted from Trafton et al, 2003.

Interruptions and disruptiveness: a mixed bag. Research has shown that interruptions annoy people. Specifically, Bailey et al (2001) and Adamczyk & Bailey (2004) administered questionnaires to participants after experiments in which a primary task was occasionally interrupted by a secondary task. In the Bailey et al (2001) experiment, half of the subjects were interrupted in the middle of primary tasks (such as arithmetic calculations), and the other half performed all tasks to completion without any interruption. A questionnaire following the experiment showed that participants in the interruption condition found the primary tasks more annoying, more anxiety-inducing, and more difficult than did the non-interrupted control group, suggesting that perhaps TIs are not objectively disruptive, but merely unpleasant.

In fact, other studies have failed to find effects of TIs on empirical measures of disruptiveness (Burmistrov & Leona, 2003; Gillie & Broadbent, 1989). During simple tasks in which subjects added paragraphs to text, Burmistrov & Leona (2003) interrupted half of the subjects with phone calls. There was no difference in either RL or TOT with this manipulation. Gillie & Broadbent (1989) studied the effects of mathematics task interruptions on subjects performing a verbal item-location matching task and similarly found no effects of interruption on performance whether the math task was short (30 s) or long and involved (over 2 minutes).

Even more damaging to claims that TIs are disruptive to performance is evidence that TIs can improve performance and efficiency on certain simple tasks (Speier et al, 2003; Zijlstra et al, 1999).
Speier et al (2003) demonstrated that when performing simple decision-making tasks that involved the monitoring of a few number cues, occasionally interrupting the task actually led to a faster TOT and overall higher accuracy on the primary task. Zijlstra et al (1999) also showed TOT improvement when a primary task was interrupted. It should be noted, however, that this experiment also allowed participants to choose when to perform the secondary tasks. McFarlane (2002) showed that such freedom elicits positive strategic effects on primary task performance.

So far, this review has shown that in some circumstances, TIs do not inhibit, and can, in fact, improve performance. The picture will soon become murkier. Trafton et al (2003) presented some convincing data that TIs can be disruptive to primary task performance if the proper dependent measure is used: resumption lag. In their study, Trafton et al (2003) used as a primary task a game in which participants allocated fuel to various tanks. The secondary task had participants judge whether incoming “bogies” were friends or foes. Some subjects received warnings that interruptions were upcoming, some did not. When an 8-s secondary task onset warning was given to participants during the primary fuel task, resumption lag was significantly shorter than when interruption by the secondary task was immediate. Not only do these data suggest that there is a costly aspect to interruption, but the authors offered a reason for why interruptions are costly: memory for goals (Altmann & Trafton, 2002). During the primary tank task, the subject is setting goals for the task and completing them. When a secondary task begins, the primary task goal becomes less active in memory than the secondary task goal. When the secondary task ends, the subject must then reactivate the old, primary task goal where it was left off. This cannot be easily done, Altmann & Trafton (2002) argue, without cues to prime the activation of the old goal.
The reason subjects in the no-warning condition are unable to resume the primary task as quickly as the subjects in the warning condition is that subjects in the warning condition are able to use the interruption lag (the time between the warning of a pending interruption and the interruption itself) to encode cues for later goal-priming. A subsequent follow-up experiment (Altmann & Trafton, 2004) manipulated the presence and absence of cues during the interruption lag to verify this hypothesis. The study showed that cues were helpful in reducing the disruptive effect of the secondary task on RL. The qualification to these studies is that the disruptive effects of interruption on RL in the warning versus no-warning conditions were limited to the first session. In other words, it seemed that once a participant was trained to expect the interruptions, some sort of strategic response was learned to overcome their negative effects on RL (see also Hess & Detweiler, 1994).

To sum up, TIs are almost always associated with negative perceptions, but the evidence is mixed as to whether they are objectively disruptive. The General Discussion (below) will summarize these factors and explore their usefulness as possible future directions for the current research program. I presently offer connections between findings from TI research and the rationale for the design of the current experiments.

Dependent measures. As mentioned above, studies using RL as a dependent measure of disruption may be more valid than studies using TOT as a dependent measure. Whereas some studies (Cutrell et al, 2001) found disruptive effects on RL for both simple and complex tasks, Speier et al (2003), taking only TOT measures, found that simple tasks were aided by TIs. It is possible that TOT and RL measure completely different processes (overall task strategy and goal retrieval, respectively).
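As an illustration only, the timing measures discussed above can be sketched as simple arithmetic over event timestamps. The class name, field names, and example times below are hypothetical and do not come from any of the cited studies. The sketch assumes that interruption lag runs from the warning to the onset of the secondary task, and that TOT is total elapsed time on the primary task minus the time spent in the secondary task; other readings of these definitions are possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterruptionTrial:
    """Hypothetical record of one interrupted trial; all times in seconds."""
    begin_primary: float
    alert: float            # warning that a secondary task is coming
    begin_secondary: float
    end_secondary: float
    resume_primary: float
    end_primary: float

    def interruption_lag(self) -> float:
        # Assumption: lag runs from the warning to the secondary task onset.
        return self.begin_secondary - self.alert

    def resumption_lag(self) -> float:
        # Time from the end of the secondary task to resuming the primary task.
        return self.resume_primary - self.end_secondary

    def time_on_task(self) -> float:
        # Assumption: total elapsed time minus time spent in the secondary task.
        total = self.end_primary - self.begin_primary
        secondary = self.end_secondary - self.begin_secondary
        return total - secondary


# Made-up timestamps, for illustration only.
trial = InterruptionTrial(0.0, 10.0, 18.0, 48.0, 50.5, 70.0)
print(trial.interruption_lag(), trial.resumption_lag(), trial.time_on_task())
```

Under these assumptions a trial can show a low TOT and still a long resumption lag, which is the dissociation between the two measures noted above.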
The present study is more interested in the aspects of language production that involve goal retrieval; hence, resumption lag will be emphasized over time-on-task in the context of speech. To measure resumption lag in the context of a language production task, the experiments presented here measured the amount of time that elapsed from the end of an interrupting task to the resumption of the speech unit that had been interrupted. In the case of Experiment 1, resumption lag was measured by analyzing waveform files by hand. In Experiments 2 and 3, a software program directly encoded such resumption lag times.

Similarity of primary and secondary tasks. The Gillie & Broadbent (1989) study is valuable insofar as it demonstrates that simple tasks can be more disruptive as interruptions than complex tasks as long as the simple task is more similar to the primary task (i.e., of the same domain) than the complex task. Findings such as this have been used to argue that interruptions will be the least disruptive when they distribute information across many different sensory modalities (Adamczyk and Bailey, 2004). This issue is addressed to some degree in all of the proposed experiments. In Experiment 1, similarity is not based on cognitive domain, but on the relation of the interrupting topic to the interrupted topic of dialog. Experiments 2 and 3 investigate the similarity issue by asking whether interruptions to sentence production are more disruptive when the task is of the same verbal cognitive domain (i.e., reading a sentence) or of a non-verbal cognitive domain (i.e., performing an arithmetic computation).

Coarse versus fine break points. Adamczyk and Bailey (2004) interrupted people performing a primary task (such as titling a video) at different points in the task with various types of secondary tasks (such as text-titling tasks).
Crucially, some participants were interrupted at “coarse points” in the video-titling task, while others were interrupted at “fine points”. Coarse points were defined as something like the break between scenes in the movie, whereas fine points occurred in the middle of important scenes. Participants expressed more frustration if they were interrupted at fine break points than at coarse break points, and they took longer to perform primary tasks (TOT). However, the researchers failed to find effects of fine versus coarse break points on resumption lag. They attributed this to a problem with the grain size of their resumption lag measure (i.e., their instruments measured times only in whole-second increments). The important implication of these findings is that interruptions may be less disruptive at moments in task execution when a cognitive break may be taken (McFarlane, 2002). Experiments 2 and 3 investigate this issue from the perspective of syntactic boundaries: are interruptions less costly when they come between clauses than when they come in the middle of clauses?

Training. Finally, both Trafton et al (2002) and Hess & Detweiler (1994) found that the disruptive effects of interruptions can be trained away with practice. Trafton et al (2002) showed that, by the end of their study in which participants were interrupted somewhat frequently, the interrupting tasks were no longer more disruptive than non-interruption control conditions. They argue that strategies can be developed over the course of an experiment such that appropriate cues for later recall can be encoded by interruptees to aid resumption. It is possible that conversational interruptions share this property in the context of a single conversation, such that the first interruption or two disrupt the speaker, but later ones become easier to handle once they are expected. This potential constraint must be kept in mind for the present study.

Are Conversational Interruptions Disruptive?
Literature on CIs has typically considered interruptions to be disruptive to the superficial extent that the speech of one person is broken off by the speech of another (see above). This literature also considers this sort of disruption to be a necessary condition for an interruption, not a possible result of one. However, this definition is inadequate for two reasons. First, it ignores cases in which interruptions aid the overall message of a dialog, as Kennedy and Camden (1982) and others have pointed out. To call a helpful interruption “disruptive” simply because one person stops speaking ignores common sense and the growing body of literature showing that conversation is a cooperative, joint activity (for reviews, see Clark, 1996; and Pickering and Garrod, 2004). The second problem with the typical CI view of disruptiveness is its measurability with respect to drawing cognitive inferences. The definition of disruptiveness used in the TI literature reviewed above is able to account for cognitive effects of interruptions more aptly; there, an interruption is disruptive to the extent that it affects performance of a primary task. This view of disruptiveness allows for a methodology to investigate what types of interruptions cause more or less disruption to a primary task, which, in the case of dialog, would be the speaker’s primary message or utterance.

There is little in the way of psycholinguistic research to suggest whether interruptions do, in fact, disrupt dialog processing. The handful of researchers who have addressed conversational interruptions actually hint that interruptions should be helpful rather than harmful to language production.
For instance, Pickering and Garrod (2004), inspired to combine the study of dialog forged by Clark with the rigorous methodological techniques of mainstream psycholinguistics in their current model of dialog processing, have suggested that interruptions that help conversational partners align their representations would not be disruptive to the processing of dialog. According to this view, there should be no disruption at all as long as interlocutors are still “aligning” on the relevant levels of representation, most importantly the situation model. On the other hand, if alignment between participants in a dialog is broken there is reason to believe that the dialog will be disrupted (Pickering, personal communication with Fernanda Ferreira). One way in which alignment could be disrupted is if the situation model being conveyed by a speaker is suddenly altered by an interruption that is either tangential to the speaker’s model or addresses a different situation altogether.

Other work by Chapanis (1976) and Schober et al (2004) suggests that the ability to interrupt may actually help the processing of dialog. The mere ability to interrupt another conversational participant, these works suggest, may aid the extent to which the messages expressed and agreed upon in dialog are eventually understood. Pickering and Garrod’s (2004) contention that dialog introduces a slew of naturalistic advantages over monologue could easily adopt “ability to interrupt” as yet another advantage afforded by dialog.

On the other hand, intuition drawn from examples such as the Reservoir Dogs dialog, and the work of Grosz and Sidner (1986), suggest that interruptions will, in fact, disrupt speakers from completing thoughts and narratives in dialog. Clearly, people occasionally become distracted when they are interrupted. The computational model of Grosz and Sidner (1986) is among the most thorough attempts to analyze and categorize conversational interruptions.
In developing their computational theory of dialog processes, Grosz and Sidner identify three major components that must explain the process of discourse (defined as spoken or written conversation between 2 or more people): a linguistic, discourse structure; an intentional (or goal, or task) structure; and an attentional structure that models the focus of a discourse. The primary motivation for their selection of this series of discourse components is that previous accounts of discourse could not adequately explain interruptions in discourse. A discourse, according to Grosz and Sidner, is coherent to the extent that the intentions of the participants are the same (or aligned). Interruptions disrupt the coherence of a discourse by temporarily altering the overall purpose of a discourse (discourse purpose, DP), by introducing a discourse segment with a different purpose (discourse segment purpose, DSP). Generally, an interruption to a discourse breaks off a discourse in some way, and is followed by a resumption.

Grosz and Sidner divide interruptions into 3 major categories: true interruptions, flashbacks, and digressions. I will presently explain each type and provide Grosz and Sidner’s examples of each. Interestingly, all of Grosz and Sidner’s examples of interruptions are within-speaker interruptions; that is, a speaker interrupts herself and then resumes speaking about the primary topic.

In true interruptions, one discourse segment (D1 below) is interrupted by a discourse segment (D2) that has “distinct, unrelated purpose(s) and convey(s) different information about properties, objects, and relations”. So, a DP must be altered entirely for a true interruption to take place. In the example below, even the audience of the interruption is different than the primary discourse audience. Note that this definition of “true” has ramifications primarily for the intentional structure of the discourse.
Grosz and Sidner point out that D1 is resumed “almost immediately” in the discourse, not after some delay, even though the interruption did not promote alignment.

D1: John came by and left the groceries
D2: Stop that you kids
D1: and I put them away after he left

Flashbacks are mainly different from true interruptions in that the intention, or goal, of the interruption is related to the overall discourse purpose. There is also generally some sort of verbal warning that an interruption of the discourse segment is about to take place. Such a warning might call attention to something having gone wrong that will require amendment. Unlike true interruptions, the audience for flashbacks remains the same.

D1: OK, now how do I say that Bill is...
D2: Whoops I forgot about ABC. I need an individual concept for the company ABC (continues with discourse segment on ABC).
D1: Now back to Bill. How do I say that Bill is an employee of ABC?

A digression, according to Grosz and Sidner (1986), is “a strong interruption that contains a reference to some entity that is salient in both the interruption and the interrupted segment.” Like flashbacks, digressions are usually accompanied by a verbal warning such as “speaking of” or “that reminds me”.

D2: Speaking of Bill, that reminds me, he came to dinner last week...

Digressions share some properties of true interruptions (a shift in intentional structure) and flashbacks (verbal warnings of pending interruptions), and also involve an explicit verbal closing before resumption of the primary discourse segment such as “getting back to ABC” or “anyway”.

A problem with Grosz and Sidner’s analysis of interruptions for present purposes is that they focused heavily on self-initiated interruptions. The current project will focus on the phenomenon of one person interrupting another in dialog or conversation.
One of the virtues of the Grosz and Sidner work is that it provides a handy starting point for characterizing types of interruptions that could be disruptive or helpful for the processing of dialog. It seems as though “true” interruptions and digressions could sufficiently disrupt interlocutors’ memory for primary conversational goals such that resuming the dialog from before one of these interruptions could be quite difficult. Flashbacks are the sorts of interruptions that not only maintain the goals of a dialog, but deepen interlocutors’ understanding of a topic. According to the framework of Pickering and Garrod (2004), one might suppose that digressions and true interruptions serve to break alignment between two participants in a dialog, whereas flashbacks serve to promote alignment. Experiment 1 will test this hypothesis.

Bookkeeping Processes in Language Production

When a person attempts to converse with another in order to accomplish some specific purpose, the speaker must develop a verbal plan for how to do so. As an illustration, I will suppose that Bob has gone to a movie with David. In the middle of the movie, David gets up to refill his popcorn and soda. After the movie ends, David asks Bob what happened while he was gone. Bob must develop a plan to explain the scene to David. This plan likely has many layers to it: an overall goal, which would be to adequately explain the scene; and subgoals, including the event structure, as well as the subgoal to mention characters’ names. Other subgoals might also exist, such as the desire to show exasperation with David because he always misses crucial scenes in movies to satiate his unquenchable thirst for sugar water and his indefatigable hunger for popped corn. Bob must keep track of a number of different goals and intentions during his narrative to ensure that his plan is carried out correctly.
Specifically, Bob must remember which aspects of the plan have already been executed, which aspects of the plan have yet to be executed, and whether any part of the plan has been missed, or forgotten. Levelt (1989) has termed this process of keeping track of one’s communicative intentions during language production “bookkeeping”. In fact, Levelt’s view of bookkeeping in dialog draws heavily from the ideas of Grosz and Sidner (1986). Levelt views conversation as hierarchical in structure, such that it is possible to move up and down in a conversation, i.e., back and forth between high-level goals and low-level goals. Crucially, this structure is not just an abstract one that bears no cognitive relevance (as it was for Grosz and Sidner). Rather, Levelt, citing several sources from discourse and text processing, argues that this structure of intentions is the representation from which speakers perform bookkeeping operations during speech. Knowledge of where one is within that represented structure is achieved via a “pointer” mechanism that binds the current conversational content to a site in the goal structure.

According to this framework, the conversation between Bob and David might begin at a high level. Several researchers (Kintsch, 1974; Meyer, 1975; Thorndyke, 1977) have argued that, in story structure, high-level representations include setting (characters, time, location), theme (event and goal), plot (consisting of episodes), and resolution (consisting of events). Accordingly, Bob might begin by saying which scene it was that David missed, and briefly cite which characters were involved in the scene. Bob’s narrative must soon move down to lower levels on the hierarchy, since “(t)he lowest-level subgoals eventually evoke individual messages and their topics; they are the smallest ingredients in the hierarchy of discourse topics” (Levelt, 1989).
In other words, the low-level goals are the plans for the production of actual speech that are sent to the next production module.

Using this hierarchical framework of Levelt, at any point during Bob’s description, David can interrupt and do one of three things: ask for clarification of minor points at lower levels in the narrative hierarchy; ask for information that is considered “above” the current pointer in Bob’s narrative structure; or, in fact, change the topic of conversation entirely (at the risk of infuriating Bob even further). The intuition of Levelt (1989) and Grosz and Sidner (1986) is that the further the interruption material is from the hierarchy pointer, the more disruptive the interruption might be.

Theory from the task interruptions literature can help psycholinguists think about the cognitive architecture that supports bookkeeping for speakers in a dialog. Trafton et al (2003) maintain that the reason interruptions become disruptive to performance of a primary task is that the goals of the primary task decay from memory during engagement with the secondary task. Following along, CIs that stray the furthest from the bookkeeping pointer may cause the most disruption because they have the most potential to interfere with pre-interruption dialog goals due to the memory constraints that limit bookkeeping representations. On the other hand, Gillie and Broadbent (1989) found that interruptions with content similar to the primary task are more disruptive than unrelated interruptions. In that case, perhaps CIs closest to the pointer should be the most disruptive. The types of CIs mentioned above have similar properties to TIs: they differ in complexity, in length, and in distance from and similarity to the “primary” narrative.
In sum, they represent a natural testing ground for interruptions that can both serve as an existence proof of narrative bookkeeping and provide naturalistic data that address controversial issues in the task interruption literature.

Although research on conversational interruptions has been sparse, Glanzer and colleagues (Glanzer et al, 1981, 1984; Fischer & Glanzer, 1986) carried out research on the representations that persist across interruptions to the reading of text. In this work, people reading cohesive text were interrupted with an alternative task, then asked to resume reading the text. A control group was not interrupted. The general finding observed in these studies was that interrupted readers took longer to read sentences after interruption than their non-interrupted counterparts, especially when the sentence they resumed with was thematically or referentially tied to the previous sentences. Although these were studies of reading rather than dialog, the studies are relevant to the current questions because they address the role of memory processes during the interruption of verbal processing. Glanzer and colleagues conclude that short-term memory (STM) is a vital architectural necessity for the processing of text because it maintains the meaningful dependencies between sentences for readers. Ericsson and Kintsch (1995) have gone on to argue that this work is even better suited to their Long-Term Working Memory (LT-WM) model. It would also be interesting to consider these results from the perspective of more recent versions of WM in language comprehension (Just & Carpenter, 1992; Waters & Caplan, 1996). Whichever model the Glanzer work best supports, it is clear that memory should have a role in resumption from conversational interruptions. For this reason, we included measures of both verbal and spatial working memory by using variants of reading span (Swets, Desmet, Ferreira & Hambrick, 2005) and spatial span (Shah & Miyake, 1996).
Overview of Experiments

Three experiments were carried out in order to begin to investigate the untapped issues of bookkeeping and interruptions from a psycholinguistic perspective. Experiment 1 used a semi-natural dialog in which a confederate participant interrupted at predetermined narrative junctures. Interruption types were manipulated, and resumption difficulties were measured in order to investigate what types of interruption are more (or less) disruptive to speakers’ bookkeeping abilities. Experiments 2 and 3 investigated whether similar bookkeeping processes exist for syntactic planning during sentence production.

Theories of alignment in dialog (Pickering & Garrod, 2004) should predict that cooperative interruption types should be less disruptive to speakers than competitive types. In Experiment 1, speakers described movie clips and a confederate interrupted at predictable points in the description with four different types of interruptions. Two types were cooperative: the confederate either attempted to make sure something was correctly understood, or to clarify some ambiguity. The other two types were competitive: they altered the purpose of discourse either by digressing within the framework of the topic, or by introducing a topic that had no relevance to the task and topic at hand. To preview the results, several measures of resumption difficulty, including resumption lag, errors, and disfluencies, showed main effects of interruption type. However, the pair-wise comparisons did not support the alignment hypothesis: clarifying interruptions (a cooperative type) were just as disruptive as the most irrelevant competitive interruptions. The other two interruption types were roughly equal to each other and less disruptive. These results suggest that different properties of interruptions do carry different costs to bookkeeping, but they also suggest that other factors (such as attention) are at least as important as alignment.
Speakers in Experiments 2 and 3 were interrupted by grammaticality or arithmetic judgment tasks during the production of two-clause sentences. Again previewing the results, when interruption length was properly controlled, domain had no effect. This implies that bookkeeping processes may be domain-general memory phenomena. The location of the interruption during sentence production was also manipulated: interruptions could occur during the first clause, between clauses, or during the second clause. Analyses of resumption lag data showed that interruptions are most disruptive at the earliest positions in sentence production. These results support theories of strategically non-incremental grammatical encoding (e.g., Ferreira & Swets, 2002). Resumption error data revealed that errors were reduced when participants were interrupted at clause boundaries rather than in the middle of clauses. We conclude that the clause is a unit of sentence-level bookkeeping (Bock & Cutting, 1992). The two approaches are intended to lay the groundwork for a program of research on conversational interruptions.

EXPERIMENT 1

The first experiment addresses a series of related research questions. These issues include the disruptiveness of conversational interruptions, message planning and bookkeeping, and the establishment of a psycholinguistic methodology for investigating interruptions.

Research Questions

The disruptiveness of conversational interruptions. The most basic research aim of Experiment 1 is to show that interruptions can be disruptive. Of course, a control condition with no interruptions would be difficult, if not outright impossible, to implement in this dialog study. So rather than compare interruption to no-interruption conditions, I will consider a rejection of the null hypothesis (that different interruption types do not differ in disruptive properties) to support the idea that interruptions are disruptive.
If it is not possible to prove interruptions disrupt speakers more than a lack of interruptions, it seems that the next best option is to show that some types of CIs are, in fact, more disruptive than other types of CIs. Specifically, interruptions that tend to promote the alignment of goals and representations between the participant and the confederate should result in less disruption than interruptions that interfere with such alignment. This concept of alignment is inspired by the views of Pickering and Garrod (2004), but the idea of alignment here is slightly different. Alignment in the Pickering and Garrod model held for the specific levels of representation of phonology, syntax, and meaning. In the present study, alignment refers to whether an interruption helps interlocutors retain the same task goal or sub-goal as the primary task.

Message planning and bookkeeping. The ability of a speaker in a dialog to keep track of what has already been said, what has yet to be said, and where he or she currently stands in relation to those positions, a process that Levelt (1989) termed “bookkeeping”, has very little empirical support to justify its existence in a language production architecture. Evidence that failure or resumption tendencies in recovering from interruptions differ depending on location within a message-level bookkeeping hierarchy would provide some empirical support for such an architectural property of language production. Manipulating CIs experimentally affords the opportunity to explore the nature of the system that gets interrupted (see Meyer, Irwin, Osman, & Kounios, 1988).

Comparisons and analogies between the TI literature and the message planning-as-bookkeeping theory are naturally drawn and rich with possibility. The TI research has shown that a key component of overcoming the disruptive effects of interruptions is keeping a memory copy of the goal structure that is robust to the TI.
Recovery from conversational interruptions similarly involves the need to keep a goal structure in mind. I will test whether such principles apply during language production in dialog. One way in which Experiment 1 will do this is by testing whether individual differences in working memory are associated with differential abilities to deal with conversational interruptions.

Establishing an interruptions methodology for psycholinguistics. Studying interruptions using a confederate methodology has promising implications for research in psycholinguistics. The methodology offers a chance to study CIs themselves by directly manipulating their properties and analyzing their effects on the interruptee. The problem with inferring whether different conversational interruptions are more or less disruptive is that such work has only been done under naturalistic conditions in which researchers analyzed videotaped or audiotaped dialogs (Kennedy & Camden, 1982; West & Zimmerman, 1983; etc.). An experiment in which CI types are manipulated experimentally could be used to demonstrate a causal connection between interruption type and disruptiveness.

Pilot Studies

Two pilot studies were run before the main study. Full details of the rationale and findings of these studies are given in Appendix A. Given the results of these pilot studies and the theoretical problems with Pilot Study 2 that are outlined in Appendix A, Experiment 1 was designed to test a rather basic idea: that interruptions that attempt to align the interrupter’s representation with the interruptee’s representation of a story will result in less disruption than interruptions that do not lead to such alignment.

Main Study

Method

Participants. Forty-eight participants were tested in exchange for partial credit in their introductory psychology classes. All were native speakers of English and students at Michigan State University.

Materials: Dialog task. Participants viewed two film clips.
Both clips were taken from obscure 1980s movies with no recognizable actors in order to prevent the possibility that familiarity (or expertise) with the clips would aid some participants in bookkeeping processes. The clips also had several plot points that participants reliably included in their descriptions during pilot testing.

One clip was from the movie Basket Case, a low-budget comedy-horror film about a man, Duane, who is surgically separated from his deformed and murderous conjoined twin, Belial, whom Duane now carries around in a basket. The movie follows the travails of Duane and Belial as they search in New York City for the doctors who separated them against their will. The clip has 4 basic plot points targeted for inclusion in participants’ descriptions: 1) a man feeding hot dogs into a burping basket; 2) a withered, monster-like arm emerging from the basket and accidentally breaking a television dial; 3) the man running to a woman’s apartment; and 4) the man and the woman emerging from the Statue of Liberty and kissing.²

² Importantly, the clip has no offensive material: the violence, gore, and sexual content that the movie does contain are not to be found in the clip shown to participants.

The second clip is taken from the movie Breakin’, a movie about two breakdancers (Ozone and Turbo) from the streets of Los Angeles who befriend a female ballet dancer (Kelly) from a rich family in the suburbs. The 35-minute clip shown from this movie also contains 4 plot points targeted for inclusion within participant descriptions: 1) Kelly and her dance school friend, Adam, drive up to a beachside park where onlookers have gathered around some performing breakdancers, including Ozone and Turbo; 2) Kelly and Adam then join the dancing themselves; 3) some unsavory breakdancing characters, who apparently belong to a rival breakdancing troupe, then dance in a threatening manner while icily staring at Ozone and Turbo; and finally, 4) wanting to avoid a breakdancing confrontation, Kelly, Adam, Ozone, and Turbo all leave the dancing scene in order to introduce themselves to each other more formally.

Appendix B displays the interruptions used to create a series of interruption scripts. Each interruption script consisted of 8 interruptions: four for each description of the two film clips that participants watched. Each clip description had four intended locations at which the participant was to be interrupted, meaning that each full script contained two interruptions of each interruption type. There were 24 permutations of interruption-type orders for each description, such that each interruption type occurred at each story point an equal number of times. Forty-eight unique scripts were created. The first 24 scripts randomly paired each order of interruption types for the Basket Case description with an order of interruption types from the Breakin’ list. The second set of 24 scripts re-shuffled the pairing of the Basket Case and Breakin’ orders and reversed the order in which the film clips were shown to the participant and described. The order in which the scripts were used was determined at random before data collection began.

Before data collection, interruption types had been categorized into two major categories: aligning interruptions that fit the goals of the task to make the confederate understand the movie clip, and non-aligning interruptions that digressed from the goals of the task (for a description of how this definition of alignment differs from Pickering and Garrod’s definition, see above). These categories were themselves split into two types apiece. Aligning interruptions consisted of repetition interruptions and clarification interruptions. For repetition interruptions, the confederate stopped the participant’s speech and repeated the last two or three points that had been made in order to verify her understanding.
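The counterbalancing of interruption-type orders described above can be illustrated with a minimal Python sketch. It enumerates the 24 possible orders of the four interruption types across the four story points and checks that each type falls on each story point equally often. The type labels and variable names are assumptions made for this illustration; the sketch does not reproduce the actual scripts in Appendix B or the random pairing of the Basket Case and Breakin’ orders.

```python
from collections import Counter
from itertools import permutations

# Illustrative labels for the four interruption types (names are assumptions).
TYPES = ("repetition", "clarification", "digression", "true")

# Every order in which the four types can be assigned to the four story points.
orders = list(permutations(TYPES))

# Count how often each type lands on each story point (0-3) across all orders.
position_counts = Counter(
    (point, itype) for order in orders for point, itype in enumerate(order)
)

print(len(orders))                    # number of distinct orders
print(set(position_counts.values()))  # occurrences of each type at each point
```

With four types there are 4! = 24 orders, and each type occupies each story point in 3! = 6 of them, which is the sense in which the full set of orders is balanced across story points.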
For clarification interruptions, the confederate stopped the description in order to ask a question that the participant had not addressed, such as the possible setting of the action. Non-aligning interruptions also comprised two types: digressions, in which the confederate took a point the participant had just made and related it to a topic that did not aid the task goal, and true interruptions, in which the confederate interrupted in order to address some issue that had nothing at all to do with the primary dialog. The properties of these two interruption types, along with their names, are borrowed from the Grosz and Sidner (1986) taxonomy of interruptions. As noted previously, all scripted interruptions are listed in Appendix B by condition and by the point at which they were used in descriptions of the two film clips.

There were several constraints to consider in creating the interruption scripts. The most important constraint, besides creating interruptions that fit the definitions of the interruption types, was how natural each interruption would sound to a participant. The ideal scenario for the creation of stimuli is one in which each interruption type has consistent items that can be used in each condition, but which vary in only one way across conditions. Unfortunately, this was not possible to implement. Take clarification interruptions, for example. If a clarification interruption at one point in a story description asks the speaker to describe a particular object in more detail, one would like to find instances at different story points in the two clips when an object could be described in more detail. On the other hand, such an approach is problematic because all clarification interruptions would then be represented by only one specific sort of question.
The solution was to consider the four story points in each story and ask: What is the most natural interruption that could occur at that point that still fits the particular type of interruption being tested? Unfortunately, with such a small number of items available per subject,³ this was the best sort of counterbalancing that could be achieved.

³ Given that each description took about 1 to 5 minutes to complete, it would have been impossible to add any more than 4 interruptions per description. The number of interruptions per description already seemed somewhat forced at times.

Materials: Working memory tasks. Two different kinds of working memory tasks were used: reading span to measure verbal working memory, and spatial span to measure non-verbal, spatial working memory. Each task consisted of 36 items distributed across 8 trials. Single trials consisted of set sizes of 3, 4, 5, or 6 items. Trials of each set size appeared twice in each working memory task. Each participant viewed the same items in the same order. The 36 reading span items were modeled on the Daneman and Carpenter (1980) design and then modified in a manner similar to that described in Turner and Engle (1989). Each item consisted of a sentence presented above a single to-be-remembered word that was highlighted in red. Half the sentences made sense, and the other half were semantically implausible or unlikely. A question mark appeared after each sentence to indicate to participants that a response to the sentence was desired (see Figure 2 for examples of each task).

[Figure 2 panels: Reading Span (example sentence: “After dinner the couple had a glass of tree. ?”) and Spatial Span]
Figure 2. Item examples, reading span and spatial span tasks, Experiment 1.

Apparatus. Digitized film clips were displayed on a Dell monitor. Dialog between the participants and the confederate was recorded onto a Marantz digital audio recorder. Two lapel microphones fed into a sound mixer, which sent an output signal to the Marantz recorder.
PowerPoint was used to present the working memory tasks on the same monitor. Participants recorded their responses to these tasks in a paper answer packet.

Procedure. Participants were greeted by the experimenter upon their arrival and brought to a waiting room where a confederate participant was already seated. The experimenter introduced them to each other and explained, in general terms, how the experiment would unfold: One of them would watch a film clip and describe it to the other person, who would have to take a memory test on the description. The experimenter then asked the participant and the confederate each to choose a number between 1 and 10. After both guesses were made, the experimenter revealed that the number that the confederate chose was closer to the number the experimenter had been thinking of. The experimenter then asked the confederate which task she would rather complete: the watching/describing task or the listening/memory test task. The confederate chose the memory test. The memory test was merely a cover story that increased the likelihood that the participant would include important details of the film clip in his or her description.

The participant was brought to a room, given the consent procedure, and then read the instructions shown in Appendix C. The participant then watched two short clips from highly obscure films from the early to mid-1980s (Basket Case and Breakin’). The experimenter then retrieved the confederate participant and brought her into the experiment room. When they were both seated, the experimenter explained how to attach the microphones to their clothes and made sure that each microphone was working properly. When the microphones were set up, the experimenter told the participant that he or she could begin describing the first clip in as much detail as possible for the confederate (for the later memory test). The confederate was given a script to follow during the experiment.
The confederate had been trained to interrupt according to the following guidelines. First, she was trained to make sure to interrupt in the middle of people’s sentences. This was done because interruptions that occur between sentences may come at times when no plan for the upcoming sentence has been made, whereas interruptions coming mid-sentence imply that there is something definitive to return to. Second, the confederate was asked to make sure to end the interruption herself by saying, “OK, sorry, go on.” This was done to attempt to normalize the way in which each interruption ended. Finally, to help prevent participants from feeling offended by interruptions, the confederate was trained to be as courteous as possible while interrupting. Specifically, the confederate was urged to apologize frequently and to act politely in general. Each participant was interrupted eight times over the course of the descriptions of the two clips.

Once the participant had finished the descriptions, the experimenter brought the confederate into another room for the supposed memory test. Upon returning roughly 20 seconds later, the experimenter administered the reading span and spatial span tasks to the participant, as well as a questionnaire. The questionnaire was designed to assess two factors: the participant’s familiarity with the film clips, and the extent to which the participant was aware of the purpose of the experiment. Each experiment ended with a debriefing session.

Data Analysis

Four primary measures were used to evaluate disruption. Two of these were temporal measures of resumption difficulty (primary and secondary resumption lags). These measures were obtained by having a laboratory assistant listen to waveform files of the recorded conversations and manually measure the time lags between interruption offsets and resumption onsets. The other two were transcript-based measures of disruption (repeated words and disfluencies).
In order to obtain these measures, one laboratory assistant transcribed segments of the dialogs containing interruptions, and another assistant read the transcriptions and coded them in a data file.

Primary resumption lag. A standard measure in the task interruption literature for evaluating the disruptive effect of secondary tasks on primary tasks is the amount of time it takes the participant to resume the primary task after completing the secondary task (Trafton et al., 2003; Altmann & Trafton, 2004). For this case involving naturalistic dialog, we are interested in observing the time between the offset of the conversational interruption and the onset of the first meaningful word spoken after the offset. The offset is defined as the end of the last word related to or terminating the material discussed during the interruption.

Secondary resumption lag. On many occasions, the first meaningful word a participant utters upon resumption is not the first word that continues the speaker’s pre-interruption conversational goal. Instead, the speaker could go back even further in the narrative, skip ahead in the narrative, or utter words simply to stall as she tries to recall the pre-interruption topic. Secondary resumption lag, a measure of disruption designed to supplement primary resumption lag, is the amount of time it takes after interruption offset to utter the first word that does continue the pre-interruption material.

There are two drawbacks to the secondary resumption lag measure. First, the decision about where a speaker resumes uttering a particular message is ultimately a subjective one. Second, the measure should be highly correlated with primary resumption lag, since in many cases, the first uttered word will be the same as the first word that resumes the pre-interruption topic. For these reasons, the measure may not be particularly informative as a measure of disruption.
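The relationship between the two temporal measures can be summarized in a short sketch. The Python code below is a hypothetical illustration only: in the experiment the lags were measured manually from waveform files, and the class name, field names, and timestamps here are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Resumption:
    """Hand-coded time stamps (in seconds) for one interruption; names are hypothetical."""
    interruption_offset: float  # end of the last word belonging to the interruption
    first_word_onset: float     # onset of the first meaningful word after the offset
    continuation_onset: float   # onset of the first word continuing the pre-interruption material

    @property
    def primary_lag(self) -> float:
        # Primary resumption lag: offset of interruption to first meaningful word.
        return self.first_word_onset - self.interruption_offset

    @property
    def secondary_lag(self) -> float:
        # Secondary resumption lag: offset of interruption to first continuing word.
        return self.continuation_onset - self.interruption_offset


# Invented time stamps: the speaker stalls or backtracks before continuing the message.
trial = Resumption(interruption_offset=12.0, first_word_onset=13.5, continuation_onset=15.0)
print(trial.primary_lag, trial.secondary_lag)
```

When the first meaningful word is also the first continuing word, the two onsets coincide and the two lags are identical, which is why the measures are expected to be highly correlated.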
Hence, I have included two transcript-based measures of disruption: number of repeated words and number of disfluencies. In order to obtain the measures, a laboratory assistant who also served as the experiment’s confederate transcribed the speech data from the waveform files. Then, a different lab assistant naive to the purpose of the experiment made the relevant judgments.

Number of repeated words. One way to improve the objectivity of measures that determine whether an utterance continues from where a speaker had intended to leave off is to examine whether he or she repeats a significant portion of the message that had already been expressed before being interrupted. Repetition of a message that has already been stated could be viewed as a fair indication of bookkeeping trouble. If a speaker knows where he or she left off, then it should not be necessary to repeat that message unless the speaker is having trouble remembering where the message had been cut off. An effective strategy for dealing with this inability to keep books correctly would be to go back in the narrative further than necessary, since otherwise one risks skipping important pieces of information. Doing so not only helps prevent the omission of important information in the course of the narrative, but also buys the speaker time to plan the message that was disrupted by the interruption. Granted, it could be argued that speakers will repeat messages after interruptions solely to re-establish common ground with the listener (Clark, 1996), but one could also argue that this re-establishment of common ground would be necessary regardless of interruption type.

Number of disfluencies. Speech planning difficulty is often accompanied by an increase in the number of filler-type disfluencies that a person utters (such as uh and um).
The number of disfluencies a person makes during the interruption lag augments the resumption lag and word repetition data by representing yet another indirect measure of resumption difficulty. The logic here is that the more difficulty a speaker has recalling the place at which he or she left off, the more disfluencies will be uttered.

Other measures. Two other measures were taken. First, I determined the length of the interruptions. Although several studies of TIs (e.g., Gillie & Broadbent, 1989) have failed to reveal effects of interruption length on measures of disruptiveness, it will be important to rule out this variable as a possible explanation of effects that could be observed. Additionally, many studies of priming in language production and comprehension (Branigan et al., 1999; Wheeldon & Smith, 2003; Scheepers, 2003; cf. Bock & Griffin, 2000) have shown that the length of the delay between a prime sentence and a target sentence influences the degree to which priming holds over the interim. Since the present task is more linguistic in nature than the tasks studied in the TI studies above, it is possible that such length effects could be revealed. Unfortunately, it would be impossible to regulate the lengths of the interruptions during the experiments without sacrificing ecological validity. For this reason, the interruption length data must be examined post hoc to rule out a confound between interruption length and speech resumption.

The final measures to mention are reading span and spatial span. The rationale for the inclusion of these measures of working memory span is straightforward: to investigate the role of working memory resources during resumption from a CI. People with low working memory span (low spans) should have more difficulty resuming speech following a CI than people with high working memory span (high spans) if working memory plays some role in bookkeeping.
Reading span scores are determined by individual items, as has been done in recent individual differences work (Swets et al., 2005), rather than by computing the standardized reading span scores used in other studies (Waters & Caplan, 1996).

Hypotheses and Predictions

The Bookkeeping Hypothesis is the theory that speakers in a dialog must actively maintain a representation of the past, present, and future contents of the current dialog. This representation consists of a hierarchical dialog event structure and a pointer that maintains one’s place in the structure. Results from the pilot studies, in concert with the literature review suggesting that bookkeeping processes are more likely to be disrupted by interruptions that take a speaker’s attention further away from a bookkeeping pointer (Levelt, 1989; Grosz & Sidner, 1986; Pickering, personal communication), suggest that the non-aligning conditions should be the most disruptive interruption conditions. Specifically, these conditions should elicit a) longer resumption lags, b) more disfluencies, and c) more outright resumption failures.

The Bookkeeping Hypothesis should also predict differences within alignment types. Specifically, repetition interruptions should be less disruptive than clarification interruptions because it is likely that many clarification questions will address topics that the participant had not intended to cover. Therefore, repetition interruptions should serve as the baseline condition for all levels of the independent variable. Also, the true interruptions should produce more disruption than the digressions because they are even further from the conversational goals of the speaker. The Bookkeeping Hypothesis also relies somewhat on the assumption that working memory supports maintenance of bookkeeping goals during dialog. A relationship between working memory and disruption must therefore be observed in order to maintain this assumption.

Results and Discussion

Resumption lag measures.
Average primary and secondary resumption lags are shown in Table 1 as well as Figure 3. Images in this dissertation are presented in color. A one-way ANOVA showed an overall significant effect of interruption type on both primary resumption lag, F(3,141) = 4.27, MSE = .97, p < .01, and secondary resumption lag, F(3,141) = 6.56, MSE = 2.98, p < .001.

Planned comparisons among the interruption types showed that for the primary resumption lag measure, the interruption types that resulted in the longest lags were the clarification and true interruptions, whereas repetition and digression interruptions resulted in the shortest primary resumption lags. The true interruptions resulted in significantly longer primary resumption lags than the repetition interruptions, t(47) = 2.37, p < .05, and digressions, although this latter effect was marginal, t(47) = 3.31, p = .10. However, primary resumption lags were not significantly longer following true interruptions than primary resumption lags following clarification interruptions, t(47) < 1. Clarification interruptions resulted in significantly longer primary resumption lags than both repetition interruptions, t(47) = 3.31, p < .01, and digressions, t(47) = 2.51, p < .05. Primary resumption lags were not significantly different in the repetition and digression conditions, t(47) < 1.

To sum up, in the primary resumption lag measure, the repetition and digression interruptions seem to pattern together and the true and clarification interruptions pattern together such that the latter types are more disruptive than the former types, because speakers took longer to begin speaking again after the latter types. These results are not predicted by alignment accounts because the clarification interruptions, which are still related to the overall discourse goal, cause more disruption than the digressions, which are not so related.

Table 1. Means and standard deviations of the four measures of resumption difficulty, displayed by interruption type.

                    Repetition     Clarification   Digression      True
Measure             M      SD      M      SD       M      SD       M      SD
RL1 in seconds      1.36   0.85    1.94   1.13     1.43   1.21     1.81   1.21
RL2 in seconds      2.46   2.07    2.76   1.64     1.90   1.45     3.44   1.96
Repeated words      1.14   1.63    1.90   1.86     1.06   1.29     2.19   2.09
Disfluencies        0.40   0.45    0.52   0.45     0.35   0.45     0.63   0.47

[Figure 3: bar chart. Y-axis: Resumption Lag in Seconds. X-axis: RL1, RL2. Bars: Repetition Interruptions, Clarification Interruptions, Digressions, “True” Interruptions.]
Figure 3. Primary (RL1) and secondary (RL2) resumption lags as a function of interruption type. Error bars are standard errors.⁴

⁴ All error bars presented in figures throughout the present document are standard errors.

The secondary resumption lag measures showed a similar pattern, with some small differences. Mainly, the repetition interruptions were relatively more disruptive than in the primary resumption lag measure, and the clarification interruptions were slightly less disruptive (see Figure 3). True interruptions resulted in the longest secondary resumption lags. They resulted in lags significantly greater than those following repetition interruptions, t(47) = 2.89, p < .01, and digressions, t(47) = 4.73, p < .001; and they resulted in lags that were marginally greater than those following clarification interruptions, t(47) = 1.89, p = .07. Although digressions showed significantly shorter secondary resumption lags than clarification interruptions, t(47) = 3.20, p < .01, they were not significantly shorter than in the repetition condition, t(47) = 1.47, p > .10. On the other hand, repetition and clarification interruptions did not differ from each other significantly, t(47) < 1. Hence, those two types patterned together as would be predicted by alignment accounts of dialog. The problem for the alignment account with respect to these data lies in the digression results.
A condition in which a completely unrelated topic is introduced during an interruption should derail the speaker’s attention more than the two aligning conditions. However, such a pattern of results was not observed in either the primary or secondary resumption lag measures. Instead, what the two sets of results have in common is rather amorphous compared to a simple alignment account. The true interruptions, as predicted, are greatly disruptive relative to the other interruption types in both resumption lag measures. However, in contrast to the predictions, the clarification interruptions are quite disruptive, and the digressions hardly disruptive. Because this pattern propagates through the remainder of the measures to be presented, I will offer explanations for these findings once all of the data are presented.

Number of repeated words. The dependent measure of number of words repeated upon resumption is intended to represent the need for speakers to backtrack upon being disrupted by an interruption. There is a need to backtrack because of the uncertainty of where one had left off. Data from this measure are presented in Table 1 and Figure 4. A one-way ANOVA was performed to determine whether the tendency of speakers to repeat words that had been uttered prior to interruption differed by interruption type. Results showed a significant overall effect, F(3,141) = 5.15, MSE = 2.91, p < .01. Planned comparisons revealed a pattern very similar to that found in the resumption lag measures. Namely, the clarification and true interruption conditions were not significantly different, t(47) < 1, nor were the repetition interruptions and digressions, t(47) < 1. However, as with the resumption lag measures, the clarification and true interruptions resulted in more repeated words than the repetition and digression interruptions (clarification vs. repetition, t(47) = 2.14, p < .05; clarification vs. digression, t(47) = 2.66, p < .05; true vs. repetition, t(47) = 2.59, p < .05; true vs. digression, t(47) = 3.25, p < .01).
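The degrees of freedom reported for these tests can be checked against the design. The sketch below assumes a standard one-way repeated-measures ANOVA with interruption type as a four-level within-subjects factor and 48 participants, and paired comparisons across the same 48 participants. The function names are hypothetical and serve only this illustration.

```python
def repeated_measures_df(n_participants: int, n_conditions: int) -> tuple[int, int]:
    """Effect and error degrees of freedom for a one-way within-subjects ANOVA."""
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_participants - 1)
    return df_effect, df_error


def paired_t_df(n_participants: int) -> int:
    """Degrees of freedom for a paired comparison across participants."""
    return n_participants - 1


# Four interruption types, forty-eight participants.
print(repeated_measures_df(48, 4))  # matches the reported F(3,141)
print(paired_t_df(48))              # matches the reported t(47)
```

Under these assumptions the values agree with the F(3,141) and t(47) statistics reported throughout this section.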
[Figure 4: bar chart. Y-axis: Average Number of Repeated Words. Bars: Repetition Interruptions, Clarification Interruptions, Digressions, “True” Interruptions.]
Figure 4. Average number of repeated words as a function of interruption type.

A clear pattern is beginning to emerge: the clarification and true interruption types seem to be more disruptive than the repetition and digression types. There is one more dependent measure of disruption to examine: disfluencies.

Disfluencies. Table 1 and Figure 5 display the mean proportion of trials during which speakers produced disfluencies upon resumption in Experiment 1, presented by interruption condition. As Figure 5 hints, a one-way ANOVA revealed an overall effect of interruption type, F(3,141) = 4.97, MSE = .16, p < .01. Again, the true interruptions and the clarification interruptions seemed to be more disruptive than the other two types of interruptions, although the difference between clarification and repetition interruption conditions was not significant, t(47) = 1.37, p > .10. True interruptions resulted in a significantly greater tendency to produce disfluencies upon resumption than both repetition interruptions, t(47) = 2.83, p < .01, and digressions, t(47) = 3.45, p < .01; the true interruption type did not differ significantly from clarification interruptions, t(47) = 1.49, p > .10. Finally, the difference between the clarification and digression conditions was significant, t(47) = 2.32, p < .05, but digressions and repetitions did not differ, t(47) < 1.

[Figure 5: bar chart. Y-axis: Average Proportion of Disfluency Trials. Bars: Repetition Interruptions, Clarification Interruptions, Digressions, “True” Interruptions.]
Figure 5. Average proportion of resumptions containing disfluencies (uh or um).
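The disfluency measure, the proportion of resumptions containing a filler, can be illustrated with a brief sketch. The Python code below is hypothetical: in the experiment a laboratory assistant coded the transcripts by hand, and the function names and example transcript fragments here are invented for the illustration.

```python
import re

# Filler-type disfluencies of interest, matched as whole words only.
FILLER = re.compile(r"\b(uh|um)\b", re.IGNORECASE)


def has_filler(resumption_text: str) -> bool:
    """Return True if the transcribed resumption contains 'uh' or 'um'."""
    return bool(FILLER.search(resumption_text))


def proportion_with_fillers(resumptions: list[str]) -> float:
    """Proportion of resumptions that contain at least one filler."""
    if not resumptions:
        raise ValueError("at least one resumption is required")
    return sum(has_filler(text) for text in resumptions) / len(resumptions)


# Invented transcript fragments for one speaker in one interruption condition.
examples = [
    "um so the guy opens the basket",
    "and then he runs to her apartment",
]
print(proportion_with_fillers(examples))
```

Matching on word boundaries keeps words that merely contain the letters (for example, “hummed”) from being counted as fillers.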
Given that the four measures of resumption difficulty all reveal the same general pattern of results, it is reasonable to conclude that this pattern is robust, and that true interruptions and clarification interruptions are more disruptive than repetition interruptions and digressions as defined in Experiment 1. However, there are some alternative considerations that may explain some of the results, such as length of interruption, practice effects, and participants’ awareness of the purpose of the experiment. I will presently discuss these issues.

Length of interruption. It would be problematic if the Experiment 1 patterns of disruption occurred simply because the more disruptive interruptions (clarification and true interruptions) happened to take longer. Average interruption lengths by condition are shown in Figure 6. It is clear that the repetition and digression interruptions were longer on average than the clarification and true interruptions. In fact, there was an overall significant effect of condition on interruption length in a one-way ANOVA, F(3,141) = 18.22, MSE = 4.65, p < .05. Given the implausibility of an assumption that shorter interruption lengths could lead to greater disruption, it appears that interruption length is not a concern for the pattern of results found in Experiment 1.⁵

⁵ In fact, Experiments 2 and 3 test the above implausible assumption and show it to be false.

[Figure 6: bar chart. Y-axis: Time in Seconds. Bars: Repetition Interruptions, Clarification Interruptions, Digressions, “True” Interruptions.]
Figure 6. Average interruption length by type.

Practice effects. One could reasonably suspect that many differences that arose between conditions were found solely in the first half of the description phase. That is, once a speaker has been interrupted a few times already, he or she might adapt to the interlocutor and come to expect more interruptions.
However, due to the particulars of the experiment design, it is difficult to examine this issue statistically. The problem is that there were only two observations per interruption condition per subject at most. Moreover, many participants were subjected to only one of the two interruptions in a given condition, generally because they failed to mention one of the plot points targeted for interruption. The major goal for each participant was to make sure that each interruption condition was represented once over the course of the two clip descriptions, and this goal was satisfied. But because fewer than half of the participants tested filled out each cell of the 4 (interruption type) x 2 (clip number) design, any data presented from such an analysis would have to be considered unstable. My hope is that future work will be able to avoid this problem.

Having stated that, informal investigations of dependent measure means by description number (1st versus 2nd) hint that a large portion of the effects of interruption type observed in the present study occurred during the first description. Take Figure 7, below, which displays the average number of repeated words by condition divided into first-clip descriptions (on the left-hand side) and second-clip descriptions (on the right-hand side). It seems as though the effects were much stronger on average during the first set of descriptions than during the second. Other measures, such as secondary resumption lag, also showed such a trend. Again, however, it is important not to over-interpret these informal observations, as they average across many cells of missing data.⁶ The experiment was not designed to measure practice effects, since it did not ensure an observation in each cell of the design required for such an analysis.

⁶ Informal observations of the data also hinted that interruptions were not any more or less costly from the first description to the second in terms of grand mean disruptiveness.
Figure 7. Number of repeated words as a function of clip number and interruption type (repetition, clarification, digression, and true interruptions; x-axis: number of clip description, Clip 1 versus Clip 2; y-axis: number of repeated words).

Awareness of purpose. During the course of pilot testing, it became apparent that trying to interrupt people more than a few times within the course of a limited time period might raise some suspicions about the purpose of the experiment. Because of this tendency, participants were given a brief questionnaire after the experiment in which they commented on whether they had noticed anything “unusual” during the experiment. Participants were coded into three categories: 1) fully unaware of the purpose of the experiment (comprising 29% of participants); 2) aware of the unusual way in which the other “participant” (the confederate) kept interrupting (44% of participants); and 3) aware that the confederate participant was intentionally, or artificially, interrupting (27% of participants). When this measure was used as a covariate in the ANOVAs of the four dependent measures presented earlier, it produced no interactions with the effects of interruption condition, Fs < 1. Therefore, it appears that a meta-awareness of the purpose of interruptions affords interruptees few advantages.

Working memory. Oddly, none of the dependent measures of interest was correlated with either verbal or spatial working memory span (all rs non-significant). There are two possible explanations: either there was not enough power to observe the correlations, or the types of disruption measured in the present experiment are outside the domain of working memory. With regard to this latter possibility, what may be happening is that the types of working memory measured by reading span and spatial span correspond to offline processing phenomena (Waters & Caplan, 1996).
That is, perhaps the types of working memory that underlie the ability to keep track of a bookkeeping pointer occur at a level that the memory tasks that were used do not adequately measure (Waters & Caplan, 1996).

Summary. Although the predicted pattern of results did not emerge, the pattern of resumption difficulty that did emerge was fairly consistent. It appears that the two interruption types most associated with relatively large costs in speech resumption are clarification interruptions and true interruptions. This was true for both primary and secondary measures of resumption lag, unnecessarily repeated words, and disfluencies. Because clarification interruptions were predicted to pattern along with repetition interruptions, and digressions were predicted to pattern along with true interruptions, these results are somewhat surprising. However, these data also allow for several conclusions to be drawn. The first is that different kinds of interruptions produce different amounts of disruption to bookkeeping processes, or the memory and attention processes devoted to keeping track of where one is in a dialog. Therefore, conversational interruptions are, in fact, disruptive, and this is the first study to demonstrate this phenomenon experimentally.

Furthermore, there are at least two properties of interruptions that are likely to cause more disruption than others. The first of these properties is the relevance to the overall goal of a dialog, or DP according to the terminology of Grosz and Sidner (1986). The second of these properties is the amount of response to an interruption that is required before returning to one’s previous discourse location. The underlying factor that unites these two properties of interruptions is attention. If an interruption is wildly off-topic, but lacks any noteworthy content and requires no response (as with the digression type), the interruption will result in relatively meager disruption.
On the other hand, if an interruption is wildly off-topic, as well as interesting, unexpected, or bizarre, it seems that disruption to bookkeeping processes can be expected because attention has been successfully drawn away from the previous bookmark. Conversely, if the topic of an interruption is aligned with the overall discourse goal, but requires lengthy vocal response from the primary speaker (as with the clarification interruption types), attention is similarly diverted from a dialog’s bookkeeping pointer. I will return to these issues in the General Discussion.

EXPERIMENT 2

Though Experiment 1 has many positive features for the psycholinguistic investigation of interruptions (especially its quasi-ecological nature), it does not have a very typical psycholinguistic design or approach. Psycholinguistic experiments are typically more tightly controlled than a naturalistic dialog situation could ever allow (Clark, 1996; Pickering & Garrod, 2004). The area of interest of Experiment 1 is also outside the scope of typical psycholinguistic research: interruptions, message planning, and prospective memory are underrepresented in psycholinguistics. Experiment 2, in which individual sentences of speakers are interrupted at predictable locations during production by tightly controlled experimenter-induced interruptions, attempts to address these potential deficiencies of Experiment 1.

The objective of this experiment is to investigate the way syntactic representations of speakers are accessed for production before and after interruptions. I created a task to test whether syntactic planning is analogous to finding one’s place within a bookkeeping representation in the message representation. The task, in other words, was designed to determine whether speakers keep “syntactic books”. The experiment also examines the issue of incrementality in language production.
Participants in Experiment 2 were interrupted by an experimenter at various points during the production of two-clause utterances that were elicited with stimuli similar to those used in Smith and Wheeldon (1999). The manipulation of interruption location (mid-first clause, between clauses 1 and 2, and mid-second clause) will test various hypotheses about the need to keep “syntactic books” (records of syntax uttered and syntax planned) during language production. The interruptions will be of two types: verbal (a grammaticality judgment task) and non-verbal (an arithmetic judgment task). This manipulation will help clarify unresolved issues from the literature on task interruptions that were raised by Gillie & Broadbent (1989). It is important that both types of interruptions be sufficiently demanding in order to properly test hypotheses relying upon the assumption that interruptions are disruptive to primary task performance due to such memory issues as prospective goal encoding, cue retrieval, interference, and decay.

Method

Participants. Thirty-six participants were recruited in the same manner as in Experiment 1.

Materials: Picture Description. Materials similar to those used in Smith and Wheeldon’s (1999) picture description task were used to elicit multiple-clause sentences. The stimuli from this experiment are arrays of several objects that depict movement above, below, or next to each other. The individual objects that comprised the whole picture arrays were taken from a database of rendered color images of simple objects created by Bruno Rossion, available free of cost at http://www.cog.brown.edu/~tarr/stimuli.html. The objects are catalogued by naming times and errors in a database available in Rossion & Pourtois (2004).
This database was chosen rather than the Snodgrass and Vanderwart (1980) object set because Rossion and Pourtois demonstrated that the color objects in their set were named significantly faster and more accurately than the line drawings used by Snodgrass and Vanderwart (1980). All objects that were chosen for images in Experiment 2 had single-syllable names that were selected for their high naming accuracy and low reaction time according to Rossion and Pourtois (2004). It was important that all objects be nameable within a certain range of accuracy and time. The object names used as experimental items are listed in Appendix D, along with their mean naming times and accuracy as reported by Rossion and Pourtois (2004).

Objects were digitally arranged into arrays so that three objects at a time appeared side-by-side, in a row, in each array. Twenty-four objects were chosen to appear in 24 different object arrays which comprised the experimental items. Hence, each object appeared in three different experimental arrays. There were a number of constraints considered in creating these arrays. First, no object could be shown in the same position twice. Second, no object could be paired with another object more than once. Third, objects that were similar in visual or phonological form, or semantically related to other objects, were placed in separate arrays. The “experimental” objects were also placed into some filler picture arrays in order to prevent participants from expecting certain target grammatical structures given some set of objects. The remaining objects for filler arrays were selected from a set of 30 highly nameable, single-syllable objects that were chosen from the same database as the experimental objects.

Half of the experimental arrays depicted the left-most object moving down by placing a downward arrow underneath it, and depicted the right-most object moving up by placing an upward arrow above it.
The other half of the arrays depicted the opposite: the left-most object moving up and the right-most object moving down. Figure 8 below shows an example object array. The numbers at the bottom of the arrays, as shown, served to signal to the experimenter when to interrupt. The middle digit of each 3-digit number at the bottom of the screen was the key: if the middle digit was a 0 or 1, a first-clause interruption was signaled; if the middle digit was a 2 or 5, a between-clause interruption was signaled; and if the middle digit was a 3 or 9, a second-clause interruption was signaled. As revealed by a post-experiment questionnaire, no participant learned this pattern, and most never even noticed the numbers.

Figure 8. Example Object Array, Experiment 1.

The target sentences for the experimental object arrays were two-clause sentences such as “The spoon moves above the box and the bowl moves below the box.” Twenty-four multi-clause sentences were elicited as experimental items to be interrupted. Seventy-two filler sentences were also elicited. Twenty-four of the fillers were multi-clause utterances similar to the experimental items; however, no interruptions occurred during the production of these sentences in order to minimize the extent to which participants could predict the circumstances under which they would be interrupted. The remaining 48 filler sentences were simple, single-clause sentences in which a single movement event was described. Half of these filler descriptions were interrupted to minimize participants’ ability to predict when interruptions would occur. All experimental and filler target descriptions are listed in Appendix E.

Materials: Speeded Grammaticality Judgment. A list of 120 sentences that had been used in previous grammaticality judgment experiments was pilot tested to determine mean item latencies and accuracies.
A set of 20 sentences, half grammatical, half ungrammatical, was chosen because the eight pilot participants answered them all correctly within a range of 3400 to 4400 milliseconds. This time range was chosen because the interruption task time window was set for 4000 milliseconds. The goal was for the interruption task to be as disruptive as possible (within reason). Hence, sentences that took just under or just over 4000 ms to respond to were selected. The sentences chosen are listed in Appendix F.

Materials: Speeded Arithmetic Judgment. During the pilot testing of grammaticality judgment problems, 26 arithmetic problems were also presented to participants. Half of the problems showed an incorrect solution, and the other half showed a correct solution. All problems were addition problems, and they ranged in difficulty from simple addition of 1-digit and 2-digit addends (easiest), to addition of two 2-digit addends (harder), to addition of two addends with decimal places to the tens place (hardest). Participants in the pilot test were to judge whether the problem was solved correctly. This pilot test showed that the two 2-digit addend problems fit best within the range of the 3.5- to 4.5-second interruption task window. Having found this, a set of 20 such problems was taken from Ferreira and Swets (2002) and modified to include either a correct or incorrect solution. These problems are listed in Appendix F.

Design. Two independent variables were manipulated within-participants. The first, interruption domain, had two levels: verbal (same domain) and arithmetic (different domain). The second variable, interruption site, had three levels, chosen by their location in the sentence.
Speakers were interrupted at one of three sites: the middle of the first clause (e.g., The spoon moves INTERRUPT above the kite and the bowl moves below the kite), between the first and second clauses (e.g., The spoon moves above the kite INTERRUPT and the bowl moves below the kite), and the middle of the second clause (e.g., The spoon moves above the kite and the bowl moves INTERRUPT below the kite). Each item (defined here as an object array with a particular target description) was represented in each of the six cells of the design an equal number of times.

Four blocks of 6 lists apiece were created. In each block, a particular item was counterbalanced across the two levels of the interruption type variable and across the three levels of the interruption locus variable. This is how the six lists were produced. Interruptions were sorted by type and then assigned randomly to the experimental and filler items. In three of the lists, a particular experimental item was associated with a particular interruption of one type, arithmetic or grammaticality judgment. Consequently, the other three lists saw that same item associated with a different interruption of the other type. Filler items and filler interruptions remained constant across each list in a block. Each list was given its own pseudo-random order. A second block of lists was created in which the interruption tokens associated with each item were rearranged such that all of the interruptions used for filler items in Block 1 became interruptions used for experimental items in Block 2. This rearrangement was done in order to reduce possible error due to the fact that particular interruption tokens may be more disruptive than others. The lists in Block 2 were also each given their own pseudo-random order. Blocks 3 and 5 reshuffled the random orders of the lists of Block 1, and Blocks 4 and 6 reshuffled the random orders of the lists of Block 2.

Apparatus.
The experiment was carried out on a Dell computer using E-Prime software. Vocal responses were recorded, as in Experiment 1, onto a Marantz digital recorder for later transcription. An E-Prime microphone and voice-key apparatus recorded participants’ resumption times post-interruption.

Procedure. Each experiment began with a practice session during which the participant was trained to perform all three tasks required during the experiment. First, they were trained in the speeded grammaticality judgment task; then the speeded arithmetic judgment task; and finally, the description task. Each trial of the experiment began in the same way: the participant looked at a fixation point in the center of a computer screen and pressed a button to begin. An object array then appeared, and the participant began describing the array quickly, accurately, and with well-formed, complete sentences. On occasional predetermined experimental trials, the experimenter pressed a button in the middle of the participant’s utterance that caused a “beep” to sound. The beep sound was accompanied by a fixation screen that lasted for 250 ms. During the practice session, the participant was told that the beep signified that the participant was to immediately cease speech and perform the task that came up on the screen after the 250-ms fixation screen. Half of these interruptions were speeded grammaticality judgment tasks, and half were speeded arithmetic judgment tasks. There were 42 total interruptions during the experiment: 24 occurred during experimental trials, and 18 occurred during filler trials. The remaining 52 filler trials had no interruptions. The judgment tasks were designed to last equal amounts of time (4 seconds), and they discontinued at the 4-second point regardless of whether the participant had finished the problem he or she was working on.
When the 4-second window terminated, another beep sounded, and the original image that the participant had been describing was presented again. The instructions were to then resume the description of the image from precisely where the speaker had left off. After the participant finished the description, the experimenter pressed a button. The participant could then begin the next trial by pressing a button. No-interruption filler trials also began with a participant button press, also contained a description, and also ended with an experimenter button press; however, there were no beeps or judgment tasks in those trials.

Data Analysis

As in Experiment 1, several dependent measures were used to estimate the effects of interruption type and location on disruption. These measures were similar to those used in Experiment 1, but also had notable differences. One measure was resumption lag, or the time lag from the offset of the interruption to the onset of speech. This measure corresponds to the primary resumption lag measure used in Experiment 1.[7] Three separate dependent variables measured disruption by examining resumption errors. That is, after performing an interruption task, participants would often resume speech ahead of, but more often behind, where they had left off. Errors of this type, particularly those in which participants skip ahead and thus omit information, are indicative of bookkeeping disruption. I therefore measured the proportion of resumptions that contained repeated words; the proportion of resumptions in which words were skipped after an interruption; and the number of words by which speakers were “off” upon resumption, which is like averaging the absolute values of the number of repeated words and the number of skipped words across subjects. First, I will fully describe each of these measures below. Then I will detail some extraneous variables that may contribute to disruption. It is important to rule out these variables as possible causes of disruption to validly test the present hypotheses.

[7] There was no measure of secondary resumption lag in Experiment 2, partially because of time and resource limitations, but mainly because the measure would not be nearly as informative. Specifically, there are no instances of speakers talking about anything besides the pictures in front of them, and so there were never any points at which one could say that, yes, the participant began speaking again here, but not about the picture.

Resumption lag. The logic behind this measure follows that of Experiment 1: the more a particular interruption type or location disrupts bookkeeping, the longer it should take that speaker to begin speaking again. An E-Prime voice-key mechanism automatically collected this measure as the time from the re-onset of the object array following the interruption to the time when the participant first triggered the voice key afterward.

Resumption errors. A laboratory assistant naïve with respect to the hypotheses of the experiment transcribed participants’ utterances and coded them as described below. Three types of resumption errors were coded from transcripts of participants’ recorded utterances. The first type of error was the proportion of trials in which the speaker repeated words that had already been uttered. Instructions to participants stated explicitly to resume sentences from exactly where they had left off after the interruption task. Because of this, the present experiment assumes that retracing words from before the interruption must be an error. Utterance 1 provides an example of an utterance containing such repetition.

1. “The frog moves below the cat and the cake moves (TASK) the cake moves above the cat.”

In this case, the speaker repeated the words the cake moves even though those words had already been uttered, and would already have been understood by the theoretical audience.
Hence, this utterance would be (and was) coded as a repeat. For future reference, the coder also correctly noted that three words were repeated. The coder was instructed to include as repeats only words that had been uttered in their entirety before performing the interruption task. In other words, if a speaker began saying below, but only uttered bel before being interrupted, and then resumed by saying below in its entirety, such a scenario was not coded as a repeat.

The second type of error to be coded was the proportion of resumptions in which a participant skipped words germane to a grammatical sentence. Unlike Experiment 1, in which the contents of upcoming utterances were impossible to predict, each utterance in Experiment 2 was entirely predictable. If, as described in Utterance 1, a participant saw a frog moving above a cat and a cake moving below that cat, only one utterance is possible (with a few unimportant, and likewise predictable, exceptions). Because of this predictability, it was possible to measure how far ahead a participant “skips” into sentence production. Take Utterance 2:

2. “The leaf moves (TASK) and the glass moves below the foot.”

The speaker of this sentence was describing a slide in which a leaf moved above a foot and a glass moved below the foot. The target utterance is as follows: The leaf moves above the foot and the glass moves below the foot. The underlined portion of this target utterance (above the foot), consisting of three words, is what the speaker had neglected to utter, or skipped. Much like repeats, a word that was only partially uttered before an interruption could not be counted as being skipped in the resumption. A word had to be skipped completely to be considered a skipped word.

The significance of this measure of skipped words is that it is perhaps the clearest demonstration of disrupted bookkeeping.
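The coding of repeats and skips described above can be illustrated with a short sketch. This is not the procedure used in the study, where a human coder worked from transcripts; it is a hypothetical illustration in Python. The function name `code_resumption` and the `ResumptionCode` record are invented here, and the sketch assumes that only fully uttered words are passed in (following the partial-word rule above) and that the resumed speech runs to the end of the target sentence.

```python
from typing import NamedTuple


class ResumptionCode(NamedTuple):
    """Hypothetical record of how far off a speaker was on resumption."""
    repeated: int        # words retraced from before the interruption
    skipped: int         # target words never uttered
    words_in_error: int  # absolute distance from the correct resumption point


def _words(text: str) -> list[str]:
    # Lower-case and drop sentence-final periods so that word lists compare cleanly.
    return text.lower().replace(".", "").split()


def code_resumption(target: str, before: str, after: str) -> ResumptionCode:
    """Code one interrupted trial.

    target: the full, predictable target sentence
    before: words uttered in their entirety before the interruption task
    after:  words uttered upon resumption (assumed to finish the sentence)
    """
    target_w, before_w, after_w = _words(target), _words(before), _words(after)

    # Assumption: the resumption is a suffix of the target, so its length
    # identifies the position in the target at which the speaker resumed.
    resume_idx = len(target_w) - len(after_w)
    if resume_idx < 0 or target_w[resume_idx:] != after_w:
        raise ValueError("resumption does not match the end of the target")
    if target_w[:len(before_w)] != before_w:
        raise ValueError("pre-interruption speech does not match the target")

    # Negative offset: resumed behind the stopping point (repeat).
    # Positive offset: resumed ahead of the stopping point (skip).
    offset = resume_idx - len(before_w)
    return ResumptionCode(
        repeated=max(-offset, 0),
        skipped=max(offset, 0),
        words_in_error=abs(offset),
    )


# Utterance 1 from the text: the speaker retraces "the cake moves".
utterance_1 = code_resumption(
    target="The frog moves below the cat and the cake moves above the cat",
    before="The frog moves below the cat and the cake moves",
    after="the cake moves above the cat",
)

# Utterance 2 from the text: the speaker skips "above the foot".
utterance_2 = code_resumption(
    target="The leaf moves above the foot and the glass moves below the foot",
    before="The leaf moves",
    after="and the glass moves below the foot",
)
```

Under these assumptions, Utterance 1 is coded as three repeated words and Utterance 2 as three skipped words, in line with the counts given in the text. Matching the resumption against the end of the target sidesteps the ambiguity caused by words such as "the" and "moves" occurring several times in each sentence.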
One could reason that repeating words is a natural component of sentence resumption, and, in fact, participants often appeared to repeat words simply out of habit. If a speaker repeats words, it is still possible to understand what he or she is saying, because everything necessary is said (in some cases, twice). On the other hand, skipping forward in a sentence often results in an utterance that is either ungrammatical or fails to adequately capture a description. Utterance 2 is not ungrammatical, but it does not express the entire message a hypothetical listener would need in order to fully imagine the picture. It is important that Experiment 2 demonstrates cases of this type of disruption, not only to measure how the independent variables impact their occurrence, but also to justify the use of repeated words as a measure of bookkeeping disruption. If people repeat words but never skip them upon resumption, it could be argued that no disruption had taken place at all, but merely some type of habit of repetition. The finding that speakers skip words in sentences after interruptions would be excellent evidence that the interruptions used in Experiment 2 can cause speakers to lose track of their place in sentence production.

Finally, the number of repeated words and the number of skipped words were coded into a single measure: number of words in error. For this measure, repeating two words and skipping two words were both coded the same way. The advantage of this measure is that it not only combines both error types in one measure, but does so in a way that represents the scope of the error.[8]

Other measures. As in Experiment 1, there are several factors that could obscure interpretation of results. One of these factors is the accuracy and speed with which participants performed the interruption tasks in the different conditions.
If the arithmetic problems that were used as interruption tasks were more difficult, or simply took longer, than the grammaticality judgments (or vice versa), then it may be difficult to determine whether it is the time spent doing the task or its domain that would lead to differences in resumption measures. Another issue is that of experimenter reaction time lag, that is, the ability of the experimenter to press a button to interrupt the speaker at precisely timed points in relation to spoken clause boundaries. This could be a problem because it relies on human timing to make the judgment call of when to hit the button to interrupt the speaker. Since a major variable in this study is the location of the interruption, it is important to minimize the error due to experimenter reaction time lag. Another precaution has to do with the HCI findings (Trafton et al., 2003; Hess & Detweiler, 1994) that repeated exposure to interruptions during complex primary tasks reduces the disruptive effects that can be observed in resumption lag measures; in other words, practice effects. This reduction of RL effects due to practice may result in the need to include fewer rather than more interruption trials, which could reduce the power of Experiment 2.[9]

[8] Because of the noticeable infrequency of filler-type disfluencies in Experiment 2, such disfluencies were not used as a measure of disruptiveness in Experiment 2.

Hypotheses and Predictions

Domain effects. How is bookkeeping during language production represented? Is bookkeeping a domain-general phenomenon that requires general attentional and memory resources, or is it domain-specific such that only verbal or non-verbal interruptions will sufficiently disrupt it? Disruptive interruptions requiring a lot of verbal response from the speakers in Experiment 1 raised the possibility that verbal interference could play a role in resumption difficulty following interruptions.
Moreover, Gillie & Broadbent (1989) found that task interruptions of the same domain as the primary task were more disruptive than task interruptions of different domains. The hypothesis that similar effects will be observed for interruptions to sentence production, a more natural mode for interruptions, will be termed the Same-Cost Hypothesis. This hypothesis is based on the assumption that within-domain interruptions produce more interference for the prospective goal signal than extra-domain interruptions. The hypothesis predicts that the grammaticality judgment task will be more disruptive to speakers than the arithmetic judgment task because the interruptions taxing the same verbal resources as primary tasks will be more disruptive than interruptions that tax more general or otherwise-specified resources. The Same-Benefit Hypothesis, however, holds that cross-modal cognitive exertion will cause more interference when recalling the goals of the primary task (speech) when the recall of those goals is necessary, leading to greater resumption lag costs.

[9] It may also be of some concern that the total time spent “interrupting” in this task is almost as great as the amount of time it takes to produce a single utterance. In that sense, this experiment would not be a study of interruptions, but rather of task switching. However, there are answers to this concern. First, a single trial follows the format of interruptions put forth by Trafton et al. (2003): there is a primary task (describing the picture), a secondary task (judgment task), and a resumption of the primary task. Additionally, there are multiple trials in a row during which no interruptions occur. In that sense, the primary task of description lasts quite a bit longer than a single interruption because it lasts over several trials spanning up to 20 seconds, whereas interruptions in this task last for only a few seconds.

Locality effects.
The other primary question to address in Experiment 2 is one that is of both practical and theoretical concern. The question is this: At what point during sentence production is it most costly to be interrupted? Are early, middle, or late interruptions the most costly? The theoretical interest of this question lies in the contrast between a purely temporal explanation for disruption and an explanation that rests on hierarchical considerations. A temporal explanation could be considered from two points of view: is it more costly to be interrupted while the path ahead is a long and arduous one, or is it more costly to be interrupted after the path has already been traveled? One point of view is that interruptions are most costly at the ends of sentences because speakers must have the representation of the entire already-uttered sentence stored in order to return to the correct bookkeeping pointer. The other point of view is that it is most costly to be interrupted at the beginning of a sentence because a great deal of the utterance plan has yet to be executed. This view holds that speakers create a global syntactic plan for a sentence that must be actively maintained in memory during production of the entire sentence. The further one progresses in the production of an utterance, the less prospective memory the speaker must maintain to complete the sentence. This viewpoint relies on key findings in the language production literature (Christianson, 2002; Ferreira & Swets, 2002, 2005; Swets & Ferreira, 2003) showing that speakers plan much more content before speaking and during speech than allowed for by the phrase-by-phrase view that has been previously endorsed (e.g., Smith & Wheeldon, 1999, 2001).
In other words, the hypothesis that early interruptions are more disruptive than later interruptions is a hypothesis for non-incremental language production, and it predicts that speakers interrupted early in a sentence show more disruption than speakers interrupted later in a sentence.

On the other hand, perhaps the question of locality in interruptions to sentence production has little to do with temporal issues, and much more to do with hierarchical, or syntactic, issues. It is possible that the most important consideration in determining the relative effects of interruption at different points in sentence production is simply the proximity to clause boundaries. It may be either more or less disruptive to be interrupted when at or near a clause boundary than when one is articulating the middle of a clause. If this were the case, bookkeeping processes would be shown to involve a clear syntactic component. Such a result would also verify that, like task interruptions, conversational interruptions are best handled at coarse rather than fine break points (Adamczyk & Bailey, 2004).

Results and Discussion

This section will be organized as follows. First, I discuss the accuracy and judgment time data. These results are presented first because understanding the nature of these measures is crucial to interpreting the resumption lag and resumption error data that are presented afterward. Presentation of data relating to possible practice effects follows. Finally, I summarize the data of Experiment 2 and discuss the larger implications of the results.

Interruption Task Accuracy and Judgment Time. Participants were quite accurate when responding to the interruption tasks. On average, participants were able to correctly answer the tasks within the 4-second window with 76.4% accuracy. However, when these numbers are broken down by experimental condition, an interesting effect emerges. The left half of Figure 9 illustrates this effect.
A 2 (interruption type) x 3 (interruption locus) repeated measures ANOVA showed a main effect of interruption type, F1(1,35) = 21.89, MSE = .06, p < .001; F2(1,23) = 31.28, MSE = .03, p < .001. Specifically, participants were much more likely to judge arithmetic problems correctly, and in time (M = .84, SD = .12), than to make correct grammaticality judgments in time (M = .69, SD = .15). There was no main effect of interruption locus on this measure, nor did the two factors interact, all Fs < 1.

[Figure 9: bar chart, vertical axis from 0.0 to 1.0, with bars for Language and Arithmetic interruptions in two panels (All Trials; Answered In Time). Legend: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.]

Figure 9. Proportion of correctly answered interruption tasks.

But consider the right half of Figure 9, in which accuracy rates are depicted from trials in which participants answered in time. Clearly, if participants gave an answer in time, they were at least as accurate for grammaticality judgments as they were for arithmetic problems. A two-way ANOVA confirmed that there was no main effect of interruption type on accuracy when only trials answered in time are considered, F(1,35) = 2.20, MSE = .043, p > .10. Additionally, there was no main effect of interruption locus, nor was there an interaction between interruption type and locus.

The comparison between the two accuracy measures depicted in Figure 9 suggests that participants do not find the grammaticality judgments any more difficult than the arithmetic judgments. The grammaticality judgments simply took longer to complete, as evidenced by the finding that accuracy rates for language tasks were so much better when only trials answered in time were included in analyses than when all trials were included. More evidence that language tasks simply took longer to answer than arithmetic judgments can be seen in the judgment time data. Figure 10, below, shows mean judgment times broken down at each level of both independent variables.
The left half of Figure 10 shows the means by subject for each trial in which the interruption task was answered within the available 4-second window.[10] The right half shows mean times to answer on only those trials that were answered correctly. Since both analyses showed the same results, I will present statistics only from the “in time” measure. A 2 x 3 repeated measures ANOVA revealed a significant main effect of interruption type, F1(1,35) = 102.33, MSE = 161,200, p < .001; F2(1,23) = 147.89, MSE = 75136, p < .001, such that grammaticality judgments (M = 3042 ms, SD = 259) took longer on average than arithmetic judgments (M = 2490 ms, SD = 371). The main effect of interruption locus was also significant by participants, F1(2,70) = 3.17, MSE = 69,066, p < .05, but not by items, F2(2,46) = 2.04, MSE = 45635, p > .10. It seems that the locus effect was driven by the third location. When speakers were interrupted during the second clauses of sentences, they took longer to make judgments (M = 2829, SD = 355) than when interrupted during the first clause (M = 2725, SD = 311), t(35) = 2.36, p < .05, or between the first and second clauses (M = 2745, SD = 270), t(35) = 1.98, p = .056. It is difficult to say why such an effect occurred, but it may suggest that interruptions are more difficult to process when a sentence has nearly been completed. The two factors did not interact.

[10] Due to software limitations, it was not possible to obtain judgment time measures for trials in which the participants took longer than 4 seconds to answer. If such a measure were possible, it would probably show even larger effects of interruption type on judgment time, based on the comparison of accuracy rates in Figure 9.

[Figure 10: bar chart, vertical axis from 0 to 3500, with bars for Language and Arithmetic interruptions in two panels (Answered In Time; Correct Only). Legend: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.]

Figure 10.
Interruption task judgment latencies as a function of interruption domain and locus.

Again, even without including trials in which participants did not answer in time, language interruptions took about a half second longer than arithmetic interruptions. This result is important to introduce before the rest of the results because it could color interpretation of the remaining disruption results. Specifically, the result raises the following question: If we find that language tasks are more disruptive than arithmetic tasks, is it the use of language representations across the primary and secondary tasks that causes interference, or is it simply the amount of time language interruptions take that causes these effects? This issue is important not only for this experiment, but for Experiment 3.

Resumption lag. The first measure of disruption to primary task (description) performance under consideration is resumption lag. Two separate ANOVAs were performed. The first analysis was based on all trials in which participants responded in time to the interruption task, regardless of whether the answer to the task was correct or not. The other analysis was performed using only the data from correctly answered trials. The rationale for using both analyses is that each could be informative in different ways. The “in time” measure could be useful to show how participants are disrupted regardless of whether they answer the interruption problem correctly or not. On the other hand, it could be argued that such a measure is much noisier than a measure in which all participants have performed the interruption task correctly. A two-way ANOVA showed that when all correct and incorrect trials in which participants answered in time were included, there was a main effect of interruption type that was marginal by subjects, F1(1,35) = 3.74, MSE = 189497, p = .061, but not significant by items, F2(1,23) = 1.87, MSE = 265,418, p > .10.
The opposite pattern was shown for interruption locus, in which a main effect was not found to be significant by subjects, F1(2,70) = 1.47, MSE = 403677, p > .10, but was significant by items, F2(2,46) = 3.19, MSE = 142688, p = .05.[11] The two factors did not interact, Fs < 1.

[11] Pairwise comparisons are presented for the correct-only analysis below.

The results analyzing only correctly answered judgment trials were clearer, and are shown in Figure 11 below. The ANOVA revealed a main effect of interruption type that was significant by subjects, F1(1,35) = 13.10, MSE = 133596, p = .001, and marginal by items, F2(1,23) = 2.89, MSE = 271224, p = .10, such that grammaticality judgment interruptions (M = 1190 ms, SD = 388) took longer to recover from than arithmetic judgments (M = 1010 ms, SD = 356). There was also a main effect of interruption locus that was significant by subjects, F1(2,70) = 4.33, MSE = 184725, p < .05, and marginal by items, F2(2,46) = 3.04, MSE = 119795, p = .06. There was no interaction, Fs < 1. The locus main effect was driven by the difference between first-clause interruptions (M = 1215 ms, SD = 498) and the other two locations: between clauses (M = 1077 ms, SD = 359), t(35) = 2.08, p < .05, and during the second clause (M = 1010 ms, SD = 356), t(35) = 2.37, p < .05. The latter two interruption locations did not differ significantly, t(35) = 1.20, p > .20.

[Figure 11: bar chart. Vertical axis: Time in Milliseconds (200 to 1600). Horizontal axis: Interruption Domain (Language, Arithmetic). Legend: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.]

Figure 11. Resumption lag as a function of interruption domain and locus, correct trials only.

There are two preliminary conclusions that can be drawn: First, interruptions that come earlier in a sentence are more disruptive than interruptions that come toward the middle or toward the end of a sentence.
In other words, it is more costly for early-sentence planning to be disrupted than for planning at later stages, at which point more material has been uttered. This conclusion can be drawn with some confidence because the accuracy and judgment time data do not present confounds for interpretation. The second preliminary conclusion is that language-based interruptions, because they share the same memory resources as language production, are particularly disruptive to bookkeeping processes due to interference. However, this conclusion is not as strong because of the potential confound between interruption duration and interruption domain.

Resumption errors. Although resumption lag offers one insight into the disruption caused by different interruption properties, the errors speakers make in resuming sentences after interruptions are perhaps just as telling, if not more so. I will describe the effects of different interruption conditions on repeated words, skipped words, and total words in error, in turn.

Across all experimental trials, including those in which no answer was recorded for the interruption task, speakers repeated at least one word on 16.3% of trials. This is a rather high percentage considering the instructions given to participants to resume sentences from exactly where they had left off. This figure dropped to 12.8% when only correctly answered interruption tasks were included. I will present analyses from both samples for this dependent measure, as well as for skips and words in error, in order to get a complete picture of error tendencies. It is possible that, in looking at data only from when speakers answered interruption tasks correctly, one ignores an interesting pocket of data: the one in which speakers were most disrupted. For this reason, both types of analyses are included.
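The two samples (all trials versus correct-only trials) and the two units of analysis (subjects for F1, items for F2) can be made concrete with a short sketch. The code below is only an illustration under stated assumptions: the trial records, field names, and values are hypothetical and are not the data of Experiment 2, and the function merely computes the cell means that a by-subjects or by-items ANOVA would take as input.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records; field names and values are illustrative only.
trials = [
    {"subject": 1, "item": 1, "domain": "language", "locus": "second", "correct": True, "repeat": True},
    {"subject": 1, "item": 2, "domain": "language", "locus": "second", "correct": False, "repeat": True},
    {"subject": 1, "item": 3, "domain": "language", "locus": "second", "correct": True, "repeat": False},
    {"subject": 2, "item": 1, "domain": "language", "locus": "second", "correct": True, "repeat": False},
    {"subject": 2, "item": 2, "domain": "arithmetic", "locus": "first", "correct": True, "repeat": False},
]


def cell_means(records, unit, measure="repeat", correct_only=False):
    """Average `measure` within each (unit, domain, locus) cell.

    unit="subject" yields the input to a by-subjects (F1) analysis;
    unit="item" yields the input to a by-items (F2) analysis.
    """
    cells = defaultdict(list)
    for record in records:
        if correct_only and not record["correct"]:
            continue  # the correct-only sample drops incorrectly answered trials
        key = (record[unit], record["domain"], record["locus"])
        cells[key].append(float(record[measure]))
    return {key: mean(values) for key, values in cells.items()}


by_subject_all = cell_means(trials, "subject")
by_subject_correct = cell_means(trials, "subject", correct_only=True)
by_item_all = cell_means(trials, "item")
```

In this toy data set, restricting the sample to correct trials lowers the proportion of repeats for subject 1, which mirrors the direction of the drop from 16.3% to 12.8% reported above, although the numbers themselves are invented.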
The analysis including all trials (see the left half of Figure 12) showed clear main effects of both interruption type, F1(1,35) = 10.81, MSE = .023, p < .01; F2(1,23) = 5.89, MSE = .028, p < .05, and locus, F1(2,70) = 16.58, MSE = .036, p < .001; F2(2,46) = 18.69, MSE = .021, p < .001, as well as a significant interaction between type and locus, F1(2,70) = 13.73, MSE = .028, p < .001; F2(2,46) = 8.29, MSE = .032, p = .001. I will begin with the main effects. Participants were much more likely to repeat a word after resumption if the interruption was language-based (M = .20, SD = .14) rather than arithmetic (M = .13, SD = .12), t(35) = 2.39, p < .01. Each level of the locus factor differed from every other level. Interruptions during the second clauses of sentences resulted in the most repeats (M = .26, SD = .19), more than both first-clause interruptions (M = .16, SD = .17), t(35) = 2.81, p < .01, and between-clause interruptions (M = .08, SD = .10), t(35) = 6.01, p < .001. First-clause interruptions resulted in more repeats than between-clause interruptions, t(35) = 2.87, p < .01. It appears that a single difference, between second-clause language and arithmetic interruptions, accounted for both the main effect of interruption type and the interaction. Specifically, during the second clause, verbal interruptions (M = .38, SD = .30) were more disruptive than arithmetic interruptions (M = .14, SD = .17), t(35) = 4.75, p < .001. When only the arithmetic data were considered, there was no difference between first-clause and second-clause interruption locations, t < 1, whereas a clear difference was evident in the language-only analysis.

[Figure 12: bar chart, vertical axis from 0.0 to 0.5, with bars for Language and Arithmetic interruptions in two panels (All Trials; Correct Only). Legend: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.]

Figure 12.
Average proportion of resumptions containing repeats as a function of interruption domain and locus.

The analysis including only correctly answered trials (see the right half of Figure 12, above) showed many of the same statistical trends, but at reduced levels of significance. The two-way ANOVA showed no significant main effect of type, F1(1,35) = 2.54, MSE = .036, p = .12; F2(1,23) = 1.81, MSE = .036, p = .19, but did show a main effect of locus, F1(2,70) = 3.31, MSE = .054, p < .05; F2(2,46) = 6.75, MSE = .028, p < .01, as well as an interaction between type and locus that was marginal by subjects, F1(2,70) = 2.50, MSE = .034, p = .09, and significant by items, F2(2,46) = 3.81, MSE = .032, p < .05. The pairwise effects break down in much the same way as in the previous analysis, so they are omitted for simplicity.

Two aspects of the repeat data bear some discussion: the odd patterns of repeats across both samples that were analyzed, and the reduction in significance from the all-trials analysis to the correct-only analysis. The overall pattern of word repetition likelihood with regard to interruption locus is quite interesting. Why would speakers repeat words more often when interrupted toward the ends of sentences? There are two general explanations possible. The first explanation appeals to bookkeeping mechanisms, and suggests that there are two locality constraints that determine the disruptiveness of interruptions: whether an interruption occurs mid-clause or between clauses, and whether the mid-clause interruption is early or late in a sentence. The second possible explanation is that speakers, when interrupted, may tend to seek out previous clause boundaries from which to resume, and they do this here despite instructions not to. It is difficult to distinguish these two accounts, but further looks at the error data, compared with resumption lag data, later in this section, will help do so.
If mid-clause and late interruptions cause more bookkeeping errors than between-clause interruptions, then speakers should not only repeat words more often at such junctures; they should also skip words more often. In fact, given the anecdotal evidence that it seems natural to repeat following interruptions, skipped words are likely an even more valid measure of disruption to bookkeeping. The likelihood that speakers would skip words was analyzed with two different two-way ANOVAs: the first on the set of all trials, right or wrong, whether answered in time or not; and the second on correctly answered interruption task trials. The ANOVA performed for all trials (see the left half of Figure 13, below) showed main effects of both interruption type, F1(1,35) = 4.94, MSE = .011, p < .05; F2(1,23) = 7.37, MSE = .006, p < .05, and locus, F1(2,70) = 8.25, MSE = .013, p = .001; F2(2,46) = 8.32, MSE = .008, p = .001, as well as an interaction between type and locus that was significant by subjects, F1(2,70) = 4.71, MSE = .008, p < .05, and marginal by items, F2(2,46) = 2.98, MSE = .008, p = .06. The interruption type main effect shows, again, that language interruptions (M = .056, SD = .071) are more disruptive than arithmetic interruptions (M = .023, SD = .058). Interestingly, pairwise comparisons for the interruption locus effect showed that the mid-first-clause interruptions (M = .083, SD = .12) were most disruptive, resulting in more skipped words than the between-clause interruptions (M = .021, SD = .056), t(35) = 2.84, p < .01, and second-clause interruptions (M = .014, SD = .050), t(35) = 3.44, p < .01. The between-clause and second-clause interruptions were not different, t < 1. The interaction is attributable solely to the fact that first-clause interruptions resulted in more skips if they were verbal (M = .125, SD = .174) rather than arithmetic (M = .042, SD = .127), t(35) = 2.65, p < .05.

[Figure 13: bar chart, vertical axis from 0.0 to 0.5, with bars for Language and Arithmetic interruptions in two panels (All Trials; Correct Only). Legend: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.]

Figure 13. Average proportion of resumptions containing skips as a function of interruption domain and locus.

A quick glance at the right-hand side of Figure 13 hints that the effects shown for the all-trials skips analysis disappear when computed only for trials in which interruption tasks were correctly answered in time. In fact, all levels of the analysis (main effects and interaction) that were significant in the prior analysis are not significant in the correct-only analysis, Fs < 1. There is a logical explanation for this result: time to prepare for resumption. When speakers answered in time, and correctly, there was an average of 1239 ms between the end of the interruption task and the re-introduction of the picture array that signaled description resumption. It could be argued that the reason there were any skips at all in Experiment 1 is that speakers occasionally took so long to answer the interruption task that they had no time between the end of the task and the resumption of the sentence to prepare to resume, and therefore had to make a quick guess about where they were in the sentence. Since this happened more often in the first location, especially with language, it is possible to draw the preliminary conclusion that early, language-based interruptions are more disruptive than later, non-verbal ones.

The final bookkeeping error type to consider is the total number of words spoken in error (see Figure 14). This measure is intended as an overall picture of bookkeeping error: it captures the total number of words that participants skip ahead or repeat, added together and averaged.
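As an illustration of how repeats, skips, and words in error relate to one another, the sketch below scores a single resumption from two word positions. This is a minimal sketch under an assumed coding scheme (the function name, arguments, and position-based scoring are hypothetical; the scheme actually used to code the transcripts is not spelled out here).

```python
from statistics import mean


def score_resumption(words_uttered, resume_position):
    """Score one resumption against the target sentence.

    words_uttered: number of target-sentence words spoken before the interruption.
    resume_position: 0-based position, in the target sentence, of the first
        word spoken after the interruption.
    A perfect resumption has resume_position == words_uttered.
    """
    offset = resume_position - words_uttered
    repeated = max(0, -offset)  # resumed too early: words said twice
    skipped = max(0, offset)    # resumed too late: words never said
    return {"repeated": repeated, "skipped": skipped,
            "words_in_error": repeated + skipped}


# Five words spoken ("The shirt moves below the"), resumption at "below"
# (position 3): two words are repeated.
repeat_trial = score_resumption(words_uttered=5, resume_position=3)

# Five words spoken, resumption one word too far ahead: one word is skipped.
skip_trial = score_resumption(words_uttered=5, resume_position=6)

# A perfect resumption contributes zero words in error.
clean_trial = score_resumption(words_uttered=5, resume_position=5)

average_words_in_error = mean(
    trial["words_in_error"] for trial in (repeat_trial, skip_trial, clean_trial)
)
```

Under this scheme a trial can contain either repeated or skipped words but not both, so the words-in-error measure is simply the size of the positional mismatch, averaged over trials.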
The picture that emerges is quite clear: Speakers tend to make the most and the largest bookkeeping errors when interrupted in the middles of clauses rather than at clause boundaries, and when interrupted by language interruptions (that take longer) rather than arithmetic interruptions (that do not take as long). Two separate analyses were performed as they were for repeats and skips. The two-way ANOVA in which all trials were included yielded significant main effects of both interruption type, F1(1,35) = 25.70, MSE = .214, p < .001; F2(1,23) = 10.53, MSE = .348, p < .01, and locus, F1(2,70) = 9.95, MSE = .573, p < .001; F2(2,46) = 16.14, MSE = .224, p < .001, as well as an interaction between type and locus, F1(2,70) = 10.44, MSE = .368, p < .001; F2(2,46) = 6.64, MSE = .398, p < .01. The interruption type effect again showed that verbal interruptions (M = .65, SD = .46) were more disruptive than arithmetic interruptions (M = .33, SD = .34), t(35) = 5.07, p < .001. Both mid-clause levels of the locus variable (first-clause: M = .50, SD = .44; second-clause: M = .77, SD = .81) resulted in significantly more words in error than the between-clause interruption type (M = .20, SD = .32), t(35) = 3.82, p = .001, and t(35) = 3.90, p < .001, respectively. The comparison between the two mid-clause interruption locations showed a marginal difference such that second-clause interruptions resulted in more words in error than first-clause interruptions, t(35) = 1.82, p = .08.

[Figure 14: bar chart. Vertical axis: Average Number of Words (0.0 to 1.6). Bars for Language and Arithmetic interruptions in two panels (All Trials; Correct Only). Legend: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.]

Figure 14. Average number of words in error (repeated and skipped) upon resumption as a function of interruption domain and locus.

The correct-only analysis showed trends in similar directions, but failed to produce similarly significant effects, especially the domain effect.
The main effect of domain in the correct-only analysis (see the right-hand side of Figure 14) was not significant, Fs < 1. Also, the main effect of interruption locus was not significant by subjects, F1(2,70) = 2.16, MSE = .400, p = .12, but was significant by items, F2(2,46) = 6.76, MSE = .106, p < .01. The interaction was marginally significant by subjects, F1(2,70) = 2.90, MSE = .165, p = .06, and significant by items, F2(2,46) = 4.80, MSE = .111, p < .05. Since the interaction is the only result that was significant, or near significant, for both subjects and items, I will focus on that result for interpretation. The only difference between the language and arithmetic simple effects is that second-clause verbal interruptions produced more words in error (M = .52, SD = .83) than both first-clause verbal interruptions (M = .16, SD = .31), t(35) = 1.83, p = .07, and between-clause verbal interruptions (M = .16, SD = .31), t(35) = 2.48, p < .05; on the other hand, arithmetic interruptions showed no differences in terms of locus, ts < 1. So, again, it appears that whatever force is driving the effects of locus, it is exacerbated by language interruptions more than mathematical interruptions.

Experimenter error. Two general approaches confirm that the experimenter who pressed the button to initiate the interruption process was able to properly instantiate the manipulation of interruption locus. The first approach is to show that each interruption locus showed consistent onset times by condition across participants. The E-Prime software used to implement the experiment registered the amount of time that passed between the initial onset of the object array to be described and the experimenter’s button press to initiate the interruption. This measure showed that across all trials, the time to interruption onset differed significantly across interruption locations, F(2,70) = 331.45, MSE = 97291, p < .001.
Pairwise comparisons showed that, as expected, first-clause interruptions had the earliest onset time (M = 1568 ms, SD = 294). These interruptions had onsets before those of between-clause interruptions (M = 2592, SD = 480), t(35) = 15.24, p < .001, which in turn had onsets before second-clause interruptions (M = 3459, SD = 454), t(35) = 11.09, p < .001. Clearly, the temporal manipulation check reveals no problem.

The other approach that serves as a manipulation check is to sample several participants at random to check whether they were interrupted at the appropriate point in utterance articulation. Transcribed data from five participants were selected at random. Each transcription showed three elements: a complete record of what the participant said, the point at which the beep sounded to signal the participant to stop, and the point at which the participant actually stopped speaking to perform the interruption task. Two aspects of these transcriptions were coded for manipulation checks: the number of words by which the experimenter missed the intended interruption mark, and the number of words the speaker uttered after the beep, but before completing the interruption task. Out of the 120 trials analyzed in this way, only four contained severe errors that violated the intention of the manipulations. One of these errors occurred because of a combination of factors: the speaker used an abbreviated utterance, was beeped two words too late, and continued to finish the rest of her utterance (three additional words) before addressing the interruption task. This trial had been excluded even before performing statistical analyses for Experiment 2. The other three errors were slightly more concerning: In two cases, for an intended between-clause interruption (before the onset of the word “and”), the beep sounded a single word early, as in (3) below.

(3) The cake moves above the (BEEP/TASK) mouse and the key moves below the mouse.
What this example shows is that between-clause interruptions may have been particularly susceptible to experimenter error. If the experimenter anticipated this location a word early, then the interruption occurred while the first clause was still being uttered. There was slightly more room for error on the other side: If the experimenter was one word late (interrupting after “and” but before “the”), then the interruption could still be said to occur at the clause boundary. On the other hand, if “the” was articulated before the beep, then the interruption occurred during the articulation of the second clause. The fourth error in the sampled data set was of this latter type. It should also be noted that one between-clause manipulation error resulted in a “repeat”:

(4) The shirt moves below the (BEEP/TASK) below the ear and the dog moves above the ear.

Because the interruption did not occur at the clause boundary, but was coded as such in data analyses, it is possible that errors such as these may have biased the data. Specifically, the data reported above (and in Experiment 3) may be overestimating the extent to which repeats legitimately occur when speakers are interrupted between clauses.

Practice effects. Data from correctly answered interruption trials were broken into halves according to the order in which they appeared in a given experimental session. Experiment half was treated as an independent variable with two levels: first and second half. Analyses of variance were performed to examine whether practice may have reduced the effects of interruption domain and locus over the course of individual experiment sessions. With one exception, none of the four major dependent measures examined in Experiment 3 showed an interaction between half and either independent variable, all Fs < 1. In other words, the effects of interruption locus were present regardless of how much practice a participant had with being interrupted.
The exception was a marginal interaction on the repeats measure that was found between interruption domain and 1st versus 2nd half, F(1,35) = 2.20, MSE = 326156, p = .09. However, because the interaction was only marginal, it may not be prudent to interpret the results.[12] There was, however, one main effect of experiment half, on resumption lag, which showed that performance as a whole improved from the first half to the second half of experiment sessions. Resumption lag measures reflected a drop from 1218 ms (SD = 553) in the first halves of sessions to 980 ms (SD = 279) during second halves, t(35) = 2.94, p < .01. Repeats, skips, and words in error revealed no main effects of experiment half, Fs < 1. It would seem, for this particular task, that practice does not reduce the differences between interruption types.

[12] An informal look at the means indicated that the interaction occurred not because differences between domains were reduced throughout the session, but rather because participants became more efficient with arithmetic interruptions than verbal interruptions throughout the course of the experiment.

Summary. Every major measure of resumption difficulty showed, to different degrees, that the two independent variables used in Experiment 2 produced effects. The resumption lag measure revealed that it takes longer to resume following language interruptions, as well as early interruptions. The measure of repeats showed that people most often repeat words when interrupted by language tasks, and that most repeats occur when speakers are interrupted mid-clause, especially if the clause is later in the sentence. Speakers most often skipped words, resulting in incomplete sentences, when faced with language interruptions and early interruptions.
The common results here are that the within-domain interruptions were consistently more disruptive than the different-domain interruptions, and that the mid-clause interruptions were consistently more disruptive than the between-clause interruptions. I will first discuss the locality effects, since those effects have no judgment time duration confound. I will then discuss the possible interpretations of the domain effects.

Although Experiment 3 and the General Discussion will explore these issues in more detail, the results of Experiment 2 reveal some interesting tendencies regarding resumption difficulty along temporal and location-based dimensions. First, it was found that it simply takes longer to resume a sentence if a speaker is interrupted early on in sentence production. I will argue that this result supports theories of strategically incremental language production (Ferreira & Swets, 2002; Swets & Ferreira, 2003). On the other hand, the resumption error data revealed that most errors occurred when participants were interrupted in the middles of clauses rather than at clause boundaries. Later, I will argue that these results support the notion that conversational interruptions are similar to task interruptions (Adamczyk & Bailey, 2004; McFarlane, 2002): Interruptions at coarse break points (like clause boundaries) are more easily handled than interruptions at fine break points (mid-clause).

There is little question that the manipulations used in Experiment 2 affected speakers’ ability to maintain records of utterance articulation. Unfortunately, there are three possible sources of the effects that took place: First, there are the two variables that I intended to manipulate, interruption domain and interruption locus.
But there was also a third variable that may have influenced disruption: time, both spent in judgment (allowing decay to act longer in the language interruptions), and available after judgment to prepare to resume (giving arithmetic interruptions a longer resumption preparation window). Because the grammaticality judgment interruptions simply took longer to answer than the arithmetic judgment interruptions, an unfortunate confound arises. Namely, since arithmetic interruptions did not take as long, and because there was a constant 4-second interruption window, participants gained extra time to prepare for resumption when answering arithmetic as opposed to language interruptions.

But why did verbal interruptions take longer, especially given the attempt to equate verbal and arithmetic interruptions on this dimension before testing? Two general explanations are possible. The first explanation is the more theoretically interesting one. It is possible that having to maintain information about sentence production is a process that interferes with a verbal task more so than it does with an arithmetic task: the particular memory resources overlap to such a degree that interruption task performance is degraded, which in turn leads to the longer times to answer the task, which in turn reduces the amount of time and verbal resources available to resume sentence production (Reitman, 1971, 1974). The other possibility is less interesting: because the interruption domains were not particularly well controlled for temporal factors before testing, the time required to perform each task may have been different a priori, independently of a possible interference factor. A third experiment was performed to help determine whether the interference or temporal account of the domain effect in Experiment 2 is more valid.[13]

[13] The main effects of locus found in Experiment 2 are uncontaminated by any interruption length confound.
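The size of the preparation-time confound can be approximated with simple arithmetic. The sketch below assumes, as the description of a constant 4-second window suggests, that the cue to resume arrives when the window closes, so that preparation time is whatever remains of the window after the judgment. The function name is hypothetical, and the inputs are the mean in-time judgment latencies reported for Experiment 2 (3042 ms for grammaticality judgments, 2490 ms for arithmetic judgments).

```python
WINDOW_MS = 4000  # constant interruption window reported for Experiment 2


def preparation_time_ms(judgment_time_ms, window_ms=WINDOW_MS):
    """Time remaining in the window after the judgment (never negative).

    Assumes the resumption cue coincides with the end of the window.
    """
    return max(0, window_ms - judgment_time_ms)


# Mean judgment latencies for trials answered in time (Experiment 2).
language_prep = preparation_time_ms(3042)    # grammaticality judgments
arithmetic_prep = preparation_time_ms(2490)  # arithmetic judgments

# Extra preparation time enjoyed after arithmetic interruptions.
arithmetic_advantage = arithmetic_prep - language_prep
```

On these assumptions, speakers had roughly half a second more to prepare after an arithmetic interruption than after a language interruption, which is the confound that Experiment 3 was designed to remove.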
EXPERIMENT 3

Experiment 2 showed that a speaker interrupted by a grammaticality judgment task is more likely to be disrupted upon resumption than when interrupted by a non-verbal, arithmetic judgment task. However, it was also found that the grammaticality judgments in Experiment 2 simply took longer to answer than the arithmetic judgments. This difference in length between interruption types could have been due to interference from primary and secondary tasks that share the same representational resources (Gillie & Broadbent, 1989). Alternatively, it could be the case that if interruption types are carefully equated in latency before testing their effects on resumption, the domain effects disappear, thus supporting a notion of temporal effects on disruption that Gillie and Broadbent (1989) had not found.

Experiment 3 calibrated a set of grammaticality judgments and a set of arithmetic judgments to be equal in mean latencies. These tasks were then used as interruptions in a task exactly like Experiment 2. If temporal factors account for the domain effects of Experiment 2, then there should be no effect of interruption type on the various measures of resumption difficulty. On the other hand, two outcomes could support the interference explanation: First, longer judgment times for the verbal interruptions despite careful pilot testing would be somewhat compelling. Such a result might suggest that there is some aspect of bookkeeping that is disruptive to secondary task performance (Reitman, 1971, 1974). But more importantly, verbal interruptions must result in greater primary task resumption difficulty. The former result would be evidence for interference. The latter result would be evidence for interference that causes disruption to bookkeeping.

Norming Study A

The results of Experiment 2 showed a confound in resumption lag results.
On one hand, the verbal interruption domain could have produced greater effects on resumption lag and other measures of disruption because of interference related to domain-specific processing resources. On the other hand, because verbal interruptions lasted longer than arithmetic interruptions, it is possible that participants used the extra time available in the arithmetic condition to get a head start on resumption before the beep signaled them to resume speaking. Then again, it could be that the verbal interruptions took longer because of domain-dependent interference. In order to sort out this issue, a norming study was run to determine whether the interruption sets used in Experiment 2 would show domain differences in a simple judgment task in which representations or processes from a primary task could not interfere with the secondary task judgments. The sentences and arithmetic problems used as interruptions in Experiment 2 were presented in isolation to a group of participants to establish baselines of judgment latency and accuracy.

Method

Participants. Twenty-three participants were recruited in the same manner as in Experiments 1 and 2.

Materials. The same 21 sentences and 21 arithmetic problems that were presented as interruptions in Experiment 2 were used as the stimuli.

Design. All 42 items were presented in a random order to each participant.

Apparatus. An E-Prime software program was used to present the stimuli and collect the dependent measures of judgment latency and accuracy.

Procedure. After giving informed consent, participants read instructions that asked them to respond to the grammaticality of sentences and the truth of solutions to the arithmetic problems by pressing yes or no on a button box. The instructions stated that they were to respond to the sentences and problems as quickly and as accurately as possible, and the experimenter reinforced this particular instruction verbally.
Each trial began with participants fixating a cross in the center of the computer monitor. Pressing either the yes or no button allowed participants to view the following sentence or arithmetic problem to judge. No limit was placed on the amount of time participants could use to judge the stimuli. After the participants judged the 42 stimuli, they were debriefed about the purpose of the experiment.

Results and Discussion

Table 2 shows the means and standard deviations of the accuracy and latency data. The most salient result was that grammaticality judgments took about a whole second longer to answer than arithmetic judgments, whether analyzing all trials, t(22) = 7.64, p < .001, or only the correct trials, t(22) = 7.84, p < .001. However, accuracy rates did not differ across type, t < 1.

Table 2. Latency and accuracy data, Experiment 3, Norming Study A.

                                    Language         Arithmetic
Measure                           Mean     SD      Mean     SD
Latency in ms (all trials)        3881    721      2879    536
Latency in ms (correct only)      3838    690      2881    556
Accuracy (percent)                94.0   7.52      95.0   4.65

Establishing this baseline of interruption latencies is revealing for the results of Experiment 2. It is quite clear now that interference is not a viable explanation for the longer latencies for secondary task response. Instead, the set of verbal interruptions simply took longer for participants to judge than the set of arithmetic interruptions. However, domain-dependent interference could still explain the disruption to primary task performance seen across domain types in Experiment 2. In order to test this, a series of iterative norming studies was first carried out. The objective of these studies was to obtain a set of grammaticality judgments that yielded the same latencies as a set of arithmetic judgments so that the interference assumption could finally be laid to rest. Three norming studies followed the first one, which established the baseline. There were two aspects of equating the two interruption sets.
First, arithmetic problems were made more difficult by using only problems that added up to two-digit sums in which both the ones place and the tens place held digits greater than or equal to 5. The more challenging step was to reduce the amount of time that participants spent reading sentences during grammaticality judgments. This was done by reducing the number of content words in the sentences to about 5 per sentence. Rather than give results for the two norming studies in which equal latencies were not achieved across domain, I will proceed directly to the fourth norming study (which I will call Norming Study B).

Norming Study B

Method

Participants. Twenty participants were recruited in the same manner as in Experiments 1 and 2.

Materials. A set of 21 sentences and 21 arithmetic problems was created by modifying the materials of Norming Study A over a series of iterations. The final materials list for the present study is presented in Appendix G.

Design, Apparatus, and Procedure. All details of design, apparatus, and procedure were identical to the details of Norming Study A, above.

Results and Discussion

Table 3 shows the means and standard deviations of the accuracy and latency data. T-tests showed that, although a slight trend seemed to exist for grammaticality tasks to take longer and be more accurate than arithmetic tasks, these trends did not approach significance, ts(19) < 1. Because the differences did not approach significance, the set of 21 items per domain was used in the Experiment 3 interruption task.

Table 3. Latency and accuracy data, Experiment 3, Norming Study B.

                                    Language         Arithmetic
Measure                           Mean     SD      Mean     SD
Latency in ms (all trials)        2732    841      2658    547
Latency in ms (correct only)      2710    824      2658    522
Accuracy (percent)                92.6   7.33      89.7   10.9

Interruption Study

Method

Participants. Forty-eight participants were recruited in the same manner as in Experiments 1 and 2.

Materials: Picture Description.
The same set of 24 experimental object arrays and 72 filler arrays that were used to elicit descriptions in Experiment 2 were used in Experiment 3.

Materials: Speeded Grammaticality Judgment. The list of 21 sentences in Appendix G, which did not differ reliably from the arithmetic tasks in judgment latency or accuracy, was used as interruptions to participants' descriptions in Experiment 3.

Materials: Speeded Arithmetic Judgment. The list of 21 arithmetic problems in Appendix G, which did not differ reliably from the sentences in judgment latency or accuracy, was used as interruptions to participants' descriptions in Experiment 3.

Design. The design details of Experiment 3 are the same as those in Experiment 2, with a few minor exceptions. First, 48 rather than 36 unique lists were created to accommodate the 12 extra participants to be run. Second, block creation was slightly different, in that interruptions were randomly re-assigned to different description items after each block of 6 lists. This was done to further reduce the chances that particular interruption-description pairings could bias the data.

Apparatus. Apparatus details match those of Experiment 2.

Procedure. Details of procedure are identical to those of Experiment 2, with a single exception: the time given to participants to judge interruption tasks was reduced from 4 seconds to 3.25 seconds. This was done because the average judgment time for each task domain, as pilot tested above, was just under 3 seconds. About 75% of the responses took about 3.25 seconds or less. Because it was important for the interruptions to take up as much of the interruption window as possible to ensure some disruption upon resumption, 3.25 seconds seemed like a reasonable figure to use.

Data Analysis

The same dependent measures used in Experiment 2 were used for Experiment 3.
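The logic of the temporal confound and of the shortened window can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes the all-trials mean latencies reported in Tables 2 and 3 and subtracts them from the fixed interruption windows (4 seconds in Experiment 2, 3.25 seconds in Experiment 3) to estimate how much preparation time would be left over before the resume signal. It assumes that the untimed norming latencies stand in for judgment times during the interruption studies, which is a simplification; the function and variable names are hypothetical.

```python
# Mean judgment latencies in ms (all trials), as reported in Tables 2 and 3.
NORMING_A = {"language": 3881, "arithmetic": 2879}  # Experiment 2 item sets
NORMING_B = {"language": 2732, "arithmetic": 2658}  # Experiment 3 item sets


def slack(window_ms, latencies):
    """Time left in a fixed interruption window after an average judgment.

    Assumption: the norming latency approximates the judgment time that
    would be observed when the task is used as an interruption.
    """
    return {domain: window_ms - ms for domain, ms in latencies.items()}


def domain_gap(latencies):
    """Language-minus-arithmetic difference in mean latency (ms)."""
    return latencies["language"] - latencies["arithmetic"]


exp2_like = slack(4000, NORMING_A)  # 4-second window, unequated items
exp3_like = slack(3250, NORMING_B)  # 3.25-second window, equated items

print("Experiment 2 items:", exp2_like, "gap:", domain_gap(NORMING_A))
print("Experiment 3 items:", exp3_like, "gap:", domain_gap(NORMING_B))
```

Under these assumptions, the unequated item sets leave roughly a second more preparation time after an arithmetic judgment than after a grammaticality judgment (1121 ms versus 119 ms), whereas the equated sets leave nearly the same amount in both domains (592 ms versus 518 ms).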
Hypotheses and Predictions

The purpose of Experiment 3 was to test two different accounts of the domain effect found in Experiment 2. According to one hypothesis, the domain effects occurred because verbal interruptions caused domain-specific interference for recall of primary tasks (descriptions). This hypothesis predicts that such effects will again be found in the present experiment. The other hypothesis is that the different interruption lengths between the interruption domains accounted for the effects. This hypothesis predicts a null effect of domain in the present experiment.

Results and Discussion

Interruption task accuracy and judgment time. As in Experiment 1, participants were quite accurate overall in completing the interruption tasks, despite the smaller time window. On average, participants were able to correctly answer the tasks within the 3.25-second window with 74.0% accuracy. Recall that in Experiment 2, there was an overall accuracy difference between language and arithmetic interruptions when all trials, including those in which no response was given, were considered (arithmetic was more accurate). The left half of Figure 15 (below) shows that this effect did not occur in Experiment 3, nor was there an effect of locus or an interaction, Fs < 1.

Figure 15. Proportion of correctly answered interruption tasks. (Panels: All Trials, Answered In Time; x-axis: Language vs. Arithmetic; bars: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.)

The right half of Figure 15 depicts accuracy rates from trials in which participants answered in time. A two-way ANOVA showed a main effect of interruption type on accuracy when only such trials were considered, F1(1,47) = 26.15, MSE = .027, p < .001, F2(1,23) = 30.25, MSE = .013, p < .001.
This effect occurred because when participants gave an answer in time, they were more accurate for grammaticality judgments (M = .94, SD = .10) than they were for arithmetic problems (M = .84, SD = .13). There was no main effect of interruption locus, nor was there an interaction between interruption type and locus (Fs < 1).

The comparison between the two sides of Figure 15 suggests that a speed-accuracy tradeoff was taking place for the two types of interruption tasks. When participants failed to answer in time, it was usually during a language task. However, when they did answer in time, they were correct more often on language tasks.

It is the judgment time data, however, that matter most. Figure 16, below, shows mean judgment times in Experiment 3. The left half of Figure 16 shows the means by subject for each trial in which the interruption task was answered, correctly or not, within the available 3.25-second window. The right half shows mean times to answer on only those trials that were answered correctly. Both analyses showed similar results and suggested that a speed-accuracy tradeoff indeed existed in Experiment 3 based on interruption type. However, the degree of the tradeoff in terms of speed was not nearly as severe. Because the two sets of data yielded such similar results, only the correct-only analysis is reported.

When only correct trials were considered, a 2 x 3 repeated measures ANOVA revealed a main effect of interruption domain that was marginal by subjects, F1(1,47) = 3.13, MSE = 119,300, p = .08, and non-significant by items, F2(1,23) = 1.64, MSE = 48,171, p = .21. On average, grammaticality judgments (M = 2471 ms, SD = 269) took 72 milliseconds longer than arithmetic judgments (M = 2399 ms, SD = 321). There was no main effect of locus, Fs < 1. There was a trend toward an interaction between type and locus, F1(2,94) = 2.29, MSE = 52,677, p = .11; F2(2,46) = 2.94, MSE = 33,245, p = .06.
However, this interaction did not reach significance, and it contributes little to the interpretation of upcoming resumption measures.

Figure 16. Interruption task judgment latency as a function of interruption domain and locus. (Panels: In Time, Correct Only; x-axis: Language vs. Arithmetic; bars: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.)

The primary concern of Experiment 3 was to create interruption tasks that took the same amount of time to answer across domain. The marginally significant difference, by subjects, between language and arithmetic interruptions could be considered a problem if not for three reasons. First, the two interruption types showed no difference in the above normative test results. This might suggest that, once the primary task is involved, it is within-domain interference that forces the difference in judgment times across domain. I should note, however, that such a conclusion would be weak in the present circumstance. Second, the effect did not approach significance by items, and the difference itself was on average only 72 ms (recall that the differences in Experiment 2 and Norming Study A reported above were much larger, on the order of 500-1000 ms). And finally, as I will show presently, interruption type had little if any effect on any measures of resumption difficulty. Hence, the endeavor to equate interruption types on the basis of judgment time was a modest success.

Resumption lag. The first measure of disruption to primary task (description) performance under consideration is resumption lag. A single 2 x 3 ANOVA was performed using only the data from correctly answered trials. The rationale for using only the correct trials was that the two interruption types were closer in judgment times than when all correct and incorrect trials were included.
Because the major goal of the present study was to analyze the effects of interruption type and locus without judgment time confounding interpretation, the other analysis was discarded. Results are presented in Figure 17 (below). The ANOVA revealed that interruption type had no effect on resumption lag, Fs < 1. However, as in Experiment 2, there was a main effect of interruption locus that was significant by subjects and by items, F1(2,94) = 13.74, MSE = 107,007, p < .001; F2(2,46) = 15.26, MSE = 38,228, p < .001. There was no interaction, Fs < 1.

Figure 17. Resumption lag as a function of interruption domain and locus, correct trials only. (x-axis: Interruption Domain, Language vs. Arithmetic; y-axis: Time in Milliseconds; bars: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.)

There are two important results that should be addressed: the non-significant effect of domain, which does not replicate Experiment 2 results, and the replication of the Experiment 2 locus effect (with a small twist). The lack of a domain effect here makes one point very clear: the effect of interruption domain on resumption lag in Experiment 2 arose only because the two interruption sets tested took different amounts of time to perform a priori. In other words, the interpretation of Experiment 2 that relied on effects of interference is untenable. However, as I will discuss later on, this difference between the two experiments does have theoretical and practical import: namely, that a temporal factor plays into resumption difficulty, which has rarely been shown in the HCI interruptions literature (Gillie & Broadbent, 1989).

Recall in Experiment 2 that the locus main effect was attributed to first-clause interruptions resulting in longer resumption lags than both between-clause and second-clause interruptions.
The main effect of locus in Experiment 3 was statistically more robust than in Experiment 2, and the pairwise comparisons also displayed one key difference from Experiment 2. Although the first-clause interruptions still resulted in longer resumption lags (M = 1138 ms, SD = 350) than second-clause interruptions (M = 906 ms, SD = 242), t(47) = 5.28, p < .001, first-clause interruptions were not significantly more disruptive than between-clause interruptions (M = 1096 ms, SD = 433), t < 1. Also, the between-clause interruptions resulted in longer resumption lags than the second-clause interruptions, t(35) = 3.50, p = .001, whereas Experiment 2 did not show a difference between those locations. However, these differences are not problematic for the conclusion that was drawn on the basis of the results of Experiment 2. That conclusion was that interruptions coming earlier in sentence production are more disruptive than interruptions that come later in a sentence. It seems that the locality effects on resumption lag are explained by the fact that the earlier a speaker is within the course of utterance production, the weightier the plan for the rest of the utterance is. As one reaches the middle of the final clause of an utterance, there is less of a plan to re-load once an interruption has been adequately handled. The next question to address is whether resumption errors show patterns similar to resumption lag results.

Resumption errors. Resumption errors will be presented as both all-trial and correct-only analyses in the figures below, but statistical analyses will only be presented for the correct-only analyses. Too many variables could be influencing resumption errors when more than just correct trials are included. It was useful to present these analyses in Experiment 2, when the gathering of information of many sorts was important.
However, the present scope of interest is narrower, so analyses will focus on cases in which interruptions were judged correctly and within the 3250 ms window.

I will first consider repeats. The two-way analysis including only correctly answered trials (see the right half of Figure 18, below) showed no significant main effect of domain, nor did it show an interaction between domain and locus, Fs < 1. However, the ANOVA did show a main effect of locus, F1(2,94) = 5.74, MSE = .078, p < .01; F2(2,46) = 6.49, MSE = .027, p < .01. Pairwise comparisons again showed that second-clause interruptions resulted in more repeats (M = .23, SD = .30) than both first-clause (M = .13, SD = .21) and between-clause (M = .10, SD = .22) interruptions, t(47) = 2.60, p < .05, t(47) = 2.74, p < .01, respectively.14

14 The all-trials analysis showed evidence for a domain effect, but again, this effect is difficult to interpret with all the noise from accuracy and temporal confounds.

Figure 18. Average proportion of resumptions containing repeats as a function of interruption domain and locus. (Panels: All Trials, Correct Only; x-axis: Language vs. Arithmetic; y-axis: Proportion of Resumptions Containing Repeats; bars: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.)

In Experiment 2, the effect of interruption locus on skips was only significant when all trials were included for analysis. This weakened the claim that any aspect of Experiment 2 measured actual bookkeeping disruption, since the cleanest way to analyze the present data is to use only those trials in which the interruption task is answered correctly. Experiment 3, in which the average time lag between pressing the button to answer the interruption task and hearing the signal for sentence resumption was greatly reduced, found effects of interruption locus on skipping tendencies in the correct-only analyses (see Figure 19, below).
The two-way ANOVA yielded a significant effect of locus, F1(2,94) = 12.05, MSE = .009, p < .001; F2(2,46) = 13.67, MSE = .063, p < .001. Just as in Experiment 1, the first-clause interruptions resulted in skips (M = .070, SD = .116) more often than both between-clause interruptions (M = .015, SD = .044) and second-clause interruptions (M = .010, SD = .038), t(47) = 3.50, p = .001, t(47) = 3.95, p < .001, respectively. Between-clause and second-clause interruptions did not differ, t < 1.

Figure 19. Average proportion of resumptions containing skips as a function of interruption domain and locus. (Panels: All Trials, Correct Only; x-axis: Language vs. Arithmetic; bars: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.)

The final bookkeeping error type to consider is the total number of words spoken in error (see Figure 20, below). The correct-only analysis found no main effect of domain, Fs < 1. There was a main effect of interruption locus, F1(2,94) = 4.16, MSE = .767, p < .05, F2(2,46) = 7.21, MSE = .255, p < .01. Domain and locus did not interact, Fs < 1. Planned comparisons among levels of the locus variable revealed that speakers produced the greatest magnitude of resumption errors when interrupted during the second clause (M = .68 words, SD = .90): the difference between second-clause and between-clause interruptions (M = .38 words, SD = .70) was significant, t(47) = 2.18, and the difference between second-clause and first-clause interruptions was marginally significant, t(47) = 1.89, p = .07. Although the means suggested such a trend, first-clause interruptions did not show any greater error magnitude than did between-clause interruptions, t < 1. However, the analysis in which all trials were included (see the left half of Figure 20) indeed showed that this difference was significant.
Hence, the overall resumption error magnitude data contribute to the pattern of results showing that resumption errors are reduced when interruptions come between clauses.

Figure 20. Average number of words in error (repeated and skipped) upon resumption as a function of interruption domain and locus. (Panels: All Trials, Correct Only; x-axis: Language vs. Arithmetic; bars: 1st Clause, Between 1st and 2nd Clauses, 2nd Clause.)

Practice effects. Data from correctly answered interruption trials were split into halves according to the order in which they appeared in a given experimental session. Session half was treated as an independent variable with two levels: first and second half. Analyses of variance were performed to examine whether the effects of interruption domain and locus changed over the course of individual experiment sessions, as would be expected if practice reduced them. None of the 4 major dependent measures examined in Experiment 3 showed that half interacted with either independent variable, all Fs < 1. In other words, the effects of interruption locus were present regardless of how much practice a participant had with being interrupted. This result differs from previous research on task interruptions (Hess & Detweiler, 1994; Trafton et al., 2003), which showed that training could eliminate the disruptive effects of interruptions. However, there were main effects of experiment half showing that performance as a whole improved from the first half to the second half of experiment sessions. For example, resumption lag measures reflected a drop from 1128 ms (SD = 357) in the first halves of sessions to 979 ms (SD = 348) during second halves, t(47) = 3.21, p < .01.
First-half repeats (M = .18, SD = .23) were marginally more common than second-half repeats (M = .13, SD = .19), p = .08, and although first-half skips occurred on 4% of correct trials versus 1.9% in the second half, the difference was not significant, t(47) = 1.21, p = .13.

Experimenter error. Just as in Experiment 2, each interruption locus showed consistent interruption onset times by condition across participants. Across all trials, the time to interruption onset differed significantly across interruption locations, F(2,70) = 324.53, MSE = 132,772, p < .001. Pairwise comparisons again showed that first-clause interruptions had the earliest onset time (M = 1674 ms, SD = 292). These interruptions had onsets before those of between-clause interruptions (M = 2591 ms, SD = 446), t(47) = 16.61, p < .001, which in turn had onsets before second-clause interruptions (M = 3569 ms, SD = 763), t(35) = 11.95, p < .001. So, again, the temporal manipulation check reveals no problem.

The other manipulation check sampled several participants at random to observe whether they were interrupted at the appropriate point in utterance articulation. Transcribed data from six participants were selected at random. Out of the 144 trials analyzed in this way, only seven (4.8%) contained severe errors that violated the intention of the manipulations. One error occurred because the experimenter never initiated the interruption task for an intended first-clause interruption. This was the only intended first-clause interruption that did not occur during first-clause articulation. Of the other 6 errors, 3 occurred for intended between-clause interruptions, and 3 occurred for intended second-clause interruptions. Although the three between-clause errors were similar in nature to those described in Experiment 2, the second-clause errors were different in kind because they did not occur due to experimenter error.
Rather, in these cases, the experimenter initiated the interruption task at the correct locus in articulation, but rather than immediately stop speaking, as instructed, speakers finished the present sentence before completing the interruption task. These trials were excluded from analyses and so do not present a problem for any of the above analyses.

Summary. Unlike Experiment 2, in which all measures of resumption difficulty showed that verbal interruptions were more costly to speakers than non-verbal interruptions, not a single measure of resumption difficulty showed such an effect in Experiment 3. Resumption lag, repeats, and skips were equivalent across interruption domains in Experiment 3. Clearly, the Experiment 2 domain effects were due to inadequately controlled task sets. As such, the possibility that domain-specific interference was driving the domain results of Experiment 2 (Reitman, 1971, 1974) can be ruled out. On the other hand, this contrast in results is interesting for another reason: it shows that the amount of time someone has to perform an interruption task and the amount of time a person has to recover from that task are important factors in recovery from interruption (Gillie & Broadbent, 1989). The General Discussion will explore this issue further.

Although the domain effects in Experiment 2 failed to replicate with properly controlled stimuli, the locality effects were replicated with some small exceptions. Experiment 2 had shown that interruptions coming during the middle of the first clause of a two-clause picture description resulted in higher resumption lag and more skips than interruptions coming at a major clause boundary or during the second clause. Experiment 3 also showed that resumption lag increased during the first clause, but also showed an elevated resumption lag at the clause boundary.
Repeats came more often in the middle of clause 2 in both experiments, and both experiments revealed that mid-clause interruptions are more costly in terms of overall words spoken in error than between-clause interruptions.

There are two conclusions to be drawn from this set of results, which will be expanded upon in the General Discussion. The first conclusion is that the resumption lag measure in this paradigm is sensitive to production planning load: the more burdensome the utterance plan that must be reloaded upon resumption, the longer it will take to reload it. The second conclusion is that the resumption error measures are sensitive to bookkeeping load. Because errors are less likely to occur at clause boundaries than when a speaker is mid-clause, it must be easier for speakers to maintain a bookkeeping pointer when interrupted at a clause boundary (a coarse break point, Adamczyk & Bailey, 2004) than when interrupted in the middle of a clause (a fine break point, Adamczyk & Bailey, 2004).

GENERAL DISCUSSION

The present project could be viewed as taking a rather asymmetrical approach to a scientific issue. Whereas Experiment 1 is rather open-ended, exploratory, and (psycholinguistically speaking) uncontrolled, the project ends with Experiment 3, a highly controlled experiment with specific questions about one or two issues dealing with interruptions. Although this asymmetry could be viewed as being scientifically unaesthetic, there are legitimate reasons for the disparate approaches. Experiment 1 approaches the problem of conversational interruptions from a naturalistic, applied perspective, and answers questions of bookkeeping at the message level during interactive conversation. The experiment tests whether there are types of conversational interruptions that, due to their differences in relatedness to the goals of the primary dialog, are more disruptive than other interruptions.
But besides being about interruptions, what truly ties all of the experiments together is that they all approach the same question from very different angles. When we are interrupted, what is it about those interruptions that makes us lose track of where we were? Although Experiment 1 begins to answer this question in a broad sense, it is clear that the full answer to this question will require the careful decomposition of many disparate factors. For this reason, studies that are more like Experiment 1 will never be able to fully embrace the challenge of explaining conversational interruption cost until studies like Experiments 2 and 3 break large problems down into smaller pieces. For example, the result in Experiment 1 showing that clarification interruptions, which required a verbal response from the speaker before resuming the primary dialog, were more disruptive than repetition interruptions raised the possibility that verbal interference played some role in resumption difficulty. Experiments 2 and 3 set out to test such an interference hypothesis, a process that would be difficult to accomplish using the techniques of Experiment 1.

The General Discussion will be organized as follows. First, I will address the ways in which the present results contribute to ongoing research on interruptions in the field of human-computer interaction. The present research will contribute to that field's work on temporal issues of resumption from interruptions, the influence of interruption domain on primary task performance, and the role of practice effects in recovering from interruptions. A thorough examination of the ramifications of the present study for theories of language production and dialog will follow. First, I will address how the findings of Experiment 1 reflect on the model of dialog put forth by Pickering and Garrod (2004).
Then, I will discuss how the results of Experiments 1, 2, and 3 begin to lay the groundwork for a more well-specified theory of bookkeeping at the levels of both dialog and sentence-level language production. Finally, I will explain how the results of Experiments 2 and 3 provide further support for the notion that sentence planning is a somewhat non-incremental process. The General Discussion will conclude with a series of proposals for a future program of research on interruptions.

Implications for Research on Interruptions

The Introduction laid out a number of issues that have been investigated by researchers in human-computer interaction. The temporal issues of coarse versus fine break points (Adamczyk & Bailey, 2004), task length (Gillie & Broadbent, 1989), and practice effects (Hess & Detweiler, 1994; Trafton et al., 2002) are all common issues for research on interruptions, and the present work sheds light on all of them. The present work also deals with such issues as domain-specific interference, the role of attention in recovery from interruptions, and, importantly, the memory for task goals. I will address each of these issues in turn.

Temporal Issues

Experiments 2 and 3 were highly controlled examinations of speech resumption tendencies following interruptions. Each experiment manipulated the timing of the interruption in two ways: early versus late, and mid-clause versus between clauses. Both of these manipulations carry important practical implications for human-computer interaction, namely, that the timing of interruptions can have a measurable effect on resumption (Adamczyk & Bailey, 2004). The present experiments were also relevant to such issues as interruption length and practice effects.

Sentence Location as Task Progression. Is it more costly to be interrupted early or late in task performance? Little work has been done related to this question in the HCI literature, and the present study suggests that this issue deserves attention.
Experiments 2 and 3 both found that the further one reaches in sentence production, the faster one is able to resume utterance production when the interrupting task ends. The explanation that was offered relates to the size of the plan that becomes disrupted. Early in sentence production, assuming a non-incremental view (Christianson, 2002; Ferreira & Swets, 2002), there is a plan for sentence production that extends far ahead. Later in sentence production, that plan has largely been executed. I concluded that there is a memory cost to having to reload a weightier plan early in sentence production. However, there is some question as to whether these findings would extend to realms beyond sentence production.

Coarse vs. fine break points. Recall the Adamczyk and Bailey (2004) study, in which participants expressed more frustration if they were interrupted at fine break points than at coarse break points, took longer to perform primary tasks after fine break point interruptions, but showed no effect of coarse versus fine break point interruption on resumption lag. The authors noted that the apparatus that measured resumption lag was only able to record time in full-second intervals, and hence they were unable to rule out effects of coarse versus fine break points on the local measure of resumption lag. An examination of Experiments 2 and 3 affirms the general pattern of results shown in the Adamczyk and Bailey (2004) study. I submit that being interrupted in the middle of a clause is analogous to a fine break point, and that being interrupted at a clause boundary is the analog of a coarse break point. Given this analysis, Experiments 2 and 3 showed that resumption errors are less likely to occur when interruptions occur at coarse break points in sentence production, i.e., at clause boundaries, than when they occur at fine break points, i.e., mid-clause.
In sum, the present results represent a potentially informative contribution to the notion that coarse and fine break points in the human representation of tasks play a large role in dealing appropriately with interruptions (Zacks & Tversky, 2001).

Issues of Resumption Lag and Task Length. In one of the foundational studies of what types of interruptions cause more disruption to primary task performance, Gillie and Broadbent (1989) failed to find any effects of the duration of the interrupting task. When Gillie and Broadbent (1989) studied the effects of mathematics task interruptions on participants performing a verbal item-location matching task, they found that interrupting had no effect on performance whether the math task was short (30 s) or long and involved (over 2 minutes). Insofar as the comparisons between Experiments 2 and 3 of the present study showed that the length of an interruption does, in fact, influence resumption difficulty, the present study provides evidence that interruption length can, under some circumstances, affect primary task performance. One caveat is that this result may be considered rather artificial. It is questionable whether resumption tendencies would have been affected if speakers had been shown the picture target and been allowed to resume speaking as soon as they pressed a button to answer the interruption tasks. The present results cannot decide this matter. However, perhaps future research can answer this question with more precision.

On the other hand, Experiment 1 showed that the length of an interruption had little to do with the disruption that the interruption caused for speech resumption during dialog. It would be interesting to dissect the issue of task length further, because it clearly has ramifications for theories of how memory supports the storage of prospective goals over interruptions. Specifically, what role, if any, does decay play in explaining the disruptiveness of interruptions?

Practice effects (and the lack thereof). Studies by Trafton et al. (2003) and Hess and Detweiler (1994) found that the disruptive effects of interruptions can be trained away with practice. Although several dependent measures of disruption in Experiments 2 and 3 showed main effects of practice such that performance in general improved from the first half to the second half of experimental sessions, the order variable rarely if ever interacted with the manipulated variables of interruption domain or locus. In other words, if a first-clause interruption caused a higher resumption lag than a second-clause interruption for a particular speaker during the first half of the Experiment 2/3 paradigm, such a relationship would likely remain intact during the second half of the experiment as well. Such results imply that, at least in the context of the tasks employed here, there are cases in which interruptions do not lose their ability to disrupt primary task performance.

A Role for Interference?

A sizable portion of the preceding document has been devoted to trying to understand whether domain-specific interference could play a role in how different sorts of interruptions might disrupt performance of a primary task. Experiments 2 and 3 reveal that domain-specific interference could be a non-issue for the interruptions literature, but Experiment 1 raises the possibility of a different sort of interference.

Support for the Similarity Hypothesis. As noted in the Introduction, Gillie and Broadbent (1989) had shown that the total time spent on a primary task (TOT) in a verbal item-allocation task was unaffected by an arithmetic task interruption. However, when a verbal task interrupted the item-allocation task, TOT increased, regardless of whether an interruption warning was given. The authors concluded that similarity between primary and secondary tasks, in particular along the dimension of domain, seems to affect the degree to which an interrupting task is disruptive.
To be precise, arithmetic tasks should disrupt arithmetic tasks to a greater degree than verbal tasks should, and verbal tasks should disrupt verbal tasks to a greater degree than arithmetic tasks should. The notion at work here is that interruptions that occupy the same mental resources as primary tasks should be more disruptive than interruptions that recruit other mental resources (Adamczyk & Bailey, 2004). In other words, the particular domain or modality that an interruption takes is vital for determining whether that interruption will cause disruption via interference to domain-specific memory mechanisms.

Experiment 3 demonstrates that the particular domain of an interruption has no effect on the resumption of sentence production following interruptions if other factors are properly controlled. Before proceeding, I should note that the dependent measures used in Experiment 3 and the dependent measures used in Gillie and Broadbent (1989) differ. Whereas Gillie and Broadbent (1989) used a global measure of time on task as their dependent measure of disruption, Experiment 3 used several local measures, including resumption lag and three types of resumption error. Perhaps the original study would have made similar observations if the present measures had been used; it is impossible to say. What can be said is this: Experiment 3 of the present study contributes to a growing body of evidence supporting the notion that domain-general short-term and working memory factors underlie a significant proportion of memory phenomena (Kane et al., 2004; Swets, Desmet, Hambrick, & Ferreira, 2005) that have typically been regarded as phenomena of domain-specific memory factors (Shah & Miyake, 1996). In the present case, it seems as though both arithmetic- and language-based tasks disrupt memory for utterance plans and bookkeeping pointers equally.
That is, if interference from a secondary task is responsible for disruption to primary task performance, the null results of Experiment 3 might suggest that this interference is domain-general. However, this conclusion must be considered tentative for two reasons. First, conclusions based on null results are simply not convincing. Second, it is possible that the reason domain effects were not observed in Experiments 2 and 3 is that the task that had been used to introduce non-verbal interference, the arithmetic judgment task, actually may have a verbal component. That is, participants may have used verbal strategies to support their arithmetic calculations, thereby producing the same verbal interference that grammaticality judgments did. Future research will test whether tasks providing little opportunity for verbal coding will elicit domain effects.

Task involvement. Experiment 1 revealed a surprising finding: One of the interruption types that was hypothesized to produce relatively little disruption, the clarification interruption type, was one of the most disruptive types tested. The property of this interruption type that seems most saliently responsible for this unexpected effect is the level of involvement required of the interruptee to deal with the interruption. Namely, it was the only interruption type that necessitated a meaningful and lengthy verbal response from the interruptee before resuming the primary dialog. Although a more carefully controlled experiment might have fit the hypotheses better, a happy accident such as this could help drive the science of interruption further by prompting a look at how the level of involvement required by interruptions influences their effects on primary task performance and resumption.

Consider the following scenario in a dialog. Bob is again telling David about a scene in a movie that he missed.
When David casually interrupts to mention how unusual an aspect of the scene sounds, Bob is able to resume without incident. However, when David asks what a particular character looked like so he can know which character Bob is talking about, Bob answers, and then stammers, repeats, and is disfluent upon his resumption of the scene description. Importantly, both interruptions take the same amount of time. The difference is the level of attention Bob needed to pay to each interruption type. For the first interruption, there was actually no need at all for Bob to consider David’s interruption beyond mere superficial encoding. For the second interruption, however, there was a requirement to encode, comprehend, and answer the question David posed. The issue at stake here is akin to the idea of good-enough processing in language comprehension (Ferreira, Bailey, & Ferraro, 2002; Sanford & Sturt, 2002). Perhaps interruptions are disruptive (in part) to the extent that they require a deep level of processing rather than shallow processing: The more attention an interruption demands, the greater its cost to the storage of primary task goals.

Utterance Planning and Task Goals: A Useful Comparison?

This document makes frequent and important analogies between interruptions to language production and task interruptions. These connections are made at the levels of theory, design, terminology, and dependent measures. A valid question could be posed to undermine this approach: What if language is, indeed, special? Why should the theories, processes, measures, and research approaches used in studying task interruptions translate to the study of interruptions to language production? I respond to this hypothetical and slightly opinion-based question with a slightly opinion-based answer: First, there are those of us in psycholinguistics who study language as though it is an untouched island apart from the otherwise crowded land masses of the mind.
Then, there are others who view language as just another facet of mind that is subject to the same mental limitations of memory and attention that all mental processes face. I consider myself to be of the latter camp, and it seems to me that it is just as useful to understand what language has in common with other mental faculties as it is to know how it differs. More importantly, for the present project the analogy has been effective for learning from a domain in which interruptions are somewhat understood, obtaining objective measures of disruption to language production from that domain, and examining what these measures mean for language production. I will presently turn to these issues.

Implications for Theories of Language Production

Although the implications of the present research for the study of task interruptions are interesting and rich, the potential implications for theories of language production are just as broad. I presently review how the current work not only helps begin to sketch out a model of bookkeeping during language production, but also adds a new approach to the incrementality debate and poses an important question for the dialog model of Pickering and Garrod (2004).

Implications for Bookkeeping

The clause as a unit of sentence-level bookkeeping. When speakers are interrupted in the middle of articulating a sentence, there is an apparent preference to return to a clause boundary. Two pieces of resumption error evidence from Experiments 2 and 3 support this notion. The first is that errors are far less common when speakers are interrupted at clause boundaries than when they are interrupted mid-clause. The second is that speakers interrupted early in a sentence will often skip ahead to a clause boundary, and speakers interrupted late in utterance production will backtrack to a previous clause boundary. It is almost as though the sentence-level bookkeeping system has two tiers.
At the first tier, the speaker is able to recall in exact detail the locus where he or she had left off. Resumption in this case is error-free, and this error-free resumption comprises the majority of trials in the experimental paradigm used here. At the second tier, the bookkeeping pointer has been lost, but the overall representation of the utterance plan is still intact. When the pointer is lost, the system employs a strategy born of past success, of convenience (as it conforms to the architecture of the system), or of both: go back, or go forward, to the nearest major syntactic boundary. For these reasons, it makes some sense to argue that maintaining a bookkeeping pointer at the sentence level uses the clause as the major unit of information storage.

So, where was I? A third tier. Although such cases never arose in Experiments 2 and 3, and arose only in the Pilot Studies for Experiment 1, there certainly are also cases in which interrupted speakers lose track not only of the place in a particular sentence where they had left off, but of the entire utterance itself. Cases like this are often marked in discourse by the question, “Where was I?”, as in the opening scripted conversation. It would be interesting to try to find circumstances under which these types of errors can be elicited. Knowing when these errors occur could be very informative in future studies of dialog-level bookkeeping.

Although participants in Experiment 1 did not explicitly lose track of their place in sentences, indirect measures of resumption difficulty revealed that certain types of interruptions disrupted bookkeeping more frequently than other types. As noted previously, it is likely that the amount of attention paid to conversational interruptions plays a large role in how disruptive they become to dialog-level bookkeeping.
Immersive interruptions that demand attention by requiring thoughtful responses or by simply being outlandish seem the most likely interruptions to disrupt keeping track of a dialog. Although this analysis does not mesh especially well with the idea that cooperative, or aligning, interruptions will aid bookkeeping, but competitive, or non-aligning, interruptions will disrupt it, the analysis fits marvelously with the model of dialog that inspired much of this work (Grosz and Sidner, 1986). The Grosz and Sidner model uses a goal stack in which discourse goals get pushed and popped depending on how interruptions are able to capture attention. This model creates a passable bridge between the two research domains of language production and task interruptions. By combining the study of conversational interruptions with the notion that interruptions require the maintenance of primary task goals (e.g., Altmann & Trafton, 2004; Trafton et al., 2003), the present study can be viewed as empirical support for the model of dialog proposed by Grosz and Sidner.

Let us try to unpack this argument by using Experiment 1 as an example. In Experiment 1, the primary goal of the speaker was to convey enough information about the film clips for the listener to be able to take a memory test based on the description. When speakers were interrupted at hypothetical Point X, the subgoal that was interrupted was to inform the listener about the plot details of Point X. Describe Point X is the goal that requires attention, according to Grosz and Sidner, and memory maintenance, according to Altmann & Trafton (2002). Interruptions that push this goal down into the attentional/memory stack are either the ones that veer wildly from Describe Point X, or the ones that require a higher degree of attention than usual if the interruption is related to Describe Point X.
The further attention is diverted from the primary goal, the more the attended interruption will interfere with memory for Describe Point X (Altmann & Trafton, 2002). This analysis of interruptions thus synthesizes principles from computational linguistics and the task interruptions literature to lay the groundwork for a cognitive model of conversational interruptions.

Issues of Incrementality

An ongoing debate in the language production literature centers on the amount of planning that speakers do before and during utterance production. According to influential models (Kempen & Hoenkamp, 1987; Levelt, 1989) and empirical studies (Schriefers, Teruel & Meinshausen, 1998; Smith & Wheeldon, 1999, 2001) of grammatical encoding in language production, syntactic planning proceeds incrementally, phrase by phrase, through an utterance. These models (Kempen & Hoenkamp, 1987; Levelt, 1989) also assume that grammatical encoding, unlike conceptual/semantic planning, operates in a reflexive, cost-free fashion. In one study (Smith & Wheeldon, 2001) it was argued that grammatical encoding is costly before articulation, but comes for free once speech has begun. Research supporting incremental, phrase-by-phrase planning of syntax has typically focused on studies showing that latencies to begin speaking are not affected by information beyond the earliest parts of an utterance (Smith & Wheeldon, 1999, 2001). In recent years, new artillery has been recruited to demonstrate the narrow scope of syntactic planning: eye-tracking (e.g., Griffin & Bock, 2000). The eye-tracking methodology has shown that speakers tend to follow rigid looking-and-speaking schedules such that fixation on an object is shortly (within 500 ms) followed by the naming of that object in the context of simple sentences.

Other researchers claim that the scope of planning in language production is not rigidly incremental, but rather is under the strategic control of the speaker (Costa & Caramazza, 2002; Ferreira & Swets, 2002, 2005; Schriefers & Teruel, 1999; Swets & Ferreira, 2003). This research has shown that different task circumstances, especially time pressure, elicit different syntactic planning ranges. In one such study (Ferreira & Swets, 2005), speakers under time pressure were affected during sentence production by syntactic information that would not be uttered for two full clauses. Such research shows not that speakers always plan a clause or two in advance, but merely that, if desired, they are capable of doing so.

Experiments 2 and 3 represent a new approach to the question of planning scope in sentence production. By interrupting participants at various points in two-clause sentence production (early, middle, and late), it is possible to measure how much planning has been done before interruption (Meyer et al., 1986). Because interruptions caused the most disruption when a full clause or more had yet to be uttered, and the least disruption (in terms of resumption lag) when only a portion of a single clause remained, one must conclude that the participants in Experiments 2 and 3 were planning their syntax at least a clause at a time. Consider this: if speakers only planned one phrase at a time, why would they take more time to resume speech following an interruption when more of the sentence remained to be planned? If syntactic planning is truly phrasal in scope, then planning the next phrase in the middle of a sentence should surely be just as easy as when only one phrase remains to be planned. For this reason, the data presented in Experiments 2 and 3 should be considered as more evidence against the radically incremental view of syntactic planning, and for the strategically incremental view.15

Dialog: Is Alignment Real?

In the model of dialog espoused by Pickering and Garrod (2004), the alignment of representations between interlocutors in a dialog is regarded as the essential component in their theory of why dialog seems so easy compared to monolog. This notion was paramount when designing the interruptions for Experiment 1, and when testing whether interruptions that helped promote representational alignment between interlocutors resulted in easier dialog (and hence, more efficient speech resumption). But a funny thing happened on the way to supporting the alignment model: One of the interruption types (clarification) designed to promote alignment caused just as much disruption to speakers as the interruption designed to produce the most disruption (true interruptions). Conversely, one of the interruptions designed to produce a good deal of disruption by breaking alignment (digression) was just as un-disruptive as the least disruptive, and most aligning, type (repetition).

I will not argue that this result does any notable damage to the alignment model of dialog. I would like to note, however, that if the results obtained in Experiment 1 are replicable such that an interruption that promotes alignment does more damage to the primary dialog than an interruption that does not promote alignment, then clearly, there are factors that are just as important as, if not more important than, alignment. Earlier in this General Discussion, I argued that the degree to which it is important to pay attention to an interruption is quite likely to be such a factor.

15 I would be curious to see what Experiments 2 and 3 would show if an eyetracking component were added. It seems as though this could be a case in which the eyes trick experimenters by appearing to betray an incremental following of items to be named, while the syntactic planner is, in actuality, moving beyond the scope of the eye movements.

Future Directions

Establishing a baseline.
The HCI literature on task interruptions uses a tool for establishing whether interruptions are disruptive: performing a task with interruptions in one condition, and performing a task without interruptions in another condition. In which condition does task performance show negative effects? Because of the nature of the designs of the two paradigms used in the present study, this type of baseline could not be examined. Future research could address this in two ways: by changing the dependent measures used to evaluate primary speech production difficulty (for example, switching from RL to TOT), or by modifying the paradigms employed herein. For example, a study in which grammaticality judgments interrupt speech, compared to a condition in which speech is interrupted by 3.25 seconds of empty silence, could easily establish that filled interruptions are more disruptive than mere pauses. In fact, Experiment 2 begins to demonstrate that fact by showing that time spent waiting to resume after answering the arithmetic problems essentially functions as extra resumption lag time.

Testing the Good-Enough Theory. Are interruptions really more disruptive when speakers are required to pay attention to them? Future work could test this idea by interrupting speakers with various bits of information to read. In one condition, speakers are told that they will later be tested on this information. In another condition, speakers are told that there will be some meaningless sentences that appear on the screen, and that they will not need to know that information later on. A result showing that the former condition results in greater disruption could represent evidence for a kind of good-enough theory of interruption cost (Ferreira et al., 2002).

The role of story grammars. Appendix A refers to story structure as a possible influence on the ability to recover from conversational interruptions.
Levelt (1989) hints at a link between storytelling and the structure of intentions behind the production of stories by citing research in which primary goals of storytellers are treated differently in sentence structure than secondary goals. For instance, Brown and Dell (1987) showed that storytellers use different grammatical structures depending on whether an instrument of an action (e.g., stabbing) is a typical instrument (e.g., a knife) or an atypical instrument (e.g., an ice pick). Two findings emerged from this study. First, typical instruments were explicitly mentioned less often than atypical instruments. Second, when typical instruments were mentioned, they tended to be grammatically encoded outside of the action clause (e.g., The robber grabbed a knife and stabbed the man), whereas atypical instruments were mentioned in the same clause as the verb (e.g., The robber stabbed the man with an ice pick). Typical and atypical instruments are encoded differently because the intentions behind mentioning each are different. Atypical instruments are mentioned more often than typical ones because, for example, although it can be assumed that a knife was used to stab someone, the same cannot be assumed for an ice pick. It is imperative for the overall intention of telling a story to know which instrument was used to stab someone. On the other hand, if one can assume that a knife was used, then explicitly mentioning the knife must be motivated by a “side intention” such as embellishing the story.

Unfortunately, Levelt does not draw the connection directly between goal hierarchies and story structures. Therefore, it would seem best to reserve investigation into this area for future work. Although Experiment 1 did not go in this direction, it is possible that future work could address this issue.
However, it will be necessary to find some research that has adequately explained how a question about, say, a character’s hair color, resides at a fundamentally different “level” than a question about the setting of an action.

One way to carry the findings of Experiments 2 and 3 into the domain of dialog would be to interrupt people at coarse versus fine break points in storytelling (Zacks & Tversky, 2001). The major challenge in such research would be untangling the fine and coarse break points in storytelling from mid-sentence versus between-sentence interruptions.

Familiarity as expertise. The video clips that participants had to memorize for Experiment 1 were most likely more difficult to remember than clips of movies that participants had already seen. There are two possible reasons for this. First, the clips are presented out of context, so the only story hierarchy that participants can develop as they watch these clips is the local hierarchy. They do not get an overall sense of the themes of the films from which the clips are taken. As a result, participants lack the sort of meaningful context that has been shown to aid story recall in classic experiments in cognitive psychology (Bransford & Johnson, 1972). Second, because participants have never seen the films before, it may be more difficult for them to chunk a great deal of information by using a well-worn hierarchical representation of the events they are to remember. This point is also inspired by a classic study in which chess players who have developed expertise in chess are able to chunk more chess-piece configuration information than novices, thus allowing them to recall chess board positions better (Chase & Simon, 1973; see also Ericsson & Kintsch, 1995).
Essentially, participants in Experiment 1 are watching the movies as novices (no one has ever seen Basket Case) without any context (for example, they do not know that the creature in the basket is Duane’s formerly conjoined twin brother, Belial; such a context could make it easier to remember other details of the film). For these reasons, it is possible that participants have more trouble recovering from interruptions while describing these sorts of films than while describing well-known movies. Future studies could manipulate familiarity and context as independent variables in order to test such a prediction.

Warnings of upcoming interruptions: adapting to interruptions. A human-computer interaction study performed by Zijlstra et al. (1999) on task interruptions provided support for a comprehensive taxonomy of interruption coordination put forth by McFarlane (2002). McFarlane proposed that there are four ways of coordinating interruptions: 1) immediate, 2) negotiated, 3) mediated, and 4) scheduled. Immediate interruptions are those that come as in the no-warning condition of Trafton et al. (2003), and they are supposed to be best for efficiency on the secondary task. Negotiated interruptions are considered to be the most user-friendly, and primary task performance under this type of coordination, in which the subject chooses when to begin performing any number of built-up interruption tasks, is the best of the four. This is because people tend to delay the interruption task until they reach a coarse break point in the primary task. On the other hand, mediated interruption (a computer decides which moments are prime for interruption performance) and scheduled interruption (a rigid schedule decides that the user should be interrupted every n units of time) are associated with generally low user-friendliness as well as low performance.

It seems apparent that most conversational interruptions are of the first type (immediate).
However, perhaps there are signs interrupters give off that could signal an upcoming interruption (Susan Brennan, personal communication). Such signals could include gestures (a raised finger) or a verbal signal that a rebuttal is forthcoming (“But...”). Trafton et al. (2003) and Altmann & Trafton (2004) demonstrated that the extent to which a subject can rehearse cues to goal retrieval during the interruption lag can affect the duration of the RL. A potentially fruitful direction for research in this area would be to experimentally manipulate the presence or absence of warnings that an interruption is coming up. This could be done using gestures or verbal signals as described above for follow-ups to Experiment 1, or by using signals such as flashes of light for follow-ups to the more controlled Experiment 2. Such studies could provide evidence that interruptions are, in fact, a negotiated process between interrupter and interruptee. They would reveal that there are strategies people use in dialog in order to adapt to the potential for interruption.

Effects of interruptions on production strategies. The aim of Experiment 1 was to test whether interruptions to speakers cause disruptions in memory of a bookkeeping structure at a local level. However, it could be interesting to investigate what effect interruptions have on a more global level of production strategies. Earlier, I mentioned that some studies have indicated that the possibility of interruption from a conversational participant may result in more concise (Chapanis, 1976) and higher quality production (Schober et al., 2004). However, these issues have not been directly examined. The protocol developed in Experiment 1 allows this issue to be tested directly.
Specifically, participants could describe scenes to confederates under one of two conditions: Either the participant is told beforehand that the confederate must remain silent until he or she is finished, or the participant is told that the confederate will interrupt in order to clarify, or expand on, the participant’s description. Independent ratings showing that the interruption condition results in better descriptions could demonstrate what Pickering & Garrod (2004) propose: that monolog is an inherently inferior method of communication to dialog. It would also show that the ability to interrupt someone accounts for a significant portion of this difference.

Task interruptions to conversations. During a pilot testing session for Experiment 1, a participant’s cell phone rang during her description. The resulting disruption was as strong as the most effective true interruption from any of the pilot testing (she actually asked, “Where was I?”). It may be interesting to perform an experiment in which participants are interrupted not just by typical verbal interruptions, but by interruptions that more closely resemble task interruptions. Is a “power outage” more disruptive than another lab member “accidentally” intruding on an interview between the experimenter and the participant? Are these sorts of interruptions actually more disruptive than conversational interruptions? This latter question is the familiar domain issue, but this study could be a straightforward way to test it in a dialog.

Effects of length and complexity of interruptions on recovery from syntactic disruptions. An appealing aspect of the design of Experiments 2 and 3 is that their tightly controlled nature allows for more freedom to play with various properties of the interruptions that are tested. One area that is ripe for follow-up work is how the manipulation of length and complexity of interruptions could affect the ability to resume sentence production.
Once the general technique is mastered, it would be relatively easy to change how long the judgment tasks last and how difficult they are. Previous research (Gillie & Broadbent, 1989) on TIs has found little evidence that such manipulations ever work, but in sentence production, the currently “hot” syntactic priming literature could offer a reason to try these manipulations out in the dialog domain.

Priming and a model of dialog. The recent work of Pickering and Garrod (in press, 2004) appears to be a promising enterprise that will bridge the disciplines of language production, language comprehension, dialog, and social psychology. The model they propose is a mechanistic account of dialog, in which they account for the ease with which people converse by assuming that representations between the participants of a dialog become aligned at many different levels, from phonological, to syntactic, to semantic, to the overall situation model. The common goals of the interlocutors contribute to the drive to align representations, but it is another process—priming (Bock, 1986)—that does the most work for Pickering and Garrod’s (2004) model. The potential pitfall in setting syntactic priming as the motivation for this ambitious theory is that researchers are still having a difficult time sorting out what exactly syntactic priming is. Is it a long-lasting effect that reflects the learning of certain abstract concept-to-structure mapping procedures (Bock & Griffin, 2000; Chang et al, 2000), or is it a strong, temporary activation of lexical and syntactic representations (Pickering and Branigan, 1998; Branigan et al, 2000b)?
The two views of priming differ primarily along one empirical dimension: Whereas the implicit learning view requires that priming be evident over many intervening trials (up to 10, Bock & Griffin, 2000) and over large amounts of time (20 minutes), it seems as though a theory in which priming is used to help conversation-specific representations align between two people would require that priming not last so long. The evidence for long durations of priming effects primarily comes from a study by Bock and Griffin (2000) in which priming effects were elicited after intervening trials numbering 1, 2, 3, and 10. However, there was no effect of priming after 4 intervening trials. After failing to find structural priming after even a single intervening trial using written completion priming (Branigan et al, 1999), Branigan et al (2000a) tried again using an oral sentence completion task and found that priming occurs after a single intervening filler trial. Other studies of the durations of priming effects have generally found rapid decay of structural representations (Wheeldon & Smith, 2003).

The methodology developed in Experiments 2 and 3 could help address the robustness of syntactic representations over time and intervening tasks. Although this methodology does not address syntactic priming directly, it addresses the same issue as the syntactic priming literature: syntactic persistence, which, in fact, is what syntactic priming is still sometimes called. Are syntactic representations long-lasting learning phenomena, or are they strong, but transient, activations? By manipulating the duration and complexity of interruptions to utterances, the methodology developed in Experiments 2 and 3 might be able to sort out some of the happy mess that has been cultured by researchers in syntactic priming.

Conclusions

At the outset of this document, a fictional conversation demonstrated the possibility that interruptions can disrupt a speaker’s train of thought.
A series of questions was then raised: First, are interruptions, in fact, disruptive to the ability of speakers to remember what they were talking about? If so, what interruption properties are the most disruptive? And finally, what do these properties suggest about the nature of the language production system—how it keeps its books and plans its utterances?

The first question was answered without resorting to fictional scripts. Experiments 1, 2 and 3 showed that different properties of interruptions elicited different degrees of disruption to speakers’ bookkeeping. By the reasoning that, if interruptions are disruptive, some interruption types must be more costly than others, the results of Experiments 1-3 all reveal that interruptions are, in fact, disruptive.

I then addressed the question of what interruption properties are the most disruptive. Experiment 1 revealed that interruptions that require more attention are more disruptive than interruptions that require less attention. Specifically, an interruption that is wildly off topic and diverts attention is more disruptive than a mild digression. Similarly, an interruption that requires a verbal response is more disruptive than a digression, regardless of whether it helps align the interlocutors on the primary goal of the dialog. Experiments 2 and 3 together showed that, at the sentence level, interruptions coming earlier in sentence production take the most time to recover from, but that resumption errors are more closely tied to the location of interruptions relative to clause boundaries. Also, longer interruptions are more costly than shorter interruptions, but verbal interruptions may be just as disruptive as non-verbal interruptions when interruption length is controlled.

The final question asked was this: What do these disruptive properties tell us about the nature of bookkeeping processes and language production architecture?
First, as Grosz and Sidner (1986) alluded to early on, attention paid to various levels of a goal hierarchy seems to play a role in the ability to handle interruptions. The further an interrupting goal brings the speaker from the primary goal of the dialog, the more difficult it will be to recall that goal later. Second, at the sentence level, bookkeeping is much easier when interruptions occur at natural break points in the representation of sentence information, such as clause boundaries. In that sense, sentence-level bookkeeping occurs at the interface of verbal and non-verbal processes such as attention and memory. Finally, the result that early interruptions take longer to recover from than later interruptions is further evidence that language production is a strategically non-incremental process.

I conclude with a suggestion. It seems that the process of bookkeeping in language production is not so different from bookkeeping in any complex task. In both cases, the process of interruption can be subjected to the same conceptual analyses (as in Figure 1), manipulations (e.g., length, domain, similarity, etc.), and measurements (e.g., resumption lag). Likewise, a conversational interruption is subject to the same limitations in attention and memory that dictate resumption from a task interruption. An interruption at a coarse break point is less costly than an interruption at a fine break point in the specific contexts of each domain. So although it may be interesting to see how conversational interruptions differ from task interruptions, it could be more scientifically productive to unite several disciplines by investigating how interruptions across different domains can be viewed as the same.
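
The resumption lag measurement mentioned above can be made concrete with a short sketch. It assumes a hand-coded transcript in which each word carries an onset time and a judgment of whether it meaningfully resumes the pre-interruption content; the lag is then the time from the offset of the interruption to the onset of the first such word, and only the fillers uh and um are counted as disfluencies. The names `Word`, `resumption_measures`, and the sample transcript are hypothetical illustrations, not the scoring procedure actually used in the experiments.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Only filler-type disfluencies are counted; pauses and restarts are not.
FILLERS = {"uh", "um"}


@dataclass
class Word:
    text: str         # transcribed word
    onset_ms: int     # onset time in the recording, in milliseconds
    meaningful: bool  # hand-coded: does it resume the pre-interruption content?


def resumption_measures(
    interruption_offset_ms: int, words: List[Word]
) -> Tuple[Optional[int], int]:
    """Return (resumption lag in ms, filler count during the lag).

    The lag is None if no meaningful word follows the interruption.
    """
    fillers = 0
    for word in sorted(words, key=lambda w: w.onset_ms):
        if word.onset_ms < interruption_offset_ms:
            continue  # spoken before the interruption ended
        if word.meaningful:
            return word.onset_ms - interruption_offset_ms, fillers
        if word.text.lower().strip(".,?!") in FILLERS:
            fillers += 1
    return None, fillers


# Hypothetical transcript: the interruption ends at 5000 ms.
turn = [
    Word("um", 5300, False),
    Word("where", 5800, False),
    Word("was", 6000, False),
    Word("I", 6150, False),
    Word("uh", 6500, False),
    Word("the", 7100, True),
    Word("tree", 7350, True),
]
lag, fillers = resumption_measures(5000, turn)
print(lag, fillers)
```

Because the same function applies to any timestamped event stream, a task interruption could in principle be scored in the same way by treating the first task-relevant action as the "meaningful" event, which is the sense in which the two domains share a measurement.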
APPENDICES

Appendix A

Pilot Studies 1 and 2, Experiment 1

Pilot Study 1

An experiment with 6 participants was run in order to determine the viability of a procedure in which a confederate interrupts a participant in such a way as to measure the disruptiveness of various CIs. Participants in Pilot Study 1 were instructed to watch a three-minute video clip from a Lord of the Rings film. I instructed participants that the purpose of watching the clip was to memorize the events, dialog, and characters’ names as accurately as possible because another participant (a confederate) was going to listen to the clip description for a later memory test. During the ensuing descriptions, the confederate participant interrupted the experimental participant at pre-established story junctures in the participants’ descriptions. Interruption type was manipulated as a between-participants independent variable. Interruption types were selected to correspond to Grosz and Sidner’s taxonomy of interruptions. Here are descriptions of each type presented in order of predicted disruptiveness:

Flashbacks. The confederate asks for clarification of something the participant had just said. For example, the first time the participant said the word “hobbit”, the confederate would immediately interrupt the participant mid-sentence and ask, “What’s a hobbit? Are those the short characters?”

Digressions. The confederate follows up a piece of the participant’s description with a connected thought, but a thought that diverts from the goal of the dialog. For example, if the participant mentioned that a tree came to life, the confederate might say, “I hate it when stuff happens like that in movies—trees coming alive. Don’t you?”

True interruptions. The confederate interrupts the description by bringing up a completely unrelated topic and addressing a different audience (the experimenter).
For example, after the participant mentions that a monster was threatening the hobbits, the confederate turns to the experimenter and asks, “How long is this experiment supposed to last? I was just wondering because I have a class in 20 minutes.”

Two dependent measures were used to determine the disruptiveness of the types of interruptions. The first measure, resumption lag (RL), was taken to be the number of milliseconds between the offset of the interruption and the onset of the first word uttered by the participant that carried a meaningful connection to what he or she had been saying before being interrupted. The second measure was the number of disfluencies uttered by the participant during the measured RL. Only filler-type disfluencies were counted (uh and um). Pauses and restarts were not counted, but the effect of such behavior would be reflected in larger RL times. As with RL, disfluency measures are assumed to reflect the difficulty a participant is having planning speech.

Resumption lag results and disfluency results are shown in Figures 21 and 22, respectively. With the small number of participants tested (N = 6), no significant differences were found between the conditions in either measure. However, a clear trend emerged such that true interruptions and digressions resulted in longer mean RLs and a greater mean number of disfluencies than flashbacks. Also, task-related conversational interruptions were less disruptive than unrelated interruptions.

[Figure 21: bar chart. x-axis: Interruption Type (“Flashback”, “Digression”, “True interruption”); y-axis: Time in ms.]
Figure 21. Average resumption lag as a function of interruption type, Pilot Study 1.

[Figure 22: bar chart. x-axis: Interruption Type (“Flashback”, “Digression”, “True interruption”); y-axis: Proportion of Resumptions Containing Disfluency.]
Figure 22. Average proportion of disfluent resumptions as a function of interruption type, Pilot Study 1.

Pilot Study 2

A second pilot study modified the Pilot Study 1 method in two ways.
First, I changed the types of interruptions. Rather than following the Grosz and Sidner taxonomy, I categorized interruptions according to their hierarchical relationship to the “story grammar” of participants’ narratives. Types and examples of the interruptions, in order of predicted disruptiveness (from least to most disruptive), are given below:

Low-level interruptions: The confederate interrupts at a specific story point and asks that a lower-level story element be explained further. For example, when the participant mentions that a tree came alive, the confederate interrupts by asking what the tree looked like.

High-level interruptions: As the participant comes to a certain story point, the confederate interrupts by asking that a higher-level story element be clarified. For example, if the participant has just explained that the tree came to life, the confederate interrupts by asking about the overall setting of the story—when and where it took place.

Off the hierarchy: The third type of interruption is like the “true” interruption or digression—the interruption has no relation to the story hierarchy whatsoever. I reused true interruptions from Pilot Study 1 as the items for this type of interruption.

The second major difference from Pilot Study 1 is that I manipulated interruption type as a within-participants variable. This change could cause either smaller or greater differences to emerge: either the effects found in Pilot Study 1 could be more robust because between-subjects variation is reduced, or the effects could diminish because such variation explained the differences observed between conditions.

Preliminary results for resumption lag and disfluencies are presented in Figures 23 and 24 (below), respectively. It appears that interruptions that probe information from “further down” in the narrative hierarchy are more disruptive than interruptions that probe for information at a higher story level than the present one.
In addition, “true” interruptions that require attention to be given to elements outside the narrative hierarchy seem to cause the same amount of disruption as “downward” interruptions. These results were not predicted. It is possible that the downward interruptions presented information to the participant that had never been considered as a part of the narrative. In that case, the interruption actually is more like an “off” interruption than a “downward” one, and requires just as much attention. On the other hand, perhaps it is more difficult to resume a narrative after a downward interruption. This would imply that it is more resource-demanding to pay attention to lower-level discourse planning elements, and subsequently return to a higher-level element, than it is to go from high back to low.

[Figure 23: bar chart. x-axis: Interruption Type (“Downward”, “Upward”, “Off”); y-axis: Time in ms.]
Figure 23. Average resumption lag as a function of interruption type, Pilot Study 2.

[Figure 24: bar chart. x-axis: Interruption Type (“Down”, “Up”, “Off”).]
Figure 24. Average number of disfluencies as a function of interruption type, Pilot Study 2.

Pilot Studies: Lessons Learned

Pilot Studies 1 and 2 highlighted several methodological issues that were addressed in the proposed experiments. First, it was important to run the experiment using a within-participants design. Second, it was important to account for the length of the interruptions by making sure that interruption length is not confounded with predicted disruptiveness.

Participants offered some helpful suggestions for how to make the confederate’s “performance” more credible. One of these suggestions was to initially bring both participant and confederate into a room before the experiment begins and to rig a game of chance in favor of the confederate. The winner of the game of chance would win the right to choose between “describing” or “listening”
This suggestion was adopted in Pilot Study 2, and was used in Experiment 1. Another suggestion offered by a pilot participant was that the confederate should not seem as excited and eager as she appeared to be. Such personality traits, the participant advised, are not expected from a Michigan State University undergraduate.

Finally, the results of Pilot Studies 1 and 2 made it difficult to choose between the two interruption manipulation strategies. The advantage of the “up, down, off the hierarchy” manipulation was that it straightforwardly tested various levels of representation of message level planning and bookkeeping. The Grosz and Sidner (1986) classification was slightly more nebulous in that regard. However, the results do not appear as clean for the Pilot Study 2 manipulation, and the idea that some story details are “low-level” versus “high-level” is difficult to verify, even though this is precisely what a story grammar is designed to do. The fact is, it is difficult to determine a priori what is and is not included in participants’ narrative plans. If results of Pilot Study 2 had been significant such that “low-level” interruptions required longer RLs than “high-level” interruptions, it would be impossible to determine whether these results were obtained because participants have difficulty going from lower-level to higher-level aspects of their story representation; it could just as easily be the case that participants did not intend to produce lower-level information, or worse yet, had simply never encoded the information at the lower level.

The other problem with the design of Pilot Study 2 is that the story structure literature does not make it completely clear that asking about the location in which an event takes place is a “higher” level than asking about the color of a character’s hair. According to Thorndyke (1977), there are four high-level components to a story: setting, theme, plot, and resolution.
Each of these levels has lower levels achieved through re-write rules. For the examples above, both location and hair color are lower levels under the “setting” component. In that sense, one does not “dominate” the other, but is more like a syntactic sister of the other.

Given the results of the pilot studies and the theoretical problems with Pilot Study 2 that were just outlined, Experiment 1 was designed to test a rather basic idea: that interruptions that attempt to align the interrupter’s representation with the interruptee’s representation of the story will result in less disruption than interruptions that do not lead to such alignment. In that sense, Experiment 1 follows the design of Pilot Study 1 more closely than Pilot Study 2, although the issues addressed in Pilot Study 2 were listed as a possible future direction of this work in the General Discussion.

Appendix B

Interruptions Used in Experiment 1

Repetition interruptions for Breakin’
Approach: “Sorry, let me see if I got this right—so a guy and girl got out of a car to watch some people dancing?”
Dance: “Sorry, let me see if I got this right—so the guy and the girl that had gotten out of a car went out on the dance floor and started dancing?”
Other guys: “Sorry, let me see if I got this right—so while the guy and the girl were dancing, some other guys came up and started dancing in front of them?”
Leave/introductions: “Sorry, let me see if I got this right—so after they left the dance floor, they all started talking to each other and introduced Turbo under some sort of “street name”?”
Repetition interruptions for Basket Case
Hot dogs: “Sorry, let me see if I got this right—so a guy walks into the hotel room, starts talking to a basket, and drops a bunch of hot dogs into it?”
Arm: “Sorry, let me see if I got this right—so after the guy leaves, some sort of alien or monster arm or something comes out of the basket and breaks the television dial?”
Girl: “Sorry, let me see if I got this right—so after the guy leaves the apartment and the arm breaks the TV, the guy runs out and meets some girl somewhere?”
Statue of Liberty/grass: “Sorry, let me see if I got this right—so after the guy leaves, some arm breaks a TV dial, and the guy and the girl go to the Statue of Liberty/talk on some lawn?”

Clarification interruptions for Breakin’
Approach: “Wait, sorry—so what time period is this taking place in? Is it like, current times, because people don’t just dance in public much anymore.”
Dance: “Wait, sorry—so what time period is this taking place in? Is it like, current times, because people don’t just dance in public much anymore.”
Other guys: “Wait, sorry—so, what did these guys look like?”
Leave/introductions: “Wait, sorry—so were those other guys, like, dancing in a threatening way or something? So that’s why the main people left?”

Clarification interruptions for Basket Case
Hot dogs: “Wait, sorry—so the hot dogs weren’t like cooked hot dogs, they were just raw hot dogs? They didn’t have ketchup or mustard or anything?”
Arm: “Wait, sorry—so what kind of basket was it? Was it like a flower basket or a, like, sizable basket?”
Girl: “Wait, sorry—so is this all taking place, like, in the city or something?”
Statue of Liberty/grass: “Wait, sorry—so is this all taking place, like, in the city or something?”

Digressions for Breakin’
Approach: “Oh no, is this one of those movies about dancing? Like, where people just dance in public and people watch? Those movies are so annoying.
Anyways, sorry, go on.”
Dance: “Oh, man, I hate movies with dancing scenes, don’t you? I just think it’s so boring to watch dancing, I don’t understand why people watch these movies.”
Other guys: “Oh no, is this one of those movies where people, like, dance in front of each other, like that You Got Served movie? Oh, I hate those movies.”
Leave/introductions: “Oh no, is this one of those movies where people, like, dance in front of each other, like that You Got Served movie? Oh, I hate those movies.”

Digressions for Basket Case
Hot dogs: “Oh, man, I can’t stand hot dogs, I think they’re pretty much the grossest food imaginable. Don’t you think so? Anyways, sorry, go on.”
Arm: “Ugh, is this like a horror movie or something? I hate movies about aliens and monsters. They just seem so pointless. Anyways, sorry, go on.”
Running/Girl: “Wow, this sounds like a really bad movie. It sort of reminds me of this movie I watched this weekend, it was just terrible. Anyways, sorry, go on.”
Statue of Liberty/grass: “Wow, this sounds like a really bad movie. It sort of reminds me of this movie I watched this weekend, it was just terrible. Anyways, sorry, go on.”

“True” interruptions for Breakin’
Approach: (After looking briefly to the side) “I’m sorry, do you guys hear that noise in the hall? It’s like a high-pitched buzzing or something? No, ok, sorry, go on.”
Dance: (After looking briefly to the side, then to experimenter and the subject) “I’m sorry, do you guys hear that noise in the hall? It’s like a high-pitched buzzing or something? No, ok, sorry, go on.”
Other guys: (After looking briefly to the side, then to experimenter and the subject) “I’m sorry, do you guys hear that noise in the hall? It’s like a high-pitched buzzing or something? No, ok, sorry, go on.”
Leave/introductions: (After looking briefly to the side, then to experimenter and the subject) “I’m sorry, do you guys hear that noise in the hall? It’s like a high-pitched buzzing or something?
No, ok, sorry, go on.”

“True” interruptions for Basket Case
Hot dogs: (To experimenter) “I’m sorry, do you know about how much longer this experiment is supposed to last, because I have class in like a half hour.”
Arm: (To experimenter) “I’m sorry, do you know about how much longer this experiment is supposed to last, because I have class in like a half hour.”
Running/Girl: (To experimenter) “I’m sorry, do you know about how much longer this experiment is supposed to last, because I have class in like a half hour.”
Statue of Liberty/grass: (To experimenter) “I’m sorry, do you know about how much longer this experiment is supposed to last, because I have class in like a half hour.”

Appendix C

Experiment 1 Instructions

You are about to perform a memory task. In this task, you will be shown clips from movies. Please watch the clips carefully. After watching the film clips, you will be asked to describe from memory the events and dialog contents from the clips as accurately as possible. In your description of the scene, please include as many details about the events and characters as you can remember, whether this knowledge was acquired from simply watching the clip or from previous experience watching the film. Also, please offer event descriptions in the order in which they occurred.

There will be another participant in this study whose job is to listen to and understand your description. This other participant will be tested for their memory of your description. Therefore, he or she will need to have a good understanding of your description, and may ask some questions about the clips in addition to what you describe. The entire experiment should take about 30-45 minutes.
Appendix D

Experimental Objects, Naming Accuracies, and Times, Experiments 2 and 3

Name     Naming Accuracy   Naming Time
shoe     100               730
foot     100               730
ear      100               716
glass    95                684
key      100               640
chair    100               672
spoon    100               651
bell     100               644
lamp     100               655
truck    100               732
eye      100               683
dog      100               670
fork     100               615
pants    100               668
cake     100               762
cat      100               736
bed      100               672
tree     100               683
coat     100               727
mouse    100               604
house    95                662
frog     100               719
book     100               702
leaf     100               789

Appendix E

Target Sentences, Experimental Items, Experiment 2.

The ear moves above the book and the shoe moves below the book.
The leaf moves above the foot and the glass moves below the foot.
The key moves above the chair and the spoon moves below the chair.
The bell moves above the lamp and the truck moves below the lamp.
The eye moves below the dog and the fork moves above the dog.
The pants move below the cake and the mouse moves above the cake.
The bed moves below the tree and the coat moves above the tree.
The cat moves below the house and the frog moves above the house.
The shoe moves above the leaf and the lamp moves below the leaf.
The glass moves above the key and the chair moves below the key.
The spoon moves above the bell and the foot moves below the bell.
The truck moves above the eye and the book moves below the eye.
The fork moves below the pants and the house moves above the pants.
The mouse moves below the bed and the tree moves above the bed.
The frog moves below the cat and the cake moves above the cat.
The coat moves below the ear and the dog moves above the ear.
The lamp moves above the frog and the ear moves below the frog.
The dog moves above the coat and the leaf moves below the coat.
The cake moves above the mouse and the key moves below the mouse.
The tree moves above the fork and the bell moves below the fork.
The house moves below the truck and the eye moves above the truck.
The book moves below the spoon and the pants move above the spoon.
The foot moves below the glass and the bed moves above the glass.
The chair moves below the shoe and the cat moves above the shoe.

Target Sentences, Filler Items, Experiment 2.

The toe moves above the pen and the axe moves below the pen.
The broom moves above the drum and the skirt moves below the drum.
The bus moves above the bear and the gun moves below the bear.
The gloves move above the doll and the wheel moves below the doll.
The lips move above the leg and the shirt moves below the leg.
The snail moves above the frog and the bow moves below the frog.
The cow moves above the snake and the arm moves below the snake.
The swan moves above the wheel and the fish moves below the wheel.
The harp moves above the boot and the train moves below the boot.
The chain moves above the box and the desk moves below the box.
The pen moves below the wheel and the fish moves above the wheel.
The cow moves below the bow and the box moves above the bow.
The broom moves below the harp and the train moves above the harp.
The frog moves below the chain and the gun moves above the chain.
The toe moves below the bus and the boot moves above the bus.
The leg moves below the skirt and the snake moves above the skirt.
The snail moves above the lips and the axe moves above the lips.
The cap moves below the desk and the swan moves above the desk.
The arm moves below the doll and the pen moves above the doll.
The drum moves below the shirt and the bear moves above the shirt.
The desk moves next to the gun and the cow moves above the gun.
The snake moves next to the swan and the leg moves below the swan.
The doll moves above the toe and the pen moves next to the toe.
The chain moves below the gloves and the bow moves next to the gloves.
The axe and the train move next to the frog.
The arm moves next to the cap and the skirt moves above the cap.
The snail moves next to the bus and the boot moves below the bus.
The bear moves above the shirt and the lips move next to the shirt.
The broom moves below the box and the harp moves next to the box.
The drum moves next to the fish and the wheel moves next to the fish.
The swan moves next to the toe.
The skirt moves next to the chain.
The box moves next to the frog.
The snake moves next to the cow.
The boot moves next to the desk.
The drum moves next to the train.
The wheel moves next to the broom.
The gloves move next to the harp.
The pen moves next to the gun.
The snail moves next to the cap.
The lips move above the book and the snail.
The broom moves below the leaf and the gun.
The shoe moves above the bow and the toe.
The foot moves below the gloves and the frog.
The drum moves above the snake and the ear.
The shirt moves below the glass and the bus.
The fish and the key move above the doll.
The chain and the chair move below the box.
The spoon and the skirt move above the cap.
The bell and the axe move below the wheel.
The desk and the arm move above the lamp.
The bear and the pen move below the truck.
The cow moves next to the eye and the leg moves below the eye.
The boot moves next to the dog and the harp moves above the dog.
The swan moves next to the fork.
The chain moves below the frog and the cap moves next to the frog.
The gloves move above the house and the bus moves next to the house.
The arm moves next to the mouse.
The coat moves next to the skirt.
The tree moves next to the boot.
The harp moves next to the fish.
The cat moves next to the gun.
The cake moves next to the drum.
The pants move next to the pen.
The desk and the bear move next to the fork.
The train and the frog move next to the dog.
The snake and the leg move next to the eye.
The toe and the broom move above the truck.
The axe and the doll move below the lamp.
The shirt and the swan move above the bell.

Appendix F

Grammaticality Judgment Task Items, Experiment 2.

Whenever the main pipe was closed because of the terrible flooding.
It was expected Janis to win the race.
The whole team lost the coach’s money himself.
Bring Carrie the backpack in the corner to Troy.
Jill will use Weight Watchers and thirty pounds were lost.
Visiting the museum is a good experience for most children.
The county commissioner was forced to resign after the scandal.
The mayor announced that the recycling program would be reinstated.
Joan decided that she would never attend a concert again.
To be energy the battery was created by the scientist.
The jelly jar that my grandma filled and sent to me.
Jill gave the man and he enjoyed the book.
The policewoman who held at hostage radioed for backup.
What did you think the fact that Peter was explained.
Spending time outdoors made Andy's allergies worse.
Running over herself was the worst that Joan seems done.
The memories of spending a week in Mexico would last a lifetime.
Vicky finished her homework and was able to watch some television.
Vince used to trade baseball cards with all of his friends.

Arithmetic Judgment Task Items, Experiment 2.

27 + 21 = 49
35 + 21 = 66
37 + 21 = 52
35 + 22 = 47
37 + 22 = 58
35 + 23 = 48
23 + 26 = 49
23 + 36 = 59
71 + 13 = 84
65 + 32 = 97
45 + 14 = 59
12 + 25 = 37
35 + 24 = 60
72 + 11 = 93
62 + 15 = 73
42 + 34 = 67
13 + 24 = 39
21 + 26 = 47
21 + 28 = 49

Appendix G

Interruption Items, Norming Study B and Experiment 3.

Visiting the museum is a good experience for children.
The senator was forced to resign after the scandal.
The book had been stored for many years.
The mayor reinstated the recycling program.
Joan decided that she would never drink again.
Going outdoors made Andy's allergies worse.
The memories of a week in Mexico would last a lifetime.
Vicky finished her homework and watched some television.
Vince always trades baseball cards with his friends.
The lion killed the gazelle and ate it immediately.
Whenever the main pipe was closed because.
It was expected Janis for win the race.
The whole team lost the money himself.
Kyle threw the frisbee for the dog brought it back.
Jill will diet and thirty pounds were lost.
To be energy the battery was by scientists.
The jelly jar that my grandma sent to me.
Jill gave the man and he enjoyed book.
The policewoman who radioed for backup.
What did the fact that Peter was explained.
That was the worst that Joan seems done.

45 + 52 = 97
23 + 45 = 68
35 + 34 = 69
53 + 24 = 77
32 + 46 = 78
25 + 54 = 79
62 + 24 = 86
32 + 65 = 97
32 + 47 = 79
32 + 45 = 77
32 + 45 = 76
42 + 35 = 87
35 + 54 = 79
62 + 34 = 86
75 + 34 = 99
73 + 24 = 98
42 + 54 = 98
42 + 34 = 67
37 + 32 = 79
24 + 73 = 98
23 + 76 = 89