VIDEO MEDIATED LISTENING PASSAGES: THEIR EFFECTS ON INTEGRATED WRITING TASK PERFORMANCE AND NOTE-TAKING PRACTICES By Justin Cubilo A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF ARTS Teaching English to Speakers of Other Languages 2011

ABSTRACT

VIDEO MEDIATED LISTENING PASSAGES: THEIR EFFECTS ON INTEGRATED WRITING TASK PERFORMANCE AND NOTE-TAKING PRACTICES By Justin Cubilo

The surge in international students studying abroad in English-speaking countries in recent years has made it increasingly important to develop adequate assessments of their language abilities. Prior research has led to a debate regarding listening assessment tasks and whether visual support should be provided to test takers in these tasks, and this previous research has predominantly investigated the effects of visual support on multiple-choice listening comprehension tasks. The present study seeks to expand on this research by investigating the effects of video listening passages on integrated writing task performance and note-taking strategies. Forty international students at Michigan State University participated in the current study. Each participant wrote essays for two integrated writing tasks: one task was presented with video listening material and the other with audio-only listening material. Participants also completed an exit survey concerning their perceptions of the video and audio tasks. Results indicated that there was no significant difference in performance on the integrated writing task between the audio and video conditions; however, there was a significant difference in word count in participants' notes. Qualitative data suggested that there were mixed perceptions of the usefulness of video among test takers. However, the majority of participants preferred the video condition and believed that video-mediated listening passages aided in comprehension of information presented in the listening passages.

ACKNOWLEDGEMENTS

I would like to take this opportunity to thank all the people who have helped me in the process of writing this thesis. First, I would like to thank Dr. Paula Winke, my advisor, for the support and guidance she has offered throughout this project and for the time she has spent recording the listening material and video lectures and teaching me some of the intricacies of SPSS. Dr. Charlene Polio, my second reader, has also been helpful throughout this process. I would also like to give thanks to the MA TESOL program for their generous financial support. In addition, I would like to thank Mike Kramizeh, the head of the language laboratory at Michigan State University, for the use of the language laboratory in my data collection and for taking time out of his busy day to teach me how to use video editing software to create my lecture videos. I have also received help and support from my peers in the academic community. Special thanks go to Erin Sutton and Erika Lessien, both of whom have given up much of their time for this project and have provided me with ample feedback. Their help has been invaluable. To my fellow thesis writers Ann Desiderio and Hyesun Lee, who were my mutual encouragers, thank you. Thanks also to my friends Xiaopeng, Xiang, and Crystal, who offered their help in the earliest stages of my project. Finally, thanks to my Mom and Dad, who have always supported me and encouraged me throughout my years of education.
 iii TABLE OF CONTENTS LIST OF TABLES ....................................................................................................................... vi CHAPTER 1 INTRODUCTION ........................................................................................................................ 1 CHAPTER 2 LITERATURE REVIEW ............................................................................................................. 2 Theories and Models of Listening Comprehension ................................................................ 2 The Role and Effects of Visuals in Listening Comprehension ............................................... 4 Difficulties in the Definition of the Listening Construct ........................................................ 7 Gaps in the Literature and Research Questions ...................................................................... 9 CHAPTER 3 METHODS ................................................................................................................................. 12 Participants ............................................................................................................................ 12 Materials ............................................................................................................................... 13 Procedure .............................................................................................................................. 15 Study Approval ............................................................................................................... 15 Setting ............................................................................................................................. 15 Computer Set-up ............................................................................................................. 16 Data Collection .............................................................................................................. 16 Essay Rating.................................................................................................................... 18 Analysis................................................................................................................................. 19 RQ 1 ................................................................................................................................ 20 RQ 2 ................................................................................................................................ 20 RQ 3 & 4 ......................................................................................................................... 20 CHAPTER 4 RESULTS ................................................................................................................................... 21 Scale Reliability .................................................................................................................... 21 RQ 1 ...................................................................................................................................... 22 RQ 2 ...................................................................................................................................... 24 RQ 3 ...................................................................................................................................... 27 RQ 4 ...................................................................................................................................... 
29 CHAPTER 5 DISCUSSION AND CONCLUSION ........................................................................................ 32 Listening Comprehension and Construct Validity................................................................ 32 The Effects of Video and Audio Listening Tasks on Note Taking....................................... 34 Limitations ............................................................................................................................ 37 Directions for Future Research ............................................................................................. 38  iv APPENDIX A YOUTUBE LINKS TO LISTENING MATERIALS ................................................................ 42 APPENDIX B BACKGROUND QUESTIONNAIRE ....................................................................................... 43 APPENDIX C EXIT QUESTIONNAIRE .......................................................................................................... 45 APPENDIX D DIRECTIONS AND NOTETAKING SHEET........................................................................... 46 APPENDIX E ANALYTIC RUBRIC ................................................................................................................ 47 APPENDIX F WRITING PROMPT .................................................................................................................. 50 APPENDIX G INTERVIEW 1 TRANSCRIPT .................................................................................................. 51 APPENDIX H INTERVIEW 2 TRANSCRIPT .................................................................................................. 55 REFERENCES ........................................................................................................................... 58  v LIST OF TABLES TABLE 1 Participants' backgrounds .......................................................................................... 13 TABLE 2 Listening material order ............................................................................................. 17 TABLE 3 Inter-rater reliability ................................................................................................... 21 TABLE 4 Descriptive statistics for type of visual input ............................................................. 22 TABLE 5 Paired-samples t-test comparison of scores between video and audio conditions ................................................................................................................... 23 TABLE 6 Descriptive statistics of note word counts.................................................................. 24 TABLE 7 Paired-samples t-test of note word count compared to input method ........................ 24 TABLE 8 Participants' comments on note-taking and memory ................................................. 25 TABLE 9 Participants' focus and use of video content .............................................................. 28 TABLE 10 Participants' preference and reasons ........................................................................ 30 TABLE 11 Participants' overall impression of video input .......................................................
31   vi

CHAPTER 1: INTRODUCTION

With recent increases in the numbers of international students studying abroad in English-speaking countries, it has become increasingly important to develop adequate assessments of their English abilities that will exhibit their potential to perform in an English-speaking classroom. As a result, tests such as the Test of English as a Foreign Language (TOEFL—http://www.ets.org) and the International English Language Testing System (IELTS-http://www.ielts.org) have had a great amount of importance attached to them as indicators of student English ability. Due to this importance in the United States, the TOEFL has been continuously adapted. Out of this adaptation arose the TOEFL iBT, which aims to provide a more integrated method for testing English ability in a variety of areas related to the academic context. One such ability is the skill of incorporating information from both readings and lectures in students' writing. In fact, TOEFL writers have placed enough importance on this ability so as to necessitate the inclusion of an integrated writing task on the TOEFL exam. In the current study I am interested in the integrated writing task as it appears on the TOEFL iBT and in the role that visual support in the form of video lectures plays not only in an individual's ability to write an essay that adequately incorporates the listening and reading material, but also in the individual's ability to retain information from the lecture and in note-taking strategies. I focus on the role of two types of input used as a basis for testing L2 learners' academic listening skills: (a) audio-only (AO) input with just a photograph present, and (b) audio-visual (AV) input in which the test taker watches an actual video of a lecture. While several studies have examined the effect of visuals on listening comprehension tasks (E. Wagner, 2007, 2010; Suvorov, 2008), few, if any, have actually examined the effects of visuals on writing task performance or note-taking strategies.

CHAPTER 2: LITERATURE REVIEW

The purpose of this chapter is to discuss theories and studies that have previously been developed or conducted regarding L2 listening comprehension and listening skill assessment. This chapter is broken down into four sections. The first section reviews theories and models of listening comprehension. The second section consists of a discussion of the role of visuals in listening comprehension and the effects they have on the assessment of listening comprehension. In the third section I discuss difficulties with the definition of the listening construct. In the fourth section I conclude by reviewing some of the gaps in the literature and presenting the research questions investigated in the current study.

Theories and Models of Listening Comprehension

Listening is an essential component for communication in any language and is a necessary part of acquiring a new, second language (Suvorov, 2009). With listening comprehension being so important, many researchers have attempted to define it. However, since the first definitions of listening comprehension arose, there have been many conflicting ideas about what should be included in the definition. The definition of listening comprehension has gone through different stages of development over time.
Some of the earlier definitions, such as that put forward by Lado (1961), treated the transference of sound and the information it carried as the main component of listening comprehension. As time passed, definitions started to move away from being strictly concerned with linguistic information and started to also be concerned with the nonverbal cues that one is able to see when listening to a speaker. Rubin (1995) defined the skill when he wrote that listening comprehension should be considered as "an active process in which listeners select and interpret information which comes from auditory and visual cues in order to define what is going on and what the speakers are trying to express" (p. 7). Chung (1994) further developed this definition by stating that messages that listeners hear have three types of information associated with them: oral (verbally transmitted information from speaker to listener), paralinguistic (body language, gestures, posture, facial expressions, voice pitch, and rate of speech), and the visual context (items present in the environment of the conversation). Because listening can be defined in such a way, it is possible to further define it as a communication activity (Suvorov, 2008). In the process of listening, the listener takes all the aspects of the situation, both verbal and non-verbal, into account and acquires some sort of meaning. Many researchers have found that this process can be influenced by a variety of factors. Ockey (2007) cited a number of studies in which it was found that such factors as prosody, rate of speech, background knowledge, and rhetorical cues have an impact on an individual's ability to listen. The use of non-verbal cues has also been found to have an effect on listening comprehension. Sueyoshi and Hardison (2005) found that both lip movements and gestures are able to aid in the comprehension of a listening task. Ockey (2007) and Rubin (1995) found similar results indicating that body movements, gestures, and facial expressions have an effect on listening comprehension. Such findings show the importance of taking both the visual and audio components of communication into account when trying to develop an assessment task. Based on previous research on and definitions of listening comprehension, researchers have developed several models of listening comprehension. Gruba (1999), referencing Kintsch (1998), wrote that a connectionist cognitive processing model is the most defensible. In this model, the mind processes multiple incoming stimuli at the same time and revises its understanding of the stimuli continuously as more information about them becomes available. Bejar, Douglas, Jamieson, Nissan, and Turner (2000) took this connectionist approach further and modeled listening comprehension by splitting it into two stages: the listening stage and the response stage. In this model, three types of knowledge need to be accessed during the listening stage in real time: situational knowledge (SK), linguistic knowledge (LK), and background knowledge (BK). When the incoming acoustic signal and visual cues are received, each of these types of knowledge is accessed as the signal is processed. This stage culminates in a set of propositions (PR) being produced from the incoming auditory and visual signals.
Once the propositions are created, the individual switches into the response stage, in which they use the propositions as a means for formulating a response, which can manifest itself as a selection among a set of choices, a spoken response, or, as is the case in the present study, a written response. Bejar et al.'s model and the model put forth by Gruba exhibit the complexity behind an individual's listening comprehension and serve to show that each person's ability to comprehend what they hear differs based on the knowledge they have at their disposal. The present study looks to these models as a method of explaining the way in which learners use the information provided in order to formulate their written responses.

The Role and Effects of Visuals in Listening Comprehension

With the ever-increasing role of technology in the area of assessment, it has become more and more important to investigate the effects of such technology on test performance. Many studies looking at the effects of different visuals on individuals' performances on listening comprehension tasks have been conducted. Visuals can range from a single still picture, which is currently used on the listening and speaking portions of the TOEFL iBT, to multiple still pictures as used by Ockey (2007), to video images used by several researchers in their studies (E. Wagner, 2007, 2010; Ockey, 2007; Suvorov, 2008). The use of such differing visuals has produced mixed results as well. Several researchers investigating the effects of differing visual input have found that the inclusion of visual input in the form of video may not be as beneficial for comprehension as some researchers have tended to suggest. For example, Gruba (1993) found no statistically significant difference between scores on video and audio tests. Brett (1997) also found conflicting results in his study, which showed that, while a video group scored higher on certain task types, an audio group scored higher on other tasks. Coniam (2001) demonstrated that there was no significant difference between an audio and a video group and that the audio group actually scored slightly higher on a test of listening comprehension. The results of Coniam's study also illustrated that participants in the video test-taking group felt that they had gained nothing from taking the test in this format and that they would have done better had they not been distracted by the video. Similarly, Suvorov (2008, 2009) found that, while scores between an audio-only and photo-mediated listening task were not significantly different, performance on the video-mediated task was significantly lower. The results of these studies seem to add credence to previous claims that nonverbal cues are not necessary for, and may be detrimental to, testing listening. While there have been a number of results supporting the exclusion of video from listening tests, results supporting the inclusion of video have been just as numerous. An earlier study conducted by Baltova (1994) with French foreign language learners found that not only did videos help learners in their development of listening comprehension, but the videos also contributed to the learners' confidence in understanding when their comprehension of the message was low. It was likewise found that visuals in the form of pictures were also somewhat helpful in aiding comprehension, although Chung (1994) and Ockey (2007) found that multiple pictures caused distraction among test takers.
E. Wagner (2010) continued in this line of research by investigating the effects of video-enhanced tasks on listening comprehension scores. In his study, he found that scores on the video-enhanced tasks were significantly higher, which he attributed to test takers' use of non-verbal cues that were exhibited by the speaker in the video. Sueyoshi and Hardison (2005) further examined the effects of visuals on listening comprehension by looking at the way in which lip movement and gestures affected the comprehension process. They found that the inclusion of the visual channel led to increased test scores. They found that test takers at higher proficiency levels did not attend to gestures as much as they attended to lip and facial movement. Sueyoshi and Hardison concluded not only that visuals are able to aid in listening comprehension, but also that the use of visual cues differs based on each learner's level of language proficiency. Wagner (2006, 2008) reported similar findings in his study investigating the effects of audio-only texts compared to video texts. He found that individuals used videos in different ways, once again suggesting that the ability to use non-verbal cues differs based on several factors such as proficiency. Finally, it has been found that not only do visuals have an effect on the listening comprehension of non-native speakers of a language, but they also have an effect on the listening comprehension of native speakers. Morrel-Samuels and Krauss (1992), in a study looking at the interplay of gesture and speech in interaction, found that gestures can actually serve to facilitate speech production and can even be an aid to listeners who are listening to a speaker speaking their native language. Thus it would appear that native speakers of a language rely on the presence of gestures to aid comprehension. Likewise, Hadar, Wenkert-Olenik, Krauss, and Soroket (1998) found that gestures are able to help native speakers negotiate the meaning that the speaker is attempting to convey in instances where there may be misunderstanding and that they actually aid native speakers of a language by helping them recall lexical items more quickly. Despite the fact that the L2 literature shows that the usefulness of non-verbal cues can be rather contradictory in assessing L2 listening comprehension, it appears that there is at least some effect of visuals on listening comprehension for native and non-native speakers alike.

Difficulties in the Definition of the Listening Construct

With the research investigating the effects of visuals on listening comprehension has come a discussion of what exactly the listening construct is and what exactly should be tested in listening comprehension. There has recently been much discussion of what one's listening ability is actually comprised of and whether tests of listening comprehension should include video in their listening tasks. While it has been suggested that video should be used in tasks of listening comprehension which are based on audio that originated with video (Buck, 2001), test developers have frequently rejected the use of video in their listening tests (as reported in E. Wagner, 2008). Coniam (2001) mentioned that the test developers for the Hong Kong English Language Benchmark Test purposely rejected the use of video in their tests, even when some audio was taken from video-based materials.
While part of the reason for splicing video away from the audio may be the result of a lack of adequate technology in certain areas where listening comprehension tests are administered, it raises a serious question of the construct validity of these tests (i.e., are these tests measuring the full range of listening ability?). Researchers have expressed opinions on both sides of this issue. Buck (2001) expressed concern that research has shown that people differ in their ability to use visual cues. He suggested that the use of visuals may create an unfair advantage for those particularly adept at using nonverbal cues and that it is therefore better to focus on comprehension of strictly auditory information. In addition, Gruba (1993) was concerned with whether the use of visual information would affect the overall construct validity of listening tasks and seemed to agree with Buck that the verbal aspects of communication are more important than the non-verbal aspects of communication. On the other hand, more recently a number of researchers have taken the opposite viewpoint and have stated that video that naturally accompanies audio helps with the construct validity of a listening test. Construct validity, as defined by Bachman and Palmer (1996), is "the extent to which one can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure" (p. 21). In this view, without the presence of video, the construct validity of a listening test would be endangered. For instance, von Raffler-Engel (1980, p. 235) suggested that, by taking away the natural visual cues of communication, "an unnatural condition which strains the auditory receptors to capacity" would be present. In other words, by removing the visual channel, the test taker is being put under unnecessary strain and the test does not reflect the natural environment in which test takers would generally use their listening ability; therefore, the construct validity of this sort of test is lacking. This would seem to be especially true given the findings of Morrel-Samuels and Krauss (1992) and Hadar et al. (1998), reviewed above. They showed that native speakers use the visual channel input in order to facilitate L1 communication. If indeed this is the case, then by removing the visual channel, language testers are undermining the validity of the task, especially given that the test taker is no longer required to utilize all natural aspects of communication. The issue of construct validity is of great importance in language testing, and many researchers have devoted a great deal of time to discussing the concept itself. Messick (1989, 1996) discussed this idea when he suggested that in order for a test task to be considered construct relevant, the task must reflect the context of listening. Bachman and Palmer (1996) continued with this suggestion by proposing the idea of the target language use domain. Target language use is simply defined as any set of language tasks that the test taker may encounter outside of the test. If the task is to listen to a lecture where an individual would normally have access to visual cues through the lecturer's gestures, then the listening task should include the visual channel. Likewise, M. Wagner (2006) argued that if the visual channel is not included in such contexts, task validity is threatened due to construct underrepresentation.
In addition to construct validity, Bachman (1990) and Bachman and Palmer (1996) also provided an extensive discussion of authenticity in language assessment tasks. Authenticity, the extent to which the test task corresponds to a target language use (TLU) task, is described as having an important role that works together with that of construct validity in determining how the construct definition and the domain of generalization will affect the way in which a test score will be interpreted. This model and the discussion of it in the literature make it clear that test developers need to consider what exactly should be tested in a listening task. Using video will lead to the assessment of a test taker's ability to use visual cues for understanding auditory information much as they would do in a real-life situation outside of a test task. However, as some (Gruba, 1997; Ockey, 2007) have suggested, in order to include video in test tasks, the definition of the listening construct must first be expanded to incorporate visual cues. Given the mixed results of previous studies, whether to do so remains a matter of considerable debate.

Gaps in the Literature and Research Questions

Researchers have conducted many studies in the area of listening comprehension. These studies have investigated the effects of visuals on student performance on listening comprehension tests. However, these studies have all looked at the impact visuals have on the test taker's ability to answer multiple-choice comprehension questions. They have not gone further to examine the effects video may have in other skill areas. For this reason, the current study examines the effects that video has on essays written as part of an integrated writing task and on the test takers' note-taking strategies. Therefore, the following research questions guided the present study:

1. Do test takers score higher on their written responses when they are given an audio-visual lecture rather than an audio-only lecture? Although there are some results to the contrary (cf. Sueyoshi and Hardison, 2005), I hypothesize that test takers will score higher on their written responses when writing an essay following an audio-visual lecture than following the audio-only lecture accompanied by a still picture. While writing is different from the multiple-choice assessments given in previous research, I believe that results of writing tasks will align with previous research conducted by E. Wagner (2010) and Baltova (1994) by showing that participants are able to more easily comprehend audio-visual listening passages and, as a result, will be able to better use information from the video listening passage in their essays.

2. Do note-taking strategies change based on the way the listening material is presented? Based on a study done by English (1982), in which she stated that participants were unable to attend adequately to a video stimulus due to a note-taking task which was assigned, it is hypothesized that test takers will take fewer notes when presented with a video listening task due to the greater attention they will need to place on the gestures used by the lecturer. It is also believed that the gestures will provide a better means for promoting recall and, therefore, the test takers will not need to focus as much attention on taking notes.

3. Do test takers notice non-verbal information when watching and listening to a video lecture? How do they use the non-verbal information?
Based on previous research (Sueyoshi & Hardison, 2005; Morrel-Samuels & Krauss, 1992; E. Wagner, 2008), it is hypothesized that test takers will take notice of the non-verbal information. In these studies, test takers explicitly stated how they used different non-verbal cues to aid certain aspects of their listening comprehension.

4. Do test takers find the video cues helpful or distracting and which method of delivering the listening material do they prefer? Based on research performed by E. Wagner (2007) and Ockey (2007), it is hypothesized that for an integrated writing task, test takers will make extensive use of non-verbal information as a means to enhance their ability to determine the meaning of previously unknown words and for faster recall. Therefore, it is hypothesized that test takers will view the video lecture as being helpful. As Ockey found, there may be a range of opinions concerning how helpful the video actually is. It is also hypothesized that the video-based prompt will be preferred to the still-picture-based prompt, given the results of surveys taken by test takers in Sueyoshi and Hardison's (2005) study.

CHAPTER 3: METHODS

In this chapter I discuss the participants, the materials used, and the data collection procedure. I then conclude with a discussion of the way in which the data were analyzed.

Participants

The participants for the study were non-native speakers of English enrolled in one of several programs at Michigan State University. Participants at the time of data collection were either enrolled in the two highest levels of the Intensive English Program (IEP) or attending MSU part-time while taking English for Academic Purposes (EAP) classes; other participants were full-time undergraduate students or full-time graduate students. IEP classes consisted of both provisionally admitted students who had not earned sufficient TOEFL scores in order to be admitted to the university and individuals enrolling for language development. Those with insufficient TOEFL scores are enrolled in IEP classes in order to offset any deficiencies they may have before attending classes in the university. EAP courses are courses that students are required to take after finishing their IEP requirements. They are a series of four courses that give students additional training in using English in an academic context while at the same time allowing them to attend university courses part-time. To recruit participants, I went to the nine IEP level four classrooms and an IEP level three content lecture (which every IEP level three student is expected to attend twice a week), and I contacted EAP, undergraduate, and graduate students at MSU via email. Flyers were also posted on campus to inform other international students of the research opportunity. The study was presented as an opportunity to write TOEFL test preparation essays. Students wishing to participate gave me their contact information or emailed me at a later time. I subsequently assigned them to a specific testing date. A total of 40 students participated in the study. Descriptions of the participants are in Table 1. As indicated, most of the participants were Chinese and most participants were between the ages of 18 and 21. Table 1.
Participants' Backgrounds

                                          Male    Female    Total
Number of Participants                    17      23        40
Average Age                               22      20        21
Average Years of English Instruction      11      10        10
Level of Study
  IEP                                     9       19        28
  EAP                                     1       1         2
  Undergraduate                           6       0         6
  Graduate                                1       3         4
Native Language
  Chinese                                 14      16        30
  Korean                                  1       4         5
  Japanese                                0       2         2
  Arabic                                  1       1         2
  Vietnamese                              1       0         1

Materials

I used two practice integrated TOEFL iBT writing tasks specifically designed by the Educational Testing Service (ETS) as prompts for this study. I took one of these tasks from the ETS website (http://www.ets.org/toefl/ibt/prepare/sample_questions) and the other from the TOEFL iBT test preparation guide (ETS, 2009). I used these materials because ETS has extensively investigated the reliability and validity of the materials through pilot testing and found that the prompts fell within ETS's own fairness guidelines (ETS, 2010). In this study I had the participants follow the instructions and format of the TOEFL iBT integrated writing task. In addition, for one of the tasks, a video of the lecturer was provided, which differs from the still picture that is the only source of visual information present on the actual TOEFL iBT writing exam. All listening materials I used were between two and three minutes in length. The two sets were completely unrelated in the topics they covered. One set discussed the concept of altruism while the other discussed the advantages and disadvantages of computerized voting. A single professor was given the scripts for both listening passages. I recorded the audio and video using a digital video recorder while the professor delivered the lectures for both listening passages. During the recordings, a group of students sat in the classroom because previous research (Alibali et al., 2001) has found that speakers are more likely to produce natural gestures when there is an audience present. Once the videos were recorded, two versions of each listening prompt were created: one with video and audio together and one with the audio and a snapshot from the video as a still picture. I used Windows Live Movie Maker to edit the videos for clarity and to extract the sound for the audio/still-picture listening prompts. I uploaded the materials to a private YouTube account for later use (Appendix A). I typed and printed the reading materials for use during the task. Materials also included a background questionnaire (Appendix B) that I used to ask participants about their age, length of English study, number of times they had taken the TOEFL, and whether they planned to take the TOEFL again. I also developed an exit questionnaire (Appendix C), which consisted of four questions concerning opinions and thoughts about the two forms of listening passages and a question about their note-taking behavior. Participants used a personal note-taking sheet for each integrated writing task (Appendix D), which also contained the directions for the task. The directions on the sheet were the same as those offered on the TOEFL iBT writing task. Finally, I had two raters use an analytic rubric (Appendix E) originally developed by Polio and Hughes (2002) to evaluate ESL writing development. I chose an analytic rubric for a number of reasons. While it may be more time-consuming than a holistic rubric to use, the analytic rubric is able to reflect the different aspects of a test taker's writing ability (Weir, 2005) and, therefore, it gives a much fuller picture of how the different variables affect writing ability.
In addition, since some students wanted to receive scores back in order to know what they should improve on, it was important to provide them with a measure that would actually display specific areas where they needed to improve (Hamp-Lyons, 1991). Furthermore, I chose this style of rubric since I believed it would be easier to train the raters using an analytic scale, as suggested by McNamara (1996). In addition, I made this decision based on theories on rubric use outlined by Weigle (2002) and Bachman and Palmer (1990): they noted that the use of an analytic scale is more reliable than a holistic scale and lends itself to a greater range of difference in scores.

Procedure

Study Approval

Prior to starting data collection, the university IRB approved the study. Policy dictated that every participant must sign a consent form discussing the purpose of the study and the risks, benefits, means of ensuring privacy, and the procedures associated with it. Therefore, at the start of each testing session, I passed out and went through the consent form with participants.

Setting

Testing took place in a university computer lab at Michigan State University. The lab was equipped with 36 iMac computers running Mac OS X Snow Leopard, and all were connected to the Internet. Computers were arranged in six rows, each with six computers in it. Seats were arranged so that participants would be facing the front of the room when seated at their computers. Each computer had a large monitor and headphones so that individuals could hear the listening passage and see the video on their computers clearly. There was also a teacher's station connected to a projector which was used to show participants how much time was remaining in the test.

Computer Set-up

One computer was assigned to each student who had signed up for that day's test. Each computer had two open Microsoft Word documents on it that participants used for the two writing tasks. While spell check and grammar check were available, participants were instructed not to use them. The documents were blank except for the essay prompt typed at the top of the page (Appendix F). In addition to the Word documents, the Firefox Internet browser was also open with two separate YouTube pages loaded. One page contained the video listening passage while the other page contained the audio listening passage with a still picture, and these were ordered according to the condition the participant was in. The first listening passage that the participants listened to was already on the screen in full-screen mode when the students were seated at their assigned computers.

Data Collection

Participants came to the computer lab on the day they had signed up for and were placed randomly into one of four experimental groups based on the order in which they signed up to participate in the exams. The experimental groups were arranged based on four conditions:

1. Altruism Listening Material with Video.
2. Altruism Listening Material with Audio/Still Picture.
3. Computerized Voting Listening Material with Video.
4. Computerized Voting Listening Material with Audio/Still Picture.

In order to ensure that the listening content or order of presentation did not have an effect on the participants' performance, a repeated-measures design was used. Table 2 shows the order in which the listening material was played for all groups.
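For the interested reader, the counterbalancing shown in Table 2 can also be expressed compactly in code. The short Python sketch below is only an illustration of the design, with hypothetical participant IDs; the cyclical assignment function is an assumption about how assignment by sign-up order might be implemented, not a record of the actual procedure used in the study.

# Sketch of the four counterbalanced conditions (see Table 2); hypothetical IDs only.
# Condition 1 = Altruism/Video            Condition 2 = Altruism/Audio+Still Picture
# Condition 3 = Voting/Video              Condition 4 = Voting/Audio+Still Picture
GROUP_ORDERS = {
    1: (1, 4),   # Group 1: Altruism video first, then Voting audio/still picture
    2: (2, 3),   # Group 2: Altruism audio/still picture first, then Voting video
    3: (3, 2),   # Group 3: Voting video first, then Altruism audio/still picture
    4: (4, 1),   # Group 4: Voting audio/still picture first, then Altruism video
}

def assign_groups(participant_ids):
    # Cycle sign-up order through the four groups (an assumed implementation).
    return {pid: (i % 4) + 1 for i, pid in enumerate(participant_ids)}

groups = assign_groups(range(1, 41))       # 40 hypothetical participants
print(groups[1], GROUP_ORDERS[groups[1]])  # -> 1 (1, 4)

Each group thus receives both topics and both delivery formats, but never the same topic twice, which is what allows topic and presentation order to be separated from the video/audio comparison.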
Table 2. Listening Material Order

Group    Listening One    Listening Two
1        Condition 1      Condition 4
2        Condition 2      Condition 3
3        Condition 3      Condition 2
4        Condition 4      Condition 1

Upon arrival, participants were seated at their assigned computers. Once all participants were present, I explained the consent form and instructed students to sign it if they still wanted to participate. Once I collected the consent forms, I instructed the participants to complete the background questionnaire. They did so before the start of the actual writing task. After collecting all background questionnaires, the researcher passed out the first directions and note sheet to the participants. The researcher then read the directions aloud to the participants and stressed that they should use this sheet to take notes on the listening material. Participants then had the opportunity to ask any questions they had, the researcher answered them, and then the actual task began. The participants then began their first integrated writing task. The researcher handed out the reading material face down to each participant. When each participant had a copy of the reading, they were instructed to flip over the sheet of paper and to take three minutes to do the reading. At the end of the three minutes, the test takers flipped the reading over again. At this time, they put on the headphones that belonged to their individual computer station and, at the same time, everyone started the first listening passage that was on the screen and took notes on the listening material. At the end of the listening passage, participants closed the Internet browser and opened the Microsoft Word document prepared for them on the computer. They read the essay prompt and had 20 minutes to write an essay comparing the reading and listening material. While writing, they were able to look at the reading material again. After time was up, participants saved the document to the desktop of the computer so that the researcher could later collect all essays. The second writing task was conducted in the same manner as the first. Following the completion of the second writing task, participants received the exit questionnaire and the researcher explained the questions on it. Participants then filled out the questionnaire and the researcher walked around answering any questions that may have arisen due to misunderstanding of what the questions were asking. When the participants finished filling out the exit questionnaires, the researcher asked for volunteers who would be willing to participate in a brief recorded interview. The interview was completely voluntary and participants knew that it was not necessary for them to participate in the interview. Of the 40 participants, four agreed to the interview, and they answered questions concerning their opinions of the task and what their preferences were. Due to the participants' time constraints, the researcher conducted the interviews with two participants present at the same time.

Essay Rating

Two graduate students studying Teaching English to Speakers of Other Languages (TESOL) volunteered to help me rate the essays. One rater had no experience in rating essays and one had some experience with rating practice ACT essays. The raters used the analytic scale previously discussed in order to assign scores to the different essays. I trained the raters according to suggestions put forth by Weigle (2002).
Raters received three sets of benchmark essays from the integrated writing tasks of the TOEFL iBT made available by the Educational Testing Service, and each rater examined one set of benchmark essays at a time. The researcher gave the raters the first set with the essays in order from lowest score to highest. Each essay had an appropriate score according to the scoring scheme of the analytic rubric written on it. The researcher went through each essay with the raters and described the specific points of the essay that corresponded with the rubric's criteria. Once raters felt comfortable with the rubric, they received a second set of essays representing each score range of the analytic rubric. Raters individually assigned scores to each essay and compared with each other. Finally, raters examined a third set of essays which contained essays at each score level, with some score levels having multiple essays and with some more problematic essays. Raters once again scored essays individually and then compared with each other. At this point, the researcher decided that the scores given to the practice essays were sufficiently close to each other, and the raters began individually rating the 80 essays collected from the participants.

Analysis

In this study I used mixed methodology in order to answer the four research questions I presented earlier. The combination of quantitative and qualitative methodology has become increasingly common and has been viewed as "complementary rather than fundamentally incompatible" (Duff, 2002, p. 14). Quantitative data consisted of the raw scores that were collected from scoring the essays with the analytic rubric and the word counts from the note pages that were collected from the participants. Qualitative data consisted of transcriptions from the four interviews that were conducted as well as answers to the questions on the exit questionnaire. IBM SPSS 19 software was used to perform statistical analyses of the quantitative data. Qualitative data were investigated by looking for themes that arose from interview transcripts (Appendices G and H) and questionnaires.

RQ 1: I addressed the first research question by using a paired samples t-test in order to compare the average of the raw scores assigned to the essays across the two types of tasks—the video task as compared to the audio/still-picture task.

RQ 2: I addressed the second research question through a combination of quantitative and qualitative analysis. I compared the word counts calculated for each student's note pages between the video and audio tasks. In addition, I examined answers given in the exit questionnaire about note-taking in relation to the results of the quantitative analysis.

RQ 3 and 4: I addressed the final two research questions through the analysis of the qualitative data obtained from the exit questionnaire and the interview transcriptions. By looking at responses, I found common themes among the participants' replies and coded them.

CHAPTER 4: RESULTS

The purpose of chapter four is to present the results of the data analysis as they relate to each of the four research questions.

Scale Reliability

In order to ensure that the scores assigned to the essays by the raters were reliable, I calculated inter-rater reliability and percent agreement values by examining the relationship between scores given by the two raters for each topic overall as well as for each of the subscales found on the rubric.
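Although the study's statistics were computed in SPSS, the reliability indices reported below can be illustrated with a minimal Python sketch. The rater-score vectors here are hypothetical, and the within-one-point agreement criterion is an assumption made for illustration only; the thesis does not state the exact agreement criterion used.

import numpy as np
from scipy.stats import pearsonr

def interrater_reliability(rater1, rater2, tolerance=1):
    # Pearson product-moment correlation between the two raters' scores,
    # plus percent agreement (here: scores within `tolerance` points of each other).
    r1, r2 = np.asarray(rater1, dtype=float), np.asarray(rater2, dtype=float)
    r, p = pearsonr(r1, r2)
    agreement = np.mean(np.abs(r1 - r2) <= tolerance) * 100
    return r, p, agreement

# Hypothetical content-subscale scores for eight essays, one list per rater.
rater_a = [8, 10, 7, 9, 12, 6, 11, 9]
rater_b = [9, 10, 8, 9, 11, 6, 12, 8]
print(interrater_reliability(rater_a, rater_b))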
The results of this analysis can be seen in Table 3.

Table 3. Inter-rater Reliability

                    Topic
Sub-Category        Altruism            Voting
Content             0.87** (92.25)      0.64** (87.63)
Organization        0.81** (92.38)      0.77** (90.75)
Vocabulary          0.63** (90.13)      0.48*  (90.63)
Language Use        0.57** (90.00)      0.23   (85.88)
Mechanics           0.55** (93.44)      0.17   (91.38)
Overall             0.86** (94.24)      0.62** (91.42)

Note: Values are Pearson product-moment correlation coefficients (r) between the two raters on the given rating category for the given topic; ** = p < .001, * = p < .05; the percentage of agreement between the two raters is given in parentheses.

As can be seen in Table 3, the inter-rater reliabilities are highly significant across most of the subscales and in terms of the overall scores. The only exception to this is in the areas of language use and mechanics for the topic of computerized voting. This, however, may be expected given the vague nature of these categories (i.e., not everyone agrees on what to count as a syntactic error or punctuation error). The overall inter-rater reliability values are also high and closely match those reported by ETS (2011) for the writing portion of the TOEFL iBT, which has obtained average inter-rater reliability results of .78 in its own calculations. While some of the correlation coefficient values are rather low, the percent agreements across all categories are very high, with the lowest percentage at 85.88% for language use in the computerized voting essays. This signals that the raters were generally in close agreement on scores and that the scores used for the rest of the statistical analyses in this study were reliable.

RQ 1

The first research question investigated whether there was a significant difference in essay scores received after being presented with video listening material as compared with audio/still-picture listening material. Table 4 contains the descriptive statistics for each of the input types for the 40 participants.

Table 4. Descriptive Statistics for Type of Visual Input

                    Audio/Still Picture        Video
Category            Mean       SD              Mean       SD
Overall             40.44      10.08           43.28      12.08
Content             8.94       3.35            8.78       3.93
Organization        7.83       2.88            8.64       3.27
Vocabulary          9.31       1.96            10.04      2.47
Language Use        8.59       2.20            9.83       2.70
Mechanics           5.87       1.24            5.94       1.44

Note: n = 40

In Table 4 it can be seen that, for overall score, the video condition had a slightly higher mean than the audio/still-picture condition. However, when looking at the subcategories found on the rubric, the results are not so straightforward. While the means for the subcategories of organization, vocabulary, language use, and mechanics all seem to increase in the video condition, the content subcategory actually received a lower mean in the video condition as compared to the audio condition. In order to investigate the significance of these mean differences, paired samples t-tests were performed. Table 5 displays the results of these tests for both overall scores and the scores for each of the subcategories found on the rubric.
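For readers who want to reproduce this kind of comparison, the sketch below shows how a paired-samples t-test and an effect size r derived from t and df (one common way of converting a t value into an effect size for interpretation against Cohen's benchmarks) could be computed in Python. The analyses in this thesis were run in SPSS; the score vectors below are hypothetical and serve only to illustrate the calculations behind Tables 5 and 7, and the standard error shown is the standard error of the mean difference, which is an assumption about the SE column reported in those tables.

import numpy as np
from scipy.stats import ttest_rel

def paired_comparison(condition_a, condition_b):
    # Paired-samples t-test plus an effect size r computed from t and df.
    a = np.asarray(condition_a, dtype=float)
    b = np.asarray(condition_b, dtype=float)
    t, p = ttest_rel(a, b)
    df = len(a) - 1
    r = np.sqrt(t**2 / (t**2 + df))               # effect size from t
    se = np.std(a - b, ddof=1) / np.sqrt(len(a))  # SE of the mean difference
    return t, df, p, se, r

# Hypothetical overall essay scores for the same eight test takers in each condition.
video_scores = [43, 38, 51, 40, 47, 36, 49, 44]
audio_scores = [40, 39, 45, 38, 44, 37, 46, 41]
print(paired_comparison(video_scores, audio_scores))

The same calculation applies to the note word counts compared in Table 7; only the paired vectors change.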
Table 5. Paired-Samples t-test Comparison of Scores Between Video and Audio Conditions

Category          t-value    df    p        SE      Effect Size (r)
Content           -0.21      39    0.84     0.78    0.03
Organization      1.71       39    0.10     0.47    0.26
Vocabulary        1.71       39    0.10     0.42    0.26
Language Use      2.78       39    0.01*    0.45    0.41
Mechanics         0.28       39    0.78     0.25    0.04
Overall           1.62       39    0.11     1.74    0.25

Note: * = significance at the .05 level

As Table 5 shows, on average, the participants did not receive significantly different scores on their essays between the video (M = 43.28, SE = 1.93) and audio/still-picture (M = 40.44, SE = 1.61) conditions, t(39) = 1.62, p > .05, r = .25. The subcategories of content, organization, vocabulary, and mechanics likewise produced non-significant results, showing that the presence of video or still picture did not have a great effect on these areas. However, on average, it was found that participants did receive a significantly higher score in the area of language use in the video condition (M = 9.83, SE = 0.43) as opposed to language use in the audio/still-picture condition (M = 8.59, SE = 0.35), t(39) = 2.78, p < .05, r = .41. According to Cohen (1988, 1992), an r value of .41 indicates that there was a moderate effect size of the video versus audio condition on this category of the rubric, meaning that the video listening materials were providing some significant aid in this category.

RQ 2

The second research question investigated whether there was a difference in note-taking practices between the video and audio/still-picture conditions. This was evaluated by doing a word count on participants' notes. Table 6 displays the descriptive statistics for the word counts between the two conditions.

Table 6. Descriptive Statistics of Note Word Counts

Input Method    N     Mean     SD
Audio           40    34.00    19.90
Video           40    28.93    17.55

Results in Table 6 show that the audio condition had the higher mean (M = 34.00) and the video condition the lower mean (M = 28.93). In order to investigate whether these means were significantly different, I performed a paired samples t-test. The results of this analysis are in Table 7.

Table 7. Paired-Samples t-test of Note Word Count Compared to Input Method

t-value    df    p       SE      Effect Size (r)
-2.39      39    0.02    2.13    0.36

The results of this analysis demonstrate that the number of words in the participants' notes significantly differed between the video condition (M = 28.93, SE = 2.78) and the audio condition (M = 34.00, SE = 3.15), t(39) = -2.39, p < .05, r = .36. Thus, it appears that the method of delivering the listening material had a moderate effect on note-taking practices—those who received the video-based input wrote significantly fewer notes. In addition to performing a paired samples t-test on word counts, qualitative data were also collected from the participants via the exit questionnaires. I analyzed their responses for themes related to note-taking that may have explained why fewer notes were taken on average during the video-based listening tasks. Table 8 displays these themes and the number of tokens for the occurrence of each theme.

Table 8. Participants' Comments on Note-Taking and Memory

Major Category    Subtheme                                           Tokens
Note-Taking       a. Multitasking difficult                          11
                  b. Body language was distracting                   3
                  c. Did not watch video                             2
                  d. Did not take notes                              5
Memory            a. Easier comprehension led to easier recall       4
                  b. Gestures facilitate longer storage in memory    15
                  c. Easier to memorize                              2
                  d. Too much information to recall                  4

Note: Numbers may not match the total number of participants because some comments were not related to any of the major themes.

Table 8 has two major categories related to comments on note-taking and comments on memory and the subthemes associated with these categories. The most common theme that appeared in regard to note-taking was the difficulty of listening, watching, and taking notes at the same time. Participants generally wrote that they found it extremely difficult to take notes during the video. They gave several reasons for this on their exit questionnaires. Many of these reasons were that it was simply too difficult to pay attention to everything at the same time or that the movement was distracting. The following are some representative examples of the common themes that arose in relation to the note-taking category:

Example 1: Yes, it is hard for me. Because when I was taking notes, I couldn't pay attention to the video. I can only focus on one thing carefully. (Participant 25, in IEP program, from China)

Example 2: Well Audio with picture was attentive? I couldn't concentrate on it any more because there is no moving or anything like that. Audio with video, the lecturer was moving and hand motion and everything. I was just following her action, looking at her. So, I couldn't concentrate as much as the audio with picture. (Participant 3, in IEP program, from Korea)

Example 3: I think both ok. I don't have time to look at screen. (Participant 7, in IEP program, from China)

Example 4: I don't prefer to take a note while the video was playing because the length of lecture is only 2 or 3 minutes. (Participant 16, in IEP program, from China)

The researcher also selected comments concerning memory given that it was hypothesized that non-verbal cues would facilitate storage of information in memory. The most common theme to arise with respect to this aspect of the question, which had 15 tokens, was that the gestures in the video facilitated longer storage of information, so participants felt that they could take fewer notes.

Example 5: Yes. Sometimes the presenter will show some picture or gester to me, that is the point. And make me remember longer than reading. (Participant 5, in IEP program, from Taiwan)

Other common themes in regard to memory were rather equal in their distribution and much lower than those related to gestures aiding longer storage in memory. These comments focused more on recall of information rather than the storage of it.

Example 6: Yes. Some behaviors of instructor help me to recall some parts of listening. (Participant 39, in EAP program, from China)

Example 7: No, because the information is too much and I cannot understand all the information, just can remember the things which I can understand include a lot of extra information. (Participant 11, in IEP program, from Vietnam)

RQ 3

The third research question addressed whether participants actually took note of nonverbal information in the video listening passage and how they used non-verbal information that they did notice. In order to investigate this, I evaluated the responses from the exit questionnaires and interviews. I developed a list of themes as they related to the question. The themes and the number of tokens for each theme are in Table 9.
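Token counts like those in Tables 8 through 10 are simple tallies of hand-coded responses. The following Python sketch shows one way such a tally could be produced once responses have been coded; the category labels and entries below are hypothetical examples, not the study data.

from collections import Counter

# Each entry is a hand-assigned (major category, subtheme) code for one comment.
coded_responses = [
    ("Focus", "Gestures/body language"),
    ("Focus", "Listening to information"),
    ("Use", "Aid comprehension"),
    ("Focus", "Teacher"),
    ("Use", "No indication of use"),
    ("Focus", "Listening to information"),
]

tokens = Counter(coded_responses)   # (category, subtheme) -> number of tokens
for (category, subtheme), count in sorted(tokens.items()):
    print(f"{category:6s}  {subtheme:26s}  {count}")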
Table 9. Participants' Focus and Use of Video Content

Major Category   Subtheme                               Tokens
Focus            a. Teacher                              4
                 b. Gestures/Body Language              11
                 c. Stress/Intonation                     3
                 d. Listening to information             15
                 e. Other                                 8
Use              a. Aid Comprehension                    11
                 b. Cues for important information        6
                 c. No indication of use                  9
Note: Numbers may not match the total number of participants because some comments were not related to any of the major themes.

The information in Table 9 shows that, in terms of focus, the majority of responses on the questionnaire and in the interviews indicated that participants focused either on the gestures and body language of the lecturer or on the content of the lecture. A smaller portion of the comments focused on the teacher herself, on the intonation and stress in the teacher's voice, or on some other aspect that was mentioned by only one participant and could not be considered a major subtheme. Examples 8 through 11 are representative of these comments. When asked what they were focusing on while the video was playing, some of the participants said:

Example 8: On the people who talking. (Participant 21, in IEP program, from China)

Example 9: Interviewer: so you're saying that when you...when you were watching the video you were paying more attention to her... Speaker 2: her gestures...yeah. (Participant 1, in IEP program, from Saudi Arabia)

Example 10: The stressed words and the words appear in the reading. (Participant 31, Freshman, from China)

Example 11: I pay the most attention towards the listening. (Participant 30, in IEP program, from Korea)

The major category of use concerns only participants' actual use of non-verbal cues. Within this category, tokens were spread relatively evenly across the themes. Eleven participants stated that they used gestures to aid in their comprehension of individual words or of the listening passage overall. Nine participants, even though they noticed the non-verbal cues, indicated that they did not use the information or find it helpful. Finally, six participants stated that they used the gestures and other visual information as clues to what was important to write down in their notes. The following are representative examples of these themes:

Example 12: Speaker 2: uh..i think it's the same as she said, but at first i didn't like it. because i am...i am the kind of per..people who like to look around everything. So..but...actually, I understand more because her gestures help me like to find out word like "burrow." I didn't know it was the hole in the ground unless she did with her gesture that (hand motion). And also, like she said, the eye contact and some gestures just show you the important things she's going to do. So...her hand(?) just maybe talks as important as her words. (Participant 5, in IEP program, from Taiwan)

Example 13: Audio with video. I can find which one is important and which is the first or second from the person's body language. (Participant 31, Freshman, from China)

RQ 4

The final research question asked whether participants preferred the video format or the audio-and-still-picture format for the listening passage. With this question I also investigated whether participants found the video lecture distracting or helpful. To determine participants' preferences, responses to question one on the exit questionnaire were tallied, and the reasons for each preference were coded as well. The results of this analysis can be seen in Table 10.
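The tallying step itself is simple: each response receives a preference code and a reason code, and the codes are counted. Below is a minimal, illustrative sketch of such a tally; the coded tuples are hypothetical examples, not the actual questionnaire data.

```python
# Illustrative sketch of tallying hand-coded questionnaire responses (hypothetical
# codes, not the study's data): count preferences and (preference, reason) tokens.
from collections import Counter

coded_responses = [
    ("video", "easier to comprehend"),
    ("video", "more realistic"),
    ("audio", "easier to focus on content"),
    ("both", "did not look at screen"),
    ("video", "easier to comprehend"),
]

preference_totals = Counter(pref for pref, _ in coded_responses)
theme_totals = Counter(coded_responses)

print(dict(preference_totals))
for (pref, theme), count in sorted(theme_totals.items()):
    print(f"{pref}: {theme} = {count}")
```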
Table 10. Participants' Preference and Reasons

Preference   Themes                                   Tokens
Video        a. More Realistic                          6
             b. More Active/Higher Concentration        3
             c. Easier to Comprehend                   15
             d. Other                                    4
             Total                                      28
Audio        a. Easier to Focus on Content               9
             b. Other                                    1
             Total                                      10
Both         a. Did not look at screen                   2

The data in Table 10 illustrate that the majority of participants preferred the video input, with most of the comments revealing that clues from the visual channel made it easier to comprehend the information presented in the lecture. Participants also seemed to prefer the video condition because it was more realistic or authentic. Ten participants said they preferred the audio/still-picture input, with all but one saying the reason was that it was easier to focus on the audio content of the lecture. Finally, two other participants stated that both forms of input were acceptable and noted that they often or mostly did not look at the screen during the video. Examples 14 and 15 provide contrasting responses from participants who had a definite preference for one form of input over the other.

Example 14: I prefer the audio with video because it is more realistic. I just feel like in the class. I could see our teacher's action. (Participant 2, in IEP program, from Korea)

Example 15: [I prefer] audio with picture because audio with video made me looking the computer and follow her moves, therefore I couldn't concentrate on listening as much as I did on audio with picture. (Participant 26, in EAP program, from China)

In addition to examining these themes, I also examined the survey data to determine whether participants found the video helpful or distracting. Each participant mentioned this at least once in their survey. The counts of those who found the videos helpful or distracting are in Table 11.

Table 11. Participants' Overall Impression of Video Input

Perception    Count
Helpful        25
Distracting    11
Neutral         4
Total          40

Overall, Table 11 shows that participants generally believed, for reasons that will be covered in the next chapter, that the videos were a helpful aid, with 25 saying the video was helpful. Eleven participants stated that they felt the videos distracted them, and four other participants were neutral concerning the use of video in the listening task. It should be noted that not everyone who stated a preference for video also stated that the video was helpful. Two participants who said they preferred the video condition were neutral on whether it was helpful or distracting, and one participant actually said that the video was distracting.

CHAPTER 5: DISCUSSION AND CONCLUSION

In this chapter I examine the results from the previous chapter and discuss them in further detail for each research question. I then indicate the general and pedagogical implications that arise. Finally, I conclude with a discussion of the limitations of this study and some suggestions for future research.

Listening Comprehension and Construct Validity

The findings of the current study seem to demonstrate that the type of visual an individual is presented with will generally not affect their use of information in an integrated writing examination. The results of this study concur with those of Coniam (2001) and Londe (2009), who found that audio and video conditions on a listening comprehension task did not produce significant test score differences.
The results of the current study conflict to a degree with those found by other previous research. Many researchers (Baltova, 1994; Gruba, 1999; Sueyoshi & Hardison, 2005; Suvorov, 2008, 2009; E. Wagner, 2010) have found significant differences on listening comprehension tasks based on the types of visuals that are provided. However, it should be noted here that the assessments they were using came in the form of multiple-choice questions related specifically to comprehension, whereas the assessment task in this study was focused on having the participants not only comprehend the information they had heard, but also use it in a meaningful way in the essay. Therefore, it could possibly be argued that, while visuals may have significant impacts on an individual’s ability to comprehend the information received from a lecture, this effect does not extend into the way the information is used when writing an essay for an integrated writing task. By investigating the qualitative data regarding the way the participants used and focused on the video listening material, a much fuller picture of issues at hand can be determined. Given  32 that the majority of the participants preferred the presence of video and noted the video’s helpful non-verbal cues, it would seem that the presence of video input, while not necessarily helpful during the writing phase of the task, is actually relatively helpful during the comprehension phase. This was seen in many of the comments provided on the exit questionnaires and in the interviews. The preferences found in this study aligned with the results of previous studies conducted by Progrosh (1996) and Baltova (1994) who also found that participants generally preferred video input over audio-only input. Evidence for this preference for video-based input was revealed in the participants’ comments regarding what they focused on when presented with the videos and how participants used the videos’ non-verbal information. It was apparent from comments that participants were not only paying attention to the video, but that they were also using non-verbal information in the form of gestures as a means for aiding comprehension of difficult vocabulary items and to determine which aspects of the lecture were the most important for them to attend to. These results support the previous findings of M. Wagner (2006) who found that participants tended, on average, to watch video listening input 69% of the time. It also lends support to findings by Sueyoshi and Hardison (2005), who found that learners used gestures and lip movement to aid in comprehension, and Ginther (2002), who found that content visuals were more helpful in aiding comprehension than context visuals. However, the results of this study should be interpreted with some caution. Coniam (2001) found that 80% of his study’s participants felt they were not aided by video and actually preferred audio-only input, contrary to the results found here. Results from the surveys collected in this study may shed some light on the conflicting findings from this study and Coniam’s. Some participants openly stated that audio-only input was what they were accustomed to when presented with listening material in classrooms in their home countries and, therefore, that the  33 audio-with-still-picture condition was preferred. Other participants simply did not see the purpose of including non-verbal cues. For example, one participant said in an interview, “I don't care about the lecture is with a video or a picture. 
The point is, do I have the ability to understand what they said?” (Participant 35, Freshman, from China). Comments such as this one indicate that some of the participants came from backgrounds in which they were not taught how to use non-verbal information during listening tasks, or perhaps that they were not used to watching video in the context of a listening test.

This study has pedagogical implications in the area of strategy instruction. Participants who preferred audio-only input generally stated that the reason was that the audio-visual input was distracting and made it difficult to focus on content. This may indicate that students need to be instructed in strategies for using these non-verbal cues and that video should, perhaps, be included in listening tasks presented in class and/or in preparation for listening tests. The tests themselves should also have more video-based content. If students are being assessed on their ability to receive information from a lecture, then for the test to be a valid measure of this ability, non-verbal information should be present. If students are expected to succeed in a real lecture, it would seem important not only that they know how to utilize non-verbal information but also that they know how not to be distracted by it. Buck (2001) stated that by including non-verbal information, students who are adept at using such information have an unfair advantage. However, as E. Wagner (2008) stated, by administering an audio-only listening task, components of the Target Language Use domain are missing, and those students who are adept at using these cues are unfairly disadvantaged and their ability is underrepresented.

The Effects of Video and Audio Listening Tasks on Note Taking

Being able to take notes can compensate for memory constraints and therefore increase the face validity of a test (Vandergrift, 2009). In addition, listeners are not able to go back and review the information that has been presented to them in the way that readers can (Thompson, 1995), making a task such as the integrated writing task extremely difficult without some form of notes. Previous research has done little to investigate the note-taking behavior of test takers on listening comprehension tests. Several studies have found that the presence of notes and the ability to take notes during a listening task can actually facilitate comprehension and aid in the recognition of specific information (Liu, 2001; Carrell, Dunkel, & Mollaun, 2004). Carrell (2007) investigated the effect of a note-taking strategy intervention and found that, while the intervention had little impact on note taking itself, subsequent performance on a listening and writing task, similar to the one used in this study, related consistently to the number of content words in test takers' notes. Since the task in this study was based on the integrated writing task found on the TOEFL iBT, students were provided with paper to take notes, as they are on that test. The current study took advantage of this condition by investigating the effects of the different listening conditions on note taking. Results indicated that there was a significant difference in word counts between the two conditions, with the video condition producing notes with lower word counts.
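As a check on the reported statistic, the moderate effect size for this comparison follows from the standard conversion of a paired-samples t value to r (Cohen, 1988, 1992), applied to the values in Table 7:

```latex
r = \sqrt{\frac{t^{2}}{t^{2} + df}}
  = \sqrt{\frac{(-2.39)^{2}}{(-2.39)^{2} + 39}}
  \approx \sqrt{\frac{5.71}{44.71}}
  \approx 0.36
```

which matches the moderate effect reported above for the difference in note length; the same conversion reproduces the effect sizes reported in Table 5.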
This supports previous research in which it was found that test takers tend to focus their attention on the screen for longer periods of time when video is present than when there are still-pictures only (E. Wagner, 2007) and that changing images or movement can be distracting (Chung, 1994; E. Wagner, 2008). This may also be the result of a greater cognitive load being placed on the test taker who has to focus on watching the lecturer, listening to the information, and writing relevant information down in notes.  35 Qualitative data helped clarify this conclusion by providing some additional information that may support this evidence of split-attention effect. Many participants stated that the video created a need to multitask and that this made it much more difficult to take notes. In fact, as was seen in Table 8, 11 out of 21 tokens in reference to note-taking mentioned multitasking difficulty. Further comments demonstrated that participants seemed to believe the video listening was somehow “faster” than the audio/still-picture listening material and that the video content made getting information from the lecture more difficult. This comment was found in relation to both video lecture topics, which may suggest that participants perceived the video lectures as being “faster” because of an extra cognitive load the video condition presented. In essence, it could be that the “distracting” effect of video presentation reported by Chung (1994) and E. Wagner (2008) is one and the same as the split-attention effect also described by E. Wagner (2008)—the participants may just be describing their inability to attend to all modalities (audio, visual) and task components (comprehension, note-taking) at the same time and the participants may, in laymen’s terms, describe their perception of this cognitive overload as being “distracted by the video” or by “the movement in the video.” In the current study I also investigated whether test takers felt it was easier to remember information in the video lecture than in the audio lectures. As was seen in Table 8, 19 of the 25 comments concerning memory centered on how the video was helpful in terms of storage of information in memory and recall. Participants’ comments suggested that specific gestures actually facilitated the recall of information when they needed it. This could possibly be because the gestures aided in comprehension and made the information in the lectures more salient to the listeners. Furthermore, some participants stated that they felt taking notes actually negatively impacted the effect of the videos. One of the interview participants stated:  36 Example 16: If I do not take notes, I think it is better for video, because I mean...body language is helpful. So, I can understand more, but if we need to take notes, it's not a good idea. (Participant 9, in IEP program, from China) Comments such as this one suggest that language learners realize that body language and gestures are helpful and that they should use them as an aid when listening. However, this comment also causes more questions to arise. Namely, if students are taking notes, how attentive can they be to body language and gestures? Obviously in the real world, students in lecture halls have to negotiate both tasks. Perhaps practice in doing so is needed. 
There is some evidence in this study that a test requiring both skills (note taking and visual attention to the gestures, facial features, and body language that augment comprehension) is more advanced than the curriculum leading up to the test; if listening tasks in the classroom do not require such task and skill negotiation, a test with these components may feel unfair or be too taxing.

Limitations

While the data obtained from the current study are rather informative, there are still several limitations related to the study and to the listening test itself. First, the study lacked authentic materials. The listening materials used for this study were scripted materials originally designed by the Educational Testing Service and therefore lacked authenticity. Gruba (1997) stated that authentic materials are preferable in listening tests, so this would be a consideration for future studies.

A second limitation of the study is found in the interviews and questionnaires. The information from the interviews and questionnaires should be interpreted cautiously because participants were informed of what the study was investigating. A Hawthorne effect may therefore have been present, with participants tailoring their answers to survey and interview questions to fit the study's main investigation. In addition, due to participants' time constraints, it was necessary to conduct interviews in pairs. This may have allowed one participant to influence the other's opinions.

A third limitation is the way the notes were analyzed. A word count does not necessarily provide a good indication of how video and audio/still-picture input sources affect note-taking strategies. It may have been better to count content units, since that may have provided a more accurate measure of how video affected note-taking behavior.

Another limitation of the present study can be found in the participants themselves. Not only were most of the participants from the same cultural background, but the proficiency levels represented were also not very diverse. Most participants were approximately at the intermediate level of proficiency on the ACTFL scale. Therefore, the results of this study are difficult to generalize.

Directions for Future Research

The results of the present study leave several questions to be investigated in future research. The first would be to investigate the effects of strategy instruction on participants' ability to utilize non-verbal cues. Given that several individuals reported finding the video listening passages "distracting," it would be interesting to instruct test takers on stress patterns and gestures and then see whether that instruction has an effect on subsequent test scores.

Second, several participants stated that they felt more relaxed when given the video listening input and that, because of this, they were able to comprehend better. Researchers such as Arnold (2000) and In'nami (2006) have found conflicting results concerning the effects of anxiety on listening comprehension. The comments provided in this study may indicate that anxiety does have some effect on listening comprehension, and it may therefore be important to further investigate the affective dimensions of including video input in listening comprehension exams.

A third possibility for future research concerns note-taking strategies. In the current study, the content of the notes was not considered.
It may be interesting to examine how video and audio listening passages affect the actual content found in notes and what kinds of content may contribute to higher scores on an integrated writing task.

A fourth direction for future research could be to investigate the actual focus of test takers when given a video listening task. While E. Wagner (2007) found that participants looked at the screen 69% of the time when presented with a video listening task, it would be interesting to investigate what test takers actually focus on in the video using eye-tracking technology.

Another direction for possible future research would be to investigate specific kinds of gestures and how each type aids comprehension. For instance, investigating whether metaphoric gestures aid retention and comprehension more than beat gestures would lead to a further understanding of which types of gestures attract the most attention and facilitate the process of listening comprehension. Furthermore, investigating the effect of these types of gestures on the content of test takers' notes would help reveal what is most salient to them.

A further possibility for future research would be to investigate the connection between the comprehension phase and the writing phase. It would be interesting to examine whether the results of this study are due to a stronger link between listening and comprehension than between listening and writing, or whether some other factor is at play regarding performance on the essay task.

Finally, given the number of participants stating that they found it easier to store information in their memory and later recall it with the video, it may be worth investigating the ways in which body language and gestures ease an individual's cognitive load and the amount of information one is able to recall as a result of video listening tasks. The findings of such a study would give a fuller view of the factors that affect an individual's ability to comprehend the aural stream.

APPENDICES

APPENDIX A – YOUTUBE LINKS TO LISTENING MATERIALS

Computerized Voting with Still Picture and Audio: http://youtu.be/FBW_STr6tEA?hd=1
Computerized Voting with Video: http://youtu.be/3IWSQ8PT4CY?hd=1
Altruism with Still Picture and Audio: http://youtu.be/kAWvQmKZI-U?hd=1
Altruism with Video: http://youtu.be/hRgsmMRpeDY?hd=1

APPENDIX B – BACKGROUND QUESTIONNAIRE

Participant ID # _________________ (To be filled in by the researcher)

BACKGROUND QUESTIONNAIRE
TOEFL Integrative Writing Task Project

PLEASE FILL OUT THE FOLLOWING BACKGROUND INFORMATION. PLEASE PRINT CLEARLY.

1. Name:
   a. First name: ____________________________
   b. Last name: ____________________________
   c. Middle initial: _______
2. Age: _____
3. Gender:   Male   Female
4. Phone number: ( __________ ) __________ - __________________
5. Email address: _________________________________________
6. Native language (first fluent language, also known as your “mother tongue”): __________________________
   a. How did you learn English? _______________________________________________________________________
   b. How old were you when you started learning English? ________________________
7. How many times have you taken the TOEFL in the past? ________________________
8. Do you currently have plans to retake the TOEFL? If so, when? Have you taken any classes designed to improve your test score? ________________________________________________________________________
9.
Have you taken any classes or used any study aids (books, flashcards, etc…) designed to improve your TOEFL score? If so, what did you use and how recently did you use it? _______________________________________________________________________

APPENDIX C – EXIT QUESTIONNAIRE

Exit Questionnaire: Please answer the following questions to the best of your ability based on your test-taking experience.

Name: ____________________

1. Which of the lectures did you prefer, the audio with picture or the audio with video? Why?
2. Do you think that the presence of the video aided in your comprehension of the information being delivered? Why or why not?
3. What did you find yourself paying the most attention towards in the video lecture?
4. Did you find it difficult to take notes while the video was playing? Please explain.
5. Did you think it was easier to remember information received from the video lecture? Please explain.

APPENDIX D – DIRECTIONS AND NOTE-TAKING SHEET

Directions: For this task you will read a passage about an academic topic and you will listen to a lecture about the same topic. You may take notes while you read and listen. Then you will write a response to a question that asks you about the relationship between the lecture you heard and the reading passage. Try to answer the question as completely as possible using information from the reading passage and the lecture. The question does not ask you to express your personal opinion. You may refer to the reading passage again when you write. You may use your notes to help you answer the question. Typically, an effective response will be 150 to 225 words. Your response will be judged on the quality of your writing and on the completeness and accuracy of the content. You will be given 3 minutes to read the passage. Then you will listen to the lecture. Then you will be allowed 20 minutes to plan and write your response.
 46 APPENDIX E – ANALYTIC RUBRIC Essay ID #:____________________  Content Organization Vocabulary Language Use Score Mechanics /2 20 Thorough and 20 Excellent overall 20 Very sophisticated 20 No major errors in 20 logical development organization vocabulary word order or of thesis Clear thesis Excellent choice of complex structures Substantive and statement words with no errors No errors that detailed Substantive Excellent range of interfere with No irrelevant introduction and vocabulary comprehension 16 information 16 conclusion 16 Idiomatic and near 16 Only occasional 16 Interesting Excellent use of native-like errors in A substantial transition word vocabulary morphology number of words for Excellent Academic register Frequent use of amount of time connections complex sentences given between paragraphs Excellent sentence Unity within every variety paragraph Appropriate layout with indented paragraphs No spelling errors No punctuation errors 15 Good and logical development of thesis Fairly substantive and detailed Almost no irrelevant information 11 Somewhat interesting Appropriate layout with indented paragraphs No more than a few spelling errors in less frequent vocabulary No more than a few punctuation errors  15 Good overall organization Clear thesis statement Good introduction and conclusion Good use of transition 11 wordsGood connections 15 Somewhat 15 Occasional errors sophisticated in awkward order vocabulary or complex Attempts, even if not structures completely Almost no errors successful, at that interfere with sophisticated comprehension vocabulary Attempts, even if 11 Good choice of 11 not completely words with some successful, at a 47 15 11 An adequate number of words for the amount of time given between paragraphs Unity within most paragraphs errors that don’t obscure meaning Adequate range of vocabulary but some repetition Approaching academic register variety of complex structures Some errors in morphology Frequent use of complex sentences Good sentence variety 10 Some development 10 Some general 10 Unsophisticated 10 of thesis coherent vocabulary Limited Not much substance organization word choice with or detail Minimal thesis some errors Some irrelevant statement or main obscuring meaning information idea Repetitive choice of Somewhat Minimal words uninteresting introduction and No resemblance to 6 Limited number of 6 conclusion 6 academic register 6 words for the Occasional use of amount of time transitions words given Some disjointed connections between paragraphs Some paragraphs may lack unity Errors in word 10 order or complex structures Some errors that interfere with comprehension Frequent errors in morphology Minimal use of 6 complex sentences Little sentence variety Appropriate layout with most paragraphs indented Some spelling errors in less frequent and more frequent vocabulary Several punctuation errors 5 Serious errors in 5 word order or complex structures Frequent errors that interfere with comprehension Many error in 0 No attempt to arrange essay into paragraphs Several spelling errors even in frequent vocabulary Many punctuation 0  No development of thesis No substance or details Substantial amount of irrelevant information 5 0 No coherent 5 Very simple organization vocabulary No thesis statement Severe errors in or main idea word choice that No introduction and often obscure conclusion meaning No use of transition 0 No variety in word 48 5 0 Completely uninteresting Very few words for the amount of time given words Disjointed connections between 
paragraphs Paragraphs lack unity choice No resemblance to academic register   49 morphology Almost no attempt at complex sentences No sentence variety errors APPENDIX F– WRITING PROMPT  Summarize the points made in the lecture you just heard, explaining how they cast doubt on points made in the reading.  50 APPENDIX G – INTERVIEW 1 TRANSCRIPT [00:00:01.23] Interviewer: ok, um..what..which of the lectures did you end up preferring? ..the video or the picture [00:00:17.00] Speaker 1: I think the video one is better. actually..umm..except the problem of the topics, actually they are different topics...i think videos and i can focus on the lecturer's eyes or her hand gestures or facial features, especially mouth and i can understand when does she when she stopped and when some kind of her attitude and uh i can have more understanding about that. And i think it is better...it's kind of distraction, but i think it's good because i took tons of times TOEFL listening test and when every time i listened to lectures i was very easily to distracted because..because only audio input and you stay there and you feel sleepy or you feel um you want to do anything el...something else so you cannot focus on the lecture very well, but when you have someone you can have some eye contact with the lecturer or you have...some...other clues to focus on and you will not...will not so easily to uh...you know to disconcentrate. [00:01:41.14] Speaker 2: uh..i think it's the same as she said, but at first i didn't like it. because i am...i am the kind of per..people who like to look around everything. So..but...actually, I understand more because her gestures help me like to find out word like "burrow." I didn't know it was the hole in the ground unless she did with her gesture that (hand motion). And also, like she said, the eye contact and some gestures just show you the important things she's going to do. So...her head?hand (?) just maybe talks as important as her words. [00:02:31.20] Interviewer: so you're saying that when you...when you were watching the video you were paying more attention to her... [00:02:38.01] Speaker 2:her gestures...yeah  51 [00:02:36.26] Interviewer: ok. were you paying attention to her lips at all?Did you see her lips? [00:02:42.29] Both: No... [00:02:46.09] Interviewer: Um... [00:02:48.21] Speaker 1: but, but, but i don't think that uh...i think that, i think underst...seeing, looking at her lips clearly is kind of important because actually uh international students' listening skills is not as good as...yeah...so...if they have some...they have they can look more clearly.They can have more understanding about what was really said. For example, when I watch the TV shows or I watch CNN I always depend on their lips to understand more about what they said. [00:03:23.06] Speaker 2: I never looked at their lips. *laughs* [00:03:32.08] Speaker 1: it's important [00:03:32.08] Interviewer: Um...do you think that gestures are more important than lip reading maybe? [00:03:41.18] Speaker 2: yeah, to me it is. [00:03:42.23] Speaker 1: yes, yes, to me too. umm...uh, uh, uh, like _________ said, at first i'm not really used to the video one, because i'm , i when i practice my TOEFL test i usually use the audio one, but after when we take the second task, i feel a little bit weird when only listen to the audio. yeah..it's not...i cannot [00:04:06.25] Interviewer: so the difference between the two is a little jarring? 
[00:04:14.21] Speaker 1:yeah...yup  52 [00:04:18.24] Speaker 2: the first is much better, the video than that picture. I will not get...like i'm not gonna get anything out of the picture that's there. no gestures, no eye contact...you don't, you just don't know. [00:04:33.00] Speaker 1:actually, pictures there is no use, totally no use. so i don't know why TOEFL listening test always give me some pictures there. [00:04:46.10] Interviewer: what types of things do you pay attention to when you're listening? what do you think a listening test should include in it's listening? [00:05:16.19] Speaker 2:i think anything that's mainly related to that article. If they're going to put a picture, i don't need to see the lecturer, i wanna see that, i think they were talking about the meerkat? On the first one. It was very helpful. I'd like to see the picture ofthe meerkat, or even the meerkat guard. [00:05:42.03] Speaker 1:it helps out [00:05:45.12] Speaker 2:yeah, i think pictures are important if there's no video. [00:05:54.24] Speaker 1: i'm sorry, i cannot geet your question. [00:06:03.25] Interviewer::so when you're given a listening, and say you're in the real world. so compare what you have in the test to the real world. what do you think the listening in your test should include from the outside? so like when you're listening to people talk, what are you paying attention to? Do you think those should be included in the test? [00:06:34.04] Speaker 2:yes [00:06:35.29] Speaker 1: let me think. i don't have a lot of opinions about this questions. maybe...pictures as an add, not the main one. and i need some...i don't, i...i don't know how to explain it well. the frequency of the voice? yeah, because most of tests i would listen are not  53 too...you know...they don't have lots of emotions when they explain something. yeah...i think i need more emotions. [00:07:15.27] Interviewer: ok. umm...i have one more question. When you were taking notes, did you find it more difficult for the video listening than for the the audio listening...to take notes? [00:07:35.29] Speaker 2:Video was more difficult [00:07:41.00] Interviewer: more difficult?what made it more difficult? [00:07:39.15] Speaker 2:um because, to me i want to concentrate on what she's saying and what she's doing and her gestures, and also to write what i think. uh...it's just i'm not good with multitasking so...that's why. [00:07:54.25] Speaker 1:yeah, to me, it's kind of, but not affect much because i think, uh...in my opinion, i think the video lecture um the gestures and the facial expressions such other clues are useful, but not uh...not necessary. Like the lecturer might say, "please feel free to look at my face." but you can look and you cannot, you don't have to, you don't have to look at it. So if I want to focus on my, take notes I won't look at her. Yeah. I don't think to me it is kind of multi...mult...multitasking? because I will divide it into two sides. If I know, I am sure that this part is an important part I need to take notes, I won't...I don't have to look at her anymore. Yeah. [00:08:55.16] Interviewer: What types of gestures did you find most helpful? [00:08:59.16] Speaker 2:Hand gestures. [00:08:56.07] Speaker 1:Hand gestures. Yes. Really Helpful. Like ________ said this (makes burrow gesture) [00:09:07.04] Interviewer: Yeah. The burrow? [00:09:09.23] Speaker 1:Yeah that was very helpful.  54 APPENDIX H – INTERVIEW 2 TRANSCRIPT [00:00:00.16] Interviewer: um so which, which of the lectures did you end up preferring? 
[00:00:11.07] Speaker 1:the second [00:00:09.08] Interviewer: the second. which one is the second one? [00:00:11.01] Speaker 1:The vote. Vote [00:00:18.05] Interviewer: Was it video or picture? [00:00:20.04] Speaker 1:I Picture. I think picture is more...i can understand picture uh...more clear. Clear...you know...I think so, but I don't know. [00:00:38.23] Interviewer: Why do you think you understand it more clearly? [00:00:43.28] Speaker 1:You know, your computer play a video, I am easy to interrupt...by a video...by a motion I mean. So I cannot take notes very well. And that is the reason. [00:00:59.20] Interviewer: Ok. What about you? [00:01:04.17] Speaker 2:Well Audio with picture was attentive? I couldn't concentrate on it any more because there is no moving or anything like that. Audio with video, the lecturer was moving and hand motion and everything. I was just following her action, looking at her. So, I couldn't concentrate as much as the audio with picture. [00:01:31.18] Interviewer: Ok. So you both like the audio with picture... [00:01:35.00] Speaker 1: I have another... [00:01:37.17] Interviewer: Ok [00:01:37.17] Speaker 1: If I do not take notes, I think it is better for video, because I mean...body language is helpful. So, I can understand more, but if we need to take notes, it's not a good idea.  55 [00:01:52.22] Interviewer: Did you think, if...did you think it was necessary to take notes as much with the video? [00:02:01.27] Speaker 1:....I like to take notes. But the video is actually interrupt me...to take notes. I didn't try it...you know, I just...but I think TOEFL test is necessary to take notes when the lecture is playing. [00:02:20.01] Interviewer: Ok. Um...Do you think that...do you think it was...like notes aside...do you think it was easier to understand using the video....instead of just the still picture? [00:02:45.05] Speaker 1:Without taking notes? I think it's video, because I can understand more by body language and movement and...you know...and, I can also understand the emotion of the professor. But if I taking notes, that is really interrupt me. So, it depends. [00:03:15.25] Interviewer: Ok. What about you? What's your feeling? [00:03:26.00] Speaker 2:What was your question? [00:03:25.14] Interviewer: Do you think that the video helped you to understand more? Say you didn't have to take notes and you could just watch the video. Do you think it would've helped you understand more? [00:03:37.07] Speaker 2:Um...like, I understand more...but when I have to write it, I forget the story. So if I don't write it I cannot like...I have to compare both the reading and lecture, but I cannot remember the lecture...lecturer's saying. So I have to take notes. But if I don't take notes I understand better. [00:04:01.05] Interviewer: Ok. So....so the use, so looking at the gestures it didn't help you to remember it? Did it help you? [00:04:21.18] Speaker 1:It's a really hard question. It depends. But...help me to remember? no. Help me to understand? yes  56 [00:04:41.26] Interviewer: Um...Did you...so...it's more difficult to take notes...um...do you think it's important to include that video since you 're being tested on your academic skills in a lecture? Even though it's more difficult to take notes, do you think it's more important to include that video because....of more realistic situations... [00:06:26.16] Speaker 1:in TOEFL test? No. Because, first of all, it's a...you know...I think it interrupt people to understand, when, during the test. 
Second, the network may be no good. So...I think the most important point is they interrupt. [00:06:52.07] Interviewer: What about you? Do you feel the same way? [00:07:04.18] Speaker 2:I don't feel any realistic...from...from the video since it is, I cannot...(?????)...test TOEFL, I always feel like "oh yeah, it's not like i'm in there..." so... [00:07:23.23] Interviewer: One final question...was the audio in the videos clear in both of the passages? [00:07:47.17] Speaker 1:Yeah...I understand the second one, the American Votes. But the first one is a little bit not very clear. Maybe I interrupt by something, because I did multitasking, you know I look at watching the video and taking the notes, listen to the audio, so maybe it's not very distinct. [00:08:07.20] Interviewer: So you're saying the first one is not as....the quality of the audio is not as good? [00:08:14.06] Speaker 1:It's not as good as the second one, but it's still okay, but it...i think the second one is better. [00:08:28.20] Speaker 2:Same as him.  57 REFERENCES  58 REFERENCES Alibali, M. W., Heath, D. C., & Myers, H. J. (2001). Effects of visibility between speaker and listener on gesture production: Some gestures are meant to be seen. Journal of Memory and Language, 44, 169-188. Arnold, J. (2000). Seeing through listening comprehension exam anxiety. TESOL Quarterly 34, 777–786. Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press. Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press. Baltova, I. (1994). The impact of video on the comprehension skills of core French students. Canadian Modern Language Review, 50(3), 507-531. Bejar, I., Douglas, D., Jamieson, J., Nissan, S., & Turner, J. (2000). TOEFL 2000 listening framework: A working paper . Princeton, NJ: Educational Testing Service. Brett, P. (1997). A comparative study of the effects of the use of multimedia on listening comprehension. System, 25(1), 39-53. Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press. Carrell, P. (2007). Notetaking strategies and their relationship to performance on listening comprehension and communicative assessment tasks (TOEFL Monograph Series No. MS-25). Princeton, NJ: ETS. Carrell, P. L., Dunkel, P. A., & Mollaun, P. (2004). The effects of notetaking, lecture length and topic on a computer-based test of ESL listening comprehension. Applied Language Learning, 14, 83-105. Chung, U. K. (1994). The effect of audio, a single picture, multiple pictures, or video on secondlanguage listening comprehension. Unpublished PhD dissertation, University of Illinois at Urbana-Champaign. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press. Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. Coniam, D. (2001). The use of audio or video comprehension as an assessment instrument in the certification of English language teachers: A case study. System, 29, 1-14.  59 Duff, P. (2002). Research approaches in applied linguistics. In R. A. Kaplan (Ed.), The Oxford handbook of applied linguistics (pp. 13-23). Oxford, UK: Oxford University Press. English, S. L. (1982, May). Kinesics in academic listening. Paper presented at the 16th annual conventionof Teachers of English to Speakers of Other Languages, Honolulu, HI. (ERIC Document Reproduction Service No. ED 218 976) Educational Testing Service. (2009). 
The official guide to the TOEFL® test (3rd ed.). New York, NY: McGraw Hill. Educational Testing Service. (2010). TOEFL iBT™ test framework and tst development. Retrieved February 24, 2011 from http://www.ets.org/toefl/research/ibt_insight_series Education Testing Service. (2011). Reliability and comparability of TOEFL iBT™ scores. Retrieved February 24, 2011 from http://www.ets.org/toefl/research/ibt_insight_series Ginther, A. (2002). Context and content visuals and performance on listening comprehension stimuli. Language Testing, 19(2), 133-167. Gruba, P. (1993). A comparison study of audio and video in language testing. JALT Journal, 15(1), 85-88. Gruba, P. (1997). The role of video media in listening assessment. System, 25 (3), 335-345. Gruba, P. (1999). The role of digital video media in second language listening comprehension. Unpublished PhD dissertation, Department of Linguistics and Applied Linguistics, University of Melbourne. Retrieved May 5, 2008, from http://eprints.unimelb.edu.au/archive/00000244/ Hadar, U., Wenkert-Olenik, D., Krauss, R., & Soroket, N. (1998). Gesture and the processing of speech: Neuropsychological evidence. Brain and Language, 62, 107-126. Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In: L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 241–276). Norwood, NJ: Ablex. In’nami, Y. (2006). The effects of test anxiety on listening test performance. System 34, 317–340. Kintsch, W. (1998). Comprehension. Cambridge: Cambridge University Press. Lado, R. (1961). Language testing: The construction and use of foreign language tests. London: Longman. Liu, Y. (2001). A cognitive study on the functions of note-taking and the content of notes taken in a context of Chinese EFL learners. (Unpublished master’s thesis). Guangdong University of Foreign Studies, Guangdong, People’s Republic of China.  60 Londe, Z. C. (2009). The effects of video media in English as a second language listening comprehension tests. Issues in Applied Linguistics, 17(1), 41-50. McNamara, T. (1996). Measuring second language performance. London: Longman. Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13-103). New York: Macmillan. Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 242-256. Morrel-Samuels, P., & Krauss, R. M. (1992). Word familiarity predicts temporal asynchrony of hand gestures and speech. Journal of Experimental Psychology: Learning, Memory and Cognition, 18, 615-662. Ockey, G. J. (2007). Construct implications of including still image or video in computer- based listening tests. Language Testing, 24(4), 517-537. Polio, C., & Hughes, A. Writing development in an ESL program: What can we expect in 15 weeks? TESOL Annual Convention, Salt Lake City, April 2002. Progrosh, D. (1996). Using video for listening assessment: Opinions of test-takers. TESL Canada Journal, 14, 34-44. von Raffler-Engel, W. (1980). Kinesics and paralinguistic: A neglected factor in secondlanguage research and teaching. Canadian Modern Language Review, 36(2), 225-237. Rubin, J. (1995). The contribution of video to the development of competence in listening. In D. Mendelsohn, & J. Rubin (Eds.), A guide for the teaching of second language listening (pp. 151-165). San Diego, CA: Dominie Press. Sueyoshi, A., & Hardison, D. M. (2005). The role of gestures and facial cues in second language listening comprehension. Language Learning, 55, 661-699. Suvorov, R. (2008). 
Context visuals in L2 listening tests: The effectiveness of photographs and video vs. audio-only format (Unpublished master’s thesis). Iowa State University, Ames, IA. Suvorov, R. (2009). Context visuals in L2 listening tests: The effects of photographs and video vs. audio-only format. In C. A. Chapelle, H. G. Jun, & I. Katz (Eds.) Developing and evaluating language learning materials (pp 53-68). Ames, IA: Iowa State University. Thompson, I. (1995). Assessment of second/foreign language listening comprehension. In D. Mendelsohn, & J. Rubin (Eds.), A guide for the teaching of second language listening (pp. 31-58). San Diego, CA: Dominie.  61 Vandergrift, L. (2007). Recent developments in second and foreign language listening comprehension research. Language Teaching, 40, 191-201. Wagner, E. (2007). Are they watching? Test-taker viewing behavior during an L2 video listening test. Language Learning and Technology , 11(1), 67-86. Wagner, E. (2008). Video listening tests: What are they measuring? Language Assessment Quarterly, 5(3), 218-243. Wagner, E. (2010). The effect of the use of video texts on ESL listening test-taker performance. Language Testing, 27, 493-513. Wagner, M. (2006). Utilizing the visual channel: An investigation of the use of video texts on tests of second language listening ability. Unpublished doctoral dissertation, Teachers College, Columbia University, New York. Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press. Weir, C. (2005). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave Macmillan.   62